diff --git a/.gitattributes b/.gitattributes
index 6ac63eb7cfee..9484cbe1260f 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -32,4 +32,7 @@ Dockerfile text
 .gitignore export-ignore
 .gitattributes export-ignore
 /gradlew* export-ignore
-/gradle export-ignore
+**/gradle export-ignore
+
+# Website is not part of archive
+/website export-ignore
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
index f2ded40dd5b6..70036185529d 100644
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -11,23 +11,359 @@ Thank you for your contribution! Follow this checklist to help us incorporate yo
 See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
-Post-Commit Tests Status (on master branch)
+`ValidatesRunner` compliance status (on master branch)
+--------------------------------------------------------
[HTML badge table elided: the added table of `ValidatesRunner` Build Status badges for the Go, Java, Python, and XLang SDKs against the ULR, Dataflow, Flink, Samza, Spark, and Twister2 runners. The table markup was stripped from this copy; only the "Build Status" labels and "---" placeholders survive.]
+
+Examples testing status on various runners
+--------------------------------------------------------
[HTML badge table elided: the added table of example-test Build Status badges for the same SDKs and runners; in this copy only the Java/Dataflow cells carry badges, and the markup is not recoverable.]
+
+Post-Commit SDK/Transform Integration Tests Status (on master branch)
------------------------------------------------------------------------------------------------
-Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
--- | --- | --- | --- | --- | --- | ---
-Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) | ---
-Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/) -Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/) | ---
-XLang | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Dataflow/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/) | ---
[HTML badge table elided: the replacement Post-Commit SDK badge table with Go, Java, and Python columns; the badge markup was stripped from this copy and is not recoverable.]
Pre-Commit Tests Status (on master branch)
------------------------------------------------------------------------------------------------
---- |Java | Python | Go | Website | Whitespace | Typescript
--- | --- | --- | --- | --- | --- | ---
-Non-portable | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocs_Cron/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocs_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Whitespace_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Whitespace_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Typescript_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Typescript_Cron/lastCompletedBuild/)
-Portable | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/) | --- | --- | --- | ---
[HTML badge table elided: the replacement Pre-Commit badge table with Non-portable and Portable rows against Java, Python, Go, Website, Whitespace, and Typescript columns; the badge markup was stripped from this copy and is not recoverable.]
See [.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md) for trigger phrase, status and link of all Jenkins jobs. diff --git a/.github/actions/cancel-workflow-runs b/.github/actions/cancel-workflow-runs new file mode 160000 index 000000000000..953e057dc81d --- /dev/null +++ b/.github/actions/cancel-workflow-runs @@ -0,0 +1 @@ +Subproject commit 953e057dc81d3458935a18d1184c386b0f6b5738 diff --git a/.github/actions/github-push-action b/.github/actions/github-push-action new file mode 160000 index 000000000000..057a6ba835d9 --- /dev/null +++ b/.github/actions/github-push-action @@ -0,0 +1 @@ +Subproject commit 057a6ba835d986bfe495dd476a6c4db1d5f9503c diff --git a/.github/actions/gradle-command-action b/.github/actions/gradle-command-action new file mode 160000 index 000000000000..90ccf054e6b9 --- /dev/null +++ b/.github/actions/gradle-command-action @@ -0,0 +1 @@ +Subproject commit 90ccf054e6b9905f30f98c938bce4c6acd323b6b diff --git a/.github/autolabeler.yml b/.github/autolabeler.yml index 0df9e2892713..da96f2107923 100644 --- a/.github/autolabeler.yml +++ b/.github/autolabeler.yml @@ -18,7 +18,7 @@ # Please keep the entries sorted lexicographically in each category. # General -build: ["assembly.xml", "build.gradle", "buildSrc/*", ".gitattributes", ".github/*", ".gitignore", "gradle/*", ".mailmap", "ownership/*", "release/*", "sdks/java/build-tools/*", "settings.gradle"] +build: ["assembly.xml", "build.gradle.kts", "buildSrc/*", ".gitattributes", ".github/*", ".gitignore", "gradle/*", ".mailmap", "ownership/*", "release/*", "sdks/java/build-tools/*", "settings.gradle.kts"] docker: ["runners/flink/job-server-container/*", "runners/spark/job-server/container/*", "sdks/go/container/*", "sdks/java/container/*", "sdks/python/container/*"] examples: ["examples/*", "sdks/go/examples/*", "sdks/python/apache_beam/examples/*"] go: ["sdks/go/*"] diff --git a/.github/workflows/build_wheels.yml b/.github/workflows/build_wheels.yml index 048e871d0d49..4934be7c8ca0 100644 --- a/.github/workflows/build_wheels.yml +++ b/.github/workflows/build_wheels.yml @@ -29,7 +29,7 @@ on: branches: ['master', 'release-*'] tags: 'v*' paths: ['sdks/python/**', 'model/**', 'release/**'] - + workflow_dispatch: env: GCP_PATH: "gs://${{ secrets.GCP_PYTHON_WHEELS_BUCKET }}/${GITHUB_REF##*/}/${GITHUB_SHA}-${GITHUB_RUN_ID}/" @@ -59,6 +59,9 @@ jobs: build_source: runs-on: ubuntu-latest name: Build python source distribution + outputs: + is_rc: ${{ steps.is_rc.outputs.is_rc }} + rc_num: ${{ steps.get_rc_version.outputs.RC_NUM }} steps: - name: Checkout code uses: actions/checkout@v2 @@ -71,6 +74,24 @@ jobs: run: python -m pip install -r build-requirements.txt - name: Install wheels run: python -m pip install wheel + - name: Get tag + id: get_tag + run: | + echo ::set-output name=TAG::${GITHUB_REF#refs/*/} + - name: Check whether an -RC tag was applied to the commit. + id: is_rc + run: | + echo ${{ steps.get_tag.outputs.TAG }} > temp + OUTPUT=$( if grep -e '-RC.' 
-q temp; then echo 1; else echo 0; fi) + echo "::set-output name=is_rc::$OUTPUT" + - name: Get RELEASE_VERSION and RC_NUM + if: steps.is_rc.outputs.is_rc == 1 + id: get_rc_version + run: | + RC_NUM=$(sed -n "s/^.*-RC\([0-9]*\)/\1/p" temp) + RELEASE_VERSION=$(sed -n "s/^v\(.*\)-RC[0-9]/\1/p" temp) + echo "::set-output name=RC_NUM::$RC_NUM" + echo "::set-output name=RELEASE_VERSION::$RELEASE_VERSION" - name: Build source working-directory: ./sdks/python run: python setup.py sdist --formats=zip @@ -95,6 +116,50 @@ jobs: with: name: source_zip path: sdks/python/dist + - name: Clear dist + if: steps.is_rc.outputs.is_rc == 1 + working-directory: ./sdks/python + run: | + rm -r ./dist + rm -rd apache-beam-source + - name: Rewrite SDK version to include RC number + if: steps.is_rc.outputs.is_rc == 1 + working-directory: ./sdks/python + run: | + RELEASE_VERSION=${{ steps.get_rc_version.outputs.RELEASE_VERSION }} + RC_NUM=${{ steps.get_rc_version.outputs.RC_NUM }} + sed -i -e "s/${RELEASE_VERSION}/${RELEASE_VERSION}rc${RC_NUM}/g" apache_beam/version.py + - name: Build RC source + if: steps.is_rc.outputs.is_rc == 1 + working-directory: ./sdks/python + run: python setup.py sdist --formats=zip + - name: Add RC checksums + if: steps.is_rc.outputs.is_rc == 1 + working-directory: ./sdks/python/dist + run: | + file=$(ls | grep .zip | head -n 1) + sha512sum $file > ${file}.sha512 + - name: Unzip RC source + if: steps.is_rc.outputs.is_rc == 1 + working-directory: ./sdks/python + run: unzip dist/$(ls dist | grep .zip | head -n 1) + - name: Rename RC source directory + if: steps.is_rc.outputs.is_rc == 1 + working-directory: ./sdks/python + run: mv $(ls | grep apache-beam) apache-beam-source-rc + - name: Upload RC source as artifact + if: steps.is_rc.outputs.is_rc == 1 + uses: actions/upload-artifact@v2 + with: + name: source_rc${{ steps.get_rc_version.outputs.RC_NUM }} + path: sdks/python/apache-beam-source-rc + - name: Upload compressed RC sources as artifacts + if: steps.is_rc.outputs.is_rc == 1 + uses: actions/upload-artifact@v2 + with: + name: source_zip_rc${{ steps.get_rc_version.outputs.RC_NUM }} + path: sdks/python/dist + prepare_gcs: name: Prepare GCS @@ -134,8 +199,10 @@ jobs: run: gsutil cp -r -a public-read source/* ${{ env.GCP_PATH }} build_wheels: - name: Build python wheels on ${{ matrix.os_python.os }} + name: Build python wheels on ${{matrix.arch}} for ${{ matrix.os_python.os }} needs: build_source + env: + CIBW_ARCHS_LINUX: ${{matrix.arch}} runs-on: ${{ matrix.os_python.os }} strategy: matrix: @@ -144,18 +211,31 @@ jobs: {"os": "macos-latest", "python": "cp36-* cp37-* cp38-*"}, {"os": "windows-latest", "python": "cp36-* cp37-* cp38-*"}, ] + arch: [auto] + include: + - os_python: {"os": "ubuntu-latest", "python": "cp36-* cp37-* cp38-*"} + arch: aarch64 steps: - name: Download python source distribution from artifacts uses: actions/download-artifact@v2 with: name: source path: apache-beam-source + - name: Download Python SDK RC source distribution from artifacts + if: ${{ needs.build_source.outputs.is_rc == 1 }} + uses: actions/download-artifact@v2 + with: + name: source_rc${{ needs.build_source.outputs.rc_num }} + path: apache-beam-source-rc - name: Install Python uses: actions/setup-python@v2 with: python-version: 3.7 + - uses: docker/setup-qemu-action@v1 + if: ${{matrix.arch == 'aarch64'}} + name: Set up QEMU - name: Install cibuildwheel - run: pip install cibuildwheel==1.4.2 + run: pip install cibuildwheel==1.11.0 - name: Build wheel working-directory: apache-beam-source env: @@ -178,6 
+258,28 @@ jobs: with: name: wheelhouse-${{ matrix.os_python.os }} path: apache-beam-source/wheelhouse/ + - name: Build RC wheels + if: ${{ needs.build_source.outputs.is_rc == 1 }} + working-directory: apache-beam-source-rc + env: + CIBW_BUILD: ${{ matrix.os_python.python }} + CIBW_BEFORE_BUILD: pip install cython + run: cibuildwheel --print-build-identifiers && cibuildwheel --output-dir wheelhouse + shell: bash + - name: Add RC checksums + if: ${{ needs.build_source.outputs.is_rc == 1 }} + working-directory: apache-beam-source-rc/wheelhouse/ + run: | + for file in *.whl; do + sha512sum $file > ${file}.sha512 + done + shell: bash + - name: Upload RC wheels as artifacts + if: ${{ needs.build_source.outputs.is_rc == 1 }} + uses: actions/upload-artifact@v2 + with: + name: wheelhouse-rc${{ needs.build_source.outputs.rc_num }}-${{ matrix.os_python.os }} + path: apache-beam-source-rc/wheelhouse/ upload_wheels_to_gcs: name: Upload wheels to GCS bucket @@ -249,14 +351,17 @@ jobs: if: github.repository_owner == 'apache' && github.event_name == 'schedule' steps: - name: Checkout code on master branch - uses: actions/checkout@master + uses: actions/checkout@v2 + with: + persist-credentials: false + submodules: recursive - name: Tag commit run: | BRANCH_NAME=${GITHUB_REF##*/} echo "Tagging ${BRANCH_NAME}" git tag -f nightly-${BRANCH_NAME} HEAD - name: Push tags - uses: ad-m/github-push-action@master + uses: ./.github/actions/github-push-action with: github_token: ${{ secrets.GITHUB_TOKEN }} tags: true diff --git a/.github/workflows/cancel.yml b/.github/workflows/cancel.yml index 18ce996145bf..ea46ebf89d36 100644 --- a/.github/workflows/cancel.yml +++ b/.github/workflows/cancel.yml @@ -30,7 +30,12 @@ jobs: name: "Cancel duplicate workflow runs" runs-on: ubuntu-latest steps: - - uses: potiuk/cancel-workflow-runs@e9e87cb7738dbb999654aa90d69359d62c9e4eae #v3 + - name: "Checkout ${{ github.ref }} ( ${{ github.sha }} )" + uses: actions/checkout@v2 + with: + persist-credentials: false + submodules: recursive + - uses: ./.github/actions/cancel-workflow-runs name: "Cancel duplicate workflow runs" with: cancelMode: duplicates diff --git a/.github/workflows/java_tests.yml b/.github/workflows/java_tests.yml index 62c51fbf721a..9c4955d39766 100644 --- a/.github/workflows/java_tests.yml +++ b/.github/workflows/java_tests.yml @@ -33,7 +33,8 @@ on: pull_request: branches: ['master', 'release-*'] tags: 'v*' - paths: ['sdks/java/**', 'model/**', 'runners/**', 'examples/java/**', 'examples/kotlin/**', 'release/**'] + paths: ['sdks/java/**', 'model/**', 'runners/**', 'examples/java/**', + 'examples/kotlin/**', 'release/**', 'buildSrc/**'] jobs: @@ -67,9 +68,12 @@ jobs: steps: - name: Checkout code uses: actions/checkout@v2 + with: + persist-credentials: false + submodules: recursive # :sdks:java:core:test - name: Run :sdks:java:core:test - uses: eskatos/gradle-command-action@v1 + uses: ./.github/actions/gradle-command-action with: arguments: :sdks:java:core:test - name: Upload test logs for :sdks:java:core:test @@ -80,7 +84,7 @@ jobs: path: sdks/java/core/build/reports/tests/test # :sdks:java:harness:test - name: Run :sdks:java:harness:test - uses: eskatos/gradle-command-action@v1 + uses: ./.github/actions/gradle-command-action with: arguments: :sdks:java:harness:test if: always() @@ -92,7 +96,7 @@ jobs: path: sdks/java/harness/build/reports/tests/test # :runners:core-java:test - name: Run :runners:core-java:test - uses: eskatos/gradle-command-action@v1 + uses: ./.github/actions/gradle-command-action with: arguments: 
:runners:core-java:test if: always() @@ -113,8 +117,11 @@ jobs: steps: - name: Checkout code uses: actions/checkout@v2 + with: + persist-credentials: false + submodules: recursive - name: Run WordCount Unix - uses: eskatos/gradle-command-action@v1 + uses: ./.github/actions/gradle-command-action with: arguments: -p examples/ integrationTest --tests org.apache.beam.examples.WordCountIT -DintegrationTestRunner=direct @@ -143,6 +150,9 @@ jobs: steps: - name: Checkout code uses: actions/checkout@v2 + with: + persist-credentials: false + submodules: recursive - name: Authenticate on GCP uses: google-github-actions/setup-gcloud@master with: @@ -150,8 +160,17 @@ jobs: service_account_key: ${{ secrets.GCP_SA_KEY }} project_id: ${{ secrets.GCP_PROJECT_ID }} export_default_credentials: true + - name: Set Java Version + uses: actions/setup-java@v1 + with: + java-version: 8 + - name: Remove default github maven configuration + # This step is a workaround to avoid a decryption issue of Beam's + # gradle-command-action plugin and github's provided maven + # settings.xml file + run: rm ~/.m2/settings.xml - name: Run WordCount - uses: eskatos/gradle-command-action@v1 + uses: ./.github/actions/gradle-command-action with: arguments: -p examples/ integrationTest --tests org.apache.beam.examples.WordCountIT -DintegrationTestPipelineOptions=["--runner=DataflowRunner","--project=${{ secrets.GCP_PROJECT_ID }}","--tempRoot=gs://${{ secrets.GCP_TESTING_BUCKET }}/tmp/"] diff --git a/.github/workflows/local_env_tests.yml b/.github/workflows/local_env_tests.yml new file mode 100644 index 000000000000..2bf6af1c6a88 --- /dev/null +++ b/.github/workflows/local_env_tests.yml @@ -0,0 +1,55 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+ +# To learn more about GitHub Actions in Apache Beam check the CI.md + +name: Local environment tests + +on: + push: + branches: ['master', 'release-*'] + tags: 'v*' + pull_request: + branches: ['master', 'release-*'] + tags: 'v*' + paths: ['dev-support/**', 'buildSrc/**', '**/build.gradle', 'sdks/python/setup.py', 'sdks/python/tox.ini'] + +jobs: + run_local_env_install_ubuntu: + timeout-minutes: 25 + name: "Ubuntu run local environment shell script" + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v2 + - name: "Installing local env dependencies" + run: "sudo ./local-env-setup.sh" + id: local_env_install_ubuntu + - name: "Gradle check" + run: "./gradlew checkSetup" + id: local_env_install_gradle_check_ubuntu + run_local_env_install_mac: + timeout-minutes: 50 + name: "Mac run local environment shell script" + runs-on: macos-latest + steps: + - uses: actions/checkout@v2 + - name: "Installing local env dependencies" + run: "./local-env-setup.sh" + id: local_env_install_mac + - name: "Gradle check" + run: "./gradlew checkSetup" + id: local_env_install_gradle_check_mac diff --git a/.gitignore b/.gitignore index a8933b21a152..33aac5b14bf7 100644 --- a/.gitignore +++ b/.gitignore @@ -15,6 +15,11 @@ sdks/**/vendor/**/* runners/**/vendor/**/* **/.gradletasknamecache +**/generated/* + +# Ignore sources generated into the main tree +**/src/main/generated/** +**/src/test/generated_tests/** # Ignore files generated by the Maven build process. **/bin/**/* @@ -24,6 +29,8 @@ runners/**/vendor/**/* # Ignore generated archetypes sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources/src/ sdks/java/maven-archetypes/examples-java8/src/main/resources/archetype-resources/src/ +sdks/java/maven-archetypes/gcp-bom-examples/src/main/resources/archetype-resources/src/ +sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources/sample.txt # Ignore files generated by the Python build process. 
**/*.pyc @@ -96,3 +103,13 @@ sdks/python/coverage.xml .pytest_cache .pytest_cache/**/* + +# Hugo +website/www/node_modules +website/www/dist +website/www/site/resources +website/www/site/github_samples +website/www/site/code_samples +website/www/site/_config_branch_repo.toml +website/www/yarn-error.log +!website/www/site/content diff --git a/.gitmodules b/.gitmodules index 5abfbe7da83e..fa6e30a8850a 100644 --- a/.gitmodules +++ b/.gitmodules @@ -1,3 +1,12 @@ [submodule "website/www/site/themes/docsy"] path = website/www/site/themes/docsy url = https://github.com/google/docsy.git +[submodule ".github/actions/cancel-workflow-runs"] + path = .github/actions/cancel-workflow-runs + url = https://github.com/potiuk/cancel-workflow-runs +[submodule ".github/actions/github-push-action"] + path = .github/actions/github-push-action + url = https://github.com/ad-m/github-push-action +[submodule ".github/actions/gradle-command-action"] + path = .github/actions/gradle-command-action + url = https://github.com/eskatos/gradle-command-action diff --git a/.test-infra/dataproc/flink_cluster.sh b/.test-infra/dataproc/flink_cluster.sh index 5d019a01d235..1a4384b1d0d4 100755 --- a/.test-infra/dataproc/flink_cluster.sh +++ b/.test-infra/dataproc/flink_cluster.sh @@ -35,7 +35,7 @@ # HARNESS_IMAGES_TO_PULL='gcr.io//python:latest gcr.io//java:latest' \ # JOB_SERVER_IMAGE=gcr.io//job-server-flink:latest \ # ARTIFACTS_DIR=gs:// \ -# FLINK_DOWNLOAD_URL=https://archive.apache.org/dist/flink/flink-1.10.1/flink-1.10.1-bin-scala_2.11.tgz \ +# FLINK_DOWNLOAD_URL=https://archive.apache.org/dist/flink/flink-1.12.3/flink-1.12.3-bin-scala_2.11.tgz \ # HADOOP_DOWNLOAD_URL=https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.8.3-9.0/flink-shaded-hadoop-2-uber-2.8.3-9.0.jar \ # FLINK_NUM_WORKERS=2 \ # FLINK_TASKMANAGER_SLOTS=1 \ diff --git a/.test-infra/jenkins/CommonJobProperties.groovy b/.test-infra/jenkins/CommonJobProperties.groovy index 40f143ad1f08..295cc89e46e7 100644 --- a/.test-infra/jenkins/CommonJobProperties.groovy +++ b/.test-infra/jenkins/CommonJobProperties.groovy @@ -49,7 +49,7 @@ class CommonJobProperties { // Discard old builds. Build records are only kept up to this number of days. context.logRotator { - daysToKeep(14) + daysToKeep(30) } // Source code management. @@ -183,6 +183,11 @@ class CommonJobProperties { context.switches("-Dorg.gradle.jvmargs=-Xms2g") context.switches("-Dorg.gradle.jvmargs=-Xmx4g") + // Disable file system watching for CI builds + // Builds are performed on a clean clone and files aren't modified, so + // there's no value in watching for changes. 
+ context.switches("-Dorg.gradle.vfs.watch=false") + // Include dependency licenses when build docker images on Jenkins, see https://s.apache.org/zt68q context.switches("-Pdocker-pull-licenses") } diff --git a/.test-infra/jenkins/CommonTestProperties.groovy b/.test-infra/jenkins/CommonTestProperties.groovy index 47716d30e42b..d7be46453011 100644 --- a/.test-infra/jenkins/CommonTestProperties.groovy +++ b/.test-infra/jenkins/CommonTestProperties.groovy @@ -25,6 +25,10 @@ class CommonTestProperties { GO, } + static String getFlinkVersion() { + return "1.13" + } + enum Runner { DATAFLOW("DataflowRunner"), TEST_DATAFLOW("TestDataflowRunner"), @@ -38,9 +42,9 @@ class CommonTestProperties { JAVA: [ DATAFLOW: ":runners:google-cloud-dataflow-java", TEST_DATAFLOW: ":runners:google-cloud-dataflow-java", - SPARK: ":runners:spark", - SPARK_STRUCTURED_STREAMING: ":runners:spark", - FLINK: ":runners:flink:1.10", + SPARK: ":runners:spark:2", + SPARK_STRUCTURED_STREAMING: ":runners:spark:2", + FLINK: ":runners:flink:${CommonTestProperties.getFlinkVersion()}", DIRECT: ":runners:direct-java" ], PYTHON: [ diff --git a/.test-infra/jenkins/Flink.groovy b/.test-infra/jenkins/Flink.groovy index 53f11fc1b334..40dfa0377175 100644 --- a/.test-infra/jenkins/Flink.groovy +++ b/.test-infra/jenkins/Flink.groovy @@ -17,8 +17,8 @@ */ class Flink { - private static final String flinkDownloadUrl = 'https://archive.apache.org/dist/flink/flink-1.10.1/flink-1.10.1-bin-scala_2.11.tgz' - private static final String hadoopDownloadUrl = 'https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.8.3-9.0/flink-shaded-hadoop-2-uber-2.8.3-9.0.jar' + private static final String flinkDownloadUrl = 'https://archive.apache.org/dist/flink/flink-1.12.3/flink-1.12.3-bin-scala_2.11.tgz' + private static final String hadoopDownloadUrl = 'https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.8.3-10.0/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar' private static final String FLINK_DIR = '"$WORKSPACE/src/.test-infra/dataproc"' private static final String FLINK_SCRIPT = 'flink_cluster.sh' private def job diff --git a/.test-infra/jenkins/LoadTestsBuilder.groovy b/.test-infra/jenkins/LoadTestsBuilder.groovy index 58d82ba29571..228201ba4c16 100644 --- a/.test-infra/jenkins/LoadTestsBuilder.groovy +++ b/.test-infra/jenkins/LoadTestsBuilder.groovy @@ -27,20 +27,21 @@ class LoadTestsBuilder { final static String DOCKER_CONTAINER_REGISTRY = 'gcr.io/apache-beam-testing/beam_portability' final static String DOCKER_BEAM_SDK_IMAGE = "beam_python${LOAD_TEST_PYTHON_VERSION}_sdk:latest" - static void loadTests(scope, CommonTestProperties.SDK sdk, List testConfigurations, String test, String mode){ + static void loadTests(scope, CommonTestProperties.SDK sdk, List testConfigurations, String test, String mode, + List jobSpecificSwitches = null) { scope.description("Runs ${sdk.toString().toLowerCase().capitalize()} ${test} load tests in ${mode} mode") commonJobProperties.setTopLevelMainJobProperties(scope, 'master', 240) for (testConfiguration in testConfigurations) { loadTest(scope, testConfiguration.title, testConfiguration.runner, sdk, testConfiguration.pipelineOptions, - testConfiguration.test, testConfiguration.withDataflowWorkerJar ?: false) + testConfiguration.test, jobSpecificSwitches) } } static void loadTest(context, String title, Runner runner, SDK sdk, Map options, - String mainClass, Boolean withDataflowWorkerJar = false) { + String mainClass, List jobSpecificSwitches = null) { options.put('runner', 
runner.option) InfluxDBCredentialsHelper.useCredentials(context) @@ -48,7 +49,8 @@ class LoadTestsBuilder { shell("echo \"*** ${title} ***\"") gradle { rootBuildScriptDir(commonJobProperties.checkoutDir) - setGradleTask(delegate, runner, sdk, options, mainClass, withDataflowWorkerJar) + setGradleTask(delegate, runner, sdk, options, mainClass, + jobSpecificSwitches) commonJobProperties.setGradleSwitches(delegate) } } @@ -74,13 +76,16 @@ class LoadTestsBuilder { } private static void setGradleTask(context, Runner runner, SDK sdk, Map options, - String mainClass, Boolean withDataflowWorkerJar) { + String mainClass, List jobSpecificSwitches) { context.tasks(getGradleTaskName(sdk)) context.switches("-PloadTest.mainClass=\"${mainClass}\"") context.switches("-Prunner=${runner.getDependencyBySDK(sdk)}") - context.switches("-PwithDataflowWorkerJar=\"${withDataflowWorkerJar}\"") context.switches("-PloadTest.args=\"${parseOptions(options)}\"") - + if (jobSpecificSwitches != null) { + jobSpecificSwitches.each { + context.switches(it) + } + } if (sdk == SDK.PYTHON) { context.switches("-PpythonVersion=${LOAD_TEST_PYTHON_VERSION}") diff --git a/.test-infra/jenkins/NexmarkBuilder.groovy b/.test-infra/jenkins/NexmarkBuilder.groovy index 382c385f9656..d50cc6f6f09f 100644 --- a/.test-infra/jenkins/NexmarkBuilder.groovy +++ b/.test-infra/jenkins/NexmarkBuilder.groovy @@ -25,6 +25,8 @@ import NexmarkDatabaseProperties // Class for building NEXMark jobs and suites. class NexmarkBuilder { + final static String DEFAULT_JAVA_RUNTIME_VERSION = "1.8"; + final static String JAVA_11_RUNTIME_VERSION = "11"; private static Map defaultOptions = [ 'manageResources': false, @@ -32,42 +34,56 @@ class NexmarkBuilder { ] << NexmarkDatabaseProperties.nexmarkBigQueryArgs << NexmarkDatabaseProperties.nexmarkInfluxDBArgs static void standardJob(context, Runner runner, SDK sdk, Map jobSpecificOptions, TriggeringContext triggeringContext) { + standardJob(context, runner, sdk, jobSpecificOptions, triggeringContext, null, DEFAULT_JAVA_RUNTIME_VERSION); + } + + static void standardJob(context, Runner runner, SDK sdk, Map jobSpecificOptions, TriggeringContext triggeringContext, List jobSpecificSwitches, String javaRuntimeVersion) { Map options = getFullOptions(jobSpecificOptions, runner, triggeringContext) options.put('streaming', false) - suite(context, "NEXMARK IN BATCH MODE USING ${runner} RUNNER", runner, sdk, options) + suite(context, "NEXMARK IN BATCH MODE USING ${runner} RUNNER", runner, sdk, options, jobSpecificSwitches, javaRuntimeVersion) options.put('streaming', true) - suite(context, "NEXMARK IN STREAMING MODE USING ${runner} RUNNER", runner, sdk, options) + suite(context, "NEXMARK IN STREAMING MODE USING ${runner} RUNNER", runner, sdk, options, jobSpecificSwitches, javaRuntimeVersion) options.put('queryLanguage', 'sql') options.put('streaming', false) - suite(context, "NEXMARK IN SQL BATCH MODE USING ${runner} RUNNER", runner, sdk, options) + suite(context, "NEXMARK IN SQL BATCH MODE USING ${runner} RUNNER", runner, sdk, options, jobSpecificSwitches, javaRuntimeVersion) options.put('streaming', true) - suite(context, "NEXMARK IN SQL STREAMING MODE USING ${runner} RUNNER", runner, sdk, options) + suite(context, "NEXMARK IN SQL STREAMING MODE USING ${runner} RUNNER", runner, sdk, options, jobSpecificSwitches, javaRuntimeVersion) options.put('queryLanguage', 'zetasql') options.put('streaming', false) - suite(context, "NEXMARK IN ZETASQL BATCH MODE USING ${runner} RUNNER", runner, sdk, options) + suite(context, "NEXMARK IN 
ZETASQL BATCH MODE USING ${runner} RUNNER", runner, sdk, options, jobSpecificSwitches, javaRuntimeVersion) options.put('streaming', true) - suite(context, "NEXMARK IN ZETASQL STREAMING MODE USING ${runner} RUNNER", runner, sdk, options) + suite(context, "NEXMARK IN ZETASQL STREAMING MODE USING ${runner} RUNNER", runner, sdk, options, jobSpecificSwitches, javaRuntimeVersion) + } + + static void nonQueryLanguageJobs(context, Runner runner, SDK sdk, Map jobSpecificOptions, TriggeringContext triggeringContext, List jobSpecificSwitches, String javaRuntimeVersion) { + Map options = getFullOptions(jobSpecificOptions, runner, triggeringContext) + + options.put('streaming', false) + suite(context, "NEXMARK IN BATCH MODE USING ${runner} RUNNER", runner, sdk, options, jobSpecificSwitches, javaRuntimeVersion) + + options.put('streaming', true) + suite(context, "NEXMARK IN STREAMING MODE USING ${runner} RUNNER", runner, sdk, options, jobSpecificSwitches, javaRuntimeVersion) } static void batchOnlyJob(context, Runner runner, SDK sdk, Map jobSpecificOptions, TriggeringContext triggeringContext) { Map options = getFullOptions(jobSpecificOptions, runner, triggeringContext) options.put('streaming', false) - suite(context, "NEXMARK IN BATCH MODE USING ${runner} RUNNER", runner, sdk, options) + suite(context, "NEXMARK IN BATCH MODE USING ${runner} RUNNER", runner, sdk, options, null, DEFAULT_JAVA_RUNTIME_VERSION) options.put('queryLanguage', 'sql') - suite(context, "NEXMARK IN SQL BATCH MODE USING ${runner} RUNNER", runner, sdk, options) + suite(context, "NEXMARK IN SQL BATCH MODE USING ${runner} RUNNER", runner, sdk, options, null, DEFAULT_JAVA_RUNTIME_VERSION) options.put('queryLanguage', 'zetasql') - suite(context, "NEXMARK IN ZETASQL BATCH MODE USING ${runner} RUNNER", runner, sdk, options) + suite(context, "NEXMARK IN ZETASQL BATCH MODE USING ${runner} RUNNER", runner, sdk, options, null, DEFAULT_JAVA_RUNTIME_VERSION) } private @@ -81,16 +97,52 @@ class NexmarkBuilder { } - static void suite(context, String title, Runner runner, SDK sdk, Map options) { + static void suite(context, String title, Runner runner, SDK sdk, Map options, List jobSpecificSwitches, String javaRuntimeVersion) { + if (javaRuntimeVersion == JAVA_11_RUNTIME_VERSION) { + java11Suite(context, title, runner, sdk, options, jobSpecificSwitches) + } else { + java8Suite(context, title, runner, sdk, options, jobSpecificSwitches) + } + } + + static void java8Suite(context, String title, Runner runner, SDK sdk, Map options, List jobSpecificSwitches) { InfluxDBCredentialsHelper.useCredentials(context) context.steps { - shell("echo \"*** RUN ${title} ***\"") + shell("echo \"*** RUN ${title} with Java 8 ***\"") + gradle { + rootBuildScriptDir(commonJobProperties.checkoutDir) + tasks(':sdks:java:testing:nexmark:run') + commonJobProperties.setGradleSwitches(delegate) + switches("-Pnexmark.runner=${runner.getDependencyBySDK(sdk)}") + switches("-Pnexmark.args=\"${parseOptions(options)}\"") + if (jobSpecificSwitches != null) { + jobSpecificSwitches.each { + switches(it) + } + } + } + } + } + + static void java11Suite(context, String title, Runner runner, SDK sdk, Map options, List jobSpecificSwitches) { + InfluxDBCredentialsHelper.useCredentials(context) + context.steps { + shell("echo \"*** RUN ${title} with Java 11***\"") + + // Run with Java 11 gradle { rootBuildScriptDir(commonJobProperties.checkoutDir) tasks(':sdks:java:testing:nexmark:run') commonJobProperties.setGradleSwitches(delegate) + switches("-PcompileAndRunTestsWithJava11") + 
switches("-Pjava11Home=${commonJobProperties.JAVA_11_HOME}") switches("-Pnexmark.runner=${runner.getDependencyBySDK(sdk)}") switches("-Pnexmark.args=\"${parseOptions(options)}\"") + if (jobSpecificSwitches != null) { + jobSpecificSwitches.each { + switches(it) + } + } } } } diff --git a/.test-infra/jenkins/PrecommitJobBuilder.groovy b/.test-infra/jenkins/PrecommitJobBuilder.groovy index e2a2dd46d4de..df2ef22d193f 100644 --- a/.test-infra/jenkins/PrecommitJobBuilder.groovy +++ b/.test-infra/jenkins/PrecommitJobBuilder.groovy @@ -77,7 +77,7 @@ class PrecommitJobBuilder { '^gradle.properties$', '^gradlew$', '^gradle.bat$', - '^settings.gradle$' + '^settings.gradle.kts$' ] if (triggerPathPatterns) { triggerPathPatterns.addAll defaultPathTriggers diff --git a/.test-infra/jenkins/README.md b/.test-infra/jenkins/README.md index 6d936411948b..785bc4419138 100644 --- a/.test-infra/jenkins/README.md +++ b/.test-infra/jenkins/README.md @@ -52,10 +52,12 @@ Beam Jenkins overview page: [link](https://ci-beam.apache.org/) |------|------|-------------------|-------------| | beam_PostCommit_BeamMetrics_Publish | [cron](https://ci-beam.apache.org/job/beam_PostCommit_BeamMetrics_Publish/) | N/A | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_BeamMetrics_Publish/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_BeamMetrics_Publish) | | beam_PostCommit_CrossLanguageValidatesRunner | [cron](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/), [phrase](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink_PR/) | `Run XVR_Flink PostCommit` | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink) | +| beam_PostCommit_CrossLanguageValidatesRunner | [cron](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Samza/), [phrase](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Samza_PR/) | `Run XVR_Samza PostCommit` | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Samza/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Samza) | | beam_PostCommit_CrossLanguageValidatesRunner | [cron](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/), [phrase](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark_PR/) | `Run XVR_Spark PostCommit` | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark) | | beam_PostCommit_CrossLanguageValidatesRunner | [cron](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/), [phrase](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct_PR/) | `Run XVR_Direct PostCommit` | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct) | | beam_PostCommit_Go | [cron](https://ci-beam.apache.org/job/beam_PostCommit_Go/), [phrase](https://ci-beam.apache.org/job/beam_PostCommit_Go_PR/) | `Run Go PostCommit` | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go) | | beam_PostCommit_Go_VR_Flink | [cron](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/), [phrase](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink_PR/) | `Run Go Flink ValidatesRunner` | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/) | +| beam_PostCommit_Go_VR_Samza | 
[cron](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Samza/), [phrase](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Samza_PR/) | `Run Go Samza ValidatesRunner` | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Samza/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Samza/) | | beam_PostCommit_Go_VR_Spark | [cron](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/), [phrase](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark_PR/) | `Run Go Spark ValidatesRunner` | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/) | | beam_PostCommit_Java | [cron](https://ci-beam.apache.org/job/beam_PostCommit_Java/), [phrase](https://ci-beam.apache.org/job/beam_PostCommit_Java_PR/) | `Run Java PostCommit` | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java) | | beam_PostCommit_Java_Nexmark_Dataflow | [cron](https://ci-beam.apache.org/job/beam_PostCommit_Java_Nexmark_Dataflow/), [phrase](https://ci-beam.apache.org/job/beam_PostCommit_Java_Nexmark_Dataflow_PR/) | `Dataflow Runner Nexmark Tests` | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_Nexmark_Dataflow/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_Nexmark_Dataflow) | @@ -64,6 +66,7 @@ Beam Jenkins overview page: [link](https://ci-beam.apache.org/) | beam_PostCommit_Java_Nexmark_Spark | [cron](https://ci-beam.apache.org/job/beam_PostCommit_Java_Nexmark_Spark/), [phrase](https://ci-beam.apache.org/job/beam_PostCommit_Java_Nexmark_Spark_PR/) | `Spark Runner Nexmark Tests` | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_Nexmark_Spark/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_Nexmark_Spark) | | beam_PostCommit_Java_PVR_Flink_Batch | [cron](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/), [phrase](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch_PR/) | `Run Java Flink PortableValidatesRunner Batch` | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch) | | beam_PostCommit_Java_PVR_Flink_Streaming | [cron](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/), [phrase](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming_PR/) | `Run Java Flink PortableValidatesRunner Streaming` | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming) | +| beam_PostCommit_Java_PVR_Samza | [cron](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Samza/), [phrase](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Samza_PR/) | `Run Java Samza PortableValidatesRunner` | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Samza/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Samza) | | beam_PostCommit_Java_PVR_Spark_Batch | [cron](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/), [phrase](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch_PR/) | `Run Java Spark PortableValidatesRunner Batch` | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch) | | beam_PostCommit_Java_PortabilityApi | [cron](https://ci-beam.apache.org/job/beam_PostCommit_Java_PortabilityApi/), [phrase](https://ci-beam.apache.org/job/beam_PostCommit_Java_PortabilityApi_PR/) | `Run Java PortabilityApi PostCommit` | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PortabilityApi/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PortabilityApi) | | beam_PostCommit_Java_Dataflow_Examples_Java11 | [cron](https://ci-beam.apache.org/job/beam_PostCommit_Java_Examples_Dataflow_Java11/), [phrase](https://ci-beam.apache.org/job/beam_PostCommit_Java_Examples_Dataflow_Java11_PR/) | `Run Java examples on Dataflow Java 11` | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_Examples_Dataflow_Java11/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_Examples_Dataflow_Java11) | @@ -84,8 +87,10 @@ Beam Jenkins overview page: [link](https://ci-beam.apache.org/) | beam_PostCommit_Javadoc | [cron](https://ci-beam.apache.org/job/beam_PostCommit_Javadoc/), [phrase](https://ci-beam.apache.org/job/beam_PostCommit_Javadoc_PR/) | `Run Javadoc PostCommit` | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Javadoc/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Javadoc) | | beam_PostCommit_PortableJar_Flink | [cron](https://ci-beam.apache.org/job/beam_PostCommit_PortableJar_Flink/), [phrase](https://ci-beam.apache.org/job/beam_PostCommit_PortableJar_Flink_PR/) | `Run PortableJar_Flink PostCommit` | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_PortableJar_Flink/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_PortableJar_Flink) | | beam_PostCommit_Py_VR_Dataflow | [cron](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/), [phrase](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_PR/) | `Run Python Dataflow ValidatesRunner` | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow) | +| beam_PostCommit_Py_VR_Dataflow_V2 | [cron](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/), [phrase](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2_PR/) | `Run Python Dataflow V2 ValidatesRunner` | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2) | | beam_PostCommit_Py_ValCont | [cron](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/), [phrase](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont_PR/) | `Run Python Dataflow ValidatesContainer` | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont) | -| beam_PostCommit_Python_VR_Flink | [cron](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink/), [phrase](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink/) | `Run Python Flink ValidatesRunner` | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink) | +| beam_PostCommit_Python_VR_Flink | [cron](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink/), [phrase](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink_PR/) | 
`Run Python Flink ValidatesRunner` | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink) | +| beam_PostCommit_Python_VR_Samza | [cron](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Samza/), [phrase](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Samza_PR/) | `Run Python Samza ValidatesRunner` | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Samza/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Samza) | | beam_PostCommit_Python_Chicago_Taxi_Example_Dataflow | [cron](https://ci-beam.apache.org/job/beam_PostCommit_Python_Chicago_Taxi_Dataflow/), [phrase](https://ci-beam.apache.org/job/beam_PostCommit_Python_Chicago_Taxi_Dataflow_PR/) | `Run Chicago Taxi on Dataflow` | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python_Chicago_Taxi_Dataflow/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_Chicago_Taxi_Dataflow) | | beam_PostCommit_Python_Chicago_Taxi_Example_Flink | [cron](https://ci-beam.apache.org/job/beam_PostCommit_Python_Chicago_Taxi_Flink/), [phrase](https://ci-beam.apache.org/job/beam_PostCommit_Python_Chicago_Taxi_Flink_PR/) | `Run Chicago Taxi on Flink` | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python_Chicago_Taxi_Flink/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_Chicago_Taxi_Flink) | | beam_PostCommit_Python_MongoDBIO_IT | [cron](https://ci-beam.apache.org/job/beam_PostCommit_Python_MongoDBIO_IT), [phrase](https://ci-beam.apache.org/job/beam_PostCommit_Python_MongoDBIO_IT_PR/) | `Run Python MongoDBIO_IT` | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python_MongoDBIO_IT/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_MongoDBIO_IT) | @@ -113,6 +118,8 @@ Beam Jenkins overview page: [link](https://ci-beam.apache.org/) | beam_PerformanceTests_ParquetIOIT | [cron](https://ci-beam.apache.org/job/beam_PerformanceTests_ParquetIOIT/), [hdfs_cron](https://ci-beam.apache.org/job/beam_PerformanceTests_ParquetIOIT_HDFS/) | `Run Java ParquetIO Performance Test` | [![Build Status](https://ci-beam.apache.org/job/beam_PerformanceTests_ParquetIOIT/badge/icon)](https://ci-beam.apache.org/job/beam_PerformanceTests_ParquetIOIT) [![Build Status](https://ci-beam.apache.org/job/beam_PerformanceTests_ParquetIOIT_HDFS/badge/icon)](https://ci-beam.apache.org/job/beam_PerformanceTests_ParquetIOIT_HDFS) | | beam_PerformanceTests_PubsubIOIT_Python_Streaming | [cron](https://ci-beam.apache.org/job/beam_PerformanceTests_PubsubIOIT_Python_Streaming/) | `Run PubsubIO Performance Test Python` | [![Build Status](https://ci-beam.apache.org/job/beam_PerformanceTests_PubsubIOIT_Python_Streaming/badge/icon)](https://ci-beam.apache.org/job/beam_PerformanceTests_PubsubIOIT_Python_Streaming) | | beam_PerformanceTests_Spark | [cron](https://ci-beam.apache.org/job/beam_PerformanceTests_Spark/) | `Run Spark Performance Test` | [![Build Status](https://ci-beam.apache.org/job/beam_PerformanceTests_Spark/badge/icon)](https://ci-beam.apache.org/job/beam_PerformanceTests_Spark) | +| beam_PerformanceTests_SpannerIO_Read_2GB_Python | [cron](https://ci-beam.apache.org/job/beam_PerformanceTests_SpannerIO_Read_2GB_Python_Batch/) | `Run SpannerIO Read 2GB Performance Test Python Batch` | [![Build 
Status](https://ci-beam.apache.org/job/beam_PerformanceTests_SpannerIO_Read_2GB_Python_Batch/badge/icon)](https://ci-beam.apache.org/job/beam_PerformanceTests_SpannerIO_Read_2GB_Python_Batch) | +| beam_PerformanceTests_SpannerIO_Write_2GB_Python_Batch | [cron](https://ci-beam.apache.org/job/beam_PerformanceTests_SpannerIO_Write_2GB_Python_Batch/) | `Run SpannerIO Write 2GB Performance Test Python Batch` | [![Build Status](https://ci-beam.apache.org/job/beam_PerformanceTests_SpannerIO_Write_2GB_Python_Batch/badge/icon)](https://ci-beam.apache.org/job/beam_PerformanceTests_SpannerIO_Write_2GB_Python_Batch) | | beam_PerformanceTests_TFRecordIOIT | [cron](https://ci-beam.apache.org/job/beam_PerformanceTests_TFRecordIOIT/) | `Run Java TFRecordIO Performance Test` | [![Build Status](https://ci-beam.apache.org/job/beam_PerformanceTests_TFRecordIOIT/badge/icon)](https://ci-beam.apache.org/job/beam_PerformanceTests_TFRecordIOIT) | | beam_PerformanceTests_TextIOIT | [cron](https://ci-beam.apache.org/job/beam_PerformanceTests_TextIOIT/), [hdfs_cron](https://ci-beam.apache.org/job/beam_PerformanceTests_TextIOIT_HDFS/) | `Run Java TextIO Performance Test` | [![Build Status](https://ci-beam.apache.org/job/beam_PerformanceTests_TextIOIT/badge/icon)](https://ci-beam.apache.org/job/beam_PerformanceTests_TextIOIT) [![Build Status](https://ci-beam.apache.org/job/beam_PerformanceTests_TextIOIT_HDFS/badge/icon)](https://ci-beam.apache.org/job/beam_PerformanceTests_TextIOIT_HDFS) | | beam_PerformanceTests_WordCountIT_Py37 | [cron](https://ci-beam.apache.org/job/beam_PerformanceTests_WordCountIT_Py37/) | `Run Python37 WordCountIT Performance Test` | [![Build Status](https://ci-beam.apache.org/job/beam_PerformanceTests_WordCountIT_Py37/badge/icon)](https://ci-beam.apache.org/job/beam_PerformanceTests_WordCountIT_Py37) | @@ -122,7 +129,14 @@ Beam Jenkins overview page: [link](https://ci-beam.apache.org/) | Name | Link | PR Trigger Phrase | Cron Status | |------|------|-------------------|-------------| -| beam_LoadTests_Go_ParDo_Flink_Batch | [cron](https://ci-beam.apache.org/job/beam_LoadTests_Go_ParDo_Flink_Batch/), [phrase](https://ci-beam.apache.org/job/beam_LoadTests_Go_ParDo_Flink_Batch_PR/) | `Run Load Tests Go ParDo Flink Batch` | [![Build Status](https://ci-beam.apache.org/job/beam_LoadTests_Go_ParDo_Flink_Batch/badge/icon)](https://ci-beam.apache.org/job/beam_LoadTests_Python_Go_Flink_Batch/) | +| beam_LoadTests_Go_CoGBK_Dataflow_Batch | [cron](https://ci-beam.apache.org/job/beam_LoadTests_Go_CoGBK_Dataflow_Batch/), [phrase](https://ci-beam.apache.org/job/beam_LoadTests_Go_CoGBK_Dataflow_Batch_PR/) | `Run Load Tests Go CoGBK Dataflow Batch` | [![Build Status](https://ci-beam.apache.org/job/beam_LoadTests_Go_CoGBK_Dataflow_Batch/badge/icon)](https://ci-beam.apache.org/job/beam_LoadTests_Go_CoGBK_Dataflow_Batch/) | +| beam_LoadTests_Go_CoGBK_Flink_Batch | [cron](https://ci-beam.apache.org/job/beam_LoadTests_Go_CoGBK_Flink_Batch/), [phrase](https://ci-beam.apache.org/job/beam_LoadTests_Go_CoGBK_Flink_Batch_PR/) | `Run Load Tests Go CoGBK Flink Batch` | [![Build Status](https://ci-beam.apache.org/job/beam_LoadTests_Go_CoGBK_Flink_Batch/badge/icon)](https://ci-beam.apache.org/job/beam_LoadTests_Go_CoGBK_Flink_Batch/) | +| beam_LoadTests_Go_Combine_Dataflow_Batch | [cron](https://ci-beam.apache.org/job/beam_LoadTests_Go_Combine_Dataflow_Batch/), [phrase](https://ci-beam.apache.org/job/beam_LoadTests_Go_Combine_Dataflow_Batch_PR/) | `Run Load Tests Go Combine Dataflow Batch` | [![Build 
Status](https://ci-beam.apache.org/job/beam_LoadTests_Go_Combine_Dataflow_Batch/badge/icon)](https://ci-beam.apache.org/job/beam_LoadTests_Go_Combine_Dataflow_Batch/) | +| beam_LoadTests_Go_Combine_Flink_Batch | [cron](https://ci-beam.apache.org/job/beam_LoadTests_Go_Combine_Flink_Batch/), [phrase](https://ci-beam.apache.org/job/beam_LoadTests_Go_Combine_Flink_Batch_PR/) | `Run Load Tests Go Combine Flink Batch` | [![Build Status](https://ci-beam.apache.org/job/beam_LoadTests_Go_Combine_Flink_Batch/badge/icon)](https://ci-beam.apache.org/job/beam_LoadTests_Go_Combine_Flink_Batch/) | +| beam_LoadTests_Go_GBK_Dataflow_Batch | [cron](https://ci-beam.apache.org/job/beam_LoadTests_Go_GBK_Dataflow_Batch/), [phrase](https://ci-beam.apache.org/job/beam_LoadTests_Go_GBK_Dataflow_Batch_PR/) | `Run Load Tests Go GBK Dataflow Batch` | [![Build Status](https://ci-beam.apache.org/job/beam_LoadTests_Go_GBK_Dataflow_Batch/badge/icon)](https://ci-beam.apache.org/job/beam_LoadTests_Go_GBK_Dataflow_Batch/) | +| beam_LoadTests_Go_GBK_Flink_Batch | [cron](https://ci-beam.apache.org/job/beam_LoadTests_Go_GBK_Flink_Batch/), [phrase](https://ci-beam.apache.org/job/beam_LoadTests_Go_GBK_Flink_Batch_PR/) | `Run Load Tests Go GBK Flink Batch` | [![Build Status](https://ci-beam.apache.org/job/beam_LoadTests_Go_GBK_Flink_Batch/badge/icon)](https://ci-beam.apache.org/job/beam_LoadTests_Go_GBK_Flink_Batch/) | +| beam_LoadTests_Go_ParDo_Dataflow_Batch | [cron](https://ci-beam.apache.org/job/beam_LoadTests_Go_ParDo_Dataflow_Batch/), [phrase](https://ci-beam.apache.org/job/beam_LoadTests_Go_ParDo_Dataflow_Batch_PR/) | `Run Load Tests Go ParDo Dataflow Batch` | [![Build Status](https://ci-beam.apache.org/job/beam_LoadTests_Go_ParDo_Dataflow_Batch/badge/icon)](https://ci-beam.apache.org/job/beam_LoadTests_Go_ParDo_Dataflow_Batch/) | +| beam_LoadTests_Go_ParDo_Flink_Batch | [cron](https://ci-beam.apache.org/job/beam_LoadTests_Go_ParDo_Flink_Batch/), [phrase](https://ci-beam.apache.org/job/beam_LoadTests_Go_ParDo_Flink_Batch_PR/) | `Run Load Tests Go ParDo Flink Batch` | [![Build Status](https://ci-beam.apache.org/job/beam_LoadTests_Go_ParDo_Flink_Batch/badge/icon)](https://ci-beam.apache.org/job/beam_LoadTests_Go_ParDo_Flink_Batch/) | | beam_Java_LoadTests_Smoke | [phrase](https://ci-beam.apache.org/job/beam_Java_LoadTests_Smoke_PR/) | `Run Java Load Tests Smoke` | | | beam_LoadTests_Java_CoGBK_Dataflow_Batch | [cron](https://ci-beam.apache.org/job/beam_LoadTests_Java_CoGBK_Dataflow_Batch/), [phrase](https://ci-beam.apache.org/job/beam_LoadTests_Java_CoGBK_Dataflow_Batch_PR/) | `Run Load Tests Java CoGBK Dataflow Batch` | [![Build Status](https://ci-beam.apache.org/job/beam_LoadTests_Java_CoGBK_Dataflow_Batch/badge/icon)](https://ci-beam.apache.org/job/beam_LoadTests_Java_CoGBK_Dataflow_Batch/) | | beam_LoadTests_Java_CoGBK_Dataflow_Streaming | [cron](https://ci-beam.apache.org/job/beam_LoadTests_Java_CoGBK_Dataflow_Streaming/), [phrase](https://ci-beam.apache.org/job/beam_LoadTests_Java_CoGBK_Dataflow_Streaming_PR/) | `Run Load Tests Java CoGBK Dataflow Streaming` | [![Build Status](https://ci-beam.apache.org/job/beam_LoadTests_Java_CoGBK_Dataflow_Streaming/badge/icon)](https://ci-beam.apache.org/job/beam_LoadTests_Java_CoGBK_Dataflow_Streaming/) | diff --git a/.test-infra/jenkins/job_Inventory.groovy b/.test-infra/jenkins/job_Inventory.groovy index baaf657dd813..876c3991041a 100644 --- a/.test-infra/jenkins/job_Inventory.groovy +++ b/.test-infra/jenkins/job_Inventory.groovy @@ -18,6 +18,8 @@ import CommonJobProperties as 
commonJobProperties +import static PythonTestProperties.ALL_SUPPORTED_VERSIONS + // These jobs list details about each beam runner, to clarify what software // is on each machine. def nums = 1..16 @@ -59,16 +61,17 @@ nums.each { shell('ls /home/jenkins/tools/*') shell('python --version || echo "python not found"') shell('python3 --version || echo "python3 not found"') - shell('python3.6 --version || echo "python3.6 not found"') - shell('python3.7 --version || echo "python3.7 not found"') - shell('python3.8 --version || echo "python3.8 not found"') + ALL_SUPPORTED_VERSIONS.each { version -> + shell("python${version} --version || echo \"python${version} not found\"") + } shell('/home/jenkins/tools/maven/latest/mvn -v || echo "mvn not found"') shell('/home/jenkins/tools/gradle4.3/gradle -v || echo "gradle not found"') shell('gcloud -v || echo "gcloud not found"') shell('kubectl version || echo "kubectl not found"') - shell('virtualenv -p python3.6 test36 && . ./test36/bin/activate && python --version && deactivate || echo "python 3.6 not found"') - shell('virtualenv -p python3.7 test37 && . ./test37/bin/activate && python --version && deactivate || echo "python 3.7 not found"') - shell('virtualenv -p python3.8 test38 && . ./test38/bin/activate && python --version && deactivate || echo "python 3.8 not found"') + ALL_SUPPORTED_VERSIONS.each { version -> + def versionSuffix = version.replace('.', '') + shell("virtualenv -p python${version} test${versionSuffix} && . ./test${versionSuffix}/bin/activate && python --version && deactivate || echo \"python ${version} not found\"") + } shell('echo "Maven home $MAVEN_HOME"') shell('env') shell('docker system prune --all --filter until=24h --force') diff --git a/.test-infra/jenkins/job_LoadTests_CoGBK_Dataflow_V2_Java11.groovy b/.test-infra/jenkins/job_LoadTests_CoGBK_Dataflow_V2_Java11.groovy new file mode 100644 index 000000000000..deeeef733afb --- /dev/null +++ b/.test-infra/jenkins/job_LoadTests_CoGBK_Dataflow_V2_Java11.groovy @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +import CommonJobProperties as commonJobProperties +import CommonTestProperties +import LoadTestsBuilder as loadTestsBuilder +import PhraseTriggeringPostCommitBuilder +import CronJobBuilder +import InfluxDBCredentialsHelper + +def loadTestConfigurations = { mode, isStreaming, datasetName -> + [ + [ + title : 'Load test: CoGBK 2GB 100 byte records - single key', + test : 'org.apache.beam.sdk.loadtests.CoGroupByKeyLoadTest', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + project : 'apache-beam-testing', + region : 'us-central1', + appName : "load_tests_Java11_Dataflow_V2_${mode}_CoGBK_1", + tempLocation : 'gs://temp-storage-for-perf-tests/loadtests', + publishToBigQuery : true, + bigQueryDataset : datasetName, + bigQueryTable : "java11_dataflow_v2_${mode}_CoGBK_1", + influxMeasurement : "java_${mode}_cogbk_1", + influxTags : """ + { + "runnerVersion": "v2", + "jdk": "java11" + } + """.trim().replaceAll("\\s", ""), + publishToInfluxDB : true, + sourceOptions : """ + { + "numRecords": 20000000, + "keySizeBytes": 10, + "valueSizeBytes": 90, + "numHotKeys": 1 + } + """.trim().replaceAll("\\s", ""), + coSourceOptions : """ + { + "numRecords": 2000000, + "keySizeBytes": 10, + "valueSizeBytes": 90, + "numHotKeys": 1000 + } + """.trim().replaceAll("\\s", ""), + iterations : 1, + numWorkers : 5, + autoscalingAlgorithm : "NONE", + streaming : isStreaming + ] + ], + [ + title : 'Load test: CoGBK 2GB 100 byte records - multiple keys', + test : 'org.apache.beam.sdk.loadtests.CoGroupByKeyLoadTest', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + project : 'apache-beam-testing', + region : 'us-central1', + appName : "load_tests_Java11_Dataflow_V2_${mode}_CoGBK_2", + tempLocation : 'gs://temp-storage-for-perf-tests/loadtests', + publishToBigQuery : true, + bigQueryDataset : datasetName, + bigQueryTable : "java11_dataflow_v2_${mode}_CoGBK_2", + influxMeasurement : "java_${mode}_cogbk_2", + influxTags : """ + { + "runnerVersion": "v2", + "jdk": "java11" + } + """.trim().replaceAll("\\s", ""), + publishToInfluxDB : true, + sourceOptions : """ + { + "numRecords": 20000000, + "keySizeBytes": 10, + "valueSizeBytes": 90, + "numHotKeys": 5 + } + """.trim().replaceAll("\\s", ""), + coSourceOptions : """ + { + "numRecords": 2000000, + "keySizeBytes": 10, + "valueSizeBytes": 90, + "numHotKeys": 1000 + } + """.trim().replaceAll("\\s", ""), + iterations : 1, + numWorkers : 5, + autoscalingAlgorithm : "NONE", + streaming : isStreaming + ] + ], + [ + + title : 'Load test: CoGBK 2GB reiteration 10kB value', + test : 'org.apache.beam.sdk.loadtests.CoGroupByKeyLoadTest', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + project : 'apache-beam-testing', + region : 'us-central1', + appName : "load_tests_Java11_Dataflow_V2_${mode}_CoGBK_3", + tempLocation : 'gs://temp-storage-for-perf-tests/loadtests', + publishToBigQuery : true, + bigQueryDataset : datasetName, + bigQueryTable : "java11_dataflow_v2_${mode}_CoGBK_3", + influxMeasurement : "java_${mode}_cogbk_3", + influxTags : """ + { + "runnerVersion": "v2", + "jdk": "java11" + } + """.trim().replaceAll("\\s", ""), + publishToInfluxDB : true, + sourceOptions : """ + { + "numRecords": 2000000, + "keySizeBytes": 10, + "valueSizeBytes": 90, + "numHotKeys": 200000 + } + """.trim().replaceAll("\\s", ""), + coSourceOptions : """ + { + "numRecords": 2000000, + "keySizeBytes": 10, + "valueSizeBytes": 90, + "numHotKeys": 1000 + } + """.trim().replaceAll("\\s", ""), + iterations : 4, + numWorkers : 5, + 
autoscalingAlgorithm : "NONE", + streaming : isStreaming + ] + + ], + [ + title : 'Load test: CoGBK 2GB reiteration 2MB value', + test : 'org.apache.beam.sdk.loadtests.CoGroupByKeyLoadTest', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + project : 'apache-beam-testing', + region : 'us-central1', + appName : "load_tests_Java11_Dataflow_V2_${mode}_CoGBK_4", + tempLocation : 'gs://temp-storage-for-perf-tests/loadtests', + publishToBigQuery : true, + bigQueryDataset : datasetName, + bigQueryTable : "java11_dataflow_v2_${mode}_CoGBK_4", + influxMeasurement : "java_${mode}_cogbk_4", + influxTags : """ + { + "runnerVersion": "v2", + "jdk": "java11" + } + """.trim().replaceAll("\\s", ""), + publishToInfluxDB : true, + sourceOptions : """ + { + "numRecords": 2000000, + "keySizeBytes": 10, + "valueSizeBytes": 90, + "numHotKeys": 1000 + } + """.trim().replaceAll("\\s", ""), + coSourceOptions : """ + { + "numRecords": 2000000, + "keySizeBytes": 10, + "valueSizeBytes": 90, + "numHotKeys": 1000 + } + """.trim().replaceAll("\\s", ""), + iterations : 4, + numWorkers : 5, + autoscalingAlgorithm : "NONE", + streaming : isStreaming + ] + ] + ].each { test -> test.pipelineOptions.putAll(additionalPipelineArgs) } +} + +def final JOB_SPECIFIC_SWITCHES = [ + '-Prunner.version="V2"', + '-PcompileAndRunTestsWithJava11', + "-Pjava11Home=${commonJobProperties.JAVA_11_HOME}" +] + +def streamingLoadTestJob = { scope, triggeringContext -> + scope.description('Runs Java 11 CoGBK load tests on Dataflow runner V2 in streaming mode') + commonJobProperties.setTopLevelMainJobProperties(scope, 'master', 240) + + def datasetName = loadTestsBuilder.getBigQueryDataset('load_test', triggeringContext) + for (testConfiguration in loadTestConfigurations('streaming', true, datasetName)) { + testConfiguration.pipelineOptions << [inputWindowDurationSec: 1200, coInputWindowDurationSec: 1200] + loadTestsBuilder.loadTest(scope, testConfiguration.title, testConfiguration.runner, CommonTestProperties.SDK.JAVA, + testConfiguration.pipelineOptions, testConfiguration.test, JOB_SPECIFIC_SWITCHES) + } +} + +CronJobBuilder.cronJob('beam_LoadTests_Java_CoGBK_Dataflow_V2_Streaming_Java11', 'H 12 * * *', this) { + additionalPipelineArgs = [ + influxDatabase: InfluxDBCredentialsHelper.InfluxDBDatabaseName, + influxHost: InfluxDBCredentialsHelper.InfluxDBHostUrl, + ] + streamingLoadTestJob(delegate, CommonTestProperties.TriggeringContext.POST_COMMIT) +} + +PhraseTriggeringPostCommitBuilder.postCommitJob( + 'beam_LoadTests_Java_CoGBK_Dataflow_V2_Streaming_Java11', + 'Run Load Tests Java 11 CoGBK Dataflow V2 Streaming', + 'Load Tests Java 11 CoGBK Dataflow V2 Streaming suite', + this + ) { + additionalPipelineArgs = [:] + streamingLoadTestJob(delegate, CommonTestProperties.TriggeringContext.PR) + } + + +def batchLoadTestJob = { scope, triggeringContext -> + def datasetName = loadTestsBuilder.getBigQueryDataset('load_test', triggeringContext) + loadTestsBuilder.loadTests(scope, CommonTestProperties.SDK.JAVA, loadTestConfigurations('batch', false, datasetName), + "CoGBK", "batch", JOB_SPECIFIC_SWITCHES) +} + +CronJobBuilder.cronJob('beam_LoadTests_Java_CoGBK_Dataflow_V2_Batch_Java11', 'H 14 * * *', this) { + additionalPipelineArgs = [ + influxDatabase: InfluxDBCredentialsHelper.InfluxDBDatabaseName, + influxHost: InfluxDBCredentialsHelper.InfluxDBHostUrl, + ] + batchLoadTestJob(delegate, CommonTestProperties.TriggeringContext.POST_COMMIT) +} + +PhraseTriggeringPostCommitBuilder.postCommitJob( + 
'beam_LoadTests_Java_CoGBK_Dataflow_V2_Batch_Java11', + 'Run Load Tests Java 11 CoGBK Dataflow V2 Batch', + 'Load Tests Java 11 CoGBK Dataflow V2 Batch suite', + this + ) { + additionalPipelineArgs = [:] + batchLoadTestJob(delegate, CommonTestProperties.TriggeringContext.PR) + } diff --git a/.test-infra/jenkins/job_LoadTests_Combine_Flink_Go.groovy b/.test-infra/jenkins/job_LoadTests_Combine_Flink_Go.groovy index 9e2ea2c597e4..0143121caaa9 100644 --- a/.test-infra/jenkins/job_LoadTests_Combine_Flink_Go.groovy +++ b/.test-infra/jenkins/job_LoadTests_Combine_Flink_Go.groovy @@ -105,7 +105,7 @@ def loadTestJob = { scope, triggeringContext, mode -> "${DOCKER_CONTAINER_REGISTRY}/beam_go_sdk:latest" ], initialParallelism, - "${DOCKER_CONTAINER_REGISTRY}/beam_flink1.10_job_server:latest") + "${DOCKER_CONTAINER_REGISTRY}/beam_flink1.12_job_server:latest") // Execute all scenarios connected with initial parallelism. loadTestsBuilder.loadTests(scope, CommonTestProperties.SDK.GO, initialScenarios, 'combine', mode) @@ -120,7 +120,7 @@ def loadTestJob = { scope, triggeringContext, mode -> PhraseTriggeringPostCommitBuilder.postCommitJob( 'beam_LoadTests_Go_Combine_Flink_Batch', 'Run Load Tests Go Combine Flink Batch', - 'Load Tests Go Combine Batch suite', + 'Load Tests Go Combine Flink Batch suite', this ) { additionalPipelineArgs = [:] diff --git a/.test-infra/jenkins/job_LoadTests_Combine_Flink_Python.groovy b/.test-infra/jenkins/job_LoadTests_Combine_Flink_Python.groovy index afe86594f2df..5055d1a6982f 100644 --- a/.test-infra/jenkins/job_LoadTests_Combine_Flink_Python.groovy +++ b/.test-infra/jenkins/job_LoadTests_Combine_Flink_Python.groovy @@ -132,7 +132,7 @@ def loadTestJob = { scope, triggeringContext, mode -> "${DOCKER_CONTAINER_REGISTRY}/${DOCKER_BEAM_SDK_IMAGE}" ], initialParallelism, - "${DOCKER_CONTAINER_REGISTRY}/beam_flink1.10_job_server:latest") + "${DOCKER_CONTAINER_REGISTRY}/beam_flink1.12_job_server:latest") // Execute all scenarios connected with initial parallelism. loadTestsBuilder.loadTests(scope, CommonTestProperties.SDK.PYTHON, initialScenarios, 'Combine', mode) diff --git a/.test-infra/jenkins/job_LoadTests_Combine_Go.groovy b/.test-infra/jenkins/job_LoadTests_Combine_Go.groovy new file mode 100644 index 000000000000..d1204a50f48c --- /dev/null +++ b/.test-infra/jenkins/job_LoadTests_Combine_Go.groovy @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +import CommonJobProperties as commonJobProperties +import CommonTestProperties +import LoadTestsBuilder as loadTestsBuilder +import PhraseTriggeringPostCommitBuilder +import InfluxDBCredentialsHelper + + +String now = new Date().format('MMddHHmmss', TimeZone.getTimeZone('UTC')) + +def batchScenarios = { + [ + [ + title : 'Combine Go Load test: 2GB of 10B records', + test : 'combine', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + job_name : "load-tests-go-dataflow-batch-combine-1-${now}", + project : 'apache-beam-testing', + region : 'us-central1', + temp_location : 'gs://temp-storage-for-perf-tests/loadtests', + staging_location : 'gs://temp-storage-for-perf-tests/loadtests', + influx_namespace : 'dataflow', + influx_measurement : 'go_batch_combine_1', + input_options : '\'{' + + '"num_records": 200000000,' + + '"key_size": 1,' + + '"value_size": 9}\'', + fanout : 1, + top_count : 20, + num_workers : 5, + autoscaling_algorithm: 'NONE', + ] + ], + [ + title : 'Combine Go Load test: fanout 4 times with 2GB 10-byte records total', + test : 'combine', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + job_name : "load-tests-go-dataflow-batch-combine-4-${now}", + project : 'apache-beam-testing', + region : 'us-central1', + temp_location : 'gs://temp-storage-for-perf-tests/loadtests', + staging_location : 'gs://temp-storage-for-perf-tests/loadtests', + influx_namespace : 'dataflow', + influx_measurement : 'go_batch_combine_4', + input_options : '\'{' + + '"num_records": 5000000,' + + '"key_size": 10,' + + '"value_size": 90}\'', + fanout : 4, + top_count : 20, + num_workers : 16, + autoscaling_algorithm: 'NONE', + ] + ], + [ + title : 'Combine Go Load test: fanout 8 times with 2GB 10-byte records total', + test : 'combine', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + job_name : "load-tests-go-dataflow-batch-combine-5-${now}", + project : 'apache-beam-testing', + region : 'us-central1', + temp_location : 'gs://temp-storage-for-perf-tests/loadtests', + staging_location : 'gs://temp-storage-for-perf-tests/loadtests', + influx_namespace : 'dataflow', + influx_measurement : 'go_batch_combine_5', + input_options : '\'{' + + '"num_records": 2500000,' + + '"key_size": 10,' + + '"value_size": 90}\'', + fanout : 8, + top_count : 20, + num_workers : 16, + autoscaling_algorithm: 'NONE', + ] + ], + ].each { test -> test.pipelineOptions.putAll(additionalPipelineArgs) } +} + +def loadTestJob = { scope, triggeringContext, mode -> + loadTestsBuilder.loadTests(scope, CommonTestProperties.SDK.GO, batchScenarios(), 'combine', mode) +} + +PhraseTriggeringPostCommitBuilder.postCommitJob( + 'beam_LoadTests_Go_Combine_Dataflow_Batch', + 'Run Load Tests Go Combine Dataflow Batch', + 'Load Tests Go Combine Dataflow Batch suite', + this + ) { + additionalPipelineArgs = [:] + loadTestJob(delegate, CommonTestProperties.TriggeringContext.PR, 'batch') + } + +CronJobBuilder.cronJob('beam_LoadTests_Go_Combine_Dataflow_Batch', 'H 8 * * *', this) { + additionalPipelineArgs = [ + influx_db_name: InfluxDBCredentialsHelper.InfluxDBDatabaseName, + influx_hostname: InfluxDBCredentialsHelper.InfluxDBHostUrl, + ] + loadTestJob(delegate, CommonTestProperties.TriggeringContext.POST_COMMIT, 'batch') +} diff --git a/.test-infra/jenkins/job_LoadTests_Combine_Python.groovy b/.test-infra/jenkins/job_LoadTests_Combine_Python.groovy index 0ea402c34544..f9554e79a111 100644 --- a/.test-infra/jenkins/job_LoadTests_Combine_Python.groovy +++ 
b/.test-infra/jenkins/job_LoadTests_Combine_Python.groovy @@ -100,7 +100,8 @@ def loadTestConfigurations = { datasetName, mode -> def addStreamingOptions(test){ test.pipelineOptions << [streaming: null, - experiments: "use_runner_v2" + // TODO(BEAM-11779) remove shuffle_mode=appliance with runner v2 once issue is resolved. + experiments: "use_runner_v2, shuffle_mode=appliance" ] } diff --git a/.test-infra/jenkins/job_LoadTests_FnApiRunner_Python.groovy b/.test-infra/jenkins/job_LoadTests_FnApiRunner_Python.groovy new file mode 100644 index 000000000000..befc04a9346e --- /dev/null +++ b/.test-infra/jenkins/job_LoadTests_FnApiRunner_Python.groovy @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import CommonJobProperties as commonJobProperties +import LoadTestsBuilder as loadTestsBuilder +import PhraseTriggeringPostCommitBuilder +import InfluxDBCredentialsHelper + +def now = new Date().format("MMddHHmmss", TimeZone.getTimeZone('UTC')) + +def loadTestConfigurations = { datasetName -> + [ + [ + title : 'FnApiRunner Python load test - microbenchmark', + test : 'apache_beam.testing.load_tests.microbenchmarks_test', + runner : CommonTestProperties.Runner.DIRECT, + pipelineOptions: [ + publish_to_big_query: true, + influx_measurement : 'python_direct_microbenchmarks', + project : 'apache-beam-testing', + metrics_dataset : datasetName, + metrics_table : 'python_direct_microbenchmarks', + input_options : '\'{}\'', + ] + ], + ] + .each { test -> test.pipelineOptions.putAll(additionalPipelineArgs) } +} + +def loadTestJob = { scope, triggeringContext -> + scope.description("Runs Python FnApiRunner Microbenchmark") + commonJobProperties.setTopLevelMainJobProperties(scope, 'master', 120) + + def datasetName = loadTestsBuilder.getBigQueryDataset('load_test', triggeringContext) + for (testConfiguration in loadTestConfigurations(datasetName)) { + loadTestsBuilder.loadTest(scope, testConfiguration.title, testConfiguration.runner, CommonTestProperties.SDK.PYTHON, testConfiguration.pipelineOptions, testConfiguration.test) + } +} + +PhraseTriggeringPostCommitBuilder.postCommitJob( + 'beam_Python_LoadTests_FnApiRunner_Microbenchmark', + 'Run Python Load Tests FnApiRunner Microbenchmark', + 'Python Load Tests FnApiRunner Microbenchmark', + this + ) { + additionalPipelineArgs = [:] + loadTestJob(delegate, CommonTestProperties.TriggeringContext.PR) + } + + +// Run this job every 6 hours on a random minute. 
+CronJobBuilder.cronJob('beam_Python_LoadTests_FnApiRunner_Microbenchmark', 'H */6 * * *', this) { + additionalPipelineArgs = [ + influx_db_name: InfluxDBCredentialsHelper.InfluxDBDatabaseName, + influx_hostname: InfluxDBCredentialsHelper.InfluxDBHostUrl, + ] + loadTestJob(delegate, CommonTestProperties.TriggeringContext.POST_COMMIT) +} + diff --git a/.test-infra/jenkins/job_LoadTests_GBK_Dataflow_V2_Java11.groovy b/.test-infra/jenkins/job_LoadTests_GBK_Dataflow_V2_Java11.groovy new file mode 100644 index 000000000000..0c57e1231a4f --- /dev/null +++ b/.test-infra/jenkins/job_LoadTests_GBK_Dataflow_V2_Java11.groovy @@ -0,0 +1,334 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import CommonJobProperties as commonJobProperties +import CommonTestProperties +import LoadTestsBuilder as loadTestsBuilder +import PhraseTriggeringPostCommitBuilder +import CronJobBuilder +import InfluxDBCredentialsHelper + +def loadTestConfigurations = { mode, isStreaming, datasetName -> + [ + [ + title : 'Load test: 2GB of 10B records', + test : 'org.apache.beam.sdk.loadtests.GroupByKeyLoadTest', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + project : 'apache-beam-testing', + region : 'us-central1', + appName : "load_tests_Java11_Dataflow_V2_${mode}_GBK_1", + tempLocation : 'gs://temp-storage-for-perf-tests/loadtests', + publishToBigQuery : true, + bigQueryDataset : datasetName, + bigQueryTable : "java11_Dataflow_V2_${mode}_GBK_1", + influxMeasurement : "java_${mode}_gbk_1", + influxTags : """ + { + "runnerVersion": "v2", + "jdk": "java11" + } + """.trim().replaceAll("\\s", ""), + publishToInfluxDB : true, + sourceOptions : """ + { + "numRecords": 200000000, + "keySizeBytes": 1, + "valueSizeBytes": 9 + } + """.trim().replaceAll("\\s", ""), + fanout : 1, + iterations : 1, + numWorkers : 5, + autoscalingAlgorithm : "NONE", + streaming : isStreaming + ] + ], + [ + title : 'Load test: 2GB of 100B records', + test : 'org.apache.beam.sdk.loadtests.GroupByKeyLoadTest', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + project : 'apache-beam-testing', + region : 'us-central1', + appName : "load_tests_Java11_Dataflow_V2_${mode}_GBK_2", + tempLocation : 'gs://temp-storage-for-perf-tests/loadtests', + publishToBigQuery : true, + bigQueryDataset : datasetName, + bigQueryTable : "java11_dataflow_v2_${mode}_GBK_2", + influxMeasurement : "java_${mode}_gbk_2", + influxTags : """ + { + "runnerVersion": "v2", + "jdk": "java11" + } + """.trim().replaceAll("\\s", ""), + publishToInfluxDB : true, + sourceOptions : """ + { + "numRecords": 20000000, + "keySizeBytes": 10, + "valueSizeBytes": 90 + } + """.trim().replaceAll("\\s", ""), + fanout : 1, + iterations : 1, + numWorkers : 5, + autoscalingAlgorithm : "NONE", + streaming : 
isStreaming + ] + ], + [ + + title : 'Load test: 2GB of 100kB records', + test : 'org.apache.beam.sdk.loadtests.GroupByKeyLoadTest', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + project : 'apache-beam-testing', + region : 'us-central1', + appName : "load_tests_Java11_Dataflow_V2_${mode}_GBK_3", + tempLocation : 'gs://temp-storage-for-perf-tests/loadtests', + publishToBigQuery : true, + bigQueryDataset : datasetName, + bigQueryTable : "java11_dataflow_v2_${mode}_GBK_3", + influxMeasurement : "java_${mode}_gbk_3", + influxTags : """ + { + "runnerVersion": "v2", + "jdk": "java11" + } + """.trim().replaceAll("\\s", ""), + publishToInfluxDB : true, + sourceOptions : """ + { + "numRecords": 20000, + "keySizeBytes": 10000, + "valueSizeBytes": 90000 + } + """.trim().replaceAll("\\s", ""), + fanout : 1, + iterations : 1, + numWorkers : 5, + autoscalingAlgorithm : "NONE", + streaming : isStreaming + ] + + ], + [ + title : 'Load test: fanout 4 times with 2GB 10-byte records total', + test : 'org.apache.beam.sdk.loadtests.GroupByKeyLoadTest', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + project : 'apache-beam-testing', + region : 'us-central1', + appName : "load_tests_Java11_Dataflow_V2_${mode}_GBK_4", + tempLocation : 'gs://temp-storage-for-perf-tests/loadtests', + publishToBigQuery : true, + bigQueryDataset : datasetName, + bigQueryTable : "java11_dataflow_v2_${mode}_GBK_4", + influxMeasurement : "java_${mode}_gbk_4", + influxTags : """ + { + "runnerVersion": "v2", + "jdk": "java11" + } + """.trim().replaceAll("\\s", ""), + publishToInfluxDB : true, + sourceOptions : """ + { + "numRecords": 5000000, + "keySizeBytes": 10, + "valueSizeBytes": 90 + } + """.trim().replaceAll("\\s", ""), + fanout : 4, + iterations : 1, + numWorkers : 16, + autoscalingAlgorithm : "NONE", + streaming : isStreaming + ] + ], + [ + title : 'Load test: fanout 8 times with 2GB 10-byte records total', + test : 'org.apache.beam.sdk.loadtests.GroupByKeyLoadTest', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + project : 'apache-beam-testing', + region : 'us-central1', + appName : "load_tests_Java11_Dataflow_V2_${mode}_GBK_5", + tempLocation : 'gs://temp-storage-for-perf-tests/loadtests', + publishToBigQuery : true, + bigQueryDataset : datasetName, + bigQueryTable : "java11_dataflow_v2_${mode}_GBK_5", + influxMeasurement : "java_${mode}_gbk_5", + influxTags : """ + { + "runnerVersion": "v2", + "jdk": "java11" + } + """.trim().replaceAll("\\s", ""), + publishToInfluxDB : true, + sourceOptions : """ + { + "numRecords": 2500000, + "keySizeBytes": 10, + "valueSizeBytes": 90 + } + """.trim().replaceAll("\\s", ""), + fanout : 8, + iterations : 1, + numWorkers : 16, + autoscalingAlgorithm : "NONE", + streaming : isStreaming + ] + ], + [ + title : 'Load test: reiterate 4 times 10kB values', + test : 'org.apache.beam.sdk.loadtests.GroupByKeyLoadTest', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + project : 'apache-beam-testing', + region : 'us-central1', + appName : "load_tests_Java11_Dataflow_V2_${mode}_GBK_6", + tempLocation : 'gs://temp-storage-for-perf-tests/loadtests', + publishToBigQuery : true, + bigQueryDataset : datasetName, + bigQueryTable : "java11_dataflow_v2_${mode}_GBK_6", + influxMeasurement : "java_${mode}_gbk_6", + influxTags : """ + { + "runnerVersion": "v2", + "jdk": "java11" + } + """.trim().replaceAll("\\s", ""), + publishToInfluxDB : true, + sourceOptions : """ + { + "numRecords": 20000000, + "keySizeBytes": 10, + 
"valueSizeBytes": 90, + "numHotKeys": 200, + "hotKeyFraction": 1 + } + """.trim().replaceAll("\\s", ""), + fanout : 1, + iterations : 4, + numWorkers : 5, + autoscalingAlgorithm : "NONE", + streaming : isStreaming + ] + ], + [ + title : 'Load test: reiterate 4 times 2MB values', + test : 'org.apache.beam.sdk.loadtests.GroupByKeyLoadTest', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + project : 'apache-beam-testing', + region : 'us-central1', + appName : "load_tests_Java11_Dataflow_V2_${mode}_GBK_7", + tempLocation : 'gs://temp-storage-for-perf-tests/loadtests', + publishToBigQuery : true, + bigQueryDataset : datasetName, + bigQueryTable : "java11_dataflow_v2_${mode}_GBK_7", + influxMeasurement : "java_${mode}_gbk_7", + influxTags : """ + { + "runnerVersion": "v2", + "jdk": "java11" + } + """.trim().replaceAll("\\s", ""), + publishToInfluxDB : true, + sourceOptions : """ + { + "numRecords": 20000000, + "keySizeBytes": 10, + "valueSizeBytes": 90, + "numHotKeys": 10, + "hotKeyFraction": 1 + } + """.trim().replaceAll("\\s", ""), + fanout : 1, + iterations : 4, + numWorkers : 5, + autoscalingAlgorithm : "NONE", + streaming : isStreaming + ] + ] + ].each { test -> test.pipelineOptions.putAll(additionalPipelineArgs) } +} + +def final JOB_SPECIFIC_SWITCHES = [ + '-Prunner.version="V2"', + '-PcompileAndRunTestsWithJava11', + "-Pjava11Home=${commonJobProperties.JAVA_11_HOME}" +] + +def streamingLoadTestJob = { scope, triggeringContext -> + scope.description('Runs Java 11 GBK load tests on Dataflow runner V2 in streaming mode') + commonJobProperties.setTopLevelMainJobProperties(scope, 'master', 240) + + def datasetName = loadTestsBuilder.getBigQueryDataset('load_test', triggeringContext) + for (testConfiguration in loadTestConfigurations('streaming', true, datasetName)) { + testConfiguration.pipelineOptions << [inputWindowDurationSec: 1200] + loadTestsBuilder.loadTest(scope, testConfiguration.title, testConfiguration.runner, CommonTestProperties.SDK.JAVA, + testConfiguration.pipelineOptions, testConfiguration.test, JOB_SPECIFIC_SWITCHES) + } +} + +CronJobBuilder.cronJob('beam_LoadTests_Java_GBK_Dataflow_V2_Streaming_Java11', 'H 12 * * *', this) { + additionalPipelineArgs = [ + influxDatabase: InfluxDBCredentialsHelper.InfluxDBDatabaseName, + influxHost: InfluxDBCredentialsHelper.InfluxDBHostUrl, + ] + streamingLoadTestJob(delegate, CommonTestProperties.TriggeringContext.POST_COMMIT) +} + +PhraseTriggeringPostCommitBuilder.postCommitJob( + 'beam_LoadTests_Java_GBK_Dataflow_V2_Streaming_Java11', + 'Run Load Tests Java 11 GBK Dataflow V2 Streaming', + 'Load Tests Java 11 GBK Dataflow V2 Streaming suite', + this + ) { + additionalPipelineArgs = [:] + streamingLoadTestJob(delegate, CommonTestProperties.TriggeringContext.PR) + } + + +def batchLoadTestJob = { scope, triggeringContext -> + def datasetName = loadTestsBuilder.getBigQueryDataset('load_test', triggeringContext) + loadTestsBuilder.loadTests(scope, CommonTestProperties.SDK.JAVA, loadTestConfigurations('batch', false, datasetName), + "GBK", "batch", JOB_SPECIFIC_SWITCHES) +} + +CronJobBuilder.cronJob('beam_LoadTests_Java_GBK_Dataflow_V2_Batch_Java11', 'H 14 * * *', this) { + additionalPipelineArgs = [ + influxDatabase: InfluxDBCredentialsHelper.InfluxDBDatabaseName, + influxHost: InfluxDBCredentialsHelper.InfluxDBHostUrl, + ] + batchLoadTestJob(delegate, CommonTestProperties.TriggeringContext.POST_COMMIT) +} + +PhraseTriggeringPostCommitBuilder.postCommitJob( + 'beam_LoadTests_Java_GBK_Dataflow_V2_Batch_Java11', + 'Run 
Load Tests Java 11 GBK Dataflow V2 Batch', + 'Load Tests Java 11 GBK Dataflow V2 Batch suite', + this + ) { + additionalPipelineArgs = [:] + batchLoadTestJob(delegate, CommonTestProperties.TriggeringContext.PR) + } diff --git a/.test-infra/jenkins/job_LoadTests_GBK_Flink_Go.groovy b/.test-infra/jenkins/job_LoadTests_GBK_Flink_Go.groovy index 11731e81e768..fee2e7d29d7a 100644 --- a/.test-infra/jenkins/job_LoadTests_GBK_Flink_Go.groovy +++ b/.test-infra/jenkins/job_LoadTests_GBK_Flink_Go.groovy @@ -198,7 +198,7 @@ def loadTestJob = { scope, triggeringContext, mode -> "${DOCKER_CONTAINER_REGISTRY}/beam_go_sdk:latest" ], initialParallelism, - "${DOCKER_CONTAINER_REGISTRY}/beam_flink1.10_job_server:latest") + "${DOCKER_CONTAINER_REGISTRY}/beam_flink1.12_job_server:latest") // Execute all scenarios connected with initial parallelism. loadTestsBuilder.loadTests(scope, CommonTestProperties.SDK.GO, initialScenarios, 'group_by_key', mode) @@ -213,7 +213,7 @@ def loadTestJob = { scope, triggeringContext, mode -> PhraseTriggeringPostCommitBuilder.postCommitJob( 'beam_LoadTests_Go_GBK_Flink_Batch', 'Run Load Tests Go GBK Flink Batch', - 'Load Tests Go GBK Batch suite', + 'Load Tests Go GBK Flink Batch suite', this ) { additionalPipelineArgs = [:] diff --git a/.test-infra/jenkins/job_LoadTests_GBK_Flink_Python.groovy b/.test-infra/jenkins/job_LoadTests_GBK_Flink_Python.groovy index 6fa8df4cb681..1b7c1f9a807d 100644 --- a/.test-infra/jenkins/job_LoadTests_GBK_Flink_Python.groovy +++ b/.test-infra/jenkins/job_LoadTests_GBK_Flink_Python.groovy @@ -146,7 +146,7 @@ def loadTest = { scope, triggeringContext -> "${DOCKER_CONTAINER_REGISTRY}/${DOCKER_BEAM_SDK_IMAGE}" ], numberOfWorkers, - "${DOCKER_CONTAINER_REGISTRY}/beam_flink1.10_job_server:latest") + "${DOCKER_CONTAINER_REGISTRY}/beam_flink1.12_job_server:latest") def configurations = testScenarios.findAll { it.pipelineOptions?.parallelism?.value == numberOfWorkers } loadTestsBuilder.loadTests(scope, sdk, configurations, "GBK", "batch") diff --git a/.test-infra/jenkins/job_LoadTests_GBK_Go.groovy b/.test-infra/jenkins/job_LoadTests_GBK_Go.groovy new file mode 100644 index 000000000000..8e3f26c5cdac --- /dev/null +++ b/.test-infra/jenkins/job_LoadTests_GBK_Go.groovy @@ -0,0 +1,211 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +import CommonJobProperties as commonJobProperties +import CommonTestProperties +import LoadTestsBuilder as loadTestsBuilder +import PhraseTriggeringPostCommitBuilder +import InfluxDBCredentialsHelper + +String now = new Date().format('MMddHHmmss', TimeZone.getTimeZone('UTC')) + +def batchScenarios = { + [ + [ + title : 'Group By Key Go Load test: 2GB of 10B records', + test : 'group_by_key', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + job_name : "load-tests-go-dataflow-batch-gbk-1-${now}", + project : 'apache-beam-testing', + region : 'us-central1', + temp_location : 'gs://temp-storage-for-perf-tests/loadtests', + staging_location : 'gs://temp-storage-for-perf-tests/loadtests', + influx_namespace : 'dataflow', + influx_measurement : 'go_batch_gbk_1', + input_options : '\'{' + + '"num_records": 200000000,' + + '"key_size": 1,' + + '"value_size": 9}\'', + iterations : 1, + fanout : 1, + num_workers : 5, + autoscaling_algorithm: 'NONE', + ] + ], + [ + title : 'Group By Key Go Load test: 2GB of 100B records', + test : 'group_by_key', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + job_name : "load-tests-go-dataflow-batch-gbk-2-${now}", + project : 'apache-beam-testing', + region : 'us-central1', + temp_location : 'gs://temp-storage-for-perf-tests/loadtests', + staging_location : 'gs://temp-storage-for-perf-tests/loadtests', + influx_namespace : 'dataflow', + influx_measurement : 'go_batch_gbk_2', + input_options : '\'{' + + '"num_records": 20000000,' + + '"key_size": 10,' + + '"value_size": 90}\'', + iterations : 1, + fanout : 1, + num_workers : 5, + autoscaling_algorithm: 'NONE', + ] + ], + [ + title : 'Group By Key Go Load test: 2GB of 100kB records', + test : 'group_by_key', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + job_name : "load-tests-go-dataflow-batch-gbk-3-${now}", + project : 'apache-beam-testing', + region : 'us-central1', + temp_location : 'gs://temp-storage-for-perf-tests/loadtests', + staging_location : 'gs://temp-storage-for-perf-tests/loadtests', + influx_namespace : 'dataflow', + influx_measurement : 'go_batch_gbk_3', + input_options : '\'{' + + '"num_records": 20000,' + + '"key_size": 10000,' + + '"value_size": 90000}\'', + iterations : 1, + fanout : 1, + num_workers : 5, + autoscaling_algorithm: 'NONE', + ] + ], + [ + title : 'Group By Key Go Load test: fanout 4 times with 2GB 10-byte records total', + test : 'group_by_key', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + job_name : "load-tests-go-dataflow-batch-gbk-4-${now}", + project : 'apache-beam-testing', + region : 'us-central1', + temp_location : 'gs://temp-storage-for-perf-tests/loadtests', + staging_location : 'gs://temp-storage-for-perf-tests/loadtests', + influx_namespace : 'dataflow', + influx_measurement : 'go_batch_gbk_4', + input_options : '\'{' + + '"num_records": 5000000,' + + '"key_size": 10,' + + '"value_size": 90}\'', + iterations : 1, + fanout : 4, + num_workers : 16, + autoscaling_algorithm: 'NONE', + ] + ], + [ + title : 'Group By Key Go Load test: fanout 8 times with 2GB 10-byte records total', + test : 'group_by_key', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + job_name : "load-tests-go-dataflow-batch-gbk-5-${now}", + project : 'apache-beam-testing', + region : 'us-central1', + temp_location : 'gs://temp-storage-for-perf-tests/loadtests', + staging_location : 'gs://temp-storage-for-perf-tests/loadtests', + influx_namespace : 'dataflow', + influx_measurement : 
'go_batch_gbk_5', + input_options : '\'{' + + '"num_records": 2500000,' + + '"key_size": 10,' + + '"value_size": 90}\'', + iterations : 1, + fanout : 8, + num_workers : 16, + autoscaling_algorithm: 'NONE', + ] + ], + [ + title : 'Group By Key Go Load test: reiterate 4 times 10kB values', + test : 'group_by_key', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + job_name : "load-tests-go-dataflow-batch-gbk-6-${now}", + project : 'apache-beam-testing', + region : 'us-central1', + temp_location : 'gs://temp-storage-for-perf-tests/loadtests', + staging_location : 'gs://temp-storage-for-perf-tests/loadtests', + influx_namespace : 'dataflow', + influx_measurement : 'go_batch_gbk_6', + input_options : '\'{' + + '"num_records": 20000000,' + + '"key_size": 10,' + + '"value_size": 90,' + + '"num_hot_keys": 200,' + + '"hot_key_fraction": 1}\'', + iterations : 4, + fanout : 1, + num_workers : 5, + autoscaling_algorithm: 'NONE', + ] + ], + [ + title : 'Group By Key Go Load test: reiterate 4 times 2MB values', + test : 'group_by_key', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + job_name : "load-tests-go-dataflow-batch-gbk-7-${now}", + project : 'apache-beam-testing', + region : 'us-central1', + temp_location : 'gs://temp-storage-for-perf-tests/loadtests', + staging_location : 'gs://temp-storage-for-perf-tests/loadtests', + influx_namespace : 'dataflow', + influx_measurement : 'go_batch_gbk_7', + input_options : '\'{' + + '"num_records": 20000000,' + + '"key_size": 10,' + + '"value_size": 90,' + + '"num_hot_keys": 10,' + + '"hot_key_fraction": 1}\'', + iterations : 4, + fanout : 1, + num_workers : 5, + autoscaling_algorithm: 'NONE', + ] + ], + ] + .each { test -> test.pipelineOptions.putAll(additionalPipelineArgs) } +} + +def loadTestJob = { scope, triggeringContext, mode -> + loadTestsBuilder.loadTests(scope, CommonTestProperties.SDK.GO, batchScenarios(), 'group_by_key', mode) +} + +PhraseTriggeringPostCommitBuilder.postCommitJob( + 'beam_LoadTests_Go_GBK_Dataflow_Batch', + 'Run Load Tests Go GBK Dataflow Batch', + 'Load Tests Go GBK Dataflow Batch suite', + this + ) { + additionalPipelineArgs = [:] + loadTestJob(delegate, CommonTestProperties.TriggeringContext.PR, 'batch') + } + +CronJobBuilder.cronJob('beam_LoadTests_Go_GBK_Dataflow_Batch', 'H 10 * * *', this) { + additionalPipelineArgs = [ + influx_db_name: InfluxDBCredentialsHelper.InfluxDBDatabaseName, + influx_hostname: InfluxDBCredentialsHelper.InfluxDBHostUrl, + ] + loadTestJob(delegate, CommonTestProperties.TriggeringContext.POST_COMMIT, 'batch') +} diff --git a/.test-infra/jenkins/job_LoadTests_GBK_Python.groovy b/.test-infra/jenkins/job_LoadTests_GBK_Python.groovy index 2364ae2661fc..58a0697e729a 100644 --- a/.test-infra/jenkins/job_LoadTests_GBK_Python.groovy +++ b/.test-infra/jenkins/job_LoadTests_GBK_Python.groovy @@ -156,7 +156,8 @@ def addStreamingOptions(test) { // Use the new Dataflow runner, which offers improved efficiency of Dataflow jobs. // See https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline#dataflow-runner-v2 // for more details. - experiments: 'use_runner_v2', + // TODO(BEAM-11779) remove shuffle_mode=appliance with runner v2 once issue is resolved. 
+ experiments: 'use_runner_v2, shuffle_mode=appliance',
 ]
 }
diff --git a/.test-infra/jenkins/job_LoadTests_GBK_Python_reiterate.groovy b/.test-infra/jenkins/job_LoadTests_GBK_Python_reiterate.groovy
index d1960abce170..c980cf070fbe 100644
--- a/.test-infra/jenkins/job_LoadTests_GBK_Python_reiterate.groovy
+++ b/.test-infra/jenkins/job_LoadTests_GBK_Python_reiterate.groovy
@@ -86,7 +86,8 @@ def addStreamingOptions(test) {
 // Use the new Dataflow runner, which offers improved efficiency of Dataflow jobs.
 // See https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline#dataflow-runner-v2
 // for more details.
- experiments: 'use_runner_v2',
+ // TODO(BEAM-11779) remove shuffle_mode=appliance with runner v2 once issue is resolved.
+ experiments: 'use_runner_v2, shuffle_mode=appliance',
 ]
 }
diff --git a/.test-infra/jenkins/job_LoadTests_ParDo_Dataflow_V2_Java11.groovy b/.test-infra/jenkins/job_LoadTests_ParDo_Dataflow_V2_Java11.groovy
new file mode 100644
index 000000000000..1712e13a44ba
--- /dev/null
+++ b/.test-infra/jenkins/job_LoadTests_ParDo_Dataflow_V2_Java11.groovy
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */ + +import CommonJobProperties as commonJobProperties +import CommonTestProperties +import LoadTestsBuilder as loadTestsBuilder +import PhraseTriggeringPostCommitBuilder +import CronJobBuilder +import InfluxDBCredentialsHelper + +def commonLoadTestConfig = { jobType, isStreaming, datasetName -> + [ + [ + title : 'Load test: ParDo 2GB 100 byte records 10 times', + test : 'org.apache.beam.sdk.loadtests.ParDoLoadTest', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + project : 'apache-beam-testing', + region : 'us-central1', + appName : "load_tests_Java11_Dataflow_V2_${jobType}_ParDo_1", + tempLocation : 'gs://temp-storage-for-perf-tests/loadtests', + publishToBigQuery : true, + bigQueryDataset : datasetName, + bigQueryTable : "java11_dataflow_v2_${jobType}_ParDo_1", + influxMeasurement : "java_${jobType}_pardo_1", + influxTags : """ + { + "runnerVersion": "v2", + "jdk": "java11" + } + """.trim().replaceAll("\\s", ""), + publishToInfluxDB : true, + sourceOptions : """ + { + "numRecords": 20000000, + "keySizeBytes": 10, + "valueSizeBytes": 90 + } + """.trim().replaceAll("\\s", ""), + iterations : 10, + numberOfCounters : 1, + numberOfCounterOperations: 0, + numWorkers : 5, + autoscalingAlgorithm: "NONE", + streaming : isStreaming + ] + ], + [ + title : 'Load test: ParDo 2GB 100 byte records 200 times', + test : 'org.apache.beam.sdk.loadtests.ParDoLoadTest', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + project : 'apache-beam-testing', + region : 'us-central1', + appName : "load_tests_Java11_Dataflow_V2_${jobType}_ParDo_2", + tempLocation : 'gs://temp-storage-for-perf-tests/loadtests', + publishToBigQuery : true, + bigQueryDataset : datasetName, + bigQueryTable : "java11_dataflow_v2_${jobType}_ParDo_2", + influxMeasurement : "java_${jobType}_pardo_2", + influxTags : """ + { + "runnerVersion": "v2", + "jdk": "java11" + } + """.trim().replaceAll("\\s", ""), + publishToInfluxDB : true, + sourceOptions : """ + { + "numRecords": 20000000, + "keySizeBytes": 10, + "valueSizeBytes": 90 + } + """.trim().replaceAll("\\s", ""), + iterations : 200, + numberOfCounters : 1, + numberOfCounterOperations: 0, + numWorkers : 5, + autoscalingAlgorithm: "NONE", + streaming : isStreaming + ] + ], + [ + + title : 'Load test: ParDo 2GB 100 byte records 10 counters', + test : 'org.apache.beam.sdk.loadtests.ParDoLoadTest', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + project : 'apache-beam-testing', + region : 'us-central1', + appName : "load_tests_Java11_Dataflow_V2_${jobType}_ParDo_3", + tempLocation : 'gs://temp-storage-for-perf-tests/loadtests', + publishToBigQuery : true, + bigQueryDataset : datasetName, + bigQueryTable : "java11_dataflow_v2_${jobType}_ParDo_3", + influxMeasurement : "java_${jobType}_pardo_3", + influxTags : """ + { + "runnerVersion": "v2", + "jdk": "java11" + } + """.trim().replaceAll("\\s", ""), + publishToInfluxDB : true, + sourceOptions : """ + { + "numRecords": 20000000, + "keySizeBytes": 10, + "valueSizeBytes": 90 + } + """.trim().replaceAll("\\s", ""), + iterations : 1, + numberOfCounters : 1, + numberOfCounterOperations: 10, + numWorkers : 5, + autoscalingAlgorithm: "NONE", + streaming : isStreaming + ] + + ], + [ + title : 'Load test: ParDo 2GB 100 byte records 100 counters', + test : 'org.apache.beam.sdk.loadtests.ParDoLoadTest', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + project : 'apache-beam-testing', + region : 'us-central1', + appName : 
"load_tests_Java11_Dataflow_V2_${jobType}_ParDo_4", + tempLocation : 'gs://temp-storage-for-perf-tests/loadtests', + publishToBigQuery : true, + bigQueryDataset : datasetName, + bigQueryTable : "java11_dataflow_v2_${jobType}_ParDo_4", + influxMeasurement : "java_${jobType}_pardo_4", + influxTags : """ + { + "runnerVersion": "v2", + "jdk": "java11" + } + """.trim().replaceAll("\\s", ""), + publishToInfluxDB : true, + sourceOptions : """ + { + "numRecords": 20000000, + "keySizeBytes": 10, + "valueSizeBytes": 90 + } + """.trim().replaceAll("\\s", ""), + iterations : 1, + numberOfCounters : 1, + numberOfCounterOperations: 100, + numWorkers : 5, + autoscalingAlgorithm: "NONE", + streaming : isStreaming + ] + ] + ].each { test -> test.pipelineOptions.putAll(additionalPipelineArgs) } +} + +def final JOB_SPECIFIC_SWITCHES = [ + '-Prunner.version="V2"', + '-PcompileAndRunTestsWithJava11', + "-Pjava11Home=${commonJobProperties.JAVA_11_HOME}" +] + +def batchLoadTestJob = { scope, triggeringContext -> + def datasetName = loadTestsBuilder.getBigQueryDataset('load_test', triggeringContext) + loadTestsBuilder.loadTests(scope, CommonTestProperties.SDK.JAVA, commonLoadTestConfig('batch', false, datasetName), + "ParDo", "batch", JOB_SPECIFIC_SWITCHES) +} + +def streamingLoadTestJob = {scope, triggeringContext -> + scope.description('Runs Java 11 ParDo load tests on Dataflow runner V2 in streaming mode') + commonJobProperties.setTopLevelMainJobProperties(scope, 'master', 240) + + def datasetName = loadTestsBuilder.getBigQueryDataset('load_test', triggeringContext) + for (testConfiguration in commonLoadTestConfig('streaming', true, datasetName)) { + testConfiguration.pipelineOptions << [inputWindowDurationSec: 1200] + loadTestsBuilder.loadTest(scope, testConfiguration.title, testConfiguration.runner, CommonTestProperties.SDK.JAVA, + testConfiguration.pipelineOptions, testConfiguration.test, JOB_SPECIFIC_SWITCHES) + } +} + +CronJobBuilder.cronJob('beam_LoadTests_Java_ParDo_Dataflow_V2_Batch_Java11', 'H 12 * * *', this) { + additionalPipelineArgs = [ + influxDatabase: InfluxDBCredentialsHelper.InfluxDBDatabaseName, + influxHost: InfluxDBCredentialsHelper.InfluxDBHostUrl, + ] + batchLoadTestJob(delegate, CommonTestProperties.TriggeringContext.POST_COMMIT) +} + +CronJobBuilder.cronJob('beam_LoadTests_Java_ParDo_Dataflow_V2_Streaming_Java11', 'H 12 * * *', this) { + additionalPipelineArgs = [ + influxDatabase: InfluxDBCredentialsHelper.InfluxDBDatabaseName, + influxHost: InfluxDBCredentialsHelper.InfluxDBHostUrl, + ] + streamingLoadTestJob(delegate, CommonTestProperties.TriggeringContext.POST_COMMIT) +} + +PhraseTriggeringPostCommitBuilder.postCommitJob( + 'beam_LoadTests_Java_ParDo_Dataflow_V2_Batch_Java11', + 'Run Load Tests Java 11 ParDo Dataflow V2 Batch', + 'Load Tests Java 11 ParDo Dataflow V2 Batch suite', + this + ) { + additionalPipelineArgs = [:] + batchLoadTestJob(delegate, CommonTestProperties.TriggeringContext.PR) + } + +PhraseTriggeringPostCommitBuilder.postCommitJob( + 'beam_LoadTests_Java_ParDo_Dataflow_V2_Streaming_Java11', + 'Run Load Tests Java 11 ParDo Dataflow V2 Streaming', + 'Load Tests Java 11 ParDo Dataflow V2 Streaming suite', + this + ) { + additionalPipelineArgs = [:] + streamingLoadTestJob(delegate, CommonTestProperties.TriggeringContext.PR) + } diff --git a/.test-infra/jenkins/job_LoadTests_ParDo_Flink_Go.groovy b/.test-infra/jenkins/job_LoadTests_ParDo_Flink_Go.groovy index 17d8d5c8e5e5..567e3276c2b6 100644 --- a/.test-infra/jenkins/job_LoadTests_ParDo_Flink_Go.groovy +++ 
b/.test-infra/jenkins/job_LoadTests_ParDo_Flink_Go.groovy
@@ -126,7 +126,7 @@ def loadTestJob = { scope, triggeringContext, mode ->
 "${DOCKER_CONTAINER_REGISTRY}/beam_go_sdk:latest"
 ],
 numberOfWorkers,
- "${DOCKER_CONTAINER_REGISTRY}/beam_flink1.10_job_server:latest")
+ "${DOCKER_CONTAINER_REGISTRY}/beam_flink1.12_job_server:latest")
 loadTestsBuilder.loadTests(scope, CommonTestProperties.SDK.GO, batchScenarios(), 'ParDo', mode)
 }
diff --git a/.test-infra/jenkins/job_LoadTests_ParDo_Flink_Python.groovy b/.test-infra/jenkins/job_LoadTests_ParDo_Flink_Python.groovy
index ac87b793138a..9b80c61fc726 100644
--- a/.test-infra/jenkins/job_LoadTests_ParDo_Flink_Python.groovy
+++ b/.test-infra/jenkins/job_LoadTests_ParDo_Flink_Python.groovy
@@ -320,7 +320,7 @@ def loadTestJob = { scope, triggeringContext, mode ->
 "${DOCKER_CONTAINER_REGISTRY}/${DOCKER_BEAM_SDK_IMAGE}"
 ],
 numberOfWorkers,
- "${DOCKER_CONTAINER_REGISTRY}/beam_flink1.10_job_server:latest")
+ "${DOCKER_CONTAINER_REGISTRY}/beam_flink1.12_job_server:latest")
 loadTestsBuilder.loadTests(scope, CommonTestProperties.SDK.PYTHON, testScenarios, 'ParDo', mode)
 }
diff --git a/.test-infra/jenkins/job_LoadTests_ParDo_Go.groovy b/.test-infra/jenkins/job_LoadTests_ParDo_Go.groovy
new file mode 100644
index 000000000000..a03b9bb0b082
--- /dev/null
+++ b/.test-infra/jenkins/job_LoadTests_ParDo_Go.groovy
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */ + +import CommonJobProperties as commonJobProperties +import CommonTestProperties +import LoadTestsBuilder as loadTestsBuilder +import PhraseTriggeringPostCommitBuilder +import InfluxDBCredentialsHelper + +String now = new Date().format("MMddHHmmss", TimeZone.getTimeZone('UTC')) + + +def batchScenarios = { + [ + [ + title : 'ParDo Go Load test: 20M 100 byte records 10 iterations', + test : 'pardo', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + job_name : "load-tests-go-dataflow-batch-pardo-1-${now}", + project : 'apache-beam-testing', + region : 'us-central1', + temp_location : 'gs://temp-storage-for-perf-tests/loadtests', + staging_location : 'gs://temp-storage-for-perf-tests/loadtests', + influx_measurement : 'go_batch_pardo_1', + influx_namespace : 'dataflow', + input_options : '\'{' + + '"num_records": 20000000,' + + '"key_size": 10,' + + '"value_size": 90}\'', + iterations : 10, + number_of_counter_operations: 0, + number_of_counters : 0, + num_workers : 5, + autoscaling_algorithm: 'NONE', + ] + ], + [ + title : 'ParDo Go Load test: 20M 100 byte records 200 times', + test : 'pardo', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + job_name : "load-tests-go-dataflow-batch-pardo-2-${now}", + project : 'apache-beam-testing', + region : 'us-central1', + temp_location : 'gs://temp-storage-for-perf-tests/loadtests', + staging_location : 'gs://temp-storage-for-perf-tests/loadtests', + influx_measurement : 'go_batch_pardo_2', + influx_namespace : 'dataflow', + input_options : '\'{' + + '"num_records": 20000000,' + + '"key_size": 10,' + + '"value_size": 90}\'', + iterations : 200, + number_of_counter_operations: 0, + number_of_counters : 0, + num_workers : 5, + autoscaling_algorithm: 'NONE', + ] + ], + [ + title : 'ParDo Go Load test: 20M 100 byte records 10 counters', + test : 'pardo', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + job_name : "load-tests-go-dataflow-batch-pardo-3-${now}", + project : 'apache-beam-testing', + region : 'us-central1', + temp_location : 'gs://temp-storage-for-perf-tests/loadtests', + staging_location : 'gs://temp-storage-for-perf-tests/loadtests', + influx_measurement : 'go_batch_pardo_3', + influx_namespace : 'dataflow', + input_options : '\'{' + + '"num_records": 20000000,' + + '"key_size": 10,' + + '"value_size": 90}\'', + iterations : 1, + number_of_counter_operations: 10, + number_of_counters : 1, + num_workers : 5, + autoscaling_algorithm: 'NONE', + ] + ], + [ + title : 'ParDo Go Load test: 20M 100 byte records 100 counters', + test : 'pardo', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + job_name : "load-tests-go-dataflow-batch-pardo-4-${now}", + project : 'apache-beam-testing', + region : 'us-central1', + temp_location : 'gs://temp-storage-for-perf-tests/loadtests', + staging_location : 'gs://temp-storage-for-perf-tests/loadtests', + influx_measurement : 'go_batch_pardo_4', + influx_namespace : 'dataflow', + input_options : '\'{' + + '"num_records": 20000000,' + + '"key_size": 10,' + + '"value_size": 90}\'', + iterations : 1, + number_of_counter_operations: 100, + number_of_counters : 1, + num_workers : 5, + autoscaling_algorithm: 'NONE', + ] + ], + ].each { test -> test.pipelineOptions.putAll(additionalPipelineArgs) } +} + +def loadTestJob = { scope, triggeringContext, mode -> + loadTestsBuilder.loadTests(scope, CommonTestProperties.SDK.GO, batchScenarios(), 'ParDo', mode) +} + +PhraseTriggeringPostCommitBuilder.postCommitJob( + 
'beam_LoadTests_Go_ParDo_Dataflow_Batch', + 'Run Load Tests Go ParDo Dataflow Batch', + 'Load Tests Go ParDo Dataflow Batch suite', + this + ) { + additionalPipelineArgs = [:] + loadTestJob(delegate, CommonTestProperties.TriggeringContext.PR, 'batch') + } + +CronJobBuilder.cronJob('beam_LoadTests_Go_ParDo_Dataflow_Batch', 'H 10 * * *', this) { + additionalPipelineArgs = [ + influx_db_name: InfluxDBCredentialsHelper.InfluxDBDatabaseName, + influx_hostname: InfluxDBCredentialsHelper.InfluxDBHostUrl, + ] + loadTestJob(delegate, CommonTestProperties.TriggeringContext.POST_COMMIT, 'batch') +} diff --git a/.test-infra/jenkins/job_LoadTests_ParDo_Python.groovy b/.test-infra/jenkins/job_LoadTests_ParDo_Python.groovy index 090361a21a5e..82def0f5ab57 100644 --- a/.test-infra/jenkins/job_LoadTests_ParDo_Python.groovy +++ b/.test-infra/jenkins/job_LoadTests_ParDo_Python.groovy @@ -131,7 +131,8 @@ def addStreamingOptions(test) { // Use the new Dataflow runner, which offers improved efficiency of Dataflow jobs. // See https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline#dataflow-runner-v2 // for more details. - experiments: 'use_runner_v2', + // TODO(BEAM-11779) remove shuffle_mode=appliance with runner v2 once issue is resolved. + experiments: 'use_runner_v2, shuffle_mode=appliance', ] } diff --git a/.test-infra/jenkins/job_LoadTests_SideInput_Flink_Go.groovy b/.test-infra/jenkins/job_LoadTests_SideInput_Flink_Go.groovy new file mode 100644 index 000000000000..c2db4836df33 --- /dev/null +++ b/.test-infra/jenkins/job_LoadTests_SideInput_Flink_Go.groovy @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +import CommonTestProperties +import CommonJobProperties as commonJobProperties +import LoadTestsBuilder as loadTestsBuilder +import PhraseTriggeringPostCommitBuilder +import InfluxDBCredentialsHelper + +import static LoadTestsBuilder.DOCKER_CONTAINER_REGISTRY + +def now = new Date().format("MMddHHmmss", TimeZone.getTimeZone('UTC')) + +def batchScenarios = { + [ + [ + title : 'SideInput Go Load test: 400mb-1kb-10workers-1window-first-iterable', + test : 'sideinput', + runner : CommonTestProperties.Runner.FLINK, + pipelineOptions: [ + job_name : "load-tests-go-flink-batch-sideinput-3-${now}", + influx_namespace : 'flink', + influx_measurement : 'go_batch_sideinput_3', + input_options : '\'{' + + '"num_records": 400000,' + + '"key_size": 100,' + + '"value_size": 900}\'', + access_percentage : 1, + parallelism : 10, + endpoint : 'localhost:8099', + environment_type : 'DOCKER', + environment_config : "${DOCKER_CONTAINER_REGISTRY}/beam_go_sdk:latest", + ] + ], + [ + title : 'SideInput Go Load test: 400mb-1kb-10workers-1window-iterable', + test : 'sideinput', + runner : CommonTestProperties.Runner.FLINK, + pipelineOptions: [ + job_name : "load-tests-go-flink-batch-sideinput-4-${now}", + influx_namespace : 'flink', + influx_measurement : 'go_batch_sideinput_4', + input_options : '\'{' + + '"num_records": 400000,' + + '"key_size": 100,' + + '"value_size": 900}\'', + parallelism : 10, + endpoint : 'localhost:8099', + environment_type : 'DOCKER', + environment_config : "${DOCKER_CONTAINER_REGISTRY}/beam_go_sdk:latest", + ] + ], + ] + .each { test -> test.pipelineOptions.putAll(additionalPipelineArgs) } +} + +def loadTestJob = { scope, triggeringContext, mode -> + def numberOfWorkers = 10 + + Flink flink = new Flink(scope, "beam_LoadTests_Go_SideInput_Flink_${mode.capitalize()}") + flink.setUp( + [ + "${DOCKER_CONTAINER_REGISTRY}/beam_go_sdk:latest" + ], + numberOfWorkers, + "${DOCKER_CONTAINER_REGISTRY}/beam_flink1.12_job_server:latest") + + loadTestsBuilder.loadTests(scope, CommonTestProperties.SDK.GO, + batchScenarios(), 'SideInput', mode) +} + +PhraseTriggeringPostCommitBuilder.postCommitJob( + 'beam_LoadTests_Go_SideInput_Flink_Batch', + 'Run Load Tests Go SideInput Flink Batch', + 'Load Tests Go SideInput Flink Batch suite', + this + ) { + additionalPipelineArgs = [:] + loadTestJob(delegate, CommonTestProperties.TriggeringContext.PR, 'batch') + } + +CronJobBuilder.cronJob('beam_LoadTests_Go_SideInput_Flink_Batch', 'H 11 * * *', this) { + additionalPipelineArgs = [ + influx_db_name: InfluxDBCredentialsHelper.InfluxDBDatabaseName, + influx_hostname: InfluxDBCredentialsHelper.InfluxDBHostUrl, + ] + loadTestJob(delegate, CommonTestProperties.TriggeringContext.POST_COMMIT, 'batch') +} diff --git a/.test-infra/jenkins/job_LoadTests_SideInput_Go.groovy b/.test-infra/jenkins/job_LoadTests_SideInput_Go.groovy new file mode 100644 index 000000000000..16e8618b4302 --- /dev/null +++ b/.test-infra/jenkins/job_LoadTests_SideInput_Go.groovy @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import CommonJobProperties as commonJobProperties +import CommonTestProperties +import LoadTestsBuilder as loadTestsBuilder +import PhraseTriggeringPostCommitBuilder +import InfluxDBCredentialsHelper + +String now = new Date().format('MMddHHmmss', TimeZone.getTimeZone('UTC')) + +def batchScenarios = { + [ + [ + title : 'SideInput Go Load test: 400mb-1kb-10workers-1window-first-iterable', + test : 'sideinput', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + job_name : "load-tests-go-dataflow-batch-sideinput-3-${now}", + project : 'apache-beam-testing', + region : 'us-central1', + temp_location : 'gs://temp-storage-for-perf-tests/loadtests', + staging_location : 'gs://temp-storage-for-perf-tests/loadtests', + influx_namespace : 'dataflow', + influx_measurement : 'go_batch_sideinput_3', + input_options : '\'{' + + '"num_records": 400000,' + + '"key_size": 100,' + + '"value_size": 900}\'', + access_percentage: 1, + num_workers : 10, + autoscaling_algorithm: 'NONE', + ] + ], + [ + title : 'SideInput Go Load test: 400mb-1kb-10workers-1window-iterable', + test : 'sideinput', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + job_name : "load-tests-go-dataflow-batch-sideinput-4-${now}", + project : 'apache-beam-testing', + region : 'us-central1', + temp_location : 'gs://temp-storage-for-perf-tests/loadtests', + staging_location : 'gs://temp-storage-for-perf-tests/loadtests', + influx_namespace : 'dataflow', + influx_measurement : 'go_batch_sideinput_4', + input_options : '\'{' + + '"num_records": 400000,' + + '"key_size": 100,' + + '"value_size": 900}\'', + num_workers : 10, + autoscaling_algorithm: 'NONE', + ] + ] + ] + .each { test -> test.pipelineOptions.putAll(additionalPipelineArgs) } +} + +def loadTestJob = { scope, triggeringContext, mode -> + loadTestsBuilder.loadTests(scope, CommonTestProperties.SDK.GO, batchScenarios(), 'sideinput', mode) +} + +PhraseTriggeringPostCommitBuilder.postCommitJob( + 'beam_LoadTests_Go_SideInput_Dataflow_Batch', + 'Run Load Tests Go SideInput Dataflow Batch', + 'Load Tests Go SideInput Dataflow Batch suite', + this + ) { + additionalPipelineArgs = [:] + loadTestJob(delegate, CommonTestProperties.TriggeringContext.PR, 'batch') + } + +CronJobBuilder.cronJob('beam_LoadTests_Go_SideInput_Dataflow_Batch', 'H 11 * * *', this) { + additionalPipelineArgs = [ + influx_db_name: InfluxDBCredentialsHelper.InfluxDBDatabaseName, + influx_hostname: InfluxDBCredentialsHelper.InfluxDBHostUrl, + ] + loadTestJob(delegate, CommonTestProperties.TriggeringContext.POST_COMMIT, 'batch') +} diff --git a/.test-infra/jenkins/job_LoadTests_SideInput_Python.groovy b/.test-infra/jenkins/job_LoadTests_SideInput_Python.groovy index 5ed7cc6381df..5261e5f677f0 100644 --- a/.test-infra/jenkins/job_LoadTests_SideInput_Python.groovy +++ b/.test-infra/jenkins/job_LoadTests_SideInput_Python.groovy @@ -39,7 +39,8 @@ def fromTemplate = { mode, name, id, datasetName, testSpecificOptions -> influx_measurement : "python_${mode}_sideinput_${id}", num_workers : 10, autoscaling_algorithm: 'NONE', - experiments : 'use_runner_v2', + // 
TODO(BEAM-11779) remove shuffle_mode=appliance with runner v2 once issue is resolved. + experiments : 'use_runner_v2, shuffle_mode=appliance', ] << testSpecificOptions ] } diff --git a/.test-infra/jenkins/job_LoadTests_coGBK_Flink_Go.groovy b/.test-infra/jenkins/job_LoadTests_coGBK_Flink_Go.groovy index d5463b0a0fa8..b15c47fe6f18 100644 --- a/.test-infra/jenkins/job_LoadTests_coGBK_Flink_Go.groovy +++ b/.test-infra/jenkins/job_LoadTests_coGBK_Flink_Go.groovy @@ -158,7 +158,7 @@ def loadTestJob = { scope, triggeringContext, mode -> "${DOCKER_CONTAINER_REGISTRY}/beam_go_sdk:latest" ], numberOfWorkers, - "${DOCKER_CONTAINER_REGISTRY}/beam_flink1.10_job_server:latest") + "${DOCKER_CONTAINER_REGISTRY}/beam_flink1.12_job_server:latest") loadTestsBuilder.loadTests(scope, CommonTestProperties.SDK.GO, batchScenarios(), 'CoGBK', mode) } diff --git a/.test-infra/jenkins/job_LoadTests_coGBK_Flink_Python.groovy b/.test-infra/jenkins/job_LoadTests_coGBK_Flink_Python.groovy index 5b9a21946049..68dd2113e7a8 100644 --- a/.test-infra/jenkins/job_LoadTests_coGBK_Flink_Python.groovy +++ b/.test-infra/jenkins/job_LoadTests_coGBK_Flink_Python.groovy @@ -137,7 +137,7 @@ def loadTest = { scope, triggeringContext -> "${DOCKER_CONTAINER_REGISTRY}/${DOCKER_BEAM_SDK_IMAGE}" ], numberOfWorkers, - "${DOCKER_CONTAINER_REGISTRY}/beam_flink1.10_job_server:latest") + "${DOCKER_CONTAINER_REGISTRY}/beam_flink1.12_job_server:latest") loadTestsBuilder.loadTests(scope, CommonTestProperties.SDK.PYTHON, testScenarios, 'CoGBK', 'batch') } diff --git a/.test-infra/jenkins/job_LoadTests_coGBK_Go.groovy b/.test-infra/jenkins/job_LoadTests_coGBK_Go.groovy new file mode 100644 index 000000000000..09072b6197dc --- /dev/null +++ b/.test-infra/jenkins/job_LoadTests_coGBK_Go.groovy @@ -0,0 +1,169 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +import CommonJobProperties as commonJobProperties +import CommonTestProperties +import LoadTestsBuilder as loadTestsBuilder +import PhraseTriggeringPostCommitBuilder +import InfluxDBCredentialsHelper + +String now = new Date().format("MMddHHmmss", TimeZone.getTimeZone('UTC')) + +def batchScenarios = { + [ + [ + title : 'CoGroupByKey Go Load test: 2GB of 100B records with a single key', + test : 'cogbk', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + job_name : "load-tests-go-dataflow-batch-cogbk-1-${now}", + project : 'apache-beam-testing', + region : 'us-central1', + temp_location : 'gs://temp-storage-for-perf-tests/loadtests', + staging_location : 'gs://temp-storage-for-perf-tests/loadtests', + influx_measurement : 'go_batch_cogbk_1', + influx_namespace : 'dataflow', + input_options : '\'{' + + '"num_records": 20000000,' + + '"key_size": 10,' + + '"value_size": 90,' + + '"num_hot_keys": 1,' + + '"hot_key_fraction": 1}\'', + co_input_options : '\'{' + + '"num_records": 2000000,' + + '"key_size": 10,' + + '"value_size": 90,' + + '"num_hot_keys": 1000,' + + '"hot_key_fraction": 1}\'', + iterations : 1, + num_workers : 5, + autoscaling_algorithm: 'NONE', + ] + ], + [ + title : 'CoGroupByKey Go Load test: 2GB of 100B records with multiple keys', + test : 'cogbk', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + job_name : "load-tests-go-dataflow-batch-cogbk-2-${now}", + project : 'apache-beam-testing', + region : 'us-central1', + temp_location : 'gs://temp-storage-for-perf-tests/loadtests', + staging_location : 'gs://temp-storage-for-perf-tests/loadtests', + influx_measurement : 'go_batch_cogbk_2', + influx_namespace : 'dataflow', + input_options : '\'{' + + '"num_records": 20000000,' + + '"key_size": 10,' + + '"value_size": 90,' + + '"num_hot_keys": 5,' + + '"hot_key_fraction": 1}\'', + co_input_options : '\'{' + + '"num_records": 2000000,' + + '"key_size": 10,' + + '"value_size": 90,' + + '"num_hot_keys": 1000,' + + '"hot_key_fraction": 1}\'', + iterations : 1, + num_workers : 5, + autoscaling_algorithm: 'NONE', + ] + ], + [ + title : 'CoGroupByKey Go Load test: reiterate 4 times 10kB values', + test : 'cogbk', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + job_name : "load-tests-go-dataflow-batch-cogbk-3-${now}", + project : 'apache-beam-testing', + region : 'us-central1', + temp_location : 'gs://temp-storage-for-perf-tests/loadtests', + staging_location : 'gs://temp-storage-for-perf-tests/loadtests', + influx_measurement : 'go_batch_cogbk_3', + influx_namespace : 'dataflow', + input_options : '\'{' + + '"num_records": 20000000,' + + '"key_size": 10,' + + '"value_size": 90,' + + '"num_hot_keys": 200000,' + + '"hot_key_fraction": 1}\'', + co_input_options : '\'{' + + '"num_records": 2000000,' + + '"key_size": 10,' + + '"value_size": 90,' + + '"num_hot_keys": 1000,' + + '"hot_key_fraction": 1}\'', + iterations : 4, + num_workers : 5, + autoscaling_algorithm: 'NONE', + ] + ], + [ + title : 'CoGroupByKey Go Load test: reiterate 4 times 2MB values', + test : 'cogbk', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + job_name : "load-tests-go-dataflow-batch-cogbk-4-${now}", + project : 'apache-beam-testing', + region : 'us-central1', + temp_location : 'gs://temp-storage-for-perf-tests/loadtests', + staging_location : 'gs://temp-storage-for-perf-tests/loadtests', + influx_measurement : 'go_batch_cogbk_4', + influx_namespace : 'dataflow', + input_options : '\'{' + + '"num_records": 20000000,' + + 
'"key_size": 10,' + + '"value_size": 90,' + + '"num_hot_keys": 1000,' + + '"hot_key_fraction": 1}\'', + co_input_options : '\'{' + + '"num_records": 2000000,' + + '"key_size": 10,' + + '"value_size": 90,' + + '"num_hot_keys": 1000,' + + '"hot_key_fraction": 1}\'', + iterations : 4, + num_workers : 5, + autoscaling_algorithm: 'NONE', + ] + ], + ] + .each { test -> test.pipelineOptions.putAll(additionalPipelineArgs) } +} + +def loadTestJob = { scope, triggeringContext, mode -> + loadTestsBuilder.loadTests(scope, CommonTestProperties.SDK.GO, batchScenarios(), 'CoGBK', mode) +} + +PhraseTriggeringPostCommitBuilder.postCommitJob( + 'beam_LoadTests_Go_CoGBK_Dataflow_Batch', + 'Run Load Tests Go CoGBK Dataflow Batch', + 'Load Tests Go CoGBK Dataflow Batch suite', + this + ) { + additionalPipelineArgs = [:] + loadTestJob(delegate, CommonTestProperties.TriggeringContext.PR, 'batch') + } + +CronJobBuilder.cronJob('beam_LoadTests_Go_CoGBK_Dataflow_batch', 'H 8 * * *', this) { + additionalPipelineArgs = [ + influx_db_name: InfluxDBCredentialsHelper.InfluxDBDatabaseName, + influx_hostname: InfluxDBCredentialsHelper.InfluxDBHostUrl, + ] + loadTestJob(delegate, CommonTestProperties.TriggeringContext.POST_COMMIT, 'batch') +} diff --git a/.test-infra/jenkins/job_PerformanceTests_InfluxDBIO_IT.groovy b/.test-infra/jenkins/job_PerformanceTests_InfluxDBIO_IT.groovy index 8cd75ed5615f..28b3d34dfa0f 100644 --- a/.test-infra/jenkins/job_PerformanceTests_InfluxDBIO_IT.groovy +++ b/.test-infra/jenkins/job_PerformanceTests_InfluxDBIO_IT.groovy @@ -22,6 +22,7 @@ String jobName = "beam_PerformanceTests_InfluxDbIO_IT" job(jobName) { common.setTopLevelMainJobProperties(delegate) + common.setAutoJob(delegate,'H */6 * * *') common.enablePhraseTriggeringFromPullRequest( delegate, 'Java InfluxDbIO Performance Test', diff --git a/.test-infra/jenkins/job_PerformanceTests_KafkaIO_IT.groovy b/.test-infra/jenkins/job_PerformanceTests_KafkaIO_IT.groovy index 0c6e6ae9ea63..4c665a3b35dd 100644 --- a/.test-infra/jenkins/job_PerformanceTests_KafkaIO_IT.groovy +++ b/.test-infra/jenkins/job_PerformanceTests_KafkaIO_IT.groovy @@ -24,7 +24,8 @@ String jobName = "beam_PerformanceTests_Kafka_IO" job(jobName) { common.setTopLevelMainJobProperties(delegate) - common.setAutoJob(delegate, 'H */6 * * *') + // TODO(BEAM-9482): Re-enable once fixed. + // common.setAutoJob(delegate, 'H */6 * * *') common.enablePhraseTriggeringFromPullRequest( delegate, 'Java KafkaIO Performance Test', @@ -74,7 +75,8 @@ job(jobName) { kafkaTopic : 'beam-runnerv2', bigQueryTable : 'kafkaioit_results_sdf_wrapper', influxMeasurement : 'kafkaioit_results_sdf_wrapper', - experiments : 'beam_fn_api,use_runner_v2,use_unified_worker', + // TODO(BEAM-11779) remove shuffle_mode=appliance with runner v2 once issue is resolved. + experiments : 'use_runner_v2,shuffle_mode=appliance,use_unified_worker', ] Map dataflowRunnerV2SdfPipelineOptions = pipelineOptions + [ @@ -88,7 +90,8 @@ job(jobName) { kafkaTopic : 'beam-sdf', bigQueryTable : 'kafkaioit_results_runner_v2', influxMeasurement : 'kafkaioit_results_runner_v2', - experiments : 'beam_fn_api,use_runner_v2,use_unified_worker,use_sdf_kafka_read', + // TODO(BEAM-11779) remove shuffle_mode=appliance with runner v2 once issue is resolved. 
+ experiments : 'use_runner_v2,shuffle_mode=appliance,use_unified_worker', ] steps { diff --git a/.test-infra/jenkins/job_PerformanceTests_PubsubIO_Python.groovy b/.test-infra/jenkins/job_PerformanceTests_PubsubIO_Python.groovy index fd30c964ef76..8217569e187c 100644 --- a/.test-infra/jenkins/job_PerformanceTests_PubsubIO_Python.groovy +++ b/.test-infra/jenkins/job_PerformanceTests_PubsubIO_Python.groovy @@ -24,7 +24,9 @@ import static java.util.UUID.randomUUID def now = new Date().format("MMddHHmmss", TimeZone.getTimeZone('UTC')) -def withDataflowWorkerJar = true +def final JOB_SPECIFIC_SWITCHES = [ + '-PwithDataflowWorkerJar="true"' +] def psio_test = [ title : 'PubsubIO Write Performance Test Python 2GB', @@ -56,7 +58,7 @@ def executeJob = { scope, testConfig -> commonJobProperties.setTopLevelMainJobProperties(scope, 'master', 240) loadTestsBuilder.loadTest(scope, testConfig.title, testConfig.runner, - CommonTestProperties.SDK.PYTHON, testConfig.pipelineOptions, testConfig.test, withDataflowWorkerJar) + CommonTestProperties.SDK.PYTHON, testConfig.pipelineOptions, testConfig.test, JOB_SPECIFIC_SWITCHES) } PhraseTriggeringPostCommitBuilder.postCommitJob( diff --git a/.test-infra/jenkins/job_PerformanceTests_Python.groovy b/.test-infra/jenkins/job_PerformanceTests_Python.groovy index c6c4f9e7ae2f..0807d2de9814 100644 --- a/.test-infra/jenkins/job_PerformanceTests_Python.groovy +++ b/.test-infra/jenkins/job_PerformanceTests_Python.groovy @@ -37,7 +37,7 @@ for (pythonVersion in pythonVersions) { jobName : "beam_PerformanceTests_WordCountIT_Py${pythonVersion}", jobDescription : "Python SDK Performance Test - Run WordCountIT in Py${pythonVersion} with 1Gb files", jobTriggerPhrase : "Run Python${pythonVersion} WordCountIT Performance Test", - test : "apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it", + test : "apache_beam/examples/wordcount_it_test.py::WordCountIT::test_wordcount_it", gradleTaskName : ":sdks:python:test-suites:dataflow:py${pythonVersion}:runPerformanceTest", pipelineOptions : dataflowPipelineArgs + [ job_name : "performance-tests-wordcount-python${pythonVersion}-batch-1gb${now}", @@ -80,7 +80,7 @@ private void createPythonPerformanceTestJob(Map testConfig) { ) publishers { - archiveJunit('**/nosetests*.xml') + archiveJunit('**/pytest*.xml') } steps { diff --git a/.test-infra/jenkins/job_PerformanceTests_SpannerIO_Python.groovy b/.test-infra/jenkins/job_PerformanceTests_SpannerIO_Python.groovy new file mode 100644 index 000000000000..4b7b9c9c3487 --- /dev/null +++ b/.test-infra/jenkins/job_PerformanceTests_SpannerIO_Python.groovy @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +import CommonJobProperties as commonJobProperties +import LoadTestsBuilder as loadTestsBuilder +import PhraseTriggeringPostCommitBuilder +import InfluxDBCredentialsHelper + +def now = new Date().format("MMddHHmmss", TimeZone.getTimeZone('UTC')) + +def spannerio_read_test_2gb = [ + title : 'SpannerIO Read Performance Test Python 2 GB', + test : 'apache_beam.io.gcp.experimental.spannerio_read_perf_test', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + job_name : 'performance-tests-spanner-read-python-2gb' + now, + project : 'apache-beam-testing', + region : 'us-central1', + temp_location : 'gs://temp-storage-for-perf-tests/loadtests', + spanner_instance : 'beam-test', + spanner_database : 'pyspanner_read_2gb', + publish_to_big_query : true, + metrics_dataset : 'beam_performance', + metrics_table : 'pyspanner_read_2GB_results', + influx_measurement : 'python_spannerio_read_2GB_results', + influx_db_name : InfluxDBCredentialsHelper.InfluxDBDatabaseName, + influx_hostname : InfluxDBCredentialsHelper.InfluxDBHostUrl, + input_options : '\'{' + + '"num_records": 2097152,' + + '"key_size": 1,' + + '"value_size": 1024}\'', + num_workers : 5, + autoscaling_algorithm: 'NONE', // Disable autoscale the worker pool. + ] +] + +def spannerio_write_test_2gb = [ + title : 'SpannerIO Write Performance Test Python Batch 2 GB', + test : 'apache_beam.io.gcp.experimental.spannerio_write_perf_test', + runner : CommonTestProperties.Runner.DATAFLOW, + pipelineOptions: [ + job_name : 'performance-tests-spannerio-write-python-batch-2gb' + now, + project : 'apache-beam-testing', + region : 'us-central1', + temp_location : 'gs://temp-storage-for-perf-tests/loadtests', + spanner_instance : 'beam-test', + spanner_database : 'pyspanner_write_2gb', + publish_to_big_query : true, + metrics_dataset : 'beam_performance', + metrics_table : 'pyspanner_write_2GB_results', + influx_measurement : 'python_spanner_write_2GB_results', + influx_db_name : InfluxDBCredentialsHelper.InfluxDBDatabaseName, + influx_hostname : InfluxDBCredentialsHelper.InfluxDBHostUrl, + input_options : '\'{' + + '"num_records": 2097152,' + + '"key_size": 1,' + + '"value_size": 1024}\'', + num_workers : 5, + autoscaling_algorithm: 'NONE', // Disable autoscale the worker pool. 
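+        // input_options above generate roughly 2 GB: 2,097,152 records x (1-byte key + 1,024-byte value) per record.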
+ ] +] + +def executeJob = { scope, testConfig -> + commonJobProperties.setTopLevelMainJobProperties(scope, 'master', 480) + + loadTestsBuilder.loadTest(scope, testConfig.title, testConfig.runner, CommonTestProperties.SDK.PYTHON, testConfig.pipelineOptions, testConfig.test) +} + +PhraseTriggeringPostCommitBuilder.postCommitJob( + 'beam_PerformanceTests_SpannerIO_Read_2GB_Python', + 'Run SpannerIO Read 2GB Performance Test Python', + 'SpannerIO Read 2GB Performance Test Python', + this + ) { + executeJob(delegate, spannerio_read_test_2gb) + } + +CronJobBuilder.cronJob('beam_PerformanceTests_SpannerIO_Read_2GB_Python', 'H 15 * * *', this) { + executeJob(delegate, spannerio_read_test_2gb) +} + +PhraseTriggeringPostCommitBuilder.postCommitJob( + 'beam_PerformanceTests_SpannerIO_Write_2GB_Python_Batch', + 'Run SpannerIO Write 2GB Performance Test Python Batch', + 'SpannerIO Write 2GB Performance Test Python Batch', + this + ) { + executeJob(delegate, spannerio_write_test_2gb) + } + +CronJobBuilder.cronJob('beam_PerformanceTests_SpannerIO_Write_2GB_Python_Batch', 'H 15 * * *', this) { + executeJob(delegate, spannerio_write_test_2gb) +} diff --git a/.test-infra/jenkins/job_PostCommit_CrossLanguageValidatesRunner_Dataflow.groovy b/.test-infra/jenkins/job_PostCommit_CrossLanguageValidatesRunner_Dataflow.groovy index 926f710661e4..6435f8086ac3 100644 --- a/.test-infra/jenkins/job_PostCommit_CrossLanguageValidatesRunner_Dataflow.groovy +++ b/.test-infra/jenkins/job_PostCommit_CrossLanguageValidatesRunner_Dataflow.groovy @@ -32,7 +32,7 @@ PostcommitJobBuilder.postCommitJob('beam_PostCommit_XVR_Dataflow', // Publish all test results to Jenkins publishers { - archiveJunit('**/nosetests*.xml') + archiveJunit('**/pytest*.xml') } // Gradle goals for this job. diff --git a/.test-infra/jenkins/job_PostCommit_CrossLanguageValidatesRunner_Flink.groovy b/.test-infra/jenkins/job_PostCommit_CrossLanguageValidatesRunner_Flink.groovy index 96882030fbae..828b51bf5c1e 100644 --- a/.test-infra/jenkins/job_PostCommit_CrossLanguageValidatesRunner_Flink.groovy +++ b/.test-infra/jenkins/job_PostCommit_CrossLanguageValidatesRunner_Flink.groovy @@ -17,6 +17,7 @@ */ import CommonJobProperties as commonJobProperties +import CommonTestProperties import PostcommitJobBuilder import static PythonTestProperties.CROSS_LANGUAGE_VALIDATES_RUNNER_PYTHON_VERSIONS @@ -40,7 +41,7 @@ PostcommitJobBuilder.postCommitJob('beam_PostCommit_XVR_Flink', shell("echo \"*** RUN CROSS-LANGUAGE FLINK USING PYTHON ${pythonVersion} ***\"") gradle { rootBuildScriptDir(commonJobProperties.checkoutDir) - tasks(':runners:flink:1.10:job-server:validatesCrossLanguageRunner') + tasks(":runners:flink:${CommonTestProperties.getFlinkVersion()}:job-server:validatesCrossLanguageRunner") commonJobProperties.setGradleSwitches(delegate) switches("-PpythonVersion=${pythonVersion}") } diff --git a/.test-infra/jenkins/job_PostCommit_CrossLanguageValidatesRunner_Samza.groovy b/.test-infra/jenkins/job_PostCommit_CrossLanguageValidatesRunner_Samza.groovy new file mode 100644 index 000000000000..ba273dadae1a --- /dev/null +++ b/.test-infra/jenkins/job_PostCommit_CrossLanguageValidatesRunner_Samza.groovy @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import CommonJobProperties as commonJobProperties +import CommonTestProperties +import PostcommitJobBuilder + +import static PythonTestProperties.CROSS_LANGUAGE_VALIDATES_RUNNER_PYTHON_VERSIONS + +// This job runs the suite of ValidatesRunner tests against the Samza runner. +PostcommitJobBuilder.postCommitJob('beam_PostCommit_XVR_Samza', + 'Run XVR_Samza PostCommit', 'Samza CrossLanguageValidatesRunner Tests', this) { + description('Runs the CrossLanguageValidatesRunner suite on the Samza runner.') + + // Set common parameters. + commonJobProperties.setTopLevelMainJobProperties(delegate) + + // Publish all test results to Jenkins + publishers { + archiveJunit('**/build/test-results/**/*.xml') + } + + // Gradle goals for this job. + steps { + CROSS_LANGUAGE_VALIDATES_RUNNER_PYTHON_VERSIONS.each { pythonVersion -> + shell("echo \"*** RUN CROSS-LANGUAGE SAMZA USING PYTHON ${pythonVersion} ***\"") + gradle { + rootBuildScriptDir(commonJobProperties.checkoutDir) + tasks(":runners:samza:job-server:validatesCrossLanguageRunner") + commonJobProperties.setGradleSwitches(delegate) + switches("-PpythonVersion=${pythonVersion}") + } + } + } + } diff --git a/.test-infra/jenkins/job_PostCommit_CrossLanguageValidatesRunner_Spark.groovy b/.test-infra/jenkins/job_PostCommit_CrossLanguageValidatesRunner_Spark.groovy index 25744ceeac99..610d5efec2c4 100644 --- a/.test-infra/jenkins/job_PostCommit_CrossLanguageValidatesRunner_Spark.groovy +++ b/.test-infra/jenkins/job_PostCommit_CrossLanguageValidatesRunner_Spark.groovy @@ -40,7 +40,7 @@ PostcommitJobBuilder.postCommitJob('beam_PostCommit_XVR_Spark', shell("echo \"*** RUN CROSS-LANGUAGE SPARK USING PYTHON ${pythonVersion} ***\"") gradle { rootBuildScriptDir(commonJobProperties.checkoutDir) - tasks(':runners:spark:job-server:validatesCrossLanguageRunner') + tasks(':runners:spark:2:job-server:validatesCrossLanguageRunner') commonJobProperties.setGradleSwitches(delegate) switches("-PpythonVersion=${pythonVersion}") } diff --git a/.test-infra/jenkins/job_PostCommit_CrossLanguageValidatesRunner_Spark3.groovy b/.test-infra/jenkins/job_PostCommit_CrossLanguageValidatesRunner_Spark3.groovy new file mode 100644 index 000000000000..de2d925c4d24 --- /dev/null +++ b/.test-infra/jenkins/job_PostCommit_CrossLanguageValidatesRunner_Spark3.groovy @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import CommonJobProperties as commonJobProperties +import PostcommitJobBuilder + +import static PythonTestProperties.CROSS_LANGUAGE_VALIDATES_RUNNER_PYTHON_VERSIONS + +// This job runs the suite of ValidatesRunner tests against the Spark3 runner. +PostcommitJobBuilder.postCommitJob('beam_PostCommit_XVR_Spark3', + 'Run XVR_Spark3 PostCommit', 'Spark3 CrossLanguageValidatesRunner Tests', this) { + description('Runs the CrossLanguageValidatesRunner suite on the Spark3 runner.') + + // Set common parameters. + commonJobProperties.setTopLevelMainJobProperties(delegate) + + // Publish all test results to Jenkins + publishers { + archiveJunit('**/build/test-results/**/*.xml') + } + + // Gradle goals for this job. + steps { + CROSS_LANGUAGE_VALIDATES_RUNNER_PYTHON_VERSIONS.each { pythonVersion -> + shell("echo \"*** RUN CROSS-LANGUAGE SPARK3 USING PYTHON ${pythonVersion} ***\"") + gradle { + rootBuildScriptDir(commonJobProperties.checkoutDir) + tasks(':runners:spark:3:job-server:validatesCrossLanguageRunner') + commonJobProperties.setGradleSwitches(delegate) + switches("-PpythonVersion=${pythonVersion}") + } + } + } + } diff --git a/.test-infra/jenkins/job_PostCommit_Python36.groovy b/.test-infra/jenkins/job_PostCommit_Go_ValidatesRunner_Samza.groovy similarity index 70% rename from .test-infra/jenkins/job_PostCommit_Python36.groovy rename to .test-infra/jenkins/job_PostCommit_Go_ValidatesRunner_Samza.groovy index 1a3751899797..891f25a7e12b 100644 --- a/.test-infra/jenkins/job_PostCommit_Python36.groovy +++ b/.test-infra/jenkins/job_PostCommit_Go_ValidatesRunner_Samza.groovy @@ -19,25 +19,19 @@ import CommonJobProperties as commonJobProperties import PostcommitJobBuilder -// This job defines the Python postcommit tests. -PostcommitJobBuilder.postCommitJob('beam_PostCommit_Python36', 'Run Python 3.6 PostCommit', - 'Python36_PC("Run Python 3.6 PostCommit")', this) { - description('Runs Python postcommit tests using Python 3.6.') - - previousNames('/beam_PostCommit_Python3_Verify/') +// This job runs the suite of Go integration tests against the Samza runner. +PostcommitJobBuilder.postCommitJob('beam_PostCommit_Go_VR_Samza', + 'Run Go Samza ValidatesRunner', 'Go Samza ValidatesRunner Tests', this) { + description('Runs Go integration tests on the Samza runner.') // Set common parameters. commonJobProperties.setTopLevelMainJobProperties(delegate) - publishers { - archiveJunit('**/nosetests*.xml') - } - - // Execute shell command to test Python SDK. + // Gradle goals for this job. steps { gradle { rootBuildScriptDir(commonJobProperties.checkoutDir) - tasks(':python36PostCommit') + tasks(':sdks:go:test:samzaValidatesRunner') commonJobProperties.setGradleSwitches(delegate) } } diff --git a/.test-infra/jenkins/job_PostCommit_Java_Dataflow.groovy b/.test-infra/jenkins/job_PostCommit_Java_Dataflow.groovy new file mode 100644 index 000000000000..2ab27da0f1b0 --- /dev/null +++ b/.test-infra/jenkins/job_PostCommit_Java_Dataflow.groovy @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import CommonJobProperties as commonJobProperties +import PostcommitJobBuilder + + +// This job runs the Java postcommit tests, including the suite of integration +// tests. +PostcommitJobBuilder.postCommitJob('beam_PostCommit_Java_DataflowV1', 'Run PostCommit_Java_Dataflow', + 'Dataflow Java Post Commit Tests', this) { + + description('Dataflow Java Post Commit Tests') + + // Set common parameters. + commonJobProperties.setTopLevelMainJobProperties(delegate, 'master', 240) + + // Publish all test results to Jenkins + publishers { + archiveJunit('**/build/test-results/**/*.xml') + } + + // Gradle goals for this job. + steps { + gradle { + rootBuildScriptDir(commonJobProperties.checkoutDir) + tasks(":runners:google-cloud-dataflow-java:postCommit") + commonJobProperties.setGradleSwitches(delegate) + // Specify maven home on Jenkins, needed by Maven archetype integration tests. + switches('-Pmaven_home=/home/jenkins/tools/maven/apache-maven-3.5.4') + } + } + } diff --git a/.test-infra/jenkins/job_PostCommit_Java_DataflowV2.groovy b/.test-infra/jenkins/job_PostCommit_Java_DataflowV2.groovy new file mode 100644 index 000000000000..03d82d48974e --- /dev/null +++ b/.test-infra/jenkins/job_PostCommit_Java_DataflowV2.groovy @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import CommonJobProperties as commonJobProperties +import PostcommitJobBuilder + + +// This job runs the Java postcommit tests, including the suite of integration +// tests. +PostcommitJobBuilder.postCommitJob('beam_PostCommit_Java_DataflowV2', 'Run PostCommit_Java_DataflowV2', + 'Dataflow V2 Java Post Commit Tests', this) { + + description('Dataflow V2 Java Post Commit Tests') + + // Set common parameters. + commonJobProperties.setTopLevelMainJobProperties(delegate, 'master', 240) + + // Publish all test results to Jenkins + publishers { + archiveJunit('**/build/test-results/**/*.xml') + } + + // Gradle goals for this job. 
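+  // Invokes :runners:google-cloud-dataflow-java:postCommitRunnerV2 below, i.e. the Java postcommit/integration suite run against Dataflow Runner V2.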
+ steps { + gradle { + rootBuildScriptDir(commonJobProperties.checkoutDir) + tasks(":runners:google-cloud-dataflow-java:postCommitRunnerV2") + commonJobProperties.setGradleSwitches(delegate) + // Specify maven home on Jenkins, needed by Maven archetype integration tests. + switches('-Pmaven_home=/home/jenkins/tools/maven/apache-maven-3.5.4') + } + } + } diff --git a/.test-infra/jenkins/job_PostCommit_Java_Hadoop_Versions.groovy b/.test-infra/jenkins/job_PostCommit_Java_Hadoop_Versions.groovy new file mode 100644 index 000000000000..dfca212d64fb --- /dev/null +++ b/.test-infra/jenkins/job_PostCommit_Java_Hadoop_Versions.groovy @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import CommonJobProperties as commonJobProperties +import PostcommitJobBuilder + + +// This job runs the Java postcommit tests, including the suite of integration +// tests. +PostcommitJobBuilder.postCommitJob('beam_PostCommit_Java_Hadoop_Versions', 'Run PostCommit_Java_Hadoop_Versions', + 'Java Hadoop Versions Post Commit Tests', this) { + + description('Java Hadoop Versions Post Commit Tests') + + // Set common parameters. + commonJobProperties.setTopLevelMainJobProperties(delegate, 'master', 240) + + // Publish all test results to Jenkins + publishers { + archiveJunit('**/build/test-results/**/*.xml') + } + + // Gradle goals for this job. + steps { + gradle { + rootBuildScriptDir(commonJobProperties.checkoutDir) + tasks(":javaHadoopVersionsTest") + commonJobProperties.setGradleSwitches(delegate) + // Specify maven home on Jenkins, needed by Maven archetype integration tests. + switches('-Pmaven_home=/home/jenkins/tools/maven/apache-maven-3.5.4') + } + } + } diff --git a/.test-infra/jenkins/job_PostCommit_Java_Jpms_Spark_Java11.groovy b/.test-infra/jenkins/job_PostCommit_Java_Jpms_Spark_Java11.groovy new file mode 100644 index 000000000000..35a9ef961f95 --- /dev/null +++ b/.test-infra/jenkins/job_PostCommit_Java_Jpms_Spark_Java11.groovy @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import CommonJobProperties as commonJobProperties +import PostcommitJobBuilder + + +// This job runs the Java postcommit tests, including the suite of integration +// tests. +PostcommitJobBuilder.postCommitJob('beam_PostCommit_Java_Jpms_Spark_Java11', 'Run Jpms Spark Java 11 PostCommit', + 'JPMS Java 11 Spark Post Commit Tests', this) { + + description('Runs JPMS tests on Spark using the Java 11 SDK.') + + // Set common parameters. + commonJobProperties.setTopLevelMainJobProperties(delegate, 'master', 240) + + // Publish all test results to Jenkins + publishers { + archiveJunit('**/build/test-results/**/*.xml') + } + + // Gradle goals for this job. + steps { + gradle { + rootBuildScriptDir(commonJobProperties.checkoutDir) + tasks(':sdks:java:testing:jpms-tests:sparkRunnerIntegrationTest') + commonJobProperties.setGradleSwitches(delegate) + switches("-Dorg.gradle.java.home=${commonJobProperties.JAVA_11_HOME}") + // Specify maven home on Jenkins, needed by Maven archetype integration tests. + switches('-Pmaven_home=/home/jenkins/tools/maven/apache-maven-3.5.4') + } + } + } diff --git a/.test-infra/jenkins/job_PostCommit_Java_Nexmark_Dataflow_V2.groovy b/.test-infra/jenkins/job_PostCommit_Java_Nexmark_Dataflow_V2.groovy new file mode 100644 index 000000000000..9d7a1245da90 --- /dev/null +++ b/.test-infra/jenkins/job_PostCommit_Java_Nexmark_Dataflow_V2.groovy @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import CommonJobProperties as commonJobProperties +import CommonTestProperties.Runner +import CommonTestProperties.SDK +import CommonTestProperties.TriggeringContext +import NexmarkBuilder as Nexmark +import NoPhraseTriggeringPostCommitBuilder +import PhraseTriggeringPostCommitBuilder +import InfluxDBCredentialsHelper + +import static NexmarkDatabaseProperties.nexmarkBigQueryArgs +import static NexmarkDatabaseProperties.nexmarkInfluxDBArgs + +def final JOB_SPECIFIC_OPTIONS = [ + 'influxTags' : '{\\\"runnerVersion\\\":\\\"V2\\\",\\\"javaVersion\\\":\\\"8\\\"}', + 'exportSummaryToBigQuery' : false, + 'region' : 'us-central1', + 'suite' : 'STRESS', + 'numWorkers' : 4, + 'maxNumWorkers' : 4, + 'autoscalingAlgorithm' : 'NONE', + 'nexmarkParallel' : 16, + 'enforceEncodability' : true, + 'enforceImmutability' : true +] + +def final JOB_SPECIFIC_SWITCHES = [ + '-Pnexmark.runner.version="V2"' +] + +// This job runs the suite of Nexmark tests against the Dataflow runner V2. +NoPhraseTriggeringPostCommitBuilder.postCommitJob('beam_PostCommit_Java_Nexmark_Dataflow_V2', + 'Dataflow Runner V2 Nexmark Tests', this) { + description('Runs the Nexmark suite on the Dataflow runner V2.') + + // Set common parameters. 
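+  // Builds from 'master' with an extended build timeout (240, in minutes), since the STRESS Nexmark suite configured above is long-running.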
+ commonJobProperties.setTopLevelMainJobProperties(delegate, 'master', 240) + + Nexmark.nonQueryLanguageJobs(delegate, Runner.DATAFLOW, SDK.JAVA, JOB_SPECIFIC_OPTIONS, TriggeringContext.POST_COMMIT, JOB_SPECIFIC_SWITCHES, Nexmark.DEFAULT_JAVA_RUNTIME_VERSION) + } + +PhraseTriggeringPostCommitBuilder.postCommitJob('beam_PostCommit_Java_Nexmark_DataflowV2', + 'Run Dataflow Runner V2 Nexmark Tests', 'Dataflow Runner V2 Nexmark Tests', this) { + + description('Runs the Nexmark suite on the Dataflow runner V2 against a Pull Request, on demand.') + + commonJobProperties.setTopLevelMainJobProperties(delegate, 'master', 240) + + Nexmark.nonQueryLanguageJobs(delegate, Runner.DATAFLOW, SDK.JAVA, JOB_SPECIFIC_OPTIONS, TriggeringContext.PR, JOB_SPECIFIC_SWITCHES, Nexmark.DEFAULT_JAVA_RUNTIME_VERSION) + } diff --git a/.test-infra/jenkins/job_PostCommit_Java_Nexmark_Dataflow_V2_Java11.groovy b/.test-infra/jenkins/job_PostCommit_Java_Nexmark_Dataflow_V2_Java11.groovy new file mode 100644 index 000000000000..8fc69a7c0994 --- /dev/null +++ b/.test-infra/jenkins/job_PostCommit_Java_Nexmark_Dataflow_V2_Java11.groovy @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import CommonJobProperties as commonJobProperties +import CommonTestProperties.Runner +import CommonTestProperties.SDK +import CommonTestProperties.TriggeringContext +import NexmarkBuilder as Nexmark +import NoPhraseTriggeringPostCommitBuilder +import PhraseTriggeringPostCommitBuilder +import InfluxDBCredentialsHelper + +import static NexmarkDatabaseProperties.nexmarkBigQueryArgs +import static NexmarkDatabaseProperties.nexmarkInfluxDBArgs + +def final JOB_SPECIFIC_OPTIONS = [ + 'influxTags' : '{\\\"runnerVersion\\\":\\\"V2\\\",\\\"javaVersion\\\":\\\"11\\\"}', + 'exportSummaryToBigQuery' : false, + 'region' : 'us-central1', + 'suite' : 'STRESS', + 'numWorkers' : 4, + 'maxNumWorkers' : 4, + 'autoscalingAlgorithm' : 'NONE', + 'nexmarkParallel' : 16, + 'enforceEncodability' : true, + 'enforceImmutability' : true +] + +def final JOB_SPECIFIC_SWITCHES = [ + '-Pnexmark.runner.version="V2"' +] + +// This job runs the suite of Nexmark tests against the Dataflow runner V2. +NoPhraseTriggeringPostCommitBuilder.postCommitJob('beam_PostCommit_Java_Nexmark_Dataflow_V2_Java11', + 'Dataflow Runner V2 Java 11 Nexmark Tests', this) { + description('Runs the Nexmark suite on the Dataflow runner V2 on Java 11.') + + // Set common parameters. 
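+  // Same configuration as the Java 8 Runner V2 suite above; only the Java 11 runtime version passed to Nexmark below differs.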
+ commonJobProperties.setTopLevelMainJobProperties(delegate, 'master', 240) + + Nexmark.nonQueryLanguageJobs(delegate, Runner.DATAFLOW, SDK.JAVA, JOB_SPECIFIC_OPTIONS, TriggeringContext.POST_COMMIT, JOB_SPECIFIC_SWITCHES, Nexmark.JAVA_11_RUNTIME_VERSION) + } + +PhraseTriggeringPostCommitBuilder.postCommitJob('beam_PostCommit_Java_Nexmark_DataflowV2_Java11', + 'Run Dataflow Runner V2 Java 11 Nexmark Tests', 'Dataflow Runner V2 Java 11 Nexmark Tests', this) { + + description('Runs the Nexmark suite on the Dataflow runner V2 on Java 11 against a Pull Request, on demand.') + + commonJobProperties.setTopLevelMainJobProperties(delegate, 'master', 240) + + Nexmark.nonQueryLanguageJobs(delegate, Runner.DATAFLOW, SDK.JAVA, JOB_SPECIFIC_OPTIONS, TriggeringContext.PR, JOB_SPECIFIC_SWITCHES, Nexmark.JAVA_11_RUNTIME_VERSION) + } diff --git a/.test-infra/jenkins/job_PostCommit_Java_Nexmark_Flink.groovy b/.test-infra/jenkins/job_PostCommit_Java_Nexmark_Flink.groovy index 30c73522c6cf..4fa9a4056c67 100644 --- a/.test-infra/jenkins/job_PostCommit_Java_Nexmark_Flink.groovy +++ b/.test-infra/jenkins/job_PostCommit_Java_Nexmark_Flink.groovy @@ -17,6 +17,7 @@ */ import CommonJobProperties as commonJobProperties +import CommonTestProperties import CommonTestProperties.Runner import CommonTestProperties.SDK import CommonTestProperties.TriggeringContext @@ -44,7 +45,7 @@ NoPhraseTriggeringPostCommitBuilder.postCommitJob('beam_PostCommit_Java_Nexmark_ rootBuildScriptDir(commonJobProperties.checkoutDir) tasks(':sdks:java:testing:nexmark:run') commonJobProperties.setGradleSwitches(delegate) - switches('-Pnexmark.runner=":runners:flink:1.10"' + + switches("-Pnexmark.runner=\":runners:flink:${CommonTestProperties.getFlinkVersion()}\"" + ' -Pnexmark.args="' + [ commonJobProperties.mapToArgString(nexmarkBigQueryArgs), @@ -62,7 +63,7 @@ NoPhraseTriggeringPostCommitBuilder.postCommitJob('beam_PostCommit_Java_Nexmark_ rootBuildScriptDir(commonJobProperties.checkoutDir) tasks(':sdks:java:testing:nexmark:run') commonJobProperties.setGradleSwitches(delegate) - switches('-Pnexmark.runner=":runners:flink:1.10"' + + switches("-Pnexmark.runner=\":runners:flink:${CommonTestProperties.getFlinkVersion()}\"" + ' -Pnexmark.args="' + [ commonJobProperties.mapToArgString(nexmarkBigQueryArgs), @@ -80,7 +81,7 @@ NoPhraseTriggeringPostCommitBuilder.postCommitJob('beam_PostCommit_Java_Nexmark_ rootBuildScriptDir(commonJobProperties.checkoutDir) tasks(':sdks:java:testing:nexmark:run') commonJobProperties.setGradleSwitches(delegate) - switches('-Pnexmark.runner=":runners:flink:1.10"' + + switches("-Pnexmark.runner=\":runners:flink:${CommonTestProperties.getFlinkVersion()}\"" + ' -Pnexmark.args="' + [ commonJobProperties.mapToArgString(nexmarkBigQueryArgs), @@ -98,7 +99,7 @@ NoPhraseTriggeringPostCommitBuilder.postCommitJob('beam_PostCommit_Java_Nexmark_ rootBuildScriptDir(commonJobProperties.checkoutDir) tasks(':sdks:java:testing:nexmark:run') commonJobProperties.setGradleSwitches(delegate) - switches('-Pnexmark.runner=":runners:flink:1.10"' + + switches("-Pnexmark.runner=\":runners:flink:${CommonTestProperties.getFlinkVersion()}\"" + ' -Pnexmark.args="' + [ commonJobProperties.mapToArgString(nexmarkBigQueryArgs), @@ -117,7 +118,7 @@ NoPhraseTriggeringPostCommitBuilder.postCommitJob('beam_PostCommit_Java_Nexmark_ rootBuildScriptDir(commonJobProperties.checkoutDir) tasks(':sdks:java:testing:nexmark:run') commonJobProperties.setGradleSwitches(delegate) - switches('-Pnexmark.runner=":runners:flink:1.10"' + + 
switches("-Pnexmark.runner=\":runners:flink:${CommonTestProperties.getFlinkVersion()}\"" + ' -Pnexmark.args="' + [ commonJobProperties.mapToArgString(nexmarkBigQueryArgs), @@ -135,7 +136,7 @@ NoPhraseTriggeringPostCommitBuilder.postCommitJob('beam_PostCommit_Java_Nexmark_ rootBuildScriptDir(commonJobProperties.checkoutDir) tasks(':sdks:java:testing:nexmark:run') commonJobProperties.setGradleSwitches(delegate) - switches('-Pnexmark.runner=":runners:flink:1.10"' + + switches("-Pnexmark.runner=\":runners:flink:${CommonTestProperties.getFlinkVersion()}\"" + ' -Pnexmark.args="' + [ commonJobProperties.mapToArgString(nexmarkBigQueryArgs), diff --git a/.test-infra/jenkins/job_PostCommit_Java_Nexmark_Spark.groovy b/.test-infra/jenkins/job_PostCommit_Java_Nexmark_Spark.groovy index cab81c49a4a7..a41760ee92e0 100644 --- a/.test-infra/jenkins/job_PostCommit_Java_Nexmark_Spark.groovy +++ b/.test-infra/jenkins/job_PostCommit_Java_Nexmark_Spark.groovy @@ -44,7 +44,7 @@ NoPhraseTriggeringPostCommitBuilder.postCommitJob('beam_PostCommit_Java_Nexmark_ rootBuildScriptDir(commonJobProperties.checkoutDir) tasks(':sdks:java:testing:nexmark:run') commonJobProperties.setGradleSwitches(delegate) - switches('-Pnexmark.runner=":runners:spark"' + + switches('-Pnexmark.runner=":runners:spark:2"' + ' -Pnexmark.args="' + [ commonJobProperties.mapToArgString(nexmarkBigQueryArgs), @@ -62,7 +62,7 @@ NoPhraseTriggeringPostCommitBuilder.postCommitJob('beam_PostCommit_Java_Nexmark_ rootBuildScriptDir(commonJobProperties.checkoutDir) tasks(':sdks:java:testing:nexmark:run') commonJobProperties.setGradleSwitches(delegate) - switches('-Pnexmark.runner=":runners:spark"' + + switches('-Pnexmark.runner=":runners:spark:2"' + ' -Pnexmark.args="' + [ commonJobProperties.mapToArgString(nexmarkBigQueryArgs), @@ -81,7 +81,7 @@ NoPhraseTriggeringPostCommitBuilder.postCommitJob('beam_PostCommit_Java_Nexmark_ rootBuildScriptDir(commonJobProperties.checkoutDir) tasks(':sdks:java:testing:nexmark:run') commonJobProperties.setGradleSwitches(delegate) - switches('-Pnexmark.runner=":runners:spark"' + + switches('-Pnexmark.runner=":runners:spark:2"' + ' -Pnexmark.args="' + [ commonJobProperties.mapToArgString(nexmarkBigQueryArgs), @@ -101,7 +101,7 @@ NoPhraseTriggeringPostCommitBuilder.postCommitJob('beam_PostCommit_Java_Nexmark_ rootBuildScriptDir(commonJobProperties.checkoutDir) tasks(':sdks:java:testing:nexmark:run') commonJobProperties.setGradleSwitches(delegate) - switches('-Pnexmark.runner=":runners:spark"' + + switches('-Pnexmark.runner=":runners:spark:2"' + ' -Pnexmark.args="' + [ commonJobProperties.mapToArgString(nexmarkBigQueryArgs), diff --git a/.test-infra/jenkins/job_PostCommit_Java_PortableValidatesRunner_Flink_Batch.groovy b/.test-infra/jenkins/job_PostCommit_Java_PortableValidatesRunner_Flink_Batch.groovy index 8e62f8ed99e6..b987c3e3bf29 100644 --- a/.test-infra/jenkins/job_PostCommit_Java_PortableValidatesRunner_Flink_Batch.groovy +++ b/.test-infra/jenkins/job_PostCommit_Java_PortableValidatesRunner_Flink_Batch.groovy @@ -17,6 +17,7 @@ */ import CommonJobProperties as commonJobProperties +import CommonTestProperties import PostcommitJobBuilder // This job runs the suite of ValidatesRunner tests against the Flink runner. 
@@ -36,7 +37,7 @@ PostcommitJobBuilder.postCommitJob('beam_PostCommit_Java_PVR_Flink_Batch', steps { gradle { rootBuildScriptDir(commonJobProperties.checkoutDir) - tasks(':runners:flink:1.10:job-server:validatesPortableRunnerBatch') + tasks(":runners:flink:${CommonTestProperties.getFlinkVersion()}:job-server:validatesPortableRunnerBatch") commonJobProperties.setGradleSwitches(delegate) } } diff --git a/.test-infra/jenkins/job_PostCommit_Java_PortableValidatesRunner_Flink_Streaming.groovy b/.test-infra/jenkins/job_PostCommit_Java_PortableValidatesRunner_Flink_Streaming.groovy index e3d427e2a095..a35746a0cd3f 100644 --- a/.test-infra/jenkins/job_PostCommit_Java_PortableValidatesRunner_Flink_Streaming.groovy +++ b/.test-infra/jenkins/job_PostCommit_Java_PortableValidatesRunner_Flink_Streaming.groovy @@ -17,6 +17,7 @@ */ import CommonJobProperties as commonJobProperties +import CommonTestProperties import PostcommitJobBuilder // This job runs the suite of ValidatesRunner tests against the Flink runner. @@ -36,13 +37,13 @@ PostcommitJobBuilder.postCommitJob('beam_PostCommit_Java_PVR_Flink_Streaming', steps { gradle { rootBuildScriptDir(commonJobProperties.checkoutDir) - tasks(':runners:flink:1.10:job-server:validatesPortableRunnerStreaming') + tasks(":runners:flink:${CommonTestProperties.getFlinkVersion()}:job-server:validatesPortableRunnerStreaming") commonJobProperties.setGradleSwitches(delegate) } // TODO(BEAM-10940): Enable this test suite once we have support. //gradle { // rootBuildScriptDir(commonJobProperties.checkoutDir) - // tasks(':runners:flink:1.10:job-server:validatesPortableRunnerStreamingCheckpoint') + // tasks(":runners:flink:${CommonTestProperties.getFlinkVersion()}:job-server:validatesPortableRunnerStreamingCheckpoint") // commonJobProperties.setGradleSwitches(delegate) //} } diff --git a/.test-infra/jenkins/job_PostCommit_Python37.groovy b/.test-infra/jenkins/job_PostCommit_Java_PortableValidatesRunner_Samza.groovy similarity index 65% rename from .test-infra/jenkins/job_PostCommit_Python37.groovy rename to .test-infra/jenkins/job_PostCommit_Java_PortableValidatesRunner_Samza.groovy index ed0edbea3c1b..881d4e1ca85f 100644 --- a/.test-infra/jenkins/job_PostCommit_Python37.groovy +++ b/.test-infra/jenkins/job_PostCommit_Java_PortableValidatesRunner_Samza.groovy @@ -19,25 +19,24 @@ import CommonJobProperties as commonJobProperties import PostcommitJobBuilder -// This job defines the Python postcommit tests. -PostcommitJobBuilder.postCommitJob('beam_PostCommit_Python37', 'Run Python 3.7 PostCommit', - 'Python37_PC("Run Python 3.7 PostCommit")', this) { - description('Runs Python postcommit tests using Python 3.7.') - - previousNames('/beam_PostCommit_Python3_Verify/') +// This job runs the suite of Java ValidatesRunner tests against the Samza runner. +PostcommitJobBuilder.postCommitJob('beam_PostCommit_Java_PVR_Samza', + 'Run Java Samza PortableValidatesRunner', 'Java Samza PortableValidatesRunner Tests', this) { + description('Runs the Java PortableValidatesRunner suite on the Samza runner.') // Set common parameters. - commonJobProperties.setTopLevelMainJobProperties(delegate, 'master', 100) + commonJobProperties.setTopLevelMainJobProperties(delegate) + // Publish all test results to Jenkins publishers { - archiveJunit('**/nosetests*.xml') + archiveJunit('**/build/test-results/**/*.xml') } - // Execute shell command to test Python SDK. + // Gradle goals for this job. 
steps { gradle { rootBuildScriptDir(commonJobProperties.checkoutDir) - tasks(':python37PostCommit') + tasks(':runners:samza:job-server:validatesPortableRunner') commonJobProperties.setGradleSwitches(delegate) } } diff --git a/.test-infra/jenkins/job_PostCommit_Java_PortableValidatesRunner_Spark_Streaming.groovy b/.test-infra/jenkins/job_PostCommit_Java_PortableValidatesRunner_Spark2_Streaming.groovy similarity index 85% rename from .test-infra/jenkins/job_PostCommit_Java_PortableValidatesRunner_Spark_Streaming.groovy rename to .test-infra/jenkins/job_PostCommit_Java_PortableValidatesRunner_Spark2_Streaming.groovy index bfc50635511d..14e1ff3d6078 100644 --- a/.test-infra/jenkins/job_PostCommit_Java_PortableValidatesRunner_Spark_Streaming.groovy +++ b/.test-infra/jenkins/job_PostCommit_Java_PortableValidatesRunner_Spark2_Streaming.groovy @@ -20,9 +20,9 @@ import CommonJobProperties as commonJobProperties import PostcommitJobBuilder // This job runs the suite of Java ValidatesRunner tests against the Spark runner in streaming mode. -PostcommitJobBuilder.postCommitJob('beam_PostCommit_Java_PVR_Spark_Streaming', - 'Run Java Spark PortableValidatesRunner Streaming', 'Java Spark PortableValidatesRunner Streaming Tests', this) { - description('Runs the Java PortableValidatesRunner suite on the Spark runner in streaming mode.') +PostcommitJobBuilder.postCommitJob('beam_PostCommit_Java_PVR_Spark2_Streaming', + 'Run Java Spark v2 PortableValidatesRunner Streaming', 'Java Spark v2 PortableValidatesRunner Streaming Tests', this) { + description('Runs the Java PortableValidatesRunner suite on the Spark v2 runner in streaming mode.') // Set common parameters. commonJobProperties.setTopLevelMainJobProperties(delegate) @@ -36,7 +36,7 @@ PostcommitJobBuilder.postCommitJob('beam_PostCommit_Java_PVR_Spark_Streaming', steps { gradle { rootBuildScriptDir(commonJobProperties.checkoutDir) - tasks(':runners:spark:job-server:validatesPortableRunnerStreaming') + tasks(':runners:spark:2:job-server:validatesPortableRunnerStreaming') commonJobProperties.setGradleSwitches(delegate) } } diff --git a/.test-infra/jenkins/job_PostCommit_Java_PortableValidatesRunner_Spark3_Streaming.groovy b/.test-infra/jenkins/job_PostCommit_Java_PortableValidatesRunner_Spark3_Streaming.groovy new file mode 100644 index 000000000000..2164ab554b3b --- /dev/null +++ b/.test-infra/jenkins/job_PostCommit_Java_PortableValidatesRunner_Spark3_Streaming.groovy @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import CommonJobProperties as commonJobProperties +import PostcommitJobBuilder + +// This job runs the suite of Java ValidatesRunner tests against the Spark runner in streaming mode. 
+PostcommitJobBuilder.postCommitJob('beam_PostCommit_Java_PVR_Spark3_Streaming', + 'Run Java Spark v3 PortableValidatesRunner Streaming', 'Java Spark v3 PortableValidatesRunner Streaming Tests', this) { + description('Runs the Java PortableValidatesRunner suite on the Spark v3 runner in streaming mode.') + + // Set common parameters. + commonJobProperties.setTopLevelMainJobProperties(delegate) + + // Publish all test results to Jenkins + publishers { + archiveJunit('**/build/test-results/**/*.xml') + } + + // Gradle goals for this job. + steps { + gradle { + rootBuildScriptDir(commonJobProperties.checkoutDir) + tasks(':runners:spark:3:job-server:validatesPortableRunnerStreaming') + commonJobProperties.setGradleSwitches(delegate) + } + } + } diff --git a/.test-infra/jenkins/job_PostCommit_Java_PortableValidatesRunner_Spark_Batch.groovy b/.test-infra/jenkins/job_PostCommit_Java_PortableValidatesRunner_Spark_Batch.groovy index 2297402859a6..33914ac921ee 100644 --- a/.test-infra/jenkins/job_PostCommit_Java_PortableValidatesRunner_Spark_Batch.groovy +++ b/.test-infra/jenkins/job_PostCommit_Java_PortableValidatesRunner_Spark_Batch.groovy @@ -36,7 +36,8 @@ PostcommitJobBuilder.postCommitJob('beam_PostCommit_Java_PVR_Spark_Batch', steps { gradle { rootBuildScriptDir(commonJobProperties.checkoutDir) - tasks(':runners:spark:job-server:validatesPortableRunnerBatch') + tasks(':runners:spark:2:job-server:validatesPortableRunnerBatch') + tasks(':runners:spark:3:job-server:validatesPortableRunnerBatch') commonJobProperties.setGradleSwitches(delegate) } } diff --git a/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow.groovy b/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow.groovy index 0e3e628bf679..1ac456207b00 100644 --- a/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow.groovy +++ b/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow.groovy @@ -27,7 +27,7 @@ PostcommitJobBuilder.postCommitJob('beam_PostCommit_Java_ValidatesRunner_Dataflo description('Runs the ValidatesRunner suite on the Dataflow runner.') - commonJobProperties.setTopLevelMainJobProperties(delegate, 'master', 270) + commonJobProperties.setTopLevelMainJobProperties(delegate, 'master', 420) previousNames(/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/) // Publish all test results to Jenkins diff --git a/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow_Java11.groovy b/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow_Java11.groovy index af9e25ebd175..6ba96859a807 100644 --- a/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow_Java11.groovy +++ b/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow_Java11.groovy @@ -28,7 +28,7 @@ PostcommitJobBuilder.postCommitJob('beam_PostCommit_Java_ValidatesRunner_Dataflo def JAVA_11_HOME = '/usr/lib/jvm/java-11-openjdk-amd64' def JAVA_8_HOME = '/usr/lib/jvm/java-8-openjdk-amd64' - commonJobProperties.setTopLevelMainJobProperties(delegate, 'master', 270) + commonJobProperties.setTopLevelMainJobProperties(delegate, 'master', 420) publishers { archiveJunit('**/build/test-results/**/*.xml') } diff --git a/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow_streaming.groovy b/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow_Streaming.groovy similarity index 99% rename from .test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow_streaming.groovy rename to .test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow_Streaming.groovy index 
8bea6be2b4ee..01ebd946b0bb 100644 --- a/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow_streaming.groovy +++ b/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow_Streaming.groovy @@ -27,7 +27,7 @@ PostcommitJobBuilder.postCommitJob('beam_PostCommit_Java_ValidatesRunner_Dataflo description('Runs the ValidatesRunner suite on the Dataflow runner forcing streaming mode.') - commonJobProperties.setTopLevelMainJobProperties(delegate, 'master', 270) + commonJobProperties.setTopLevelMainJobProperties(delegate, 'master', 420) // Publish all test results to Jenkins publishers { diff --git a/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow_V2_Streaming.groovy b/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow_V2_Streaming.groovy index 0de085d3f937..3b3d8ed9f828 100644 --- a/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow_V2_Streaming.groovy +++ b/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow_V2_Streaming.groovy @@ -27,7 +27,7 @@ PostcommitJobBuilder.postCommitJob('beam_PostCommit_Java_VR_Dataflow_V2_Streamin description('Runs Java ValidatesRunner suite on the Dataflow runner V2 forcing streaming mode.') - commonJobProperties.setTopLevelMainJobProperties(delegate, 'master', 330) + commonJobProperties.setTopLevelMainJobProperties(delegate, 'master', 450) // Publish all test results to Jenkins publishers { diff --git a/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Flink.groovy b/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Flink.groovy index 37cc9ec5d876..6f24f9fe6a47 100644 --- a/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Flink.groovy +++ b/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Flink.groovy @@ -17,6 +17,7 @@ */ import CommonJobProperties as commonJobProperties +import CommonTestProperties import PostcommitJobBuilder // This job runs the suite of ValidatesRunner tests against the Flink runner. 
@@ -37,7 +38,7 @@ PostcommitJobBuilder.postCommitJob('beam_PostCommit_Java_ValidatesRunner_Flink', steps { gradle { rootBuildScriptDir(commonJobProperties.checkoutDir) - tasks(':runners:flink:1.10:validatesRunner') + tasks(":runners:flink:${CommonTestProperties.getFlinkVersion()}:validatesRunner") commonJobProperties.setGradleSwitches(delegate) } } diff --git a/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Flink_Java11.groovy b/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Flink_Java11.groovy index 02e70a9046f1..9178caebd813 100644 --- a/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Flink_Java11.groovy +++ b/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Flink_Java11.groovy @@ -17,6 +17,7 @@ */ import CommonJobProperties as commonJobProperties +import CommonTestProperties import PostcommitJobBuilder @@ -36,14 +37,14 @@ PostcommitJobBuilder.postCommitJob('beam_PostCommit_Java_ValidatesRunner_Flink_J steps { gradle { rootBuildScriptDir(commonJobProperties.checkoutDir) - tasks(':runners:flink:1.10:jar') - tasks(':runners:flink:1.10:testJar') + tasks(":runners:flink:${CommonTestProperties.getFlinkVersion()}:jar") + tasks(":runners:flink:${CommonTestProperties.getFlinkVersion()}:testJar") switches("-Dorg.gradle.java.home=${JAVA_8_HOME}") } gradle { rootBuildScriptDir(commonJobProperties.checkoutDir) - tasks(':runners:flinK:1.10:validatesRunner') + tasks(":runners:flink:${CommonTestProperties.getFlinkVersion()}:validatesRunner") switches('-x shadowJar') switches('-x shadowTestJar') switches('-x compileJava') diff --git a/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Spark.groovy b/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Spark.groovy index 21bd71a9aba4..2509e6884d7d 100644 --- a/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Spark.groovy +++ b/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Spark.groovy @@ -37,7 +37,8 @@ PostcommitJobBuilder.postCommitJob('beam_PostCommit_Java_ValidatesRunner_Spark', steps { gradle { rootBuildScriptDir(commonJobProperties.checkoutDir) - tasks(':runners:spark:validatesRunner') + tasks(':runners:spark:2:validatesRunner') + tasks(':runners:spark:3:validatesRunner') commonJobProperties.setGradleSwitches(delegate) } } diff --git a/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming.groovy b/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming.groovy index d20b5484bfa2..7253140faa51 100644 --- a/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming.groovy +++ b/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming.groovy @@ -36,7 +36,8 @@ PostcommitJobBuilder.postCommitJob('beam_PostCommit_Java_ValidatesRunner_SparkSt steps { gradle { rootBuildScriptDir(commonJobProperties.checkoutDir) - tasks(':runners:spark:validatesStructuredStreamingRunnerBatch') + tasks(':runners:spark:2:validatesStructuredStreamingRunnerBatch') + tasks(':runners:spark:3:validatesStructuredStreamingRunnerBatch') commonJobProperties.setGradleSwitches(delegate) } } diff --git a/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Spark_Java11.groovy b/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Spark_Java11.groovy new file mode 100644 index 000000000000..d0da52927b8a --- /dev/null +++ b/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Spark_Java11.groovy @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import CommonJobProperties as commonJobProperties +import PostcommitJobBuilder + + +PostcommitJobBuilder.postCommitJob('beam_PostCommit_Java_ValidatesRunner_Spark_Java11', + 'Run Spark ValidatesRunner Java 11', 'Apache Spark Runner ValidatesRunner Tests On Java 11', this) { + + description('Runs the ValidatesRunner suite on the Spark runner with Java 11.') + + def JAVA_11_HOME = '/usr/lib/jvm/java-11-openjdk-amd64' + def JAVA_8_HOME = '/usr/lib/jvm/java-8-openjdk-amd64' + + commonJobProperties.setTopLevelMainJobProperties(delegate, 'master', 270) + publishers { + archiveJunit('**/build/test-results/**/*.xml') + } + + steps { + gradle { + rootBuildScriptDir(commonJobProperties.checkoutDir) + tasks(':runners:spark:3:jar') + tasks(':runners:spark:3:testJar') + switches("-Dorg.gradle.java.home=${JAVA_8_HOME}") + } + + gradle { + rootBuildScriptDir(commonJobProperties.checkoutDir) + tasks(':runners:spark:3:validatesRunner') + switches('-x shadowJar') + switches('-x shadowTestJar') + switches('-x compileJava') + switches('-x compileTestJava') + switches('-x jar') + switches('-x testJar') + switches('-x classes') + switches('-x testClasses') + switches("-Dorg.gradle.java.home=${JAVA_11_HOME}") + + commonJobProperties.setGradleSwitches(delegate, 3 * Runtime.runtime.availableProcessors()) + } + } + } diff --git a/.test-infra/jenkins/job_PostCommit_Python.groovy b/.test-infra/jenkins/job_PostCommit_Python.groovy new file mode 100644 index 000000000000..2f3b1d8c1e43 --- /dev/null +++ b/.test-infra/jenkins/job_PostCommit_Python.groovy @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import CommonJobProperties as commonJobProperties +import PostcommitJobBuilder + +import static PythonTestProperties.ALL_SUPPORTED_VERSIONS + +// This job defines the Python postcommit tests. 
+ALL_SUPPORTED_VERSIONS.each { pythonVersion -> + def versionSuffix = pythonVersion.replace('.', '') + PostcommitJobBuilder.postCommitJob("beam_PostCommit_Python${versionSuffix}", + "Run Python ${pythonVersion} PostCommit", + "Python${versionSuffix}_PC(\"Run Python ${pythonVersion} PostCommit\")", this) { + description("Runs Python postcommit tests using Python ${pythonVersion}.") + + // Set common parameters. + commonJobProperties.setTopLevelMainJobProperties(delegate, 'master', 120) + + publishers { + archiveJunit('**/pytest*.xml') + } + + // Execute shell command to test Python SDK. + steps { + gradle { + rootBuildScriptDir(commonJobProperties.checkoutDir) + tasks(":python${versionSuffix}PostCommit") + commonJobProperties.setGradleSwitches(delegate) + } + } + } +} + diff --git a/.test-infra/jenkins/job_PostCommit_Python_Chicago_Taxi_Example_Flink.groovy b/.test-infra/jenkins/job_PostCommit_Python_Chicago_Taxi_Example_Flink.groovy index 34d81f2b2ec2..54f8cd949141 100644 --- a/.test-infra/jenkins/job_PostCommit_Python_Chicago_Taxi_Example_Flink.groovy +++ b/.test-infra/jenkins/job_PostCommit_Python_Chicago_Taxi_Example_Flink.groovy @@ -38,7 +38,7 @@ def chicagoTaxiJob = { scope -> "${DOCKER_CONTAINER_REGISTRY}/${beamSdkDockerImage}" ], numberOfWorkers, - "${DOCKER_CONTAINER_REGISTRY}/beam_flink1.10_job_server:latest") + "${DOCKER_CONTAINER_REGISTRY}/beam_flink1.12_job_server:latest") def pipelineOptions = [ parallelism : numberOfWorkers, diff --git a/.test-infra/jenkins/job_PostCommit_Python_ValidatesContainer_Dataflow.groovy b/.test-infra/jenkins/job_PostCommit_Python_ValidatesContainer_Dataflow.groovy index 208324e25fe2..bc43ecb366fc 100644 --- a/.test-infra/jenkins/job_PostCommit_Python_ValidatesContainer_Dataflow.groovy +++ b/.test-infra/jenkins/job_PostCommit_Python_ValidatesContainer_Dataflow.groovy @@ -31,15 +31,15 @@ PostcommitJobBuilder.postCommitJob('beam_PostCommit_Py_ValCont', commonJobProperties.setTopLevelMainJobProperties(delegate) publishers { - archiveJunit('**/nosetests*.xml') + archiveJunit('**/pytest*.xml') } // Execute shell command to test Python SDK. - // TODO: Parallel the script run with Jenkins DSL or Gradle. steps { - VALIDATES_CONTAINER_DATAFLOW_PYTHON_VERSIONS.each { - pythonVersion = it.replace('.', '') - shell('cd ' + commonJobProperties.checkoutDir + " && bash sdks/python/container/run_validatescontainer.sh python${pythonVersion}") + gradle { + rootBuildScriptDir(commonJobProperties.checkoutDir) + tasks(':sdks:python:test-suites:dataflow:validatesContainerTests') + commonJobProperties.setGradleSwitches(delegate) } } } diff --git a/.test-infra/jenkins/job_PostCommit_Python_ValidatesRunner_Dataflow.groovy b/.test-infra/jenkins/job_PostCommit_Python_ValidatesRunner_Dataflow.groovy index 7849562b8ae8..81ff8276d297 100644 --- a/.test-infra/jenkins/job_PostCommit_Python_ValidatesRunner_Dataflow.groovy +++ b/.test-infra/jenkins/job_PostCommit_Python_ValidatesRunner_Dataflow.groovy @@ -26,22 +26,18 @@ PostcommitJobBuilder.postCommitJob('beam_PostCommit_Py_VR_Dataflow', 'Run Python description('Runs Python ValidatesRunner suite on the Dataflow runner.') // Set common parameters. - commonJobProperties.setTopLevelMainJobProperties(delegate) + commonJobProperties.setTopLevelMainJobProperties(delegate, 'master', 200) publishers { - archiveJunit('**/nosetests*.xml') + archiveJunit('**/pytest*.xml') } // Execute gradle task to test Python SDK. 
steps { gradle { rootBuildScriptDir(commonJobProperties.checkoutDir) - tasks(':sdks:python:test-suites:dataflow:py36:validatesRunnerBatchTests') - tasks(':sdks:python:test-suites:dataflow:py37:validatesRunnerBatchTests') - tasks(':sdks:python:test-suites:dataflow:py38:validatesRunnerBatchTests') - tasks(':sdks:python:test-suites:dataflow:py36:validatesRunnerStreamingTests') - tasks(':sdks:python:test-suites:dataflow:py37:validatesRunnerStreamingTests') - tasks(':sdks:python:test-suites:dataflow:py38:validatesRunnerStreamingTests') + tasks(':sdks:python:test-suites:dataflow:validatesRunnerBatchTests') + tasks(':sdks:python:test-suites:dataflow:validatesRunnerStreamingTests') commonJobProperties.setGradleSwitches(delegate) } } diff --git a/.test-infra/jenkins/job_PostCommit_Python_ValidatesRunner_Dataflow_V2.groovy b/.test-infra/jenkins/job_PostCommit_Python_ValidatesRunner_Dataflow_V2.groovy index ca88fb8484ac..060cf33aabc6 100644 --- a/.test-infra/jenkins/job_PostCommit_Python_ValidatesRunner_Dataflow_V2.groovy +++ b/.test-infra/jenkins/job_PostCommit_Python_ValidatesRunner_Dataflow_V2.groovy @@ -26,23 +26,18 @@ PostcommitJobBuilder.postCommitJob('beam_PostCommit_Py_VR_Dataflow_V2', 'Run Pyt description('Runs Python ValidatesRunner suite on the Dataflow runner v2.') // Set common parameters. - commonJobProperties.setTopLevelMainJobProperties(delegate) + commonJobProperties.setTopLevelMainJobProperties(delegate, 'master', 200) publishers { - archiveJunit('**/nosetests*.xml') + archiveJunit('**/pytest*.xml') } // Execute gradle task to test Python SDK. steps { gradle { rootBuildScriptDir(commonJobProperties.checkoutDir) - // TODO: Enable following tests after making sure we have enough capacity. - // tasks(':sdks:python:test-suites:dataflow:py36:validatesRunnerBatchTests') - // tasks(':sdks:python:test-suites:dataflow:py37:validatesRunnerBatchTests') - tasks(':sdks:python:test-suites:dataflow:py38:validatesRunnerBatchTests') - // tasks(':sdks:python:test-suites:dataflow:py36:validatesRunnerBatchTests') - // tasks(':sdks:python:test-suites:dataflow:py37:validatesRunnerStreamingTests') - tasks(':sdks:python:test-suites:dataflow:py38:validatesRunnerStreamingTests') + tasks(':sdks:python:test-suites:dataflow:validatesRunnerBatchTestsV2') + tasks(':sdks:python:test-suites:dataflow:validatesRunnerStreamingTestsV2') switches('-PuseRunnerV2') commonJobProperties.setGradleSwitches(delegate) } diff --git a/.test-infra/jenkins/job_PostCommit_Python38.groovy b/.test-infra/jenkins/job_PostCommit_Python_ValidatesRunner_Samza.groovy similarity index 65% rename from .test-infra/jenkins/job_PostCommit_Python38.groovy rename to .test-infra/jenkins/job_PostCommit_Python_ValidatesRunner_Samza.groovy index 85c0ee8b9f78..c49e3677ab8a 100644 --- a/.test-infra/jenkins/job_PostCommit_Python38.groovy +++ b/.test-infra/jenkins/job_PostCommit_Python_ValidatesRunner_Samza.groovy @@ -19,25 +19,24 @@ import CommonJobProperties as commonJobProperties import PostcommitJobBuilder -// This job defines the Python postcommit tests. -PostcommitJobBuilder.postCommitJob('beam_PostCommit_Python38', 'Run Python 3.8 PostCommit', - 'Python38_PC("Run Python 3.8 PostCommit")', this) { - description('Runs Python postcommit tests using Python 3.8.') - - previousNames('/beam_PostCommit_Python3_Verify/') +// This job runs the suite of Python ValidatesRunner tests against the Samza runner. 
+PostcommitJobBuilder.postCommitJob('beam_PostCommit_Python_VR_Samza', + 'Run Python Samza ValidatesRunner', 'Python Samza ValidatesRunner Tests', this) { + description('Runs the Python ValidatesRunner suite on the Samza runner.') // Set common parameters. - commonJobProperties.setTopLevelMainJobProperties(delegate, 'master', 100) + commonJobProperties.setTopLevelMainJobProperties(delegate) + // Publish all test results to Jenkins. publishers { - archiveJunit('**/nosetests*.xml') + archiveJunit('**/pytest*.xml') } - // Execute shell command to test Python SDK. + // Gradle goals for this job. steps { gradle { rootBuildScriptDir(commonJobProperties.checkoutDir) - tasks(':python38PostCommit') + tasks(':sdks:python:test-suites:portable:samzaValidatesRunner') commonJobProperties.setGradleSwitches(delegate) } } diff --git a/.test-infra/jenkins/job_PostRelease_NightlySnapshot.groovy b/.test-infra/jenkins/job_PostRelease_NightlySnapshot.groovy index f911c6de3f9a..d31c3544106d 100644 --- a/.test-infra/jenkins/job_PostRelease_NightlySnapshot.groovy +++ b/.test-infra/jenkins/job_PostRelease_NightlySnapshot.groovy @@ -23,8 +23,8 @@ import CommonJobProperties as commonJobProperties job('beam_PostRelease_NightlySnapshot') { description('Runs post release verification of the nightly snapshot.') - // Set common parameters. - commonJobProperties.setTopLevelMainJobProperties(delegate) + // Set common parameters. Timeout is longer, to avoid [BEAM-5774]. + commonJobProperties.setTopLevelMainJobProperties(delegate, 'master', 200) parameters { stringParam('snapshot_version', diff --git a/.test-infra/jenkins/job_PreCommit_Go_Portable.groovy b/.test-infra/jenkins/job_PreCommit_Go_Portable.groovy new file mode 100644 index 000000000000..3e0e198d2a8d --- /dev/null +++ b/.test-infra/jenkins/job_PreCommit_Go_Portable.groovy @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +import PrecommitJobBuilder + +PrecommitJobBuilder builder = new PrecommitJobBuilder( + scope: this, + nameBase: 'GoPortable', + gradleTask: ':goPortablePreCommit', + triggerPathPatterns: [ + '^model/.*$', + '^sdks/go/.*$', + '^release/.*$', + ] + ) +builder.build() diff --git a/.test-infra/jenkins/job_PreCommit_SQL_Java11.groovy b/.test-infra/jenkins/job_PreCommit_SQL_Java11.groovy index 1dabf4e669ac..73e46a1155b4 100644 --- a/.test-infra/jenkins/job_PreCommit_SQL_Java11.groovy +++ b/.test-infra/jenkins/job_PreCommit_SQL_Java11.groovy @@ -32,8 +32,7 @@ PrecommitJobBuilder builder = new PrecommitJobBuilder( ], // spotless checked in job_PreCommit_Spotless triggerPathPatterns: [ '^sdks/java/extensions/sql.*$', - ], - timeoutMins: 30 + ] ) builder.build { publishers { diff --git a/.test-infra/jenkins/job_Publish_Docker_Snapshots.groovy b/.test-infra/jenkins/job_Publish_Docker_Snapshots.groovy index 916b6347c402..256ee5a7000c 100644 --- a/.test-infra/jenkins/job_Publish_Docker_Snapshots.groovy +++ b/.test-infra/jenkins/job_Publish_Docker_Snapshots.groovy @@ -17,6 +17,7 @@ */ import CommonJobProperties as commonJobProperties +import CommonTestProperties import static PythonTestProperties.SUPPORTED_CONTAINER_TASKS job('beam_Publish_Docker_Snapshots') { @@ -44,7 +45,7 @@ job('beam_Publish_Docker_Snapshots') { tasks(":sdks:python:container:${taskVer}:dockerPush") } tasks(':sdks:go:container:dockerPush') - tasks(':runners:flink:1.10:job-server-container:dockerPush') + tasks(":runners:flink:${CommonTestProperties.getFlinkVersion()}:job-server-container:dockerPush") switches("-Pdocker-repository-root=gcr.io/apache-beam-testing/beam_portability") switches("-Pdocker-tag=latest") } diff --git a/.test-infra/jenkins/job_Publish_SDK_Image_Snapshots.groovy b/.test-infra/jenkins/job_Publish_SDK_Image_Snapshots.groovy index 0b72c4c69938..cc35e7f6ffd2 100644 --- a/.test-infra/jenkins/job_Publish_SDK_Image_Snapshots.groovy +++ b/.test-infra/jenkins/job_Publish_SDK_Image_Snapshots.groovy @@ -41,15 +41,15 @@ job('beam_Publish_Beam_SDK_Snapshots') { gradle { rootBuildScriptDir(commonJobProperties.checkoutDir) commonJobProperties.setGradleSwitches(delegate) - tasks(':sdks:go:container:dockerPush') + tasks(':sdks:go:container:dockerTagPush') SUPPORTED_JAVA_CONTAINER_TASKS.each { taskVer -> - tasks(":sdks:java:container:${taskVer}:dockerPush") + tasks(":sdks:java:container:${taskVer}:dockerTagPush") } SUPPORTED_PYTHON_CONTAINER_TASKS.each { taskVer -> - tasks(":sdks:python:container:${taskVer}:dockerPush") + tasks(":sdks:python:container:${taskVer}:dockerTagPush") } switches("-Pdocker-repository-root=${imageRepo}") - switches("-Pdocker-tag=${imageTag}") + switches("-Pdocker-tag-list=${imageTag},latest") } } } diff --git a/.test-infra/jenkins/metrics_report/requirements.txt b/.test-infra/jenkins/metrics_report/requirements.txt index 5d868a89e5ec..8a9efe73b325 100644 --- a/.test-infra/jenkins/metrics_report/requirements.txt +++ b/.test-infra/jenkins/metrics_report/requirements.txt @@ -16,5 +16,5 @@ # under the License. 
influxdb==5.3.0 -Jinja2==2.11.2 +Jinja2==2.11.3 prettytable==0.7.2 diff --git a/.test-infra/kubernetes/influxdb/influxdb.yml b/.test-infra/kubernetes/influxdb/influxdb.yml index a062b3999b46..16bf1b34c27d 100644 --- a/.test-infra/kubernetes/influxdb/influxdb.yml +++ b/.test-infra/kubernetes/influxdb/influxdb.yml @@ -18,7 +18,7 @@ kind: Secret metadata: name: influxdb-creds data: - INFLUXDB_USER: c3VwZXJzYWRtaW4= + INFLUXDB_USER: c3VwZXJhZG1pbg== INFLUXDB_USER_PASSWORD: c3VwZXJzZWNyZXRwYXNzd29yZA== INFLUXDB_GRAPHITE_ENABLED: dHJ1ZQ== @@ -42,7 +42,7 @@ spec: spec: containers: - name: influxdb - image: influxdb + image: influxdb:1.8 env: - name: INFLUXDB_USER valueFrom: diff --git a/.test-infra/kubernetes/kafka-cluster/03-zookeeper/30service.yml b/.test-infra/kubernetes/kafka-cluster/03-zookeeper/30service.yml index 4fd6eca3ea9c..2b0445e57437 100644 --- a/.test-infra/kubernetes/kafka-cluster/03-zookeeper/30service.yml +++ b/.test-infra/kubernetes/kafka-cluster/03-zookeeper/30service.yml @@ -23,4 +23,3 @@ spec: name: client selector: app: zookeeper - type: LoadBalancer diff --git a/.test-infra/kubernetes/kafka-cluster/05-kafka/configmap-config.yaml b/.test-infra/kubernetes/kafka-cluster/05-kafka/configmap-config.yaml index d9acf9f7d0ad..bac8292c8344 100644 --- a/.test-infra/kubernetes/kafka-cluster/05-kafka/configmap-config.yaml +++ b/.test-infra/kubernetes/kafka-cluster/05-kafka/configmap-config.yaml @@ -33,9 +33,9 @@ data: sleep 20 done echo "Applying runtime configuration using confluentinc/cp-kafka:5.0.1" - kafka-topics --zookeeper zookeeper:2181 --create --if-not-exists --force --topic beam --partitions 1 --replication-factor 1 + kafka-topics --zookeeper zookeeper:2181 --create --if-not-exists --force --topic beam --partitions 1 --replication-factor 3 kafka-configs --zookeeper zookeeper:2181 --entity-type topics --entity-name beam --describe - kafka-topics --zookeeper zookeeper:2181 --create --if-not-exists --force --topic beam-runnerv2 --partitions 1 --replication-factor 1 + kafka-topics --zookeeper zookeeper:2181 --create --if-not-exists --force --topic beam-runnerv2 --partitions 1 --replication-factor 3 kafka-configs --zookeeper zookeeper:2181 --entity-type topics --entity-name beam-runnerv2 --describe - kafka-topics --zookeeper zookeeper:2181 --create --if-not-exists --force --topic beam-sdf --partitions 1 --replication-factor 1 + kafka-topics --zookeeper zookeeper:2181 --create --if-not-exists --force --topic beam-sdf --partitions 1 --replication-factor 3 kafka-configs --zookeeper zookeeper:2181 --entity-type topics --entity-name beam-sdf --describe diff --git a/.test-infra/metrics/docker-compose.yml b/.test-infra/metrics/docker-compose.yml index d47c7d7b47e6..dfa4a5024b0a 100644 --- a/.test-infra/metrics/docker-compose.yml +++ b/.test-infra/metrics/docker-compose.yml @@ -64,6 +64,7 @@ services: - GF_SECURITY_ADMIN_PASSWORD= - GF_AUTH_ANONYMOUS_ENABLED=true - GF_AUTH_ANONYMOUS_ORG_NAME=Beam + - GF_INSTALL_PLUGINS=marcusolsson-json-datasource - PSQL_DB_HOST=beampostgresql - PSQL_DB_PORT=5432 - PSQL_DB_DBNAME=beam_metrics diff --git a/.test-infra/metrics/grafana/Dockerfile b/.test-infra/metrics/grafana/Dockerfile index ff5b66d13d3c..5eadd2a77be5 100644 --- a/.test-infra/metrics/grafana/Dockerfile +++ b/.test-infra/metrics/grafana/Dockerfile @@ -16,7 +16,9 @@ # limitations under the License. 
################################################################################ -FROM grafana/grafana:6.7.3 +FROM grafana/grafana:8.1.2 + +RUN grafana-cli plugins install marcusolsson-json-datasource COPY ./provisioning /etc/beamgrafana/provisioning COPY ./dashboards /etc/beamgrafana/dashboards diff --git a/.test-infra/metrics/grafana/dashboards/github_actions_post-commit_tests.json b/.test-infra/metrics/grafana/dashboards/github_actions_post-commit_tests.json new file mode 100644 index 000000000000..d5be40216768 --- /dev/null +++ b/.test-infra/metrics/grafana/dashboards/github_actions_post-commit_tests.json @@ -0,0 +1,558 @@ +{ + "annotations": { + "list": [ + { + "builtIn": 1, + "datasource": "-- Grafana --", + "enable": true, + "hide": true, + "iconColor": "rgba(0, 211, 255, 1)", + "name": "Annotations & Alerts", + "type": "dashboard" + } + ] + }, + "editable": true, + "gnetId": null, + "graphTooltip": 0, + "id": 2, + "links": [], + "panels": [ + { + "datasource": "Python Tests", + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + } + }, + "mappings": [] + }, + "overrides": [ + { + "matcher": { + "id": "byFrameRefID", + "options": "A" + }, + "properties": [ + { + "id": "color", + "value": { + "fixedColor": "green", + "mode": "fixed" + } + } + ] + }, + { + "matcher": { + "id": "byFrameRefID", + "options": "B" + }, + "properties": [ + { + "id": "color", + "value": { + "fixedColor": "dark-red", + "mode": "fixed" + } + } + ] + }, + { + "matcher": { + "id": "byFrameRefID", + "options": "C" + }, + "properties": [ + { + "id": "color", + "value": { + "fixedColor": "super-light-yellow", + "mode": "fixed" + } + } + ] + } + ] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 0 + }, + "id": 6, + "options": { + "displayLabels": [ + "value" + ], + "legend": { + "displayMode": "list", + "placement": "right", + "values": [] + }, + "pieType": "pie", + "reduceOptions": { + "calcs": [ + "count" + ], + "fields": "/.*/", + "values": false + }, + "tooltip": { + "mode": "single" + } + }, + "pluginVersion": "8.0.4", + "targets": [ + { + "body": "", + "cacheDurationSeconds": 300, + "fields": [ + { + "jsonPath": "$.workflow_runs[?(@.conclusion == \"success\")]", + "name": "success" + } + ], + "hide": false, + "method": "GET", + "queryParams": "", + "refId": "A", + "urlPath": "" + }, + { + "cacheDurationSeconds": 300, + "fields": [ + { + "jsonPath": "$.workflow_runs[?(@.conclusion == \"failure\")]", + "name": "failure" + } + ], + "hide": false, + "method": "GET", + "queryParams": "", + "refId": "B", + "urlPath": "" + }, + { + "cacheDurationSeconds": 300, + "fields": [ + { + "jsonPath": "$.workflow_runs[?(@.conclusion == \"cancelled\")]", + "name": "cancelled" + } + ], + "hide": false, + "method": "GET", + "queryParams": "", + "refId": "C", + "urlPath": "" + } + ], + "title": "Python last 100 post-commit tests results", + "transformations": [], + "type": "piechart" + }, + { + "datasource": "Python Tests", + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "align": "center", + "displayMode": "color-background" + }, + "mappings": [ + { + "options": { + "cancelled": { + "color": "super-light-yellow", + "index": 2 + }, + "failure": { + "color": "dark-red", + "index": 1 + }, + "success": { + "color": "green", + "index": 0 + } + }, + "type": "value" + } + ], + "noValue": "Fetching status", + "thresholds": { + "mode": "absolute", + "steps": [ + { + 
"color": "green", + "value": null + }, + { + "color": "red", + "value": 80 + } + ] + } + }, + "overrides": [ + { + "matcher": { + "id": "byName", + "options": "Workflow URL" + }, + "properties": [ + { + "id": "custom.displayMode", + "value": "auto" + }, + { + "id": "custom.width", + "value": 415 + } + ] + }, + { + "matcher": { + "id": "byName", + "options": "Created At" + }, + "properties": [ + { + "id": "custom.displayMode", + "value": "auto" + } + ] + } + ] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 12, + "y": 0 + }, + "id": 8, + "options": { + "showHeader": true, + "sortBy": [] + }, + "pluginVersion": "8.0.4", + "targets": [ + { + "body": "", + "cacheDurationSeconds": 300, + "fields": [ + { + "jsonPath": "$.workflow_runs[*].id", + "name": "Workflow ID" + }, + { + "jsonPath": "$.workflow_runs[*].conclusion", + "name": "Test Result" + }, + { + "jsonPath": "$.workflow_runs[*].html_url", + "name": "Workflow URL" + }, + { + "jsonPath": "$.workflow_runs[*].created_at", + "name": "Created At" + } + ], + "method": "GET", + "queryParams": "", + "refId": "A", + "urlPath": "" + } + ], + "title": "Python Workflow Results", + "type": "table" + }, + { + "datasource": "Java Tests", + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + } + }, + "mappings": [] + }, + "overrides": [ + { + "matcher": { + "id": "byFrameRefID", + "options": "A" + }, + "properties": [ + { + "id": "color", + "value": { + "fixedColor": "green", + "mode": "fixed" + } + } + ] + }, + { + "matcher": { + "id": "byFrameRefID", + "options": "B" + }, + "properties": [ + { + "id": "color", + "value": { + "fixedColor": "dark-red", + "mode": "fixed" + } + } + ] + }, + { + "matcher": { + "id": "byFrameRefID", + "options": "C" + }, + "properties": [ + { + "id": "color", + "value": { + "fixedColor": "super-light-yellow", + "mode": "fixed" + } + } + ] + } + ] + }, + "gridPos": { + "h": 9, + "w": 12, + "x": 0, + "y": 8 + }, + "id": 2, + "options": { + "displayLabels": [ + "value" + ], + "legend": { + "displayMode": "list", + "placement": "right", + "values": [] + }, + "pieType": "pie", + "reduceOptions": { + "calcs": [ + "count" + ], + "fields": "/.*/", + "values": false + }, + "tooltip": { + "mode": "single" + } + }, + "pluginVersion": "8.0.4", + "targets": [ + { + "cacheDurationSeconds": 300, + "fields": [ + { + "jsonPath": "$.workflow_runs[?(@.conclusion == \"success\")]", + "name": "success" + } + ], + "hide": false, + "method": "GET", + "queryParams": "", + "refId": "A", + "urlPath": "" + }, + { + "cacheDurationSeconds": 300, + "fields": [ + { + "jsonPath": "$.workflow_runs[?(@.conclusion == \"failure\")]", + "name": "failure" + } + ], + "hide": false, + "method": "GET", + "queryParams": "", + "refId": "B", + "urlPath": "" + }, + { + "cacheDurationSeconds": 300, + "fields": [ + { + "jsonPath": "$.workflow_runs[?(@.conclusion == \"cancelled\")]", + "name": "cancelled" + } + ], + "hide": false, + "method": "GET", + "queryParams": "", + "refId": "C", + "urlPath": "" + } + ], + "title": "Java last 100 post-commit tests results", + "type": "piechart" + }, + { + "datasource": "Java Tests", + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "align": "center", + "displayMode": "color-background-solid" + }, + "mappings": [ + { + "options": { + "cancelled": { + "color": "super-light-yellow", + "index": 2 + }, + "failure": { + "color": "dark-red", + "index": 1 + }, + "success": { + 
"color": "green", + "index": 0 + } + }, + "type": "value" + } + ], + "noValue": "Fetching status", + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + }, + { + "color": "red", + "value": 80 + } + ] + } + }, + "overrides": [ + { + "matcher": { + "id": "byName", + "options": "Workflow URL" + }, + "properties": [ + { + "id": "custom.displayMode", + "value": "auto" + }, + { + "id": "custom.width", + "value": 410 + } + ] + }, + { + "matcher": { + "id": "byName", + "options": "Created At" + }, + "properties": [ + { + "id": "custom.displayMode", + "value": "auto" + } + ] + } + ] + }, + "gridPos": { + "h": 9, + "w": 12, + "x": 12, + "y": 8 + }, + "id": 4, + "options": { + "showHeader": true, + "sortBy": [] + }, + "pluginVersion": "8.0.4", + "targets": [ + { + "cacheDurationSeconds": 300, + "fields": [ + { + "jsonPath": "$.workflow_runs[*].id", + "name": "Workflow ID" + }, + { + "jsonPath": "$.workflow_runs[*].conclusion", + "name": "Test Result" + }, + { + "jsonPath": "$.workflow_runs[*].html_url", + "name": "Workflow URL" + }, + { + "jsonPath": "$.workflow_runs[*].created_at", + "name": "Created At" + } + ], + "method": "GET", + "queryParams": "", + "refId": "A", + "urlPath": "" + } + ], + "title": "Java Workflow Results", + "type": "table" + } + ], + "refresh": "", + "schemaVersion": 30, + "style": "dark", + "tags": [], + "templating": { + "list": [] + }, + "time": { + "from": "now-24h", + "to": "now" + }, + "timepicker": {}, + "timezone": "", + "title": "GitHub Actions Post Commit Tests", + "uid": "dYwQFp7nk", + "version": 103 +} + diff --git a/.test-infra/metrics/grafana/dashboards/perftests_metrics/Nexmark.json b/.test-infra/metrics/grafana/dashboards/perftests_metrics/Nexmark.json index c34dcae7cb50..fef3547134d2 100644 --- a/.test-infra/metrics/grafana/dashboards/perftests_metrics/Nexmark.json +++ b/.test-infra/metrics/grafana/dashboards/perftests_metrics/Nexmark.json @@ -111,7 +111,7 @@ ], "orderByTime": "ASC", "policy": "default", - "query": "SELECT \"runtimeMs\" FROM \"forever\".\"nexmark_${ID}_${processingType}\" WHERE \"runner\" =~ /^$runner$/ AND $timeFilter", + "query": "SELECT \"runtimeMs\" FROM \"forever\".\"nexmark_${ID}_${processingType}\" WHERE \"runner\" =~ /^$runner$/ AND \"runnerVersion\" = '' AND $timeFilter", "rawQuery": true, "refId": "A", "resultFormat": "time_series", @@ -229,7 +229,7 @@ ], "orderByTime": "ASC", "policy": "default", - "query": "SELECT \"runtimeMs\" FROM \"forever\".\"nexmark_${ID}_sql_${processingType}\" WHERE \"runner\" =~ /^$runner$/ AND $timeFilter", + "query": "SELECT \"runtimeMs\" FROM \"forever\".\"nexmark_${ID}_sql_${processingType}\" WHERE \"runner\" =~ /^$runner$/ AND \"runnerVersion\" = '' AND $timeFilter", "rawQuery": true, "refId": "A", "resultFormat": "time_series", @@ -343,7 +343,7 @@ "measurement": "", "orderByTime": "ASC", "policy": "default", - "query": "SELECT \"runtimeMs\" FROM \"forever\"./nexmark_${ID}_\\w*${processingType}/ WHERE \"runner\" =~ /^$runner$/ AND $timeFilter GROUP BY \"runner\"", + "query": "SELECT \"runtimeMs\" FROM \"forever\"./nexmark_${ID}_\\w*${processingType}/ WHERE \"runner\" =~ /^$runner$/ AND \"runnerVersion\" = '' AND $timeFilter GROUP BY \"runner\"", "rawQuery": true, "refId": "A", "resultFormat": "time_series", @@ -557,9 +557,15 @@ "selected": false, "text": "14", "value": "14" + }, + { + "$$hashKey": "object:400", + "selected": false, + "text": "15", + "value": "15" } ], - "query": "0,1,2,3,4,5,6,7,8,9,10,11,12,13,14", + "query": 
"0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15", "queryValue": "", "skipUrlSync": false, "type": "custom" diff --git a/.test-infra/metrics/grafana/dashboards/perftests_metrics/Nexmark_Dataflow_RunnerV2.json b/.test-infra/metrics/grafana/dashboards/perftests_metrics/Nexmark_Dataflow_RunnerV2.json new file mode 100644 index 000000000000..0a42f6676b43 --- /dev/null +++ b/.test-infra/metrics/grafana/dashboards/perftests_metrics/Nexmark_Dataflow_RunnerV2.json @@ -0,0 +1,622 @@ +{ + "annotations": { + "list": [ + { + "$$hashKey": "object:2584", + "builtIn": 1, + "datasource": "-- Grafana --", + "enable": true, + "hide": true, + "iconColor": "rgba(0, 211, 255, 1)", + "name": "Annotations & Alerts", + "type": "dashboard" + } + ] + }, + "editable": true, + "gnetId": null, + "graphTooltip": 0, + "id": 1, + "iteration": 1620086777943, + "links": [], + "panels": [ + { + "collapsed": false, + "datasource": null, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 0 + }, + "id": 1, + "panels": [], + "title": "Latest Run | Standard", + "type": "row" + }, + { + "cacheTimeout": null, + "datasource": "BeamInfluxDB", + "gridPos": { + "h": 5, + "w": 2, + "x": 0, + "y": 1 + }, + "id": 2, + "links": [], + "maxPerRow": 12, + "options": { + "displayMode": "basic", + "fieldOptions": { + "calcs": [ + "lastNotNull" + ], + "defaults": { + "mappings": [ + { + "$$hashKey": "object:1164", + "id": 0, + "op": "=", + "text": "N/A", + "type": 1, + "value": "null" + } + ], + "nullValueMode": "connected", + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "ms" + }, + "overrides": [], + "values": false + }, + "orientation": "vertical", + "showUnfilled": true + }, + "pluginVersion": "6.7.3", + "repeat": "ID", + "repeatDirection": "h", + "scopedVars": { + "ID": { + "$$hashKey": "object:385", + "selected": false, + "text": "0", + "value": "0" + } + }, + "targets": [ + { + "alias": "", + "groupBy": [ + { + "params": [ + "$__interval" + ], + "type": "time" + }, + { + "params": [ + "null" + ], + "type": "fill" + } + ], + "orderByTime": "ASC", + "policy": "default", + "query": "SELECT \"runtimeMs\" FROM \"forever\".\"nexmark_${ID}_${processingType}\" WHERE \"runner\" = 'DataflowRunner' AND \"runnerVersion\" = 'V2' AND $timeFilter", + "rawQuery": true, + "refId": "A", + "resultFormat": "time_series", + "select": [ + [ + { + "params": [ + "value" + ], + "type": "field" + }, + { + "params": [], + "type": "mean" + } + ] + ], + "tags": [] + } + ], + "timeFrom": null, + "timeShift": null, + "title": "Query ${ID}", + "transparent": true, + "type": "bargauge" + }, + { + "collapsed": false, + "datasource": null, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 11 + }, + "id": 3, + "panels": [], + "title": "Latest Run | SQL", + "type": "row" + }, + { + "cacheTimeout": null, + "datasource": "BeamInfluxDB", + "gridPos": { + "h": 5, + "w": 2, + "x": 0, + "y": 12 + }, + "id": 4, + "links": [], + "maxPerRow": 12, + "options": { + "displayMode": "basic", + "fieldOptions": { + "calcs": [ + "lastNotNull" + ], + "defaults": { + "mappings": [ + { + "$$hashKey": "object:1164", + "id": 0, + "op": "=", + "text": "N/A", + "type": 1, + "value": "null" + } + ], + "nullValueMode": "connected", + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "ms" + }, + "overrides": [], + "values": false + }, + "orientation": "vertical", + "showUnfilled": true + }, + "pluginVersion": "6.7.3", + "repeat": "ID", + "repeatDirection": "h", + 
"scopedVars": { + "ID": { + "$$hashKey": "object:385", + "selected": false, + "text": "0", + "value": "0" + } + }, + "targets": [ + { + "alias": "", + "groupBy": [ + { + "params": [ + "$__interval" + ], + "type": "time" + }, + { + "params": [ + "null" + ], + "type": "fill" + } + ], + "orderByTime": "ASC", + "policy": "default", + "query": "SELECT \"runtimeMs\" FROM \"forever\".\"nexmark_${ID}_sql_${processingType}\" WHERE \"runner\" = 'DataflowRunner' AND \"runnerVersion\" = 'V2' AND $timeFilter", + "rawQuery": true, + "refId": "A", + "resultFormat": "time_series", + "select": [ + [ + { + "params": [ + "value" + ], + "type": "field" + }, + { + "params": [], + "type": "mean" + } + ] + ], + "tags": [] + } + ], + "timeFrom": null, + "timeShift": null, + "title": "Query ${ID}", + "transparent": true, + "type": "bargauge" + }, + { + "collapsed": false, + "datasource": null, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 22 + }, + "id": 5, + "panels": [], + "title": "All results", + "type": "row" + }, + { + "aliasColors": { + "DataflowRunner": "#7eb26d", + "DirectRunner": "#eab839", + "FlinkRunner": "#6ed0e0", + "SparkRunner": "#ef843c", + "SparkStructuredStreamingRunner": "#e24d42" + }, + "bars": false, + "dashLength": 10, + "dashes": false, + "datasource": "BeamInfluxDB", + "fieldConfig": { + "defaults": { + "custom": {} + }, + "overrides": [] + }, + "fill": 1, + "fillGradient": 0, + "gridPos": { + "h": 12, + "w": 12, + "x": 0, + "y": 23 + }, + "hiddenSeries": false, + "id": 6, + "interval": "1d", + "legend": { + "alignAsTable": false, + "avg": false, + "current": false, + "hideEmpty": false, + "max": false, + "min": false, + "rightSide": false, + "show": true, + "total": false, + "values": false + }, + "lines": true, + "linewidth": 2, + "maxPerRow": 2, + "nullPointMode": "connected", + "options": { + "dataLinks": [] + }, + "percentage": false, + "pointradius": 2, + "points": true, + "renderer": "flot", + "repeat": "ID", + "repeatDirection": "h", + "scopedVars": { + "ID": { + "$$hashKey": "object:385", + "selected": false, + "text": "0", + "value": "0" + } + }, + "seriesOverrides": [], + "spaceLength": 10, + "stack": false, + "steppedLine": false, + "targets": [ + { + "alias": "[[m]]_java8", + "groupBy": [], + "measurement": "", + "orderByTime": "ASC", + "policy": "default", + "query": "SELECT \"runtimeMs\" FROM \"forever\"./nexmark_${ID}_\\w*${processingType}/ WHERE \"runner\" = 'DataflowRunner' AND \"runnerVersion\" = 'V2' AND \"javaVersion\" = '8' AND $timeFilter GROUP BY \"runner\"", + "rawQuery": true, + "refId": "A", + "resultFormat": "time_series", + "select": [ + [ + { + "params": [ + "value" + ], + "type": "field" + } + ] + ], + "tags": [] + }, + { + "alias": "[[m]]_java11", + "groupBy": [], + "measurement": "", + "orderByTime": "ASC", + "policy": "default", + "query": "SELECT \"runtimeMs\" FROM \"forever\"./nexmark_${ID}_\\w*${processingType}/ WHERE \"runner\" = 'DataflowRunner' AND \"runnerVersion\" = 'V2' AND \"javaVersion\" = '11' AND $timeFilter GROUP BY \"runner\"", + "rawQuery": true, + "refId": "B", + "resultFormat": "time_series", + "select": [ + [ + { + "params": [ + "value" + ], + "type": "field" + } + ] + ], + "tags": [] + } + ], + "thresholds": [], + "timeFrom": null, + "timeRegions": [], + "timeShift": null, + "title": "Query ${ID}", + "tooltip": { + "shared": true, + "sort": 0, + "value_type": "individual" + }, + "transparent": true, + "type": "graph", + "xaxis": { + "buckets": null, + "mode": "time", + "name": null, + "show": true, + "values": [] + }, + "yaxes": 
[ + { + "$$hashKey": "object:1048", + "decimals": null, + "format": "ms", + "label": "", + "logBase": 2, + "max": null, + "min": null, + "show": true + }, + { + "$$hashKey": "object:1049", + "format": "short", + "label": null, + "logBase": 1, + "max": null, + "min": null, + "show": true + } + ], + "yaxis": { + "align": false, + "alignLevel": null + } + } + ], + "refresh": false, + "schemaVersion": 22, + "style": "dark", + "tags": [ + "performance tests", + "nexmark" + ], + "templating": { + "list": [ + { + "allValue": null, + "current": { + "$$hashKey": "object:5347", + "selected": false, + "text": "batch", + "value": "batch" + }, + "hide": 0, + "includeAll": false, + "label": "Data processing type", + "multi": false, + "name": "processingType", + "options": [ + { + "$$hashKey": "object:5347", + "selected": true, + "text": "batch", + "value": "batch" + }, + { + "$$hashKey": "object:5348", + "selected": false, + "text": "streaming", + "value": "streaming" + } + ], + "query": "batch,streaming", + "queryValue": "", + "skipUrlSync": false, + "type": "custom" + }, + { + "allValue": null, + "current": { + "tags": [], + "text": "All", + "value": [ + "$__all" + ] + }, + "hide": 0, + "includeAll": true, + "label": "Query", + "multi": true, + "name": "ID", + "options": [ + { + "$$hashKey": "object:384", + "selected": true, + "text": "All", + "value": "$__all" + }, + { + "$$hashKey": "object:385", + "selected": false, + "text": "0", + "value": "0" + }, + { + "$$hashKey": "object:386", + "selected": false, + "text": "1", + "value": "1" + }, + { + "$$hashKey": "object:387", + "selected": false, + "text": "2", + "value": "2" + }, + { + "$$hashKey": "object:388", + "selected": false, + "text": "3", + "value": "3" + }, + { + "$$hashKey": "object:389", + "selected": false, + "text": "4", + "value": "4" + }, + { + "$$hashKey": "object:390", + "selected": false, + "text": "5", + "value": "5" + }, + { + "$$hashKey": "object:391", + "selected": false, + "text": "6", + "value": "6" + }, + { + "$$hashKey": "object:392", + "selected": false, + "text": "7", + "value": "7" + }, + { + "$$hashKey": "object:393", + "selected": false, + "text": "8", + "value": "8" + }, + { + "$$hashKey": "object:394", + "selected": false, + "text": "9", + "value": "9" + }, + { + "$$hashKey": "object:395", + "selected": false, + "text": "10", + "value": "10" + }, + { + "$$hashKey": "object:396", + "selected": false, + "text": "11", + "value": "11" + }, + { + "$$hashKey": "object:397", + "selected": false, + "text": "12", + "value": "12" + }, + { + "$$hashKey": "object:398", + "selected": false, + "text": "13", + "value": "13" + }, + { + "$$hashKey": "object:399", + "selected": false, + "text": "14", + "value": "14" + }, + { + "$$hashKey": "object:400", + "selected": false, + "text": "15", + "value": "15" + } + ], + "query": "0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15", + "queryValue": "", + "skipUrlSync": false, + "type": "custom" + } + ] + }, + "time": { + "from": "now-30d", + "to": "now" + }, + "timepicker": { + "hidden": false, + "refresh_intervals": [ + "10s", + "30s", + "1m", + "5m", + "15m", + "30m", + "1h", + "2h", + "1d" + ] + }, + "timezone": "", + "title": "Nexmark Dataflow Runner V2", + "uid": "8INnSY9Mz", + "variables": { + "list": [] + }, + "version": 1 +} diff --git a/.test-infra/metrics/grafana/dashboards/perftests_metrics/Python_FnApiRunner_ubenchmarks.json b/.test-infra/metrics/grafana/dashboards/perftests_metrics/Python_FnApiRunner_ubenchmarks.json new file mode 100644 index 000000000000..238ff7c53999 --- /dev/null +++ 
b/.test-infra/metrics/grafana/dashboards/perftests_metrics/Python_FnApiRunner_ubenchmarks.json @@ -0,0 +1,224 @@ +{ + "annotations": { + "list": [ + { + "builtIn": 1, + "datasource": "-- Grafana --", + "enable": true, + "hide": true, + "iconColor": "rgba(0, 211, 255, 1)", + "name": "Annotations & Alerts", + "type": "dashboard" + } + ] + }, + "editable": true, + "gnetId": null, + "graphTooltip": 0, + "id": 16, + "links": [], + "panels": [ + { + "aliasColors": {}, + "bars": false, + "dashLength": 10, + "dashes": false, + "datasource": "BeamInfluxDB", + "fill": 1, + "fillGradient": 0, + "gridPos": { + "h": 9, + "w": 12, + "x": 0, + "y": 0 + }, + "hiddenSeries": false, + "id": 2, + "interval": "6h", + "legend": { + "avg": false, + "current": false, + "max": false, + "min": false, + "show": true, + "total": false, + "values": false + }, + "lines": true, + "linewidth": 2, + "nullPointMode": "connected", + "options": { + "dataLinks": [] + }, + "percentage": false, + "pointradius": 2, + "points": true, + "renderer": "flot", + "repeat": "microbenchmarkMetric", + "repeatDirection": "h", + "scopedVars": { + "microbenchmarkTest": { + "$$hashKey": "object:5347", + "selected": false, + "text": "runtime_sec", + "value": "runtime_sec" + } + }, + "seriesOverrides": [], + "spaceLength": 10, + "stack": false, + "steppedLine": false, + "targets": [ + { + "groupBy": [ + { + "params": [ + "$__interval" + ], + "type": "time" + }, + { + "params": [ + "null" + ], + "type": "fill" + } + ], + "orderByTime": "ASC", + "policy": "default", + "query": "SELECT mean(\"value\") FROM \"python_direct_microbenchmarks\" WHERE (\"metric\" = 'fn_api_runner_microbenchmark_${microbenchmarkMetric}' OR \"metric\" = 'teststream_microbenchmark_${microbenchmarkMetric}') AND $timeFilter GROUP BY time($__interval), \"metric\"", + "rawQuery": true, + "refId": "A", + "resultFormat": "time_series", + "select": [ + [ + { + "params": [ + "value" + ], + "type": "field" + }, + { + "params": [], + "type": "mean" + } + ] + ], + "tags": [] + } + ], + "thresholds": [], + "timeFrom": null, + "timeRegions": [], + "timeShift": null, + "title": "Python DirectRunner Microbenchmarks", + "tooltip": { + "shared": true, + "sort": 0, + "value_type": "individual" + }, + "transparent": true, + "type": "graph", + "xaxis": { + "buckets": null, + "mode": "time", + "name": null, + "show": true, + "values": [] + }, + "yaxes": [ + { + "format": "s", + "label": null, + "logBase": 1, + "max": null, + "min": null, + "show": true + }, + { + "format": "short", + "label": null, + "logBase": 1, + "max": null, + "min": null, + "show": true + } + ], + "yaxis": { + "align": false, + "alignLevel": null + } + } + ], + "schemaVersion": 22, + "style": "dark", + "tags": ["performance tests"], + "time": { + "from": "now-30d", + "to": "now" + }, + "timepicker": { + "refresh_intervals": [ + "5s", + "10s", + "30s", + "1m", + "5m", + "15m", + "30m", + "1h", + "2h", + "1d" + ] + }, + "templating": { + "enable": true, + "list": [ + { + "allValue": null, + "current": { + "$$hashKey": "object:5347", + "selected": false, + "text": "runtime_sec", + "value": "runtime_sec" + }, + "hide": 0, + "includeAll": false, + "label": "Microbenchmark Metric", + "multi": false, + "name": "microbenchmarkMetric", + "options": [ + { + "$$hashKey": "object:5347", + "selected": true, + "text": "runtime_sec", + "value": "runtime_sec" + }, + { + "$$hashKey": "object:5348", + "selected": true, + "text": "per_element_cost_ms", + "value": "per_element_cost_ms" + }, + { + "$$hashKey": "object:5349", + "selected": 
true, + "text": "fixed_cost_ms", + "value": "fixed_cost_ms" + } + ], + "query": "runtime_sec,per_element_cost_ms,fixed_cost_ms", + "queryValue": "", + "skipUrlSync": false, + "type": "custom" + } + ] + }, + "timezone": "", + "title": "Python DirectRunner Microbenchmarks", + "uid": "1cnwVDkGk", + "variables": { + "list": [] + }, + "version": 2 +} diff --git a/.test-infra/metrics/grafana/dashboards/perftests_metrics/Python_IO_IT_Tests_Dataflow.json b/.test-infra/metrics/grafana/dashboards/perftests_metrics/Python_IO_IT_Tests_Dataflow.json index c7bf2e390862..570dc82e3d4b 100644 --- a/.test-infra/metrics/grafana/dashboards/perftests_metrics/Python_IO_IT_Tests_Dataflow.json +++ b/.test-infra/metrics/grafana/dashboards/perftests_metrics/Python_IO_IT_Tests_Dataflow.json @@ -505,6 +505,250 @@ "align": false, "alignLevel": null } + }, + { + "aliasColors": {}, + "bars": false, + "cacheTimeout": null, + "dashLength": 10, + "dashes": false, + "datasource": "BeamInfluxDB", + "fill": 1, + "fillGradient": 0, + "gridPos": { + "h": 9, + "w": 12, + "x": 0, + "y": 18 + }, + "hiddenSeries": false, + "id": 6, + "interval": "24h", + "legend": { + "avg": false, + "current": false, + "max": false, + "min": false, + "show": false, + "total": false, + "values": false + }, + "lines": true, + "linewidth": 2, + "links": [], + "nullPointMode": "connected", + "options": { + "dataLinks": [] + }, + "percentage": false, + "pluginVersion": "6.7.2", + "pointradius": 2, + "points": true, + "renderer": "flot", + "seriesOverrides": [], + "spaceLength": 10, + "stack": false, + "steppedLine": false, + "targets": [ + { + "alias": "read_time", + "groupBy": [ + { + "params": [ + "$__interval" + ], + "type": "time" + } + ], + "measurement": "python_spannerio_read", + "orderByTime": "ASC", + "policy": "default", + "query": "SELECT mean(\"value\") FROM \"python_spannerio_read_2GB_results\" WHERE \"metric\" = 'pyspanner_read_2GB_results_runtime' AND $timeFilter GROUP BY time($__interval), \"metric\"", + "rawQuery": true, + "refId": "A", + "resultFormat": "time_series", + "select": [ + [ + { + "params": [ + "value" + ], + "type": "field" + }, + { + "params": [], + "type": "mean" + } + ] + ], + "tags": [] + } + ], + "thresholds": [], + "timeFrom": null, + "timeRegions": [], + "timeShift": null, + "title": "Reading 2GB of data | Spanner native Dataflow IO", + "tooltip": { + "shared": true, + "sort": 0, + "value_type": "individual" + }, + "transparent": true, + "type": "graph", + "xaxis": { + "buckets": null, + "mode": "time", + "name": null, + "show": true, + "values": [] + }, + "yaxes": [ + { + "$$hashKey": "object:403", + "format": "s", + "label": null, + "logBase": 1, + "max": null, + "min": null, + "show": true + }, + { + "$$hashKey": "object:404", + "format": "short", + "label": null, + "logBase": 1, + "max": null, + "min": null, + "show": true + } + ], + "yaxis": { + "align": false, + "alignLevel": null + } + }, + { + "aliasColors": {}, + "bars": false, + "cacheTimeout": null, + "dashLength": 10, + "dashes": false, + "datasource": "BeamInfluxDB", + "fill": 1, + "fillGradient": 0, + "gridPos": { + "h": 9, + "w": 12, + "x": 12, + "y": 18 + }, + "hiddenSeries": false, + "id": 7, + "interval": "24h", + "legend": { + "avg": false, + "current": false, + "max": false, + "min": false, + "show": false, + "total": false, + "values": false + }, + "lines": true, + "linewidth": 2, + "links": [], + "nullPointMode": "connected", + "options": { + "dataLinks": [] + }, + "percentage": false, + "pluginVersion": "6.7.2", + "pointradius": 2, + "points": 
true, + "renderer": "flot", + "seriesOverrides": [], + "spaceLength": 10, + "stack": false, + "steppedLine": false, + "targets": [ + { + "alias": "write_time", + "groupBy": [ + { + "params": [ + "$__interval" + ], + "type": "time" + } + ], + "measurement": "python_spannerio_write", + "orderByTime": "ASC", + "policy": "default", + "query": "SELECT mean(\"value\") FROM \"python_spanner_write_2GB_results\" WHERE \"metric\" = 'pyspanner_write_2GB_results_runtime' AND $timeFilter GROUP BY time($__interval), \"metric\"", + "rawQuery": true, + "refId": "A", + "resultFormat": "time_series", + "select": [ + [ + { + "params": [ + "value" + ], + "type": "field" + }, + { + "params": [], + "type": "mean" + } + ] + ], + "tags": [] + } + ], + "thresholds": [], + "timeFrom": null, + "timeRegions": [], + "timeShift": null, + "title": "Writing 2GB of data | Spanner native Dataflow IO", + "tooltip": { + "shared": true, + "sort": 0, + "value_type": "individual" + }, + "transparent": true, + "type": "graph", + "xaxis": { + "buckets": null, + "mode": "time", + "name": null, + "show": true, + "values": [] + }, + "yaxes": [ + { + "$$hashKey": "object:403", + "format": "s", + "label": null, + "logBase": 1, + "max": null, + "min": null, + "show": true + }, + { + "$$hashKey": "object:404", + "format": "short", + "label": null, + "logBase": 1, + "max": null, + "min": null, + "show": true + } + ], + "yaxis": { + "align": false, + "alignLevel": null + } } ], "schemaVersion": 22, diff --git a/.test-infra/metrics/grafana/dashboards/perftests_metrics/SideInput_Load_Tests.json b/.test-infra/metrics/grafana/dashboards/perftests_metrics/SideInput_Load_Tests.json index df346fb9bc7a..62616951c808 100644 --- a/.test-infra/metrics/grafana/dashboards/perftests_metrics/SideInput_Load_Tests.json +++ b/.test-infra/metrics/grafana/dashboards/perftests_metrics/SideInput_Load_Tests.json @@ -21,7 +21,7 @@ "links": [], "panels": [ { - "content": "The following options are common to all tests:\n* key size: 100B\n* value size: 900B\n* number of workers: 10\n* size of the window (if fixed windows are used): 1 second\n\nAdditional common options for Dataflow:\n* experiments: use_runner_v2\n* autoscaling_algorithm: NONE\n\n\n[Jenkins job definition (Python, Dataflow)](https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_LoadTests_SideInput_Python.groovy)\n\n", + "content": "The following options should be used by default:\n* key size: 100B\n* value size: 900B\n* number of workers: 10\n* size of the window (if fixed windows are used): 1 second\n\n[Jenkins job definition (Python, Dataflow)](https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_LoadTests_SideInput_Python.groovy) [Jenkins job definition (Go, Flink)](https://github.com/apache/beam/tree/master/.test-infra/jenkins/job_LoadTests_SideInput_Flink_Go.groovy) [Jenkins job definition (Go, Dataflow)](https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_LoadTests_SideInput_Go.groovy)\n\nUntil the issue [BEAM-11427](https://issues.apache.org/jira/browse/BEAM-11427) in Go SDK is resolved, sideinput iteration test have 400MB, instead of 10GB.", "datasource": null, "gridPos": { "h": 8, diff --git a/.test-infra/metrics/grafana/dashboards/post-commit_tests.json b/.test-infra/metrics/grafana/dashboards/post-commit_tests.json index 2536543b9804..291430679d87 100644 --- a/.test-infra/metrics/grafana/dashboards/post-commit_tests.json +++ b/.test-infra/metrics/grafana/dashboards/post-commit_tests.json @@ -10,6 +10,12 @@ "limit": 100, "name": "Annotations & 
Alerts", "showIn": 0, + "target": { + "limit": 100, + "matchAny": false, + "tags": [], + "type": "dashboard" + }, "type": "dashboard" } ] @@ -17,11 +23,10 @@ "editable": true, "gnetId": null, "graphTooltip": 0, - "id": 1, "links": [], "panels": [ { - "content": "This dashboard tracks Post-commit test reliability over-time.\n\n* [Post-commit test policies](https://beam.apache.org/contribute/postcommits-policies/)\n* [Existing test failure issues](https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC)\n* [File a new test failure issue](https://s.apache.org/beam-test-failure)", + "datasource": null, "gridPos": { "h": 4, "w": 24, @@ -30,7 +35,11 @@ }, "id": 11, "links": [], - "mode": "markdown", + "options": { + "content": "This dashboard tracks Post-commit test reliability over-time.\n\n* [Post-commit test policies](https://beam.apache.org/contribute/postcommits-policies/)\n* [Existing test failure issues](https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC)\n* [File a new test failure issue](https://s.apache.org/beam-test-failure)", + "mode": "markdown" + }, + "pluginVersion": "8.1.2", "title": "Dashboard guidelines", "type": "text" }, @@ -68,14 +77,62 @@ "noDataState": "keep_state", "notifications": [] }, - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, "datasource": "BeamPSQL", - "decimals": 0, "description": "Percent reliability of all post-commit job runs for a given week.\n\nUnreliability of a test suite impact developer productivity by forcing contributors to re-run tests. 
When tests are consistently unreliable, developers will simply ignore them.\n\nWe aim for >= 70% reliability per test suite.", - "fill": 0, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisLabel": "% successful runs", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "never", + "spanNulls": true, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "line+area" + } + }, + "decimals": 1, + "mappings": [], + "max": 1, + "min": 0, + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "red", + "value": null + }, + { + "color": "transparent", + "value": 0.7 + } + ] + }, + "unit": "percentunit" + }, + "overrides": [] + }, "gridPos": { "h": 7, "w": 24, @@ -83,31 +140,20 @@ "y": 4 }, "id": 6, - "legend": { - "alignAsTable": true, - "avg": false, - "current": true, - "hideEmpty": false, - "hideZero": true, - "max": false, - "min": false, - "rightSide": true, - "show": true, - "total": false, - "values": true - }, - "lines": true, - "linewidth": 1, "links": [], - "nullPointMode": "null", - "percentage": false, - "pointradius": 2, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "stack": false, - "steppedLine": false, + "options": { + "legend": { + "calcs": [ + "lastNotNull" + ], + "displayMode": "table", + "placement": "right" + }, + "tooltip": { + "mode": "single" + } + }, + "pluginVersion": "8.1.2", "targets": [ { "alias": "", @@ -115,7 +161,7 @@ "group": [], "metricColumn": "none", "rawQuery": true, - "rawSql": "SELECT\n DATE_TRUNC('week', build_timestamp) as time,\n avg(\n case \n when build_result = 'SUCCESS' then 1\n else 0\n end) as value,\n substring(job_name from 'beam_PostCommit_#\"%#\"' for '#') as job_name\nFROM\n jenkins_builds\nWHERE\n (build_timestamp BETWEEN $__timeFrom() AND $__timeTo())\n AND (job_name LIKE 'beam_PostCommit_%')\n AND NOT (job_name like '%_PR')\nGROUP BY\n time, job_name\norder BY\n job_name, time\n", + "rawSql": "SELECT\n DATE_TRUNC('week', build_timestamp) as time,\n avg(\n case \n when build_result = 'SUCCESS' then 1\n else 0\n end) as value,\n substring(job_name from 'beam_#\"%#\"' for '#') as job_name\nFROM\n jenkins_builds\nWHERE\n (build_timestamp BETWEEN $__timeFrom() AND $__timeTo())\n AND (((job_name LIKE 'beam_PostCommit_%')\n AND NOT (job_name like '%_PR')) OR job_name like '%_Cron')\nGROUP BY\n time, job_name\norder BY\n time, job_name\n", "refId": "A", "select": [ [ @@ -137,55 +183,10 @@ ] } ], - "thresholds": [ - { - "colorMode": "critical", - "fill": true, - "line": true, - "op": "lt", - "value": 0.7 - } - ], "timeFrom": null, - "timeRegions": [], "timeShift": null, "title": "Post-commit reliability per week", - "tooltip": { - "shared": true, - "sort": 1, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "buckets": null, - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "decimals": 1, - "format": "percentunit", - "label": "% successful runs", - "logBase": 1, - "max": "1", - "min": "0", - "show": true - }, - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": false - } - ], - "yaxis": { - "align": false, - "alignLevel": null - } + "type": 
"timeseries" }, { "alert": { @@ -221,14 +222,62 @@ "noDataState": "no_data", "notifications": [] }, - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, "datasource": "BeamPSQL", - "decimals": 0, "description": "Percent reliability of all post-commit job runs per-day.\n\nUnreliability of a test suite impact developer productivity by forcing contributors to re-run tests. When tests are consistently unreliable, developers will simply ignore them.\n\nWe aim for >= 70% reliability per test suite.", - "fill": 0, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisLabel": "% successful runs", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "never", + "spanNulls": true, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "line+area" + } + }, + "decimals": 1, + "mappings": [], + "max": 1, + "min": 0, + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "red", + "value": null + }, + { + "color": "transparent", + "value": 0.7 + } + ] + }, + "unit": "percentunit" + }, + "overrides": [] + }, "gridPos": { "h": 12, "w": 15, @@ -236,31 +285,20 @@ "y": 11 }, "id": 9, - "legend": { - "alignAsTable": true, - "avg": false, - "current": true, - "hideZero": true, - "max": false, - "min": false, - "rightSide": true, - "show": false, - "sideWidth": null, - "total": false, - "values": true - }, - "lines": true, - "linewidth": 1, "links": [], - "nullPointMode": "null", - "percentage": false, - "pointradius": 2, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "stack": false, - "steppedLine": false, + "options": { + "legend": { + "calcs": [ + "lastNotNull" + ], + "displayMode": "hidden", + "placement": "right" + }, + "tooltip": { + "mode": "single" + } + }, + "pluginVersion": "8.1.2", "targets": [ { "alias": "", @@ -268,7 +306,7 @@ "group": [], "metricColumn": "none", "rawQuery": true, - "rawSql": "SELECT\n DATE_TRUNC('day', build_timestamp) as time,\n avg(\n case \n when build_result = 'SUCCESS' then 1\n else 0\n end) as value,\n substring(job_name from 'beam_PostCommit_#\"%#\"' for '#') as job_name\nFROM\n jenkins_builds\nWHERE\n (build_timestamp BETWEEN $__timeFrom() AND $__timeTo())\n AND (job_name LIKE 'beam_PostCommit_%')\n AND NOT (job_name like '%_PR')\nGROUP BY\n time, job_name\norder BY\n job_name, time\n", + "rawSql": "SELECT\n DATE_TRUNC('day', build_timestamp) as time,\n avg(\n case \n when build_result = 'SUCCESS' then 1\n else 0\n end) as value,\n substring(job_name from 'beam_PostCommit_#\"%#\"' for '#') as job_name\nFROM\n jenkins_builds\nWHERE\n (build_timestamp BETWEEN $__timeFrom() AND $__timeTo())\n AND (job_name LIKE 'beam_PostCommit_%')\n AND NOT (job_name like '%_PR')\nGROUP BY\n time, job_name\norder BY\n time, job_name\n", "refId": "A", "select": [ [ @@ -290,61 +328,197 @@ ] } ], - "thresholds": [ - { - "colorMode": "critical", - "fill": true, - "line": true, - "op": "lt", - "value": 0.7 - } - ], "timeFrom": null, - "timeRegions": [], "timeShift": null, "title": "Post-commit reliability per day", - "tooltip": { - "shared": true, - "sort": 1, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "buckets": null, - "mode": "time", - "name": 
null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "decimals": 1, - "format": "percentunit", - "label": "% successful runs", - "logBase": 1, - "max": "1", - "min": "0", - "show": true - }, - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": false - } - ], - "yaxis": { - "align": false, - "alignLevel": null - } + "type": "timeseries" }, { - "columns": [], "datasource": "BeamPSQL", "description": "List of jobs which have failed. Click on the job to view it in Jenkins.", - "fontSize": "100%", + "fieldConfig": { + "defaults": { + "color": { + "mode": "thresholds" + }, + "custom": { + "align": "auto", + "displayMode": "auto" + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + }, + { + "color": "red", + "value": 80 + } + ] + } + }, + "overrides": [ + { + "matcher": { + "id": "byName", + "options": "Time" + }, + "properties": [ + { + "id": "displayName", + "value": "Time" + }, + { + "id": "unit", + "value": "time: YYYY-MM-DD HH:mm:ss" + }, + { + "id": "custom.align", + "value": null + } + ] + }, + { + "matcher": { + "id": "byName", + "options": "build_url" + }, + "properties": [ + { + "id": "displayName", + "value": "Build Url" + }, + { + "id": "unit", + "value": "short" + }, + { + "id": "decimals", + "value": 2 + }, + { + "id": "links", + "value": [ + { + "targetBlank": true, + "title": "Link to Jenkins job.", + "url": "${__cell:raw}" + } + ] + }, + { + "id": "custom.align", + "value": null + } + ] + }, + { + "matcher": { + "id": "byName", + "options": "job_name" + }, + "properties": [ + { + "id": "displayName", + "value": "Job Name" + }, + { + "id": "unit", + "value": "short" + }, + { + "id": "decimals", + "value": 2 + }, + { + "id": "links", + "value": [ + { + "targetBlank": true, + "title": "View Jenkins job: ${__cell_1}_${__cell_2}", + "url": "${__cell_0:raw}" + } + ] + }, + { + "id": "custom.align", + "value": null + } + ] + }, + { + "matcher": { + "id": "byName", + "options": "build_id" + }, + "properties": [ + { + "id": "displayName", + "value": "Build ID" + }, + { + "id": "unit", + "value": "short" + }, + { + "id": "links", + "value": [ + { + "targetBlank": true, + "title": "View Jenkins job: ${__cell_1}_${__cell_2}", + "url": "${__cell_0:raw}" + } + ] + }, + { + "id": "custom.align", + "value": null + } + ] + }, + { + "matcher": { + "id": "byName", + "options": "build_timestamp" + }, + "properties": [ + { + "id": "displayName", + "value": "Start Time" + }, + { + "id": "unit", + "value": "short" + }, + { + "id": "decimals", + "value": 2 + }, + { + "id": "unit", + "value": "time: MM/DD/YY h:mm:ss a" + }, + { + "id": "links", + "value": [ + { + "targetBlank": true, + "title": "View Jenkins job: ${__cell_1}_${__cell_2}", + "url": "${__cell_0:raw}" + } + ] + }, + { + "id": "custom.align", + "value": null + } + ] + } + ] + }, "gridPos": { "h": 12, "w": 9, @@ -355,109 +529,15 @@ "id": 8, "links": [ { - "includeVars": false, "targetBlank": true, "title": "Beam Jenkins", - "type": "absolute", "url": "https://ci-beam.apache.org/" } ], - "pageSize": null, - "scroll": true, - "showHeader": true, - "sort": { - "col": 0, - "desc": true + "options": { + "showHeader": true }, - "styles": [ - { - "alias": "Time", - "dateFormat": "YYYY-MM-DD HH:mm:ss", - "link": false, - "pattern": "Time", - "type": "date" - }, - { - "alias": "Build Url", - "colorMode": null, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "dateFormat": 
"YYYY-MM-DD HH:mm:ss", - "decimals": 2, - "link": true, - "linkTargetBlank": true, - "linkTooltip": "Link to Jenkins job.", - "linkUrl": "${__cell:raw}", - "mappingType": 1, - "pattern": "build_url", - "thresholds": [], - "type": "hidden", - "unit": "short" - }, - { - "alias": "Job Name", - "colorMode": null, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "dateFormat": "YYYY-MM-DD HH:mm:ss", - "decimals": 2, - "link": true, - "linkTargetBlank": true, - "linkTooltip": "View Jenkins job: ${__cell_1}_${__cell_2}", - "linkUrl": "${__cell_0:raw}", - "mappingType": 1, - "pattern": "job_name", - "thresholds": [], - "type": "string", - "unit": "short" - }, - { - "alias": "Build ID", - "colorMode": null, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "dateFormat": "YYYY-MM-DD HH:mm:ss", - "decimals": 0, - "link": true, - "linkTargetBlank": true, - "linkTooltip": "View Jenkins job: ${__cell_1}_${__cell_2}", - "linkUrl": "${__cell_0:raw}", - "mappingType": 1, - "pattern": "build_id", - "thresholds": [], - "type": "number", - "unit": "short" - }, - { - "alias": "Start Time", - "colorMode": null, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "dateFormat": "MM/DD/YY h:mm:ss a", - "decimals": 2, - "link": true, - "linkTargetBlank": true, - "linkTooltip": "View Jenkins job: ${__cell_1}_${__cell_2}", - "linkUrl": "${__cell_0:raw}", - "mappingType": 1, - "pattern": "build_timestamp", - "thresholds": [], - "type": "date", - "unit": "short" - } - ], + "pluginVersion": "8.1.2", "targets": [ { "alias": "", @@ -489,18 +569,71 @@ ], "timeShift": null, "title": "Failed builds", - "transform": "table", + "transformations": [ + { + "id": "merge", + "options": { + "reducers": [] + } + } + ], "type": "table" }, { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, "datasource": "BeamPSQL", - "decimals": 1, "description": "Execution time for each post-commit job", - "fill": 0, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisLabel": "Average job duration", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "never", + "spanNulls": true, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "min": 0, + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + }, + { + "color": "red", + "value": 80 + } + ] + }, + "unit": "ms" + }, + "overrides": [] + }, "gridPos": { "h": 8, "w": 15, @@ -508,29 +641,20 @@ "y": 23 }, "id": 5, - "legend": { - "alignAsTable": true, - "avg": false, - "current": true, - "max": false, - "min": false, - "rightSide": true, - "show": false, - "total": false, - "values": true - }, - "lines": true, - "linewidth": 1, "links": [], - "nullPointMode": "null", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "stack": false, - "steppedLine": false, + "options": { + "legend": { + "calcs": [ + "lastNotNull" + ], + "displayMode": "hidden", + "placement": "right" + }, + "tooltip": { + "mode": "single" + 
} + }, + "pluginVersion": "8.1.2", "targets": [ { "alias": "", @@ -538,7 +662,7 @@ "group": [], "metricColumn": "none", "rawQuery": true, - "rawSql": "SELECT\n build_timestamp as time,\n build_duration as value,\n substring(job_name from 'beam_PostCommit_#\"%#\"' for '#') as metric\nFROM\n jenkins_builds\nWHERE\n (build_timestamp BETWEEN $__timeFrom() AND $__timeTo())\n AND (job_name LIKE 'beam_PostCommit_%')\n AND NOT (job_name LIKE '%_PR')\nORDER BY\n job_name, time", + "rawSql": "SELECT\n build_timestamp as time,\n build_duration as value,\n substring(job_name from 'beam_PostCommit_#\"%#\"' for '#') as metric\nFROM\n jenkins_builds\nWHERE\n (build_timestamp BETWEEN $__timeFrom() AND $__timeTo())\n AND (job_name LIKE 'beam_PostCommit_%')\n AND NOT (job_name LIKE '%_PR')\nORDER BY\n time, job_name", "refId": "A", "select": [ [ @@ -560,57 +684,98 @@ ] } ], - "thresholds": [], "timeFrom": null, - "timeRegions": [], "timeShift": null, "title": "Post-commit job duration", - "tooltip": { - "shared": true, - "sort": 2, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "buckets": null, - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "decimals": null, - "format": "ms", - "label": "Average job duration", - "logBase": 1, - "max": null, - "min": "0", - "show": true - }, - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": false - } - ], - "yaxis": { - "align": false, - "alignLevel": null - } + "type": "timeseries" }, { - "aliasColors": {}, - "bars": true, - "dashLength": 10, - "dashes": false, "datasource": "BeamPSQL", - "decimals": 0, "description": "Tracks the count of test failure JIRA issues currently open.", - "fill": 3, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisLabel": "# of JIRA issues", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "bars", + "fillOpacity": 100, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "never", + "spanNulls": true, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "decimals": 0, + "mappings": [], + "min": 0, + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + }, + { + "color": "red", + "value": 80 + } + ] + }, + "unit": "short" + }, + "overrides": [ + { + "matcher": { + "id": "byName", + "options": "total_open" + }, + "properties": [ + { + "id": "color", + "value": { + "fixedColor": "#eab839", + "mode": "fixed" + } + } + ] + }, + { + "matcher": { + "id": "byName", + "options": "currently_failing" + }, + "properties": [ + { + "id": "color", + "value": { + "fixedColor": "#bf1b00", + "mode": "fixed" + } + } + ] + } + ] + }, "gridPos": { "h": 8, "w": 9, @@ -618,45 +783,24 @@ "y": 23 }, "id": 14, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false, - "values": false - }, - "lines": false, - "linewidth": 1, "links": [ { "targetBlank": true, "title": "Jira tickets", - "type": "absolute", "url": "https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC" } ], - "nullPointMode": 
"null", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [ - { - "alias": "total_open", - "color": "#eab839" + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom" }, - { - "alias": "currently_failing", - "color": "#bf1b00" + "tooltip": { + "mode": "single" } - ], - "spaceLength": 10, - "stack": false, - "steppedLine": false, + }, + "pluginVersion": "8.1.2", "targets": [ { "format": "time_series", @@ -713,51 +857,14 @@ ] } ], - "thresholds": [], "timeFrom": null, - "timeRegions": [], "timeShift": null, "title": "Test Failure JIRA issues", - "tooltip": { - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "buckets": null, - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "decimals": 0, - "format": "short", - "label": "# of JIRA issues", - "logBase": 1, - "max": null, - "min": "0", - "show": true - }, - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": false - } - ], - "yaxis": { - "align": false, - "alignLevel": null - } + "type": "timeseries" } ], "refresh": false, - "schemaVersion": 18, + "schemaVersion": 30, "style": "dark", "tags": [], "templating": { @@ -793,5 +900,5 @@ "timezone": "", "title": "Post-commit Test Reliability", "uid": "D81lW0pmk", - "version": 46 + "version": 2 } diff --git a/.test-infra/metrics/grafana/dashboards/pre-commit_tests.json b/.test-infra/metrics/grafana/dashboards/pre-commit_tests.json index e5ab46e05c68..a518b2a24024 100644 --- a/.test-infra/metrics/grafana/dashboards/pre-commit_tests.json +++ b/.test-infra/metrics/grafana/dashboards/pre-commit_tests.json @@ -8,6 +8,12 @@ "hide": true, "iconColor": "rgba(0, 211, 255, 1)", "name": "Annotations & Alerts", + "target": { + "limit": 100, + "matchAny": false, + "tags": [], + "type": "dashboard" + }, "type": "dashboard" } ] @@ -15,7 +21,6 @@ "editable": true, "gnetId": null, "graphTooltip": 0, - "id": 2, "links": [], "panels": [ { @@ -52,13 +57,60 @@ "noDataState": "keep_state", "notifications": [] }, - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, "datasource": "BeamPSQL", "description": "Execution time for each pre-commit job.\n\nLong test suite execution impacts developer productivity by delaying the quality signal of a pull request of current HEAD. 
If tests are consistently slow, developers won't wait for them to complete.\n\nWe aim for under 2 hour execution per test suite, but ideally under 30 mins.", - "fill": 0, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisLabel": "Average job duration", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "never", + "spanNulls": true, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "line+area" + } + }, + "mappings": [], + "min": 0, + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "transparent", + "value": null + }, + { + "color": "red", + "value": 7200000 + } + ] + }, + "unit": "ms" + }, + "overrides": [] + }, "gridPos": { "h": 8, "w": 24, @@ -66,31 +118,20 @@ "y": 0 }, "id": 4, - "legend": { - "alignAsTable": true, - "avg": false, - "current": true, - "max": false, - "min": false, - "rightSide": true, - "show": true, - "sort": "current", - "sortDesc": true, - "total": false, - "values": true - }, - "lines": true, - "linewidth": 1, "links": [], - "nullPointMode": "null", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "stack": false, - "steppedLine": false, + "options": { + "legend": { + "calcs": [ + "lastNotNull" + ], + "displayMode": "table", + "placement": "right" + }, + "tooltip": { + "mode": "single" + } + }, + "pluginVersion": "8.1.2", "targets": [ { "alias": "", @@ -98,7 +139,7 @@ "group": [], "metricColumn": "none", "rawQuery": true, - "rawSql": "SELECT\n build_timestamp as time,\n build_duration as value,\n substring(job_name from 'beam_PreCommit_#\"%#\"_(Cron|Commit)' for '#') as metric\nFROM\n jenkins_builds\nWHERE\n (build_timestamp BETWEEN $__timeFrom() AND $__timeTo())\n AND build_result = 'SUCCESS'\n AND ((job_name LIKE 'beam_PreCommit_%_Commit')\n OR (job_name LIKE 'beam_PreCommit_%_Cron'))\nORDER BY\n metric, time", + "rawSql": "SELECT\n build_timestamp as time,\n build_duration as value,\n substring(job_name from 'beam_PreCommit_#\"%#\"_(Cron|Commit)' for '#') as metric\nFROM\n jenkins_builds\nWHERE\n (build_timestamp BETWEEN $__timeFrom() AND $__timeTo())\n AND build_result = 'SUCCESS'\n AND ((job_name LIKE 'beam_PreCommit_%_Commit')\n OR (job_name LIKE 'beam_PreCommit_%_Cron'))\nORDER BY\n time, metric", "refId": "A", "select": [ [ @@ -120,62 +161,64 @@ ] } ], - "thresholds": [ - { - "colorMode": "critical", - "fill": true, - "line": true, - "op": "gt", - "value": 7200000 - } - ], "timeFrom": null, - "timeRegions": [], "timeShift": null, "title": "Pre-commit job duration", - "tooltip": { - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "buckets": null, - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "ms", - "label": "Average job duration", - "logBase": 1, - "max": null, - "min": "0", - "show": true - }, - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": false - } - ], - "yaxis": { - "align": false, - "alignLevel": null - } + "type": "timeseries" }, { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, "datasource": "BeamPSQL", - "fill": 1, + 
"fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 10, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "never", + "spanNulls": true, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + }, + { + "color": "red", + "value": 80 + } + ] + }, + "unit": "dtdurationms" + }, + "overrides": [] + }, "gridPos": { "h": 8, "w": 24, @@ -183,38 +226,25 @@ "y": 8 }, "id": 6, - "legend": { - "alignAsTable": true, - "avg": false, - "current": false, - "hideEmpty": true, - "hideZero": true, - "max": false, - "min": false, - "rightSide": true, - "show": true, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 1, "links": [], - "nullPointMode": "null", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "stack": false, - "steppedLine": false, + "options": { + "legend": { + "calcs": [], + "displayMode": "table", + "placement": "right" + }, + "tooltip": { + "mode": "single" + } + }, + "pluginVersion": "8.1.2", "targets": [ { "format": "time_series", "group": [], "metricColumn": "none", "rawQuery": true, - "rawSql": "SELECT\n build_timestamp as time,\n timing_queuingDurationMillis as value,\n substring(job_name from 'beam_PreCommit_#\"%#\"_(Cron|Commit|Phrase)' for '#') as metric\nFROM\n jenkins_builds\nWHERE\n (build_timestamp BETWEEN $__timeFrom() AND $__timeTo())\n AND build_result = 'SUCCESS'\n AND ((job_name LIKE 'beam_PreCommit_%_Commit')\n OR (job_name LIKE 'beam_PreCommit_%_Cron')\n OR (job_name LIKE 'beam_PreCommit_%_Phrase'))\nORDER BY\n metric, time", + "rawSql": "SELECT\n build_timestamp as time,\n timing_queuingDurationMillis as value,\n substring(job_name from 'beam_PreCommit_#\"%#\"_(Cron|Commit|Phrase)' for '#') as metric\nFROM\n jenkins_builds\nWHERE\n (build_timestamp BETWEEN $__timeFrom() AND $__timeTo())\n AND build_result = 'SUCCESS'\n AND ((job_name LIKE 'beam_PreCommit_%_Commit')\n OR (job_name LIKE 'beam_PreCommit_%_Cron')\n OR (job_name LIKE 'beam_PreCommit_%_Phrase'))\nORDER BY\n time, metric", "refId": "A", "select": [ [ @@ -236,54 +266,64 @@ ] } ], - "thresholds": [], "timeFrom": null, - "timeRegions": [], "timeShift": null, "title": "Time in queue", - "tooltip": { - "shared": true, - "sort": 2, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "buckets": null, - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "dtdurationms", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - }, - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - } - ], - "yaxis": { - "align": false, - "alignLevel": null - } + "type": "timeseries" }, { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, "datasource": "BeamPSQL", - "fill": 0, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": 
"none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "never", + "spanNulls": true, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + }, + { + "color": "red", + "value": 80 + } + ] + }, + "unit": "dtdurationms" + }, + "overrides": [] + }, "gridPos": { "h": 8, "w": 24, @@ -291,31 +331,18 @@ "y": 16 }, "id": 8, - "legend": { - "alignAsTable": true, - "avg": false, - "current": false, - "hideEmpty": true, - "hideZero": true, - "max": false, - "min": false, - "rightSide": true, - "show": true, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 1, "links": [], - "nullPointMode": "null", - "percentage": false, - "pointradius": 2, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "stack": false, - "steppedLine": false, + "options": { + "legend": { + "calcs": [], + "displayMode": "table", + "placement": "right" + }, + "tooltip": { + "mode": "single" + } + }, + "pluginVersion": "8.1.2", "targets": [ { "aggregation": "Last", @@ -351,49 +378,13 @@ ] } ], - "thresholds": [], "timeFrom": null, - "timeRegions": [], "timeShift": null, "title": "Time in queue: 0.9 percentile on month period", - "tooltip": { - "shared": true, - "sort": 2, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "buckets": null, - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "dtdurationms", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - }, - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - } - ], - "yaxis": { - "align": false, - "alignLevel": null - } + "type": "timeseries" } ], - "schemaVersion": 18, + "schemaVersion": 30, "style": "dark", "tags": [], "templating": { @@ -431,5 +422,5 @@ "timezone": "utc", "title": "Pre-commit Test Latency", "uid": "_TNndF2iz", - "version": 18 + "version": 1 } diff --git a/.test-infra/metrics/grafana/dashboards/stability_critical_jobs_status.json b/.test-infra/metrics/grafana/dashboards/stability_critical_jobs_status.json index 83695dc9ce01..f0ffe636bf95 100644 --- a/.test-infra/metrics/grafana/dashboards/stability_critical_jobs_status.json +++ b/.test-infra/metrics/grafana/dashboards/stability_critical_jobs_status.json @@ -8,6 +8,12 @@ "hide": true, "iconColor": "rgba(0, 211, 255, 1)", "name": "Annotations & Alerts", + "target": { + "limit": 100, + "matchAny": false, + "tags": [], + "type": "dashboard" + }, "type": "dashboard" } ] @@ -15,11 +21,10 @@ "editable": true, "gnetId": null, "graphTooltip": 0, - "id": 3, "links": [], "panels": [ { - "content": "The graph shows: average greenness of critical post-commit tests jobs per week. This graph show health of our project.\n\nTable shows list of relevant jobs failures during selected time interval (You can change time interval on top-right corner of the dashboard). Please, triage failed jobs and update or create corresponding jira tickets. 
You can utilized provided links to help with this.", + "datasource": null, "gridPos": { "h": 3, "w": 10, @@ -28,15 +33,97 @@ }, "id": 8, "links": [], - "mode": "markdown", - "options": {}, + "options": { + "content": "The graph shows: average greenness of critical post-commit tests jobs per week. This graph show health of our project.\n\nTable shows list of relevant jobs failures during selected time interval (You can change time interval on top-right corner of the dashboard). Please, triage failed jobs and update or create corresponding jira tickets. You can utilized provided links to help with this.", + "mode": "markdown" + }, + "pluginVersion": "8.1.2", "title": "Dashboard guidelines", "type": "text" }, { - "columns": [], "datasource": "BeamPSQL", - "fontSize": "100%", + "fieldConfig": { + "defaults": { + "color": { + "mode": "thresholds" + }, + "custom": { + "align": "auto", + "displayMode": "auto" + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + }, + { + "color": "red", + "value": 80 + } + ] + } + }, + "overrides": [ + { + "matcher": { + "id": "byName", + "options": "Time" + }, + "properties": [ + { + "id": "displayName", + "value": "Time" + }, + { + "id": "unit", + "value": "time: YYYY-MM-DD HH:mm:ss" + }, + { + "id": "custom.align", + "value": null + } + ] + }, + { + "matcher": { + "id": "byName", + "options": "build_url" + }, + "properties": [ + { + "id": "displayName", + "value": "Build Url" + }, + { + "id": "unit", + "value": "short" + }, + { + "id": "decimals", + "value": 2 + }, + { + "id": "links", + "value": [ + { + "targetBlank": true, + "title": "Link to Jenkins job.", + "url": "${__cell:raw}" + } + ] + }, + { + "id": "custom.align", + "value": null + } + ] + } + ] + }, "gridPos": { "h": 6, "w": 14, @@ -46,43 +133,10 @@ "hideTimeOverride": false, "id": 4, "links": [], - "options": {}, - "pageSize": null, - "scroll": true, - "showHeader": true, - "sort": { - "col": 0, - "desc": true + "options": { + "showHeader": true }, - "styles": [ - { - "alias": "Time", - "dateFormat": "YYYY-MM-DD HH:mm:ss", - "link": false, - "pattern": "Time", - "type": "date" - }, - { - "alias": "Build Url", - "colorMode": null, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "dateFormat": "YYYY-MM-DD HH:mm:ss", - "decimals": 2, - "link": true, - "linkTargetBlank": true, - "linkTooltip": "Link to Jenkins job.", - "linkUrl": "${__cell:raw}", - "mappingType": 1, - "pattern": "build_url", - "thresholds": [], - "type": "number", - "unit": "short" - } - ], + "pluginVersion": "8.1.2", "targets": [ { "alias": "", @@ -114,11 +168,18 @@ ], "timeShift": null, "title": "Failed builds", - "transform": "table", + "transformations": [ + { + "id": "merge", + "options": { + "reducers": [] + } + } + ], "type": "table" }, { - "content": "[List existing jira tickets](https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC)\n\n[Create new Jira ticket](https://issues.apache.org/jira/secure/CreateIssueDetails!init.jspa?pid=12319527&issuetype=1&summary=%5BjobName%5D%5BTestName%5D%5BIsFlake%5D%20Failure%20summary&priority=3&components=12334203&description=%3CFailure%20summary%3E%0AFailing%20job%20url:%0AJob%20history%20url:%0ARelevant%20log:)", + "datasource": null, "gridPos": { "h": 3, "w": 10, @@ 
-127,19 +188,70 @@ }, "id": 6, "links": [], - "mode": "markdown", - "options": {}, + "options": { + "content": "[List existing jira tickets](https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC)\n\n[Create new Jira ticket](https://issues.apache.org/jira/secure/CreateIssueDetails!init.jspa?pid=12319527&issuetype=1&summary=%5BjobName%5D%5BTestName%5D%5BIsFlake%5D%20Failure%20summary&priority=3&components=12334203&description=%3CFailure%20summary%3E%0AFailing%20job%20url:%0AJob%20history%20url:%0ARelevant%20log:)", + "mode": "markdown" + }, + "pluginVersion": "8.1.2", "title": "Useful links", "type": "text" }, { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, "datasource": "BeamPSQL", "description": "Each data point shows aggregation for corresponding week.\nLatest (rightmost) data point aggregates all data available for current week, so it may change based on new data and should not be considered a final value.", - "fill": 0, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "never", + "spanNulls": true, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "line" + } + }, + "mappings": [], + "max": 1, + "min": 0, + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "#3f6833", + "value": null + }, + { + "color": "transparent", + "value": 0.7 + } + ] + }, + "unit": "percentunit" + }, + "overrides": [] + }, "gridPos": { "h": 7, "w": 10, @@ -147,29 +259,18 @@ "y": 6 }, "id": 2, - "legend": { - "avg": false, - "current": false, - "max": false, - "min": false, - "rightSide": true, - "show": true, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 1, "links": [], - "nullPointMode": "null", - "options": {}, - "percentage": false, - "pointradius": 2, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "stack": false, - "steppedLine": false, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "right" + }, + "tooltip": { + "mode": "single" + } + }, + "pluginVersion": "8.1.2", "targets": [ { "alias": "", @@ -177,7 +278,7 @@ "group": [], "metricColumn": "none", "rawQuery": true, - "rawSql": "SELECT\n DATE_TRUNC('week', build_timestamp) as time,\n avg(\n case \n when build_result = 'SUCCESS' then 1\n else 0\n end) as value,\n substring(job_name from 'beam_PostCommit_#\"%#\"' for '#') as job_name\nFROM\n /*\n We perform a union here to create a fake \"Python_All\" job_name in\n order to graph a new line for all the python results combined.\n */\n ( SELECT build_timestamp, build_result, job_name\n FROM jenkins_builds\n UNION\n SELECT build_timestamp, build_result, 'beam_PostCommit_Python_All' as job_name\n FROM jenkins_builds\n WHERE \n ((job_name SIMILAR TO 'beam_PostCommit_Python[0-9]+'))\n AND NOT (job_name like '%_PR')\n ) AS critical_builds\nWHERE\n (build_timestamp BETWEEN $__timeFrom() AND $__timeTo())\n AND ((job_name = 'beam_PostCommit_Java') \n 
OR (job_name = 'beam_PostCommit_Go') \n OR (job_name SIMILAR TO 'beam_PostCommit_Python[0-9]+')\n OR (job_name = 'beam_PostCommit_Python_Verify')\n OR (job_name = 'beam_PostCommit_Python_All')\n OR (job_name = 'beam_PostCommit_Website_Publish'))\n AND NOT (job_name like '%_PR')\nGROUP BY\n time, job_name\norder BY\n job_name, time", + "rawSql": "SELECT\n DATE_TRUNC('week', build_timestamp) as time,\n avg(\n case \n when build_result = 'SUCCESS' then 1\n else 0\n end) as value,\n substring(job_name from 'beam_PostCommit_#\"%#\"' for '#') as job_name\nFROM\n /*\n We perform a union here to create a fake \"Python_All\" job_name in\n order to graph a new line for all the python results combined.\n */\n ( SELECT build_timestamp, build_result, job_name\n FROM jenkins_builds\n UNION\n SELECT build_timestamp, build_result, 'beam_PostCommit_Python_All' as job_name\n FROM jenkins_builds\n WHERE \n ((job_name SIMILAR TO 'beam_PostCommit_Python[0-9]+'))\n AND NOT (job_name like '%_PR')\n ) AS critical_builds\nWHERE\n (build_timestamp BETWEEN $__timeFrom() AND $__timeTo())\n AND ((job_name = 'beam_PostCommit_Java') \n OR (job_name = 'beam_PostCommit_Go') \n OR (job_name SIMILAR TO 'beam_PostCommit_Python[0-9]+')\n OR (job_name = 'beam_PostCommit_Python_Verify')\n OR (job_name = 'beam_PostCommit_Python_All')\n OR (job_name = 'beam_PostCommit_Website_Publish'))\n AND NOT (job_name like '%_PR')\nGROUP BY\n time, job_name\norder BY\n time, job_name", "refId": "A", "select": [ [ @@ -199,66 +300,67 @@ ] } ], - "thresholds": [ - { - "colorMode": "custom", - "fill": false, - "line": true, - "lineColor": "#3f6833", - "op": "lt", - "value": 0.7, - "yaxis": "left" - } - ], "timeFrom": null, - "timeRegions": [], "timeShift": null, "title": "Greenness per Week (in %)", - "tooltip": { - "shared": true, - "sort": 1, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "buckets": null, - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "decimals": null, - "format": "percentunit", - "label": "", - "logBase": 1, - "max": "1", - "min": "0", - "show": true - }, - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": false - } - ], - "yaxis": { - "align": false, - "alignLevel": null - } + "type": "timeseries" }, { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, "datasource": "BeamPSQL", "description": "Each data point shows aggregation for corresponding month.\nLatest (rightmost) data point aggregates all data available for current month, so it may change based on new data and should not be considered a final value.", - "fill": 0, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "never", + "spanNulls": true, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "line" + } + }, + "mappings": [], + "max": 1, + "min": 0, + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "#3f6833", + "value": null + }, + { + "color": "transparent", + "value": 0.7 + } + ] + }, + "unit": "percentunit" + }, + "overrides": [] + }, "gridPos": { "h": 7, "w": 14, @@ -266,30 +368,18 @@ "y": 6 }, 
"id": 10, - "legend": { - "alignAsTable": true, - "avg": false, - "current": false, - "max": false, - "min": false, - "rightSide": true, - "show": true, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 1, "links": [], - "nullPointMode": "null", - "options": {}, - "percentage": false, - "pointradius": 2, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "stack": false, - "steppedLine": false, + "options": { + "legend": { + "calcs": [], + "displayMode": "table", + "placement": "right" + }, + "tooltip": { + "mode": "single" + } + }, + "pluginVersion": "8.1.2", "targets": [ { "alias": "", @@ -297,7 +387,7 @@ "group": [], "metricColumn": "none", "rawQuery": true, - "rawSql": "SELECT\n DATE_TRUNC('month', build_timestamp) as time,\n avg(\n case \n when build_result = 'SUCCESS' then 1\n else 0\n end) as value,\n substring(job_name from 'beam_PostCommit_#\"%#\"' for '#') as job_name\nFROM\n /*\n We perform a union here to create a fake \"Python_All\" job_name in\n order to graph a new line for all the python results combined.\n */\n ( SELECT build_timestamp, build_result, job_name\n FROM jenkins_builds\n UNION\n SELECT build_timestamp, build_result, 'beam_PostCommit_Python_All' as job_name\n FROM jenkins_builds\n WHERE \n ((job_name SIMILAR TO 'beam_PostCommit_Python[0-9]+'))\n AND NOT (job_name like '%_PR')\n ) AS critical_builds\nWHERE\n (build_timestamp BETWEEN $__timeFrom() AND $__timeTo())\n AND ((job_name = 'beam_PostCommit_Java') \n OR (job_name = 'beam_PostCommit_Go') \n OR (job_name SIMILAR TO 'beam_PostCommit_Python[0-9]+')\n OR (job_name = 'beam_PostCommit_Python_Verify')\n OR (job_name = 'beam_PostCommit_Python_All')\n OR (job_name = 'beam_PostCommit_Website_Publish'))\n AND NOT (job_name like '%_PR')\nGROUP BY\n time, job_name\norder BY\n job_name, time", + "rawSql": "SELECT\n DATE_TRUNC('month', build_timestamp) as time,\n avg(\n case \n when build_result = 'SUCCESS' then 1\n else 0\n end) as value,\n substring(job_name from 'beam_PostCommit_#\"%#\"' for '#') as job_name\nFROM\n /*\n We perform a union here to create a fake \"Python_All\" job_name in\n order to graph a new line for all the python results combined.\n */\n ( SELECT build_timestamp, build_result, job_name\n FROM jenkins_builds\n UNION\n SELECT build_timestamp, build_result, 'beam_PostCommit_Python_All' as job_name\n FROM jenkins_builds\n WHERE \n ((job_name SIMILAR TO 'beam_PostCommit_Python[0-9]+'))\n AND NOT (job_name like '%_PR')\n ) AS critical_builds\nWHERE\n (build_timestamp BETWEEN $__timeFrom() AND $__timeTo())\n AND ((job_name = 'beam_PostCommit_Java') \n OR (job_name = 'beam_PostCommit_Go') \n OR (job_name SIMILAR TO 'beam_PostCommit_Python[0-9]+')\n OR (job_name = 'beam_PostCommit_Python_Verify')\n OR (job_name = 'beam_PostCommit_Python_All')\n OR (job_name = 'beam_PostCommit_Website_Publish'))\n AND NOT (job_name like '%_PR')\nGROUP BY\n time, job_name\norder BY\n time, job_name", "refId": "A", "select": [ [ @@ -319,61 +409,14 @@ ] } ], - "thresholds": [ - { - "colorMode": "custom", - "fill": false, - "line": true, - "lineColor": "#3f6833", - "op": "lt", - "value": 0.7, - "yaxis": "left" - } - ], "timeFrom": null, - "timeRegions": [], "timeShift": null, "title": "Greenness per Month (in %)", - "tooltip": { - "shared": true, - "sort": 1, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "buckets": null, - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "decimals": null, - "format": 
"percentunit", - "label": "", - "logBase": 1, - "max": "1", - "min": "0", - "show": true - }, - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": false - } - ], - "yaxis": { - "align": false, - "alignLevel": null - } + "type": "timeseries" } ], "refresh": false, - "schemaVersion": 18, + "schemaVersion": 30, "style": "dark", "tags": [], "templating": { @@ -411,5 +454,5 @@ "timezone": "utc", "title": "Stability critical jobs status", "uid": "McTAiu0ik", - "version": 1 -} \ No newline at end of file + "version": 2 +} diff --git a/.test-infra/metrics/grafana/provisioning/datasources/beamgithubjavatests-api.yaml b/.test-infra/metrics/grafana/provisioning/datasources/beamgithubjavatests-api.yaml new file mode 100644 index 000000000000..a3a0e9bf3118 --- /dev/null +++ b/.test-infra/metrics/grafana/provisioning/datasources/beamgithubjavatests-api.yaml @@ -0,0 +1,34 @@ +################################################################################ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +################################################################################ + +apiVersion: 1 + +deleteDatasources: + +datasources: + - name: Java Tests + type: marcusolsson-json-datasource + access: proxy + orgId: 1 + url: https://api.github.com/repos/apache/beam/actions/workflows/java_tests.yml/runs + jsonData: + httpHeaderName1: "accept" + customQueryParameters: "per_page=100" + secureJsonData: + httpHeaderValue1: "application/vnd.github.v3+json" + editable: false diff --git a/.test-infra/metrics/grafana/provisioning/datasources/beamgithubpythontests-api.yaml b/.test-infra/metrics/grafana/provisioning/datasources/beamgithubpythontests-api.yaml new file mode 100644 index 000000000000..abcd060d1b37 --- /dev/null +++ b/.test-infra/metrics/grafana/provisioning/datasources/beamgithubpythontests-api.yaml @@ -0,0 +1,34 @@ +################################################################################ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+################################################################################ + +apiVersion: 1 + +deleteDatasources: + +datasources: + - name: Python Tests + type: marcusolsson-json-datasource + access: proxy + orgId: 1 + url: https://api.github.com/repos/apache/beam/actions/workflows/python_tests.yml/runs + jsonData: + httpHeaderName1: "accept" + customQueryParameters: "per_page=100" + secureJsonData: + httpHeaderValue1: "application/vnd.github.v3+json" + editable: false diff --git a/.test-infra/metrics/influxdb/Dockerfile b/.test-infra/metrics/influxdb/Dockerfile index a170166d7790..57a541fb9955 100644 --- a/.test-infra/metrics/influxdb/Dockerfile +++ b/.test-infra/metrics/influxdb/Dockerfile @@ -28,7 +28,7 @@ tar xzf influxdb-backup.tar.gz FROM influxdb:1.8.0 -ADD https://raw.githubusercontent.com/vishnubob/wait-for-it/master/wait-for-it.sh /wait-for-it.sh +ADD https://raw.githubusercontent.com/vishnubob/wait-for-it/81b1373f17855a4dc21156cfe1694c31d7d1792e/wait-for-it.sh /wait-for-it.sh RUN chmod +x /wait-for-it.sh diff --git a/.test-infra/validate-runner/README.md b/.test-infra/validate-runner/README.md new file mode 100644 index 000000000000..640c7ee811a3 --- /dev/null +++ b/.test-infra/validate-runner/README.md @@ -0,0 +1,38 @@ + + +# Overview +Apache Beam provides a portable API layer for building sophisticated data-parallel processing pipelines that may be executed across a diversity of execution engines, or runners. The core concepts of this layer are based upon the Beam Model (formerly referred to as the Dataflow Model), and are implemented to varying degrees in each Beam runner. +Apache Beam maintains a capability matrix to track which Beam features are supported by which set of language SDKs + Runners. + +This module consists of the scripts that automatically update the capability matrix with each project release so that it stays up to date with minimal supervision or ownership. +The workflow works as follows: + +- The script runs periodically and, using the latest runs from the relevant test suites, outputs a capability matrix file in JSON format. +- The capability matrix file will be uploaded to a public folder in GCS. +- The Beam website will fetch the capability matrix file every time a user loads the Capability Matrix page and build the matrix from it. + +### Run the project +This module can be run using the command below. It accepts a single argument, the output JSON filename. If no argument is passed, the output is written to the file capability.json. + +`./gradlew beam-validate-runner:runner -Pargs="filename"` + +#### Run Configurations +The project includes a [configuration file](src/main/resources/configuration.yaml) which defines the configurations used to generate the capabilities. +To add a new runner, add the runner name and the Jenkins job name to the [configuration file](src/main/resources/configuration.yaml) under the respective mode (batch/stream). diff --git a/.test-infra/validate-runner/build.gradle b/.test-infra/validate-runner/build.gradle new file mode 100644 index 000000000000..072184891b42 --- /dev/null +++ b/.test-infra/validate-runner/build.gradle @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership.
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +apply plugin : "application" + +group 'org.apache.beam' +description = "Apache Beam :: Validate :: Runner" + +repositories { + mavenCentral() + maven { + url "https://repo.jenkins-ci.org/releases/" + } + maven { + url "https://jcenter.bintray.com/" + } + maven { + url "https://packages.confluent.io/maven/" + } +} + +dependencies { + compile group: 'com.cdancy', name: 'jenkins-rest', version: '0.0.22' + compile 'com.offbytwo.jenkins:jenkins-client:0.3.8' + compile group: 'com.fasterxml.jackson.core', name: 'jackson-databind', version: '2.0.1' + compile group: 'org.jenkins-ci.plugins', name: 'junit', version: '1.49' + compile group: 'org.yaml', name: 'snakeyaml', version: '1.28' + compile group: 'com.fasterxml.jackson.dataformat', name: 'jackson-dataformat-yaml', version: '2.11.3' + compile 'org.slf4j:slf4j-simple:1.7.9' + compile group: 'junit', name: 'junit', version: '4.12' + compile project(path: ":sdks:java:core") + compile project(path: ":sdks:java:core", configuration: "shadowTest") + compile project(path: ":runners:spark:3") + compile project(path: ":runners:flink:${latestFlinkVersion}") + compile project(path: ":runners:core-java") + compile project(path: ":runners:core-java", configuration: "testRuntime") +} + + + +ext.javaMainClass = "org.apache.beam.validate.runner.Main" +if (project.hasProperty("args")) { + ext.cmdargs = project.getProperty("args") +} else { + ext.cmdargs = "" +} + +task runner(type: JavaExec) { + classpath = sourceSets.main.runtimeClasspath + main = "org.apache.beam.validate.runner.Main" + args cmdargs.split() +} diff --git a/.test-infra/validate-runner/src/main/java/org/apache/beam/validate/runner/Main.java b/.test-infra/validate-runner/src/main/java/org/apache/beam/validate/runner/Main.java new file mode 100644 index 000000000000..812c4a2eedbe --- /dev/null +++ b/.test-infra/validate-runner/src/main/java/org/apache/beam/validate/runner/Main.java @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.validate.runner; + +import net.sf.json.JSONArray; +import org.apache.beam.validate.runner.service.ModeTestService; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + + +import java.io.FileWriter; +import java.io.IOException; + +public class Main { + public static void main(String args[]) { + try { + final Logger logger = LoggerFactory.getLogger(Main.class); + + String outputFile; + if (args.length == 0) { + logger.info("Output file name missing. Output will be saved to capability.json"); + outputFile = "capability"; + } else { + outputFile = args[0]; + logger.info("Output will be saved to {}.json", outputFile); + } + JSONArray outputDetails = new JSONArray(); + + logger.info("Processing Batch Jobs:"); + ModeTestService batchTestService = new ModeTestService("batch"); + outputDetails.add(batchTestService.getTests()); + + logger.info("Processing Stream Jobs:"); + ModeTestService streamTestService = new ModeTestService("stream"); + outputDetails.add(streamTestService.getTests()); + + try (FileWriter file = new FileWriter(outputFile + ".json")) { + file.write(outputDetails.toString(3)); + file.flush(); + } catch (IOException e) { + e.printStackTrace(); + } + } catch (Exception ex) { + ex.printStackTrace(); + } + } +} diff --git a/.test-infra/validate-runner/src/main/java/org/apache/beam/validate/runner/model/CaseResult.java b/.test-infra/validate-runner/src/main/java/org/apache/beam/validate/runner/model/CaseResult.java new file mode 100644 index 000000000000..d1d6910633a4 --- /dev/null +++ b/.test-infra/validate-runner/src/main/java/org/apache/beam/validate/runner/model/CaseResult.java @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.validate.runner.model; + +import com.fasterxml.jackson.annotation.JsonIgnoreProperties; + +import java.util.List; +import java.util.Objects; + +@JsonIgnoreProperties(ignoreUnknown = true) +public class CaseResult { + private String className; + private String name; + private String status; + private List categories; + + public CaseResult(String className, String name, String status, List categories) { + this.className = className; + this.name = name; + this.status = status; + this.categories = categories; + } + + public CaseResult() { + + } + + public String getClassName() { + return className; + } + + public void setClassName(String className) { + this.className = className; + } + + public String getName() { + return name; + } + + public void setName(String name) { + this.name = name; + } + + public String getStatus() { + return status; + } + + public void setStatus(String status) { + this.status = status; + } + + public List getCategories() { + return categories; + } + + public void setCategories(List categories) { + this.categories = categories; + } + + @Override + public boolean equals(Object o) { + if (this == o) return true; + if (o == null || getClass() != o.getClass()) return false; + CaseResult that = (CaseResult) o; + return className.equals(that.className) && + status.equals(that.status) && + name.equals(that.name); + } + + @Override + public int hashCode() { + return Objects.hash(className, status, name); + } +} diff --git a/.test-infra/validate-runner/src/main/java/org/apache/beam/validate/runner/model/Configuration.java b/.test-infra/validate-runner/src/main/java/org/apache/beam/validate/runner/model/Configuration.java new file mode 100644 index 000000000000..3d4f2b406f3d --- /dev/null +++ b/.test-infra/validate-runner/src/main/java/org/apache/beam/validate/runner/model/Configuration.java @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.validate.runner.model; + +import java.util.List; +import java.util.Map; + +public class Configuration { + private List> batch; + private List> stream; + private String server; + private String jsonapi; + + public List> getBatch() { + return batch; + } + + public void setBatch(List> batch) { + this.batch = batch; + } + + public List> getStream() { + return stream; + } + + public void setStream(List> stream) { + this.stream = stream; + } + + public String getServer() { + return server; + } + + public void setServer(String server) { + this.server = server; + } + + public String getJsonapi() { + return jsonapi; + } + + public void setJsonapi(String jsonapi) { + this.jsonapi = jsonapi; + } +} diff --git a/.test-infra/validate-runner/src/main/java/org/apache/beam/validate/runner/model/SuiteResult.java b/.test-infra/validate-runner/src/main/java/org/apache/beam/validate/runner/model/SuiteResult.java new file mode 100644 index 000000000000..648674f38305 --- /dev/null +++ b/.test-infra/validate-runner/src/main/java/org/apache/beam/validate/runner/model/SuiteResult.java @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.validate.runner.model; + +import com.fasterxml.jackson.annotation.JsonIgnoreProperties; + +import java.util.List; + +@JsonIgnoreProperties(ignoreUnknown = true) +public class SuiteResult { + + public SuiteResult(List cases) { + this.cases = cases; + } + + public SuiteResult() { + + } + + public List getCases() { + return cases; + } + + public void setCases(List cases) { + this.cases = cases; + } + + private List cases; +} diff --git a/.test-infra/validate-runner/src/main/java/org/apache/beam/validate/runner/model/TestResult.java b/.test-infra/validate-runner/src/main/java/org/apache/beam/validate/runner/model/TestResult.java new file mode 100644 index 000000000000..3e9e9e28a3b1 --- /dev/null +++ b/.test-infra/validate-runner/src/main/java/org/apache/beam/validate/runner/model/TestResult.java @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.validate.runner.model; + +import com.fasterxml.jackson.annotation.JsonIgnoreProperties; + +import java.util.List; + +@JsonIgnoreProperties(ignoreUnknown = true) +public class TestResult { + + public List getSuites() { + return suites; + } + + public void setSuites(List suites) { + this.suites = suites; + } + + public TestResult(List suites) { + this.suites = suites; + } + + private List suites; + public TestResult() { + + } +} diff --git a/.test-infra/validate-runner/src/main/java/org/apache/beam/validate/runner/service/ModeTestService.java b/.test-infra/validate-runner/src/main/java/org/apache/beam/validate/runner/service/ModeTestService.java new file mode 100644 index 000000000000..0cb6ba8e415f --- /dev/null +++ b/.test-infra/validate-runner/src/main/java/org/apache/beam/validate/runner/service/ModeTestService.java @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.validate.runner.service; + +import com.fasterxml.jackson.databind.ObjectMapper; +import net.sf.json.JSONObject; +import org.apache.beam.validate.runner.model.CaseResult; +import org.apache.beam.validate.runner.model.Configuration; +import org.apache.beam.validate.runner.model.TestResult; +import org.apache.beam.validate.runner.util.FileReaderUtil; +import org.apache.commons.lang3.tuple.Pair; + +import java.util.*; + +public class ModeTestService implements TestService { + // Stores all the tests which are run across runners in current mode + private static Set> tests = new HashSet<>(); + + //Stores the tests which are run for the particular runner. + private HashMap> map = new HashMap<>(); + + private String mode; + + public ModeTestService(String mode) { + this.mode = mode; + } + + public JSONObject getTests() { + try { + Configuration configuration = FileReaderUtil.readConfiguration(); + List> jobs = (mode.equals("batch")) ? 
configuration.getBatch() : configuration.getStream(); + for (Map job : jobs) { + try { + TestResult result = new ObjectMapper().readValue(getUrl(job, configuration), TestResult.class); + tests.addAll(getTestNames(result)); + map.put((String) job.keySet().toArray()[0], getAllTests(result)); + } catch (Exception ex) { + ex.printStackTrace(); + } + } + } catch (Exception ex) { + ex.printStackTrace(); + } + JSONObject outputDetails = new JSONObject(); + outputDetails.put(mode, process(tests, map)); + return outputDetails; + } +} diff --git a/.test-infra/validate-runner/src/main/java/org/apache/beam/validate/runner/service/TestService.java b/.test-infra/validate-runner/src/main/java/org/apache/beam/validate/runner/service/TestService.java new file mode 100644 index 000000000000..ab4d0128875f --- /dev/null +++ b/.test-infra/validate-runner/src/main/java/org/apache/beam/validate/runner/service/TestService.java @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.validate.runner.service; + +import com.offbytwo.jenkins.JenkinsServer; +import com.offbytwo.jenkins.model.Job; +import com.offbytwo.jenkins.model.JobWithDetails; +import net.sf.json.JSONArray; +import net.sf.json.JSONObject; +import org.apache.beam.validate.runner.model.CaseResult; +import org.apache.beam.validate.runner.model.Configuration; +import org.apache.beam.validate.runner.model.TestResult; +import org.apache.beam.validate.runner.util.CategoryRetriever; +import org.apache.commons.lang3.tuple.ImmutablePair; +import org.apache.commons.lang3.tuple.Pair; + +import java.io.IOException; +import java.net.URI; +import java.net.URISyntaxException; +import java.net.URL; +import java.util.*; + +public interface TestService { + + /** + * Returns all the tests {@link CaseResult} from test result. + * @param testResult {@link TestResult} + * @return set of case result + */ + default Set getAllTests(TestResult testResult) { + Set caseResults = new HashSet<>(); + Optional.ofNullable(testResult.getSuites()).ifPresent(suites -> suites.forEach(item -> item.getCases() + .forEach(caseResult -> { + caseResult.setCategories(CategoryRetriever.getCategories(caseResult.getClassName(), caseResult.getName())); + caseResults.add(caseResult); + }))); + return caseResults; + } + + /** + * Returns the jenkins URL for the last successful build. + * + * @param job Map of runner an job name retrieved from configuration + * @param configuration The input configuration + * + * @return The URL of last successful job. 
+ * @throws URISyntaxException + * @throws IOException + */ + default URL getUrl(Map job, Configuration configuration) throws URISyntaxException, IOException { + Map jobs = new JenkinsServer(new URI(configuration.getServer())).getJobs(); + JobWithDetails jobWithDetails = jobs.get(job.get(job.keySet().toArray()[0])).details(); + return new URL(jobWithDetails.getLastSuccessfulBuild().getUrl() + configuration.getJsonapi()); + } + + /** + * Fetches the test name and class name from a test result. + * @param testResult {@link TestResult} + * @return set of pair of all testname and test class name + */ + default Set> getTestNames(TestResult testResult) { + Set> caseResults = new HashSet<>(); + Optional.ofNullable(testResult.getSuites()).ifPresent(suites -> suites.forEach(item -> item.getCases().forEach(caseResult -> caseResults.add(new ImmutablePair<>(caseResult.getName(), caseResult.getClassName()))))); + return caseResults; + } + + /** + * Method find the tests which are not run for a particular runner and add them as the tests + * which are "NOT RUN" for a particular runner + * + * @param testnames Set of Pair of test name and Test ClassName of all the tests which are run across runners. + * @param map Map of runner and the Tests run for that particular runner + * @return The JsonObject with each runner and all the tests which are RUN/ NOT RUN. + */ + default JSONObject process(Set> testnames, HashMap> map) { + map.forEach((k, v) -> { + Set caseResults = v; + for(Pair pair : testnames) { + boolean found = false; + for(CaseResult caseResult : caseResults) { + if (caseResult.getName().equals(pair.getKey())) { + found = true; + break; + } + } + if(found) { + found = false; + } else { + caseResults.add(new CaseResult(pair.getValue(), "NOT RUN", pair.getKey(), new ArrayList<>())); + } + } + }); + + JSONObject jsonMain = new JSONObject(); + map.forEach((k, v) -> { + JSONArray tests = new JSONArray(); + tests.addAll(v); + JSONObject jsonOut = new JSONObject(); + jsonOut.put("testCases", tests); + jsonMain.put(k, jsonOut); + }); + return jsonMain; + } +} diff --git a/.test-infra/validate-runner/src/main/java/org/apache/beam/validate/runner/util/CategoryRetriever.java b/.test-infra/validate-runner/src/main/java/org/apache/beam/validate/runner/util/CategoryRetriever.java new file mode 100644 index 000000000000..713c0e04a414 --- /dev/null +++ b/.test-infra/validate-runner/src/main/java/org/apache/beam/validate/runner/util/CategoryRetriever.java @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.validate.runner.util; + +import java.lang.annotation.Annotation; +import java.lang.reflect.Method; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.List; + +public class CategoryRetriever { + public static List getCategories(String className, String methodName) { + List categoryNames = new ArrayList<>(); + try { + Class tempClass = Class.forName(className); + for (Method method : tempClass.getDeclaredMethods()) { + if(method.getName().equals(methodName)) { + Annotation[] annotations = method.getAnnotations(); + for(Annotation annotation : annotations) { + if (annotation instanceof org.junit.experimental.categories.Category) { + Class[] categories = null; + categories = ((org.junit.experimental.categories.Category) annotation).value(); + Arrays.stream(categories).forEach(category -> categoryNames.add(category.getName())); + } else { + continue; + } + return categoryNames; + } + } else { + continue; + } + } + } catch (ClassNotFoundException e) { + return categoryNames; + } + return new ArrayList<>(); + } +} diff --git a/.test-infra/validate-runner/src/main/java/org/apache/beam/validate/runner/util/FileReaderUtil.java b/.test-infra/validate-runner/src/main/java/org/apache/beam/validate/runner/util/FileReaderUtil.java new file mode 100644 index 000000000000..6010cabac511 --- /dev/null +++ b/.test-infra/validate-runner/src/main/java/org/apache/beam/validate/runner/util/FileReaderUtil.java @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.validate.runner.util; + +import com.fasterxml.jackson.databind.ObjectMapper; +import com.fasterxml.jackson.dataformat.yaml.YAMLFactory; +import org.apache.beam.validate.runner.model.Configuration; + +import java.io.InputStream; + +/** + * Reads the input configurations from the configuration.yaml file + * and passes as a {@link Configuration} object for processing. + */ +public class FileReaderUtil { + + private static final String FILE_PATH = "/configuration.yaml"; + + public static Configuration readConfiguration() { + try { + ObjectMapper mapper = new ObjectMapper(new YAMLFactory()); + InputStream inputStream = FileReaderUtil.class.getResourceAsStream(FILE_PATH); + return mapper.readValue(inputStream, Configuration.class); + } catch (Exception ex) { + ex.printStackTrace(); + } + return null; + } +} diff --git a/.test-infra/validate-runner/src/main/resources/configuration.yaml b/.test-infra/validate-runner/src/main/resources/configuration.yaml new file mode 100644 index 000000000000..31934d2146d7 --- /dev/null +++ b/.test-infra/validate-runner/src/main/resources/configuration.yaml @@ -0,0 +1,24 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. 
See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +batch: + - flink: beam_PostCommit_Java_PVR_Flink_Batch + - spark: beam_PostCommit_Java_PVR_Spark_Batch + - dataflow: beam_PostCommit_Java_VR_Dataflow_V2 +stream: + - flink: beam_PostCommit_Java_PVR_Flink_Streaming + - samza: beam_PostCommit_Java_PVR_Samza +server: https://ci-beam.apache.org/ +jsonapi: testReport/api/json diff --git a/.test-infra/validate-runner/src/main/resources/log4j.properties b/.test-infra/validate-runner/src/main/resources/log4j.properties new file mode 100644 index 000000000000..590fc0ac23cf --- /dev/null +++ b/.test-infra/validate-runner/src/main/resources/log4j.properties @@ -0,0 +1,23 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# Set everything to be logged to the console +log4j.rootCategory=DEBUG, console +log4j.appender.console=org.apache.log4j.ConsoleAppender +log4j.appender.console.target=System.err +log4j.appender.console.layout=org.apache.log4j.PatternLayout +log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n diff --git a/CHANGES.md b/CHANGES.md index 1f5b532ef33f..5547c753f2f6 100644 --- a/CHANGES.md +++ b/CHANGES.md @@ -36,6 +36,11 @@ ## Breaking Changes * X behavior was changed ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)). +* Go SDK pipelines require new import paths to use this release due to migration to Go Modules. + * `go.mod` files will need to change to require `github.com/apache/beam/sdks/v2`. + * Code depending on beam imports need to include v2 on the module path. + * Fix by'v2' to the import paths, turning `.../sdks/go/...` to `.../sdks/v2/go/...` + * No other code change should be required to use v2.33.0 of the Go SDK. ## Deprecations @@ -45,29 +50,71 @@ * Fixed X (Java/Python) ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)). --> +# [2.34.0] - Unreleased +* Add an [example](https://github.com/cometta/python-apache-beam-spark) of deploying Python Apache Beam job with Spark Cluster +## Highlights + +* New highly anticipated feature X added to Python SDK ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)). 
+* New highly anticipated feature Y added to Java SDK ([BEAM-Y](https://issues.apache.org/jira/browse/BEAM-Y)).
+* The Beam Java API for Calcite SqlTransform is no longer experimental ([BEAM-12680](https://issues.apache.org/jira/browse/BEAM-12680)).
+
+## I/Os
+
+* Support for X source added (Java/Python) ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)).
+
+## New Features / Improvements
+
+* X feature added (Java/Python) ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)).
+* Upgrade to Calcite 1.26.0 ([BEAM-9379](https://issues.apache.org/jira/browse/BEAM-9379)).
+
+## Breaking Changes
+
+* X behavior was changed ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)).
+* Go SDK pipelines require new import paths to use this release due to migration to Go Modules.
+  * `go.mod` files will need to change to require `github.com/apache/beam/sdks/v2`.
+  * Code depending on beam imports needs to include v2 on the module path.
+  * Fix by adding 'v2' to the import paths, turning `.../sdks/go/...` to `.../sdks/v2/go/...`
+  * No other code change should be required to use v2.33.0 of the Go SDK.
+* SQL Rows are no longer flattened ([BEAM-5505](https://issues.apache.org/jira/browse/BEAM-5505)).
+
+## Deprecations
 
-# [2.27.X] - Unreleased
+* X behavior is deprecated and will be removed in X versions ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)).
+
+## Known Issues
+
+* Fixed X (Java/Python) ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)).
+
+# [2.33.0] - Unreleased
 
 ## Highlights
 
 * New highly anticipated feature X added to Python SDK ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)).
 * New highly anticipated feature Y added to Java SDK ([BEAM-Y](https://issues.apache.org/jira/browse/BEAM-Y)).
+* Go SDK is no longer experimental, and is officially part of the Beam release process.
+  * Matching Go SDK containers are published on release.
+  * Batch usage is well supported, and tested on Flink, Spark, and the Python Portable Runner.
+    * SDK Tests are also run against Google Cloud Dataflow, but this doesn't indicate reciprocal support.
+  * The SDK supports Splittable DoFns, Cross Language transforms, and most Beam Model basics.
+  * Go Modules are now used for dependency management.
+    * This is a breaking change, see Breaking Changes for resolution.
+  * Easier path to contribute to the Go SDK, no need to set up a GO_PATH.
+  * Minimum Go version is now Go v1.16.
+  * See the announcement blogpost for full information (TODO(lostluck): Add link once published.)
 
 ## I/Os
 
-* ReadFromMongoDB can now be used with MongoDB Atlas (Python) ([BEAM-11266](https://issues.apache.org/jira/browse/BEAM-11266).)
 * Support for X source added (Java/Python) ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)).
-* There is a new transform `ReadAllFromBigQuery` that can receive multiple requests to read data from BigQuery at pipeline runtime. See [PR 13170](https://github.com/apache/beam/pull/13170), and [BEAM-9650](https://issues.apache.org/jira/browse/BEAM-9650).
 
 ## New Features / Improvements
 
-* Beam modules that depend on Hadoop are now tested for compatibility with Hadoop 3 ([BEAM-8569](https://issues.apache.org/jira/browse/BEAM-8569)). (Hive/HCatalog pending)
-* Publishing Java 11 SDK container images now supported as part of Apache Beam release process. 
([BEAM-8106](https://issues.apache.org/jira/browse/BEAM-8106))
-* Added Cloud Bigtable Provider extension to Beam SQL ([BEAM-11173](https://issues.apache.org/jira/browse/BEAM-11173), [BEAM-11373](https://issues.apache.org/jira/browse/BEAM-11373))
+* X feature added (Java/Python) ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)).
+* Upgrade Flink runner to Flink versions 1.13.2, 1.12.5 and 1.11.4 ([BEAM-10955](https://issues.apache.org/jira/browse/BEAM-10955)).
 
 ## Breaking Changes
 
-* HBaseIO hbase-shaded-client dependency should be now provided by the users ([BEAM-9278](https://issues.apache.org/jira/browse/BEAM-9278)).
-* `--region` flag in amazon-web-services2 was replaced by `--awsRegion` ([BEAM-11331](https://issues.apache.org/jira/projects/BEAM/issues/BEAM-11331)).
+* Python GBK by default will fail on unbounded PCollections that have global windowing and a default trigger. The `--allow_unsafe_triggers` flag can be used to override this. ([BEAM-9487](https://issues.apache.org/jira/browse/BEAM-9487)).
+* Python GBK will fail if it detects an unsafe trigger unless the `--allow_unsafe_triggers` flag is set. ([BEAM-9487](https://issues.apache.org/jira/browse/BEAM-9487)).
 
 ## Deprecations
 
@@ -76,16 +123,221 @@
 
 ## Known Issues
 
 * Fixed X (Java/Python) ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)).
 
--->
+# [2.32.0] - 2021-08-25
+
+## Highlights
+* The [Beam DataFrame
+  API](https://beam.apache.org/documentation/dsls/dataframes/overview/) is no
+  longer experimental! We've spent the time since the [2.26.0 preview
+  announcement](https://beam.apache.org/blog/dataframe-api-preview-available/)
+  implementing the most frequently used pandas operations
+  ([BEAM-9547](https://issues.apache.org/jira/browse/BEAM-9547)), improving
+  [documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.dataframe.html)
+  and [error messages](https://issues.apache.org/jira/browse/BEAM-12028),
+  adding
+  [examples](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/dataframe),
+  integrating DataFrames with [interactive
+  Beam](https://beam.apache.org/releases/pydoc/current/apache_beam.runners.interactive.interactive_beam.html),
+  and of course finding and fixing
+  [bugs](https://issues.apache.org/jira/issues/?jql=project%3DBEAM%20AND%20issuetype%3DBug%20AND%20status%3DResolved%20AND%20component%3Ddsl-dataframe).
+  Leaving experimental just means that we now have high confidence in the API
+  and recommend its use for production workloads. We will continue to improve
+  the API, guided by your
+  [feedback](https://beam.apache.org/community/contact-us/).
+
+
+## I/Os
+
+* Added ability to use JdbcIO.Write.withResults without statement and preparedStatementSetter. ([BEAM-12511](https://issues.apache.org/jira/browse/BEAM-12511))
+* Added ability to register URI schemes to use the S3 protocol via FileIO. ([BEAM-12435](https://issues.apache.org/jira/browse/BEAM-12435)).
+* Respect number of shards set in SnowflakeWrite batch mode. ([BEAM-12715](https://issues.apache.org/jira/browse/BEAM-12715))
+* Java SDK: Update Google Cloud Healthcare IO connectors from using v1beta1 to using the GA version.
+
+## New Features / Improvements
+
+* Add support to convert Beam Schema to Avro Schema for JDBC LogicalTypes:
+  `VARCHAR`, `NVARCHAR`, `LONGVARCHAR`, `LONGNVARCHAR`, `DATE`, `TIME`
+  (Java) ([BEAM-12385](https://issues.apache.org/jira/browse/BEAM-12385)).
+* Reading from JDBC source by partitions (Java) ([BEAM-12456](https://issues.apache.org/jira/browse/BEAM-12456)). 
+* PubsubIO can now write to a dead-letter topic after a parsing error (Java) ([BEAM-12474](https://issues.apache.org/jira/browse/BEAM-12474)).
+* New append-only option for Elasticsearch sink (Java) [BEAM-12601](https://issues.apache.org/jira/browse/BEAM-12601)
+
+## Breaking Changes
+
+* ListShards (with DescribeStreamSummary) is used instead of DescribeStream to list shards in Kinesis streams. Due to this change, as mentioned in [AWS documentation](https://docs.aws.amazon.com/kinesis/latest/APIReference/API_ListShards.html), for fine-grained IAM policies it is required to update them to allow calls to ListShards and DescribeStreamSummary APIs. For more information, see [Controlling Access to Amazon Kinesis Data Streams](https://docs.aws.amazon.com/streams/latest/dev/controlling-access.html) ([BEAM-12225](https://issues.apache.org/jira/browse/BEAM-12225)).
+
+## Deprecations
+
+* Python GBK will stop supporting unbounded PCollections that have global windowing and a default trigger in Beam 2.33. This can be overridden with `--allow_unsafe_triggers`. ([BEAM-9487](https://issues.apache.org/jira/browse/BEAM-9487)).
+* Python GBK will start requiring safe triggers or the `--allow_unsafe_triggers` flag starting with Beam 2.33. ([BEAM-9487](https://issues.apache.org/jira/browse/BEAM-9487)).
+
+## Known Issues
+
+* Fixed race condition in RabbitMqIO causing duplicate acks (Java) ([BEAM-6516](https://issues.apache.org/jira/browse/BEAM-6516))
+
+# [2.31.0] - 2021-07-08
+
+## I/Os
+
+* Fixed bug in ReadFromBigQuery when a RuntimeValueProvider is used as value of table argument (Python) ([BEAM-12514](https://issues.apache.org/jira/browse/BEAM-12514)).
+
+## New Features / Improvements
+
+* `CREATE FUNCTION` DDL statement added to Calcite SQL syntax. `JAR` and `AGGREGATE` are now reserved keywords. ([BEAM-12339](https://issues.apache.org/jira/browse/BEAM-12339)).
+* Flink 1.13 is now supported by the Flink runner ([BEAM-12277](https://issues.apache.org/jira/browse/BEAM-12277)).
+* DatastoreIO: Write and delete operations now follow automatic gradual ramp-up,
+  in line with best practices (Java/Python) ([BEAM-12260](https://issues.apache.org/jira/browse/BEAM-12260), [BEAM-12272](https://issues.apache.org/jira/browse/BEAM-12272)).
+* Python `TriggerFn` has a new `may_lose_data` method to signal potential data loss. Default behavior assumes safe (necessary for backwards compatibility). See Deprecations for potential impact of overriding this. ([BEAM-9487](https://issues.apache.org/jira/browse/BEAM-9487)).
+
+## Breaking Changes
+
+* Python Row objects are now sensitive to field order. So `Row(x=3, y=4)` is no
+  longer considered equal to `Row(y=4, x=3)` (BEAM-11929); a short example follows this section.
+* Kafka Beam SQL tables now ascribe meaning to the LOCATION field; previously
+  it was ignored if provided.
+* `TopCombineFn` disallows `compare` as its argument (Python) ([BEAM-7372](https://issues.apache.org/jira/browse/BEAM-7372)).
+* Drop support for Flink 1.10 ([BEAM-12281](https://issues.apache.org/jira/browse/BEAM-12281)).
+
+## Deprecations
+
+* Python GBK will stop supporting unbounded PCollections that have global windowing and a default trigger in Beam 2.33. This can be overridden with `--allow_unsafe_triggers`. ([BEAM-9487](https://issues.apache.org/jira/browse/BEAM-9487)).
+* Python GBK will start requiring safe triggers or the `--allow_unsafe_triggers` flag starting with Beam 2.33. ([BEAM-9487](https://issues.apache.org/jira/browse/BEAM-9487)). 
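
The Row field-order change called out under 2.31.0 above can be seen with a minimal sketch. This is illustrative only: it assumes nothing beyond an installed `apache_beam` package, and the field names and values are arbitrary.

```python
import apache_beam as beam

# Two rows with the same fields declared in a different order.
r1 = beam.Row(x=3, y=4)
r2 = beam.Row(y=4, x=3)

# Per the 2.31.0 notes, field order is now part of the inferred schema,
# so these two rows no longer compare equal.
print(r1 == r2)
```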
+ +# [2.30.0] - 2021-06-09 + +## I/Os + +* Allow splitting apart document serialization and IO for ElasticsearchIO +* Support Bulk API request size optimization through addition of ElasticsearchIO.Write.withStatefulBatches + +## New Features / Improvements + +* Added capability to declare resource hints in Java and Python SDKs ([BEAM-2085](https://issues.apache.org/jira/browse/BEAM-2085)). +* Added Spanner IO Performance tests for read and write. (Python) ([BEAM-10029](https://issues.apache.org/jira/browse/BEAM-10029)). +* Added support for accessing GCP PubSub Message ordering keys, message IDs and message publish timestamp (Python) ([BEAM-7819](https://issues.apache.org/jira/browse/BEAM-7819)). +* DataFrame API: Added support for collecting DataFrame objects in interactive Beam ([BEAM-11855](https://issues.apache.org/jira/browse/BEAM-11855)) +* DataFrame API: Added [apache_beam.examples.dataframe](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/dataframe) module ([BEAM-12024](https://issues.apache.org/jira/browse/BEAM-12024)) +* Upgraded the GCP Libraries BOM version to 20.0.0 ([BEAM-11205](https://issues.apache.org/jira/browse/BEAM-11205)). + For Google Cloud client library versions set by this BOM, see [this table](https://storage.googleapis.com/cloud-opensource-java-dashboard/com.google.cloud/libraries-bom/20.0.0/artifact_details.html). + +## Breaking Changes + +* Drop support for Flink 1.8 and 1.9 ([BEAM-11948](https://issues.apache.org/jira/browse/BEAM-11948)). +* MongoDbIO: Read.withFilter() and Read.withProjection() are removed since they are deprecated since + Beam 2.12.0 ([BEAM-12217](https://issues.apache.org/jira/browse/BEAM-12217)). +* RedisIO.readAll() was removed since it was deprecated since Beam 2.13.0. Please use + RedisIO.readKeyPatterns() for the equivalent functionality. + ([BEAM-12214](https://issues.apache.org/jira/browse/BEAM-12214)). +* MqttIO.create() with clientId constructor removed because it was deprecated since Beam + 2.13.0 ([BEAM-12216](https://issues.apache.org/jira/browse/BEAM-12216)). + +# [2.29.0] - 2021-04-29 + +## Highlights + +* Spark Classic and Portable runners officially support Spark 3 ([BEAM-7093](https://issues.apache.org/jira/browse/BEAM-7093)). +* Official Java 11 support for most runners (Dataflow, Flink, Spark) ([BEAM-2530](https://issues.apache.org/jira/browse/BEAM-2530)). +* DataFrame API now supports GroupBy.apply ([BEAM-11628](https://issues.apache.org/jira/browse/BEAM-11628)). + +## I/Os + +* Added support for S3 filesystem on AWS SDK V2 (Java) ([BEAM-7637](https://issues.apache.org/jira/browse/BEAM-7637)) + +## New Features / Improvements -# [2.26.0] - Cut, Not Yet released. +* DataFrame API now supports pandas 1.2.x ([BEAM-11531](https://issues.apache.org/jira/browse/BEAM-11531)). +* Multiple DataFrame API bugfixes ([BEAM-12071](https://issues.apache/jira/browse/BEAM-12071), [BEAM-11929](https://issues.apache/jira/browse/BEAM-11929)) + +## Breaking Changes + +* Deterministic coding enforced for GroupByKey and Stateful DoFns. Previously non-deterministic coding was allowed, resulting in keys not properly being grouped in some cases. ([BEAM-11719](https://issues.apache.org/jira/browse/BEAM-11719)) + To restore the old behavior, one can register `FakeDeterministicFastPrimitivesCoder` with + `beam.coders.registry.register_fallback_coder(beam.coders.coders.FakeDeterministicFastPrimitivesCoder())` + or use the `allow_non_deterministic_key_coders` pipeline option. 
+ +## Deprecations + +* Support for Flink 1.8 and 1.9 will be removed in the next release (2.30.0) ([BEAM-11948](https://issues.apache.org/jira/browse/BEAM-11948)). + +# [2.28.0] - 2021-02-22 + +## Highlights +* Many improvements related to Parquet support ([BEAM-11460](https://issues.apache.org/jira/browse/BEAM-11460), [BEAM-8202](https://issues.apache.org/jira/browse/BEAM-8202), and [BEAM-11526](https://issues.apache.org/jira/browse/BEAM-11526)) +* Hash Functions in BeamSQL ([BEAM-10074](https://issues.apache.org/jira/browse/BEAM-10074)) +* Hash functions in ZetaSQL ([BEAM-11624](https://issues.apache.org/jira/browse/BEAM-11624)) +* Create ApproximateDistinct using HLL Impl ([BEAM-10324](https://issues.apache.org/jira/browse/BEAM-10324)) + +## I/Os + +* SpannerIO supports using BigDecimal for Numeric fields ([BEAM-11643](https://issues.apache.org/jira/browse/BEAM-11643)) +* Add Beam schema support to ParquetIO ([BEAM-11526](https://issues.apache.org/jira/browse/BEAM-11526)) +* Support ParquetTable Writer ([BEAM-8202](https://issues.apache.org/jira/browse/BEAM-8202)) +* GCP BigQuery sink (streaming inserts) uses runner determined sharding ([BEAM-11408](https://issues.apache.org/jira/browse/BEAM-11408)) +* PubSub support types: TIMESTAMP, DATE, TIME, DATETIME ([BEAM-11533](https://issues.apache.org/jira/browse/BEAM-11533)) + +## New Features / Improvements + +* ParquetIO add methods _readGenericRecords_ and _readFilesGenericRecords_ can read files with an unknown schema. See [PR-13554](https://github.com/apache/beam/pull/13554) and ([BEAM-11460](https://issues.apache.org/jira/browse/BEAM-11460)) +* Added support for thrift in KafkaTableProvider ([BEAM-11482](https://issues.apache.org/jira/browse/BEAM-11482)) +* Added support for HadoopFormatIO to skip key/value clone ([BEAM-11457](https://issues.apache.org/jira/browse/BEAM-11457)) +* Support Conversion to GenericRecords in Convert.to transform ([BEAM-11571](https://issues.apache.org/jira/browse/BEAM-11571)). +* Support writes for Parquet Tables in Beam SQL ([BEAM-8202](https://issues.apache.org/jira/browse/BEAM-8202)). 
+* Support reading Parquet files with unknown schema ([BEAM-11460](https://issues.apache.org/jira/browse/BEAM-11460)) +* Support user configurable Hadoop Configuration flags for ParquetIO ([BEAM-11527](https://issues.apache.org/jira/browse/BEAM-11527)) +* Expose commit_offset_in_finalize and timestamp_policy to ReadFromKafka ([BEAM-11677](https://issues.apache.org/jira/browse/BEAM-11677)) +* S3 options does not provided to boto3 client while using FlinkRunner and Beam worker pool container ([BEAM-11799](https://issues.apache.org/jira/browse/BEAM-11799)) +* HDFS not deduplicating identical configuration paths ([BEAM-11329](https://issues.apache.org/jira/browse/BEAM-11329)) +* Hash Functions in BeamSQL ([BEAM-10074](https://issues.apache.org/jira/browse/BEAM-10074)) +* Create ApproximateDistinct using HLL Impl ([BEAM-10324](https://issues.apache.org/jira/browse/BEAM-10324)) +* Add Beam schema support to ParquetIO ([BEAM-11526](https://issues.apache.org/jira/browse/BEAM-11526)) +* Add a Deque Encoder ([BEAM-11538](https://issues.apache.org/jira/browse/BEAM-11538)) +* Hash functions in ZetaSQL ([BEAM-11624](https://issues.apache.org/jira/browse/BEAM-11624)) +* Refactor ParquetTableProvider ([](https://issues.apache.org/jira/browse/)) +* Add JVM properties to JavaJobServer ([BEAM-8344](https://issues.apache.org/jira/browse/BEAM-8344)) +* Single source of truth for supported Flink versions ([](https://issues.apache.org/jira/browse/)) +* Use metric for Python BigQuery streaming insert API latency logging ([BEAM-11018](https://issues.apache.org/jira/browse/BEAM-11018)) +* Use metric for Java BigQuery streaming insert API latency logging ([BEAM-11032](https://issues.apache.org/jira/browse/BEAM-11032)) +* Upgrade Flink runner to Flink versions 1.12.1 and 1.11.3 ([BEAM-11697](https://issues.apache.org/jira/browse/BEAM-11697)) +* Upgrade Beam base image to use Tensorflow 2.4.1 ([BEAM-11762](https://issues.apache.org/jira/browse/BEAM-11762)) +* Create Beam GCP BOM ([BEAM-11665](https://issues.apache.org/jira/browse/BEAM-11665)) + +## Breaking Changes + +* The Java artifacts "beam-sdks-java-io-kinesis", "beam-sdks-java-io-google-cloud-platform", and + "beam-sdks-java-extensions-sql-zetasql" declare Guava 30.1-jre dependency (It was 25.1-jre in Beam 2.27.0). + This new Guava version may introduce dependency conflicts if your project or dependencies rely + on removed APIs. If affected, ensure to use an appropriate Guava version via `dependencyManagement` in Maven and + `force` in Gradle. + + +# [2.27.0] - 2021-01-08 + +## I/Os +* ReadFromMongoDB can now be used with MongoDB Atlas (Python) ([BEAM-11266](https://issues.apache.org/jira/browse/BEAM-11266).) +* ReadFromMongoDB/WriteToMongoDB will mask password in display_data (Python) ([BEAM-11444](https://issues.apache.org/jira/browse/BEAM-11444).) +* Support for X source added (Java/Python) ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)). +* There is a new transform `ReadAllFromBigQuery` that can receive multiple requests to read data from BigQuery at pipeline runtime. See [PR 13170](https://github.com/apache/beam/pull/13170), and [BEAM-9650](https://issues.apache.org/jira/browse/BEAM-9650). + +## New Features / Improvements + +* Beam modules that depend on Hadoop are now tested for compatibility with Hadoop 3 ([BEAM-8569](https://issues.apache.org/jira/browse/BEAM-8569)). (Hive/HCatalog pending) +* Publishing Java 11 SDK container images now supported as part of Apache Beam release process. 
([BEAM-8106](https://issues.apache.org/jira/browse/BEAM-8106)) +* Added Cloud Bigtable Provider extension to Beam SQL ([BEAM-11173](https://issues.apache.org/jira/browse/BEAM-11173), [BEAM-11373](https://issues.apache.org/jira/browse/BEAM-11373)) +* Added a schema provider for thrift data ([BEAM-11338](https://issues.apache.org/jira/browse/BEAM-11338)) +* Added combiner packing pipeline optimization to Dataflow runner. ([BEAM-10641](https://issues.apache.org/jira/browse/BEAM-10641)) +* Support for the Deque structure by adding a coder ([BEAM-11538](https://issues.apache.org/jira/browse/BEAM-11538)) + +## Breaking Changes + +* HBaseIO hbase-shaded-client dependency should be now provided by the users ([BEAM-9278](https://issues.apache.org/jira/browse/BEAM-9278)). +* `--region` flag in amazon-web-services2 was replaced by `--awsRegion` ([BEAM-11331](https://issues.apache.org/jira/projects/BEAM/issues/BEAM-11331)). + +# [2.26.0] - 2020-12-11 ## Highlights * Splittable DoFn is now the default for executing the Read transform for Java based runners (Spark with bounded pipelines) in addition to existing runners from the 2.25.0 release (Direct, Flink, Jet, Samza, Twister2). The expected output of the Read transform is unchanged. Users can opt-out using `--experiments=use_deprecated_read`. The Apache Beam community is looking for feedback for this change as the community is planning to make this change permanent with no opt-out. If you run into an issue requiring the opt-out, please send an e-mail to [user@beam.apache.org](mailto:user@beam.apache.org) specifically referencing BEAM-10670 in the subject line and why you needed to opt-out. (Java) ([BEAM-10670](https://issues.apache.org/jira/browse/BEAM-10670)) -* New highly anticipated feature X added to Python SDK ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)). -* New highly anticipated feature Y added to Java SDK ([BEAM-Y](https://issues.apache.org/jira/browse/BEAM-Y)). ## I/Os @@ -100,21 +352,15 @@ * Added option to disable unnecessary copying between operators in Flink Runner (Java) ([BEAM-11146](https://issues.apache.org/jira/browse/BEAM-11146)) * Added CombineFn.setup and CombineFn.teardown to Python SDK. These methods let you initialize the CombineFn's state before any of the other methods of the CombineFn is executed and clean that state up later on. If you are using Dataflow, you need to enable Dataflow Runner V2 by passing `--experiments=use_runner_v2` before using this feature. ([BEAM-3736](https://issues.apache.org/jira/browse/BEAM-3736)) * Added support for NestedValueProvider for the Python SDK ([BEAM-10856](https://issues.apache.org/jira/browse/BEAM-10856)). -* X feature added (Java/Python) ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)). ## Breaking Changes -* X behavior was changed ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)). * BigQuery's DATETIME type now maps to Beam logical type org.apache.beam.sdk.schemas.logicaltypes.SqlTypes.DATETIME * Pandas 1.x is now required for dataframe operations. -## Deprecations - -* X behavior is deprecated and will be removed in X versions ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)). - ## Known Issues -* Fixed X (Java/Python) ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)). +* Non-idempotent combiners built via `CombineFn.from_callable()` or `CombineFn.maybe_from_callable()` can lead to incorrect behavior. ([BEAM-11522](https://issues.apache.org/jira/browse/BEAM-11522)). 
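
A minimal sketch of the `CombineFn.setup`/`CombineFn.teardown` hooks described in the 2.26.0 notes above. This is illustrative only: the class name, element values, and the resource being managed are invented for the example, and, as the note states, Dataflow users also need `--experiments=use_runner_v2`.

```python
import apache_beam as beam

class SumWithLifecycle(beam.CombineFn):
    """Hypothetical CombineFn showing where setup/teardown fit in."""

    def setup(self):
        # Called before any other CombineFn method on this instance;
        # a good place to build expensive state such as a client connection.
        self._client = object()  # stand-in for a real resource

    def create_accumulator(self):
        return 0

    def add_input(self, accumulator, element):
        return accumulator + element

    def merge_accumulators(self, accumulators):
        return sum(accumulators)

    def extract_output(self, accumulator):
        return accumulator

    def teardown(self):
        # Called when the instance is discarded; release the resource here.
        self._client = None

with beam.Pipeline() as pipeline:
    (pipeline
     | beam.Create([1, 2, 3, 4])
     | beam.CombineGlobally(SumWithLifecycle())
     | beam.Map(print))
```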
# [2.25.0] - 2020-10-23 @@ -122,8 +368,6 @@ ## Highlights * Splittable DoFn is now the default for executing the Read transform for Java based runners (Direct, Flink, Jet, Samza, Twister2). The expected output of the Read transform is unchanged. Users can opt-out using `--experiments=use_deprecated_read`. The Apache Beam community is looking for feedback for this change as the community is planning to make this change permanent with no opt-out. If you run into an issue requiring the opt-out, please send an e-mail to [user@beam.apache.org](mailto:user@beam.apache.org) specifically referencing BEAM-10670 in the subject line and why you needed to opt-out. (Java) ([BEAM-10670](https://issues.apache.org/jira/browse/BEAM-10670)) -* New highly anticipated feature X added to Python SDK ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)). -* New highly anticipated feature Y added to Java SDK ([BEAM-Y](https://issues.apache.org/jira/browse/BEAM-Y)). ## I/Os @@ -150,7 +394,6 @@ This has not enabled by default to preserve backwards compatibility; use the `--type_check_additional=ptransform_fn` flag to enable. It may be enabled by default in future versions of Beam. -* X feature added (Java/Python) ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)). ## Breaking Changes @@ -160,7 +403,6 @@ ## Deprecations * Python transform ReadFromSnowflake has been moved from `apache_beam.io.external.snowflake` to `apache_beam.io.snowflake`. The previous path will be removed in the future versions. -* X behavior is deprecated and will be removed in X versions ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)). ## Known Issues diff --git a/LICENSE b/LICENSE index 261eeb9e9f8b..3b335e38d36d 100644 --- a/LICENSE +++ b/LICENSE @@ -1,3 +1,4 @@ + Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ @@ -199,3 +200,208 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. + + A part of several convenience binary distributions of this software is licensed as follows: + + Google Protobuf: + Copyright 2008 Google Inc. All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above + copyright notice, this list of conditions and the following disclaimer + in the documentation and/or other materials provided with the + distribution. + * Neither the name of Google Inc. nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + Code generated by the Protocol Buffer compiler is owned by the owner + of the input file used when generating it. This code is not + standalone and requires a support library to be linked with it. This + support library is itself covered by the above license. + + jsr-305: + Copyright (c) 2007-2009, JSR305 expert group + All rights reserved. + + https://opensource.org/licenses/BSD-3-Clause + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + * Neither the name of the JSR305 expert group nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS + INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN + CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + POSSIBILITY OF SUCH DAMAGE. + + janino-compiler: + Janino - An embedded Java[TM] compiler + + Copyright (c) 2001-2016, Arno Unkrig + Copyright (c) 2015-2016 TIBCO Software Inc. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + 2. Redistributions in binary form must reproduce the above + copyright notice, this list of conditions and the following + disclaimer in the documentation and/or other materials + provided with the distribution. + 3. Neither the name of JANINO nor the names of its contributors + may be used to endorse or promote products derived from this + software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE + LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS + INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR + OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN + IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + jline: + Copyright (c) 2002-2016, the original author or authors. + All rights reserved. + + http://www.opensource.org/licenses/bsd-license.php + + Redistribution and use in source and binary forms, with or + without modification, are permitted provided that the following + conditions are met: + + Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + + Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer + in the documentation and/or other materials provided with + the distribution. + + Neither the name of JLine nor the names of its contributors + may be used to endorse or promote products derived from this + software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, + BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY + AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO + EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE + FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, + OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, + PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED + AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING + IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED + OF THE POSSIBILITY OF SUCH DAMAGE. + + sqlline: + SQLLine - Shell for issuing SQL to relational databases via JDBC + + Copyright (c) 2002,2003,2004,2005,2006,2007 Marc Prud'hommeaux + Copyright (c) 2004-2010 The Eigenbase Project + Copyright (c) 2013-2017 Julian Hyde + All rights reserved. + + =============================================================================== + + Licensed under the Modified BSD License (the "License"); you may not + use this file except in compliance with the License. You may obtain a + copy of the License at: + + http://opensource.org/licenses/BSD-3-Clause + + Redistribution and use in source and binary forms, + with or without modification, are permitted provided + that the following conditions are met: + + (1) Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + + (2) Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the + distribution. + + (3) The name of the author may not be used to endorse or promote + products derived from this software without specific prior written + permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + slf4j: + Copyright (c) 2004-2017 QOS.ch + All rights reserved. + + Permission is hereby granted, free of charge, to any person obtaining + a copy of this software and associated documentation files (the + "Software"), to deal in the Software without restriction, including + without limitation the rights to use, copy, modify, merge, publish, + distribute, sublicense, and/or sell copies of the Software, and to + permit persons to whom the Software is furnished to do so, subject to + the following conditions: + + The above copyright notice and this permission notice shall be + included in all copies or substantial portions of the Software. + + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE + LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION + OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION + WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE + +See the adjacent LICENSE.python file, if present, for additional licenses that +apply to parts of Apache Beam Python. \ No newline at end of file diff --git a/LICENSE.python b/LICENSE.python new file mode 100644 index 000000000000..f2f028b075fa --- /dev/null +++ b/LICENSE.python @@ -0,0 +1,258 @@ + +################################################################################ +CPython LICENSE. Source: +https://github.com/python/cpython/blob/81574b80e92554adf75c13fa42415beb8be383cb/LICENSE + +A. HISTORY OF THE SOFTWARE +========================== + +Python was created in the early 1990s by Guido van Rossum at Stichting +Mathematisch Centrum (CWI, see http://www.cwi.nl) in the Netherlands +as a successor of a language called ABC. Guido remains Python's +principal author, although it includes many contributions from others. + +In 1995, Guido continued his work on Python at the Corporation for +National Research Initiatives (CNRI, see http://www.cnri.reston.va.us) +in Reston, Virginia where he released several versions of the +software. + +In May 2000, Guido and the Python core development team moved to +BeOpen.com to form the BeOpen PythonLabs team. In October of the same +year, the PythonLabs team moved to Digital Creations, which became +Zope Corporation. In 2001, the Python Software Foundation (PSF, see +https://www.python.org/psf/) was formed, a non-profit organization +created specifically to own Python-related Intellectual Property. +Zope Corporation was a sponsoring member of the PSF. + +All Python releases are Open Source (see http://www.opensource.org for +the Open Source Definition). 
Historically, most, but not all, Python +releases have also been GPL-compatible; the table below summarizes +the various releases. + + Release Derived Year Owner GPL- + from compatible? (1) + + 0.9.0 thru 1.2 1991-1995 CWI yes + 1.3 thru 1.5.2 1.2 1995-1999 CNRI yes + 1.6 1.5.2 2000 CNRI no + 2.0 1.6 2000 BeOpen.com no + 1.6.1 1.6 2001 CNRI yes (2) + 2.1 2.0+1.6.1 2001 PSF no + 2.0.1 2.0+1.6.1 2001 PSF yes + 2.1.1 2.1+2.0.1 2001 PSF yes + 2.1.2 2.1.1 2002 PSF yes + 2.1.3 2.1.2 2002 PSF yes + 2.2 and above 2.1.1 2001-now PSF yes + +Footnotes: + +(1) GPL-compatible doesn't mean that we're distributing Python under + the GPL. All Python licenses, unlike the GPL, let you distribute + a modified version without making your changes open source. The + GPL-compatible licenses make it possible to combine Python with + other software that is released under the GPL; the others don't. + +(2) According to Richard Stallman, 1.6.1 is not GPL-compatible, + because its license has a choice of law clause. According to + CNRI, however, Stallman's lawyer has told CNRI's lawyer that 1.6.1 + is "not incompatible" with the GPL. + +Thanks to the many outside volunteers who have worked under Guido's +direction to make these releases possible. + +B. TERMS AND CONDITIONS FOR ACCESSING OR OTHERWISE USING PYTHON +=============================================================== + +PYTHON SOFTWARE FOUNDATION LICENSE VERSION 2 +-------------------------------------------- + +1. This LICENSE AGREEMENT is between the Python Software Foundation +("PSF"), and the Individual or Organization ("Licensee") accessing and +otherwise using this software ("Python") in source or binary form and +its associated documentation. + +2. Subject to the terms and conditions of this License Agreement, PSF hereby +grants Licensee a nonexclusive, royalty-free, world-wide license to reproduce, +analyze, test, perform and/or display publicly, prepare derivative works, +distribute, and otherwise use Python alone or in any derivative version, +provided, however, that PSF's License Agreement and PSF's notice of copyright, +i.e., "Copyright (c) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, +2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018 Python Software Foundation; All +Rights Reserved" are retained in Python alone or in any derivative version +prepared by Licensee. + +3. In the event Licensee prepares a derivative work that is based on +or incorporates Python or any part thereof, and wants to make +the derivative work available to others as provided herein, then +Licensee hereby agrees to include in any such work a brief summary of +the changes made to Python. + +4. PSF is making Python available to Licensee on an "AS IS" +basis. PSF MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR +IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PSF MAKES NO AND +DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS +FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON WILL NOT +INFRINGE ANY THIRD PARTY RIGHTS. + +5. PSF SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON +FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS +A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON, +OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF. + +6. This License Agreement will automatically terminate upon a material +breach of its terms and conditions. + +7. 
Nothing in this License Agreement shall be deemed to create any +relationship of agency, partnership, or joint venture between PSF and +Licensee. This License Agreement does not grant permission to use PSF +trademarks or trade name in a trademark sense to endorse or promote +products or services of Licensee, or any third party. + +8. By copying, installing or otherwise using Python, Licensee +agrees to be bound by the terms and conditions of this License +Agreement. + +BEOPEN.COM LICENSE AGREEMENT FOR PYTHON 2.0 +------------------------------------------- + +BEOPEN PYTHON OPEN SOURCE LICENSE AGREEMENT VERSION 1 + +1. This LICENSE AGREEMENT is between BeOpen.com ("BeOpen"), having an +office at 160 Saratoga Avenue, Santa Clara, CA 95051, and the +Individual or Organization ("Licensee") accessing and otherwise using +this software in source or binary form and its associated +documentation ("the Software"). + +2. Subject to the terms and conditions of this BeOpen Python License +Agreement, BeOpen hereby grants Licensee a non-exclusive, +royalty-free, world-wide license to reproduce, analyze, test, perform +and/or display publicly, prepare derivative works, distribute, and +otherwise use the Software alone or in any derivative version, +provided, however, that the BeOpen Python License is retained in the +Software, alone or in any derivative version prepared by Licensee. + +3. BeOpen is making the Software available to Licensee on an "AS IS" +basis. BEOPEN MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR +IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, BEOPEN MAKES NO AND +DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS +FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE SOFTWARE WILL NOT +INFRINGE ANY THIRD PARTY RIGHTS. + +4. BEOPEN SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF THE +SOFTWARE FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS +AS A RESULT OF USING, MODIFYING OR DISTRIBUTING THE SOFTWARE, OR ANY +DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF. + +5. This License Agreement will automatically terminate upon a material +breach of its terms and conditions. + +6. This License Agreement shall be governed by and interpreted in all +respects by the law of the State of California, excluding conflict of +law provisions. Nothing in this License Agreement shall be deemed to +create any relationship of agency, partnership, or joint venture +between BeOpen and Licensee. This License Agreement does not grant +permission to use BeOpen trademarks or trade names in a trademark +sense to endorse or promote products or services of Licensee, or any +third party. As an exception, the "BeOpen Python" logos available at +http://www.pythonlabs.com/logos.html may be used according to the +permissions granted on that web page. + +7. By copying, installing or otherwise using the software, Licensee +agrees to be bound by the terms and conditions of this License +Agreement. + + +CNRI LICENSE AGREEMENT FOR PYTHON 1.6.1 +--------------------------------------- + +1. This LICENSE AGREEMENT is between the Corporation for National +Research Initiatives, having an office at 1895 Preston White Drive, +Reston, VA 20191 ("CNRI"), and the Individual or Organization +("Licensee") accessing and otherwise using Python 1.6.1 software in +source or binary form and its associated documentation. + +2. 
Subject to the terms and conditions of this License Agreement, CNRI +hereby grants Licensee a nonexclusive, royalty-free, world-wide +license to reproduce, analyze, test, perform and/or display publicly, +prepare derivative works, distribute, and otherwise use Python 1.6.1 +alone or in any derivative version, provided, however, that CNRI's +License Agreement and CNRI's notice of copyright, i.e., "Copyright (c) +1995-2001 Corporation for National Research Initiatives; All Rights +Reserved" are retained in Python 1.6.1 alone or in any derivative +version prepared by Licensee. Alternately, in lieu of CNRI's License +Agreement, Licensee may substitute the following text (omitting the +quotes): "Python 1.6.1 is made available subject to the terms and +conditions in CNRI's License Agreement. This Agreement together with +Python 1.6.1 may be located on the Internet using the following +unique, persistent identifier (known as a handle): 1895.22/1013. This +Agreement may also be obtained from a proxy server on the Internet +using the following URL: http://hdl.handle.net/1895.22/1013". + +3. In the event Licensee prepares a derivative work that is based on +or incorporates Python 1.6.1 or any part thereof, and wants to make +the derivative work available to others as provided herein, then +Licensee hereby agrees to include in any such work a brief summary of +the changes made to Python 1.6.1. + +4. CNRI is making Python 1.6.1 available to Licensee on an "AS IS" +basis. CNRI MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR +IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, CNRI MAKES NO AND +DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS +FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON 1.6.1 WILL NOT +INFRINGE ANY THIRD PARTY RIGHTS. + +5. CNRI SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON +1.6.1 FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS +A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON 1.6.1, +OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF. + +6. This License Agreement will automatically terminate upon a material +breach of its terms and conditions. + +7. This License Agreement shall be governed by the federal +intellectual property law of the United States, including without +limitation the federal copyright law, and, to the extent such +U.S. federal law does not apply, by the law of the Commonwealth of +Virginia, excluding Virginia's conflict of law provisions. +Notwithstanding the foregoing, with regard to derivative works based +on Python 1.6.1 that incorporate non-separable material that was +previously distributed under the GNU General Public License (GPL), the +law of the Commonwealth of Virginia shall govern this License +Agreement only as to issues arising under or with respect to +Paragraphs 4, 5, and 7 of this License Agreement. Nothing in this +License Agreement shall be deemed to create any relationship of +agency, partnership, or joint venture between CNRI and Licensee. This +License Agreement does not grant permission to use CNRI trademarks or +trade name in a trademark sense to endorse or promote products or +services of Licensee, or any third party. + +8. By clicking on the "ACCEPT" button where indicated, or by copying, +installing or otherwise using Python 1.6.1, Licensee agrees to be +bound by the terms and conditions of this License Agreement. 
+ + ACCEPT + + +CWI LICENSE AGREEMENT FOR PYTHON 0.9.0 THROUGH 1.2 +-------------------------------------------------- + +Copyright (c) 1991 - 1995, Stichting Mathematisch Centrum Amsterdam, +The Netherlands. All rights reserved. + +Permission to use, copy, modify, and distribute this software and its +documentation for any purpose and without fee is hereby granted, +provided that the above copyright notice appear in all copies and that +both that copyright notice and this permission notice appear in +supporting documentation, and that the name of Stichting Mathematisch +Centrum or CWI not be used in advertising or publicity pertaining to +distribution of the software without specific, written prior +permission. + +STICHTING MATHEMATISCH CENTRUM DISCLAIMS ALL WARRANTIES WITH REGARD TO +THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND +FITNESS, IN NO EVENT SHALL STICHTING MATHEMATISCH CENTRUM BE LIABLE +FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES +WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN +ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT +OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. + diff --git a/README.md b/README.md index a1fc8d1c7223..12c60dbf303a 100644 --- a/README.md +++ b/README.md @@ -19,7 +19,7 @@ # Apache Beam -[Apache Beam](http://beam.apache.org/) is a unified model for defining both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and Runners for executing them on distributed processing backends, including [Apache Flink](http://flink.apache.org/), [Apache Spark](http://spark.apache.org/), [Google Cloud Dataflow](http://cloud.google.com/dataflow/) and [Hazelcast Jet](https://jet.hazelcast.org/). +[Apache Beam](http://beam.apache.org/) is a unified model for defining both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and Runners for executing them on distributed processing backends, including [Apache Flink](http://flink.apache.org/), [Apache Spark](http://spark.apache.org/), [Google Cloud Dataflow](http://cloud.google.com/dataflow/), and [Hazelcast Jet](https://jet.hazelcast.org/). ## Status @@ -36,10 +36,10 @@ Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2 --- | --- | --- | --- | --- | --- | --- -Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) | --- -Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/) -Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/) | --- -XLang | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Dataflow/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/) | --- +Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) | --- +Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/) +Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/)
[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/) | --- +XLang | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Dataflow/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark3/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark3/lastCompletedBuild/) | --- ## Overview diff --git a/build.gradle b/build.gradle deleted file mode 100644 index 15cfc245c8e9..000000000000 --- a/build.gradle +++ /dev/null @@ -1,392 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * License); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an AS IS BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -plugins { - id 'base' - // This plugin provides a task to determine which dependencies have updates. - // Additionally, the plugin checks for updates to Gradle itself. - // - // See https://github.com/ben-manes/gradle-versions-plugin for further details. - id 'com.github.ben-manes.versions' version '0.33.0' - // Apply one top level rat plugin to perform any required license enforcement analysis - id 'org.nosphere.apache.rat' version '0.7.0' - // Enable gradle-based release management - id 'net.researchgate.release' version '2.8.1' - id 'org.apache.beam.module' - id "org.sonarqube" version "3.0" -} - -/*************************************************************************************************/ -// Configure the root project - -rat { - // Set input directory to that of the root project instead of the CWD. 
This - // makes .gitignore rules (added below) work properly. - inputDir = project.rootDir - - def exclusions = [ - // Ignore files we track but do not distribute - "**/.github/**/*", - "gradlew", - "gradlew.bat", - "gradle/wrapper/gradle-wrapper.properties", - - "**/package-list", - "**/test.avsc", - "**/user.avsc", - "**/test/resources/**/*.txt", - "**/test/**/.placeholder", - - // Default eclipse excludes neglect subprojects - - // Proto/grpc generated wrappers - "**/apache_beam/portability/api/*_pb2*.py", - "**/go/pkg/beam/**/*.pb.go", - - // Ignore go.sum files, which don't permit headers - "**/go.sum", - - // Ignore Go test data files - "**/go/data/**", - - // VCF test files - "**/apache_beam/testing/data/vcf/*", - - // JDBC package config files - "**/META-INF/services/java.sql.Driver", - - // Website build files - "**/Gemfile.lock", - "**/Rakefile", - "**/.htaccess", - "website/www/site/assets/scss/_bootstrap.scss", - "website/www/site/assets/scss/bootstrap/**/*", - "website/www/site/static/images/mascot/*.ai", - "website/www/site/static/js/bootstrap*.js", - "website/www/site/static/js/bootstrap/**/*", - "website/www/site/themes", - "website/www/yarn.lock", - "website/www/package.json", - - // Ignore ownership files - "ownership/**/*", - "**/OWNERS", - - // Json doesn't support comments. - "**/*.json", - - // Katas files - "learning/katas/**/course-remote-info.yaml", - "learning/katas/**/section-remote-info.yaml", - "learning/katas/**/lesson-remote-info.yaml", - "learning/katas/**/task-remote-info.yaml", - "learning/katas/**/*.txt", - - // test p8 file for SnowflakeIO - "sdks/java/io/snowflake/src/test/resources/invalid_test_rsa_key.p8", - "sdks/java/io/snowflake/src/test/resources/valid_test_rsa_key.p8", - - // Mockito extensions - "sdks/java/io/amazon-web-services2/src/test/resources/mockito-extensions/org.mockito.plugins.MockMaker", - "sdks/java/io/azure/src/test/resources/mockito-extensions/org.mockito.plugins.MockMaker", - "sdks/java/extensions/ml/src/test/resources/mockito-extensions/org.mockito.plugins.MockMaker", - - // JupyterLab extensions - "sdks/python/apache_beam/runners/interactive/extensions/apache-beam-jupyterlab-sidepanel/yarn.lock", - - // Sample text file for Java quickstart - "sdks/java/maven-archetypes/examples/sample.txt" - ] - - // Add .gitignore excludes to the Apache Rat exclusion list. We re-create the behavior - // of the Apache Maven Rat plugin since the Apache Ant Rat plugin doesn't do this - // automatically. - def gitIgnore = project(':').file('.gitignore') - if (gitIgnore.exists()) { - def gitIgnoreExcludes = gitIgnore.readLines().findAll { !it.isEmpty() && !it.startsWith('#') } - exclusions.addAll(gitIgnoreExcludes) - } - - failOnError = true - excludes = exclusions -} -check.dependsOn rat - -// Define root pre/post commit tasks simplifying what is needed -// to be specified on the commandline when executing locally. -// This indirection also makes Jenkins use the branch of the PR -// for the test definitions. -task javaPreCommit() { - // We need to list the model/* builds since sdks/java/core doesn't - // depend on any of the model. 
- dependsOn ":model:pipeline:build" - dependsOn ":model:job-management:build" - dependsOn ":model:fn-execution:build" - dependsOn ":runners:google-cloud-dataflow-java:worker:legacy-worker:build" - dependsOn ":sdks:java:core:buildNeeded" - dependsOn ":sdks:java:core:buildDependents" - dependsOn ":examples:java:preCommit" - dependsOn ":sdks:java:extensions:sql:jdbc:preCommit" - dependsOn ":sdks:java:javadoc:allJavadoc" - dependsOn ":runners:direct-java:needsRunnerTests" - dependsOn ":sdks:java:container:java8:docker" -} - -task sqlPreCommit() { - dependsOn ":sdks:java:extensions:sql:runBasicExample" - dependsOn ":sdks:java:extensions:sql:runPojoExample" - dependsOn ":sdks:java:extensions:sql:build" - dependsOn ":sdks:java:extensions:sql:buildDependents" -} - -task javaPreCommitPortabilityApi() { - dependsOn ":runners:google-cloud-dataflow-java:worker:build" -} - -task javaPostCommit() { - dependsOn ":runners:google-cloud-dataflow-java:postCommit" - dependsOn ":sdks:java:extensions:google-cloud-platform-core:postCommit" - dependsOn ":sdks:java:extensions:zetasketch:postCommit" - dependsOn ":sdks:java:io:google-cloud-platform:postCommit" - dependsOn ":sdks:java:io:kinesis:integrationTest" - dependsOn ":sdks:java:extensions:ml:postCommit" - dependsOn ":javaHadoopVersionsTest" - dependsOn ":sdks:java:io:kafka:kafkaVersionsCompatibilityTest" -} - -task javaHadoopVersionsTest() { - dependsOn ":sdks:java:io:hadoop-common:hadoopVersionsTest" - dependsOn ":sdks:java:io:hadoop-file-system:hadoopVersionsTest" - dependsOn ":sdks:java:io:hadoop-format:hadoopVersionsTest" - dependsOn ":sdks:java:io:hcatalog:hadoopVersionsTest" - dependsOn ":sdks:java:io:parquet:hadoopVersionsTest" - dependsOn ":sdks:java:extensions:sorter:hadoopVersionsTest" - dependsOn ":runners:spark:hadoopVersionsTest" -} - -task sqlPostCommit() { - dependsOn ":sdks:java:extensions:sql:postCommit" - dependsOn ":sdks:java:extensions:sql:jdbc:postCommit" - dependsOn ":sdks:java:extensions:sql:datacatalog:postCommit" - dependsOn ":sdks:java:extensions:sql:hadoopVersionsTest" -} - -task goPreCommit() { - dependsOn ":sdks:go:goBuild" - dependsOn ":sdks:go:goTest" - - dependsOn ":sdks:go:examples:goBuild" - dependsOn ":sdks:go:test:goBuild" - - // Ensure all container Go boot code builds as well. 
- dependsOn ":sdks:java:container:goBuild" - dependsOn ":sdks:python:container:goBuild" - dependsOn ":sdks:go:container:goBuild" -} - -task goPostCommit() { - dependsOn ":goIntegrationTests" -} - -task goIntegrationTests() { - doLast { - exec { - executable 'sh' - args '-c', './sdks/go/test/run_integration_tests.sh' - } - } - dependsOn ":sdks:go:test:build" - dependsOn ":runners:google-cloud-dataflow-java:worker:shadowJar" -} - -task pythonPreCommit() { - dependsOn ":sdks:python:test-suites:tox:pycommon:preCommitPyCommon" - dependsOn ":sdks:python:test-suites:tox:py36:preCommitPy36" - dependsOn ":sdks:python:test-suites:tox:py37:preCommitPy37" - dependsOn ":sdks:python:test-suites:tox:py38:preCommitPy38" - dependsOn ":sdks:python:test-suites:dataflow:preCommitIT" - dependsOn ":sdks:python:test-suites:dataflow:preCommitIT_V2" -} - -task pythonDocsPreCommit() { - dependsOn ":sdks:python:test-suites:tox:pycommon:docs" -} - -task pythonDockerBuildPreCommit() { - dependsOn ":sdks:python:container:py36:docker" - dependsOn ":sdks:python:container:py37:docker" - dependsOn ":sdks:python:container:py38:docker" -} - -task pythonLintPreCommit() { - dependsOn ":sdks:python:test-suites:tox:py37:lint" -} - -task pythonFormatterPreCommit() { - dependsOn 'sdks:python:test-suites:tox:py38:formatter' -} - -task python36PostCommit() { - dependsOn ":sdks:python:test-suites:dataflow:py36:postCommitIT" - dependsOn ":sdks:python:test-suites:direct:py36:postCommitIT" - dependsOn ":sdks:python:test-suites:portable:py36:postCommitPy36" -} - -task python37PostCommit() { - dependsOn ":sdks:python:test-suites:dataflow:py37:postCommitIT" - dependsOn ":sdks:python:test-suites:direct:py37:postCommitIT" - dependsOn ":sdks:python:test-suites:direct:py37:directRunnerIT" - dependsOn ":sdks:python:test-suites:direct:py37:hdfsIntegrationTest" - dependsOn ":sdks:python:test-suites:direct:py37:mongodbioIT" - dependsOn ":sdks:python:test-suites:portable:py37:postCommitPy37" -} - -task python38PostCommit() { - dependsOn ":sdks:python:test-suites:dataflow:py38:postCommitIT" - dependsOn ":sdks:python:test-suites:direct:py38:postCommitIT" - dependsOn ":sdks:python:test-suites:direct:py38:hdfsIntegrationTest" - dependsOn ":sdks:python:test-suites:portable:py38:postCommitPy38" -} - -task portablePythonPreCommit() { - dependsOn ":sdks:python:test-suites:portable:py36:preCommitPy36" - dependsOn ":sdks:python:test-suites:portable:py37:preCommitPy37" -} - -task pythonSparkPostCommit() { - dependsOn ":sdks:python:test-suites:portable:py36:sparkValidatesRunner" - dependsOn ":sdks:python:test-suites:portable:py37:sparkValidatesRunner" - dependsOn ":sdks:python:test-suites:portable:py38:sparkValidatesRunner" -} - -task websitePreCommit() { - dependsOn ":website:preCommit" -} - -task communityMetricsPreCommit() { - dependsOn ":beam-test-infra-metrics:preCommit" -} - -task communityMetricsProber() { - dependsOn ":beam-test-infra-metrics:checkProber" -} - -task javaExamplesDataflowPrecommit() { - dependsOn ":runners:google-cloud-dataflow-java:examples:preCommit" - dependsOn ":runners:google-cloud-dataflow-java:examples-streaming:preCommit" -} - -task runBeamDependencyCheck() { - dependsOn ":dependencyUpdates" - dependsOn ":sdks:python:dependencyUpdates" -} - -task whitespacePreCommit() { - dependsOn ":sdks:python:test-suites:tox:py38:archiveFilesToLint" - dependsOn ":sdks:python:test-suites:tox:py38:unpackFilesToLint" - dependsOn ":sdks:python:test-suites:tox:py38:whitespacelint" -} - -task typescriptPreCommit() { - dependsOn 
":sdks:python:test-suites:tox:py38:eslint" - dependsOn ":sdks:python:test-suites:tox:py38:jest" -} - -// Configure the release plugin to do only local work; the release manager determines what, if -// anything, to push. On failure, the release manager can reset the branch without pushing. -release { - revertOnFail = false - tagTemplate = 'v${version}' - git { - requireBranch = 'release-.*|master' - pushToRemote = '' - } -} - -// Reports linkage errors across multiple Apache Beam artifact ids. -// -// To use (from the root of project): -// ./gradlew -Ppublishing -PjavaLinkageArtifactIds=artifactId1,artifactId2,... :checkJavaLinkage -// -// For example: -// ./gradlew -Ppublishing -PjavaLinkageArtifactIds=beam-sdks-java-core,beam-sdks-java-io-jdbc :checkJavaLinkage -// -// Note that this task publishes artifacts into your local Maven repository. -if (project.hasProperty('javaLinkageArtifactIds')) { - if (!project.hasProperty('publishing')) { - throw new GradleException('You can only check linkage of Java artifacts if you specify -Ppublishing on the command line as well.') - } - - configurations { linkageCheckerJava } - dependencies { - linkageCheckerJava "com.google.cloud.tools:dependencies:1.5.0" - } - - // We need to evaluate all the projects first so that we can find depend on all the - // publishMavenJavaPublicationToMavenLocal tasks below. - for (p in rootProject.subprojects) { - if (!p.path.equals(project.path)) { - evaluationDependsOn(p.path) - } - } - - project.task('checkJavaLinkage', type: JavaExec) { - dependsOn project.getTasksByName('publishMavenJavaPublicationToMavenLocal', true /* recursively */) - classpath = project.configurations.linkageCheckerJava - main = 'com.google.cloud.tools.opensource.classpath.LinkageCheckerMain' - def arguments = ['-a', project.javaLinkageArtifactIds.split(',').collect({ - if (it.contains(':')) { - "${project.ext.mavenGroupId}:${it}" - } else { - // specify the version if not provided - "${project.ext.mavenGroupId}:${it}:${project.version}" - } - }).join(',')] - - // Exclusion file filters out existing linkage errors before a change - if (project.hasProperty('javaLinkageWriteBaseline')) { - arguments += ['--output-exclusion-file', project.getProperty('javaLinkageWriteBaseline')] - } else if (project.hasProperty('javaLinkageReadBaseline')) { - arguments += ['--exclusion-file', project.getProperty('javaLinkageReadBaseline')] - } - args arguments - doLast { - println "NOTE: This task published artifacts into your local Maven repository. You may want to remove them manually." 
- } - } -} -if (project.hasProperty('compileAndRunTestsWithJava11')) { - project.javaPreCommitPortabilityApi.dependsOn ':sdks:java:testing:test-utils:verifyJavaVersion' - project.javaExamplesDataflowPrecommit.dependsOn ':sdks:java:testing:test-utils:verifyJavaVersion' - project.sqlPreCommit.dependsOn ':sdks:java:testing:test-utils:verifyJavaVersion' -} else { - allprojects { - tasks.withType(Test).configureEach { - exclude '**/JvmVerification.class' - } - } -} - -// Ignore sdk.properties so it doesn't spoil the build cache unnecessarily (see https://docs.gradle.org/6.6/userguide/more_about_tasks.html#sec:configure_input_normalization) -allprojects { - normalization { - runtimeClasspath { - ignore('**/sdk.properties') - } - } -} \ No newline at end of file diff --git a/build.gradle.kts b/build.gradle.kts new file mode 100644 index 000000000000..72a7725a6756 --- /dev/null +++ b/build.gradle.kts @@ -0,0 +1,430 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +plugins { + base + // This plugin provides a task to determine which dependencies have updates. + // Additionally, the plugin checks for updates to Gradle itself. + // + // See https://github.com/ben-manes/gradle-versions-plugin for further details. + id("com.github.ben-manes.versions") version "0.33.0" + // Apply one top level rat plugin to perform any required license enforcement analysis + id("org.nosphere.apache.rat") version "0.7.0" + // Enable gradle-based release management + id("net.researchgate.release") version "2.8.1" + id("org.apache.beam.module") + id("org.sonarqube") version "3.0" +} + +/*************************************************************************************************/ +// Configure the root project + +tasks.rat { + // Set input directory to that of the root project instead of the CWD. This + // makes .gitignore rules (added below) work properly. 
+ inputDir.set(project.rootDir) + + val exclusions = mutableListOf( + // Ignore files we track but do not distribute + "**/.github/**/*", + "gradlew", + "gradlew.bat", + "gradle/wrapper/gradle-wrapper.properties", + + "**/package-list", + "**/test.avsc", + "**/user.avsc", + "**/test/resources/**/*.txt", + "**/test/resources/**/*.csv", + "**/test/**/.placeholder", + + // Default eclipse excludes neglect subprojects + + // Proto/grpc generated wrappers + "**/apache_beam/portability/api/*_pb2*.py", + "**/go/pkg/beam/**/*.pb.go", + + // Ignore go.sum files, which don't permit headers + "**/go.sum", + + // Ignore Go test data files + "**/go/data/**", + + // VCF test files + "**/apache_beam/testing/data/vcf/*", + + // JDBC package config files + "**/META-INF/services/java.sql.Driver", + + // Website build files + "**/Gemfile.lock", + "**/Rakefile", + "**/.htaccess", + "website/www/site/assets/scss/_bootstrap.scss", + "website/www/site/assets/scss/bootstrap/**/*", + "website/www/site/static/images/mascot/*.ai", + "website/www/site/static/js/bootstrap*.js", + "website/www/site/static/js/bootstrap/**/*", + "website/www/site/themes", + "website/www/yarn.lock", + "website/www/package.json", + "website/www/site/static/js/hero/lottie-light.min.js", + "website/www/site/static/js/keen-slider.min.js", + "website/www/site/assets/scss/_keen-slider.scss", + + // Ignore ownership files + "ownership/**/*", + "**/OWNERS", + + // Ignore CPython LICENSE file + "LICENSE.python", + + // Json doesn't support comments. + "**/*.json", + + // Katas files + "learning/katas/**/course-remote-info.yaml", + "learning/katas/**/section-remote-info.yaml", + "learning/katas/**/lesson-remote-info.yaml", + "learning/katas/**/task-remote-info.yaml", + "learning/katas/**/*.txt", + + // test p8 file for SnowflakeIO + "sdks/java/io/snowflake/src/test/resources/invalid_test_rsa_key.p8", + "sdks/java/io/snowflake/src/test/resources/valid_test_rsa_key.p8", + + // Mockito extensions + "sdks/java/io/amazon-web-services2/src/test/resources/mockito-extensions/org.mockito.plugins.MockMaker", + "sdks/java/io/azure/src/test/resources/mockito-extensions/org.mockito.plugins.MockMaker", + "sdks/java/extensions/ml/src/test/resources/mockito-extensions/org.mockito.plugins.MockMaker", + + // JupyterLab extensions + "sdks/python/apache_beam/runners/interactive/extensions/apache-beam-jupyterlab-sidepanel/yarn.lock", + + // Sample text file for Java quickstart + "sdks/java/maven-archetypes/examples/sample.txt" + ) + + // Add .gitignore excludes to the Apache Rat exclusion list. We re-create the behavior + // of the Apache Maven Rat plugin since the Apache Ant Rat plugin doesn't do this + // automatically. + val gitIgnore = project(":").file(".gitignore") + if (gitIgnore.exists()) { + val gitIgnoreExcludes = gitIgnore.readLines().filter { it.isNotEmpty() && !it.startsWith("#") } + exclusions.addAll(gitIgnoreExcludes) + } + + failOnError.set(true) + setExcludes(exclusions) +} +tasks.check.get().dependsOn(tasks.rat) + +// Define root pre/post commit tasks simplifying what is needed +// to be specified on the commandline when executing locally. +// This indirection also makes Jenkins use the branch of the PR +// for the test definitions. +task("javaPreCommit") { + // We need to list the model/* builds since sdks/java/core doesn't + // depend on any of the model. 
+ dependsOn(":model:pipeline:build") + dependsOn(":model:job-management:build") + dependsOn(":model:fn-execution:build") + dependsOn(":runners:google-cloud-dataflow-java:worker:legacy-worker:build") + dependsOn(":sdks:java:core:buildNeeded") + dependsOn(":sdks:java:core:buildDependents") + dependsOn(":examples:java:preCommit") + dependsOn(":examples:java:twitter:preCommit") + dependsOn(":sdks:java:extensions:sql:jdbc:preCommit") + dependsOn(":sdks:java:javadoc:allJavadoc") + dependsOn(":runners:direct-java:needsRunnerTests") + dependsOn(":sdks:java:container:java8:docker") +} + +task("sqlPreCommit") { + dependsOn(":sdks:java:extensions:sql:runBasicExample") + dependsOn(":sdks:java:extensions:sql:runPojoExample") + dependsOn(":sdks:java:extensions:sql:build") + dependsOn(":sdks:java:extensions:sql:buildDependents") +} + +task("javaPreCommitPortabilityApi") { + dependsOn(":runners:google-cloud-dataflow-java:worker:build") +} + +task("javaPostCommit") { + dependsOn(":sdks:java:extensions:google-cloud-platform-core:postCommit") + dependsOn(":sdks:java:extensions:zetasketch:postCommit") + dependsOn(":sdks:java:io:debezium:integrationTest") + dependsOn(":sdks:java:io:google-cloud-platform:postCommit") + dependsOn(":sdks:java:io:kinesis:integrationTest") + dependsOn(":sdks:java:extensions:ml:postCommit") + dependsOn(":sdks:java:io:kafka:kafkaVersionsCompatibilityTest") +} + +task("javaHadoopVersionsTest") { + dependsOn(":sdks:java:io:hadoop-common:hadoopVersionsTest") + dependsOn(":sdks:java:io:hadoop-file-system:hadoopVersionsTest") + dependsOn(":sdks:java:io:hadoop-format:hadoopVersionsTest") + dependsOn(":sdks:java:io:hcatalog:hadoopVersionsTest") + dependsOn(":sdks:java:io:parquet:hadoopVersionsTest") + dependsOn(":sdks:java:extensions:sorter:hadoopVersionsTest") + dependsOn(":runners:spark:2:hadoopVersionsTest") +} + +task("sqlPostCommit") { + dependsOn(":sdks:java:extensions:sql:postCommit") + dependsOn(":sdks:java:extensions:sql:jdbc:postCommit") + dependsOn(":sdks:java:extensions:sql:datacatalog:postCommit") + dependsOn(":sdks:java:extensions:sql:hadoopVersionsTest") +} + +task("goPreCommit") { + // Ensure the Precommit builds run after the tests, in order to avoid the + // flake described in BEAM-11918. This is done by splitting them into two + // tasks and using "mustRunAfter" to enforce ordering. + dependsOn(":goPrecommitTest") + dependsOn(":goPrecommitBuild") +} + +task("goPrecommitTest") { + dependsOn(":sdks:go:goTest") +} + +task("goPrecommitBuild") { + mustRunAfter(":goPrecommitTest") + + dependsOn(":sdks:go:goBuild") + dependsOn(":sdks:go:examples:goBuild") + dependsOn(":sdks:go:test:goBuild") + + // Ensure all container Go boot code builds as well. 
+ dependsOn(":sdks:java:container:goBuild") + dependsOn(":sdks:python:container:goBuild") + dependsOn(":sdks:go:container:goBuild") +} + +task("goPortablePreCommit") { + dependsOn(":sdks:go:test:ulrValidatesRunner") +} + +task("goPostCommit") { + dependsOn(":goIntegrationTests") +} + +task("goIntegrationTests") { + doLast { + exec { + executable("sh") + args("-c", "./sdks/go/test/run_validatesrunner_tests.sh --runner dataflow") + } + } + dependsOn(":sdks:go:test:build") + dependsOn(":runners:google-cloud-dataflow-java:worker:shadowJar") +} + +task("pythonPreCommit") { + dependsOn(":sdks:python:test-suites:tox:pycommon:preCommitPyCommon") + dependsOn(":sdks:python:test-suites:tox:py36:preCommitPy36") + dependsOn(":sdks:python:test-suites:tox:py37:preCommitPy37") + dependsOn(":sdks:python:test-suites:tox:py38:preCommitPy38") + dependsOn(":sdks:python:test-suites:dataflow:preCommitIT") + dependsOn(":sdks:python:test-suites:dataflow:preCommitIT_V2") +} + +task("pythonDocsPreCommit") { + dependsOn(":sdks:python:test-suites:tox:pycommon:docs") +} + +task("pythonDockerBuildPreCommit") { + dependsOn(":sdks:python:container:py36:docker") + dependsOn(":sdks:python:container:py37:docker") + dependsOn(":sdks:python:container:py38:docker") +} + +task("pythonLintPreCommit") { + dependsOn(":sdks:python:test-suites:tox:py37:lint") +} + +task("pythonFormatterPreCommit") { + dependsOn("sdks:python:test-suites:tox:py38:formatter") +} + +task("python36PostCommit") { + dependsOn(":sdks:python:test-suites:dataflow:py36:postCommitIT") + dependsOn(":sdks:python:test-suites:direct:py36:postCommitIT") + dependsOn(":sdks:python:test-suites:portable:py36:postCommitPy36") +} + +task("python37PostCommit") { + dependsOn(":sdks:python:test-suites:dataflow:py37:postCommitIT") + dependsOn(":sdks:python:test-suites:direct:py37:postCommitIT") + dependsOn(":sdks:python:test-suites:direct:py37:directRunnerIT") + dependsOn(":sdks:python:test-suites:direct:py37:hdfsIntegrationTest") + dependsOn(":sdks:python:test-suites:direct:py37:mongodbioIT") + dependsOn(":sdks:python:test-suites:portable:py37:postCommitPy37") +} + +task("python38PostCommit") { + dependsOn(":sdks:python:test-suites:dataflow:py38:postCommitIT") + dependsOn(":sdks:python:test-suites:direct:py38:postCommitIT") + dependsOn(":sdks:python:test-suites:direct:py38:hdfsIntegrationTest") + dependsOn(":sdks:python:test-suites:portable:py38:postCommitPy38") +} + +task("portablePythonPreCommit") { + dependsOn(":sdks:python:test-suites:portable:py36:preCommitPy36") + dependsOn(":sdks:python:test-suites:portable:py37:preCommitPy37") +} + +task("pythonSparkPostCommit") { + dependsOn(":sdks:python:test-suites:portable:py36:sparkValidatesRunner") + dependsOn(":sdks:python:test-suites:portable:py37:sparkValidatesRunner") + dependsOn(":sdks:python:test-suites:portable:py38:sparkValidatesRunner") +} + +task("websitePreCommit") { + dependsOn(":website:preCommit") +} + +task("communityMetricsPreCommit") { + dependsOn(":beam-test-infra-metrics:preCommit") +} + +task("communityMetricsProber") { + dependsOn(":beam-test-infra-metrics:checkProber") +} + +task("javaExamplesDataflowPrecommit") { + dependsOn(":runners:google-cloud-dataflow-java:examples:preCommit") + dependsOn(":runners:google-cloud-dataflow-java:examples-streaming:preCommit") +} + +task("runBeamDependencyCheck") { + dependsOn(":dependencyUpdates") + dependsOn(":sdks:python:dependencyUpdates") +} + +task("whitespacePreCommit") { + dependsOn(":sdks:python:test-suites:tox:py38:archiveFilesToLint") + 
dependsOn(":sdks:python:test-suites:tox:py38:unpackFilesToLint") + dependsOn(":sdks:python:test-suites:tox:py38:whitespacelint") +} + +task("typescriptPreCommit") { + dependsOn(":sdks:python:test-suites:tox:py38:eslint") + dependsOn(":sdks:python:test-suites:tox:py38:jest") +} + +task("pushAllDockerImages") { + dependsOn(":runners:spark:2:job-server:container:dockerPush") + dependsOn(":runners:spark:3:job-server:container:dockerPush") + dependsOn(":sdks:java:container:pushAll") + dependsOn(":sdks:python:container:pushAll") + dependsOn(":sdks:go:container:pushAll") + for (version in project.ext.get("allFlinkVersions") as Array<*>) { + dependsOn(":runners:flink:${version}:job-server-container:dockerPush") + } +} + +// Use this task to validate the environment set up for Go, Python and Java +task("checkSetup") { + dependsOn(":sdks:go:examples:wordCount") + dependsOn(":sdks:python:wordCount") + dependsOn(":examples:java:wordCount") +} + +// Configure the release plugin to do only local work; the release manager determines what, if +// anything, to push. On failure, the release manager can reset the branch without pushing. +release { + revertOnFail = false + tagTemplate = "v${version}" + // workaround from https://github.com/researchgate/gradle-release/issues/281#issuecomment-466876492 + release { + with (propertyMissing("git") as net.researchgate.release.GitAdapter.GitConfig) { + requireBranch = "release-.*|master" + pushToRemote = "" + } + } +} + +// Reports linkage errors across multiple Apache Beam artifact ids. +// +// To use (from the root of project): +// ./gradlew -Ppublishing -PjavaLinkageArtifactIds=artifactId1,artifactId2,... :checkJavaLinkage +// +// For example: +// ./gradlew -Ppublishing -PjavaLinkageArtifactIds=beam-sdks-java-core,beam-sdks-java-io-jdbc :checkJavaLinkage +// +// Note that this task publishes artifacts into your local Maven repository. +if (project.hasProperty("javaLinkageArtifactIds")) { + if (!project.hasProperty("publishing")) { + throw GradleException("You can only check linkage of Java artifacts if you specify -Ppublishing on the command line as well.") + } + + val linkageCheckerJava by configurations.creating + dependencies { + linkageCheckerJava("com.google.cloud.tools:dependencies:1.5.6") + } + + // We need to evaluate all the projects first so that we can find depend on all the + // publishMavenJavaPublicationToMavenLocal tasks below. + for (p in rootProject.subprojects) { + if (p.path != project.path) { + evaluationDependsOn(p.path) + } + } + + project.task("checkJavaLinkage") { + dependsOn(project.getTasksByName("publishMavenJavaPublicationToMavenLocal", true /* recursively */)) + classpath = linkageCheckerJava + main = "com.google.cloud.tools.opensource.classpath.LinkageCheckerMain" + val javaLinkageArtifactIds: String = project.property("javaLinkageArtifactIds") as String? 
?: "" + var arguments = arrayOf("-a", javaLinkageArtifactIds.split(",").joinToString(",") { + if (it.contains(":")) { + "${project.ext.get("mavenGroupId")}:${it}" + } else { + // specify the version if not provided + "${project.ext.get("mavenGroupId")}:${it}:${project.version}" + } + }) + + // Exclusion file filters out existing linkage errors before a change + if (project.hasProperty("javaLinkageWriteBaseline")) { + arguments += "--output-exclusion-file" + arguments += project.property("javaLinkageWriteBaseline") as String + } else if (project.hasProperty("javaLinkageReadBaseline")) { + arguments += "--exclusion-file" + arguments += project.property("javaLinkageReadBaseline") as String + } + args(*arguments) + doLast { + println("NOTE: This task published artifacts into your local Maven repository. You may want to remove them manually.") + } + } +} +if (project.hasProperty("compileAndRunTestsWithJava11")) { + tasks.getByName("javaPreCommitPortabilityApi").dependsOn(":sdks:java:testing:test-utils:verifyJavaVersion") + tasks.getByName("javaExamplesDataflowPrecommit").dependsOn(":sdks:java:testing:test-utils:verifyJavaVersion") + tasks.getByName("sqlPreCommit").dependsOn(":sdks:java:testing:test-utils:verifyJavaVersion") +} else { + allprojects { + tasks.withType(Test::class).configureEach { + exclude("**/JvmVerification.class") + } + } +} diff --git a/buildSrc/build.gradle b/buildSrc/build.gradle deleted file mode 100644 index 618475c96f39..000000000000 --- a/buildSrc/build.gradle +++ /dev/null @@ -1,98 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * License); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an AS IS BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -// Plugins for configuring _this build_ of the module -plugins { - id 'java-gradle-plugin' - id 'groovy' - id "com.diffplug.spotless" version "5.6.1" -} - -// Define the set of repositories required to fetch and enable plugins. 
-repositories { - jcenter() - maven { url "https://plugins.gradle.org/m2/" } - maven { - url "https://repo.spring.io/plugins-release/" - content { includeGroup "io.spring.gradle" } - } -} - -// Dependencies on other plugins used when this plugin is invoked -dependencies { - compile gradleApi() - compile localGroovy() - compile 'com.github.jengelman.gradle.plugins:shadow:5.2.0' - compile "gradle.plugin.com.github.spotbugs.snom:spotbugs-gradle-plugin:4.5.0" - - runtime "net.ltgt.gradle:gradle-apt-plugin:0.21" // Enable a Java annotation processor - runtime "com.google.protobuf:protobuf-gradle-plugin:0.8.13" // Enable proto code generation - runtime "io.spring.gradle:propdeps-plugin:0.0.9.RELEASE" // Enable provided and optional configurations - runtime "com.commercehub.gradle.plugin:gradle-avro-plugin:0.11.0" // Enable Avro code generation - runtime "com.diffplug.spotless:spotless-plugin-gradle:5.6.1" // Enable a code formatting plugin - runtime "gradle.plugin.com.github.blindpirate:gogradle:0.11.4" // Enable Go code compilation - runtime "gradle.plugin.com.palantir.gradle.docker:gradle-docker:0.22.0" // Enable building Docker containers - runtime "gradle.plugin.com.dorongold.plugins:task-tree:1.5" // Adds a 'taskTree' task to print task dependency tree - runtime "com.github.jengelman.gradle.plugins:shadow:5.2.0" // Enable shading Java dependencies - runtime "ca.coglinc:javacc-gradle-plugin:2.4.0" // Enable the JavaCC parser generator - runtime "net.linguica.gradle:maven-settings-plugin:0.5" - runtime "gradle.plugin.io.pry.gradle.offline_dependencies:gradle-offline-dependencies-plugin:0.5.0" // Enable creating an offline repository - runtime "net.ltgt.gradle:gradle-errorprone-plugin:1.2.1" // Enable errorprone Java static analysis - runtime "org.ajoberstar.grgit:grgit-gradle:4.0.2" // Enable website git publish to asf-site branch - runtime "com.avast.gradle:gradle-docker-compose-plugin:0.13.2" // Enable docker compose tasks - runtime "ca.cutterslade.gradle:gradle-dependency-analyze:1.4.2" // Enable dep analysis - runtime "gradle.plugin.net.ossindex:ossindex-gradle-plugin:0.4.11" // Enable dep vulnerability analysis - runtime "org.checkerframework:checkerframework-gradle-plugin:0.5.11" // Enable enhanced static checking plugin -} - -// Because buildSrc is built and tested automatically _before_ gradle -// does anything else, it is not possible to spotlessApply because -// spotlessCheck fails before that. So this hack allows disabling -// the check for the moment of application. 
-// -// ./gradlew :buildSrc:spotlessApply -PdisableSpotlessCheck=true -def disableSpotlessCheck = project.hasProperty('disableSpotlessCheck') && - project.disableSpotlessCheck == 'true' - -spotless { - enforceCheck !disableSpotlessCheck - groovy { - excludeJava() - greclipse().configFile('greclipse.properties') - } - groovyGradle { - greclipse().configFile('greclipse.properties') - } -} - -gradlePlugin { - plugins { - beamModule { - id = 'org.apache.beam.module' - implementationClass = 'org.apache.beam.gradle.BeamModulePlugin' - } - vendorJava { - id = 'org.apache.beam.vendor-java' - implementationClass = 'org.apache.beam.gradle.VendorJavaPlugin' - } - beamJenkins { - id = 'org.apache.beam.jenkins' - implementationClass = 'org.apache.beam.gradle.BeamJenkinsPlugin' - } - } -} diff --git a/buildSrc/build.gradle.kts b/buildSrc/build.gradle.kts new file mode 100644 index 000000000000..60d037b61ace --- /dev/null +++ b/buildSrc/build.gradle.kts @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +// Plugins for configuring _this build_ of the module +plugins { + `java-gradle-plugin` + groovy + id("com.diffplug.spotless") version "5.6.1" +} + +// Define the set of repositories required to fetch and enable plugins. 
+repositories { + jcenter() + maven { url = uri("https://plugins.gradle.org/m2/") } + maven { + url = uri("https://repo.spring.io/plugins-release/") + content { includeGroup("io.spring.gradle") } + } +} + +// Dependencies on other plugins used when this plugin is invoked +dependencies { + compile(gradleApi()) + compile(localGroovy()) + compile("com.github.jengelman.gradle.plugins:shadow:6.1.0") + compile("gradle.plugin.com.github.spotbugs.snom:spotbugs-gradle-plugin:4.5.0") + + runtime("net.ltgt.gradle:gradle-apt-plugin:0.21") // Enable a Java annotation processor + runtime("com.google.protobuf:protobuf-gradle-plugin:0.8.13") // Enable proto code generation + runtime("io.spring.gradle:propdeps-plugin:0.0.9.RELEASE") // Enable provided and optional configurations + runtime("com.commercehub.gradle.plugin:gradle-avro-plugin:0.11.0") // Enable Avro code generation + runtime("com.diffplug.spotless:spotless-plugin-gradle:5.6.1") // Enable a code formatting plugin + runtime("gradle.plugin.com.github.blindpirate:gogradle:0.11.4") // Enable Go code compilation + runtime("gradle.plugin.com.palantir.gradle.docker:gradle-docker:0.22.0") // Enable building Docker containers + runtime("gradle.plugin.com.dorongold.plugins:task-tree:1.5") // Adds a 'taskTree' task to print task dependency tree + runtime("com.github.jengelman.gradle.plugins:shadow:6.1.0") // Enable shading Java dependencies + runtime("ca.coglinc:javacc-gradle-plugin:2.4.0") // Enable the JavaCC parser generator + runtime("net.linguica.gradle:maven-settings-plugin:0.5") + runtime("gradle.plugin.io.pry.gradle.offline_dependencies:gradle-offline-dependencies-plugin:0.5.0") // Enable creating an offline repository + runtime("net.ltgt.gradle:gradle-errorprone-plugin:1.2.1") // Enable errorprone Java static analysis + runtime("org.ajoberstar.grgit:grgit-gradle:4.0.2") // Enable website git publish to asf-site branch + runtime("com.avast.gradle:gradle-docker-compose-plugin:0.13.2") // Enable docker compose tasks + runtime("ca.cutterslade.gradle:gradle-dependency-analyze:1.4.3") // Enable dep analysis + runtime("gradle.plugin.net.ossindex:ossindex-gradle-plugin:0.4.11") // Enable dep vulnerability analysis + runtime("org.checkerframework:checkerframework-gradle-plugin:0.5.16") // Enable enhanced static checking plugin +} + +// Because buildSrc is built and tested automatically _before_ gradle +// does anything else, it is not possible to spotlessApply because +// spotlessCheck fails before that. So this hack allows disabling +// the check for the moment of application. 
+// +// ./gradlew :buildSrc:spotlessApply -PdisableSpotlessCheck=true +val disableSpotlessCheck: String by project +val isSpotlessCheckDisabled = project.hasProperty("disableSpotlessCheck") && + disableSpotlessCheck == "true" + +spotless { + isEnforceCheck = !isSpotlessCheckDisabled + groovy { + excludeJava() + greclipse().configFile("greclipse.properties") + } + groovyGradle { + greclipse().configFile("greclipse.properties") + } +} + +gradlePlugin { + plugins { + create("beamModule") { + id = "org.apache.beam.module" + implementationClass = "org.apache.beam.gradle.BeamModulePlugin" + } + create("vendorJava") { + id = "org.apache.beam.vendor-java" + implementationClass = "org.apache.beam.gradle.VendorJavaPlugin" + } + create("beamJenkins") { + id = "org.apache.beam.jenkins" + implementationClass = "org.apache.beam.gradle.BeamJenkinsPlugin" + } + } +} diff --git a/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy b/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy index f5070639055e..67fe8b894c68 100644 --- a/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy +++ b/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy @@ -23,6 +23,7 @@ import groovy.json.JsonOutput import groovy.json.JsonSlurper import org.gradle.api.attributes.Category import org.gradle.api.GradleException +import org.gradle.api.JavaVersion import org.gradle.api.Plugin import org.gradle.api.Project import org.gradle.api.Task @@ -44,7 +45,7 @@ import org.gradle.api.tasks.PathSensitive import org.gradle.api.tasks.PathSensitivity import org.gradle.testing.jacoco.tasks.JacocoReport -import java.util.concurrent.atomic.AtomicInteger +import java.net.ServerSocket /** * This plugin adds methods to configure a module with Beam's defaults, called "natures". * @@ -81,7 +82,16 @@ class BeamModulePlugin implements Plugin { * limitations under the License. */ """ - static AtomicInteger startingExpansionPortNumber = new AtomicInteger(18091) + static def getRandomPort() { + new ServerSocket(0).withCloseable { socket -> + def port = socket.getLocalPort() + if (port > 0) { + return port + } else { + throw new GradleException("couldn't find a free port.") + } + } + } /** A class defining the set of configurable properties accepted by applyJavaNature. */ class JavaNatureConfiguration { @@ -94,14 +104,8 @@ class BeamModulePlugin implements Plugin { /** Classes triggering Checker failures. A map from class name to the bug filed against checkerframework. */ Map classesTriggerCheckerBugs = [:] - /** - Some module's tests take a very long time to compile with checkerframework. - Until that is solved, set this flag to skip checking for those tests. - */ - boolean checkerTooSlowOnTests = false - /** Controls whether the dependency analysis plugin is enabled. */ - boolean enableStrictDependencies = false + boolean enableStrictDependencies = true /** Override the default "beam-" + `dash separated path` archivesBaseName. */ String archivesBaseName = null @@ -125,6 +129,17 @@ class BeamModulePlugin implements Plugin { */ boolean validateShadowJar = true + /** + * Controls whether the 'jmh' source set is enabled for JMH benchmarks. + * + * Add additional dependencies to the jmhCompile and jmhRuntime dependency + * sets. + * + * Note that the JMH annotation processor is enabled by default and that + * a 'jmh' task is created which executes JMH. + */ + boolean enableJmh = false + /** * The set of excludes that should be used during validation of the shadow jar. 
Projects should override * the default with the most specific set of excludes that is valid for the contents of its shaded jar. @@ -243,8 +258,8 @@ class BeamModulePlugin implements Plugin { // Attribute tag that can filter the test set. String attribute = System.getProperty('attr') - // Extra test options pass to nose. - String[] extraTestOptions = ["--nocapture"] + // Extra test options pass to pytest. + String[] extraTestOptions = ["--capture=no"] // Name of Cloud KMS encryption key to use in some tests. String kmsKeyName = System.getProperty('kmsKeyName') @@ -317,8 +332,8 @@ class BeamModulePlugin implements Plugin { "--environment_cache_millis=10000", "--experiments=beam_fn_api", ] - // Additional nosetests options - List nosetestsOptions = [] + // Additional pytest options + List pytestOptions = [] // Job server startup task. Task startJobServer // Job server cleanup task. @@ -335,7 +350,7 @@ class BeamModulePlugin implements Plugin { // excludeCategories 'org.apache.beam.sdk.testing.FlattenWithHeterogeneousCoders' } // Attribute for Python tests to run. - String pythonTestAttr = "UsesCrossLanguageTransforms" + String pythonTestAttr = "xlang_transforms" // classpath for running tests. FileCollection classpath } @@ -357,7 +372,7 @@ class BeamModulePlugin implements Plugin { // Automatically use the official release version if we are performing a release // otherwise append '-SNAPSHOT' - project.version = '2.27.0' + project.version = '2.34.0' if (!isRelease(project)) { project.version += '-SNAPSHOT' } @@ -406,6 +421,9 @@ class BeamModulePlugin implements Plugin { project.apply plugin: "com.dorongold.task-tree" project.taskTree { noRepeat = true } + project.ext.allFlinkVersions = project.flink_versions.split(',') + project.ext.latestFlinkVersion = project.ext.allFlinkVersions.last() + /** ***********************************************************************************************/ // Define and export a map dependencies shared across multiple sub-projects. // @@ -419,36 +437,43 @@ class BeamModulePlugin implements Plugin { // a dependency version which should match across multiple // Maven artifacts. 
def activemq_version = "5.14.5" - def aws_java_sdk_version = "1.11.718" - def aws_java_sdk2_version = "2.13.54" + def autovalue_version = "1.8.1" + def aws_java_sdk_version = "1.11.974" + def aws_java_sdk2_version = "2.15.31" def cassandra_driver_version = "3.10.2" - def checkerframework_version = "3.7.0" - def classgraph_version = "4.8.65" - def google_clients_version = "1.30.10" - def google_cloud_bigdataoss_version = "2.1.6" - def google_cloud_pubsub_version = "1.108.6" - def google_cloud_pubsublite_version = "0.7.0" + def checkerframework_version = "3.10.0" + def classgraph_version = "4.8.104" + def errorprone_version = "2.3.4" + def google_clients_version = "1.32.1" + def google_cloud_bigdataoss_version = "2.2.2" + def google_cloud_pubsublite_version = "1.0.4" def google_code_gson_version = "2.8.6" def google_oauth_clients_version = "1.31.0" // Try to keep grpc_version consistent with gRPC version in google_cloud_platform_libraries_bom - def grpc_version = "1.32.2" - def guava_version = "25.1-jre" + def grpc_version = "1.40.1" + def guava_version = "30.1.1-jre" def hadoop_version = "2.10.1" def hamcrest_version = "2.1" def influxdb_version = "2.19" - def jackson_version = "2.10.2" + def httpclient_version = "4.5.13" + def httpcore_version = "4.4.14" + def jackson_version = "2.12.4" def jaxb_api_version = "2.3.3" + def jsr305_version = "3.0.2" def kafka_version = "2.4.1" def nemo_version = "0.1" - def netty_version = "4.1.51.Final" + def netty_version = "4.1.52.Final" def postgres_version = "42.2.16" - def powermock_version = "2.0.2" - def protobuf_version = "3.12.0" + def powermock_version = "2.0.9" + // Try to keep protobuf_version consistent with the protobuf version in google_cloud_platform_libraries_bom + def protobuf_version = "3.17.3" def quickcheck_version = "0.8" def slf4j_version = "1.7.30" - def spark_version = "2.4.7" + def spark_version = "2.4.8" def spotbugs_version = "4.0.6" - def testcontainers_version = "1.15.0-rc2" + def testcontainers_version = "1.15.1" + def arrow_version = "5.0.0" + def jmh_version = "1.32" // A map of maps containing common libraries used per language. 
To use: // dependencies { @@ -466,6 +491,7 @@ class BeamModulePlugin implements Plugin { antlr : "org.antlr:antlr4:4.7", antlr_runtime : "org.antlr:antlr4-runtime:4.7", args4j : "args4j:args4j:2.33", + auto_value_annotations : "com.google.auto.value:auto-value-annotations:$autovalue_version", avro : "org.apache.avro:avro:1.8.2", avro_tests : "org.apache.avro:avro:1.8.2:tests", aws_java_sdk_cloudwatch : "com.amazonaws:aws-java-sdk-cloudwatch:$aws_java_sdk_version", @@ -484,48 +510,55 @@ class BeamModulePlugin implements Plugin { aws_java_sdk2_sdk_core : "software.amazon.awssdk:sdk-core:$aws_java_sdk2_version", aws_java_sdk2_sns : "software.amazon.awssdk:sns:$aws_java_sdk2_version", aws_java_sdk2_sqs : "software.amazon.awssdk:sqs:$aws_java_sdk2_version", + aws_java_sdk2_s3 : "software.amazon.awssdk:s3:$aws_java_sdk2_version", + aws_java_sdk2_http_client_spi : "software.amazon.awssdk:http-client-spi:$aws_java_sdk2_version", + aws_java_sdk2_regions : "software.amazon.awssdk:regions:$aws_java_sdk2_version", + aws_java_sdk2_utils : "software.amazon.awssdk:utils:$aws_java_sdk2_version", bigdataoss_gcsio : "com.google.cloud.bigdataoss:gcsio:$google_cloud_bigdataoss_version", bigdataoss_util : "com.google.cloud.bigdataoss:util:$google_cloud_bigdataoss_version", cassandra_driver_core : "com.datastax.cassandra:cassandra-driver-core:$cassandra_driver_version", cassandra_driver_mapping : "com.datastax.cassandra:cassandra-driver-mapping:$cassandra_driver_version", classgraph : "io.github.classgraph:classgraph:$classgraph_version", - commons_codec : "commons-codec:commons-codec:1.14", - commons_compress : "org.apache.commons:commons-compress:1.20", + commons_codec : "commons-codec:commons-codec:1.15", + commons_compress : "org.apache.commons:commons-compress:1.21", commons_csv : "org.apache.commons:commons-csv:1.8", commons_io : "commons-io:commons-io:2.6", commons_lang3 : "org.apache.commons:commons-lang3:3.9", commons_math3 : "org.apache.commons:commons-math3:3.6.1", - error_prone_annotations : "com.google.errorprone:error_prone_annotations:2.3.1", + error_prone_annotations : "com.google.errorprone:error_prone_annotations:$errorprone_version", + flogger_system_backend : "com.google.flogger:flogger-system-backend:0.6", gax : "com.google.api:gax", // google_cloud_platform_libraries_bom sets version gax_grpc : "com.google.api:gax-grpc", // google_cloud_platform_libraries_bom sets version - google_api_client : "com.google.api-client:google-api-client:$google_clients_version", + gax_httpjson : "com.google.api:gax-httpjson", // google_cloud_platform_libraries_bom sets version + google_api_client : "com.google.api-client:google-api-client:$google_clients_version", // for the libraries using $google_clients_version below. 
google_api_client_jackson2 : "com.google.api-client:google-api-client-jackson2:$google_clients_version", google_api_client_java6 : "com.google.api-client:google-api-client-java6:$google_clients_version", google_api_common : "com.google.api:api-common", // google_cloud_platform_libraries_bom sets version - google_api_services_bigquery : "com.google.apis:google-api-services-bigquery:v2-rev20200719-$google_clients_version", - google_api_services_clouddebugger : "com.google.apis:google-api-services-clouddebugger:v2-rev20200501-$google_clients_version", - google_api_services_cloudresourcemanager : "com.google.apis:google-api-services-cloudresourcemanager:v1-rev20200720-$google_clients_version", - google_api_services_dataflow : "com.google.apis:google-api-services-dataflow:v1b3-rev20200713-$google_clients_version", - google_api_services_healthcare : "com.google.apis:google-api-services-healthcare:v1beta1-rev20200713-$google_clients_version", - google_api_services_pubsub : "com.google.apis:google-api-services-pubsub:v1-rev20200713-$google_clients_version", - google_api_services_storage : "com.google.apis:google-api-services-storage:v1-rev20200611-$google_clients_version", + google_api_services_bigquery : "com.google.apis:google-api-services-bigquery:v2-rev20210813-$google_clients_version", + google_api_services_clouddebugger : "com.google.apis:google-api-services-clouddebugger:v2-rev20210813-$google_clients_version", + google_api_services_cloudresourcemanager : "com.google.apis:google-api-services-cloudresourcemanager:v1-rev20210815-$google_clients_version", + google_api_services_dataflow : "com.google.apis:google-api-services-dataflow:v1b3-rev20210818-$google_clients_version", + google_api_services_healthcare : "com.google.apis:google-api-services-healthcare:v1-rev20210806-$google_clients_version", + google_api_services_pubsub : "com.google.apis:google-api-services-pubsub:v1-rev20210809-$google_clients_version", + google_api_services_storage : "com.google.apis:google-api-services-storage:v1-rev20210127-$google_clients_version", google_auth_library_credentials : "com.google.auth:google-auth-library-credentials", // google_cloud_platform_libraries_bom sets version google_auth_library_oauth2_http : "com.google.auth:google-auth-library-oauth2-http", // google_cloud_platform_libraries_bom sets version google_cloud_bigquery : "com.google.cloud:google-cloud-bigquery", // google_cloud_platform_libraries_bom sets version google_cloud_bigquery_storage : "com.google.cloud:google-cloud-bigquerystorage", // google_cloud_platform_libraries_bom sets version - google_cloud_bigtable_client_core : "com.google.cloud.bigtable:bigtable-client-core:1.16.0", - google_cloud_bigtable_emulator : "com.google.cloud:google-cloud-bigtable-emulator:0.125.2", + google_cloud_bigtable_client_core : "com.google.cloud.bigtable:bigtable-client-core:1.23.1", + google_cloud_bigtable_emulator : "com.google.cloud:google-cloud-bigtable-emulator:0.137.1", google_cloud_core : "com.google.cloud:google-cloud-core", // google_cloud_platform_libraries_bom sets version google_cloud_core_grpc : "com.google.cloud:google-cloud-core-grpc", // google_cloud_platform_libraries_bom sets version google_cloud_datacatalog_v1beta1 : "com.google.cloud:google-cloud-datacatalog", // google_cloud_platform_libraries_bom sets version google_cloud_dataflow_java_proto_library_all: "com.google.cloud.dataflow:google-cloud-dataflow-java-proto-library-all:0.5.160304", google_cloud_datastore_v1_proto_client : 
"com.google.cloud.datastore:datastore-v1-proto-client:1.6.3", - google_cloud_pubsub : "com.google.cloud:google-cloud-pubsub:$google_cloud_pubsub_version", + google_cloud_firestore : "com.google.cloud:google-cloud-firestore", // google_cloud_platform_libraries_bom sets version + google_cloud_pubsub : "com.google.cloud:google-cloud-pubsub", // google_cloud_platform_libraries_bom sets version google_cloud_pubsublite : "com.google.cloud:google-cloud-pubsublite:$google_cloud_pubsublite_version", // The GCP Libraries BOM dashboard shows the versions set by the BOM: - // https://storage.googleapis.com/cloud-opensource-java-dashboard/com.google.cloud/libraries-bom/12.0.0/artifact_details.html + // https://storage.googleapis.com/cloud-opensource-java-dashboard/com.google.cloud/libraries-bom/20.0.0/artifact_details.html // Update libraries-bom version on sdks/java/container/license_scripts/dep_urls_java.yaml - google_cloud_platform_libraries_bom : "com.google.cloud:libraries-bom:13.2.0", + google_cloud_platform_libraries_bom : "com.google.cloud:libraries-bom:22.0.0", google_cloud_spanner : "com.google.cloud:google-cloud-spanner", // google_cloud_platform_libraries_bom sets version google_code_gson : "com.google.code.gson:gson:$google_code_gson_version", // google-http-client's version is explicitly declared for sdks/java/maven-archetypes/examples @@ -543,8 +576,10 @@ class BeamModulePlugin implements Plugin { grpc_auth : "io.grpc:grpc-auth", // google_cloud_platform_libraries_bom sets version grpc_context : "io.grpc:grpc-context", // google_cloud_platform_libraries_bom sets version grpc_core : "io.grpc:grpc-core", // google_cloud_platform_libraries_bom sets version + grpc_google_cloud_firestore_v1 : "com.google.api.grpc:grpc-google-cloud-firestore-v1", // google_cloud_platform_libraries_bom sets version grpc_google_cloud_pubsub_v1 : "com.google.api.grpc:grpc-google-cloud-pubsub-v1", // google_cloud_platform_libraries_bom sets version grpc_google_cloud_pubsublite_v1 : "com.google.api.grpc:grpc-google-cloud-pubsublite-v1:$google_cloud_pubsublite_version", + grpc_google_common_protos : "com.google.api.grpc:grpc-google-common-protos", // google_cloud_platform_libraries_bom sets version grpc_grpclb : "io.grpc:grpc-grpclb", // google_cloud_platform_libraries_bom sets version grpc_protobuf : "io.grpc:grpc-protobuf", // google_cloud_platform_libraries_bom sets version grpc_protobuf_lite : "io.grpc:grpc-protobuf-lite:$grpc_version", @@ -559,8 +594,11 @@ class BeamModulePlugin implements Plugin { hadoop_minicluster : "org.apache.hadoop:hadoop-minicluster:$hadoop_version", hadoop_hdfs : "org.apache.hadoop:hadoop-hdfs:$hadoop_version", hadoop_hdfs_tests : "org.apache.hadoop:hadoop-hdfs:$hadoop_version:tests", + hamcrest : "org.hamcrest:hamcrest:$hamcrest_version", hamcrest_core : "org.hamcrest:hamcrest-core:$hamcrest_version", hamcrest_library : "org.hamcrest:hamcrest-library:$hamcrest_version", + http_client : "org.apache.httpcomponents:httpclient:$httpclient_version", + http_core : "org.apache.httpcomponents:httpcore:$httpcore_version", influxdb_library : "org.influxdb:influxdb-java:$influxdb_version", jackson_annotations : "com.fasterxml.jackson.core:jackson-annotations:$jackson_version", jackson_jaxb_annotations : "com.fasterxml.jackson.module:jackson-module-jaxb-annotations:$jackson_version", @@ -571,16 +609,18 @@ class BeamModulePlugin implements Plugin { jackson_dataformat_xml : "com.fasterxml.jackson.dataformat:jackson-dataformat-xml:$jackson_version", jackson_dataformat_yaml : 
"com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:$jackson_version", jackson_datatype_joda : "com.fasterxml.jackson.datatype:jackson-datatype-joda:$jackson_version", - jackson_module_scala : "com.fasterxml.jackson.module:jackson-module-scala_2.11:$jackson_version", + jackson_module_scala_2_11 : "com.fasterxml.jackson.module:jackson-module-scala_2.11:$jackson_version", + jackson_module_scala_2_12 : "com.fasterxml.jackson.module:jackson-module-scala_2.12:$jackson_version", jaxb_api : "jakarta.xml.bind:jakarta.xml.bind-api:$jaxb_api_version", jaxb_impl : "com.sun.xml.bind:jaxb-impl:$jaxb_api_version", - joda_time : "joda-time:joda-time:2.10.5", + joda_time : "joda-time:joda-time:2.10.10", jsonassert : "org.skyscreamer:jsonassert:1.5.0", - jsr305 : "com.google.code.findbugs:jsr305:3.0.2", - junit : "junit:junit:4.13-beta-3", + jsr305 : "com.google.code.findbugs:jsr305:$jsr305_version", + junit : "junit:junit:4.13.1", kafka : "org.apache.kafka:kafka_2.11:$kafka_version", kafka_clients : "org.apache.kafka:kafka-clients:$kafka_version", - mockito_core : "org.mockito:mockito-core:3.0.0", + mockito_core : "org.mockito:mockito-core:3.7.7", + mongo_java_driver : "org.mongodb:mongo-java-driver:3.12.7", nemo_compiler_frontend_beam : "org.apache.nemo:nemo-compiler-frontend-beam:$nemo_version", netty_all : "io.netty:netty-all:$netty_version", netty_handler : "io.netty:netty-handler:$netty_version", @@ -591,18 +631,23 @@ class BeamModulePlugin implements Plugin { powermock_mockito : "org.powermock:powermock-api-mockito2:$powermock_version", protobuf_java : "com.google.protobuf:protobuf-java:$protobuf_version", protobuf_java_util : "com.google.protobuf:protobuf-java-util:$protobuf_version", - proto_google_cloud_bigquery_storage_v1beta1 : "com.google.api.grpc:proto-google-cloud-bigquerystorage-v1beta1", // google_cloud_platform_libraries_bom sets version + proto_google_cloud_bigquery_storage_v1 : "com.google.api.grpc:proto-google-cloud-bigquerystorage-v1", // google_cloud_platform_libraries_bom sets version + proto_google_cloud_bigtable_admin_v2 : "com.google.api.grpc:proto-google-cloud-bigtable-admin-v2", // google_cloud_platform_libraries_bom sets version + proto_google_cloud_bigquery_storage_v1beta2 : "com.google.api.grpc:proto-google-cloud-bigquerystorage-v1beta2", // google_cloud_platform_libraries_bom sets version proto_google_cloud_bigtable_v2 : "com.google.api.grpc:proto-google-cloud-bigtable-v2", // google_cloud_platform_libraries_bom sets version + proto_google_cloud_datacatalog_v1beta1 : "com.google.api.grpc:proto-google-cloud-datacatalog-v1beta1", // google_cloud_platform_libraries_bom sets version proto_google_cloud_datastore_v1 : "com.google.api.grpc:proto-google-cloud-datastore-v1", // google_cloud_platform_libraries_bom sets version + proto_google_cloud_firestore_v1 : "com.google.api.grpc:proto-google-cloud-firestore-v1", // google_cloud_platform_libraries_bom sets version proto_google_cloud_pubsub_v1 : "com.google.api.grpc:proto-google-cloud-pubsub-v1", // google_cloud_platform_libraries_bom sets version proto_google_cloud_pubsublite_v1 : "com.google.api.grpc:proto-google-cloud-pubsublite-v1:$google_cloud_pubsublite_version", + proto_google_cloud_spanner_v1 : "com.google.api.grpc:proto-google-cloud-spanner-v1", // google_cloud_platform_libraries_bom sets version proto_google_cloud_spanner_admin_database_v1: "com.google.api.grpc:proto-google-cloud-spanner-admin-database-v1", // google_cloud_platform_libraries_bom sets version proto_google_common_protos : 
"com.google.api.grpc:proto-google-common-protos", // google_cloud_platform_libraries_bom sets version slf4j_api : "org.slf4j:slf4j-api:$slf4j_version", slf4j_simple : "org.slf4j:slf4j-simple:$slf4j_version", slf4j_jdk14 : "org.slf4j:slf4j-jdk14:$slf4j_version", slf4j_log4j12 : "org.slf4j:slf4j-log4j12:$slf4j_version", - snappy_java : "org.xerial.snappy:snappy-java:1.1.4", + snappy_java : "org.xerial.snappy:snappy-java:1.1.8.4", spark_core : "org.apache.spark:spark-core_2.11:$spark_version", spark_network_common : "org.apache.spark:spark-network-common_2.11:$spark_version", spark_sql : "org.apache.spark:spark-sql_2.11:$spark_version", @@ -613,13 +658,17 @@ class BeamModulePlugin implements Plugin { testcontainers_kafka : "org.testcontainers:kafka:$testcontainers_version", testcontainers_localstack : "org.testcontainers:localstack:$testcontainers_version", testcontainers_postgresql : "org.testcontainers:postgresql:$testcontainers_version", - vendored_bytebuddy_1_10_8 : "org.apache.beam:beam-vendor-bytebuddy-1_10_8:0.1", - vendored_grpc_1_26_0 : "org.apache.beam:beam-vendor-grpc-1_26_0:0.3", + testcontainers_gcloud : "org.testcontainers:gcloud:$testcontainers_version", + vendored_bytebuddy_1_11_0 : "org.apache.beam:beam-vendor-bytebuddy-1_11_0:0.1", + vendored_grpc_1_36_0 : "org.apache.beam:beam-vendor-grpc-1_36_0:0.2", vendored_guava_26_0_jre : "org.apache.beam:beam-vendor-guava-26_0-jre:0.1", - vendored_calcite_1_20_0 : "org.apache.beam:beam-vendor-calcite-1_20_0:0.1", + vendored_calcite_1_26_0 : "org.apache.beam:beam-vendor-calcite-1_26_0:0.1", woodstox_core_asl : "org.codehaus.woodstox:woodstox-core-asl:4.4.1", zstd_jni : "com.github.luben:zstd-jni:1.4.5-2", quickcheck_core : "com.pholser:junit-quickcheck-core:$quickcheck_version", + arrow_vector : "org.apache.arrow:arrow-vector:$arrow_version", + arrow_memory_core : "org.apache.arrow:arrow-memory-core:$arrow_version", + arrow_memory_netty : "org.apache.arrow:arrow-memory-netty:$arrow_version", ], groovy: [ groovy_all: "org.codehaus.groovy:groovy-all:2.4.13", @@ -811,20 +860,36 @@ class BeamModulePlugin implements Plugin { List skipDefRegexes = [] skipDefRegexes << "AutoValue_.*" skipDefRegexes << "AutoOneOf_.*" + skipDefRegexes << ".*\\.jmh_generated\\..*" skipDefRegexes += configuration.generatedClassPatterns skipDefRegexes += configuration.classesTriggerCheckerBugs.keySet() String skipDefCombinedRegex = skipDefRegexes.collect({ regex -> "(${regex})"}).join("|") + // SLF4J logger handles null log message parameters + String skipUsesRegex = "^org\\.slf4j\\.Logger.*" + project.apply plugin: 'org.checkerframework' project.checkerFramework { checkers = [ 'org.checkerframework.checker.nullness.NullnessChecker' ] - excludeTests = configuration.checkerTooSlowOnTests + if (project.findProperty('enableCheckerFramework') || project.jenkins.isCIBuild) { + skipCheckerFramework = false + } else { + skipCheckerFramework = true + } + + // Always exclude checkerframework on tests. It's slow, and it often + // raises erroneous error because we don't have checker annotations for + // test libraries like junit and hamcrest. See BEAM-11436. + // Consider re-enabling if we can get annotations for the test libraries + // we use. + excludeTests = true extraJavacArgs = [ "-AskipDefs=${skipDefCombinedRegex}", + "-AskipUses=${skipUsesRegex}", "-AsuppressWarnings=annotation.not.completed", ] @@ -855,8 +920,9 @@ class BeamModulePlugin implements Plugin { // configurations because they are never required to be shaded or become a // dependency of the output. 
def compileOnlyAnnotationDeps = [ - "com.google.auto.value:auto-value-annotations:1.7", "com.google.auto.service:auto-service-annotations:1.0-rc6", + "com.google.auto.value:auto-value-annotations:$autovalue_version", + "com.google.code.findbugs:jsr305:$jsr305_version", "com.google.j2objc:j2objc-annotations:1.3", // These dependencies are needed to avoid error-prone warnings on package-info.java files, // also to include the annotations to suppress warnings. @@ -893,7 +959,7 @@ class BeamModulePlugin implements Plugin { // Add common annotation processors to all Java projects def annotationProcessorDeps = [ - "com.google.auto.value:auto-value:1.7", + "com.google.auto.value:auto-value:$autovalue_version", "com.google.auto.service:auto-service:1.0-rc6", ] @@ -972,11 +1038,12 @@ class BeamModulePlugin implements Plugin { project.apply plugin: 'com.github.spotbugs' project.dependencies { spotbugs "com.github.spotbugs:spotbugs:$spotbugs_version" - spotbugs "com.google.auto.value:auto-value:1.7" + spotbugs "com.google.auto.value:auto-value:$autovalue_version" compileOnlyAnnotationDeps.each { dep -> spotbugs dep } } project.spotbugs { excludeFilter = project.rootProject.file('sdks/java/build-tools/src/main/resources/beam/spotbugs-filter.xml') + jvmArgs = ['-Xmx12g'] } project.tasks.withType(com.github.spotbugs.snom.SpotBugsTask) { reports { @@ -994,6 +1061,7 @@ class BeamModulePlugin implements Plugin { permitUnusedDeclared dep permitTestUnusedDeclared dep } + permitUnusedDeclared "org.checkerframework:checker-qual:$checkerframework_version" } if (configuration.enableStrictDependencies) { project.tasks.analyzeClassesDependencies.enabled = true @@ -1009,14 +1077,14 @@ class BeamModulePlugin implements Plugin { project.apply plugin: 'net.ltgt.errorprone' project.dependencies { - errorprone("com.google.errorprone:error_prone_core:2.3.1") + errorprone("com.google.errorprone:error_prone_core:$errorprone_version") errorprone("jp.skypencil.errorprone.slf4j:errorprone-slf4j:0.1.2") // At least JDk 9 compiler is required, however JDK 8 still can be used but with additional errorproneJavac // configuration. 
For more details please see https://github.com/tbroyer/gradle-errorprone-plugin#jdk-8-support errorproneJavac("com.google.errorprone:javac:9+181-r4173-1") } - project.configurations.errorprone { resolutionStrategy.force 'com.google.errorprone:error_prone_core:2.3.1' } + project.configurations.errorprone { resolutionStrategy.force "com.google.errorprone:error_prone_core:$errorprone_version" } project.tasks.withType(JavaCompile) { options.errorprone.disableWarningsInGeneratedCode = true @@ -1025,6 +1093,40 @@ class BeamModulePlugin implements Plugin { options.errorprone.errorproneArgs.add("-Xep:Slf4jLoggerShouldBeNonStatic:OFF") options.errorprone.errorproneArgs.add("-Xep:Slf4jFormatShouldBeConst:OFF") options.errorprone.errorproneArgs.add("-Xep:Slf4jSignOnlyFormat:OFF") + options.errorprone.errorproneArgs.add("-Xep:AssignmentToMock:OFF") + options.errorprone.errorproneArgs.add("-Xep:AnnotateFormatMethod:OFF") + options.errorprone.errorproneArgs.add("-Xep:AutoValueFinalMethods:OFF") + options.errorprone.errorproneArgs.add("-Xep:AutoValueImmutableFields:OFF") + options.errorprone.errorproneArgs.add("-Xep:BadImport:OFF") + options.errorprone.errorproneArgs.add("-Xep:BadInstanceof:OFF") + options.errorprone.errorproneArgs.add("-Xep:BigDecimalEquals:OFF") + options.errorprone.errorproneArgs.add("-Xep:BigDecimalLiteralDouble:OFF") + options.errorprone.errorproneArgs.add("-Xep:ComparableType:OFF") + options.errorprone.errorproneArgs.add("-Xep:CompareToZero:OFF") + options.errorprone.errorproneArgs.add("-Xep:EqualsGetClass:OFF") + options.errorprone.errorproneArgs.add("-Xep:EqualsUnsafeCast:OFF") + options.errorprone.errorproneArgs.add("-Xep:ExtendsAutoValue:OFF") + options.errorprone.errorproneArgs.add("-Xep:FloatingPointAssertionWithinEpsilon:OFF") + options.errorprone.errorproneArgs.add("-Xep:JodaDurationConstructor:OFF") + options.errorprone.errorproneArgs.add("-Xep:JavaTimeDefaultTimeZone:OFF") + options.errorprone.errorproneArgs.add("-Xep:JodaPlusMinusLong:OFF") + options.errorprone.errorproneArgs.add("-Xep:JodaToSelf:OFF") + options.errorprone.errorproneArgs.add("-Xep:LockNotBeforeTry:OFF") + options.errorprone.errorproneArgs.add("-Xep:MathAbsoluteRandom:OFF") + options.errorprone.errorproneArgs.add("-Xep:MixedMutabilityReturnType:OFF") + options.errorprone.errorproneArgs.add("-Xep:PreferJavaTimeOverload:OFF") + options.errorprone.errorproneArgs.add("-Xep:ModifiedButNotUsed:OFF") + options.errorprone.errorproneArgs.add("-Xep:SameNameButDifferent:OFF") + options.errorprone.errorproneArgs.add("-Xep:ThreadPriorityCheck:OFF") + options.errorprone.errorproneArgs.add("-Xep:TimeUnitConversionChecker:OFF") + options.errorprone.errorproneArgs.add("-Xep:UndefinedEquals:OFF") + options.errorprone.errorproneArgs.add("-Xep:UnnecessaryAnonymousClass:OFF") + options.errorprone.errorproneArgs.add("-Xep:UnnecessaryLambda:OFF") + options.errorprone.errorproneArgs.add("-Xep:UnusedMethod:OFF") + options.errorprone.errorproneArgs.add("-Xep:UnusedVariable:OFF") + options.errorprone.errorproneArgs.add("-Xep:UnusedNestedClass:OFF") + options.errorprone.errorproneArgs.add("-Xep:UnnecessaryParentheses:OFF") + options.errorprone.errorproneArgs.add("-Xep:UnsafeReflectiveConstructionCast:OFF") } if (configuration.shadowClosure) { @@ -1152,6 +1254,45 @@ class BeamModulePlugin implements Plugin { project.artifacts.testRuntime project.testJar } + if (configuration.enableJmh) { + // We specifically use a separate source set for JMH to ensure that it does not + // become a required artifact + project.sourceSets { + jmh { + 
java { + srcDir "src/jmh/java" + } + resources { + srcDir "src/jmh/resources" + } + } + } + + project.dependencies { + jmhAnnotationProcessor "org.openjdk.jmh:jmh-generator-annprocess:$jmh_version" + jmhCompile "org.openjdk.jmh:jmh-core:$jmh_version" + } + + project.task("jmh", type: JavaExec, dependsOn: project.jmhClasses, { + main = "org.openjdk.jmh.Main" + classpath = project.sourceSets.jmh.compileClasspath + project.sourceSets.jmh.runtimeClasspath + // For a list of arguments, see + // https://github.com/guozheng/jmh-tutorial/blob/master/README.md + // + // Filter for a specific benchmark to run (uncomment below) + // Note that multiple regex are supported each as a separate argument. + // args 'BeamFnLoggingClientBenchmark.testLoggingWithAllOptionalParameters' + // args 'additional regexp...' + // + // Enumerate available benchmarks and exit (uncomment below) + // args '-l' + // + // Enable connecting a debugger by disabling forking (uncomment below) + // Useful for debugging via an IDE such as Intellij + // args '-f0' + }) + } + project.ext.includeInJavaBom = configuration.publish project.ext.exportJavadoc = configuration.exportJavadoc @@ -1511,13 +1652,15 @@ class BeamModulePlugin implements Plugin { if (pipelineOptionsString.contains('use_runner_v2')) { def dockerImageName = project.project(':runners:google-cloud-dataflow-java').ext.dockerImageName allOptionsList.addAll([ - "--workerHarnessContainerImage=${dockerImageName}", + "--sdkContainerImage=${dockerImageName}", "--region=${dataflowRegion}" ]) } else { def dataflowWorkerJar = project.findProperty('dataflowWorkerJar') ?: project.project(":runners:google-cloud-dataflow-java:worker:legacy-worker").shadowJar.archivePath allOptionsList.addAll([ + // Keep as legacy flag to ensure via test this flag works for + // legacy pipeline. '--workerHarnessContainerImage=', "--dataflowWorkerJar=${dataflowWorkerJar}", "--region=${dataflowRegion}" @@ -1570,11 +1713,11 @@ class BeamModulePlugin implements Plugin { } if (runner?.equalsIgnoreCase('flink')) { - testRuntime it.project(path: ":runners:flink:1.10", configuration: 'testRuntime') + testRuntime it.project(path: ":runners:flink:${project.ext.latestFlinkVersion}", configuration: 'testRuntime') } if (runner?.equalsIgnoreCase('spark')) { - testRuntime it.project(path: ":runners:spark", configuration: 'testRuntime') + testRuntime it.project(path: ":runners:spark:2", configuration: 'testRuntime') testRuntime project.library.java.spark_core testRuntime project.library.java.spark_streaming @@ -1605,7 +1748,7 @@ class BeamModulePlugin implements Plugin { project.apply plugin: 'base' project.apply plugin: "com.github.blindpirate.gogradle" - project.golang { goVersion = '1.12' } + project.golang { goVersion = '1.16.5' } project.repositories { golang { @@ -1648,6 +1791,7 @@ class BeamModulePlugin implements Plugin { project.docker { noCache true } project.tasks.create(name: "copyLicenses", type: Copy) { from "${project.rootProject.projectDir}/LICENSE" + from "${project.rootProject.projectDir}/LICENSE.python" from "${project.rootProject.projectDir}/NOTICE" into "build/target" } @@ -1704,6 +1848,19 @@ class BeamModulePlugin implements Plugin { return "${configuration.root}/${configuration.name}:${configuration.tag}" } + project.ext.containerImageTags = { + String[] tags + if (project.rootProject.hasProperty(["docker-tag-list"])) { + tags = project.rootProject["docker-tag-list"].split(',') + } else { + tags = [ + project.rootProject.hasProperty(["docker-tag"]) ? 
+ project.rootProject["docker-tag"] : project.sdk_version + ] + } + return tags + } + /** ***********************************************************************************************/ // applyGrpcNature should only be applied to projects who wish to use @@ -1767,6 +1924,7 @@ class BeamModulePlugin implements Plugin { } project.ext.applyJavaNature( + enableStrictDependencies: false, exportJavadoc: false, enableSpotbugs: false, publish: configuration.publish, @@ -1774,10 +1932,10 @@ class BeamModulePlugin implements Plugin { archivesBaseName: configuration.archivesBaseName, automaticModuleName: configuration.automaticModuleName, shadowJarValidationExcludes: it.shadowJarValidationExcludes, - shadowClosure: GrpcVendoring_1_26_0.shadowClosure() << { + shadowClosure: GrpcVendoring_1_36_0.shadowClosure() << { // We perform all the code relocations but don't include // any of the actual dependencies since they will be supplied - // by org.apache.beam:beam-vendor-grpc-v1p26p0:0.1 + // by org.apache.beam:beam-vendor-grpc-v1p36p0:0.1 dependencies { include(dependency { return false }) } @@ -1794,14 +1952,14 @@ class BeamModulePlugin implements Plugin { project.protobuf { protoc { // The artifact spec for the Protobuf Compiler - artifact = "com.google.protobuf:protoc:${GrpcVendoring_1_26_0.protobuf_version}" } + artifact = "com.google.protobuf:protoc:${GrpcVendoring_1_36_0.protobuf_version}" } // Configure the codegen plugins plugins { // An artifact spec for a protoc plugin, with "grpc" as // the identifier, which can be referred to in the "plugins" // container of the "generateProtoTasks" closure. - grpc { artifact = "io.grpc:protoc-gen-grpc-java:${GrpcVendoring_1_26_0.grpc_version}" } + grpc { artifact = "io.grpc:protoc-gen-grpc-java:${GrpcVendoring_1_36_0.grpc_version}" } } generateProtoTasks { @@ -1815,7 +1973,7 @@ class BeamModulePlugin implements Plugin { } } - project.dependencies GrpcVendoring_1_26_0.dependenciesClosure() << { shadow project.ext.library.java.vendored_grpc_1_26_0 } + project.dependencies GrpcVendoring_1_36_0.dependenciesClosure() << { shadow project.ext.library.java.vendored_grpc_1_36_0 } } /** ***********************************************************************************************/ @@ -1938,8 +2096,8 @@ class BeamModulePlugin implements Plugin { // Task for launching expansion services def envDir = project.project(":sdks:python").envdir def pythonDir = project.project(":sdks:python").projectDir - def javaPort = startingExpansionPortNumber.getAndDecrement() - def pythonPort = startingExpansionPortNumber.getAndDecrement() + def javaPort = getRandomPort() + def pythonPort = getRandomPort() def expansionJar = project.project(':sdks:java:testing:expansion-service').buildTestExpansionServiceJar.archivePath def expansionServiceOpts = [ "group_id": project.name, @@ -1951,8 +2109,16 @@ class BeamModulePlugin implements Plugin { ] def serviceArgs = project.project(':sdks:python').mapToArgString(expansionServiceOpts) def pythonContainerSuffix = project.project(':sdks:python').pythonVersion == '2.7' ? 
'2' : project.project(':sdks:python').pythonVersion.replace('.', '') + def javaContainerSuffix + if (JavaVersion.current() == JavaVersion.VERSION_1_8) { + javaContainerSuffix = 'java8' + } else if (JavaVersion.current() == JavaVersion.VERSION_11) { + javaContainerSuffix = 'java11' + } else { + throw new GradleException("unsupported java version.") + } def setupTask = project.tasks.create(name: config.name+"Setup", type: Exec) { - dependsOn ':sdks:java:container:java8:docker' + dependsOn ':sdks:java:container:'+javaContainerSuffix+':docker' dependsOn ':sdks:python:container:py'+pythonContainerSuffix+':docker' dependsOn ':sdks:java:testing:expansion-service:buildTestExpansionServiceJar' dependsOn ":sdks:python:installGcpTest" @@ -2005,13 +2171,11 @@ class BeamModulePlugin implements Plugin { config.cleanupJobServer.mustRunAfter javaTask // Task for running testcases in Python SDK - def testOpts = [ - "--attr=${config.pythonTestAttr}" - ] def beamPythonTestPipelineOptions = [ "pipeline_opts": config.pythonPipelineOptions + sdkLocationOpt, - "test_opts": testOpts + config.nosetestsOptions, - "suite": "xlangValidateRunner" + "test_opts": config.pytestOptions, + "suite": "xlangValidateRunner", + "collect": config.pythonTestAttr ] def cmdArgs = project.project(':sdks:python').mapToArgString(beamPythonTestPipelineOptions) def pythonTask = project.tasks.create(name: config.name+"PythonUsing"+sdk, type: Exec) { @@ -2028,14 +2192,36 @@ class BeamModulePlugin implements Plugin { cleanupTask.mustRunAfter pythonTask config.cleanupJobServer.mustRunAfter pythonTask } + + // Task for running Python-only testcases in Java SDK + def javaUsingPythonOnlyTask = project.tasks.create(name: config.name+"JavaUsingPythonOnly", type: Test) { + group = "Verification" + description = "Validates runner for cross-language capability of using Python-only transforms from Java SDK" + systemProperty "beamTestPipelineOptions", JsonOutput.toJson(config.javaPipelineOptions) + systemProperty "expansionJar", expansionJar + systemProperty "expansionPort", pythonPort + classpath = config.classpath + testClassesDirs = project.files(project.project(":runners:core-construction-java").sourceSets.test.output.classesDirs) + maxParallelForks config.numParallelTests + useJUnit { + includeCategories 'org.apache.beam.sdk.testing.UsesPythonExpansionService' + } + // increase maxHeapSize as this is directly correlated to direct memory, + // see https://issues.apache.org/jira/browse/BEAM-6698 + maxHeapSize = '4g' + dependsOn setupTask + dependsOn config.startJobServer + } + mainTask.dependsOn javaUsingPythonOnlyTask + cleanupTask.mustRunAfter javaUsingPythonOnlyTask + config.cleanupJobServer.mustRunAfter javaUsingPythonOnlyTask + // Task for running testcases in Python SDK - def testOpts = [ - "--attr=UsesSqlExpansionService" - ] def beamPythonTestPipelineOptions = [ "pipeline_opts": config.pythonPipelineOptions + sdkLocationOpt, - "test_opts": testOpts + config.nosetestsOptions, - "suite": "xlangSqlValidateRunner" + "test_opts": config.pytestOptions, + "suite": "xlangSqlValidateRunner", + "collect": "xlang_sql_expansion_service" ] def cmdArgs = project.project(':sdks:python').mapToArgString(beamPythonTestPipelineOptions) def pythonSqlTask = project.tasks.create(name: config.name+"PythonUsingSql", type: Exec) { @@ -2043,13 +2229,12 @@ class BeamModulePlugin implements Plugin { description = "Validates runner for cross-language capability of using Java's SqlTransform from Python SDK" executable 'sh' args '-c', ". 
$envDir/bin/activate && cd $pythonDir && ./scripts/run_integration_test.sh $cmdArgs" + dependsOn setupTask dependsOn config.startJobServer - dependsOn ':sdks:java:container:java8:docker' - dependsOn ':sdks:python:container:py'+pythonContainerSuffix+':docker' dependsOn ':sdks:java:extensions:sql:expansion-service:shadowJar' - dependsOn ":sdks:python:installGcpTest" } mainTask.dependsOn pythonSqlTask + cleanupTask.mustRunAfter pythonSqlTask config.cleanupJobServer.mustRunAfter pythonSqlTask } @@ -2092,7 +2277,7 @@ class BeamModulePlugin implements Plugin { project.exec { commandLine virtualenvCmd } project.exec { executable 'sh' - args '-c', ". ${project.ext.envdir}/bin/activate && pip install --retries 10 --upgrade tox==3.11.1 -r ${project.rootDir}/sdks/python/build-requirements.txt" + args '-c', ". ${project.ext.envdir}/bin/activate && pip install --retries 10 --upgrade tox==3.20.1 -r ${project.rootDir}/sdks/python/build-requirements.txt" } } // Gradle will delete outputs whenever it thinks they are stale. Putting a @@ -2213,9 +2398,9 @@ class BeamModulePlugin implements Plugin { // Build test options that configures test environment and framework def testOptions = [] if (config.tests) - testOptions += "--tests=$config.tests" + testOptions += "$config.tests" if (config.attribute) - testOptions += "--attr=$config.attribute" + testOptions += "-m=$config.attribute" testOptions.addAll(config.extraTestOptions) argMap["test_opts"] = testOptions @@ -2238,11 +2423,12 @@ class BeamModulePlugin implements Plugin { def addPortableWordCountTask = { boolean isStreaming, String runner -> def taskName = 'portableWordCount' + runner + (isStreaming ? 'Streaming' : 'Batch') + def flinkJobServerProject = ":runners:flink:${project.ext.latestFlinkVersion}:job-server" project.task(taskName) { dependsOn = ['installGcpTest'] mustRunAfter = [ - ':runners:flink:1.10:job-server:shadowJar', - ':runners:spark:job-server:shadowJar', + ":runners:flink:${project.ext.latestFlinkVersion}:job-server:shadowJar", + ':runners:spark:2:job-server:shadowJar', ':sdks:python:container:py36:docker', ':sdks:python:container:py37:docker', ':sdks:python:container:py38:docker', @@ -2256,8 +2442,8 @@ class BeamModulePlugin implements Plugin { "--runner=${runner}", "--parallelism=2", "--sdk_worker_parallelism=1", - "--flink_job_server_jar=${project.project(':runners:flink:1.10:job-server').shadowJar.archivePath}", - "--spark_job_server_jar=${project.project(':runners:spark:job-server').shadowJar.archivePath}", + "--flink_job_server_jar=${project.project(flinkJobServerProject).shadowJar.archivePath}", + "--spark_job_server_jar=${project.project(':runners:spark:2:job-server').shadowJar.archivePath}", ] if (isStreaming) options += [ diff --git a/buildSrc/src/main/groovy/org/apache/beam/gradle/GrpcVendoring_1_26_0.groovy b/buildSrc/src/main/groovy/org/apache/beam/gradle/GrpcVendoring_1_36_0.groovy similarity index 72% rename from buildSrc/src/main/groovy/org/apache/beam/gradle/GrpcVendoring_1_26_0.groovy rename to buildSrc/src/main/groovy/org/apache/beam/gradle/GrpcVendoring_1_36_0.groovy index 898faa2ab5d5..4345b4178043 100644 --- a/buildSrc/src/main/groovy/org/apache/beam/gradle/GrpcVendoring_1_26_0.groovy +++ b/buildSrc/src/main/groovy/org/apache/beam/gradle/GrpcVendoring_1_36_0.groovy @@ -21,28 +21,20 @@ package org.apache.beam.gradle /** * Utilities for working with our vendored version of gRPC. 
*/ -class GrpcVendoring_1_26_0 { +class GrpcVendoring_1_36_0 { - static def guava_version = "26.0-jre" - static def protobuf_version = "3.11.0" - static def grpc_version = "1.26.0" + static def guava_version = "30.1-jre" + static def protobuf_version = "3.15.3" + static def grpc_version = "1.36.0" static def gson_version = "2.8.6" - static def netty_version = "4.1.51.Final" - static def google_auth_version = "0.18.0" - static def proto_google_common_protos_version = "1.12.0" - static def opencensus_version = "0.24.0" - static def perfmark_version = "0.19.0" - static def lzma_java_version = "1.3" - static def protobuf_javanano_version = "3.0.0-alpha-5" - static def jzlib_version = "1.1.3" - static def compress_lzf_version = "1.0.3" - static def lz4_version = "1.3.0" - static def bouncycastle_version = "1.54" + // tcnative version from https://github.com/grpc/grpc-java/blob/master/SECURITY.md#netty + static def netty_version = "4.1.52.Final" + // google-auth-library version from https://search.maven.org/artifact/io.grpc/grpc-auth/1.36.0/jar + static def google_auth_version = "0.22.2" + // proto-google-common-protos version from https://search.maven.org/artifact/io.grpc/grpc-protobuf/1.36.0/jar + static def proto_google_common_protos_version = "2.0.1" + static def opencensus_version = "0.28.0" static def conscrypt_version = "2.5.1" - static def alpn_api_version = "1.1.2.v20150522" - static def npn_api_version = "1.1.1.v20141010" - static def jboss_marshalling_version = "1.4.11.Final" - static def jboss_modules_version = "1.1.0.Beta1" /** Returns the list of compile time dependencies. */ static List dependencies() { @@ -57,26 +49,16 @@ class GrpcVendoring_1_26_0 { "io.grpc:grpc-netty:$grpc_version", "io.grpc:grpc-protobuf:$grpc_version", "io.grpc:grpc-stub:$grpc_version", - "io.netty:netty-transport-native-epoll:$netty_version", + // Use a classifier to ensure we get the jar containing native libraries. In the future + // hopefully netty releases a single jar containing native libraries for all architectures. 
+ "io.netty:netty-transport-native-epoll:$netty_version:linux-x86_64", // tcnative version from https://github.com/grpc/grpc-java/blob/master/SECURITY.md#netty - "io.netty:netty-tcnative-boringssl-static:2.0.33.Final", + "io.netty:netty-tcnative-boringssl-static:2.0.34.Final", "com.google.auth:google-auth-library-credentials:$google_auth_version", "io.grpc:grpc-testing:$grpc_version", "com.google.api.grpc:proto-google-common-protos:$proto_google_common_protos_version", "io.opencensus:opencensus-api:$opencensus_version", "io.opencensus:opencensus-contrib-grpc-metrics:$opencensus_version", - "io.perfmark:perfmark-api:$perfmark_version", - "com.github.jponge:lzma-java:$lzma_java_version", - "com.google.protobuf.nano:protobuf-javanano:$protobuf_javanano_version", - "com.jcraft:jzlib:$jzlib_version", - "com.ning:compress-lzf:$compress_lzf_version", - "net.jpountz.lz4:lz4:$lz4_version", - "org.bouncycastle:bcpkix-jdk15on:$bouncycastle_version", - "org.bouncycastle:bcprov-jdk15on:$bouncycastle_version", - "org.eclipse.jetty.alpn:alpn-api:$alpn_api_version", - "org.eclipse.jetty.npn:npn-api:$npn_api_version", - "org.jboss.marshalling:jboss-marshalling:$jboss_marshalling_version", - "org.jboss.modules:jboss-modules:$jboss_modules_version" ] } @@ -86,10 +68,7 @@ class GrpcVendoring_1_26_0 { */ static List runtimeDependencies() { return [ - 'com.google.errorprone:error_prone_annotations:2.3.3', - 'commons-logging:commons-logging:1.2', - 'org.apache.logging.log4j:log4j-api:2.6.2', - 'org.slf4j:slf4j-api:1.7.30', + 'com.google.errorprone:error_prone_annotations:2.4.0', // TODO(BEAM-9288): Enable relocation for conscrypt "org.conscrypt:conscrypt-openjdk-uber:$conscrypt_version" ] @@ -119,7 +98,7 @@ class GrpcVendoring_1_26_0 { // those libraries may provide. The 'validateShadedJarDoesntLeakNonOrgApacheBeamClasses' // ensures that there are no classes outside of the 'org.apache.beam' namespace. - String version = "v1p26p0"; + String version = "v1p36p0"; String prefix = "org.apache.beam.vendor.grpc.${version}"; List packagesToRelocate = [ // guava uses the com.google.common and com.google.thirdparty package namespaces @@ -134,22 +113,11 @@ class GrpcVendoring_1_26_0 { "com.google.longrunning", "com.google.rpc", "com.google.type", + "com.google.geo.type", "io.grpc", "io.netty", "io.opencensus", "io.perfmark", - "com.google.protobuf.nano", - "com.jcraft", - "com.ning", - "com.sun", - "lzma", - "net.jpountz", - "org.bouncycastle", - "org.cservenak.streams", - "org.eclipse.jetty.alpn", - "org.eclipse.jetty.npn", - "org.jboss.marshalling", - "org.jboss.modules" ] return packagesToRelocate.collectEntries { @@ -161,6 +129,7 @@ class GrpcVendoring_1_26_0 { // this includes concatenation of string literals and constants. 
'META-INF/native/libnetty': "META-INF/native/liborg_apache_beam_vendor_grpc_${version}_netty", 'META-INF/native/netty': "META-INF/native/org_apache_beam_vendor_grpc_${version}_netty", + 'META-INF/native/lib-netty': "META-INF/native/lib-org-apache-beam-vendor-grpc-${version}-netty", ] } @@ -173,25 +142,29 @@ class GrpcVendoring_1_26_0 { "com/google/errorprone/**", "com/google/instrumentation/**", "com/google/j2objc/annotations/**", + "io/netty/handler/codec/marshalling/**", + "io/netty/handler/codec/spdy/**", + "io/netty/handler/codec/compression/JZlib*", + "io/netty/handler/codec/compression/Lz4*", + "io/netty/handler/codec/compression/Lzf*", + "io/netty/handler/codec/compression/Lzma*", + "io/netty/handler/codec/protobuf/Protobuf*Nano.class", + "io/netty/util/internal/logging/CommonsLogger*", + "io/netty/util/internal/logging/LocationAwareSlf4JLogger*", + "io/netty/util/internal/logging/Log4JLogger*", + "io/netty/util/internal/logging/Log4J2Logger*", "javax/annotation/**", "junit/**", "module-info.class", - "org/apache/commons/logging/**", - "org/apache/log/**", - "org/apache/log4j/**", - "org/apache/logging/log4j/**", "org/checkerframework/**", "org/codehaus/mojo/animal_sniffer/**", "org/conscrypt/**", "META-INF/native/libconscrypt**", "META-INF/native/conscrypt**", "org/hamcrest/**", - // This Main class prevents shading (BEAM-9252) - "org/jboss/modules/Main*", "org/junit/**", "org/mockito/**", "org/objenesis/**", - "org/slf4j/**", ] } diff --git a/buildSrc/src/main/groovy/org/apache/beam/gradle/Repositories.groovy b/buildSrc/src/main/groovy/org/apache/beam/gradle/Repositories.groovy index 13a0c105d01c..ad4099fbaddd 100644 --- a/buildSrc/src/main/groovy/org/apache/beam/gradle/Repositories.groovy +++ b/buildSrc/src/main/groovy/org/apache/beam/gradle/Repositories.groovy @@ -40,9 +40,9 @@ class Repositories { mavenLocal() jcenter() - // Spring only for resolving pentaho dependency. + // For pentaho dependencies. 
maven { - url "https://repo.spring.io/plugins-release/" + url "https://public.nexus.pentaho.org/repository/proxy-public-3rd-party-release" content { includeGroup "org.pentaho" } } @@ -81,6 +81,7 @@ class Repositories { jcenter() maven { url "https://plugins.gradle.org/m2/" } maven { url "https://repo.spring.io/plugins-release" } + maven { url "https://public.nexus.pentaho.org/repository/proxy-public-3rd-party-release" } maven { url "https://packages.confluent.io/maven/" } maven { url project.offlineRepositoryRoot } } diff --git a/dev-support/docker/Dockerfile b/dev-support/docker/Dockerfile index 92e5a8cdf845..ff9bb2ab52ea 100644 --- a/dev-support/docker/Dockerfile +++ b/dev-support/docker/Dockerfile @@ -42,32 +42,11 @@ ENV DEBCONF_TERSE true ### RUN apt -q update \ && apt install -y software-properties-common apt-utils apt-transport-https ca-certificates \ - && add-apt-repository -y ppa:deadsnakes/ppa \ - && apt-get -q install -y --no-install-recommends \ - bash-completion \ - build-essential \ - bzip2 \ - wget \ - curl \ - docker.io \ - g++ \ - gcc \ - git \ - gnupg-agent \ - rsync \ - sudo \ - vim \ - locales \ - wget \ - time \ - openjdk-8-jdk \ - python3-setuptools \ - python3-pip \ - python3.6 \ - python3.7 \ - python3.8 \ - virtualenv \ - tox + && add-apt-repository -y ppa:deadsnakes/ppa + +RUN mkdir /package +COPY pkglist /package/pkglist +RUN apt-get -q install -y --no-install-recommends $(grep -v '^#' /package/pkglist | cat) ### # Set the locale ( see https://stackoverflow.com/a/28406007/114196 ) @@ -87,7 +66,7 @@ RUN alias python=python3.6 ### # Install grpcio-tools mypy-protobuf for `python3 sdks/python/setup.py sdist` to work ### -RUN pip3 install grpcio-tools mypy-protobuf +RUN pip3 install grpcio-tools mypy-protobuf virtualenv ### # Install useful tools diff --git a/dev-support/docker/pkglist b/dev-support/docker/pkglist new file mode 100644 index 000000000000..00fc31684725 --- /dev/null +++ b/dev-support/docker/pkglist @@ -0,0 +1,39 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +bash-completion +build-essential +bzip2 +wget +curl +docker.io +g++ +gcc +git +gnupg-agent +rsync +sudo +vim +locales +wget +time +openjdk-8-jdk +python3-setuptools +python3-pip +python3.6 +python3.7 +python3.8 +tox +docker.io diff --git a/examples/java/build.gradle b/examples/java/build.gradle index c305f0451f09..fced844f2956 100644 --- a/examples/java/build.gradle +++ b/examples/java/build.gradle @@ -18,7 +18,12 @@ import groovy.json.JsonOutput -plugins { id 'org.apache.beam.module' } +plugins { + id 'java' + id 'org.apache.beam.module' + id 'com.github.johnrengelman.shadow' +} + applyJavaNature( exportJavadoc: false, automaticModuleName: 'org.apache.beam.examples', @@ -49,9 +54,11 @@ configurations.sparkRunnerPreCommit { dependencies { compile enforcedPlatform(library.java.google_cloud_platform_libraries_bom) compile library.java.vendored_guava_26_0_jre + compile library.java.kafka_clients compile project(path: ":sdks:java:core", configuration: "shadow") compile project(":sdks:java:extensions:google-cloud-platform-core") compile project(":sdks:java:io:google-cloud-platform") + compile project(":sdks:java:io:kafka") compile project(":sdks:java:extensions:ml") compile library.java.avro compile library.java.bigdataoss_util @@ -63,17 +70,36 @@ dependencies { compile library.java.google_cloud_datastore_v1_proto_client compile library.java.google_code_gson compile library.java.google_http_client + compile library.java.google_oauth_client + compile library.java.jackson_databind compile library.java.joda_time + compile library.java.protobuf_java + compile library.java.proto_google_cloud_bigtable_v2 compile library.java.proto_google_cloud_datastore_v1 compile library.java.slf4j_api - compile library.java.slf4j_jdk14 + provided library.java.commons_io + provided library.java.commons_csv runtime project(path: ":runners:direct-java", configuration: "shadow") + compile library.java.vendored_grpc_1_36_0 + compile library.java.vendored_guava_26_0_jre + compile "com.google.api.grpc:proto-google-cloud-language-v1:1.81.4" + compile ("io.confluent:kafka-avro-serializer:5.3.2") { + // It depends on "spotbugs-annotations:3.1.9" which clashes with current + // "spotbugs-annotations:3.1.12" used in Beam. Not required. + exclude group: "org.apache.zookeeper", module: "zookeeper" + } + compile "org.apache.commons:commons-lang3:3.9" + compile "org.apache.httpcomponents:httpclient:4.5.13" + compile "org.apache.httpcomponents:httpcore:4.4.13" + testCompile project(path: ":runners:direct-java", configuration: "shadow") testCompile project(":sdks:java:io:google-cloud-platform") testCompile project(":sdks:java:extensions:ml") testCompile library.java.hamcrest_core testCompile library.java.hamcrest_library testCompile library.java.junit testCompile library.java.mockito_core + testCompile library.java.testcontainers_kafka + testCompile library.java.testcontainers_gcloud // Add dependencies for the PreCommit configurations // For each runner a project level dependency on the examples project. 
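For orientation, here is a minimal sketch of the kind of pipeline the Kafka and Pub/Sub dependencies added above enable (see also the KafkaToPubsub example referenced later in this change). It is illustrative only: the broker address, Kafka topic, and Pub/Sub topic names are placeholders, and the class is not part of the examples module.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Values;
import org.apache.kafka.common.serialization.StringDeserializer;

/** Hedged sketch: read string records from Kafka and forward the values to Pub/Sub. */
public class KafkaToPubsubSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    p.apply("ReadFromKafka",
            KafkaIO.<String, String>read()
                .withBootstrapServers("kafka-broker:9092") // placeholder broker address
                .withTopic("input-topic")                  // placeholder topic
                .withKeyDeserializer(StringDeserializer.class)
                .withValueDeserializer(StringDeserializer.class)
                .withoutMetadata())                        // drop Kafka metadata, keep KV pairs
        .apply("TakeValues", Values.<String>create())
        .apply("WriteToPubsub",
            PubsubIO.writeStrings().to("projects/my-project/topics/my-topic")); // placeholder topic
    p.run().waitUntilFinish();
  }
}
```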
@@ -82,11 +108,11 @@ dependencies { delegate.add(runner + "PreCommit", project(path: ":examples:java", configuration: "testRuntime")) } directRunnerPreCommit project(path: ":runners:direct-java", configuration: "shadow") - flinkRunnerPreCommit project(":runners:flink:1.10") + flinkRunnerPreCommit project(":runners:flink:${project.ext.latestFlinkVersion}") // TODO: Make the netty version used configurable, we add netty-all 4.1.17.Final so it appears on the classpath // before 4.1.8.Final defined by Apache Beam sparkRunnerPreCommit "io.netty:netty-all:4.1.17.Final" - sparkRunnerPreCommit project(":runners:spark") + sparkRunnerPreCommit project(":runners:spark:2") sparkRunnerPreCommit project(":sdks:java:io:hadoop-file-system") sparkRunnerPreCommit library.java.spark_streaming sparkRunnerPreCommit library.java.spark_core @@ -129,3 +155,18 @@ task preCommit() { } } +task execute (type:JavaExec) { + main = System.getProperty("mainClass") + classpath = sourceSets.main.runtimeClasspath + systemProperties System.getProperties() + args System.getProperty("exec.args", "").split() +} + +// Run this task to validate the Java environment setup for contributors +task wordCount(type:JavaExec) { + description "Run the Java word count example" + main = "org.apache.beam.examples.WordCount" + classpath = sourceSets.main.runtimeClasspath + systemProperties = System.getProperties() + args = ["--output=/tmp/ouput.txt"] +} \ No newline at end of file diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/README.md b/examples/java/src/main/java/org/apache/beam/examples/complete/README.md index 3f4842a5c7a7..b74ea44554d5 100644 --- a/examples/java/src/main/java/org/apache/beam/examples/complete/README.md +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/README.md @@ -32,6 +32,11 @@ This directory contains end-to-end example pipelines that perform complex data p Pub/Sub topic, splits each line into individual words, capitalizes those words, and writes the output to a BigQuery table. +
  • KafkaToPubsub + — A streaming pipeline example that reads data + from one or more Apache Kafka topics and writes it into a single topic + in Google Cloud Pub/Sub. +
  • TfIdf — An example that computes a basic TF-IDF search table for a directory or Cloud Storage prefix. Demonstrates joining data, side inputs, and logging. diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/DataTokenization.java b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/DataTokenization.java new file mode 100644 index 000000000000..bbe3759a8ff6 --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/DataTokenization.java @@ -0,0 +1,343 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.examples.complete.datatokenization; + +import static org.apache.beam.examples.complete.datatokenization.utils.DurationUtils.parseDuration; +import static org.apache.beam.examples.complete.datatokenization.utils.SchemasUtils.DEADLETTER_SCHEMA; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import java.io.IOException; +import java.nio.charset.StandardCharsets; +import org.apache.beam.examples.complete.datatokenization.options.DataTokenizationOptions; +import org.apache.beam.examples.complete.datatokenization.transforms.DataProtectors.RowToTokenizedRow; +import org.apache.beam.examples.complete.datatokenization.transforms.JsonToBeamRow; +import org.apache.beam.examples.complete.datatokenization.transforms.SerializableFunctions; +import org.apache.beam.examples.complete.datatokenization.transforms.io.TokenizationBigQueryIO; +import org.apache.beam.examples.complete.datatokenization.transforms.io.TokenizationBigTableIO; +import org.apache.beam.examples.complete.datatokenization.transforms.io.TokenizationFileSystemIO; +import org.apache.beam.examples.complete.datatokenization.utils.ErrorConverters; +import org.apache.beam.examples.complete.datatokenization.utils.FailsafeElement; +import org.apache.beam.examples.complete.datatokenization.utils.FailsafeElementCoder; +import org.apache.beam.examples.complete.datatokenization.utils.RowToCsv; +import org.apache.beam.examples.complete.datatokenization.utils.SchemasUtils; +import org.apache.beam.sdk.Pipeline; +import org.apache.beam.sdk.PipelineResult; +import org.apache.beam.sdk.coders.CoderRegistry; +import org.apache.beam.sdk.coders.KvCoder; +import org.apache.beam.sdk.coders.NullableCoder; +import org.apache.beam.sdk.coders.RowCoder; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.coders.VarIntCoder; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.io.gcp.bigquery.WriteResult; +import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO; +import org.apache.beam.sdk.options.PipelineOptionsFactory; +import org.apache.beam.sdk.transforms.MapElements; +import 
org.apache.beam.sdk.transforms.windowing.FixedWindows; +import org.apache.beam.sdk.transforms.windowing.Window; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.sdk.values.TupleTag; +import org.apache.beam.sdk.values.TypeDescriptors; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * The {@link DataTokenization} pipeline reads data from one of the supported sources, tokenizes + * data with external API calls to some tokenization server, and writes data into one of the + * supported sinks.
    + * + *

    Pipeline Requirements + * + *

      + *
    • Java 8 + *
    • Data schema (JSON with an array of fields described in BigQuery format) + *
    • 1 of supported sources to read data from + * + *
    • 1 of supported destination sinks to write data into + * + *
    • A configured tokenization server + *
    + * + *

    Example Usage + * + *

    + * Gradle Preparation
    + * To run this example, your build.gradle file should contain the following task
    + * to execute the pipeline:
    + *   {@code
    + *   task execute (type:JavaExec) {
    + *      main = System.getProperty("mainClass")
    + *      classpath = sourceSets.main.runtimeClasspath
    + *      systemProperties System.getProperties()
    + *      args System.getProperty("exec.args", "").split()
    + *   }
    + *   }
    + * This task allows you to run the pipeline via the following command:
    + *   {@code
    + *   gradle clean execute -DmainClass=org.apache.beam.examples.complete.datatokenization.DataTokenization \
    + *        -Dexec.args="--<argument>=<value> --<argument>=<value>"
    + *   }
    + * Running the pipeline
    + * To execute this pipeline, specify the parameters:
    + *
    + * - Data schema
    + *     - dataSchemaPath: Path to data schema (JSON format) compatible with BigQuery.
    + * - 1 specified input source out of these:
    + *     - File System
    + *         - inputFilePattern: Filepattern for files to read data from
    + *         - inputFileFormat: File format of input files. Supported formats: JSON, CSV
    + *             - If the input data is in CSV format:
    + *             - csvContainsHeaders: `true` if file(s) in bucket to read data from contain headers,
    + *               and `false` otherwise
    + *             - csvDelimiter: Delimiting character in CSV. Default: use delimiter provided in
    + *               csvFormat
    + *             - csvFormat: Csv format according to Apache Commons CSV format. Default is:
    + *               [Apache Commons CSV default](https://static.javadoc.io/org.apache.commons/commons-csv/1.7/org/apache/commons/csv/CSVFormat.html#DEFAULT)
    + *               . Must exactly match one of the format names listed
    + *               at: https://static.javadoc.io/org.apache.commons/commons-csv/1.7/org/apache/commons/csv/CSVFormat.Predefined.html
    + *     - Google Pub/Sub
    + *         - pubsubTopic: The Cloud Pub/Sub topic to read from, in the format of '
    + *           projects/yourproject/topics/yourtopic'
    + * - 1 specified output sink out of these:
    + *     - File System
    + *         - outputDirectory: Directory to write data to
    + *         - outputFileFormat: File format of output files. Supported formats: JSON, CSV
    + *         - windowDuration: The window duration in which data will be written. Should be specified
    + *           only for 'Pub/Sub -> FileSystem' case. Defaults to 30s.
    + *
    + *           Allowed formats are:
    + *             - Ns (for seconds, example: 5s),
    + *             - Nm (for minutes, example: 12m),
    + *             - Nh (for hours, example: 2h).
    + *     - Google Cloud BigQuery
    + *         - bigQueryTableName: Cloud BigQuery table name to write into
    + *         - tempLocation: Folder in a Google Cloud Storage bucket, which is needed for
    + *           BigQuery to handle data writing
    + *     - Cloud BigTable
    + *         - bigTableProjectId: Id of the project where the Cloud BigTable instance to write into
    + *           is located
    + *         - bigTableInstanceId: Id of the Cloud BigTable instance to write into
    + *         - bigTableTableId: Id of the Cloud BigTable table to write into
    + *         - bigTableKeyColumnName: Column name to use as a key in Cloud BigTable
    + *         - bigTableColumnFamilyName: Column family name to use in Cloud BigTable
    + * - RPC server parameters
    + *     - rpcUri: URI for the API calls to RPC server
    + *     - batchSize: Size of the batch to send to RPC server per request
    + *
    + * The template allows the user to supply the following optional parameter:
    + *
    + * - nonTokenizedDeadLetterPath: Folder where data that failed to tokenize will be stored
    + *
    + *
    + * Specify the parameters in the following format:
    + *
    + * {@code
    + * --dataSchemaPath="path-to-data-schema-in-json-format"
    + * --inputFilePattern="path-pattern-to-input-data"
    + * --outputDirectory="path-to-output-directory"
    + * # example for CSV case
    + * --inputFileFormat="CSV"
    + * --outputFileFormat="CSV"
    + * --csvContainsHeaders="true"
    + * --nonTokenizedDeadLetterPath="path-to-errors-rows-writing"
    + * --batchSize=batch-size-number
    + * --rpcUri=http://host:port/tokenize
    + * }
    + *
    + * By default, this will run the pipeline locally with the DirectRunner. To change the runner, specify:
    + *
    + * {@code
    + * --runner=YOUR_SELECTED_RUNNER
    + * }
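    + * For example (a hedged illustration; the project, region, and bucket below are placeholders),
    + * running on Dataflow typically also needs the standard Google Cloud options:
    + *   {@code
    + *   --runner=DataflowRunner
    + *   --project=your-gcp-project
    + *   --region=us-central1
    + *   --tempLocation=gs://your-bucket/temp
    + *   }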
    + * 
    + */ +public class DataTokenization { + + /** Logger for class. */ + private static final Logger LOG = LoggerFactory.getLogger(DataTokenization.class); + + /** String/String Coder for FailsafeElement. */ + public static final FailsafeElementCoder FAILSAFE_ELEMENT_CODER = + FailsafeElementCoder.of( + NullableCoder.of(StringUtf8Coder.of()), NullableCoder.of(StringUtf8Coder.of())); + + /** The default suffix for error tables if dead letter table is not specified. */ + private static final String DEFAULT_DEADLETTER_TABLE_SUFFIX = "_error_records"; + + /** The tag for the main output for the UDF. */ + private static final TupleTag TOKENIZATION_OUT = new TupleTag() {}; + + /** The tag for the dead-letter output of the udf. */ + static final TupleTag> TOKENIZATION_DEADLETTER_OUT = + new TupleTag>() {}; + + /** + * Main entry point for pipeline execution. + * + * @param args Command line arguments to the pipeline. + */ + public static void main(String[] args) { + DataTokenizationOptions options = + PipelineOptionsFactory.fromArgs(args).withValidation().as(DataTokenizationOptions.class); + FileSystems.setDefaultPipelineOptions(options); + + run(options); + } + + /** + * Runs the pipeline to completion with the specified options. + * + * @param options The execution options. + * @return The pipeline result. + */ + @SuppressWarnings({"dereference.of.nullable", "argument.type.incompatible"}) + public static PipelineResult run(DataTokenizationOptions options) { + SchemasUtils schema = null; + try { + schema = new SchemasUtils(options.getDataSchemaPath(), StandardCharsets.UTF_8); + } catch (IOException e) { + LOG.error("Failed to retrieve schema for data.", e); + } + checkArgument(schema != null, "Data schema is mandatory."); + + // Create the pipeline + Pipeline pipeline = Pipeline.create(options); + // Register the coder for pipeline + CoderRegistry coderRegistry = pipeline.getCoderRegistry(); + coderRegistry.registerCoderForType( + FAILSAFE_ELEMENT_CODER.getEncodedTypeDescriptor(), FAILSAFE_ELEMENT_CODER); + coderRegistry.registerCoderForType( + RowCoder.of(schema.getBeamSchema()).getEncodedTypeDescriptor(), + RowCoder.of(schema.getBeamSchema())); + + /* + * Row/Row Coder for FailsafeElement. 
+ */ + FailsafeElementCoder coder = + FailsafeElementCoder.of( + RowCoder.of(schema.getBeamSchema()), RowCoder.of(schema.getBeamSchema())); + + coderRegistry.registerCoderForType(coder.getEncodedTypeDescriptor(), coder); + + PCollection rows; + if (options.getInputFilePattern() != null) { + rows = new TokenizationFileSystemIO(options).read(pipeline, schema); + } else if (options.getPubsubTopic() != null) { + rows = + pipeline + .apply( + "ReadMessagesFromPubsub", + PubsubIO.readStrings().fromTopic(options.getPubsubTopic())) + .apply( + "TransformToBeamRow", + new JsonToBeamRow(options.getNonTokenizedDeadLetterPath(), schema)); + if (options.getOutputDirectory() != null) { + rows = rows.apply(Window.into(FixedWindows.of(parseDuration(options.getWindowDuration())))); + } + } else { + throw new IllegalStateException( + "No source is provided, please configure File System or Pub/Sub"); + } + + /* + Tokenize data using remote API call + */ + PCollectionTuple tokenizedRows = + rows.setRowSchema(schema.getBeamSchema()) + .apply( + MapElements.into( + TypeDescriptors.kvs(TypeDescriptors.integers(), TypeDescriptors.rows())) + .via((Row row) -> KV.of(0, row))) + .setCoder(KvCoder.of(VarIntCoder.of(), RowCoder.of(schema.getBeamSchema()))) + .apply( + "DsgTokenization", + RowToTokenizedRow.newBuilder() + .setBatchSize(options.getBatchSize()) + .setRpcURI(options.getRpcUri()) + .setSchema(schema.getBeamSchema()) + .setSuccessTag(TOKENIZATION_OUT) + .setFailureTag(TOKENIZATION_DEADLETTER_OUT) + .build()); + + String csvDelimiter = options.getCsvDelimiter(); + if (options.getNonTokenizedDeadLetterPath() != null) { + /* + Write tokenization errors to dead-letter sink + */ + tokenizedRows + .get(TOKENIZATION_DEADLETTER_OUT) + .apply( + "ConvertToCSV", + MapElements.into(FAILSAFE_ELEMENT_CODER.getEncodedTypeDescriptor()) + .via( + (FailsafeElement fse) -> + FailsafeElement.of( + new RowToCsv(csvDelimiter).getCsvFromRow(fse.getOriginalPayload()), + new RowToCsv(csvDelimiter).getCsvFromRow(fse.getPayload())))) + .apply( + "WriteTokenizationErrorsToFS", + ErrorConverters.WriteErrorsToTextIO.newBuilder() + .setErrorWritePath(options.getNonTokenizedDeadLetterPath()) + .setTranslateFunction(SerializableFunctions.getCsvErrorConverter()) + .build()); + } + + if (options.getOutputDirectory() != null) { + new TokenizationFileSystemIO(options) + .write(tokenizedRows.get(TOKENIZATION_OUT), schema.getBeamSchema()); + } else if (options.getBigQueryTableName() != null) { + WriteResult writeResult = + TokenizationBigQueryIO.write( + tokenizedRows.get(TOKENIZATION_OUT), + options.getBigQueryTableName(), + schema.getBigQuerySchema()); + writeResult + .getFailedInsertsWithErr() + .apply( + "WrapInsertionErrors", + MapElements.into(FAILSAFE_ELEMENT_CODER.getEncodedTypeDescriptor()) + .via(TokenizationBigQueryIO::wrapBigQueryInsertError)) + .setCoder(FAILSAFE_ELEMENT_CODER) + .apply( + "WriteInsertionFailedRecords", + ErrorConverters.WriteStringMessageErrors.newBuilder() + .setErrorRecordsTable( + options.getBigQueryTableName() + DEFAULT_DEADLETTER_TABLE_SUFFIX) + .setErrorRecordsTableSchema(DEADLETTER_SCHEMA) + .build()); + } else if (options.getBigTableInstanceId() != null) { + new TokenizationBigTableIO(options) + .write(tokenizedRows.get(TOKENIZATION_OUT), schema.getBeamSchema()); + } else { + throw new IllegalStateException( + "No sink is provided, please configure BigQuery or BigTable."); + } + + return pipeline.run(); + } +} diff --git 
a/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/README.md b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/README.md new file mode 100644 index 000000000000..7d92f604ee4f --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/README.md @@ -0,0 +1,172 @@ + + +# Apache Beam pipeline example to tokenize data using remote RPC server + +This directory contains an Apache Beam example that creates a pipeline to read data from one of +the supported sources, tokenize data with external API calls to remote RPC server, and write data into one of the supported sinks. + +Supported data formats: + +- JSON +- CSV + +Supported input sources: + +- File system +- [Google Pub/Sub](https://cloud.google.com/pubsub) + +Supported destination sinks: + +- File system +- [Google Cloud BigQuery](https://cloud.google.com/bigquery) +- [Cloud BigTable](https://cloud.google.com/bigtable) + +Supported data schema format: + +- JSON with an array of fields described in BigQuery format + +In the main scenario, the template will create an Apache Beam pipeline that will read data in CSV or +JSON format from a specified input source, send the data to an external processing server, receive +processed data, and write it into a specified output sink. + +## Requirements + +- Java 8 +- 1 of supported sources to read data from +- 1 of supported destination sinks to write data into +- A configured RPC to tokenize data + +## Getting Started + +This section describes what is needed to get the template up and running. + +- Gradle preparation +- Local execution +- Running as a Dataflow Template + - Setting Up Project Environment + - Build Data Tokenization Dataflow Flex Template + - Creating the Dataflow Flex Template + - Executing Template + +## Gradle preparation + +To run this example your `build.gradle` file should contain the following task to execute the pipeline: + +``` +task execute (type:JavaExec) { + main = System.getProperty("mainClass") + classpath = sourceSets.main.runtimeClasspath + systemProperties System.getProperties() + args System.getProperty("exec.args", "").split() +} +``` + +This task allows to run the pipeline via the following command: + +```bash +gradle clean execute -DmainClass=org.apache.beam.examples.complete.datatokenization.DataTokenization \ + -Dexec.args="--= --=" +``` + +## Running the pipeline + +To execute this pipeline, specify the parameters: + +- Data schema + - **dataSchemaPath**: Path to data schema (JSON format) compatible with BigQuery. +- 1 specified input source out of these: + - File System + - **inputFilePattern**: Filepattern for files to read data from + - **inputFileFormat**: File format of input files. Supported formats: JSON, CSV + - In case if input data is in CSV format: + - **csvContainsHeaders**: `true` if file(s) in bucket to read data from contain headers, + and `false` otherwise + - **csvDelimiter**: Delimiting character in CSV. Default: use delimiter provided in + csvFormat + - **csvFormat**: Csv format according to Apache Commons CSV format. Default is: + [Apache Commons CSV default](https://static.javadoc.io/org.apache.commons/commons-csv/1.7/org/apache/commons/csv/CSVFormat.html#DEFAULT) + . 
Must match format names exactly found + at: https://static.javadoc.io/org.apache.commons/commons-csv/1.7/org/apache/commons/csv/CSVFormat.Predefined.html + - Google Pub/Sub + - **pubsubTopic**: The Cloud Pub/Sub topic to read from, in the format of ' + projects/yourproject/topics/yourtopic' +- 1 specified output sink out of these: + - File System + - **outputDirectory**: Directory to write data to + - **outputFileFormat**: File format of output files. Supported formats: JSON, CSV + - **windowDuration**: The window duration in which data will be written. Should be specified + only for 'Pub/Sub -> GCS' case. Defaults to 30s. + + Allowed formats are: + - Ns (for seconds, example: 5s), + - Nm (for minutes, example: 12m), + - Nh (for hours, example: 2h). + - Google Cloud BigQuery + - **bigQueryTableName**: Cloud BigQuery table name to write into + - **tempLocation**: Folder in a Google Cloud Storage bucket, which is needed for + BigQuery to handle data writing + - Cloud BigTable + - **bigTableProjectId**: Id of the project where the Cloud BigTable instance to write into + is located + - **bigTableInstanceId**: Id of the Cloud BigTable instance to write into + - **bigTableTableId**: Id of the Cloud BigTable table to write into + - **bigTableKeyColumnName**: Column name to use as a key in Cloud BigTable + - **bigTableColumnFamilyName**: Column family name to use in Cloud BigTable +- RPC server parameters + - **rpcUri**: URI for the API calls to RPC server + - **batchSize**: Size of the batch to send to RPC server per request + +The template allows for the user to supply the following optional parameter: + +- **nonTokenizedDeadLetterPath**: Folder where failed to tokenize data will be stored + +The template also allows user to override the environment variable: + +- **MAX_BUFFERING_DURATION_MS**: Max duration of buffering rows in milliseconds. Default value: 100ms. + +in the following format: + +```bash +--dataSchemaPath="path-to-data-schema-in-json-format" +--inputFilePattern="path-pattern-to-input-data" +--outputDirectory="path-to-output-directory" +# example for CSV case +--inputFileFormat="CSV" +--outputFileFormat="CSV" +--csvContainsHeaders="true" +--nonTokenizedDeadLetterPath="path-to-errors-rows-writing" +--batchSize=batch-size-number +--rpcUri=http://host:port/tokenize +``` + +By default, this will run the pipeline locally with the DirectRunner. To change the runner, specify: + +```bash +--runner=YOUR_SELECTED_RUNNER +``` + +See the [documentation](http://beam.apache.org/get-started/quickstart/) and +the [Examples README](../../../../../../../../../README.md) for more information about how to run this example. + +## Running as a Dataflow Template + +This example also exists as Google Dataflow Template, which you can build and run using Google Cloud Platform. See +this template documentation [README.md](https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/master/v2/protegrity-data-tokenization/README.md) for +more information. diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/options/DataTokenizationOptions.java b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/options/DataTokenizationOptions.java new file mode 100644 index 000000000000..c68a5378e487 --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/options/DataTokenizationOptions.java @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.examples.complete.datatokenization.options; + +import org.apache.beam.examples.complete.datatokenization.transforms.io.TokenizationBigTableIO; +import org.apache.beam.examples.complete.datatokenization.transforms.io.TokenizationFileSystemIO.FileSystemPipelineOptions; +import org.apache.beam.sdk.options.Default; +import org.apache.beam.sdk.options.Description; +import org.apache.beam.sdk.options.PipelineOptions; + +/** + * The {@link DataTokenizationOptions} interface provides the custom execution options passed by the + * executor at the command-line. + */ +public interface DataTokenizationOptions + extends PipelineOptions, FileSystemPipelineOptions, TokenizationBigTableIO.BigTableOptions { + + @Description("Path to data schema (JSON format) compatible with BigQuery.") + String getDataSchemaPath(); + + void setDataSchemaPath(String dataSchemaPath); + + @Description( + "The Cloud Pub/Sub topic to read from." + + "The name should be in the format of " + + "projects//topics/.") + String getPubsubTopic(); + + void setPubsubTopic(String pubsubTopic); + + @Description("Cloud BigQuery table name to write into.") + String getBigQueryTableName(); + + void setBigQueryTableName(String bigQueryTableName); + + // Tokenization API specific parameters + @Description("URI for the API calls to RPC server.") + String getRpcUri(); + + void setRpcUri(String dsgUri); + + @Description("Size of the batch to send to RPC server per request.") + @Default.Integer(10) + Integer getBatchSize(); + + void setBatchSize(Integer batchSize); + + @Description("Dead-Letter path to store not-tokenized data") + String getNonTokenizedDeadLetterPath(); + + void setNonTokenizedDeadLetterPath(String nonTokenizedDeadLetterPath); +} diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/options/package-info.java b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/options/package-info.java new file mode 100644 index 000000000000..b61fce151910 --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/options/package-info.java @@ -0,0 +1,18 @@ +/* + * Copyright (C) 2020 Google Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); you may not + * use this file except in compliance with the License. You may obtain a copy of + * the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + * License for the specific language governing permissions and limitations under + * the License. + */ + +/** Protegrity Data Tokenization template for Google Cloud Teleport. 
*/ +package org.apache.beam.examples.complete.datatokenization.options; diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/package-info.java b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/package-info.java new file mode 100644 index 000000000000..09d4df153c3b --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/package-info.java @@ -0,0 +1,18 @@ +/* + * Copyright (C) 2020 Google Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); you may not + * use this file except in compliance with the License. You may obtain a copy of + * the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + * License for the specific language governing permissions and limitations under + * the License. + */ + +/** Protegrity Data Tokenization template for Google Cloud Teleport. */ +package org.apache.beam.examples.complete.datatokenization; diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/transforms/DataProtectors.java b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/transforms/DataProtectors.java new file mode 100644 index 000000000000..199b139c4b13 --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/transforms/DataProtectors.java @@ -0,0 +1,290 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.examples.complete.datatokenization.transforms; + +import static org.apache.beam.sdk.util.RowJsonUtils.rowToJson; + +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.api.services.bigquery.model.TableRow; +import com.google.auto.value.AutoValue; +import java.io.IOException; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.UUID; +import org.apache.beam.examples.complete.datatokenization.utils.FailsafeElement; +import org.apache.beam.examples.complete.datatokenization.utils.FailsafeElementCoder; +import org.apache.beam.sdk.coders.RowCoder; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.Field; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.GroupIntoBatches; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.util.RowJson; +import org.apache.beam.sdk.util.RowJsonUtils; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.sdk.values.TupleTag; +import org.apache.beam.sdk.values.TupleTagList; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.gson.Gson; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.gson.JsonArray; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.gson.JsonObject; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Throwables; +import org.apache.commons.io.IOUtils; +import org.apache.http.HttpEntity; +import org.apache.http.HttpStatus; +import org.apache.http.client.methods.CloseableHttpResponse; +import org.apache.http.client.methods.HttpPost; +import org.apache.http.entity.ByteArrayEntity; +import org.apache.http.entity.ContentType; +import org.apache.http.impl.client.CloseableHttpClient; +import org.apache.http.impl.client.HttpClients; +import org.joda.time.Duration; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * The {@link DataProtectors} Using passing parameters transform will buffer input rows in batch and + * will send it when the count of buffered rows will equal specified batch size. When it takes the + * last one batch, it will send it when the last row will come to doFn even count of buffered rows + * will less than the batch size. + */ +public class DataProtectors { + + /** Logger for class. */ + private static final Logger LOG = LoggerFactory.getLogger(DataProtectors.class); + + public static final String ID_FIELD_NAME = "ID"; + private static final Long MAX_BUFFERING_DURATION_MS = + Long.valueOf(System.getenv().getOrDefault("MAX_BUFFERING_DURATION_MS", "100")); + + /** + * The {@link RowToTokenizedRow} transform converts {@link Row} to {@link TableRow} objects. The + * transform accepts a {@link FailsafeElement} object so the original payload of the incoming + * record can be maintained across multiple series of transforms. 
+ */ + @AutoValue + public abstract static class RowToTokenizedRow + extends PTransform>, PCollectionTuple> { + + public static Builder newBuilder() { + return new AutoValue_DataProtectors_RowToTokenizedRow.Builder<>(); + } + + public abstract TupleTag successTag(); + + public abstract TupleTag> failureTag(); + + public abstract Schema schema(); + + public abstract int batchSize(); + + public abstract String rpcURI(); + + @Override + public PCollectionTuple expand(PCollection> inputRows) { + FailsafeElementCoder coder = + FailsafeElementCoder.of(RowCoder.of(schema()), RowCoder.of(schema())); + + Duration maxBuffering = Duration.millis(MAX_BUFFERING_DURATION_MS); + PCollectionTuple pCollectionTuple = + inputRows + .apply( + "GroupRowsIntoBatches", + GroupIntoBatches.ofSize(batchSize()) + .withMaxBufferingDuration(maxBuffering)) + .apply( + "Tokenize", + ParDo.of(new TokenizationFn(schema(), rpcURI(), failureTag())) + .withOutputTags(successTag(), TupleTagList.of(failureTag()))); + + return PCollectionTuple.of( + successTag(), pCollectionTuple.get(successTag()).setRowSchema(schema())) + .and(failureTag(), pCollectionTuple.get(failureTag()).setCoder(coder)); + } + + /** Builder for {@link RowToTokenizedRow}. */ + @AutoValue.Builder + public abstract static class Builder { + + public abstract Builder setSuccessTag(TupleTag successTag); + + public abstract Builder setFailureTag(TupleTag> failureTag); + + public abstract Builder setSchema(Schema schema); + + public abstract Builder setBatchSize(int batchSize); + + public abstract Builder setRpcURI(String rpcURI); + + public abstract RowToTokenizedRow build(); + } + } + + /** Class implements stateful doFn for data tokenization using remote RPC. */ + @SuppressWarnings("initialization.static.fields.uninitialized") + public static class TokenizationFn extends DoFn>, Row> { + + private static Schema schemaToRpc; + private static CloseableHttpClient httpclient; + private static ObjectMapper objectMapperSerializerForSchema; + private static ObjectMapper objectMapperDeserializerForSchema; + + private final Schema schema; + private final String rpcURI; + private final TupleTag> failureTag; + + private Map inputRowsWithIds; + + public TokenizationFn( + Schema schema, String rpcURI, TupleTag> failureTag) { + this.schema = schema; + this.rpcURI = rpcURI; + this.failureTag = failureTag; + this.inputRowsWithIds = new HashMap<>(); + } + + @Setup + public void setup() { + + List fields = schema.getFields(); + fields.add(Field.of(ID_FIELD_NAME, FieldType.STRING)); + schemaToRpc = new Schema(fields); + + objectMapperSerializerForSchema = + RowJsonUtils.newObjectMapperWith(RowJson.RowJsonSerializer.forSchema(schemaToRpc)); + + objectMapperDeserializerForSchema = + RowJsonUtils.newObjectMapperWith(RowJson.RowJsonDeserializer.forSchema(schemaToRpc)); + + httpclient = HttpClients.createDefault(); + } + + @Teardown + public void close() { + try { + httpclient.close(); + } catch (IOException exception) { + String exceptionMessage = exception.getMessage(); + if (exceptionMessage != null) { + LOG.warn("Can't close connection: {}", exceptionMessage); + } + } + } + + @ProcessElement + @SuppressWarnings("argument.type.incompatible") + public void process(@Element KV> element, ProcessContext context) { + Iterable rows = element.getValue(); + + try { + for (Row outputRow : getTokenizedRow(rows)) { + context.output(outputRow); + } + } catch (Exception e) { + for (Row outputRow : rows) { + context.output( + failureTag, + FailsafeElement.of(outputRow, outputRow) + 
.setErrorMessage(e.getMessage()) + .setStacktrace(Throwables.getStackTraceAsString(e))); + } + } + } + + private ArrayList rowsToJsons(Iterable inputRows) { + ArrayList jsons = new ArrayList<>(); + Map inputRowsWithIds = new HashMap<>(); + for (Row inputRow : inputRows) { + + Row.Builder builder = Row.withSchema(schemaToRpc); + for (Schema.Field field : schemaToRpc.getFields()) { + if (inputRow.getSchema().hasField(field.getName())) { + builder = builder.addValue(inputRow.getValue(field.getName())); + } + } + String id = UUID.randomUUID().toString(); + builder = builder.addValue(id); + inputRowsWithIds.put(id, inputRow); + + Row row = builder.build(); + + jsons.add(rowToJson(objectMapperSerializerForSchema, row)); + } + this.inputRowsWithIds = inputRowsWithIds; + return jsons; + } + + private String formatJsonsToRpcBatch(Iterable jsons) { + StringBuilder stringBuilder = new StringBuilder(String.join(",", jsons)); + stringBuilder.append("]").insert(0, "{\"data\": [").append("}"); + return stringBuilder.toString(); + } + + @SuppressWarnings("argument.type.incompatible") + private ArrayList getTokenizedRow(Iterable inputRows) throws IOException { + ArrayList outputRows = new ArrayList<>(); + + CloseableHttpResponse response = + sendRpc(formatJsonsToRpcBatch(rowsToJsons(inputRows)).getBytes(Charset.defaultCharset())); + + if (response.getStatusLine().getStatusCode() != HttpStatus.SC_OK) { + LOG.error("Send to RPC '{}' failed with '{}'", this.rpcURI, response.getStatusLine()); + } + + String tokenizedData = + IOUtils.toString(response.getEntity().getContent(), StandardCharsets.UTF_8); + + Gson gson = new Gson(); + JsonArray jsonTokenizedRows = + gson.fromJson(tokenizedData, JsonObject.class).getAsJsonArray("data"); + + for (int i = 0; i < jsonTokenizedRows.size(); i++) { + Row tokenizedRow = + RowJsonUtils.jsonToRow( + objectMapperDeserializerForSchema, jsonTokenizedRows.get(i).toString()); + Row.FieldValueBuilder rowBuilder = + Row.fromRow(this.inputRowsWithIds.get(tokenizedRow.getString(ID_FIELD_NAME))); + for (Schema.Field field : schemaToRpc.getFields()) { + if (field.getName().equals(ID_FIELD_NAME)) { + continue; + } + rowBuilder = + rowBuilder.withFieldValue(field.getName(), tokenizedRow.getValue(field.getName())); + } + outputRows.add(rowBuilder.build()); + } + + return outputRows; + } + + private CloseableHttpResponse sendRpc(byte[] data) throws IOException { + HttpPost httpPost = new HttpPost(rpcURI); + HttpEntity stringEntity = new ByteArrayEntity(data, ContentType.APPLICATION_JSON); + httpPost.setEntity(stringEntity); + return httpclient.execute(httpPost); + } + } +} diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/transforms/JsonToBeamRow.java b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/transforms/JsonToBeamRow.java new file mode 100644 index 000000000000..a6c87a368cf9 --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/transforms/JsonToBeamRow.java @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.examples.complete.datatokenization.transforms; + +import static org.apache.beam.examples.complete.datatokenization.DataTokenization.FAILSAFE_ELEMENT_CODER; + +import org.apache.beam.examples.complete.datatokenization.utils.ErrorConverters; +import org.apache.beam.examples.complete.datatokenization.utils.FailsafeElement; +import org.apache.beam.examples.complete.datatokenization.utils.SchemasUtils; +import org.apache.beam.sdk.transforms.JsonToRow; +import org.apache.beam.sdk.transforms.JsonToRow.ParseResult; +import org.apache.beam.sdk.transforms.MapElements; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Strings; + +/** The {@link JsonToBeamRow} converts jsons string to beam rows. */ +public class JsonToBeamRow extends PTransform, PCollection> { + + private final String failedToParseDeadLetterPath; + private final transient SchemasUtils schema; + + public JsonToBeamRow(String failedToParseDeadLetterPath, SchemasUtils schema) { + this.failedToParseDeadLetterPath = failedToParseDeadLetterPath; + this.schema = schema; + } + + @Override + @SuppressWarnings("argument.type.incompatible") + public PCollection expand(PCollection jsons) { + ParseResult rows = + jsons.apply( + "JsonToRow", + JsonToRow.withExceptionReporting(schema.getBeamSchema()).withExtendedErrorInfo()); + + if (failedToParseDeadLetterPath != null) { + /* + * Write Row conversion errors to filesystem specified path + */ + rows.getFailedToParseLines() + .apply( + "ToFailsafeElement", + MapElements.into(FAILSAFE_ELEMENT_CODER.getEncodedTypeDescriptor()) + .via( + (Row errRow) -> + FailsafeElement.of( + Strings.nullToEmpty(errRow.getString("line")), + Strings.nullToEmpty(errRow.getString("line"))) + .setErrorMessage(Strings.nullToEmpty(errRow.getString("err"))))) + .apply( + "WriteCsvConversionErrorsToFS", + ErrorConverters.WriteErrorsToTextIO.newBuilder() + .setErrorWritePath(failedToParseDeadLetterPath) + .setTranslateFunction(SerializableFunctions.getCsvErrorConverter()) + .build()); + } + + return rows.getResults().setRowSchema(schema.getBeamSchema()); + } +} diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/transforms/SerializableFunctions.java b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/transforms/SerializableFunctions.java new file mode 100644 index 000000000000..1e5df0776e16 --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/transforms/SerializableFunctions.java @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.examples.complete.datatokenization.transforms; + +import java.time.Instant; +import java.util.ArrayList; +import org.apache.beam.examples.complete.datatokenization.utils.FailsafeElement; +import org.apache.beam.sdk.transforms.SerializableFunction; + +/** The {@link SerializableFunctions} class to store static Serializable functions. */ +public class SerializableFunctions { + + private static final SerializableFunction, String> + csvErrorConverter = + (FailsafeElement failsafeElement) -> { + ArrayList outputRow = new ArrayList<>(); + final String message = failsafeElement.getOriginalPayload(); + String timestamp = Instant.now().toString(); + outputRow.add(timestamp); + outputRow.add(failsafeElement.getErrorMessage()); + outputRow.add(failsafeElement.getStacktrace()); + // Only set the payload if it's populated on the message. + if (failsafeElement.getOriginalPayload() != null) { + outputRow.add(message); + } + + return String.join(",", outputRow); + }; + + public static SerializableFunction, String> + getCsvErrorConverter() { + return csvErrorConverter; + } +} diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/transforms/io/TokenizationBigQueryIO.java b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/transforms/io/TokenizationBigQueryIO.java new file mode 100644 index 000000000000..fe8f4c1afad8 --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/transforms/io/TokenizationBigQueryIO.java @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.examples.complete.datatokenization.transforms.io; + +import com.google.api.services.bigquery.model.TableRow; +import com.google.api.services.bigquery.model.TableSchema; +import java.io.IOException; +import org.apache.beam.examples.complete.datatokenization.utils.FailsafeElement; +import org.apache.beam.sdk.io.gcp.bigquery.BigQueryInsertError; +import org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils; +import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy; +import org.apache.beam.sdk.io.gcp.bigquery.WriteResult; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** The {@link TokenizationBigQueryIO} class for writing data from template to BigTable. */ +public class TokenizationBigQueryIO { + + /** Logger for class. */ + private static final Logger LOG = LoggerFactory.getLogger(TokenizationBigQueryIO.class); + + public static WriteResult write( + PCollection input, String bigQueryTableName, TableSchema schema) { + return input + .apply("RowToTableRow", ParDo.of(new RowToTableRowFn())) + .apply( + "WriteSuccessfulRecords", + org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.writeTableRows() + .withCreateDisposition( + org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition + .CREATE_IF_NEEDED) + .withWriteDisposition( + org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition + .WRITE_APPEND) + .withExtendedErrorInfo() + .withMethod( + org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.Method.STREAMING_INSERTS) + .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors()) + .withSchema(schema) + .to(bigQueryTableName)); + } + + /** + * Method to wrap a {@link BigQueryInsertError} into a {@link FailsafeElement}. + * + * @param insertError BigQueryInsert error. + * @return FailsafeElement object. + */ + public static FailsafeElement wrapBigQueryInsertError( + BigQueryInsertError insertError) { + + FailsafeElement failsafeElement; + try { + + failsafeElement = + FailsafeElement.of( + insertError.getRow().toPrettyString(), insertError.getRow().toPrettyString()); + failsafeElement.setErrorMessage(insertError.getError().toPrettyString()); + + } catch (IOException e) { + TokenizationBigQueryIO.LOG.error("Failed to wrap BigQuery insert error."); + throw new RuntimeException(e); + } + return failsafeElement; + } + + /** + * The {@link RowToTableRowFn} class converts a row to tableRow using {@link + * BigQueryUtils#toTableRow()}. + */ + public static class RowToTableRowFn extends DoFn { + + @ProcessElement + public void processElement(ProcessContext context) { + Row row = context.element(); + context.output(BigQueryUtils.toTableRow(row)); + } + } +} diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/transforms/io/TokenizationBigTableIO.java b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/transforms/io/TokenizationBigTableIO.java new file mode 100644 index 000000000000..d7d1c3e97232 --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/transforms/io/TokenizationBigTableIO.java @@ -0,0 +1,158 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.examples.complete.datatokenization.transforms.io; + +import com.google.bigtable.v2.Mutation; +import com.google.protobuf.ByteString; +import java.nio.charset.StandardCharsets; +import java.util.Objects; +import java.util.Set; +import java.util.stream.Collectors; +import org.apache.beam.examples.complete.datatokenization.options.DataTokenizationOptions; +import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO; +import org.apache.beam.sdk.io.gcp.bigtable.BigtableWriteResult; +import org.apache.beam.sdk.options.Description; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PDone; +import org.apache.beam.sdk.values.Row; +import org.apache.commons.lang3.tuple.Pair; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** The {@link TokenizationBigTableIO} class for writing data from template to BigTable. */ +public class TokenizationBigTableIO { + + /** Logger for class. 
*/ + private static final Logger LOG = LoggerFactory.getLogger(TokenizationBigTableIO.class); + + private final DataTokenizationOptions options; + + public TokenizationBigTableIO(DataTokenizationOptions options) { + this.options = options; + } + + public PDone write(PCollection input, Schema schema) { + return input + .apply("ConvertToBigTableFormat", ParDo.of(new TransformToBigTableFormat(schema))) + .apply( + "WriteToBigTable", + BigtableIO.write() + .withProjectId(options.getBigTableProjectId()) + .withInstanceId(options.getBigTableInstanceId()) + .withTableId(options.getBigTableTableId()) + .withWriteResults()) + .apply("LogRowCount", new LogSuccessfulRows()); + } + + static class TransformToBigTableFormat extends DoFn>> { + + private final Schema schema; + + TransformToBigTableFormat(Schema schema) { + this.schema = schema; + } + + @ProcessElement + public void processElement( + @Element Row in, OutputReceiver>> out, ProcessContext c) { + DataTokenizationOptions options = c.getPipelineOptions().as(DataTokenizationOptions.class); + // Mapping every field in provided Row to Mutation.SetCell, which will create/update + // cell content with provided data + Set mutations = + schema.getFields().stream() + .map(Schema.Field::getName) + // Ignoring key field, otherwise it will be added as regular column + .filter(fieldName -> !Objects.equals(fieldName, options.getBigTableKeyColumnName())) + .map(fieldName -> Pair.of(fieldName, in.getString(fieldName))) + .map( + pair -> + Mutation.newBuilder() + .setSetCell( + Mutation.SetCell.newBuilder() + .setFamilyName(options.getBigTableColumnFamilyName()) + .setColumnQualifier( + ByteString.copyFrom(pair.getKey(), StandardCharsets.UTF_8)) + .setValue( + ByteString.copyFrom(pair.getValue(), StandardCharsets.UTF_8)) + .setTimestampMicros(System.currentTimeMillis() * 1000) + .build()) + .build()) + .collect(Collectors.toSet()); + // Converting key value to BigTable format + String columnName = in.getString(options.getBigTableKeyColumnName()); + if (columnName != null) { + ByteString key = ByteString.copyFrom(columnName, StandardCharsets.UTF_8); + out.output(KV.of(key, mutations)); + } + } + } + + static class LogSuccessfulRows extends PTransform, PDone> { + + @Override + public PDone expand(PCollection input) { + input.apply( + ParDo.of( + new DoFn() { + @ProcessElement + public void processElement(@Element BigtableWriteResult in) { + LOG.info("Successfully wrote {} rows.", in.getRowsWritten()); + } + })); + return PDone.in(input.getPipeline()); + } + } + + /** + * Necessary {@link PipelineOptions} options for Pipelines that perform write operations to + * BigTable. 
+ */ + public interface BigTableOptions extends PipelineOptions { + + @Description("Id of the project where the Cloud BigTable instance to write into is located.") + String getBigTableProjectId(); + + void setBigTableProjectId(String bigTableProjectId); + + @Description("Id of the Cloud BigTable instance to write into.") + String getBigTableInstanceId(); + + void setBigTableInstanceId(String bigTableInstanceId); + + @Description("Id of the Cloud BigTable table to write into.") + String getBigTableTableId(); + + void setBigTableTableId(String bigTableTableId); + + @Description("Column name to use as a key in Cloud BigTable.") + String getBigTableKeyColumnName(); + + void setBigTableKeyColumnName(String bigTableKeyColumnName); + + @Description("Column family name to use in Cloud BigTable.") + String getBigTableColumnFamilyName(); + + void setBigTableColumnFamilyName(String bigTableColumnFamilyName); + } +} diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/transforms/io/TokenizationFileSystemIO.java b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/transforms/io/TokenizationFileSystemIO.java new file mode 100644 index 000000000000..54805e11e2d5 --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/transforms/io/TokenizationFileSystemIO.java @@ -0,0 +1,261 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.examples.complete.datatokenization.transforms.io; + +import org.apache.beam.examples.complete.datatokenization.options.DataTokenizationOptions; +import org.apache.beam.examples.complete.datatokenization.transforms.JsonToBeamRow; +import org.apache.beam.examples.complete.datatokenization.transforms.SerializableFunctions; +import org.apache.beam.examples.complete.datatokenization.utils.CsvConverters; +import org.apache.beam.examples.complete.datatokenization.utils.ErrorConverters; +import org.apache.beam.examples.complete.datatokenization.utils.FailsafeElement; +import org.apache.beam.examples.complete.datatokenization.utils.RowToCsv; +import org.apache.beam.examples.complete.datatokenization.utils.SchemasUtils; +import org.apache.beam.sdk.Pipeline; +import org.apache.beam.sdk.io.TextIO; +import org.apache.beam.sdk.options.Default; +import org.apache.beam.sdk.options.Description; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.transforms.MapElements; +import org.apache.beam.sdk.transforms.ToJson; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollection.IsBounded; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.PDone; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.sdk.values.TupleTag; +import org.apache.beam.sdk.values.TypeDescriptors; + +/** The {@link TokenizationFileSystemIO} class to read/write data from/into File Systems. */ +public class TokenizationFileSystemIO { + + /** The tag for the headers of the CSV if required. */ + static final TupleTag CSV_HEADERS = new TupleTag() {}; + + /** The tag for the lines of the CSV. */ + static final TupleTag CSV_LINES = new TupleTag() {}; + + /** The tag for the dead-letter output. */ + static final TupleTag> PROCESSING_DEADLETTER_OUT = + new TupleTag>() {}; + + /** The tag for the main output. */ + static final TupleTag> PROCESSING_OUT = + new TupleTag>() {}; + + /** Supported format to read from GCS. */ + public enum FORMAT { + JSON, + CSV + } + + /** + * Necessary {@link PipelineOptions} options for Pipelines that operate with JSON/CSV data in FS. + */ + public interface FileSystemPipelineOptions extends PipelineOptions { + + @Description("Filepattern for files to read data from") + String getInputFilePattern(); + + void setInputFilePattern(String inputFilePattern); + + @Description("File format of input files. Supported formats: JSON, CSV") + @Default.Enum("JSON") + TokenizationFileSystemIO.FORMAT getInputFileFormat(); + + void setInputFileFormat(FORMAT inputFileFormat); + + @Description("Directory to write data to") + String getOutputDirectory(); + + void setOutputDirectory(String outputDirectory); + + @Description("File format of output files. Supported formats: JSON, CSV") + @Default.Enum("JSON") + TokenizationFileSystemIO.FORMAT getOutputFileFormat(); + + void setOutputFileFormat(FORMAT outputFileFormat); + + @Description( + "The window duration in which data will be written. " + + "Should be specified only for 'Pub/Sub -> FS' case. Defaults to 30s. 
" + + "Allowed formats are: " + + "Ns (for seconds, example: 5s), " + + "Nm (for minutes, example: 12m), " + + "Nh (for hours, example: 2h).") + @Default.String("30s") + String getWindowDuration(); + + void setWindowDuration(String windowDuration); + + // CSV parameters + @Description("If file(s) contain headers") + Boolean getCsvContainsHeaders(); + + void setCsvContainsHeaders(Boolean csvContainsHeaders); + + @Description("Delimiting character in CSV. Default: use delimiter provided in csvFormat") + @Default.InstanceFactory(CsvConverters.DelimiterFactory.class) + String getCsvDelimiter(); + + void setCsvDelimiter(String csvDelimiter); + + @Description( + "Csv format according to Apache Commons CSV format. Default is: Apache Commons CSV" + + " default\n" + + "https://static.javadoc.io/org.apache.commons/commons-csv/1.7/org/apache/commons/csv/CSVFormat.html#DEFAULT\n" + + "Must match format names exactly found at: " + + "https://static.javadoc.io/org.apache.commons/commons-csv/1.7/org/apache/commons/csv/CSVFormat.Predefined.html") + @Default.String("Default") + String getCsvFormat(); + + void setCsvFormat(String csvFormat); + } + + private final DataTokenizationOptions options; + + public TokenizationFileSystemIO(DataTokenizationOptions options) { + this.options = options; + } + + public PCollection read(Pipeline pipeline, SchemasUtils schema) { + switch (options.getInputFileFormat()) { + case JSON: + return readJson(pipeline) + .apply(new JsonToBeamRow(options.getNonTokenizedDeadLetterPath(), schema)); + case CSV: + return readCsv(pipeline, schema) + .apply(new JsonToBeamRow(options.getNonTokenizedDeadLetterPath(), schema)); + default: + throw new IllegalStateException( + "No valid format for input data is provided. Please, choose JSON or CSV."); + } + } + + private PCollection readJson(Pipeline pipeline) { + return pipeline.apply("ReadJsonFromFiles", TextIO.read().from(options.getInputFilePattern())); + } + + private PCollection readCsv(Pipeline pipeline, SchemasUtils schema) { + /* + * Step 1: Read CSV file(s) from File System using {@link CsvConverters.ReadCsv}. + */ + PCollectionTuple csvLines = readCsv(pipeline); + /* + * Step 2: Convert lines to Json. + */ + PCollectionTuple jsons = csvLineToJson(csvLines, schema.getJsonBeamSchema()); + + if (options.getNonTokenizedDeadLetterPath() != null) { + /* + * Step 3: Write jsons to dead-letter that weren't successfully processed. + */ + jsons + .get(PROCESSING_DEADLETTER_OUT) + .apply( + "WriteCsvConversionErrorsToFS", + ErrorConverters.WriteErrorsToTextIO.newBuilder() + .setErrorWritePath(options.getNonTokenizedDeadLetterPath()) + .setTranslateFunction(SerializableFunctions.getCsvErrorConverter()) + .build()); + } + + /* + * Step 4: Get jsons that were successfully processed. 
+ */ + return jsons + .get(PROCESSING_OUT) + .apply( + "GetJson", + MapElements.into(TypeDescriptors.strings()).via(FailsafeElement::getPayload)); + } + + private PCollectionTuple readCsv(Pipeline pipeline) { + return pipeline.apply( + "ReadCsvFromFiles", + CsvConverters.ReadCsv.newBuilder() + .setCsvFormat(options.getCsvFormat()) + .setDelimiter(options.getCsvDelimiter()) + .setHasHeaders(options.getCsvContainsHeaders()) + .setInputFileSpec(options.getInputFilePattern()) + .setHeaderTag(CSV_HEADERS) + .setLineTag(CSV_LINES) + .build()); + } + + private PCollectionTuple csvLineToJson(PCollectionTuple csvLines, String jsonSchema) { + return csvLines.apply( + "LineToJson", + CsvConverters.LineToFailsafeJson.newBuilder() + .setDelimiter(options.getCsvDelimiter()) + .setJsonSchema(jsonSchema) + .setHeaderTag(CSV_HEADERS) + .setLineTag(CSV_LINES) + .setUdfOutputTag(PROCESSING_OUT) + .setUdfDeadletterTag(PROCESSING_DEADLETTER_OUT) + .build()); + } + + public PDone write(PCollection input, Schema schema) { + switch (options.getOutputFileFormat()) { + case JSON: + return writeJson(input); + case CSV: + return writeCsv(input, schema); + default: + throw new IllegalStateException( + "No valid format for output data is provided. Please, choose JSON or CSV."); + } + } + + private PDone writeJson(PCollection input) { + PCollection jsons = input.apply("RowsToJSON", ToJson.of()); + + if (jsons.isBounded() == IsBounded.BOUNDED) { + return jsons.apply("WriteToFS", TextIO.write().to(options.getOutputDirectory())); + } else { + return jsons.apply( + "WriteToFS", + TextIO.write().withWindowedWrites().withNumShards(1).to(options.getOutputDirectory())); + } + } + + private PDone writeCsv(PCollection input, Schema schema) { + String header = String.join(options.getCsvDelimiter(), schema.getFieldNames()); + String csvDelimiter = options.getCsvDelimiter(); + + PCollection csvs = + input.apply( + "ConvertToCSV", + MapElements.into(TypeDescriptors.strings()) + .via((Row inputRow) -> new RowToCsv(csvDelimiter).getCsvFromRow(inputRow))); + + if (csvs.isBounded() == IsBounded.BOUNDED) { + return csvs.apply( + "WriteToFS", TextIO.write().to(options.getOutputDirectory()).withHeader(header)); + } else { + return csvs.apply( + "WriteToFS", + TextIO.write() + .withWindowedWrites() + .withNumShards(1) + .to(options.getOutputDirectory()) + .withHeader(header)); + } + } +} diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/transforms/io/package-info.java b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/transforms/io/package-info.java new file mode 100644 index 000000000000..0c1c03c0295c --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/transforms/io/package-info.java @@ -0,0 +1,18 @@ +/* + * Copyright (C) 2020 Google Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); you may not + * use this file except in compliance with the License. You may obtain a copy of + * the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + * License for the specific language governing permissions and limitations under + * the License. + */ + +/** Protegrity Data Tokenization template for Google Cloud Teleport. 
*/ +package org.apache.beam.examples.complete.datatokenization.transforms.io; diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/transforms/package-info.java b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/transforms/package-info.java new file mode 100644 index 000000000000..5530e633d727 --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/transforms/package-info.java @@ -0,0 +1,18 @@ +/* + * Copyright (C) 2020 Google Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); you may not + * use this file except in compliance with the License. You may obtain a copy of + * the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + * License for the specific language governing permissions and limitations under + * the License. + */ + +/** Protegrity Data Tokenization template for Google Cloud Teleport. */ +package org.apache.beam.examples.complete.datatokenization.transforms; diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/utils/CsvConverters.java b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/utils/CsvConverters.java new file mode 100644 index 000000000000..090abcb80eb5 --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/utils/CsvConverters.java @@ -0,0 +1,537 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.examples.complete.datatokenization.utils; + +import static org.apache.beam.examples.complete.datatokenization.utils.SchemasUtils.getGcsFileAsString; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import com.google.auto.value.AutoValue; +import com.google.gson.JsonArray; +import com.google.gson.JsonObject; +import com.google.gson.JsonParser; +import com.google.gson.stream.JsonWriter; +import java.io.IOException; +import java.io.StringReader; +import java.io.StringWriter; +import java.util.Arrays; +import java.util.List; +import java.util.stream.Collectors; +import javax.annotation.Nullable; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.io.FileIO.ReadableFile; +import org.apache.beam.sdk.io.TextIO; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.Metrics; +import org.apache.beam.sdk.options.Default; +import org.apache.beam.sdk.options.DefaultValueFactory; +import org.apache.beam.sdk.options.Description; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.Sample; +import org.apache.beam.sdk.transforms.View; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.PCollectionView; +import org.apache.beam.sdk.values.TupleTag; +import org.apache.beam.sdk.values.TupleTagList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Throwables; +import org.apache.commons.csv.CSVFormat; +import org.apache.commons.csv.CSVParser; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** Common transforms for Csv files. */ +@SuppressWarnings({"argument.type.incompatible"}) +public class CsvConverters { + + /* Logger for class. */ + private static final Logger LOG = LoggerFactory.getLogger(CsvConverters.class); + + private static final String SUCCESSFUL_TO_JSON_COUNTER = "SuccessfulToJsonCounter"; + + private static final String FAILED_TO_JSON_COUNTER = "FailedToJsonCounter"; + + private static JsonParser jsonParser = new JsonParser(); + + /** + * Builds Json string from list of values and headers or values and schema if schema is provided. + * + * @param headers optional list of strings which is the header of the Csv file. + * @param values list of strings which are combined with header or json schema to create Json + * string. + * @param jsonSchemaString + * @return Json string containing object. + * @throws IOException thrown if Json object is not able to be written. + * @throws NumberFormatException thrown if value cannot be parsed into type successfully. 
+ */ + static String buildJsonString( + @Nullable List headers, List values, @Nullable String jsonSchemaString) + throws Exception { + + StringWriter stringWriter = new StringWriter(); + JsonWriter writer = new JsonWriter(stringWriter); + + if (jsonSchemaString != null) { + JsonArray jsonSchema = jsonParser.parse(jsonSchemaString).getAsJsonArray(); + writer.beginObject(); + + for (int i = 0; i < jsonSchema.size(); i++) { + JsonObject jsonObject = jsonSchema.get(i).getAsJsonObject(); + String type = jsonObject.get("type").getAsString().toUpperCase(); + writer.name(jsonObject.get("name").getAsString()); + + switch (type) { + case "LONG": + writer.value(Long.parseLong(values.get(i))); + break; + + case "DOUBLE": + writer.value(Double.parseDouble(values.get(i))); + break; + + case "INTEGER": + writer.value(Integer.parseInt(values.get(i))); + break; + + case "SHORT": + writer.value(Short.parseShort(values.get(i))); + break; + + case "BYTE": + writer.value(Byte.parseByte(values.get(i))); + break; + + case "FLOAT": + writer.value(Float.parseFloat(values.get(i))); + break; + + case "TEXT": + case "KEYWORD": + case "STRING": + writer.value(values.get(i)); + break; + + default: + LOG.error("Invalid data type, got: " + type); + throw new RuntimeException("Invalid data type, got: " + type); + } + } + writer.endObject(); + writer.close(); + return stringWriter.toString(); + + } else if (headers != null) { + + writer.beginObject(); + + for (int i = 0; i < headers.size(); i++) { + writer.name(headers.get(i)); + writer.value(values.get(i)); + } + + writer.endObject(); + writer.close(); + return stringWriter.toString(); + + } else { + LOG.error("No headers or schema specified"); + throw new RuntimeException("No headers or schema specified"); + } + } + + /** + * Gets Csv format accoring to Apache Commons CSV. If user + * passed invalid format error is thrown. + */ + public static CSVFormat getCsvFormat(String formatString, @Nullable String delimiter) { + + CSVFormat format = CSVFormat.Predefined.valueOf(formatString).getFormat(); + + // If a delimiter has been passed set it here. + if (delimiter != null) { + return format.withDelimiter(delimiter.charAt(0)); + } + return format; + } + + /** Necessary {@link PipelineOptions} options for Csv Pipelines. */ + public interface CsvPipelineOptions extends PipelineOptions { + @Description("Pattern to where data lives, ex: gs://mybucket/somepath/*.csv") + String getInputFileSpec(); + + void setInputFileSpec(String inputFileSpec); + + @Description("If file(s) contain headers") + Boolean getContainsHeaders(); + + void setContainsHeaders(Boolean containsHeaders); + + @Description("Deadletter table for failed inserts in form: :.") + String getDeadletterTable(); + + void setDeadletterTable(String deadletterTable); + + @Description("Delimiting character. Default: use delimiter provided in csvFormat") + @Default.InstanceFactory(DelimiterFactory.class) + String getDelimiter(); + + void setDelimiter(String delimiter); + + @Description( + "Csv format according to Apache Commons CSV format. 
Default is: Apache Commons CSV" + + " default\n" + + "https://static.javadoc.io/org.apache.commons/commons-csv/1.7/org/apache/commons/csv/CSVFormat.html#DEFAULT\n" + + "Must match format names exactly found at: " + + "https://static.javadoc.io/org.apache.commons/commons-csv/1.7/org/apache/commons/csv/CSVFormat.Predefined.html") + @Default.String("Default") + String getCsvFormat(); + + void setCsvFormat(String csvFormat); + + @Description("Optional: Path to JSON schema, ex gs://path/to/schema. ") + String getJsonSchemaPath(); + + void setJsonSchemaPath(String jsonSchemaPath); + + @Description("Set to true if number of files is in the tens of thousands. Default: false") + @Default.Boolean(false) + Boolean getLargeNumFiles(); + + void setLargeNumFiles(Boolean largeNumFiles); + } + + /** + * Default value factory to get delimiter from Csv format so that if the user does not pass one + * in, it matches the supplied {@link CsvPipelineOptions#getCsvFormat()}. + */ + public static class DelimiterFactory implements DefaultValueFactory { + + @Override + public String create(PipelineOptions options) { + CSVFormat csvFormat = getCsvFormat(options.as(CsvPipelineOptions.class).getCsvFormat(), null); + return String.valueOf(csvFormat.getDelimiter()); + } + } + + /** + * The {@link LineToFailsafeJson} interface converts a line from a Csv file into a Json string. + * Uses either: Javascript Udf, Json schema or the headers of the file to create the Json object + * which is then added to the {@link FailsafeElement} as the new payload. + */ + @AutoValue + public abstract static class LineToFailsafeJson + extends PTransform { + + public static Builder newBuilder() { + return new AutoValue_CsvConverters_LineToFailsafeJson.Builder(); + } + + public abstract String delimiter(); + + @Nullable + public abstract String jsonSchemaPath(); + + @Nullable + public abstract String jsonSchema(); + + public abstract TupleTag headerTag(); + + public abstract TupleTag lineTag(); + + public abstract TupleTag> udfOutputTag(); + + public abstract TupleTag> udfDeadletterTag(); + + @Override + public PCollectionTuple expand(PCollectionTuple lines) { + + PCollectionView headersView = null; + + // Convert csv lines into Failsafe elements so that we can recover over multiple transforms. 
+ PCollection> lineFailsafeElements = + lines + .get(lineTag()) + .apply("LineToFailsafeElement", ParDo.of(new LineToFailsafeElementFn())); + + // If no udf then use json schema + String schemaPath = jsonSchemaPath(); + if (schemaPath != null || jsonSchema() != null) { + + String schema; + if (schemaPath != null) { + schema = getGcsFileAsString(schemaPath); + } else { + schema = jsonSchema(); + } + + return lineFailsafeElements.apply( + "LineToDocumentUsingSchema", + ParDo.of( + new FailsafeElementToJsonFn( + headersView, schema, delimiter(), udfDeadletterTag())) + .withOutputTags(udfOutputTag(), TupleTagList.of(udfDeadletterTag()))); + } + + // Run if using headers + headersView = lines.get(headerTag()).apply(Sample.any(1)).apply(View.asSingleton()); + + PCollectionView finalHeadersView = headersView; + lines + .get(headerTag()) + .apply( + "CheckHeaderConsistency", + ParDo.of( + new DoFn() { + @ProcessElement + public void processElement(ProcessContext c) { + String headers = c.sideInput(finalHeadersView); + if (!c.element().equals(headers)) { + LOG.error("Headers do not match, consistency cannot be guaranteed"); + throw new RuntimeException( + "Headers do not match, consistency cannot be guaranteed"); + } + } + }) + .withSideInputs(finalHeadersView)); + + return lineFailsafeElements.apply( + "LineToDocumentWithHeaders", + ParDo.of( + new FailsafeElementToJsonFn( + headersView, jsonSchemaPath(), delimiter(), udfDeadletterTag())) + .withSideInputs(headersView) + .withOutputTags(udfOutputTag(), TupleTagList.of(udfDeadletterTag()))); + } + + /** Builder for {@link LineToFailsafeJson}. */ + @AutoValue.Builder + public abstract static class Builder { + public abstract Builder setDelimiter(String delimiter); + + public abstract Builder setJsonSchemaPath(String jsonSchemaPath); + + public abstract Builder setJsonSchema(String jsonSchema); + + public abstract Builder setHeaderTag(TupleTag headerTag); + + public abstract Builder setLineTag(TupleTag lineTag); + + public abstract Builder setUdfOutputTag( + TupleTag> udfOutputTag); + + public abstract Builder setUdfDeadletterTag( + TupleTag> udfDeadletterTag); + + public abstract LineToFailsafeJson build(); + } + } + + /** + * The {@link FailsafeElementToJsonFn} class creates a Json string from a failsafe element. + * + *

    {@link FailsafeElementToJsonFn#FailsafeElementToJsonFn(PCollectionView, String, String, + * TupleTag)} + */ + public static class FailsafeElementToJsonFn + extends DoFn, FailsafeElement> { + + @Nullable public final String jsonSchema; + public final String delimiter; + public final TupleTag> udfDeadletterTag; + @Nullable private final PCollectionView headersView; + private Counter successCounter = + Metrics.counter(FailsafeElementToJsonFn.class, SUCCESSFUL_TO_JSON_COUNTER); + private Counter failedCounter = + Metrics.counter(FailsafeElementToJsonFn.class, FAILED_TO_JSON_COUNTER); + + FailsafeElementToJsonFn( + PCollectionView headersView, + String jsonSchema, + String delimiter, + TupleTag> udfDeadletterTag) { + this.headersView = headersView; + this.jsonSchema = jsonSchema; + this.delimiter = delimiter; + this.udfDeadletterTag = udfDeadletterTag; + } + + @ProcessElement + public void processElement(ProcessContext context) { + FailsafeElement element = context.element(); + List header = null; + + if (this.headersView != null) { + header = Arrays.asList(context.sideInput(this.headersView).split(this.delimiter)); + } + + List record = Arrays.asList(element.getOriginalPayload().split(this.delimiter)); + + try { + String json = buildJsonString(header, record, this.jsonSchema); + context.output(FailsafeElement.of(element.getOriginalPayload(), json)); + successCounter.inc(); + } catch (Exception e) { + failedCounter.inc(); + context.output( + this.udfDeadletterTag, + FailsafeElement.of(element) + .setErrorMessage(e.getMessage()) + .setStacktrace(Throwables.getStackTraceAsString(e))); + } + } + } + + /** + * The {@link LineToFailsafeElementFn} wraps an csv line with the {@link FailsafeElement} class so + * errors can be recovered from and the original message can be output to a error records table. + */ + static class LineToFailsafeElementFn extends DoFn> { + + @ProcessElement + public void processElement(ProcessContext context) { + String message = context.element(); + context.output(FailsafeElement.of(message, message)); + } + } + + /** + * The {@link ReadCsv} class is a {@link PTransform} that reads from one for more Csv files. The + * transform returns a {@link PCollectionTuple} consisting of the following {@link PCollection}: + * + *

      + *
    • {@link ReadCsv#headerTag()} - Contains headers found in files if read with headers, + * contains empty {@link PCollection} if no headers. + *
    • {@link ReadCsv#lineTag()} - Contains Csv lines as a {@link PCollection} of strings. + *
    + */ + @AutoValue + public abstract static class ReadCsv extends PTransform { + + public static Builder newBuilder() { + return new AutoValue_CsvConverters_ReadCsv.Builder(); + } + + public abstract String csvFormat(); + + @Nullable + public abstract String delimiter(); + + public abstract Boolean hasHeaders(); + + public abstract String inputFileSpec(); + + public abstract TupleTag headerTag(); + + public abstract TupleTag lineTag(); + + @Override + public PCollectionTuple expand(PBegin input) { + + if (hasHeaders()) { + return input + .apply("MatchFilePattern", FileIO.match().filepattern(inputFileSpec())) + .apply("ReadMatches", FileIO.readMatches()) + .apply( + "ReadCsvWithHeaders", + ParDo.of(new GetCsvHeadersFn(headerTag(), lineTag(), csvFormat(), delimiter())) + .withOutputTags(headerTag(), TupleTagList.of(lineTag()))); + } + + return PCollectionTuple.of( + lineTag(), input.apply("ReadCsvWithoutHeaders", TextIO.read().from(inputFileSpec()))); + } + + /** Builder for {@link ReadCsv}. */ + @AutoValue.Builder + public abstract static class Builder { + + public abstract Builder setCsvFormat(String csvFormat); + + public abstract Builder setDelimiter(@Nullable String delimiter); + + public abstract Builder setHasHeaders(Boolean hasHeaders); + + public abstract Builder setInputFileSpec(String inputFileSpec); + + public abstract Builder setHeaderTag(TupleTag headerTag); + + public abstract Builder setLineTag(TupleTag lineTag); + + abstract ReadCsv autoBuild(); + + public ReadCsv build() { + + ReadCsv readCsv = autoBuild(); + + checkArgument(readCsv.inputFileSpec() != null, "Input file spec must be provided."); + + checkArgument(readCsv.csvFormat() != null, "Csv format must not be null."); + + checkArgument(readCsv.hasHeaders() != null, "Header information must be provided."); + + return readCsv; + } + } + } + + /** + * The {@link GetCsvHeadersFn} class gets the header of a Csv file and outputs it as a string. The + * csv format provided in {@link CsvConverters#getCsvFormat(String, String)} is used to get the + * header. 
+ */ + static class GetCsvHeadersFn extends DoFn { + + private final TupleTag headerTag; + private final TupleTag linesTag; + private CSVFormat csvFormat; + + GetCsvHeadersFn( + TupleTag headerTag, TupleTag linesTag, String csvFormat, String delimiter) { + this.headerTag = headerTag; + this.linesTag = linesTag; + this.csvFormat = getCsvFormat(csvFormat, delimiter); + } + + @ProcessElement + public void processElement(ProcessContext context, MultiOutputReceiver outputReceiver) { + ReadableFile f = context.element(); + String headers; + List records = null; + String delimiter = String.valueOf(this.csvFormat.getDelimiter()); + try { + String csvFileString = f.readFullyAsUTF8String(); + StringReader reader = new StringReader(csvFileString); + CSVParser parser = CSVParser.parse(reader, this.csvFormat.withFirstRecordAsHeader()); + records = + parser.getRecords().stream() + .map(i -> String.join(delimiter, i)) + .collect(Collectors.toList()); + headers = String.join(delimiter, parser.getHeaderNames()); + } catch (IOException ioe) { + LOG.error("Headers do not match, consistency cannot be guaranteed"); + throw new RuntimeException("Could not read Csv headers: " + ioe.getMessage()); + } + outputReceiver.get(this.headerTag).output(headers); + records.forEach(r -> outputReceiver.get(this.linesTag).output(r)); + } + } +} diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/utils/DurationUtils.java b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/utils/DurationUtils.java new file mode 100644 index 000000000000..83d3aea3acb5 --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/utils/DurationUtils.java @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.examples.complete.datatokenization.utils; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import java.util.Locale; +import org.joda.time.DateTime; +import org.joda.time.Duration; +import org.joda.time.MutablePeriod; +import org.joda.time.format.PeriodFormatterBuilder; +import org.joda.time.format.PeriodParser; + +/** + * The {@link DurationUtils} class provides common utilities for manipulating and formatting {@link + * Duration} objects. + */ +public class DurationUtils { + + /** + * Parses a duration from a period formatted string. Values are accepted in the following formats: + * + *

    Formats Ns - Seconds. Example: 5s
    + * Nm - Minutes. Example: 12m<br>
    + * Nh - Hours. Example: 2h + * + *

    +   * parseDuration(null) = NullPointerException()
    +   * parseDuration("")   = Duration.standardSeconds(0)
    +   * parseDuration("2s") = Duration.standardSeconds(2)
    +   * parseDuration("5m") = Duration.standardMinutes(5)
    +   * parseDuration("3h") = Duration.standardHours(3)
    +   * 
    + * + * @param value The period value to parse. + * @return The {@link Duration} parsed from the supplied period string. + */ + public static Duration parseDuration(String value) { + checkNotNull(value, "The specified duration must be a non-null value!"); + + PeriodParser parser = + new PeriodFormatterBuilder() + .appendSeconds() + .appendSuffix("s") + .appendMinutes() + .appendSuffix("m") + .appendHours() + .appendSuffix("h") + .toParser(); + + MutablePeriod period = new MutablePeriod(); + parser.parseInto(period, value, 0, Locale.getDefault()); + + Duration duration = period.toDurationFrom(new DateTime(0)); + checkArgument(duration.getMillis() > 0, "The window duration must be greater than 0!"); + + return duration; + } +} diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/utils/ErrorConverters.java b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/utils/ErrorConverters.java new file mode 100644 index 000000000000..04beac038926 --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/utils/ErrorConverters.java @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.examples.complete.datatokenization.utils; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.api.services.bigquery.model.TableRow; +import com.google.auto.value.AutoValue; +import java.nio.charset.StandardCharsets; +import java.util.ArrayList; +import javax.annotation.Nullable; +import org.apache.beam.sdk.io.TextIO; +import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO; +import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition; +import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition; +import org.apache.beam.sdk.io.gcp.bigquery.WriteResult; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.MapElements; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.transforms.windowing.FixedWindows; +import org.apache.beam.sdk.transforms.windowing.Window; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollection.IsBounded; +import org.apache.beam.sdk.values.PDone; +import org.apache.beam.sdk.values.TypeDescriptors; +import org.joda.time.DateTimeZone; +import org.joda.time.Duration; +import org.joda.time.format.DateTimeFormat; +import org.joda.time.format.DateTimeFormatter; + +/** Transforms & DoFns & Options for Teleport Error logging. 
*/ +public class ErrorConverters { + + /** Writes all Errors to GCS, place at the end of your pipeline. */ + @AutoValue + public abstract static class WriteStringMessageErrorsAsCsv + extends PTransform>, PDone> { + + public static Builder newBuilder() { + return new AutoValue_ErrorConverters_WriteStringMessageErrorsAsCsv.Builder(); + } + + public abstract String errorWritePath(); + + public abstract String csvDelimiter(); + + @Nullable + public abstract Duration windowDuration(); + + @SuppressWarnings("argument.type.incompatible") + @Override + public PDone expand(PCollection> pCollection) { + + PCollection formattedErrorRows = + pCollection.apply( + "GetFormattedErrorRow", ParDo.of(new FailedStringToCsvRowFn(csvDelimiter()))); + + if (pCollection.isBounded() == IsBounded.UNBOUNDED) { + if (windowDuration() != null) { + formattedErrorRows = + formattedErrorRows.apply(Window.into(FixedWindows.of(windowDuration()))); + } + return formattedErrorRows.apply( + TextIO.write().to(errorWritePath()).withNumShards(1).withWindowedWrites()); + + } else { + return formattedErrorRows.apply(TextIO.write().to(errorWritePath()).withNumShards(1)); + } + } + + /** Builder for {@link WriteStringMessageErrorsAsCsv}. */ + @AutoValue.Builder + public abstract static class Builder { + + public abstract Builder setErrorWritePath(String errorWritePath); + + public abstract Builder setCsvDelimiter(String csvDelimiter); + + public abstract Builder setWindowDuration(@Nullable Duration duration); + + public abstract WriteStringMessageErrorsAsCsv build(); + } + } + + /** + * The {@link FailedStringToCsvRowFn} converts string objects which have failed processing into + * {@link String} objects contained CSV which can be output to a filesystem. + */ + public static class FailedStringToCsvRowFn extends DoFn, String> { + + /** + * The formatter used to convert timestamps into a BigQuery compatible format. + */ + private static final DateTimeFormatter TIMESTAMP_FORMATTER = + DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss.SSSSSS"); + + private final String csvDelimiter; + + public FailedStringToCsvRowFn(String csvDelimiter) { + this.csvDelimiter = csvDelimiter; + } + + public FailedStringToCsvRowFn() { + this.csvDelimiter = ","; + } + + @ProcessElement + public void processElement(ProcessContext context) { + FailsafeElement failsafeElement = context.element(); + ArrayList outputRow = new ArrayList<>(); + final String message = failsafeElement.getOriginalPayload(); + + // Format the timestamp for insertion + String timestamp = + TIMESTAMP_FORMATTER.print(context.timestamp().toDateTime(DateTimeZone.UTC)); + + outputRow.add(timestamp); + outputRow.add(failsafeElement.getErrorMessage()); + + // Only set the payload if it's populated on the message. + if (message != null) { + outputRow.add(message); + } + + context.output(String.join(csvDelimiter, outputRow)); + } + } + + /** Write errors as string encoded messages. 
*/ + @AutoValue + public abstract static class WriteStringMessageErrors + extends PTransform>, WriteResult> { + + public static Builder newBuilder() { + return new AutoValue_ErrorConverters_WriteStringMessageErrors.Builder(); + } + + public abstract String getErrorRecordsTable(); + + public abstract String getErrorRecordsTableSchema(); + + @Override + public WriteResult expand(PCollection> failedRecords) { + + return failedRecords + .apply("FailedRecordToTableRow", ParDo.of(new FailedStringToTableRowFn())) + .apply( + "WriteFailedRecordsToBigQuery", + BigQueryIO.writeTableRows() + .to(getErrorRecordsTable()) + .withJsonSchema(getErrorRecordsTableSchema()) + .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED) + .withWriteDisposition(WriteDisposition.WRITE_APPEND)); + } + + /** Builder for {@link WriteStringMessageErrors}. */ + @AutoValue.Builder + public abstract static class Builder { + + public abstract Builder setErrorRecordsTable(String errorRecordsTable); + + public abstract Builder setErrorRecordsTableSchema(String errorRecordsTableSchema); + + public abstract WriteStringMessageErrors build(); + } + } + + /** + * The {@link FailedStringToTableRowFn} converts string objects which have failed processing into + * {@link TableRow} objects which can be output to a dead-letter table. + */ + public static class FailedStringToTableRowFn + extends DoFn, TableRow> { + + /** + * The formatter used to convert timestamps into a BigQuery compatible format. + */ + private static final DateTimeFormatter TIMESTAMP_FORMATTER = + DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss.SSSSSS"); + + @ProcessElement + public void processElement(ProcessContext context) { + FailsafeElement failsafeElement = context.element(); + final String message = failsafeElement.getOriginalPayload(); + + // Format the timestamp for insertion + String timestamp = + TIMESTAMP_FORMATTER.print(context.timestamp().toDateTime(DateTimeZone.UTC)); + + // Build the table row + final TableRow failedRow = + new TableRow() + .set("timestamp", timestamp) + .set("errorMessage", failsafeElement.getErrorMessage()) + .set("stacktrace", failsafeElement.getStacktrace()); + + // Only set the payload if it's populated on the message. + if (message != null) { + failedRow + .set("payloadString", message) + .set("payloadBytes", message.getBytes(StandardCharsets.UTF_8)); + } + + context.output(failedRow); + } + } + + /** + * {@link WriteErrorsToTextIO} is a {@link PTransform} that writes strings error messages to file + * system using TextIO and custom line format {@link SerializableFunction} to convert errors in + * necessary format.
    + * Example of usage in pipeline: + * + *
    <pre>{@code<br>
    +   * pCollection.apply("Write to TextIO",
    +   *   WriteErrorsToTextIO.<String, String>newBuilder()<br>
    +   *     .setErrorWritePath("errors.txt")
    +   *     .setTranslateFunction((FailsafeElement<String, String> failsafeElement) -> {<br>
    +   *       ArrayList<String> outputRow  = new ArrayList<>();<br>
    +   *       final String message = failsafeElement.getOriginalPayload();
    +   *       String timestamp = Instant.now().toString();
    +   *       outputRow.add(timestamp);
    +   *       outputRow.add(failsafeElement.getErrorMessage());
    +   *       outputRow.add(failsafeElement.getStacktrace());
    +   *       // Only set the payload if it's populated on the message.
    +   *       if (failsafeElement.getOriginalPayload() != null) {
    +   *         outputRow.add(message);
    +   *       }
    +   *
    +   *       return String.join(",",outputRow);
    +   *     })
    +   * }</pre><br>
    + */ + @AutoValue + public abstract static class WriteErrorsToTextIO + extends PTransform>, PDone> { + + public static WriteErrorsToTextIO.Builder newBuilder() { + return new AutoValue_ErrorConverters_WriteErrorsToTextIO.Builder<>(); + } + + public abstract String errorWritePath(); + + public abstract SerializableFunction, String> translateFunction(); + + @Nullable + public abstract Duration windowDuration(); + + @Override + @SuppressWarnings("argument.type.incompatible") + public PDone expand(PCollection> pCollection) { + + PCollection formattedErrorRows = + pCollection.apply( + "GetFormattedErrorRow", + MapElements.into(TypeDescriptors.strings()).via(translateFunction())); + + if (pCollection.isBounded() == PCollection.IsBounded.UNBOUNDED) { + if (windowDuration() == null) { + throw new RuntimeException("Unbounded input requires window interval to be set"); + } + return formattedErrorRows + .apply(Window.into(FixedWindows.of(windowDuration()))) + .apply(TextIO.write().to(errorWritePath()).withNumShards(1).withWindowedWrites()); + } + + return formattedErrorRows.apply(TextIO.write().to(errorWritePath()).withNumShards(1)); + } + + /** Builder for {@link WriteErrorsToTextIO}. */ + @AutoValue.Builder + public abstract static class Builder { + + public abstract WriteErrorsToTextIO.Builder setErrorWritePath(String errorWritePath); + + public abstract WriteErrorsToTextIO.Builder setTranslateFunction( + SerializableFunction, String> translateFunction); + + public abstract WriteErrorsToTextIO.Builder setWindowDuration( + @Nullable Duration duration); + + abstract SerializableFunction, String> translateFunction(); + + abstract WriteErrorsToTextIO autoBuild(); + + public WriteErrorsToTextIO build() { + checkNotNull(translateFunction(), "translateFunction is required."); + return autoBuild(); + } + } + } +} diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/utils/FailsafeElement.java b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/utils/FailsafeElement.java new file mode 100644 index 000000000000..66f1c3c176a8 --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/utils/FailsafeElement.java @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.examples.complete.datatokenization.utils; + +import java.util.Objects; +import org.apache.avro.reflect.Nullable; +import org.apache.beam.sdk.coders.DefaultCoder; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; + +/** + * The {@link FailsafeElement} class holds the current value and original value of a record within a + * pipeline. 
This class allows pipelines to not lose valuable information about an incoming record + * throughout the processing of that record. The use of this class allows for more robust + * dead-letter strategies as the original record information is not lost throughout the pipeline and + * can be output to a dead-letter in the event of a failure during one of the pipelines transforms. + */ +@DefaultCoder(FailsafeElementCoder.class) +public class FailsafeElement { + + private final OriginalT originalPayload; + private final CurrentT payload; + @Nullable private String errorMessage = ""; + @Nullable private String stacktrace = ""; + + private FailsafeElement(OriginalT originalPayload, CurrentT payload) { + this.originalPayload = originalPayload; + this.payload = payload; + } + + public static FailsafeElement of( + OriginalT originalPayload, CurrentT currentPayload) { + return new FailsafeElement<>(originalPayload, currentPayload); + } + + public static FailsafeElement of( + FailsafeElement other) { + return new FailsafeElement<>(other.originalPayload, other.payload) + .setErrorMessage(other.getErrorMessage()) + .setStacktrace(other.getStacktrace()); + } + + public OriginalT getOriginalPayload() { + return originalPayload; + } + + public CurrentT getPayload() { + return payload; + } + + public String getErrorMessage() { + return errorMessage; + } + + public FailsafeElement setErrorMessage(String errorMessage) { + this.errorMessage = errorMessage; + return this; + } + + public String getStacktrace() { + return stacktrace; + } + + public FailsafeElement setStacktrace(String stacktrace) { + this.stacktrace = stacktrace; + return this; + } + + @Override + public int hashCode() { + return Objects.hash(originalPayload, payload, errorMessage, stacktrace); + } + + @Override + public String toString() { + return MoreObjects.toStringHelper(this) + .add("originalPayload", originalPayload) + .add("payload", payload) + .add("errorMessage", errorMessage) + .add("stacktrace", stacktrace) + .toString(); + } +} diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/utils/FailsafeElementCoder.java b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/utils/FailsafeElementCoder.java new file mode 100644 index 000000000000..151d98a9070e --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/utils/FailsafeElementCoder.java @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.examples.complete.datatokenization.utils; + +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.util.Arrays; +import java.util.List; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.CoderException; +import org.apache.beam.sdk.coders.CustomCoder; +import org.apache.beam.sdk.coders.NullableCoder; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.values.TypeDescriptor; +import org.apache.beam.sdk.values.TypeParameter; + +/** + * The {@link FailsafeElementCoder} encodes and decodes {@link FailsafeElement} objects. + * + *

    This coder is necessary until Avro supports parameterized types (AVRO-1571) without requiring to + * explicitly specifying the schema for the type. + * + * @param The type of the original payload to be encoded. + * @param The type of the current payload to be encoded. + */ +public class FailsafeElementCoder + extends CustomCoder> { + + private static final NullableCoder STRING_CODER = NullableCoder.of(StringUtf8Coder.of()); + private final Coder originalPayloadCoder; + private final Coder currentPayloadCoder; + + private FailsafeElementCoder( + Coder originalPayloadCoder, Coder currentPayloadCoder) { + this.originalPayloadCoder = originalPayloadCoder; + this.currentPayloadCoder = currentPayloadCoder; + } + + public Coder getOriginalPayloadCoder() { + return originalPayloadCoder; + } + + public Coder getCurrentPayloadCoder() { + return currentPayloadCoder; + } + + public static FailsafeElementCoder of( + Coder originalPayloadCoder, Coder currentPayloadCoder) { + return new FailsafeElementCoder<>(originalPayloadCoder, currentPayloadCoder); + } + + @Override + public void encode(FailsafeElement value, OutputStream outStream) + throws IOException { + if (value == null) { + throw new CoderException("The FailsafeElementCoder cannot encode a null object!"); + } + + originalPayloadCoder.encode(value.getOriginalPayload(), outStream); + currentPayloadCoder.encode(value.getPayload(), outStream); + STRING_CODER.encode(value.getErrorMessage(), outStream); + STRING_CODER.encode(value.getStacktrace(), outStream); + } + + @Override + public FailsafeElement decode(InputStream inStream) throws IOException { + + OriginalT originalPayload = originalPayloadCoder.decode(inStream); + CurrentT currentPayload = currentPayloadCoder.decode(inStream); + String errorMessage = STRING_CODER.decode(inStream); + String stacktrace = STRING_CODER.decode(inStream); + + return FailsafeElement.of(originalPayload, currentPayload) + .setErrorMessage(errorMessage) + .setStacktrace(stacktrace); + } + + @Override + public List> getCoderArguments() { + return Arrays.asList(originalPayloadCoder, currentPayloadCoder); + } + + @Override + public TypeDescriptor> getEncodedTypeDescriptor() { + return new TypeDescriptor>() {}.where( + new TypeParameter() {}, originalPayloadCoder.getEncodedTypeDescriptor()) + .where(new TypeParameter() {}, currentPayloadCoder.getEncodedTypeDescriptor()); + } +} diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/utils/RowToCsv.java b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/utils/RowToCsv.java new file mode 100644 index 000000000000..f8ab7d4ff39f --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/utils/RowToCsv.java @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.examples.complete.datatokenization.utils; + +import java.util.stream.Collectors; +import org.apache.beam.sdk.values.Row; + +/** The {@link RowToCsv} class to convert Beam Rows into strings in CSV format. */ +public class RowToCsv { + + private final String csvDelimiter; + + public RowToCsv(String csvDelimiter) { + this.csvDelimiter = csvDelimiter; + } + + public String getCsvFromRow(Row row) { + return row.getValues().stream() + .map(item -> item == null ? "null" : item) + .map(Object::toString) + .collect(Collectors.joining(csvDelimiter)); + } +} diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/utils/SchemasUtils.java b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/utils/SchemasUtils.java new file mode 100644 index 000000000000..e8dd24eabeee --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/utils/SchemasUtils.java @@ -0,0 +1,206 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.examples.complete.datatokenization.utils; + +import static org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils.fromTableSchema; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import com.google.api.services.bigquery.model.TableSchema; +import java.io.IOException; +import java.io.InputStream; +import java.io.Reader; +import java.nio.channels.Channels; +import java.nio.channels.ReadableByteChannel; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.nio.file.Files; +import java.nio.file.Paths; +import java.util.List; +import java.util.stream.Collectors; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.io.fs.MatchResult; +import org.apache.beam.sdk.io.fs.ResourceId; +import org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteStreams; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.CharStreams; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * The {@link SchemasUtils} Class to read JSON based schema. Is there available to read from file or + * from string. Currently supported local File System and GCS. 
+ */ +@SuppressWarnings({ + "initialization.fields.uninitialized", + "method.invocation.invalid", + "dereference.of.nullable", + "argument.type.incompatible", + "return.type.incompatible" +}) +public class SchemasUtils { + + /* Logger for class.*/ + private static final Logger LOG = LoggerFactory.getLogger(SchemasUtils.class); + + private TableSchema bigQuerySchema; + private Schema beamSchema; + private String jsonBeamSchema; + + public SchemasUtils(String schema) { + parseJson(schema); + } + + public SchemasUtils(String path, Charset encoding) throws IOException { + if (path.startsWith("gs://")) { + parseJson(new String(readGcsFile(path), encoding)); + } else { + byte[] encoded = Files.readAllBytes(Paths.get(path)); + parseJson(new String(encoded, encoding)); + } + LOG.info("Extracted schema: " + bigQuerySchema.toPrettyString()); + } + + public TableSchema getBigQuerySchema() { + return bigQuerySchema; + } + + private void parseJson(String jsonSchema) throws UnsupportedOperationException { + TableSchema schema = BigQueryHelpers.fromJsonString(jsonSchema, TableSchema.class); + validateSchemaTypes(schema); + bigQuerySchema = schema; + jsonBeamSchema = BigQueryHelpers.toJsonString(schema.getFields()); + } + + private void validateSchemaTypes(TableSchema bigQuerySchema) { + try { + beamSchema = fromTableSchema(bigQuerySchema); + } catch (UnsupportedOperationException exception) { + LOG.error("Check json schema, {}", exception.getMessage()); + } catch (NullPointerException npe) { + LOG.error("Missing schema keywords, please check what all required fields presented"); + } + } + + /** + * Method to read a schema file from GCS and return the file contents as a string. + * + * @param gcsFilePath path to file in GCS in format "gs://your-bucket/path/to/file" + * @return byte array with file contents + * @throws IOException thrown if not able to read file + */ + public static byte[] readGcsFile(String gcsFilePath) throws IOException { + LOG.info("Reading contents from GCS file: {}", gcsFilePath); + // Read the GCS file into byte array and will throw an I/O exception in case file not found. + try (ReadableByteChannel readerChannel = + FileSystems.open(FileSystems.matchSingleFileSpec(gcsFilePath).resourceId())) { + try (InputStream stream = Channels.newInputStream(readerChannel)) { + return ByteStreams.toByteArray(stream); + } + } + } + + public Schema getBeamSchema() { + return beamSchema; + } + + public String getJsonBeamSchema() { + return jsonBeamSchema; + } + + /** + * Reads a file from GCS and returns it as a string. 
+ * + * @param filePath path to file in GCS + * @return contents of the file as a string + * @throws IOException thrown if not able to read file + */ + public static String getGcsFileAsString(String filePath) { + MatchResult result; + try { + result = FileSystems.match(filePath); + checkArgument( + result.status() == MatchResult.Status.OK && !result.metadata().isEmpty(), + "Failed to match any files with the pattern: " + filePath); + + List rId = + result.metadata().stream() + .map(MatchResult.Metadata::resourceId) + .collect(Collectors.toList()); + + checkArgument(rId.size() == 1, "Expected exactly 1 file, but got " + rId.size() + " files."); + + Reader reader = + Channels.newReader(FileSystems.open(rId.get(0)), StandardCharsets.UTF_8.name()); + + return CharStreams.toString(reader); + + } catch (IOException ioe) { + LOG.error("File system i/o error: " + ioe.getMessage()); + throw new RuntimeException(ioe); + } + } + + public static final String DEADLETTER_SCHEMA = + "{\n" + + " \"fields\": [\n" + + " {\n" + + " \"name\": \"timestamp\",\n" + + " \"type\": \"TIMESTAMP\",\n" + + " \"mode\": \"REQUIRED\"\n" + + " },\n" + + " {\n" + + " \"name\": \"payloadString\",\n" + + " \"type\": \"STRING\",\n" + + " \"mode\": \"REQUIRED\"\n" + + " },\n" + + " {\n" + + " \"name\": \"payloadBytes\",\n" + + " \"type\": \"BYTES\",\n" + + " \"mode\": \"REQUIRED\"\n" + + " },\n" + + " {\n" + + " \"name\": \"attributes\",\n" + + " \"type\": \"RECORD\",\n" + + " \"mode\": \"REPEATED\",\n" + + " \"fields\": [\n" + + " {\n" + + " \"name\": \"key\",\n" + + " \"type\": \"STRING\",\n" + + " \"mode\": \"NULLABLE\"\n" + + " },\n" + + " {\n" + + " \"name\": \"value\",\n" + + " \"type\": \"STRING\",\n" + + " \"mode\": \"NULLABLE\"\n" + + " }\n" + + " ]\n" + + " },\n" + + " {\n" + + " \"name\": \"errorMessage\",\n" + + " \"type\": \"STRING\",\n" + + " \"mode\": \"NULLABLE\"\n" + + " },\n" + + " {\n" + + " \"name\": \"stacktrace\",\n" + + " \"type\": \"STRING\",\n" + + " \"mode\": \"NULLABLE\"\n" + + " }\n" + + " ]\n" + + "}"; +} diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/utils/package-info.java b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/utils/package-info.java new file mode 100644 index 000000000000..e53a4f7ce046 --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/datatokenization/utils/package-info.java @@ -0,0 +1,18 @@ +/* + * Copyright (C) 2020 Google Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); you may not + * use this file except in compliance with the License. You may obtain a copy of + * the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + * License for the specific language governing permissions and limitations under + * the License. + */ + +/** Protegrity Data Tokenization template for Google Cloud Teleport. 
*/ +package org.apache.beam.examples.complete.datatokenization.utils; diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/KafkaPubsubConstants.java b/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/KafkaPubsubConstants.java new file mode 100644 index 000000000000..46f021a5ca75 --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/KafkaPubsubConstants.java @@ -0,0 +1,29 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.examples.complete.kafkatopubsub; + +/** Constant variables that are used across the template's parts. */ +public class KafkaPubsubConstants { + + /* Config keywords */ + public static final String KAFKA_CREDENTIALS = "kafka"; + public static final String SSL_CREDENTIALS = "ssl"; + public static final String USERNAME = "username"; + public static final String PASSWORD = "password"; + public static final String BUCKET = "bucket"; +} diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/KafkaToPubsub.java b/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/KafkaToPubsub.java new file mode 100644 index 000000000000..e4db72e2dc33 --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/KafkaToPubsub.java @@ -0,0 +1,237 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.examples.complete.kafkatopubsub; + +import static org.apache.beam.examples.complete.kafkatopubsub.kafka.consumer.Utils.configureKafka; +import static org.apache.beam.examples.complete.kafkatopubsub.kafka.consumer.Utils.configureSsl; +import static org.apache.beam.examples.complete.kafkatopubsub.kafka.consumer.Utils.getKafkaCredentialsFromVault; +import static org.apache.beam.examples.complete.kafkatopubsub.kafka.consumer.Utils.isSslSpecified; +import static org.apache.beam.examples.complete.kafkatopubsub.kafka.consumer.Utils.parseKafkaConsumerConfig; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import java.util.ArrayList; +import java.util.Arrays; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.beam.examples.complete.kafkatopubsub.avro.AvroDataClass; +import org.apache.beam.examples.complete.kafkatopubsub.avro.AvroDataClassKafkaAvroDeserializer; +import org.apache.beam.examples.complete.kafkatopubsub.options.KafkaToPubsubOptions; +import org.apache.beam.examples.complete.kafkatopubsub.transforms.FormatTransform; +import org.apache.beam.sdk.Pipeline; +import org.apache.beam.sdk.PipelineResult; +import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO; +import org.apache.beam.sdk.options.PipelineOptionsFactory; +import org.apache.beam.sdk.transforms.Values; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * The {@link KafkaToPubsub} pipeline is a streaming pipeline which ingests data in JSON format from + * Kafka, and outputs the resulting records to PubSub. Input topics, output topic, Bootstrap servers + * are specified by the user as template parameters.
+ * Kafka may be configured with the SASL/SCRAM security mechanism; in this case, a Vault secret storage + * with credentials should be provided. The URL to the credentials and the Vault token are specified by the user + * as template parameters. + * + *

    Pipeline Requirements + * + *

      + *
    • Kafka Bootstrap Server(s). + *
• Kafka Topic(s) exist. + *
    • The PubSub output topic exists. + *
    • (Optional) An existing HashiCorp Vault secret storage + *
    • (Optional) A configured secure SSL connection for Kafka + *
    + * + *

    Example Usage + * + *

    + * # Gradle preparation
    + *
    + * To run this example your {@code build.gradle} file should contain the following task
    + * to execute the pipeline:
    + * {@code
    + * task execute (type:JavaExec) {
    + *     main = System.getProperty("mainClass")
    + *     classpath = sourceSets.main.runtimeClasspath
    + *     systemProperties System.getProperties()
    + *     args System.getProperty("exec.args", "").split()
    + * }
    + * }
    + *
+ * This task allows you to run the pipeline via the following command:
    + * {@code
    + * gradle clean execute -DmainClass=org.apache.beam.examples.complete.kafkatopubsub.KafkaToPubsub \
+ *      -Dexec.args="--<argument>=<value> --<argument>=<value>"
    + * }
    + *
    + * # Running the pipeline
    + * To execute this pipeline, specify the parameters:
    + *
    + * - Kafka Bootstrap servers
    + * - Kafka input topics
    + * - Pub/Sub output topic
    + * - Output format
    + *
    + * in the following format:
    + * {@code
    + * --bootstrapServers=host:port \
    + * --inputTopics=your-input-topic \
+ * --outputTopic=projects/your-project-id/topics/your-topic-name \
    + * --outputFormat=AVRO|PUBSUB
    + * }
    + *
    + * Optionally, to retrieve Kafka credentials for SASL/SCRAM,
    + * specify a URL to the credentials in HashiCorp Vault and the vault access token:
    + * {@code
    + * --secretStoreUrl=http(s)://host:port/path/to/credentials
    + * --vaultToken=your-token
    + * }
    + *
+ * Optionally, to configure a secure SSL connection between the Beam pipeline and Kafka,
    + * specify the parameters:
    + * - A path to a truststore file (it can be a local path or a GCS path, which should start with `gs://`)
    + * - A path to a keystore file (it can be a local path or a GCS path, which should start with `gs://`)
    + * - Truststore password
    + * - Keystore password
    + * - Key password
    + * {@code
    + * --truststorePath=path/to/kafka.truststore.jks
    + * --keystorePath=path/to/kafka.keystore.jks
    + * --truststorePassword=your-truststore-password
    + * --keystorePassword=your-keystore-password
    + * --keyPassword=your-key-password
    + * }
    + * By default this will run the pipeline locally with the DirectRunner. To change the runner, specify:
    + * {@code
    + * --runner=YOUR_SELECTED_RUNNER
    + * }
    + * 
    + * + *

    Example Avro usage + * + *

    + * This template contains an example Class to deserialize AVRO from Kafka and serialize it to AVRO in Pub/Sub.
    + *
+ * To use this example in your specific case, follow these steps:
    + * 
      + *
• Create your own class to describe the AVRO schema. As an example, use {@link AvroDataClass}; just define the necessary fields. + *
• Create your own Avro deserializer class. As an example, use {@link AvroDataClassKafkaAvroDeserializer}; just rename it and use your own schema class as the type parameters. + *
• Modify {@link FormatTransform}: pass your schema class and deserializer to the related parameters. + *
• Modify the write step in {@link KafkaToPubsub} by putting your schema class into the "writeAvrosToPubSub" step (see the sketch below). + *
    + *
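+ * For illustration, here is a minimal sketch of the two changes described above, assuming a
+ * hypothetical {@code MyAvroRecord} schema class with a matching
+ * {@code MyAvroRecordKafkaAvroDeserializer} (both names are placeholders, not part of this template):
+ * {@code
+ * // MyAvroRecord and MyAvroRecordKafkaAvroDeserializer are placeholder names.
+ * // In FormatTransform.readAvrosFromKafka: plug in your deserializer and a matching coder.
+ * KafkaIO.<String, MyAvroRecord>read()
+ *     .withValueDeserializerAndCoder(
+ *         MyAvroRecordKafkaAvroDeserializer.class, AvroCoder.of(MyAvroRecord.class))
+ *
+ * // In KafkaToPubsub.run: write Avro payloads of your schema class to Pub/Sub.
+ * .apply("writeAvrosToPubSub", PubsubIO.writeAvros(MyAvroRecord.class));
+ * }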
    + */ +public class KafkaToPubsub { + + /* Logger for class.*/ + private static final Logger LOG = LoggerFactory.getLogger(KafkaToPubsub.class); + + /** + * Main entry point for pipeline execution. + * + * @param args Command line arguments to the pipeline. + */ + public static void main(String[] args) { + KafkaToPubsubOptions options = + PipelineOptionsFactory.fromArgs(args).withValidation().as(KafkaToPubsubOptions.class); + + // Create the pipeline + Pipeline pipeline = Pipeline.create(options); + run(pipeline, options); + } + + /** + * Runs a pipeline which reads message from Kafka and writes it to GCS. + * + * @param options arguments to the pipeline + */ + public static PipelineResult run(Pipeline pipeline, KafkaToPubsubOptions options) { + // Configure Kafka consumer properties + Map kafkaConfig = new HashMap<>(); + kafkaConfig.putAll(parseKafkaConsumerConfig(options.getKafkaConsumerConfig())); + Map sslConfig = new HashMap<>(); + if (options.getSecretStoreUrl() != null && options.getVaultToken() != null) { + Map> credentials = + getKafkaCredentialsFromVault(options.getSecretStoreUrl(), options.getVaultToken()); + kafkaConfig = configureKafka(credentials.get(KafkaPubsubConstants.KAFKA_CREDENTIALS)); + } else { + LOG.warn( + "No information to retrieve Kafka credentials was provided. " + + "Trying to initiate an unauthorized connection."); + } + + if (isSslSpecified(options)) { + sslConfig.putAll(configureSsl(options)); + } else { + LOG.info( + "No information to retrieve SSL certificate was provided by parameters." + + "Trying to initiate a plain text connection."); + } + + List topicsList = new ArrayList<>(Arrays.asList(options.getInputTopics().split(","))); + + checkArgument( + topicsList.size() > 0 && topicsList.get(0).length() > 0, + "inputTopics cannot be an empty string."); + + List bootstrapServersList = + new ArrayList<>(Arrays.asList(options.getBootstrapServers().split(","))); + + checkArgument( + bootstrapServersList.size() > 0 && topicsList.get(0).length() > 0, + "bootstrapServers cannot be an empty string."); + + LOG.info( + "Starting Kafka-To-PubSub pipeline with parameters bootstrap servers:" + + options.getBootstrapServers() + + " input topics: " + + options.getInputTopics() + + " output pubsub topic: " + + options.getOutputTopic()); + + /* + * Steps: + * 1) Read messages in from Kafka + * 2) Extract values only + * 3) Write successful records to PubSub + */ + + if (options.getOutputFormat() == FormatTransform.FORMAT.AVRO) { + pipeline + .apply( + "readAvrosFromKafka", + FormatTransform.readAvrosFromKafka( + options.getBootstrapServers(), topicsList, kafkaConfig, sslConfig)) + .apply("createValues", Values.create()) + .apply("writeAvrosToPubSub", PubsubIO.writeAvros(AvroDataClass.class)); + + } else { + pipeline + .apply( + "readFromKafka", + FormatTransform.readFromKafka( + options.getBootstrapServers(), topicsList, kafkaConfig, sslConfig)) + .apply("createValues", Values.create()) + .apply("writeToPubSub", new FormatTransform.FormatOutput(options)); + } + + return pipeline.run(); + } +} diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/README.md b/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/README.md new file mode 100644 index 000000000000..13f2d3a5f593 --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/README.md @@ -0,0 +1,200 @@ + + +# Apache Beam pipeline example to ingest data from Apache Kafka to Google Cloud Pub/Sub + +This directory contains 
an [Apache Beam](https://beam.apache.org/) pipeline example that reads data +from a single topic or multiple topics in +[Apache Kafka](https://kafka.apache.org/) and writes the data into a single topic +in [Google Cloud Pub/Sub](https://cloud.google.com/pubsub). + +Supported data formats: + +- Serializable plaintext formats, such as JSON +- [PubSubMessage](https://cloud.google.com/pubsub/docs/reference/rest/v1/PubsubMessage). + +Supported input source configurations: + +- Single or multiple Apache Kafka bootstrap servers +- Apache Kafka SASL/SCRAM authentication over plaintext or SSL connection +- Secrets vault service [HashiCorp Vault](https://www.vaultproject.io/). + +Supported destination configuration: + +- Single Google Cloud Pub/Sub topic. + +In a simple scenario, the example creates an Apache Beam pipeline that reads messages from a source Kafka server +and topic, and streams the text messages into the specified Pub/Sub destination topic. Other scenarios may need +Kafka SASL/SCRAM authentication, which can be performed over a plain text or SSL encrypted connection. The example supports +using a single Kafka user account to authenticate against the provided source Kafka servers and topics. To support SASL +authentication over SSL, the example needs an SSL certificate location and access to a secrets vault service with the +Kafka username and password; currently HashiCorp Vault is supported. + +## Requirements + +- Java 8 +- Kafka Bootstrap Server(s) up and running +- Existing source Kafka topic(s) +- An existing Pub/Sub destination output topic +- (Optional) An existing HashiCorp Vault +- (Optional) A configured secure SSL connection for Kafka + +## Getting Started + +This section describes what is needed to get the example up and running.
+ +- Gradle preparation +- Local execution +- Running as a Dataflow Template +- Supported Output Formats + - PubSubMessage + - AVRO +- E2E tests (TBD) + +## Gradle preparation + +To run this example your `build.gradle` file should contain the following task to execute the pipeline: + +``` +task execute (type:JavaExec) { + main = System.getProperty("mainClass") + classpath = sourceSets.main.runtimeClasspath + systemProperties System.getProperties() + args System.getProperty("exec.args", "").split() +} +``` + +This task allows to run the pipeline via the following command: + +```bash +gradle clean execute -DmainClass=org.apache.beam.examples.complete.kafkatopubsub.KafkaToPubsub \ + -Dexec.args="--= --=" +``` + +## Running the pipeline + +To execute this pipeline, specify the parameters: + +- Kafka Bootstrap servers +- Kafka input topics +- Pub/Sub output topic +- Output format + +in the following format: + +```bash +--bootstrapServers=host:port \ +--inputTopics=your-input-topic \ +--outputTopic=projects/your-project-id/topics/your-topic-pame \ +--outputFormat=AVRO|PUBSUB +``` + +Optionally, to retrieve Kafka credentials for SASL/SCRAM, specify a URL to the credentials in HashiCorp Vault and the +vault access token: + +```bash +--secretStoreUrl=http(s)://host:port/path/to/credentials +--vaultToken=your-token +``` + +Optionally, to configure secure SSL connection between the Beam pipeline and Kafka, specify the parameters: + +- A path to a truststore file (it can be a local path or a GCS path, which should start with `gs://`) +- A path to a keystore file (it can be a local path or a GCS path, which should start with `gs://`) +- Truststore password +- Keystore password +- Key password + +```bash +--truststorePath=path/to/kafka.truststore.jks +--keystorePath=path/to/kafka.keystore.jks +--truststorePassword=your-truststore-password +--keystorePassword=your-keystore-password +--keyPassword=your-key-password +``` + +By default this will run the pipeline locally with the DirectRunner. To change the runner, specify: + +```bash +--runner=YOUR_SELECTED_RUNNER +``` + +See the [documentation](http://beam.apache.org/get-started/quickstart/) and +the [Examples README](../../../../../../../../../README.md) for more information about how to run this example. + +## Running as a Dataflow Template + +This example also exists as Google Dataflow Template, which you can build and run using Google Cloud Platform. See +its [README.md](https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/master/v2/kafka-to-pubsub/README.md) for +more information. + +## Supported Output Formats + +This pipeline can output data in a format of PubSubMessage or AVRO. + +### PubSubMessage + +This example supports PubSubMessage format for output out-of-the-box. No additional changes are required. + +### AVRO + +This example contains an example demonstrating AVRO format support, the following steps should be done to provide it: + +- Define custom Class to deserialize AVRO from Kafka [provided in example] +- Create custom data serialization in Apache Beam +- Serialize data to AVRO in Pub/Sub [provided in example]. + +To use this example in the specific case, follow these steps: + +- Create your own class to describe AVRO schema. As an example use [AvroDataClass](avro/AvroDataClass.java). Just define + necessary fields. +- Create your own Avro Deserializer class. As an example + use [AvroDataClassKafkaAvroDeserializer class](avro/AvroDataClassKafkaAvroDeserializer.java). 
Just rename it, and put + your own Schema class as the necessary types. +- Modify the [FormatTransform.readAvrosFromKafka method](transforms/FormatTransform.java). Put your Schema class and + Deserializer to the related parameter. + +```java +return KafkaIO.read() + ... + .withValueDeserializerAndCoder( + AvroDataClassKafkaAvroDeserializer.class,AvroCoder.of(AvroDataClass.class)) // put your classes here + ... +``` + +- [OPTIONAL TO IMPLEMENT] Add [Beam Transform](https://beam.apache.org/documentation/programming-guide/#transforms) if + it necessary in your case. +- Modify the write step in the [KafkaToPubsub class](KafkaToPubsub.java) by putting your Schema class to " + writeAvrosToPubSub" step. + - NOTE: if it changed during the transform, you should use changed one class definition. + +```java +if(options.getOutputFormat()==FormatTransform.FORMAT.AVRO){ + ... + .apply("writeAvrosToPubSub",PubsubIO.writeAvros(AvroDataClass.class)); // put your SCHEMA class here + + } +``` + +## End to end tests + +TBD + +_Note: The Kafka to Pub/Sub job executed with a distributed runner supports SSL configuration with the certificate +located only in GCS._ diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/avro/AvroDataClass.java b/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/avro/AvroDataClass.java new file mode 100644 index 000000000000..8c8702115f65 --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/avro/AvroDataClass.java @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.examples.complete.kafkatopubsub.avro; + +import org.apache.beam.sdk.coders.AvroCoder; +import org.apache.beam.sdk.coders.DefaultCoder; + +/** + * Example of AVRO serialization class. 
To configure your AVRO schema, change this class to + * requirement schema definition + */ +@DefaultCoder(AvroCoder.class) +public class AvroDataClass { + + String field1; + Float field2; + Float field3; + + public AvroDataClass(String field1, Float field2, Float field3) { + this.field1 = field1; + this.field2 = field2; + this.field3 = field3; + } + + public String getField1() { + return field1; + } + + public void setField1(String field1) { + this.field1 = field1; + } + + public Float getField2() { + return field2; + } + + public void setField2(Float field2) { + this.field2 = field2; + } + + public Float getField3() { + return field3; + } + + public void setField3(Float field3) { + this.field3 = field3; + } +} diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/avro/AvroDataClassKafkaAvroDeserializer.java b/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/avro/AvroDataClassKafkaAvroDeserializer.java new file mode 100644 index 000000000000..a9aeb72eb196 --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/avro/AvroDataClassKafkaAvroDeserializer.java @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.examples.complete.kafkatopubsub.avro; + +import io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer; +import io.confluent.kafka.serializers.KafkaAvroDeserializerConfig; +import java.util.Map; +import org.apache.kafka.common.serialization.Deserializer; + +/** Example of custom AVRO Deserialize. */ +public class AvroDataClassKafkaAvroDeserializer extends AbstractKafkaAvroDeserializer + implements Deserializer { + + @Override + public void configure(Map configs, boolean isKey) { + configure(new KafkaAvroDeserializerConfig(configs)); + } + + @Override + public AvroDataClass deserialize(String s, byte[] bytes) { + return (AvroDataClass) this.deserialize(bytes); + } + + @Override + public void close() {} +} diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/avro/package-info.java b/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/avro/package-info.java new file mode 100644 index 000000000000..1dc4f36ce656 --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/avro/package-info.java @@ -0,0 +1,20 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** Kafka to Pubsub template. */ +package org.apache.beam.examples.complete.kafkatopubsub.avro; diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/kafka/consumer/SslConsumerFactoryFn.java b/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/kafka/consumer/SslConsumerFactoryFn.java new file mode 100644 index 000000000000..eda4e9e14f8b --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/kafka/consumer/SslConsumerFactoryFn.java @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.examples.complete.kafkatopubsub.kafka.consumer; + +import java.io.File; +import java.io.IOException; +import java.nio.channels.FileChannel; +import java.nio.channels.ReadableByteChannel; +import java.nio.file.Paths; +import java.nio.file.StandardOpenOption; +import java.util.HashSet; +import java.util.Map; +import java.util.Set; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.kafka.clients.CommonClientConfigs; +import org.apache.kafka.clients.consumer.Consumer; +import org.apache.kafka.clients.consumer.KafkaConsumer; +import org.apache.kafka.common.config.SslConfigs; +import org.apache.kafka.common.security.auth.SecurityProtocol; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** Class to create Kafka Consumer with configured SSL. 
*/ +public class SslConsumerFactoryFn + implements SerializableFunction, Consumer> { + private final Map sslConfig; + private static final String TRUSTSTORE_LOCAL_PATH = "/tmp/kafka.truststore.jks"; + private static final String KEYSTORE_LOCAL_PATH = "/tmp/kafka.keystore.jks"; + + /* Logger for class.*/ + private static final Logger LOG = LoggerFactory.getLogger(SslConsumerFactoryFn.class); + + public SslConsumerFactoryFn(Map sslConfig) { + this.sslConfig = sslConfig; + } + + @SuppressWarnings("nullness") + @Override + public Consumer apply(Map config) { + String truststoreLocation = sslConfig.get(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG); + String keystoreLocation = sslConfig.get(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG); + if (truststoreLocation == null || keystoreLocation == null) { + LOG.warn("Not enough information to configure SSL"); + return new KafkaConsumer<>(config); + } + + try { + if (truststoreLocation.startsWith("gs://")) { + getGcsFileAsLocal(truststoreLocation, TRUSTSTORE_LOCAL_PATH); + sslConfig.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, TRUSTSTORE_LOCAL_PATH); + } else { + checkFileExists(truststoreLocation); + } + + if (keystoreLocation.startsWith("gs://")) { + getGcsFileAsLocal(keystoreLocation, KEYSTORE_LOCAL_PATH); + sslConfig.put(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG, KEYSTORE_LOCAL_PATH); + } else { + checkFileExists(keystoreLocation); + } + } catch (IOException e) { + LOG.error("Failed to retrieve data for SSL", e); + return new KafkaConsumer<>(config); + } + + config.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, SecurityProtocol.SASL_SSL.name()); + config.put( + SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, + sslConfig.get(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG)); + config.put( + SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG, + sslConfig.get(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG)); + config.put( + SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, + sslConfig.get(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG)); + config.put( + SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, + sslConfig.get(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG)); + config.put( + SslConfigs.SSL_KEY_PASSWORD_CONFIG, sslConfig.get(SslConfigs.SSL_KEY_PASSWORD_CONFIG)); + + return new KafkaConsumer<>(config); + } + + private void checkFileExists(String filePath) throws IOException { + LOG.info( + "Trying to get file: {} locally. Local files don't support when in using distribute runner", + filePath); + File f = new File(filePath); + if (f.exists()) { + LOG.debug("{} exists", f.getAbsolutePath()); + } else { + LOG.error("{} does not exist", f.getAbsolutePath()); + throw new IOException(); + } + } + + /** + * Reads a file from GCS and writes it locally. + * + * @param gcsFilePath path to file in GCS in format "gs://your-bucket/path/to/file" + * @param outputFilePath path where to save file locally + * @throws IOException thrown if not able to read or write file + */ + public static void getGcsFileAsLocal(String gcsFilePath, String outputFilePath) + throws IOException { + LOG.info("Reading contents from GCS file: {}", gcsFilePath); + Set options = new HashSet<>(2); + options.add(StandardOpenOption.CREATE); + options.add(StandardOpenOption.APPEND); + // Copy the GCS file into a local file and will throw an I/O exception in case file not found. 
+ try (ReadableByteChannel readerChannel = + FileSystems.open(FileSystems.matchSingleFileSpec(gcsFilePath).resourceId())) { + try (FileChannel writeChannel = FileChannel.open(Paths.get(outputFilePath), options)) { + writeChannel.transferFrom(readerChannel, 0, Long.MAX_VALUE); + } + } + } +} diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/kafka/consumer/Utils.java b/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/kafka/consumer/Utils.java new file mode 100644 index 000000000000..c30e84247601 --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/kafka/consumer/Utils.java @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.examples.complete.kafkatopubsub.kafka.consumer; + +import static org.apache.beam.examples.complete.kafkatopubsub.KafkaPubsubConstants.KAFKA_CREDENTIALS; +import static org.apache.beam.examples.complete.kafkatopubsub.KafkaPubsubConstants.PASSWORD; +import static org.apache.beam.examples.complete.kafkatopubsub.KafkaPubsubConstants.USERNAME; + +import java.io.IOException; +import java.util.Arrays; +import java.util.HashMap; +import java.util.Map; +import java.util.stream.Collectors; +import org.apache.beam.examples.complete.kafkatopubsub.options.KafkaToPubsubOptions; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.gson.JsonObject; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.gson.JsonParser; +import org.apache.http.HttpResponse; +import org.apache.http.client.HttpClient; +import org.apache.http.client.methods.HttpGet; +import org.apache.http.impl.client.HttpClientBuilder; +import org.apache.http.util.EntityUtils; +import org.apache.kafka.common.config.SaslConfigs; +import org.apache.kafka.common.config.SslConfigs; +import org.apache.kafka.common.security.scram.internals.ScramMechanism; +import org.checkerframework.checker.nullness.qual.Nullable; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** Utilities for construction of Kafka Consumer. */ +public class Utils { + + /* Logger for class.*/ + private static final Logger LOG = LoggerFactory.getLogger(Utils.class); + + /** + * Retrieves all credentials from HashiCorp Vault secret storage. 
+ * + * @param secretStoreUrl url to the secret storage that contains a credentials for Kafka + * @param token Vault token to access the secret storage + * @return credentials for Kafka consumer config + */ + public static Map> getKafkaCredentialsFromVault( + String secretStoreUrl, String token) { + Map> credentialMap = new HashMap<>(); + + JsonObject credentials = null; + try { + HttpClient client = HttpClientBuilder.create().build(); + HttpGet request = new HttpGet(secretStoreUrl); + request.addHeader("X-Vault-Token", token); + HttpResponse response = client.execute(request); + String json = EntityUtils.toString(response.getEntity(), "UTF-8"); + + /* + Vault's response JSON has a specific schema, where the actual data is placed under + {data: {data: }}. + Example: + { + "request_id": "6a0bb14b-ef24-256c-3edf-cfd52ad1d60d", + "lease_id": "", + "renewable": false, + "lease_duration": 0, + "data": { + "data": { + "bucket": "kafka_to_pubsub_test", + "key_password": "secret", + "keystore_password": "secret", + "keystore_path": "ssl_cert/kafka.keystore.jks", + "password": "admin-secret", + "truststore_password": "secret", + "truststore_path": "ssl_cert/kafka.truststore.jks", + "username": "admin" + }, + "metadata": { + "created_time": "2020-10-20T11:43:11.109186969Z", + "deletion_time": "", + "destroyed": false, + "version": 8 + } + }, + "wrap_info": null, + "warnings": null, + "auth": null + } + */ + // Parse security properties from the response JSON + credentials = + JsonParser.parseString(json) + .getAsJsonObject() + .get("data") + .getAsJsonObject() + .getAsJsonObject("data"); + } catch (IOException e) { + LOG.error("Failed to retrieve credentials from Vault.", e); + } + + if (credentials != null) { + // Username and password for Kafka authorization + credentialMap.put(KAFKA_CREDENTIALS, new HashMap<>()); + + if (credentials.has(USERNAME) && credentials.has(PASSWORD)) { + credentialMap.get(KAFKA_CREDENTIALS).put(USERNAME, credentials.get(USERNAME).getAsString()); + credentialMap.get(KAFKA_CREDENTIALS).put(PASSWORD, credentials.get(PASSWORD).getAsString()); + } else { + LOG.warn( + "There are no username and/or password for Kafka in Vault." + + "Trying to initiate an unauthorized connection."); + } + } + + return credentialMap; + } + + /** + * Configures Kafka consumer for authorized connection. 
+ * + * @param props username and password for Kafka + * @return configuration set of parameters for Kafka + */ + public static Map configureKafka(@Nullable Map props) { + // Create the configuration for Kafka + Map config = new HashMap<>(); + if (props != null && props.containsKey(USERNAME) && props.containsKey(PASSWORD)) { + config.put(SaslConfigs.SASL_MECHANISM, ScramMechanism.SCRAM_SHA_512.mechanismName()); + config.put( + SaslConfigs.SASL_JAAS_CONFIG, + String.format( + "org.apache.kafka.common.security.scram.ScramLoginModule required " + + "username=\"%s\" password=\"%s\";", + props.get(USERNAME), props.get(PASSWORD))); + } + return config; + } + + public static Map configureSsl(KafkaToPubsubOptions options) { + Map config = new HashMap<>(); + config.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, options.getTruststorePath()); + config.put(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG, options.getKeystorePath()); + config.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, options.getTruststorePassword()); + config.put(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, options.getKeystorePassword()); + config.put(SslConfigs.SSL_KEY_PASSWORD_CONFIG, options.getKeyPassword()); + + return config; + } + + public static boolean isSslSpecified(KafkaToPubsubOptions options) { + return options.getTruststorePath() != null + || options.getTruststorePassword() != null + || options.getKeystorePath() != null + || options.getKeyPassword() != null; + } + + public static Map parseKafkaConsumerConfig(String kafkaConsumerConfig) { + return Arrays.stream(kafkaConsumerConfig.split(";")) + .map(s -> s.split("=")) + .collect(Collectors.toMap(kv -> kv[0], kv -> kv[1])); + } +} diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/kafka/consumer/package-info.java b/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/kafka/consumer/package-info.java new file mode 100644 index 000000000000..54aff5e766e7 --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/kafka/consumer/package-info.java @@ -0,0 +1,20 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** Kafka to Pubsub template. 
*/ +package org.apache.beam.examples.complete.kafkatopubsub.kafka.consumer; diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/options/KafkaToPubsubOptions.java b/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/options/KafkaToPubsubOptions.java new file mode 100644 index 000000000000..3a86b7f4345d --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/options/KafkaToPubsubOptions.java @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.examples.complete.kafkatopubsub.options; + +import org.apache.beam.examples.complete.kafkatopubsub.transforms.FormatTransform; +import org.apache.beam.sdk.options.Description; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.options.Validation; + +public interface KafkaToPubsubOptions extends PipelineOptions { + @Description( + "Comma Separated list of Kafka Bootstrap Servers (e.g: server1:[port],server2:[port]).") + @Validation.Required + String getBootstrapServers(); + + void setBootstrapServers(String value); + + @Description( + "Comma Separated list of Kafka topic(s) to read the input from (e.g: topic1,topic2).") + @Validation.Required + String getInputTopics(); + + void setInputTopics(String value); + + @Description( + "The Cloud Pub/Sub topic to publish to. " + + "The name should be in the format of " + + "projects//topics/.") + @Validation.Required + String getOutputTopic(); + + void setOutputTopic(String outputTopic); + + @Description( + "Format which will be writen to output Pub/Sub topic. 
Supported formats: AVRO, PUBSUB") + @Validation.Required + FormatTransform.FORMAT getOutputFormat(); + + void setOutputFormat(FormatTransform.FORMAT outputFormat); + + @Description("URL to credentials in Vault") + String getSecretStoreUrl(); + + void setSecretStoreUrl(String secretStoreUrl); + + @Description("Vault token") + String getVaultToken(); + + void setVaultToken(String vaultToken); + + @Description("The path to the trust store file") + String getTruststorePath(); + + void setTruststorePath(String truststorePath); + + @Description("The path to the key store file") + String getKeystorePath(); + + void setKeystorePath(String keystorePath); + + @Description("The password for the trust store password") + String getTruststorePassword(); + + void setTruststorePassword(String truststorePassword); + + @Description("The store password for the key store password") + String getKeystorePassword(); + + void setKeystorePassword(String keystorePassword); + + @Description("The password of the private key in the key store file") + String getKeyPassword(); + + void setKeyPassword(String keyPassword); + + @Description( + "Additional kafka consumer configs to be applied to Kafka Consumer (e.g. key1=value1;key2=value2).") + String getKafkaConsumerConfig(); + + void setKafkaConsumerConfig(String kafkaConfig); +} diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/options/package-info.java b/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/options/package-info.java new file mode 100644 index 000000000000..361a5eab058c --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/options/package-info.java @@ -0,0 +1,20 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** Kafka to Pubsub template. */ +package org.apache.beam.examples.complete.kafkatopubsub.options; diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/package-info.java b/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/package-info.java new file mode 100644 index 000000000000..66b4c04f763f --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/package-info.java @@ -0,0 +1,20 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** Kafka to Pubsub template. */ +package org.apache.beam.examples.complete.kafkatopubsub; diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/transforms/FormatTransform.java b/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/transforms/FormatTransform.java new file mode 100644 index 000000000000..d493c0648180 --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/transforms/FormatTransform.java @@ -0,0 +1,128 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.examples.complete.kafkatopubsub.transforms; + +import java.util.List; +import java.util.Map; +import org.apache.beam.examples.complete.kafkatopubsub.avro.AvroDataClass; +import org.apache.beam.examples.complete.kafkatopubsub.avro.AvroDataClassKafkaAvroDeserializer; +import org.apache.beam.examples.complete.kafkatopubsub.kafka.consumer.SslConsumerFactoryFn; +import org.apache.beam.examples.complete.kafkatopubsub.options.KafkaToPubsubOptions; +import org.apache.beam.sdk.coders.AvroCoder; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.NullableCoder; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO; +import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage; +import org.apache.beam.sdk.io.kafka.KafkaIO; +import org.apache.beam.sdk.transforms.MapElements; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PDone; +import org.apache.beam.sdk.values.TypeDescriptor; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.apache.kafka.common.serialization.StringDeserializer; + +/** Different transformations over the processed data in the pipeline. */ +public class FormatTransform { + + public enum FORMAT { + PUBSUB, + AVRO + } + + /** + * Configures Kafka consumer. 
+ * + * @param bootstrapServers Kafka servers to read from + * @param topicsList Kafka topics to read from + * @param kafkaConfig configuration for the Kafka consumer + * @param sslConfig configuration for the SSL connection + * @return configured reading from Kafka + */ + public static PTransform>> readFromKafka( + String bootstrapServers, + List topicsList, + Map kafkaConfig, + Map sslConfig) { + return KafkaIO.read() + .withBootstrapServers(bootstrapServers) + .withTopics(topicsList) + .withKeyDeserializerAndCoder( + StringDeserializer.class, (Coder) NullableCoder.of(StringUtf8Coder.of())) + .withValueDeserializerAndCoder( + StringDeserializer.class, (Coder) NullableCoder.of(StringUtf8Coder.of())) + .withConsumerConfigUpdates(kafkaConfig) + .withConsumerFactoryFn(new SslConsumerFactoryFn(sslConfig)) + .withoutMetadata(); + } + + /** + * Configures Kafka consumer to read avros to {@link AvroDataClass} format. + * + * @param bootstrapServers Kafka servers to read from + * @param topicsList Kafka topics to read from + * @param config configuration for the Kafka consumer + * @return configured reading from Kafka + */ + public static PTransform>> readAvrosFromKafka( + String bootstrapServers, + List topicsList, + Map config, + Map sslConfig) { + return KafkaIO.read() + .withBootstrapServers(bootstrapServers) + .withTopics(topicsList) + .withKeyDeserializerAndCoder( + StringDeserializer.class, (Coder) NullableCoder.of(StringUtf8Coder.of())) + .withValueDeserializerAndCoder( + AvroDataClassKafkaAvroDeserializer.class, AvroCoder.of(AvroDataClass.class)) + .withConsumerConfigUpdates(config) + .withConsumerFactoryFn(new SslConsumerFactoryFn(sslConfig)) + .withoutMetadata(); + } + + /** + * The {@link FormatOutput} wraps a String serializable messages with the {@link PubsubMessage} + * class. + */ + public static class FormatOutput extends PTransform, PDone> { + + private final KafkaToPubsubOptions options; + + public FormatOutput(KafkaToPubsubOptions options) { + this.options = options; + } + + @Override + public PDone expand(PCollection input) { + return input + .apply( + "convertMessagesToPubsubMessages", + MapElements.into(TypeDescriptor.of(PubsubMessage.class)) + .via( + (String json) -> + new PubsubMessage(json.getBytes(Charsets.UTF_8), ImmutableMap.of()))) + .apply( + "writePubsubMessagesToPubSub", PubsubIO.writeMessages().to(options.getOutputTopic())); + } + } +} diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/transforms/package-info.java b/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/transforms/package-info.java new file mode 100644 index 000000000000..83b9d5c3c956 --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/transforms/package-info.java @@ -0,0 +1,20 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** Kafka to Pubsub template. */ +package org.apache.beam.examples.complete.kafkatopubsub.transforms; diff --git a/examples/java/src/main/java/org/apache/beam/examples/cookbook/BigQueryTornadoes.java b/examples/java/src/main/java/org/apache/beam/examples/cookbook/BigQueryTornadoes.java index 3d70ab1760ba..4a7ea4a46939 100644 --- a/examples/java/src/main/java/org/apache/beam/examples/cookbook/BigQueryTornadoes.java +++ b/examples/java/src/main/java/org/apache/beam/examples/cookbook/BigQueryTornadoes.java @@ -25,7 +25,6 @@ import org.apache.beam.sdk.Pipeline; import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO; import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.TypedRead; -import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.TypedRead.Method; import org.apache.beam.sdk.options.Default; import org.apache.beam.sdk.options.Description; import org.apache.beam.sdk.options.PipelineOptions; @@ -38,6 +37,8 @@ import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; /** * An example that reads the public samples of weather data from BigQuery, counts the number of @@ -66,6 +67,8 @@ * and can be overridden with {@code --input}. */ public class BigQueryTornadoes { + private static final Logger LOG = LoggerFactory.getLogger(BigQueryTornadoes.class); + // Default to using a 1000 row subset of the public weather station table publicdata:samples.gsod. private static final String WEATHER_SAMPLES_TABLE = "clouddataflow-readonly:samples.weather_stations"; @@ -145,12 +148,30 @@ public interface Options extends PipelineOptions { void setInputQuery(String value); - @Description("Mode to use when reading from BigQuery") + @Description("Read method to use to read from BigQuery") @Default.Enum("EXPORT") TypedRead.Method getReadMethod(); void setReadMethod(TypedRead.Method value); + @Description("Write method to use to write to BigQuery") + @Default.Enum("DEFAULT") + BigQueryIO.Write.Method getWriteMethod(); + + void setWriteMethod(BigQueryIO.Write.Method value); + + @Description("Write disposition to use to write to BigQuery") + @Default.Enum("WRITE_TRUNCATE") + BigQueryIO.Write.WriteDisposition getWriteDisposition(); + + void setWriteDisposition(BigQueryIO.Write.WriteDisposition value); + + @Description("Create disposition to use to write to BigQuery") + @Default.Enum("CREATE_IF_NEEDED") + BigQueryIO.Write.CreateDisposition getCreateDisposition(); + + void setCreateDisposition(BigQueryIO.Write.CreateDisposition value); + @Description( "BigQuery table to write to, specified as " + ":.. The dataset must already exist.") @@ -160,63 +181,48 @@ public interface Options extends PipelineOptions { void setOutput(String value); } - static void runBigQueryTornadoes(Options options) { - Pipeline p = Pipeline.create(options); - + public static void applyBigQueryTornadoes(Pipeline p, Options options) { // Build the table schema for the output table. 
List fields = new ArrayList<>(); fields.add(new TableFieldSchema().setName("month").setType("INTEGER")); fields.add(new TableFieldSchema().setName("tornado_count").setType("INTEGER")); TableSchema schema = new TableSchema().setFields(fields); - PCollection rowsFromBigQuery; - - switch (options.getReadMethod()) { - case DIRECT_READ: - if (!options.getInputQuery().isEmpty()) { - rowsFromBigQuery = - p.apply( - BigQueryIO.readTableRows() - .fromQuery(options.getInputQuery()) - .usingStandardSql() - .withMethod(Method.DIRECT_READ)); - } else { - rowsFromBigQuery = - p.apply( - BigQueryIO.readTableRows() - .from(options.getInput()) - .withMethod(Method.DIRECT_READ) - .withSelectedFields(Lists.newArrayList("month", "tornado"))); - } - break; - - default: - if (!options.getInputQuery().isEmpty()) { - rowsFromBigQuery = - p.apply( - BigQueryIO.readTableRows() - .fromQuery(options.getInputQuery()) - .usingStandardSql() - .withMethod(options.getReadMethod())); - } else { - rowsFromBigQuery = - p.apply( - BigQueryIO.readTableRows() - .from(options.getInput()) - .withMethod(options.getReadMethod())); - } - break; + TypedRead bigqueryIO; + if (!options.getInputQuery().isEmpty()) { + bigqueryIO = + BigQueryIO.readTableRows() + .fromQuery(options.getInputQuery()) + .usingStandardSql() + .withMethod(options.getReadMethod()); + } else { + bigqueryIO = + BigQueryIO.readTableRows().from(options.getInput()).withMethod(options.getReadMethod()); + + // Selected fields only applies when using Method.DIRECT_READ and + // when reading directly from a table. + if (options.getReadMethod() == TypedRead.Method.DIRECT_READ) { + bigqueryIO = bigqueryIO.withSelectedFields(Lists.newArrayList("month", "tornado")); + } } + PCollection rowsFromBigQuery = p.apply(bigqueryIO); + rowsFromBigQuery .apply(new CountTornadoes()) .apply( BigQueryIO.writeTableRows() .to(options.getOutput()) .withSchema(schema) - .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED) - .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE)); + .withCreateDisposition(options.getCreateDisposition()) + .withWriteDisposition(options.getWriteDisposition()) + .withMethod(options.getWriteMethod())); + } + public static void runBigQueryTornadoes(Options options) { + LOG.info("Running BigQuery Tornadoes with options " + options.toString()); + Pipeline p = Pipeline.create(options); + applyBigQueryTornadoes(p, options); p.run().waitUntilFinish(); } diff --git a/examples/java/src/main/java/org/apache/beam/examples/snippets/transforms/io/gcp/bigquery/BigQueryReadFromQueryWithBigQueryStorageAPI.java b/examples/java/src/main/java/org/apache/beam/examples/snippets/transforms/io/gcp/bigquery/BigQueryReadFromQueryWithBigQueryStorageAPI.java new file mode 100644 index 000000000000..22680ac76809 --- /dev/null +++ b/examples/java/src/main/java/org/apache/beam/examples/snippets/transforms/io/gcp/bigquery/BigQueryReadFromQueryWithBigQueryStorageAPI.java @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.examples.snippets.transforms.io.gcp.bigquery; + +// [START bigquery_read_from_query_with_bigquery_storage_api] + +import org.apache.beam.examples.snippets.transforms.io.gcp.bigquery.BigQueryMyData.MyData; +import org.apache.beam.sdk.Pipeline; +import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO; +import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.TypedRead.Method; +import org.apache.beam.sdk.transforms.MapElements; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.TypeDescriptor; + +class BigQueryReadFromQueryWithBigQueryStorageAPI { + public static PCollection readFromQueryWithBigQueryStorageAPI( + String project, String dataset, String table, String query, Pipeline pipeline) { + + // String project = "my-project-id"; + // String dataset = "my_bigquery_dataset_id"; + // String table = "my_bigquery_table_id"; + + // Pipeline pipeline = Pipeline.create(); + + /* + String query = String.format("SELECT\n" + + " string_field,\n" + + " int64_field,\n" + + " float64_field,\n" + + " numeric_field,\n" + + " bool_field,\n" + + " bytes_field,\n" + + " date_field,\n" + + " datetime_field,\n" + + " time_field,\n" + + " timestamp_field,\n" + + " geography_field,\n" + + " array_field,\n" + + " struct_field\n" + + "FROM\n" + + " `%s:%s.%s`", project, dataset, table) + */ + + PCollection rows = + pipeline + .apply( + "Read from BigQuery table", + BigQueryIO.readTableRows() + .fromQuery(query) + .usingStandardSql() + .withMethod(Method.DIRECT_READ)) + .apply( + "TableRows to MyData", + MapElements.into(TypeDescriptor.of(MyData.class)).via(MyData::fromTableRow)); + + return rows; + } +} +// [END bigquery_read_from_query_with_bigquery_storage_api] diff --git a/examples/java/src/test/java/org/apache/beam/examples/WindowedWordCountIT.java b/examples/java/src/test/java/org/apache/beam/examples/WindowedWordCountIT.java index b42f7d8c8c38..bfb11ae01462 100644 --- a/examples/java/src/test/java/org/apache/beam/examples/WindowedWordCountIT.java +++ b/examples/java/src/test/java/org/apache/beam/examples/WindowedWordCountIT.java @@ -62,9 +62,6 @@ /** End-to-end integration test of {@link WindowedWordCount}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class WindowedWordCountIT { @Rule public TestName testName = new TestName(); diff --git a/examples/java/src/test/java/org/apache/beam/examples/complete/TrafficMaxLaneFlowIT.java b/examples/java/src/test/java/org/apache/beam/examples/complete/TrafficMaxLaneFlowIT.java index d683c38a2bbe..0ee34be716b3 100644 --- a/examples/java/src/test/java/org/apache/beam/examples/complete/TrafficMaxLaneFlowIT.java +++ b/examples/java/src/test/java/org/apache/beam/examples/complete/TrafficMaxLaneFlowIT.java @@ -42,9 +42,6 @@ /** End-to-end tests of TrafficMaxLaneFlowIT. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TrafficMaxLaneFlowIT { private static final Logger LOG = LoggerFactory.getLogger(TrafficMaxLaneFlowIT.class); private TrafficMaxLaneFlowOptions options; diff --git a/examples/java/src/test/java/org/apache/beam/examples/complete/TrafficRoutesIT.java b/examples/java/src/test/java/org/apache/beam/examples/complete/TrafficRoutesIT.java index edaecfdcd5e5..3fde9d8e692d 100644 --- a/examples/java/src/test/java/org/apache/beam/examples/complete/TrafficRoutesIT.java +++ b/examples/java/src/test/java/org/apache/beam/examples/complete/TrafficRoutesIT.java @@ -42,9 +42,6 @@ /** End-to-end tests of TrafficRoutes. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TrafficRoutesIT { private static final Logger LOG = LoggerFactory.getLogger(TrafficRoutesIT.class); private TrafficRoutesOptions options; diff --git a/examples/java/src/test/java/org/apache/beam/examples/complete/datatokenization/DataTokenizationTest.java b/examples/java/src/test/java/org/apache/beam/examples/complete/datatokenization/DataTokenizationTest.java new file mode 100644 index 000000000000..a2b1ec05a2ec --- /dev/null +++ b/examples/java/src/test/java/org/apache/beam/examples/complete/datatokenization/DataTokenizationTest.java @@ -0,0 +1,208 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.examples.complete.datatokenization; + +import static org.hamcrest.MatcherAssert.assertThat; +import static org.hamcrest.Matchers.equalTo; +import static org.hamcrest.Matchers.hasSize; +import static org.hamcrest.Matchers.startsWith; +import static org.junit.Assert.assertNotNull; + +import java.io.IOException; +import java.nio.charset.StandardCharsets; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.LinkedList; +import java.util.List; +import org.apache.beam.examples.complete.datatokenization.options.DataTokenizationOptions; +import org.apache.beam.examples.complete.datatokenization.transforms.io.TokenizationFileSystemIO; +import org.apache.beam.examples.complete.datatokenization.transforms.io.TokenizationFileSystemIO.FORMAT; +import org.apache.beam.examples.complete.datatokenization.utils.FailsafeElementCoder; +import org.apache.beam.examples.complete.datatokenization.utils.RowToCsv; +import org.apache.beam.examples.complete.datatokenization.utils.SchemasUtils; +import org.apache.beam.sdk.coders.CoderRegistry; +import org.apache.beam.sdk.coders.NullableCoder; +import org.apache.beam.sdk.coders.RowCoder; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.options.PipelineOptionsFactory; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.Resources; +import org.junit.Assert; +import org.junit.Rule; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Test class for {@link DataTokenization}. 
*/ +@RunWith(JUnit4.class) +public class DataTokenizationTest { + + private static final String testSchema = + "{\"fields\":[{\"mode\":\"REQUIRED\",\"name\":\"FieldName1\",\"type\":\"STRING\"},{\"mode\":\"REQUIRED\",\"name\":\"FieldName2\",\"type\":\"STRING\"}]}"; + String[] fields = {"TestValue1", "TestValue2"}; + + @Rule public final transient TestPipeline testPipeline = TestPipeline.create(); + + private static final String RESOURCES_DIR = "./"; + + private static final String CSV_FILE_PATH = + Resources.getResource(RESOURCES_DIR + "testInput.csv").getPath(); + + private static final String JSON_FILE_PATH = + Resources.getResource(RESOURCES_DIR + "testInput.txt").getPath(); + + private static final String SCHEMA_FILE_PATH = + Resources.getResource(RESOURCES_DIR + "schema.txt").getPath(); + + private static final FailsafeElementCoder FAILSAFE_ELEMENT_CODER = + FailsafeElementCoder.of( + NullableCoder.of(StringUtf8Coder.of()), NullableCoder.of(StringUtf8Coder.of())); + + @Test + public void testGetBeamSchema() { + Schema expectedSchema = + Schema.builder() + .addField("FieldName1", FieldType.STRING) + .addField("FieldName2", FieldType.STRING) + .build(); + SchemasUtils schemasUtils = new SchemasUtils(testSchema); + Assert.assertEquals(expectedSchema, schemasUtils.getBeamSchema()); + } + + @Test + public void testGetBigQuerySchema() { + SchemasUtils schemasUtils = new SchemasUtils(testSchema); + Assert.assertEquals(testSchema, schemasUtils.getBigQuerySchema().toString()); + } + + @Test + public void testRowToCSV() { + Schema beamSchema = new SchemasUtils(testSchema).getBeamSchema(); + Row.Builder rowBuilder = Row.withSchema(beamSchema); + Row row = rowBuilder.addValues(new ArrayList<>(Arrays.asList(fields))).build(); + String csvResult = new RowToCsv(";").getCsvFromRow(row); + Assert.assertEquals(String.join(";", fields), csvResult); + } + + @Test + public void testRowToCSVWithNull() { + final String nullableTestSchema = + "{\"fields\":[{\"mode\":\"REQUIRED\",\"name\":\"FieldName1\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"FieldName2\",\"type\":\"STRING\"}]}"; + final String expectedCsv = "TestValueOne;null"; + + List values = Lists.newArrayList("TestValueOne", null); + + Schema beamSchema = new SchemasUtils(nullableTestSchema).getBeamSchema(); + Row.Builder rowBuilder = Row.withSchema(beamSchema); + Row row = rowBuilder.addValues(values).build(); + String csvResult = new RowToCsv(";").getCsvFromRow(row); + Assert.assertEquals(expectedCsv, csvResult); + } + + @Test + public void testFileSystemIOReadCSV() throws IOException { + PCollection jsons = fileSystemIORead(CSV_FILE_PATH, FORMAT.CSV); + assertRows(jsons); + testPipeline.run(); + } + + @Test + public void testFileSystemIOReadJSON() throws IOException { + PCollection jsons = fileSystemIORead(JSON_FILE_PATH, FORMAT.JSON); + assertRows(jsons); + testPipeline.run(); + } + + @Test + public void testJsonToRow() throws IOException { + PCollection rows = fileSystemIORead(JSON_FILE_PATH, FORMAT.JSON); + SchemasUtils testSchemaUtils = new SchemasUtils(SCHEMA_FILE_PATH, StandardCharsets.UTF_8); + + PAssert.that(rows) + .satisfies( + x -> { + LinkedList beamRows = Lists.newLinkedList(x); + assertThat(beamRows, hasSize(3)); + beamRows.forEach( + row -> { + List fieldValues = row.getValues(); + for (Object element : fieldValues) { + assertThat((String) element, startsWith("FieldValue")); + } + }); + return null; + }); + testPipeline.run(); + } + + private PCollection fileSystemIORead(String inputGcsFilePattern, FORMAT 
inputGcsFileFormat) + throws IOException { + DataTokenizationOptions options = + PipelineOptionsFactory.create().as(DataTokenizationOptions.class); + options.setDataSchemaPath(SCHEMA_FILE_PATH); + options.setInputFilePattern(inputGcsFilePattern); + options.setInputFileFormat(inputGcsFileFormat); + if (inputGcsFileFormat == FORMAT.CSV) { + options.setCsvContainsHeaders(Boolean.FALSE); + } + + SchemasUtils testSchemaUtils = + new SchemasUtils(options.getDataSchemaPath(), StandardCharsets.UTF_8); + + CoderRegistry coderRegistry = testPipeline.getCoderRegistry(); + coderRegistry.registerCoderForType( + FAILSAFE_ELEMENT_CODER.getEncodedTypeDescriptor(), FAILSAFE_ELEMENT_CODER); + coderRegistry.registerCoderForType( + RowCoder.of(testSchemaUtils.getBeamSchema()).getEncodedTypeDescriptor(), + RowCoder.of(testSchemaUtils.getBeamSchema())); + /* + * Row/Row Coder for FailsafeElement. + */ + FailsafeElementCoder coder = + FailsafeElementCoder.of( + RowCoder.of(testSchemaUtils.getBeamSchema()), + RowCoder.of(testSchemaUtils.getBeamSchema())); + coderRegistry.registerCoderForType(coder.getEncodedTypeDescriptor(), coder); + + return new TokenizationFileSystemIO(options).read(testPipeline, testSchemaUtils); + } + + private void assertRows(PCollection jsons) { + PAssert.that(jsons) + .satisfies( + x -> { + LinkedList rows = Lists.newLinkedList(x); + assertThat(rows, hasSize(3)); + rows.forEach( + row -> { + assertNotNull(row.getSchema()); + assertThat(row.getSchema().getFields(), hasSize(3)); + assertThat(row.getSchema().getField(0).getName(), equalTo("Field1")); + + assertThat(row.getValues(), hasSize(3)); + }); + return null; + }); + } +} diff --git a/examples/java/src/test/java/org/apache/beam/examples/complete/game/LeaderBoardTest.java b/examples/java/src/test/java/org/apache/beam/examples/complete/game/LeaderBoardTest.java index 4632887953e4..e48c2471cea7 100644 --- a/examples/java/src/test/java/org/apache/beam/examples/complete/game/LeaderBoardTest.java +++ b/examples/java/src/test/java/org/apache/beam/examples/complete/game/LeaderBoardTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.examples.complete.game; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.hasItem; -import static org.junit.Assert.assertThat; import java.io.Serializable; import org.apache.beam.examples.complete.game.LeaderBoard.CalculateTeamScores; diff --git a/examples/java/src/test/java/org/apache/beam/examples/complete/kafkatopubsub/KafkaToPubsubE2ETest.java b/examples/java/src/test/java/org/apache/beam/examples/complete/kafkatopubsub/KafkaToPubsubE2ETest.java new file mode 100644 index 000000000000..8e44fc2258dd --- /dev/null +++ b/examples/java/src/test/java/org/apache/beam/examples/complete/kafkatopubsub/KafkaToPubsubE2ETest.java @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.examples.complete.kafkatopubsub; + +import static org.hamcrest.Matchers.equalTo; +import static org.hamcrest.Matchers.hasProperty; + +import com.google.auth.Credentials; +import java.nio.charset.StandardCharsets; +import java.util.UUID; +import java.util.concurrent.ExecutionException; +import org.apache.beam.examples.complete.kafkatopubsub.options.KafkaToPubsubOptions; +import org.apache.beam.examples.complete.kafkatopubsub.transforms.FormatTransform.FORMAT; +import org.apache.beam.runners.direct.DirectOptions; +import org.apache.beam.sdk.PipelineResult; +import org.apache.beam.sdk.extensions.gcp.auth.NoopCredentialFactory; +import org.apache.beam.sdk.extensions.gcp.options.GcpOptions; +import org.apache.beam.sdk.io.gcp.pubsub.PubsubOptions; +import org.apache.beam.sdk.io.gcp.pubsub.TestPubsub; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.apache.kafka.clients.consumer.ConsumerConfig; +import org.apache.kafka.clients.producer.KafkaProducer; +import org.apache.kafka.clients.producer.ProducerConfig; +import org.apache.kafka.clients.producer.ProducerRecord; +import org.apache.kafka.common.serialization.StringSerializer; +import org.joda.time.Duration; +import org.junit.Before; +import org.junit.BeforeClass; +import org.junit.ClassRule; +import org.junit.Rule; +import org.junit.Test; +import org.testcontainers.containers.KafkaContainer; +import org.testcontainers.containers.PubSubEmulatorContainer; +import org.testcontainers.utility.DockerImageName; + +/** E2E test for {@link KafkaToPubsub} pipeline. 
*/ +public class KafkaToPubsubE2ETest { + private static final String PUBSUB_EMULATOR_IMAGE = + "gcr.io/google.com/cloudsdktool/cloud-sdk:316.0.0-emulators"; + private static final String KAFKA_IMAGE_NAME = "confluentinc/cp-kafka:5.4.3"; + private static final String PUBSUB_MESSAGE = "test pubsub message"; + private static final String KAFKA_TOPIC_NAME = "messages-topic"; + private static final String PROJECT_ID = "try-kafka-pubsub"; + private static final PipelineOptions OPTIONS = TestPipeline.testingPipelineOptions(); + + @ClassRule + public static final PubSubEmulatorContainer PUB_SUB_EMULATOR_CONTAINER = + new PubSubEmulatorContainer(DockerImageName.parse(PUBSUB_EMULATOR_IMAGE)); + + @ClassRule + public static final KafkaContainer KAFKA_CONTAINER = + new KafkaContainer(DockerImageName.parse(KAFKA_IMAGE_NAME)); + + @Rule public final transient TestPipeline pipeline = TestPipeline.fromOptions(OPTIONS); + @Rule public final transient TestPubsub testPubsub = TestPubsub.fromOptions(OPTIONS); + + @BeforeClass + public static void beforeClass() throws Exception { + Credentials credentials = NoopCredentialFactory.fromOptions(OPTIONS).getCredential(); + OPTIONS.as(DirectOptions.class).setBlockOnRun(false); + OPTIONS.as(GcpOptions.class).setGcpCredential(credentials); + OPTIONS.as(GcpOptions.class).setProject(PROJECT_ID); + OPTIONS + .as(PubsubOptions.class) + .setPubsubRootUrl("http://" + PUB_SUB_EMULATOR_CONTAINER.getEmulatorEndpoint()); + OPTIONS.as(KafkaToPubsubOptions.class).setOutputFormat(FORMAT.PUBSUB); + OPTIONS + .as(KafkaToPubsubOptions.class) + .setBootstrapServers(KAFKA_CONTAINER.getBootstrapServers()); + OPTIONS.as(KafkaToPubsubOptions.class).setInputTopics(KAFKA_TOPIC_NAME); + OPTIONS + .as(KafkaToPubsubOptions.class) + .setKafkaConsumerConfig(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG + "=earliest"); + } + + @Before + public void setUp() { + OPTIONS.as(KafkaToPubsubOptions.class).setOutputTopic(testPubsub.topicPath().getPath()); + } + + @Test + public void testKafkaToPubsubE2E() throws Exception { + PipelineResult job = KafkaToPubsub.run(pipeline, OPTIONS.as(KafkaToPubsubOptions.class)); + + sendKafkaMessage(); + testPubsub + .assertThatTopicEventuallyReceives( + hasProperty("payload", equalTo(PUBSUB_MESSAGE.getBytes(StandardCharsets.UTF_8)))) + .waitForUpTo(Duration.standardMinutes(1)); + try { + job.cancel(); + } catch (UnsupportedOperationException e) { + throw new AssertionError("Could not stop pipeline.", e); + } + } + + private void sendKafkaMessage() { + try (KafkaProducer producer = + new KafkaProducer<>( + ImmutableMap.of( + ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, + KAFKA_CONTAINER.getBootstrapServers(), + ProducerConfig.CLIENT_ID_CONFIG, + UUID.randomUUID().toString()), + new StringSerializer(), + new StringSerializer())) { + producer.send(new ProducerRecord<>(KAFKA_TOPIC_NAME, "testcontainers", PUBSUB_MESSAGE)).get(); + } catch (ExecutionException | InterruptedException e) { + throw new RuntimeException("Something went wrong in kafka producer", e); + } + } +} diff --git a/examples/java/src/test/java/org/apache/beam/examples/complete/kafkatopubsub/KafkaToPubsubTest.java b/examples/java/src/test/java/org/apache/beam/examples/complete/kafkatopubsub/KafkaToPubsubTest.java new file mode 100644 index 000000000000..e71b30459723 --- /dev/null +++ b/examples/java/src/test/java/org/apache/beam/examples/complete/kafkatopubsub/KafkaToPubsubTest.java @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.examples.complete.kafkatopubsub; + +import static org.apache.beam.examples.complete.kafkatopubsub.KafkaPubsubConstants.PASSWORD; +import static org.apache.beam.examples.complete.kafkatopubsub.KafkaPubsubConstants.USERNAME; +import static org.apache.beam.examples.complete.kafkatopubsub.kafka.consumer.Utils.getKafkaCredentialsFromVault; + +import java.util.HashMap; +import java.util.Map; +import org.apache.beam.examples.complete.kafkatopubsub.kafka.consumer.Utils; +import org.apache.kafka.common.config.SaslConfigs; +import org.apache.kafka.common.security.scram.internals.ScramMechanism; +import org.junit.Assert; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Test of KafkaToPubsub. */ +@RunWith(JUnit4.class) +public class KafkaToPubsubTest { + + /** Tests configureKafka() with a null input properties. */ + @Test + public void testConfigureKafkaNullProps() { + Map config = Utils.configureKafka(null); + Assert.assertEquals(new HashMap<>(), config); + } + + /** Tests configureKafka() without a Password in input properties. */ + @Test + public void testConfigureKafkaNoPassword() { + Map props = new HashMap<>(); + props.put(USERNAME, "username"); + Map config = Utils.configureKafka(props); + Assert.assertEquals(new HashMap<>(), config); + } + + /** Tests configureKafka() without a Username in input properties. */ + @Test + public void testConfigureKafkaNoUsername() { + Map props = new HashMap<>(); + props.put(PASSWORD, "password"); + Map config = Utils.configureKafka(props); + Assert.assertEquals(new HashMap<>(), config); + } + + /** Tests configureKafka() with an appropriate input properties. */ + @Test + public void testConfigureKafka() { + Map props = new HashMap<>(); + props.put(USERNAME, "username"); + props.put(PASSWORD, "password"); + + Map expectedConfig = new HashMap<>(); + expectedConfig.put(SaslConfigs.SASL_MECHANISM, ScramMechanism.SCRAM_SHA_512.mechanismName()); + expectedConfig.put( + SaslConfigs.SASL_JAAS_CONFIG, + String.format( + "org.apache.kafka.common.security.scram.ScramLoginModule required " + + "username=\"%s\" password=\"%s\";", + props.get(USERNAME), props.get(PASSWORD))); + + Map config = Utils.configureKafka(props); + Assert.assertEquals(expectedConfig, config); + } + + /** Tests getKafkaCredentialsFromVault() with an invalid url. 
*/ + @Test + public void testGetKafkaCredentialsFromVaultInvalidUrl() { + Map> credentials = + getKafkaCredentialsFromVault("some-url", "some-token"); + Assert.assertEquals(new HashMap<>(), credentials); + } +} diff --git a/examples/java/src/test/java/org/apache/beam/examples/cookbook/BigQueryTornadoesIT.java b/examples/java/src/test/java/org/apache/beam/examples/cookbook/BigQueryTornadoesIT.java index ea10fb1e201f..660dbcb55290 100644 --- a/examples/java/src/test/java/org/apache/beam/examples/cookbook/BigQueryTornadoesIT.java +++ b/examples/java/src/test/java/org/apache/beam/examples/cookbook/BigQueryTornadoesIT.java @@ -41,7 +41,7 @@ * *
      *  ./gradlew integrationTest -p examples/java/ -DintegrationTestPipelineOptions='[
    - *  "--tempLocation=gs://your-location/"]'
    + *  "--tempLocation=gs://apache-beam-testing-developers/"]'
      *  --tests org.apache.beam.examples.cookbook.BigQueryTornadoesIT
      *  -DintegrationTestRunner=direct
 * </pre>
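As a side note on the `BigQueryTornadoes` changes above: pipeline construction is now exposed as the public `applyBigQueryTornadoes(Pipeline, Options)` method, and the write method, write disposition, and create disposition are configurable options. Below is a minimal, hypothetical sketch of driving it programmatically under those assumptions; the `BigQueryTornadoesOptionsSketch` class name, project, and dataset are placeholders, and the output dataset must already exist.

```java
package org.apache.beam.examples.cookbook;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

/** Hypothetical driver, not part of this change; it only illustrates the new options. */
public class BigQueryTornadoesOptionsSketch {
  public static void main(String[] args) {
    // Placeholder output table; enum-valued options are parsed by name.
    BigQueryTornadoes.Options options =
        PipelineOptionsFactory.fromArgs(
                "--output=my-project:my_dataset.tornado_counts",
                "--readMethod=DIRECT_READ",
                "--writeMethod=FILE_LOADS",
                "--writeDisposition=WRITE_TRUNCATE",
                "--createDisposition=CREATE_IF_NEEDED")
            .as(BigQueryTornadoes.Options.class);

    // Equivalent to runBigQueryTornadoes(options), but using the newly extracted
    // applyBigQueryTornadoes(...) so the same pipeline can also be assembled from a test.
    Pipeline p = Pipeline.create(options);
    BigQueryTornadoes.applyBigQueryTornadoes(p, options);
    p.run().waitUntilFinish();
  }
}
```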
    diff --git a/examples/java/src/test/java/org/apache/beam/examples/cookbook/CombinePerKeyExamplesTest.java b/examples/java/src/test/java/org/apache/beam/examples/cookbook/CombinePerKeyExamplesTest.java index eca98c3b9f8c..cd19bf993684 100644 --- a/examples/java/src/test/java/org/apache/beam/examples/cookbook/CombinePerKeyExamplesTest.java +++ b/examples/java/src/test/java/org/apache/beam/examples/cookbook/CombinePerKeyExamplesTest.java @@ -17,6 +17,8 @@ */ package org.apache.beam.examples.cookbook; +import static org.hamcrest.MatcherAssert.assertThat; + import com.google.api.services.bigquery.model.TableRow; import java.util.List; import org.apache.beam.examples.cookbook.CombinePerKeyExamples.ExtractLargeWordsFn; @@ -24,7 +26,6 @@ import org.apache.beam.sdk.transforms.DoFnTester; import org.apache.beam.sdk.values.KV; import org.hamcrest.CoreMatchers; -import org.junit.Assert; import org.junit.Test; import org.junit.runner.RunWith; import org.junit.runners.JUnit4; @@ -69,9 +70,9 @@ public void testExtractLargeWordsFn() throws Exception { DoFnTester> extractLargeWordsFn = DoFnTester.of(new ExtractLargeWordsFn()); List> results = extractLargeWordsFn.processBundle(ROWS_ARRAY); - Assert.assertThat(results, CoreMatchers.hasItem(tuple1)); - Assert.assertThat(results, CoreMatchers.hasItem(tuple2)); - Assert.assertThat(results, CoreMatchers.hasItem(tuple3)); + assertThat(results, CoreMatchers.hasItem(tuple1)); + assertThat(results, CoreMatchers.hasItem(tuple2)); + assertThat(results, CoreMatchers.hasItem(tuple3)); } @Test @@ -79,7 +80,7 @@ public void testFormatShakespeareOutputFn() throws Exception { DoFnTester, TableRow> formatShakespeareOutputFn = DoFnTester.of(new FormatShakespeareOutputFn()); List results = formatShakespeareOutputFn.processBundle(COMBINED_TUPLES_ARRAY); - Assert.assertThat(results, CoreMatchers.hasItem(resultRow1)); - Assert.assertThat(results, CoreMatchers.hasItem(resultRow2)); + assertThat(results, CoreMatchers.hasItem(resultRow1)); + assertThat(results, CoreMatchers.hasItem(resultRow2)); } } diff --git a/examples/java/src/test/resources/schema.txt b/examples/java/src/test/resources/schema.txt new file mode 100644 index 000000000000..10f5ae2eded7 --- /dev/null +++ b/examples/java/src/test/resources/schema.txt @@ -0,0 +1,19 @@ +{ + "fields": [ + { + "mode": "REQUIRED", + "name": "Field1", + "type": "STRING" + }, + { + "mode": "REQUIRED", + "name": "Field2", + "type": "STRING" + }, + { + "mode": "REQUIRED", + "name": "Field3", + "type": "STRING" + } + ] +} \ No newline at end of file diff --git a/examples/java/src/test/resources/testInput.csv b/examples/java/src/test/resources/testInput.csv new file mode 100644 index 000000000000..9e88e1312b27 --- /dev/null +++ b/examples/java/src/test/resources/testInput.csv @@ -0,0 +1,3 @@ +FieldValue11,FieldValue12,FieldValue13 +FieldValue21,FieldValue22,FieldValue23 +FieldValue31,FieldValue32,FieldValue33 \ No newline at end of file diff --git a/examples/java/src/test/resources/testInput.txt b/examples/java/src/test/resources/testInput.txt new file mode 100644 index 000000000000..703469f53d18 --- /dev/null +++ b/examples/java/src/test/resources/testInput.txt @@ -0,0 +1,3 @@ +{"Field1": "FieldValue11", "Field2": "FieldValue12", "Field3": "FieldValue13"} +{"Field1": "FieldValue12", "Field2": "FieldValue22", "Field3": "FieldValue23"} +{"Field1": "FieldValue13", "Field2": "FieldValue32", "Field3": "FieldValue33"} \ No newline at end of file diff --git a/examples/java/twitter/build.gradle b/examples/java/twitter/build.gradle new file mode 
100644 index 000000000000..b7f315a061bb --- /dev/null +++ b/examples/java/twitter/build.gradle @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import groovy.json.JsonOutput + +plugins { + id 'java' + id 'org.apache.beam.module' + id 'com.github.johnrengelman.shadow' +} + +applyJavaNature( + exportJavadoc: false, + automaticModuleName: 'org.apache.beam.examples.twitterstreamgenerator', +) +provideIntegrationTestingDependencies() +enableJavaPerformanceTesting() + +description = "Apache Beam :: Examples :: Java :: Twitter " +ext.summary = """Apache Beam SDK provides a simple, Java-based +interface for processing virtually any size data. This +artifact includes all Apache Beam Java SDK examples.""" + +/** Define the list of runners which execute a precommit test. + * Some runners are run from separate projects, see the preCommit task below + * for details. + */ +def preCommitRunners = ["directRunner", "flinkRunner"] +for (String runner : preCommitRunners) { + configurations.create(runner + "PreCommit") +} + +dependencies { + compile enforcedPlatform(library.java.google_cloud_platform_libraries_bom) + compile project(path: ":sdks:java:core", configuration: "shadow") + compile library.java.joda_time + compile library.java.slf4j_api + provided library.java.commons_io + provided library.java.commons_csv + runtime project(path: ":runners:direct-java", configuration: "shadow") + compile ("org.twitter4j:twitter4j-stream:4.0.7") + compile ("org.twitter4j:twitter4j-core:4.0.7") + testCompile project(path: ":runners:direct-java", configuration: "shadow") + testCompile library.java.hamcrest_core + testCompile library.java.hamcrest_library + testCompile library.java.junit + testCompile library.java.mockito_core + testCompile library.java.testcontainers_gcloud + + // Add dependencies for the PreCommit configurations + // For each runner a project level dependency on the examples project. + for (String runner : preCommitRunners) { + delegate.add(runner + "PreCommit", project(":examples:java:twitter")) + delegate.add(runner + "PreCommit", project(path: ":examples:java:twitter", configuration: "testRuntime")) + } + directRunnerPreCommit project(path: ":runners:direct-java", configuration: "shadow") + flinkRunnerPreCommit project(":runners:flink:${project.ext.latestFlinkVersion}") +} + +/* + * Create a ${runner}PreCommit task for each runner which runs a set + * of integration tests for WordCount and WindowedWordCount. 
+ */ +def preCommitRunnerClass = [ + directRunner: "org.apache.beam.runners.direct.DirectRunner", + flinkRunner: "org.apache.beam.runners.flink.TestFlinkRunner", +] + +for (String runner : preCommitRunners) { + tasks.create(name: runner + "PreCommit", type: Test) { + def preCommitBeamTestPipelineOptions = [ + "--runner=" + preCommitRunnerClass[runner], + ] + classpath = configurations."${runner}PreCommit" + forkEvery 1 + maxParallelForks 4 + systemProperty "beamTestPipelineOptions", JsonOutput.toJson(preCommitBeamTestPipelineOptions) + } +} + +/* Define a common precommit task which depends on all the individual precommits. */ +task preCommit() { + for (String runner : preCommitRunners) { + dependsOn runner + "PreCommit" + } +} + +task execute (type:JavaExec) { + main = "org.apache.beam.examples.twitterstreamgenerator.TwitterStream" +} \ No newline at end of file diff --git a/examples/java/twitter/src/main/java/org/apache/beam/examples/twitterstreamgenerator/README.md b/examples/java/twitter/src/main/java/org/apache/beam/examples/twitterstreamgenerator/README.md new file mode 100644 index 000000000000..9d15f25d4575 --- /dev/null +++ b/examples/java/twitter/src/main/java/org/apache/beam/examples/twitterstreamgenerator/README.md @@ -0,0 +1,42 @@ + + +# Twitter Connector + +This directory contains an example pipelines for how to perform continues stream of data from twitter streaming api ( or any other 3rd party API ). This include: + +
+- **Splittable DoFn** - A simple example of implementing a splittable DoFn over an unbounded source, with simple incrementing watermark logic.
+- **Connection Management** - The streaming pipeline example makes sure that only one Twitter connection is active at a time for a given configuration.
+- **Terminating the pipeline by time or element count** - The streaming pipeline keeps track of elapsed time and of the data collected so far, and terminates when the specified limit is reached.
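A minimal usage sketch of wiring the connector into a pipeline follows; it mirrors the example in the `TwitterIO` javadoc and is not part of this change. The credential values, filter terms, and the `TwitterStreamSketch` class name are placeholders, and the class sits in the same package because the `TwitterConfig.Builder` setters are package-private.

```java
package org.apache.beam.examples.twitterstreamgenerator;

import java.util.Arrays;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

/** Hypothetical driver class; reads a short-lived Twitter stream and prints each tweet. */
public class TwitterStreamSketch {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();

    pipeline
        // Read live tweets; the connector stops after 10 tweets or 1 minute, whichever comes first.
        .apply(
            "ReadFromTwitter",
            TwitterIO.readStandardStream(
                Arrays.asList(
                    new TwitterConfig.Builder()
                        .setKey("consumer-key") // placeholder credentials
                        .setSecret("consumer-secret")
                        .setToken("access-token")
                        .setTokenSecret("access-token-secret")
                        .setFilters(Arrays.asList("apache", "beam")) // placeholder track terms
                        .setLanguage("en")
                        .setTweetsCount(10L)
                        .setMinutesToRun(1)
                        .build())))
        // Print each tweet's text to the console.
        .apply(
            "PrintTweets",
            ParDo.of(
                new DoFn<String, Void>() {
                  @ProcessElement
                  public void processElement(@Element String tweet) {
                    System.out.println(tweet);
                  }
                }));

    pipeline.run().waitUntilFinish();
  }
}
```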
    + +## Requirements + +- Java 8 +- Twitter developer app account and streaming credentials. +- Direct runner or Flink runner. + +This section describes what is needed to get the example up and running. + +- Gradle preparation +- Local execution \ No newline at end of file diff --git a/examples/java/twitter/src/main/java/org/apache/beam/examples/twitterstreamgenerator/ReadFromTwitterDoFn.java b/examples/java/twitter/src/main/java/org/apache/beam/examples/twitterstreamgenerator/ReadFromTwitterDoFn.java new file mode 100644 index 000000000000..e32595059da5 --- /dev/null +++ b/examples/java/twitter/src/main/java/org/apache/beam/examples/twitterstreamgenerator/ReadFromTwitterDoFn.java @@ -0,0 +1,212 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.examples.twitterstreamgenerator; + +import java.io.IOException; +import java.io.Serializable; +import java.util.Objects; +import java.util.concurrent.BlockingQueue; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.SerializableCoder; +import org.apache.beam.sdk.io.range.OffsetRange; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.splittabledofn.ManualWatermarkEstimator; +import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker; +import org.apache.beam.sdk.transforms.splittabledofn.SplitResult; +import org.apache.beam.sdk.transforms.splittabledofn.WatermarkEstimators; +import org.apache.beam.sdk.transforms.windowing.BoundedWindow; +import org.checkerframework.checker.nullness.qual.Nullable; +import org.joda.time.DateTime; +import org.joda.time.Duration; +import org.joda.time.Instant; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import twitter4j.Status; + +/** Splittable dofn that read live data off twitter. 
* */ +@DoFn.UnboundedPerElement +final class ReadFromTwitterDoFn extends DoFn { + + private final DateTime startTime; + + ReadFromTwitterDoFn() { + this.startTime = new DateTime(); + } + /* Logger for class.*/ + private static final Logger LOG = LoggerFactory.getLogger(ReadFromTwitterDoFn.class); + + static class OffsetHolder implements Serializable { + public final @Nullable TwitterConfig twitterConfig; + public final @Nullable Long fetchedRecords; + + OffsetHolder(@Nullable TwitterConfig twitterConfig, @Nullable Long fetchedRecords) { + this.twitterConfig = twitterConfig; + this.fetchedRecords = fetchedRecords; + } + + @Override + public boolean equals(@Nullable Object o) { + if (this == o) { + return true; + } + if (o == null || getClass() != o.getClass()) { + return false; + } + OffsetHolder that = (OffsetHolder) o; + return Objects.equals(twitterConfig, that.twitterConfig) + && Objects.equals(fetchedRecords, that.fetchedRecords); + } + + @Override + public int hashCode() { + return Objects.hash(twitterConfig, fetchedRecords); + } + } + + static class OffsetTracker extends RestrictionTracker + implements Serializable { + private OffsetHolder restriction; + private final DateTime startTime; + + OffsetTracker(OffsetHolder holder, DateTime startTime) { + this.restriction = holder; + this.startTime = startTime; + } + + @Override + public boolean tryClaim(TwitterConfig twitterConfig) { + LOG.debug( + "-------------- Claiming " + + twitterConfig.hashCode() + + " used to have: " + + restriction.fetchedRecords); + long fetchedRecords = + this.restriction == null || this.restriction.fetchedRecords == null + ? 0 + : this.restriction.fetchedRecords + 1; + long elapsedTime = System.currentTimeMillis() - startTime.getMillis(); + final long millis = 60 * 1000; + LOG.debug( + "-------------- Time running: {} / {}", + elapsedTime, + (twitterConfig.getMinutesToRun() * millis)); + if (fetchedRecords > twitterConfig.getTweetsCount() + || elapsedTime > twitterConfig.getMinutesToRun() * millis) { + return false; + } + this.restriction = new OffsetHolder(twitterConfig, fetchedRecords); + return true; + } + + @Override + public OffsetHolder currentRestriction() { + return restriction; + } + + @Override + public SplitResult trySplit(double fractionOfRemainder) { + LOG.debug("-------------- Trying to split: fractionOfRemainder=" + fractionOfRemainder); + return SplitResult.of(new OffsetHolder(null, 0L), restriction); + } + + @Override + public void checkDone() throws IllegalStateException {} + + @Override + public IsBounded isBounded() { + return IsBounded.UNBOUNDED; + } + } + + @GetInitialWatermarkEstimatorState + public Instant getInitialWatermarkEstimatorState(@Timestamp Instant currentElementTimestamp) { + return currentElementTimestamp; + } + + private static Instant ensureTimestampWithinBounds(Instant timestamp) { + if (timestamp.isBefore(BoundedWindow.TIMESTAMP_MIN_VALUE)) { + timestamp = BoundedWindow.TIMESTAMP_MIN_VALUE; + } else if (timestamp.isAfter(BoundedWindow.TIMESTAMP_MAX_VALUE)) { + timestamp = BoundedWindow.TIMESTAMP_MAX_VALUE; + } + return timestamp; + } + + @NewWatermarkEstimator + public WatermarkEstimators.Manual newWatermarkEstimator( + @WatermarkEstimatorState Instant watermarkEstimatorState) { + return new WatermarkEstimators.Manual(ensureTimestampWithinBounds(watermarkEstimatorState)); + } + + @DoFn.GetInitialRestriction + public OffsetHolder getInitialRestriction(@Element TwitterConfig twitterConfig) + throws IOException { + return new OffsetHolder(null, 0L); + } + + @DoFn.NewTracker 
+ public RestrictionTracker newTracker( + @Element TwitterConfig twitterConfig, @DoFn.Restriction OffsetHolder restriction) { + return new OffsetTracker(restriction, startTime); + } + + @GetRestrictionCoder + public Coder getRestrictionCoder() { + return SerializableCoder.of(OffsetHolder.class); + } + + @DoFn.ProcessElement + public DoFn.ProcessContinuation processElement( + @Element TwitterConfig twitterConfig, + DoFn.OutputReceiver out, + RestrictionTracker tracker, + ManualWatermarkEstimator watermarkEstimator) { + LOG.debug("In Read From Twitter Do Fn"); + TwitterConnection twitterConnection = TwitterConnection.getInstance(twitterConfig); + BlockingQueue queue = twitterConnection.getQueue(); + if (queue.isEmpty()) { + if (checkIfDone(twitterConnection, twitterConfig, tracker)) { + return DoFn.ProcessContinuation.stop(); + } + } + while (!queue.isEmpty()) { + Status status = queue.poll(); + if (checkIfDone(twitterConnection, twitterConfig, tracker)) { + return DoFn.ProcessContinuation.stop(); + } + if (status != null) { + Instant currentInstant = Instant.ofEpochMilli(status.getCreatedAt().getTime()); + watermarkEstimator.setWatermark(currentInstant); + out.outputWithTimestamp(status.getText(), currentInstant); + } + } + return DoFn.ProcessContinuation.resume().withResumeDelay(Duration.standardSeconds(1)); + } + + boolean checkIfDone( + TwitterConnection twitterConnection, + TwitterConfig twitterConfig, + RestrictionTracker tracker) { + if (!tracker.tryClaim(twitterConfig)) { + twitterConnection.closeStream(); + return true; + } else { + return false; + } + } +} diff --git a/examples/java/twitter/src/main/java/org/apache/beam/examples/twitterstreamgenerator/TwitterConfig.java b/examples/java/twitter/src/main/java/org/apache/beam/examples/twitterstreamgenerator/TwitterConfig.java new file mode 100644 index 000000000000..2a3fe3db7669 --- /dev/null +++ b/examples/java/twitter/src/main/java/org/apache/beam/examples/twitterstreamgenerator/TwitterConfig.java @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.examples.twitterstreamgenerator; + +import java.io.Serializable; +import java.util.ArrayList; +import java.util.List; +import java.util.Objects; +import org.apache.beam.sdk.coders.DefaultCoder; +import org.apache.beam.sdk.coders.SerializableCoder; +import org.checkerframework.checker.nullness.qual.Nullable; + +/** {@link Serializable} object to store twitter configurations for a connection. 
* */ +@DefaultCoder(SerializableCoder.class) +public class TwitterConfig implements Serializable { + private final String key; + private final String secret; + private final String token; + private final String tokenSecret; + private final List filters; + private final String language; + private final Long tweetsCount; + private final Integer minutesToRun; + + private TwitterConfig(TwitterConfig.Builder builder) { + this.key = builder.key; + this.secret = builder.secret; + this.token = builder.token; + this.tokenSecret = builder.tokenSecret; + this.filters = builder.filters; + this.language = builder.language; + this.tweetsCount = builder.tweetsCount; + this.minutesToRun = builder.minutesToRun; + } + + @Override + public boolean equals(@Nullable Object o) { + if (this == o) { + return true; + } + if (o == null || getClass() != o.getClass()) { + return false; + } + TwitterConfig that = (TwitterConfig) o; + return Objects.equals(key, that.key) + && Objects.equals(secret, that.secret) + && Objects.equals(token, that.token) + && Objects.equals(tokenSecret, that.tokenSecret) + && Objects.equals(filters, that.filters) + && Objects.equals(language, that.language) + && Objects.equals(tweetsCount, that.tweetsCount) + && Objects.equals(minutesToRun, that.minutesToRun); + } + + @Override + public int hashCode() { + return Objects.hash( + key, secret, token, tokenSecret, filters, language, tweetsCount, minutesToRun); + } + + public String getKey() { + return key; + } + + public String getSecret() { + return secret; + } + + public String getToken() { + return token; + } + + public String getTokenSecret() { + return tokenSecret; + } + + public List getFilters() { + return filters; + } + + public String getLanguage() { + return language; + } + + public Long getTweetsCount() { + return tweetsCount; + } + + public Integer getMinutesToRun() { + return minutesToRun; + } + + public static class Builder { + private String key = ""; + private String secret = ""; + private String token = ""; + private String tokenSecret = ""; + private List filters = new ArrayList<>(); + private String language = "en"; + private Long tweetsCount = Long.MAX_VALUE; + private Integer minutesToRun = Integer.MAX_VALUE; + + TwitterConfig.Builder setKey(final String key) { + this.key = key; + return this; + } + + TwitterConfig.Builder setSecret(final String secret) { + this.secret = secret; + return this; + } + + TwitterConfig.Builder setToken(final String token) { + this.token = token; + return this; + } + + TwitterConfig.Builder setTokenSecret(final String tokenSecret) { + this.tokenSecret = tokenSecret; + return this; + } + + TwitterConfig.Builder setFilters(final List filters) { + this.filters = filters; + return this; + } + + TwitterConfig.Builder setLanguage(final String language) { + this.language = language; + return this; + } + + TwitterConfig.Builder setTweetsCount(final Long tweetsCount) { + this.tweetsCount = tweetsCount; + return this; + } + + TwitterConfig.Builder setMinutesToRun(final Integer minutesToRun) { + this.minutesToRun = minutesToRun; + return this; + } + + TwitterConfig build() { + return new TwitterConfig(this); + } + } +} diff --git a/examples/java/twitter/src/main/java/org/apache/beam/examples/twitterstreamgenerator/TwitterConnection.java b/examples/java/twitter/src/main/java/org/apache/beam/examples/twitterstreamgenerator/TwitterConnection.java new file mode 100644 index 000000000000..11453cd47858 --- /dev/null +++ 
b/examples/java/twitter/src/main/java/org/apache/beam/examples/twitterstreamgenerator/TwitterConnection.java @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.examples.twitterstreamgenerator; + +import java.util.concurrent.BlockingQueue; +import java.util.concurrent.ConcurrentHashMap; +import java.util.concurrent.LinkedBlockingQueue; +import twitter4j.FilterQuery; +import twitter4j.StallWarning; +import twitter4j.Status; +import twitter4j.StatusDeletionNotice; +import twitter4j.StatusListener; +import twitter4j.TwitterStream; +import twitter4j.TwitterStreamFactory; +import twitter4j.conf.ConfigurationBuilder; + +/** Singleton class for twitter connection. * */ +class TwitterConnection { + private final BlockingQueue queue; + private final TwitterStream twitterStream; + private static final Object lock = new Object(); + static final ConcurrentHashMap INSTANCE_MAP = + new ConcurrentHashMap<>(); + + /** + * Creates a new Twitter connection. + * + * @param twitterConfig configuration for twitter connection + */ + TwitterConnection(TwitterConfig twitterConfig) { + this.queue = new LinkedBlockingQueue<>(); + ConfigurationBuilder cb = new ConfigurationBuilder(); + cb.setDebugEnabled(true) + .setOAuthConsumerKey(twitterConfig.getKey()) + .setOAuthConsumerSecret(twitterConfig.getSecret()) + .setOAuthAccessToken(twitterConfig.getToken()) + .setOAuthAccessTokenSecret(twitterConfig.getTokenSecret()); + + this.twitterStream = new TwitterStreamFactory(cb.build()).getInstance(); + StatusListener listener = + new StatusListener() { + @Override + public void onException(Exception e) { + e.printStackTrace(); + } + + @Override + public void onDeletionNotice(StatusDeletionNotice arg) {} + + @Override + public void onScrubGeo(long userId, long upToStatusId) {} + + @Override + public void onStallWarning(StallWarning warning) {} + + @Override + public void onStatus(Status status) { + try { + queue.offer(status); + } catch (Exception ignored) { + } + } + + @Override + public void onTrackLimitationNotice(int numberOfLimitedStatuses) {} + }; + FilterQuery tweetFilterQuery = new FilterQuery(); + for (String filter : twitterConfig.getFilters()) { + tweetFilterQuery.track(filter); + } + tweetFilterQuery.language(twitterConfig.getLanguage()); + this.twitterStream.addListener(listener); + this.twitterStream.filter(tweetFilterQuery); + } + + public static TwitterConnection getInstance(TwitterConfig twitterConfig) { + synchronized (lock) { + if (INSTANCE_MAP.containsKey(twitterConfig)) { + return INSTANCE_MAP.get(twitterConfig); + } + TwitterConnection singleInstance = new TwitterConnection(twitterConfig); + INSTANCE_MAP.put(twitterConfig, singleInstance); + return singleInstance; + } + } + + public BlockingQueue getQueue() { + 
return this.queue; + } + + public void closeStream() { + this.twitterStream.shutdown(); + } +} diff --git a/examples/java/twitter/src/main/java/org/apache/beam/examples/twitterstreamgenerator/TwitterIO.java b/examples/java/twitter/src/main/java/org/apache/beam/examples/twitterstreamgenerator/TwitterIO.java new file mode 100644 index 000000000000..bf960fe25d7a --- /dev/null +++ b/examples/java/twitter/src/main/java/org/apache/beam/examples/twitterstreamgenerator/TwitterIO.java @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.examples.twitterstreamgenerator; + +import java.util.ArrayList; +import java.util.List; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; + +/** + * An unbounded source for twitter + * stream. PTransforms for streaming live tweets from twitter. Reading from Twitter is supported by + * read() + * + *

    The standard Twitter API stream can be read using a list of TwitterConfig objects via + * readStandardStream(List) + * + *

    It allows multiple Twitter configurations to demonstrate how multiple Twitter streams can be + * combined in a single pipeline. + * + *

    {@code
    + * PCollection<String> weatherData = pipeline.apply(
    + *      TwitterIO.readStandardStream(
    + *          Arrays.asList(
    + *                  new TwitterConfig.Builder()
    + *                      .setKey("")
    + *                      .setSecret("")
    + *                      .setToken("")
    + *                      .setTokenSecret("")
    + *                      .setFilters(Arrays.asList("", ""))
    + *                      .setLanguage("en")
    + *                      .setTweetsCount(10L)
    + *                      .setMinutesToRun(1)
    + *                      .build())));
    + * }
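    + *
    + * Each TwitterConfig in the list is read as its own stream, and the resulting tweets are
    + * flattened into a single output PCollection of tweet text.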
    + */ +public class TwitterIO { + + /** + * Initializes the stream by converting input to a Twitter connection configuration. + * + * @param twitterConfigs list of twitter config + * @return PTransform of statuses + */ + public static PTransform> readStandardStream( + List twitterConfigs) { + return new TwitterIO.Read.Builder().setTwitterConfig(twitterConfigs).build(); + } + + /** A {@link PTransform} to read from Twitter stream. usage and configuration. */ + private static class Read extends PTransform> { + private final List twitterConfigs; + + private Read(Builder builder) { + this.twitterConfigs = builder.twitterConfigs; + } + + @Override + public PCollection expand(PBegin input) throws IllegalArgumentException { + return input.apply(Create.of(this.twitterConfigs)).apply(ParDo.of(new ReadFromTwitterDoFn())); + } + + private static class Builder { + List twitterConfigs = new ArrayList<>(); + + TwitterIO.Read.Builder setTwitterConfig(final List twitterConfigs) { + this.twitterConfigs = twitterConfigs; + return this; + } + + TwitterIO.Read build() { + return new TwitterIO.Read(this); + } + } + } +} diff --git a/examples/java/twitter/src/main/java/org/apache/beam/examples/twitterstreamgenerator/TwitterStream.java b/examples/java/twitter/src/main/java/org/apache/beam/examples/twitterstreamgenerator/TwitterStream.java new file mode 100644 index 000000000000..0292fe74cf81 --- /dev/null +++ b/examples/java/twitter/src/main/java/org/apache/beam/examples/twitterstreamgenerator/TwitterStream.java @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.examples.twitterstreamgenerator; + +import java.util.Arrays; +import org.apache.beam.sdk.Pipeline; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.windowing.AfterFirst; +import org.apache.beam.sdk.transforms.windowing.AfterPane; +import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime; +import org.apache.beam.sdk.transforms.windowing.FixedWindows; +import org.apache.beam.sdk.transforms.windowing.Repeatedly; +import org.apache.beam.sdk.transforms.windowing.Window; +import org.apache.beam.sdk.values.PCollection; +import org.joda.time.Duration; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * The {@link TwitterStream} pipeline is a streaming pipeline which ingests data in JSON format from + * Twitter, and outputs the resulting records to console. Stream configurations are specified by the + * user as template parameters.
    + * + *

    Concepts: API connectors and streaming; splittable DoFn and watermarking; logging + * + *

    To execute this pipeline locally, specify the key, secret, token, token secret, and filters to + * filter the stream with for your Twitter streaming app. You can also set the number of tweets you + * wish to stream (use setTweetsCount; default Long.MAX_VALUE) and/or the number of minutes to run the + * pipeline (use setMinutesToRun; default Integer.MAX_VALUE): + * + *

    {@code
    + * new TwitterConfig
    + *        .Builder()
    + *        .setKey("")
    + *        .setSecret("")
    + *        .setToken("")
    + *        .setTokenSecret("")
    + *        .setFilters(Arrays.asList("", "")).build()
    + * }
    + * + *

    To change the runner (this does not work on Dataflow), specify: + * + *

    {@code
    + * --runner=YOUR_SELECTED_RUNNER
    + * }
    + * + * See examples/java/README.md for instructions about how to configure different runners. + */ +public class TwitterStream { + + private static final Logger LOG = LoggerFactory.getLogger(TwitterStream.class); + + /** + * Main entry point for pipeline execution. + * + * @param args Command line arguments to the pipeline. + */ + public static void main(String[] args) { + Pipeline pipeline = Pipeline.create(); + Window.configure() + .triggering( + Repeatedly.forever( + AfterFirst.of( + AfterPane.elementCountAtLeast(10), + AfterProcessingTime.pastFirstElementInPane() + .plusDelayOf(Duration.standardMinutes(2))))); + PCollection tweetStream = + pipeline + .apply( + "Create Twitter Connection Configuration", + TwitterIO.readStandardStream( + Arrays.asList( + new TwitterConfig.Builder() + .setKey("") + .setSecret("") + .setToken("") + .setTokenSecret("") + .setFilters(Arrays.asList("", "")) + .setLanguage("en") + .setTweetsCount(10L) + .setMinutesToRun(1) + .build()))) + .apply(Window.into(FixedWindows.of(Duration.standardSeconds(1)))); + tweetStream.apply( + "Output Tweets to console", + ParDo.of( + new DoFn() { + @ProcessElement + public void processElement(@Element String element, OutputReceiver receiver) { + LOG.debug("Output tweets: " + element); + receiver.output(element); + } + })); + + pipeline.run(); + } +} diff --git a/examples/java/twitter/src/main/java/org/apache/beam/examples/twitterstreamgenerator/package-info.java b/examples/java/twitter/src/main/java/org/apache/beam/examples/twitterstreamgenerator/package-info.java new file mode 100644 index 000000000000..e15dae134339 --- /dev/null +++ b/examples/java/twitter/src/main/java/org/apache/beam/examples/twitterstreamgenerator/package-info.java @@ -0,0 +1,20 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** Kafka to Pubsub template. */ +package org.apache.beam.examples.twitterstreamgenerator; diff --git a/examples/java/twitter/src/test/java/org/apache/beam/examples/twitterstreamgenerator/ReadFromTwitterDoFnTest.java b/examples/java/twitter/src/test/java/org/apache/beam/examples/twitterstreamgenerator/ReadFromTwitterDoFnTest.java new file mode 100644 index 000000000000..48d7f01eb47b --- /dev/null +++ b/examples/java/twitter/src/test/java/org/apache/beam/examples/twitterstreamgenerator/ReadFromTwitterDoFnTest.java @@ -0,0 +1,136 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.examples.twitterstreamgenerator; + +import static org.junit.Assert.assertArrayEquals; +import static org.mockito.Mockito.when; + +import com.fasterxml.jackson.core.JsonProcessingException; +import java.util.ArrayList; +import java.util.Collections; +import java.util.Date; +import java.util.List; +import java.util.concurrent.LinkedBlockingQueue; +import java.util.stream.IntStream; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.PCollection; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.ExpectedException; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; +import org.mockito.Mock; +import org.mockito.MockitoAnnotations; +import twitter4j.Status; + +@RunWith(JUnit4.class) +public class ReadFromTwitterDoFnTest { + @Rule public final transient TestPipeline pipeline = TestPipeline.create(); + @Rule public final ExpectedException expectedException = ExpectedException.none(); + @Mock TwitterConnection twitterConnection1; + @Mock TwitterConnection twitterConnection2; + @Mock Status status1; + @Mock Status status2; + @Mock Status status3; + @Mock Status status4; + @Mock Status status5; + LinkedBlockingQueue queue1 = new LinkedBlockingQueue<>(); + LinkedBlockingQueue queue2 = new LinkedBlockingQueue<>(); + + @Before + public void setUp() throws JsonProcessingException { + MockitoAnnotations.initMocks(this); + when(status1.getText()).thenReturn("Breaking News1"); + when(status1.getCreatedAt()).thenReturn(new Date()); + when(status2.getText()).thenReturn("Breaking News2"); + when(status2.getCreatedAt()).thenReturn(new Date()); + when(status3.getText()).thenReturn("Breaking News3"); + when(status3.getCreatedAt()).thenReturn(new Date()); + when(status4.getText()).thenReturn("Breaking News4"); + when(status4.getCreatedAt()).thenReturn(new Date()); + when(status5.getText()).thenReturn("Breaking News5"); + when(status5.getCreatedAt()).thenReturn(new Date()); + queue1.offer(status1); + queue1.offer(status2); + queue1.offer(status3); + queue2.offer(status4); + queue2.offer(status5); + } + + @Test + public void testTwitterRead() { + TwitterConfig twitterConfig = new TwitterConfig.Builder().setTweetsCount(3L).build(); + TwitterConnection.INSTANCE_MAP.put(twitterConfig, twitterConnection1); + when(twitterConnection1.getQueue()).thenReturn(queue1); + PCollection result = + pipeline + .apply("Create Twitter Connection Configuration", Create.of(twitterConfig)) + .apply(ParDo.of(new ReadFromTwitterDoFn())); + PAssert.that(result) + .satisfies( + pcollection -> { + List output = new ArrayList<>(); + pcollection.forEach(output::add); + String[] expected = {"Breaking News1", "Breaking News2", "Breaking News3"}; + String[] actual = new String[output.size()]; + IntStream.range(0, output.size()).forEach((i) -> actual[i] = output.get(i)); + assertArrayEquals("Mismatch found in output", actual, expected); + return null; + }); + 
pipeline.run(); + } + + @Test + public void testMultipleTwitterConfigs() { + TwitterConfig twitterConfig1 = new TwitterConfig.Builder().setTweetsCount(3L).build(); + TwitterConfig twitterConfig2 = new TwitterConfig.Builder().setTweetsCount(2L).build(); + TwitterConnection.INSTANCE_MAP.put(twitterConfig1, twitterConnection1); + TwitterConnection.INSTANCE_MAP.put(twitterConfig2, twitterConnection2); + when(twitterConnection1.getQueue()).thenReturn(queue1); + when(twitterConnection2.getQueue()).thenReturn(queue2); + PCollection result = + pipeline + .apply( + "Create Twitter Connection Configuration", + Create.of(twitterConfig1, twitterConfig2)) + .apply(ParDo.of(new ReadFromTwitterDoFn())); + PAssert.that(result) + .satisfies( + pcollection -> { + List output = new ArrayList<>(); + pcollection.forEach(output::add); + String[] expected = { + "Breaking News1", + "Breaking News2", + "Breaking News3", + "Breaking News4", + "Breaking News5" + }; + String[] actual = new String[output.size()]; + Collections.sort(output); + IntStream.range(0, output.size()).forEach((i) -> actual[i] = output.get(i)); + assertArrayEquals("Mismatch found in output", actual, expected); + return null; + }); + pipeline.run(); + } +} diff --git a/examples/kotlin/build.gradle b/examples/kotlin/build.gradle index e2525298d3a2..60e5ea7b3d2a 100644 --- a/examples/kotlin/build.gradle +++ b/examples/kotlin/build.gradle @@ -46,6 +46,8 @@ configurations.sparkRunnerPreCommit { exclude group: "org.slf4j", module: "slf4j-jdk14" } +def kotlin_version = "1.4.32" + dependencies { compile enforcedPlatform(library.java.google_cloud_platform_libraries_bom) compile library.java.vendored_guava_26_0_jre @@ -59,18 +61,15 @@ dependencies { compile library.java.google_api_services_pubsub compile library.java.google_auth_library_credentials compile library.java.google_auth_library_oauth2_http - compile library.java.google_cloud_datastore_v1_proto_client compile library.java.google_http_client compile library.java.joda_time - compile library.java.proto_google_cloud_datastore_v1 compile library.java.slf4j_api - compile library.java.slf4j_jdk14 + compile library.java.guava + compile "org.jetbrains.kotlin:kotlin-stdlib:$kotlin_version" + compile "org.jetbrains:annotations:13.0" runtime project(path: ":runners:direct-java", configuration: "shadow") testCompile project(":sdks:java:io:google-cloud-platform") - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library testCompile library.java.junit - testCompile library.java.mockito_core // Add dependencies for the PreCommit configurations // For each runner a project level dependency on the examples project. 
@@ -79,15 +78,14 @@ dependencies { delegate.add(runner + "PreCommit", project(path: ":examples:kotlin", configuration: "testRuntime")) } directRunnerPreCommit project(path: ":runners:direct-java", configuration: "shadow") - flinkRunnerPreCommit project(":runners:flink:1.10") + flinkRunnerPreCommit project(":runners:flink:${project.ext.latestFlinkVersion}") // TODO: Make the netty version used configurable, we add netty-all 4.1.17.Final so it appears on the classpath // before 4.1.8.Final defined by Apache Beam sparkRunnerPreCommit "io.netty:netty-all:4.1.17.Final" - sparkRunnerPreCommit project(":runners:spark") + sparkRunnerPreCommit project(":runners:spark:2") sparkRunnerPreCommit project(":sdks:java:io:hadoop-file-system") sparkRunnerPreCommit library.java.spark_streaming sparkRunnerPreCommit library.java.spark_core - implementation "org.jetbrains.kotlin:kotlin-stdlib-jdk8" } /* diff --git a/examples/kotlin/src/main/java/org/apache/beam/examples/kotlin/MinimalWordCount.kt b/examples/kotlin/src/main/java/org/apache/beam/examples/kotlin/MinimalWordCount.kt index 2c752c4565c2..b95a495f00b4 100644 --- a/examples/kotlin/src/main/java/org/apache/beam/examples/kotlin/MinimalWordCount.kt +++ b/examples/kotlin/src/main/java/org/apache/beam/examples/kotlin/MinimalWordCount.kt @@ -96,7 +96,7 @@ public object MinimalWordCount { // individual word in Shakespeare's collected texts. .apply( FlatMapElements.into(TypeDescriptors.strings()) - .via(ProcessFunction> { input -> input.split("[^\\p{L}]+").toList() }) + .via(ProcessFunction> { input -> input.split("[^\\p{L}]+".toRegex()).toList() }) ) // We use a Filter transform to avoid empty word .apply(Filter.by(SerializableFunction { input -> !input.isEmpty() })) diff --git a/examples/kotlin/src/main/java/org/apache/beam/examples/kotlin/common/WriteOneFilePerWindow.kt b/examples/kotlin/src/main/java/org/apache/beam/examples/kotlin/common/WriteOneFilePerWindow.kt index c37e72ab93c1..f06b5c82d521 100644 --- a/examples/kotlin/src/main/java/org/apache/beam/examples/kotlin/common/WriteOneFilePerWindow.kt +++ b/examples/kotlin/src/main/java/org/apache/beam/examples/kotlin/common/WriteOneFilePerWindow.kt @@ -62,7 +62,7 @@ class WriteOneFilePerWindow(private val filenamePrefix: String, private val numS */ class PerWindowFiles(private val baseFilename: ResourceId) : FilenamePolicy() { - fun filenamePrefixForWindow(window: IntervalWindow): String { + private fun filenamePrefixForWindow(window: IntervalWindow): String { val prefix = if (baseFilename.isDirectory) "" else firstNonNull(baseFilename.filename, "") return "$prefix-${FORMATTER.print(window.start())}-${FORMATTER.print(window.end())}" } diff --git a/examples/kotlin/src/main/java/org/apache/beam/examples/kotlin/cookbook/DistinctExample.kt b/examples/kotlin/src/main/java/org/apache/beam/examples/kotlin/cookbook/DistinctExample.kt index f3ea070e2ca5..65dbbd867bb4 100644 --- a/examples/kotlin/src/main/java/org/apache/beam/examples/kotlin/cookbook/DistinctExample.kt +++ b/examples/kotlin/src/main/java/org/apache/beam/examples/kotlin/cookbook/DistinctExample.kt @@ -69,7 +69,7 @@ object DistinctExample { /** Returns gs://${TEMP_LOCATION}/"deduped.txt". 
*/ class OutputFactory : DefaultValueFactory { override fun create(options: PipelineOptions): String { - options.tempLocation?.let { + options.tempLocation.let { return GcsPath.fromUri(it).resolve("deduped.txt").toString() } ?: run { throw IllegalArgumentException("Must specify --output or --tempLocation") diff --git a/examples/kotlin/src/main/java/org/apache/beam/examples/kotlin/cookbook/TriggerExample.kt b/examples/kotlin/src/main/java/org/apache/beam/examples/kotlin/cookbook/TriggerExample.kt index 59caaceb0bf9..4cb95b55b68a 100644 --- a/examples/kotlin/src/main/java/org/apache/beam/examples/kotlin/cookbook/TriggerExample.kt +++ b/examples/kotlin/src/main/java/org/apache/beam/examples/kotlin/cookbook/TriggerExample.kt @@ -41,7 +41,6 @@ import org.apache.beam.sdk.transforms.windowing.* import org.apache.beam.sdk.values.KV import org.apache.beam.sdk.values.PCollection import org.apache.beam.sdk.values.PCollectionList -import org.checkerframework.checker.nullness.qual.Nullable import org.joda.time.Duration import org.joda.time.Instant diff --git a/examples/notebooks/get-started/try-apache-beam-java.ipynb b/examples/notebooks/get-started/try-apache-beam-java.ipynb index 101df8203330..a12294c706c2 100644 --- a/examples/notebooks/get-started/try-apache-beam-java.ipynb +++ b/examples/notebooks/get-started/try-apache-beam-java.ipynb @@ -593,8 +593,8 @@ "\n", "> Task :runShadow\n", "WARNING: An illegal reflective access operation has occurred\n", - "WARNING: Illegal reflective access by org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.UnsafeUtil (file:/content/build/install/content-shadow/lib/WordCount.jar) to field java.nio.Buffer.address\n", - "WARNING: Please consider reporting this to the maintainers of org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.UnsafeUtil\n", + "WARNING: Illegal reflective access by org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.UnsafeUtil (file:/content/build/install/content-shadow/lib/WordCount.jar) to field java.nio.Buffer.address\n", + "WARNING: Please consider reporting this to the maintainers of org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.UnsafeUtil\n", "WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations\n", "WARNING: All illegal access operations will be denied in a future release\n", "Mar 04, 2019 11:00:24 PM org.apache.beam.sdk.io.FileBasedSource getEstimatedSizeBytes\n", @@ -735,8 +735,8 @@ "\n", ">> java -jar WordCount.jar\n", "WARNING: An illegal reflective access operation has occurred\n", - "WARNING: Illegal reflective access by org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.UnsafeUtil (file:/content/WordCount.jar) to field java.nio.Buffer.address\n", - "WARNING: Please consider reporting this to the maintainers of org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.UnsafeUtil\n", + "WARNING: Illegal reflective access by org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.UnsafeUtil (file:/content/WordCount.jar) to field java.nio.Buffer.address\n", + "WARNING: Please consider reporting this to the maintainers of org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.UnsafeUtil\n", "WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations\n", "WARNING: All illegal access operations will be denied in a future release\n", "Mar 04, 2019 11:00:49 PM org.apache.beam.sdk.io.FileBasedSource getEstimatedSizeBytes\n", @@ -981,8 +981,8 @@ "\n", "> Task :runShadow\n", "WARNING: An illegal reflective access operation has 
occurred\n", - "WARNING: Illegal reflective access by org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.UnsafeUtil (file:/content/build/install/content-shadow/lib/WordCount.jar) to field java.nio.Buffer.address\n", - "WARNING: Please consider reporting this to the maintainers of org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.UnsafeUtil\n", + "WARNING: Illegal reflective access by org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.UnsafeUtil (file:/content/build/install/content-shadow/lib/WordCount.jar) to field java.nio.Buffer.address\n", + "WARNING: Please consider reporting this to the maintainers of org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.UnsafeUtil\n", "WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations\n", "WARNING: All illegal access operations will be denied in a future release\n", "Mar 04, 2019 11:01:26 PM org.apache.beam.sdk.io.FileBasedSource getEstimatedSizeBytes\n", diff --git a/examples/notebooks/tour-of-beam/getting-started.ipynb b/examples/notebooks/tour-of-beam/getting-started.ipynb index 7155e7632601..b93fa0704c5d 100644 --- a/examples/notebooks/tour-of-beam/getting-started.ipynb +++ b/examples/notebooks/tour-of-beam/getting-started.ipynb @@ -246,7 +246,7 @@ "id": "h0UUmpwRADqA" }, "source": [ - "> ℹ️  In Beam, you can __NOT__ access the elements from a `PCollection` directly like a Python list.\n", + "> ℹ️ In Beam, you can __NOT__ access the elements from a `PCollection` directly like a Python list.\n", "> This means, we can't simply `print` the output `PCollection` to see the elements.\n", ">\n", "> This is because, depending on the runner,\n", @@ -322,7 +322,7 @@ "\n", "`map` takes a _function_ that transforms a single input `a` into a single output `b`.\n", "\n", - "> ℹ️ -- For example, we want to multiply each element by 2." + "> ℹ️ For example, we want to multiply each element by 2." ] }, { @@ -441,7 +441,7 @@ "id": "Q06jvwmqOzer" }, "source": [ - "> ℹ️  Now that we know how `Map` works, we can see what's happening when we print the elements.\n", + "> ℹ️ Now that we know how `Map` works, we can see what's happening when we print the elements.\n", ">\n", "> We have our outputs stored in the `outputs` `PCollection`, so we _pipe_ it to a `Map` transform to apply the\n", "> [`print`](https://docs.python.org/3/library/functions.html#print)\n", @@ -473,7 +473,7 @@ "`flatMap` takes a function that transforms a single input `a` into an `iterable` of outputs `b`.\n", "But we get a _single collection_ containing the outputs of _all_ the elements.\n", "\n", - "> ℹ️ -- For example, we want to have as many elements as the element's value.\n", + "> ℹ️ For example, we want to have as many elements as the element's value.\n", "> For a value `1` we want one element, and three elements for a value `3`." ] }, @@ -608,7 +608,7 @@ "`filter` takes a function that checks a single element `a`,\n", "and returns `True` to keep the element, or `False` to discard it.\n", "\n", - "> ℹ️ -- For example, we only want to keep number that are *even*, or divisible by two.\n", + "> ℹ️ For example, we only want to keep number that are *even*, or divisible by two.\n", "> We can use the\n", "> [modulo operator `%`](https://en.wikipedia.org/wiki/Modulo_operation)\n", "> for a simple check." @@ -729,7 +729,7 @@ "\n", "Other common names for this function are `fold` and `reduce`.\n", "\n", - "> ℹ️ -- For example, we want to add all numbers together." + "> ℹ️ For example, we want to add all numbers together." 
] }, { @@ -851,7 +851,7 @@ "id": "pFb98ioSp9YU" }, "source": [ - "> ℹ️  There are many ways to combine values in Beam.\n", + "> ℹ️ There are many ways to combine values in Beam.\n", "> You could even combine them into a different data type by defining a custom `CombineFn`.\n", ">\n", "> You can learn more about them by checking the available\n", @@ -874,7 +874,7 @@ "but instead of replacing the value on a \"duplicate\" key,\n", "you would get a list of all the values associated with that key.\n", "\n", - "> ℹ️ -- For example, we want to group each animal with the list of foods they like, and we start with `(animal, food)` pairs." + "> ℹ️ For example, we want to group each animal with the list of foods they like, and we start with `(animal, food)` pairs." ] }, { @@ -1006,10 +1006,15 @@ "source": [ "# What's next?\n", "\n", - "* [Transform catalog](https://beam.apache.org/documentation/transforms/python/overview/) -- check out all the available transforms\n", - "* [Using I/O transforms](https://beam.apache.org/documentation/programming-guide/#pipeline-io) -- learn how to read/write data to/from external sources and sinks like text files or databases\n", - "* [Mobile gaming example](https://beam.apache.org/get-started/mobile-gaming-example/) -- learn more about windowing, triggers, and streaming through a complete example pipeline\n", - "* [Runners](https://beam.apache.org/documentation/runners/capability-matrix/) -- check the available runners, their capabilities, and how to run your pipeline in them" + "* ![Open in Colab](https://github.com/googlecolab/open_in_colab/raw/main/images/icon16.png)\n", + " [Reading and writing data](https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/tour-of-beam/reading-and-writing-data.ipynb) --\n", + " how to read and write data to and from different data formats. \n", + "* [Transform catalog](https://beam.apache.org/documentation/transforms/python/overview) --\n", + " check out all the available transforms.\n", + "* [Mobile gaming example](https://beam.apache.org/get-started/mobile-gaming-example) --\n", + " learn more about windowing, triggers, and streaming through a complete example pipeline.\n", + "* [Runners](https://beam.apache.org/documentation/runners/capability-matrix) --\n", + " check the available runners, their capabilities, and how to run your pipeline in them." ] } ] diff --git a/examples/notebooks/tour-of-beam/reading-and-writing-data.ipynb b/examples/notebooks/tour-of-beam/reading-and-writing-data.ipynb new file mode 100644 index 000000000000..3d0333e6ce6d --- /dev/null +++ b/examples/notebooks/tour-of-beam/reading-and-writing-data.ipynb @@ -0,0 +1,1013 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "Reading and writing data -- Tour of Beam", + "provenance": [], + "collapsed_sections": [], + "toc_visible": true + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + } + }, + "cells": [ + { + "cell_type": "code", + "metadata": { + "cellView": "form", + "id": "upmJn_DjcThx" + }, + "source": [ + "#@title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n", + "\n", + "# Licensed to the Apache Software Foundation (ASF) under one\n", + "# or more contributor license agreements. See the NOTICE file\n", + "# distributed with this work for additional information\n", + "# regarding copyright ownership. 
The ASF licenses this file\n", + "# to you under the Apache License, Version 2.0 (the\n", + "# \"License\"); you may not use this file except in compliance\n", + "# with the License. You may obtain a copy of the License at\n", + "#\n", + "# http://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing,\n", + "# software distributed under the License is distributed on an\n", + "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n", + "# KIND, either express or implied. See the License for the\n", + "# specific language governing permissions and limitations\n", + "# under the License." + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5UC_aGanx6oE" + }, + "source": [ + "# Reading and writing data -- _Tour of Beam_\n", + "\n", + "So far we've learned some of the basic transforms like\n", + "[`Map`](https://beam.apache.org/documentation/transforms/python/elementwise/map),\n", + "[`FlatMap`](https://beam.apache.org/documentation/transforms/python/elementwise/flatmap),\n", + "[`Filter`](https://beam.apache.org/documentation/transforms/python/elementwise/filter),\n", + "[`Combine`](https://beam.apache.org/documentation/transforms/python/aggregation/combineglobally), and\n", + "[`GroupByKey`](https://beam.apache.org/documentation/transforms/python/aggregation/groupbykey).\n", + "These allow us to transform data in any way, but so far we've used\n", + "[`Create`](https://beam.apache.org/documentation/transforms/python/other/create)\n", + "to get data from an in-memory\n", + "[`iterable`](https://docs.python.org/3/glossary.html#term-iterable), like a `list`.\n", + "\n", + "This works well for experimenting with small datasets. For larger datasets we can use `Source` transforms to read data and `Sink` transforms to write data.\n", + "If there are no built-in `Source` or `Sink` transforms, we can also easily create our custom I/O transforms.\n", + "\n", + "Let's create some data files and see how we can read them in Beam." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "R_Yhoc6N_Flg" + }, + "source": [ + "# Install apache-beam with pip.\n", + "!pip install --quiet apache-beam\n", + "\n", + "# Create a directory for our data files.\n", + "!mkdir -p data" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "sQUUi4H9s-g2", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "da02ce82-61ce-43e2-a5f0-21b06df76bd6" + }, + "source": [ + "%%writefile data/my-text-file-1.txt\n", + "This is just a plain text file, UTF-8 strings are allowed 🎉.\n", + "Each line in the file is one element in the PCollection." 
+ ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Writing data/my-text-file-1.txt\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "BWVVeTSOlKug", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "2f0ad045-0af3-4ca7-f5a3-1f348b5c1517" + }, + "source": [ + "%%writefile data/my-text-file-2.txt\n", + "There are no guarantees on the order of the elements.\n", + "ฅ^•ﻌ•^ฅ" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Writing data/my-text-file-2.txt\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "NhCws6ncbDJG", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "0b5e0dc6-43ee-4786-c1d5-984fd422e445" + }, + "source": [ + "%%writefile data/penguins.csv\n", + "species,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_g\n", + "0,0.2545454545454545,0.6666666666666666,0.15254237288135594,0.2916666666666667\n", + "0,0.26909090909090905,0.5119047619047618,0.23728813559322035,0.3055555555555556\n", + "1,0.5236363636363636,0.5714285714285713,0.3389830508474576,0.2222222222222222\n", + "1,0.6509090909090909,0.7619047619047619,0.4067796610169492,0.3333333333333333\n", + "2,0.509090909090909,0.011904761904761862,0.6610169491525424,0.5\n", + "2,0.6509090909090909,0.38095238095238104,0.9830508474576272,0.8333333333333334" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Writing data/penguins.csv\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_OkWHiAvpWDZ" + }, + "source": [ + "# Reading from text files\n", + "\n", + "We can use the\n", + "[`ReadFromText`](https://beam.apache.org/releases/pydoc/current/apache_beam.io.textio.html#apache_beam.io.textio.ReadFromText)\n", + "transform to read text files into `str` elements.\n", + "\n", + "It takes a\n", + "[_glob pattern_](https://en.wikipedia.org/wiki/Glob_%28programming%29)\n", + "as an input, and reads all the files that match that pattern.\n", + "It returns one element for each line in the file.\n", + "\n", + "For example, in the pattern `data/*.txt`, the `*` is a wildcard that matches anything. This pattern matches all the files in the `data/` directory with a `.txt` extension." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "xDXdE9uysriw", + "outputId": "3ef9f7a9-8291-42a1-be03-71e50de266f5" + }, + "source": [ + "import apache_beam as beam\n", + "\n", + "input_files = 'data/*.txt'\n", + "with beam.Pipeline() as pipeline:\n", + " (\n", + " pipeline\n", + " | 'Read files' >> beam.io.ReadFromText(input_files)\n", + " | 'Print contents' >> beam.Map(print)\n", + " )" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "This is just a plain text file, UTF-8 strings are allowed 🎉.\n", + "Each line in the file is one element in the PCollection.\n", + "There are no guarantees on the order of the elements.\n", + "ฅ^•ﻌ•^ฅ\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9-2wmzEWsdrb" + }, + "source": [ + "# Writing to text files\n", + "\n", + "We can use the\n", + "[`WriteToText`](https://beam.apache.org/releases/pydoc/2.27.0/apache_beam.io.textio.html#apache_beam.io.textio.WriteToText) transform to write `str` elements into text files.\n", + "\n", + "It takes a _file path prefix_ as an input, and it writes the all `str` elements into one or more files with filenames starting with that prefix. You can optionally pass a `file_name_suffix` as well, usually used for the file extension. Each element goes into its own line in the output files." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "nkPlfoTfz61I" + }, + "source": [ + "import apache_beam as beam\n", + "\n", + "output_file_name_prefix = 'outputs/file'\n", + "with beam.Pipeline() as pipeline:\n", + " (\n", + " pipeline\n", + " | 'Create file lines' >> beam.Create([\n", + " 'Each element must be a string.',\n", + " 'It writes one element per line.',\n", + " 'There are no guarantees on the line order.',\n", + " 'The data might be written into multiple files.',\n", + " ])\n", + " | 'Write to files' >> beam.io.WriteToText(\n", + " output_file_name_prefix,\n", + " file_name_suffix='.txt')\n", + " )" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "8au0yJSd1itt", + "outputId": "4822458b-2724-42e9-c71f-280a82d505d6" + }, + "source": [ + "# Lets look at the output files and contents.\n", + "!head outputs/file*.txt" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Each element must be a string.\n", + "It writes one element per line.\n", + "There are no guarantees on the line order.\n", + "The data might be written into multiple files.\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "21CCdZispqYK" + }, + "source": [ + "# Reading data\n", + "\n", + "Your data might reside in various input formats. Take a look at the\n", + "[Built-in I/O Transforms](https://beam.apache.org/documentation/io/built-in)\n", + "page for a list of all the available I/O transforms in Beam.\n", + "\n", + "If none of those work for you, you might need to create your own input transform.\n", + "\n", + "> ℹ️ For a more in-depth guide, take a look at the\n", + "[Developing a new I/O connector](https://beam.apache.org/documentation/io/developing-io-overview) page." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7dQEym1QRG4y" + }, + "source": [ + "## Reading from an `iterable`\n", + "\n", + "The easiest way to create elements is using\n", + "[`FlatMap`](https://beam.apache.org/documentation/transforms/python/elementwise/flatmap).\n", + "\n", + "A common way is having a [`generator`](https://docs.python.org/3/glossary.html#term-generator) function. This could take an input and _expand_ it into a large amount of elements. The nice thing about `generator`s is that they don't have to fit everything into memory like a `list`, they simply\n", + "[`yield`](https://docs.python.org/3/reference/simple_stmts.html#yield)\n", + "elements as they process them.\n", + "\n", + "For example, let's define a `generator` called `count`, that `yield`s the numbers from `0` to `n`. We use `Create` for the initial `n` value(s) and then exapand them with `FlatMap`." + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "wR6WY6wOMVhb", + "outputId": "f90ae0a1-0cb4-4f25-fa93-10f9de856a95" + }, + "source": [ + "import apache_beam as beam\n", + "from typing import Iterable\n", + "\n", + "def count(n: int) -> Iterable[int]:\n", + " for i in range(n):\n", + " yield i\n", + "\n", + "n = 5\n", + "with beam.Pipeline() as pipeline:\n", + " (\n", + " pipeline\n", + " | 'Create inputs' >> beam.Create([n])\n", + " | 'Generate elements' >> beam.FlatMap(count)\n", + " | 'Print elements' >> beam.Map(print)\n", + " )" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "0\n", + "1\n", + "2\n", + "3\n", + "4\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "G4fw7NE1RQNf" + }, + "source": [ + "## Creating an input transform\n", + "\n", + "For a nicer interface, we could abstract the `Create` and the `FlatMap` into a custom `PTransform`. This would give a more intuitive way to use it, while hiding the inner workings.\n", + "\n", + "We need a new class that inherits from `beam.PTransform`. We can do this more conveniently with the\n", + "[`beam.ptransform_fn`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.ptransform.html#apache_beam.transforms.ptransform.ptransform_fn) decorator.\n", + "\n", + "The `PTransform` function takes the input `PCollection` as the first argument, and any other inputs from the generator function, like `n`, can be arguments to the `PTransform` as well. The original generator function can be defined locally within the `PTransform`.\n", + "Finally, we apply the `Create` and `FlatMap` transforms and return a new `PCollection`.\n", + "\n", + "We can also, optionally, add type hints with the [`with_input_types`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.ptransform.html#apache_beam.transforms.ptransform.PTransform.with_input_types) and [`with_output_types`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.ptransform.html#apache_beam.transforms.ptransform.PTransform.with_output_types) decorators. They serve both as documentation, and are a way to ensure your data types are consistent throughout your pipeline. This becomes more useful as the complexity grows.\n", + "\n", + "Since our `PTransform` is expected to be the first transform in the pipeline, it doesn't receive any inputs. 
We can mark it as the beginning with the [`PBegin`](https://beam.apache.org/releases/pydoc/current/_modules/apache_beam/pvalue.html) type hint.\n", + "\n", + "Finally, to enable type checking, you can pass `--type_check_additional=all` when running your pipeline. Alternatively, you can also pass it directly to `PipelineOptions` if you want them enabled by default. To learn more about pipeline options, see [Configuring pipeline options](https://beam.apache.org/documentation/programming-guide/#configuring-pipeline-options)." + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "m8iXqE1CRnn5", + "outputId": "d77fd363-76eb-49ce-8729-82d6cd38cfda" + }, + "source": [ + "import apache_beam as beam\n", + "from apache_beam.options.pipeline_options import PipelineOptions\n", + "from typing import Iterable\n", + "\n", + "@beam.ptransform_fn\n", + "@beam.typehints.with_input_types(beam.pvalue.PBegin)\n", + "@beam.typehints.with_output_types(int)\n", + "def Count(pbegin: beam.pvalue.PBegin, n: int) -> beam.PCollection[int]:\n", + " def count(n: int) -> Iterable[int]:\n", + " for i in range(n):\n", + " yield i\n", + "\n", + " return (\n", + " pbegin\n", + " | 'Create inputs' >> beam.Create([n])\n", + " | 'Generate elements' >> beam.FlatMap(count)\n", + " )\n", + "\n", + "n = 5\n", + "options = PipelineOptions(flags=[], type_check_additional='all')\n", + "with beam.Pipeline(options=options) as pipeline:\n", + " (\n", + " pipeline\n", + " | f'Count to {n}' >> Count(n)\n", + " | 'Print elements' >> beam.Map(print)\n", + " )" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "0\n", + "1\n", + "2\n", + "3\n", + "4\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e02_vFmUg-mK" + }, + "source": [ + "## Example: Reading CSV files\n", + "\n", + "Lets say we want to read CSV files to get elements as Python dictionaries. We like how `ReadFromText` expands a file pattern, but we might want to allow for multiple patterns as well.\n", + "\n", + "We create a `ReadCsvFiles` transform, which takes a list of `file_patterns` as input. It expands all the `glob` patterns, and then, for each file name it reads each row as a `dict` using the\n", + "[`csv.DictReader`](https://docs.python.org/3/library/csv.html#csv.DictReader) module.\n", + "\n", + "We could use the [`open`](https://docs.python.org/3/library/functions.html#open) function to open a local file, but Beam already supports several different file systems besides local files.\n", + "To leverage that, we can use the [`apache_beam.io.filesystems`](https://beam.apache.org/releases/pydoc/current/apache_beam.io.filesystems.html) module.\n", + "\n", + "> ℹ️ The [`open`](https://beam.apache.org/releases/pydoc/current/apache_beam.io.filesystems.html#apache_beam.io.filesystems.FileSystems.open)\n", + "> function from the Beam filesystem reads bytes,\n", + "> it's roughly equivalent to opening a file in `rb` mode.\n", + "> To write a file, you would use\n", + "> [`create`](https://beam.apache.org/releases/pydoc/current/apache_beam.io.filesystems.html#apache_beam.io.filesystems.FileSystems.open) instead." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ywVbJxegaZbo", + "outputId": "8dd0fdf3-43e8-47db-8442-ed9e88ef6c95" + }, + "source": [ + "import apache_beam as beam\n", + "from apache_beam.io.filesystems import FileSystems as beam_fs\n", + "from apache_beam.options.pipeline_options import PipelineOptions\n", + "import codecs\n", + "import csv\n", + "from typing import Dict, Iterable, List\n", + "\n", + "@beam.ptransform_fn\n", + "@beam.typehints.with_input_types(beam.pvalue.PBegin)\n", + "@beam.typehints.with_output_types(Dict[str, str])\n", + "def ReadCsvFiles(pbegin: beam.pvalue.PBegin, file_patterns: List[str]) -> beam.PCollection[Dict[str, str]]:\n", + " def expand_pattern(pattern: str) -> Iterable[str]:\n", + " for match_result in beam_fs.match([pattern])[0].metadata_list:\n", + " yield match_result.path\n", + "\n", + " def read_csv_lines(file_name: str) -> Iterable[Dict[str, str]]:\n", + " with beam_fs.open(file_name) as f:\n", + " # Beam reads files as bytes, but csv expects strings,\n", + " # so we need to decode the bytes into utf-8 strings.\n", + " for row in csv.DictReader(codecs.iterdecode(f, 'utf-8')):\n", + " yield dict(row)\n", + "\n", + " return (\n", + " pbegin\n", + " | 'Create file patterns' >> beam.Create(file_patterns)\n", + " | 'Expand file patterns' >> beam.FlatMap(expand_pattern)\n", + " | 'Read CSV lines' >> beam.FlatMap(read_csv_lines)\n", + " )\n", + "\n", + "input_patterns = ['data/*.csv']\n", + "options = PipelineOptions(flags=[], type_check_additional='all')\n", + "with beam.Pipeline(options=options) as pipeline:\n", + " (\n", + " pipeline\n", + " | 'Read CSV files' >> ReadCsvFiles(input_patterns)\n", + " | 'Print elements' >> beam.Map(print)\n", + " )" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "{'species': '0', 'culmen_length_mm': '0.2545454545454545', 'culmen_depth_mm': '0.6666666666666666', 'flipper_length_mm': '0.15254237288135594', 'body_mass_g': '0.2916666666666667'}\n", + "{'species': '0', 'culmen_length_mm': '0.26909090909090905', 'culmen_depth_mm': '0.5119047619047618', 'flipper_length_mm': '0.23728813559322035', 'body_mass_g': '0.3055555555555556'}\n", + "{'species': '1', 'culmen_length_mm': '0.5236363636363636', 'culmen_depth_mm': '0.5714285714285713', 'flipper_length_mm': '0.3389830508474576', 'body_mass_g': '0.2222222222222222'}\n", + "{'species': '1', 'culmen_length_mm': '0.6509090909090909', 'culmen_depth_mm': '0.7619047619047619', 'flipper_length_mm': '0.4067796610169492', 'body_mass_g': '0.3333333333333333'}\n", + "{'species': '2', 'culmen_length_mm': '0.509090909090909', 'culmen_depth_mm': '0.011904761904761862', 'flipper_length_mm': '0.6610169491525424', 'body_mass_g': '0.5'}\n", + "{'species': '2', 'culmen_length_mm': '0.6509090909090909', 'culmen_depth_mm': '0.38095238095238104', 'flipper_length_mm': '0.9830508474576272', 'body_mass_g': '0.8333333333333334'}\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZyzB_RO9Vs1D" + }, + "source": [ + "## Example: Reading from a SQLite database\n", + "\n", + "Lets begin by creating a small SQLite local database file.\n", + "\n", + "Run the _\"Creating the SQLite database\"_ cell to create a new SQLite3 database with the filename you choose. You can double-click it to see the source code if you want." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "EJ58A0AoV02o", + "cellView": "form", + "outputId": "f932e834-8d65-4ddc-a4f8-1fc825c30b41" + }, + "source": [ + "#@title Creating the SQLite database\n", + "import sqlite3\n", + "\n", + "database_file = \"moon-phases.db\" #@param {type:\"string\"}\n", + "\n", + "with sqlite3.connect(database_file) as db:\n", + " cursor = db.cursor()\n", + "\n", + " # Create the moon_phases table.\n", + " cursor.execute('''\n", + " CREATE TABLE IF NOT EXISTS moon_phases (\n", + " id INTEGER PRIMARY KEY,\n", + " phase_emoji TEXT NOT NULL,\n", + " peak_datetime DATETIME NOT NULL,\n", + " phase TEXT NOT NULL)''')\n", + "\n", + " # Truncate the table if it's already populated.\n", + " cursor.execute('DELETE FROM moon_phases')\n", + "\n", + " # Insert some sample data.\n", + " insert_moon_phase = 'INSERT INTO moon_phases(phase_emoji, peak_datetime, phase) VALUES(?, ?, ?)'\n", + " cursor.execute(insert_moon_phase, ('🌕', '2017-12-03 15:47:00', 'Full Moon'))\n", + " cursor.execute(insert_moon_phase, ('🌗', '2017-12-10 07:51:00', 'Last Quarter'))\n", + " cursor.execute(insert_moon_phase, ('🌑', '2017-12-18 06:30:00', 'New Moon'))\n", + " cursor.execute(insert_moon_phase, ('🌓', '2017-12-26 09:20:00', 'First Quarter'))\n", + " cursor.execute(insert_moon_phase, ('🌕', '2018-01-02 02:24:00', 'Full Moon'))\n", + " cursor.execute(insert_moon_phase, ('🌗', '2018-01-08 22:25:00', 'Last Quarter'))\n", + " cursor.execute(insert_moon_phase, ('🌑', '2018-01-17 02:17:00', 'New Moon'))\n", + " cursor.execute(insert_moon_phase, ('🌓', '2018-01-24 22:20:00', 'First Quarter'))\n", + " cursor.execute(insert_moon_phase, ('🌕', '2018-01-31 13:27:00', 'Full Moon'))\n", + "\n", + " # Query for the data in the table to make sure it's populated.\n", + " cursor.execute('SELECT * FROM moon_phases')\n", + " for row in cursor.fetchall():\n", + " print(row)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "(1, '🌕', '2017-12-03 15:47:00', 'Full Moon')\n", + "(2, '🌗', '2017-12-10 07:51:00', 'Last Quarter')\n", + "(3, '🌑', '2017-12-18 06:30:00', 'New Moon')\n", + "(4, '🌓', '2017-12-26 09:20:00', 'First Quarter')\n", + "(5, '🌕', '2018-01-02 02:24:00', 'Full Moon')\n", + "(6, '🌗', '2018-01-08 22:25:00', 'Last Quarter')\n", + "(7, '🌑', '2018-01-17 02:17:00', 'New Moon')\n", + "(8, '🌓', '2018-01-24 22:20:00', 'First Quarter')\n", + "(9, '🌕', '2018-01-31 13:27:00', 'Full Moon')\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8y-bRhPVWai6" + }, + "source": [ + "We could use a `FlatMap` transform to receive a SQL query and `yield` each result row, but that would mean creating a new database connection for each query. If we generated a large number of queries, creating that many connections could be a bottleneck.\n", + "\n", + "It would be nice to create the database connection only once for each worker, and every query could use the same connection if needed.\n", + "\n", + "We can use a\n", + "[custom `DoFn` transform](https://beam.apache.org/documentation/transforms/python/elementwise/pardo/#example-3-pardo-with-dofn-methods)\n", + "for this. 
It allows us to open and close resources, like the database connection, only _once_ per `DoFn` _instance_ by using the `setup` and `teardown` methods.\n", + "\n", + "> ℹ️ It should be safe to _read_ from a database with multiple concurrent processes using the same connection, but only one process should be _writing_ at once." + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Bnpwqr-NV5DF", + "outputId": "5f69a99c-c711-47cf-f13a-c780de57f3e6" + }, + "source": [ + "import apache_beam as beam\n", + "from apache_beam.options.pipeline_options import PipelineOptions\n", + "import sqlite3\n", + "from typing import Iterable, List, Tuple\n", + "\n", + "class SQLiteSelect(beam.DoFn):\n", + " def __init__(self, database_file: str):\n", + " self.database_file = database_file\n", + " self.connection = None\n", + "\n", + " def setup(self):\n", + " self.connection = sqlite3.connect(self.database_file)\n", + "\n", + " def process(self, query: Tuple[str, List[str]]) -> Iterable[Dict[str, str]]:\n", + " table, columns = query\n", + " cursor = self.connection.cursor()\n", + " cursor.execute(f\"SELECT {','.join(columns)} FROM {table}\")\n", + " for row in cursor.fetchall():\n", + " yield dict(zip(columns, row))\n", + "\n", + " def teardown(self):\n", + " self.connection.close()\n", + "\n", + "@beam.ptransform_fn\n", + "@beam.typehints.with_input_types(beam.pvalue.PBegin)\n", + "@beam.typehints.with_output_types(Dict[str, str])\n", + "def SelectFromSQLite(\n", + " pbegin: beam.pvalue.PBegin,\n", + " database_file: str,\n", + " queries: List[Tuple[str, List[str]]],\n", + ") -> beam.PCollection[Dict[str, str]]:\n", + " return (\n", + " pbegin\n", + " | 'Create None' >> beam.Create(queries)\n", + " | 'SQLite SELECT' >> beam.ParDo(SQLiteSelect(database_file))\n", + " )\n", + "\n", + "queries = [\n", + " # (table_name, [column1, column2, ...])\n", + " ('moon_phases', ['phase_emoji', 'peak_datetime', 'phase']),\n", + " ('moon_phases', ['phase_emoji', 'phase']),\n", + "]\n", + "\n", + "options = PipelineOptions(flags=[], type_check_additional='all')\n", + "with beam.Pipeline(options=options) as pipeline:\n", + " (\n", + " pipeline\n", + " | 'Read from SQLite' >> SelectFromSQLite(database_file, queries)\n", + " | 'Print rows' >> beam.Map(print)\n", + " )" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "{'phase_emoji': '🌕', 'peak_datetime': '2017-12-03 15:47:00', 'phase': 'Full Moon'}\n", + "{'phase_emoji': '🌗', 'peak_datetime': '2017-12-10 07:51:00', 'phase': 'Last Quarter'}\n", + "{'phase_emoji': '🌑', 'peak_datetime': '2017-12-18 06:30:00', 'phase': 'New Moon'}\n", + "{'phase_emoji': '🌓', 'peak_datetime': '2017-12-26 09:20:00', 'phase': 'First Quarter'}\n", + "{'phase_emoji': '🌕', 'peak_datetime': '2018-01-02 02:24:00', 'phase': 'Full Moon'}\n", + "{'phase_emoji': '🌗', 'peak_datetime': '2018-01-08 22:25:00', 'phase': 'Last Quarter'}\n", + "{'phase_emoji': '🌑', 'peak_datetime': '2018-01-17 02:17:00', 'phase': 'New Moon'}\n", + "{'phase_emoji': '🌓', 'peak_datetime': '2018-01-24 22:20:00', 'phase': 'First Quarter'}\n", + "{'phase_emoji': '🌕', 'peak_datetime': '2018-01-31 13:27:00', 'phase': 'Full Moon'}\n", + "{'phase_emoji': '🌕', 'phase': 'Full Moon'}\n", + "{'phase_emoji': '🌗', 'phase': 'Last Quarter'}\n", + "{'phase_emoji': '🌑', 'phase': 'New Moon'}\n", + "{'phase_emoji': '🌓', 'phase': 'First Quarter'}\n", + "{'phase_emoji': '🌕', 'phase': 'Full Moon'}\n", + "{'phase_emoji': '🌗', 'phase': 'Last 
Quarter'}\n", + "{'phase_emoji': '🌑', 'phase': 'New Moon'}\n", + "{'phase_emoji': '🌓', 'phase': 'First Quarter'}\n", + "{'phase_emoji': '🌕', 'phase': 'Full Moon'}\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "C5Mx_pfNpu_q" + }, + "source": [ + "# Writing data\n", + "\n", + "Your might want to write your data in various output formats. Take a look at the\n", + "[Built-in I/O Transforms](https://beam.apache.org/documentation/io/built-in)\n", + "page for a list of all the available I/O transforms in Beam.\n", + "\n", + "If none of those work for you, you might need to create your own output transform.\n", + "\n", + "> ℹ️ For a more in-depth guide, take a look at the\n", + "[Developing a new I/O connector](https://beam.apache.org/documentation/io/developing-io-overview) page." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FpM368NEhc-q" + }, + "source": [ + "## Creating an output transform\n", + "\n", + "The most straightforward way to write data would be to use a `Map` transform to write each element into our desired output format. In most cases, however, this would result in a lot of overhead creating, connecting to, and/or deleting resources.\n", + "\n", + "Instead, most data services are optimized to write _batches_ of elements at a time. Batch writes only connects to the service once, and can load many elements at a time.\n", + "\n", + "Here, we discuss two common ways of batching elements for optimized writes: _fixed-sized batches_, and\n", + "_[windows](https://beam.apache.org/documentation/programming-guide/#windowing)\n", + "of elements_." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5gypFFh4hM48" + }, + "source": [ + "## Writing fixed-sized batches\n", + "\n", + "If the order of the elements _is not_ important, we can simply create fixed-sized batches and write those independently.\n", + "\n", + "We can use\n", + "[`GroupIntoBatches`](https://beam.apache.org/documentation/transforms/python/aggregation/groupintobatches)\n", + "to get fixed-sized batches. Note that it expects `(key, value)` pairs. Since `GroupIntoBatches` is an _aggregation_, all the elements in a batch _must_ fit into memory for each worker.\n", + "\n", + "> ℹ️ `GroupIntoBatches` requires a `(key, value)` pair. For simplicity, this example uses a placeholder `None` key and discards it later. Depending on your data, there might be a key that makes more sense. Using a _balanced_ key, where each key contains around the same number of elements, may help parallelize the batching process.\n", + "\n", + "Let's create something similar to `WriteToText` but keep it simple with a unique identifier in the file name instead of the file count.\n", + "\n", + "To write a file using the Beam `filesystems` module, we need to use [`create`](https://beam.apache.org/releases/pydoc/current/apache_beam.io.filesystems.html#apache_beam.io.filesystems.FileSystems.create), which writes `bytes` into the file.\n", + "\n", + "> ℹ️ To read a file instead, use the [`open`](https://beam.apache.org/releases/pydoc/current/apache_beam.io.filesystems.html#apache_beam.io.filesystems.FileSystems.open)\n", + "> function instead.\n", + "\n", + "For the output type hint, we can use [`PDone`](https://beam.apache.org/releases/pydoc/current/_modules/apache_beam/pvalue.html) to indicate this is the last transform in a given pipeline." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "LcRHXwyT8Rrj" + }, + "source": [ + "import apache_beam as beam\n", + "from apache_beam.io.filesystems import FileSystems as beam_fs\n", + "from apache_beam.options.pipeline_options import PipelineOptions\n", + "import os\n", + "import uuid\n", + "from typing import Iterable\n", + "\n", + "@beam.ptransform_fn\n", + "@beam.typehints.with_input_types(str)\n", + "@beam.typehints.with_output_types(beam.pvalue.PDone)\n", + "def WriteBatchesToFiles(\n", + " pcollection: beam.PCollection[str],\n", + " file_name_prefix: str,\n", + " file_name_suffix: str = '.txt',\n", + " batch_size: int = 100,\n", + ") -> beam.pvalue.PDone:\n", + " def expand_pattern(pattern):\n", + " for match_result in beam_fs.match([pattern])[0].metadata_list:\n", + " yield match_result.path\n", + "\n", + " def write_file(lines: Iterable[str]):\n", + " file_name = f\"{file_name_prefix}-{uuid.uuid4().hex}{file_name_suffix}\"\n", + " with beam_fs.create(file_name) as f:\n", + " for line in lines:\n", + " f.write(f\"{line}\\n\".encode('utf-8'))\n", + "\n", + " # Remove existing files matching the output file_name pattern.\n", + " for path in expand_pattern(f\"{file_name_prefix}*{file_name_suffix}\"):\n", + " os.remove(path)\n", + " return (\n", + " pcollection\n", + " # For simplicity we key with `None` and discard it.\n", + " | 'Key with None' >> beam.WithKeys(lambda _: None)\n", + " | 'Group into batches' >> beam.GroupIntoBatches(batch_size)\n", + " | 'Discard key' >> beam.Values()\n", + " | 'Write file' >> beam.Map(write_file)\n", + " )\n", + "\n", + "output_file_name_prefix = 'outputs/batch'\n", + "options = PipelineOptions(flags=[], type_check_additional='all')\n", + "with beam.Pipeline(options=options) as pipeline:\n", + " (\n", + " pipeline\n", + " | 'Create file lines' >> beam.Create([\n", + " 'Each element must be a string.',\n", + " 'It writes one element per line.',\n", + " 'There are no guarantees on the line order.',\n", + " 'The data might be written into multiple files.',\n", + " ])\n", + " | 'Write batches to files' >> WriteBatchesToFiles(\n", + " file_name_prefix=output_file_name_prefix,\n", + " file_name_suffix='.txt',\n", + " batch_size=3,\n", + " )\n", + " )" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "CUklk4JtEbft", + "outputId": "adddbd9f-e66d-4def-ba59-1eafccdbe793" + }, + "source": [ + "# Lets look at the output files and contents.\n", + "!head outputs/batch*.txt" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "==> outputs/batch-30d399fb3f24430db193e8130f439cb0.txt <==\n", + "Each element must be a string.\n", + "It writes one element per line.\n", + "There are no guarantees on the line order.\n", + "\n", + "==> outputs/batch-ab16a5c2018e4c32b01a5acaa2671fd0.txt <==\n", + "The data might be written into multiple files.\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hbmPT317hP5K" + }, + "source": [ + "## Writing windows of elements\n", + "\n", + "If the order of the elements _is_ important, we could batch the elements by windows. 
This could be useful in _streaming_ pipelines, where we have an indefinite number of incoming elements and we would like to write windows as they are being processed.\n", + "\n", + "> ℹ️ For more information about windows and triggers, check the [Windowing](https://beam.apache.org/documentation/programming-guide/#windowing) page.\n", + "\n", + "We use a\n", + "[custom `DoFn` transform](https://beam.apache.org/documentation/transforms/python/elementwise/pardo/#example-2-pardo-with-timestamp-and-window-information)\n", + "to extract the window start time and end time.\n", + "We use these for the file names of the output files." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "v_qK300FG9js" + }, + "source": [ + "import apache_beam as beam\n", + "from apache_beam.io.filesystems import FileSystems as beam_fs\n", + "from apache_beam.options.pipeline_options import PipelineOptions\n", + "from datetime import datetime\n", + "import time\n", + "from typing import Any, Dict, Iterable\n", + "\n", + "def unix_time(time_str: str) -> int:\n", + " return int(time.mktime(time.strptime(time_str, '%Y-%m-%d %H:%M:%S')))\n", + "\n", + "class WithWindowInfo(beam.DoFn):\n", + " def process(self, element: Any, window=beam.DoFn.WindowParam) -> Iterable[Dict[str, Any]]:\n", + " yield {\n", + " 'element': element,\n", + " 'window_start': window.start.to_utc_datetime(),\n", + " 'window_end': window.end.to_utc_datetime(),\n", + " }\n", + "\n", + "@beam.ptransform_fn\n", + "@beam.typehints.with_input_types(str)\n", + "@beam.typehints.with_output_types(beam.pvalue.PDone)\n", + "def WriteWindowsToFiles(\n", + " pcollection: beam.PCollection[str],\n", + " file_name_prefix: str,\n", + " file_name_suffix: str = '.txt',\n", + ") -> beam.pvalue.PDone:\n", + " def write_file(batch: Dict[str, Any]):\n", + " start_date = batch['window_start'].date()\n", + " start_time = batch['window_start'].time()\n", + " end_time = batch['window_end'].time()\n", + " file_name = f\"{file_name_prefix}-{start_date}-{start_time}-{end_time}{file_name_suffix}\"\n", + " with beam_fs.create(file_name) as f:\n", + " for x in batch['element']:\n", + " f.write(f\"{x}\\n\".encode('utf-8'))\n", + "\n", + " return (\n", + " pcollection\n", + " | 'Group all per window' >> beam.GroupBy(lambda _: None)\n", + " | 'Discard key' >> beam.Values()\n", + " | 'Get window info' >> beam.ParDo(WithWindowInfo())\n", + " | 'Write files' >> beam.Map(write_file)\n", + " )\n", + "\n", + "output_file_name_prefix = 'outputs/window'\n", + "window_size_sec = 5 * 60 # 5 minutes\n", + "options = PipelineOptions(flags=[], type_check_additional='all')\n", + "with beam.Pipeline(options=options) as pipeline:\n", + " (\n", + " pipeline\n", + " | 'Create elements' >> beam.Create([\n", + " {'timestamp': unix_time('2020-03-19 08:49:00'), 'event': 'login'},\n", + " {'timestamp': unix_time('2020-03-19 08:49:20'), 'event': 'view_account'},\n", + " {'timestamp': unix_time('2020-03-19 08:50:00'), 'event': 'view_orders'},\n", + " {'timestamp': unix_time('2020-03-19 08:51:00'), 'event': 'track_order'},\n", + " {'timestamp': unix_time('2020-03-19 09:00:00'), 'event': 'logout'},\n", + " ])\n", + " | 'With timestamps' >> beam.Map(\n", + " lambda x: beam.window.TimestampedValue(x, x['timestamp']))\n", + " | 'Fixed-sized windows' >> beam.WindowInto(\n", + " beam.window.FixedWindows(window_size_sec))\n", + " | 'To string' >> beam.Map(\n", + " lambda x: f\"{datetime.fromtimestamp(x['timestamp'])}: {x['event']}\")\n", + " | 'Write windows to files' >> WriteWindowsToFiles(\n", + " 
file_name_prefix=output_file_name_prefix,\n", + " file_name_suffix='.txt',\n", + " )\n", + " )" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "4QXKKVawTJ2_", + "outputId": "96a84b29-3fd2-46f4-b21b-d3f07daa928b" + }, + "source": [ + "# Lets look at the output files and contents.\n", + "!head outputs/window*.txt" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "==> outputs/window-2020-03-19-08:45:00-08:50:00.txt <==\n", + "2020-03-19 08:49:00: login\n", + "2020-03-19 08:49:20: view_account\n", + "\n", + "==> outputs/window-2020-03-19-08:50:00-08:55:00.txt <==\n", + "2020-03-19 08:50:00: view_orders\n", + "2020-03-19 08:51:00: track_order\n", + "\n", + "==> outputs/window-2020-03-19-09:00:00-09:05:00.txt <==\n", + "2020-03-19 09:00:00: logout\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gnoz_mWtxSjW" + }, + "source": [ + "# What's next?\n", + "\n", + "* ![Open in Colab](https://github.com/googlecolab/open_in_colab/raw/main/images/icon16.png)\n", + " [Windowing](https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/tour-of-beam/reading-and-writing-data.ipynb) --\n", + " how process data based on time intervals. \n", + "* [Programming guide](https://beam.apache.org/documentation/programming-guide) -- learn about all the Apache Beam concepts in more depth.\n", + "* [Transform catalog](https://beam.apache.org/documentation/transforms/python/overview) -- check out all the available transforms.\n", + "* [Mobile gaming example](https://beam.apache.org/get-started/mobile-gaming-example) -- learn more about windowing, triggers, and streaming through a complete example pipeline.\n", + "* [Runners](https://beam.apache.org/documentation/runners/capability-matrix) -- check the available runners, their capabilities, and how to run your pipeline in them." + ] + } + ] +} \ No newline at end of file diff --git a/examples/notebooks/tour-of-beam/windowing.ipynb b/examples/notebooks/tour-of-beam/windowing.ipynb new file mode 100644 index 000000000000..f4573d719db1 --- /dev/null +++ b/examples/notebooks/tour-of-beam/windowing.ipynb @@ -0,0 +1,703 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "Windowing -- Tour of Beam", + "provenance": [], + "collapsed_sections": [], + "toc_visible": true + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + } + }, + "cells": [ + { + "cell_type": "code", + "metadata": { + "cellView": "form", + "id": "upmJn_DjcThx" + }, + "source": [ + "#@title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n", + "\n", + "# Licensed to the Apache Software Foundation (ASF) under one\n", + "# or more contributor license agreements. See the NOTICE file\n", + "# distributed with this work for additional information\n", + "# regarding copyright ownership. The ASF licenses this file\n", + "# to you under the Apache License, Version 2.0 (the\n", + "# \"License\"); you may not use this file except in compliance\n", + "# with the License. 
You may obtain a copy of the License at\n", + "#\n", + "# http://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing,\n", + "# software distributed under the License is distributed on an\n", + "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n", + "# KIND, either express or implied. See the License for the\n", + "# specific language governing permissions and limitations\n", + "# under the License." + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5UC_aGanx6oE" + }, + "source": [ + "# Windowing -- _Tour of Beam_\n", + "\n", + "Sometimes, we want to [aggregate](https://beam.apache.org/documentation/transforms/python/overview/#aggregation) data, like `GroupByKey` or `Combine`, only at certain intervals, like hourly or daily, instead of processing the entire `PCollection` of data only once.\n", + "\n", + "We might want to emit a [moving average](https://en.wikipedia.org/wiki/Moving_average) as we're processing data.\n", + "\n", + "Maybe we want to analyze the user experience for a certain task in a web app; it would be nice to get the app events grouped by sessions of activity.\n", + "\n", + "Or we could be running a streaming pipeline, and there is no end to the data, so how can we aggregate data?\n", + "\n", + "_Windows_ in Beam allow us to process only certain data intervals at a time.\n", + "In this notebook, we go through different ways of windowing our pipeline.\n", + "\n", + "Let's begin by installing `apache-beam`." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "R_Yhoc6N_Flg" + }, + "source": [ + "# Install apache-beam with pip.\n", + "!pip install --quiet apache-beam" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_OkWHiAvpWDZ" + }, + "source": [ + "First, let's define some helper functions to simplify the rest of the examples.\n", + "\n", + "We have a transform to help us analyze an element alongside its window information, and we have another transform to help us analyze how many elements landed in each window.\n", + "We use a custom [`DoFn`](https://beam.apache.org/documentation/transforms/python/elementwise/pardo)\n", + "to access that information.\n", + "\n", + "You don't need to understand these; you just need to know they exist 🙂." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "C9yAN1Hgk3dF" + }, + "source": [ + "import apache_beam as beam\n", + "\n", + "def human_readable_window(window) -> str:\n", + " \"\"\"Formats a window object into a human readable string.\"\"\"\n", + " if isinstance(window, beam.window.GlobalWindow):\n", + " return str(window)\n", + " return f'{window.start.to_utc_datetime()} - {window.end.to_utc_datetime()}'\n", + "\n", + "class PrintElementInfo(beam.DoFn):\n", + " \"\"\"Prints an element with its Window information.\"\"\"\n", + " def process(self, element, timestamp=beam.DoFn.TimestampParam, window=beam.DoFn.WindowParam):\n", + " print(f'[{human_readable_window(window)}] {timestamp.to_utc_datetime()} -- {element}')\n", + " yield element\n", + "\n", + "@beam.ptransform_fn\n", + "def PrintWindowInfo(pcollection):\n", + " \"\"\"Prints the Window information with how many elements landed in that window.\"\"\"\n", + " class PrintCountsInfo(beam.DoFn):\n", + " def process(self, num_elements, window=beam.DoFn.WindowParam):\n", + " print(f'>> Window [{human_readable_window(window)}] has {num_elements} elements')\n", + " yield num_elements\n", + "\n", + " return (\n", + " pcollection\n", + " | 'Count elements per window' >> beam.combiners.Count.Globally().without_defaults()\n", + " | 'Print counts info' >> beam.ParDo(PrintCountsInfo())\n", + " )" + ], + "execution_count": 1, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CQrojV2QnqIU" + }, + "source": [ + "Now let's create some data to use in the examples.\n", + "\n", + "Windows define data intervals based on time, so we need to tell Apache Beam a timestamp for each element.\n", + "\n", + "We define a `PTransform` for convenience, so we can attach the timestamps automatically.\n", + "\n", + "Apache Beam requires us to provide the timestamp as [Unix time](https://en.wikipedia.org/wiki/Unix_time), which is a way to represent a date and time as the number of seconds since January 1st, 1970.\n", + "\n", + "For our data, let's analyze some events about the seasons and moon phases for the year 2021, which might be [useful for a gardening project](https://www.almanac.com/content/planting-by-the-moon).\n", + "\n", + "To attach timestamps to each element, we can `Map` each element and return a [`TimestampedValue`](https://beam.apache.org/documentation/transforms/python/elementwise/withtimestamps/)." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Sgzscopvmh1f", + "outputId": "e0c6fc19-ab97-4754-8f1f-1601807be940" + }, + "source": [ + "import time\n", + "from apache_beam.options.pipeline_options import PipelineOptions\n", + "\n", + "def to_unix_time(time_str: str, time_format='%Y-%m-%d %H:%M:%S') -> int:\n", + " \"\"\"Converts a time string into Unix time.\"\"\"\n", + " time_tuple = time.strptime(time_str, time_format)\n", + " return int(time.mktime(time_tuple))\n", + "\n", + "@beam.ptransform_fn\n", + "@beam.typehints.with_input_types(beam.pvalue.PBegin)\n", + "@beam.typehints.with_output_types(beam.window.TimestampedValue)\n", + "def AstronomicalEvents(pipeline):\n", + " return (\n", + " pipeline\n", + " | 'Create data' >> beam.Create([\n", + " ('2021-03-20 03:37:00', 'March Equinox 2021'),\n", + " ('2021-04-26 22:31:00', 'Super full moon'),\n", + " ('2021-05-11 13:59:00', 'Micro new moon'),\n", + " ('2021-05-26 06:13:00', 'Super full moon, total lunar eclipse'),\n", + " ('2021-06-20 22:32:00', 'June Solstice 2021'),\n", + " ('2021-08-22 07:01:00', 'Blue moon'),\n", + " ('2021-09-22 14:21:00', 'September Equinox 2021'),\n", + " ('2021-11-04 15:14:00', 'Super new moon'),\n", + " ('2021-11-19 02:57:00', 'Micro full moon, partial lunar eclipse'),\n", + " ('2021-12-04 01:43:00', 'Super new moon'),\n", + " ('2021-12-18 10:35:00', 'Micro full moon'),\n", + " ('2021-12-21 09:59:00', 'December Solstice 2021'),\n", + " ])\n", + " | 'With timestamps' >> beam.MapTuple(\n", + " lambda timestamp, element:\n", + " beam.window.TimestampedValue(element, to_unix_time(timestamp))\n", + " )\n", + " )\n", + "\n", + "# Lets see how the data looks like.\n", + "beam_options = PipelineOptions(flags=[], type_check_additional='all')\n", + "with beam.Pipeline(options=beam_options) as pipeline:\n", + " (\n", + " pipeline\n", + " | 'Astronomical events' >> AstronomicalEvents()\n", + " | 'Print element' >> beam.Map(print)\n", + " )" + ], + "execution_count": 3, + "outputs": [ + { + "output_type": "stream", + "text": [ + "March Equinox 2021\n", + "Super full moon\n", + "Micro new moon\n", + "Super full moon, total lunar eclipse\n", + "June Solstice 2021\n", + "Blue moon\n", + "September Equinox 2021\n", + "December Solstice 2021\n", + "Super new moon\n", + "Micro full moon, partial lunar eclipse\n", + "Super new moon\n", + "Micro full moon\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qI0K3jSA2LbJ" + }, + "source": [ + "> ℹ️ After running this, it looks like the timestamps disappeared!\n", + "> They're actually still _implicitly_ part of the element, just like the windowing information.\n", + "> If we need to access it, we can do so via a custom [`DoFn`](https://beam.apache.org/documentation/transforms/python/elementwise/pardo).\n", + "> Aggregation transforms use each element's timestamp along with the windowing we specified to create windows of elements." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ymHF1WCqnG4V" + }, + "source": [ + "# Global window\n", + "\n", + "All pipelines use the [`GlobalWindow`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.window.html#apache_beam.transforms.window.GlobalWindow) by default.\n", + "This is a single window that covers the entire `PCollection`.\n", + "\n", + "In many cases, especially for batch pipelines, this is what we want since we want to analyze all the data that we have.\n", + "\n", + "> ℹ️ `GlobalWindow` is not very useful in a streaming pipeline unless you only need element-wise transforms.\n", + "> Aggregations, like `GroupByKey` and `Combine`, need to process the entire window, but a streaming pipeline has no end, so they would never finish." + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "xDXdE9uysriw", + "outputId": "b39e7fe7-dc13-4d77-89af-f2d1312ab673" + }, + "source": [ + "import apache_beam as beam\n", + "\n", + "# All elements fall into the GlobalWindow by default.\n", + "with beam.Pipeline() as pipeline:\n", + " (\n", + " pipeline\n", + " | 'Astronomical events' >> AstronomicalEvents()\n", + " | 'Print element info' >> beam.ParDo(PrintElementInfo())\n", + " | 'Print window info' >> PrintWindowInfo()\n", + " )" + ], + "execution_count": 4, + "outputs": [ + { + "output_type": "stream", + "text": [ + "[GlobalWindow] 2021-03-20 03:37:00 -- March Equinox 2021\n", + "[GlobalWindow] 2021-04-26 22:31:00 -- Super full moon\n", + "[GlobalWindow] 2021-05-11 13:59:00 -- Micro new moon\n", + "[GlobalWindow] 2021-05-26 06:13:00 -- Super full moon, total lunar eclipse\n", + "[GlobalWindow] 2021-06-20 22:32:00 -- June Solstice 2021\n", + "[GlobalWindow] 2021-08-22 07:01:00 -- Blue moon\n", + "[GlobalWindow] 2021-09-22 14:21:00 -- September Equinox 2021\n", + "[GlobalWindow] 2021-12-21 09:59:00 -- December Solstice 2021\n", + "[GlobalWindow] 2021-11-04 15:14:00 -- Super new moon\n", + "[GlobalWindow] 2021-11-19 02:57:00 -- Micro full moon, partial lunar eclipse\n", + "[GlobalWindow] 2021-12-04 01:43:00 -- Super new moon\n", + "[GlobalWindow] 2021-12-18 10:35:00 -- Micro full moon\n", + ">> Window [GlobalWindow] has 12 elements\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "l3Kod_pR7a7S" + }, + "source": [ + "![Global 
window](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAACbQAAACQCAYAAADavX+XAAAgAElEQVR4Ae2d6dPdxJn25z+a7++XqfmSqqlUTU0qlarZklQqyUwy4WUSQl5IyEIIJBAwMUswmBjMbgM2xoDZN7ODsdkMGIMJxgYbgzFesMHYeusSueU+etR6dB+do6d19Ouqp3SOpG61fufqfnq51Pq7jAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEEiAwN8lkAeyAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQyDC0IQIIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQSIIAhrYkfgYyAQEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIY2tAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCCRBAENbEj8DmYAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABDG1oAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAASSIIChLYmfgUxAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAIY2NAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACSRDA0JbEz0AmIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQABDGxqAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAgSQIYGhL4mcgExCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCCAoQ0NQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgEASBDC0JfEzkAkIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQwNCGBgZN4L0PD2X8wQANoIFZ0IBV5rNwL9wDZRINoAGPBqj/0ItHL5yLXmZJA9R/6HmW9My9oOc2GqA+RD9t9ENc9NNnDVD/od8+65e8o982GqD+Qz9t9ENc9JOiBqxeYztKAEPbKA++DYxAipUVeeKfKBpAA+NowKrvceISB82hATTQZw1Q/6HfPuuXvKPfNhqg/kM/bfRDXPQzSxqgPkTPs6Rn7gU9ezRA/YdePHrhXPQySxqg/kPPs6Rn7gU9SwOEagIY2qq5sHcgBPgHwT8INIAGZkUDVm3Pyv1wH5RNNIAGmmqA+g+tNNUK56GVWdMA9R+anjVNcz9oelwNUB+inXG1Qzy003cNUP+h4b5rmPyj4XE1QP2HdsbVDvHQTqoasHqN7SgBDG2jPPg2MAKpVljki3+maAANeDVg1bc3HuejNTSABvquAeo/NNx3DZN/NDyuBqj/0M642iEe2pk1DVAfoulZ0zT3g6abaoD6D6001QrnoZVZ0wD1H5qeNU1zP2ja6jW2owQwtI3y4NvACPDPgX8OaAANzIoGrPqelfvhPiibaAANNNUA9R9aaaoVzkMrs6YB6j80PWua5n7Q9LgaoD5EO+Nqh3hop+8aoP5Dw33XMPlHw+NqgPoP7YyrHeKhnVQ1YPUa21ECGNpGefBtYARSrbDIF/9M0QAa8GrAqm9vPM5Ha2gADfRdA9R/aLjvGib/aHhcDVD/oZ1xtUM8tDNrGqA+RNOzpmnuB0031QD1H1ppqhXOQyuzpgHqPzQ9a5rmftC01WtsRwlgaBvlwbeBEeCfA/8c0AAamBUNWPU9K/fDfVA20QAaaKoB6j+00lQrnIdWZk0D1H9oetY0zf2g6XE1QH2IdsbVDvHQTt81QP2HhvuuYfKPhsfVAPUf2hlXO8RDO6lqwOo1tqMEMLSN8uDbwAikWmGRL/6ZogE04NXAgUOfZ/rzxuN8tIYG0EDfNUD9h4b7rmHyj4bH1QD1H9oZVzvEQzuzpgHqQzQ9a5rmftB0Uw1Q/6GVplrhPLQyaxqg/kPTs6Zp7gdND8ym0/h2MbQ1RsWJs0jg+tUHMv5ggAbQABpAA9PQwLu7D/E/hv+zaKCkgXd2US489c2aew9kr74JMw+zux46mG14+SBlr1T2PAxn9VzKEu29WdU294W222iAthn6aaOfScfdtpN276SZkh5l3KOB+9YfzB5/jr5UU2YPP3Uwu/sReDXlZeetWHsg287YUOMxizf+Ov7/xtX3HMhe3zZ+fPvN2PK/BA2gATTQjQZm0YsziXvC0DYJiqTRWwKqgL97yif8wQANoAE0gAYmrgEZ2n708/3Zd0/hDwZowDSgSdP/ewY8jMd82wuXfmlom+88jp/Q1NUrvzS0weQEE1h8yUKGtkWXq/8HGxigATSABkwDtM3Qgmkhha0Mbf/vd/wmKfwW5GGYOlx5x5eGNn7/Zr//uoe/NLTBqxkv43TKb/bnhjb7zraenwxtv7twvH7sH/+8Pze0wbieMXzggwbQQAoakGeFUE0AQ1s1F/YOhACGNsx8GBrRABpAA9PSAIY2OkIpdIRSywOTpr5ygaHNx0t6x9DmZ5ZaPTGt/GBoQxvT0hbpoq0+a4C2GfpNSb8Y2tBjSnocYl4wtPnKIIY2Hy8rUxjafNwwtPl4mc7Ywg0NoIG+aQBDW9ychKEtzoYjAyCAoQ0jy7SMLKSLttAAGsDQRqepb52mLvLLpKmvXGBo8/GShjG0+Zl1UfZTuAaGNrSRgg7JAzpMTQO0zdBkSprE0IYeU9LjEPOCoc1XBjG0+XhZmcLQ5uOGoc3Hy3TGFm5oAA30TQMY2uLGJAxtcTYcGQABDG0YTjAdoQE0gAampQEMbXSa+tZp6iK/TJr6ygWGNh8vaRhDm59ZF2U/hWtgaEMbKeiQPKDD1DRA2wxNpqRJDG3oMSU9DjEvGNp8ZRBDm4+XlSkMbT5uGNp8vExnbOGGBtBA3zSAoS1uTMLQFmfDkQEQwNCGkWVaRhbSRVtoAA1gaKPT1LdOUxf5ZdLUVy4wtPl4ScMY2vzMuij7KVwDQxvaSEGH5AEdpqYB2mZoMiVNYmhDjynpcYh5wdDmK4MY2ny8rExhaPNxw9Dm42U6Yws3NIAG+qYBDG1xYxKGtjgbjgyAAIY2DCeYjtAAGkAD09IAhjY6TX3rNHWRXyZNfeUCQ5uPlzSMoc3PrIuyn8I1MLShjRR0SB7QYWoaoG2GJlPSJIY29JiSHoeYFwxtvjKIoc3Hy8oUhjYfNwxtPl6mM7ZwQ
wNooG8awNAWNyZhaIuz4cgACGBow8gyLSML6aItNIAGMLTRaepbp6mL/DJp6isXGNp8vKRhDG1+Zl2U/RSugaENbaSgQ/KADlPTAG0zNJmSJjG0oceU9DjEvGBo85VBDG0+XlamMLT5uGFo8/EynbGFGxpAA33TAIa2uDEJQ1ucDUcGQABDG4YTTEdoAA2ggWlpAEMbnaa+dZq6yC+Tpr5ygaHNx0saxtDmZ9ZF2U/hGhja0EYKOiQP6DA1DdA2Q5MpaRJDG3pMSY9DzAuGNl8ZxNDm42VlCkObjxuGNh8v0xlbuKEBNNA3DWBoixuTMLTF2XBkAAQwtGFkmZaRhXTRFhpAAxja6DT1rdPURX6ZNPWVCwxtPl7SMIY2P7Muyn4K18DQhjZS0CF5QIepaYC2GZpMSZMY2tBjSnocYl4wtPnKIIY2Hy8rUxjafNwwtPl4mc7Ywg0NoIG+aQBDW9yYhKEtzoYjAyCAoQ3DCaYjNIAG0MC0NIChjU5T3zpNXeSXSVNfucDQ5uMlDWNo8zProuyncA0MbWgjBR2SB3SYmgZom6HJlDSJoQ09pqTHIeYFQ5uvDGJo8/GyMoWhzccNQ5uPl+mMLdzQABromwYwtMWNSRja4mw4MgACGNowskzLyEK6aAsNoAEMbXSa+tZp6iK/TJr6ygWGNh8vaRhDm59ZF2U/hWtgaEMbKeiQPKDD1DRA2wxNpqRJDG3oMSU9DjEvGNp8ZRBDm4+XlSkMbT5uGNp8vExnbOGGBtBA3zSAoS1uTMLQFmfDkQEQwNCG4QTTERpAA2hgWhrA0EanqW+dpi7yy6Spr1xgaPPxkoYxtPmZdVH2U7gGhja0kYIOyQM6TE0DtM3QZEqaxNCGHlPS4xDzgqHNVwYxtPl4WZnC0ObjhqHNx8t0xhZuaAAN9E0DGNrixiQMbXE2HBkAAQxtGFmmZWQhXbSFBtAAhjY6TX3rNHWRXyZNfeUCQ5uPlzSMoc3PrIuyn8I1MLShjRR0SB7QYWoaoG2GJlPSJIY29JiSHoeYFwxtvjKIoc3Hy8oUhjYfNwxtPl6mM7ZwQwNooG8awNAWNyZhaIuz4cgACGBow3CC6QgNoAE0MC0NYGij09S3TlMX+WXS1FcuMLT5eEnDGNr8zLoo+ylcA0Mb2khBh+QBHaamAdpmaDIlTWJoQ48p6XGIecHQ5iuDGNp8vKxMYWjzccPQ5uNlOmMLNzSABvqmAQxtcWMShrY4G44MgACGNows0zKykC7aQgNoAEMbnaa+dZq6yC+Tpr5ygaHNx0saxtDmZ9ZF2U/hGhja0EYKOiQP6DA1DdA2Q5MpaRJDG3pMSY9DzAuGNl8ZxNDm42VlCkObjxuGNh8v0xlbuKEBNNA3DWBoixuTMLTF2XBkAAQwtGE4wXSEBtAAGpiWBjC00WnqW6epi/wyaeorFxjafLykYQxtfmZdlP0UroGhDW2koEPygA5T0wBtMzSZkiYxtKHHlPQ4xLxgaPOVQQxtPl5WpjC0+bhhaPPxMp2xhRsaQAN90wCGtrgxCUNbnA1HBkAAQxtGlmkZWUgXbaEBNIChjU5T3zpNXeSXSVNfucDQ5uMlDWNo8zProuyncA0MbWgjBR2SB3SYmgZom6HJlDSJoQ09pqTHIeYFQ5uvDGJo8/GyMoWhzccNQ5uPl+mMLdzQABromwYwtMWNSRja4mw4MgACGNownGA6QgNoAA1MSwMY2ug09a3T1EV+mTT1lQsMbT5e0jCGNj+zLsp+CtfA0IY2UtAheUCHqWmAthmaTEmTGNrQY0p6HGJeMLT5yiCGNh8vK1MY2nzcMLT5eJnO2MINDaCBvmkAQ1vcmIShLc6GIwMggKENI8u0jCyki7bQABrA0EanqW+dpi7yy6Spr1xgaPPxkoYxtPmZdVH2U7gGhja0kYIOyQM6TE0DtM3QZEqaxNCGHlPS4xDzgqHNVwYxtPl4WZnC0ObjhqHNx8t0xhZuaAAN9E0DGNrixiQMbXE2HBkAAQxtGE4wHaEBNIAGpqUBDG10mvrWaeoiv0ya+soFhjYfL2kYQ5ufWRdlP4VrYGhDGynokDygw9Q0QNsMTaakSQxt6DElPQ4xLxjafGUQQ5uPl5UpDG0+bhjafLxMZ2zhhgbQQN80gKEtbkyaiqHtiy++yI4fPx6/6t+OfLxvf3bldbfnf+/v+nDe85uesPymu/I0X3j5jaZRWp/34Prn8mveee/jrdOalQTW3vNYzuThx59P9pYwtGFkmZaRhXTRFhpAAxja6DT1rdPURX6ZNPWVCwxtPl7SMIY2P7Muyn4K18DQhjZS0CF5QIepaYC2GZpMSZMY2tBjSnocYl4wtPnKIIY2Hy8rUxjafNwwtPl4mc7Ywg0NoIG+aQBDW9xONBFD24ubt2Z/XnZr9pMzFmdf/bdTsr//h//M//75P07NTjrtguy6lXdnu/fsnZOLN97aXpz77MbNc46Pu8Ou/5fr1o6bhDveqb++OL+X75z0O1fcy5atyk4+fVF2+lmXReNte2dnfo7OW7Pu0eh5t659KD/vp7+6qDAUHjh4KPvat07L/y5eujIadxoH/v2/fp0z+cXZS6aR/ETSxNCG4QTTERpAA2hgWhrA0EanqW+dpi7yy6Spr1xgaPPxkoYxtPmZdVH2U7gGhja0kYIOyQM6TE0DtM3QZEqaxNCGHlPS4xDzgqHNVwYxtPl4WZnC0ObjhqHNx8t0xhZuaAAN9E0DGNri1p9WhrZP9h/Mzll0dWFKMyNZbPvfp5yb7XjvgyI3GNqy7JKlNxf8tu/YVbAJP6xYfX9xTp1hzgxk3/7RWUX0jz85UMTt2lhm+en6usXNN/iAoQ0jy7SMLKSLttAAGsDQRqepb52mLvLLpKmvXGBo8/GShjG0+Zl1UfZTuAaGNrSRgg7JAzpMTQO0zdBkSprE0IYeU9LjEPOCoc1XBjG0+XhZmcLQ5uOGoc3Hy3TGFm5oAA30TQMY2uLGnrENbW9u25H947/8T2GW+srXT84WX74i0ys3n3/x9Uyv4Lz+5ruz7558TnGOjG4bNr1W5AZDW5Y9veGVgs/t69YXbMIPWvkuNAkePvxZeDj/vP/AoeKcy5evLo4f+vRw9r+/+FP+p9+jy4ChDTMHhh40gAbQwJA1gKGNTlPfOk1d5JdJU1+5wNDm4yUNY2jzM+ui7KdwDQxtaCMFHZIHdJiaBmibocmUNImhDT2mpMch5gVDm68MYmjz8bIyhaHNxw1Dm4+X6Ywt3NAAGuibBjC0xV1MYxnavvjii8zMSjJanXneldnBQ4ejV9m+c3dmr+TE0DaK6dPDRwoj2i9/f8XowSzLjh07lv2ff/pecY54V72e9bGnXyjOCRnPSbDDHaYRVmjD0DJkQwv3jv7RwHA1gKGNTlPfOk1d5JdJU1+5wNDm4yUNY2jzM+ui7KdwDQxtaCMFHZIHdJiaBmibocmUNImhDT2mpMch5gVDm68MYmjz8bIyhaHNxw1Dm4+X6Ywt3NAAGuibBjC0xU1MYxnabrz13sI8dfaiq+Kpl448/NgG9ytHZZ5b
/+Sm7Mpr12Q/+80l+StOb137UPb2O++VUj/x1VYz+8t1a7Njx47nq6Bded3tuanugktvyO57+Jns6NEvTkQIPu364KN8dTnFPf/S67Mzzl6S/WHxNdnSa9ZkMo0pvapghr26V4JWxdO+7//49zlPrXhXDlve3F6w/tfv/TL/HK7AZudfvHRlcd6RI6MruD38+PP5Pb/19g47Pd9qVTex0J9eTSomjz6xMbts2arsp7+6KBOr+x95Njt+vPqeLbEtW9/Jrlt5d3b6WZflvBT/ldfeKkyPdYa2lza/mV1/yz3Zr/5wRR732pXrso0vbbGkR7ZPPvtSntcNL5xY5S88QXm1ewn367NWFNSxBx59buQQrxwdrtEEkxG/PRpAA9PWAIY2Ok196zR1kV8mTX3lAkObj5c0jKHNz6yLsp/CNTC0oY0UdEge0GFqGqBthiZT0iSGNvSYkh6HmBcMbb4yiKHNx8vKFIY2HzcMbT5epjO2cEMDaKBvGsDQNmLhGfniNrTJ0GUrhmm79+P9Iwl6vsz3ylGZy8qvLDWzmrYyQlUZzOwcmcy+/aOzCqOX7df2v085N/tk/8E52dX+8Lzy56/+2ynZ9h275sRrY2i76oY7imuWjXorVt+fHzvptAuyPy+7Nf9cZZqz1dCU/3Kw36tshHt1y9vFdZddvzb72rdOK76H961XlspYWA7at+Tq1ZVxwvhVhjaZ7s5dfG00rgxuel1qGE4+fVF+fpXxb+u2d4u0pItykDFRedLvFwYMbRhapm1oIX00hgaGqwEMbXSa+tZp6iK/TJr6ygWGNh8vaRhDm59ZF2U/hWtgaEMbKeiQPKDD1DRA2wxNpqRJDG3oMSU9DjEvGNp8ZRBDm4+XlSkMbT5uGNp8vExnbOGGBtBA3zSAoS108Ix+dhva3tu1pzAOLb58xWhqzm91hjaZmWRcMmOUjFpLrlqVr5YW7r9i+W1zrmpxwq0MX0rDzF06phXfysEMbTI+6fiiP9+YX/Mb3/l5kZd//o9TsyOffT4StY2h7eVX3yzSvu2uR0bS1Uppyus1K+7KV5qzezp8+MQqbAcOHiriL7/xzpH4+mL3XGdos3S1lYmwbCS8497H56SrFewsnq4hw9k5Fy7Pt7Zf2ypDm0xydo6MdBcuuSmTnmwVOh1THsJw06r7ijjSYRi0Ap+l980fnBkeyj9/5esn58dlogsDhrbhGk0wGfHbowE0MG0NYGij09S3TlMX+WXS1FcuMLT5eEnDGNr8zLoo+ylcA0Mb2khBh+QBHaamAdpmaDIlTWJoQ48p6XGIecHQ5iuDGNp8vKxMYWjzccPQ5uNlOmMLNzSABvqmAQxtoYNn9LPb0KbXPppxqMrkNJp8/bc6Q5vMV3adsslr3ycHRgxXu/fsHbmQxdP20r/ckh08+Glx/ON9+4tXYep4eUW0pze8MmefRQ5XI5MJLQxtDG1a6cxMZ6H569ixY8V+vZpTJj+7tw2bTrx28/FnXiz2l/OlPFra8xna9NrQ0KgXvu60bC7TKzwtLzKQfVD6DZRXGf90TnhPys9zm14t4p553pXZ558fLVCKRWiUe2j9huJYmJ97Hnyq2K8PoeFQ13x/94fFceXN8hqmpxMwtGFombahhfTRGBoYrgYwtNFp6lunqYv8MmnqKxcY2ny8pGEMbX5mXZT9FK6BoQ1tpKBD8oAOU9MAbTM0mZImMbShx5T0OMS8YGjzlUEMbT5eVqYwtPm4YWjz8TKdsYUbGkADfdMAhrbC2jPng9vQdv3NdxfmoBdefmNOgtqh1cNkaKr6C18RGjO0HT36RXGNqldo6hph3EuW3jySDzMvVa3ephNf3Ly1SH/pNWtG4tZ9+WjvJ0W8NeseHTm1jaFNCf3kjMV52lp97vjx43naoYFLTBTsFarhven+7Z6rXg3axNAmo2JVsPsqv+ZTrwS1a767c3dV1MI4WDa0/ehn5+dxla+Dh0ZfK6qEPvv8aLE6n1bWsxC+7vYPi6+x3fkrYC0vtnqfVnOz8PDjzxd5laExDBjahms0wWTEb48G0MC0NYChjU5T3zpNXeSXSVNfucDQ5uMlDWNo8zProuyncA0MbWgjBR2SB3SYmgZom6HJlDSJoQ09pqTHIeYFQ5uvDGJo8/GyMoWhzccNQ5uPl+mMLdzQABromwYwtIUOntHPbkPb1TfeWZiDtGpYVTBzUdX22Y2biyihKS3cv33HruIa6x54sji//MFeT6nXV4bBrvuX69aGu4vPMoyZ8UkrhJWDjmvlNq0M99s/LstOOu2C3Jxlr61U+lrNLAxm/PrOSb8Ldzf+fPOaB4t73vbOzjzeytvuz/fJAGbBVokLV0yT6Ut5KnOwOE0MbZte2mKnj2wvW7aqyFd4wFZE0ytRY8HyVTa0Gfuzzl8Wi5q/ftR+RzP46WSlpf16TamFa1euy/edftZlmfGR8c/CRVeszI9LL+WAoQ1Dy7QNLaSPxtDAcDWAoY1OU986TV3kl0lTX7nA0ObjJQ1jaPMz66Lsp3ANDG1oIwUdkgd0mJoGaJuhyZQ0iaENPaakxyHmBUObrwxiaPPxsjKFoc3HDUObj5fpjC3c0AAa6JsGMLSVXTwnvrsNbeFqVzGzmRmRqrbPPD+/oU2v/bS4GyNGK92Cmcj0asswWNyYoU3nmtmqvAKczHRm1rJ0qraTNrTJxGbXWXXHQ/nt2P0tv/HO4vZk/LPztBKeXqdq31esvr84L/zQxtC2/Ka7ivQtTa0CZ9fUK11jwRiHhrZPDx8p4l553e2xqNnKNQ8U54WvM7193fpi/8efHMjj23UeePS5LFzVbtcHH+XHbVU7GdvKAUPbcI0mmIz47dEAGpi2BjC00WnqW6epi/wyaeorFxjafLykYQxtfmZdlP0UroGhDW2koEPygA5T0wBtMzSZkiYxtKHHlPQ4xLxgaPOVQQxtPl5WpjC0+bhhaPPxMp2xhRsaQAN90wCGtrKL58R3t6FNK5eZmUmrYVUFvdJTK37Zn62gpXhNDG33PPhUcQ0ZlGLBXntZfh2m5a/O0GYmp3ClM60YZ+YvpaFzLl++OlN+nn/x9WzrtneLfE3a0BauGnfaby/NwtdriqcFmdjs/ja88FqmV4Xad+W/Ktg96V7C8OqWt4u4sRXadJ+WvsXd89G+Yp9eQRsLZjQLDW279+wt4sYMeEovNK5JcxZ2vPdBEX/9k5uy93d/WHyXWU7hq/92Sr5PK9zp9aWW/6rXqmJow9AybUML6aMxNDBcDWBoo9PUt05TF/ll0tRXLjC0+XhJwxja/My6KPspXANDG9pIQYfkAR2mpgHaZmgyJU1iaEOPKelxiHnB0OYrgxjafLysTGFo83HD0ObjZTpjCzc0gAb6pgEMbeYGmrt1G9qOHj2xOpdewSmD1XxBJjYzFTUxtMnAZedXmZDsenoVp86TcSoMFrfO0GavDw1fObrkqhOv17z7gafCJIvPlvakDW26wG/OvTK/HxnQwpXGZMoKg15rqnwsvWZNphXS9FlxZIKrCpM2tLVZoS2Me8nSm6uym+8LV4bTKnR
hsN9O8e21rD/7zSXFKfY7itPLr76Z8xGjg4cOF+fYBwxtwzWaYDLit0cDaGDaGsDQRqepb52mLvLLpKmvXGBo8/GShjG0+Zl1UfZTuAaGNrSRgg7JAzpMTQO0zdBkSprE0IYeU9LjEPOCoc1XBjG0+XhZmcLQ5uOGoc3Hy3TGFm5oAA30TQMY2sy9M3frNrQpiZ/+6qLCJFRe9WvuJbJ8VTYzgjUxtIUrgF0dvG4zTFvGKK3MpnTPOHtJeKjIW8zQ9vG+/cU5Mk5ZsBXFvvat02zXnK3dxzQMbXqFq6V/1vnL8s8/PPW8OXm4Yvlt+TGtLvfNH5yZfz79rMvmnGc7Jh5iq2MAACAASURBVG1oU7pmKjv59EV2mTlb4xmu0KaT9IpY3WfVvVkiuh+dU159T8fPuXB5fkwr6ImBzrvv4Wcs6ogZ8E9LbsqPy9xWFTC0YWiZtqGF9NEYGhiuBjC00WnqW6epi/wyaeorFxjafLykYQxtfmZdlP0UroGhDW2koEPygA5T0wBtMzSZkiYxtKHHlPQ4xLxgaPOVQQxtPl5WpjC0+bhhaPPxMp2xhRsaQAN90wCGtionz5f7xjK0fbBn78irOW9e82B0dTBdxrtCm16/aaYnGZqOHJm7Ctxd9z2RG5VkZlp1x0Mjd6h9+osZ2i66YmURVyt4WQivKcNcOYSrpk3D0Ba+jtPuYdn1a8vZyPSqUTtu2zXrHp1znu2YhqEtNDXufH+PXarY6l7McFg2tJkhTXnfsvWdIo59eHfn7uL+qox69z/ybHHc7v/Qp6Orr9lrR+24TIBVAUPbcI0mmIz47dEAGpi2BjC00WnqW6epi/wyaeorFxjafLykYQxtfmZdlP0UroGhDW2koEPygA5T0wBtMzSZkiYxtKHHlPQ4xLxgaPOVQQxtPl5WpjC0+bhhaPPxMp2xhRsaQAN90wCGtionz5f7xjK0Kert69aPmIq+/+PfZ488sTHbvmNXJjOYTGg73vsg06pj9mpQmYuarNCm9B99YmORvlbXen/Xh3mOlbZW4zKjkkxoeg1qGOyY4ul6dvzDvfuyi5eeMLOFr6lUfBmvLO7iy1fkZqsDBw9lG1/akp1/6fXFMZ0zDUOb8mCmOsvHppe2hLeWfxZbO25bcY+FaRjaXnj5jSIPMo89veGV7KO9n2SvvPZWduW1a4pjyl/Z0CZDpOVbpjelJROjwmtv/DULzWhV97X34xMr7CmdU3998Zxbt9eO2nX0G1YFDG0YWqZtaCF9NIYGhqsBDG10mvrWaeoiv0ya+soFhjYfL2kYQ5ufWRdlP4VrYGhDGynokDygw9Q0QNsMTaakSQxt6DElPQ4xLxjafGUQQ5uPl5UpDG0+bhjafLxMZ2zhhgbQQN80gKGtysnz5b6xDW0yIN157+PFKlxmHJpv++qWt4vcvPHW9sLY9OzGzcV++6BXiYbp2Ypf4b4Nm16z04tteDz2Wa/MfG/X6MpiW7e9O3K9WFztn5ahrWycO/LZ58V9hR9kILT86V7qwjQMbbpeuEqb5aVqWza0Ka5W1QvPVR4tn7b/mhUnXgdbvr9vfOfnRfzwdaN2XriantL77POjdmhki6FtuEYTTEb89mgADUxbAxja6DT1rdPURX6ZNPWVCwxtPl7SMIY2P7Muyn4K18DQhjZS0CF5QIepaYC2GZpMSZMY2tBjSnocYl4wtPnKIIY2Hy8rUxjafNwwtPl4mc7Ywg0NoIG+aQBD24iFZ+TL2IY2S+XgocPZJUtvHllVywxJ2sqkdPLpi/JV1Q4fHn116JvbdhSmpCpjmq6x+s5H5hidlO63f3RWpldTVgW7vs6pMsH97y/+lO375EBV1HylMRnELA3bKs7b77xX7L/+lntG4mu1N5373ZPPGdnv/bL+yU3FNWRai4Urr7u9OO+3f1wWOy3fb0ax8ms3tRqa3Z9WSasKuk87p3xcq+WVDXg692vfOi3TynLf/MGZeVwZE6uCrhmuxmbXEX+t+FYXtIKenS8NVgVLWysExgKGNgwt0za0kD4aQwPD1QCGNjpNfes0dZFfJk195QJDm4+XNIyhzc+si7KfwjUwtKGNFHRIHtBhahqgbYYmU9Ikhjb0mJIeh5gXDG2+MoihzcfLyhSGNh83DG0+XqYztnBDA2igbxrA0BZz82RZa0NbmLRWwZLp6/FnXsxe3Lw1fwVleHzcz8eOHc/NazJ7yQS1/8ChxknpdaNvvb0je+zpFzKZ5prE1apoMns9tH5D/rpRvUqTECdw6NPDuYHtyWdfyvZ8tC9+YuSIXgWrV8PKxKbXkXYZMLQN12iCyYjfHg2ggWlrAEMbnaa+dZq6yC+Tpr5ygaHNx0saxtDmZ9ZF2U/hGhja0EYKOiQP6DA1DdA2Q5MpaRJDG3pMSY9DzAuGNl8ZxNDm42VlCkObjxuGNh8v0xlbuKEBNNA3DWBoizuEJmpoi1+GIxBIkwCGNgwt0za0kD4aQwPD1QCGNjpNfes0dZFfJk195QJDm4+XNIyhzc+si7KfwjUwtKGNFHRIHtBhahqgbYYmU9Ikhjb0mJIeh5gXDG2+MoihzcfLyhSGNh83DG0+XqYztnBDA2igbxrA0Bb3EmFoi7PhyAAIYGgbrtEEkxG/PRpAA9PWAIY2Ok196zR1kV8mTX3lAkObj5c0jKHNz6yLsp/CNTC0oY0UdEge0GFqGqBthiZT0iSGNvSYkh6HmBcMbb4yiKHNx8vKFIY2HzcMbT5epjO2cEMDaKBvGsDQFjcmYWiLs+HIAAhgaMPQMm1DC+mjMTQwXA1gaKPT1LdOUxf5ZdLUVy4wtPl4ScMY2vzMuij7KVwDQxvaSEGH5AEdpqYB2mZoMiVNYmhDjynpcYh5wdDmK4MY2ny8rExhaPNxw9Dm42U6Yws3NIAG+qYBDG1xYxKGtjgbjgyAAIa24RpNMBnx26MBNDBtDWBoo9PUt05TF/ll0tRXLjC0+XhJwxja/My6KPspXANDG9pIQYfkAR2mpgHaZmgyJU1iaEOPKelxiHnB0OYrgxjafLysTGFo83HD0ObjZTpjCzc0gAb6pgEMbXFjEoa2OBuODIAAhjYMLdM2tJA+GkMDw9UAhjY6TX3rNHWRXyZNfeUCQ5uPlzSMoc3PrIuyn8I1MLShjRR0SB7QYWoaoG2GJlPSJIY29JiSHoeYFwxtvjKIoc3Hy8oUhjYfNwxtPl6mM7ZwQwNooG8awNAWNyZhaIuz4cgACGBoG67RBJMRvz0aQAPT1gCGNjpNfes0dZFfJk195QJDm4+XNIyhzc+si7KfwjUwtKGNFHRIHtBhahqgbYYmU9Ikhjb0mJIeh5gXDG2+MoihzcfLyhSGNh83DG0+XqYztnBDA2igbxrA0BY3JmFoi7PhyAAIYGjD0DJtQwvpozE0MFwNYGij09S3TlMX+WXS1FcuMLT5eEnDGNr8zLoo+ylcA0Mb2khBh+QBHaamAdpmaDIlTWJoQ48p6XGIecHQ5iuDGNp8vKxMYWjzccPQ5uNlOmMLNzSABvqmAQxtcWMShrY4G44MgACGtuEaTT
AZ8dujATQwbQ1gaKPT1LdOUxf5ZdLUVy4wtPl4ScMY2vzMuij7KVwDQxvaSEGH5AEdpqYB2mZoMiVNYmhDjynpcYh5wdDmK4MY2ny8rExhaPNxw9Dm42U6Yws3NIAG+qYBDG1xYxKGtjgbjgyAAIY2DC3TNrSQPhpDA8PVAIY2Ok196zR1kV8mTX3lAkObj5c0jKHNz6yLsp/CNTC0oY0UdEge0GFqGqBthiZT0iSGNvSYkh6HmBcMbb4yiKHNx8vKFIY2HzcMbT5epjO2cEMDaKBvGsDQFjcmYWiLs+HIAAhgaBuu0QSTEb89GkAD09YAhjY6TX3rNHWRXyZNfeUCQ5uPlzSMoc3PrIuyn8I1MLShjRR0SB7QYWoaoG2GJlPSJIY29JiSHoeYFwxtvjKIoc3Hy8oUhjYfNwxtPl6mM7ZwQwNooG8awNAWNyZhaIuz4cgACGBow9AybUML6aMxNDBcDWBoo9PUt05TF/ll0tRXLjC0+XhJwxja/My6KPspXANDG9pIQYfkAR2mpgHaZmgyJU1iaEOPKelxiHnB0OYrgxjafLysTGFo83HD0ObjZTpjCzc0gAb6pgEMbXFjEoa2OBuODIAAhrbhGk0wGfHbowE0MG0NYGij09S3TlMX+WXS1FcuMLT5eEnDGNr8zLoo+ylcA0Mb2khBh+QBHaamAdpmaDIlTWJoQ48p6XGIecHQ5iuDGNp8vKxMYWjzccPQ5uNlOmMLNzSABvqmAQxtcWMShrY4G44MgACGNgwt0za0kD4aQwPD1QCGNjpNfes0dZFfJk195QJDm4+XNIyhzc+si7KfwjUwtKGNFHRIHtBhahqgbYYmU9Ikhjb0mJIeh5gXDG2+MoihzcfLyhSGNh83DG0+XqYztnBDA2igbxrA0BY3JmFoi7PhyAAIYGgbrtEEkxG/PRpAA9PWAIY2Ok196zR1kV8mTX3lAkObj5c0jKHNz6yLsp/CNTC0oY0UdEge0GFqGqBthiZT0iSGNvSYkh6HmBcMbb4yiKHNx8vKFIY2HzcMbT5epjO2cEMDaKBvGsDQFjcmYWiLs+HIAAhgaMPQMm1DC+mjMTQwXA1gaKPT1LdOUxf5ZdLUVy4wtPl4ScMY2vzMuij7KVwDQxvaSEGH5AEdpqYB2mZoMiVNYmhDjynpcYh5wdDmK4MY2ny8rExhaPNxw9Dm42U6Yws3NIAG+qYBDG1xYxKGtjgbjgyAAIa24RpNMBnx26MBNDBtDWBoo9PUt05TF/ll0tRXLjC0+XhJwxja/My6KPspXANDG9pIQYfkAR2mpgHaZmgyJU1iaEOPKelxiHnB0OYrgxjafLysTGFo83HD0ObjZTpjCzc0gAb6pgEMbXFjEoa2OBuODIAAhjYMLdM2tJA+GkMDw9UAhjY6TX3rNHWRXyZNfeUCQ5uPlzSMoc3PrIuyn8I1MLShjRR0SB7QYWoaoG2GJlPSJIY29JiSHoeYFwxtvjKIoc3Hy8oUhjYfNwxtPl6mM7ZwQwNooG8awNAWNyZhaIuz4cgACGBoG67RBJMRvz0aQAPT1gCGNjpNfes0dZFfJk195QJDm4+XNIyhzc+si7KfwjUwtKGNFHRIHtBhahqgbYYmU9Ikhjb0mJIeh5gXDG2+MoihzcfLyhSGNh83DG0+XqYztnBDA2igbxrA0BY3JmFoi7PhyAAIYGjD0DJtQwvpozE0MFwNYGij09S3TlMX+WXS1FcuMLT5eEnDGNr8zLoo+ylcA0Mb2khBh+QBHaamAdpmaDIlTWJoQ48p6XGIecHQ5iuDGNp8vKxMYWjzccPQ5uNlOmMLNzSABvqmAQxtcWMShrY4G44MgACGtuEaTTAZ8dujATQwbQ1gaKPT1LdOUxf5ZdLUVy4wtPl4ScMY2vzMuij7KVwDQxvaSEGH5AEdpqYB2mZoMiVNYmhDjynpcYh5wdDmK4MY2ny8rExhaPNxw9Dm42U6Yws3NIAG+qYBDG1xYxKGtjgbjgyAAIY2DC3TNrSQPhpDA8PVAIY2Ok196zR1kV8mTX3lAkObj5c0jKHNz6yLsp/CNTC0oY0UdEge0GFqGqBthiZT0iSGNvSYkh6HmBcMbb4yiKHNx8vKFIY2HzcMbT5epjO2cEMDaKBvGsDQFjcmYWiLs+HIAAhgaBuu0QSTEb89GkAD09YAhjY6TX3rNHWRXyZNfeUCQ5uPlzSMoc3PrIuyn8I1MLShjRR0SB7QYWoaoG2GJlPSJIY29JiSHoeYFwxtvjKIoc3Hy8oUhjYfNwxtPl6mM7ZwQwNooG8awNAWNyZhaIuz4cgACKhy4A8GaAANoAE0MA0NyNA2jXRJE732WQOaNO1z/rvO+5p7D2Qy4XR93T5f766HDmYbXj4IM/o5czRAWeL/Z5/rNvKOfqelAdpmaGta2honXRnaxolHHHSMBiajgfvWH8wef46+VFM9PfzUwezuR+DVlJedt2LtgWw7Y0ON/9/J0GbsvNvV9xzIXt82fnzv9Th/MnUxHOGIBoargQFYc8a6RQxtY2Ej0qwQeO/DQxl/MEADaGAWNHDg0OeZ/mbhXrgHyiQaQAMeDVD/oRePXjgXvcySBqj/0PMs6Zl7Qc9tNEB9iH7a6Ie46KfPGqD+Q7991i95R79tNED9h37a6Ie46CdFDcyK/2bS94GhbdJESa9XBFKsrMgT/0TRABoYRwNW+Y4TlzhoDg2ggT5rgPoP/fZZv+Qd/bbRAPUf+mmjH+Kin1nSAPUhep4lPXMv6NmjAeo/9OLRC+eil1nSAPUfep4lPXMv6FkaIFQTwNBWzYW9AyHAPwj+QaABNDArGrBqe1buh/ugbKIBNNBUA9R/aKWpVjgPrcyaBqj/0PSsaZr7QdPjaoD6EO2Mqx3ioZ2+a4D6Dw33XcPkHw2PqwHqP7QzrnaIh3ZS1YDVa2xHCWBoG+XBt4ERSLXCIl/8M0UDaMCrAau+vfE4H62hATTQdw1Q/6HhvmuY/KPhcTVA/Yd2xtUO8dDOrGmA+hBNz5qmuR803VQD1H9opalWOA+tzJoGqP/Q9KxpmvtB01avsR0lgKFtlAffBkaAfw78c0ADaGBWNGDV96zcD/dB2UQDaKCpBqj/0EpTrXAeWpk1DVD/oelZ0zT3g6bH1QD1IdoZVzvEQzt91wD1Hxruu4bJPxoeVwPUf2hnXO0QD+2kqgGr19iOEsDQNsqDbwMjkGqFRb74Z4oG0IBXA1Z9e+NxPlpDA2ig7xqg/kPDfdcw+UfD42qA+g/tjKsd4qGdWdMA9SGanjVNcz9ouqkGqP/QSlOtcB5amTUNUP+h6VnTNPeDpq1eYztKAEPbKA++DYwA/xz454AG0MCsaMCq71m5H+6DsokG0EBTDVD/oZWmWuE8tDJrGqD+Q9OzpmnuB02PqwHqQ7QzrnaIh3b6rgHqPzTcdw2TfzQ8rgao/9DOuNohHtpJVQNWr7EdJYChbZQH3wZGINUKi3zxzxQNoAGvBqz69sbjfLSGBtBA3zVA/YeG+65h8o+Gx9UA9R/aGVc7xEM7s6YB6kM0PWua5n7QdFMNUP+hl
aZa4Ty0MmsaoP5D07Omae4HTVu9xnaUAIa2UR58gwAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAIEFIoChbYHAc1kIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQGCWAoW2UB98gAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAYIEIYGhbIPBcFgIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAARGCWBoG+XBNwhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhBYIAIY2hYIPJeFAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAgVECGNpGefANAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABBaIAIa2BQLPZdMgcPToF9nBg5+OnZnPPj+affHFF2PFP3Dw0Nhxx7ogkSAAgcEQOHbseLb/wKHs+PHjY93zsWPHsiOffT5W3MOHP8uOHPlsrLhEggAEIDApArTxJkWSdCAAgbYEXt3ydnbldbdny65fm6mN1Ta07UeqrTZOaFuvjnNN4kAAAt0QUL9RZXyWQts6i/G+WVID9wKBYRFgTHBYvzd3C4GhE2g7FzFu/7htXdu2Xz/03537h8CsE2hbxyzUHG/bfnjsd8XQFiPD/jkEnnl+c3by6Yvm/J296Ko556a644M9e7OVax7IfnjqedlXvn5y9vf/8J/53//5p+9l3/7RWdk9Dz6VqZKoC7s++Cj7w+Jrsn/93i+L+N89+ZzssmWrsljjR6a39U9uysRK8XQ9u/ZX/+2UfP/7uz+su+zIsZ3v78nz+52TfocpboQMXyAwP4HFl68o6rE/XnL9vBGuv+We4vz//cWfJjIROe9FnSeoA3T7uvXZqb++OPvn/zi1qF9Uz3zjOz/PJ1A/PXykNlUZ2C5fvjpTfWb1k+qrcy5cnu1474No3K3b3s3jqQ79x3/5nyKuPovXS5vfjMYtH9BEiv7PKK0tW98pH+Y7BCAwBoHPPz9a1GEqX2vWPVqbisrhb869sohz6V9uqT0/lYO08VL5JcgHBPpFIKzvwr7u8y++PrEbUf/T2lYySHjCpPqRaieqXWb9UPVBzzh7Sfbcplej2ZlEvRomvuTq1XkbT3khQAACC09gy5vb8zaf+otWR6mOUB/wJ2cszlbedn/2/q7m41QLfUeTqLMY71voX5HrQ2B6BMKxQLX5VAfGwk2r7sv7w4rTh8CYYB9+JfIIgTQIhPMcYf9X7b6+hEnMRWx6aUveH1a/WO1gtYHVX151x8NRDG3q2kn166OZ4wAEBkRgVuc62tQx9vMvxBzvJPrhlv+6LYa2OjocGyGggWcb5Aq3+qfflxCa0MJ7CD//9ynnRk1iaiyFho0wnj7LhPHJ/oNzcDz29AuV7Mrxn97wypy4VTsuXHJTkZ4qbwIEINCcgMppWPZefjVuuNr3yYGRcxVv3FUZm+fQf6YG2cJ7qvqsukuNi6qgeis0spXjK27VYJ8aWeVzq75fsfy2qsvO2ffIExuL9OomWOdEZAcEIBAlILN9WC41SFNncFVbJDxfdUMfAm28PvxK5BEC6RGI9e3ufuCpiWW2jaGtbT9SD2st+vONI/V6WMfr87oHnqy817b1apioHo4wM50eoCBAAAILR0D9WZl5y3VB7HtfTKht6yzG+xZOk1wZAl0QKI8FnnTaBdHLWh2pOH0IjAn24VcijxBIg8Cv/nBFZRuwL4uWTGIu4r6Hn6lkYG3hcxdfW7mgQZu6tm2/Pg31kAsIpEFgVuc62tQx+mUWao63bT+8qaowtDUlxXnZ9p27szvufbz4+9q3Tsv/8ffR0KYC9pfr1mYPrd+QqQGjp9NtgF0NlyrzhYxj4YSHnjDfsOm1fOW1sCFY1SEOGyxqHMrp/+SzL+V5CAu78vDR3k/mqE0V9J33Pp6vsqRV2axxpS2Gtjm42AGBWgLlQSw9fR4LqifC8qbPKRvaVEddvHRldtd9T2Sqdy649IaR1ShjxhSt7Gb3qfrw0Sc2ZlqZRHWh7Vf9VF6FMuxE6immG2+9N7/urWsfyn70s/OLuEojZlB7+PHn83hhHurOj/1W7IcABKoJlDt5Kl964jwWZOy3cq9trN6IxV+o/daeoo23UL8A14VAPwnc/8izRf92yVWrivovRUPbOP1IPWlvdfo3f3Bm3qfU6rky2YV921dee2vOD9imXlViemhEbULlO+xrY2ibg5odEOiUwHkXXVvUC6of1I9bsfr+fGzrupV3Z2eed+XIqt861ofQps5ivK8PvzB5hEA7AuWxQNV/WqGnKvTV0MaYYNWvyT4IQCAk8MLLbxT9X831Wl+xj4a2ceYi9MC+3bPqTLVz1T/WQ17hvOu1K9eF2PLPZjYZp65tOz88JzPsgMCACczqXEebOkZyCOdXu5zjbdMP98gYQ5uHFueOEFCDQf/8+2Ro02uzZCTT67TK4b1de4qBdr2yrxxkfrPGTnlAT+n94uwlxfHy60M3v74tkwFu78f7y8nm32U6sbRlsCsHpWfHy1sMbWVafIdAPYGqQSw9jV0OahiFk29W9lI0tK2957HstrseybSkbDmojvj3//p1UYeUV2nb89G+4pgaPeXXLsuAa/eu1zKHQYzU4X3r7R3h7uKzzGoWV69qrgq2tLedZ9uYAa4qDfZBAAJxAlWdPA2+VL32Tu0VK4O27YuhjTZeXAMcgQAEmhF47Y2/FnVgKoa2tv3I8CE0rTwcBr3e3er6cxZdHR7KP7epV5WA2oiWfrjF0DYHNTsg0BkBvVLTyqMGnvcfOFR57WPHjuUTe1/5+sn5RF/lSYntbFNnMd6X2I9JdiAwBQI2FqiHL22sT/uqQt8MbYwJVv2K7IMABJoQsHZhXwxtbecizr/0+qItXH6oSw/ua15YTDRfUZ5DblPXtu3XN/ktOQcCQyEwq3MdbeqYhZzjbdMP92gWQ5uHFueOEKgztGmQ7MH1z+UrkKmRIDeozAxLr1mTr95TNkxYwnpCQIauDS+8lu/avmNXdsvtD+avQ1AaV91wR1Y2i1ncSWxP++2lRYPm0KeHR5LUymtqzKjTW27M6MR33n2/iKtVnTxBZhBrPKrwl8PBQ4dzduKnP00u2/kY2sq0+A6BegI2iCUTqgboVZZUv5SD6h4r85rkszJXNrRpEuDxZ17M9OTOn5bclGnFRqWn1zvpSafYq/0UT/Wd/j7+2wSjJg7e3LYje+DR5/In5Mv1UDmPTb9rdQzLv0y9YbhmxV3FMV27HFTfyfyi+Fq5yRtsoFCrglQFrRxidZsNGupaGNqqaLEPAn4CYSfvoitWFuVdJthy+NlvLsmPn3z6osyerqkytHnbecqD1Xcy
UNQFvfJU55brqro4TY7RxmtCiXMgMGwCdYY2q5tir6pX+9DquZ3v7xkB2eaVoyMJVXyp60cqr9b+U7u2KqjNaufE2qxV8bSvrl7V8fVPbiraeGrr2XUwtMWIsh8C0yegesrKoj7PF/QAhF5dUhVU76mca/xL9YH6cuoTqy6tChrfs3pSx3fv2Ztp0F5jhaefdVket6o/WpXWOPvq6izG+8YhShwI9IuAjQWqrgrfxqBVc8rBxqZihjed/+7O3dnqOx/J6zA9HKr2zcOPbciOHv2inFw+F6L6b74+7htvbS/qyap05iTccAdjgg1BcRoEBkjA2oVVhjYtAKC5DdVv51y4PJ/v0Fzv1TfeGW3vCeFCz/HG5iKOHDmxeIHGP6vCmnWPFm3l2CqeVfG0r66ujcWx/XX9ejuHLQQg8CWBFOY67Lco93G1X3WNVn5U2++Z5zfbqa23dXXMQs/x1t1cXT+8Ll75GIa2MhG+NyZQZ2grv67KGka2lcNdBb0cLE2tJiRjl50fbtUgeXbj5CqBMA/hgH55pSPLgzqpsWCmjx+eel7slMr9oRmuyQC/Go2WHwxtlUjZCYEogXAQK2wEhHWSBudt5TCVNxnVrMyVNtbd7AAAH/ZJREFUDW3hQJidE25VZ2lQqxxe3fJ2kaZe76kOk3W4LL6e3plECDtj5frT6l1dO2Y2/uXvryjyGjsnlk+rF7Vs93zhxc1bi+tgaJuPFsch0IxA2MnTQJR1IlTHhfWZ6kCre1QW6wxt3naermN1gVYLqnowQHejlWwtD1WrBTW74+qzaONVc2EvBCBwgkCdoc1Wu1VdUhX0oILVX7evWz9yyjQNbXX9SL0K3vKkCdKqoLzaOar7PaGuXq1Kx67TpL9bFZ99EIBAewJhv7DqVUpNr6B2ox5YsnJd3up1KWUzxvW33FOcv/ymEw9VleNOcoXM8H7q6izLA+N9ITE+Q2C2CIRjgXp41MbfvvGdn2d6uDQM8xnawjcZWP1hW7UZt+/cHSZX9MF1zkd7Pxk5Fn6xPKqv7h17C9Mpfw7rfsYEy3T4DoFhE7C6q2xoUx1kx2JbrXgp40Y52FzDQs3x2vhjeS4i7O+rj14VwrFRzQl5Ql1dO186df36+eJyHAJDI5DCXIcxD/u4WpgonEe2utPObbutq2Os3k1hjrd8n3X98PK5dd8xtNXR4VgtASsg6mSVg0106pjc7lqpSE9dqpNohVjLt5ZNY5amnaOtVlDSSiFKy/brc3lwrJwH73d1Xu0aumYY1NG1a1953e3hoZHPyqfO0316wp3B++r1eb6AoW0+QhyHQJyADRBpgEqdLuvkhB03uedVltUA0ARl2BAJDSC6ihnadK7qsPMuujbT00pWD1rdse2dnSOZCg1tNpBm59p2Uoa2M8+7sqjDyquG2ATt93/8+5H8hV+uW3l3Ed9WkwuPxz7veO+DIp7yMF/A0DYfIY5DwE+g3MkLB3DCCUO101T32IpsTQxtnnae1ZW6hky8VSGc3IyZL6rizbePNt58hDgOAQiIQKx+1DFrL6VmaKvrR16y9OaiHVaeqLVfXG1Na3fqVfFNQ129GkvDroOhLUaI/RCYPgGtmGFlUWNyWiXNG/TWBEtD23MXX5u/WUFvVLD2o/ZrcD8M4WC/xVdfXKuj2euRbb/q40mGujqL8b5JkiYtCKRLIBwLVC71tgCrc8J+sY7VGdrCBwZUh2ksUW0bW+lRaWqMLxw7k4nMrlWuG43Ytr/uLM5pYzi29MItY4IhDT5DAAIhAaubwnkRHQ8NbeoL6400mh9RfWJzKYpbNd6/kHO8dXMRWiXT7nfjS1tCDCOf7ZwLLr1hZP98X+rq2vni1vXr54vLcQgMjUAqcx3iHvZx7W1gVofYdlK/T10dY2OWKczxhvdb1w8Pz2vyGUNbE0qcU0nAGiaazCwHvZLl7XfeK+/Ovy+5enXRcCi/ssXSVEHXhOqWN088SS7hq+FklcCGTV++lrTyImPsVOfV0i6/9jNsCK1YfX80db0mQWmo49o0aIW1cPBu399ePVgXH0NbHR2OQaCeQHkQa3mw4qFeo6eVg2wg3uqCOkObXp8nI5bqqHJQXWj1ipZ9DUNoaNM5qkv1dJD2Kx+abJCrv20I66+q135aQ+snZyyOXip8+rRszItGyrJ8YM/uf75XKygdDG11NDkGgfEIlDt5SkVPUapsqv2hQaoP9+4r6iorq1YPmsEtvPo47bw9H524htpL5SCzsNVHVdcsn+/5ThvPQ4tzITBcAn0ztM3XjwwHu2K/qtp11lZTe69pqKtXY2nYdTC0xQixHwLTJ6D2Vvigqcql+oF6gEntO62WO1+wV9RrMjNc5VzxlL5WOVO6GhcLX1caDvYrD+WJxPCJ86q24nz5qjteV2eF/WXG++oocgwC/SZQHgvUK5Wt/6nxuPANKDFDm0xq9kCqJg7Vxw2DPRyrOlAPulpQn1vX0H6ZiatWLNfKljquv7pV3CzNptuwjmNMsCk1zoPAcAhYvVM2tImA6rSq+kj1Z/ggf3lOZCHneHUfdk82vmm/Zmga03xOLFg972mPzlfXxq6l/fP16+vicgwCQySQylyH2Id9XNU9qhvvuu+J3B+jeqHc5x3395qvjrE2bQpzvOE91vXDw/OafMbQ1oQS51QSsIZJlaGtMsLfdqoRZI0KDViFwdJUB6uqc6fVOixu+VUuYTrezxqEs4aKtns+/HgkCRnvmlw3bDA1XUFOT7Na2nqKvknA0NaEEudAoJpAeRBLK7BZ+deTN6EJTSYPhTpDW/VVTuw1w2p5RY/Q0KZ6QA2xSQetQGemFNUzjz/z4sglVM9a/aPXisZC2PDY8EIzM/H9jzxbpB2r08vXw9BWJsJ3CLQnUNXJUzm2sq8VeZZctSr/rvrC2l9Wd3jNZXXtvHCJ6XJba/2Tm4o8PbR+7muaxyVBG29ccsSDwPAI9M3QNl8/0lYKkekkFsKVlpZesyZ22sj++erVkZODL/Z/B0NbAIWPEFgAAm9u21EYK6xchlsNhp91/rJsU8XKFeGY3KNPbKzMvR5utfTCvmM42B8bL/vhqecVcctvdKi8WIOd89VZjPc1gMgpEJgBAuWxQN2SJhytvrrl9geLu4wZ2sIHYvUQalUw06/S1XijBa26Ztcqx5WZwcYlfxF5vb2l49kyJuihxbkQGCYBq5eqDG11RB5c/1xRp+l1mWFYqDne+eYitNiA3a/ah7FgBuSm46Hz1bWx69j++fr1dh5bCEDgSwIpzXWEfVyZZm1eZZK/1Xx1TGpzvHbv8/XD7bymWwxtTUlx3hwC1jCJGdpUiDSQddtdj2S//eOyfOltPb1kTlE1HvQUaBgsTblYq0JYUXnfYV6VnvbJsKKno6wx8/BjcydRNZBnx1UpxcI5Fy4vzosN0IVxw2XKxUZPNzQJGNqaUOIcCFQTqBrE0kpsVsbNgKZXh1poYmjbvnN3vsKaXtunukxl2jpASluvJA5DaGirmiwIzx3nc/h0vK4fPh1q6ekcu++qJcLtvPAp03BSwo6Xt+GrbDQopyc
ImgQMbU0ocQ4EfATCttMdQRtGRlOVf9VTNnj+wKPPFYnPZ2gbp50X1g16BWkYbNU4tRNVN00i0MabBEXSgMBwCPTJ0NakH2lPzatejYUP9uwt2oJNDG1N6tXYtazNiaEtRoj9EOiOgF6zKXOF+qxWNqu2Wpni08NHioyFDzo99vQLmV6jV/4LHxALH0QNB/u1WlFV0Pih5ePdnburTnHta1JnMd7nQsrJEOgtgaqxQK0qZGOAegDA6ruYoc3eHFPXtlLdaPWYxv0shA9+aZ4kDHrIzOI0GXML48Y+MyYYI8N+CEAgJGB1T8zQprrkldfeytuNekhVDx9opV0bR1T8za9vC5PM50W0v8s53nC8MTYXoXFIu9+6uQqbz2liaGtS147AKX1p0q8vReErBAZPIKW5jiZ93DY/WJM6RudY3ZbCHK/ut0k/3MsFQ5uXGOcXBMx8VmVok/Oy/BoDK1Dh1mto08Ut/lU33FHkZdwPH+/bX3Rcla4qn6oQPmF669qHqk7J94Urj0RP+tsBpWP3IkOdCnjTgKGtKSnOg8BcAlWDWOGr8Kxc7nx/TxG5ztB28OCnWfgEpsUvb7s0tGlQLqyP9NoXNWyqgnVAT/vtpVWH833h61/CV0FXRVAnN7z38pOnVXFsH4Y2I8EWApMjEOvkaVWNsKyqLRK+JqDO0DZuO093Ze1DTRhYvaSnOS0veup9EoE23iQokgYEhkWgL4a2pv1IrfChulVtvVgI69+6V+0pftN6NXYtq+cxtMUIsR8CC0NArwVVn+3mNQ9mGgBXG83Kq7bqC1tQ+Q2Pzfc5fIChyWC/VhS3NJ9/8XW77FjbpnUW431j4SUSBHpHoGosUDcR9ovt4fmYoc0eCvv+j38fvf9wJUs9HBoGmYStjtNYogWNF2p/7HWkdl7TLWOCTUlxHgQgYHVSlaFN7TKbN7DzqrZeQ5uoWzqTmONtOhex+s4TD0689faO6I9vbeGf/uqi6Dk64KlrqxJq2q+viss+CAyZQEpzHU36uOP+Vp46xurqFOZ4m/bDvVwwtHmJcX5BIGZoU8fNCo8aJuowatDrngefyjQgtXXbu0WDZSENbXoa3Z7CUj7rJlD3fXIgmucCSJZlej+x0lIHtC7ctOq+Ij0ZAnd98FHd6XOOYWibg4QdEGhMIDaIpZXVrDOlpy7DEDO0qfGk8m7x9JTm4stXZHoSXU+qy/xlT713ZWjTqwrCQTLVS9oXC2YuKecvPD+ss1R3xoLeCR/W/89tejV2auV+DG2VWNgJgVYEYp08rYwRtoPW3vPYyHVihrY27TxdIDTI6kl0hQuX3FTUo+r0tA208doSJD4EhkmgD4a2sE02Xz9y0Z9vLOrW2C8a3nN50jWM46lXw3jhZ2svY2gLqfAZAukR0Ks+V952f1F/qOzaimqa7LSyrMHy+f7CNyA0Gex/8tmXivTbGNo8dRbjfelpkBxBYBoEYmOBWnncjmk8SybfmKHNVu2pMzno4S+rJ7XqZBg2bHqtOKZ+sYLqKztf7by2gTHBtgSJD4FhEbD6p2xo0xse7Ji2elhe86dq22n8Pjy+kIY2z1zEI8GDveU8h7+63bfmimLBW9eW0/H068tx+Q6BoRNIaa6jSR93nN/LW8ekMsfr6Yd7uWBo8xLj/ILASaddkDdqyiu0LblqVdHY0esIqoI1ChbK0KYn0WU8sXysuuPhqmwW+zR4Z+fWNWRsYji2HK06yeGrDTVZrCXHvQFDm5cY50PgBAEbqNIAVRj0ShMr5+WndGKGtvB1Kn+85PrK1wbba/TKhrFpvHJUr47R0t92H7pHWwEpvNfws+Wvzoir15VamrHXKYedQg0Cvvzqm+FlGn3G0NYIEydBwEUg1slTInfd90RetvX0Ydn4GjO0tWnn6Zp6jYsZX1Vfhd/rlsVuetO08ZqS4jwIQKBMIHyyu9yPtQcUtCpvVdh/4FDRVgpfsadzV655oDj2Wc1DBlXp2r5x+pGacLD2254PP7akRrYPrd9QnKNJ1qrgrVer0tA+ywuGthgh9kMgLQJW76ns2hsFwnrl4KHDrgw3Gey/M5g8fW/XiRXTPRfy1lmM93noci4E+ksgNhaoO5KB1toplyy9OWpos/E21Y+x8NLmN4u09PrRMKg9Zw/FWhphvSqDbZvAmGAbesSFwDAJWN1XNrRZnanxu6q3tYR1XdkcZguhxF45KtJ23TYrtHnnIsJ5h9jDXHs/3l/k7Yrlt1WKYpy61hIap19vcdlCAAJfEkhprqNJH9f7u41Tx6Qwx+vth3u5YGjzEuP8goA1asomCBv0krkrFqzBshCGNr1KwSZSlY/7H3k2ls2R/eZwLd+vnRQ2ds65cLntLraaKLYnvHRdLU8eLi9enNjgA4a2BpA4BQIRAlZ3lQ1tOl3/dPXKkXKIGdo00GX1Wcycao2JaRva5H43A4rypDyrkzRfuODSG4p7iE12mlm3bGC2tG+5/cEiDRljtv11px1ybcOOpXd1N9eFOBkCAyJQ18mT4VUrrlWtFGv1Sdmk36adZ9jDFdnCFYS0UlCbQBuvDT3iQgACzzy/uWjPlA1t1n7URGZVmKahbdx+pFbBtHZqrM8bPrSwu2IV3nHr1SpGlhcMbVV02AeBbgi8v+vDyoewqq6uVcut3Jp5LZw41MNdntBksP+s85cV17RV4TzXGLfOYrzPQ5lzIdBPAtaWqxoL1B3Z2J3qPY3Za6s4YTh38bX5fs0rqJ9dFVbd8VBRj1WNL4Z1oUwitupbLF9V16jax5hgFRX2QQACdQS0Kq+19VS/WThw8MTDWnqAvyostKFtnLmIcP42vN/w/rQCnTGpMr2NW9fqGuP268P88RkCEMjyNpiVU60WGYau5zrCdt04/dcw7/o8bh2z0HO84/bDy/df9x1DWx0djkUJyCFqFUZ5mW170kimhqqVgdRZs7hdG9r02lO7tjqfm17aEr3H8oEbb723iKvCWQ66F0tbpowwaLly6wzrnDPOXtJ4EDFMxz5jaDMSbCHgJzDfIFZVijFDW/ia0qqBKnWUrE6cpqFN9arqXKuDVqy+v+o2KveFr5pS3VIOesrK0tWTo2HQe9xDM4oMMGp0jRswtI1LjngQiBOoM7TFY2WFQbZsaLM6bZx2nl1P5mGrV2xbniywc5tuaeM1JcV5EIBAjICewLY6qbzSrPpvdkwD/+WgwW47PskV2tr0I7UanD3IpUna8uCa0rb2o46XQ5t6tZyWvhsfDG1VdNgHgW4ILL58RW6ekMm17uEnrY5m9YetIqQcqq9nZVltwpihQ+dq1cfwgan5BvvD9mFsNcw6Sm3qLMb76shyDAKzQWC+scBwbMzquXIfNWzvVY27aR7EDGpqY5VXQRfJj/edWP3H+ta6XnkuwUOdMUEPLc6FAASMQPga5LBOU/vN6sFfnL3ETh/Z6nw7p8sV2trORdjqcaqjPy6tiqm2sR2vMi63qWvb9OtHwPMFAhCoNbTV4Yk9vG/tsXHmOubr49blp3ysTR0TtmO7nuNt0w8vM6j7jqGtjg7HcgLrn9yU6SlMPc
mpf/JqoNgS22q0aOAnDGrkWGNGg2Vbtr6TydWv95mHT4DrnC4NbWGB1rVvXvNg9uzGzdG/ckMsdPCrYpOpTa/e06uywgJbtTLdT85YXDDRYKCe/q+7drkxJb56Itb+lly9ukhP59r+qo5y+NvwGQIQyPInLFUHeJ5+jBnaVt/5SFEWZVhTR1DlUfXe9TffXUwC6HrTNLQpffs776Jra+sXrXxWNhvbE+lKY90DT+b3oHM0mav6ztIum9XCzqvO0UogdXXb9h275khQkyBWh4WvcH30iY3F/rqJkjkJsgMCEBghMGlDW5t2Xpix8Al41R9VTz6G59d9po1XR4djEIBAmYAezlIfVnWHXu2k187fdteJNp0Gr8v9quU33VW0h1QPynihgX61V0467YLimOqzSRra2vYjwxUx1Tc3c4naZGGf/oFHnxvB1LZeVWIy1FkbT1trT150xcqR/SMX5gsEIDBVAqoHrCx+8wdnZurPqQ8rg4XKrIxsK2+7f6QPqDHBMFyz4kR9qPGvJ599KR8X0yTgjvc+yPQqY6sXw4dBw8F+GVs1YK+xQvUxtSqmGeiUP6XjCW3rLMb7PLQ5FwL9JDCfoU13dfpZlxV1pOqisqFNDwfYZKiOq82nuQGFPR/tK+o+HdMrlGMhfFBC52pMrk1QGvbHmGAbksSFwOwS0OqRer2y6iq1vdR+MwOu6g+1pcIQtsvUV1YbUa+g16uUrZ1n9U55HtVMYdN45WjbuQj13y3fyt+2d3bmD3nozTsXL11ZHFNdWg4WT1tvXdu2X1/OC98hMGQCKc11hH3c8kOk3t+oTR2jay3EHG/bfriHEYY2D62BnnvOoquLf+RhgdJnmbPKxoit296Nnl+O36WhTQNp5evXfa/qTIbLhlfFVUOvatW3cLKgKl55n0whYdCrwMrnVH3XspIECECgnkCTQaxyCjFDmwytYeevqlzavq4MbXa9um3ZNCvjWthRrYqrDmM5XHXDHY3qJkuvqo5qwq+qPi7nhe8QgEA1gUl38tq088IcanLU6gYZZ20iIDyn6WfaeE1JcR4EICACerWm1T9V27K5S3FkupivrWRpTdLQ1rYfqYH5cOJVeSzfhyZvy3XwJOrVsxddVcvZeJXbpagUAhCYHoFwos7KYN226ulujQHaRGVdXB2LGdrq4t269iE3gEnUWYz3ubETAQK9ItBkLFCG/7B+KhvadMOvvPbWnLZU+CCo4quOrJvUlKkkvI5endcmhGk1+VxuezEm2IY+cSHQDwJ14+96+0o5hA98zVevdGloazsXoXbsr/5wxUgdXL4/zUOUH+oXn/J5830P69q2/fry78N3CAyZQEpzHdMytM1Xv+h4WMdIDwvRnptEP7ypljG0NSU14PPKq6pZQdJAmAb2q4JW2fnK10+e809eHTq9ls/SUGEPgznV9Q8+Fixu+dV3sfNtf/gqO0ujbhszUMhsVu6oKh2dv+2vO+1yI9vyKiR119UxPSkQhvkmXSy9qsZnmA6fIQCBEyu0nXnelY1xhE+yl028GvCygTEri9pq8lBPO9lTS6r/whC618OB/vCcpp/D6zb5rGWuy0F1c3nCU2mpvqua2FV81cNNrmfnaJWQcqjrUFu8WH1cTovvEIDAXAJHjnxWlNO6p8TLMa0+0CvTy2Hcdl6YTvhaqb9ctzY85P5MG8+NjAgQGDQBDU5bGyPcqt7TSkWxoIF6exVBGE8PgO18f0+R5h2lFTk0SWnnl1d+i13L9rftRyodrY526q8vLvJgeZGx7c/Lbq2ccJ1EvVr3YJzlQVutkkeAAAS6IaC+rN4YIMNp2dwalkuNzb25bUdtpjQ2VlUnKh09/HrTqvuygwc/LdIIB/vLqxMpjurgcV+5N4k6SxllvK/4ufgAgZkjYON2840FnnPh8qLNVGVoExiN02tln7DetM+q6+rMbIqvFS3D+jM2v9L0R7BrN90yJtiULOdBYHYIaFXdch2hMX8Z18pzHbpr1VN6+0w5jr5rvlhzHnbs1S1vj4Ca5hzvJOYidG96A1ZVW1h5j9XJdr9Nt2FdO4l+/QhkvkBgwARSmutQn9fqhPnaf/P9ZJZO021Yx1jaXc/xTqofbvmv22Joq6PDsYKACobMWnoK6f3dH2Z6V/l84chnn+dL1ep1A3rdqJ4On6Wg5XnVcNNTVbFGzizdL/cCAQjECaixosaCXs8so8f2nbvjJyd+RJOeWmny8WderHwaKfHskz0IQKAjAm3beVqx0Tpoem1BSoE2Xkq/BnmBwHQIyFgmE5oeMNBr79T+aRI02P/W2zvyNp8G7vWKvr4E3fOWre/keZepuO1gW1/um3xCAAJxAhrLUj9WZl6NbTUd7wtT1ISCzG8yymncUKuYV4XQ0Kb6R0/Wqy7SOGPTOrgq3Wnsoy04DaqkCYHZI6C6SxN5ejBdD7s2mS9JnQJjgqn/QuQPAuMRkIlrz4cf531fPail1603CfsPHMrndmX61+o/ar/NShAT1d2az9G4QJ/69rPyG3AfEEiJQNu5jpTuZRbbcxjaUlIYeYEABCAAAQhAAAIQgMCME9CDEmZm01L/BAhAAAIQgAAEIACB2SZQNrTN9t1ydxCAAAQgAAEIQAACEIAABCAAAQhMggCGtklQJA0IQAACEIAABCAAAQhAoBGBa1euKwxtWpWDAAEIQAACEIAABCAw2wQwtM3278vdQQACEIAABCAAAQhAAAIQgAAEpkEAQ9s0qJImBCAAAQhAAAIQgAAEIDCHwNGjX2T/+C//kxvavvmDM+ccZwcEIAABCEAAAhCAwOwRwNA2e78pdwQBCEAAAhCAAAQgAAEIQAACEJg2AQxt0yZM+hCAAAQgAAEIQAACEIBATuDFzVuzf/3eL/O/hx/bABUIQAACEIAABCAAgQEQuPuBp4o24PHjxwdwx9wiBCAAAQhAAAIQgAAEIAABCEAAAm0JYGhrS5D4EIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIDARAhjaJoKRRCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCECgLQEMbW0JEh8CEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEJkIAQ9tEMJIIBCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCLQlgKGtLUHiQwACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgMBECGBomwhGEoEABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIACBtgQwtLUlSHwIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQmAgBDG0TwUgiEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAAB
CAAAQhAAAIQgAAEINCWAIa2tgSJDwEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAITIYChbSIYSQQCEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAE2hLA0NaWIPEhAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAYCIEMLRNBCOJQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgEBbAhja2hIkPgQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhMhACGtolgJBEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQaEsAQ1tbgsSHAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAgYkQwNA2EYwkAgEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAJtCWBoa0uQ+BCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAwEQIY2iaCkUQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAoC0BDG1tCRIfAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCZC4P8DkrVO0+0ZF3UAAAAASUVORK5CYII=)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7WkYLzFCo4Rl" + }, + "source": [ + "# Fixed time windows\n", + "\n", + "If we want to analyze our data hourly, daily, monthly, etc. We might want to create evenly spaced intervals.\n", + "\n", + "[`FixedWindows`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.window.html#apache_beam.transforms.window.FixedWindows)\n", + "allow us to create fixed-sized windows.\n", + "We only need to specify the _window size_ in seconds.\n", + "\n", + "In Python, we can use [`timedelta`](https://docs.python.org/3/library/datetime.html#timedelta-objects)\n", + "to help us do the conversion of minutes, hours, or days for us.\n", + "\n", + "> ℹ️ Some time deltas like a month cannot be so easily converted into seconds, since a month can have from 28 to 31 days.\n", + "> Sometimes using an estimate like 30 days in a month is enough.\n", + "\n", + "We must use the [`WindowInto`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html?highlight=windowinto#apache_beam.transforms.core.WindowInto)\n", + "transform to apply the kind of window we want." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "amZCkPNZ5gFQ", + "outputId": "8e6c0a13-3f19-4452-ce4e-08b74b24798e" + }, + "source": [ + "import apache_beam as beam\n", + "from datetime import timedelta\n", + "\n", + "# Fixed-sized windows of approximately 3 months.\n", + "window_size = timedelta(days=3*30).total_seconds() # in seconds\n", + "print(f'window_size: {window_size} seconds')\n", + "\n", + "with beam.Pipeline() as pipeline:\n", + " elements = (\n", + " pipeline\n", + " | 'Astronomical events' >> AstronomicalEvents()\n", + " | 'Fixed windows' >> beam.WindowInto(beam.window.FixedWindows(window_size))\n", + " | 'Print element info' >> beam.ParDo(PrintElementInfo())\n", + " | 'Print window info' >> PrintWindowInfo()\n", + " )" + ], + "execution_count": 7, + "outputs": [ + { + "output_type": "stream", + "text": [ + "window_size: 7776000.0 seconds\n", + "[2021-01-03 00:00:00 - 2021-04-03 00:00:00] 2021-03-20 03:37:00 -- March Equinox 2021\n", + "[2021-04-03 00:00:00 - 2021-07-02 00:00:00] 2021-04-26 22:31:00 -- Super full moon\n", + "[2021-04-03 00:00:00 - 2021-07-02 00:00:00] 2021-05-11 13:59:00 -- Micro new moon\n", + "[2021-04-03 00:00:00 - 2021-07-02 00:00:00] 2021-05-26 06:13:00 -- Super full moon, total lunar eclipse\n", + "[2021-04-03 00:00:00 - 2021-07-02 00:00:00] 2021-06-20 22:32:00 -- June Solstice 2021\n", + "[2021-07-02 00:00:00 - 2021-09-30 00:00:00] 2021-08-22 07:01:00 -- Blue moon\n", + "[2021-07-02 00:00:00 - 2021-09-30 00:00:00] 2021-09-22 14:21:00 -- September Equinox 2021\n", + "[2021-09-30 00:00:00 - 2021-12-29 00:00:00] 2021-12-21 09:59:00 -- December Solstice 2021\n", + "[2021-09-30 00:00:00 - 2021-12-29 00:00:00] 2021-11-04 15:14:00 -- Super new moon\n", + "[2021-09-30 00:00:00 - 2021-12-29 00:00:00] 2021-11-19 02:57:00 -- Micro full moon, partial lunar eclipse\n", + "[2021-09-30 00:00:00 - 2021-12-29 00:00:00] 2021-12-04 01:43:00 -- Super new moon\n", + "[2021-09-30 00:00:00 - 2021-12-29 00:00:00] 2021-12-18 10:35:00 -- Micro full moon\n", + ">> Window [2021-01-03 00:00:00 - 2021-04-03 00:00:00] has 1 elements\n", + ">> Window [2021-04-03 00:00:00 - 2021-07-02 00:00:00] has 4 elements\n", + ">> Window [2021-07-02 00:00:00 - 2021-09-30 00:00:00] has 2 elements\n", + ">> Window [2021-09-30 00:00:00 - 2021-12-29 00:00:00] has 5 elements\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4ijww4Vq7jO7" + }, + "source": [ + "![Fixed 
windows](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAACa4AAAGCCAYAAAACWMaSAAAgAElEQVR4Aezd+/MkRZ3v//Mfnd/95cT5xQhjIzaOYWzEXs66YeDejh7cZfELK14RvCyiKBwQWARlVgFlALkpoFwEFBwBEcThKvebXGeG4Tb0N7Ihi86eTw/V9ansyqp8VMREfaq7Kjvz9X7mq7PhFVX/bWajAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpsUIH/tsHP8lEUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQIGZ4BoIKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUGCjCgiubVRuH0YBClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACgmsYoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQIGNKiC4tlG5fRgFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKCK5hgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAU2qoDg2kbl9mEUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoILiGAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhTYqAKCaxuV24dRgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhSggOAaBihAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClBgowoIrm1Ubh9GAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCAAoJrGKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKECBjSoguLZRuX0YBShAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABCgiuYYACFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFNqqA4NpG5fZhFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKCC4hoG5Ak/8ad/MPxpgAAM1MBBtv4axGqM5jQEMrMMAf8TLOrw4Fy+1MMAbsV4L68aJ9b4Y4JtY6osl7WBpSgzwRjxPiWdjwXOfDPBHPPXJk7bwNAYGou/ZpwoIrqV6VHs0hkmsj75sMICBPhiIRt9HW9rAJAYwMCUG+COep8SzseC5LwZ4I5b6Ykk7WKqFAb6J9VpYN06sr8MAb8TLOrw4Fy81McAf8V4T78aK98CAbWsFBNe21qW6Vxklo8QABmphIBp8LeM1TnMbAxhoywB/xEpbVpyHlZoY4I14r4l3Y8V7HwzwTRz1wZE2cDQ1BngjpqfGtPFgui8G+COW+mJJO1gaCwPR9+xTBQTXUj2qPRrLRNZPXzoYwMB2GYhGv912XI9FDGBgagzwR0xPjWnjwXQfDPBGHPXBkTZwVBMDfBPvNfFurHhvywBvxEpbVpyHldoY4I+Yr41548V89D37VAHBtVSPao+YJJPEAAZqYSAafS3jNU5zGwMYaMsAf8RKW1ach5WaGOCNeK+Jd2PFex8M8E0c9cGRNnA0NQZ4I6anxrTxYLovBvgjlvpiSTtYGgsD0ffsUwUE11I9qj0ay0TWT186GMDAdhmIRr/ddlyPRQxgYGoM8EdMT41p48F0HwzwRhz1wZE2cFQTA3wT7zXxbqx4b8sAb8RKW1ach5XaGOCPmK+NeePFfPQ9+1QBwbVUj2qPmCSTxAAGamEgGn0t4zVOcxsDGGjLAH/ESltWnIeVmhjgjXiviXdjxXsfDPBNHPXBkTZwNDUGeCOmp8a08WC6Lwb4I5b6Ykk7WBoLA9H37FMFBNdSPao9GstE1k9fOhjAwHYZ2LPv9Vn4t912XI9FDGBgagzEhfDUxmU85ioGMLAdBngjfrbDj2vxUyMDfBP3NXJvzLh/LwZ4I0beixHvY6RWBvgj9mtl37jrZT/6nn2qgOBaqke1R23N8cHH9s127NzjHw0wgAEMYKB3Bu7c7Tum7Xds+D5u+93tPFq1ZSAuhNue7zxsYQADNTDAG3FeA+fGiPM+GeCbeOqTJ23haSoM8EYsT4Vl48By3wzwR0z1zZT2MFU6A9H37FMFBNdSPao9ajuBw/8o/+LXX54ddsRL/tEAAxjAAAZ6ZSAE1/7fOXtmhx0Rvmf8W6XB50/cMxNc8+Or7dptnfPiQnida5yLRQxgYOoM8EaMT51x48N43wzwTUz1zZT2MDUFBngjjqfAsTHgOAcD/BFXObjSJq5KZiD6nn2qgOBaqke1R20nr+CawJ7QIgYwgIFcDAiutQvrCa750dV23bbueXEhvO51zsckBjAwZQZ4I76nzLex4TsHA3wTVzm40iauxs4Ab8Tw2BnWfwznYoA/YisXW9rFVqkMRN+zTxUQXEv1qPao7cQVXBNYyRVY0S62MIABwTXBtbbrEefl+dEZF8L0zaMvXemKgXEywBvHWTfzTd0wMBwDfHM47XFPewyUywBvLLc25o3aYGBYBvjjsPrjn/4Y2DwD0ffsUwUE11I9qj1qa0qCa4IlwkUYwAAGcjEguCa41nY94rw8P6biQpi+efSlK10xME4GeOM462a+qRsGhmOAbw6nPe5pj4FyGeCN5dbGvFEbDAzLAH8cVn/80x8Dm2cg+p59qoDgWqpHtUdtTUlwTWAlV2BFu9jCAAYE1wTX2q5HnJfnx1RcCNM3j750pSsGxskAbxxn3cw3dcPAcAzwzeG0xz3tMVAuA7yx3NqYN2qDgWEZ4I/D6o9/+mNg8wxE37NPFRBcS/Wo9qitKQmuCZYIF2EAAxjIxYDgmuBa2/WI8/L8mIoLYfrm0ZeudMXAOBngjeOsm/mmbhgYjgG+OZz2uKc9BsplgDeWWxvzRm0wMCwD/HFY/fFPfwxsnoHoe/apAoJrqR7VHrU1JcE1gZVcgRXtYgsDGBBcE1xrux5xXp4fU3EhTN88+tKVrhgYJwO8cZx1M9/UDQPDMcA3h9Me97THQLkM8MZya2PeqA0GhmWAPw6rP/7pj4HNMxB9zz5VQHAt1aPao7amJLgmWCJchAEMYCAXA4Jrgmtt1yPOy/NjKi6E6ZtHX7rSFQPjZIA3jrNu5pu6YWA4BvjmcNrjnvYYKJcB3lhubcwbtcHAsAzw
x2H1xz/9MbB5BqLv2acKCK6lelR71NaUBNcEVnIFVrSLLQxgQHBNcK3tesR5eX5MxYUwffPoS1e6YmCcDPDGcdbNfFM3DAzHAN8cTnvc0x4D5TLAG8utjXmjNhgYlgH+OKz++Kc/BjbPQPQ9+1QBwbVUj2qP2pqS4JpgiXARBjCAgVwMCK4JrrVdjzgvz4+puBCmbx596UpXDIyTAd44zrqZb+qGgeEY4JvDaY972mOgXAZ4Y7m1MW/UBgPDMsAfh9Uf//THwOYZiL5nnyoguJbqUe1RW1MSXBNYyRVY0S62MIABwTXBtbbrEefl+TEVF8L0zaMvXemKgXEywBvHWTfzTd0wMBwDfHM47XFPewyUywBvLLc25o3aYGBYBvjjsPrjn/4Y2DwD0ffsUwUE11I9qj1qa0qCa4IlwkUYwAAGcjEguCa41nY94rw8P6biQpi+efSlK10xME4GeOM462a+qRsGhmOAbw6nPe5pj4FyGeCN5dbGvFEbDAzLAH8cVn/80x8Dm2cg+p59qoDgWqpHtUdtTUlwTWAlV2BFu9jCAAYE1wTX2q5HnJfnx1RcCNM3j750pSsGxskAbxxn3cw3dcPAcAzwzeG0xz3tMVAuA7yx3NqYN2qDgWEZ4I/D6o9/+mNg8wxE37NPFRBcS/Wo9qitKQmuCZYIF2EAAxjIxYDgmuBa2/WI8/L8mIoLYfrm0ZeudMXAOBngjeOsm/mmbhgYjgG+OZz2uKc9BsplgDeWWxvzRm0wMCwD/HFY/fFPfwxsnoHoe/apAoJrqR7VHrU1JcE1gZVcgRXtYgsDGBBcE1xrux5xXp4fU3EhTN88+tKVrhgYJwO8cZx1M9/UDQPDMcA3h9Me97THQLkM8MZya2PeqA0GhmWAPw6rP/7pj4HNMxB9zz5VQHAt1aPao7amJLgmWCJchAEMYCAXA4Jrgmtt1yPOy/NjKi6E6ZtHX7rSFQPjZIA3jrNu5pu6YWA4BvjmcNrjnvYYKJcB3lhubcwbtcHAsAzwx2H1xz/9MbB5BqLv2acKCK6lelR71NaUBNcEVnIFVrSLLQxgQHBNcK3tesR5eX5MxYUwffPoS1e6YmCcDPDGcdbNfFM3DAzHAN8cTnvc0x4D5TLAG8utjXmjNhgYlgH+OKz++Kc/BjbPQPQ9+1QBwbVUj2qP2pqS4JpgiXARBjCAgVwMCK4JrrVdjzgvz4+puBCmbx596UpXDIyTAd44zrqZb+qGgeEY4JvDaY972mOgXAZ4Y7m1MW/UBgPDMsAfh9Uf//THwOYZiL5nnyoguJbqUe1RW1MSXBNYyRVY0S62MIABwTXBtbbrEefl+TEVF8L0zaMvXemKgXEywBvHWTfzTd0wMBwDfHM47XFPewyUywBvLLc25o3aYGBYBvjjsPrjn/4Y2DwD0ffsUwUE11I9qj1qa0qCa4IlwkUYwAAGcjEguCa41nY94rw8P6biQpi+efSlK10xME4GeOM462a+qRsGhmOAbw6nPe5pj4FyGeCN5dbGvFEbDAzLAH8cVn/80x8Dm2cg+p59qkD1wbVdt98zO/PcS2Y/uPCqVJnKjtqakuCawEquwIp2sYUBDAiuCa61XY84L8+Pqbj8pW8efelKVwyMkwHeOM66mW/qhoHhGOCbw2mPe9pjoFwGeGO5tTFv1AYDwzLAH4fVH//0x8DmGYi+Z58q0Cq4dtW1t8wOP/rE+b8nnno2beGdowMHDsz+9ZiT5ueccMqOLc8JL97xu3ubth5+9MnmvCM/+63ZBz981Oyww49vXtvEHyefccHsv/+P/z173wc+uomP28hnvPDSnkbjULcrr/nle35uW1MSXBMsES7CAAYwkIsBwTXBtbbrEefl+TEVF4z0zaMvXemKgXEywBvHWTfzTd0wMBwDfHM47XFPewyUywBvLLc25o3aYGBYBvjjsPrjn/4Y2DwD0ffsUwVaBdduvvXOebgrBLwuueKGtIV3jnbf93BzTjhv//7Xtjzvm6ef35y375X9zTl/8ZF/HyRANsXg2oWX/rzRONQiaPteW1tTElwTWMkVWNEutjCAAcE1wbW26xHn5fkxFdeL9M2jL13pioFxMsAbx1k3803dMDAcA3xzOO1xT3sMlMsAbyy3NuaN2mBgWAb447D645/+GNg8A9H37FMFWgXX9u7b3wShPv2l09MW3jk6b+fVzTkhLHXrbXdved5f/8Nn5+d95ONfTN4Pd2n7xKe+MfvMl7duPzm5x4MpBtf+7mPHJrUI9dh9/yOHVK2tKQmuCZYIF2EAAxjIxYDgmuBa2/WI8/L8mIqLRfrm0ZeudMXAOBngjeOsm/mmbhgYjgG+OZz2uKc9BsplgDeWWxvzRm0wMCwD/HFY/fFPfwxsnoHoe/apAq2Ca+GSEDQLAaj/+b/+T9rCO0fhMaHh/fjv2+fsPOi8l/fsa94//ZyLDnp/iBemFlx75LGnGo2P//o5zd/hTneH2tqakuCawEquwIp2sYUBDAiuCa61XY84L8+PqbhWpG8efelKVwyMkwHeOM66mW/qhoHhGOCbw2mPe9pjoFwGeGO5tTFv1AYDwzLAH4fVH//0x8DmGYi+Z58q0Dq4dua5lzQhqIcffTJp5cCBA7P3feCj8/djwG35jmrhghtv+W3Txm137k7a+O3d982uuvaW2a9v/33yegi7hdfDvxde2jN74403Z9ffdNvs1LMunP3bZ745+9op/zW7+rpbZ2+99VZy3fLB088+P9t52XWzL/zHWbMjP/ut2ddP+8Hsuptum+9D2C70f9X26ONPz6/98knfnV8bQnnX/mLXvC/L19xz7x+b/i6/F45/89s/zN9/8I+PH/T28y+83Fz70st7D3q/zQtnfPfiRuM/Pf/iLN7hLgQO33zzzZVNtDUlwTXBEuEiDGAAA7kYEFwTXGu7HnFenh9TcaFI3zz60pWuGBgnA7xxnHUz39QNA8MxwDeH0x73tMdAuQzwxnJrY96oDQaGZYA/Dqs//umPgc0zEH3PPlWgdXDtjt/d2wSiLrr8uqSV8BjKeKe1G26+vfl7//7XkvO+dcb5zXuvvf5G8l4Ik4U2lgNvv9/9UHPNWTt+PPvgh49qjuNnhn14zOiqYNY11/+6CdYtXrP496rg2oWXXrvl54VrQyjskcefTsax44Irm/ND4G1xC/2LAb/Q3+Xt4iuub6595tnnl99+z+MQ3vuzvzpi3sbHj/ra/PzzL3r3Ea6/2nXXyjbampLgmsBKrsCKdrGFAQwIrgmutV2POC/Pj6m4UKRvHn3pSlcMjJMB3jjOuplv6oaB4Rjgm8Npj3vaY6BcBnhjubUxb9QGA8MywB+H1R//9MfA5hmIvmefKtA6uPb66280oapjjjstaeW8nW+Ho0JYau++/c15y3dPi3f/+tgnT0iuDwdtgmuLQbPDDj9+Fv4tvnbpT288qN2f37ArOSdcEx6hefSxp84fexqv3yq49v0f/bS5Ntyx7LgTvzMLd1sL41y8LtwJLm7hjmvxvcuvuim+PN+Hu8zF98J+z959yfuf/+qZ8/f//G+OTF5
ve7AYLrzyml/OLwt3XYuf+Zkvn76yqbamJLgmWCJchAEMYCAXA4Jrgmtt1yPOy/NjKi4U6ZtHX7rSFQPjZIA3jrNu5pu6YWA4BvjmcNrjnvYYKJcB3lhubcwbtcHAsAzwx2H1xz/9MbB5BqLv2acKtA6uhcvCXcJCCCqEuBYfzRke2RleP+cHl89b/7uPHTs/Pv2ci5pPCyGtGKD67nlvn9e8uUZw7dzzr5y9+trrzaWLd3sLobTF7dVXX5u9/0OHN30OjyNd3MId0GJYbDm4FsJo8e5oIXD37HMvLl46f6RnHM8Jp+xo3lt8bOrxJ57dvB7+CI81jdeEfQyXxZNCYC28vthefK/NPjzKNLa/75X9zSWLQbvF15sTZrNZW1MSXBNYyRVY0S62MIABwTXBtbbrEefl+TEV14b0zaMvXemKgXEywBvHWTfzTd0wMBwDfHM47XFPewyUywBvLLc25o3aYGBYBvjjsPrjn/4Y2DwD0ffsUwXWCq7FO6uFcNSDDz8+b2kxqHXn3ffPXzvtOxfOA1SLQbKbb72zCVXd/YcH0160DK6FNrba4t3aQqBucbvkihuaz1y++1k87+QzLpifsxxcO+f7lzXXhjuZbbV98nMnN+e8vOfdu6cd9YVT5q8v3jntwIG3mju8xTDdvx5zUtPsCy++3LR17Y2/aV5v+0cI88Wg3ae/lN5ZLQTkYqDtimtu3rLJtqYkuCZYIlyEAQxgIBcDgmuCa23XI87L82MqLhLpm0dfutIVA+NkgDeOs27mm7phYDgG+OZw2uOe9hgolwHeWG5tzBu1wcCwDPDHYfXHP/0xsHkGou/ZpwqsFVy778FHmwDUhZdeO29p8Y5n4XGiYbvlN3c35+3f/9r8tcWAWLjT2fIWw2cf+fgXk7d+v/uhpq3b79ydvBcPTj3r7aBcCGctbvEOZyHQtdVnhnMX+7V4bXisZmgvhMxWbb/41R1N30I/47bzsuua1597/qX5y3fd88D8tRCuWwzx7d37yvz9xbZeenlvbKr1/tpfvPtI1OWAX7jLWgyu/fORX92yzbamJLgmsJIrsKJdbGEAA4Jrgmtt1yPOy/NjKi4S6ZtHX7rSFQPjZIA3jrNu5pu6YWA4BvjmcNrjnvYYKJcB3lhubcwbtcHAsAzwx2H1xz/9MbB5BqLv2acKrBVcC48HDcGrEII6+thT5y2df9HV8+OPffKEpuVX9r/aBKV23X7P/PXwuM1wXXis6FbbdoJr4RGlMZi12HYIaYXX//afPr/4cvL3quBauCZc+/f/8qXk/MWDex94pPncq669pXnrkceeal6/7qbb5q/Hz/nqN783D9HFu6P95Ge/nL9/2tk759cEnbps8XGtoc+PPP707Mmn/pT8+8cjvtL06Zlnnz/oI9qakuCaYIlwEQYwgIFcDAiuCa61XY84L8+PqbhApG8efelKVwyMkwHeOM66mW/qhoHhGOCbw2mPe9pjoFwGeGO5tTFv1AYDwzLAH4fVH//0x8DmGYi+Z58qsFZwLVx6zHGnzQNQIXgVgmwxcHb29y9LWg53TgshqjO+e/Es3FUs/B3+nX/xNcl58SC20+WOa+eef2XTfmwv7MOjOsNnHn70iYsvJ3/HQFkYz+L2Z391xPzaVUG7cO5iQO2iy69bvHx+p7bw2Sd9+7z567G9XXe8HeQ7/sSz5+2HcYctBstCf9bdXnhpTzP+qPOh9jsuuPKgj2hrSoJrAiu5AivaxRYGMCC4JrjWdj3ivDw/puICkb559KUrXTEwTgZ44zjrZr6pGwaGY4BvDqc97mmPgXIZ4I3l1sa8URsMDMsAfxxWf/zTHwObZyD6nn2qwNrBtUt/emMTkgp3HIt3Drvjd/cmLX/7nLfvIHbY4ccnj8Z84KHHkvPiQY7gWgyDdbnjWrxb26HugHbn3fc3WoRHfS5ux55w1vy9cH18nGrQKj6y9Fe77mquDY8GjUGz8Pq624WX/ry5PrZzqP1ffOTfD/qItqYkuCZYIlyEAQxgIBcDgmuCa23XI87L82MqLhDpm0dfutIVA+NkgDeOs27mm7phYDgG+OZw2uOe9hgolwHeWG5tzBu1wcCwDPDHYfXHP/0xsHkGou/ZpwqsHVwLj6CMoagYzgrHr73+RtJyeERoPO+EU3bM/w6PGQ13adtqyxFc+8J/vB0eC4Gx5f7FPqy649pXTvrevM/h2v37X4unJ/vFwNhDDz+RvBceHRrH//XTfjD/+8snfbc5JwTYYugv3JUtnhses7ruFh9rGu7q9sRTz678d9yJ32k+J4TpFre2piS4JrCSK7CiXWxhAAOCa4JrbdcjzsvzYyquDembR1+60hUD42SAN46zbuabumFgOAb45nDa4572GCiXAd5Ybm3MG7XBwLAM8Mdh9cc//TGweQai79mnCqwdXAuXx8dexrBVuLPZ8hbCXvH9uA+PGV215Qiunbfz6qYPV1xz80EfHfoYHiMa+hdCZIvbYvAstLO8heBZ1CEE8l5fCu49+9yLzWfH8f/69t8nzcTHhcb3w93p1t0WH1ca7nJ3qG3xDnHxEabx/LamJLgmWCJchAEMYCAXA4Jrgmtt1yPOy/Njat11oTrkqQNd6YqBshjgjWXVw/xQDwyUzwDfLL9G5pEaYWDzDPDGzWuOc5pjYBwM8Mdx1Ml8UicM9MdA9D37VIFOwbV4N7IYuPrPc3+ctvrO0d//y5eS8NYlV9yw5XnhxRzBtb379jd3NQt9DZ//1DPPzR58+PHZRZdfN3v/hw5v+rccXDtw4K3ZX37008374do33nhz3v8QSvv4UV9r3rvspzduOa4Pfvio5pzQfnxMaDx58XGhoX9nnntJfKv1/vRzLmo+I4zrUFu4210ccwjbLfanrdkIrgms5AqsaBdbGMCA4JrgWtv1iPP6+5G0qGVcRy6+5u88WtOVrhgYDwO8cTy1Mq/UCgNlMMA3y6iD+aAOGCiLAd5YVj3MD/XAQDkM8MdyamFeqAUGNsNA9D37VIFOwbVrf7GrCUuFwNXtd+5OW33n6MzvXZyc99gTz2x5XngxR3AttLt417UYtNtqvxxcC9fedc8DSfAtXBcCX4vXf+JT35iFkNtWW3xEaDh/8TGh8dzFx4WGc+743b3xrVb7EESLd30LIbs222ln72z6H4JzcWtrRIJrgiXCRRjAAAZyMSC4JrjWdj3ivDw/oNZdF6pDnjrQla4YKIsB3lhWPcwP9cBA+QzwzfJrZB6pEQY2zwBv3LzmOKc5BsbBAH8cR53MJ3XCQH8MRN+zTxXoFFx78aU9TfgpBK5efe31tNV3jm67c3dzXrjT16G2T37u5Pm5y4/LvOfePzZtrAp27fjhT5pztvqMcEe0xbBZ+DsE1S6+4vrZyWdcMH8vBNK22p5+9vlZeBTq8vXhOHzuqtBaaOsXv7qjue6W39y9VfOzxceFLj9udMsLFl78/e6HmvZ3XHDlwjur/3zgoceaa4478TvNiW3NRnBNYCVXYEW72MIABgTXBNfarkec19+PpEUt48Jw8TV/59GarnTFwHgY4I3jqZ
V5pVYYKIMBvllGHcwHdcBAWQzwxrLqYX6oBwbKYYA/llML80ItMLAZBqLv2acKdAqupU2M4yg85nP3fQ/Prrvpttkjjz11yMDZViMKjx397d33za5vrj+w1Wmjfa2tEQmuCZYIF2EAAxjIxYDgmuBa2/WI8/L8gIoLWfrm0ZeudMXAOBngjeOsm/mmbhgYjgG+OZz2uKc9BsplgDeWWxvzRm0wMCwD/HFY/fFPfwxsnoHoe/apAtUE19JhO1pWoK0pCa4JrOQKrGgXWxjAgOCa4Frb9Yjz8vyYiutD+ubRl650xcA4GeCN46yb+aZuGBiOAb45nPa4pz0GymWAN5ZbGwbQWhAAACAASURBVPNGbTAwLAP8cVj98U9/DGyegeh79qkCgmupHtUetTUlwTXBEuEiDGAAA7kYEFwTXGu7HnFenh9TcSFM3zz60pWuGBgnA7xxnHUz39QNA8MxwDeH0x73tMdAuQzwxnJrY96oDQaGZYA/Dqs//umPgc0zEH3PPlVAcC3Vo9qjtqYkuCawkiuwol1sYQADgmuCa23XI87L82MqLoTpm0dfutIVA+NkgDeOs27mm7phYDgG+OZw2uOe9hgolwHeWG5tzBu1wcCwDPDHYfXHP/0xsHkGou/ZpwoIrqV6VHvU1pQE1wRLhIswgAEM5GJAcE1wre16xHl5fkzFhTB98+hLV7piYJwM8MZx1s18UzcMDMcA3xxOe9zTHgPlMsAby62NeaM2GBiWAf44rP74pz8GNs9A9D37VAHBtVSPao/ampLgmsBKrsCKdrGFAQwIrgmutV2POC/Pj6m4EKZvHn3pSlcMjJMB3jjOuplv6oaB4Rjgm8Npj3vaY6BcBnhjubUxb9QGA8MywB+H1R//9MfA5hmIvmefKiC4lupR7VFbUxJcEywRLsIABjCQiwHBNcG1tusR5+X5MRUXwvTNoy9d6YqBcTLAG8dZN/NN3TAwHAN8czjtcU97DJTLAG8stzbmjdpgYFgG+OOw+uOf/hjYPAPR9+xTBQTXUj2qPWprSoJrAiu5AivaxRYGMCC4JrjWdj3ivDw/puJCmL559KUrXTEwTgZ44zjrZr6pGwaGY4BvDqc97mmPgXIZ4I3l1sa8URsMDMsAfxxWf/zTHwObZyD6nn2qgOBaqke1R21NSXBNsES4CAMYwEAuBgTXBNfarkecl+fHVFwI0zePvnSlKwbGyQBvHGfdzDd1w8BwDPDN4bTHPe0xUC4DvLHc2pg3aoOBYRngj8Pqj3/6Y2DzDETfs08VEFxL9aj2qK0pCa4JrOQKrGgXWxjAgOCa4Frb9Yjz8vyYigth+ubRl650xcA4GeCN46yb+aZuGBiOAb45nPa4pz0GymWAN5ZbG/NGbTAwLAP8cVj98U9/DGyegeh79qkCgmupHtUetTUlwTXBEuEiDGAAA7kYEFwTXGu7HnFenh9TcSFM3zz60pWuGBgnA7xxnHUz39QNA8MxwDeH0x73tMdAuQzwxnJrY96oDQaGZYA/Dqs//umPgc0zEH3PPlVAcC3Vo9qjtqYkuCawkiuwol1sYQADgmuCa23XI87L82MqLoTpm0dfutIVA+NkgDeOs27mm7phYDgG+OZw2uOe9hgolwHeWG5tzBu1wcCwDPDHYfXHP/0xsHkGou/ZpwoIrqV6VHvU1pQE1wRLhIswgAEM5GJAcE1wre16xHl5fkzFhTB98+hLV7piYJwM8MZx1s18UzcMDMcA3xxOe9zTHgPlMsAby62NeaM2GBiWAf44rP74pz8GNs9A9D37VAHBtVSPao/ampLgmsBKrsCKdrGFAQwIrgmutV2POC/Pj6m4EKZvHn3pSlcMjJMB3jjOuplv6oaB4Rjgm8Npj3vaY6BcBnhjubUxb9QGA8MywB+H1R//9MfA5hmIvmefKiC4lupR7VFbUxJcEywRLsIABjCQiwHBNcG1tusR5+X5MRUXwvTNoy9d6YqBcTLAG8dZN/NN3TAwHAN8czjtcU97DJTLAG8stzbmjdpgYFgG+OOw+uOf/hjYPAPR9+xTBQTXUj2qPWprSoJrAiu5AivaxRYGMCC4JrjWdj3ivDw/puJCmL559KUrXTEwTgZ44zjrZr6pGwaGY4BvDqc97mmPgXIZ4I3l1sa8URsMDMsAfxxWf/zTHwObZyD6nn2qgOBaqke1R21NSXBNsES4CAMYwEAuBgTXBNfarkecl+fHVFwI0zePvnSlKwbGyQBvHGfdzDd1w8BwDPDN4bTHPe0xUC4DvLHc2pg3aoOBYRngj8Pqj3/6Y2DzDETfs08VEFxL9aj2qK0pCa4JrOQKrGgXWxjAgOCa4Frb9Yjz8vyYigth+ubRl650xcA4GeCN46yb+aZuGBiOAb45nPa4pz0GymWAN5ZbG/NGbTAwLAP8cVj98U9/DGyegeh79qkCgmupHtUetTUlwTXBEuEiDGAAA7kYEFwTXGu7HnFenh9TcSFM3zz60pWuGBgnA7xxnHUz39QNA8MxwDeH0x73tMdAuQzwxnJrY96oDQaGZYA/Dqs//umPgc0zEH3PPlVAcC3Vo9qjtqYkuCawkiuwol1sYQADgmuCa23XI87L82MqLoTpm0dfutIVA+NkgDeOs27mm7phYDgG+OZw2uOe9hgolwHeWG5tzBu1wcCwDPDHYfXHP/0xsHkGou/ZpwoIrqV6VHvU1pQE1wRLhIswgAEM5GJAcE1wre16xHl5fkzFhTB98+hLV7piYJwM8MZx1s18UzcMDMcA3xxOe9zTHgPlMsAby62NeaM2GBiWAf44rP74pz8GNs9A9D37VAHBtVSPao/ampLgmsBKrsCKdrGFAQwIrgmutV2POC/Pj6m4EKZvHn3pSlcMjJMB3jjOuplv6oaB4Rjgm8Npj3vaY6BcBnhjubUxb9QGA8MywB+H1R//9MfA5hmIvmefKiC4lupR7VFbUxJcEywRLsIABjCQiwHBNcG1tusR5+X5MRUXwvTNoy9d6YqBcTLAG8dZN/NN3TAwHAN8czjtcU97DJTLAG8stzbmjdpgYFgG+OOw+uOf/hjYPAPR9+xTBQTXUj2qPWprSiG4tmPnHv9ogAEMYAADvTMQgmu+Y9p9x4bv47bf3c6jVVsG4kK47fnOwxYGMFADA7wR5zVwbow475MBvomnPnnSFp6mwgBvxPJUWDYOLPfNAH/EVN9MaQ9TpTMQfc8+VUBwLdWj2qPSJ7D++ZLBAAb6YmDPvtdn4V9f7WkHmxjAwFQYiAvhqYzHOMxNDGCgDwZ4I4764EgbOKqJAb6J95p4N1a8t2WAN2KlLSvOw0ptDPBHzNfGvPFiPvqefaqA4FqqR7VHTJJJYgADtTAQjb6W8RqnuY0BDLRlgD9ipS0rzsNKTQzwRrzXxLux4r0PBvgmjvrgSBs4mhoDvBHTU2PaeDDdFwP8EUt9saQdLI2Fgeh79qkCgmupHtUejWUi66cvHQxgYLsMRKPfbjuuxyIGMDA1BvgjpqfGtPFgug8GeCOO+uBIGziqiQG+ifeaeDdWvLdlgDdipS0rzsNKbQzwR8zXx
rzxYj76nn2qgOBaqke1R0ySSWIAA7UwEI2+lvEap7mNAQy0ZYA/YqUtK87DSk0M8Ea818S7seK9Dwb4Jo764EgbOJoaA7wR01Nj2ngw3RcD/BFLfbGkHSyNhYHoe/apAoJrqR7VHo1lIuunLx0MYGC7DESj3247rsciBjAwNQb4I6anxrTxYLoPBngjjvrgSBs4qokBvon3mng3Vry3ZYA3YqUtK87DSm0M8EfM18a88WI++p59qoDgWqpHtUdMkkliAAO1MBCNvpbxGqe5jQEMtGWAP2KlLSvOw0pNDPBGvNfEu7HivQ8G+CaO+uBIGziaGgO8EdNTY9p4MN0XA/wRS32xpB0sjYWB6Hv2qQKCa6ke1R6NZSLrpy8dDGBguwxEo99uO67HIgYwMDUG+COmp8a08WC6DwZ4I4764EgbOKqJAb6J95p4N1a8t2WAN2KlLSvOw0ptDPBHzNfGvPFiPvqefaqA4FqqR7VHTJJJYgADtTAQjb6W8RqnuY0BDLRlgD9ipS0rzsNKTQzwRrzXxLux4r0PBvgmjvrgSBs4mhoDvBHTU2PaeDDdFwP8EUt9saQdLI2Fgeh79qkCgmupHtUejWUi66cvHQxgYLsMRKPfbjuuxyIGMDA1BvgjpqfGtPFgug8GeCOO+uBIGziqiQG+ifeaeDdWvLdlgDdipS0rzsNKbQzwR8zXxrzxYj76nn2qgOBaqke1R0ySSWIAA7UwEI2+lvEap7mNAQy0ZYA/YqUtK87DSk0M8Ea818S7seK9Dwb4Jo764EgbOJoaA7wR01Nj2ngw3RcD/BFLfbGkHSyNhYHoe/apAoJrqR7VHo1lIuunLx0MYGC7DESj3247rsciBjAwNQb4I6anxrTxYLoPBngjjvrgSBs4qokBvon3mng3Vry3ZYA3YqUtK87DSm0M8EfM18a88WI++p59qoDgWqpHtUdMkkliAAO1MBCNvpbxGqe5jQEMtGWAP2KlLSvOw0pNDPBGvNfEu7HivQ8G+CaO+uBIGziaGgO8EdNTY9p4MN0XA/wRS32xpB0sjYWB6Hv2qQKCa6ke1R6NZSLrpy8dDGBguwxEo99uO67HIgYwMDUG+COmp8a08WC6DwZ4I4764EgbOKqJAb6J95p4N1a8t2WAN2KlLSvOw0ptDPBHzNfGvPFiPvqefaqA4FqqR7VHTJJJYgADtTAQjb6W8RqnuY0BDLRlgD9ipS0rzsNKTQzwRrzXxLux4r0PBvgmjvrgSBs4mhoDvBHTU2PaeDDdFwP8EUt9saQdLI2Fgeh79qkCgmupHtUejWUi66cvHQxgYLsMRKPfbjuuxyIGMDA1BvgjpqfGtPFgug8GeCOO+uBIGziqiQG+ifeaeDdWvLdlgDdipS0rzsNKbQzwR8zXxrzxYj76nn2qgOBaqke1R0ySSWIAA7UwEI2+lvEap7mNAQy0ZWDPvtdn4V/b852HLQxgoAYGeCPOa+DcGHHeJwN+c+OpT560haepMMAbsTwVlo0Dy30zwB8x1TdT2sNU6QxE37NPFRBcS/Wo9qj0Cax/vmQwgIG+GIhG31d72sHmEAzcuXvfbMfOPf5VqsEdf1D/rvw//BTtumh3/qV7Zn98gnZdtKvpmh9etmf24GP1cHLjrr2Cvn+yDhxiHegzcVc6A35zY7R0RvUPo0MwwBtxNwR3PhN3Y2CAP+J0DJzqI077ZCD6nn2qgOBaqke1R31ONm0xbwxgoGQGotGX3Ed9M4fei4EQXDvsiJf9q1SDEFw77Xt71L9D/UNw7f8eY+6s6x9HHffyPLi27nXOr4u1z3z15XlwrZa6C65Zr73Xes37GKmVAb+5sV8r+8aN/UMxwBvxcSg+vIePmhngj/ivmX9jr5P/6Hv2qQKCa6ke1R4xxjqNUd3VvUYGotHXOHZjns6cF1yrKwyyHAIRXOtef8G1btoJrnXTbXnuTv1YcG066wxrRrXEAAa2w4Df3PjZDj+uxc9UGeCN2J4q28aF7e0ywB8xtF2GXI+hsTEQfc8+VUBwLdWj2qOxTWj99SWEAQx0ZSAafdfrXYe9EhgQXKs7RCK41r3+gmvdtBNc66bb1INqy+MTXLNGKmGNpA84xMDwDPjNPXwNzAM1wEB5DPDG8mpinqgJBspggD+WUQfzQR0wsDkGou/ZpwoIrqV6VHvEjDZnRrSmNQaGZSAavToMWwf6b09/wbW6QySCa93rL7jWTTvBtW66LQe7pn4suLa973ZrI/phAANTYcBvbixPhWXjwHKfDPBGPPXJk7bwNCUG+COep8SzseC5DQPR9+xTBQTXUj2qPWoziZzDbDGAgSkwEI1+CmMxhnrnpOBa3SESwbXu9Rdc66ad4Fo33aYeVFsen+BavesSa1K1xwAGFhnwmxsPizz4Gw8YeJsB3mgumAsYwMDWDPDHrXXBC10wMF0Gou/ZpwoIrqV6VHvE/KZrfmqrthhIGYhGT5dUF3qMSw/BtbpDJIJr3esvuNZNO8G1brotB7umfiy4Nq61hLWfemEAA7kY8JsbW7nY0i62xswAb8TvmPnVd/zmZIA/4isnX9rGV4kMRN+zTxUQXEv1qPaoxEmrT75MMICBHAxEo8/RtjYxuykGBNfqDpEIrnWvv+BaN+0E17rpNvWg2vL4BNesgza1DvI5WMNA2Qz4zV12fcwf9cHAMAzwxmF0xzvdMVA+A/yx/BqZR2qEgX4ZiL5nnyoguJbqUe0Rw+nXcOhJTwyUy0A0ejUqt0Zq8961EVyrO0QiuNa9/oJr3bQTXOum23Kwa+rHgmvv/f1tjUMjDGCgBgb85sZ5DZwbI87XZYA3YmZdZpyPmVoY4I9Yr4V148R6ZCD6nn2qgOBaqke1R3Gi2DNNDGBg6gxEo5/6OI1v2nNZcK3uEIngWvf6C651005wrZtuUw+qLY9PcG3aaw9rS/XFAAbaMuA3N1basuI8rNTEAG/Ee028Gyve12GAP+JlHV6ci5cpMBB9zz5VQHAt1aPaoylMcmPwZYUBDLRhIBp9m3Odg6lSGRBcqztEIrjWvf6Ca920E1zrpttysGvqx4Jr1k2lrpv0C5sY2CwDfnNvVm980xsD42CAN46jTuaTOmFg8wzwx81rjnOaY2BYBqLv2acKCK6lelR7xKCGNSj60x8Dm2MgGj3NN6c5rfvXWnCt7hCJ4Fr3+guuddNOcK2bblMPqi2PT3Ct/+97ayiaYgADY2TAb27cjpFbfcZtbgZ4I8ZyM6Z9jI2VAf6I3bGyq9/Y7cpA9D37VAHBtVSPao+6TizXMWUMYGBsDESjH1u/9ddcW2RAcK3uEIngWvf6C651005wrZtuy8GuqR8LrlmrLK5V/I0HDNTLgN/c9dbevFd7DKxmgDeu1gY3tMFA3Qzwx7rrb/6rf40MRN+zTxUQXEv1qPaoRlMwZl+GGKiTgWj06l9n/adSd8G1ukMkgmvd6y+41k07wbVuuk09qLY8PsE1a6uprLOMA8sY2B4DfnNvTz/80Q8D02SAN06zruarumJg+wzwx+1riEMaYmBcDETfs08VEFxL9aj2iKGN
y9DUS70w0J2BaPQ07K4h7YbXTnCt7hCJ4Fr3+guuddNOcK2bbsvBrqkfC64Nvz6wRlMDDGCgBAb85sZhCRzqAw5LY4A3YrI0JvUHk6UwwB+xWAqL+oHFTTEQfc8+VUBwLdWj2qNNTUSfw/QxgIGhGYhGP3Q/fL65sB0GBNfqDpEIrnWvv+BaN+0E17rpNvWg2vL4BNesbbaztnEtfjAwHQb85p5OLc1LtcRAfwzwxv60xCUtMTAtBvjjtOppfqonBt6bgeh79qkCgmupHtUeMZH3NhEa0QgD02AgGr16TqOetdZRcK3uEIngWvf6C651005wrZtuy8GuqR8Lrllb1bouM27sYyBlwG/uVA980AMDGAgMxA0PeMAABjCQMsAfUz3wQQ8MTJ+B6Hv2qQKCa6ke1R4xwemboBqrMQbeZiAaPT3MiTEzILhWd4hEcK17/QXXumknuNZNt6kH1ZbHJ7hmbTXmtZW+4xcD/THgN3d/WuKSlhiYDgO8cTq1NC/VEgP9MsAf+9UTn/TEQPkMRN+zTxUQXEv1qPaIiZVvYmqkRhjoh4Fo9PTsR086DqOj4FrdIRLBte71F1zrpp3gWjfdloNdUz8WXBtmTWAtRncMYKA0BvzmxmRpTOoPJktggDfisAQO9QGHJTLAH3FZIpf6hMucDETfs08VEFxL9aj2KOfk0zZzxwAGSmIgGn1JfdIXc2RdBgTX6g6RCK51r7/gWjftBNe66Tb1oNry+ATXrGfWXc84HzMYmCYDfnNPs67mq7piYHsM8Mbt6Yc/+mFgugzwx+nW1rxVWwxszUD0PftUAcG1VI9qjxjH1sZBF7pgYHoMRKNX2+nVtqaaCq7VHSIRXOtef8G1btoJrnXTbTnYNfVjwTVrq5rWYsaKdwysZsBv7tXa4IY2GKiXAd5Yb+3Ne7XHwKEZ4I+H1gc/9MHA9BiIvmefKlB9cG3X7ffMzjz3ktkPLrwqVaayI6Y3PdNTUzXFwNYMRHunz9b60GUcugiu1R0iEVzrXn/BtW7aCa51023qQbXl8QmujWMNYa2nThjAQG4G/ObGWG7GtI+xMTLAG3E7Rm71GbebYIA/4mwTnPkMnJXEQPQ9+1SBVsG1q669ZXb40SfO/z3x1LNpC+8cHThwYPavx5w0P+eEU3ZseU548Y7f3du09fCjTzbnHfnZb80++OGjZocdfnzz2ib+OPmMC2b//X/879n7PvDRTXxcls/Yff8jjaaxTmEf6vGp406bHXfid2YXXnrt7Nk/vbDy80uarPriywMDGMjJQDTCnJ+hbQznZkBwre4QieBa9/oLrnXTTnCtm27Lwa6pHwuuWf/kXv9oH2MYGAcDfnOPo07mkzphYLMM8MbN6o1vemNgPAzwx/HUyrxSKwz0w0D0PftUgVbBtZtvvXMe7goBr0uuuCFt4Z2j3fc93JwTztu//7Utz/vm6ec35+17ZX9zzl985N8HCZBNIbj2q113NZoG7Q/178prftlovvgHo+nHaOhIRwyUz0D0PrUqv1ZqtLpGgmt1h0gE17rXX3Ctm3aCa910m3pQbXl8gmurv7etaWiDAQzUxIDf3HiviXdjxXtbBngjVtqy4jys1MYAf8R8bcwbL+aj79mnCrQKru3dt78JQ336S6enLbxzdN7Oq5tzQnDq1tvu3vK8v/6Hz87P+8jHv5i8H+7S9olPfWP2mS9v3X5yco8HUwuuHX3sqbNvn7Nz/u9bZ5w/+8pJ35vFUGAMtP3u9/cfpCCTZJIYwEAtDEQDrGW8xjnNuS24VneIRHCte/0F17ppJ7jWTbflYNfUjwXXprnmsJZUVwxgYF0G/ObGzLrMOB8zNTDAG3FeA+fGiPMuDPBH3HThxjW4GTMD0ffsUwVaBdfCJSFoFoJP//N//Z+0hXeOwmMpYzAq7EN4anl7ec++5pzTz7lo+e1BjqcWXAt3x9tqO/f8Kxvtz/7+ZQedMubJre++nDCAgXUYiAa4zjXOxVhpDAiu1R0iEVzrXn/BtW7aCa51023qQbXl8QmuWS+Vtl7SH0xiYBgG/OYeRne80x0DZTPAG8uuj/mjPhgYjgH+OJz2uKc9BoZhIPqefapA6+Damede0gSfHn70yaSVAwcOzN73gY/O348Bt+U7qoULbrzlt00bt925O2njt3ffN7vq2ltmv77998nrIewWXg//Xnhpz+yNN96cXX/TbbNTz7pw9m+f+ebsa6f81+zq626dvfXWW8l1ywdPP/v8bOdl182+8B9nzY787LdmXz/tB7Prbrptvg9Bu9D/Vdujjz89v/bLJ313fm0I5V37i13zvixfc8+9f2z6u/xeOP7Nb/8wf//BPz5+0NvPv/Byc+1LL+896P1VLyw+KnRVcO2pZ55rtA93YVveGNMwxkR3umNg8wxE/6P95rWneX+aC67VHSIRXOtef8G1btoJrnXTbTnYNfVjwbX+vuetmWiJAQyMmQG/ufE7Zn71Hb+5GOCN2MrFlnaxNXYG+COGx86w/mN4XQai79mnCrQOrt3xu3ub4NNFl1+XtLL7/kea9264+fbm7/37X0vOC4+ujHdle+31N5L3QpgsvLccePv97oeaa87a8ePZBz98VHMc2wr78JjRN998M2kzHlxz/a+bYN3iNYt/rwquXXjptVt+Xrg2PPb0kcefjh8z3++44N07m4XA2+IW+hcDfqG/y9vFV1zffNYzzz6//PbK4zbBtbv/8GDT9s9v2HVQW+tOKOczYQxgYKwMRAMca//129wLDAiu1R0iEVzrXn/BtW7aCa51023qQbXl8QmuWaNYp2IAAxgIDMQND3jAAAYw8C4DvPFdLXBBCwxgYJEB/oiHRR78jYcaGIi+Z58q0Dq49vrrbzTBp2OOOy1p5bydV8/f+/hRX5vt3be/OW/57mkh6BUCXx/75AnJ9eGgTXBtMWh22OHHz8K/xdcu/emNB7UbQlqL54Rrjv/6ObOjjz11/tjT+N5WwbXv/+inzbXhEanHnfid+SNQwzgXrwt3gotbuONafO/yq26KL8/34S5z8b2w37P33f+YE074/FfPnL//539zZHLdex28V3AthOCi9mEce/e+clCTNZiAMfqywwAGAgNxwwMexsyA4FrdIRLBte71F1zrpp3gWjfdloNdUz8WXLO2GvPaSt/xi4H+GPCbuz8tcUlLDEyHAd44nVqal2qJgX4Z4I/96olPemKgfAai79mnCrQOroXLwl3CQuAqhJ8WH80ZHtkZXj/nB5fPW/+7jx07Pz79nIuaTwshrRja+u55b5/XvLlGcO3c86+cvfra682li3d7C6G0xe3VV1+bvf9Dhzd9Do8jXdzCHdBiWGw5uBbCaPHuaCH09exzLy5eOn+kZxzPCafsaN5bfGzq8See3bwe/giPNY3XhP2V1/wyeT8E1sLri+0lJ6w4WAyuhb7+6zEnzf+FgGAMrIV2Q/vP/umFLVthYuWbmBqpEQb6YSCaID370ZOOw+gouFZ3iERwrXv9Bde6aSe41k23qQfVlscnuDbMmsBajO4YwEBpDPjNjcnSmNQfTJb
AAG/EYQkc6gMOS2SAP+KyRC71CZc5GYi+Z58qsFZwLd5ZLYSgHnz48XlLi0GtO+++f/7aad+5cB7AWgyS3XzrnU1oKzy2cnlrc8e10MZWW7w2BOoWt0uuuKH5zOW7n8XzTj7jgvk5y8G1c75/WXNteEzqVtsnP3dyc87Le969i89RXzhl/vrindMOHHirucNbDNOFgFncXnjx5aata2/8TXy51X4xuBZqc6h/4VGs9z7wyEHt5px82mbuGMBASQxEAyypT/pijqzLgOBa3SESwbXu9Rdc66ad4Fo33ZaDXVM/Flyznll3PeN8zGBgmgz4zT3Nupqv6oqB7THAG7enH/7oh4HpMsAfp1tb81ZtMbA1A9H37FMF1gqu3ffgo00o6sJLr523tHjHs/A40bDd8pu7m/P2739t/tpiQCzc6Wx5i+GzEKxa3H6/+6Gmrdvv3L34VvP3qWe9HZQLga3FLd7hLITStvrMcO5ivxav/cyXT59/bgiZrdp+8as7mr6FfsZt52XXNa8/9/xL85fvuueB+WshXLcY4ouP7Vxs66WX98amWu0Xg2vhEajhTnfh37fP2Tn75unnzx9xGu/mFkNtu+97OGmbcWxtHHShp6WXEgAAIABJREFUCwamx0A0P7WdXm1rqqngWt0hEsG17vUXXOumneBaN92mHlRbHp/gmrVVTWsxY8U7BlYz4Df3am1wQxsM1MsAb6y39ua92mPg0Azwx0Prgx/6YGB6DETfs08VWCu4Fh4PGoJXIfwUAlJhO/+iq+fH4bGUcXtl/6tNcGvX7ffMX46PrAyPFd1q205wLTyiNAayFtv+5yO/On/9b//p84svJ3+vCq6Fa0Kbf/8vX0rOXzwIdy6Ln3vVtbc0bz3y2FPN69fddNv89fg5X/3m9+YhuvgY0p/87O3HhZ529s75NUGndbfF4Nqqu9KFNr//o582/frHI76SfAzTm57pqamaYmBrBqL50WdrfegyDl0E1+oOkQiuda+/4Fo37QTXuum2HOya+rHg2jjWENZ66oQBDORmwG9ujOVmTPsYGyMDvBG3Y+RWn3G7CQb4I842wZnPwFlJDETfs08VWCu4Fi495rjT5uGnELwKQbYYODv7+5clLYc7p4VQ1xnfvXgW7ioWA17nX3xNcl48iO10uePauedf2bQf2wv7eJexw48+cfHl5O8YKFt+VOif/dUR8zZXBe1CI4sBtYsuvy5pNz4O9KRvnzd/Pba36463g3zHn3j2vP0w7rCFIFnQKPRn3a1tcC20G+sSPuuNN969811Jk1VffHlgAAM5GYgem/MztI3h3AzcuXvv7LAjXvKvUg0E17qHiATXumknuNZNt6kH1ZbHJ7hm/ZN7/aN9jGFgHAz4zT2OOplP6oSBzTLAGzerN77pjYHxMMAfx1Mr80qtMNAPA9H37FMF1g6uXfrTG5uQWLjjWLxz2B2/uzdpOTymMoSjDjv8+OTRmA889FhyXjzIEVyLYbAud1yLd2s71B3Q7rz7/kaL8KjPxe3YE86avxeuj49TDVrFR5Yuhs3Co0FjsC+8vu622Nah7rgW2v3cV85sPis+xjS8zmj6MRo60hED5TMQPVatyq+VGq2ukeBa3aE9wbXuISLBtW7aCa5102052DX1Y8G11d/b1jS0wQAGamLAb26818S7seK9LQO8ESttWXEeVmpjgD9ivjbmjRfz0ffsUwXWDq49+dSfmuBTDGeF0NVrr7+RtBweERrDWCecsmP+d3jMaLhL21ZbjuDaF/7j7fBYCIwt9y/2YdUd175y0vfmfQ7X7t//Wjw92V946c+bMT708BPJe+HRoXH8Xz/tB/O/v3zSd5tzQoAthv7CXdniueExq+tu6wTXwqNPw2eFzz5w4N1aMEkmiQEM1MJA9Nhaxmuc05zbgmuCa6d9b89s6iGYHOMTXOsWwBJc66ZbDoZLblNwbZprDmtJdcUABtZlwG9uzKzLjPMxUwMDvBHnNXBujDjvwgB/xE0XblyDmzEzEH3PPlVg7eBauDw+9jKGrcKdzZa3EPaK78d9eMzoqi1HcO28nVc3fbjimpsP+ujQx/AY0RjkWjxhMXgW2lneQvAs6hACea8vBfeefe7F5rPj+H99+++TZuLjQuP74e50Xba2wbWrr7u16dPHPnlC8lFjntz67ssJAxhYh4Fofutc41yMlcaA4JrgmuBatyCR4Fo33QTXuulWcsgsR98E16yXSlsv6Q8mMTAMA35zD6M73umOgbIZ4I1l18f8UR8MDMcAfxxOe9zTHgPDMBB9zz5VoFNwLd6NLAau/vPcH6etvnMU7+4Vz7vkihu2PC+8mCO4tnff/uauZqEP4fOfeua52YMPPz676PLrZu//0OFNkCvcgWxxC3cj+8uPfrp5P1z7xhtvzk8JobSPH/W15r3Lfnrj4qXN3x/88FHNOaH9+JjQeMJi4Cz078xzL4lvrbVfbOec7182u/3O3fN/t9529+xnN/x69qMf/3wWH30aa3HPvX9MPoMxDWNMdKc7BjbPQDQ/2m9ee5r3p7ngmuCa4Fq3IJHgWjfdBNe66ZYjHFZym4Jr/X3PWzPREgMYGDMDfnPjd8z86jt+czHAG7GViy3tYmvsDPBHDI+dYf3H8LoMRN+zTxXoFFy79he7mkBWCEKFoNRW25nfuzg577EnntnqtPlrOYJroeHFu67F0NZW++XgWrj2rnseSIJv4bpwd7XF6z/xqW8kj9xcHGB8RGg4f/ExofGcxceFhnPu+N298a219ovBtcW+bfV3GOfPb9h1UPvrTijnM2EMYGCsDEQDHGv/9dvcCwwIrgmuCa51CxIJrnXTTXCtm24lh8xy9E1wzRrFOhUDGMBAYCBueMADBjCAgXcZ4I3vaoELWmAAA4sM8Ec8LPLgbzzUwED0PftUgU7BtRdf2pOEt1597fW01XeObrtzd3NeuLvZobZPfu7k+bnLj8sMdwaLAaxVwa4dP/xJc85WnxHuiBbbiPsQ4Lr4iutnJ59xwfy9EEjbanv62edn4VGo8brFffjccGe2VdsvfnVHc90tv7l7y9MWHxe6/LjRLS/Y4sXQ9mK/lv/+8785chYeDXr29y+bhbvQbbXVYALG6MsOAxgIDMQND3gYMwOCa4JrgmvdgkSCa910E1zrpluOcFjJbQquWVuNeW2l7/jFQH8M+M3dn5a4pCUGpsMAb5xOLc1LtcRAvwzwx371xCc9MVA+A9H37FMFOgXX0ibGcRQe87n7vodn19102+yRx546ZOBsqxGFwNdv775vdn1z/YGtThvta0ysfBNTIzXCQD8MRKOmZz960nEYHQXXBNcE17oFiQTXuukmuNZNt5JDZjn6Jrg2zJrAWozuGMBAaQz4zY3J0pjUH0yWwABvxGEJHOoDDktkgD/iskQu9QmXORmIvmefKlBNcC0dtqNlBXJOPm0zdwxgoCQGov+V1Cd9MUfWZUBwTXBNcK1bkEhwrZtugmvddMsRDiu5TcE165l11zPOxwwGpsmA39zTrKv5qq4Y2B4DvHF7+uGPfhiYLgP8cbq1NW
/VFgNbMxB9zz5VQHAt1aPaI8axtXHQhS4YmB4D0ejVdnq1rammgmuCa4Jr3YJEgmvddBNc66ZbySGzHH0TXLO2qmktZqx4x8BqBvzmXq0NbmiDgXoZ4I311t68V3sMHJoB/nhoffBDHwxMj4Hoe/apAoJrqR7VHjG96ZmemqopBrZmIBo9fbbWhy7j0EVwTXBNcK1bkEhwrZtugmvddMsRDiu5TcG1cawhrPXUCQMYyM2A39wYy82Y9jE2RgZ4I27HyK0+43YTDPBHnG2CM5+Bs5IYiL5nnyoguJbqUe1RSZNVX3x5YAADORmIRp/zM7SN4dwMCK4JrgmudQsSCa51001wrZtuJYfMcvRNcM36J/f6R/sYw8A4GPCbexx1Mp/UCQObZYA3blZvfNMbA+NhgD+Op1bmlVphoB8Gou/ZpwoIrqV6VHvEaPoxGjrSEQPlMxCNXq3Kr5Uara6R4JrgmuBatyCR4Fo33QTXuumWIxxWcpuCa6u/t61paIMBDNTEgN/ceK+Jd2PFe1sGeCNW2rLiPKzUxgB/xHxtzBsv5qPv2acKCK6lelR7xCSZJAYwUAsD0ehrGa9xTnNuC64JrgmudQsSCa51001wrZtuJYfMcvRNcG2aaw5rSXXFAAbWZcBvbsysy4zzMVMDA7wR5zVwbow478IAf8RNF25cg5sxMxB9zz5VQHAt1aPaozFPbn335YQBDKzDQDT6da5xLsZKY0BwTXBNcK1bkEhwrZtugmvddMsRDiu5TcE166XS1kv6g0kMDMOA39zD6I53umOgbAZ4Y9n1MX/UBwPDMcAfh9Me97THwDAMRN+zTxUQXEv1qPaIMQ1jTHSnOwY2z0A0etpvXnua96e54JrgmuBatyCR4Fo33QTXuulWcsgsR98E1/r7nrdmoiUGMDBmBvzmxu+Y+dV3/OZigDdiKxdb2sXW2BngjxgeO8P6j+F1GYi+Z58qILiW6lHt0boTyvlMGAMYGCsD0ejH2n/9NvcCA4JrgmuCa92CRIJr3XQTXOumW45wWMltCq5Zo1inYgADGAgMxA0PeMAABjDwLgO88V0tcEELDGBgkQH+iIdFHvyNhxoYiL5nnyoguJbqUe1RDSZgjL7sMICBwEDc8ICHMTMguCa4JrjWLUgkuNZNN8G1brqVHDLL0TfBNWurMa+t9B2/GOiPAb+5+9MSl7TEwHQY4I3TqaV5qZYY6JcB/tivnvikJwbKZyD6nn2qgOBaqke1R0ysfBNTIzXCQD8MRKOnZz960nEYHQXXBNcE17oFiQTXuukmuNZNtxzhsJLbFFwbZk1gLUZ3DGCgNAb85sZkaUzqDyZLYIA34rAEDvUBhyUywB9xWSKX+oTLnAxE37NPFRBcS/Wo9ijn5NM2c8cABkpiIBp9SX3SF3NkXQYE1wTXBNe6BYkE17rpJrjWTbeSQ2Y5+ia4Zj2z7nrG+ZjBwDQZ8Jt7mnU1X9UVA9tjgDduTz/80Q8D02WAP063tuat2mJgawai79mnCgiupXpUe8Q4tjYOutAFA9NjIBq92k6vtjXVVHBNcE1wrVuQSHCtm26Ca910yxEOK7lNwTVrq5rWYsaKdwysZsBv7tXa4IY2GKiXAd5Yb+3Ne7XHwKEZ4I+H1gc/9MHA9BiIvmefKiC4lupR7RHTm57pqamaYmBrBqLR02drfegyDl0E1wTXBNe6BYkE17rpJrjWTbeSQ2Y5+ia4No41hLWeOmEAA7kZ8JsbY7kZ0z7GxsgAb8TtGLnVZ9xuggH+iLNNcOYzcFYSA9H37FMFBNdSPao9Kmmy6osvDwxgICcD0ehzfoa2MZybAcE1wTXBtW5BIsG1broJrnXTLUc4rOQ2Bdesf3Kvf7SPMQyMgwG/ucdRJ/NJnTCwWQZ442b1xje9MTAeBvjjeGplXqkVBvphIPqefaqA4FqqR7VHjKYfo6EjHTFQPgPR6NWq/Fqp0eoaCa4JrgmudQsSCa51001wrZtuJYfMcvRNcG3197Y1DW0wgIGaGPCbG+818W6seG/LAG/ESltWnIeV2hjgj5ivjXnjxXz0PftUAcG1VI9qj5gkk8QABmphIBp9LeM1zmnObcE1wTXBtW5BIsG1broJrnXTLUc4rOQ2BdemueawllRXDGBgXQb85sbMusw4HzM1MMAbcV4D58aI8y4M8EfcdOHGNbgZMwPR9+xTBQTXUj2qPRrz5NZ3X04YwMA6DESjX+ca52KsNAYE1wTXBNe6BYkE17rpJrjWTbeSQ2Y5+ia4Zr1U2npJfzCJgWEY8Jt7GN3xTncMlM0Abyy7PuaP+mBgOAb443Da4572GBiGgeh79qkCgmupHtUeMaZhjInudMfA5hmIRk/7zWtP8/40F1wTXBNc6xYkElzrppvgWjfdcoTDSm5TcK2/73lrJlpiAANjZsBvbvyOmV99x28uBngjtnKxpV1sjZ0B/ojhsTOs/xhel4Hoe/apAoJrqR7VHq07oZzPhDGAgbEyEI1+rP3Xb3MvMCC4JrgmuNYtSCS41k03wbVuupUcMsvRN8E1axTrVAxgAAOBgbjhAQ8YwAAG3mWAN76rBS5ogQEMLDLAH/GwyIO/8VADA9H37FMFBNdSPao9qsEEjNGXHQYwEBiIGx7wMGYGBNcE1wTXugWJBNe66Sa41k23HOGwktsUXLO2GvPaSt/xi4H+GPCbuz8tcUlLDEyHAd44nVqal2qJgX4Z4I/96olPemKgfAai79mnCgiupXpUe8TEyjcxNVIjDPTDQDR6evajJx2H0VFwTXBNcK1bkEhwrZtugmvddCs5ZJajb4Jrw6wJrMXojgEMlMaA39yYLI1J/cFkCQzwRhyWwKE+4LBEBvgjLkvkUp9wmZOB6Hv2qQKCa6ke1R7lnHzaZu4YwEBJDESjL6lP+mKOrMtACK7t2LnHv0o1uOMP+9S+Y+1DcM3cWd87zr90z+yPT9AOO4dm54eX7Zk9+Fg9nNy4a+9s3e9v51vzYQADNTDgNzfOa+DcGHG+LgO8ETPrMuN8zNTCAH/Eei2sGyfWIwPR9+xTBQTXUj2qPYoTxZ5pYgADU2cgGv3Ux2l85jIGMLAuA3v2vT4L/9a9zvlYwwAGpswAb8T3lPk2NnznYMBvblzl4EqbuBo7A7wRw2NnWP8xnIsB/oitXGxpF1ulMhB9zz5VQHAt1aPao1Inrn75UsEABvpmIBp93+1qD6sYwMDYGeCPGB47w/qP4RwM8EZc5eBKm7iaMgN8E99T5tvY8N2VAd6Ina7suA47U2eAP2J86owbH8aXGYi+Z58qILiW6lHt0fKEccxEMYCBqTIQjX6q4zMucxcDGOjKAH/ETld2XIedKTPAG/E9Zb6NDd85GOCbuMrBlTZxNXYGeCOGx86w/mM4FwP8EVu52NIutkplIPqefaqA4FqqR7VHpU5c/fKlggEM9M1ANPq+29UeVjGAgbEzwB8xPHaG9R/DORjgjbjKwZU2cTVlBvgmvqfMt7HhuysDvBE7XdlxHXamzgB/xPjUGTc+jC8zEH3PPlVAcC3Vo9qj5QnjmIliAANTZ
SAa/VTHZ1zmLgYw0JUB/oidruy4DjtTZoA34nvKfBsbvnMwwDdxlYMrbeJq7AzwRgyPnWH9x3AuBvgjtnKxpV1slcpA9D37VAHBtVSPao9Knbj65UsFAxjom4Fo9H23qz2sYgADY2eAP2J47AzrP4ZzMMAbcZWDK23iasoM8E18T5lvY8N3VwZ4I3a6suM67EydAf6I8akzbnwYX2Yg+p59qoDgWqpHtUfLE8YxE8UABqbKQDT6qY7PuMxdDGCgKwP8ETtd2XEddqbMAG/E95T5NjZ852CAb+IqB1faxNXYGeCNGB47w/qP4VwM8Eds5WJLu9gqlYHoe/apAoJrqR7VHpU6cfXLlwoGMNA3A9Ho+25Xe1jFAAbGzgB/xPDYGdZ/DOdggDfiKgdX2sTVlBngm/ieMt/Ghu+uDPBG7HRlx3XYmToD/BHjU2fc+DC+zED0PftUAcG1VI9qj5YnjGMmigEMTJWBaPRTHZ9xmbsYwEBXBvgjdrqy4zrsTJkB3ojvKfNtbPjOwQDfxFUOrrSJq7EzwBsxPHaG9R/DuRjgj9jKxZZ2sVUqA9H37FMFBNdSPao9KnXi6pcvFQxgoG8GotH33a72sIoBDIydAf6I4bEzrP8YzsEAb8RVDq60iaspM8A38T1lvo0N310Z4I3Y6cqO67AzdQb4I8anzrjxYXyZgeh79qkCgmupHtUeLU8Yx0wUAxiYKgPR6Kc6PuMydzGAga4M8EfsdGXHddiZMgO8Ed9T5tvY8J2DAb6JqxxcaRNXY2eAN2J47AzrP4ZzMcAfsZWLLe1iq1QGou/ZpwoIrqV6VHtU6sTVL18qGMBA3wxEo++7Xe1hFQMYGDsD/BHDY2dY/zGcgwHeiKscXGkTV1NmgG/ie8p8Gxu+uzLAG7HTlR3XYWfqDPBHjE+dcePD+DID0ffsUwUE11I9qj1anjCOmSgGMDBVBqLRT3V8xmXuYgADXRngj9jpyo7rsDNlBngjvqfMt7HhOwcDfBNXObjSJq7GzgBvxPDYGdZ/DOdigD9iKxdb2sVWqQxE37NPFRBcS/Wo9qjUiatfvlQwgIG+GYhG33e72sMqBjAwdgb4I4bHzrD+YzgHA7wRVzm40iaupswA38T3lPk2Nnx3ZYA3YqcrO67DztQZ4I8YnzrjxofxZQai79mnCgiupXpUe7Q8YRwzUQxgYKoMRKOf6viMy9zFAAa6MsAfsdOVHddhZ8oM8EZ8T5lvY8N3Dgb4Jq5ycKVNXI2dAd6I4bEzrP8YzsUAf8RWLra0i61SGYi+Z58qILiW6lHtUakTV798qWAAA30zEI2+73a1h1UMYGDsDPBHDI+dYf3HcA4GeCOucnClTVxNmYE9+16fhX9THqOxmcMYwMC6DFhTYmZdZpyPmVoY4I9Yr4V148R6ZCD6nn2qgOBaqke1R3Gi2DNNDGBg6gxEo5/6OI3PXMYABtZlgD9iZl1mnI+ZGhhY5Y0PPb5vtmPnHv9ogAEMbIuBH162Z/bgY/yEn5b7ffLwU2Xz+ccnrMdqWI9NYYyr1pRTGJsx8CEMYGA7DPBH/GyHH9fiZ4wMRN+zTxUQXEv1qPZojJNan30ZYQADXRiIRt/lWtdgDgMYmDID/BHfU+bb2PDdlYFV3hiCayd+e8/ssCNe9o8GGMBAZwY+89WX58E1XsJLS2UgBNf+7zFl1uf/++LLM8E1a7yua7xNX7dqTbnpfvg8cwYDGCiNAf6IydKY1B9M5mYg+p59qoDgWqpHtUe5J6D2mTwGMFAKA9HoS+mPfpgbGMBAKQzwRyyWwqJ+YLEkBlZ5o+BamSGGUoMf+oWXVQwIrmFjFRulvC64Zl1W0rpszH1ZtaYc85j0nT9gAAN9MMAfcdQHR9rA0ZgYiL5nnyoguJbqUe3RmCazvvrywQAGtsNANPrttOFaDGIAA1NkgD/ieopcGxOut8vAKm8UXBM2KSVUoh/jZlFwbdz1q2H+Ca5ZS213LeX6txlataakjzmGAQzUzgB/NAdqnwPGX98ciL5nnyoguJbqUe0RU6zPFNVczWtlIBp9reM3bnMfAxhYxQB/xMYqNryOjZoZWOWNgmvCJjUEVowxP+eCa/k1xvH2NBZcsw6seR3Y59hXrSn7/Axtma8YwMAYGeCPuB0jt/qM2+0wEH3PPlVAcC3Vo9qj7Uwu1zJnDGBgTAxEox9Tn/XVHMMABjbBAH/E2SY48xk4GxsDq7xRcG17QQhBEvph4G0GBNfMhdLnguCatdvY1m6l9nfVmrLU/uqXuY8BDGyKAf6ItU2x5nOwVgoD0ffsUwUE11I9qj0qZaLqhy8NDGAgNwPR6HN/jvaxjAEMjI0B/ojZsTGrv5jdBAOrvFFwTdik9LCJ/o2DUcG1cdSp5vkkuGa9tYn1Vg2fsWpNWcPYjZGPYAADh2KAP+LjUHx4Dx9TZCD6nn2qgOBaqke1R1Oc9MbkywwDGNiKgWj0W73nNcxgAAM1M8Af8V8z/8aO/1UMrPJGwTVhk5qDLMbeH/+Ca/1pics8WgquWSOtWiN5fT02Vq0p6biejvSiFwamxwB/nF5NzVM1xcChGYi+Z58qILiW6lHtEQM5tIHQhz4YmA4D0ejVdDo1VUu1xEA/DPDHfnTEIx0xMC0GVnmj4FqegITgCV1rY0BwDfOlMy+4Nq11jXXqcPVctaZUk+FqQnvaY6AMBvhjGXUwH9QBA5tjIPqefaqA4FqqR7VHzGhzZkRrWmNgWAai0avDsHWgP/0xUB4D/LG8mpgnaoKB4RlY5Y2Ca8ImpYdN9G8cjAqujaNONc8nwbXh1yLWg9Oowao1pfpOo77qqI4Y6M4Af+yuHe5oh4FxMhB9zz5VQHAt1aPaI8Y2TmNTN3XDwPoMRKOn3fra0YxmGJg2A/xx2vU1f9UXA90YWOWNgmvCJjUHWYy9P/4F1/rTEpd5tBRc67Z+sO6i2zIDq9aUy+c5xg4GMFAbA/wR87Uxb7yYj75nnyoguJbqUe0Rk2SSGMBALQxEo69lvMZpbmMAA20Z4I9YacuK87BSEwOrvFFwLU9AQvCErrUxILiG+dKZF1yz7qtp3ZdzrKvWlDk/U9vmLwYwMAYG+CNOx8CpPuK0Twai79mnCgiupXpUe9TnZNMW88YABkpmIBp9yX3UN3MIAxgYggH+iLshuPOZuCudgVXeKLgmbFJ62ET/xsGo4No46lTzfBJcs1Yrfa02lv6tWlOOpf/6yQswgIFcDPBHbOViS7vYKpWB6Hv2qQKCa6ke1R6VOnH1y5cKBjDQNwPR6PtuV3tYxQAGxs4Af8Tw2BnWfwznYGCVNwquCZvUHGQx9v74F1zrT0tc5tFScM36Ksf6qsY2V60pa9TCmPkKBjCwyAB/xMMiD/7GQw0MRN+zTxUQXEv1qPaoBhMwRl92GMBAYCBueMADBjCAgZQB/pjqgQ96YAADgYG4LfMguJYnICF4QtfaGBBcw3zpzAuuWQ8tr4Ecd2Ni1ZqSnt30pBvdMDAdBvjjdGppXqolBtoxEH3PPlVAcC3Vo9ojRtLOSOhEJwyMn4Fo9Go5/lqq
oRpioF8G+GO/euKTnhiYBgOrvFFwTdik9LCJ/o2DUcG1cdSp5vkkuDaN9Yx16fB1XLWmVJvha6MGaoCBYRngj8Pqj3/6Y2DzDETfs08VEFxL9aj2iClt3pRoTnMMDMNANHr6D6M/3emOgXIZ4I/l1sa8URsMDMfAKm8UXBM2qTnIYuz98S+41p+WuMyjpeDacGsQ679pab9qTanO06qzeqonBtZngD+urxnOaIaBcTMQfc8+VUBwLdWj2iMGN26DUz/1w0B7BqLR06y9ZrSiFQbqYIA/1lFn81mdMbAeA6u8UXAtT0BC8ISutTEguIb50pkXXFtv3WCdRa9VDKxaU6463+tYwgAGamGAP2K9FtaNE+uRgeh79qkCgmupHtUexYlizzQxgIGpMxCNfurjND5zGQMYWJcB/oiZdZlxPmZqYGCVNwquCZuUHjbRv3EwKrg2jjrVPJ8E16z3aljvbWKMq9aUm/hsn2EeYwADJTPAH/FZMp/6hs8cDETfs08VEFxL9aj2KMek0yYzxwAGSmQgGn2JfdMncwYDGBiSAf6IvyH589n4K5WBVd4ouCZsUnOQxdj7419wrT8tcZlHS8E1a7RS12hj69eqNeXYxqG/PAEDGOibAf6Iqb6Z0h6mSmcg+p59qoDgWqpHtUelT2D98yWDAQz0xUA0+r7a0w42MYCBqTDAH7E8FZaNA8t9MrDKGwXX8gQkBE/oWhsDgmuYL515wTXrqj7XVTW3tWpNWbMmxs5fMICBwEDc8IAHDGCgFgai79mnClQfXPvxT34xO/PcS2bX3vibVJmJHL311luzN9988z1HU4sRGKcvPQxgIBoiFrCAAQxgIGWAP6Z64IMeGMBAYCBuyzwIrgmblB420b9xMCq4No7e9A3WAAAgAElEQVQ61TyfBNesh5bXQI67MbFqTUnPbnrSjW4YmA4D/HE6tTQv1RID7RiIvmefKtAquHbqWRfODj/6xNnRx56aXr1w9ODDj8/PCeddfMX1C++kf/7oxz+fn/dvn/nmLISqwrbr9ntmH/zwUfN/P7vh1+kFmY/++h8+O/vv/+N/zz513GmZP2kzzR848Nbsymt+OfvySd+d/d3Hjp297wMfnY/v/R86fHbY4cfP/t9ZP5o9/OiTB3WGkbQzEjrRCQPjZyAaoFqOv5ZqqIYY6JcB/tivnvikJwamwcAqbxRcEzapOchi7P3xL7jWn5a4zKOl4No01jPWpcPXcdWaUm2Gr40aqAEGhmWAPw6rP/7pj4HNMxB9zz5VoFVw7eQzLpiHn0LA65HHnkpbeOfovJ1XN+d85ONf3PKc8GIMioVQVdzC3c5C2+Hfzsuuiy9vZB/7M4Xg2kMPPzEL2kctD7UP5z3z7PONxkxp86ZEc5pjYBgGovHRfxj96U53DJTLAH8stzbmjdpgYDgGVnmj4FqegITgCV1rY0BwDfOlMy+4NtwaxPpvWtqvWlOq87TqrJ7qiYH1GeCP62uGM5phYNwMRN+zTxVoFVz71a67mjDUJVfckLbwztG/HnNSc04ITO3f/9pB5728Z19zzrfP2dm8f9c9D8w+8alvzP+Fz9rkNpXg2mKNgv5hXP957o9n19902+y6m26bPw71yM9+q9E/nLP7vocbqRncuA1O/dQPA+0ZiMZHs/aa0YpWGKiDAf5YR53NZ3XGwHoMrPJGwTVhk9LDJvo3DkYF18ZRp5rnk+DaeusG6yx6rWJg1Zpy1flexxIGMFALA/wR67WwbpxYjwxE37NPFWgVXHtl/6tN4OnTXzo9bWE2mx04cKB5JGW8y9ett9190Hm/+NUdTTvh8aAlbFMIrr366muzP/urIxptTz/notmbb765pbzhjnn/fORX5+cKrjHIaJD2WKiJgWiONY3ZWM1xDGCgDQP8ESdtOHEOTmpjYJU3Cq4Jm9QcZDH2/vgXXOtPS1zm0VJwzdqvtrVfrvGuWlPm+jztmrsYwMBYGOCPWB0Lq/qJ1b4YiL5nnyrQKrgWLvn7f/nSPOz0P//X/0lbmM1mu+9/pAlN/eVHPz3/e/GOavGCb51xfnNeCFvFbe/eV2ZXXXvL/N/zL7wcX57vQ9AqvhdeeOGlPbMrrrl59h8n75iFO4idetaFszvvvj+5ZquDENI69/wrZ0cfe+rsmONOm18X7vT2XsG1EAC74ebbZ2d+7+LZJz938uz4E8+e/ejHP5+Fx3Jutd18653z/u66Y+tg3tXX3Tp/P4xjebv/wcfm711z/a+X3zrkcdA6Bga/efr5hzw3vvmzG349e+75l+LhrK+Jph2mjQEMlM5ANL7S+6l/5hIGMLBpBvgj5jbNnM/D3BgYWOWNgmt5AhKCJ3StjQHBNcyXzrzgmvXaGNZrY+jjqjXlGPquj3wAAxjIyQB/xFdOvrSNrxIZiL5nnyrQOrj2nf+6tAlHLYe2ztt59fy9jx/1tdn/O+tH878/8vEvpp80mzUhsX884ivJe/c+8G7wbflObTt++JPmc8NjSt/3gY82xzGsFfbn/ODypM14EIJnp539brBr8ZrFvz913Gnxkmb/1DPPzQ47/PgtPy9cG/p24MBbzfnhj8OPPnF+/lYBv/sefLRpK1y7vIVAXWg33D2t7RbudhfHEbTZs3df20uT80qctPrkywQDGMjBQDS/HG1rE7MYwMCYGeCP+B0zv/qO31wMrPJGwTVhk9LDJvo3DkYF18ZRp5rnk+CaNVauNVZt7a5aU9amg/HyFAxgYJkB/oiJZSYcY2LqDETfs08VaB1c+93v728CUhddfl3Syr995pvz97573uWzX+26qzlv//5376oWAlUxYHXO9y9Lrm8bXIvXh4BWCMnFu6XF18Pd2Za3E07Z0XxuuC4Ey47/+jlNwCxeuxxc2/fK/lkIn8X3Q9jutO9cOPvySd9NXg+P5VzcfnDhVc01Tzz17OJbszPPvaR572//6fPJe+Hg/R86fP7+V0763kHvrXrh8Sefbdpse7e1rdqaugEYny85DGAgMhA9MB7bYwMDGMDA2wzwR3PBXMAABg5mYJU3Cq4Jm9QcZDH2/vgXXOtPS1zm0VJw7eC1gfUSTbowsGpN2aUt12AQAxiYEgP8Ec9T4tlY8NyGgeh79qkCrYNr4c5l8W5niyGvcMev+Hp4ZGcIfMWw167b331c5o23/LZ5PYTgFre2wbUQ7Lr+ptuSu5yFEF38vOUQWXj0ZnwvBMWeefb5xY+d9/XP/+bI+TmLYwonLT5+czmo9+JLe5I7sT290O7iY1N/8rNfJp/3Fx/596Y/oV9PPv2n5v3Qt9jXn9+wq3n9vf4IjyaN14U70nXd2kwi5zBbDGBgCgxEn5zCWIzBnMQABvpkgD/iqU+etIWnqTCwyhsF1/IEJARP6FobA4JrmC+decE1a7qprOmGHseqNeXQ/fL55jgGMDA0A/wRg0Mz6PMxuGkGou/Zpwq0Dq6Fy/71mJPmIalwJ7K33nr7EZmLQa033nhz3vrffezY+XmLQbKTz7igCViFENzi1ja49sr+Vxcvm/+9GKj7zJdPT94PxzHU9ejjTyfvxYN
417bF4FoYR7xu+bGm8brFPoexxS08OjQG+cLd2eIW7gYX24x3cgt3Z4vbtTf+pnn/hRdfji+/5z4+pjW0/Zvf/uE9z191wqYnpM/zJYABDAzFQPTBoT7f52IfAxgolQH+iM1S2dQvbA7JwCpvFFwTNik9bKJ/42BUcG0cdap5PgmuWYcNuQ6b0mevWlNOaYzGwi8wgIEuDPBH3HThxjW4GTMD0ffsUwXWCq5dcPHPmnDVgw8/Pm/p/Iuunr/2sU+e0LR82tk7568ddvjxzWsxIPaJT32jeS3+sRgCu/W2u+PL8/2OH/6k+cwQCttqi0G5fz7yq8nb8Q5n4VGmq7bYr8Xg2mLI7Iprbl516ewvP/rped+WxxTaCkGyD374qOba751/xfy1o489dRb1Cf2OW3jMZ7gmtLnO9p3/unR+Xbj2rnseWOfS5NwxT2599+WEAQysw0A0v3WucS7GMICBGhjgjzivgXNjxPm6DKzyRsE1YZOagyzG3h//gmv9aYnLPFoKrlk7rbt2cv7WzKxaU9Jra73oQhcM1MMAf6yn1ua1WmPgbQai79mnCqwVXAthtRCQCv8uvPTn85aO/Oy35sfnfP+ypuUQPovn7d//2mzv3lea43CHsOVtu8G1EJoLn7cYXAt3Yot9OOU/f7j8kc3xVsG1X+26q7n2tjt3N+cu/xHHHh43uriFR3bGz37hpT3zt+LnXHP9r2eLd6l76pnn5u/H8F0IsK2z/eyGXzefdaiQ3Xu1ySh8WWAAA7UwEP2wlvEap7mNAQy0ZYA/YqUtK87DSk0MrPJGwbU8AQnBE7rWxoDgGuZLZ15wzbqvpnVfzrGuWlPm/Extm78YwMAYGOCPOB0Dp/qI0z4ZiL5nnyqwVnAtPB40PubyqC+cMlt8LOZv776vaTmE1WJwa9cd98xuvvXO5jiE1Ja37QbXDj/6xHn7i8G1Z597sfnMHRdcufyRzXEMlC3ece0nP/tlc20Ima3a4qNIgyaL22NPPNNcf8PNt8+efPpPzXF83Omf/dUR89fCHetee/2N5v2g1TrbfQ8+2lx72ncuXOfS5Nw+J5u2mDcGMFAyA9H8Su6jvplDGMDAEAzwR9wNwZ3PxF3pDKzyRsE1YZPSwyb6Nw5GBdfGUaea55PgmrVa6Wu1sfRv1ZpyLP3XT16AAQzkYoA/YisXW9rFVqkMRN+zTxVYK7gWLv3cV86cB6Xe94GPJncOC+Grxe0jH//i/LwzvnvxLNzxLATZwjVbPe4zR3BtO3dcCyG8GLw7VJAs3ukthN+Wt/d/6PB5GyefccEsPk71k587uTkthMzCZwSdfvf7+5vP27tvf3NOmz9effXdkGB4NGoYd5et1ImrX75UMICBvhmIHtl3u9rDKgYwMHYG+COGx86w/mM4BwOrvFFwTdik5iCLsffHv+Baf1riMo+WgmvWVznWVzW2uWpNWaMWxsxXMICBRQb4Ix4WefA3HmpgIPqefarA2sG18DjKGOo69oSz5n8v3uksNn/6ORfN3zvs8ONnf/tPn5//ffSxp8a3k32O4Fr4gBgeC3dkW7Vtdce1xbu1nb3wCNTFNkJALN597pjjTlt8a/738V8/Zz7m8AjQoEHQ7Kprb2nOW3xc6DdO+8H8/RBi67L94xFfmV8fPuNHP377Ea7v1c4zzz4/C3fGi1sNJmCMvuwwgIHAQNzwgAcMYAADKQP8MdUDH/TAAAYCA3Fb5kFwLU9AQvCErrUxILiG+dKZF1yzHlpeAznuxsSqNSU9u+lJN7phYDoM8Mfp1NK8VEsMtGMg+p59qsDawbWnn32+CUnFANtZO36ctjqbzcIjQuP7cX/xFdcfdF54IVdw7d8+882mD48/+exBnx3GEsNni48KDY9E/fO/OXJ+bXg/3NVsebv8qpuati+89OCw2NXX3dq8H8e/75X0bmrxcaHx/RD267I99PATzWeFu9rd8pu7VzYTAnff/9FP5+fvvu/h5jxG0s5I6EQnDIyfgWh8ajn+WqqhGmKgXwb4Y7964pOeGJgGA6u8UXBN2KT0sIn+jYNRwbVx1Knm+SS4No31jHXp8HVctaZUm+FrowZqgIFhGeCPw+qPf/pjYPMMRN+zTxVYO7gWLo+hrhi4uv3O3Wmrs9k87BXfj/tHHnvqoPPCC7mCa3f87t4m0BVCYr/addfsuedfmt11zwOzM793cfNe6N9icC306fqbbmveD3dCe/KpP837HoJf4c5pcUxBizfeOPjxnM+/8HJzTjj3yM9+66Cxx8eFxrZu20LHgy5a8cK3z9mZfF6441to78WX9sxeennvLNTogot/NvvLj366OU9wbfNGxPxpjoHhGYg2qhbD10IN1AADZTHAH8uqh/mhHhgog4FV3ii4JmxSc5DF2PvjX3CtPy1xmUdLwbUy1iPWheOvw6o1pdqOv7ZqqIYY2B4D/HF7+uGPfhgYHwPR9+xTBToF1044ZUcTfgqhq1dfez1t9Z2jv/+XLzXnhcd2rtpyBdfC5y3edS0GxLbaLwfXwrXhEaCL58a7sy2+tuv2e1YNa/YXH/n35vrFx4TGCxYfFxrafO31N+Jba+9ff/2Ng8J4i/3c6u+HH32y+RymNj5TUzM1w0A3BqLx0a+bfnSjGwamywB/nG5tzVu1xUB3BlZ5o+BanoCE4Alda2NAcA3zpTMvuNZ9DWH9RbtFBlatKRfP8TdmMICBGhngj7ivkXtjrpv76Hv2qQKdgms33Hx7E8gK4bRV25nnXtKc94X/OGvVabP7H3ysOW85CPaDC69q3jtw4K0t2/jEp74xP+djnzzhoPfDHdKWg3YhxPXBDx81vwvZ3/7T5+fXhpDaVtvOy66bhcdvLge//u5jx84effzprS5pXjvp2+c11+3dlz4mNJ4UHxe6Vd/jOevsw13tQlhvqz6HMfz1P3x2tuOCK2cvvLQnaZZB1m2Q6q/+NTEQza+mMRurOY4BDLRhgD/ipA0nzsFJbQys8kbBNWGT0sMm+jcORgXXxlGnmueT4Jq1X21rv1zjXbWmzPV52jV3MYCBsTDAH7E6Flb1E6t9MRB9zz5VoFNwLW1iHEf7Xtk/D6rdfOuds2efe3GtTofAXAiphcBeePzoy3v2rXX9UCeHx4Teeff980ek3nPvHw/Z774mmnaYNgYwUDoD0ZNL76f+mUsYwMCmGeCPmNs0cz4Pc2NgYJU3Cq4Jm9QcZDH2/vgXXOtPS1zm0VJwzXptDOu1MfRx1ZpyDH3XRz6AAQzkZIA/4isnX9rGV4kMRN+zTxWoJriWDtvRsgIlTlp98mWCAQzkYCD6X462tYlZDGBgzAzwR/yOmV99x28uBlZ5o+BanoCE4Alda2NAcA3zpTMvuGaNlWuNVVu7q9aUtelgvDwFAxhYZoA/YmKZCceYmDoD0ffsUwUE11I9qj2augEYny85DGAgMhCNPh7bYwMDGMDA2wzwR3PBXMAABg5mYJU3Cq4Jm5QeNtG/cTAquDaOOtU8nwTXDl4bWC/RpAsDq9aUXdpyDQYxgIEpMcAf8Twlno0Fz20YiL5nnyoguJ
bqUe1Rm0nkHGaLAQxMgYFo9FMYizGYkxjAQJ8M8Ec89cmTtvA0FQZWeaPgmrBJzUEWY++Pf8G1/rTEZR4tBdes6aaypht6HKvWlEP3y+eb4xjAwNAM8EcMDs2gz8fgphmIvmefKiC4lupR7dGmJ6TP8yWAAQwMxUA0+qE+3+diHwMYKJUB/ojNUtnUL2wOycAqbxRcyxOQEDyha20MCK5hvnTmBdesw4Zch03ps1etKac0RmPhFxjAQBcG+CNuunDjGtyMmYHoe/apAoJrqR7VHo15cuu7LycMYGAdBqLRr3ONczGGAQzUwAB/xHkNnBsjztdlYJU3Cq4Jm5QeNtG/cTAquDaOOtU8nwTXrJ3WXTs5f2tmVq0p6bW1XnShCwbqYYA/1lNr81qtMfA2A9H37FMFBNdSPao9YhS+LDCAgVoYiEZfy3iN09zGAAbaMsAfsdKWFedhpSYGVnmj4JqwSc1BFmPvj3/Btf60xGUeLQXXrPtqWvflHOuqNWXOz9S2+YsBDIyBAf6I0zFwqo847ZOB6Hv2qQKCa6ke1R71Odm0xbwxgIGSGYhGX3If9c0cwgAGhmCAP+JuCO58Ju5KZ2CVNwqu5QlICJ7QtTYGBNcwXzrzgmvWaqWv1cbSv1VryrH0Xz95AQYwkIsB/oitXGxpF1ulMhB9zz5VQHAt1aPao1Inrn75UsEABvpmIBp93+1qD6sYwMDYGeCPGB47w/qP4RwMrPJGwTVhk9LDJvo3DkYF18ZRp5rnk+Ca9VWO9VWNba5aU9aohTHzFQxgYJEB/oiHRR78jYcaGIi+Z58qILiW6lHtUQ0mYIy+7DCAgcBA3PCABwxgAAMpA/wx1QMf9MAABgIDcVvmQXBN2KTmIIux98e/4Fp/WuIyj5aCa9ZDy2sgx92YWLWmpGc3PelGNwxMhwH+OJ1ampdqiYF2DETfs08VEFxL9aj2iJG0MxI60QkD42cgGr1ajr+WaqiGGOiXAf7Yr574pCcGpsHAKm8UXMsTkBA8oWttDAiuYb505gXXprGesS4dvo6r1pRqM3xt1EANMDAsA/xxWP3xT38MbJ6B6Hv2qQKCa6ke1R4xpc2bEs1pjoFhGIhGT/9h9Kc73TFQLgP8sdzamDdqg4HhGFjljYJrwialh030bxyMCq6No041zyfBteHWINZ/09J+1ZpSnadVZ/VUTwyszwB/XF8znNEMA+NmIPqefaqA4FqqR7VHDG7cBqd+6oeB9gxEo6dZe81oRSsM1MEAf6yjzuazOmNgPQZWeaPgmrBJzUEWY++Pf8G1/rTEZR4tBdfWWzdYZ9FrFQOr1pSrzvc6ljCAgVoY4I9Yr4V148R6ZCD6nn2qgOBaqke1R3Gi2DNNDGBg6gxEo5/6OI3PXMYABtZlgD9iZl1mnI+ZGhhY5Y2Ca3kCEoIndK2NAcE1zJfOvOCa9V4N671NjHHVmnITn+0zzGMMYKBkBvgjPkvmU9/wmYOB6Hv2qQKCa6ke1R7lmHTaZOYYwECJDESjL7Fv+mTOYAADQzLAH/E3JH8+G3+lMrDKGwXXhE1KD5vo3zgYFVwbR51qnk+Ca9Zopa7RxtavVWvKsY1Df3kCBjDQNwP8EVN9M6U9TJXOQPQ9+1QBwbVUj2qPSp/A+udLBgMY6IuBaPR9tacdbGIAA1NhgD9ieSosGweW+2RglTcKrgmb1BxkMfb++Bdc609LXObRUnDt/2/vXH/ups70/SfN9/nyU79UGo00alWNNKd2VLUz02krZtqhgh5paSlQGhoObYAUCJA2BJpAOAQIAcKxBUICaUMaDoEAgUA4JyEngn+6d3ictfz6tHzY2/a6LL3y9t62t309t+932b73Mu2qLttVMa+rqE0ZMxP2HX9BA2hAGrABPaAHNIAGYtGA+R5jnwDBNZ9HtFOxGAH7yT89NIAGzOjRAlpAA2gADfgawB99HugDHmgADUgDNmT1QHCtn4AEwRO4xqYBgmtofuiaJ7hGeyjbBmK6mSaK2pTwbMYTbnBDA9PRAP44nVpyXFJLNFBPA+Z7jH0CBNd8HtFOYST1jAROcEID49eAGT21HH8tqSE1RAPdagB/7JYn+oQnGpiGBoq8keAaYZOhh03YvnFolODaOOoU8/FEcG0a7RnapYuvY1GbktosvjbUgBqggcVqAH9cLH/0D380MH8NmO8x9gkQXPN5RDuFKc3flGAOczSwGA2Y0cN/MfzhDnc0MFwN4I/DrQ3HDbVBA4vTQJE3ElwjbBJzkIV9707/BNe6Y4ku+2FJcG1xbRDaf9NiX9SmpM7TqjP1pJ5oIFwD+GM4M3QGMzQwbg2Y7zH2CRBc83lEO4XBjdvgqB/1QwP1NWBGD7P6zGAFKzQQhwbwxzjqzPFMndFAmAaKvJHgWj8BCYIncI1NAwTX0PzQNU9wLazdQDsLXkUaKGpTFs3P+2gJDaCBWDSAP6L1WLTOfqJ104D5HmOfAME1n0e0U3agMMY00QAamLoGzOinvp/sH8cyGkADoRrAH9FMqGaYH83EoIEibyS4Rthk6GETtm8cGiW4No46xXw8EVyjvRdDe28e+1jUppzHd/MdHMdoAA0MWQP4I/ocsj7ZNvTZhwbM9xj7BAiu+TyinerjoGOdmDkaQAND1IAZ/RC3jW3imEEDaGCRGsAf0d8i9cd3o7+haqDIGwmuETaJOcjCvnenf4Jr3bFEl/2wJLhGG22obbSxbVdRm3Js+8H24gloAA10rQH8EU11rSnWh6aGrgHzPcY+AYJrPo9op4Z+ALN9/JNBA2igKw2Y0Xe1PtaDNtEAGpiKBvBHtDwVLbMfaLlLDRR5I8G1fgISBE/gGpsGCK6h+aFrnuAa7aou21Uxr6uoTRkzE/Ydf0EDaEAasAE9oAc0gAZi0YD5HmOfAME1n0e0U7EYAfvJPz00gAbM6NECWkADaAAN+BrAH30e6AMeaAANSAM2ZPWg4Nrq9Qf5gwEaQAOtNHDzHQeTPa/hJ/jpcP+fKLg25Pq8/DrtlWwbhelhaqKoTUm9hlkv6kJd0MD8NIA/zo81uoY1GhiGBsz3GPsECK75PKKdwqiGYVTUgTqggf41YEYP6/5ZwxjGaGBcGsAfx1Uvji/qhQbmowG8cT6c0TOc0cB0NHDw8PFEf9R0OjWlltQSDbTXAG3K9gzRIQzRwDQ1gD9Os64cr9QVDRRrwHyPsU+A4JrPI9opzKPYPGADGzQwLQ2Y0VPXadWVelJPNNBeA/hje4boEIZoYHoawBunV1OOU2qKBvrVAL7ZL1/0C180ME4N4I3jrBvHG3VDA/1rAH/snzE6hjEaGJYGzPcY+wQIrvk8op3CsIZlWNSDeqCB/jRgRg/j/hjDFrZoYJwawB/HWTeON+qGBvrVAN7YL1/0C180MD0N4JvTqynHKTVFA+01gDe2Z4gOYYgGpqkB/HGadeV4pa5ooFgD5nuMfQIE13we0U5hHsXmARvYoIFpacCMnrpOq67Uk3qigfYawB/bM0SHMEQD09MA3ji9mnKcUlM00K8G8M1++aJf+KKBcWoAbxxn3TjeqBsa6
F8D+GP/jNExjNHAsDRgvsfYJ0BwzecR7RSGNSzDoh7UAw30pwEzehj3xxi2sEUD49QA/jjOunG8UTc00K8G8MZ++aJf+KKB6WkA35xeTTlOqSkaaK8BvLE9Q3QIQzQwTQ3gj9OsK8crdUUDxRow32PsEyC45vOIdgrzKDYP2MAGDUxLA2b01HVadaWe1BMNtNcA/tieITqEIRqYngbwxunVlOOUmqKBfjWAb/bLF/3CFw2MUwN44zjrxvFG3dBA/xrAH/tnjI5hjAaGpQHzPcY+AYJrPo9opzCsYRkW9aAeaKA/DZjRw7g/xrCFLRoYpwbwx3HWjeONuqGBfjWAN/bLF/3CFw1MTwP45vRqynFKTdFAew3gje0ZokMYooFpagB/nGZdOV6pKxoo1oD5HmOfAME1n0e0U5hHsXnABjZoYFoaMKOnrtOqK/WknmigvQbwx/YM0SEM0cD0NIA3Tq+mHKfUFA30qwF8s1++6Be+aGCcGsAbx1k3jjfqhgb61wD+2D9jdAxjNDAsDZjvMfYJEFzzeUQ7hWENy7CoB/VAA/1pwIwexv0xhi1s0cA4NYA/jrNuHG/UDQ30qwG8sV++6Be+aGB6GsA3p1dTjlNqigbaawBvbM8QHcIQDUxTA/jjNOvK8Upd0UCxBsz3GPsECK75PKKdwjyKzQM2sEED09KAGT11nVZdqSf1RAPtNYA/tmeIDmGIBqanAbxxejXlOKWmaKBfDeCb/fJFv/BFA+PUAN44zrpxvFE3NNC/BvDH/hmjYxijgWFpwHyPsU+A4JrPI9opDGtYhkU9qAca6E8DZvQw7o8xbGGLBsapAfxxnHXjeKNuaKBfDeCN/fJFv/BFA9PTAL45vZpynFJTNNBeA3hje4boEIZoYJoawB+nWVeOV+qKBoo1YL7H2CdAcM3nEe0U5lFsHrCBDRqYlgbM6KnrtOpKPaknGmivAfyxPUN0CEM0MD0N4I3TqynHKTVFA/1qAN/sly/6hS8aGKcG8MZx1o3jjbqhgf41gD/2zxgdwxgNDEsD5nuMfQIE13we0U5hWMMyLOpBPdBAfxowo4dxf4xhC1s0ME4N4I/jrBvHG3VDA/1qAG/sly/6hS8amJ4G8M3p1ZTjlJqigfYawBvbM0SHMEQD09QA/jjNunK8Ulc0UPzuguEAACAASURBVKwB8z3GPgGCaz6PaKcwj2LzgA1s0MC0NGBGT12nVVfqST3RQHsN4I/tGaJDGKKB6WkAb5xeTTlOqSka6FcD+Ga/fNEvfNHAODWAN46zbhxv1A0N9K8B/LF/xugYxmhgWBow32PsEyC45vOIdgrDGpZhUQ/qgQb604AZPYz7Ywxb2KKBcWoAfxxn3TjeqBsa6FcDeGO/fNEvfNHA9DSAb06vphyn1BQNtNcA3tieITqEIRqYpgbwx2nWleOVuqKBYg2Y7zH2CRBc83lEO4V5FJsHbGCDBqalATN66jqtulJP6okG2msAf2zPEB3CEA1MTwN44/RqynFKTdFAvxrAN/vli37hiwbGqYGDh48n+qN+46wfdaNuaKA/DdB27I8tuoUtGhimBsz3GPsECK75PKKdwriGaVzUhbqgge41YEYP2+7ZwhSmaGDcGsAfx10/jj/qhwb60QDe2A9X9ApXNDBdDeCb063tEI/b5/YeTlavP8gfDKLTwLMvoP0mx/6e1+DWhJst88QO+BmLuuNNDx9Ktjx+qLFHbX70UHL/Y82Xr7udbea7a8uhVqFc2o60HYfYxmSb0GWfGjDfY+wTILjm84h2qs+Dj3Vj7mgADQxJA2b0Q9omtoVjBA2ggSFoAH9Eh0PQIduADoemAbwRTQ5Nk2wPmhy6BvBNNDpPjSq49pVvf8gfDKLTgIJry644GN1+tz3eFVz77nl4RlOOCq5duxbdhfBbt/FUcC1kGXfeW+85FVxz3xvaa4JrtP3m2fbju9DbFDRg58yMfQIE13we0U5N4SBnH/hnhQbQQB0NmNHXmZd50BQaQAMxaQB/RO8x6Z19Re91NYA3opW6WmE+tIIGTmkA3+RYmOexQHCNAM7QAhzz2h6Ca820T3CtGTfTNcG1cH4E16rbRbQdqxnNs23Fd1EPNNC/Bsz3GPsECK75PKKdwoT6NyEYwxgNDEMDZvTUYxj1oA7UAQ0MRwP443BqwXFBLdDAcDSANw6nFhwX1AINjEMD+OY46jSV44ngWniIwgIojMfNjuBas/oRXGvGzfyC4Fo4P4Jr1e0i2o7VjKbSbmM/qDUaOKUB8z3GPgGCaz6PaKcwCv5ZoAE0EIsGzOhj2V/2k2MbDaCBuhrAH9FKXa0wH1qJSQN4I3qPSe/sK3rvQgP4JjrqQkd110FwLTxEYQEUxuNmR3CtWf0IrjXjZn5BcC2cH8G16nYRbcdqRnXbRcwHSzQwDg2Y7zH2CRBc83lEO4WRjcPIqBN1QgPtNWBGD8v2LGEIQzQwLQ3gj9OqJ8cn9UQD3WgAb+yGI3qEIxqIRwP4Zjy1HsJxTXAtPERhARTG42ZHcK1Z/QiuNeNmfkFwLZwfwbXqdhFtx2pGQ2hzsQ3UCQ10pwHzPcY+AYJrPo9opzCb7swGlrBEA8PWgBk9dRp2nagP9UED89cA/jh/5ugc5mhg+BrAG4dfI44jaoQGhqUBfHNY9Zj68UFwLTxEYQEUxuNmR3CtWf0IrjXjZn5BcC2cH8G16nYRbcdqRlNvz7F/aCA2DZjvMfYJEFzzeUQ7FZshsL/8E0QD8WrAjB4NxKsBak/t0UC+BvDHfC7oBS5oIG4N4I1x15/jn/qjgXAN4JvhzNBZc2YE18JDFBZAYTxudgTXmtWP4FozbuYXBNfC+RFcq/4fT9uxmhFtRRihgWlpwHyPsU+A4JrPI9opDG9ahkc9qScaKNaAGT2MihnBBjZoIE4N4I9x1p3jnbqjgXIN4I3lfNAPfNAAGshqAN9EE1lN9DlNcC08RGEBFMbjZkdwrVn9CK4142Z+QXAtnB/Btep2EW3HakZ9tqVYN/zRwPw1YL7H2CdAcM3nEe0UpjR/U4I5zNHAYjRgRg//xfCHO9zRwHA1gD8OtzYcN9QGDSxOA3jj4tije9ijgXFqAN8cZ93GerwRXAsPUVgAhfG42RFca1Y/gmvNuJlfEFwL50dwrbpdRNuxmtFY22lsN7VFA/kaMN9j7BMguObziHYK48g3DrjABQ1MTwNm9NR2erWlptQUDbTTAP7Yjh/6gx8amKYG8MZp1pXjlbqigf40gG/2xxbdLmVLcC08RGEBFMbjZkdwrVn9CK4142Z+QXAtnB/BtaX/u7PtGdqO1YyyzJiGGRoYtwbM9xj7BAiu+TyincLgxm1w1I/6oYH6GjCjh1l9ZrCCFRqIQwP4Yxx15nimzmggTAN4Yxgv9AUvNIAG8E00ME8fILgWHqKwAArjcbMjuNasfgTXmnEzvyC4Fs6P4Fp1u4i2YzWjebat+C7qgQb614D5HmOfAME1n0e0U5hQ/yYEYxijgWFowIyeegyjHtSBOqCB4WgAfxxOLTguqAUaGI4G8Mbh1ILjglqggXFo
AN8cR52mcjwRXAsPUVgAhfG42RFca1Y/gmvNuJlfEFwL50dwrbpdRNuxmtFU2m3sB7VGA6c0YL7H2CdAcM3nEe0URsE/CzSABmLRgBl9LPvLfnJsowE0UFcD+CNaqasV5kMrMWkAb0TvMemdfUXvXWgA30RHXeio7joIroWHKCyAwnjc7AiuNasfwbVm3MwvCK6F8yO4Vt0uou1Yzahuu4j5YIkGxqEB8z3GPgGCaz6PaKcwsnEYGXWiTmigvQbM6GHZniUMYYgGpqUB/HFa9eT4pJ5ooBsN4I3dcESPcEQD8WgA34yn1kM4rgmuhYcoLIDCeNzsCK41qx/BtWbczC8IroXzI7hW3S6i7VjNaAhtLraBOqGB7jRgvsfYJ0BwzecR7RRm053ZwBKWaGDYGjCjp07DrhP1oT5oYP4awB/nzxydwxwNDF8DeOPwa8RxRI3QwLA0gG8Oqx5TPz4IroWHKCyAwnjc7AiuNasfwbVm3MwvCK6F8yO4Vt0uou1YzWjq7Tn2Dw3EpgHzPcY+AYJrPo9op2IzBPaXf4JoIF4NmNGjgXg1QO2pPRrI1wD+mM8FvcAFDcStAbwx7vpz/FN/NBCuAXwznBk6a86M4Fp4iMICKIzHzY7gWrP6EVxrxs38guBaOD+Ca9X/42k7VjOirQgjNDAtDZjvMfYJEFzzeUQ7heFNy/CoJ/VEA8UaMKOHUTEj2MAGDcSpAfwxzrpzvFN3NFCuAbyxnA/6gQ8aQANZDeCbaCKriT6nCa6FhygsgMJ43OwIrjWrH8G1ZtzMLwiuhfMjuFbdLqLtWM2oz7YU64Y/Gpi/Bsz3GPsECK75PKKdwpTmb0owhzkaWIwGzOjhvxj+cIc7GhiuBvDH4daG44baoIHFaQBvXBx7dA97NDBODeCb46zbWI83gmvhIQoLoDAeNzuCa83qR3CtGTfzC4Jr4fwIrlW3i2g7VjMaazuN7aa2aCBfA+Z7jH0CBNd8HtFOYRz5xgEXuKCB6WnAjJ7aTq+21JSaooF2GsAf2/FDf/BDA9PUAN44zbpyvFJXNNCfBvDN/tii26VsCa6FhygsgMJ43OwIrjWrH8G1ZtzMLwiuhfMjuLb0f3e2PUPbsZpRlhnTMEMD49aA+R5jnwDBNZ9HtFMY3LgNjvpRPzRQXwNm9DCrzwxWsEIDcWgAf4yjzhzP1BkNhGkAbwzjhb7ghQbQAL6JBubpAwTXwkMUFkBhPG52BNea1Y/gWjNu5hcE18L5EVyrbhfRdqxmNM+2Fd9FPdBA/xow32PsE4g2uLZ1+65k5Q0bkjXrNvlEIp3ChPo3IRjDGA0MQwNm89RjGPWgDtQBDQxHA/jjcGrBcUEt0MBwNIA3DqcWHBfUAg2MQwP45jjqNJXjieBaeIjCAiiMx82O4Fqz+hFca8bN/ILgWjg/gmvV7SLajtWMptJuYz+oNRo4pQHzPcY+gdLg2qYHHk/OOHvZ7O/1/Qf8JT+dOnnyZPKt7y+fzXPR5atz59GbT//5uXRde199I53vzB9dmnz+S2clXznjvPS9eby47Kqbkr/5f/+W/O3ffXUeX9fLd+x+4ZWUqdXpf773q+T7P1uRnL/8umTl9bcmjz2xI/n4448rvx+j4J8FGkADsWjADDGW/WU/ObbRABqoqwH8Ea3U1QrzoZWYNIA3oveY9M6+ovcuNIBvoqMudFR3HQTXwkMUFkBhPG52BNea1Y/gWjNu5hcE18L5EVyrbhfRdqxmVLddxHywRAPj0ID5HmOfQGlwTaEnhbv0t2HjQ/6Sn07tfn5vOo/mO3LkWO58l1y5Np3v8EdH0nn+8cvfnb0/7wDZFIJrf9r6l5Sp1Slv/JnPfT25ctUtydGj+bVRMTCycRgZdaJOaKC9BuwfECzbs4QhDNHAtDSAP06rnhyf1BMNdKMBvLEbjugRjmggHg3gm/HUegjHNcG18BCFBVAYj5sdwbVm9SO41oyb+QXBtXB+BNeq20W0HasZDaHNxTZQJzTQnQbM9xj7BEqDa4cOH0mDUT/4+ZX+kp9O3bj+3nQehaae2LYzd75/+c8fzeb78jd/6n2uXtrUS9gPz89fvzdzhxNTC65955zLkhXXrEuWX3Fjcs4FK5NvnvXLWW9ybpBNnE+cyO99DbPpzmxgCUs0MGwN2L8S6jTsOlEf6oMG5q8B/HH+zNE5zNHA8DWANw6/RhxH1AgNDEsD+Oaw6jH144PgWniIwgIojMfNjuBas/oRXGvGzfyC4Fo4P4Jr1e0i2o7VjKbenmP/0EBsGjDfY+wTKA2uaVYFzRR+Uq9deYMeE+qGo65YtX7JbB8ePJzOo56/hjBMLbim3vGyw/HjJ5Itj26bPYrVaqTHiJ48+Ul2Vnpce5t/CrH9U2B/49W8GSAaiFcD1J7ao4F8DeCP+VzQC1zQQNwawBvjrj/HP/VHA+EawDfDmaGz5swIroWHKCyAwnjc7AiuNasfwbVm3MwvCK6F8yO4Vv0/nrZjNSPaijBCA9PSgPkeY59AZXBt5Q0b0tDZ3lff8JY+efJk2quXBdyyPappgUcefyZdx7Ydu711PLPz+WTTA48nT25/1ntfYTe9r7/3Pjg46ynswUe3Jb+5el3yfz+8JPnl5b9L7t3yRPLJJ0tDWO6K3jzwbrL+ji3JT35xdXLmjy5NLl6xZhbm0lhhrrJHlO7Y+UKy+ua7Z73BKfB1/dqNSXb77bt2Pfdyur32njt+6pm/zj7f8/I+9+3Z63ff+zBd9oMPDy35vOgN91GhecE1W07r//yXzkpr8KsVa+yjdIzhTcvwqCf1RAPFGjDjg1ExI9jABg3EqQH8Mc66c7xTdzRQrgG8sZwP+oEPGkADWQ3gm2giq4k+pwmuhYcoLIDCeNzsCK41qx/BtWbczC8IroXzI7hW3S6i7VjNqM+2FOuGPxqYvwbM9xj7BCqDa0//+bk08HTLnVu8pXe/8Er62UOPbU9fHzlyzJvv0qvWpp8dO37C+0xhMgXIsoG3Z3e/lC5z9erbvOCV9R6msR5/+fHH+Y+/3Pzgk2mwzl3GfZ0XXDt69FhywfLr0+9359drPdb08EdHvP1YfdNd6fyv7nvT+0zbp+/Rstre7HDrxgfTZd868G7248LpusE1reDA2+8ln/3CGen3ZANymNL8TQnmMEcDi9GAmSr8F8Mf7nBHA8PVAP443Npw3FAbNLA4DeCNi2OP7mGPBsapAXxznHUb6/FGcC08RGEBFMbjZkdwrVn9CK4142Z+QXAtnB/Btep2EW3HakZjbaex3dQWDeRrwHyPsU+gMrimx01acEu9jrnDjevvnX32zbN+mRw6fCSdL9t72r/8549mn33jOxe5i89e1wmu2fdr/JUzzpv9ue/dfs8jS9Z7/0Nb0+2x5c67eFVy9rm/mT321JbPC64pXGafq6cy9c62/Iobk3/66g/S97Ud7qAe12yZOzc96n4066XNPtP44KHD3uc/vnDlbNl/+NczvferJkKCa1rXr6/+Q7qNGzY+5K0e48g3DrjABQ1MTwNmftR2erWlptQUDbTTAP7Yjh/6gx8
amKYG8MZp1pXjlbqigf40gG/2xxbdLmVLcC08RGEBFMbjZkdwrVn9CK4142Z+QXAtnB/BtaX/u7PtGdqO1YyyzJiGGRoYtwbM9xj7BCqDa5rdglyf+dzXvUdz6pGdCmKtWnPnbK3//o1zZ9NXrrol/RaFtCy0dd2Np+ZLP0yS2eM79XlZj2v6/Ia1dyVHjx1PF3V7e8uGyNRjmvUupm3W40jdQT2gWVgsG1xT6M62V/MouGeDlrvo8tXp5wrH2eA+NvW8Zdfa27OxHmtq69T4rs1/9D5XYE3va90hQ2hw7YGHT4f5LrlyrfdVGNy4DY76UT80UF8DZn4wq88MVrBCA3FoAH+Mo84cz9QZDYRpAG8M44W+4IUG0AC+iQbm6QME18JDFBZAYTxudgTXmtWP4FozbuYXBNfC+RFcq24X0XasZjTPthXfRT3QQP8aMN9j7BOoFVyzntUUrtqzd99sDW5Qa8fOF2bvrbhm3SyA5QbJHntiRxra2vnXPf631wyuaR15g/XWpnCaO6g3MQuKZXs/s/kuu+qm2TzZ4Jp6hdOyel+9yGUHPepU36d51JOcO5z1k8tn77s9p508+Uk6v4XpvvX95eli773/4WwZre+BR55K36/zIjS49sb+t9Pvyvaehwn1b0IwhjEaGIYGzF+pxzDqQR2oAxoYjgbwx+HUguOCWqCB4WgAbxxOLTguqAUaGIcG8M1x1GkqxxPBtfAQhQVQGI+bHcG1ZvUjuNaMm/kFwbVwfgTXqttFtB2rGU2l3cZ+UGs0cEoD5nuMfQK1gmvP73k1DTytu/2B2RrcHs+sV7LHn9qZznfkyLHZfG5ATD2WZQcLn5X1uLZ9x+7sYrPp31x9Kiin0Jc7WA9nCp/lfafmdbfLXdZCaededLX7tvdajw3Vd+rvk08+ST9bf8eW9P133v1g9v5fdr04e0/rdUN8hw59NPv84T89nS7zwYeH0nXVeREaXHv/g4Ppd+mRqe6AUfDPAg2ggVg0YN4Xy/6ynxzbaAAN1NUA/ohW6mqF+dBKTBrAG9F7THpnX9F7FxrAN9FRFzqquw6Ca+EhCgugMB43O4JrzepHcK0ZN/MLgmvh/AiuVbeLaDtWM6rbLmI+WKKBcWjAfI+xT6BWcE3hLAt0WeBp7S33zkJQ6qHMho+OHE2DUVu375q9rV7JFPDSY0XzhjbBNT2i1AJk7rr/+8wLZ+9/8Ws/dt/2XucF19ztX3nDBm9+d2LtrZvT733rwLvpR6+8tj99f8uj22bv2/dceMn1sxCdwnTa5rvvO/W40BXXrp9NZ3tvS1da8iI0uLb16V3p9inc5w4Y2TiMjDpRJzTQXgPmfbBszxKGMEQD09IA/jitenJ8Uk800I0G8MZuOKJHOKKBeDSAb8ZT6yEc1wTXwkMUFkBhPG52BNea1Y/gWjNu5hcE18L5EVyrbhfRdqxmNIQ2F9tAndBAdxow32PsE6gVXNMierSkAlcKXinIZoGza39/h7dG9Zym+a667tZEvYpZsExhr7zB1tOkx7Ub1t6Vrt9dtx7Vqe894+xl7tveawuUaX9sePPAu+n69HjUosF9FOlLe1/3ZrPHgapXNg1//8/fnq1ToTEN5y27djat/dbwX9++YDat7QkdQoNra9ZtSvcvWw/MpjuzgSUs0cCwNWBeS52GXSfqQ33QwPw1gD/Onzk6hzkaGL4G8Mbh14jjiBqhgWFpAN8cVj2mfnwQXAsPUVgAhfG42RFca1Y/gmvNuJlfEFwL50dwrbpdRNuxmtHU23PsHxqITQPme4x9ArWDa7ff80gaenruxVdmATaFw57+83PeGq9YdaoHsa+ccZ73aMwXX3rNm88m+giuWRgstMc1PVbUgnZlQTK3pzd75Kftjx4xqnWoBzV7nKr7yFI3bKZHg9r36f3QwV2XHkNaNmjf/uN/f55+n9tTnJaLzRDYX/4JooF4NWBeiQbi1QC1p/ZoIF8D+GM+F/QCFzQQtwbwxrjrz/FP/dFAuAbwzXBm6Kw5M4Jr4SEKC6AwHjc7gmvN6kdwrRk38wuCa+H8CK5V/4+n7VjNiLYijNDAtDRgvsfYJ1A7uPbG/rfT0JOFsxS6Onb8hLdGPSLUwlgXXb569lqPGVUvbXlDH8G1n/ziVHhMgbHs9tk25PW4ps+stzY9brRo0ONStY/ar+yw6YHH0/2/eMWa2evzl1+XzqYAmbZLy6tXNmOlx5SGDiHBtZ8tuyb9rrzHtmJ40zI86kk90UCxBsxrYVTMCDawQQNxagB/jLPuHO/UHQ2UawBvLOeDfuCDBtBAVgP4JprIaqLPaYJr4SEKC6AwHjc7gmvN6kdwrRk38wuCa+H8CK5Vt4toO1Yz6rMtxbrhjwbmrwHzPcY+gdrBNS1mj720sJV6NssOR44cSwNSNp8eM1o09BFc02M+7bs3bn5syVdrG/UYUc3jPipUM5538ap02d3P712y7Kv73kw/V4AtOxx45/30c9uGJ7c/681mjwu1z9U7XZOhTnDtvQ8OJhags/3N9ram78aU5m9KMIc5GliMBsxv4b8Y/nCHOxoYrgbwx+HWhuOG2qCBxWkAb1wce3QPezQwTg3gm+Os21iPN4Jr4SEKC6AwHjc7gmvN6kdwrRk38wuCa+H8CK5Vt4toO1YzGms7je2mtmggXwPme4x9AkHBtQuWX++Fsn57w23+2j6dch9JqbDUho0P5c6nN/sIrh06fCTt1cy+f/9b7yR79u5LbrlzS/LZL5yR7kc2uKZQlwXK1KOaHoVqvcXteu5lL7z3ymv7c/fr8186K12H1q9e1tzBDZzpu1besMH9uPZrdz3X/O72ZOvTuxK9d9fmPyZr1m1KFBi0fdFY2/L4Uztz149x5BsHXOCCBqanATNBaju92lJTaooG2mkAf2zHD/3BDw1MUwN44zTryvFKXdFAfxrAN/tji26XsiW4Fh6isAAK43GzI7jWrH4E15pxM78guBbOj+Da0v/d2fYMbcdqRllmTMMMDYxbA+Z7jH0CQcG1Bx7e6gWhtu/Y7a/t06mV19/qzffa62/lzqc3+wiuab1ur2tueCv7Ohtc07Lrbr/f237Noz932etuvLNwn9weztzHhNoC7uNCtU6F45oMbnDN3ba81+rVbc/L+wq/BoMbt8FRP+qHBuprwIwQZvWZwQpWaCAODeCPcdSZ45k6o4EwDeCNYbzQF7zQABrAN9HAPH2A4Fp4iMICKIzHzY7gWrP6EVxrxs38guBaOD+Ca9XtItqO1Yzm2bbiu6gHGuhfA+Z7jH0CQcG19z846IW3jh477q/t06ltO3an86l3s7LhO+dcNps3+7hM9W5mAayiYNfqm+9O58n7jjvueST93NalANqtGx9MLrvqptln6lUtb9B3Zh+NqnVofxQYKxse/tPT6fcW9XDmPi70+PETZasr/Ezrtv3KjrVfX/7mT5PlV9w462mucCWffoAJ9W9CMIYxGhiGBswPqccw6kEdqAMaGI4G8Mfh1ILjglqggeFoAG8cTi04LqgFGhiHBvDNcdRpKscTwbXwEIUFUBiPmx
3BtWb1I7jWjJv5BcG1cH4E16rbRbQdqxlNpd3GflBrNHBKA+Z7jH0CQcE1f9FxTJ048XGy+/m9yZZHtyV6tOfJk58Ebfjb774/e7ymwmp6jOhUB4yCfxZoAA3EogHz8Vj2l/3k2EYDaKCuBvBHtFJXK8yHVmLSAN6I3mPSO/uK3rvQAL6JjrrQUd11EFwLD1FYAIXxuNkRXGtWP4JrzbiZXxBcC+dHcK26XUTbsZpR3XYR88ESDYxDA+Z7jH0Ckw+u+bvLVBEBjGwcRkadqBMaaK8B80FYtmcJQxiigWlpAH+cVj05PqknGuhGA3hjNxzRIxzRQDwawDfjqfUQjmuCa+EhCgugMB43O4JrzepHcK0ZN/MLgmvh/AiuVbeLaDtWMxpCm4ttoE5ooDsNmO8x9gkQXPN5RDuF2XRnNrCEJRoYtgbM6KnTsOtEfagPGpi/BvDH+TNH5zBHA8PXAN44/BpxHFEjNDAsDeCbw6rH1I8PgmvhIQoLoDAeNzuCa83qR3CtGTfzC4Jr4fwIrlW3i2g7VjOaenuO/UMDsWnAfI+xT4Dgms8j2qnYDIH95Z8gGohXA2b0aCBeDVB7ao8G8jWAP+ZzQS9wQQNxawBvjLv+HP/UHw2EawDfDGeGzpozI7gWHqKwAArjcbMjuNasfgTXmnEzvyC4Fs6P4Fr1/3jajtWMaCvCCA1MSwPme4x9AgTXfB7RTmF40zI86kk90UCxBszoYVTMCDawQQNxagB/jLPuHO/UHQ2UawBvLOeDfuCDBtBAVgP4JprIaqLPaYJr4SEKC6AwHjc7gmvN6kdwrRk38wuCa+H8CK5Vt4toO1Yz6rMtxbrhjwbmrwHzPcY+AYJrPo9opzCl+ZsSzGGOBhajATN6+C+GP9zhjgaGqwH8cbi14bihNmhgcRrAGxfHHt3DHg2MUwP45jjrNtbjjeBaeIjCAiiMx82O4Fqz+hFca8bN/ILgWjg/gmvV7SLajtWMxtpOY7upLRrI14D5HmOfAME1n0e0UxhHvnHABS5oYHoaMKOnttOrLTWlpmignQbwx3b80B/80MA0NYA3TrOuHK/UFQ30pwF8sz+26HYpW4Jr4SEKC6AwHjc7gmvN6kdwrRk38wuCa+H8CK4t/d+dbc/QdqxmlGXGNMzQwLg1YL7H2CdAcM3nEe0UBjdug6N+1A8N1NeAGT3M6jODFazQQBwawB/jqDPHM3VGA2EawBvDeKEveKEBNIBvooF5+gDBtfAQhQVQGI+bHcG1ZvUjuNaMm/kFwbVwfgTXqttFtB2rGc2zbcV3UQ800L8GzPcY+wQIrvk8op3ChPo3IRjDGA0MQwNm9NRjGPWgDtQBDQxHA/jjcGrBcUEt0MBwNIA3DqcWHBfUAg2MQwP45jjqNJXjieBaeIjCAiiMx82O4Fqz+hFca8bN/ILgWjg/gmvV4qgfSAAAIABJREFU7SLajtWMptJuYz+oNRo4pQHzPcY+AYJrPo9opzAK/lmgATQQiwbM6GPZX/aTYxsNoIG6GsAf0UpdrTAfWolJA3gjeo9J7+wreu9CA/gmOupCR3XXQXAtPERhARTG42ZHcK1Z/QiuNeNmfkFwLZwfwbXqdhFtx2pGddtFzAdLNDAODZjvMfYJEFzzeUQ7hZGNw8ioE3VCA+01YEYPy/YsYQhDNDAtDeCP06onxyf1RAPdaABv7IYjeoQjGohHA/hmPLUewnFNcC08RGEBFMbjZkdwrVn9CK4142Z+QXAtnB/Btep2EW3HakZDaHOxDdQJDXSnAfM9xj4Bgms+j2inMJvuzAaWsEQDw9aAGT11GnadqA/1QQPz1wD+OH/m6BzmaGD4GsAbh18jjiNqhAaGpQF8c1j1mPrxQXAtPERhARTG42ZHcK1Z/QiuNeNmfkFwLZwfwbXqdhFtx2pGU2/PsX9oIDYNmO8x9gkQXPN5RDsVmyGwv/wTRAPxasCMHg3EqwFqT+3RQL4G8Md8LugFLmggbg3gjXHXn+Of+qOBcA3gm+HM0FlzZgTXwkMUFkBhPG52BNea1Y/gWjNu5hcE18L5EVyr/h9P27GaEW1FGKGBaWnAfI+xT4Dgms8j2ikMb1qGRz2pJxoo1oAZPYyKGcEGNmggTg3gj3HWneOduqOBcg3gjeV80A980AAayGoA30QTWU30OU1wLTxEYQEUxuNmR3CtWf0IrjXjZn5BcC2cH8G16nYRbcdqRn22pVg3/NHA/DVgvsfYJ0BwzecR7RSmNH9TgjnM0cBiNGBGD//F8Ic73NHAcDWAPw63Nhw31AYNLE4DeOPi2KN72KOBcWoA3xxn3cZ6vBFcCw9RWACF8bjZEVxrVj+Ca824mV8QXAvnR3Ctul1E27Ga0VjbaWw3tUUD+Row32PsEyC45vOIdgrjyDcOuMAFDUxPA2b01HZ6taWm1BQNtNMA/tiOH/qDHxqYpgbwxmnWleOVuqKB/jSAb/bHFt0uZUtwLTxEYQEUxuNmR3CtWf0IrjXjZn5BcC2cH8G1pf+7s+0Z2o7VjLLMmIYZGhi3Bsz3GPsECK75PKKdwuDGbXDUj/qhgfoaMKOHWX1msIIVGohDA/hjHHXmeKbOaCBMA3hjGC/0BS80gAbwTTQwTx8guBYeorAACuNxsyO41qx+BNeacTO/ILgWzo/gWnW7iLZjNaN5tq34LuqBBvrXgPkeY58AwTWfR7RTmFD/JgRjGKOBYWjAjJ56DKMe1IE6oIHhaAB/HE4tOC6oBRoYjgbwxuHUguOCWqCBcWgA3xxHnaZyPBFcCw9RWACF8bjZEVxrVj+Ca824mV8QXAvnR3Ctul1E27Ga0VTabewHtUYDpzRgvsfYJ0BwzecR7RRGwT8LNIAGYtGAGX0s+8t+cmyjATRQVwP4I1qpqxXmQysxaQBvRO8x6Z19Re9daADfREdd6KjuOgiuhYcoLIDCeNzsCK41qx/BtWbczC8IroXzI7hW3S6i7VjNqG67iPlgiQbGoQHzPcY+AYJrPo9opzCycRgZdaJOaKC9BszoYdmeJQxhiAampQH8cVr15PiknmigGw3gjd1wRI9wRAPxaADfjKfWQziuCa6FhygsgMJ43OwIrjWrH8G1ZtzMLwiuhfMjuFbdLqLtWM1oCG0utoE6oYHuNGC+x9gnQHDN5xHtFGbTndnAEpZoYNgaMKOnTsOuE/WhPmhg/hrAH+fPHJ3DHA0MXwN44/BrxHFEjdDAsDSAbw6rHlM/PgiuhYcoLIDCeNzsCK41qx/BtWbczC8IroXzI7hW3S6i7VjNaOrtOfYPDcSmAfM9xj4Bgms+j2inYjME9pd/gmggXg2Y0aOBeDVA7ak9GsjXAP6YzwW9wAUNxK0BvDHu+nP8U380EK4BfDOcGTprzozgWniIwgIojMfNjuBas/oRXGvGzfyC4Fo4P4Jr1f/jaTtWM6KtCCM0MC0NmO8x9gkQXPN5RDuF4U3L8Kgn9UQDxRowo4dRMSPYwAYNxKkB/DHOunO8U3c0UK4BvLGcD/qBDxpAA1kN4JtoIquJPqcJroWHKCyAwnjc7AiuNasfwbVm3
MwvCK6F8yO4Vt0uou1YzajPthTrhj8amL8GzPcY+wQIrvk8op3ClOZvSjCHORpYjAbM6OG/GP5whzsaGK4G8Mfh1objhtqggcVpAG9cHHt0D3s0ME4N4JvjrNtYjzcF11avP8gfDKLTgIJraD/82FdwDW7h3IyZgmv2mnE9jpsePpRsefxQY26bHz2U3P9Y8+XnUae7thxK2rQjaDvSdmyjH5ZFP2PUgPkeY58AwTWfR7RTYzyo2Wb+GaEBNNBEA2b0TZZlGTSHBtDAlDWAP6LvKeubfUPfTTWAN6KdptphObQTqwbwTbQfq/bZb7RfpoGDh48n+iubh8/QEBpAAzFqgLYjuo9R9+xz3Lo332PsEyC45vOIdgqDjNsgqT/1j0kDZvQx7TP7yjGOBtBAHQ3gj+ikjk6YB53EpgG8Ec3Hpnn2F8231QC+iYbaaojl0dAUNYA3ousp6pp9QtddaAB/REdd6Ih1oKMxacB8j7FPgOCazyPaqTEdzGwr/3zQABpoowEz+jbrYFk0iAbQwBQ1gD+i6ynqmn1C1201gDeiobYaYnk0FJsG8E00H5vm2V80X0cDeCM6qaMT5kEnMWoAf0T3MeqefY5b9+Z7jH0CBNd8HtFOYZBxGyT1p/4xacCMPqZ9Zl85xtEAGqijAfwRndTRCfOgk9g0gDei+dg0z/6i+bYawDfRUFsNsTwamqIG8EZ0PUVds0/ougsN4I/oqAsdsQ50NCYNmO8x9gkQXPN5RDs1poOZbeWfDxpAA200YEbfZh0siwbRABqYogbwR3Q9RV2zT+i6rQbwRjTUVkMsj4Zi0wC+ieZj0zz7i+braABvRCd1dMI86CRGDeCP6D5G3bPPcevefI+xT4Dgms8j2ikMMm6DpP7UPyYNmNHHtM/sK8c4GkADdTSAP6KTOjphHnQSmwbwRjQfm+bZXzTfVgP4JhpqqyGWR0NT1ADeiK6nqGv2CV13oQH8ER11oSPWgY7GpAHzPcY+AYJrPo9op8Z0MLOt/PNBA2igjQbM6Nusg2XRIBpAA1PUAP6Irqeoa/YJXbfVAN6IhtpqiOXRUGwawDfRfGyaZ3/RfB0N4I3opI5OmAedxKgB/BHdx6h79jlu3ZvvMfYJEFzzeUQ7hUHGbZDUn/rHpAEz+pj2mX3lGEcDaKCOBvBHdFJHJ8yDTmLTAN6I5mPTPPuL5ttqAN9EQ201xPJoaIoawBvR9RR1zT6h6y40gD+ioy50xDrQ0Zg0YL7H2CdAcM3nwRQEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEI9EyA4FrPgFk9BCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCPgECK75PJiCAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAgZ4JEFzrGTCrhwAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAGfAME1nwdTEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEINAzAYJrPQNm9RCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCDgEyC45vNgCgIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAR6JkBwrWfAY1n9iRMfJ4cOfdR4c48dP5F8/PHHjZY/eOhw42UbfSELQQACEPiUQFvvO3nyZHL02PFGPI8cOZYcPXqs0bIsBAEIQGBeBNp6lZZvMpw8+Uny4cHDySeffNJkcZaBAAQiJPDs7peSlTdsSK5efVuiNlrboe15alP/a9s+bbvfLA8BCCyOgNo98oApDW09jeuNU1ID+wIBCLgE2p7zck3SpclrCEBg7ATG2mZse9107HVj+yEAgXYExtoebOvZRdQIrhWRKXn/8ad2JmecvWzJ38+WXVOy1LA+euvAu8naWzcn/33mhclnv3BG8jf/799mf3/7d19N/v0b5yZ33/fHRAdL2bD/rXeS85dfl/zTV3+QLv+VM85LfnP1uqToIr3CbQ89tj0RKy2n77Pv/vt//vbs/TfefLvsa73P9r1xYLa9X/7mTwm/eWSYgEA7AsePn/A87taND5auUBfYz7lgZbrM5b+9uXT+RX3YhfcpqHbFqvWJ/M78S3523sWrktdef6tw157f8+psOXnsZz739XRZvf6f7/0q2bHzhcJlsx+It/4PaV27n9+b/ZhpCECgBwK7X3gl9Tgdf3/Z9WLptxx4531v/k0PPF46/1A+7MKrtu/YnXz/ZysSte3kk2rvyefW3f5A4W4qILJh40PJmT+6NPmHfz0z9Ugt/49f/u4siPLRkaOFy2c/eOCRp2YeeeEl12c/YhoCEJgzgawfuufS731wsLOt0fmttc0UdAgZujpPlY/J7+w8Vz4oP3xy+7OFm9NF+9Rd+Ypr18/8T9vCAAEIDJ+A2pg6l1Z7xzxMHqJzzG99f3my9pZ7kzf2179Otug97sLTuN646Cry/RBYHIE7Nz3qnUfftfmPhRvzxLbT92h0HXPoQxfnvFyTHHqV2T4IzIcA96hPcV5Em7GL66bzUQnfAoFxE5jqPeqxtge7OM+vo0iCa3UoZebRBWC7mOSOdVF6LIMbNnP3wX39X9++oDAMpn/ObvDCXU6vFab44MNDS3A8/Kenc9lll//T1r8sWTbvjYtXrEnXN4YT1Lx94D0IDJGAwqfucakL52WBAR2z7vwKdQ1xaOt98jU3sObus17LF3XjITuoMZKdN2/6ylW3ZBfNnd7y6LZ0fWU3QnMX5k0IQKARgazPffOsX5au59dX/yE9TnW8X3XdraXzD+HDLrxKAb08f7P3Llh+fW5PSMuvuLF0OS0vj9VJUtWg/2Fqi2oZ/biBAQIQWCyBV17bX3h8h/xoqWov2gTX2p6n6kdfy379+8L9lB9t3PxY7i60bZ+6K9WPKCw0px9aMEAAAsMloMCsAmvWRqoajyWM2tbTuN44XM2yZRCYBwH9IN71Q7VrDn90JPerb7/nkXTeoh/R5y64oDfbnvNyTXJBheNrITBAAtyjTpJFtBm7uG46QDmxSRAYJIGp3qMea3uw7Xl+XZERXKtLypnvlX1vJjoxsr/Pf+ms2UnSGINrEtpvb7gtuf+hrYluNOrX4HahWyeJeSEKBcTc0Jp+0b11+65ZT2o/PP/K9IQx74aue0NAva6p543Hntgx2wZX9NqGd979wKF+6qWM6o57Hpn1uqEbke6JLMG1Jbh4AwKNCWQbBTrW1qzbVLg+BV3d43HowbUm3qedV09Atp/yywcf3ZY89cxfZ15p78u/shfM3JMa9cLx+z/ck8gP/3Db/ck3vnNRuk6toyiIpt6DtJy7DWXzFxaLDyAAgUYEssE1HX+7nns5d116/LrbntK8Ywuu
NfEqBXe1r/pTW/HG9ffOepNUWMNtt12/duMSbnbSpuUuvWptol/ayyd/efnvvN6Bi/6/vL7/QKLeQbWs9fSm7SC4tgQ1b0Bg7gT04wc7d9b4O+dclnrFEINrTc5T1SOS+d8Xv/bj2TmretNVmM49d87rrdPOg5u2T//87AuzNqW22/3fQ3Bt7lLnCyEQREC9wppvaKy2l9pOekrBDWvvSn584UqvF1p9NoahjadxvXEMFWYbIdAvgWxwTf6oexd5g9qV5qPZ63B58y/6vbbnvO71QK5JLrqafD8EFkuAe9SLuUfdxT2exSqHb4fAeAhM9R71WNuDbc7zQ1RHcC2EVsG8urikk6QxBdf0GD8FxvS4ueygG392wVuPasoOCrnZSWH2wpnW972frUg/z96I2PnXPYmCbu++92F2tbNp3Zy0dec9Ukvrs8+zY4JruUh5EwKNCOQ1CnTTLe+xSzqus8djUbCg0cZ0uFAb79Njrmw/dbEo+zhlBXHtcz1u2R3EUzcTX3zpNfft9LVCabasHsGcN7hBDJtX46KgW946eA8CEGhOIC+4pvBF3rD6prvSY9qO1zEE19p61UWXr073OxvO0MUdewSo/CzbBr3t7oeTW+7ckujRJ9lBbbx/+c8fpevO63VNj5Ax1u6Y4FqWJtMQWDwBBfHtOM2eL7bZujY9rrU9T3V/zPZ+5vGneqy77e95y65dsott2qdamdqYtn53THBtCWregMBgCOixRna86gLwhwcP527byZMnZ701fvYLZ8xCbbkzDezNNp7G9caBFZPNgcACCFhwTdcgrRdt3afIe7z82IJrbc55uSa5ADHylRAYEQHuUZ8uVp/3qNteNz29lbyCAASqCEz1HvVY24NtzvOrau1+TnDNpdHwdVmjQBej7nvoydkvg3QzT7+GUShBNy/Vi0Q2+GCb8PSfn5v1gLb16V2zt/R4lZs33Dd7jIDWcc3vbk+6vMhv32vjs35yeXoRLdsdt3pS0wU2nTRmbzpq+b2vvpEuW/SLKPue7FihDrt4p4MgOxw6fGTGTvz05z6yj+BalhbTEGhOwG0UXHLl2vS4VKggO1ivGWecvSyx1HVecK2pH8r/FGR1w6xHjx6b9eKj9x5/amd2kxpPl3nfdTfemXJ4Yc/SAJr80HrUUA90oYMFhtVLR96gnjzM+9xHyhBcy6PFexDonoAbXLvsqptSP9izd5/3ZW5PEe7jQnX8Zgd1a6+L7QoXnHfxqlk7Ue3Fa39/R25vbvJm80MFIcoGba/m1Q8VuhyKvEq+bJ8VBfrUI5q187bv2B20Weqh0pbN2yf5snmkxrrBq/kJrgVhZmYIzIVAWXDNvEu9iOUNerSe+eC+Nw54s7QJrnkrypkoO0/Vtpo/6Zw9b9A5vM2jHuhChrL2qdaj3plc/7PvIbgWQpl5ITBfAvIxO1b1umrQD8j0iLii4Zmdz896SFcvbfIMPT1B1xzzBoXkzEcVBJEn6bqleq3VD7TkHfLivoYyT+N6Y1/UWS8ExkPADa7pnNG8Ur1TZIc6wTV5p37kpGub3/r+8uRXK9bMevd+7/2lP6rXDxnMH0+c+Dj7den02+++n86X96OqdMbAF2XnvFyTDITJ7BCIjAD3qP2C93mP2v+mpVN2bbToHs/SJXgHAhDIIxDrPeohtwfz6mTvlZ3n2zx1xgTX6lCqmKesUZB9fJ6dbNlYPU4olJEdbJ3qXUIBLpvfHesf4BPbugtsuNvgXljP9nxh26ALWkWDhTf++8wLi2bJfd9tUNS50K4bu7Y9BNdykfImBBoRcBsFuhBk/3TkWbphaIP8y45BXSwvC6419cPVN9+dfofCq7rIZN9pY9uetuMy7zNflvcWhY5/8PPTj0sumqdoG80364QsxNr2neBaEVHeh0C3BNzg2p6X96VBVQVJ3cEunssr3th/uqfYbHBNHmHHcdFYjxJWIMwG+a95hXr3yfsBgeZVz7a2zrzefWx9Tcb2/Vmv0mNT7TsVHskb3P8ZasOFDG7orU77V21QbU92O0O+k3khAIF+CJQF16x3RbXJ8gYFLsxrNmx8yJulz+Ba2Xmquz/PvfiKt002oW217VY7LmQoa5/mrce+p875dN7yvAcBCPRPwG3X5D1Cve4W6PH0CqvZcZ8dKwimXnrc4dndL6XzX7xiTRr2zy6rJyK45/7uOtq8LvM02wauN7YhzLIQGDcBN7imPdGPZM0b9INYd7Bzb32u65jZQR0D2A+abB021nltNqTr3ijc8ui27OrSadtGrSv7Q4p0pgYv3P8N2XNerkk2AMoiEIiIgHmE7t1kh6b3ZGyd3KPOEi2fLrpuWr4Un0IAAlkCsd6jHkt7MFuvsvP87Lxl0wTXyujU/Mz+gZc1CvSZep9Y9uvfz3pc+8cvfzc96dJjk7LhMFunnUxprBMtnaxpXfa+Xpf9AqjmLniz6VEE9h36TndQ72v23Stv2OB+5L22k0rtZ8hwxz2PpOvX66qB4FoVIT6HQDMC2UaBG0jQLxVtUA+S8gTrYa1OcC3UD93gWtEFJ9ueNuMy79N67Ubqf/zvzwu/5oa1px8PmPcYg6IFX3v9rdT7dOOhaiC4VkWIzyHQPQE3uPbqvjdnvVpYm8guViuMZo/DVE8X+nW3zVMWXJO//PD8K2fBXHmAXeTQsllPUG+2ts6nnvlr7o6uWnO6h8iiEEXughVvlnmVekGz7dpW0puazaOboSGDe1PWeJctT3CtjA6fQWCxBNygV7YXcWtvDS24Vnae6vbCqfZk3qAePMz/9Ij4ukNV+zRvPfY9BNfy6PAeBIZBQGEKO1bVdnzzwLvBG6ZQmZ1/a10Ke+ncWR6rXoVs/bq+6A5ucM3m0VjnueoZwn1vzbpN7qKtX5d5GtcbW+NlBRCYBAELhemcWMPuF15JfUmPR3eHsuCa2/aSr5197m9mvVG6PzjV++6PQQ8eOv0Diax32ve6PyZTOLjLoeyc19rIXJPskjjrgsB0CNj9ZN13yQ4WXAu9J2PrdNuG3KPO0vWny66b+nMyBQEIVBGI8R61mIylPejWr+w8352vzmuCa3UoVcxj/8DzGgW6yfnS3tdz17Di2vXpiVf2USi2TjUKFAjRSZoNEoBublqDYev2U48Ttc/bjhVKsXVnH9fp/uO9cf29hV+lk0GtQ72N1B3UY5p6D7Hvfv+Dg5WLElyrRMQMEGhEINso0ErU84+OTx2nCmeoa3w7Xu2xbXbh3IJs7pc39UM3uKbv08nWnZsenXmrPKksIOF+f9XrMu/Tshaa002AomHd7Q+kTLKPDyxaRu/r4luWZdn8BNfK6PAZBPohkA2u6eaa2jk6di9Yfv3sS/W4NjuWFV4tC65pAT2G5J13P1iywXoclF1Y0vrU9rNBvWbYd6i9lR10Id38Ks+Ls/OHTJd5lRvqKHuMqTHL2/aibXHbn3W72ie4VkST9yGweAJjC65Vnae6F5WK6KpdaN6t9mLdoap9mrce+x6Ca3l0eA8CwyCg9pr7g1YdtzrP1A+h1OZU77lVw0233pf6itqg2cE9j3avG7r
BNbXL1COk29b8y64X0zaugiNd/li2zNPc9h7XG7PVZBoC8RDIBte0524PDu59lrLgms6F5a3yOfmaO+jHXfZjMXmx64HnXnR16q15jwHVY5itrfVgSa9s7vfVee16YN45r53jc02yDk3mgUB8BOx+MveoT9d+XveoT39j+D0ed1leQwACPoEY71GPqT3oVqvsPN+dr85rgmt1KFXMU9YoKFtUNyrtREdd/7mDrVMnKnmPgdIJli2bfUSKu57Q13qEk91Q1PjA2+95q1DArs73ujc2617k0k1fW7d+tV5nILhWhxLzQCCcQF6jYOvTu9JjVD1FrLhm3WxaYTXzqbLgWtlWlPmhe8FdwQj7rrL1hX5W5X36TvMn/TqzaHD/QYtXneHeLU+k6y7y/Ox6CK5liTANgf4JZINr+kb1ombeoDbTv3/j3Nm0HmmsoSq4VrbV9z30ZLpuPaLOHdwL99m2mhueu/+hre5irV5XedV1N57u5U2eWjToIpqY1Q3V6VGp9r9Fyz3y+DNFq/beJ7jm4WACAoMiMLbgWtV5qnrbkD9ZzyB5sNWznP2/yPbAmTe/3qtqnxYtZ99DcK2IEO9DYBgEXtjzWvq0Aztu3bFCCgpQbM/pyVY/JLNrd3rcZ97g9grk+o4bXHMDbe46rvnd7alnFc3jzl/ndZWncb2xDkXmgcD0CeQF117Z92bqSXqijQ1FwTX3mtnVq2+z2b2xG/61H+NqBoXczIvlhdlBvVvqc7X75LNdDFXnvFyT7IIy64DAtAnY/eS84FrZnpfdk7F1Ft2v4B61T7bquqk/N1MQgEAVgdjuUY+tPWj1qzrPt/nqjgmu1SVVMp/9Ay9qFOjkQr8GuuXOLclPfnF1ogvb6t7Zfimjkx39qtIdbJ3qbSNvcA9Yhbe6GNR7kj3eStv0wMNLb3bqgpmdvJU9yvO8i1el89UJrrk3L8RGPY3UGQiu1aHEPBAIJ+B6jC4E2aATFXmA/M4ulG9+8En7OA0XFAUSmvihG1zTBfquhzrep4tR5n3qVaNoUO9JNl+d4Jr7iBjxVKK+zuBehHMfa1BnWeaBAASaEcgLrqlXNTvm7dEdmrbHPdUJrslfdHH8+rUbZ78kV+BKv/o2j9X69JgTd3C9Q48OdQfrHVPtzK4upLvfV+RV7iNMy7wsJLim7bcbA+Jw0eWr3V0tfU1wrRQPH0JgoQTcc7+hPyrU3dai81TrIVO+WzSoxw77f+EGSIrmr9M+LVrWvofgWhEh3ofAcAioB1+1Ad12pB3D7lg9Rnx05Gi64W4YduX1tyZPbNuZ+2dPNXB/fOUG1/JCcfoSXcO073evB6QbEPiijqdxvTEQKrNDYKIE8oJr2tXzl1+X+pJ8TENRcE0/8jcPe2P/27mk3MeC6rqjO9gPp3Tu6l6HlJfZeuu059x1Fr2uc86reex7uSZZRJL3IRA3AbufzD3q0zqY1z1qfWOd66ant4xXEIBAHQIx3aMeY3tQNaxznl+n1u48BNdcGg1flzUKlDTMdv9vJxruODS4pk215fN+/RO6K++9/6H3mM7sCZutz7149Yfb7re3l4zdnkCWfJh5Q+uxfVFwTkKvOxBcq0uK+SAQRqCoUaBu8O141VjHrNulvl3cyQuuNfXDPoNrdb1P9CxEctZPLi+Eqd4zjY/7iOe8BdxfcWoZneDUHQiu1SXFfBDojkBecE1rVy8Xdtzx1pJOAAAU30lEQVRrrF5nbagKrqn3MPMWdx3Z19ngmtZv7Uv3l97qmc2WXdXRDxvqetX6O7ak3/3iS68ZgiVjeyTL//3wkiWfuW/of4vbnlSATSdxdQeCa3VJMR8E5k/ADYMNObhW9zz1ez9bMfM/+XnR4Ppz2SPwtHxI+zTv++z/AMG1PDq8B4HhElC7UeeE6gVIwQRrM9kxbT36ag8UVLP364zdH8XWCa4dOnwkXb+CcW2Gup7G9cY2lFkWAtMhUBRcU0/j5nc619NQFFz79dV/SOctO4e0c3GF4txBj3W373J/LLr6prvS9+3Hau5yoa9DznltW7kmGUqZ+SEQBwHuUS+ts3tNcemn/jt1z/39pU5N1b1umrcs70EAAsUEYrlHPdb2YN3z/OIK539CcC2fS9C7RY0CdZVqJxU62dHjo3Tx+O77/pg89cxfk+f3vJqe7CwyuKZff9uvMLWdZTc633d6FsluswvtW99fPts3hVrKhjXrNqUM9GuA/W+9Uzb7ks8Iri1BwhsQ6IRAUaNAvzR0/eK2ux/2vq8ouNbGD/sKroV4n3bSQiJnnL3M22d3wvU0rb9o2LZjt/f/wb0QVrSM+z7BNZcGryEwHwJFwTW1XeyitsYK6dpQFlxzL7JrOQWz1AZTj7c6xt3P84JrblBWj2/W4IbodPLQdgjxqi1OsDlve21bjFX25oB9rvHx4ycS9Sxi86pdqfdCBoJrIbSYFwLzJTCG4Jrbpqs6T13269+nflVEctdzL6fzqIfeoiG0fZq3HvNOgmt5dHgPAuMhcPTY8WTtLfem3qFj23r+uWvzH9P39aMxBRnK/lbesCHd8TrBNT0FwbykTXAtxNO43piWiBcQiJpAUXBNUNxAmh5j7J4z6zqmDectuzb1MHsvb2wBYYUb3OHQoY/S5d3P7Ek1Oj9tO4Se83JNsi1xlofAtAlwj3ppfedxjzrkuunSLeQdCECgjEAM96jH2h4MOc8vq3HeZwTX8qgEvqdHf+qCTrYb1hXXrEtPcnRRKW+wC0HZEJg1NNxfRWaXt2Xb9LimX367jyzVL4rKBl0ks+8tu+FowZa8Xpe0fj0u8PLf3pyuS2EXPU89dCC4FkqM+SFQj0BRo0BL37np0dmxqws82SBBUXCtjR/2EVwL9T7ttz1+ryyQq8fYmUcWPSbZDXco3PznZ1+oVxRnLoJrDgxeQmBOBIqCa/p6PQpex77CVu5QFlzTDxq0jHwgr4fGHTtfSP0kLwimx0XZDyQU0nKnyx4f4m5f2etQr3J9qSiU8e57H6b7dOWqW3K/Xo/MstCZ+JxzwcqgntZspbaOL3/zp/YWYwhAYCAE9Fg8ay8V9bj2nXMuy93aDw8eTpfVY6DcYe2tm9PPFLpoMjQ5T1Xo2PZHvYHkDfc/tDWdRzda84Ym7dO89di2EFzLo8N7EBgfAfcxovaEArfdpR58Q4Y6wTU99t28ROf/TYZQT+N6YxPKLAOB6REoC67p/NrOgb/4tR8n+jGteZUbXHPvF+j8Mm9wPeeSK9cumcUNv+l7Xd997IkdS+YPeaPJOS/XJEMIMy8E4iPAPeqlNe/7HnXoddOlW8g7EIBAGYGp36Mea3sw9Dy/rMZ5nxFcy6MS+J7deMyGGezikv5BFg12crWI4JoeQWAne9qOe7c8UbSZ3vv2C5/s/tpM7k1JPUc8OyjoopuQtu//8b8/T/RLpiaDeyKaDdA0WR/LQAACpwiUNQrUzb56UMvrIbEouNbGD7sOrjX1vl9e/rvUt4puStoJUTbIbLq6ecN96ToU/Nvz8j77KGjsXjAL7a0t6IuYGQIQSAmUBdcOHjo880Vd0H
aHouCa5rd20C8uW+0ukr6uCq5pRreHNbfHH/Xs02Zo4lVu+++C5dfnfr16k7P9zgu36dc69n9E8+mRWAqRNBkIrjWhxjIQmA+BFdeuT70g2560c2sdw3lDn8G1puep6vXSvK3onNr9cUPeo6Watk/zGNm2EFzLo8N7EBgGgTf2v53UDdj+8PwrU4/RIzw1uL2T6UdiIUOd4NrGzY+l36nHH4UOTT2N642hpJkfAtMjUBZc096uWnNn6k8W1FDbxw2uqT1m7SFdP8sbdD3O5rnlzi1LZnG9Uj+O+Nmya2bzqwMAPdKp6dD0nJdrkk2JsxwE4iBg59HZe7Zt7snMo3OVsbYZm1w3jUOJ7CUEuiMw5XvUY20PNvXsEFUQXAuhlTOvEpF2kvN/P7zEm0ONBH2mcIKCHtlBvWvYsvMOrulxpfbdCq9t37E7u3mF0+6jXSTS7KB9sXVnTw51A1dBNftc3W3XvViX/R5NE1zLo8J7EGhPoKxRULZ2Cxxke1ts44ddBtfaeJ/7iCd5T3ZQj0jmbdlHLuuilhsqESc1TpoOBNeakmM5CDQnUBZcK1prUXBN4Vfzi+9lHkti67px/elHQ+X1uKb59AsXW4+NdbGq6dDWq+yiltq+731w0NsMBdDsc7U93RsLmlHtYi1n+6H9bzMQXGtDj2Uh0C8BtRPtWM+eJ+v80D7TY/Kyg0Kv9nmXPa61OU/V+az9IEy9Ydij/GzbtW7zN32eHdq0T7Pr0rTxIbiWR4f3IDAMAsuvuHH21AaFK8pC+q/vP5D6i248uoP9aErH/J69xT+I0mPs3bakG8bIuxZ49OixxD1/13TI0MbTuN4YQpp5ITBNAlXBNZ1HWrvK2jwau+eXCgfbZ9n7NUbNek3XfC++9Jq97Y3tGqf7tJq864HeQiUTbc55uSZZApaPIBA5Ae5Rz+8eddvrppFLld2HQBCBqd6jHmt7sM15fkjhCa4F0Hrose2Juv/UyY9uxunCj90U00mOLrC4g25E2kmSLkrtfn5voh429Nxr9xfXmmeewTX3REfffdOt9yVPbNtZ+Ode4NL+uT1q6ERR4TU9Ek+PqHKFm9fTnD1XXN+ri26PP1X8vdqm7E1Pfb9+YWp/7q/1Na+9T+9rrhJ5DYFwAl03Ctr4YVfBtbbeJ4r2C3B5mH6FLs/RDVc97tO9cJYNpbkBFC2rnjnKfFc3F7KDamIe5wZoHnx0W/q+e6EuuzzTEIBAOwLucffqvjdrrawouKaFLeQgT9AvvHVjUo9/evhPTyfuL8f1ebYt5n65PTJE8+kvryczd/6y1229Sn5k26HH3esmqm7G6nHwl161Nv3swkuW9shmy2msz8s8Uj1NZsMumjaP1NiCMXqEjPt+m1/Hl7HjMwhAYCmB5/e8mihc9tLe1xP5oW4K6rzYjneFWbOD24uG2o8K6CrsK3/JemOXwbW256luD5jaR+udV20695rB5gef9Ha5i/apgnOuzxlfPfbKfd/7YiYgAIGFEnC9UG0VtcH0GOH33v9w9uNOtQvX3nKvd46pa5Lu4P6YSe3Kdbc/kF5D03U7tV3VC6484erVt6WLusG1M3906ey6nL5XvbipjeWe8+oaX8jQ1tO43hhCm3khME0CVcE17bX8zto7Ns5eD3PbZvJCtYk0aL5fX/2HdPkfX7iyEKTO0239Nj7wzvuF81d9YOvQuMk5r+vPXJOsos3nEJgmAe5Rn6rrotqMba+bTlOV7BUE+iGgNpu1nW6/55HaX2I/PMh2rjKEe9TaCdunMbUH257n1y5ekiQE1wJonbfsWk9QrrgUwsreQNOFeneestfzDK4paFa2LdnPdFKUHdbdfn/pOnTRLO+Xm+5F++z35E0r3OEOepRM3nzZ99R9NgMEINCcQNeNgjZ+2FVwrQvvU0DNDZtkvUfTOoHJDtf87vZa3mXry/MwPX7UPi8a5/l1dluYhgAEmhHoOriWdxG86NguC67popUtpwCtfkzQdGjrVWoLu4+zsu1yx/KpbLhX2+vOU+d19scNd20+3Ztw2fLZtmVTViwHAQhUE3B/1JQ9LtWeygsB64deVW0tW1eXwbW256kK6NrFMdu+7H6cfe5vlnh0F+1Te3SVfW/ROOub1RVkDghAoC8CbqC/6Jh13y/q4adu260ouOZ+R/a1esAs6w0uj00Xnsb1xjyyvAeBeAjUCa7pnDd7jSwbXNOPJtyeKeVxbs9pmtY67McGeYQVdnO98TvnXJY3W+333HXVeZ1tu3FNsjZqZoTAZAlwj/p0aRfRZqzb9jaPz7vHc3oPeAUBCJQRmOI9au2v+UPd8RDag12c55fV2v2M4JpLo+J1tpc0E5UuOOkCe96gm5zZkyItp1+X65fntg4FM9zBfvGtC+hFgy2bfSRd0fz2vvurTFtH2bgoCKEbf24PQ7YOzb/n5fzHFGR7BbFlisb6Zb07vHng3ZRZ0TJ6X4/kY4AABJoT0ONA7Bi7o0GaXY8Ezg5N/XDNuk3ptmQfvZT9jrLprrxP3p29MSlW8sNsLxq2PfJp41lnrF+GZofsRbm89RT5dXZdTEMAAuEE1EusHXf73jhQawVqH9oyK2/Y4C2jG4Grbzr9eHWbT2O1OR97Yke6rHrGKBrcx4X+9obTvWkUzV/2fhdepf1Sj7jZwIb2S+3bojazu/91XutGhDuUBWTc9an3ZAYIQGA+BNQuco8/e60efl4p6blSYV17TJ0to7Eu0st/7b3sLy5v3nBf+lloD9xtz1NFVDc3tW+2fTaWH6pnj7x2bBft07KbF7YNGqs3JQYIQGAYBBT2V9tSwdO8NpMdu2o7vbAn/xF2tifPvfhKop5ubRl3rOuRl//25uS119+y2RO3xzVdc8x+v5bRDyyaDF14mr6X641N6LMMBKZBwJ6uIi8qG9zHx8v38h5rrPfUs5nri/ZaPa3p8XpVw/nLr0uXV6/gbQb77rrj7Dmvvptrkm0qwLIQGD8B7lH7NZx3m7GL66b+HjAFAQgUEZjiPWrta912oM03hPZgV+f5RbV23ye45tKo8VoCUSjrL7teTN548+2kzuOGjh47nqgbvfsf2jp7TKh+jT2lQV1k6+bqU8/8tfBm5JT2l32BAASaE5iSH+rmpHqWfOTxZ3J7D2pOiSUhAIGYCHx48NRj5HWxRb+gzv5SvIqFfr1nJzJ61OhQBgXY9Ig8BcXUDtaj7BggAIH4COiGoAK2CqPpRltdL1CoQ48WlYcoZFF3uSEQVmhu9/N7Z9uufc8LrA1hO9kGCEBgOAQU7JdH6nGhurZW93qjuwfyTfVkqWCF2l5FYVU3uKbzWS33xv63Z+3QNo/Ac7elq9dcb+yKJOuBQNwE1DZTyFfn3Gqj6drk2AeuSY69gmw/BJoT4B71Una0GZcy4R0IQCCfAPeo87kM5V2Ca0OpBNsBAQhAAAIQgAAEIACBAAK6WGWhNT2ikwECEIAABCAAAQhAAAJlBLLBtbJ5+QwCEIAABCAAAQhAAAIQgAAEIAABCMyDAMG1eVDmOyAAAQhAAAIQgAAEINAxgevXbkyDa+oNm
AECEIAABCAAAQhAAAJlBAiuldHhMwhAAAIQgAAEIAABCEAAAhCAAAQWQYDg2iKo850QgAAEIAABCEAAAhBoQeDEiY+Tz3zu67Pg2he/9uMWa2JRCEAAAhCAAAQgAIFYCBBci6XS7CcEIAABCEAAAhCAAAQgAAEIQGA8BAiujadWbCkEIAABCEAAAhCAAARmBJ7Z+XzyT1/9wezvgYe3QgUCEIAABCAAAQhAAAKVBF7a+3rahtz9wiuV8zMDBCAAAQhAAAIQgAAEIAABCEAAAhDomwDBtb4Js34IQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQ8AgQXPNwMAEBCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACfRMguNY3YdYPAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAh4BgmseDiYgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAoG8CBNf6Jsz6IQABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQMAjQHDNw8EEBCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCPRNgOBa34RZPwQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQh4BAiueTiYgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAIG+CRBc65sw64cABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABjwDBNQ8HExCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCDQNwGCa30TZv0QgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQg4BEguObhYAICEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAE+iZAcK1vwqwfAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABDwCBNc8HExAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAQN8ECK71TZj1QwACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgIBHgOCah4MJCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEOibAMG1vgmzfghAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhDwCBBc83AwAQEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAJ9EyC41jdh1g8BCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACHoH/D49hr7u55WAJAAAAAElFTkSuQmCC)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1_Vdnmazo5ot" + }, + "source": [ + "# Sliding time windows\n", + "\n", + "Maybe we want a fixed-sized window, but we don't want to wait until a window finishes so we can start the new one.\n", + "We might want to calculate a moving average.\n", + "\n", + "For example, lets say we want to analyze our data for the last three months, but we want to have a monthly report.\n", + "In other words, we want windows at a monthly frequency, but each window should cover the last three months.\n", + "\n", + "[`Sliding windows`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.window.html#apache_beam.transforms.window.SlidingWindows)\n", + "allow us to do just that.\n", + "We need to specify the _window size_ in seconds just like with `FixedWindows`. We also need to specify a _window period_ in seconds, which is how often we want to emit each window." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ue19z9wGg2-f", + "outputId": "1f99937f-ead9-485f-84b9-8a1da7ae8f1f" + }, + "source": [ + "import apache_beam as beam\n", + "from datetime import timedelta\n", + "\n", + "# Sliding windows of approximately 3 months every month.\n", + "window_size = timedelta(days=3*30).total_seconds() # in seconds\n", + "window_period = timedelta(days=30).total_seconds() # in seconds\n", + "print(f'window_size: {window_size} seconds')\n", + "print(f'window_period: {window_period} seconds')\n", + "\n", + "with beam.Pipeline() as pipeline:\n", + " (\n", + " pipeline\n", + " | 'Astronomical events' >> AstronomicalEvents()\n", + " | 'Sliding windows' >> beam.WindowInto(\n", + " beam.window.SlidingWindows(window_size, window_period)\n", + " )\n", + " | 'Print element info' >> beam.ParDo(PrintElementInfo())\n", + " | 'Print window info' >> PrintWindowInfo()\n", + " )" + ], + "execution_count": 12, + "outputs": [ + { + "output_type": "stream", + "text": [ + "window_size: 7776000.0 seconds\n", + "window_period: 2592000.0 seconds\n", + "[2021-03-04 00:00:00 - 2021-06-02 00:00:00] 2021-03-20 03:37:00 -- March Equinox 2021\n", + "[2021-02-02 00:00:00 - 2021-05-03 00:00:00] 2021-03-20 03:37:00 -- March Equinox 2021\n", + "[2021-01-03 00:00:00 - 2021-04-03 00:00:00] 2021-03-20 03:37:00 -- March Equinox 2021\n", + "[2021-04-03 00:00:00 - 2021-07-02 00:00:00] 2021-04-26 22:31:00 -- Super full moon\n", + "[2021-03-04 00:00:00 - 2021-06-02 00:00:00] 2021-04-26 22:31:00 -- Super full moon\n", + "[2021-02-02 00:00:00 - 2021-05-03 00:00:00] 2021-04-26 22:31:00 -- Super full moon\n", + "[2021-05-03 00:00:00 - 2021-08-01 00:00:00] 2021-05-11 13:59:00 -- Micro new moon\n", + "[2021-04-03 00:00:00 - 2021-07-02 00:00:00] 2021-05-11 13:59:00 -- Micro new moon\n", + "[2021-03-04 00:00:00 - 2021-06-02 00:00:00] 2021-05-11 13:59:00 -- Micro new moon\n", + "[2021-05-03 00:00:00 - 2021-08-01 00:00:00] 2021-05-26 06:13:00 -- Super full moon, total lunar eclipse\n", + "[2021-04-03 00:00:00 - 2021-07-02 00:00:00] 2021-05-26 06:13:00 -- Super full moon, total lunar eclipse\n", + "[2021-03-04 00:00:00 - 2021-06-02 00:00:00] 2021-05-26 06:13:00 -- Super full moon, total lunar eclipse\n", + "[2021-06-02 00:00:00 - 2021-08-31 00:00:00] 2021-06-20 22:32:00 -- June Solstice 2021\n", + "[2021-05-03 00:00:00 - 2021-08-01 00:00:00] 2021-06-20 22:32:00 -- June Solstice 2021\n", + "[2021-04-03 00:00:00 - 2021-07-02 00:00:00] 2021-06-20 22:32:00 -- June Solstice 2021\n", + "[2021-08-01 00:00:00 - 2021-10-30 00:00:00] 2021-08-22 07:01:00 -- Blue moon\n", + "[2021-07-02 00:00:00 - 2021-09-30 00:00:00] 2021-08-22 07:01:00 -- Blue moon\n", + "[2021-06-02 00:00:00 - 2021-08-31 00:00:00] 2021-08-22 07:01:00 -- Blue moon\n", + "[2021-08-31 00:00:00 - 2021-11-29 00:00:00] 2021-09-22 14:21:00 -- September Equinox 2021\n", + "[2021-08-01 00:00:00 - 2021-10-30 00:00:00] 2021-09-22 14:21:00 -- September Equinox 2021\n", + "[2021-07-02 00:00:00 - 2021-09-30 00:00:00] 2021-09-22 14:21:00 -- September Equinox 2021\n", + "[2021-11-29 00:00:00 - 2022-02-27 00:00:00] 2021-12-21 09:59:00 -- December Solstice 2021\n", + "[2021-10-30 00:00:00 - 2022-01-28 00:00:00] 2021-12-21 09:59:00 -- December Solstice 2021\n", + "[2021-09-30 00:00:00 - 2021-12-29 00:00:00] 2021-12-21 09:59:00 -- December Solstice 2021\n", + "[2021-10-30 00:00:00 - 2022-01-28 00:00:00] 2021-11-04 15:14:00 -- Super new moon\n", + "[2021-09-30 00:00:00 - 2021-12-29 00:00:00] 2021-11-04 
15:14:00 -- Super new moon\n", + "[2021-08-31 00:00:00 - 2021-11-29 00:00:00] 2021-11-04 15:14:00 -- Super new moon\n", + "[2021-10-30 00:00:00 - 2022-01-28 00:00:00] 2021-11-19 02:57:00 -- Micro full moon, partial lunar eclipse\n", + "[2021-09-30 00:00:00 - 2021-12-29 00:00:00] 2021-11-19 02:57:00 -- Micro full moon, partial lunar eclipse\n", + "[2021-08-31 00:00:00 - 2021-11-29 00:00:00] 2021-11-19 02:57:00 -- Micro full moon, partial lunar eclipse\n", + "[2021-11-29 00:00:00 - 2022-02-27 00:00:00] 2021-12-04 01:43:00 -- Super new moon\n", + "[2021-10-30 00:00:00 - 2022-01-28 00:00:00] 2021-12-04 01:43:00 -- Super new moon\n", + "[2021-09-30 00:00:00 - 2021-12-29 00:00:00] 2021-12-04 01:43:00 -- Super new moon\n", + "[2021-11-29 00:00:00 - 2022-02-27 00:00:00] 2021-12-18 10:35:00 -- Micro full moon\n", + "[2021-10-30 00:00:00 - 2022-01-28 00:00:00] 2021-12-18 10:35:00 -- Micro full moon\n", + "[2021-09-30 00:00:00 - 2021-12-29 00:00:00] 2021-12-18 10:35:00 -- Micro full moon\n", + ">> Window [2021-03-04 00:00:00 - 2021-06-02 00:00:00] has 4 elements\n", + ">> Window [2021-02-02 00:00:00 - 2021-05-03 00:00:00] has 2 elements\n", + ">> Window [2021-01-03 00:00:00 - 2021-04-03 00:00:00] has 1 elements\n", + ">> Window [2021-04-03 00:00:00 - 2021-07-02 00:00:00] has 4 elements\n", + ">> Window [2021-05-03 00:00:00 - 2021-08-01 00:00:00] has 3 elements\n", + ">> Window [2021-06-02 00:00:00 - 2021-08-31 00:00:00] has 2 elements\n", + ">> Window [2021-08-01 00:00:00 - 2021-10-30 00:00:00] has 2 elements\n", + ">> Window [2021-07-02 00:00:00 - 2021-09-30 00:00:00] has 2 elements\n", + ">> Window [2021-08-31 00:00:00 - 2021-11-29 00:00:00] has 3 elements\n", + ">> Window [2021-11-29 00:00:00 - 2022-02-27 00:00:00] has 3 elements\n", + ">> Window [2021-10-30 00:00:00 - 2022-01-28 00:00:00] has 5 elements\n", + ">> Window [2021-09-30 00:00:00 - 2021-12-29 00:00:00] has 5 elements\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zl4CEcL-Xsxc" + }, + "source": [ + "![Sliding 
windows](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAACbYAAAJCCAYAAAAWIQOnAAAgAElEQVR4Aezd+a8c1Z3//+9/NL/nl9H8EikaaTRRNNKsiRDMCh9nwoAgIZCE4JCEkJjAYJYBDDgBk9hA2BIgsVnDYoxjMMQsBrMazOYNYxtTX53mnnJ1+9x7u6urumt5lHRV91adWs7r/TyvPm2/VPX/ZRYKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoECDFPj/GnQvboUCFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKJAJtoGAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABRqlgGBbo8rhZihAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClBAsA0DFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKNAoBQTbGlUON0MBClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACgm0YoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQIFGKSDY1qhyuBkKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUEGzDAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQo0SgHBtkaVw81QgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhSggGAbBihAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCgUQoItjWqHG6GAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABQTbMEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACjVJAsK1R5XAzFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKCDYhoElFXjrvYOZHxpgAAN9YCCaYR/6qo/GNAYwMAkD/BEvk/CiLV76wgBvxHpfWNdPrFfFAN/EUlUsOQ+WusQAb8Rzl3jWFzxXyQB/xFOVPDkXntrAQPQ967QCgm1pXWxdUKANg9w9+jDCAAaqYCAafxXncg5MYgADXWKAP+K5SzzrC56rYoA3YqkqlpwHS31hgG9ivS+s6yfWJ2GAN+JlEl60xUufGOCPeO8T7/qK98CAZWkFBNuW1qf3exkpI8UABvrCQDT8vvRXP41tDGBgXAb4I1bGZUU7rPSJAd6I9z7xrq94r4IBvomjKjhyDhx1jQHeiOmuMa0/mK6KAf6IpapYch4stYWB6HvWaQUE29K62LqgQFsGuvv0oYQBDEzLwNFPj2XhZ9rzOB6LGMBA1xiIE+Ou9Ut/jFUMYGAaBngjfqbhx7H46SMDfBP3feRen3G/HAO8ESPLMWI/RvrKAH/Efl/Z1+/+sh99zzqtgGBbWhdbFxQY1zx3vnEw+/MrfmiAAQy0l4EXXj2YhR81bJ4GL77WvHtqKifh83jcz27taDUuA3FiPG577bCFAQz0gQHeiPM+cK6POK+SAb6Jpyp5ci48dYUB3ojlrrCsH1iumgH+iKmqmXI+TDWdgeh71mkFBNvSuti6oMC4Azz8R/r3f7ovO/n0vX5ogAEMYAADlTKwbcfB7H/X7M9OPj18zvhZTIPvXrw/E2zz5Wzcudsk7eLEeJJjtMUiBjDQdQZ4I8a7zrj+YbxqBvgmpqpmyvkw1QUGeCOOu8CxPuC4Dgb4I67q4Mo5cdVkBqLvWacVEGxL62LrggLjDm7BNoE+oUYMYAADdTEg2DZemE+wzZeycedtk7aLE+NJj9MekxjAQJcZ4I347jLf+obvOhjgm7iqgyvnxFXbGeCNGG47w+4fw3UxwB+xVRdbzoutpjIQfc86rYBgW1oXWxcUGHdgC7YJtNQVaHFebGEAA4Jtgm3jzke0q+dLaZwY07cefelKVwy0kwHe2M66GW/qhoH5McA356c97mmPgeYywBubWxvjRm0wMF8G+ON89cc//TEwewai71mnFRBsS+ti64IC45qWYJvgifARBjCAgboYEGwTbBt3PqJdPV+24sSYvvXoS1e6YqCdDPDGdtbNeFM3DMyPAb45P+1xT3sMNJcB3tjc2hg3aoOB+TLAH+erP/7pj4HZMxB9zzqtgGBbWhdbFxQY17QE2wRa6gq0OC+2MIABwTbBtnHnI9rV82UrTozpW4++dKUrBtrJAG9sZ92MN3XDwPwY4Jvz0x73tMdAcxngjc2tjXGjNhiYLwP8cb7645/+GJg9A9H3rNMKCLaldbF1QYFxTUuwTfBE+AgDGMBAXQwItgm2jTsf0a6eL1txYkzfevSlK10x0E4GeGM762a8qRsG5scA35yf9rinPQaaywBvbG5tjBu1wcB8GeCP89Uf//THwOwZiL5nnVZAsC2tS7b56eezq2+8Pbtp/b2LtOjH5nFNS7BNoKWuQIvzYgsDGBBsE2wbdz6iXT1ftuKsl7716EtXumKgnQzwxnbWzXhTNwzMjwG+OT/tcU97DDSXAd7Y3NoYN2qDgfkywB/nqz/+6Y+B2TMQfc86rcBUwbZ7Nz6erTj74sHPW7v3JK9w7Nix7BvnrBq0ueiytck2YePWZ17Iz7Xr9bfzdmec9/Psy189Kzt5xcp82yx+ufSqW7K/+Mt/zr7wpVNmcbmZXOPDvftzjUPd7rn/j8ted1zTEmwTPBE+wgAGMFAXA4Jtgm3jzke0q+fLVpww0rcefelKVwy0kwHe2M66GW/qhoH5McA356c97mmPgeYywBubWxvjRm0wMF8G+ON89cc//TEwewai71mnFZgq2PboE9sG4a8QALv97geTV9jx4q68TWh36NDhZLtLrlyXtzv48aG8zd+d9M25BMy6GGxbf8cfco1DLYK2yy3jmpZgm0BLXYEW58UWBjAg2CbYNu58RLt6vmzF+SJ969GXrnTFQDsZ4I3trJvxpm4YmB8DfHN+2uOe9hhoLgO8sbm1MW7UBgPzZYA/zld//NMfA7NnIPqedVqBqYJtBw4eyoNS3/7Blckr3LzhvrxNCFM9sWV7st0//tt5g3Ynnfb9of3hKW9f/9bPsnMvTJ9/qHGFf3Qx2Pa1U88fqkWox46XXltStXFNS7BN8ET4CAMYwEBdDAi2CbaNOx/Rrp4vW3GySN969KUrXTHQTgZ4YzvrZrypGwbmxwDfnJ/2uKc9BprLAG9sbm2MG7XBwHwZ
4I/z1R//9MfA7BmIvmedVmCqYFs4ZQiihYDUX/3tfyWvEF5DGvbHnyvWbDih3b79B/P9V6659YT989jQtWDba2/szjVe+dM1+e/hSXlLLeOalmCbQEtdgRbnxRYGMCDYJtg27nxEu3q+bMW5In3r0ZeudMVAOxngje2sm/GmbhiYHwN8c37a4572GGguA7yxubUxbtQGA/NlgD/OV3/80x8Ds2cg+p51WoGpg21X33h7HpLa9frbQ1c5duxY9oUvnTLYHwNwo09kCwc8/Pif8nNs2bZj6Bx/2v5idu/Gx7Mnn35uaHsIw4Xt4efDvfuzo0c/zR54ZEt2+TXrs/8595LsJ5f9Irtv0xPZZ599NnTc6B/v7Pkg23Dnpux7P74mO+O8n2c/XX1TtumRLYN1COOF+19sef3NdwbHXrjq+sGxIbS38aHNg3sZPeb5F17N73d0X/j7qT/9ebB/56tvnrD7gw/35cfu3XfghP3jbLjq+ttyjd/74KMsPiEvBBI//fTTRU8xrmkJtgmeCB9hAAMYqIsBwTbBtnHnI9rV82UrThTpW4++dKUrBtrJAG9sZ92MN3XDwPwY4Jvz0x73tMdAcxngjc2tjXGjNhiYLwP8cb7645/+GJg9A9H3rNMKTB1s2/rMC3lg6ta7Ng1dJbzmMj6p7cFHn85/P3To8FC7n1+1Lt93+MjRoX0hbBbOMRqIe27HK/kx16z9Tfblr56V/x2vGdbhNaaLBbfuf+DJPHhXPKb4+2LBtvV3bExeLxwbQmOvvfnOUD/W3nJP3j4E4opLuL8YAAz3O7rcdvcD+bHv7vlgdPeyf4dw31//w+mDc5x21k8G7dfdevwVsY9tfnbRc4xrWoJtAi11BVqcF1sYwIBgm2DbuPMR7er5shUnivStR1+60hUD7WSAN7azbsabumFgfgzwzflpj3vaY6C5DPDG5tbGuFEbDMyXAf44X/3xT38MzJ6B6HvWaQWmDrYdOXI0D12dc8HqoavcvOHz8FQIUx04eChvN/r0tfj0sFPPvGjo+PDHOMG2YhDt5BUrs/BT3HbH7x4+4bx/eHDzUJtwTHhF59nnXz54rWo8PhVs++Wvf5cfG554dsHF12bhaW2hn8XjwpPk4hKe2Bb33XXvI3HzYB2eUhf3hfX+AweH9n/3R1cP9v/NP50xtH3cP4rhw3vu/+PgsPDUtnjNcy+8ctFTjWtagm2CJ8JHGMAABupiQLBNsG3c+Yh29XzZihNF+tajL13pioF2MsAb21k3403dMDA/Bvjm/LTHPe0x0FwGeGNza2PcqA0G5ssAf5yv/vinPwZmz0D0Peu0AlMH28Jpw1PGQkgqhLyKr/4MrwQN29fcdNfg6l879fzB31euuTW/mxDiigGr62/+vF2+c4Jg243r7sk+OXwkP7T4tLgQWisun3xyOPviV1bk9xxed1pcwhPUYphsNNgWwmrx6WohkLfn/Y+Khw5eGRr7c9Fla/N9xdeyrrz4unx7+CW8NjUeE9YxfBYbhUBb2F48X9w3zjq8KjWe/+DHh/JDikG84va8QZZl45qWYJtAS12BFufFFgYwINgm2DbufES7er5sxbkhfevRl650xUA7GeCN7ayb8aZuGJgfA3xzftrjnvYYaC4DvLG5tTFu1AYD82WAP85Xf/zTHwOzZyD6nnVagUqCbfHJbCE8tXPXm4MrFYNc27a/NNi2+tr1g4BVMWj26BPb8tDV9j/vPOEux3liWzhHaonHhsBdcbn97gfza44+PS22u/SqWwZtRoNta355Z35seBJaajnzO5fmbfbtP/70tbO+d9lge/HJa8eOfZY/IS6G7b5xzqr8tB9+tC8/18aHn8q3j/tLCPvFIN63fzD8ZLYQoIuBt7vvfzR5ynFNS7BN8ET4CAMYwEBdDAi2CbaNOx/Rrp4vW3GSSN969KUrXTHQTgZ4YzvrZrypGwbmxwDfnJ/2uKc9BprLAG9sbm2MG7XBwHwZ4I/z1R//9MfA7BmIvmedVqCSYNuLO1/PA1Lr79g4uFLxiWnhdaVhefyp7Xm7Q4cOD7YVA2ThSWmjSwynnXTa94d2PbfjlfxcT2/bMbQv/nH5NZ8H6UJ4q7jEJ6SFwFfqmqFt8b6Kx4bXdobzhRDaYstDj23N7y3cZ1w23Lkp3/7+B3sHm599/uXBthC+K4b8Dhz4eLC/eK69+w7EU4293vjQ8VeujgYAw1PaYrDtP8/4UfKc45qWYJtAS12BFufFFgYwINgm2DbufES7er5sxUkifevRl650xUA7GeCN7ayb8aZuGJgfA3xzftrjnvYYaC4DvLG5tTFu1AYD82WAP85Xf/zTHwOzZyD6nnVagUqCbeH1oyGYFUJSZ59/+eBK6269b/D3qWdelF/540Of5EGqzU8/P9geXucZjguvLU0t0wTbwitQY3CreO4Q4grb/+U/vlvcPPT7YsG2cEw49l//+wdD7Yt/vPDya/l17934eL7rtTd259s3PbJlsD1e50eX3DAI2cWnq/32938c7F993YbBMUGnMkt8HWy459fefCd7e/d7Qz//fvoP83t6d88HJ1xiXNMSbBM8ET7CAAYwUBcDgm2CbePOR7Sr58tWnCDStx596UpXDLSTAd7YzroZb+qGgfkxwDfnpz3uaY+B5jLAG5tbG+NGbTAwXwb443z1xz/9MTB7BqLvWacVqCTYFk59zgWrBwGpEMwKQbcYSLvul3cOXTk8eS2ErK66/rYsPJUs/B5+1t12/1C7+Ec8T5kntt247p78/PF8YR1eBRquueLsi4ubh36PgbPQn+Ly1/9w+uDYxYJ4oW0xwHbrXZuKhw+e9BauveqKmwfb4/k2b/086Lfy4usG5w/9DksMnoX7mXT5cO/+vP9R56XWa2+554RLjGtagm0CLXUFWpwXWxjAgGCbYNu48xHt6vmyFSeI9K1HX7rSFQPtZIA3trNuxpu6YWB+DPDN+WmPe9pjoLkM8Mbm1sa4URsMzJcB/jhf/fFPfwzMnoHoe9ZpBSoLtt3xu4fzEFV4Yll88tjWZ14YuvIVaz5/AtnJK1YOvXrz5VfeGGoX/6gj2BbDYmWe2Baf9rbUE9S2bX8p1yK8SrS4nH/RNYN94fj4utagVXwl6mObn82PDa8ejUG0sH3SZf0df8iPj+dZav13J33zhEuMa1qCbYInwkcYwAAG6mJAsE2wbdz5iHb1fNmKE0T61qMvXemKgXYywBvbWTfjTd0wMD8G+Ob8tMc97THQXAZ4Y3NrY9yoDQbmywB/nK/++Kc/BmbPQPQ967QClQXbwisuY2gqhrfC34ePHB26cngFaWx30WVrB7+H15iGp7ylljqCbd/78efhshAoG72/eA+LPbHth6tuGNxzOPbQocOx+dC6GCh7ZddbQ/vCq0lj/3+6+qbB7xeuuj5vEwJuMRQYnuoW24bXuE66xNemhqfCvbV7z6I/F1x8bX6dELYrLuOalmCbQEtdgRbnxRYGMCDYJtg27nxEu3q+bMW5IX3r0ZeudMVAOxngje2sm/GmbhiYHwN8c37a4572GGguA7y
xubUxbtQGA/NlgD/OV3/80x8Ds2cg+p51WoHKgm3h9PG1mjGMFZ6MNrqEMFjcH9fhNaaLLXUE227ecF9+D3ff/+gJlw73GF5TGu4vhMyKSzGYFs4zuoRgWtQhBPaOjAT79rz/UX7t2P8nn35u6DTxdaRxf3i63aRL8XWo4Sl5Sy3FJ8zFV6TG9uOalmCb4InwEQYwgIG6GBBsE2wbdz6iXT1ftiadF6pDPXWgK10x0CwGeGOz6mF8qAcGms8A32x+jYwjNcLA7BngjbPXHOc0x0A7GOCP7aiT8aROGKiOgeh71mkFKg22xaeZxUDW/934m+RV//W/fzAU7rr97geT7cLGOoJtBw4eyp+KFu41XH/3u+9nO3e9md1616bsi19Zkd/faLDt2LHPsr8/5dv5/nDs0aOfDu4/hNZOO+sn+b47f/dwsl9f/upZeZtw/vga0ti4+DrScH9X33h73DX2+so1t+bXCP1aaglPy4t9DmG84v2Ma0aCbQItdQVanBdbGMCAYJtg27jzEe2q+xJV1DLOI4vb/F6P1nSlKwbawwBvbE+tjCu1wkAzGOCbzaiD8aAOGGgWA7yxWfUwPtQDA81hgD82pxbGhVpgYDYMRN+zTitQabBt40Ob8zBVCGQ9vW1H8qpX33DbULs33no32S5srCPYFs5bfGpbDOKl1qPBtnDss8+/PBSMC8eFQFjx+K9/62dZCMGllvgK0tC++BrS2Lb4OtLQZuszL8RdY61DUC0+NS6E8MZZVl+3Ib//EKyLy7hGJdgmeCJ8hAEMYKAuBgTbBNvGnY9oV88XrEnnhepQTx3oSlcMNIsB3tisehgf6oGB5jPAN5tfI+NIjTAwewZ44+w1xznNMdAOBvhjO+pkPKkTBqpjIPqedVqBSoNtH+3dn4ejQiDrk8NHklfdsm1H3i48KWyp5czvXDpoO/o6zudfeDU/x2LBr7W/+m3eJnWN8ES1Yhgt/B6CbLfd/UB26VW3DPaFwFpqeWfPB1l41ero8eHvcN3FQm3hXA89tjU/7vGntqdOnxVfRzr6OtPkAYWNz+14JT//2lvuKexZ/NeXX3kjP+aCi6/NG45rRoJtAi11BVqcF1sYwIBgm2DbuPMR7ar7ElXUMk4Mi9v8Xo/WdKUrBtrDAG9sT62MK7XCQDMY4JvNqIPxoA4YaBYDvLFZ9TA+1AMDzWGAPzanFsaFWmBgNgxE37NOK1BpsC19iWZvDa8R3fHirmzTI1uy197YvWQgLdWT8FrTP21/MXsgP/5Yqllrt41rVIJtgifCRxjAAAbqYkCwTbBt3PmIdvV8wYoTWfrWoy9d6YqBdjLAG9tZN+NN3TAwPwb45vy0xz3tMdBcBnhjc2tj3KgNBubLAH+cr/74pz8GZs9A9D3rtAK9D7alZbE1KjCuaQm2CbTUFWhxXmxhAAOCbYJt485HtKvny9ak80J1qKcOdKUrBprFAG9sVj2MD/XAQPMZ4JvNr5FxpEYYmD0DvHH2muOc5hhoBwP8sR11Mp7UCQPVMRB9zzqtgGBbWhdbFxQY14wE2wRPhI8wgAEM1MWAYJtg27jzEe2q+xJV1DJOjIvb/F6P1nSlKwbawwBvbE+tjCu1wkAzGOCbzaiD8aAOGGgWA7yxWfUwPtQDA81hgD82pxbGhVpgYDYMRN+zTisg2JbWxdYFBcY1KsE2gZa6Ai3Oiy0MYECwTbBt3PmIdvV8wYoTY/rWoy9d6YqBdjLAG9tZN+NN3TAwPwb45vy0xz3tMdBcBnhjc2tj3KgNBubLAH+cr/74pz8GZs9A9D3rtAKCbWldbF1QYFzTEmwTPBE+wgAGMFAXA4Jtgm3jzke0q+fLVpwY07cefelKVwy0kwHe2M66GW/qhoH5McA356c97mmPgeYywBubWxvjRm0wMF8G+ON89cc//TEwewai71mnFRBsS+ti64IC45qWYJtAS12BFufFFgYwINgm2DbufES7er5sxYkxfevRl650xUA7GeCN7ayb8aZuGJgfA3xzftrjnvYYaC4DvLG5tTFu1AYD82WAP85Xf/zTHwOzZyD6nnVaAcG2tC62LigwrmkJtgmeCB9hAAMYqIsBwTbBtnHnI9rV82UrTozpW4++dKUrBtrJAG9sZ92MN3XDwPwY4Jvz0x73tMdAcxngjc2tjXGjNhiYLwP8cb7645/+GJg9A9H3rNMKCLaldbF1QYFxTUuwTaClrkCL82ILAxgQbBNsG3c+ol09X7bixJi+9ehLV7pioJ0M8MZ21s14UzcMzI8Bvjk/7XFPeww0lwHe2NzaGDdqg4H5MsAf56s//umPgdkzEH3POq2AYFtaF1sXFBjXtEKwbe2G/X5ogAEMYAADlTMQgm0+Y8b7jA2fx+N+dmtHq3EZiBPjcdtrhy0MYKAPDPBGnPeBc33EeZUM8E08VcmTc+GpKwzwRix3hWX9wHLVDPBHTFXNlPNhqukMRN+zTisg2JbWxdYFBZo+wN2fDyEMYKAqBvYfPJKFn6rO5zzYxAAGusJAnBh3pT/6YWxiAANVMMAbcVQFR86Boz4xwDfx3ife9RXv4zLAG7EyLivaYaVvDPBHzPeNef3FfPQ967QCgm1pXWxdUICJMlEMYKAvDETj70t/9dPYxgAGxmWAP2JlXFa0w0qfGOCNeO8T7/qK9yoY4Js4qoIj58BR1xjgjZjuGtP6g+mqGOCPWKqKJefBUlsYiL5nnVZAsC2ti60LCrRloLtPH0oYwMC0DETjn/Y8jsciBjDQNQb4I6a7xrT+YLoKBngjjqrgyDlw1CcG+Cbe+8S7vuJ9XAZ4I1bGZUU7rPSNAf6I+b4xr7+Yj75nnVZAsC2ti60LCjBRJooBDPSFgWj8femvfhrbGMDAuAzwR6yMy4p2WOkTA7wR733iXV/xXgUDfBNHVXDkHDjqGgO8EdNdY1p/MF0VA/wRS1Wx5DxYagsD0fes0woItqV1sXVBgbYMdPfpQwkDGJiWgWj8057H8VjEAAa6xgB/xHTXmNYfTFfBAG/EURUcOQeO+sQA38R7n3jXV7yPywBvxMq4rGiHlb4xwB8x3zfm9Rfz0fes0woItqV1sXVBASbKRDGAgb4wEI2/L/3VT2MbAxgYlwH+iJVxWdEOK31igDfivU+86yveq2CAb+KoCo6cA0ddY4A3YrprTOsPpqtigD9iqSqWnAdLbWEg+p51WgHBtrQuti4o0JaB7j59KGEAA9MyEI1/2vM4HosYwEDXGOCPmO4a0/qD6SoY4I04qoIj58BRnxjgm3jvE+/6ivdxGeCNWBmXFe2w0jcG+CPm+8a8/mI++p51WgHBtrQuti4owESZKAYw0BcGovH3pb/6aWxjAAPjMsAfsTIuK9phpU8M8Ea894l3fcV7FQzwTRxVwZFz4KhrDPBGTHeNaf3BdFUM8EcsVcWS82CpLQxE37NOKyDYltbF1gUF2jLQ3acPpdd2H8xeedMPDcoz8NrbH2fhh4blNaxLu51vNO+e6uqr86r1NAzUNVYm9UfzMvMyDGCgDwzEfzToQ1/10ZjGAAaqYIBv4qgKjp
wDR11jgDdiumtM6w+mq2KAP2KpKpacB0ttYSD6nnVaAcG2tC62LijQloHuPn0ohWDbVb/Yn31z5T4/NMBAxxjYtuNg9r9rjG/+xt+XY2Drnw9mq2+Y71h5411zEvNSDGCgHwzEfzRQ737UW53VGQPTM8A3p9cQhzTEQPcY4I3dq6lxqqYYqIYB/liNjnikIwbaw0D0Peu0AoJtaV1sXVCA2bXH7PpeqxhsO/n0fZkfGmCgWwzEYJu6dquu6ll9PWOwbV7a/sdZ+zLBNnPHvs9J9b8/YyD+o4Ga96fmaq3WGJiOAb45nX74ox8GuskAb+xmXY1XdcXA9Azwx+k1xCENMYxrn38AACAASURBVNAuBqLvWacVEGxL62LrggIMr12G1+d6CbZVH5CYVzDCddVylAHBNkyMMuHvNBOCbeZtfZ4L6jv+Z81A/EeDWV/X9bCOAQy0lQG+id22suu+sVsnA7wRX3Xy5dz4ajMD/BG/bebXveO3DAPR96zTCgi2pXWxdUGBMoPOMcx6HgwItqVDDsIfdOkCA4JtOO4Cx7Pog2CbOdg85mCuibu+MhD/0aCv/ddvYx8DGJiUAb6JmUmZ0R4zfWCAN+K8D5zrI87LMMAfcVOGG8fgps0MRN+zTisg2JbWxdYFBdo8+N17vz68BNsEX2YRGnGN+XAm2DYf3fHePt0F2/o19zHXVW8MzJeB+I8G6jDfOtCf/hhoDwN8sz21Mq7UCgOzY4A3zk5rXNMaA+1igD+2q17Gl3phYHoGou9ZpxUQbEvrYuuCAkxoehOi4Ww0FGxrXwBDaEbNxmVAsA0r47LS93aCbbOZc5jb0RkDGAgMxAUPeMAABjAwHgN8czyd8EQnDPSLAd7Yr3ob3+qNgfEZ4I/ja4UrWmGgGwxE37NOKyDYltYl2/z089nVN96e3bT+3kVa9GMzI+yGEfahjoJtgi99D7R0uf+CbcZ3l/musm+CbeZtfZjz6SPOm8JA/BeBptyP+zA2MICBpjPANzHadEbdH0bnwQBvxN08uHNN3LWBAf6I0zZw6h5xWiUD0fes0wpMFWy7d+Pj2YqzLx78vLV7T/IKx44dy75xzqpBm4suW5tsEzZufeaF/Fy7Xn87b3fGeT/PvvzVs7KTV6zMt83il0uvuiX7i7/85+wLXzplFper5Ro7Xnot1zTWKaxDPb51wersgouvzdbfsTHb896Hi16/ysHoXMy9TgYE2wRfqgyHOFezeBJsa1Y9jI/m1kOwzVyrzrmWc+MLA8MMxC/RdBnWhR70wAAGFmOAb2JjMTZsx0afGeCN+O8z//qO/6UY4I/4WIoP+/DRRQai71mnFZgq2PboE9sG4a8QALv97geTV9jx4q68TWh36NDhZLtLrlyXtzv48aG8zd+d9M25BMy6EGx7bPOzuaZB+6V+7rn/j7nmxV+6aAr61M0PO8G25gYthGDUZloGBNswNC1DfTlesK2bcxxzV3XFQDMZiN+b1aeZ9VEXdcFA8xjgm82riXGiJhiYPwO8cf41MA7UAAPNZIA/NrMuxou6YKA+BqLvWacVmCrYduDgoTws9e0fXJm8ws0b7svbhGDVE1u2J9v947+dN2h30mnfH9ofnvL29W/9LDv3wvT5hxpX+EfXgm1nn395dsWaDYOfn1+1LvvhqhuyGBqMgbdnnnvpBAWZU33mRNtqtRVsE3zpS3Clj/0UbDO++8h9mT4LtlU7tzBXoycGMLAUA/HL81Jt7MMQBjCAgeMM8M3jWuCCFhjAQGSAN2IhsmCNBQwMM8Afh/XABz0w0H0Gou9ZpxWYKtgWThmCaCEY9Vd/+1/JK4TXXsbgVFiHcNXosm//wbzNlWtuHd09l7+7FmwLT9dLLTeuuyfX/rpf3nlCEybZfZPsSo0F2wRfyoRAHNMObgTb2lEn42n+dRJsM2/ryrxOP7DcBgbil+c23Kt7NKYwgIEmMMA3cdgEDt0DDpvGAG/EZNOYdD+YbAoD/BGLTWHRfWBxVgxE37NOKzB1sO3qG2/Pg1G7Xn976CrHjh3LvvClUwb7YwBu9Ils4YCHH/9Tfo4t23YMneNP21/M7t34ePbk088NbQ9huLA9/Hy4d3929Oin2QOPbMkuv2Z99j/nXpL95LJfZPdteiL77LPPho4b/eOdPR9kG+7clH3vx9dkZ5z38+ynq2/KNj2yZbAOQbxw/4str7/5zuDYC1ddPzg2hPY2PrR5cC+jxzz/wqv5/Y7uC38/9ac/D/bvfPXNE3Z/8OG+/Ni9+w6csH+xDcVXkS4WbNv97vu59uEpbqPLrAaq6/hQmJYBwbb5ByqEWtSgLgYE27BVF1tdO69gm/nUtPMpx2MIA+MzEL8702x8zWhFKwz0mwG+2e/6G//qj4E0A7wxrQte6IIBDPBHDPABDPSNgeh71mkFpg62bX3mhTwYdetdm4ausuOl1/J9Dz76dP77oUOHh9qFV2PGp7odPnJ0aF8Im4V9o4G453a8kh9zzdrfZF/+6ln53/FcYR1eY/rpp58OnTP+cf8DT+bBu+Ixxd8XC7atv2Nj8nrh2PBa1dfefCdeZrBee8vxJ6OFQFxxCfcXA4DhfkeX2+5+IL/Wu3s+GN296N/jBNu2/3lnfu4/PLj5hHP1zTD0t70fkoJtgi9dC6joz3GmBduOa4ELWizFgGBbe+cx5qBqh4H2MRC/PKtd+2qnZmqGgfkwwDfnozve6Y6BZjPAG5tdH+NHfTAwPwb44/y0xz3tMTAfBqLvWacVmDrYduTI0TwYdc4Fq4eucvOG+wb7TjvrJ9mBg4fydqNPXwtBsBAIO/XMi4aOD3+ME2wrBtFOXrEyCz/FbXf87uETzhtCXMU24ZiVP12TnX3+5YPXqsZ9qWDbL3/9u/zY8ArWCy6+dvCK1dDP4nHhSXJxCU9si/vuuveRuHmwDk+pi/vCev+Bg0P7v/ujqwf7/+afzhjavtwfywXbQkguah/6ceDAxyecknHNx7joPrnugm3CHkuFPexrNx+Cbe2un/E3u/oJtk0+fzDnohkGMFCWgfjluezxjsMeBjDQNwb4Jub7xrz+Yn4cBngjTsbhRBuc9JEB/oj7PnKvz/3mPvqedVqBqYNt4bThKWMhkBXCUcVXf4ZXgobta266a3D1r516/uDvK9fcmt9NCHHFUNf1N3/eLt85QbDtxnX3ZJ8cPpIfWnxaXAitFZdPPjmcffErK/J7Dq87LS7hCWoxTDYabAthtfh0tRAK2/P+R8VDB68Mjf256LK1+b7ia1lXXnxdvj38El6bGo8J63vu/+PQ/hBoC9uL5xtqsMgfxWBbuNdvnLNq8BMChDHQFs4bzr/nvQ+TZ2Gg/TbQNtVfsG12wQkhFVrPmgHBNszNmrm2Xk+wzbytTXM394rXtjMQv0C3vR/u31jEAAZmxQDfxNqsWHMdrLWJAd6I1zbx6l7xOksG+CPeZsmba+GtCQxE37NOK1BJsC0+mS2EpHbuenNwpWKQa9v2lwbbVl+7fhDQKgbNHn1iWx7qCq/FHF3GeWJbOEdqiceGwF1xuf3uB/Nrjj49L
ba79KpbBm1Gg21rfnlnfmx4DWtqOfM7l+Zt9u0//vS1s7532WB78clrx459lj8hLobtQgAtLh9+tC8/18aHn4qbx1oXg22hNkv9hFe9vvDyayectwmD2D34MBmHAcE2wZe2BlHc9/LsCrYtrxGOaBQYEGwzZxpnzqQNTjBQDQPxyzM9q9GTjnTEQPcZ4Jvdr7FxrMYYmJwB3ji5ZjijGQb6wQB/7EedjWd1xsBxBqLvWacVqCTY9uLO1/PQ1Po7Ng6uVHxiWnhdaVgef2p73u7QocODbcUAWXhS2ugSw2kheFVcntvxSn6up7ftKO7Kf7/8ms+DdCHQVVziE9JCaC11zdC2eF/FY8+98MrBdUMIbbHloce25vcW7jMuG+7clG9//4O9g83PPv/yYFsI3xVDfvG1oMVz7d13IJ5qrHUx2BZesRqelBd+rlizIbvkynWDV6jGp8HF0NuOF3cNnZuZHDcTWjRbC8E2oQ7Bnu4yINjW3doat9XWVrCt2XMVc0n1wUC3GIhfnNW1W3VVT/XEQH0M8M36tMUtbTHQXgZ4Y3trZ9ypHQbqZYA/1qsvfumLgeYxEH3POq1AJcG28PrREMwK4agQoArLulvvG/wdXnsZl48PfZIHuzY//fxgc3wlZnhtaWqZJtgWXoEaA1vFc//nGT8abP+X//hucfPQ74sF28Ix4Zz/+t8/GGpf/CM8+Sxe996Nj+e7Xntjd7590yNbBtvjdX50yQ2DkF18zelvf//560hXX7dhcEzQadKlGGxb7Kl24Zy//PXv8vv699N/OHQZptY8U1OTdE0E26oNRwib0LNJDAi24bFJPDb5XgTb0nMEcye6YAADdTAQvzjXcW7nxCwGMNBFBvgmrrvItT7heloGeCOGpmXI8RjqKgP8EdtdZVu/sL0YA9H3rNMKVBJsC6c+54LVg3BUCGaFoFsMpF33yzuHrhyevBZCX1ddf1sWnkoWA2Drbrt/qF38I56nzBPbblx3T37+eL6wjk8pW3H2xcXNQ7/HwNnoq0j/+h9OH5xzsSBeOEkxwHbrXZuGzhtfN7rqipsH2+P5Nm/9POi38uLrBucP/Q5LCJoFjcL9TLqMG2wL5411Cdc6evT4k/MWG1i2M92mMSDYJvjS5LCJe5uOT8G26fTDX3/0E2wzP2va/Mz9YLLLDMTv513uo74ZwxjAQJUM8E08VcmTc+GpKwzwRix3hWX9wHLVDPBHTFXNlPNhqukMRN+zTitQWbDtjt89nIfIwhPL4pPHtj7zwtCVw2swQ3jq5BUrh169+fIrbwy1i3/UEWyLYbEyT2yLT3tb6glq27a/lGsRXiVaXM6/6JrBvnB8fF1r0Cq+ErUYRguvHo3Bv7B90qV4rqWe2BbO+50fXp1fK74mNWxv+gB3fz6EIgOCbf0Jbgjp9K/Wgm39q7lxXq7mgm3mRXFeZI0FDNTPQPx+Tuv6taYxjTHQDQb4ZjfqaDyqIwaqZYA3VqsnPumJge4wwB+7U0vjUi0xMB4D0fes0wpUFmx7e/d7eTAqhrdCKOvwkaNDVw6vII1hrYsuWzv4PbzGNDzlLbXUEWz73o8/D5eFQNno/cV7WOyJbT9cdcPgnsOxhw4djs2H1uvv+EPex1d2vTW0L7yaNPb/p6tvGvx+4arr8zYh4BZDgeGpbrFteI3rpMskwbbwatVwrXDtY8eO14LRjGc0dJq/ToJt5UIQwiN0awMDgm04bQOnTbhHwbb5z0fMCdUAA/1hIH4/V/P+1Fyt1RoD0zHAN6fTD3/0w0A3GeCN3ayr8aquGJieAf44vYY4pCEG2sVA9D3rtAKVBdvC6eNrNWMYKzwZbXQJYbC4P67Da0wXW+oItt284b78Hu6+/9ETLh3uMbymNAa9ig2KwbRwntElBNOiDiGwd2Qk2Lfn/Y/ya8f+P/n0c0Onia8jjfvD0+3KLOMG2+7b9ER+T6eeedHQpRheuwyvz/USbBN8aUKoxD3Uw6FgWz264rV7ugq2mbf1eS6o7/ifNQPxi/Osr+t6WMcABtrKAN/EblvZdd/YrZMB3oivOvlybny1mQH+iN828+ve8VuGgeh71mkFKg22xaeZxUDW/934m+RV49PBYrvb734w2S5srCPYduDgofypaOEewvV3v/t+tnPXm9mtd23KvviVFXnQKzzBrLiEp5n9/SnfzveHY48e/XTQJITWTjvrJ/m+O3/3cPHQ/Pcvf/WsvE04f3wNaWxQDKSF+7v6xtvjronWxfOs+eWd2dPbdgx+ntiyPfv9g09mv/7NH7L4atVYi+dfeHXoGmUGnWOY9TwYEGzrXkBD6EZNIwOCbViILFgvzYJgmznYPOZgrom7vjIQvzj3tf/6bexjAAOTMsA3MTMpM9pjpg8M8Eac94FzfcR5GQb4I27KcOMY3LSZgeh71mkFKg22bXxocx7YCkGpEKRKLVffcNtQuzfeejfVbLCtjmBbOHHxqW0x1JVajwbbwrHPPv/yUDAuHBeezlY8/uvf+tnQKz2LHYyvIA3ti68hjW2KryMNbbY+80LcNdG6GGwr3lvq99DPPzy4+YTzt3nwu/d+fXgJti0ddhAGoU+bGRBsw2+b+Z3lvQu29WvuY66r3hiYLwPxy7M6zLcO9Kc/BtrDAN9sT62MK7XCwOwY4I2z0xrXtMZAuxjgj+2ql/GlXhiYnoHoe9ZpBSoNtn20d/9QuOuTw0eSV92ybUfeLjwdbanlzO9cOmg7+jrO8GSxGNBaLPi19le/zdukrhGeqBbPEdch4HXb3Q9kl151y2BfCKyllnf2fJCFV63G44rrcN3wZLfFloce25of9/hT25PNiq8jHX2dafKAxMZw7uJ9jf7+N/90RhZePXrdL+/MwlPsUgsTmt6EaDgbDQXbBF9mGR5xrdnyJtg2W73x3V69BdtmM+cwt6MzBjAQGIgLHvCAAQxgYDwG+OZ4OuGJThjoFwO8sV/1Nr7VGwPjM8Afx9cKV7TCQDcYiL5nnVag0mBb+hLN3hpeI7rjxV3Zpke2ZK+9sXvJQFqqJyEQ9qftL2YP5McfSzVr7TZG2A0j7EMdBdvaG8QQolG75RgQbMPIcozY/zkjgm3mbX2Y8+kjzpvCQPyS35T7cR/GBgYw0HQG+CZGm86o+8PoPBjgjbibB3euibs2MMAfcdoGTt0jTqtkIPqedVqB3gfb0rLYGhWocjA6F3OvkwHBNsEXwZbuMiDY1t3aGrfV1lawzVyrzrmWc+MLA8MM+M48rAc+6IEBDCzHAN/EyHKM2I+RPjLAG3HfR+71GffjMMAfcTIOJ9rgpEsMRN+zTisg2JbWxdYFBbpkBvrS7Q83wbZqwxHCJvRsEgOCbXhsEo9NvhfBtm7Pdcxl1RcDzWIg/qOBujSrLuqhHhhoLgN8s7m1MW7UBgPzY4A3zk973NMeA81mgD82uz7Gj/pgoHoGou9ZpxUQbEvrYuuCAkypelOiaT2aCrYJvjQ5bOLepuNTsG06/fDXH/0E2+qZY5i70RUDGEgxEP/RILXPNsxgAAMYOJEBvnmiJjihCQYwwBsxwAcw
gIE0A/wxrQte6IKB7jIQfc86rYBgW1oXWxcUYI7dNceu1VawrT/BDSGd/tVasK1/NTfOy9VcsM28rWvzO/3BdJMZiP9o0OR7dG/GEAYw0CQG+CYem8Sje8FjUxjgjVhsCovuA4tNY4A/YrJpTLofTNbNQPQ967QCgm1pXWxdUKDuAer8PgSqYkCwrVwIQniEbm1gQLANp23gtAn3KNhmXlXVvMp5sISB5RmI/2hAq+W1ohGNMICBwEBc8IAHDGAAA8cZ4I3HtcAFLTCAgSID/BEPRR78joc+MBB9zzqtgGBbWhdbFxTog0noYzc+DAXbBF+aECpxD/VwKNhWj6547Z6ugm3dmNOYm6ojBtrBQPxHA/VqR73USZ0wMH8G+Ob8a2AcqAEGmscAb2xeTYwTNcFAMxjgj82og/GgDhiYHQPR96zTCgi2pXWxdUEBZjU7s6L1dFoLtnUvoCF0o6aRAcE2LEQWrJdmQbBturmEuRj9MICBSRiI/2gwyTHaYgwDGOgzA3wT/33mX9/xvxgDvBEbi7FhOzb6zgB/NAb6Pgb0v39jIPqedVoBwba0LrYuKMA0+2eaba15DLZ9c+W+zA8NMNAtBmKwTV27VVf1rL6eMdg2T23feNfcqa1zKfeNXQxMxkD8RwO6TaYbveiFgf4ywDf7W3vjXu0xsDgDvHFxbXBDGwz0mwH+2O/6G//q30cGou9ZpxUQbEvrYuuCAn00DX1u54dlCLa98qYfGpRn4LW3P87CDw3La1iXdjvfaN491dVX51XraRioa6xM6o/mUu2cS6mbumFgMgbiPxrQbTLd6EUvDPSXAb7Z39ob92qPgcUZ4I2La4Mb2mCg3wzwx37X3/hX/z4yEH3POq2AYFtaF1sXFOijaeizD0sM9JOBaPzq38/6q7u6Y2BxBvjj4trghjYY6C8DvLG/tTfu1R4D5Rjgm+V0wxvdMNBtBnhjt+tr/KovBsozwB/La4c72mGgnQxE37NOKyDYltbF1gUFGF87jU/d1A0DkzMQjZ92k2tHM5phoNsM8Mdu19f4VV8MlGOAN5bTDW90w0B/GeCb/a29ca/2GFicAd64uDa4oQ0G+s0Af+x3/Y1/9e8jA9H3rNMKCLaldbF1QYE+moY++7DEQD8ZiMav/v2sv7qrOwYWZ4A/Lq4NbmiDgf4ywBv7W3vjXu0xUI4BvllON7zRDQPdZoA3dru+xq/6YqA8A/yxvHa4ox0G2slA9D3rtAKCbWldbF1QgPG10/jUTd0wMDkD0fhpN7l2NKMZBrrNAH/sdn2NX/XFQDkGeGM53fBGNwz0lwG+2d/aG/dqj4HFGeCNi2uDG9pgoN8M8Md+19/4V/8+MhB9zzqtgGBbWhdbFxToo2nosw9LDPSTgWj86t/P+qu7umNgcQb44+La4IY2GOgvA7yxv7U37tUeA+UY4JvldMMb3TDQbQZ4Y7fra/yqLwbKM8Afy2uHO9phoJ0MRN+zTisg2JbWxdYFBRhfO41P3dQNA5MzEI2fdpNrRzOaYaDbDPDHbtfX+FVfDJRjgDeW0w1vdMNAfxngm/2tvXGv9hhYnAHeuLg2uKENBvrNAH/sd/2Nf/XvIwPR96zTCgi2pXWxdUGBPpqGPvuwxEA/GYjGr/79rL+6qzsGFmeAPy6uDW5og4H+MsAb+1t7417tMVCOAb5ZTje80Q0D3WaAN3a7vsav+mKgPAP8sbx2uKMdBtrJQPQ967QCgm1pXWxdUIDxtdP41E3dMDA5A9H4aTe5djSjGQa6zcD+g0ey8KPO3a6z+qovBiZjwNxxMr3wRS8MYIBvYoAPYAADJzLAG0/UBCc0wQAGAgNxwQMeMICBvjAQfc86rYBgW1oXWxcU6ItR6KcPRQxgIBr/NCzcs+lAtnbDfj8Va7Btx0GaVqwpTrs5Trf+2Vgpy/au3bQro926O/Znr77VLO1e2GVOM81cxrHj81PF3JHe4+tNK1phoP0M8M3219A4VEMMVM8Ab6xeU5zSFAPdYIA/dqOOxqM6YmB8BqLvWacVEGxL62LrggLMZnyzoRWtMNBuBqLxT1PHEGw7+fR9firWIATb/nfNfrpWrCtWuzdWQ7Bt9Q3GShm2Q7Dt/53TPSbKaDHJMWddsG8QbJvkmLrbCra1ez42zTxs1sdWMXec9T27nvGBAQzMkwG+ib958ufa+GsqA7wRm01l031hc94M8EcMzptB18fgrBmIvmedVkCwLa2LrQsKzHrAup4PCQxgYF4MROOf5vqCbfWEQgTb6tG17nCJ88++boJt5TUXbCunnWCbeds086a2H1vF3LHtGrh/HoABDEzCAN/EyyS8aIuXvjDAG7HeF9b1E+uTMsAfMTMpM9pjpu0MRN+zTisg2JbWxdYFBdpuAO7fhxgGMDAuA9H4x22faifYVi4YsVwASrCtHl2X093+9uku2Fa+ZoJt5bQTbDPPSs2H+rKtirljX7TST16BAQwEBuKCBzxgAAMYOM4AbzyuBS5ogQEMFBngj3go8uB3PPSBgeh71mkFBNvSuti6oEAfTEIffRhiAAOBgbhMw4NgW7lgxHIBKsG2enRdTnf726e7YFv5mgm2ldNOsM0capp5U9uPrWLu2HYN3D8PwAAGJmGAb+JlEl60xUtfGOCNWO8L6/qJ9UkZ4I+YmZQZ7THTdgai71mnFRBsS+ti64ICbTcA9+9DDAMYGJeBaPzjtk+1E2wrF4xYLkAl2FaPrsvpbn/7dBdsK18zwbZy2gm2mWel5kN92VbF3LEvWuknr8AABgIDccEDHjCAAQwcZ4A3HtcCF7TAAAaKDPBHPBR58Dse+sBA9D3rtAKCbWldbF1QoA8moY8+DDGAgcBAXKbhQbCtXDBiuQCVYFs9ui6nu/3t012wrXzNBNvKaSfYZg41zbyp7cdWMXdsuwbunwdgAAOTMMA38TIJL9ripS8M8Eas94V1/cT6pAzwR8xMyoz2mGk7A9H3rNMKCLaldcl+89uHsqtvvD3b+PBTi7Ro9+bPPvss+/TTT5ftRNsNwP37EMMABsZlIBriuO1T7QTbygUjlgtQCbbVo+tyutvfPt0F28rXTLCtnHaCbeZZqflQX7ZVMXfsi1b6ySswgIHAQFzwgAcMYAADxxngjce1wAUtMICBIgP8EQ9FHvyOhz4wEH3POq3AVMG2y69Zn604++Ls7PMvT589y7Kdu94ctAntbrv7gUXb/fo3fxi0+59zL8lC6Cosm59+PvvyV88a/Pz+wScXPbaOHf/4b+dlf/GX/5x964LVdZx+5uc8duyz7J77/5hduOr67Gunnp994UunDPr3xa+syE5esTL732t+ne16/e0T7qsPJqGPPgwxgIHAQFym4UGwrVwwYrkAlWBbPboup7v97dNdsK18zQTbymkn2GYONc28qe3HVjF3bLsG7p8HYAADkzDAN/EyCS/a4qUvDPBGrPeFdf3E+qQM8EfMTMqM9phpOwPR96zTCkwVbLv0qlsG4agQAHvtjd3JK9y84b68zUmnfT/ZJmyMQbIQuopLeFpaOHf42XDnprh5Jut4P10
Itr2y660saB+1XGod2r2754Nc47YbgPv3IYYBDIzLQDS+cdun2gm2lQtGLBegEmyrR9fldLe/fboLtpWvmWBbOe0E28yzUvOhvmyrYu7YF630k1dgAAOBgbjgAQ8YwAAGjjPAG49rgQtaYAADRQb4Ix6KPPgdD31gIPqedVqBqYJtj21+Ng9L3X73g8krfOOcVXmbEKg6dOjwCe327T+Yt7lizYZ8/7PPv5x9/Vs/G/yEa81y6UqwrVijoH/o1//d+JvsgUe2ZJse2TJ43eoZ5/081z+02fHirlzqPpiEPvowxAAGAgNxmYYHwbZywYjlAlSCbfXoupzu9rdPd8G28jUTbCunnWCbOdQ086a2H1vF3LHtGrh/HoABDEzCAN/EyyS8aIuXvjDAG7HeF9b1E+uTMsAfMTMpM9pjpu0MRN+zTiswVbDt40Of5IGob//gyhOucOzYsfyVl/EpYU9s2X5Cu4ce25qfJ7x+tAlLF4Jtn3xyOPvrfzg91/bKNbdmn376aVLe8MS9/zzjR4O2gm2Mv+3G7/4xXIaBaI5ljo3HCLaVC0YsF6ASbKtH1+V0t799ugu2la+ZYFs57QTbzLniHKiP6yrmjn3UTZ/5Adc4DgAAIABJREFUBgb6ywDf7G/tjXu1x8DiDPDGxbXBDW0w0G8G+GO/62/8q38fGYi+Z51WYKpgWzjlv/73DwZhqL/62/864Qo7XnotD1X9/SnfHvxefCJbPODnV63L24UwVlwOHPg4u3fj44OfDz7cFzcP1iGIFfeFDR/u3Z/dff+j2Y8vXZuFJ5Bdfs36bNv2l4aOSf0RQlw3rrsnO/v8y7NzLlg9OC48KW65YFsIiD346NPZ1Tfclp35nUuzlRdfl/36N3/Iwms/U8ujT2wb3O/mreng3n2bnhjsD/0YXV7a+cZg3/0PPDm6a8m/g9YxUHjJleuWbBt3/v7BJ7P3P9gb/8z6aBr67MMSA/1kIBrfNPUXbCsXjFguQCXYVo+uy+luf/t0F2wrXzPBtnLaCbb1c840zVypS8dWMXfskh76wg8wgIHlGOCbGFmOEfsx0kcGeCPu+8i9PuN+HAb4I07G4UQbnHSJgeh71mkFpg62XfuLO/Lw1Gio6+YN9w32nXbWT7L/vebXg99POu37J9xJDJH9++k/HNr3wsvHg3GjT3pb+6vf5tcNr0H9wpdOyf+OYa6wXnPTXUPnjH+EYNrq644Hv4rHFH//1gWr4yH5eve772cnr1iZvF44NtzbsWOf5e3DLyvOvnjQPhUAfHHn6/m5wrGjSwjchfOGp6+Nu4Sn5cV+BG32Hzj+mr1xzxHadckM9MWHGwYwsBQD0RuXarPcPsG2csGI5QJUgm316Lqc7va3T3fBtvI1E2wrp51gm7nVcnOjLu+vYu7YZX30jT9gAAOjDPBNTIwy4W9MYOD4/9nQwnjAAAYwMMyAueOwHvigBwa6z0D0Peu0AlMH25557qU8QHXrXZuGrvI/514y2Hf9zXdlj21+Nm936NDxp7KFwFUMYK355Z1Dx48bbIvHhwBXCNHFoFzcHp7uNrpcdNna/LrhuBA8W/nTNXkALR47Gmw7+PGhLITT4v4Qxlt97frswlXXD20Pr/0sLjetvzc/5q3de4q7sqtvvD3f9y//8d2hfeGPL35lxWD/D1fdcMK+xTa8+fae/JzjPq0tdS4m2X2TVGM1xsDnDEQPnEYPwbZywYjlAlSCbfXoupzu9rdPd8G28jUTbCunnWCbeeQ086a2H1vF3LHtGrh/HoABDEzCAN/EyyS8aIuXvjDAG7HeF9b1E+uTMsAfMTMpM9pjpu0MRN+zTiswdbAtPPksPi2tGAILTwyL28MrQUMgLIbBNj99/HWcDz/+p3x7CMkVl3GDbSH49cAjW4aekhZCdvF6oyGz8GrPuC8Eyd7d80HxsoN7/Zt/OmPQptin0Kj4es/RIN9He/cPPcntncJ5i69l/e3v/zh0vb876Zv5/YT7evud9/L94d7ivf7hwc359uV+Ca8+jceFJ9qVXdpuAO7fhxgGMDAuA9Enx22faifYVi4YsVyASrCtHl2X093+9uku2Fa+ZoJt5bQTbDPPSs2H+rKtirljX7TST16BAQwEBuKCBzxgAAMYOM4AbzyuBS5ogQEMFBngj3go8uB3PPSBgeh71mkFpg62hdN+45xVgxBVeJLZZ599/grOYpDr6NFPB1f/2qnnD9oVg2aXXnVLHsAKIbniMm6w7eNDnxQPG/xeDNyde+GVQ/vD3zH09fqb7wzti3/Ep74Vg22hH/G40demxuOK9xz6FpfwatIY9AtPd4tLeJpcPGd8Elx4ultcNj78VL7/w4/2xc3LruNrYMO5n/rTn5dtv1iDPpiEPvowxAAGAgNxmYYHwbZywYjlAlSCbfXoupzu9rdPd8G28jUTbCunnWCbOdQ086a2H1vF3LHtGrh/HoABDEzCAN/EyyS8aIuXvjDAG7HeF9b1E+uTMsAfMTMpM9pjpu0MRN+zTitQSbDtltt+n4evdu56c3CldbfeN9h26pkX5Vdefd2GwbaTV6zMt8UA2de/9bN8W/ylGBJ7Ysv2uHmwXvur3+bXDKGx1BKDdP95xo+GdscnpIVXpS62xPsqBtuKIbS77390sUOzvz/l24N7G+1TOFcImn35q2flx96w7u7BtrPPvzyL+oT7jkt4jWg4JpxzkuXaX9wxOC4c++zzL09y6FDbthuA+/chhgEMjMtANL9x26faCbaVC0YsF6ASbKtH1+V0t799ugu2la+ZYFs57QTbzLNS86G+bKti7tgXrfSTV2AAA4GBuOABDxjAAAaOM8Abj2uBC1pgAANFBvgjHoo8+B0PfWAg+p51WoFKgm0hzBYCVOFn/R1/GFzpjPN+Pvh7zS/vzK8cwmmx3aFDh7MDBz7O/w5PGBtdpg22hVBduF4x2Bae5Bbv4bL/+9XoJfO/U8G2xzY/mx+7ZduOvO3oL7Hv4XWmxSW8EjRe+8O9+we74nXuf+DJrPiUu93vvj/YH8N5IeA2yfL7B5/Mr7VUCG+5c/bBJPTRhyEGMBAYiMs0PAi2lQtGLBegEmyrR9fldLe/fboLtpWvmWBbOe0E28yhppk3tf3YKuaObdfA/fMADGBgEgb4Jl4m4UVbvPSFAd6I9b6wrp9Yn5QB/oiZSZnRHjNtZyD6nnVagUqCbeH1o/E1mmd977Ks+NrNP21/Mb9yCLPFYNfmrc9njz6xLf87hNhGl2mDbSvOvnhw/mKwbc/7H+XXXHvLPaOXzP+OgbPiE9t++/s/5seGENpiS3zVadCkuLzx1rv58Q8++nT29jvv5X/H16n+9T+cPtgWnnh3+MjRfH/QapLlxZ2v58euvnb9JIcOtW27Abh/H2IYwMC4DETzG7d9qp1gW7lgxHIBKsG2enRdTnf726e7YFv5mgm2ldNOsM08KzUf6su2KuaOfdFKP3kFBjAQGIgLHvCAAQxg4DgDvPG4FrigBQYwUGSAP+KhyIPf8dAHBqLvWacVqCTYFk79nR9ePQhSfeFLpww9eSyEs4
rLSad9f9Duqutvy8IT00LQLRyTep1oHcG2aZ7YFkJ6MZi3VNAsPikuhONGly9+ZcXgHJdedUsWX9d65ncuzZuFEFq4RtDpmedeyq934OChvM04v3zyyfEQYXj1auh3maUPJqGPPgwxgIHAQFym4UGwrVwwYrkAlWBbPboup7v97dNdsK18zQTbymkn2GYONc28qe3HVjF3bLsG7p8HYAADkzDAN/EyCS/a4qUvDPBGrPeFdf3E+qQM8EfMTMqM9phpOwPR96zTClQWbAuvu4yhr/Mvumbwe/FJafHyV665dbDv5BUrs3/5j+8Ofj/7/Mvj7qF1HcG2cIEYLgtPdFtsST2xrfi0t+sKr1gtniMEyOLT6865YHVx1+D3lT9dM+hzeMVo0CBodu/Gx/N2xdeR/mz1TYP9IeRWZvn30384OD5c49e/+fwVscud5909H2ThyXpxabsBuH8fYhjAwLgMVOF7gm3lghHLBagE2+rRdTnd7W+f7oJt5Wsm2FZOO8E286xx51ldbFfF3LGLuugTX8AABhZjgG9iYzE2bMdGnxngjfjvM//6jv+lGOCP+FiKD/vw0UUGou9ZpxWoLNj2zp4P8hBVDLhds/Y3J1w1vII07o/r2+5+4IR2YUNdwbb/OfeS/B7efHvPCdcOfYnhtOKrSMMrV//mn84YHBv2h6eijS533ftIfu71d5wYJrtv0xP5/tj/gx8PP40tvo407g9hwDLLK7veyq8Vnor3+FPbFz1NCOT98te/G7Tf8eKuvF0XTUGffNhhAAMpBqLxpfaNu02wrVwwYrkAlWBbPboup7v97dNdsK18zQTbymkn2GZONe4cqYvtqpg7dlEXfeILGMDAYgzwTWwsxobt2OgzA7wR/33mX9/xvxQD/BEfS/FhHz66yED0Peu0ApUF28LpY+grBrKe3rbjhKsWX5EZ2732xu4T2oUNdQXbtj7zQh74CiGyxzY/m73/wd7s2edfzq6+4bZ8X7i/YrAt3NMDj2zJ94cnqb29+73BvYdgWHjyWuxT0OLo0RNf//nBh/vyNqHtGef9/IS+x9eRxnNtSeh4wkGLbLhizYah64UnxoXzfbR3f7Z334Es1OiW236f/f0p387bCbb5MOjih4E+4Xo5BqKNLtduqf2CbeWCEcsFqATb6tF1Od3tb5/ugm3laybYVk47wTbzq6XmRV3fV8Xcsesa6R+PwAAGigzwTTwUefA7HjDwOQO80VgwFjCAgTQD/DGtC17ogoHuMhB9zzqtQKXBtosuW5uHo0Io65PDR5JX/df//kHeLrwWdLGlrmBbuF7xqW0xQJZajwbbwrHhFaPFtvHpbsVtm59+frFuZX930jfz44uvIY0HFF9HGs55+MjRuGvi9ZEjR08I6xXvM/X7rtffzq/DHLtrjmqrthgYZiAa3zS6CLaVC0YsF6ASbKtH1+V0t799ugu2la+ZYFs57QTbhucS08whHNs+LauYO6p7++quZmqGgfIM8M3y2uGOdhjoLgO8sbu1NW7VFgPTMcAfp9MPf/TDQPsYiL5nnVag0mDbg48+nQe2QnhtseXqG2/P233vx9cs1ix7aecbebvRoNhN6+/N9x079lnyHF//1s8GbU4986IT9ocnrI0G8ULI68tfPWvwFLN/+Y/vDo4NIbbUsuHOTVl4vedoMOxrp56fvf7mO6lD8m2rrrg5P+7AweHXkMZG8XWkqXuPbSZZh6fihTBf6p5DH/7x387L1t5yT/bh3v1Dp2V67TM9NVMzDJRjIJrfNPoJtpULRiwXoBJsq0fX5XS3v326C7aVr5lgWzntBNvKzTmmmWs4tjmaVzF3VM/m1FMt1AID9TPAN+vXGMc0xkD7GOCN7auZcaZmGJgNA/xxNjrjmc4YaA4D0fes0wpUGmxLX6LZWw9+fGgQZHv0iW3Znvc/muhmQ6AuhNhCoC+83nTf/oMTHT+vxuE1pNu2vzR4BevzL7y65H0zs+aYmVqoBQbqZSB68jQ6C7aVC0YsF6ASbKtH1+V0t799ugu2la+ZYFs57QTb6p2bTDMncWz9tali7qhO9deJxjTGQHMY4JvNqYVxoRYYaA4DvLE5tTAu1AIDzWKAPzarHsaHemCgfgai71mnFeh9sC0ti61RASZVv0nRmMYYaAYDVfieYFu5YMRyASrBtnp0XU53+9unu2Bb+ZoJtpXTTrCtGXMYc8n51KGKuaPazad2dKc7BubDAN+cj+54pzsGms0Ab2x2fYwf9cHA/Bjgj/PTHve0x8B8GIi+Z51WQLAtrYutCwowrvkYF93pjoHZMxCNfxrtBdvKBSOWC1AJttWj63K6298+3QXbytdMsK2cdoJts5+vTDNPcWy19api7qgm1daEnvTEQLMZ4JvNro/xoz4YmA8DvHE+uuOd7hhoPgP8sfk1Mo7UCAPVMhB9zzqtgGBbWhdbFxRgSNUaEj3piYHmMhCNf5oaCbaVC0YsF6ASbKtH1+V0t799ugu2la+ZYFs57QTbmjuvmWY+49jx6lrF3JHW42lNJzphoBsM8M1u1NF4VEcMVMsAb6xWT3zSEwPdYYA/dqeWxqVaYmA8BqLvWacVEGxL62LrggKMZjyjoROdMNB+BqLxT1PLz4Nte7OTT/dTpQaCbeUCJ4Jp/dNNsK18zQXbymkn2Nb++c80856+H1vF3LHvGuo/D8FAvxjgm/2qt/Gt3hgYjwHeOJ5OeKITBvrHAH/sX82NczXvOwPR96zTCgi2pXWxdUGBvhuI/vsQxUB/GIjGP03NBdvqCfQJtpULnAi29U83wbbyNRdsK6edYFt/5knTzI+6emwVc8euaqNfvAEDGEgxwDdxkeLCNlz0nQHeaAz0fQzovzGwGAP8ERuLsWE7NrrKQPQ967QCgm1pXWxdUKCrxqBfPvQwgIFRBqLxj26f5G/BNsE2YbJy4Ri6VaObYFt5HQXbymkn2GY+Nck8qWttq5g7dk0T/eEJGMDAUgzwTXwsxYd9+OgrA7wR+31lX7+xvxwD/BEjyzFiP0a6xkD0Peu0AoJtaV1sXVCga4agPz7kMICBxRiIxr/Y/nG2C7YJtglolQvH0K0a3QTbyuso2FZOO8E286px5kddbVPF3LGr2ugXb8AABlIM8E1cpLiwDRd9Z4A3GgN9HwP6bwwsxgB/xMZibNiOja4yEH3POq2AYFtaF1sXFOiqMeiXDz0MYGCUgWj8o9sn+fvPrxzM/FSvwYuvVX9OdaJpFxl4qaax8sKrB7Pw00XNYp9CsC3+bj2+FjtePZi9+tb47Weh7Qu7zHEmmbtoW56XKuaO9C+vP+1oh4H2McA321cz40zNMFA/A7yxfo1xTGMMtJMB/tjOuhlv6oaB8gxE37NOKyDYltbF1gUFmE9586Ed7TDQLgai8atbu+qmXuqFgfoZOPrpsSz80Lp+rWlMYwy0hwFzx/bUyrhSKww0gwG+2Yw6GA/qgIFmMcAbm1UP40M9MNAcBvhjc2phXKgFBmbDQPQ967QCgm1pXWxdUIBRzcao6ExnDMyfgWj8ajH/WqiBGmCgWQzwx2bVw
/hQDww0gwHe2Iw6GA/qgIH2MMA321Mr40qtMDA7Bnjj7LTGNa0x0C4G+GO76mV8qRcGpmcg+p51WgHBtrQuti4owISmNyEa0hAD7WAgGr96taNe6qROGJgdA/xxdlrjmtYYaA8DvLE9tTKu1AoDzWCAbzajDsaDOmCgWQzwxmbVw/hQDww0hwH+2JxaGBdqgYHZMBB9zzqtgGBbWhdbFxRgVLMxKjrTGQPzZyAav1rMvxZqoAYYaBYD/LFZ9TA+1AMDzWCANzajDsaDOmCgPQzwzfbUyrhSKwzMjgHeODutcU1rDLSLAf7YrnoZX+qFgekZiL5nnVZAsC2ti60LCjCh6U2IhjTEQDsYiMavXu2olzqpEwZmxwB/nJ3WuKY1BtrDAG9sT62MK7XCQDMY4JvNqIPxoA4YaBYDvLFZ9TA+1AMDzWGAPzanFsaFWmBgNgxE37NOKyDYltbF1gUFGNVsjIrOdMbA/BmIxq8W86+FGqgBBprFAH9sVj2MD/XAQDMY4I3NqIPxoA4YaA8DfLM9tTKu1AoDs2OAN85Oa1zTGgPtYoA/tqtexpd6YWB6BqLvWacVEGxL62LrggJMaHoToiENMdAOBqLxq1c76qVO6oSB2THAH2enNa5pjYH2MMAb21Mr40qtMNAMBvhmM+pgPKgDBprFAG9sVj2MD/XAQHMY4I/NqYVxoRYYmA0D0fes0woItqV1sXVBAUY1G6OiM50xMH8GovGrxfxroQZqgIFmMXD002NZ+FGXZtVFPdQDA/NlgDfOV3/80x8D7WPAd+721cw4UzMM1M8Ab6xfYxzTGAPtZIA/trNuxpu6YaA8A9H3rNMKCLaldbF1QQHmU958aEc7DLSLgWj86tauuqnXcL3+/MrBzE9/NXjptXr6/sKrB7Pw02W2du3udv/qqt2OVw9mr75Fu7r07cp5Ayc73+geJ0t5o/nJ8PyEHvTAAAYCA3HBAx4wgAEMHGeANx7XAhe0wAAGigzwRzwUefA7HvrAQPQ967QCgm1pXWxdUKAPJqGPPgwxgIHAQFzwgIc2M7Btx8Hs5NP3+empBlv/fDBbfcN+9S9R/xBs+3/nGDuT+sdZF+wbBNsmPU77frF27o/2DYJtfan7w5sPeMLle+aTbZ5Punf81sWA79zYqost58VWmxngjfhtM7/uHb91MsAf8VUnX86NryYyEH3POq2AYFtaF1sXFGjioHZPPmwwgIE6GIjGX8e5nROzs2JAsK1fYZHRkIhgW/n6C7aV006wrZxuo2O3638LtpkHzWoe5DpYw0CzGfCdu9n1MX7UBwPzYYA3zkd3vNMdA81ngD82v0bGkRphoFoGou9ZpxUQbEvrYuuCAgypWkOiJz0x0FwGovGrUXNrpDbL10awrd8hE8G28vUXbCunnWBbOd26HmQb7Z9g2/Kf3+Y4NMIABvrAgO/cOO8D5/qI80kZ4I2YmZQZ7THTFwb4I9b7wrp+Yj0yEH3POq2AYFtaF1sXFIgDyZqpYgADXWcgGn/X+6l/3R7Lgm39DpkItpWvv2BbOe0E28rpNhr86vrfgm3dnnuYW6ovBjAwLgO+c2NlXFa0w0qfGOCNeO8T7/qK90kY4I94mYQXbfHSBQai71mnFRBsS+ti64ICXTABffBhhgEMjMNANP5x2mqDqaYyINjW75CJYFv5+gu2ldNOsK2cbl0Pso32T7DNvKmp8yb3hU0MzJYB37lnqze+6Y2BdjDAG9tRJ+NJnTAwewb44+w1xznNMTBfBqLvWacVEGxL62LrggIMbL4GRn/6Y2B2DETjp/nsNKd19VoLtvU7ZCLYVr7+gm3ltBNsK6fbaPCr638LtlX/eW8ORVMMYKCNDPjOjds2cuuecVs3A7wRY3Uz5vwYaysD/BG7bWXXfWO3LAPR96zTCgi2jeiy+enns6tvvD27af29I3v6+WfZgec4po0BDLSNgejybbtv92usFRkQbOt3yESwrXz9BdvKaSfYVk63rgfZRvsn2GauUpyr+B0PGOgvA75z97f2xr3aY2BxBnjj4trghjYY6DcD/LHf9Tf+1b+PDETfs04rUCrYdu/Gx7MVZ188+Hlr957kmY8dO5Z945xVgzYXXbY22SZs3PrMC/m5dr3+dt7ujPN+nn35q2dlJ69YmW+bxS+XXnVL9hd/+c/ZF750yiwuV8s1drz0Wq5prNPXv/Wz7JwLVmcXrro+u/qG27JHn9iWffrpp8tev4+moc8+LDHQTwaiIap/P+vflboLtvU7ZCLYVr7+gm3ltBNsK6fbaPCr638LtplbdWWepR9YxsB0DPjOPZ1++KMfBrrJAG/sZl2NV3XFwPQM8MfpNcQhDTHQLgai71mnFSgVbAuhqBD+Cj+33/1g8sw7XtyVtwntDh06nGx3yZXr8nYHPz6Ut/m7k7452D7rgFkXgm2PbX421zTWKbX+q7/9r+zKNbdmn3ySrk0oBsNrl+Gpl3phoDwD8QOIhuU1pN38tRNs63fIRLCtfP0F28ppJ9hWTreuB9lG+yfYNv/5gTmaGmAAA01gwHduHDaBQ/eAw6YxwBsx2TQm3Q8mm8IAf8RiU1h0H1icFQPR96zTCpQKth04eCgPTn37B1cmz3zzhvvyNiFU9cSW7cl2//hv5w3anXTa94f2h6e8haeMnXth+vxDjSv8o2vBtjO/c2m2+tr12aorbs6+88Ors9PO+sngaXTFoFvQ+ejR9NPbZjVQXceHAgYwMG8G4kfJvO/D9Y2FaRgQbOt3yESwrXz9BdvKaSfYVk630eBX1/8WbDO3mWZu41j8YKA7DPjO3Z1aGpdqiYHqGOCN1WmJS1pioFsM8Mdu1dP4VE8MLM9A9D3rtAKlgm3hVCGIFsJR4alfqSW8hrQYnrpizYYTmu3bfzBvE54c1oSla8G28HS90eXIkaPZpke2DF71GmsUXlN67Nhno009se295U2GEdMIA91gIBqgenajnn2to2Bbv0Mmgm3l6y/YVk47wbZyunU9yDbaP8E2c6u+zsv0G/sYGGbAd+5hPfBBDwxgIDAQFzzgAQMYwMAwA/xxWA980AMD3Wcg+p51WoHSwbarb7w9D6Xtev3tobMfO3YsfypYDMCNPpEtHPDw43/Kz7Fl246hc/xp+4vZvRsfz558+rmh7SEMF7aHnw/37h88aeyBR7Zkl1+zPvufcy/JfnLZL7L7Nj2RffbZiSGt4one2fNBtuHOTdn3fnxNdsZ5P89+uvqmQdgrrEPYa6lXoG7b/lK29le/HTxNLgTCblh3dzZ6//Faz7/wan6/cVtx/dSf/jzYv/PVN4ubB79/8OG+/Ni9+w6csH+xDcVXkaaCbfG4cP4vf/WsvAY/W31T3JWvmWT3TVKN1RgDnzMQjY8exkSbGRBs63fIRLCtfP0F28ppJ9hWTrfR4FfX/xZsM7dq89zKveMXA9Ux4Dt3dVrikpYY6A4DvLE7tTQu1RID1TLAH6vVE5/0xEDzGYi+Z51WoHSwbeszL+SBqFvv2jR09h0vvZbve/DRp/PfDx06PNTu51ety/cdPnJ0aF8Im4WA
2Wgg7rkdr+THXLP2N0PBrPj0sbAOr9f89NP06zXvf+DJPHhXPKb4eyrY9sknh7Mfrrohv36xffg9vDb14MeHhvqx9pZ78vavv/nO0L5wf+E64dhwv6PLbXc/kB/77p4PRncv+ve4wbZwgj3vfZh98Ssr8uuMBuiYXPNNTo3UCAPVMBBNlZ7V6EnH+ego2NbvkIlgW/n6C7aV006wrZxuXQ+yjfZPsG0+cwJzMbpjAANNY8B3bkw2jUn3g8kmMMAbcdgEDt0DDpvIAH/EZRO5dE+4rJOB6HvWaQVKB9vC6yxjsCs8tay43LzhvsG+0876SXbg4KG83ejT1/7x384b7Dv1zIuKhw9+HyfYFq8f1ievWDn4KW6743cPn3DePzy4Ob+feNzKn67Jzj7/8sFrVePxqWBbCJ/F/eFJZ+HpbquuuDn7+1O+nW8P91FcwhPb4jF33ftIcdfgKW9xX1jvP3D80dOh4Xd/dPXg2L/5pzOGjlvuj0mCbeFc/3vNr/N7vP3uB4dOX+fgdG7mjwEMNImBaH5Nuif3YoxMyoBgW79DJoJt5esv2FZOO8G2crqNBr+6/rdgm/nMpPMZ7TGDgW4y4Dt3N+tqvKorBqZjgDdOpx/+6IeB7jLAH7tbW+NWbTGQZiD6nnVagdLBtnC6GPT6q7/9r6FXf4ZXgoag1pqb7hpc9Wunnj/4+8o1t+Z3EUJcMdR1/c2ft8t3Ztng9aBh/1JPbAv7b1x3T/bJ4SP5ocWnxY2GzMIT1+LTycI9h9edFpfwBLUYJhsNtoVQXrzf0CYE++ISjrvosrX5/hCei0vxtawrL74ubh6sw2tT4znD+p77/zi0PwTawvZw7kmWSYNtGx86Hva75Mp1Q5diLGljoQtdMNA9BqIMJKQBAAAgAElEQVT5qW33atunmgq29TtkIthWvv6CbeW0E2wrp1vXg2yj/RNsM7fq01xMX/GOgcUZ8J17cW1wQxsM9JcB3tjf2hv3ao+BpRngj0vrgx/6YKB7DETfs04rMFWwLT6ZLYSvdu56c3CFYpBr2/aXBttWX7t+ENAqBs0efWJbHura/uedJ9zdOE9sC+dILfHYEF4rLuFpZDFINvr0tNju0qtuGbQZDbaFp8qFY8P28BS60SW8SjVcL7QJT6IrLmd977LB9uKT144d+yxvH8N23zhnVX7Yhx/tGxwTzrfx4afy7eP8Mmmw7e3d7+XXGn36HlPsnimqqZpiIM1A9Ff6pPWhSzt0EWzrd8hEsK18/QXbymkn2FZOt9HgV9f/FmxrxxzCXE+dMICBuhnwnRtjdTPm/BhrIwO8Ebdt5NY943YWDPBHnM2CM9fAWZMYiL5nnVZgqmDbiztfzwNR6+/YOLhC8Ylp8almjz+1PW936NDhQbtigCw88Wx0ieG0pZ7Y9vS2HaOHDf6+/JrPg3QhFFZc4hPSQjgtdc3QtnhfxWNjaO38i64pbh76PbyWNFwz/Hz22Wf5vg13bsq3v//B3sH2Z59/ebAtnLcY8jtw4OPB/oce25ofs3ffgfxc4/wyabDto73782uFV7IWlyYNZvfiwwUDGKiTgeh9dV7DuTFcNwOCbf0OmQi2la+/YFs57QTbyunW9SDbaP8E28x/6p7/OD/GMNAOBnznbkedjCd1wsBsGeCNs9Ub3/TGQHsY4I/tqZVxpVYYqIaB6HvWaQWmCraF8FYMfMVA1Lpb7xuEpMITzuLy8aFP8uDU5qefH2wOTzULAbDw2tLUMk2wLbwCNQbMiuf+zzN+NNj+L//x3eLmod9Twbbi/V994+1D7Yt/rLvt/vy67+75IN/12hu78+2bHtky2B6v86NLbhiE7ELYLtzzb3//+etIV1+3YfD36NPf8pMu8cukwbbNW5/P7y+E/4oLI6rGiOhIRww0n4HofWrV/Fqp0eI1Emzrd8hEsK18/QXbymkn2FZOt9HgV9f/Fmxb/HPbnIY2GMBAnxjwnRvvfeJdX/E+LgO8ESvjsqIdVvrGAH/EfN+Y11/MR9+zTiswVbAtnDK8ujIEskIwKwTdYiDtul/eOXTF8OS10O6q62/LwlPJYvAshMFSSzxPmSe23bjunvz8xXOHV4GG6644++Li5qHfY+As9Ccu7+z5ID9feP3qYkvxVaev7HprqFl83Wh4qltY/vofTh+cM4TKwrLy4usGf4d+h+XfT//h4O9wP5Mukwbbblp/b96/0XowUSaKAQz0hYHotX3pr352c2wLtvU7ZCLYVr7+gm3ltBNsK6db14Nso/0TbOvmnMNcUl0xgIFJGfCdGzOTMqM9ZvrAAG/EeR8410ecl2GAP+KmDDeOwU2bGYi+Z51WYOpg2x2/ezgPRb3w8muDgFsIj2195oWhK16x5vMnkJ28YuXQqzdffuWNoXbxjzqCbTEsNukT28JrS2MQb6mgWfFJcfGVorE/4RWm4RzhCWzxda3FV6IWw2jh1aPxemH7pEvxXOE1p0stoW//+t8/yK9XfNJcOK7Ng9+9+/DCAAYmYSB65STHaIuxpjEg2NbvkIlgW/n6C7aV006wrZxuo8Gvrv8t2Ga+1LT5kvvBJAbmw4Dv3PPRHe90x0CzGeCNza6P8aM+GJgfA/xxftrjnvYYmA8D0fes0wpMHWx7e/d7eSgqhrdCKOvwkaNDVwyvII1hrYsuWzv4PbzGNDzlLbXUEWz73o8/D5eFQNno/cV7SD2xLeyLT3sLrzNdbAmvYw19DP0aXe7d+Hje/5+uvmnw+4Wrrs+bhYBZuK9wfHiqW9QqvAZ10mWSYNsFF1+bXyv1WljGNR/jojvdMTB7BqLX0n722tO8Os0F2/odMhFsK19/wbZy2gm2ldOt60G20f4JtlX3OW/OREsMYKDNDPjOjd828+ve8VsXA7wRW3Wx5bzYajsD/BHDbWfY/WN4Ugai71mnFZg62BZOG1+rGcNY4cloo8uhQ4fzAFVsF15juthSR7AtvEY0Xvvu+x894dLhHsNrSkOb4qtIQ8OVP12TH7vjxV0nHPv6m+/k+0PAbXTZ8/5H+f54D08+/dxQs/g60rg/PN2uzDJOsO3DvfuzGLCL/R19Wlu49qQDTnsmjQEMtJWB6LdtvX/3bewFBgTb+h0yEWwrX3/BtnLaCbaV0200+NX1vwXbzFHMUzGAAQwEBuKCBzxgAAMYOM4AbzyuBS5ogQEMFBngj3go8uB3PPSBgeh71mkFKgm2/XDVDUOhrf+78TfJqxVfeRnCVLff/WCyXdhYR7DtwMFD+VPR4vV3v/t+tnPXm9mtd23KvviVFXk/RoNtIfQVA2fhiWzhVavxaXPPv/DqULjvtTd2J/v15a+elZ8jnD88pa24FANp4VpX33h7cffYvxfPc+0v7sg2b30+C9vuuf+P2U3r781CoDD2JazDvTz+1Pbk+ftgEvrowxADGAgMxAUPeGgzA4Jt/Q6ZCLaVr79gWzntBNvK6db1INto/wTbzK3aPLdy7/jFQHUM+M5dnZa4pCUGusMAb+xOLY1LtcRAtQzwx2r1xCc9MdB8BqLvWacVqCTYtvGhzUNBqae37Uhe7eo
3OXuzl+r9HK0kY8Gbs0JadGmvvJr2+O815/6+wpebVBI7W5o7OZoU1GPY0A12uZ+eAzSd3HxjunIbV8Vrav7pam33r+wleS43x17JuuyTVanR1P5/HI/EtTyNY9AFB/vrRgAAayMmBBMmt60sEWDMBAWxggPsJ6W1jnPGE9DwPERnjJwwtp4QUGmIoUBogDMNAuBjC21cuAgbGtXHthbMuvH8a2/JrlNTWRHo0xtrWr79WEvrY9a2TtV6BSY5v/EGFvHf/uVGxkW7dha3T06+7mr25nIcPY+ve3xWY2TVPatKUJQYBz4IsLBmAgCwMWv7OkJQ1MwQAMtIkB4iO8t4l3zhXeszJAbISVrKyQDlZg4CIDxE2uBa4FGGgTAxjb6mWowNhWrr0wtuXXD2Nbfs0wqqFZXgYwttH3rFvf0+6ZWfsVaL2xzS8LW02Bul3w1JcvKRiAgaIMEPdgpyg75IOdpjNAfITxpjPO+cF4EQaIjXBThBvywE2bGSBuwn+b+efc28c/xrZ6GTAwtpVrL4xt+fXD2JZfs7ymJtKjMca29vW/6t7ntntm1n4FMLb5dWHrpAJ1DwDUny8tGICBrAxY4M+annSwBQMw0BYGiI+w3hbWOU9Yz8MAsRFe8vBCWniBAaYihQHiAAy0iwGMbfUyVGBsK9deGNvy64exLb9mGNXQLC8DGNva1fdqQl/bnjWy9iuAsc2vC1snFWhCEOAc+OKCARjIwoAF/ixpSQNTMAADbWKA+AjvbeKdc4X3rAwQG2ElKyukgxUYuMgAcZNrgWsBBtrEAMa2ehkwMLaVay+Mbfn1w9iWX7O8pibSozHGNvqedet72j0za78CGNv8urB1UoG6XfDUly8pGICBogxY4C+an3ywBwMw0FQGiI+w3VS2OS/YLsMAsRF+yvBDXvhpIwPETbhvI/ecc3u5x9hWL0MFxrZy7YWxLb9+GNvya4ZRDc3yMoCxrb39sLr2we2embVfAYxtfl3YOqlAXS986s2XFQzAQF4GLPDnzUd6WIMBGGg6A8RHGG8645wfjBdhgNgIN0W4IQ/ctJkB4ib8t5l/zr19/GNsq5cBA2NbufbC2JZfP4xt+TXLa2oiPRpjbGtf/6vufW67Z2btVwBjm18Xtk4qUPcAQP350oIBGMjKgAX+rOlJB1swAANtYYD4COttYZ3zhPU8DBAb4SUPL6SFFxgYt7AZoQXXAwzAQBsYwNhWL0MFxrZy7YWxLb9+GNvya4ZRDc3yMoCxjT5n3fqcyU0zb7wKYGzzysJGU6BuFzz15UsKBmCgKAPEPdgpyg75YKfpDBAfYbzpjHN+MF6EAWIj3BThhjxw02YGiJvw32b+Off28Y+xrV4GDIxt5doLY1t+/TC25dcsr6mJ9GiMsa19/a+697ntnpm1XwGMbX5d2DqpQN0DAPXnSwsGYCArAxb4s6YnHWzBAAy0hQHiI6y3hXXOE9bzMEBshJc8vJAWXmCAEdtggDgAA+1iQMa2P0w/MfTXx7vHoxkPnhz6cUdxrlUe04xtVZbZprLM2Namcy57rmZsK1rOktfGolXrxrjWRxBni7YZ+Yb/nYixrV19ryb0te1ZI2u/Ahjb/LqwdVKBJgQBzoEvLhiAgSwMWODPkpY0MAUDMNAmBoiP8N4m3jlXeM/KALERVrKyQjpYgYGLDBA3uRa4FmCgTQzsOzQeZXkdOPxdpFeWtFnS7Pks23GzlNWmNDK2tel8qz7XTz9Hv7ya7j88HpXh7sCRcvnz1ncU6auOj6M4B445+tiga6VN/Q/Otd7tbffMrP0KYGzz68LWSQUIgPUOgLQf7QcD2RmwwI9m2TVDK7SCgXYwQHxsRztzPdPOMJCPAWJjPr3gC71gAAaImzBAHIABGJjKALFxqiZwgiYwAANiwBZ4gAcYgIG2MGBxj7VfAYxtfl3YOqlAWwIF58mXIgzAgAV+WIAFGIABGOhkgPjYqQd8oAcMwIAYsAUe4AEGYAAGsjFA3MymEzyhEwy0iwFiY7vam+ub9oaB7AwQH7NrBVdoBQPNYMDiHmu/Ahjb/LqwdVIBAmEzAiHtSDvCQH8GLPCjVX+t0AiNYKBdDBAf29XeXN+0NwxkY4DYmE0neEInGIABY4C4CQvGAmtYgIFLDBAbL2kBF2gBAzDgMkB8hAeXB97DQxsYsLjH2q8Axja/LmydVKANQYJz5MsQBmBADNgCD/AAAzAAA50MEB879YAP9IABGBADtsADPMAADMBANgaIm9l0gid0goF2MUBsbFd7c33T3jCQnQHiY3at4AqtYKAZDFjcY+1XAGObXxe2TipAIGxGIKQdaUcY6M+ABX606q8VGqERDLSLAeJju9qb65v2hoFsDBAbs+kET+gEAzBgDBA3YcFYYA0LMHCJAWLjJS3gAi1gAAZcBoiP8ODywHt4aAMDFvdY+xXA2ObXha2TCrQhSHCOfBnCAAyIAVvgAR5gAAZgoJMB4mOnHvCBHjAAA2LAFniABxiAARjIxgBxM5tO8IROMNAuBoiN7Wpvrm/aGwayM0B8zK4VXKEVDDSDAYt7rP0KYGzz68LWSQUIhM0IhLQj7QgD/RmwwI9W/bVCIzSCgXYxQHxsV3tzfdPeMJCNAWJjNp3gCZ1gAAaMAeImLBgLrGEBBi4xQGy8pAVcoAUMwIDLAPERHlweeA8PbWDA4h5rvwIY2/y6sHVSgTYECc6RL0MYgAExYAs8wAMMwAAMdDJAfOzUAz7QAwZgQAzYAg/wAAMwAAPZGCBuZtMJntAJBtrFALGxXe3N9U17w0B2Bk6On430QrPsmqEVWsFAvRmwfiFrvwIY2/y6sHVSAQJgvQMg7Uf7wUB2Bizwo1l2zdAKrWCgHQwQH9vRzlzPtDMM5GOA2JhPL/hCLxiAAeImDBAHYAAGpjJAbJyqCZxUo8neg+PRgsUnh/LasHV4xxrWOQ36OCveGovWrB8r3D4r3xmLVq0rnn/Q50f5w7n20BmdYSB8BvR9nLVvY/1C1n4FMLb5dWHrpAJZLzTSZQ9KaIVWMBAmAxb4aZ8w24d2oV1gYHQMEB9Hpz3coz0MhMsAsTHctuG6oW1gIEwGiJthtgvXC+0CA6NlgNg4Wv2bzL9+SL95xsno51edGPhLxrbHFg7nWMM4n2EcY9Hyi8a2osda8tpFY1vR/OQb/HWBxmgMAzDwl3tORhjbrLdbfo2xrbyGjS6hyR17zo2bRhiAAZcBC+buNt7DCAzAAAww3R4MEAdgAAZ8DNB3hAsfF2yDCxjozgBxs7s2cIM2MNBeBoiN7W37QV/3GNvCNlRgbAu7fTAk0T4wAANVMICxzXq61awxtlWjY2NLGXTnmvK5cYMBGAiFAQvkodSHenBtwAAMhMIA8REWQ2GResBiSAwQG+ExJB6pCzzWgQHiJpzWgVPqCKfDZoDYCHODYg5jW9imDIxtYbdPFYYWyqCNYQAGMLZZT7eaNca2anRsbCmD6lRTLjdsMAADoTFggTy0elEfrhUYgIFRM0B8hMFRM8jxYTBEBoiNcBkil9QJLkNmgLgJnyHzSd3g
c1QMEBthb1DsYWwL21CBsS3s9sGQRPvAAAxUwQDGNuvpVrPG2FaNjo0tZVCdasrlhg0GYCA0BiyQh1Yv6sO1AgMwMGoGiI8wOGoGOT4MhsgAsREuQ+SSOsFlyAwQN+EzZD6pG3yOigFiI+wNij2MbWGbMjC2hd0+VRhaKIM2hgEYwNhmPd1q1hjbqtGxsaUMqlNNudywwQAMhMaABfLQ6kV9uFZgAAZGzQDxEQZHzSDHh8EQGSA2wmWIXFInuAyZAeImfIbMJ3WDz1ExQGyEvUGxh7EtbEMFxraw2wdDEu0DAzBQBQMY26ynW80aY1sGHTdu3h7Nmb80enrRigypm5VkUJ1qyuWGDQZgIDQGLHqHVi/qw7UCAzAwagaIjzA4agY5PgyGyACxES5D5JI6wWXIDBA34TNkPqkbfI6KAWIj7A2KPYxtYZsyMLaF3T5VGFoogzaGARjA2GY93WrWlRnbVqxeH027bkb8+vzIUW/tJiYmoiuvnxmnuWvWAm8abdzy4c6krP2fHU7SXX3jfdEPfnpt9PNp05Ntw3hz/8PPRv/0L/8T/fO//WIYhxvIMc6ePZdo+tJrb2c+xqA61ZTLDRsMwEBoDFhgDK1e1IdrBQZgYNQMEB9hcNQMcnwYDJEBYiNchsgldYLLkBkgbsJnyHxSN/gcFQPERtgbFHsY28I2VGBsC7t9MCTRPjAAA1UwgLHNerrVrCsztq3bsDU2f8kAtnT5Wm/tduzan6RRulOnznjT3fvQwiTd+HenkjQ/uuwPIzGYNcHYJq2luV6zHnku0bTfm0F1qimXGzYYgIHQGLB4GFq9qA/XCgzAwKgZID7C4KgZ5PgwGCIDxEa4DJFL6gSXITNA3ITPkPmkbvA5KgaIjbA3KPYwtoVtysDYFnb7VGFooQzaGAZgAGOb9XSrWVdmbBsbP5UYp/5020Pe2j2z+PUkjQxWGzZt86b7r/+9MU532RV/6divUd5++8d7ohtu95ffkbjCDxjbuLkY1M0F5cIWDITDgH1t0CbhtAltQVvAQBgMEB/DaAeuB9oBBsJigNgYVntwfdAeMBA+A8TN8NuI64g2goHhM0BsHL7mbeEcY1vYhgqMbWG3D4Yk2gcGYKAKBjC2WU+3mnVlxjZVR0Y0Gdb+9T/+z1s7TUNqo4Zp/eC8xVPSnTg5nqR5aN4LU/aPYgPGNm4u2nKzw3nCepsZsO+XNmvAuRMDYAAGfAwQH+HCxwXb4KLtDBAbuQbafg1w/lwDeRkgbsJMXmZIDzNtYIDYCOeD4hxjW9imDIxtYbdPFYYWyqCNYQAGMLZZT7eadaXGtjnzlyamtP2fHe6o4cTERPTP//aLeL8Z4NIjsinD2+s/SMrYtHVHRxkfbNsVrVi9Pnpv88cd22WG03a9jn97Mjp37nz05jubogfmLop+f8O90d9mPRm9vmZDdOHChY586Q9fHD0WLX5pTfTnv86Nrr7xvuju2U9Ha97ZFK9lxFP9uy2fHfoiznv7zMfjvDLtrX5rY1yXdJ7tOz9N6pvep8/vf/BJvH/vp4em7D52/ESS99sTY1P2d9vAVKTcIA3qBolyYaspDFj8bMr5cB5cmzAAA1UxQHyEpapYohxYahIDxEZ4bhLPnAs8D4MB4iacDYMzjgFndWOA2Aizg2IWY1vYhgqMbWG3D4Yk2gcGYKAKBjC2WU+3mnWlxrYtH+5MTGkvvLymo4Y7dh9I9q1dtzl5L8OVu9z38MJk35mz59xdsWFMBrO0Ie7jHfuSPHMXvBj94KfXJp/dEeI0jen58+c7yrQPK998LzHeuXnc992MbYuWrfYeT3k1reqBQ1/YYeL1gmdfSdLLEOcuqp8ZAFXf9LJk+ZtJ3i+PHkvv7voZYxs3SIO6QaJc2GoKAxZAm3I+nAfXJgzAQFUMEB9hqSqWKAeWmsQAsRGem8Qz5wLPw2CAuAlnw+CMY8BZ3RggNsLsoJjF2Ba2KQNjW9jtU4WhhTJoYxiAAYxt1tOtZl2pse3s2XOJ6er6W2d31PCZxa/H+6649m/R2PipJF169DUZwWQIu/yauzry64NGUdO+XsY214j282nTI73cbctee3tKuavWbuxIozzT754XXXfLA/G0qpbfZ2x76vnXkryagvXWGY/GU6zqPN18GknOFo3YZvteXvGObY7XGqXO9ml9cmy8Y//Nd86J9//7f1/dsb3fB4xt3CAN6gaJcmGrKQxYHG3K+XAeXJswAANVMUB8hKWqWKIcWGoSA8RGeG4Sz5wLPA+DAeImnA2DM44BZ3VjgNgIs4NiFmNb2IYKjG1htw+GJNoHBmCgCgYwtllPt5p1pcY2VUmjjMmQJZOXO/WnpgTV9nlPvxzX/GeX3xJ/fmjeC8mZyMRlpq7Hn7mYLtmZw9g2f+Er0ekzZ5Os7mhxMq25y+nTZ6Lv/XBaUmdNd+ouGkHNzGRpY5vMaja6mgx5R7/+xs0aTxlq53PXrAXJPnda1ukzHku2642mTbU8Wr+y8h8d+2Vo03a3vI4EXT5gbOMGaVA3SJQLW01hwMJnU86H8+DahAEYqIoB4iMsVcUS5cBSkxggNsJzk3jmXOB5GAwQN+FsGJxxDDirGwPERpgdFLMY28I2ZWBsC7t9qjC0UAZtDAMwgLHNerrVrCs3ttnIbDJf7d1/KK6la+Taum13vG32o4tig5ZrNFu3YWti6tr2yd4pZ5hlxDaV4Vssrwx37rJ0+drkmOnR0yzd/Q8/G6dJG9vmPfVSklfTsPqWa266P0lz4uSl0deu/fOseLs78trExIVkhDgz2115/cyk2OPfnEjKWv32+8n2LG8wtnGDNKgbJMqFraYwYLG0KefDeXBtwgAMVMUA8RGWqmKJcmCpSQwQG+G5STxzLvA8DAaIm3A2DM44BpzVjQFiI8wOilmMbWEbKjC2hd0+GJJoHxiAgSoYwNhmPd1q1pUb23bt/SwxXy1atjqupTtimqYr1bL+/W1JOpmutLgGMo2Ull7MnNZrKtLNW3eks8WfH5h70Ugnw5272AhpMq35jqm0br3cvDfc/lB8DjKhdVveendLcp4f79iXJFv80ppk+9fHvo23f7R9T7xN5jvX5Dc29l283y3r2xNjSVlZ3mBs4wZpUDdIlAtbTWHAYmlTzofz4NqEARioigHiIyxVxRLlwFKTGCA2wnOTeOZc4HkYDBA34WwYnHEMOKsbA8RGmB0UsxjbwjZlYGwLu32qMLRQBm0MAzCAsc16utWsKze2afpRGbNkILvulgfiWi584fX48+XX3JXU+rtTpxNj18bN2+Ptms5T+TRtqW8pY2zTFKgqO21s+83Vd8bbfvLrm32HjLd1M7Ypj8r75e9u65p3554DyXFXrF6fpDtw8Eiyfc07mzqOc+e9T8QmO5vm9NU3Lk5HOvuxxXEe6ZR3wdjGDdKgbpAoF7aawoDF1aacD+fBtQkDMFAVA8RHWKqKJcqBpSYxQGyE5ybxzLnA8zAYIG7C2TA44xhwVjcGiI0wOyhmMbaFbajA2BZ2+2BIon1gAAaqYABjm/V0q1lXbmxTta6/dXZ
swJIxS0Y3M6Q99tRLHbXWyGsyhj38+JJIo5KZ8WzhkpUd6eyDlVNkxLb5C19JyrfytNZUoDrutOtmuJs73ncztn3/x1fFebsZ8VSIa2B74eU1HeXadKMzH3wm3m7lbdxy0eg3fcZjcfk6by2/uuqO+LPqk3fB2MYN0qBukCgXtprCgMXVppwP58G1CQMwUBUDxEdYqoolyoGlJjFAbITnJvHMucDzMBggbsLZMDjjGHBWNwaIjTA7KGYxtoVtysDYFnb7VGFooQzaGAZgAGOb9XSrWQ/E2LbstbdjA5YMYxqxzEYe2/Lhzo5aPzjv4ghkP582vWPqzT37Dnaksw+DMLaZWazIiG022luvEdS2btudaKGpRN3llrvmxvuU36ZrdadEfXfjR0leTT1qxj9tz7tgbOMGaVA3SJQLW01hwOJqU86H8+DahAEYqIoB4iMsVcUS5cBSkxggNsJzk3jmXOB5GAwQN+FsGJxxDDirGwPERpgdFLMY28I2VGBsC7t9MCTRPjAAA1UwgLHNerrVrAdibDt85KvEhGXmLZmyzpw911FrTUFqZq27Zi2I32saU43y5lsGYWz7818vmstkKEvXz+rQbcS2O2Y+EddZeWUc8y2Llq1KznHf/s87kmhqUjv/u2c/Hb+/febjSZrz588npkCN6mZpNY1r3gVjGzdIg7pBolzYagoDFlebcj6cB9cmDMBAVQwQH2GpKpYoB5aaxACxEZ6bxDPnAs/DYIC4CWfD4IxjwFndGCA2wuygmMXYFrYpA2Nb2O1ThaGFMmhjGIABjG3W061mPRBjm6pm02qaGUsjo6UX12xl6TSNabdlEMa2Zxa/nhjGlq9cN+XQqqOmKVX9ZGBzF9eYpnLSi4xppoMMe2dTxr6jX3+THNvO/73NH3cUY9OR2n6NbldkcbWe9chzmYsYVKeacrlhgwEYCI0BC4yh1Yv6cK3AAAyMmgHiIwyOmkGOD4MhMkBshMsQuaROcBkyA8RN+AyZT+oGn6NigNgIe4NiD2Nb2IYKjG1htw+GJNoHBmCgCgYwtllPt5r1wIxtNpqZGbIemf+it8a//N1tHeaupcvXetNp4yCMbWPjp5JR0VRXHf/Il19He/cfil54eU30vR9OS+qXNrZNTFyI/vMXf0r2K++5c+fj+su0dsW1f0v2vfTa297z+sFPr03SqHyZ4dzFnY5U9Zszf6m7O/N719j22z/eE61+a2PP1zffnozLHlSnmnK5YYMBGAiNAQuoodWL+nCtwAAMjJoB4iMMjppBjg+DITJAbITLELmkTnAZMgPETfgMmU/qBp+jYoDYCHuDYg9jW9imDIxtYbdPFYYWyqCNYQAGMLZZT7ea9cCMbTJOmalN681bd3hrPOeJJR3pDn7+pTedNg7C2KZy3VHb3Dqn36eNbcr70fY9HcY45dHobG5eGclkgvMtNgWp0rvTkFpadzpSpdny4U7blWvtGtvcunV7v3HL9rj8QXWqKZcbNhiAgdAYsKAaWr2oD9cKDMDAqBkgPsLgqBnk+DAYIgPERrgMkUvqBJchM0DchM+Q+aRu8DkqBoiNsDco9jC2hW2owNgWdvtgSKJ9YAAGqmAAY5v1dKtZD8zYphG/XNPU6TNnvTXetHVHkk6jo/Varrnp/jhtejrO7Ts/TcroZvxa8NyrSRrfMTSimltfvZeRbcnyN6P7H3423ifDmm/54uixSFOtpvPrs47bzdSmst56d0uSb/3723zFR+50pOnpTL0ZPBtPnz6THMdXz/Q2tYuWQXWqKZcbNhiAgdAYsNAZWr2oD9cKDMDAqBkgPsLgqBnk+DAYIgPERrgMkUvqBJchM0DchM+Q+aRu8DkqBoiNsDco9jC2hW3KwNgWdvtUYWihDNoYBmAAY5v1dKtZD8zYVk31hluKphHdsWt/tOadTdGBg0d6GtJ8NdO0ph9s2xW9meSf8CWr1bZBdaoplxs2GICB0Biw4BxavagP1woMwMCoGSA+wuCoGeT4MBgiA8RGuAyRS+oElyEzQNyEz5D5pG7wOSoGiI2wNyj2MLaFbajA2BZ2+2BIon1gAAaqYABjm/V0q1ljbKtGx8aWMqhONeVywwYDMBAaAxbIQ6sX9eFagQEYGDUDxEcYHDWDHB8GQ2SA2AiXIXJJneAyZAaIm/AZMp/UDT5HxQCxEfYGxR7GtrBNGRjbwm6fKgwtlEEbwwAMYGyznm41a4xt1ejY2FIG1ammXG7YYAAGQmPAAnlo9aI+XCswAAOjZoD4CIOjZpDjw2CIDBAb4TJELqkTXIbMAHETPkPmk7rB56gYIDbC3qDYw9gWtqECY1vY7YMhifaBARioggGMbdbTrWaNsa0aHRtbyqA61ZTLDRsMwEBoDFggD61e1IdrBQZgYNQMEB9hcNQMcnwYDJEBYiNchsgldYLLkBkgbsJnyHxSN/gcFQPERtgbFHsY28I2ZWBsC7t9qjC0UAZtDAMwgLHNerrVrDG2VaNjY0sZVKeacrlhgwEYCI0BC+Sh1Yv6cK3AAAyMmgHiIwyOmkGOD4MhMkBshMsQuaROcBkyA8RN+AyZT+oGn6NigNgIe4NiD2Nb2IYKjG1htw+GJNoHBmCgCgYwtllPt5o1xrZqdGxsKYPqVFMuN2wwAAOhMWCBPLR6UR+uFRiAgVEzQHyEwVEzyPFhMEQGiI1wGSKX1AkuQ2aAuAmfIfNJ3eBzVAwQG2FvUOxhbAvblIGxLez2qcLQQhm0MQzAAMY26+lWs8bYVo2OjS1lUJ1qyuWGDQZgIDQGLJCHVi/qw7UCAzAwagaIjzA4avacEn4AACAASURBVAY5PgyGyACxES5D5JI6wWXIDBA34TNkPqkbfI6KAWIj7A2KPYxtYRsqMLaF3T4YkmgfGICBKhjA2GY93WrWGNuq0bGxpQyqU0253LDBAAyExoAF8tDqRX24VmAABkbNAPERBkfNIMeHwRAZIDbCZYhcUie4DJkB4iZ8hswndYPPUTFAbIS9QbGHsS1sUwbGtrDbpwpDC2XQxjAAAxjbrKdbzRpjWzU6NraUQXWqKZcbNhiAgdAYsEAeWr2oD9cKDMDAqBkgPsLgqBnk+DAYIgPERrgMkUvqBJchM0DchM+Q+aRu8DkqBoiNsDco9mRs+2TfcF7DPNawzmnQx9nz2Xj0yd7i7bNH7Vsi/6DPr4ryd346HulVRVmUgY4wAAOjYkDfkVm/661fyNqvAMY2vy5snVQg64VGuuxBCa3QCgbCZMACP+0TZvvQLrQLDIyOAeLj6LSHe7SHgXAZIDaG2zZcN7QNDITJAHEzzHbheqFdYGC0DBAbR6s//KM/DITLwLnzE5FetFG4bUTb0DYwUC0D1i9k7VcAY5tfF7ZOKkBAqjYgoSd6wkC4DFjgp43CbSPahraBgdEwQHwcje7wju4wEDYDxMaw24frh/aBgfAYIG6G1yZcJ7QJDIyeAWLj6NuA64A2gIEwGSA+htkuXC+0CwwMjgGLe6z9CmBs8+vC1kkFCE6DC05oi7YwEBYDFvhpl7DahfagPWBg9AwQH0ffBlwHtA
EMhMcAsTG8NuE6oU1gIGwGiJthtw/XD+0DA6NhgNg4Gt3hHd1hIHwGiI/htxHXEW0EA9UyYHGPtV8BjG1+Xdg6qQABqdqAhJ7oCQPhMmCBnzYKt41oG9oGBkbDAPFxNLrDO7rDQNgMEBvDbh+uH9oHBsJjgLgZXptwndAmMDB6BoiNo28DrgPaAAbCZID4GGa7cL3QLjAwOAYs7rH2K4Cxza8LW1EABVAABVAABVAABVAABVAABVAABVAABVAABVAABVAABVAABVAABVAABVAABVAABVAABVAABUakAMa2EQnPYVEABVAABVAABVAABVAABVAABVAABVAABVAABVAABVAABVAABVAABVAABVAABVAABVAABVAABfwKYGzz68JWFEABFEABFEABFEABFEABFEABFEABFEABFEABFEABFEABFEABFEABFEABFEABFEABFEABFECBESmAsW1EwnNYFEABFEABFEABFEABFEABFEABFEABFEABFEABFEABFEABFEABFEABFEABFEABFEABFEABFEABvwIY2/y6sBUFUAAFUAAFUAAFUAAFUAAFUAAFUAAFUAAFUAAFUAAFUAAFUAAFUAAFUAAFUAAFUAAFUAAFUGBECmBsG5HwHBYFUAAFUAAFUAAFUAAFUAAFUAAFUAAFUAAFUAAFUAAFUAAFUAAFUAAFUAAFUAAFUAAFUAAFUMCvAMY2vy5sRQEUQAEUQAEUQAEUQAEUQAEUQAEUQAEUQAEUQAEUQAEUQAEUQAEUQAEUQAEUQAEUQAEUQAEUGJECGNtGJHxdDnvu3PlobOy7wtU9c/ZcdP78+UL5T46NF85b6IBkQgEUQIFJBcrGvomJiej0mbOF9Dx16kx0+vSZQnnJhAIogALDUqBsrFL+IsvExIXoxMnx6MKFC0WykwcFUKCFCny8Y180Z/7SaO6CFyP10couZe9Ti8a/sv3TsudNfhRAgdEpoH6PYkCTlrIxjeeNTaKBc0EBFHAVKHvPyzNJV03eowAK1F2BuvYZyz43rXu7UX8UQIFyCtS1P1g2ZvdTDWNbP4Vy7F///rZo2nUzprxunfFojlJGm/TLo8eihUtWRr+5+s7oez+cFv3Tv/xP/Prnf/tF9LPLb4lefeMfkS6mXsuRL7+Obp/5ePSfv/hTkv/n06ZHD8xdFHV7iC/z29p1myNppXw6nh37+z++Kt5++Iuveh22Y9+hw0fj+l52xV8wx3UowwcUKKfA2bPnOmLckuVv9ixQD+BvumNOkmfWI8/1TD+qnVXEPhnZHpy3OFK8s/ileDb97nnRwc+/7Hpqu/Z+FudTjP3X//i/JK/e//aP90Rbt+3umje9Q3rre0hl7di1P72bzyiAAgNQYMfuA0mM0/X30fY9PY9y9OtvOtKvWL2+Z/pQdlYRqzZv3RFdf+vsSH07xUn19xTnFi1b3fU0ZSBZunxtdPWN90X//t9XJzFS+X902R9io8p3p053zZ/esfrt9+MYeee9T6R38RkFUGDICqTjoXsvffzbk5XVRve31jeTESLPUtV9quKY4p3d5yoOKh6+t/njrtWpon/qFj77scVx/FNdWFAABcJXQH1M3Uurv2MxTDFE95hXXj8zWvjC69HhI9mfk436jKuIaTxvHHUrcnwUGJ0CL694p+M++pWV/+hamQ2bLv1Go+eYoS9V3PPyTDL0VqZ+KDAcBfiN+qLOo+gzVvHcdDiUcBQUqLcCTf2Nuq79wSru8/MQibEtj1p90uoBsT1sctd6aF2XxTWjuefgvv/VVXd0NYvpy9s1Zrj59F5mi29PjE2R4613t3i1S+d/d+NHU/L6Ntw9++mkvDrcwPrOgW0oEKICMqe616UerPcyFOiaddPL9BXiUjb2Ka65hjb3nPVecVE/TKQXdVbSaX2fH5r3Qjqr9/OadzYl5fX6odSbmY0ogAKFFEjHuSuu/VvPcv4+9/nkOtX1/vDjS3qmD2FnFbFKBj5ffLNtd8x8wjuS0swHn+mZT/kVY3UT1W/Rd5j6osqjPz+woAAKjFaBAwePdL2+8/ypqd9ZlDG2lb1P1Z/CZvz9qa7nqXi0fOU67ymU7Z+6hepPFmaq0x8xWFAABcJVQIZaGdqsj9RvXRezatmYxvPGcJmlZigwDAX0h3k3HqpfM/7dKe+hl732dpK225/svRlHtLHsPS/PJEfUcBwWBQJUgN+oo2gUfcYqnpsGiBNVQoEgFWjqb9R17Q+Wvc/PCxnGtryK9Uh/4NAXkW6c7PWDn14b30TV0dgmEB+Z/2K0au3GSD9E6t/k9iBcN5E+k4UMZK6pTf8I37h5ezwS2w23P5TcUPp+8HV/MNCobRq5Y92GrXEd3ItCdfj62LdTWkGB7KXX3o5H7dAPle6NLsa2KXKxAQUKK5DuNOhae3rRiq7lyQjrXo+hG9uKxD6dvEYSsvNUvHzznU3R+x98EsdK2674lX6g5t70aBSPp55/LVI8fP7FVdHl19yVlKkyuhnVNPqQ8rl16JW+a2OxAwVQoJACaWObrr/tOz/1lqXp3d3+lNLWzdhWJFbJ2Ktz1Ut9xWcWvx6PRikzh9tve2Lh8im62U2d8t338MJI/9RXnPzbrCc7Rhfu9v3y+ZGjkUYXVV4bKU71wNg2RWo2oMDQFdCfI+zeWetrbro/iRUhGtuK3KdqRCWLfz/59c3xPatG45XZzr139o32affBRfunH368O+5Tqt7udw/GtqGjzgFRIJcCGlXW4obW6nup76RZDuYvfCW6+c45HaPYal8dljIxjeeNdWhh6ogCg1UgbWxTfNRvF75F/UqLo+nncL70o95W9p7XfR7IM8lRtybHR4HRKsBv1KP5jbqK33hGSw5HR4H6KNDU36jr2h8sc59fhDqMbUVUy5hHD590E1UnY5umCZShTNPZpRf9MGgPxDUVVHqRCc5uGtMP1lTeH2+dnexP/1Cx7ZO9kYxwx46fSBcbf9aPl1a2b8oulWf702uMbV5J2YgChRTwdRr0o5xvWidd1+nrsZvxoFBlKsxUJvZpGi07Tz1MSk/XLKOu7dd0zu4iPfVj4559B93NyXuZ1iyvpnj2La5Rw9Jq3c0I5yuDbSiAAsUV8BnbZM7wLQuefSW5pu16rYOxrWysumvWguS80+YNPfyxKUYVz9J90BdffSt64eU1kaZWSS/q4/3X/96YlO0btU1T1JjW7hpjW1pNPqPA6BWQUd+u0/T9YpnalRmxrex9qvtnt29S06tq2ng73+kzHptyimX6pypMfUwr311jbJsiNRtQIBgFNG2SXa96QHzi5Li3bhMTE/Foj9/74bTY9OZNFNjGMjGN542BNSbVQYERKGDGNj2DtFG49TuFb/r6uhnbytzz8kxyBDBySBSokQL8Rn2psQb5G3XZ56aXask7FECBfgo09TfquvYHy9zn92tr336MbT5VKtrWq9Ogh1VvrH0v/meRfuzTv2lkWtCPmxqFIm2MsCpt+XBnPILaxi3b402avuW5pW/E0xSojEefXBZV+SOAHdfW1/55VvKQLT3ct0Zi0wM43VSmf5RU/
v2fHU7ydvtHlR0nvZbpwx7u6SJJL2Pjp2LtpJ9e7pSAGNvSavEZBYor4HYa7n1oYXJdynSQXmzUjWnXzYjMte0zthWNh4p/Mrq6ZtfTp8/EowBp2/r3t6WrVPhzr9j3+DMvJzrs3jvVoKZ4aCNyaAS7vIsZijXKh2/RSCAW+9wpazC2+dRiGwpUr4BrbLv/4WeTeLB3/6GOg7kjTbjTker6TS8aNl8P42U+mH73vLifqP7iY0+95B0NTrHZ4qGMEr0W1Vdp9UeGKpdusUpx2fZ1M/xpRDXr523euiNXtTTCpeX1nZPissVIrfUDsNJjbMslM4lRYCgK9DK2WezSKGS+RVP3WRw8dPhoR5IyxraOgjwfet2nqq4Wn3TP7lt0D29pNIJdnqVX/1TlaHQnN/7ZcTC25VGZtCgwXAUUx+xa1ft+i/5gpinoui0fbNsVj7CuUd4UMzT7gp45+haZ6CyOyiiimKTnlhr1Vn/gUuxQLB7U0ium8bxxUKpTLgrURwHX2KZ7RouVGt0ivWQxtil26k9QerZ55fUzo3tmPx2PDn78m6l/utcfHSw+njt3Pn245PNXx75J0vn+dJUkzPmm1z0vzyRziklyFGiZAvxG3dngg/yNuvNIUz/Zs9Fuv/FMzcEWFEABnwJt/Y065P6gr51sW6/7fEuTZ42xLY9aOdP26jSkp+ezmzFba8QKmTbSi5Wp0Slk8LL07lpfkBs2VWfocOvgPnhPj5xhddADr26LmTt+c/Wd3ZJ4t7sdjiwP4vXDr9UHY5tXUjaiQCEF3E6DHhTZl5Jiln5QtEXxy65BPUzvZWwrGg8XPPdqcgyZW/UQyo5pa6tP2XWv2GdxWbG3myn5T7ddmo65W5pudbS4mcWEIa3t3DG2dVOU7ShQrQKusW3vp4cSI6uMpu5iD9cVKw4fuTTSbNrYphhh13G3taYqlmHMFsVfixUaHcj3BwOl1ci4VqZvdCArr8jajp+OVZqW1Y4pc4lvcb8z1IfLs7imuCz9X/VBVZ90PfMck7QogAKDUaCXsc1GZ1SfzLfIkGGxZunytR1JBmls63Wf6p7Pzj0HOupkH1RXq7f6cXmWXv1TXzl2nCz30778bEMBFBi8Am6/xjdFe9YajI19F09Zatd9ei2jmEb5cZePd+xL4tHds59O/gyQzqsZFdx7f7eMMu97xTSrA88byyhMXhSotwKusU1noj/RWmzQH2bdxe69tV/PMdOLBg6wPzxZGbbWfW3axOv+kLjmnU3p4pLPVkeVlf6jRZKowBv3uyF9z8szyQKCkgUFWqSAxQj9dpNeiv4mY2XyG3Va0d6fuz037Z2LvSiAAmkF2vobdV36g+n26nWfn06b5TPGtiwqFUxjX/C9Og3ap9ErZvz9qXjEth9d9ofkpkzTMqXNY1am3WxprRsx3cypLNuu973+QVTklDTVgR1Dx3QXjd5mx54zf6m7q+O93XTqPPMsL732dlK+3vdbMLb1U4j9KFBMgXSnwTUs6J+OtmgESsUEG6Eti7Etbzx0jW3dHkhZfcqse8U+lWs/tP7yd7d1Pcz8hZemH/RNk9At48HPv0xin/5l32/B2NZPIfajQPUKuMa2zw59EY+KYX0ie5gts5pNt6mRMvTvcEvTy9im+HLD7Q/Fxl3FAHsIorzpmKDRcK3M9z/4xHui856+NMJkN5OFN2Ofjb1ilUZRs3pt6jEam6XRj6V5FulgeU3vXvkxtvVSh30oMFoFXCNYehRy62+FZmzrdZ/qjuKp/qRv0QggFsM0BX3WpV//1FeOHQdjm08dtqFAGArIbGHXqvqOXxw9lrtiMp3Z/bfKkhlM986KsRqVyMrX80V3cY1tlkZr3edqZAl329OLVrhZS7/vFdN43lhaXgpAgUYoYKYx3RNr2bH7QBKXNP26u/Qytrl9L8W16255IB7N0v1Dqra7fxY9OXbpDxTp2GnHdf9sJvNwlUuve17rI/NMskrFKQsFmqOA/Z6s313Sixnb8v4mY2W6fUN+o06r2/m513PTzpR8QgEU6KdAG3+jliZ16Q+67dfrPt9Nl+c9xrY8auVMa1/wvk6DfgTdt/9zb4mzH1uc3Jilp1qxMtVpkGFEN3G2CBD9+Gkdio2bL05XavvLrmVasbLT04G6X8zPLH6966F0s6gyNFpJ1kUjrmn0ETv2N9+e7JsVY1tfiUiAAoUUSHcaVIhGDtL1qetU5g0NvW/Xq00LZw/WzejmHrxoPHSNbTqebsZeXvFOHFsVk3oZKNzj93vfK/Ypr5nq9CNBt2XRstWJJunpCbvl0XY9nEtr2Ss9xrZe6rAPBQajQNrYph/f1M/RtXvHzCfig2o6OLuWZW7tZWxTBk1z8vWxb6dUWNNN2YMnlae+ny0adcOOof5WetGDdotXvlicTp/nc69Y5Zo+ek2Tapr56t6tLm7/M+tQ/hjbuqnJdhQYvQJ1M7b1u091Hzp1U1f9Qovd6i9mXfr1T33l2HEwtvnUYRsKhKGA+mvuH1513eo+U3+UUp9To+/2W55d8kYSV9QHTS/ufbT73NA1tqlfphEl3b7mR9v3JH1cGUuq/DNtr5jm9vd43phuTT6jQHsUSBvbdObuCBDu7yy9jG26F1ZsVZxTXHMX/fnL/kymWOzGwFvumpvEVt80o5rm2fpab/YY1c09Xpb3bgz03fPaPT7PJLOoSRoUaJ8C9nsyv1Ffavth/UZ96Yj5f+Nx8/IeBVCgU4E2/kZdp/6g21q97vPddHneY2zLo1bOtL06Db2K0g+ZdiOkoQXdxcrUjYxvmindgFne9BQsbjl532uKKPvBUeujXx3vKEIGvCzHdX/4zPoQTD8KW9n613uWBWNbFpVIgwL5FfB1GjZu2Z5coxppYvaji+LPMrNZnOplbOtVi17x0H0gL+OEHatXeXn39Yt9OqbFJ/27s9vifoFLryzL62s2JGV3i/npcjC2pRXhMwoMXoG0sU1H1ChsFhvUZ/rZ5bfEnzVlspZ+xrZetX5j7XtJ2ZoCz13cB/vpvpprrlu1dqObrdT7frHq8WcujRKnmNpt0UM2aZbVdKepWO27RfneXv9Bt6I7tmNs65CDDygQlAJ1M7b1u0/VaB2KTzayiE9sjUxn3xfpETx96bWtX/+0Wz47Dsa2bgqxHQXCUGD33oPJbAl23bprmRhksNjsGQlXfzSzZ3eaTtS3uKMKuXHHNba5hje3jEefXJbErG5p3PRZ3veLaTxvzKIiaVCg+Qr4jG0HDn2RxCTNiGNLN2Ob+8xs7oIXLXnH2jUH2591lUAmOIvFioXpRaNjar/6fYqzVSz97nl5JlmFypSBAs1WwH5P9hnbep15r99krMxuv1fwG3Wnsv2em3am5hMKoEA/Bdr2G3Xd+oPWfv3u8y1d3jXGtryK5UhvX/DdOg26+dC/iV54eU3057/OjfTgW8NH2z9tdDOkf2W6i5Wp0Tp8i3tBy9xVxaLRl2z6LNVp9VtTfwzVAzW7ues1Vej0u+cl6bIY29wfN6SNRirJsmBsy6ISaVAgvwJujNGDIlt0I6MYoHhnD9JX
vvme7U7MB90MC0XioWts0wP8qpcssU8Pqyz2aVSObotGX7J0WYxt7hQ00lOO/CyL+5DOnTYhS17SoAAKFFPAZ2zTqGx2zdvUIPps00llMbYpvujh+RMLl8f/RJchS/8atxir8jSNiru4sUNTk7qLja6pfmZVD9rd43WLVe4Uqb1iWR5jm+pvPxxIh7tmLXBPted7jG095WEnCoxUAffeL/SpSN26drtPtRE2FXe7LRrxw74vXINJt/RZ+qfd8tpxMLZ1U4jtKBCOAhoBWH1Atx9p17C71ogT3506nVTcNcvOeWJJtGHTNu/LZkVw/5zlGtt8pjkdRM8w7fju84CkAjnfZIlpPG/MKSrJUaChCviMbTrV22c+nsQlxTEt3YxtGgTAYtjhI195lXKnHdVzR3exP1bp3tV9DqlYZuVm6c+5ZXZ7n+WeV2nsuDyT7KYk21Gg3QrY78n8Rn2Jg2H9Rq0jZnlueqlmvEMBFMiiQJt+o65jf1BtmOU+P0tb+9JgbPOpUtG2Xp0GORXT0wvYjYi7zmtsU9Utv+/fQ3lP7fg3JzqmAU3f0Fl57sOt519cZZunrN2RRKbsTG1QOXYuMtbpQsi6YGzLqhTpUCCfAt06DRpm365XrXXNukP228Mfn7GtaDwcpLEta+yTemYyufbPs7qKqdE3TR93CmlfBvdfoMqjG6CsC8a2rEqRDgWqU8BnbFPpGiXDrnutNWqtLf2MbRp9zGKLW0b6fdrYpvKtf+n+U1wju1neeRX98SFrrFr80prk2Hv2HTQJpqxtypff33DvlH3uBn23uP1JGdx0k5d1wdiWVSnSocDwFXDNYiEb27Lep/7x1tlx/FM877a48bnXFHvKn6d/6juefQ9gbPOpwzYUCFcB9Rt1T6hRhGRcsD6TXdM2IrDOQEY2255l7f5pNouxbWz8VFK+jHNllqwxjeeNZVQmLwo0R4FuxjaNVG7xTvd6WroZ2/4+9/kkba97SLsXl2nOXTRtvB3L/TPpgmdfSbbbn9ncfHnf57nntbryTDKvyqRHgXYowG/UU9vZfaY4dW/nlqz3/p25Ln7K+tzUl5dtKIAC3RVoy2/Ude0PZr3P797CvfdgbOutT6m93ToNGorVbjp0M6TpqfRw+dU3/hG9/8En0a69nyU3Q6M0tunf4/YvTtWz1w+h3zgjk6Tr7Ip45fUz43OT6aXX8vSiFYkG+jfBkS+/7pV8yj6MbVMkYQMKVKJAt06D/qnoxosXX32r43jdjG1l4uGgjG15Yp9O0kwk066b0XHO7gc3pqn8bsumrTs6vh/cB2Xd8rjbMba5avAeBYajQDdjm/ou9tBba5l4bellbHMfwiufjFvqg2nEXF3j7n6fsc010mp6aC2uyU43F2WXPLFqjWN89tXX6mJapX88sP1anz17LtLIJJZW/Upty7NgbMujFmlRYLgK1MHY5vbp+t2nzvj7U0m86qbk9p2fJmk0wm+3JW//1FeOxU6MbT512IYC9VHg9Jmz0cIXXk9ih65tGznolZX/SLbrT2UyOvR6zZm/NDnxLMY2zaJgsaSMsS1PTON5Y9JEvEGBVivQzdgmUVzDmqZJdu+Z9RzTlukzHktimG3zrc1ALPODu4yNfZfkd/fZTDe6Py275L3n5ZlkWcXJjwLNVoDfqKe27zB+o87z3HRqDdmCAijQS4E2/EZd1/5gnvv8Xm3cax/Gtl7qlNynqUX1wCc9zOvsRxclN0F66ORb7EFR2iRmHRH3X5Xp/Ja3zIht+ue4OyWq/pHUa9FDNDturx8kzfjiG7VJ5Ws6wlmPPJeUJTOM5nPPu2Bsy6sY6VEgmwLdOg3K/fKKd+JrVw+A0kaDbsa2MvFwEMa2vLFP523T+/Uy7GqaPIuR3aZhds0fMj9/+PHubI3ipMLY5ojBWxQYkgLdjG06vKaa17UvM5a79DK26Q8PyqM44Bvhceu23Uk88RnFNB2V/YFCJi73c6/pSdz69XqfN1a5cambaePY8RPJOT007wXv4TUll5nSpM9Nd8zJNVKbFWplXHbFX2wTaxRAgUAU0LR71l/qNmLbNTfd763tiZPjSV5NM+UuC5esTPbJlFFkKXKfKlOynY9GE/Etq9ZuTNLoh1jfUqR/6ivH6oKxzacO21Cgfgq405TaDAduv0sjAOdZshjbNK28xRLd/xdZ8sY0njcWUZk8KNA8BXoZ23R/bffAP/n1zZH+bGuxyjW2ub8X6P7St7gx596HFk5J4prjdFw37q7bsHVK+jwbitzz8kwyj8KkRYH2KcBv1FPbfNC/Ued9bjq1hmxBARTopUDTf6Oua38w731+rzbutQ9jWy91Su6zHybTZgd7+KQv0G6L3XyNwtimKQ7sZlD1eH3Nhm7V7Nhu/xBKn68lcn+01Dzm6UVGGP1Iaef+y9/dFumfUEUW90Y1bbApUh55UAAFLirQq9OgYfw1AptvhMVuxrYy8bBqY1vR2Pe3WU8mcavbj5Z2w5Q2OhtXzy19IylDxsC9nx6yXbnW7gO1vKO95ToQiVEABRIFehnbTo6Nx3FRD7zdpZuxTemtH/TX+xe4WZL3/YxtSuiO0OaOGKSRgcosRWKV2/+7Y+YT3sNrNDo7b5/5Tf/2se8RpdOUWzKZFFkwthVRjTwoMBwFZj+2OIkF6f6k3VvrGvYtgzS2Fb1P1aiZFtu63VO7f37wTV1VtH/q08jqgrHNpw7bUCAMBQ4f+SrKasC94faHkhijKUK1uKOb6U9keZYsxrblK9clx9T0SnmXojGN5415lSY9CjRPgV7GNp3tvKdfTuKTGTnU93GNbeqPWX9Iz898i57HWZoXXl4zJYkbK/XniVtnPBqn1wABmjKq6FL0npdnkkUVJx8KtEMBu49O/2Zb5jeZYQy+Utc+Y5Hnpu0gkbNEgeoUaPJv1HXtDxaN2UWowNhWRLUMeeSotJug399wb0cOdSK0T+YFGUHSi0bnsLzDNrZpOlQ7tsxtm7fuSFev62d36hhBnF50LlZ2+uZRP/DKyGb7NZx31od56ePoM8Y2nypsQ4HyCvTqNPQq3QwJ6dEay8TDKo1tZWKfO4WUYk96COhCbAAADE9JREFU0YhKFtvSUzrroZdrOpFO6rwUXTC2FVWOfChQXIFexrZupXYztskca/Hij6lpT6ysZxZfmnrKN2Kb0ukfMlaOrfUwq+hSNlbZQy/1fY9/e7KjGjKo2X71Pd0fHpRQ/WLls/PQ+ZdZMLaVUY+8KDBYBdRPtGs9fZ+s+0Pbp2n40otMsba/yhHbytyn6n7W/jCm0TRsqkCru8q2+Kb96aVM/zRdlj6bPhjbfOqwDQXCUGDmg8/Esz7IfNHLxP/5kaNJfNEPk+5if6rSNb93f/c/TB04eCRy+5KuWcP3LPD06TORe/+uz3mWMjGN5415lCYtCjRTgX7GNt1HWr/K+jxau/eXMg/bvvTvNaaajbqudHv2HbTNHWt7xunOduN7HtiRqceHMve8PJPsISy7UKDlCvAb9fB+oy773LTlqHL6KJBLgab+Rl3X/mCZ+/xcDT+ZGGNbEdVSeda
u2xxpeFHdHOnHOj0Ysh/NdBOkBzDuoh8q7SZKD6127NofaYQOzbvt/mNbaYZpbHNvhHTsZ5e8EW3YtK3ry30ApvNzR+TQjaTMbZpyT1NguWD7Rqqzec11XD2UW/9+9+OqTukfRXV8/UPVXu6//ZXWtjN6m0si71EgvwJVdxrKxMOqjG1lY59UtH+QK4bpX+yKOfpBVtOJug/W0qY116CivBrZo1fc1Y8P6UVtYjHONdi8+c6mZLv7IC+dn88ogALlFHCvu88OfZGpsG7GNmU2E4Rigv4hrh8uNb3UW+9uidx/nmt/ui/mHtymJFE6vXwjobnpe70vG6sUj6wev7rqjvhHVv1Yq+nm73t4YbLvznunjuhm+bTW/l4xUiNVps0w+mwxUmszzmiKGnd7mX/X99KOfSiAAlMV2LX3s0jms337P48UD/Wjoe6L7XqX2TW9uKNwqP8oA6/MwIov6dhYpbGt7H2qO4KmztFG91Wfzn1msPLN9zpOuYr+qYx1bpwzfTWtlru948B8QAEUGKkCbixUX0V9ME1TfPybE/GfP9UvXPjC6x33mHom6S7un53Ur1y0bHXyDE3P7dR31Si6iglzF7yYZHWNbVffeF/8XE7H1Shw6mO597x6xpdnKRvTeN6YR23SokAzFehnbNNZK95Zf8fW6edhbt9MsVB9Ii1K9/e5zyf5b75zTlchdZ9u5dv66NffdE3fb4eVoXWRe143PvNMsp/a7EeBZirAb9QX23VUfcayz02bSSVnhQKDUUB9Nus7LXvt7cwHsT8mpAdfCeE3ap2EnVOd+oNl7/MzN56TEGObI0bRt9NnPNYBnAufTFrpH9j0IN9N0+v9MI1tMqL1qkt6n26a0suiZat6lqGHar5/froP9dPH8X2W+cNdNFWNL116m4bnZkEBFCiuQNWdhjLxsCpjWxWxTwY214ySjj36rBuc9PLok8syxS4rzxfDNL2p7e+29sXrdF34jAIoUEyBqo1tvofk3a7tXsY2PdSyfDLY6s8GRZeysUp9YXe6LKuXu1acSpt/VV83TZb36T8/vLLy0mjEvfKn+5ZFtSIfCqBAfwXcPz2lr0v1p3wmYf0RrF9fy8qq0thW9j5VBl57eGb1S5/Hdbc8MCVGV9E/tamx7Ljd1um42b8FSYECKDAoBVzDf7dr1t3ebYSgrH23bsY29xjp9xpBs9docj5tqohpPG/0Kcs2FGiPAlmMbbrnTT8jSxvb9KcKd2RLxTh35DV9Vhn2ZwSfwjLDubHxmpvu9yXLvM0tK8v7dN+NZ5KZpSYhCjRWAX6jvtS0o+gzZu17W4z3/cZz6Qx4hwIo0EuBJv5GrfO1+JB1HUJ/sIr7/F5t7duHsc2nSs5t6VHWDDo9kNIDeN+iH0HTN03Kp3+n65/rVoaMG+5i/xjXA/Zui+VNT3nXLb1td//VaWX0WnczSuiHQXeEIitD6fd+6p8GIT2qiOXpttY/893li6PHEs265dF2TfnHggIoUFwBTTdi19hLBdzwmnI4vRSNh08vWpHUJT21U/oYvT5XFfsUu9M/XEorxcP0KBxWH8Vp0zPLWv8sTS/ph3a+crrF63RZfEYBFMivgEaZtevu0OGjmQpQ/9DyzJm/tCOPfihc8Oyl6dstndbqc67bsDXJq5E1ui3udKSPzL80Gke39L22VxGrdF4aUTdt6NB5qX/brc/snn+W9/qhwl16GWjc8jT6MgsKoMBwFFC/yL3+7L1GCDrQY+RLmXltGjzLo7Ue4iv+2rb0PzafW/pGsi/vCN5l71OlqH781LlZ/WyteKiRQXz92Cr6p71+3LA6aK3RmFhQAAXCUEB/BlDfUsZUX5/Jrl31nXbv9U+RZ2eyc8+BSCPlWh53reeRsx55Ljr4+ZeWPHJHbNMzx/TxlUd/wCiyVBHTdFyeNxZRnzwo0AwFbHYWxaL/b++OdZsGAjAAPxtvwMjCxoB4BgY2Vl4BNl6AEaTCglSGSCCoCkJlaEHqBAtQpKDfKJGbOskpMVfH+SwhinvO2Z+Rdb77c7dqay9Pn+de17LJ2ZeZ0drPxdnPmakty/et2x4+ejw/PrOKb7PN6i79e/GdN3Xrk9zmDjiWwO4LGKO+fA9rtxn76De9fAX+RYDAMoExjlHnWkvbgbNyQ2gP9vWev+xed+0XbOtS2WBf/gMltPX2/afp2bfzaclyRj9//Z5mmr7nB4fNMqT5NveYtkzBncHXN5MPSwcrx3S9roUAgc0FxvQ8zOBlZqZ89XrSOfvQ5kqOJEBgnwS+//i3TH06Y/IN7MVvmq+zyLf/Zi86Wcp0KFsCblmCL0GytIOzVJ6NAIH9E8iAYQK4CatlIK70WZDQR5YuzTMkIYzS44YgnFDd0fFJc+659q5A2xDO0zkQIDAcgQT/84zMcqTpWyvtb2xfQZ6bmQkzwYu0vZaFWdvBtrzP5rizr+dNO3SbJfba59LXz/ob+5L0OQT2WyBts4SA886dNlr6Jnd90ye563fQ+RPYXMAY9VU7bcarJvYQINAtYIy622VoewXbhnZHnA8BAgQIECBAgACBLQTSmTULtWUJUBsBAgQIECBAgACBVQKLwbZVZf2OAAECBAgQIECAAAECBAgQIFBTQLCtpra6CBAgQIAAAQIECPxngSdPn82DbZlN2EaAAAECBAgQIEBglYBg2yodvyNAgAABAgQIECBAgAABAgSuU0Cw7Tr11U2AAAECBAgQIECgR4GLiz/TGzfvNcG223cf9PjJPooAAQIECBAgQGCsAoJtY72zrosAAQIECBAgQIAAAQIECOy+gGDb7t9DV0CAAAECBAgQIECgEZi8O57eunO/+fPi5SEVAgQIECBAgAABAmsFPp+cztuQRx+/rC2vAAECBAgQIECAAAECBAgQIECgloBgWy1p9RAgQIAAAQIECBAgQIAAAQIECBAgQIAAAQIECBAgQIAAAQIECBAgQIBAkYBgWxGTQgQIECBAgAABAgQIECBAgAABAgQIECBAgAABAgQIECBAgAABAgQIECBQS0CwrZa0eggQIECAAAECBAgQIECAAAECBAgQIECAAAECBAgQIECAAAECBAgQIECgSECwrYhJIQIECBAgQIAAAQIECBAgQIAAAQIECBAgQIAAAQIECBAgQIAAAQIECBCoJSDYVktaPQQIECBAgAABAgQIECBAgAABAgQIECBAgAABAgQIECBAgAABAgQIECBQJCDYVsSkEAECBAgQIECAAAECBAgQIECAAAECBAgQIECAAAECBAgQIECAAAECBAjUEhBsqyWtHgIECBAgQIAAAQIECBAgQIAAAQIECBAgQIAAAQIECBAgQIAAAQIECBAoEhBsK2JSiAABAgQIECBAgAABAgQIECBAgAABAgQIECBAgAABAgQIECBAgAABAgRqCQi21ZJWDwECBAgQIECAAAECBAgQIECAAAECBAgQIECAAAECBAgQIECAAAECBAgUCQi2FTEpRIAAAQIECBAgQIAAAQIECBAgQIAAAQIECBAgQIAAAQIECBAgQIAAAQK1BATbakmrhwABAg
QIECBAgAABAgQIECBAgAABAgQIECBAgAABAgQIECBAgAABAgSKBATbipgUIkCAAAECBAgQIECAAAECBAgQIECAAAECBAgQIECAAAECBAgQIECAAIFaAoJttaTVQ4AAAQIECBAgQIAAAQIECBAgQIAAAQIECBAgQIAAAQIECBAgQIAAAQJFAn8BViivzt4eREkAAAAASUVORK5CYII=)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_8BpHRGmBYon" + }, + "source": [ + "A thing to note with `SlidingWindows` is that one element might be processed multiple times because it might overlap in more than one window.\n", + "\n", + "In our example, the _\"processing\"_ is done by `PrintElementInfo` which simply prints the element with its window information. For windows of three months every month, each element is processed three times, one time per window.\n", + "\n", + "In many cases, if we're just doing simple element-wise operations, this isn't generally an issue.\n", + "But for more resource-intensive transformations, it might be a good idea to perform those transformations _before_ doing the windowing." + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "DpFxVHD9CZ-7", + "outputId": "7bfe9cb4-f88e-4d41-e75b-59637ba72534" + }, + "source": [ + "import apache_beam as beam\n", + "from datetime import timedelta\n", + "\n", + "# Sliding windows of approximately 3 months every month.\n", + "window_size = timedelta(days=3*30).total_seconds() # in seconds\n", + "window_period = timedelta(days=30).total_seconds() # in seconds\n", + "print(f'window_size: {window_size} seconds')\n", + "print(f'window_period: {window_period} seconds')\n", + "\n", + "with beam.Pipeline() as pipeline:\n", + " (\n", + " pipeline\n", + " | 'Astronomical events' >> AstronomicalEvents()\n", + " #------\n", + " # ℹ️ Here we're processing / printing the data before windowing.\n", + " | 'Print element info' >> beam.ParDo(PrintElementInfo())\n", + " | 'Sliding windows' >> beam.WindowInto(\n", + " beam.window.SlidingWindows(window_size, window_period)\n", + " )\n", + " #------\n", + " | 'Print window info' >> PrintWindowInfo()\n", + " )" + ], + "execution_count": 17, + "outputs": [ + { + "output_type": "stream", + "text": [ + "window_size: 7776000.0 seconds\n", + "window_period: 2592000.0 seconds\n", + "[GlobalWindow] 2021-03-20 03:37:00 -- March Equinox 2021\n", + "[GlobalWindow] 2021-04-26 22:31:00 -- Super full moon\n", + "[GlobalWindow] 2021-05-11 13:59:00 -- Micro new moon\n", + "[GlobalWindow] 2021-05-26 06:13:00 -- Super full moon, total lunar eclipse\n", + "[GlobalWindow] 2021-06-20 22:32:00 -- June Solstice 2021\n", + "[GlobalWindow] 2021-08-22 07:01:00 -- Blue moon\n", + "[GlobalWindow] 2021-09-22 14:21:00 -- September Equinox 2021\n", + "[GlobalWindow] 2021-12-21 09:59:00 -- December Solstice 2021\n", + "[GlobalWindow] 2021-11-04 15:14:00 -- Super new moon\n", + "[GlobalWindow] 2021-11-19 02:57:00 -- Micro full moon, partial lunar eclipse\n", + "[GlobalWindow] 2021-12-04 01:43:00 -- Super new moon\n", + "[GlobalWindow] 2021-12-18 10:35:00 -- Micro full moon\n", + ">> Window [2021-03-04 00:00:00 - 2021-06-02 00:00:00] has 4 elements\n", + ">> Window [2021-02-02 00:00:00 - 2021-05-03 00:00:00] has 2 elements\n", + ">> Window [2021-01-03 00:00:00 - 2021-04-03 00:00:00] has 1 elements\n", + ">> Window [2021-04-03 00:00:00 - 2021-07-02 00:00:00] has 4 elements\n", + ">> Window [2021-05-03 00:00:00 - 2021-08-01 00:00:00] has 3 elements\n", + ">> Window [2021-06-02 00:00:00 - 2021-08-31 00:00:00] has 2 elements\n", + ">> Window [2021-08-01 00:00:00 - 2021-10-30 00:00:00] has 2 elements\n", + ">> Window [2021-07-02 
00:00:00 - 2021-09-30 00:00:00] has 2 elements\n", + ">> Window [2021-08-31 00:00:00 - 2021-11-29 00:00:00] has 3 elements\n", + ">> Window [2021-11-29 00:00:00 - 2022-02-27 00:00:00] has 3 elements\n", + ">> Window [2021-10-30 00:00:00 - 2022-01-28 00:00:00] has 5 elements\n", + ">> Window [2021-09-30 00:00:00 - 2021-12-29 00:00:00] has 5 elements\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Zx2k1uhyDKvY" + }, + "source": [ + "Note that by doing the windowing _after_ the processing, we only process / print the elments once, but the windowing afterwards is the same." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-XaTV5D2pM5f" + }, + "source": [ + "# Session windows\n", + "\n", + "Maybe we don't want regular windows, but instead, have the windows reflect periods where activity happened.\n", + "\n", + "[`Sessions`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.window.html#apache_beam.transforms.window.Sessions)\n", + "allow us to create those kinds of windows.\n", + "We now have to specify a _gap size_ in seconds, which is the maximum number of seconds of inactivity to close a session window.\n", + "\n", + "For example, if we specify a gap size of 30 days. The first event would open a new session window since there are no already opened windows. If the next event happens within the next 30 days or less, like 20 days after the previous event, the session window extends and covers that as well. If there are no new events for the next 30 days, the session window closes and is emitted." + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "O41tAUwhjaLX", + "outputId": "d9c0b886-1e47-459b-a64c-94e71fd6075e" + }, + "source": [ + "import apache_beam as beam\n", + "from datetime import timedelta\n", + "\n", + "# Sessions divided by approximately 1 month gaps.\n", + "gap_size = timedelta(days=30).total_seconds() # in seconds\n", + "print(f'gap_size: {gap_size} seconds')\n", + "\n", + "with beam.Pipeline() as pipeline:\n", + " (\n", + " pipeline\n", + " | 'Astronomical events' >> AstronomicalEvents()\n", + " | 'Session windows' >> beam.WindowInto(beam.window.Sessions(gap_size))\n", + " | 'Print element info' >> beam.ParDo(PrintElementInfo())\n", + " | 'Print window info' >> PrintWindowInfo()\n", + " )" + ], + "execution_count": 19, + "outputs": [ + { + "output_type": "stream", + "text": [ + "gap_size: 2592000.0 seconds\n", + "[2021-03-20 03:37:00 - 2021-04-19 03:37:00] 2021-03-20 03:37:00 -- March Equinox 2021\n", + "[2021-04-26 22:31:00 - 2021-05-26 22:31:00] 2021-04-26 22:31:00 -- Super full moon\n", + "[2021-05-11 13:59:00 - 2021-06-10 13:59:00] 2021-05-11 13:59:00 -- Micro new moon\n", + "[2021-05-26 06:13:00 - 2021-06-25 06:13:00] 2021-05-26 06:13:00 -- Super full moon, total lunar eclipse\n", + "[2021-06-20 22:32:00 - 2021-07-20 22:32:00] 2021-06-20 22:32:00 -- June Solstice 2021\n", + "[2021-08-22 07:01:00 - 2021-09-21 07:01:00] 2021-08-22 07:01:00 -- Blue moon\n", + "[2021-09-22 14:21:00 - 2021-10-22 14:21:00] 2021-09-22 14:21:00 -- September Equinox 2021\n", + "[2021-12-21 09:59:00 - 2022-01-20 09:59:00] 2021-12-21 09:59:00 -- December Solstice 2021\n", + "[2021-11-04 15:14:00 - 2021-12-04 15:14:00] 2021-11-04 15:14:00 -- Super new moon\n", + "[2021-11-19 02:57:00 - 2021-12-19 02:57:00] 2021-11-19 02:57:00 -- Micro full moon, partial lunar eclipse\n", + "[2021-12-04 01:43:00 - 2022-01-03 01:43:00] 2021-12-04 01:43:00 -- Super new 
moon\n", + "[2021-12-18 10:35:00 - 2022-01-17 10:35:00] 2021-12-18 10:35:00 -- Micro full moon\n", + ">> Window [2021-03-20 03:37:00 - 2021-04-19 03:37:00] has 1 elements\n", + ">> Window [2021-04-26 22:31:00 - 2021-07-20 22:32:00] has 4 elements\n", + ">> Window [2021-08-22 07:01:00 - 2021-09-21 07:01:00] has 1 elements\n", + ">> Window [2021-09-22 14:21:00 - 2021-10-22 14:21:00] has 1 elements\n", + ">> Window [2021-11-04 15:14:00 - 2022-01-20 09:59:00] has 5 elements\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eJeoBghkJ_1O" + }, + "source": [ + "![Sessions](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAACbgAAAF4CAYAAACc4Q1CAAAgAElEQVR4Aezd+a8c1Z3//89/NL/nl9H8EikaaTRRNNKsGSGYNXycCQOCCYEkBEISQsL2wQGGYJZhm2AgbAmQYAgQFmPAbDG72cFhMzbG2Ka+Os33lLv79vU93a7qc6rqUdLVubf71DlVr/fzvKqu9FLV/6lsFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKECBAhX4PwUek0OiAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpUAm4goAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQIEiFRBwK7IsDooCFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFBNwwQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAJFKiDgVmRZHBQFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKCLhhgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAWKVEDArciyOCgKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUEHDDAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQoUqYCAW5FlcVAUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoIOCGAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhQoUgEBtyLL4qAoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQQMANAxSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShQpAICbkWWxUFRgAIUoAAFKEABClCAAhSgAAUoQAEKUIACFKAABShAAQpQgAIUoAAFKEABClCAAhSggIAbBmYq8MYfd1d+aIABDAyBgWiCQzhX52hNYwAD8zDAH/EyDy/64mUoDPBGrA+FdeeJ9aYY4JtYaool42CpTwzwRjz3iWfngucmGeCPeGqSJ2PhqQsMRN/Tpikg4Jam0+B6dWGxO0YXJQxgoAkGosE3MZYxMIkBDPSJAf6I5z7x7Fzw3BQDvBFLTbFkHCwNhQG+ifWhsO48sT4PA7wRL/Pwoi9ehsQAf8T7kHh3rngPDNjmU0DAbT69BtOboTJUDGBgKAxEYx/K+TpPaxsDGEhlgD9iJZUV/bAyJAZ4I96HxLtzxXsTDPBNHDXBkTFw1DcGeCOm+8a088F0UwzwRyw1xZJxsNQVBqLvadMUEHBL02lwvbqy4B2nixMGMHC4DESDP9xx7I9FDGCgbwzwR0z3jWnng+kmGOCNOGqCI2PgaEgM8E28D4l354r3VAZ4I1ZSWdEPK0NjgD9ifmjMO1/MR9/Tpikg4Jam0+B6MVNmigEMDIWBaPBDOV/naW1jAAOpDPBHrKSyoh9WhsQAb8T7kHh3rnhvggG+iaMmODIGjvrGAG/EdN+Ydj6YbooB/oilplgyDpa6wkD0PW2aAgJuaToNrldXFrzjdHHCAAYOl4Fo8Ic7jv2xiAEM9I0B/ojpvjHtfDDdBAO8EUdNcGQMHA2JAb6J9yHx7lzxnsoAb8RKKiv6YWVoDPBHzA+NeeeL+eh72jQFBNzSdBpcL2bKTDGAgaEwEA1+KOfbpfN8+Y3d1Yuv+aFBXgZeeXu41wP+ONzad+la4VhxumwGeCPmls2c+TDXdQb4Joa7zrDjx3AbDPBGXLXBlTFx1QcG+COO+8Cxc8DxPAxE39OmKSDglqbT4HrNs+j0ZdIYwECXGYgG3+Vz6Ouxh4Db/9vwUXXkMR/6oUEWBi6++qNKwK2q+uoxzsv9CwYwsAgD7h1xswg39sHNkBngm/gfMv/OHf+rMcAbsbEaGz7HxtAZ4I/WwNDXgPMf3hqIvqdNU0DALU2nwfVinsMzTzVX86EyEA1+qOdf8nkLuAn25Q43Crh97pAl+4Rjc/+CAQwsmwH3jphbNnPmw1zXGeCbGO46w44fw20wwBtx1QZXxsRVHxjgjzjuA8fOAcfzMBB9T5umgIBbmk6D6zXPotOXSWMAA11mIBp8l8+hr8cu4CbgJuCW9/rCH/Pq31dvd1646joDvBHDXWfY8WN42QzwTcwtmznzYa4LDPBGnHaBU8eI0xwM8Efc5eDOnLjLyUD0PW2aAgJuaToNrlfORWxuFxEMYGCZDESDX+ac5kpjXMBNwE3ALW2ttOUp/DGv/m3V1bjqioHDY4A3Hp5++KMfBobHAN8cXs2tczXHwNoM8Ma1NcIRjTAwTAb44zDrbr2r+5AZiL6nTVNAwC1Np8H1GrKJOHcXUQwMi4Fo8OpeXt0F3ATcBNzyrkv+mFd/1yX6Y6BMBnhjmXWxXtQFA+UywDfLrY11ozYYyMcAb8ynPe5pj4GyGeCPZdfH+lEfDDTPQPQ9bZoCAm5pOg2uF3Nq3pxoSlMMlMlANHj1Ka8+Am4CbgJuedclf8yrv+sS/TFQJgO8scy6WC/qgoFyGeCb5dbGulEbDORjgDfm0x73tMdA2Qzwx7LrY/2oDwaaZyD6njZNAQG3NJ0G14s5NW9ONKUpBspkIBq8+p
RXHwE3ATcBt7zrkj/m1d91if4YKJMB3lhmXawXdcFAuQzwzXJrY92oDQbyMcAb82mPe9pjoGwG+GPZ9bF+1AcDzTMQfU+bpoCAW5pOg+vFnJo3J5rSFANlMhANXn3Kq4+Am4CbgFvedckf8+rvukR/DJTJAG8ssy7Wi7pgoFwG+Ga5tbFu1AYD+Rjgjfm0xz3tMVA2A/yx7PpYP+qDgeYZiL6nTVNAwC1Np8H1Yk7NmxNNaYqBMhmIBq8+5dVHwE3ATcAt77rkj3n1d12iPwbKZIA3llkX60VdMFAuA3yz3NpYN2qDgXwM8MZ82uOe9hgomwH+WHZ9rB/1wUDzDETf06YpIOCWptPgejGn5s2JpjTFQJkMRINXn/LqI+Am4Cbglndd8se8+rsu0R8DZTLAG8usi/WiLhgolwG+WW5trBu1wUA+BnhjPu1xT3sMlM0Afyy7PtaP+mCgeQai72nTFBBwS9NpcL2YU/PmRFOaYqBMBqLBq0959RFwE3ATcMu7LvljXv1dl+iPgTIZ4I1l1sV6URcMlMsA3yy3NtaN2mAgHwO8MZ/2uKc9BspmgD+WXR/rR30w0DwD0fe0aQoIuKXpNLhezKl5c6IpTTFQJgPR4NWnvPoIuAm4CbjlXZf8Ma/+rkv0x0CZDPDGMutivagLBsplgG+WWxvrRm0wkI8B3phPe9zTHgNlM8Afy66P9aM+GGiegeh72jQFBNzSdBpcL+bUvDnRlKYYKJOBaPDqU159BNwE3ATc8q5L/phXf9cl+mOgTAZ4Y5l1sV7UBQPlMsA3y62NdaM2GMjHAG/Mpz3uaY+Bshngj2XXx/pRHww0z0D0PW2aAgJuaToNrhdzat6caEpTDJTJQDR49SmvPgJuAm4CbnnXJX/Mq7/rEv0xUCYDvLHMulgv6oKBchngm+XWxrpRGwzkY4A35tMe97THQNkM8Mey62P9qA8Gmmcg+p42TQEBtymdNj/6THXR5TdWV113+9Q3w/qTOTVvTjSlKQbKZCC6u/qUVx8BNwE3Abe865I/5tXfdYn+GCiTAd5YZl2sF3XBQLkM8M1ya2PdqA0G8jHAG/Npj3vaY6BsBvhj2fWxftQHA80zEH1Pm6bAQgG32+96sFp3wpmjnzfe2jFzpgMHDlTfOPGsUZ8zzrtiZp/w4WNPPFuPtf3VN+t+x558TvXlrx5fHbnutPqzZfxy7oXXVn/yp39ffeFLRy1juqXM8f6HH9Uah7rddufv15yXOTVvTjSlKQbKZCAaovqUVx8BNwE3Abe865I/5tXfdYn+GCiTAd5YZl2sF3XBQLkM8M1ya2PdqA0G8jHAG/Npj3vaY6BsBvhj2fWxftQHA80zEH1Pm6bAQgG3+x/aOgqBhSDYjbfeM3Ombc9tr/uEfnv27J3Z7+wLrqn77f54T93nr474ryxBsz4G3K676be1xqEWQdu1NubUvDnRlKYYKJOB6IfqU159BNwE3ATc8q5L/phXf9cl+mOgTAZ4Y5l1sV7UBQPlMsA3y62NdaM2GMjHAG/Mpz3uaY+Bshngj2XXx/pRHww0z0D0PW2aAgsF3Hbt3lMHpr71/QtmznT1xjvqPiFU9dCWp2b2+9t/PnnU74ijvzfxfXjq29e/+dPqpNNnjz/RucE/+hhw+8evnTJRi1CPbc+/ckjVmFPz5kRTmmKgTAaiGapPefURcBNwE3DLuy75Y179XZfoj4EyGeCNZdbFelEXDJTLAN8stzbWjdpgIB8DvDGf9rinPQbKZoA/ll0f60d9MNA8A9H3tGkKLBRwC0OHQFoISv3ZX/77zJnC60nD9/HnZxs2rui386Pd9fcXbLh+xfc5PuhbwO2V196qNT7tJxvq38OT8w61MafmzYmmNMVAmQxEL1Sf8uoj4CbgJuCWd13yx7z6uy7RHwNlMsAby6yL9aIuGCiXAb5Zbm2sG7XBQD4GeGM+7XFPewyUzQB/LLs+1o/6YKB5BqLvadMUWDjgdtHlN9Zhqe2vvjkx24EDB6ovfOmo0fcxCDf9hLaww70PPl6PsWXrtokxHn/quer2ux6sHn706YnPQygufB5+3v/wo2rfvv3V3fdtqc6/+LrqP086u/rxef9T3bHpoeqzzz6b2G/6j7d3vFdtvHlT9d0fXVwde/I51U/WX1Vtum/LqA2hvHD8q22vvv72aN/Tz7p0tG8I7931u82jY5ne55lnX66Pd/q78Pcjj/9h9P2LL7++4uv33t9Z7/vhzl0rvk/54MJLb6g1/uN7H1TxiXkhmLh///5Vh2BOzZsTTWmKgTIZiEaoPuXVR8BNwE3ALe+65I959Xddoj8GymSAN5ZZF+tFXTBQLgN8s9zaWDdqg4F8DPDGfNrjnvYYKJsB/lh2fawf9cFA8wxE39OmKbBwwO2xJ56tg1PX37JpYrbw+sv45LZ77n+0/n3Pnr0T/c658Jr6u72f7pv4LoTOwhjTwbint71U73PxFb+svvzV4+u/45yhDa83XS3AdefdD9cBvPF9xn9fLeB23U13zZwv7BvCY6+8/vbEeVxx7W11/xCMG9/C8cUgYDje6e2GW++u931nx3vTX6/5dwj5/fnfHDMa4+jjfzzqf831B18d+8DmJ1cdgzk1b040pSkGymQgGqH6lFcfATcBNwG3vOuSP+bV33WJ/hgokwHeWGZdrBd1wUC5DPDNcmtj3agNBvIxwBvzaY972mOgbAb4Y9n1sX7UBwPNMxB9T5umwMIBt08/3VeHr048df3EbFdv/DxEFUJVu3bvqftNP40tPk3sa8edMbF/+CMl4DYeSDty3WlV+Bn/7KZf37ti3N/es3miT9gnvLrzhFPOH71uNe4/K+B25S9+Xe8bnoB26pk/r8LT28J5ju8XniwXt/AEt/jdLbffFz8eteGpdfG70H60a/fE99/54UWj7//i746d+Dz1j/EQ4m13/n60W3iKW5zzpNMvWHUo5tS8OdGUphgok4FohOpTXn0E3ATcBNzyrkv+mFd/1yX6Y6BMBnhjmXWxXtQFA+UywDfLrY11ozYYyMcAb8ynPe5pj4GyGeCPZdfH+lEfDDTPQPQ9bZoCCwfcwvDhqWMhLBXCXuOvBA2vCg2fb7jqltFR/OPXThn9fcGG6+ujCmGuGLS69OrP+9VfzhFwu/ya26pP9n5a7zr+9LgQXhvfPvlkb/XFr6yrjzm8BnV8C09Ui6Gy6YBbCK3Fp62FYN6Odz8Y33X0KtF4Pmecd0X93fjrWk8785L68/BLeJ1q3Ce0MYQWO4VgW/h8fLz4XUobXqEax9/98Z56l/FA3vjndYeqqphT8+ZEU5pioEwGovepT3n1EXATcBNwy7su+WNe/V2X6I+BMhngjWXWxXpRFwyUywDfLLc21o3aYCAfA7wxn/a4pz0GymaAP5ZdH+tHfTDQPAPR97RpChxWwC0+qS2EqF7c/vpoxvFA19annh99tv7n142CVuOBs/sf2lqHr576w4srjjblCW5hjFlb3DcE78a3G2+9p55z+mlqsd+5F1476jMdcNtw5c31vuHJaLO24759bt1n50cHn
8Z2/HfPG30+/iS2Awc+q58YF0N33zjxrHrY9z/YWY91172P1J+n/hJCfzGQ963vTz6pLQTpYvDt1jvvnzkkc2renGhKUwyUyUA0QfUprz4CbgJuAm551yV/zKu/6xL9MVAmA7yxzLpYL+qCgXIZ4Jvl1sa6URsM5GOAN+bTHve0x0DZDPDHsutj/agPBppnIPqeNk2Bwwq4Pffiq3VQ6rqb7hrNOP4EtfAa07A9+MhTdb89e/aOPhsPkoUnp01vMaR2xNHfm/jq6W0v1WM9unXbxHfxj/Mv/jxQF0Jc41t8YloIfs2aM/QdP67xfcPrPMN4IYy22va7Bx6rjy0cZ9w23ryp/vzd9z4cffzkMy+MPgshvPGw365dH4++Hx/rw5274lDJ7V2/O/gq1ukgYHhqWwy4/duxP5w5JnNq3pxoSlMMlMlANEH1Ka8+Am4CbgJuedclf8yrv+sS/TFQJgO8scy6WC/qgoFyGeCb5dbGulEbDORjgDfm0x73tMdA2Qzwx7LrY/2oDwaaZyD6njZNgcMKuIXXkoaAVghLnXDK+aMZr7n+jtHfXzvujPoIPt7zSR2o2vzoM6PPw2s+w37hdaaztsMJuIVXo8YA1/jYIcwVPv+Hf/3O+McTv68WcAv7hH3/6T++P9F//I9nX3ilnvf2ux6sv3rltbfqzzfdt2X0eZznh2dfNgrbxaet/eo3vx99v/6SjaN9gk6LbPE1seGYX3n97erNt/448fMvx/ygPqZ3dry3Ygrm1Lw50ZSmGCiTgWiA6lNefQTcBNwE3PKuS/6YV3/XJfpjoEwGeGOZdbFe1AUD5TLAN8utjXWjNhjIxwBvzKc97mmPgbIZ4I9l18f6UR8MNM9A9D1tmgKHFXALU5x46vpRUCoEtELgLQbTLrny5okjCE9iC2GrCy+9oQpPKQu/h59rbrhzol/8I46zyBPcLr/mtnr8OF5owytCw5zrTjhz/OOJ32PwLJzP+Pbnf3PMaN/VAnmh73iQ7fpbNo3vPnryW5j7rJ9dPfo8jrf5sc8Df6edeclo/HDeYYsBtHA8827vf/hRff5R50O1V1x724opmFPz5kRTmmKgTAaiAapPefURcBNwE3DLuy75Y179XZfoj4EyGeCNZdbFelEXDJTLAN8stzbWjdpgIB8DvDGf9rinPQbKZoA/ll0f60d9MNA8A9H3tGkKHHbA7aZf31uHqcITzOKTyB574tmJI/jZhs+fSHbkutMmXsn5wkuvTfSLf7QRcIuhsUWe4Baf/naoJ6ptfer5WovwitHx7ZQzLh59F/aPr3ENWsVXpT6w+cl63/BK0hhIC5/Pu11302/r/eM4h2r/6oj/WjEFc2renGhKUwyUyUA0QPUprz4CbgJuAm551yV/zKu/6xL9MVAmA7yxzLpYL+qCgXIZ4Jvl1sa6URsM5GOAN+bTHve0x0DZDPDHsutj/agPBppnIPqeNk2Bww64hVdfxvBUDHGFv/d+um/iCMKrSWO/M867YvR7eL1peOrbrK2NgNt3f/R5yCwEy6aPLx7Dak9w+8FZl42OOey7Z8/e2H2iHQ+WvbT9jYnvwitL4/n/ZP1Vo99PP+vSuk8IusVwYHjKW+wbXu867xZfpxqeEvfGWztW/Tn1zJ/X84TQ3fjGnJo3J5rSFANlMhC9T33Kq4+Am4CbgFvedckf8+rvukR/DJTJAG8ssy7Wi7pgoFwG+Ga5tbFu1AYD+Rjgjfm0xz3tMVA2A/yx7PpYP+qDgeYZiL6nTVPgsANuYZr4us0YygpPSpveQigsfh/b8HrT1bY2Am5Xb7yjPoZb77x/xdThGMPrS8PxhbDZ+DYeUAvjTG8hoBZ1CMG9T6cCfjve/aCeO57/w48+PTFMfE1p/D487W7ebfw1qeGpeYfaxp84F1+dGvszp+bNiaY0xUCZDPC9MusS1ouAm4CbgFve9ckf8+rvvoH+GCiTAd5YZl2sF3XBQLkM8M1ya2PdqA0G8jHAG/Npj3vaY6BsBvhj2fWxftQHA80zEH1Pm6ZAIwG3+HSzGMz678t/OXP2f/qP70+EvG689Z6Z/cKHbQTcdu3eUz8lLRxrmP+td96tXtz+enX9LZuqL35lXX180wG3Awc+q/76qG/V34d99+3bPzr+EF47+vgf19/d/Ot7Z57Xl796fN0njB9fTxo7j7+mNBzfRZffGL9Kbi/YcH09RzivQ23h6XnxnEMob/x4mFPz5kRTmmKgTAaiT6pPefURcBNwE3DLuy75Y179XZfoj4EyGeCNZdbFelEXDJTLAN8stzbWjdpgIB8DvDGf9rinPQbKZoA/ll0f60d9MNA8A9H3tGkKNBJwu+t3m+tQVQhmPbp128zZL7rshol+r73xzsx+4cM2Am5h3PGnuMVA3qx2OuAW9n3ymRcmAnJhvxAMG9//69/8aRXCcLO2+GrS0H/89aSx7/hrSkOfx554Nn6V1IbAWnyKXAjjpWzrL9lYH38I2MWNOTVvTjSlKQbKZIDvlVmXsF4E3ATcBNzyrk/+mFd/9w30x0CZDPDGMutivagLBsplgG+WWxvrRm0wkI8B3phPe9zTHgNlM8Afy66P9aM+GGiegeh72jQFGgm4ffDhR3VIKgSzPtn76czZt2zdVvcLTw471Hbct88d9Z1+Teczz75cj7FaAOyK//1V3WfWHOEJa+OhtPB7CLTdcOvd1bkXXjv6LgTXZm1v73ivCq9gnd4//B3mXS3cFsb63QOP1fs9+MhTs4avxl9TOv2a05k7jH349LaX6vGvuPa2sW9W//WFl16r9zn1zJ/XHZlT8+ZEU5pioEwGovGpT3n1EXATcBNwy7su+WNe/V2X6I+BMhngjWXWxXpRFwyUywDfLLc21o3aYCAfA7wxn/a4pz0GymaAP5ZdH+tHfTDQPAPR97RpCjQScEubqqxe4fWi257bXm26b0v1ymtvHTKYNuvIw+tOH3/queruev8Ds7p19jPm1Lw50ZSmGCiTgWjU6lNefQTcBNwE3PKuS/6YV3/XJfpjoEwGeGOZdbFe1AUD5TLAN8utjXWjNhjIxwBvzKc97mmPgbIZ4I9l18f6UR8MNM9A9D1tmgKDDbilyTPcXsypeXOiKU0xUCYD0enVp7z6CLgJuAm45V2X/DGv/q5L9MdAmQzwxjLrYr2oCwbKZYBvllsb60ZtMJCPAd6YT3vc0x4DZTPAH8uuj/WjPhhonoHoe9o0BQTc0nQaXC/m1Lw50ZSmGCiTgWjw6lNefQTcBNwE3PKuS/6YV3/XJfpjoEwGeGOZdbFe1AUD5TLAN8utjXWjNhjIxwBvzKc97mmPgbIZ4I9l18f6UR8MNM9A9D1tmgICbmk6Da4Xc2renGhKUwyUyUA0ePUprz4CbgJuAm551yV/zKu/6xL9MVAmA7yxzLpYL+qCgXIZ4Jvl1sa6URsM5GOAN+bTHve0x0DZDPDHsutj/agPBppnIPqeNk0BAbc0nQbXizk1b040pSkGymQgGrz6lFcfATcBNwG3vOuSP+bV33WJ/hgokwHeWGZdrBd1wUC5DPDNcmtj
3agNBvIxwBvzaY972mOgbAb4Y9n1sX7UBwPNMxB9T5umgIBbmk6D68WcmjcnmtIUA2UyEA1efcqrj4CbgJuAW951yR/z6u+6RH8MlMkAbyyzLtaLumCgXAb4Zrm1sW7UBgP5GOCN+bTHPe0xUDYD/LHs+lg/6oOB5hmIvqdNU0DALU2nwfViTs2bE01pioEyGYgGrz7l1UfATcBNwC3vuuSPefV3XaI/BspkgDeWWRfrRV0wUC4DfLPc2lg3aoOBfAzwxnza4572GCibAf5Ydn2sH/XBQPMMRN/Tpikg4Jam0+B6MafmzYmmNMVAmQxEg1ef8uoj4CbgJuCWd13yx7z6uy7RHwNlMsAby6yL9aIuGCiXAb5Zbm2sG7XBQD4GeGM+7XFPewyUzQB/LLs+1o/6YKB5BqLvadMUEHBL02lwvZhT8+ZEU5pioEwGosGrT3n1EXATcBNwy7su+WNe/V2X6I+BMhngjWXWxXpRFwyUywDfLLc21o3aYCAfA7wxn/a4pz0GymaAP5ZdH+tHfTDQPAPR97RpCgi4pek0uF7MqXlzoilNMVAmA9Hg1ae8+gi4CbgJuOVdl/wxr/6uS/THQJkM8MYy62K9qAsGymWAb5ZbG+tGbTCQjwHemE973NMeA2UzwB/Lro/1oz4YaJ6B6HvaNAUE3NJ0Glwv5tS8OdGUphgok4Fo8OpTXn0E3ATcBNzyrkv+mFd/1yX6Y6BMBnhjmXWxXtQFA+UywDfLrY11ozYYyMcAb8ynPe5pj4GyGeCPZdfH+lEfDDTPQPQ9bZoCAm5pOg2uF3Nq3pxoSlMMlMlANHj1Ka8+Am4CbgJuedclf8yrv+sS/TFQJgO8scy6WC/qgoFyGeCb5dbGulEbDORjgDfm0x73tMdA2Qzwx7LrY/2oDwaaZyD6njZNAQG3NJ0G14s5NW9ONKUpBspkIBq8+pRXH/iV6awAACAASURBVAE3ATcBt7zrkj/m1d91if4YKJMB3lhmXawXdcFAuQzwzXJrY92oDQbyMcAb82mPe9pjoGwG+GPZ9bF+1AcDzTMQfU+bpoCAW5pOg+vFnJo3J5rSFANlMhANXn3Kq4+Am4CbgFvedckf8+rvukR/DJTJAG8ssy7Wi7pgoFwG+Ga5tbFu1AYD+Rjgjfm0xz3tMVA2A/yx7PpYP+qDgeYZiL6nTVNAwC1Np8H1Yk7NmxNNaYqBMhmIBq8+5dVHwE3ATcAt77rkj3n1d12iPwbKZIA3llkX60VdMFAuA3yz3NpYN2qDgXwM8MZ82uOe9hgomwH+WHZ9rB/1wUDzDETf06YpIOCWptPgejGn5s2JpjTFQJkMRINXn/LqI+Am4Cbglndd8se8+rsu0R8DZTLAG8usi/WiLhgolwG+WW5trBu1wUA+BnhjPu1xT3sMlM0Afyy7PtaP+mCgeQai72nTFBBwS9NpcL2YU/PmRFOaYqBMBqLBq0959RFwE3ATcMu7LvljXv1dl+iPgTIZ4I1l1sV6URcMlMsA3yy3NtaN2mAgHwO8MZ/2uKc9BspmgD+WXR/rR30w0DwD0fe0aQoIuKXpNLhezKl5c6IpTTFQJgPR4NWnvPqEgNuLr/mhQV4GXnm7vLWxLL/ij8Ot/bIYMw/GusgAb8RtF7l1zLjNyQDfxF9O/syNv1IZ4I3YLJVNx4XN3AzwRwzmZtD8GFw2A9H3tGkKCLil6TS4XsteuOZzscAABnIxEA0+1/zmxT4GMFAqA/wRm6Wy6biwmZMB3oi/nPyZG39dZIBv4raL3Dpm3LbNAG/EWNuMGR9jXWWAP2K3q+w6buwuykD0PW2aAgJuaToNrteiC9B+zBsDGOgaA9Hgu3bcjtdawwAG2maAP2KsbcaMj7EuMsAbcdtFbh0zbnMywDfxl5M/c+OvVAZ4IzZLZdNxYTM3A/wRg7kZND8Gl81A9D1tmgICbmk6Da7Xsheu+VwsMICBXAxEg881v3mxjwEMlMoAf8RmqWw6LmzmZIA34i8nf+bGXxcZ4Ju47SK3jhm3bTPAGzHWNmPGx1hXGeCP2O0qu44bu4syEH1Pm6aAgFuaToPrtegCtB/zxgAGusZANPiuHbfjtdYwgIG2GeCPGGubMeNjrIsM8EbcdpFbx4zbnAzwTfzl5M/c+CuVAd6IzVLZdFzYzM0Af8RgbgbNj8FlMxB9T5umgIBbmk6D67XshWs+FwsMYCAXA9Hgc81vXuxjAAOlMsAfsVkqm44LmzkZ4I34y8mfufHXRQb4Jm67yK1jxm3bDPBGjLXNmPEx1lUG+CN2u8qu48buogxE39OmKSDglqbT4HotugDtx7wxgIGuMRANvmvH7XitNQxgoG0G+CPG2mbM+BjrIgO8Ebdd5NYx4zYnA3wTfzn5Mzf+SmWAN2KzVDYdFzZzM8AfMZibQfNjcNkMRN/Tpikg4Jam0+B6LXvhms/FAgMYyMVANPhc85sX+xjAQKkM8Edslsqm48JmTgZ4I/5y8mdu/HWRAb6J2y5y65hx2zYDvBFjbTNmfIx1lQH+iN2usuu4sbsoA9H3tGkKCLil6TS4XosuQPsxbwxgoGsMRIPv2nE7XmsNAxhomwH+iLG2GTM+xrrIAG/EbRe5dcy4zckA38RfTv7Mjb9SGeCN2CyVTceFzdwM8EcM5mbQ/BhcNgPR97RpCgi4pek0uF7LXrjmc7HAAAZyMRANPtf85sU+BjBQKgP8EZulsum4sJmTAd6Iv5z8mRt/XWSAb+K2i9w6Zty2zQBvxFjbjBkfY11lgD9it6vsOm7sLspA9D1tmgICbmk6Da7XogvQfswbAxjoGgPR4Lt23I7XWsMABtpmgD9irG3GjI+xLjLAG3HbRW4dM25zMsA38ZeTP3Pjr1QGeCM2S2XTcWEzNwP8EYO5GTQ/BpfNQPQ9bZoCAm5pOg2u17IXrvlcLDCAgVwMRIPPNb95sY8BDJTKAH/EZqlsOi5s5mSAN+IvJ3/mxl8XGeCbuO0it44Zt20zwBsx1jZjxsdYVxngj9jtKruOG7uLMhB9T5umgIBbmk6D67XoArQf88YABrrGQDT4rh2347XWMICBthnYf+CzKvy0PY/xsYwBDHSJAd6I1y7x6ljxWgID/ufGYQkcOgYclsYAb8RkaUw6HkyWwgB/xGIpLDoOLC6Lgeh72jQFBNzSdBpcr2UtWPO4OGAAA7kZiAaf+zjMby1gAAOLMrD9zd3Vi681//PS67ur8NPG2KWMuf2tfp9fWzoHLl5+g3Zt6duXcfvqH0Pwxr4wGM4jeNWi11f70Q4DzTDgf+5mdMQjHTHQLwZ4Y7/qaX2qJwaaY4A/NqclLmmJgW4wEH1Pm6aAgFuaToPrxfC6YXjqpE4YOHwGosHT8vC1pCENMZCHgRBwW3/ZR9WRx3zoZ04NQsDt/564szryGD/zaHD8qTtHoZF59tF3eIyd9MOdo4Cs2g+v9iXVXMAtz72Je0K6jzPgf248jPPgdzxg4HMGeKO1YC1gAAOzGeCPs3XBC10w0F8Gou9p0xQQcEvTaXC9mGR/TVJt1RYDkwxEg6fLpC70oAc
GusOAgNviwT4Bt8WCNwJui+lWUuhnGcci4IaTZXC21hwCbt25n3Hv2d9a+Z+7v7W1btUWA4szwBsX1w53tMNAvxngj/2ur/WrvhhYyUD0PW2aAgJuaToNrhdzWWkuNKEJBvrJQDR49e1nfdVVXYfAgICbgNtaAY+mvxdwE1xKYUrADScpnLTdR8DNveAQ7gVLP0f/c1uHpTPq+DCagwHeiLsc3JkTd11ggD/itAucOkacNslA9D1tmgICbmk6Da5Xk4vSWEweAxgomYFo8CUfo2OzhjCAgUMxIOAm4NZ2QGR6fAE3waVpJmb9LeCGk1lcLPszATf3UIe6h/LdcvjwP/dydMYznTHQLQZ4Y7fqZX2pFwaWxwB/XJ7WuKY1BspgIPqeNk0BAbc0nQbXi6GVYWjqoA4YaJ+BaPC0bl9rGtMYA+0wIOAm4LbswIiAm+BSCnMCbjhJ4aTtPgJu7dx7uKej6zwM+J8bL/Pwoi9ehsIAb8T6UFh3nliflwH+iJl5mdEfM11nIPqeNk0BAbc0nQbXq+tG4PhdzDCAgVQGosGn9tcPWxjAQGkMCLgJuLUdEJkeX8BNcGmaiVl/C7jhZBYXy/5MwM19W2n3bUM8Hv9zW4dD5N45434tBngjRtZixPcYGSoD/BH7Q2XfeQ+X/eh72jQFBNzSdBpcLyY6XBNVe7UfGgPR4Id23s7XWsdAfxgQcBNwW3ZgRMBNcCmFOQE3nKRw0nYfAbf+3O+4d+1uLf3P3d3aWXdqh4H2GOCN7WmLW9pioNsM8Mdu18/6Uz8MzM9A9D1tmgICbmk6Da4X85nffGhGMwx0k4Fo8OrXzfqpm7phYHcl4Cbg1nZAZHp8ATfBpWkmZv0t4IaTWVws+zMBN/eK7hXzM+B/7vw1sA7UAAPlMcAby6uJdaImGCiDAf5YRh2sB3XAwPIYiL6nTVNAwC1Np8H1YlrLMy1a0xoDeRmIBq8OeetAf/pjYHEGBNwE3JYdGBFwE1xKYU7ADScpnLTdR8Bt8fsL92a0a4oB/3NjqSmWjIOlPjHAG/HcJ56dC56bZIA/4qlJnoyFpy4wEH1Pm6aAgFuaToPr1YXF7hhdlDCAgSYYiAbfxFjGwCQGMJCDAQE3Abe2AyLT4wu4CS5NMzHrbwE3nMziYtmfCbi5N8txb2bOSe78zz2pBz7ogQEMBAbihgc8YAADGJhkgD9O6oEPemCg/wxE39OmKSDglqbT4Hoxy/6bpRqrMQY+ZyAaPD2sCQxgoKsMCLgJuC07MCLgJriUwpyAG05SOGm7j4Cb+7uu3t/16bj9z20d9oln54LnphjgjVhqiiXjYKlvDPBHTPeNaeeD6bUYiL6nTVNAwC1Np8H1Wmuh+Z4ZYwADfWEgGnxfzsd5WJsYGB4DAm4Cbm0HRKbHF3ATXJpmYtbfAm44mcXFsj8TcBvefZF74fJq7n/u8mpinagJBvIzwBvz18A6UAMMlMkAfyyzLtaLumCgPQai72nTFBBwS9NpcL2YVHsmRVvaYqAsBqLBq0tZdVEP9cBAOgMCbgJuyw6MCLgJLqUwJ+CGkxRO2u4j4JZ+P+Hei1ZtMeB/bmy1xZZxsdVlBngjfrvMr2PHb5sM8Ed8tcmXsfFVIgPR97RpCgi4pek0uF4lLm7H5KKDAQy0wUA0+DbGNiZmMYCBZTAg4Cbg1nZAZHp8ATfBpWkmZv0t4IaTWVws+zMBN/diy7gXM8ehOfM/96H1wQ99MDBMBnjjMOtuvas7BtZmgD+urRGOaISBfjEQfU+bpoCA25ROmx99prro8hurq667feqbYf3JGPtljOqpnhhYnYHo7jRaXSPa0AYDZTMg4CbgtuzAiICb4FIKcwJuOEnhpO0+Am5l38O4xxxGffzPPYw6W8/qjIH5GOCN8+mFL3phYDgM8Mfh1Nq6VmsMfM5A9D1tmgILBdxuv+vBat0JZ45+3nhrx8yZDhw4UH3jxLNGfc4474qZfcKHjz3xbD3W9lffrPsde/I51Ze/enx15LrT6s+W8cu5F15b/cmf/n31hS8dtYzpWplj2/Ov1JrGOoU21OObp66vTj3z59V1N91V7fjj+6vOz1BcVDCAgaEwEI1wKOfrPK1tDPSPAQE3Abe2AyLT4wu4CS5NMzHrbwE3nMziYtmfCbj1777HvWz3aup/7u7VzDpTMwy0zwBvbF9jHNMYA91kgD92s27Wm7phYHEGou9p0xRYKOB2/0NbRyGwEAS78dZ7Zs607bntdZ/Qb8+evTP7nX3BNXW/3R/vqfv81RH/lSVo1oeA2wObn6w1Ddof6ue2O39faz7+CxNa3IRoRzsMdIuB6H3q1q26qZd6YeAgAwJuAm7LDowIuAkupTAn4IaTFE7a7iPgdvB+wb0TLXIx4H9u7OViz7zYK5kB3ojPkvl0bPjMyQB/xF9O/syNvxwMRN/TpimwUMBt1+49dWjqW9+/YOZMV2+8o+4TAlYPbXlqZr+//eeTR/2OOPp7E9+Hp759/Zs/rU46ffb4E50b/KNvAbcTTjm/+tmGjaOfcy68pvrBWZdVMTwYg29PPP38CgVzLF5zumhgAAM5GIgGmGNuc2IeAxhoggEBNwG3tgMi0+MLuAkuTTMx628BN5zM4mLZnwm4uddq4l7LGIfHkf+5D08//NEPA/1kgDf2s67Wq7pi4PAZ4I+HryEOaYiBbjEQfU+bpsBCAbcwdAikhYDUn/3lv8+cKbwOMwaoQhtCVtPbzo92130u2HD99NdZ/u5bwC08bW/Wdvk1t9XaX3LlzSu6ML5uGZ96qRcGFmcgGiANF9eQdrTDQF4GBNwE3JYdGBFwE1xKYU7ADScpnLTdR8At7z2Ke0T6Bwbihgc8YAADGDjIAG88qAUuaIEBDIwzwB/xMM6D3/EwBAai72nTFFg44HbR5TfWAantr745MduBAweqL3zpqNH3MQg3/YS2sMO9Dz5ej7Fl67aJMR5/6rnq9rserB5+9OmJz0MoLnweft7/8KNq37791d33banOv/i66j9POrv68Xn/U92x6aHqs88+m9hv+o+3d7xXbbx5U/XdH11cHXvyOdVP1l9Vbbpvy6gNgbxw/Kttr77+9mjf08+6dLRvCO/d9bvNo2OZ3ueZZ1+uj3f6u/D3I4//YfT9iy+/vuLr997fWe/74c5dK75f7YPxV5SuFnB76513a+3DU92mtyGYhXN0UcQABgIDccMDHjCAga4yIOAm4NZ2QGR6fAE3waVpJmb9LeCGk1lcLPszATf3d129v+vTcfuf2zrsE8/OBc9NMcAbsdQUS8bBUt8Y4I+Y7hvTzgfTazEQfU+bpsDCAbfHnni2Dkhdf8umidm2Pf9K/d099z9a/75nz96JfuGVmfEpb3s/3TfxXQidhe+mg3FPb3up3ufiK35Zffmrx9d/x7FCG15vun///okx4x933v1wHcAb32f899UCbtfddNfM+cK+4XWrr7z+dpxm1F5x7cEnpYVg3PgWji8GAcPxTm833Hp3Pdc7O96b/nrVv1MCbk/94cV67N/es3nFWGstNN8zYwxgoC8MRA
Psy/k4D2sTA8NjQMBNwG3ZgREBN8GlFOYE3HCSwknbfQTchndf5F64vJr7n7u8mlgnaoKB/Azwxvw1sA7UAANlMsAfy6yL9aIuGGiPgeh72jQFFg64ffrpvjogdeKp6ydmu3rjHaPvjj7+x9Wu3XvqftNPYwuBsBAM+9pxZ0zsH/5ICbiNB9KOXHdaFX7GP7vp1/euGDeEucb7hH1O+8mG6oRTzh+9bjV+NyvgduUvfl3vG17NeuqZPx+9ejWc5/h+4clycQtPcIvf3XL7ffHjURueWhe/C+1Huw4+SSh0+M4PLxp9/xd/d+zEfmv9sVbALYTlovbhPHbt+njFkEyqPZOiLW0xUBYD0QDVpay6qId6YCCdAQE3Abe2AyLT4wu4CS5NMzHrbwE3nMziYtmfCbil30+496JVWwz4nxtbbbFlXGx1mQHeiN8u8+vY8dsmA/wRX23yZWx8lchA9D1tmgILB9zC8OGpYyGYFUJS468EDa8KDZ9vuOqW0VH849dOGf19wYbr66MKYa4Y7rr06s/71V/OEXC7/Jrbqk/2flrvOv70uBBeG98++WRv9cWvrKuPObwGdXwLT1SLobLpgFsIrcWnrYVw2I53PxjfdfQq0Xg+Z5x3Rf3d+OtaTzvzkvrz8Et4nWrcJ7S33fn7ie9DsC18Pj7eRIdV/hgPuIVj/caJZ41+QpAwBtvCuGH8HX98f+YoJS5ux+SigwEMtMFANME2xjYmZjGAgWUwIOAm4LbswIiAm+BSCnMCbjhJ4aTtPgJu7sWWcS9mjkNz5n/uQ+uDH/pgYJgM8MZh1t16V3cMrM0Af1xbIxzRCAP9YiD6njZNgcMKuMUntYWw1IvbXx/NOB7o2vrU86PP1v/8ulFQazxwdv9DW+twV3hd5vSW8gS3MMasLe4bgnfj24233lPPOf00tdjv3AuvHfWZDrhtuPLmet/wetZZ23HfPrfus/Ojg09jO/67540+H38S24EDn9VPjIuhuxBEi9v7H+ysx7rr3kfix0nteMAt1OZQP+EVsM++8MqKcRljv4xRPdUTA6szEA2QRqtrRBvaYKBsBgTcBNzaDohMjy/gJrg0zcSsvwXccDKLi2V/JuBW9j2Me8xh1Mf/3MOos/WszhiYjwHeOJ9e+KIXBobDAH8cTq2ta7XGwOcMRN/TpilwWAG35158tQ5PXXfTXaMZx5+gFl5jGrYHH3mq7rdnz97RZ+NBsvDktOkthtRCAGt8e3rbS/VYj27dNv5V/fv5F38eqAvBrvEtPjEthNdmzRn6jh/X+L4nnX7BaN4QRltt+90Dj9XHFo4zbhtv3lR//u57H44+fvKZF0afhRDeeNgvvi50fKwPd+6KQyW14wG38OrV8OS88POzDRursy+4ZvRq1fh0uBh+2/bc9omxGYqLCgYwMBQGovkN5Xydp7WNgf4xIOAm4LbswIiAm+BSCnMCbjhJ4aTtPgJu/bvvcS/bvZr6n7t7NbPO1AwD7TPAG9vXGMc0xkA3GeCP3ayb9aZuGFicgeh72jQFDivgFl5LGgJaISQVglRhu+b6O0Z/h9dhxu3jPZ/UAa/Njz4z+ji+KjO8znTWdjgBt/Bq1BjcGh/734794ejzf/jX74x/PPH7agG3sE8Y85/+4/sT/cf/CE9Ci/PefteD9VevvPZW/fmm+7aMPo/z/PDsy0Zhu/j601/95vPXlK6/ZONon6DTvNt4wG21p9yFMa/8xa/r4/qXY34wMQ0TWtyEaEc7DHSLgWh+6tatuqmXemHgIAMCbgJubQdEpscXcBNcmmZi1t8CbjiZxcWyPxNwO3i/4N6JFrkY8D839nKxZ17slcwAb8RnyXw6NnzmZIA/4i8nf+bGXw4Gou9p0xQ4rIBbmOLEU9ePQlIhoBUCbzGYdsmVN08cQXgSWwh/XXjpDVV4SlkMgl1zw50T/eIfcZxFnuB2+TW31ePH8UIbn1q27oQzxz+e+D0Gz6ZfUfrnf3PMaMzVAnlhkPEg2/W3bJoYN76G9KyfXT36PI63+bHPA3+nnXnJaPxw3mELgbOgUTieebfUgFsYN9YlzLVv38En6eVYvOZ00cAABnIwED02x9zmxDwGMNAEAwJuAm7LDowIuAkupTAn4IaTFE7a7iPg5l6riXstYxweR/7nPjz98Ec/DPSTAd7Yz7par+qKgcNngD8evoY4pCEGusVA9D1tmgKHHXC76df31mGy8ASz+CSyx554duIIwusxQ4jqyHWnTbyS84WXXpvoF/9oI+AWQ2OLPMEtPv3tUE9U2/rU87UW4RWj49spZ1w8+i7sH1/jGrSKr0odD6WFV5LGAGD4fN5tfKxDPcEtjPvtH1xUzxVfnxo+Z3zdMj71Ui8MLM5A9FgaLq4h7WiHgbwMCLgJuLUdEJkeX8BNcGmaiVl/C7jhZBYXy/5MwC3vPYp7RPoHBuKGBzxgAAMYOMgAbzyoBS5ogQEMjDPAH/EwzoPf8TAEBqLvadMUOOyA25tv/bEOSMUQVwhn7f1038QRhFeTxtDWGeddMfo9vN40PPVt1tZGwO27P/o8ZBaCZdPHF49htSe4/eCsy0bHHPbds2dv7D7RXnfTb+tzfGn7GxPfhVeWxvP/yfqrRr+fftaldZ8QdIvhwPCUt9g3vN513m2egFt45WqYK8x94MDBWgzBLJyjiyIGMBAYiBse8IABDHSVAQE3AbdlB0YE3ASXUpgTcMNJCidt9xFwc3/X1fu7Ph23/7mtwz7x7Fzw3BQDvBFLTbFkHCz1jQH+iOm+Me18ML0WA9H3tGkKHHbALUwTX7cZQ1nhSWnTWwiFxe9jG15vutrWRsDt6o131Mdw6533r5g6HGN4fWkMfI13GA+ohXGmtxBQizqE4N6nUwG/He9+UM8dz//hR5+eGCa+pjR+H552t8iWGnC7Y9ND9TF97bgzJqZaa6H5nhljAAN9YSCaX1/Ox3lYmxgYHgMCbgJubQdEpscXcBNcmmZi1t8CbjiZxcWyPxNwG959kXvh8mruf+7yamKdqAkG8jPAG/PXwDpQAwyUyQB/LLMu1ou6YKA9BqLvadMUaCTgFp9uFoNZ/335L2fOHp8WFvvdeOs9M/uFD9sIuO3avad+Slo4hjD/W++8W724/fXq+ls2VV/8yro68BWeaDa+haeb/fVR36q/D/vu27d/1CWE144+/sf1dzf/+t7xXevfv/zV4+s+Yfz4etLYYTyYFo7vostvjF/N1Y6Ps+HKm6tHt24b/Ty05anqN/c8XP3il7+t4itXYy2eefbliTmYVHsmRVvaYqAsBqL5qUtZdVEP9cBAOgMCbgJuyw6MCLgJLqUwJ+CGkxRO2u4j4JZ+P+Hei1ZtMeB/bmy1xZZxsdVlBngjfrvMr2PHb5sM8Ed8tcmXsfFVIgPR97RpCjQScLvrd5vr4FYITIVA1aztostumOj32hvvzOo2+qyNgFsYePwpbjHcNaudDriFfZ985oWJgFzYLzytbXz/r3/zpxOv+hw/wfhq0tB//PWksc/4a0pDn8eeeDZ+NVc7HnAbP7ZZv4fz/O09m1eMX
+LidkwuOhjAQBsMRANsY2xjYhYDGFgGAwJuAm5tB0SmxxdwE1yaZmLW3wJuOJnFxbI/E3BzL7aMezFzHJoz/3MfWh/80AcDw2SANw6z7ta7umNgbQb449oa4YhGGOgXA9H3tGkKNBJw++DDjyZCXp/s/XTm7Fu2bqv7haelHWo77tvnjvpOv6YzPGksBrVWC4Bd8b+/qvvMmiM8YS2OEdsQ9Lrh1rurcy+8dvRdCK7N2t7e8V4VXsEa9xtvw7zhSW+rbb974LF6vwcfeWpmt/HXlE6/5nTmDjM+DGOPH9f073/xd8dW4ZWkl1x5cxWeajdrY4z9Mkb1VE8MrM5A9EAara4RbWiDgbIZEHATcFt2YETATXAphTkBN5ykcNJ2HwG3su9h3GMOoz7+5x5Gna1ndcbAfAzwxvn0whe9MDAcBvjjcGptXas1Bj5nIPqeNk2BRgJuaVOV1Su8XnTbc9urTfdtqV557a1DBtNmHXkIhj3+1HPV3fX+B2Z16+xnDMVFBQMYGAoD0aiHcr7O09rGQP8YEHATcGs7IDI9voCb4NI0E7P+FnDDySwulv2ZgFv/7nvcy3avpv7n7l7NrDM1w0D7DPDG9jXGMY0x0E0G+GM362a9qRsGFmcg+p42TYHBBtzS5BluLya0uAnRjnYY6BYD0enVrVt1Uy/1wsBBBgTcBNyWHRgRcBNcSmFOwA0nKZy03UfA7eD9gnsnWuRiwP/c2MvFnnmxVzIDvBGfJfPp2PCZkwH+iL+c/JkbfzkYiL6nTVNAwC1Np8H1yrF4zemigQEM5GAgGnyOuc2JeQxgoAkGBNwE3NoOiEyPL+AmuDTNxKy/BdxwMouLZX8m4OZeq4l7LWMcHkf+5z48/fBHPwz0kwHe2M+6Wq/qioHDZ4A/Hr6GOKQhBrrFQPQ9bZoCAm5pOg2uF+PrlvGpl3phYHEGosHTcHENaUc7DORlQMBNwG3ZgREBN8GlFOYE3HCSwknbfQTc8t6juEekf2AgbnjAAwYwgIGDDPDGg1rgghYYwMA4A/wRD+M8+B0PQ2Ag+p42TQEBtzSdBtdrCGbhHF0UMYCBwEDc8IAHDGCgqwwIuAm4tR0QmR5fwE1waZqJWX8LuOFkFhfL/kzAzf1dV+/v+nTc/ue2DvvEs3PBc1MM8EYsNcWScbDUNwb4I6b7xrTzwfRaDETf06YpIOCWptPgeq210HzPjDGAgb4wEA2+L+fjInTtiQAAIABJREFUPKxNDAyPAQE3AbdlB0YE3ASXUpgTcMNJCidt9xFwG959kXvh8mruf+7yamKdqAkG8jPAG/PXwDpQAwyUyQB/LLMu1ou6YKA9BqLvadMUEHBL02lwvZhUeyZFW9pioCwGosGrS1l1UQ/1wEA6AwJuAm5tB0SmxxdwE1yaZmLW3wJuOJnFxbI/E3BLv59w70WrthjwPze22mLLuNjqMgO8Eb9d5tex47dNBvgjvtrky9j4KpGB6HvaNAUE3NJ0GlyvEhe3Y3LRwQAG2mAgGnwbYxsTsxjAwDIYEHATcFt2YETATXAphTkBN5ykcNJ2HwE392LLuBczx6E58z/3ofXBD30wMEwGeOMw6269qzsG1maAP66tEY5ohIF+MRB9T5umgIBbmk6D68UY+2WM6qmeGFidgWjwNFpdI9rQBgNlMyDgJuDWdkBkenwBN8GlaSZm/S3ghpNZXCz7MwG3su9h3GMOoz7+5x5Gna1ndcbAfAzwxvn0whe9MDAcBvjjcGptXas1Bj5nIPqeNk0BAbc0nQbXi6G4qGAAA0NhIBr8UM7XeVrbGOgfAwJuAm7LDowIuAkupTAn4IaTFE7a7iPg1r/7Hvey3aup/7m7VzPrTM0w0D4DvLF9jXFMYwx0kwH+2M26WW/qhoHFGYi+p01TQMAtTafB9WJCi5sQ7WiHgW4xEA1e3bpVN/VSLwwcZEDATcCt7YDI9PgCboJL00zM+lvADSezuFj2ZwJuB+8X3DvRIhcD/ufGXi72zIu9khngjfgsmU/Hhs+cDPBH/OXkz9z4y8FA9D1tmgICbmk6Da5XjsVrThcNDGAgBwPR4HPMbU7MYwADTTAg4CbgtuzAiICb4FIKcwJuOEnhpO0+Am7utZq41zLG4XHkf+7D0w9/9MNAPxngjf2sq/Wqrhg4fAb44+FriEMaYqBbDETf06YpIOCWptPgejG+bhmfeqkXBhZnIBo8DRfXkHa0w0BeBgTcBNzaDohMjy/gJrg0zcSsvwXccDKLi2V/JuCW9x7FPSL9AwNxwwMeMIABDBxkgDce1AIXtMAABsYZ4I94GOfB73gYAgPR97RpCgi4pek0uF5DMAvn6KKIAQwEBuKGBzxgAANdZUDATcBt2YERATfBpRTmBNxwksJJ230E3NzfdfX+rk/H7X9u67BPPDsXPDfFAG/EUlMsGQdLfWOAP2K6b0w7H0yvxUD0PW2aAgJuaToNrtdaC833zBgDGOgLA9Hg+3I+zsPaxMDwGBBwE3BrOyAyPb6Am+DSNBOz/hZww8ksLpb9mYDb8O6L3AuXV3P/c5dXE+tETTCQnwHemL8G1oEaYKBMBvhjmXWxXtQFA+0xEH1Pm6aAgFuaToPrxaTaMyna0hYDZTEQDV5dyqqLeqgHBtIZEHATcFt2YETATXAphTkBN5ykcNJ2HwG39PsJ9160aosB/3Njqy22jIutLjPAG/HbZX4dO37bZIA/4qtNvoyNrxIZiL6nTVNAwC1Np8H1KnFxOyYXHQxgoA0GosG3MbYxMYsBDCyDAQE3Abe2AyLT4wu4CS5NMzHrbwE3nMziYtmfCbi5F1vGvZg5Ds2Z/7kPrQ9+6IOBYTLAG4dZd+td3TGwNgP8cW2NcEQjDPSLgeh72jQFBNzSdBpcL8bYL2NUT/XEwOoMRIOn0eoa0YY2GCibgRBwe/G15n9een13FX7aGLuUMbe/1e/za0vnwEUIjbQ1vnH7oW1f/WMI3tinNSjgVvY9jHvMYdTH/9zDqLP1rM4YmI8B3jifXviiFwaGwwB/HE6trWu1xsDnDETf06YpIOCWptPgejEUFxUMYGAoDESDH8r5Ok9rGwMYSGVg/4HPqvCT2l8/bGEAA0NggDfifAicO0ecN8mA/7nx1CRPxsJTXxjgjVjuC8vOA8tNM8AfMdU0U8bDVOkMRN/Tpikg4Jam0+B6lb7QHZ+LEQYw0BQD0eCbGs842MQABvrCAH/Ecl9Ydh5YbpIB3oinJnkyFp6GwADfxPkQOHeOOJ+XAd6ImXmZ0R8zQ2GAP2J9KKw7T6xHBqLvadMUEHBL02lwveKC0jJXDGCg7wxEg+/7eTo/axkDGJiXAf6ImXmZ0R8zQ2CAN+J8CJw7R5w3yQDfxFOTPBkLT31hgDdiuS8sOw8sN80Af8RU00wZD1OlMxB9T5umgIBbmk6D61X6Qnd8LkYYwEBTDESDb2o842ATAxjoCwP8Ect9Ydl5YLlJBngjnprkyVh4GgIDfBPnQ+DcOeJ8XgZ4I2bmZUZ/zAyFAf6I9aGw7jyxHhmIvqdNU0DALU2nwfWKC0rLXDGA
gb4zEA2+7+fp/KxlDGBgXgb4I2bmZUZ/zAyBAd6I8yFw7hxx3iQDfBNPTfJkLDz1hQHeiOW+sOw8sNw0A/wRU00zZTxMlc5A9D1tmgICbmk6Da5X6Qvd8bkYYQADTTEQDb6p8YyDTQxgoC8M8Ecs94Vl54HlJhngjXhqkidj4WkIDPBNnA+Bc+eI83kZ4I2YmZcZ/TEzFAb4I9aHwrrzxHpkIPqeNk0BAbc0nQbXKy4oLXPFAAb6zkA0+L6fp/OzljGAgXkZ4I+YmZcZ/TEzBAZ4I86HwLlzxHmTDPBNPDXJk7Hw1BcGeCOW+8Ky88By0wzwR0w1zZTxMFU6A9H3tGkKCLil6TS4XqUvdMfnYoQBDDTFQDT4psYzDjYxgIG+MMAfsdwXlp0HlptkgDfiqUmejIWnITDAN3E+BM6dI87nZYA3YmZeZvTHzFAY4I9YHwrrzhPrkYHoe9o0BQTc0nQaXK+4oLTMFQMY6DsD0eD7fp7Oz1rGAAbmZYA/YmZeZvTHzBAY4I04HwLnzhHnTTLAN/HUJE/GwlNfGOCNWO4Ly84Dy00zwB8x1TRTxsNU6QxE39OmKSDglqbT4HqVvtAdn4sRBjDQFAPR4JsazzjYxAAG+sIAf8RyX1h2HlhukgHeiKcmeTIWnobAAN/E+RA4d444n5cB3oiZeZnRHzNDYYA/Yn0orDtPrEcGou9p0xQQcEvTaXC94oLSMlcMYKDvDESD7/t5Oj9rGQMYmJcB/oiZeZnRHzNDYIA34nwInDtHnDfJAN/EU5M8GQtPfWGAN2K5Lyw7Dyw3zQB/xFTTTBkPU6UzEH1Pm6aAgFuaToPrVfpCd3wuRhjAQFMMRINvajzjYBMDGOgLA/wRy31h2XlguUkGeCOemuTJWHgaAgP7D3xWhZ8hnKtztKYxgIFUBtxTYiWVFf2wMjQG+CPmh8a888V89D1tmgICbmk6Da4XM2WmGMDAUBiIBj+U83We1jYGMJDKAH/ESior+mFlSAzwRrwPiXfnmof3F1/bXfXp56XXd1fhp0/n1PdzefmNPOzzHLoPiQH3lHgfEu/OFe/zMMAf8TIPL/ripQ8MRN/Tpikg4Jam0+B69cEMnIOLGgYwkMJANPiUvvpgCgMYGBID/BHvQ+LdueI9lQHeiJVUVvTDyqIMhPDUkcfs9EODbAwIuPGvRf3LfunsuKdM1wpXtMLAsBjgj8Oqt/Wt3hjYHW1Pm6iAgFuiUEPrxkxcUDCAgaEwEP19KOfrPK1tDGAglQH+iJVUVvTDypAY4I14HxLvzjUP7wJuwn25A44CbnnWPs8dlu7uKYdVb+tbvTGQzgB/TNcKV7TCQD8YiL6nTVNAwC1Np8H1Yoj9MER1VEcMrM1ANHhara0VjWiEgWExwB+HVW/rW70xkMYAb0zTCU90wsDiDAi4CbgJuC2+fngP7brCgHtKrHaFVceJ1WUzwB8xt2zmzIe53AxE39OmKSDglqbT4HrlXsjmdzHBAAaWxUA0+GXNZx5sYwADXWGAP2K1K6w6TqwukwHeiLdl8mauYfIm4CbgJuA2zLXP84dVd/eUw6q39a3eGEhngD+ma4UrWmGgHwxE39OmKSDglqbT4HoxxH4YojqqIwbWZiAaPK3W1opGNMLAsBjgj8Oqt/Wt3hhIY4A3pumEJzphYHEGBNwE3ATcFl8/vId2XWHAPSVWu8Kq48Tqshngj5hbNnPmw1xuBqLvadMUEHBL02lwvXIvZPO7mGAAA8tiIBr8suYzD7YxgIGuMMAfsdoVVh0nVpfJAG/E2zJ5M9cweRNwE3ATcBvm2uf5w6q7e8ph1dv6Vm8MpDPAH9O1whWtMNAPBqLvadMUEHBL02lwvRhiPwxRHdURA2szEA2eVmtrRSMaYWBYDPDHYdXb+lZvDKQxwBvTdMITnTCwOAMCbgJuAm6Lrx/eQ7uuMOCeEqtdYdVxYnXZDPBHzC2bOfNhLjcD0fe0aQoIuKXpNLheuRey+V1MMICBZTEQDX5Z85kH2xjAQFcY4I9Y7QqrjhOry2SAN+JtmbyZa5i8CbgJuAm4DXPt8/xh1d095bDqbX2rNwbSGeCP6VrhilYY6AcD0fe0aQoIuKXpNLheDLEfhqiO6oiBtRmIBk+rtbWiEY0wMCwG+OOw6m19qzcG0hjgjWk64YlOGFicAQE3ATcBt8XXD++hXVcYcE+J1a6w6jixumwG+CPmls2c+TCXm4Hoe9o0BQTc0nQaXK/cC9n8LiYYwMCyGIgGv6z5zINtDGCgKwzwR6x2hVXHidVlMsAb8bZM3sw1TN4E3ATcBNyGufZ5/rDq7p5yWPW2vtUbA+kM8Md0rXBFKwz0g4Hoe9o0BQTc0nQaXC+G2A9DVEd1xMDaDESDp9XaWtGIRhgYFgP8cVj1tr7VGwNpDPDGNJ3wRCcMLM6AgJuAm4Db4uuH99CuKwy4p8RqV1h1nFhdNgP8EXPLZs58mMvNQPQ9bZoCAm5pOg2uV+6FbH4XEwxgYFkMRINf1nzmwTYGMNAVBvgjVrvCquPE6jIZ4I14WyZv5hombwJuAm4CbsNc+zx/WHV3Tzmselvf6o2BdAb4Y7pWuKIVBvrBQPQ9bZoCAm5pOg2uF0PshyGqozpiYG0GosHTam2taEQjDAyLAf44rHpb3+qNgTQGeGOaTniiEwYWZ0DATcBNwG3x9cN7aNcVBtxTYrUrrDpOrC6bAf6IuWUzZz7M5WYg+p42TQEBtzSdBtcr90I2v4sJBjCwLAaiwS9rPvNgGwMY6AoD/BGrXWHVcWJ1mQzwRrwtkzdzDZM3ATcBNwG3Ya59nj+surunHFa9rW/1xkA6A/wxXStc0QoD/WAg+p42TQEBtymdfvmr31UXXX5jdde9j0x9048/P/vss2r//v1rngxD7IchqqM6YmBtBqIh0mptrWhEIwwMiwH+OKx6W9/qjYE0Bnhjmk54ohMGFmdAwE3ATcBt8fXDe2jXFQbcU2K1K6w6TqwumwH+iLllM2c+zOVmIPqeNk2BhQJu5198XbXuhDOrE045f9VZXtz++qhP6HfDrXev2u8Xv/ztqN9/nnR2FcJXYdv86DPVl796/OjnN/c8vOq+bXzxt/98cvUnf/r31TdPXd/G8Esf88CBz6rb7vx9dfpZl1b/+LVTqi986ajR+X3xK+uqI9edVv2/i39RbX/1zRXHlXshm9/FBAMYWBYD0QCXNZ95sI0BDHSFAf6I1a6w6jixukwGeCPelsmbuYbJm4CbgJuA2zDXPs8fVt3dUw6r3ta3emMgnQH+mK4VrmiFgX4wEH1Pm6bAQgG3cy+8dhSSCkGwV157a+ZMV2+8o+5zxNHfm9knfBgDZSF8Fbfw9LQwdvjZePOm+PFS2ng8fQi4vbT9jSpoH7U8VBv6vbPjvVpjhtgPQ1RHdcTA2gxE46PV2lrRiEYYGBYD/HFY9ba+1RsDaQzwxjSd8EQnDCzOgICbgJuA2+Lrh/fQrisMuKfEaldYdZxYXTYD/BFzy2bOfJjLzUD0PW2aAgsF3B7Y/GQdmrrx1ntmzvSNE8+q+4Rg1Z4
9e1f02/nR7rrPzzZsrL9/8pkXqq9/86ejnzDXMre+BNzGaxT0D+f135f/srr7vi3Vpvu2jF7DeuzJ59T6hz7bntteS517IZvfxQQDGFgWA9H4ljWfebCNAQx0hQH+iNWusOo4sbpMBngj3pbJm7mGyZuAm4CbgNsw1z7PH1bd3VMOq97Wt3pjIJ0B/piuFa5ohYF+MBB9T5umwEIBt4/3fFIHo771/QtWzHTgwIH6VZjxqWEPbXlqRb/fPfBYPU54LWkJWx8Cbp98srf68785ptb2gg3XV/v3758pb3gC378d+8NRXwG3fpigi5k6YmA+BqI50m0+3ehFLwz0nwH+2P8aW8dqjIH5GeCN82uGM5phYD4GBNwE3ATc5lszPIZeXWTAPSVuu8itY8btMhjgjzhbBmfmwFlJDETf06YpsFDALQz9T//x/VEo6s/+8t9XzLTt+VfqcNVfH/Wt0e/jT2iLO5xz4TV1vxDKituuXR9Xt9/14Ojnvfd3xo9HbQhkxe/CB+9/+FF16533Vz8694oqPJHs/Iuvq7Y+9fzEPrP+CGGuy6+5rTrhlPOrE09dP9ovPDlurYBbCIrdc/+j1UWX3VAd9+1zq9POvKT6xS9/W4XXgc7a7n9o6+h4Nz82O8B3x6aHRt+H85jenn/xtdF3d9798PRXh/w7aB2DhWdfcM0h+8Yvf3PPw9W7730Y/6xKWtSOxUUGAxhok4FofG3OYWwMYwADXWSAP+K2i9w6Zty2zQBvxFjbjBkfYwJuAm4CbnzAtaD/DLin7H+NrWM1xsBiDPDHxXTDG90w0F0Gou9p0xRYOOD28/+5qQ5RTYe7rt54x+i7o4//cfX/Lv7F6Pcjjv7eiiOKYbJ/OeYHE989+8LBgNz0k9+u+N9f1fOG16N+4UtH1X/HUFdoN1x1y8SY8Y8QUFt/ycEA2Pg+479/89T1cZe6feudd6sj1502c76wbzi2Awc+q/uHX9adcOao/6wg4HMvvlqPFfad3kLwLowbnsaWuoWn58XzCNp8tGt36q4T/Zhgd01Q7dQOA/MxEM2PbvPpRi96YaD/DPDH/tfYOlZjDMzPAG+cXzOc0QwD8zEg4CbgJuA235rhMfTqIgPuKXHbRW4dM26XwQB/xNkyODMHzkpiIPqeNk2BhQNuTzz9fB2kuv6WTROz/edJZ4++u/TqW6oHNj9Z99uz5+BT2kLwKgaxNlx588T+qQG3uH8IcoUwXQzMxc/D096mtzPOu6KeN+wXAmin/WRDHUSL+04H3HZ/vKcKIbX4fQjlrf/5ddXpZ1068Xl4Hej4dtV1t9f7vPHWjvGvqosuv7H+7h/+9TsT34U/vviVdaPvf3DWZSu+W+2D19/cUY+Z+vS2WWOVtKgdi4sMBjDQJgPRA9ucw9gYxgAGusgAf8RtF7l1zLhtmwHeiLG2GTM+xgTcBNwE3PiAa0H/GXBP2f8aW8dqjIHFGOCPi+mGN7phoLsMRN/TpimwcMAtPAktPj1tPAwWniAWPw+vCg3BsBgK2/zowdd03vvg4/XnISw3vqUG3EIA7O77tkw8NS2E7eJ802Gz8MrP+F0IlL2z473xaUfH+hd/d+yoz/g5hU7jr/2cDvR98OFHE092e3ts3PHXtf7qN7+fmO+vjviv+njCcb359h/r78OxxWP97T2b68/X+iW8EjXuF55wt+jGBLtrgmqndhiYj4Hok3SbTzd60QsD/WeAP/a/xtaxGmNgfgZ44/ya4YxmGJiPAQE3ATcBt/nWDI+hVxcZcE+J2y5y65hxuwwG+CPOlsGZOXBWEgPR97RpCiwccAvDf+PEs0ZhqvBks88++/zVnOOBrn379o+O4h+/dsqo33jg7NwLr62DWCEsN76lBtw+3vPJ+G6j38eDdyedfsHE9+HvGP569fW3J76Lf8SnwI0H3MJ5xP2mX6ca9xs/5nBucQuvLI2Bv/C0t7iFp8vFMeOT4cLT3uJ2172P1N+//8HO+PGabXw9bBj7kcf/sGb/1TqUtKgdi4sMBjDQJgPRB9ucw9gYxgAGusgAf8RtF7l1zLhtmwHeiLG2GTM+xgTcBNwE3PiAa0H/GXBP2f8aW8dqjIHFGOCPi+mGN7phoLsMRN/TpilwWAG3a2/4TR3CenH766MZr7n+jtFnXzvujPoI1l+ycfTZketOqz+LQbKvf/On9Wfxl/Gw2ENbnoofj9or/vdX9ZwhPDZri4G6fzv2hxNfxyemhVeorrbF4xoPuI2H0W698/7Vdq3++qhvjY5t+pzCWCFw9uWvHl/ve9k1t44+O+GU86uoTzjuuIXXi4Z9wpjzbD//n5tG+4V9n3zmhXl2nejLBLtrgmqndhiYj4FofnSbTzd60QsD/WeAP/a/xtaxGmNgfgZ44/ya4YxmGJiPAQE3ATcBt/nWDI+hVxcZcE+J2y5y65hxuwwG+CPOlsGZOXBWEgPR97RpChxWwC2E2kKQKvxcd9NvRzMee/I5o783XHlzfQQhpBb77dmzt9q16+P67/DEsentcANuIVwX5hsPuIUnu8VjOO+//3d6yvrvWQG3BzY/We+7Zeu2uu/0L/Hcw2tOx7fwqtA49/sffjT6Ks5z590PV+NPvXvrnXdH38eQXgi6zbP95p6H67kOFcZba8ySFrVjcZHBAAbaZCD6YZtzGBvDGMBAFxngj7jtIreOGbdtM8AbMdY2Y8bHmICbgJuAGx9wLeg/A+4p+19j61iNMbAYA/xxMd3wRjcMdJeB6HvaNAUOK+AWXksaX695/HfPq8Zfx/n4U8/VRxBCbTHgtfmxZ6r7H9pa/x3CbNPb4Qbc1p1w5mj88YDbjnc/qOe84trbpqes/47Bs/EnuP3qN7+v9w1htNW2+ArUoMn49tob79T733P/o9Wbb/+x/ju+ZvXP/+aY0WfhCXh7P91Xfx+0mmd77sVX633X//y6eXad6MsEu2uCaqd2GJiPgWh+dJtPN3rRCwP9Z4A/9r/G1rEaY2B+Bnjj/JrhjGYYmI8BATcBNwG3+dYMj6FXFxlwT4nbLnLrmHG7DAb4I86WwZk5cFYSA9H3tGkKHFbALUzx7R9cNApUfeFLR008iSyEtMa3I47+3qjfhZfeUIUnqIXAW9hn1mtG2wi4Hc4T3EJYLwb0DhU4i0+OCyG56e2LX1k3GuPcC6+t4mtcj/v2uXW3EEYLcwSdnnj6+Xq+Xbv31H1Sfvnkk4NhwvBK1nDei2wlLWrH4iKDAQy0yUD0yDbnMDaGMYCBLjLAH3HbRW4dM27bZoA3YqxtxoyPMQE3ATcBNz7gWtB/BtxT9r/G1rEaY2AxBvjjYrrhjW4Y6C4D0fe0aQocdsAtvAYzhr9OOePi0e/jT06Lh3HBhutH3x257rTqH/71O6PfTzjl/Pj1RNtGwC1MEENm4Qlvq22znuA2/vS3S8ZevTo+RgiSxafZnXjq+vGvRr+f9pMNo3MOrx4NGgTNbr/rwbrf+GtKf7r+qtH3Iey2yPYvx/xgtH+Y4xe//PzVsWuN886O96rwpL24McHumqDaqR0G5mOA782nF77ohYHhMM
Afh1Nr61qtMZDOAG9M1wpXtMLAYgwIuAm4CbgttnZ4Dt26xIB7Srx2iVfHitdlMsAf8bZM3syFtxIYiL6nTVPgsANub+94rw5TxaDbxVf8csXs4dWk8fvY3nDr3Sv6hQ/aCrj950ln18fw+ps7VswdziWG1MZfURpexfoXf3fsaN/wfXhK2vR2y+331WNfd9PKUNkdmx6qv4/nv/vjyaezxdeUxu9DKHCR7aXtb9RzhafkPfjIU6sOE4J5V/7i16P+257bXvcrYTE7BhcVDGBgGQxE41vGXObANAYw0CUG+CNeu8SrY8XrshjgjVhbFmvmGS5rAm4CbgJuw13/vH84tXdPOZxaW9dqjYH5GOCP8+mFL3phoPsMRN/Tpilw2AG3ME0Mf8Vg1qNbt62YffzVmbHfK6+9taJf+KCtgNtjTzxbB79CmOyBzU9W7773YfXkMy9UF112Q/1dOL7xgFs4prvv21J/H56s9uZbfxwdewiIhSexxXMKWuzbt/K1oO+9v7PuE/oee/I5K849vqY0jrVlho4rdlrlg59t2DgxX3iCXBjvgw8/qj7cuasKNbr2ht9Uf33Ut+p+Am7dN0AXMTXEwPwMRBul3fza0YxmGOg3A/yx3/W1ftUXA4sxwBsX0w1vdMNAOgMCbgJuAm7p64W30KqrDLinxG5X2XXc2G2bAf6IsbYZMz7GSmMg+p42TYFGAm5nnHdFHZIK4axP9n46c/Z/+o/v1/3C60JX29oKuIX5xp/iFoNks9rpgFvYN7x6dLxvfNrb+GebH31mtdOq/uqI/6r3H389adxh/DWlYcy9n+6LX83dfvrpvhWhvfHjnPX79lffrOcpbWE7HhcbDGCgLQai8bU1vnGxiwEMdJUB/ojdrrLruLHbJgO8EV9t8mVsfAUGBNwE3ATceIHrQf8ZcE/Z/xpbx2qMgcUY4I+L6YY3umGguwxE39OmKdBIwO2e+x+tg1shxLbadtHlN9b9vvuji1frVj3/4mt1v+nA2FXX3V5/d+DAZzPH+Po3fzrq87XjzljxfXji2nQgL4S9vvzV40dPNfuHf/3OaN8QZpu1bbx5UxVe+zkdEPvHr51Svfr627N2qT8762dX1/vt2j35etLYKb6mdNaxxz7ztOEpeSHUN+uYwzn87T+fXF1x7W3V+x9+NDEsE+yuCaqd2mFgPgai+dFtPt3oRS8M9J8B/tj/GlvHaoyB+RngjfNrhjOaYWA+BgTcBNwE3OZbMzyGXl1kwD0lbrvIrWPG7TIY4I84WwZn5sBZSQxE39OmKdBIwC1tqrJ67f54zyjQdv9DW6sd734w18GFYF0Is4VgX3jt6c6Pds+1f67O4fWkW596fvRq1meeffmQx13SonYsLjIYwECbDERPbnMOY2MYAxjoIgP8Ebdd5NYx47ZtBngjxtpmzPgYE3ATcBNw4wOuBf1nwD1l/2tsHasxBhZjgD8uphve6IaB7jIQfU+bpsBgA25p8gy3FxPsrglL7rkaAAAgAElEQVSqndphYD4GotPTbT7d6EUvDPSfAf7Y/xpbx2qMgfkZ4I3za4YzmmFgPgYE3ATcBNzmWzM8hl5dZMA9JW67yK1jxu0yGOCPOFsGZ+bAWUkMRN/Tpikg4Jam0+B6lbSoHYuLDAYw0CYD0eDbnMPYGMYABrrIAH/EbRe5dcy4bZsB3oixthkzPsYE3ATcBNz4gGtB/xlwT9n/GlvHaoyBxRjgj4vphje6YaC7DETf06YpIOCWptPgejHB7pqg2qkdBuZjIBo83ebTjV70wkD/GeCP/a+xdazGGJifAd44v2Y4oxkG5mNAwE3ATcBtvjXDY+jVRQbcU+K2i9w6ZtwugwH+iLNlcGYOnJXEQPQ9bZoCAm5pOg2uV0mL2rG4yGAAA20yEA2+zTmMjWEMYKCLDPBH3HaRW8eM27YZ4I0Ya5sx42NMwE3ATcCND7gW9J8B95T9r7F1rMYYWIwB/riYbnijGwa6y0D0PW2aAgJuaToNrhcT7K4Jqp3aYWA+BqLB020+3ehFLwz0nwH+2P8aW8dqjIH5GeCN82uGM5phYD4GBNwE3ATc5lszPIZeXWTAPSVuu8itY8btMhjgjzhbBmfmwFlJDETf06YpIOCWptPgepW0qB2LiwwGMNAmA9Hg25zD2BjGAAa6yAB/xG0XuXXMuG2bAd6IsbYZMz7GBNwE3ATc+IBrQf8ZcE/Z/xpbx2qMgcUY4I+L6YY3umGguwxE39OmKSDglqbT4Hoxwe6aoNqpHQbmYyAaPN3m041e9MJA/xngj/2vsXWsxhiYnwHeOL9mOKMZBuZjQMBNwE3Abb41w2Po1UUG3FPitovcOmbcLoMB/oizZXBmDpyVxED0PW2aAgJuaToNrldJi9qxuMhgAANtMhANvs05jI1hDGCgiwzwR9x2kVvHjNu2GeCNGGubMeNjTMBNwE3AjQ+4FvSfAfeU/a+xdazGGFiMAf64mG54oxsGustA9D1tmgICbmk6Da4XE+yuCaqd2mFgPgaiwdNtPt3oRS8M9J8B/tj/GlvHaoyB+RngjfNrhjOaYWA+BgTcBNwE3OZbMzyGXl1kwD0lbrvIrWPG7TIY4I84WwZn5sBZSQxE39OmKSDglqbT4HqVtKgdi4sMBjDQJgPR4Nucw9gYxgAGusgAf8RtF7l1zLhtmwHeiLG2GTM+xgTcBNwE3PiAa0H/GXBP2f8aW8dqjIHFGOCPi+mGN7phoLsMRN/Tpikg4Jam0+B6McHumqDaqR0G5mMgGjzd5tONXvTCQP8Z4I/9r7F1rMYYmJ8B3ji/ZjijGQbmY0DATcBNwG2+NcNj6NVFBtxT4raL3Dpm3C6DAf6Is2VwZg6clcRA9D1tmgICbmk6Da5XSYvasbjIYAADbTIQDb7NOYyNYQxgoIsM8EfcdpFbx4zbthngjRhrmzHjY0zATcBNwI0PuBb0nwH3lP2vsXWsxhhYjAH+uJhueKMbBrrLQPQ9bZoCAm5pOg2uFxPsrgmqndphYD4GosHTbT7d6EUvDPSfAf7Y/xpbx2qMgfkZ4I3za4YzmmFgPgYE3ATcBNzmWzM8hl5dZMA9JW67yK1jxu0yGOCPOFsGZ+bAWUkMRN/Tpikg4Jam0+B6lbSoHYuLDAYw0CYD0eDbnMPYGMYABrrIAH/EbRe5dcy4bZsB3oixthkzPsYE3ATcBNz4gGtB/xlwT9n/GlvHaoyBxRjgj4vphje6YaC7DETf06YpIOCWptPgejHB7pqg2qkdBuZjIBo83ebTjV70wkD/GeCP/a+xdazGGJifAd44v2Y4oxkG5mNAwE3ATcBtvjXDY+jVRQbcU+K2i9w6ZtwugwH+iLNlcGYOnJXEQPQ9bZoCAm5pOg2uV0mL2rG4yGAAA20yEA2+zTmMjWEMYKCLDPBH3HaRW8eM27YZ4I0Ya5sx42NMwE3ATcCND7gW9J8B95T9r7F1rMYYWIwB/riYbnijGwa6y0D0PW2aAgJuaToNrhcT7K4Jqp3aYWA+BqLB020+3ehFLwz0nwH+2P8aW8dqjIH5GeCN82uGM
5phYD4GQsCtTz8vvb67Cj99Oqe+n8vLb8zHrDVOLwzMz4B7yvk1wxnNMDAMBvjjMOpsPaszBg4yEH1Pm6aAgFuaToPrxVQOmgotaIGBfjMQDV6d+11n9VVfDMzPAH+cXzOc0QwD/WeAN/a/xtaxGmOgWQb2H/isCj90bVZXetITA91mwD1lt+tn/akfBtpjgD+2py1uaYuBMhmIvqdNU0DALU2nwfVicGUanLqoCwaaZyAaPG2b15amNMVAtxngj92un/WnfhhohwHe2I6ueKUrBvrLAN/sb22tW7XFwOIM8MbFtcMd7TDQbwb4Y7/ra/2qLwZWMhB9T5umgIBbmk6D68VcVpoLTWiCgX4yEA1efftZX3VVVwwszgB/XFw73NEOA/1lgDf2t7bWrdpioB0G+GY7uuKVrhjoNgO8sdv1s/7UDwPtMcAf29MWt7TFQJkMRN/Tpikg4Jam0+B6MbgyDU5d1AUDzTMQDZ62zWtLU5pioNsM8Mdu18/6Uz8MtMMAb2xHV7zSFQP9ZYBv9re21q3aYmBxBnjj4trhjnYY6DcD/LHf9bV+1RcDKxmIvqdNU0DALU2nwfViLivNhSY0wUA/GYgGr779rK+6qisGFmeAPy6uHe5oh4H+MsAb+1tb61ZtMdAOA3yzHV3xSlcMdJsB3tjt+ll/6oeB9hjgj+1pi1vaYqBMBqLvadMUEHBL02lwvRhcmQanLuqCgeYZiAZP2+a1pSlNMdBtBvhjt+tn/akfBtphgDe2oyte6YqB/jLAN/tbW+tWbTGwOAO8cXHtcEc7DPSbAf7Y7/pav+qLgZUMRN/Tpikg4Jam0+B6MZeV5kITmmCgnwxEg1ffftZXXdUVA4szwB8X1w53tMNAfxngjf2trXWrthhohwG+2Y6ueKUrBrrNAG/sdv2sP/XDQHsM8Mf2tMUtbTFQJgPR97RpCgi4pek0uF4MrkyDUxd1wUDzDESDp23z2tKUphjoNgP8sdv1s/7UDwPtMMAb29EVr3TFQH8Z4Jv9ra11q7YYWJwB3ri4drijHQb6zQB/7Hd9rV/1xcBKBqLvadMUEHBL02lwvZjLSnOhCU0w0E8GosGrbz/rq67qioHFGeCPi2uHO9phoL8M8Mb+1ta6VVsMtMMA32xHV7zSFQPdZoA3drt+1p/6YaA9Bvhje9rilrYYKJOB6HvaNAUE3NJ0GlwvBlemwamLumCgeQaiwdO2eW1pSlMMdJsB/tjt+ll/6oeBdhjgje3oile6YqC/DPDN/tbWulVbDCzOAG9cXDvc0Q4D/WaAP/a7vtav+mJgJQPR97RpCgi4pek0uF7MZaW50IQmGOgnA9Hg1bef9VVXdcXA4gzwx8W1wx3tMNBfBnhjf2tr3aotBtphgG+2oyte6YqBbjPAG7tdP+tP/TDQHgP8sT1tcUtbDJTJQPQ9bZoCAm5pOg2uF4Mr0+DURV0w0DwD0eBp27y2NKUpBrrNAH/sdv2sP/XDQDsM8MZ2dMUrXTHQXwb4Zn9ra92qLQYWZ4A3Lq4d7miHgX4zwB/7XV/rV30xsJKB6HvaNAUE3NJ0Glwv5rLSXGhCEwz0k4Fo8Orbz/qqq7piYHEG+OPi2uGOdhjoLwO8sb+1tW7VFgPtMMA329E1F6+vvLW7evE1PzTIy0Au/puclzf2yxubZMNY2Bg6A/zRGhj6GnD+w1sD0fe0aQoIuKXpNLhezHN45qnmaj5UBqLBD/X8nbe1jwEMrMYAf8TGamz4HBtDZoA34n/I/Dt3/C/CAN/sFzch4PZ/T9xZHXmMHxrkYWDzk7uqRbyotH14Y7+8sTS+HA++uswAf8Rvl/l17PhdhIHoe9o0BQTc0nQaXK9FFp99mDYGMNBFBqLBd/HYHbM1hwEMtMkAf8RXm3wZG19dZYA3Yrer7Dpu7OZigG/2iz0BtzyhLmG6g7oLuPXLU3Jdm8yLIwyUy4B7x3JrY92oDQbaYSD6njZNAQG3NJ0G14tBtWNQdKUrBspjIBq82pRXGzVREwzkZYA/5tUf//THQJkM8MYy62K9qAsGymWAb5Zbm0XWjYDbwaCV0FkeLQTc+uUpi/iQfTCAgX4z4N6x3/W1ftUXAysZiL6nTVNAwC1Np8H1Yi4rzYUmNMFAPxmIBq++/ayvuqorBhZngD8urh3uaIeB/jLAG/tbW+tWbTHQDgN8sx1dc/Eq4JYn1CVMd1B3Abd+eUouLzMvjjBQLgPuHcutjXWjNhhoh4Hoe9o0BQTc0nQaXC8G1Y5B0ZWuGCiPgWjwalNebdRETTCQlwH+mFd//NMfA2UywBvLrIv1oi4YKJcBvllubRZZNwJuB4NWQmd5tBBw65enLOJD9sEABvrNgHvHftfX+lVfDKxkIPqeNk0BAbc0nQbXi7msNBea0AQD/WQgGrz69rO+6qquGFicAf64uHa4ox0G+ssAb+xvba1btcVAOwzwzXZ0zcWrgFueUJcw3UHdBdz65Sm5vMy8OMJAuQy4dyy3NtaN2mCgHQai72nTFBBwS9NpcL0YVDsGRVe6YqA8BqLBq015tVETNcFAXgb4Y1798U9/DJTJAG8ssy7Wi7pgoFwG+Ga5tVlk3Qi4HQxaCZ3l0ULArV+esogP2QcDGOg3A+4d+11f61d9MbCSgeh72jQFBNzSdBpcL+ay0lxoQhMM9JOBaPDq28/6qqu6YmBxBvjj4trhjnYY6C8DvLG/tbVu1RYD7TDAN9vRNRevAm55Ql3CdAd1F3Drl6fk8jLz4ggD5TLg3rHc2lg3aoOBdhiIvqdNU0DALU2nwfViUO0YFF3pioHyGIgGrzbl1UZN1AQDeRngj3n1xz/9MVAmA7yxzLpYL+qCgXIZ4Jvl1maRdSPgdjBoJXSWRwsBt355yiI+ZB8MYKDfDLh37Hd9rV/1xcBKBqLvadMUEHBL02lwvZjLSnOhCU0w0E8GosGrbz/rq67qioHFGeCPi2uHO9phoL8M8Mb+1ta6VVsMtMMA32xH11y8CrjlCXUJ0x3UXcCtX56Sy8vMiyMMlMuAe8dya2PdqA0G2mEg+p42TQEBtzSdBteLQbVjUHSlKwbKYyAavNqUVxs1URMM5GWAP+bVH//0x0CZDPDGMutivagLBsplgG+WW5tF1o2A28GgldBZHi0E3PrlKYv4kH0wgIF+M+Desd/1tX7VFwMrGYi+p01TQMAtTafB9WIuK82FJjTBQD8ZiAavvv2sr7qqKwYWZ4A/Lq4d7miHgf4ywBv7W1vrVm0x0A4DfLMdXXPxKuCWJ9QlTHdQdwG3fnlKLi8zL44wUC4D7h3LrY11ozYYaIeB6HvaNAUE3NJ0GlwvBtWOQdGVrhgoj4Fo8GpTXm3URE0wkJcB/phXf/zTHwNlMsAby6yL9aIuGCiXAb5Zbm0WWTcCbgeDVkJnebQQcOuXpyziQ/bBAAb6zYB7x37X1/pVXwysZCD6njZNAQG3NJ0G14u5rDQXmtAEA/1kIBq8+vazvuqqrhhYnAH+uLh2uKMdBvrLAG/sb22tW7XFQDsM8M12dM3Fq4BbnlCXMN1B3QXc+uUp
ubzMvDjCQLkMuHcstzbWjdpgoB0Gou9p0xQQcPv/ddr86DPVRZffWF113e1pyvW8F4Nqx6DoSlcMlMdAtHO1Ka82aqImGMjLAH/Mqz/+6Y+BMhngjWXWxXpRFwyUywDfLLc2i6wbAbeDQSuhszxaCLj1y1MW8SH7YAAD/WbAvWO/62v9qi8GVjIQfU+bpsBcAbfb73qwWnfCmaOfN97aMXOGAwcOVN848axRnzPOu2Jmn/DhY088W4+1/dU3637HnnxO9eWvHl8due60+rNl/HLuhddWf/Knf1994UtHLWO6VubY9vwrtaaxTl//5k+rE09dX51+1qXVRZfdUN3/0NZq//79a87PXFaaC01ogoF+MhANUX37WV91VVcMLM4Af1xcO9zRDgP9ZYA39re21q3aYqAdBvhmO7rm4lXALU+oS5juoO4Cbv3ylFxeZl4cYaBcBtw7llsb60ZtMNAOA9H3tGkKzBVwC+GoEAILPzfees/MGbY9t73uE/rt2bN3Zr+zL7im7rf74z11n7864r9Gny87aNaHgNsDm5+sNY11mtX+2V/+e3XBhuurTz6ZXZtQDAbVjkHRla4YKI+BeAFSm/JqoyZqgoG8DPDHvPrjn/4YKJMB3lhmXawXdcFAuQzwzXJrs8i6EXA7GLQSOsujhYBbvzxlER+yDwYw0G8G3Dv2u77Wr/piYCUD0fe0aQrMFXDbtXtPHaD61vcvmDnD1RvvqPuEcNVDW56a2e9v//nkUb8jjv7exPfhqW/hqWMnnT57/InODf7Rt4Dbcd8+t1r/8+uqs352dfXtH1xUHX38j0dPpxsPvAWd9+2b/TQ35rLSXGhCEwz0k4F4KVHfftZXXdUVA4szwB8X1w53tMNAfxngjf2trXWrthhohwG+2Y6uuXgVcMsT6hKmO6i7gFu/PCWXl5kXRxgolwH3juXWxrpRGwy0w0D0PW2aAnMF3MKQIZAWQlLhKWCztvB60vEQ1c82bFzRbedHu+s+4UliJWx9C7iFp+1Nb59+uq/adN+W0StgY43C60sPHPhsuqsnuP2xHYNi/HTFQHkMRANUm/JqoyZqgoG8DPDHvPrjn/4YKJMB3lhmXawXdcFAuQzwzXJrs8i6EXA7GLQSOsujhYBbvzxlER+yDwYw0G8G3Dv2u77Wr/piYCUD0fe0aQrMHXC76PIb63Da9lffnJjlwIED9VPCYhBu+gltYYd7H3y8HmPL1m0TYzz+1HPV7Xc9WD386NMTn4dQXPg8/Lz/4UejJ4/dfd+W6vyLr6v+86Szqx+f9z/VHZseqj77bGVYa3ygt3e8V228eVP13R9dXB178jnVT9ZfNQp9hTaEvg71atStTz1fXfG/vxo9XS4Ewy675tZq+vjjXM88+3J9vPGz8faRx/8w+v7Fl18f/3j0+3vv76z3/XDnrhXfr/bB+CtKZwXc4n5h/C9/9fi6Bj9df1X8qm6Zy0pzoQlNMNBPBqLxqW8/66uu6oqBxRngj4trhzvaYaC/DPDG/tbWulVbDLTDAN9sR9dcvAq45Ql1CdMd1F3ArV+eksvLzIsjDJTLgHvHcmtj3agNBtphIPqeNk2BuQNujz3xbB2Muv6WTROzbHv+lfq7e+5/tP59z569E/3OufCa+ru9n+6b+C6EzkLQbDoY9/S2l+p9Lr7ilxMBrfg0stCG127u3z/7tZt33v1wHcAb32f891kBt08+2Vv94KzL6vnH+4ffw+tUd3+8Z+I8rrj2trr/q6+/PfFdOL4wT9g3HO/0dsOtd9f7vrPjvemvV/07NeAWBtjxx/erL35lXT3PdJCOQbVjUHSlKwbKYyCaqtqUVxs1URMM5GWAP+bVH//0x0CZDPDGMutivagLBsplgG+WW5tF1o2A28GgldBZHi0E3PrlKYv4kH0wgIF+M+Desd/1tX7VFwMrGYi+p01TYO6AW3jNZQx4haeYjW9Xb7xj9N3Rx/+42vX/tXeuT1dUd9r+k+b7fHlrvqRqaqqmJpWaqkwyyZSVw2QmFpM4pjQxJsZI1BgMRg0qongWFQURPKKIp6gISoKIBxQEQfGAHARB6LfuZn79rN27u5/Vp92ru6+ugn5279Xdq691971X97736qPHknLp0di+9cNfxe/9+GdXuKvHf/sE3Gz/mn9vwcL4n7tszaPPTm33yY2bkvrYeguvXB6df/Gf48et2vpZATeF0Ox9jXym0d4WX3d39M3v/zJZrnq4k0Zws3XWPvac+1Y86pu9p/nhI0cn3r/o8qXxuv/87XMnls/3okzATdu6dtl9SR1Xr9s4sXnMZdpcYAITNDBMDZj50b7DbF/alXZFA9U1gD9WZ4fuYIcGhqsBvHG4bct5S9uigXY0gG+2w7UrvRJw6ybURZhujjsBt2F5Sldexn7RERoIVwP0HcNtG84b2gYNtKMB8z3mfgRKB9y0WQt8/cO//PfEI0H1qFAFtpbftTbe+3/8+OL49fXLH0hqozCXhbtuuftMueTNKIofG6r3i0Zw0/u3rXg4Ov7liWRVd/S4dNhMI7DZaGWqsx6D6k4aUc1CZemAm8J5Vl+VUcDPJq13xTW3J+8rRGeT+7jWhYtutsXxXI9TtW1q/vATL0y8r2CblmvbZaayAbennpkL/V11/YqJXWFQ7RgUXOGKBsLTgJkfbRNe29AmtAka6FYD+GO3/NE//NFAmBrAG8NsF84X2gUNhKsBfDPctqly3hBwmwtaETrrhgUBt2F5ShUfYh00gAaGrQH6jsNuX85f2hcNTGvAfI+5H4FKATcbqU0hrJ279sZ7cgNdW7e9HS9bctPKOKjlBs6ef2lrEu7a9sbOqVr6jOCmbWRNtq5CbO6k0cksUJYeTc3KXX3DPXGZdMBNo8xpXS3XqHTpSY9Y1f5URiPTudN5v7kmXu6OxHbq1OmkvIXufnrB4mS1zw4eitfR9p569pVkuc8fZQNu+/Z/nOwrPRof5jJtLjCBCRoYpgbMX2nfYbYv7Uq7ooHqGsAfq7NDd7BDA8PVAN443LblvKVt0UA7GsA32+HalV4JuHUT6iJMN8edgNuwPKUrL2O/6AgNhKsB+o7htg3nDW2DBtrRgPkecz8ClQJub+18PwlGrVzzVLwndwQ1G+XsxVe2JeWOHfsyLucGyTQCWnqykFrRCG5btu5Irxa//vOyM4E6hcPcyUZMU0gta58q69bLXdfCaxdfscxdPPG3Hleqferf6dOnk/fuf2hDsvyTTz+Pl/9t+zvxMm3XDfsdOfJF/P4zf3k1WefzQ0eSbfn8UTbgdvDzw8m+9KhWd8Kg2jEouMIVDYSnAfM+2ia8tqFNaBM00K0G8Mdu+aN/+KOBMDWAN4bZLpwvtAsaCFcD+Ga4bVPlvCHgNhe0InTWDQsCbsPylCo+xDpoAA0MWwP0HYfdvpy/tC8amNaA+R5zPwKVAm4KcVnwy4JRKx54PA5LacQzm744djwJUG3asj1erFHOFATT40yzpjoBNz0a1YJm7rb/69zL4+Xf+dFF7uKJv7MCbm79l962eqK8+2LFqieS/X504NPkrd179if
LNzy3OV5u+7n8qlvjsJ1Cd6rzI+vPPKZ0yc33x6/To8ElGy34o2zAbdOr25P6KQToTpjLtLnABCZoYJgaMO+jfYfZvrQr7YoGqmsAf6zODt3BDg0MVwN443DblvOWtkUD7WgA32yHa1d6JeDWTaiLMN0cdwJuw/KUrryM/aIjNBCuBug7hts2nDe0DRpoRwPme8z9CFQKuGnTeqSlglkKaCnwZsG0m+98aGLPGolN5W64ZVWkUcosgKZQWNZk26kygtttKx5Otu9uW48I1X4XnL/IXTzxtwXPdDw2fXjg02R7eixr3uQ+AvXdXR9MFLPHkGqUN03/9G/nxNtUuEzTwkU3x6913Jr+85zL4teqT9mpbMDtrpWPJceXbg8Mqh2Dgitc0UB4GjCvpW3CaxvahDZBA91qAH/slj/6hz8aCFMDeGOY7cL5QruggXA1gG+G2zZVzhsCbnNBK0Jn3bAg4DYsT6niQ6yDBtDAsDVA33HY7cv5S/uigWkNmO8x9yNQOeC25tFnk3DUm+/sjoNuCpG9+tc3J/Z83fIzI5J9b8HCiUdyvvPunoly9qKNgJuFxsqO4KbHmVogryhw5o4cZ48atePRo021DY3IZo9xdR+V6obS9EhS25+Wl53cbenxp0WTju0HP/ldsj935Dmth7lMmwtMYIIGhqkB80rad5jtS7vSrmigugbwx+rs0B3s0MBwNYA3DrdtOW9pWzTQjgbwzXa4dqVXAm7dhLoI081xJ+A2LE/pysvYLzpCA+FqgL5juG3DeUPboIF2NGC+x9yPQOWA2779HyfhKAtxKZz15YmTE3vWo0kttHXFNbfHf+vxphr1LWtqI+D2m9+fCZkpWJaun9UhawQ3vWejv+kxp3mTHtOqY9RxpafHnnoxOf4rl9wV/33p4luSYgqaqV5aX6O8GSs9HrXsVCbgdsmim5J9ZT0uFoNqx6DgClc0EJ4GzGtpm/DahjahTdBAtxrAH7vlj/7hjwbC1ADeGGa7cL7QLmggXA3gm+G2TZXzhoDbXNCK0Fk3LAi4DctTqvgQ66ABNDBsDdB3HHb7cv7SvmhgWgPme8z9CFQOuGnz9rhNC2VppLT0dOzYl0mQysrp8aZ5UxsBNz1e1Pa97onnp3atOurxpSrjPqJUBRdeuTxZd8dbu6bWfX/vh8n7CrqlpwOfHEzetzq8vOX1iWL2mFJ7X6PdVZl8Am6ffX44sqCdHW969DbtG3OZNheYwAQNDFMD5re07zDbl3alXdFAdQ3gj9XZoTvYoYHhagBvHG7bct7StmigHQ3gm+1w7UqvBNy6CXURppvjTsBtWJ7SlZexX3SEBsLVAH3HcNuG84a2QQPtaMB8j7kfgVoBt8sW3zoR3rrxtgcz9+o+ClOhqtXrNmaW08I2Am5Hjh5LRkmz/e//6JNo56690QNrN0Rf+8aC5DjSATeFvyx4phHa9AhWG31u+5vvTYT8du/Zn3lcX//ueck2tH2N2uZObjBN+1p622r3be+/3e3cdMeaaNOr2yMte/iJF6K7Vj4WKVhox6K56vLiK9syt49BtWNQcIUrGghPA2aCtE14bUOb0CZooFsN4I/d8kf/8EcDYWoAbwyzXThfaBc0EK4G8M1w26bKeUPAbS5oReisGxYE3IblKVV8iHXQABoYtgboOw67fTl/aV80MK0B8z3mfgRqBdyeembTRGBqy9YdmXtdeuuqiXJ7Pvgos5wWthFw03bdUdzckFf673TATeuuXPPkRP1VRv/cdW+5e23uMbkjppQaxmcAACAASURBVLmPJ7UV3MeUapsK0VWZ3ICbW7esvzVK3M739ubuBnOZNheYwAQNDFMDZoS07zDbl3alXdFAdQ3gj9XZoTvYoYHhagBvHG7bct7StmigHQ3gm+1w7UqvBNy6CXURppvjTsBtWJ7SlZexX3SEBsLVAH3HcNuG84a2QQPtaMB8j7kfgVoBt4OfH54IeR3/8kTmXjdv3ZGU02hpRdPPfn11XDb9mE6NlmZBrbwA2O33PpKUydrHQ48+m7xv21JQbdW6p6Orb7gnfk+jtGVN2mf6kazaho5HwbKi6Zm/vJrsN2/ENPcxpSdOnCzaXO572rYdV3qu4zrr7N9Gi6+7Ox65Lncj//cGBtWOQcEVrmggPA2YH9I24bUNbUKboIFuNYA/dssf/cMfDYSpAbwxzHbhfKFd0EC4GsA3w22bKucNAbe5oBWhs25YEHAblqdU8SHWQQNoYNgaoO847Pbl/KV90cC0Bsz3mPsRqBVw89tFWKVOnvwq2vHWrmjDc5sjPVL01KnTpSr48acH48d6KtSmx5cOdcJcps0FJjBBA8PUgPk47TvM9qVdaVc0UF0D+GN1dugOdmhguBrAG4fbtpy3tC0aaEcD+GY7XLvSKwG3bkJdhOnmuBNwG5andOVl7BcdoYFwNUDfMdy24byhbdBAOxow32PuR2B0ATc/LJTCoNoxKLjCFQ2EpwFzfNomvLahTWgTNNCtBvDHbvmjf/ijgTA1gDeG2S6cL7QLGghXA/hmuG1T5bwh4DYXtCJ01g0LAm7D8pQqPsQ6aAANDFsD9B2H3b6cv7QvGpjWgPkecz8CBNz8OI2uFOYybS4wgQkaGKYGzOBp32G2L+1Ku6KB6hrAH6uzQ3ewQwPD1QDeONy25bylbdFAOxrAN9vh2pVeCbh1E+oiTDfHnYDbsDylKy9jv+gIDYSrAfqO4bYN5w1tgwba0YD5HnM/AgTc/DiNrhQG1Y5BwRWuaCA8DZjB0zbhtQ1tQpuggW41gD92yx/9wx8NhKkBvDHMduF8oV3QQLgawDfDbZsq5w0Bt7mgFaGzblgQcBuWp1TxIdZBA2hg2Bqg7zjs9uX8pX3RwLQGzPeY+xEg4ObHaXSlMJdpc4EJTNDAMDVgBk/7DrN9aVfaFQ1U1wD+WJ0duoMdGhiuBvDG4bYt5y1tiwba0QC+2Q7XrvRKwK2bUBdhujnuBNyG5SldeRn7RUdoIFwN0HcMt204b2gbNNCOBsz3mPsRIODmx2l0pTCodgwKrnBFA+FpwAyetgmvbWgT2gQNdKsB/LFb/ugf/mggTA3gjWG2C+cL7YIGwtUAvhlu21Q5bwi4zQWtCJ11w4KA27A8pYoPsQ4aQAPD1gB9x2G3L+cv7YsGpjVgvsfcjwABNz9OoyuFuUybC0xgggaGqQEzeNp3mO1Lu9KuaKC6BvDH6uzQHezQwHA1gDcOt205b2lbNNCOBvDNdrh2pVcCbt2EugjTzXEn4DYsT+nKy9gvOkID4WqAvmO4bcN5Q9uggXY0YL7H3I8AATc/TqMrhUG1Y1BwhSsaCE8DZvC0TXhtQ5vQJmigWw3gj93yR//wRwNhagBvDLNdOF9oFzQQrgbwzXDbpsp5Q8BtLmhF6KwbFgTchuUpVXyIddAAGhi2Bug7Drt9OX9pXzQwrQHzPeZ+BAi4+XEaXSnMZdpcYAITNDBMDZjB077DbF/alXZFA9U1gD9WZ4fuYIcGhqsBvHG4bct5S9uigXY0gG+2w7UrvRJw6ybURZhujjsBt2F5Sl
dexn7RERoIVwP0HcNtG84b2gYNtKMB8z3mfgQIuPlxGl0pDKodg4IrXNFAeBowg6dtwmsb2oQ2QQPdagB/7JY/+oc/GghTA3hjmO3C+UK7oIFwNYBvhts2Vc4bAm5zQStCZ92wIOA2LE+p4kOsgwbQwLA1QN9x2O3L+Uv7ooFpDZjvMfcjQMDNj9PoSmEu0+YCE5iggWFqwAye9h1m+9KutCsaqK4B/LE6O3QHOzQwXA3gjcNtW85b2hYNtKMBfLMdrl3plYBbN6EuwnRz3Am4DctTuvIy9ouO0EC4GqDvGG7bcN7QNmigHQ2Y7zH3I0DAzY/T6EphUO0YFFzhigbC04AZPG0TXtvQJrQJGuhWA/hjt/zRP/zRQJgawBvDbBfOF9oFDYSrAXwz3Lapct4QcJsLWhE664YFAbdheUoVH2IdNIAGhq0B+o7Dbl/OX9oXDUxrwHyPuR8BAm5+nEZXCnOZNheYwAQNDFMDZvC07zDbl3alXdFAdQ3gj9XZoTvYoYHhagBvHG7bct7StmigHQ3gm+1w7UqvBNy6CXURppvjTsBtWJ7SlZexX3SEBsLVAH3HcNuG84a2QQPtaMB8j7kfAQJufpxGVwqDaseg4ApXNBCeBszgaZvw2oY2oU3QQLcawB+75Y/+4Y8GwtQA3hhmu3C+0C5oIFwN4Jvhtk2V84aA21zQitBZNywIuA3LU6r4EOugATQwbA3Qdxx2+3L+0r5oYFoD5nvM/QgQcPPjNLpSmMu0ucAEJmhgmBowg6d9h9m+tCvtigaqawB/rM4O3cEODQxXA3jjcNuW85a2RQPtaADfbIdrV3ol4NZNqIsw3Rx3Am7D8pSuvIz9oiM0EK4G6DuG2zacN7QNGmhHA+Z7zP0IEHDz4zS6UhhUOwYFV7iigfA0YAZP24TXNrQJbYIGutUA/tgtf/QPfzQQpgbwxjDbhfOFdkED4WoA3wy3baqcNwTc5oJWhM66YUHAbVieUsWHWAcNoIFha4C+47Dbl/OX9kUD0xow32PuR4CAmx+n0ZXCXKbNBSYwQQPD1IAZPO07zPalXWlXNFBdA/hjdXboDnZoYLgawBuH27act7QtGmhHA/hmO1y70isBt25CXYTp5rgTcBuWp3TlZewXHaGBcDVA3zHctuG8oW3QQDsaMN9j7keAgJsfp9GVwqDaMSi4whUNhKcBM3jaJry2oU1oEzTQrQbwx275o3/4o4EwNYA3htkunC+0CxoIVwP4ZrhtU+W8UcBt5x7+waBbDVTRbmjr4I3D8sbQ9EV90FefNYA/ot8+65e6o98qGjDfY+5HgICbH6fRlapy8rEOpo0G0EAfNWAG38e6U2fOOTSABtrUAP6IvtrUF9tGX33VAN6IdvuqXeqNdrvSAL6J9rrSHvtFeyFrAG9EnyHrk7qhzy41gD+ivy71x77RXxcaMN9j7keAgJsfp9GV6uLkZZ98aKABNNCFBszgu9g3+0TzaAANhKwB/BF9hqxP6oY+u9IA3oj2utIe+0V7fdUAvol2+6pd6o1229QA3oi+2tQX20ZffdYA/oh++6xf6o5+q2jAfI+5HwECbn6cRleqysnHOpg2GkADfdSAGXwf606dOefQABpoUwP4I/pqU19sG331VQN4I9rtq3apN9rtSgP4JtrrSnvsF+2FrAG8EX2GrE/qhj671AD+iP661B/7Rn9daMB8j7kfAQJufpxGV6qLk5d98qGBBtBAFxowg+9i3+wTzaMBNBCyBvBH9BmyPqkb+uxKA3gj2utKe+wX7fVVA/gm2u2rdqk32m1TA3gj+mpTX2wbffVZA/gj+u2zfqk7+q2iAfM95n4ECLj5cRpdqSonH+tg2mgADfRRA2bwfaw7deacQwNooE0N4I/oq019sW301VcN4I1ot6/apd5otysN4JtoryvtsV+0F7IG8Eb0GbI+qRv67FID+CP661J/7Bv9daEB8z3mfgQIuPlxGl2pLk5e9smHBhpAA11owAy+i32zTzSPBtBAyBrAH9FnyPqkbuizKw3gjWivK+2xX7TXVw3gm2i3r9ql3mi3TQ3gjeirTX2xbfTVZw3gj+i3z/ql7ui3igbM95j7ESDg5sdpdKWqnHysg2mjATTQRw2Ywfex7tSZcw4NoIE2NYA/oq829cW20VdfNYA3ot2+apd6o92uNIBvor2utMd+0V7IGsAb0WfI+qRu6LNLDeCP6K9L/bFv9NeFBsz3mPsRIODmx2l0pbo4edknHxpoAA10oQEz+C72zT7RPBpAAyFrAH9EnyHrk7qhz640gDeiva60x37RXl81gG+i3b5ql3qj3TY1gDeirzb1xbbRV581gD+i3z7rl7qj3yoaMN9j7keAgJsfp9GVqnLysQ6mjQbQQB81YAbfx7pTZ845NIAG2tQA/oi+2tQX20ZffdUA3oh2+6pd6o12u9IAvon2utIe+0V7IWsAb0SfIeuTuqHPLjWAP6K/LvXHvtFfFxow32PuR4CAmx+n0ZXq4uRln3xooAE00IUGzOC72Df7RPNoAA2ErAH8EX2GrE/qhj670gDeiPa60h77RXt91QC+iXb7ql3qjXbb1ADeiL7a1BfbRl991gD+iH77rF/qjn6raMB8j7kfAQJufpxGV6rKycc6mDYaQAN91IAZfB/rTp0559AAGmhTA/gj+mpTX2wbffVVA3gj2u2rdqk32u1KA/gm2utKe+wX7YWsAbwRfYasT+qGPrvUAP6I/rrUH/tGf11owHyPuR8BAm5+nEZXqouTl33yoYEG0EAXGjCD72Lf7BPNowE0ELIG8Ef0GbI+qRv67EoDeCPa60p77Bft9VUD+Cba7at2qTfabVMDeCP6alNfbBt99VkDX506Helfn4+BunMOogE0UEYD1i9k7keAgJsfp9GVKnPSURaTRgNooM8aMIPv8zFQd85BNIAG2tAA/oiu2tAV20RXfdcA3oiG+65h6o+GZ60BfBPNzVpz7A/N9UEDTXvju3uPRjv38A8Gs9XAex/Mdn9DaN9d++oxq7t+HxjKz/C0ejppu53VPn34rKWOtFNfNGD9QuZ+BAi4+XEaXam+nPDUkw8nNIAG6mrADL7udlgfLaIBNDA0DeCPaHpomuZ40HQTGsAb0VETOmIb6GhMGsA30fuY9M6xondfDTTtjQobLLrucPS9cw7xrwQDhUB+vhBmVXXz0taj0c0r0F0ZfivXHYk2vHik8nm66tEj0ZPPV1+/TF0pizdkaeCCyw7HAUTfzzvK0TdCA/NrwPqFzP0IEHDz4zS6UpjN/GYDIxihgWFowAye9hxGe9KOtCMaaE4D+GNzLNElLNHAcDSANw6nLTkvaUs0MBsN4Juz4Yye4YwG+qWBpr2RgFu1IAoBt2rcLPRCwK08PwJu5ZmZ3piHwY6AW7/6G/QP+9Fe1i9k7keAgJsfp9GVwvD6YXi0E+2EBuprwAwelvVZwhCGaGBYGsAfh9WenJ+0JxpoRgN4YzMc0SMc0cB4NIBvjqetOa9pazTgr4GmvZGAW7XgBwG3atwsaETArTw/Am7lmZnemIfBjoCb/
2c9/SJY+WrA+oXM/QgQcPPjNLpSvicc5TBnNIAG+q4BM/i+Hwf151xEA2igaQ3gj2iqaU2xPTQ1BA3gjeh4CDrmGNDxLDWAb6K3WeqNfaG3vmigaW8k4FYt+EHArRo3CxoRcCvPj4BbeWamN+ZhsCPgRl+rL32tPtXT+oXM/QgQcPPjNLpSfTrpqSsfpmgADdTRgBl8nW2wLhpEA2hgiBrAH9H1EHXNMaHruhrAG9FQXQ2xPhoamwbwTTQ/Ns1zvGjeRwNNeyMBt2rBDwJu1bhZ0IiAW3l+BNzKMzO9MQ+DHQE3+jk+/RzKlNOJ9QuZ+xEg4ObHaXSlMJ5yxgMveKGB/mrADJ427G8b0na0HRpoRwP4Yztc0Stc0UC/NYA39rv9OP9oPzQwew3gm7Nnjs5hjgbC10DT3kjArVrwg4BbNW4WNCLgVp4fAbfyzExvzMNgR8At/D4G/cD+tZH1C5n7ESDg5sdpdKUwv/6ZH21Gm6GBahowg4dfNX5wgxsaGK4G8Mfhti3nLW2LBqprAG+szg7dwQ4NjFMD+OY4253znXZHA8UaaNobCbhVC34QcKvGzYJGBNzK8yPgVp6Z6Y15GOwIuBV/vtP/gU8VDVi/kLkfAQJufpxGV6rKycc6mDYaQAN91IAZfB/rTp0559AAGmhTA/gj+mpTX2wbffVVA3gj2u2rdqk32u1KA/gm2utKe+wX7YWsgaa9kYBbteAHAbdq3CxoRMCtPD8CbuWZmd6Yh8GOgBv9q5D7V32tm/ULmfsRIODmx2l0pfpqANSbD1Y0gAbKasAMvux6lEdraAANDF0D+CMaH7rGOT40XkUDeCO6qaIb1kE3Y9YAvon+x6x/jh3952mgaW8k4FYt+EHArRo3CxoRcCvPj4BbeWamN+ZhsCPgRt8mr2/D8urasH4hcz8CBNz8OI2uFCZU3YRgBzs00C8NmMHTbv1qN9qL9kID7WsAf2yfMTqGMRronwbwxv61GecZbYYGutUAvtktf/QPfzQQpgaa9kYCbtWCHwTcqnGzoBEBt/L8CLiVZ2Z6Yx4GOwJuYfYr6O/1u12sX8jcjwABNz9OoyuFEfbbCGk/2g8N+GvADB5m/sxgBSs0MA4N4I/jaGfOZ9oZDZTTAN5Yjhf6ghcaQAP4JhrAB9AAGpjWQNPeSMCtWvCDgFs1bhY0IuBWnh8Bt/LMTG/Mw2BHwG36M51+DkzqasD6hcz9CBBw8+M0ulJ1T0TWx8zRABroiwbM4PtSX+rJuYUG0MCsNIA/orVZaY39oLU+aQBvRK990it1Ra8haADfRIch6JA6oMPQNNC0NxJwqxb8IOBWjZsFjQi4ledHwK08M9Mb8zDYEXCjTxVan2oI9bF+IXM/AgTc/DiNrtQQzIBj4EMWDaABHw2YwfuUpQyaQgNoYEwawB/R+5j0zrGid18N4I1oxVcrlEMraOCMBvBNzgXOBTSABqY10LQ3EnCrFvwg4FaNmwWNCLiV50fArTwz0xvzMNgRcJv+TKefA5O6GrB+IXM/AgTc/DiNrlTdE5H1MXM0gAb6ogEz+L7Ul3pybqEBNDArDeCPaG1WWmM/aK1PGsAb0Wuf9Epd0WsIGsA30WEIOqQO6DA0DTTtjQTcqgU/CLhV42ZBIwJu5fkRcCvPzPTGPAx2BNzoU4XWpxpCfaxfyNyPAAG3DE57PvgoWnrb6vjf54eOZJQY/qIhmAHHwIcsGkADPhowR/cpSxk0hQbQwJg0gD+i9zHpnWNF774awBvRiq9WKIdW0MAZDeCbnAucC2gADUxroGlvJOBWLfhBwK0aNwsaEXArz4+AW3lmpjfmYbAj4Db9mU4/ByZ1NWD9QuZ+BCoH3Ha+tzdacP6i+N8zf3k1d29LblqZlPvi2PHMcocOH03KrN/4clLmlrvXRl//7nnxvw/2H0iWt/3H8y9tjf7u//17/O/dXR+0vbtWtn/ixMmEqbVT0fyGW1ZN1KPuicj6mDkaQAN90YCZX1/qSz05t9AAGpiVBvBHtDYrrbEftNYnDeCN6LVPeqWu6DUEDeCb6DAEHVIHdBiaBpr2RgJu1YIfBNyqcbOgEQG38vwIuJVnZnpjHgY7Am70qULrUw2hPtYvZO5HoHLA7fCRo0kI7MJLr8/c26lTp6K//8fvJ+Ve2rwts5wCchYo+8umvyVl/nDNHcnyWQbNhhBwO3bsy4SdsS2aK/zmTkMwA46BD1k0gAZ8NGDe51OWMmgKDaCBMWkAf0TvY9I7x4refTWAN6IVX61QDq2ggTMawDc5FzgX0AAamNZA095IwK1a8IOAWzVuFjQi4FaeHwG38sxMb8zDYEfAbfoznX4OTOpqwPqFzP0IVA64afP/8eOL4xDVP/zLf2fubcdbuyZCVtctvz+z3FXXr0jKuaO8rV63MfqfX/wx/vfRgU8z121j4dACbv961s+jSxffUvjvvgefnEBZ90RkfcwcDaCBvmjAzK8v9aWenFtoAA3MSgP4I1qbldbYD1rrkwbwRvTaJ71SV/QaggbwTXQYgg6pAzoMTQNNeyMBt2rBDwJu1bhZ0IiAW3l+BNzKMzO9MQ+DHQE3+lSh9amGUB/rFzL3I1Ar4KbAmo0KtnvP/qk93n3/48n7KnfW2b+dKqMF3/rhr+Jy31uwMPP9WS8cWsDtmhvvLY1wCGbAMfAhiwbQgI8GzCB9ylIGTaEBNDAmDeCP6H1MeudY0buvBvBGtOKrFcqhFTRwRgP4JucC5wIaQAPTGmjaGwm4VQt+EHCrxs2CRgTcyvMj4FaememNeRjsCLhNf6bTz4FJXQ1Yv5C5H4FaAbfNW3ckAbZV656e2uP/XnhV/L6CbRaE06Mz3enQ4blHnS69bbX7VrR774fRY0+9GP87der0xHvrN74cL9+5a2+8/NW/vhktv/Oh6Ge/vjq6ZNFN0YpVT0TpfU1sIIri9x9Z/0K06No741HiFl65PHpg7YZIy6y+eY9G/fzQkejhJ16INPrcTy9YHP1xyV3R2seeiz47eCi9m0ijz9lxfPb54an3tQ+9v2nL9qn3Tp8+naybV5eplf7v2OwYCLhhrHWNlfXR0JA1YB465GPk2DiH0QAaqKIB/BHdVNEN66CboWsAb0TjQ9c4x4fGm9YAvommmtYU20NTQ9BA095IwK1a8IOAWzVuFjQi4FaeHwG38sxMb8zDYEfAjX7YEPphoR2D9QuZ+xGoFXD78sTJJAh2wSVLJvZ46tSp6O//8fvx+3/Z9Lek3MtbXp8o9+yLryXvKaTmTgqpWUhL+3InW37xFcvigJm9duf//O1zo/0ffeKulvz95ju7I73vls/6OytUpnp+7RsLMtfV41p1vO607Y2dSdk1jz7rvhX/rVCe9i1eJ1LHuev9fcm6CsH5Tgr32fEQcOPDJjSjpj5oMiQNmK+GVCfqwjmCBtBACBrAH9FhCDqkDugwNA3gjWgyNE1SHzQZugbwTTQaukapHxrtQgNNeyMBt2rBDwJu1bhZ0IiAW3l+BNzKMzO9MQ+DHQE3+k1d9JuGvk/rFzL3I1Ar4KZdnH3eH+IglYJd
Gm3Mph1v704CVidPfhV950cXxa/1WFN3+tMNK5Jy6XCXT8DNQlya/+tZP4/rY8E6Lfv1ZUvd3cV/7/vw42Sftt5Fly+NLrz0+ujr3z1v4r10wM0Nq2nd8y/+c3T98geiX/7u+on13CDfV199lYT9fvP7ZRP1cYNo2l46HPfQo88m2z3wycGJdYteuNsl4MaHzdCNn+ND43U0YF5aZxusiwbRABoYogbwR3Q9RF1zTOi6rgbwRjRUV0Osj4bGpgF8E82PTfMcL5r30UDT3kjArVrwg4BbNW4WNCLgVp4fAbfyzExvzMNgR8CNfo5PP4cy5XRi/ULmfgRqB9xuv/eRJIDlhsHuvv/xePmPf3ZFXJM/L1sZv/7egoUTNfvWD38VL1dQLj35Btz0SNJPPv08WV2PD1XYTYEx/UuP/qZQmr23cs1TyXr2hxsqc49J76v+Wlchur9tf8dWiecaFU5BP72v/WsUO5vO/dWf4uX/9G/n2KJ4/tSzryR10Xo6Fne6dPEt8fsK3pWZ3ICbtqsR54r+aSQ8d8J4yhkPvOCFBvqrAfM+2rC/bUjb0XZooB0N4I/tcEWvcEUD/dYA3tjv9uP8o/3QwOw1gG/Onjk6hzkaCF8DTXsjAbdqwQ8CbtW4WdCIgFt5fgTcyjMzvTEPgx0Bt/D7GPQD+9dG1i9k7kegdsBtx1u7koCWGxazQNfyu9bGNXEfU6rwlabDR44m695+z8NTNfYJuOWNTuauu3vP/mTb+luBL/1Lj6ZmhZ5/aWtSxg24vbbtrWT5stsftOIT83tWrU/KaDs23ffgk8nyAx9/ZouTkd/skacKzmnEO5ssqPeHa+6wRV7zdMDNjjlvvuD8RRPbxfz6Z360GW2GBqppwMwPftX4wQ1uaGC4GsAfh9u2nLe0LRqorgG8sTo7dAc7NDBODeCb42x3znfaHQ0Ua6BpbyTgVi34QcCtGjcLGhFwK8+PgFt5ZqY35mGwI+BW/PlO/wc+VTRg/ULmfgRqB9xOnTqdPH5TI6NpcpcpFKbpi2PHk4DXpi3b42VukGz7m+/Fy9z/3JBaehQ2C2rdeFt20MwN1FkdtG13xDQ9RjVrcuvlBtxWr9uYHMO+/R9nrToZ2rv3kaSMtmN1Xr/x5Xi5jskep+qG5+wxpW4AcOPzW5Jt+fzhBtx+8JPfRRpRr+hfevtVTj7WwbTRABroowbMU/tYd+rMOYcG0ECbGsAf0Veb+mLb6KuvGsAb0W5ftUu90W5XGsA30V5X2mO/aC9kDTTtjQTcqgU/CLhV42ZBIwJu5fkRcCvPzPTGPAx2BNzoX4Xcv+pr3axfyNyPQO2Am3Zjj/xUWOv06dORgmMW5jpx4mRSk7PO/m28/PrlD8TLrr7hnvi11nMf52kr1Am4bd32dlKHV//6pm0yuvnOh5LlCt1lTXkBt2uX3Zes+9VXc6OspbdhoTU9XtSd7PGli669M15s+7HHj/7vhVfF21+46Ob4/Rdf2Zbs79Dho+6m5v3bDbjljXJXtJG+GgD15oMVDaCBshowLyy7HuXRGhpAA0PXAP6IxoeucY4PjVfRAN6IbqrohnXQzZg1gG+i/zHrn2NH/3kaaNobCbhVC34QcKvGzYJGBNzK8yPgVp6Z6Y15GOwIuNG3yevbsLy6NqxfyNyPQCMBt1Xrnk6CWO+8uyeyYNp/nXv5RC2uW35/XO57CxbGy7/zo4vi13qcadZk21FYruwIbtve2JnUyQ24Xbb41mR51j61zIJn2q87gpuCZxbcy1tXyy3IdsElSyaKXXT50nj9b37/l/HySxbdFL++4ZZV8etH1r8Qv1ZATgE6jU6n/X3rh7+a2I7PCwJu1U0EA4YdGhiXBsxTafdxtTvtTXujgfk1gD/OzwgdwQgNjE8DeOP42pzznDZHA/U0gG/W44f+4IcGhqmBpr2RgFu14AcBt2rcLGhEwK08PwJu5ZmZ3piHwY6A2zD7JfQ3u21X6xcy9yPQSMDt/b0fxkEshbHue/DJSIE1/X3THWsmavHylteTcgc+OZj8rXWypjYCbhYaKQEZYAAAIABJREFUU/3KjuDmjv529ItjWVWOH8+qbevfVdevmCiz7onnk2M++Pnh5PGkb+/cE5c7cuSL5P2XNm+Lzj7vD/HrPy9bObEdnxcE3Lo1Ij4I4I8G+qMB81TarD9tRlvRVmhgNhrAH2fDGT3DGQ30SwN4Y7/ai/OL9kID3WsA3+y+DTgPaAM0EJ4GmvZGAm7Vgh8E3Kpxs6ARAbfy/Ai4lWdmemMeBjsCbuH1Kejn9b9NrF/I3I9AIwE37epr31gQh7EUbrNHdLojp6mMG7pafN3dSZhr5669mbVtI+C29rHnkv1qlLesKW8Et8c3vJSs+9q2t7JWjXa+tzcp88DaDRNlPjzwafKeHf8/f/vciTL2mFKN7mYcFXYrO7mseURp/42NDyfaEA20pwHzVxi3xxi2sEUD/dQA/tjPduN8o93QQLsawBvb5Yt+4YsGhqcBfHN4bcp5SpuigfoaaNobCbhVC34QcKvGzYJGBNzK8yPgVp6Z6Y15GOwIuNXvA9CPgmFaA9YvZO5HoLGAm/v4ThvBLP1YUVVJjye19zXX4zxPnz6dWds2Am473t6d7P/iK5ZN7Vd10SNDrY7uI0r37f84Wa4gWtb0m98vS8roca3pSYE227bmemyrO9ljSt0yCquVnQi4YY5pc+Q1mkAD2Rowf4VPNh+4wAUNjFcD+ON4257znrZHA/kawBvz2aAb2KABNJClAXwTXWTpgmXoYuwaaNobCbhVC34QcKvGzYJGBNzK8yPgVp6Z6Y15GOwIuNGHG3sfro3jt34hcz8CjQXc3NHNFM76z3Muy6zB0lvnwmMqd+Gl12eW08I2Am7arupmATI9RlQhtn0ffhw99cym6Kyzf5u8pzJuwE3rXrnkruT9yxbfGh05euZRpQqUXbvsvuS9iy5fmnlcV1xze1JG23/znd0T5dzHlOr9H/zkdxPv+75wA27/84s/xsem48v7t3Xb2xObbuPkZJt86KEBNBCiBsz8QqwbdeKcQQNooEsN4I/or0v9sW/0F6oG8Ea0Gao2qRfaDFUD+CbaDFWb1AttdqmBpr2RgFu14AcBt2rcLGhEwK08PwJu5ZmZ3piHwY6AG/2nLvtPQ9239QuZ+xFoLOD26WeHJoJbN972YGYNXnntjYlyDz36bGY5LWwr4KYwlwXc5punA26fHzoSff27502sb49ntW3907+dEx34+LPM43r6uc3JuunHk9oK9phSbe+mO9bY4lJzN+Bm9SqaLzh/0cT2h2oQHBcfvGgADaQ1YOaXXs5rtIIG0MDYNYA/cg6M/Rzg+DkHsjSAN6KLLF2wDF2ggXwN4Jv5bNANbNDAeDXQtDcScKsW/CDgVo2bBY0IuJXnR8CtPDPTG/Mw2BFwG2/fhX5re21v/ULmfgQaC7hpd9/8/i+T8NbmrTsya3D8yxNJGQWuPth/ILOcFt67en1S9sSJkxPlLKyVFwB7fce7ybqvbXtrYl292P7me1E6mKZtaoS29RtfTtb
d9f6+qXWPH/8yuvyqW5MyVhfNNXLb0S/OjOo2tWIURQrIWfklN08+ntTKu48pzaq7lSuaq462H5+5RnlzJ0yqPZOCLWzRQFgaMO+jXcJqF9qD9kAD3WsAf+y+DTgPaAM0EJ4G8Mbw2oTzhDZBA2FrAN8Mu304f2gfNNCNBpr2RgJu1YIfBNyqcbOgEQG38vwIuJVnZnpjHgY7Am7d9Bvorw2bu/ULmfsRaDTg5rfLcEqdPn062r33w2jDc5vjwFs6RDdfTVVejxh96tlXoh1v7YoU3hvKhFEO2yhpX9oXDcxpwHwbJnNMYAELNIAGpAGb0AN6QANoAA3MaQBvnGOBLmCBBtCAjwbwTXTioxPKoJOxaaBpbyTgVi34QcCtGjcLGhFwK8+PgFt5ZqY35mGwI+BGn21sfbZZHK/1C5n7ERh1wM0P0ThLzeJkZR98CKIBNBCCBszlQ6gLdeCcQANoICQN4I/oMSQ9Uhf0GIoG8Ea0GIoWqQda7IsG8E202hetUk+0OksNNO2NBNyqBT8IuFXjZkEjAm7l+RFwK8/M9MY8DHYE3OgvzbK/NJZ9Wb+QuR8BAm5+nEZXaiyGwXHyQYwG0IAZPFpAC2gADaCBSQ3gj5M80Ac80AAakAZsQg/oAQ2gATTgpwF8048TeoITGhiXBpr2RgJu1YIfBNyqcbOgEQG38vwIuJVnZnpjHgY7Am7j6q/QP51Ne1u/kLkfAQJufpxGVwrDmo1hwRnOaKB7DZjB0xbdtwVtQBuggbA0gD+G1R6cH7QHGghDA3hjGO3A+UA7oIH+aADf7E9bcV7RVmhgdhpo2hsJuFULfhBwq8bNgkYE3MrzI+BWnpnpjXkY7Ai4za6vQL9sPKytX8jcjwABNz9OoyuFaY7HNGlr2nrsGjCDHzsHjh8vQANoIK0B/BFNpDXBazSBBhjBDQ3gA2gADZTVAH1KNFNWM5RHM2PQQNPeSMCtWvCDgFs1bhY0IuBWnh8Bt/LMTG/Mw2BHwI1+2hj6abM+RusXMvcjQMDNj9PoSs36xGV/fCCiATTQlQbM4LvaP/tF+2gADYSqAfwRbYaqTeqFNrvUAN6I/rrUH/tGf33UAL6JbvuoW+qMbtvWQNPeSMCtWvCDgFs1bhY0IuBWnh8Bt/LMTG/Mw2BHwI0+Utt9pDFu3/qFzP0IEHDz4zS6UmM0D46ZD2U0ME4NmMHT/uNsf9qddkcD+RrAH/PZoBvYoIHxagBvHG/bc97T9migmgbwzWrc0Bvc0MCwNdC0NxJwqxb8IOBWjZsFjQi4ledHwK08M9Mb8zDYEXAbdv+E/mc37Wv9QuZ+BAi4+XEaXSkMrBsDgzvc0cDsNWAGD/vZs4c5zNFA2BrAH8NuH84f2gcNdKMBvLEb7ugd7migvxrAN/vbdpx3tB0aaE8DTXsjAbdqwQ8CbtW4WdCIgFt5fgTcyjMzvTEPgx0Bt/b6BvS7xsvW+oXM/QgQcPPjNLpSmOh4TZS2p+3HpgEz+LEdN8fLuY4G0MB8GsAf0ch8GuF9NDJGDeCN6H6MuueY0X0dDeCb6KeOflgX/QxVA017IwG3asEPAm7VuFnQiIBbeX4E3MozM70xD4MdATf6ZkPtm3V5XNYvZO5HgICbH6fRleryJGbffDiiATQwSw2Ywc9yn+wLjaMBNNAHDeCP6LQPOqWO6HTWGsAb0dysNcf+0FzfNYBvouG+a5j6o+E2NNC0NxJwqxb8IOBWjZsFjQi4ledHwK08M9Mb8zDYEXCjX9RGv2js27R+IXM/AgTc/DiNrtTYjYTj5wMaDYxHA2bwtPl42py2pq3RgJ8G8Ec/TugJTmhgXBrAG8fV3pzftDcaqK8BfLM+Q3QIQzQwPA007Y0E3KoFPwi4VeNmQSMCbuX5EXArz8z0xjwMdgTchtcnoZ/ZfZtav5C5HwECbn6cRlcKM+vezGgD2gANzEYDZvDwng1vOMMZDfRHA/hjf9qK84q2QgOz0wDeODvW6BrWaGAYGsA3h9GOnI+0IxpoVgNNeyMBt2rBDwJu1bhZ0IiAW3l+BNzKMzO9MQ+DHQG3ZvsD9K/gKQ0wlSNAwK0cr9GUxlAxVDSABsaiATP2sRwvx8m5jQbQgK8G8Ee04qsVyqGVMWkAb0TvY9I7x4rem9AAvomOmtAR20BHQ9NA095IwK1a8IOAWzVuFjQi4FaeHwG38sxMb8zDYEfAjT7Z0PpkIRyP9QuZ+xEg4ObHaXSlQjiZqQMfkmgADcxCA2bws9gX+0DTaAAN9EkD+CN67ZNeqSt6nZUG8Ea0NiutsR+0NhQN4JtoeSha5jjQcpMaaNobCbhVC34QcKvGzYJGBNzK8yPgVp6Z6Y15GOwIuNEfarI/xLbO6Mn6hcz9CBBw8+M0ulIYCh9QaAANjEUDZvBjOV6Ok3MbDaABXw3gj2jFVyuUQytj0gDeiN7HpHeOFb03oQF8Ex01oSO2gY6GpoGmvZGAW7XgBwG3atwsaETArTw/Am7lmZnemIfBjoAbfbKh9clCOB7rFzL3I0DAzY/T6EqFcDJTBz4k0QAamIUGzOBnsS/2gabRABrokwbwR/TaJ71SV/Q6Kw3gjWhtVlpjP2htKBrAN9HyULTMcaDlJjXQtDcScKsW/CDgVo2bBY0IuJXnR8CtPDPTG/Mw2BFwoz/UZH+IbZ3Rk/ULmfsRIODmx2l0pTAUPqDQABoYiwbM4MdyvBwn5zYaQAO+GsAf0YqvViiHVsakAbwRvY9J7xwrem9CA/gmOmpCR2wDHQ1NA017IwG3asEPAm7VuFnQiIBbeX4E3MozM70xD4MdATf6ZEPrk4VwPNYvZO5HgICbH6fRlQrhZKYOfEiiATQwCw2Ywc9iX+wDTaMBNNAnDeCP6LVPeqWu6HVWGsAb0dqstMZ+0NpQNIBvouWhaJnjQMtNaqBpb1TATWEt/sFglhp47wN4l+W9a189ZnXXL1vfLsrLz/C0ejppu93UPk1+JrIteI5dA9YvZO5HgICbH6fRlRq7kXD8fJiigfFowAyeNh9Pm9PWtDUa8NMA/ujHCT3BCQ2MSwN447jam/Ob9kYD9TWAb9ZniA5hiAaGpwG8cXhtynlKm6KBZjTw1anTkf7BsxmecIQjGghfA9YvZO5HgICbH6fRlcLswjc72og2QgPNaMAMHp7N8IQjHNHAcDSAPw6nLTkvaUs00JwG8MbmWKJLWKKBcWgA3xxHO3M+085ooJwG8MZyvNAXvNDAeDSAP46nrTmvaWs0cEYD5nvM/QgQcPPjNLpSGAofKmgADYxFA2bwYzlejpNzGw2gAV8N4I9oxVcrlEMrY9IA3ojex6R3jhW9N6EBfBMdNaEjtoGOhqYBvBFND03THA+abkoD+CNaakpLbAct9UUD5nvM/QgQcPPjNLpSfTnhqScfTmgADdTVgBl83e2wPlpEA2hgaBrAH9H00DTN8aDpJjSAN6KjJnTENtDRmDSAb6L3MemdY0XvvhrAG9GKr1Yoh1bGpgH8Ec2PTfMcL5o332PuR4CAmx+n0ZXCTDFTNIAGxqIBM/ixHC/HybmNBt
CArwbwR7TiqxXKoZUxaQBvRO9j0jvHit6b0AC+iY6a0BHbQEdD0wDeiKaHpmmOB003pQH8ES01pSW2g5b6ogHzPeZ+BAi4+XEaXam+nPDUkw8nNIAG6mrADL7udlgfLaIBNDA0DeCPaHpomuZ40HQTGsAb0VETOmIb6GhMGsA30fuY9M6xondfDeCNaMVXK5RDK2PTAP6I5semeY4XzZvvMfcjQMDNjxOlIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQGDGBAi4zRg4u4MABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABPwIE3Pw4UQoCEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEZkyAgNuMgbM7CEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEPAjQMDNjxOlIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQGDGBAi4zRg4u4MABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABPwIE3Pw4UQoCEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEZkyAgNuMgfdldydPfhUdOfJF5ep+eeJk9NVXX1Va//CRo5XXrbRDVoIABCDwfwTqet+pU6ei41+eqMTz2LEvo+PHv6y0LitBAAIQmBWBul6l9atMp06djg4dPhqdPn26yuqsAwEIjJDA6zvejZbetjpadvuDkfpodae616lV/a9u/7TucbM+BCDQHQH1e+QBQ5rqehr3G4ekBo4FAhBwCdS95uWepEuTvyEAgb4T6Gufse590763G/WHAATqEehrf7CuZ5elRsCtLLGM8i++si1acP6iqX+XLLopo3SYiz468Gm0YtUT0X+de3n0tW8siP7u//17/O/v//H70X/8+OLokfUvRDqpiqb9H30SXbr4luib3/9lsv73FiyM/rxsZZR3M18huI3Pb4nESutpf7bvf/q3c+Ll+z78uGi3E+/t3Xcgru9ZZ/+WkNwEGV5AoB6BEydOTnjcqnVPF25QN+J/fdnSZJ1rbry3sHxXbzbhfQq0Xbf8/kh+Z/4lP1t45fJozwcf5R7aWzvfj9eTx/7Dv/x3sq7+/p9f/DHauu3t3HXTb4i3Poe0rR1v7Uq/zWsIQKAFAjve3p14nM6/v21/p3AvBz45OFH+sadeLCwfyptNeNWWrTuiCy5ZEqlvJ59Uf08+t3LNU7mHqSDJ6nUbo3N/9afon799buKRWv9fz/p5HFj54tjx3PXTbzz17CuxR15+1a3pt3gNAQjMmEDaD91r6c8+P9xYbXR9a30zBSLKTE1dp8rH5Hd2nSsflB++vOX13Oo00T91N77k5vtj/1NdmCAAgfAJqI+pa2n1d8zD5CG6xvzpBYujFQ88Hu3b73+frOsjbsLTuN/YdSuyfwh0R2DtY89NXEc//MQLuZV5afPcdzS6jxn61MQ1L/ckQ29l6geB2RDgO+oznLvoMzZx33Q2KmEvEOg3gaF+R93X/mAT1/l1FEnArQ69/1tXN4rtppM7183rvkxuKM09Bvfv/zznstzQmD7E3YCGu57+Vuji80NHpnA885dXM9ml1//Lpr9NrZu14MoldyXb68OFbNYxsAwCIRJQSNU9L3WDvShYoHPWLa/wV4hTXe+Tr7nBNveY9bd8UV9QpCd1WtJls15fv/yB9KqZrzc8tznZXtEXppkrsxACEKhEIO1zZ5/3h8LtXLvsvuQ81fl+wy2rCsuH8GYTXqUgX5a/2bLLFt+aObLS4uvuLlxP68tjdTE136TPMPVFtY5+BMEEAQh0S2D3nv2553eZHzfNdxR1Am51r1P147BF196Ze5zyo3VPPJ95CHX7p+5G9WMLC9fpBxlMEIBAuAQUrFWwzfpI8837Elqt62ncbwxXs9QMArMgoB/Ou36ofs3RL45l7nrNo88mZfN+bJ+5YkcL617zck+yo4ZjtxAIkADfUUdRF33GJu6bBignqgSBIAkM9TvqvvYH617n1xUZAbe6BKMo2r33w0gXUPbv6989L76Y6mPATYK88bYHoyc3bor0haR+XW43xHUxmRW2UJDMDbfpF+KbtmyPR2a78NLrkwvLrC9+3S8ONIqbRvJ4/qWtcR3ck0N1+OTTz6daS4b20KPPxqN46AtL94KXgNsULhZAoDKBdOdB59pdKx/L3Z4Cse75GHrArYr36eA1spAdp/zy6ec2R6+89kbslbZc/pW+seZe/GhUjzvvezSSH9734JPRj392RbJNbSMvsKbRiLSeW4ei8rmNxRsQgEAlAumAm86/7W++l7ktPfbd7U+pbN8CblW8SgFfHav+qa949/2Px6NTKtTh9ttuXbFuiptd3Gm9P92wItIv9+WTf7jmjonRhvM+Xz7YfyDSaKNa10aOUz0IuE2hZgEEZk5AP5Kwa2fNf/brqxOvCDHgVuU6VSMsmf9950cXxdesGp1XoTv32jlr9E+7Dq7aP/3r62/HfUrV2/3sIeA2c6mzQwiUIqBRZs03NFffS30nPfXgthUPRxddvnRiVFu914epjqdxv7EPLUwdIdAugXTATf6o7y6yJvUrzUfT9+Gyyne9rO41r3s/kHuSXbcm+4dAtwT4jrqb76ib+I6nW+Wwdwj0h8BQv6Pua3+wznV+E6oj4NYExdQ2dBNKF1N9Crjp8YEKlukxd+lJXxDajXE9Iio9KQxnF4/pG2za3i8uWZK8n/7CYtsbOyMF4j797FB6s/FrfYlp2856lJe2Z++n5wTcMpGyEAKVCGR1HvTlXNbjnnRep8/HvABCpco0uFId79Pjtew4dVMp/RhnBXbtfT3m2Z3EU186vvPuHndx8rfCa7auHv2cNbmBDSureV4gLmsbLIMABKoTyAq4KaSRNd1+z8PJOW3nax8CbnW96oprbk+OOx3i0E0ge/So/CzdB33wkWeiB9ZuiPTIlfSkPt63fvirZNtZo7jp0TXG2p0TcEvT5DUEuiegwL6dp+nrxTq1qzOCW93rVPdHbwdTj13V4+TteBcuunnqEOv0T7Ux9TFt++6cgNsUahZAIBgCepySna+6UXzo8NHMup06dSoe/fFr31gQh98yCwW2sI6ncb8xsMakOhDogIAF3HQP0kbl1vcUWY+171vArc41L/ckOxAju4RAjwjwHfVcY7X5HXXd+6ZzteQvCEBgPgJD/Y66r/3BOtf587W1z/sE3HwolSxT1HnQTav1G1+Of2mkL/306xqFF/Qlp0alSAckbNev/vXNeES1Ta9ujxfpsS73rl4fP75A27jpjjVRk18G2H5tft5vrklutqWHAdfIbLoRp4vL9JeTWn/X+/uSdfN+YWX7Sc8V/rCbfDpZ0tORo8did
uKnf+6jAgm4pWnxGgLVCbidh6uuX5GclwofpCcbhWPB+YsiS3FnBdyq+qH8T4FXN/R6/PiX8ahAWvbiK9vSVar8usj7brl7bcLh7Z3TQTX5oY3QoRHtyk4WLNaoH1mTRgYx73MfZUPALYsWyyDQPAE34Hb1DfckfrBz196JnbkjT7iPKdX5m540nL5uyiuEsPDK5XE/Uf3Fm+98KHN0OHmz+aECE0WT6quy+kFDk1OeV8mX7b284J9GWLN+3patO0pVSyNe2rpZxyRfNo/UXF8EqzwBt1KYKQyBmRAoCriZd2lUsqxJj/QzH9y778BEkToBt4kNZbwouk5VXc2fdM2eNeka3spoRLsyU1H/VNvRaE+u/9l+CLiVoUxZCMyWgHzMzlX9Pd+kH5rp0XR502vb3opHXNeob/IMPY1B9xyzJoXpzEcVGJEn6b6lRsHVD7nkHfLitqYiT+N+Y1vU2S4E+kPADbjpmtG8UqNdpCefgJu8Uz+G0r3Nn16wOPrjkrvi0cI/Ozj943v94MH88eTJr9K7S15//OnBpFzWj6+SgiX/KLrm5Z5kSZgUh8DICPAd9WSDt/kd9eSepl/ZvdG873im12AJBCCQRWCs31GH3B/MaidbVnSdb2XqzAm41aGXs25R5yH92D67KLO5RrBQeCM92TY1WoWCXlbeneuD8qXNzQU73Dq4N+DTI2lYHXTjK2+ykMd/nXt5XpHM5W7Hw+eGvL4AtvoQcMtEykIIVCLgdh50w8g+nORZ+mLRJvmXnYO6qV4UcKvqh7ff+0iyD4VcdTPK9mlzq0/deZH3mS/Le/PCyb/83dxjmvPK5NXRfNMnjCHWduwE3PKIshwCzRJwA24739ubBFoVOHUnu8kur9i3f27k2XTATR5h53HeXI8wVnDMJvmveYVGC8r6oYHKaqRc22bWaEG2vSpz23/aq/S4VtunQiZZk/uZoT5cmckNx/n0f9UHVX3S9SyzT8pCAALtECgKuNlojeqTZU0KZpjXrF63caJImwG3outU93jefGf3RJ3shepq9VY/rsxU1D/N2o7tx+d6Omt9lkEAAu0TcPs1WY9u963BkSNfxI8ytfM+PVdgTKP+uNPrO95N/OjKJXclPwpIr6snLLjX/u426vxd5GlWB+431iHMuhDoNwE34KYj0Y9pzRv0w1l3smtvva/7mOlJAwjYD59sGzbXdW06zOt+objhuc3pzSWvrY7aVvoHF0mhCn+4nw3pa17uSVYAyioQGBEB8wh9d5Oeqn4nY9vkO+o00eLXefdNi9fiXQhAIE1grN9R96U/mG6vouv8dNkqrwm4VaE2zzr2QV/UedB7Gs1i0bV3xiO4/etZP08uzvS4pnSIzLZpF12a64JMF3Xali3X30W/KJqn6plv6xEItg/t0500mpvte+ltq923Jv62i08dZ5npoUefTbavv+ebCLjNR4j3IVCNQLrz4AYX9MtHmzQipTzBRmzzCbiV9UM34JZ3Y8rqU2de5H3arn3h+oOf/C53N7etmHssYdbjE/JW3PPBR4n36Vf3800E3OYjxPsQaJ6AG3B7f++H8SgZ1ieym9oKrdljODVyhn4tbmWKAm7ylwsvvT4O8MoD7GaI1k17gkbHtW2+8tobmQe6/K65ESfzwhaZK86zsMirNKqa1WtzwehsVkZfmpaZxMHWNd5F6xNwK6LDexDoloAbCEuPSm79rdACbkXXqe6onupPZk0aEcQ8TI+m953m659mbcf2Q8Atiw7LIBAGAYUu7FxV3/HDA5+WrpjCZ3b9rW0pFKZrZ3msRimy7ev+oju5ATcro7muczXShLvsrpWPuavW/rvI07jfWBsvG4DAIAhYeEzXxJp2vL078SU9lt2digJubt9Lvnb+xX+OR7d0f5iq5e6PRg8fmfshRdo7bb/uj84UIm5yKrrmtT4y9ySbJM62IDAcAvZ9sr53SU8WcCv7nYxt0+0b8h11mu7k66L7ppMleQUBCMxHYIzfUYtJX/qDbvsVXee75er8TcCtDr2cde2DPqvzoC9D3931QeaaS26+P7lASz+CxbapzoOCI7qYs0lC0Zeg1rHYtOXMY0zt/bpzhVds2+nHhLof0Hff/3jurnTRqG1o9BLfSSOwaTQS2/fBzw/PuyoBt3kRUQAClQikOw/aiEYS0vmp81QhDg3Jb+erPS7ObrBb4M3deVU/dANu2p8uytY+9lzsrfKkoiCFu//5/i7yPq1r4Tp9WZA3rVzzVMIk/djCvHW0XDfp0iyLyhNwK6LDexBoh0A64KYv4dTP0bl72eJb453qMXF2LivkWhRw0wp6/MlfiGECAAAYrklEQVQnn34+VWE9hspuQGl76vvZpFE4bB/qb6Un3XA3v8ry4nT5Mq+LvMoNfxQ9PtWYZdU9ry5u/9N3iH8Cbnk0WQ6B7gn0LeA233Wqe/Mpj676hebd6i/6TvP1T7O2Y/sh4JZFh2UQCIOA+mvuD1913uo6Uz+YUp9To/HON92zan3iK+qDpif3Otq9b+gG3NQv0wiTbl/zb9vfSfq4Cpg0+aPaIk9z+3vcb0y3Jq8hMB4C6YCbjtwdEcL9nqUo4KZrYXmrfE6+5k76EZj9qExe7HrgxVcsS7w16/Gjevyz9bWeLhjlzd2fz9+uB2Zd89o1PvckfWhSBgLjI2DfJ/Md9Vzbz+o76rk9lv+Ox12XvyEAgUkCY/yOuk/9Qbe1iq7z3XJ1/ibgVodezrpFnYecVeLF+kLTLog05KA72TZ1QZP1+CldiNm66UezuNsp+7ceHWVfPGp+4OPPJjahIJ7Pft0vQH1vhunLYdu2fgXvMxFw86FEGQiUJ5DVedj06vbkHNXIE0tuWhm/VqjNfKoo4FZUiyI/dG/MK0Bh+yraXtn35vM+7dP8Sb/2zJvcD3Lx8pke3/BSsu08z09vh4BbmgivIdA+gXTATXvUqGzmDeoz/cePL45f61HKmuYLuBXVev3Gl5Nt69F47uTe4E/31dyQ3ZMbN7mr1fp7Pq+65e65UePkqXmTbraJmW/4To9otc8Wrffsi6/lbXpiOQG3CRy8gEBQBPoWcJvvOlWjd8ifbKSRLNgaqc4+L9IjemaV17L5+qd569l+CLjlEWI5BMIg8PbOPcnTE+y8decKMyhosSVjZFz94Mzu3ekxo1mTO8qQ6ztuwM0NvrnbuOmONYln5ZVxy/v8PZ+ncb/RhyJlIDB8AlkBt917P0w8SU/IsSkv4ObeM1t2+4NWfGLuhoTtR7sqoDCcebG8MD1ptEy9r36ffLaJab5rXu5JNkGZbUBg2ATs++SsgFvRkRd9J2PbzPu+gu+oJ8nOd990sjSvIACB+QiM7TvqvvUHrf3mu863cnXnBNzqEsxY3z7o8zoPugjRr4seWLsh+s3vl0W6Aa5hpe2XN7oo0q803cm2qdE7sib3xFbIq4lJozHZY7VUp6eemf5SVDfW7CKv6BGiC69cnpTzCbi5X3KIjUYu8ZkIuPlQogwEyhNwPUY3jGzSBY08QH5nN9SfePplezsJIeQFF6r4oRtw0438picf79NN
K/M+jdKRN2k0JivnE3BzH00jnkro+0zuzTr3cQo+61IGAhCoRiAr4KZR2uyct0eG6LU9Zson4CZ/0U30W1esi3+ZrmCWfkVuHqvt6fEq7uR6hx5Z6k422qb6mU3dcHf3l+dV7qNTi7ysTMBN9bcvEMThimtudw+18G8CboV4eBMCnRJwr/1Cf0SpW9e861QbcVO+mzdpBBD7vHCDJnnlffqneevafgi45RFiOQTCIaARgdUHdPuRdg67c41A8cWx40nF3dDs0ltXRS9t3pb5z56S4P5Iyw24ZYXntBPdw7T9u/cDkgqU/MPH07jfWBIqxSEwUAJZATcd6qWLb0l8ST6mKS/gpsEAzMP27f84k5T7OFLdd3Qn+4GVrl3d+5DyMtuuT3/O3Wbe3z7XvCpj++WeZB5JlkNg3ATs+2S+o57Tway+o9Yefe6bztWMvyAAAR8CY/qOuo/9QbWhz3W+T1v7lCHg5kOpZJmizoOSi+nHDtgFiTsvG3BTFW39rF8TlTyE6LODhyYeD5q+sLPtuTe57nvwSVs8NXdHFpl6M7VA27FjUcBOJ4TvRMDNlxTlIFCOQF7nQcPv2/mquc5Zdyh/uwmUFXCr6odtBtx8vU/0LGxy3m+uyYWp0TiNj/to6awV3F+Fah1dCPlOBNx8SVEOAs0RyAq4aesaNcPOe801iq1N8wXcNBqZeYu7jfTf6YCbtm/9S/eX4xrpzdZd3tAPIHy96v6HNiT7fufdPYZgam6PgvnfC6+aes9doM8Wtz+poJsu9nwnAm6+pCgHgdkTcENjIQfcfK9Tf3HJktj/5Od5k+vPRY/e0/pl+qdZ+7PPAQJuWXRYBoFwCajfqGtCjSqkAIP1meycthGCdQQKtNlyn7n741mfgNuRo8eS7StAV2fy9TTuN9ahzLoQGA6BvICbRi43v9O1nqa8gNu1y+5LyhZdQ9q1uMJz7qTHydu+3B+V3n7Pw8ly+1Gbu17Zv8tc81pduSdZljLlITAOAnxHPd3O7j3F6Xcnl/he+0+udeaV733TrHVZBgEI5BMYy3fUfe0P+l7n57dwuXcIuJXj5VU6r/OgIVrt4kMXRXpslW4yP7L+heiV196I3tr5fnJR1GXATb8mt191qp5FX4gedEYqSdfZhfXTCxbHx6bwS9F018rHEgb6dcH+jz4pKj71HgG3KSQsgEAjBPI6D/rlousXDz7yzMT+8gJudfywrYBbGe/TQVqYZMH5iyaO2X3hepq2nzdt3rpj4vPBvWGWt467nICbS4O/ITAbAnkBN/Vd7Oa35grz2lQUcHNvxms9BbjUB9MIujrH3fezAm5uoFaPjdbkhu10kVF3KuNVG5wAdFZ9rS7GKv0lgr2v+YkTJyONVGJl1a/UsjITAbcytCgLgdkS6EPAze3TzXeduujaOxO/yiO5/c33kjIa8TdvKts/zdqOeScBtyw6LINAfwgc//JEtOKBxxPv0LltIwk9/MQLyXL9uEyBh6J/S29bnRy4T8BNT1UwL6kTcCvjadxvTJqIPyAwagJ5ATdBcYNrenyye82s+5g2LVx0c+JhtixrbkFihSDc6ciRL5L13ffsyTe6Pq07lb3m5Z5kXeKsD4FhE+A76un2ncV31GXum07XkCUQgEARgTF8R93X/mCZ6/yiNi7zHgG3MrQ8y+qRo7rxkx7+dclNK5OLId18yprshlE6LGYdEvdXlun1bd06I7jpl+Tuo1L1C6WiSTfTbL9FX0xaACZrFCdtX48pvObGe5NtKRSj572XnQi4lSVGeQj4EcjrPGjttY89F5+7uhGUDhzkBdzq+GEbAbey3qfjtsf+FQV39fg888i8xzO7IRCFoP/6+tt+jeKUIuDmwOBPCMyIQF7ATbvXI+h17iuU5U5FATf98EHryAeyRnzcuu3txE+yAmN6TJX9kEJhLvd10WNL3PoV/V3Wq1xfygtvfPrZoeSYrl/+QObu9aguC6eJz68vW1pq5DbbqG3jrLN/a4uYQwACgRDQ4/isv5Q3gtvPfn11Zm0PHT6arKvHT7nTilVPJO8pnFFlqnKdqnCyHY9GF8manty4KSmjL2Szpir906ztWF0IuGXRYRkE+kfAfXypPfHA7XdpROAyk0/ATY+bNy/R9X+Vqayncb+xCmXWgcDwCBQF3HR9bdfA3/nRRZF+dGte5Qbc3O8LdH2ZNbmec9X1K6aKuCE57df13edf2jpVvsyCKte83JMsQ5iyEBgfAb6jnm7ztr+jLnvfdLqGLIEABIoIDP076r72B8te5xe1cZn3CLiVoeVZ1r6gTIce7CaUPkjzJrsI6yLgpkcf2EWh6vH4hpfyqjmx3H4xlD5eK+R+eannnKcnBWL0ZaUd+w9+8rtIv4yqMrkXrOmgTZXtsQ4EIHCGQFHnQcP7a0S2rBEX8wJudfyw6YBbVe/7wzV3JL6V9+WlXTilA8+mq3tXr0+2oYDgzvf22lul5u6NtbKjv5XaEYUhAIGEQFHA7fCRo7Ev6sa3O+UF3FTe+kG/v/p2d5Xk7/kCbirojtjmjiCkkYLqTFW8yu3/Xbb41szda3Q6O+6sEJx+/WOfIyqnR3EpbFJlIuBWhRrrQGA2BJbcfH/iBen+pF1b6xzOmtoMuFW9TtUomuZtedfU7o8gsh5pVbV/msXI6kLALYsOyyAQBoF9+z+OfIO4F156feIxenSoJne0M/2YrMzkE3Bb98TzyT712KWyU1VP435jWdKUh8DwCBQF3HS0y+9am/iTBTrU93EDbuqPWX9I98+yJt2PszIPrN0wVcT1Sv2I4pJFN8XlNVCAHiVVdap6zcs9yarEWQ8C4yBg19Hp72zrfCczi0FY+tpnrHLfdBxK5Cgh0ByBIX9H3df+YFXPbkIVBNyaoOhsQwlLuxj63wuvct6JInUm9J5CDAqEpCeN1mHrzjrgpsek2r4VctuydUe6ermv3UfKSMzpScdi205fROqLXgXa7H0N8+17Uy+9H70m4JZFhWUQqE+gqPNQtHULJqRHb6zjh00G3Op4n/toKXlPetIIS+Zt6Uc96+aXGz4RJ3Viqk4E3KqSYz0IVCdQFHDL22pewE0hWfOLX6Qeh2Lbuvv+uUdSZY3gpnL6xYxtx+a6qVV1qutVdvNLfd/PPj88UQ0F1ex99T3dLyBUUP1irWfHoeOvMxFwq0OPdSHQLgH1E+1cT18n6/rQ3tPj+dKTwrH2fpMjuNW5TtX1rP1wTKNr2CMEre7atvmb3k9Pdfqn6W3ptfEh4JZFh2UQCIPA4uvujp8CoRBGUZj/g/0HEn/RF5TuZD+u0jm/c1f+D6d279kfuX1JN7SRdS/w+PEvJ+5n6nWZqY6ncb+xDGnKQmCYBOYLuOk60vpV1ufR3L2+VIjY3kt/X2PUbBR2lXvn3T22eGJu9zjdp99k3Q+cWKngRZ1rXu5JFoDlLQiMnADfUc/uO+q6901HLlUOHwKlCAz1O+q+9gfrXOeXavicwgTccsD4LN74/JZIw47qIklf2ukGkX15posh3YhxJ31haRdTunm1461dkUbs0HO53V9wq8wsA27
uBZH2fc+q9dFLm7fl/nNvhOn43BE6dEGpkJsexadHY7kCzxq5zp57rv3q5tyLr+TvV3VKfzmq/esXq/bP/fW/ytpyRnNzlcjfEChPoOnOQx0/bCrgVtf7RNF+US4P06/a5Tn6YlaPGXVvsKXDa25QRetqpI8i39WXEOlJbWIe5wZtnn5uc7LcvaGXXp/XEIBAPQLueff+3g+9NpYXcNPKFoaQJ+gX4/oCU4+deuYvr0buL9H1frov5u7cHlWicvqXNTKaW77o77peJT+yevznOZfFX7bqS1s9hv5PN6xI3rv8qukR3mw9zfV+kUdq5Mp0KEavzSM1twCNHl3jLq/za/sidrwHAQhME3hr5/uRQmjv7vogkh/qy0NdF9v5rtBrenJH5VD/UUFehYLlL2lvbDLgVvc61R1RU8doo/2qT+feM3ji6ZcnDrmJ/qkCdq7PGV89bstdPrFjXkAAAp0ScL1QfRX1wfT44s8OHop/BKp+4YoHHp+4xtQ9SXdyf/SkfuXKNU8l99B03059V42qK09YdvuDyapuwO3cX/0pvi+n/WpUOPWx3Gte3eMrM9X1NO43lqFNWQgMk8B8ATcdtfzO+js2T98Pc/tm8kL1iTSp3LXL7kvWv+jypbkgdZ1u27f5gU8O5paf7w3bhuZVrnldf+ae5Hy0eR8CwyTAd9Rn2rWrPmPd+6bDVCVHBYF2CKjPZn2nNY8+670T+4FCehCWEL6j1kHYMfWpP1j3Ot+78QoKEnArgDPfWwsX3TwhPFeECmulv2jTDX23TNHfswy4KZBWVJf0e7p4Sk8r1zxZuA3dXMv6Jah7cz+9n6zXCoG4kx5hk1UuvUzDdjNBAALVCTTdeajjh00F3JrwPgXZ3FBK2nv0Whc66emmO9Z4eZdtL8vD9NhTez9vnuXX6brwGgIQqEag6YBb1s3yvHO7KOCmm1u2noK2+tFB1amuV6kv7D5Gy+rlzuVT6RCw6uuW8fk7/SOIh5+YG524aP1037IqK9aDAATmJ+D++Cl9Xqo/lRUW1g/C5utr2baaDLjVvU5VkNduoln90sdx/sV/nvLoJvqn9sgs22/ePO2b87cgJSAAgbYIuMH/vHPWXZ43YpBv3y0v4ObuI/23RtQsGl0ui00Tnsb9xiyyLIPAeAj4BNx0zZu+R5YOuOnHFe5Il/I4dyQ2vdY27EcJWYQVinO98We/vjqrmPcyd1s+f6f7btyT9EZNQQgMlgDfUc81bRd9Rt++t3l81nc8c0fAXxCAQBGBIX5HreM1f/Cdh9AfbOI6v6itfd4j4OZDKadMetQ1E59uTOlGfNakL0PTF09aT79W1y/ZbRsKcLiT/YJcN9rzJls3/Si8vPK23P2Vp22jaJ4XmNAXhO6IRbYNld/5XvbjEdKjjNg6eXP9Ut+dPjzwacIsbx0t16MAmSAAgeoE9BgSO8ceqpCO16OI01NVP7xr5WNJXdKPfErvo+h1U94n705/gSlW8sP0qBxWH/m08fSZ65em6Sl98y5rO3l+nd4WryEAgfIENOqsnXd79x3w2oD6h7bO0ttWT6yjLwxvv2fuse5WTnP1OZ9/aWuyrkbayJvcx5TeeNvc6Bx55YuWN+FVOi6NsJsOdui41L/N6zO7x+/zt76wcKeiII27PY3GzAQBCMyGgPpF7vlnf2vEoN0FI2Eq1GuPt7d1NNfNfPmvLUv/gvPe1euT98qO6F33OlVE9SWojs3qZ3P5oUYKyerHNtE/LfqSw+qguUZnYoIABMIgoB8FqG+pgGpWn8nOXfWd3t6Z/eg8O5I339kdaeRcW8ed637kNTfeG+354CMrHrkjuOmeY3r/Wkc/xKgyNeFp2i/3G6vQZx0IDIOAPa1FXlQ0uY+tl+9lPU5ZyzRSmuuL9rdGbtNj/eabLl18S7K+RhmvM9m+fefpa17tm3uSdVqAdSHQfwJ8Rz3ZhrPuMzZx33TyCHgFAQjkERjid9Q6Vt9+oJULoT/Y1HV+Xlv7LCfg5kOpoIyEpPDW37a/E+378OPI5zFHx788EWn4vic3boofT6pfdw9p0tDc+hL2ldfeyP3SckjHy7FAAALVCQzJD/UlpkaqfPbF1zJHI6pOiTUhAIExETh0+Mzj63VTRr/ITv/yfD4W+jWgXfDoEaehTAq66dF8CpSpH6xH6DFBAALjI6AvDhXEVWhNX8j5eoHCH3qkqTxEYQzf9UIgrHDdjrd2xXXXsWcF20KoJ3WAAATCIaAfAMgj9ZhS3Vvzvd/oHoF8UyNjKoChvldeqNUNuOl6Vuvt2/9x3A+t8+g9ty5N/c39xqZIsh0IjJuA+mYKA+uaW3003Zvs+8Q9yb63IPWHQHUCfEc9zY4+4zQTlkAAAtkE+I46m0voSwm4hd5C1A8CEIAABCAAAQhAAAIeBHRTy8JtejQoEwQgAAEIQAACEIAABIoIpANuRWV5DwIQgAAEIAABCEAAAhCAAAQgAAEIdEmAgFuX9Nk3BCAAAQhAAAIQgAAEGiJw64p1ScBNowszQQACEIAABCAAAQhAoIgAAbciOrwHAQhAAAIQgAAEIAABCEAAAhCAQEgECLiF1BrUBQIQgAAEIAABCEAAAhUInDz5VfQP//LfccDtOz+6qMIWWAUCEIAABCAAAQhAYGwECLiNrcU5XghAAAIQgAAEIAABCEAAAhCAQH8JEHDrb9tRcwhAAAIQgAAEIAABCMQEXtv2VvTN7/8y/vfUM5ugAgEIQAACEIAABCAAgXkJvLvrg6QPuePt3fOWpwAEIAABCEAAAhCAAAQgAAEIQAACEOiKAAG3rsizXwhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhAoJEDArRAPb0IAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIBAVwQIuHVFnv1CAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAQCEBAm6FeHgTAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABLoiQMCtK/LsFwIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQKCRBwK8TDmxCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCDQFQECbl2RZ78QgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgUEiAgFshHt6EAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAga4IEHDrijz7hQAEIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAIFCAgTcCvHwJgQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIQAACEI
AABCAAAQh0RYCAW1fk2S8EIAABCEAAAhCAAAQgAAEIQAACEIAABCAAAQhAAAIQgAAEIAABCEAAAhCAAAQgAAEIFBL4/yDYhF4wZEuoAAAAAElFTkSuQmCC)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9YE0HIKAvXR3" + }, + "source": [ + "# What's next?\n", + "\n", + "* [Windowing](https://beam.apache.org/documentation/programming-guide/#windowing) -- learn more about windowing in the Beam Programming Guide.\n", + "* [Triggers](https://beam.apache.org/documentation/programming-guide/#triggers) -- learn about triggers in the Beam Programming Guide.\n", + "* [Transform catalog](https://beam.apache.org/documentation/transforms/python/overview) --\n", + " check out all the available transforms.\n", + "* [Mobile gaming example](https://beam.apache.org/get-started/mobile-gaming-example) --\n", + " learn more about windowing, triggers, and streaming through a complete example pipeline.\n", + "* [Runners](https://beam.apache.org/documentation/runners/capability-matrix) --\n", + " check the available runners, their capabilities, and how to run your pipeline in them." + ] + } + ] +} \ No newline at end of file diff --git a/gradle.properties b/gradle.properties index 20df06a721ed..e3a005d94f3e 100644 --- a/gradle.properties +++ b/gradle.properties @@ -24,10 +24,13 @@ offlineRepositoryRoot=offline-repository signing.gnupg.executable=gpg signing.gnupg.useLegacyGpg=true -version=2.27.0-SNAPSHOT -sdk_version=2.27.0.dev +version=2.34.0-SNAPSHOT +sdk_version=2.34.0.dev javaVersion=1.8 docker_image_default_repo_root=apache docker_image_default_repo_prefix=beam_ + +flink_versions=1.11,1.12,1.13 + diff --git a/gradle/wrapper/gradle-wrapper.properties b/gradle/wrapper/gradle-wrapper.properties index be52383ef49c..442d9132ea32 100644 --- a/gradle/wrapper/gradle-wrapper.properties +++ b/gradle/wrapper/gradle-wrapper.properties @@ -1,5 +1,5 @@ distributionBase=GRADLE_USER_HOME distributionPath=wrapper/dists -distributionUrl=https\://services.gradle.org/distributions/gradle-6.7-bin.zip +distributionUrl=https\://services.gradle.org/distributions/gradle-6.8.3-bin.zip zipStoreBase=GRADLE_USER_HOME zipStorePath=wrapper/dists diff --git a/learning/katas/java/util/test/org/apache/beam/learning/katas/util/ContainsKvs.java b/learning/katas/java/util/test/org/apache/beam/learning/katas/util/ContainsKvs.java index 35c58668b71f..705e2beb3f9b 100644 --- a/learning/katas/java/util/test/org/apache/beam/learning/katas/util/ContainsKvs.java +++ b/learning/katas/java/util/test/org/apache/beam/learning/katas/util/ContainsKvs.java @@ -22,7 +22,7 @@ import static org.apache.beam.learning.katas.util.KvMatcher.isKv; import static org.hamcrest.CoreMatchers.equalTo; import static org.hamcrest.collection.IsIterableContainingInAnyOrder.containsInAnyOrder; -import static org.junit.Assert.assertThat; +import static org.hamcrest.MatcherAssert.assertThat; import com.google.common.collect.ImmutableList; import java.util.ArrayList; diff --git a/learning/katas/python/Common Transforms/Aggregation/Largest/task.md b/learning/katas/python/Common Transforms/Aggregation/Largest/task.md index 47716d7e6f6d..d482393f15c1 100644 --- a/learning/katas/python/Common Transforms/Aggregation/Largest/task.md +++ b/learning/katas/python/Common Transforms/Aggregation/Largest/task.md @@ -19,7 +19,7 @@ Aggregation - Largest --------------------- -**Kata:** Compute the largest of the elements from an input. +**Kata:** Compute a list of the two largest elements from an input.
    Use diff --git a/learning/katas/python/Common Transforms/Aggregation/Largest/task.py b/learning/katas/python/Common Transforms/Aggregation/Largest/task.py index e584b4d90875..e39701855ffd 100644 --- a/learning/katas/python/Common Transforms/Aggregation/Largest/task.py +++ b/learning/katas/python/Common Transforms/Aggregation/Largest/task.py @@ -21,6 +21,6 @@ with beam.Pipeline() as p: (p | beam.Create(range(1, 11)) - | beam.combiners.Top.Largest(1) + | beam.combiners.Top.Largest(2) | LogElements()) diff --git a/learning/katas/python/Common Transforms/Aggregation/Largest/tests.py b/learning/katas/python/Common Transforms/Aggregation/Largest/tests.py index 9a420cdb9371..9836dbd39cce 100644 --- a/learning/katas/python/Common Transforms/Aggregation/Largest/tests.py +++ b/learning/katas/python/Common Transforms/Aggregation/Largest/tests.py @@ -20,12 +20,12 @@ def test_output(): output = get_file_output() - answers = ['[10]'] + answers = ['[10, 9]'] if all(num in output for num in answers): passed() else: - failed("Incorrect output. Calculate the largest of all elements.") + failed("Incorrect output. Calculate a list of the largest two elements.") if __name__ == '__main__': diff --git a/learning/katas/python/Common Transforms/WithKeys/WithKeys/task.md b/learning/katas/python/Common Transforms/WithKeys/WithKeys/task.md index 3b1eaf563501..ee6678bead12 100644 --- a/learning/katas/python/Common Transforms/WithKeys/WithKeys/task.md +++ b/learning/katas/python/Common Transforms/WithKeys/WithKeys/task.md @@ -19,7 +19,7 @@ WithKeys -------- -**Kata:** Convert each fruit name into a KV of its first letter and itself, e.g. +**Kata:** Convert each fruit name into a key/value pair of its first letter and itself, e.g. `apple => ('a', 'apple')`
    diff --git a/learning/katas/python/Triggers/Early Triggers/Early Triggers/generate_event.py b/learning/katas/python/Triggers/Early Triggers/Early Triggers/generate_event.py new file mode 100644 index 000000000000..dd190a270fc1 --- /dev/null +++ b/learning/katas/python/Triggers/Early Triggers/Early Triggers/generate_event.py @@ -0,0 +1,58 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import apache_beam as beam +from apache_beam.testing.test_stream import TestStream +from datetime import datetime +import pytz + + +class GenerateEvent(beam.PTransform): + + @staticmethod + def sample_data(): + return GenerateEvent() + + def expand(self, input): + + return (input + | TestStream() + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 1, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 2, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 3, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 4, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 5, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 5, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 6, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 7, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 8, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 9, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 10, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 10, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 11, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 12, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 13, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 14, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 15, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 15, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 16, 0, tzinfo=pytz.UTC).timestamp()) + 
.add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 17, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 18, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 19, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 20, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 20, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 25, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to_infinity()) diff --git a/learning/katas/python/Triggers/Early Triggers/Early Triggers/task-info.yaml b/learning/katas/python/Triggers/Early Triggers/Early Triggers/task-info.yaml new file mode 100644 index 000000000000..913493414fb0 --- /dev/null +++ b/learning/katas/python/Triggers/Early Triggers/Early Triggers/task-info.yaml @@ -0,0 +1,31 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# + +type: edu +files: +- name: task.py + visible: true + placeholders: + - offset: 1342 + length: 387 + placeholder_text: TODO() +- name: tests.py + visible: false +- name: generate_event.py + visible: true diff --git a/learning/katas/python/Triggers/Early Triggers/Early Triggers/task.md b/learning/katas/python/Triggers/Early Triggers/Early Triggers/task.md new file mode 100644 index 000000000000..d3e47cc42e6a --- /dev/null +++ b/learning/katas/python/Triggers/Early Triggers/Early Triggers/task.md @@ -0,0 +1,59 @@ + + +Early Triggers +-------------- + +Triggers allow Beam to emit early results, before all the data in a given window has arrived. For +example, emitting after a certain amount of time elapses, or after a certain number of elements +arrives. + +**Kata:** Given that sample events generated with one second granularity and a fixed window of 1-day duration, +please implement an early trigger that emits the number of events count immediately after new +element is processed. + +
    + Use + AfterWatermark(early=...) to set early firing triggers. +
    + +
    + Use + FixedWindows with a 1-day duration together with the + + AfterWatermark trigger. +
    + + + +
    + Use + CombineGlobally and + CountCombineFn to calculate the count of events. +
    + +
    + Refer to the Beam Programming Guide + + "Event time triggers" section for more information. +
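Before the solution file below, a minimal sketch of the windowing expression this kata is after, assuming `events` is the PCollection produced by `GenerateEvent`. With discarding accumulation and `AfterCount(1)` early firings, each of the twenty sample events fires its own early pane of 1, and the on-time pane at the watermark contains 0, matching the expected answers in tests.py:

```python
import apache_beam as beam
from apache_beam.transforms.window import FixedWindows
from apache_beam.transforms.trigger import AfterWatermark, AfterCount, AccumulationMode
from apache_beam.utils.timestamp import Duration

counts = (events
          | beam.WindowInto(
              FixedWindows(1 * 24 * 60 * 60),               # one 1-day window holds all sample events
              trigger=AfterWatermark(early=AfterCount(1)),  # fire an early pane after every element
              accumulation_mode=AccumulationMode.DISCARDING,
              allowed_lateness=Duration(seconds=0))
          | beam.CombineGlobally(beam.combiners.CountCombineFn()).without_defaults())
# Expected panes for the sample stream: twenty early panes of 1, then an on-time pane of 0.
```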
    diff --git a/learning/katas/python/Triggers/Early Triggers/Early Triggers/task.py b/learning/katas/python/Triggers/Early Triggers/Early Triggers/task.py new file mode 100644 index 000000000000..2d30275660f7 --- /dev/null +++ b/learning/katas/python/Triggers/Early Triggers/Early Triggers/task.py @@ -0,0 +1,51 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# + +import apache_beam as beam +from generate_event import GenerateEvent +from apache_beam.transforms.window import FixedWindows +from apache_beam.transforms.trigger import AfterWatermark +from apache_beam.transforms.trigger import AfterCount +from apache_beam.transforms.trigger import AccumulationMode +from apache_beam.utils.timestamp import Duration +from apache_beam.options.pipeline_options import PipelineOptions +from apache_beam.options.pipeline_options import StandardOptions +from log_elements import LogElements + + +def apply_transform(events): + return (events + | beam.WindowInto(FixedWindows(1*24*60*60), # 1 Day Window + trigger=AfterWatermark(early=AfterCount(1)), + accumulation_mode=AccumulationMode.DISCARDING, + allowed_lateness=Duration(seconds=0)) + | beam.CombineGlobally(beam.combiners.CountCombineFn()).without_defaults()) + + +def main(): + options = PipelineOptions() + options.view_as(StandardOptions).streaming = True + with beam.Pipeline(options=options) as p: + events = p | GenerateEvent.sample_data() + output = apply_transform(events) + output | LogElements(with_window=True) + + +if __name__ == "__main__": + main() diff --git a/learning/katas/python/Triggers/Early Triggers/Early Triggers/tests.py b/learning/katas/python/Triggers/Early Triggers/Early Triggers/tests.py new file mode 100644 index 000000000000..0ff32af392c6 --- /dev/null +++ b/learning/katas/python/Triggers/Early Triggers/Early Triggers/tests.py @@ -0,0 +1,82 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import pytz +from datetime import datetime +from task import apply_transform +from apache_beam.testing.test_pipeline import TestPipeline +from apache_beam.testing.test_stream import TestStream +from apache_beam.transforms import window +from apache_beam.options.pipeline_options import PipelineOptions +from apache_beam.options.pipeline_options import StandardOptions +from apache_beam.testing.util import assert_that, equal_to, equal_to_per_window +from test_helper import failed, passed, get_file_output, test_is_not_empty + + +def test_output(): + options = PipelineOptions() + options.view_as(StandardOptions).streaming = True + test_pipeline = TestPipeline(options=options) + + events = ( test_pipeline + | TestStream() + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 1, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 2, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 3, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 4, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 5, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 5, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 6, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 7, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 8, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 9, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 10, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 10, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 11, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 12, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 13, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 14, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 15, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 15, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 16, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 17, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 18, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 19, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 20, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 20, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 25, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to_infinity()) + + results = apply_transform(events) + 
+ answers = { + window.IntervalWindow(datetime(2021, 3, 1, 0, 0, 0, 0, tzinfo=pytz.UTC).timestamp(), + datetime(2021, 3, 2, 0, 0, 0, 0, tzinfo=pytz.UTC).timestamp()): [1, 1, 1, 1, 1, + 1, 1, 1, 1, 1, + 1, 1, 1, 1, 1, + 1, 1, 1, 1, 1, 0], + } + + assert_that(results, + equal_to_per_window(answers), + label='count assert per window') + + test_pipeline.run() + + +if __name__ == '__main__': + test_is_not_empty() + test_output() diff --git a/learning/katas/python/Triggers/Early Triggers/lesson-info.yaml b/learning/katas/python/Triggers/Early Triggers/lesson-info.yaml new file mode 100644 index 000000000000..184f82e82811 --- /dev/null +++ b/learning/katas/python/Triggers/Early Triggers/lesson-info.yaml @@ -0,0 +1,21 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# + +content: +- Early Triggers diff --git a/learning/katas/python/Triggers/Event Time Triggers/Event Time Triggers/generate_event.py b/learning/katas/python/Triggers/Event Time Triggers/Event Time Triggers/generate_event.py new file mode 100644 index 000000000000..8e7101514dac --- /dev/null +++ b/learning/katas/python/Triggers/Event Time Triggers/Event Time Triggers/generate_event.py @@ -0,0 +1,57 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import apache_beam as beam +from apache_beam.testing.test_stream import TestStream +from datetime import datetime +import pytz + + +class GenerateEvent(beam.PTransform): + + @staticmethod + def sample_data(): + return GenerateEvent() + + def expand(self, input): + return (input + | TestStream() + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 1, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 2, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 3, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 4, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 5, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 5, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 6, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 7, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 8, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 9, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 10, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 10, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 11, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 12, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 13, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 14, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 15, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 15, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 16, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 17, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 18, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 19, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 20, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 20, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 25, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to_infinity()) diff --git a/learning/katas/python/Triggers/Event Time Triggers/Event Time Triggers/task-info.yaml b/learning/katas/python/Triggers/Event Time Triggers/Event Time Triggers/task-info.yaml new file mode 100644 index 000000000000..83ea49746390 --- /dev/null +++ b/learning/katas/python/Triggers/Event Time Triggers/Event Time Triggers/task-info.yaml @@ -0,0 +1,31 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license 
agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# + +type: edu +files: +- name: task.py + visible: true + placeholders: + - offset: 1288 + length: 343 + placeholder_text: TODO() +- name: generate_event.py + visible: true +- name: tests.py + visible: false diff --git a/learning/katas/python/Triggers/Event Time Triggers/Event Time Triggers/task.md b/learning/katas/python/Triggers/Event Time Triggers/Event Time Triggers/task.md new file mode 100644 index 000000000000..19da1f55f6d8 --- /dev/null +++ b/learning/katas/python/Triggers/Event Time Triggers/Event Time Triggers/task.md @@ -0,0 +1,70 @@ + + +Event Time Triggers +------------------- + +When collecting and grouping data into windows, Beam uses triggers to determine when to emit the +aggregated results of each window (referred to as a pane). If you use Beam’s default windowing +configuration and default trigger, Beam outputs the aggregated result when it estimates all data +has arrived, and discards all subsequent data for that window. + +You can set triggers for your PCollections to change this default behavior. Beam provides a number +of pre-built triggers that you can set: + +* Event time triggers +* Processing time triggers +* Data-driven triggers +* Composite triggers + +Event time triggers operate on the event time, as indicated by the timestamp on each data element. +Beam’s default trigger is event time-based. + +The `AfterWatermark` trigger operates on event time. The AfterWatermark trigger emits the contents +of a window after the watermark passes the end of the window, based on the timestamps attached to +the data elements. The watermark is a global progress metric, and is Beam’s notion of input +completeness within your pipeline at any given point. `AfterWatermark` only fires when the watermark passes the end of the window. + +**Kata:** Given that sample events generated with one second granularity, please implement a trigger that emits +the number of events count within a fixed window of 5-second duration. + +
    + Use + FixedWindows with a 5-second duration together with the + + AfterWatermark trigger. +
    + + + +
    + Use + CombineGlobally and + CountCombineFn to calculate the count of events. +
    + +
    + Refer to the Beam Programming Guide + + "Event time triggers" section for more information. +
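As a sketch of what this kata expects (mirroring the task.py below), assuming `events` is the PCollection from `GenerateEvent`: the default `AfterWatermark()` trigger fires each 5-second window exactly once when the watermark passes its end, so the sample stream yields per-window counts of 4, 5, 5, 5 and 1, as asserted in tests.py:

```python
import apache_beam as beam
from apache_beam.transforms.window import FixedWindows
from apache_beam.transforms.trigger import AfterWatermark, AccumulationMode
from apache_beam.utils.timestamp import Duration

counts = (events
          | beam.WindowInto(
              FixedWindows(5),                        # fixed 5-second windows
              trigger=AfterWatermark(),               # emit once the watermark passes the window end
              accumulation_mode=AccumulationMode.DISCARDING,
              allowed_lateness=Duration(seconds=0))
          | beam.CombineGlobally(beam.combiners.CountCombineFn()).without_defaults())
# Per-window results for the sample stream: [4], [5], [5], [5], [1].
```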
    diff --git a/learning/katas/python/Triggers/Event Time Triggers/Event Time Triggers/task.py b/learning/katas/python/Triggers/Event Time Triggers/Event Time Triggers/task.py new file mode 100644 index 000000000000..e5d5a6038aff --- /dev/null +++ b/learning/katas/python/Triggers/Event Time Triggers/Event Time Triggers/task.py @@ -0,0 +1,50 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# + +import apache_beam as beam +from generate_event import GenerateEvent +from apache_beam.transforms.window import FixedWindows +from apache_beam.transforms.trigger import AccumulationMode +from apache_beam.transforms.trigger import AfterWatermark +from apache_beam.utils.timestamp import Duration +from apache_beam.options.pipeline_options import PipelineOptions +from apache_beam.options.pipeline_options import StandardOptions +from log_elements import LogElements + + +def apply_transform(events): + return (events + | beam.WindowInto(FixedWindows(5), + trigger=AfterWatermark(), + accumulation_mode=AccumulationMode.DISCARDING, + allowed_lateness=Duration(seconds=0)) + | beam.CombineGlobally(beam.combiners.CountCombineFn()).without_defaults()) + + +def main(): + options = PipelineOptions() + options.view_as(StandardOptions).streaming = True + with beam.Pipeline(options=options) as p: + events = p | GenerateEvent.sample_data() + output = apply_transform(events) + output | LogElements(with_window=True) + + +if __name__ == "__main__": + main() diff --git a/learning/katas/python/Triggers/Event Time Triggers/Event Time Triggers/tests.py b/learning/katas/python/Triggers/Event Time Triggers/Event Time Triggers/tests.py new file mode 100644 index 000000000000..9bc77bc6c6ca --- /dev/null +++ b/learning/katas/python/Triggers/Event Time Triggers/Event Time Triggers/tests.py @@ -0,0 +1,87 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import pytz +from datetime import datetime +from task import apply_transform +from apache_beam.testing.test_pipeline import TestPipeline +from apache_beam.testing.test_stream import TestStream +from apache_beam.transforms import window +from apache_beam.options.pipeline_options import PipelineOptions +from apache_beam.options.pipeline_options import StandardOptions +from apache_beam.testing.util import assert_that, equal_to, equal_to_per_window +from test_helper import failed, passed, get_file_output, test_is_not_empty + + +def test_output(): + options = PipelineOptions() + options.view_as(StandardOptions).streaming = True + test_pipeline = TestPipeline(options=options) + + events = ( test_pipeline + | TestStream() + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 1, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 2, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 3, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 4, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 5, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 5, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 6, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 7, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 8, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 9, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 10, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 10, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 11, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 12, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 13, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 14, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 15, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 15, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 16, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 17, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 18, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 19, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 20, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 20, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 25, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to_infinity()) + + results = apply_transform(events) + 
+ answers = { + window.IntervalWindow(datetime(2021, 3, 1, 0, 0, 0, 0, tzinfo=pytz.UTC).timestamp(), + datetime(2021, 3, 1, 0, 0, 5, 0, tzinfo=pytz.UTC).timestamp()): [4], + window.IntervalWindow(datetime(2021, 3, 1, 0, 0, 5, 0, tzinfo=pytz.UTC).timestamp(), + datetime(2021, 3, 1, 0, 0, 10, 0, tzinfo=pytz.UTC).timestamp()): [5], + window.IntervalWindow(datetime(2021, 3, 1, 0, 0, 10, 0, tzinfo=pytz.UTC).timestamp(), + datetime(2021, 3, 1, 0, 0, 15, 0, tzinfo=pytz.UTC).timestamp()): [5], + window.IntervalWindow(datetime(2021, 3, 1, 0, 0, 15, 0, tzinfo=pytz.UTC).timestamp(), + datetime(2021, 3, 1, 0, 0, 20, 0, tzinfo=pytz.UTC).timestamp()): [5], + window.IntervalWindow(datetime(2021, 3, 1, 0, 0, 20, 0, tzinfo=pytz.UTC).timestamp(), + datetime(2021, 3, 1, 0, 0, 25, 0, tzinfo=pytz.UTC).timestamp()): [1], + } + + assert_that(results, + equal_to_per_window(answers), + label='count assert per window') + + test_pipeline.run() + + +if __name__ == '__main__': + test_is_not_empty() + test_output() diff --git a/learning/katas/python/Triggers/Event Time Triggers/lesson-info.yaml b/learning/katas/python/Triggers/Event Time Triggers/lesson-info.yaml new file mode 100644 index 000000000000..07e520d08006 --- /dev/null +++ b/learning/katas/python/Triggers/Event Time Triggers/lesson-info.yaml @@ -0,0 +1,21 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# + +content: +- Event Time Triggers \ No newline at end of file diff --git a/learning/katas/python/Triggers/Window Accumulation Mode/Window Accumulation Mode/generate_event.py b/learning/katas/python/Triggers/Window Accumulation Mode/Window Accumulation Mode/generate_event.py new file mode 100644 index 000000000000..dd190a270fc1 --- /dev/null +++ b/learning/katas/python/Triggers/Window Accumulation Mode/Window Accumulation Mode/generate_event.py @@ -0,0 +1,58 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import apache_beam as beam +from apache_beam.testing.test_stream import TestStream +from datetime import datetime +import pytz + + +class GenerateEvent(beam.PTransform): + + @staticmethod + def sample_data(): + return GenerateEvent() + + def expand(self, input): + + return (input + | TestStream() + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 1, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 2, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 3, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 4, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 5, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 5, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 6, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 7, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 8, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 9, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 10, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 10, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 11, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 12, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 13, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 14, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 15, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 15, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 16, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 17, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 18, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 19, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 20, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 20, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 25, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to_infinity()) diff --git a/learning/katas/python/Triggers/Window Accumulation Mode/Window Accumulation Mode/task-info.yaml b/learning/katas/python/Triggers/Window Accumulation Mode/Window Accumulation Mode/task-info.yaml new file mode 100644 index 000000000000..56c6adff5092 --- /dev/null +++ b/learning/katas/python/Triggers/Window Accumulation Mode/Window Accumulation Mode/task-info.yaml @@ -0,0 +1,31 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or 
more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# + +type: edu +files: +- name: task.py + visible: true + placeholders: + - offset: 1342 + length: 389 + placeholder_text: TODO() +- name: tests.py + visible: false +- name: generate_event.py + visible: true diff --git a/learning/katas/python/Triggers/Window Accumulation Mode/Window Accumulation Mode/task.md b/learning/katas/python/Triggers/Window Accumulation Mode/Window Accumulation Mode/task.md new file mode 100644 index 000000000000..01974cf0fc7c --- /dev/null +++ b/learning/katas/python/Triggers/Window Accumulation Mode/Window Accumulation Mode/task.md @@ -0,0 +1,64 @@ + + +Window Accumulation Mode +------------------------ + +When you specify a trigger, you must also set the the window’s accumulation mode. When a trigger +fires, it emits the current contents of the window as a pane. Since a trigger can fire multiple +times, the accumulation mode determines whether the system accumulates the window panes as the +trigger fires, or discards them. + +**Kata:** Given that sample events generated with one second granularity and a fixed window of 1-day duration, +please implement an early trigger that emits the number of events count immediately after new +element is processed in accumulating mode. + +
    + Use + AccumulationMode.ACCUMULATING so that the window accumulates the panes that are produced each time the + trigger fires. +
    + +
    + Use + AfterWatermark(early=...) to set early firing triggers. +
    + +
    + Use + FixedWindows with a 1-day duration together with the + + AfterWatermark trigger. +
    + +
    + Set the + allowed lateness to 0. +
    + +
    + Use + CombineGlobally and + CountCombineFn to calculate the count of events. +
    + +
    + Refer to the Beam Programming Guide + + "Event time triggers" section for more information. +
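The only difference from the Early Triggers kata is the accumulation mode, so this sketch focuses on that switch (again assuming `events` comes from `GenerateEvent`). With `AccumulationMode.ACCUMULATING`, each pane includes every element seen so far in the window, so the running counts grow 1, 2, ..., 20 and the final on-time pane is 20, matching tests.py; with `DISCARDING` the same stream would instead produce twenty panes of 1 and a final 0.

```python
import apache_beam as beam
from apache_beam.transforms.window import FixedWindows
from apache_beam.transforms.trigger import AfterWatermark, AfterCount, AccumulationMode
from apache_beam.utils.timestamp import Duration

counts = (events
          | beam.WindowInto(
              FixedWindows(1 * 24 * 60 * 60),               # one 1-day window holds all sample events
              trigger=AfterWatermark(early=AfterCount(1)),  # fire an early pane after every element
              # ACCUMULATING: each new pane counts all elements seen so far in the window.
              accumulation_mode=AccumulationMode.ACCUMULATING,
              allowed_lateness=Duration(seconds=0))
          | beam.CombineGlobally(beam.combiners.CountCombineFn()).without_defaults())
# Expected panes for the sample stream: 1, 2, 3, ..., 20, then an on-time pane of 20.
```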
    diff --git a/learning/katas/python/Triggers/Window Accumulation Mode/Window Accumulation Mode/task.py b/learning/katas/python/Triggers/Window Accumulation Mode/Window Accumulation Mode/task.py new file mode 100644 index 000000000000..16048f3e04aa --- /dev/null +++ b/learning/katas/python/Triggers/Window Accumulation Mode/Window Accumulation Mode/task.py @@ -0,0 +1,51 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# + +import apache_beam as beam +from generate_event import GenerateEvent +from apache_beam.transforms.window import FixedWindows +from apache_beam.transforms.trigger import AfterWatermark +from apache_beam.transforms.trigger import AfterCount +from apache_beam.transforms.trigger import AccumulationMode +from apache_beam.utils.timestamp import Duration +from apache_beam.options.pipeline_options import PipelineOptions +from apache_beam.options.pipeline_options import StandardOptions +from log_elements import LogElements + + +def apply_transform(events): + return (events + | beam.WindowInto(FixedWindows(1*24*60*60), # 1 Day Window + trigger=AfterWatermark(early=AfterCount(1)), + accumulation_mode=AccumulationMode.ACCUMULATING, + allowed_lateness=Duration(seconds=0)) + | beam.CombineGlobally(beam.combiners.CountCombineFn()).without_defaults()) + + +def main(): + options = PipelineOptions() + options.view_as(StandardOptions).streaming = True + with beam.Pipeline(options=options) as p: + events = p | GenerateEvent.sample_data() + output = apply_transform(events) + output | LogElements(with_window=True) + + +if __name__ == "__main__": + main() diff --git a/learning/katas/python/Triggers/Window Accumulation Mode/Window Accumulation Mode/tests.py b/learning/katas/python/Triggers/Window Accumulation Mode/Window Accumulation Mode/tests.py new file mode 100644 index 000000000000..80fd21b5f490 --- /dev/null +++ b/learning/katas/python/Triggers/Window Accumulation Mode/Window Accumulation Mode/tests.py @@ -0,0 +1,82 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import pytz +from datetime import datetime +from task import apply_transform +from apache_beam.testing.test_pipeline import TestPipeline +from apache_beam.testing.test_stream import TestStream +from apache_beam.transforms import window +from apache_beam.options.pipeline_options import PipelineOptions +from apache_beam.options.pipeline_options import StandardOptions +from apache_beam.testing.util import assert_that, equal_to, equal_to_per_window +from test_helper import failed, passed, get_file_output, test_is_not_empty + + +def test_output(): + options = PipelineOptions() + options.view_as(StandardOptions).streaming = True + test_pipeline = TestPipeline(options=options) + + events = ( test_pipeline + | TestStream() + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 1, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 2, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 3, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 4, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 5, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 5, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 6, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 7, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 8, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 9, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 10, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 10, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 11, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 12, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 13, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 14, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 15, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 15, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 16, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 17, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 18, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 19, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 20, 0, tzinfo=pytz.UTC).timestamp()) + .add_elements(elements=["event"], event_timestamp=datetime(2021, 3, 1, 0, 0, 20, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to(datetime(2021, 3, 1, 0, 0, 25, 0, tzinfo=pytz.UTC).timestamp()) + .advance_watermark_to_infinity()) + + results = apply_transform(events) + 
+ answers = { + window.IntervalWindow(datetime(2021, 3, 1, 0, 0, 0, 0, tzinfo=pytz.UTC).timestamp(), + datetime(2021, 3, 2, 0, 0, 0, 0, tzinfo=pytz.UTC).timestamp()): [1, 2, 3, 4, 5, + 6, 7, 8, 9, 10, + 11, 12, 13, 14, 15, + 16, 17, 18, 19, 20, 20], + } + + assert_that(results, + equal_to_per_window(answers), + label='count assert per window') + + test_pipeline.run() + + +if __name__ == '__main__': + test_is_not_empty() + test_output() diff --git a/learning/katas/python/Triggers/Window Accumulation Mode/lesson-info.yaml b/learning/katas/python/Triggers/Window Accumulation Mode/lesson-info.yaml new file mode 100644 index 000000000000..8a260af023db --- /dev/null +++ b/learning/katas/python/Triggers/Window Accumulation Mode/lesson-info.yaml @@ -0,0 +1,21 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# + +content: +- Window Accumulation Mode diff --git a/learning/katas/python/Triggers/section-info.yaml b/learning/katas/python/Triggers/section-info.yaml new file mode 100644 index 000000000000..f62f31674d4b --- /dev/null +++ b/learning/katas/python/Triggers/section-info.yaml @@ -0,0 +1,23 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# + +content: +- Event Time Triggers +- Early Triggers +- Window Accumulation Mode diff --git a/learning/katas/python/Windowing/Adding Timestamp/ParDo/task.md b/learning/katas/python/Windowing/Adding Timestamp/ParDo/task.md index 69628d127685..607079b150cc 100644 --- a/learning/katas/python/Windowing/Adding Timestamp/ParDo/task.md +++ b/learning/katas/python/Windowing/Adding Timestamp/ParDo/task.md @@ -35,7 +35,7 @@ outputs new elements with timestamps that you set.
    Use - beam.window.TimestampedValue to assign timestamp to the element. + beam.transforms.window.TimestampedValue to assign timestamp to the element.
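A small, hypothetical sketch of the pattern this hint refers to: a DoFn that attaches an event timestamp by emitting `beam.transforms.window.TimestampedValue`. The `(value, unix_seconds)` element shape is an assumption made only for this example:

```python
import apache_beam as beam
from apache_beam.transforms.window import TimestampedValue


class AddTimestampFn(beam.DoFn):
    # Assumes each input element is a (value, unix_seconds) tuple; re-emits the value
    # with that timestamp attached so downstream windowing can use event time.
    def process(self, element):
        value, unix_seconds = element
        yield TimestampedValue(value, unix_seconds)


# Usage sketch:
#   p | beam.Create([('a', 1614556801), ('b', 1614556802)]) | beam.ParDo(AddTimestampFn())
```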
    diff --git a/learning/katas/python/course-info.yaml b/learning/katas/python/course-info.yaml index 7a311cc40e6c..ad3fe3025915 100644 --- a/learning/katas/python/course-info.yaml +++ b/learning/katas/python/course-info.yaml @@ -29,4 +29,5 @@ content: - Common Transforms - IO - Windowing -- Examples +- Triggers +- Examples \ No newline at end of file diff --git a/local-env-setup.sh b/local-env-setup.sh new file mode 100755 index 000000000000..c52d15b4e741 --- /dev/null +++ b/local-env-setup.sh @@ -0,0 +1,128 @@ +#!/usr/bin/env bash + +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +darwin_install_pip3_packages() { + echo "Installing setuptools grpcio-tools virtualenv" + pip3 install setuptools grpcio-tools virtualenv + echo "Installing mypy-protobuf" + pip3 install --user mypy-protobuf +} + +install_go_packages(){ + echo "Installing goavro" + go get github.com/linkedin/goavro + # As we are using bash, we are assuming .bashrc exists. + grep -qxF "export GOPATH=${PWD}/sdks/go/examples/.gogradle/project_gopath" ~/.bashrc + gopathExists=$? + if [ $gopathExists -ne 0 ]; then + export GOPATH=${PWD}/sdks/go/examples/.gogradle/project_gopath && echo "GOPATH was set for this session to '$GOPATH'. Make sure to add this path to your ~/.bashrc file after the execution of this script." + fi +} + +kernelname=$(uname -s) + +# Running on Linux +if [ "$kernelname" = "Linux" ]; then + # Assuming Debian based Linux and the prerequisites in https://beam.apache.org/contribute/ are met: + apt-get update + + echo "Installing dependencies listed in pkglist file" + apt-get install -y $(grep -v '^#' dev-support/docker/pkglist | cat) # pulling dependencies from pkglist file + + type -P python3 > /dev/null 2>&1 + python3Exists=$? + type -P pip3 > /dev/null 2>&1 + pip3Exists=$? + if [ $python3Exists -eq 0 -a $pip3Exists -eq 0 ]; then + echo "Installing grpcio-tools mypy-protobuf virtualenv" + pip3 install grpcio-tools mypy-protobuf virtualenv + else + echo "Python3 and pip3 are required but failed to install. Install them manually and rerun the script." + exit + fi + + type -P go > /dev/null 2>&1 + goExists=$? + if [ $goExists -eq 0 ]; then + install_go_packages + else + echo "Go is required. Install it manually from https://golang.org/doc/install and rerun the script." + exit + fi + +# Running on Mac +elif [ "$kernelname" = "Darwin" ]; then + # Check for Homebrew, install if we don't have it + type -P brew > /dev/null 2>&1 + brewExists=$? + if [ $brewExists -ne 0 ]; then + echo "Installing homebrew" + /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)" + fi + + # Update homebrew recipes + echo "Updating brew" + brew update + + # Assuming we are using brew + if brew ls --versions openjdk@8 > /dev/null; then + echo "openjdk@8 already installed. 
Skipping" + else + echo "Installing openjdk@8" + brew install openjdk@8 + fi + + type -P python3 > /dev/null 2>&1 + python3Exists=$? + type -P pip3 > /dev/null 2>&1 + pip3Exists=$? + if [ $python3Exists -eq 0 -a $pip3Exists -eq 0 ]; then + darwin_install_pip3_packages + else + echo "Python3 and pip3 are required but failed to install. Install them manually and rerun the script." + exit + fi + + type -P tox > /dev/null 2>&1 + toxExists=$? + if [ $toxExists -eq 0 ]; then + echo "tox already installed. Skipping" + else + echo "Installing tox" + brew install tox + fi + + type -P docker > /dev/null 2>&1 + dockerExists=$? + if [ $dockerExists -eq 0 ]; then + echo "docker already installed. Skipping" + else + echo "Installing docker" + brew install --cask docker + fi + + type -P go > /dev/null 2>&1 + goExists=$? + if [ $goExists -eq 0 ]; then + install_go_packages + else + echo "Go is required. Install it manually from https://golang.org/doc/install and rerun the script." + exit + fi + +else echo "Unrecognized Kernel Name: $kernelname" +fi diff --git a/model/fn-execution/src/main/proto/beam_fn_api.proto b/model/fn-execution/src/main/proto/beam_fn_api.proto index eaf5dc3be742..95693e37180c 100644 --- a/model/fn-execution/src/main/proto/beam_fn_api.proto +++ b/model/fn-execution/src/main/proto/beam_fn_api.proto @@ -34,7 +34,7 @@ syntax = "proto3"; package org.apache.beam.model.fn_execution.v1; -option go_package = "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1;fnexecution_v1"; +option go_package = "github.com/apache/beam/sdks/v2/go/pkg/beam/model/fnexecution_v1;fnexecution_v1"; option java_package = "org.apache.beam.model.fnexecution.v1"; option java_outer_classname = "BeamFnApi"; @@ -709,6 +709,15 @@ message StateKey { bytes window = 3; } + // Represents a request for an unordered set of values associated with a + // specified user key and window for a PTransform. See + // https://s.apache.org/beam-fn-state-api-and-bundle-processing for further + // details. + // + // The response data stream will be a concatenation of all V's associated + // with the specified user key and window. + // See https://s.apache.org/beam-fn-api-send-and-receive-data for further + // details. message BagUserState { // (Required) The id of the PTransform containing user state. string transform_id = 1; @@ -721,6 +730,53 @@ message StateKey { bytes key = 4; } + // Represents a request for the keys of a multimap associated with a specified + // user key and window for a PTransform. See + // https://s.apache.org/beam-fn-state-api-and-bundle-processing for further + // details. + // + // Can only be used to perform StateGetRequests and StateClearRequests on the + // user state. + // + // The response data stream will be a concatenation of all K's associated + // with the specified user key and window. + // See https://s.apache.org/beam-fn-api-send-and-receive-data for further + // details. + message MultimapKeysUserState { + // (Required) The id of the PTransform containing user state. + string transform_id = 1; + // (Required) The id of the user state. + string user_state_id = 2; + // (Required) The window encoded in a nested context. + bytes window = 3; + // (Required) The key of the currently executing element encoded in a + // nested context. + bytes key = 4; + } + + // Represents a request for the values of the map key associated with a + // specified user key and window for a PTransform. See + // https://s.apache.org/beam-fn-state-api-and-bundle-processing for further + // details. 
+ // + // The response data stream will be a concatenation of all V's associated + // with the specified map key, user key, and window. + // See https://s.apache.org/beam-fn-api-send-and-receive-data for further + // details. + message MultimapUserState { + // (Required) The id of the PTransform containing user state. + string transform_id = 1; + // (Required) The id of the user state. + string user_state_id = 2; + // (Required) The window encoded in a nested context. + bytes window = 3; + // (Required) The key of the currently executing element encoded in a + // nested context. + bytes key = 4; + // (Required) The map key encoded in a nested context. + bytes map_key = 5; + } + // (Required) One of the following state keys must be set. oneof type { Runner runner = 1; @@ -728,7 +784,8 @@ message StateKey { BagUserState bag_user_state = 3; IterableSideInput iterable_side_input = 4; MultimapKeysSideInput multimap_keys_side_input = 5; - // TODO: represent a state key for user map state + MultimapKeysUserState multimap_keys_user_state = 6; + MultimapUserState multimap_user_state = 7; } } diff --git a/model/fn-execution/src/main/proto/beam_provision_api.proto b/model/fn-execution/src/main/proto/beam_provision_api.proto index cef769a47d31..9ddc8b470122 100644 --- a/model/fn-execution/src/main/proto/beam_provision_api.proto +++ b/model/fn-execution/src/main/proto/beam_provision_api.proto @@ -25,7 +25,7 @@ syntax = "proto3"; package org.apache.beam.model.fn_execution.v1; -option go_package = "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1;fnexecution_v1"; +option go_package = "github.com/apache/beam/sdks/v2/go/pkg/beam/model/fnexecution_v1;fnexecution_v1"; option java_package = "org.apache.beam.model.fnexecution.v1"; option java_outer_classname = "ProvisionApi"; @@ -77,4 +77,12 @@ message ProvisionInfo { // The set of dependencies that should be staged into this environment. repeated org.apache.beam.model.pipeline.v1.ArtifactInformation dependencies = 11; + + // (optional) A set of capabilities that this SDK is allowed to use in its + // interactions with this runner. + repeated string runner_capabilities = 12; + + // (optional) Runtime environment metadata that are static throughout the + // pipeline execution. 
+ map metadata = 13; } diff --git a/model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml b/model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml index db7021d380e9..ef377722dd19 100644 --- a/model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml +++ b/model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml @@ -409,6 +409,7 @@ examples: "\x01\x00\x00\x00\x00\x02\x03foo\x01\xa9F\x03bar\x01\xff\xff\xff\xff\xff\xff\xff\xff\x7f": {f_map: {"foo": 9001, "bar": 9223372036854775807}} "\x01\x00\x00\x00\x00\x04\neverything\x00\x02is\x00\x05null!\x00\r\xc2\xaf\\_(\xe3\x83\x84)_/\xc2\xaf\x00": {f_map: {"everything": null, "is": null, "null!": null, "¯\\_(ツ)_/¯": null}} +--- coder: urn: "beam:coder:row:v1" @@ -440,3 +441,25 @@ examples: shardId: "", key: "key" } + +--- + +# Java code snippet to generate example bytes: +# TimestampPrefixingWindowCoder coder = TimestampPrefixingWindowCoder.of(IntervalWindowCoder.of()); +# Instant end = new Instant(-9223372036854410L); +# Duration span = Duration.millis(365L); +# IntervalWindow window = new IntervalWindow(end.minus(span), span); +# byte[] bytes = CoderUtils.encodeToByteArray(coder, window); +# String str = new String(bytes, java.nio.charset.StandardCharsets.ISO_8859_1); +# String example = ""; +# for(int i = 0; i < str.length(); i++){ +# example += CharUtils.unicodeEscaped(str.charAt(i)); +# } +# System.out.println(example); +coder: + urn: "beam:coder:custom_window:v1" + components: [{urn: "beam:coder:interval_window:v1"}] + +examples: + "\u0080\u0000\u0001\u0052\u009a\u00a4\u009b\u0067\u0080\u0000\u0001\u0052\u009a\u00a4\u009b\u0068\u0080\u00dd\u00db\u0001" : {window: {end: 1454293425000, span: 3600000}} + "\u007f\u00df\u003b\u0064\u005a\u001c\u00ad\u0075\u007f\u00df\u003b\u0064\u005a\u001c\u00ad\u0076\u00ed\u0002" : {window: {end: -9223372036854410, span: 365}} diff --git a/model/job-management/src/main/proto/beam_artifact_api.proto b/model/job-management/src/main/proto/beam_artifact_api.proto index dc27fc6f6969..7f64bc35caaf 100644 --- a/model/job-management/src/main/proto/beam_artifact_api.proto +++ b/model/job-management/src/main/proto/beam_artifact_api.proto @@ -25,7 +25,7 @@ syntax = "proto3"; package org.apache.beam.model.job_management.v1; -option go_package = "github.com/apache/beam/sdks/go/pkg/beam/model/jobmanagement_v1;jobmanagement_v1"; +option go_package = "github.com/apache/beam/sdks/v2/go/pkg/beam/model/jobmanagement_v1;jobmanagement_v1"; option java_package = "org.apache.beam.model.jobmanagement.v1"; option java_outer_classname = "ArtifactApi"; diff --git a/model/job-management/src/main/proto/beam_expansion_api.proto b/model/job-management/src/main/proto/beam_expansion_api.proto index a888741f89c7..c8e58ed0937b 100644 --- a/model/job-management/src/main/proto/beam_expansion_api.proto +++ b/model/job-management/src/main/proto/beam_expansion_api.proto @@ -25,7 +25,7 @@ syntax = "proto3"; package org.apache.beam.model.expansion.v1; -option go_package = "github.com/apache/beam/sdks/go/pkg/beam/model/jobmanagement_v1;jobmanagement_v1"; +option go_package = "github.com/apache/beam/sdks/v2/go/pkg/beam/model/jobmanagement_v1;jobmanagement_v1"; option java_package = "org.apache.beam.model.expansion.v1"; option java_outer_classname = "ExpansionApi"; diff --git a/model/job-management/src/main/proto/beam_job_api.proto b/model/job-management/src/main/proto/beam_job_api.proto index 
0a05738e6cf5..69f3f8fd8101 100644 --- a/model/job-management/src/main/proto/beam_job_api.proto +++ b/model/job-management/src/main/proto/beam_job_api.proto @@ -25,7 +25,7 @@ syntax = "proto3"; package org.apache.beam.model.job_management.v1; -option go_package = "github.com/apache/beam/sdks/go/pkg/beam/model/jobmanagement_v1;jobmanagement_v1"; +option go_package = "github.com/apache/beam/sdks/v2/go/pkg/beam/model/jobmanagement_v1;jobmanagement_v1"; option java_package = "org.apache.beam.model.jobmanagement.v1"; option java_outer_classname = "JobApi"; diff --git a/model/pipeline/src/main/proto/beam_runner_api.proto b/model/pipeline/src/main/proto/beam_runner_api.proto index ce561d36da98..80370c538bf0 100644 --- a/model/pipeline/src/main/proto/beam_runner_api.proto +++ b/model/pipeline/src/main/proto/beam_runner_api.proto @@ -25,7 +25,7 @@ syntax = "proto3"; package org.apache.beam.model.pipeline.v1; -option go_package = "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1;pipeline_v1"; +option go_package = "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1;pipeline_v1"; option java_package = "org.apache.beam.model.pipeline.v1"; option java_outer_classname = "RunnerApi"; @@ -58,6 +58,10 @@ message BeamConstants { // on any proto message that may contain references needing resolution. message Components { // (Required) A map from pipeline-scoped id to PTransform. + // + // Keys of the transforms map may be used by runners to identify pipeline + // steps. Hence it's recommended to use strings that are not too long that + // match regex '[A-Za-z0-9-_]+'. map transforms = 1; // (Required) A map from pipeline-scoped id to PCollection. @@ -151,7 +155,11 @@ message PTransform { // (Optional) A list of the ids of transforms that it contains. // - // Primitive transforms are not allowed to specify this. + // Primitive transforms (see StandardPTransforms.Primitives) are not allowed + // to specify subtransforms. + // + // Note that a composite transform may have zero subtransforms as long as it + // only outputs PCollections that are in its inputs. repeated string subtransforms = 2; // (Required) A map from local names of inputs (unique only with this map, and @@ -188,6 +196,15 @@ message PTransform { // Transforms that are required to be implemented by a runner must omit this. // All other transforms are required to specify this. string environment_id = 7; + + // (Optional) A map from URNs designating a type of annotation, to the + // annotation in binary format. For example, an annotation could indicate + // that this PTransform has specific privacy properties. + // + // A runner MAY ignore types of annotations it doesn't understand. Therefore + // annotations MUST NOT be used for metadata that can affect correct + // execution of the transform. + map annotations = 8; } message StandardPTransforms { @@ -297,10 +314,10 @@ message StandardPTransforms { // A transform that translates a given element to its human-readable // representation. - // + // // Input: KV // Output: KV - // + // // For each given element, the implementation returns the best-effort // human-readable representation. When possible, the implementation could // call a user-overridable method on the type. For example, Java could @@ -611,7 +628,7 @@ message TestStreamPayload { // Advances the watermark to the specified timestamp. message AdvanceWatermark { - // (Required) The watermark to advance to. + // (Required) The watermark in millisecond to advance to. 
int64 new_watermark = 1; // (Optional) The output watermark tag for a PCollection. If unspecified @@ -622,7 +639,7 @@ message TestStreamPayload { // Advances the processing time clock by the specified amount. message AdvanceProcessingTime { - // (Required) The duration to advance by. + // (Required) The duration in millisecond to advance by. int64 advance_duration = 1; } @@ -644,7 +661,7 @@ message TestStreamPayload { // encoding primitives. bytes encoded_element = 1; - // (Required) The event timestamp of this element. + // (Required) The event timestamp in millisecond of this element. int64 timestamp = 2; } } @@ -681,12 +698,18 @@ message WriteFilesPayload { // Payload used by Google Cloud Pub/Sub read transform. // This can be used by runners that wish to override Beam Pub/Sub read transform // with a native implementation. +// The SDK should guarantee that only one of topic, subscription, +// topic_runtime_overridden and subscription_runtime_overridden is set. +// The output of PubSubReadPayload should be bytes of serialized PubsubMessage +// proto if with_attributes == true. Otherwise, the bytes is the raw payload. message PubSubReadPayload { // Topic to read from. Exactly one of topic or subscription should be set. + // Topic format is: /topics/project_id/subscription_name string topic = 1; // Subscription to read from. Exactly one of topic or subscription should be set. + // Subscription format is: /subscriptions/project_id/subscription_name string subscription = 2; // Attribute that provides element timestamps. @@ -697,14 +720,25 @@ message PubSubReadPayload { // If true, reads Pub/Sub payload as well as attributes. If false, reads only the payload. bool with_attributes = 5; + + // If set, the topic is expected to be provided during runtime. + string topic_runtime_overridden = 6; + + // If set, the subscription that is expected to be provided during runtime. + string subscription_runtime_overridden = 7; } // Payload used by Google Cloud Pub/Sub write transform. // This can be used by runners that wish to override Beam Pub/Sub write transform // with a native implementation. +// The SDK should guarantee that only one of topic and topic_runtime_overridden +// is set. +// The output of PubSubWritePayload should be bytes if serialized PubsubMessage +// proto. message PubSubWritePayload { // Topic to write to. + // Topic format is: /topics/project_id/subscription_name string topic = 1; // Attribute that provides element timestamps. @@ -712,14 +746,20 @@ message PubSubWritePayload { // Attribute that uniquely identify messages. string id_attribute = 3; + + // If set, the topic is expected to be provided during runtime. + string topic_runtime_overridden = 4; } // Payload for GroupIntoBatches composite transform. message GroupIntoBatchesPayload { - // (Required) Max size of a batch. + // Max size of a batch. int64 batch_size = 1; + // Max byte size of a batch in element. + int64 batch_size_bytes = 3; + // (Optional) Max duration a batch is allowed to be cached in states. int64 max_buffering_duration_millis = 2; } @@ -899,6 +939,31 @@ message StandardCoders { // Experimental. STATE_BACKED_ITERABLE = 9 [(beam_urn) = "beam:coder:state_backed_iterable:v1"]; + + // Encodes an arbitrary user defined window and its max timestamp (inclusive). + // The encoding format is: + // maxTimestamp window + // + // maxTimestamp - A big endian 8 byte integer representing millis-since-epoch. 
+ // The encoded representation is shifted so that the byte representation + // of negative values are lexicographically ordered before the byte + // representation of positive values. This is typically done by + // subtracting -9223372036854775808 from the value and encoding it as a + // signed big endian integer. Example values: + // + // -9223372036854775808: 00 00 00 00 00 00 00 00 + // -255: 7F FF FF FF FF FF FF 01 + // -1: 7F FF FF FF FF FF FF FF + // 0: 80 00 00 00 00 00 00 00 + // 1: 80 00 00 00 00 00 00 01 + // 256: 80 00 00 00 00 00 01 00 + // 9223372036854775807: FF FF FF FF FF FF FF FF + // + // window - the window is encoded using the supplied window coder. + // + // Components: Coder for the custom window type. + CUSTOM_WINDOW = 16 [(beam_urn) = "beam:coder:custom_window:v1"]; + // Additional Standard Coders // -------------------------- // The following coders are not required to be implemented for an SDK or @@ -948,13 +1013,13 @@ message StandardCoders { // values. // - For present values the null indicator is followed by the value // encoded with it's corresponding coder. - // + // // Well known logical types: // beam:logical_type:micros_instant:v1 // - Representation type: ROW // - A timestamp without a timezone where seconds + micros represents the // amount of time since the epoch. - // + // // The payload for RowCoder is an instance of Schema. // Components: None // Experimental. @@ -1145,12 +1210,9 @@ message TimeDomain { // execution of your pipeline PROCESSING_TIME = 2; - // Synchronized processing time is the minimum of the - // processing time of all pending elements. - // - // The "processing time" of an element refers to - // the local processing time at which it was emitted - SYNCHRONIZED_PROCESSING_TIME = 3; + // Synchronized processing time is _not_ part of the user-facing model + reserved 3; + reserved "SYNCHRONIZED_PROCESSING_TIME"; } } @@ -1342,7 +1404,11 @@ message StandardArtifacts { enum Roles { // A URN for staging-to role. // payload: ArtifactStagingToRolePayload - STAGING_TO = 0 [(beam_urn) = "beam:artifact:role:staging_to:v1"]; + STAGING_TO = 0 [(beam_urn) = "beam:artifact:role:staging_to:v1"]; + + // A URN for pip-requirements-file role. + // payload: None + PIP_REQUIREMENTS_FILE = 1 [(beam_urn) = "beam:artifact:role:pip_requirements_file:v1"]; } } @@ -1357,6 +1423,9 @@ message ArtifactFilePayload { message ArtifactUrlPayload { // a string for an artifact URL e.g. "https://.../foo.jar" or "gs://tmp/foo.jar" string url = 1; + + // (Optional) The hex-encoded sha256 checksum of the artifact if available. + string sha256 = 2; } message EmbeddedFilePayload { @@ -1431,6 +1500,11 @@ message Environment { // (Optional) artifact dependency information used for executing UDFs in this environment. repeated ArtifactInformation dependencies = 6; + // (Optional) A mapping of resource URNs to requested values. The encoding + // of the values is specified by the URN. Resource hints are advisory; + // a runner is free to ignore resource hints that it does not understand. + map resource_hints = 7; + reserved 1; } @@ -1441,6 +1515,8 @@ message StandardEnvironments { PROCESS = 1 [(beam_urn) = "beam:env:process:v1"]; // A managed native process to run user code. EXTERNAL = 2 [(beam_urn) = "beam:env:external:v1"]; // An external non managed process to run user code. + + DEFAULT = 3 [(beam_urn) = "beam:env:default:v1"]; // Used as a stub when context is missing a runner-provided default environment. 
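A minimal Python sketch (not part of the diff; the helper name is purely illustrative) of the shifted big-endian maxTimestamp encoding that the beam:coder:custom_window:v1 comment above describes, spot-checked against the example byte values listed in that comment:

import struct

def encode_shifted_max_timestamp(millis_since_epoch):
    # Shift by subtracting -9223372036854775808 (equivalently, adding 2**63,
    # wrapping modulo 2**64) and emit the result as an unsigned big-endian
    # 8-byte integer, so negative values order before positive ones.
    return struct.pack(">Q", (millis_since_epoch + 2**63) % 2**64)

# Spot checks against the example values in the proto comment.
assert encode_shifted_max_timestamp(-9223372036854775808).hex() == "0000000000000000"
assert encode_shifted_max_timestamp(-255).hex() == "7fffffffffffff01"
assert encode_shifted_max_timestamp(-1).hex() == "7fffffffffffffff"
assert encode_shifted_max_timestamp(0).hex() == "8000000000000000"
assert encode_shifted_max_timestamp(256).hex() == "8000000000000100"
assert encode_shifted_max_timestamp(9223372036854775807).hex() == "ffffffffffffffff"

Per the same comment, the full custom_window encoding then appends the window itself, encoded with the supplied component coder.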
} } @@ -1482,6 +1558,22 @@ message StandardProtocols { // simply indicates this SDK can actually parallelize the work across multiple // cores. MULTI_CORE_BUNDLE_PROCESSING = 3 [(beam_urn) = "beam:protocol:multi_core_bundle_processing:v1"]; + + // Indicates that this SDK handles the InstructionRequest of type + // HarnessMonitoringInfosRequest. + // A request to provide full MonitoringInfo data associated with + // the entire SDK harness process, not specific to a bundle. + HARNESS_MONITORING_INFOS = 4 + [(beam_urn) = "beam:protocol:harness_monitoring_infos:v1"]; + } +} + +// These URNs are used to indicate capabilities of runner that an environment +// may take advantage of when interacting with this runner. +message StandardRunnerProtocols { + enum Enum { + // Indicates suport the MonitoringInfo short id protocol. + MONITORING_INFO_SHORT_IDS = 0 [(beam_urn) = "beam:protocol:monitoring_info_short_ids:v1"]; } } @@ -1595,8 +1687,8 @@ message FunctionSpec { message StandardDisplayData { enum DisplayData { // A string label and value. Has a payload containing an encoded - // LabelledStringPayload. - LABELLED_STRING = 0 [(beam_urn) = "beam:display_data:labelled_string:v1"]; + // LabelledPayload. + LABELLED = 0 [(beam_urn) = "beam:display_data:labelled:v1"]; // Some samples that are being considered to become standard display data // types follow: @@ -1617,13 +1709,16 @@ message StandardDisplayData { } } -message LabelledStringPayload { +message LabelledPayload { // (Required) A human readable label for the value. string label = 1; - // (Required) A value which will be displayed to the user. The urn describes - // how the value can be interpreted and/or categorized. - string value = 2; + // (Required) A value which will be displayed to the user. + oneof value { + string string_value = 2; + bool bool_value = 3; + double double_value = 4; + } } // Static display data associated with a pipeline component. Display data is @@ -1779,3 +1874,13 @@ message ExecutableStagePayload { } } } + +message StandardResourceHints { + enum Enum { + // Describes hardware accelerators that are desired to have in the execution environment. + ACCELERATOR = 0 [(beam_urn) = "beam:resources:accelerator:v1"]; + // Describes desired minimal available RAM size in transform's execution environment. + // SDKs should convert the size to bytes, but can allow users to specify human-friendly units (e.g. GiB). 
+ MIN_RAM_BYTES = 1 [(beam_urn) = "beam:resources:min_ram_bytes:v1"]; + } +} diff --git a/model/pipeline/src/main/proto/endpoints.proto b/model/pipeline/src/main/proto/endpoints.proto index e757a82d27d4..46fb3d5d3b1c 100644 --- a/model/pipeline/src/main/proto/endpoints.proto +++ b/model/pipeline/src/main/proto/endpoints.proto @@ -24,7 +24,7 @@ syntax = "proto3"; package org.apache.beam.model.pipeline.v1; -option go_package = "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1;pipeline_v1"; +option go_package = "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1;pipeline_v1"; option java_package = "org.apache.beam.model.pipeline.v1"; option java_outer_classname = "Endpoints"; diff --git a/model/pipeline/src/main/proto/external_transforms.proto b/model/pipeline/src/main/proto/external_transforms.proto index 54a5548bd5a7..a528e565e4b0 100644 --- a/model/pipeline/src/main/proto/external_transforms.proto +++ b/model/pipeline/src/main/proto/external_transforms.proto @@ -24,11 +24,12 @@ syntax = "proto3"; package org.apache.beam.model.pipeline.v1; -option go_package = "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1;pipeline_v1"; +option go_package = "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1;pipeline_v1"; option java_package = "org.apache.beam.model.pipeline.v1"; option java_outer_classname = "ExternalTransforms"; import "schema.proto"; +import "beam_runner_api.proto"; // A configuration payload for an external transform. // Used as the payload of ExternalTransform as part of an ExpansionRequest. @@ -40,3 +41,65 @@ message ExternalConfigurationPayload { // schema. bytes payload = 2; } + +// Defines specific expansion methods that may be used to expand cross-language +// transforms. +// Has to be set as the URN of the transform of the expansion request. +message ExpansionMethods { + enum Enum { + // Expand a Java transform using specified constructor and builder methods. + // Transform payload will be of type JavaClassLookupPayload. + JAVA_CLASS_LOOKUP = 0 [(org.apache.beam.model.pipeline.v1.beam_urn) = + "beam:expansion:payload:java_class_lookup:v1"]; + } +} + +// A configuration payload for an external transform. +// Used to define a Java transform that can be directly instantiated by a Java +// expansion service. +message JavaClassLookupPayload { + // Name of the Java transform class. + string class_name = 1; + + // A static method to construct the initial instance of the transform. + // If not provided, the transform should be instantiated using a class + // constructor. + string constructor_method = 2; + + // The top level fields of the schema represent the method parameters in + // order. + // If able, top level field names are also verified against the method + // parameters for a match. + Schema constructor_schema = 3; + + // A payload which can be decoded using beam:coder:row:v1 and the provided + // constructor schema. + bytes constructor_payload = 4; + + // Set of builder methods and corresponding parameters to apply after the + // transform object is constructed. + // When constructing the transform object, given builder methods will be + // applied in order. + repeated BuilderMethod builder_methods = 5; +} + +// This represents a builder method of the transform class that should be +// applied in-order after instantiating the initial transform object. +// Each builder method may take one or more parameters and has to return an +// instance of the transform object. 
+message BuilderMethod { + // Name of the builder method + string name = 1; + + // The top level fields of the schema represent the method parameters in + // order. + // If able, top level field names are also verified against the method + // parameters for a match. + Schema schema = 2; + + // A payload which can be decoded using beam:coder:row:v1 and the builder + // method schema. + bytes payload = 3; +} + + diff --git a/model/pipeline/src/main/proto/metrics.proto b/model/pipeline/src/main/proto/metrics.proto index 39ef551130ee..8f819b600dc9 100644 --- a/model/pipeline/src/main/proto/metrics.proto +++ b/model/pipeline/src/main/proto/metrics.proto @@ -24,7 +24,7 @@ syntax = "proto3"; package org.apache.beam.model.pipeline.v1; -option go_package = "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1;pipeline_v1"; +option go_package = "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1;pipeline_v1"; option java_package = "org.apache.beam.model.pipeline.v1"; option java_outer_classname = "MetricsApi"; @@ -321,7 +321,33 @@ message MonitoringInfoSpecs { annotations: [ { key: "description", - value: "Request counts with status made to an IOs service APIs to batch read or write elements." + value: "Request counts with status made to IO service APIs to batch read or write elements." + }, + { + key: "process_metric", // Should be reported as a process metric + // instead of a bundle metric + value: "true" + } + ] + }]; + + API_REQUEST_LATENCIES = 20 [(monitoring_info_spec) = { + urn: "beam:metric:io:api_request_latencies:v1", + type: "beam:metrics:histogram_int64:v1", + required_labels: [ + "SERVICE", + "METHOD", + "RESOURCE", + "PTRANSFORM" + ], + annotations: [ + { + key: "description", + value: "Histogram counts for request latencies made to IO service APIs to batch read or write elements." + }, + { + key: "units", + value: "Milliseconds" }, { key: "process_metric", // Should be reported as a process metric @@ -387,6 +413,13 @@ message MonitoringInfo { BIGQUERY_TABLE = 13 [(label_props) = { name: "BIGQUERY_TABLE" }]; BIGQUERY_VIEW = 14 [(label_props) = { name: "BIGQUERY_VIEW" }]; BIGQUERY_QUERY_NAME = 15 [(label_props) = { name: "BIGQUERY_QUERY_NAME" }]; + GCS_BUCKET = 16 [(label_props) = { name: "GCS_BUCKET"}]; + GCS_PROJECT_ID = 17 [(label_props) = { name: "GCS_PROJECT_ID"}]; + DATASTORE_PROJECT = 18 [(label_props) = { name: "DATASTORE_PROJECT" }]; + DATASTORE_NAMESPACE = 19 [(label_props) = { name: "DATASTORE_NAMESPACE" }]; + BIGTABLE_PROJECT_ID = 20 [(label_props) = { name: "BIGTABLE_PROJECT_ID"}]; + INSTANCE_ID = 21 [(label_props) = { name: "INSTANCE_ID"}]; + TABLE_ID = 22 [(label_props) = { name: "TABLE_ID"}]; } // A set of key and value labels which define the scope of the metric. For diff --git a/model/pipeline/src/main/proto/schema.proto b/model/pipeline/src/main/proto/schema.proto index a40087c378ff..23c32a0db968 100644 --- a/model/pipeline/src/main/proto/schema.proto +++ b/model/pipeline/src/main/proto/schema.proto @@ -27,7 +27,7 @@ syntax = "proto3"; package org.apache.beam.model.pipeline.v1; -option go_package = "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1;pipeline_v1"; +option go_package = "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1;pipeline_v1"; option java_package = "org.apache.beam.model.pipeline.v1"; option java_outer_classname = "SchemaApi"; @@ -37,6 +37,8 @@ message Schema { // REQUIRED. An RFC 4122 UUID. string id = 2; repeated Option options = 3; + // Indicates that encoding positions have been overridden. 
+ bool encoding_positions_set = 4; } message Field { @@ -52,6 +54,8 @@ message Field { // or all of them are. Used to support backwards compatibility with schema // changes. // If no fields have encoding position populated the order of encoding is the same as the order in the Schema. + // If this Field is part of a Schema where encoding_positions_set is True then encoding_position must be + // defined, otherwise this field is ignored. int32 encoding_position = 5; repeated Option options = 6; } @@ -109,9 +113,9 @@ message LogicalType { message Option { // REQUIRED. Identifier for the option. string name = 1; - // OPTIONAL. Type specifer for the structure of value. - // If not present, assumes no additional configuration is needed - // for this option and value is ignored. + // REQUIRED. Type specifer for the structure of value. + // Conventionally, options that don't require additional configuration should + // use a boolean type, with the value set to true. FieldType type = 2; FieldValue value = 3; } diff --git a/model/pipeline/src/main/proto/standard_window_fns.proto b/model/pipeline/src/main/proto/standard_window_fns.proto index adb357de5f7c..237cbd67ae71 100644 --- a/model/pipeline/src/main/proto/standard_window_fns.proto +++ b/model/pipeline/src/main/proto/standard_window_fns.proto @@ -25,7 +25,7 @@ syntax = "proto3"; package org.apache.beam.model.pipeline.v1; -option go_package = "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1;pipeline_v1"; +option go_package = "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1;pipeline_v1"; option java_package = "org.apache.beam.model.pipeline.v1"; option java_outer_classname = "StandardWindowFns"; diff --git a/ownership/PYTHON_DEPENDENCY_OWNERS.yaml b/ownership/PYTHON_DEPENDENCY_OWNERS.yaml index 52292949797e..11f7fd0689f9 100644 --- a/ownership/PYTHON_DEPENDENCY_OWNERS.yaml +++ b/ownership/PYTHON_DEPENDENCY_OWNERS.yaml @@ -93,5 +93,14 @@ deps: typing: owners: robertwb + pandas: + owners: bhulette, robertwb + + pyarrow: + owners: bhulette + + numpy: + owners: bhulette + # No dependency added below this line. ... diff --git a/release/build.gradle b/release/build.gradle.kts similarity index 51% rename from release/build.gradle rename to release/build.gradle.kts index 4abe92ff5b16..99ab361ffbb6 100644 --- a/release/build.gradle +++ b/release/build.gradle.kts @@ -16,25 +16,32 @@ * limitations under the License. 
*/ -plugins { id 'groovy' } +plugins { + groovy + id("org.apache.beam.module") +} repositories { mavenCentral() } +val library = project.extensions.extraProperties["library"] as Map> + dependencies { - compile library.groovy.groovy_all - compile 'commons-cli:commons-cli:1.2' + compile(library.getValue("groovy").getValue("groovy_all")) + compile("commons-cli:commons-cli:1.2") + permitUnusedDeclared("commons-cli:commons-cli:1.2") // BEAM-11761 } -task runJavaExamplesValidationTask { +task("runJavaExamplesValidationTask") { group = "Verification" description = "Run the Beam quickstart across all Java runners" - dependsOn ":runners:direct-java:runQuickstartJavaDirect" - dependsOn ":runners:google-cloud-dataflow-java:runQuickstartJavaDataflow" - dependsOn ":runners:spark:runQuickstartJavaSpark" - dependsOn ":runners:flink:1.10:runQuickstartJavaFlinkLocal" - dependsOn ":runners:direct-java:runMobileGamingJavaDirect" - dependsOn ":runners:google-cloud-dataflow-java:runMobileGamingJavaDataflow" - dependsOn ":runners:twister2:runQuickstartJavaTwister2" + dependsOn(":runners:direct-java:runQuickstartJavaDirect") + dependsOn(":runners:google-cloud-dataflow-java:runQuickstartJavaDataflow") + dependsOn(":runners:spark:2:runQuickstartJavaSpark") + dependsOn(":runners:flink:1.13:runQuickstartJavaFlinkLocal") + dependsOn(":runners:direct-java:runMobileGamingJavaDirect") + dependsOn(":runners:google-cloud-dataflow-java:runMobileGamingJavaDataflow") + dependsOn(":runners:twister2:runQuickstartJavaTwister2") + dependsOn(":runners:google-cloud-dataflow-java:runMobileGamingJavaDataflowBom") } diff --git a/release/go-licenses/Dockerfile b/release/go-licenses/Dockerfile index 43e7870311c3..6004f457494d 100644 --- a/release/go-licenses/Dockerfile +++ b/release/go-licenses/Dockerfile @@ -16,7 +16,7 @@ # limitations under the License. ############################################################################### -FROM golang:1.15.0-buster +FROM golang:1.16.0-buster RUN go get github.com/google/go-licenses COPY get-licenses.sh /opt/apache/beam/ ARG sdk_location diff --git a/release/go-licenses/get-licenses.sh b/release/go-licenses/get-licenses.sh index 727fb9b08977..be1e01ddf0e9 100755 --- a/release/go-licenses/get-licenses.sh +++ b/release/go-licenses/get-licenses.sh @@ -19,6 +19,7 @@ set -ex rm -rf /output/* +export GO111MODULE=off go get $sdk_location go-licenses save $sdk_location --save_path=/output/licenses diff --git a/release/src/main/groovy/GoogleCloudPlatformBomArchetype.groovy b/release/src/main/groovy/GoogleCloudPlatformBomArchetype.groovy new file mode 100644 index 000000000000..4920d60ebc21 --- /dev/null +++ b/release/src/main/groovy/GoogleCloudPlatformBomArchetype.groovy @@ -0,0 +1,44 @@ +#!groovy +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +class GoogleCloudPlatformBomArchetype { + def static generate(TestScripts t) { + // Generate a maven project from the snapshot repository + String output_text = t.run """mvn archetype:generate \ + --update-snapshots \ + -DarchetypeGroupId=org.apache.beam \ + -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-gcp-bom-examples \ + -DarchetypeVersion=${t.ver()} \ + -DgroupId=org.example \ + -DartifactId=word-count-beam \ + -Dversion="0.1" \ + -Dpackage=org.apache.beam.examples \ + -DinteractiveMode=false""" + + // Check if it was generated + t.see "[INFO] BUILD SUCCESS", output_text + t.run "cd word-count-beam" + output_text = t.run "ls" + t.see "pom.xml", output_text + t.see "src", output_text + String wordcounts = t.run "ls src/main/java/org/apache/beam/examples/" + t.see "WordCount.java", wordcounts + String games = t.run "ls src/main/java/org/apache/beam/examples/complete/game/" + t.see "UserScore.java", games + } +} diff --git a/release/src/main/groovy/MobileGamingCommands.groovy b/release/src/main/groovy/MobileGamingCommands.groovy index cceca98bc42b..2b38943067ab 100644 --- a/release/src/main/groovy/MobileGamingCommands.groovy +++ b/release/src/main/groovy/MobileGamingCommands.groovy @@ -70,6 +70,11 @@ class MobileGamingCommands { className = exampleName } return """mvn compile exec:java -q \ + -Dmaven.wagon.http.retryHandler.class=default \ + -Dmaven.wagon.http.retryHandler.count=5 \ + -Dmaven.wagon.http.pool=false \ + -Dmaven.wagon.httpconnectionManager.ttlSeconds=120 \ + -Dhttp.keepAlive=false \ -Dexec.mainClass=org.apache.beam.examples.complete.game.${className} \ -Dexec.args=\"${getArgs(exampleName, runner, jobName)}\" \ -P${RUNNERS[runner]}""" diff --git a/release/src/main/groovy/mobilegaming-java-dataflow.groovy b/release/src/main/groovy/mobilegaming-java-dataflow.groovy index 6e3d64cec881..462b3d2cea0e 100644 --- a/release/src/main/groovy/mobilegaming-java-dataflow.groovy +++ b/release/src/main/groovy/mobilegaming-java-dataflow.groovy @@ -110,7 +110,14 @@ class LeaderBoardRunner { } InjectorThread.stop() LeaderBoardThread.stop() - t.run("gcloud dataflow jobs cancel \$(gcloud dataflow jobs list | grep ${jobName} | grep Running | cut -d' ' -f1)") + t.run("""RUNNING_JOB=`gcloud dataflow jobs list | grep ${jobName} | grep Running | cut -d' ' -f1` +if [ ! -z "\${RUNNING_JOB}" ] + then + gcloud dataflow jobs cancel \${RUNNING_JOB} + else + echo "Job '${jobName}' is not running." +fi +""") if (!isSuccess) { t.error("FAILED: Failed running LeaderBoard on DataflowRunner" + diff --git a/release/src/main/groovy/mobilegaming-java-dataflowbom.groovy b/release/src/main/groovy/mobilegaming-java-dataflowbom.groovy new file mode 100644 index 000000000000..87944588e35b --- /dev/null +++ b/release/src/main/groovy/mobilegaming-java-dataflowbom.groovy @@ -0,0 +1,61 @@ +#!groovy +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +t = new TestScripts(args) +mobileGamingCommands = new MobileGamingCommands(testScripts: t) + +/* + * Run the mobile game examples on Dataflow. + * https://beam.apache.org/get-started/mobile-gaming-example/ + */ + +t.describe ('Run Apache Beam Java SDK Mobile Gaming Examples using GCP BOM - Dataflow') + +GoogleCloudPlatformBomArchetype.generate(t) + +def runner = "DataflowRunner" +String command_output_text + +/** + * Run the UserScore example on DataflowRunner + * */ + +t.intent("Running: UserScore example with Beam GCP BOM on DataflowRunner") +t.run(mobileGamingCommands.createPipelineCommand("UserScore", runner)) +command_output_text = t.run "gsutil cat gs://${t.gcsBucket()}/${mobileGamingCommands.getUserScoreOutputName(runner)}* | grep user19_BananaWallaby" +t.see "total_score: 231, user: user19_BananaWallaby", command_output_text +t.success("UserScore successfully run on DataflowRunner.") +t.run "gsutil rm gs://${t.gcsBucket()}/${mobileGamingCommands.getUserScoreOutputName(runner)}*" + + +/** + * Run the HourlyTeamScore example on DataflowRunner + * */ + +t.intent("Running: HourlyTeamScore example with Beam GCP BOM on DataflowRunner") +t.run(mobileGamingCommands.createPipelineCommand("HourlyTeamScore", runner)) +command_output_text = t.run "gsutil cat gs://${t.gcsBucket()}/${mobileGamingCommands.getHourlyTeamScoreOutputName(runner)}* | grep AzureBilby " +t.see "total_score: 2788, team: AzureBilby", command_output_text +t.success("HourlyTeamScore successfully run on DataflowRunner.") +t.run "gsutil rm gs://${t.gcsBucket()}/${mobileGamingCommands.getHourlyTeamScoreOutputName(runner)}*" + +new LeaderBoardRunner().run(runner, t, mobileGamingCommands, false) +new LeaderBoardRunner().run(runner, t, mobileGamingCommands, true) + +t.done() diff --git a/release/src/main/groovy/quickstart-java-dataflow.groovy b/release/src/main/groovy/quickstart-java-dataflow.groovy index acd531f42623..6364cdef1f3c 100644 --- a/release/src/main/groovy/quickstart-java-dataflow.groovy +++ b/release/src/main/groovy/quickstart-java-dataflow.groovy @@ -35,6 +35,11 @@ t.describe 'Run Apache Beam Java SDK Quickstart - Dataflow' // Run the wordcount example with the Dataflow runner t.run """mvn compile exec:java -q \ + -Dmaven.wagon.http.retryHandler.class=default \ + -Dmaven.wagon.http.retryHandler.count=5 \ + -Dmaven.wagon.http.pool=false \ + -Dmaven.wagon.httpconnectionManager.ttlSeconds=120 \ + -Dhttp.keepAlive=false \ -Dexec.mainClass=org.apache.beam.examples.WordCount \ -Dexec.args="--runner=DataflowRunner \ --project=${t.gcpProject()} \ diff --git a/release/src/main/groovy/quickstart-java-direct.groovy b/release/src/main/groovy/quickstart-java-direct.groovy index 33023b393ca5..d489fca8e4af 100644 --- a/release/src/main/groovy/quickstart-java-direct.groovy +++ b/release/src/main/groovy/quickstart-java-direct.groovy @@ -31,6 +31,11 @@ t.describe 'Run Apache Beam Java SDK Quickstart - Direct' t.intent 'Runs the WordCount Code with Direct runner' // Run the wordcount example with the Direct runner t.run """mvn compile exec:java -q \ + -Dmaven.wagon.http.retryHandler.class=default \ + -Dmaven.wagon.http.retryHandler.count=5 \ + -Dmaven.wagon.http.pool=false \ + -Dmaven.wagon.httpconnectionManager.ttlSeconds=120 \ + -Dhttp.keepAlive=false \ -Dexec.mainClass=org.apache.beam.examples.WordCount \ -Dexec.args="--inputFile=pom.xml --output=counts" \ -Pdirect-runner""" diff --git 
a/release/src/main/groovy/quickstart-java-flinklocal.groovy b/release/src/main/groovy/quickstart-java-flinklocal.groovy index d8c3b0a37191..73e2a0a9a2fc 100644 --- a/release/src/main/groovy/quickstart-java-flinklocal.groovy +++ b/release/src/main/groovy/quickstart-java-flinklocal.groovy @@ -31,6 +31,11 @@ t.describe 'Run Apache Beam Java SDK Quickstart - Flink Local' t.intent 'Runs the WordCount Code with Flink Local runner' // Run the wordcount example with the flink local runner t.run """mvn compile exec:java -q \ + -Dmaven.wagon.http.retryHandler.class=default \ + -Dmaven.wagon.http.retryHandler.count=5 \ + -Dmaven.wagon.http.pool=false \ + -Dmaven.wagon.httpconnectionManager.ttlSeconds=120 \ + -Dhttp.keepAlive=false \ -Dexec.mainClass=org.apache.beam.examples.WordCount \ -Dexec.args="--inputFile=pom.xml --output=counts \ --runner=FlinkRunner" -Pflink-runner""" diff --git a/release/src/main/groovy/quickstart-java-twister2.groovy b/release/src/main/groovy/quickstart-java-twister2.groovy index f5989e81e2b3..7c7e3820fd03 100644 --- a/release/src/main/groovy/quickstart-java-twister2.groovy +++ b/release/src/main/groovy/quickstart-java-twister2.groovy @@ -31,6 +31,11 @@ t.describe 'Run Apache Beam Java SDK Quickstart - Twister2' t.intent 'Runs the WordCount Code with Spark runner' // Run the wordcount example with the Twister2 runner t.run """mvn compile exec:java -q \ + -Dmaven.wagon.http.retryHandler.class=default \ + -Dmaven.wagon.http.retryHandler.count=5 \ + -Dmaven.wagon.http.pool=false \ + -Dmaven.wagon.httpconnectionManager.ttlSeconds=120 \ + -Dhttp.keepAlive=false \ -Dexec.mainClass=org.apache.beam.examples.WordCount \ -Dexec.args="--inputFile=pom.xml --output=counts \ --runner=Twister2Runner" -Ptwister2-runner""" diff --git a/release/src/main/scripts/build_release_candidate.sh b/release/src/main/scripts/build_release_candidate.sh index 37b44932e178..76e2b73103ba 100755 --- a/release/src/main/scripts/build_release_candidate.sh +++ b/release/src/main/scripts/build_release_candidate.sh @@ -36,6 +36,7 @@ else exit 1 fi +SCRIPT_DIR="${PWD}/$(dirname $0)" LOCAL_CLONE_DIR=build_release_candidate LOCAL_JAVA_STAGING_DIR=java_staging_dir LOCAL_PYTHON_STAGING_DIR=python_staging_dir @@ -48,7 +49,7 @@ LOCAL_WEBSITE_REPO=beam_website_repo USER_REMOTE_URL= USER_GITHUB_ID= GIT_REPO_BASE_URL=apache/beam -GIT_REPO_URL=git@github.com:${GIT_REPO_BASE_URL}.git +GIT_REPO_URL=https://github.com/${GIT_REPO_BASE_URL} ROOT_SVN_URL=https://dist.apache.org/repos/dist/dev/beam GIT_BEAM_ARCHIVE=https://github.com/apache/beam/archive GIT_BEAM_WEBSITE=https://github.com/apache/beam-site.git @@ -57,37 +58,93 @@ PYTHON_ARTIFACTS_DIR=python BEAM_ROOT_DIR=beam WEBSITE_ROOT_DIR=beam-site -DOCKER_IMAGE_DEFAULT_REPO_ROOT=apache -DOCKER_IMAGE_DEFAULT_REPO_PREFIX=beam_ +function usage() { + echo 'Usage: build_release_candidate.sh --release --rc --github-user --signing-key [--debug]' +} -JAVA_VER=("java8", "java11") -PYTHON_VER=("python3.6" "python3.7" "python3.8") -FLINK_VER=("1.8" "1.9" "1.10") +RELEASE= +RC_NUM= +SIGNING_KEY= +USER_GITHUB_ID= +DEBUG= + +while [[ $# -gt 0 ]] ; do + arg="$1" + + case $arg in + --release) + shift; RELEASE=$1; shift + ;; + + --rc) + shift; RC_NUM=$1; shift + ;; + + --debug) + DEBUG=--debug + set -x; shift + ;; + + --signing-key) + shift; SIGNING_KEY=$1; shift + ;; + + --github-user) + shift; USER_GITHUB_ID=$1; shift + ;; + + *) + echo "Unrecognized argument: $1" + usage + exit 1 + ;; + esac +done + +if [[ -z "$RELEASE" ]] ; then + echo 'No release version supplied.' 
+ usage + exit 1 +fi -echo "================Setting Up Environment Variables===========" -echo "Which release version are you working on: " -read RELEASE -RELEASE_BRANCH=release-${RELEASE} -echo "Which release candidate number(e.g. 1) are you going to create: " -read RC_NUM -echo "Please enter your github username(ID): " -read USER_GITHUB_ID +if [[ -z "$RC_NUM" ]] ; then + echo 'No RC number supplied.' + usage + exit 1 +fi -USER_REMOTE_URL=git@github.com:${USER_GITHUB_ID}/beam-site +if [[ -z "$RC_NUM" ]] ; then + echo 'No RC number supplied.' + usage + exit 1 +fi -echo "=================Pre-requirements====================" -echo "Please make sure you have configured and started your gpg by running ./preparation_before_release.sh." -echo "================Listing all GPG keys=================" -gpg --list-keys --keyid-format LONG --fingerprint --fingerprint -echo "Please copy the public key which is associated with your Apache account:" +if [[ -z "$USER_GITHUB_ID" ]] ; then + echo 'Please provide your github username(ID)' + usage + exit 1 +fi -read SIGNING_KEY +if [[ -z "$SIGNING_KEY" ]] ; then + echo "=================Pre-requirements====================" + echo "Please make sure you have configured and started your gpg-agent by running ./preparation_before_release.sh." + echo "================Listing all GPG keys=================" + echo "Please provide the public key to sign the release artifacts with. You can list them with this command:" + echo "" + echo " gpg --list-keys --keyid-format LONG --fingerprint --fingerprint" + echo "" + usage + exit 1 +fi + + +RC_TAG="v${RELEASE}-RC${RC_NUM}" +USER_REMOTE_URL=git@github.com:${USER_GITHUB_ID}/beam-site echo "================Checking Environment Variables==============" echo "beam repo will be cloned into: ${LOCAL_CLONE_DIR}" echo "working on release version: ${RELEASE}" -echo "working on release branch: ${RELEASE_BRANCH}" -echo "will create release candidate: RC${RC_NUM}" +echo "will create release candidate: RC${RC_NUM} from commit tagged ${RC_TAG}" echo "Your forked beam-site URL: ${USER_REMOTE_URL}" echo "Your signing key: ${SIGNING_KEY}" echo "Please review all environment variables and confirm: [y|N]" @@ -102,38 +159,19 @@ echo "Do you want to proceed? [y|N]" read confirmation if [[ $confirmation = "y" ]]; then echo "============Building and Staging Java Artifacts=============" - echo "--------Cloning Beam Repo and Checkout Release Branch-------" + echo "--------Cloning Beam Repo and Checkout Release Tag-------" cd ~ if [[ -d ${LOCAL_CLONE_DIR} ]]; then rm -rf ${LOCAL_CLONE_DIR} fi mkdir -p ${LOCAL_CLONE_DIR} cd ${LOCAL_CLONE_DIR} - git clone ${GIT_REPO_URL} + git clone --depth 1 --branch "${RC_TAG}" ${GIT_REPO_URL} "${BEAM_ROOT_DIR}" cd ${BEAM_ROOT_DIR} - git checkout ${RELEASE_BRANCH} - RELEASE_COMMIT=$(git rev-parse --verify ${RELEASE_BRANCH}) echo "-------------Building Java Artifacts with Gradle-------------" git config credential.helper store - if git rev-parse "v${RELEASE}-RC${RC_NUM}" >/dev/null 2>&1; then - echo "Tag v${RELEASE}-RC${RC_NUM} already exists." - echo "Delete the tag and create a new tag commit (y) or skip this step (n)? [y/N]" - read confirmation - if [[ $confirmation = "y" ]]; then - # Delete tag with the git push : format, as shown here: - # https://git-scm.com/docs/git-push#Documentation/git-push.txt-codegitpushoriginexperimentalcode - git push origin :refs/tags/v${RELEASE}-RC${RC_NUM} - fi - fi - if [[ $confirmation = "y" ]]; then # Expected to be "y" unless user chose to skip creating tag. 
- ./gradlew release -Prelease.newVersion=${RELEASE}-SNAPSHOT \ - -Prelease.releaseVersion=${RELEASE}-RC${RC_NUM} \ - -Prelease.useAutomaticVersion=true --info --no-daemon - git push origin "${RELEASE_BRANCH}" - git push origin "v${RELEASE}-RC${RC_NUM}" - fi echo "-------------Staging Java Artifacts into Maven---------------" gpg --local-user ${SIGNING_KEY} --output /dev/null --sign ~/.bashrc ./gradlew publish -Psigning.gnupg.keyName=${SIGNING_KEY} -PisRelease --no-daemon @@ -157,6 +195,12 @@ if [[ $confirmation = "y" ]]; then cd beam/${RELEASE} echo "----------------Downloading Source Release-------------------" + # GitHub strips the "v" from "v2.29.0" in naming zip and the dir inside it + RC_DIR="beam-${RELEASE}-RC${RC_NUM}" + RC_ZIP="${RC_DIR}.zip" + # We want to strip the -RC1 suffix from the directory name inside the zip + RELEASE_DIR="beam-${RELEASE}" + SOURCE_RELEASE_ZIP="apache-beam-${RELEASE}-source-release.zip" # Check whether there is an existing dist dir if (svn ls "${SOURCE_RELEASE_ZIP}"); then @@ -164,8 +208,14 @@ if [[ $confirmation = "y" ]]; then svn delete "${SOURCE_RELEASE_ZIP}" fi - echo "Downloading: ${GIT_BEAM_ARCHIVE}/release-${RELEASE}.zip" - wget ${GIT_BEAM_ARCHIVE}/release-${RELEASE}.zip -O "${SOURCE_RELEASE_ZIP}" + echo "Downloading: ${GIT_BEAM_ARCHIVE}/${RC_TAG}.zip" + wget ${GIT_BEAM_ARCHIVE}/${RC_TAG}.zip -O "${RC_ZIP}" + + unzip "$RC_ZIP" + rm "$RC_ZIP" + mv "$RC_DIR" "$RELEASE_DIR" + zip -r "${SOURCE_RELEASE_ZIP}" "$RELEASE_DIR" + rm -r "$RELEASE_DIR" echo "----Signing Source Release ${SOURCE_RELEASE_ZIP}-----" gpg --local-user ${SIGNING_KEY} --armor --detach-sig "${SOURCE_RELEASE_ZIP}" @@ -180,7 +230,7 @@ if [[ $confirmation = "y" ]]; then if [[ $confirmation != "y" ]]; then echo "Exit without staging source release on dist.apache.org." else - svn commit --no-auth-cache --non-interactive -m "Staging Java artifacts for Apache Beam ${RELEASE} RC${RC_NUM}" + svn commit --no-auth-cache -m "Staging Java artifacts for Apache Beam ${RELEASE} RC${RC_NUM}" fi rm -rf ~/${LOCAL_JAVA_STAGING_DIR} fi @@ -199,10 +249,9 @@ if [[ $confirmation = "y" ]]; then cd "${LOCAL_PYTHON_STAGING_DIR}" echo '-------------------Cloning Beam Release Branch-----------------' - git clone "${GIT_REPO_URL}" + git clone --branch "${RC_TAG}" --depth 1 "${GIT_REPO_URL}" cd "${BEAM_ROOT_DIR}" - git checkout "${RELEASE_BRANCH}" - RELEASE_COMMIT=$(git rev-parse --verify HEAD) + RELEASE_COMMIT=$(git rev-list -n 1 "tags/${RC_TAG}") echo '-------------------Creating Python Virtualenv-----------------' python3 -m venv "${LOCAL_PYTHON_VIRTUALENV}" @@ -213,10 +262,10 @@ if [[ $confirmation = "y" ]]; then SVN_ARTIFACTS_DIR="beam/${RELEASE}/${PYTHON_ARTIFACTS_DIR}" svn co https://dist.apache.org/repos/dist/dev/beam mkdir -p "${SVN_ARTIFACTS_DIR}" - python release/src/main/scripts/download_github_actions_artifacts.py \ + python "${SCRIPT_DIR}/download_github_actions_artifacts.py" \ --github-user "${USER_GITHUB_ID}" \ --repo-url "${GIT_REPO_BASE_URL}" \ - --release-branch "${RELEASE_BRANCH}" \ + --rc-tag "${RC_TAG}" \ --release-commit "${RELEASE_COMMIT}" \ --artifacts_dir "${SVN_ARTIFACTS_DIR}" @@ -246,7 +295,7 @@ if [[ $confirmation = "y" ]]; then if [[ $confirmation != "y" ]]; then echo "Exit without staging python artifacts on dist.apache.org." 
else - svn commit --no-auth-cache --non-interactive -m "Staging Python artifacts for Apache Beam ${RELEASE} RC${RC_NUM}" + svn commit --no-auth-cache -m "Staging Python artifacts for Apache Beam ${RELEASE} RC${RC_NUM}" fi rm -rf "${HOME:?}/${LOCAL_PYTHON_STAGING_DIR}" fi @@ -257,49 +306,20 @@ read confirmation if [[ $confirmation = "y" ]]; then echo "============Staging SDK docker images on docker hub=========" cd ~ - if [[ -d ${LOCAL_PYTHON_STAGING_DIR} ]]; then - rm -rf ${LOCAL_PYTHON_STAGING_DIR} + if [[ -d ${LOCAL_CLONE_DIR} ]]; then + rm -rf ${LOCAL_CLONE_DIR} fi - mkdir -p ${LOCAL_PYTHON_STAGING_DIR} - cd ${LOCAL_PYTHON_STAGING_DIR} + mkdir -p ${LOCAL_CLONE_DIR} + cd ${LOCAL_CLONE_DIR} - echo '-------------------Cloning Beam Release Branch-----------------' - git clone ${GIT_REPO_URL} + echo '-------------------Cloning Beam RC Tag-----------------' + git clone --depth 1 --branch "${RC_TAG}" ${GIT_REPO_URL} cd ${BEAM_ROOT_DIR} - git checkout ${RELEASE_BRANCH} - - echo '-------------------Generating and Pushing Python images-----------------' - ./gradlew :sdks:python:container:buildAll -Pdocker-pull-licenses -Pdocker-tag=${RELEASE}_rc${RC_NUM} - for ver in "${PYTHON_VER[@]}"; do - docker push ${DOCKER_IMAGE_DEFAULT_REPO_ROOT}/${DOCKER_IMAGE_DEFAULT_REPO_PREFIX}${ver}_sdk:${RELEASE}_rc${RC_NUM} & - done - - echo '-------------------Generating and Pushing Java images-----------------' - echo "Building containers for the following Java versions:" "${JAVA_VER[@]}" - for ver in "${JAVA_VER[@]}"; do - ./gradlew :sdks:java:container:${ver}:dockerPush -Pdocker-pull-licenses -Pdocker-tag=${RELEASE}_rc${RC_NUM} - done - - echo '-------------Generating and Pushing Flink job server images-------------' - echo "Building containers for the following Flink versions:" "${FLINK_VER[@]}" - for ver in "${FLINK_VER[@]}"; do - ./gradlew ":runners:flink:${ver}:job-server-container:dockerPush" -Pdocker-tag="${RELEASE}_rc${RC_NUM}" - done + git checkout ${RC_TAG} - echo '-------------Generating and Pushing Spark job server image-------------' - ./gradlew ":runners:spark:job-server:container:dockerPush" -Pdocker-tag="${RELEASE}_rc${RC_NUM}" + ./gradlew :pushAllDockerImages -Pdocker-pull-licenses -Pdocker-tag=${RELEASE}_rc${RC_NUM} - rm -rf ~/${PYTHON_ARTIFACTS_DIR} - - echo '-------------------Clean up images at local-----------------' - for ver in "${PYTHON_VER[@]}"; do - docker rmi -f ${DOCKER_IMAGE_DEFAULT_REPO_ROOT}/${DOCKER_IMAGE_DEFAULT_REPO_PREFIX}${ver}_sdk:${RELEASE}_rc${RC_NUM} - done - docker rmi -f ${DOCKER_IMAGE_DEFAULT_REPO_ROOT}/${DOCKER_IMAGE_DEFAULT_REPO_PREFIX}java_sdk:${RELEASE}_rc${RC_NUM} - for ver in "${FLINK_VER[@]}"; do - docker rmi -f "${DOCKER_IMAGE_DEFAULT_REPO_ROOT}/${DOCKER_IMAGE_DEFAULT_REPO_PREFIX}flink${ver}_job_server:${RELEASE}_rc${RC_NUM}" - done - docker rmi -f "${DOCKER_IMAGE_DEFAULT_REPO_ROOT}/${DOCKER_IMAGE_DEFAULT_REPO_PREFIX}spark_job_server:${RELEASE}_rc${RC_NUM}" + rm -rf ~/${LOCAL_CLONE_DIR} fi echo "[Current Step]: Update beam-site" @@ -322,19 +342,17 @@ if [[ $confirmation = "y" ]]; then source "${LOCAL_PYTHON_VIRTUALENV}/bin/activate" cd ${LOCAL_PYTHON_DOC} pip install tox - git clone ${GIT_REPO_URL} + git clone --branch "${RC_TAG}" --depth 1 ${GIT_REPO_URL} cd ${BEAM_ROOT_DIR} - git checkout ${RELEASE_BRANCH} - RELEASE_COMMIT=$(git rev-parse --verify ${RELEASE_BRANCH}) + RELEASE_COMMIT=$(git rev-list -n 1 "tags/${RC_TAG}") cd sdks/python && pip install -r build-requirements.txt && tox -e py38-docs 
GENERATED_PYDOC=~/${LOCAL_WEBSITE_UPDATE_DIR}/${LOCAL_PYTHON_DOC}/${BEAM_ROOT_DIR}/sdks/python/target/docs/_build rm -rf ${GENERATED_PYDOC}/.doctrees echo "----------------------Building Java Doc----------------------" cd ~/${LOCAL_WEBSITE_UPDATE_DIR}/${LOCAL_JAVA_DOC} - git clone ${GIT_REPO_URL} + git clone --branch "${RC_TAG}" --depth 1 ${GIT_REPO_URL} cd ${BEAM_ROOT_DIR} - git checkout ${RELEASE_BRANCH} ./gradlew :sdks:java:javadoc:aggregateJavadoc GENERATE_JAVADOC=~/${LOCAL_WEBSITE_UPDATE_DIR}/${LOCAL_JAVA_DOC}/${BEAM_ROOT_DIR}/sdks/java/javadoc/build/docs/javadoc/ diff --git a/release/src/main/scripts/choose_rc_commit.sh b/release/src/main/scripts/choose_rc_commit.sh new file mode 100755 index 000000000000..4ea9533932d4 --- /dev/null +++ b/release/src/main/scripts/choose_rc_commit.sh @@ -0,0 +1,139 @@ +#!/bin/bash +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# This script choose a commit to be the basis of a release candidate +# and pushed a new tagged commit for that RC. + +set -e + +function usage() { + echo 'Usage: choose_rc_commit.sh --release --rc --commit [--debug] [--clone] [--push-tag]' +} + +RELEASE= +RC= +COMMIT= +PUSH_TAG=no +CLONE=no +OVERWRITE=no +DEBUG= +GIT_REPO=git@github.com:apache/beam + +while [[ $# -gt 0 ]] ; do + arg="$1" + + case $arg in + --release) + shift + RELEASE=$1 + shift + ;; + + --rc) + shift + RC=$1 + shift + ;; + + --commit) + shift + COMMIT=$1 + shift + ;; + + --debug) + DEBUG=--debug + set -x + shift + ;; + + --push-tag) + PUSH_TAG=yes + shift + ;; + + --overwrite) + OVERWRITE=yes + shift + ;; + + --clone) + CLONE=yes + shift + ;; + + *) + usage + exit 1 + ;; + esac +done + +if [[ -z "$RELEASE" ]] ; then + echo 'No release version supplied.' + usage + exit 1 +fi + +if [[ -z "$RC" ]] ; then + echo 'No RC number supplied' + usage + exit 1 +fi + +if [[ -z "$COMMIT" ]] ; then + echo 'No commit hash supplied.' + usage + exit 1 +fi + +SCRIPT_DIR=$(dirname $0) + +RC_TAG="v${RELEASE}-RC${RC}" + +if [[ "$CLONE" == yes ]] ; then + CLONE_DIR=`mktemp -d` + git clone "$GIT_REPO" "$CLONE_DIR" --single-branch --branch "release-$RELEASE" --shallow-exclude master +else + echo "Not cloning repo; assuming working dir is the desired repo. To run with a fresh clone, run with --clone." + CLONE_DIR=$PWD +fi + +{ + cd "$CLONE_DIR" + bash "$SCRIPT_DIR/set_version.sh" "${RELEASE}" --release --git-add $DEBUG + git checkout --quiet "$COMMIT" # suppress warning about detached HEAD: we want it detached so we do not edit the branch + git commit -m "Set version for ${RELEASE} RC${RC}" + + if git rev-parse "$RC_TAG" >/dev/null 2>&1; then + if [[ "$OVERWRITE" == yes ]]; then + git push origin ":refs/tags/$RC_TAG" + else + echo "Tag $RC_TAG already exists. Either delete it manually or run with --overwrite. 
Do not overwrite if an RC has been built and shared!" + exit 1 + fi + fi + + git tag -a -m "$RC_TAG" "$RC_TAG" HEAD + + if [[ "$PUSH_TAG" == yes ]] ; then + git push --follow-tags origin "$RC_TAG" + else + echo "Not pushing tag $RC_TAG. You can push it manually or run with --push-tag." + fi +} diff --git a/release/src/main/scripts/cut_release_branch.sh b/release/src/main/scripts/cut_release_branch.sh index 6d0f91379708..1e633934bdce 100755 --- a/release/src/main/scripts/cut_release_branch.sh +++ b/release/src/main/scripts/cut_release_branch.sh @@ -51,11 +51,12 @@ else done fi if [[ -z "$RELEASE" || -z "$NEXT_VERSION_IN_BASE_BRANCH" ]]; then - echo "This sricpt needs to be ran with params, please run with -h to get more instructions." + echo "This script needs to be ran with params, please run with -h to get more instructions." exit fi - +SCRIPT=$(readlink -f $0) +SCRIPT_DIR=$(dirname $SCRIPT) MASTER_BRANCH=master DEV=${RELEASE}.dev RELEASE_BRANCH=release-${RELEASE} @@ -78,7 +79,7 @@ if [[ -d ${LOCAL_CLONE_DIR} ]]; then fi mkdir ${LOCAL_CLONE_DIR} cd ${LOCAL_CLONE_DIR} -git clone ${GITHUB_REPO_URL} +git clone --depth=1 ${GITHUB_REPO_URL} cd ${BEAM_ROOT_DIR} # Create local release branch @@ -91,9 +92,7 @@ echo ${MASTER_BRANCH} echo "===============================================================" # Update master branch -sed -i -e "s/'${RELEASE}'/'${NEXT_VERSION_IN_BASE_BRANCH}'/g" buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy -sed -i -e "s/${RELEASE}/${NEXT_VERSION_IN_BASE_BRANCH}/g" gradle.properties -sed -i -e "s/${RELEASE}/${NEXT_VERSION_IN_BASE_BRANCH}/g" sdks/python/apache_beam/version.py +sh "$SCRIPT_DIR"/set_version.sh "$NEXT_VERSION_IN_BASE_BRANCH" echo "==============Update master branch as following================" git diff @@ -110,6 +109,7 @@ fi git add buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy git add gradle.properties git add sdks/python/apache_beam/version.py +git add sdks/go/pkg/beam/core/core.go git commit -m "Moving to ${NEXT_VERSION_IN_BASE_BRANCH}-SNAPSHOT on master branch." if git push origin ${MASTER_BRANCH}; then break @@ -125,9 +125,6 @@ echo "==================Current working branch=======================" echo ${RELEASE_BRANCH} echo "===============================================================" -sed -i -e "s/${DEV}/${RELEASE}/g" gradle.properties -sed -i -e "s/${DEV}/${RELEASE}/g" sdks/python/apache_beam/version.py -# TODO: [BEAM-4767] sed -i -e "s/'beam-master-.*'/'beam-${RELEASE}'/g" runners/google-cloud-dataflow-java/build.gradle echo "===============Update release branch as following==============" @@ -142,10 +139,8 @@ if [[ $confirmation != "y" ]]; then exit fi -git add gradle.properties -git add sdks/python/apache_beam/version.py git add runners/google-cloud-dataflow-java/build.gradle -git commit -m "Create release branch for version ${RELEASE}." +git commit -m "Set Dataflow container to release version." git push --set-upstream origin ${RELEASE_BRANCH} clean_up diff --git a/release/src/main/scripts/deploy_release_candidate_pypi.sh b/release/src/main/scripts/deploy_release_candidate_pypi.sh new file mode 100755 index 000000000000..7502eb5a730e --- /dev/null +++ b/release/src/main/scripts/deploy_release_candidate_pypi.sh @@ -0,0 +1,173 @@ +#!/bin/bash +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. 
+# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# This script will deploy a Release Candidate to pypi, includes: +# 1. Download python binary artifacts +# 2. Deploy Release Candidate to pypi + +set -e + +function usage() { + echo 'Usage: deploy_release_candidate_pypi.sh --release --rc --user [--deploy]' +} + +RELEASE= +RC_NUMBER= +COMMIT= +USER_GITHUB_ID= +DEPLOY=no +BEAM_ROOT_DIR=beam +GIT_REPO_BASE_URL=apache/beam +GIT_REPO_URL=https://github.com/${GIT_REPO_BASE_URL} + +while [[ $# -gt 0 ]] ; do + arg="$1" + + case $arg in + --release) + shift + RELEASE=$1 + shift + ;; + + --rc) + shift + RC_NUMBER=$1 + shift + ;; + + --user) + shift + USER_GITHUB_ID=$1 + shift + ;; + + --deploy) + DEPLOY=yes + shift + ;; + + *) + usage + exit 1 + ;; + esac +done + +if [[ -z "$RELEASE" ]] ; then + echo 'No release version supplied.' + usage + exit 1 +fi + +if [[ -z "$RC_NUMBER" ]] ; then + echo 'No RC number supplied' + usage + exit 1 +fi + +if [[ -z "$USER_GITHUB_ID" ]] ; then + echo 'No github user supplied.' + usage + exit 1 +fi + +function clean_up(){ + echo "Do you want to clean local clone repo ${LOCAL_CLONE_DIR}? [y|N]" + read confirmation + if [[ $confirmation = "y" ]]; then + cd ~ + rm -rf ${LOCAL_CLONE_DIR} + echo "Cleaned up local repo." + fi +} + +RC_TAG="v${RELEASE}-RC${RC_NUMBER}" +LOCAL_CLONE_DIR="beam_release_${RC_TAG}" +SCRIPT_DIR="${PWD}/$(dirname $0)" + +echo "================Checking Environment Variables==============" +echo "will download artifacts for ${RC_TAG} built by github actions" +echo "Please review the release version and confirm: [y|N]" +read confirmation +if [[ $confirmation != "y" ]]; then + echo "Please rerun this script and make sure you have the right inputs." + exit +fi + +echo "=====================Clear folder==============================" +cd ~ +if [[ -d ${LOCAL_CLONE_DIR} ]]; then + echo "Deleting existing local clone repo ${LOCAL_CLONE_DIR}." 
+ rm -rf "${LOCAL_CLONE_DIR}" +fi +mkdir "${LOCAL_CLONE_DIR}" +LOCAL_CLONE_DIR_ROOT=$(pwd)/${LOCAL_CLONE_DIR} +cd $LOCAL_CLONE_DIR + +echo "===================Cloning Beam Release Branch==================" +git clone --depth 1 --branch "${RC_TAG}" ${GIT_REPO_URL} "${BEAM_ROOT_DIR}" +cd $BEAM_ROOT_DIR +RELEASE_COMMIT=$(git rev-list -n 1 $RC_TAG) + +echo "================Download python artifacts======================" +PYTHON_ARTIFACTS_DIR="${LOCAL_CLONE_DIR_ROOT}/python" +python "${SCRIPT_DIR}/download_github_actions_artifacts.py" \ + --github-user "${USER_GITHUB_ID}" \ + --repo-url "${GIT_REPO_BASE_URL}" \ + --rc-tag "${RC_TAG}" \ + --release-commit "${RELEASE_COMMIT}" \ + --artifacts_dir "${PYTHON_ARTIFACTS_DIR}" \ + --rc_number "${RC_NUMBER}" + +cd "${PYTHON_ARTIFACTS_DIR}" + +echo "------Checking Hash Value for apache-beam-${RELEASE}rc${RC_NUMBER}.zip-----" +sha512sum -c "apache-beam-${RELEASE}rc${RC_NUMBER}.zip.sha512" + +for artifact in *.whl; do + echo "----------Checking Hash Value for ${artifact} wheel-----------" + sha512sum -c "${artifact}.sha512" +done + +echo "===================Removing sha512 files=======================" +rm $(ls | grep -i ".*.sha512$") + +echo "====================Upload rc to pypi========================" +virtualenv deploy_pypi_env +source ./deploy_pypi_env/bin/activate +pip install twine + +mkdir dist && mv $(ls | grep apache) dist && cd dist +echo "Will upload the following files to PyPI:" +ls +echo "Are the files listed correct? [y|N]" +read confirmation +if [[ $confirmation != "y" ]]; then + echo "Exiting without deploying artifacts to PyPI." + clean_up + exit +fi + +if [[ "$DEPLOY" == yes ]] ; then + twine upload * +else + echo "Skipping deployment to PyPI. Run the script with --deploy to stage the artifacts." +fi + +clean_up \ No newline at end of file diff --git a/release/src/main/scripts/download_github_actions_artifacts.py b/release/src/main/scripts/download_github_actions_artifacts.py index 3453fa0a5839..181fd0c8b92b 100644 --- a/release/src/main/scripts/download_github_actions_artifacts.py +++ b/release/src/main/scripts/download_github_actions_artifacts.py @@ -30,7 +30,7 @@ import requests GH_API_URL_WORKLOW_FMT = "https://api.github.com/repos/{repo_url}/actions/workflows/build_wheels.yml" -GH_API_URL_WORKFLOW_RUNS_FMT = "https://api.github.com/repos/{repo_url}/actions/workflows/{workflow_id}/runs" +GH_API_URL_WORKFLOW_RUNS_FMT = "https://api.github.com/repos/{repo_url}/actions/workflows/{workflow_id}/runs?branch={ref}" GH_API_URL_WORKFLOW_RUN_FMT = "https://api.github.com/repos/{repo_url}/actions/runs/{run_id}" GH_WEB_URL_WORKLOW_RUN_FMT = "https://github.com/{repo_url}/actions/runs/{run_id}" @@ -38,7 +38,7 @@ def parse_arguments(): """ Gets all neccessary data from the user by parsing arguments or asking for input. 
- Return: github_token, user_github_id, repo_url, release_branch, release_commit, artifacts_dir + Return: github_token, user_github_id, repo_url, rc_tag, release_commit, artifacts_dir, rc_number """ parser = argparse.ArgumentParser( description= @@ -46,9 +46,10 @@ def parse_arguments(): ) parser.add_argument("--github-user", required=True) parser.add_argument("--repo-url", required=True) - parser.add_argument("--release-branch", required=True) + parser.add_argument("--rc-tag", required=True) parser.add_argument("--release-commit", required=True) parser.add_argument("--artifacts_dir", required=True) + parser.add_argument("--rc_number", required=False, default="") args = parser.parse_args() github_token = ask_for_github_token() @@ -62,12 +63,13 @@ def parse_arguments(): user_github_id = args.github_user repo_url = args.repo_url - release_branch = args.release_branch + rc_tag = args.rc_tag release_commit = args.release_commit artifacts_dir = args.artifacts_dir if os.path.isabs(args.artifacts_dir) \ else os.path.abspath(args.artifacts_dir) + rc_number = args.rc_number - return github_token, user_github_id, repo_url, release_branch, release_commit, artifacts_dir + return github_token, user_github_id, repo_url, rc_tag, release_commit, artifacts_dir, rc_number def ask_for_github_token(): @@ -123,22 +125,23 @@ def get_build_wheels_workflow_id(repo_url, github_token): def get_single_workflow_run_data(run_id, repo_url, github_token): """Gets single workflow run data (github api payload).""" url = GH_API_URL_WORKFLOW_RUN_FMT.format(repo_url=repo_url, run_id=run_id) + print('Fetching run data: ', url) return request_url(url, github_token) def get_last_run_id( - workflow_id, repo_url, release_branch, release_commit, github_token): + workflow_id, repo_url, rc_tag, release_commit, github_token): """ Gets id of last run for given workflow, repo, branch and commit. Raises exception when no run found. """ url = GH_API_URL_WORKFLOW_RUNS_FMT.format( - repo_url=repo_url, workflow_id=workflow_id) + repo_url=repo_url, workflow_id=workflow_id, ref=rc_tag) data = request_url( url, github_token, params={ - "event": "push", "branch": release_branch + "event": "push", "tag": rc_tag }, ) runs = safe_get(data, "workflow_runs", url) @@ -149,9 +152,9 @@ def get_last_run_id( if not filtered_commit_runs: workflow_run_web_url = GH_API_URL_WORKFLOW_RUNS_FMT.format( - repo_url=repo_url, workflow_id=workflow_id) + repo_url=repo_url, workflow_id=workflow_id, ref=rc_tag) raise Exception( - f"No runs for workflow (branch {release_branch}, commit {release_commit}). Verify at {workflow_run_web_url}" + f"No runs for workflow (tag {rc_tag}, commit {release_commit}). 
Verify at {workflow_run_web_url}" ) sorted_runs = sorted( @@ -169,7 +172,7 @@ def get_last_run_id( print(f"Verify at {workflow_run_web_url}") print( f"GCS location corresponding to artifacts built in this run: " - f"gs://beam-wheels-staging/{release_branch}/{release_commit}-{last_run_id}/" + f"gs://beam-wheels-staging/{rc_tag}/{release_commit}-{last_run_id}/" ) return last_run_id @@ -254,22 +257,34 @@ def prepare_directory(artifacts_dir): os.makedirs(artifacts_dir) -def fetch_github_artifacts(run_id, repo_url, artifacts_dir, github_token): +def filter_artifacts(artifacts, rc_number): + def filter_source(artifact_name): + if rc_number: + return artifact_name.startswith("source_zip_rc{}".format(rc_number)) + return artifact_name.startswith("source_zip") and "_rc" not in artifact_name + + def filter_wheels(artifact_name): + if rc_number: + return artifact_name.startswith("wheelhouse-rc{}".format(rc_number)) + return artifact_name.startswith("wheelhouse") and "-rc" not in artifact_name + + return [a for a in artifacts if (filter_source(a["name"]) or filter_wheels(a["name"]))] + +def fetch_github_artifacts(run_id, repo_url, artifacts_dir, github_token, rc_number): """Downloads and extracts github artifacts with source dist and wheels from given run.""" print("Starting downloading artifacts ... (it may take a while)") run_data = get_single_workflow_run_data(run_id, repo_url, github_token) artifacts_url = safe_get(run_data, "artifacts_url") data_artifacts = request_url(artifacts_url, github_token) artifacts = safe_get(data_artifacts, "artifacts", artifacts_url) - filtered_artifacts = [ - a for a in artifacts if ( - a["name"].startswith("source_zip") or - a["name"].startswith("wheelhouse")) - ] + print('Filtering ', len(artifacts), ' artifacts') + filtered_artifacts = filter_artifacts(artifacts, rc_number) + print('Preparing to download ', len(filtered_artifacts), ' artifacts') for artifact in filtered_artifacts: url = safe_get(artifact, "archive_download_url") name = safe_get(artifact, "name") size_in_bytes = safe_get(artifact, "size_in_bytes") + print('Downloading ', size_in_bytes, ' from ', url) with tempfile.TemporaryDirectory() as tmp: temp_file_path = os.path.join(tmp, name + ".zip") @@ -308,18 +323,19 @@ def extract_single_artifact(file_path, output_dir): github_token, user_github_id, repo_url, - release_branch, + rc_tag, release_commit, artifacts_dir, + rc_number, ) = parse_arguments() try: workflow_id = get_build_wheels_workflow_id(repo_url, github_token) run_id = get_last_run_id( - workflow_id, repo_url, release_branch, release_commit, github_token) + workflow_id, repo_url, rc_tag, release_commit, github_token) validate_run(run_id, repo_url, github_token) prepare_directory(artifacts_dir) - fetch_github_artifacts(run_id, repo_url, artifacts_dir, github_token) + fetch_github_artifacts(run_id, repo_url, artifacts_dir, github_token, rc_number) print("Script finished successfully!") print(f"Artifacts available in directory: {artifacts_dir}") except KeyboardInterrupt as e: diff --git a/release/src/main/scripts/mass_comment.py b/release/src/main/scripts/mass_comment.py index 87c9f94b7667..bcfa099a5172 100644 --- a/release/src/main/scripts/mass_comment.py +++ b/release/src/main/scripts/mass_comment.py @@ -58,6 +58,7 @@ "Run XVR_Direct PostCommit", "Run XVR_Flink PostCommit", "Run XVR_Spark PostCommit", + "Run XVR_Spark3 PostCommit", ] @@ -110,7 +111,7 @@ def postComments(accessToken, subjectId): for commentBody in COMMENTS_TO_ADD: jsonData = fetchGHData(accessToken, subjectId, commentBody) - 
print(jsonData) + print(jsonData) def probeGitHubIsUp(): diff --git a/release/src/main/scripts/preparation_before_release.sh b/release/src/main/scripts/preparation_before_release.sh index 88e9ac65fde9..a42c4f00e236 100755 --- a/release/src/main/scripts/preparation_before_release.sh +++ b/release/src/main/scripts/preparation_before_release.sh @@ -73,6 +73,8 @@ if [[ $confirmation != "y" ]]; then else echo "Not commit new changes into ${ROOT_SVN_URL}${DEV_REPO}/${BEAM_REPO}/KEYS" fi + cd ~ + rm -rf ${LOCAL_SVN_DIR}/${BEAM_REPO} echo "Only a PMC member can write into dist.apache.org's release KEYS. Are you a PMC member? [y|N]" read pmc_permission diff --git a/release/src/main/scripts/publish_docker_images.sh b/release/src/main/scripts/publish_docker_images.sh index 922a1e52c8a1..0339f9c627d2 100755 --- a/release/src/main/scripts/publish_docker_images.sh +++ b/release/src/main/scripts/publish_docker_images.sh @@ -27,10 +27,6 @@ set -e DOCKER_IMAGE_DEFAULT_REPO_ROOT=apache DOCKER_IMAGE_DEFAULT_REPO_PREFIX=beam_ -JAVA_VER=("java8", "java11") -PYTHON_VER=("python3.6" "python3.7" "python3.8") -FLINK_VER=("1.8" "1.9" "1.10") - echo "Publish SDK docker images to Docker Hub." echo "================Setting Up Environment Variables===========" @@ -43,88 +39,23 @@ RC_VERSION="rc${RC_NUM}" echo "================Confirming Release and RC version===========" echo "We are using ${RC_VERSION} to push docker images for ${RELEASE}." +echo "Publishing the following images:" +IMAGES=$(docker images --filter "reference=apache/beam_*:${RELEASE}_${RC_VERSION}" --format "{{.Repository}}") +echo "${IMAGES}" echo "Do you want to proceed? [y|N]" read confirmation if [[ $confirmation = "y" ]]; then - - echo '-------------------Tagging and Pushing Python images-----------------' - for ver in "${PYTHON_VER[@]}"; do + echo "${IMAGES}" | while read IMAGE; do # Pull verified RC from dockerhub. - docker pull ${DOCKER_IMAGE_DEFAULT_REPO_ROOT}/${DOCKER_IMAGE_DEFAULT_REPO_PREFIX}${ver}_sdk:${RELEASE}_${RC_VERSION} + docker pull "${IMAGE}:${RELEASE}_${RC_VERSION}" # Tag with ${RELEASE} and push to dockerhub. - docker tag ${DOCKER_IMAGE_DEFAULT_REPO_ROOT}/${DOCKER_IMAGE_DEFAULT_REPO_PREFIX}${ver}_sdk:${RELEASE}_${RC_VERSION} ${DOCKER_IMAGE_DEFAULT_REPO_ROOT}/${DOCKER_IMAGE_DEFAULT_REPO_PREFIX}${ver}_sdk:${RELEASE} - docker push ${DOCKER_IMAGE_DEFAULT_REPO_ROOT}/${DOCKER_IMAGE_DEFAULT_REPO_PREFIX}${ver}_sdk:${RELEASE} + docker tag "${IMAGE}:${RELEASE}_${RC_VERSION}" "${IMAGE}:${RELEASE}" + docker push "${IMAGE}:${RELEASE}" # Tag with latest and push to dockerhub. - docker tag ${DOCKER_IMAGE_DEFAULT_REPO_ROOT}/${DOCKER_IMAGE_DEFAULT_REPO_PREFIX}${ver}_sdk:${RELEASE}_${RC_VERSION} ${DOCKER_IMAGE_DEFAULT_REPO_ROOT}/${DOCKER_IMAGE_DEFAULT_REPO_PREFIX}${ver}_sdk:latest - docker push ${DOCKER_IMAGE_DEFAULT_REPO_ROOT}/${DOCKER_IMAGE_DEFAULT_REPO_PREFIX}${ver}_sdk:latest - - # Cleanup images from local - docker rmi -f ${DOCKER_IMAGE_DEFAULT_REPO_ROOT}/${DOCKER_IMAGE_DEFAULT_REPO_PREFIX}${ver}_sdk:${RELEASE}_${RC_VERSION} - docker rmi -f ${DOCKER_IMAGE_DEFAULT_REPO_ROOT}/${DOCKER_IMAGE_DEFAULT_REPO_PREFIX}${ver}_sdk:${RELEASE} - docker rmi -f ${DOCKER_IMAGE_DEFAULT_REPO_ROOT}/${DOCKER_IMAGE_DEFAULT_REPO_PREFIX}${ver}_sdk:latest + docker tag "${IMAGE}:${RELEASE}_${RC_VERSION}" "${IMAGE}:latest" + docker push "${IMAGE}:latest" done - echo '-------------------Tagging and Pushing Java images-----------------' - for ver in "${JAVA_VER[@]}"; do - # Pull verified RC from dockerhub. 
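As an illustration (not part of the patch): for a single image, the new per-image loop above expands to a sequence like the following, where apache/beam_python3.8_sdk, 2.29.0 and rc1 are placeholder values:

    docker pull apache/beam_python3.8_sdk:2.29.0_rc1
    docker tag apache/beam_python3.8_sdk:2.29.0_rc1 apache/beam_python3.8_sdk:2.29.0
    docker push apache/beam_python3.8_sdk:2.29.0
    docker tag apache/beam_python3.8_sdk:2.29.0_rc1 apache/beam_python3.8_sdk:latest
    docker push apache/beam_python3.8_sdk:latest

Because the loop is driven by docker images --filter "reference=apache/beam_*:${RELEASE}_${RC_VERSION}", only RC images already present locally are re-tagged and pushed, replacing the hard-coded Python/Java/Flink/Spark version lists removed here.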
- docker pull ${DOCKER_IMAGE_DEFAULT_REPO_ROOT}/${DOCKER_IMAGE_DEFAULT_REPO_PREFIX}${ver}_sdk:${RELEASE}_${RC_VERSION} - - # Tag with ${RELEASE} and push to dockerhub. - docker tag ${DOCKER_IMAGE_DEFAULT_REPO_ROOT}/${DOCKER_IMAGE_DEFAULT_REPO_PREFIX}${ver}_sdk:${RELEASE}_${RC_VERSION} ${DOCKER_IMAGE_DEFAULT_REPO_ROOT}/${DOCKER_IMAGE_DEFAULT_REPO_PREFIX}${ver}_sdk:${RELEASE} - docker push ${DOCKER_IMAGE_DEFAULT_REPO_ROOT}/${DOCKER_IMAGE_DEFAULT_REPO_PREFIX}${ver}_sdk:${RELEASE} - - # Tag with latest and push to dockerhub. - docker tag ${DOCKER_IMAGE_DEFAULT_REPO_ROOT}/${DOCKER_IMAGE_DEFAULT_REPO_PREFIX}${ver}_sdk:${RELEASE}_${RC_VERSION} ${DOCKER_IMAGE_DEFAULT_REPO_ROOT}/${DOCKER_IMAGE_DEFAULT_REPO_PREFIX}${ver}_sdk:latest - docker push ${DOCKER_IMAGE_DEFAULT_REPO_ROOT}/${DOCKER_IMAGE_DEFAULT_REPO_PREFIX}${ver}_sdk:latest - - # Cleanup images from local - docker rmi -f ${DOCKER_IMAGE_DEFAULT_REPO_ROOT}/${DOCKER_IMAGE_DEFAULT_REPO_PREFIX}${ver}_sdk:${RELEASE}_${RC_VERSION} - docker rmi -f ${DOCKER_IMAGE_DEFAULT_REPO_ROOT}/${DOCKER_IMAGE_DEFAULT_REPO_PREFIX}${ver}_sdk:${RELEASE} - docker rmi -f ${DOCKER_IMAGE_DEFAULT_REPO_ROOT}/${DOCKER_IMAGE_DEFAULT_REPO_PREFIX}${ver}_sdk:latest - end - - echo '-------------Tagging and Pushing Flink job server images-------------' - echo "Publishing images for the following Flink versions:" "${FLINK_VER[@]}" - echo "Make sure the versions are correct, then press any key to proceed." - read - for ver in "${FLINK_VER[@]}"; do - FLINK_IMAGE_NAME=${DOCKER_IMAGE_DEFAULT_REPO_ROOT}/${DOCKER_IMAGE_DEFAULT_REPO_PREFIX}flink${ver}_job_server - - # Pull verified RC from dockerhub. - docker pull "${FLINK_IMAGE_NAME}:${RELEASE}_${RC_VERSION}" - - # Tag with ${RELEASE} and push to dockerhub. - docker tag "${FLINK_IMAGE_NAME}:${RELEASE}_${RC_VERSION}" "${FLINK_IMAGE_NAME}:${RELEASE}" - docker push "${FLINK_IMAGE_NAME}:${RELEASE}" - - # Tag with latest and push to dockerhub. - docker tag "${FLINK_IMAGE_NAME}:${RELEASE}_${RC_VERSION}" "${FLINK_IMAGE_NAME}:latest" - docker push "${FLINK_IMAGE_NAME}:latest" - - # Cleanup images from local - docker rmi -f "${FLINK_IMAGE_NAME}:${RELEASE}_${RC_VERSION}" - docker rmi -f "${FLINK_IMAGE_NAME}:${RELEASE}" - docker rmi -f "${FLINK_IMAGE_NAME}:latest" - done - - echo '-------------Tagging and Pushing Spark job server image-------------' - SPARK_IMAGE_NAME=${DOCKER_IMAGE_DEFAULT_REPO_ROOT}/${DOCKER_IMAGE_DEFAULT_REPO_PREFIX}spark_job_server - - # Pull verified RC from dockerhub. - docker pull "${SPARK_IMAGE_NAME}:${RELEASE}_${RC_VERSION}" - - # Tag with ${RELEASE} and push to dockerhub. - docker tag "${SPARK_IMAGE_NAME}:${RELEASE}_${RC_VERSION}" "${SPARK_IMAGE_NAME}:${RELEASE}" - docker push "${SPARK_IMAGE_NAME}:${RELEASE}" - - # Tag with latest and push to dockerhub. 
- docker tag "${SPARK_IMAGE_NAME}:${RELEASE}_${RC_VERSION}" "${SPARK_IMAGE_NAME}:latest" - docker push "${SPARK_IMAGE_NAME}:latest" - - # Cleanup images from local - docker rmi -f "${SPARK_IMAGE_NAME}:${RELEASE}_${RC_VERSION}" - docker rmi -f "${SPARK_IMAGE_NAME}:${RELEASE}" - docker rmi -f "${SPARK_IMAGE_NAME}:latest" -fi +fi \ No newline at end of file diff --git a/release/src/main/scripts/run_rc_validation.sh b/release/src/main/scripts/run_rc_validation.sh index dfc02b83dec4..292fc50c3029 100755 --- a/release/src/main/scripts/run_rc_validation.sh +++ b/release/src/main/scripts/run_rc_validation.sh @@ -58,8 +58,25 @@ function clean_up(){ } trap clean_up EXIT -RELEASE_BRANCH=release-${RELEASE_VER} -WORKING_BRANCH=release-${RELEASE_VER}-RC${RC_NUM}_validations +setup_bashrc=0 +function set_bashrc(){ + [[ $setup_bashrc -eq 0 ]] || return + # [BEAM-4518] + FIXED_WINDOW_DURATION=20 + cp ~/.bashrc ~/$BACKUP_BASHRC + echo "export USER_GCP_PROJECT=${USER_GCP_PROJECT}" >> ~/.bashrc + echo "export USER_GCS_BUCKET=${USER_GCS_BUCKET}" >> ~/.bashrc + echo "export SHARED_PUBSUB_TOPIC=${SHARED_PUBSUB_TOPIC}" >> ~/.bashrc + echo "export GOOGLE_APPLICATION_CREDENTIALS=${GOOGLE_APPLICATION_CREDENTIALS}" >> ~/.bashrc + echo "export RELEASE_VER=${RELEASE_VER}" >> ~/.bashrc + echo "export FIXED_WINDOW_DURATION=${FIXED_WINDOW_DURATION}" >> ~/.bashrc + echo "export LOCAL_BEAM_DIR=${LOCAL_BEAM_DIR}" >> ~/.bashrc + setup_bashrc=1 +} + +RC_TAG="v${RELEASE_VER}-RC${RC_NUM}" +RELEASE_BRANCH="releasev${RELEASE_VER}" +WORKING_BRANCH=v${RELEASE_VER}-RC${RC_NUM}_validations GIT_REPO_URL=https://github.com/apache/beam.git PYTHON_RC_DOWNLOAD_URL=https://dist.apache.org/repos/dist/dev/beam HUB_VERSION=2.12.0 @@ -103,9 +120,9 @@ else echo "* Creating local Beam workspace: ${LOCAL_BEAM_DIR}" mkdir -p ${LOCAL_BEAM_DIR} echo "* Cloning Beam repo" - git clone ${GIT_REPO_URL} ${LOCAL_BEAM_DIR} + git clone --depth 1 --branch ${RC_TAG} ${GIT_REPO_URL} ${LOCAL_BEAM_DIR} cd ${LOCAL_BEAM_DIR} - git checkout -b ${WORKING_BRANCH} origin/${RELEASE_BRANCH} --quiet + git checkout -b ${WORKING_BRANCH} ${RC_TAG} --quiet echo "* Setting up git config" # Set upstream repo url with access token included. USER_REPO_URL=https://${GITHUB_USERNAME}:${GITHUB_TOKEN}@github.com/${GITHUB_USERNAME}/beam.git @@ -167,16 +184,27 @@ bq version echo "-----------------Checking gnome-terminal-----------------" if [[ -z `which gnome-terminal` ]]; then echo "You don't have gnome-terminal installed." - if [[ "$INSTALL_GNOME_TERMINAL" != true ]]; then - sudo apt-get upgrade + if [[ "$INSTALL_GNOME_TERMINAL" = true ]]; then sudo apt-get install gnome-terminal else - echo "gnome-terminal is not installed. Validation on Python Leaderboard & GameStates will be skipped." + echo "gnome-terminal is not installed. Can't run validation on Python Leaderboard & GameStates. Exiting." exit fi fi gnome-terminal --version +echo "-----------------Checking kubectl-----------------" +if [[ -z `which kubectl` ]]; then + echo "You don't have kubectl installed." + if [[ "$INSTALL_KUBECTL" = true ]]; then + sudo apt-get install kubectl + else + echo "kubectl is not installed. Can't run validation on Python cross-language Kafka taxi. Exiting." 
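As an illustration (not part of the patch): with a hypothetical release 2.29.0 and RC 1, the tag-based checkout introduced above resolves to roughly these commands:

    RC_TAG="v2.29.0-RC1"
    git clone --depth 1 --branch "${RC_TAG}" https://github.com/apache/beam.git "${LOCAL_BEAM_DIR}"
    cd "${LOCAL_BEAM_DIR}"
    git checkout -b "v2.29.0-RC1_validations" "${RC_TAG}" --quiet

The shallow, tag-pinned clone replaces the earlier full clone of the release branch, so validation always runs against exactly the tagged RC commit.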
+ exit + fi +fi +kubectl version + echo "" echo "" @@ -198,7 +226,7 @@ if [[ "$java_quickstart_flink_local" = true ]]; then echo "*************************************************************" echo "* Running Java Quickstart with Flink local runner" echo "*************************************************************" - ./gradlew :runners:flink:1.10:runQuickstartJavaFlinkLocal \ + ./gradlew :runners:flink:1.13:runQuickstartJavaFlinkLocal \ -Prepourl=${REPO_URL} \ -Pver=${RELEASE_VER} else @@ -210,7 +238,7 @@ if [[ "$java_quickstart_spark_local" = true ]]; then echo "*************************************************************" echo "* Running Java Quickstart with Spark local runner" echo "*************************************************************" - ./gradlew :runners:spark:runQuickstartJavaSpark \ + ./gradlew :runners:spark:2:runQuickstartJavaSpark \ -Prepourl=${REPO_URL} \ -Pver=${RELEASE_VER} else @@ -285,12 +313,23 @@ if [[ ("$java_mobile_game_direct" = true || "$java_mobile_game_dataflow" = true) -PbqDataset=${MOBILE_GAME_DATASET} \ -PpubsubTopic=${MOBILE_GAME_PUBSUB_TOPIC} \ -PgcsBucket=${USER_GCS_BUCKET:5} # skip 'gs://' prefix + + echo "**************************************************************************" + echo "* Java mobile game on DataflowRunner using Beam GCP BOM: UserScore, HourlyTeamScore, Leaderboard" + echo "**************************************************************************" + ./gradlew :runners:google-cloud-dataflow-java:runMobileGamingJavaDataflowBom \ + -Prepourl=${REPO_URL} \ + -Pver=${RELEASE_VER} \ + -PgcpProject=${USER_GCP_PROJECT} \ + -PbqDataset=${MOBILE_GAME_DATASET} \ + -PpubsubTopic=${MOBILE_GAME_PUBSUB_TOPIC} \ + -PgcsBucket=${USER_GCS_BUCKET:5} # skip 'gs://' prefix else echo "* Skip Java Mobile Game on DataflowRunner." fi echo "-----------------Cleaning up BigQuery & Pubsub-----------------" - bq rm -rf --project=${USER_GCP_PROJECT} ${MOBILE_GAME_DATASET} + bq rm -r -f --project_id=${USER_GCP_PROJECT} ${MOBILE_GAME_DATASET} gcloud pubsub topics delete projects/${USER_GCP_PROJECT}/topics/${MOBILE_GAME_PUBSUB_TOPIC} fi @@ -305,7 +344,7 @@ if [[ "$python_quickstart_mobile_game" = true && ! 
-z `which hub` ]]; then git commit -m "Add empty file in order to create PR" --quiet git push -f ${GITHUB_USERNAME} --quiet # Create a test PR - PR_URL=$(hub pull-request -b apache:${RELEASE_BRANCH} -h ${GITHUB_USERNAME}:${WORKING_BRANCH} -F- <<<"[DO NOT MERGE] Run Python RC Validation Tests + PR_URL=$(hub pull-request -b apache:${RELEASE_BRANCH} -h apache:${RC_TAG} -F- <<<"[DO NOT MERGE] Run Python RC Validation Tests Run Python ReleaseCandidate") echo "Created $PR_URL" @@ -370,16 +409,7 @@ if [[ ("$python_leaderboard_direct" = true \ echo "" >> settings.xml echo "-----------------------Setting up Shell Env Vars------------------------------" - # [BEAM-4518] - FIXED_WINDOW_DURATION=20 - cp ~/.bashrc ~/$BACKUP_BASHRC - echo "export USER_GCP_PROJECT=${USER_GCP_PROJECT}" >> ~/.bashrc - echo "export USER_GCS_BUCKET=${USER_GCS_BUCKET}" >> ~/.bashrc - echo "export SHARED_PUBSUB_TOPIC=${SHARED_PUBSUB_TOPIC}" >> ~/.bashrc - echo "export GOOGLE_APPLICATION_CREDENTIALS=${GOOGLE_APPLICATION_CREDENTIALS}" >> ~/.bashrc - echo "export RELEASE_VER=${RELEASE_VER}" >> ~/.bashrc - echo "export FIXED_WINDOW_DURATION=${FIXED_WINDOW_DURATION}" >> ~/.bashrc - echo "export LOCAL_BEAM_DIR=${LOCAL_BEAM_DIR}" >> ~/.bashrc + set_bashrc echo "----------------------Starting Pubsub Java Injector--------------------------" cd ${LOCAL_BEAM_DIR} @@ -571,3 +601,156 @@ if [[ ("$python_leaderboard_direct" = true \ else echo "* Skip Python Leaderboard & GameStates Validations" fi + +echo "" +echo "====================Starting Python Cross-language Validations===============" +if [[ ("$python_xlang_kafka_taxi_dataflow" = true + || "$python_xlang_sql_taxi_dataflow" = true) \ + && ! -z `which gnome-terminal` && ! -z `which kubectl` ]]; then + cd ${LOCAL_BEAM_DIR} + + echo "---------------------Downloading Python Staging RC----------------------------" + wget ${PYTHON_RC_DOWNLOAD_URL}/${RELEASE_VER}/python/apache-beam-${RELEASE_VER}.zip + wget ${PYTHON_RC_DOWNLOAD_URL}/${RELEASE_VER}/python/apache-beam-${RELEASE_VER}.zip.sha512 + if [[ ! -f apache-beam-${RELEASE_VER}.zip ]]; then + { echo "Fail to download Python Staging RC files." ;exit 1; } + fi + + echo "--------------------------Verifying Hashes------------------------------------" + sha512sum -c apache-beam-${RELEASE_VER}.zip.sha512 + + `which pip` install --upgrade pip + `which pip` install --upgrade setuptools + `which pip` install --upgrade virtualenv + + echo "-----------------------Setting up Shell Env Vars------------------------------" + set_bashrc + + echo "-----------------------Setting up Kafka Cluster on GKE------------------------" + CLUSTER_NAME=xlang-kafka-cluster-$RANDOM + if [[ "$python_xlang_kafka_taxi_dataflow" = true ]]; then + gcloud container clusters create --project=${USER_GCP_PROJECT} --region=${USER_GCP_REGION} --no-enable-ip-alias $CLUSTER_NAME + kubectl apply -R -f ${LOCAL_BEAM_DIR}/.test-infra/kubernetes/kafka-cluster + echo "* Please wait for 10 mins to let a Kafka cluster be launched on GKE." 
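As an illustration (not part of the patch): once the GKE cluster is up, the external Kafka bootstrap address that the validation below depends on can be checked by hand; outside-0 and port 32400 are the service name and port this script uses further down:

    kubectl get svc outside-0 -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
    # Expect a public IP here; the script appends :32400 to form BOOTSTRAP_SERVERS.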
+ echo "* Sleeping for 10 mins" + sleep 10m + else + echo "* Skip Kafka cluster setup" + fi + + echo "-----------------------Building expansion service jar------------------------" + ./gradlew sdks:java:io:expansion-service:shadowJar + ./gradlew sdks:java:extensions:sql:expansion-service:shadowJar + + # Run Python XLang pipelines under multiple versions of Python + cd ${LOCAL_BEAM_DIR} + for py_version in "${PYTHON_VERSIONS_TO_VALIDATE[@]}" + do + rm -rf ./beam_env_${py_version} + echo "--------------Setting up virtualenv with $py_version interpreter----------------" + virtualenv beam_env_${py_version} -p $py_version + . beam_env_${py_version}/bin/activate + ln -s ${LOCAL_BEAM_DIR}/sdks beam_env_${py_version}/lib/sdks + + echo "--------------------------Installing Python SDK-------------------------------" + pip install apache-beam-${RELEASE_VER}.zip[gcp] + + echo "----------------Starting XLang Kafka Taxi with DataflowRunner---------------------" + if [[ "$python_xlang_kafka_taxi_dataflow" = true ]]; then + BOOTSTRAP_SERVERS="$(kubectl get svc outside-0 -o jsonpath='{.status.loadBalancer.ingress[0].ip}'):32400" + echo "BOOTSTRAP_SERVERS=${BOOTSTRAP_SERVERS}" + KAFKA_TAXI_DF_DATASET=${USER}_python_validations_$(date +%m%d)_$RANDOM + bq mk --project_id=${USER_GCP_PROJECT} ${KAFKA_TAXI_DF_DATASET} + echo "export BOOTSTRAP_SERVERS=${BOOTSTRAP_SERVERS}" >> ~/.bashrc + echo "export KAFKA_TAXI_DF_DATASET=${KAFKA_TAXI_DF_DATASET}" >> ~/.bashrc + + echo "This is a streaming job. This task will be launched in a separate terminal." + gnome-terminal -x sh -c \ + "echo '*****************************************************'; + echo '* Running Python XLang Kafka Taxi with DataflowRunner'; + echo '*****************************************************'; + . ${LOCAL_BEAM_DIR}/beam_env_${py_version}/bin/activate + python -m apache_beam.examples.kafkataxi.kafka_taxi \ + --project=${USER_GCP_PROJECT} \ + --region=${USER_GCP_REGION} \ + --topic beam-runnerv2 \ + --bootstrap_servers ${BOOTSTRAP_SERVERS} \ + --bq_dataset ${KAFKA_TAXI_DF_DATASET} \ + --runner DataflowRunner \ + --num_workers 5 \ + --temp_location=${USER_GCS_BUCKET}/temp/ \ + --with_metadata \ + --sdk_location apache-beam-${RELEASE_VER}.zip; \ + exec bash" + + echo "***************************************************************" + echo "* Please wait for at least 10 mins to let Dataflow job be launched and results get populated." + echo "* Sleeping for 10 mins" + sleep 10m + echo "* How to verify results:" + echo "* 1. Goto your Dataflow job console and check whether there is any error." + echo "* 2. Check whether ${KAFKA_TAXI_DF_DATASET}.xlang_kafka_taxi has data, retrieving BigQuery data as below: " + test_output=$(bq head -n 10 ${KAFKA_TAXI_DF_DATASET}.xlang_kafka_taxi) + echo "$test_output" + if ! grep -q "passenger_count" <<< "$test_output"; then + echo "Couldn't find expected output. Please confirm the output by visiting the console manually." 
+ exit 1 + fi + echo "***************************************************************" + else + echo "* Skip Python XLang Kafka Taxi with DataflowRunner" + fi + + echo "----------------Starting XLang SQL Taxi with DataflowRunner---------------------" + if [[ "$python_xlang_sql_taxi_dataflow" = true ]]; then + SQL_TAXI_TOPIC=${USER}_python_validations_$(date +%m%d)_$RANDOM + SQL_TAXI_SUBSCRIPTION=${USER}_python_validations_$(date +%m%d)_$RANDOM + gcloud pubsub topics create --project=${USER_GCP_PROJECT} ${SQL_TAXI_TOPIC} + gcloud pubsub subscriptions create --project=${USER_GCP_PROJECT} --topic=${SQL_TAXI_TOPIC} ${SQL_TAXI_SUBSCRIPTION} + echo "export SQL_TAXI_TOPIC=${SQL_TAXI_TOPIC}" >> ~/.bashrc + + echo "This is a streaming job. This task will be launched in a separate terminal." + gnome-terminal -x sh -c \ + "echo '***************************************************'; + echo '* Running Python XLang SQL Taxi with DataflowRunner'; + echo '***************************************************'; + . ${LOCAL_BEAM_DIR}/beam_env_${py_version}/bin/activate + python -m apache_beam.examples.sql_taxi \ + --project=${USER_GCP_PROJECT} \ + --region=${USER_GCP_REGION} \ + --runner DataflowRunner \ + --num_workers 5 \ + --temp_location=${USER_GCS_BUCKET}/temp/ \ + --output_topic projects/${USER_GCP_PROJECT}/topics/${SQL_TAXI_TOPIC} \ + --sdk_location apache-beam-${RELEASE_VER}.zip; \ + exec bash" + + echo "***************************************************************" + echo "* Please wait for at least 10 mins to let Dataflow job be launched and results get populated." + echo "* Sleeping for 10 mins" + sleep 10m + echo "* How to verify results:" + echo "* 1. Goto your Dataflow job console and check whether there is any error." + echo "* 2. Check whether your ${SQL_TAXI_SUBSCRIPTION} subscription has data below:" + # run twice since the first execution would return 0 messages + gcloud pubsub subscriptions pull --project=${USER_GCP_PROJECT} --limit=5 ${SQL_TAXI_SUBSCRIPTION} + test_output=$(gcloud pubsub subscriptions pull --project=${USER_GCP_PROJECT} --limit=5 ${SQL_TAXI_SUBSCRIPTION}) + echo "$test_output" + if ! grep -q "ride_status" <<< "$test_output"; then + echo "Couldn't find expected output. Please confirm the output by visiting the console manually." + exit 1 + fi + echo "***************************************************************" + else + echo "* Skip Python XLang SQL Taxi with DataflowRunner" + fi + done # Loop over Python versions. +else + echo "* Skip Python Cross-language Validations" +fi +echo "*************************************************************" +echo " NOTE: Streaming pipelines are not automatically canceled. " +echo " Please manually cancel any remaining test pipelines and " +echo " clean up the resources (BigQuery dataset, PubSub topics and " +echo " subscriptions, GKE cluster, etc.) after verification. 
" +echo "*************************************************************" diff --git a/release/src/main/scripts/script.config b/release/src/main/scripts/script.config index 3786979c3e25..97e9dd324f9e 100755 --- a/release/src/main/scripts/script.config +++ b/release/src/main/scripts/script.config @@ -38,6 +38,7 @@ RC_VALIDATE_CONFIGS=( GITHUB_USERNAME GITHUB_TOKEN INSTALL_GNOME_TERMINAL + INSTALL_KUBECTL LOCAL_BEAM_DIR java_quickstart_direct java_quickstart_apex_local @@ -52,6 +53,8 @@ RC_VALIDATE_CONFIGS=( python_leaderboard_dataflow python_gamestats_direct python_gamestats_dataflow + python_xlang_kafka_taxi_dataflow + python_xlang_sql_taxi_dataflow ) # List of required configurations for verify_release_build.sh @@ -126,12 +129,18 @@ GITHUB_USERNAME= GITHUB_TOKEN= # Install gnome-terminal -# Used in Python Leaderboard & GameStates to launches data injection pipeline +# Used in Python Leaderboard & GameStates to launch data injection pipeline # in a separate terminal. Set to true so that it will be installed if not found # from local. Otherwise, validation on Python Leaderboard & GameStates will be # skipped. INSTALL_GNOME_TERMINAL=true +# Install kubectl +# Used in Python cross-language Kafka taxi to launch Kafka cluster on GKE. +# Set to true so that it will be installed if not found from local. +# Otherwise, validation on Python cross-language tests will be skipped. +INSTALL_KUBECTL=true + # Local Beam directory # This is a local workspace used by validation scripts. # Default to a temporary directory created uniquely in each run. @@ -156,3 +165,5 @@ python_leaderboard_direct=true python_leaderboard_dataflow=true python_gamestats_direct=true python_gamestats_dataflow=true +python_xlang_kafka_taxi_dataflow=true +python_xlang_sql_taxi_dataflow=true diff --git a/release/src/main/scripts/set_version.sh b/release/src/main/scripts/set_version.sh index b52dfc952594..67420d6dda96 100755 --- a/release/src/main/scripts/set_version.sh +++ b/release/src/main/scripts/set_version.sh @@ -16,18 +16,20 @@ # limitations under the License. # -# This script will update apache beam master branch with next release version -# and cut release branch for current development version. - -# Parse parameters passing into the script +# This script will update the current checked out branch to be the +# specified version, either for release or development. +# +# This script should be the source of truth for all the locations in +# the codebase that require update. 
set -e function usage() { - echo 'Usage: set_version.sh [--release] [--debug]' + echo 'Usage: set_version.sh [--release] [--debug] [--git-add]' } IS_SNAPSHOT_VERSION=yes +GIT_ADD=no while [[ $# -gt 0 ]] ; do arg="$1" @@ -43,6 +45,11 @@ while [[ $# -gt 0 ]] ; do shift ;; + --git-add) + GIT_ADD=yes + shift + ;; + *) if [[ -z "$TARGET_VERSION" ]] ; then TARGET_VERSION="$1" @@ -68,8 +75,10 @@ if [[ -z "$IS_SNAPSHOT_VERSION" ]] ; then sed -i -e "s/project.version = .*/project.version = '$TARGET_VERSION'/" buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy sed -i -e "s/^__version__ = .*/__version__ = '${TARGET_VERSION}'/" sdks/python/apache_beam/version.py sed -i -e "s/sdk_version=.*/sdk_version=$TARGET_VERSION/" gradle.properties + sed -i -e "s/SdkVersion = .*/SdkVersion = \"$TARGET_VERSION\"/" sdks/go/pkg/beam/core/core.go # TODO: [BEAM-4767] - sed -i -e "s/'dataflow.container_version' : .*/'dataflow.container_version' : 'beam-${RELEASE}'/" runners/google-cloud-dataflow-java/build.gradle + sed -i -e "s/'dataflow.fnapi_container_version' : .*/'dataflow.fnapi_container_version' : '${TARGET_VERSION}',/" runners/google-cloud-dataflow-java/build.gradle + sed -i -e "s/'dataflow.legacy_container_version' : .*/'dataflow.legacy_container_version' : 'beam-${TARGET_VERSION}',/" runners/google-cloud-dataflow-java/build.gradle else # For snapshot version: # Java/gradle appends -SNAPSHOT @@ -80,6 +89,13 @@ else sed -i -e "s/project.version = .*/project.version = '$TARGET_VERSION'/" buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy sed -i -e "s/^__version__ = .*/__version__ = '${TARGET_VERSION}.dev'/" sdks/python/apache_beam/version.py sed -i -e "s/sdk_version=.*/sdk_version=$TARGET_VERSION.dev/" gradle.properties - sed -i -e "s/'dataflow.container_version' : .*/'dataflow.container_version' : 'beam-master-.*'/" runners/google-cloud-dataflow-java/build.gradle + sed -i -e "s/SdkVersion = .*/SdkVersion = \"${TARGET_VERSION}.dev\"/" sdks/go/pkg/beam/core/core.go fi +if [[ "$GIT_ADD" == yes ]] ; then + git add gradle.properties + git add buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy + git add sdks/python/apache_beam/version.py + git add sdks/go/pkg/beam/core/core.go + git add runners/google-cloud-dataflow-java/build.gradle +fi diff --git a/release/src/main/scripts/verify_release_build.sh b/release/src/main/scripts/verify_release_build.sh index 784dcb9ae304..ba934ce2f29b 100755 --- a/release/src/main/scripts/verify_release_build.sh +++ b/release/src/main/scripts/verify_release_build.sh @@ -39,6 +39,8 @@ set -e BEAM_REPO_URL=https://github.com/apache/beam.git RELEASE_BRANCH=release-${RELEASE_VER} WORKING_BRANCH=postcommit_validation_pr +SCRIPT=$(readlink -f $0) +SCRIPT_DIR=$(dirname $SCRIPT) function clean_up(){ echo "" @@ -123,12 +125,12 @@ if [[ ! -z `which hub` ]]; then # The version change is needed for Dataflow python batch tests. # Without changing to dev version, the dataflow pipeline will fail because of non-existed worker containers. # Note that dataflow worker containers should be built after RC has been built. 
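As an illustration (not part of the patch), and assuming the positional argument is the target version as the usage string suggests, the two typical set_version.sh invocations would be along these lines:

    # Pin the checked-out branch to a release version and stage the edited files:
    ./release/src/main/scripts/set_version.sh 2.29.0 --release --git-add
    # Switch to a development (snapshot) version:
    ./release/src/main/scripts/set_version.sh 2.30.0 --git-add

This is the form in which verify_release_build.sh below now delegates the version edit instead of maintaining its own sed commands.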
- sed -i -e "s/${RELEASE_VER}/${RELEASE_VER}.dev/g" sdks/python/apache_beam/version.py - sed -i -e "s/sdk_version=${RELEASE_VER}/sdk_version=${RELEASE_VER}.dev/g" gradle.properties - git add sdks/python/apache_beam/version.py - git add gradle.properties + sh "$SCRIPT_DIR"/set_version.sh "$RELEASE_VER" --git-add + # In case the version string was not changed, append a newline to CHANGES.md + echo "" >> CHANGES.md + git add CHANGES.md git commit -m "Changed version.py and gradle.properties to python dev version to create a test PR" --quiet - git push -f ${GITHUB_USERNAME} --quiet + git push -f ${GITHUB_USERNAME} ${WORKING_BRANCH} --quiet hub pull-request -b apache:${RELEASE_BRANCH} -h ${GITHUB_USERNAME}:${WORKING_BRANCH} -F- <<<"[DO NOT MERGE] Run all PostCommit and PreCommit Tests against Release Branch diff --git a/runners/core-construction-java/build.gradle b/runners/core-construction-java/build.gradle index 99d68592ff27..92ff5aa19a94 100644 --- a/runners/core-construction-java/build.gradle +++ b/runners/core-construction-java/build.gradle @@ -31,6 +31,7 @@ applyJavaNature( 'ReadTranslation': 'https://github.com/typetools/checker-framework/issues/3791', 'GroupByKeyTranslation': 'https://github.com/typetools/checker-framework/issues/3791', 'ParDoTranslation': 'https://github.com/typetools/checker-framework/issues/3791', + 'GroupIntoBatchesTranslation': 'https://github.com/typetools/checker-framework/issues/3791', ], automaticModuleName: 'org.apache.beam.runners.core.construction', ) @@ -53,15 +54,18 @@ dependencies { compile project(path: ":model:job-management", configuration: "shadow") compile project(path: ":sdks:java:core", configuration: "shadow") compile project(path: ":sdks:java:fn-execution") - compile library.java.vendored_grpc_1_26_0 + compile library.java.vendored_grpc_1_36_0 compile library.java.vendored_guava_26_0_jre compile library.java.classgraph compile library.java.jackson_core compile library.java.jackson_databind compile library.java.joda_time compile library.java.slf4j_api - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library + compile library.java.jackson_annotations + compile library.java.avro + // Avro 1.8 leaks an older version of paranamer that conflicts in runtime with the dependencies + // of some runners so we need to fix it to a more recent but still compatible version. + runtimeOnly "com.thoughtworks.paranamer:paranamer:2.8" testCompile library.java.junit testCompile library.java.mockito_core testCompile library.java.jackson_annotations diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/BeamUrns.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/BeamUrns.java index e4fc6d76fc10..768958f95ecd 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/BeamUrns.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/BeamUrns.java @@ -18,7 +18,7 @@ package org.apache.beam.runners.core.construction; import org.apache.beam.model.pipeline.v1.RunnerApi; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ProtocolMessageEnum; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ProtocolMessageEnum; /** Returns the standard URN of a given enum annotated with [(standard_urn)]. 
*/ public class BeamUrns { diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CoderTranslation.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CoderTranslation.java index 14cf59a598dd..39e5201d8d78 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CoderTranslation.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CoderTranslation.java @@ -28,7 +28,7 @@ import org.apache.beam.model.pipeline.v1.RunnerApi.FunctionSpec; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.util.SerializableUtils; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.BiMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableBiMap; diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CoderTranslators.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CoderTranslators.java index 574b26e94d86..992a9eb4ed43 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CoderTranslators.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CoderTranslators.java @@ -28,6 +28,7 @@ import org.apache.beam.sdk.coders.KvCoder; import org.apache.beam.sdk.coders.LengthPrefixCoder; import org.apache.beam.sdk.coders.RowCoder; +import org.apache.beam.sdk.coders.TimestampPrefixingWindowCoder; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.schemas.SchemaTranslation; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; @@ -35,7 +36,7 @@ import org.apache.beam.sdk.util.ShardedKey; import org.apache.beam.sdk.util.WindowedValue; import org.apache.beam.sdk.util.WindowedValue.FullWindowedValueCoder; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; /** {@link CoderTranslator} implementations for known coder types. 
*/ @@ -189,6 +190,20 @@ public ShardedKey.Coder fromComponents(List> components) { }; } + static CoderTranslator> timestampPrefixingWindow() { + return new SimpleStructuredCoderTranslator>() { + @Override + protected TimestampPrefixingWindowCoder fromComponents(List> components) { + return TimestampPrefixingWindowCoder.of((Coder) components.get(0)); + } + + @Override + public List> getComponents(TimestampPrefixingWindowCoder from) { + return from.getComponents(); + } + }; + } + public abstract static class SimpleStructuredCoderTranslator> implements CoderTranslator { @Override diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CombineTranslation.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CombineTranslation.java index aeb56fc527b9..de69abecd126 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CombineTranslation.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CombineTranslation.java @@ -39,7 +39,7 @@ import org.apache.beam.sdk.util.SerializableUtils; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollection; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CreatePCollectionViewTranslation.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CreatePCollectionViewTranslation.java index e69151d35bb2..c6dda1d26291 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CreatePCollectionViewTranslation.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CreatePCollectionViewTranslation.java @@ -34,7 +34,7 @@ import org.apache.beam.sdk.util.SerializableUtils; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionView; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; /** * Utility methods for translating a {@link View} transforms to and from {@link RunnerApi} diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DefaultExpansionServiceClientFactory.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DefaultExpansionServiceClientFactory.java index 87826963af55..d9b2701d21d1 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DefaultExpansionServiceClientFactory.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DefaultExpansionServiceClientFactory.java @@ -23,7 +23,7 @@ import org.apache.beam.model.expansion.v1.ExpansionApi; import org.apache.beam.model.expansion.v1.ExpansionServiceGrpc; import org.apache.beam.model.pipeline.v1.Endpoints; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ManagedChannel; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ManagedChannel; 
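As an aside (not part of the patch): most of the Java changes in this area are the same mechanical rename of the vendored gRPC package from v1p26p0 to v1p36p0. A rename of that shape can be applied across a source tree with something like:

    grep -rl --include='*.java' 'org\.apache\.beam\.vendor\.grpc\.v1p26p0' . \
      | xargs sed -i 's/org\.apache\.beam\.vendor\.grpc\.v1p26p0/org.apache.beam.vendor.grpc.v1p36p0/g'

This is only a sketch of the pattern; the commit itself also switches the corresponding Gradle dependency to library.java.vendored_grpc_1_36_0.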
/** Default factory for ExpansionServiceClient used by External transform. */ public class DefaultExpansionServiceClientFactory implements ExpansionServiceClientFactory { diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DisplayDataTranslation.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DisplayDataTranslation.java index b2d3ebd1c9d6..b7765560a216 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DisplayDataTranslation.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DisplayDataTranslation.java @@ -25,7 +25,7 @@ import org.apache.beam.model.pipeline.v1.RunnerApi; import org.apache.beam.model.pipeline.v1.RunnerApi.StandardDisplayData; import org.apache.beam.sdk.transforms.display.DisplayData; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; @@ -34,16 +34,15 @@ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class DisplayDataTranslation { - public static final String LABELLED_STRING = "beam:display_data:labelled_string:v1"; + public static final String LABELLED = "beam:display_data:labelled:v1"; static { - checkState( - LABELLED_STRING.equals(BeamUrns.getUrn(StandardDisplayData.DisplayData.LABELLED_STRING))); + checkState(LABELLED.equals(BeamUrns.getUrn(StandardDisplayData.DisplayData.LABELLED))); } private static final Map> WELL_KNOWN_URN_TRANSLATORS = - ImmutableMap.of(LABELLED_STRING, DisplayDataTranslation::translateStringUtf8); + ImmutableMap.of(LABELLED, DisplayDataTranslation::translateStringUtf8); public static List toProto(DisplayData displayData) { ImmutableList.Builder builder = ImmutableList.builder(); @@ -54,7 +53,7 @@ public static List toProto(DisplayData displayData) { if (translator != null) { urn = item.getKey(); } else { - urn = LABELLED_STRING; + urn = LABELLED; translator = DisplayDataTranslation::translateStringUtf8; } builder.add( @@ -69,9 +68,9 @@ public static List toProto(DisplayData displayData) { private static ByteString translateStringUtf8(DisplayData.Item item) { String value = String.valueOf(item.getValue() == null ? item.getShortValue() : item.getValue()); String label = item.getLabel() == null ? 
item.getKey() : item.getLabel(); - return RunnerApi.LabelledStringPayload.newBuilder() + return RunnerApi.LabelledPayload.newBuilder() .setLabel(label) - .setValue(value) + .setStringValue(value) .build() .toByteString(); } diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java index b3b821f7b030..e743e683ddb1 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java @@ -39,6 +39,7 @@ import org.apache.beam.model.pipeline.v1.RunnerApi.ProcessPayload; import org.apache.beam.model.pipeline.v1.RunnerApi.StandardArtifacts; import org.apache.beam.model.pipeline.v1.RunnerApi.StandardEnvironments; +import org.apache.beam.model.pipeline.v1.RunnerApi.StandardPTransforms.Primitives; import org.apache.beam.model.pipeline.v1.RunnerApi.StandardPTransforms.SplittableParDoComponents; import org.apache.beam.model.pipeline.v1.RunnerApi.StandardProtocols; import org.apache.beam.sdk.options.PipelineOptions; @@ -46,11 +47,12 @@ import org.apache.beam.sdk.util.ReleaseInfo; import org.apache.beam.sdk.util.ZipFiles; import org.apache.beam.sdk.util.common.ReflectHelpers; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Strings; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.hash.HashCode; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.hash.Hashing; @@ -75,21 +77,36 @@ public class Environments { public static final String ENVIRONMENT_EMBEDDED = "EMBEDDED"; // Non Public urn for testing public static final String ENVIRONMENT_LOOPBACK = "LOOPBACK"; // Non Public urn for testing + private static final String dockerContainerImageOption = "docker_container_image"; + private static final String externalServiceAddressOption = "external_service_address"; + private static final String processCommandOption = "process_command"; + private static final String processVariablesOption = "process_variables"; + + private static final Map> allowedEnvironmentOptions = + ImmutableMap.>builder() + .put(ENVIRONMENT_DOCKER, ImmutableSet.of(dockerContainerImageOption)) + .put(ENVIRONMENT_EXTERNAL, ImmutableSet.of(externalServiceAddressOption)) + .put(ENVIRONMENT_PROCESS, ImmutableSet.of(processCommandOption, processVariablesOption)) + .build(); + public enum JavaVersion { - v8("java8", "1.8"), - v11("java11", "11"); + java8("java", "1.8"), + java11("java11", "11"), + java17("java17", "17"); + + // Legacy name, as used in container image + private final String legacyName; - private final String name; + // Specification version (e.g. 
System java.specification.version) private final String specification; - JavaVersion(final String name, final String specification) { - this.name = name; + JavaVersion(final String legacyName, final String specification) { + this.legacyName = legacyName; this.specification = specification; } - @Override - public String toString() { - return this.name; + public String legacyName() { + return this.legacyName; } public String specification() { @@ -123,6 +140,7 @@ public static JavaVersion forSpecification(String specification) { private Environments() {} public static Environment createOrGetDefaultEnvironment(PortablePipelineOptions options) { + verifyEnvironmentOptions(options); String type = options.getDefaultEnvironmentType(); String config = options.getDefaultEnvironmentConfig(); @@ -136,14 +154,14 @@ public static Environment createOrGetDefaultEnvironment(PortablePipelineOptions break; case ENVIRONMENT_EXTERNAL: case ENVIRONMENT_LOOPBACK: - defaultEnvironment = createExternalEnvironment(config); + defaultEnvironment = createExternalEnvironment(getExternalServiceAddress(options)); break; case ENVIRONMENT_PROCESS: - defaultEnvironment = createProcessEnvironment(config); + defaultEnvironment = createProcessEnvironment(options); break; case ENVIRONMENT_DOCKER: default: - defaultEnvironment = createDockerEnvironment(config); + defaultEnvironment = createDockerEnvironment(getDockerContainerImage(options)); } } return defaultEnvironment @@ -154,7 +172,7 @@ public static Environment createOrGetDefaultEnvironment(PortablePipelineOptions } public static Environment createDockerEnvironment(String dockerImageUrl) { - if (Strings.isNullOrEmpty(dockerImageUrl)) { + if (dockerImageUrl.isEmpty()) { return JAVA_SDK_HARNESS_ENVIRONMENT; } return Environment.newBuilder() @@ -164,21 +182,47 @@ public static Environment createDockerEnvironment(String dockerImageUrl) { .build(); } - private static Environment createExternalEnvironment(String config) { + private static Environment createExternalEnvironment(String externalServiceAddress) { + if (externalServiceAddress.isEmpty()) { + throw new IllegalArgumentException( + String.format( + "External service address must not be empty (set it using '--environmentOptions=%s=...'?).", + externalServiceAddressOption)); + } return Environment.newBuilder() .setUrn(BeamUrns.getUrn(StandardEnvironments.Environments.EXTERNAL)) .setPayload( ExternalPayload.newBuilder() - .setEndpoint(ApiServiceDescriptor.newBuilder().setUrl(config).build()) + .setEndpoint( + ApiServiceDescriptor.newBuilder().setUrl(externalServiceAddress).build()) .build() .toByteString()) .build(); } - private static Environment createProcessEnvironment(String config) { + private static Environment createEmbeddedEnvironment(String config) { + return Environment.newBuilder() + .setUrn(ENVIRONMENT_EMBEDDED) + .setPayload(ByteString.copyFromUtf8(MoreObjects.firstNonNull(config, ""))) + .build(); + } + + private static Environment createProcessEnvironment(PortablePipelineOptions options) { + if (options.getEnvironmentOptions() != null) { + String processCommand = + PortablePipelineOptions.getEnvironmentOption(options, processCommandOption); + if (processCommand.isEmpty()) { + throw new IllegalArgumentException( + String.format( + "Environment option '%s' must be set for process environment.", + processCommandOption)); + } + return createProcessEnvironment("", "", processCommand, getProcessVariables(options)); + } try { ProcessPayloadReferenceJSON payloadReferenceJSON = - MAPPER.readValue(config, 
ProcessPayloadReferenceJSON.class); + MAPPER.readValue( + options.getDefaultEnvironmentConfig(), ProcessPayloadReferenceJSON.class); return createProcessEnvironment( payloadReferenceJSON.getOs(), payloadReferenceJSON.getArch(), @@ -186,17 +230,13 @@ private static Environment createProcessEnvironment(String config) { payloadReferenceJSON.getEnv()); } catch (IOException e) { throw new RuntimeException( - String.format("Unable to parse process environment config: %s", config), e); + String.format( + "Unable to parse process environment config: %s", + options.getDefaultEnvironmentConfig()), + e); } } - private static Environment createEmbeddedEnvironment(String config) { - return Environment.newBuilder() - .setUrn(ENVIRONMENT_EMBEDDED) - .setPayload(ByteString.copyFromUtf8(MoreObjects.firstNonNull(config, ""))) - .build(); - } - public static Environment createProcessEnvironment( String os, String arch, String command, Map env) { ProcessPayload.Builder builder = ProcessPayload.newBuilder(); @@ -346,8 +386,10 @@ public static Set getJavaCapabilities() { capabilities.addAll(ModelCoders.urns()); capabilities.add(BeamUrns.getUrn(StandardProtocols.Enum.MULTI_CORE_BUNDLE_PROCESSING)); capabilities.add(BeamUrns.getUrn(StandardProtocols.Enum.PROGRESS_REPORTING)); + capabilities.add(BeamUrns.getUrn(StandardProtocols.Enum.HARNESS_MONITORING_INFOS)); capabilities.add("beam:version:sdk_base:" + JAVA_SDK_HARNESS_CONTAINER_URL); capabilities.add(BeamUrns.getUrn(SplittableParDoComponents.TRUNCATE_SIZED_RESTRICTION)); + capabilities.add(BeamUrns.getUrn(Primitives.TO_STRING)); return capabilities.build(); } @@ -363,6 +405,16 @@ public static String createStagingFileName(File path, HashCode hash) { return String.format("%s-%s%s", fileName, encodedHash, suffix); } + public static String getExternalServiceAddress(PortablePipelineOptions options) { + String environmentConfig = options.getDefaultEnvironmentConfig(); + String environmentOption = + PortablePipelineOptions.getEnvironmentOption(options, externalServiceAddressOption); + if (environmentConfig != null && !environmentConfig.isEmpty()) { + return environmentConfig; + } + return environmentOption; + } + private static File zipDirectory(File directory) throws IOException { File zipFile = File.createTempFile(directory.getName(), ".zip"); try (FileOutputStream fos = new FileOutputStream(zipFile)) { @@ -402,4 +454,51 @@ private static String getDefaultJavaSdkHarnessContainerUrl() { getJavaVersion().toString(), ReleaseInfo.getReleaseInfo().getSdkVersion()); } + + private static String getDockerContainerImage(PortablePipelineOptions options) { + String environmentConfig = options.getDefaultEnvironmentConfig(); + String environmentOption = + PortablePipelineOptions.getEnvironmentOption(options, dockerContainerImageOption); + if (environmentConfig != null && !environmentConfig.isEmpty()) { + return environmentConfig; + } + return environmentOption; + } + + private static Map getProcessVariables(PortablePipelineOptions options) { + ImmutableMap.Builder variables = ImmutableMap.builder(); + String assignments = + PortablePipelineOptions.getEnvironmentOption(options, processVariablesOption); + for (String assignment : assignments.split(",", -1)) { + String[] tokens = assignment.split("=", -1); + if (tokens.length == 1) { + throw new IllegalArgumentException( + String.format("Process environment variable '%s' is not assigned a value.", tokens[0])); + } + variables.put(tokens[0], tokens[1]); + } + return variables.build(); + } + + private static void 
verifyEnvironmentOptions(PortablePipelineOptions options) { + if (options.getEnvironmentOptions() == null || options.getEnvironmentOptions().isEmpty()) { + return; + } + if (!Strings.isNullOrEmpty(options.getDefaultEnvironmentConfig())) { + throw new IllegalArgumentException( + "Pipeline options defaultEnvironmentConfig and environmentOptions are mutually exclusive."); + } + Set allowedOptions = + allowedEnvironmentOptions.getOrDefault( + options.getDefaultEnvironmentType(), ImmutableSet.of()); + for (String option : options.getEnvironmentOptions()) { + String optionName = option.split("=", -1)[0]; + if (!allowedOptions.contains(optionName)) { + throw new IllegalArgumentException( + String.format( + "Environment option '%s' is incompatible with environment type '%s'.", + option, options.getDefaultEnvironmentType())); + } + } + } } diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/External.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/External.java index 2c4eb99e4080..4c8756b9be44 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/External.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/External.java @@ -42,6 +42,7 @@ import org.apache.beam.sdk.runners.AppliedPTransform; import org.apache.beam.sdk.transforms.Impulse; import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; import org.apache.beam.sdk.values.PBegin; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionTuple; @@ -50,9 +51,9 @@ import org.apache.beam.sdk.values.PValue; import org.apache.beam.sdk.values.PValues; import org.apache.beam.sdk.values.TupleTag; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ManagedChannel; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ManagedChannelBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ManagedChannel; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ManagedChannelBuilder; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Strings; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; @@ -194,6 +195,8 @@ public OutputT expand(InputT input) { PValues.expandInput(PBegin.in(p)), ImmutableMap.of(entry.getKey(), (PCollection) entry.getValue()), Impulse.create(), + // TODO(BEAM-12082): Add proper support for Resource Hints with XLang. + ResourceHints.create(), p); // using fake Impulses to provide inputs components.registerPTransform(fakeImpulse, Collections.emptyList()); diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/GroupIntoBatchesTranslation.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/GroupIntoBatchesTranslation.java new file mode 100644 index 000000000000..86f6f87461f4 --- /dev/null +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/GroupIntoBatchesTranslation.java @@ -0,0 +1,101 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.construction; + +import com.google.auto.service.AutoService; +import java.util.Map; +import org.apache.beam.model.pipeline.v1.RunnerApi; +import org.apache.beam.model.pipeline.v1.RunnerApi.FunctionSpec; +import org.apache.beam.model.pipeline.v1.RunnerApi.GroupIntoBatchesPayload; +import org.apache.beam.runners.core.construction.PTransformTranslation.TransformPayloadTranslator; +import org.apache.beam.sdk.runners.AppliedPTransform; +import org.apache.beam.sdk.transforms.GroupIntoBatches; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; + +@SuppressWarnings({ + "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) +}) +public class GroupIntoBatchesTranslation { + /** + * A translator registered to translate {@link GroupIntoBatches} objects to protobuf + * representation. + */ + static class GroupIntoBatchesTranslator + implements TransformPayloadTranslator> { + @Override + public String getUrn(GroupIntoBatches transform) { + return PTransformTranslation.GROUP_INTO_BATCHES_URN; + } + + @Override + public RunnerApi.FunctionSpec translate( + AppliedPTransform> transform, SdkComponents components) { + return FunctionSpec.newBuilder() + .setUrn(getUrn(transform.getTransform())) + .setPayload( + getPayloadFromParameters(transform.getTransform().getBatchingParams()).toByteString()) + .build(); + } + } + + /** + * A translator registered to translate {@link GroupIntoBatches.WithShardedKey} objects to + * protobuf representation. + */ + static class ShardedGroupIntoBatchesTranslator + implements TransformPayloadTranslator.WithShardedKey> { + @Override + public String getUrn(GroupIntoBatches.WithShardedKey transform) { + return PTransformTranslation.GROUP_INTO_BATCHES_WITH_SHARDED_KEY_URN; + } + + @Override + public FunctionSpec translate( + AppliedPTransform.WithShardedKey> transform, + SdkComponents components) { + return FunctionSpec.newBuilder() + .setUrn(getUrn(transform.getTransform())) + .setPayload( + getPayloadFromParameters(transform.getTransform().getBatchingParams()).toByteString()) + .build(); + } + } + + private static GroupIntoBatchesPayload getPayloadFromParameters( + GroupIntoBatches.BatchingParams params) { + return RunnerApi.GroupIntoBatchesPayload.newBuilder() + .setBatchSize(params.getBatchSize()) + .setBatchSizeBytes(params.getBatchSizeBytes()) + .setMaxBufferingDurationMillis(params.getMaxBufferingDuration().getStandardSeconds() * 1000) + .build(); + } + + /** Registers {@link GroupIntoBatchesTranslator} and {@link ShardedGroupIntoBatchesTranslator}. */ + @AutoService(TransformPayloadTranslatorRegistrar.class) + public static class Registrar implements TransformPayloadTranslatorRegistrar { + @Override + public Map, ? 
extends TransformPayloadTranslator> + getTransformPayloadTranslators() { + return ImmutableMap., TransformPayloadTranslator>builder() + .put(GroupIntoBatches.class, new GroupIntoBatchesTranslator()) + .put(GroupIntoBatches.WithShardedKey.class, new ShardedGroupIntoBatchesTranslator()) + .build(); + } + } +} diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ModelCoderRegistrar.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ModelCoderRegistrar.java index 44e2caaf9c8a..1fc8379977e0 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ModelCoderRegistrar.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ModelCoderRegistrar.java @@ -32,6 +32,7 @@ import org.apache.beam.sdk.coders.LengthPrefixCoder; import org.apache.beam.sdk.coders.RowCoder; import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.coders.TimestampPrefixingWindowCoder; import org.apache.beam.sdk.coders.VarLongCoder; import org.apache.beam.sdk.transforms.windowing.GlobalWindow; import org.apache.beam.sdk.transforms.windowing.IntervalWindow.IntervalWindowCoder; @@ -74,6 +75,7 @@ public class ModelCoderRegistrar implements CoderTranslatorRegistrar { .put(DoubleCoder.class, ModelCoders.DOUBLE_CODER_URN) .put(RowCoder.class, ModelCoders.ROW_CODER_URN) .put(ShardedKey.Coder.class, ModelCoders.SHARDED_KEY_CODER_URN) + .put(TimestampPrefixingWindowCoder.class, ModelCoders.CUSTOM_WINDOW_CODER_URN) .build(); public static final Set WELL_KNOWN_CODER_URNS = BEAM_MODEL_CODER_URNS.values(); @@ -96,6 +98,7 @@ public class ModelCoderRegistrar implements CoderTranslatorRegistrar { .put(DoubleCoder.class, CoderTranslators.atomic(DoubleCoder.class)) .put(RowCoder.class, CoderTranslators.row()) .put(ShardedKey.Coder.class, CoderTranslators.shardedKey()) + .put(TimestampPrefixingWindowCoder.class, CoderTranslators.timestampPrefixingWindow()) .build(); static { diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ModelCoders.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ModelCoders.java index ca69be9d7e43..0ff70f1ec5f7 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ModelCoders.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ModelCoders.java @@ -26,7 +26,7 @@ import org.apache.beam.model.pipeline.v1.RunnerApi.Coder; import org.apache.beam.model.pipeline.v1.RunnerApi.FunctionSpec; import org.apache.beam.model.pipeline.v1.RunnerApi.StandardCoders; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; /** Utilities and constants ot interact with coders that are part of the Beam Model. 
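The new GroupIntoBatchesTranslation.java earlier in this diff maps a transform's batching parameters onto a `GroupIntoBatchesPayload`. A small sketch of the resulting proto, using the same builder calls as `getPayloadFromParameters`; the numeric values are made up for illustration.

```java
import org.apache.beam.model.pipeline.v1.RunnerApi.GroupIntoBatchesPayload;

public class GroupIntoBatchesPayloadSketch {
  public static void main(String[] args) {
    // Illustrative values only: batch size in elements, batch size in bytes, and the
    // max buffering duration converted from whole seconds to milliseconds, as in
    // getPayloadFromParameters above.
    GroupIntoBatchesPayload payload =
        GroupIntoBatchesPayload.newBuilder()
            .setBatchSize(100L)
            .setBatchSizeBytes(1024L * 1024L)
            .setMaxBufferingDurationMillis(60L * 1000L)
            .build();
    System.out.println(payload);
  }
}
```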
*/ @@ -54,6 +54,8 @@ private ModelCoders() {} public static final String INTERVAL_WINDOW_CODER_URN = getUrn(StandardCoders.Enum.INTERVAL_WINDOW); + public static final String CUSTOM_WINDOW_CODER_URN = getUrn(StandardCoders.Enum.CUSTOM_WINDOW); + public static final String WINDOWED_VALUE_CODER_URN = getUrn(StandardCoders.Enum.WINDOWED_VALUE); public static final String PARAM_WINDOWED_VALUE_CODER_URN = getUrn(StandardCoders.Enum.PARAM_WINDOWED_VALUE); @@ -82,6 +84,7 @@ private ModelCoders() {} LENGTH_PREFIX_CODER_URN, GLOBAL_WINDOW_CODER_URN, INTERVAL_WINDOW_CODER_URN, + CUSTOM_WINDOW_CODER_URN, WINDOWED_VALUE_CODER_URN, DOUBLE_CODER_URN, ROW_CODER_URN, diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PCollectionViewTranslation.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PCollectionViewTranslation.java index d3e837b25b3a..c8fe8f309872 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PCollectionViewTranslation.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PCollectionViewTranslation.java @@ -29,7 +29,7 @@ import org.apache.beam.sdk.values.PCollectionView; import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.sdk.values.WindowingStrategy; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; /** Utilities for interacting with PCollection view protos. */ public class PCollectionViewTranslation { diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PTransformTranslation.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PTransformTranslation.java index 2d50d727feb9..bb823cdb98e7 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PTransformTranslation.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PTransformTranslation.java @@ -102,6 +102,10 @@ public class PTransformTranslation { public static final String COMBINE_GLOBALLY_TRANSFORM_URN = "beam:transform:combine_globally:v1"; public static final String RESHUFFLE_URN = "beam:transform:reshuffle:v1"; public static final String WRITE_FILES_TRANSFORM_URN = "beam:transform:write_files:v1"; + public static final String GROUP_INTO_BATCHES_WITH_SHARDED_KEY_URN = + "beam:transform:group_into_batches_with_sharded_key:v1"; + public static final String PUBSUB_READ = "beam:transform:pubsub_read:v1"; + public static final String PUBSUB_WRITE = "beam:transform:pubsub_write:v1"; // CombineComponents public static final String COMBINE_PER_KEY_PRECOMBINE_TRANSFORM_URN = @@ -140,6 +144,9 @@ public class PTransformTranslation { public static final String SPLITTABLE_PROCESS_SIZED_ELEMENTS_AND_RESTRICTIONS_URN = "beam:transform:sdf_process_sized_element_and_restrictions:v1"; + // GroupIntoBatchesComponents + public static final String GROUP_INTO_BATCHES_URN = "beam:transform:group_into_batches:v1"; + static { // Primitives checkState(PAR_DO_TRANSFORM_URN.equals(getUrn(StandardPTransforms.Primitives.PAR_DO))); @@ -172,6 +179,8 @@ public class PTransformTranslation { checkState(RESHUFFLE_URN.equals(getUrn(StandardPTransforms.Composites.RESHUFFLE))); checkState( 
WRITE_FILES_TRANSFORM_URN.equals(getUrn(StandardPTransforms.Composites.WRITE_FILES))); + checkState(PUBSUB_READ.equals(getUrn(StandardPTransforms.Composites.PUBSUB_READ))); + checkState(PUBSUB_WRITE.equals(getUrn(StandardPTransforms.Composites.PUBSUB_WRITE))); // CombineComponents checkState( @@ -364,7 +373,8 @@ public RunnerApi.PTransform translate( } if (!RUNNER_IMPLEMENTED_TRANSFORMS.contains(urn)) { - transformBuilder.setEnvironmentId(components.getOnlyEnvironmentId()); + transformBuilder.setEnvironmentId( + components.getEnvironmentIdFor(appliedPTransform.getResourceHints())); } return transformBuilder.build(); } @@ -439,10 +449,12 @@ public RunnerApi.PTransform translate( // reads since they are a Runner translated transform, unless, in the future, we have an // adapter available for splittable DoFn. if (appliedPTransform.getTransform().getClass() == Read.Bounded.class) { - transformBuilder.setEnvironmentId(components.getOnlyEnvironmentId()); + transformBuilder.setEnvironmentId( + components.getEnvironmentIdFor(appliedPTransform.getResourceHints())); } } else { - transformBuilder.setEnvironmentId(components.getOnlyEnvironmentId()); + transformBuilder.setEnvironmentId( + components.getEnvironmentIdFor(appliedPTransform.getResourceHints())); } } } diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ParDoTranslation.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ParDoTranslation.java index b569cf307f93..083754be991c 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ParDoTranslation.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ParDoTranslation.java @@ -77,8 +77,8 @@ import org.apache.beam.sdk.values.PCollectionView; import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.sdk.values.TupleTagList; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Sets; @@ -179,7 +179,7 @@ public RunnerApi.PTransform translate( .setUrn(PAR_DO_TRANSFORM_URN) .setPayload(payload.toByteString()) .build()); - builder.setEnvironmentId(components.getOnlyEnvironmentId()); + builder.setEnvironmentId(components.getEnvironmentIdFor(appliedParDo.getResourceHints())); return builder.build(); } @@ -673,7 +673,10 @@ private static RunnerApi.TimeDomain.Enum translateTimeDomain(TimeDomain timeDoma case PROCESSING_TIME: return RunnerApi.TimeDomain.Enum.PROCESSING_TIME; case SYNCHRONIZED_PROCESSING_TIME: - return RunnerApi.TimeDomain.Enum.SYNCHRONIZED_PROCESSING_TIME; + throw new IllegalArgumentException( + String.format( + "%s is not permitted for user timers", + TimeDomain.SYNCHRONIZED_PROCESSING_TIME.name())); default: throw new IllegalArgumentException("Unknown time domain"); } diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PipelineOptionsTranslation.java 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PipelineOptionsTranslation.java index 56e5d06979de..b31e91c31983 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PipelineOptionsTranslation.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PipelineOptionsTranslation.java @@ -27,9 +27,9 @@ import java.util.Map; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.util.common.ReflectHelpers; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Struct; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.util.JsonFormat; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Struct; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.util.JsonFormat; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.CaseFormat; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PipelineTranslation.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PipelineTranslation.java index 35de48a26aba..f324686154a3 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PipelineTranslation.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PipelineTranslation.java @@ -18,10 +18,11 @@ package org.apache.beam.runners.core.construction; import java.io.IOException; -import java.util.Collection; +import java.util.ArrayList; import java.util.Collections; import java.util.HashMap; import java.util.HashSet; +import java.util.List; import java.util.Map; import java.util.Set; import java.util.stream.Collectors; @@ -55,7 +56,7 @@ public static RunnerApi.Pipeline toProto( final Pipeline pipeline, final SdkComponents components, boolean useDeprecatedViewTransforms) { - final Collection rootIds = new HashSet<>(); + final List rootIds = new ArrayList<>(); pipeline.traverseTopologically( new PipelineVisitor.Defaults() { private final ListMultimap> children = diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ReadTranslation.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ReadTranslation.java index c82287fcab06..0dbf7fe8445a 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ReadTranslation.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ReadTranslation.java @@ -36,8 +36,8 @@ import org.apache.beam.sdk.util.SerializableUtils; import org.apache.beam.sdk.values.PBegin; import org.apache.beam.sdk.values.PCollection; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; /** diff --git 
a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ReplacementOutputs.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ReplacementOutputs.java index aae35ab9ae66..e32edf7a261b 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ReplacementOutputs.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ReplacementOutputs.java @@ -26,7 +26,6 @@ import org.apache.beam.sdk.runners.PTransformOverrideFactory.ReplacementOutput; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.POutput; -import org.apache.beam.sdk.values.PValue; import org.apache.beam.sdk.values.PValues; import org.apache.beam.sdk.values.TaggedPValue; import org.apache.beam.sdk.values.TupleTag; @@ -41,17 +40,16 @@ public class ReplacementOutputs { private ReplacementOutputs() {} public static Map, ReplacementOutput> singleton( - Map, PCollection> original, PValue replacement) { + Map, PCollection> original, POutput replacement) { Entry, PCollection> originalElement = Iterables.getOnlyElement(original.entrySet()); - TupleTag replacementTag = Iterables.getOnlyElement(replacement.expand().entrySet()).getKey(); - PCollection replacementCollection = - (PCollection) Iterables.getOnlyElement(replacement.expand().entrySet()).getValue(); + Entry, PCollection> replacementElement = + Iterables.getOnlyElement(PValues.expandOutput(replacement).entrySet()); return Collections.singletonMap( - replacementCollection, + replacementElement.getValue(), ReplacementOutput.of( TaggedPValue.of(originalElement.getKey(), originalElement.getValue()), - TaggedPValue.of(replacementTag, replacementCollection))); + TaggedPValue.of(replacementElement.getKey(), replacementElement.getValue()))); } public static Map, ReplacementOutput> tagged( diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java index fe4bc0b48406..5fe697e2e1f4 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java @@ -21,7 +21,10 @@ import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; import java.io.IOException; +import java.text.Normalizer; +import java.text.Normalizer.Form; import java.util.Collection; +import java.util.HashMap; import java.util.HashSet; import java.util.List; import java.util.Map; @@ -34,13 +37,17 @@ import org.apache.beam.sdk.options.PortablePipelineOptions; import org.apache.beam.sdk.runners.AppliedPTransform; import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; import org.apache.beam.sdk.util.NameUtils; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.WindowingStrategy; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.BiMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.HashBiMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; 
import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; import org.checkerframework.checker.nullness.qual.Nullable; /** SDK objects that will be represented at some later point within a {@link Components} object. */ @@ -60,6 +67,7 @@ public class SdkComponents { private final Set reservedIds = new HashSet<>(); private String defaultEnvironmentId; + private Map environmentIdsByResourceHints = new HashMap<>(); /** Create a new {@link SdkComponents} with no components. */ public static SdkComponents create() { @@ -191,6 +199,10 @@ private String getApplicationName(AppliedPTransform appliedPTransform) if (name.isEmpty()) { name = "unnamed-ptransform"; } + // Normalize, trim, and uniqify. + int maxNameLength = 100; + name = Normalizer.normalize(name, Form.NFC).replaceAll("[^A-Za-z0-9-_]", "-"); + name = (name.length() > maxNameLength) ? name.substring(0, maxNameLength) : name; name = uniqify(name, transformIds.values()); transformIds.put(appliedPTransform, name); return name; @@ -303,13 +315,38 @@ public String registerEnvironment(Environment env) { return environmentId; } - public String getOnlyEnvironmentId() { + public String getEnvironmentIdFor(ResourceHints resourceHints) { + if (!environmentIdsByResourceHints.containsKey(resourceHints)) { + String baseEnvironmentId = getOnlyEnvironmentId(); + if (resourceHints.hints().size() == 0) { + environmentIdsByResourceHints.put(resourceHints, baseEnvironmentId); + } else { + Environment env = + componentsBuilder + .getEnvironmentsMap() + .get(baseEnvironmentId) + .toBuilder() + .putAllResourceHints( + Maps.transformValues( + resourceHints.hints(), hint -> ByteString.copyFrom(hint.toBytes()))) + .build(); + String name = uniqify(env.getUrn(), environmentIds.values()); + environmentIds.put(env, name); + componentsBuilder.putEnvironments(name, env); + environmentIdsByResourceHints.put(resourceHints, name); + } + } + return environmentIdsByResourceHints.get(resourceHints); + } + + @VisibleForTesting + /*package*/ String getOnlyEnvironmentId() { // TODO Support multiple environments. The environment should be decided by the translation. 
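The SdkComponents hunk above now normalizes and trims transform names before uniquifying them into IDs. A standalone sketch of just that normalization step; the sample name is invented.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

public class TransformNameNormalizationSketch {
  public static void main(String[] args) {
    String name = "Read from text/Split words (2)";
    int maxNameLength = 100;
    // Same steps as the getApplicationName change above: NFC-normalize, replace anything
    // outside [A-Za-z0-9-_] with '-', then truncate to 100 characters before uniquifying.
    name = Normalizer.normalize(name, Form.NFC).replaceAll("[^A-Za-z0-9-_]", "-");
    name = (name.length() > maxNameLength) ? name.substring(0, maxNameLength) : name;
    System.out.println(name); // Read-from-text-Split-words--2-
  }
}
```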
- if (defaultEnvironmentId != null) { - return defaultEnvironmentId; - } else { - return Iterables.getOnlyElement(componentsBuilder.getEnvironmentsMap().keySet()); + if (defaultEnvironmentId == null) { + defaultEnvironmentId = + Iterables.getOnlyElement(componentsBuilder.getEnvironmentsMap().keySet()); } + return defaultEnvironmentId; } public void addRequirement(String urn) { diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDo.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDo.java index 8a2d88e1cec9..c71fc23bfa7a 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDo.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDo.java @@ -506,9 +506,8 @@ private static class PairWithRestrictionFn } @Setup - public void setup() { - invoker = DoFnInvokers.invokerFor(fn); - invoker.invokeSetup(); + public void setup(PipelineOptions options) { + invoker = DoFnInvokers.tryInvokeSetupFor(fn, options); } @ProcessElement @@ -571,9 +570,8 @@ private static class SplitRestrictionFn } @Setup - public void setup() { - invoker = DoFnInvokers.invokerFor(splittableFn); - invoker.invokeSetup(); + public void setup(PipelineOptions options) { + invoker = DoFnInvokers.tryInvokeSetupFor(splittableFn, options); } @ProcessElement @@ -654,7 +652,9 @@ public void tearDown() { *

    TODO(BEAM-10670): Remove the primitive Read and make the splittable DoFn the only option. */ public static void convertReadBasedSplittableDoFnsToPrimitiveReadsIfNecessary(Pipeline pipeline) { - if (ExperimentalOptions.hasExperiment(pipeline.getOptions(), "beam_fn_api_use_deprecated_read") + if (!ExperimentalOptions.hasExperiment(pipeline.getOptions(), "use_sdf_read") + || ExperimentalOptions.hasExperiment( + pipeline.getOptions(), "beam_fn_api_use_deprecated_read") || ExperimentalOptions.hasExperiment(pipeline.getOptions(), "use_deprecated_read")) { convertReadBasedSplittableDoFnsToPrimitiveReads(pipeline); } diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDoNaiveBounded.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDoNaiveBounded.java index e53c14c0d0f6..4949491ee9f3 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDoNaiveBounded.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDoNaiveBounded.java @@ -134,9 +134,20 @@ static class NaiveProcessFn() { + @Override + public PipelineOptions pipelineOptions() { + return options; + } + + @Override + public String getErrorContext() { + return "SplittableParDoNaiveBounded/Setup"; + } + }); } @StartBundle @@ -574,7 +585,6 @@ public WatermarkEstimator watermarkEstimator() { } // ----------- Unsupported methods -------------------- - @Override public DoFn.StartBundleContext startBundleContext( DoFn doFn) { diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/TestStreamTranslation.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/TestStreamTranslation.java index 7f2c37f89643..7060ca09e3c8 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/TestStreamTranslation.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/TestStreamTranslation.java @@ -36,7 +36,7 @@ import org.apache.beam.sdk.values.PBegin; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.TimestampedValue; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.joda.time.Duration; import org.joda.time.Instant; @@ -49,13 +49,18 @@ }) public class TestStreamTranslation { - public static TestStream testStreamFromProtoPayload( + public static TestStream testStreamFromProtoPayload( RunnerApi.TestStreamPayload testStreamPayload, RehydratedComponents components) throws IOException { - Coder coder = (Coder) components.getCoder(testStreamPayload.getCoderId()); + Coder coder = (Coder) components.getCoder(testStreamPayload.getCoderId()); - List> events = new ArrayList<>(); + return testStreamFromProtoPayload(testStreamPayload, coder); + } + + public static TestStream testStreamFromProtoPayload( + RunnerApi.TestStreamPayload testStreamPayload, Coder coder) throws IOException { + List> events = new ArrayList<>(); for (RunnerApi.TestStreamPayload.Event event : testStreamPayload.getEventsList()) { events.add(eventFromProto(event, coder)); diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/TriggerTranslation.java 
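The SplittableParDo hunk above changes when read-based splittable DoFns are converted back to primitive reads. A hedged sketch of the new condition using `ExperimentalOptions.hasExperiment`; note that with no experiments set at all the conversion still happens, since `use_sdf_read` has to be opted into explicitly.

```java
import org.apache.beam.sdk.options.ExperimentalOptions;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class ReadTranslationFlagsSketch {
  public static void main(String[] args) {
    PipelineOptions options =
        PipelineOptionsFactory.fromArgs("--experiments=use_deprecated_read").create();
    // Same condition as the hunk above: splittable-DoFn reads are rewritten back to
    // primitive reads unless use_sdf_read is opted into, or whenever either
    // deprecated-read experiment is present.
    boolean convertToPrimitiveRead =
        !ExperimentalOptions.hasExperiment(options, "use_sdf_read")
            || ExperimentalOptions.hasExperiment(options, "beam_fn_api_use_deprecated_read")
            || ExperimentalOptions.hasExperiment(options, "use_deprecated_read");
    System.out.println(convertToPrimitiveRead); // true
  }
}
```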
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/TriggerTranslation.java index ba3fd199f6ee..b49addad2593 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/TriggerTranslation.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/TriggerTranslation.java @@ -131,10 +131,9 @@ private RunnerApi.TimeDomain.Enum convertTimeDomain(TimeDomain timeDomain) { return RunnerApi.TimeDomain.Enum.EVENT_TIME; case PROCESSING_TIME: return RunnerApi.TimeDomain.Enum.PROCESSING_TIME; - case SYNCHRONIZED_PROCESSING_TIME: - return RunnerApi.TimeDomain.Enum.SYNCHRONIZED_PROCESSING_TIME; default: - throw new IllegalArgumentException(String.format("Unknown time domain: %s", timeDomain)); + throw new IllegalArgumentException( + String.format("Unknown or unsupported time domain: %s", timeDomain)); } } diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/WindowIntoTranslation.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/WindowIntoTranslation.java index 9242f3822fcf..effa04392106 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/WindowIntoTranslation.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/WindowIntoTranslation.java @@ -32,7 +32,7 @@ import org.apache.beam.sdk.transforms.windowing.Window; import org.apache.beam.sdk.transforms.windowing.Window.Assign; import org.apache.beam.sdk.transforms.windowing.WindowFn; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; import org.checkerframework.checker.nullness.qual.Nullable; /** diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/WindowingStrategyTranslation.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/WindowingStrategyTranslation.java index 0125a8d0387d..6cede0296399 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/WindowingStrategyTranslation.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/WindowingStrategyTranslation.java @@ -24,11 +24,13 @@ import org.apache.beam.model.pipeline.v1.RunnerApi; import org.apache.beam.model.pipeline.v1.RunnerApi.Components; import org.apache.beam.model.pipeline.v1.RunnerApi.FunctionSpec; +import org.apache.beam.model.pipeline.v1.RunnerApi.MergeStatus; import org.apache.beam.model.pipeline.v1.RunnerApi.OutputTime; import org.apache.beam.model.pipeline.v1.StandardWindowFns.FixedWindowsPayload; import org.apache.beam.model.pipeline.v1.StandardWindowFns.GlobalWindowsPayload; import org.apache.beam.model.pipeline.v1.StandardWindowFns.SessionWindowsPayload; import org.apache.beam.model.pipeline.v1.StandardWindowFns.SlidingWindowsPayload; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; import org.apache.beam.sdk.transforms.windowing.FixedWindows; import org.apache.beam.sdk.transforms.windowing.GlobalWindows; import org.apache.beam.sdk.transforms.windowing.Sessions; @@ -41,10 +43,10 @@ import org.apache.beam.sdk.util.SerializableUtils; import org.apache.beam.sdk.values.WindowingStrategy; import org.apache.beam.sdk.values.WindowingStrategy.AccumulationMode; -import 
org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.util.Durations; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.util.Timestamps; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.util.Durations; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.util.Timestamps; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Strings; import org.joda.time.Duration; @@ -281,10 +283,12 @@ public static RunnerApi.MessageWithComponents toMessageProto( */ public static RunnerApi.WindowingStrategy toProto( WindowingStrategy windowingStrategy, SdkComponents components) throws IOException { - FunctionSpec windowFnSpec = toProto(windowingStrategy.getWindowFn(), components); + WindowFn windowFn = windowingStrategy.getWindowFn(); + + FunctionSpec windowFnSpec = toProto(windowFn, components); String environmentId = Strings.isNullOrEmpty(windowingStrategy.getEnvironmentId()) - ? components.getOnlyEnvironmentId() + ? components.getEnvironmentIdFor(ResourceHints.create()) : windowingStrategy.getEnvironmentId(); RunnerApi.WindowingStrategy.Builder windowingStrategyProto = @@ -295,10 +299,15 @@ public static RunnerApi.WindowingStrategy toProto( .setAllowedLateness(windowingStrategy.getAllowedLateness().getMillis()) .setTrigger(TriggerTranslation.toProto(windowingStrategy.getTrigger())) .setWindowFn(windowFnSpec) - .setAssignsToOneWindow(windowingStrategy.getWindowFn().assignsToOneWindow()) + .setAssignsToOneWindow(windowFn.assignsToOneWindow()) + .setMergeStatus( + windowFn.isNonMerging() + ? MergeStatus.Enum.NON_MERGING + : (windowingStrategy.isAlreadyMerged() + ? 
MergeStatus.Enum.ALREADY_MERGED + : MergeStatus.Enum.NEEDS_MERGE)) .setOnTimeBehavior(toProto(windowingStrategy.getOnTimeBehavior())) - .setWindowCoderId( - components.registerCoder(windowingStrategy.getWindowFn().windowCoder())) + .setWindowCoderId(components.registerCoder(windowFn.windowCoder())) .setEnvironmentId(environmentId); return windowingStrategyProto.build(); diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/WriteFilesTranslation.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/WriteFilesTranslation.java index e0f9a4adc06d..cab628c7ce96 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/WriteFilesTranslation.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/WriteFilesTranslation.java @@ -45,7 +45,7 @@ import org.apache.beam.sdk.values.POutput; import org.apache.beam.sdk.values.PValue; import org.apache.beam.sdk.values.TupleTag; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java index 82cecd3a20fd..acecc4c4eeac 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java @@ -31,7 +31,7 @@ import org.apache.beam.runners.core.construction.graph.PipelineNode.PCollectionNode; import org.apache.beam.runners.core.construction.graph.PipelineNode.PTransformNode; import org.apache.beam.sdk.transforms.Flatten; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; import org.slf4j.Logger; diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/QueryablePipeline.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/QueryablePipeline.java index e0cb973b503b..2f2449f0cf2b 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/QueryablePipeline.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/QueryablePipeline.java @@ -61,7 +61,7 @@ import org.apache.beam.runners.core.construction.PTransformTranslation; import org.apache.beam.runners.core.construction.graph.PipelineNode.PCollectionNode; import org.apache.beam.runners.core.construction.graph.PipelineNode.PTransformNode; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; +import 
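The WindowingStrategyTranslation hunk above now records a `MergeStatus` on the proto. A small sketch of the same decision applied to two stock window fns; the `alreadyMerged` flag stands in for `WindowingStrategy.isAlreadyMerged()`.

```java
import org.apache.beam.model.pipeline.v1.RunnerApi.MergeStatus;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Sessions;
import org.apache.beam.sdk.transforms.windowing.WindowFn;
import org.joda.time.Duration;

public class MergeStatusSketch {
  static MergeStatus.Enum mergeStatusFor(WindowFn<?, ?> windowFn, boolean alreadyMerged) {
    // Same ternary the hunk above writes into the WindowingStrategy proto.
    return windowFn.isNonMerging()
        ? MergeStatus.Enum.NON_MERGING
        : (alreadyMerged ? MergeStatus.Enum.ALREADY_MERGED : MergeStatus.Enum.NEEDS_MERGE);
  }

  public static void main(String[] args) {
    Duration oneMinute = Duration.standardMinutes(1);
    System.out.println(mergeStatusFor(FixedWindows.of(oneMinute), false)); // NON_MERGING
    System.out.println(mergeStatusFor(Sessions.withGapDuration(oneMinute), false)); // NEEDS_MERGE
  }
}
```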
org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/PipelineResources.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/PipelineResources.java index dda68f10d81a..37ce403f909e 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/PipelineResources.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/PipelineResources.java @@ -20,27 +20,30 @@ import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; import java.io.File; -import java.io.FileOutputStream; import java.io.IOException; import java.io.OutputStream; import java.util.List; import java.util.function.Predicate; import java.util.stream.Collectors; +import org.apache.beam.sdk.options.FileStagingOptions; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.util.ZipFiles; +import org.apache.beam.sdk.util.common.ReflectHelpers; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Strings; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.hash.Funnels; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.hash.Hasher; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.hash.Hashing; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; /** Utilities for working with classpath resources for pipelines. */ public class PipelineResources { + private static final Logger LOG = LoggerFactory.getLogger(PipelineResources.class); /** - * Uses algorithm provided via {@link - * org.apache.beam.runners.core.construction.resources.PipelineResourcesOptions} to detect - * classpath resources. + * Uses algorithm provided via {@link PipelineResourcesOptions} to detect classpath resources. * * @param classLoader The URLClassLoader to use to detect resources to stage (optional). * @param options pipeline options @@ -69,9 +72,36 @@ private static Predicate isStageable() { * directories and packages existing ones. This is necessary for runners that require filesToStage * to be jars only. * + *

    This method mutates the filesToStage value of the given options. + * + * @param options options object with the files to stage and temp location for staging + */ + public static void prepareFilesForStaging(FileStagingOptions options) { + List filesToStage = options.getFilesToStage(); + if (filesToStage == null || filesToStage.isEmpty()) { + filesToStage = detectClassPathResourcesToStage(ReflectHelpers.findClassLoader(), options); + LOG.info( + "PipelineOptions.filesToStage was not specified. " + + "Defaulting to files from the classpath: will stage {} files. " + + "Enable logging at DEBUG level to see which files will be staged.", + filesToStage.size()); + LOG.debug("Classpath elements: {}", filesToStage); + } + final String tmpJarLocation = + MoreObjects.firstNonNull(options.getTempLocation(), System.getProperty("java.io.tmpdir")); + final List resourcesToStage = prepareFilesForStaging(filesToStage, tmpJarLocation); + options.setFilesToStage(resourcesToStage); + } + + /** + * Goes through the list of files that need to be staged on runner. Fails if the directory does + * not exist and packages the ones that exist. This is necessary for runners that require + * filesToStage to be jars only. + * * @param resourcesToStage list of resources that need to be staged * @param tmpJarLocation temporary directory to store the jars * @return A list of absolute paths to resources (jar files) + * @throws IllegalStateException if the directory to be staged does not exist */ public static List prepareFilesForStaging( List resourcesToStage, String tmpJarLocation) { @@ -91,7 +121,11 @@ public static List prepareFilesForStaging( private static String packageDirectoriesToStage(File directoryToStage, String tmpJarLocation) { String hash = calculateDirectoryContentHash(directoryToStage); String pathForJar = getUniqueJarPath(hash, tmpJarLocation); - zipDirectory(directoryToStage, pathForJar); + try { + ZipFiles.zipDirectoryOverwrite(directoryToStage, new File(pathForJar)); + } catch (IOException e) { + throw new RuntimeException(e); + } return pathForJar; } @@ -112,12 +146,4 @@ private static String getUniqueJarPath(String contentHash, String tmpJarLocation return String.format("%s%s%s.jar", tmpJarLocation, File.separator, contentHash); } - - private static void zipDirectory(File directoryToStage, String uniqueDirectoryPath) { - try { - ZipFiles.zipDirectory(directoryToStage, new FileOutputStream(uniqueDirectoryPath)); - } catch (IOException e) { - throw new RuntimeException(e); - } - } } diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CoderTranslationTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CoderTranslationTest.java index c73e67be35fb..11543f3b7a07 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CoderTranslationTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CoderTranslationTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.runners.core.construction; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.hasItems; import static org.hamcrest.Matchers.not; -import static org.junit.Assert.assertThat; import java.io.IOException; import java.io.InputStream; @@ -45,6 +45,7 @@ import org.apache.beam.sdk.coders.RowCoder; import org.apache.beam.sdk.coders.SerializableCoder; import 
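The PipelineResources change above adds an options-based `prepareFilesForStaging` overload. A sketch of how a runner might call it, assuming `FileStagingOptions` is a `PipelineOptions` sub-interface (only `getFilesToStage`, `setFilesToStage`, and `getTempLocation` appear in this diff; everything else here is an assumption).

```java
import org.apache.beam.runners.core.construction.resources.PipelineResources;
import org.apache.beam.sdk.options.FileStagingOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class StagingSketch {
  public static void main(String[] args) {
    // Assumption: FileStagingOptions extends PipelineOptions, so it can be built this way.
    FileStagingOptions options = PipelineOptionsFactory.as(FileStagingOptions.class);
    // Leaving filesToStage unset exercises the new default: detect classpath resources,
    // zip any directories into content-hashed jars under tempLocation (or java.io.tmpdir),
    // and write the resulting list back into the options.
    PipelineResources.prepareFilesForStaging(options);
    System.out.println(options.getFilesToStage());
  }
}
```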
org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.coders.TimestampPrefixingWindowCoder; import org.apache.beam.sdk.coders.VarLongCoder; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.schemas.Schema.Field; @@ -68,7 +69,6 @@ /** Tests for {@link CoderTranslation}. */ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class CoderTranslationTest { private static final Set> KNOWN_CODERS = @@ -96,6 +96,7 @@ public class CoderTranslationTest { Field.of("map", FieldType.map(FieldType.STRING, FieldType.INT32)), Field.of("bar", FieldType.logicalType(FixedBytes.of(123)))))) .add(ShardedKey.Coder.of(StringUtf8Coder.of())) + .add(TimestampPrefixingWindowCoder.of(IntervalWindowCoder.of())) .build(); /** diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CommonCoderTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CommonCoderTest.java index 4fe3f8604b00..d7d366546b0f 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CommonCoderTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CommonCoderTest.java @@ -21,12 +21,12 @@ import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects.firstNonNull; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList.toImmutableList; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.hasItem; import static org.hamcrest.Matchers.instanceOf; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import com.fasterxml.jackson.annotation.JsonCreator; @@ -59,6 +59,7 @@ import org.apache.beam.sdk.coders.KvCoder; import org.apache.beam.sdk.coders.RowCoder; import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.coders.TimestampPrefixingWindowCoder; import org.apache.beam.sdk.coders.VarLongCoder; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.schemas.SchemaTranslation; @@ -72,7 +73,7 @@ import org.apache.beam.sdk.util.WindowedValue; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Splitter; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; @@ -91,7 +92,6 @@ @RunWith(Parameterized.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class CommonCoderTest { private static final String STANDARD_CODERS_YAML_PATH = @@ -117,6 +117,7 @@ public class CommonCoderTest { WindowedValue.ParamWindowedValueCoder.class) .put(getUrn(StandardCoders.Enum.ROW), RowCoder.class) 
.put(getUrn(StandardCoders.Enum.SHARDED_KEY), ShardedKey.Coder.class) + .put(getUrn(StandardCoders.Enum.CUSTOM_WINDOW), TimestampPrefixingWindowCoder.class) .build(); @AutoValue @@ -346,6 +347,10 @@ private static Object convertValue(Object value, CommonCoder coderSpec, Coder co byte[] shardId = ((String) kvMap.get("shardId")).getBytes(StandardCharsets.ISO_8859_1); return ShardedKey.of( convertValue(kvMap.get("key"), coderSpec.getComponents().get(0), keyCoder), shardId); + } else if (s.equals(getUrn(StandardCoders.Enum.CUSTOM_WINDOW))) { + Map kvMap = (Map) value; + Coder windowCoder = ((TimestampPrefixingWindowCoder) coder).getWindowCoder(); + return convertValue(kvMap.get("window"), coderSpec.getComponents().get(0), windowCoder); } else { throw new IllegalStateException("Unknown coder URN: " + coderSpec.getUrn()); } @@ -503,6 +508,8 @@ private void verifyDecodedValue(CommonCoder coder, Object expectedValue, Object assertEquals(expectedValue, actualValue); } else if (s.equals(getUrn(StandardCoders.Enum.SHARDED_KEY))) { assertEquals(expectedValue, actualValue); + } else if (s.equals(getUrn(StandardCoders.Enum.CUSTOM_WINDOW))) { + assertEquals(expectedValue, actualValue); } else { throw new IllegalStateException("Unknown coder URN: " + coder.getUrn()); } diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CreatePCollectionViewTranslationTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CreatePCollectionViewTranslationTest.java index 35d51eea3c35..fccf1bd3af30 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CreatePCollectionViewTranslationTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CreatePCollectionViewTranslationTest.java @@ -17,7 +17,7 @@ */ package org.apache.beam.runners.core.construction; -import static org.junit.Assert.assertThat; +import static org.hamcrest.MatcherAssert.assertThat; import org.apache.beam.model.pipeline.v1.RunnerApi.FunctionSpec; import org.apache.beam.sdk.coders.BigEndianLongCoder; @@ -28,6 +28,7 @@ import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.transforms.Create; import org.apache.beam.sdk.transforms.View.CreatePCollectionView; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; import org.apache.beam.sdk.util.SerializableUtils; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollection; @@ -106,6 +107,7 @@ public void testEncodedProto() throws Exception { PValues.expandInput(testPCollection), PValues.expandOutput(createViewTransform.getView()), createViewTransform, + ResourceHints.create(), p); FunctionSpec payload = PTransformTranslation.toProto(appliedPTransform, components).getSpec(); @@ -131,6 +133,7 @@ public void testExtractionDirectFromTransform() throws Exception { PValues.expandInput(testPCollection), PValues.expandOutput(createViewTransform.getView()), createViewTransform, + ResourceHints.create(), p); CreatePCollectionViewTranslation.getView((AppliedPTransform) appliedPTransform); diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/DeduplicatedFlattenFactoryTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/DeduplicatedFlattenFactoryTest.java index ff01bff980e7..7c31b992a3d1 100644 --- 
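The CUSTOM_WINDOW coder URN wired up above is backed by `TimestampPrefixingWindowCoder`. A sketch of encoding an `IntervalWindow` with it; as I read the change, the encoding carries the window's max timestamp ahead of the wrapped coder's bytes so a runner can read the timestamp without understanding the window type, but that framing is my interpretation rather than something stated in this diff.

```java
import java.io.ByteArrayOutputStream;
import org.apache.beam.sdk.coders.TimestampPrefixingWindowCoder;
import org.apache.beam.sdk.transforms.windowing.IntervalWindow;
import org.apache.beam.sdk.transforms.windowing.IntervalWindow.IntervalWindowCoder;
import org.joda.time.Instant;

public class CustomWindowCoderSketch {
  public static void main(String[] args) throws Exception {
    TimestampPrefixingWindowCoder<IntervalWindow> coder =
        TimestampPrefixingWindowCoder.of(IntervalWindowCoder.of());
    IntervalWindow window = new IntervalWindow(new Instant(0L), new Instant(60_000L));
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    // Encodes the window with a timestamp prefix, matching the new CUSTOM_WINDOW model coder.
    coder.encode(window, out);
    System.out.println(out.toByteArray().length);
  }
}
```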
a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/DeduplicatedFlattenFactoryTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/DeduplicatedFlattenFactoryTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.core.construction; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.not; -import static org.junit.Assert.assertThat; import org.apache.beam.runners.core.construction.DeduplicatedFlattenFactory.FlattenWithoutDuplicateInputs; import org.apache.beam.sdk.Pipeline.PipelineVisitor.Defaults; diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/DefaultArtifactResolverTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/DefaultArtifactResolverTest.java index e99749c08071..d9cbcba05842 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/DefaultArtifactResolverTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/DefaultArtifactResolverTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.core.construction; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.startsWith; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import java.util.Optional; @@ -30,9 +30,6 @@ import org.junit.runners.JUnit4; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DefaultArtifactResolverTest { private RunnerApi.Pipeline createEmptyPipeline( Iterable dependencies) { diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/DisplayDataTranslationTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/DisplayDataTranslationTest.java index 02805f6bec3a..9fe9b4b80128 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/DisplayDataTranslationTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/DisplayDataTranslationTest.java @@ -46,20 +46,20 @@ public void populateDisplayData(DisplayData.Builder builder) { })), containsInAnyOrder( RunnerApi.DisplayData.newBuilder() - .setUrn(DisplayDataTranslation.LABELLED_STRING) + .setUrn(DisplayDataTranslation.LABELLED) .setPayload( - RunnerApi.LabelledStringPayload.newBuilder() + RunnerApi.LabelledPayload.newBuilder() .setLabel("foo") - .setValue("value") + .setStringValue("value") .build() .toByteString()) .build(), RunnerApi.DisplayData.newBuilder() - .setUrn(DisplayDataTranslation.LABELLED_STRING) + .setUrn(DisplayDataTranslation.LABELLED) .setPayload( - RunnerApi.LabelledStringPayload.newBuilder() + RunnerApi.LabelledPayload.newBuilder() .setLabel("label") - .setValue("value2") + .setStringValue("value2") .build() .toByteString()) .build())); diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/EmptyFlattenAsCreateFactoryTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/EmptyFlattenAsCreateFactoryTest.java index 309f9a352a9a..bc504afa63eb 100644 --- 
a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/EmptyFlattenAsCreateFactoryTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/EmptyFlattenAsCreateFactoryTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.core.construction; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.emptyIterable; -import static org.junit.Assert.assertThat; import java.util.Collections; import java.util.Map; @@ -30,6 +30,7 @@ import org.apache.beam.sdk.testing.PAssert; import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.transforms.Flatten; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionList; import org.apache.beam.sdk.values.PValues; @@ -59,6 +60,7 @@ public void getInputEmptySucceeds() { Collections.emptyMap(), Collections.emptyMap(), Flatten.pCollections(), + ResourceHints.create(), pipeline)); assertThat(replacement.getInput().getAll(), emptyIterable()); } @@ -77,6 +79,7 @@ public void getInputNonEmptyThrows() { PValues.expandInput(nonEmpty), Collections.emptyMap(), Flatten.pCollections(), + ResourceHints.create(), pipeline)); } @@ -109,6 +112,7 @@ public void testOverride() { Collections.emptyMap(), Collections.emptyMap(), Flatten.pCollections(), + ResourceHints.create(), pipeline)) .getTransform()); PAssert.that(emptyFlattened).empty(); diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/EnvironmentsTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/EnvironmentsTest.java index 3e6960ec152f..d980d4a01639 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/EnvironmentsTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/EnvironmentsTest.java @@ -17,15 +17,16 @@ */ package org.apache.beam.runners.core.construction; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.hasItem; import static org.hamcrest.Matchers.is; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import java.io.IOException; import java.io.Serializable; import java.util.Optional; +import org.apache.beam.model.pipeline.v1.Endpoints; import org.apache.beam.model.pipeline.v1.RunnerApi; import org.apache.beam.model.pipeline.v1.RunnerApi.DockerPayload; import org.apache.beam.model.pipeline.v1.RunnerApi.Environment; @@ -47,15 +48,20 @@ import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.sdk.values.TupleTagList; import org.apache.beam.sdk.values.WindowingStrategy; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.junit.Rule; import org.junit.Test; +import org.junit.rules.ExpectedException; import org.junit.runner.RunWith; import org.junit.runners.JUnit4; /** Tests for {@link Environments}. 
*/ @RunWith(JUnit4.class) public class EnvironmentsTest implements Serializable { + @Rule public transient ExpectedException thrown = ExpectedException.none(); + @Test - public void createEnvironments() throws IOException { + public void createEnvironmentDockerFromEnvironmentConfig() throws IOException { PortablePipelineOptions options = PipelineOptionsFactory.as(PortablePipelineOptions.class); options.setDefaultEnvironmentType(Environments.ENVIRONMENT_DOCKER); options.setDefaultEnvironmentConfig("java"); @@ -68,6 +74,27 @@ public void createEnvironments() throws IOException { DockerPayload.newBuilder().setContainerImage("java").build().toByteString()) .addAllCapabilities(Environments.getJavaCapabilities()) .build())); + } + + @Test + public void createEnvironmentDockerFromEnvironmentOptions() { + PortablePipelineOptions options = PipelineOptionsFactory.as(PortablePipelineOptions.class); + options.setDefaultEnvironmentType(Environments.ENVIRONMENT_DOCKER); + options.setEnvironmentOptions(ImmutableList.of("docker_container_image=java")); + assertThat( + Environments.createOrGetDefaultEnvironment(options), + is( + Environment.newBuilder() + .setUrn(BeamUrns.getUrn(StandardEnvironments.Environments.DOCKER)) + .setPayload( + DockerPayload.newBuilder().setContainerImage("java").build().toByteString()) + .addAllCapabilities(Environments.getJavaCapabilities()) + .build())); + } + + @Test + public void createEnvironmentProcessFromEnvironmentConfig() throws IOException { + PortablePipelineOptions options = PipelineOptionsFactory.as(PortablePipelineOptions.class); options.setDefaultEnvironmentType(Environments.ENVIRONMENT_PROCESS); options.setDefaultEnvironmentConfig( "{\"os\": \"linux\", \"arch\": \"amd64\", \"command\": \"run.sh\", \"env\":{\"k1\": \"v1\", \"k2\": \"v2\"} }"); @@ -99,6 +126,61 @@ public void createEnvironments() throws IOException { .build())); } + @Test + public void createEnvironmentProcessFromEnvironmentOptions() { + PortablePipelineOptions options = PipelineOptionsFactory.as(PortablePipelineOptions.class); + options.setDefaultEnvironmentType(Environments.ENVIRONMENT_PROCESS); + options.setEnvironmentOptions( + ImmutableList.of("process_command=run.sh", "process_variables=k1=v1,k2=v2")); + assertThat( + Environments.createOrGetDefaultEnvironment(options), + is( + Environment.newBuilder() + .setUrn(BeamUrns.getUrn(StandardEnvironments.Environments.PROCESS)) + .setPayload( + ProcessPayload.newBuilder() + .setCommand("run.sh") + .putEnv("k1", "v1") + .putEnv("k2", "v2") + .build() + .toByteString()) + .addAllCapabilities(Environments.getJavaCapabilities()) + .build())); + } + + @Test + public void createEnvironmentExternalFromEnvironmentOptions() { + PortablePipelineOptions options = PipelineOptionsFactory.as(PortablePipelineOptions.class); + options.setDefaultEnvironmentType(Environments.ENVIRONMENT_EXTERNAL); + options.setEnvironmentOptions(ImmutableList.of("external_service_address=foo")); + assertThat( + Environments.createOrGetDefaultEnvironment(options), + is( + Environment.newBuilder() + .setUrn(BeamUrns.getUrn(StandardEnvironments.Environments.EXTERNAL)) + .setPayload( + RunnerApi.ExternalPayload.newBuilder() + .setEndpoint( + Endpoints.ApiServiceDescriptor.newBuilder().setUrl("foo").build()) + .build() + .toByteString()) + .addAllCapabilities(Environments.getJavaCapabilities()) + .build())); + } + + @Test + public void environmentConfigAndEnvironmentOptionsAreMutuallyExclusive() { + PortablePipelineOptions options = 
PipelineOptionsFactory.as(PortablePipelineOptions.class); + options.setDefaultEnvironmentType(Environments.ENVIRONMENT_DOCKER); + options.setDefaultEnvironmentConfig("foo"); + options.setEnvironmentOptions(ImmutableList.of("bar")); + + thrown.expect(IllegalArgumentException.class); + thrown.expectMessage( + "Pipeline options defaultEnvironmentConfig and environmentOptions are mutually exclusive."); + Environments.createOrGetDefaultEnvironment(options); + } + @Test public void testCapabilities() { assertThat(Environments.getJavaCapabilities(), hasItem(ModelCoders.LENGTH_PREFIX_CODER_URN)); @@ -112,6 +194,9 @@ public void testCapabilities() { BeamUrns.getUrn( RunnerApi.StandardPTransforms.SplittableParDoComponents .TRUNCATE_SIZED_RESTRICTION))); + assertThat( + Environments.getJavaCapabilities(), + hasItem(BeamUrns.getUrn(RunnerApi.StandardPTransforms.Primitives.TO_STRING))); } @Test @@ -169,14 +254,14 @@ public void process(ProcessContext ctxt) {} @Test public void testJavaVersion() { - assertEquals(JavaVersion.v8, JavaVersion.forSpecification("1.8")); - assertEquals("java8", JavaVersion.v8.toString()); - assertEquals(JavaVersion.v11, JavaVersion.forSpecification("11")); - assertEquals("java11", JavaVersion.v11.toString()); + assertEquals(JavaVersion.java8, JavaVersion.forSpecification("1.8")); + assertEquals("java", JavaVersion.java8.legacyName()); + assertEquals(JavaVersion.java11, JavaVersion.forSpecification("11")); + assertEquals("java11", JavaVersion.java11.legacyName()); } @Test(expected = UnsupportedOperationException.class) public void testJavaVersionInvalid() { - assertEquals(JavaVersion.v8, JavaVersion.forSpecification("invalid")); + assertEquals(JavaVersion.java8, JavaVersion.forSpecification("invalid")); } } diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ExecutableStageTranslationTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ExecutableStageTranslationTest.java index ed1286ce6d35..38d3a16d50ea 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ExecutableStageTranslationTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ExecutableStageTranslationTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.core.construction; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.core.Is.is; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import java.io.Serializable; import java.util.Arrays; diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ForwardingPTransformTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ForwardingPTransformTest.java index 401d035db70f..1ecf04fe4124 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ForwardingPTransformTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ForwardingPTransformTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.core.construction; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import static org.mockito.Matchers.any; import static org.mockito.Mockito.doThrow; import static org.mockito.Mockito.mock; @@ -47,9 +47,6 @@ /** Tests for {@link ForwardingPTransform}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ForwardingPTransformTest { @Rule public ExpectedException thrown = ExpectedException.none(); diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/GroupByKeyTranslationTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/GroupByKeyTranslationTest.java index a9df42cea7c6..aae4c850f6d4 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/GroupByKeyTranslationTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/GroupByKeyTranslationTest.java @@ -18,8 +18,8 @@ package org.apache.beam.runners.core.construction; import static org.apache.beam.runners.core.construction.PTransformTranslation.GROUP_BY_KEY_TRANSFORM_URN; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import org.apache.beam.sdk.transforms.GroupByKey; import org.junit.Test; diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/GroupIntoBatchesTranslationTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/GroupIntoBatchesTranslationTest.java new file mode 100644 index 000000000000..3ecea358da44 --- /dev/null +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/GroupIntoBatchesTranslationTest.java @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.core.construction; + +import static org.hamcrest.MatcherAssert.assertThat; +import static org.hamcrest.Matchers.equalTo; + +import org.apache.beam.model.pipeline.v1.RunnerApi; +import org.apache.beam.model.pipeline.v1.RunnerApi.GroupIntoBatchesPayload; +import org.apache.beam.sdk.runners.AppliedPTransform; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.GroupIntoBatches; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; +import org.apache.beam.sdk.util.ShardedKey; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PValues; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.joda.time.Duration; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.Parameterized; +import org.junit.runners.Parameterized.Parameter; +import org.junit.runners.Parameterized.Parameters; + +/** Tests for {@link GroupIntoBatchesTranslation}. */ +@RunWith(Parameterized.class) +public class GroupIntoBatchesTranslationTest { + @Parameters(name = "{index}: {0}") + public static Iterable> transform() { + return ImmutableList.of( + GroupIntoBatches.ofSize(5), + GroupIntoBatches.ofSize(5).withMaxBufferingDuration(Duration.ZERO), + GroupIntoBatches.ofSize(5).withMaxBufferingDuration(Duration.standardSeconds(10)), + GroupIntoBatches.ofByteSize(10).withMaxBufferingDuration(Duration.standardSeconds(10))); + } + + @Parameter(0) + public GroupIntoBatches groupIntoBatches; + + public static TestPipeline p = TestPipeline.create().enableAbandonedNodeEnforcement(false); + + @Test + public void testToProto() throws Exception { + PCollection> input = + p.apply(Create.of(KV.of("a", 1), KV.of("b", 2), KV.of("a", 2))); + PCollection>> output = input.apply(groupIntoBatches); + + AppliedPTransform> appliedTransform = + AppliedPTransform.of( + "foo", + PValues.expandInput(input), + PValues.expandOutput(output), + groupIntoBatches, + ResourceHints.create(), + p); + + SdkComponents components = SdkComponents.create(); + components.registerEnvironment(Environments.createDockerEnvironment("java")); + RunnerApi.FunctionSpec spec = + PTransformTranslation.toProto(appliedTransform, components).getSpec(); + + assertThat(spec.getUrn(), equalTo(PTransformTranslation.GROUP_INTO_BATCHES_URN)); + verifyPayload( + groupIntoBatches.getBatchingParams(), GroupIntoBatchesPayload.parseFrom(spec.getPayload())); + } + + @Test + public void testWithShardedKeyToProto() throws Exception { + PCollection> input = + p.apply(Create.of(KV.of("a", 1), KV.of("b", 2), KV.of("a", 2))); + GroupIntoBatches.WithShardedKey transform = groupIntoBatches.withShardedKey(); + PCollection, Iterable>> output = input.apply(transform); + + AppliedPTransform.WithShardedKey> appliedTransform = + AppliedPTransform.of( + "bar", + PValues.expandInput(input), + PValues.expandOutput(output), + transform, + ResourceHints.create(), + p); + + SdkComponents components = SdkComponents.create(); + components.registerEnvironment(Environments.createDockerEnvironment("java")); + RunnerApi.FunctionSpec spec = + PTransformTranslation.toProto(appliedTransform, components).getSpec(); + + assertThat( + spec.getUrn(), equalTo(PTransformTranslation.GROUP_INTO_BATCHES_WITH_SHARDED_KEY_URN)); + verifyPayload( + transform.getBatchingParams(), GroupIntoBatchesPayload.parseFrom(spec.getPayload())); + } + + private void 
verifyPayload( + GroupIntoBatches.BatchingParams params, RunnerApi.GroupIntoBatchesPayload payload) { + assertThat(payload.getBatchSize(), equalTo(params.getBatchSize())); + assertThat(payload.getBatchSizeBytes(), equalTo(params.getBatchSizeBytes())); + assertThat( + payload.getMaxBufferingDurationMillis(), + equalTo(params.getMaxBufferingDuration().getStandardSeconds() * 1000)); + } +} diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ModelCodersTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ModelCodersTest.java index 74912fc400cc..da60661af8e8 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ModelCodersTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ModelCodersTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.core.construction; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.io.IOException; import org.apache.beam.model.pipeline.v1.RunnerApi.Coder; diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/MorePipelineTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/MorePipelineTest.java index 37a13604af82..267b4d580dae 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/MorePipelineTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/MorePipelineTest.java @@ -18,21 +18,25 @@ package org.apache.beam.runners.core.construction; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.anyOf; import static org.hamcrest.Matchers.not; -import static org.junit.Assert.assertThat; import java.util.List; import java.util.concurrent.atomic.AtomicReference; import org.apache.beam.sdk.Pipeline.PipelineVisitor; +import org.apache.beam.sdk.coders.BigEndianLongCoder; import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.KvCoder; import org.apache.beam.sdk.io.GenerateSequence; +import org.apache.beam.sdk.io.range.OffsetRange; import org.apache.beam.sdk.runners.AppliedPTransform; import org.apache.beam.sdk.runners.PTransformOverride; import org.apache.beam.sdk.runners.TransformHierarchy.Node; import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.transforms.Materializations; import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; import org.apache.beam.sdk.transforms.View; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollection; @@ -50,7 +54,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class MorePipelineTest { @@ -100,13 +103,22 @@ static class FakeViewAsList extends PTransform, PCollectionVie @Override public PCollectionView> expand(PCollection input) { - PCollection> materializationInput = - input.apply(new View.VoidKeyToMultimapMaterialization<>()); Coder inputCoder = input.getCoder(); + PCollection>> materializationInput = + input + .apply("IndexElements", ParDo.of(new View.ToListViewDoFn<>())) + 
.setCoder( + KvCoder.of( + BigEndianLongCoder.of(), + PCollectionViews.ValueOrMetadataCoder.create( + inputCoder, OffsetRange.Coder.of()))); PCollectionView> view = - PCollectionViews.listViewUsingVoidKey( + PCollectionViews.listView( materializationInput, - (TupleTag>) originalView.getTagInternal(), + (TupleTag< + Materializations.MultimapView< + Long, PCollectionViews.ValueOrMetadata>>) + originalView.getTagInternal(), (PCollectionViews.TypeDescriptorSupplier) inputCoder::getEncodedTypeDescriptor, materializationInput.getWindowingStrategy()); materializationInput.apply(View.CreatePCollectionView.of(view)); diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PCollectionTranslationTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PCollectionTranslationTest.java index b35e10cdca25..b521b682fb38 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PCollectionTranslationTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PCollectionTranslationTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.core.construction; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import com.google.auto.value.AutoValue; import java.io.IOException; @@ -63,9 +63,6 @@ /** Tests for {@link PCollectionTranslation}. */ @RunWith(Parameterized.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PCollectionTranslationTest { // Each spec activates tests of all subsets of its fields @Parameters(name = "{index}: {0}") diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PTransformMatchersTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PTransformMatchersTest.java index f065031c5e74..1c1efa82a782 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PTransformMatchersTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PTransformMatchersTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.runners.core.construction; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.instanceOf; import static org.hamcrest.Matchers.is; import static org.hamcrest.Matchers.not; -import static org.junit.Assert.assertThat; import java.io.Serializable; import java.util.Collections; @@ -55,6 +55,7 @@ import org.apache.beam.sdk.transforms.View; import org.apache.beam.sdk.transforms.View.CreatePCollectionView; import org.apache.beam.sdk.transforms.ViewFn; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.transforms.windowing.GlobalWindows; @@ -85,7 +86,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class PTransformMatchersTest implements Serializable { @Rule @@ -110,7 +110,12 @@ public class PTransformMatchersTest implements Serializable { output.setName("dummy output"); return AppliedPTransform.of( - "pardo", PValues.expandInput(input), 
PValues.expandOutput(output), pardo, p); + "pardo", + PValues.expandInput(input), + PValues.expandOutput(output), + pardo, + ResourceHints.create(), + p); } @Test @@ -440,6 +445,7 @@ public void emptyFlattenWithEmptyFlatten() { PCollection.createPrimitiveOutputInternal( p, WindowingStrategy.globalDefault(), IsBounded.BOUNDED, VarIntCoder.of())), Flatten.pCollections(), + ResourceHints.create(), p); assertThat(PTransformMatchers.emptyFlatten().matches(application), is(true)); @@ -459,6 +465,7 @@ public void emptyFlattenWithNonEmptyFlatten() { PCollection.createPrimitiveOutputInternal( p, WindowingStrategy.globalDefault(), IsBounded.BOUNDED, VarIntCoder.of())), Flatten.pCollections(), + ResourceHints.create(), p); assertThat(PTransformMatchers.emptyFlatten().matches(application), is(false)); @@ -477,6 +484,7 @@ public void emptyFlattenWithNonFlatten() { p, WindowingStrategy.globalDefault(), IsBounded.BOUNDED, VarIntCoder.of())), /* This isn't actually possible to construct, but for the sake of example */ Flatten.iterables(), + ResourceHints.create(), p); assertThat(PTransformMatchers.emptyFlatten().matches(application), is(false)); @@ -496,6 +504,7 @@ public void flattenWithDuplicateInputsWithoutDuplicates() { PCollection.createPrimitiveOutputInternal( p, WindowingStrategy.globalDefault(), IsBounded.BOUNDED, VarIntCoder.of())), Flatten.pCollections(), + ResourceHints.create(), p); assertThat(PTransformMatchers.flattenWithDuplicateInputs().matches(application), is(false)); @@ -518,6 +527,7 @@ public void flattenWithDuplicateInputsWithDuplicates() { PCollection.createPrimitiveOutputInternal( p, WindowingStrategy.globalDefault(), IsBounded.BOUNDED, VarIntCoder.of())), Flatten.pCollections(), + ResourceHints.create(), p); assertThat(PTransformMatchers.flattenWithDuplicateInputs().matches(application), is(true)); @@ -536,6 +546,7 @@ public void flattenWithDuplicateInputsNonFlatten() { p, WindowingStrategy.globalDefault(), IsBounded.BOUNDED, VarIntCoder.of())), /* This isn't actually possible to construct, but for the sake of example */ Flatten.iterables(), + ResourceHints.create(), p); assertThat(PTransformMatchers.flattenWithDuplicateInputs().matches(application), is(false)); @@ -579,7 +590,12 @@ public WriteOperation createWriteOperation() { private AppliedPTransform appliedWrite(WriteFiles write) { return AppliedPTransform.of( - "WriteFiles", Collections.emptyMap(), Collections.emptyMap(), write, p); + "WriteFiles", + Collections.emptyMap(), + Collections.emptyMap(), + write, + ResourceHints.create(), + p); } private static class FakeFilenamePolicy extends FilenamePolicy { diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PTransformReplacementsTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PTransformReplacementsTest.java index a1b221e3f7e1..7006a555b535 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PTransformReplacementsTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PTransformReplacementsTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.core.construction; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.util.Collections; import org.apache.beam.sdk.io.GenerateSequence; @@ -28,6 +28,7 @@ import org.apache.beam.sdk.transforms.DoFn; import 
org.apache.beam.sdk.transforms.ParDo; import org.apache.beam.sdk.transforms.View; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionView; import org.apache.beam.sdk.values.PValues; @@ -41,9 +42,6 @@ /** Tests for {@link PTransformReplacements}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PTransformReplacementsTest { @Rule public TestPipeline pipeline = TestPipeline.create().enableAbandonedNodeEnforcement(false); @Rule public ExpectedException thrown = ExpectedException.none(); @@ -61,6 +59,7 @@ public void getMainInputSingleOutputSingleInput() { Collections.singletonMap(new TupleTag(), mainInput), Collections.singletonMap(new TupleTag(), output), ParDo.of(new TestDoFn()), + ResourceHints.create(), pipeline); PCollection input = PTransformReplacements.getSingletonMainInput(application); assertThat(input, equalTo(mainInput)); @@ -77,6 +76,7 @@ public void getMainInputSingleOutputSideInputs() { .build(), Collections.singletonMap(new TupleTag(), output), ParDo.of(new TestDoFn()).withSideInputs(sideInput), + ResourceHints.create(), pipeline); PCollection input = PTransformReplacements.getSingletonMainInput(application); assertThat(input, equalTo(mainInput)); @@ -98,6 +98,7 @@ public void getMainInputExtraMainInputsThrows() { inputs, Collections.singletonMap(new TupleTag(), output), ParDo.of(new TestDoFn()).withSideInputs(sideInput), + ResourceHints.create(), pipeline); thrown.expect(IllegalArgumentException.class); thrown.expectMessage("multiple inputs"); @@ -119,6 +120,7 @@ public void getMainInputNoMainInputsThrows() { inputs, Collections.singletonMap(new TupleTag(), output), ParDo.of(new TestDoFn()).withSideInputs(sideInput), + ResourceHints.create(), pipeline); thrown.expect(IllegalArgumentException.class); thrown.expectMessage("No main input"); diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PTransformTranslationTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PTransformTranslationTest.java index 33253b7b547f..a0158e3db827 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PTransformTranslationTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PTransformTranslationTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.core.construction; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import com.google.auto.value.AutoValue; import java.io.IOException; @@ -42,6 +42,7 @@ import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.transforms.ParDo; import org.apache.beam.sdk.transforms.View; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PBegin; import org.apache.beam.sdk.values.PCollection; @@ -62,9 +63,6 @@ /** Tests for {@link PTransformTranslation}. 
*/ @RunWith(Parameterized.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PTransformTranslationTest { @Parameters(name = "{index}: {0}") @@ -172,6 +170,7 @@ public void process(ProcessContext context) {} PValues.expandInput(pipeline.begin()), PValues.expandOutput(pcollection), sequence, + ResourceHints.create(), pipeline); } @@ -183,6 +182,7 @@ public void process(ProcessContext context) {} PValues.expandInput(pipeline.begin()), PValues.expandOutput(pcollection), transform, + ResourceHints.create(), pipeline); } @@ -204,6 +204,7 @@ public String getUrn() { PValues.expandInput(pipeline.begin()), PValues.expandOutput(PDone.in(pipeline)), rawPTransform, + ResourceHints.create(), pipeline); } @@ -224,6 +225,11 @@ public String getUrn() { return AppliedPTransform ., PCollectionTuple, ParDo.MultiOutput>>of( - "MultiParDoInAndOut", inputs, PValues.expandOutput(output), parDo, pipeline); + "MultiParDoInAndOut", + inputs, + PValues.expandOutput(output), + parDo, + ResourceHints.create(), + pipeline); } } diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ParDoTranslationTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ParDoTranslationTest.java index 10a2b022dcbb..057533354bb8 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ParDoTranslationTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ParDoTranslationTest.java @@ -17,11 +17,11 @@ */ package org.apache.beam.runners.core.construction; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.instanceOf; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.util.HashMap; @@ -54,6 +54,7 @@ import org.apache.beam.sdk.transforms.ParDo; import org.apache.beam.sdk.transforms.ParDo.MultiOutput; import org.apache.beam.sdk.transforms.View; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.transforms.windowing.GlobalWindow; @@ -165,7 +166,7 @@ public void toTransformProto() throws Exception { RunnerApi.PTransform protoTransform = PTransformTranslation.toProto( AppliedPTransform.>, PCollection, MultiOutput>of( - "foo", inputs, PValues.expandOutput(output), parDo, p), + "foo", inputs, PValues.expandOutput(output), parDo, ResourceHints.create(), p), sdkComponents); RunnerApi.Components components = sdkComponents.toComponents(); RehydratedComponents rehydratedComponents = RehydratedComponents.forComponents(components); diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PipelineOptionsTranslationTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PipelineOptionsTranslationTest.java index 04588fe25fa7..8b131fa0448f 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PipelineOptionsTranslationTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PipelineOptionsTranslationTest.java @@ -17,11 +17,11 @@ */ package 
org.apache.beam.runners.core.construction; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.is; import static org.hamcrest.Matchers.notNullValue; import static org.hamcrest.Matchers.nullValue; -import static org.junit.Assert.assertThat; import com.fasterxml.jackson.annotation.JsonIgnore; import com.fasterxml.jackson.databind.ObjectMapper; @@ -30,9 +30,9 @@ import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.util.common.ReflectHelpers; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.NullValue; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Struct; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Value; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.NullValue; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Struct; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Value; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.junit.Test; import org.junit.runner.RunWith; @@ -42,9 +42,6 @@ import org.junit.runners.Parameterized.Parameters; /** Tests for {@link PipelineOptionsTranslation}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PipelineOptionsTranslationTest { /** Tests that translations can round-trip through the proto format. */ @RunWith(Parameterized.class) diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PipelineTranslationTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PipelineTranslationTest.java index 374549136ad8..72d4ec2a405c 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PipelineTranslationTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PipelineTranslationTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.core.construction; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.hasItem; -import static org.junit.Assert.assertThat; import java.io.IOException; import java.util.Collections; @@ -44,6 +44,7 @@ import org.apache.beam.sdk.transforms.Create; import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.GroupByKey; +import org.apache.beam.sdk.transforms.Impulse; import org.apache.beam.sdk.transforms.ParDo; import org.apache.beam.sdk.transforms.View; import org.apache.beam.sdk.transforms.WithKeys; @@ -51,6 +52,7 @@ import org.apache.beam.sdk.transforms.reflect.DoFnInvokers; import org.apache.beam.sdk.transforms.reflect.DoFnSignature; import org.apache.beam.sdk.transforms.reflect.DoFnSignatures; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; import org.apache.beam.sdk.transforms.windowing.AfterPane; import org.apache.beam.sdk.transforms.windowing.AfterWatermark; import org.apache.beam.sdk.transforms.windowing.FixedWindows; @@ -60,6 +62,7 @@ import org.apache.beam.sdk.values.PCollectionView; import org.apache.beam.sdk.values.PValue; import org.apache.beam.sdk.values.WindowingStrategy; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import 
org.joda.time.Duration; import org.junit.Test; @@ -72,7 +75,6 @@ @RunWith(Parameterized.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class PipelineTranslationTest { @Parameter(0) @@ -288,4 +290,53 @@ public void testRequirements() { assertThat( pipelineProto.getRequirementsList(), hasItem(ParDoTranslation.REQUIRES_STABLE_INPUT_URN)); } + + private RunnerApi.PTransform getLeafTransform(RunnerApi.Pipeline pipelineProto, String prefix) { + for (RunnerApi.PTransform transform : + pipelineProto.getComponents().getTransformsMap().values()) { + if (transform.getUniqueName().startsWith(prefix) && transform.getSubtransformsCount() == 0) { + return transform; + } + } + throw new java.lang.IllegalArgumentException(prefix); + } + + private static class IdentityDoFn extends DoFn { + @ProcessElement + public void processElement(@Element T input, OutputReceiver out) { + out.output(input); + } + } + + @Test + public void testResourceHints() { + Pipeline pipeline = Pipeline.create(); + PCollection root = pipeline.apply(Impulse.create()); + ParDo.SingleOutput transform = ParDo.of(new IdentityDoFn()); + root.apply("Big", transform.setResourceHints(ResourceHints.create().withMinRam("640KB"))); + root.apply("Small", transform.setResourceHints(ResourceHints.create().withMinRam(1))); + root.apply( + "AnotherBig", transform.setResourceHints(ResourceHints.create().withMinRam("640KB"))); + RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(pipeline, false); + assertThat( + pipelineProto + .getComponents() + .getEnvironmentsMap() + .get(getLeafTransform(pipelineProto, "Big").getEnvironmentId()) + .getResourceHintsMap(), + org.hamcrest.Matchers.hasEntry( + "beam:resources:min_ram_bytes:v1", ByteString.copyFromUtf8("640000"))); + assertThat( + pipelineProto + .getComponents() + .getEnvironmentsMap() + .get(getLeafTransform(pipelineProto, "Small").getEnvironmentId()) + .getResourceHintsMap(), + org.hamcrest.Matchers.hasEntry( + "beam:resources:min_ram_bytes:v1", ByteString.copyFromUtf8("1"))); + // Ensure we re-use environments. 
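+      // "Big" and "AnotherBig" carry identical resource hints, so both transforms should be
+      // assigned the same environment id, while "Small" gets a distinct one.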
+ assertThat( + getLeafTransform(pipelineProto, "Big").getEnvironmentId(), + equalTo(getLeafTransform(pipelineProto, "AnotherBig").getEnvironmentId())); + } } diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ReadTranslationTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ReadTranslationTest.java index 8309f766dfe2..cd2f5c38f305 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ReadTranslationTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ReadTranslationTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.core.construction; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.instanceOf; -import static org.junit.Assert.assertThat; import static org.junit.Assume.assumeThat; import java.io.IOException; diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/RehydratedComponentsTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/RehydratedComponentsTest.java index 434fb43ee477..c50882e92438 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/RehydratedComponentsTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/RehydratedComponentsTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.core.construction; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.theInstance; -import static org.junit.Assert.assertThat; import org.apache.beam.model.pipeline.v1.RunnerApi.Environment; import org.apache.beam.sdk.coders.Coder; diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ReplacementOutputsTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ReplacementOutputsTest.java index 249764455e50..0666c1468a4b 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ReplacementOutputsTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ReplacementOutputsTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.core.construction; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.util.Map; import org.apache.beam.sdk.coders.StringUtf8Coder; @@ -45,9 +45,6 @@ /** Tests for {@link ReplacementOutputs}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ReplacementOutputsTest { @Rule public ExpectedException thrown = ExpectedException.none(); private TestPipeline p = TestPipeline.create(); diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ReshuffleTranslationTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ReshuffleTranslationTest.java index 48e849e27ecc..0f8b97bff2fe 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ReshuffleTranslationTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ReshuffleTranslationTest.java @@ -18,8 +18,8 @@ package org.apache.beam.runners.core.construction; import static org.apache.beam.runners.core.construction.PTransformTranslation.RESHUFFLE_URN; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import org.apache.beam.sdk.transforms.Reshuffle; import org.junit.Test; diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/SchemaTranslationTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/SchemaTranslationTest.java index 4fe18b9aba2b..3b083f6bacfa 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/SchemaTranslationTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/SchemaTranslationTest.java @@ -17,14 +17,20 @@ */ package org.apache.beam.runners.core.construction; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.util.ArrayList; import java.util.HashMap; import java.util.List; import java.util.Map; import org.apache.beam.model.pipeline.v1.SchemaApi; +import org.apache.beam.model.pipeline.v1.SchemaApi.ArrayType; +import org.apache.beam.model.pipeline.v1.SchemaApi.ArrayTypeValue; +import org.apache.beam.model.pipeline.v1.SchemaApi.AtomicType; +import org.apache.beam.model.pipeline.v1.SchemaApi.AtomicTypeValue; +import org.apache.beam.model.pipeline.v1.SchemaApi.FieldValue; +import org.apache.beam.model.pipeline.v1.SchemaApi.LogicalType; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.schemas.Schema.Field; import org.apache.beam.sdk.schemas.Schema.FieldType; @@ -32,6 +38,8 @@ import org.apache.beam.sdk.schemas.logicaltypes.FixedBytes; import org.apache.beam.sdk.schemas.logicaltypes.MicrosInstant; import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.junit.Test; import org.junit.experimental.runners.Enclosed; @@ -42,9 +50,6 @@ /** Tests for {@link SchemaTranslation}. */ @RunWith(Enclosed.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SchemaTranslationTest { /** Tests round-trip proto encodings for {@link Schema}. 
*/ @@ -177,4 +182,106 @@ public void toAndFromProto() throws Exception { assertThat(decodedSchema, equalTo(schema)); } } + + /** Tests round-trip proto encodings for {@link Schema}. */ + @RunWith(Parameterized.class) + public static class FromProtoToProtoTest { + @Parameters(name = "{index}: {0}") + public static Iterable data() { + SchemaApi.Schema.Builder builder = SchemaApi.Schema.newBuilder(); + // A go 'int' + builder.addFields( + SchemaApi.Field.newBuilder() + .setName("goInt") + .setDescription("An int from go") + .setType( + SchemaApi.FieldType.newBuilder() + .setLogicalType( + SchemaApi.LogicalType.newBuilder() + .setUrn("gosdk:int") + .setRepresentation( + SchemaApi.FieldType.newBuilder() + .setAtomicType(SchemaApi.AtomicType.INT64)) + .build())) + .setId(0) + .setEncodingPosition(0) + .build()); + // A pickled python object + builder.addFields( + SchemaApi.Field.newBuilder() + .setName("pythonObject") + .setType( + SchemaApi.FieldType.newBuilder() + .setLogicalType( + SchemaApi.LogicalType.newBuilder() + .setUrn("pythonsdk:value") + .setPayload( + ByteString.copyFrom( + "some payload describing a python type", Charsets.UTF_8)) + .setRepresentation( + SchemaApi.FieldType.newBuilder() + .setAtomicType(SchemaApi.AtomicType.BYTES)) + .build())) + .setId(1) + .setEncodingPosition(1) + .build()); + // An enum logical type that Java doesn't know about + builder.addFields( + SchemaApi.Field.newBuilder() + .setName("enum") + .setType( + SchemaApi.FieldType.newBuilder() + .setLogicalType( + LogicalType.newBuilder() + .setUrn("strange_enum") + .setArgumentType( + SchemaApi.FieldType.newBuilder() + .setArrayType( + ArrayType.newBuilder() + .setElementType( + SchemaApi.FieldType.newBuilder() + .setAtomicType(AtomicType.STRING)))) + .setArgument( + FieldValue.newBuilder() + .setArrayValue( + ArrayTypeValue.newBuilder() + .addElement( + FieldValue.newBuilder() + .setAtomicValue( + AtomicTypeValue.newBuilder() + .setString("FOO") + .build()) + .build()) + .addElement( + FieldValue.newBuilder() + .setAtomicValue( + AtomicTypeValue.newBuilder() + .setString("BAR") + .build()) + .build()) + .build()) + .build()) + .setRepresentation( + SchemaApi.FieldType.newBuilder().setAtomicType(AtomicType.BYTE)) + .build())) + .setId(2) + .setEncodingPosition(2) + .build()); + SchemaApi.Schema unknownLogicalTypeSchema = builder.build(); + + return ImmutableList.builder().add(unknownLogicalTypeSchema).build(); + } + + @Parameter(0) + public SchemaApi.Schema schemaProto; + + @Test + public void fromProtoAndToProto() throws Exception { + Schema decodedSchema = SchemaTranslation.schemaFromProto(schemaProto); + + SchemaApi.Schema reencodedSchemaProto = SchemaTranslation.schemaToProto(decodedSchema, true); + + assertThat(reencodedSchemaProto, equalTo(schemaProto)); + } + } } diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/SdkComponentsTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/SdkComponentsTest.java index 1096df099d62..6e58802beb52 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/SdkComponentsTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/SdkComponentsTest.java @@ -17,11 +17,14 @@ */ package org.apache.beam.runners.core.construction; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsString; import static org.hamcrest.Matchers.equalTo; import static 
org.hamcrest.Matchers.isEmptyOrNullString; +import static org.hamcrest.Matchers.matchesPattern; import static org.hamcrest.Matchers.not; -import static org.junit.Assert.assertThat; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertNotEquals; import java.io.IOException; import java.util.Collections; @@ -37,6 +40,7 @@ import org.apache.beam.sdk.runners.AppliedPTransform; import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PValues; import org.apache.beam.sdk.values.WindowingStrategy; @@ -50,9 +54,6 @@ /** Tests for {@link SdkComponents}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SdkComponentsTest { @Rule public TestPipeline pipeline = TestPipeline.create().enableAbandonedNodeEnforcement(false); @Rule public ExpectedException thrown = ExpectedException.none(); @@ -86,19 +87,38 @@ public void registerCoder() throws IOException { public void registerTransformNoChildren() throws IOException { Create.Values create = Create.of(1, 2, 3); PCollection pt = pipeline.apply(create); - String userName = "my_transform/my_nesting"; + String userName = "my_transform-my_nesting"; AppliedPTransform transform = AppliedPTransform.of( userName, PValues.expandInput(pipeline.begin()), PValues.expandOutput(pt), create, + ResourceHints.create(), pipeline); String componentName = components.registerPTransform(transform, Collections.emptyList()); assertThat(componentName, equalTo(userName)); assertThat(components.getExistingPTransformId(transform), equalTo(componentName)); } + @Test + public void registerTransformIdFormat() throws IOException { + Create.Values create = Create.of(1, 2, 3); + PCollection pt = pipeline.apply(create); + String malformedUserName = "my/tRAnsform 1(nesting)"; + AppliedPTransform transform = + AppliedPTransform.of( + malformedUserName, + PValues.expandInput(pipeline.begin()), + PValues.expandOutput(pt), + create, + ResourceHints.create(), + pipeline); + String componentName = components.registerPTransform(transform, Collections.emptyList()); + assertThat(componentName, matchesPattern("^[A-Za-z0-9-_]+")); + assertThat(components.getExistingPTransformId(transform), equalTo(componentName)); + } + @Test public void registerTransformAfterChildren() throws IOException { Create.Values create = Create.of(1L, 2L, 3L); @@ -113,6 +133,7 @@ public void registerTransformAfterChildren() throws IOException { PValues.expandInput(pipeline.begin()), PValues.expandOutput(pt), create, + ResourceHints.create(), pipeline); AppliedPTransform childTransform = AppliedPTransform.of( @@ -120,6 +141,7 @@ public void registerTransformAfterChildren() throws IOException { PValues.expandInput(pipeline.begin()), PValues.expandOutput(pt), createChild, + ResourceHints.create(), pipeline); String childId = components.registerPTransform(childTransform, Collections.emptyList()); @@ -136,7 +158,12 @@ public void registerTransformEmptyFullName() throws IOException { PCollection pt = pipeline.apply(create); AppliedPTransform transform = AppliedPTransform.of( - "", PValues.expandInput(pipeline.begin()), PValues.expandOutput(pt), create, pipeline); + "", + PValues.expandInput(pipeline.begin()), + PValues.expandOutput(pt), + create, + ResourceHints.create(), + pipeline); thrown.expect(IllegalArgumentException.class); 
thrown.expectMessage(transform.toString()); @@ -154,6 +181,7 @@ public void registerTransformNullComponents() throws IOException { PValues.expandInput(pipeline.begin()), PValues.expandOutput(pt), create, + ResourceHints.create(), pipeline); thrown.expect(NullPointerException.class); thrown.expectMessage("child nodes may not be null"); @@ -175,6 +203,7 @@ public void registerTransformWithUnregisteredChildren() throws IOException { PValues.expandInput(pipeline.begin()), PValues.expandOutput(pt), create, + ResourceHints.create(), pipeline); AppliedPTransform childTransform = AppliedPTransform.of( @@ -182,6 +211,7 @@ public void registerTransformWithUnregisteredChildren() throws IOException { PValues.expandInput(pipeline.begin()), PValues.expandOutput(pt), createChild, + ResourceHints.create(), pipeline); thrown.expect(IllegalArgumentException.class); @@ -232,4 +262,40 @@ public void registerWindowingStrategyIdEqualStrategies() throws IOException { WindowingStrategy.globalDefault().withMode(AccumulationMode.ACCUMULATING_FIRED_PANES)); assertThat(name, equalTo(duplicateName)); } + + @Test + public void testEnvironmentForHintDeduplicatonLogic() { + assertEquals( + components.getEnvironmentIdFor(ResourceHints.create()), + components.getEnvironmentIdFor(ResourceHints.create())); + + assertEquals( + components.getEnvironmentIdFor(ResourceHints.create().withMinRam(1000)), + components.getEnvironmentIdFor(ResourceHints.create().withMinRam(1000))); + + assertEquals( + components.getEnvironmentIdFor(ResourceHints.create().withMinRam(1000)), + components.getEnvironmentIdFor(ResourceHints.create().withMinRam(2000).withMinRam("1KB"))); + + assertNotEquals( + components.getEnvironmentIdFor(ResourceHints.create()), + components.getEnvironmentIdFor(ResourceHints.create().withMinRam("1GiB"))); + + assertNotEquals( + components.getEnvironmentIdFor(ResourceHints.create().withMinRam("10GiB")), + components.getEnvironmentIdFor(ResourceHints.create().withMinRam("10GB"))); + + assertEquals( + components.getEnvironmentIdFor(ResourceHints.create().withAccelerator("gpu")), + components.getEnvironmentIdFor(ResourceHints.create().withAccelerator("gpu"))); + + assertNotEquals( + components.getEnvironmentIdFor(ResourceHints.create().withAccelerator("gpu")), + components.getEnvironmentIdFor(ResourceHints.create().withAccelerator("tpu"))); + + assertNotEquals( + components.getEnvironmentIdFor(ResourceHints.create().withAccelerator("gpu")), + components.getEnvironmentIdFor( + ResourceHints.create().withAccelerator("gpu").withMinRam(10))); + } } diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/SingleInputOutputOverrideFactoryTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/SingleInputOutputOverrideFactoryTest.java index 11bbefcad176..22982c5f2db7 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/SingleInputOutputOverrideFactoryTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/SingleInputOutputOverrideFactoryTest.java @@ -17,7 +17,7 @@ */ package org.apache.beam.runners.core.construction; -import static org.junit.Assert.assertThat; +import static org.hamcrest.MatcherAssert.assertThat; import java.io.Serializable; import java.util.Map; diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/SplittableParDoTest.java 
b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/SplittableParDoTest.java index 635d665143d1..43bd1ec4c6ae 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/SplittableParDoTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/SplittableParDoTest.java @@ -46,6 +46,7 @@ import org.apache.beam.sdk.transforms.Create; import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; import org.apache.beam.sdk.transforms.splittabledofn.HasDefaultTracker; import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker; import org.apache.beam.sdk.transforms.splittabledofn.SplitResult; @@ -152,6 +153,7 @@ private PCollection applySplittableParDo( PValues.expandInput(input), PValues.expandOutput(output), multiOutput, + ResourceHints.create(), pipeline); return input.apply(name, SplittableParDo.forAppliedParDo(transform)).get(MAIN_OUTPUT_TAG); } diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/TestStreamTranslationTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/TestStreamTranslationTest.java index 703f65d162e3..01833a83eeb7 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/TestStreamTranslationTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/TestStreamTranslationTest.java @@ -18,8 +18,8 @@ package org.apache.beam.runners.core.construction; import static org.apache.beam.runners.core.construction.PTransformTranslation.TEST_STREAM_TRANSFORM_URN; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import org.apache.beam.model.pipeline.v1.RunnerApi; import org.apache.beam.model.pipeline.v1.RunnerApi.TestStreamPayload; @@ -28,6 +28,7 @@ import org.apache.beam.sdk.runners.AppliedPTransform; import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.testing.TestStream; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; import org.apache.beam.sdk.values.PBegin; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PValues; @@ -85,6 +86,7 @@ public void testRegistrarEncodedProto() throws Exception { PValues.expandInput(PBegin.in(p)), PValues.expandOutput(output), testStream, + ResourceHints.create(), p); SdkComponents components = SdkComponents.create(); diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/TimerTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/TimerTest.java index 50e575b1516a..f3d12e87e283 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/TimerTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/TimerTest.java @@ -34,9 +34,6 @@ /** Tests for {@link Timer}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TimerTest { private static final Instant FIRE_TIME = new Instant(123L); private static final Instant HOLD_TIME = new Instant(456L); diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/TransformInputsTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/TransformInputsTest.java index 25338496ff50..995611e52e91 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/TransformInputsTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/TransformInputsTest.java @@ -17,7 +17,7 @@ */ package org.apache.beam.runners.core.construction; -import static org.junit.Assert.assertThat; +import static org.hamcrest.MatcherAssert.assertThat; import java.util.Collections; import java.util.HashMap; @@ -28,6 +28,7 @@ import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.transforms.Create; import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PDone; import org.apache.beam.sdk.values.PInput; @@ -55,6 +56,7 @@ public void nonAdditionalInputsWithNoInputSucceeds() { Collections.emptyMap(), Collections.emptyMap(), new TestTransform(), + ResourceHints.create(), pipeline); assertThat(TransformInputs.nonAdditionalInputs(transform), Matchers.empty()); @@ -69,6 +71,7 @@ public void nonAdditionalInputsWithOneMainInputSucceeds() { Collections.singletonMap(new TupleTag() {}, input), Collections.emptyMap(), new TestTransform(), + ResourceHints.create(), pipeline); assertThat(TransformInputs.nonAdditionalInputs(transform), Matchers.containsInAnyOrder(input)); @@ -83,7 +86,12 @@ public void nonAdditionalInputsWithMultipleNonAdditionalInputsSucceeds() { allInputs.put(new TupleTag() {}, voids); AppliedPTransform transform = AppliedPTransform.of( - "additional-free", allInputs, Collections.emptyMap(), new TestTransform(), pipeline); + "additional-free", + allInputs, + Collections.emptyMap(), + new TestTransform(), + ResourceHints.create(), + pipeline); assertThat( TransformInputs.nonAdditionalInputs(transform), @@ -109,6 +117,7 @@ public void nonAdditionalInputsWithAdditionalInputsSucceeds() { allInputs, Collections.emptyMap(), new TestTransform(additionalInputs), + ResourceHints.create(), pipeline); assertThat( @@ -128,6 +137,7 @@ public void nonAdditionalInputsWithOnlyAdditionalInputsThrows() { additionalInputs, Collections.emptyMap(), new TestTransform((Map) additionalInputs), + ResourceHints.create(), pipeline); thrown.expect(IllegalArgumentException.class); diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/TriggerTranslationTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/TriggerTranslationTest.java index a5702b929607..d2f108afd601 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/TriggerTranslationTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/TriggerTranslationTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.core.construction; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; 
-import static org.junit.Assert.assertThat; import com.google.auto.value.AutoValue; import org.apache.beam.sdk.transforms.windowing.AfterAll; @@ -43,9 +43,6 @@ /** Tests for utilities in {@link TriggerTranslation}. */ @RunWith(Parameterized.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TriggerTranslationTest { @AutoValue diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/UnboundedReadFromBoundedSourceTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/UnboundedReadFromBoundedSourceTest.java index 0a17aae1bc2e..57aadc7bb0c6 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/UnboundedReadFromBoundedSourceTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/UnboundedReadFromBoundedSourceTest.java @@ -72,9 +72,6 @@ /** Unit tests for {@link UnboundedReadFromBoundedSource}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class UnboundedReadFromBoundedSourceTest { @Rule public TemporaryFolder tmpFolder = new TemporaryFolder(); diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/UnconsumedReadsTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/UnconsumedReadsTest.java index 6cc247ed3fb1..e8639cbb3b52 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/UnconsumedReadsTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/UnconsumedReadsTest.java @@ -17,7 +17,7 @@ */ package org.apache.beam.runners.core.construction; -import static org.junit.Assert.assertThat; +import static org.hamcrest.MatcherAssert.assertThat; import java.util.HashSet; import java.util.Set; diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/UnsupportedOverrideFactoryTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/UnsupportedOverrideFactoryTest.java index 0d7ea8623f91..e94422164cba 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/UnsupportedOverrideFactoryTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/UnsupportedOverrideFactoryTest.java @@ -30,7 +30,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class UnsupportedOverrideFactoryTest { @Rule public ExpectedException thrown = ExpectedException.none(); diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ValidateRunnerXlangTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ValidateRunnerXlangTest.java index 6ca799b12087..371487610f60 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ValidateRunnerXlangTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ValidateRunnerXlangTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.core.construction; +import static org.hamcrest.MatcherAssert.assertThat; import 
static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.io.ByteArrayOutputStream; import java.io.IOException; @@ -36,6 +36,7 @@ import org.apache.beam.sdk.testing.PAssert; import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.testing.UsesCrossLanguageTransforms; +import org.apache.beam.sdk.testing.UsesPythonExpansionService; import org.apache.beam.sdk.testing.ValidatesRunner; import org.apache.beam.sdk.transforms.Create; import org.apache.beam.sdk.transforms.MapElements; @@ -46,10 +47,10 @@ import org.apache.beam.sdk.values.PCollectionTuple; import org.apache.beam.sdk.values.Row; import org.apache.beam.sdk.values.TypeDescriptors; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ConnectivityState; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ManagedChannel; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ManagedChannelBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ConnectivityState; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ManagedChannel; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ManagedChannelBuilder; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.junit.After; import org.junit.Before; @@ -85,9 +86,6 @@ * details. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ValidateRunnerXlangTest implements Serializable { @Rule public transient TestPipeline testPipeline = TestPipeline.create(); private PipelineResult pipelineResult; @@ -103,6 +101,7 @@ public class ValidateRunnerXlangTest implements Serializable { private static final String TEST_COMPK_URN = "beam:transforms:xlang:test:compk"; private static final String TEST_FLATTEN_URN = "beam:transforms:xlang:test:flatten"; private static final String TEST_PARTITION_URN = "beam:transforms:xlang:test:partition"; + private static final String TEST_PYTHON_BS4_URN = "beam:transforms:xlang:test:python_bs4"; private static String expansionAddr; private static String expansionJar; @@ -320,6 +319,28 @@ public void partitionTest() { PAssert.that(col.get("1")).containsInAnyOrder(1L, 3L, 5L); } + @Test + @Category({ValidatesRunner.class, UsesPythonExpansionService.class}) + public void pythonDependenciesTest() { + String html = + "The Dormouse's story\n" + + "\n" + + "

<p class=\"title\"><b>The Dormouse's story</b></p>\n"
            + "\n"
            + "<p class=\"story\">Once upon a time there were three little sisters; and their names were\n"
            + "<a href=\"http://example.com/elsie\" class=\"sister\" id=\"link1\">Elsie</a>,\n"
            + "<a href=\"http://example.com/lacie\" class=\"sister\" id=\"link2\">Lacie</a> and\n"
            + "<a href=\"http://example.com/tillie\" class=\"sister\" id=\"link3\">Tillie</a>;\n"
            + "and they lived at the bottom of a well.</p>\n"
            + "\n"
            + "<p class=\"story\">...</p>
    "; + PCollection col = + testPipeline + .apply(Create.of(html)) + .apply(External.of(TEST_PYTHON_BS4_URN, new byte[] {}, expansionAddr)); + PAssert.that(col).containsInAnyOrder("The Dormouse's story"); + } + private byte[] toStringPayloadBytes(String data) throws IOException { Row configRow = Row.withSchema(Schema.of(Field.of("data", FieldType.STRING))) diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/WindowIntoTranslationTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/WindowIntoTranslationTest.java index 26f0ea8b16c8..ae748ff85000 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/WindowIntoTranslationTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/WindowIntoTranslationTest.java @@ -38,7 +38,7 @@ import org.apache.beam.sdk.transforms.windowing.Window; import org.apache.beam.sdk.transforms.windowing.Window.Assign; import org.apache.beam.sdk.transforms.windowing.WindowFn; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.checkerframework.checker.nullness.qual.Nullable; import org.joda.time.Duration; diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/WindowingStrategyTranslationTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/WindowingStrategyTranslationTest.java index f68ed2e2f618..777376832781 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/WindowingStrategyTranslationTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/WindowingStrategyTranslationTest.java @@ -17,37 +17,55 @@ */ package org.apache.beam.runners.core.construction; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import com.google.auto.value.AutoValue; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.util.Collection; +import java.util.Collections; +import java.util.HashMap; +import java.util.HashSet; +import java.util.Map; +import java.util.Objects; +import java.util.Set; import org.apache.beam.model.pipeline.v1.RunnerApi; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.CustomCoder; +import org.apache.beam.sdk.coders.VarIntCoder; import org.apache.beam.sdk.transforms.windowing.AfterWatermark; import org.apache.beam.sdk.transforms.windowing.FixedWindows; +import org.apache.beam.sdk.transforms.windowing.IntervalWindow; import org.apache.beam.sdk.transforms.windowing.Sessions; import org.apache.beam.sdk.transforms.windowing.SlidingWindows; import org.apache.beam.sdk.transforms.windowing.TimestampCombiner; import org.apache.beam.sdk.transforms.windowing.Trigger; import org.apache.beam.sdk.transforms.windowing.Window.ClosingBehavior; import org.apache.beam.sdk.transforms.windowing.WindowFn; +import org.apache.beam.sdk.transforms.windowing.WindowMappingFn; +import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.WindowingStrategy; import org.apache.beam.sdk.values.WindowingStrategy.AccumulationMode; 
import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.checkerframework.checker.nullness.qual.Nullable; import org.joda.time.Duration; +import org.joda.time.Instant; import org.junit.Test; +import org.junit.experimental.runners.Enclosed; import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; import org.junit.runners.Parameterized; import org.junit.runners.Parameterized.Parameter; import org.junit.runners.Parameterized.Parameters; /** Unit tests for {@link WindowingStrategy}. */ -@RunWith(Parameterized.class) +@RunWith(Enclosed.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class WindowingStrategyTranslationTest { - // Each spec activates tests of all subsets of its fields @AutoValue @AutoValue.CopyAnnotations @@ -56,91 +74,264 @@ abstract static class ToProtoAndBackSpec { abstract WindowingStrategy getWindowingStrategy(); } - private static ToProtoAndBackSpec toProtoAndBackSpec(WindowingStrategy windowingStrategy) { - return new AutoValue_WindowingStrategyTranslationTest_ToProtoAndBackSpec(windowingStrategy); + @RunWith(Parameterized.class) + public static class ToProtoAndBackTests { + + private static ToProtoAndBackSpec toProtoAndBackSpec(WindowingStrategy windowingStrategy) { + return new AutoValue_WindowingStrategyTranslationTest_ToProtoAndBackSpec(windowingStrategy); + } + + private static final WindowFn REPRESENTATIVE_WINDOW_FN = + FixedWindows.of(Duration.millis(12)); + + private static final Trigger REPRESENTATIVE_TRIGGER = AfterWatermark.pastEndOfWindow(); + + @Parameters(name = "{index}: {0}") + public static Iterable data() { + return ImmutableList.of( + toProtoAndBackSpec(WindowingStrategy.globalDefault()), + toProtoAndBackSpec( + WindowingStrategy.of( + FixedWindows.of(Duration.millis(11)).withOffset(Duration.millis(3)))), + toProtoAndBackSpec( + WindowingStrategy.of( + SlidingWindows.of(Duration.millis(37)) + .every(Duration.millis(3)) + .withOffset(Duration.millis(2)))), + toProtoAndBackSpec(WindowingStrategy.of(Sessions.withGapDuration(Duration.millis(389)))), + toProtoAndBackSpec( + WindowingStrategy.of(REPRESENTATIVE_WINDOW_FN) + .withClosingBehavior(ClosingBehavior.FIRE_ALWAYS) + .withMode(AccumulationMode.ACCUMULATING_FIRED_PANES) + .withTrigger(REPRESENTATIVE_TRIGGER) + .withAllowedLateness(Duration.millis(71)) + .withTimestampCombiner(TimestampCombiner.EARLIEST)), + toProtoAndBackSpec( + WindowingStrategy.of(REPRESENTATIVE_WINDOW_FN) + .withClosingBehavior(ClosingBehavior.FIRE_IF_NON_EMPTY) + .withMode(AccumulationMode.DISCARDING_FIRED_PANES) + .withTrigger(REPRESENTATIVE_TRIGGER) + .withAllowedLateness(Duration.millis(93)) + .withTimestampCombiner(TimestampCombiner.LATEST)), + toProtoAndBackSpec( + WindowingStrategy.of(REPRESENTATIVE_WINDOW_FN) + .withClosingBehavior(ClosingBehavior.FIRE_IF_NON_EMPTY) + .withMode(AccumulationMode.RETRACTING_FIRED_PANES) + .withTrigger(REPRESENTATIVE_TRIGGER) + .withAllowedLateness(Duration.millis(100)) + .withTimestampCombiner(TimestampCombiner.LATEST)), + toProtoAndBackSpec(WindowingStrategy.of(new CustomWindowFn()))); + } + + @Parameter(0) + public ToProtoAndBackSpec toProtoAndBackSpec; + + @Test + public void testToProtoAndBack() throws Exception { + WindowingStrategy windowingStrategy = toProtoAndBackSpec.getWindowingStrategy(); + SdkComponents components = SdkComponents.create(); + 
components.registerEnvironment(Environments.createDockerEnvironment("java")); + WindowingStrategy toProtoAndBackWindowingStrategy = + WindowingStrategyTranslation.fromProto( + WindowingStrategyTranslation.toMessageProto(windowingStrategy, components)); + + assertThat( + toProtoAndBackWindowingStrategy, + equalTo( + (WindowingStrategy) + windowingStrategy + .withEnvironmentId(components.getOnlyEnvironmentId()) + .fixDefaults())); + } + + @Test + public void testToProtoAndBackWithComponents() throws Exception { + WindowingStrategy windowingStrategy = toProtoAndBackSpec.getWindowingStrategy(); + SdkComponents components = SdkComponents.create(); + components.registerEnvironment(Environments.createDockerEnvironment("java")); + RunnerApi.WindowingStrategy proto = + WindowingStrategyTranslation.toProto(windowingStrategy, components); + RehydratedComponents protoComponents = + RehydratedComponents.forComponents(components.toComponents()); + + assertThat( + WindowingStrategyTranslation.fromProto(proto, protoComponents).fixDefaults(), + equalTo( + windowingStrategy + .withEnvironmentId(components.getOnlyEnvironmentId()) + .fixDefaults())); + + protoComponents.getCoder( + components.registerCoder(windowingStrategy.getWindowFn().windowCoder())); + assertThat( + proto.getAssignsToOneWindow(), + equalTo(windowingStrategy.getWindowFn().assignsToOneWindow())); + } } - private static final WindowFn REPRESENTATIVE_WINDOW_FN = - FixedWindows.of(Duration.millis(12)); - - private static final Trigger REPRESENTATIVE_TRIGGER = AfterWatermark.pastEndOfWindow(); - - @Parameters(name = "{index}: {0}") - public static Iterable data() { - return ImmutableList.of( - toProtoAndBackSpec(WindowingStrategy.globalDefault()), - toProtoAndBackSpec( - WindowingStrategy.of( - FixedWindows.of(Duration.millis(11)).withOffset(Duration.millis(3)))), - toProtoAndBackSpec( - WindowingStrategy.of( - SlidingWindows.of(Duration.millis(37)) - .every(Duration.millis(3)) - .withOffset(Duration.millis(2)))), - toProtoAndBackSpec(WindowingStrategy.of(Sessions.withGapDuration(Duration.millis(389)))), - toProtoAndBackSpec( - WindowingStrategy.of(REPRESENTATIVE_WINDOW_FN) - .withClosingBehavior(ClosingBehavior.FIRE_ALWAYS) - .withMode(AccumulationMode.ACCUMULATING_FIRED_PANES) - .withTrigger(REPRESENTATIVE_TRIGGER) - .withAllowedLateness(Duration.millis(71)) - .withTimestampCombiner(TimestampCombiner.EARLIEST)), - toProtoAndBackSpec( - WindowingStrategy.of(REPRESENTATIVE_WINDOW_FN) - .withClosingBehavior(ClosingBehavior.FIRE_IF_NON_EMPTY) - .withMode(AccumulationMode.DISCARDING_FIRED_PANES) - .withTrigger(REPRESENTATIVE_TRIGGER) - .withAllowedLateness(Duration.millis(93)) - .withTimestampCombiner(TimestampCombiner.LATEST)), - toProtoAndBackSpec( - WindowingStrategy.of(REPRESENTATIVE_WINDOW_FN) - .withClosingBehavior(ClosingBehavior.FIRE_IF_NON_EMPTY) - .withMode(AccumulationMode.RETRACTING_FIRED_PANES) - .withTrigger(REPRESENTATIVE_TRIGGER) - .withAllowedLateness(Duration.millis(100)) - .withTimestampCombiner(TimestampCombiner.LATEST))); + @RunWith(JUnit4.class) + public static class ExpectedProtoTests { + + @Test + public void testSessionsMergeStatus() throws Exception { + WindowingStrategy windowingStrategy = + WindowingStrategy.of(Sessions.withGapDuration(Duration.millis(456))); + SdkComponents components = SdkComponents.create(); + components.registerEnvironment(Environments.createDockerEnvironment("java")); + RunnerApi.WindowingStrategy proto = + WindowingStrategyTranslation.toProto(windowingStrategy, components); + + 
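A compact, self-contained version of the round trip that the parameterized tests above exercise may help when reading the new Enclosed structure. The class name and the particular strategy below are illustrative only; the translation calls are the ones used in the tests (toProto, fromProto, RehydratedComponents, fixDefaults):

    import org.apache.beam.model.pipeline.v1.RunnerApi;
    import org.apache.beam.runners.core.construction.Environments;
    import org.apache.beam.runners.core.construction.RehydratedComponents;
    import org.apache.beam.runners.core.construction.SdkComponents;
    import org.apache.beam.runners.core.construction.WindowingStrategyTranslation;
    import org.apache.beam.sdk.transforms.windowing.FixedWindows;
    import org.apache.beam.sdk.values.WindowingStrategy;
    import org.joda.time.Duration;

    public class WindowingStrategyRoundTripExample {
      public static void main(String[] args) throws Exception {
        WindowingStrategy<?, ?> original =
            WindowingStrategy.of(FixedWindows.of(Duration.standardMinutes(1)))
                .withAllowedLateness(Duration.standardSeconds(30));

        SdkComponents components = SdkComponents.create();
        components.registerEnvironment(Environments.createDockerEnvironment("java"));

        // Serialize to the portable Runner API representation...
        RunnerApi.WindowingStrategy proto =
            WindowingStrategyTranslation.toProto(original, components);

        // ...and rehydrate it from the registered components.
        WindowingStrategy<?, ?> roundTripped =
            WindowingStrategyTranslation.fromProto(
                proto, RehydratedComponents.forComponents(components.toComponents()));

        // The tests compare roundTripped.fixDefaults() against the original with its
        // environment id attached and defaults fixed.
        System.out.println(roundTripped.fixDefaults());
      }
    }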
assertThat(proto.getMergeStatus(), equalTo(RunnerApi.MergeStatus.Enum.NEEDS_MERGE)); + } + + @Test + public void testFixedMergeStatus() throws Exception { + WindowingStrategy windowingStrategy = + WindowingStrategy.of(FixedWindows.of(Duration.millis(2))); + SdkComponents components = SdkComponents.create(); + components.registerEnvironment(Environments.createDockerEnvironment("java")); + RunnerApi.WindowingStrategy proto = + WindowingStrategyTranslation.toProto(windowingStrategy, components); + + assertThat(proto.getMergeStatus(), equalTo(RunnerApi.MergeStatus.Enum.NON_MERGING)); + } } - @Parameter(0) - public ToProtoAndBackSpec toProtoAndBackSpec; - - @Test - public void testToProtoAndBack() throws Exception { - WindowingStrategy windowingStrategy = toProtoAndBackSpec.getWindowingStrategy(); - SdkComponents components = SdkComponents.create(); - components.registerEnvironment(Environments.createDockerEnvironment("java")); - WindowingStrategy toProtoAndBackWindowingStrategy = - WindowingStrategyTranslation.fromProto( - WindowingStrategyTranslation.toMessageProto(windowingStrategy, components)); - - assertThat( - toProtoAndBackWindowingStrategy, - equalTo( - (WindowingStrategy) - windowingStrategy - .withEnvironmentId(components.getOnlyEnvironmentId()) - .fixDefaults())); + private static class CustomWindow extends IntervalWindow { + private boolean isBig; + + CustomWindow(Instant start, Instant end, boolean isBig) { + super(start, end); + this.isBig = isBig; + } + + @Override + public boolean equals(@Nullable Object o) { + if (this == o) { + return true; + } + if (o == null || getClass() != o.getClass()) { + return false; + } + CustomWindow that = (CustomWindow) o; + return super.equals(o) && this.isBig == that.isBig; + } + + @Override + public int hashCode() { + return Objects.hash(super.hashCode(), isBig); + } + } + + private static class CustomWindowCoder extends CustomCoder { + + private static final CustomWindowCoder INSTANCE = new CustomWindowCoder(); + private static final Coder INTERVAL_WINDOW_CODER = IntervalWindow.getCoder(); + private static final VarIntCoder VAR_INT_CODER = VarIntCoder.of(); + + public static CustomWindowCoder of() { + return INSTANCE; + } + + @Override + public void encode(CustomWindow window, OutputStream outStream) throws IOException { + INTERVAL_WINDOW_CODER.encode(window, outStream); + VAR_INT_CODER.encode(window.isBig ? 
1 : 0, outStream); + } + + @Override + public CustomWindow decode(InputStream inStream) throws IOException { + IntervalWindow superWindow = INTERVAL_WINDOW_CODER.decode(inStream); + boolean isBig = VAR_INT_CODER.decode(inStream) != 0; + return new CustomWindow(superWindow.start(), superWindow.end(), isBig); + } + + @Override + public void verifyDeterministic() throws NonDeterministicException { + INTERVAL_WINDOW_CODER.verifyDeterministic(); + VAR_INT_CODER.verifyDeterministic(); + } } - @Test - public void testToProtoAndBackWithComponents() throws Exception { - WindowingStrategy windowingStrategy = toProtoAndBackSpec.getWindowingStrategy(); - SdkComponents components = SdkComponents.create(); - components.registerEnvironment(Environments.createDockerEnvironment("java")); - RunnerApi.WindowingStrategy proto = - WindowingStrategyTranslation.toProto(windowingStrategy, components); - RehydratedComponents protoComponents = - RehydratedComponents.forComponents(components.toComponents()); - - assertThat( - WindowingStrategyTranslation.fromProto(proto, protoComponents).fixDefaults(), - equalTo( - windowingStrategy.withEnvironmentId(components.getOnlyEnvironmentId()).fixDefaults())); - - protoComponents.getCoder( - components.registerCoder(windowingStrategy.getWindowFn().windowCoder())); - assertThat( - proto.getAssignsToOneWindow(), - equalTo(windowingStrategy.getWindowFn().assignsToOneWindow())); + private static class CustomWindowFn extends WindowFn { + @Override + public Collection assignWindows(AssignContext c) throws Exception { + String element; + // It loses genericity of type T but this is not a big deal for a test. + // And it allows to avoid duplicating CustomWindowFn to support PCollection + if (c.element() instanceof KV) { + element = ((KV) c.element()).getValue(); + } else { + element = (String) c.element(); + } + // put big elements in windows of 30s and small ones in windows of 5s + if ("big".equals(element)) { + return Collections.singletonList( + new CustomWindow( + c.timestamp(), c.timestamp().plus(Duration.standardSeconds(30)), true)); + } else { + return Collections.singletonList( + new CustomWindow( + c.timestamp(), c.timestamp().plus(Duration.standardSeconds(5)), false)); + } + } + + @Override + public void mergeWindows(MergeContext c) throws Exception { + Map> windowsToMerge = new HashMap<>(); + for (CustomWindow window : c.windows()) { + if (window.isBig) { + HashSet windows = new HashSet<>(); + windows.add(window); + windowsToMerge.put(window, windows); + } + } + for (CustomWindow window : c.windows()) { + for (Map.Entry> bigWindow : windowsToMerge.entrySet()) { + if (bigWindow.getKey().contains(window)) { + bigWindow.getValue().add(window); + } + } + } + for (Map.Entry> mergeEntry : windowsToMerge.entrySet()) { + c.merge(mergeEntry.getValue(), mergeEntry.getKey()); + } + } + + @Override + public boolean isCompatible(WindowFn other) { + return other instanceof CustomWindowFn; + } + + @Override + public Coder windowCoder() { + return CustomWindowCoder.of(); + } + + @Override + public WindowMappingFn getDefaultWindowMappingFn() { + throw new UnsupportedOperationException("side inputs not supported"); + } + + @Override + public boolean equals(@Nullable Object o) { + if (o == null || getClass() != o.getClass()) { + return false; + } + + CustomWindowFn windowFn = (CustomWindowFn) o; + if (this.isCompatible(windowFn)) { + return true; + } + + return false; + } + + @Override + public int hashCode() { + // overriding hashCode() is required but it is not useful in terms of + // 
writting test cases. + return Objects.hash("test"); + } } } diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/WriteFilesTranslationTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/WriteFilesTranslationTest.java index ca0274380d1f..c8861fdb28f2 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/WriteFilesTranslationTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/WriteFilesTranslationTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.core.construction; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.util.Objects; import org.apache.beam.model.pipeline.v1.RunnerApi; @@ -35,6 +35,7 @@ import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.transforms.Create; import org.apache.beam.sdk.transforms.SerializableFunctions; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.transforms.windowing.PaneInfo; import org.apache.beam.sdk.values.PCollection; @@ -92,7 +93,12 @@ public void testExtractionDirectFromTransform() throws Exception { AppliedPTransform, WriteFilesResult, WriteFiles> appliedPTransform = AppliedPTransform.of( - "foo", PValues.expandInput(input), PValues.expandOutput(output), writeFiles, p); + "foo", + PValues.expandInput(input), + PValues.expandOutput(output), + writeFiles, + ResourceHints.create(), + p); assertThat( WriteFilesTranslation.isRunnerDeterminedSharding(appliedPTransform), diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/ExecutableStageTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/ExecutableStageTest.java index 6101e6a59ed2..f0732b5ad365 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/ExecutableStageTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/ExecutableStageTest.java @@ -18,13 +18,13 @@ package org.apache.beam.runners.core.construction.graph; import static org.apache.beam.runners.core.construction.graph.ExecutableStage.DEFAULT_WIRE_CODER_SETTINGS; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.allOf; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.emptyIterable; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.hasValue; -import static org.junit.Assert.assertThat; import java.util.Collections; import org.apache.beam.model.pipeline.v1.RunnerApi.Components; diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/FusedPipelineTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/FusedPipelineTest.java index eda5f8d8ddd3..fff804b41d88 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/FusedPipelineTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/FusedPipelineTest.java @@ -18,6 +18,7 @@ package 
org.apache.beam.runners.core.construction.graph; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.allOf; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.hasItem; @@ -25,7 +26,6 @@ import static org.hamcrest.Matchers.hasKey; import static org.hamcrest.Matchers.not; import static org.hamcrest.Matchers.startsWith; -import static org.junit.Assert.assertThat; import java.io.Serializable; import java.util.HashSet; diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/GreedyPipelineFuserTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/GreedyPipelineFuserTest.java index c395c23244e7..aef3dddf22ce 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/GreedyPipelineFuserTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/GreedyPipelineFuserTest.java @@ -18,6 +18,7 @@ package org.apache.beam.runners.core.construction.graph; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables.getOnlyElement; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.anyOf; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.containsInAnyOrder; @@ -27,7 +28,6 @@ import static org.hamcrest.Matchers.hasSize; import static org.hamcrest.Matchers.not; import static org.hamcrest.Matchers.nullValue; -import static org.junit.Assert.assertThat; import java.util.Collection; import java.util.HashSet; @@ -60,9 +60,6 @@ /** Tests for {@link GreedyPipelineFuser}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class GreedyPipelineFuserTest { // Contains the 'go' and 'py' environments, and a default 'impulse' step and output. private Components partialComponents; diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/GreedyStageFuserTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/GreedyStageFuserTest.java index 7049c0b024f1..b0d1e82cd0f7 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/GreedyStageFuserTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/GreedyStageFuserTest.java @@ -18,12 +18,12 @@ package org.apache.beam.runners.core.construction.graph; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables.getOnlyElement; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.emptyIterable; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.hasItem; import static org.hamcrest.Matchers.not; -import static org.junit.Assert.assertThat; import java.util.Collections; import java.util.Set; @@ -54,9 +54,6 @@ /** Tests for {@link GreedyStageFuser}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class GreedyStageFuserTest { @Rule public ExpectedException thrown = ExpectedException.none(); diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/ImmutableExecutableStageTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/ImmutableExecutableStageTest.java index 5d11bfc13b73..2554b8505fe9 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/ImmutableExecutableStageTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/ImmutableExecutableStageTest.java @@ -18,12 +18,12 @@ package org.apache.beam.runners.core.construction.graph; import static org.apache.beam.runners.core.construction.graph.ExecutableStage.DEFAULT_WIRE_CODER_SETTINGS; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.allOf; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.hasValue; import static org.hamcrest.Matchers.is; -import static org.junit.Assert.assertThat; import java.util.Collections; import org.apache.beam.model.pipeline.v1.RunnerApi; diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/NetworksTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/NetworksTest.java index 5ca18062e734..9a13909a0f25 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/NetworksTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/NetworksTest.java @@ -18,13 +18,13 @@ package org.apache.beam.runners.core.construction.graph; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.emptyIterable; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.greaterThan; import static org.hamcrest.collection.IsEmptyCollection.empty; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import java.util.Collections; import java.util.Comparator; diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/OutputDeduplicatorTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/OutputDeduplicatorTest.java index 93c8fd3f680f..c62554a6058d 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/OutputDeduplicatorTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/OutputDeduplicatorTest.java @@ -19,13 +19,13 @@ import static org.apache.beam.runners.core.construction.graph.ExecutableStage.DEFAULT_WIRE_CODER_SETTINGS; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables.getOnlyElement; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.empty; import static org.hamcrest.Matchers.equalTo; import static 
org.hamcrest.Matchers.hasEntry; import static org.hamcrest.Matchers.hasItems; import static org.hamcrest.Matchers.hasSize; -import static org.junit.Assert.assertThat; import java.util.ArrayList; import java.util.Collection; @@ -51,7 +51,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class OutputDeduplicatorTest { @Test diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/ProtoOverridesTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/ProtoOverridesTest.java index 97df308d3b4a..62774dda30a7 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/ProtoOverridesTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/ProtoOverridesTest.java @@ -17,11 +17,11 @@ */ package org.apache.beam.runners.core.construction.graph; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.hasKey; import static org.hamcrest.Matchers.not; -import static org.junit.Assert.assertThat; import java.nio.charset.StandardCharsets; import org.apache.beam.model.pipeline.v1.RunnerApi; @@ -36,7 +36,7 @@ import org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline; import org.apache.beam.model.pipeline.v1.RunnerApi.WindowingStrategy; import org.apache.beam.runners.core.construction.graph.ProtoOverrides.TransformReplacement; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.junit.Test; import org.junit.runner.RunWith; diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/QueryablePipelineTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/QueryablePipelineTest.java index 6de249b39fc0..c1fafdf73e91 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/QueryablePipelineTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/QueryablePipelineTest.java @@ -18,6 +18,7 @@ package org.apache.beam.runners.core.construction.graph; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables.getOnlyElement; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.emptyIterable; import static org.hamcrest.Matchers.equalTo; @@ -25,7 +26,6 @@ import static org.hamcrest.Matchers.hasSize; import static org.hamcrest.Matchers.is; import static org.hamcrest.Matchers.not; -import static org.junit.Assert.assertThat; import java.util.Collection; import java.util.Map; @@ -317,13 +317,13 @@ public void getProducer() { String impulseOutputName = getOnlyElement( PipelineNode.pTransform( - "BoundedRead/Impulse", components.getTransformsOrThrow("BoundedRead/Impulse")) + "BoundedRead-Impulse", components.getTransformsOrThrow("BoundedRead-Impulse")) .getTransform() .getOutputsMap() .values()); PTransformNode impulseProducer = PipelineNode.pTransform( - 
"BoundedRead/Impulse", components.getTransformsOrThrow("BoundedRead/Impulse")); + "BoundedRead-Impulse", components.getTransformsOrThrow("BoundedRead-Impulse")); PCollectionNode impulseOutput = PipelineNode.pCollection( impulseOutputName, components.getPcollectionsOrThrow(impulseOutputName)); @@ -355,7 +355,7 @@ public void getEnvironmentWithEnvironment() { PTransformNode environmentalTransform = PipelineNode.pTransform( - "ParDo/ParMultiDo(Test)", components.getTransformsOrThrow("ParDo/ParMultiDo(Test)")); + "ParDo-ParMultiDo-Test-", components.getTransformsOrThrow("ParDo-ParMultiDo-Test-")); PTransformNode nonEnvironmentalTransform = PipelineNode.pTransform("groupByKey", components.getTransformsOrThrow("groupByKey")); diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/resources/ClasspathScanningResourcesDetectorTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/resources/ClasspathScanningResourcesDetectorTest.java index 60481035cac0..afd0472946b7 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/resources/ClasspathScanningResourcesDetectorTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/resources/ClasspathScanningResourcesDetectorTest.java @@ -40,9 +40,6 @@ import org.junit.rules.TemporaryFolder; import org.mockito.Mockito; -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ClasspathScanningResourcesDetectorTest { @Rule public transient TemporaryFolder tmpFolder = new TemporaryFolder(); diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/resources/PipelineResourcesTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/resources/PipelineResourcesTest.java index 76608f0b1467..bb4d7db2eb0e 100644 --- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/resources/PipelineResourcesTest.java +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/resources/PipelineResourcesTest.java @@ -23,15 +23,16 @@ import static org.hamcrest.Matchers.empty; import static org.hamcrest.Matchers.not; import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertNotNull; import static org.junit.Assert.assertThrows; import java.io.File; import java.io.IOException; import java.net.URL; import java.net.URLClassLoader; -import java.util.ArrayList; import java.util.Arrays; import java.util.List; +import org.apache.beam.sdk.options.FileStagingOptions; import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.junit.Rule; import org.junit.Test; @@ -41,9 +42,6 @@ /** Tests for PipelineResources. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PipelineResourcesTest { @Rule public transient TemporaryFolder tmpFolder = new TemporaryFolder(); @@ -79,9 +77,8 @@ public void testDetectedResourcesListDoNotContainNotStageableResources() throws public void testFailOnNonExistingPaths() throws IOException { String nonexistentFilePath = tmpFolder.getRoot().getPath() + "/nonexistent/file"; String existingFilePath = tmpFolder.newFile("existingFile").getAbsolutePath(); - String temporaryLocation = tmpFolder.newFolder().getAbsolutePath(); - List filesToStage = Arrays.asList(nonexistentFilePath, existingFilePath); + String temporaryLocation = tmpFolder.newFolder().getAbsolutePath(); assertThrows( "To-be-staged file does not exist: ", @@ -92,11 +89,9 @@ public void testFailOnNonExistingPaths() throws IOException { @Test public void testPackagingDirectoryResourceToJarFile() throws IOException { String directoryPath = tmpFolder.newFolder().getAbsolutePath(); + List filesToStage = Arrays.asList(directoryPath); String temporaryLocation = tmpFolder.newFolder().getAbsolutePath(); - List filesToStage = new ArrayList<>(); - filesToStage.add(directoryPath); - List result = PipelineResources.prepareFilesForStaging(filesToStage, temporaryLocation); assertTrue(new File(result.get(0)).exists()); @@ -105,8 +100,58 @@ public void testPackagingDirectoryResourceToJarFile() throws IOException { @Test public void testIfThrowsWhenThereIsNoTemporaryFolderForJars() throws IOException { - List filesToStage = new ArrayList<>(); - filesToStage.add(tmpFolder.newFolder().getAbsolutePath()); + List filesToStage = Arrays.asList(tmpFolder.newFolder().getAbsolutePath()); + + IllegalArgumentException exception = + assertThrows( + IllegalArgumentException.class, + () -> PipelineResources.prepareFilesForStaging(filesToStage, null)); + + assertEquals( + "Please provide temporary location for storing the jar files.", exception.getMessage()); + } + + @Test + public void testPrepareFilesForStagingFromOptions() throws IOException { + String nonexistentFilePath = tmpFolder.getRoot().getPath() + "/nonexistent/file"; + String existingFilePath = tmpFolder.newFile("existingFile").getAbsolutePath(); + List filesToStage = Arrays.asList(nonexistentFilePath, existingFilePath); + String temporaryLocation = tmpFolder.newFolder().getAbsolutePath(); + + FileStagingOptions options = PipelineOptionsFactory.create().as(FileStagingOptions.class); + options.setFilesToStage(filesToStage); + options.setTempLocation(temporaryLocation); + + assertThrows( + "To-be-staged file does not exist: ", + IllegalStateException.class, + () -> PipelineResources.prepareFilesForStaging(filesToStage, temporaryLocation)); + } + + @Test + public void testPackagingDirectoryResourceFromOptions() throws IOException { + String directoryPath = tmpFolder.newFolder().getAbsolutePath(); + List filesToStage = Arrays.asList(directoryPath); + String temporaryLocation = tmpFolder.newFolder().getAbsolutePath(); + + FileStagingOptions options = PipelineOptionsFactory.create().as(FileStagingOptions.class); + options.setFilesToStage(filesToStage); + options.setTempLocation(temporaryLocation); + + PipelineResources.prepareFilesForStaging(options); + List result = options.getFilesToStage(); + + assertEquals(1, result.size()); + assertTrue(new File(result.get(0)).exists()); + assertTrue(result.get(0).matches(".*\\.jar")); + } + + @Test + public void testIfThrowsWhenThereIsNoTemporaryFolderForJarsFromOptions() 
throws IOException { + List filesToStage = Arrays.asList(tmpFolder.newFolder().getAbsolutePath()); + + FileStagingOptions options = PipelineOptionsFactory.create().as(FileStagingOptions.class); + options.setFilesToStage(filesToStage); IllegalArgumentException exception = assertThrows( @@ -116,4 +161,18 @@ public void testIfThrowsWhenThereIsNoTemporaryFolderForJars() throws IOException assertEquals( "Please provide temporary location for storing the jar files.", exception.getMessage()); } + + @Test + public void testPrepareFilesForStagingUndefinedFilesToStage() throws IOException { + String temporaryLocation = tmpFolder.newFolder().getAbsolutePath(); + + FileStagingOptions options = PipelineOptionsFactory.create().as(FileStagingOptions.class); + options.setTempLocation(temporaryLocation); + + PipelineResources.prepareFilesForStaging(options); + List result = options.getFilesToStage(); + + assertNotNull(result); + assertTrue(result.size() > 0); + } } diff --git a/runners/core-java/build.gradle b/runners/core-java/build.gradle index 3d57e54a4d16..e8ef11bf1b97 100644 --- a/runners/core-java/build.gradle +++ b/runners/core-java/build.gradle @@ -41,17 +41,16 @@ test { dependencies { compile project(path: ":model:pipeline", configuration: "shadow") compile project(path: ":sdks:java:core", configuration: "shadow") - compile project(path: ":model:fn-execution", configuration: "shadow") + compile project(path: ":model:job-management", configuration: "shadow") compile project(":runners:core-construction-java") compile project(":sdks:java:fn-execution") compile library.java.vendored_guava_26_0_jre compile library.java.joda_time + compile library.java.vendored_grpc_1_36_0 + compile library.java.slf4j_api testCompile project(path: ":sdks:java:core", configuration: "shadowTest") testCompile library.java.junit - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library testCompile library.java.mockito_core testCompile library.java.slf4j_api - testCompile library.java.jackson_dataformat_yaml testRuntimeOnly library.java.slf4j_simple } diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/Concatenate.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/Concatenate.java new file mode 100644 index 000000000000..c61152e7b93d --- /dev/null +++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/Concatenate.java @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
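The new PipelineResourcesTest cases above exercise an options-driven overload, PipelineResources.prepareFilesForStaging(FileStagingOptions). A minimal sketch of that flow follows; the class name and paths are placeholders (the files would have to exist for the call to succeed):

    import java.util.Arrays;
    import java.util.List;
    import org.apache.beam.runners.core.construction.resources.PipelineResources;
    import org.apache.beam.sdk.options.FileStagingOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class StageFilesFromOptionsExample {
      public static void main(String[] args) {
        FileStagingOptions options = PipelineOptionsFactory.create().as(FileStagingOptions.class);
        // Directories listed in filesToStage are packaged into jars under tempLocation.
        options.setFilesToStage(Arrays.asList("/path/to/extra-lib.jar", "/path/to/classes-dir"));
        options.setTempLocation("/tmp/staging");

        PipelineResources.prepareFilesForStaging(options);

        // filesToStage is updated in place with the prepared artifacts.
        List<String> prepared = options.getFilesToStage();
        prepared.forEach(System.out::println);
      }
    }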
+ */
+package org.apache.beam.runners.core;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.coders.CoderRegistry;
+import org.apache.beam.sdk.coders.ListCoder;
+import org.apache.beam.sdk.transforms.Combine;
+import org.apache.beam.sdk.transforms.GroupByKey;
+import org.apache.beam.sdk.values.PCollection;
+
+/**
+ * Combiner that combines {@code T}s into a single {@code List<T>} containing all inputs.
+ *
+ * <p>For internal use to translate {@link GroupByKey}. For a large {@link PCollection} this is
+ * expected to crash!
+ *
+ * <p>
    This is copied from the dataflow runner code. + * + * @param the type of elements to concatenate. + */ +public class Concatenate extends Combine.CombineFn, List> { + @Override + public List createAccumulator() { + return new ArrayList<>(); + } + + @Override + public List addInput(List accumulator, T input) { + accumulator.add(input); + return accumulator; + } + + @Override + public List mergeAccumulators(Iterable> accumulators) { + List result = createAccumulator(); + for (List accumulator : accumulators) { + result.addAll(accumulator); + } + return result; + } + + @Override + public List extractOutput(List accumulator) { + return accumulator; + } + + @Override + public Coder> getAccumulatorCoder(CoderRegistry registry, Coder inputCoder) { + return ListCoder.of(inputCoder); + } + + @Override + public Coder> getDefaultOutputCoder(CoderRegistry registry, Coder inputCoder) { + return ListCoder.of(inputCoder); + } +} diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/InMemoryStateInternals.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/InMemoryStateInternals.java index d69c4807fe62..cd1f9824ac3b 100644 --- a/runners/core-java/src/main/java/org/apache/beam/runners/core/InMemoryStateInternals.java +++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/InMemoryStateInternals.java @@ -27,6 +27,7 @@ import java.util.NavigableMap; import java.util.Objects; import java.util.Set; +import java.util.function.Function; import java.util.stream.Collectors; import javax.annotation.Nullable; import org.apache.beam.runners.core.StateTag.StateBinder; @@ -55,6 +56,9 @@ import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; +import org.checkerframework.checker.initialization.qual.Initialized; +import org.checkerframework.checker.nullness.qual.NonNull; +import org.checkerframework.checker.nullness.qual.UnknownKeyFor; import org.joda.time.Instant; /** @@ -641,7 +645,23 @@ public void clear() { @Override public ReadableState get(K key) { - return ReadableStates.immediate(contents.get(key)); + return getOrDefault(key, null); + } + + @Override + public @UnknownKeyFor @NonNull @Initialized ReadableState getOrDefault( + K key, @Nullable V defaultValue) { + return new ReadableState() { + @Override + public @org.checkerframework.checker.nullness.qual.Nullable V read() { + return contents.getOrDefault(key, defaultValue); + } + + @Override + public @UnknownKeyFor @NonNull @Initialized ReadableState readLater() { + return this; + } + }; } @Override @@ -650,10 +670,11 @@ public void put(K key, V value) { } @Override - public ReadableState putIfAbsent(K key, V value) { + public ReadableState computeIfAbsent( + K key, Function mappingFunction) { V v = contents.get(key); if (v == null) { - v = contents.put(key, value); + v = contents.put(key, mappingFunction.apply(key)); } return ReadableStates.immediate(v); @@ -701,6 +722,23 @@ public ReadableState>> entries() { return CollectionViewState.of(contents.entrySet()); } + @Override + public @UnknownKeyFor @NonNull @Initialized ReadableState< + @UnknownKeyFor @NonNull @Initialized Boolean> + isEmpty() { + return new ReadableState() { + @Override + public @org.checkerframework.checker.nullness.qual.Nullable Boolean read() { + return contents.isEmpty(); + } + + @Override + public @UnknownKeyFor @NonNull @Initialized ReadableState readLater() { 
+ return this; + } + }; + } + @Override public boolean isCleared() { return contents.isEmpty(); diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/InMemoryTimerInternals.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/InMemoryTimerInternals.java index 8be9081efbc2..863a92b591ad 100644 --- a/runners/core-java/src/main/java/org/apache/beam/runners/core/InMemoryTimerInternals.java +++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/InMemoryTimerInternals.java @@ -63,6 +63,9 @@ public class InMemoryTimerInternals implements TimerInternals { /** Current synchronized processing time. */ private Instant synchronizedProcessingTime = BoundedWindow.TIMESTAMP_MIN_VALUE; + /** Class.getSimpleName() cached to avoid allocations for tracing. */ + private static final String SIMPLE_NAME = InMemoryTimerInternals.class.getSimpleName(); + @Override public @Nullable Instant currentOutputWatermarkTime() { return outputWatermarkTime; @@ -125,17 +128,12 @@ public void setTimer( @Deprecated @Override public void setTimer(TimerData timerData) { - WindowTracing.trace("{}.setTimer: {}", getClass().getSimpleName(), timerData); + WindowTracing.trace("{}.setTimer: {}", SIMPLE_NAME, timerData); - @Nullable - TimerData existing = - existingTimers.get( - timerData.getNamespace(), timerData.getTimerId() + '+' + timerData.getTimerFamilyId()); + @Nullable String colKey = timerData.getTimerId() + '+' + timerData.getTimerFamilyId(); + TimerData existing = existingTimers.get(timerData.getNamespace(), colKey); if (existing == null) { - existingTimers.put( - timerData.getNamespace(), - timerData.getTimerId() + '+' + timerData.getTimerFamilyId(), - timerData); + existingTimers.put(timerData.getNamespace(), colKey, timerData); timersForDomain(timerData.getDomain()).add(timerData); } else { checkArgument( @@ -149,16 +147,14 @@ public void setTimer(TimerData timerData) { NavigableSet timers = timersForDomain(timerData.getDomain()); timers.remove(existing); timers.add(timerData); - existingTimers.put( - timerData.getNamespace(), - timerData.getTimerId() + '+' + timerData.getTimerFamilyId(), - timerData); + existingTimers.put(timerData.getNamespace(), colKey, timerData); } } } @Override - public void deleteTimer(StateNamespace namespace, String timerId, TimeDomain timeDomain) { + public void deleteTimer( + StateNamespace namespace, String timerId, String timerFamilyId, TimeDomain timeDomain) { throw new UnsupportedOperationException("Canceling a timer by ID is not yet supported."); } @@ -216,7 +212,7 @@ public void advanceInputWatermark(Instant newInputWatermark) throws Exception { newInputWatermark); WindowTracing.trace( "{}.advanceInputWatermark: from {} to {}", - getClass().getSimpleName(), + SIMPLE_NAME, inputWatermarkTime, newInputWatermark); inputWatermarkTime = newInputWatermark; @@ -229,7 +225,7 @@ public void advanceOutputWatermark(Instant newOutputWatermark) { if (newOutputWatermark.isAfter(inputWatermarkTime)) { WindowTracing.trace( "{}.advanceOutputWatermark: clipping output watermark from {} to {}", - getClass().getSimpleName(), + SIMPLE_NAME, newOutputWatermark, inputWatermarkTime); adjustedOutputWatermark = inputWatermarkTime; @@ -244,7 +240,7 @@ public void advanceOutputWatermark(Instant newOutputWatermark) { adjustedOutputWatermark); WindowTracing.trace( "{}.advanceOutputWatermark: from {} to {}", - getClass().getSimpleName(), + SIMPLE_NAME, outputWatermarkTime, adjustedOutputWatermark); outputWatermarkTime = adjustedOutputWatermark; @@ -259,10 +255,7 @@ public 
void advanceProcessingTime(Instant newProcessingTime) throws Exception { processingTime, newProcessingTime); WindowTracing.trace( - "{}.advanceProcessingTime: from {} to {}", - getClass().getSimpleName(), - processingTime, - newProcessingTime); + "{}.advanceProcessingTime: from {} to {}", SIMPLE_NAME, processingTime, newProcessingTime); processingTime = newProcessingTime; } @@ -277,7 +270,7 @@ public void advanceSynchronizedProcessingTime(Instant newSynchronizedProcessingT newSynchronizedProcessingTime); WindowTracing.trace( "{}.advanceProcessingTime: from {} to {}", - getClass().getSimpleName(), + SIMPLE_NAME, synchronizedProcessingTime, newSynchronizedProcessingTime); synchronizedProcessingTime = newSynchronizedProcessingTime; @@ -288,10 +281,7 @@ public void advanceSynchronizedProcessingTime(Instant newSynchronizedProcessingT TimerData timer = removeNextTimer(inputWatermarkTime, TimeDomain.EVENT_TIME); if (timer != null) { WindowTracing.trace( - "{}.removeNextEventTimer: firing {} at {}", - getClass().getSimpleName(), - timer, - inputWatermarkTime); + "{}.removeNextEventTimer: firing {} at {}", SIMPLE_NAME, timer, inputWatermarkTime); } return timer; } @@ -301,10 +291,7 @@ public void advanceSynchronizedProcessingTime(Instant newSynchronizedProcessingT TimerData timer = removeNextTimer(processingTime, TimeDomain.PROCESSING_TIME); if (timer != null) { WindowTracing.trace( - "{}.removeNextProcessingTimer: firing {} at {}", - getClass().getSimpleName(), - timer, - processingTime); + "{}.removeNextProcessingTimer: firing {} at {}", SIMPLE_NAME, timer, processingTime); } return timer; } @@ -316,7 +303,7 @@ public void advanceSynchronizedProcessingTime(Instant newSynchronizedProcessingT if (timer != null) { WindowTracing.trace( "{}.removeNextSynchronizedProcessingTimer: firing {} at {}", - getClass().getSimpleName(), + SIMPLE_NAME, timer, synchronizedProcessingTime); } diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/LateDataDroppingDoFnRunner.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/LateDataDroppingDoFnRunner.java index 7bac3d1fc4db..805735a19d41 100644 --- a/runners/core-java/src/main/java/org/apache/beam/runners/core/LateDataDroppingDoFnRunner.java +++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/LateDataDroppingDoFnRunner.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.core; -import java.util.stream.Collectors; -import java.util.stream.StreamSupport; +import java.util.ArrayList; +import java.util.List; import org.apache.beam.sdk.metrics.Counter; import org.apache.beam.sdk.metrics.Metrics; import org.apache.beam.sdk.state.TimeDomain; @@ -29,7 +29,6 @@ import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.WindowingStrategy; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.joda.time.Instant; /** @@ -124,52 +123,37 @@ public LateDataFilter( */ public Iterable> filter( final K key, Iterable> elements) { - Iterable>> windowsExpandedElements = - StreamSupport.stream(elements.spliterator(), false) - .map( - input -> - input.getWindows().stream() - .map( - window -> - WindowedValue.of( - input.getValue(), - input.getTimestamp(), - window, - input.getPane())) - .collect(Collectors.toList())) - .collect(Collectors.toList()); - Iterable> concatElements = Iterables.concat(windowsExpandedElements); - - // Bump the counter separately since we don't want multiple 
iterations to - // increase it multiple times. - for (WindowedValue input : concatElements) { - BoundedWindow window = Iterables.getOnlyElement(input.getWindows()); - if (canDropDueToExpiredWindow(window)) { - // The element is too late for this window. - droppedDueToLateness.inc(); - WindowTracing.debug( - "{}: Dropping element at {} for key:{}; window:{} " - + "since too far behind inputWatermark:{}; outputWatermark:{}", - LateDataFilter.class.getSimpleName(), - input.getTimestamp(), - key, - window, - timerInternals.currentInputWatermarkTime(), - timerInternals.currentOutputWatermarkTime()); + final List> nonLateElements = new ArrayList<>(); + for (WindowedValue element : elements) { + for (BoundedWindow window : element.getWindows()) { + if (canDropDueToExpiredWindow(window)) { + // The element is too late for this window. + droppedDueToLateness.inc(); + WindowTracing.debug( + "{}: Dropping element at {} for key:{}; window:{} " + + "since too far behind inputWatermark:{}; outputWatermark:{}", + LateDataFilter.class.getSimpleName(), + element.getTimestamp(), + key, + window, + timerInternals.currentInputWatermarkTime(), + timerInternals.currentOutputWatermarkTime()); + } else { + nonLateElements.add( + WindowedValue.of( + element.getValue(), element.getTimestamp(), window, element.getPane())); + } } } - - // return nonLateElements - return StreamSupport.stream(concatElements.spliterator(), false) - .filter( - input -> { - BoundedWindow window = Iterables.getOnlyElement(input.getWindows()); - return !canDropDueToExpiredWindow(window); - }) - .collect(Collectors.toList()); + return nonLateElements; } - /** Is {@code window} expired w.r.t. the garbage collection watermark? */ + /** + * Is {@code window} expired w.r.t. the garbage collection watermark? + * + * @param window Window to check for expiration. + * @return True if element can be dropped. + */ private boolean canDropDueToExpiredWindow(BoundedWindow window) { Instant inputWM = timerInternals.currentInputWatermarkTime(); return LateDataUtils.garbageCollectionTime(window, windowingStrategy).isBefore(inputWM); diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/ReduceFnRunner.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/ReduceFnRunner.java index 57c22b192427..e5de34d9bc1c 100644 --- a/runners/core-java/src/main/java/org/apache/beam/runners/core/ReduceFnRunner.java +++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/ReduceFnRunner.java @@ -257,7 +257,7 @@ public ReduceFnRunner( } private ActiveWindowSet createActiveWindowSet() { - return windowingStrategy.getWindowFn().isNonMerging() + return !windowingStrategy.needsMerge() ? 
new NonMergingActiveWindowSet<>() : new MergingActiveWindowSet<>(windowingStrategy.getWindowFn(), stateInternals); } @@ -698,9 +698,8 @@ public void onTimers(Iterable timers) throws Exception { W window = windowNamespace.getWindow(); WindowTracing.debug( - "{}: Received timer key:{}; window:{}; data:{} with " + "ReduceFnRunner: Received timer key:{}; window:{}; data:{} with " + "inputWatermark:{}; outputWatermark:{}", - ReduceFnRunner.class.getSimpleName(), key, window, timer, @@ -756,8 +755,7 @@ public void onTimers(Iterable timers) throws Exception { if (windowActivation.isGarbageCollection) { WindowTracing.debug( - "{}: Cleaning up for key:{}; window:{} with inputWatermark:{}; outputWatermark:{}", - ReduceFnRunner.class.getSimpleName(), + "ReduceFnRunner: Cleaning up for key:{}; window:{} with inputWatermark:{}; outputWatermark:{}", key, directContext.window(), timerInternals.currentInputWatermarkTime(), @@ -880,7 +878,7 @@ private void clearAllState( // - A pane has fired. // - But the trigger is not (yet) closed. if (windowingStrategy.getMode() == AccumulationMode.DISCARDING_FIRED_PANES - && !windowingStrategy.getWindowFn().isNonMerging()) { + && windowingStrategy.needsMerge()) { watermarkHold.clearHolds(directContext); } } diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java index cb7bf62c9198..bfa43986eef4 100644 --- a/runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java +++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java @@ -1124,14 +1124,13 @@ public TimerInternalsTimer( @Override public void set(Instant target) { this.target = target; - verifyAbsoluteTimeDomain(); setAndVerifyOutputTimestamp(); setUnderlyingTimer(); } @Override public void setRelative() { - Instant now = getCurrentTime(); + Instant now = getCurrentRelativeTime(); if (period.equals(Duration.ZERO)) { target = now.plus(offset); } else { @@ -1144,6 +1143,11 @@ public void setRelative() { setUnderlyingTimer(); } + @Override + public void clear() { + timerInternals.deleteTimer(namespace, timerId, timerFamilyId, spec.getTimeDomain()); + } + @Override public Timer offset(Duration offset) { this.offset = offset; @@ -1176,14 +1180,6 @@ public Timer withOutputTimestamp(Instant outputTimestamp) { return this; } - /** Verifies that the time domain of this timer is acceptable for absolute timers. */ - private void verifyAbsoluteTimeDomain() { - if (!TimeDomain.EVENT_TIME.equals(spec.getTimeDomain())) { - throw new IllegalStateException( - "Cannot only set relative timers in processing time domain." 
+ " Use #setRelative()"); - } - } - /** * * @@ -1219,11 +1215,11 @@ private void setAndVerifyOutputTimestamp() { Instant windowExpiry = window.maxTimestamp().plus(allowedLateness); if (TimeDomain.EVENT_TIME.equals(spec.getTimeDomain())) { checkArgument( - !outputTimestamp.isAfter(target), + !outputTimestamp.isAfter(windowExpiry), "Attempted to set an event-time timer with an output timestamp of %s that is" - + " after the timer firing timestamp %s", + + " after the expiration of window %s", outputTimestamp, - target); + windowExpiry); checkArgument( !target.isAfter(windowExpiry), "Attempted to set an event-time timer with a firing timestamp of %s that is" @@ -1250,8 +1246,13 @@ private void setUnderlyingTimer() { namespace, timerId, timerFamilyId, target, outputTimestamp, spec.getTimeDomain()); } - private Instant getCurrentTime() { - switch (spec.getTimeDomain()) { + @Override + public Instant getCurrentRelativeTime() { + return getCurrentTime(spec.getTimeDomain()); + } + + private Instant getCurrentTime(TimeDomain timeDomain) { + switch (timeDomain) { case EVENT_TIME: return timerInternals.currentInputWatermarkTime(); case PROCESSING_TIME: diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/SplittableParDoViaKeyedWorkItems.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/SplittableParDoViaKeyedWorkItems.java index a9856f0f2721..54561e7a34cb 100644 --- a/runners/core-java/src/main/java/org/apache/beam/runners/core/SplittableParDoViaKeyedWorkItems.java +++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/SplittableParDoViaKeyedWorkItems.java @@ -319,9 +319,9 @@ public Coder getRestrictionCoder() { } @Setup - public void setup() throws Exception { + public void setup(PipelineOptions options) throws Exception { invoker = DoFnInvokers.invokerFor(fn); - invoker.invokeSetup(); + invoker.invokeSetup(wrapOptionsAsSetup(options)); } @Teardown @@ -548,6 +548,21 @@ public String getErrorContext() { stateNamespace, wakeupTime, wakeupTime, TimeDomain.PROCESSING_TIME)); } + private DoFnInvoker.ArgumentProvider wrapOptionsAsSetup( + final PipelineOptions options) { + return new BaseArgumentProvider() { + @Override + public PipelineOptions pipelineOptions() { + return options; + } + + @Override + public String getErrorContext() { + return "SplittableParDoViaKeyedWorkItems/Setup"; + } + }; + } + private DoFnInvoker.ArgumentProvider wrapContextAsStartBundle( final StartBundleContext baseContext) { return new BaseArgumentProvider() { diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/StateTags.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/StateTags.java index e185b0a0079e..acee02b84834 100644 --- a/runners/core-java/src/main/java/org/apache/beam/runners/core/StateTags.java +++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/StateTags.java @@ -239,6 +239,12 @@ public static StateTag> convertToBagT StateSpecs.convertToBagSpecInternal(combiningTag.getSpec())); } + public static StateTag> convertToMapTagInternal( + StateTag> setTag) { + return new SimpleStateTag<>( + new StructuredId(setTag.getId()), StateSpecs.convertToMapSpecInternal(setTag.getSpec())); + } + private static class StructuredId implements Serializable { private final StateKind kind; private final String rawId; diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/TimerInternals.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/TimerInternals.java index 05d882793056..965be822e0d3 100644 --- 
a/runners/core-java/src/main/java/org/apache/beam/runners/core/TimerInternals.java +++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/TimerInternals.java @@ -76,13 +76,14 @@ void setTimer( * manage timers for different time domains in very different ways, thus the {@link TimeDomain} is * a required parameter. */ - void deleteTimer(StateNamespace namespace, String timerId, TimeDomain timeDomain); + void deleteTimer( + StateNamespace namespace, String timerId, String timerFamilyId, TimeDomain timeDomain); - /** @deprecated use {@link #deleteTimer(StateNamespace, String, TimeDomain)}. */ + /** @deprecated use {@link #deleteTimer(StateNamespace, String, String, TimeDomain)}. */ @Deprecated void deleteTimer(StateNamespace namespace, String timerId, String timerFamilyId); - /** @deprecated use {@link #deleteTimer(StateNamespace, String, TimeDomain)}. */ + /** @deprecated use {@link #deleteTimer(StateNamespace, String, String, TimeDomain)}. */ @Deprecated void deleteTimer(TimerData timerKey); @@ -185,6 +186,8 @@ abstract class TimerData implements Comparable { public abstract TimeDomain getDomain(); + public abstract boolean getDeleted(); + // When adding a new field, make sure to add it to the compareTo() method. /** Construct a {@link TimerData} for the given parameters. */ @@ -195,7 +198,7 @@ public static TimerData of( Instant outputTimestamp, TimeDomain domain) { return new AutoValue_TimerInternals_TimerData( - timerId, "", namespace, timestamp, outputTimestamp, domain); + timerId, "", namespace, timestamp, outputTimestamp, domain, false); } /** @@ -210,7 +213,7 @@ public static TimerData of( Instant outputTimestamp, TimeDomain domain) { return new AutoValue_TimerInternals_TimerData( - timerId, timerFamilyId, namespace, timestamp, outputTimestamp, domain); + timerId, timerFamilyId, namespace, timestamp, outputTimestamp, domain, false); } /** @@ -228,6 +231,17 @@ public static TimerData of( return of(timerId, namespace, timestamp, outputTimestamp, domain); } + public TimerData deleted() { + return new AutoValue_TimerInternals_TimerData( + getTimerId(), + getTimerFamilyId(), + getNamespace(), + getTimestamp(), + getOutputTimestamp(), + getDomain(), + true); + } + /** * {@inheritDoc}. * @@ -242,6 +256,7 @@ public int compareTo(TimerData that) { } ComparisonChain chain = ComparisonChain.start() + .compare(this.getDeleted(), that.getDeleted()) .compare(this.getTimestamp(), that.getTimestamp()) .compare(this.getOutputTimestamp(), that.getOutputTimestamp()) .compare(this.getDomain(), that.getDomain()) diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/WatermarkHold.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/WatermarkHold.java index 099d5dbd5aac..e1606db67d42 100644 --- a/runners/core-java/src/main/java/org/apache/beam/runners/core/WatermarkHold.java +++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/WatermarkHold.java @@ -113,10 +113,7 @@ public WatermarkHold(TimerInternals timerInternals, WindowingStrategy wind * output time function. 
*/ private Instant shift(Instant timestamp, W window) { - Instant shifted = - windowingStrategy - .getTimestampCombiner() - .assign(window, windowingStrategy.getWindowFn().getOutputTime(timestamp, window)); + Instant shifted = windowingStrategy.getTimestampCombiner().assign(window, timestamp); // Don't call checkState(), to avoid calling BoundedWindow.formatTimestamp() every time if (shifted.isBefore(timestamp)) { throw new IllegalStateException( diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/ExecutionStateSampler.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/ExecutionStateSampler.java index 8ecbc90464b0..2d39b377b576 100644 --- a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/ExecutionStateSampler.java +++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/ExecutionStateSampler.java @@ -20,14 +20,15 @@ import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; import java.io.Closeable; +import java.util.HashSet; import java.util.concurrent.CancellationException; -import java.util.concurrent.ConcurrentSkipListSet; import java.util.concurrent.ExecutionException; import java.util.concurrent.ExecutorService; import java.util.concurrent.Executors; import java.util.concurrent.Future; import java.util.concurrent.TimeUnit; import java.util.concurrent.TimeoutException; +import javax.annotation.concurrent.GuardedBy; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.ThreadFactoryBuilder; import org.checkerframework.checker.nullness.qual.Nullable; @@ -36,8 +37,10 @@ /** Monitors the execution of one or more execution threads. */ public class ExecutionStateSampler { - private final ConcurrentSkipListSet activeTrackers = - new ConcurrentSkipListSet<>(); + // We use a synchronized data structure (as opposed to a concurrent one) since synchronization + // is necessary to prevent races between tracker removal and the sampling thread iteration. + @GuardedBy("this") + private final HashSet activeTrackers = new HashSet<>(); private static final MillisProvider SYSTEM_MILLIS_PROVIDER = System::currentTimeMillis; @@ -65,10 +68,16 @@ public static ExecutionStateSampler newForTest(MillisProvider clock) { return new ExecutionStateSampler(checkNotNull(clock)); } - private static final long PERIOD_MS = 200; + // The sampling period can be reset with flag --experiment state_sampling_period_millis=. + private static long periodMs = 200; private @Nullable Future executionSamplerFuture = null; + /** Set the state sampler sampling period. */ + public static void setSamplingPeriod(long samplingPeriodMillis) { + periodMs = samplingPeriodMillis; + } + /** Reset the state sampler. */ public void reset() { lastSampleTimeMillis = 0; @@ -78,8 +87,6 @@ public void reset() { * Called to start the ExecutionStateSampler. Until the returned {@link Closeable} is closed, the * state sampler will periodically sample the current state of all the threads it has been asked * to manage. - * - *
    */ public void start() { start( @@ -101,7 +108,7 @@ synchronized void start(ExecutorService executor) { executor.submit( () -> { lastSampleTimeMillis = clock.getMillis(); - long targetTimeMillis = lastSampleTimeMillis + PERIOD_MS; + long targetTimeMillis = lastSampleTimeMillis + periodMs; while (!Thread.interrupted()) { long currentTimeMillis = clock.getMillis(); long difference = targetTimeMillis - currentTimeMillis; @@ -115,7 +122,7 @@ synchronized void start(ExecutorService executor) { // Call doSampling if more than PERIOD_MS have passed. doSampling(currentTimeMillis - lastSampleTimeMillis); lastSampleTimeMillis = currentTimeMillis; - targetTimeMillis = lastSampleTimeMillis + PERIOD_MS; + targetTimeMillis = lastSampleTimeMillis + periodMs; } } return null; @@ -129,7 +136,7 @@ public synchronized void stop() { executionSamplerFuture.cancel(true); try { - executionSamplerFuture.get(5 * PERIOD_MS, TimeUnit.MILLISECONDS); + executionSamplerFuture.get(5 * periodMs, TimeUnit.MILLISECONDS); } catch (CancellationException e) { // This was expected -- we were cancelling the thread. } catch (InterruptedException | TimeoutException e) { @@ -142,19 +149,18 @@ public synchronized void stop() { } } - void addTracker(ExecutionStateTracker tracker) { + /** Add the tracker to the sampling set. */ + synchronized void addTracker(ExecutionStateTracker tracker) { this.activeTrackers.add(tracker); } - /** - * Deregister tracker after MapTask completes. - * - *
    This method needs to be synchronized to prevent race condition with sampling thread - */ - synchronized void removeTracker(ExecutionStateTracker tracker) { - this.activeTrackers.remove(tracker); + /** Remove the tracker from the sampling set. */ + void removeTracker(ExecutionStateTracker tracker) { + synchronized (this) { + activeTrackers.remove(tracker); + } - // Attribute any remaining time since last sampling on deregisteration. + // Attribute any remaining time since the last sampling while removing the tracker. // // There is a race condition here; if sampling happens in the time between when we remove the // tracker from activeTrackers and read the lastSampleTicks value, the sampling time will @@ -166,11 +172,7 @@ synchronized void removeTracker(ExecutionStateTracker tracker) { } } - /** - * Attributing sampling time to trackers. - * - *
    This method needs to be synchronized to prevent race condition with removing tracker - */ + /** Attributing sampling time to trackers. */ @VisibleForTesting public synchronized void doSampling(long millisSinceLastSample) { for (ExecutionStateTracker tracker : activeTrackers) { diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/ExecutionStateTracker.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/ExecutionStateTracker.java index b5dddad89bb2..fccad31da2cf 100644 --- a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/ExecutionStateTracker.java +++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/ExecutionStateTracker.java @@ -41,7 +41,7 @@ public class ExecutionStateTracker implements Comparable * don't use a ThreadLocal to allow testing the implementation of this class without having to run * from multiple threads. */ - private static final Map CURRENT_TRACKERS = + private static final Map CURRENT_TRACKERS = new ConcurrentHashMap<>(); private static final long LULL_REPORT_MS = TimeUnit.MINUTES.toMillis(5); @@ -141,14 +141,16 @@ public ExecutionStateTracker(ExecutionStateSampler sampler) { } /** Reset the execution status. */ - public void reset() { - trackedThread = null; + public synchronized void reset() { + if (trackedThread != null) { + CURRENT_TRACKERS.remove(trackedThread.getId()); + trackedThread = null; + } currentState = null; numTransitions = 0; millisSinceLastTransition = 0; transitionsAtLastSample = 0; nextLullReportMs = LULL_REPORT_MS; - CURRENT_TRACKERS.entrySet().removeIf(entry -> entry.getValue() == this); } @VisibleForTesting @@ -180,7 +182,17 @@ public int compareTo(ExecutionStateTracker o) { * either is no current state or if the current thread is not currently tracking the state. */ public static @Nullable ExecutionState getCurrentExecutionState() { - ExecutionStateTracker tracker = CURRENT_TRACKERS.get(Thread.currentThread()); + ExecutionStateTracker tracker = CURRENT_TRACKERS.get(Thread.currentThread().getId()); + return tracker == null ? null : tracker.currentState; + } + + /** + * Return the current {@link ExecutionState} of the thread with thread id, or {@code null} if + * there either is no current state or if the corresponding thread is not currently tracking the + * state. + */ + public static @Nullable ExecutionState getCurrentExecutionState(long threadId) { + ExecutionStateTracker tracker = CURRENT_TRACKERS.get(threadId); return tracker == null ? 
null : tracker.currentState; } @@ -203,7 +215,7 @@ public synchronized Closeable activate(Thread thread) { checkState( trackedThread == null, "Cannot activate an ExecutionStateTracker that is already in use."); - ExecutionStateTracker other = CURRENT_TRACKERS.put(thread, this); + ExecutionStateTracker other = CURRENT_TRACKERS.put(thread.getId(), this); checkState( other == null, "Execution state of thread {} was already being tracked by {}", @@ -224,7 +236,9 @@ public Thread getTrackedThread() { private synchronized void deactivate() { sampler.removeTracker(this); Thread thread = this.trackedThread; - CURRENT_TRACKERS.remove(thread); + if (thread != null) { + CURRENT_TRACKERS.remove(thread.getId()); + } this.trackedThread = null; } diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/GcpResourceIdentifiers.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/GcpResourceIdentifiers.java new file mode 100644 index 000000000000..4c388bf539cb --- /dev/null +++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/GcpResourceIdentifiers.java @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.metrics; + +/** + * Helper functions to generate resource labels strings for GCP entitites These can be used on + * MonitoringInfo 'resource' labels. See example entities: + * + *
    https://s.apache.org/beam-gcp-debuggability For GCP entities, populate the RESOURCE label with + * the aip.dev/122 format: https://google.aip.dev/122 If an official GCP format does not exist, try + * to use the following format. //whatever.googleapis.com/parents/{parentId}/whatevers/{whateverId} + */ +public class GcpResourceIdentifiers { + + public static String bigQueryTable(String projectId, String datasetId, String tableId) { + return String.format( + "//bigquery.googleapis.com/projects/%s/datasets/%s/tables/%s", + projectId, datasetId, tableId); + } + + public static String bigtableTableID(String project, String instance, String table) { + return String.format("projects/%s/instances/%s/tables/%s", project, instance, table); + } + + public static String bigtableResource(String projectId, String instanceId, String tableId) { + return String.format( + "//bigtable.googleapis.com/projects/%s/instances/%s/tables/%s", + projectId, instanceId, tableId); + } + + public static String cloudStorageBucket(String bucketId) { + return String.format("//storage.googleapis.com/buckets/%s", bucketId); + } + + public static String datastoreResource(String projectId, String namespace) { + return String.format( + "//bigtable.googleapis.com/projects/%s/namespaces/%s", projectId, namespace); + } +} diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/HistogramCell.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/HistogramCell.java new file mode 100644 index 000000000000..9f3238255e9d --- /dev/null +++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/HistogramCell.java @@ -0,0 +1,122 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.metrics; + +import java.util.Objects; +import org.apache.beam.sdk.metrics.MetricName; +import org.apache.beam.sdk.metrics.MetricsContainer; +import org.apache.beam.sdk.util.HistogramData; +import org.apache.beam.sdk.values.KV; +import org.checkerframework.checker.nullness.qual.Nullable; + +/** + * Tracks the current value (and delta) for a Histogram metric. + * + *
    This class generally shouldn't be used directly. The only exception is within a runner where a + * histogram is being reported for a specific step (rather than the histogram in the current + * context). In that case retrieving the underlying cell and reporting directly to it avoids a step + * of indirection. + */ +public class HistogramCell + implements org.apache.beam.sdk.metrics.Histogram, MetricCell { + + private final DirtyState dirty = new DirtyState(); + private final HistogramData value; + private final MetricName name; + + /** + * Generally, runners should construct instances using the methods in {@link + * MetricsContainerImpl}, unless they need to define their own version of {@link + * MetricsContainer}. These constructors are *only* public so runners can instantiate. + */ + public HistogramCell(KV kv) { + this.name = kv.getKey(); + this.value = new HistogramData(kv.getValue()); + } + + @Override + public void reset() { + dirty.afterModification(); + value.clear(); + } + + /** Increment the corresponding histogram bucket count for the value by 1. */ + @Override + public void update(double value) { + this.value.record(value); + dirty.afterModification(); + } + + /** + * Increment all of the bucket counts in this histogram, by the bucket counts specified in other. + */ + public void update(HistogramCell other) { + this.value.update(other.value); + dirty.afterModification(); + } + + // TODO(BEAM-12103): Update this function to allow incrementing the infinite buckets as well. + // and remove the incTopBucketCount and incBotBucketCount methods. + // Using 0 and length -1 as the bucketIndex. + public void incBucketCount(int bucketIndex, long count) { + this.value.incBucketCount(bucketIndex, count); + dirty.afterModification(); + } + + public void incTopBucketCount(long count) { + this.value.incTopBucketCount(count); + dirty.afterModification(); + } + + public void incBottomBucketCount(long count) { + this.value.incBottomBucketCount(count); + dirty.afterModification(); + } + + @Override + public DirtyState getDirty() { + return dirty; + } + + @Override + public HistogramData getCumulative() { + return value; + } + + @Override + public MetricName getName() { + return name; + } + + @Override + public boolean equals(@Nullable Object object) { + if (object instanceof HistogramCell) { + HistogramCell histogramCell = (HistogramCell) object; + return Objects.equals(dirty, histogramCell.dirty) + && Objects.equals(value, histogramCell.value) + && Objects.equals(name, histogramCell.name); + } + + return false; + } + + @Override + public int hashCode() { + return Objects.hash(dirty, value, name); + } +} diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/LabeledMetrics.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/LabeledMetrics.java index bf6295b0edb0..f3d27937763f 100644 --- a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/LabeledMetrics.java +++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/LabeledMetrics.java @@ -19,6 +19,11 @@ import org.apache.beam.sdk.metrics.Counter; import org.apache.beam.sdk.metrics.DelegatingCounter; +import org.apache.beam.sdk.metrics.DelegatingDistribution; +import org.apache.beam.sdk.metrics.DelegatingHistogram; +import org.apache.beam.sdk.metrics.Distribution; +import org.apache.beam.sdk.metrics.Histogram; +import org.apache.beam.sdk.util.HistogramData; /** * Define a metric on the current MetricContainer with a specific URN and a set of labels. 
This is a @@ -32,4 +37,19 @@ public class LabeledMetrics { public static Counter counter(MonitoringInfoMetricName metricName) { return new DelegatingCounter(metricName); } + + public static Counter counter(MonitoringInfoMetricName metricName, boolean processWideContainer) { + return new DelegatingCounter(metricName, processWideContainer); + } + + public static Distribution distribution(MonitoringInfoMetricName metricName) { + return new DelegatingDistribution(metricName); + } + + public static Histogram histogram( + MonitoringInfoMetricName metricName, + HistogramData.BucketType bucketType, + boolean processWideContainer) { + return new DelegatingHistogram(metricName, bucketType, processWideContainer); + } } diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricsContainerImpl.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricsContainerImpl.java index 7f41d87efd9d..4d7c37abc3b7 100644 --- a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricsContainerImpl.java +++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricsContainerImpl.java @@ -23,6 +23,8 @@ import static org.apache.beam.runners.core.metrics.MonitoringInfoEncodings.decodeInt64Counter; import static org.apache.beam.runners.core.metrics.MonitoringInfoEncodings.decodeInt64Distribution; import static org.apache.beam.runners.core.metrics.MonitoringInfoEncodings.decodeInt64Gauge; +import static org.apache.beam.runners.core.metrics.MonitoringInfoEncodings.encodeInt64Counter; +import static org.apache.beam.runners.core.metrics.MonitoringInfoEncodings.encodeInt64Distribution; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; import java.io.Serializable; @@ -30,6 +32,10 @@ import java.util.Map; import java.util.Map.Entry; import java.util.Objects; +import java.util.Optional; +import java.util.Set; +import java.util.concurrent.ConcurrentHashMap; +import java.util.function.Function; import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo; import org.apache.beam.runners.core.metrics.MetricUpdates.MetricUpdate; import org.apache.beam.sdk.metrics.Distribution; @@ -37,7 +43,12 @@ import org.apache.beam.sdk.metrics.MetricKey; import org.apache.beam.sdk.metrics.MetricName; import org.apache.beam.sdk.metrics.MetricsContainer; +import org.apache.beam.sdk.util.HistogramData; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.checkerframework.checker.nullness.qual.Nullable; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -61,7 +72,9 @@ public class MetricsContainerImpl implements Serializable, MetricsContainer { private static final Logger LOG = LoggerFactory.getLogger(MetricsContainerImpl.class); - private final @Nullable String stepName; + protected final @Nullable String stepName; + + private final boolean isProcessWide; private MetricsMap counters = new MetricsMap<>(CounterCell::new); @@ -70,19 +83,47 @@ public class MetricsContainerImpl implements Serializable, MetricsContainer { private MetricsMap gauges = new MetricsMap<>(GaugeCell::new); - /** Create a new {@link MetricsContainerImpl} associated with the given {@code 
stepName}. */ - public MetricsContainerImpl(@Nullable String stepName) { + private MetricsMap, HistogramCell> histograms = + new MetricsMap<>(HistogramCell::new); + + private MetricsContainerImpl(@Nullable String stepName, boolean isProcessWide) { this.stepName = stepName; + this.isProcessWide = isProcessWide; } + /** + * Create a new {@link MetricsContainerImpl} associated with the given {@code stepName}. If + * stepName is null, this MetricsContainer is not bound to a step. + */ + public MetricsContainerImpl(@Nullable String stepName) { + this(stepName, false); + } + + /** + * Create a new {@link MetricsContainerImpl} associated with the entire process. Used for + * collecting processWide metrics for HarnessMonitoringInfoRequest/Response. + */ + public static MetricsContainerImpl createProcessWideContainer() { + return new MetricsContainerImpl(null, true); + } + + @edu.umd.cs.findbugs.annotations.SuppressFBWarnings( + justification = "No bug", + value = "SE_BAD_FIELD") + private Map> shortIdsByMetricKey = new ConcurrentHashMap<>(); + /** Reset the metrics. */ public void reset() { + if (this.isProcessWide) { + throw new RuntimeException("Process Wide metric containers must not be reset"); + } reset(counters); reset(distributions); reset(gauges); + reset(histograms); } - private void reset(MetricsMap> cells) { + private void reset(MetricsMap> cells) { for (MetricCell cell : cells.values()) { cell.reset(); } @@ -122,6 +163,24 @@ public DistributionCell getDistribution(MetricName metricName) { return distributions.tryGet(metricName); } + /** + * Return a {@code HistogramCell} named {@code metricName}. If it doesn't exist, create a {@code + * Metric} with the specified name. + */ + @Override + public HistogramCell getHistogram(MetricName metricName, HistogramData.BucketType bucketType) { + return histograms.get(KV.of(metricName, bucketType)); + } + + /** + * Return a {@code HistogramCell} named {@code metricName}. If it doesn't exist, return {@code + * null}. + */ + public @Nullable HistogramCell tryGetHistogram( + MetricName metricName, HistogramData.BucketType bucketType) { + return histograms.tryGet(KV.of(metricName, bucketType)); + } + /** * Return a {@code GaugeCell} named {@code metricName}. If it doesn't exist, create a {@code * Metric} with the specified name. @@ -160,11 +219,13 @@ public MetricUpdates getUpdates() { extractUpdates(counters), extractUpdates(distributions), extractUpdates(gauges)); } - /** @return The MonitoringInfo generated from the metricUpdate. */ - private @Nullable MonitoringInfo counterUpdateToMonitoringInfo(MetricUpdate metricUpdate) { + /** @return The MonitoringInfo metadata from the metric. */ + private @Nullable SimpleMonitoringInfoBuilder metricToMonitoringMetadata( + MetricKey metricKey, String typeUrn, String userUrn) { SimpleMonitoringInfoBuilder builder = new SimpleMonitoringInfoBuilder(true); + builder.setType(typeUrn); - MetricName metricName = metricUpdate.getKey().metricName(); + MetricName metricName = metricKey.metricName(); if (metricName instanceof MonitoringInfoMetricName) { MonitoringInfoMetricName monitoringInfoName = (MonitoringInfoMetricName) metricName; // Represents a specific MonitoringInfo for a specific URN. 
@@ -180,58 +241,57 @@ public MetricUpdates getUpdates() { } builder - .setUrn(MonitoringInfoConstants.Urns.USER_SUM_INT64) - .setLabel( - MonitoringInfoConstants.Labels.NAMESPACE, - metricUpdate.getKey().metricName().getNamespace()) - .setLabel( - MonitoringInfoConstants.Labels.NAME, metricUpdate.getKey().metricName().getName()) - .setLabel(MonitoringInfoConstants.Labels.PTRANSFORM, metricUpdate.getKey().stepName()); + .setUrn(userUrn) + .setLabel(MonitoringInfoConstants.Labels.NAMESPACE, metricKey.metricName().getNamespace()) + .setLabel(MonitoringInfoConstants.Labels.NAME, metricKey.metricName().getName()) + .setLabel(MonitoringInfoConstants.Labels.PTRANSFORM, metricKey.stepName()); } + return builder; + } - builder.setInt64SumValue(metricUpdate.getUpdate()); + /** @return The MonitoringInfo metadata from the counter metric. */ + private @Nullable SimpleMonitoringInfoBuilder counterToMonitoringMetadata(MetricKey metricKey) { + return metricToMonitoringMetadata( + metricKey, + MonitoringInfoConstants.TypeUrns.SUM_INT64_TYPE, + MonitoringInfoConstants.Urns.USER_SUM_INT64); + } + /** @return The MonitoringInfo generated from the counter metricUpdate. */ + private @Nullable MonitoringInfo counterUpdateToMonitoringInfo(MetricUpdate metricUpdate) { + SimpleMonitoringInfoBuilder builder = counterToMonitoringMetadata(metricUpdate.getKey()); + if (builder == null) { + return null; + } + builder.setInt64SumValue(metricUpdate.getUpdate()); return builder.build(); } + /** @return The MonitoringInfo metadata from the distribution metric. */ + private @Nullable SimpleMonitoringInfoBuilder distributionToMonitoringMetadata( + MetricKey metricKey) { + return metricToMonitoringMetadata( + metricKey, + MonitoringInfoConstants.TypeUrns.DISTRIBUTION_INT64_TYPE, + MonitoringInfoConstants.Urns.USER_DISTRIBUTION_INT64); + } + /** * @param metricUpdate - * @return The MonitoringInfo generated from the metricUpdate. + * @return The MonitoringInfo generated from the distribution metricUpdate. */ private @Nullable MonitoringInfo distributionUpdateToMonitoringInfo( MetricUpdate metricUpdate) { - SimpleMonitoringInfoBuilder builder = new SimpleMonitoringInfoBuilder(true); - MetricName metricName = metricUpdate.getKey().metricName(); - if (metricName instanceof MonitoringInfoMetricName) { - MonitoringInfoMetricName monitoringInfoName = (MonitoringInfoMetricName) metricName; - // Represents a specific MonitoringInfo for a specific URN. - builder.setUrn(monitoringInfoName.getUrn()); - for (Entry e : monitoringInfoName.getLabels().entrySet()) { - builder.setLabel(e.getKey(), e.getValue()); - } - } else { // Note: (metricName instanceof MetricName) is always True. - // Represents a user counter. - builder - .setUrn(MonitoringInfoConstants.Urns.USER_DISTRIBUTION_INT64) - .setLabel( - MonitoringInfoConstants.Labels.NAMESPACE, - metricUpdate.getKey().metricName().getNamespace()) - .setLabel( - MonitoringInfoConstants.Labels.NAME, metricUpdate.getKey().metricName().getName()); - - // Drop if the stepname is not set. All user counters must be - // defined for a PTransform. They must be defined on a container bound to a step. - if (this.stepName == null) { - // TODO(BEAM-7191): Consider logging a warning with a quiet logging API. 
- return null; - } - builder.setLabel(MonitoringInfoConstants.Labels.PTRANSFORM, metricUpdate.getKey().stepName()); + SimpleMonitoringInfoBuilder builder = distributionToMonitoringMetadata(metricUpdate.getKey()); + if (builder == null) { + return null; } builder.setInt64DistributionValue(metricUpdate.getUpdate()); return builder.build(); } /** Return the cumulative values for any metrics in this container as MonitoringInfos. */ + @Override public Iterable getMonitoringInfos() { // Extract user metrics and store as MonitoringInfos. ArrayList monitoringInfos = new ArrayList(); @@ -254,6 +314,47 @@ public Iterable getMonitoringInfos() { return monitoringInfos; } + public Map getMonitoringData(ShortIdMap shortIds) { + ImmutableMap.Builder builder = ImmutableMap.builder(); + MetricUpdates metricUpdates = this.getUpdates(); + for (MetricUpdate metricUpdate : metricUpdates.counterUpdates()) { + String shortId = + getShortId(metricUpdate.getKey(), this::counterToMonitoringMetadata, shortIds); + if (shortId != null) { + builder.put(shortId, encodeInt64Counter(metricUpdate.getUpdate())); + } + } + for (MetricUpdate metricUpdate : + metricUpdates.distributionUpdates()) { + String shortId = + getShortId(metricUpdate.getKey(), this::distributionToMonitoringMetadata, shortIds); + if (shortId != null) { + builder.put(shortId, encodeInt64Distribution(metricUpdate.getUpdate())); + } + } + return builder.build(); + } + + private String getShortId( + MetricKey key, Function toInfo, ShortIdMap shortIds) { + Optional shortId = shortIdsByMetricKey.get(key); + if (shortId == null) { + SimpleMonitoringInfoBuilder monitoringInfoBuilder = toInfo.apply(key); + if (monitoringInfoBuilder == null) { + shortId = Optional.empty(); + } else { + MonitoringInfo monitoringInfo = monitoringInfoBuilder.build(); + if (monitoringInfo == null) { + shortId = Optional.empty(); + } else { + shortId = Optional.of(shortIds.getOrCreateShortId(monitoringInfo)); + } + } + shortIdsByMetricKey.put(key, shortId); + } + return shortId.orElse(null); + } + private void commitUpdates(MetricsMap> cells) { for (MetricCell cell : cells.values()) { cell.getDirty().afterCommit(); @@ -296,6 +397,7 @@ public void update(MetricsContainerImpl other) { updateCounters(counters, other.counters); updateDistributions(distributions, other.distributions); updateGauges(gauges, other.gauges); + updateHistograms(histograms, other.histograms); } private void updateForSumInt64Type(MonitoringInfo monitoringInfo) { @@ -366,6 +468,16 @@ private void updateGauges( } } + private void updateHistograms( + MetricsMap, HistogramCell> current, + MetricsMap, HistogramCell> updates) { + for (Map.Entry, HistogramCell> histogram : + updates.entries()) { + HistogramCell h = histogram.getValue(); + current.get(histogram.getKey()).update(h); + } + } + @Override public boolean equals(@Nullable Object object) { if (object instanceof MetricsContainerImpl) { @@ -375,7 +487,6 @@ public boolean equals(@Nullable Object object) { && Objects.equals(distributions, metricsContainerImpl.distributions) && Objects.equals(gauges, metricsContainerImpl.gauges); } - return false; } @@ -383,4 +494,116 @@ public boolean equals(@Nullable Object object) { public int hashCode() { return Objects.hash(stepName, counters, distributions, gauges); } + + /** + * Match a MetricName with a given metric filter. If the metric filter is null, the method always + * returns true. TODO(BEAM-10986) Consider making this use the MetricNameFilter and related + * classes. 
+ */ + @VisibleForTesting + static boolean matchMetric(MetricName metricName, @Nullable Set allowedMetricUrns) { + if (allowedMetricUrns == null) { + return true; + } + if (metricName instanceof MonitoringInfoMetricName) { + return allowedMetricUrns.contains(((MonitoringInfoMetricName) metricName).getUrn()); + } + return false; + } + + /** Return a string representing the cumulative values of all metrics in this container. */ + public String getCumulativeString(@Nullable Set allowedMetricUrns) { + StringBuilder message = new StringBuilder(); + for (Map.Entry cell : counters.entries()) { + if (!matchMetric(cell.getKey(), allowedMetricUrns)) { + continue; + } + message.append(cell.getKey().toString()); + message.append(" = "); + message.append(cell.getValue().getCumulative()); + message.append("\n"); + } + for (Map.Entry cell : distributions.entries()) { + if (!matchMetric(cell.getKey(), allowedMetricUrns)) { + continue; + } + message.append(cell.getKey().toString()); + message.append(" = "); + DistributionData data = cell.getValue().getCumulative(); + message.append( + String.format( + "{sum: %d, count: %d, min: %d, max: %d}", + data.sum(), data.count(), data.min(), data.max())); + message.append("\n"); + } + for (Map.Entry cell : gauges.entries()) { + if (!matchMetric(cell.getKey(), allowedMetricUrns)) { + continue; + } + message.append(cell.getKey().toString()); + message.append(" = "); + GaugeData data = cell.getValue().getCumulative(); + message.append(String.format("{timestamp: %s, value: %d}", data.timestamp(), data.value())); + message.append("\n"); + } + for (Map.Entry, HistogramCell> cell : + histograms.entries()) { + if (!matchMetric(cell.getKey().getKey(), allowedMetricUrns)) { + continue; + } + message.append(cell.getKey().getKey().toString()); + message.append(" = "); + HistogramData data = cell.getValue().getCumulative(); + if (data.getTotalCount() > 0) { + message.append( + String.format( + "{count: %d, p50: %f, p90: %f, p99: %f}", + data.getTotalCount(), data.p50(), data.p90(), data.p99())); + } else { + message.append("{count: 0}"); + } + message.append("\n"); + } + return message.toString(); + } + + /** + * Returns a MetricContainer with the delta values between two MetricsContainers. The purpose of + * this function is to print the changes made to the metrics within a window of time. The + * difference between the counter and histogram bucket counters are calculated between curr and + * prev. The most recent value are used for gauges. Distribution metrics are dropped (As there is + * meaningful way to calculate the delta). Returns curr if prev is null. + */ + public static MetricsContainerImpl deltaContainer( + @Nullable MetricsContainerImpl prev, MetricsContainerImpl curr) { + if (prev == null) { + return curr; + } + MetricsContainerImpl deltaContainer = new MetricsContainerImpl(curr.stepName); + for (Map.Entry cell : curr.counters.entries()) { + Long prevValue = prev.counters.get(cell.getKey()).getCumulative(); + Long currValue = cell.getValue().getCumulative(); + deltaContainer.counters.get(cell.getKey()).inc(currValue - prevValue); + } + for (Map.Entry cell : curr.gauges.entries()) { + // Simply take the most recent value for gauge, no need to count deltas. 
+ deltaContainer.gauges.get(cell.getKey()).update(cell.getValue().getCumulative()); + } + for (Map.Entry, HistogramCell> cell : + curr.histograms.entries()) { + HistogramData.BucketType bt = cell.getKey().getValue(); + HistogramData prevValue = prev.histograms.get(cell.getKey()).getCumulative(); + HistogramData currValue = cell.getValue().getCumulative(); + HistogramCell deltaValueCell = deltaContainer.histograms.get(cell.getKey()); + deltaValueCell.incBottomBucketCount( + currValue.getBottomBucketCount() - prevValue.getBottomBucketCount()); + for (int i = 0; i < bt.getNumBuckets(); i++) { + Long bucketCountDelta = currValue.getCount(i) - prevValue.getCount(i); + deltaValueCell.incBucketCount(i, bucketCountDelta); + } + deltaValueCell.incTopBucketCount( + currValue.getTopBucketCount() - prevValue.getTopBucketCount()); + } + return deltaContainer; + } } diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricsContainerStepMap.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricsContainerStepMap.java index 7ef5629ace93..f3fe701c4972 100644 --- a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricsContainerStepMap.java +++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricsContainerStepMap.java @@ -35,8 +35,10 @@ import org.apache.beam.sdk.metrics.MetricKey; import org.apache.beam.sdk.metrics.MetricResult; import org.apache.beam.sdk.metrics.MetricResults; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.util.JsonFormat; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.util.JsonFormat; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.checkerframework.checker.nullness.qual.Nullable; /** @@ -64,10 +66,8 @@ public MetricsContainerImpl getContainer(String stepName) { // TODO(BEAM-6538): Disallow this in the future, some tests rely on an empty step name today. return getUnboundContainer(); } - if (!metricsContainers.containsKey(stepName)) { - metricsContainers.put(stepName, new MetricsContainerImpl(stepName)); - } - return metricsContainers.get(stepName); + return metricsContainers.computeIfAbsent( + stepName, (String name) -> new MetricsContainerImpl(name)); } /** @@ -176,6 +176,16 @@ public Iterable getMonitoringInfos() { return monitoringInfos; } + /** Return the cumulative values for any metrics in this container as MonitoringInfo data. */ + public Map getMonitoringData(ShortIdMap shortIds) { + // Extract user metrics and store as MonitoringInfos. 
+ ImmutableMap.Builder builder = ImmutableMap.builder(); + for (MetricsContainerImpl container : getMetricsContainers()) { + builder.putAll(container.getMonitoringData(shortIds)); + } + return builder.build(); + } + @Override public String toString() { JobApi.MetricResults results = @@ -200,12 +210,15 @@ private static void mergeAttemptedResults( BiFunction combine) { for (MetricUpdate metricUpdate : updates) { MetricKey key = metricUpdate.getKey(); - MetricResult current = metricResultMap.get(key); - if (current == null) { - metricResultMap.put(key, MetricResult.attempted(key, metricUpdate.getUpdate())); - } else { - metricResultMap.put(key, current.addAttempted(metricUpdate.getUpdate(), combine)); - } + metricResultMap.compute( + key, + (k, current) -> { + if (current == null) { + return MetricResult.attempted(key, metricUpdate.getUpdate()); + } else { + return current.addAttempted(metricUpdate.getUpdate(), combine); + } + }); } } @@ -215,14 +228,14 @@ private static void mergeCommittedResults( BiFunction combine) { for (MetricUpdate metricUpdate : updates) { MetricKey key = metricUpdate.getKey(); - MetricResult current = metricResultMap.get(key); - if (current == null) { + if (metricResultMap.computeIfPresent( + key, ((k, current) -> current.addCommitted(metricUpdate.getUpdate(), combine))) + == null) { throw new IllegalStateException( String.format( "%s: existing 'attempted' result not found for 'committed' value %s", key, metricUpdate.getUpdate())); } - metricResultMap.put(key, current.addCommitted(metricUpdate.getUpdate(), combine)); } } } diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricsLogger.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricsLogger.java new file mode 100644 index 000000000000..c36738221913 --- /dev/null +++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricsLogger.java @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.core.metrics; + +import java.util.Date; +import java.util.Set; +import java.util.concurrent.atomic.AtomicLong; +import java.util.concurrent.locks.Lock; +import java.util.concurrent.locks.ReentrantLock; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; +import org.checkerframework.checker.nullness.qual.Nullable; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +@Experimental(Kind.METRICS) +public class MetricsLogger extends MetricsContainerImpl { + private static final Logger LOG = LoggerFactory.getLogger(MetricsLogger.class); + + Lock reportingLocK = new ReentrantLock(); + AtomicLong lastReportedMillis = new AtomicLong(System.currentTimeMillis()); + @Nullable MetricsContainerImpl lastMetricsSnapshot = null; + + public MetricsLogger(@Nullable String stepName) { + super(stepName); + } + + public String generateLogMessage( + String header, Set allowedMetricUrns, long lastReported) { + MetricsContainerImpl nextMetricsSnapshot = new MetricsContainerImpl(this.stepName); + nextMetricsSnapshot.update(this); + MetricsContainerImpl deltaContainer = + MetricsContainerImpl.deltaContainer(lastMetricsSnapshot, nextMetricsSnapshot); + + StringBuilder logMessage = new StringBuilder(); + logMessage.append(header); + logMessage.append(deltaContainer.getCumulativeString(allowedMetricUrns)); + logMessage.append(String.format("(last reported at %s)%n", new Date(lastReported))); + + lastMetricsSnapshot = nextMetricsSnapshot; + return logMessage.toString(); + } + + public void tryLoggingMetrics( + String header, Set allowedMetricUrns, long minimumLoggingFrequencyMillis) { + + if (reportingLocK.tryLock()) { + try { + long currentTimeMillis = System.currentTimeMillis(); + long lastReported = lastReportedMillis.get(); + if (currentTimeMillis - lastReported > minimumLoggingFrequencyMillis) { + LOG.info(generateLogMessage(header, allowedMetricUrns, lastReported)); + lastReportedMillis.set(currentTimeMillis); + } + } finally { + reportingLocK.unlock(); + } + } + } + + @Override + public int hashCode() { + return super.hashCode(); + } + + @Override + public boolean equals(@Nullable Object object) { + if (object instanceof MetricsLogger) { + return super.equals(object); + } + return false; + } +} diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoConstants.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoConstants.java index a1f7d2c49976..c792719ea20c 100644 --- a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoConstants.java +++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoConstants.java @@ -54,6 +54,10 @@ public static final class Urns { public static final String WORK_REMAINING = extractUrn(MonitoringInfoSpecs.Enum.WORK_REMAINING); public static final String DATA_CHANNEL_READ_INDEX = extractUrn(MonitoringInfoSpecs.Enum.DATA_CHANNEL_READ_INDEX); + public static final String API_REQUEST_COUNT = + extractUrn(MonitoringInfoSpecs.Enum.API_REQUEST_COUNT); + public static final String API_REQUEST_LATENCIES = + extractUrn(MonitoringInfoSpecs.Enum.API_REQUEST_LATENCIES); } /** Standardised MonitoringInfo labels that can be utilized by runners. 
*/ @@ -65,8 +69,29 @@ public static final class Labels { public static final String ENVIRONMENT = "ENVIRONMENT"; public static final String NAMESPACE = "NAMESPACE"; public static final String NAME = "NAME"; + public static final String SERVICE = "SERVICE"; + public static final String METHOD = "METHOD"; + public static final String RESOURCE = "RESOURCE"; + public static final String STATUS = "STATUS"; + public static final String BIGQUERY_PROJECT_ID = "BIGQUERY_PROJECT_ID"; + public static final String BIGQUERY_DATASET = "BIGQUERY_DATASET"; + public static final String BIGQUERY_TABLE = "BIGQUERY_TABLE"; + public static final String BIGQUERY_VIEW = "BIGQUERY_VIEW"; + public static final String BIGQUERY_QUERY_NAME = "BIGQUERY_QUERY_NAME"; + public static final String DATASTORE_PROJECT = "DATASTORE_PROJECT"; + public static final String DATASTORE_NAMESPACE = "DATASTORE_NAMESPACE"; + public static final String BIGTABLE_PROJECT_ID = "BIGTABLE_PROJECT_ID"; + public static final String INSTANCE_ID = "INSTANCE_ID"; + public static final String TABLE_ID = "TABLE_ID"; + public static final String GCS_BUCKET = "GCS_BUCKET"; + public static final String GCS_PROJECT_ID = "GCS_PROJECT_ID"; static { + // Note: One benefit of defining these strings above, instead of pulling them in from + // the proto files, is to ensure that this code will crash if the strings in the proto + // file are changed, without modifying this file. + // Though, one should not change those strings either, as Runner Harnesss running old versions + // would not be able to understand the new label names./ checkArgument(PTRANSFORM.equals(extractLabel(MonitoringInfoLabels.TRANSFORM))); checkArgument(PCOLLECTION.equals(extractLabel(MonitoringInfoLabels.PCOLLECTION))); checkArgument( @@ -75,6 +100,26 @@ public static final class Labels { checkArgument(ENVIRONMENT.equals(extractLabel(MonitoringInfoLabels.ENVIRONMENT))); checkArgument(NAMESPACE.equals(extractLabel(MonitoringInfoLabels.NAMESPACE))); checkArgument(NAME.equals(extractLabel(MonitoringInfoLabels.NAME))); + checkArgument(SERVICE.equals(extractLabel(MonitoringInfoLabels.SERVICE))); + checkArgument(METHOD.equals(extractLabel(MonitoringInfoLabels.METHOD))); + checkArgument(RESOURCE.equals(extractLabel(MonitoringInfoLabels.RESOURCE))); + checkArgument(STATUS.equals(extractLabel(MonitoringInfoLabels.STATUS))); + checkArgument( + BIGQUERY_PROJECT_ID.equals(extractLabel(MonitoringInfoLabels.BIGQUERY_PROJECT_ID))); + checkArgument(BIGQUERY_DATASET.equals(extractLabel(MonitoringInfoLabels.BIGQUERY_DATASET))); + checkArgument(BIGQUERY_TABLE.equals(extractLabel(MonitoringInfoLabels.BIGQUERY_TABLE))); + checkArgument(BIGQUERY_VIEW.equals(extractLabel(MonitoringInfoLabels.BIGQUERY_VIEW))); + checkArgument( + BIGQUERY_QUERY_NAME.equals(extractLabel(MonitoringInfoLabels.BIGQUERY_QUERY_NAME))); + checkArgument(DATASTORE_PROJECT.equals(extractLabel(MonitoringInfoLabels.DATASTORE_PROJECT))); + checkArgument( + DATASTORE_NAMESPACE.equals(extractLabel(MonitoringInfoLabels.DATASTORE_NAMESPACE))); + checkArgument( + BIGTABLE_PROJECT_ID.equals(extractLabel(MonitoringInfoLabels.BIGTABLE_PROJECT_ID))); + checkArgument(INSTANCE_ID.equals(extractLabel(MonitoringInfoLabels.INSTANCE_ID))); + checkArgument(TABLE_ID.equals(extractLabel(MonitoringInfoLabels.TABLE_ID))); + checkArgument(GCS_BUCKET.equals(extractLabel(MonitoringInfoLabels.GCS_BUCKET))); + checkArgument(GCS_PROJECT_ID.equals(extractLabel(MonitoringInfoLabels.GCS_PROJECT_ID))); } } diff --git 
a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoEncodings.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoEncodings.java index 31567c312a5f..b5feea7e34cc 100644 --- a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoEncodings.java +++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoEncodings.java @@ -22,7 +22,7 @@ import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.DoubleCoder; import org.apache.beam.sdk.coders.VarLongCoder; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.joda.time.Instant; /** A set of functions used to encode and decode common monitoring info types. */ diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoMetricName.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoMetricName.java index a5e593f38400..bf56b5c92389 100644 --- a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoMetricName.java +++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoMetricName.java @@ -91,7 +91,7 @@ public static MonitoringInfoMetricName of(MetricsApi.MonitoringInfo mi) { return new MonitoringInfoMetricName(mi.getUrn(), mi.getLabelsMap()); } - public static MonitoringInfoMetricName named(String urn, HashMap labels) { + public static MonitoringInfoMetricName named(String urn, Map labels) { return new MonitoringInfoMetricName(urn, labels); } diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/ServiceCallMetric.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/ServiceCallMetric.java new file mode 100644 index 000000000000..17f55ba35f01 --- /dev/null +++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/ServiceCallMetric.java @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.metrics; + +import java.util.HashMap; +import java.util.Map; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; + +/* + * Metric class which records Service API call metrics. + * This class will capture a request count metric for the specified + * request_count_urn and base_labels. + * When Call() is invoked the status must be provided, which will + * be converted to a canonical GCP status code, if possible. 
+ */ +public class ServiceCallMetric { + + public static final Map CANONICAL_STATUS_MAP = + ImmutableMap.builder() + .put(200, "ok") + .put(400, "out_of_range") + .put(401, "unauthenticated") + .put(403, "permission_denied") + .put(404, "not_found") + .put(409, "already_exists") + .put(429, "resource_exhausted") + .put(499, "cancelled") + .put(500, "internal") + .put(501, "not_implemented") + .put(503, "unavailable") + .put(504, "deadline_exceeded") + .build(); + + public static final String CANONICAL_STATUS_UNKNOWN = "unknown"; + + public static final Map STATUS_NORMALIZATION_MAP = + ImmutableMap.builder() + .put("outofrange", "out_of_range") + .put("permissiondenied", "permission_denied") + .put("notfound", "not_found") + .put("alreadyexists", "already_exists") + .put("resourceexhausted", "resource_exhausted") + .put("notimplemented", "not_implemented") + .put("unavailable", "unavailable") + .put("deadlineexceeded", "deadline_exceeded") + .build(); + + private HashMap labels; + private final String requestCountUrn; + + public ServiceCallMetric(String requestCountUrn, HashMap baseLabels) { + this.requestCountUrn = requestCountUrn; + this.labels = baseLabels; + } + + public void call(int httpStatusCode) { + String canonicalStatusCode = ServiceCallMetric.convertToCanonicalStatusString(httpStatusCode); + call(canonicalStatusCode); + } + + public void call(String statusCode) { + labels.put( + MonitoringInfoConstants.Labels.STATUS, + ServiceCallMetric.convertToCanonicalStatusString(statusCode)); + // MonitoringInfoMetricName will copy labels. So its safe to reuse this reference. + MonitoringInfoMetricName name = MonitoringInfoMetricName.named(requestCountUrn, labels); + Counter counter = LabeledMetrics.counter(name, true); + counter.inc(); + } + + /** Converts an http status code to a canonical GCP status code string. */ + public static String convertToCanonicalStatusString(int httpStatusCode) { + return CANONICAL_STATUS_MAP.getOrDefault(httpStatusCode, CANONICAL_STATUS_UNKNOWN); + } + + /** + * Converts an status code string to a canonical GCP status code string. This is used to make + * strings like "notFound" to "not_found". If a mapping cannot be created known, then + * statusCode.toLowerCase() will be returned. + */ + public static String convertToCanonicalStatusString(String statusCode) { + if (statusCode == null) { + return CANONICAL_STATUS_UNKNOWN; + } + String normalizedStatus = STATUS_NORMALIZATION_MAP.get(statusCode.toLowerCase()); + if (normalizedStatus != null) { + return normalizedStatus; + } + return statusCode.toLowerCase(); + } +} diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/ShortIdMap.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/ShortIdMap.java new file mode 100644 index 000000000000..ef9481f4a3b6 --- /dev/null +++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/ShortIdMap.java @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.metrics; + +import java.util.NoSuchElementException; +import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.BiMap; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.HashBiMap; + +/** A Class for registering short ids for MonitoringInfos. */ +public class ShortIdMap { + private int counter = 0; + private BiMap monitoringInfoMap = HashBiMap.create(); + + public synchronized String getOrCreateShortId(MonitoringInfo info) { + Preconditions.checkNotNull(info); + Preconditions.checkArgument(info.getPayload().isEmpty()); + Preconditions.checkArgument(!info.hasStartTime()); + + String shortId = monitoringInfoMap.inverse().get(info); + if (shortId == null) { + shortId = "metric" + counter++; + monitoringInfoMap.put(shortId, info); + } + return shortId; + } + + public synchronized MonitoringInfo get(String shortId) { + MonitoringInfo monitoringInfo = monitoringInfoMap.get(shortId); + if (monitoringInfo == null) { + throw new NoSuchElementException(shortId); + } + return monitoringInfo; + } +} diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/SimpleExecutionState.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/SimpleExecutionState.java index 9075510d5302..8d322f7ae9f3 100644 --- a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/SimpleExecutionState.java +++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/SimpleExecutionState.java @@ -17,10 +17,15 @@ */ package org.apache.beam.runners.core.metrics; +import static org.apache.beam.runners.core.metrics.MonitoringInfoEncodings.decodeInt64Counter; +import static org.apache.beam.runners.core.metrics.MonitoringInfoEncodings.encodeInt64Counter; + import java.util.Collections; import java.util.HashMap; import java.util.Map; +import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo; import org.apache.beam.runners.core.metrics.ExecutionStateTracker.ExecutionState; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.joda.time.Duration; import org.joda.time.format.PeriodFormatter; @@ -40,6 +45,7 @@ public class SimpleExecutionState extends ExecutionState { private long totalMillis = 0; private HashMap labelsMetadata; private String urn; + private String shortId; private static final Logger LOG = LoggerFactory.getLogger(SimpleExecutionState.class); @@ -81,6 +87,31 @@ public String getUrn() { return this.urn; } + public String getTotalMillisShortId(ShortIdMap shortIds) { + if (shortId == null) { + shortId = shortIds.getOrCreateShortId(getTotalMillisMonitoringMetadata()); + } + return shortId; + } + + public ByteString getTotalMillisPayload() { + return encodeInt64Counter(getTotalMillis()); + } + + public ByteString mergeTotalMillisPayload(ByteString other) { + return 
encodeInt64Counter(getTotalMillis() + decodeInt64Counter(other)); + } + + private MonitoringInfo getTotalMillisMonitoringMetadata() { + SimpleMonitoringInfoBuilder builder = new SimpleMonitoringInfoBuilder(); + builder.setUrn(getUrn()); + for (Map.Entry entry : getLabels().entrySet()) { + builder.setLabel(entry.getKey(), entry.getValue()); + } + builder.setType(MonitoringInfoConstants.TypeUrns.SUM_INT64_TYPE); + return builder.build(); + } + public Map getLabels() { return Collections.unmodifiableMap(labelsMetadata); } diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/SimpleStateRegistry.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/SimpleStateRegistry.java index 3f6f466984a8..af43645fe274 100644 --- a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/SimpleStateRegistry.java +++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/SimpleStateRegistry.java @@ -18,9 +18,11 @@ package org.apache.beam.runners.core.metrics; import java.util.ArrayList; +import java.util.HashMap; import java.util.List; import java.util.Map; import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; /** * A Class for registering SimpleExecutionStates with and extracting execution time MonitoringInfos. @@ -56,4 +58,20 @@ public List getExecutionTimeMonitoringInfos() { } return monitoringInfos; } + + public Map getExecutionTimeMonitoringData(ShortIdMap shortIds) { + Map result = new HashMap<>(executionStates.size()); + for (SimpleExecutionState state : executionStates) { + if (state.getTotalMillis() != 0) { + String shortId = state.getTotalMillisShortId(shortIds); + if (result.containsKey(shortId)) { + // This can happen due to flatten unzipping. 
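+ // (An unzipped flatten can produce several execution states that share the same URN and
+ // labels; those states resolve to the same short id, so their counter payloads are merged.)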
+ result.put(shortId, state.mergeTotalMillisPayload(result.get(shortId))); + } else { + result.put(shortId, state.getTotalMillisPayload()); + } + } + } + return result; + } } diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/triggers/TriggerStateMachineContextFactory.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/triggers/TriggerStateMachineContextFactory.java index dd5ae577d9d4..4e9af0b8103e 100644 --- a/runners/core-java/src/main/java/org/apache/beam/runners/core/triggers/TriggerStateMachineContextFactory.java +++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/triggers/TriggerStateMachineContextFactory.java @@ -491,7 +491,7 @@ public void setTimer(Instant timestamp, TimeDomain domain) { @Override public void deleteTimer(Instant timestamp, TimeDomain domain) { - timers.setTimer(timestamp, domain); + timers.deleteTimer(timestamp, domain); } @Override diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/InMemoryStateInternalsTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/InMemoryStateInternalsTest.java index 61f0e0a76159..8cdc778f0655 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/InMemoryStateInternalsTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/InMemoryStateInternalsTest.java @@ -17,7 +17,7 @@ */ package org.apache.beam.runners.core; -import static org.junit.Assert.assertThat; +import static org.hamcrest.MatcherAssert.assertThat; import org.apache.beam.sdk.coders.StringUtf8Coder; import org.apache.beam.sdk.coders.VarIntCoder; diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/InMemoryTimerInternalsTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/InMemoryTimerInternalsTest.java index 4698394655e6..c96a00d36db5 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/InMemoryTimerInternalsTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/InMemoryTimerInternalsTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.core; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.nullValue; -import static org.junit.Assert.assertThat; import org.apache.beam.runners.core.TimerInternals.TimerData; import org.apache.beam.sdk.state.TimeDomain; @@ -30,9 +30,6 @@ /** Tests for {@link InMemoryTimerInternals}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class InMemoryTimerInternalsTest { private static final StateNamespace NS1 = new StateNamespaceForTest("NS1"); diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/LateDataDroppingDoFnRunnerTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/LateDataDroppingDoFnRunnerTest.java index 9d07ad8e48e8..d579611ec348 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/LateDataDroppingDoFnRunnerTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/LateDataDroppingDoFnRunnerTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.core; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.when; import java.util.Arrays; @@ -44,9 +44,6 @@ /** Unit tests for {@link LateDataDroppingDoFnRunner}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class LateDataDroppingDoFnRunnerTest { private static final FixedWindows WINDOW_FN = FixedWindows.of(Duration.millis(10)); diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/LateDataUtilsTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/LateDataUtilsTest.java index f26b5b4033e2..0950ca62263b 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/LateDataUtilsTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/LateDataUtilsTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.core; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.transforms.windowing.FixedWindows; diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/OutputAndTimeBoundedSplittableProcessElementInvokerTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/OutputAndTimeBoundedSplittableProcessElementInvokerTest.java index c3f6c891d7c3..5555c29ab6c8 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/OutputAndTimeBoundedSplittableProcessElementInvokerTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/OutputAndTimeBoundedSplittableProcessElementInvokerTest.java @@ -19,12 +19,12 @@ import static org.apache.beam.sdk.transforms.DoFn.ProcessContinuation.resume; import static org.apache.beam.sdk.transforms.DoFn.ProcessContinuation.stop; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.greaterThan; import static org.hamcrest.Matchers.lessThan; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertNull; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.util.Collection; @@ -52,7 +52,6 @@ /** Tests for {@link OutputAndTimeBoundedSplittableProcessElementInvoker}. 
*/ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class OutputAndTimeBoundedSplittableProcessElementInvokerTest { @Rule public transient ExpectedException e = ExpectedException.none(); diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/ReduceFnRunnerTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/ReduceFnRunnerTest.java index 1c0d2a39e0c6..e01ec5acb439 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/ReduceFnRunnerTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/ReduceFnRunnerTest.java @@ -19,6 +19,7 @@ import static org.apache.beam.runners.core.WindowMatchers.isSingleWindowedValue; import static org.apache.beam.runners.core.WindowMatchers.isWindowedValue; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.anyOf; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.containsInAnyOrder; @@ -29,7 +30,6 @@ import static org.hamcrest.Matchers.nullValue; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.mockito.Matchers.any; import static org.mockito.Mockito.doAnswer; diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/ReduceFnTester.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/ReduceFnTester.java index 8ff587ece581..4bbc0f4b0e4d 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/ReduceFnTester.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/ReduceFnTester.java @@ -85,7 +85,6 @@ */ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class ReduceFnTester { private static final String KEY = "TEST_KEY"; diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/SideInputHandlerTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/SideInputHandlerTest.java index 27dfca5f13ff..3d2f95222fab 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/SideInputHandlerTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/SideInputHandlerTest.java @@ -18,9 +18,9 @@ package org.apache.beam.runners.core; import static org.apache.beam.sdk.testing.PCollectionViewTesting.materializeValuesFor; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.util.List; @@ -44,9 +44,6 @@ /** Unit tests for {@link SideInputHandler}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SideInputHandlerTest { @Test diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/SimpleDoFnRunnerTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/SimpleDoFnRunnerTest.java index e21f81781af9..b3c032119bed 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/SimpleDoFnRunnerTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/SimpleDoFnRunnerTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.runners.core; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.is; import static org.hamcrest.Matchers.isA; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.verify; import static org.mockito.Mockito.when; @@ -63,7 +63,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class SimpleDoFnRunnerTest { @Rule public ExpectedException thrown = ExpectedException.none(); diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/SimplePushbackSideInputDoFnRunnerTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/SimplePushbackSideInputDoFnRunnerTest.java index b1990eb4d010..f18341b6d33f 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/SimplePushbackSideInputDoFnRunnerTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/SimplePushbackSideInputDoFnRunnerTest.java @@ -18,13 +18,13 @@ package org.apache.beam.runners.core; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.emptyIterable; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.is; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.mockito.Mockito.when; @@ -79,7 +79,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class SimplePushbackSideInputDoFnRunnerTest { @Mock StepContext mockStepContext; diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/SplittableParDoProcessFnTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/SplittableParDoProcessFnTest.java index e7158a11c525..a5622111cd11 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/SplittableParDoProcessFnTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/SplittableParDoProcessFnTest.java @@ -20,6 +20,7 @@ import static org.apache.beam.sdk.transforms.DoFn.ProcessContinuation.resume; import static org.apache.beam.sdk.transforms.DoFn.ProcessContinuation.stop; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.greaterThanOrEqualTo; import static 
org.hamcrest.Matchers.hasItem; @@ -27,7 +28,6 @@ import static org.hamcrest.Matchers.not; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.io.Serializable; @@ -77,9 +77,6 @@ /** Tests for {@link SplittableParDoViaKeyedWorkItems.ProcessFn}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SplittableParDoProcessFnTest { private static final int MAX_OUTPUTS_PER_BUNDLE = 10000; private static final Duration MAX_BUNDLE_DURATION = Duration.standardSeconds(5); diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/StateInternalsTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/StateInternalsTest.java index b11083c152a3..e28e745416f4 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/StateInternalsTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/StateInternalsTest.java @@ -60,9 +60,6 @@ import org.junit.Test; /** Tests for {@link StateInternals}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public abstract class StateInternalsTest { private static final BoundedWindow WINDOW_1 = new IntervalWindow(new Instant(0), new Instant(10)); @@ -632,7 +629,7 @@ public void testMapReadable() throws Exception { // test get ReadableState get = value.get("B"); value.put("B", 2); - assertNull(get.read()); + assertThat(get.read(), equalTo(2)); // test addIfAbsent value.putIfAbsent("C", 3); diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/StateNamespacesTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/StateNamespacesTest.java index f867aac67018..26e8c9e74d2f 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/StateNamespacesTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/StateNamespacesTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.core; +import static org.hamcrest.MatcherAssert.assertThat; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/StatefulDoFnRunnerTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/StatefulDoFnRunnerTest.java index 3aa9fc09b382..a3b4367e8ce2 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/StatefulDoFnRunnerTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/StatefulDoFnRunnerTest.java @@ -63,7 +63,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class StatefulDoFnRunnerTest { diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/TimerInternalsTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/TimerInternalsTest.java index c6a36bdf0910..83fb30eb8120 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/TimerInternalsTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/TimerInternalsTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.core; +import static 
org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.comparesEqualTo; import static org.hamcrest.Matchers.lessThan; -import static org.junit.Assert.assertThat; import org.apache.beam.runners.core.TimerInternals.TimerData; import org.apache.beam.runners.core.TimerInternals.TimerDataCoderV2; diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/WindowMatchers.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/WindowMatchers.java index 0b4378bb01f9..d934f72d9f6b 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/WindowMatchers.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/WindowMatchers.java @@ -31,9 +31,6 @@ import org.joda.time.Instant; /** Matchers that are useful for working with Windowing, Timestamps, etc. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class WindowMatchers { public static Matcher> isWindowedValue( diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/WindowMatchersTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/WindowMatchersTest.java index bc3a0c97e444..ffd79ce606b6 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/WindowMatchersTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/WindowMatchersTest.java @@ -17,7 +17,7 @@ */ package org.apache.beam.runners.core; -import static org.junit.Assert.assertThat; +import static org.hamcrest.MatcherAssert.assertThat; import org.apache.beam.sdk.transforms.windowing.IntervalWindow; import org.apache.beam.sdk.transforms.windowing.PaneInfo; diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/CounterCellTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/CounterCellTest.java index 1c9a34e651fc..56de258cbe47 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/CounterCellTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/CounterCellTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.core.metrics; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import org.apache.beam.sdk.metrics.MetricName; import org.junit.Assert; diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/DirtyStateTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/DirtyStateTest.java index b6105f5715e7..45167403316d 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/DirtyStateTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/DirtyStateTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.core.metrics; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.is; -import static org.junit.Assert.assertThat; import org.junit.Assert; import org.junit.Test; diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/DistributionCellTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/DistributionCellTest.java index 4e0b6b8ca268..faa8b5ebbc53 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/DistributionCellTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/DistributionCellTest.java @@ -17,8 +17,8 @@ */ package 
org.apache.beam.runners.core.metrics; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import org.apache.beam.sdk.metrics.MetricName; import org.junit.Assert; diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/ExecutionStateSamplerTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/ExecutionStateSamplerTest.java index 19909c8deb30..642a0777dafe 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/ExecutionStateSamplerTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/ExecutionStateSamplerTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.core.metrics; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.mock; import java.io.Closeable; @@ -29,9 +29,6 @@ import org.junit.Test; /** Tests for {@link org.apache.beam.runners.core.metrics.ExecutionStateSampler}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ExecutionStateSamplerTest { private MillisProvider clock; diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/ExecutionStateTrackerTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/ExecutionStateTrackerTest.java index 4f2beef67927..5821e5ab6d52 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/ExecutionStateTrackerTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/ExecutionStateTrackerTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.core.metrics; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.mock; import java.io.Closeable; @@ -29,9 +29,6 @@ import org.junit.Test; /** Tests for {@link ExecutionStateTracker}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ExecutionStateTrackerTest { private MillisProvider clock; diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/GaugeCellTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/GaugeCellTest.java index 174042dddaff..2fcae5b2c3fa 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/GaugeCellTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/GaugeCellTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.core.metrics; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import org.apache.beam.sdk.metrics.MetricName; import org.junit.Assert; diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/GcpResourceIdentifiersTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/GcpResourceIdentifiersTest.java new file mode 100644 index 000000000000..97afa096f64b --- /dev/null +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/GcpResourceIdentifiersTest.java @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.metrics; + +import org.junit.Assert; +import org.junit.Test; + +public class GcpResourceIdentifiersTest { + @Test + public void testBigQueryTable() { + String resource = GcpResourceIdentifiers.bigQueryTable("myProject", "myDataset", "myTableId"); + Assert.assertEquals( + "//bigquery.googleapis.com/projects/myProject/datasets/myDataset/tables/myTableId", + resource); + } +} diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/LabeledMetricsTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/LabeledMetricsTest.java index 03d9aadb8564..0e6f692df1a7 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/LabeledMetricsTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/LabeledMetricsTest.java @@ -32,9 +32,6 @@ import org.mockito.Mockito; /** Tests for {@link LabeledMetrics}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class LabeledMetricsTest implements Serializable { @Rule public final transient ExpectedException thrown = ExpectedException.none(); diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MetricUpdateMatchers.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MetricUpdateMatchers.java index abef0a934d26..9b8a4368f593 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MetricUpdateMatchers.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MetricUpdateMatchers.java @@ -24,9 +24,6 @@ import org.hamcrest.TypeSafeMatcher; /** Matchers for {@link MetricUpdate}. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class MetricUpdateMatchers { /** diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MetricsContainerImplTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MetricsContainerImplTest.java index 52e77611dc95..c20ee5de2193 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MetricsContainerImplTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MetricsContainerImplTest.java @@ -18,16 +18,24 @@ package org.apache.beam.runners.core.metrics; import static org.apache.beam.runners.core.metrics.MetricUpdateMatchers.metricUpdate; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.emptyIterable; import static org.hamcrest.collection.IsIterableContainingInAnyOrder.containsInAnyOrder; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertThrows; +import static org.junit.Assert.assertTrue; import java.util.ArrayList; +import java.util.Collections; import java.util.HashMap; +import java.util.HashSet; +import java.util.Map; +import java.util.Set; import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo; import org.apache.beam.sdk.metrics.MetricName; +import org.apache.beam.sdk.util.HistogramData; import org.junit.Assert; import org.junit.Test; import org.junit.runner.RunWith; @@ -35,9 +43,6 @@ /** Tests for {@link MetricsContainerImpl}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class MetricsContainerImplTest { @Test @@ -263,6 +268,15 @@ public void testMonitoringInfosArePopulatedForABeamCounter() { assertThat(actualMonitoringInfos, containsInAnyOrder(builder1.build())); } + @Test + public void testProcessWideMetricContainerThrowsWhenReset() { + MetricsContainerImpl testObject = MetricsContainerImpl.createProcessWideContainer(); + CounterCell c1 = testObject.getCounter(MetricName.named("ns", "name1")); + c1.inc(2L); + + assertThrows(RuntimeException.class, () -> testObject.reset()); + } + @Test public void testEquals() { MetricsContainerImpl metricsContainerImpl = new MetricsContainerImpl("stepName"); @@ -271,6 +285,61 @@ public void testEquals() { Assert.assertEquals(metricsContainerImpl.hashCode(), equal.hashCode()); } + @Test + public void testDeltaCounters() { + MetricName cName = MetricName.named("namespace", "counter"); + MetricName gName = MetricName.named("namespace", "gauge"); + HistogramData.BucketType bucketType = HistogramData.LinearBuckets.of(0, 2, 5); + MetricName hName = MetricName.named("namespace", "histogram"); + + MetricsContainerImpl prevContainer = new MetricsContainerImpl(null); + prevContainer.getCounter(cName).inc(2L); + prevContainer.getGauge(gName).set(4L); + // Set buckets counts to: [1,1,1,0,0,0,1] + prevContainer.getHistogram(hName, bucketType).update(-1); + prevContainer.getHistogram(hName, bucketType).update(1); + prevContainer.getHistogram(hName, bucketType).update(3); + prevContainer.getHistogram(hName, bucketType).update(20); + + MetricsContainerImpl nextContainer = new MetricsContainerImpl(null); + nextContainer.getCounter(cName).inc(9L); + nextContainer.getGauge(gName).set(8L); + // Set buckets counts to: [2,4,5,0,0,0,3] + 
nextContainer.getHistogram(hName, bucketType).update(-1); + nextContainer.getHistogram(hName, bucketType).update(-1); + for (int i = 0; i < 4; i++) { + nextContainer.getHistogram(hName, bucketType).update(1); + } + for (int i = 0; i < 5; i++) { + nextContainer.getHistogram(hName, bucketType).update(3); + } + nextContainer.getHistogram(hName, bucketType).update(20); + nextContainer.getHistogram(hName, bucketType).update(20); + nextContainer.getHistogram(hName, bucketType).update(20); + + MetricsContainerImpl deltaContainer = + MetricsContainerImpl.deltaContainer(prevContainer, nextContainer); + // Expect counter value: 7 = 9 - 2 + long cValue = deltaContainer.getCounter(cName).getCumulative(); + assertEquals(7L, cValue); + + // Expect gauge value: 8. + GaugeData gValue = deltaContainer.getGauge(gName).getCumulative(); + assertEquals(8L, gValue.value()); + + // Expect bucket counts: [1,3,4,0,0,0,2] + assertEquals( + 1, deltaContainer.getHistogram(hName, bucketType).getCumulative().getBottomBucketCount()); + long[] expectedBucketCounts = (new long[] {3, 4, 0, 0, 0}); + for (int i = 0; i < expectedBucketCounts.length; i++) { + assertEquals( + expectedBucketCounts[i], + deltaContainer.getHistogram(hName, bucketType).getCumulative().getCount(i)); + } + assertEquals( + 2, deltaContainer.getHistogram(hName, bucketType).getCumulative().getTopBucketCount()); + } + @Test public void testNotEquals() { MetricsContainerImpl metricsContainerImpl = new MetricsContainerImpl("stepName"); @@ -296,4 +365,37 @@ public void testNotEquals() { Assert.assertNotEquals(metricsContainerImpl, differentGauges); Assert.assertNotEquals(metricsContainerImpl.hashCode(), differentGauges.hashCode()); } + + @Test + public void testMatchMetric() { + String urn = MonitoringInfoConstants.Urns.API_REQUEST_COUNT; + Map labels = new HashMap(); + labels.put(MonitoringInfoConstants.Labels.PTRANSFORM, "MyPtransform"); + labels.put(MonitoringInfoConstants.Labels.SERVICE, "BigQuery"); + labels.put(MonitoringInfoConstants.Labels.METHOD, "BigQueryBatchWrite"); + labels.put(MonitoringInfoConstants.Labels.RESOURCE, "Resource"); + labels.put(MonitoringInfoConstants.Labels.BIGQUERY_PROJECT_ID, "MyProject"); + labels.put(MonitoringInfoConstants.Labels.BIGQUERY_DATASET, "MyDataset"); + labels.put(MonitoringInfoConstants.Labels.BIGQUERY_TABLE, "MyTable"); + + // MonitoringInfoMetricName will copy labels. So its safe to reuse this reference. 
+ labels.put(MonitoringInfoConstants.Labels.STATUS, "ok"); + MonitoringInfoMetricName okName = MonitoringInfoMetricName.named(urn, labels); + labels.put(MonitoringInfoConstants.Labels.STATUS, "not_found"); + MonitoringInfoMetricName notFoundName = MonitoringInfoMetricName.named(urn, labels); + + Set allowedMetricUrns = new HashSet(); + allowedMetricUrns.add(MonitoringInfoConstants.Urns.API_REQUEST_COUNT); + assertTrue(MetricsContainerImpl.matchMetric(okName, allowedMetricUrns)); + assertTrue(MetricsContainerImpl.matchMetric(notFoundName, allowedMetricUrns)); + + MetricName userMetricName = MetricName.named("namespace", "name"); + assertFalse(MetricsContainerImpl.matchMetric(userMetricName, allowedMetricUrns)); + + MetricName elementCountName = + MonitoringInfoMetricName.named( + MonitoringInfoConstants.Urns.ELEMENT_COUNT, + Collections.singletonMap("name", "counter")); + assertFalse(MetricsContainerImpl.matchMetric(elementCountName, allowedMetricUrns)); + } } diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MetricsContainerStepMapTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MetricsContainerStepMapTest.java index c3d5d2cc010f..4718a6f2fed3 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MetricsContainerStepMapTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MetricsContainerStepMapTest.java @@ -20,9 +20,9 @@ import static org.apache.beam.runners.core.metrics.MetricsContainerStepMap.asAttemptedOnlyMetricResults; import static org.apache.beam.runners.core.metrics.MetricsContainerStepMap.asMetricResults; import static org.apache.beam.sdk.metrics.MetricResultsMatchers.metricsResult; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.hasItem; -import static org.junit.Assert.assertThat; import java.io.Closeable; import java.io.IOException; @@ -50,9 +50,6 @@ import org.slf4j.LoggerFactory; /** Tests for {@link MetricsContainerStepMap}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class MetricsContainerStepMapTest { private static final Logger LOG = LoggerFactory.getLogger(MetricsContainerStepMapTest.class); diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MetricsLoggerTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MetricsLoggerTest.java new file mode 100644 index 000000000000..a37a0c75c12b --- /dev/null +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MetricsLoggerTest.java @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.core.metrics; + +import static org.hamcrest.MatcherAssert.assertThat; + +import java.util.Collections; +import java.util.HashSet; +import java.util.Set; +import org.apache.beam.sdk.metrics.MetricName; +import org.apache.beam.sdk.util.HistogramData; +import org.hamcrest.CoreMatchers; +import org.junit.Test; + +public class MetricsLoggerTest { + + @Test + public void testGeneratedLogMessageShowsDeltas() { + MetricName cName = + MonitoringInfoMetricName.named( + MonitoringInfoConstants.Urns.ELEMENT_COUNT, + Collections.singletonMap("name", "counter")); + HistogramData.BucketType bucketType = HistogramData.LinearBuckets.of(0, 2, 5); + MetricName hName = + MonitoringInfoMetricName.named( + MonitoringInfoConstants.Urns.ELEMENT_COUNT, + Collections.singletonMap("name", "histogram")); + + MetricsLogger logger = new MetricsLogger(null); + logger.getCounter(cName).inc(2L); + // Set buckets counts to: [0,1,1,,0,0,...] + logger.getHistogram(hName, bucketType).update(1); + logger.getHistogram(hName, bucketType).update(3); + + Set allowedMetricUrns = new HashSet(); + allowedMetricUrns.add(MonitoringInfoConstants.Urns.ELEMENT_COUNT); + String msg = logger.generateLogMessage("My Headder", allowedMetricUrns, 0); + assertThat(msg, CoreMatchers.containsString("beam:metric:element_count:v1 {name=counter} = 2")); + assertThat( + msg, + CoreMatchers.containsString( + "{name=histogram} = {count: 2, p50: 2.000000, p90: 3.600000, p99: 3.960000}")); + + logger.getCounter(cName).inc(3L); + // Set buckets counts to: [0,5,6,0,0,0] + // Which means a delta of: [0,4,5,0,0,0] + for (int i = 0; i < 4; i++) { + logger.getHistogram(hName, bucketType).update(1); + } + for (int i = 0; i < 5; i++) { + logger.getHistogram(hName, bucketType).update(3); + } + msg = logger.generateLogMessage("My Header: ", allowedMetricUrns, 0); + assertThat(msg, CoreMatchers.containsString("beam:metric:element_count:v1 {name=counter} = 3")); + assertThat( + msg, + CoreMatchers.containsString( + "{name=histogram} = {count: 9, p50: 2.200000, p90: 3.640000, p99: 3.964000}")); + + logger.getCounter(cName).inc(4L); + // Set buckets counts to: [0,8,10,0,0,0] + // Which means a delta of: [0,3,4,0,0,0] + for (int i = 0; i < 3; i++) { + logger.getHistogram(hName, bucketType).update(1); + } + for (int i = 0; i < 4; i++) { + logger.getHistogram(hName, bucketType).update(3); + } + msg = logger.generateLogMessage("My Header: ", allowedMetricUrns, 0); + assertThat( + msg, + CoreMatchers.containsString( + "{name=histogram} = {count: 7, p50: 2.250000, p90: 3.650000, p99: 3.965000}")); + } +} diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MetricsMapTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MetricsMapTest.java index f4b2b603c8fa..203d1dcf7f3d 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MetricsMapTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MetricsMapTest.java @@ -17,11 +17,11 @@ */ package org.apache.beam.runners.core.metrics; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.not; import static org.hamcrest.Matchers.nullValue; import static org.hamcrest.Matchers.sameInstance; -import static org.junit.Assert.assertThat; import java.util.Map; import java.util.Map.Entry; diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MetricsPusherTest.java 
b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MetricsPusherTest.java index cdf8b29f383d..5c8902fd7ca5 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MetricsPusherTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MetricsPusherTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.core.metrics; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.is; -import static org.junit.Assert.assertThat; import org.apache.beam.sdk.io.GenerateSequence; import org.apache.beam.sdk.metrics.Counter; @@ -45,9 +45,6 @@ /** A test that verifies that metrics push system works. */ @Category({UsesMetricsPusher.class, ValidatesRunner.class}) @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class MetricsPusherTest { private static final Logger LOG = LoggerFactory.getLogger(MetricsPusherTest.class); diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MonitoringInfoConstantsTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MonitoringInfoConstantsTest.java index 27a1c1611fb4..b748eb8e00a3 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MonitoringInfoConstantsTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MonitoringInfoConstantsTest.java @@ -18,7 +18,7 @@ package org.apache.beam.runners.core.metrics; import static org.apache.beam.runners.core.metrics.MonitoringInfoConstants.extractUrn; -import static org.junit.Assert.assertThat; +import static org.hamcrest.MatcherAssert.assertThat; import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfoSpecs; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ArrayListMultimap; diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MonitoringInfoEncodingsTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MonitoringInfoEncodingsTest.java index 4769976100cb..f048c5ac39a7 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MonitoringInfoEncodingsTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MonitoringInfoEncodingsTest.java @@ -28,7 +28,7 @@ import static org.apache.beam.runners.core.metrics.MonitoringInfoEncodings.encodeInt64Gauge; import static org.junit.Assert.assertEquals; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.joda.time.Instant; import org.junit.Test; import org.junit.runner.RunWith; diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MonitoringInfoMetricNameTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MonitoringInfoMetricNameTest.java index 4572cb5e6a34..21f0993b792e 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MonitoringInfoMetricNameTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MonitoringInfoMetricNameTest.java @@ -27,9 +27,6 @@ import org.junit.rules.ExpectedException; /** Tests for {@link MonitoringInfoMetricName}. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class MonitoringInfoMetricNameTest implements Serializable { @Rule public final transient ExpectedException thrown = ExpectedException.none(); diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MonitoringInfoTestUtil.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MonitoringInfoTestUtil.java index be4258b16bb9..66090df32cdf 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MonitoringInfoTestUtil.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MonitoringInfoTestUtil.java @@ -25,7 +25,6 @@ */ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class MonitoringInfoTestUtil { /** @return A basic MonitoringInfoMetricName to test. */ diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/ServiceCallMetricTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/ServiceCallMetricTest.java new file mode 100644 index 000000000000..182ebd31b182 --- /dev/null +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/ServiceCallMetricTest.java @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.metrics; + +import java.util.HashMap; +import org.apache.beam.sdk.metrics.MetricsEnvironment; +import org.junit.Assert; +import org.junit.Test; + +public class ServiceCallMetricTest { + + @Test + public void testCall() { + // Test that its on the ProcessWideMetricContainer. + MetricsContainerImpl container = new MetricsContainerImpl(null); + MetricsEnvironment.setProcessWideContainer(container); + + String urn = MonitoringInfoConstants.Urns.API_REQUEST_COUNT; + HashMap labels = new HashMap(); + labels.put("key", "value"); + ServiceCallMetric metric = + new ServiceCallMetric(MonitoringInfoConstants.Urns.API_REQUEST_COUNT, labels); + + // Http Status Code + metric.call(200); + labels.put(MonitoringInfoConstants.Labels.STATUS, "ok"); + MonitoringInfoMetricName name = MonitoringInfoMetricName.named(urn, labels); + Assert.assertEquals(1, (long) container.getCounter(name).getCumulative()); + + // Normalize the status by lower casing and mapping to a canonical name with underscores. + metric.call("notFound"); + labels.put(MonitoringInfoConstants.Labels.STATUS, "not_found"); + name = MonitoringInfoMetricName.named(urn, labels); + Assert.assertEquals(1, (long) container.getCounter(name).getCumulative()); + + // Normalize the status by lower casing and mapping to a canonical name with underscores. 
+ metric.call("PERMISSIONDENIED"); + labels.put(MonitoringInfoConstants.Labels.STATUS, "permission_denied"); + name = MonitoringInfoMetricName.named(urn, labels); + Assert.assertEquals(1, (long) container.getCounter(name).getCumulative()); + + // Accept other string codes passed in, even if they aren't in the canonical map. + metric.call("something_else"); + labels.put(MonitoringInfoConstants.Labels.STATUS, "something_else"); + name = MonitoringInfoMetricName.named(urn, labels); + Assert.assertEquals(1, (long) container.getCounter(name).getCumulative()); + + // Map unknown numeric codes to "unknown" + metric.call(123); + labels.put(MonitoringInfoConstants.Labels.STATUS, "unknown"); + name = MonitoringInfoMetricName.named(urn, labels); + Assert.assertEquals(1, (long) container.getCounter(name).getCumulative()); + } +} diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/ShortIdMapTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/ShortIdMapTest.java new file mode 100644 index 000000000000..0a1eb12e7ef9 --- /dev/null +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/ShortIdMapTest.java @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.core.metrics; + +import static org.junit.Assert.assertEquals; + +import java.util.ArrayList; +import java.util.HashMap; +import java.util.HashSet; +import java.util.List; +import java.util.Set; +import org.apache.beam.model.pipeline.v1.MetricsApi; +import org.apache.beam.sdk.values.KV; +import org.junit.Test; + +public class ShortIdMapTest { + + @Test + public void testShortIdAssignment() throws Exception { + ShortIdMap shortIdMap = new ShortIdMap(); + List> testCases = new ArrayList<>(); + + SimpleMonitoringInfoBuilder builder = new SimpleMonitoringInfoBuilder(false); + builder.setUrn(MonitoringInfoConstants.Urns.USER_DISTRIBUTION_INT64); + testCases.add(KV.of("metric0", builder.build())); + + builder = new SimpleMonitoringInfoBuilder(false); + builder.setUrn(MonitoringInfoConstants.Urns.ELEMENT_COUNT); + testCases.add(KV.of("metric1", builder.build())); + + builder = new SimpleMonitoringInfoBuilder(false); + builder.setUrn(MonitoringInfoConstants.Urns.USER_DISTRIBUTION_DOUBLE); + testCases.add(KV.of("metric2", builder.build())); + + builder = new SimpleMonitoringInfoBuilder(false); + builder.setUrn("TestingSentinelUrn"); + builder.setType("TestingSentinelType"); + testCases.add(KV.of("metric3", builder.build())); + + builder = new SimpleMonitoringInfoBuilder(false); + builder.setUrn(MonitoringInfoConstants.Urns.FINISH_BUNDLE_MSECS); + testCases.add(KV.of("metric4", builder.build())); + + builder = new SimpleMonitoringInfoBuilder(false); + builder.setUrn(MonitoringInfoConstants.Urns.USER_SUM_INT64); + testCases.add(KV.of("metric5", builder.build())); + + builder = new SimpleMonitoringInfoBuilder(false); + builder.setUrn(MonitoringInfoConstants.Urns.USER_SUM_INT64); + builder.setLabel(MonitoringInfoConstants.Labels.NAME, "metricNumber7"); + builder.setLabel(MonitoringInfoConstants.Labels.NAMESPACE, "myNamespace"); + builder.setLabel(MonitoringInfoConstants.Labels.PTRANSFORM, "myPtransform"); + testCases.add(KV.of("metric6", builder.build())); + + builder = new SimpleMonitoringInfoBuilder(false); + builder.setUrn(MonitoringInfoConstants.Urns.USER_SUM_INT64); + builder.setLabel(MonitoringInfoConstants.Labels.NAME, "metricNumber8"); + builder.setLabel(MonitoringInfoConstants.Labels.NAMESPACE, "myNamespace"); + builder.setLabel(MonitoringInfoConstants.Labels.PTRANSFORM, "myPtransform"); + testCases.add(KV.of("metric7", builder.build())); + + builder = new SimpleMonitoringInfoBuilder(false); + builder.setUrn(MonitoringInfoConstants.Urns.API_REQUEST_COUNT); + builder.setLabel(MonitoringInfoConstants.Labels.SERVICE, "BigQuery"); + testCases.add(KV.of("metric8", builder.build())); + + builder = new SimpleMonitoringInfoBuilder(false); + builder.setUrn(MonitoringInfoConstants.Urns.API_REQUEST_COUNT); + builder.setLabel(MonitoringInfoConstants.Labels.SERVICE, "Storage"); + testCases.add(KV.of("metric9", builder.build())); + + // Validate that modifying the payload, but using the same URN/labels + // does not change the shortId assignment. 
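+ // (Short ids are keyed on the MonitoringInfo's urn, type, and labels only; see
+ // ShortIdMap#getOrCreateShortId, which rejects infos that carry a payload or start time.)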
+ builder = new SimpleMonitoringInfoBuilder(false); + builder.setUrn(MonitoringInfoConstants.Urns.USER_DISTRIBUTION_DOUBLE); + testCases.add(KV.of("metric2", builder.build())); + + builder = new SimpleMonitoringInfoBuilder(false); + builder.setUrn(MonitoringInfoConstants.Urns.USER_SUM_INT64); + builder.setLabel(MonitoringInfoConstants.Labels.NAME, "metricNumber7"); + builder.setLabel(MonitoringInfoConstants.Labels.NAMESPACE, "myNamespace"); + builder.setLabel(MonitoringInfoConstants.Labels.PTRANSFORM, "myPtransform"); + testCases.add(KV.of("metric6", builder.build())); + + // Verify each short ID is assigned properly. + Set expectedShortIds = new HashSet(); + for (KV entry : testCases) { + assertEquals(entry.getKey(), shortIdMap.getOrCreateShortId(entry.getValue())); + expectedShortIds.add(entry.getKey()); + } + + HashMap actualRecoveredInfos = new HashMap<>(); + for (String expectedShortId : expectedShortIds) { + actualRecoveredInfos.put(expectedShortId, shortIdMap.get(expectedShortId)); + } + // Retrieve all of the MonitoringInfos by short id, and verify that the + // metadata (everything but the payload) matches the originals + assertEquals(expectedShortIds, actualRecoveredInfos.keySet()); + for (KV entry : testCases) { + // Clear payloads of both expected and actual before comparing + MetricsApi.MonitoringInfo expectedMonitoringInfo = entry.getValue(); + MetricsApi.MonitoringInfo.Builder expected = + MetricsApi.MonitoringInfo.newBuilder(expectedMonitoringInfo); + expected.clearPayload(); + + MetricsApi.MonitoringInfo.Builder actual = + MetricsApi.MonitoringInfo.newBuilder(actualRecoveredInfos.get(entry.getKey())); + actual.clearPayload(); + assertEquals(expected.build(), actual.build()); + } + + // Verify each short ID is assigned properly, in reverse. + for (int i = testCases.size() - 1; i > 0; i--) { + assertEquals( + testCases.get(i).getKey(), shortIdMap.getOrCreateShortId(testCases.get(i).getValue())); + } + } +} diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/SimpleExecutionStateTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/SimpleExecutionStateTest.java index 270c3c3edddf..f774930b958a 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/SimpleExecutionStateTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/SimpleExecutionStateTest.java @@ -30,9 +30,6 @@ /** Tests for {@link SimpleExecutionState}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SimpleExecutionStateTest { @Test diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/SimpleMonitoringInfoBuilderTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/SimpleMonitoringInfoBuilderTest.java index 69ec678ae342..c1d2016dd94a 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/SimpleMonitoringInfoBuilderTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/SimpleMonitoringInfoBuilderTest.java @@ -29,9 +29,6 @@ /** Tests for {@link SimpleMonitoringInfoBuilder}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SimpleMonitoringInfoBuilderTest { @Test diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/SimpleStateRegistryTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/SimpleStateRegistryTest.java index 7da5607b114f..26007fd33d9f 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/SimpleStateRegistryTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/SimpleStateRegistryTest.java @@ -17,7 +17,7 @@ */ package org.apache.beam.runners.core.metrics; -import static org.junit.Assert.assertThat; +import static org.hamcrest.MatcherAssert.assertThat; import static org.mockito.Mockito.mock; import static org.mockito.Mockito.times; import static org.mockito.Mockito.verify; @@ -31,9 +31,6 @@ import org.junit.Test; /** Tests for {@link SimpleStateRegistryTest}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SimpleStateRegistryTest { @Test diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/TestMetricsSink.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/TestMetricsSink.java index e34d55907629..29daa848cc33 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/TestMetricsSink.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/TestMetricsSink.java @@ -29,9 +29,6 @@ * This sink just stores in a static field the first counter (if it exists) attempted value. This is * useful for tests. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TestMetricsSink implements MetricsSink { private static MetricQueryResults metricQueryResults; diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/AfterAllStateMachineTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/AfterAllStateMachineTest.java index 291be656fda8..b0874929fb63 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/AfterAllStateMachineTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/AfterAllStateMachineTest.java @@ -33,9 +33,6 @@ /** Tests for {@link AfterAllStateMachine}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class AfterAllStateMachineTest { private SimpleTriggerStateMachineTester tester; diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/AfterEachStateMachineTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/AfterEachStateMachineTest.java index d4a97b7fe2b8..301cf7d2c204 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/AfterEachStateMachineTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/AfterEachStateMachineTest.java @@ -34,9 +34,6 @@ /** Tests for {@link AfterEachStateMachine}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class AfterEachStateMachineTest { private SimpleTriggerStateMachineTester tester; diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/AfterFirstStateMachineTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/AfterFirstStateMachineTest.java index e4e572366775..37855b24742e 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/AfterFirstStateMachineTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/AfterFirstStateMachineTest.java @@ -37,9 +37,6 @@ /** Tests for {@link AfterFirstStateMachine}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class AfterFirstStateMachineTest { @Mock private TriggerStateMachine mockTrigger1; diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/AfterPaneStateMachineTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/AfterPaneStateMachineTest.java index 539a507c2cc1..8c1f947ef8eb 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/AfterPaneStateMachineTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/AfterPaneStateMachineTest.java @@ -33,9 +33,6 @@ /** Tests for {@link AfterPaneStateMachine}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class AfterPaneStateMachineTest { SimpleTriggerStateMachineTester tester; diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/AfterWatermarkStateMachineTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/AfterWatermarkStateMachineTest.java index baf454d95ea7..c71e85c57118 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/AfterWatermarkStateMachineTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/AfterWatermarkStateMachineTest.java @@ -17,11 +17,11 @@ */ package org.apache.beam.runners.core.triggers; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.nullValue; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.mockito.Mockito.doNothing; import static org.mockito.Mockito.verify; @@ -45,9 +45,6 @@ /** Tests the {@link AfterWatermarkStateMachine} triggers. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class AfterWatermarkStateMachineTest { @Mock private TriggerStateMachine mockEarly; diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/DefaultTriggerStateMachineTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/DefaultTriggerStateMachineTest.java index b2fd54fcd4c3..f0dc347017d3 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/DefaultTriggerStateMachineTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/DefaultTriggerStateMachineTest.java @@ -36,9 +36,6 @@ * Repeatedly.forever(AfterWatermark.pastEndOfWindow())}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DefaultTriggerStateMachineTest { SimpleTriggerStateMachineTester tester; diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/FinishedTriggersBitSetTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/FinishedTriggersBitSetTest.java index 9594a509a978..7e7c246e6931 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/FinishedTriggersBitSetTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/FinishedTriggersBitSetTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.core.triggers; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.not; import static org.hamcrest.Matchers.theInstance; -import static org.junit.Assert.assertThat; import org.junit.Test; import org.junit.runner.RunWith; diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/FinishedTriggersSetTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/FinishedTriggersSetTest.java index 7aed99a6c59c..03513929e352 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/FinishedTriggersSetTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/FinishedTriggersSetTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.core.triggers; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.not; import static org.hamcrest.Matchers.theInstance; -import static org.junit.Assert.assertThat; import java.util.HashSet; import org.junit.Test; diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/NeverStateMachineTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/NeverStateMachineTest.java index df5c8869b887..384aa12639fa 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/NeverStateMachineTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/NeverStateMachineTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.core.triggers; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.is; -import static org.junit.Assert.assertThat; import org.apache.beam.runners.core.triggers.TriggerStateMachineTester.SimpleTriggerStateMachineTester; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; @@ -34,9 +34,6 @@ /** Tests for {@link NeverStateMachine}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class NeverStateMachineTest { private SimpleTriggerStateMachineTester triggerTester; diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/OrFinallyStateMachineTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/OrFinallyStateMachineTest.java index 2073e5278bb6..dcf0b96be39f 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/OrFinallyStateMachineTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/OrFinallyStateMachineTest.java @@ -33,9 +33,6 @@ /** Tests for {@link OrFinallyStateMachine}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class OrFinallyStateMachineTest { private SimpleTriggerStateMachineTester tester; diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/RepeatedlyStateMachineTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/RepeatedlyStateMachineTest.java index e47cc6bfe6fb..08312360806e 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/RepeatedlyStateMachineTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/RepeatedlyStateMachineTest.java @@ -41,9 +41,6 @@ /** Tests for {@link RepeatedlyStateMachine}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class RepeatedlyStateMachineTest { @Mock private TriggerStateMachine mockTrigger; diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/TriggerStateMachineTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/TriggerStateMachineTest.java index 2992e2271dfa..20e68a434ef9 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/TriggerStateMachineTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/TriggerStateMachineTest.java @@ -29,9 +29,6 @@ /** Tests for {@link TriggerStateMachine}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TriggerStateMachineTest { @Test diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/TriggerStateMachineTester.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/TriggerStateMachineTester.java index 4b3dd620e0f3..a2c6b3762377 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/TriggerStateMachineTester.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/TriggerStateMachineTester.java @@ -63,9 +63,6 @@ * * @param The type of windows being used. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TriggerStateMachineTester { /** diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/TriggerStateMachinesTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/TriggerStateMachinesTest.java index 4082fda2fd42..bee9e61e4e48 100644 --- a/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/TriggerStateMachinesTest.java +++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/triggers/TriggerStateMachinesTest.java @@ -18,9 +18,9 @@ package org.apache.beam.runners.core.triggers; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.instanceOf; -import static org.junit.Assert.assertThat; import org.apache.beam.model.pipeline.v1.RunnerApi; import org.apache.beam.sdk.state.TimeDomain; diff --git a/runners/direct-java/build.gradle b/runners/direct-java/build.gradle index 42e0aa5452e7..caec7bafd150 100644 --- a/runners/direct-java/build.gradle +++ b/runners/direct-java/build.gradle @@ -26,7 +26,8 @@ def dependOnProjects = [":runners:core-construction-java", ":runners:core-java", ":runners:local-java", ":runners:java-fn-execution", - ":sdks:java:fn-execution"] + ":sdks:java:fn-execution" + ] applyJavaNature( automaticModuleName: 'org.apache.beam.runners.direct', @@ -69,20 +70,15 @@ dependencies { compile project(it) } shadow project(path: ":sdks:java:core", configuration: "shadow") - shadow library.java.vendored_grpc_1_26_0 + shadow library.java.vendored_grpc_1_36_0 shadow library.java.joda_time shadow library.java.slf4j_api - shadow library.java.args4j + shadow library.java.jackson_databind provided library.java.hamcrest_core provided library.java.junit shadowTest project(path: ":sdks:java:core", configuration: "shadowTest") shadowTest project(path: ":runners:core-java", configuration: "testRuntime") - shadowTest library.java.slf4j_jdk14 shadowTest library.java.mockito_core - shadowTest library.java.stax2_api - shadowTest library.java.woodstox_core_asl - shadowTest library.java.google_cloud_dataflow_java_proto_library_all - shadowTest library.java.jackson_dataformat_yaml needsRunner project(path: ":runners:core-construction-java", configuration: "testRuntime") needsRunner project(path: ":runners:core-java", configuration: "testRuntime") needsRunner project(path: ":sdks:java:core", configuration: "shadowTest") @@ -93,6 +89,9 @@ dependencies { validatesRunner project(path: ":sdks:java:core", configuration: "shadowTest") validatesRunner project(path: project.path, configuration: "shadow") validatesRunner project(path: project.path, configuration: "shadowTest") + permitUnusedDeclared library.java.vendored_grpc_1_36_0 + permitUnusedDeclared project(":runners:java-fn-execution") + permitUnusedDeclared project(":sdks:java:fn-execution") } // windows handles quotes differently from linux, @@ -130,7 +129,10 @@ task needsRunnerTests(type: Test) { // MetricsPusher isn't implemented in direct runner excludeCategories "org.apache.beam.sdk.testing.UsesMetricsPusher" excludeCategories "org.apache.beam.sdk.testing.UsesCrossLanguageTransforms" + excludeCategories "org.apache.beam.sdk.testing.UsesPythonExpansionService" excludeCategories 'org.apache.beam.sdk.testing.UsesBundleFinalizer' + // https://issues.apache.org/jira/browse/BEAM-2791 + 
excludeCategories 'org.apache.beam.sdk.testing.UsesLoopingTimer' } testLogging { outputs.upToDateWhen {false} @@ -159,6 +161,9 @@ task validatesRunner(type: Test) { excludeCategories "org.apache.beam.sdk.testing.LargeKeys\$Above100MB" excludeCategories 'org.apache.beam.sdk.testing.UsesMetricsPusher' excludeCategories "org.apache.beam.sdk.testing.UsesCrossLanguageTransforms" + excludeCategories "org.apache.beam.sdk.testing.UsesPythonExpansionService" + // https://issues.apache.org/jira/browse/BEAM-2791 + excludeCategories 'org.apache.beam.sdk.testing.UsesLoopingTimer' } } diff --git a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DirectRunner.java b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DirectRunner.java index 3404fa5547d7..4e7178bf01d0 100644 --- a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DirectRunner.java +++ b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DirectRunner.java @@ -184,7 +184,6 @@ public DirectPipelineResult run(Pipeline pipeline) { DisplayDataValidator.validatePipeline(pipeline); DisplayDataValidator.validateOptions(options); - SplittableParDo.convertReadBasedSplittableDoFnsToPrimitiveReadsIfNecessary(pipeline); ExecutorService metricsPool = Executors.newCachedThreadPool( @@ -253,6 +252,9 @@ void performRewrites(Pipeline pipeline) { // The last set of overrides includes GBK overrides used in WriteView pipeline.replaceAll(groupByKeyOverrides()); + + // TODO(BEAM-10670): Use SDF read as default when we address performance issue. + SplittableParDo.convertReadBasedSplittableDoFnsToPrimitiveReadsIfNecessary(pipeline); } @SuppressWarnings("rawtypes") diff --git a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DirectTimerInternals.java b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DirectTimerInternals.java index d240e1bb641a..5a477bb86c5e 100644 --- a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DirectTimerInternals.java +++ b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DirectTimerInternals.java @@ -17,13 +17,13 @@ */ package org.apache.beam.runners.direct; -import java.util.stream.StreamSupport; import org.apache.beam.runners.core.StateNamespace; import org.apache.beam.runners.core.TimerInternals; import org.apache.beam.runners.direct.WatermarkManager.TimerUpdate; import org.apache.beam.runners.direct.WatermarkManager.TimerUpdate.TimerUpdateBuilder; import org.apache.beam.runners.direct.WatermarkManager.TransformWatermarks; import org.apache.beam.sdk.state.TimeDomain; +import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.checkerframework.checker.nullness.qual.Nullable; import org.joda.time.Instant; @@ -71,8 +71,16 @@ public void setTimer(TimerData timerData) { } @Override - public void deleteTimer(StateNamespace namespace, String timerId, TimeDomain timeDomain) { - throw new UnsupportedOperationException("Canceling of timer by ID is not yet supported."); + public void deleteTimer( + StateNamespace namespace, String timerId, String timerFamilyId, TimeDomain timeDomain) { + deleteTimer( + TimerData.of( + timerId, + timerFamilyId, + namespace, + BoundedWindow.TIMESTAMP_MIN_VALUE, + BoundedWindow.TIMESTAMP_MAX_VALUE, + timeDomain)); } /** @deprecated use {@link #deleteTimer(StateNamespace, String, TimeDomain)}. 
*/ @@ -93,10 +101,19 @@ public TimerUpdate getTimerUpdate() { return timerUpdateBuilder.build(); } - public boolean containsUpdateForTimeBefore(Instant time) { + public boolean containsUpdateForTimeBefore( + Instant maxWatermarkTime, Instant maxProcessingTime, Instant maxSynchronizedProcessingTime) { TimerUpdate update = timerUpdateBuilder.build(); - return hasTimeBefore(update.getSetTimers(), time) - || hasTimeBefore(update.getDeletedTimers(), time); + return hasTimeBefore( + update.getSetTimers(), + maxWatermarkTime, + maxProcessingTime, + maxSynchronizedProcessingTime) + || hasTimeBefore( + update.getDeletedTimers(), + maxWatermarkTime, + maxProcessingTime, + maxSynchronizedProcessingTime); } @Override @@ -119,8 +136,31 @@ public Instant currentInputWatermarkTime() { return watermarks.getOutputWatermark(); } - private boolean hasTimeBefore(Iterable timers, Instant time) { - return StreamSupport.stream(timers.spliterator(), false) - .anyMatch(td -> td.getTimestamp().isBefore(time)); + private boolean hasTimeBefore( + Iterable timers, + Instant maxWatermarkTime, + Instant maxProcessingTime, + Instant maxSynchronizedProcessingTime) { + for (TimerData timerData : timers) { + Instant currentTime; + switch (timerData.getDomain()) { + case EVENT_TIME: + currentTime = maxWatermarkTime; + break; + case PROCESSING_TIME: + currentTime = maxProcessingTime; + break; + case SYNCHRONIZED_PROCESSING_TIME: + currentTime = maxSynchronizedProcessingTime; + break; + default: + throw new RuntimeException("Unexpected timeDomain " + timerData.getDomain()); + } + if (timerData.getTimestamp().isBefore(currentTime) + || timerData.getTimestamp().isEqual(currentTime)) { + return true; + } + } + return false; } } diff --git a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DoFnLifecycleManager.java b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DoFnLifecycleManager.java index 788a8a0e0019..fdd46673a459 100644 --- a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DoFnLifecycleManager.java +++ b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DoFnLifecycleManager.java @@ -21,6 +21,7 @@ import java.util.concurrent.ConcurrentHashMap; import java.util.concurrent.ConcurrentMap; import org.apache.beam.sdk.PipelineRunner; +import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.DoFn.Setup; import org.apache.beam.sdk.transforms.DoFn.Teardown; @@ -41,18 +42,18 @@ * clearing all cached {@link DoFn DoFns}. 
*/ class DoFnLifecycleManager { - public static DoFnLifecycleManager of(DoFn original) { - return new DoFnLifecycleManager(original); + public static DoFnLifecycleManager of(DoFn original, PipelineOptions options) { + return new DoFnLifecycleManager(original, options); } private final LoadingCache> outstanding; private final ConcurrentMap thrownOnTeardown; - private DoFnLifecycleManager(DoFn original) { + private DoFnLifecycleManager(DoFn original, PipelineOptions options) { this.outstanding = CacheBuilder.newBuilder() .removalListener(new TeardownRemovedFnListener()) - .build(new DeserializingCacheLoader(original)); + .build(new DeserializingCacheLoader(original, options)); thrownOnTeardown = new ConcurrentHashMap<>(); } @@ -90,9 +91,11 @@ public Collection removeAll() throws Exception { private static class DeserializingCacheLoader extends CacheLoader> { private final byte[] original; + private final PipelineOptions options; - public DeserializingCacheLoader(DoFn original) { + public DeserializingCacheLoader(DoFn original, PipelineOptions options) { this.original = SerializableUtils.serializeToByteArray(original); + this.options = options; } @Override @@ -101,7 +104,7 @@ public DeserializingCacheLoader(DoFn original) { (DoFn) SerializableUtils.deserializeFromByteArray( original, "DoFn Copy in thread " + key.getName()); - DoFnInvokers.tryInvokeSetupFor(fn); + DoFnInvokers.tryInvokeSetupFor(fn, options); return fn; } } diff --git a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/MultiStepCombine.java b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/MultiStepCombine.java index faebc77231d4..cc23a83f441c 100644 --- a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/MultiStepCombine.java +++ b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/MultiStepCombine.java @@ -94,7 +94,7 @@ private boolean isApplicable( PCollection> input = (PCollection>) Iterables.getOnlyElement(inputs.values()); WindowingStrategy windowingStrategy = input.getWindowingStrategy(); - boolean windowFnApplicable = windowingStrategy.getWindowFn().isNonMerging(); + boolean windowFnApplicable = !windowingStrategy.needsMerge(); // Triggering with count based triggers is not appropriately handled here. Disabling // most triggers is safe, though more broad than is technically required. 
boolean triggerApplicable = DefaultTrigger.of().equals(windowingStrategy.getTrigger()); diff --git a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ParDoEvaluatorFactory.java b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ParDoEvaluatorFactory.java index 80b90e6077cd..1ae6c07457d6 100644 --- a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ParDoEvaluatorFactory.java +++ b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ParDoEvaluatorFactory.java @@ -65,7 +65,8 @@ final class ParDoEvaluatorFactory implements TransformEvaluator return new CacheLoader, DoFnLifecycleManager>() { @Override public DoFnLifecycleManager load(AppliedPTransform application) throws Exception { - return DoFnLifecycleManager.of(ParDoTranslation.getDoFn(application)); + return DoFnLifecycleManager.of( + ParDoTranslation.getDoFn(application), application.getPipeline().getOptions()); } }; } diff --git a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/SplittableProcessElementsEvaluatorFactory.java b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/SplittableProcessElementsEvaluatorFactory.java index b35a34548e39..e3852f640b4f 100644 --- a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/SplittableProcessElementsEvaluatorFactory.java +++ b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/SplittableProcessElementsEvaluatorFactory.java @@ -80,7 +80,7 @@ public DoFnLifecycleManager load(final AppliedPTransform application) { (ProcessElements< InputT, OutputT, RestrictionT, PositionT, WatermarkEstimatorStateT>) application.getTransform(); - return DoFnLifecycleManager.of(transform.newProcessFn(transform.getFn())); + return DoFnLifecycleManager.of(transform.newProcessFn(transform.getFn()), options); } }, options); diff --git a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/StatefulParDoEvaluatorFactory.java b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/StatefulParDoEvaluatorFactory.java index 0fdde7573267..ebd305f4f365 100644 --- a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/StatefulParDoEvaluatorFactory.java +++ b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/StatefulParDoEvaluatorFactory.java @@ -47,6 +47,7 @@ import org.apache.beam.sdk.values.PCollectionTuple; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.CacheLoader; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Ordering; import org.joda.time.Instant; /** A {@link TransformEvaluatorFactory} for stateful {@link ParDo}. 
*/ @@ -73,7 +74,7 @@ public DoFnLifecycleManager load(AppliedPTransform appliedStatefulParDo // do not go through the portability translation layers StatefulParDo statefulParDo = (StatefulParDo) appliedStatefulParDo.getTransform(); - return DoFnLifecycleManager.of(statefulParDo.getDoFn()); + return DoFnLifecycleManager.of(statefulParDo.getDoFn(), options); } }, options); @@ -173,14 +174,31 @@ public void processElement(WindowedValue>> gbkRes for (WindowedValue> windowedValue : gbkResult.getValue().elementsIterable()) { delegateEvaluator.processElement(windowedValue); } - - final Instant inputWatermarkTime = timerInternals.currentInputWatermarkTime(); PriorityQueue toBeFiredTimers = new PriorityQueue<>(Comparator.comparing(TimerData::getTimestamp)); - gbkResult.getValue().timersIterable().forEach(toBeFiredTimers::add); - while (!timerInternals.containsUpdateForTimeBefore(inputWatermarkTime) + Instant maxWatermarkTime = BoundedWindow.TIMESTAMP_MIN_VALUE; + Instant maxProcessingTime = BoundedWindow.TIMESTAMP_MIN_VALUE; + Instant maxSynchronizedProcessingTime = BoundedWindow.TIMESTAMP_MIN_VALUE; + for (TimerData timerData : gbkResult.getValue().timersIterable()) { + toBeFiredTimers.add(timerData); + switch (timerData.getDomain()) { + case EVENT_TIME: + maxWatermarkTime = Ordering.natural().max(maxWatermarkTime, timerData.getTimestamp()); + break; + case PROCESSING_TIME: + maxProcessingTime = Ordering.natural().max(maxProcessingTime, timerData.getTimestamp()); + break; + case SYNCHRONIZED_PROCESSING_TIME: + maxSynchronizedProcessingTime = + Ordering.natural().max(maxSynchronizedProcessingTime, timerData.getTimestamp()); + } + } + + while (!timerInternals.containsUpdateForTimeBefore( + maxWatermarkTime, maxProcessingTime, maxSynchronizedProcessingTime) && !toBeFiredTimers.isEmpty()) { + TimerData timer = toBeFiredTimers.poll(); checkState( timer.getNamespace() instanceof WindowNamespace, @@ -192,29 +210,37 @@ public void processElement(WindowedValue>> gbkRes BoundedWindow timerWindow = windowNamespace.getWindow(); delegateEvaluator.onTimer(timer, gbkResult.getValue().key(), timerWindow); - - StateTag timerWatermarkHoldTag = setTimerTag(timer); - - stepContext.stateInternals().state(timer.getNamespace(), timerWatermarkHoldTag).clear(); - stepContext.stateInternals().commit(); + clearWatermarkHold(timer); } pushedBackTimers.addAll(toBeFiredTimers); } + private void clearWatermarkHold(TimerData timer) { + StateTag timerWatermarkHoldTag = setTimerTag(timer); + stepContext.stateInternals().state(timer.getNamespace(), timerWatermarkHoldTag).clear(); + stepContext.stateInternals().commit(); + } + + private void setWatermarkHold(TimerData timer) { + StateTag timerWatermarkHoldTag = setTimerTag(timer); + stepContext + .stateInternals() + .state(timer.getNamespace(), timerWatermarkHoldTag) + .add(timer.getOutputTimestamp()); + } + @Override public TransformResult>> finishBundle() throws Exception { TransformResult> delegateResult = delegateEvaluator.finishBundle(); boolean isTimerDeclared = false; for (TimerData timerData : delegateResult.getTimerUpdate().getSetTimers()) { - StateTag timerWatermarkHoldTag = setTimerTag(timerData); - - stepContext - .stateInternals() - .state(timerData.getNamespace(), timerWatermarkHoldTag) - .add(timerData.getOutputTimestamp()); + setWatermarkHold(timerData); isTimerDeclared = true; } + for (TimerData timerData : delegateResult.getTimerUpdate().getDeletedTimers()) { + clearWatermarkHold(timerData); + } CopyOnAccessInMemoryStateInternals state; Instant watermarkHold; diff 
--git a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/WatermarkManager.java b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/WatermarkManager.java index dbc0a96902be..a04325609df5 100644 --- a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/WatermarkManager.java +++ b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/WatermarkManager.java @@ -29,7 +29,6 @@ import java.util.EnumMap; import java.util.HashMap; import java.util.HashSet; -import java.util.LinkedHashSet; import java.util.List; import java.util.Map; import java.util.NavigableSet; @@ -46,7 +45,6 @@ import java.util.function.Consumer; import java.util.function.Function; import java.util.stream.Collectors; -import java.util.stream.StreamSupport; import javax.annotation.Nonnull; import javax.annotation.concurrent.GuardedBy; import org.apache.beam.runners.core.StateNamespace; @@ -66,8 +64,10 @@ import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ComparisonChain; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.HashBasedTable; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Ordering; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Sets; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.SortedMultiset; @@ -635,7 +635,7 @@ private synchronized void updateTimers(TimerUpdate update) { Table existingTimersForKey = existingTimers.computeIfAbsent(update.key, k -> HashBasedTable.create()); - for (TimerData addedTimer : update.setTimers) { + for (TimerData addedTimer : update.setTimers.values()) { NavigableSet timerQueue = timerMap.get(addedTimer.getDomain()); if (timerQueue == null) { continue; @@ -659,7 +659,7 @@ private synchronized void updateTimers(TimerUpdate update) { addedTimer); } - for (TimerData deletedTimer : update.deletedTimers) { + for (TimerData deletedTimer : update.deletedTimers.values()) { NavigableSet timerQueue = timerMap.get(deletedTimer.getDomain()); if (timerQueue == null) { continue; @@ -670,7 +670,6 @@ private synchronized void updateTimers(TimerUpdate update) { existingTimersForKey.get( deletedTimer.getNamespace(), deletedTimer.getTimerId() + '+' + deletedTimer.getTimerFamilyId()); - if (existingTimer != null) { pendingTimers.remove(deletedTimer); timerQueue.remove(deletedTimer); @@ -1548,6 +1547,25 @@ public String toString() { } } + @AutoValue + public abstract static class TimerKey { + abstract TimeDomain getDomain(); + + abstract String getId(); + + abstract String getFamily(); + + abstract Object getNamespace(); + + static TimerKey of(TimerData timerData) { + return new AutoValue_WatermarkManager_TimerKey( + timerData.getDomain(), + timerData.getTimerId(), + timerData.getTimerFamilyId(), + timerData.getNamespace().getCacheKey()); + } + } + /** * A collection of newly set, deleted, and completed timers. 
* @@ -1559,8 +1577,8 @@ public String toString() { public static class TimerUpdate { private final StructuralKey key; private final Iterable completedTimers; - private final Iterable setTimers; - private final Iterable deletedTimers; + private final Map setTimers; + private final Map deletedTimers; private final Iterable pushedBackTimers; /** Returns a TimerUpdate for a null key with no timers. */ @@ -1568,8 +1586,8 @@ public static TimerUpdate empty() { return new TimerUpdate( null, Collections.emptyList(), - Collections.emptyList(), - Collections.emptyList(), + Collections.emptyMap(), + Collections.emptyMap(), Collections.emptyList()); } @@ -1587,14 +1605,14 @@ public static TimerUpdateBuilder builder(StructuralKey key) { public static final class TimerUpdateBuilder { private final StructuralKey key; private final Collection completedTimers; - private final Collection setTimers; - private final Collection deletedTimers; + private final Map setTimers; + private final Map deletedTimers; private TimerUpdateBuilder(StructuralKey key) { this.key = key; - this.completedTimers = new LinkedHashSet<>(); - this.setTimers = new LinkedHashSet<>(); - this.deletedTimers = new LinkedHashSet<>(); + this.completedTimers = Sets.newLinkedHashSet(); + this.setTimers = Maps.newLinkedHashMap(); + this.deletedTimers = Maps.newLinkedHashMap(); } /** @@ -1616,8 +1634,8 @@ public TimerUpdateBuilder setTimer(TimerData setTimer) { "Got a timer for after the end of time (%s), got %s", BoundedWindow.TIMESTAMP_MAX_VALUE, setTimer.getTimestamp()); - deletedTimers.remove(setTimer); - setTimers.add(setTimer); + deletedTimers.remove(TimerKey.of(setTimer)); + setTimers.put(TimerKey.of(setTimer), setTimer); return this; } @@ -1626,8 +1644,9 @@ public TimerUpdateBuilder setTimer(TimerData setTimer) { * it has previously been set. Returns this {@link TimerUpdateBuilder}. 
*/ public TimerUpdateBuilder deletedTimer(TimerData deletedTimer) { - deletedTimers.add(deletedTimer); - setTimers.remove(deletedTimer); + TimerKey key = TimerKey.of(deletedTimer); + deletedTimers.put(key, deletedTimer); + setTimers.remove(key); return this; } @@ -1639,19 +1658,12 @@ public TimerUpdate build() { return new TimerUpdate( key, ImmutableList.copyOf(completedTimers), - ImmutableList.copyOf(setTimers), - ImmutableList.copyOf(deletedTimers), + ImmutableMap.copyOf(setTimers), + ImmutableMap.copyOf(deletedTimers), Collections.emptyList()); } } - private static Map indexTimerData(Iterable timerData) { - return StreamSupport.stream(timerData.spliterator(), false) - .collect( - Collectors.toMap( - TimerUpdate::getTimerIdAndTimerFamilyIdWithNamespace, e -> e, (a, b) -> b)); - } - private static String getTimerIdAndTimerFamilyIdWithNamespace(TimerData td) { return td.getNamespace() + td.getTimerId() + td.getTimerFamilyId(); } @@ -1659,8 +1671,8 @@ private static String getTimerIdAndTimerFamilyIdWithNamespace(TimerData td) { private TimerUpdate( StructuralKey key, Iterable completedTimers, - Iterable setTimers, - Iterable deletedTimers, + Map setTimers, + Map deletedTimers, Iterable pushedBackTimers) { this.key = key; this.completedTimers = completedTimers; @@ -1681,12 +1693,12 @@ public Iterable getCompletedTimers() { @VisibleForTesting public Iterable getSetTimers() { - return setTimers; + return setTimers.values(); } @VisibleForTesting public Iterable getDeletedTimers() { - return deletedTimers; + return deletedTimers.values(); } Iterable getPushedBackTimers() { @@ -1695,8 +1707,8 @@ Iterable getPushedBackTimers() { boolean isEmpty() { return Iterables.isEmpty(completedTimers) - && Iterables.isEmpty(setTimers) - && Iterables.isEmpty(deletedTimers) + && setTimers.isEmpty() + && deletedTimers.isEmpty() && Iterables.isEmpty(pushedBackTimers); } @@ -1708,17 +1720,18 @@ boolean isEmpty() { public TimerUpdate withCompletedTimers(Iterable completedTimers) { List timersToComplete = new ArrayList<>(); Set pushedBack = Sets.newHashSet(pushedBackTimers); - Map newSetTimers = indexTimerData(setTimers); + Map newSetTimers = Maps.newLinkedHashMap(); + newSetTimers.putAll(setTimers); for (TimerData td : completedTimers) { - String timerIdWithNs = getTimerIdAndTimerFamilyIdWithNamespace(td); + TimerKey timerKey = TimerKey.of(td); if (!pushedBack.contains(td)) { timersToComplete.add(td); - } else if (!newSetTimers.containsKey(timerIdWithNs)) { - newSetTimers.put(timerIdWithNs, td); + } else if (!newSetTimers.containsKey(timerKey)) { + newSetTimers.put(timerKey, td); } } return new TimerUpdate( - key, timersToComplete, newSetTimers.values(), deletedTimers, Collections.emptyList()); + key, timersToComplete, newSetTimers, deletedTimers, Collections.emptyList()); } /** diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/BoundedReadEvaluatorFactoryTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/BoundedReadEvaluatorFactoryTest.java index 3dd37a0c9323..6f91de13c771 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/BoundedReadEvaluatorFactoryTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/BoundedReadEvaluatorFactoryTest.java @@ -17,6 +17,7 @@ */ package org.apache.beam.runners.direct; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.allOf; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.emptyIterable; @@ -26,7 
+27,6 @@ import static org.hamcrest.Matchers.hasSize; import static org.hamcrest.Matchers.is; import static org.hamcrest.Matchers.lessThanOrEqualTo; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.when; import java.io.IOException; @@ -72,7 +72,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class BoundedReadEvaluatorFactoryTest { private BoundedSource source; diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/CloningBundleFactoryTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/CloningBundleFactoryTest.java index 8ae9fc30292f..6a6d7dfd1bfb 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/CloningBundleFactoryTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/CloningBundleFactoryTest.java @@ -17,13 +17,13 @@ */ package org.apache.beam.runners.direct; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.anyOf; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.isA; import static org.hamcrest.Matchers.not; import static org.hamcrest.Matchers.theInstance; -import static org.junit.Assert.assertThat; import java.io.IOException; import java.io.InputStream; @@ -55,9 +55,6 @@ /** Tests for {@link CloningBundleFactory}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class CloningBundleFactoryTest { @Rule public ExpectedException thrown = ExpectedException.none(); @Rule public final TestPipeline p = TestPipeline.create().enableAbandonedNodeEnforcement(false); diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/CommittedResultTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/CommittedResultTest.java index 5a437ba1725c..fab33250e1ab 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/CommittedResultTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/CommittedResultTest.java @@ -17,7 +17,7 @@ */ package org.apache.beam.runners.direct; -import static org.junit.Assert.assertThat; +import static org.hamcrest.MatcherAssert.assertThat; import java.io.Serializable; import java.util.Collections; @@ -30,6 +30,7 @@ import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.transforms.Create; import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; import org.apache.beam.sdk.util.WindowedValue; import org.apache.beam.sdk.values.PBegin; import org.apache.beam.sdk.values.PCollection; @@ -63,6 +64,7 @@ public PDone expand(PBegin begin) { throw new IllegalArgumentException("Should never be applied"); } }, + ResourceHints.create(), p); private transient BundleFactory bundleFactory = ImmutableListBundleFactory.create(); diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/CopyOnAccessInMemoryStateInternalsTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/CopyOnAccessInMemoryStateInternalsTest.java index a07d72fcb94d..aa3e8d7b96ef 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/CopyOnAccessInMemoryStateInternalsTest.java +++ 
b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/CopyOnAccessInMemoryStateInternalsTest.java @@ -17,6 +17,7 @@ */ package org.apache.beam.runners.direct; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.emptyIterable; import static org.hamcrest.Matchers.equalTo; @@ -24,7 +25,6 @@ import static org.hamcrest.Matchers.nullValue; import static org.hamcrest.Matchers.theInstance; import static org.junit.Assert.assertNull; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.never; import static org.mockito.Mockito.spy; import static org.mockito.Mockito.verify; @@ -60,9 +60,6 @@ /** Tests for {@link CopyOnAccessInMemoryStateInternals}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class CopyOnAccessInMemoryStateInternalsTest { @Rule public final TestPipeline pipeline = TestPipeline.create(); diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectGraphVisitorTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectGraphVisitorTest.java index 468d402119ba..9a404442552f 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectGraphVisitorTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectGraphVisitorTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.runners.direct; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.emptyIterable; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.hasSize; -import static org.junit.Assert.assertThat; import java.io.Serializable; import java.util.List; diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectGroupByKeyOverrideFactoryTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectGroupByKeyOverrideFactoryTest.java index 3faf7c9a36e2..b923252cf144 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectGroupByKeyOverrideFactoryTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectGroupByKeyOverrideFactoryTest.java @@ -17,7 +17,7 @@ */ package org.apache.beam.runners.direct; -import static org.junit.Assert.assertThat; +import static org.hamcrest.MatcherAssert.assertThat; import org.apache.beam.sdk.coders.KvCoder; import org.apache.beam.sdk.coders.StringUtf8Coder; diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectMetricsTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectMetricsTest.java index bbf521bbb492..69e922e1cb34 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectMetricsTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectMetricsTest.java @@ -20,9 +20,9 @@ import static org.apache.beam.sdk.metrics.MetricNameFilter.inNamespace; import static org.apache.beam.sdk.metrics.MetricResultsMatchers.attemptedMetricsResult; import static org.apache.beam.sdk.metrics.MetricResultsMatchers.committedMetricsResult; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.containsInAnyOrder; -import static org.junit.Assert.assertThat; import java.util.concurrent.ExecutorService; import java.util.concurrent.Executors; @@ -48,9 +48,6 @@ /** Tests 
for {@link DirectMetrics}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DirectMetricsTest { @Mock private CommittedBundle bundle1; diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectRunnerApiSurfaceTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectRunnerApiSurfaceTest.java index 9b8cde168d53..899713f45351 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectRunnerApiSurfaceTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectRunnerApiSurfaceTest.java @@ -18,7 +18,7 @@ package org.apache.beam.runners.direct; import static org.apache.beam.sdk.util.ApiSurface.containsOnlyPackages; -import static org.junit.Assert.assertThat; +import static org.hamcrest.MatcherAssert.assertThat; import java.util.Set; import org.apache.beam.sdk.Pipeline; @@ -35,9 +35,6 @@ /** API surface verification for {@link org.apache.beam.runners.direct}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DirectRunnerApiSurfaceTest { @Test public void testDirectRunnerApiSurface() throws Exception { diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectRunnerTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectRunnerTest.java index c2a6c864cc68..9603bda402fa 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectRunnerTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectRunnerTest.java @@ -18,12 +18,12 @@ package org.apache.beam.runners.direct; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.greaterThan; import static org.hamcrest.Matchers.is; import static org.hamcrest.Matchers.isA; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import com.fasterxml.jackson.annotation.JsonIgnore; @@ -108,7 +108,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class DirectRunnerTest implements Serializable { @Rule public transient ExpectedException thrown = ExpectedException.none(); diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectTimerInternalsTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectTimerInternalsTest.java index 8016dd215aa5..cae453e9d1f3 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectTimerInternalsTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectTimerInternalsTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.direct; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.when; import org.apache.beam.runners.core.StateNamespaces; @@ -43,7 +43,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // 
TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class DirectTimerInternalsTest { private MockClock clock; diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectTransformExecutorTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectTransformExecutorTest.java index 519d38ca476e..9ac18ecdc289 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectTransformExecutorTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectTransformExecutorTest.java @@ -17,12 +17,12 @@ */ package org.apache.beam.runners.direct; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.is; import static org.hamcrest.Matchers.isA; import static org.hamcrest.Matchers.nullValue; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.when; import java.util.ArrayList; @@ -60,7 +60,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class DirectTransformExecutorTest { @Rule public ExpectedException thrown = ExpectedException.none(); diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DoFnLifecycleManagerRemovingTransformEvaluatorTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DoFnLifecycleManagerRemovingTransformEvaluatorTest.java index 91d488e80122..e042fc7df82e 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DoFnLifecycleManagerRemovingTransformEvaluatorTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DoFnLifecycleManagerRemovingTransformEvaluatorTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.direct; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.not; import static org.hamcrest.Matchers.nullValue; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import static org.mockito.Matchers.any; import static org.mockito.Mockito.doThrow; @@ -28,6 +28,7 @@ import org.apache.beam.runners.core.StateNamespaces; import org.apache.beam.runners.core.TimerInternals.TimerData; +import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.state.TimeDomain; import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; @@ -42,15 +43,12 @@ /** Tests for {@link DoFnLifecycleManagerRemovingTransformEvaluator}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DoFnLifecycleManagerRemovingTransformEvaluatorTest { private DoFnLifecycleManager lifecycleManager; @Before public void setup() { - lifecycleManager = DoFnLifecycleManager.of(new TestFn()); + lifecycleManager = DoFnLifecycleManager.of(new TestFn(), PipelineOptionsFactory.create()); } @Test diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DoFnLifecycleManagerTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DoFnLifecycleManagerTest.java index 1114511bf94a..61e62bd42e70 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DoFnLifecycleManagerTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DoFnLifecycleManagerTest.java @@ -18,12 +18,12 @@ package org.apache.beam.runners.direct; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.is; import static org.hamcrest.Matchers.isA; import static org.hamcrest.Matchers.not; import static org.hamcrest.Matchers.theInstance; -import static org.junit.Assert.assertThat; import java.util.ArrayList; import java.util.List; @@ -33,6 +33,7 @@ import java.util.concurrent.Executors; import java.util.concurrent.Future; import java.util.concurrent.TimeUnit; +import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.util.UserCodeException; import org.hamcrest.Matchers; @@ -48,7 +49,7 @@ public class DoFnLifecycleManagerTest { @Rule public ExpectedException thrown = ExpectedException.none(); private TestFn fn = new TestFn(); - private DoFnLifecycleManager mgr = DoFnLifecycleManager.of(fn); + private DoFnLifecycleManager mgr = DoFnLifecycleManager.of(fn, PipelineOptionsFactory.create()); @Test public void setupOnGet() throws Exception { diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DoFnLifecycleManagersTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DoFnLifecycleManagersTest.java index 36105a245e6d..a47ed55430d0 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DoFnLifecycleManagersTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DoFnLifecycleManagersTest.java @@ -23,6 +23,8 @@ import java.util.ArrayList; import java.util.Collection; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.util.UserCodeException; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; @@ -38,17 +40,15 @@ /** Tests for {@link DoFnLifecycleManagers}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DoFnLifecycleManagersTest { @Rule public ExpectedException thrown = ExpectedException.none(); @Test public void removeAllWhenManagersThrowSuppressesAndThrows() throws Exception { - DoFnLifecycleManager first = DoFnLifecycleManager.of(new ThrowsInCleanupFn("foo")); - DoFnLifecycleManager second = DoFnLifecycleManager.of(new ThrowsInCleanupFn("bar")); - DoFnLifecycleManager third = DoFnLifecycleManager.of(new ThrowsInCleanupFn("baz")); + PipelineOptions options = PipelineOptionsFactory.create(); + DoFnLifecycleManager first = DoFnLifecycleManager.of(new ThrowsInCleanupFn("foo"), options); + DoFnLifecycleManager second = DoFnLifecycleManager.of(new ThrowsInCleanupFn("bar"), options); + DoFnLifecycleManager third = DoFnLifecycleManager.of(new ThrowsInCleanupFn("baz"), options); first.get(); second.get(); third.get(); @@ -92,9 +92,10 @@ public boolean matches(Object item) { @Test public void whenManagersSucceedSucceeds() throws Exception { - DoFnLifecycleManager first = DoFnLifecycleManager.of(new EmptyFn()); - DoFnLifecycleManager second = DoFnLifecycleManager.of(new EmptyFn()); - DoFnLifecycleManager third = DoFnLifecycleManager.of(new EmptyFn()); + PipelineOptions options = PipelineOptionsFactory.create(); + DoFnLifecycleManager first = DoFnLifecycleManager.of(new EmptyFn(), options); + DoFnLifecycleManager second = DoFnLifecycleManager.of(new EmptyFn(), options); + DoFnLifecycleManager third = DoFnLifecycleManager.of(new EmptyFn(), options); first.get(); second.get(); third.get(); diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/EvaluationContextTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/EvaluationContextTest.java index c6e8ce10a0d1..5c60fbb64c1f 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/EvaluationContextTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/EvaluationContextTest.java @@ -19,13 +19,13 @@ import static java.nio.charset.StandardCharsets.UTF_8; import static org.apache.beam.sdk.testing.PCollectionViewTesting.materializeValuesFor; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.emptyIterable; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.is; import static org.hamcrest.Matchers.not; -import static org.junit.Assert.assertThat; import java.io.Serializable; import java.util.Collection; @@ -80,7 +80,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class EvaluationContextTest implements Serializable { private transient EvaluationContext context; diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/FlattenEvaluatorFactoryTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/FlattenEvaluatorFactoryTest.java index 40290b7da2a4..fb02a60a5e0f 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/FlattenEvaluatorFactoryTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/FlattenEvaluatorFactoryTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.direct; +import static 
org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.emptyIterable; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.mock; import static org.mockito.Mockito.when; @@ -43,7 +43,7 @@ /** Tests for {@link FlattenEvaluatorFactory}. */ @RunWith(JUnit4.class) -@SuppressWarnings({"keyfor", "nullness"}) // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +@SuppressWarnings({"keyfor"}) public class FlattenEvaluatorFactoryTest { private BundleFactory bundleFactory = ImmutableListBundleFactory.create(); diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/GroupByKeyEvaluatorFactoryTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/GroupByKeyEvaluatorFactoryTest.java index 0fad3386f9a7..5f41929506c0 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/GroupByKeyEvaluatorFactoryTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/GroupByKeyEvaluatorFactoryTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.direct; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.mock; import static org.mockito.Mockito.when; @@ -47,9 +47,6 @@ /** Tests for {@link GroupByKeyEvaluatorFactory}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class GroupByKeyEvaluatorFactoryTest { private BundleFactory bundleFactory = ImmutableListBundleFactory.create(); diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/GroupByKeyOnlyEvaluatorFactoryTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/GroupByKeyOnlyEvaluatorFactoryTest.java index 6a3138491347..7cfbe4d2fd5a 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/GroupByKeyOnlyEvaluatorFactoryTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/GroupByKeyOnlyEvaluatorFactoryTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.direct; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.mock; import static org.mockito.Mockito.when; @@ -47,9 +47,6 @@ /** Tests for {@link GroupByKeyOnlyEvaluatorFactory}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class GroupByKeyOnlyEvaluatorFactoryTest { private BundleFactory bundleFactory = ImmutableListBundleFactory.create(); diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/ImmutabilityCheckingBundleFactoryTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/ImmutabilityCheckingBundleFactoryTest.java index 678848f17f84..5dcfa90cb4c7 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/ImmutabilityCheckingBundleFactoryTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/ImmutabilityCheckingBundleFactoryTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.direct; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; -import static org.junit.Assert.assertThat; import org.apache.beam.runners.local.StructuralKey; import org.apache.beam.sdk.coders.ByteArrayCoder; @@ -42,9 +42,6 @@ /** Tests for {@link ImmutabilityCheckingBundleFactory}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ImmutabilityCheckingBundleFactoryTest { @Rule public final TestPipeline p = TestPipeline.create().enableAbandonedNodeEnforcement(false); diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/ImmutabilityEnforcementFactoryTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/ImmutabilityEnforcementFactoryTest.java index 24dacab2bde4..3a574102f664 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/ImmutabilityEnforcementFactoryTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/ImmutabilityEnforcementFactoryTest.java @@ -40,9 +40,6 @@ /** Tests for {@link ImmutabilityEnforcementFactory}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ImmutabilityEnforcementFactoryTest implements Serializable { @Rule public transient TestPipeline p = TestPipeline.create().enableAbandonedNodeEnforcement(false); diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/ImmutableListBundleFactoryTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/ImmutableListBundleFactoryTest.java index c325f3f20c28..81812ff49d1c 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/ImmutableListBundleFactoryTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/ImmutableListBundleFactoryTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.runners.direct; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.not; -import static org.junit.Assert.assertThat; import java.util.ArrayList; import java.util.Collection; @@ -53,9 +53,6 @@ /** Tests for {@link ImmutableListBundleFactory}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ImmutableListBundleFactoryTest { @Rule public final TestPipeline p = TestPipeline.create().enableAbandonedNodeEnforcement(false); @Rule public ExpectedException thrown = ExpectedException.none(); diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/ImpulseEvaluatorFactoryTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/ImpulseEvaluatorFactoryTest.java index 3a1e5b65d336..bc5e06e390d2 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/ImpulseEvaluatorFactoryTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/ImpulseEvaluatorFactoryTest.java @@ -17,11 +17,11 @@ */ package org.apache.beam.runners.direct; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.hasSize; import static org.junit.Assert.assertArrayEquals; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.when; import java.util.Collection; diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/KeyedPValueTrackingVisitorTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/KeyedPValueTrackingVisitorTest.java index 21452cb5a78e..a228da6eff54 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/KeyedPValueTrackingVisitorTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/KeyedPValueTrackingVisitorTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.direct; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.hasItem; import static org.hamcrest.Matchers.not; -import static org.junit.Assert.assertThat; import java.util.Collections; import org.apache.beam.runners.core.KeyedWorkItem; @@ -56,9 +56,6 @@ /** Tests for {@link KeyedPValueTrackingVisitor}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class KeyedPValueTrackingVisitorTest { @Rule public ExpectedException thrown = ExpectedException.none(); diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/MultiStepCombineTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/MultiStepCombineTest.java index faac2e65c6df..dfe8fe2fbe48 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/MultiStepCombineTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/MultiStepCombineTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.direct; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.is; -import static org.junit.Assert.assertThat; import com.google.auto.value.AutoValue; import java.io.IOException; diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/ParDoEvaluatorTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/ParDoEvaluatorTest.java index 32b97261e616..93e6686d4fa4 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/ParDoEvaluatorTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/ParDoEvaluatorTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.direct; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import static org.mockito.Mockito.mock; import static org.mockito.Mockito.when; @@ -67,7 +67,7 @@ /** Tests for {@link ParDoEvaluator}. */ @RunWith(JUnit4.class) -@SuppressWarnings({"keyfor", "nullness"}) // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +@SuppressWarnings({"keyfor"}) public class ParDoEvaluatorTest { @Mock private EvaluationContext evaluationContext; private PCollection inputPc; diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/SideInputContainerTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/SideInputContainerTest.java index 3c681ec366a0..3e51f0087ad1 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/SideInputContainerTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/SideInputContainerTest.java @@ -18,11 +18,11 @@ package org.apache.beam.runners.direct; import static org.apache.beam.sdk.testing.PCollectionViewTesting.materializeValuesFor; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.hasEntry; import static org.hamcrest.Matchers.is; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import static org.mockito.Mockito.doAnswer; @@ -64,9 +64,6 @@ /** Tests for {@link SideInputContainer}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SideInputContainerTest { private static final BoundedWindow FIRST_WINDOW = new BoundedWindow() { diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/StatefulParDoEvaluatorFactoryTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/StatefulParDoEvaluatorFactoryTest.java index 3344fd5d9209..3faccf0f977e 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/StatefulParDoEvaluatorFactoryTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/StatefulParDoEvaluatorFactoryTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.direct; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import static org.mockito.ArgumentMatchers.any; import static org.mockito.Matchers.anyList; import static org.mockito.Matchers.eq; @@ -88,7 +88,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class StatefulParDoEvaluatorFactoryTest implements Serializable { @Mock private transient EvaluationContext mockEvaluationContext; diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/StepTransformResultTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/StepTransformResultTest.java index 570b6119033f..c1efb19e1297 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/StepTransformResultTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/StepTransformResultTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.runners.direct; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.emptyIterable; import static org.hamcrest.Matchers.hasItem; -import static org.junit.Assert.assertThat; import org.apache.beam.runners.direct.CommittedResult.OutputType; import org.apache.beam.sdk.runners.AppliedPTransform; @@ -35,9 +35,6 @@ /** Tests for {@link StepTransformResult}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class StepTransformResultTest { private AppliedPTransform transform; private BundleFactory bundleFactory; diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/TestStreamEvaluatorFactoryTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/TestStreamEvaluatorFactoryTest.java index a8fc2f9594dc..ae270182aa25 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/TestStreamEvaluatorFactoryTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/TestStreamEvaluatorFactoryTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.direct; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.mock; import static org.mockito.Mockito.when; @@ -50,8 +50,7 @@ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) "keyfor", - "nullness" -}) // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) public class TestStreamEvaluatorFactoryTest { private TestStreamEvaluatorFactory factory; private BundleFactory bundleFactory; diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/TransformExecutorServicesTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/TransformExecutorServicesTest.java index 07840f4bf084..41056168b6f5 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/TransformExecutorServicesTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/TransformExecutorServicesTest.java @@ -32,9 +32,6 @@ /** Tests for {@link TransformExecutorServices}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TransformExecutorServicesTest { @Rule public ExpectedException thrown = ExpectedException.none(); diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/UnboundedReadDeduplicatorTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/UnboundedReadDeduplicatorTest.java index 3575f47c75b0..50706c381616 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/UnboundedReadDeduplicatorTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/UnboundedReadDeduplicatorTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.runners.direct; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.is; import static org.hamcrest.Matchers.lessThan; -import static org.junit.Assert.assertThat; import java.util.ArrayList; import java.util.List; diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/UnboundedReadEvaluatorFactoryTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/UnboundedReadEvaluatorFactoryTest.java index 9949a8901e52..1adaaab6e4c7 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/UnboundedReadEvaluatorFactoryTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/UnboundedReadEvaluatorFactoryTest.java @@ -22,6 +22,7 @@ import static java.util.Collections.singletonMap; import static org.apache.beam.runners.direct.DirectGraphs.getProducer; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.hasSize; @@ -29,7 +30,6 @@ import static org.hamcrest.Matchers.not; import static org.hamcrest.Matchers.nullValue; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.mockito.Matchers.any; import static org.mockito.Mockito.mock; import static org.mockito.Mockito.times; @@ -67,6 +67,7 @@ import org.apache.beam.sdk.testing.SourceTestUtils; import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.transforms.windowing.GlobalWindow; import org.apache.beam.sdk.transforms.windowing.PaneInfo; @@ -98,8 +99,7 @@ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) "keyfor", - "nullness" -}) // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) public class UnboundedReadEvaluatorFactoryTest { private PCollection longs; private UnboundedReadEvaluatorFactory factory; @@ -460,6 +460,7 @@ private void processElement(final TestUnboundedSource source) throws Exc new HashMap<>(), singletonMap(new TupleTag(), pCollection), unbounded, + ResourceHints.create(), pipeline); final TransformEvaluator> evaluator = factory.forApplication(application, null); diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/ViewEvaluatorFactoryTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/ViewEvaluatorFactoryTest.java index 7398f11a64b8..c271c6f227b1 
100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/ViewEvaluatorFactoryTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/ViewEvaluatorFactoryTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.direct; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.nullValue; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.mock; import static org.mockito.Mockito.when; @@ -45,9 +45,6 @@ /** Tests for {@link ViewEvaluatorFactory}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ViewEvaluatorFactoryTest { private BundleFactory bundleFactory = ImmutableListBundleFactory.create(); diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/WatermarkCallbackExecutorTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/WatermarkCallbackExecutorTest.java index ef9ec06092f7..19d592e63b49 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/WatermarkCallbackExecutorTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/WatermarkCallbackExecutorTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.direct; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.util.concurrent.CountDownLatch; import java.util.concurrent.Executors; @@ -44,9 +44,6 @@ /** Tests for {@link WatermarkCallbackExecutor}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class WatermarkCallbackExecutorTest { private WatermarkCallbackExecutor executor = WatermarkCallbackExecutor.create(Executors.newSingleThreadExecutor()); diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/WatermarkManagerTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/WatermarkManagerTest.java index 858a99aafd74..c4b8df9f42ea 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/WatermarkManagerTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/WatermarkManagerTest.java @@ -17,6 +17,7 @@ */ package org.apache.beam.runners.direct; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.empty; @@ -27,7 +28,6 @@ import static org.hamcrest.Matchers.not; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNotNull; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.mockito.Mockito.when; @@ -89,7 +89,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class WatermarkManagerTest implements Serializable { @Rule public transient ExpectedException thrown = ExpectedException.none(); diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/WindowEvaluatorFactoryTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/WindowEvaluatorFactoryTest.java index f11c386ca7ff..46125904814c 100644 --- 
a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/WindowEvaluatorFactoryTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/WindowEvaluatorFactoryTest.java @@ -20,8 +20,8 @@ import static org.apache.beam.runners.core.WindowMatchers.isSingleWindowedValue; import static org.apache.beam.runners.core.WindowMatchers.isWindowedValue; import static org.apache.beam.sdk.transforms.windowing.PaneInfo.NO_FIRING; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.when; import java.util.Collection; @@ -59,9 +59,6 @@ /** Tests for {@link WindowEvaluatorFactory}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class WindowEvaluatorFactoryTest { private static final Instant EPOCH = new Instant(0); diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/WriteWithShardingFactoryTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/WriteWithShardingFactoryTest.java index df3abed53010..890cd4f1f421 100644 --- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/WriteWithShardingFactoryTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/WriteWithShardingFactoryTest.java @@ -18,6 +18,7 @@ package org.apache.beam.runners.direct; import static java.nio.charset.StandardCharsets.UTF_8; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.allOf; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.equalTo; @@ -25,7 +26,6 @@ import static org.hamcrest.Matchers.hasSize; import static org.hamcrest.Matchers.lessThan; import static org.hamcrest.Matchers.not; -import static org.junit.Assert.assertThat; import java.io.File; import java.io.Reader; @@ -57,6 +57,7 @@ import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.transforms.windowing.PaneInfo; import org.apache.beam.sdk.values.PCollection; @@ -157,7 +158,12 @@ public WriteOperation createWriteOperation() { PTransform, WriteFilesResult>> originalApplication = AppliedPTransform.of( - "write", PValues.expandInput(objs), Collections.emptyMap(), original, p); + "write", + PValues.expandInput(objs), + Collections.emptyMap(), + original, + ResourceHints.create(), + p); assertThat( factory.getReplacementTransform(originalApplication).getTransform(), diff --git a/runners/extensions-java/metrics/build.gradle b/runners/extensions-java/metrics/build.gradle index 4e8d1be77298..af31d71c634d 100644 --- a/runners/extensions-java/metrics/build.gradle +++ b/runners/extensions-java/metrics/build.gradle @@ -26,10 +26,11 @@ description = "Apache Beam :: Runners :: Extensions Java :: Metrics" ext.summary = "Beam Runners Extensions Metrics provides implementations of runners core metrics APIs." 
dependencies { - compile library.java.vendored_guava_26_0_jre compile project(path: ":sdks:java:core", configuration: "shadow") compile library.java.jackson_databind compile library.java.jackson_datatype_joda + compile library.java.jackson_core + compile library.java.joda_time testCompile library.java.joda_time testCompile library.java.junit } diff --git a/runners/extensions-java/metrics/src/test/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSinkTest.java b/runners/extensions-java/metrics/src/test/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSinkTest.java index 7bfec1072769..0e4f5e88ecca 100644 --- a/runners/extensions-java/metrics/src/test/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSinkTest.java +++ b/runners/extensions-java/metrics/src/test/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSinkTest.java @@ -31,9 +31,6 @@ import org.junit.Test; /** Test class for MetricsGraphiteSink. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class MetricsGraphiteSinkTest { private static NetworkMockServer graphiteServer; private static int port; diff --git a/runners/extensions-java/metrics/src/test/java/org/apache/beam/runners/extensions/metrics/MetricsHttpSinkTest.java b/runners/extensions-java/metrics/src/test/java/org/apache/beam/runners/extensions/metrics/MetricsHttpSinkTest.java index 8d7221c6bad7..afbe77bdb885 100644 --- a/runners/extensions-java/metrics/src/test/java/org/apache/beam/runners/extensions/metrics/MetricsHttpSinkTest.java +++ b/runners/extensions-java/metrics/src/test/java/org/apache/beam/runners/extensions/metrics/MetricsHttpSinkTest.java @@ -39,9 +39,6 @@ import org.junit.Test; /** Test class for MetricsHttpSink. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class MetricsHttpSinkTest { private static int port; private static List messages = new ArrayList<>(); diff --git a/runners/extensions-java/metrics/src/test/java/org/apache/beam/runners/extensions/metrics/NetworkMockServer.java b/runners/extensions-java/metrics/src/test/java/org/apache/beam/runners/extensions/metrics/NetworkMockServer.java index 600f650bd564..5b608373cb39 100644 --- a/runners/extensions-java/metrics/src/test/java/org/apache/beam/runners/extensions/metrics/NetworkMockServer.java +++ b/runners/extensions-java/metrics/src/test/java/org/apache/beam/runners/extensions/metrics/NetworkMockServer.java @@ -30,9 +30,6 @@ import java.util.concurrent.atomic.AtomicBoolean; /** Mock of a network server. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) class NetworkMockServer { private final int port; private ServerSocket serverSocket; diff --git a/runners/flink/1.10/src/test/java/org/apache/beam/runners/flink/FlinkRunnerTestCompat.java b/runners/flink/1.10/src/test/java/org/apache/beam/runners/flink/FlinkRunnerTestCompat.java deleted file mode 100644 index 0852a13a9d67..000000000000 --- a/runners/flink/1.10/src/test/java/org/apache/beam/runners/flink/FlinkRunnerTestCompat.java +++ /dev/null @@ -1,42 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. 
You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package org.apache.beam.runners.flink; - -import org.apache.flink.client.program.OptimizerPlanEnvironment; -import org.apache.flink.client.program.PackagedProgram; -import org.apache.flink.client.program.ProgramInvocationException; -import org.apache.flink.configuration.Configuration; - -/** - * Compatibility layer for {@link PackagedProgram} and {@link OptimizerPlanEnvironment} breaking - * changes. - */ -public abstract class FlinkRunnerTestCompat { - public PackagedProgram getPackagedProgram() throws ProgramInvocationException { - return PackagedProgram.newBuilder().setEntryPointClassName(getClass().getName()).build(); - } - - public OptimizerPlanEnvironment getOptimizerPlanEnvironment() { - return new OptimizerPlanEnvironment(new Configuration()); - } - - public void getPipeline(OptimizerPlanEnvironment env, PackagedProgram packagedProgram) - throws ProgramInvocationException { - env.getPipeline(packagedProgram, false); - } -} diff --git a/runners/flink/1.11/build.gradle b/runners/flink/1.11/build.gradle index b1b2e110de07..c3736edfc614 100644 --- a/runners/flink/1.11/build.gradle +++ b/runners/flink/1.11/build.gradle @@ -20,10 +20,10 @@ def basePath = '..' /* All properties required for loading the Flink build script */ project.ext { // Set the version of all Flink-related dependencies here. - flink_version = '1.11.1' + flink_version = '1.11.4' // Version specific code overrides. - main_source_overrides = ["${basePath}/1.8/src/main/java", "${basePath}/1.9/src/main/java", "${basePath}/1.10/src/main/java", './src/main/java'] - test_source_overrides = ["${basePath}/1.8/src/test/java", "${basePath}/1.9/src/test/java", "${basePath}/1.10/src/test/java", './src/test/java'] + main_source_overrides = ['./src/main/java'] + test_source_overrides = ['./src/test/java'] main_resources_overrides = [] test_resources_overrides = [] archives_base_name = 'beam-runners-flink-1.11' diff --git a/runners/flink/1.11/src/test/java/org/apache/beam/runners/flink/FlinkRunnerTestCompat.java b/runners/flink/1.11/src/test/java/org/apache/beam/runners/flink/FlinkRunnerTestCompat.java index 49c1afc12acf..b49f323a60e5 100644 --- a/runners/flink/1.11/src/test/java/org/apache/beam/runners/flink/FlinkRunnerTestCompat.java +++ b/runners/flink/1.11/src/test/java/org/apache/beam/runners/flink/FlinkRunnerTestCompat.java @@ -27,9 +27,6 @@ * Compatibility layer for {@link PackagedProgram} and {@link OptimizerPlanEnvironment} breaking * changes. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public abstract class FlinkRunnerTestCompat { public PackagedProgram getPackagedProgram() throws ProgramInvocationException { return PackagedProgram.newBuilder().setEntryPointClassName(getClass().getName()).build(); diff --git a/runners/flink/1.10/src/test/java/org/apache/beam/runners/flink/streaming/StreamSources.java b/runners/flink/1.11/src/test/java/org/apache/beam/runners/flink/streaming/StreamSources.java similarity index 100% rename from runners/flink/1.10/src/test/java/org/apache/beam/runners/flink/streaming/StreamSources.java rename to runners/flink/1.11/src/test/java/org/apache/beam/runners/flink/streaming/StreamSources.java diff --git a/runners/flink/1.12/build.gradle b/runners/flink/1.12/build.gradle index 5bfa7e3f38c0..31878af92ddc 100644 --- a/runners/flink/1.12/build.gradle +++ b/runners/flink/1.12/build.gradle @@ -20,10 +20,10 @@ def basePath = '..' /* All properties required for loading the Flink build script */ project.ext { // Set the version of all Flink-related dependencies here. - flink_version = '1.12.0' + flink_version = '1.12.5' // Version specific code overrides. - main_source_overrides = ["${basePath}/1.8/src/main/java", "${basePath}/1.9/src/main/java", "${basePath}/1.10/src/main/java", "${basePath}/1.11/src/main/java", './src/main/java'] - test_source_overrides = ["${basePath}/1.8/src/test/java", "${basePath}/1.9/src/test/java", "${basePath}/1.10/src/test/java", "${basePath}/1.11/src/test/java", './src/test/java'] + main_source_overrides = ["${basePath}/1.11/src/main/java", './src/main/java'] + test_source_overrides = ["${basePath}/1.11/src/test/java", './src/test/java'] main_resources_overrides = [] test_resources_overrides = [] archives_base_name = 'beam-runners-flink-1.12' diff --git a/runners/flink/1.9/build.gradle b/runners/flink/1.13/build.gradle similarity index 79% rename from runners/flink/1.9/build.gradle rename to runners/flink/1.13/build.gradle index 0c2c01253e78..d6d04eb83bec 100644 --- a/runners/flink/1.9/build.gradle +++ b/runners/flink/1.13/build.gradle @@ -20,13 +20,13 @@ def basePath = '..' /* All properties required for loading the Flink build script */ project.ext { // Set the version of all Flink-related dependencies here. - flink_version = '1.9.3' + flink_version = '1.13.2' // Version specific code overrides. - main_source_overrides = ["${basePath}/1.8/src/main/java", './src/main/java'] - test_source_overrides = ["${basePath}/1.8/src/test/java", './src/test/java'] + main_source_overrides = ["${basePath}/1.11/src/main/java", "${basePath}/1.12/src/main/java", './src/main/java'] + test_source_overrides = ["${basePath}/1.11/src/test/java", "${basePath}/1.12/src/test/java", './src/test/java'] main_resources_overrides = [] test_resources_overrides = [] - archives_base_name = 'beam-runners-flink-1.9' + archives_base_name = 'beam-runners-flink-1.13' } // Load the main build script which contains all build logic. 
diff --git a/runners/flink/1.10/job-server-container/build.gradle b/runners/flink/1.13/job-server-container/build.gradle similarity index 100% rename from runners/flink/1.10/job-server-container/build.gradle rename to runners/flink/1.13/job-server-container/build.gradle diff --git a/runners/flink/1.9/job-server/build.gradle b/runners/flink/1.13/job-server/build.gradle similarity index 95% rename from runners/flink/1.9/job-server/build.gradle rename to runners/flink/1.13/job-server/build.gradle index b094ddac437c..a7e6fd6eb599 100644 --- a/runners/flink/1.9/job-server/build.gradle +++ b/runners/flink/1.13/job-server/build.gradle @@ -24,7 +24,7 @@ project.ext { test_source_dirs = ["$basePath/src/test/java"] main_resources_dirs = ["$basePath/src/main/resources"] test_resources_dirs = ["$basePath/src/test/resources"] - archives_base_name = 'beam-runners-flink-1.9-job-server' + archives_base_name = 'beam-runners-flink-1.13-job-server' } // Load the main build script which contains all build logic. diff --git a/runners/flink/1.9/src/test/java/org/apache/beam/runners/flink/streaming/StreamSources.java b/runners/flink/1.13/src/test/java/org/apache/beam/runners/flink/streaming/StreamSources.java similarity index 83% rename from runners/flink/1.9/src/test/java/org/apache/beam/runners/flink/streaming/StreamSources.java rename to runners/flink/1.13/src/test/java/org/apache/beam/runners/flink/streaming/StreamSources.java index bfbd89590958..2707395b194c 100644 --- a/runners/flink/1.9/src/test/java/org/apache/beam/runners/flink/streaming/StreamSources.java +++ b/runners/flink/1.13/src/test/java/org/apache/beam/runners/flink/streaming/StreamSources.java @@ -17,6 +17,7 @@ */ package org.apache.beam.runners.flink.streaming; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.apache.flink.api.dag.Transformation; import org.apache.flink.runtime.operators.testutils.MockEnvironmentBuilder; import org.apache.flink.streaming.api.functions.source.SourceFunction; @@ -29,7 +30,13 @@ import org.apache.flink.streaming.runtime.tasks.OperatorChain; import org.apache.flink.streaming.runtime.tasks.StreamTask; -/** {@link StreamSource} utilities, that bridge incompatibilities between Flink releases. */ +/** + * {@link StreamSource} utilities, that bridge incompatibilities between Flink releases. + * + *

    This change is because RecordWriter is wrapped in RecordWriterDelegate in 1.10, please refer + * to https://github.com/apache/flink/commit/2c8b4ef572f05bf4740b7e204af1e5e709cd945c for more + * details. + */ public class StreamSources { /** @@ -40,7 +47,7 @@ public class StreamSources { * @return Input transformation. */ public static Transformation getOnlyInput(OneInputTransformation source) { - return source.getInput(); + return Iterables.getOnlyElement(source.getInputs()); } public static > void run( @@ -56,7 +63,7 @@ public static > void run( private static OperatorChain createOperatorChain(AbstractStreamOperator operator) { return new OperatorChain<>( operator.getContainingTask(), - StreamTask.createRecordWriters( + StreamTask.createRecordWriterDelegate( operator.getOperatorConfig(), new MockEnvironmentBuilder().build())); } } diff --git a/runners/flink/1.8/src/test/java/org/apache/beam/runners/flink/FlinkRunnerTestCompat.java b/runners/flink/1.8/src/test/java/org/apache/beam/runners/flink/FlinkRunnerTestCompat.java deleted file mode 100644 index 0dc208a18434..000000000000 --- a/runners/flink/1.8/src/test/java/org/apache/beam/runners/flink/FlinkRunnerTestCompat.java +++ /dev/null @@ -1,43 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package org.apache.beam.runners.flink; - -import org.apache.flink.client.program.OptimizerPlanEnvironment; -import org.apache.flink.client.program.PackagedProgram; -import org.apache.flink.client.program.ProgramInvocationException; -import org.apache.flink.configuration.Configuration; -import org.apache.flink.optimizer.Optimizer; - -/** - * Compatibility layer for {@link PackagedProgram} and {@link OptimizerPlanEnvironment} breaking - * changes. - */ -public abstract class FlinkRunnerTestCompat { - public PackagedProgram getPackagedProgram() throws ProgramInvocationException { - return new PackagedProgram(getClass()); - } - - public OptimizerPlanEnvironment getOptimizerPlanEnvironment() { - return new OptimizerPlanEnvironment(new Optimizer(new Configuration())); - } - - public void getPipeline(OptimizerPlanEnvironment env, PackagedProgram packagedProgram) - throws ProgramInvocationException { - env.getOptimizedPlan(packagedProgram); - } -} diff --git a/runners/flink/1.8/src/test/java/org/apache/beam/runners/flink/RemoteMiniClusterImpl.java b/runners/flink/1.8/src/test/java/org/apache/beam/runners/flink/RemoteMiniClusterImpl.java deleted file mode 100644 index 1802983a41e6..000000000000 --- a/runners/flink/1.8/src/test/java/org/apache/beam/runners/flink/RemoteMiniClusterImpl.java +++ /dev/null @@ -1,71 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements.
See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package org.apache.beam.runners.flink; - -import akka.actor.ActorSystem; -import com.typesafe.config.Config; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; -import org.apache.flink.runtime.akka.AkkaUtils; -import org.apache.flink.runtime.minicluster.MiniCluster; -import org.apache.flink.runtime.minicluster.MiniClusterConfiguration; -import org.apache.flink.runtime.rpc.RpcService; -import org.apache.flink.runtime.rpc.akka.AkkaRpcService; -import org.apache.flink.runtime.rpc.akka.AkkaRpcServiceConfiguration; - -/** A {@link MiniCluster} which allows remote connections for the end-to-end test. */ -public class RemoteMiniClusterImpl extends RemoteMiniCluster { - - private int port; - - public RemoteMiniClusterImpl(MiniClusterConfiguration miniClusterConfiguration) { - super(miniClusterConfiguration); - } - - @Override - protected RpcService createRpcService( - AkkaRpcServiceConfiguration akkaRpcServiceConfig, boolean remoteEnabled, String bindAddress) { - - // Enable remote connections to the mini cluster which are disabled by default - final Config akkaConfig = - AkkaUtils.getAkkaConfig(akkaRpcServiceConfig.getConfiguration(), bindAddress, 0); - - final Config effectiveAkkaConfig = AkkaUtils.testDispatcherConfig().withFallback(akkaConfig); - - final ActorSystem actorSystem = AkkaUtils.createActorSystem(effectiveAkkaConfig); - - final AkkaRpcService akkaRpcService = new AkkaRpcService(actorSystem, akkaRpcServiceConfig); - this.port = akkaRpcService.getPort(); - - return akkaRpcService; - } - - @Override - public int getClusterPort() { - Preconditions.checkState(port > 0, "Port not yet initialized. Start the cluster first."); - return port; - } - - @Override - public int getRestPort() { - try { - return getRestAddress().get().getPort(); - } catch (Exception e) { - throw new RuntimeException(e); - } - } -} diff --git a/runners/flink/1.8/src/test/java/org/apache/beam/runners/flink/streaming/StreamSources.java b/runners/flink/1.8/src/test/java/org/apache/beam/runners/flink/streaming/StreamSources.java deleted file mode 100644 index c148c3fc1c06..000000000000 --- a/runners/flink/1.8/src/test/java/org/apache/beam/runners/flink/streaming/StreamSources.java +++ /dev/null @@ -1,50 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. 
You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package org.apache.beam.runners.flink.streaming; - -import org.apache.flink.streaming.api.functions.source.SourceFunction; -import org.apache.flink.streaming.api.operators.Output; -import org.apache.flink.streaming.api.operators.StreamSource; -import org.apache.flink.streaming.api.transformations.OneInputTransformation; -import org.apache.flink.streaming.api.transformations.StreamTransformation; -import org.apache.flink.streaming.runtime.streamrecord.StreamRecord; -import org.apache.flink.streaming.runtime.streamstatus.StreamStatusMaintainer; - -/** {@link StreamSource} utilities, that bridge incompatibilities between Flink releases. */ -public class StreamSources { - - /** - * Backward compatibility helper for {@link OneInputTransformation} `getInput` method, that has - * been removed in Flink 1.12. - * - * @param source Source to get single input from. - * @return Input transformation. - */ - public static StreamTransformation getOnlyInput(OneInputTransformation source) { - return source.getInput(); - } - - public static > void run( - StreamSource streamSource, - Object lockingObject, - StreamStatusMaintainer streamStatusMaintainer, - Output> collector) - throws Exception { - streamSource.run(lockingObject, streamStatusMaintainer, collector); - } -} diff --git a/runners/flink/flink_runner.gradle b/runners/flink/flink_runner.gradle index 30cf1f770c8b..f400974506d0 100644 --- a/runners/flink/flink_runner.gradle +++ b/runners/flink/flink_runner.gradle @@ -27,6 +27,7 @@ import groovy.json.JsonOutput apply plugin: 'org.apache.beam.module' applyJavaNature( + enableStrictDependencies:true, automaticModuleName: 'org.apache.beam.runners.flink', archivesBaseName: (project.hasProperty('archives_base_name') ? archives_base_name : archivesBaseName), ) @@ -108,17 +109,14 @@ test { if (System.getProperty("beamSurefireArgline")) { jvmArgs System.getProperty("beamSurefireArgline") } - // TODO Running tests of all Flink versions in parallel can be too harsh on Jenkins memory - // Run them serially for now, to avoid "Exit code 137", i.e. Jenkins host killing the Gradle test process - if (project.path == ":runners:flink:1.9") { - mustRunAfter(":runners:flink:1.8:test") - } else if (project.path == ":runners:flink:1.10") { - mustRunAfter(":runners:flink:1.8:test") - mustRunAfter(":runners:flink:1.9:test") - } else if (project.path == ":runners:flink:1.11") { - mustRunAfter(":runners:flink:1.8:test") - mustRunAfter(":runners:flink:1.9:test") - mustRunAfter(":runners:flink:1.10:test") + // TODO(BEAM-6418) Running tests of all Flink versions in parallel can be too harsh on Jenkins memory. + // Run them serially for now, to avoid "Exit code 137", i.e. Jenkins host killing the Gradle test process. 
+ def flink_minor_version = project.path.split(':').last() + for (version in project.ext.allFlinkVersions) { + if (version == flink_minor_version) { + break + } + mustRunAfter(":runners:flink:${version}:test") } } @@ -136,12 +134,14 @@ dependencies { compile project(":runners:java-fn-execution") compile project(":runners:java-job-service") compile project(":sdks:java:extensions:google-cloud-platform-core") - compile library.java.vendored_grpc_1_26_0 - compile library.java.jackson_annotations + compile library.java.vendored_grpc_1_36_0 compile library.java.slf4j_api compile library.java.joda_time compile library.java.args4j compile "org.apache.flink:flink-clients_2.11:$flink_version" + // Runtime dependencies are not included in Beam's generated pom.xml, so we must declare flink-clients in compile + // configuration (https://issues.apache.org/jira/browse/BEAM-11732). + permitUnusedDeclared "org.apache.flink:flink-clients_2.11:$flink_version" compile "org.apache.flink:flink-core:$flink_version" compile "org.apache.flink:flink-metrics-core:$flink_version" compile "org.apache.flink:flink-java:$flink_version" @@ -169,6 +169,13 @@ dependencies { validatesRunner project(path: ":runners:core-java", configuration: "testRuntime") validatesRunner project(project.path) miniCluster "org.apache.flink:flink-runtime-web_2.11:$flink_version" + compile project(path: ":model:fn-execution", configuration: "shadow") + compile project(path: ":model:pipeline", configuration: "shadow") + compile project(path: ":model:job-management", configuration: "shadow") + compile project(":sdks:java:fn-execution") + compile library.java.jackson_databind + compile "org.apache.flink:flink-annotations:$flink_version" + compile "org.apache.flink:flink-optimizer_2.11:$flink_version" } class ValidatesRunnerConfig { @@ -181,6 +188,8 @@ def createValidatesRunnerTask(Map m) { def config = m as ValidatesRunnerConfig tasks.register(config.name, Test) { group = "Verification" + // Disable gradle cache + outputs.upToDateWhen { false } def runnerType = config.streaming ? 
"streaming" : "batch" description = "Validates the ${runnerType} runner" def pipelineOptionsArray = ["--runner=TestFlinkRunner", @@ -218,6 +227,7 @@ def createValidatesRunnerTask(Map m) { excludeCategories 'org.apache.beam.sdk.testing.UsesOnWindowExpiration' excludeCategories 'org.apache.beam.sdk.testing.UsesStrictTimerOrdering' excludeCategories 'org.apache.beam.sdk.testing.UsesOrderedListState' + excludeCategories 'org.apache.beam.sdk.testing.UsesLoopingTimer' if (config.streaming) { excludeCategories 'org.apache.beam.sdk.testing.UsesTimerMap' excludeCategories 'org.apache.beam.sdk.testing.UsesTestStreamWithMultipleStages' // BEAM-8598 @@ -228,6 +238,14 @@ def createValidatesRunnerTask(Map m) { excludeCategories 'org.apache.beam.sdk.testing.UsesUnboundedSplittableParDo' excludeCategories 'org.apache.beam.sdk.testing.UsesTestStream' } + filter { + // https://issues.apache.org/jira/browse/BEAM-12039 + excludeTestsMatching 'org.apache.beam.sdk.testing.TestStreamTest.testDiscardingMode' + // https://issues.apache.org/jira/browse/BEAM-12037 + excludeTestsMatching 'org.apache.beam.sdk.testing.TestStreamTest.testFirstElementLate' + // https://issues.apache.org/jira/browse/BEAM-12038 + excludeTestsMatching 'org.apache.beam.sdk.testing.TestStreamTest.testLateDataAccumulating' + } } } } @@ -249,7 +267,7 @@ tasks.register('validatesRunner') { dependsOn validatesRunnerStreamingCheckpointing } -// Generates :runners:flink:1.10:runQuickstartJavaFlinkLocal +// Generates :runners:flink:1.13:runQuickstartJavaFlinkLocal createJavaExamplesArchetypeValidationTask(type: 'Quickstart', runner: 'FlinkLocal') /** diff --git a/runners/flink/job-server/flink_job_server.gradle b/runners/flink/job-server/flink_job_server.gradle index 130dd97983cf..9ab6c32696b8 100644 --- a/runners/flink/job-server/flink_job_server.gradle +++ b/runners/flink/job-server/flink_job_server.gradle @@ -78,6 +78,7 @@ configurations.all { dependencies { compile project(flinkRunnerProject) + permitUnusedDeclared project(flinkRunnerProject) // BEAM-11761 runtime group: "org.slf4j", name: "jcl-over-slf4j", version: dependencies.create(project.library.java.slf4j_api).getVersion() validatesPortableRunner project(path: flinkRunnerProject, configuration: "testRuntime") validatesPortableRunner project(path: ":sdks:java:core", configuration: "shadowTest") @@ -85,7 +86,8 @@ dependencies { validatesPortableRunner project(path: ":runners:portability:java", configuration: "testRuntime") runtime project(":sdks:java:extensions:google-cloud-platform-core") runtime library.java.slf4j_simple -// TODO: Enable AWS and HDFS file system. + // TODO: Enable HDFS file system. 
+ runtime project(":sdks:java:io:amazon-web-services2") // External transform expansion // Kafka runtimeOnly project(":sdks:java:io:kafka") @@ -155,6 +157,7 @@ def portableValidatesRunnerTask(String name, Boolean streaming, Boolean checkpoi excludeCategories 'org.apache.beam.sdk.testing.LargeKeys$Above10MB' excludeCategories 'org.apache.beam.sdk.testing.UsesCommittedMetrics' excludeCategories 'org.apache.beam.sdk.testing.UsesCrossLanguageTransforms' + excludeCategories 'org.apache.beam.sdk.testing.UsesPythonExpansionService' excludeCategories 'org.apache.beam.sdk.testing.UsesCustomWindowMerging' excludeCategories 'org.apache.beam.sdk.testing.UsesFailureMessage' excludeCategories 'org.apache.beam.sdk.testing.UsesGaugeMetrics' @@ -163,7 +166,6 @@ def portableValidatesRunnerTask(String name, Boolean streaming, Boolean checkpoi excludeCategories 'org.apache.beam.sdk.testing.UsesSetState' excludeCategories 'org.apache.beam.sdk.testing.UsesOrderedListState' excludeCategories 'org.apache.beam.sdk.testing.UsesStrictTimerOrdering' - excludeCategories 'org.apache.beam.sdk.testing.UsesTimerMap' excludeCategories 'org.apache.beam.sdk.testing.UsesOnWindowExpiration' excludeCategories 'org.apache.beam.sdk.testing.UsesBundleFinalizer' excludeCategories 'org.apache.beam.sdk.testing.UsesOrderedListState' @@ -183,8 +185,12 @@ def portableValidatesRunnerTask(String name, Boolean streaming, Boolean checkpoi testFilter: { // TODO(BEAM-10016) excludeTestsMatching 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2' - // TODO(BEAM-11310) - excludeTestsMatching 'org.apache.beam.sdk.transforms.ViewTest.testWindowedSideInputNotPresent' + // TODO(BEAM-12039) + excludeTestsMatching 'org.apache.beam.sdk.testing.TestStreamTest.testDiscardingMode' + // TODO(BEAM-12038) + excludeTestsMatching 'org.apache.beam.sdk.testing.TestStreamTest.testLateDataAccumulating' + // TODO(BEAM-12710) + excludeTestsMatching 'org.apache.beam.sdk.testing.TestStreamTest.testFirstElementLate' }, ) } @@ -199,8 +205,8 @@ task validatesPortableRunner() { dependsOn validatesPortableRunnerStreamingCheckpoint } -def jobPort = BeamModulePlugin.startingExpansionPortNumber.getAndDecrement() -def artifactPort = BeamModulePlugin.startingExpansionPortNumber.getAndDecrement() +def jobPort = BeamModulePlugin.getRandomPort() +def artifactPort = BeamModulePlugin.getRandomPort() def setupTask = project.tasks.create(name: "flinkJobServerSetup", type: Exec) { dependsOn shadowJar diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/CreateStreamingFlinkView.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/CreateStreamingFlinkView.java index 8f34413d42ce..7920fa4dc783 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/CreateStreamingFlinkView.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/CreateStreamingFlinkView.java @@ -18,14 +18,11 @@ package org.apache.beam.runners.flink; import java.io.IOException; -import java.util.ArrayList; import java.util.List; import java.util.Map; +import org.apache.beam.runners.core.Concatenate; import org.apache.beam.runners.core.construction.CreatePCollectionViewTranslation; import org.apache.beam.runners.core.construction.ReplacementOutputs; -import org.apache.beam.sdk.coders.Coder; -import org.apache.beam.sdk.coders.CoderRegistry; -import org.apache.beam.sdk.coders.ListCoder; import org.apache.beam.sdk.runners.AppliedPTransform; import org.apache.beam.sdk.runners.PTransformOverrideFactory; import 
org.apache.beam.sdk.transforms.Combine; @@ -58,51 +55,6 @@ public PCollection expand(PCollection input) { return input; } - /** - * Combiner that combines {@code T}s into a single {@code List} containing all inputs. - * - *

    For internal use by {@link CreateStreamingFlinkView}. This combiner requires that the input - * {@link PCollection} fits in memory. For a large {@link PCollection} this is expected to crash! - * - * @param the type of elements to concatenate. - */ - private static class Concatenate extends Combine.CombineFn, List> { - @Override - public List createAccumulator() { - return new ArrayList<>(); - } - - @Override - public List addInput(List accumulator, T input) { - accumulator.add(input); - return accumulator; - } - - @Override - public List mergeAccumulators(Iterable> accumulators) { - List result = createAccumulator(); - for (List accumulator : accumulators) { - result.addAll(accumulator); - } - return result; - } - - @Override - public List extractOutput(List accumulator) { - return accumulator; - } - - @Override - public Coder> getAccumulatorCoder(CoderRegistry registry, Coder inputCoder) { - return ListCoder.of(inputCoder); - } - - @Override - public Coder> getDefaultOutputCoder(CoderRegistry registry, Coder inputCoder) { - return ListCoder.of(inputCoder); - } - } - /** * Creates a primitive {@link PCollectionView}. * diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkBatchPortablePipelineTranslator.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkBatchPortablePipelineTranslator.java index 32726f37ce3f..d3d69d054c72 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkBatchPortablePipelineTranslator.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkBatchPortablePipelineTranslator.java @@ -26,7 +26,6 @@ import com.google.auto.service.AutoService; import java.io.IOException; -import java.util.ArrayList; import java.util.Collection; import java.util.Collections; import java.util.HashMap; @@ -39,6 +38,7 @@ import java.util.stream.Collectors; import org.apache.beam.model.pipeline.v1.RunnerApi; import org.apache.beam.model.pipeline.v1.RunnerApi.ExecutableStagePayload.SideInputId; +import org.apache.beam.runners.core.Concatenate; import org.apache.beam.runners.core.construction.NativeTransforms; import org.apache.beam.runners.core.construction.PTransformTranslation; import org.apache.beam.runners.core.construction.RehydratedComponents; @@ -62,10 +62,7 @@ import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.CoderRegistry; import org.apache.beam.sdk.coders.KvCoder; -import org.apache.beam.sdk.coders.ListCoder; import org.apache.beam.sdk.coders.VoidCoder; -import org.apache.beam.sdk.transforms.Combine; -import org.apache.beam.sdk.transforms.GroupByKey; import org.apache.beam.sdk.transforms.join.RawUnionValue; import org.apache.beam.sdk.transforms.join.UnionCoder; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; @@ -73,9 +70,8 @@ import org.apache.beam.sdk.util.WindowedValue; import org.apache.beam.sdk.util.WindowedValue.WindowedValueCoder; import org.apache.beam.sdk.values.KV; -import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.WindowingStrategy; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.BiMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; @@ -559,53 +555,6 @@ private static void translateImpulse( 
Iterables.getOnlyElement(transform.getTransform().getOutputsMap().values()), dataSource); } - /** - * Combiner that combines {@code T}s into a single {@code List} containing all inputs. - * - *
<p>
    For internal use to translate {@link GroupByKey}. For a large {@link PCollection} this is - * expected to crash! - * - *
<p>
    This is copied from the dataflow runner code. - * - * @param the type of elements to concatenate. - */ - private static class Concatenate extends Combine.CombineFn, List> { - @Override - public List createAccumulator() { - return new ArrayList<>(); - } - - @Override - public List addInput(List accumulator, T input) { - accumulator.add(input); - return accumulator; - } - - @Override - public List mergeAccumulators(Iterable> accumulators) { - List result = createAccumulator(); - for (List accumulator : accumulators) { - result.addAll(accumulator); - } - return result; - } - - @Override - public List extractOutput(List accumulator) { - return accumulator; - } - - @Override - public Coder> getAccumulatorCoder(CoderRegistry registry, Coder inputCoder) { - return ListCoder.of(inputCoder); - } - - @Override - public Coder> getDefaultOutputCoder(CoderRegistry registry, Coder inputCoder) { - return ListCoder.of(inputCoder); - } - } - private static void urnNotFound( PTransformNode transform, RunnerApi.Pipeline pipeline, BatchTranslationContext context) { throw new IllegalArgumentException( diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkBatchTransformTranslators.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkBatchTransformTranslators.java index c12c9c24c3d2..af5259eb33ff 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkBatchTransformTranslators.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkBatchTransformTranslators.java @@ -21,12 +21,12 @@ import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; import java.io.IOException; -import java.util.ArrayList; import java.util.Collections; import java.util.HashMap; import java.util.List; import java.util.Map; import java.util.Map.Entry; +import org.apache.beam.runners.core.Concatenate; import org.apache.beam.runners.core.construction.CreatePCollectionViewTranslation; import org.apache.beam.runners.core.construction.PTransformTranslation; import org.apache.beam.runners.core.construction.ParDoTranslation; @@ -48,10 +48,8 @@ import org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat; import org.apache.beam.sdk.coders.CannotProvideCoderException; import org.apache.beam.sdk.coders.Coder; -import org.apache.beam.sdk.coders.CoderRegistry; import org.apache.beam.sdk.coders.IterableCoder; import org.apache.beam.sdk.coders.KvCoder; -import org.apache.beam.sdk.coders.ListCoder; import org.apache.beam.sdk.coders.VoidCoder; import org.apache.beam.sdk.io.BoundedSource; import org.apache.beam.sdk.runners.AppliedPTransform; @@ -59,7 +57,6 @@ import org.apache.beam.sdk.transforms.CombineFnBase; import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.DoFnSchemaInformation; -import org.apache.beam.sdk.transforms.GroupByKey; import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.transforms.Reshuffle; import org.apache.beam.sdk.transforms.join.RawUnionValue; @@ -95,7 +92,6 @@ import org.apache.flink.api.java.operators.GroupReduceOperator; import org.apache.flink.api.java.operators.Grouping; import org.apache.flink.api.java.operators.MapOperator; -import org.apache.flink.api.java.operators.MapPartitionOperator; import org.apache.flink.api.java.operators.SingleInputUdfOperator; import org.apache.flink.api.java.operators.UnsortedGrouping; import org.apache.flink.configuration.Configuration; @@ -264,7 +260,7 @@ public boolean canTranslate( 
FlinkBatchTranslationContext context) { final WindowingStrategy windowingStrategy = context.getInput(transform).getWindowingStrategy(); - return windowingStrategy.getWindowFn().isNonMerging() + return !windowingStrategy.needsMerge() && windowingStrategy.getTimestampCombiner() == TimestampCombiner.END_OF_WINDOW && windowingStrategy.getWindowFn().windowCoder().consistentWithEquals(); } @@ -426,53 +422,6 @@ public void translateNode( } } - /** - * Combiner that combines {@code T}s into a single {@code List} containing all inputs. - * - *
<p>
    For internal use to translate {@link GroupByKey}. For a large {@link PCollection} this is - * expected to crash! - * - *
<p>
    This is copied from the dataflow runner code. - * - * @param the type of elements to concatenate. - */ - private static class Concatenate extends Combine.CombineFn, List> { - @Override - public List createAccumulator() { - return new ArrayList<>(); - } - - @Override - public List addInput(List accumulator, T input) { - accumulator.add(input); - return accumulator; - } - - @Override - public List mergeAccumulators(Iterable> accumulators) { - List result = createAccumulator(); - for (List accumulator : accumulators) { - result.addAll(accumulator); - } - return result; - } - - @Override - public List extractOutput(List accumulator) { - return accumulator; - } - - @Override - public Coder> getAccumulatorCoder(CoderRegistry registry, Coder inputCoder) { - return ListCoder.of(inputCoder); - } - - @Override - public Coder> getDefaultOutputCoder(CoderRegistry registry, Coder inputCoder) { - return ListCoder.of(inputCoder); - } - } - private static class CombinePerKeyTranslatorBatch implements FlinkBatchPipelineTranslator.BatchTransformTranslator< PTransform>, PCollection>>> { @@ -538,7 +487,7 @@ public void translateNode( sideInputStrategies.put(sideInput, sideInput.getWindowingStrategyInternal()); } - if (windowingStrategy.getWindowFn().isNonMerging()) { + if (!windowingStrategy.needsMerge()) { final FlinkPartialReduceFunction partialReduceFunction = new FlinkPartialReduceFunction<>( combineFn, @@ -772,14 +721,7 @@ public void translateNode( doFnSchemaInformation, sideInputMapping); - if (FlinkCapabilities.supportsOutputDuringClosing()) { - outputDataSet = - new FlatMapOperator<>(inputDataSet, typeInformation, doFnWrapper, fullName); - } else { - // This can be removed once we drop support for 1.8 and 1.9 versions. - outputDataSet = - new MapPartitionOperator<>(inputDataSet, typeInformation, doFnWrapper, fullName); - } + outputDataSet = new FlatMapOperator<>(inputDataSet, typeInformation, doFnWrapper, fullName); } transformSideInputs(sideInputs, outputDataSet, context); diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java index a8dd1eb639c1..eabe952a879c 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java @@ -28,6 +28,7 @@ import org.apache.flink.api.common.ExecutionMode; import org.apache.flink.api.java.CollectionEnvironment; import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.api.java.LocalEnvironment; import org.apache.flink.configuration.Configuration; import org.apache.flink.configuration.CoreOptions; import org.apache.flink.configuration.GlobalConfiguration; @@ -41,6 +42,7 @@ import org.apache.flink.streaming.api.CheckpointingMode; import org.apache.flink.streaming.api.TimeCharacteristic; import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup; +import org.apache.flink.streaming.api.environment.LocalStreamEnvironment; import org.apache.flink.streaming.api.environment.RemoteStreamEnvironment; import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; import org.checkerframework.checker.nullness.qual.Nullable; @@ -77,11 +79,17 @@ static ExecutionEnvironment createBatchExecutionEnvironment( // depending on the master, create the right environment. 
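For reference, the three identical Concatenate combiners removed in the hunks above are superseded by the shared org.apache.beam.runners.core.Concatenate that these translators now import. Because the flattened hunks have lost their generic type parameters, the combiner is reproduced below with them restored; this is a sketch reconstructed from the removed copies and assumes the shared class mirrors them.

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.beam.sdk.coders.Coder;
    import org.apache.beam.sdk.coders.CoderRegistry;
    import org.apache.beam.sdk.coders.ListCoder;
    import org.apache.beam.sdk.transforms.Combine;

    /** Concatenates all inputs of type T into a single {@code List<T>}; the result must fit in memory. */
    public class Concatenate<T> extends Combine.CombineFn<T, List<T>, List<T>> {
      @Override
      public List<T> createAccumulator() {
        return new ArrayList<>();
      }

      @Override
      public List<T> addInput(List<T> accumulator, T input) {
        accumulator.add(input);
        return accumulator;
      }

      @Override
      public List<T> mergeAccumulators(Iterable<List<T>> accumulators) {
        List<T> result = createAccumulator();
        for (List<T> accumulator : accumulators) {
          result.addAll(accumulator);
        }
        return result;
      }

      @Override
      public List<T> extractOutput(List<T> accumulator) {
        return accumulator;
      }

      @Override
      public Coder<List<T>> getAccumulatorCoder(CoderRegistry registry, Coder<T> inputCoder) {
        return ListCoder.of(inputCoder);
      }

      @Override
      public Coder<List<T>> getDefaultOutputCoder(CoderRegistry registry, Coder<T> inputCoder) {
        return ListCoder.of(inputCoder);
      }
    }

Used, for example, as Combine.globally(new Concatenate<T>()), it materializes an entire PCollection into one in-memory List, which is why the javadoc above warns that a large PCollection is expected to crash.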
if ("[local]".equals(flinkMasterHostPort)) { setManagedMemoryByFraction(flinkConfiguration); + disableClassLoaderLeakCheck(flinkConfiguration); flinkBatchEnv = ExecutionEnvironment.createLocalEnvironment(flinkConfiguration); } else if ("[collection]".equals(flinkMasterHostPort)) { flinkBatchEnv = new CollectionEnvironment(); } else if ("[auto]".equals(flinkMasterHostPort)) { flinkBatchEnv = ExecutionEnvironment.getExecutionEnvironment(); + if (flinkBatchEnv instanceof LocalEnvironment) { + disableClassLoaderLeakCheck(flinkConfiguration); + flinkBatchEnv = ExecutionEnvironment.createLocalEnvironment(flinkConfiguration); + flinkBatchEnv.setParallelism(getDefaultLocalParallelism()); + } } else { int defaultPort = flinkConfiguration.getInteger(RestOptions.PORT); HostAndPort hostAndPort = @@ -150,16 +158,23 @@ static StreamExecutionEnvironment createStreamExecutionEnvironment( // Although Flink uses Rest, it expects the address not to contain a http scheme String masterUrl = stripHttpSchema(options.getFlinkMaster()); Configuration flinkConfiguration = getFlinkConfiguration(confDir); - final StreamExecutionEnvironment flinkStreamEnv; + StreamExecutionEnvironment flinkStreamEnv; // depending on the master, create the right environment. if ("[local]".equals(masterUrl)) { setManagedMemoryByFraction(flinkConfiguration); + disableClassLoaderLeakCheck(flinkConfiguration); flinkStreamEnv = StreamExecutionEnvironment.createLocalEnvironment( getDefaultLocalParallelism(), flinkConfiguration); } else if ("[auto]".equals(masterUrl)) { flinkStreamEnv = StreamExecutionEnvironment.getExecutionEnvironment(); + if (flinkStreamEnv instanceof LocalStreamEnvironment) { + disableClassLoaderLeakCheck(flinkConfiguration); + flinkStreamEnv = + StreamExecutionEnvironment.createLocalEnvironment( + getDefaultLocalParallelism(), flinkConfiguration); + } } else { int defaultPort = flinkConfiguration.getInteger(RestOptions.PORT); HostAndPort hostAndPort = HostAndPort.fromString(masterUrl).withDefaultPort(defaultPort); @@ -375,4 +390,14 @@ private static void setManagedMemoryByFraction(final Configuration config) { config.setString("taskmanager.memory.managed.size", String.valueOf(managedMemorySize)); } } + + /** + * Disables classloader.check-leaked-classloader unless set by the user. See + * https://issues.apache.org/jira/browse/BEAM-11570. 
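The special master strings handled in these branches come from FlinkPipelineOptions.getFlinkMaster(). A minimal sketch of selecting the embedded local environment from user code (the class name and the empty pipeline are illustrative only):

    import org.apache.beam.runners.flink.FlinkPipelineOptions;
    import org.apache.beam.runners.flink.FlinkRunner;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class LocalFlinkPipeline {
      public static void main(String[] args) {
        FlinkPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).as(FlinkPipelineOptions.class);
        options.setRunner(FlinkRunner.class);
        // "[local]" starts an embedded cluster; "[auto]" lets Flink decide (and, with this
        // patch, gets the same local tweaks when it resolves to a local environment);
        // "host:port" targets a remote JobManager; "[collection]" uses the collection
        // environment for batch.
        options.setFlinkMaster("[local]");
        Pipeline pipeline = Pipeline.create(options);
        // ... apply transforms here ...
        pipeline.run().waitUntilFinish();
      }
    }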
+ */ + private static void disableClassLoaderLeakCheck(final Configuration config) { + if (!config.containsKey("classloader.check-leaked-classloader")) { + config.setBoolean("classloader.check-leaked-classloader", false); + } + } } diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkJobInvoker.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkJobInvoker.java index e6bc7f7f4db5..ce3e8696ca93 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkJobInvoker.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkJobInvoker.java @@ -28,7 +28,7 @@ import org.apache.beam.runners.jobsubmission.PortablePipelineJarCreator; import org.apache.beam.runners.jobsubmission.PortablePipelineRunner; import org.apache.beam.sdk.options.PortablePipelineOptions; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Struct; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Struct; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Strings; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.ListeningExecutorService; import org.checkerframework.checker.nullness.qual.Nullable; @@ -49,7 +49,7 @@ public static FlinkJobInvoker create(FlinkJobServerDriver.FlinkServerConfigurati private final FlinkJobServerDriver.FlinkServerConfiguration serverConfig; protected FlinkJobInvoker(FlinkJobServerDriver.FlinkServerConfiguration serverConfig) { - super("flink-runner-job-invoker"); + super("flink-runner-job-invoker-%d"); this.serverConfig = serverConfig; } diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkJobServerDriver.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkJobServerDriver.java index 777506c2ff23..1c3781d636ca 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkJobServerDriver.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkJobServerDriver.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.flink; -import org.apache.beam.runners.fnexecution.ServerFactory; import org.apache.beam.runners.jobsubmission.JobServerDriver; import org.apache.beam.sdk.extensions.gcp.options.GcsOptions; +import org.apache.beam.sdk.fn.server.ServerFactory; import org.apache.beam.sdk.io.FileSystems; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.PipelineOptionsFactory; diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineExecutionEnvironment.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineExecutionEnvironment.java index 1e7c206aa957..242684060067 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineExecutionEnvironment.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineExecutionEnvironment.java @@ -17,13 +17,11 @@ */ package org.apache.beam.runners.flink; -import static org.apache.beam.runners.core.construction.resources.PipelineResources.detectClassPathResourcesToStage; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; import org.apache.beam.runners.core.construction.resources.PipelineResources; import org.apache.beam.sdk.Pipeline; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; import org.apache.flink.api.common.JobExecutionResult; import 
org.apache.flink.api.java.ExecutionEnvironment; import org.apache.flink.runtime.jobgraph.JobGraph; @@ -127,21 +125,7 @@ public void translate(Pipeline pipeline) { */ private static void prepareFilesToStageForRemoteClusterExecution(FlinkPipelineOptions options) { if (!options.getFlinkMaster().matches("\\[auto\\]|\\[collection\\]|\\[local\\]")) { - if (options.getFilesToStage() == null) { - options.setFilesToStage( - detectClassPathResourcesToStage(FlinkRunner.class.getClassLoader(), options)); - LOG.info( - "PipelineOptions.filesToStage was not specified. " - + "Defaulting to files from the classpath: will stage {} files. " - + "Enable logging at DEBUG level to see which files will be staged.", - options.getFilesToStage().size()); - LOG.debug("Classpath elements: {}", options.getFilesToStage()); - } - options.setFilesToStage( - PipelineResources.prepareFilesForStaging( - options.getFilesToStage(), - MoreObjects.firstNonNull( - options.getTempLocation(), System.getProperty("java.io.tmpdir")))); + PipelineResources.prepareFilesForStaging(options); } } diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java index b0b364ac3ed2..b097b9c8b4f3 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java @@ -17,10 +17,10 @@ */ package org.apache.beam.runners.flink; -import java.util.List; import org.apache.beam.sdk.options.ApplicationNameOptions; import org.apache.beam.sdk.options.Default; import org.apache.beam.sdk.options.Description; +import org.apache.beam.sdk.options.FileStagingOptions; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.options.StreamingOptions; @@ -32,26 +32,12 @@ * requiring flink on the classpath (e.g. to use with the direct runner). */ public interface FlinkPipelineOptions - extends PipelineOptions, ApplicationNameOptions, StreamingOptions { + extends PipelineOptions, ApplicationNameOptions, StreamingOptions, FileStagingOptions { String AUTO = "[auto]"; String PIPELINED = "PIPELINED"; String EXACTLY_ONCE = "EXACTLY_ONCE"; - /** - * List of local files to make available to workers. - * - *
<p>
    Jars are placed on the worker's classpath. - * - *
<p>
    The default value is the list of jars from the main program's classpath. - */ - @Description( - "Jar-Files to send to all workers and put on the classpath. " - + "The default value is all files from the classpath.") - List getFilesToStage(); - - void setFilesToStage(List value); - /** * The url of the Flink JobManager on which to execute pipelines. This can either be the the * address of a cluster JobManager, in the form "host:port" or one of the special Strings diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineRunner.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineRunner.java index 96b8781cc035..86220a920a2a 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineRunner.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineRunner.java @@ -43,7 +43,7 @@ import org.apache.beam.sdk.metrics.MetricsOptions; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.PipelineOptionsFactory; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Struct; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Struct; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; import org.apache.flink.api.common.JobExecutionResult; import org.checkerframework.checker.nullness.qual.Nullable; @@ -122,13 +122,8 @@ PortablePipelineResult runPipelineWithTranslator( private PortablePipelineResult createPortablePipelineResult( JobExecutionResult result, PipelineOptions options) { - // The package of DetachedJobExecutionResult has been changed in 1.10. - // Refer to https://github.com/apache/flink/commit/c36b35e6876ecdc717dade653e8554f9d8b543c9 for - // details. String resultClassName = result.getClass().getCanonicalName(); - if (resultClassName.equals( - "org.apache.flink.client.program.DetachedEnvironment.DetachedJobExecutionResult") - || resultClassName.equals("org.apache.flink.core.execution.DetachedJobExecutionResult")) { + if (resultClassName.equals("org.apache.flink.core.execution.DetachedJobExecutionResult")) { LOG.info("Pipeline submitted in Detached mode"); // no metricsPusher because metrics are not supported in detached mode return new FlinkPortableRunnerResult.Detached(); diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPortableClientEntryPoint.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPortableClientEntryPoint.java index df0f07c915a7..296e14590834 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPortableClientEntryPoint.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPortableClientEntryPoint.java @@ -18,7 +18,7 @@ package org.apache.beam.runners.flink; import java.io.File; -import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; import java.nio.file.Files; import java.time.Duration; import java.util.Arrays; @@ -188,7 +188,7 @@ private void runDriverProgram() throws Exception { String msg = String.format( "Failed to start job with driver program: %s %s output: %s", - executable, args, new String(output, Charset.defaultCharset())); + executable, args, new String(output, StandardCharsets.UTF_8)); throw new RuntimeException(msg, e); } } diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkRunner.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkRunner.java index f63869123ec6..c4735b8f38d4 100644 --- 
a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkRunner.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkRunner.java @@ -29,6 +29,7 @@ import org.apache.beam.sdk.PipelineRunner; import org.apache.beam.sdk.metrics.MetricsEnvironment; import org.apache.beam.sdk.metrics.MetricsOptions; +import org.apache.beam.sdk.options.ExperimentalOptions; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.PipelineOptionsValidator; import org.apache.beam.sdk.runners.TransformHierarchy; @@ -74,7 +75,12 @@ protected FlinkRunner(FlinkPipelineOptions options) { @Override public PipelineResult run(Pipeline pipeline) { - SplittableParDo.convertReadBasedSplittableDoFnsToPrimitiveReadsIfNecessary(pipeline); + // Portable flink only support SDF as read. + // TODO(BEAM-10670): Use SDF read as default when we address performance issue. + if (!ExperimentalOptions.hasExperiment(pipeline.getOptions(), "beam_fn_api")) { + SplittableParDo.convertReadBasedSplittableDoFnsToPrimitiveReadsIfNecessary(pipeline); + } + logWarningIfPCollectionViewHasNonDeterministicKeyCoder(pipeline); MetricsEnvironment.setMetricsSupported(true); @@ -103,13 +109,8 @@ public PipelineResult run(Pipeline pipeline) { } static PipelineResult createPipelineResult(JobExecutionResult result, PipelineOptions options) { - // The package of DetachedJobExecutionResult has been changed in 1.10. - // Refer to https://github.com/apache/flink/commit/c36b35e6876ecdc717dade653e8554f9d8b543c9 for - // more details. String resultClassName = result.getClass().getCanonicalName(); - if (resultClassName.equals( - "org.apache.flink.client.program.DetachedEnvironment.DetachedJobExecutionResult") - || resultClassName.equals("org.apache.flink.core.execution.DetachedJobExecutionResult")) { + if (resultClassName.equals("org.apache.flink.core.execution.DetachedJobExecutionResult")) { LOG.info("Pipeline submitted in Detached mode"); // no metricsPusher because metrics are not supported in detached mode return new FlinkDetachedRunnerResult(); diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java index c1d2583109cf..7fdf9c95cf9b 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java @@ -38,7 +38,9 @@ import java.util.Set; import java.util.TreeMap; import org.apache.beam.model.pipeline.v1.RunnerApi; +import org.apache.beam.runners.core.KeyedWorkItem; import org.apache.beam.runners.core.SystemReduceFn; +import org.apache.beam.runners.core.construction.ModelCoders; import org.apache.beam.runners.core.construction.NativeTransforms; import org.apache.beam.runners.core.construction.PTransformTranslation; import org.apache.beam.runners.core.construction.ReadTranslation; @@ -53,11 +55,11 @@ import org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageContextFactory; import org.apache.beam.runners.flink.translation.functions.ImpulseSourceFunction; import org.apache.beam.runners.flink.translation.types.CoderTypeInformation; +import org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat; import org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator; import 
org.apache.beam.runners.flink.translation.wrappers.streaming.ExecutableStageDoFnOperator; import org.apache.beam.runners.flink.translation.wrappers.streaming.KvToByteBufferKeySelector; import org.apache.beam.runners.flink.translation.wrappers.streaming.SdfByteBufferKeySelector; -import org.apache.beam.runners.flink.translation.wrappers.streaming.SingletonKeyedWorkItem; import org.apache.beam.runners.flink.translation.wrappers.streaming.SingletonKeyedWorkItemCoder; import org.apache.beam.runners.flink.translation.wrappers.streaming.WindowDoFnOperator; import org.apache.beam.runners.flink.translation.wrappers.streaming.WorkItemKeySelector; @@ -65,12 +67,16 @@ import org.apache.beam.runners.flink.translation.wrappers.streaming.io.StreamingImpulseSource; import org.apache.beam.runners.flink.translation.wrappers.streaming.io.TestStreamSource; import org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapper; +import org.apache.beam.runners.fnexecution.control.SdkHarnessClient; import org.apache.beam.runners.fnexecution.provisioning.JobInfo; +import org.apache.beam.runners.fnexecution.wire.WireCoders; import org.apache.beam.sdk.coders.ByteArrayCoder; import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.CoderException; import org.apache.beam.sdk.coders.IterableCoder; import org.apache.beam.sdk.coders.KvCoder; import org.apache.beam.sdk.coders.VoidCoder; +import org.apache.beam.sdk.io.BoundedSource; import org.apache.beam.sdk.io.FileSystems; import org.apache.beam.sdk.io.UnboundedSource; import org.apache.beam.sdk.options.PipelineOptions; @@ -81,6 +87,7 @@ import org.apache.beam.sdk.transforms.join.UnionCoder; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.transforms.windowing.GlobalWindow; +import org.apache.beam.sdk.util.CoderUtils; import org.apache.beam.sdk.util.WindowedValue; import org.apache.beam.sdk.util.WindowedValue.WindowedValueCoder; import org.apache.beam.sdk.values.KV; @@ -91,8 +98,7 @@ import org.apache.beam.sdk.values.TypeDescriptors; import org.apache.beam.sdk.values.ValueWithRecordId; import org.apache.beam.sdk.values.WindowingStrategy; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.BiMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.HashMultiset; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; @@ -112,7 +118,6 @@ import org.apache.flink.streaming.api.datastream.KeyedStream; import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator; import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.streaming.api.operators.OneInputStreamOperator; import org.apache.flink.streaming.api.transformations.TwoInputTransformation; import org.apache.flink.util.Collector; import org.apache.flink.util.OutputTag; @@ -224,7 +229,7 @@ interface PTransformTranslator { // Consider removing now that timers are supported translatorMap.put(STREAMING_IMPULSE_TRANSFORM_URN, this::translateStreamingImpulse); // Remove once unbounded Reads can be wrapped in SDFs - translatorMap.put(PTransformTranslation.READ_TRANSFORM_URN, this::translateUnboundedRead); + 
translatorMap.put(PTransformTranslation.READ_TRANSFORM_URN, this::translateRead); // For testing only translatorMap.put(PTransformTranslation.TEST_STREAM_TRANSFORM_URN, this::translateTestStream); @@ -404,13 +409,13 @@ private SingleOutputStreamOperator>>> add inputElementCoder.getValueCoder(), windowingStrategy.getWindowFn().windowCoder()); - WindowedValue.FullWindowedValueCoder> windowedWorkItemCoder = + WindowedValue.FullWindowedValueCoder> windowedWorkItemCoder = WindowedValue.getFullCoder(workItemCoder, windowingStrategy.getWindowFn().windowCoder()); - CoderTypeInformation>> workItemTypeInfo = + CoderTypeInformation>> workItemTypeInfo = new CoderTypeInformation<>(windowedWorkItemCoder, context.getPipelineOptions()); - DataStream>> workItemStream = + DataStream>> workItemStream = inputDataStream .flatMap( new FlinkStreamingTransformTranslators.ToKeyedWorkItem<>( @@ -419,9 +424,11 @@ private SingleOutputStreamOperator>>> add .name("ToKeyedWorkItem"); WorkItemKeySelector keySelector = - new WorkItemKeySelector<>(inputElementCoder.getKeyCoder()); + new WorkItemKeySelector<>( + inputElementCoder.getKeyCoder(), + new SerializablePipelineOptions(context.getPipelineOptions())); - KeyedStream>, ByteBuffer> keyedWorkItemStream = + KeyedStream>, ByteBuffer> keyedWorkItemStream = workItemStream.keyBy(keySelector); SystemReduceFn, Iterable, BoundedWindow> reduceFn = @@ -443,10 +450,10 @@ private SingleOutputStreamOperator>>> add new WindowDoFnOperator<>( reduceFn, operatorName, - (Coder) windowedWorkItemCoder, + windowedWorkItemCoder, mainTag, Collections.emptyList(), - new DoFnOperator.MultiOutputOutputManagerFactory( + new DoFnOperator.MultiOutputOutputManagerFactory<>( mainTag, outputCoder, new SerializablePipelineOptions(context.getPipelineOptions())), @@ -455,17 +462,12 @@ private SingleOutputStreamOperator>>> add Collections.emptyList(), /* side inputs */ context.getPipelineOptions(), inputElementCoder.getKeyCoder(), - (KeySelector) keySelector /* key selector */); - - SingleOutputStreamOperator>>> outputDataStream = - keyedWorkItemStream.transform( - operatorName, outputTypeInfo, (OneInputStreamOperator) doFnOperator); + keySelector /* key selector */); - return outputDataStream; + return keyedWorkItemStream.transform(operatorName, outputTypeInfo, doFnOperator); } - @SuppressWarnings("unchecked") - private void translateUnboundedRead( + private void translateRead( String id, RunnerApi.Pipeline pipeline, StreamingTranslationContext context) { RunnerApi.PTransform transform = pipeline.getComponents().getTransformsOrThrow(id); String outputCollectionId = Iterables.getOnlyElement(transform.getOutputsMap().values()); @@ -477,20 +479,64 @@ private void translateUnboundedRead( throw new RuntimeException("Failed to parse ReadPayload from transform", e); } - Preconditions.checkState( - payload.getIsBounded() != RunnerApi.IsBounded.Enum.BOUNDED, - "Bounded reads should run inside an environment instead of being translated by the Runner."); + final DataStream> source; + if (payload.getIsBounded() == RunnerApi.IsBounded.Enum.BOUNDED) { + source = + translateBoundedSource( + transform.getUniqueName(), + outputCollectionId, + payload, + pipeline, + context.getPipelineOptions(), + context.getExecutionEnvironment()); + } else { + source = + translateUnboundedSource( + transform.getUniqueName(), + outputCollectionId, + payload, + pipeline, + context.getPipelineOptions(), + context.getExecutionEnvironment()); + } + context.addDataStream(outputCollectionId, source); + } - DataStream> source = - 
translateUnboundedSource( - transform.getUniqueName(), - outputCollectionId, - payload, - pipeline, - context.getPipelineOptions(), - context.getExecutionEnvironment()); + private DataStream> translateBoundedSource( + String transformName, + String outputCollectionId, + RunnerApi.ReadPayload payload, + RunnerApi.Pipeline pipeline, + FlinkPipelineOptions pipelineOptions, + StreamExecutionEnvironment env) { - context.addDataStream(outputCollectionId, source); + try { + @SuppressWarnings("unchecked") + BoundedSource boundedSource = + (BoundedSource) ReadTranslation.boundedSourceFromProto(payload); + @SuppressWarnings("unchecked") + WindowedValue.FullWindowedValueCoder wireCoder = + (WindowedValue.FullWindowedValueCoder) + instantiateCoder(outputCollectionId, pipeline.getComponents()); + + WindowedValue.FullWindowedValueCoder sdkCoder = + getSdkCoder(outputCollectionId, pipeline.getComponents()); + + CoderTypeInformation> outputTypeInfo = + new CoderTypeInformation<>(wireCoder, pipelineOptions); + + CoderTypeInformation> sdkTypeInfo = + new CoderTypeInformation<>(sdkCoder, pipelineOptions); + + return env.createInput(new SourceInputFormat<>(transformName, boundedSource, pipelineOptions)) + .name(transformName) + .uid(transformName) + .returns(sdkTypeInfo) + .map(value -> intoWireTypes(sdkCoder, wireCoder, value)) + .returns(outputTypeInfo); + } catch (Exception e) { + throw new RuntimeException("Error while translating UnboundedSource: " + transformName, e); + } } private static DataStream> translateUnboundedSource( @@ -503,28 +549,41 @@ private static DataStream> translateUnboundedSource( final DataStream> source; final DataStream>> nonDedupSource; - Coder> windowCoder = - instantiateCoder(outputCollectionId, pipeline.getComponents()); - TypeInformation> outputTypeInfo = - new CoderTypeInformation<>(windowCoder, pipelineOptions); + @SuppressWarnings("unchecked") + UnboundedSource unboundedSource = + (UnboundedSource) ReadTranslation.unboundedSourceFromProto(payload); - WindowingStrategy windowStrategy = + @SuppressWarnings("unchecked") + WindowingStrategy windowStrategy = getWindowingStrategy(outputCollectionId, pipeline.getComponents()); - TypeInformation>> withIdTypeInfo = - new CoderTypeInformation<>( - WindowedValue.getFullCoder( - ValueWithRecordId.ValueWithRecordIdCoder.of( - ((WindowedValueCoder) windowCoder).getValueCoder()), - windowStrategy.getWindowFn().windowCoder()), - pipelineOptions); - - UnboundedSource unboundedSource = ReadTranslation.unboundedSourceFromProto(payload); try { + + @SuppressWarnings("unchecked") + WindowedValue.FullWindowedValueCoder wireCoder = + (WindowedValue.FullWindowedValueCoder) + instantiateCoder(outputCollectionId, pipeline.getComponents()); + + WindowedValue.FullWindowedValueCoder sdkCoder = + getSdkCoder(outputCollectionId, pipeline.getComponents()); + + CoderTypeInformation> outputTypeInfo = + new CoderTypeInformation<>(wireCoder, pipelineOptions); + + CoderTypeInformation> sdkTypeInformation = + new CoderTypeInformation<>(sdkCoder, pipelineOptions); + + TypeInformation>> withIdTypeInfo = + new CoderTypeInformation<>( + WindowedValue.getFullCoder( + ValueWithRecordId.ValueWithRecordIdCoder.of(sdkCoder.getValueCoder()), + windowStrategy.getWindowFn().windowCoder()), + pipelineOptions); + int parallelism = env.getMaxParallelism() > 0 ? 
env.getMaxParallelism() : env.getParallelism(); - UnboundedSourceWrapper sourceWrapper = + UnboundedSourceWrapper sourceWrapper = new UnboundedSourceWrapper<>( transformName, pipelineOptions, unboundedSource, parallelism); nonDedupSource = @@ -537,19 +596,74 @@ private static DataStream> translateUnboundedSource( source = nonDedupSource .keyBy(new FlinkStreamingTransformTranslators.ValueWithRecordIdKeySelector<>()) - .transform("deduping", outputTypeInfo, new DedupingOperator<>(pipelineOptions)) - .uid(format("%s/__deduplicated__", transformName)); + .transform("deduping", sdkTypeInformation, new DedupingOperator<>(pipelineOptions)) + .uid(format("%s/__deduplicated__", transformName)) + .returns(sdkTypeInformation); } else { source = nonDedupSource .flatMap(new FlinkStreamingTransformTranslators.StripIdsMap<>(pipelineOptions)) - .returns(outputTypeInfo); + .returns(sdkTypeInformation); } + + return source.map(value -> intoWireTypes(sdkCoder, wireCoder, value)).returns(outputTypeInfo); } catch (Exception e) { throw new RuntimeException("Error while translating UnboundedSource: " + unboundedSource, e); } + } + + /** + * Get SDK coder for given PCollection. The SDK coder is the coder that the SDK-harness would have + * used to encode data before passing it to the runner over {@link SdkHarnessClient}. + * + * @param pCollectionId ID of PCollection in components + * @param components the Pipeline components (proto) + * @return SDK-side coder for the PCollection + */ + private static WindowedValue.FullWindowedValueCoder getSdkCoder( + String pCollectionId, RunnerApi.Components components) { + + PipelineNode.PCollectionNode pCollectionNode = + PipelineNode.pCollection(pCollectionId, components.getPcollectionsOrThrow(pCollectionId)); + RunnerApi.Components.Builder componentsBuilder = components.toBuilder(); + String coderId = + WireCoders.addSdkWireCoder( + pCollectionNode, + componentsBuilder, + RunnerApi.ExecutableStagePayload.WireCoderSetting.getDefaultInstance()); + RehydratedComponents rehydratedComponents = + RehydratedComponents.forComponents(componentsBuilder.build()); + try { + @SuppressWarnings("unchecked") + WindowedValue.FullWindowedValueCoder res = + (WindowedValue.FullWindowedValueCoder) rehydratedComponents.getCoder(coderId); + return res; + } catch (IOException ex) { + throw new IllegalStateException("Could not get SDK coder.", ex); + } + } - return source; + /** + * Transform types from SDK types to runner types. The runner uses byte array representation for + * non {@link ModelCoders} coders. 
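The getSdkCoder/intoWireTypes pair introduced here boils down to a coder round trip: encode with the SDK-side coder, decode with the runner's wire coder. A self-contained sketch of that pattern follows; StringUtf8Coder and every name in it are illustrative and not taken from the patch.

    import org.apache.beam.sdk.coders.Coder;
    import org.apache.beam.sdk.coders.CoderException;
    import org.apache.beam.sdk.coders.StringUtf8Coder;
    import org.apache.beam.sdk.util.CoderUtils;

    public class CoderRoundTrip {
      /** Re-encodes a value from one coder's byte representation into another, compatible coder. */
      static <T> T reencode(Coder<T> inCoder, Coder<T> outCoder, T value) throws CoderException {
        byte[] bytes = CoderUtils.encodeToByteArray(inCoder, value);
        return CoderUtils.decodeFromByteArray(outCoder, bytes);
      }

      public static void main(String[] args) throws CoderException {
        // Trivial round trip: with the same coder on both sides the value is unchanged.
        System.out.println(reencode(StringUtf8Coder.of(), StringUtf8Coder.of(), "hello"));
      }
    }

The helper in the patch does the same thing between the SDK coder and the wire coder for a WindowedValue, relying on the two coders producing compatible byte representations.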
+ * + * @param inCoder the input coder (SDK-side) + * @param outCoder the output coder (runner-side) + * @param value encoded value + * @param <InputT> SDK-side type + * @param <OutputT> runner-side type + * @return re-encoded {@link WindowedValue} + */ + private static <InputT, OutputT> WindowedValue<OutputT> intoWireTypes( + Coder<WindowedValue<InputT>> inCoder, + Coder<WindowedValue<OutputT>> outCoder, + WindowedValue<InputT> value) { + + try { + return CoderUtils.decodeFromByteArray(outCoder, CoderUtils.encodeToByteArray(inCoder, value)); + } catch (CoderException ex) { + throw new IllegalStateException("Could not transform element into wire types", ex); + } } private void translateImpulse( diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingTransformTranslators.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingTransformTranslators.java index 319e5b7a8a0a..9379ef560fb6 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingTransformTranslators.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingTransformTranslators.java @@ -56,6 +56,7 @@ import org.apache.beam.sdk.coders.ByteArrayCoder; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.CoderException; +import org.apache.beam.sdk.coders.IterableCoder; import org.apache.beam.sdk.coders.KvCoder; import org.apache.beam.sdk.coders.VoidCoder; import org.apache.beam.sdk.io.BoundedSource; @@ -109,11 +110,10 @@ import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction; import org.apache.flink.streaming.api.datastream.DataStream; import org.apache.flink.streaming.api.datastream.DataStreamSource; +import org.apache.flink.streaming.api.datastream.DataStreamUtils; import org.apache.flink.streaming.api.datastream.KeyedStream; import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator; import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction; -import org.apache.flink.streaming.api.operators.OneInputStreamOperator; -import org.apache.flink.streaming.api.operators.TwoInputStreamOperator; import org.apache.flink.streaming.api.transformations.TwoInputTransformation; import org.apache.flink.streaming.api.watermark.Watermark; import org.apache.flink.streaming.runtime.tasks.ProcessingTimeCallback; @@ -552,12 +552,25 @@ static void translateParDo( keySelector = new KvToByteBufferKeySelector( keyCoder, new SerializablePipelineOptions(context.getPipelineOptions())); - inputDataStream = inputDataStream.keyBy(keySelector); + final PTransform> producer = context.getProducer(input); + final String previousUrn = + producer != null + ?
PTransformTranslation.urnForTransformOrNull(context.getProducer(input)) + : null; + // We can skip reshuffle in case previous transform was CPK or GBK + if (PTransformTranslation.COMBINE_PER_KEY_TRANSFORM_URN.equals(previousUrn) + || PTransformTranslation.GROUP_BY_KEY_TRANSFORM_URN.equals(previousUrn)) { + inputDataStream = DataStreamUtils.reinterpretAsKeyedStream(inputDataStream, keySelector); + } else { + inputDataStream = inputDataStream.keyBy(keySelector); + } stateful = true; } else if (doFn instanceof SplittableParDoViaKeyedWorkItems.ProcessFn) { // we know that it is keyed on byte[] keyCoder = ByteArrayCoder.of(); - keySelector = new WorkItemKeySelector<>(keyCoder); + keySelector = + new WorkItemKeySelector<>( + keyCoder, new SerializablePipelineOptions(context.getPipelineOptions())); stateful = true; } @@ -906,50 +919,56 @@ public void translateNode( KvCoder inputKvCoder = (KvCoder) input.getCoder(); - SingletonKeyedWorkItemCoder workItemCoder = + SingletonKeyedWorkItemCoder workItemCoder = SingletonKeyedWorkItemCoder.of( inputKvCoder.getKeyCoder(), - inputKvCoder.getValueCoder(), + ByteArrayCoder.of(), input.getWindowingStrategy().getWindowFn().windowCoder()); DataStream>> inputDataStream = context.getInputDataStream(input); - WindowedValue.FullWindowedValueCoder> - windowedWorkItemCoder = - WindowedValue.getFullCoder( - workItemCoder, input.getWindowingStrategy().getWindowFn().windowCoder()); + WindowedValue.FullWindowedValueCoder> windowedWorkItemCoder = + WindowedValue.getFullCoder( + workItemCoder, input.getWindowingStrategy().getWindowFn().windowCoder()); - CoderTypeInformation>> workItemTypeInfo = + CoderTypeInformation>> workItemTypeInfo = new CoderTypeInformation<>(windowedWorkItemCoder, context.getPipelineOptions()); - DataStream>> workItemStream = + DataStream>> workItemStream = inputDataStream - .flatMap(new ToKeyedWorkItem<>(context.getPipelineOptions())) + .flatMap( + new ToBinaryKeyedWorkItem<>( + context.getPipelineOptions(), inputKvCoder.getValueCoder())) .returns(workItemTypeInfo) - .name("ToKeyedWorkItem"); + .name("ToBinaryKeyedWorkItem"); - WorkItemKeySelector keySelector = new WorkItemKeySelector<>(inputKvCoder.getKeyCoder()); + WorkItemKeySelector keySelector = + new WorkItemKeySelector<>( + inputKvCoder.getKeyCoder(), + new SerializablePipelineOptions(context.getPipelineOptions())); - KeyedStream>, ByteBuffer> - keyedWorkItemStream = - workItemStream.keyBy(new WorkItemKeySelector<>(inputKvCoder.getKeyCoder())); + KeyedStream>, ByteBuffer> keyedWorkItemStream = + workItemStream.keyBy(keySelector); - SystemReduceFn, Iterable, BoundedWindow> reduceFn = - SystemReduceFn.buffering(inputKvCoder.getValueCoder()); + SystemReduceFn, Iterable, BoundedWindow> reduceFn = + SystemReduceFn.buffering(ByteArrayCoder.of()); - Coder>>> outputCoder = - context.getWindowedInputCoder(context.getOutput(transform)); - TypeInformation>>> outputTypeInfo = - context.getTypeInfo(context.getOutput(transform)); + Coder>>> outputCoder = + WindowedValue.getFullCoder( + KvCoder.of(inputKvCoder.getKeyCoder(), IterableCoder.of(ByteArrayCoder.of())), + windowingStrategy.getWindowFn().windowCoder()); + + TypeInformation>>> outputTypeInfo = + new CoderTypeInformation<>(outputCoder, context.getPipelineOptions()); - TupleTag>> mainTag = new TupleTag<>("main output"); + TupleTag>> mainTag = new TupleTag<>("main output"); String fullName = getCurrentTransformName(context); - WindowDoFnOperator> doFnOperator = + WindowDoFnOperator> doFnOperator = new WindowDoFnOperator<>( reduceFn, fullName, - 
(Coder) windowedWorkItemCoder, + windowedWorkItemCoder, mainTag, Collections.emptyList(), new DoFnOperator.MultiOutputOutputManagerFactory<>( @@ -963,13 +982,15 @@ public void translateNode( inputKvCoder.getKeyCoder(), keySelector); - // our operator expects WindowedValue while our input stream - // is WindowedValue, which is fine but Java doesn't like it ... - @SuppressWarnings("unchecked") - SingleOutputStreamOperator>>> outDataStream = + final SingleOutputStreamOperator>>> outDataStream = keyedWorkItemStream - .transform(fullName, outputTypeInfo, (OneInputStreamOperator) doFnOperator) - .uid(fullName); + .transform(fullName, outputTypeInfo, doFnOperator) + .uid(fullName) + .flatMap( + new ToGroupByKeyResult<>( + context.getPipelineOptions(), inputKvCoder.getValueCoder())) + .returns(context.getTypeInfo(context.getOutput(transform))) + .name("ToGBKResult"); context.setOutputDataStream(context.getOutput(transform), outDataStream); } @@ -992,7 +1013,7 @@ boolean canTranslate( WindowingStrategy windowingStrategy = (WindowingStrategy) input.getWindowingStrategy(); - return windowingStrategy.getWindowFn().isNonMerging() + return !windowingStrategy.needsMerge() || ((Combine.PerKey) transform).getSideInputs().isEmpty(); } @@ -1017,23 +1038,25 @@ public void translateNode( DataStream>> inputDataStream = context.getInputDataStream(input); - WindowedValue.FullWindowedValueCoder> - windowedWorkItemCoder = - WindowedValue.getFullCoder( - workItemCoder, input.getWindowingStrategy().getWindowFn().windowCoder()); + WindowedValue.FullWindowedValueCoder> windowedWorkItemCoder = + WindowedValue.getFullCoder( + workItemCoder, input.getWindowingStrategy().getWindowFn().windowCoder()); - CoderTypeInformation>> workItemTypeInfo = + CoderTypeInformation>> workItemTypeInfo = new CoderTypeInformation<>(windowedWorkItemCoder, context.getPipelineOptions()); - DataStream>> workItemStream = + DataStream>> workItemStream = inputDataStream .flatMap(new ToKeyedWorkItem<>(context.getPipelineOptions())) .returns(workItemTypeInfo) .name("ToKeyedWorkItem"); - WorkItemKeySelector keySelector = new WorkItemKeySelector<>(inputKvCoder.getKeyCoder()); - KeyedStream>, ByteBuffer> - keyedWorkItemStream = workItemStream.keyBy(keySelector); + WorkItemKeySelector keySelector = + new WorkItemKeySelector<>( + inputKvCoder.getKeyCoder(), + new SerializablePipelineOptions(context.getPipelineOptions())); + KeyedStream>, ByteBuffer> keyedWorkItemStream = + workItemStream.keyBy(keySelector); GlobalCombineFn combineFn = ((Combine.PerKey) transform).getFn(); SystemReduceFn reduceFn = @@ -1069,13 +1092,8 @@ public void translateNode( inputKvCoder.getKeyCoder(), keySelector); - // our operator expects WindowedValue while our input stream - // is WindowedValue, which is fine but Java doesn't like it ... - @SuppressWarnings("unchecked") SingleOutputStreamOperator>> outDataStream = - keyedWorkItemStream - .transform(fullName, outputTypeInfo, (OneInputStreamOperator) doFnOperator) - .uid(fullName); + keyedWorkItemStream.transform(fullName, outputTypeInfo, doFnOperator).uid(fullName); context.setOutputDataStream(context.getOutput(transform), outDataStream); } else { Tuple2>, DataStream> transformSideInputs = @@ -1104,7 +1122,7 @@ public void translateNode( // allowed to have only one input keyed, normally. 
TwoInputTransformation< - WindowedValue>, + WindowedValue>, RawUnionValue, WindowedValue>> rawFlinkTransform = @@ -1112,7 +1130,7 @@ public void translateNode( keyedWorkItemStream.getTransformation(), transformSideInputs.f1.broadcast().getTransformation(), transform.getName(), - (TwoInputStreamOperator) doFnOperator, + doFnOperator, outputTypeInfo, keyedWorkItemStream.getParallelism()); @@ -1158,23 +1176,25 @@ public void translateNode( inputKvCoder.getValueCoder(), input.getWindowingStrategy().getWindowFn().windowCoder()); - WindowedValue.ValueOnlyWindowedValueCoder> - windowedWorkItemCoder = WindowedValue.getValueOnlyCoder(workItemCoder); + WindowedValue.ValueOnlyWindowedValueCoder> windowedWorkItemCoder = + WindowedValue.getValueOnlyCoder(workItemCoder); - CoderTypeInformation>> workItemTypeInfo = + CoderTypeInformation>> workItemTypeInfo = new CoderTypeInformation<>(windowedWorkItemCoder, context.getPipelineOptions()); DataStream>> inputDataStream = context.getInputDataStream(input); - DataStream>> workItemStream = + DataStream>> workItemStream = inputDataStream .flatMap(new ToKeyedWorkItemInGlobalWindow<>(context.getPipelineOptions())) .returns(workItemTypeInfo) .name("ToKeyedWorkItem"); - KeyedStream>, ByteBuffer> - keyedWorkItemStream = - workItemStream.keyBy(new WorkItemKeySelector<>(inputKvCoder.getKeyCoder())); + KeyedStream>, ByteBuffer> keyedWorkItemStream = + workItemStream.keyBy( + new WorkItemKeySelector<>( + inputKvCoder.getKeyCoder(), + new SerializablePipelineOptions(context.getPipelineOptions()))); context.setOutputDataStream(context.getOutput(transform), keyedWorkItemStream); } @@ -1182,7 +1202,7 @@ public void translateNode( private static class ToKeyedWorkItemInGlobalWindow extends RichFlatMapFunction< - WindowedValue>, WindowedValue>> { + WindowedValue>, WindowedValue>> { private final SerializablePipelineOptions options; @@ -1200,7 +1220,7 @@ public void open(Configuration parameters) { @Override public void flatMap( WindowedValue> inWithMultipleWindows, - Collector>> out) + Collector>> out) throws Exception { // we need to wrap each one work item per window for now @@ -1290,7 +1310,7 @@ public void flatMap(T t, Collector collector) throws Exception { static class ToKeyedWorkItem extends RichFlatMapFunction< - WindowedValue>, WindowedValue>> { + WindowedValue>, WindowedValue>> { private final SerializablePipelineOptions options; @@ -1308,8 +1328,7 @@ public void open(Configuration parameters) { @Override public void flatMap( WindowedValue> inWithMultipleWindows, - Collector>> out) - throws Exception { + Collector>> out) { // we need to wrap each one work item per window for now // since otherwise the PushbackSideInputRunner will not correctly @@ -1326,6 +1345,78 @@ public void flatMap( } } + static class ToBinaryKeyedWorkItem + extends RichFlatMapFunction< + WindowedValue>, WindowedValue>> { + + private final SerializablePipelineOptions options; + private final Coder valueCoder; + + ToBinaryKeyedWorkItem(PipelineOptions options, Coder valueCoder) { + this.options = new SerializablePipelineOptions(options); + this.valueCoder = valueCoder; + } + + @Override + public void open(Configuration parameters) { + // Initialize FileSystems for any coders which may want to use the FileSystem, + // see https://issues.apache.org/jira/browse/BEAM-8303 + FileSystems.setDefaultPipelineOptions(options.get()); + } + + @Override + public void flatMap( + WindowedValue> inWithMultipleWindows, + Collector>> out) + throws CoderException { + + // we need to wrap each one work item per 
window for now + // since otherwise the PushbackSideInputRunner will not correctly + // determine whether side inputs are ready + // + // this is tracked as https://issues.apache.org/jira/browse/BEAM-1850 + for (WindowedValue> in : inWithMultipleWindows.explodeWindows()) { + final byte[] binaryValue = + CoderUtils.encodeToByteArray(valueCoder, in.getValue().getValue()); + final SingletonKeyedWorkItem workItem = + new SingletonKeyedWorkItem<>(in.getValue().getKey(), in.withValue(binaryValue)); + out.collect(in.withValue(workItem)); + } + } + } + + static class ToGroupByKeyResult + extends RichFlatMapFunction< + WindowedValue>>, WindowedValue>>> { + + private final SerializablePipelineOptions options; + private final Coder valueCoder; + + ToGroupByKeyResult(PipelineOptions options, Coder valueCoder) { + this.options = new SerializablePipelineOptions(options); + this.valueCoder = valueCoder; + } + + @Override + public void open(Configuration parameters) { + // Initialize FileSystems for any coders which may want to use the FileSystem, + // see https://issues.apache.org/jira/browse/BEAM-8303 + FileSystems.setDefaultPipelineOptions(options.get()); + } + + @Override + public void flatMap( + WindowedValue>> element, + Collector>>> collector) + throws CoderException { + final List result = new ArrayList<>(); + for (byte[] binaryValue : element.getValue().getValue()) { + result.add(CoderUtils.decodeFromByteArray(valueCoder, binaryValue)); + } + collector.collect(element.withValue(KV.of(element.getValue().getKey(), result))); + } + } + /** * A translator just to vend the URN. This will need to be moved to runners-core-construction-java * once SDF is reorganized appropriately. diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingTranslationContext.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingTranslationContext.java index 5bcf260d6c55..1b10af6d0d66 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingTranslationContext.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingTranslationContext.java @@ -34,6 +34,7 @@ import org.apache.beam.sdk.values.POutput; import org.apache.beam.sdk.values.PValue; import org.apache.beam.sdk.values.TupleTag; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.apache.flink.api.common.typeinfo.TypeInformation; import org.apache.flink.streaming.api.datastream.DataStream; @@ -55,14 +56,15 @@ class FlinkStreamingTranslationContext { * Keeps a mapping between the output value of the PTransform and the Flink Operator that produced * it, after the translation of the correspondinf PTransform to its Flink equivalent. 
*/ - private final Map> dataStreams; + private final Map> dataStreams = new HashMap<>(); + + private final Map> producers = new HashMap<>(); private AppliedPTransform currentTransform; public FlinkStreamingTranslationContext(StreamExecutionEnvironment env, PipelineOptions options) { this.env = checkNotNull(env); this.options = checkNotNull(options); - this.dataStreams = new HashMap<>(); } public StreamExecutionEnvironment getExecutionEnvironment() { @@ -79,11 +81,19 @@ public DataStream getInputDataStream(PValue value) { } public void setOutputDataStream(PValue value, DataStream set) { + final PTransform previousProducer = producers.put(value, currentTransform.getTransform()); + Preconditions.checkArgument( + previousProducer == null, "PValue can only have a single producer."); if (!dataStreams.containsKey(value)) { dataStreams.put(value, set); } } + @SuppressWarnings({"unchecked", "rawtypes"}) + PTransform getProducer(T value) { + return (PTransform) producers.get(value); + } + /** * Sets the AppliedPTransform which carries input/output. * @@ -94,8 +104,7 @@ public void setCurrentTransform(AppliedPTransform currentTransform) { } public Coder> getWindowedInputCoder(PCollection collection) { - Coder valueCoder = collection.getCoder(); - + final Coder valueCoder = collection.getCoder(); return WindowedValue.getFullCoder( valueCoder, collection.getWindowingStrategy().getWindowFn().windowCoder()); } @@ -112,12 +121,7 @@ public Map, Coder> getOutputCoders() { @SuppressWarnings("unchecked") public TypeInformation> getTypeInfo(PCollection collection) { - Coder valueCoder = collection.getCoder(); - WindowedValue.FullWindowedValueCoder windowedValueCoder = - WindowedValue.getFullCoder( - valueCoder, collection.getWindowingStrategy().getWindowFn().windowCoder()); - - return new CoderTypeInformation<>(windowedValueCoder, options); + return new CoderTypeInformation<>(getWindowedInputCoder(collection), options); } public AppliedPTransform getCurrentTransform() { diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/metrics/FileReporter.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/metrics/FileReporter.java index 35c32fa706fc..030abcbd0836 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/metrics/FileReporter.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/metrics/FileReporter.java @@ -69,7 +69,13 @@ public void notifyOfRemovedMetric(Metric metric, String metricName, MetricGroup final String name = group.getMetricIdentifier(metricName, this); super.notifyOfRemovedMetric(metric, metricName, group); synchronized (this) { - ps.printf("%s: %s%n", name, Metrics.toString(metric)); + try { + ps.printf("%s: %s%n", name, Metrics.toString(metric)); + } catch (NullPointerException e) { + // Workaround to avoid a NPE on Flink's DeclarativeSlotManager during unregister + // TODO Remove once FLINK-22646 is fixed on upstream Flink. 
+ log.warn("unable to log details on metric {}", name); + } } } diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkDoFnFunction.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkDoFnFunction.java index 8a9b8d8497fd..cfcf28617485 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkDoFnFunction.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkDoFnFunction.java @@ -43,7 +43,6 @@ import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; import org.apache.flink.api.common.functions.AbstractRichFunction; import org.apache.flink.api.common.functions.FlatMapFunction; -import org.apache.flink.api.common.functions.MapPartitionFunction; import org.apache.flink.api.common.functions.RuntimeContext; import org.apache.flink.configuration.Configuration; import org.apache.flink.util.Collector; @@ -61,8 +60,7 @@ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class FlinkDoFnFunction extends AbstractRichFunction - implements FlatMapFunction, WindowedValue>, - MapPartitionFunction, WindowedValue> { + implements FlatMapFunction, WindowedValue> { private final SerializablePipelineOptions serializedOptions; @@ -128,21 +126,14 @@ public void flatMap(WindowedValue value, Collector> values, Collector> out) { - for (WindowedValue value : values) { - flatMap(value, out); - } - } - @Override public void open(Configuration parameters) { // Note that the SerializablePipelineOptions already initialize FileSystems in the readObject() // deserialization method. However, this is a hack, and we want to properly initialize the // options where they are needed. 
- FileSystems.setDefaultPipelineOptions(serializedOptions.get()); - doFnInvoker = DoFnInvokers.tryInvokeSetupFor(doFn); + PipelineOptions options = serializedOptions.get(); + FileSystems.setDefaultPipelineOptions(options); + doFnInvoker = DoFnInvokers.tryInvokeSetupFor(doFn, options); metricContainer = new FlinkMetricContainer(getRuntimeContext()); // setup DoFnRunner @@ -159,7 +150,7 @@ public void open(Configuration parameters) { DoFnRunner doFnRunner = DoFnRunners.simpleRunner( - serializedOptions.get(), + options, doFn, new FlinkSideInputReader(sideInputs, runtimeContext), outputManager, diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkMergingNonShuffleReduceFunction.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkMergingNonShuffleReduceFunction.java index b34649f7e295..b1b95c6b58e4 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkMergingNonShuffleReduceFunction.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkMergingNonShuffleReduceFunction.java @@ -23,7 +23,7 @@ import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.transforms.CombineFnBase; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; -import org.apache.beam.sdk.transforms.windowing.IntervalWindow; +import org.apache.beam.sdk.transforms.windowing.Sessions; import org.apache.beam.sdk.util.WindowedValue; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollectionView; @@ -83,7 +83,7 @@ public void reduce( new FlinkSideInputReader(sideInputs, getRuntimeContext()); AbstractFlinkCombineRunner reduceRunner; - if (windowingStrategy.getWindowFn().windowCoder().equals(IntervalWindow.getCoder())) { + if (windowingStrategy.getWindowFn() instanceof Sessions) { reduceRunner = new SortingFlinkCombineRunner<>(); } else { reduceRunner = new HashingFlinkCombineRunner<>(); diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkNonMergingReduceFunction.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkNonMergingReduceFunction.java index b1ad8d0d2a9e..5a41d875462e 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkNonMergingReduceFunction.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkNonMergingReduceFunction.java @@ -84,10 +84,7 @@ public void reduce( final WindowedValue> first = iterator.peek(); final BoundedWindow window = Iterables.getOnlyElement(first.getWindows()); @SuppressWarnings("unchecked") - final Instant outputTimestamp = - ((WindowingStrategy) windowingStrategy) - .getWindowFn() - .getOutputTime(first.getTimestamp(), window); + final Instant outputTimestamp = first.getTimestamp(); final Instant combinedTimestamp = windowingStrategy.getTimestampCombiner().assign(window, outputTimestamp); final Iterable values; diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkPartialReduceFunction.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkPartialReduceFunction.java index b7a694a1c7e0..7b0f5d531e18 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkPartialReduceFunction.java +++ 
b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkPartialReduceFunction.java @@ -23,7 +23,7 @@ import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.transforms.CombineFnBase; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; -import org.apache.beam.sdk.transforms.windowing.IntervalWindow; +import org.apache.beam.sdk.transforms.windowing.Sessions; import org.apache.beam.sdk.util.WindowedValue; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollectionView; @@ -98,11 +98,10 @@ public void combine( if (groupedByWindow) { reduceRunner = new SingleWindowFlinkCombineRunner<>(); } else { - if (!windowingStrategy.getWindowFn().isNonMerging() - && !windowingStrategy.getWindowFn().windowCoder().equals(IntervalWindow.getCoder())) { - reduceRunner = new HashingFlinkCombineRunner<>(); - } else { + if (windowingStrategy.needsMerge() && windowingStrategy.getWindowFn() instanceof Sessions) { reduceRunner = new SortingFlinkCombineRunner<>(); + } else { + reduceRunner = new HashingFlinkCombineRunner<>(); } } diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkReduceFunction.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkReduceFunction.java index b7a15ac82e22..13998693de7f 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkReduceFunction.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkReduceFunction.java @@ -23,7 +23,7 @@ import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.transforms.CombineFnBase; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; -import org.apache.beam.sdk.transforms.windowing.IntervalWindow; +import org.apache.beam.sdk.transforms.windowing.Sessions; import org.apache.beam.sdk.util.WindowedValue; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollectionView; @@ -98,11 +98,10 @@ public void reduce( if (groupedByWindow) { reduceRunner = new SingleWindowFlinkCombineRunner<>(); } else { - if (!windowingStrategy.getWindowFn().isNonMerging() - && !windowingStrategy.getWindowFn().windowCoder().equals(IntervalWindow.getCoder())) { - reduceRunner = new HashingFlinkCombineRunner<>(); - } else { + if (windowingStrategy.needsMerge() && windowingStrategy.getWindowFn() instanceof Sessions) { reduceRunner = new SortingFlinkCombineRunner<>(); + } else { + reduceRunner = new HashingFlinkCombineRunner<>(); } } diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkStatefulDoFnFunction.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkStatefulDoFnFunction.java index 7c6453a07c7b..d007ce1dc6a1 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkStatefulDoFnFunction.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkStatefulDoFnFunction.java @@ -236,9 +236,10 @@ public void open(Configuration parameters) { // Note that the SerializablePipelineOptions already initialize FileSystems in the readObject() // deserialization method. However, this is a hack, and we want to properly initialize the // options where they are needed. 
- FileSystems.setDefaultPipelineOptions(serializedOptions.get()); + PipelineOptions options = serializedOptions.get(); + FileSystems.setDefaultPipelineOptions(options); metricContainer = new FlinkMetricContainer(getRuntimeContext()); - doFnInvoker = DoFnInvokers.tryInvokeSetupFor(dofn); + doFnInvoker = DoFnInvokers.tryInvokeSetupFor(dofn, options); } @Override diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/HashingFlinkCombineRunner.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/HashingFlinkCombineRunner.java index 6c518c76f0aa..e11eadd3f989 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/HashingFlinkCombineRunner.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/HashingFlinkCombineRunner.java @@ -59,7 +59,6 @@ public void combine( @SuppressWarnings("unchecked") TimestampCombiner timestampCombiner = windowingStrategy.getTimestampCombiner(); - WindowFn windowFn = windowingStrategy.getWindowFn(); // Flink Iterable can be iterated over only once. List>> inputs = new ArrayList<>(); @@ -86,8 +85,7 @@ public void combine( flinkCombiner.firstInput( key, currentValue.getValue().getValue(), options, sideInputReader, singletonW); Instant windowTimestamp = - timestampCombiner.assign( - mergedWindow, windowFn.getOutputTime(currentValue.getTimestamp(), mergedWindow)); + timestampCombiner.assign(mergedWindow, currentValue.getTimestamp()); accumAndInstant = new Tuple2<>(accumT, windowTimestamp); mapState.put(mergedWindow, accumAndInstant); } else { @@ -102,11 +100,7 @@ public void combine( accumAndInstant.f1 = timestampCombiner.combine( accumAndInstant.f1, - timestampCombiner.assign( - mergedWindow, - windowingStrategy - .getWindowFn() - .getOutputTime(currentValue.getTimestamp(), mergedWindow))); + timestampCombiner.assign(mergedWindow, currentValue.getTimestamp())); } } if (iterator.hasNext()) { @@ -140,7 +134,7 @@ private Map mergeWindows(WindowingStrategy windowingStrategy, S throws Exception { WindowFn windowFn = windowingStrategy.getWindowFn(); - if (windowingStrategy.getWindowFn().isNonMerging()) { + if (!windowingStrategy.needsMerge()) { // Return an empty map, indicating that every window is not merged. 
return Collections.emptyMap(); } diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/SingleWindowFlinkCombineRunner.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/SingleWindowFlinkCombineRunner.java index e5977388f5db..68795debe8f3 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/SingleWindowFlinkCombineRunner.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/SingleWindowFlinkCombineRunner.java @@ -24,7 +24,6 @@ import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.transforms.windowing.PaneInfo; import org.apache.beam.sdk.transforms.windowing.TimestampCombiner; -import org.apache.beam.sdk.transforms.windowing.WindowFn; import org.apache.beam.sdk.util.WindowedValue; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.WindowingStrategy; @@ -53,7 +52,6 @@ public void combine( Iterable>> elements, Collector>> out) { final TimestampCombiner timestampCombiner = windowingStrategy.getTimestampCombiner(); - final WindowFn windowFn = windowingStrategy.getWindowFn(); final PeekingIterator>> iterator = Iterators.peekingIterator(elements.iterator()); @@ -76,8 +74,7 @@ public void combine( sideInputReader, Collections.singleton(currentWindow)); Instant windowTimestamp = - timestampCombiner.assign( - currentWindow, windowFn.getOutputTime(currentValue.getTimestamp(), currentWindow)); + timestampCombiner.assign(currentWindow, currentValue.getTimestamp()); combinedState = new Tuple2<>(accumT, windowTimestamp); } else { combinedState.f0 = @@ -91,11 +88,7 @@ public void combine( combinedState.f1 = timestampCombiner.combine( combinedState.f1, - timestampCombiner.assign( - currentWindow, - windowingStrategy - .getWindowFn() - .getOutputTime(currentValue.getTimestamp(), currentWindow))); + timestampCombiner.assign(currentWindow, currentValue.getTimestamp())); } } diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/SortingFlinkCombineRunner.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/SortingFlinkCombineRunner.java index 6bbc847037c4..309e83126403 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/SortingFlinkCombineRunner.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/SortingFlinkCombineRunner.java @@ -26,7 +26,6 @@ import org.apache.beam.sdk.transforms.windowing.IntervalWindow; import org.apache.beam.sdk.transforms.windowing.PaneInfo; import org.apache.beam.sdk.transforms.windowing.TimestampCombiner; -import org.apache.beam.sdk.transforms.windowing.WindowFn; import org.apache.beam.sdk.util.WindowedValue; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.WindowingStrategy; @@ -56,7 +55,6 @@ public void combine( @SuppressWarnings("unchecked") TimestampCombiner timestampCombiner = (TimestampCombiner) windowingStrategy.getTimestampCombiner(); - WindowFn windowFn = windowingStrategy.getWindowFn(); // get all elements so that we can sort them, has to fit into // memory @@ -70,7 +68,7 @@ public void combine( sortedInput.sort( Comparator.comparing(o -> Iterables.getOnlyElement(o.getWindows()).maxTimestamp())); - if (!windowingStrategy.getWindowFn().isNonMerging()) { + if (windowingStrategy.needsMerge()) { // merge windows, we have to do it in an extra pre-processing step and // can't do it as we go since the 
window of early elements would not // be correct when calling the CombineFn @@ -90,9 +88,7 @@ public void combine( key, firstValue, options, sideInputReader, currentValue.getWindows()); // we use this to keep track of the timestamps assigned by the TimestampCombiner - Instant windowTimestamp = - timestampCombiner.assign( - currentWindow, windowFn.getOutputTime(currentValue.getTimestamp(), currentWindow)); + Instant windowTimestamp = timestampCombiner.assign(currentWindow, currentValue.getTimestamp()); while (iterator.hasNext()) { WindowedValue> nextValue = iterator.next(); @@ -108,10 +104,7 @@ public void combine( windowTimestamp = timestampCombiner.combine( - windowTimestamp, - timestampCombiner.assign( - currentWindow, - windowFn.getOutputTime(nextValue.getTimestamp(), currentWindow))); + windowTimestamp, timestampCombiner.assign(currentWindow, nextValue.getTimestamp())); } else { // emit the value that we currently have @@ -131,9 +124,7 @@ public void combine( accumulator = flinkCombiner.firstInput( key, value, options, sideInputReader, currentValue.getWindows()); - windowTimestamp = - timestampCombiner.assign( - currentWindow, windowFn.getOutputTime(nextValue.getTimestamp(), currentWindow)); + windowTimestamp = timestampCombiner.assign(currentWindow, nextValue.getTimestamp()); } } diff --git a/runners/flink/1.8/src/main/java/org/apache/beam/runners/flink/translation/types/CoderTypeSerializer.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/types/CoderTypeSerializer.java similarity index 97% rename from runners/flink/1.8/src/main/java/org/apache/beam/runners/flink/translation/types/CoderTypeSerializer.java rename to runners/flink/src/main/java/org/apache/beam/runners/flink/translation/types/CoderTypeSerializer.java index a4ad44825e8b..0cd1e6bb53b1 100644 --- a/runners/flink/1.8/src/main/java/org/apache/beam/runners/flink/translation/types/CoderTypeSerializer.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/types/CoderTypeSerializer.java @@ -86,12 +86,11 @@ public T createInstance() { public T copy(T t) { if (fasterCopy) { return t; - } else { - try { - return CoderUtils.clone(coder, t); - } catch (CoderException e) { - throw new RuntimeException("Could not clone.", e); - } + } + try { + return CoderUtils.clone(coder, t); + } catch (CoderException e) { + throw new RuntimeException("Could not clone.", e); } } diff --git a/runners/flink/1.8/src/main/java/org/apache/beam/runners/flink/translation/types/EncodedValueSerializer.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/types/EncodedValueSerializer.java similarity index 100% rename from runners/flink/1.8/src/main/java/org/apache/beam/runners/flink/translation/types/EncodedValueSerializer.java rename to runners/flink/src/main/java/org/apache/beam/runners/flink/translation/types/EncodedValueSerializer.java diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/utils/FlinkPortableRunnerUtils.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/utils/FlinkPortableRunnerUtils.java index 3e0054517f58..851a271b6800 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/utils/FlinkPortableRunnerUtils.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/utils/FlinkPortableRunnerUtils.java @@ -19,7 +19,7 @@ import org.apache.beam.model.pipeline.v1.RunnerApi; import org.apache.beam.runners.core.construction.PTransformTranslation; -import 
org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; /** * Various utilies related to portability. Helps share code between portable batch and streaming diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/SourceInputFormat.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/SourceInputFormat.java index 8a2f863fa1d1..40ea207ef4f9 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/SourceInputFormat.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/SourceInputFormat.java @@ -131,7 +131,7 @@ public InputSplitAssigner getInputSplitAssigner(final SourceInputSplit[] sourceI } @Override - public boolean reachedEnd() throws IOException { + public boolean reachedEnd() { return !inputAvailable; } @@ -150,7 +150,9 @@ public WindowedValue nextRecord(WindowedValue t) throws IOException { @Override public void close() throws IOException { - metricContainer.registerMetricsForPipelineResult(); + if (metricContainer != null) { + metricContainer.registerMetricsForPipelineResult(); + } // TODO null check can be removed once FLINK-3796 is fixed if (reader != null) { reader.close(); diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java index dc264219ff6f..60abdddd4274 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java @@ -351,7 +351,7 @@ public void setForWindow(InputT input, BoundedWindow window) { }; // we don't know the window type - @SuppressWarnings({"unchecked", "rawtypes"}) + // @SuppressWarnings({"unchecked", "rawtypes"}) Coder windowCoder = windowingStrategy.getWindowFn().windowCoder(); @SuppressWarnings({"unchecked", "rawtypes"}) @@ -465,10 +465,10 @@ public void open() throws Exception { // So must wait StateInternals and TimerInternals ready. // This will be called after initializeState() this.doFn = getDoFn(); - doFnInvoker = DoFnInvokers.invokerFor(doFn); - doFnInvoker.invokeSetup(); FlinkPipelineOptions options = serializedOptions.get().as(FlinkPipelineOptions.class); + doFnInvoker = DoFnInvokers.tryInvokeSetupFor(doFn, options); + StepContext stepContext = new FlinkStepContext(); doFnRunner = DoFnRunners.simpleRunner( @@ -639,11 +639,12 @@ protected final void setBundleFinishedCallback(Runnable callback) { } @Override - public final void processElement(StreamRecord> streamRecord) - throws Exception { + public final void processElement(StreamRecord> streamRecord) { checkInvokeStartBundle(); + long oldHold = keyCoder != null ? 
keyedStateInternals.minWatermarkHoldMs() : -1L; doFnRunner.processElement(streamRecord.getValue()); checkInvokeFinishBundleByCount(); + emitWatermarkIfHoldChanged(oldHold); } @Override @@ -740,9 +741,12 @@ public final void processWatermark1(Watermark mark) throws Exception { } currentInputWatermark = mark.getTimestamp(); + processInputWatermark(true); + } + private void processInputWatermark(boolean advanceInputWatermark) throws Exception { long inputWatermarkHold = applyInputWatermarkHold(getEffectiveInputWatermark()); - if (keyCoder != null) { + if (keyCoder != null && advanceInputWatermark) { timeServiceManagerCompat.advanceWatermark(new Watermark(inputWatermarkHold)); } @@ -993,7 +997,23 @@ public void onProcessingTime(InternalTimer timer) { // allow overriding this in ExecutableStageDoFnOperator to set the key context protected void fireTimerInternal(ByteBuffer key, TimerData timerData) { + long oldHold = keyCoder != null ? keyedStateInternals.minWatermarkHoldMs() : -1L; fireTimer(timerData); + emitWatermarkIfHoldChanged(oldHold); + } + + void emitWatermarkIfHoldChanged(long currentWatermarkHold) { + if (keyCoder != null) { + long newWatermarkHold = keyedStateInternals.minWatermarkHoldMs(); + if (newWatermarkHold > currentWatermarkHold) { + try { + processInputWatermark(false); + } catch (Exception ex) { + // should not happen + throw new IllegalStateException(ex); + } + } + } } // allow overriding this in WindowDoFnOperator @@ -1392,6 +1412,10 @@ private boolean timerUsesOutputTimestamp(TimerData timer) { return timer.getOutputTimestamp().isBefore(timer.getTimestamp()); } + private String constructTimerId(String timerFamilyId, String timerId) { + return timerFamilyId + "+" + timerId; + } + @Override public void setTimer( StateNamespace namespace, @@ -1417,7 +1441,10 @@ public void setTimer(TimerData timer) { timer.getTimerId(), timer.getTimestamp().getMillis(), timer.getOutputTimestamp().getMillis()); - String contextTimerId = getContextTimerId(timer.getTimerId(), timer.getNamespace()); + String contextTimerId = + getContextTimerId( + constructTimerId(timer.getTimerFamilyId(), timer.getTimerId()), + timer.getNamespace()); @Nullable final TimerData oldTimer = pendingTimersById.get(contextTimerId); if (!timer.equals(oldTimer)) { // Only one timer can exist at a time for a given timer id and context. 
@@ -1480,7 +1507,10 @@ private void cancelPendingTimer(@Nullable TimerData timer) { */ void onFiredOrDeletedTimer(TimerData timer) { try { - pendingTimersById.remove(getContextTimerId(timer.getTimerId(), timer.getNamespace())); + pendingTimersById.remove( + getContextTimerId( + constructTimerId(timer.getTimerFamilyId(), timer.getTimerId()), + timer.getNamespace())); if (timer.getDomain() == TimeDomain.EVENT_TIME || StateAndTimerBundleCheckpointHandler.isSdfTimer(timer.getTimerId())) { if (timerUsesOutputTimestamp(timer)) { @@ -1500,7 +1530,8 @@ public void deleteTimer(StateNamespace namespace, String timerId, String timerFa } @Override - public void deleteTimer(StateNamespace namespace, String timerId, TimeDomain timeDomain) { + public void deleteTimer( + StateNamespace namespace, String timerId, String timerFamilyId, TimeDomain timeDomain) { try { cancelPendingTimerById(getContextTimerId(timerId, namespace)); } catch (Exception e) { @@ -1512,7 +1543,11 @@ public void deleteTimer(StateNamespace namespace, String timerId, TimeDomain tim @Override @Deprecated public void deleteTimer(TimerData timer) { - deleteTimer(timer.getNamespace(), timer.getTimerId(), timer.getDomain()); + deleteTimer( + timer.getNamespace(), + constructTimerId(timer.getTimerFamilyId(), timer.getTimerId()), + timer.getTimerFamilyId(), + timer.getDomain()); } void deleteTimerInternal(TimerData timer) { diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java index d479a3cb4454..ba0afce6f8e0 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java @@ -63,7 +63,6 @@ import org.apache.beam.runners.core.construction.graph.ExecutableStage; import org.apache.beam.runners.core.construction.graph.UserStateReference; import org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageContextFactory; -import org.apache.beam.runners.flink.translation.functions.FlinkStreamingSideInputHandlerFactory; import org.apache.beam.runners.flink.translation.types.CoderTypeSerializer; import org.apache.beam.runners.flink.translation.wrappers.streaming.state.FlinkStateInternals; import org.apache.beam.runners.fnexecution.control.BundleCheckpointHandler; @@ -82,6 +81,8 @@ import org.apache.beam.runners.fnexecution.provisioning.JobInfo; import org.apache.beam.runners.fnexecution.state.StateRequestHandler; import org.apache.beam.runners.fnexecution.state.StateRequestHandlers; +import org.apache.beam.runners.fnexecution.translation.StreamingSideInputHandlerFactory; +import org.apache.beam.runners.fnexecution.wire.ByteStringCoder; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.VoidCoder; import org.apache.beam.sdk.fn.data.FnDataReceiver; @@ -102,11 +103,10 @@ import org.apache.beam.sdk.values.PCollectionView; import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.sdk.values.WindowingStrategy; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.StatusRuntimeException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.StatusRuntimeException; import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; -import org.apache.beam.vendor.sdk.v2.sdk.extensions.protobuf.ByteStringCoder; import org.apache.flink.api.common.state.ListStateDescriptor; import org.apache.flink.api.common.typeutils.base.StringSerializer; import org.apache.flink.api.java.functions.KeySelector; @@ -221,6 +221,11 @@ public ExecutableStageDoFnOperator( this.windowCoder = (Coder) windowingStrategy.getWindowFn().windowCoder(); this.inputCoder = windowedInputCoder; this.pipelineOptions = new SerializablePipelineOptions(options); + + Preconditions.checkArgument( + !windowedInputCoder.getCoderArguments().isEmpty(), + "Empty arguments for WindowedValue Coder %s", + windowedInputCoder); } @Override @@ -314,7 +319,7 @@ private StateRequestHandler getStateRequestHandler(ExecutableStage executableSta checkNotNull(super.sideInputHandler); StateRequestHandlers.SideInputHandlerFactory sideInputHandlerFactory = Preconditions.checkNotNull( - FlinkStreamingSideInputHandlerFactory.forStage( + StreamingSideInputHandlerFactory.forStage( executableStage, sideInputIds, super.sideInputHandler)); try { sideInputStateHandler = @@ -522,8 +527,7 @@ void setTimer(Timer timerElement, TimerInternals.TimerData timerData) { try (Locker locker = Locker.locked(stateBackendLock)) { getKeyedStateBackend().setCurrentKey(encodedKey); if (timerElement.getClearBit()) { - timerInternals.deleteTimer( - timerData.getNamespace(), timerData.getTimerId(), timerData.getDomain()); + timerInternals.deleteTimer(timerData); } else { timerInternals.setTimer(timerData); if (!timerData.getTimerId().equals(GC_TIMER_ID)) { @@ -597,7 +601,8 @@ public void setTimer(TimerData timerData) { } @Override - public void deleteTimer(StateNamespace namespace, String timerId, TimeDomain timeDomain) { + public void deleteTimer( + StateNamespace namespace, String timerId, String timerFamilyId, TimeDomain timeDomain) { throw new UnsupportedOperationException( "It is not expected to use SdfFlinkTimerInternals to delete a timer"); } @@ -973,7 +978,7 @@ public void onTimer( processElement(stateValue); } else { KV transformAndTimerFamilyId = - TimerReceiverFactory.decodeTimerDataTimerId(timerId); + TimerReceiverFactory.decodeTimerDataTimerId(timerFamilyId); LOG.debug( "timer callback: {} {} {} {} {}", transformAndTimerFamilyId.getKey(), @@ -990,7 +995,7 @@ public void onTimer( Timer timerValue = Timer.of( timerKey, - "", + timerId, Collections.singletonList(window), timestamp, outputTimestamp, diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/FlinkKeyUtils.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/FlinkKeyUtils.java index 3e27bb1ac1a8..6facf7cfb0d3 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/FlinkKeyUtils.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/FlinkKeyUtils.java @@ -32,7 +32,7 @@ import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.StructuredCoder; import org.apache.beam.sdk.util.CoderUtils; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; /** * Utility functions for dealing with key encoding. 
Beam requires keys to be compared in binary diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/KeyedPushedBackElementsHandler.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/KeyedPushedBackElementsHandler.java index 82229d969250..e51f30b4be6b 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/KeyedPushedBackElementsHandler.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/KeyedPushedBackElementsHandler.java @@ -18,6 +18,7 @@ package org.apache.beam.runners.flink.translation.wrappers.streaming; import java.util.List; +import java.util.Objects; import java.util.stream.Collectors; import java.util.stream.Stream; import java.util.stream.StreamSupport; @@ -53,8 +54,8 @@ private KeyedPushedBackElementsHandler( KeyedStateBackend backend, ListStateDescriptor stateDescriptor) throws Exception { - this.keySelector = keySelector; - this.backend = backend; + this.keySelector = Objects.requireNonNull(keySelector); + this.backend = Objects.requireNonNull(backend); this.stateName = stateDescriptor.getName(); // Eagerly retrieve the state to work around https://jira.apache.org/jira/browse/FLINK-12653 this.state = diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/SingletonKeyedWorkItemCoder.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/SingletonKeyedWorkItemCoder.java index 23c67f7acb05..e06bafe85195 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/SingletonKeyedWorkItemCoder.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/SingletonKeyedWorkItemCoder.java @@ -32,7 +32,8 @@ /** Singleton keyed work item coder. */ public class SingletonKeyedWorkItemCoder - extends StructuredCoder> { + extends StructuredCoder> { + /** * Create a new {@link KeyedWorkItemCoder} with the provided key coder, element coder, and window * coder. 
@@ -64,17 +65,17 @@ public Coder getElementCoder() { } @Override - public void encode(SingletonKeyedWorkItem value, OutputStream outStream) + public void encode(KeyedWorkItem value, OutputStream outStream) throws CoderException, IOException { encode(value, outStream, Context.NESTED); } @Override - public void encode( - SingletonKeyedWorkItem value, OutputStream outStream, Context context) + public void encode(KeyedWorkItem value, OutputStream outStream, Context context) throws CoderException, IOException { - keyCoder.encode(value.key(), outStream); - valueCoder.encode(value.value(), outStream, context); + final SingletonKeyedWorkItem cast = (SingletonKeyedWorkItem) value; + keyCoder.encode(cast.key(), outStream); + valueCoder.encode(cast.value(), outStream, context); } @Override diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/WorkItemKeySelector.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/WorkItemKeySelector.java index ec77481cc0b6..3cdb0aece9a7 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/WorkItemKeySelector.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/WorkItemKeySelector.java @@ -19,11 +19,12 @@ import java.nio.ByteBuffer; import org.apache.beam.runners.core.KeyedWorkItem; +import org.apache.beam.runners.core.construction.SerializablePipelineOptions; +import org.apache.beam.runners.flink.translation.types.CoderTypeInformation; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.util.WindowedValue; import org.apache.flink.api.common.typeinfo.TypeInformation; import org.apache.flink.api.java.functions.KeySelector; -import org.apache.flink.api.java.typeutils.GenericTypeInfo; import org.apache.flink.api.java.typeutils.ResultTypeQueryable; /** @@ -32,23 +33,25 @@ * comparisons/hashing happen on the encoded form. 
*/ public class WorkItemKeySelector - implements KeySelector>, ByteBuffer>, + implements KeySelector>, ByteBuffer>, ResultTypeQueryable { private final Coder keyCoder; + private final SerializablePipelineOptions pipelineOptions; - public WorkItemKeySelector(Coder keyCoder) { + public WorkItemKeySelector(Coder keyCoder, SerializablePipelineOptions pipelineOptions) { this.keyCoder = keyCoder; + this.pipelineOptions = pipelineOptions; } @Override - public ByteBuffer getKey(WindowedValue> value) throws Exception { + public ByteBuffer getKey(WindowedValue> value) throws Exception { K key = value.getValue().key(); return FlinkKeyUtils.encodeKey(key, keyCoder); } @Override public TypeInformation getProducedType() { - return new GenericTypeInfo<>(ByteBuffer.class); + return new CoderTypeInformation<>(FlinkKeyUtils.ByteBufferCoder.of(), pipelineOptions.get()); } } diff --git a/runners/flink/1.9/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/BeamStoppableFunction.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/BeamStoppableFunction.java similarity index 100% rename from runners/flink/1.9/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/BeamStoppableFunction.java rename to runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/BeamStoppableFunction.java diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/state/FlinkStateInternals.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/state/FlinkStateInternals.java index f7b65a98e4c5..610a4568d52e 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/state/FlinkStateInternals.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/state/FlinkStateInternals.java @@ -24,6 +24,7 @@ import java.util.Map; import java.util.Objects; import java.util.Set; +import java.util.function.Function; import java.util.stream.Stream; import javax.annotation.Nonnull; import org.apache.beam.runners.core.StateInternals; @@ -72,7 +73,10 @@ import org.apache.flink.runtime.state.KeyedStateBackend; import org.apache.flink.runtime.state.VoidNamespace; import org.apache.flink.runtime.state.VoidNamespaceSerializer; +import org.checkerframework.checker.initialization.qual.Initialized; +import org.checkerframework.checker.nullness.qual.NonNull; import org.checkerframework.checker.nullness.qual.Nullable; +import org.checkerframework.checker.nullness.qual.UnknownKeyFor; import org.joda.time.Instant; /** @@ -155,8 +159,8 @@ public FlinkStateInternals( Coder keyCoder, SerializablePipelineOptions pipelineOptions) throws Exception { - this.flinkStateBackend = flinkStateBackend; - this.keyCoder = keyCoder; + this.flinkStateBackend = Objects.requireNonNull(flinkStateBackend); + this.keyCoder = Objects.requireNonNull(keyCoder); watermarkHoldStateDescriptor = new MapStateDescriptor<>( "watermark-holds", @@ -1033,15 +1037,32 @@ private static class FlinkMapState implements MapState get(final KeyT input) { - try { - return ReadableStates.immediate( - flinkStateBackend - .getPartitionedState( - namespace.stringKey(), StringSerializer.INSTANCE, flinkStateDescriptor) - .get(input)); - } catch (Exception e) { - throw new RuntimeException("Error get from state.", e); - } + return getOrDefault(input, null); + } + + @Override + public @UnknownKeyFor @NonNull @Initialized 
ReadableState getOrDefault( + KeyT key, @Nullable ValueT defaultValue) { + return new ReadableState() { + @Override + public @Nullable ValueT read() { + try { + ValueT value = + flinkStateBackend + .getPartitionedState( + namespace.stringKey(), StringSerializer.INSTANCE, flinkStateDescriptor) + .get(key); + return (value != null) ? value : defaultValue; + } catch (Exception e) { + throw new RuntimeException("Error get from state.", e); + } + } + + @Override + public @UnknownKeyFor @NonNull @Initialized ReadableState readLater() { + return this; + } + }; } @Override @@ -1057,7 +1078,8 @@ public void put(KeyT key, ValueT value) { } @Override - public ReadableState putIfAbsent(final KeyT key, final ValueT value) { + public ReadableState computeIfAbsent( + final KeyT key, Function mappingFunction) { try { ValueT current = flinkStateBackend @@ -1069,7 +1091,7 @@ public ReadableState putIfAbsent(final KeyT key, final ValueT value) { flinkStateBackend .getPartitionedState( namespace.stringKey(), StringSerializer.INSTANCE, flinkStateDescriptor) - .put(key, value); + .put(key, mappingFunction.apply(key)); } return ReadableStates.immediate(current); } catch (Exception e) { @@ -1161,6 +1183,25 @@ public ReadableState>> readLater() { }; } + @Override + public @UnknownKeyFor @NonNull @Initialized ReadableState< + @UnknownKeyFor @NonNull @Initialized Boolean> + isEmpty() { + ReadableState> keys = this.keys(); + return new ReadableState() { + @Override + public @Nullable Boolean read() { + return Iterables.isEmpty(keys.read()); + } + + @Override + public @UnknownKeyFor @NonNull @Initialized ReadableState readLater() { + keys.readLater(); + return this; + } + }; + } + @Override public void clear() { try { diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkExecutionEnvironmentsTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkExecutionEnvironmentsTest.java index c77c38abdaab..bafa1aa6f21f 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkExecutionEnvironmentsTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkExecutionEnvironmentsTest.java @@ -35,7 +35,6 @@ import org.apache.flink.configuration.Configuration; import org.apache.flink.configuration.RestOptions; import org.apache.flink.contrib.streaming.state.RocksDBStateBackend; -import org.apache.flink.runtime.jobgraph.SavepointRestoreSettings; import org.apache.flink.runtime.state.filesystem.FsStateBackend; import org.apache.flink.streaming.api.environment.LocalStreamEnvironment; import org.apache.flink.streaming.api.environment.RemoteStreamEnvironment; @@ -45,12 +44,8 @@ import org.junit.rules.ExpectedException; import org.junit.rules.TemporaryFolder; import org.powermock.reflect.Whitebox; -import org.powermock.reflect.exceptions.FieldNotFoundException; /** Tests for {@link FlinkExecutionEnvironments}. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FlinkExecutionEnvironmentsTest { @Rule public TemporaryFolder temporaryFolder = new TemporaryFolder(); @@ -527,30 +522,18 @@ public void shouldCreateRocksDbStateBackend() { } private void checkHostAndPort(Object env, String expectedHost, int expectedPort) { - try { - assertThat(Whitebox.getInternalState(env, "host"), is(expectedHost)); - assertThat(Whitebox.getInternalState(env, "port"), is(expectedPort)); - } catch (FieldNotFoundException t) { - // for flink 1.10+ - String host = - ((Configuration) Whitebox.getInternalState(env, "configuration")) - .getString(RestOptions.ADDRESS); - int port = - ((Configuration) Whitebox.getInternalState(env, "configuration")) - .getInteger(RestOptions.PORT); - assertThat( - new InetSocketAddress(host, port), is(new InetSocketAddress(expectedHost, expectedPort))); - } + String host = + ((Configuration) Whitebox.getInternalState(env, "configuration")) + .getString(RestOptions.ADDRESS); + int port = + ((Configuration) Whitebox.getInternalState(env, "configuration")) + .getInteger(RestOptions.PORT); + assertThat( + new InetSocketAddress(host, port), is(new InetSocketAddress(expectedHost, expectedPort))); } private String getSavepointPath(Object env) { - try { - return ((SavepointRestoreSettings) Whitebox.getInternalState(env, "savepointRestoreSettings")) - .getRestorePath(); - } catch (FieldNotFoundException t) { - // for flink 1.10+ - return ((Configuration) Whitebox.getInternalState(env, "configuration")) - .getString("execution.savepoint.path", null); - } + return ((Configuration) Whitebox.getInternalState(env, "configuration")) + .getString("execution.savepoint.path", null); } } diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkPipelineExecutionEnvironmentTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkPipelineExecutionEnvironmentTest.java index c45f3f0e87a7..8f3bf96e08c6 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkPipelineExecutionEnvironmentTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkPipelineExecutionEnvironmentTest.java @@ -72,13 +72,11 @@ import org.mockito.ArgumentCaptor; import org.mockito.Mockito; import org.powermock.reflect.Whitebox; -import org.powermock.reflect.exceptions.FieldNotFoundException; /** Tests for {@link FlinkPipelineExecutionEnvironment}. 
*/ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class FlinkPipelineExecutionEnvironmentTest implements Serializable { @@ -432,18 +430,13 @@ private static List convertFilesToURLs(List filePaths) { } private List getJars(Object env) throws Exception { - try { - return (List) Whitebox.getInternalState(env, "jarFiles"); - } catch (FieldNotFoundException t) { - // for flink 1.10+ - Configuration config = Whitebox.getInternalState(env, "configuration"); - Class accesorClass = Class.forName("org.apache.flink.client.cli.ExecutionConfigAccessor"); - Method fromConfigurationMethod = - accesorClass.getDeclaredMethod("fromConfiguration", Configuration.class); - Object accesor = fromConfigurationMethod.invoke(null, config); - - Method getJarsMethod = accesorClass.getDeclaredMethod("getJars"); - return (List) getJarsMethod.invoke(accesor); - } + Configuration config = Whitebox.getInternalState(env, "configuration"); + Class accesorClass = Class.forName("org.apache.flink.client.cli.ExecutionConfigAccessor"); + Method fromConfigurationMethod = + accesorClass.getDeclaredMethod("fromConfiguration", Configuration.class); + Object accesor = fromConfigurationMethod.invoke(null, config); + + Method getJarsMethod = accesorClass.getDeclaredMethod("getJars"); + return (List) getJarsMethod.invoke(accesor); } } diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkPipelineOptionsTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkPipelineOptionsTest.java index e233d5e39263..c2d9163aacc9 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkPipelineOptionsTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkPipelineOptionsTest.java @@ -53,9 +53,6 @@ /** * Tests for serialization and deserialization of {@link PipelineOptions} in {@link DoFnOperator}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FlinkPipelineOptionsTest { /** Pipeline options. */ diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkRequiresStableInputTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkRequiresStableInputTest.java index e83e1cfebe3b..b9c0f3436841 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkRequiresStableInputTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkRequiresStableInputTest.java @@ -55,9 +55,6 @@ import org.junit.rules.TemporaryFolder; /** Tests {@link DoFn.RequiresStableInput} with Flink. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FlinkRequiresStableInputTest { @ClassRule public static TemporaryFolder tempFolder = new TemporaryFolder(); diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkRunnerTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkRunnerTest.java index 34eb373405a3..c02a43275757 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkRunnerTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkRunnerTest.java @@ -32,9 +32,6 @@ import org.junit.Test; /** Test for {@link FlinkRunner}. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FlinkRunnerTest extends FlinkRunnerTestCompat { @Test diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkSavepointTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkSavepointTest.java index 6693d963f013..e973079a38af 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkSavepointTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkSavepointTest.java @@ -18,6 +18,7 @@ package org.apache.beam.runners.flink; import static org.hamcrest.MatcherAssert.assertThat; +import static org.junit.Assume.assumeFalse; import java.io.Serializable; import java.net.URI; @@ -61,6 +62,7 @@ import org.apache.flink.runtime.jobgraph.SavepointRestoreSettings; import org.apache.flink.runtime.minicluster.MiniCluster; import org.apache.flink.runtime.minicluster.MiniClusterConfiguration; +import org.apache.flink.runtime.util.EnvironmentInformation; import org.hamcrest.Matchers; import org.hamcrest.core.IsIterableContaining; import org.joda.time.Instant; @@ -82,7 +84,6 @@ */ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class FlinkSavepointTest implements Serializable { @@ -144,6 +145,8 @@ public void afterTest() throws Exception { @Test public void testSavepointRestoreLegacy() throws Exception { + // Don't run on Flink 1.11. https://issues.apache.org/jira/browse/BEAM-10955 + assumeFalse(EnvironmentInformation.getVersion().startsWith("1.11")); runSavepointAndRestore(false); } @@ -192,7 +195,7 @@ private void runSavepointAndRestore(boolean isPortablePipeline) throws Exception private JobID executeLegacy(Pipeline pipeline) throws Exception { JobGraph jobGraph = getJobGraph(pipeline); flinkCluster.submitJob(jobGraph).get(); - return jobGraph.getJobID(); + return waitForJobToBeReady(); } private JobID executePortable(Pipeline pipeline) throws Exception { diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkStreamingPipelineTranslatorTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkStreamingPipelineTranslatorTest.java index 1b6ad7e22433..422ad580a134 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkStreamingPipelineTranslatorTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkStreamingPipelineTranslatorTest.java @@ -28,12 +28,30 @@ import java.util.Map; import java.util.UUID; import org.apache.beam.runners.flink.FlinkStreamingPipelineTranslator.FlinkAutoBalancedShardKeyShardingFunction; +import org.apache.beam.sdk.Pipeline; +import org.apache.beam.sdk.coders.KvCoder; import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.coders.VarLongCoder; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.options.PipelineOptionsFactory; +import org.apache.beam.sdk.state.TimeDomain; +import org.apache.beam.sdk.state.TimerSpec; +import org.apache.beam.sdk.state.TimerSpecs; +import org.apache.beam.sdk.transforms.Count; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.GroupByKey; +import org.apache.beam.sdk.transforms.ParDo; import org.apache.beam.sdk.util.SerializableUtils; import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; import 
org.apache.beam.sdk.values.ShardedKey; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; +import org.apache.flink.runtime.jobgraph.JobGraph; import org.apache.flink.runtime.state.KeyGroupRangeAssignment; +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; +import org.junit.Assert; import org.junit.Test; /** Tests if overrides are properly applied. */ @@ -120,4 +138,95 @@ public void testAutoBalanceShardKeyCacheMaxSize() throws Exception { assertThat( fn.getCache().size(), equalTo(FlinkAutoBalancedShardKeyShardingFunction.CACHE_MAX_SIZE)); } + + @Test + public void testStatefulParDoAfterCombineChaining() { + final JobGraph stablePartitioning = getStatefulParDoAfterCombineChainingJobGraph(true); + final JobGraph unstablePartitioning = getStatefulParDoAfterCombineChainingJobGraph(false); + // We expect an extra shuffle stage for unstable partitioning. + Assert.assertEquals( + 1, + Iterables.size(unstablePartitioning.getVertices()) + - Iterables.size(stablePartitioning.getVertices())); + } + + private JobGraph getStatefulParDoAfterCombineChainingJobGraph(boolean stablePartitioning) { + final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); + final FlinkStreamingPipelineTranslator translator = + new FlinkStreamingPipelineTranslator(env, PipelineOptionsFactory.create()); + final PipelineOptions pipelineOptions = PipelineOptionsFactory.create(); + pipelineOptions.setRunner(FlinkRunner.class); + final Pipeline pipeline = Pipeline.create(pipelineOptions); + PCollection> aggregate = + pipeline + .apply(Create.of("foo", "bar").withCoder(StringUtf8Coder.of())) + .apply(Count.perElement()); + if (!stablePartitioning) { + // When we insert any element-wise "map" operation between aggregation and stateful ParDo, we + // can no longer assume that partitioning did not change, therefore we need an extra shuffle + aggregate = aggregate.apply(ParDo.of(new StatelessIdentityDoFn<>())); + } + aggregate.apply(ParDo.of(new StatefulNoopDoFn<>())); + translator.translate(pipeline); + return env.getStreamGraph().getJobGraph(); + } + + @Test + public void testStatefulParDoAfterGroupByKeyChaining() { + final JobGraph stablePartitioning = getStatefulParDoAfterGroupByKeyChainingJobGraph(true); + final JobGraph unstablePartitioning = getStatefulParDoAfterGroupByKeyChainingJobGraph(false); + // We expect an extra shuffle stage for unstable partitioning. 
+ Assert.assertEquals( + 1, + Iterables.size(unstablePartitioning.getVertices()) + - Iterables.size(stablePartitioning.getVertices())); + } + + private JobGraph getStatefulParDoAfterGroupByKeyChainingJobGraph(boolean stablePartitioning) { + final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); + final FlinkStreamingPipelineTranslator translator = + new FlinkStreamingPipelineTranslator(env, PipelineOptionsFactory.create()); + final PipelineOptions pipelineOptions = PipelineOptionsFactory.create(); + pipelineOptions.setRunner(FlinkRunner.class); + final Pipeline pipeline = Pipeline.create(pipelineOptions); + PCollection>> aggregate = + pipeline + .apply( + Create.of(KV.of("foo", 1L), KV.of("bar", 1L)) + .withCoder(KvCoder.of(StringUtf8Coder.of(), VarLongCoder.of()))) + .apply(GroupByKey.create()); + if (!stablePartitioning) { + // When we insert any element-wise "map" operation between aggregation and stateful ParDo, we + // can no longer assume that partitioning did not change, therefore we need an extra shuffle + aggregate = aggregate.apply(ParDo.of(new StatelessIdentityDoFn<>())); + } + aggregate.apply(ParDo.of(new StatefulNoopDoFn<>())); + translator.translate(pipeline); + return env.getStreamGraph().getJobGraph(); + } + + private static class StatelessIdentityDoFn + extends DoFn, KV> { + + @ProcessElement + public void processElement(ProcessContext ctx) { + ctx.output(ctx.element()); + } + } + + private static class StatefulNoopDoFn extends DoFn, Void> { + + @TimerId("my-timer") + private final TimerSpec myTimer = TimerSpecs.timer(TimeDomain.EVENT_TIME); + + @ProcessElement + public void processElement() { + // noop + } + + @OnTimer("my-timer") + public void onMyTimer() { + // noop + } + } } diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkStreamingTransformTranslatorsTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkStreamingTransformTranslatorsTest.java index 75388e940344..2cf6d6d13fe4 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkStreamingTransformTranslatorsTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkStreamingTransformTranslatorsTest.java @@ -42,6 +42,7 @@ import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.runners.AppliedPTransform; import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PValue; import org.apache.beam.sdk.values.PValues; @@ -55,7 +56,6 @@ /** Tests for Flink streaming transform translators. 
*/ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class FlinkStreamingTransformTranslatorsTest { @@ -170,6 +170,7 @@ private Object applyReadSourceTransform( Collections.emptyMap(), PValues.fullyExpand(outputs), transform, + ResourceHints.create(), Pipeline.create()); ctx.setCurrentTransform(appliedTransform); diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkSubmissionTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkSubmissionTest.java index 4197c38bb4b1..b29a88919661 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkSubmissionTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkSubmissionTest.java @@ -53,7 +53,6 @@ /** End-to-end submission test of Beam jobs on a Flink cluster. */ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class FlinkSubmissionTest { diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkTransformOverridesTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkTransformOverridesTest.java index 68b71c28219a..6c57096242aa 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkTransformOverridesTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkTransformOverridesTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.runners.flink; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.is; import static org.hamcrest.Matchers.not; -import static org.junit.Assert.assertThat; import java.util.Collections; import org.apache.beam.runners.flink.FlinkStreamingPipelineTranslator.StreamingShardedWriteFactory; @@ -37,6 +37,7 @@ import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.transforms.Create; import org.apache.beam.sdk.transforms.SerializableFunctions; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.transforms.windowing.PaneInfo; import org.apache.beam.sdk.values.PCollection; @@ -47,9 +48,6 @@ import org.junit.rules.TemporaryFolder; /** Tests if overrides are properly applied. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FlinkTransformOverridesTest { @Rule public transient TemporaryFolder tmpFolder = new TemporaryFolder(); @@ -72,7 +70,12 @@ public void testRunnerDeterminedSharding() { AppliedPTransform, WriteFilesResult, WriteFiles> originalApplication = AppliedPTransform.of( - "writefiles", PValues.expandInput(objs), Collections.emptyMap(), original, p); + "writefiles", + PValues.expandInput(objs), + Collections.emptyMap(), + original, + ResourceHints.create(), + p); WriteFiles replacement = (WriteFiles) diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/PortableExecutionTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/PortableExecutionTest.java index 7c605e48fbdc..00ca07c1b6f3 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/PortableExecutionTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/PortableExecutionTest.java @@ -61,9 +61,6 @@ * batch and streaming. 
*/ @RunWith(Parameterized.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PortableExecutionTest implements Serializable { private static final Logger LOG = LoggerFactory.getLogger(PortableExecutionTest.class); diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/PortableStateExecutionTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/PortableStateExecutionTest.java index 548a43e1e3d0..bff62b323db6 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/PortableStateExecutionTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/PortableStateExecutionTest.java @@ -58,9 +58,6 @@ * org.apache.beam.runners.flink.translation.wrappers.streaming.ExecutableStageDoFnOperator}. */ @RunWith(Parameterized.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PortableStateExecutionTest implements Serializable { private static final Logger LOG = LoggerFactory.getLogger(PortableStateExecutionTest.class); diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/PortableTimersExecutionTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/PortableTimersExecutionTest.java index 72f866fb70ab..18f6dd74cc6a 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/PortableTimersExecutionTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/PortableTimersExecutionTest.java @@ -71,9 +71,6 @@ * of a given timer is run. */ @RunWith(Parameterized.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PortableTimersExecutionTest implements Serializable { private static final Logger LOG = LoggerFactory.getLogger(PortableTimersExecutionTest.class); diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/ReadSourcePortableTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/ReadSourcePortableTest.java index 64f00e728c7f..84f41824fd48 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/ReadSourcePortableTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/ReadSourcePortableTest.java @@ -17,26 +17,44 @@ */ package org.apache.beam.runners.flink; +import static org.hamcrest.MatcherAssert.assertThat; +import static org.hamcrest.Matchers.empty; +import static org.hamcrest.Matchers.not; + import java.io.Serializable; import java.util.Collections; +import java.util.List; +import java.util.NoSuchElementException; import java.util.concurrent.Executors; import java.util.concurrent.TimeUnit; +import java.util.stream.Collectors; import org.apache.beam.model.jobmanagement.v1.JobApi.JobState; import org.apache.beam.model.pipeline.v1.RunnerApi; import org.apache.beam.runners.core.construction.Environments; +import org.apache.beam.runners.core.construction.PTransformTranslation; import org.apache.beam.runners.core.construction.PipelineTranslation; +import org.apache.beam.runners.core.construction.SplittableParDo; import org.apache.beam.runners.jobsubmission.JobInvocation; import org.apache.beam.sdk.Pipeline; -import org.apache.beam.sdk.io.GenerateSequence; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.SerializableCoder; +import org.apache.beam.sdk.io.Read; +import org.apache.beam.sdk.io.UnboundedSource; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.PipelineOptionsFactory; 
import org.apache.beam.sdk.options.PortablePipelineOptions; import org.apache.beam.sdk.testing.CrashingRunner; import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.transforms.windowing.BoundedWindow; +import org.apache.beam.sdk.transforms.windowing.FixedWindows; +import org.apache.beam.sdk.transforms.windowing.Window; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.ListeningExecutorService; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.MoreExecutors; +import org.checkerframework.checker.nullness.qual.Nullable; +import org.joda.time.Duration; +import org.joda.time.Instant; import org.junit.AfterClass; import org.junit.BeforeClass; import org.junit.Test; @@ -49,9 +67,6 @@ /** Tests that Read translation is supported in portable pipelines. */ @RunWith(Parameterized.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ReadSourcePortableTest implements Serializable { private static final Logger LOG = LoggerFactory.getLogger(ReadSourcePortableTest.class); @@ -84,7 +99,8 @@ public static void tearDown() throws InterruptedException { @Test(timeout = 120_000) public void testExecution() throws Exception { - PipelineOptions options = PipelineOptionsFactory.fromArgs("--experiments=beam_fn_api").create(); + PipelineOptions options = + PipelineOptionsFactory.fromArgs("--experiments=use_deprecated_read").create(); options.setRunner(CrashingRunner.class); options.as(FlinkPipelineOptions.class).setFlinkMaster("[local]"); options.as(FlinkPipelineOptions.class).setStreaming(isStreaming); @@ -93,12 +109,27 @@ public void testExecution() throws Exception { .as(PortablePipelineOptions.class) .setDefaultEnvironmentType(Environments.ENVIRONMENT_EMBEDDED); Pipeline p = Pipeline.create(options); - PCollection result = p.apply(GenerateSequence.from(0L).to(10L)); + PCollection result = + p.apply(Read.from(new Source(10))) + // FIXME: the test fails without this + .apply(Window.into(FixedWindows.of(Duration.millis(1)))); + PAssert.that(result) .containsInAnyOrder(ImmutableList.of(0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L)); + SplittableParDo.convertReadBasedSplittableDoFnsToPrimitiveReads(p); + RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(p); + List readTransforms = + pipelineProto.getComponents().getTransformsMap().values().stream() + .filter( + transform -> + transform.getSpec().getUrn().equals(PTransformTranslation.READ_TRANSFORM_URN)) + .collect(Collectors.toList()); + + assertThat(readTransforms, not(empty())); + // execute the pipeline JobInvocation jobInvocation = FlinkJobInvoker.create(null) @@ -112,7 +143,101 @@ public void testExecution() throws Exception { options.as(FlinkPipelineOptions.class), null, Collections.emptyList())); jobInvocation.start(); while (jobInvocation.getState() != JobState.Enum.DONE) { + assertThat(jobInvocation.getState(), not(JobState.Enum.FAILED)); Thread.sleep(100); } } + + private static class Source extends UnboundedSource { + + private final int count; + private final Instant now = Instant.now(); + + Source(int count) { + this.count = count; + } + + @Override + public List> split( + int desiredNumSplits, PipelineOptions options) { + + return Collections.singletonList(this); + } + + @Override + public UnboundedReader createReader( + PipelineOptions options, @Nullable Checkpoint 
checkpointMark) { + + return new UnboundedReader() { + int pos = -1; + + @Override + public boolean start() { + return advance(); + } + + @Override + public boolean advance() { + return ++pos < count; + } + + @Override + public Instant getWatermark() { + return pos < count + ? BoundedWindow.TIMESTAMP_MIN_VALUE + : BoundedWindow.TIMESTAMP_MAX_VALUE; + } + + @Override + public CheckpointMark getCheckpointMark() { + return new Checkpoint(pos); + } + + @Override + public UnboundedSource getCurrentSource() { + return Source.this; + } + + @Override + public Long getCurrent() throws NoSuchElementException { + return (long) pos; + } + + @Override + public Instant getCurrentTimestamp() throws NoSuchElementException { + return now; + } + + @Override + public void close() {} + }; + } + + @Override + public boolean requiresDeduping() { + return false; + } + + @Override + public Coder getOutputCoder() { + // use SerializableCoder to test custom java coders work + return SerializableCoder.of(Long.class); + } + + @Override + public Coder getCheckpointMarkCoder() { + return SerializableCoder.of(Checkpoint.class); + } + + private static class Checkpoint implements CheckpointMark, Serializable { + final int pos; + + Checkpoint(int pos) { + this.pos = pos; + } + + @Override + public void finalizeCheckpoint() {} + } + } } diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/ReadSourceStreamingTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/ReadSourceStreamingTest.java index 92d615a7f739..5f2434d7a25c 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/ReadSourceStreamingTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/ReadSourceStreamingTest.java @@ -30,9 +30,6 @@ import org.junit.Test; /** Reads from a bounded source in streaming. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ReadSourceStreamingTest extends AbstractTestBase { protected String resultDir; diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/ReadSourceTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/ReadSourceTest.java index 9a703d71b495..96d45ddcf1bb 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/ReadSourceTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/ReadSourceTest.java @@ -29,9 +29,6 @@ import org.apache.flink.test.util.JavaProgramTestBase; /** Reads from a bounded source in batch execution. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ReadSourceTest extends JavaProgramTestBase { protected String resultPath; diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/batch/NonMergingGroupByKeyTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/batch/NonMergingGroupByKeyTest.java index af3c3e11a398..a63480e78800 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/batch/NonMergingGroupByKeyTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/batch/NonMergingGroupByKeyTest.java @@ -19,7 +19,6 @@ import java.util.Arrays; import java.util.Objects; -import org.apache.beam.runners.flink.FlinkCapabilities; import org.apache.beam.runners.flink.FlinkPipelineOptions; import org.apache.beam.runners.flink.FlinkTestPipeline; import org.apache.beam.sdk.Pipeline; @@ -31,12 +30,8 @@ import org.apache.beam.sdk.values.KV; import org.apache.flink.test.util.AbstractTestBase; import org.junit.Assert; -import org.junit.Assume; import org.junit.Test; -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class NonMergingGroupByKeyTest extends AbstractTestBase { private static class ReiterateDoFn extends DoFn>, Void> { @@ -51,9 +46,6 @@ public void processElement(@Element KV> el) { @Test public void testDisabledReIterationThrowsAnException() { - // If output during closing is not supported, we can not chain DoFns and results - // are therefore materialized during output serialization. - Assume.assumeTrue(FlinkCapabilities.supportsOutputDuringClosing()); final Pipeline p = FlinkTestPipeline.createForBatch(); p.apply(Create.of(Arrays.asList(KV.of("a", 1), KV.of("b", 2), KV.of("c", 3)))) .apply(GroupByKey.create()) diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/batch/ReshuffleTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/batch/ReshuffleTest.java index cb2bb2e409b1..bf0d739bc472 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/batch/ReshuffleTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/batch/ReshuffleTest.java @@ -36,9 +36,6 @@ import org.junit.Assert; import org.junit.Test; -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ReshuffleTest { private static class WithBundleIdFn extends DoFn { diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/metrics/FlinkMetricContainerTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/metrics/FlinkMetricContainerTest.java index 9ca4fa5e6fb5..9f8046a81f06 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/metrics/FlinkMetricContainerTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/metrics/FlinkMetricContainerTest.java @@ -57,9 +57,6 @@ import org.mockito.MockitoAnnotations; /** Tests for {@link FlinkMetricContainer}. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FlinkMetricContainerTest { @Mock private RuntimeContext runtimeContext; diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/streaming/FlinkBroadcastStateInternalsTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/streaming/FlinkBroadcastStateInternalsTest.java index f83774e3b1de..10e20a6d47d3 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/streaming/FlinkBroadcastStateInternalsTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/streaming/FlinkBroadcastStateInternalsTest.java @@ -37,9 +37,6 @@ *

    Just test value, bag and combining. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FlinkBroadcastStateInternalsTest extends StateInternalsTest { @Override diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/streaming/FlinkStateInternalsTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/streaming/FlinkStateInternalsTest.java index 159530245e4f..39738209bb55 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/streaming/FlinkStateInternalsTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/streaming/FlinkStateInternalsTest.java @@ -59,7 +59,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class FlinkStateInternalsTest extends StateInternalsTest { diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/streaming/GroupByNullKeyTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/streaming/GroupByNullKeyTest.java index fc68544dd4b2..6d0890621585 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/streaming/GroupByNullKeyTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/streaming/GroupByNullKeyTest.java @@ -42,9 +42,6 @@ import org.junit.Test; /** Test for GroupByNullKey. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class GroupByNullKeyTest extends AbstractTestBase implements Serializable { protected String resultDir; diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/streaming/GroupByWithNullValuesTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/streaming/GroupByWithNullValuesTest.java index 9e071819e725..1797a0e40ac1 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/streaming/GroupByWithNullValuesTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/streaming/GroupByWithNullValuesTest.java @@ -38,9 +38,6 @@ import org.junit.Test; /** Tests grouping with null values. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class GroupByWithNullValuesTest implements Serializable { @Test diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/streaming/TopWikipediaSessionsTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/streaming/TopWikipediaSessionsTest.java index 4b97a37443dd..63abfa5b618b 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/streaming/TopWikipediaSessionsTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/streaming/TopWikipediaSessionsTest.java @@ -41,9 +41,6 @@ import org.junit.Test; /** Session window test. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TopWikipediaSessionsTest extends AbstractTestBase implements Serializable { protected String resultDir; diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/functions/FlinkDoFnFunctionTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/functions/FlinkDoFnFunctionTest.java index 580a037c3040..87486c96370d 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/functions/FlinkDoFnFunctionTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/functions/FlinkDoFnFunctionTest.java @@ -36,7 +36,6 @@ /** Tests for {@link FlinkDoFnFunction}. */ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class FlinkDoFnFunctionTest { diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/functions/FlinkExecutableStageFunctionTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/functions/FlinkExecutableStageFunctionTest.java index 67b772d66018..ce9c8a6248d4 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/functions/FlinkExecutableStageFunctionTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/functions/FlinkExecutableStageFunctionTest.java @@ -51,7 +51,7 @@ import org.apache.beam.sdk.transforms.join.RawUnionValue; import org.apache.beam.sdk.util.WindowedValue; import org.apache.beam.sdk.values.KV; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Struct; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Struct; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.flink.api.common.cache.DistributedCache; import org.apache.flink.api.common.functions.RuntimeContext; @@ -72,7 +72,6 @@ @RunWith(Parameterized.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class FlinkExecutableStageFunctionTest { diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/functions/FlinkStatefulDoFnFunctionTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/functions/FlinkStatefulDoFnFunctionTest.java index dd9ad2585c98..b74593156ee4 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/functions/FlinkStatefulDoFnFunctionTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/functions/FlinkStatefulDoFnFunctionTest.java @@ -36,7 +36,6 @@ /** Tests for {@link FlinkStatefulDoFnFunction}. 
*/ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class FlinkStatefulDoFnFunctionTest { diff --git a/runners/flink/1.8/src/test/java/org/apache/beam/runners/flink/translation/types/CoderTypeSerializerTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/types/CoderTypeSerializerTest.java similarity index 97% rename from runners/flink/1.8/src/test/java/org/apache/beam/runners/flink/translation/types/CoderTypeSerializerTest.java rename to runners/flink/src/test/java/org/apache/beam/runners/flink/translation/types/CoderTypeSerializerTest.java index 1647d762a5c0..d668bce6f86d 100644 --- a/runners/flink/1.8/src/test/java/org/apache/beam/runners/flink/translation/types/CoderTypeSerializerTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/types/CoderTypeSerializerTest.java @@ -36,7 +36,6 @@ /** Tests {@link CoderTypeSerializer}. */ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class CoderTypeSerializerTest implements Serializable { diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DedupingOperatorTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DedupingOperatorTest.java index dbc6ea6a076d..9576baa4c66b 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DedupingOperatorTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DedupingOperatorTest.java @@ -18,8 +18,8 @@ package org.apache.beam.runners.flink.translation.wrappers.streaming; import static org.apache.beam.runners.flink.translation.wrappers.streaming.StreamRecordStripper.stripStreamRecordFromWindowedValue; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.collection.IsIterableContainingInOrder.contains; -import static org.junit.Assert.assertThat; import java.nio.ByteBuffer; import java.nio.charset.StandardCharsets; diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperatorTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperatorTest.java index 71d64044294d..7ad761b594f4 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperatorTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperatorTest.java @@ -21,6 +21,7 @@ import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.emptyIterable; +import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.greaterThan; import static org.hamcrest.Matchers.instanceOf; import static org.hamcrest.Matchers.is; @@ -31,13 +32,23 @@ import com.fasterxml.jackson.databind.type.TypeFactory; import com.fasterxml.jackson.databind.util.LRUMap; import java.nio.ByteBuffer; +import java.util.ArrayList; +import java.util.Arrays; import java.util.Collections; import java.util.HashMap; import java.util.List; +import java.util.Objects; import java.util.Optional; import java.util.function.Supplier; import java.util.stream.Collectors; +import 
org.apache.beam.runners.core.DoFnRunner; +import org.apache.beam.runners.core.StateNamespace; +import org.apache.beam.runners.core.StateNamespaces; +import org.apache.beam.runners.core.StateTag; +import org.apache.beam.runners.core.StateTags; import org.apache.beam.runners.core.StatefulDoFnRunner; +import org.apache.beam.runners.core.StepContext; +import org.apache.beam.runners.core.TimerInternals; import org.apache.beam.runners.core.construction.SerializablePipelineOptions; import org.apache.beam.runners.flink.FlinkPipelineOptions; import org.apache.beam.runners.flink.metrics.FlinkMetricContainer; @@ -56,6 +67,7 @@ import org.apache.beam.sdk.state.TimerSpec; import org.apache.beam.sdk.state.TimerSpecs; import org.apache.beam.sdk.state.ValueState; +import org.apache.beam.sdk.state.WatermarkHoldState; import org.apache.beam.sdk.testing.PCollectionViewTesting; import org.apache.beam.sdk.transforms.Create; import org.apache.beam.sdk.transforms.DoFn; @@ -67,6 +79,7 @@ import org.apache.beam.sdk.transforms.windowing.GlobalWindow; import org.apache.beam.sdk.transforms.windowing.IntervalWindow; import org.apache.beam.sdk.transforms.windowing.PaneInfo; +import org.apache.beam.sdk.transforms.windowing.TimestampCombiner; import org.apache.beam.sdk.transforms.windowing.Window; import org.apache.beam.sdk.util.WindowedValue; import org.apache.beam.sdk.util.WindowedValue.FullWindowedValueCoder; @@ -101,11 +114,7 @@ /** Tests for {@link DoFnOperator}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "keyfor", - "nullness" -}) // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +@SuppressWarnings({"keyfor"}) public class DoFnOperatorTest { // views and windows for testing side inputs @@ -128,7 +137,6 @@ public void setUp() { } @Test - @SuppressWarnings("unchecked") public void testSingleOutput() throws Exception { Coder> coder = WindowedValue.getValueOnlyCoder(StringUtf8Coder.of()); @@ -178,15 +186,12 @@ public void testMultiOutputOutput() throws Exception { TupleTag mainOutput = new TupleTag<>("main-output"); TupleTag additionalOutput1 = new TupleTag<>("output-1"); TupleTag additionalOutput2 = new TupleTag<>("output-2"); - ImmutableMap, OutputTag> tagsToOutputTags = - ImmutableMap., OutputTag>builder() - .put( - additionalOutput1, - new OutputTag>(additionalOutput1.getId()) {}) - .put( - additionalOutput2, - new OutputTag>(additionalOutput2.getId()) {}) + ImmutableMap, OutputTag>> tagsToOutputTags = + ImmutableMap., OutputTag>>builder() + .put(additionalOutput1, new OutputTag>(additionalOutput1.getId()) {}) + .put(additionalOutput2, new OutputTag>(additionalOutput2.getId()) {}) .build(); + @SuppressWarnings("rawtypes") ImmutableMap, Coder>> tagsToCoders = ImmutableMap., Coder>>builder() .put(mainOutput, (Coder) coder) @@ -208,7 +213,7 @@ public void testMultiOutputOutput() throws Exception { Collections.emptyMap(), mainOutput, ImmutableList.of(additionalOutput1, additionalOutput2), - new DoFnOperator.MultiOutputOutputManagerFactory( + new DoFnOperator.MultiOutputOutputManagerFactory<>( mainOutput, tagsToOutputTags, tagsToCoders, @@ -258,8 +263,8 @@ public void testMultiOutputOutput() throws Exception { * timestamp {@code <= T} in the future. We have to make sure to take this into account when * firing timers. * - *

    This not test the timer API in general or processing-time timers because there are generic - * tests for this in {@code ParDoTest}. + *

    This does not test the timer API in general or processing-time timers because there are + * generic tests for this in {@code ParDoTest}. */ @Test public void testWatermarkContract() throws Exception { @@ -425,6 +430,159 @@ public void onProcessingTime(OnTimerContext context) { testHarness.close(); } + @Test + public void testWatermarkUpdateAfterWatermarkHoldRelease() throws Exception { + + Coder>> coder = + WindowedValue.getValueOnlyCoder(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of())); + + TupleTag> outputTag = new TupleTag<>("main-output"); + List emittedWatermarkHolds = new ArrayList<>(); + KeySelector>, ByteBuffer> keySelector = + e -> FlinkKeyUtils.encodeKey(e.getValue().getKey(), StringUtf8Coder.of()); + + DoFnOperator, KV> doFnOperator = + new DoFnOperator, KV>( + new IdentityDoFn<>(), + "stepName", + coder, + Collections.emptyMap(), + outputTag, + Collections.emptyList(), + new DoFnOperator.MultiOutputOutputManagerFactory<>( + outputTag, coder, new SerializablePipelineOptions(FlinkPipelineOptions.defaults())), + WindowingStrategy.globalDefault(), + new HashMap<>(), /* side-input mapping */ + Collections.emptyList(), /* side inputs */ + FlinkPipelineOptions.defaults(), + StringUtf8Coder.of(), + keySelector, + DoFnSchemaInformation.create(), + Collections.emptyMap()) { + + @Override + protected DoFnRunner, KV> createWrappingDoFnRunner( + DoFnRunner, KV> wrappedRunner, + StepContext stepContext) { + + StateNamespace namespace = + StateNamespaces.window(GlobalWindow.Coder.INSTANCE, GlobalWindow.INSTANCE); + StateTag holdTag = + StateTags.watermarkStateInternal("hold", TimestampCombiner.LATEST); + WatermarkHoldState holdState = stepContext.stateInternals().state(namespace, holdTag); + TimerInternals timerInternals = stepContext.timerInternals(); + + return new DoFnRunner, KV>() { + + @Override + public void startBundle() { + wrappedRunner.startBundle(); + } + + @Override + public void processElement(WindowedValue> elem) { + wrappedRunner.processElement(elem); + holdState.add(elem.getTimestamp()); + timerInternals.setTimer( + namespace, + "timer", + "family", + elem.getTimestamp().plus(1), + elem.getTimestamp().plus(1), + TimeDomain.EVENT_TIME); + timerInternals.setTimer( + namespace, + "cleanup", + "", + GlobalWindow.INSTANCE.maxTimestamp(), + GlobalWindow.INSTANCE.maxTimestamp(), + TimeDomain.EVENT_TIME); + } + + @Override + public void onTimer( + String timerId, + String timerFamilyId, + KeyT key, + BoundedWindow window, + Instant timestamp, + Instant outputTimestamp, + TimeDomain timeDomain) { + + if ("cleanup".equals(timerId)) { + holdState.clear(); + } else { + holdState.add(outputTimestamp); + } + } + + @Override + public void finishBundle() { + wrappedRunner.finishBundle(); + } + + @Override + public void onWindowExpiration( + BoundedWindow window, Instant timestamp, KeyT key) { + wrappedRunner.onWindowExpiration(window, timestamp, key); + } + + @Override + public DoFn, KV> getFn() { + return doFn; + } + }; + } + + @Override + void emitWatermarkIfHoldChanged(long currentWatermarkHold) { + emittedWatermarkHolds.add(keyedStateInternals.minWatermarkHoldMs()); + } + }; + + OneInputStreamOperatorTestHarness< + WindowedValue>, WindowedValue>> + testHarness = + new KeyedOneInputStreamOperatorTestHarness<>( + doFnOperator, + keySelector, + new CoderTypeInformation<>( + FlinkKeyUtils.ByteBufferCoder.of(), FlinkPipelineOptions.defaults())); + + testHarness.setup(); + + Instant now = Instant.now(); + + testHarness.open(); + + // process first element, set hold to `now', setup 
timer for `now + 1' + testHarness.processElement( + new StreamRecord<>( + WindowedValue.timestampedValueInGlobalWindow(KV.of("Key", "Hello"), now))); + + assertThat(emittedWatermarkHolds, is(equalTo(Collections.singletonList(now.getMillis())))); + + // fire timer, change hold to `now + 2' + testHarness.processWatermark(now.getMillis() + 2); + + assertThat( + emittedWatermarkHolds, is(equalTo(Arrays.asList(now.getMillis(), now.getMillis() + 1)))); + + // process second element, verify we emitted changed hold + testHarness.processElement( + new StreamRecord<>( + WindowedValue.timestampedValueInGlobalWindow(KV.of("Key", "Hello"), now.plus(2)))); + + assertThat( + emittedWatermarkHolds, + is(equalTo(Arrays.asList(now.getMillis(), now.getMillis() + 1, now.getMillis() + 2)))); + + testHarness.processWatermark(GlobalWindow.INSTANCE.maxTimestamp().plus(1).getMillis()); + testHarness.processWatermark(BoundedWindow.TIMESTAMP_MAX_VALUE.getMillis()); + + testHarness.close(); + } + @Test public void testLateDroppingForStatefulFn() throws Exception { @@ -537,7 +695,7 @@ public void testStateGCForStatefulFn() throws Exception { getHarness( windowingStrategy, offset, - (window) -> new Instant(window.maxTimestamp()), + (window) -> new Instant(Objects.requireNonNull(window).maxTimestamp()), timerOutput); testHarness.open(); @@ -652,7 +810,7 @@ public void testGCForGlobalWindow() throws Exception { private static KeyedOneInputStreamOperatorTestHarness< ByteBuffer, WindowedValue>, WindowedValue>> getHarness( - WindowingStrategy windowingStrategy, + WindowingStrategy windowingStrategy, int elementOffset, Function timerTimestamp, int timerOutput) @@ -676,7 +834,7 @@ public void processElement( @TimerId(timerId) Timer timer, @StateId(stateId) ValueState state, BoundedWindow window) { - timer.set(timerTimestamp.apply(window)); + timer.set(Objects.requireNonNull(timerTimestamp.apply(window))); state.write(context.element().getKey()); context.output( KV.of(context.element().getKey(), context.element().getValue() + elementOffset)); @@ -717,15 +875,11 @@ outputTag, coder, new SerializablePipelineOptions(FlinkPipelineOptions.defaults( DoFnSchemaInformation.create(), Collections.emptyMap()); - KeyedOneInputStreamOperatorTestHarness< - ByteBuffer, WindowedValue>, WindowedValue>> - testHarness = - new KeyedOneInputStreamOperatorTestHarness<>( - doFnOperator, - keySelector, - new CoderTypeInformation<>( - FlinkKeyUtils.ByteBufferCoder.of(), FlinkPipelineOptions.defaults())); - return testHarness; + return new KeyedOneInputStreamOperatorTestHarness<>( + doFnOperator, + keySelector, + new CoderTypeInformation<>( + FlinkKeyUtils.ByteBufferCoder.of(), FlinkPipelineOptions.defaults())); } @Test @@ -748,7 +902,6 @@ void testSideInputs(boolean keyed) throws Exception { ImmutableMap.>builder().put(1, view1).put(2, view2).build(); Coder keyCoder = StringUtf8Coder.of(); - ; KeySelector, ByteBuffer> keySelector = null; if (keyed) { keySelector = value -> FlinkKeyUtils.encodeKey(value.getValue(), keyCoder); @@ -873,7 +1026,8 @@ public void processElement( TupleTag> outputTag = new TupleTag<>("main-output"); StringUtf8Coder keyCoder = StringUtf8Coder.of(); - KvToByteBufferKeySelector keySelector = new KvToByteBufferKeySelector<>(keyCoder, null); + KvToByteBufferKeySelector keySelector = + new KvToByteBufferKeySelector<>(keyCoder, null); KvCoder coder = KvCoder.of(keyCoder, VarLongCoder.of()); FullWindowedValueCoder> kvCoder = @@ -1367,7 +1521,6 @@ OneInputStreamOperatorTestHarness, WindowedValue> creat } @Test - 
@SuppressWarnings("unchecked") public void testBundle() throws Exception { WindowedValue.ValueOnlyWindowedValueCoder windowedValueCoder = @@ -1388,7 +1541,7 @@ public void finishBundle(FinishBundleContext context) { }; DoFnOperator.MultiOutputOutputManagerFactory outputManagerFactory = - new DoFnOperator.MultiOutputOutputManagerFactory( + new DoFnOperator.MultiOutputOutputManagerFactory<>( outputTag, WindowedValue.getFullCoder(StringUtf8Coder.of(), GlobalWindow.Coder.INSTANCE), new SerializablePipelineOptions(options)); @@ -1505,11 +1658,10 @@ public void finishBundle(FinishBundleContext context) { } @Test - @SuppressWarnings("unchecked") public void testBundleKeyed() throws Exception { StringUtf8Coder keyCoder = StringUtf8Coder.of(); - KvToByteBufferKeySelector keySelector = + KvToByteBufferKeySelector keySelector = new KvToByteBufferKeySelector<>( keyCoder, new SerializablePipelineOptions(FlinkPipelineOptions.defaults())); KvCoder kvCoder = KvCoder.of(keyCoder, StringUtf8Coder.of()); @@ -1537,13 +1689,13 @@ public void finishBundle(FinishBundleContext context) { }; DoFnOperator.MultiOutputOutputManagerFactory outputManagerFactory = - new DoFnOperator.MultiOutputOutputManagerFactory( + new DoFnOperator.MultiOutputOutputManagerFactory<>( outputTag, WindowedValue.getFullCoder(kvCoder.getValueCoder(), GlobalWindow.Coder.INSTANCE), new SerializablePipelineOptions(options)); - DoFnOperator, KV> doFnOperator = - new DoFnOperator( + DoFnOperator, String> doFnOperator = + new DoFnOperator<>( doFn, "stepName", windowedValueCoder, @@ -1562,17 +1714,17 @@ public void finishBundle(FinishBundleContext context) { OneInputStreamOperatorTestHarness>, WindowedValue> testHarness = - new KeyedOneInputStreamOperatorTestHarness( + new KeyedOneInputStreamOperatorTestHarness<>( doFnOperator, keySelector, keySelector.getProducedType()); testHarness.open(); testHarness.processElement( - new StreamRecord(WindowedValue.valueInGlobalWindow(KV.of("key", "a")))); + new StreamRecord<>(WindowedValue.valueInGlobalWindow(KV.of("key", "a")))); testHarness.processElement( - new StreamRecord(WindowedValue.valueInGlobalWindow(KV.of("key", "b")))); + new StreamRecord<>(WindowedValue.valueInGlobalWindow(KV.of("key", "b")))); testHarness.processElement( - new StreamRecord(WindowedValue.valueInGlobalWindow(KV.of("key", "c")))); + new StreamRecord<>(WindowedValue.valueInGlobalWindow(KV.of("key", "c")))); assertThat( stripStreamRecordFromWindowedValue(testHarness.getOutput()), @@ -1597,7 +1749,7 @@ public void finishBundle(FinishBundleContext context) { testHarness.close(); doFnOperator = - new DoFnOperator( + new DoFnOperator<>( doFn, "stepName", windowedValueCoder, @@ -1615,7 +1767,7 @@ public void finishBundle(FinishBundleContext context) { Collections.emptyMap()); testHarness = - new KeyedOneInputStreamOperatorTestHarness( + new KeyedOneInputStreamOperatorTestHarness<>( doFnOperator, keySelector, keySelector.getProducedType()); // Restore snapshot @@ -1659,11 +1811,10 @@ public void testCheckpointBufferingWithMultipleBundles() throws Exception { WindowedValue.getFullCoder(StringUtf8Coder.of(), GlobalWindow.Coder.INSTANCE), new SerializablePipelineOptions(options)); - @SuppressWarnings("unchecked") Supplier> doFnOperatorSupplier = () -> new DoFnOperator<>( - new IdentityDoFn(), + new IdentityDoFn<>(), "stepName", windowedValueCoder, Collections.emptyMap(), @@ -1779,7 +1930,7 @@ public void finishBundle(FinishBundleContext context) { }; DoFnOperator.MultiOutputOutputManagerFactory outputManagerFactory = - new 
DoFnOperator.MultiOutputOutputManagerFactory( + new DoFnOperator.MultiOutputOutputManagerFactory<>( outputTag, WindowedValue.getFullCoder(StringUtf8Coder.of(), GlobalWindow.Coder.INSTANCE), new SerializablePipelineOptions(options)); @@ -1857,16 +2008,15 @@ public void finishBundle(FinishBundleContext context) { } @Test - @SuppressWarnings("unchecked") public void testExactlyOnceBufferingKeyed() throws Exception { FlinkPipelineOptions options = FlinkPipelineOptions.defaults(); options.setMaxBundleSize(2L); options.setCheckpointingInterval(1L); - TupleTag outputTag = new TupleTag<>("main-output"); + TupleTag> outputTag = new TupleTag<>("main-output"); StringUtf8Coder keyCoder = StringUtf8Coder.of(); - KvToByteBufferKeySelector keySelector = + KvToByteBufferKeySelector keySelector = new KvToByteBufferKeySelector<>(keyCoder, new SerializablePipelineOptions(options)); KvCoder kvCoder = KvCoder.of(keyCoder, StringUtf8Coder.of()); WindowedValue.ValueOnlyWindowedValueCoder> windowedValueCoder = @@ -1875,7 +2025,7 @@ public void testExactlyOnceBufferingKeyed() throws Exception { DoFn, KV> doFn = new DoFn, KV>() { @StartBundle - public void startBundle(StartBundleContext context) { + public void startBundle() { numStartBundleCalled++; } @@ -1895,15 +2045,15 @@ public void finishBundle(FinishBundleContext context) { } }; - DoFnOperator.MultiOutputOutputManagerFactory outputManagerFactory = - new DoFnOperator.MultiOutputOutputManagerFactory( + DoFnOperator.MultiOutputOutputManagerFactory> outputManagerFactory = + new DoFnOperator.MultiOutputOutputManagerFactory<>( outputTag, - WindowedValue.getFullCoder(StringUtf8Coder.of(), GlobalWindow.Coder.INSTANCE), + WindowedValue.getFullCoder(kvCoder, GlobalWindow.Coder.INSTANCE), new SerializablePipelineOptions(options)); Supplier, KV>> doFnOperatorSupplier = () -> - new DoFnOperator( + new DoFnOperator<>( doFn, "stepName", windowedValueCoder, @@ -1924,7 +2074,7 @@ public void finishBundle(FinishBundleContext context) { OneInputStreamOperatorTestHarness< WindowedValue>, WindowedValue>> testHarness = - new KeyedOneInputStreamOperatorTestHarness( + new KeyedOneInputStreamOperatorTestHarness<>( doFnOperator, keySelector, keySelector.getProducedType()); testHarness.open(); @@ -1955,7 +2105,7 @@ public void finishBundle(FinishBundleContext context) { doFnOperator = doFnOperatorSupplier.get(); testHarness = - new KeyedOneInputStreamOperatorTestHarness( + new KeyedOneInputStreamOperatorTestHarness<>( doFnOperator, keySelector, keySelector.getProducedType()); // restore from the snapshot @@ -1990,16 +2140,17 @@ public void finishBundle(FinishBundleContext context) { @Test(expected = IllegalStateException.class) public void testFailOnRequiresStableInputAndDisabledCheckpointing() { - TupleTag outputTag = new TupleTag<>("main-output"); + TupleTag> outputTag = new TupleTag<>("main-output"); StringUtf8Coder keyCoder = StringUtf8Coder.of(); - KvToByteBufferKeySelector keySelector = new KvToByteBufferKeySelector<>(keyCoder, null); + KvToByteBufferKeySelector keySelector = + new KvToByteBufferKeySelector<>(keyCoder, null); KvCoder kvCoder = KvCoder.of(keyCoder, StringUtf8Coder.of()); WindowedValue.ValueOnlyWindowedValueCoder> windowedValueCoder = WindowedValue.getValueOnlyCoder(kvCoder); - DoFn doFn = - new DoFn() { + DoFn, KV> doFn = + new DoFn, KV>() { @ProcessElement // Use RequiresStableInput to force buffering elements @RequiresStableInput @@ -2008,16 +2159,16 @@ public void processElement(ProcessContext context) { } }; - DoFnOperator.MultiOutputOutputManagerFactory 
outputManagerFactory = - new DoFnOperator.MultiOutputOutputManagerFactory( + FlinkPipelineOptions options = FlinkPipelineOptions.defaults(); + DoFnOperator.MultiOutputOutputManagerFactory> outputManagerFactory = + new DoFnOperator.MultiOutputOutputManagerFactory<>( outputTag, - WindowedValue.getFullCoder(StringUtf8Coder.of(), GlobalWindow.Coder.INSTANCE), - new SerializablePipelineOptions(FlinkPipelineOptions.defaults())); + WindowedValue.getFullCoder(kvCoder, GlobalWindow.Coder.INSTANCE), + new SerializablePipelineOptions(options)); - FlinkPipelineOptions options = FlinkPipelineOptions.defaults(); // should make the DoFnOperator creation fail options.setCheckpointingInterval(-1L); - new DoFnOperator( + new DoFnOperator<>( doFn, "stepName", windowedValueCoder, @@ -2048,15 +2199,14 @@ public void testBundleProcessingExceptionIsFatalDuringCheckpointing() throws Exc WindowedValue.getValueOnlyCoder(coder); DoFnOperator.MultiOutputOutputManagerFactory outputManagerFactory = - new DoFnOperator.MultiOutputOutputManagerFactory( + new DoFnOperator.MultiOutputOutputManagerFactory<>( outputTag, WindowedValue.getFullCoder(StringUtf8Coder.of(), GlobalWindow.Coder.INSTANCE), new SerializablePipelineOptions(options)); - @SuppressWarnings("unchecked") - DoFnOperator doFnOperator = + DoFnOperator doFnOperator = new DoFnOperator<>( - new IdentityDoFn() { + new IdentityDoFn() { @FinishBundle public void finishBundle() { throw new RuntimeException("something went wrong here"); @@ -2077,7 +2227,6 @@ public void finishBundle() { DoFnSchemaInformation.create(), Collections.emptyMap()); - @SuppressWarnings("unchecked") OneInputStreamOperatorTestHarness, WindowedValue> testHarness = new OneInputStreamOperatorTestHarness<>(doFnOperator); @@ -2094,7 +2243,7 @@ public void finishBundle() { @Test public void testAccumulatorRegistrationOnOperatorClose() throws Exception { - DoFnOperator doFnOperator = getOperatorForCleanupInspection(); + DoFnOperator doFnOperator = getOperatorForCleanupInspection(); OneInputStreamOperatorTestHarness, WindowedValue> testHarness = new OneInputStreamOperatorTestHarness<>(doFnOperator); @@ -2123,15 +2272,14 @@ public void testRemoveCachedClassReferences() throws Exception { OneInputStreamOperatorTestHarness, WindowedValue> testHarness = new OneInputStreamOperatorTestHarness<>(getOperatorForCleanupInspection()); - LRUMap typeCache = - (LRUMap) Whitebox.getInternalState(TypeFactory.defaultInstance(), "_typeCache"); + LRUMap typeCache = Whitebox.getInternalState(TypeFactory.defaultInstance(), "_typeCache"); assertThat(typeCache.size(), greaterThan(0)); testHarness.open(); testHarness.close(); assertThat(typeCache.size(), is(0)); } - private static DoFnOperator getOperatorForCleanupInspection() { + private static DoFnOperator getOperatorForCleanupInspection() { FlinkPipelineOptions options = FlinkPipelineOptions.defaults(); options.setParallelism(4); @@ -2148,7 +2296,7 @@ public void finishBundle(FinishBundleContext context) { }; DoFnOperator.MultiOutputOutputManagerFactory outputManagerFactory = - new DoFnOperator.MultiOutputOutputManagerFactory( + new DoFnOperator.MultiOutputOutputManagerFactory<>( outputTag, WindowedValue.getFullCoder(StringUtf8Coder.of(), GlobalWindow.Coder.INSTANCE), new SerializablePipelineOptions(options)); @@ -2188,8 +2336,8 @@ private Iterable> stripStreamRecord(Iterable input) { } private static class MultiOutputDoFn extends DoFn { - private TupleTag additionalOutput1; - private TupleTag additionalOutput2; + private final TupleTag additionalOutput1; + private final 
TupleTag additionalOutput2; public MultiOutputDoFn(TupleTag additionalOutput1, TupleTag additionalOutput2) { this.additionalOutput1 = additionalOutput1; @@ -2197,7 +2345,7 @@ public MultiOutputDoFn(TupleTag additionalOutput1, TupleTag addi } @ProcessElement - public void processElement(ProcessContext c) throws Exception { + public void processElement(ProcessContext c) { if ("one".equals(c.element())) { c.output(additionalOutput1, "extra: one"); } else if ("two".equals(c.element())) { @@ -2211,19 +2359,18 @@ public void processElement(ProcessContext c) throws Exception { } private static class IdentityDoFn extends DoFn { + @ProcessElement - public void processElement(ProcessContext c) throws Exception { + public void processElement(ProcessContext c) { c.output(c.element()); } } - @SuppressWarnings({"unchecked", "rawtypes"}) private WindowedValue> valuesInWindow( Iterable values, Instant timestamp, BoundedWindow window) { - return (WindowedValue) WindowedValue.of(values, timestamp, window, PaneInfo.NO_FIRING); + return WindowedValue.of(values, timestamp, window, PaneInfo.NO_FIRING); } - @SuppressWarnings({"unchecked", "rawtypes"}) private WindowedValue valueInWindow(T value, Instant timestamp, BoundedWindow window) { return WindowedValue.of(value, timestamp, window, PaneInfo.NO_FIRING); } diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperatorTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperatorTest.java index 336b414dcd3b..1cee673ce6c2 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperatorTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperatorTest.java @@ -80,6 +80,7 @@ import org.apache.beam.runners.fnexecution.provisioning.JobInfo; import org.apache.beam.runners.fnexecution.state.StateRequestHandler; import org.apache.beam.runners.fnexecution.state.StateRequestHandlers; +import org.apache.beam.runners.fnexecution.wire.ByteStringCoder; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.KvCoder; import org.apache.beam.sdk.coders.StringUtf8Coder; @@ -99,13 +100,12 @@ import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.sdk.values.WindowingStrategy; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Struct; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Struct; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; -import org.apache.beam.vendor.sdk.v2.sdk.extensions.protobuf.ByteStringCoder; import org.apache.flink.api.common.cache.DistributedCache; import org.apache.flink.api.common.functions.RuntimeContext; import org.apache.flink.api.common.typeinfo.TypeInformation; @@ -133,7 +133,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) 
}) public class ExecutableStageDoFnOperatorTest { @Rule public ExpectedException thrown = ExpectedException.none(); @@ -510,8 +509,8 @@ public void testWatermarkHandling() throws Exception { timestamp, PaneInfo.NO_FIRING), TimerInternals.TimerData.of( - TimerReceiverFactory.encodeToTimerDataTimerId("transform", timerId), "", + TimerReceiverFactory.encodeToTimerDataTimerId("transform", timerId), StateNamespaces.window(IntervalWindow.getCoder(), intervalWindow), timestamp, timestamp, @@ -799,6 +798,7 @@ private void testEnsureDeferredStateCleanupTimerFiring(boolean withCheckpointing // user timer that fires after the end of the window and after state cleanup TimerInternals.TimerData userTimer = TimerInternals.TimerData.of( + "", TimerReceiverFactory.encodeToTimerDataTimerId( timerInputKey.getKey(), timerInputKey.getValue()), stateNamespace, @@ -834,6 +834,7 @@ private void testEnsureDeferredStateCleanupTimerFiring(boolean withCheckpointing // Cleanup timer are rescheduled if a new timer is created during the bundle TimerInternals.TimerData userTimer2 = TimerInternals.TimerData.of( + "", TimerReceiverFactory.encodeToTimerDataTimerId( timerInputKey.getKey(), timerInputKey.getValue()), stateNamespace, diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/FlinkKeyUtilsTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/FlinkKeyUtilsTest.java index 817b5e834b16..26ad64d57087 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/FlinkKeyUtilsTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/FlinkKeyUtilsTest.java @@ -26,7 +26,7 @@ import org.apache.beam.sdk.coders.StringUtf8Coder; import org.apache.beam.sdk.coders.VoidCoder; import org.apache.beam.sdk.util.CoderUtils; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets; import org.junit.Test; diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/StreamRecordStripper.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/StreamRecordStripper.java index 8c9fc321c7b9..4f3a5f64f012 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/StreamRecordStripper.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/StreamRecordStripper.java @@ -23,9 +23,6 @@ import org.apache.flink.streaming.runtime.streamrecord.StreamRecord; import org.checkerframework.checker.nullness.qual.Nullable; -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) class StreamRecordStripper { @SuppressWarnings("Guava") diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/WindowDoFnOperatorTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/WindowDoFnOperatorTest.java index 794fa5e9ac66..ab45072ab580 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/WindowDoFnOperatorTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/WindowDoFnOperatorTest.java @@ -66,7 +66,6 @@ @RunWith(JUnit4.class) 
@SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class WindowDoFnOperatorTest { @@ -204,7 +203,7 @@ private WindowDoFnOperator getWindowDoFnOperator() { Coder windowCoder = windowingStrategy.getWindowFn().windowCoder(); SingletonKeyedWorkItemCoder workItemCoder = SingletonKeyedWorkItemCoder.of(VarLongCoder.of(), VarLongCoder.of(), windowCoder); - FullWindowedValueCoder> inputCoder = + FullWindowedValueCoder> inputCoder = WindowedValue.getFullCoder(workItemCoder, windowCoder); FullWindowedValueCoder> outputCoder = WindowedValue.getFullCoder(KvCoder.of(VarLongCoder.of(), VarLongCoder.of()), windowCoder); @@ -224,7 +223,8 @@ private WindowDoFnOperator getWindowDoFnOperator() { emptyList(), FlinkPipelineOptions.defaults(), VarLongCoder.of(), - new WorkItemKeySelector(VarLongCoder.of())); + new WorkItemKeySelector( + VarLongCoder.of(), new SerializablePipelineOptions(FlinkPipelineOptions.defaults()))); } private KeyedOneInputStreamOperatorTestHarness< diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/TestCountingSource.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/TestCountingSource.java index bf26285b26ce..92bf298f9be9 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/TestCountingSource.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/TestCountingSource.java @@ -44,9 +44,6 @@ * The reader will occasionally return false from {@code advance}, in order to simulate a source * where not all the data is available immediately. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TestCountingSource extends UnboundedSource, TestCountingSource.CounterMark> { private static final Logger LOG = LoggerFactory.getLogger(TestCountingSource.class); diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/UnboundedSourceWrapperTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/UnboundedSourceWrapperTest.java index f56ae97109c8..eb2f34fbb156 100644 --- a/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/UnboundedSourceWrapperTest.java +++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/UnboundedSourceWrapperTest.java @@ -89,7 +89,6 @@ @RunWith(Enclosed.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class UnboundedSourceWrapperTest { diff --git a/runners/google-cloud-dataflow-java/build.gradle b/runners/google-cloud-dataflow-java/build.gradle index 197eff83aa43..be0ade920f2d 100644 --- a/runners/google-cloud-dataflow-java/build.gradle +++ b/runners/google-cloud-dataflow-java/build.gradle @@ -45,7 +45,9 @@ processResources { filter org.apache.tools.ant.filters.ReplaceTokens, tokens: [ 'dataflow.legacy_environment_major_version' : '8', 'dataflow.fnapi_environment_major_version' : '8', - 'dataflow.container_version' : 'beam-master-20200629' + 'dataflow.legacy_container_version' : 'beam-master-20210913', + 'dataflow.fnapi_container_version' : 'beam-master-20210913', + 'dataflow.container_base_repository' : 'gcr.io/cloud-dataflow/v1beta3', ] } @@ -67,29 +69,36 @@ configurations { dependencies { compile enforcedPlatform(library.java.google_cloud_platform_libraries_bom) + permitUnusedDeclared enforcedPlatform(library.java.google_cloud_platform_libraries_bom) compile library.java.vendored_guava_26_0_jre compile project(path: ":model:pipeline", configuration: "shadow") compile project(path: ":sdks:java:core", configuration: "shadow") compile project(":sdks:java:extensions:google-cloud-platform-core") + compile project(":sdks:java:io:kafka") compile project(":sdks:java:io:google-cloud-platform") compile project(":runners:core-construction-java") compile library.java.avro compile library.java.bigdataoss_util + compile library.java.commons_codec + compile library.java.flogger_system_backend // Avoids conflicts with bigdataoss_util (BEAM-11010) + permitUnusedDeclared library.java.flogger_system_backend compile library.java.google_api_client compile library.java.google_api_services_clouddebugger compile library.java.google_api_services_dataflow compile library.java.google_api_services_storage + permitUnusedDeclared library.java.google_api_services_storage // BEAM-11761 compile library.java.google_auth_library_credentials compile library.java.google_auth_library_oauth2_http compile library.java.google_http_client compile library.java.google_http_client_jackson2 + permitUnusedDeclared library.java.google_http_client_jackson2 // BEAM-11761 + compile library.java.hamcrest compile library.java.jackson_annotations compile library.java.jackson_core compile library.java.jackson_databind compile library.java.joda_time compile library.java.slf4j_api - compile library.java.vendored_grpc_1_26_0 - testCompile library.java.hamcrest_core + compile library.java.vendored_grpc_1_36_0 
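The Flink test diffs above move `Struct` and `ByteString` from the `v1p26p0` to the `v1p36p0` vendored namespace, matching the `vendored_grpc_1_36_0` dependency declared here. A minimal sketch of code compiled against the new namespace follows; the class name is a placeholder, and only the import prefix differs from the old vendored package.

```java
// Previously: import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString;
import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString;

public class VendoredProtobufSketch {
  public static void main(String[] args) {
    // Same protobuf API, only the vendored package prefix changed with the gRPC bump.
    ByteString key = ByteString.copyFromUtf8("key");
    System.out.println(key.size()); // prints 3
  }
}
```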
testCompile library.java.guava_testlib testCompile library.java.junit testCompile project(path: ":sdks:java:io:google-cloud-platform", configuration: "testRuntime") @@ -99,8 +108,6 @@ dependencies { testCompile library.java.google_cloud_dataflow_java_proto_library_all testCompile library.java.jackson_dataformat_yaml testCompile library.java.mockito_core - testCompile library.java.proto_google_cloud_datastore_v1 - testCompile library.java.slf4j_jdk14 validatesRunner project(path: ":sdks:java:core", configuration: "shadowTest") validatesRunner project(project.path) validatesRunner project(path: project.path, configuration: "testRuntime") @@ -143,18 +150,18 @@ def runnerV2PipelineOptions = [ "--project=${dataflowProject}", "--region=${dataflowRegion}", "--tempRoot=${dataflowValidatesTempRoot}", - "--workerHarnessContainerImage=${dockerImageContainer}:${dockerTag}", - "--experiments=beam_fn_api,use_unified_worker,use_runner_v2", + "--sdkContainerImage=${dockerImageContainer}:${dockerTag}", + // TODO(BEAM-11779) remove shuffle_mode=appliance with runner v2 once issue is resolved. + "--experiments=use_unified_worker,use_runner_v2,shuffle_mode=appliance", ] def commonLegacyExcludeCategories = [ 'org.apache.beam.sdk.testing.LargeKeys$Above10MB', 'org.apache.beam.sdk.testing.UsesAttemptedMetrics', 'org.apache.beam.sdk.testing.UsesCrossLanguageTransforms', + 'org.apache.beam.sdk.testing.UsesPythonExpansionService', 'org.apache.beam.sdk.testing.UsesDistributionMetrics', 'org.apache.beam.sdk.testing.UsesGaugeMetrics', - 'org.apache.beam.sdk.testing.UsesSetState', - 'org.apache.beam.sdk.testing.UsesMapState', 'org.apache.beam.sdk.testing.UsesSplittableParDoWithWindowedSideInputs', 'org.apache.beam.sdk.testing.UsesTestStream', 'org.apache.beam.sdk.testing.UsesParDoLifecycle', @@ -163,7 +170,6 @@ def commonLegacyExcludeCategories = [ ] def commonRunnerV2ExcludeCategories = [ - 'org.apache.beam.sdk.testing.UsesCommittedMetrics', 'org.apache.beam.sdk.testing.UsesGaugeMetrics', 'org.apache.beam.sdk.testing.UsesSetState', 'org.apache.beam.sdk.testing.UsesMapState', @@ -250,7 +256,6 @@ def createRunnerV2ValidatesRunnerTest = { Map args -> task buildAndPushDockerContainer() { def javaVer = project.hasProperty('compileAndRunTestsWithJava11') ? 
"java11" : "java8" dependsOn ":sdks:java:container:${javaVer}:docker" - finalizedBy 'cleanUpDockerImages' def defaultDockerImageName = containerImageName( name: "${project.docker_image_default_repo_prefix}${javaVer}_sdk", root: "apache", @@ -260,7 +265,7 @@ task buildAndPushDockerContainer() { commandLine "docker", "tag", "${defaultDockerImageName}", "${dockerImageName}" } exec { - commandLine "gcloud", "docker", "--", "push", "${dockerImageContainer}" + commandLine "gcloud", "docker", "--", "push", "${dockerImageName}" } } } @@ -272,7 +277,10 @@ task cleanUpDockerImages() { commandLine "docker", "rmi", "--force", "${dockerImageName}" } exec { - commandLine "gcloud", "--quiet", "container", "images", "delete", "--force-delete-tags", "${dockerImageName}" + commandLine "gcloud", "--quiet", "container", "images", "untag", "${dockerImageName}" + } + exec { + commandLine "./scripts/cleanup_untagged_gcr_images.sh", "${dockerImageContainer}" } } } @@ -310,12 +318,11 @@ task validatesRunnerStreaming { name: 'validatesRunnerLegacyWorkerTestStreaming', pipelineOptions: legacyPipelineOptions + ['--streaming=true'], excludedCategories: [ + 'org.apache.beam.sdk.testing.UsesCommittedMetrics', + 'org.apache.beam.sdk.testing.UsesMapState', 'org.apache.beam.sdk.testing.UsesRequiresTimeSortedInput', - 'org.apache.beam.sdk.testing.UsesStrictTimerOrdering' + 'org.apache.beam.sdk.testing.UsesSetState', ], - excludedTests: [ - 'org.apache.beam.sdk.metrics.MetricsTest$CommittedMetricTests' - ] )) } @@ -335,6 +342,8 @@ createCrossLanguageValidatesRunnerTask( "--region=${dataflowRegion}", "--sdk_harness_container_image_overrides=.*java.*,${dockerImageContainer}:${dockerTag}", "--experiments=use_runner_v2", + // TODO(BEAM-11779) remove shuffle_mode=appliance with runner v2 once issue is resolved + "--experiments=shuffle_mode=appliance", ], javaPipelineOptions: [ "--runner=TestDataflowRunner", @@ -342,14 +351,14 @@ createCrossLanguageValidatesRunnerTask( "--region=${dataflowRegion}", "--tempRoot=${dataflowValidatesTempRoot}", "--sdkHarnessContainerImageOverrides=.*java.*,${dockerImageContainer}:${dockerTag}", - "--experiments=use_runner_v2", + // TODO(BEAM-11779) remove shuffle_mode=appliance with runner v2 once issue is resolved. 
+ "--experiments=use_runner_v2,shuffle_mode=appliance", ], - nosetestsOptions: [ - "--nocapture", - "--processes=8", - "--process-timeout=4500", - // TODO: enable after fixing BEAM-10507 - "--exclude=test_xlang_parquetio_write|test_flatten", + pytestOptions: [ + "--capture=no", + "--numprocesses=8", + "--timeout=4500", + "--log-cli-level=INFO", ] ) @@ -359,6 +368,9 @@ task validatesRunnerV2 { dependsOn(createRunnerV2ValidatesRunnerTest( name: 'validatesRunnerV2Test', excludedCategories: [ + 'org.apache.beam.sdk.testing.UsesOnWindowExpiration', + 'org.apache.beam.sdk.testing.UsesStatefulParDo', + 'org.apache.beam.sdk.testing.UsesTimersInParDo', 'org.apache.beam.sdk.testing.UsesUnboundedPCollections', 'org.apache.beam.sdk.testing.UsesUnboundedSplittableParDo', ], @@ -366,23 +378,10 @@ task validatesRunnerV2 { 'org.apache.beam.sdk.transforms.windowing.WindowingTest.testNonPartitioningWindowing', 'org.apache.beam.sdk.transforms.windowing.WindowTest.testMergingCustomWindowsKeyedCollection', 'org.apache.beam.sdk.transforms.windowing.WindowTest.testMergingCustomWindows', + 'org.apache.beam.sdk.transforms.windowing.WindowTest.testMergingCustomWindowsWithoutCustomWindowTypes', 'org.apache.beam.sdk.transforms.ReshuffleTest.testReshuffleWithTimestampsStreaming', - 'org.apache.beam.sdk.transforms.ParDoTest$OnWindowExpirationTests.testOnWindowExpirationSimpleBounded', - 'org.apache.beam.sdk.transforms.ParDoTest$OnWindowExpirationTests.testOnWindowExpirationSimpleUnbounded', - 'org.apache.beam.sdk.transforms.ParDoTest$StateTests.testValueStateTaggedOutput', - 'org.apache.beam.sdk.transforms.ParDoTest$TimerFamilyTests.testTimerFamilyProcessingTime', - 'org.apache.beam.sdk.transforms.ParDoTest$TimerTests.duplicateTimerSetting', - 'org.apache.beam.sdk.transforms.ParDoTest$TimerTests.testEventTimeTimerCanBeReset', - 'org.apache.beam.sdk.transforms.ParDoTest$TimerTests.testEventTimeTimerOrdering', - 'org.apache.beam.sdk.transforms.ParDoTest$TimerTests.testEventTimeTimerOrderingWithCreate', - 'org.apache.beam.sdk.transforms.ParDoTest$TimerTests.testOutputTimestamp', - 'org.apache.beam.sdk.transforms.ParDoTest$TimerTests.testOutputTimestampWithProcessingTime', - 'org.apache.beam.sdk.transforms.ParDoTest$TimerTests.testProcessingTimeTimerCanBeReset', - 'org.apache.beam.sdk.transforms.ParDoTest$TimerTests.testTwoTimersSettingEachOther', - 'org.apache.beam.sdk.transforms.ParDoTest$TimerTests.testTwoTimersSettingEachOtherWithCreateAsInput', - 'org.apache.beam.sdk.transforms.ParDoLifecycleTest.testFnCallSequenceStateful', 'org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInFinishBundle', 'org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInFinishBundleStateful', @@ -396,6 +395,8 @@ task validatesRunnerV2 { 'org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testCombiningAccumulatingProcessingTime', 'org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testLargeKeys100MB', 'org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testLargeKeys10MB', + // TODO(BEAM-12353): Identify whether it's bug or a feature gap. 
+ 'org.apache.beam.sdk.transforms.GroupByKeyTest$WindowTests.testRewindowWithTimestampCombiner', 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2', ] @@ -409,13 +410,15 @@ task validatesRunnerV2Streaming { name: 'validatesRunnerV2TestStreaming', pipelineOptions: runnerV2PipelineOptions + ['--streaming=true'], excludedCategories: [ - 'org.apache.beam.sdk.testing.UsesBoundedSplittableParDo', 'org.apache.beam.sdk.testing.LargeKeys$Above10KB', + 'org.apache.beam.sdk.testing.UsesBoundedSplittableParDo', + 'org.apache.beam.sdk.testing.UsesCommittedMetrics', + 'org.apache.beam.sdk.testing.UsesTestStream', ], excludedTests: [ - 'org.apache.beam.sdk.transforms.ParDoTest$TimerTests.testEventTimeTimerAlignBounded', 'org.apache.beam.sdk.transforms.windowing.WindowTest.testMergingCustomWindows', 'org.apache.beam.sdk.transforms.windowing.WindowTest.testMergingCustomWindowsKeyedCollection', + 'org.apache.beam.sdk.transforms.windowing.WindowTest.testMergingCustomWindowsWithoutCustomWindowTypes', 'org.apache.beam.examples.complete.TopWikipediaSessionsTest.testComputeTopUsers', 'org.apache.beam.sdk.transforms.windowing.WindowingTest.testNonPartitioningWindowing', @@ -424,11 +427,18 @@ task validatesRunnerV2Streaming { 'org.apache.beam.sdk.extensions.sql.BeamSqlDslAggregationTest.testTriggeredTumble', 'org.apache.beam.sdk.transforms.ReshuffleTest.testReshuffleWithTimestampsStreaming', - 'org.apache.beam.sdk.testing.TestStreamTest.testProcessingTimeTrigger', - 'org.apache.beam.sdk.transforms.ParDoTest$OnWindowExpirationTests.testOnWindowExpirationSimpleBounded', 'org.apache.beam.sdk.transforms.ParDoTest$OnWindowExpirationTests.testOnWindowExpirationSimpleUnbounded', + // TODO(BEAM-11858) reading a side input twice fails + 'org.apache.beam.sdk.transforms.ParDoTest$MultipleInputsAndOutputTests.testSameSideInputReadTwice', + 'org.apache.beam.sdk.transforms.CombineFnsTest.testComposedCombineWithContext', + 'org.apache.beam.sdk.transforms.CombineTest$CombineWithContextTests.testSimpleCombineWithContextEmpty', + 'org.apache.beam.sdk.transforms.CombineTest$CombineWithContextTests.testSimpleCombineWithContext', + 'org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testFixedWindowsCombineWithContext', + 'org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testSessionsCombineWithContext', + 'org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testSlidingWindowsCombineWithContext', + 'org.apache.beam.runners.dataflow.DataflowRunnerTest.testBatchGroupIntoBatchesOverride', 'org.apache.beam.sdk.transforms.GroupByKeyTest.testCombiningAccumulatingProcessingTime', @@ -447,7 +457,6 @@ task validatesRunnerV2Streaming { 'org.apache.beam.sdk.metrics.MetricsTest$CommittedMetricTests.testCommittedCounterMetrics', - 'org.apache.beam.sdk.testing.TestStreamTest.testMultipleStreams', 'org.apache.beam.sdk.transforms.WaitTest.testWaitWithSameFixedWindows', 'org.apache.beam.sdk.transforms.WaitTest.testWaitWithDifferentFixedWindows', 'org.apache.beam.sdk.transforms.WaitTest.testWaitWithSignalInSlidingWindows', @@ -465,7 +474,15 @@ task validatesRunnerV2Streaming { 'org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle', 'org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful', + // TODO(BEAM-11917) Empty flatten fails in streaming + "org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput", + "org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo", + 
"org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty", + 'org.apache.beam.sdk.io.CountingSourceTest.testBoundedSourceSplits', + + // TODO(BEAM-12353): Identify whether it's bug or a feature gap. + 'org.apache.beam.sdk.transforms.GroupByKeyTest$WindowTests.testRewindowWithTimestampCombiner', ] )) } @@ -530,18 +547,13 @@ task googleCloudPlatformRunnerV2IntegrationTest(type: Test) { systemProperty "beamTestPipelineOptions", JsonOutput.toJson(runnerV2PipelineOptions) include '**/*IT.class' - exclude '**/BigQueryIOReadIT.class' - exclude '**/BigQueryIOStorageQueryIT.class' - exclude '**/BigQueryIOStorageReadIT.class' exclude '**/BigQueryIOStorageReadTableRowIT.class' - exclude '**/PubsubReadIT.class' - exclude '**/SpannerReadIT.class' - exclude '**/BigtableReadIT.class' - exclude '**/V1ReadIT.class' exclude '**/SpannerWriteIT.class' - exclude '**/BigQueryNestedRecordsIT.class' - exclude '**/SplitQueryFnIT.class' exclude '**/*KmsKeyIT.class' + exclude '**/FhirIOReadIT.class' + exclude '**/FhirIOWriteIT.class' + exclude '**/FhirIOLROIT.class' + exclude '**/FhirIOSearchIT.class' maxParallelForks 4 classpath = configurations.googleCloudPlatformIntegrationTest @@ -671,6 +683,15 @@ createJavaExamplesArchetypeValidationTask(type: 'MobileGaming', bqDataset: bqDataset, pubsubTopic: pubsubTopic) +// Generates :runners:google-cloud-dataflow-java:runMobileGamingJavaDataflowBom +createJavaExamplesArchetypeValidationTask(type: 'MobileGaming', + runner: 'DataflowBom', + gcpProject: gcpProject, + gcpRegion: gcpRegion, + gcsBucket: gcsBucket, + bqDataset: bqDataset, + pubsubTopic: pubsubTopic) + // Standalone task for testing GCS upload, use with -PfilesToStage and -PdataflowTempRoot. task GCSUpload(type: JavaExec) { main = 'org.apache.beam.runners.dataflow.util.GCSUploadMain' diff --git a/runners/google-cloud-dataflow-java/examples-streaming/build.gradle b/runners/google-cloud-dataflow-java/examples-streaming/build.gradle index fd8705d648ce..bc5f584d606c 100644 --- a/runners/google-cloud-dataflow-java/examples-streaming/build.gradle +++ b/runners/google-cloud-dataflow-java/examples-streaming/build.gradle @@ -40,8 +40,8 @@ task windmillPreCommit(type: Test) { dependsOn ":runners:google-cloud-dataflow-java:worker:legacy-worker:shadowJar" def dataflowWorkerJar = project.findProperty('dataflowWorkerJar') ?: project(":runners:google-cloud-dataflow-java:worker:legacy-worker").shadowJar.archivePath - // Set workerHarnessContainerImage to empty to make Dataflow pick up the non-versioned container - // image, which handles a staged worker jar. + // Set workerHarnessContainerImage to empty to make Dataflow pick up the + // non-versioned container image, which handles a staged worker jar. def preCommitBeamTestPipelineOptions = [ "--project=${gcpProject}", "--region=${gcpRegion}", diff --git a/runners/google-cloud-dataflow-java/examples/build.gradle b/runners/google-cloud-dataflow-java/examples/build.gradle index b128bc13a1d8..b52469b599c8 100644 --- a/runners/google-cloud-dataflow-java/examples/build.gradle +++ b/runners/google-cloud-dataflow-java/examples/build.gradle @@ -41,7 +41,7 @@ def gcpRegion = project.findProperty('gcpRegion') ?: 'us-central1' def gcsTempRoot = project.findProperty('gcsTempRoot') ?: 'gs://temp-storage-for-end-to-end-tests/' def dockerImageName = project(':runners:google-cloud-dataflow-java').ext.dockerImageName // If -PuseExecutableStage is set, the use_executable_stage_bundle_execution wil be enabled. -def fnapiExperiments = project.hasProperty('useExecutableStage') ? 
'beam_fn_api,beam_fn_api_use_deprecated_read,use_executable_stage_bundle_execution' : "beam_fn_api,beam_fn_api_use_deprecated_read" +def fnapiExperiments = project.hasProperty('useExecutableStage') ? 'beam_fn_api_use_deprecated_read,use_executable_stage_bundle_execution' : "beam_fn_api,beam_fn_api_use_deprecated_read" def commonConfig = { dataflowWorkerJar, workerHarnessContainerImage = '', additionalOptions = [] -> // return the preevaluated configuration closure diff --git a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-2/src/test/contrib/create_elk_container.sh b/runners/google-cloud-dataflow-java/scripts/cleanup_untagged_gcr_images.sh similarity index 59% rename from sdks/java/io/elasticsearch-tests/elasticsearch-tests-2/src/test/contrib/create_elk_container.sh rename to runners/google-cloud-dataflow-java/scripts/cleanup_untagged_gcr_images.sh index 48f6064cd07f..5bf8197f8f98 100755 --- a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-2/src/test/contrib/create_elk_container.sh +++ b/runners/google-cloud-dataflow-java/scripts/cleanup_untagged_gcr_images.sh @@ -1,5 +1,4 @@ -#!/bin/sh -################################################################################ +#!/bin/bash # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with @@ -15,10 +14,19 @@ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. -# -################################################################################ -#Create an ELK (Elasticsearch Logstash Kibana) container for ES v2.4 and compatible Logstash and Kibana versions, -#bind then on host ports, allow shell access to container and mount current directory on /home/$USER inside the container +set -e + +IMAGE_NAME=$1 + +# Find all untagged images +DIGESTS=$(gcloud container images list-tags "${IMAGE_NAME}" --filter='-tags:*' --format="get(digest)") -docker create -p 5601:5601 -p 9200:9200 -p 5044:5044 -p 5000:5000 -p 9300:9300 -it -v $(pwd):/home/$USER/ --name elk-2.4 sebp/elk:es240_l240_k460 +# Delete image +echo "${DIGESTS}" | while read -r digest; do + if [[ ! 
-z "${digest}" ]]; then + img="${IMAGE_NAME}@${digest}" + echo "Removing untagged image ${img}" + gcloud container images delete --quiet "${img}" + fi +done diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/BatchStatefulParDoOverrides.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/BatchStatefulParDoOverrides.java index e93a328deb7e..eb16ea248aa1 100644 --- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/BatchStatefulParDoOverrides.java +++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/BatchStatefulParDoOverrides.java @@ -19,7 +19,6 @@ import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; -import java.util.List; import java.util.Map; import org.apache.beam.runners.core.construction.PTransformReplacements; import org.apache.beam.runners.core.construction.ReplacementOutputs; @@ -28,6 +27,7 @@ import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.InstantCoder; import org.apache.beam.sdk.coders.KvCoder; +import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.runners.AppliedPTransform; import org.apache.beam.sdk.runners.PTransformOverrideFactory; import org.apache.beam.sdk.transforms.DoFn; @@ -36,7 +36,6 @@ import org.apache.beam.sdk.transforms.ParDo; import org.apache.beam.sdk.transforms.ParDo.MultiOutput; import org.apache.beam.sdk.transforms.ParDo.SingleOutput; -import org.apache.beam.sdk.transforms.Reshuffle; import org.apache.beam.sdk.transforms.reflect.DoFnInvokers; import org.apache.beam.sdk.transforms.reflect.DoFnSignature; import org.apache.beam.sdk.transforms.reflect.DoFnSignatures; @@ -72,8 +71,8 @@ public class BatchStatefulParDoOverrides { PCollection>, PCollection, ParDo.SingleOutput, OutputT>> - singleOutputOverrideFactory(DataflowPipelineOptions options) { - return new SingleOutputOverrideFactory<>(isFnApi(options)); + singleOutputOverrideFactory() { + return new SingleOutputOverrideFactory<>(); } /** @@ -86,12 +85,7 @@ public class BatchStatefulParDoOverrides { PCollectionTuple, ParDo.MultiOutput, OutputT>> multiOutputOverrideFactory(DataflowPipelineOptions options) { - return new MultiOutputOverrideFactory<>(isFnApi(options)); - } - - private static boolean isFnApi(DataflowPipelineOptions options) { - List experiments = options.getExperiments(); - return experiments != null && experiments.contains("beam_fn_api"); + return new MultiOutputOverrideFactory<>(); } private static class SingleOutputOverrideFactory @@ -100,11 +94,7 @@ private static class SingleOutputOverrideFactory PCollection, ParDo.SingleOutput, OutputT>> { - private final boolean isFnApi; - - private SingleOutputOverrideFactory(boolean isFnApi) { - this.isFnApi = isFnApi; - } + private SingleOutputOverrideFactory() {} @Override public PTransformReplacement>, PCollection> @@ -116,7 +106,7 @@ private SingleOutputOverrideFactory(boolean isFnApi) { transform) { return PTransformReplacement.of( PTransformReplacements.getSingletonMainInput(transform), - new StatefulSingleOutputParDo<>(transform.getTransform(), isFnApi)); + new StatefulSingleOutputParDo<>(transform.getTransform())); } @Override @@ -130,11 +120,7 @@ private static class MultiOutputOverrideFactory implements PTransformOverrideFactory< PCollection>, PCollectionTuple, ParDo.MultiOutput, OutputT>> { - private final boolean isFnApi; - - private MultiOutputOverrideFactory(boolean isFnApi) { - this.isFnApi = 
isFnApi; - } + private MultiOutputOverrideFactory() {} @Override public PTransformReplacement>, PCollectionTuple> @@ -146,7 +132,7 @@ private MultiOutputOverrideFactory(boolean isFnApi) { transform) { return PTransformReplacement.of( PTransformReplacements.getSingletonMainInput(transform), - new StatefulMultiOutputParDo<>(transform.getTransform(), isFnApi)); + new StatefulMultiOutputParDo<>(transform.getTransform())); } @Override @@ -160,12 +146,9 @@ static class StatefulSingleOutputParDo extends PTransform>, PCollection> { private final ParDo.SingleOutput, OutputT> originalParDo; - private final boolean isFnApi; - StatefulSingleOutputParDo( - ParDo.SingleOutput, OutputT> originalParDo, boolean isFnApi) { + StatefulSingleOutputParDo(ParDo.SingleOutput, OutputT> originalParDo) { this.originalParDo = originalParDo; - this.isFnApi = isFnApi; } ParDo.SingleOutput, OutputT> getOriginalParDo() { @@ -176,13 +159,11 @@ ParDo.SingleOutput, OutputT> getOriginalParDo() { public PCollection expand(PCollection> input) { DoFn, OutputT> fn = originalParDo.getFn(); verifyFnIsStateful(fn); - DataflowRunner.verifyDoFnSupportedBatch(fn); + DataflowPipelineOptions options = + input.getPipeline().getOptions().as(DataflowPipelineOptions.class); + DataflowRunner.verifyDoFnSupported(fn, false, DataflowRunner.useStreamingEngine(options)); DataflowRunner.verifyStateSupportForWindowingStrategy(input.getWindowingStrategy()); - if (isFnApi) { - return input.apply(Reshuffle.of()).apply(originalParDo); - } - PTransform< PCollection>>>>>, PCollection> @@ -197,25 +178,20 @@ static class StatefulMultiOutputParDo extends PTransform>, PCollectionTuple> { private final ParDo.MultiOutput, OutputT> originalParDo; - private final boolean isFnApi; - StatefulMultiOutputParDo( - ParDo.MultiOutput, OutputT> originalParDo, boolean isFnApi) { + StatefulMultiOutputParDo(ParDo.MultiOutput, OutputT> originalParDo) { this.originalParDo = originalParDo; - this.isFnApi = isFnApi; } @Override public PCollectionTuple expand(PCollection> input) { DoFn, OutputT> fn = originalParDo.getFn(); verifyFnIsStateful(fn); - DataflowRunner.verifyDoFnSupportedBatch(fn); + DataflowPipelineOptions options = + input.getPipeline().getOptions().as(DataflowPipelineOptions.class); + DataflowRunner.verifyDoFnSupported(fn, false, DataflowRunner.useStreamingEngine(options)); DataflowRunner.verifyStateSupportForWindowingStrategy(input.getWindowingStrategy()); - if (isFnApi) { - return input.apply(Reshuffle.of()).apply(originalParDo); - } - PTransform< PCollection>>>>>, PCollectionTuple> @@ -304,8 +280,8 @@ public DoFn, OutputT> getUnderlyingDoFn() { } @Setup - public void setup() { - DoFnInvokers.invokerFor(underlyingDoFn).invokeSetup(); + public void setup(final PipelineOptions options) { + DoFnInvokers.tryInvokeSetupFor(underlyingDoFn, options); } @ProcessElement diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowMetrics.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowMetrics.java index 4469d2f85511..b58e9030a652 100644 --- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowMetrics.java +++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowMetrics.java @@ -17,6 +17,7 @@ */ package org.apache.beam.runners.dataflow; +import static org.apache.beam.sdk.util.Preconditions.checkArgumentNotNull; import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects.firstNonNull; import com.google.api.client.util.ArrayMap; @@ -27,6 +28,7 @@ import java.util.HashMap; import java.util.HashSet; import java.util.List; +import org.apache.beam.model.pipeline.v1.RunnerApi; import org.apache.beam.sdk.metrics.DistributionResult; import org.apache.beam.sdk.metrics.GaugeResult; import org.apache.beam.sdk.metrics.MetricFiltering; @@ -36,6 +38,7 @@ import org.apache.beam.sdk.metrics.MetricResult; import org.apache.beam.sdk.metrics.MetricResults; import org.apache.beam.sdk.metrics.MetricsFilter; +import org.apache.beam.sdk.runners.AppliedPTransform; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Objects; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.checkerframework.checker.nullness.qual.Nullable; @@ -45,7 +48,6 @@ /** Implementation of {@link MetricResults} for the Dataflow Runner. */ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) class DataflowMetrics extends MetricResults { private static final Logger LOG = LoggerFactory.getLogger(DataflowMetrics.class); @@ -64,7 +66,7 @@ class DataflowMetrics extends MetricResults { * After the job has finished running, Metrics no longer will change, so their results are cached * here. */ - private JobMetrics cachedMetricResults = null; + private @Nullable JobMetrics cachedMetricResults = null; /** * Constructor for the DataflowMetrics class. @@ -91,7 +93,7 @@ private MetricQueryResults populateMetricQueryResults( } @Override - public MetricQueryResults queryMetrics(@Nullable MetricsFilter filter) { + public MetricQueryResults queryMetrics(MetricsFilter filter) { List metricUpdates; ImmutableList> counters = ImmutableList.of(); ImmutableList> distributions = ImmutableList.of(); @@ -242,26 +244,72 @@ private boolean isMetricTentative(MetricUpdate metricUpdate) { && Objects.equal(metricUpdate.getName().getContext().get("tentative"), "true"); } + /** + * Returns the user step name for a given internal step name. + * + * @param internalStepName internal step name used by Dataflow + * @return user step name used to identify the metric + */ + private @Nullable String getUserStepName(String internalStepName) { + @Nullable String portableUserStepName = getPortableUserStepName(internalStepName); + if (portableUserStepName != null) { + return portableUserStepName; + } else { + return getNonPortableUserStepName(internalStepName); + } + } + + private @Nullable String getPortableUserStepName(String internalStepName) { + RunnerApi.@Nullable Pipeline pipelineProto = dataflowPipelineJob.getPipelineProto(); + if (pipelineProto == null) { + return null; + } + + RunnerApi.@Nullable PTransform transform = + pipelineProto.getComponents().getTransformsMap().get(internalStepName); + if (transform == null) { + return null; + } + + return transform.getUniqueName(); + } + + private @Nullable String getNonPortableUserStepName(String internalStepName) { + // If we can't translate internal step names to user step names, we just skip them altogether. 
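Either branch of the lookup above resolves an internal Dataflow step name back to the user-visible transform name, so metric keys stay addressable by the names users gave their steps. A minimal sketch of how such keys are typically consumed, assuming a completed Dataflow run and the standard Beam metrics API (the step and namespace names are placeholders):

    import org.apache.beam.sdk.metrics.MetricNameFilter;
    import org.apache.beam.sdk.metrics.MetricQueryResults;
    import org.apache.beam.sdk.metrics.MetricResult;
    import org.apache.beam.sdk.metrics.MetricsFilter;

    // pipelineResult is the PipelineResult returned by p.run() (placeholder).
    MetricQueryResults results =
        pipelineResult
            .metrics()
            .queryMetrics(
                MetricsFilter.builder()
                    .addStep("ParseEvents") // user step name, not the internal Dataflow name
                    .addNameFilter(MetricNameFilter.inNamespace("myapp"))
                    .build());
    for (MetricResult<Long> counter : results.getCounters()) {
      System.out.println(counter.getKey() + " = " + counter.getAttempted());
    }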
+ if (dataflowPipelineJob.transformStepNames == null) { + return null; + } + + @Nullable + AppliedPTransform appliedPTransform = + dataflowPipelineJob.transformStepNames.inverse().get(internalStepName); + if (appliedPTransform == null) { + return null; + } + + return appliedPTransform.getFullName(); + } + /** * Build an {@link MetricKey} that serves as a hash key for a metric update. * * @return a {@link MetricKey} that can be hashed and used to identify a metric. */ - private MetricKey getMetricHashKey(MetricUpdate metricUpdate) { - String fullStepName = metricUpdate.getName().getContext().get("step"); - if (dataflowPipelineJob.transformStepNames == null - || !dataflowPipelineJob.transformStepNames.inverse().containsKey(fullStepName)) { - // If we can't translate internal step names to user step names, we just skip them - // altogether. + private @Nullable MetricKey getMetricHashKey(MetricUpdate metricUpdate) { + @Nullable String internalStepName = metricUpdate.getName().getContext().get("step"); + checkArgumentNotNull( + internalStepName, "MetricUpdate has null internal step name: %s", metricUpdate); + + @Nullable String namespace = metricUpdate.getName().getContext().get("namespace"); + checkArgumentNotNull(namespace, "MetricUpdate has null namespace: %s", metricUpdate); + + @Nullable String userStepName = getUserStepName(internalStepName); + if (userStepName == null) { return null; } - fullStepName = - dataflowPipelineJob.transformStepNames.inverse().get(fullStepName).getFullName(); + return MetricKey.create( - fullStepName, - MetricName.named( - metricUpdate.getName().getContext().get("namespace"), - metricUpdate.getName().getName())); + userStepName, MetricName.named(namespace, metricUpdate.getName().getName())); } private void buildMetricsIndex() { @@ -275,7 +323,7 @@ private void buildMetricsIndex() { continue; } - MetricKey updateKey = getMetricHashKey(update); + @Nullable MetricKey updateKey = getMetricHashKey(update); if (updateKey == null || !MetricFiltering.matches(filter, updateKey)) { // Skip unmatched metrics early. continue; diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java index 1ecf18a71134..b2fdc194b7ad 100644 --- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java +++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java @@ -35,6 +35,7 @@ import java.util.concurrent.ExecutionException; import java.util.concurrent.FutureTask; import java.util.concurrent.atomic.AtomicReference; +import org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline; import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions; import org.apache.beam.runners.dataflow.util.MonitoringUtil; import org.apache.beam.runners.dataflow.util.MonitoringUtil.JobMessagesHandler; @@ -119,23 +120,45 @@ public class DataflowPipelineJob implements PipelineResult { .withMaxRetries(STATUS_POLLING_RETRIES) .withExponent(DEFAULT_BACKOFF_EXPONENT); + private @Nullable String latestStateString; + + private final @Nullable Pipeline pipelineProto; + /** * Constructs the job. * * @param jobId the job id * @param dataflowOptions used to configure the client for the Dataflow Service * @param transformStepNames a mapping from AppliedPTransforms to Step Names + * @param pipelineProto Runner API pipeline proto. 
*/ public DataflowPipelineJob( DataflowClient dataflowClient, String jobId, DataflowPipelineOptions dataflowOptions, - Map, String> transformStepNames) { + Map, String> transformStepNames, + @Nullable Pipeline pipelineProto) { this.dataflowClient = dataflowClient; this.jobId = jobId; this.dataflowOptions = dataflowOptions; this.transformStepNames = HashBiMap.create(firstNonNull(transformStepNames, ImmutableMap.of())); this.dataflowMetrics = new DataflowMetrics(this, this.dataflowClient); + this.pipelineProto = pipelineProto; + } + + /** + * Constructs the job. + * + * @param jobId the job id + * @param dataflowOptions used to configure the client for the Dataflow Service + * @param transformStepNames a mapping from AppliedPTransforms to Step Names + */ + public DataflowPipelineJob( + DataflowClient dataflowClient, + String jobId, + DataflowPipelineOptions dataflowOptions, + Map, String> transformStepNames) { + this(dataflowClient, jobId, dataflowOptions, transformStepNames, null); } /** Get the id of this job. */ @@ -152,6 +175,11 @@ public DataflowPipelineOptions getDataflowOptions() { return dataflowOptions; } + /** Get the Runner API pipeline proto if available. */ + public @Nullable Pipeline getPipelineProto() { + return pipelineProto; + } + /** Get the region this job exists in. */ public String getRegion() { return dataflowOptions.getRegion(); @@ -480,6 +508,11 @@ public State getState() { BackOffAdapter.toGcpBackOff(STATUS_BACKOFF_FACTORY.backoff()), Sleeper.DEFAULT); } + @Nullable + String getLatestStateString() { + return latestStateString; + } + /** * Attempts to get the state. Uses exponential backoff on failure up to the maximum number of * passed in attempts. @@ -505,7 +538,8 @@ State getStateWithRetries(BackOff attempts, Sleeper sleeper) throws IOException return terminalState; } Job job = getJobWithRetries(attempts, sleeper); - return MonitoringUtil.toState(job.getCurrentState()); + latestStateString = job.getCurrentState(); + return MonitoringUtil.toState(latestStateString); } /** @@ -528,7 +562,11 @@ private Job getJobWithRetries(BackOff backoff, Sleeper sleeper) throws IOExcepti terminalState = currentState; replacedByJob = new DataflowPipelineJob( - dataflowClient, job.getReplacedByJobId(), dataflowOptions, transformStepNames); + dataflowClient, + job.getReplacedByJobId(), + dataflowOptions, + transformStepNames, + pipelineProto); } return job; } catch (IOException exn) { diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java index 3a75f40cca18..8229a32a4f81 100644 --- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java +++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java @@ -36,6 +36,7 @@ import com.fasterxml.jackson.databind.ObjectMapper; import com.google.api.services.dataflow.model.AutoscalingSettings; import com.google.api.services.dataflow.model.DataflowPackage; +import com.google.api.services.dataflow.model.DebugOptions; import com.google.api.services.dataflow.model.Disk; import com.google.api.services.dataflow.model.Environment; import com.google.api.services.dataflow.model.Job; @@ -47,6 +48,7 @@ import java.util.HashMap; import java.util.List; import java.util.Map; +import java.util.Map.Entry; import java.util.concurrent.atomic.AtomicLong; import 
java.util.stream.Collectors; import org.apache.beam.model.pipeline.v1.RunnerApi; @@ -88,10 +90,9 @@ import org.apache.beam.sdk.transforms.View; import org.apache.beam.sdk.transforms.display.DisplayData; import org.apache.beam.sdk.transforms.display.HasDisplayData; -import org.apache.beam.sdk.transforms.reflect.DoFnInvoker; -import org.apache.beam.sdk.transforms.reflect.DoFnInvokers; -import org.apache.beam.sdk.transforms.reflect.DoFnSignature; import org.apache.beam.sdk.transforms.reflect.DoFnSignatures; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHint; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; import org.apache.beam.sdk.transforms.windowing.DefaultTrigger; import org.apache.beam.sdk.transforms.windowing.Window; import org.apache.beam.sdk.util.AppliedCombineFn; @@ -107,10 +108,13 @@ import org.apache.beam.sdk.values.TimestampedValue; import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.sdk.values.WindowingStrategy; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Supplier; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; +import org.apache.commons.codec.EncoderException; +import org.apache.commons.codec.net.PercentCodec; import org.checkerframework.checker.nullness.qual.Nullable; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -229,7 +233,7 @@ public RunnerApi.Pipeline getPipelineProto() { /** Renders a {@link Job} as a string. 
*/ public static String jobToString(Job job) { try { - return MAPPER.writerWithDefaultPrettyPrinter().writeValueAsString(job); + return MAPPER.writeValueAsString(job); } catch (JsonProcessingException exc) { throw new IllegalStateException("Failed to render Job as String.", exc); } @@ -334,6 +338,7 @@ public Job translate(List packages) { Environment environment = new Environment(); job.setEnvironment(environment); + job.getEnvironment().setServiceOptions(options.getDataflowServiceOptions()); WorkerPool workerPool = new WorkerPool(); @@ -388,6 +393,13 @@ public Job translate(List packages) { workerPool.setPackages(packages); workerPool.setNumWorkers(options.getNumWorkers()); + if (options.getMaxNumWorkers() != 0 && options.getNumWorkers() > options.getMaxNumWorkers()) { + throw new IllegalArgumentException( + String.format( + "numWorkers (%d) cannot exceed maxNumWorkers (%d).", + options.getNumWorkers(), options.getMaxNumWorkers())); + } + if (options.getLabels() != null) { job.setLabels(options.getLabels()); } @@ -427,8 +439,14 @@ public Job translate(List packages) { if (options.getDataflowKmsKey() != null) { environment.setServiceKmsKeyName(options.getDataflowKmsKey()); } + if (options.isHotKeyLoggingEnabled()) { + DebugOptions debugOptions = new DebugOptions(); + debugOptions.setEnableHotKeyLogging(true); + environment.setDebugOptions(debugOptions); + } pipeline.traverseTopologically(this); + return job; } @@ -489,7 +507,16 @@ public void visitPrimitiveTransform(TransformHierarchy.Node node) { node.getFullName()); LOG.debug("Translating {}", transform); currentTransform = node.toAppliedPTransform(getPipeline()); + ResourceHints hints = transform.getResourceHints(); + // AppliedPTransform instance stores resource hints of current transform merged with outer + // hints (e.g. set on outer composites). + // Translation reads resource hints from PTransform objects, so update the hints. + transform.setResourceHints(currentTransform.getResourceHints()); translator.translate(transform, this); + // Avoid side-effects in case the same transform is applied in multiple places in the + // pipeline. + transform.setResourceHints(hints); + // translator.translate(node, this); currentTransform = null; } @@ -497,22 +524,13 @@ public void visitPrimitiveTransform(TransformHierarchy.Node node) { public void visitValue(PValue value, TransformHierarchy.Node producer) { LOG.debug("Checking translation of {}", value); // Primitive transforms are the only ones assigned step names. - if (producer.getTransform() instanceof CreateDataflowView - && !hasExperiment(options, "beam_fn_api")) { + if (producer.getTransform() instanceof CreateDataflowView) { // CreateDataflowView produces a dummy output (as it must be a primitive transform) // but in the Dataflow Job graph produces only the view and not the output PCollection. asOutputReference( ((CreateDataflowView) producer.getTransform()).getView(), producer.toAppliedPTransform(getPipeline())); return; - } else if (producer.getTransform() instanceof View.CreatePCollectionView - && hasExperiment(options, "beam_fn_api")) { - // View.CreatePCollectionView produces a dummy output (as it must be a primitive transform) - // but in the Dataflow Job graph produces only the view and not the output PCollection. 
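Because the translator now reads resource hints directly off each PTransform (merged with hints set on enclosing composites), hints attached in user code flow into the generated step properties, where they are percent-encoded further below. A minimal sketch of attaching a hint to a single transform, assuming the ResourceHints builder API; the builder method name and the "6GB" value are illustrative and not confirmed by this patch:

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.transforms.Create;
    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.transforms.resourcehints.ResourceHints;
    import org.apache.beam.sdk.values.TypeDescriptors;

    Pipeline p = Pipeline.create();
    p.apply(Create.of("a", "b"))
        .apply(
            "FormatRecords",
            MapElements.into(TypeDescriptors.strings())
                .via((String s) -> s.trim())
                .setResourceHints(ResourceHints.create().withMinRam("6GB")));
    // During translation the hint is merged with any hints inherited from enclosing
    // composites and emitted on the "FormatRecords" step under PropertyNames.RESOURCE_HINTS.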
- asOutputReference( - ((View.CreatePCollectionView) producer.getTransform()).getView(), - producer.toAppliedPTransform(getPipeline())); - return; } asOutputReference(value, producer.toAppliedPTransform(getPipeline())); } @@ -538,6 +556,7 @@ public StepTranslator addStep(PTransform transform, String type) { StepTranslator stepContext = new StepTranslator(this, step); stepContext.addInput(PropertyNames.USER_NAME, getFullName(transform)); stepContext.addDisplayData(step, stepName, transform); + stepContext.addResourceHints(step, stepName, transform.getResourceHints()); LOG.info("Adding {} as step {}", getCurrentTransform(transform).getFullName(), stepName); return stepContext; } @@ -608,6 +627,10 @@ static class StepTranslator implements StepTranslationContext { private final Translator translator; private final Step step; + // For compatibility with URL encoding implementations that represent space as +, + // always encode + as %2b even though we don't encode space as +. + private final PercentCodec percentCodec = + new PercentCodec("+".getBytes(Charsets.US_ASCII), false); private StepTranslator(Translator translator, Step step) { this.translator = translator; @@ -746,6 +769,23 @@ private void addDisplayData(Step step, String stepName, HasDisplayData hasDispla List> list = MAPPER.convertValue(displayData, List.class); addList(getProperties(), PropertyNames.DISPLAY_DATA, list); } + + private void addResourceHints(Step step, String stepName, ResourceHints hints) { + Map urlEncodedHints = new HashMap<>(); + for (Entry entry : hints.hints().entrySet()) { + try { + urlEncodedHints.put( + entry.getKey(), + new String(percentCodec.encode(entry.getValue().toBytes()), Charsets.US_ASCII)); + } catch (EncoderException e) { + // Should never happen. + throw new RuntimeException("Invalid value for resource hint: " + entry.getKey(), e); + } + } + if (urlEncodedHints.size() > 0) { + addDictionary(getProperties(), PropertyNames.RESOURCE_HINTS, urlEncodedHints); + } + } } ///////////////////////////////////////////////////////////////////////////// @@ -787,8 +827,7 @@ private void translateTyped( byteArrayToJsonString( serializeWindowingStrategy(windowingStrategy, context.getPipelineOptions()))); stepContext.addInput( - PropertyNames.IS_MERGING_WINDOW_FN, - !windowingStrategy.getWindowFn().isNonMerging()); + PropertyNames.IS_MERGING_WINDOW_FN, windowingStrategy.needsMerge()); stepContext.addCollectionToSingletonOutput( input, PropertyNames.OUTPUT, transform.getView()); } @@ -842,17 +881,8 @@ private void translateHelper( stepContext.addEncodingInput(fn.getAccumulatorCoder()); - List experiments = context.getPipelineOptions().getExperiments(); - boolean isFnApi = experiments != null && experiments.contains("beam_fn_api"); - - if (isFnApi) { - String ptransformId = - context.getSdkComponents().getPTransformIdOrThrow(context.getCurrentParent()); - stepContext.addInput(PropertyNames.SERIALIZED_FN, ptransformId); - } else { - stepContext.addInput( - PropertyNames.SERIALIZED_FN, byteArrayToJsonString(serializeToByteArray(fn))); - } + stepContext.addInput( + PropertyNames.SERIALIZED_FN, byteArrayToJsonString(serializeToByteArray(fn))); stepContext.addOutput(PropertyNames.OUTPUT, context.getOutput(primitiveTransform)); } @@ -919,7 +949,7 @@ private void groupByKeyHelper( boolean isStreaming = context.getPipelineOptions().as(StreamingOptions.class).isStreaming(); boolean allowCombinerLifting = - windowingStrategy.getWindowFn().isNonMerging() + !windowingStrategy.needsMerge() && 
windowingStrategy.getWindowFn().assignsToOneWindow(); if (isStreaming) { allowCombinerLifting &= transform.fewKeys(); @@ -978,24 +1008,6 @@ private void translateMultiHelper( outputCoders, doFnSchemaInformation, sideInputMapping); - - // TODO: Move this logic into translateFn once the legacy ProcessKeyedElements is - // removed. - if (context.isFnApi()) { - DoFnSignature signature = DoFnSignatures.signatureForDoFn(transform.getFn()); - if (signature.processElement().isSplittable()) { - DoFnInvoker doFnInvoker = DoFnInvokers.invokerFor(transform.getFn()); - Coder restrictionAndWatermarkStateCoder = - KvCoder.of( - doFnInvoker.invokeGetRestrictionCoder( - context.getInput(transform).getPipeline().getCoderRegistry()), - doFnInvoker.invokeGetWatermarkEstimatorStateCoder( - context.getInput(transform).getPipeline().getCoderRegistry())); - stepContext.addInput( - PropertyNames.RESTRICTION_ENCODING, - translateCoder(restrictionAndWatermarkStateCoder, context)); - } - } } }); @@ -1043,24 +1055,6 @@ private void translateSingleHelper( outputCoders, doFnSchemaInformation, sideInputMapping); - - // TODO: Move this logic into translateFn once the legacy ProcessKeyedElements is - // removed. - if (context.isFnApi()) { - DoFnSignature signature = DoFnSignatures.signatureForDoFn(transform.getFn()); - if (signature.processElement().isSplittable()) { - DoFnInvoker doFnInvoker = DoFnInvokers.invokerFor(transform.getFn()); - Coder restrictionAndWatermarkStateCoder = - KvCoder.of( - doFnInvoker.invokeGetRestrictionCoder( - context.getInput(transform).getPipeline().getCoderRegistry()), - doFnInvoker.invokeGetWatermarkEstimatorStateCoder( - context.getInput(transform).getPipeline().getCoderRegistry())); - stepContext.addInput( - PropertyNames.RESTRICTION_ENCODING, - translateCoder(restrictionAndWatermarkStateCoder, context)); - } - } } }); @@ -1252,7 +1246,9 @@ private static void translateFn( boolean isStateful = DoFnSignatures.isStateful(fn); if (isStateful) { - DataflowRunner.verifyDoFnSupported(fn, context.getPipelineOptions().isStreaming()); + DataflowPipelineOptions options = context.getPipelineOptions(); + DataflowRunner.verifyDoFnSupported( + fn, options.isStreaming(), DataflowRunner.useStreamingEngine(options)); DataflowRunner.verifyStateSupportForWindowingStrategy(windowingStrategy); } @@ -1260,23 +1256,19 @@ private static void translateFn( // Fn API does not need the additional metadata in the wrapper, and it is Java-only serializable // hence not suitable for portable execution - if (context.isFnApi()) { - stepContext.addInput(PropertyNames.SERIALIZED_FN, ptransformId); - } else { - stepContext.addInput( - PropertyNames.SERIALIZED_FN, - byteArrayToJsonString( - serializeToByteArray( - DoFnInfo.forFn( - fn, - windowingStrategy, - sideInputs, - inputCoder, - outputCoders, - mainOutput, - doFnSchemaInformation, - sideInputMapping)))); - } + stepContext.addInput( + PropertyNames.SERIALIZED_FN, + byteArrayToJsonString( + serializeToByteArray( + DoFnInfo.forFn( + fn, + windowingStrategy, + sideInputs, + inputCoder, + outputCoders, + mainOutput, + doFnSchemaInformation, + sideInputMapping)))); // Setting USES_KEYED_STATE will cause an ungrouped shuffle, which works // in streaming but does not work in batch @@ -1294,6 +1286,6 @@ private static void translateOutputs( } private static CloudObject translateCoder(Coder coder, TranslationContext context) { - return CloudObjects.asCloudObject(coder, context.isFnApi() ? 
context.getSdkComponents() : null); + return CloudObjects.asCloudObject(coder, null); } } diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java index fc8165328a55..c1220b0521ff 100644 --- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java +++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java @@ -19,6 +19,7 @@ import static java.nio.charset.StandardCharsets.UTF_8; import static org.apache.beam.runners.core.construction.resources.PipelineResources.detectClassPathResourcesToStage; +import static org.apache.beam.sdk.options.ExperimentalOptions.hasExperiment; import static org.apache.beam.sdk.util.CoderUtils.encodeToByteArray; import static org.apache.beam.sdk.util.SerializableUtils.serializeToByteArray; import static org.apache.beam.sdk.util.StringUtils.byteArrayToJsonString; @@ -37,7 +38,9 @@ import com.google.api.services.dataflow.model.DataflowPackage; import com.google.api.services.dataflow.model.Job; import com.google.api.services.dataflow.model.ListJobsResponse; +import com.google.api.services.dataflow.model.SdkHarnessContainerImage; import com.google.api.services.dataflow.model.WorkerPool; +import com.google.auto.value.AutoValue; import java.io.BufferedWriter; import java.io.File; import java.io.IOException; @@ -93,7 +96,6 @@ import org.apache.beam.sdk.PipelineResult.State; import org.apache.beam.sdk.PipelineRunner; import org.apache.beam.sdk.annotations.Experimental; -import org.apache.beam.sdk.coders.ByteArrayCoder; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.Coder.NonDeterministicException; import org.apache.beam.sdk.coders.KvCoder; @@ -107,18 +109,18 @@ import org.apache.beam.sdk.io.UnboundedSource; import org.apache.beam.sdk.io.WriteFiles; import org.apache.beam.sdk.io.WriteFilesResult; +import org.apache.beam.sdk.io.fs.ResolveOptions; import org.apache.beam.sdk.io.fs.ResourceId; import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage; import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessageWithAttributesAndMessageIdCoder; import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessageWithAttributesCoder; -import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessages; import org.apache.beam.sdk.io.gcp.pubsub.PubsubUnboundedSink; import org.apache.beam.sdk.io.gcp.pubsub.PubsubUnboundedSource; +import org.apache.beam.sdk.io.kafka.KafkaIO; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.PipelineOptionsValidator; import org.apache.beam.sdk.options.ValueProvider.NestedValueProvider; import org.apache.beam.sdk.runners.AppliedPTransform; -import org.apache.beam.sdk.runners.PTransformMatcher; import org.apache.beam.sdk.runners.PTransformOverride; import org.apache.beam.sdk.runners.PTransformOverrideFactory; import org.apache.beam.sdk.runners.TransformHierarchy; @@ -132,7 +134,6 @@ import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.GroupIntoBatches; import org.apache.beam.sdk.transforms.Impulse; -import org.apache.beam.sdk.transforms.MapElements; import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.transforms.ParDo; import org.apache.beam.sdk.transforms.Reshuffle; @@ -148,6 +149,7 @@ import org.apache.beam.sdk.util.InstanceBuilder; import org.apache.beam.sdk.util.MimeTypes; import org.apache.beam.sdk.util.NameUtils; 
+import org.apache.beam.sdk.util.ReleaseInfo; import org.apache.beam.sdk.util.WindowedValue; import org.apache.beam.sdk.util.common.ReflectHelpers; import org.apache.beam.sdk.values.KV; @@ -159,12 +161,11 @@ import org.apache.beam.sdk.values.PInput; import org.apache.beam.sdk.values.PValue; import org.apache.beam.sdk.values.TupleTag; -import org.apache.beam.sdk.values.TypeDescriptor; import org.apache.beam.sdk.values.TypeDescriptors; import org.apache.beam.sdk.values.ValueWithRecordId; import org.apache.beam.sdk.values.WindowingStrategy; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.TextFormat; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Joiner; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; @@ -200,7 +201,6 @@ public class DataflowRunner extends PipelineRunner { private static final Logger LOG = LoggerFactory.getLogger(DataflowRunner.class); - /** Provided configuration options. */ private final DataflowPipelineOptions options; @@ -270,7 +270,8 @@ && isServiceEndpoint(dataflowOptions.getDataflowEndpoint())) { "Missing required pipeline options: " + Joiner.on(',').join(missing)); } - validateWorkerSettings(PipelineOptionsValidator.validate(GcpOptions.class, options)); + validateWorkerSettings( + PipelineOptionsValidator.validate(DataflowPipelineWorkerPoolOptions.class, options)); PathValidator validator = dataflowOptions.getPathValidator(); String gcpTempLocation; @@ -383,7 +384,7 @@ && isServiceEndpoint(dataflowOptions.getDataflowEndpoint())) { // Adding the Java version to the SDK name for user's and support convenience. String agentJavaVer = - (Environments.getJavaVersion() == Environments.JavaVersion.v8) + (Environments.getJavaVersion() == Environments.JavaVersion.java8) ? "(JRE 8 environment)" : "(JDK 11 environment)"; @@ -402,8 +403,39 @@ static boolean isServiceEndpoint(String endpoint) { return Strings.isNullOrEmpty(endpoint) || Pattern.matches(ENDPOINT_REGEXP, endpoint); } + static void validateSdkContainerImageOptions(DataflowPipelineWorkerPoolOptions workerOptions) { + // Check against null - empty string value for workerHarnessContainerImage + // must be preserved for legacy dataflowWorkerJar to work. + String sdkContainerOption = workerOptions.getSdkContainerImage(); + String workerHarnessOption = workerOptions.getWorkerHarnessContainerImage(); + Preconditions.checkArgument( + sdkContainerOption == null + || workerHarnessOption == null + || sdkContainerOption.equals(workerHarnessOption), + "Cannot use legacy option workerHarnessContainerImage with sdkContainerImage. Prefer sdkContainerImage."); + + // Default to new option, which may be null. + String containerImage = workerOptions.getSdkContainerImage(); + if (workerOptions.getWorkerHarnessContainerImage() != null + && workerOptions.getSdkContainerImage() == null) { + // Set image to old option if old option was set but new option is not set. + LOG.warn( + "Prefer --sdkContainerImage over deprecated legacy option --workerHarnessContainerImage."); + containerImage = workerOptions.getWorkerHarnessContainerImage(); + } + + // Make sure both options have same value. 
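Both options end up carrying the same reconciled value, so downstream code can keep reading either one. From the user's side the migration is effectively a flag rename; a minimal sketch, assuming a typical options-from-args submission (the image and runner values are placeholders):

    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    // Preferred: only the new flag.
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(
                "--runner=DataflowRunner",
                "--sdkContainerImage=gcr.io/example/beam_java8_sdk:custom")
            .as(DataflowPipelineOptions.class);

    // After validation, getSdkContainerImage() and getWorkerHarnessContainerImage()
    // return the same value. Supplying only the deprecated
    // --workerHarnessContainerImage still works (with a warning); supplying both
    // flags with different values is rejected.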
+ workerOptions.setSdkContainerImage(containerImage); + workerOptions.setWorkerHarnessContainerImage(containerImage); + } + @VisibleForTesting - static void validateWorkerSettings(GcpOptions gcpOptions) { + static void validateWorkerSettings(DataflowPipelineWorkerPoolOptions workerOptions) { + DataflowPipelineOptions dataflowOptions = workerOptions.as(DataflowPipelineOptions.class); + + validateSdkContainerImageOptions(workerOptions); + + GcpOptions gcpOptions = workerOptions.as(GcpOptions.class); Preconditions.checkArgument( gcpOptions.getZone() == null || gcpOptions.getWorkerRegion() == null, "Cannot use option zone with workerRegion. Prefer either workerZone or workerRegion."); @@ -414,7 +446,6 @@ static void validateWorkerSettings(GcpOptions gcpOptions) { gcpOptions.getWorkerRegion() == null || gcpOptions.getWorkerZone() == null, "workerRegion and workerZone options are mutually exclusive."); - DataflowPipelineOptions dataflowOptions = gcpOptions.as(DataflowPipelineOptions.class); boolean hasExperimentWorkerRegion = false; if (dataflowOptions.getExperiments() != null) { for (String experiment : dataflowOptions.getExperiments()) { @@ -449,7 +480,6 @@ protected DataflowRunner(DataflowPipelineOptions options) { } private List getOverrides(boolean streaming) { - boolean fnApiEnabled = hasExperiment(options, "beam_fn_api"); ImmutableList.Builder overridesBuilder = ImmutableList.builder(); // Create is implemented in terms of a Read, so it must precede the override to Read in @@ -462,20 +492,20 @@ private List getOverrides(boolean streaming) { .add( PTransformOverride.of( PTransformMatchers.emptyFlatten(), EmptyFlattenAsCreateFactory.instance())); - if (!fnApiEnabled) { - // By default Dataflow runner replaces single-output ParDo with a ParDoSingle override. - // However, we want a different expansion for single-output splittable ParDo. - overridesBuilder - .add( - PTransformOverride.of( - PTransformMatchers.splittableParDoSingle(), - new ReflectiveOneToOneOverrideFactory( - SplittableParDoOverrides.ParDoSingleViaMulti.class, this))) - .add( - PTransformOverride.of( - PTransformMatchers.splittableParDoMulti(), - new SplittableParDoOverrides.SplittableParDoOverrideFactory())); - } + + // By default Dataflow runner replaces single-output ParDo with a ParDoSingle override. + // However, we want a different expansion for single-output splittable ParDo. 
+ overridesBuilder + .add( + PTransformOverride.of( + PTransformMatchers.splittableParDoSingle(), + new ReflectiveOneToOneOverrideFactory( + SplittableParDoOverrides.ParDoSingleViaMulti.class, this))) + .add( + PTransformOverride.of( + PTransformMatchers.splittableParDoMulti(), + new SplittableParDoOverrides.SplittableParDoOverrideFactory())); + if (streaming) { if (!hasExperiment(options, "enable_custom_pubsub_source")) { overridesBuilder.add( @@ -484,29 +514,18 @@ private List getOverrides(boolean streaming) { new StreamingPubsubIOReadOverrideFactory())); } if (!hasExperiment(options, "enable_custom_pubsub_sink")) { - if (hasExperiment(options, "use_runner_v2") - || hasExperiment(options, "use_unified_worker")) { - overridesBuilder.add( - PTransformOverride.of( - PTransformMatchers.classEqualTo(PubsubUnboundedSink.class), - new DataflowWriteToPubsubRunnerV2OverrideFactory())); - } else { - overridesBuilder.add( - PTransformOverride.of( - PTransformMatchers.classEqualTo(PubsubUnboundedSink.class), - new StreamingPubsubIOWriteOverrideFactory(this))); - } + overridesBuilder.add( + PTransformOverride.of( + PTransformMatchers.classEqualTo(PubsubUnboundedSink.class), + new StreamingPubsubIOWriteOverrideFactory(this))); } + + overridesBuilder.add(KafkaIO.Read.KAFKA_READ_OVERRIDE); + overridesBuilder.add( PTransformOverride.of( PTransformMatchers.writeWithRunnerDeterminedSharding(), new StreamingShardedWriteFactory(options))); - if (fnApiEnabled) { - overridesBuilder.add( - PTransformOverride.of( - PTransformMatchers.classEqualTo(Create.Values.class), - new StreamingFnApiCreateOverrideFactory())); - } overridesBuilder.add( PTransformOverride.of( @@ -514,32 +533,27 @@ private List getOverrides(boolean streaming) { new GroupIntoBatchesOverride.StreamingGroupIntoBatchesWithShardedKeyOverrideFactory( this))); - if (!fnApiEnabled) { - overridesBuilder - .add( - // Streaming Bounded Read is implemented in terms of Streaming Unbounded Read, and - // must precede it - PTransformOverride.of( - PTransformMatchers.classEqualTo(Read.Bounded.class), - new StreamingBoundedReadOverrideFactory())) - .add( - PTransformOverride.of( - PTransformMatchers.classEqualTo(Read.Unbounded.class), - new StreamingUnboundedReadOverrideFactory())); - } + overridesBuilder + .add( + // Streaming Bounded Read is implemented in terms of Streaming Unbounded Read, and + // must precede it + PTransformOverride.of( + PTransformMatchers.classEqualTo(Read.Bounded.class), + new StreamingBoundedReadOverrideFactory())) + .add( + PTransformOverride.of( + PTransformMatchers.classEqualTo(Read.Unbounded.class), + new StreamingUnboundedReadOverrideFactory())); + + overridesBuilder.add( + PTransformOverride.of( + PTransformMatchers.classEqualTo(View.CreatePCollectionView.class), + new StreamingCreatePCollectionViewFactory())); - if (!fnApiEnabled) { - overridesBuilder.add( - PTransformOverride.of( - PTransformMatchers.classEqualTo(View.CreatePCollectionView.class), - new StreamingCreatePCollectionViewFactory())); - } // Dataflow Streaming runner overrides the SPLITTABLE_PROCESS_KEYED transform // natively in the Dataflow service. } else { - if (!fnApiEnabled) { - overridesBuilder.add(SplittableParDo.PRIMITIVE_BOUNDED_READ_OVERRIDE); - } + overridesBuilder.add(SplittableParDo.PRIMITIVE_BOUNDED_READ_OVERRIDE); overridesBuilder // Replace GroupIntoBatches before the state/timer replacements below since // GroupIntoBatches internally uses a stateful DoFn. 
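The Pubsub source and sink overrides above are now gated only on dedicated experiments rather than on beam_fn_api. A minimal sketch of the experiment check they rely on, assuming a hypothetical option set (the experiment name is the one used in this file):

    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.options.ExperimentalOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs("--experiments=enable_custom_pubsub_source")
            .as(DataflowPipelineOptions.class);

    // Mirrors the gating in getOverrides(): the native Pubsub read override is kept
    // unless the user explicitly opted into the custom source.
    boolean keepNativePubsubRead =
        !ExperimentalOptions.hasExperiment(options, "enable_custom_pubsub_source");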
@@ -562,7 +576,7 @@ private List getOverrides(boolean streaming) { .add( PTransformOverride.of( PTransformMatchers.stateOrTimerParDoSingle(), - BatchStatefulParDoOverrides.singleOutputOverrideFactory(options))); + BatchStatefulParDoOverrides.singleOutputOverrideFactory())); // Dataflow Batch runner uses the naive override of the SPLITTABLE_PROCESS_KEYED transform // for now, but eventually (when liquid sharding is implemented) will also override it // natively in the Dataflow service. @@ -570,35 +584,35 @@ private List getOverrides(boolean streaming) { PTransformOverride.of( PTransformMatchers.splittableProcessKeyedBounded(), new SplittableParDoNaiveBounded.OverrideFactory())); - if (!fnApiEnabled) { - overridesBuilder - .add( - PTransformOverride.of( - PTransformMatchers.classEqualTo(View.AsMap.class), - new ReflectiveViewOverrideFactory( - BatchViewOverrides.BatchViewAsMap.class, this))) - .add( - PTransformOverride.of( - PTransformMatchers.classEqualTo(View.AsMultimap.class), - new ReflectiveViewOverrideFactory( - BatchViewOverrides.BatchViewAsMultimap.class, this))) - .add( - PTransformOverride.of( - PTransformMatchers.classEqualTo(Combine.GloballyAsSingletonView.class), - new CombineGloballyAsSingletonViewOverrideFactory(this))) - .add( - PTransformOverride.of( - PTransformMatchers.classEqualTo(View.AsList.class), - new ReflectiveViewOverrideFactory( - BatchViewOverrides.BatchViewAsList.class, this))) - .add( - PTransformOverride.of( - PTransformMatchers.classEqualTo(View.AsIterable.class), - new ReflectiveViewOverrideFactory( - BatchViewOverrides.BatchViewAsIterable.class, this))); - } + + overridesBuilder + .add( + PTransformOverride.of( + PTransformMatchers.classEqualTo(View.AsMap.class), + new ReflectiveViewOverrideFactory(BatchViewOverrides.BatchViewAsMap.class, this))) + .add( + PTransformOverride.of( + PTransformMatchers.classEqualTo(View.AsMultimap.class), + new ReflectiveViewOverrideFactory( + BatchViewOverrides.BatchViewAsMultimap.class, this))) + .add( + PTransformOverride.of( + PTransformMatchers.classEqualTo(Combine.GloballyAsSingletonView.class), + new CombineGloballyAsSingletonViewOverrideFactory(this))) + .add( + PTransformOverride.of( + PTransformMatchers.classEqualTo(View.AsList.class), + new ReflectiveViewOverrideFactory( + BatchViewOverrides.BatchViewAsList.class, this))) + .add( + PTransformOverride.of( + PTransformMatchers.classEqualTo(View.AsIterable.class), + new ReflectiveViewOverrideFactory( + BatchViewOverrides.BatchViewAsIterable.class, this))); } - /* TODO[Beam-4684]: Support @RequiresStableInput on Dataflow in a more intelligent way + /* TODO(Beam-4684): Support @RequiresStableInput on Dataflow in a more intelligent way + Use Reshuffle might cause an extra and unnecessary shuffle to be inserted. To enable this, we + should make sure that we do not add extra shuffles for transforms whose input is already stable. // Uses Reshuffle, so has to be before the Reshuffle override overridesBuilder.add( PTransformOverride.of( @@ -617,7 +631,7 @@ private List getOverrides(boolean streaming) { // Order is important. Streaming views almost all use Combine internally. 
.add( PTransformOverride.of( - combineValuesTranslation(fnApiEnabled), + new DataflowPTransformMatchers.CombineValuesWithoutSideInputsPTransformMatcher(), new PrimitiveCombineGroupedValuesOverrideFactory())) .add( PTransformOverride.of( @@ -788,22 +802,6 @@ public PTransformReplacement, PCollection> getRepla } } - /** - * Returns a {@link PTransformMatcher} that matches {@link PTransform}s of class {@link - * Combine.GroupedValues} that will be translated into CombineValues transforms in Dataflow's Job - * API and skips those that should be expanded into ParDos. - * - * @param fnApiEnabled Flag indicating whether this matcher is being retrieved for a fnapi or - * non-fnapi pipeline. - */ - private static PTransformMatcher combineValuesTranslation(boolean fnApiEnabled) { - if (fnApiEnabled) { - return new DataflowPTransformMatchers.CombineValuesWithParentCheckPTransformMatcher(); - } else { - return new DataflowPTransformMatchers.CombineValuesWithoutSideInputsPTransformMatcher(); - } - } - private String debuggerMessage(String projectId, String uniquifier) { return String.format( "To debug your job, visit Google Cloud Debugger at: " @@ -852,6 +850,57 @@ private Debuggee registerDebuggee(CloudDebugger debuggerClient, String uniquifie } } + @VisibleForTesting + protected RunnerApi.Pipeline resolveArtifacts(RunnerApi.Pipeline pipeline) { + RunnerApi.Pipeline.Builder pipelineBuilder = pipeline.toBuilder(); + RunnerApi.Components.Builder componentsBuilder = pipelineBuilder.getComponentsBuilder(); + componentsBuilder.clearEnvironments(); + for (Map.Entry entry : + pipeline.getComponents().getEnvironmentsMap().entrySet()) { + RunnerApi.Environment.Builder environmentBuilder = entry.getValue().toBuilder(); + environmentBuilder.clearDependencies(); + for (RunnerApi.ArtifactInformation info : entry.getValue().getDependenciesList()) { + if (!BeamUrns.getUrn(RunnerApi.StandardArtifacts.Types.FILE).equals(info.getTypeUrn())) { + throw new RuntimeException( + String.format("unsupported artifact type %s", info.getTypeUrn())); + } + RunnerApi.ArtifactFilePayload filePayload; + try { + filePayload = RunnerApi.ArtifactFilePayload.parseFrom(info.getTypePayload()); + } catch (InvalidProtocolBufferException e) { + throw new RuntimeException("Error parsing artifact file payload.", e); + } + if (!BeamUrns.getUrn(RunnerApi.StandardArtifacts.Roles.STAGING_TO) + .equals(info.getRoleUrn())) { + throw new RuntimeException( + String.format("unsupported artifact role %s", info.getRoleUrn())); + } + RunnerApi.ArtifactStagingToRolePayload stagingPayload; + try { + stagingPayload = RunnerApi.ArtifactStagingToRolePayload.parseFrom(info.getRolePayload()); + } catch (InvalidProtocolBufferException e) { + throw new RuntimeException("Error parsing artifact staging_to role payload.", e); + } + environmentBuilder.addDependencies( + info.toBuilder() + .setTypeUrn(BeamUrns.getUrn(RunnerApi.StandardArtifacts.Types.URL)) + .setTypePayload( + RunnerApi.ArtifactUrlPayload.newBuilder() + .setUrl( + FileSystems.matchNewResource(options.getStagingLocation(), true) + .resolve( + stagingPayload.getStagedName(), + ResolveOptions.StandardResolveOptions.RESOLVE_FILE) + .toString()) + .setSha256(filePayload.getSha256()) + .build() + .toByteString())); + } + componentsBuilder.putEnvironments(entry.getKey(), environmentBuilder.build()); + } + return pipelineBuilder.build(); + } + private List stageArtifacts(RunnerApi.Pipeline pipeline) { ImmutableList.Builder filesToStageBuilder = ImmutableList.builder(); for (Map.Entry entry : @@ -891,7 +940,7 
@@ private List getDefaultArtifacts() { String windmillBinary = options.as(DataflowPipelineDebugOptions.class).getOverrideWindmillBinary(); String dataflowWorkerJar = options.getDataflowWorkerJar(); - if (dataflowWorkerJar != null && !dataflowWorkerJar.isEmpty()) { + if (dataflowWorkerJar != null && !dataflowWorkerJar.isEmpty() && !useUnifiedWorker(options)) { // Put the user specified worker jar at the start of the classpath, to be consistent with the // built in worker order. pathsToStageBuilder.add("dataflow-worker.jar=" + dataflowWorkerJar); @@ -905,36 +954,84 @@ private List getDefaultArtifacts() { @Override public DataflowPipelineJob run(Pipeline pipeline) { + + if (useUnifiedWorker(options)) { + List experiments = options.getExperiments(); // non-null if useUnifiedWorker is true + if (!experiments.contains("use_runner_v2")) { + experiments.add("use_runner_v2"); + } + if (!experiments.contains("use_unified_worker")) { + experiments.add("use_unified_worker"); + } + if (!experiments.contains("beam_fn_api")) { + experiments.add("beam_fn_api"); + } + if (!experiments.contains("use_portable_job_submission")) { + experiments.add("use_portable_job_submission"); + } + options.setExperiments(ImmutableList.copyOf(experiments)); + } + logWarningIfPCollectionViewHasNonDeterministicKeyCoder(pipeline); if (containsUnboundedPCollection(pipeline)) { options.setStreaming(true); } - replaceTransforms(pipeline); LOG.info( "Executing pipeline on the Dataflow Service, which will have billing implications " + "related to Google Compute Engine usage and other Google Cloud Services."); - // Capture the sdkComponents for look up during step translations - SdkComponents sdkComponents = SdkComponents.create(); - DataflowPipelineOptions dataflowOptions = options.as(DataflowPipelineOptions.class); String workerHarnessContainerImageURL = DataflowRunner.getContainerImageForJob(dataflowOptions); + + // This incorrectly puns the worker harness container image (which implements v1beta3 API) + // with the SDK harness image (which implements Fn API). + // + // The same Environment is used in different and contradictory ways, depending on whether + // it is a v1 or v2 job submission. RunnerApi.Environment defaultEnvironmentForDataflow = Environments.createDockerEnvironment(workerHarnessContainerImageURL); - sdkComponents.registerEnvironment( + // The SdkComponents for portable an non-portable job submission must be kept distinct. Both + // need the default environment. + SdkComponents portableComponents = SdkComponents.create(); + portableComponents.registerEnvironment( defaultEnvironmentForDataflow .toBuilder() .addAllDependencies(getDefaultArtifacts()) .addAllCapabilities(Environments.getJavaCapabilities()) .build()); - RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(pipeline, sdkComponents, true); - - LOG.debug("Portable pipeline proto:\n{}", TextFormat.printToString(pipelineProto)); + RunnerApi.Pipeline portablePipelineProto = + PipelineTranslation.toProto(pipeline, portableComponents, false); + // Note that `stageArtifacts` has to be called before `resolveArtifact` because + // `resolveArtifact` updates local paths to staged paths in pipeline proto. 
+ List packages = stageArtifacts(portablePipelineProto); + portablePipelineProto = resolveArtifacts(portablePipelineProto); + LOG.debug("Portable pipeline proto:\n{}", TextFormat.printToString(portablePipelineProto)); + // Stage the portable pipeline proto, retrieving the staged pipeline path, then update + // the options on the new job + // TODO: add an explicit `pipeline` parameter to the submission instead of pipeline options + LOG.info("Staging portable pipeline proto to {}", options.getStagingLocation()); + byte[] serializedProtoPipeline = portablePipelineProto.toByteArray(); - List packages = stageArtifacts(pipelineProto); + DataflowPackage stagedPipeline = + options.getStager().stageToFile(serializedProtoPipeline, PIPELINE_FILE_NAME); + dataflowOptions.setPipelineUrl(stagedPipeline.getLocation()); + // Now rewrite things to be as needed for v1 (mutates the pipeline) + // This way the job submitted is valid for v1 and v2, simultaneously + replaceV1Transforms(pipeline); + // Capture the SdkComponents for look up during step translations + SdkComponents dataflowV1Components = SdkComponents.create(); + dataflowV1Components.registerEnvironment( + defaultEnvironmentForDataflow + .toBuilder() + .addAllDependencies(getDefaultArtifacts()) + .addAllCapabilities(Environments.getJavaCapabilities()) + .build()); + RunnerApi.Pipeline dataflowV1PipelineProto = + PipelineTranslation.toProto(pipeline, dataflowV1Components, true); + LOG.debug("Dataflow v1 pipeline proto:\n{}", TextFormat.printToString(dataflowV1PipelineProto)); // Set a unique client_request_id in the CreateJob request. // This is used to ensure idempotence of job creation across retried @@ -956,24 +1053,19 @@ public DataflowPipelineJob run(Pipeline pipeline) { maybeRegisterDebuggee(dataflowOptions, requestId); JobSpecification jobSpecification = - translator.translate(pipeline, pipelineProto, sdkComponents, this, packages); - - // Stage the pipeline, retrieving the staged pipeline path, then update - // the options on the new job - // TODO: add an explicit `pipeline` parameter to the submission instead of pipeline options - LOG.info("Staging pipeline description to {}", options.getStagingLocation()); - byte[] serializedProtoPipeline = jobSpecification.getPipelineProto().toByteArray(); - DataflowPackage stagedPipeline = - options.getStager().stageToFile(serializedProtoPipeline, PIPELINE_FILE_NAME); - dataflowOptions.setPipelineUrl(stagedPipeline.getLocation()); + translator.translate( + pipeline, dataflowV1PipelineProto, dataflowV1Components, this, packages); - if (!isNullOrEmpty(dataflowOptions.getDataflowWorkerJar())) { + if (!isNullOrEmpty(dataflowOptions.getDataflowWorkerJar()) && !useUnifiedWorker(options)) { List experiments = - dataflowOptions.getExperiments() == null - ? new ArrayList<>() - : new ArrayList<>(dataflowOptions.getExperiments()); - experiments.add("use_staged_dataflow_worker_jar"); - dataflowOptions.setExperiments(experiments); + firstNonNull(dataflowOptions.getExperiments(), Collections.emptyList()); + if (!experiments.contains("use_staged_dataflow_worker_jar")) { + dataflowOptions.setExperiments( + ImmutableList.builder() + .addAll(experiments) + .add("use_staged_dataflow_worker_jar") + .build()); + } } Job newJob = jobSpecification.getJob(); @@ -1022,9 +1114,9 @@ public DataflowPipelineJob run(Pipeline pipeline) { } // Represent the minCpuPlatform pipeline option as an experiment, if not already present. 
- List experiments = - firstNonNull(dataflowOptions.getExperiments(), new ArrayList()); if (!isNullOrEmpty(dataflowOptions.getMinCpuPlatform())) { + List experiments = + firstNonNull(dataflowOptions.getExperiments(), Collections.emptyList()); List minCpuFlags = experiments.stream() @@ -1032,7 +1124,11 @@ public DataflowPipelineJob run(Pipeline pipeline) { .collect(Collectors.toList()); if (minCpuFlags.isEmpty()) { - experiments.add("min_cpu_platform=" + dataflowOptions.getMinCpuPlatform()); + dataflowOptions.setExperiments( + ImmutableList.builder() + .addAll(experiments) + .add("min_cpu_platform=" + dataflowOptions.getMinCpuPlatform()) + .build()); } else { LOG.warn( "Flag min_cpu_platform is defined in both top level PipelineOption, " @@ -1041,15 +1137,24 @@ public DataflowPipelineJob run(Pipeline pipeline) { } } - newJob.getEnvironment().setExperiments(experiments); + newJob + .getEnvironment() + .setExperiments( + ImmutableList.copyOf( + firstNonNull(dataflowOptions.getExperiments(), Collections.emptyList()))); // Set the Docker container image that executes Dataflow worker harness, residing in Google // Container Registry. Translator is guaranteed to create a worker pool prior to this point. - String workerHarnessContainerImage = getContainerImageForJob(options); + // For runner_v1, only worker_harness_container is set. + // For runner_v2, both worker_harness_container and sdk_harness_container are set to the same + // value. + String containerImage = getContainerImageForJob(options); for (WorkerPool workerPool : newJob.getEnvironment().getWorkerPools()) { - workerPool.setWorkerHarnessContainerImage(workerHarnessContainerImage); + workerPool.setWorkerHarnessContainerImage(containerImage); } + configureSdkHarnessContainerImages(options, portablePipelineProto, newJob); + newJob.getEnvironment().setVersion(getEnvironmentVersion(options)); if (hooks != null) { @@ -1145,7 +1250,8 @@ public DataflowPipelineJob run(Pipeline pipeline) { DataflowClient.create(options), jobResult.getId(), options, - jobSpecification.getStepNames()); + jobSpecification != null ? 
jobSpecification.getStepNames() : Collections.emptyMap(), + portablePipelineProto); // If the service returned client request id, the SDK needs to compare it // with the original id generated in the request, if they are not the same @@ -1186,6 +1292,71 @@ public DataflowPipelineJob run(Pipeline pipeline) { return dataflowPipelineJob; } + private static String getContainerImageFromEnvironmentId( + String environmentId, RunnerApi.Pipeline pipelineProto) { + RunnerApi.Environment environment = + pipelineProto.getComponents().getEnvironmentsMap().get(environmentId); + if (!BeamUrns.getUrn(RunnerApi.StandardEnvironments.Environments.DOCKER) + .equals(environment.getUrn())) { + throw new RuntimeException( + "Dataflow can only execute pipeline steps in Docker environments: " + + environment.getUrn()); + } + RunnerApi.DockerPayload dockerPayload; + try { + dockerPayload = RunnerApi.DockerPayload.parseFrom(environment.getPayload()); + } catch (InvalidProtocolBufferException e) { + throw new RuntimeException("Error parsing docker payload.", e); + } + return dockerPayload.getContainerImage(); + } + + @AutoValue + abstract static class EnvironmentInfo { + static EnvironmentInfo create(String environmentId, String containerUrl) { + return new AutoValue_DataflowRunner_EnvironmentInfo(environmentId, containerUrl); + } + + abstract String environmentId(); + + abstract String containerUrl(); + } + + private static List getAllEnvironmentInfo(RunnerApi.Pipeline pipelineProto) { + return pipelineProto.getComponents().getTransformsMap().values().stream() + .map(transform -> transform.getEnvironmentId()) + .filter(environmentId -> !environmentId.isEmpty()) + .distinct() + .map( + environmentId -> + EnvironmentInfo.create( + environmentId, + getContainerImageFromEnvironmentId(environmentId, pipelineProto))) + .collect(Collectors.toList()); + } + + static void configureSdkHarnessContainerImages( + DataflowPipelineOptions options, RunnerApi.Pipeline pipelineProto, Job newJob) { + if (useUnifiedWorker(options)) { + List sdkContainerList = + getAllEnvironmentInfo(pipelineProto).stream() + .map( + environmentInfo -> { + SdkHarnessContainerImage image = new SdkHarnessContainerImage(); + image.setEnvironmentId(environmentInfo.environmentId()); + image.setContainerImage(environmentInfo.containerUrl()); + if (environmentInfo.containerUrl().toLowerCase().contains("python")) { + image.setUseSingleCorePerContainer(true); + } + return image; + }) + .collect(Collectors.toList()); + for (WorkerPool workerPool : newJob.getEnvironment().getWorkerPools()) { + workerPool.setSdkHarnessContainerImages(sdkContainerList); + } + } + } + /** Returns true if the specified experiment is enabled, handling null experiments. */ public static boolean hasExperiment(DataflowPipelineDebugOptions options, String experiment) { List experiments = @@ -1198,7 +1369,7 @@ private static Map getEnvironmentVersion(DataflowPipelineOptions DataflowRunnerInfo runnerInfo = DataflowRunnerInfo.getDataflowRunnerInfo(); String majorVersion; String jobType; - if (hasExperiment(options, "beam_fn_api")) { + if (useUnifiedWorker(options)) { majorVersion = runnerInfo.getFnApiEnvironmentMajorVersion(); jobType = options.isStreaming() ? "FNAPI_STREAMING" : "FNAPI_BATCH"; } else { @@ -1213,7 +1384,7 @@ private static Map getEnvironmentVersion(DataflowPipelineOptions // This method is protected to allow a Google internal subclass to properly // setup overrides. 
@VisibleForTesting - protected void replaceTransforms(Pipeline pipeline) { + protected void replaceV1Transforms(Pipeline pipeline) { boolean streaming = options.isStreaming() || containsUnboundedPCollection(pipeline); // Ensure all outputs of all reads are consumed before potentially replacing any // Read PTransforms @@ -1320,21 +1491,13 @@ void addPCollectionRequiringIndexedFormat(PCollection pcol) { } void maybeRecordPCollectionWithAutoSharding(PCollection pcol) { - if (hasExperiment(options, "beam_fn_api")) { - LOG.warn( - "Runner determined sharding not available in Dataflow for GroupIntoBatches for portable " - + "jobs. Default sharding will be applied."); - return; - } - if (!options.isEnableStreamingEngine()) { - LOG.warn( - "Runner determined sharding not available in Dataflow for GroupIntoBatches for Streaming " - + "Appliance jobs. Default sharding will be applied."); - return; - } - if (hasExperiment(options, "enable_streaming_auto_sharding")) { - pcollectionsRequiringAutoSharding.add(pcol); - } + // Auto-sharding is only supported in Streaming Engine. + checkArgument( + options.isEnableStreamingEngine(), + "Runner determined sharding not available in Dataflow for GroupIntoBatches for" + + " non-Streaming-Engine jobs. In order to use runner determined sharding, please use" + + " --streaming --enable_streaming_engine"); + pcollectionsRequiringAutoSharding.add(pcol); } boolean doesPCollectionRequireAutoSharding(PCollection pcol) { @@ -1384,7 +1547,7 @@ public StreamingPubsubIORead(PubsubUnboundedSource transform) { this.transform = transform; } - PubsubUnboundedSource getOverriddenTransform() { + public PubsubUnboundedSource getOverriddenTransform() { return transform; } @@ -1409,6 +1572,49 @@ protected String getKindString() { } } + private static void translateOverriddenPubsubSourceStep( + PubsubUnboundedSource overriddenTransform, StepTranslationContext stepTranslationContext) { + stepTranslationContext.addInput(PropertyNames.FORMAT, "pubsub"); + if (overriddenTransform.getTopicProvider() != null) { + if (overriddenTransform.getTopicProvider().isAccessible()) { + stepTranslationContext.addInput( + PropertyNames.PUBSUB_TOPIC, overriddenTransform.getTopic().getFullPath()); + } else { + stepTranslationContext.addInput( + PropertyNames.PUBSUB_TOPIC_OVERRIDE, + ((NestedValueProvider) overriddenTransform.getTopicProvider()).propertyName()); + } + } + if (overriddenTransform.getSubscriptionProvider() != null) { + if (overriddenTransform.getSubscriptionProvider().isAccessible()) { + stepTranslationContext.addInput( + PropertyNames.PUBSUB_SUBSCRIPTION, overriddenTransform.getSubscription().getFullPath()); + } else { + stepTranslationContext.addInput( + PropertyNames.PUBSUB_SUBSCRIPTION_OVERRIDE, + ((NestedValueProvider) overriddenTransform.getSubscriptionProvider()).propertyName()); + } + } + if (overriddenTransform.getTimestampAttribute() != null) { + stepTranslationContext.addInput( + PropertyNames.PUBSUB_TIMESTAMP_ATTRIBUTE, overriddenTransform.getTimestampAttribute()); + } + if (overriddenTransform.getIdAttribute() != null) { + stepTranslationContext.addInput( + PropertyNames.PUBSUB_ID_ATTRIBUTE, overriddenTransform.getIdAttribute()); + } + // In both cases, the transform needs to read PubsubMessage. 
However, in case it needs + // the attributes or messageId, we supply an identity "parse fn" so the worker will + // read PubsubMessage's from Windmill and simply pass them around; and in case it + // doesn't need attributes, we're already implicitly using a "Coder" that interprets + // the data as a PubsubMessage's payload. + if (overriddenTransform.getNeedsAttributes() || overriddenTransform.getNeedsMessageId()) { + stepTranslationContext.addInput( + PropertyNames.PUBSUB_SERIALIZED_ATTRIBUTES_FN, + byteArrayToJsonString(serializeToByteArray(new IdentityMessageFn()))); + } + } + /** Rewrite {@link StreamingPubsubIORead} to the appropriate internal node. */ private static class StreamingPubsubIOReadTranslator implements TransformTranslator { @@ -1418,48 +1624,8 @@ public void translate(StreamingPubsubIORead transform, TranslationContext contex checkArgument( context.getPipelineOptions().isStreaming(), "StreamingPubsubIORead is only for streaming pipelines."); - PubsubUnboundedSource overriddenTransform = transform.getOverriddenTransform(); StepTranslationContext stepContext = context.addStep(transform, "ParallelRead"); - stepContext.addInput(PropertyNames.FORMAT, "pubsub"); - if (overriddenTransform.getTopicProvider() != null) { - if (overriddenTransform.getTopicProvider().isAccessible()) { - stepContext.addInput( - PropertyNames.PUBSUB_TOPIC, overriddenTransform.getTopic().getV1Beta1Path()); - } else { - stepContext.addInput( - PropertyNames.PUBSUB_TOPIC_OVERRIDE, - ((NestedValueProvider) overriddenTransform.getTopicProvider()).propertyName()); - } - } - if (overriddenTransform.getSubscriptionProvider() != null) { - if (overriddenTransform.getSubscriptionProvider().isAccessible()) { - stepContext.addInput( - PropertyNames.PUBSUB_SUBSCRIPTION, - overriddenTransform.getSubscription().getV1Beta1Path()); - } else { - stepContext.addInput( - PropertyNames.PUBSUB_SUBSCRIPTION_OVERRIDE, - ((NestedValueProvider) overriddenTransform.getSubscriptionProvider()).propertyName()); - } - } - if (overriddenTransform.getTimestampAttribute() != null) { - stepContext.addInput( - PropertyNames.PUBSUB_TIMESTAMP_ATTRIBUTE, overriddenTransform.getTimestampAttribute()); - } - if (overriddenTransform.getIdAttribute() != null) { - stepContext.addInput( - PropertyNames.PUBSUB_ID_ATTRIBUTE, overriddenTransform.getIdAttribute()); - } - // In both cases, the transform needs to read PubsubMessage. However, in case it needs - // the attributes or messageId, we supply an identity "parse fn" so the worker will - // read PubsubMessage's from Windmill and simply pass them around; and in case it - // doesn't need attributes, we're already implicitly using a "Coder" that interprets - // the data as a PubsubMessage's payload. - if (overriddenTransform.getNeedsAttributes() || overriddenTransform.getNeedsMessageId()) { - stepContext.addInput( - PropertyNames.PUBSUB_SERIALIZED_ATTRIBUTES_FN, - byteArrayToJsonString(serializeToByteArray(new IdentityMessageFn()))); - } + translateOverriddenPubsubSourceStep(transform.getOverriddenTransform(), stepContext); stepContext.addOutput(PropertyNames.OUTPUT, context.getOutput(transform)); } } @@ -1472,39 +1638,6 @@ public PubsubMessage apply(PubsubMessage input) { } } - /** - * Suppress application of {@link PubsubUnboundedSink#expand} in streaming mode so that we can - * instead defer to Windmill's implementation when using Dataflow runner v2. 
- */ - private static class DataflowRunnerV2PubsubSink extends PTransform, PDone> { - - private final PubsubUnboundedSink transform; - - public DataflowRunnerV2PubsubSink(PubsubUnboundedSink transform) { - this.transform = transform; - } - - PubsubUnboundedSink getOverriddenTransform() { - return transform; - } - - @Override - public PDone expand(PCollection input) { - return PDone.in(input.getPipeline()); - } - - @Override - protected String getKindString() { - return "DataflowRunnerV2PubsubSink"; - } - - static { - DataflowPipelineTranslator.registerTransformTranslator( - DataflowRunnerV2PubsubSink.class, - new StreamingPubsubSinkTranslators.DataflowRunnerV2PubsubSinkTranslator()); - } - } - /** * Suppress application of {@link PubsubUnboundedSink#expand} in streaming mode so that we can * instead defer to Windmill's implementation. @@ -1556,20 +1689,6 @@ public void translate(StreamingPubsubIOWrite transform, TranslationContext conte } } - /** Rewrite {@link DataflowRunnerV2PubsubSink} to the appropriate internal node. */ - static class DataflowRunnerV2PubsubSinkTranslator - implements TransformTranslator { - @Override - public void translate(DataflowRunnerV2PubsubSink transform, TranslationContext context) { - checkArgument( - context.getPipelineOptions().isStreaming(), - "StreamingPubsubIOWrite is only for streaming pipelines."); - StepTranslationContext stepContext = context.addStep(transform, "ParallelWrite"); - StreamingPubsubSinkTranslators.translate( - transform.getOverriddenTransform(), stepContext, context.getInput(transform)); - } - } - private static void translate( PubsubUnboundedSink overriddenTransform, StepTranslationContext stepContext, @@ -1577,7 +1696,7 @@ private static void translate( stepContext.addInput(PropertyNames.FORMAT, "pubsub"); if (overriddenTransform.getTopicProvider().isAccessible()) { stepContext.addInput( - PropertyNames.PUBSUB_TOPIC, overriddenTransform.getTopic().getV1Beta1Path()); + PropertyNames.PUBSUB_TOPIC, overriddenTransform.getTopic().getFullPath()); } else { stepContext.addInput( PropertyNames.PUBSUB_TOPIC_OVERRIDE, @@ -1712,8 +1831,7 @@ private static class ImpulseTranslator implements TransformTranslator { @Override public void translate(Impulse transform, TranslationContext context) { - if (context.getPipelineOptions().isStreaming() - && (!context.isFnApi() || !context.isStreamingEngine())) { + if (context.getPipelineOptions().isStreaming()) { StepTranslationContext stepContext = context.addStep(transform, "ParallelRead"); stepContext.addInput(PropertyNames.FORMAT, "pubsub"); stepContext.addInput(PropertyNames.PUBSUB_SUBSCRIPTION, "_starting_signal/"); @@ -2047,54 +2165,6 @@ public Map, ReplacementOutput> mapOutputs( } } - /** - * A replacement {@link PTransform} for {@link PubsubUnboundedSink} when using dataflow runner v2. 
- */ - private static class DataflowWriteToPubsubForRunnerV2 - extends PTransform, PDone> { - - private final PubsubUnboundedSink transform; - - public DataflowWriteToPubsubForRunnerV2(PubsubUnboundedSink transform) { - this.transform = transform; - } - - @Override - public PDone expand(PCollection input) { - input - .apply( - "Output Serialized PubsubMessage Proto", - MapElements.into(new TypeDescriptor() {}) - .via(new PubsubMessages.ParsePayloadAsPubsubMessageProto())) - .setCoder(ByteArrayCoder.of()) - .apply(new DataflowRunnerV2PubsubSink(transform)); - - return PDone.in(input.getPipeline()); - } - } - - /** - * A {@link PTransformOverrideFactory} to provide replacement {@link PTransform} for {@link - * PubsubUnboundedSink} when using dataflow runner v2. - */ - private static class DataflowWriteToPubsubRunnerV2OverrideFactory - implements PTransformOverrideFactory, PDone, PubsubUnboundedSink> { - - @Override - public PTransformReplacement, PDone> getReplacementTransform( - AppliedPTransform, PDone, PubsubUnboundedSink> transform) { - return PTransformReplacement.of( - PTransformReplacements.getSingletonMainInput(transform), - new DataflowWriteToPubsubForRunnerV2(transform.getTransform())); - } - - @Override - public Map, ReplacementOutput> mapOutputs( - Map, PCollection> outputs, PDone newOutput) { - return Collections.emptyMap(); - } - } - @VisibleForTesting static class StreamingShardedWriteFactory implements PTransformOverrideFactory< @@ -2160,58 +2230,97 @@ public Map, ReplacementOutput> mapOutputs( @VisibleForTesting static String getContainerImageForJob(DataflowPipelineOptions options) { - String workerHarnessContainerImage = options.getWorkerHarnessContainerImage(); + String containerImage = options.getSdkContainerImage(); + + if (containerImage == null) { + // If not set, construct and return default image URL. + return getDefaultContainerImageUrl(options); + } else if (containerImage.contains("IMAGE")) { + // Replace placeholder with default image name + return containerImage.replace("IMAGE", getDefaultContainerImageNameForJob(options)); + } else { + return containerImage; + } + } + + /** Construct the default Dataflow container full image URL. */ + static String getDefaultContainerImageUrl(DataflowPipelineOptions options) { + DataflowRunnerInfo dataflowRunnerInfo = DataflowRunnerInfo.getDataflowRunnerInfo(); + return String.format( + "%s/%s:%s", + dataflowRunnerInfo.getContainerImageBaseRepository(), + getDefaultContainerImageNameForJob(options), + getDefaultContainerVersion(options)); + } + /** + * Construct the default Dataflow container image name based on pipeline type and Java version. + */ + static String getDefaultContainerImageNameForJob(DataflowPipelineOptions options) { Environments.JavaVersion javaVersion = Environments.getJavaVersion(); - String javaVersionId = - (javaVersion == Environments.JavaVersion.v8) ? 
"java" : javaVersion.toString(); - if (!workerHarnessContainerImage.contains("IMAGE")) { - return workerHarnessContainerImage; - } else if (hasExperiment(options, "beam_fn_api")) { - return workerHarnessContainerImage.replace("IMAGE", "java"); + if (useUnifiedWorker(options)) { + return String.format("beam_%s_sdk", javaVersion.name()); } else if (options.isStreaming()) { - return workerHarnessContainerImage.replace( - "IMAGE", String.format("beam-%s-streaming", javaVersionId)); + return String.format("beam-%s-streaming", javaVersion.legacyName()); } else { - return workerHarnessContainerImage.replace( - "IMAGE", String.format("beam-%s-batch", javaVersionId)); + return String.format("beam-%s-batch", javaVersion.legacyName()); } } - static void verifyDoFnSupportedBatch(DoFn fn) { - verifyDoFnSupported(fn, false); + /** + * Construct the default Dataflow container image name based on pipeline type and Java version. + */ + static String getDefaultContainerVersion(DataflowPipelineOptions options) { + DataflowRunnerInfo dataflowRunnerInfo = DataflowRunnerInfo.getDataflowRunnerInfo(); + ReleaseInfo releaseInfo = ReleaseInfo.getReleaseInfo(); + if (releaseInfo.isDevSdkVersion()) { + if (useUnifiedWorker(options)) { + return dataflowRunnerInfo.getFnApiDevContainerVersion(); + } + return dataflowRunnerInfo.getLegacyDevContainerVersion(); + } + return releaseInfo.getSdkVersion(); } - static void verifyDoFnSupportedStreaming(DoFn fn) { - verifyDoFnSupported(fn, true); + static boolean useUnifiedWorker(DataflowPipelineOptions options) { + return hasExperiment(options, "beam_fn_api") + || hasExperiment(options, "use_runner_v2") + || hasExperiment(options, "use_unified_worker"); } - static void verifyDoFnSupported(DoFn fn, boolean streaming) { - if (DoFnSignatures.usesSetState(fn)) { - // https://issues.apache.org/jira/browse/BEAM-1479 - throw new UnsupportedOperationException( - String.format( - "%s does not currently support %s", - DataflowRunner.class.getSimpleName(), SetState.class.getSimpleName())); - } - if (DoFnSignatures.usesMapState(fn)) { - // https://issues.apache.org/jira/browse/BEAM-1474 - throw new UnsupportedOperationException( - String.format( - "%s does not currently support %s", - DataflowRunner.class.getSimpleName(), MapState.class.getSimpleName())); - } + static boolean useStreamingEngine(DataflowPipelineOptions options) { + return hasExperiment(options, GcpOptions.STREAMING_ENGINE_EXPERIMENT) + || hasExperiment(options, GcpOptions.WINDMILL_SERVICE_EXPERIMENT); + } + + static void verifyDoFnSupported(DoFn fn, boolean streaming, boolean streamingEngine) { if (streaming && DoFnSignatures.requiresTimeSortedInput(fn)) { throw new UnsupportedOperationException( String.format( "%s does not currently support @RequiresTimeSortedInput in streaming mode.", DataflowRunner.class.getSimpleName())); } + if (DoFnSignatures.usesSetState(fn)) { + if (streaming && streamingEngine) { + throw new UnsupportedOperationException( + String.format( + "%s does not currently support %s when using streaming engine", + DataflowRunner.class.getSimpleName(), SetState.class.getSimpleName())); + } + } + if (DoFnSignatures.usesMapState(fn)) { + if (streaming && streamingEngine) { + throw new UnsupportedOperationException( + String.format( + "%s does not currently support %s when using streaming engine", + DataflowRunner.class.getSimpleName(), MapState.class.getSimpleName())); + } + } } static void verifyStateSupportForWindowingStrategy(WindowingStrategy strategy) { // https://issues.apache.org/jira/browse/BEAM-2507 - 
if (!strategy.getWindowFn().isNonMerging()) {
+    if (strategy.needsMerge()) {
       throw new UnsupportedOperationException(
           String.format(
               "%s does not currently support state or timers with merging windows",
diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunnerInfo.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunnerInfo.java
index f54643f84281..fed503db49e0 100644
--- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunnerInfo.java
+++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunnerInfo.java
@@ -41,7 +41,9 @@ public final class DataflowRunnerInfo extends ReleaseInfo {
       "fnapi.environment.major.version";
   private static final String LEGACY_ENVIRONMENT_MAJOR_VERSION_KEY =
       "legacy.environment.major.version";
-  private static final String CONTAINER_VERSION_KEY = "container.version";
+  private static final String CONTAINER_FNAPI_VERSION_KEY = "fnapi.container.version";
+  private static final String CONTAINER_LEGACY_VERSION_KEY = "legacy.container.version";
+  private static final String CONTAINER_BASE_REPOSITORY_KEY = "container.base_repository";
   private static class LazyInit {
     private static final DataflowRunnerInfo INSTANCE;
@@ -99,10 +101,25 @@ public String getFnApiEnvironmentMajorVersion() {
     return properties.get(FNAPI_ENVIRONMENT_MAJOR_VERSION_KEY);
   }
-  /** Provides the container version that will be used for constructing harness image paths. */
-  public String getContainerVersion() {
-    checkState(properties.containsKey(CONTAINER_VERSION_KEY), "Unknown container version");
-    return properties.get(CONTAINER_VERSION_KEY);
+  /** Provides the version/tag for dev SDK FnAPI container image. */
+  public String getFnApiDevContainerVersion() {
+    checkState(
+        properties.containsKey(CONTAINER_FNAPI_VERSION_KEY), "Unknown FnAPI container version");
+    return properties.get(CONTAINER_FNAPI_VERSION_KEY);
+  }
+
+  /** Provides the version/tag for dev SDK legacy container image. */
+  public String getLegacyDevContainerVersion() {
+    checkState(
+        properties.containsKey(CONTAINER_LEGACY_VERSION_KEY), "Unknown legacy container version");
+    return properties.get(CONTAINER_LEGACY_VERSION_KEY);
+  }
+
+  /** Provides the base repository for constructing the container image path. */
+  public String getContainerImageBaseRepository() {
+    checkState(
+        properties.containsKey(CONTAINER_BASE_REPOSITORY_KEY), "Unknown container base repository");
+    return properties.get(CONTAINER_BASE_REPOSITORY_KEY);
   }
 
   @Override
diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/GroupIntoBatchesOverride.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/GroupIntoBatchesOverride.java
index 1169527787d5..f46896f859a6 100644
--- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/GroupIntoBatchesOverride.java
+++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/GroupIntoBatchesOverride.java
@@ -18,27 +18,29 @@ package org.apache.beam.runners.dataflow;
 import java.nio.ByteBuffer;
-import java.util.Iterator;
 import java.util.List;
 import java.util.Map;
 import java.util.UUID;
 import org.apache.beam.runners.core.construction.PTransformReplacements;
 import org.apache.beam.runners.core.construction.ReplacementOutputs;
+import org.apache.beam.sdk.coders.Coder;
 import org.apache.beam.sdk.coders.KvCoder;
 import org.apache.beam.sdk.runners.AppliedPTransform;
 import org.apache.beam.sdk.runners.PTransformOverrideFactory;
 import org.apache.beam.sdk.transforms.DoFn;
 import org.apache.beam.sdk.transforms.GroupByKey;
 import org.apache.beam.sdk.transforms.GroupIntoBatches;
+import org.apache.beam.sdk.transforms.GroupIntoBatches.BatchingParams;
 import org.apache.beam.sdk.transforms.MapElements;
 import org.apache.beam.sdk.transforms.PTransform;
 import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.SerializableFunction;
 import org.apache.beam.sdk.transforms.SimpleFunction;
 import org.apache.beam.sdk.util.ShardedKey;
 import org.apache.beam.sdk.values.KV;
 import org.apache.beam.sdk.values.PCollection;
 import org.apache.beam.sdk.values.TupleTag;
-import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterators;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists;
 @SuppressWarnings({
   "rawtypes" // TODO(https://issues.apache.org/jira/browse/BEAM-10556)
@@ -57,7 +59,7 @@ static class BatchGroupIntoBatchesOverrideFactory
           transform) {
       return PTransformReplacement.of(
           PTransformReplacements.getSingletonMainInput(transform),
-          new BatchGroupIntoBatches<>(transform.getTransform().getBatchSize()));
+          new BatchGroupIntoBatches<>(transform.getTransform().getBatchingParams()));
     }
 
     @Override
@@ -70,15 +72,20 @@ public Map, ReplacementOutput> mapOutputs(
   /** Specialized implementation of {@link GroupIntoBatches} for bounded Dataflow pipelines.
*/ static class BatchGroupIntoBatches extends PTransform>, PCollection>>> { + private final BatchingParams batchingParams; - private final long batchSize; - - private BatchGroupIntoBatches(long batchSize) { - this.batchSize = batchSize; + private BatchGroupIntoBatches(BatchingParams batchingParams) { + this.batchingParams = batchingParams; } @Override public PCollection>> expand(PCollection> input) { + KvCoder inputCoder = (KvCoder) input.getCoder(); + final Coder valueCoder = (Coder) inputCoder.getCoderArguments().get(1); + final SerializableFunction weigher = batchingParams.getWeigher(valueCoder); + long maxBatchSizeElements = batchingParams.getBatchSize(); + long maxBatchSizeBytes = batchingParams.getBatchSizeBytes(); + return input .apply("GroupAll", GroupByKey.create()) .apply( @@ -87,14 +94,23 @@ public PCollection>> expand(PCollection> input) { new DoFn>, KV>>() { @ProcessElement public void process(ProcessContext c) { - // Iterators.partition lazily creates the partitions as they are accessed - // allowing it to partition very large iterators. - Iterator> iterator = - Iterators.partition(c.element().getValue().iterator(), (int) batchSize); - - // Note that GroupIntoBatches only outputs when the batch is non-empty. - while (iterator.hasNext()) { - c.output(KV.of(c.element().getKey(), iterator.next())); + List currentBatch = Lists.newArrayList(); + long batchSizeBytes = 0; + for (V element : c.element().getValue()) { + currentBatch.add(element); + if (weigher != null) { + batchSizeBytes += weigher.apply(element); + } + if (currentBatch.size() == maxBatchSizeElements + || (maxBatchSizeBytes != Long.MAX_VALUE + && batchSizeBytes >= maxBatchSizeBytes)) { + c.output(KV.of(c.element().getKey(), currentBatch)); + currentBatch = Lists.newArrayList(); + batchSizeBytes = 0; + } + } + if (!currentBatch.isEmpty()) { + c.output(KV.of(c.element().getKey(), currentBatch)); } } })); @@ -117,7 +133,7 @@ static class BatchGroupIntoBatchesWithShardedKeyOverrideFactory transform) { return PTransformReplacement.of( PTransformReplacements.getSingletonMainInput(transform), - new BatchGroupIntoBatchesWithShardedKey<>(transform.getTransform().getBatchSize())); + new BatchGroupIntoBatchesWithShardedKey<>(transform.getTransform().getBatchingParams())); } @Override @@ -135,15 +151,15 @@ public Map, ReplacementOutput> mapOutputs( static class BatchGroupIntoBatchesWithShardedKey extends PTransform>, PCollection, Iterable>>> { - private final long batchSize; + private final BatchingParams batchingParams; - private BatchGroupIntoBatchesWithShardedKey(long batchSize) { - this.batchSize = batchSize; + private BatchGroupIntoBatchesWithShardedKey(BatchingParams batchingParams) { + this.batchingParams = batchingParams; } @Override public PCollection, Iterable>> expand(PCollection> input) { - return shardKeys(input).apply(new BatchGroupIntoBatches<>(batchSize)); + return shardKeys(input).apply(new BatchGroupIntoBatches<>(batchingParams)); } } diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/PrimitiveParDoSingleFactory.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/PrimitiveParDoSingleFactory.java index a612fb6f27ed..8b844278346d 100644 --- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/PrimitiveParDoSingleFactory.java +++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/PrimitiveParDoSingleFactory.java @@ -19,7 +19,6 @@ import static 
org.apache.beam.runners.core.construction.PTransformTranslation.PAR_DO_TRANSFORM_URN; import static org.apache.beam.runners.core.construction.ParDoTranslation.translateTimerFamilySpec; -import static org.apache.beam.sdk.options.ExperimentalOptions.hasExperiment; import static org.apache.beam.sdk.transforms.reflect.DoFnSignatures.getStateSpecOrThrow; import static org.apache.beam.sdk.transforms.reflect.DoFnSignatures.getTimerFamilySpecOrThrow; import static org.apache.beam.sdk.transforms.reflect.DoFnSignatures.getTimerSpecOrThrow; @@ -41,6 +40,7 @@ import org.apache.beam.runners.core.construction.SdkComponents; import org.apache.beam.runners.core.construction.SingleInputOutputOverrideFactory; import org.apache.beam.runners.core.construction.TransformPayloadTranslatorRegistrar; +import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.KvCoder; import org.apache.beam.sdk.runners.AppliedPTransform; @@ -175,7 +175,8 @@ private static RunnerApi.ParDoPayload payloadForParDoSingle( final DoFn doFn = parDo.getFn(); final DoFnSignature signature = DoFnSignatures.getSignature(doFn.getClass()); - if (!hasExperiment(transform.getPipeline().getOptions(), "beam_fn_api")) { + if (!DataflowRunner.useUnifiedWorker( + transform.getPipeline().getOptions().as(DataflowPipelineOptions.class))) { checkArgument( !signature.processElement().isSplittable(), String.format( diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/TestDataflowRunner.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/TestDataflowRunner.java index 27a33a9ebfba..c85529005573 100644 --- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/TestDataflowRunner.java +++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/TestDataflowRunner.java @@ -114,7 +114,11 @@ DataflowPipelineJob run(Pipeline pipeline, DataflowRunner runner) { new ErrorMonitorMessagesHandler(job, new MonitoringUtil.LoggingHandler()); if (options.isStreaming()) { - jobSuccess = waitForStreamingJobTermination(job, messageHandler); + if (options.isBlockOnRun()) { + jobSuccess = waitForStreamingJobTermination(job, messageHandler); + } else { + jobSuccess = true; + } // No metrics in streaming allAssertionsPassed = Optional.absent(); } else { @@ -204,11 +208,17 @@ private boolean waitForBatchJobTermination( private static String errorMessage( DataflowPipelineJob job, ErrorMonitorMessagesHandler messageHandler) { - return Strings.isNullOrEmpty(messageHandler.getErrorMessage()) - ? String.format( - "Dataflow job %s terminated in state %s but did not return a failure reason.", - job.getJobId(), job.getState()) - : messageHandler.getErrorMessage(); + if (!Strings.isNullOrEmpty(messageHandler.getErrorMessage())) { + return messageHandler.getErrorMessage(); + } else { + State state = job.getState(); + return String.format( + "Dataflow job %s terminated in state %s but did not return a failure reason.", + job.getJobId(), + state == State.UNRECOGNIZED + ? 
String.format("UNRECOGNIZED (%s)", job.getLatestStateString()) + : state.toString()); + } } @VisibleForTesting diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/TransformTranslator.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/TransformTranslator.java index d354d53b9ab1..8ea7378bd8a7 100644 --- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/TransformTranslator.java +++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/TransformTranslator.java @@ -49,11 +49,6 @@ public interface TransformTranslator { * including reading and writing the values of {@link PCollection}s and side inputs. */ interface TranslationContext { - default boolean isFnApi() { - List experiments = getPipelineOptions().getExperiments(); - return experiments != null && experiments.contains("beam_fn_api"); - } - default boolean isStreamingEngine() { List experiments = getPipelineOptions().getExperiments(); return experiments != null diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineDebugOptions.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineDebugOptions.java index 4811ec008189..f0c16ff0aa43 100644 --- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineDebugOptions.java +++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineDebugOptions.java @@ -220,6 +220,16 @@ public Dataflow create(PipelineOptions options) { void setWorkerCacheMb(Integer value); + /** + * The amount of time before UnboundedReaders are considered idle and closed during streaming + * execution. + */ + @Description("The amount of time before UnboundedReaders are uncached, in seconds.") + @Default.Integer(60) + Integer getReaderCacheTimeoutSec(); + + void setReaderCacheTimeoutSec(Integer value); + /** * CAUTION: This option implies dumpHeapOnOOM, and has similar caveats. Specifically, heap dumps * can of comparable size to the default boot disk. Consider increasing the boot disk size before diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java index cd5831a411ae..7d3be45e647b 100644 --- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java +++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java @@ -17,6 +17,7 @@ */ package org.apache.beam.runners.dataflow.options; +import java.util.List; import java.util.Map; import org.apache.beam.runners.dataflow.DataflowRunner; import org.apache.beam.sdk.annotations.Experimental; @@ -94,8 +95,6 @@ public interface DataflowPipelineOptions void setUpdate(boolean value); /** If set, the snapshot from which the job should be created. */ - @Hidden - @Experimental @Description("If set, the snapshot from which the job should be created.") String getCreateFromSnapshot(); @@ -109,6 +108,17 @@ public interface DataflowPipelineOptions void setTemplateLocation(String value); + /** + * Service options are set by the user and configure the service. 
This decouples service side + * feature availability from the Apache Beam release cycle. + */ + @Description( + "Service options are set by the user and configure the service. This " + + "decouples service side feature availability from the Apache Beam release cycle.") + List getDataflowServiceOptions(); + + void setDataflowServiceOptions(List options); + /** Run the job as a specific service account, instead of the default GCE robot. */ @Hidden @Experimental diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineWorkerPoolOptions.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineWorkerPoolOptions.java index 89892d0ab003..1978b17fee94 100644 --- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineWorkerPoolOptions.java +++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineWorkerPoolOptions.java @@ -18,21 +18,17 @@ package org.apache.beam.runners.dataflow.options; import com.fasterxml.jackson.annotation.JsonIgnore; -import java.util.List; -import org.apache.beam.runners.dataflow.DataflowRunnerInfo; import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.annotations.Experimental.Kind; import org.apache.beam.sdk.extensions.gcp.options.GcpOptions; -import org.apache.beam.sdk.options.Default; -import org.apache.beam.sdk.options.DefaultValueFactory; import org.apache.beam.sdk.options.Description; +import org.apache.beam.sdk.options.FileStagingOptions; import org.apache.beam.sdk.options.Hidden; -import org.apache.beam.sdk.options.PipelineOptions; import org.checkerframework.checker.nullness.qual.Nullable; /** Options that are used to configure the Dataflow pipeline worker pool. */ @Description("Options that are used to configure the Dataflow pipeline worker pool.") -public interface DataflowPipelineWorkerPoolOptions extends GcpOptions { +public interface DataflowPipelineWorkerPoolOptions extends GcpOptions, FileStagingOptions { /** * Number of workers to use when executing the Dataflow job. Note that selection of an autoscaling * algorithm other then {@code NONE} will affect the size of the worker pool. If left unspecified, @@ -112,30 +108,32 @@ public String getAlgorithm() { void setDiskSizeGb(int value); - /** - * Docker container image that executes Dataflow worker harness, residing in Google Container - * Registry. - */ - @Default.InstanceFactory(WorkerHarnessContainerImageFactory.class) + /** Container image used as Dataflow worker harness image. */ + /** @deprecated Use {@link #getSdkContainerImage} instead. */ @Description( - "Docker container image that executes Dataflow worker harness, residing in Google " - + " Container Registry.") + "Container image used to configure a Dataflow worker. " + + "Can only be used for official Dataflow container images. " + + "Prefer using sdkContainerImage instead.") + @Deprecated @Hidden String getWorkerHarnessContainerImage(); + /** @deprecated Use {@link #setSdkContainerImage} instead. */ + @Deprecated + @Hidden void setWorkerHarnessContainerImage(String value); /** - * Returns the default Docker container image that executes Dataflow worker harness, residing in - * Google Container Registry. + * Container image used to configure SDK execution environment on worker. Used for custom + * containers on portable pipelines only. 
*/ - class WorkerHarnessContainerImageFactory implements DefaultValueFactory { - @Override - public String create(PipelineOptions options) { - String containerVersion = DataflowRunnerInfo.getDataflowRunnerInfo().getContainerVersion(); - return String.format("gcr.io/cloud-dataflow/v1beta3/IMAGE:%s", containerVersion); - } - } + @Description( + "Container image used to configure the SDK execution environment of " + + "pipeline code on a worker. For non-portable pipelines, can only be " + + "used for official Dataflow container images.") + String getSdkContainerImage(); + + void setSdkContainerImage(String value); /** * GCE network for launching @@ -184,21 +182,6 @@ public String create(PipelineOptions options) { void setWorkerMachineType(String value); - /** - * List of local files to make available to workers. - * - *

<p>Files are placed on the worker's classpath. - * - *

    The default value is the list of jars from the main program's classpath. - */ - @Description( - "Files to stage on GCS and make available to workers. " - + "Files are placed on the worker's classpath. " - + "The default value is all files from the classpath.") - List getFilesToStage(); - - void setFilesToStage(List value); - /** * Specifies what type of persistent disk is used. The value is a full disk type resource, e.g., * compute.googleapis.com/projects//zones//diskTypes/pd-ssd. For more information, see the customWindow() { + return new CloudObjectTranslator() { + @Override + public CloudObject toCloudObject( + TimestampPrefixingWindowCoder target, SdkComponents sdkComponents) { + CloudObject result = CloudObject.forClassName(CloudObjectKinds.KIND_CUSTOM_WINDOW); + return addComponents(result, target.getComponents(), sdkComponents); + } + + @Override + public TimestampPrefixingWindowCoder fromCloudObject(CloudObject cloudObject) { + List> components = getComponents(cloudObject); + checkArgument(components.size() == 1, "Expecting 1 component, got %s", components.size()); + return TimestampPrefixingWindowCoder.of((Coder) components.get(0)); + } + + @Override + public Class getSupportedClass() { + return TimestampPrefixingWindowCoder.class; + } + + @Override + public String cloudObjectClassName() { + return CloudObjectKinds.KIND_CUSTOM_WINDOW; + } + }; + } + /** * Returns a {@link CloudObjectTranslator} that produces a {@link CloudObject} that is of kind * "windowed_value". diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/CloudObjects.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/CloudObjects.java index 766a32d695c5..56c4d5736397 100644 --- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/CloudObjects.java +++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/CloudObjects.java @@ -31,6 +31,7 @@ import org.apache.beam.sdk.coders.IterableCoder; import org.apache.beam.sdk.coders.KvCoder; import org.apache.beam.sdk.coders.LengthPrefixCoder; +import org.apache.beam.sdk.coders.TimestampPrefixingWindowCoder; import org.apache.beam.sdk.coders.VarLongCoder; import org.apache.beam.sdk.transforms.windowing.GlobalWindow; import org.apache.beam.sdk.transforms.windowing.IntervalWindow.IntervalWindowCoder; @@ -58,7 +59,8 @@ private CloudObjects() {} Timer.Coder.class, LengthPrefixCoder.class, GlobalWindow.Coder.class, - FullWindowedValueCoder.class); + FullWindowedValueCoder.class, + TimestampPrefixingWindowCoder.class); static final Map, CloudObjectTranslator> CODER_TRANSLATORS = populateCoderTranslators(); diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/DefaultCoderCloudObjectTranslatorRegistrar.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/DefaultCoderCloudObjectTranslatorRegistrar.java index e572e250d44d..92070ef19c61 100644 --- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/DefaultCoderCloudObjectTranslatorRegistrar.java +++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/DefaultCoderCloudObjectTranslatorRegistrar.java @@ -65,6 +65,7 @@ public class DefaultCoderCloudObjectTranslatorRegistrar ImmutableList.of( CloudObjectTranslators.globalWindow(), CloudObjectTranslators.intervalWindow(), + CloudObjectTranslators.customWindow(), 
CloudObjectTranslators.bytes(), CloudObjectTranslators.varInt(), CloudObjectTranslators.lengthPrefix(), @@ -74,6 +75,7 @@ public class DefaultCoderCloudObjectTranslatorRegistrar new AvroCoderCloudObjectTranslator(), new SerializableCoderCloudObjectTranslator(), new SchemaCoderCloudObjectTranslator(), + new RowCoderCloudObjectTranslator(), CloudObjectTranslators.iterableLike(CollectionCoder.class), CloudObjectTranslators.iterableLike(ListCoder.class), CloudObjectTranslators.iterableLike(SetCoder.class), diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/MonitoringUtil.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/MonitoringUtil.java index ae6eee19443f..851197abfeec 100644 --- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/MonitoringUtil.java +++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/MonitoringUtil.java @@ -30,13 +30,10 @@ import java.util.ArrayList; import java.util.Comparator; import java.util.List; -import java.util.Map; import org.apache.beam.runners.dataflow.DataflowClient; import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions; import org.apache.beam.sdk.PipelineResult.State; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Strings; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.checkerframework.checker.nullness.qual.Nullable; import org.joda.time.Instant; import org.slf4j.Logger; @@ -45,24 +42,12 @@ /** A helper class for monitoring jobs submitted to the service. */ public class MonitoringUtil { + private static final Logger LOG = LoggerFactory.getLogger(MonitoringUtil.class); + private static final String GCLOUD_DATAFLOW_PREFIX = "gcloud dataflow"; private static final String ENDPOINT_OVERRIDE_ENV_VAR = "CLOUDSDK_API_ENDPOINT_OVERRIDES_DATAFLOW"; - private static final Map DATAFLOW_STATE_TO_JOB_STATE = - ImmutableMap.builder() - .put("JOB_STATE_UNKNOWN", State.UNKNOWN) - .put("JOB_STATE_STOPPED", State.STOPPED) - .put("JOB_STATE_RUNNING", State.RUNNING) - .put("JOB_STATE_DONE", State.DONE) - .put("JOB_STATE_FAILED", State.FAILED) - .put("JOB_STATE_CANCELLED", State.CANCELLED) - .put("JOB_STATE_UPDATED", State.UPDATED) - // A DRAINING job is still running - the closest mapping is RUNNING. - .put("JOB_STATE_DRAINING", State.RUNNING) - // A DRAINED job has successfully terminated - the closest mapping is DONE. 
- .put("JOB_STATE_DRAINED", State.DONE) - .build(); private static final String JOB_MESSAGE_ERROR = "JOB_MESSAGE_ERROR"; private static final String JOB_MESSAGE_WARNING = "JOB_MESSAGE_WARNING"; private static final String JOB_MESSAGE_BASIC = "JOB_MESSAGE_BASIC"; @@ -218,7 +203,38 @@ public static String getGcloudCancelCommand(DataflowPipelineOptions options, Str jobId); } - public static State toState(String stateName) { - return MoreObjects.firstNonNull(DATAFLOW_STATE_TO_JOB_STATE.get(stateName), State.UNRECOGNIZED); + public static State toState(@Nullable String stateName) { + if (stateName == null) { + return State.UNRECOGNIZED; + } + + switch (stateName) { + case "JOB_STATE_UNKNOWN": + return State.UNKNOWN; + case "JOB_STATE_STOPPED": + return State.STOPPED; + case "JOB_STATE_FAILED": + return State.FAILED; + case "JOB_STATE_CANCELLED": + return State.CANCELLED; + case "JOB_STATE_UPDATED": + return State.UPDATED; + + case "JOB_STATE_RUNNING": + case "JOB_STATE_PENDING": // Job has not yet started; closest mapping is RUNNING + case "JOB_STATE_DRAINING": // Job is still active; the closest mapping is RUNNING + case "JOB_STATE_CANCELLING": // Job is still active; the closest mapping is RUNNING + return State.RUNNING; + + case "JOB_STATE_DONE": + case "JOB_STATE_DRAINED": // Job has successfully terminated; closest mapping is DONE + return State.DONE; + default: + LOG.warn( + "Unrecognized state from Dataflow service: {}." + + " This is likely due to using an older version of Beam.", + stateName); + return State.UNRECOGNIZED; + } } } diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PropertyNames.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PropertyNames.java index f48bcacfab57..f1a8993a3be3 100644 --- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PropertyNames.java +++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PropertyNames.java @@ -64,6 +64,7 @@ public class PropertyNames { public static final String VALUE = "value"; public static final String WINDOWING_STRATEGY = "windowing_strategy"; public static final String DISPLAY_DATA = "display_data"; + public static final String RESOURCE_HINTS = "resource_hints"; public static final String PRESERVES_KEYS = "preserves_keys"; /** * @deprecated Uses the incorrect terminology. {@link #RESTRICTION_ENCODING}. Should be removed diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/RowCoderCloudObjectTranslator.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/RowCoderCloudObjectTranslator.java new file mode 100644 index 000000000000..add8d4be4459 --- /dev/null +++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/RowCoderCloudObjectTranslator.java @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
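// ---- Illustrative aside (editor's sketch, not part of this patch) ----
// The switch-based toState(...) above degrades gracefully instead of throwing when the
// service reports a state this SDK predates. Expected behaviour, per the mapping:
//
//   MonitoringUtil.toState("JOB_STATE_DRAINING");       // -> State.RUNNING (job still active)
//   MonitoringUtil.toState("JOB_STATE_DRAINED");        // -> State.DONE
//   MonitoringUtil.toState(null);                       // -> State.UNRECOGNIZED
//   MonitoringUtil.toState("JOB_STATE_SOME_NEW_STATE"); // -> State.UNRECOGNIZED, plus a warning log
//                                                       //    ("JOB_STATE_SOME_NEW_STATE" is a made-up name)
// ----------------------------------------------------------------------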
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.dataflow.util; + +import java.io.IOException; +import java.util.UUID; +import javax.annotation.Nullable; +import org.apache.beam.model.pipeline.v1.SchemaApi; +import org.apache.beam.runners.core.construction.SdkComponents; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; +import org.apache.beam.sdk.coders.RowCoder; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.SchemaTranslation; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.util.JsonFormat; + +/** Translator for row coders. */ +@Experimental(Kind.SCHEMAS) +public class RowCoderCloudObjectTranslator implements CloudObjectTranslator { + private static final String SCHEMA = "schema"; + + @Override + public Class getSupportedClass() { + return RowCoder.class; + } + + /** Convert to a cloud object. */ + @Override + public CloudObject toCloudObject(RowCoder target, SdkComponents sdkComponents) { + CloudObject base = CloudObject.forClass(RowCoder.class); + try { + Structs.addString( + base, + SCHEMA, + JsonFormat.printer().print(SchemaTranslation.schemaToProto(target.getSchema(), true))); + } catch (Exception e) { + throw new RuntimeException(e); + } + return base; + } + + /** Convert from a cloud object. */ + @Override + public RowCoder fromCloudObject(CloudObject cloudObject) { + try { + SchemaApi.Schema.Builder schemaBuilder = SchemaApi.Schema.newBuilder(); + JsonFormat.parser().merge(Structs.getString(cloudObject, SCHEMA), schemaBuilder); + Schema schema = SchemaTranslation.schemaFromProto(schemaBuilder.build()); + @Nullable UUID uuid = schema.getUUID(); + if (schema.isEncodingPositionsOverridden() && uuid != null) { + RowCoder.overrideEncodingPositions(uuid, schema.getEncodingPositions()); + } + return RowCoder.of(schema); + } catch (IOException e) { + throw new RuntimeException(e); + } + } + + @Override + public String cloudObjectClassName() { + return CloudObject.forClass(RowCoder.class).getClassName(); + } +} diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/SchemaCoderCloudObjectTranslator.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/SchemaCoderCloudObjectTranslator.java index 5657ded82278..bc20399b1f14 100644 --- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/SchemaCoderCloudObjectTranslator.java +++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/SchemaCoderCloudObjectTranslator.java @@ -18,6 +18,8 @@ package org.apache.beam.runners.dataflow.util; import java.io.IOException; +import java.util.UUID; +import javax.annotation.Nullable; import org.apache.beam.model.pipeline.v1.SchemaApi; import org.apache.beam.runners.core.construction.SdkComponents; import org.apache.beam.sdk.annotations.Experimental; @@ -29,6 +31,7 @@ import org.apache.beam.sdk.util.SerializableUtils; import org.apache.beam.sdk.util.StringUtils; import org.apache.beam.sdk.values.TypeDescriptor; +import 
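// ---- Illustrative aside (editor's sketch, not part of this patch) ----
// RowCoderCloudObjectTranslator above stores the schema as a JSON-printed SchemaApi proto
// (via JsonFormat) rather than raw serialized bytes; SchemaCoderCloudObjectTranslator is
// switched to the same encoding below. A minimal round trip, using an assumed two-field
// schema purely for illustration:
//
//   Schema schema = Schema.builder().addStringField("word").addInt64Field("count").build();
//   RowCoderCloudObjectTranslator translator = new RowCoderCloudObjectTranslator();
//   CloudObject cloudObject = translator.toCloudObject(RowCoder.of(schema), SdkComponents.create());
//   RowCoder decoded = translator.fromCloudObject(cloudObject);
//   assertTrue(decoded.getSchema().equivalent(schema));
// ----------------------------------------------------------------------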
org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.util.JsonFormat; /** Translator for Schema coders. */ @Experimental(Kind.SCHEMAS) @@ -61,11 +64,15 @@ public CloudObject toCloudObject(SchemaCoder target, SdkComponents sdkComponents FROM_ROW_FUNCTION, StringUtils.byteArrayToJsonString( SerializableUtils.serializeToByteArray(target.getFromRowFunction()))); - Structs.addString( - base, - SCHEMA, - StringUtils.byteArrayToJsonString( - SchemaTranslation.schemaToProto(target.getSchema(), true).toByteArray())); + + try { + Structs.addString( + base, + SCHEMA, + JsonFormat.printer().print(SchemaTranslation.schemaToProto(target.getSchema(), true))); + } catch (Exception e) { + throw new RuntimeException(e); + } return base; } @@ -91,10 +98,13 @@ public SchemaCoder fromCloudObject(CloudObject cloudObject) { StringUtils.jsonStringToByteArray( Structs.getString(cloudObject, FROM_ROW_FUNCTION)), "fromRowFunction"); - SchemaApi.Schema protoSchema = - SchemaApi.Schema.parseFrom( - StringUtils.jsonStringToByteArray(Structs.getString(cloudObject, SCHEMA))); - Schema schema = SchemaTranslation.schemaFromProto(protoSchema); + SchemaApi.Schema.Builder schemaBuilder = SchemaApi.Schema.newBuilder(); + JsonFormat.parser().merge(Structs.getString(cloudObject, SCHEMA), schemaBuilder); + Schema schema = SchemaTranslation.schemaFromProto(schemaBuilder.build()); + @Nullable UUID uuid = schema.getUUID(); + if (schema.isEncodingPositionsOverridden() && uuid != null) { + SchemaCoder.overrideEncodingPositions(uuid, schema.getEncodingPositions()); + } return SchemaCoder.of(schema, typeDescriptor, toRowFunction, fromRowFunction); } catch (IOException e) { throw new RuntimeException(e); diff --git a/runners/google-cloud-dataflow-java/src/main/resources/org/apache/beam/runners/dataflow/dataflow.properties b/runners/google-cloud-dataflow-java/src/main/resources/org/apache/beam/runners/dataflow/dataflow.properties index 7d4bdf0d91d4..956c447e5faa 100644 --- a/runners/google-cloud-dataflow-java/src/main/resources/org/apache/beam/runners/dataflow/dataflow.properties +++ b/runners/google-cloud-dataflow-java/src/main/resources/org/apache/beam/runners/dataflow/dataflow.properties @@ -18,4 +18,6 @@ legacy.environment.major.version=@dataflow.legacy_environment_major_version@ fnapi.environment.major.version=@dataflow.fnapi_environment_major_version@ -container.version=@dataflow.container_version@ +legacy.container.version=@dataflow.legacy_container_version@ +fnapi.container.version=@dataflow.fnapi_container_version@ +container.base_repository=@dataflow.container_base_repository@ diff --git a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/BatchStatefulParDoOverridesTest.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/BatchStatefulParDoOverridesTest.java index 2e97a3979eda..f6e683e4a81f 100644 --- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/BatchStatefulParDoOverridesTest.java +++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/BatchStatefulParDoOverridesTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.runners.dataflow; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.not; import static org.hamcrest.Matchers.nullValue; -import static org.junit.Assert.assertThat; import static org.mockito.Matchers.any; import static org.mockito.Mockito.mock; import static org.mockito.Mockito.when; @@ -58,9 
+58,6 @@ /** Tests for {@link BatchStatefulParDoOverrides}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BatchStatefulParDoOverridesTest implements Serializable { @Test @@ -73,13 +70,13 @@ public void testSingleOutputOverrideNonCrashing() throws Exception { pipeline.apply(Create.of(KV.of(1, 2))).apply(ParDo.of(fn)); DataflowRunner runner = DataflowRunner.fromOptions(options); - runner.replaceTransforms(pipeline); + runner.replaceV1Transforms(pipeline); assertThat(findBatchStatefulDoFn(pipeline), equalTo((DoFn) fn)); } @Test public void testFnApiSingleOutputOverrideNonCrashing() throws Exception { - DataflowPipelineOptions options = buildPipelineOptions("--experiments=beam_fn_api"); + DataflowPipelineOptions options = buildPipelineOptions(); options.setRunner(DataflowRunner.class); Pipeline pipeline = Pipeline.create(options); @@ -87,7 +84,7 @@ public void testFnApiSingleOutputOverrideNonCrashing() throws Exception { pipeline.apply(Create.of(KV.of(1, 2))).apply(ParDo.of(fn)); DataflowRunner runner = DataflowRunner.fromOptions(options); - runner.replaceTransforms(pipeline); + runner.replaceV1Transforms(pipeline); assertThat(findBatchStatefulDoFn(pipeline), equalTo((DoFn) fn)); } @@ -106,7 +103,7 @@ public void testMultiOutputOverrideNonCrashing() throws Exception { .apply(ParDo.of(fn).withOutputTags(mainOutputTag, TupleTagList.of(sideOutputTag))); DataflowRunner runner = DataflowRunner.fromOptions(options); - runner.replaceTransforms(pipeline); + runner.replaceV1Transforms(pipeline); assertThat(findBatchStatefulDoFn(pipeline), equalTo((DoFn) fn)); } @@ -116,7 +113,7 @@ public void testMultiOutputOverrideNonCrashing() throws Exception { + "exposes a way to know when the replacement is not required by checking that the " + "preceding ParDos to a GBK are key preserving.") public void testFnApiMultiOutputOverrideNonCrashing() throws Exception { - DataflowPipelineOptions options = buildPipelineOptions("--experiments=beam_fn_api"); + DataflowPipelineOptions options = buildPipelineOptions(); options.setRunner(DataflowRunner.class); Pipeline pipeline = Pipeline.create(options); @@ -129,7 +126,7 @@ public void testFnApiMultiOutputOverrideNonCrashing() throws Exception { .apply(ParDo.of(fn).withOutputTags(mainOutputTag, TupleTagList.of(sideOutputTag))); DataflowRunner runner = DataflowRunner.fromOptions(options); - runner.replaceTransforms(pipeline); + runner.replaceV1Transforms(pipeline); assertThat(findBatchStatefulDoFn(pipeline), equalTo((DoFn) fn)); } diff --git a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/BatchViewOverridesTest.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/BatchViewOverridesTest.java index 65de4bb8feab..9ce81bf5764f 100644 --- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/BatchViewOverridesTest.java +++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/BatchViewOverridesTest.java @@ -18,10 +18,10 @@ package org.apache.beam.runners.dataflow; import static org.apache.beam.sdk.util.WindowedValue.valueInGlobalWindow; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import java.util.List; import java.util.Map; diff --git 
a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowMetricsTest.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowMetricsTest.java index b795142bfa1d..2ece1f8436e0 100644 --- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowMetricsTest.java +++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowMetricsTest.java @@ -61,7 +61,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class DataflowMetricsTest { private static final String PROJECT_ID = "some-project"; diff --git a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPTransformMatchersTest.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPTransformMatchersTest.java index b5ee82782724..e1b01fe40640 100644 --- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPTransformMatchersTest.java +++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPTransformMatchersTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.dataflow; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.is; -import static org.junit.Assert.assertThat; import org.apache.beam.sdk.Pipeline; import org.apache.beam.sdk.coders.KvCoder; @@ -43,9 +43,6 @@ /** Tests for {@link DataflowPTransformMatchers}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DataflowPTransformMatchersTest { /** diff --git a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineJobTest.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineJobTest.java index 6e08aaed0e20..1b05e73f614e 100644 --- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineJobTest.java +++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineJobTest.java @@ -57,6 +57,7 @@ import org.apache.beam.sdk.runners.AppliedPTransform; import org.apache.beam.sdk.testing.ExpectedLogs; import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; import org.apache.beam.sdk.values.PInput; import org.apache.beam.sdk.values.POutput; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; @@ -75,9 +76,6 @@ /** Tests for DataflowPipelineJob. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DataflowPipelineJobTest { private static final String PROJECT_ID = "some-project"; private static final String REGION_ID = "some-region-2b"; @@ -380,7 +378,12 @@ public void testGetStateNoThrowWithExceptionReturnsUnknown() throws Exception { PInput input = mock(PInput.class); when(input.getPipeline()).thenReturn(p); return AppliedPTransform.of( - fullName, Collections.emptyMap(), Collections.emptyMap(), transform, p); + fullName, + Collections.emptyMap(), + Collections.emptyMap(), + transform, + ResourceHints.create(), + p); } private static class FastNanoClockAndFuzzySleeper implements NanoClock, Sleeper { diff --git a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslatorTest.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslatorTest.java index 6f245cc2215a..0445a703bbd8 100644 --- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslatorTest.java +++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslatorTest.java @@ -19,6 +19,8 @@ import static org.apache.beam.runners.dataflow.util.Structs.getString; import static org.apache.beam.sdk.util.StringUtils.jsonStringToByteArray; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.hasKey; @@ -28,7 +30,6 @@ import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertNotNull; import static org.junit.Assert.assertNull; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.mockito.Matchers.any; import static org.mockito.Matchers.argThat; @@ -51,17 +52,17 @@ import java.util.List; import java.util.Map; import java.util.Set; +import java.util.stream.Collectors; +import java.util.stream.Stream; import org.apache.beam.model.pipeline.v1.RunnerApi; import org.apache.beam.model.pipeline.v1.RunnerApi.ArtifactInformation; import org.apache.beam.model.pipeline.v1.RunnerApi.Components; import org.apache.beam.model.pipeline.v1.RunnerApi.DockerPayload; import org.apache.beam.model.pipeline.v1.RunnerApi.Environment; -import org.apache.beam.model.pipeline.v1.RunnerApi.ParDoPayload; import org.apache.beam.runners.core.construction.Environments; +import org.apache.beam.runners.core.construction.ModelCoders; import org.apache.beam.runners.core.construction.PTransformTranslation; -import org.apache.beam.runners.core.construction.ParDoTranslation; import org.apache.beam.runners.core.construction.PipelineTranslation; -import org.apache.beam.runners.core.construction.RehydratedComponents; import org.apache.beam.runners.core.construction.SdkComponents; import org.apache.beam.runners.dataflow.DataflowPipelineTranslator.JobSpecification; import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions; @@ -90,6 +91,7 @@ import org.apache.beam.sdk.state.StateSpec; import org.apache.beam.sdk.state.StateSpecs; import org.apache.beam.sdk.state.ValueState; +import org.apache.beam.sdk.testing.PAssert; import org.apache.beam.sdk.transforms.Count; import org.apache.beam.sdk.transforms.Create; import 
org.apache.beam.sdk.transforms.DoFn; @@ -102,6 +104,8 @@ import org.apache.beam.sdk.transforms.Sum; import org.apache.beam.sdk.transforms.View; import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHintsOptions; import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker; import org.apache.beam.sdk.transforms.windowing.FixedWindows; import org.apache.beam.sdk.transforms.windowing.Window; @@ -111,10 +115,13 @@ import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.PCollectionView; import org.apache.beam.sdk.values.PDone; import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.sdk.values.TupleTagList; +import org.apache.beam.sdk.values.TypeDescriptors; import org.apache.beam.sdk.values.WindowingStrategy; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; @@ -133,7 +140,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class DataflowPipelineTranslatorTest implements Serializable { @@ -142,10 +148,10 @@ public class DataflowPipelineTranslatorTest implements Serializable { private SdkComponents createSdkComponents(PipelineOptions options) { SdkComponents sdkComponents = SdkComponents.create(); - String workerHarnessContainerImageURL = + String containerImageURL = DataflowRunner.getContainerImageForJob(options.as(DataflowPipelineOptions.class)); RunnerApi.Environment defaultEnvironmentForDataflow = - Environments.createDockerEnvironment(workerHarnessContainerImageURL); + Environments.createDockerEnvironment(containerImageURL); sdkComponents.registerEnvironment(defaultEnvironmentForDataflow); return sdkComponents; @@ -178,7 +184,7 @@ private Pipeline buildPipeline(DataflowPipelineOptions options) { p.apply("ReadMyFile", TextIO.read().from("gs://bucket/object")) .apply("WriteMyFile", TextIO.write().to("gs://bucket/object")); DataflowRunner runner = DataflowRunner.fromOptions(options); - runner.replaceTransforms(p); + runner.replaceV1Transforms(p); return p; } @@ -355,6 +361,7 @@ public void testScalingAlgorithmNone() throws IOException { DataflowPipelineOptions options = buildPipelineOptions(); options.setAutoscalingAlgorithm(noScaling); + options.setNumWorkers(42); Pipeline p = buildPipeline(options); p.traverseTopologically(new RecordingPipelineVisitor()); @@ -374,6 +381,7 @@ public void testScalingAlgorithmNone() throws IOException { assertEquals( "AUTOSCALING_ALGORITHM_NONE", job.getEnvironment().getWorkerPools().get(0).getAutoscalingSettings().getAlgorithm()); + assertEquals(42, job.getEnvironment().getWorkerPools().get(0).getNumWorkers().intValue()); assertEquals( 0, job.getEnvironment() @@ -418,6 +426,29 @@ public void testMaxNumWorkersIsPassedWhenNoAlgorithmIsSet() throws IOException { .intValue()); } + @Test + public void testNumWorkersCannotExceedMaxNumWorkers() throws IOException { + DataflowPipelineOptions options = buildPipelineOptions(); + options.setNumWorkers(43); + 
options.setMaxNumWorkers(42); + + Pipeline p = buildPipeline(options); + p.traverseTopologically(new RecordingPipelineVisitor()); + SdkComponents sdkComponents = createSdkComponents(options); + RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(p, sdkComponents, true); + + thrown.expect(IllegalArgumentException.class); + thrown.expectMessage("numWorkers (43) cannot exceed maxNumWorkers (42)."); + DataflowPipelineTranslator.fromOptions(options) + .translate( + p, + pipelineProto, + sdkComponents, + DataflowRunner.fromOptions(options), + Collections.emptyList()) + .getJob(); + } + @Test public void testWorkerMachineTypeConfig() throws IOException { final String testMachineType = "test-machine-type"; @@ -648,7 +679,7 @@ public void testNamesOverridden() throws Exception { pipeline.apply("Jazzy", Create.of(3)).setName("foobizzle"); - runner.replaceTransforms(pipeline); + runner.replaceV1Transforms(pipeline); SdkComponents sdkComponents = createSdkComponents(options); RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(pipeline, sdkComponents, true); @@ -704,7 +735,7 @@ public void drop() {} outputs.get(tag2).setName("gonzaggle"); outputs.get(tag3).setName("froonazzle"); - runner.replaceTransforms(pipeline); + runner.replaceV1Transforms(pipeline); SdkComponents sdkComponents = createSdkComponents(options); RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(pipeline, sdkComponents, true); @@ -758,7 +789,7 @@ public void process(ProcessContext c) { }) .withOutputTags(mainOutputTag, TupleTagList.empty())); - runner.replaceTransforms(pipeline); + runner.replaceV1Transforms(pipeline); SdkComponents sdkComponents = createSdkComponents(options); RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(pipeline, sdkComponents, true); @@ -792,6 +823,104 @@ public void process(ProcessContext c) { not(equalTo("true"))); } + /** Testing just the translation of the pipeline from ViewTest#testToList. */ + @Test + public void testToList() throws Exception { + DataflowPipelineOptions options = buildPipelineOptions(); + Pipeline pipeline = Pipeline.create(options); + + final PCollectionView> view = + pipeline.apply("CreateSideInput", Create.of(11, 13, 17, 23)).apply(View.asList()); + + PCollection output = + pipeline + .apply("CreateMainInput", Create.of(29, 31)) + .apply( + "OutputSideInputs", + ParDo.of( + new DoFn() { + @ProcessElement + public void processElement(ProcessContext c) { + checkArgument(c.sideInput(view).size() == 4); + checkArgument( + c.sideInput(view).get(0).equals(c.sideInput(view).get(0))); + for (Integer i : c.sideInput(view)) { + c.output(i); + } + } + }) + .withSideInputs(view)); + + DataflowRunner runner = DataflowRunner.fromOptions(options); + DataflowPipelineTranslator translator = DataflowPipelineTranslator.fromOptions(options); + + runner.replaceV1Transforms(pipeline); + + SdkComponents sdkComponents = createSdkComponents(options); + RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(pipeline, sdkComponents, true); + Job job = + translator + .translate(pipeline, pipelineProto, sdkComponents, runner, Collections.emptyList()) + .getJob(); + List steps = job.getSteps(); + + // Change detector assertion just to make sure the test was not a noop. + // No need to actually check the pipeline as the ValidatesRunner tests + // ensure translation is correct. This is just a quick check to see that translation + // does not crash. 
+ assertEquals(5, steps.size()); + } + + @Test + public void testToMap() throws Exception { + DataflowPipelineOptions options = buildPipelineOptions(); + Pipeline pipeline = Pipeline.create(options); + + final PCollectionView> view = + pipeline + .apply("CreateSideInput", Create.of(KV.of("a", 1), KV.of("b", 3))) + .apply(View.asMap()); + + PCollection> output = + pipeline + .apply("CreateMainInput", Create.of("apple", "banana", "blackberry")) + .apply( + "OutputSideInputs", + ParDo.of( + new DoFn>() { + @ProcessElement + public void processElement(ProcessContext c) { + c.output( + KV.of( + c.element(), + c.sideInput(view).get(c.element().substring(0, 1)))); + } + }) + .withSideInputs(view)); + + PAssert.that(output) + .containsInAnyOrder(KV.of("apple", 1), KV.of("banana", 3), KV.of("blackberry", 3)); + + DataflowRunner runner = DataflowRunner.fromOptions(options); + DataflowPipelineTranslator translator = DataflowPipelineTranslator.fromOptions(options); + + runner.replaceV1Transforms(pipeline); + + SdkComponents sdkComponents = createSdkComponents(options); + RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(pipeline, sdkComponents, true); + Job job = + translator + .translate(pipeline, pipelineProto, sdkComponents, runner, Collections.emptyList()) + .getJob(); + List steps = job.getSteps(); + + // Change detector assertion just to make sure the test was not a noop. + // No need to actually check the pipeline as the ValidatesRunner tests + // ensure translation is correct. This is just a quick check to see that translation + // does not crash. + assertEquals(24, steps.size()); + } + /** Smoke test to fail fast if translation of a splittable ParDo in streaming breaks. */ @Test public void testStreamingSplittableParDoTranslation() throws Exception { @@ -808,7 +937,7 @@ public void testStreamingSplittableParDoTranslation() throws Exception { .apply(Window.into(FixedWindows.of(Duration.standardMinutes(1)))); windowedInput.apply(ParDo.of(new TestSplittableFn())); - runner.replaceTransforms(pipeline); + runner.replaceV1Transforms(pipeline); SdkComponents sdkComponents = createSdkComponents(options); RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(pipeline, sdkComponents, true); @@ -852,80 +981,6 @@ public void testStreamingSplittableParDoTranslation() throws Exception { KvCoder.of(SerializableCoder.of(OffsetRange.class), VoidCoder.of()), restrictionCoder); } - /** Smoke test to fail fast if translation of a splittable ParDo in FnAPI. 
*/ - @Test - public void testSplittableParDoTranslationFnApi() throws Exception { - DataflowPipelineOptions options = buildPipelineOptions(); - options.setExperiments(Arrays.asList("beam_fn_api")); - DataflowRunner runner = DataflowRunner.fromOptions(options); - DataflowPipelineTranslator translator = DataflowPipelineTranslator.fromOptions(options); - - Pipeline pipeline = Pipeline.create(options); - - PCollection windowedInput = - pipeline - .apply(Impulse.create()) - .apply( - MapElements.via( - new SimpleFunction() { - @Override - public String apply(byte[] input) { - return ""; - } - })) - .apply(Window.into(FixedWindows.of(Duration.standardMinutes(1)))); - windowedInput.apply(ParDo.of(new TestSplittableFn())); - - runner.replaceTransforms(pipeline); - - SdkComponents sdkComponents = createSdkComponents(options); - RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(pipeline, sdkComponents, true); - JobSpecification result = - translator.translate( - pipeline, pipelineProto, sdkComponents, runner, Collections.emptyList()); - - Job job = result.getJob(); - - // The job should contain a ParDo step, containing a "restriction_encoding". - - List steps = job.getSteps(); - Step splittableParDo = null; - for (Step step : steps) { - if ("ParallelDo".equals(step.getKind()) - && step.getProperties().containsKey(PropertyNames.RESTRICTION_ENCODING)) { - assertNull(splittableParDo); - splittableParDo = step; - } - } - assertNotNull(splittableParDo); - - String fn = Structs.getString(splittableParDo.getProperties(), PropertyNames.SERIALIZED_FN); - - Components componentsProto = result.getPipelineProto().getComponents(); - RehydratedComponents components = RehydratedComponents.forComponents(componentsProto); - RunnerApi.PTransform splittableTransform = componentsProto.getTransformsOrThrow(fn); - assertEquals( - PTransformTranslation.PAR_DO_TRANSFORM_URN, splittableTransform.getSpec().getUrn()); - ParDoPayload payload = ParDoPayload.parseFrom(splittableTransform.getSpec().getPayload()); - assertThat( - ParDoTranslation.doFnWithExecutionInformationFromProto(payload.getDoFn()).getDoFn(), - instanceOf(TestSplittableFn.class)); - Coder expectedRestrictionAndStateCoder = - KvCoder.of(SerializableCoder.of(OffsetRange.class), VoidCoder.of()); - assertEquals( - expectedRestrictionAndStateCoder, components.getCoder(payload.getRestrictionCoderId())); - - // In the Fn API case, we still translate the restriction coder into the RESTRICTION_CODER - // property as a CloudObject, and it gets passed through the Dataflow backend, but in the end - // the Dataflow worker will end up fetching it from the SPK transform payload instead. 
- Coder restrictionCoder = - CloudObjects.coderFromCloudObject( - (CloudObject) - Structs.getObject( - splittableParDo.getProperties(), PropertyNames.RESTRICTION_ENCODING)); - assertEquals(expectedRestrictionAndStateCoder, restrictionCoder); - } - @Test public void testPortablePipelineContainsExpectedDependenciesAndCapabilities() throws Exception { DataflowPipelineOptions options = buildPipelineOptions(); @@ -948,7 +1003,7 @@ public String apply(byte[] input) { })) .apply(Window.into(FixedWindows.of(Duration.standardMinutes(1)))); - runner.replaceTransforms(pipeline); + runner.replaceV1Transforms(pipeline); File file1 = File.createTempFile("file1-", ".txt"); file1.deleteOnExit(); @@ -995,7 +1050,7 @@ public void testToSingletonTranslationWithIsmSideInput() throws Exception { Pipeline pipeline = Pipeline.create(options); pipeline.apply(Create.of(1)).apply(View.asSingleton()); DataflowRunner runner = DataflowRunner.fromOptions(options); - runner.replaceTransforms(pipeline); + runner.replaceV1Transforms(pipeline); SdkComponents sdkComponents = createSdkComponents(options); RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(pipeline, sdkComponents, true); Job job = @@ -1031,7 +1086,7 @@ public void testToIterableTranslationWithIsmSideInput() throws Exception { pipeline.apply(Create.of(1, 2, 3)).apply(View.asIterable()); DataflowRunner runner = DataflowRunner.fromOptions(options); - runner.replaceTransforms(pipeline); + runner.replaceV1Transforms(pipeline); SdkComponents sdkComponents = createSdkComponents(options); RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(pipeline, sdkComponents, true); Job job = @@ -1054,87 +1109,10 @@ public void testToIterableTranslationWithIsmSideInput() throws Exception { assertEquals("CollectionToSingleton", collectionToSingletonStep.getKind()); } - @Test - public void testToSingletonTranslationWithFnApiSideInput() throws Exception { - // A "change detector" test that makes sure the translation - // of getting a PCollectionView does not change - // in bad ways during refactor - + private JobSpecification runStreamingGroupIntoBatchesAndGetJobSpec( + Boolean withShardedKey, List experiments) throws IOException { DataflowPipelineOptions options = buildPipelineOptions(); - options.setExperiments(Arrays.asList("beam_fn_api")); - DataflowPipelineTranslator translator = DataflowPipelineTranslator.fromOptions(options); - - Pipeline pipeline = Pipeline.create(options); - pipeline.apply(Create.of(1)).apply(View.asSingleton()); - DataflowRunner runner = DataflowRunner.fromOptions(options); - runner.replaceTransforms(pipeline); - SdkComponents sdkComponents = createSdkComponents(options); - RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(pipeline, sdkComponents, true); - Job job = - translator - .translate(pipeline, pipelineProto, sdkComponents, runner, Collections.emptyList()) - .getJob(); - assertAllStepOutputsHaveUniqueIds(job); - - List steps = job.getSteps(); - assertEquals(9, steps.size()); - - Step collectionToSingletonStep = steps.get(steps.size() - 1); - assertEquals("CollectionToSingleton", collectionToSingletonStep.getKind()); - - @SuppressWarnings("unchecked") - List> ctsOutputs = - (List>) - steps.get(steps.size() - 1).getProperties().get(PropertyNames.OUTPUT_INFO); - assertTrue(Structs.getBoolean(Iterables.getOnlyElement(ctsOutputs), "use_indexed_format")); - } - - @Test - public void testToIterableTranslationWithFnApiSideInput() throws Exception { - // A "change detector" test that makes sure the translation - // of 
getting a PCollectionView> does not change - // in bad ways during refactor - - DataflowPipelineOptions options = buildPipelineOptions(); - options.setExperiments(Arrays.asList("beam_fn_api")); - DataflowPipelineTranslator translator = DataflowPipelineTranslator.fromOptions(options); - - Pipeline pipeline = Pipeline.create(options); - pipeline.apply(Create.of(1, 2, 3)).apply(View.asIterable()); - - DataflowRunner runner = DataflowRunner.fromOptions(options); - runner.replaceTransforms(pipeline); - SdkComponents sdkComponents = createSdkComponents(options); - RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(pipeline, sdkComponents, true); - Job job = - translator - .translate(pipeline, pipelineProto, sdkComponents, runner, Collections.emptyList()) - .getJob(); - assertAllStepOutputsHaveUniqueIds(job); - - List steps = job.getSteps(); - assertEquals(5, steps.size()); - - @SuppressWarnings("unchecked") - List> ctsOutputs = - (List>) - steps.get(steps.size() - 1).getProperties().get(PropertyNames.OUTPUT_INFO); - assertTrue(Structs.getBoolean(Iterables.getOnlyElement(ctsOutputs), "use_indexed_format")); - Step collectionToSingletonStep = steps.get(steps.size() - 1); - assertEquals("CollectionToSingleton", collectionToSingletonStep.getKind()); - } - - private Map runGroupIntoBatchesAndGetStepProperties( - Boolean withShardedKey, Boolean usesFnApi) throws IOException { - DataflowPipelineOptions options = buildPipelineOptions(); - options.setExperiments( - Arrays.asList( - "enable_streaming_auto_sharding", - GcpOptions.STREAMING_ENGINE_EXPERIMENT, - GcpOptions.WINDMILL_SERVICE_EXPERIMENT)); - if (usesFnApi) { - options.setExperiments(Arrays.asList("beam_fn_api")); - } + options.setExperiments(experiments); options.setStreaming(true); DataflowPipelineTranslator translator = DataflowPipelineTranslator.fromOptions(options); @@ -1148,21 +1126,23 @@ private Map runGroupIntoBatchesAndGetStepProperties( } DataflowRunner runner = DataflowRunner.fromOptions(options); - runner.replaceTransforms(pipeline); + runner.replaceV1Transforms(pipeline); SdkComponents sdkComponents = createSdkComponents(options); RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(pipeline, sdkComponents, true); - Job job = - translator - .translate(pipeline, pipelineProto, sdkComponents, runner, Collections.emptyList()) - .getJob(); - List steps = job.getSteps(); - Step shardedStateStep = steps.get(steps.size() - 1); - return shardedStateStep.getProperties(); + return translator.translate( + pipeline, pipelineProto, sdkComponents, runner, Collections.emptyList()); } @Test public void testStreamingGroupIntoBatchesTranslation() throws Exception { - Map properties = runGroupIntoBatchesAndGetStepProperties(false, false); + List experiments = + new ArrayList<>( + ImmutableList.of( + GcpOptions.STREAMING_ENGINE_EXPERIMENT, GcpOptions.WINDMILL_SERVICE_EXPERIMENT)); + JobSpecification jobSpec = runStreamingGroupIntoBatchesAndGetJobSpec(false, experiments); + List steps = jobSpec.getJob().getSteps(); + Step shardedStateStep = steps.get(steps.size() - 1); + Map properties = shardedStateStep.getProperties(); assertTrue(properties.containsKey(PropertyNames.USES_KEYED_STATE)); assertEquals("true", getString(properties, PropertyNames.USES_KEYED_STATE)); assertFalse(properties.containsKey(PropertyNames.ALLOWS_SHARDABLE_STATE)); @@ -1171,7 +1151,14 @@ public void testStreamingGroupIntoBatchesTranslation() throws Exception { @Test public void testStreamingGroupIntoBatchesWithShardedKeyTranslation() throws Exception { - Map 
properties = runGroupIntoBatchesAndGetStepProperties(true, false); + List experiments = + new ArrayList<>( + ImmutableList.of( + GcpOptions.STREAMING_ENGINE_EXPERIMENT, GcpOptions.WINDMILL_SERVICE_EXPERIMENT)); + JobSpecification jobSpec = runStreamingGroupIntoBatchesAndGetJobSpec(true, experiments); + List steps = jobSpec.getJob().getSteps(); + Step shardedStateStep = steps.get(steps.size() - 1); + Map properties = shardedStateStep.getProperties(); assertTrue(properties.containsKey(PropertyNames.USES_KEYED_STATE)); assertEquals("true", getString(properties, PropertyNames.USES_KEYED_STATE)); assertTrue(properties.containsKey(PropertyNames.ALLOWS_SHARDABLE_STATE)); @@ -1181,13 +1168,88 @@ public void testStreamingGroupIntoBatchesWithShardedKeyTranslation() throws Exce } @Test - public void testStreamingGroupIntoBatchesTranslationFnApi() throws Exception { - Map properties = runGroupIntoBatchesAndGetStepProperties(true, true); + public void testStreamingGroupIntoBatchesTranslationUnifiedWorker() throws Exception { + List experiments = + new ArrayList<>( + ImmutableList.of( + GcpOptions.STREAMING_ENGINE_EXPERIMENT, + GcpOptions.WINDMILL_SERVICE_EXPERIMENT, + "use_runner_v2")); + JobSpecification jobSpec = runStreamingGroupIntoBatchesAndGetJobSpec(false, experiments); + List steps = jobSpec.getJob().getSteps(); + Step shardedStateStep = steps.get(steps.size() - 1); + Map properties = shardedStateStep.getProperties(); assertTrue(properties.containsKey(PropertyNames.USES_KEYED_STATE)); - assertEquals("true", getString(properties, PropertyNames.USES_KEYED_STATE)); - // "allows_shardable_state" is currently unsupported for portable jobs. assertFalse(properties.containsKey(PropertyNames.ALLOWS_SHARDABLE_STATE)); assertFalse(properties.containsKey(PropertyNames.PRESERVES_KEYS)); + + // Also checks runner proto is correctly populated. + Map transformMap = + jobSpec.getPipelineProto().getComponents().getTransformsMap(); + boolean transformFound = false; + for (Map.Entry transform : transformMap.entrySet()) { + RunnerApi.FunctionSpec spec = transform.getValue().getSpec(); + if (spec.getUrn().equals(PTransformTranslation.GROUP_INTO_BATCHES_URN)) { + transformFound = true; + } + } + assertTrue(transformFound); + } + + @Test + public void testGroupIntoBatchesWithShardedKeyTranslationUnifiedWorker() throws Exception { + List experiments = + new ArrayList<>( + ImmutableList.of( + GcpOptions.STREAMING_ENGINE_EXPERIMENT, + GcpOptions.WINDMILL_SERVICE_EXPERIMENT, + "use_runner_v2")); + JobSpecification jobSpec = runStreamingGroupIntoBatchesAndGetJobSpec(true, experiments); + List steps = jobSpec.getJob().getSteps(); + Step shardedStateStep = steps.get(steps.size() - 1); + Map properties = shardedStateStep.getProperties(); + assertTrue(properties.containsKey(PropertyNames.USES_KEYED_STATE)); + assertTrue(properties.containsKey(PropertyNames.ALLOWS_SHARDABLE_STATE)); + assertEquals("true", getString(properties, PropertyNames.ALLOWS_SHARDABLE_STATE)); + assertTrue(properties.containsKey(PropertyNames.PRESERVES_KEYS)); + assertEquals("true", getString(properties, PropertyNames.PRESERVES_KEYS)); + + // Also checks the runner proto is correctly populated. 
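// ---- Illustrative aside (editor's sketch, not part of this patch) ----
// For context, the pipeline shape these GroupIntoBatches translation tests exercise is,
// under assumed typical usage (batch size and key/value types are made up):
//
//   PCollection<KV<String, Integer>> input = ...;
//   input.apply(GroupIntoBatches.<String, Integer>ofSize(10).withShardedKey());
//
// withShardedKey() is what causes the translator to emit ALLOWS_SHARDABLE_STATE and
// PRESERVES_KEYS, letting the Dataflow service pick the sharding at runtime, as the
// assertions in these tests check.
// ----------------------------------------------------------------------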
+ Map transformMap = + jobSpec.getPipelineProto().getComponents().getTransformsMap(); + boolean transformFound = false; + for (Map.Entry transform : transformMap.entrySet()) { + RunnerApi.FunctionSpec spec = transform.getValue().getSpec(); + if (spec.getUrn().equals(PTransformTranslation.GROUP_INTO_BATCHES_WITH_SHARDED_KEY_URN)) { + for (String subtransform : transform.getValue().getSubtransformsList()) { + RunnerApi.PTransform ptransform = transformMap.get(subtransform); + if (ptransform.getSpec().getUrn().equals(PTransformTranslation.GROUP_INTO_BATCHES_URN)) { + transformFound = true; + } + } + } + } + assertTrue(transformFound); + + boolean coderFound = false; + Map coderMap = + jobSpec.getPipelineProto().getComponents().getCodersMap(); + for (Map.Entry coder : coderMap.entrySet()) { + if (coder.getValue().getSpec().getUrn().equals(ModelCoders.SHARDED_KEY_CODER_URN)) { + coderFound = true; + } + } + assertTrue(coderFound); + } + + @Test + public void testGroupIntoBatchesWithShardedKeyNotSupported() throws IOException { + // Not using streaming engine. + List experiments = new ArrayList<>(ImmutableList.of("use_runner_v2")); + thrown.expect(IllegalArgumentException.class); + thrown.expectMessage( + "Runner determined sharding not available in Dataflow for GroupIntoBatches for non-Streaming-Engine jobs"); + runStreamingGroupIntoBatchesAndGetJobSpec(true, experiments); } @Test @@ -1232,7 +1294,7 @@ public void populateDisplayData(DisplayData.Builder builder) { pipeline.apply(Create.of(1, 2, 3)).apply(parDo1).apply(parDo2); DataflowRunner runner = DataflowRunner.fromOptions(options); - runner.replaceTransforms(pipeline); + runner.replaceV1Transforms(pipeline); SdkComponents sdkComponents = createSdkComponents(options); RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(pipeline, sdkComponents, true); Job job = @@ -1302,16 +1364,124 @@ public void populateDisplayData(DisplayData.Builder builder) { assertEquals(expectedFn2DisplayData, ImmutableSet.copyOf(fn2displayData)); } + @Test + public void testStepResourceHints() throws Exception { + DataflowPipelineOptions options = buildPipelineOptions(); + DataflowPipelineTranslator translator = DataflowPipelineTranslator.fromOptions(options); + Pipeline pipeline = Pipeline.create(options); + + pipeline + .apply(Create.of(1, 2, 3)) + .apply( + "Has hints", + MapElements.into(TypeDescriptors.integers()) + .via((Integer x) -> x + 1) + .setResourceHints( + ResourceHints.create() + .withMinRam("10.0GiB") + .withAccelerator("type:nvidia-tesla-k80;count:1;install-nvidia-driver"))); + + DataflowRunner runner = DataflowRunner.fromOptions(options); + runner.replaceV1Transforms(pipeline); + SdkComponents sdkComponents = createSdkComponents(options); + RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(pipeline, sdkComponents, true); + Job job = + translator + .translate(pipeline, pipelineProto, sdkComponents, runner, Collections.emptyList()) + .getJob(); + + Step stepWithHints = job.getSteps().get(1); + ImmutableMap expectedHints = + ImmutableMap.builder() + .put("beam:resources:min_ram_bytes:v1", "10737418240") + .put( + "beam:resources:accelerator:v1", + "type:nvidia-tesla-k80;count:1;install-nvidia-driver") + .build(); + assertEquals(expectedHints, stepWithHints.getProperties().get("resource_hints")); + } + + private RunnerApi.PTransform getLeafTransform(RunnerApi.Pipeline pipelineProto, String label) { + for (RunnerApi.PTransform transform : + pipelineProto.getComponents().getTransformsMap().values()) { + if 
(transform.getUniqueName().contains(label) && transform.getSubtransformsCount() == 0) { + return transform; + } + } + throw new java.lang.IllegalArgumentException(label); + } + + private static class IdentityDoFn extends DoFn { + @ProcessElement + public void processElement(@Element T input, OutputReceiver out) { + out.output(input); + } + } + + private static class Inner extends PTransform, PCollection> { + @Override + public PCollection expand(PCollection input) { + return input.apply( + "Innermost", + ParDo.of(new IdentityDoFn()) + .setResourceHints(ResourceHints.create().withAccelerator("set_in_inner_transform"))); + } + } + + private static class Outer extends PTransform, PCollection> { + @Override + public PCollection expand(PCollection input) { + return input.apply(new Inner()); + } + } + + @Test + public void testResourceHintsTranslationsResolvesHintsOnOptionsAndComposites() { + ResourceHintsOptions options = PipelineOptionsFactory.as(ResourceHintsOptions.class); + options.setResourceHints(Arrays.asList("accelerator=set_via_options", "minRam=1B")); + Pipeline pipeline = Pipeline.create(options); + PCollection root = pipeline.apply(Impulse.create()); + root.apply( + new Outer() + .setResourceHints( + ResourceHints.create().withAccelerator("set_on_outer_transform").withMinRam(20))); + root.apply("Leaf", ParDo.of(new IdentityDoFn())); + RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(pipeline, false); + assertThat( + pipelineProto + .getComponents() + .getEnvironmentsMap() + .get(getLeafTransform(pipelineProto, "Leaf").getEnvironmentId()) + .getResourceHintsMap(), + org.hamcrest.Matchers.allOf( + org.hamcrest.Matchers.hasEntry( + "beam:resources:min_ram_bytes:v1", ByteString.copyFromUtf8("1")), + org.hamcrest.Matchers.hasEntry( + "beam:resources:accelerator:v1", ByteString.copyFromUtf8("set_via_options")))); + assertThat( + pipelineProto + .getComponents() + .getEnvironmentsMap() + .get(getLeafTransform(pipelineProto, "Innermost").getEnvironmentId()) + .getResourceHintsMap(), + org.hamcrest.Matchers.allOf( + org.hamcrest.Matchers.hasEntry( + "beam:resources:min_ram_bytes:v1", ByteString.copyFromUtf8("20")), + org.hamcrest.Matchers.hasEntry( + "beam:resources:accelerator:v1", + ByteString.copyFromUtf8("set_in_inner_transform")))); + } + /** - * Tests that when {@link DataflowPipelineOptions#setWorkerHarnessContainerImage(String)} pipeline - * option is set, {@link DataflowRunner} sets that value as the {@link - * DockerPayload#getContainerImage()} of the default {@link Environment} used when generating the - * model pipeline proto. + * Tests that when (deprecated) {@link + * DataflowPipelineOptions#setWorkerHarnessContainerImage(String)} pipeline option is set, {@link + * DataflowRunner} sets that value as the {@link DockerPayload#getContainerImage()} of the default + * {@link Environment} used when generating the model pipeline proto. 
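// ---- Illustrative aside (editor's sketch, not part of this patch) ----
// The resource-hint tests above cover the two ways hints reach the runner; an assumed
// minimal user-facing example of each (ScoreFn and all values are placeholders):
//
//   // 1. Per-transform, via the PTransform API:
//   input.apply("Score", ParDo.of(new ScoreFn())
//       .setResourceHints(ResourceHints.create()
//           .withMinRam("10.0GiB")
//           .withAccelerator("type:nvidia-tesla-k80;count:1;install-nvidia-driver")));
//
//   // 2. Pipeline-wide defaults, via ResourceHintsOptions flags:
//   --resourceHints=minRam=1GiB --resourceHints=accelerator=type:nvidia-tesla-k80;count:1
//
// Hints set directly on an inner transform take precedence over hints from enclosing
// composites or options, which is what testResourceHintsTranslationsResolvesHintsOnOptionsAndComposites asserts.
// ----------------------------------------------------------------------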
*/ @Test public void testSetWorkerHarnessContainerImageInPipelineProto() throws Exception { DataflowPipelineOptions options = buildPipelineOptions(); - String containerImage = "gcr.io/IMAGE/foo"; + String containerImage = "gcr.io/image:foo"; options.as(DataflowPipelineOptions.class).setWorkerHarnessContainerImage(containerImage); Pipeline p = Pipeline.create(options); @@ -1335,6 +1505,85 @@ public void testSetWorkerHarnessContainerImageInPipelineProto() throws Exception assertEquals(DataflowRunner.getContainerImageForJob(options), payload.getContainerImage()); } + /** + * Tests that when {@link DataflowPipelineOptions#setSdkContainerImage(String)} pipeline option is + * set, {@link DataflowRunner} sets that value as the {@link DockerPayload#getContainerImage()} of + * the default {@link Environment} used when generating the model pipeline proto. + */ + @Test + public void testSetSdkContainerImageInPipelineProto() throws Exception { + DataflowPipelineOptions options = buildPipelineOptions(); + String containerImage = "gcr.io/image:foo"; + options.as(DataflowPipelineOptions.class).setSdkContainerImage(containerImage); + + Pipeline p = Pipeline.create(options); + SdkComponents sdkComponents = createSdkComponents(options); + RunnerApi.Pipeline proto = PipelineTranslation.toProto(p, sdkComponents, true); + JobSpecification specification = + DataflowPipelineTranslator.fromOptions(options) + .translate( + p, + proto, + sdkComponents, + DataflowRunner.fromOptions(options), + Collections.emptyList()); + RunnerApi.Pipeline pipelineProto = specification.getPipelineProto(); + + assertEquals(1, pipelineProto.getComponents().getEnvironmentsCount()); + Environment defaultEnvironment = + Iterables.getOnlyElement(pipelineProto.getComponents().getEnvironmentsMap().values()); + + DockerPayload payload = DockerPayload.parseFrom(defaultEnvironment.getPayload()); + assertEquals(DataflowRunner.getContainerImageForJob(options), payload.getContainerImage()); + } + + @Test + public void testDataflowServiceOptionsSet() throws IOException { + final List dataflowServiceOptions = + Stream.of("whizz=bang", "foo=bar").collect(Collectors.toList()); + + DataflowPipelineOptions options = buildPipelineOptions(); + options.setDataflowServiceOptions(dataflowServiceOptions); + + Pipeline p = buildPipeline(options); + p.traverseTopologically(new RecordingPipelineVisitor()); + SdkComponents sdkComponents = createSdkComponents(options); + RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(p, sdkComponents, true); + Job job = + DataflowPipelineTranslator.fromOptions(options) + .translate( + p, + pipelineProto, + sdkComponents, + DataflowRunner.fromOptions(options), + Collections.emptyList()) + .getJob(); + + assertEquals(dataflowServiceOptions, job.getEnvironment().getServiceOptions()); + } + + @Test + public void testHotKeyLoggingEnabledOption() throws IOException { + DataflowPipelineOptions options = buildPipelineOptions(); + options.setHotKeyLoggingEnabled(true); + + Pipeline p = buildPipeline(options); + p.traverseTopologically(new RecordingPipelineVisitor()); + SdkComponents sdkComponents = createSdkComponents(options); + RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(p, sdkComponents, true); + Job job = + DataflowPipelineTranslator.fromOptions(options) + .translate( + p, + pipelineProto, + sdkComponents, + DataflowRunner.fromOptions(options), + Collections.emptyList()) + .getJob(); + + assertTrue(job.getEnvironment().getDebugOptions().getEnableHotKeyLogging()); + } + private static void 
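// ---- Illustrative aside (editor's sketch, not part of this patch) ----
// The new tests above exercise three newer DataflowPipelineOptions; set from code they
// would look roughly like this (image name and service option values are made up):
//
//   DataflowPipelineOptions opts = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
//   opts.setSdkContainerImage("gcr.io/my-project/custom-sdk:latest"); // supersedes deprecated workerHarnessContainerImage
//   opts.setDataflowServiceOptions(ImmutableList.of("whizz=bang", "foo=bar"));
//   opts.setHotKeyLoggingEnabled(true);
// ----------------------------------------------------------------------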
assertAllStepOutputsHaveUniqueIds(Job job) throws Exception { List outputIds = new ArrayList<>(); for (Step step : job.getSteps()) { diff --git a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowRunnerInfoTest.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowRunnerInfoTest.java index f89e36e53754..e75af29504e5 100644 --- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowRunnerInfoTest.java +++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowRunnerInfoTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.runners.dataflow; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsString; import static org.hamcrest.Matchers.not; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import org.junit.Test; @@ -31,9 +31,6 @@ *

    Note that tests for checking that the Dataflow distribution correctly loads overridden * properties is contained within the Dataflow distribution. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DataflowRunnerInfoTest { @Test @@ -52,8 +49,23 @@ public void getDataflowRunnerInfo() throws Exception { String.format("FnAPI environment major version number %s is not a number", version), version.matches("\\d+")); - // Validate container version does not contain a $ (indicating it was not filled in). - assertThat("container version invalid", info.getContainerVersion(), not(containsString("$"))); + // Validate container versions do not contain the property name. + assertThat( + "legacy container version invalid", + info.getFnApiDevContainerVersion(), + not(containsString("dataflow.legacy_container_version"))); + + assertThat( + "FnAPI container version invalid", + info.getLegacyDevContainerVersion(), + not(containsString("dataflow.fnapi_container_version"))); + + // Validate container base repository does not contain the property name + // (indicating it was not filled in). + assertThat( + "container repository invalid", + info.getContainerImageBaseRepository(), + not(containsString("dataflow.container_base_repository"))); for (String property : new String[] {"java.vendor", "java.version", "os.arch", "os.name", "os.version"}) { diff --git a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowRunnerTest.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowRunnerTest.java index c55f7aba0f7d..f5bde8e3e9e1 100644 --- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowRunnerTest.java +++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowRunnerTest.java @@ -20,7 +20,6 @@ import static org.apache.beam.runners.dataflow.DataflowRunner.getContainerImageForJob; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.Files.getFileExtension; import static org.hamcrest.MatcherAssert.assertThat; -import static org.hamcrest.Matchers.both; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.containsString; import static org.hamcrest.Matchers.endsWith; @@ -28,6 +27,7 @@ import static org.hamcrest.Matchers.hasEntry; import static org.hamcrest.Matchers.hasItem; import static org.hamcrest.Matchers.hasKey; +import static org.hamcrest.Matchers.hasProperty; import static org.hamcrest.Matchers.is; import static org.hamcrest.Matchers.lessThanOrEqualTo; import static org.hamcrest.Matchers.not; @@ -66,6 +66,7 @@ import com.google.api.services.dataflow.model.DataflowPackage; import com.google.api.services.dataflow.model.Job; import com.google.api.services.dataflow.model.ListJobsResponse; +import com.google.api.services.dataflow.model.SdkHarnessContainerImage; import com.google.api.services.storage.model.StorageObject; import com.google.auto.service.AutoService; import java.io.File; @@ -83,7 +84,10 @@ import java.util.Map; import java.util.concurrent.atomic.AtomicBoolean; import java.util.regex.Pattern; +import java.util.stream.Collectors; import org.apache.beam.model.pipeline.v1.RunnerApi; +import org.apache.beam.runners.core.construction.BeamUrns; +import org.apache.beam.runners.core.construction.Environments; import org.apache.beam.runners.core.construction.PipelineTranslation; import 
org.apache.beam.runners.core.construction.SdkComponents; import org.apache.beam.runners.dataflow.DataflowRunner.StreamingShardedWriteFactory; @@ -128,6 +132,7 @@ import org.apache.beam.sdk.testing.ExpectedLogs; import org.apache.beam.sdk.testing.PAssert; import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.testing.UsesStatefulParDo; import org.apache.beam.sdk.testing.ValidatesRunner; import org.apache.beam.sdk.transforms.Create; import org.apache.beam.sdk.transforms.DoFn; @@ -135,9 +140,9 @@ import org.apache.beam.sdk.transforms.MapElements; import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.transforms.ParDo; -import org.apache.beam.sdk.transforms.SerializableFunction; import org.apache.beam.sdk.transforms.SerializableFunctions; import org.apache.beam.sdk.transforms.SimpleFunction; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.transforms.windowing.PaneInfo; import org.apache.beam.sdk.transforms.windowing.Sessions; @@ -148,7 +153,7 @@ import org.apache.beam.sdk.values.PValues; import org.apache.beam.sdk.values.TimestampedValue; import org.apache.beam.sdk.values.WindowingStrategy; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Throwables; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.checkerframework.checker.nullness.qual.Nullable; @@ -177,9 +182,6 @@ *

    Implements {@link Serializable} because it is caught in closures. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DataflowRunnerTest implements Serializable { private static final String VALID_BUCKET = "valid-bucket"; @@ -346,14 +348,9 @@ public void testPathValidation() { "--credentialFactoryClass=" + NoopCredentialFactory.class.getName(), }; - try { - Pipeline.create(PipelineOptionsFactory.fromArgs(args).create()).run(); - fail(); - } catch (RuntimeException e) { - assertThat( - Throwables.getStackTraceAsString(e), - containsString("DataflowRunner requires gcpTempLocation")); - } + thrown.expect(IllegalArgumentException.class); + thrown.expectMessage("DataflowRunner requires gcpTempLocation"); + Pipeline.create(PipelineOptionsFactory.fromArgs(args).create()).run(); } @Test @@ -367,15 +364,10 @@ public void testPathExistsValidation() { "--credentialFactoryClass=" + NoopCredentialFactory.class.getName(), }; - try { - Pipeline.create(PipelineOptionsFactory.fromArgs(args).create()).run(); - fail(); - } catch (RuntimeException e) { - assertThat( - Throwables.getStackTraceAsString(e), - both(containsString("gs://does/not/exist")) - .and(containsString("Unable to verify that GCS bucket gs://does exists"))); - } + thrown.expect(IllegalArgumentException.class); + thrown.expectMessage("gcpTempLocation"); + thrown.expectCause(hasProperty("message", containsString("gs://does/not/exist"))); + Pipeline.create(PipelineOptionsFactory.fromArgs(args).create()).run(); } @Test @@ -592,7 +584,8 @@ public void testSettingOfPipelineOptionsWithCustomUserType() throws IOException @Test public void testZoneAndWorkerRegionMutuallyExclusive() { - GcpOptions options = PipelineOptionsFactory.as(GcpOptions.class); + DataflowPipelineWorkerPoolOptions options = + PipelineOptionsFactory.as(DataflowPipelineWorkerPoolOptions.class); options.setZone("us-east1-b"); options.setWorkerRegion("us-east1"); assertThrows( @@ -601,7 +594,8 @@ public void testZoneAndWorkerRegionMutuallyExclusive() { @Test public void testZoneAndWorkerZoneMutuallyExclusive() { - GcpOptions options = PipelineOptionsFactory.as(GcpOptions.class); + DataflowPipelineWorkerPoolOptions options = + PipelineOptionsFactory.as(DataflowPipelineWorkerPoolOptions.class); options.setZone("us-east1-b"); options.setWorkerZone("us-east1-c"); assertThrows( @@ -610,7 +604,8 @@ public void testZoneAndWorkerZoneMutuallyExclusive() { @Test public void testExperimentRegionAndWorkerRegionMutuallyExclusive() { - GcpOptions options = PipelineOptionsFactory.as(GcpOptions.class); + DataflowPipelineWorkerPoolOptions options = + PipelineOptionsFactory.as(DataflowPipelineWorkerPoolOptions.class); DataflowPipelineOptions dataflowOptions = options.as(DataflowPipelineOptions.class); ExperimentalOptions.addExperiment(dataflowOptions, "worker_region=us-west1"); options.setWorkerRegion("us-east1"); @@ -620,7 +615,8 @@ public void testExperimentRegionAndWorkerRegionMutuallyExclusive() { @Test public void testExperimentRegionAndWorkerZoneMutuallyExclusive() { - GcpOptions options = PipelineOptionsFactory.as(GcpOptions.class); + DataflowPipelineWorkerPoolOptions options = + PipelineOptionsFactory.as(DataflowPipelineWorkerPoolOptions.class); DataflowPipelineOptions dataflowOptions = options.as(DataflowPipelineOptions.class); ExperimentalOptions.addExperiment(dataflowOptions, "worker_region=us-west1"); options.setWorkerZone("us-east1-b"); @@ -630,7 +626,8 @@ public void 
testExperimentRegionAndWorkerZoneMutuallyExclusive() { @Test public void testWorkerRegionAndWorkerZoneMutuallyExclusive() { - GcpOptions options = PipelineOptionsFactory.as(GcpOptions.class); + DataflowPipelineWorkerPoolOptions options = + PipelineOptionsFactory.as(DataflowPipelineWorkerPoolOptions.class); options.setWorkerRegion("us-east1"); options.setWorkerZone("us-east1-b"); assertThrows( @@ -639,13 +636,36 @@ public void testWorkerRegionAndWorkerZoneMutuallyExclusive() { @Test public void testZoneAliasWorkerZone() { - GcpOptions options = PipelineOptionsFactory.as(GcpOptions.class); + DataflowPipelineWorkerPoolOptions options = + PipelineOptionsFactory.as(DataflowPipelineWorkerPoolOptions.class); options.setZone("us-east1-b"); DataflowRunner.validateWorkerSettings(options); assertNull(options.getZone()); assertEquals("us-east1-b", options.getWorkerZone()); } + @Test + public void testAliasForLegacyWorkerHarnessContainerImage() { + DataflowPipelineWorkerPoolOptions options = + PipelineOptionsFactory.as(DataflowPipelineWorkerPoolOptions.class); + String testImage = "image.url:worker"; + options.setWorkerHarnessContainerImage(testImage); + DataflowRunner.validateWorkerSettings(options); + assertEquals(testImage, options.getWorkerHarnessContainerImage()); + assertEquals(testImage, options.getSdkContainerImage()); + } + + @Test + public void testAliasForSdkContainerImage() { + DataflowPipelineWorkerPoolOptions options = + PipelineOptionsFactory.as(DataflowPipelineWorkerPoolOptions.class); + String testImage = "image.url:sdk"; + options.setSdkContainerImage("image.url:sdk"); + DataflowRunner.validateWorkerSettings(options); + assertEquals(testImage, options.getWorkerHarnessContainerImage()); + assertEquals(testImage, options.getSdkContainerImage()); + } + @Test public void testRegionRequiredForServiceRunner() throws IOException { DataflowPipelineOptions options = buildPipelineOptions(); @@ -701,7 +721,7 @@ public void testUnconsumedReads() throws IOException { RuntimeTestOptions options = dataflowOptions.as(RuntimeTestOptions.class); Pipeline p = buildDataflowPipeline(dataflowOptions); p.apply(TextIO.read().from(options.getInput())); - DataflowRunner.fromOptions(dataflowOptions).replaceTransforms(p); + DataflowRunner.fromOptions(dataflowOptions).replaceV1Transforms(p); final AtomicBoolean unconsumedSeenAsInput = new AtomicBoolean(); p.traverseTopologically( new PipelineVisitor.Defaults() { @@ -1158,6 +1178,97 @@ public void testNoStagingLocationAndNoTempLocationFails() { DataflowRunner.fromOptions(options); } + @Test + public void testResolveArtifacts() throws IOException { + DataflowPipelineOptions options = buildPipelineOptions(); + DataflowRunner runner = DataflowRunner.fromOptions(options); + String stagingLocation = options.getStagingLocation().replaceFirst("/$", ""); + RunnerApi.ArtifactInformation fooLocalArtifact = + RunnerApi.ArtifactInformation.newBuilder() + .setTypeUrn(BeamUrns.getUrn(RunnerApi.StandardArtifacts.Types.FILE)) + .setTypePayload( + RunnerApi.ArtifactFilePayload.newBuilder() + .setPath("/tmp/foo.jar") + .build() + .toByteString()) + .setRoleUrn(BeamUrns.getUrn(RunnerApi.StandardArtifacts.Roles.STAGING_TO)) + .setRolePayload( + RunnerApi.ArtifactStagingToRolePayload.newBuilder() + .setStagedName("foo_staged.jar") + .build() + .toByteString()) + .build(); + RunnerApi.ArtifactInformation barLocalArtifact = + RunnerApi.ArtifactInformation.newBuilder() + .setTypeUrn(BeamUrns.getUrn(RunnerApi.StandardArtifacts.Types.FILE)) + .setTypePayload( + 
RunnerApi.ArtifactFilePayload.newBuilder() + .setPath("/tmp/bar.jar") + .build() + .toByteString()) + .setRoleUrn(BeamUrns.getUrn(RunnerApi.StandardArtifacts.Roles.STAGING_TO)) + .setRolePayload( + RunnerApi.ArtifactStagingToRolePayload.newBuilder() + .setStagedName("bar_staged.jar") + .build() + .toByteString()) + .build(); + RunnerApi.Pipeline pipeline = + RunnerApi.Pipeline.newBuilder() + .setComponents( + RunnerApi.Components.newBuilder() + .putEnvironments( + "env", + RunnerApi.Environment.newBuilder() + .addAllDependencies( + ImmutableList.of(fooLocalArtifact, barLocalArtifact)) + .build())) + .build(); + + RunnerApi.ArtifactInformation fooStagedArtifact = + RunnerApi.ArtifactInformation.newBuilder() + .setTypeUrn(BeamUrns.getUrn(RunnerApi.StandardArtifacts.Types.URL)) + .setTypePayload( + RunnerApi.ArtifactUrlPayload.newBuilder() + .setUrl(stagingLocation + "/foo_staged.jar") + .build() + .toByteString()) + .setRoleUrn(BeamUrns.getUrn(RunnerApi.StandardArtifacts.Roles.STAGING_TO)) + .setRolePayload( + RunnerApi.ArtifactStagingToRolePayload.newBuilder() + .setStagedName("foo_staged.jar") + .build() + .toByteString()) + .build(); + RunnerApi.ArtifactInformation barStagedArtifact = + RunnerApi.ArtifactInformation.newBuilder() + .setTypeUrn(BeamUrns.getUrn(RunnerApi.StandardArtifacts.Types.URL)) + .setTypePayload( + RunnerApi.ArtifactUrlPayload.newBuilder() + .setUrl(stagingLocation + "/bar_staged.jar") + .build() + .toByteString()) + .setRoleUrn(BeamUrns.getUrn(RunnerApi.StandardArtifacts.Roles.STAGING_TO)) + .setRolePayload( + RunnerApi.ArtifactStagingToRolePayload.newBuilder() + .setStagedName("bar_staged.jar") + .build() + .toByteString()) + .build(); + RunnerApi.Pipeline expectedPipeline = + RunnerApi.Pipeline.newBuilder() + .setComponents( + RunnerApi.Components.newBuilder() + .putEnvironments( + "env", + RunnerApi.Environment.newBuilder() + .addAllDependencies( + ImmutableList.of(fooStagedArtifact, barStagedArtifact)) + .build())) + .build(); + assertThat(runner.resolveArtifacts(pipeline), equalTo(expectedPipeline)); + } + @Test public void testGcpTempAndNoTempLocationSucceeds() throws Exception { DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class); @@ -1342,6 +1453,66 @@ public void testTransformTranslator() throws IOException { assertTrue(transform.translated); } + @Test + public void testSdkHarnessConfiguration() throws IOException { + DataflowPipelineOptions options = buildPipelineOptions(); + ExperimentalOptions.addExperiment(options, "use_runner_v2"); + Pipeline p = Pipeline.create(options); + + p.apply(Create.of(Arrays.asList(1, 2, 3))); + + String defaultSdkContainerImage = DataflowRunner.getContainerImageForJob(options); + SdkComponents sdkComponents = SdkComponents.create(); + RunnerApi.Environment defaultEnvironmentForDataflow = + Environments.createDockerEnvironment(defaultSdkContainerImage); + sdkComponents.registerEnvironment(defaultEnvironmentForDataflow.toBuilder().build()); + + RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(p, sdkComponents, true); + + Job job = + DataflowPipelineTranslator.fromOptions(options) + .translate( + p, + pipelineProto, + sdkComponents, + DataflowRunner.fromOptions(options), + Collections.emptyList()) + .getJob(); + + DataflowRunner.configureSdkHarnessContainerImages(options, pipelineProto, job); + List sdks = + job.getEnvironment().getWorkerPools().get(0).getSdkHarnessContainerImages(); + + Map expectedEnvIdsAndContainerImages = + 
pipelineProto.getComponents().getEnvironmentsMap().entrySet().stream() + .filter( + x -> + BeamUrns.getUrn(RunnerApi.StandardEnvironments.Environments.DOCKER) + .equals(x.getValue().getUrn())) + .collect( + Collectors.toMap( + x -> x.getKey(), + x -> { + RunnerApi.DockerPayload payload; + try { + payload = RunnerApi.DockerPayload.parseFrom(x.getValue().getPayload()); + } catch (InvalidProtocolBufferException e) { + throw new RuntimeException(e); + } + return payload.getContainerImage(); + })); + + assertEquals(1, expectedEnvIdsAndContainerImages.size()); + assertEquals(1, sdks.size()); + assertEquals( + expectedEnvIdsAndContainerImages, + sdks.stream() + .collect( + Collectors.toMap( + SdkHarnessContainerImage::getEnvironmentId, + SdkHarnessContainerImage::getContainerImage))); + } + private void verifyMapStateUnsupported(PipelineOptions options) throws Exception { Pipeline p = Pipeline.create(options); p.apply(Create.of(KV.of(13, 42))) @@ -1361,16 +1532,12 @@ public void process() {} } @Test - public void testMapStateUnsupportedInBatch() throws Exception { + public void testMapStateUnsupportedStreamingEngine() throws Exception { PipelineOptions options = buildPipelineOptions(); - options.as(StreamingOptions.class).setStreaming(false); - verifyMapStateUnsupported(options); - } + ExperimentalOptions.addExperiment( + options.as(ExperimentalOptions.class), GcpOptions.STREAMING_ENGINE_EXPERIMENT); + options.as(DataflowPipelineOptions.class).setStreaming(true); - @Test - public void testMapStateUnsupportedInStreaming() throws Exception { - PipelineOptions options = buildPipelineOptions(); - options.as(StreamingOptions.class).setStreaming(true); verifyMapStateUnsupported(options); } @@ -1393,17 +1560,11 @@ public void process() {} } @Test - public void testSetStateUnsupportedInBatch() throws Exception { + public void testSetStateUnsupportedStreamingEngine() throws Exception { PipelineOptions options = buildPipelineOptions(); - options.as(StreamingOptions.class).setStreaming(false); - Pipeline.create(options); - verifySetStateUnsupported(options); - } - - @Test - public void testSetStateUnsupportedInStreaming() throws Exception { - PipelineOptions options = buildPipelineOptions(); - options.as(StreamingOptions.class).setStreaming(true); + ExperimentalOptions.addExperiment( + options.as(ExperimentalOptions.class), GcpOptions.STREAMING_ENGINE_EXPERIMENT); + options.as(DataflowPipelineOptions.class).setStreaming(true); verifySetStateUnsupported(options); } @@ -1535,33 +1696,60 @@ public void testTemplateRunnerLoggedErrorForFile() throws Exception { } @Test - public void testWorkerHarnessContainerImage() { + public void testGetContainerImageForJobFromOption() { DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class); - // default image set - options.setWorkerHarnessContainerImage("some-container"); - assertThat(getContainerImageForJob(options), equalTo("some-container")); - - // batch, legacy - options.setWorkerHarnessContainerImage("gcr.io/IMAGE/foo"); - options.setExperiments(null); - options.setStreaming(false); - System.setProperty("java.specification.version", "1.8"); - assertThat(getContainerImageForJob(options), equalTo("gcr.io/beam-java-batch/foo")); - // batch, legacy, jdk11 - options.setStreaming(false); - System.setProperty("java.specification.version", "11"); - assertThat(getContainerImageForJob(options), equalTo("gcr.io/beam-java11-batch/foo")); - // streaming, legacy - System.setProperty("java.specification.version", "1.8"); - 
options.setStreaming(true); - assertThat(getContainerImageForJob(options), equalTo("gcr.io/beam-java-streaming/foo")); - // streaming, legacy, jdk11 - System.setProperty("java.specification.version", "11"); - assertThat(getContainerImageForJob(options), equalTo("gcr.io/beam-java11-streaming/foo")); - // streaming, fnapi - options.setExperiments(ImmutableList.of("experiment1", "beam_fn_api")); - assertThat(getContainerImageForJob(options), equalTo("gcr.io/java/foo")); + String[] testCases = { + "some-container", + + // It is important that empty string is preserved, as + // dataflowWorkerJar relies on being passed an empty value vs + // not providing the container image option at all. + "", + }; + + for (String testCase : testCases) { + // When image option is set, should use that exact image. + options.setSdkContainerImage(testCase); + assertThat(getContainerImageForJob(options), equalTo(testCase)); + } + } + + @Test + public void testGetContainerImageForJobFromOptionWithPlaceholder() { + DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class); + options.setSdkContainerImage("gcr.io/IMAGE/foo"); + + for (Environments.JavaVersion javaVersion : Environments.JavaVersion.values()) { + System.setProperty("java.specification.version", javaVersion.specification()); + // batch legacy + options.setExperiments(null); + options.setStreaming(false); + assertThat( + getContainerImageForJob(options), + equalTo(String.format("gcr.io/beam-%s-batch/foo", javaVersion.legacyName()))); + + // streaming, legacy + options.setExperiments(null); + options.setStreaming(true); + assertThat( + getContainerImageForJob(options), + equalTo(String.format("gcr.io/beam-%s-streaming/foo", javaVersion.legacyName()))); + + // batch, FnAPI + options.setExperiments(ImmutableList.of("beam_fn_api")); + options.setStreaming(false); + assertThat( + getContainerImageForJob(options), + equalTo(String.format("gcr.io/beam_%s_sdk/foo", javaVersion.name()))); + + // streaming, FnAPI + options.setExperiments(ImmutableList.of("beam_fn_api")); + options.setStreaming(true); + assertThat( + getContainerImageForJob(options), + equalTo(String.format("gcr.io/beam_%s_sdk/foo", javaVersion.name()))); + } } @Test @@ -1611,19 +1799,21 @@ public void testMergingStatefulRejectedInBatch() throws Exception { verifyMergingStatefulParDoRejected(options); } - private void verifyGroupIntoBatchesOverride( + private void verifyGroupIntoBatchesOverrideCount( Pipeline p, Boolean withShardedKey, Boolean expectOverriden) { final int batchSize = 2; List> testValues = Arrays.asList(KV.of("A", 1), KV.of("B", 0), KV.of("A", 2), KV.of("A", 4), KV.of("A", 8)); - PCollection> input = p.apply(Create.of(testValues)); + PCollection> input = p.apply("CreateValuesCount", Create.of(testValues)); PCollection>> output; if (withShardedKey) { output = input - .apply(GroupIntoBatches.ofSize(batchSize).withShardedKey()) .apply( - "StripShardId", + "GroupIntoBatchesCount", + GroupIntoBatches.ofSize(batchSize).withShardedKey()) + .apply( + "StripShardIdCount", MapElements.via( new SimpleFunction< KV, Iterable>, @@ -1635,28 +1825,99 @@ public KV> apply( } })); } else { - output = input.apply(GroupIntoBatches.ofSize(batchSize)); + output = input.apply("GroupIntoBatchesCount", GroupIntoBatches.ofSize(batchSize)); } PAssert.thatMultimap(output) .satisfies( - new SerializableFunction>>, Void>() { - @Override - public Void apply(Map>> input) { - assertEquals(2, input.size()); - assertThat(input.keySet(), containsInAnyOrder("A", "B")); - Map sums = new 
HashMap<>(); - for (Map.Entry>> entry : input.entrySet()) { - for (Iterable batch : entry.getValue()) { - assertThat(Iterables.size(batch), lessThanOrEqualTo(batchSize)); - for (Integer value : batch) { - sums.put(entry.getKey(), value + sums.getOrDefault(entry.getKey(), 0)); - } + i -> { + assertEquals(2, i.size()); + assertThat(i.keySet(), containsInAnyOrder("A", "B")); + Map sums = new HashMap<>(); + for (Map.Entry>> entry : i.entrySet()) { + for (Iterable batch : entry.getValue()) { + assertThat(Iterables.size(batch), lessThanOrEqualTo(batchSize)); + for (Integer value : batch) { + sums.put(entry.getKey(), value + sums.getOrDefault(entry.getKey(), 0)); } } - assertEquals(15, (int) sums.get("A")); - assertEquals(0, (int) sums.get("B")); - return null; } + assertEquals(15, (int) sums.get("A")); + assertEquals(0, (int) sums.get("B")); + return null; + }); + p.run(); + + AtomicBoolean sawGroupIntoBatchesOverride = new AtomicBoolean(false); + p.traverseTopologically( + new PipelineVisitor.Defaults() { + + @Override + public CompositeBehavior enterCompositeTransform(Node node) { + if (p.getOptions().as(StreamingOptions.class).isStreaming() + && node.getTransform() + instanceof GroupIntoBatchesOverride.StreamingGroupIntoBatchesWithShardedKey) { + sawGroupIntoBatchesOverride.set(true); + } + if (!p.getOptions().as(StreamingOptions.class).isStreaming() + && node.getTransform() instanceof GroupIntoBatchesOverride.BatchGroupIntoBatches) { + sawGroupIntoBatchesOverride.set(true); + } + if (!p.getOptions().as(StreamingOptions.class).isStreaming() + && node.getTransform() + instanceof GroupIntoBatchesOverride.BatchGroupIntoBatchesWithShardedKey) { + sawGroupIntoBatchesOverride.set(true); + } + return CompositeBehavior.ENTER_TRANSFORM; + } + }); + if (expectOverriden) { + assertTrue(sawGroupIntoBatchesOverride.get()); + } else { + assertFalse(sawGroupIntoBatchesOverride.get()); + } + } + + private void verifyGroupIntoBatchesOverrideBytes( + Pipeline p, Boolean withShardedKey, Boolean expectOverriden) { + final long batchSizeBytes = 2; + List> testValues = + Arrays.asList( + KV.of("A", "a"), + KV.of("A", "ab"), + KV.of("A", "abc"), + KV.of("A", "abcd"), + KV.of("A", "abcde")); + PCollection> input = p.apply("CreateValuesBytes", Create.of(testValues)); + PCollection>> output; + if (withShardedKey) { + output = + input + .apply( + "GroupIntoBatchesBytes", + GroupIntoBatches.ofByteSize(batchSizeBytes).withShardedKey()) + .apply( + "StripShardIdBytes", + MapElements.via( + new SimpleFunction< + KV, Iterable>, + KV>>() { + @Override + public KV> apply( + KV, Iterable> input) { + return KV.of(input.getKey().getKey(), input.getValue()); + } + })); + } else { + output = input.apply("GroupIntoBatchesBytes", GroupIntoBatches.ofByteSize(batchSizeBytes)); + } + PAssert.thatMultimap(output) + .satisfies( + i -> { + assertEquals(1, i.size()); + assertThat(i.keySet(), containsInAnyOrder("A")); + Iterable> batches = i.get("A"); + assertEquals(5, Iterables.size(batches)); + return null; }); p.run(); @@ -1691,34 +1952,81 @@ public CompositeBehavior enterCompositeTransform(Node node) { } @Test - @Category(ValidatesRunner.class) - public void testBatchGroupIntoBatchesOverride() { + @Category({ValidatesRunner.class, UsesStatefulParDo.class}) + public void testBatchGroupIntoBatchesOverrideCount() { // Ignore this test for streaming pipelines. 
assumeFalse(pipeline.getOptions().as(StreamingOptions.class).isStreaming()); - verifyGroupIntoBatchesOverride(pipeline, false, true); + verifyGroupIntoBatchesOverrideCount(pipeline, false, true); } @Test - public void testBatchGroupIntoBatchesWithShardedKeyOverride() throws IOException { + @Category({ValidatesRunner.class, UsesStatefulParDo.class}) + public void testBatchGroupIntoBatchesOverrideBytes() { + // Ignore this test for streaming pipelines. + assumeFalse(pipeline.getOptions().as(StreamingOptions.class).isStreaming()); + verifyGroupIntoBatchesOverrideBytes(pipeline, false, true); + } + + @Test + public void testBatchGroupIntoBatchesWithShardedKeyOverrideCount() throws IOException { PipelineOptions options = buildPipelineOptions(); Pipeline p = Pipeline.create(options); - verifyGroupIntoBatchesOverride(p, true, true); + verifyGroupIntoBatchesOverrideCount(p, true, true); } @Test - public void testStreamingGroupIntoBatchesOverride() throws IOException { + public void testBatchGroupIntoBatchesWithShardedKeyOverrideBytes() throws IOException { + PipelineOptions options = buildPipelineOptions(); + Pipeline p = Pipeline.create(options); + verifyGroupIntoBatchesOverrideBytes(p, true, true); + } + + @Test + public void testStreamingGroupIntoBatchesOverrideCount() throws IOException { PipelineOptions options = buildPipelineOptions(); options.as(StreamingOptions.class).setStreaming(true); Pipeline p = Pipeline.create(options); - verifyGroupIntoBatchesOverride(p, false, false); + verifyGroupIntoBatchesOverrideCount(p, false, false); } @Test - public void testStreamingGroupIntoBatchesWithShardedKeyOverride() throws IOException { + public void testStreamingGroupIntoBatchesOverrideBytes() throws IOException { PipelineOptions options = buildPipelineOptions(); options.as(StreamingOptions.class).setStreaming(true); Pipeline p = Pipeline.create(options); - verifyGroupIntoBatchesOverride(p, true, true); + verifyGroupIntoBatchesOverrideBytes(p, false, false); + } + + @Test + public void testStreamingGroupIntoBatchesWithShardedKeyOverrideCount() throws IOException { + PipelineOptions options = buildPipelineOptions(); + List experiments = + new ArrayList<>( + ImmutableList.of( + GcpOptions.STREAMING_ENGINE_EXPERIMENT, + GcpOptions.WINDMILL_SERVICE_EXPERIMENT, + "use_runner_v2")); + DataflowPipelineOptions dataflowOptions = options.as(DataflowPipelineOptions.class); + dataflowOptions.setExperiments(experiments); + dataflowOptions.setStreaming(true); + Pipeline p = Pipeline.create(options); + verifyGroupIntoBatchesOverrideCount(p, true, true); + } + + @Test + public void testStreamingGroupIntoBatchesWithShardedKeyOverrideBytes() throws IOException { + PipelineOptions options = buildPipelineOptions(); + List experiments = + new ArrayList<>( + ImmutableList.of( + GcpOptions.STREAMING_ENGINE_EXPERIMENT, + GcpOptions.WINDMILL_SERVICE_EXPERIMENT, + "use_runner_v2")); + DataflowPipelineOptions dataflowOptions = options.as(DataflowPipelineOptions.class); + dataflowOptions.setExperiments(experiments); + dataflowOptions.setStreaming(true); + Pipeline p = Pipeline.create(options); + verifyGroupIntoBatchesOverrideBytes(p, true, true); } private void testStreamingWriteOverride(PipelineOptions options, int expectedNumShards) { @@ -1731,7 +2039,12 @@ private void testStreamingWriteOverride(PipelineOptions options, int expectedNum AppliedPTransform, WriteFilesResult, WriteFiles> originalApplication = AppliedPTransform.of( - "writefiles", PValues.expandInput(objs), Collections.emptyMap(), original, p); + "writefiles", + 
PValues.expandInput(objs), + Collections.emptyMap(), + original, + ResourceHints.create(), + p); WriteFiles replacement = (WriteFiles) diff --git a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/PrimitiveParDoSingleFactoryTest.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/PrimitiveParDoSingleFactoryTest.java index 5d73e16cb781..c09b2233d2fe 100644 --- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/PrimitiveParDoSingleFactoryTest.java +++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/PrimitiveParDoSingleFactoryTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.dataflow; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.io.Serializable; import java.util.List; @@ -35,6 +35,7 @@ import org.apache.beam.sdk.transforms.View; import org.apache.beam.sdk.transforms.display.DisplayData; import org.apache.beam.sdk.transforms.display.DisplayDataEvaluator; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionView; import org.apache.beam.sdk.values.PValues; @@ -74,6 +75,7 @@ public void getReplacementTransformPopulateDisplayData() { PValues.expandInput(input), PValues.expandOutput(input.apply(originalTransform)), originalTransform, + ResourceHints.create(), pipeline); PTransformReplacement, PCollection> replacement = @@ -111,6 +113,7 @@ public void getReplacementTransformGetSideInputs() { PValues.expandInput(input), PValues.expandOutput(input.apply(originalTransform)), originalTransform, + ResourceHints.create(), pipeline); PTransformReplacement, PCollection> replacementTransform = @@ -133,6 +136,7 @@ public void getReplacementTransformGetFn() { PValues.expandInput(input), PValues.expandOutput(input.apply(originalTransform)), originalTransform, + ResourceHints.create(), pipeline); PTransformReplacement, PCollection> replacementTransform = diff --git a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/RecordingPipelineVisitor.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/RecordingPipelineVisitor.java index 3e46bc575314..e04a676e86a0 100644 --- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/RecordingPipelineVisitor.java +++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/RecordingPipelineVisitor.java @@ -28,9 +28,6 @@ * Provides a simple {@link org.apache.beam.sdk.Pipeline.PipelineVisitor} that records the * transformation tree. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) class RecordingPipelineVisitor extends Pipeline.PipelineVisitor.Defaults { public final List> transforms = new ArrayList<>(); diff --git a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/TestDataflowRunnerTest.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/TestDataflowRunnerTest.java index fb10b4f144d9..5e86fd3e6637 100644 --- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/TestDataflowRunnerTest.java +++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/TestDataflowRunnerTest.java @@ -17,12 +17,12 @@ */ package org.apache.beam.runners.dataflow; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsString; import static org.hamcrest.Matchers.equalTo; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNull; import static org.junit.Assert.assertSame; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import static org.mockito.ArgumentMatchers.anyString; import static org.mockito.Matchers.any; @@ -75,9 +75,6 @@ /** Tests for {@link TestDataflowRunner}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TestDataflowRunnerTest { @Rule public ExpectedException expectedException = ExpectedException.none(); @Mock private DataflowClient mockClient; diff --git a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/options/DataflowPipelineDebugOptionsTest.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/options/DataflowPipelineDebugOptionsTest.java index dbe7a079c114..80a95e241116 100644 --- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/options/DataflowPipelineDebugOptionsTest.java +++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/options/DataflowPipelineDebugOptionsTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.dataflow.options; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.hasEntry; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.junit.Test; diff --git a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptionsTest.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptionsTest.java index f96a009250d7..292ce619fc27 100644 --- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptionsTest.java +++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptionsTest.java @@ -43,9 +43,6 @@ /** Tests for {@link DataflowPipelineOptions}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DataflowPipelineOptionsTest { @Rule public TestRule restoreSystemProperties = new RestoreSystemProperties(); @Rule public ResetDateTimeProvider resetDateTimeProviderRule = new ResetDateTimeProvider(); diff --git a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/options/DataflowProfilingOptionsTest.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/options/DataflowProfilingOptionsTest.java index 44d3c697463f..86955bbc0d69 100644 --- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/options/DataflowProfilingOptionsTest.java +++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/options/DataflowProfilingOptionsTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.dataflow.options; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import com.fasterxml.jackson.databind.ObjectMapper; import org.apache.beam.sdk.options.PipelineOptionsFactory; diff --git a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/options/DataflowWorkerLoggingOptionsTest.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/options/DataflowWorkerLoggingOptionsTest.java index 51049c8e636b..1fa5b143f719 100644 --- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/options/DataflowWorkerLoggingOptionsTest.java +++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/options/DataflowWorkerLoggingOptionsTest.java @@ -32,9 +32,6 @@ /** Tests for {@link DataflowWorkerLoggingOptions}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DataflowWorkerLoggingOptionsTest { private static final ObjectMapper MAPPER = new ObjectMapper() diff --git a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/transforms/DataflowGroupByKeyTest.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/transforms/DataflowGroupByKeyTest.java index 798e1af91593..915dc74a2f1b 100644 --- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/transforms/DataflowGroupByKeyTest.java +++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/transforms/DataflowGroupByKeyTest.java @@ -18,27 +18,20 @@ package org.apache.beam.runners.dataflow.transforms; import com.google.api.services.dataflow.Dataflow; -import java.util.Arrays; -import java.util.List; import org.apache.beam.runners.dataflow.DataflowRunner; import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions; import org.apache.beam.sdk.Pipeline; -import org.apache.beam.sdk.coders.BigEndianIntegerCoder; import org.apache.beam.sdk.coders.KvCoder; import org.apache.beam.sdk.coders.StringUtf8Coder; import org.apache.beam.sdk.coders.VarIntCoder; import org.apache.beam.sdk.extensions.gcp.storage.NoopPathValidator; import org.apache.beam.sdk.options.PipelineOptionsFactory; -import org.apache.beam.sdk.transforms.Create; import org.apache.beam.sdk.transforms.GroupByKey; import org.apache.beam.sdk.transforms.PTransform; -import org.apache.beam.sdk.transforms.windowing.Sessions; -import org.apache.beam.sdk.transforms.windowing.Window; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PBegin; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.WindowingStrategy; -import org.joda.time.Duration; import org.junit.Before; import org.junit.Rule; import org.junit.Test; @@ -50,9 +43,6 @@ /** Tests for {@link GroupByKey} for the {@link DataflowRunner}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DataflowGroupByKeyTest { @Rule public ExpectedException thrown = ExpectedException.none(); @@ -78,23 +68,6 @@ private Pipeline createTestServiceRunner() { return Pipeline.create(options); } - @Test - public void testInvalidWindowsService() { - Pipeline p = createTestServiceRunner(); - - List> ungroupedPairs = Arrays.asList(); - - PCollection> input = - p.apply( - Create.of(ungroupedPairs) - .withCoder(KvCoder.of(StringUtf8Coder.of(), BigEndianIntegerCoder.of()))) - .apply(Window.into(Sessions.withGapDuration(Duration.standardMinutes(1)))); - - thrown.expect(IllegalStateException.class); - thrown.expectMessage("GroupByKey must have a valid Window merge function"); - input.apply("GroupByKey", GroupByKey.create()).apply("GroupByKeyAgain", GroupByKey.create()); - } - @Test public void testGroupByKeyServiceUnbounded() { Pipeline p = createTestServiceRunner(); diff --git a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/transforms/DataflowViewTest.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/transforms/DataflowViewTest.java index c8e8b93bb3dc..78eb18a46bda 100644 --- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/transforms/DataflowViewTest.java +++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/transforms/DataflowViewTest.java @@ -26,19 +26,14 @@ import org.apache.beam.sdk.coders.VarIntCoder; import org.apache.beam.sdk.extensions.gcp.storage.NoopPathValidator; import org.apache.beam.sdk.options.PipelineOptionsFactory; -import org.apache.beam.sdk.transforms.Create; import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.transforms.View; -import org.apache.beam.sdk.transforms.windowing.FixedWindows; -import org.apache.beam.sdk.transforms.windowing.InvalidWindows; -import org.apache.beam.sdk.transforms.windowing.Window; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PBegin; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionView; import org.apache.beam.sdk.values.WindowingStrategy; import org.hamcrest.Matchers; -import org.joda.time.Duration; import org.junit.Before; import org.junit.Rule; import org.junit.Test; @@ -51,9 +46,6 @@ /** Tests for {@link View} for a {@link DataflowRunner}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DataflowViewTest { @Rule public transient ExpectedException thrown = ExpectedException.none(); @@ -109,22 +101,6 @@ public PCollection> expand(PBegin input) { .apply(view); } - private void testViewNonmerging( - Pipeline pipeline, - PTransform>, ? 
extends PCollectionView> view) { - thrown.expect(IllegalStateException.class); - thrown.expectMessage("Unable to create a side-input view from input"); - thrown.expectCause( - ThrowableMessageMatcher.hasMessage(Matchers.containsString("Consumed by GroupByKey"))); - pipeline - .apply(Create.of(KV.of("hello", 5))) - .apply( - Window.into( - new InvalidWindows<>( - "Consumed by GroupByKey", FixedWindows.of(Duration.standardHours(1))))) - .apply(view); - } - @Test public void testViewUnboundedAsSingletonBatch() { testViewUnbounded(createTestBatchRunner(), View.asSingleton()); @@ -174,54 +150,4 @@ public void testViewUnboundedAsMultimapBatch() { public void testViewUnboundedAsMultimapStreaming() { testViewUnbounded(createTestStreamingRunner(), View.asMultimap()); } - - @Test - public void testViewNonmergingAsSingletonBatch() { - testViewNonmerging(createTestBatchRunner(), View.asSingleton()); - } - - @Test - public void testViewNonmergingAsSingletonStreaming() { - testViewNonmerging(createTestStreamingRunner(), View.asSingleton()); - } - - @Test - public void testViewNonmergingAsIterableBatch() { - testViewNonmerging(createTestBatchRunner(), View.asIterable()); - } - - @Test - public void testViewNonmergingAsIterableStreaming() { - testViewNonmerging(createTestStreamingRunner(), View.asIterable()); - } - - @Test - public void testViewNonmergingAsListBatch() { - testViewNonmerging(createTestBatchRunner(), View.asList()); - } - - @Test - public void testViewNonmergingAsListStreaming() { - testViewNonmerging(createTestStreamingRunner(), View.asList()); - } - - @Test - public void testViewNonmergingAsMapBatch() { - testViewNonmerging(createTestBatchRunner(), View.asMap()); - } - - @Test - public void testViewNonmergingAsMapStreaming() { - testViewNonmerging(createTestStreamingRunner(), View.asMap()); - } - - @Test - public void testViewNonmergingAsMultimapBatch() { - testViewNonmerging(createTestBatchRunner(), View.asMultimap()); - } - - @Test - public void testViewNonmergingAsMultimapStreaming() { - testViewNonmerging(createTestStreamingRunner(), View.asMultimap()); - } } diff --git a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/CloudObjectsTest.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/CloudObjectsTest.java index 27f32238a649..f1a004f49358 100644 --- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/CloudObjectsTest.java +++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/CloudObjectsTest.java @@ -47,9 +47,11 @@ import org.apache.beam.sdk.coders.ListCoder; import org.apache.beam.sdk.coders.MapCoder; import org.apache.beam.sdk.coders.NullableCoder; +import org.apache.beam.sdk.coders.RowCoder; import org.apache.beam.sdk.coders.SerializableCoder; import org.apache.beam.sdk.coders.SetCoder; import org.apache.beam.sdk.coders.StructuredCoder; +import org.apache.beam.sdk.coders.TimestampPrefixingWindowCoder; import org.apache.beam.sdk.coders.VarLongCoder; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.schemas.Schema.FieldType; @@ -82,7 +84,6 @@ @RunWith(Enclosed.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class CloudObjectsTest { private static final Schema TEST_SCHEMA = @@ -144,6 +145,7 @@ public static Iterable> data() { .add(new ObjectCoder()) 
.add(GlobalWindow.Coder.INSTANCE) .add(IntervalWindow.getCoder()) + .add(TimestampPrefixingWindowCoder.of(IntervalWindow.getCoder())) .add(LengthPrefixCoder.of(VarLongCoder.of())) .add(IterableCoder.of(VarLongCoder.of())) .add(KvCoder.of(VarLongCoder.of(), ByteArrayCoder.of())) @@ -179,7 +181,8 @@ public static Iterable> data() { new RowIdentity())) .add( SchemaCoder.of( - TEST_SCHEMA, TypeDescriptors.rows(), new RowIdentity(), new RowIdentity())); + TEST_SCHEMA, TypeDescriptors.rows(), new RowIdentity(), new RowIdentity())) + .add(RowCoder.of(TEST_SCHEMA)); for (Class atomicCoder : DefaultCoderCloudObjectTranslatorRegistrar.KNOWN_ATOMIC_CODERS) { dataBuilder.add(InstanceBuilder.ofType(atomicCoder).fromFactoryMethod("of").build()); diff --git a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/MonitoringUtilTest.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/MonitoringUtilTest.java index f4c1ce8140ac..df169b66a58a 100644 --- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/MonitoringUtilTest.java +++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/MonitoringUtilTest.java @@ -45,9 +45,6 @@ /** Tests for MonitoringUtil. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class MonitoringUtilTest { private static final String PROJECT_ID = "someProject"; private static final String REGION_ID = "thatRegion"; diff --git a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/PackageUtilTest.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/PackageUtilTest.java index cfc8b6cf9123..5c6e813c51e4 100644 --- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/PackageUtilTest.java +++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/PackageUtilTest.java @@ -17,6 +17,7 @@ */ package org.apache.beam.runners.dataflow.util; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.endsWith; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.instanceOf; @@ -24,7 +25,6 @@ import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNotEquals; import static org.junit.Assert.assertNull; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.junit.Assert.fail; import static org.mockito.ArgumentMatchers.anyString; @@ -104,9 +104,6 @@ /** Tests for {@link PackageUtil}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PackageUtilTest { @Rule public ExpectedLogs logged = ExpectedLogs.none(PackageUtil.class); @Rule public TemporaryFolder tmpFolder = new TemporaryFolder(); diff --git a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/StructsTest.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/StructsTest.java index 903c2fb2a152..b8fda7a14620 100644 --- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/StructsTest.java +++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/StructsTest.java @@ -33,6 +33,7 @@ import static org.apache.beam.runners.dataflow.util.Structs.getObject; import static org.apache.beam.runners.dataflow.util.Structs.getString; import static org.apache.beam.runners.dataflow.util.Structs.getStrings; +import static org.hamcrest.MatcherAssert.assertThat; import java.util.ArrayList; import java.util.Arrays; @@ -48,9 +49,6 @@ /** Tests for Structs. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class StructsTest { private List> makeCloudObjects() { List> objects = new ArrayList<>(); @@ -94,7 +92,7 @@ public void testGetStringParameter() throws Exception { getString(o, "missingKey"); Assert.fail("should have thrown an exception"); } catch (Exception exn) { - Assert.assertThat( + assertThat( exn.toString(), Matchers.containsString("didn't find required parameter missingKey")); } @@ -102,26 +100,26 @@ public void testGetStringParameter() throws Exception { getString(o, "noStringsKey"); Assert.fail("should have thrown an exception"); } catch (Exception exn) { - Assert.assertThat(exn.toString(), Matchers.containsString("not a string")); + assertThat(exn.toString(), Matchers.containsString("not a string")); } - Assert.assertThat(getStrings(o, "noStringsKey", null), Matchers.emptyIterable()); - Assert.assertThat(getObject(o, "noStringsKey").keySet(), Matchers.emptyIterable()); - Assert.assertThat(getDictionary(o, "noStringsKey").keySet(), Matchers.emptyIterable()); - Assert.assertThat(getDictionary(o, "noStringsKey", null).keySet(), Matchers.emptyIterable()); + assertThat(getStrings(o, "noStringsKey", null), Matchers.emptyIterable()); + assertThat(getObject(o, "noStringsKey").keySet(), Matchers.emptyIterable()); + assertThat(getDictionary(o, "noStringsKey").keySet(), Matchers.emptyIterable()); + assertThat(getDictionary(o, "noStringsKey", null).keySet(), Matchers.emptyIterable()); try { getString(o, "multipleStringsKey"); Assert.fail("should have thrown an exception"); } catch (Exception exn) { - Assert.assertThat(exn.toString(), Matchers.containsString("not a string")); + assertThat(exn.toString(), Matchers.containsString("not a string")); } try { getString(o, "emptyKey"); Assert.fail("should have thrown an exception"); } catch (Exception exn) { - Assert.assertThat(exn.toString(), Matchers.containsString("not a string")); + assertThat(exn.toString(), Matchers.containsString("not a string")); } } @@ -136,7 +134,7 @@ public void testGetBooleanParameter() throws Exception { getBoolean(o, "emptyKey", false); Assert.fail("should have thrown an exception"); } catch (Exception exn) { - Assert.assertThat(exn.toString(), Matchers.containsString("not a boolean")); + assertThat(exn.toString(), Matchers.containsString("not a boolean")); } } 
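A recurring change across these test diffs (StructsTest here, and PackageUtilTest, MonitoringUtilTest, and the runner/options tests earlier) is the swap from `org.junit.Assert.assertThat`, which is deprecated as of JUnit 4.13, to Hamcrest's own `org.hamcrest.MatcherAssert.assertThat`. A minimal sketch of the pattern, with an illustrative class name that is not part of this change:

```java
// Before: import static org.junit.Assert.assertThat;   (deprecated in JUnit 4.13)
// After:  import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.containsString;

class AssertThatMigrationSketch {
  void checkError(Exception exn) {
    // Call sites stay unchanged; only the static import moves to Hamcrest.
    assertThat(exn.toString(), containsString("didn't find required parameter missingKey"));
  }
}
```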
@@ -152,13 +150,13 @@ public void testGetLongParameter() throws Exception { getLong(o, "emptyKey", 666L); Assert.fail("should have thrown an exception"); } catch (Exception exn) { - Assert.assertThat(exn.toString(), Matchers.containsString("not a long")); + assertThat(exn.toString(), Matchers.containsString("not a long")); } try { getInt(o, "emptyKey", 666); Assert.fail("should have thrown an exception"); } catch (Exception exn) { - Assert.assertThat(exn.toString(), Matchers.containsString("not an int")); + assertThat(exn.toString(), Matchers.containsString("not an int")); } } @@ -172,7 +170,7 @@ public void testGetListOfMaps() throws Exception { getListOfMaps(o, "singletonLongKey", null); Assert.fail("should have thrown an exception"); } catch (Exception exn) { - Assert.assertThat(exn.toString(), Matchers.containsString("not a list")); + assertThat(exn.toString(), Matchers.containsString("not a list")); } } diff --git a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/TimeUtilTest.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/TimeUtilTest.java index 6254429c6674..1ac9fabf6a45 100644 --- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/TimeUtilTest.java +++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/TimeUtilTest.java @@ -32,9 +32,6 @@ /** Unit tests for {@link TimeUtil}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public final class TimeUtilTest { @Test public void toCloudTimeShouldPrintTimeStrings() { diff --git a/runners/google-cloud-dataflow-java/worker/build.gradle b/runners/google-cloud-dataflow-java/worker/build.gradle index 44d7b8e8a177..11e69c1ca918 100644 --- a/runners/google-cloud-dataflow-java/worker/build.gradle +++ b/runners/google-cloud-dataflow-java/worker/build.gradle @@ -70,6 +70,7 @@ dependencies { // All main sourceset dependencies here should be listed as compile scope so that the dependencies // are all packaged into a single uber jar allowing the jar to serve as an application. 
compile enforcedPlatform(library.java.google_cloud_platform_libraries_bom) + permitUnusedDeclared enforcedPlatform(library.java.google_cloud_platform_libraries_bom) compile project(":runners:google-cloud-dataflow-java") compile project(path: ":sdks:java:core", configuration: "shadow") compile project(":sdks:java:extensions:google-cloud-platform-core") @@ -81,24 +82,36 @@ dependencies { compile project(":runners:java-fn-execution") compile project(":sdks:java:fn-execution") compile project(path: ":runners:google-cloud-dataflow-java:worker:windmill", configuration: "shadow") - compile library.java.vendored_grpc_1_26_0 + compile library.java.vendored_grpc_1_36_0 compile google_api_services_dataflow compile library.java.avro compile library.java.google_api_client + permitUnusedDeclared library.java.google_api_client // BEAM-11761 + compile library.java.google_auth_library_credentials compile library.java.google_http_client compile library.java.google_http_client_jackson2 + compile library.java.guava compile library.java.jackson_annotations compile library.java.jackson_core compile library.java.jackson_databind compile library.java.joda_time + compile library.java.proto_google_common_protos shadow library.java.vendored_guava_26_0_jre compile library.java.slf4j_api + compile "io.opencensus:opencensus-api:0.28.0" compile "javax.servlet:javax.servlet-api:3.1.0" - compile "org.conscrypt:conscrypt-openjdk:2.5.1:linux-x86_64" + + // Conscrypt shouldn't be included here because Conscrypt won't work when being shaded. + // (Context: https://github.com/apache/beam/pull/13846) + // Conscrypt will be added to runtime dependencies by GrpcVendoring so compileOnly works for now. + compileOnly "org.conscrypt:conscrypt-openjdk-uber:2.5.1" + compile "org.eclipse.jetty:jetty-server:9.2.10.v20150310" compile "org.eclipse.jetty:jetty-servlet:9.2.10.v20150310" compile library.java.error_prone_annotations + permitUnusedDeclared library.java.error_prone_annotations // BEAM-11761 compile library.java.slf4j_jdk14 + permitUnusedDeclared library.java.slf4j_jdk14 // BEAM-11761 // All test sourceset dependencies can be marked as shadowTest since we create an uber jar without // relocating any code. 
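The Conscrypt change above (and the matching one in the legacy worker build below) moves `conscrypt-openjdk` from a shaded `compile` dependency to an unshaded `compileOnly` one, relying on the gRPC vendoring to put the jar on the runtime classpath, per the build comment. As background only, and not taken from the worker harness code, here is a generic sketch of how a Conscrypt provider is typically installed once the jar is available at runtime, which is why compile-time packaging and shading are not required:

```java
// Hypothetical sketch: installing Conscrypt as a JCA/JSSE provider at runtime.
// This works with compileOnly scoping as long as the unshaded jar is present on
// the runtime classpath (supplied here via the gRPC vendoring).
import java.security.Security;
import org.conscrypt.Conscrypt;

final class ConscryptBootstrapSketch {
  static void install() {
    if (Conscrypt.isAvailable()) {
      // Register ahead of the JDK providers so TLS picks up Conscrypt's engine.
      Security.insertProviderAt(Conscrypt.newProvider(), 1);
    }
  }
}
```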
@@ -107,8 +120,6 @@ dependencies { shadowTest project(path: ":sdks:java:core", configuration: "shadowTest") shadowTest project(path: ":sdks:java:extensions:google-cloud-platform-core", configuration: "testRuntime") shadowTest project(path: ":runners:direct-java", configuration: "shadow") - shadowTest library.java.hamcrest_core - shadowTest library.java.hamcrest_library shadowTest library.java.jsonassert shadowTest library.java.junit shadowTest library.java.mockito_core diff --git a/runners/google-cloud-dataflow-java/worker/legacy-worker/build.gradle b/runners/google-cloud-dataflow-java/worker/legacy-worker/build.gradle index cfd9f1fc15ec..6ae4a5327162 100644 --- a/runners/google-cloud-dataflow-java/worker/legacy-worker/build.gradle +++ b/runners/google-cloud-dataflow-java/worker/legacy-worker/build.gradle @@ -53,7 +53,7 @@ def sdk_provided_dependencies = [ library.java.jackson_databind, library.java.joda_time, library.java.slf4j_api, - library.java.vendored_grpc_1_26_0, + library.java.vendored_grpc_1_36_0, ] def sdk_provided_shaded_project_dependencies = [ @@ -72,7 +72,6 @@ def excluded_dependencies = [ "com.google.auto.value:auto-value", // Provided scope added from applyJavaNature "org.codehaus.jackson:jackson-core-asl", // Exclude an old version of jackson-core-asl introduced by google-http-client-jackson "org.objenesis:objenesis", // Transitive dependency introduced from Beam - "org.tukaani:xz", // Transitive dependency introduced from Beam library.java.commons_compress, // Transitive dependency introduced from Beam library.java.error_prone_annotations, // Provided scope added in worker library.java.hamcrest_core, // Test only @@ -95,8 +94,6 @@ applyJavaNature( "org/apache/beam/repackaged/beam_runners_google_cloud_dataflow_java_legacy_worker/**", // TODO(BEAM-6137): Move DataflowRunnerHarness class under org.apache.beam.runners.dataflow.worker namespace "com/google/cloud/dataflow/worker/DataflowRunnerHarness.class", - // TODO(BEAM-6136): Enable relocation for conscrypt - "org/conscrypt/**", // Allow slf4j implementation worker for logging during pipeline execution "org/slf4j/impl/**" ], @@ -140,11 +137,6 @@ applyJavaNature( relocate("org.apache.beam.sdk.fn", getWorkerRelocatedPath("org.apache.beam.sdk.fn")) relocate("org.apache.beam.repackaged.beam_sdks_java_fn_execution", getWorkerRelocatedPath("org.apache.beam.repackaged.beam_sdks_java_fn_execution")) - // TODO(BEAM-6136): Enable relocation for conscrypt - dependencies { - include(dependency("org.conscrypt:conscrypt-openjdk:2.5.1:linux-x86_64")) - } - dependencies { // We have to include jetty-server/jetty-servlet and all of its transitive dependencies // which includes several org.eclipse.jetty artifacts + servlet-api @@ -207,15 +199,30 @@ dependencies { } compile project(path: ":model:fn-execution", configuration: "shadow") + compile project(path: ":model:pipeline", configuration: "shadow") + compile project(path: ":sdks:java:core", configuration: "shadow") compile project(":runners:core-construction-java") compile project(":runners:core-java") compile project(":runners:java-fn-execution") compile project(":sdks:java:fn-execution") compile project(path: ":runners:google-cloud-dataflow-java:worker:windmill", configuration: "shadow") shadow library.java.vendored_guava_26_0_jre - compile "org.conscrypt:conscrypt-openjdk:2.5.1:linux-x86_64" + + // Conscrypt shouldn't be included here because Conscrypt won't work when being shaded. 
+ // (Context: https://github.com/apache/beam/pull/13846) + // Conscrypt will be added to runtime dependencies by GrpcVendoring so compileOnly works for now. + compileOnly "org.conscrypt:conscrypt-openjdk-uber:2.5.1" + + compile "javax.servlet:javax.servlet-api:3.1.0" compile "org.eclipse.jetty:jetty-server:9.2.10.v20150310" compile "org.eclipse.jetty:jetty-servlet:9.2.10.v20150310" + compile library.java.avro + compile library.java.jackson_annotations + compile library.java.jackson_core + compile library.java.jackson_databind + compile library.java.joda_time + compile library.java.slf4j_api + compile library.java.vendored_grpc_1_36_0 provided library.java.error_prone_annotations runtime library.java.slf4j_jdk14 @@ -227,8 +234,6 @@ dependencies { shadowTest project(path: ":runners:direct-java", configuration: "shadow") shadowTest project(path: ":sdks:java:harness", configuration: "shadowTest") shadowTest project(path: ":sdks:java:core", configuration: "shadowTest") - shadowTest library.java.hamcrest_core - shadowTest library.java.hamcrest_library shadowTest library.java.jsonassert shadowTest library.java.junit shadowTest library.java.mockito_core diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/BatchModeExecutionContext.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/BatchModeExecutionContext.java index cb54c60e257b..19670f47fc8e 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/BatchModeExecutionContext.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/BatchModeExecutionContext.java @@ -75,6 +75,8 @@ public class BatchModeExecutionContext "org.apache.beam.sdk.extensions.gcp.util.RetryHttpRequestInitializer$LoggingHttpBackOffHandler"; protected static final String BIGQUERY_STREAMING_INSERT_THROTTLE_TIME_NAMESPACE = "org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl"; + protected static final String BIGQUERY_READ_THROTTLE_TIME_NAMESPACE = + "org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$StorageClientImpl"; protected static final String THROTTLE_TIME_COUNTER_NAME = "throttling-msecs"; private BatchModeExecutionContext( @@ -555,6 +557,13 @@ public Long extractThrottleTime() { totalThrottleMsecs += bigqueryStreamingInsertThrottleTime.getCumulative(); } + CounterCell bigqueryReadThrottleTime = + container.tryGetCounter( + MetricName.named(BIGQUERY_READ_THROTTLE_TIME_NAMESPACE, THROTTLE_TIME_COUNTER_NAME)); + if (bigqueryReadThrottleTime != null) { + totalThrottleMsecs += bigqueryReadThrottleTime.getCumulative(); + } + CounterCell throttlingMsecs = container.tryGetCounter(DataflowSystemMetrics.THROTTLING_MSECS_METRIC_NAME); if (throttlingMsecs != null) { diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/BeamFnMapTaskExecutorFactory.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/BeamFnMapTaskExecutorFactory.java index 8480dccf5c85..8a45e5a4848e 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/BeamFnMapTaskExecutorFactory.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/BeamFnMapTaskExecutorFactory.java @@ -79,7 +79,6 @@ import org.apache.beam.runners.dataflow.worker.util.common.worker.Receiver; import 
org.apache.beam.runners.dataflow.worker.util.common.worker.Sink; import org.apache.beam.runners.dataflow.worker.util.common.worker.WriteOperation; -import org.apache.beam.runners.fnexecution.GrpcFnServer; import org.apache.beam.runners.fnexecution.control.InstructionRequestHandler; import org.apache.beam.runners.fnexecution.control.JobBundleFactory; import org.apache.beam.runners.fnexecution.control.SingleEnvironmentInstanceJobBundleFactory; @@ -92,6 +91,7 @@ import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.KvCoder; import org.apache.beam.sdk.fn.IdGenerator; +import org.apache.beam.sdk.fn.server.GrpcFnServer; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.util.WindowedValue; diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/ByteStringCoder.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/ByteStringCoder.java index eab920f73b94..38e64e53a529 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/ByteStringCoder.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/ByteStringCoder.java @@ -23,7 +23,7 @@ import org.apache.beam.sdk.coders.AtomicCoder; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.util.VarInt; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteStreams; /** diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowMapTaskExecutorFactory.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowMapTaskExecutorFactory.java index 820bfd32ac57..b310a349edc4 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowMapTaskExecutorFactory.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowMapTaskExecutorFactory.java @@ -22,11 +22,11 @@ import org.apache.beam.runners.dataflow.worker.counters.CounterSet; import org.apache.beam.runners.dataflow.worker.graph.Edges.Edge; import org.apache.beam.runners.dataflow.worker.graph.Nodes.Node; -import org.apache.beam.runners.fnexecution.GrpcFnServer; import org.apache.beam.runners.fnexecution.control.InstructionRequestHandler; import org.apache.beam.runners.fnexecution.data.GrpcDataService; import org.apache.beam.runners.fnexecution.state.GrpcStateService; import org.apache.beam.sdk.fn.IdGenerator; +import org.apache.beam.sdk.fn.server.GrpcFnServer; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.graph.MutableNetwork; diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowRunnerHarness.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowRunnerHarness.java index 4462c1e065bc..42166183207a 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowRunnerHarness.java +++ 
b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowRunnerHarness.java @@ -33,13 +33,13 @@ import org.apache.beam.runners.dataflow.worker.fn.stream.ServerStreamObserverFactory; import org.apache.beam.runners.dataflow.worker.logging.DataflowWorkerLoggingInitializer; import org.apache.beam.runners.dataflow.worker.status.SdkWorkerStatusServlet; -import org.apache.beam.runners.fnexecution.GrpcContextHeaderAccessorProvider; -import org.apache.beam.runners.fnexecution.ServerFactory; import org.apache.beam.runners.fnexecution.control.FnApiControlClient; import org.apache.beam.runners.fnexecution.state.GrpcStateService; import org.apache.beam.runners.fnexecution.status.BeamWorkerStatusGrpcService; +import org.apache.beam.sdk.fn.server.GrpcContextHeaderAccessorProvider; +import org.apache.beam.sdk.fn.server.ServerFactory; import org.apache.beam.sdk.io.FileSystems; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Server; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Server; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.checkerframework.checker.nullness.qual.Nullable; import org.slf4j.Logger; diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowWorkUnitClient.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowWorkUnitClient.java index 618a451c2d9d..f2daa49e7cf2 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowWorkUnitClient.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowWorkUnitClient.java @@ -35,6 +35,8 @@ import com.google.api.services.dataflow.model.WorkItemServiceState; import com.google.api.services.dataflow.model.WorkItemStatus; import java.io.IOException; +import java.util.ArrayList; +import java.util.Arrays; import java.util.Collections; import java.util.List; import javax.annotation.concurrent.ThreadSafe; @@ -48,7 +50,6 @@ import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.joda.time.DateTime; import org.joda.time.Duration; -import org.joda.time.Interval; import org.slf4j.Logger; /** A Dataflow WorkUnit client that fetches WorkItems from the Dataflow service. */ @@ -102,8 +103,14 @@ public Optional getWorkItem() throws IOException { // All remote sources require the "remote_source" capability. Dataflow's // custom sources are further tagged with the format "custom_source". 
List capabilities = - ImmutableList.of( - options.getWorkerId(), CAPABILITY_REMOTE_SOURCE, PropertyNames.CUSTOM_SOURCE_FORMAT); + new ArrayList( + Arrays.asList( + options.getWorkerId(), + CAPABILITY_REMOTE_SOURCE, + PropertyNames.CUSTOM_SOURCE_FORMAT)); + if (options.getWorkerPool() != null) { + capabilities.add(options.getWorkerPool()); + } Optional workItem = getWorkItemInternal(workItemTypes, capabilities); if (!workItem.isPresent()) { @@ -215,14 +222,15 @@ public WorkItemServiceState reportWorkItemStatus(WorkItemStatus workItemStatus) && DataflowWorkerLoggingMDC.getStageName() != null) { DateTime startTime = stageStartTime.get(); if (startTime != null) { - // This thread should have been tagged with the stage start time during getWorkItem(), - Interval elapsed = new Interval(startTime, endTime); + // The elapsed time can be negative due to time corrections. + long elapsed = endTime.getMillis() - startTime.getMillis(); int numErrors = workItemStatus.getErrors() == null ? 0 : workItemStatus.getErrors().size(); + // This thread should have been tagged with the stage start time during getWorkItem(), logger.info( "Finished processing stage {} with {} errors in {} seconds ", DataflowWorkerLoggingMDC.getStageName(), numErrors, - (double) elapsed.toDurationMillis() / 1000); + (double) elapsed / 1000); } } shortIdCache.shortenIdsIfAvailable(workItemStatus.getCounterUpdates()); diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowWorkerHarnessHelper.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowWorkerHarnessHelper.java index 461b7975f6bd..a6ab2b6e1e7c 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowWorkerHarnessHelper.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowWorkerHarnessHelper.java @@ -30,7 +30,7 @@ import org.apache.beam.runners.dataflow.worker.ExperimentContext.Experiment; import org.apache.beam.runners.dataflow.worker.logging.DataflowWorkerLoggingInitializer; import org.apache.beam.runners.dataflow.worker.logging.DataflowWorkerLoggingMDC; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.TextFormat; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat; import org.checkerframework.checker.nullness.qual.Nullable; import org.conscrypt.OpenSSLProvider; import org.slf4j.Logger; diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DoFnInstanceManagers.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DoFnInstanceManagers.java index 6570a4642314..175bcefd46b6 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DoFnInstanceManagers.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DoFnInstanceManagers.java @@ -18,6 +18,7 @@ package org.apache.beam.runners.dataflow.worker; import java.util.concurrent.ConcurrentLinkedQueue; +import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.reflect.DoFnInvokers; import org.apache.beam.sdk.util.DoFnInfo; @@ -33,8 +34,8 @@ public class DoFnInstanceManagers { * deserializing the provided bytes. 
{@link DoFnInstanceManager} will call {@link DoFn.Setup} as * required before returning the {@link DoFnInfo}, and {@link DoFn.Teardown} as appropriate. */ - public static DoFnInstanceManager cloningPool(DoFnInfo info) { - return new ConcurrentQueueInstanceManager(info); + public static DoFnInstanceManager cloningPool(DoFnInfo info, PipelineOptions options) { + return new ConcurrentQueueInstanceManager(info, options); } /** @@ -52,10 +53,12 @@ public static DoFnInstanceManager singleInstance(DoFnInfo info) { private static class ConcurrentQueueInstanceManager implements DoFnInstanceManager { private final byte[] serializedFnInfo; private final ConcurrentLinkedQueue> fns; + private final PipelineOptions options; - private ConcurrentQueueInstanceManager(DoFnInfo info) { + private ConcurrentQueueInstanceManager(DoFnInfo info, PipelineOptions options) { this.serializedFnInfo = SerializableUtils.serializeToByteArray(info); - fns = new ConcurrentLinkedQueue<>(); + this.fns = new ConcurrentLinkedQueue<>(); + this.options = options; } @Override @@ -80,7 +83,7 @@ private ConcurrentQueueInstanceManager(DoFnInfo info) { private DoFnInfo deserializeCopy() throws Exception { DoFnInfo fn; fn = (DoFnInfo) SerializableUtils.deserializeFromByteArray(serializedFnInfo, null); - DoFnInvokers.invokerFor(fn.getDoFn()).invokeSetup(); + DoFnInvokers.tryInvokeSetupFor(fn.getDoFn(), options); return fn; } diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/ExperimentContext.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/ExperimentContext.java index 0d9e0ef7f64a..d215799e767a 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/ExperimentContext.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/ExperimentContext.java @@ -44,8 +44,7 @@ public enum Experiment { * operations for some IO connectors. 
*/ EnableConscryptSecurityProvider("enable_conscrypt_security_provider"), - IntertransformIO("intertransform_io"), // Intertransform metrics for Shuffle IO (insights) - SideInputIOMetrics("sideinput_io_metrics"); // Intertransform metrics for Side Input IO + IntertransformIO("intertransform_io"); // Intertransform metrics for Shuffle IO (insights) private final String name; diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/GroupAlsoByWindowParDoFnFactory.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/GroupAlsoByWindowParDoFnFactory.java index 6ae8b24f881d..05459e5a6e75 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/GroupAlsoByWindowParDoFnFactory.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/GroupAlsoByWindowParDoFnFactory.java @@ -57,7 +57,7 @@ import org.apache.beam.sdk.util.WindowedValue.WindowedValueCoder; import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.sdk.values.WindowingStrategy; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.checkerframework.checker.nullness.qual.Nullable; import org.slf4j.Logger; diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/IntrinsicMapTaskExecutorFactory.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/IntrinsicMapTaskExecutorFactory.java index ea935344976b..57be68547f7a 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/IntrinsicMapTaskExecutorFactory.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/IntrinsicMapTaskExecutorFactory.java @@ -64,13 +64,13 @@ import org.apache.beam.runners.dataflow.worker.util.common.worker.Receiver; import org.apache.beam.runners.dataflow.worker.util.common.worker.Sink; import org.apache.beam.runners.dataflow.worker.util.common.worker.WriteOperation; -import org.apache.beam.runners.fnexecution.GrpcFnServer; import org.apache.beam.runners.fnexecution.control.InstructionRequestHandler; import org.apache.beam.runners.fnexecution.data.GrpcDataService; import org.apache.beam.runners.fnexecution.state.GrpcStateService; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.KvCoder; import org.apache.beam.sdk.fn.IdGenerator; +import org.apache.beam.sdk.fn.server.GrpcFnServer; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.util.WindowedValue.WindowedValueCoder; import org.apache.beam.sdk.util.common.ElementByteSizeObserver; diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/IsmSideInputReader.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/IsmSideInputReader.java index 26adc74a3d2f..35d865f31729 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/IsmSideInputReader.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/IsmSideInputReader.java @@ -56,7 
+56,6 @@ import org.apache.beam.runners.dataflow.util.CloudObjects; import org.apache.beam.runners.dataflow.util.PropertyNames; import org.apache.beam.runners.dataflow.util.RandomAccessData; -import org.apache.beam.runners.dataflow.worker.ExperimentContext.Experiment; import org.apache.beam.runners.dataflow.worker.util.WorkerPropertyNames; import org.apache.beam.runners.dataflow.worker.util.common.worker.NativeReader; import org.apache.beam.sdk.coders.Coder; @@ -70,11 +69,17 @@ import org.apache.beam.sdk.util.WindowedValue; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollectionView; +import org.apache.beam.sdk.values.PCollectionViews.HasDefaultValue; import org.apache.beam.sdk.values.PCollectionViews.IterableViewFn; +import org.apache.beam.sdk.values.PCollectionViews.IterableViewFn2; import org.apache.beam.sdk.values.PCollectionViews.ListViewFn; +import org.apache.beam.sdk.values.PCollectionViews.ListViewFn2; import org.apache.beam.sdk.values.PCollectionViews.MapViewFn; +import org.apache.beam.sdk.values.PCollectionViews.MapViewFn2; import org.apache.beam.sdk.values.PCollectionViews.MultimapViewFn; +import org.apache.beam.sdk.values.PCollectionViews.MultimapViewFn2; import org.apache.beam.sdk.values.PCollectionViews.SingletonViewFn; +import org.apache.beam.sdk.values.PCollectionViews.SingletonViewFn2; import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Function; @@ -105,7 +110,13 @@ public class IsmSideInputReader implements SideInputReader { private static final Object NULL_PLACE_HOLDER = new Object(); private static final ImmutableList> KNOWN_SINGLETON_VIEW_TYPES = - ImmutableList.of(SingletonViewFn.class, MapViewFn.class, MultimapViewFn.class); + ImmutableList.of( + SingletonViewFn.class, + SingletonViewFn2.class, + MapViewFn.class, + MapViewFn2.class, + MultimapViewFn.class, + MultimapViewFn2.class); /** * Limit the number of concurrent initializations. @@ -218,14 +229,8 @@ private List> createReadersFromSources( throw new Exception("unexpected kind of side input: " + sideInputKind); } - SideInputReadCounter sideInputReadCounter; - ExperimentContext ec = ExperimentContext.parseFrom(options); - if (ec.isEnabled(Experiment.SideInputIOMetrics)) { - sideInputReadCounter = - new DataflowSideInputReadCounter(executionContext, operationContext, sideInputIndex); - } else { - sideInputReadCounter = new NoopSideInputReadCounter(); - } + SideInputReadCounter sideInputReadCounter = + new DataflowSideInputReadCounter(executionContext, operationContext, sideInputIndex); ImmutableList.Builder> builder = ImmutableList.builder(); for (Source source : sideInputInfo.getSources()) { @@ -309,7 +314,7 @@ public ViewT get(final PCollectionView view, final BoundedWindow // We handle the singleton case separately since a null value may be returned. // We use a null place holder to represent this, and when we detect it, we translate // back to null for the user. 
- if (viewFn instanceof SingletonViewFn) { + if (viewFn instanceof SingletonViewFn || viewFn instanceof SingletonViewFn2) { ViewT rval = executionContext ., ViewT>getLogicalReferenceCache() @@ -318,7 +323,7 @@ public ViewT get(final PCollectionView view, final BoundedWindow () -> { @SuppressWarnings("unchecked") ViewT viewT = - getSingletonForWindow(tag, (SingletonViewFn) viewFn, window); + getSingletonForWindow(tag, (HasDefaultValue) viewFn, window); @SuppressWarnings("unchecked") ViewT nullPlaceHolder = (ViewT) NULL_PLACE_HOLDER; return viewT == null ? nullPlaceHolder : viewT; @@ -326,7 +331,10 @@ public ViewT get(final PCollectionView view, final BoundedWindow return rval == NULL_PLACE_HOLDER ? null : rval; } else if (singletonMaterializedTags.contains(tag)) { checkArgument( - viewFn instanceof MapViewFn || viewFn instanceof MultimapViewFn, + viewFn instanceof MapViewFn + || viewFn instanceof MapViewFn2 + || viewFn instanceof MultimapViewFn + || viewFn instanceof MultimapViewFn2, "Unknown view type stored as singleton. Expected one of %s, got %s", KNOWN_SINGLETON_VIEW_TYPES, viewFn.getClass().getName()); @@ -343,15 +351,19 @@ public ViewT get(final PCollectionView view, final BoundedWindow .get( PCollectionViewWindow.of(view, window), () -> { - if (viewFn instanceof IterableViewFn || viewFn instanceof ListViewFn) { + if (viewFn instanceof IterableViewFn + || viewFn instanceof IterableViewFn2 + || viewFn instanceof ListViewFn + || viewFn instanceof ListViewFn2) { @SuppressWarnings("unchecked") ViewT viewT = (ViewT) getListForWindow(tag, window); return viewT; - } else if (viewFn instanceof MapViewFn) { + } else if (viewFn instanceof MapViewFn || viewFn instanceof MapViewFn2) { @SuppressWarnings("unchecked") ViewT viewT = (ViewT) getMapForWindow(tag, window); return viewT; - } else if (viewFn instanceof MultimapViewFn) { + } else if (viewFn instanceof MultimapViewFn + || viewFn instanceof MultimapViewFn2) { @SuppressWarnings("unchecked") ViewT viewT = (ViewT) getMultimapForWindow(tag, window); return viewT; @@ -382,7 +394,7 @@ public ViewT get(final PCollectionView view, final BoundedWindow * */ private T getSingletonForWindow( - TupleTag viewTag, SingletonViewFn viewFn, W window) throws IOException { + TupleTag viewTag, HasDefaultValue viewFn, W window) throws IOException { @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) "unchecked" diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/MetricTrackingWindmillServerStub.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/MetricTrackingWindmillServerStub.java index 60ff50ae464e..d2eff46ef770 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/MetricTrackingWindmillServerStub.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/MetricTrackingWindmillServerStub.java @@ -22,14 +22,16 @@ import java.util.HashMap; import java.util.List; import java.util.Map; -import java.util.concurrent.ArrayBlockingQueue; +import java.util.concurrent.ExecutionException; import java.util.concurrent.atomic.AtomicInteger; +import javax.annotation.concurrent.GuardedBy; import org.apache.beam.runners.dataflow.worker.util.MemoryMonitor; import org.apache.beam.runners.dataflow.worker.windmill.Windmill; import org.apache.beam.runners.dataflow.worker.windmill.Windmill.KeyedGetDataRequest; import 
org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub; import org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub.GetDataStream; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.SettableFuture; +import org.checkerframework.checker.nullness.qual.Nullable; import org.joda.time.Duration; /** @@ -37,10 +39,8 @@ * requests and throttles requests when memory pressure is high. * *

    External API: individual worker threads request state for their computation via {@link - * #getStateData}. However, we want to batch requests to WMS rather than calling for each thread, so - * calls actually just enqueue a state request in the local queue, which will be handled by up to - * {@link #NUM_THREADS} polling that queue and making requests to WMS in batches of size {@link - * #MAX_READS_PER_BATCH}. + * #getStateData}. However, requests are issued either via a pool of streaming RPCs or as + * batched GetData requests. */ @SuppressWarnings({ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) @@ -54,14 +54,21 @@ public class MetricTrackingWindmillServerStub { private final MemoryMonitor gcThrashingMonitor; private final boolean useStreamingRequests; - private final ArrayBlockingQueue readQueue; - private final List readPool; + private static final class ReadBatch { + ArrayList reads = new ArrayList<>(); + SettableFuture startRead = SettableFuture.create(); + } + + @GuardedBy("this") + private final List pendingReadBatches; + + @GuardedBy("this") + private int activeReadThreads = 0; private WindmillServerStub.StreamPool streamPool; private static final int MAX_READS_PER_BATCH = 60; - private static final int QUEUE_SIZE = 1000; - private static final int NUM_THREADS = 10; + private static final int MAX_ACTIVE_READS = 10; private static final int NUM_STREAMS = 1; private static final Duration STREAM_TIMEOUT = Duration.standardSeconds(30); @@ -85,8 +92,8 @@ public MetricTrackingWindmillServerStub( WindmillServerStub server, MemoryMonitor gcThrashingMonitor, boolean useStreamingRequests) { this.server = server; this.gcThrashingMonitor = gcThrashingMonitor; - this.readQueue = new ArrayBlockingQueue<>(QUEUE_SIZE); - this.readPool = new ArrayList<>(NUM_THREADS); + // This is used as a queue but is expected to hold fewer than 10 batches. + this.pendingReadBatches = new ArrayList<>(); this.useStreamingRequests = useStreamingRequests; } @@ -95,57 +102,79 @@ public void start() { streamPool = new WindmillServerStub.StreamPool<>( NUM_STREAMS, STREAM_TIMEOUT, this.server::getDataStream); - } else { - for (int i = 0; i < NUM_THREADS; i++) { - readPool.add( - new Thread("GetDataThread" + i) { - @Override - public void run() { - getDataLoop(); - } - }); - readPool.get(i).start(); - } } } - private void getDataLoop() { - while (true) { - // First, block until the readQueue has data, then pull up to MAX_READS_PER_BATCH or until the - // queue is empty. - QueueEntry entry; - try { - entry = readQueue.take(); - } catch (InterruptedException e) { - continue; + // Adds the entry to a read batch for sending to the windmill server. If a non-null batch is + // returned, this thread will be responsible for sending the batch and should wait for the batch + // startRead to be notified. + // If null is returned, the entry was added to a read batch that will be issued by another thread. + private @Nullable ReadBatch addToReadBatch(QueueEntry entry) { + synchronized (this) { + ReadBatch batch; + if (activeReadThreads < MAX_ACTIVE_READS) { + assert (pendingReadBatches.isEmpty()); + activeReadThreads += 1; + // fall through to the code below the synchronized block + } else if (pendingReadBatches.isEmpty() + || pendingReadBatches.get(pendingReadBatches.size() - 1).reads.size() + >= MAX_READS_PER_BATCH) { + // This is the first read of a new batch, so this thread will be responsible for sending the batch. 
+ batch = new ReadBatch(); + pendingReadBatches.add(batch); + batch.reads.add(entry); + return batch; + } else { + // This fits within an existing batch, it will be sent by the first blocking thread in the + // batch. + pendingReadBatches.get(pendingReadBatches.size() - 1).reads.add(entry); + return null; } - int numReads = 1; - Map> pendingResponses = - new HashMap<>(); - Map computationBuilders = new HashMap<>(); - do { - Windmill.ComputationGetDataRequest.Builder computationBuilder = - computationBuilders.get(entry.computation); - if (computationBuilder == null) { - computationBuilder = - Windmill.ComputationGetDataRequest.newBuilder().setComputationId(entry.computation); - computationBuilders.put(entry.computation, computationBuilder); - } + } + ReadBatch batch = new ReadBatch(); + batch.reads.add(entry); + batch.startRead.set(true); + return batch; + } - computationBuilder.addRequests(entry.request); - pendingResponses.put( - WindmillComputationKey.create( - entry.computation, entry.request.getKey(), entry.request.getShardingKey()), - entry.response); - } while (numReads++ < MAX_READS_PER_BATCH && (entry = readQueue.poll()) != null); - - // Build the full GetDataRequest from the KeyedGetDataRequests pulled from the queue. - Windmill.GetDataRequest.Builder builder = Windmill.GetDataRequest.newBuilder(); - for (Windmill.ComputationGetDataRequest.Builder computationBuilder : - computationBuilders.values()) { - builder.addRequests(computationBuilder.build()); - } + private void issueReadBatch(ReadBatch batch) { + try { + boolean read = batch.startRead.get(); + assert (read); + } catch (InterruptedException e) { + // We don't expect this thread to be interrupted. To simplify handling, we just fall through + // to issuing + // the call. + assert (false); + Thread.currentThread().interrupt(); + } catch (ExecutionException e) { + // startRead is a SettableFuture so this should never occur. + throw new AssertionError("Should not have exception on startRead", e); + } + Map> pendingResponses = + new HashMap<>(batch.reads.size()); + Map computationBuilders = new HashMap<>(); + for (QueueEntry entry : batch.reads) { + Windmill.ComputationGetDataRequest.Builder computationBuilder = + computationBuilders.computeIfAbsent( + entry.computation, + k -> Windmill.ComputationGetDataRequest.newBuilder().setComputationId(k)); + computationBuilder.addRequests(entry.request); + pendingResponses.put( + WindmillComputationKey.create( + entry.computation, entry.request.getKey(), entry.request.getShardingKey()), + entry.response); + } + + // Build the full GetDataRequest from the KeyedGetDataRequests pulled from the queue. + Windmill.GetDataRequest.Builder builder = Windmill.GetDataRequest.newBuilder(); + for (Windmill.ComputationGetDataRequest.Builder computationBuilder : + computationBuilders.values()) { + builder.addRequests(computationBuilder); + } + + try { Windmill.GetDataResponse response = server.getData(builder.build()); // Dispatch the per-key responses back to the waiting threads. @@ -160,6 +189,22 @@ private void getDataLoop() { .set(keyResponse); } } + } catch (RuntimeException e) { + // Fan the exception out to the reads. + for (QueueEntry entry : batch.reads) { + entry.response.setException(e); + } + } finally { + synchronized (this) { + assert (activeReadThreads >= 1); + if (pendingReadBatches.isEmpty()) { + activeReadThreads--; + } else { + // Notify the thread responsible for issuing the next batch read. 
+ ReadBatch startBatch = pendingReadBatches.remove(0); + startBatch.startRead.set(true); + } + } } } @@ -178,7 +223,10 @@ public Windmill.KeyedGetDataResponse getStateData( } } else { SettableFuture response = SettableFuture.create(); - readQueue.add(new QueueEntry(computation, request, response)); + ReadBatch batch = addToReadBatch(new QueueEntry(computation, request, response)); + if (batch != null) { + issueReadBatch(batch); + } return response.get(); } } catch (Exception e) { @@ -244,6 +292,12 @@ public void printHtml(PrintWriter writer) { writer.println("Active Fetches:"); writer.println(" Side Inputs: " + activeSideInputs.get()); writer.println(" State Reads: " + activeStateReads.get()); + if (!useStreamingRequests) { + synchronized (this) { + writer.println(" Read threads: " + activeReadThreads); + writer.println(" Pending read batches: " + pendingReadBatches.size()); + } + } writer.println("Heartbeat Keys Active: " + activeHeartbeats.get()); } } diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/PubsubSink.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/PubsubSink.java index 72e3b4c5e402..955313e7ddad 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/PubsubSink.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/PubsubSink.java @@ -35,7 +35,7 @@ import org.apache.beam.sdk.util.SerializableUtils; import org.apache.beam.sdk.util.WindowedValue; import org.apache.beam.sdk.util.WindowedValue.WindowedValueCoder; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.checkerframework.checker.nullness.qual.Nullable; diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/ReaderCache.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/ReaderCache.java index bf65cfac1d4a..58f3abfc66a2 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/ReaderCache.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/ReaderCache.java @@ -18,6 +18,7 @@ package org.apache.beam.runners.dataflow.worker; import java.io.IOException; +import java.util.concurrent.Executor; import java.util.concurrent.TimeUnit; import javax.annotation.concurrent.ThreadSafe; import org.apache.beam.sdk.io.UnboundedSource; @@ -41,6 +42,7 @@ class ReaderCache { private static final Logger LOG = LoggerFactory.getLogger(ReaderCache.class); + private final Executor invalidationExecutor; // Note on thread safety. This class is thread safe because: // - Guava Cache is thread safe. @@ -64,33 +66,36 @@ private static class CacheEntry { private final Cache cache; - /** ReaderCache with default 1 minute expiration for readers. */ - ReaderCache() { - this(Duration.standardMinutes(1)); - } - - /** Cache reader for {@code cacheDuration}. */ - ReaderCache(Duration cacheDuration) { + /** Cache reader for {@code cacheDuration}. Readers will be closed on {@code executor}. 
*/ + ReaderCache(Duration cacheDuration, Executor invalidationExecutor) { + this.invalidationExecutor = invalidationExecutor; this.cache = CacheBuilder.newBuilder() .expireAfterWrite(cacheDuration.getMillis(), TimeUnit.MILLISECONDS) .removalListener( (RemovalNotification notification) -> { if (notification.getCause() != RemovalCause.EXPLICIT) { - LOG.info("Closing idle reader for {}", notification.getKey()); - closeReader(notification.getKey(), notification.getValue()); + LOG.info( + "Asynchronously closing reader for {} as it has been idle for over {}", + notification.getKey(), + cacheDuration); + asyncCloseReader(notification.getKey(), notification.getValue()); } }) .build(); } /** Close the reader and log a warning if close fails. */ - private void closeReader(WindmillComputationKey key, CacheEntry entry) { - try { - entry.reader.close(); - } catch (IOException e) { - LOG.warn("Failed to close UnboundedReader for {}", key, e); - } + private void asyncCloseReader(WindmillComputationKey key, CacheEntry entry) { + invalidationExecutor.execute( + () -> { + try { + entry.reader.close(); + LOG.info("Finished closing reader for {}", key); + } catch (IOException e) { + LOG.warn("Failed to close UnboundedReader for {}", key, e); + } + }); } /** @@ -112,7 +117,8 @@ UnboundedSource.UnboundedReader acquireReader( } else { // new cacheToken invalidates old one or this is a retried or stale request, // close the reader. - closeReader(computationKey, entry); + LOG.info("Asynchronously closing reader for {} as it is no longer valid", computationKey); + asyncCloseReader(computationKey, entry); } } return null; diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/SdkHarnessRegistries.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/SdkHarnessRegistries.java index 0da3475c34a7..0033f2f54cb1 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/SdkHarnessRegistries.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/SdkHarnessRegistries.java @@ -26,10 +26,10 @@ import java.util.concurrent.atomic.AtomicInteger; import org.apache.beam.model.pipeline.v1.Endpoints.ApiServiceDescriptor; import org.apache.beam.runners.dataflow.worker.fn.data.BeamFnDataGrpcService; -import org.apache.beam.runners.fnexecution.GrpcFnServer; import org.apache.beam.runners.fnexecution.control.FnApiControlClient; import org.apache.beam.runners.fnexecution.data.GrpcDataService; import org.apache.beam.runners.fnexecution.state.GrpcStateService; +import org.apache.beam.sdk.fn.server.GrpcFnServer; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; import org.checkerframework.checker.nullness.qual.Nullable; import org.slf4j.Logger; diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/SdkHarnessRegistry.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/SdkHarnessRegistry.java index 401e6b95d7cf..818c05868c25 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/SdkHarnessRegistry.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/SdkHarnessRegistry.java @@ -18,10 +18,10 @@ package org.apache.beam.runners.dataflow.worker; import 
org.apache.beam.model.pipeline.v1.Endpoints.ApiServiceDescriptor; -import org.apache.beam.runners.fnexecution.GrpcFnServer; import org.apache.beam.runners.fnexecution.control.FnApiControlClient; import org.apache.beam.runners.fnexecution.data.GrpcDataService; import org.apache.beam.runners.fnexecution.state.GrpcStateService; +import org.apache.beam.sdk.fn.server.GrpcFnServer; import org.checkerframework.checker.nullness.qual.Nullable; /** Registry used to manage all the connections (Control, Data, State) from SdkHarness */ diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StateFetcher.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StateFetcher.java index aa393d2ea6af..db5256aa886d 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StateFetcher.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StateFetcher.java @@ -38,7 +38,7 @@ import org.apache.beam.sdk.values.PCollectionView; import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.sdk.values.WindowingStrategy; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Optional; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Supplier; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.Cache; diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorker.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorker.java index 90b0ab9be460..539af163764e 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorker.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorker.java @@ -54,6 +54,7 @@ import java.util.concurrent.ConcurrentLinkedQueue; import java.util.concurrent.ConcurrentMap; import java.util.concurrent.ExecutionException; +import java.util.concurrent.Executors; import java.util.concurrent.LinkedBlockingQueue; import java.util.concurrent.Semaphore; import java.util.concurrent.ThreadFactory; @@ -67,6 +68,7 @@ import org.apache.beam.model.pipeline.v1.RunnerApi; import org.apache.beam.runners.core.metrics.ExecutionStateSampler; import org.apache.beam.runners.core.metrics.ExecutionStateTracker; +import org.apache.beam.runners.core.metrics.MetricsLogger; import org.apache.beam.runners.dataflow.DataflowRunner; import org.apache.beam.runners.dataflow.internal.CustomSources; import org.apache.beam.runners.dataflow.options.DataflowWorkerHarnessOptions; @@ -125,14 +127,15 @@ import org.apache.beam.sdk.fn.JvmInitializers; import org.apache.beam.sdk.io.FileSystems; import org.apache.beam.sdk.metrics.MetricName; +import org.apache.beam.sdk.metrics.MetricsEnvironment; import org.apache.beam.sdk.util.BackOff; import org.apache.beam.sdk.util.BackOffUtils; import org.apache.beam.sdk.util.FluentBackoff; import org.apache.beam.sdk.util.Sleeper; import org.apache.beam.sdk.util.UserCodeException; import org.apache.beam.sdk.util.WindowedValue.WindowedValueCoder; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; 
-import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.TextFormat; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Optional; @@ -219,6 +222,8 @@ public class StreamingDataflowWorker { "org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl", "throttling-msecs"); + private static final Duration MAX_LOCAL_PROCESSING_RETRY_DURATION = Duration.standardMinutes(5); + private final AtomicLong counterAggregationErrorCount = new AtomicLong(); /** Returns whether an exception was caused by a {@link OutOfMemoryError}. */ @@ -289,6 +294,10 @@ public static void main(String[] args) throws Exception { StreamingDataflowWorker worker = StreamingDataflowWorker.fromDataflowWorkerHarnessOptions(options, sdkHarnessRegistry); + // Use the MetricsLogger container which is used by BigQueryIO to periodically log process-wide + // metrics. + MetricsEnvironment.setProcessWideContainer(new MetricsLogger(null)); + JvmInitializers.runBeforeProcessing(options); worker.startStatusPages(); worker.start(); @@ -412,7 +421,7 @@ public int getSize() { private final ConcurrentMap systemNameToComputationIdMap = new ConcurrentHashMap<>(); - private final WindmillStateCache stateCache; + final WindmillStateCache stateCache; private final ThreadFactory threadFactory; private DataflowMapTaskExecutorFactory mapTaskExecutorFactory; @@ -472,7 +481,7 @@ public int getSize() { private final EvictingQueue pendingFailuresToReport = EvictingQueue.create(MAX_FAILURES_TO_REPORT_IN_UPDATE); - private final ReaderCache readerCache = new ReaderCache(); + private final ReaderCache readerCache; private final WorkUnitClient workUnitClient; private final CompletableFuture isDoneFuture; @@ -602,6 +611,10 @@ public static StreamingDataflowWorker fromDataflowWorkerHarnessOptions( HotKeyLogger hotKeyLogger) throws IOException { this.stateCache = new WindmillStateCache(options.getWorkerCacheMb()); + this.readerCache = + new ReaderCache( + Duration.standardSeconds(options.getReaderCacheTimeoutSec()), + Executors.newCachedThreadPool()); this.mapTaskExecutorFactory = mapTaskExecutorFactory; this.workUnitClient = workUnitClient; this.options = options; @@ -1131,8 +1144,11 @@ public static ShardedKey create(ByteString key, long shardingKey) { @Override public String toString() { - return String.format( - "%016x-%s", shardingKey(), TextFormat.escapeBytes(key().substring(0, 100))); + ByteString keyToDisplay = key(); + if (keyToDisplay.size() > 100) { + keyToDisplay = keyToDisplay.substring(0, 100); + } + return String.format("%016x-%s", shardingKey(), TextFormat.escapeBytes(keyToDisplay)); } } @@ -1492,6 +1508,7 @@ private void process( } else { LastExceptionDataProvider.reportException(t); LOG.debug("Failed work: {}", work); + Duration elapsedTimeSinceStart = new Duration(Instant.now(), work.getStartTime()); if (!reportFailure(computationId, workItem, t)) { LOG.error( "Execution of work for computation '{}' on key '{}' failed with uncaught exception, " @@ -1508,6 +1525,16 @@ private void process( key.toStringUtf8(), heapDump == null ? 
"not written" : ("written to '" + heapDump + "'"), t); + } else if (elapsedTimeSinceStart.isLongerThan(MAX_LOCAL_PROCESSING_RETRY_DURATION)) { + LOG.error( + "Execution of work for computation '{}' for key '{}' failed with uncaught exception, " + + "and it will not be retried locally because the elapsed time since start {} " + + "exceeds {}.", + computationId, + key.toStringUtf8(), + elapsedTimeSinceStart, + MAX_LOCAL_PROCESSING_RETRY_DURATION, + t); } else { LOG.error( "Execution of work for computation '{}' on key '{}' failed with uncaught exception. " @@ -2315,7 +2342,7 @@ public void invalidateStuckCommits(Instant stuckCommitDeadline) { if (work.getState() == State.COMMITTING && work.getStateStartTime().isBefore(stuckCommitDeadline)) { LOG.error( - "Detected key with sharding key 0x{} stuck in COMMITTING state since {}, completing it with error.", + "Detected key {} stuck in COMMITTING state since {}, completing it with error.", shardedKey, work.getStateStartTime()); stuckCommits.put(shardedKey, work.getWorkItem().getWorkToken()); diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingGroupAlsoByWindowViaWindowSetFn.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingGroupAlsoByWindowViaWindowSetFn.java index e8c239f46f3e..dd7d04f0013d 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingGroupAlsoByWindowViaWindowSetFn.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingGroupAlsoByWindowViaWindowSetFn.java @@ -17,6 +17,7 @@ */ package org.apache.beam.runners.dataflow.worker; +import org.apache.beam.model.pipeline.v1.RunnerApi; import org.apache.beam.runners.core.KeyedWorkItem; import org.apache.beam.runners.core.OutputWindowedValue; import org.apache.beam.runners.core.ReduceFnRunner; @@ -52,6 +53,7 @@ GroupAlsoByWindowFn, KV> create( } private final WindowingStrategy windowingStrategy; + private final RunnerApi.Trigger triggerProto; private final StateInternalsFactory stateInternalsFactory; private SystemReduceFn reduceFn; @@ -62,6 +64,7 @@ private StreamingGroupAlsoByWindowViaWindowSetFn( @SuppressWarnings("unchecked") WindowingStrategy noWildcard = (WindowingStrategy) windowingStrategy; this.windowingStrategy = noWildcard; + this.triggerProto = TriggerTranslation.toProto(windowingStrategy.getTrigger()); this.reduceFn = reduceFn; this.stateInternalsFactory = stateInternalsFactory; } @@ -82,8 +85,7 @@ public void processElement( key, windowingStrategy, ExecutableTriggerStateMachine.create( - TriggerStateMachines.stateMachineForTrigger( - TriggerTranslation.toProto(windowingStrategy.getTrigger()))), + TriggerStateMachines.stateMachineForTrigger(triggerProto)), stateInternals, stepContext.timerInternals(), output, diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingModeExecutionContext.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingModeExecutionContext.java index 48b78a3a7588..4595c121e546 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingModeExecutionContext.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingModeExecutionContext.java @@ -24,15 +24,18 @@ import 
com.google.api.services.dataflow.model.SideInputInfo; import java.io.Closeable; import java.io.IOException; +import java.util.Collection; import java.util.HashMap; import java.util.Iterator; import java.util.List; import java.util.Map; +import java.util.NavigableSet; import java.util.Set; import java.util.concurrent.ThreadLocalRandom; import java.util.concurrent.atomic.AtomicLong; import org.apache.beam.runners.core.SideInputReader; import org.apache.beam.runners.core.StateInternals; +import org.apache.beam.runners.core.StateNamespace; import org.apache.beam.runners.core.StateNamespaces; import org.apache.beam.runners.core.TimerInternals; import org.apache.beam.runners.core.TimerInternals.TimerData; @@ -53,13 +56,18 @@ import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.values.PCollectionView; import org.apache.beam.sdk.values.TupleTag; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Optional; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Supplier; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.FluentIterable; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.HashBasedTable; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterators; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.PeekingIterator; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Sets; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Table; import org.checkerframework.checker.nullness.qual.Nullable; import org.joda.time.Duration; import org.joda.time.Instant; @@ -240,13 +248,21 @@ public void start( } } - for (StepContext stepContext : getAllStepContexts()) { - stepContext.start( - stateReader, - inputDataWatermark, - processingTime, - outputDataWatermark, - synchronizedProcessingTime); + Collection stepContexts = getAllStepContexts(); + if (!stepContexts.isEmpty()) { + // This must be only created once for the workItem as token validation will fail if the same + // work token is reused. 
+ WindmillStateCache.ForKey cacheForKey = + stateCache.forKey(getComputationKey(), getWork().getCacheToken(), getWorkToken()); + for (StepContext stepContext : stepContexts) { + stepContext.start( + stateReader, + inputDataWatermark, + processingTime, + cacheForKey, + outputDataWatermark, + synchronizedProcessingTime); + } } } @@ -500,6 +516,7 @@ public void start( WindmillStateReader stateReader, Instant inputDataWatermark, Instant processingTime, + WindmillStateCache.ForKey cacheForKey, @Nullable Instant outputDataWatermark, @Nullable Instant synchronizedProcessingTime) { this.stateInternals = @@ -508,8 +525,7 @@ public void start( stateFamily, stateReader, work.getIsNewKey(), - stateCache.forKey( - getComputationKey(), stateFamily, getWork().getCacheToken(), getWorkToken()), + cacheForKey.forFamily(stateFamily), scopedReadStateSupplier); this.systemTimerInternals = @@ -519,7 +535,8 @@ public void start( inputDataWatermark, processingTime, outputDataWatermark, - synchronizedProcessingTime); + synchronizedProcessingTime, + td -> {}); this.userTimerInternals = new WindmillTimerInternals( @@ -528,10 +545,15 @@ public void start( inputDataWatermark, processingTime, outputDataWatermark, - synchronizedProcessingTime); + synchronizedProcessingTime, + this::onUserTimerModified); - this.cachedFiredTimers = null; + this.cachedFiredSystemTimers = null; this.cachedFiredUserTimers = null; + modifiedUserEventTimersOrdered = Sets.newTreeSet(); + modifiedUserProcessingTimersOrdered = Sets.newTreeSet(); + modifiedUserSynchronizedProcessingTimersOrdered = Sets.newTreeSet(); + modifiedUserTimerKeys = HashBasedTable.create(); } public void flushState() { @@ -541,12 +563,12 @@ public void flushState() { } // Lazily initialized - private Iterator cachedFiredTimers = null; + private Iterator cachedFiredSystemTimers = null; @Override public TimerData getNextFiredTimer(Coder windowCoder) { - if (cachedFiredTimers == null) { - cachedFiredTimers = + if (cachedFiredSystemTimers == null) { + cachedFiredSystemTimers = FluentIterable.from(StreamingModeExecutionContext.this.getFiredTimers()) .filter( timer -> @@ -559,10 +581,10 @@ public TimerData getNextFiredTimer(Coder windowCode .iterator(); } - if (!cachedFiredTimers.hasNext()) { + if (!cachedFiredSystemTimers.hasNext()) { return null; } - TimerData nextTimer = cachedFiredTimers.next(); + TimerData nextTimer = cachedFiredSystemTimers.next(); // system timers ( GC timer) must be explicitly deleted if only there is a hold. // if timestamp is not equals to outputTimestamp then there should be a hold if (!nextTimer.getTimestamp().equals(nextTimer.getOutputTimestamp())) { @@ -572,30 +594,96 @@ public TimerData getNextFiredTimer(Coder windowCode } // Lazily initialized - private Iterator cachedFiredUserTimers = null; + private PeekingIterator cachedFiredUserTimers = null; + // An ordered list of any timers that were set or modified by user processing earlier in this + // bundle. + // We use a NavigableSet instead of a priority queue to prevent duplicate elements from ending + // up in the queue. 
+ private NavigableSet modifiedUserEventTimersOrdered = null; + private NavigableSet modifiedUserProcessingTimersOrdered = null; + private NavigableSet modifiedUserSynchronizedProcessingTimersOrdered = null; + + private NavigableSet getModifiedUserTimersOrdered(TimeDomain timeDomain) { + switch (timeDomain) { + case EVENT_TIME: + return modifiedUserEventTimersOrdered; + case PROCESSING_TIME: + return modifiedUserProcessingTimersOrdered; + case SYNCHRONIZED_PROCESSING_TIME: + return modifiedUserSynchronizedProcessingTimersOrdered; + default: + throw new RuntimeException("Unexpected time domain " + timeDomain); + } + } + + // A list of timer keys that were modified by user processing earlier in this bundle. This + // serves as a tombstone, so + // that we know not to fire any bundle timers that were modified. + private Table modifiedUserTimerKeys = null; + + private void onUserTimerModified(TimerData timerData) { + if (!timerData.getDeleted()) { + getModifiedUserTimersOrdered(timerData.getDomain()).add(timerData); + } + modifiedUserTimerKeys.put( + WindmillTimerInternals.getTimerDataKey(timerData), timerData.getNamespace(), timerData); + } + + private boolean timerModified(TimerData timerData) { + String timerKey = WindmillTimerInternals.getTimerDataKey(timerData); + @Nullable + TimerData updatedTimer = modifiedUserTimerKeys.get(timerKey, timerData.getNamespace()); + return updatedTimer != null && !updatedTimer.equals(timerData); + } public TimerData getNextFiredUserTimer(Coder windowCoder) { if (cachedFiredUserTimers == null) { + // This is the first call to getNextFiredUserTimer in this bundle. Extract any user timers + // from the bundle + // and cache the list for the rest of this bundle processing. cachedFiredUserTimers = - FluentIterable.from(StreamingModeExecutionContext.this.getFiredTimers()) - .filter( - timer -> - WindmillTimerInternals.isUserTimer(timer) - && timer.getStateFamily().equals(stateFamily)) - .transform( - timer -> - WindmillTimerInternals.windmillTimerToTimerData( - WindmillNamespacePrefix.USER_NAMESPACE_PREFIX, timer, windowCoder)) - .iterator(); + Iterators.peekingIterator( + FluentIterable.from(StreamingModeExecutionContext.this.getFiredTimers()) + .filter( + timer -> - WindmillTimerInternals.isUserTimer(timer) + && timer.getStateFamily().equals(stateFamily)) + .transform( + timer -> + WindmillTimerInternals.windmillTimerToTimerData( + WindmillNamespacePrefix.USER_NAMESPACE_PREFIX, timer, windowCoder)) + .iterator()); } - if (!cachedFiredUserTimers.hasNext()) { - return null; + while (cachedFiredUserTimers.hasNext()) { + TimerData nextInBundle = cachedFiredUserTimers.peek(); + NavigableSet modifiedUserTimersOrdered = + getModifiedUserTimersOrdered(nextInBundle.getDomain()); + // If there is a modified timer that is earlier than the next timer in the bundle, try to + // fire that first. + while (!modifiedUserTimersOrdered.isEmpty() + && modifiedUserTimersOrdered.first().compareTo(nextInBundle) <= 0) { + TimerData earlierTimer = modifiedUserTimersOrdered.pollFirst(); + if (!timerModified(earlierTimer)) { + // We must delete the timer. This prevents it from being committed to the backing store. + // It also handles the + // case where the timer had been set to the far future and then modified in the bundle; + // without deleting the + // timer, the runner will still have that future timer stored, and would fire it + // spuriously. 
+ userTimerInternals.deleteTimer(earlierTimer); + return earlierTimer; + } + } + // There is no earlier timer to fire, so return the next timer in the bundle. + nextInBundle = cachedFiredUserTimers.next(); + if (!timerModified(nextInBundle)) { + // User timers must be explicitly deleted when delivered, to release the implied hold. + userTimerInternals.deleteTimer(nextInBundle); + return nextInBundle; + } } - TimerData nextTimer = cachedFiredUserTimers.next(); - // User timers must be explicitly deleted when delivered, to release the implied hold - userTimerInternals.deleteTimer(nextTimer); - return nextTimer; + return null; } @Override diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingSideInputDoFnRunner.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingSideInputDoFnRunner.java index 7b53ee5dad2e..e01be8035dec 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingSideInputDoFnRunner.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingSideInputDoFnRunner.java @@ -48,6 +48,7 @@ public StreamingSideInputDoFnRunner( @Override public void startBundle() { simpleDoFnRunner.startBundle(); + sideInputFetcher.prefetchBlockedMap(); // Find the set of ready windows. Set readyWindows = sideInputFetcher.getReadyWindows(); diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingSideInputFetcher.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingSideInputFetcher.java index e87855c3ba29..9ca8851c403e 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingSideInputFetcher.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingSideInputFetcher.java @@ -48,8 +48,8 @@ import org.apache.beam.sdk.util.WindowedValue; import org.apache.beam.sdk.values.PCollectionView; import org.apache.beam.sdk.values.WindowingStrategy; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Parser; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Parser; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; @@ -154,6 +154,10 @@ public Set getReadyWindows() { return readyWindows; } + public void prefetchBlockedMap() { + stepContext.stateInternals().state(StateNamespaces.global(), blockedMapAddr).readLater(); + } + public Iterable>> prefetchElements(Iterable readyWindows) { List>> elements = Lists.newArrayList(); for (W window : readyWindows) { diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/UserParDoFnFactory.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/UserParDoFnFactory.java index 600bec642dbc..58760ac35952 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/UserParDoFnFactory.java +++ 
b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/UserParDoFnFactory.java @@ -94,7 +94,8 @@ public ParDoFn create( DoFnInstanceManager instanceManager = fnCache.get( operationContext.nameContext().systemName(), - () -> DoFnInstanceManagers.cloningPool(doFnExtractor.getDoFnInfo(cloudUserFn))); + () -> + DoFnInstanceManagers.cloningPool(doFnExtractor.getDoFnInfo(cloudUserFn), options)); DoFnInfo doFnInfo = instanceManager.peek(); diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillComputationKey.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillComputationKey.java index ea5111c68f0e..b5ba9bb8c32c 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillComputationKey.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillComputationKey.java @@ -18,8 +18,8 @@ package org.apache.beam.runners.dataflow.worker; import com.google.auto.value.AutoValue; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.TextFormat; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat; @AutoValue public abstract class WindmillComputationKey { diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillNamespacePrefix.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillNamespacePrefix.java index 2dd70062c634..d2aca7ef16c7 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillNamespacePrefix.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillNamespacePrefix.java @@ -17,7 +17,7 @@ */ package org.apache.beam.runners.dataflow.worker; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; /** * A prefix for a Windmill state or timer tag to separate user state and timers from system state diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillSink.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillSink.java index cd4676a9156f..197a2d3c3686 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillSink.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillSink.java @@ -39,7 +39,7 @@ import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.ValueWithRecordId; import org.apache.beam.sdk.values.ValueWithRecordId.ValueWithRecordIdCoder; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.checkerframework.checker.nullness.qual.Nullable; diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateCache.java 
b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateCache.java index 461134003b79..c73686818eb4 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateCache.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateCache.java @@ -17,12 +17,13 @@ */ package org.apache.beam.runners.dataflow.worker; +import com.google.common.base.Preconditions; import java.io.IOException; import java.io.PrintWriter; import java.util.HashMap; -import java.util.Map; import java.util.Objects; -import javax.servlet.ServletException; +import java.util.concurrent.ConcurrentMap; +import java.util.function.BiConsumer; import javax.servlet.http.HttpServletRequest; import javax.servlet.http.HttpServletResponse; import org.apache.beam.runners.core.StateNamespace; @@ -32,13 +33,13 @@ import org.apache.beam.runners.dataflow.worker.status.StatusDataProvider; import org.apache.beam.sdk.state.State; import org.apache.beam.sdk.util.Weighted; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Equivalence; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.Cache; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.CacheBuilder; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.RemovalCause; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.CacheStats; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.Weigher; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.HashMultimap; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.MapMaker; import org.checkerframework.checker.nullness.qual.Nullable; /** @@ -50,30 +51,28 @@ * StreamingDataflowWorker} ensures that a single computation * processing key is executing on one * thread at a time, so this is safe. */ -@SuppressWarnings({ - "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class WindmillStateCache implements StatusDataProvider { // Convert Megabytes to bytes private static final long MEGABYTES = 1024 * 1024; // Estimate of overhead per StateId. - private static final int PER_STATE_ID_OVERHEAD = 20; + private static final long PER_STATE_ID_OVERHEAD = 28; // Initial size of hash tables per entry. private static final int INITIAL_HASH_MAP_CAPACITY = 4; // Overhead of each hash map entry. private static final int HASH_MAP_ENTRY_OVERHEAD = 16; - // Overhead of each cache entry. Three longs, plus a hash table. + // Overhead of each StateCacheEntry. One long, plus a hash table. private static final int PER_CACHE_ENTRY_OVERHEAD = - 24 + HASH_MAP_ENTRY_OVERHEAD * INITIAL_HASH_MAP_CAPACITY; + 8 + HASH_MAP_ENTRY_OVERHEAD * INITIAL_HASH_MAP_CAPACITY; - private Cache stateCache; - private HashMultimap keyIndex = - HashMultimap.create(); - private int displayedWeight = 0; // Only used for status pages and unit tests. - private long workerCacheBytes; // Copy workerCacheMb and convert to bytes. + private final Cache stateCache; + // Contains the current valid ForKey object. 
Entries in the cache are keyed by ForKey with pointer + // equality so entries may be invalidated by creating a new key object, rendering the previous + // entries inaccessible. They will be evicted through normal cache operation. + private final ConcurrentMap keyIndex = + new MapMaker().weakValues().concurrencyLevel(4).makeMap(); + private final long workerCacheBytes; // Copy workerCacheMb and convert to bytes. - public WindmillStateCache(Integer workerCacheMb) { + public WindmillStateCache(long workerCacheMb) { final Weigher weigher = Weighers.weightedKeysAndValues(); workerCacheBytes = workerCacheMb * MEGABYTES; stateCache = @@ -81,31 +80,45 @@ public WindmillStateCache(Integer workerCacheMb) { .maximumWeight(workerCacheBytes) .recordStats() .weigher(weigher) - .removalListener( - removal -> { - if (removal.getCause() != RemovalCause.REPLACED) { - synchronized (this) { - StateId id = (StateId) removal.getKey(); - if (removal.getCause() != RemovalCause.EXPLICIT) { - // When we invalidate a key explicitly, we'll also update the keyIndex, so - // no need to do it here. - keyIndex.remove(id.getWindmillComputationKey(), id); - } - displayedWeight -= weigher.weigh(id, removal.getValue()); - } - } - }) + .concurrencyLevel(4) .build(); } + private static class EntryStats { + long entries; + long idWeight; + long entryWeight; + long entryValues; + long maxEntryValues; + } + + private EntryStats calculateEntryStats() { + EntryStats stats = new EntryStats(); + BiConsumer consumer = + (stateId, stateCacheEntry) -> { + stats.entries++; + stats.idWeight += stateId.getWeight(); + stats.entryWeight += stateCacheEntry.getWeight(); + stats.entryValues += stateCacheEntry.values.size(); + stats.maxEntryValues = Math.max(stats.maxEntryValues, stateCacheEntry.values.size()); + }; + stateCache.asMap().forEach(consumer); + return stats; + } + public long getWeight() { - return displayedWeight; + EntryStats w = calculateEntryStats(); + return w.idWeight + w.entryWeight; } public long getMaxWeight() { return workerCacheBytes; } + public CacheStats getCacheStats() { + return stateCache.stats(); + } + /** Per-computation view of the state cache. */ public class ForComputation { @@ -117,207 +130,199 @@ private ForComputation(String computation) { /** Invalidate all cache entries for this computation and {@code processingKey}. */ public void invalidate(ByteString processingKey, long shardingKey) { - synchronized (this) { - WindmillComputationKey key = - WindmillComputationKey.create(computation, processingKey, shardingKey); - for (StateId id : keyIndex.removeAll(key)) { - stateCache.invalidate(id); - } - } + WindmillComputationKey key = + WindmillComputationKey.create(computation, processingKey, shardingKey); + // By removing the ForKey object, all state for the key is orphaned in the cache and will + // be removed by normal cache cleanup. + keyIndex.remove(key); } - /** Returns a per-computation, per-key view of the state cache. */ - public ForKey forKey( - WindmillComputationKey computationKey, - String stateFamily, - long cacheToken, - long workToken) { - return new ForKey(computationKey, stateFamily, cacheToken, workToken); + /** + * Returns a per-computation, per-key view of the state cache. Access to the cached data for + * this key is not thread-safe. Callers should ensure that there is only a single ForKey object + * in use at a time and that access to it is synchronized or single-threaded. 
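The invalidation-by-identity scheme can be sketched in isolation. The sketch below uses the non-vendored Guava `CacheBuilder` and `MapMaker` (the worker uses vendored copies), and `KeyHandle` is a hypothetical stand-in for `ForKey`:

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.collect.MapMaker;
import java.util.concurrent.ConcurrentMap;

/** Toy model of invalidation by identity: dropping the key handle orphans its cache entries. */
class IdentityInvalidationSketch {
  /** Hypothetical stand-in for ForKey: no equals()/hashCode(), so identity semantics apply. */
  static final class KeyHandle {
    final String userKey;

    KeyHandle(String userKey) {
      this.userKey = userKey;
    }
  }

  // Bounded cache of state, keyed by the identity of a KeyHandle.
  private final Cache<KeyHandle, String> stateCache =
      CacheBuilder.newBuilder().maximumSize(1000).build();
  // The single "current" handle per user key; weak values let orphaned handles be collected.
  private final ConcurrentMap<String, KeyHandle> currentHandles =
      new MapMaker().weakValues().makeMap();

  private KeyHandle handleFor(String userKey) {
    return currentHandles.computeIfAbsent(userKey, KeyHandle::new);
  }

  void put(String userKey, String value) {
    stateCache.put(handleFor(userKey), value);
  }

  String get(String userKey) {
    return stateCache.getIfPresent(handleFor(userKey));
  }

  void invalidate(String userKey) {
    // Entries cached under the old handle become unreachable through handleFor() and are
    // reclaimed by normal cache eviction; no per-entry bookkeeping is needed.
    currentHandles.remove(userKey);
  }
}
```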
+ */ + public ForKey forKey(WindmillComputationKey computationKey, long cacheToken, long workToken) { + ForKey forKey = keyIndex.get(computationKey); + if (forKey == null || !forKey.updateTokens(cacheToken, workToken)) { + forKey = new ForKey(computationKey, cacheToken, workToken); + // We prefer this implementation to using compute because that is implemented similarly for + // ConcurrentHashMap with the downside of it performing inserts for unchanged existing + // values as well. + keyIndex.put(computationKey, forKey); + } + return forKey; } } /** Per-computation, per-key view of the state cache. */ + // Note that we utilize the default equality and hashCode for this class based upon the instance + // (instead of the fields) to optimize cache invalidation. public class ForKey { - private final WindmillComputationKey computationKey; - private final String stateFamily; // Cache token must be consistent for the key for the cache to be valid. private final long cacheToken; // The work token for processing must be greater than the last work token. As work items are // increasing for a key, a less-than or equal to work token indicates that the current token is - // for stale processing. We don't use the cache so that fetches performed will fail with a no - // longer valid work token. - private final long workToken; - - private ForKey( - WindmillComputationKey computationKey, - String stateFamily, - long cacheToken, - long workToken) { + // for stale processing. + private long workToken; + + /** + * Returns a per-computation, per-key, per-family view of the state cache. Access to the cached + * data for this key is not thread-safe. Callers should ensure that there is only a single + * ForKeyAndFamily object in use at a time for a given computation, key, family tuple and that + * access to it is synchronized or single-threaded. + */ + public ForKeyAndFamily forFamily(String stateFamily) { + return new ForKeyAndFamily(this, stateFamily); + } + + private ForKey(WindmillComputationKey computationKey, long cacheToken, long workToken) { this.computationKey = computationKey; - this.stateFamily = stateFamily; this.cacheToken = cacheToken; this.workToken = workToken; } - public T get(StateNamespace namespace, StateTag address) { - return WindmillStateCache.this.get( - computationKey, stateFamily, cacheToken, workToken, namespace, address); - } - - // Note that once a value has been put for a given workToken, get calls with that same workToken - // will fail. This is ok as we only put entries when we are building the commit and will no - // longer be performing gets for the same work token. - public void put( - StateNamespace namespace, StateTag address, T value, long weight) { - WindmillStateCache.this.put( - computationKey, stateFamily, cacheToken, workToken, namespace, address, value, weight); + private boolean updateTokens(long cacheToken, long workToken) { + if (this.cacheToken != cacheToken || workToken <= this.workToken) { + return false; + } + this.workToken = workToken; + return true; } } - /** Returns a per-computation view of the state cache. */ - public ForComputation forComputation(String computation) { - return new ForComputation(computation); - } + /** + * Per-computation, per-key, per-family view of the state cache. Modifications are cached locally + * and must be flushed to the cache by calling persist. This class is not thread-safe. 
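The token check in `updateTokens` amounts to a small guard; a minimal standalone model of it:

```java
/** Toy model of the cache-token/work-token guard that decides whether a per-key view is reusable. */
final class TokenCheckSketch {
  private final long cacheToken; // must match exactly for cached state to remain valid
  private long lastWorkToken; // must increase strictly; stale work items must not reuse the cache

  TokenCheckSketch(long cacheToken, long workToken) {
    this.cacheToken = cacheToken;
    this.lastWorkToken = workToken;
  }

  /** Records the new work token and returns true only for the same cache token and newer work. */
  boolean tryReuse(long cacheToken, long workToken) {
    if (this.cacheToken != cacheToken || workToken <= this.lastWorkToken) {
      return false; // the caller should create a fresh view instead
    }
    this.lastWorkToken = workToken;
    return true;
  }
}
```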
+ */ + public class ForKeyAndFamily { + final ForKey forKey; + final String stateFamily; + private final HashMap localCache; - private T get( - WindmillComputationKey computationKey, - String stateFamily, - long cacheToken, - long workToken, - StateNamespace namespace, - StateTag address) { - StateId id = new StateId(computationKey, stateFamily, namespace); - StateCacheEntry entry = stateCache.getIfPresent(id); - if (entry == null) { - return null; + private ForKeyAndFamily(ForKey forKey, String stateFamily) { + this.forKey = forKey; + this.stateFamily = stateFamily; + localCache = new HashMap<>(); } - if (entry.getCacheToken() != cacheToken) { - stateCache.invalidate(id); - return null; + + public String getStateFamily() { + return stateFamily; } - if (workToken <= entry.getLastWorkToken()) { - // We don't used the cached item but we don't invalidate it. - return null; + + public @Nullable T get(StateNamespace namespace, StateTag address) { + StateId id = new StateId(forKey, stateFamily, namespace); + @SuppressWarnings("nullness") // Unsure how to annotate lambda return allowing null. + @Nullable + StateCacheEntry entry = localCache.computeIfAbsent(id, key -> stateCache.getIfPresent(key)); + return entry == null ? null : entry.get(namespace, address); } - return entry.get(namespace, address); - } - private void put( - WindmillComputationKey computationKey, - String stateFamily, - long cacheToken, - long workToken, - StateNamespace namespace, - StateTag address, - T value, - long weight) { - StateId id = new StateId(computationKey, stateFamily, namespace); - StateCacheEntry entry = stateCache.getIfPresent(id); - if (entry == null) { - synchronized (this) { - keyIndex.put(id.getWindmillComputationKey(), id); + public void put( + StateNamespace namespace, StateTag address, T value, long weight) { + StateId id = new StateId(forKey, stateFamily, namespace); + @Nullable StateCacheEntry entry = localCache.get(id); + if (entry == null) { + entry = stateCache.getIfPresent(id); + if (entry == null) { + entry = new StateCacheEntry(); + } + boolean hadValue = localCache.putIfAbsent(id, entry) != null; + Preconditions.checkState(!hadValue); } + entry.put(namespace, address, value, weight); } - if (entry == null || entry.getCacheToken() != cacheToken) { - entry = new StateCacheEntry(cacheToken); - this.displayedWeight += (int) id.getWeight(); - this.displayedWeight += (int) entry.getWeight(); + + public void persist() { + localCache.forEach((id, entry) -> stateCache.put(id, entry)); } - entry.setLastWorkToken(workToken); - this.displayedWeight += (int) entry.put(namespace, address, value, weight); - // Always add back to the cache to update the weight. - stateCache.put(id, entry); } - /** Struct identifying a cache entry that contains all data for a key and namespace. */ - private static class StateId implements Weighted { + /** Returns a per-computation view of the state cache. */ + public ForComputation forComputation(String computation) { + return new ForComputation(computation); + } - private final WindmillComputationKey computationKey; + /** + * Struct identifying a cache entry that contains all data for a ForKey instance and namespace. 
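A cache key along the lines of the `StateId` introduced next can precompute its hash and compare its owning handle by identity; a rough sketch (field names are illustrative, not the worker's exact ones):

```java
import java.util.Objects;

/** Toy model of a composite cache key that precomputes its hash and compares its owner by identity. */
final class StateIdSketch {
  private final Object ownerHandle; // compared with ==, like the ForKey reference
  private final String stateFamily;
  private final Object namespaceKey;
  private final int hashCode; // computed once; lookups and evictions hash the key repeatedly

  StateIdSketch(Object ownerHandle, String stateFamily, Object namespaceKey) {
    this.ownerHandle = ownerHandle;
    this.stateFamily = stateFamily;
    this.namespaceKey = namespaceKey;
    this.hashCode = Objects.hash(System.identityHashCode(ownerHandle), stateFamily, namespaceKey);
  }

  @Override
  public boolean equals(Object other) {
    if (this == other) {
      return true;
    }
    if (!(other instanceof StateIdSketch)) {
      return false;
    }
    StateIdSketch that = (StateIdSketch) other;
    return hashCode == that.hashCode
        && ownerHandle == that.ownerHandle
        && stateFamily.equals(that.stateFamily)
        && namespaceKey.equals(that.namespaceKey);
  }

  @Override
  public int hashCode() {
    return hashCode;
  }
}
```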
+ */ + private static class StateId implements Weighted { + private final ForKey forKey; private final String stateFamily; private final Object namespaceKey; + private final int hashCode; - public StateId( - WindmillComputationKey computationKey, String stateFamily, StateNamespace namespace) { - this.computationKey = computationKey; + public StateId(ForKey forKey, String stateFamily, StateNamespace namespace) { + this.forKey = forKey; this.stateFamily = stateFamily; this.namespaceKey = namespace.getCacheKey(); + this.hashCode = Objects.hash(forKey, stateFamily, namespaceKey); } @Override public boolean equals(@Nullable Object other) { - if (other instanceof StateId) { - StateId otherId = (StateId) other; - return computationKey.equals(otherId.computationKey) - && stateFamily.equals(otherId.stateFamily) - && namespaceKey.equals(otherId.namespaceKey); + if (this == other) { + return true; } - return false; - } - - public WindmillComputationKey getWindmillComputationKey() { - return computationKey; + if (!(other instanceof StateId)) { + return false; + } + StateId otherId = (StateId) other; + return hashCode == otherId.hashCode + && forKey == otherId.forKey + && stateFamily.equals(otherId.stateFamily) + && namespaceKey.equals(otherId.namespaceKey); } @Override public int hashCode() { - return Objects.hash(computationKey, namespaceKey); + return hashCode; } @Override public long getWeight() { - return (long) computationKey.key().size() + PER_STATE_ID_OVERHEAD; + return forKey.computationKey.key().size() + stateFamily.length() + PER_STATE_ID_OVERHEAD; } } - /** - * Entry in the state cache that stores a map of values, a cache token representing the validity - * of the values, and a work token that is increasing to ensure sequential processing. - */ + /** Entry in the state cache that stores a map of values. */ private static class StateCacheEntry implements Weighted { - - private final long cacheToken; - private long lastWorkToken; - private final Map, WeightedValue> values; + private final HashMap, WeightedValue> values; private long weight; - public StateCacheEntry(long cacheToken) { + public StateCacheEntry() { this.values = new HashMap<>(INITIAL_HASH_MAP_CAPACITY); - this.cacheToken = cacheToken; - this.lastWorkToken = Long.MIN_VALUE; this.weight = 0; } - public void setLastWorkToken(long workToken) { - this.lastWorkToken = workToken; - } - - @SuppressWarnings("unchecked") - public T get(StateNamespace namespace, StateTag tag) { + public @Nullable T get(StateNamespace namespace, StateTag tag) { + @SuppressWarnings("unchecked") + @Nullable WeightedValue weightedValue = (WeightedValue) values.get(new NamespacedTag<>(namespace, tag)); return weightedValue == null ? 
null : weightedValue.value; } - public long put( + public void put( StateNamespace namespace, StateTag tag, T value, long weight) { - @SuppressWarnings("unchecked") - WeightedValue weightedValue = - (WeightedValue) values.get(new NamespacedTag<>(namespace, tag)); - long weightDelta = 0; - if (weightedValue == null) { - weightedValue = new WeightedValue<>(); - weightDelta += HASH_MAP_ENTRY_OVERHEAD; - } else { - weightDelta -= weightedValue.weight; - } - weightedValue.value = value; - weightedValue.weight = weight; - weightDelta += weight; - this.weight += weightDelta; - values.put(new NamespacedTag<>(namespace, tag), weightedValue); - return weightDelta; + values.compute( + new NamespacedTag<>(namespace, tag), + (t, v) -> { + @SuppressWarnings("unchecked") + WeightedValue weightedValue = (WeightedValue) v; + if (weightedValue == null) { + weightedValue = new WeightedValue<>(); + this.weight += HASH_MAP_ENTRY_OVERHEAD; + } else { + this.weight -= weightedValue.weight; + } + this.weight += weight; + weightedValue.value = value; + weightedValue.weight = weight; + return weightedValue; + }); } @Override @@ -325,26 +330,24 @@ public long getWeight() { return weight + PER_CACHE_ENTRY_OVERHEAD; } - public long getCacheToken() { - return cacheToken; - } - - public long getLastWorkToken() { - return lastWorkToken; - } - + // Even though we use the namespace at the higher cache level, we are only using the cacheKey. + // That allows for grouped eviction of entries sharing a cacheKey but we require the full + // namespace here to distinguish between grouped entries. private static class NamespacedTag { private final StateNamespace namespace; - private final Equivalence.Wrapper tag; + private final Equivalence.Wrapper> tag; NamespacedTag(StateNamespace namespace, StateTag tag) { this.namespace = namespace; - this.tag = StateTags.ID_EQUIVALENCE.wrap((StateTag) tag); + this.tag = StateTags.ID_EQUIVALENCE.wrap(tag); } @Override public boolean equals(@Nullable Object other) { + if (other == this) { + return true; + } if (!(other instanceof NamespacedTag)) { return false; } @@ -359,22 +362,31 @@ public int hashCode() { } private static class WeightedValue { - - public long weight = 0; - public T value = null; + public long weight; + public @Nullable T value; } } /** Print summary statistics of the cache to the given {@link PrintWriter}. */ @Override public void appendSummaryHtml(PrintWriter response) { - response.println("Cache Stats:

    "); + response.println("Cache Stats:
    "); response.println( - ""); - response.println(""); - response.println(""); - response.println(""); - response.println(""); + "" + + "" + + "" + + ""); + CacheStats cacheStats = stateCache.stats(); + EntryStats entryStats = calculateEntryStats(); + response.println(""); + response.println(""); + response.println(""); + response.println(""); + response.println(""); + response.println(""); + response.println(""); + response.println(""); + response.println(""); response.println("
    Hit RatioEvictionsSizeWeight
    " + stateCache.stats().hitRate() + "" + stateCache.stats().evictionCount() + "" + stateCache.size() + "" + getWeight() + "
    Hit RatioEvictionsEntriesEntry ValuesMax Entry ValuesId WeightEntry WeightMax WeightKeys
    " + cacheStats.hitRate() + "" + cacheStats.evictionCount() + "" + entryStats.entries + "(" + stateCache.size() + " inc. weak) " + entryStats.entryValues + "" + entryStats.maxEntryValues + "" + entryStats.idWeight / MEGABYTES + "MB" + entryStats.entryWeight / MEGABYTES + "MB" + getMaxWeight() / MEGABYTES + "MB" + keyIndex.size() + "

    "); } @@ -382,8 +394,7 @@ public BaseStatusServlet statusServlet() { return new BaseStatusServlet("/cachez") { @Override protected void doGet(HttpServletRequest request, HttpServletResponse response) - throws IOException, ServletException { - + throws IOException { PrintWriter writer = response.getWriter(); writer.println("

    Cache Information

    "); appendSummaryHtml(writer); diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateInternals.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateInternals.java index 95e190a87781..9ca72ccfb6d4 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateInternals.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateInternals.java @@ -24,6 +24,7 @@ import java.io.OutputStream; import java.io.OutputStreamWriter; import java.nio.charset.StandardCharsets; +import java.util.AbstractMap; import java.util.ArrayList; import java.util.Arrays; import java.util.Collections; @@ -31,11 +32,14 @@ import java.util.Iterator; import java.util.List; import java.util.Map; +import java.util.Map.Entry; import java.util.Random; +import java.util.Set; import java.util.SortedSet; import java.util.concurrent.ExecutionException; import java.util.concurrent.Future; import java.util.function.BiConsumer; +import java.util.function.Function; import javax.annotation.concurrent.NotThreadSafe; import org.apache.beam.runners.core.StateInternals; import org.apache.beam.runners.core.StateNamespace; @@ -43,7 +47,6 @@ import org.apache.beam.runners.core.StateTag; import org.apache.beam.runners.core.StateTag.StateBinder; import org.apache.beam.runners.core.StateTags; -import org.apache.beam.runners.dataflow.worker.WindmillStateCache.ForKey; import org.apache.beam.runners.dataflow.worker.windmill.Windmill; import org.apache.beam.runners.dataflow.worker.windmill.Windmill.SortedListEntry; import org.apache.beam.runners.dataflow.worker.windmill.Windmill.SortedListRange; @@ -51,6 +54,7 @@ import org.apache.beam.runners.dataflow.worker.windmill.Windmill.TagSortedListInsertRequest; import org.apache.beam.runners.dataflow.worker.windmill.Windmill.TagSortedListUpdateRequest; import org.apache.beam.runners.dataflow.worker.windmill.Windmill.WorkItemCommitRequest; +import org.apache.beam.sdk.coders.BooleanCoder; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.Coder.Context; import org.apache.beam.sdk.coders.CoderException; @@ -78,7 +82,7 @@ import org.apache.beam.sdk.util.CombineFnUtil; import org.apache.beam.sdk.util.Weighted; import org.apache.beam.sdk.values.TimestampedValue; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Optional; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; @@ -94,7 +98,10 @@ import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Sets; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.TreeRangeSet; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.Futures; +import org.checkerframework.checker.initialization.qual.Initialized; +import org.checkerframework.checker.nullness.qual.NonNull; import org.checkerframework.checker.nullness.qual.Nullable; +import org.checkerframework.checker.nullness.qual.UnknownKeyFor; import org.joda.time.Duration; import org.joda.time.Instant; @@ -120,7 +127,7 @@ private static class CachingStateTable extends StateTable { private final 
@Nullable K key; private final String stateFamily; private final WindmillStateReader reader; - private final WindmillStateCache.ForKey cache; + private final WindmillStateCache.ForKeyAndFamily cache; private final boolean isSystemTable; boolean isNewKey; private final Supplier scopedReadStateSupplier; @@ -130,7 +137,7 @@ public CachingStateTable( @Nullable K key, String stateFamily, WindmillStateReader reader, - WindmillStateCache.ForKey cache, + WindmillStateCache.ForKeyAndFamily cache, boolean isSystemTable, boolean isNewKey, Supplier scopedReadStateSupplier, @@ -166,17 +173,23 @@ public BagState bindBag(StateTag> address, Coder elemCoder @Override public SetState bindSet(StateTag> spec, Coder elemCoder) { - throw new UnsupportedOperationException( - String.format("%s is not supported", SetState.class.getSimpleName())); + WindmillSet result = + new WindmillSet(namespace, spec, stateFamily, elemCoder, cache, isNewKey); + result.initializeForWorkItem(reader, scopedReadStateSupplier); + return result; } @Override public MapState bindMap( - StateTag> spec, - Coder mapKeyCoder, - Coder mapValueCoder) { - throw new UnsupportedOperationException( - String.format("%s is not supported", SetState.class.getSimpleName())); + StateTag> spec, Coder keyCoder, Coder valueCoder) { + WindmillMap result = (WindmillMap) cache.get(namespace, spec); + if (result == null) { + result = + new WindmillMap( + namespace, spec, stateFamily, keyCoder, valueCoder, isNewKey); + } + result.initializeForWorkItem(reader, scopedReadStateSupplier); + return result; } @Override @@ -254,7 +267,7 @@ public ValueState bindValue(StateTag> address, Coder cod } } - private WindmillStateCache.ForKey cache; + private WindmillStateCache.ForKeyAndFamily cache; Supplier scopedReadStateSupplier; private StateTable workItemState; private StateTable workItemDerivedState; @@ -264,7 +277,7 @@ public WindmillStateInternals( String stateFamily, WindmillStateReader reader, boolean isNewKey, - WindmillStateCache.ForKey cache, + WindmillStateCache.ForKeyAndFamily cache, Supplier scopedReadStateSupplier) { this.key = key; this.cache = cache; @@ -329,6 +342,8 @@ public void persist(final Windmill.WorkItemCommitRequest.Builder commitBuilder) } throw new RuntimeException("Failed to retrieve Windmill state during persist()", exc); } + + cache.persist(); } /** Encodes the given namespace and address as {@code <namespace>+<address>}. */ @@ -368,7 +383,7 @@ private abstract static class WindmillState { * Return an asynchronously computed {@link WorkItemCommitRequest}. The request should be of a * form that can be merged with others (only add to repeated fields). */ - abstract Future persist(WindmillStateCache.ForKey cache) + abstract Future persist(WindmillStateCache.ForKeyAndFamily cache) throws IOException; /** @@ -401,7 +416,7 @@ Closeable scopedReadState() { */ private abstract static class SimpleWindmillState extends WindmillState { @Override - public final Future persist(WindmillStateCache.ForKey cache) + public final Future persist(WindmillStateCache.ForKeyAndFamily cache) throws IOException { return Futures.immediateFuture(persistDirectly(cache)); } @@ -409,8 +424,8 @@ public final Future persist(WindmillStateCache.ForKey cac /** * Returns a {@link WorkItemCommitRequest} that can be used to persist this state to Windmill. 
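The `persist`/`persistDirectly` split is a small template-method pattern: states that can build their commit synchronously wrap it in a completed future so the commit path treats all states uniformly. A minimal model, using `String` as a stand-in for `WorkItemCommitRequest`:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;

/**
 * Toy model of the persist/persistDirectly split: states that can build their commit
 * synchronously wrap it in a completed future so the commit path treats all states uniformly.
 */
abstract class PersistSketch {
  /** Asynchronous contract used when building the work item commit. */
  abstract Future<String> persist();

  /** Base class for states whose commit contribution is available immediately. */
  abstract static class Simple extends PersistSketch {
    @Override
    final Future<String> persist() {
      // String stands in for WorkItemCommitRequest here.
      return CompletableFuture.completedFuture(persistDirectly());
    }

    abstract String persistDirectly();
  }
}
```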
*/ - protected abstract WorkItemCommitRequest persistDirectly(WindmillStateCache.ForKey cache) - throws IOException; + protected abstract WorkItemCommitRequest persistDirectly( + WindmillStateCache.ForKeyAndFamily cache) throws IOException; } @Override @@ -497,7 +512,7 @@ public void write(T value) { } @Override - protected WorkItemCommitRequest persistDirectly(WindmillStateCache.ForKey cache) + protected WorkItemCommitRequest persistDirectly(WindmillStateCache.ForKeyAndFamily cache) throws IOException { if (!valueIsKnown) { // The value was never read, written or cleared. @@ -764,8 +779,7 @@ void add( if (availableIdsForTsRange != null) { availableIdsForTsRange.removeAll(idsUsed); } - idsAvailableValue.write(idsAvailable); - subRangeDeletionsValue.write(subRangeDeletions); + writeValues(idsAvailable, subRangeDeletions); } // Remove a timestamp range. Returns ids freed up. @@ -797,8 +811,22 @@ void remove(Range tsRange) throws ExecutionException, InterruptedExcept subRangeDeletions.remove(current); } } - idsAvailableValue.write(idsAvailable); - subRangeDeletionsValue.write(subRangeDeletions); + writeValues(idsAvailable, subRangeDeletions); + } + + private void writeValues( + Map, RangeSet> idsAvailable, + Map, RangeSet> subRangeDeletions) { + if (idsAvailable.isEmpty()) { + idsAvailable.clear(); + } else { + idsAvailableValue.write(idsAvailable); + } + if (subRangeDeletions.isEmpty()) { + subRangeDeletionsValue.clear(); + } else { + subRangeDeletionsValue.write(subRangeDeletions); + } } } @@ -997,10 +1025,14 @@ public OrderedListState readRangeLater( } @Override - public WorkItemCommitRequest persistDirectly(ForKey cache) throws IOException { + public WorkItemCommitRequest persistDirectly(WindmillStateCache.ForKeyAndFamily cache) + throws IOException { WorkItemCommitRequest.Builder commitBuilder = WorkItemCommitRequest.newBuilder(); TagSortedListUpdateRequest.Builder updatesBuilder = - commitBuilder.addSortedListUpdatesBuilder().setStateFamily(stateFamily).setTag(stateKey); + commitBuilder + .addSortedListUpdatesBuilder() + .setStateFamily(cache.getStateFamily()) + .setTag(stateKey); try { if (cleared) { // Default range. @@ -1075,6 +1107,490 @@ private Future>> getFuture( } } + static class WindmillSet extends SimpleWindmillState implements SetState { + WindmillMap windmillMap; + + WindmillSet( + StateNamespace namespace, + StateTag> address, + String stateFamily, + Coder keyCoder, + WindmillStateCache.ForKeyAndFamily cache, + boolean isNewKey) { + StateTag> internalMapAddress = + StateTags.convertToMapTagInternal(address); + WindmillMap cachedMap = + (WindmillMap) cache.get(namespace, internalMapAddress); + this.windmillMap = + (cachedMap != null) + ? cachedMap + : new WindmillMap<>( + namespace, + internalMapAddress, + stateFamily, + keyCoder, + BooleanCoder.of(), + isNewKey); + } + + @Override + protected WorkItemCommitRequest persistDirectly(WindmillStateCache.ForKeyAndFamily cache) + throws IOException { + return windmillMap.persistDirectly(cache); + } + + @Override + public @UnknownKeyFor @NonNull @Initialized ReadableState< + @UnknownKeyFor @NonNull @Initialized Boolean> + contains(K k) { + return windmillMap.getOrDefault(k, false); + } + + @Override + public @UnknownKeyFor @NonNull @Initialized ReadableState< + @UnknownKeyFor @NonNull @Initialized Boolean> + addIfAbsent(K k) { + return new ReadableState() { + ReadableState putState = windmillMap.putIfAbsent(k, true); + + @Override + public @Nullable Boolean read() { + Boolean result = putState.read(); + return (result != null) ? 
result : false; + } + + @Override + public @UnknownKeyFor @NonNull @Initialized ReadableState readLater() { + putState = putState.readLater(); + return this; + } + }; + } + + @Override + public void remove(K k) { + windmillMap.remove(k); + } + + @Override + public void add(K value) { + windmillMap.put(value, true); + } + + @Override + public @UnknownKeyFor @NonNull @Initialized ReadableState< + @UnknownKeyFor @NonNull @Initialized Boolean> + isEmpty() { + return windmillMap.isEmpty(); + } + + @Override + public @Nullable Iterable read() { + return windmillMap.keys().read(); + } + + @Override + public @UnknownKeyFor @NonNull @Initialized SetState readLater() { + windmillMap.keys().readLater(); + return this; + } + + @Override + public void clear() { + windmillMap.clear(); + } + + @Override + void initializeForWorkItem( + WindmillStateReader reader, Supplier scopedReadStateSupplier) { + windmillMap.initializeForWorkItem(reader, scopedReadStateSupplier); + } + + @Override + void cleanupAfterWorkItem() { + windmillMap.cleanupAfterWorkItem(); + } + } + + static class WindmillMap extends SimpleWindmillState implements MapState { + private final StateNamespace namespace; + private final StateTag> address; + private final ByteString stateKeyPrefix; + private final String stateFamily; + private final Coder keyCoder; + private final Coder valueCoder; + private boolean complete; + + // TODO(reuvenlax): Should we evict items from the cache? We would have to make sure + // that anything in the cache that is not committed is not evicted. negativeCache could be + // evicted whenever we want. + private Map cachedValues = Maps.newHashMap(); + private Set negativeCache = Sets.newHashSet(); + private boolean cleared = false; + + private Set localAdditions = Sets.newHashSet(); + private Set localRemovals = Sets.newHashSet(); + + WindmillMap( + StateNamespace namespace, + StateTag> address, + String stateFamily, + Coder keyCoder, + Coder valueCoder, + boolean isNewKey) { + this.namespace = namespace; + this.address = address; + this.stateKeyPrefix = encodeKey(namespace, address); + this.stateFamily = stateFamily; + this.keyCoder = keyCoder; + this.valueCoder = valueCoder; + this.complete = isNewKey; + } + + private K userKeyFromProtoKey(ByteString tag) throws IOException { + Preconditions.checkState(tag.startsWith(stateKeyPrefix)); + ByteString keyBytes = tag.substring(stateKeyPrefix.size()); + return keyCoder.decode(keyBytes.newInput(), Context.OUTER); + } + + private ByteString protoKeyFromUserKey(K key) throws IOException { + ByteString.Output keyStream = ByteString.newOutput(); + stateKeyPrefix.writeTo(keyStream); + keyCoder.encode(key, keyStream, Context.OUTER); + return keyStream.toByteString(); + } + + @Override + protected WorkItemCommitRequest persistDirectly(WindmillStateCache.ForKeyAndFamily cache) + throws IOException { + if (!cleared && localAdditions.isEmpty() && localRemovals.isEmpty()) { + // No changes, so return directly. 
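The `WindmillSet` above represents set membership as a map from element to `Boolean.TRUE`; the same idea in miniature with a plain `HashMap`:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Toy model of representing a set as a map whose values are always Boolean.TRUE. */
final class SetOverMapSketch<K> {
  private final Map<K, Boolean> backing = new HashMap<>();

  void add(K element) {
    backing.put(element, true);
  }

  void remove(K element) {
    backing.remove(element);
  }

  boolean contains(K element) {
    return backing.getOrDefault(element, false);
  }

  /** addIfAbsent: returns the previous membership, inserting when the element was absent. */
  boolean addIfAbsent(K element) {
    Boolean previous = backing.putIfAbsent(element, true);
    return previous != null ? previous : false;
  }

  /** The set contents are simply the map's keys (a live view here). */
  Set<K> read() {
    return backing.keySet();
  }
}
```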
+ return WorkItemCommitRequest.newBuilder().buildPartial(); + } + + WorkItemCommitRequest.Builder commitBuilder = WorkItemCommitRequest.newBuilder(); + + if (cleared) { + commitBuilder + .addTagValuePrefixDeletesBuilder() + .setStateFamily(stateFamily) + .setTagPrefix(stateKeyPrefix); + } + cleared = false; + + for (K key : localAdditions) { + ByteString keyBytes = protoKeyFromUserKey(key); + ByteString.Output valueStream = ByteString.newOutput(); + valueCoder.encode(cachedValues.get(key), valueStream, Context.OUTER); + ByteString valueBytes = valueStream.toByteString(); + + commitBuilder + .addValueUpdatesBuilder() + .setTag(keyBytes) + .setStateFamily(stateFamily) + .getValueBuilder() + .setData(valueBytes) + .setTimestamp(Long.MAX_VALUE); + } + localAdditions.clear(); + + for (K key : localRemovals) { + ByteString.Output keyStream = ByteString.newOutput(); + stateKeyPrefix.writeTo(keyStream); + keyCoder.encode(key, keyStream, Context.OUTER); + ByteString keyBytes = keyStream.toByteString(); + // Leaving data blank means that we delete the tag. + commitBuilder + .addValueUpdatesBuilder() + .setTag(keyBytes) + .setStateFamily(stateFamily) + .getValueBuilder() + .setTimestamp(Long.MAX_VALUE); + + V cachedValue = cachedValues.remove(key); + if (cachedValue != null) { + ByteString.Output valueStream = ByteString.newOutput(); + valueCoder.encode(cachedValues.get(key), valueStream, Context.OUTER); + } + } + negativeCache.addAll(localRemovals); + localRemovals.clear(); + + // TODO(reuvenlax): We should store in the cache parameter, as that would enable caching the + // map + // between work items, reducing fetches to Windmill. To do so, we need keep track of the + // encoded size + // of the map, and to do so efficiently (i.e. without iterating over the entire map on every + // persist) + // we need to track the sizes of each map entry. + cache.put(namespace, address, this, 1); + return commitBuilder.buildPartial(); + } + + @Override + public @UnknownKeyFor @NonNull @Initialized ReadableState get(K key) { + return getOrDefault(key, null); + } + + @Override + public @UnknownKeyFor @NonNull @Initialized ReadableState getOrDefault( + K key, @Nullable V defaultValue) { + return new ReadableState() { + @Override + public @Nullable V read() { + Future persistedData = getFutureForKey(key); + try (Closeable scope = scopedReadState()) { + if (localRemovals.contains(key) || negativeCache.contains(key)) { + return null; + } + @Nullable V cachedValue = cachedValues.get(key); + if (cachedValue != null || complete) { + return cachedValue; + } + + V persistedValue = persistedData.get(); + if (persistedValue == null) { + negativeCache.add(key); + return defaultValue; + } + // TODO: Don't do this if it was already in cache. 
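The read path above combines local removals, a negative cache, and a read-through positive cache before falling back to a fetch. A simplified standalone version, where the `backend` function stands in for the Windmill value fetch and removed or known-absent keys fall through to the default:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Toy model of the map read path: local removals and a negative cache short-circuit fetches. */
final class ReadThroughSketch<K, V> {
  private final Map<K, V> cachedValues = new HashMap<>();
  private final Set<K> negativeCache = new HashSet<>(); // keys known to be absent in the backend
  private final Set<K> localRemovals = new HashSet<>(); // keys removed during this work item
  private final Function<K, V> backend; // stand-in for the Windmill value fetch

  ReadThroughSketch(Function<K, V> backend) {
    this.backend = backend;
  }

  V getOrDefault(K key, V defaultValue) {
    if (localRemovals.contains(key) || negativeCache.contains(key)) {
      return defaultValue; // locally removed or already known to be missing
    }
    V cached = cachedValues.get(key);
    if (cached != null) {
      return cached;
    }
    V persisted = backend.apply(key);
    if (persisted == null) {
      negativeCache.add(key); // remember the miss so we do not refetch it
      return defaultValue;
    }
    cachedValues.put(key, persisted); // cache the hit for later reads in this work item
    return persisted;
  }
}
```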
+ cachedValues.put(key, persistedValue); + return persistedValue; + } catch (InterruptedException | ExecutionException | IOException e) { + if (e instanceof InterruptedException) { + Thread.currentThread().interrupt(); + } + throw new RuntimeException("Unable to read state", e); + } + } + + @Override + @SuppressWarnings("FutureReturnValueIgnored") + public @UnknownKeyFor @NonNull @Initialized ReadableState readLater() { + WindmillMap.this.getFutureForKey(key); + return this; + } + }; + } + + @Override + public @UnknownKeyFor @NonNull @Initialized ReadableState< + @UnknownKeyFor @NonNull @Initialized Iterable> + keys() { + ReadableState>> entries = entries(); + return new ReadableState>() { + @Override + public @Nullable Iterable read() { + return Iterables.transform(entries.read(), e -> e.getKey()); + } + + @Override + public @UnknownKeyFor @NonNull @Initialized ReadableState> readLater() { + entries.readLater(); + return this; + } + }; + } + + @Override + public @UnknownKeyFor @NonNull @Initialized ReadableState< + @UnknownKeyFor @NonNull @Initialized Iterable> + values() { + ReadableState>> entries = entries(); + return new ReadableState>() { + @Override + public @Nullable Iterable read() { + return Iterables.transform(entries.read(), e -> e.getValue()); + } + + @Override + public @UnknownKeyFor @NonNull @Initialized ReadableState> readLater() { + entries.readLater(); + return this; + } + }; + } + + @Override + public @UnknownKeyFor @NonNull @Initialized ReadableState< + @UnknownKeyFor @NonNull @Initialized Iterable< + @UnknownKeyFor @NonNull @Initialized Entry>> + entries() { + return new ReadableState>>() { + @Override + public Iterable> read() { + if (complete) { + return Iterables.unmodifiableIterable(cachedValues.entrySet()); + } + Future>> persistedData = getFuture(); + try (Closeable scope = scopedReadState()) { + Iterable> data = persistedData.get(); + Iterable> transformedData = + Iterables., Map.Entry>transform( + data, + entry -> { + try { + return new AbstractMap.SimpleEntry<>( + userKeyFromProtoKey(entry.getKey()), entry.getValue()); + } catch (IOException e) { + throw new RuntimeException(e); + } + }); + + if (data instanceof Weighted) { + // This is a known amount of data. Cache it all. + transformedData.forEach( + e -> { + // The cached data overrides what is read from state, so call putIfAbsent. + cachedValues.putIfAbsent(e.getKey(), e.getValue()); + }); + complete = true; + return Iterables.unmodifiableIterable(cachedValues.entrySet()); + } else { + // This means that the result might be too large to cache, so don't add it to the + // local cache. Instead merge the iterables, giving priority to any local additions + // (represented in cachedValued and localRemovals) that may not have been committed + // yet. 
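That merge, local entries first and fetched entries filtered against local state, can be expressed compactly:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of merging locally cached entries with a (possibly partial) backend scan. */
final class MergeEntriesSketch {
  static <K, V> List<Map.Entry<K, V>> merge(
      Map<K, V> cachedValues, Set<K> localRemovals, Iterable<Map.Entry<K, V>> fetched) {
    List<Map.Entry<K, V>> result = new ArrayList<>(cachedValues.entrySet());
    for (Map.Entry<K, V> entry : fetched) {
      // Local state wins: skip anything already cached locally or removed during this bundle.
      if (!cachedValues.containsKey(entry.getKey()) && !localRemovals.contains(entry.getKey())) {
        result.add(entry);
      }
    }
    return result;
  }
}
```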
+ return Iterables.unmodifiableIterable( + Iterables.concat( + cachedValues.entrySet(), + Iterables.filter( + transformedData, + e -> + !cachedValues.containsKey(e.getKey()) + && !localRemovals.contains(e.getKey())))); + } + + } catch (InterruptedException | ExecutionException | IOException e) { + if (e instanceof InterruptedException) { + Thread.currentThread().interrupt(); + } + throw new RuntimeException("Unable to read state", e); + } + } + + @Override + @SuppressWarnings("FutureReturnValueIgnored") + public @UnknownKeyFor @NonNull @Initialized ReadableState>> + readLater() { + WindmillMap.this.getFuture(); + return this; + } + }; + } + + @Override + public ReadableState isEmpty() { + return new ReadableState() { + // TODO(reuvenlax): Can we find a more efficient way of implementing isEmpty than reading + // the entire map? + ReadableState> keys = WindmillMap.this.keys(); + + @Override + public @Nullable Boolean read() { + return Iterables.isEmpty(keys.read()); + } + + @Override + public @UnknownKeyFor @NonNull @Initialized ReadableState readLater() { + keys.readLater(); + return this; + } + }; + } + + @Override + public void put(K key, V value) { + V oldValue = cachedValues.put(key, value); + if (valueCoder.consistentWithEquals() && value.equals(oldValue)) { + return; + } + localAdditions.add(key); + localRemovals.remove(key); + negativeCache.remove(key); + } + + @Override + public @UnknownKeyFor @NonNull @Initialized ReadableState computeIfAbsent( + K key, Function mappingFunction) { + return new ReadableState() { + @Override + public @Nullable V read() { + Future persistedData = getFutureForKey(key); + try (Closeable scope = scopedReadState()) { + if (localRemovals.contains(key) || negativeCache.contains(key)) { + return null; + } + @Nullable V cachedValue = cachedValues.get(key); + if (cachedValue != null || complete) { + return cachedValue; + } + + V persistedValue = persistedData.get(); + if (persistedValue == null) { + // This is a new value. Add it to the map and return null. + put(key, mappingFunction.apply(key)); + return null; + } + // TODO: Don't do this if it was already in cache. + cachedValues.put(key, persistedValue); + return persistedValue; + } catch (InterruptedException | ExecutionException | IOException e) { + if (e instanceof InterruptedException) { + Thread.currentThread().interrupt(); + } + throw new RuntimeException("Unable to read state", e); + } + } + + @Override + @SuppressWarnings("FutureReturnValueIgnored") + public @UnknownKeyFor @NonNull @Initialized ReadableState readLater() { + WindmillMap.this.getFutureForKey(key); + return this; + } + }; + } + + @Override + public void remove(K key) { + if (localRemovals.add(key)) { + cachedValues.remove(key); + localAdditions.remove(key); + } + } + + @Override + public void clear() { + cachedValues.clear(); + localAdditions.clear(); + localRemovals.clear(); + negativeCache.clear(); + cleared = true; + complete = true; + } + + private Future getFutureForKey(K key) { + try { + ByteString.Output keyStream = ByteString.newOutput(); + stateKeyPrefix.writeTo(keyStream); + keyCoder.encode(key, keyStream, Context.OUTER); + return reader.valueFuture(keyStream.toByteString(), stateFamily, valueCoder); + } catch (IOException e) { + throw new RuntimeException(e); + } + } + + private Future>> getFuture() { + if (complete) { + // The caller will merge in local cached values. 
+ return Futures.immediateFuture(Collections.emptyList()); + } else { + return reader.valuePrefixFuture(stateKeyPrefix, stateFamily, valueCoder); + } + } + }; + private static class WindmillBag extends SimpleWindmillState implements BagState { private final StateNamespace namespace; @@ -1186,7 +1702,7 @@ public void add(T input) { } @Override - public WorkItemCommitRequest persistDirectly(WindmillStateCache.ForKey cache) + public WorkItemCommitRequest persistDirectly(WindmillStateCache.ForKeyAndFamily cache) throws IOException { WorkItemCommitRequest.Builder commitBuilder = WorkItemCommitRequest.newBuilder(); @@ -1369,7 +1885,7 @@ public TimestampCombiner getTimestampCombiner() { } @Override - public Future persist(final WindmillStateCache.ForKey cache) { + public Future persist(final WindmillStateCache.ForKeyAndFamily cache) { Future result; @@ -1508,7 +2024,7 @@ private WindmillCombiningState( String stateFamily, Coder accumCoder, CombineFn combineFn, - WindmillStateCache.ForKey cache, + WindmillStateCache.ForKeyAndFamily cache, boolean isNewKey) { StateTag> internalBagAddress = StateTags.convertToBagTagInternal(address); WindmillBag cachedBag = @@ -1559,7 +2075,7 @@ public void clear() { } @Override - public Future persist(WindmillStateCache.ForKey cache) + public Future persist(WindmillStateCache.ForKeyAndFamily cache) throws IOException { if (hasLocalAdditions) { if (COMPACT_NOW.get().get() || bag.valuesAreCached()) { diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateReader.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateReader.java index 3ef2d7feea0e..c877c98082bf 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateReader.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateReader.java @@ -23,12 +23,14 @@ import edu.umd.cs.findbugs.annotations.SuppressFBWarnings; import java.io.IOException; import java.io.InputStream; +import java.util.AbstractMap; import java.util.ArrayList; import java.util.Collections; import java.util.Comparator; import java.util.HashSet; import java.util.Iterator; import java.util.List; +import java.util.Map; import java.util.Set; import java.util.concurrent.ConcurrentHashMap; import java.util.concurrent.ConcurrentLinkedQueue; @@ -38,17 +40,20 @@ import java.util.concurrent.TimeoutException; import javax.annotation.Nonnull; import javax.annotation.Nullable; +import org.apache.beam.runners.dataflow.worker.WindmillStateReader.StateTag.Kind; import org.apache.beam.runners.dataflow.worker.windmill.Windmill; import org.apache.beam.runners.dataflow.worker.windmill.Windmill.SortedListEntry; import org.apache.beam.runners.dataflow.worker.windmill.Windmill.SortedListRange; import org.apache.beam.runners.dataflow.worker.windmill.Windmill.TagBag; import org.apache.beam.runners.dataflow.worker.windmill.Windmill.TagSortedListFetchRequest; import org.apache.beam.runners.dataflow.worker.windmill.Windmill.TagValue; +import org.apache.beam.runners.dataflow.worker.windmill.Windmill.TagValuePrefixRequest; import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.Coder.Context; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.util.Weighted; import org.apache.beam.sdk.values.TimestampedValue; -import 
org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Function; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; @@ -78,7 +83,9 @@ class WindmillStateReader { * Ideal maximum bytes in a TagBag response. However, Windmill will always return at least one * value if possible irrespective of this limit. */ - public static final long MAX_BAG_BYTES = 8L << 20; // 8MB + public static final long INITIAL_MAX_BAG_BYTES = 8L << 20; // 8MB + + public static final long CONTINUATION_MAX_BAG_BYTES = 32L << 20; // 32MB /** * Ideal maximum bytes in a TagSortedList response. However, Windmill will always return at least @@ -86,12 +93,20 @@ class WindmillStateReader { */ public static final long MAX_ORDERED_LIST_BYTES = 8L << 20; // 8MB + /** + * Ideal maximum bytes in a tag-value prefix response. However, Windmill will always return at + * least one value if possible irrespective of this limit. + */ + public static final long MAX_TAG_VALUE_PREFIX_BYTES = 8L << 20; // 8MB + /** * Ideal maximum bytes in a KeyedGetDataResponse. However, Windmill will always return at least * one value if possible irrespective of this limit. */ public static final long MAX_KEY_BYTES = 16L << 20; // 16MB + public static final long MAX_CONTINUATION_KEY_BYTES = 72L << 20; // 72MB + /** * When combined with a key and computationId, represents the unique address for state managed by * Windmill. @@ -102,7 +117,8 @@ enum Kind { VALUE, BAG, WATERMARK, - ORDERED_LIST + ORDERED_LIST, + VALUE_PREFIX } abstract Kind getKind(); @@ -112,9 +128,9 @@ enum Kind { abstract String getStateFamily(); /** - * For {@link Kind#BAG, Kind#ORDERED_LIST} kinds: A previous 'continuation_position' returned by - * Windmill to signal the resulting bag was incomplete. Sending that position will request the - * next page of values. Null for first request. + * For {@link Kind#BAG, Kind#ORDERED_LIST, Kind#VALUE_PREFIX} kinds: A previous + * 'continuation_position' returned by Windmill to signal the resulting bag was incomplete. + * Sending that position will request the next page of values. Null for first request. * *

    Null for other kinds. */ @@ -197,11 +213,11 @@ public WindmillStateReader( this.workToken = workToken; } - private static final class CoderAndFuture { - private Coder coder; + private static final class CoderAndFuture { + private Coder coder = null; private final SettableFuture future; - private CoderAndFuture(Coder coder, SettableFuture future) { + private CoderAndFuture(Coder coder, SettableFuture future) { this.coder = coder; this.future = future; } @@ -217,11 +233,14 @@ private SettableFuture getNonDoneFuture(StateTag stateTag) { return future; } - private Coder getAndClearCoder() { + private Coder getAndClearCoder() { if (coder == null) { throw new IllegalStateException("Coder has already been cleared from cache"); } - Coder result = coder; + Coder result = (Coder) coder; + if (result == null) { + throw new IllegalStateException("Coder has already been cleared from cache"); + } coder = null; return result; } @@ -236,13 +255,11 @@ private void checkNoCoder() { @VisibleForTesting ConcurrentLinkedQueue> pendingLookups = new ConcurrentLinkedQueue<>(); - private ConcurrentHashMap, CoderAndFuture> waiting = new ConcurrentHashMap<>(); + private ConcurrentHashMap, CoderAndFuture> waiting = new ConcurrentHashMap<>(); - private Future stateFuture( - StateTag stateTag, @Nullable Coder coder) { - CoderAndFuture coderAndFuture = - new CoderAndFuture<>(coder, SettableFuture.create()); - CoderAndFuture existingCoderAndFutureWildcard = + private Future stateFuture(StateTag stateTag, @Nullable Coder coder) { + CoderAndFuture coderAndFuture = new CoderAndFuture<>(coder, SettableFuture.create()); + CoderAndFuture existingCoderAndFutureWildcard = waiting.putIfAbsent(stateTag, coderAndFuture); if (existingCoderAndFutureWildcard == null) { // Schedule a new request. It's response is guaranteed to find the future and coder. @@ -250,17 +267,16 @@ private Future stateFuture( } else { // Piggy-back on the pending or already answered request. @SuppressWarnings("unchecked") - CoderAndFuture existingCoderAndFuture = - (CoderAndFuture) existingCoderAndFutureWildcard; + CoderAndFuture existingCoderAndFuture = + (CoderAndFuture) existingCoderAndFutureWildcard; coderAndFuture = existingCoderAndFuture; } return wrappedFuture(coderAndFuture.getFuture()); } - private CoderAndFuture getWaiting( - StateTag stateTag, boolean shouldRemove) { - CoderAndFuture coderAndFutureWildcard; + private CoderAndFuture getWaiting(StateTag stateTag, boolean shouldRemove) { + CoderAndFuture coderAndFutureWildcard; if (shouldRemove) { coderAndFutureWildcard = waiting.remove(stateTag); } else { @@ -270,8 +286,7 @@ private CoderAndFuture getWaiting( throw new IllegalStateException("Missing future for " + stateTag); } @SuppressWarnings("unchecked") - CoderAndFuture coderAndFuture = - (CoderAndFuture) coderAndFutureWildcard; + CoderAndFuture coderAndFuture = (CoderAndFuture) coderAndFutureWildcard; return coderAndFuture; } @@ -303,18 +318,27 @@ public Future>> orderedListFuture( valuesToPagingIterableFuture(stateTag, elemCoder, this.stateFuture(stateTag, elemCoder))); } + public Future>> valuePrefixFuture( + ByteString prefix, String stateFamily, Coder valueCoder) { + // First request has no continuation position. + StateTag stateTag = + StateTag.of(Kind.VALUE_PREFIX, prefix, stateFamily).toBuilder().build(); + return Preconditions.checkNotNull( + valuesToPagingIterableFuture(stateTag, valueCoder, this.stateFuture(stateTag, valueCoder))); + } + /** * Internal request to fetch the next 'page' of values. 
Return null if no continuation position is * in {@code contStateTag}, which signals there are no more pages. */ - private @Nullable + private @Nullable Future> continuationFuture( - StateTag contStateTag, Coder elemCoder) { + StateTag contStateTag, Coder coder) { if (contStateTag.getRequestPosition() == null) { // We're done. return null; } - return stateFuture(contStateTag, elemCoder); + return stateFuture(contStateTag, coder); } /** @@ -367,7 +391,7 @@ private Future wrappedFuture(final Future future) { } /** Function to extract an {@link Iterable} from the continuation-supporting page read future. */ - private static class ToIterableFunction + private static class ToIterableFunction implements Function, Iterable> { /** * Reader to request continuation pages from, or {@literal null} if no continuation pages @@ -376,13 +400,13 @@ private static class ToIterableFunction private @Nullable WindmillStateReader reader; private final StateTag stateTag; - private final Coder elemCoder; + private final Coder coder; public ToIterableFunction( - WindmillStateReader reader, StateTag stateTag, Coder elemCoder) { + WindmillStateReader reader, StateTag stateTag, Coder coder) { this.reader = reader; this.stateTag = stateTag; - this.elemCoder = elemCoder; + this.coder = coder; } @SuppressFBWarnings( @@ -407,8 +431,8 @@ public Iterable apply( contStateTag = contStateTag.toBuilder().setSortedListRange(stateTag.getSortedListRange()).build(); } - return new PagingIterable( - reader, valuesAndContPosition.values, contStateTag, elemCoder); + return new PagingIterable( + reader, valuesAndContPosition.values, contStateTag, coder); } } } @@ -417,12 +441,12 @@ public Iterable apply( * Return future which transforms a {@code ValuesAndContPosition} result into the initial * Iterable result expected from the external caller. */ - private Future> valuesToPagingIterableFuture( + private Future> valuesToPagingIterableFuture( final StateTag stateTag, - final Coder elemCoder, + final Coder coder, final Future> future) { Function, Iterable> toIterable = - new ToIterableFunction<>(this, stateTag, elemCoder); + new ToIterableFunction<>(this, stateTag, coder); return Futures.lazyTransform(future, toIterable); } @@ -466,6 +490,7 @@ private Windmill.KeyedGetDataRequest createRequest(Iterable> toFetch .setShardingKey(shardingKey) .setWorkToken(workToken); + boolean continuation = false; List> orderedListsToFetch = Lists.newArrayList(); for (StateTag stateTag : toFetch) { switch (stateTag.getKind()) { @@ -474,11 +499,14 @@ private Windmill.KeyedGetDataRequest createRequest(Iterable> toFetch keyedDataBuilder .addBagsToFetchBuilder() .setTag(stateTag.getTag()) - .setStateFamily(stateTag.getStateFamily()) - .setFetchMaxBytes(MAX_BAG_BYTES); - if (stateTag.getRequestPosition() != null) { + .setStateFamily(stateTag.getStateFamily()); + if (stateTag.getRequestPosition() == null) { + bag.setFetchMaxBytes(INITIAL_MAX_BAG_BYTES); + } else { // We're asking for the next page. 
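The fetch-size choice is simply keyed on whether a continuation position is present; schematically, using the constants defined earlier in this file:

```java
/** Toy model of sizing a fetch: first pages use a small budget, continuation pages a larger one. */
final class FetchBudgetSketch {
  static final long INITIAL_MAX_BAG_BYTES = 8L << 20; // 8 MB
  static final long CONTINUATION_MAX_BAG_BYTES = 32L << 20; // 32 MB

  /** Returns the per-bag byte budget given an optional continuation position. */
  static long bagFetchBytes(Long continuationPosition) {
    return continuationPosition == null ? INITIAL_MAX_BAG_BYTES : CONTINUATION_MAX_BAG_BYTES;
  }
}
```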
+ bag.setFetchMaxBytes(CONTINUATION_MAX_BAG_BYTES); bag.setRequestPosition((Long) stateTag.getRequestPosition()); + continuation = true; } break; @@ -500,6 +528,18 @@ private Windmill.KeyedGetDataRequest createRequest(Iterable> toFetch .setStateFamily(stateTag.getStateFamily()); break; + case VALUE_PREFIX: + TagValuePrefixRequest.Builder prefixFetchBuilder = + keyedDataBuilder + .addTagValuePrefixesToFetchBuilder() + .setTagPrefix(stateTag.getTag()) + .setStateFamily(stateTag.getStateFamily()) + .setFetchMaxBytes(MAX_TAG_VALUE_PREFIX_BYTES); + if (stateTag.getRequestPosition() != null) { + prefixFetchBuilder.setRequestPosition((ByteString) stateTag.getRequestPosition()); + } + break; + default: throw new RuntimeException("Unknown kind of tag requested: " + stateTag.getKind()); } @@ -525,7 +565,11 @@ private Windmill.KeyedGetDataRequest createRequest(Iterable> toFetch sorted_list.setRequestPosition((ByteString) stateTag.getRequestPosition()); } } - keyedDataBuilder.setMaxBytes(MAX_KEY_BYTES); + if (continuation) { + keyedDataBuilder.setMaxBytes(MAX_CONTINUATION_KEY_BYTES); + } else { + keyedDataBuilder.setMaxBytes(MAX_KEY_BYTES); + } return keyedDataBuilder.build(); } @@ -583,6 +627,19 @@ private void consumeResponse( } consumeTagValue(value, stateTag); } + for (Windmill.TagValuePrefixResponse prefix_response : response.getTagValuePrefixesList()) { + StateTag stateTag = + StateTag.of( + Kind.VALUE_PREFIX, + prefix_response.getTagPrefix(), + prefix_response.getStateFamily(), + prefix_response.hasRequestPosition() ? prefix_response.getRequestPosition() : null); + if (!toFetch.remove(stateTag)) { + throw new IllegalStateException( + "Received response for unrequested tag " + stateTag + ". Pending tags: " + toFetch); + } + consumeTagPrefixResponse(prefix_response, stateTag); + } for (Windmill.TagSortedListFetchResponse sorted_list : response.getTagSortedListsList()) { SortedListRange sortedListRange = Iterables.getOnlyElement(sorted_list.getFetchRangesList()); Range range = Range.closedOpen(sortedListRange.getStart(), sortedListRange.getLimit()); @@ -680,6 +737,28 @@ private List> sortedListPageValues( return entryList; } + private List> tagPrefixPageTagValues( + Windmill.TagValuePrefixResponse tagValuePrefixResponse, Coder valueCoder) { + if (tagValuePrefixResponse.getTagValuesCount() == 0) { + return new WeightedList<>(Collections.emptyList()); + } + + WeightedList> entryList = + new WeightedList>( + new ArrayList<>(tagValuePrefixResponse.getTagValuesCount())); + for (TagValue entry : tagValuePrefixResponse.getTagValuesList()) { + try { + V value = valueCoder.decode(entry.getValue().getData().newInput(), Context.OUTER); + entryList.addWeighted( + new AbstractMap.SimpleEntry<>(entry.getTag(), value), + entry.getTag().size() + entry.getValue().getData().size()); + } catch (IOException e) { + throw new IllegalStateException("Unable to decode tag value " + e); + } + } + return entryList; + } + private void consumeBag(TagBag bag, StateTag stateTag) { boolean shouldRemove; if (stateTag.getRequestPosition() == null) { @@ -693,11 +772,11 @@ private void consumeBag(TagBag bag, StateTag stateTag) { // continuation positions. 
shouldRemove = true; } - CoderAndFuture> coderAndFuture = + CoderAndFuture> coderAndFuture = getWaiting(stateTag, shouldRemove); SettableFuture> future = coderAndFuture.getNonDoneFuture(stateTag); - Coder coder = coderAndFuture.getAndClearCoder(); + Coder coder = coderAndFuture.getAndClearCoder(); List values = this.bagPageValues(bag, coder); future.set( new ValuesAndContPosition<>( @@ -705,7 +784,7 @@ private void consumeBag(TagBag bag, StateTag stateTag) { } private void consumeWatermark(Windmill.WatermarkHold watermarkHold, StateTag stateTag) { - CoderAndFuture coderAndFuture = getWaiting(stateTag, false); + CoderAndFuture coderAndFuture = getWaiting(stateTag, false); SettableFuture future = coderAndFuture.getNonDoneFuture(stateTag); // No coders for watermarks coderAndFuture.checkNoCoder(); @@ -725,7 +804,7 @@ private void consumeWatermark(Windmill.WatermarkHold watermarkHold, StateTag void consumeTagValue(TagValue tagValue, StateTag stateTag) { - CoderAndFuture coderAndFuture = getWaiting(stateTag, false); + CoderAndFuture coderAndFuture = getWaiting(stateTag, false); SettableFuture future = coderAndFuture.getNonDoneFuture(stateTag); Coder coder = coderAndFuture.getAndClearCoder(); @@ -744,6 +823,36 @@ private void consumeTagValue(TagValue tagValue, StateTag stateTag) { } } + private void consumeTagPrefixResponse( + Windmill.TagValuePrefixResponse tagValuePrefixResponse, StateTag stateTag) { + boolean shouldRemove; + if (stateTag.getRequestPosition() == null) { + // This is the response for the first page.// Leave the future in the cache so subsequent + // requests for the first page + // can return immediately. + shouldRemove = false; + } else { + // This is a response for a subsequent page. + // Don't cache the future since we may need to make multiple requests with different + // continuation positions. + shouldRemove = true; + } + + CoderAndFuture, ByteString>> coderAndFuture = + getWaiting(stateTag, shouldRemove); + SettableFuture, ByteString>> future = + coderAndFuture.getNonDoneFuture(stateTag); + Coder valueCoder = coderAndFuture.getAndClearCoder(); + List> values = + this.tagPrefixPageTagValues(tagValuePrefixResponse, valueCoder); + future.set( + new ValuesAndContPosition<>( + values, + tagValuePrefixResponse.hasContinuationPosition() + ? tagValuePrefixResponse.getContinuationPosition() + : null)); + } + private void consumeSortedList( Windmill.TagSortedListFetchResponse sortedListFetchResponse, StateTag stateTag) { boolean shouldRemove; @@ -759,7 +868,7 @@ private void consumeSortedList( shouldRemove = true; } - CoderAndFuture, ByteString>> coderAndFuture = + CoderAndFuture, ByteString>> coderAndFuture = getWaiting(stateTag, shouldRemove); SettableFuture, ByteString>> future = coderAndFuture.getNonDoneFuture(stateTag); @@ -772,7 +881,6 @@ private void consumeSortedList( ? sortedListFetchResponse.getContinuationPosition() : null)); } - /** * An iterable over elements backed by paginated GetData requests to Windmill. The iterable may be * iterated over an arbitrary number of times and multiple iterators may be active simultaneously. @@ -789,7 +897,7 @@ private void consumeSortedList( * call to iterator. * */ - private static class PagingIterable implements Iterable { + private static class PagingIterable implements Iterable { /** * The reader we will use for scheduling continuation pages. * @@ -804,17 +912,17 @@ private static class PagingIterable implements It private final StateTag secondPagePos; /** Coder for elements. 
*/ - private final Coder elemCoder; + private final Coder coder; private PagingIterable( WindmillStateReader reader, List firstPage, StateTag secondPagePos, - Coder elemCoder) { + Coder coder) { this.reader = reader; this.firstPage = firstPage; this.secondPagePos = secondPagePos; - this.elemCoder = elemCoder; + this.coder = coder; } @Override @@ -824,7 +932,7 @@ public Iterator iterator() { private StateTag nextPagePos = secondPagePos; private Future> pendingNextPage = // NOTE: The results of continuation page reads are never cached. - reader.continuationFuture(nextPagePos, elemCoder); + reader.continuationFuture(nextPagePos, coder); @Override protected ResultT computeNext() { @@ -854,7 +962,7 @@ protected ResultT computeNext() { valuesAndContPosition.continuationPosition); pendingNextPage = // NOTE: The results of continuation page reads are never cached. - reader.continuationFuture(nextPagePos, elemCoder); + reader.continuationFuture(nextPagePos, coder); } } }; diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillTimerInternals.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillTimerInternals.java index a102e178aea5..6ad3b64c9118 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillTimerInternals.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillTimerInternals.java @@ -22,6 +22,7 @@ import java.io.IOException; import java.nio.charset.StandardCharsets; +import java.util.function.Consumer; import org.apache.beam.runners.core.StateNamespace; import org.apache.beam.runners.core.StateNamespaces; import org.apache.beam.runners.core.TimerInternals; @@ -33,7 +34,7 @@ import org.apache.beam.sdk.util.ExposedByteArrayInputStream; import org.apache.beam.sdk.util.ExposedByteArrayOutputStream; import org.apache.beam.sdk.util.VarInt; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.HashBasedTable; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Table; @@ -68,6 +69,7 @@ class WindmillTimerInternals implements TimerInternals { private @Nullable Instant synchronizedProcessingTime; private String stateFamily; private WindmillNamespacePrefix prefix; + private Consumer onTimerModified; public WindmillTimerInternals( String stateFamily, // unique identifies a step @@ -75,13 +77,15 @@ public WindmillTimerInternals( Instant inputDataWatermark, Instant processingTime, @Nullable Instant outputDataWatermark, - @Nullable Instant synchronizedProcessingTime) { + @Nullable Instant synchronizedProcessingTime, + Consumer onTimerModified) { this.inputDataWatermark = checkNotNull(inputDataWatermark); this.processingTime = checkNotNull(processingTime); this.outputDataWatermark = outputDataWatermark; this.synchronizedProcessingTime = synchronizedProcessingTime; this.stateFamily = stateFamily; this.prefix = prefix; + this.onTimerModified = onTimerModified; } public WindmillTimerInternals withPrefix(WindmillNamespacePrefix prefix) { @@ -91,19 +95,16 @@ public WindmillTimerInternals withPrefix(WindmillNamespacePrefix prefix) { inputDataWatermark, processingTime, outputDataWatermark, - 
synchronizedProcessingTime); + synchronizedProcessingTime, + onTimerModified); } @Override public void setTimer(TimerData timerKey) { - timers.put( - getTimerDataKey(timerKey.getTimerId(), timerKey.getTimerFamilyId()), - timerKey.getNamespace(), - timerKey); - timerStillPresent.put( - getTimerDataKey(timerKey.getTimerId(), timerKey.getTimerFamilyId()), - timerKey.getNamespace(), - true); + String timerDataKey = getTimerDataKey(timerKey.getTimerId(), timerKey.getTimerFamilyId()); + timers.put(timerDataKey, timerKey.getNamespace(), timerKey); + timerStillPresent.put(timerDataKey, timerKey.getNamespace(), true); + onTimerModified.accept(timerKey); } @Override @@ -114,28 +115,26 @@ public void setTimer( Instant timestamp, Instant outputTimestamp, TimeDomain timeDomain) { - timers.put( - getTimerDataKey(timerId, timerFamilyId), - namespace, - TimerData.of(timerId, timerFamilyId, namespace, timestamp, outputTimestamp, timeDomain)); - timerStillPresent.put(getTimerDataKey(timerId, timerFamilyId), namespace, true); + TimerData timer = + TimerData.of(timerId, timerFamilyId, namespace, timestamp, outputTimestamp, timeDomain); + setTimer(timer); + } + + public static String getTimerDataKey(TimerData timerData) { + return getTimerDataKey(timerData.getTimerId(), timerData.getTimerFamilyId()); } - private String getTimerDataKey(String timerId, String timerFamilyId) { + private static String getTimerDataKey(String timerId, String timerFamilyId) { // Identifies timer uniquely with timerFamilyId return timerId + '+' + timerFamilyId; } @Override public void deleteTimer(TimerData timerKey) { - timers.put( - getTimerDataKey(timerKey.getTimerId(), timerKey.getTimerFamilyId()), - timerKey.getNamespace(), - timerKey); - timerStillPresent.put( - getTimerDataKey(timerKey.getTimerId(), timerKey.getTimerFamilyId()), - timerKey.getNamespace(), - false); + String timerDataKey = getTimerDataKey(timerKey.getTimerId(), timerKey.getTimerFamilyId()); + timers.put(timerDataKey, timerKey.getNamespace(), timerKey); + timerStillPresent.put(timerDataKey, timerKey.getNamespace(), false); + onTimerModified.accept(timerKey.deleted()); } @Override @@ -144,8 +143,16 @@ public void deleteTimer(StateNamespace namespace, String timerId, String timerFa } @Override - public void deleteTimer(StateNamespace namespace, String timerId, TimeDomain timeDomain) { - throw new UnsupportedOperationException("Deletion of timers by ID is not supported."); + public void deleteTimer( + StateNamespace namespace, String timerId, String timerFamilyId, TimeDomain timeDomain) { + deleteTimer( + TimerData.of( + timerId, + timerFamilyId, + namespace, + BoundedWindow.TIMESTAMP_MIN_VALUE, + BoundedWindow.TIMESTAMP_MAX_VALUE, + timeDomain)); } @Override diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WorkerCustomSources.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WorkerCustomSources.java index 4039996b73da..0c4a50859b2d 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WorkerCustomSources.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WorkerCustomSources.java @@ -60,7 +60,7 @@ import org.apache.beam.sdk.util.FluentBackoff; import org.apache.beam.sdk.util.WindowedValue; import org.apache.beam.sdk.values.ValueWithRecordId; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import 
org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WorkerPipelineOptionsFactory.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WorkerPipelineOptionsFactory.java index 7f1e5ff5cfb3..13abdebd1a8f 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WorkerPipelineOptionsFactory.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WorkerPipelineOptionsFactory.java @@ -79,6 +79,9 @@ public static DataflowWorkerHarnessOptions createFromSystemProperties() throws I if (System.getProperties().containsKey("job_id")) { options.setJobId(System.getProperty("job_id")); } + if (System.getProperties().containsKey("worker_pool")) { + options.setWorkerPool(System.getProperty("worker_pool")); + } return options; } diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/BeamFnControlService.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/BeamFnControlService.java index ab0774714a56..2615112180b1 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/BeamFnControlService.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/BeamFnControlService.java @@ -26,11 +26,11 @@ import org.apache.beam.model.fnexecution.v1.BeamFnControlGrpc; import org.apache.beam.model.pipeline.v1.Endpoints; import org.apache.beam.runners.dataflow.worker.fn.grpc.BeamFnService; -import org.apache.beam.runners.fnexecution.HeaderAccessor; import org.apache.beam.runners.fnexecution.control.FnApiControlClient; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Status; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.StatusException; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.sdk.fn.server.HeaderAccessor; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Status; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.StatusException; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.slf4j.Logger; import org.slf4j.LoggerFactory; diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/RegisterAndProcessBundleOperation.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/RegisterAndProcessBundleOperation.java index b9a125aa088f..b49df39ca759 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/RegisterAndProcessBundleOperation.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/RegisterAndProcessBundleOperation.java @@ -73,8 +73,8 @@ import org.apache.beam.sdk.transforms.windowing.GlobalWindow; import org.apache.beam.sdk.util.MoreFutures; import org.apache.beam.sdk.values.PCollectionView; -import 
org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.TextFormat; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/data/BeamFnDataGrpcService.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/data/BeamFnDataGrpcService.java index 9abc49378a19..97773e12f420 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/data/BeamFnDataGrpcService.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/data/BeamFnDataGrpcService.java @@ -28,7 +28,6 @@ import org.apache.beam.model.fnexecution.v1.BeamFnDataGrpc; import org.apache.beam.model.pipeline.v1.Endpoints; import org.apache.beam.runners.dataflow.worker.fn.grpc.BeamFnService; -import org.apache.beam.runners.fnexecution.HeaderAccessor; import org.apache.beam.runners.fnexecution.data.GrpcDataService; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver; @@ -39,10 +38,11 @@ import org.apache.beam.sdk.fn.data.FnDataReceiver; import org.apache.beam.sdk.fn.data.InboundDataClient; import org.apache.beam.sdk.fn.data.LogicalEndpoint; +import org.apache.beam.sdk.fn.server.HeaderAccessor; import org.apache.beam.sdk.fn.stream.OutboundObserverFactory; import org.apache.beam.sdk.options.PipelineOptions; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -120,6 +120,12 @@ public void awaitCompletion() throws Exception { future.get().awaitCompletion(); } + @Override + @SuppressWarnings("FutureReturnValueIgnored") + public void runWhenComplete(Runnable completeRunnable) { + future.whenComplete((result, throwable) -> completeRunnable.run()); + } + @Override public boolean isDone() { if (!future.isDone()) { diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/logging/BeamFnLoggingService.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/logging/BeamFnLoggingService.java index c4793e5fd42f..9021b99f2a18 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/logging/BeamFnLoggingService.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/logging/BeamFnLoggingService.java @@ -27,9 +27,9 @@ import org.apache.beam.model.pipeline.v1.Endpoints; import org.apache.beam.runners.dataflow.worker.fn.grpc.BeamFnService; import org.apache.beam.runners.dataflow.worker.logging.DataflowWorkerLoggingMDC; -import 
org.apache.beam.runners.fnexecution.HeaderAccessor; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Server; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.sdk.fn.server.HeaderAccessor; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Server; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; import org.slf4j.Logger; import org.slf4j.LoggerFactory; diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/stream/ServerStreamObserverFactory.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/stream/ServerStreamObserverFactory.java index 1d82acb4df54..fff3b06c656b 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/stream/ServerStreamObserverFactory.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/stream/ServerStreamObserverFactory.java @@ -27,9 +27,9 @@ import org.apache.beam.sdk.fn.stream.ForwardingClientResponseObserver; import org.apache.beam.sdk.fn.stream.OutboundObserverFactory; import org.apache.beam.sdk.options.PipelineOptions; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.CallStreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.ServerCallStreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.CallStreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.ServerCallStreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; /** * A {@link StreamObserver} factory that wraps provided {@link CallStreamObserver}s making them flow diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/graph/CreateExecutableStageNodeFunction.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/graph/CreateExecutableStageNodeFunction.java index d106b7591f34..7b3dbdab0328 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/graph/CreateExecutableStageNodeFunction.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/graph/CreateExecutableStageNodeFunction.java @@ -83,8 +83,8 @@ import org.apache.beam.sdk.util.WindowedValue.WindowedValueCoder; import org.apache.beam.sdk.values.PCollectionView; import org.apache.beam.sdk.values.WindowingStrategy; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/graph/InsertFetchAndFilterStreamingSideInputNodes.java 
b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/graph/InsertFetchAndFilterStreamingSideInputNodes.java index 8bc86dd3300c..ca6d0be23c4e 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/graph/InsertFetchAndFilterStreamingSideInputNodes.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/graph/InsertFetchAndFilterStreamingSideInputNodes.java @@ -36,7 +36,7 @@ import org.apache.beam.sdk.transforms.ParDo; import org.apache.beam.sdk.values.PCollectionView; import org.apache.beam.sdk.values.WindowingStrategy; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/graph/RegisterNodeFunction.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/graph/RegisterNodeFunction.java index 689703d75cd3..9bed9c49eaf3 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/graph/RegisterNodeFunction.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/graph/RegisterNodeFunction.java @@ -77,8 +77,8 @@ import org.apache.beam.sdk.values.PCollectionView; import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.sdk.values.WindowingStrategy; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingInitializer.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingInitializer.java index bd5331a86b74..f7b1a08be8c7 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingInitializer.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingInitializer.java @@ -27,6 +27,7 @@ import java.io.File; import java.io.IOException; import java.io.PrintStream; +import java.nio.charset.Charset; import java.util.List; import java.util.Map; import java.util.logging.ErrorManager; @@ -49,7 +50,8 @@ * directory and the default file size is 1 GB. 
*/ @SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) + "nullness", // TODO(https://issues.apache.org/jira/browse/BEAM-10402) + "ForbidDefaultCharset" }) public class DataflowWorkerLoggingInitializer { private static final String ROOT_LOGGER_NAME = ""; @@ -167,10 +169,10 @@ public static synchronized void initialize() { originalStdErr = System.err; System.setOut( JulHandlerPrintStreamAdapterFactory.create( - loggingHandler, SYSTEM_OUT_LOG_NAME, Level.INFO)); + loggingHandler, SYSTEM_OUT_LOG_NAME, Level.INFO, Charset.defaultCharset())); System.setErr( JulHandlerPrintStreamAdapterFactory.create( - loggingHandler, SYSTEM_ERR_LOG_NAME, Level.SEVERE)); + loggingHandler, SYSTEM_ERR_LOG_NAME, Level.SEVERE, Charset.defaultCharset())); // Initialize the SDK Logging Handler, which will only be used for the LoggingService sdkLoggingHandler = makeLoggingHandler(SDK_FILEPATH_PROPERTY, DEFAULT_SDK_LOGGING_LOCATION); @@ -208,7 +210,8 @@ public static synchronized void configure(DataflowWorkerLoggingOptions options) JulHandlerPrintStreamAdapterFactory.create( loggingHandler, SYSTEM_OUT_LOG_NAME, - getJulLevel(options.getWorkerSystemOutMessageLevel()))); + getJulLevel(options.getWorkerSystemOutMessageLevel()), + Charset.defaultCharset())); } if (options.getWorkerSystemErrMessageLevel() != null) { @@ -217,7 +220,8 @@ public static synchronized void configure(DataflowWorkerLoggingOptions options) JulHandlerPrintStreamAdapterFactory.create( loggingHandler, SYSTEM_ERR_LOG_NAME, - getJulLevel(options.getWorkerSystemErrMessageLevel()))); + getJulLevel(options.getWorkerSystemErrMessageLevel()), + Charset.defaultCharset())); } } diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/logging/JulHandlerPrintStreamAdapterFactory.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/logging/JulHandlerPrintStreamAdapterFactory.java index bcdd8c1d0527..caf747084390 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/logging/JulHandlerPrintStreamAdapterFactory.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/logging/JulHandlerPrintStreamAdapterFactory.java @@ -36,7 +36,6 @@ import java.util.logging.LogRecord; import java.util.logging.Logger; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets; /** * A {@link PrintStream} factory that creates {@link PrintStream}s which output to the specified JUL @@ -69,7 +68,8 @@ private static class JulHandlerPrintStream extends PrintStream { private int carryOverBytes; private byte[] carryOverByteArray; - private JulHandlerPrintStream(Handler handler, String loggerName, Level logLevel) + private JulHandlerPrintStream( + Handler handler, String loggerName, Level logLevel, Charset charset) throws UnsupportedEncodingException { super( new OutputStream() { @@ -79,14 +79,14 @@ public void write(int i) throws IOException { } }, false, - Charsets.UTF_8.name()); + charset.name()); this.handler = handler; this.loggerName = loggerName; this.messageLevel = logLevel; this.logger = Logger.getLogger(loggerName); this.buffer = new StringBuilder(); this.decoder = - Charset.defaultCharset() + charset .newDecoder() .onMalformedInput(CodingErrorAction.REPLACE) .onUnmappableCharacter(CodingErrorAction.REPLACE); @@ -402,11 +402,12 @@ 
private void publishIfNonEmpty(String message) { * Creates a {@link PrintStream} which redirects all output to the JUL {@link Handler} with the * specified {@code loggerName} and {@code level}. */ - static PrintStream create(Handler handler, String loggerName, Level messageLevel) { + static PrintStream create( + Handler handler, String loggerName, Level messageLevel, Charset charset) { try { - return new JulHandlerPrintStream(handler, loggerName, messageLevel); + return new JulHandlerPrintStream(handler, loggerName, messageLevel, charset); } catch (UnsupportedEncodingException exc) { - throw new RuntimeException("Encoding not supported: " + Charsets.UTF_8.name(), exc); + throw new RuntimeException("Encoding not supported: " + charset.name(), exc); } } diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/BatchGroupAlsoByWindowAndCombineFn.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/BatchGroupAlsoByWindowAndCombineFn.java index 37b11362e16c..12fd94e00f07 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/BatchGroupAlsoByWindowAndCombineFn.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/BatchGroupAlsoByWindowAndCombineFn.java @@ -134,11 +134,7 @@ public void merge(Collection toBeMerged, W mergeResult) throws Exception { Collection windows = (Collection) e.getWindows(); for (W window : windows) { Instant outputTime = - windowingStrategy - .getTimestampCombiner() - .assign( - window, - windowingStrategy.getWindowFn().getOutputTime(e.getTimestamp(), window)); + windowingStrategy.getTimestampCombiner().assign(window, e.getTimestamp()); Instant accumulatorOutputTime = accumulatorOutputTimestamps.get(window); if (accumulatorOutputTime == null) { accumulatorOutputTimestamps.put(window, outputTime); diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/BatchGroupAlsoByWindowViaIteratorsFn.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/BatchGroupAlsoByWindowViaIteratorsFn.java index 473a1813e0f7..9c16882f7c02 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/BatchGroupAlsoByWindowViaIteratorsFn.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/BatchGroupAlsoByWindowViaIteratorsFn.java @@ -58,7 +58,7 @@ class BatchGroupAlsoByWindowViaIteratorsFn private final WindowingStrategy strategy; public static boolean isSupported(WindowingStrategy strategy) { - if (!strategy.getWindowFn().isNonMerging()) { + if (strategy.needsMerge()) { return false; } @@ -127,11 +127,7 @@ public void processElement( windows.put(window.maxTimestamp(), window); output.outputWindowedValue( KV.of(key, (Iterable) new WindowReiterable(iterator, window)), - strategy - .getTimestampCombiner() - .assign( - typedWindow, - strategy.getWindowFn().getOutputTime(e.getTimestamp(), typedWindow)), + strategy.getTimestampCombiner().assign(typedWindow, e.getTimestamp()), Arrays.asList(window), PaneInfo.ON_TIME_AND_ONLY_FIRING); } diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/windmill/DirectStreamObserver.java 
b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/windmill/DirectStreamObserver.java index 7565ba260e55..abea5e0d31f7 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/windmill/DirectStreamObserver.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/windmill/DirectStreamObserver.java @@ -18,9 +18,14 @@ package org.apache.beam.runners.dataflow.worker.windmill; import java.util.concurrent.Phaser; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.TimeoutException; +import javax.annotation.concurrent.GuardedBy; import javax.annotation.concurrent.ThreadSafe; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.CallStreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.CallStreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; /** * A {@link StreamObserver} which uses synchronization on the underlying {@link CallStreamObserver} @@ -33,27 +38,70 @@ */ @ThreadSafe public final class DirectStreamObserver implements StreamObserver { + private static final Logger LOG = LoggerFactory.getLogger(DirectStreamObserver.class); private final Phaser phaser; + + @GuardedBy("outboundObserver") private final CallStreamObserver outboundObserver; - public DirectStreamObserver(Phaser phaser, CallStreamObserver outboundObserver) { + private final long deadlineSeconds; + + @GuardedBy("outboundObserver") + private boolean firstMessage = true; + + public DirectStreamObserver( + Phaser phaser, CallStreamObserver outboundObserver, long deadlineSeconds) { this.phaser = phaser; this.outboundObserver = outboundObserver; + this.deadlineSeconds = deadlineSeconds; } @Override public void onNext(T value) { - int phase = phaser.getPhase(); - if (!outboundObserver.isReady()) { + final int phase = phaser.getPhase(); + long totalSecondsWaited = 0; + long waitSeconds = 1; + while (true) { try { - phaser.awaitAdvanceInterruptibly(phase); + synchronized (outboundObserver) { + // We let the first message passthrough without blocking because it is performed under the + // StreamPool synchronized block and single message isn't going to cause memory issues due + // to excessive buffering within grpc. + if (firstMessage || outboundObserver.isReady()) { + firstMessage = false; + outboundObserver.onNext(value); + return; + } + } + // A callback has been registered to advance the phaser whenever the observer transitions to + // is ready. Since we are waiting for a phase observed before the outboundObserver.isReady() + // returned false, we expect it to advance after the channel has become ready. This doesn't + // always seem to be the case (despite documentation stating otherwise) so we poll + // periodically and enforce an overall timeout related to the stream deadline. 
+ phaser.awaitAdvanceInterruptibly(phase, waitSeconds, TimeUnit.SECONDS); + synchronized (outboundObserver) { + outboundObserver.onNext(value); + return; + } + } catch (TimeoutException e) { + totalSecondsWaited += waitSeconds; + if (totalSecondsWaited > deadlineSeconds) { + LOG.error( + "Exceeded timeout waiting for the outboundObserver to become ready meaning " + + "that the streamdeadline was not respected."); + throw new RuntimeException(e); + } + waitSeconds = waitSeconds * 2; } catch (InterruptedException e) { Thread.currentThread().interrupt(); throw new RuntimeException(e); } - } - synchronized (outboundObserver) { - outboundObserver.onNext(value); + if (totalSecondsWaited > 30) { + LOG.info( + "Output channel stalled for {}s, outbound thread {}.", + totalSecondsWaited, + Thread.currentThread().getName()); + } } } diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/windmill/ForwardingClientResponseObserver.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/windmill/ForwardingClientResponseObserver.java index d7eba1f28a5c..0446a30bc9e0 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/windmill/ForwardingClientResponseObserver.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/windmill/ForwardingClientResponseObserver.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.dataflow.worker.windmill; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.ClientCallStreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.ClientResponseObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.ClientCallStreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.ClientResponseObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; /** * A {@link ClientResponseObserver} which delegates all {@link StreamObserver} calls. 
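
Aside: the `DirectStreamObserver` hunk above replaces a single unbounded `awaitAdvanceInterruptibly` with a bounded, periodically retried wait. The sketch below is a minimal, self-contained illustration of that readiness-wait pattern, not Beam's actual class: it uses stock `io.grpc` types instead of Beam's vendored gRPC package, the class name `BackoffReadyObserver` is made up, and it omits the first-message passthrough and stall logging that the real change keeps.

```java
import java.util.concurrent.Phaser;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import io.grpc.stub.CallStreamObserver;
import io.grpc.stub.StreamObserver;

/** Illustrative sketch: block onNext() until the outbound channel is ready, with backoff and a deadline. */
public final class BackoffReadyObserver<T> implements StreamObserver<T> {
  private final Phaser phaser;                  // advanced by an onReady handler registered elsewhere
  private final CallStreamObserver<T> outbound; // flow-control-aware gRPC observer
  private final long deadlineSeconds;           // overall bound on how long onNext may wait

  public BackoffReadyObserver(Phaser phaser, CallStreamObserver<T> outbound, long deadlineSeconds) {
    this.phaser = phaser;
    this.outbound = outbound;
    this.deadlineSeconds = deadlineSeconds;
  }

  @Override
  public void onNext(T value) {
    long waited = 0;
    long waitSeconds = 1;
    while (true) {
      int phase = phaser.getPhase();
      synchronized (outbound) {
        if (outbound.isReady()) { // fast path: gRPC can accept the message right now
          outbound.onNext(value);
          return;
        }
      }
      try {
        // Wait for the onReady callback to advance the phaser, but only for a bounded interval;
        // re-check afterwards in case the readiness notification was missed.
        phaser.awaitAdvanceInterruptibly(phase, waitSeconds, TimeUnit.SECONDS);
        synchronized (outbound) {
          outbound.onNext(value);
          return;
        }
      } catch (TimeoutException e) {
        waited += waitSeconds;
        if (waited > deadlineSeconds) {
          throw new RuntimeException("Outbound stream not ready within " + deadlineSeconds + "s", e);
        }
        waitSeconds *= 2; // exponential backoff between readiness checks
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RuntimeException(e);
      }
    }
  }

  @Override
  public void onError(Throwable t) {
    synchronized (outbound) {
      outbound.onError(t);
    }
  }

  @Override
  public void onCompleted() {
    synchronized (outbound) {
      outbound.onCompleted();
    }
  }
}
```

The design choice mirrored here is that a missed phaser advance no longer hangs the sender forever; the bounded wait plus polling trades a little extra wake-up cost for an enforced relationship with the stream deadline.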
diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/windmill/GrpcWindmillServer.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/windmill/GrpcWindmillServer.java index 64517aa8ba92..3c06ee9e0323 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/windmill/GrpcWindmillServer.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/windmill/GrpcWindmillServer.java @@ -86,17 +86,17 @@ import org.apache.beam.sdk.util.BackOffUtils; import org.apache.beam.sdk.util.FluentBackoff; import org.apache.beam.sdk.util.Sleeper; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.CallCredentials; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Channel; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Status; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.StatusRuntimeException; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.auth.MoreCallCredentials; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.inprocess.InProcessChannelBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.netty.GrpcSslContexts; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.netty.NegotiationType; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.netty.NettyChannelBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.CallCredentials; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Channel; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Status; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.StatusRuntimeException; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.auth.MoreCallCredentials; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessChannelBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.netty.GrpcSslContexts; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.netty.NegotiationType; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.netty.NettyChannelBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Splitter; @@ -264,11 +264,11 @@ private synchronized void initializeLocalHost(int port) throws IOException { */ private static class VendoredRequestMetadataCallbackAdapter implements com.google.auth.RequestMetadataCallback { - private final org.apache.beam.vendor.grpc.v1p26p0.com.google.auth.RequestMetadataCallback + private final org.apache.beam.vendor.grpc.v1p36p0.com.google.auth.RequestMetadataCallback callback; private VendoredRequestMetadataCallbackAdapter( - org.apache.beam.vendor.grpc.v1p26p0.com.google.auth.RequestMetadataCallback callback) { + org.apache.beam.vendor.grpc.v1p36p0.com.google.auth.RequestMetadataCallback callback) { this.callback = callback; } @@ -292,7 +292,7 @@ public void onFailure(Throwable exception) { * delegate to reduce maintenance burden. 
*/ private static class VendoredCredentialsAdapter - extends org.apache.beam.vendor.grpc.v1p26p0.com.google.auth.Credentials { + extends org.apache.beam.vendor.grpc.v1p36p0.com.google.auth.Credentials { private final com.google.auth.Credentials credentials; private VendoredCredentialsAdapter(com.google.auth.Credentials credentials) { @@ -313,7 +313,7 @@ public Map> getRequestMetadata() throws IOException { public void getRequestMetadata( final URI uri, Executor executor, - final org.apache.beam.vendor.grpc.v1p26p0.com.google.auth.RequestMetadataCallback + final org.apache.beam.vendor.grpc.v1p36p0.com.google.auth.RequestMetadataCallback callback) { credentials.getRequestMetadata( uri, executor, new VendoredRequestMetadataCallbackAdapter(callback)); @@ -623,7 +623,8 @@ private static long uniqueId() { * require synchronizing on this. */ private abstract class AbstractWindmillStream implements WindmillStream { - private final StreamObserverFactory streamObserverFactory = StreamObserverFactory.direct(); + private final StreamObserverFactory streamObserverFactory = + StreamObserverFactory.direct(streamDeadlineSeconds * 2); private final Function, StreamObserver> clientFactory; private final Executor executor = Executors.newSingleThreadExecutor(); @@ -1524,8 +1525,8 @@ public boolean hasMoreElements() { try { blockedStartMs.set(Instant.now().getMillis()); - current = queue.take(); - if (current != POISON_PILL) { + current = queue.poll(180, TimeUnit.SECONDS); + if (current != null && current != POISON_PILL) { return true; } if (cancelled.get()) { @@ -1534,7 +1535,8 @@ public boolean hasMoreElements() { if (complete.get()) { return false; } - throw new IllegalStateException("Got poison pill but stream is not done."); + throw new IllegalStateException( + "Got poison pill or timeout but stream is not done."); } catch (InterruptedException e) { Thread.currentThread().interrupt(); throw new CancellationException(); diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/windmill/StreamObserverFactory.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/windmill/StreamObserverFactory.java index 021676694d0e..541a394b506f 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/windmill/StreamObserverFactory.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/windmill/StreamObserverFactory.java @@ -20,16 +20,16 @@ import java.util.function.Function; import org.apache.beam.sdk.fn.stream.AdvancingPhaser; import org.apache.beam.sdk.options.PipelineOptions; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.CallStreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.CallStreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; /** * Uses {@link PipelineOptions} to configure which underlying {@link StreamObserver} implementation * to use. 
*/ public abstract class StreamObserverFactory { - public static StreamObserverFactory direct() { - return new Direct(); + public static StreamObserverFactory direct(long deadlineSeconds) { + return new Direct(deadlineSeconds); } public abstract StreamObserver from( @@ -37,7 +37,11 @@ public abstract StreamObserver from( StreamObserver responseObserver); private static class Direct extends StreamObserverFactory { - Direct() {} + private final long deadlineSeconds; + + Direct(long deadlineSeconds) { + this.deadlineSeconds = deadlineSeconds; + } @Override public StreamObserver from( @@ -49,7 +53,7 @@ public StreamObserver from( clientFactory.apply( new ForwardingClientResponseObserver( inboundObserver, phaser::arrive, phaser::forceTermination)); - return new DirectStreamObserver<>(phaser, outboundObserver); + return new DirectStreamObserver<>(phaser, outboundObserver, deadlineSeconds); } } } diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/harness/test/TestExecutors.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/harness/test/TestExecutors.java index bae1b645c275..57c53a3d42e4 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/harness/test/TestExecutors.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/harness/test/TestExecutors.java @@ -29,9 +29,6 @@ * A {@link TestRule} that validates that all submitted tasks finished and were completed. This * allows for testing that tasks have exercised the appropriate shutdown logic. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TestExecutors { public static TestExecutorService from(Supplier executorServiceSuppler) { return new FromSupplier(executorServiceSuppler); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/harness/test/TestExecutorsTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/harness/test/TestExecutorsTest.java index b851715abf75..cd67332dc91d 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/harness/test/TestExecutorsTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/harness/test/TestExecutorsTest.java @@ -35,7 +35,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "FutureReturnValueIgnored", - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class TestExecutorsTest { @Test diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/harness/test/TestStreams.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/harness/test/TestStreams.java index e88054c327ae..8ac6ca8ce450 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/harness/test/TestStreams.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/harness/test/TestStreams.java @@ -19,8 +19,8 @@ import java.util.function.Consumer; import java.util.function.Supplier; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.CallStreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.CallStreamObserver; +import 
org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; /** Utility methods which enable testing of {@link StreamObserver}s. */ public class TestStreams { diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/harness/test/TestStreamsTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/harness/test/TestStreamsTest.java index 4c29447180dc..cecaa8f416ee 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/harness/test/TestStreamsTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/harness/test/TestStreamsTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.dataflow.harness.test; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.util.ArrayList; @@ -31,9 +31,6 @@ /** Tests for {@link TestStreams}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TestStreamsTest { @Test public void testOnNextIsCalled() { diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ApplianceShuffleCountersTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ApplianceShuffleCountersTest.java index 06e89ce067d7..41da44a28dc9 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ApplianceShuffleCountersTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ApplianceShuffleCountersTest.java @@ -36,9 +36,6 @@ /** Tests for ApplianceShuffleCounters. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ApplianceShuffleCountersTest { private static final String DATASET_ID = "dataset"; private CounterSet counterSet = new CounterSet(); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/AvroByteReaderFactoryTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/AvroByteReaderFactoryTest.java index 75cd856af346..b2ebc6db6786 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/AvroByteReaderFactoryTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/AvroByteReaderFactoryTest.java @@ -19,6 +19,7 @@ import static org.apache.beam.runners.dataflow.util.Structs.addLong; import static org.apache.beam.runners.dataflow.util.Structs.addString; +import static org.hamcrest.MatcherAssert.assertThat; import com.google.api.services.dataflow.model.Source; import org.apache.beam.runners.dataflow.util.CloudObject; @@ -40,7 +41,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class AvroByteReaderFactoryTest { private final String pathToAvroFile = "/path/to/file.avro"; @@ -79,7 +79,7 @@ public void testCreatePlainAvroByteReader() throws Exception { runTestCreateAvroReader( pathToAvroFile, null, null, CloudObjects.asCloudObject(coder, /*sdkComponents=*/ null)); - Assert.assertThat(reader, new IsInstanceOf(AvroByteReader.class)); + assertThat(reader, new IsInstanceOf(AvroByteReader.class)); AvroByteReader avroReader = (AvroByteReader) reader; Assert.assertEquals(pathToAvroFile, avroReader.avroSource.getFileOrPatternSpec()); Assert.assertEquals(0L, avroReader.startPosition); @@ -95,7 +95,7 @@ public void testCreateRichAvroByteReader() throws Exception { runTestCreateAvroReader( pathToAvroFile, 200L, 500L, CloudObjects.asCloudObject(coder, /*sdkComponents=*/ null)); - Assert.assertThat(reader, new IsInstanceOf(AvroByteReader.class)); + assertThat(reader, new IsInstanceOf(AvroByteReader.class)); AvroByteReader avroReader = (AvroByteReader) reader; Assert.assertEquals(pathToAvroFile, avroReader.avroSource.getFileOrPatternSpec()); Assert.assertEquals(200L, avroReader.startPosition); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/AvroByteReaderTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/AvroByteReaderTest.java index 61c219835851..eaeb0560b6e9 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/AvroByteReaderTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/AvroByteReaderTest.java @@ -57,7 +57,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class AvroByteReaderTest { @Rule public TemporaryFolder tmpFolder = new TemporaryFolder(); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/AvroByteSinkFactoryTest.java 
b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/AvroByteSinkFactoryTest.java index e0076538ec42..34f6f70e8201 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/AvroByteSinkFactoryTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/AvroByteSinkFactoryTest.java @@ -18,6 +18,7 @@ package org.apache.beam.runners.dataflow.worker; import static org.apache.beam.runners.dataflow.util.Structs.addString; +import static org.hamcrest.MatcherAssert.assertThat; import org.apache.beam.runners.dataflow.util.CloudObject; import org.apache.beam.runners.dataflow.worker.util.common.worker.Sink; @@ -60,7 +61,7 @@ public void testCreateAvroByteSink() throws Exception { WindowedValue.getFullCoder(BigEndianIntegerCoder.of(), GlobalWindow.Coder.INSTANCE); Sink sink = runTestCreateAvroSink(pathToAvroFile, coder); - Assert.assertThat(sink, new IsInstanceOf(AvroByteSink.class)); + assertThat(sink, new IsInstanceOf(AvroByteSink.class)); AvroByteSink avroSink = (AvroByteSink) sink; Assert.assertEquals(pathToAvroFile, avroSink.resourceId.toString()); Assert.assertEquals(coder, avroSink.coder); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/BatchDataflowWorkerTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/BatchDataflowWorkerTest.java index f54d4d766eff..6b55d313ddf4 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/BatchDataflowWorkerTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/BatchDataflowWorkerTest.java @@ -17,11 +17,11 @@ */ package org.apache.beam.runners.dataflow.worker; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.greaterThan; import static org.hamcrest.Matchers.notNullValue; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.mockito.Mockito.doThrow; import static org.mockito.Mockito.mock; @@ -59,9 +59,6 @@ /** Unit tests for {@link BatchDataflowWorker}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BatchDataflowWorkerTest { private static class WorkerException extends Exception {} diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/BatchModeExecutionContextTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/BatchModeExecutionContextTest.java index 0a377c381f1f..737136711c40 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/BatchModeExecutionContextTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/BatchModeExecutionContextTest.java @@ -19,11 +19,11 @@ import static org.apache.beam.runners.dataflow.worker.counters.DataflowCounterUpdateExtractor.longToSplitInt; import static org.apache.beam.runners.dataflow.worker.counters.DataflowCounterUpdateExtractor.splitIntToLong; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.emptyIterable; import static org.hamcrest.Matchers.hasItems; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import com.google.api.services.dataflow.model.CounterMetadata; import com.google.api.services.dataflow.model.CounterStructuredName; @@ -51,9 +51,6 @@ /** Tests for {@link BatchModeExecutionContext}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BatchModeExecutionContextTest { @Test diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ConcatReaderFactoryTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ConcatReaderFactoryTest.java index 1bfc2aa96075..867ff5140a8b 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ConcatReaderFactoryTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ConcatReaderFactoryTest.java @@ -21,9 +21,9 @@ import static org.apache.beam.runners.dataflow.util.Structs.addLong; import static org.apache.beam.runners.dataflow.util.Structs.addStringList; import static org.apache.beam.runners.dataflow.worker.ReaderUtils.readAllFromReader; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.junit.Assert.assertNotNull; -import static org.junit.Assert.assertThat; import com.google.api.services.dataflow.model.Source; import java.util.ArrayList; @@ -42,9 +42,6 @@ /** Test for {@code ConcatReaderFactory}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ConcatReaderFactoryTest { Source createSourcesWithInMemorySources(List> allData) { diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ConcatReaderTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ConcatReaderTest.java index 8b6e542b92f3..4a35a23ae820 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ConcatReaderTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ConcatReaderTest.java @@ -19,12 +19,12 @@ import static org.apache.beam.runners.dataflow.worker.ReaderUtils.readAllFromReader; import static org.apache.beam.runners.dataflow.worker.SourceTranslationUtils.readerProgressToCloudProgress; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertNotNull; import static org.junit.Assert.assertNull; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import com.google.api.services.dataflow.model.ApproximateReportedProgress; @@ -53,7 +53,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class ConcatReaderTest { private static final String READER_OBJECT = "reader_object"; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ContextActivationObserverRegistryTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ContextActivationObserverRegistryTest.java index 64c30f40f55c..40d17387bc86 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ContextActivationObserverRegistryTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ContextActivationObserverRegistryTest.java @@ -18,8 +18,8 @@ package org.apache.beam.runners.dataflow.worker; import static org.hamcrest.CoreMatchers.instanceOf; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.hasItem; -import static org.junit.Assert.assertThat; import com.google.auto.service.AutoService; import java.io.Closeable; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/CreateIsmShardKeyAndSortKeyDoFnFactoryTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/CreateIsmShardKeyAndSortKeyDoFnFactoryTest.java index 73b75132e50f..9798315609e8 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/CreateIsmShardKeyAndSortKeyDoFnFactoryTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/CreateIsmShardKeyAndSortKeyDoFnFactoryTest.java @@ -18,8 +18,8 @@ package org.apache.beam.runners.dataflow.worker; import static org.apache.beam.sdk.util.WindowedValue.valueInGlobalWindow; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; 
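[Editorial note, not part of the patch] The recurring edit across these worker test files replaces the deprecated `org.junit.Assert.assertThat` static import with Hamcrest's `org.hamcrest.MatcherAssert.assertThat`; the call sites keep the same shape. A minimal sketch of the pattern, using a hypothetical test class and values chosen only for illustration:

```java
import static org.hamcrest.MatcherAssert.assertThat; // was: import static org.junit.Assert.assertThat;
import static org.hamcrest.Matchers.equalTo;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.JUnit4;

@RunWith(JUnit4.class)
public class AssertThatMigrationExample { // hypothetical class, for illustration only
  @Test
  public void callSiteIsUnchanged() {
    // Only the origin of the static import changes; the assertion reads the same.
    assertThat(2 + 2, equalTo(4));
  }
}
```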
-import static org.junit.Assert.assertThat; import java.util.ArrayList; import java.util.List; @@ -39,9 +39,6 @@ /** Tests for {@link CreateIsmShardKeyAndSortKeyDoFnFactory}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class CreateIsmShardKeyAndSortKeyDoFnFactoryTest { @Test public void testConversionOfRecord() throws Exception { diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowBatchWorkerHarnessTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowBatchWorkerHarnessTest.java index 93798c60405d..34517435ad00 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowBatchWorkerHarnessTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowBatchWorkerHarnessTest.java @@ -46,9 +46,6 @@ /** Unit tests for {@link DataflowBatchWorkerHarness}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DataflowBatchWorkerHarnessTest { @Rule public TestRule restoreSystemProperties = new RestoreSystemProperties(); @Rule public TestRule restoreLogging = new RestoreDataflowLoggingMDC(); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowElementExecutionTrackerTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowElementExecutionTrackerTest.java index 477e5109cf74..bdb714dd53d8 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowElementExecutionTrackerTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowElementExecutionTrackerTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.runners.dataflow.worker; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.notNullValue; import static org.hamcrest.Matchers.nullValue; -import static org.junit.Assert.assertThat; import java.io.IOException; import java.util.List; @@ -43,9 +43,6 @@ /** Tests for {@link DataflowElementExecutionTracker}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DataflowElementExecutionTrackerTest { @Rule public final ExpectedException thrown = ExpectedException.none(); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowExecutionStateTrackerTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowExecutionStateTrackerTest.java index be9d71ac1e1f..0185ffda2819 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowExecutionStateTrackerTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowExecutionStateTrackerTest.java @@ -18,8 +18,8 @@ package org.apache.beam.runners.dataflow.worker; import static junit.framework.TestCase.assertNotNull; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.mock; import java.io.Closeable; @@ -44,9 +44,6 @@ import org.junit.Test; /** Tests for {@link DataflowExecutionStateTrackerTest}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DataflowExecutionStateTrackerTest { private PipelineOptions options; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowMatchers.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowMatchers.java index 4b69f070df84..ddb68c72b759 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowMatchers.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowMatchers.java @@ -18,7 +18,7 @@ package org.apache.beam.runners.dataflow.worker; import java.io.Serializable; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.hamcrest.Description; import org.hamcrest.TypeSafeMatcher; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowOperationContextTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowOperationContextTest.java index 0515555b0211..220ab81a113d 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowOperationContextTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowOperationContextTest.java @@ -68,9 +68,6 @@ /** Tests for {@link DataflowOperationContext}. 
*/ @RunWith(Enclosed.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DataflowOperationContextTest { /** diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowSideInputReadCounterTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowSideInputReadCounterTest.java index bcebf41967ae..d069c88fbded 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowSideInputReadCounterTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowSideInputReadCounterTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.dataflow.worker; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import static org.mockito.Mockito.any; import static org.mockito.Mockito.mock; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowWorkProgressUpdaterTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowWorkProgressUpdaterTest.java index 37a398309ce4..2ecfaebf80f3 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowWorkProgressUpdaterTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowWorkProgressUpdaterTest.java @@ -61,9 +61,6 @@ /** Unit tests for {@link DataflowWorkProgressUpdater}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DataflowWorkProgressUpdaterTest { private static final long LEASE_MS = 2000; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowWorkUnitClientTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowWorkUnitClientTest.java index f69c0a6dac35..b96ee2fb84cb 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowWorkUnitClientTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowWorkUnitClientTest.java @@ -59,9 +59,6 @@ /** Unit tests for {@link DataflowWorkUnitClient}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DataflowWorkUnitClientTest { private static final Logger LOG = LoggerFactory.getLogger(DataflowWorkUnitClientTest.class); @Rule public TestRule restoreSystemProperties = new RestoreSystemProperties(); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowWorkerHarnessHelperTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowWorkerHarnessHelperTest.java index ca424488a3d1..90095c0c966e 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowWorkerHarnessHelperTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowWorkerHarnessHelperTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.dataflow.worker; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.junit.Assert.assertNull; -import static org.junit.Assert.assertThat; import com.fasterxml.jackson.databind.ObjectMapper; import java.io.File; @@ -32,7 +32,7 @@ import org.apache.beam.runners.dataflow.worker.testing.RestoreDataflowLoggingMDC; import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.testing.RestoreSystemProperties; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.TextFormat; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat; import org.junit.Rule; import org.junit.Test; import org.junit.rules.TemporaryFolder; @@ -42,9 +42,6 @@ /** Unit tests for {@link DataflowWorkerHarnessHelper}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DataflowWorkerHarnessHelperTest { @Rule public TemporaryFolder tmpFolder = new TemporaryFolder(); @Rule public TestRule restoreSystemProperties = new RestoreSystemProperties(); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DefaultParDoFnFactoryTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DefaultParDoFnFactoryTest.java index e8a0d1d4310a..1e2eca664c1a 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DefaultParDoFnFactoryTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DefaultParDoFnFactoryTest.java @@ -18,9 +18,9 @@ package org.apache.beam.runners.dataflow.worker; import static org.apache.beam.runners.dataflow.util.Structs.addString; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.instanceOf; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import java.util.Collections; @@ -50,7 +50,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class DefaultParDoFnFactoryTest { diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DeltaCounterCellTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DeltaCounterCellTest.java index 244caaed7b4b..1fad038dd91b 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DeltaCounterCellTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DeltaCounterCellTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.dataflow.worker; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.util.ArrayList; import java.util.Collections; @@ -38,9 +38,6 @@ /** Tests for {@link DeltaCounterCell}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DeltaCounterCellTest { private DeltaCounterCell cell; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DoFnInstanceManagersTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DoFnInstanceManagersTest.java index 1679f35a124e..7e53b1550263 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DoFnInstanceManagersTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DoFnInstanceManagersTest.java @@ -17,14 +17,16 @@ */ package org.apache.beam.runners.dataflow.worker; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.is; import static org.hamcrest.Matchers.not; import static org.hamcrest.Matchers.theInstance; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import java.util.Collections; import org.apache.beam.runners.dataflow.util.PropertyNames; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.DoFnSchemaInformation; import org.apache.beam.sdk.util.DoFnInfo; @@ -39,9 +41,6 @@ /** Tests for {@link DoFnInstanceManagers}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DoFnInstanceManagersTest { @Rule public ExpectedException thrown = ExpectedException.none(); @@ -61,6 +60,7 @@ public void teardown() throws Exception { } private DoFn initialFn = new TestFn(); + private PipelineOptions options = PipelineOptionsFactory.create(); @Test public void testInstanceReturnsInstance() throws Exception { @@ -117,7 +117,7 @@ public void testCloningPoolReusesAfterComplete() throws Exception { DoFnSchemaInformation.create(), Collections.emptyMap()); - DoFnInstanceManager mgr = DoFnInstanceManagers.cloningPool(info); + DoFnInstanceManager mgr = DoFnInstanceManagers.cloningPool(info, options); DoFnInfo retrievedInfo = mgr.get(); assertThat(retrievedInfo, not(Matchers.>theInstance(info))); assertThat(retrievedInfo.getDoFn(), not(theInstance(info.getDoFn()))); @@ -140,7 +140,7 @@ public void testCloningPoolTearsDownAfterAbort() throws Exception { DoFnSchemaInformation.create(), Collections.emptyMap()); - DoFnInstanceManager mgr = DoFnInstanceManagers.cloningPool(info); + DoFnInstanceManager mgr = DoFnInstanceManagers.cloningPool(info, options); DoFnInfo retrievedInfo = mgr.get(); mgr.abort(retrievedInfo); @@ -165,7 +165,7 @@ public void testCloningPoolMultipleOutstanding() throws Exception { DoFnSchemaInformation.create(), Collections.emptyMap()); - DoFnInstanceManager mgr = DoFnInstanceManagers.cloningPool(info); + DoFnInstanceManager mgr = DoFnInstanceManagers.cloningPool(info, options); DoFnInfo firstInfo = mgr.get(); DoFnInfo secondInfo = mgr.get(); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/FakeWindmillServer.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/FakeWindmillServer.java index aec0793b59d0..c7c14de2c34c 100644 --- 
a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/FakeWindmillServer.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/FakeWindmillServer.java @@ -58,9 +58,6 @@ import org.slf4j.LoggerFactory; /** An in-memory Windmill server that offers provided work and data. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) class FakeWindmillServer extends WindmillServerStub { private static final Logger LOG = LoggerFactory.getLogger(FakeWindmillServer.class); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/FilepatternsTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/FilepatternsTest.java index cb841171b820..5e370b912e32 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/FilepatternsTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/FilepatternsTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.dataflow.worker; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.hasItems; -import static org.junit.Assert.assertThat; import org.hamcrest.Matchers; import org.junit.Rule; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/FnApiWindowMappingFnTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/FnApiWindowMappingFnTest.java index 95103d5b6108..22219c81aac3 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/FnApiWindowMappingFnTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/FnApiWindowMappingFnTest.java @@ -62,7 +62,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class FnApiWindowMappingFnTest { private static final ApiServiceDescriptor DATA_SERVICE = diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/GroupAlsoByWindowParDoFnFactoryTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/GroupAlsoByWindowParDoFnFactoryTest.java index d19126d858c9..787309575b86 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/GroupAlsoByWindowParDoFnFactoryTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/GroupAlsoByWindowParDoFnFactoryTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.runners.dataflow.worker; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.hasItem; import static org.hamcrest.Matchers.hasSize; -import static org.junit.Assert.assertThat; import java.util.List; import org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowParDoFnFactory.MergingCombineFn; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/GroupingShuffleReaderTest.java 
b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/GroupingShuffleReaderTest.java index f73f514e412f..55a6a95d2cf2 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/GroupingShuffleReaderTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/GroupingShuffleReaderTest.java @@ -92,7 +92,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class GroupingShuffleReaderTest { private static final byte[] EMPTY_BYTE_ARRAY = new byte[0]; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/HotKeyLoggerTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/HotKeyLoggerTest.java index 28fbd0cfdcb2..6dee1257b3eb 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/HotKeyLoggerTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/HotKeyLoggerTest.java @@ -34,9 +34,6 @@ @RunWith(PowerMockRunner.class) @PrepareForTest({HotKeyLoggerTest.class, LoggerFactory.class}) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class HotKeyLoggerTest { @Rule public ExpectedLogs expectedLogs = ExpectedLogs.none(HotKeyLogger.class); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/InMemoryReaderFactoryTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/InMemoryReaderFactoryTest.java index 3c2ab5d59952..e50caa1e0913 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/InMemoryReaderFactoryTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/InMemoryReaderFactoryTest.java @@ -19,6 +19,7 @@ import static org.apache.beam.runners.dataflow.util.Structs.addLong; import static org.apache.beam.runners.dataflow.util.Structs.addStringList; +import static org.hamcrest.MatcherAssert.assertThat; import com.google.api.services.dataflow.model.Source; import java.util.Arrays; @@ -39,9 +40,6 @@ /** Tests for InMemoryReaderFactory. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class InMemoryReaderFactoryTest { static Source createInMemoryCloudSource( List elements, Long start, Long end, Coder coder) throws Exception { @@ -76,7 +74,7 @@ void runTestCreateInMemoryReader( PipelineOptionsFactory.create(), BatchModeExecutionContext.forTesting(PipelineOptionsFactory.create(), "testStage"), TestOperationContext.create()); - Assert.assertThat(reader, new IsInstanceOf(InMemoryReader.class)); + assertThat(reader, new IsInstanceOf(InMemoryReader.class)); InMemoryReader inMemoryReader = (InMemoryReader) reader; Assert.assertEquals( InMemoryReaderTest.encodedElements(elements, coder), inMemoryReader.encodedElements); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IntrinsicMapTaskExecutorFactoryTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IntrinsicMapTaskExecutorFactoryTest.java index a88fc2ff13d1..58e76d76fcc3 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IntrinsicMapTaskExecutorFactoryTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IntrinsicMapTaskExecutorFactoryTest.java @@ -24,12 +24,12 @@ import static org.apache.beam.runners.dataflow.worker.counters.CounterName.named; import static org.apache.beam.sdk.util.SerializableUtils.serializeToByteArray; import static org.apache.beam.sdk.util.StringUtils.byteArrayToJsonString; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.hasItems; import static org.hamcrest.Matchers.instanceOf; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNull; import static org.junit.Assert.assertSame; -import static org.junit.Assert.assertThat; import static org.mockito.Matchers.anyBoolean; import static org.mockito.Matchers.anyLong; import static org.mockito.Matchers.eq; @@ -117,7 +117,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class IntrinsicMapTaskExecutorFactoryTest { private static final String STAGE = "test"; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IntrinsicMapTaskExecutorTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IntrinsicMapTaskExecutorTest.java index bd946c6a5382..80444ed2ee37 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IntrinsicMapTaskExecutorTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IntrinsicMapTaskExecutorTest.java @@ -27,10 +27,10 @@ import static org.apache.beam.runners.dataflow.worker.SourceTranslationUtils.cloudProgressToReaderProgress; import static org.apache.beam.runners.dataflow.worker.SourceTranslationUtils.splitRequestToApproximateSplitRequest; import static org.apache.beam.runners.dataflow.worker.counters.CounterName.named; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.arrayWithSize; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import 
static org.junit.Assert.fail; import static org.mockito.Matchers.anyBoolean; import static org.mockito.Matchers.eq; @@ -81,9 +81,6 @@ /** Tests for {@link MapTaskExecutor}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class IntrinsicMapTaskExecutorTest { private static final String COUNTER_PREFIX = "test-"; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IsmFormatTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IsmFormatTest.java index e97412420e09..8ffb5a440f8c 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IsmFormatTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IsmFormatTest.java @@ -17,11 +17,11 @@ */ package org.apache.beam.runners.dataflow.worker; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.allOf; import static org.hamcrest.Matchers.containsString; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNotEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.io.ByteArrayInputStream; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IsmReaderFactoryTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IsmReaderFactoryTest.java index 67a057cd8213..4bf93538e149 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IsmReaderFactoryTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IsmReaderFactoryTest.java @@ -53,9 +53,6 @@ /** Tests for {@link IsmReaderFactory}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class IsmReaderFactoryTest { private DataflowPipelineOptions options; private Cache logicalReferenceCache; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IsmReaderTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IsmReaderTest.java index 063d67169da6..8f60c1acaa79 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IsmReaderTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IsmReaderTest.java @@ -93,7 +93,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class IsmReaderTest { private static final long BLOOM_FILTER_SIZE_LIMIT = 10_000; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IsmSideInputReaderTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IsmSideInputReaderTest.java index 9d90ad4bf599..ae78bfa21016 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IsmSideInputReaderTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IsmSideInputReaderTest.java @@ -21,6 +21,7 @@ import static org.apache.beam.sdk.util.WindowedValue.valueInGlobalWindow; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables.concat; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.empty; import static org.hamcrest.Matchers.hasItem; @@ -28,7 +29,6 @@ import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertNotNull; import static org.junit.Assert.assertSame; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.junit.Assert.fail; @@ -70,7 +70,6 @@ import org.apache.beam.runners.dataflow.util.PropertyNames; import org.apache.beam.runners.dataflow.util.RandomAccessData; import org.apache.beam.runners.dataflow.worker.DataflowOperationContext.DataflowExecutionState; -import org.apache.beam.runners.dataflow.worker.ExperimentContext.Experiment; import org.apache.beam.runners.dataflow.worker.MetricsToCounterUpdateConverter.Kind; import org.apache.beam.runners.dataflow.worker.counters.Counter; import org.apache.beam.runners.dataflow.worker.counters.CounterName; @@ -114,7 +113,6 @@ import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterators; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ListMultimap; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Multimap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Ordering; @@ -141,7 +139,7 @@ * equivalently to 
their numeric representation for non-negative values. */ @RunWith(JUnit4.class) -@SuppressWarnings({"keyfor", "nullness"}) // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +@SuppressWarnings({"keyfor"}) public class IsmSideInputReaderTest { private static final Logger LOG = LoggerFactory.getLogger(IsmSideInputReaderTest.class); private static final long BLOOM_FILTER_SIZE_LIMIT = 10_000; @@ -169,9 +167,7 @@ public class IsmSideInputReaderTest { @Before public void setUp() { - pipelineOptions - .as(DataflowPipelineDebugOptions.class) - .setExperiments(Lists.newArrayList(Experiment.SideInputIOMetrics.getName())); + pipelineOptions.as(DataflowPipelineDebugOptions.class); setupCloser = Closer.create(); setupCloser.register(executionContext.getExecutionStateTracker().activate()); setupCloser.register(operationContext.enterProcess()); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/LazilyInitializedSideInputReaderTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/LazilyInitializedSideInputReaderTest.java index 81fb46f52d35..dfcde525f4f5 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/LazilyInitializedSideInputReaderTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/LazilyInitializedSideInputReaderTest.java @@ -42,7 +42,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class LazilyInitializedSideInputReaderTest { private static final String TEST_TAG = "test_tag"; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/LogRecordMatcher.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/LogRecordMatcher.java index 10bb1a3ae8a2..00b09aae7d7b 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/LogRecordMatcher.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/LogRecordMatcher.java @@ -25,9 +25,6 @@ import org.hamcrest.TypeSafeMatcher; /** Hamcrest matcher for asserts on {@link LogRecord} instances. 
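[Editorial note, not part of the patch] Most hunks in this stretch drop the blanket `"nullness"` entry from `@SuppressWarnings` (the BEAM-10402 TODO), leaving any remaining entries such as `"keyfor"` in place, so the Checker Framework's nullness analysis now runs over these tests. A hedged sketch of what that typically implies for a class once the suppression is gone (class and field names are hypothetical):

```java
import org.checkerframework.checker.nullness.qual.Nullable;

// Before: @SuppressWarnings({"nullness"}) // TODO(BEAM-10402) silenced the checker for the whole class.
// After: the checker runs, so nullable state has to be declared and handled explicitly.
public class NullnessCheckedExample { // hypothetical class, for illustration only
  private @Nullable String label;

  public String labelOrDefault() {
    String current = label; // copy to a local so the null check is visible to the checker
    return current == null ? "unlabeled" : current;
  }
}
```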
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public final class LogRecordMatcher extends TypeSafeMatcher { private final String substring; private final Matcher levelMatcher; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/LogRecordMatcherTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/LogRecordMatcherTest.java index f968e04d6986..f78304e97aca 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/LogRecordMatcherTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/LogRecordMatcherTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.dataflow.worker; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsString; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import java.util.Set; @@ -33,7 +33,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "AssertionFailureIgnored", - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class LogRecordMatcherTest { @Test diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/LogSaverTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/LogSaverTest.java index 8b7068a6ae7f..31047e45ba71 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/LogSaverTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/LogSaverTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.dataflow.worker; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.empty; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import java.util.Set; import java.util.logging.Level; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/MetricsEnvironmentContextActivationObserverRegistrationTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/MetricsEnvironmentContextActivationObserverRegistrationTest.java index 000e91653c2f..6e21cd709a57 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/MetricsEnvironmentContextActivationObserverRegistrationTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/MetricsEnvironmentContextActivationObserverRegistrationTest.java @@ -18,8 +18,8 @@ package org.apache.beam.runners.dataflow.worker; import static org.hamcrest.CoreMatchers.instanceOf; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.hasItem; -import static org.junit.Assert.assertThat; import java.util.List; import org.junit.Test; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/NoOpSourceOperationExecutorTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/NoOpSourceOperationExecutorTest.java index d8211fad0b5e..5ddd96099218 100644 --- 
a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/NoOpSourceOperationExecutorTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/NoOpSourceOperationExecutorTest.java @@ -33,9 +33,6 @@ /** Tests for {@link NoOpSourceOperationExecutor} */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class NoOpSourceOperationExecutorTest { private PipelineOptions options; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/PairWithConstantKeyDoFnFactoryTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/PairWithConstantKeyDoFnFactoryTest.java index a2ac2427ed58..e04d2ea80d4b 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/PairWithConstantKeyDoFnFactoryTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/PairWithConstantKeyDoFnFactoryTest.java @@ -18,8 +18,8 @@ package org.apache.beam.runners.dataflow.worker; import static org.apache.beam.sdk.util.WindowedValue.valueInGlobalWindow; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; -import static org.junit.Assert.assertThat; import java.util.ArrayList; import java.util.List; @@ -38,9 +38,6 @@ /** Tests for {@link PairWithConstantKeyDoFnFactory}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PairWithConstantKeyDoFnFactoryTest { @Test public void testConversionOfRecord() throws Exception { diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/PartialGroupByKeyParDoFnsTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/PartialGroupByKeyParDoFnsTest.java index 6ee1eb57dced..7c3932d4a568 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/PartialGroupByKeyParDoFnsTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/PartialGroupByKeyParDoFnsTest.java @@ -19,8 +19,8 @@ import static org.apache.beam.runners.dataflow.worker.util.common.worker.TestOutputReceiver.TestOutputCounter.getMeanByteCounterName; import static org.apache.beam.runners.dataflow.worker.util.common.worker.TestOutputReceiver.TestOutputCounter.getObjectCounterName; +import static org.hamcrest.MatcherAssert.assertThat; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.mockito.Mockito.doAnswer; import static org.mockito.Mockito.verify; @@ -89,7 +89,6 @@ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) "unchecked", - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class PartialGroupByKeyParDoFnsTest { @Mock private StreamingSideInputFetcher, BoundedWindow> mockSideInputFetcher; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/PartitioningShuffleReaderTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/PartitioningShuffleReaderTest.java index 
4025d8a94ee0..cf784e18a512 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/PartitioningShuffleReaderTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/PartitioningShuffleReaderTest.java @@ -45,9 +45,6 @@ /** Tests for PartitioningShuffleReader. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PartitioningShuffleReaderTest { private static final List>> NO_KVS = Collections.emptyList(); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/PubsubReaderTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/PubsubReaderTest.java index ef1ccd67c791..ea2687d7721b 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/PubsubReaderTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/PubsubReaderTest.java @@ -31,7 +31,7 @@ import org.apache.beam.sdk.coders.StringUtf8Coder; import org.apache.beam.sdk.transforms.windowing.IntervalWindow; import org.apache.beam.sdk.util.WindowedValue; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.joda.time.Instant; import org.junit.Before; import org.junit.Test; @@ -42,9 +42,6 @@ /** Unit tests for {@link PubsubReader}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PubsubReaderTest { @Mock StreamingModeExecutionContext mockContext; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/PubsubSinkTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/PubsubSinkTest.java index be39de5b4d5e..deff0d1c4139 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/PubsubSinkTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/PubsubSinkTest.java @@ -29,7 +29,7 @@ import org.apache.beam.sdk.coders.StringUtf8Coder; import org.apache.beam.sdk.transforms.windowing.IntervalWindow; import org.apache.beam.sdk.util.WindowedValue; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.joda.time.Instant; import org.junit.Before; import org.junit.Test; @@ -40,9 +40,6 @@ /** Unit tests for {@link PubsubSink}. 
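[Editorial note, not part of the patch] Several files above (DataflowMatchers, DataflowWorkerHarnessHelperTest, PubsubReaderTest, PubsubSinkTest, ReaderCacheTest) only swap the vendored protobuf imports (`ByteString`, `TextFormat`) from `org.apache.beam.vendor.grpc.v1p26p0...` to `org.apache.beam.vendor.grpc.v1p36p0...`: the package prefix tracks the vendored gRPC version, while the protobuf API itself is unchanged. A small sketch, assuming the vendored 1.36 artifact is on the classpath:

```java
// Only the package prefix changes with the gRPC vendoring bump; ByteString usage stays the same.
import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString;

public class VendoredByteStringExample { // hypothetical class, for illustration only
  public static void main(String[] args) {
    ByteString bytes = ByteString.copyFromUtf8("hello");
    System.out.println(bytes.size()); // prints 5
  }
}
```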
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PubsubSinkTest { @Mock StreamingModeExecutionContext mockContext; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ReaderCacheTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ReaderCacheTest.java index f9ab12c41b0e..7105da82af59 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ReaderCacheTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ReaderCacheTest.java @@ -26,7 +26,7 @@ import java.io.IOException; import java.util.concurrent.TimeUnit; import org.apache.beam.sdk.io.UnboundedSource; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Stopwatch; import org.joda.time.Duration; import org.junit.Before; @@ -38,9 +38,6 @@ /** Tests for {@link ReaderCache}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ReaderCacheTest { private static final String C_ID = "computationId"; @@ -61,7 +58,7 @@ public class ReaderCacheTest { @Before public void setUp() { - readerCache = new ReaderCache(); + readerCache = new ReaderCache(Duration.standardMinutes(1), Runnable::run); MockitoAnnotations.initMocks(this); } @@ -152,7 +149,7 @@ public void testReaderCacheExpiration() throws IOException, InterruptedException Duration cacheDuration = Duration.millis(10); // Create a cache with short expiry period. 
- ReaderCache readerCache = new ReaderCache(cacheDuration); + ReaderCache readerCache = new ReaderCache(cacheDuration, Runnable::run); readerCache.cacheReader( WindmillComputationKey.create(C_ID, KEY_1, SHARDING_KEY), 1, 0, reader1); @@ -172,7 +169,7 @@ public void testReaderCacheExpiration() throws IOException, InterruptedException @Test public void testReaderCacheRetries() throws IOException, InterruptedException { - ReaderCache readerCache = new ReaderCache(); + ReaderCache readerCache = new ReaderCache(Duration.standardMinutes(1), Runnable::run); readerCache.cacheReader( WindmillComputationKey.create(C_ID, KEY_1, SHARDING_KEY), 1, 1, reader1); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ReaderFactoryTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ReaderFactoryTest.java index 31883fc208a1..7cdff4e1955a 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ReaderFactoryTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ReaderFactoryTest.java @@ -17,6 +17,8 @@ */ package org.apache.beam.runners.dataflow.worker; +import static org.hamcrest.MatcherAssert.assertThat; + import com.google.api.services.dataflow.model.Source; import java.io.IOException; import java.util.NoSuchElementException; @@ -41,7 +43,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class ReaderFactoryTest { @@ -138,7 +139,7 @@ public void testCreateReader() throws Exception { PipelineOptionsFactory.create(), BatchModeExecutionContext.forTesting(options, "testStage"), null); - Assert.assertThat(reader, new IsInstanceOf(TestReader.class)); + assertThat(reader, new IsInstanceOf(TestReader.class)); } @Test @@ -157,7 +158,7 @@ public void testCreateUnknownReader() throws Exception { null); Assert.fail("should have thrown an exception"); } catch (Exception exn) { - Assert.assertThat(exn.toString(), CoreMatchers.containsString("Unable to create a Reader")); + assertThat(exn.toString(), CoreMatchers.containsString("Unable to create a Reader")); } } } diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ReaderTestUtils.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ReaderTestUtils.java index 729a84f82366..343661ae9a6b 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ReaderTestUtils.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ReaderTestUtils.java @@ -38,9 +38,6 @@ * Helpers for testing {@link NativeReader} and related classes, especially {@link * NativeReaderIterator#getProgress} and {@link NativeReaderIterator#requestDynamicSplit}. 
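[Editorial note, not part of the patch] In the ReaderCacheTest hunks just above, `ReaderCache` is now constructed with an explicit expiry `Duration` plus an executor argument, and the tests pass `Runnable::run`. The useful idiom here is that `Runnable::run` satisfies `java.util.concurrent.Executor`, giving a same-thread executor so cache-invalidation work runs inline and the tests stay deterministic. A minimal, self-contained sketch of that idiom (the exact ReaderCache signature is only inferred from the patch):

```java
import java.util.concurrent.Executor;

public class DirectExecutorExample { // hypothetical class, for illustration only
  public static void main(String[] args) {
    // Runnable::run implements Executor#execute(Runnable) by running the task
    // synchronously on the calling thread; no pool, no asynchrony.
    Executor sameThread = Runnable::run;
    sameThread.execute(() -> System.out.println("ran on " + Thread.currentThread().getName()));
  }
}
```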
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ReaderTestUtils { public static Position positionAtIndex(@Nullable Long index) { return new Position().setRecordIndex(index); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ReifyTimestampAndWindowsParDoFnFactoryTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ReifyTimestampAndWindowsParDoFnFactoryTest.java index e692b0becd3f..860c953d94e6 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ReifyTimestampAndWindowsParDoFnFactoryTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ReifyTimestampAndWindowsParDoFnFactoryTest.java @@ -18,8 +18,8 @@ package org.apache.beam.runners.dataflow.worker; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn; import org.apache.beam.runners.dataflow.worker.util.common.worker.Receiver; @@ -33,9 +33,6 @@ import org.junit.Test; /** Tests for {@link ReifyTimestampAndWindowsParDoFnFactory} */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ReifyTimestampAndWindowsParDoFnFactoryTest { private void verifyReifiedIsInTheSameWindows(WindowedValue> elem) diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/RunnerHarnessCoderCloudObjectTranslatorRegistrarTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/RunnerHarnessCoderCloudObjectTranslatorRegistrarTest.java index 81f26468265b..9aa015f72706 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/RunnerHarnessCoderCloudObjectTranslatorRegistrarTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/RunnerHarnessCoderCloudObjectTranslatorRegistrarTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.dataflow.worker; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.core.IsInstanceOf.instanceOf; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import org.apache.beam.runners.dataflow.internal.IsmFormat; import org.apache.beam.runners.dataflow.internal.IsmFormat.IsmRecordCoder; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/SdkHarnessRegistryTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/SdkHarnessRegistryTest.java index f51c50da736e..b2910e492048 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/SdkHarnessRegistryTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/SdkHarnessRegistryTest.java @@ -38,9 +38,6 @@ /** Unit tests for {@link SdkHarnessRegistry}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SdkHarnessRegistryTest { @Test diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ShuffleReaderFactoryTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ShuffleReaderFactoryTest.java index 2bec17f82113..de78d083be73 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ShuffleReaderFactoryTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ShuffleReaderFactoryTest.java @@ -19,6 +19,7 @@ import static com.google.api.client.util.Base64.encodeBase64String; import static org.apache.beam.runners.dataflow.util.Structs.addString; +import static org.hamcrest.MatcherAssert.assertThat; import com.google.api.services.dataflow.model.Source; import org.apache.beam.runners.dataflow.util.CloudObject; @@ -50,7 +51,6 @@ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) "unchecked", - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class ShuffleReaderFactoryTest { T runTestCreateShuffleReader( @@ -78,7 +78,7 @@ T runTestCreateShuffleReader( NativeReader reader = ReaderRegistry.defaultRegistry() .create(cloudSource, PipelineOptionsFactory.create(), context, null); - Assert.assertThat(reader, new IsInstanceOf(shuffleReaderClass)); + assertThat(reader, new IsInstanceOf(shuffleReaderClass)); return (T) reader; } diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ShuffleSinkFactoryTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ShuffleSinkFactoryTest.java index c70caa2cafc6..32437c20aa72 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ShuffleSinkFactoryTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ShuffleSinkFactoryTest.java @@ -19,6 +19,7 @@ import static com.google.api.client.util.Base64.encodeBase64String; import static org.apache.beam.runners.dataflow.util.Structs.addString; +import static org.hamcrest.MatcherAssert.assertThat; import org.apache.beam.runners.dataflow.util.CloudObject; import org.apache.beam.runners.dataflow.worker.util.common.worker.Sink; @@ -65,7 +66,7 @@ private ShuffleSink runTestCreateShuffleSinkHelper( options, BatchModeExecutionContext.forTesting(options, "testStage"), TestOperationContext.create()); - Assert.assertThat(sink, new IsInstanceOf(ShuffleSink.class)); + assertThat(sink, new IsInstanceOf(ShuffleSink.class)); ShuffleSink shuffleSink = (ShuffleSink) sink; Assert.assertArrayEquals(shuffleWriterConfig, shuffleSink.shuffleWriterConfig); Assert.assertEquals(coder, shuffleSink.windowedElemCoder); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ShuffleSinkTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ShuffleSinkTest.java index 9fe63d1b81e0..1a87b3c9f86a 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ShuffleSinkTest.java +++ 
b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ShuffleSinkTest.java @@ -48,9 +48,6 @@ /** Tests for {@link ShuffleSink}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ShuffleSinkTest { private static final List> NO_KVS = Collections.emptyList(); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/SimpleParDoFnTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/SimpleParDoFnTest.java index 9515356eca4c..d798bd718646 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/SimpleParDoFnTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/SimpleParDoFnTest.java @@ -20,6 +20,7 @@ import static org.apache.beam.runners.dataflow.worker.util.CounterHamcrestMatchers.CounterStructuredNameMatcher.hasStructuredName; import static org.apache.beam.runners.dataflow.worker.util.CounterHamcrestMatchers.CounterUpdateDistributionMatcher.hasDistribution; import static org.hamcrest.CoreMatchers.containsString; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.allOf; import static org.hamcrest.Matchers.not; import static org.hamcrest.collection.IsIterableContainingInOrder.contains; @@ -27,7 +28,6 @@ import static org.hamcrest.core.IsEqual.equalTo; import static org.hamcrest.core.IsInstanceOf.instanceOf; import static org.junit.Assert.assertArrayEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import com.google.api.services.dataflow.model.CounterUpdate; @@ -71,9 +71,6 @@ /** Tests for {@link SimpleParDoFn}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SimpleParDoFnTest { @Rule public ExpectedException thrown = ExpectedException.none(); @@ -218,7 +215,7 @@ public void testOutputReceivers() throws Exception { ParDoFn userParDoFn = new SimpleParDoFn<>( options, - DoFnInstanceManagers.cloningPool(fnInfo), + DoFnInstanceManagers.cloningPool(fnInfo, options), new EmptySideInputReader(), MAIN_OUTPUT, ImmutableMap.of( @@ -440,7 +437,7 @@ public void testUndeclaredSideOutputs() throws Exception { ParDoFn userParDoFn = new SimpleParDoFn<>( options, - DoFnInstanceManagers.cloningPool(fnInfo), + DoFnInstanceManagers.cloningPool(fnInfo, options), NullSideInputReader.empty(), MAIN_OUTPUT, ImmutableMap.of(MAIN_OUTPUT, 0, new TupleTag("declared"), 1), diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/SinkRegistryTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/SinkRegistryTest.java index 89c1a699519b..5916e18a8c0e 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/SinkRegistryTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/SinkRegistryTest.java @@ -18,6 +18,7 @@ package org.apache.beam.runners.dataflow.worker; import static org.apache.beam.runners.dataflow.util.Structs.addString; +import static org.hamcrest.MatcherAssert.assertThat; import org.apache.beam.runners.dataflow.util.CloudObject; import org.apache.beam.runners.dataflow.util.CloudObjects; @@ -50,7 +51,7 @@ public void testCreatePredefinedSink() throws Exception { options, BatchModeExecutionContext.forTesting(options, "testStage"), TestOperationContext.create()); - Assert.assertThat(sink.getUnderlyingSink(), new IsInstanceOf(AvroByteSink.class)); + assertThat(sink.getUnderlyingSink(), new IsInstanceOf(AvroByteSink.class)); } @Test @@ -70,7 +71,7 @@ public void testCreateUnknownSink() throws Exception { TestOperationContext.create()); Assert.fail("should have thrown an exception"); } catch (Exception exn) { - Assert.assertThat(exn.toString(), CoreMatchers.containsString("Unable to create a Sink")); + assertThat(exn.toString(), CoreMatchers.containsString("Unable to create a Sink")); } } } diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/SourceOperationExecutorFactoryTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/SourceOperationExecutorFactoryTest.java index 9eb3a4c60d75..7203a50abd9f 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/SourceOperationExecutorFactoryTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/SourceOperationExecutorFactoryTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.dataflow.worker; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.instanceOf; -import static org.junit.Assert.assertThat; import com.google.api.services.dataflow.model.SourceOperationRequest; import com.google.api.services.dataflow.model.SourceSplitRequest; @@ -35,9 +35,6 @@ /** Tests for {@link SourceOperationExecutorFactory}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SourceOperationExecutorFactoryTest { @Mock public DataflowExecutionContext executionContext; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StateFetcherTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StateFetcherTest.java index b0a457daba25..867972083a8f 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StateFetcherTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StateFetcherTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.dataflow.worker; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.mockito.Matchers.any; import static org.mockito.Mockito.times; import static org.mockito.Mockito.verify; @@ -43,7 +43,7 @@ import org.apache.beam.sdk.transforms.View; import org.apache.beam.sdk.transforms.windowing.GlobalWindow; import org.apache.beam.sdk.values.PCollectionView; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Supplier; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.Cache; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.CacheBuilder; @@ -57,9 +57,6 @@ /** Unit tests for {@link StateFetcher}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class StateFetcherTest { private static final String STATE_FAMILY = "state"; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorkerTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorkerTest.java index 4d94198e085e..17b59ecf3737 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorkerTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorkerTest.java @@ -20,6 +20,7 @@ import static org.apache.beam.runners.dataflow.util.Structs.addObject; import static org.apache.beam.runners.dataflow.util.Structs.addString; import static org.apache.beam.runners.dataflow.worker.counters.DataflowCounterUpdateExtractor.splitIntToLong; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.both; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.equalTo; @@ -27,7 +28,6 @@ import static org.hamcrest.Matchers.lessThan; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.junit.Assert.fail; import static org.mockito.ArgumentMatchers.eq; @@ -142,10 +142,11 @@ import org.apache.beam.sdk.values.ValueWithRecordId; import org.apache.beam.sdk.values.WindowingStrategy; import org.apache.beam.sdk.values.WindowingStrategy.AccumulationMode; -import 
org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString.Output; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.TextFormat; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString.Output; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Optional; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.CacheStats; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; @@ -170,9 +171,6 @@ /** Unit tests for {@link StreamingDataflowWorker}. */ @RunWith(Parameterized.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class StreamingDataflowWorkerTest { private final boolean streamingEngine; @@ -796,7 +794,8 @@ public void testHotKeyLogging() throws Exception { final int numIters = 2000; for (int i = 0; i < numIters; ++i) { - server.addWorkToOffer(makeInput(i, 0, "key", DEFAULT_SHARDING_KEY)); + server.addWorkToOffer( + makeInput(i, TimeUnit.MILLISECONDS.toMicros(i), "key", DEFAULT_SHARDING_KEY)); } Map result = server.waitForAndGetCommits(numIters); @@ -831,7 +830,8 @@ public void testHotKeyLoggingNotEnabled() throws Exception { final int numIters = 2000; for (int i = 0; i < numIters; ++i) { - server.addWorkToOffer(makeInput(i, 0, "key", DEFAULT_SHARDING_KEY)); + server.addWorkToOffer( + makeInput(i, TimeUnit.MILLISECONDS.toMicros(i), "key", DEFAULT_SHARDING_KEY)); } Map result = server.waitForAndGetCommits(numIters); @@ -1725,6 +1725,308 @@ public void testMergeWindows() throws Exception { assertEquals(0L, splitIntToLong(getCounter(counters, "WindmillShuffleBytesRead").getInteger())); } + static class PassthroughDoFn + extends DoFn>, KV>> { + + @ProcessElement + public void processElement(ProcessContext c) { + c.output(c.element()); + } + } + + @Test + // Runs a merging windows test verifying stored state, holds and timers with caching due to + // the first processing having is_new_key set. 
+ public void testMergeWindowsCaching() throws Exception { + Coder> kvCoder = KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of()); + Coder>> windowedKvCoder = + FullWindowedValueCoder.of(kvCoder, IntervalWindow.getCoder()); + KvCoder> groupedCoder = + KvCoder.of(StringUtf8Coder.of(), ListCoder.of(StringUtf8Coder.of())); + Coder>>> windowedGroupedCoder = + FullWindowedValueCoder.of(groupedCoder, IntervalWindow.getCoder()); + + CloudObject spec = CloudObject.forClassName("MergeWindowsDoFn"); + SdkComponents sdkComponents = SdkComponents.create(); + sdkComponents.registerEnvironment(Environments.JAVA_SDK_HARNESS_ENVIRONMENT); + addString( + spec, + PropertyNames.SERIALIZED_FN, + StringUtils.byteArrayToJsonString( + WindowingStrategyTranslation.toMessageProto( + WindowingStrategy.of(FixedWindows.of(Duration.standardSeconds(1))) + .withTimestampCombiner(TimestampCombiner.EARLIEST), + sdkComponents) + .toByteArray())); + addObject( + spec, + WorkerPropertyNames.INPUT_CODER, + CloudObjects.asCloudObject(windowedKvCoder, /*sdkComponents=*/ null)); + + ParallelInstruction mergeWindowsInstruction = + new ParallelInstruction() + .setSystemName("MergeWindows-System") + .setName("MergeWindowsStep") + .setOriginalName("MergeWindowsOriginal") + .setParDo( + new ParDoInstruction() + .setInput(new InstructionInput().setProducerInstructionIndex(0).setOutputNum(0)) + .setNumOutputs(1) + .setUserFn(spec)) + .setOutputs( + Arrays.asList( + new InstructionOutput() + .setOriginalName(DEFAULT_OUTPUT_ORIGINAL_NAME) + .setSystemName(DEFAULT_OUTPUT_SYSTEM_NAME) + .setName("output") + .setCodec( + CloudObjects.asCloudObject( + windowedGroupedCoder, /*sdkComponents=*/ null)))); + + List instructions = + Arrays.asList( + makeWindowingSourceInstruction(kvCoder), + mergeWindowsInstruction, + // Use multiple stages in the maptask to test caching with multiple stages. + makeDoFnInstruction(new PassthroughDoFn(), 1, groupedCoder), + makeSinkInstruction(groupedCoder, 2)); + + FakeWindmillServer server = new FakeWindmillServer(errorCollector); + + StreamingDataflowWorker worker = + makeWorker(instructions, createTestingPipelineOptions(server), false /* publishCounters */); + Map nameMap = new HashMap<>(); + nameMap.put("MergeWindowsStep", "MergeWindows"); + worker.addStateNameMappings(nameMap); + worker.start(); + + server.addWorkToOffer( + buildInput( + "work {" + + " computation_id: \"" + + DEFAULT_COMPUTATION_ID + + "\"" + + " input_data_watermark: 0" + + " work {" + + " key: \"" + + DEFAULT_KEY_STRING + + "\"" + + " sharding_key: " + + DEFAULT_SHARDING_KEY + + " cache_token: 1" + + " work_token: 1" + + " is_new_key: 1" + + " message_bundles {" + + " source_computation_id: \"" + + DEFAULT_SOURCE_COMPUTATION_ID + + "\"" + + " messages {" + + " timestamp: 0" + + " data: \"" + + dataStringForIndex(0) + + "\"" + + " }" + + " }" + + " }" + + "}", + intervalWindowBytes(WINDOW_AT_ZERO))); + + Map result = server.waitForAndGetCommits(1); + Iterable counters = worker.buildCounters(); + + // These tags and data are opaque strings and this is a change detector test. 
+ // The "/u" indicates the user's namespace, versus "/s" for system namespace + String window = "/gAAAAAAAA-joBw/"; + String timerTagPrefix = "/s" + window + "+0"; + ByteString bufferTag = ByteString.copyFromUtf8(window + "+ubuf"); + ByteString paneInfoTag = ByteString.copyFromUtf8(window + "+upane"); + String watermarkDataHoldTag = window + "+uhold"; + String watermarkExtraHoldTag = window + "+uextra"; + String stateFamily = "MergeWindows"; + ByteString bufferData = ByteString.copyFromUtf8("data0"); + // Encoded form for Iterable: -1, true, 'data0', false + ByteString outputData = + ByteString.copyFrom( + new byte[] { + (byte) 0xff, + (byte) 0xff, + (byte) 0xff, + (byte) 0xff, + 0x01, + 0x05, + 0x64, + 0x61, + 0x74, + 0x61, + 0x30, + 0x00 + }); + + // These values are not essential to the change detector test + long timerTimestamp = 999000L; + + WorkItemCommitRequest actualOutput = result.get(1L); + + // Set timer + verifyTimers(actualOutput, buildWatermarkTimer(timerTagPrefix, 999)); + + assertThat( + actualOutput.getBagUpdatesList(), + Matchers.contains( + Matchers.equalTo( + Windmill.TagBag.newBuilder() + .setTag(bufferTag) + .setStateFamily(stateFamily) + .addValues(bufferData) + .build()))); + + verifyHolds(actualOutput, buildHold(watermarkDataHoldTag, 0, false)); + + // No state reads + assertEquals(0L, splitIntToLong(getCounter(counters, "WindmillStateBytesRead").getInteger())); + // Timer + buffer + watermark hold + assertEquals( + Windmill.WorkItemCommitRequest.newBuilder(actualOutput) + .clearCounterUpdates() + .clearOutputMessages() + .build() + .getSerializedSize(), + splitIntToLong(getCounter(counters, "WindmillStateBytesWritten").getInteger())); + // Input messages + assertEquals( + VarInt.getLength(0L) + + dataStringForIndex(0).length() + + addPaneTag(PaneInfo.NO_FIRING, intervalWindowBytes(WINDOW_AT_ZERO)).size() + + 5L // proto overhead + , + splitIntToLong(getCounter(counters, "WindmillShuffleBytesRead").getInteger())); + + Windmill.GetWorkResponse.Builder getWorkResponse = Windmill.GetWorkResponse.newBuilder(); + getWorkResponse + .addWorkBuilder() + .setComputationId(DEFAULT_COMPUTATION_ID) + .setInputDataWatermark(timerTimestamp + 1000) + .addWorkBuilder() + .setKey(ByteString.copyFromUtf8(DEFAULT_KEY_STRING)) + .setShardingKey(DEFAULT_SHARDING_KEY) + .setWorkToken(2) + .setCacheToken(1) + .getTimersBuilder() + .addTimers(buildWatermarkTimer(timerTagPrefix, timerTimestamp)); + server.addWorkToOffer(getWorkResponse.build()); + + long expectedBytesRead = 0L; + + Windmill.GetDataResponse.Builder dataResponse = Windmill.GetDataResponse.newBuilder(); + Windmill.KeyedGetDataResponse.Builder dataBuilder = + dataResponse + .addDataBuilder() + .setComputationId(DEFAULT_COMPUTATION_ID) + .addDataBuilder() + .setKey(ByteString.copyFromUtf8(DEFAULT_KEY_STRING)) + .setShardingKey(DEFAULT_SHARDING_KEY); + // These reads are skipped due to being cached from accesses in the first work item processing. 
+ // dataBuilder + // .addBagsBuilder() + // .setTag(bufferTag) + // .setStateFamily(stateFamily) + // .addValues(bufferData); + // dataBuilder + // .addWatermarkHoldsBuilder() + // .setTag(ByteString.copyFromUtf8(watermarkDataHoldTag)) + // .setStateFamily(stateFamily) + // .addTimestamps(0); + dataBuilder + .addWatermarkHoldsBuilder() + .setTag(ByteString.copyFromUtf8(watermarkExtraHoldTag)) + .setStateFamily(stateFamily) + .addTimestamps(0); + dataBuilder + .addValuesBuilder() + .setTag(paneInfoTag) + .setStateFamily(stateFamily) + .getValueBuilder() + .setTimestamp(0) + .setData(ByteString.EMPTY); + server.addDataToOffer(dataResponse.build()); + + expectedBytesRead += dataBuilder.build().getSerializedSize(); + + result = server.waitForAndGetCommits(1); + counters = worker.buildCounters(); + actualOutput = result.get(2L); + + assertEquals(1, actualOutput.getOutputMessagesCount()); + assertEquals( + DEFAULT_DESTINATION_STREAM_ID, actualOutput.getOutputMessages(0).getDestinationStreamId()); + assertEquals( + DEFAULT_KEY_STRING, + actualOutput.getOutputMessages(0).getBundles(0).getKey().toStringUtf8()); + assertEquals(0, actualOutput.getOutputMessages(0).getBundles(0).getMessages(0).getTimestamp()); + assertEquals( + outputData, actualOutput.getOutputMessages(0).getBundles(0).getMessages(0).getData()); + + ByteString metadata = + actualOutput.getOutputMessages(0).getBundles(0).getMessages(0).getMetadata(); + InputStream inStream = metadata.newInput(); + assertEquals( + PaneInfo.createPane(true, true, Timing.ON_TIME), PaneInfoCoder.INSTANCE.decode(inStream)); + assertEquals( + Arrays.asList(WINDOW_AT_ZERO), + DEFAULT_WINDOW_COLLECTION_CODER.decode(inStream, Coder.Context.OUTER)); + + // Data was deleted + assertThat( + "" + actualOutput.getValueUpdatesList(), + actualOutput.getValueUpdatesList(), + Matchers.contains( + Matchers.equalTo( + Windmill.TagValue.newBuilder() + .setTag(paneInfoTag) + .setStateFamily(stateFamily) + .setValue( + Windmill.Value.newBuilder() + .setTimestamp(Long.MAX_VALUE) + .setData(ByteString.EMPTY)) + .build()))); + + assertThat( + "" + actualOutput.getBagUpdatesList(), + actualOutput.getBagUpdatesList(), + Matchers.contains( + Matchers.equalTo( + Windmill.TagBag.newBuilder() + .setTag(bufferTag) + .setStateFamily(stateFamily) + .setDeleteAll(true) + .build()))); + + verifyHolds( + actualOutput, + buildHold(watermarkDataHoldTag, -1, true), + buildHold(watermarkExtraHoldTag, -1, true)); + + // State reads for windowing + assertEquals( + expectedBytesRead, + splitIntToLong(getCounter(counters, "WindmillStateBytesRead").getInteger())); + // State updates to clear state + assertEquals( + Windmill.WorkItemCommitRequest.newBuilder(actualOutput) + .clearCounterUpdates() + .clearOutputMessages() + .build() + .getSerializedSize(), + splitIntToLong(getCounter(counters, "WindmillStateBytesWritten").getInteger())); + // No input messages + assertEquals(0L, splitIntToLong(getCounter(counters, "WindmillShuffleBytesRead").getInteger())); + + CacheStats stats = worker.stateCache.getCacheStats(); + LOG.info("cache stats {}", stats); + assertEquals(1, stats.hitCount()); + assertEquals(4, stats.missCount()); + } + static class Action { public Action(GetWorkResponse response) { diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingGroupAlsoByWindowFnsTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingGroupAlsoByWindowFnsTest.java index 
8d56ca034513..061c2898dada 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingGroupAlsoByWindowFnsTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingGroupAlsoByWindowFnsTest.java @@ -18,11 +18,11 @@ package org.apache.beam.runners.dataflow.worker; import static org.apache.beam.sdk.TestUtils.KvMatcher.isKv; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.emptyIterable; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.instanceOf; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.when; import java.io.IOException; @@ -75,7 +75,7 @@ import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.sdk.values.WindowingStrategy; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.joda.time.Duration; import org.joda.time.Instant; import org.junit.Before; @@ -87,9 +87,6 @@ /** Unit tests for {@link StreamingGroupAlsoByWindowsDoFns}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class StreamingGroupAlsoByWindowFnsTest { private static final String KEY = "k"; private static final String STATE_FAMILY = "stateFamily"; @@ -317,16 +314,13 @@ public void testSlidingWindows() throws Exception { isKv(equalTo(KEY), containsInAnyOrder("v0", "v1")), equalTo(new Instant(2)), equalTo(window(-10, 10))), - - // For this sliding window, the minimum output timestmap was 10, since we didn't want to - // overlap with the previous window that was [-10, 10). WindowMatchers.isSingleWindowedValue( isKv(equalTo(KEY), containsInAnyOrder("v0", "v1", "v2")), - equalTo(window(-10, 10).maxTimestamp().plus(1)), + equalTo(new Instant(2)), equalTo(window(0, 20))), WindowMatchers.isSingleWindowedValue( isKv(equalTo(KEY), containsInAnyOrder("v2")), - equalTo(window(0, 20).maxTimestamp().plus(1)), + equalTo(new Instant(15)), equalTo(window(10, 30))))); } @@ -405,16 +399,13 @@ public void testSlidingWindowsAndLateData() throws Exception { isKv(equalTo(KEY), emptyIterable()), equalTo(window(-10, 10).maxTimestamp()), equalTo(window(-10, 10))), - - // For this sliding window, the minimum output timestmap was 10, since we didn't want to - // overlap with the previous window that was [-10, 10). 
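      // Note: the expected output timestamps below now appear to track the earliest input
      // timestamp seen in each window (2 for [0, 20) and 15 for [10, 30)) instead of being
      // shifted just past the end of the preceding overlapping window
      // (window.maxTimestamp().plus(1), i.e. 10 and 20 in the removed expectations).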
WindowMatchers.isSingleWindowedValue( isKv(equalTo(KEY), containsInAnyOrder("v0", "v1", "v2")), - equalTo(window(-10, 10).maxTimestamp().plus(1)), + equalTo(new Instant(2)), equalTo(window(0, 20))), WindowMatchers.isSingleWindowedValue( isKv(equalTo(KEY), containsInAnyOrder("v2")), - equalTo(window(0, 20).maxTimestamp().plus(1)), + equalTo(new Instant(15)), equalTo(window(10, 30))))); long droppedValues = diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingGroupAlsoByWindowsReshuffleDoFnTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingGroupAlsoByWindowsReshuffleDoFnTest.java index daf523df07cb..5021d682c7f3 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingGroupAlsoByWindowsReshuffleDoFnTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingGroupAlsoByWindowsReshuffleDoFnTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.dataflow.worker; +import static org.hamcrest.MatcherAssert.assertThat; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import java.io.IOException; import java.util.Arrays; @@ -48,7 +48,7 @@ import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.sdk.values.WindowingStrategy; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.hamcrest.Matchers; import org.joda.time.Duration; import org.joda.time.Instant; @@ -59,9 +59,6 @@ /** Unit tests for {@link StreamingGroupAlsoByWindowReshuffleFn}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class StreamingGroupAlsoByWindowsReshuffleDoFnTest { private static final String KEY = "k"; private static final long WORK_TOKEN = 1000L; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingKeyedWorkItemSideInputDoFnRunnerTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingKeyedWorkItemSideInputDoFnRunnerTest.java index a5f6199e09e9..5d6f1e7099f1 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingKeyedWorkItemSideInputDoFnRunnerTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingKeyedWorkItemSideInputDoFnRunnerTest.java @@ -68,9 +68,6 @@ /** Unit tests for {@link StreamingKeyedWorkItemSideInputDoFnRunner}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class StreamingKeyedWorkItemSideInputDoFnRunnerTest { private static final FixedWindows WINDOW_FN = FixedWindows.of(Duration.millis(10)); private static TupleTag> mainOutputTag = new TupleTag<>(); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingModeExecutionContextTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingModeExecutionContextTest.java index 89ac7580b738..2cc2530eb409 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingModeExecutionContextTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingModeExecutionContextTest.java @@ -19,10 +19,10 @@ import static org.apache.beam.runners.dataflow.worker.counters.DataflowCounterUpdateExtractor.longToSplitInt; import static org.apache.beam.runners.dataflow.worker.counters.DataflowCounterUpdateExtractor.splitIntToLong; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.equalTo; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import com.google.api.services.dataflow.model.CounterMetadata; @@ -66,9 +66,10 @@ import org.apache.beam.sdk.util.WindowedValue; import org.apache.beam.sdk.values.PCollectionView; import org.apache.beam.sdk.values.TupleTag; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; import org.hamcrest.Matchers; +import org.joda.time.Duration; import org.joda.time.Instant; import org.junit.Before; import org.junit.Test; @@ -80,9 +81,6 @@ /** Tests for {@link StreamingModeExecutionContext}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class StreamingModeExecutionContextTest { @Mock private StateFetcher stateFetcher; @@ -104,7 +102,7 @@ public void setUp() { new StreamingModeExecutionContext( counterSet, "computationId", - new ReaderCache(), + new ReaderCache(Duration.standardMinutes(1), Executors.newCachedThreadPool()), stateNameMap, new WindmillStateCache(options.getWorkerCacheMb()).forComputation("comp"), StreamingStepMetricsContainer.createRegistry(), diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingPCollectionViewWriterDoFnFactoryTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingPCollectionViewWriterDoFnFactoryTest.java index 0d3161ad793d..5ed67666ee59 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingPCollectionViewWriterDoFnFactoryTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingPCollectionViewWriterDoFnFactoryTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.dataflow.worker; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.instanceOf; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.when; import org.apache.beam.runners.dataflow.util.CloudObject; @@ -40,7 +40,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class StreamingPCollectionViewWriterDoFnFactoryTest { @Test diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingSideInputDoFnRunnerTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingSideInputDoFnRunnerTest.java index 90db7fcbb0b8..16d986b72486 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingSideInputDoFnRunnerTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingSideInputDoFnRunnerTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.runners.dataflow.worker; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.equalTo; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.mockito.Matchers.any; import static org.mockito.Matchers.eq; @@ -63,7 +63,7 @@ import org.apache.beam.sdk.values.PCollectionView; import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.sdk.values.WindowingStrategy; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.hamcrest.Matchers; import org.joda.time.Duration; @@ -78,9 +78,6 @@ /** Unit tests for {@link StreamingSideInputDoFnRunner}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class StreamingSideInputDoFnRunnerTest { private static final FixedWindows WINDOW_FN = FixedWindows.of(Duration.millis(10)); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingSideInputFetcherTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingSideInputFetcherTest.java index d237234b8f20..b525554c6f23 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingSideInputFetcherTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingSideInputFetcherTest.java @@ -17,7 +17,7 @@ */ package org.apache.beam.runners.dataflow.worker; -import static org.junit.Assert.assertThat; +import static org.hamcrest.MatcherAssert.assertThat; import static org.junit.Assert.assertTrue; import static org.mockito.Matchers.any; import static org.mockito.Matchers.eq; @@ -49,7 +49,7 @@ import org.apache.beam.sdk.values.PCollectionView; import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.sdk.values.WindowingStrategy; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Sets; import org.hamcrest.Matchers; @@ -64,9 +64,6 @@ /** Tests for {@link StreamingSideInputFetcher}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class StreamingSideInputFetcherTest { private static final FixedWindows WINDOW_FN = FixedWindows.of(Duration.millis(10)); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingStepMetricsContainerTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingStepMetricsContainerTest.java index 33e6a8e41e36..caec162ba38a 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingStepMetricsContainerTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingStepMetricsContainerTest.java @@ -18,10 +18,10 @@ package org.apache.beam.runners.dataflow.worker; import static org.apache.beam.runners.dataflow.worker.counters.DataflowCounterUpdateExtractor.longToSplitInt; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.not; import static org.hamcrest.Matchers.sameInstance; -import static org.junit.Assert.assertThat; import com.google.api.services.dataflow.model.CounterMetadata; import com.google.api.services.dataflow.model.CounterStructuredName; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/TestOperationContext.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/TestOperationContext.java index 2ed9b8fde079..d00aede638b2 100644 --- 
a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/TestOperationContext.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/TestOperationContext.java @@ -30,9 +30,6 @@ import org.checkerframework.checker.nullness.qual.Nullable; /** {@link OperationContext} for testing purposes. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TestOperationContext extends DataflowOperationContext { /** ExecutionState for testing. */ diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/TestShuffleReadCounterFactory.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/TestShuffleReadCounterFactory.java index 411ed69580f6..c802effcd1ee 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/TestShuffleReadCounterFactory.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/TestShuffleReadCounterFactory.java @@ -28,9 +28,6 @@ * shuffleReadCounters created for each shuffle step. Note: There is one ShuffleReadCounter for each * GroupingShuffleReader associated with a unique GBK/shuffle. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TestShuffleReadCounterFactory extends ShuffleReadCounterFactory { private TreeMap originalShuffleStepToCounter; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/TestShuffleReaderTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/TestShuffleReaderTest.java index db1f6f3b77bd..ace0eb0597aa 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/TestShuffleReaderTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/TestShuffleReaderTest.java @@ -35,9 +35,6 @@ /** Tests of TestShuffleReader. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TestShuffleReaderTest { static final String START_KEY = "ddd"; static final String END_KEY = "mmm"; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ToIsmRecordForMultimapDoFnFactoryTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ToIsmRecordForMultimapDoFnFactoryTest.java index 759c5598bd81..7d273d63351a 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ToIsmRecordForMultimapDoFnFactoryTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ToIsmRecordForMultimapDoFnFactoryTest.java @@ -18,8 +18,8 @@ package org.apache.beam.runners.dataflow.worker; import static org.apache.beam.sdk.util.WindowedValue.valueInGlobalWindow; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; -import static org.junit.Assert.assertThat; import java.util.ArrayList; import java.util.List; @@ -38,9 +38,6 @@ /** Tests for {@link ToIsmRecordForMultimapDoFnFactory}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ToIsmRecordForMultimapDoFnFactoryTest { @Test public void testConversionOfRecord() throws Exception { diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/UngroupedShuffleReaderTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/UngroupedShuffleReaderTest.java index f122ad638ab5..31e3c4589292 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/UngroupedShuffleReaderTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/UngroupedShuffleReaderTest.java @@ -40,9 +40,6 @@ /** Tests for UngroupedShuffleReader. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class UngroupedShuffleReaderTest { private static final Instant timestamp = new Instant(123000); private static final IntervalWindow window = new IntervalWindow(timestamp, timestamp.plus(1000)); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/UserParDoFnFactoryTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/UserParDoFnFactoryTest.java index c830e61cbf59..2725289bf640 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/UserParDoFnFactoryTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/UserParDoFnFactoryTest.java @@ -17,12 +17,12 @@ */ package org.apache.beam.runners.dataflow.worker; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.not; import static org.hamcrest.Matchers.nullValue; import static org.hamcrest.Matchers.theInstance; import static org.hamcrest.core.AnyOf.anyOf; import static org.hamcrest.core.IsEqual.equalTo; -import static org.junit.Assert.assertThat; import static org.mockito.Matchers.any; import static org.mockito.Mockito.mock; import static org.mockito.Mockito.verify; @@ -77,9 +77,6 @@ /** Tests for {@link UserParDoFnFactory}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class UserParDoFnFactoryTest { static class TestDoFn extends DoFn { enum State { diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ValuesDoFnFactoryTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ValuesDoFnFactoryTest.java index 2da728836594..d360753456cd 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ValuesDoFnFactoryTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/ValuesDoFnFactoryTest.java @@ -18,8 +18,8 @@ package org.apache.beam.runners.dataflow.worker; import static org.apache.beam.sdk.util.WindowedValue.valueInGlobalWindow; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; -import static org.junit.Assert.assertThat; import java.util.ArrayList; import java.util.List; @@ -34,9 +34,6 @@ /** Tests for {@link ValuesDoFnFactory}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ValuesDoFnFactoryTest { @Test diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillKeyedWorkItemTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillKeyedWorkItemTest.java index 777b8a5dbf1c..312ba03951de 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillKeyedWorkItemTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillKeyedWorkItemTest.java @@ -17,7 +17,7 @@ */ package org.apache.beam.runners.dataflow.worker; -import static org.junit.Assert.assertThat; +import static org.hamcrest.MatcherAssert.assertThat; import java.io.IOException; import java.util.Collection; @@ -40,7 +40,7 @@ import org.apache.beam.sdk.transforms.windowing.PaneInfo; import org.apache.beam.sdk.transforms.windowing.PaneInfo.Timing; import org.apache.beam.sdk.util.WindowedValue; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.hamcrest.Matchers; import org.joda.time.Instant; import org.junit.Before; @@ -52,9 +52,6 @@ /** Tests for {@link WindmillKeyedWorkItem}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class WindmillKeyedWorkItemTest { private static final String STATE_FAMILY = "state"; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillReaderIteratorBaseTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillReaderIteratorBaseTest.java index 0b89c998e1c6..d527be85bef1 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillReaderIteratorBaseTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillReaderIteratorBaseTest.java @@ -26,7 +26,7 @@ import java.util.List; import org.apache.beam.runners.dataflow.worker.windmill.Windmill; import org.apache.beam.sdk.util.WindowedValue; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.junit.Test; import org.junit.runner.RunWith; import org.junit.runners.JUnit4; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillStateCacheTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillStateCacheTest.java index f95e7f848012..f661f1e8cd97 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillStateCacheTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillStateCacheTest.java @@ -30,7 +30,7 @@ import org.apache.beam.sdk.state.State; import org.apache.beam.sdk.state.StateSpec; import org.apache.beam.sdk.transforms.windowing.IntervalWindow; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import 
org.checkerframework.checker.nullness.qual.Nullable; import org.joda.time.Instant; import org.junit.Before; @@ -40,9 +40,6 @@ /** Tests for {@link WindmillStateCache}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class WindmillStateCacheTest { private static final String COMPUTATION = "computation"; @@ -144,7 +141,6 @@ private static WindmillComputationKey computationKey( } WindmillStateCache cache; - WindmillStateCache.ForKey keyCache; @Before public void setUp() { @@ -155,22 +151,31 @@ public void setUp() { @Test public void testBasic() throws Exception { - keyCache = cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, STATE_FAMILY, 0L, 1L); + WindmillStateCache.ForKeyAndFamily keyCache = + cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, 0L, 1L).forFamily(STATE_FAMILY); assertNull(keyCache.get(StateNamespaces.global(), new TestStateTag("tag1"))); assertNull(keyCache.get(windowNamespace(0), new TestStateTag("tag2"))); assertNull(keyCache.get(triggerNamespace(0, 0), new TestStateTag("tag3"))); assertNull(keyCache.get(triggerNamespace(0, 0), new TestStateTag("tag2"))); + assertEquals(0, cache.getWeight()); keyCache.put(StateNamespaces.global(), new TestStateTag("tag1"), new TestState("g1"), 2); - assertEquals(129, cache.getWeight()); keyCache.put(windowNamespace(0), new TestStateTag("tag2"), new TestState("w2"), 2); - assertEquals(258, cache.getWeight()); + + assertEquals(0, cache.getWeight()); + keyCache.persist(); + assertEquals(254, cache.getWeight()); + keyCache.put(triggerNamespace(0, 0), new TestStateTag("tag3"), new TestState("t3"), 2); - assertEquals(276, cache.getWeight()); keyCache.put(triggerNamespace(0, 0), new TestStateTag("tag2"), new TestState("t2"), 2); - assertEquals(294, cache.getWeight()); - keyCache = cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, STATE_FAMILY, 0L, 2L); + // Observes updated weight in entries, though cache will not know about it. + assertEquals(290, cache.getWeight()); + keyCache.persist(); + assertEquals(290, cache.getWeight()); + + keyCache = + cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, 0L, 2L).forFamily(STATE_FAMILY); assertEquals( new TestState("g1"), keyCache.get(StateNamespaces.global(), new TestStateTag("tag1"))); assertEquals(new TestState("w2"), keyCache.get(windowNamespace(0), new TestStateTag("tag2"))); @@ -189,31 +194,37 @@ public void testMaxWeight() throws Exception { /** Verifies that values are cached in the appropriate namespaces. 
*/ @Test public void testInvalidation() throws Exception { - keyCache = cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, STATE_FAMILY, 0L, 1L); + WindmillStateCache.ForKeyAndFamily keyCache = + cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, 0L, 1L).forFamily(STATE_FAMILY); assertNull(keyCache.get(StateNamespaces.global(), new TestStateTag("tag1"))); keyCache.put(StateNamespaces.global(), new TestStateTag("tag1"), new TestState("g1"), 2); + keyCache.persist(); - keyCache = cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, STATE_FAMILY, 0L, 2L); - assertEquals(129, cache.getWeight()); + keyCache = + cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, 0L, 2L).forFamily(STATE_FAMILY); + assertEquals(127, cache.getWeight()); assertEquals( new TestState("g1"), keyCache.get(StateNamespaces.global(), new TestStateTag("tag1"))); - keyCache = cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, STATE_FAMILY, 1L, 3L); - assertEquals(129, cache.getWeight()); + keyCache = + cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, 1L, 3L).forFamily(STATE_FAMILY); assertNull(keyCache.get(StateNamespaces.global(), new TestStateTag("tag1"))); - assertEquals(0, cache.getWeight()); + assertEquals(127, cache.getWeight()); } /** Verifies that the cache is invalidated when the cache token changes. */ @Test public void testEviction() throws Exception { - keyCache = cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, STATE_FAMILY, 0L, 1L); + WindmillStateCache.ForKeyAndFamily keyCache = + cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, 0L, 1L).forFamily(STATE_FAMILY); keyCache.put(windowNamespace(0), new TestStateTag("tag2"), new TestState("w2"), 2); - assertEquals(129, cache.getWeight()); keyCache.put(triggerNamespace(0, 0), new TestStateTag("tag3"), new TestState("t3"), 2000000000); + keyCache.persist(); assertEquals(0, cache.getWeight()); + // Eviction is atomic across the whole window. - keyCache = cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, STATE_FAMILY, 0L, 2L); + keyCache = + cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, 0L, 2L).forFamily(STATE_FAMILY); assertNull(keyCache.get(windowNamespace(0), new TestStateTag("tag2"))); assertNull(keyCache.get(triggerNamespace(0, 0), new TestStateTag("tag3"))); } @@ -223,30 +234,40 @@ public void testEviction() throws Exception { public void testStaleWorkItem() throws Exception { TestStateTag tag = new TestStateTag("tag2"); - keyCache = cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, STATE_FAMILY, 0L, 2L); + WindmillStateCache.ForKeyAndFamily keyCache = + cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, 0L, 2L).forFamily(STATE_FAMILY); keyCache.put(windowNamespace(0), tag, new TestState("w2"), 2); - assertEquals(129, cache.getWeight()); + // Same cache. - assertNull(keyCache.get(windowNamespace(0), tag)); + assertEquals(new TestState("w2"), keyCache.get(windowNamespace(0), tag)); + assertEquals(0, cache.getWeight()); + keyCache.persist(); + assertEquals(127, cache.getWeight()); + assertEquals(new TestState("w2"), keyCache.get(windowNamespace(0), tag)); // Previous work token. - keyCache = cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, STATE_FAMILY, 0L, 1L); + keyCache = + cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, 0L, 1L).forFamily(STATE_FAMILY); assertNull(keyCache.get(windowNamespace(0), tag)); // Retry of work token that inserted. 
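    // A minimal sketch of the ForKeyAndFamily lifecycle these tests exercise (all names and
    // values taken from this file): put() is readable through the same instance, persist()
    // publishes the entries for later work items on the same cache token, and reads with an
    // older or retried work token miss.
    //
    //   WindmillStateCache.ForKeyAndFamily c =
    //       cache.forComputation(COMPUTATION)
    //           .forKey(COMPUTATION_KEY, /* cacheToken= */ 0L, /* workToken= */ 2L)
    //           .forFamily(STATE_FAMILY);
    //   c.put(windowNamespace(0), tag, new TestState("w2"), 2); // readable via c.get(...)
    //   c.persist(); // published for later work items that present cache token 0L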
- keyCache = cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, STATE_FAMILY, 0L, 2L); + keyCache = + cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, 0L, 2L).forFamily(STATE_FAMILY); assertNull(keyCache.get(windowNamespace(0), tag)); - keyCache = cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, STATE_FAMILY, 0L, 10L); - assertEquals(new TestState("w2"), keyCache.get(windowNamespace(0), tag)); + keyCache = + cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, 0L, 10L).forFamily(STATE_FAMILY); + assertNull(keyCache.get(windowNamespace(0), tag)); keyCache.put(windowNamespace(0), tag, new TestState("w3"), 2); // Ensure that second put updated work token. - keyCache = cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, STATE_FAMILY, 0L, 5L); + keyCache = + cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, 0L, 5L).forFamily(STATE_FAMILY); assertNull(keyCache.get(windowNamespace(0), tag)); - keyCache = cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, STATE_FAMILY, 0L, 15L); - assertEquals(new TestState("w3"), keyCache.get(windowNamespace(0), tag)); + keyCache = + cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, 0L, 15L).forFamily(STATE_FAMILY); + assertNull(keyCache.get(windowNamespace(0), tag)); } /** Verifies that caches are kept independently per-key. */ @@ -254,37 +275,45 @@ public void testStaleWorkItem() throws Exception { public void testMultipleKeys() throws Exception { TestStateTag tag = new TestStateTag("tag1"); - WindmillStateCache.ForKey keyCache1 = + WindmillStateCache.ForKeyAndFamily keyCache1 = cache .forComputation("comp1") - .forKey(computationKey("comp1", "key1", SHARDING_KEY), STATE_FAMILY, 0L, 0L); - WindmillStateCache.ForKey keyCache2 = + .forKey(computationKey("comp1", "key1", SHARDING_KEY), 0L, 0L) + .forFamily(STATE_FAMILY); + WindmillStateCache.ForKeyAndFamily keyCache2 = cache .forComputation("comp1") - .forKey(computationKey("comp1", "key2", SHARDING_KEY), STATE_FAMILY, 0L, 10L); - WindmillStateCache.ForKey keyCache3 = + .forKey(computationKey("comp1", "key2", SHARDING_KEY), 0L, 10L) + .forFamily(STATE_FAMILY); + WindmillStateCache.ForKeyAndFamily keyCache3 = cache .forComputation("comp2") - .forKey(computationKey("comp2", "key1", SHARDING_KEY), STATE_FAMILY, 0L, 0L); + .forKey(computationKey("comp2", "key1", SHARDING_KEY), 0L, 0L) + .forFamily(STATE_FAMILY); TestState state1 = new TestState("g1"); keyCache1.put(StateNamespaces.global(), tag, state1, 2); - assertNull(keyCache1.get(StateNamespaces.global(), tag)); + assertEquals(state1, keyCache1.get(StateNamespaces.global(), tag)); + keyCache1.persist(); + keyCache1 = cache .forComputation("comp1") - .forKey(computationKey("comp1", "key1", SHARDING_KEY), STATE_FAMILY, 0L, 1L); + .forKey(computationKey("comp1", "key1", SHARDING_KEY), 0L, 1L) + .forFamily(STATE_FAMILY); assertEquals(state1, keyCache1.get(StateNamespaces.global(), tag)); assertNull(keyCache2.get(StateNamespaces.global(), tag)); assertNull(keyCache3.get(StateNamespaces.global(), tag)); TestState state2 = new TestState("g2"); keyCache2.put(StateNamespaces.global(), tag, state2, 2); - assertNull(keyCache2.get(StateNamespaces.global(), tag)); + keyCache2.persist(); + assertEquals(state2, keyCache2.get(StateNamespaces.global(), tag)); keyCache2 = cache .forComputation("comp1") - .forKey(computationKey("comp1", "key2", SHARDING_KEY), STATE_FAMILY, 0L, 20L); + .forKey(computationKey("comp1", "key2", SHARDING_KEY), 0L, 20L) + .forFamily(STATE_FAMILY); assertEquals(state2, 
keyCache2.get(StateNamespaces.global(), tag)); assertEquals(state1, keyCache1.get(StateNamespaces.global(), tag)); assertNull(keyCache3.get(StateNamespaces.global(), tag)); @@ -294,85 +323,133 @@ public void testMultipleKeys() throws Exception { @Test public void testMultipleShardsOfKey() throws Exception { TestStateTag tag = new TestStateTag("tag1"); - ByteString key1 = ByteString.copyFromUtf8("key1"); - ByteString key2 = ByteString.copyFromUtf8("key2"); - WindmillStateCache.ForKey key1CacheShard1 = + WindmillStateCache.ForKeyAndFamily key1CacheShard1 = cache .forComputation(COMPUTATION) - .forKey(computationKey(COMPUTATION, "key1", 1), STATE_FAMILY, 0L, 0L); - WindmillStateCache.ForKey key1CacheShard2 = + .forKey(computationKey(COMPUTATION, "key1", 1), 0L, 0L) + .forFamily(STATE_FAMILY); + WindmillStateCache.ForKeyAndFamily key1CacheShard2 = cache .forComputation(COMPUTATION) - .forKey(computationKey(COMPUTATION, "key1", 2), STATE_FAMILY, 0L, 0L); - WindmillStateCache.ForKey key2CacheShard1 = + .forKey(computationKey(COMPUTATION, "key1", 2), 0L, 0L) + .forFamily(STATE_FAMILY); + WindmillStateCache.ForKeyAndFamily key2CacheShard1 = cache .forComputation(COMPUTATION) - .forKey(computationKey(COMPUTATION, "key2", 1), STATE_FAMILY, 0L, 0L); + .forKey(computationKey(COMPUTATION, "key2", 1), 0L, 0L) + .forFamily(STATE_FAMILY); TestState state1 = new TestState("g1"); key1CacheShard1.put(StateNamespaces.global(), tag, state1, 2); - assertNull(key1CacheShard1.get(StateNamespaces.global(), tag)); + key1CacheShard1.persist(); + assertEquals(state1, key1CacheShard1.get(StateNamespaces.global(), tag)); key1CacheShard1 = cache .forComputation(COMPUTATION) - .forKey(computationKey(COMPUTATION, "key1", 1), STATE_FAMILY, 0L, 1L); + .forKey(computationKey(COMPUTATION, "key1", 1), 0L, 1L) + .forFamily(STATE_FAMILY); assertEquals(state1, key1CacheShard1.get(StateNamespaces.global(), tag)); assertNull(key1CacheShard2.get(StateNamespaces.global(), tag)); assertNull(key2CacheShard1.get(StateNamespaces.global(), tag)); TestState state2 = new TestState("g2"); key1CacheShard2.put(StateNamespaces.global(), tag, state2, 2); - assertNull(key1CacheShard2.get(StateNamespaces.global(), tag)); + assertEquals(state2, key1CacheShard2.get(StateNamespaces.global(), tag)); + key1CacheShard2.persist(); key1CacheShard2 = cache .forComputation(COMPUTATION) - .forKey(computationKey(COMPUTATION, "key1", 2), STATE_FAMILY, 0L, 20L); + .forKey(computationKey(COMPUTATION, "key1", 2), 0L, 20L) + .forFamily(STATE_FAMILY); assertEquals(state2, key1CacheShard2.get(StateNamespaces.global(), tag)); assertEquals(state1, key1CacheShard1.get(StateNamespaces.global(), tag)); assertNull(key2CacheShard1.get(StateNamespaces.global(), tag)); } + /** Verifies that caches are kept independently per-family. 
*/ + @Test + public void testMultipleFamilies() throws Exception { + TestStateTag tag = new TestStateTag("tag1"); + + WindmillStateCache.ForKey keyCache = + cache.forComputation("comp1").forKey(computationKey("comp1", "key1", SHARDING_KEY), 0L, 0L); + WindmillStateCache.ForKeyAndFamily family1 = keyCache.forFamily("family1"); + WindmillStateCache.ForKeyAndFamily family2 = keyCache.forFamily("family2"); + WindmillStateCache.ForKeyAndFamily family3 = keyCache.forFamily("family3"); + + TestState state1 = new TestState("g1"); + family1.put(StateNamespaces.global(), tag, state1, 2); + assertEquals(state1, family1.get(StateNamespaces.global(), tag)); + family1.persist(); + + TestState state2 = new TestState("g2"); + family2.put(StateNamespaces.global(), tag, state2, 2); + family2.persist(); + assertEquals(state2, family2.get(StateNamespaces.global(), tag)); + + keyCache = + cache.forComputation("comp1").forKey(computationKey("comp1", "key1", SHARDING_KEY), 0L, 1L); + family1 = keyCache.forFamily("family1"); + family2 = keyCache.forFamily("family2"); + family3 = keyCache.forFamily("family3"); + assertEquals(state1, family1.get(StateNamespaces.global(), tag)); + assertEquals(state2, family2.get(StateNamespaces.global(), tag)); + assertNull(family3.get(StateNamespaces.global(), tag)); + } + /** Verifies explicit invalidation does indeed invalidate the correct entries. */ @Test public void testExplicitInvalidation() throws Exception { - WindmillStateCache.ForKey keyCache1 = + WindmillStateCache.ForKeyAndFamily keyCache1 = cache .forComputation("comp1") - .forKey(computationKey("comp1", "key1", 1), STATE_FAMILY, 0L, 0L); - WindmillStateCache.ForKey keyCache2 = + .forKey(computationKey("comp1", "key1", 1), 0L, 0L) + .forFamily(STATE_FAMILY); + WindmillStateCache.ForKeyAndFamily keyCache2 = cache .forComputation("comp1") - .forKey(computationKey("comp1", "key2", SHARDING_KEY), STATE_FAMILY, 0L, 0L); - WindmillStateCache.ForKey keyCache3 = + .forKey(computationKey("comp1", "key2", SHARDING_KEY), 0L, 0L) + .forFamily(STATE_FAMILY); + WindmillStateCache.ForKeyAndFamily keyCache3 = cache .forComputation("comp2") - .forKey(computationKey("comp2", "key1", SHARDING_KEY), STATE_FAMILY, 0L, 0L); - WindmillStateCache.ForKey keyCache4 = + .forKey(computationKey("comp2", "key1", SHARDING_KEY), 0L, 0L) + .forFamily(STATE_FAMILY); + WindmillStateCache.ForKeyAndFamily keyCache4 = cache .forComputation("comp1") - .forKey(computationKey("comp1", "key1", 2), STATE_FAMILY, 0L, 0L); + .forKey(computationKey("comp1", "key1", 2), 0L, 0L) + .forFamily(STATE_FAMILY); keyCache1.put(StateNamespaces.global(), new TestStateTag("tag1"), new TestState("g1"), 1); + keyCache1.persist(); keyCache2.put(StateNamespaces.global(), new TestStateTag("tag2"), new TestState("g2"), 2); + keyCache2.persist(); keyCache3.put(StateNamespaces.global(), new TestStateTag("tag3"), new TestState("g3"), 3); + keyCache3.persist(); keyCache4.put(StateNamespaces.global(), new TestStateTag("tag4"), new TestState("g4"), 4); + keyCache4.persist(); keyCache1 = cache .forComputation("comp1") - .forKey(computationKey("comp1", "key1", 1), STATE_FAMILY, 0L, 1L); + .forKey(computationKey("comp1", "key1", 1), 0L, 1L) + .forFamily(STATE_FAMILY); keyCache2 = cache .forComputation("comp1") - .forKey(computationKey("comp1", "key2", SHARDING_KEY), STATE_FAMILY, 0L, 1L); + .forKey(computationKey("comp1", "key2", SHARDING_KEY), 0L, 1L) + .forFamily(STATE_FAMILY); keyCache3 = cache .forComputation("comp2") - .forKey(computationKey("comp2", "key1", SHARDING_KEY), 
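A companion sketch (again not part of the change, reusing the fixtures above) of the two behaviours the surrounding tests pin down: a single ForKey handle hands out independent per-family views, and invalidate drops cached state for one (key, sharding key) pair within a computation without touching other shards or keys.

    WindmillStateCache.ForKey forKey =
        cache.forComputation("comp1").forKey(computationKey("comp1", "key1", SHARDING_KEY), 0L, 0L);
    WindmillStateCache.ForKeyAndFamily familyA = forKey.forFamily("familyA");
    WindmillStateCache.ForKeyAndFamily familyB = forKey.forFamily("familyB");
    familyA.put(StateNamespaces.global(), new TestStateTag("tag1"), new TestState("g1"), 2);
    familyA.persist(); // familyB still reports a miss for the same namespace and tag
    // Explicit invalidation targets exactly one key/shard of one computation.
    cache.forComputation("comp1").invalidate(ByteString.copyFromUtf8("key1"), SHARDING_KEY);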
STATE_FAMILY, 0L, 1L); + .forKey(computationKey("comp2", "key1", SHARDING_KEY), 0L, 1L) + .forFamily(STATE_FAMILY); keyCache4 = cache .forComputation("comp1") - .forKey(computationKey("comp1", "key1", 2), STATE_FAMILY, 0L, 1L); + .forKey(computationKey("comp1", "key1", 2), 0L, 1L) + .forFamily(STATE_FAMILY); assertEquals( new TestState("g1"), keyCache1.get(StateNamespaces.global(), new TestStateTag("tag1"))); assertEquals( @@ -384,6 +461,11 @@ public void testExplicitInvalidation() throws Exception { // Invalidation of key 1 shard 1 does not affect another shard of key 1 or other keys. cache.forComputation("comp1").invalidate(ByteString.copyFromUtf8("key1"), 1); + keyCache1 = + cache + .forComputation("comp1") + .forKey(computationKey("comp1", "key1", 1), 0L, 2L) + .forFamily(STATE_FAMILY); assertNull(keyCache1.get(StateNamespaces.global(), new TestStateTag("tag1"))); assertEquals( @@ -427,13 +509,15 @@ public int hashCode() { */ @Test public void testBadCoderEquality() throws Exception { - WindmillStateCache.ForKey keyCache1 = - cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, STATE_FAMILY, 0L, 0L); + WindmillStateCache.ForKeyAndFamily keyCache1 = + cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, 0L, 0L).forFamily(STATE_FAMILY); StateTag tag = new TestStateTagWithBadEquality("tag1"); keyCache1.put(StateNamespaces.global(), tag, new TestState("g1"), 1); + keyCache1.persist(); - keyCache1 = cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, STATE_FAMILY, 0L, 1L); + keyCache1 = + cache.forComputation(COMPUTATION).forKey(COMPUTATION_KEY, 0L, 1L).forFamily(STATE_FAMILY); assertEquals(new TestState("g1"), keyCache1.get(StateNamespaces.global(), tag)); assertEquals( new TestState("g1"), diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillStateInternalsTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillStateInternalsTest.java index 2670e7633683..727a0864da58 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillStateInternalsTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillStateInternalsTest.java @@ -19,22 +19,30 @@ import static org.apache.beam.runners.dataflow.worker.DataflowMatchers.ByteStringMatcher.byteStringEq; import static org.apache.beam.sdk.testing.SystemNanoTimeSleeper.sleepMillis; +import static org.hamcrest.MatcherAssert.assertThat; import static org.junit.Assert.assertArrayEquals; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertNull; import static org.junit.Assert.assertTrue; import static org.mockito.Matchers.eq; import static org.mockito.Mockito.never; import static org.mockito.Mockito.times; import static org.mockito.Mockito.when; +import com.google.common.base.Charsets; import com.google.common.collect.Iterables; import java.io.Closeable; +import java.io.IOException; +import java.util.AbstractMap; +import java.util.AbstractMap.SimpleEntry; import java.util.ArrayList; import java.util.Arrays; import java.util.Collections; import java.util.Map; import java.util.concurrent.TimeUnit; +import java.util.stream.Collectors; +import javax.annotation.Nullable; import org.apache.beam.runners.core.StateNamespace; import org.apache.beam.runners.core.StateNamespaceForTest; import 
org.apache.beam.runners.core.StateTag; @@ -47,12 +55,14 @@ import org.apache.beam.runners.dataflow.worker.windmill.Windmill.TagSortedListUpdateRequest; import org.apache.beam.runners.dataflow.worker.windmill.Windmill.TagValue; import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.Coder.Context; import org.apache.beam.sdk.coders.StringUtf8Coder; import org.apache.beam.sdk.coders.VarIntCoder; import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.state.BagState; import org.apache.beam.sdk.state.CombiningState; import org.apache.beam.sdk.state.GroupingState; +import org.apache.beam.sdk.state.MapState; import org.apache.beam.sdk.state.OrderedListState; import org.apache.beam.sdk.state.ReadableState; import org.apache.beam.sdk.state.ValueState; @@ -61,7 +71,7 @@ import org.apache.beam.sdk.transforms.windowing.TimestampCombiner; import org.apache.beam.sdk.util.CoderUtils; import org.apache.beam.sdk.values.TimestampedValue; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Supplier; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Range; @@ -84,7 +94,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class WindmillStateInternalsTest { @@ -135,10 +144,11 @@ public void resetUnderTest() { cache .forComputation("comp") .forKey( - WindmillComputationKey.create("comp", ByteString.EMPTY, 123), - STATE_FAMILY, + WindmillComputationKey.create( + "comp", ByteString.copyFrom("dummyKey", Charsets.UTF_8), 123), 17L, - workToken), + workToken) + .forFamily(STATE_FAMILY), readStateSupplier); underTestNewKey = new WindmillStateInternals( @@ -149,10 +159,11 @@ public void resetUnderTest() { cache .forComputation("comp") .forKey( - WindmillComputationKey.create("comp", ByteString.EMPTY, 123), - STATE_FAMILY, + WindmillComputationKey.create( + "comp", ByteString.copyFrom("dummyNewKey", Charsets.UTF_8), 123), 17L, - workToken), + workToken) + .forFamily(STATE_FAMILY), readStateSupplier); } @@ -185,6 +196,391 @@ private WindmillStateReader.WeightedList weightedList(String... 
elems) { return result; } + private ByteString protoKeyFromUserKey(@Nullable K tag, Coder keyCoder) + throws IOException { + ByteString.Output keyStream = ByteString.newOutput(); + key(NAMESPACE, "map").writeTo(keyStream); + if (tag != null) { + keyCoder.encode(tag, keyStream, Context.OUTER); + } + return keyStream.toByteString(); + } + + private K userKeyFromProtoKey(ByteString tag, Coder keyCoder) throws IOException { + ByteString keyBytes = tag.substring(key(NAMESPACE, "map").size()); + return keyCoder.decode(keyBytes.newInput(), Context.OUTER); + } + + @Test + public void testMapAddBeforeGet() throws Exception { + StateTag> addr = + StateTags.map("map", StringUtf8Coder.of(), VarIntCoder.of()); + MapState mapState = underTest.state(NAMESPACE, addr); + + final String tag = "tag"; + SettableFuture future = SettableFuture.create(); + when(mockReader.valueFuture( + protoKeyFromUserKey(tag, StringUtf8Coder.of()), STATE_FAMILY, VarIntCoder.of())) + .thenReturn(future); + + ReadableState result = mapState.get("tag"); + result = result.readLater(); + waitAndSet(future, 1, 200); + assertEquals(1, (int) result.read()); + mapState.put("tag", 2); + assertEquals(2, (int) result.read()); + } + + @Test + public void testMapAddClearBeforeGet() throws Exception { + StateTag> addr = + StateTags.map("map", StringUtf8Coder.of(), VarIntCoder.of()); + MapState mapState = underTest.state(NAMESPACE, addr); + + final String tag = "tag"; + + SettableFuture>> prefixFuture = SettableFuture.create(); + when(mockReader.valuePrefixFuture( + protoKeyFromUserKey(null, StringUtf8Coder.of()), STATE_FAMILY, VarIntCoder.of())) + .thenReturn(prefixFuture); + + ReadableState result = mapState.get("tag"); + result = result.readLater(); + waitAndSet( + prefixFuture, + ImmutableList.of( + new AbstractMap.SimpleEntry<>(protoKeyFromUserKey(tag, StringUtf8Coder.of()), 1)), + 50); + assertFalse(mapState.isEmpty().read()); + mapState.clear(); + assertTrue(mapState.isEmpty().read()); + assertNull(mapState.get("tag").read()); + mapState.put("tag", 2); + assertFalse(mapState.isEmpty().read()); + assertEquals(2, (int) result.read()); + } + + @Test + public void testMapLocalAddOverridesStorage() throws Exception { + StateTag> addr = + StateTags.map("map", StringUtf8Coder.of(), VarIntCoder.of()); + MapState mapState = underTest.state(NAMESPACE, addr); + + final String tag = "tag"; + + SettableFuture future = SettableFuture.create(); + when(mockReader.valueFuture( + protoKeyFromUserKey(tag, StringUtf8Coder.of()), STATE_FAMILY, VarIntCoder.of())) + .thenReturn(future); + SettableFuture>> prefixFuture = SettableFuture.create(); + when(mockReader.valuePrefixFuture( + protoKeyFromUserKey(null, StringUtf8Coder.of()), STATE_FAMILY, VarIntCoder.of())) + .thenReturn(prefixFuture); + + waitAndSet(future, 1, 50); + waitAndSet( + prefixFuture, + ImmutableList.of( + new AbstractMap.SimpleEntry<>(protoKeyFromUserKey(tag, StringUtf8Coder.of()), 1)), + 50); + mapState.put(tag, 42); + assertEquals(42, (int) mapState.get(tag).read()); + assertThat( + mapState.entries().read(), + Matchers.containsInAnyOrder(new AbstractMap.SimpleEntry<>(tag, 42))); + } + + @Test + public void testMapLocalRemoveOverridesStorage() throws Exception { + StateTag> addr = + StateTags.map("map", StringUtf8Coder.of(), VarIntCoder.of()); + MapState mapState = underTest.state(NAMESPACE, addr); + + final String tag1 = "tag1"; + final String tag2 = "tag2"; + + SettableFuture future = SettableFuture.create(); + when(mockReader.valueFuture( + protoKeyFromUserKey(tag1, 
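For orientation, the protoKeyFromUserKey / userKeyFromProtoKey helpers above capture the tag layout these map tests rely on: the Windmill tag is the fixed state-key prefix produced by the test's existing key(NAMESPACE, "map") helper, followed by the user key encoded with the key coder in Context.OUTER; a null user key yields the bare prefix used for prefix fetches and prefix deletes. A hedged round-trip sketch reusing those helpers (throws IOException):

    // Build the tag for user key "k": fixed prefix + OUTER-encoded key bytes.
    ByteString.Output out = ByteString.newOutput();
    key(NAMESPACE, "map").writeTo(out);
    StringUtf8Coder.of().encode("k", out, Context.OUTER);
    ByteString windmillTag = out.toByteString();

    // Recover the user key by stripping the prefix and decoding the remainder.
    ByteString prefix = protoKeyFromUserKey(null, StringUtf8Coder.of());
    String userKey =
        StringUtf8Coder.of().decode(windmillTag.substring(prefix.size()).newInput(), Context.OUTER);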
StringUtf8Coder.of()), STATE_FAMILY, VarIntCoder.of())) + .thenReturn(future); + SettableFuture>> prefixFuture = SettableFuture.create(); + when(mockReader.valuePrefixFuture( + protoKeyFromUserKey(null, StringUtf8Coder.of()), STATE_FAMILY, VarIntCoder.of())) + .thenReturn(prefixFuture); + + waitAndSet(future, 1, 50); + waitAndSet( + prefixFuture, + ImmutableList.of( + new AbstractMap.SimpleEntry<>(protoKeyFromUserKey(tag1, StringUtf8Coder.of()), 1), + new AbstractMap.SimpleEntry<>(protoKeyFromUserKey(tag2, StringUtf8Coder.of()), 2)), + 50); + mapState.remove(tag1); + assertNull(mapState.get(tag1).read()); + assertThat( + mapState.entries().read(), + Matchers.containsInAnyOrder(new AbstractMap.SimpleEntry<>(tag2, 2))); + + mapState.remove(tag2); + assertTrue(mapState.isEmpty().read()); + } + + @Test + public void testMapLocalClearOverridesStorage() throws Exception { + StateTag> addr = + StateTags.map("map", StringUtf8Coder.of(), VarIntCoder.of()); + MapState mapState = underTest.state(NAMESPACE, addr); + + final String tag1 = "tag1"; + final String tag2 = "tag2"; + + SettableFuture future1 = SettableFuture.create(); + SettableFuture future2 = SettableFuture.create(); + + when(mockReader.valueFuture( + protoKeyFromUserKey(tag1, StringUtf8Coder.of()), STATE_FAMILY, VarIntCoder.of())) + .thenReturn(future1); + when(mockReader.valueFuture( + protoKeyFromUserKey(tag2, StringUtf8Coder.of()), STATE_FAMILY, VarIntCoder.of())) + .thenReturn(future2); + SettableFuture>> prefixFuture = SettableFuture.create(); + when(mockReader.valuePrefixFuture( + protoKeyFromUserKey(null, StringUtf8Coder.of()), STATE_FAMILY, VarIntCoder.of())) + .thenReturn(prefixFuture); + + waitAndSet(future1, 1, 50); + waitAndSet(future2, 2, 50); + waitAndSet( + prefixFuture, + ImmutableList.of( + new AbstractMap.SimpleEntry<>(protoKeyFromUserKey(tag1, StringUtf8Coder.of()), 1), + new AbstractMap.SimpleEntry<>(protoKeyFromUserKey(tag2, StringUtf8Coder.of()), 2)), + 50); + mapState.clear(); + assertNull(mapState.get(tag1).read()); + assertNull(mapState.get(tag2).read()); + assertThat(mapState.entries().read(), Matchers.emptyIterable()); + assertTrue(mapState.isEmpty().read()); + } + + @Test + public void testMapAddBeforeRead() throws Exception { + StateTag> addr = + StateTags.map("map", StringUtf8Coder.of(), VarIntCoder.of()); + MapState mapState = underTest.state(NAMESPACE, addr); + + final String tag1 = "tag1"; + final String tag2 = "tag2"; + final String tag3 = "tag3"; + SettableFuture>> prefixFuture = SettableFuture.create(); + when(mockReader.valuePrefixFuture( + protoKeyFromUserKey(null, StringUtf8Coder.of()), STATE_FAMILY, VarIntCoder.of())) + .thenReturn(prefixFuture); + + ReadableState>> result = mapState.entries(); + result = result.readLater(); + + mapState.put(tag1, 1); + waitAndSet( + prefixFuture, + ImmutableList.of( + new AbstractMap.SimpleEntry<>(protoKeyFromUserKey(tag2, StringUtf8Coder.of()), 2)), + 200); + Iterable> readData = result.read(); + assertThat( + readData, + Matchers.containsInAnyOrder( + new AbstractMap.SimpleEntry<>(tag1, 1), new AbstractMap.SimpleEntry<>(tag2, 2))); + + mapState.put(tag3, 3); + assertThat( + result.read(), + Matchers.containsInAnyOrder( + new AbstractMap.SimpleEntry<>(tag1, 1), + new AbstractMap.SimpleEntry<>(tag2, 2), + new AbstractMap.SimpleEntry<>(tag3, 3))); + } + + @Test + public void testMapPutIfAbsentSucceeds() throws Exception { + StateTag> addr = + StateTags.map("map", StringUtf8Coder.of(), VarIntCoder.of()); + MapState mapState = underTest.state(NAMESPACE, addr); + + 
final String tag1 = "tag1"; + SettableFuture future = SettableFuture.create(); + when(mockReader.valueFuture( + protoKeyFromUserKey(tag1, StringUtf8Coder.of()), STATE_FAMILY, VarIntCoder.of())) + .thenReturn(future); + waitAndSet(future, null, 50); + + assertNull(mapState.putIfAbsent(tag1, 42).read()); + assertEquals(42, (int) mapState.get(tag1).read()); + } + + @Test + public void testMapPutIfAbsentFails() throws Exception { + StateTag> addr = + StateTags.map("map", StringUtf8Coder.of(), VarIntCoder.of()); + MapState mapState = underTest.state(NAMESPACE, addr); + + final String tag1 = "tag1"; + mapState.put(tag1, 1); + assertEquals(1, (int) mapState.putIfAbsent(tag1, 42).read()); + assertEquals(1, (int) mapState.get(tag1).read()); + + final String tag2 = "tag2"; + SettableFuture future = SettableFuture.create(); + when(mockReader.valueFuture( + protoKeyFromUserKey(tag2, StringUtf8Coder.of()), STATE_FAMILY, VarIntCoder.of())) + .thenReturn(future); + waitAndSet(future, 2, 50); + assertEquals(2, (int) mapState.putIfAbsent(tag2, 42).read()); + assertEquals(2, (int) mapState.get(tag2).read()); + } + + @Test + public void testMapNegativeCache() throws Exception { + StateTag> addr = + StateTags.map("map", StringUtf8Coder.of(), VarIntCoder.of()); + MapState mapState = underTest.state(NAMESPACE, addr); + + final String tag = "tag"; + SettableFuture future = SettableFuture.create(); + when(mockReader.valueFuture( + protoKeyFromUserKey(tag, StringUtf8Coder.of()), STATE_FAMILY, VarIntCoder.of())) + .thenReturn(future); + waitAndSet(future, null, 200); + assertNull(mapState.get(tag).read()); + future.set(42); + assertNull(mapState.get(tag).read()); + } + + private Map.Entry fromTagValue( + TagValue tagValue, Coder keyCoder, Coder valueCoder) { + try { + V value = + !tagValue.getValue().getData().isEmpty() + ? 
valueCoder.decode(tagValue.getValue().getData().newInput()) + : null; + return new AbstractMap.SimpleEntry<>(userKeyFromProtoKey(tagValue.getTag(), keyCoder), value); + } catch (IOException e) { + throw new RuntimeException(e); + } + } + + @Test + public void testMapAddPersist() throws Exception { + StateTag> addr = + StateTags.map("map", StringUtf8Coder.of(), VarIntCoder.of()); + MapState mapState = underTest.state(NAMESPACE, addr); + + final String tag1 = "tag1"; + final String tag2 = "tag2"; + mapState.put(tag1, 1); + mapState.put(tag2, 2); + + Windmill.WorkItemCommitRequest.Builder commitBuilder = + Windmill.WorkItemCommitRequest.newBuilder(); + underTest.persist(commitBuilder); + + assertEquals(2, commitBuilder.getValueUpdatesCount()); + assertThat( + commitBuilder.getValueUpdatesList().stream() + .map(tv -> fromTagValue(tv, StringUtf8Coder.of(), VarIntCoder.of())) + .collect(Collectors.toList()), + Matchers.containsInAnyOrder(new SimpleEntry<>(tag1, 1), new SimpleEntry<>(tag2, 2))); + } + + @Test + public void testMapRemovePersist() throws Exception { + StateTag> addr = + StateTags.map("map", StringUtf8Coder.of(), VarIntCoder.of()); + MapState mapState = underTest.state(NAMESPACE, addr); + + final String tag1 = "tag1"; + final String tag2 = "tag2"; + mapState.remove(tag1); + mapState.remove(tag2); + + Windmill.WorkItemCommitRequest.Builder commitBuilder = + Windmill.WorkItemCommitRequest.newBuilder(); + underTest.persist(commitBuilder); + + assertEquals(2, commitBuilder.getValueUpdatesCount()); + assertThat( + commitBuilder.getValueUpdatesList().stream() + .map(tv -> fromTagValue(tv, StringUtf8Coder.of(), VarIntCoder.of())) + .collect(Collectors.toList()), + Matchers.containsInAnyOrder(new SimpleEntry<>(tag1, null), new SimpleEntry<>(tag2, null))); + } + + @Test + public void testMapClearPersist() throws Exception { + StateTag> addr = + StateTags.map("map", StringUtf8Coder.of(), VarIntCoder.of()); + MapState mapState = underTest.state(NAMESPACE, addr); + + final String tag1 = "tag1"; + final String tag2 = "tag2"; + mapState.put(tag1, 1); + mapState.put(tag2, 2); + mapState.clear(); + + Windmill.WorkItemCommitRequest.Builder commitBuilder = + Windmill.WorkItemCommitRequest.newBuilder(); + underTest.persist(commitBuilder); + + assertEquals(0, commitBuilder.getValueUpdatesCount()); + assertEquals(1, commitBuilder.getTagValuePrefixDeletesCount()); + System.err.println(commitBuilder); + assertEquals(STATE_FAMILY, commitBuilder.getTagValuePrefixDeletes(0).getStateFamily()); + assertEquals( + protoKeyFromUserKey(null, StringUtf8Coder.of()), + commitBuilder.getTagValuePrefixDeletes(0).getTagPrefix()); + } + + @Test + public void testMapComplexPersist() throws Exception { + StateTag> addr = + StateTags.map("map", StringUtf8Coder.of(), VarIntCoder.of()); + MapState mapState = underTest.state(NAMESPACE, addr); + + final String tag1 = "tag1"; + final String tag2 = "tag2"; + final String tag3 = "tag3"; + final String tag4 = "tag4"; + + mapState.put(tag1, 1); + mapState.clear(); + mapState.put(tag2, 2); + mapState.put(tag3, 3); + mapState.remove(tag2); + mapState.remove(tag4); + + Windmill.WorkItemCommitRequest.Builder commitBuilder = + Windmill.WorkItemCommitRequest.newBuilder(); + underTest.persist(commitBuilder); + assertEquals(1, commitBuilder.getTagValuePrefixDeletesCount()); + assertEquals(STATE_FAMILY, commitBuilder.getTagValuePrefixDeletes(0).getStateFamily()); + assertEquals( + protoKeyFromUserKey(null, StringUtf8Coder.of()), + commitBuilder.getTagValuePrefixDeletes(0).getTagPrefix()); + 
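The *Persist tests in this stretch pin down how map mutations are translated when persist builds the Windmill commit: each put becomes a value_updates entry whose tag embeds the user key, each remove becomes a value_updates entry with empty data, and a clear collapses into a single tag_value_prefix_deletes entry for the bare prefix. A sketch of that flow with the same fixtures (the field semantics here are read off these assertions, not from separate Windmill documentation):

    MapState<String, Integer> map =
        underTest.state(NAMESPACE, StateTags.map("map", StringUtf8Coder.of(), VarIntCoder.of()));
    map.clear();      // -> one TagValuePrefixDelete covering protoKeyFromUserKey(null, ...)
    map.put("k1", 1); // -> TagValue carrying the encoded value
    map.remove("k2"); // -> TagValue with empty data, i.e. a delete marker

    Windmill.WorkItemCommitRequest.Builder commit = Windmill.WorkItemCommitRequest.newBuilder();
    underTest.persist(commit);
    // Expect commit.getTagValuePrefixDeletesCount() == 1 and commit.getValueUpdatesCount() == 2.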
assertThat( + commitBuilder.getValueUpdatesList().stream() + .map(tv -> fromTagValue(tv, StringUtf8Coder.of(), VarIntCoder.of())) + .collect(Collectors.toList()), + Matchers.containsInAnyOrder( + new SimpleEntry<>(tag3, 3), + new SimpleEntry<>(tag2, null), + new SimpleEntry<>(tag4, null))); + + // Once persist has been called, calling persist again should be a noop. + commitBuilder = Windmill.WorkItemCommitRequest.newBuilder(); + assertEquals(0, commitBuilder.getTagValuePrefixDeletesCount()); + assertEquals(0, commitBuilder.getValueUpdatesCount()); + } + public static final Range FULL_ORDERED_LIST_RANGE = Range.closedOpen(WindmillOrderedList.MIN_TS_MICROS, WindmillOrderedList.MAX_TS_MICROS); @@ -1222,7 +1618,7 @@ public void testCachedValue() throws Exception { value.write("Hi"); underTest.persist(Windmill.WorkItemCommitRequest.newBuilder()); - assertEquals(126, cache.getWeight()); + assertEquals(132, cache.getWeight()); resetUnderTest(); value = underTest.state(NAMESPACE, addr); @@ -1230,7 +1626,7 @@ public void testCachedValue() throws Exception { value.clear(); underTest.persist(Windmill.WorkItemCommitRequest.newBuilder()); - assertEquals(124, cache.getWeight()); + assertEquals(130, cache.getWeight()); resetUnderTest(); value = underTest.state(NAMESPACE, addr); @@ -1262,7 +1658,7 @@ public void testCachedBag() throws Exception { underTest.persist(Windmill.WorkItemCommitRequest.newBuilder()); - assertEquals(134, cache.getWeight()); + assertEquals(140, cache.getWeight()); resetUnderTest(); bag = underTest.state(NAMESPACE, addr); @@ -1282,7 +1678,7 @@ public void testCachedBag() throws Exception { underTest.persist(Windmill.WorkItemCommitRequest.newBuilder()); - assertEquals(127, cache.getWeight()); + assertEquals(133, cache.getWeight()); resetUnderTest(); bag = underTest.state(NAMESPACE, addr); @@ -1293,7 +1689,7 @@ public void testCachedBag() throws Exception { underTest.persist(Windmill.WorkItemCommitRequest.newBuilder()); - assertEquals(128, cache.getWeight()); + assertEquals(134, cache.getWeight()); resetUnderTest(); bag = underTest.state(NAMESPACE, addr); @@ -1324,7 +1720,7 @@ public void testCachedWatermarkHold() throws Exception { underTest.persist(Windmill.WorkItemCommitRequest.newBuilder()); - assertEquals(132, cache.getWeight()); + assertEquals(138, cache.getWeight()); resetUnderTest(); hold = underTest.state(NAMESPACE, addr); @@ -1333,7 +1729,7 @@ public void testCachedWatermarkHold() throws Exception { underTest.persist(Windmill.WorkItemCommitRequest.newBuilder()); - assertEquals(132, cache.getWeight()); + assertEquals(138, cache.getWeight()); resetUnderTest(); hold = underTest.state(NAMESPACE, addr); @@ -1364,7 +1760,7 @@ public void testCachedCombining() throws Exception { underTest.persist(Windmill.WorkItemCommitRequest.newBuilder()); - assertEquals(125, cache.getWeight()); + assertEquals(131, cache.getWeight()); resetUnderTest(); value = underTest.state(NAMESPACE, COMBINING_ADDR); @@ -1375,7 +1771,7 @@ public void testCachedCombining() throws Exception { underTest.persist(Windmill.WorkItemCommitRequest.newBuilder()); - assertEquals(124, cache.getWeight()); + assertEquals(130, cache.getWeight()); resetUnderTest(); value = underTest.state(NAMESPACE, COMBINING_ADDR); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillStateReaderTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillStateReaderTest.java index 1cc1b2e95730..da07c8dbcb9e 100644 --- 
a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillStateReaderTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillStateReaderTest.java @@ -17,23 +17,26 @@ */ package org.apache.beam.runners.dataflow.worker; +import static org.hamcrest.MatcherAssert.assertThat; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.junit.Assert.fail; import java.io.IOException; +import java.util.AbstractMap; +import java.util.Map; import java.util.concurrent.Future; import org.apache.beam.runners.dataflow.worker.windmill.Windmill; import org.apache.beam.runners.dataflow.worker.windmill.Windmill.KeyedGetDataRequest; import org.apache.beam.runners.dataflow.worker.windmill.Windmill.SortedListEntry; import org.apache.beam.runners.dataflow.worker.windmill.Windmill.SortedListRange; import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.StringUtf8Coder; import org.apache.beam.sdk.coders.VarIntCoder; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.values.TimestampedValue; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString.Output; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString.Output; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Range; import org.hamcrest.Matchers; @@ -51,9 +54,9 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "FutureReturnValueIgnored", - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class WindmillStateReaderTest { + private static final StringUtf8Coder STRING_CODER = StringUtf8Coder.of(); private static final VarIntCoder INT_CODER = VarIntCoder.of(); private static final String COMPUTATION = "computation"; @@ -62,6 +65,7 @@ public class WindmillStateReaderTest { private static final long WORK_TOKEN = 5043L; private static final long CONT_POSITION = 1391631351L; + private static final ByteString STATE_KEY_PREFIX = ByteString.copyFromUtf8("key"); private static final ByteString STATE_KEY_1 = ByteString.copyFromUtf8("key1"); private static final ByteString STATE_KEY_2 = ByteString.copyFromUtf8("key2"); private static final String STATE_FAMILY = "family"; @@ -111,7 +115,7 @@ public void testReadBag() throws Exception { Windmill.TagBag.newBuilder() .setTag(STATE_KEY_1) .setStateFamily(STATE_FAMILY) - .setFetchMaxBytes(WindmillStateReader.MAX_BAG_BYTES)); + .setFetchMaxBytes(WindmillStateReader.INITIAL_MAX_BAG_BYTES)); Windmill.KeyedGetDataResponse.Builder response = Windmill.KeyedGetDataResponse.newBuilder() @@ -152,7 +156,7 @@ public void testReadBagWithContinuations() throws Exception { Windmill.TagBag.newBuilder() .setTag(STATE_KEY_1) .setStateFamily(STATE_FAMILY) - .setFetchMaxBytes(WindmillStateReader.MAX_BAG_BYTES)); + .setFetchMaxBytes(WindmillStateReader.INITIAL_MAX_BAG_BYTES)); Windmill.KeyedGetDataResponse.Builder response1 = Windmill.KeyedGetDataResponse.newBuilder() @@ -170,12 +174,12 @@ public void testReadBagWithContinuations() throws Exception { .setKey(DATA_KEY) .setShardingKey(SHARDING_KEY) .setWorkToken(WORK_TOKEN) - .setMaxBytes(WindmillStateReader.MAX_KEY_BYTES) + 
.setMaxBytes(WindmillStateReader.MAX_CONTINUATION_KEY_BYTES) .addBagsToFetch( Windmill.TagBag.newBuilder() .setTag(STATE_KEY_1) .setStateFamily(STATE_FAMILY) - .setFetchMaxBytes(WindmillStateReader.MAX_BAG_BYTES) + .setFetchMaxBytes(WindmillStateReader.CONTINUATION_MAX_BAG_BYTES) .setRequestPosition(CONT_POSITION)); Windmill.KeyedGetDataResponse.Builder response2 = @@ -227,7 +231,7 @@ public void testReadSortedList() throws Exception { .setTag(STATE_KEY_1) .setStateFamily(STATE_FAMILY) .addFetchRanges(SortedListRange.newBuilder().setStart(beginning).setLimit(end)) - .setFetchMaxBytes(WindmillStateReader.MAX_BAG_BYTES)); + .setFetchMaxBytes(WindmillStateReader.MAX_ORDERED_LIST_BYTES)); Windmill.KeyedGetDataResponse.Builder response = Windmill.KeyedGetDataResponse.newBuilder() @@ -290,19 +294,19 @@ public void testReadSortedListRanges() throws Exception { .setTag(STATE_KEY_1) .setStateFamily(STATE_FAMILY) .addFetchRanges(SortedListRange.newBuilder().setStart(0).setLimit(5)) - .setFetchMaxBytes(WindmillStateReader.MAX_BAG_BYTES)) + .setFetchMaxBytes(WindmillStateReader.MAX_ORDERED_LIST_BYTES)) .addSortedListsToFetch( Windmill.TagSortedListFetchRequest.newBuilder() .setTag(STATE_KEY_1) .setStateFamily(STATE_FAMILY) .addFetchRanges(SortedListRange.newBuilder().setStart(5).setLimit(6)) - .setFetchMaxBytes(WindmillStateReader.MAX_BAG_BYTES)) + .setFetchMaxBytes(WindmillStateReader.MAX_ORDERED_LIST_BYTES)) .addSortedListsToFetch( Windmill.TagSortedListFetchRequest.newBuilder() .setTag(STATE_KEY_1) .setStateFamily(STATE_FAMILY) .addFetchRanges(SortedListRange.newBuilder().setStart(6).setLimit(10)) - .setFetchMaxBytes(WindmillStateReader.MAX_BAG_BYTES)); + .setFetchMaxBytes(WindmillStateReader.MAX_ORDERED_LIST_BYTES)); Windmill.KeyedGetDataResponse.Builder response = Windmill.KeyedGetDataResponse.newBuilder() @@ -394,7 +398,7 @@ public void testReadSortedListWithContinuations() throws Exception { .setTag(STATE_KEY_1) .setStateFamily(STATE_FAMILY) .addFetchRanges(SortedListRange.newBuilder().setStart(beginning).setLimit(end)) - .setFetchMaxBytes(WindmillStateReader.MAX_BAG_BYTES)); + .setFetchMaxBytes(WindmillStateReader.MAX_ORDERED_LIST_BYTES)); final ByteString CONT = ByteString.copyFrom("CONTINUATION", Charsets.UTF_8); Windmill.KeyedGetDataResponse.Builder response1 = @@ -422,7 +426,7 @@ public void testReadSortedListWithContinuations() throws Exception { .setStateFamily(STATE_FAMILY) .addFetchRanges(SortedListRange.newBuilder().setStart(beginning).setLimit(end)) .setRequestPosition(CONT) - .setFetchMaxBytes(WindmillStateReader.MAX_BAG_BYTES)); + .setFetchMaxBytes(WindmillStateReader.MAX_ORDERED_LIST_BYTES)); Windmill.KeyedGetDataResponse.Builder response2 = Windmill.KeyedGetDataResponse.newBuilder() @@ -463,6 +467,139 @@ public void testReadSortedListWithContinuations() throws Exception { // NOTE: The future will still contain a reference to the underlying reader. 
} + @Test + public void testReadTagValuePrefix() throws Exception { + Future>> future = + underTest.valuePrefixFuture(STATE_KEY_PREFIX, STATE_FAMILY, INT_CODER); + Mockito.verifyNoMoreInteractions(mockWindmill); + + Windmill.KeyedGetDataRequest.Builder expectedRequest = + Windmill.KeyedGetDataRequest.newBuilder() + .setKey(DATA_KEY) + .setShardingKey(SHARDING_KEY) + .setWorkToken(WORK_TOKEN) + .setMaxBytes(WindmillStateReader.MAX_KEY_BYTES) + .addTagValuePrefixesToFetch( + Windmill.TagValuePrefixRequest.newBuilder() + .setTagPrefix(STATE_KEY_PREFIX) + .setStateFamily(STATE_FAMILY) + .setFetchMaxBytes(WindmillStateReader.MAX_TAG_VALUE_PREFIX_BYTES)); + + Windmill.KeyedGetDataResponse.Builder response = + Windmill.KeyedGetDataResponse.newBuilder() + .setKey(DATA_KEY) + .addTagValuePrefixes( + Windmill.TagValuePrefixResponse.newBuilder() + .setTagPrefix(STATE_KEY_PREFIX) + .setStateFamily(STATE_FAMILY) + .addTagValues( + Windmill.TagValue.newBuilder() + .setTag(STATE_KEY_1) + .setStateFamily(STATE_FAMILY) + .setValue(intValue(8))) + .addTagValues( + Windmill.TagValue.newBuilder() + .setTag(STATE_KEY_2) + .setStateFamily(STATE_FAMILY) + .setValue(intValue(9)))); + + Mockito.when(mockWindmill.getStateData(COMPUTATION, expectedRequest.build())) + .thenReturn(response.build()); + + Iterable> result = future.get(); + Mockito.verify(mockWindmill).getStateData(COMPUTATION, expectedRequest.build()); + Mockito.verifyNoMoreInteractions(mockWindmill); + + assertThat( + result, + Matchers.containsInAnyOrder( + new AbstractMap.SimpleEntry<>(STATE_KEY_1, 8), + new AbstractMap.SimpleEntry<>(STATE_KEY_2, 9))); + + assertNoReader(future); + } + + @Test + public void testReadTagValuePrefixWithContinuations() throws Exception { + Future>> future = + underTest.valuePrefixFuture(STATE_KEY_PREFIX, STATE_FAMILY, INT_CODER); + Mockito.verifyNoMoreInteractions(mockWindmill); + + Windmill.KeyedGetDataRequest.Builder expectedRequest1 = + Windmill.KeyedGetDataRequest.newBuilder() + .setKey(DATA_KEY) + .setShardingKey(SHARDING_KEY) + .setWorkToken(WORK_TOKEN) + .setMaxBytes(WindmillStateReader.MAX_KEY_BYTES) + .addTagValuePrefixesToFetch( + Windmill.TagValuePrefixRequest.newBuilder() + .setTagPrefix(STATE_KEY_PREFIX) + .setStateFamily(STATE_FAMILY) + .setFetchMaxBytes(WindmillStateReader.MAX_TAG_VALUE_PREFIX_BYTES)); + + final ByteString CONT = ByteString.copyFrom("CONTINUATION", Charsets.UTF_8); + Windmill.KeyedGetDataResponse.Builder response1 = + Windmill.KeyedGetDataResponse.newBuilder() + .setKey(DATA_KEY) + .addTagValuePrefixes( + Windmill.TagValuePrefixResponse.newBuilder() + .setTagPrefix(STATE_KEY_PREFIX) + .setStateFamily(STATE_FAMILY) + .setContinuationPosition(CONT) + .addTagValues( + Windmill.TagValue.newBuilder() + .setTag(STATE_KEY_1) + .setStateFamily(STATE_FAMILY) + .setValue(intValue(8)))); + + Windmill.KeyedGetDataRequest.Builder expectedRequest2 = + Windmill.KeyedGetDataRequest.newBuilder() + .setKey(DATA_KEY) + .setShardingKey(SHARDING_KEY) + .setWorkToken(WORK_TOKEN) + .setMaxBytes(WindmillStateReader.MAX_KEY_BYTES) + .addTagValuePrefixesToFetch( + Windmill.TagValuePrefixRequest.newBuilder() + .setTagPrefix(STATE_KEY_PREFIX) + .setStateFamily(STATE_FAMILY) + .setRequestPosition(CONT) + .setFetchMaxBytes(WindmillStateReader.MAX_TAG_VALUE_PREFIX_BYTES)); + + Windmill.KeyedGetDataResponse.Builder response2 = + Windmill.KeyedGetDataResponse.newBuilder() + .setKey(DATA_KEY) + .addTagValuePrefixes( + Windmill.TagValuePrefixResponse.newBuilder() + .setTagPrefix(STATE_KEY_PREFIX) + 
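Sketch of the reader call these two prefix tests exercise: valuePrefixFuture issues a TagValuePrefixRequest covering everything under a tag prefix and resolves to (tag, decoded value) pairs, and when a response carries a continuation position the follow-up KeyedGetDataRequest is issued lazily while the result is iterated. The names used (STATE_KEY_PREFIX, STATE_FAMILY, INT_CODER, underTest) are this test's fixtures.

    Future<Iterable<Map.Entry<ByteString, Integer>>> future =
        underTest.valuePrefixFuture(STATE_KEY_PREFIX, STATE_FAMILY, INT_CODER);
    for (Map.Entry<ByteString, Integer> entry : future.get()) {
      // entry.getKey() is the full Windmill tag; entry.getValue() was decoded with INT_CODER.
      // Continuation pages, if any, are fetched during this iteration.
    }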
.setStateFamily(STATE_FAMILY) + .setRequestPosition(CONT) + .addTagValues( + Windmill.TagValue.newBuilder() + .setTag(STATE_KEY_2) + .setStateFamily(STATE_FAMILY) + .setValue(intValue(9)))); + + Mockito.when(mockWindmill.getStateData(COMPUTATION, expectedRequest1.build())) + .thenReturn(response1.build()); + Mockito.when(mockWindmill.getStateData(COMPUTATION, expectedRequest2.build())) + .thenReturn(response2.build()); + + Iterable> results = future.get(); + Mockito.verify(mockWindmill).getStateData(COMPUTATION, expectedRequest1.build()); + for (Map.Entry unused : results) { + // Iterate over the results to force loading all the pages. + } + Mockito.verify(mockWindmill).getStateData(COMPUTATION, expectedRequest2.build()); + Mockito.verifyNoMoreInteractions(mockWindmill); + + assertThat( + results, + Matchers.containsInAnyOrder( + new AbstractMap.SimpleEntry<>(STATE_KEY_1, 8), + new AbstractMap.SimpleEntry<>(STATE_KEY_2, 9))); + // NOTE: The future will still contain a reference to the underlying reader. + } + @Test public void testReadValue() throws Exception { Future future = underTest.valueFuture(STATE_KEY_1, STATE_FAMILY, INT_CODER); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillStateTestUtils.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillStateTestUtils.java index 3d75e1bb3e81..17da531d4525 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillStateTestUtils.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillStateTestUtils.java @@ -25,9 +25,6 @@ import java.util.HashSet; /** Static helpers for testing Windmill state. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class WindmillStateTestUtils { /** * Assert that no field (including compiler-generated fields) within {@code obj} point back to a diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillTimeUtilsTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillTimeUtilsTest.java index 73d42cad3d35..5f910c3acb5f 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillTimeUtilsTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillTimeUtilsTest.java @@ -30,9 +30,6 @@ /** Unit tests for {@link WindmillTimeUtils}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class WindmillTimeUtilsTest { @Test public void testWindmillToHarnessWatermark() { diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillTimerInternalsTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillTimerInternalsTest.java index c9fe16c88c1d..9aed2cde25d0 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillTimerInternalsTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillTimerInternalsTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.dataflow.worker; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.util.List; import org.apache.beam.runners.core.StateNamespace; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WorkItemStatusClientTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WorkItemStatusClientTest.java index 88c9b734a528..6cc2eae14408 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WorkItemStatusClientTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WorkItemStatusClientTest.java @@ -18,6 +18,7 @@ package org.apache.beam.runners.dataflow.worker; import static org.apache.beam.runners.dataflow.worker.SourceTranslationUtils.cloudProgressToReaderProgress; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.containsString; import static org.hamcrest.Matchers.empty; @@ -25,7 +26,6 @@ import static org.hamcrest.Matchers.hasEntry; import static org.hamcrest.Matchers.hasSize; import static org.hamcrest.Matchers.nullValue; -import static org.junit.Assert.assertThat; import static org.mockito.Matchers.anyBoolean; import static org.mockito.Matchers.isA; import static org.mockito.Mockito.mock; @@ -83,9 +83,6 @@ /** Tests for {@link WorkItemStatusClient}. 
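A large share of the remaining hunks repeat one mechanical migration: org.junit.Assert.assertThat, deprecated in recent JUnit releases, is swapped for org.hamcrest.MatcherAssert.assertThat. The two methods take the same arguments, so only the static import changes and call sites stay untouched, for example:

    // Before: import static org.junit.Assert.assertThat;
    // After:
    import static org.hamcrest.MatcherAssert.assertThat;
    import static org.hamcrest.Matchers.contains;

    assertThat(java.util.Arrays.asList(1, 2), contains(1, 2));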
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class WorkItemStatusClientTest { private static final String PROJECT_ID = "ProjectId"; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WorkerCustomSourcesTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WorkerCustomSourcesTest.java index cf12da5230b2..680677458e05 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WorkerCustomSourcesTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WorkerCustomSourcesTest.java @@ -108,7 +108,7 @@ import org.apache.beam.sdk.util.WindowedValue; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.ValueWithRecordId; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; @@ -124,9 +124,6 @@ /** Tests for {@link WorkerCustomSources}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class WorkerCustomSourcesTest { @Rule public ExpectedException expectedException = ExpectedException.none(); @Rule public ExpectedLogs logged = ExpectedLogs.none(WorkerCustomSources.class); @@ -502,7 +499,7 @@ public void testReadUnboundedReader() throws Exception { CounterSet counterSet = new CounterSet(); StreamingModeExecutionStateRegistry executionStateRegistry = new StreamingModeExecutionStateRegistry(null); - ReaderCache readerCache = new ReaderCache(); + ReaderCache readerCache = new ReaderCache(Duration.standardMinutes(1), Runnable::run); StreamingModeExecutionContext context = new StreamingModeExecutionContext( counterSet, diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/apiary/ApiaryTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/apiary/ApiaryTest.java index c6ee204e0a47..d01a7ef4c3ff 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/apiary/ApiaryTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/apiary/ApiaryTest.java @@ -29,7 +29,7 @@ /** Tests for {@link Apiary}. 
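One behavioural detail buried in the WorkerCustomSourcesTest hunk above: ReaderCache is now constructed with an expiry Duration and an Executor instead of the previous no-arg call. Runnable::run is the usual direct-executor idiom, so whatever work the cache hands to that executor (presumably closing evicted readers; an assumption, the hunk itself does not say) runs inline on the calling thread:

    // Executor#execute(Runnable) implemented by Runnable#run(): a same-thread executor.
    java.util.concurrent.Executor directExecutor = Runnable::run;
    ReaderCache readerCache = new ReaderCache(Duration.standardMinutes(1), directExecutor);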
*/ @RunWith(JUnit4.class) -@SuppressWarnings({"keyfor", "nullness"}) // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +@SuppressWarnings({"keyfor"}) public class ApiaryTest { @Test public void testNullSafeList() { diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/counters/CounterSetTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/counters/CounterSetTest.java index 27dadf96d575..adc1a324d29c 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/counters/CounterSetTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/counters/CounterSetTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.runners.dataflow.worker.counters; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.not; import static org.junit.Assert.assertSame; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.times; import static org.mockito.Mockito.verify; import static org.mockito.Mockito.verifyNoMoreInteractions; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/counters/DistributionCounterUpdateAggregatorTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/counters/DistributionCounterUpdateAggregatorTest.java index 3ad8619b3429..8dac8a747f1d 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/counters/DistributionCounterUpdateAggregatorTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/counters/DistributionCounterUpdateAggregatorTest.java @@ -31,9 +31,6 @@ import org.junit.Before; import org.junit.Test; -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DistributionCounterUpdateAggregatorTest { private List counterUpdates; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/counters/MeanCounterUpdateAggregatorTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/counters/MeanCounterUpdateAggregatorTest.java index 55dc7c2373b6..9ea7a31dfade 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/counters/MeanCounterUpdateAggregatorTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/counters/MeanCounterUpdateAggregatorTest.java @@ -31,9 +31,6 @@ import org.junit.Before; import org.junit.Test; -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class MeanCounterUpdateAggregatorTest { private List counterUpdates; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/counters/SumCounterUpdateAggregatorTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/counters/SumCounterUpdateAggregatorTest.java index cfa4388af222..e30354fc6fe6 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/counters/SumCounterUpdateAggregatorTest.java +++ 
b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/counters/SumCounterUpdateAggregatorTest.java @@ -30,9 +30,6 @@ import org.junit.Before; import org.junit.Test; -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SumCounterUpdateAggregatorTest { private List counterUpdates; private SumCounterUpdateAggregator aggregator; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/BeamFnControlServiceTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/BeamFnControlServiceTest.java index e2152a0cc03a..b5135f84b180 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/BeamFnControlServiceTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/BeamFnControlServiceTest.java @@ -30,14 +30,14 @@ import org.apache.beam.model.fnexecution.v1.BeamFnControlGrpc; import org.apache.beam.model.pipeline.v1.Endpoints; import org.apache.beam.runners.dataflow.worker.fn.stream.ServerStreamObserverFactory; -import org.apache.beam.runners.fnexecution.GrpcContextHeaderAccessorProvider; -import org.apache.beam.runners.fnexecution.ServerFactory; import org.apache.beam.runners.fnexecution.control.FnApiControlClient; +import org.apache.beam.sdk.fn.server.GrpcContextHeaderAccessorProvider; +import org.apache.beam.sdk.fn.server.ServerFactory; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.PipelineOptionsFactory; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ManagedChannelBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.testing.GrpcCleanupRule; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ManagedChannelBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.testing.GrpcCleanupRule; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.net.HostAndPort; import org.junit.Before; @@ -50,9 +50,6 @@ /** Tests for {@link BeamFnControlService}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamFnControlServiceTest { @Rule public GrpcCleanupRule grpcCleanupRule = new GrpcCleanupRule().setTimeout(10, TimeUnit.SECONDS); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutorTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutorTest.java index d8129466d3b8..b365ac9ada7d 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutorTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutorTest.java @@ -18,9 +18,9 @@ package org.apache.beam.runners.dataflow.worker.fn.control; import static org.apache.beam.runners.core.metrics.MonitoringInfoEncodings.encodeInt64Counter; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.mockito.Matchers.any; import static org.mockito.Mockito.when; @@ -74,7 +74,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class BeamFnMapTaskExecutorTest { diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/DataflowSideInputHandlerFactoryTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/DataflowSideInputHandlerFactoryTest.java index bd8f1610f477..a7d46875ff41 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/DataflowSideInputHandlerFactoryTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/DataflowSideInputHandlerFactoryTest.java @@ -52,7 +52,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public final class DataflowSideInputHandlerFactoryTest { diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/ElementCountMonitoringInfoToCounterUpdateTransformerTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/ElementCountMonitoringInfoToCounterUpdateTransformerTest.java index a29e73b5582a..c9f09b454e81 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/ElementCountMonitoringInfoToCounterUpdateTransformerTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/ElementCountMonitoringInfoToCounterUpdateTransformerTest.java @@ -41,9 +41,6 @@ import org.mockito.Mock; import org.mockito.MockitoAnnotations; -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ElementCountMonitoringInfoToCounterUpdateTransformerTest { @Rule public final 
ExpectedException exception = ExpectedException.none(); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/ExecutionTimeMonitoringInfoToCounterUpdateTransformerTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/ExecutionTimeMonitoringInfoToCounterUpdateTransformerTest.java index f1e4ab8ae256..f99cf5a45dbe 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/ExecutionTimeMonitoringInfoToCounterUpdateTransformerTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/ExecutionTimeMonitoringInfoToCounterUpdateTransformerTest.java @@ -44,9 +44,6 @@ import org.mockito.Mock; import org.mockito.MockitoAnnotations; -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ExecutionTimeMonitoringInfoToCounterUpdateTransformerTest { @Rule public final ExpectedException exception = ExpectedException.none(); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/FnApiMonitoringInfoToCounterUpdateTransformerTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/FnApiMonitoringInfoToCounterUpdateTransformerTest.java index c2424b3263f6..70ba27460521 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/FnApiMonitoringInfoToCounterUpdateTransformerTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/FnApiMonitoringInfoToCounterUpdateTransformerTest.java @@ -33,9 +33,6 @@ import org.mockito.Mock; import org.mockito.MockitoAnnotations; -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FnApiMonitoringInfoToCounterUpdateTransformerTest { @Mock private UserMonitoringInfoToCounterUpdateTransformer mockTransformer2; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/MeanByteCountMonitoringInfoToCounterUpdateTransformerTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/MeanByteCountMonitoringInfoToCounterUpdateTransformerTest.java index db50ac2cfb1d..796e513b580c 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/MeanByteCountMonitoringInfoToCounterUpdateTransformerTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/MeanByteCountMonitoringInfoToCounterUpdateTransformerTest.java @@ -42,9 +42,6 @@ import org.mockito.Mock; import org.mockito.MockitoAnnotations; -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class MeanByteCountMonitoringInfoToCounterUpdateTransformerTest { @Rule public final ExpectedException exception = ExpectedException.none(); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/RegisterAndProcessBundleOperationTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/RegisterAndProcessBundleOperationTest.java 
index d767d96a5fce..91e8257b6e1c 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/RegisterAndProcessBundleOperationTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/RegisterAndProcessBundleOperationTest.java @@ -18,11 +18,11 @@ package org.apache.beam.runners.dataflow.worker.fn.control; import static org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.encodeAndConcat; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.empty; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNotNull; import static org.junit.Assert.assertSame; -import static org.junit.Assert.assertThat; import static org.mockito.Matchers.any; import static org.mockito.Matchers.eq; import static org.mockito.Mockito.mock; @@ -78,7 +78,7 @@ import org.apache.beam.sdk.values.PCollectionView; import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.sdk.values.ValueInSingleWindow.Coder; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableTable; @@ -99,7 +99,6 @@ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) "FutureReturnValueIgnored", - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class RegisterAndProcessBundleOperationTest { private static final BeamFnApi.RegisterRequest REGISTER_REQUEST = diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/SingularProcessBundleProgressTrackerTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/SingularProcessBundleProgressTrackerTest.java index 63b72b6507d5..07958fb906e5 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/SingularProcessBundleProgressTrackerTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/SingularProcessBundleProgressTrackerTest.java @@ -39,7 +39,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class SingularProcessBundleProgressTrackerTest { diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/UserDistributionMonitoringInfoToCounterUpdateTransformerTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/UserDistributionMonitoringInfoToCounterUpdateTransformerTest.java index 537d964c2ce5..4197aabb0dd0 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/UserDistributionMonitoringInfoToCounterUpdateTransformerTest.java +++ 
b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/UserDistributionMonitoringInfoToCounterUpdateTransformerTest.java @@ -44,9 +44,6 @@ import org.mockito.Mock; import org.mockito.MockitoAnnotations; -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class UserDistributionMonitoringInfoToCounterUpdateTransformerTest { @Rule public final ExpectedException exception = ExpectedException.none(); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/UserMonitoringInfoToCounterUpdateTransformerTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/UserMonitoringInfoToCounterUpdateTransformerTest.java index b02fa5b529fb..b76bfaeae746 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/UserMonitoringInfoToCounterUpdateTransformerTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/UserMonitoringInfoToCounterUpdateTransformerTest.java @@ -43,9 +43,6 @@ import org.mockito.Mock; import org.mockito.MockitoAnnotations; -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class UserMonitoringInfoToCounterUpdateTransformerTest { @Rule public final ExpectedException exception = ExpectedException.none(); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/data/BeamFnDataGrpcServiceTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/data/BeamFnDataGrpcServiceTest.java index fde88aa2d2e7..43a2d17bf098 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/data/BeamFnDataGrpcServiceTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/data/BeamFnDataGrpcServiceTest.java @@ -19,10 +19,10 @@ import static org.apache.beam.sdk.util.CoderUtils.encodeToByteArray; import static org.apache.beam.sdk.util.WindowedValue.valueInGlobalWindow; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.empty; -import static org.junit.Assert.assertThat; import java.util.ArrayList; import java.util.Collection; @@ -40,7 +40,6 @@ import org.apache.beam.model.pipeline.v1.Endpoints; import org.apache.beam.runners.dataflow.harness.test.TestStreams; import org.apache.beam.runners.dataflow.worker.fn.stream.ServerStreamObserverFactory; -import org.apache.beam.runners.fnexecution.GrpcContextHeaderAccessorProvider; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.CoderException; import org.apache.beam.sdk.coders.LengthPrefixCoder; @@ -48,25 +47,26 @@ import org.apache.beam.sdk.fn.data.CloseableFnDataReceiver; import org.apache.beam.sdk.fn.data.InboundDataClient; import org.apache.beam.sdk.fn.data.LogicalEndpoint; +import org.apache.beam.sdk.fn.server.GrpcContextHeaderAccessorProvider; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.util.WindowedValue; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import 
org.apache.beam.vendor.grpc.v1p26p0.io.grpc.BindableService; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.CallOptions; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Channel; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ClientCall; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ClientInterceptor; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ForwardingClientCall.SimpleForwardingClientCall; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ManagedChannel; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Metadata; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Metadata.Key; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.MethodDescriptor; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Server; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ServerInterceptors; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.inprocess.InProcessChannelBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.inprocess.InProcessServerBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.BindableService; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.CallOptions; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Channel; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ClientCall; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ClientInterceptor; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ForwardingClientCall.SimpleForwardingClientCall; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ManagedChannel; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Metadata; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Metadata.Key; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.MethodDescriptor; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Server; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ServerInterceptors; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessChannelBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessServerBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.junit.After; import org.junit.Before; import org.junit.Test; @@ -77,7 +77,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "FutureReturnValueIgnored", - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class BeamFnDataGrpcServiceTest { private static final String TRANSFORM_ID = "888"; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/data/RemoteGrpcPortReadOperationTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/data/RemoteGrpcPortReadOperationTest.java index 7250ad80bda3..85c66d7ec972 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/data/RemoteGrpcPortReadOperationTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/data/RemoteGrpcPortReadOperationTest.java @@ -18,9 +18,9 @@ package org.apache.beam.runners.dataflow.worker.fn.data; import static org.apache.beam.sdk.util.WindowedValue.valueInGlobalWindow; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; 
import static org.mockito.Matchers.any; import static org.mockito.Matchers.eq; @@ -56,9 +56,6 @@ /** Tests for {@link RemoteGrpcPortReadOperation}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class RemoteGrpcPortReadOperationTest { private static final Coder<WindowedValue<String>> CODER = WindowedValue.getFullCoder(StringUtf8Coder.of(), GlobalWindow.Coder.INSTANCE); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/data/RemoteGrpcPortWriteOperationTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/data/RemoteGrpcPortWriteOperationTest.java index d09adccd2e00..fb7ce24a7a86 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/data/RemoteGrpcPortWriteOperationTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/data/RemoteGrpcPortWriteOperationTest.java @@ -18,9 +18,9 @@ package org.apache.beam.runners.dataflow.worker.fn.data; import static org.apache.beam.sdk.util.WindowedValue.valueInGlobalWindow; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.mockito.Matchers.any; import static org.mockito.Mockito.times; @@ -50,9 +50,6 @@ /** Tests for {@link RemoteGrpcPortWriteOperation}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class RemoteGrpcPortWriteOperationTest { private static final Coder<WindowedValue<String>> CODER = WindowedValue.getFullCoder(StringUtf8Coder.of(), GlobalWindow.Coder.INSTANCE); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/logging/BeamFnLoggingServiceTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/logging/BeamFnLoggingServiceTest.java index 448f9e2855bf..84faff91da97 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/logging/BeamFnLoggingServiceTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/logging/BeamFnLoggingServiceTest.java @@ -17,13 +17,14 @@ */ package org.apache.beam.runners.dataflow.worker.fn.logging; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; -import static org.junit.Assert.assertThat; import java.net.InetAddress; import java.net.ServerSocket; import java.util.ArrayList; import java.util.Collection; +import java.util.concurrent.BlockingQueue; import java.util.concurrent.Callable; import java.util.concurrent.ConcurrentLinkedQueue; import java.util.concurrent.CountDownLatch; @@ -32,18 +33,19 @@ import java.util.concurrent.Future; import java.util.concurrent.LinkedBlockingQueue; import org.apache.beam.model.fnexecution.v1.BeamFnApi; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.LogEntry.List; import org.apache.beam.model.fnexecution.v1.BeamFnLoggingGrpc; import org.apache.beam.model.pipeline.v1.Endpoints; import org.apache.beam.runners.dataflow.harness.test.TestStreams; import org.apache.beam.runners.dataflow.worker.fn.stream.ServerStreamObserverFactory; -import 
org.apache.beam.runners.fnexecution.GrpcContextHeaderAccessorProvider; +import org.apache.beam.sdk.fn.server.GrpcContextHeaderAccessorProvider; import org.apache.beam.sdk.options.PipelineOptionsFactory; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.BindableService; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ManagedChannel; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Server; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.inprocess.InProcessChannelBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.inprocess.InProcessServerBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.BindableService; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ManagedChannel; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Server; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessChannelBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessServerBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.net.HostAndPort; import org.junit.After; import org.junit.Test; @@ -52,9 +54,6 @@ /** Tests for {@link BeamFnLoggingService}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamFnLoggingServiceTest { private Server server; @@ -118,7 +117,7 @@ public void testMultipleClientsSuccessfullyProcessed() throws Exception { } } - @Test + @Test(timeout = 5000) public void testMultipleClientsFailingIsHandledGracefullyByServer() throws Exception { Collection<Callable<Void>> tasks = new ArrayList<>(); ConcurrentLinkedQueue<BeamFnApi.LogEntry> logs = new ConcurrentLinkedQueue<>(); @@ -129,11 +128,12 @@ public void testMultipleClientsFailingIsHandledGracefullyByServer() throws Excep ServerStreamObserverFactory.fromOptions(PipelineOptionsFactory.create())::from, GrpcContextHeaderAccessorProvider.getHeaderAccessor())) { server = createServer(service, service.getApiServiceDescriptor()); + CountDownLatch waitForTermination = new CountDownLatch(3); + final BlockingQueue<StreamObserver<List>> outboundObservers = new LinkedBlockingQueue<>(); for (int i = 1; i <= 3; ++i) { int instructionId = i; tasks.add( () -> { - CountDownLatch waitForTermination = new CountDownLatch(1); ManagedChannel channel = InProcessChannelBuilder.forName(service.getApiServiceDescriptor().getUrl()) .build(); @@ -144,13 +144,16 @@ public void testMultipleClientsFailingIsHandledGracefullyByServer() throws Excep .withOnError(waitForTermination::countDown) .build()); outboundObserver.onNext(createLogsWithIds(instructionId, -instructionId)); - outboundObserver.onError(new RuntimeException("Client " + instructionId)); - waitForTermination.await(); + outboundObservers.add(outboundObserver); return null; }); } ExecutorService executorService = Executors.newCachedThreadPool(); executorService.invokeAll(tasks); + for (int i = 1; i <= 3; ++i) { + outboundObservers.take().onError(new RuntimeException("Client " + i)); + } + waitForTermination.await(); } } diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/stream/ServerStreamObserverFactoryTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/stream/ServerStreamObserverFactoryTest.java index 7525299f8870..e5e0171f6b45 100644 --- 
a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/stream/ServerStreamObserverFactoryTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/stream/ServerStreamObserverFactoryTest.java @@ -17,15 +17,15 @@ */ package org.apache.beam.runners.dataflow.worker.fn.stream; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.instanceOf; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import org.apache.beam.sdk.fn.stream.BufferingStreamObserver; import org.apache.beam.sdk.fn.stream.DirectStreamObserver; import org.apache.beam.sdk.options.PipelineOptionsFactory; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.CallStreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.CallStreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.junit.Before; import org.junit.Test; import org.junit.runner.RunWith; @@ -35,9 +35,6 @@ /** Tests for {@link ServerStreamObserverFactory}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ServerStreamObserverFactoryTest { @Mock private CallStreamObserver mockResponseObserver; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/CloneAmbiguousFlattensFunctionTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/CloneAmbiguousFlattensFunctionTest.java index 367f16c82c18..4f8e7fba6021 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/CloneAmbiguousFlattensFunctionTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/CloneAmbiguousFlattensFunctionTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.runners.dataflow.worker.graph; +import static org.hamcrest.MatcherAssert.assertThat; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNotNull; import static org.junit.Assert.assertSame; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.junit.matchers.JUnitMatchers.hasItems; @@ -43,9 +43,6 @@ /** Tests for {@link CloneAmbiguousFlattensFunction}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public final class CloneAmbiguousFlattensFunctionTest { /** A node that stores nothing. Used for testing with nodes that have no ExecutionLocation. 
*/ diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/CreateRegisterFnOperationFunctionTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/CreateRegisterFnOperationFunctionTest.java index 21446eac808a..e149ed539220 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/CreateRegisterFnOperationFunctionTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/CreateRegisterFnOperationFunctionTest.java @@ -17,12 +17,12 @@ */ package org.apache.beam.runners.dataflow.worker.graph; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.everyItem; import static org.hamcrest.Matchers.instanceOf; import static org.hamcrest.Matchers.not; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.mockito.Mockito.when; @@ -58,9 +58,6 @@ /** Tests for {@link CreateRegisterFnOperationFunction}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class CreateRegisterFnOperationFunctionTest { @Mock private Supplier portSupplier; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/DeduceNodeLocationsFunctionTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/DeduceNodeLocationsFunctionTest.java index a1824d6f2ac9..fca960884333 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/DeduceNodeLocationsFunctionTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/DeduceNodeLocationsFunctionTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.runners.dataflow.worker.graph; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.instanceOf; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import com.google.api.services.dataflow.model.FlattenInstruction; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/InsertFetchAndFilterStreamingSideInputNodesTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/InsertFetchAndFilterStreamingSideInputNodesTest.java index aa5f01a6d4ef..b8c24d0614fc 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/InsertFetchAndFilterStreamingSideInputNodesTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/InsertFetchAndFilterStreamingSideInputNodesTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.dataflow.worker.graph; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import com.google.api.services.dataflow.model.InstructionOutput; import 
com.google.api.services.dataflow.model.ParDoInstruction; @@ -53,7 +53,7 @@ import org.apache.beam.sdk.transforms.View; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionView; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Equivalence; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Equivalence.Wrapper; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/LengthPrefixUnknownCodersTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/LengthPrefixUnknownCodersTest.java index 250b059e22a5..575e754182b1 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/LengthPrefixUnknownCodersTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/LengthPrefixUnknownCodersTest.java @@ -73,9 +73,6 @@ /** Tests for {@link LengthPrefixUnknownCoders}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class LengthPrefixUnknownCodersTest { private static final Coder>> windowedValueCoder = WindowedValue.getFullCoder( diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/MapTaskToNetworkFunctionTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/MapTaskToNetworkFunctionTest.java index 09d2205eaed1..dcef92c902fa 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/MapTaskToNetworkFunctionTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/MapTaskToNetworkFunctionTest.java @@ -17,6 +17,7 @@ */ package org.apache.beam.runners.dataflow.worker.graph; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.greaterThanOrEqualTo; import static org.hamcrest.Matchers.instanceOf; import static org.hamcrest.collection.IsEmptyCollection.emptyCollectionOf; @@ -24,7 +25,6 @@ import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertNotNull; import static org.junit.Assert.assertNull; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import com.google.api.services.dataflow.model.FlattenInstruction; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/NetworksTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/NetworksTest.java index b027233a6783..9c2dc0300ea8 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/NetworksTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/NetworksTest.java @@ -17,11 +17,11 @@ */ package org.apache.beam.runners.dataflow.worker.graph; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; 
import static org.hamcrest.Matchers.greaterThan; import static org.hamcrest.collection.IsEmptyCollection.empty; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.spy; import static org.mockito.Mockito.times; import static org.mockito.Mockito.verify; @@ -46,9 +46,6 @@ /** Tests for {@link Networks}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class NetworksTest { @Test public void testTopologicalSortWithEmptyNetwork() { diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/NodesTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/NodesTest.java index 83c1665cf47d..bfa57aaa2ca7 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/NodesTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/NodesTest.java @@ -57,7 +57,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class NodesTest { private static final String PCOLLECTION_ID = "fakeId"; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/RemoveFlattenInstructionsFunctionTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/RemoveFlattenInstructionsFunctionTest.java index a5724ae254d5..c243a1eb28f1 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/RemoveFlattenInstructionsFunctionTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/RemoveFlattenInstructionsFunctionTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.runners.dataflow.worker.graph; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import com.google.api.services.dataflow.model.FlattenInstruction; import com.google.api.services.dataflow.model.InstructionOutput; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/ReplacePgbkWithPrecombineFunctionTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/ReplacePgbkWithPrecombineFunctionTest.java index 29a29915b4be..8b3872994c2c 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/ReplacePgbkWithPrecombineFunctionTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/ReplacePgbkWithPrecombineFunctionTest.java @@ -17,11 +17,11 @@ */ package org.apache.beam.runners.dataflow.worker.graph; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNotNull; import static org.junit.Assert.assertNull; -import static org.junit.Assert.assertThat; import 
com.google.api.services.dataflow.model.InstructionOutput; import com.google.api.services.dataflow.model.ParDoInstruction; @@ -46,9 +46,6 @@ /** Tests for {@link ReplacePgbkWithPrecombineFunction}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public final class ReplacePgbkWithPrecombineFunctionTest { @Test diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingHandlerTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingHandlerTest.java index 312470a0b23b..99638b0c8c5f 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingHandlerTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingHandlerTest.java @@ -35,7 +35,7 @@ import org.apache.beam.runners.dataflow.worker.NameContextsForTests; import org.apache.beam.runners.dataflow.worker.TestOperationContext.TestDataflowExecutionState; import org.apache.beam.runners.dataflow.worker.testing.RestoreDataflowLoggingMDC; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Timestamp; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Timestamp; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Supplier; import org.junit.After; import org.junit.Before; @@ -47,9 +47,6 @@ /** Unit tests for {@link DataflowWorkerLoggingHandler}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DataflowWorkerLoggingHandlerTest { @Rule public TestRule restoreMDC = new RestoreDataflowLoggingMDC(); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingInitializerTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingInitializerTest.java index db08e242b7b3..72501ce4b3c0 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingInitializerTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingInitializerTest.java @@ -23,12 +23,12 @@ import static org.apache.beam.runners.dataflow.worker.logging.DataflowWorkerLoggingInitializer.SDK_FILEPATH_PROPERTY; import static org.apache.beam.runners.dataflow.worker.logging.DataflowWorkerLoggingInitializer.getLoggingHandler; import static org.apache.beam.runners.dataflow.worker.logging.DataflowWorkerLoggingInitializer.getSdkLoggingHandler; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsString; import static org.hamcrest.Matchers.hasItem; import static org.hamcrest.Matchers.instanceOf; import static org.hamcrest.Matchers.not; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.io.File; @@ -65,9 +65,6 @@ * not safe to assert on log counts or whether the retrieved log collection is empty. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DataflowWorkerLoggingInitializerTest { @Rule public TemporaryFolder logFolder = new TemporaryFolder(); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/logging/JulHandlerPrintStreamAdapterFactoryTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/logging/JulHandlerPrintStreamAdapterFactoryTest.java index 804ec532f840..2bd26c77734b 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/logging/JulHandlerPrintStreamAdapterFactoryTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/logging/JulHandlerPrintStreamAdapterFactoryTest.java @@ -18,14 +18,14 @@ package org.apache.beam.runners.dataflow.worker.logging; import static org.apache.beam.runners.dataflow.worker.LogRecordMatcher.hasLogItem; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.blankOrNullString; import static org.hamcrest.Matchers.empty; import static org.hamcrest.Matchers.is; import static org.hamcrest.Matchers.not; -import static org.junit.Assert.assertThat; import java.io.PrintStream; -import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; import java.util.logging.Level; import java.util.logging.LogRecord; import org.apache.beam.runners.dataflow.worker.LogSaver; @@ -37,9 +37,6 @@ /** Tests for {@link JulHandlerPrintStreamAdapterFactory}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class JulHandlerPrintStreamAdapterFactoryTest { private static final String LOGGER_NAME = "test"; @@ -70,7 +67,8 @@ public void testLogOnNewLine() { @Test public void testLogRecordMetadata() { PrintStream printStream = - JulHandlerPrintStreamAdapterFactory.create(handler, "fooLogger", Level.WARNING); + JulHandlerPrintStreamAdapterFactory.create( + handler, "fooLogger", Level.WARNING, StandardCharsets.UTF_8); printStream.println("anyMessage"); assertThat(handler.getLogs(), not(empty())); @@ -124,14 +122,14 @@ public void testLogOnClose() { public void testLogRawBytes() { PrintStream printStream = createPrintStreamAdapter(); String msg = "♠ ♡ ♢ ♣ ♤ ♥ ♦ ♧"; - byte[] bytes = msg.getBytes(Charset.defaultCharset()); + byte[] bytes = msg.getBytes(StandardCharsets.UTF_8); printStream.write(bytes, 0, 1); printStream.write(bytes, 1, 4); printStream.write(bytes, 5, 15); printStream.write(bytes, 20, bytes.length - 20); assertThat(handler.getLogs(), is(empty())); String newlineMsg = "♠ ♡ \n♦ ♧"; - byte[] newlineMsgBytes = newlineMsg.getBytes(Charset.defaultCharset()); + byte[] newlineMsgBytes = newlineMsg.getBytes(StandardCharsets.UTF_8); printStream.write(newlineMsgBytes, 0, newlineMsgBytes.length); assertThat(handler.getLogs(), hasLogItem(msg + newlineMsg)); } @@ -154,7 +152,7 @@ public void testNoEmptyMessages() { printStream.flush(); printStream.print(""); printStream.flush(); - byte[] bytes = "a".getBytes(Charset.defaultCharset()); + byte[] bytes = "a".getBytes(StandardCharsets.UTF_8); printStream.write(bytes, 0, 0); printStream.flush(); } @@ -165,6 +163,7 @@ public void testNoEmptyMessages() { } private PrintStream createPrintStreamAdapter() { - return JulHandlerPrintStreamAdapterFactory.create(handler, LOGGER_NAME, Level.INFO); + return 
JulHandlerPrintStreamAdapterFactory.create( + handler, LOGGER_NAME, Level.INFO, StandardCharsets.UTF_8); } } diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/profiler/ScopedProfilerTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/profiler/ScopedProfilerTest.java index cea566df4162..9bb5b97e6233 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/profiler/ScopedProfilerTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/profiler/ScopedProfilerTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.dataflow.worker.profiler; +import static org.hamcrest.MatcherAssert.assertThat; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import java.util.HashMap; import org.apache.beam.runners.dataflow.worker.profiler.ScopedProfiler.ProfileScope; @@ -31,9 +31,6 @@ /** Tests for {@link ScopedProfiler}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ScopedProfilerTest { @Test diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/status/DebugCaptureTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/status/DebugCaptureTest.java index 0de0aca1d7e7..767cda651195 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/status/DebugCaptureTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/status/DebugCaptureTest.java @@ -42,9 +42,6 @@ /** Tests for {@link DebugCapture}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DebugCaptureTest { private static final String PROJECT_ID = "some-project"; private static final String REGION = "some-region"; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/status/ThreadzServletTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/status/ThreadzServletTest.java index 5b8de01ebe75..9658e5307541 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/status/ThreadzServletTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/status/ThreadzServletTest.java @@ -18,8 +18,8 @@ package org.apache.beam.runners.dataflow.worker.status; import static org.apache.beam.runners.dataflow.worker.status.ThreadzServlet.Stack; +import static org.hamcrest.MatcherAssert.assertThat; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import java.util.Arrays; import java.util.List; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/status/WorkerStatusPagesTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/status/WorkerStatusPagesTest.java index c6e03ba2340c..bea39bd7822e 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/status/WorkerStatusPagesTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/status/WorkerStatusPagesTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.dataflow.worker.status; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsString; -import static org.junit.Assert.assertThat; import java.util.function.BooleanSupplier; import org.apache.beam.runners.dataflow.worker.util.MemoryMonitor; @@ -34,9 +34,6 @@ /** Tests for {@link WorkerStatusPages}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class WorkerStatusPagesTest { private final Server server = new Server(); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/testing/GenericJsonMatcherTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/testing/GenericJsonMatcherTest.java index 575495cc627b..146fb1734ac8 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/testing/GenericJsonMatcherTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/testing/GenericJsonMatcherTest.java @@ -30,9 +30,6 @@ /** Tests for {@link GenericJsonMatcher}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class GenericJsonMatcherTest { @Test diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/testing/RestoreDataflowLoggingMDC.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/testing/RestoreDataflowLoggingMDC.java index 40c9d55026e6..eed8ca9fbbb9 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/testing/RestoreDataflowLoggingMDC.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/testing/RestoreDataflowLoggingMDC.java @@ -21,9 +21,6 @@ import org.junit.rules.ExternalResource; /** Saves, clears and restores the current thread-local logging parameters for tests. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class RestoreDataflowLoggingMDC extends ExternalResource { private String previousJobId; private String previousStageName; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/testing/TestCountingSource.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/testing/TestCountingSource.java index f57134c5bcb1..6771e9dbb713 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/testing/TestCountingSource.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/testing/TestCountingSource.java @@ -43,9 +43,6 @@ * The reader will occasionally return false from {@code advance}, in order to simulate a source * where not all the data is available immediately. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TestCountingSource extends UnboundedSource<KV<Integer, Integer>, TestCountingSource.CounterMark> { private static final Logger LOG = LoggerFactory.getLogger(TestCountingSource.class); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/BatchGroupAlsoByWindowFnsTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/BatchGroupAlsoByWindowFnsTest.java index 775466a9312b..715cb8b30004 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/BatchGroupAlsoByWindowFnsTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/BatchGroupAlsoByWindowFnsTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.dataflow.worker.util; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.instanceOf; -import static org.junit.Assert.assertThat; import org.apache.beam.runners.core.InMemoryStateInternals; import org.apache.beam.runners.core.StateInternals; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/BatchGroupAlsoByWindowReshuffleDoFnTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/BatchGroupAlsoByWindowReshuffleDoFnTest.java index f3fbc42b4c30..3d2052ca84fc 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/BatchGroupAlsoByWindowReshuffleDoFnTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/BatchGroupAlsoByWindowReshuffleDoFnTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.dataflow.worker.util; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.util.Arrays; import java.util.List; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/CounterHamcrestMatchers.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/CounterHamcrestMatchers.java index d1290e03b10e..2cb0833c2e0f 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/CounterHamcrestMatchers.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/CounterHamcrestMatchers.java @@ -36,9 +36,6 @@ import org.hamcrest.TypeSafeMatcher; /** Matchers for {@link Counter} and {@link CounterUpdate}. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public final class CounterHamcrestMatchers { private CounterHamcrestMatchers() {} diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/GroupAlsoByWindowProperties.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/GroupAlsoByWindowProperties.java index e94c5485f3d5..79c2b9a1a2e4 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/GroupAlsoByWindowProperties.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/GroupAlsoByWindowProperties.java @@ -17,11 +17,11 @@ */ package org.apache.beam.runners.dataflow.worker.util; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.hasSize; -import static org.junit.Assert.assertThat; import java.util.ArrayList; import java.util.Arrays; @@ -63,9 +63,6 @@ * which the implementation is applicable. For example, some {@code GroupAlsoByWindows} may not * support merging windows. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class GroupAlsoByWindowProperties { /** @@ -178,14 +175,12 @@ public static void groupsElementsIntoSlidingWindowsWithMinTimestamp( TimestampedValue<KV<String, Iterable<String>>> item1 = getOnlyElementInWindow(result, window(0, 20)); assertThat(item1.getValue().getValue(), containsInAnyOrder("v1", "v2")); - // Timestamp adjusted by WindowFn to exceed the end of the prior sliding window - assertThat(item1.getTimestamp(), equalTo(new Instant(10))); + assertThat(item1.getTimestamp(), equalTo(new Instant(5))); TimestampedValue<KV<String, Iterable<String>>> item2 = getOnlyElementInWindow(result, window(10, 30)); assertThat(item2.getValue().getValue(), contains("v2")); - // Timestamp adjusted by WindowFn to exceed the end of the prior sliding window - assertThat(item2.getTimestamp(), equalTo(new Instant(20))); + assertThat(item2.getTimestamp(), equalTo(new Instant(15))); } /** @@ -234,14 +229,12 @@ public static void combinesElementsInSlidingWindows( TimestampedValue<KV<String, Long>> item1 = getOnlyElementInWindow(result, window(0, 20)); assertThat(item1.getValue().getKey(), equalTo("k")); assertThat(item1.getValue().getValue(), equalTo(combineFn.apply(ImmutableList.of(1L, 2L, 4L)))); - // Timestamp adjusted by WindowFn to exceed the end of the prior sliding window - assertThat(item1.getTimestamp(), equalTo(new Instant(10L))); + assertThat(item1.getTimestamp(), equalTo(new Instant(5L))); TimestampedValue<KV<String, Long>> item2 = getOnlyElementInWindow(result, window(10, 30)); assertThat(item2.getValue().getKey(), equalTo("k")); assertThat(item2.getValue().getValue(), equalTo(combineFn.apply(ImmutableList.of(2L, 4L)))); - // Timestamp adjusted by WindowFn to exceed the end of the prior sliding window - assertThat(item2.getTimestamp(), equalTo(new Instant(20L))); + assertThat(item2.getTimestamp(), equalTo(new Instant(15L))); } /** diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/MemoryMonitorTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/MemoryMonitorTest.java index 7d91b51b19e3..4a96b75d9624 100644 --- 
a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/MemoryMonitorTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/MemoryMonitorTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.dataflow.worker.util; +import static org.hamcrest.MatcherAssert.assertThat; import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertNotNull; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.io.File; @@ -39,9 +39,6 @@ * Test the memory monitor will block threads when the server is in a (faked) GC thrashing state. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class MemoryMonitorTest { @Rule public TemporaryFolder tempFolder = new TemporaryFolder(); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/ScalableBloomFilterTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/ScalableBloomFilterTest.java index 27fb83de64b8..db47f1a75de2 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/ScalableBloomFilterTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/ScalableBloomFilterTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.dataflow.worker.util; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.lessThan; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.nio.ByteBuffer; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/TimerOrElementTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/TimerOrElementTest.java index b4ae74ab3d32..45433dc4c5f7 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/TimerOrElementTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/TimerOrElementTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.dataflow.worker.util; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.instanceOf; -import static org.junit.Assert.assertThat; import java.util.Collections; import java.util.List; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/TaggedReiteratorListTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/TaggedReiteratorListTest.java index aa5c3c4bdcbe..454e7f129f9b 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/TaggedReiteratorListTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/TaggedReiteratorListTest.java @@ -29,7 +29,7 @@ /** Tests for {@link TaggedReiteratorList}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({"keyfor", "nullness"}) // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +@SuppressWarnings({"keyfor"}) public class TaggedReiteratorListTest { @Test diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/BatchingShuffleEntryReaderTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/BatchingShuffleEntryReaderTest.java index 6ceff5664c11..798acf39bbba 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/BatchingShuffleEntryReaderTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/BatchingShuffleEntryReaderTest.java @@ -18,9 +18,9 @@ package org.apache.beam.runners.dataflow.worker.util.common.worker; import static com.google.api.client.util.Lists.newArrayList; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.verify; import static org.mockito.Mockito.verifyNoMoreInteractions; import static org.mockito.Mockito.when; @@ -39,9 +39,6 @@ /** Unit tests for {@link BatchingShuffleEntryReader}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public final class BatchingShuffleEntryReaderTest { private static final byte[] KEY = {0xA}; private static final byte[] SKEY = {0xB}; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/CachingShuffleBatchReaderTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/CachingShuffleBatchReaderTest.java index 88db3b2350e7..c1db7efdf088 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/CachingShuffleBatchReaderTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/CachingShuffleBatchReaderTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.dataflow.worker.util.common.worker; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.notNullValue; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import static org.mockito.Mockito.mock; import static org.mockito.Mockito.times; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/ExecutorTestUtils.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/ExecutorTestUtils.java index 642db15a1e60..8acdc662f8fd 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/ExecutorTestUtils.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/ExecutorTestUtils.java @@ -35,7 +35,6 @@ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) "unchecked", - "nullness" // 
TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class ExecutorTestUtils { // Do not instantiate. diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/FlattenOperationTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/FlattenOperationTest.java index ad862b8f3465..e8d9d303a643 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/FlattenOperationTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/FlattenOperationTest.java @@ -19,6 +19,7 @@ import static org.apache.beam.runners.dataflow.worker.util.common.worker.TestOutputReceiver.TestOutputCounter.getMeanByteCounterName; import static org.apache.beam.runners.dataflow.worker.util.common.worker.TestOutputReceiver.TestOutputCounter.getObjectCounterName; +import static org.hamcrest.MatcherAssert.assertThat; import static org.mockito.Mockito.verify; import static org.mockito.Mockito.verifyNoMoreInteractions; import static org.mockito.Mockito.when; @@ -30,7 +31,6 @@ import org.apache.beam.runners.dataflow.worker.counters.CounterSet; import org.apache.beam.runners.dataflow.worker.counters.NameContext; import org.hamcrest.CoreMatchers; -import org.junit.Assert; import org.junit.Test; import org.junit.runner.RunWith; import org.junit.runners.JUnit4; @@ -64,8 +64,7 @@ public void testRunFlattenOperation() throws Exception { flattenOperation.finish(); - Assert.assertThat( - receiver.outputElems, CoreMatchers.hasItems("hi", "there", "", "bob")); + assertThat(receiver.outputElems, CoreMatchers.hasItems("hi", "there", "", "bob")); CounterUpdateExtractor updateExtractor = Mockito.mock(CounterUpdateExtractor.class); counterSet.extractUpdates(false, updateExtractor); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/GroupingShuffleEntryIteratorTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/GroupingShuffleEntryIteratorTest.java index 05694a6fdd58..5ca741fa5ad9 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/GroupingShuffleEntryIteratorTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/GroupingShuffleEntryIteratorTest.java @@ -58,7 +58,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class GroupingShuffleEntryIteratorTest { private static final ByteArrayShufflePosition START_POSITION = diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/GroupingTablesTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/GroupingTablesTest.java index 86976e98ed1c..70060eebfa4e 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/GroupingTablesTest.java +++ 
b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/GroupingTablesTest.java @@ -17,13 +17,13 @@ */ package org.apache.beam.runners.dataflow.worker.util.common.worker; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.anyOf; import static org.hamcrest.Matchers.empty; import static org.hamcrest.Matchers.hasItem; import static org.hamcrest.Matchers.in; import static org.hamcrest.core.Is.is; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import java.util.Arrays; import java.util.List; @@ -48,9 +48,6 @@ /** Unit tests for {@link GroupingTables}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class GroupingTablesTest { @Test diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/MapTaskExecutorTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/MapTaskExecutorTest.java index b92bd4419c35..607efe4c1850 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/MapTaskExecutorTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/MapTaskExecutorTest.java @@ -27,10 +27,10 @@ import static org.apache.beam.runners.dataflow.worker.SourceTranslationUtils.cloudProgressToReaderProgress; import static org.apache.beam.runners.dataflow.worker.SourceTranslationUtils.splitRequestToApproximateSplitRequest; import static org.apache.beam.runners.dataflow.worker.counters.CounterName.named; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.arrayWithSize; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import static org.mockito.Matchers.anyBoolean; import static org.mockito.Matchers.eq; @@ -75,9 +75,6 @@ /** Tests for {@link MapTaskExecutor}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class MapTaskExecutorTest { private static final String COUNTER_PREFIX = "test-"; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/OutputObjectAndByteCounterTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/OutputObjectAndByteCounterTest.java index 6527271ce105..67370bdea86b 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/OutputObjectAndByteCounterTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/OutputObjectAndByteCounterTest.java @@ -44,9 +44,6 @@ /** Tests for {@link OutputObjectAndByteCounter}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class OutputObjectAndByteCounterTest { private final CounterSet counterSet = new CounterSet(); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/OutputReceiverTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/OutputReceiverTest.java index b062f79e3d92..69f4b2959a0e 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/OutputReceiverTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/OutputReceiverTest.java @@ -18,6 +18,7 @@ package org.apache.beam.runners.dataflow.worker.util.common.worker; import static org.apache.beam.runners.dataflow.worker.NameContextsForTests.nameContextForTest; +import static org.hamcrest.MatcherAssert.assertThat; import org.apache.beam.runners.dataflow.worker.counters.CounterFactory.CounterMean; import org.apache.beam.runners.dataflow.worker.counters.CounterSet; @@ -65,7 +66,7 @@ public void testMultipleOutputReceiver() throws Exception { CounterMean meanByteCount = outputCounter.getMeanByteCount().getAggregate(); Assert.assertEquals(7, meanByteCount.getAggregate().longValue()); Assert.assertEquals(2, meanByteCount.getCount()); - Assert.assertThat(receiver1.outputElems, CoreMatchers.hasItems("hi", "bob")); - Assert.assertThat(receiver2.outputElems, CoreMatchers.hasItems("hi", "bob")); + assertThat(receiver1.outputElems, CoreMatchers.hasItems("hi", "bob")); + assertThat(receiver2.outputElems, CoreMatchers.hasItems("hi", "bob")); } } diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/ParDoOperationTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/ParDoOperationTest.java index ceebf68d6503..812a79b14817 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/ParDoOperationTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/ParDoOperationTest.java @@ -19,7 +19,7 @@ import static org.apache.beam.runners.dataflow.worker.util.common.worker.TestOutputReceiver.TestOutputCounter.getMeanByteCounterName; import static org.apache.beam.runners.dataflow.worker.util.common.worker.TestOutputReceiver.TestOutputCounter.getObjectCounterName; -import static org.junit.Assert.assertThat; +import static org.hamcrest.MatcherAssert.assertThat; import static org.mockito.Matchers.anyBoolean; import static org.mockito.Matchers.eq; import static org.mockito.Mockito.verify; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/ReadOperationTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/ReadOperationTest.java index a061e8a3f773..a2245deebe29 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/ReadOperationTest.java +++ 
b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/ReadOperationTest.java @@ -27,11 +27,11 @@ import static org.apache.beam.runners.dataflow.worker.counters.CounterName.named; import static org.apache.beam.runners.dataflow.worker.util.common.worker.TestOutputReceiver.TestOutputCounter.getMeanByteCounterName; import static org.apache.beam.runners.dataflow.worker.util.common.worker.TestOutputReceiver.TestOutputCounter.getObjectCounterName; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNotNull; import static org.junit.Assert.assertNull; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.mockito.Matchers.any; import static org.mockito.Matchers.anyBoolean; @@ -72,9 +72,6 @@ /** Tests for ReadOperation. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ReadOperationTest { private static final String COUNTER_PREFIX = "test-"; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/ShuffleEntryTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/ShuffleEntryTest.java index 7a894d60d9a7..38cb642b2d3f 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/ShuffleEntryTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/ShuffleEntryTest.java @@ -17,11 +17,11 @@ */ package org.apache.beam.runners.dataflow.worker.util.common.worker; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.not; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import org.junit.Test; import org.junit.runner.RunWith; @@ -29,9 +29,6 @@ /** Unit tests for {@link ShuffleEntry}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ShuffleEntryTest { private static final byte[] KEY = {0xA}; private static final byte[] SKEY = {0xB}; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/StubbedExecutor.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/StubbedExecutor.java index 31eed49dbe27..7555f2bd398c 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/StubbedExecutor.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/StubbedExecutor.java @@ -41,9 +41,6 @@ * get the executor and use it to schedule the initial task, then use {@link runNextRunnable()} to * run the initial task, then use {@link runNextRunnable()} to run the second task, etc. 
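 *
 * <p>Editor's note, not part of the original patch: a minimal usage sketch of the flow this
 * javadoc describes. Only {@code runNextRunnable()} appears above; the {@code getExecutor()}
 * accessor and the two task variables are assumed names used purely for illustration.
 *
 * <pre>{@code
 * StubbedExecutor stubbed = new StubbedExecutor();
 * // Schedule work against the stubbed executor; nothing runs yet.
 * stubbed.getExecutor().execute(firstTask);
 * stubbed.getExecutor().execute(secondTask);
 * // Drive the queued tasks one at a time, in order.
 * stubbed.runNextRunnable(); // runs firstTask
 * stubbed.runNextRunnable(); // runs secondTask
 * }</pre>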
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class StubbedExecutor { private static final Logger LOG = LoggerFactory.getLogger(StubbedExecutor.class); diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/TestOutputReceiver.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/TestOutputReceiver.java index c2e591d27810..afd86aa995e6 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/TestOutputReceiver.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/TestOutputReceiver.java @@ -30,9 +30,6 @@ import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; /** An OutputReceiver that allows the output elements to be retrieved. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TestOutputReceiver extends OutputReceiver { private static final String OBJECT_COUNTER_NAME = "-ObjectCount"; private static final String MEAN_BYTE_COUNTER_NAME = "-MeanByteCount"; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/WorkProgressUpdaterTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/WorkProgressUpdaterTest.java index f2b2efc6fe7a..3b060c8c7a6b 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/WorkProgressUpdaterTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/WorkProgressUpdaterTest.java @@ -35,9 +35,6 @@ /** Unit tests for {@link WorkProgressUpdater}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class WorkProgressUpdaterTest { /** * WorkProgressUpdater relies on subclasses to implement some of its functionality, particularly diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/WriteOperationTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/WriteOperationTest.java index 376564299ccb..2c2c8bf32907 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/WriteOperationTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/common/worker/WriteOperationTest.java @@ -19,8 +19,8 @@ import static org.apache.beam.runners.dataflow.worker.counters.CounterName.named; import static org.hamcrest.CoreMatchers.hasItems; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import static org.mockito.Matchers.anyBoolean; import static org.mockito.Matchers.eq; diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/windmill/GrpcWindmillServerTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/windmill/GrpcWindmillServerTest.java index 7cb7bcdadf5e..3a58685793b4 100644 --- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/windmill/GrpcWindmillServerTest.java +++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/windmill/GrpcWindmillServerTest.java @@ -58,13 +58,13 @@ import org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub.CommitWorkStream; import org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub.GetDataStream; import org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub.GetWorkStream; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Server; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Status; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.StatusRuntimeException; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.inprocess.InProcessServerBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.util.MutableHandlerRegistry; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Server; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Status; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.StatusRuntimeException; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessServerBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.util.MutableHandlerRegistry; import org.checkerframework.checker.nullness.qual.Nullable; import org.hamcrest.Matchers; import org.joda.time.Instant; @@ -82,7 +82,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class 
GrpcWindmillServerTest { private static final Logger LOG = LoggerFactory.getLogger(GrpcWindmillServerTest.class); diff --git a/runners/google-cloud-dataflow-java/worker/windmill/src/main/proto/windmill.proto b/runners/google-cloud-dataflow-java/worker/windmill/src/main/proto/windmill.proto index b0e8bda375ff..fdb37ba06971 100644 --- a/runners/google-cloud-dataflow-java/worker/windmill/src/main/proto/windmill.proto +++ b/runners/google-cloud-dataflow-java/worker/windmill/src/main/proto/windmill.proto @@ -87,6 +87,29 @@ message TagValue { optional string state_family = 3; } +message TagValuePrefix { + optional bytes tag_prefix = 1; + optional string state_family = 2; +} + +message TagValuePrefixRequest { + optional bytes tag_prefix = 1; + optional string state_family = 2; + // In request: A previously returned continuation_token from an earlier + // request. Indicates we wish to fetch the next page of values. + // In response: Copied from request. + optional bytes request_position = 3; + optional int64 fetch_max_bytes = 4 [default = 0x6400000]; +} + +message TagValuePrefixResponse { + optional bytes tag_prefix = 1; + optional string state_family = 2; + repeated TagValue tag_values = 3; + optional bytes continuation_position = 4; + optional bytes request_position = 5; +} + message TagBag { optional bytes tag = 1; // In request: All existing items in the list will be deleted. If new values @@ -256,6 +279,7 @@ message KeyedGetDataRequest { required fixed64 work_token = 2; optional fixed64 sharding_key = 6; repeated TagValue values_to_fetch = 3; + repeated TagValuePrefixRequest tag_value_prefixes_to_fetch = 10; repeated TagBag bags_to_fetch = 8; // Must be at most one sorted_list_to_fetch for a given state family and tag. repeated TagSortedListFetchRequest sorted_lists_to_fetch = 9; @@ -286,6 +310,7 @@ message KeyedGetDataResponse { // The response for this key is not populated due to the fetch failing. optional bool failed = 2; repeated TagValue values = 3; + repeated TagValuePrefixResponse tag_value_prefixes = 9; repeated TagBag bags = 6; // There is one TagSortedListFetchResponse per state-family, tag pair. 
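// Editor's note, not part of this patch: the TagValuePrefix* messages added above describe a
// paged prefix scan over tag values. A hedged sketch of the flow, inferred from the field
// comments and names:
//   1. The first KeyedGetDataRequest carries a TagValuePrefixRequest with tag_prefix and
//      state_family set and request_position unset; fetch_max_bytes (default 0x6400000,
//      i.e. 100 MiB) bounds the size of each page.
//   2. The TagValuePrefixResponse returns the matching tag_values, echoes the request_position
//      it was given, and sets continuation_position when more values remain.
//   3. The caller copies continuation_position into request_position of the next request and
//      repeats until no continuation_position is returned.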
repeated TagSortedListFetchResponse tag_sorted_lists = 8; @@ -351,6 +376,7 @@ message WorkItemCommitRequest { repeated PubSubMessageBundle pubsub_messages = 7; repeated Timer output_timers = 4; repeated TagValue value_updates = 5; + repeated TagValuePrefix tag_value_prefix_deletes = 25; repeated TagBag bag_updates = 18; repeated TagSortedListUpdateRequest sorted_list_updates = 24; repeated Counter counter_updates = 8; diff --git a/runners/java-fn-execution/build.gradle b/runners/java-fn-execution/build.gradle index 94a875c8de62..0ee96414dbdb 100644 --- a/runners/java-fn-execution/build.gradle +++ b/runners/java-fn-execution/build.gradle @@ -31,18 +31,15 @@ dependencies { compile project(path: ":sdks:java:core", configuration: "shadow") compile project(":sdks:java:fn-execution") compile project(":runners:core-construction-java") - compile project(path: ":vendor:sdks-java-extensions-protobuf", configuration: "shadow") - compile library.java.vendored_grpc_1_26_0 - compile library.java.commons_compress + compile library.java.vendored_grpc_1_36_0 compile library.java.slf4j_api - compile library.java.args4j + compile project(path: ":model:job-management", configuration: "shadow") + compile library.java.joda_time testCompile project(":sdks:java:harness") testCompile project(":runners:core-construction-java") testCompile project(path: ":runners:core-java", configuration: "testRuntime") testCompile project(path: ":sdks:java:core", configuration: "shadowTest") testCompile library.java.junit - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library testCompile library.java.mockito_core testRuntimeOnly library.java.slf4j_simple } diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/ArtifactRetrievalService.java b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/ArtifactRetrievalService.java index 269f8443f625..a40f248f300e 100644 --- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/ArtifactRetrievalService.java +++ b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/ArtifactRetrievalService.java @@ -28,12 +28,12 @@ import org.apache.beam.runners.core.construction.ArtifactResolver; import org.apache.beam.runners.core.construction.BeamUrns; import org.apache.beam.runners.core.construction.DefaultArtifactResolver; -import org.apache.beam.runners.fnexecution.FnService; +import org.apache.beam.sdk.fn.server.FnService; import org.apache.beam.sdk.io.FileSystems; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Status; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.StatusException; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Status; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.StatusException; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; /** An {@link ArtifactRetrievalService} that uses {@link FileSystems} as its backing storage. 
*/ @SuppressWarnings({ diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/ArtifactStagingService.java b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/ArtifactStagingService.java index bbe650ee24ab..fba8c87af954 100644 --- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/ArtifactStagingService.java +++ b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/ArtifactStagingService.java @@ -42,20 +42,20 @@ import org.apache.beam.model.jobmanagement.v1.ArtifactApi; import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc; import org.apache.beam.model.pipeline.v1.RunnerApi; -import org.apache.beam.runners.fnexecution.FnService; import org.apache.beam.sdk.fn.IdGenerator; import org.apache.beam.sdk.fn.IdGenerators; +import org.apache.beam.sdk.fn.server.FnService; import org.apache.beam.sdk.io.FileSystems; import org.apache.beam.sdk.io.fs.MatchResult; import org.apache.beam.sdk.io.fs.MoveOptions; import org.apache.beam.sdk.io.fs.ResolveOptions; import org.apache.beam.sdk.io.fs.ResourceId; import org.apache.beam.sdk.util.MimeTypes; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Status; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.StatusException; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Status; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.StatusException; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Splitter; diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/BundleCheckpointHandlers.java b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/BundleCheckpointHandlers.java index 6ed6127928cd..780881d8d42c 100644 --- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/BundleCheckpointHandlers.java +++ b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/BundleCheckpointHandlers.java @@ -108,7 +108,7 @@ public void onCheckpoint(ProcessBundleResponse response) { // Calculate the watermark hold for the timer. 
long outputTimestamp = BoundedWindow.TIMESTAMP_MAX_VALUE.getMillis(); if (!residual.getApplication().getOutputWatermarksMap().isEmpty()) { - for (org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Timestamp outputWatermark : + for (org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Timestamp outputWatermark : residual.getApplication().getOutputWatermarksMap().values()) { outputTimestamp = Math.min(outputTimestamp, outputWatermark.getSeconds() * 1000); } diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DefaultJobBundleFactory.java b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DefaultJobBundleFactory.java index a0d370ef4730..b9851caa9ea0 100644 --- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DefaultJobBundleFactory.java +++ b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DefaultJobBundleFactory.java @@ -39,9 +39,6 @@ import org.apache.beam.runners.core.construction.PipelineOptionsTranslation; import org.apache.beam.runners.core.construction.Timer; import org.apache.beam.runners.core.construction.graph.ExecutableStage; -import org.apache.beam.runners.fnexecution.GrpcContextHeaderAccessorProvider; -import org.apache.beam.runners.fnexecution.GrpcFnServer; -import org.apache.beam.runners.fnexecution.ServerFactory; import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService; import org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.ExecutableProcessBundleDescriptor; import org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.TimerSpec; @@ -63,6 +60,9 @@ import org.apache.beam.sdk.fn.IdGenerator; import org.apache.beam.sdk.fn.IdGenerators; import org.apache.beam.sdk.fn.data.FnDataReceiver; +import org.apache.beam.sdk.fn.server.GrpcContextHeaderAccessorProvider; +import org.apache.beam.sdk.fn.server.GrpcFnServer; +import org.apache.beam.sdk.fn.server.ServerFactory; import org.apache.beam.sdk.fn.stream.OutboundObserverFactory; import org.apache.beam.sdk.function.ThrowingFunction; import org.apache.beam.sdk.options.ExperimentalOptions; diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java index ba7db0f5e354..2e9554f1e962 100644 --- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java +++ b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java @@ -29,9 +29,9 @@ import org.apache.beam.model.fnexecution.v1.BeamFnApi; import org.apache.beam.model.fnexecution.v1.BeamFnApi.InstructionRequest; import org.apache.beam.sdk.fn.stream.SynchronizedStreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Status; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.StatusRuntimeException; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Status; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.StatusRuntimeException; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.slf4j.Logger; import org.slf4j.LoggerFactory; diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClientPoolService.java 
b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClientPoolService.java index a4e06e2a2b41..b1bea3724ca9 100644 --- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClientPoolService.java +++ b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClientPoolService.java @@ -26,11 +26,11 @@ import javax.annotation.concurrent.GuardedBy; import org.apache.beam.model.fnexecution.v1.BeamFnApi; import org.apache.beam.model.fnexecution.v1.BeamFnControlGrpc; -import org.apache.beam.runners.fnexecution.FnService; -import org.apache.beam.runners.fnexecution.HeaderAccessor; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Status; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.StatusException; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.sdk.fn.server.FnService; +import org.apache.beam.sdk.fn.server.HeaderAccessor; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Status; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.StatusException; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Strings; import org.slf4j.Logger; import org.slf4j.LoggerFactory; diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/ProcessBundleDescriptors.java b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/ProcessBundleDescriptors.java index f9c2d87113db..ae10a801f382 100644 --- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/ProcessBundleDescriptors.java +++ b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/ProcessBundleDescriptors.java @@ -46,6 +46,7 @@ import org.apache.beam.runners.core.construction.graph.TimerReference; import org.apache.beam.runners.core.construction.graph.UserStateReference; import org.apache.beam.runners.fnexecution.data.RemoteInputDestination; +import org.apache.beam.runners.fnexecution.wire.ByteStringCoder; import org.apache.beam.runners.fnexecution.wire.LengthPrefixUnknownCoders; import org.apache.beam.runners.fnexecution.wire.WireCoders; import org.apache.beam.sdk.coders.Coder; @@ -57,11 +58,10 @@ import org.apache.beam.sdk.util.WindowedValue; import org.apache.beam.sdk.util.WindowedValue.FullWindowedValueCoder; import org.apache.beam.sdk.values.KV; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableTable; -import org.apache.beam.vendor.sdk.v2.sdk.extensions.protobuf.ByteStringCoder; import org.checkerframework.checker.nullness.qual.Nullable; /** Utility methods for creating {@link ProcessBundleDescriptor} instances. 
*/ @@ -353,12 +353,10 @@ private static Map> forTimerSpecs( case PROCESSING_TIME: spec = TimerSpecs.timer(TimeDomain.PROCESSING_TIME); break; - case SYNCHRONIZED_PROCESSING_TIME: - spec = TimerSpecs.timer(TimeDomain.SYNCHRONIZED_PROCESSING_TIME); - break; default: throw new IllegalArgumentException( - String.format("Unknown time domain %s", timerFamilySpec.getTimeDomain())); + String.format( + "Unknown or unsupported time domain %s", timerFamilySpec.getTimeDomain())); } for (WireCoderSetting wireCoderSetting : stage.getWireCoderSettings()) { diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/SingleEnvironmentInstanceJobBundleFactory.java b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/SingleEnvironmentInstanceJobBundleFactory.java index 6e696c0157b6..5b734872e088 100644 --- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/SingleEnvironmentInstanceJobBundleFactory.java +++ b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/SingleEnvironmentInstanceJobBundleFactory.java @@ -25,7 +25,6 @@ import org.apache.beam.model.pipeline.v1.RunnerApi.Environment; import org.apache.beam.runners.core.construction.Timer; import org.apache.beam.runners.core.construction.graph.ExecutableStage; -import org.apache.beam.runners.fnexecution.GrpcFnServer; import org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.ExecutableProcessBundleDescriptor; import org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.TimerSpec; import org.apache.beam.runners.fnexecution.data.GrpcDataService; @@ -36,6 +35,7 @@ import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.fn.IdGenerator; import org.apache.beam.sdk.fn.data.FnDataReceiver; +import org.apache.beam.sdk.fn.server.GrpcFnServer; import org.apache.beam.sdk.values.KV; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/TimerReceiverFactory.java b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/TimerReceiverFactory.java index b71cd3c268b0..e46ef2a15746 100644 --- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/TimerReceiverFactory.java +++ b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/TimerReceiverFactory.java @@ -89,6 +89,7 @@ public FnDataReceiver> create(String transformId, String timerFamil StateNamespace namespace = StateNamespaces.window(windowCoder, (BoundedWindow) window); TimerInternals.TimerData timerData = TimerInternals.TimerData.of( + timer.getDynamicTimerTag(), encodeToTimerDataTimerId(timerSpec.transformId(), timerSpec.timerId()), namespace, timer.getClearBit() ? 
BoundedWindow.TIMESTAMP_MAX_VALUE : timer.getFireTimestamp(), diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/data/GrpcDataService.java b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/data/GrpcDataService.java index e8ad75175ebb..87c339e35c92 100644 --- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/data/GrpcDataService.java +++ b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/data/GrpcDataService.java @@ -25,7 +25,6 @@ import java.util.concurrent.TimeoutException; import org.apache.beam.model.fnexecution.v1.BeamFnApi; import org.apache.beam.model.fnexecution.v1.BeamFnDataGrpc; -import org.apache.beam.runners.fnexecution.FnService; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver; import org.apache.beam.sdk.fn.data.BeamFnDataGrpcMultiplexer; @@ -35,9 +34,10 @@ import org.apache.beam.sdk.fn.data.FnDataReceiver; import org.apache.beam.sdk.fn.data.InboundDataClient; import org.apache.beam.sdk.fn.data.LogicalEndpoint; +import org.apache.beam.sdk.fn.server.FnService; import org.apache.beam.sdk.fn.stream.OutboundObserverFactory; import org.apache.beam.sdk.options.PipelineOptions; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.SettableFuture; import org.slf4j.Logger; import org.slf4j.LoggerFactory; diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java index 209908754de8..b7aadd483481 100644 --- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java +++ b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java @@ -28,8 +28,6 @@ import org.apache.beam.model.pipeline.v1.RunnerApi; import org.apache.beam.model.pipeline.v1.RunnerApi.Environment; import org.apache.beam.runners.core.construction.BeamUrns; -import org.apache.beam.runners.fnexecution.GrpcFnServer; -import org.apache.beam.runners.fnexecution.ServerFactory; import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService; import org.apache.beam.runners.fnexecution.control.ControlClientPool; import org.apache.beam.runners.fnexecution.control.FnApiControlClientPoolService; @@ -37,6 +35,8 @@ import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService; import org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService; import org.apache.beam.sdk.fn.IdGenerator; +import org.apache.beam.sdk.fn.server.GrpcFnServer; +import org.apache.beam.sdk.fn.server.ServerFactory; import org.apache.beam.sdk.options.ManualDockerEnvironmentOptions; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.RemoteEnvironmentOptions; diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/EmbeddedEnvironmentFactory.java b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/EmbeddedEnvironmentFactory.java index 73e14f6a25bb..25f248c4e17d 100644 --- 
a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/EmbeddedEnvironmentFactory.java +++ b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/EmbeddedEnvironmentFactory.java @@ -20,6 +20,7 @@ import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; import java.time.Duration; +import java.util.Collections; import java.util.concurrent.ExecutorService; import java.util.concurrent.Executors; import java.util.concurrent.Future; @@ -27,9 +28,6 @@ import org.apache.beam.fn.harness.FnHarness; import org.apache.beam.model.pipeline.v1.Endpoints.ApiServiceDescriptor; import org.apache.beam.model.pipeline.v1.RunnerApi.Environment; -import org.apache.beam.runners.fnexecution.GrpcFnServer; -import org.apache.beam.runners.fnexecution.InProcessServerFactory; -import org.apache.beam.runners.fnexecution.ServerFactory; import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService; import org.apache.beam.runners.fnexecution.control.ControlClientPool; import org.apache.beam.runners.fnexecution.control.ControlClientPool.Source; @@ -38,6 +36,9 @@ import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService; import org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService; import org.apache.beam.sdk.fn.IdGenerator; +import org.apache.beam.sdk.fn.server.GrpcFnServer; +import org.apache.beam.sdk.fn.server.InProcessServerFactory; +import org.apache.beam.sdk.fn.server.ServerFactory; import org.apache.beam.sdk.fn.stream.OutboundObserverFactory; import org.apache.beam.sdk.fn.test.InProcessManagedChannelFactory; import org.apache.beam.sdk.options.PipelineOptions; @@ -100,8 +101,10 @@ public RemoteEnvironment createEnvironment(Environment environment, String worke FnHarness.main( workerId, options, + Collections.emptySet(), // Runner capabilities. loggingServer.getApiServiceDescriptor(), controlServer.getApiServiceDescriptor(), + null, InProcessManagedChannelFactory.create(), OutboundObserverFactory.clientDirect()); } catch (NoClassDefFoundError e) { diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/EnvironmentFactory.java b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/EnvironmentFactory.java index 267edcbcd0b6..c4780ae49600 100644 --- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/EnvironmentFactory.java +++ b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/EnvironmentFactory.java @@ -19,8 +19,6 @@ import org.apache.beam.model.pipeline.v1.RunnerApi; import org.apache.beam.model.pipeline.v1.RunnerApi.Environment; -import org.apache.beam.runners.fnexecution.GrpcFnServer; -import org.apache.beam.runners.fnexecution.ServerFactory; import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService; import org.apache.beam.runners.fnexecution.control.ControlClientPool; import org.apache.beam.runners.fnexecution.control.FnApiControlClientPoolService; @@ -28,6 +26,8 @@ import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService; import org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService; import org.apache.beam.sdk.fn.IdGenerator; +import org.apache.beam.sdk.fn.server.GrpcFnServer; +import org.apache.beam.sdk.fn.server.ServerFactory; /** Creates {@link Environment environments} which communicate to an {@link SdkHarnessClient}. 
*/ public interface EnvironmentFactory { diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/ExternalEnvironmentFactory.java b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/ExternalEnvironmentFactory.java index 1852d61732a4..3e147efc2149 100644 --- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/ExternalEnvironmentFactory.java +++ b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/ExternalEnvironmentFactory.java @@ -25,8 +25,6 @@ import org.apache.beam.model.pipeline.v1.RunnerApi; import org.apache.beam.model.pipeline.v1.RunnerApi.Environment; import org.apache.beam.runners.core.construction.BeamUrns; -import org.apache.beam.runners.fnexecution.GrpcFnServer; -import org.apache.beam.runners.fnexecution.ServerFactory; import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService; import org.apache.beam.runners.fnexecution.control.ControlClientPool; import org.apache.beam.runners.fnexecution.control.FnApiControlClientPoolService; @@ -35,7 +33,9 @@ import org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService; import org.apache.beam.sdk.fn.IdGenerator; import org.apache.beam.sdk.fn.channel.ManagedChannelFactory; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ManagedChannel; +import org.apache.beam.sdk.fn.server.GrpcFnServer; +import org.apache.beam.sdk.fn.server.ServerFactory; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ManagedChannel; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; import org.slf4j.Logger; import org.slf4j.LoggerFactory; diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/ProcessEnvironmentFactory.java b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/ProcessEnvironmentFactory.java index 8a4bb1a62261..be301f2f61f0 100644 --- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/ProcessEnvironmentFactory.java +++ b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/ProcessEnvironmentFactory.java @@ -22,7 +22,6 @@ import org.apache.beam.model.pipeline.v1.RunnerApi; import org.apache.beam.model.pipeline.v1.RunnerApi.Environment; import org.apache.beam.runners.core.construction.BeamUrns; -import org.apache.beam.runners.fnexecution.GrpcFnServer; import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService; import org.apache.beam.runners.fnexecution.control.ControlClientPool; import org.apache.beam.runners.fnexecution.control.FnApiControlClientPoolService; @@ -30,6 +29,7 @@ import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService; import org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService; import org.apache.beam.sdk.fn.IdGenerator; +import org.apache.beam.sdk.fn.server.GrpcFnServer; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.RemoteEnvironmentOptions; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/StaticRemoteEnvironmentFactory.java b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/StaticRemoteEnvironmentFactory.java index d6051b1d3702..8e0fc82d0f6f 100644 --- 
a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/StaticRemoteEnvironmentFactory.java +++ b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/StaticRemoteEnvironmentFactory.java @@ -18,7 +18,6 @@ package org.apache.beam.runners.fnexecution.environment; import org.apache.beam.model.pipeline.v1.RunnerApi.Environment; -import org.apache.beam.runners.fnexecution.GrpcFnServer; import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService; import org.apache.beam.runners.fnexecution.control.ControlClientPool; import org.apache.beam.runners.fnexecution.control.FnApiControlClientPoolService; @@ -26,6 +25,7 @@ import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService; import org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService; import org.apache.beam.sdk.fn.IdGenerator; +import org.apache.beam.sdk.fn.server.GrpcFnServer; /** * An {@link EnvironmentFactory} that creates StaticRemoteEnvironment used by a runner harness that diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/logging/GrpcLoggingService.java b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/logging/GrpcLoggingService.java index 631260d084d2..0333d7187ef7 100644 --- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/logging/GrpcLoggingService.java +++ b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/logging/GrpcLoggingService.java @@ -23,8 +23,8 @@ import org.apache.beam.model.fnexecution.v1.BeamFnApi; import org.apache.beam.model.fnexecution.v1.BeamFnApi.LogControl; import org.apache.beam.model.fnexecution.v1.BeamFnLoggingGrpc; -import org.apache.beam.runners.fnexecution.FnService; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.sdk.fn.server.FnService; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; import org.slf4j.Logger; import org.slf4j.LoggerFactory; diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/provisioning/JobInfo.java b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/provisioning/JobInfo.java index 95a3070aeee8..6a6645be7887 100644 --- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/provisioning/JobInfo.java +++ b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/provisioning/JobInfo.java @@ -20,7 +20,7 @@ import com.google.auto.value.AutoValue; import java.io.Serializable; import org.apache.beam.model.fnexecution.v1.ProvisionApi; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Struct; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Struct; /** * A subset of {@link org.apache.beam.model.fnexecution.v1.ProvisionApi.ProvisionInfo} that diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/provisioning/StaticGrpcProvisionService.java b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/provisioning/StaticGrpcProvisionService.java index 05f3468e4c81..2b0650c7c9cb 100644 --- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/provisioning/StaticGrpcProvisionService.java +++ 
b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/provisioning/StaticGrpcProvisionService.java @@ -26,9 +26,9 @@ import org.apache.beam.model.fnexecution.v1.ProvisionServiceGrpc; import org.apache.beam.model.fnexecution.v1.ProvisionServiceGrpc.ProvisionServiceImplBase; import org.apache.beam.model.pipeline.v1.RunnerApi; -import org.apache.beam.runners.fnexecution.FnService; -import org.apache.beam.runners.fnexecution.HeaderAccessor; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.sdk.fn.server.FnService; +import org.apache.beam.sdk.fn.server.HeaderAccessor; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; /** * A {@link ProvisionServiceImplBase provision service} that returns a static response to all calls. diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/state/GrpcStateService.java b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/state/GrpcStateService.java index df109101e276..145ea8e0e32c 100644 --- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/state/GrpcStateService.java +++ b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/state/GrpcStateService.java @@ -27,9 +27,9 @@ import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateRequest; import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateResponse; import org.apache.beam.model.fnexecution.v1.BeamFnStateGrpc; -import org.apache.beam.runners.fnexecution.FnService; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.ServerCallStreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.sdk.fn.server.FnService; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.ServerCallStreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; /** An implementation of the Beam Fn State service. 
*/ public class GrpcStateService extends BeamFnStateGrpc.BeamFnStateImplBase diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/state/StateRequestHandlers.java b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/state/StateRequestHandlers.java index e5e751935f5a..1e727390bb4d 100644 --- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/state/StateRequestHandlers.java +++ b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/state/StateRequestHandlers.java @@ -44,6 +44,7 @@ import org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.BagUserStateSpec; import org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.ExecutableProcessBundleDescriptor; import org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.SideInputSpec; +import org.apache.beam.runners.fnexecution.wire.ByteStringCoder; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.KvCoder; import org.apache.beam.sdk.fn.stream.DataStreams; @@ -52,10 +53,9 @@ import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.util.common.Reiterable; import org.apache.beam.sdk.values.KV; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.sdk.v2.sdk.extensions.protobuf.ByteStringCoder; /** * A set of utility methods which construct {@link StateRequestHandler}s. diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/status/BeamWorkerStatusGrpcService.java b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/status/BeamWorkerStatusGrpcService.java index c3f9bba16504..b2968b4779f9 100644 --- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/status/BeamWorkerStatusGrpcService.java +++ b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/status/BeamWorkerStatusGrpcService.java @@ -33,9 +33,9 @@ import org.apache.beam.model.fnexecution.v1.BeamFnApi.WorkerStatusResponse; import org.apache.beam.model.fnexecution.v1.BeamFnWorkerStatusGrpc.BeamFnWorkerStatusImplBase; import org.apache.beam.model.pipeline.v1.Endpoints.ApiServiceDescriptor; -import org.apache.beam.runners.fnexecution.FnService; -import org.apache.beam.runners.fnexecution.HeaderAccessor; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.sdk.fn.server.FnService; +import org.apache.beam.sdk.fn.server.HeaderAccessor; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Strings; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/status/WorkerStatusClient.java b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/status/WorkerStatusClient.java index 0d892e518eb8..03b48ec81b14 100644 --- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/status/WorkerStatusClient.java +++ 
b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/status/WorkerStatusClient.java @@ -29,7 +29,7 @@ import org.apache.beam.sdk.fn.IdGenerator; import org.apache.beam.sdk.fn.IdGenerators; import org.apache.beam.sdk.fn.stream.SynchronizedStreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.slf4j.Logger; import org.slf4j.LoggerFactory; diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/translation/PipelineTranslatorUtils.java b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/translation/PipelineTranslatorUtils.java index 0059cc7ac843..ecfc234f885a 100644 --- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/translation/PipelineTranslatorUtils.java +++ b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/translation/PipelineTranslatorUtils.java @@ -43,7 +43,7 @@ import org.apache.beam.sdk.util.WindowedValue; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.WindowingStrategy; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.BiMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableBiMap; @@ -151,13 +151,13 @@ private static void fireTimer( Timer timerValue = Timer.of( currentTimerKey, - "", + timer.getTimerId(), Collections.singletonList(window), timestamp, outputTimestamp, PaneInfo.NO_FIRING); KV transformAndTimerId = - TimerReceiverFactory.decodeTimerDataTimerId(timer.getTimerId()); + TimerReceiverFactory.decodeTimerDataTimerId(timer.getTimerFamilyId()); FnDataReceiver fnTimerReceiver = timerReceivers.get(transformAndTimerId); Preconditions.checkNotNull( fnTimerReceiver, "No FnDataReceiver found for %s", transformAndTimerId); diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkStreamingSideInputHandlerFactory.java b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/translation/StreamingSideInputHandlerFactory.java similarity index 93% rename from runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkStreamingSideInputHandlerFactory.java rename to runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/translation/StreamingSideInputHandlerFactory.java index f6f6d55d1ed3..1dfcc4a8d7bf 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkStreamingSideInputHandlerFactory.java +++ b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/translation/StreamingSideInputHandlerFactory.java @@ -15,7 +15,7 @@ * See the License for the specific language governing permissions and * limitations under the License. 
*/ -package org.apache.beam.runners.flink.translation.functions; +package org.apache.beam.runners.fnexecution.translation; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; @@ -40,13 +40,13 @@ /** * {@link StateRequestHandler} that uses {@link org.apache.beam.runners.core.SideInputHandler} to - * access the Flink broadcast state that represents side inputs. + * access the broadcast state that represents side inputs. */ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) -public class FlinkStreamingSideInputHandlerFactory implements SideInputHandlerFactory { +public class StreamingSideInputHandlerFactory implements SideInputHandlerFactory { // Map from side input id to global PCollection id. private final Map> sideInputToCollection; @@ -56,7 +56,7 @@ public class FlinkStreamingSideInputHandlerFactory implements SideInputHandlerFa * Creates a new state handler for the given stage. Note that this requires a traversal of the * stage itself, so this should only be called once per stage rather than once per bundle. */ - public static FlinkStreamingSideInputHandlerFactory forStage( + public static StreamingSideInputHandlerFactory forStage( ExecutableStage stage, Map> viewMapping, org.apache.beam.runners.core.SideInputHandler runnerHandler) { @@ -76,12 +76,12 @@ public static FlinkStreamingSideInputHandlerFactory forStage( sideInputId.getLocalName())); } - FlinkStreamingSideInputHandlerFactory factory = - new FlinkStreamingSideInputHandlerFactory(sideInputBuilder.build(), runnerHandler); + StreamingSideInputHandlerFactory factory = + new StreamingSideInputHandlerFactory(sideInputBuilder.build(), runnerHandler); return factory; } - private FlinkStreamingSideInputHandlerFactory( + private StreamingSideInputHandlerFactory( Map> sideInputToCollection, org.apache.beam.runners.core.SideInputHandler runnerHandler) { this.sideInputToCollection = sideInputToCollection; diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/wire/ByteStringCoder.java b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/wire/ByteStringCoder.java new file mode 100644 index 000000000000..86127f72c605 --- /dev/null +++ b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/wire/ByteStringCoder.java @@ -0,0 +1,124 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.wire; + +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import org.apache.beam.sdk.annotations.Internal; +import org.apache.beam.sdk.coders.AtomicCoder; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.CoderException; +import org.apache.beam.sdk.util.VarInt; +import org.apache.beam.sdk.values.TypeDescriptor; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteStreams; + +/** + * A duplicate of {@link ByteStringCoder} that uses the Apache Beam vendored protobuf. + * + *
<p>
    For internal use only, no backwards-compatibility guarantees. + */ +@Internal +public class ByteStringCoder extends AtomicCoder { + + public static ByteStringCoder of() { + return INSTANCE; + } + + /** ************************ */ + private static final ByteStringCoder INSTANCE = new ByteStringCoder(); + + private static final TypeDescriptor TYPE_DESCRIPTOR = + new TypeDescriptor() {}; + + private ByteStringCoder() {} + + @Override + public void encode(ByteString value, OutputStream outStream) throws IOException, CoderException { + encode(value, outStream, Context.NESTED); + } + + @Override + public void encode(ByteString value, OutputStream outStream, Context context) + throws IOException, CoderException { + if (value == null) { + throw new CoderException("cannot encode a null ByteString"); + } + + if (!context.isWholeStream) { + // ByteString is not delimited, so write its size before its contents. + VarInt.encode(value.size(), outStream); + } + value.writeTo(outStream); + } + + @Override + public ByteString decode(InputStream inStream) throws IOException { + return decode(inStream, Context.NESTED); + } + + @Override + public ByteString decode(InputStream inStream, Context context) throws IOException { + if (context.isWholeStream) { + return ByteString.readFrom(inStream); + } + + int size = VarInt.decodeInt(inStream); + // ByteString reads to the end of the input stream, so give it a limited stream of exactly + // the right length. Also set its chunk size so that the ByteString will contain exactly + // one chunk. + return ByteString.readFrom(ByteStreams.limit(inStream, size), size); + } + + @Override + protected long getEncodedElementByteSize(ByteString value) throws Exception { + int size = value.size(); + return (long) VarInt.getLength(size) + size; + } + + @Override + public void verifyDeterministic() {} + + /** + * {@inheritDoc} + * + *
<p>
    Returns true; the encoded output of two invocations of {@link ByteStringCoder} in the same + * {@link Coder.Context} will be identical if and only if the original {@link ByteString} objects + * are equal according to {@link Object#equals}. + */ + @Override + public boolean consistentWithEquals() { + return true; + } + + /** + * {@inheritDoc} + * + *
<p>
    Returns true. {@link ByteString#size} returns the size of an array and a {@link VarInt}. + */ + @Override + public boolean isRegisterByteSizeObserverCheap(ByteString value) { + return true; + } + + @Override + public TypeDescriptor getEncodedTypeDescriptor() { + return TYPE_DESCRIPTOR; + } +} diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/EmbeddedSdkHarness.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/EmbeddedSdkHarness.java index 91e5cd828061..417f8ed458df 100644 --- a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/EmbeddedSdkHarness.java +++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/EmbeddedSdkHarness.java @@ -31,6 +31,9 @@ import org.apache.beam.runners.fnexecution.environment.EmbeddedEnvironmentFactory; import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService; import org.apache.beam.runners.fnexecution.logging.Slf4jLogWriter; +import org.apache.beam.sdk.fn.server.GrpcContextHeaderAccessorProvider; +import org.apache.beam.sdk.fn.server.GrpcFnServer; +import org.apache.beam.sdk.fn.server.InProcessServerFactory; import org.apache.beam.sdk.fn.stream.OutboundObserverFactory; import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.ThreadFactoryBuilder; @@ -42,9 +45,6 @@ * {@link FnHarness} to properly execute, and provides access to the associated client and harness * during test execution. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class EmbeddedSdkHarness extends ExternalResource implements TestRule { public static EmbeddedSdkHarness create() { diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/GrpcContextHeaderAccessorProviderTest.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/GrpcContextHeaderAccessorProviderTest.java index aebc327364da..b24657b6a50a 100644 --- a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/GrpcContextHeaderAccessorProviderTest.java +++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/GrpcContextHeaderAccessorProviderTest.java @@ -24,16 +24,18 @@ import org.apache.beam.model.fnexecution.v1.BeamFnApi.Elements; import org.apache.beam.model.fnexecution.v1.BeamFnDataGrpc; import org.apache.beam.model.pipeline.v1.Endpoints.ApiServiceDescriptor; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.CallOptions; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Channel; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ClientCall; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ClientInterceptor; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ForwardingClientCall.SimpleForwardingClientCall; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Metadata; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.MethodDescriptor; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.inprocess.InProcessChannelBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.testing.GrpcCleanupRule; +import org.apache.beam.sdk.fn.server.GrpcContextHeaderAccessorProvider; +import org.apache.beam.sdk.fn.server.InProcessServerFactory; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.CallOptions; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Channel; +import 
org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ClientCall; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ClientInterceptor; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ForwardingClientCall.SimpleForwardingClientCall; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Metadata; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.MethodDescriptor; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessChannelBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.testing.GrpcCleanupRule; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.junit.Assert; import org.junit.Rule; diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/ServerFactoryTest.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/ServerFactoryTest.java index 94ddc61a4452..510a292c7a64 100644 --- a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/ServerFactoryTest.java +++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/ServerFactoryTest.java @@ -17,6 +17,7 @@ */ package org.apache.beam.runners.fnexecution; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.allOf; import static org.hamcrest.Matchers.anyOf; import static org.hamcrest.Matchers.contains; @@ -26,7 +27,6 @@ import static org.hamcrest.Matchers.lessThan; import static org.hamcrest.Matchers.startsWith; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assume.assumeTrue; import java.net.Inet4Address; @@ -42,13 +42,14 @@ import org.apache.beam.model.pipeline.v1.Endpoints; import org.apache.beam.model.pipeline.v1.Endpoints.ApiServiceDescriptor; import org.apache.beam.sdk.fn.channel.ManagedChannelFactory; +import org.apache.beam.sdk.fn.server.ServerFactory; import org.apache.beam.sdk.fn.test.TestStreams; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ManagedChannel; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Server; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.CallStreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.testing.GrpcCleanupRule; -import org.apache.beam.vendor.grpc.v1p26p0.io.netty.channel.epoll.Epoll; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ManagedChannel; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Server; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.CallStreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.testing.GrpcCleanupRule; +import org.apache.beam.vendor.grpc.v1p36p0.io.netty.channel.epoll.Epoll; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.net.HostAndPort; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.Uninterruptibles; diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/ArtifactRetrievalServiceTest.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/ArtifactRetrievalServiceTest.java index 225ab9c47e0a..d89e39a561f8 100644 --- 
a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/ArtifactRetrievalServiceTest.java +++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/ArtifactRetrievalServiceTest.java @@ -28,12 +28,12 @@ import org.apache.beam.model.jobmanagement.v1.ArtifactApi; import org.apache.beam.model.jobmanagement.v1.ArtifactRetrievalServiceGrpc; import org.apache.beam.model.pipeline.v1.RunnerApi; -import org.apache.beam.runners.fnexecution.GrpcFnServer; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ManagedChannel; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.inprocess.InProcessChannelBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.inprocess.InProcessServerBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.testing.GrpcCleanupRule; +import org.apache.beam.sdk.fn.server.GrpcFnServer; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ManagedChannel; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessChannelBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessServerBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.testing.GrpcCleanupRule; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Strings; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; @@ -45,9 +45,6 @@ import org.junit.runners.JUnit4; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ArtifactRetrievalServiceTest { private static final int TEST_BUFFER_SIZE = 1 << 10; private GrpcFnServer retrievalServer; diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/ArtifactStagingServiceTest.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/ArtifactStagingServiceTest.java index f9358e6d6eaf..a23ff906d67e 100644 --- a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/ArtifactStagingServiceTest.java +++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/ArtifactStagingServiceTest.java @@ -29,12 +29,12 @@ import org.apache.beam.model.jobmanagement.v1.ArtifactRetrievalServiceGrpc; import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc; import org.apache.beam.model.pipeline.v1.RunnerApi; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ManagedChannel; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.inprocess.InProcessChannelBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.inprocess.InProcessServerBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.testing.GrpcCleanupRule; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ManagedChannel; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessChannelBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessServerBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; +import 
org.apache.beam.vendor.grpc.v1p36p0.io.grpc.testing.GrpcCleanupRule; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Strings; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; @@ -47,9 +47,6 @@ import org.junit.runners.JUnit4; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ArtifactStagingServiceTest { private static final int TEST_BUFFER_SIZE = 1 << 10; private ArtifactStagingService stagingService; diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/DefaultJobBundleFactoryTest.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/DefaultJobBundleFactoryTest.java index f05f08127ed3..b61e911a0748 100644 --- a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/DefaultJobBundleFactoryTest.java +++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/DefaultJobBundleFactoryTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.fnexecution.control; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsString; -import static org.hamcrest.Matchers.greaterThanOrEqualTo; -import static org.hamcrest.Matchers.is; +import static org.hamcrest.Matchers.equalTo; import static org.mockito.ArgumentMatchers.eq; import static org.mockito.Matchers.any; import static org.mockito.Mockito.mock; @@ -30,9 +30,12 @@ import java.util.Collections; import java.util.HashSet; import java.util.Map; -import java.util.Timer; -import java.util.TimerTask; +import java.util.Optional; import java.util.concurrent.CompletableFuture; +import java.util.concurrent.Executors; +import java.util.concurrent.ScheduledExecutorService; +import java.util.concurrent.ScheduledFuture; +import java.util.concurrent.TimeUnit; import java.util.concurrent.atomic.AtomicBoolean; import org.apache.beam.model.fnexecution.v1.BeamFnApi.InstructionResponse; import org.apache.beam.model.pipeline.v1.Endpoints.ApiServiceDescriptor; @@ -46,8 +49,6 @@ import org.apache.beam.runners.core.construction.ModelCoders; import org.apache.beam.runners.core.construction.PipelineOptionsTranslation; import org.apache.beam.runners.core.construction.graph.ExecutableStage; -import org.apache.beam.runners.fnexecution.GrpcFnServer; -import org.apache.beam.runners.fnexecution.ServerFactory; import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService; import org.apache.beam.runners.fnexecution.data.GrpcDataService; import org.apache.beam.runners.fnexecution.environment.EnvironmentFactory; @@ -62,11 +63,13 @@ import org.apache.beam.sdk.fn.IdGenerator; import org.apache.beam.sdk.fn.IdGenerators; import org.apache.beam.sdk.fn.data.CloseableFnDataReceiver; +import org.apache.beam.sdk.fn.server.GrpcFnServer; +import org.apache.beam.sdk.fn.server.ServerFactory; import org.apache.beam.sdk.options.ExperimentalOptions; import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.options.PortablePipelineOptions; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Struct; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Struct; import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.junit.Assert; import org.junit.Before; @@ -82,9 +85,6 @@ /** Tests for {@link DefaultJobBundleFactory}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DefaultJobBundleFactoryTest { @Rule public ExpectedException thrown = ExpectedException.none(); @Mock private EnvironmentFactory envFactory; @@ -458,32 +458,41 @@ public void loadBalancesBundles() throws Exception { final RemoteBundle b2 = sbf.getBundle(orf, srh, BundleProgressHandler.ignored()); verify(envFactory, Mockito.times(2)).createEnvironment(eq(environment), any()); - long tms = System.currentTimeMillis(); - AtomicBoolean closed = new AtomicBoolean(); - // close to free up environment for another bundle - TimerTask closeBundleTask = - new TimerTask() { - @Override - public void run() { - try { - b2.close(); - closed.set(true); - } catch (Exception e) { - throw new RuntimeException(e); - } - } - }; - new Timer().schedule(closeBundleTask, 100); - + AtomicBoolean b2Closing = new AtomicBoolean(false); + ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(); + ScheduledFuture> closingFuture = + executor.schedule( + () -> { + try { + b2Closing.compareAndSet(false, true); + b2.close(); + return Optional.empty(); + } catch (Exception e) { + return Optional.of(e); + } + }, + 100, + TimeUnit.MILLISECONDS); + + assertThat(b2Closing.get(), equalTo(false)); + + // This call should block until closingFuture has finished closing b2 (100ms) RemoteBundle b3 = sbf.getBundle(orf, srh, BundleProgressHandler.ignored()); - // ensure we waited for close - Assert.assertThat(System.currentTimeMillis() - tms, greaterThanOrEqualTo(100L)); - Assert.assertThat(closed.get(), is(true)); + // ensure the previous call waited for close + assertThat(b2Closing.get(), equalTo(true)); + + // Join closingFuture and check if an exception occurred + Optional closingException = closingFuture.get(); + if (closingException.isPresent()) { + throw new AssertionError("Exception occurred while closing b2", closingException.get()); + } verify(envFactory, Mockito.times(2)).createEnvironment(eq(environment), any()); b3.close(); b1.close(); + + executor.shutdown(); } } @@ -506,7 +515,7 @@ public void rejectsStateCachingWithLoadBalancing() throws Exception { stageIdGenerator, serverInfo) .close()); - Assert.assertThat(e.getMessage(), containsString("state_cache_size")); + assertThat(e.getMessage(), containsString("state_cache_size")); } private DefaultJobBundleFactory createDefaultJobBundleFactory( diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/FnApiControlClientPoolServiceTest.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/FnApiControlClientPoolServiceTest.java index d518e4369b88..f6f99b0dc586 100644 --- a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/FnApiControlClientPoolServiceTest.java +++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/FnApiControlClientPoolServiceTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.fnexecution.control; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.is; -import static org.junit.Assert.assertThat; import static org.mockito.Matchers.any; import static org.mockito.Matchers.argThat; import static org.mockito.Mockito.mock; @@ 
-32,14 +32,14 @@ import org.apache.beam.model.fnexecution.v1.BeamFnApi; import org.apache.beam.model.fnexecution.v1.BeamFnApi.InstructionRequest; import org.apache.beam.model.fnexecution.v1.BeamFnControlGrpc; -import org.apache.beam.runners.fnexecution.GrpcContextHeaderAccessorProvider; -import org.apache.beam.runners.fnexecution.GrpcFnServer; -import org.apache.beam.runners.fnexecution.InProcessServerFactory; +import org.apache.beam.sdk.fn.server.GrpcContextHeaderAccessorProvider; +import org.apache.beam.sdk.fn.server.GrpcFnServer; +import org.apache.beam.sdk.fn.server.InProcessServerFactory; import org.apache.beam.sdk.util.MoreFutures; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Status; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.StatusException; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.inprocess.InProcessChannelBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Status; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.StatusException; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessChannelBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.junit.After; import org.junit.Assert; import org.junit.Before; @@ -49,9 +49,6 @@ /** Unit tests for {@link FnApiControlClientPoolService}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FnApiControlClientPoolServiceTest { private final ControlClientPool pool = MapControlClientPool.create(); diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/FnApiControlClientTest.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/FnApiControlClientTest.java index 2c823f0a4679..26bb95e4f136 100644 --- a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/FnApiControlClientTest.java +++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/FnApiControlClientTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.runners.fnexecution.control; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.is; import static org.hamcrest.Matchers.isA; -import static org.junit.Assert.assertThat; import static org.mockito.Matchers.any; import static org.mockito.Mockito.verify; @@ -33,7 +33,7 @@ import org.apache.beam.model.fnexecution.v1.BeamFnApi.InstructionRequest; import org.apache.beam.model.fnexecution.v1.BeamFnApi.InstructionResponse; import org.apache.beam.sdk.util.MoreFutures; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.junit.Before; import org.junit.Rule; import org.junit.Test; @@ -46,9 +46,6 @@ /** Unit tests for {@link FnApiControlClient}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FnApiControlClientTest { @Rule public ExpectedException thrown = ExpectedException.none(); diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/ProcessBundleDescriptorsTest.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/ProcessBundleDescriptorsTest.java index 55a15dd87df6..9337c63417e5 100644 --- a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/ProcessBundleDescriptorsTest.java +++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/ProcessBundleDescriptorsTest.java @@ -63,9 +63,6 @@ import org.junit.Test; /** Tests for {@link ProcessBundleDescriptors}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ProcessBundleDescriptorsTest implements Serializable { /** diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/RemoteExecutionTest.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/RemoteExecutionTest.java index dcea277d0c2e..4a6609dfd5aa 100644 --- a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/RemoteExecutionTest.java +++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/RemoteExecutionTest.java @@ -42,6 +42,7 @@ import java.util.Map.Entry; import java.util.Set; import java.util.UUID; +import java.util.concurrent.CompletionStage; import java.util.concurrent.ConcurrentHashMap; import java.util.concurrent.ConcurrentMap; import java.util.concurrent.CountDownLatch; @@ -55,6 +56,7 @@ import java.util.concurrent.TimeUnit; import java.util.function.Function; import org.apache.beam.fn.harness.FnHarness; +import org.apache.beam.model.fnexecution.v1.BeamFnApi; import org.apache.beam.model.fnexecution.v1.BeamFnApi.ProcessBundleProgressResponse; import org.apache.beam.model.fnexecution.v1.BeamFnApi.ProcessBundleResponse; import org.apache.beam.model.fnexecution.v1.BeamFnApi.ProcessBundleSplitResponse; @@ -68,16 +70,15 @@ import org.apache.beam.runners.core.construction.graph.GreedyPipelineFuser; import org.apache.beam.runners.core.construction.graph.PipelineNode.PTransformNode; import org.apache.beam.runners.core.construction.graph.ProtoOverrides; +import org.apache.beam.runners.core.construction.graph.SideInputReference; import org.apache.beam.runners.core.construction.graph.SplittableParDoExpander; import org.apache.beam.runners.core.metrics.DistributionData; +import org.apache.beam.runners.core.metrics.ExecutionStateSampler; import org.apache.beam.runners.core.metrics.MonitoringInfoConstants; import org.apache.beam.runners.core.metrics.MonitoringInfoConstants.TypeUrns; import org.apache.beam.runners.core.metrics.MonitoringInfoConstants.Urns; import org.apache.beam.runners.core.metrics.MonitoringInfoMatchers; import org.apache.beam.runners.core.metrics.SimpleMonitoringInfoBuilder; -import org.apache.beam.runners.fnexecution.GrpcContextHeaderAccessorProvider; -import org.apache.beam.runners.fnexecution.GrpcFnServer; -import org.apache.beam.runners.fnexecution.InProcessServerFactory; import org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.ExecutableProcessBundleDescriptor; import org.apache.beam.runners.fnexecution.control.SdkHarnessClient.BundleProcessor; import 
org.apache.beam.runners.fnexecution.data.GrpcDataService; @@ -98,10 +99,14 @@ import org.apache.beam.sdk.coders.KvCoder; import org.apache.beam.sdk.coders.StringUtf8Coder; import org.apache.beam.sdk.fn.data.FnDataReceiver; +import org.apache.beam.sdk.fn.server.GrpcContextHeaderAccessorProvider; +import org.apache.beam.sdk.fn.server.GrpcFnServer; +import org.apache.beam.sdk.fn.server.InProcessServerFactory; import org.apache.beam.sdk.fn.stream.OutboundObserverFactory; import org.apache.beam.sdk.fn.test.InProcessManagedChannelFactory; import org.apache.beam.sdk.metrics.Metrics; import org.apache.beam.sdk.options.ExperimentalOptions; +import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.state.BagState; import org.apache.beam.sdk.state.ReadableState; @@ -131,7 +136,7 @@ import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionList; import org.apache.beam.sdk.values.PCollectionView; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Optional; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; @@ -144,7 +149,6 @@ import org.joda.time.DateTimeUtils; import org.joda.time.Duration; import org.junit.After; -import org.junit.Before; import org.junit.Rule; import org.junit.Test; import org.junit.runner.RunWith; @@ -159,9 +163,8 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "keyfor", - "nullness" -}) // TODO(https://issues.apache.org/jira/browse/BEAM-10402) + "keyfor" +}) public class RemoteExecutionTest implements Serializable { @Rule public transient ResetDateTimeProvider resetDateTimeProvider = new ResetDateTimeProvider(); @@ -178,8 +181,7 @@ public class RemoteExecutionTest implements Serializable { private transient ExecutorService sdkHarnessExecutor; private transient Future sdkHarnessExecutorFuture; - @Before - public void setup() throws Exception { + public void launchSdkHarness(PipelineOptions options) throws Exception { // Setup execution-time servers ThreadFactory threadFactory = new ThreadFactoryBuilder().setDaemon(true).build(); serverExecutor = Executors.newCachedThreadPool(threadFactory); @@ -212,9 +214,11 @@ public void setup() throws Exception { try { FnHarness.main( "id", - PipelineOptionsFactory.create(), + options, + Collections.emptySet(), // Runner capabilities. 
loggingServer.getApiServiceDescriptor(), controlServer.getApiServiceDescriptor(), + null, InProcessManagedChannelFactory.create(), OutboundObserverFactory.clientDirect()); } catch (Exception e) { @@ -250,6 +254,7 @@ public void tearDown() throws Exception { @Test public void testExecution() throws Exception { + launchSdkHarness(PipelineOptionsFactory.create()); Pipeline p = Pipeline.create(); p.apply("impulse", Impulse.create()) .apply( @@ -325,6 +330,7 @@ public void process(ProcessContext ctxt) { @Test public void testBundleProcessorThrowsExecutionExceptionWhenUserCodeThrows() throws Exception { + launchSdkHarness(PipelineOptionsFactory.create()); Pipeline p = Pipeline.create(); p.apply("impulse", Impulse.create()) .apply( @@ -405,6 +411,7 @@ public void process(ProcessContext ctxt) throws Exception { @Test public void testExecutionWithSideInput() throws Exception { + launchSdkHarness(PipelineOptionsFactory.create()); Pipeline p = Pipeline.create(); addExperiment(p.getOptions().as(ExperimentalOptions.class), "beam_fn_api"); // TODO(BEAM-10097): Remove experiment once all portable runners support this view type @@ -530,6 +537,165 @@ MultimapSideInputHandler forMultimapSideInput( } } + @Test + public void testExecutionWithSideInputCaching() throws Exception { + Pipeline p = Pipeline.create(); + addExperiment(p.getOptions().as(ExperimentalOptions.class), "beam_fn_api"); + // TODO(BEAM-10212): Remove experiment once cross bundle caching is used by default + addExperiment(p.getOptions().as(ExperimentalOptions.class), "cross_bundle_caching"); + // TODO(BEAM-10097): Remove experiment once all portable runners support this view type + addExperiment(p.getOptions().as(ExperimentalOptions.class), "use_runner_v2"); + + launchSdkHarness(p.getOptions()); + + PCollection input = + p.apply("impulse", Impulse.create()) + .apply( + "create", + ParDo.of( + new DoFn() { + @ProcessElement + public void process(ProcessContext ctxt) { + ctxt.output("zero"); + ctxt.output("one"); + ctxt.output("two"); + } + })) + .setCoder(StringUtf8Coder.of()); + PCollectionView> view = input.apply("createSideInput", View.asIterable()); + + input + .apply( + "readSideInput", + ParDo.of( + new DoFn>() { + @ProcessElement + public void processElement(ProcessContext context) { + for (String value : context.sideInput(view)) { + context.output(KV.of(context.element(), value)); + } + } + }) + .withSideInputs(view)) + .setCoder(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of())) + // Force the output to be materialized + .apply("gbk", GroupByKey.create()); + + RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(p); + FusedPipeline fused = GreedyPipelineFuser.fuse(pipelineProto); + Optional optionalStage = + Iterables.tryFind( + fused.getFusedStages(), (ExecutableStage stage) -> !stage.getSideInputs().isEmpty()); + checkState(optionalStage.isPresent(), "Expected a stage with side inputs."); + ExecutableStage stage = optionalStage.get(); + + ExecutableProcessBundleDescriptor descriptor = + ProcessBundleDescriptors.fromExecutableStage( + "test_stage", + stage, + dataServer.getApiServiceDescriptor(), + stateServer.getApiServiceDescriptor()); + + BundleProcessor processor = + controlClient.getProcessor( + descriptor.getProcessBundleDescriptor(), + descriptor.getRemoteInputDestinations(), + stateDelegator); + Map remoteOutputCoders = descriptor.getRemoteOutputCoders(); + Map>> outputValues = new HashMap<>(); + Map> outputReceivers = new HashMap<>(); + for (Entry remoteOutputCoder : remoteOutputCoders.entrySet()) { + List> 
outputContents = Collections.synchronizedList(new ArrayList<>()); + outputValues.put(remoteOutputCoder.getKey(), outputContents); + outputReceivers.put( + remoteOutputCoder.getKey(), + RemoteOutputReceiver.of( + (Coder>) remoteOutputCoder.getValue(), outputContents::add)); + } + + StoringStateRequestHandler stateRequestHandler = + new StoringStateRequestHandler( + StateRequestHandlers.forSideInputHandlerFactory( + descriptor.getSideInputSpecs(), + new SideInputHandlerFactory() { + @Override + public + IterableSideInputHandler forIterableSideInput( + String pTransformId, + String sideInputId, + Coder elementCoder, + Coder windowCoder) { + return new IterableSideInputHandler() { + @Override + public Iterable get(W window) { + return (Iterable) Arrays.asList("A", "B", "C"); + } + + @Override + public Coder elementCoder() { + return elementCoder; + } + }; + } + + @Override + public + MultimapSideInputHandler forMultimapSideInput( + String pTransformId, + String sideInputId, + KvCoder elementCoder, + Coder windowCoder) { + throw new UnsupportedOperationException(); + } + })); + SideInputReference sideInputReference = stage.getSideInputs().iterator().next(); + String transformId = sideInputReference.transform().getId(); + String sideInputId = sideInputReference.localName(); + stateRequestHandler.addCacheToken( + BeamFnApi.ProcessBundleRequest.CacheToken.newBuilder() + .setSideInput( + BeamFnApi.ProcessBundleRequest.CacheToken.SideInput.newBuilder() + .setSideInputId(sideInputId) + .setTransformId(transformId) + .build()) + .setToken(ByteString.copyFromUtf8("SideInputToken")) + .build()); + BundleProgressHandler progressHandler = BundleProgressHandler.ignored(); + + try (RemoteBundle bundle = + processor.newBundle(outputReceivers, stateRequestHandler, progressHandler)) { + Iterables.getOnlyElement(bundle.getInputReceivers().values()) + .accept(valueInGlobalWindow("X")); + } + + try (RemoteBundle bundle = + processor.newBundle(outputReceivers, stateRequestHandler, progressHandler)) { + Iterables.getOnlyElement(bundle.getInputReceivers().values()) + .accept(valueInGlobalWindow("X")); + } + for (Collection> windowedValues : outputValues.values()) { + assertThat( + windowedValues, + containsInAnyOrder( + valueInGlobalWindow(KV.of("X", "A")), + valueInGlobalWindow(KV.of("X", "B")), + valueInGlobalWindow(KV.of("X", "C")), + valueInGlobalWindow(KV.of("X", "A")), + valueInGlobalWindow(KV.of("X", "B")), + valueInGlobalWindow(KV.of("X", "C")))); + } + + // Only expect one read to the sideInput + assertEquals(1, stateRequestHandler.receivedRequests.size()); + BeamFnApi.StateRequest receivedRequest = stateRequestHandler.receivedRequests.get(0); + assertEquals( + receivedRequest.getStateKey().getIterableSideInput(), + BeamFnApi.StateKey.IterableSideInput.newBuilder() + .setSideInputId(sideInputId) + .setTransformId(transformId) + .build()); + } + /** * A {@link DoFn} that uses static maps of {@link CountDownLatch}es to block execution allowing * for synchronization during test execution. 
The expected flow is: @@ -563,6 +729,7 @@ public MetricsDoFn() { public void startBundle() throws InterruptedException { Metrics.counter(RemoteExecutionTest.class, START_USER_COUNTER_NAME).inc(10); Metrics.distribution(RemoteExecutionTest.class, START_USER_DISTRIBUTION_NAME).update(10); + ExecutionStateSampler.instance().doSampling(1); } @ProcessElement @@ -572,6 +739,7 @@ public void processElement(ProcessContext ctxt) throws InterruptedException { ctxt.output("two"); Metrics.counter(RemoteExecutionTest.class, PROCESS_USER_COUNTER_NAME).inc(); Metrics.distribution(RemoteExecutionTest.class, PROCESS_USER_DISTRIBUTION_NAME).update(1); + ExecutionStateSampler.instance().doSampling(2); AFTER_PROCESS.get(uuid).countDown(); checkState( ALLOW_COMPLETION.get(uuid).await(60, TimeUnit.SECONDS), @@ -582,11 +750,13 @@ public void processElement(ProcessContext ctxt) throws InterruptedException { public void finishBundle() throws InterruptedException { Metrics.counter(RemoteExecutionTest.class, FINISH_USER_COUNTER_NAME).inc(100); Metrics.distribution(RemoteExecutionTest.class, FINISH_USER_DISTRIBUTION_NAME).update(100); + ExecutionStateSampler.instance().doSampling(3); } } @Test public void testMetrics() throws Exception { + launchSdkHarness(PipelineOptionsFactory.create()); MetricsDoFn metricsDoFn = new MetricsDoFn(); Pipeline p = Pipeline.create(); @@ -640,7 +810,7 @@ public void process(ProcessContext ctxt) { (Coder>) remoteOutputCoder.getValue(), outputContents::add)); } - final String testPTransformId = "create/ParMultiDo(Metrics)"; + final String testPTransformId = "create-ParMultiDo-Metrics-"; BundleProgressHandler progressHandler = new BundleProgressHandler() { @Override @@ -802,7 +972,7 @@ public void onCompleted(ProcessBundleResponse response) { builder = new SimpleMonitoringInfoBuilder(); builder.setUrn(MonitoringInfoConstants.Urns.ELEMENT_COUNT); builder.setLabel( - MonitoringInfoConstants.Labels.PCOLLECTION, testPTransformId + ".output"); + MonitoringInfoConstants.Labels.PCOLLECTION, "create/ParMultiDo(Metrics).output"); builder.setInt64SumValue(3); matchers.add(MonitoringInfoMatchers.matchSetFields(builder.build())); @@ -831,7 +1001,7 @@ public void onCompleted(ProcessBundleResponse response) { matchers.add( allOf( MonitoringInfoMatchers.matchSetFields(builder.build()), - MonitoringInfoMatchers.counterValueGreaterThanOrEqualTo(0))); + MonitoringInfoMatchers.counterValueGreaterThanOrEqualTo(1))); // Check for execution time metrics for the testPTransformId builder = new SimpleMonitoringInfoBuilder(); @@ -841,7 +1011,7 @@ public void onCompleted(ProcessBundleResponse response) { matchers.add( allOf( MonitoringInfoMatchers.matchSetFields(builder.build()), - MonitoringInfoMatchers.counterValueGreaterThanOrEqualTo(0))); + MonitoringInfoMatchers.counterValueGreaterThanOrEqualTo(2))); builder = new SimpleMonitoringInfoBuilder(); builder.setUrn(Urns.FINISH_BUNDLE_MSECS); @@ -850,7 +1020,7 @@ public void onCompleted(ProcessBundleResponse response) { matchers.add( allOf( MonitoringInfoMatchers.matchSetFields(builder.build()), - MonitoringInfoMatchers.counterValueGreaterThanOrEqualTo(0))); + MonitoringInfoMatchers.counterValueGreaterThanOrEqualTo(3))); assertThat( response.getMonitoringInfosList(), @@ -881,6 +1051,7 @@ public void onCompleted(ProcessBundleResponse response) { @Test public void testExecutionWithUserState() throws Exception { + launchSdkHarness(PipelineOptionsFactory.create()); Pipeline p = Pipeline.create(); final String stateId = "foo"; final String stateId2 = "foo2"; @@ -1035,8 
+1206,264 @@ public void clear(ByteString key, BoundedWindow window) { assertThat(userStateData.get(stateId2), IsEmptyIterable.emptyIterable()); } + @Test + public void testExecutionWithUserStateCaching() throws Exception { + Pipeline p = Pipeline.create(); + // TODO(BEAM-10212): Remove experiment once cross bundle caching is used by default + addExperiment(p.getOptions().as(ExperimentalOptions.class), "cross_bundle_caching"); + + launchSdkHarness(p.getOptions()); + + final String stateId = "foo"; + final String stateId2 = "bar"; + + p.apply("impulse", Impulse.create()) + .apply( + "create", + ParDo.of( + new DoFn>() { + @ProcessElement + public void process(ProcessContext ctxt) {} + })) + .setCoder(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of())) + .apply( + "userState", + ParDo.of( + new DoFn, KV>() { + + @StateId(stateId) + private final StateSpec> bufferState = + StateSpecs.bag(StringUtf8Coder.of()); + + @StateId(stateId2) + private final StateSpec> bufferState2 = + StateSpecs.bag(StringUtf8Coder.of()); + + @ProcessElement + public void processElement( + @Element KV element, + @StateId(stateId) BagState state, + @StateId(stateId2) BagState state2, + OutputReceiver> r) { + for (String value : state.read()) { + r.output(KV.of(element.getKey(), value)); + } + ReadableState isEmpty = state2.isEmpty(); + if (isEmpty.read()) { + r.output(KV.of(element.getKey(), "Empty")); + } else { + state2.clear(); + } + } + })) + // Force the output to be materialized + .apply("gbk", GroupByKey.create()); + + RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(p); + FusedPipeline fused = GreedyPipelineFuser.fuse(pipelineProto); + Optional optionalStage = + Iterables.tryFind( + fused.getFusedStages(), (ExecutableStage stage) -> !stage.getUserStates().isEmpty()); + checkState(optionalStage.isPresent(), "Expected a stage with user state."); + ExecutableStage stage = optionalStage.get(); + + ExecutableProcessBundleDescriptor descriptor = + ProcessBundleDescriptors.fromExecutableStage( + "test_stage", + stage, + dataServer.getApiServiceDescriptor(), + stateServer.getApiServiceDescriptor()); + + BundleProcessor processor = + controlClient.getProcessor( + descriptor.getProcessBundleDescriptor(), + descriptor.getRemoteInputDestinations(), + stateDelegator); + Map remoteOutputCoders = descriptor.getRemoteOutputCoders(); + Map>> outputValues = new HashMap<>(); + Map> outputReceivers = new HashMap<>(); + for (Entry remoteOutputCoder : remoteOutputCoders.entrySet()) { + List> outputContents = Collections.synchronizedList(new ArrayList<>()); + outputValues.put(remoteOutputCoder.getKey(), outputContents); + outputReceivers.put( + remoteOutputCoder.getKey(), + RemoteOutputReceiver.of( + (Coder>) remoteOutputCoder.getValue(), outputContents::add)); + } + + Map> userStateData = + ImmutableMap.of( + stateId, + new ArrayList( + Arrays.asList( + ByteString.copyFrom( + CoderUtils.encodeToByteArray( + StringUtf8Coder.of(), "A", Coder.Context.NESTED)), + ByteString.copyFrom( + CoderUtils.encodeToByteArray( + StringUtf8Coder.of(), "B", Coder.Context.NESTED)), + ByteString.copyFrom( + CoderUtils.encodeToByteArray( + StringUtf8Coder.of(), "C", Coder.Context.NESTED)))), + stateId2, + new ArrayList( + Arrays.asList( + ByteString.copyFrom( + CoderUtils.encodeToByteArray( + StringUtf8Coder.of(), "D", Coder.Context.NESTED))))); + + StoringStateRequestHandler stateRequestHandler = + new StoringStateRequestHandler( + StateRequestHandlers.forBagUserStateHandlerFactory( + descriptor, + new BagUserStateHandlerFactory() { + 
@Override + public BagUserStateHandler forUserState( + String pTransformId, + String userStateId, + Coder keyCoder, + Coder valueCoder, + Coder windowCoder) { + return new BagUserStateHandler() { + @Override + public Iterable get(ByteString key, BoundedWindow window) { + return (Iterable) userStateData.get(userStateId); + } + + @Override + public void append( + ByteString key, BoundedWindow window, Iterator values) { + Iterators.addAll(userStateData.get(userStateId), (Iterator) values); + } + + @Override + public void clear(ByteString key, BoundedWindow window) { + userStateData.get(userStateId).clear(); + } + }; + } + })); + + try (RemoteBundle bundle = + processor.newBundle( + outputReceivers, stateRequestHandler, BundleProgressHandler.ignored())) { + Iterables.getOnlyElement(bundle.getInputReceivers().values()) + .accept(valueInGlobalWindow(KV.of("X", "Y"))); + } + try (RemoteBundle bundle2 = + processor.newBundle( + outputReceivers, stateRequestHandler, BundleProgressHandler.ignored())) { + Iterables.getOnlyElement(bundle2.getInputReceivers().values()) + .accept(valueInGlobalWindow(KV.of("X", "Z"))); + } + for (Collection> windowedValues : outputValues.values()) { + assertThat( + windowedValues, + containsInAnyOrder( + valueInGlobalWindow(KV.of("X", "A")), + valueInGlobalWindow(KV.of("X", "B")), + valueInGlobalWindow(KV.of("X", "C")), + valueInGlobalWindow(KV.of("X", "A")), + valueInGlobalWindow(KV.of("X", "B")), + valueInGlobalWindow(KV.of("X", "C")), + valueInGlobalWindow(KV.of("X", "Empty")))); + } + assertThat( + userStateData.get(stateId), + IsIterableContainingInOrder.contains( + ByteString.copyFrom( + CoderUtils.encodeToByteArray(StringUtf8Coder.of(), "A", Coder.Context.NESTED)), + ByteString.copyFrom( + CoderUtils.encodeToByteArray(StringUtf8Coder.of(), "B", Coder.Context.NESTED)), + ByteString.copyFrom( + CoderUtils.encodeToByteArray(StringUtf8Coder.of(), "C", Coder.Context.NESTED)))); + assertThat(userStateData.get(stateId2), IsEmptyIterable.emptyIterable()); + + // 3 Requests expected: state read, state2 read, and state2 clear + assertEquals(3, stateRequestHandler.getRequestCount()); + ByteString.Output out = ByteString.newOutput(); + StringUtf8Coder.of().encode("X", out); + + assertEquals( + stateId, + stateRequestHandler + .receivedRequests + .get(0) + .getStateKey() + .getBagUserState() + .getUserStateId()); + assertEquals( + stateRequestHandler.receivedRequests.get(0).getStateKey().getBagUserState().getKey(), + out.toByteString()); + assertTrue(stateRequestHandler.receivedRequests.get(0).hasGet()); + + assertEquals( + stateId2, + stateRequestHandler + .receivedRequests + .get(1) + .getStateKey() + .getBagUserState() + .getUserStateId()); + assertEquals( + stateRequestHandler.receivedRequests.get(1).getStateKey().getBagUserState().getKey(), + out.toByteString()); + assertTrue(stateRequestHandler.receivedRequests.get(1).hasGet()); + + assertEquals( + stateId2, + stateRequestHandler + .receivedRequests + .get(2) + .getStateKey() + .getBagUserState() + .getUserStateId()); + assertEquals( + stateRequestHandler.receivedRequests.get(2).getStateKey().getBagUserState().getKey(), + out.toByteString()); + assertTrue(stateRequestHandler.receivedRequests.get(2).hasClear()); + } + + /** + * A state handler that stores each state request made - used to validate that cached requests are + * not forwarded to the state client. 
+ */ + private static class StoringStateRequestHandler implements StateRequestHandler { + + private StateRequestHandler stateRequestHandler; + private ArrayList receivedRequests; + private ArrayList cacheTokens; + + StoringStateRequestHandler(StateRequestHandler delegate) { + stateRequestHandler = delegate; + receivedRequests = new ArrayList<>(); + cacheTokens = new ArrayList<>(); + } + + @Override + public CompletionStage handle(BeamFnApi.StateRequest request) + throws Exception { + receivedRequests.add(request); + return stateRequestHandler.handle(request); + } + + @Override + public Iterable getCacheTokens() { + return Iterables.concat(stateRequestHandler.getCacheTokens(), cacheTokens); + } + + public int getRequestCount() { + return receivedRequests.size(); + } + + public void addCacheToken(BeamFnApi.ProcessBundleRequest.CacheToken token) { + cacheTokens.add(token); + } + } + @Test public void testExecutionWithTimer() throws Exception { + launchSdkHarness(PipelineOptionsFactory.create()); Pipeline p = Pipeline.create(); p.apply("impulse", Impulse.create()) @@ -1217,6 +1644,7 @@ public void processingTimer( @Test public void testExecutionWithMultipleStages() throws Exception { + launchSdkHarness(PipelineOptionsFactory.create()); Pipeline p = Pipeline.create(); Function> pCollectionGenerator = @@ -1361,6 +1789,7 @@ public IsBounded isBounded() { @Test(timeout = 60000L) public void testSplit() throws Exception { + launchSdkHarness(PipelineOptionsFactory.create()); Pipeline p = Pipeline.create(); p.apply("impulse", Impulse.create()) .apply( diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/SdkHarnessClientTest.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/SdkHarnessClientTest.java index 1d96b5473dd8..3528e99b2e1a 100644 --- a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/SdkHarnessClientTest.java +++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/SdkHarnessClientTest.java @@ -83,7 +83,7 @@ import org.apache.beam.sdk.util.WindowedValue.FullWindowedValueCoder; import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.sdk.values.TupleTagList; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.junit.Before; @@ -102,7 +102,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class SdkHarnessClientTest { diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/SingleEnvironmentInstanceJobBundleFactoryTest.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/SingleEnvironmentInstanceJobBundleFactoryTest.java index bf3c2ab8195f..a68aeda49d33 100644 --- a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/SingleEnvironmentInstanceJobBundleFactoryTest.java +++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/SingleEnvironmentInstanceJobBundleFactoryTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.fnexecution.control; +import static 
org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import static org.mockito.Matchers.any; import static org.mockito.Mockito.doThrow; @@ -37,13 +37,13 @@ import org.apache.beam.runners.core.construction.PipelineTranslation; import org.apache.beam.runners.core.construction.graph.ExecutableStage; import org.apache.beam.runners.core.construction.graph.GreedyPipelineFuser; -import org.apache.beam.runners.fnexecution.GrpcFnServer; -import org.apache.beam.runners.fnexecution.InProcessServerFactory; import org.apache.beam.runners.fnexecution.data.GrpcDataService; import org.apache.beam.runners.fnexecution.environment.EnvironmentFactory; import org.apache.beam.runners.fnexecution.environment.RemoteEnvironment; import org.apache.beam.runners.fnexecution.state.GrpcStateService; import org.apache.beam.sdk.Pipeline; +import org.apache.beam.sdk.fn.server.GrpcFnServer; +import org.apache.beam.sdk.fn.server.InProcessServerFactory; import org.apache.beam.sdk.fn.stream.OutboundObserverFactory; import org.apache.beam.sdk.options.ExperimentalOptions; import org.apache.beam.sdk.options.PipelineOptionsFactory; @@ -59,9 +59,6 @@ /** Tests for {@link SingleEnvironmentInstanceJobBundleFactory}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SingleEnvironmentInstanceJobBundleFactoryTest { @Mock private EnvironmentFactory environmentFactory; @Mock private InstructionRequestHandler instructionRequestHandler; diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/data/GrpcDataServiceTest.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/data/GrpcDataServiceTest.java index b648a0eecfb3..e53e29b2d172 100644 --- a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/data/GrpcDataServiceTest.java +++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/data/GrpcDataServiceTest.java @@ -18,10 +18,10 @@ package org.apache.beam.runners.fnexecution.data; import static org.apache.beam.sdk.util.CoderUtils.encodeToByteArray; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.empty; -import static org.junit.Assert.assertThat; import java.util.ArrayList; import java.util.Collection; @@ -34,8 +34,6 @@ import org.apache.beam.model.fnexecution.v1.BeamFnApi; import org.apache.beam.model.fnexecution.v1.BeamFnApi.Elements; import org.apache.beam.model.fnexecution.v1.BeamFnDataGrpc; -import org.apache.beam.runners.fnexecution.GrpcFnServer; -import org.apache.beam.runners.fnexecution.InProcessServerFactory; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.CoderException; import org.apache.beam.sdk.coders.LengthPrefixCoder; @@ -43,14 +41,16 @@ import org.apache.beam.sdk.fn.data.CloseableFnDataReceiver; import org.apache.beam.sdk.fn.data.InboundDataClient; import org.apache.beam.sdk.fn.data.LogicalEndpoint; +import org.apache.beam.sdk.fn.server.GrpcFnServer; +import org.apache.beam.sdk.fn.server.InProcessServerFactory; import org.apache.beam.sdk.fn.stream.OutboundObserverFactory; import org.apache.beam.sdk.fn.test.TestStreams; import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.util.WindowedValue; -import 
org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ManagedChannel; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.inprocess.InProcessChannelBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ManagedChannel; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessChannelBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.junit.Test; import org.junit.runner.RunWith; import org.junit.runners.JUnit4; diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactoryTest.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactoryTest.java index 049333eb2e3c..689140f9da2b 100644 --- a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactoryTest.java +++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactoryTest.java @@ -34,7 +34,6 @@ import org.apache.beam.model.pipeline.v1.Endpoints.ApiServiceDescriptor; import org.apache.beam.model.pipeline.v1.RunnerApi.Environment; import org.apache.beam.runners.core.construction.Environments; -import org.apache.beam.runners.fnexecution.GrpcFnServer; import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService; import org.apache.beam.runners.fnexecution.control.ControlClientPool; import org.apache.beam.runners.fnexecution.control.FnApiControlClientPoolService; @@ -43,6 +42,7 @@ import org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService; import org.apache.beam.sdk.fn.IdGenerator; import org.apache.beam.sdk.fn.IdGenerators; +import org.apache.beam.sdk.fn.server.GrpcFnServer; import org.apache.beam.sdk.options.ManualDockerEnvironmentOptions; import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.options.RemoteEnvironmentOptions; @@ -59,9 +59,6 @@ import org.mockito.MockitoAnnotations; /** Tests for {@link DockerEnvironmentFactory}. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DockerEnvironmentFactoryTest { private static final ApiServiceDescriptor SERVICE_DESCRIPTOR = diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/environment/ProcessEnvironmentFactoryTest.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/environment/ProcessEnvironmentFactoryTest.java index f3fafa510272..efddc7226089 100644 --- a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/environment/ProcessEnvironmentFactoryTest.java +++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/environment/ProcessEnvironmentFactoryTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.fnexecution.environment; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.is; -import static org.junit.Assert.assertThat; import static org.mockito.ArgumentMatchers.anyString; import static org.mockito.Matchers.anyList; import static org.mockito.Matchers.anyMap; @@ -32,11 +32,11 @@ import org.apache.beam.model.pipeline.v1.Endpoints.ApiServiceDescriptor; import org.apache.beam.model.pipeline.v1.RunnerApi.Environment; import org.apache.beam.runners.core.construction.Environments; -import org.apache.beam.runners.fnexecution.GrpcFnServer; import org.apache.beam.runners.fnexecution.control.InstructionRequestHandler; import org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService; import org.apache.beam.sdk.fn.IdGenerator; import org.apache.beam.sdk.fn.IdGenerators; +import org.apache.beam.sdk.fn.server.GrpcFnServer; import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.options.RemoteEnvironmentOptions; import org.junit.Before; @@ -49,9 +49,6 @@ /** Tests for {@link ProcessEnvironmentFactory}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ProcessEnvironmentFactoryTest { private static final ApiServiceDescriptor SERVICE_DESCRIPTOR = diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/environment/ProcessManagerTest.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/environment/ProcessManagerTest.java index d2d9d4753399..8108cd4349e4 100644 --- a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/environment/ProcessManagerTest.java +++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/environment/ProcessManagerTest.java @@ -17,12 +17,12 @@ */ package org.apache.beam.runners.fnexecution.environment; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsString; import static org.hamcrest.Matchers.is; import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertNotNull; import static org.junit.Assert.assertNull; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.junit.Assert.fail; @@ -30,7 +30,7 @@ import java.io.File; import java.io.IOException; import java.io.PrintStream; -import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; import java.nio.file.Files; import java.util.Arrays; import java.util.Collections; @@ -40,9 +40,6 @@ /** Tests for {@link ProcessManager}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ProcessManagerTest { @Test @@ -130,7 +127,7 @@ public void testRedirectOutput() throws IOException, InterruptedException { processManager.stopProcess("1"); byte[] output = Files.readAllBytes(outputFile.toPath()); assertNotNull(output); - String outputStr = new String(output, Charset.defaultCharset()); + String outputStr = new String(output, StandardCharsets.UTF_8); assertThat(outputStr, containsString("testing123")); } @@ -158,7 +155,7 @@ public void testInheritIO() throws IOException, InterruptedException { } // TODO: this doesn't work as inherit IO bypasses System.out/err // the output instead appears in the console - // String outputStr = new String(baos.toByteArray(), Charset.defaultCharset()); + // String outputStr = new String(baos.toByteArray(), StandardCharsets.UTF_8); // assertThat(outputStr, containsString("testing123")); assertFalse(ProcessManager.INHERIT_IO_FILE.exists()); } diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/environment/RemoteEnvironmentTest.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/environment/RemoteEnvironmentTest.java index c61d1e04d36f..af8a810cef77 100644 --- a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/environment/RemoteEnvironmentTest.java +++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/environment/RemoteEnvironmentTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.fnexecution.environment; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.theInstance; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.mock; import static org.mockito.Mockito.verify; diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/logging/GrpcLoggingServiceTest.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/logging/GrpcLoggingServiceTest.java index 39a5e5541036..42e204829715 100644 --- a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/logging/GrpcLoggingServiceTest.java +++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/logging/GrpcLoggingServiceTest.java @@ -17,11 +17,12 @@ */ package org.apache.beam.runners.fnexecution.logging; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; -import static org.junit.Assert.assertThat; import java.util.ArrayList; import java.util.Collection; +import java.util.concurrent.BlockingQueue; import java.util.concurrent.Callable; import java.util.concurrent.ConcurrentLinkedQueue; import java.util.concurrent.CountDownLatch; @@ -34,12 +35,12 @@ import org.apache.beam.model.fnexecution.v1.BeamFnApi.LogControl; import org.apache.beam.model.fnexecution.v1.BeamFnApi.LogEntry; import org.apache.beam.model.fnexecution.v1.BeamFnLoggingGrpc; -import org.apache.beam.runners.fnexecution.GrpcFnServer; -import org.apache.beam.runners.fnexecution.InProcessServerFactory; +import org.apache.beam.sdk.fn.server.GrpcFnServer; +import org.apache.beam.sdk.fn.server.InProcessServerFactory; import org.apache.beam.sdk.fn.test.TestStreams; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ManagedChannel; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.inprocess.InProcessChannelBuilder; -import 
org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ManagedChannel; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessChannelBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.junit.Test; import org.junit.runner.RunWith; import org.junit.runners.JUnit4; @@ -102,12 +103,14 @@ public void testMultipleClientsFailingIsHandledGracefullyByServer() throws Excep try (GrpcFnServer server = GrpcFnServer.allocatePortAndCreateFor(service, InProcessServerFactory.create())) { + CountDownLatch waitForTermination = new CountDownLatch(3); + final BlockingQueue> outboundObservers = + new LinkedBlockingQueue<>(); Collection> tasks = new ArrayList<>(); for (int i = 1; i <= 3; ++i) { final int instructionId = i; tasks.add( () -> { - CountDownLatch waitForTermination = new CountDownLatch(1); ManagedChannel channel = InProcessChannelBuilder.forName(server.getApiServiceDescriptor().getUrl()) .build(); @@ -118,13 +121,17 @@ public void testMultipleClientsFailingIsHandledGracefullyByServer() throws Excep .withOnError(new CountDown(waitForTermination)) .build()); outboundObserver.onNext(createLogsWithIds(instructionId, -instructionId)); - outboundObserver.onError(new RuntimeException("Client " + instructionId)); - waitForTermination.await(); + outboundObservers.add(outboundObserver); return null; }); } ExecutorService executorService = Executors.newCachedThreadPool(); executorService.invokeAll(tasks); + + for (int i = 1; i <= 3; ++i) { + outboundObservers.take().onError(new RuntimeException("Client " + i)); + } + waitForTermination.await(); } } diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/provisioning/StaticGrpcProvisionServiceTest.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/provisioning/StaticGrpcProvisionServiceTest.java index bb03c58b3a1a..ac077134e61e 100644 --- a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/provisioning/StaticGrpcProvisionServiceTest.java +++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/provisioning/StaticGrpcProvisionServiceTest.java @@ -17,22 +17,22 @@ */ package org.apache.beam.runners.fnexecution.provisioning; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import org.apache.beam.model.fnexecution.v1.ProvisionApi.GetProvisionInfoRequest; import org.apache.beam.model.fnexecution.v1.ProvisionApi.GetProvisionInfoResponse; import org.apache.beam.model.fnexecution.v1.ProvisionApi.ProvisionInfo; import org.apache.beam.model.fnexecution.v1.ProvisionServiceGrpc; import org.apache.beam.model.fnexecution.v1.ProvisionServiceGrpc.ProvisionServiceBlockingStub; -import org.apache.beam.runners.fnexecution.GrpcContextHeaderAccessorProvider; -import org.apache.beam.runners.fnexecution.GrpcFnServer; -import org.apache.beam.runners.fnexecution.InProcessServerFactory; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ListValue; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.NullValue; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Struct; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Value; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.inprocess.InProcessChannelBuilder; +import org.apache.beam.sdk.fn.server.GrpcContextHeaderAccessorProvider; +import 
org.apache.beam.sdk.fn.server.GrpcFnServer; +import org.apache.beam.sdk.fn.server.InProcessServerFactory; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ListValue; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.NullValue; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Struct; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Value; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessChannelBuilder; import org.junit.Test; import org.junit.runner.RunWith; import org.junit.runners.JUnit4; diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/state/GrpcStateServiceTest.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/state/GrpcStateServiceTest.java index 9a2ab99677d6..c9a2827a9b50 100644 --- a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/state/GrpcStateServiceTest.java +++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/state/GrpcStateServiceTest.java @@ -18,7 +18,7 @@ package org.apache.beam.runners.fnexecution.state; import static org.hamcrest.CoreMatchers.equalTo; -import static org.junit.Assert.assertThat; +import static org.hamcrest.MatcherAssert.assertThat; import static org.mockito.Matchers.any; import static org.mockito.Mockito.never; import static org.mockito.Mockito.times; @@ -31,8 +31,8 @@ import java.util.concurrent.TimeUnit; import org.apache.beam.model.fnexecution.v1.BeamFnApi; import org.apache.beam.sdk.fn.test.TestStreams; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.junit.Before; import org.junit.Test; import org.junit.runner.RunWith; @@ -45,7 +45,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class GrpcStateServiceTest { private static final long TIMEOUT_MS = 30 * 1000; diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/state/StateRequestHandlersTest.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/state/StateRequestHandlersTest.java index 0c0c68de2730..3fbee1fb7c1f 100644 --- a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/state/StateRequestHandlersTest.java +++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/state/StateRequestHandlersTest.java @@ -37,7 +37,7 @@ import org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors; import org.apache.beam.sdk.transforms.windowing.GlobalWindow; import org.apache.beam.sdk.util.CoderUtils; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.junit.Test; import org.junit.runner.RunWith; diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/status/BeamWorkerStatusGrpcServiceTest.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/status/BeamWorkerStatusGrpcServiceTest.java index 35fe7b1915f1..9499f38c9f06 100644 --- 
a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/status/BeamWorkerStatusGrpcServiceTest.java +++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/status/BeamWorkerStatusGrpcServiceTest.java @@ -36,13 +36,13 @@ import org.apache.beam.model.fnexecution.v1.BeamFnWorkerStatusGrpc; import org.apache.beam.model.fnexecution.v1.BeamFnWorkerStatusGrpc.BeamFnWorkerStatusStub; import org.apache.beam.model.pipeline.v1.Endpoints.ApiServiceDescriptor; -import org.apache.beam.runners.fnexecution.GrpcContextHeaderAccessorProvider; -import org.apache.beam.runners.fnexecution.GrpcFnServer; -import org.apache.beam.runners.fnexecution.InProcessServerFactory; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ManagedChannel; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.inprocess.InProcessChannelBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.testing.GrpcCleanupRule; +import org.apache.beam.sdk.fn.server.GrpcContextHeaderAccessorProvider; +import org.apache.beam.sdk.fn.server.GrpcFnServer; +import org.apache.beam.sdk.fn.server.InProcessServerFactory; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ManagedChannel; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessChannelBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.testing.GrpcCleanupRule; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Sets; import org.junit.After; import org.junit.Before; @@ -54,9 +54,6 @@ import org.mockito.MockitoAnnotations; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamWorkerStatusGrpcServiceTest { @Rule public final GrpcCleanupRule grpcCleanup = new GrpcCleanupRule(); diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/status/WorkerStatusClientTest.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/status/WorkerStatusClientTest.java index 1a810ebffb4d..705f209593f1 100644 --- a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/status/WorkerStatusClientTest.java +++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/status/WorkerStatusClientTest.java @@ -26,7 +26,7 @@ import org.apache.beam.model.fnexecution.v1.BeamFnApi; import org.apache.beam.model.fnexecution.v1.BeamFnApi.WorkerStatusRequest; import org.apache.beam.model.fnexecution.v1.BeamFnApi.WorkerStatusResponse; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.junit.Assert; import org.junit.Before; import org.junit.Test; @@ -36,9 +36,6 @@ import org.mockito.MockitoAnnotations; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class WorkerStatusClientTest { @Mock public StreamObserver mockObserver; diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/translation/BatchSideInputHandlerFactoryTest.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/translation/BatchSideInputHandlerFactoryTest.java index 7fb4a525f31f..1f459456d13b 100644 --- 
a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/translation/BatchSideInputHandlerFactoryTest.java +++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/translation/BatchSideInputHandlerFactoryTest.java @@ -62,9 +62,6 @@ /** Tests for {@link BatchSideInputHandlerFactory}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BatchSideInputHandlerFactoryTest { private static final String TRANSFORM_ID = "transform-id"; diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/translation/PipelineTranslatorUtilsTest.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/translation/PipelineTranslatorUtilsTest.java index 83b983d1f526..36997d66eaaf 100644 --- a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/translation/PipelineTranslatorUtilsTest.java +++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/translation/PipelineTranslatorUtilsTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.fnexecution.translation; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.core.Is.is; -import static org.junit.Assert.assertThat; import java.util.Arrays; import java.util.List; diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/wire/ByteStringCoderTest.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/wire/ByteStringCoderTest.java new file mode 100644 index 000000000000..5a831c2047a3 --- /dev/null +++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/wire/ByteStringCoderTest.java @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.wire; + +import static org.hamcrest.MatcherAssert.assertThat; +import static org.hamcrest.Matchers.equalTo; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertTrue; + +import java.util.Arrays; +import java.util.List; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.Coder.Context; +import org.apache.beam.sdk.coders.CoderException; +import org.apache.beam.sdk.coders.ListCoder; +import org.apache.beam.sdk.testing.CoderProperties; +import org.apache.beam.sdk.util.CoderUtils; +import org.apache.beam.sdk.values.TypeDescriptor; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.ExpectedException; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Test case for {@link ByteStringCoder}. */ +@RunWith(JUnit4.class) +public class ByteStringCoderTest { + + private static final ByteStringCoder TEST_CODER = ByteStringCoder.of(); + + private static final List<String> TEST_STRING_VALUES = + Arrays.asList( + "", + "a", + "13", + "hello", + "a longer string with spaces and all that", + "a string with a \n newline", + "???????????????"); + private static final ImmutableList<ByteString> TEST_VALUES; + + static { + ImmutableList.Builder<ByteString> builder = ImmutableList.builder(); + for (String s : TEST_STRING_VALUES) { + builder.add(ByteString.copyFromUtf8(s)); + } + TEST_VALUES = builder.build(); + } + + /** + * Generated data to check that the wire format has not changed. To regenerate, see {@link + * org.apache.beam.sdk.coders.PrintBase64Encodings}. + */ + private static final List<String> TEST_ENCODINGS = + Arrays.asList( + "", + "YQ", + "MTM", + "aGVsbG8", + "YSBsb25nZXIgc3RyaW5nIHdpdGggc3BhY2VzIGFuZCBhbGwgdGhhdA", + "YSBzdHJpbmcgd2l0aCBhIAogbmV3bGluZQ", + "Pz8_Pz8_Pz8_Pz8_Pz8_"); + + @Rule public ExpectedException thrown = ExpectedException.none(); + + @Test + public void testDecodeEncodeEqualInAllContexts() throws Exception { + for (ByteString value : TEST_VALUES) { + CoderProperties.coderDecodeEncodeEqual(TEST_CODER, value); + } + } + + @Test + public void testWireFormatEncode() throws Exception { + CoderProperties.coderEncodesBase64(TEST_CODER, TEST_VALUES, TEST_ENCODINGS); + } + + @Test + public void testCoderDeterministic() throws Throwable { + TEST_CODER.verifyDeterministic(); + } + + @Test + public void testConsistentWithEquals() { + assertTrue(TEST_CODER.consistentWithEquals()); + } + + @Test + public void testEncodeNullThrowsCoderException() throws Exception { + thrown.expect(CoderException.class); + thrown.expectMessage("cannot encode a null ByteString"); + + CoderUtils.encodeToBase64(TEST_CODER, null); + } + + @Test + public void testNestedCoding() throws Throwable { + Coder<List<ByteString>> listCoder = ListCoder.of(TEST_CODER); + CoderProperties.coderDecodeEncodeContentsEqual(listCoder, TEST_VALUES); + CoderProperties.coderDecodeEncodeContentsInSameOrder(listCoder, TEST_VALUES); + } + + @Test + public void testEncodedElementByteSize() throws Throwable { + for (ByteString value : TEST_VALUES) { + byte[] encoded = CoderUtils.encodeToByteArray(TEST_CODER, value, Context.NESTED); + assertEquals(encoded.length, TEST_CODER.getEncodedElementByteSize(value)); + } + } + + @Test + public void testEncodedTypeDescriptor() throws Exception { + assertThat(TEST_CODER.getEncodedTypeDescriptor(), equalTo(TypeDescriptor.of(ByteString.class))); + } 
+} diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/wire/LengthPrefixUnknownCodersTest.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/wire/LengthPrefixUnknownCodersTest.java index 2cdd6b164cba..a423b3686cd9 100644 --- a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/wire/LengthPrefixUnknownCodersTest.java +++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/wire/LengthPrefixUnknownCodersTest.java @@ -45,9 +45,6 @@ /** Tests for {@link LengthPrefixUnknownCoders}. */ @RunWith(Parameterized.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class LengthPrefixUnknownCodersTest { private static class UnknownCoder extends CustomCoder { diff --git a/runners/java-job-service/build.gradle b/runners/java-job-service/build.gradle index 7fce501a25ee..69bc8d9a5a01 100644 --- a/runners/java-job-service/build.gradle +++ b/runners/java-job-service/build.gradle @@ -26,17 +26,20 @@ dependencies { compile library.java.vendored_guava_26_0_jre compile project(":runners:core-construction-java") compile project(path: ":model:pipeline", configuration: "shadow") - compile project(path: ":model:fn-execution", configuration: "shadow") + compile project(path: ":sdks:java:core", configuration: "shadow") + compile project(path: ":model:job-management", configuration: "shadow") compile project(":sdks:java:expansion-service") compile project(":sdks:java:fn-execution") compile project(":runners:core-construction-java") compile project(":runners:java-fn-execution") - compile library.java.vendored_grpc_1_26_0 + compile library.java.jackson_core + compile library.java.jackson_databind + compile library.java.joda_time + compile library.java.commons_compress + compile library.java.vendored_grpc_1_36_0 compile library.java.slf4j_api compile library.java.args4j testCompile library.java.junit - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library testCompile library.java.mockito_core testRuntimeOnly library.java.slf4j_simple } diff --git a/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/InMemoryJobService.java b/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/InMemoryJobService.java index 01dda109e3ac..c5c0eb363ed8 100644 --- a/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/InMemoryJobService.java +++ b/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/InMemoryJobService.java @@ -50,17 +50,17 @@ import org.apache.beam.model.pipeline.v1.Endpoints; import org.apache.beam.model.pipeline.v1.RunnerApi; import org.apache.beam.runners.core.construction.graph.PipelineValidator; -import org.apache.beam.runners.fnexecution.FnService; -import org.apache.beam.runners.fnexecution.GrpcFnServer; import org.apache.beam.runners.fnexecution.artifact.ArtifactStagingService; +import org.apache.beam.sdk.fn.server.FnService; +import org.apache.beam.sdk.fn.server.GrpcFnServer; import org.apache.beam.sdk.fn.stream.SynchronizedStreamObserver; import org.apache.beam.sdk.function.ThrowingConsumer; import org.apache.beam.sdk.options.PipelineOptionsFactory; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Struct; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Status; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.StatusException; -import 
org.apache.beam.vendor.grpc.v1p26p0.io.grpc.StatusRuntimeException; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Struct; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Status; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.StatusException; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.StatusRuntimeException; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; import org.slf4j.Logger; diff --git a/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobInvocation.java b/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobInvocation.java index 192d009c6caa..9cf44e302d45 100644 --- a/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobInvocation.java +++ b/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobInvocation.java @@ -35,7 +35,7 @@ import org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline; import org.apache.beam.runners.fnexecution.provisioning.JobInfo; import org.apache.beam.sdk.PipelineResult; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.util.Timestamps; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.util.Timestamps; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.FutureCallback; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.Futures; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.ListenableFuture; diff --git a/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobInvoker.java b/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobInvoker.java index 356c228d73a5..383b5a836765 100644 --- a/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobInvoker.java +++ b/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobInvoker.java @@ -22,7 +22,7 @@ import java.util.concurrent.ThreadFactory; import javax.annotation.Nullable; import org.apache.beam.model.pipeline.v1.RunnerApi; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Struct; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Struct; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.ListeningExecutorService; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.MoreExecutors; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.ThreadFactoryBuilder; diff --git a/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobPreparation.java b/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobPreparation.java index 9aa9895e4dab..a9066e1f7384 100644 --- a/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobPreparation.java +++ b/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobPreparation.java @@ -19,7 +19,7 @@ import com.google.auto.value.AutoValue; import org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Struct; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Struct; /** A job that has been prepared, but not 
invoked. */ @AutoValue diff --git a/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobServerDriver.java b/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobServerDriver.java index 499fc8c9ecd4..c1c924f7e932 100644 --- a/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobServerDriver.java +++ b/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobServerDriver.java @@ -20,11 +20,11 @@ import java.io.IOException; import java.nio.file.Paths; import org.apache.beam.model.pipeline.v1.Endpoints; -import org.apache.beam.runners.fnexecution.GrpcFnServer; -import org.apache.beam.runners.fnexecution.ServerFactory; import org.apache.beam.runners.fnexecution.artifact.ArtifactStagingService; import org.apache.beam.sdk.expansion.service.ExpansionServer; import org.apache.beam.sdk.expansion.service.ExpansionService; +import org.apache.beam.sdk.fn.server.GrpcFnServer; +import org.apache.beam.sdk.fn.server.ServerFactory; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.kohsuke.args4j.Option; import org.kohsuke.args4j.spi.ExplicitBooleanOptionHandler; @@ -90,7 +90,11 @@ public static class ServerConfiguration { usage = "The Java expansion service port. 0 to use a dynamic port. (Default: 8097)") private int expansionPort = 8097; - @Option(name = "--artifacts-dir", usage = "The location to store staged artifact files") + @Option( + name = "--artifacts-dir", + usage = + "The location to store staged artifact files. " + + "If artifact staging is needed, this directory must be accessible by the execution engine's workers.") private String artifactStagingPath = Paths.get(System.getProperty("java.io.tmpdir"), "beam-artifact-staging").toString(); diff --git a/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/PortablePipelineJarCreator.java b/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/PortablePipelineJarCreator.java index f8f92d54377a..07103ee439a2 100644 --- a/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/PortablePipelineJarCreator.java +++ b/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/PortablePipelineJarCreator.java @@ -46,8 +46,8 @@ import org.apache.beam.sdk.io.ClassLoaderFileSystem; import org.apache.beam.sdk.metrics.MetricResults; import org.apache.beam.sdk.options.PortablePipelineOptions; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.MessageOrBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.util.JsonFormat; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.MessageOrBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.util.JsonFormat; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteStreams; diff --git a/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/PortablePipelineJarUtils.java b/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/PortablePipelineJarUtils.java index d460ef7678a1..dc6b6760187a 100644 --- a/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/PortablePipelineJarUtils.java +++ 
b/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/PortablePipelineJarUtils.java @@ -27,9 +27,9 @@ import java.util.jar.JarEntry; import java.util.jar.JarOutputStream; import org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Message.Builder; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Struct; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.util.JsonFormat; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Message.Builder; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Struct; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.util.JsonFormat; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteStreams; import org.slf4j.Logger; import org.slf4j.LoggerFactory; diff --git a/runners/java-job-service/src/test/java/org/apache/beam/runners/jobsubmission/InMemoryJobServiceTest.java b/runners/java-job-service/src/test/java/org/apache/beam/runners/jobsubmission/InMemoryJobServiceTest.java index 4b4a5cd05a4d..ff227dcdd914 100644 --- a/runners/java-job-service/src/test/java/org/apache/beam/runners/jobsubmission/InMemoryJobServiceTest.java +++ b/runners/java-job-service/src/test/java/org/apache/beam/runners/jobsubmission/InMemoryJobServiceTest.java @@ -35,11 +35,11 @@ import org.apache.beam.model.jobmanagement.v1.JobApi; import org.apache.beam.model.pipeline.v1.Endpoints; import org.apache.beam.model.pipeline.v1.RunnerApi; -import org.apache.beam.runners.fnexecution.GrpcFnServer; import org.apache.beam.runners.fnexecution.artifact.ArtifactStagingService; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Struct; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.StatusException; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.sdk.fn.server.GrpcFnServer; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Struct; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.StatusException; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.junit.Before; import org.junit.Test; @@ -51,9 +51,6 @@ /** Tests for {@link InMemoryJobService}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class InMemoryJobServiceTest { private static final String TEST_JOB_NAME = "test-job"; diff --git a/runners/java-job-service/src/test/java/org/apache/beam/runners/jobsubmission/JobInvocationTest.java b/runners/java-job-service/src/test/java/org/apache/beam/runners/jobsubmission/JobInvocationTest.java index 76a665343b58..165ed30d1257 100644 --- a/runners/java-job-service/src/test/java/org/apache/beam/runners/jobsubmission/JobInvocationTest.java +++ b/runners/java-job-service/src/test/java/org/apache/beam/runners/jobsubmission/JobInvocationTest.java @@ -33,7 +33,7 @@ import org.apache.beam.sdk.Pipeline; import org.apache.beam.sdk.PipelineResult; import org.apache.beam.sdk.metrics.MetricResults; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Struct; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Struct; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.ListeningExecutorService; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.MoreExecutors; import org.joda.time.Duration; @@ -42,9 +42,6 @@ import org.junit.Test; /** Tests for {@link JobInvocation}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class JobInvocationTest { private static ExecutorService executorService; diff --git a/runners/java-job-service/src/test/java/org/apache/beam/runners/jobsubmission/PortablePipelineJarCreatorTest.java b/runners/java-job-service/src/test/java/org/apache/beam/runners/jobsubmission/PortablePipelineJarCreatorTest.java index 8c613aff711b..9c600db652fc 100644 --- a/runners/java-job-service/src/test/java/org/apache/beam/runners/jobsubmission/PortablePipelineJarCreatorTest.java +++ b/runners/java-job-service/src/test/java/org/apache/beam/runners/jobsubmission/PortablePipelineJarCreatorTest.java @@ -41,7 +41,7 @@ import java.util.jar.Manifest; import org.apache.beam.model.pipeline.v1.RunnerApi; import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.junit.Before; import org.junit.Test; @@ -51,9 +51,6 @@ /** Unit tests for {@link PortablePipelineJarCreator}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PortablePipelineJarCreatorTest implements Serializable { @Mock private JarFile inputJar; diff --git a/runners/jet/build.gradle b/runners/jet/build.gradle index 754d68feb929..16b7526d2aed 100644 --- a/runners/jet/build.gradle +++ b/runners/jet/build.gradle @@ -42,13 +42,16 @@ configurations { dependencies { compile project(path: ":sdks:java:core", configuration: "shadow") compile project(":runners:core-java") + compile project(path: ":model:pipeline", configuration: "shadow") + compile project(":runners:core-construction-java") compile "com.hazelcast.jet:hazelcast-jet:$jet_version" + compile library.java.joda_time + compile library.java.vendored_guava_26_0_jre + compile library.java.slf4j_api testCompile project(path: ":sdks:java:core", configuration: "shadowTest") testCompile project(path: ":runners:core-java", configuration: "testRuntime") testCompile project(path: ":runners:core-construction-java", configuration: "testRuntime") - testCompile library.java.hamcrest_core - testCompile library.java.junit testCompile library.java.joda_time testCompile "com.hazelcast.jet:hazelcast-jet-core:$jet_version:tests" testCompile "com.hazelcast:hazelcast:$hazelcast_version:tests" diff --git a/runners/jet/src/main/java/org/apache/beam/runners/jet/processors/AbstractParDoP.java b/runners/jet/src/main/java/org/apache/beam/runners/jet/processors/AbstractParDoP.java index 92b4449f6770..c5df5a6fa3b0 100644 --- a/runners/jet/src/main/java/org/apache/beam/runners/jet/processors/AbstractParDoP.java +++ b/runners/jet/src/main/java/org/apache/beam/runners/jet/processors/AbstractParDoP.java @@ -148,8 +148,7 @@ public void init(@Nonnull Outbox outbox, @Nonnull Context context) { this.outbox = outbox; this.metricsContainer = new JetMetricsContainer(stepId, ownerId, context); - doFnInvoker = DoFnInvokers.invokerFor(doFn); - doFnInvoker.invokeSetup(); + doFnInvoker = DoFnInvokers.tryInvokeSetupFor(doFn, pipelineOptions.get()); if (ordinalToSideInput.isEmpty()) { sideInputReader = NullSideInputReader.of(Collections.emptyList()); diff --git a/runners/jet/src/test/java/org/apache/beam/runners/jet/TestJetRunner.java b/runners/jet/src/test/java/org/apache/beam/runners/jet/TestJetRunner.java index 739c5d309196..bc21ff2b3689 100644 --- a/runners/jet/src/test/java/org/apache/beam/runners/jet/TestJetRunner.java +++ b/runners/jet/src/test/java/org/apache/beam/runners/jet/TestJetRunner.java @@ -47,7 +47,6 @@ /** Slightly altered version of the Jet based runner, used in unit-tests. 
*/ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class TestJetRunner extends PipelineRunner { diff --git a/runners/jet/src/test/java/org/apache/beam/runners/jet/TestStreamP.java b/runners/jet/src/test/java/org/apache/beam/runners/jet/TestStreamP.java index 7d2b24646400..abb37f6193da 100644 --- a/runners/jet/src/test/java/org/apache/beam/runners/jet/TestStreamP.java +++ b/runners/jet/src/test/java/org/apache/beam/runners/jet/TestStreamP.java @@ -42,7 +42,6 @@ */ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class TestStreamP extends AbstractProcessor { diff --git a/runners/local-java/build.gradle b/runners/local-java/build.gradle index cf17e94a4ea5..ac1789c264c4 100644 --- a/runners/local-java/build.gradle +++ b/runners/local-java/build.gradle @@ -35,7 +35,5 @@ dependencies { */ compile project(path: ":sdks:java:core", configuration: "shadow") compile library.java.joda_time - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library testCompile library.java.junit } diff --git a/runners/local-java/src/test/java/org/apache/beam/runners/local/StructuralKeyTest.java b/runners/local-java/src/test/java/org/apache/beam/runners/local/StructuralKeyTest.java index 91c4887febe8..96934fd14254 100644 --- a/runners/local-java/src/test/java/org/apache/beam/runners/local/StructuralKeyTest.java +++ b/runners/local-java/src/test/java/org/apache/beam/runners/local/StructuralKeyTest.java @@ -17,13 +17,13 @@ */ package org.apache.beam.runners.local; +import static org.hamcrest.MatcherAssert.assertThat; import static org.junit.Assert.assertArrayEquals; import org.apache.beam.sdk.coders.ByteArrayCoder; import org.apache.beam.sdk.coders.StringUtf8Coder; import org.apache.beam.sdk.coders.VarIntCoder; import org.hamcrest.Matchers; -import org.junit.Assert; import org.junit.Test; import org.junit.runner.RunWith; import org.junit.runners.JUnit4; @@ -33,8 +33,8 @@ public class StructuralKeyTest { @Test public void getKeyEqualToOldKey() { - Assert.assertThat(StructuralKey.of(1234, VarIntCoder.of()).getKey(), Matchers.equalTo(1234)); - Assert.assertThat( + assertThat(StructuralKey.of(1234, VarIntCoder.of()).getKey(), Matchers.equalTo(1234)); + assertThat( StructuralKey.of("foobar", StringUtf8Coder.of()).getKey(), Matchers.equalTo("foobar")); assertArrayEquals( StructuralKey.of(new byte[] {2, 9, -22}, ByteArrayCoder.of()).getKey(), @@ -46,21 +46,21 @@ public void getKeyNotSameInstance() { byte[] original = new byte[] {1, 4, 9, 127, -22}; StructuralKey key = StructuralKey.of(original, ByteArrayCoder.of()); - Assert.assertThat(key.getKey(), Matchers.not(Matchers.theInstance(original))); + assertThat(key.getKey(), Matchers.not(Matchers.theInstance(original))); } @Test public void emptyKeysNotEqual() { StructuralKey empty = StructuralKey.empty(); - Assert.assertThat(empty, Matchers.not(Matchers.equalTo(StructuralKey.empty()))); - Assert.assertThat(empty, Matchers.equalTo(empty)); + assertThat(empty, Matchers.not(Matchers.equalTo(StructuralKey.empty()))); + assertThat(empty, Matchers.equalTo(empty)); } @Test public void objectEqualsTrueKeyEquals() { StructuralKey original = StructuralKey.of(1234, VarIntCoder.of()); - Assert.assertThat(StructuralKey.of(1234, VarIntCoder.of()), Matchers.equalTo(original)); + assertThat(StructuralKey.of(1234, 
VarIntCoder.of()), Matchers.equalTo(original)); } @Test @@ -70,7 +70,7 @@ public void objectsNotEqualEncodingsEqualEquals() { StructuralKey otherKey = StructuralKey.of(new byte[] {1, 4, 9, 127, -22}, ByteArrayCoder.of()); - Assert.assertThat(key, Matchers.equalTo(otherKey)); + assertThat(key, Matchers.equalTo(otherKey)); } @Test @@ -80,6 +80,6 @@ public void notEqualEncodingsEqual() { StructuralKey otherKey = StructuralKey.of(new byte[] {9, -128, 22}, ByteArrayCoder.of()); - Assert.assertThat(key, Matchers.not(Matchers.equalTo(otherKey))); + assertThat(key, Matchers.not(Matchers.equalTo(otherKey))); } } diff --git a/runners/portability/java/build.gradle b/runners/portability/java/build.gradle index 32f64d6c56b9..6309aa5a14a5 100644 --- a/runners/portability/java/build.gradle +++ b/runners/portability/java/build.gradle @@ -32,12 +32,18 @@ configurations { dependencies { compile library.java.vendored_guava_26_0_jre - compile library.java.hamcrest_library compile project(":runners:java-fn-execution") compile project(":runners:java-job-service") compile project(path: ":sdks:java:harness", configuration: "shadow") - compile library.java.vendored_grpc_1_26_0 + compile library.java.vendored_grpc_1_36_0 compile library.java.slf4j_api + compile library.java.joda_time + compile "org.hamcrest:hamcrest:2.1" + compile project(path: ":model:job-management:", configuration: "shadow") + compile project(path: ":model:pipeline", configuration: "shadow") + compile project(":runners:core-construction-java") + compile project(path: ":sdks:java:core", configuration: "shadow") + compile project(":sdks:java:fn-execution") testCompile project(path: ":runners:core-construction-java", configuration: "testRuntime") testCompile library.java.hamcrest_core @@ -140,7 +146,6 @@ def createUlrValidatesRunnerTask = { name, environmentType, dockerImageTask = "" includeCategories 'org.apache.beam.sdk.testing.ValidatesRunner' excludeCategories 'org.apache.beam.sdk.testing.UsesGaugeMetrics' excludeCategories 'org.apache.beam.sdk.testing.UsesOnWindowExpiration' - excludeCategories 'org.apache.beam.sdk.testing.UsesBundleFinalizer' excludeCategories 'org.apache.beam.sdk.testing.UsesMapState' excludeCategories 'org.apache.beam.sdk.testing.UsesSetState' excludeCategories 'org.apache.beam.sdk.testing.UsesOrderedListState' @@ -154,9 +159,6 @@ def createUlrValidatesRunnerTask = { name, environmentType, dockerImageTask = "" // https://issues.apache.org/jira/browse/BEAM-10446 excludeTestsMatching 'org.apache.beam.sdk.metrics.MetricsTest$CommittedMetricTests.testCommittedDistributionMetrics' - // This test seems erroneously labeled ValidatesRunner - excludeTestsMatching 'org.apache.beam.sdk.schemas.AvroSchemaTest.testAvroPipelineGroupBy' - // Teardown not called in exceptions // https://issues.apache.org/jira/browse/BEAM-10447 excludeTestsMatching 'org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInFinishBundle' @@ -172,27 +174,11 @@ def createUlrValidatesRunnerTask = { name, environmentType, dockerImageTask = "" // https://issues.apache.org/jira/browse/BEAM-10448 excludeTestsMatching 'org.apache.beam.sdk.transforms.windowing.WindowTest.testMergingCustomWindows' excludeTestsMatching 'org.apache.beam.sdk.transforms.windowing.WindowTest.testMergingCustomWindowsKeyedCollection' + excludeTestsMatching 'org.apache.beam.sdk.transforms.windowing.WindowTest.testMergingCustomWindowsWithoutCustomWindowTypes' excludeTestsMatching 'org.apache.beam.sdk.transforms.windowing.WindowingTest.testMergingWindowing' 
excludeTestsMatching 'org.apache.beam.sdk.transforms.windowing.WindowingTest.testNonPartitioningWindowing' excludeTestsMatching 'org.apache.beam.sdk.transforms.GroupByKeyTest$WindowTests.testGroupByKeyMergingWindows' - // Flatten with empty PCollections hangs - // https://issues.apache.org/jira/browse/BEAM-10450 - excludeTestsMatching 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput' - excludeTestsMatching 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty' - excludeTestsMatching 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo' - - // Empty side inputs hang - // https://issues.apache.org/jira/browse/BEAM-10449 - excludeTestsMatching 'org.apache.beam.sdk.transforms.ViewTest.testWindowedSideInputFixedToFixedWithDefault' - excludeTestsMatching 'org.apache.beam.sdk.transforms.ViewTest.testEmptyIterableSideInput' - excludeTestsMatching 'org.apache.beam.sdk.transforms.ViewTest.testEmptySingletonSideInput' - excludeTestsMatching 'org.apache.beam.sdk.transforms.ViewTest.testEmptyListSideInput' - excludeTestsMatching 'org.apache.beam.sdk.transforms.ViewTest.testEmptyMultimapSideInput' - excludeTestsMatching 'org.apache.beam.sdk.transforms.ViewTest.testEmptyMultimapSideInputWithNonDeterministicKeyCoder' - excludeTestsMatching 'org.apache.beam.sdk.transforms.ViewTest.testEmptyMapSideInput' - excludeTestsMatching 'org.apache.beam.sdk.transforms.ViewTest.testEmptyMapSideInputWithNonDeterministicKeyCoder' - // Misc failures // https://issues.apache.org/jira/browse/BEAM-10451 excludeTestsMatching 'org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testGlobalCombineWithDefaultsAndTriggers' @@ -202,12 +188,16 @@ def createUlrValidatesRunnerTask = { name, environmentType, dockerImageTask = "" // https://issues.apache.org/jira/browse/BEAM-10454 excludeTestsMatching 'org.apache.beam.sdk.testing.PAssertTest.testWindowedIsEqualTo' - // https://issues.apache.org/jira/browse/BEAM-10453 - excludeTestsMatching 'org.apache.beam.sdk.transforms.ReshuffleTest.testReshuffleWithTimestampsStreaming' - // https://issues.apache.org/jira/browse/BEAM-10452 excludeTestsMatching 'org.apache.beam.sdk.transforms.CombineTest$BasicTests.testHotKeyCombiningWithAccumulationMode' + // https://issues.apache.org/jira/browse/BEAM-12275 + excludeTestsMatching 'org.apache.beam.sdk.transforms.ParDoTest$MultipleInputsAndOutputTests.testSideInputAnnotationWithMultipleSideInputs' + excludeTestsMatching 'org.apache.beam.sdk.transforms.ViewTest.testMapAsEntrySetSideInput' + excludeTestsMatching 'org.apache.beam.sdk.transforms.ViewTest.testWindowedMultimapAsEntrySetSideInput' + excludeTestsMatching 'org.apache.beam.sdk.transforms.ViewTest.testWindowedMapAsEntrySetSideInput' + excludeTestsMatching 'org.apache.beam.sdk.transforms.ViewTest.testMultimapAsEntrySetSideInput' + // https://issues.apache.org/jira/browse/BEAM-10995 excludeTestsMatching 'org.apache.beam.sdk.transforms.windowing.WindowingTest.testWindowPreservation' } diff --git a/runners/portability/java/src/main/java/org/apache/beam/runners/portability/ExternalWorkerService.java b/runners/portability/java/src/main/java/org/apache/beam/runners/portability/ExternalWorkerService.java deleted file mode 100644 index c4311a385e55..000000000000 --- a/runners/portability/java/src/main/java/org/apache/beam/runners/portability/ExternalWorkerService.java +++ /dev/null @@ -1,86 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. 
See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package org.apache.beam.runners.portability; - -import org.apache.beam.fn.harness.FnHarness; -import org.apache.beam.model.fnexecution.v1.BeamFnApi.StartWorkerRequest; -import org.apache.beam.model.fnexecution.v1.BeamFnApi.StartWorkerResponse; -import org.apache.beam.model.fnexecution.v1.BeamFnExternalWorkerPoolGrpc.BeamFnExternalWorkerPoolImplBase; -import org.apache.beam.runners.fnexecution.FnService; -import org.apache.beam.runners.fnexecution.GrpcFnServer; -import org.apache.beam.runners.fnexecution.ServerFactory; -import org.apache.beam.sdk.options.PipelineOptions; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -/** - * Implements the BeamFnExternalWorkerPool service by starting a fresh SDK harness for each request. - */ -public class ExternalWorkerService extends BeamFnExternalWorkerPoolImplBase implements FnService { - - private static final Logger LOG = LoggerFactory.getLogger(ExternalWorkerService.class); - - private final PipelineOptions options; - private final ServerFactory serverFactory = ServerFactory.createDefault(); - - public ExternalWorkerService(PipelineOptions options) { - this.options = options; - } - - @Override - public void startWorker( - StartWorkerRequest request, StreamObserver responseObserver) { - LOG.info( - "Starting worker {} pointing at {}.", - request.getWorkerId(), - request.getControlEndpoint().getUrl()); - LOG.debug("Worker request {}.", request); - Thread th = - new Thread( - () -> { - try { - FnHarness.main( - request.getWorkerId(), - options, - request.getLoggingEndpoint(), - request.getControlEndpoint()); - LOG.info("Successfully started worker {}.", request.getWorkerId()); - } catch (Exception exn) { - LOG.error(String.format("Failed to start worker %s.", request.getWorkerId()), exn); - } - }); - th.setName("SDK-worker-" + request.getWorkerId()); - th.setDaemon(true); - th.start(); - - responseObserver.onNext(StartWorkerResponse.newBuilder().build()); - responseObserver.onCompleted(); - } - - @Override - public void close() {} - - public GrpcFnServer start() throws Exception { - GrpcFnServer server = - GrpcFnServer.allocatePortAndCreateFor(this, serverFactory); - LOG.debug( - "Listening for worker start requests at {}.", server.getApiServiceDescriptor().getUrl()); - return server; - } -} diff --git a/runners/portability/java/src/main/java/org/apache/beam/runners/portability/JobServicePipelineResult.java b/runners/portability/java/src/main/java/org/apache/beam/runners/portability/JobServicePipelineResult.java index 35988acf3903..b631a810aecf 100644 --- a/runners/portability/java/src/main/java/org/apache/beam/runners/portability/JobServicePipelineResult.java +++ b/runners/portability/java/src/main/java/org/apache/beam/runners/portability/JobServicePipelineResult.java @@ 
-33,7 +33,7 @@ import org.apache.beam.model.jobmanagement.v1.JobServiceGrpc.JobServiceBlockingStub; import org.apache.beam.sdk.PipelineResult; import org.apache.beam.sdk.metrics.MetricResults; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.checkerframework.checker.nullness.qual.Nullable; import org.joda.time.Duration; import org.slf4j.Logger; @@ -47,14 +47,19 @@ class JobServicePipelineResult implements PipelineResult, AutoCloseable { private static final Logger LOG = LoggerFactory.getLogger(JobServicePipelineResult.class); private final ByteString jobId; + private final int jobServerTimeout; private final CloseableResource jobService; private @Nullable State terminalState; private final @Nullable Runnable cleanup; private org.apache.beam.model.jobmanagement.v1.JobApi.MetricResults jobMetrics; JobServicePipelineResult( - ByteString jobId, CloseableResource jobService, Runnable cleanup) { + ByteString jobId, + int jobServerTimeout, + CloseableResource jobService, + Runnable cleanup) { this.jobId = jobId; + this.jobServerTimeout = jobServerTimeout; this.jobService = jobService; this.terminalState = null; this.cleanup = cleanup; @@ -65,7 +70,8 @@ public State getState() { if (terminalState != null) { return terminalState; } - JobServiceBlockingStub stub = jobService.get(); + JobServiceBlockingStub stub = + jobService.get().withDeadlineAfter(jobServerTimeout, TimeUnit.SECONDS); JobStateEvent response = stub.getState(GetJobStateRequest.newBuilder().setJobIdBytes(jobId).build()); return getJavaState(response.getState()); @@ -135,7 +141,8 @@ public void close() { } private void waitForTerminalState() { - JobServiceBlockingStub stub = jobService.get(); + JobServiceBlockingStub stub = + jobService.get().withDeadlineAfter(jobServerTimeout, TimeUnit.SECONDS); GetJobStateRequest request = GetJobStateRequest.newBuilder().setJobIdBytes(jobId).build(); JobStateEvent response = stub.getState(request); State lastState = getJavaState(response.getState()); @@ -146,7 +153,7 @@ private void waitForTerminalState() { Thread.currentThread().interrupt(); throw new RuntimeException(e); } - response = stub.getState(request); + response = stub.withDeadlineAfter(jobServerTimeout, TimeUnit.SECONDS).getState(request); lastState = getJavaState(response.getState()); } terminalState = lastState; @@ -157,7 +164,10 @@ private void propagateErrors() { JobMessagesRequest messageStreamRequest = JobMessagesRequest.newBuilder().setJobIdBytes(jobId).build(); Iterator messageStreamIterator = - jobService.get().getMessageStream(messageStreamRequest); + jobService + .get() + .withDeadlineAfter(jobServerTimeout, TimeUnit.SECONDS) + .getMessageStream(messageStreamRequest); while (messageStreamIterator.hasNext()) { JobMessage messageResponse = messageStreamIterator.next().getMessageResponse(); if (messageResponse.getImportance() == JobMessage.MessageImportance.JOB_MESSAGE_ERROR) { diff --git a/runners/portability/java/src/main/java/org/apache/beam/runners/portability/PortableRunner.java b/runners/portability/java/src/main/java/org/apache/beam/runners/portability/PortableRunner.java index 0b3df505a485..03b4965ba4a3 100644 --- a/runners/portability/java/src/main/java/org/apache/beam/runners/portability/PortableRunner.java +++ b/runners/portability/java/src/main/java/org/apache/beam/runners/portability/PortableRunner.java @@ -22,6 +22,8 @@ import java.util.Arrays; import java.util.List; import java.util.Optional; +import 
java.util.concurrent.TimeUnit; +import org.apache.beam.fn.harness.ExternalWorkerService; import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc; import org.apache.beam.model.jobmanagement.v1.JobApi.PrepareJobRequest; import org.apache.beam.model.jobmanagement.v1.JobApi.PrepareJobResponse; @@ -36,7 +38,6 @@ import org.apache.beam.runners.core.construction.PipelineOptionsTranslation; import org.apache.beam.runners.core.construction.PipelineTranslation; import org.apache.beam.runners.core.construction.SdkComponents; -import org.apache.beam.runners.fnexecution.GrpcFnServer; import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService; import org.apache.beam.runners.fnexecution.artifact.ArtifactStagingService; import org.apache.beam.runners.portability.CloseableResource.CloseException; @@ -44,12 +45,13 @@ import org.apache.beam.sdk.PipelineResult; import org.apache.beam.sdk.PipelineRunner; import org.apache.beam.sdk.fn.channel.ManagedChannelFactory; +import org.apache.beam.sdk.fn.server.GrpcFnServer; import org.apache.beam.sdk.options.ExperimentalOptions; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.PipelineOptionsValidator; import org.apache.beam.sdk.options.PortablePipelineOptions; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ManagedChannel; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ManagedChannel; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.slf4j.Logger; @@ -138,6 +140,8 @@ public PipelineResult run(Pipeline pipeline) { + "Defaulting to files from the classpath: {}", classpathResources.size()); filesToStageBuilder.addAll(classpathResources); + } else { + filesToStageBuilder.addAll(stagingFiles); } // TODO(heejong): remove jar_packages experimental flag when cross-language dependency @@ -174,7 +178,12 @@ public PipelineResult run(Pipeline pipeline) { try (CloseableResource wrappedJobService = CloseableResource.of(jobService, unused -> jobServiceChannel.shutdown())) { - PrepareJobResponse prepareJobResponse = jobService.prepare(prepareJobRequest); + final int jobServerTimeout = options.as(PortablePipelineOptions.class).getJobServerTimeout(); + PrepareJobResponse prepareJobResponse = + jobService + .withDeadlineAfter(jobServerTimeout, TimeUnit.SECONDS) + .withWaitForReady() + .prepare(prepareJobRequest); LOG.info("PrepareJobResponse: {}", prepareJobResponse); ApiServiceDescriptor artifactStagingEndpoint = @@ -201,12 +210,16 @@ public PipelineResult run(Pipeline pipeline) { .setPreparationId(prepareJobResponse.getPreparationId()) .build(); + // Run the job and wait for a result, we don't set a timeout here because + // it may take a long time for a job to complete and streaming + // jobs never return a response. 
RunJobResponse runJobResponse = jobService.run(runJobRequest); LOG.info("RunJobResponse: {}", runJobResponse); ByteString jobId = runJobResponse.getJobIdBytes(); - return new JobServicePipelineResult(jobId, wrappedJobService.transfer(), cleanup); + return new JobServicePipelineResult( + jobId, jobServerTimeout, wrappedJobService.transfer(), cleanup); } catch (CloseException e) { throw new RuntimeException(e); } diff --git a/runners/portability/java/src/main/java/org/apache/beam/runners/portability/testing/TestJobService.java b/runners/portability/java/src/main/java/org/apache/beam/runners/portability/testing/TestJobService.java index 53578b755850..66198b04141e 100644 --- a/runners/portability/java/src/main/java/org/apache/beam/runners/portability/testing/TestJobService.java +++ b/runners/portability/java/src/main/java/org/apache/beam/runners/portability/testing/TestJobService.java @@ -27,7 +27,7 @@ import org.apache.beam.model.jobmanagement.v1.JobApi.RunJobResponse; import org.apache.beam.model.jobmanagement.v1.JobServiceGrpc.JobServiceImplBase; import org.apache.beam.model.pipeline.v1.Endpoints.ApiServiceDescriptor; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; /** * A JobService for tests. diff --git a/runners/portability/java/src/test/java/org/apache/beam/runners/portability/PortableRunnerTest.java b/runners/portability/java/src/test/java/org/apache/beam/runners/portability/PortableRunnerTest.java index 9bd1ec6653ed..652ec3be73f6 100644 --- a/runners/portability/java/src/test/java/org/apache/beam/runners/portability/PortableRunnerTest.java +++ b/runners/portability/java/src/test/java/org/apache/beam/runners/portability/PortableRunnerTest.java @@ -45,10 +45,10 @@ import org.apache.beam.sdk.options.PortablePipelineOptions; import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.transforms.windowing.GlobalWindow; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Server; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.inprocess.InProcessServerBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.testing.GrpcCleanupRule; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Server; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessServerBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.testing.GrpcCleanupRule; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteStreams; import org.joda.time.Duration; diff --git a/runners/samza/build.gradle b/runners/samza/build.gradle index 3d02d9d7cfc9..b4aff99d1738 100644 --- a/runners/samza/build.gradle +++ b/runners/samza/build.gradle @@ -40,7 +40,7 @@ configurations { validatesRunner } -def samza_version = "1.3.0" +def samza_version = "1.5.0" dependencies { compile library.java.vendored_guava_26_0_jre @@ -52,27 +52,39 @@ dependencies { compile library.java.jackson_annotations compile library.java.slf4j_api compile library.java.joda_time - compile library.java.commons_io compile library.java.args4j + compile library.java.commons_io + runtimeOnly "org.rocksdb:rocksdbjni:6.15.2" + runtimeOnly "org.scala-lang:scala-library:2.11.8" compile "org.apache.samza:samza-api:$samza_version" compile "org.apache.samza:samza-core_2.11:$samza_version" - 
compile "org.apache.samza:samza-kafka_2.11:$samza_version" - compile "org.apache.samza:samza-kv_2.11:$samza_version" + runtimeOnly "org.apache.samza:samza-kafka_2.11:$samza_version" + runtimeOnly "org.apache.samza:samza-kv_2.11:$samza_version" compile "org.apache.samza:samza-kv-rocksdb_2.11:$samza_version" compile "org.apache.samza:samza-kv-inmemory_2.11:$samza_version" compile "org.apache.samza:samza-yarn_2.11:$samza_version" - compile "org.apache.kafka:kafka-clients:0.11.0.2" + runtimeOnly "org.apache.kafka:kafka-clients:2.0.1" + compile library.java.vendored_grpc_1_36_0 + compile project(path: ":model:fn-execution", configuration: "shadow") + compile project(path: ":model:job-management", configuration: "shadow") + compile project(path: ":model:pipeline", configuration: "shadow") + compile project(":sdks:java:fn-execution") testCompile project(path: ":sdks:java:core", configuration: "shadowTest") testCompile project(path: ":runners:core-java", configuration: "testRuntime") testCompile library.java.hamcrest_core testCompile library.java.junit testCompile library.java.mockito_core testCompile library.java.jackson_dataformat_yaml + testCompile library.java.google_code_gson validatesRunner project(path: ":sdks:java:core", configuration: "shadowTest") validatesRunner project(path: ":runners:core-java", configuration: "testRuntime") validatesRunner project(project.path) } +configurations.all { + exclude group: "org.slf4j", module: "slf4j-jdk14" +} + task validatesRunner(type: Test) { group = "Verification" description "Validates Samza runner" @@ -83,22 +95,56 @@ task validatesRunner(type: Test) { classpath = configurations.validatesRunner testClassesDirs = files(project(":sdks:java:core").sourceSets.test.output.classesDirs) useJUnit { + includeCategories 'org.apache.beam.sdk.testing.NeedsRunner' includeCategories 'org.apache.beam.sdk.testing.ValidatesRunner' + excludeCategories 'org.apache.beam.sdk.testing.UsesUnboundedSplittableParDo' + excludeCategories 'org.apache.beam.sdk.testing.UsesSchema' excludeCategories 'org.apache.beam.sdk.testing.LargeKeys$Above100MB' excludeCategories 'org.apache.beam.sdk.testing.UsesAttemptedMetrics' excludeCategories 'org.apache.beam.sdk.testing.UsesCommittedMetrics' - excludeCategories 'org.apache.beam.sdk.testing.UsesTestStream' + excludeCategories 'org.apache.beam.sdk.testing.UsesTestStreamWithProcessingTime' excludeCategories 'org.apache.beam.sdk.testing.UsesMetricsPusher' excludeCategories 'org.apache.beam.sdk.testing.UsesParDoLifecycle' excludeCategories 'org.apache.beam.sdk.testing.UsesStrictTimerOrdering' - excludeCategories 'org.apache.beam.sdk.testing.UsesTimerMap' excludeCategories 'org.apache.beam.sdk.testing.UsesOnWindowExpiration' excludeCategories 'org.apache.beam.sdk.testing.UsesOrderedListState' excludeCategories 'org.apache.beam.sdk.testing.UsesBundleFinalizer' + excludeCategories 'org.apache.beam.sdk.testing.UsesLoopingTimer' } filter { // TODO(BEAM-10025) excludeTestsMatching 'org.apache.beam.sdk.transforms.ParDoTest$TimerTests.testOutputTimestampDefaultUnbounded' + // TODO(BEAM-11479) + excludeTestsMatching 'org.apache.beam.sdk.transforms.ParDoTest$TimerTests.testOutputTimestamp' + // TODO(BEAM-11479) + excludeTestsMatching 'org.apache.beam.sdk.transforms.ParDoTest$TimerTests.testRelativeTimerWithOutputTimestamp' + // TODO(BEAM-12035) + excludeTestsMatching 'org.apache.beam.sdk.testing.TestStreamTest.testFirstElementLate' + // TODO(BEAM-12036) + excludeTestsMatching 'org.apache.beam.sdk.testing.TestStreamTest.testLateDataAccumulating' + // 
TODO(BEAM-12743) + excludeTestsMatching 'org.apache.beam.sdk.coders.PCollectionCustomCoderTest.testEncodingNPException' + excludeTestsMatching 'org.apache.beam.sdk.coders.PCollectionCustomCoderTest.testEncodingIOException' + excludeTestsMatching 'org.apache.beam.sdk.coders.PCollectionCustomCoderTest.testDecodingNPException' + excludeTestsMatching 'org.apache.beam.sdk.coders.PCollectionCustomCoderTest.testDecodingIOException' + // TODO(BEAM-12744) + excludeTestsMatching 'org.apache.beam.sdk.PipelineTest.testEmptyPipeline' + // TODO(BEAM-12745) + excludeTestsMatching 'org.apache.beam.sdk.io.AvroIOTest*' + // TODO(BEAM-12746) + excludeTestsMatching 'org.apache.beam.sdk.io.FileIOTest*' + // TODO(BEAM-12747) + excludeTestsMatching 'org.apache.beam.sdk.transforms.WithTimestampsTest.withTimestampsBackwardsInTimeShouldThrow' + excludeTestsMatching 'org.apache.beam.sdk.transforms.WithTimestampsTest.withTimestampsWithNullTimestampShouldThrow' + // TODO(BEAM-12748) + excludeTestsMatching 'org.apache.beam.sdk.transforms.ViewTest.testEmptySingletonSideInput' + excludeTestsMatching 'org.apache.beam.sdk.transforms.ViewTest.testNonSingletonSideInput' + // TODO(BEAM-12749) + excludeTestsMatching 'org.apache.beam.sdk.transforms.MapElementsTest.testMapSimpleFunction' + // TODO(BEAM-12750) + excludeTestsMatching 'org.apache.beam.sdk.transforms.GroupIntoBatchesTest.testInGlobalWindowBatchSizeByteSizeFn' + excludeTestsMatching 'org.apache.beam.sdk.transforms.GroupIntoBatchesTest.testInStreamingMode' + excludeTestsMatching 'org.apache.beam.sdk.transforms.GroupIntoBatchesTest.testWithShardedKeyInGlobalWindow' // These tests fail since there is no support for side inputs in Samza's unbounded splittable DoFn integration excludeTestsMatching 'org.apache.beam.sdk.transforms.SplittableDoFnTest.testWindowedSideInputWithCheckpointsUnbounded' @@ -110,6 +156,10 @@ task validatesRunner(type: Test) { excludeTestsMatching 'org.apache.beam.sdk.transforms.SplittableDoFnTest.testPairWithIndexWindowedTimestampedUnbounded' excludeTestsMatching 'org.apache.beam.sdk.transforms.SplittableDoFnTest.testOutputAfterCheckpointUnbounded' } + filter { + // Re-enable the test after Samza runner supports same state id across DoFn(s). + excludeTest('ParDoTest$StateTests', 'testValueStateSameId') + } } // Generates :runners:samza:runQuickstartJavaSamza diff --git a/runners/samza/job-server/build.gradle b/runners/samza/job-server/build.gradle index 7119b906f16d..266ec0e6be9e 100644 --- a/runners/samza/job-server/build.gradle +++ b/runners/samza/job-server/build.gradle @@ -1,3 +1,5 @@ +import org.apache.beam.gradle.BeamModulePlugin + /* * Licensed to the Apache Software Foundation (ASF) under one * or more contributor license agreements. See the NOTICE file @@ -23,20 +25,130 @@ mainClassName = "org.apache.beam.runners.samza.SamzaJobServerDriver" applyJavaNature( automaticModuleName: 'org.apache.beam.runners.samza.jobserver', + archivesBaseName: project.hasProperty('archives_base_name') ? 
archives_base_name : archivesBaseName, validateShadowJar: false, - exportJavadoc: false, shadowClosure: { append "reference.conf" }, ) +def samzaRunnerProject = project.parent.path + +description = "Apache Beam :: Runners :: Samza :: Job Server" + +configurations { + validatesPortableRunner +} + dependencies { - compile project(":runners:samza") + compile project(samzaRunnerProject) + permitUnusedDeclared project(samzaRunnerProject) runtime group: "org.slf4j", name: "jcl-over-slf4j", version: dependencies.create(project.library.java.slf4j_api).getVersion() + validatesPortableRunner project(path: samzaRunnerProject, configuration: "testRuntime") + validatesPortableRunner project(path: ":sdks:java:core", configuration: "shadowTest") + validatesPortableRunner project(path: ":runners:core-java", configuration: "testRuntime") + validatesPortableRunner project(path: ":runners:portability:java", configuration: "testRuntime") runtime library.java.slf4j_simple } runShadow { args = [] } + +def tempDir = File.createTempDir() +def pipelineOptions = [ + "--configOverride={\"job.non-logged.store.base.dir\":\"" + tempDir + "\"}" +] +createPortableValidatesRunnerTask( + name: "validatesPortableRunner", + jobServerDriver: "org.apache.beam.runners.samza.SamzaJobServerDriver", + jobServerConfig: "--job-host=localhost,--job-port=0,--artifact-port=0,--expansion-port=0", + testClasspathConfiguration: configurations.validatesPortableRunner, + numParallelTests: 1, + pipelineOpts: pipelineOptions, + environment: BeamModulePlugin.PortableValidatesRunnerConfiguration.Environment.EMBEDDED, + testCategories: { + includeCategories 'org.apache.beam.sdk.testing.ValidatesRunner' + // TODO: BEAM-12350 + excludeCategories 'org.apache.beam.sdk.testing.UsesAttemptedMetrics' + // TODO: BEAM-12681 + excludeCategories 'org.apache.beam.sdk.testing.FlattenWithHeterogeneousCoders' + // Larger keys are possible, but they require more memory. 
+ excludeCategories 'org.apache.beam.sdk.testing.LargeKeys$Above10MB' + excludeCategories 'org.apache.beam.sdk.testing.UsesCommittedMetrics' + excludeCategories 'org.apache.beam.sdk.testing.UsesCrossLanguageTransforms' + excludeCategories 'org.apache.beam.sdk.testing.UsesPythonExpansionService' + excludeCategories 'org.apache.beam.sdk.testing.UsesCustomWindowMerging' + excludeCategories 'org.apache.beam.sdk.testing.UsesFailureMessage' + excludeCategories 'org.apache.beam.sdk.testing.UsesGaugeMetrics' + excludeCategories 'org.apache.beam.sdk.testing.UsesParDoLifecycle' + excludeCategories 'org.apache.beam.sdk.testing.UsesMapState' + excludeCategories 'org.apache.beam.sdk.testing.UsesSetState' + excludeCategories 'org.apache.beam.sdk.testing.UsesOrderedListState' + excludeCategories 'org.apache.beam.sdk.testing.UsesStrictTimerOrdering' + excludeCategories 'org.apache.beam.sdk.testing.UsesOnWindowExpiration' + excludeCategories 'org.apache.beam.sdk.testing.UsesBundleFinalizer' + excludeCategories 'org.apache.beam.sdk.testing.UsesOrderedListState' + excludeCategories 'org.apache.beam.sdk.testing.UsesBoundedSplittableParDo' + excludeCategories 'org.apache.beam.sdk.testing.UsesTestStreamWithProcessingTime' + // TODO(BEAM-12821) + excludeCategories 'org.apache.beam.sdk.testing.UsesTestStreamWithMultipleStages' + excludeCategories 'org.apache.beam.sdk.testing.UsesUnboundedSplittableParDo' + excludeCategories 'org.apache.beam.sdk.testing.UsesSplittableParDoWithWindowedSideInputs' + excludeCategories 'org.apache.beam.sdk.testing.UsesLoopingTimer' + }, + testFilter: { + // TODO(BEAM-12677) + excludeTestsMatching "org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2" + excludeTestsMatching "org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput" + excludeTestsMatching "org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo" + excludeTestsMatching "org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty" + // TODO(BEAM-10025) + excludeTestsMatching 'org.apache.beam.sdk.transforms.ParDoTest$TimerTests.testOutputTimestampDefaultUnbounded' + // TODO(BEAM-11479) + excludeTestsMatching 'org.apache.beam.sdk.transforms.ParDoTest$TimerTests.testOutputTimestamp' + // TODO(BEAM-12035) + excludeTestsMatching 'org.apache.beam.sdk.testing.TestStreamTest.testFirstElementLate' + // TODO(BEAM-12036) + excludeTestsMatching 'org.apache.beam.sdk.testing.TestStreamTest.testLateDataAccumulating' + } +) + +def jobPort = BeamModulePlugin.getRandomPort() +def artifactPort = BeamModulePlugin.getRandomPort() + +def setupTask = project.tasks.create(name: "samzaJobServerSetup", type: Exec) { + dependsOn shadowJar + def pythonDir = project.project(":sdks:python").projectDir + def samzaJobServerJar = shadowJar.archivePath + + executable 'sh' + args '-c', "$pythonDir/scripts/run_job_server.sh stop --group_id ${project.name} && $pythonDir/scripts/run_job_server.sh start --group_id ${project.name} --job_port ${jobPort} --artifact_port ${artifactPort} --job_server_jar ${samzaJobServerJar}" +} + +def cleanupTask = project.tasks.create(name: "samzaJobServerCleanup", type: Exec) { + def pythonDir = project.project(":sdks:python").projectDir + + executable 'sh' + args '-c', "$pythonDir/scripts/run_job_server.sh stop --group_id ${project.name}" +} + +createCrossLanguageValidatesRunnerTask( + startJobServer: setupTask, + cleanupJobServer: cleanupTask, + classpath: configurations.validatesPortableRunner, + numParallelTests: 1, + 
pythonPipelineOptions: [ + "--runner=PortableRunner", + "--job_endpoint=localhost:${jobPort}", + "--environment_cache_millis=10000", + "--experiments=beam_fn_api", + ], + javaPipelineOptions: [ + "--runner=PortableRunner", + "--jobEndpoint=localhost:${jobPort}", + "--environmentCacheMillis=10000", + "--experiments=beam_fn_api", + ] +) \ No newline at end of file diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaExecutionContext.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaExecutionContext.java index 3d49ae191c20..b15efaa1c78c 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaExecutionContext.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaExecutionContext.java @@ -17,53 +17,22 @@ */ package org.apache.beam.runners.samza; -import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; - -import java.time.Duration; -import java.util.concurrent.ExecutorService; -import java.util.concurrent.Executors; -import org.apache.beam.runners.fnexecution.GrpcFnServer; -import org.apache.beam.runners.fnexecution.ServerFactory; -import org.apache.beam.runners.fnexecution.control.ControlClientPool; -import org.apache.beam.runners.fnexecution.control.FnApiControlClientPoolService; -import org.apache.beam.runners.fnexecution.control.InstructionRequestHandler; -import org.apache.beam.runners.fnexecution.control.JobBundleFactory; -import org.apache.beam.runners.fnexecution.control.MapControlClientPool; -import org.apache.beam.runners.fnexecution.control.SingleEnvironmentInstanceJobBundleFactory; -import org.apache.beam.runners.fnexecution.data.GrpcDataService; -import org.apache.beam.runners.fnexecution.environment.EnvironmentFactory; -import org.apache.beam.runners.fnexecution.environment.RemoteEnvironment; -import org.apache.beam.runners.fnexecution.state.GrpcStateService; import org.apache.beam.runners.samza.metrics.SamzaMetricsContainer; -import org.apache.beam.sdk.fn.IdGenerator; -import org.apache.beam.sdk.fn.IdGenerators; -import org.apache.beam.sdk.fn.stream.OutboundObserverFactory; import org.apache.samza.context.ApplicationContainerContext; import org.apache.samza.context.ApplicationContainerContextFactory; import org.apache.samza.context.ContainerContext; import org.apache.samza.context.ExternalContext; import org.apache.samza.context.JobContext; import org.apache.samza.metrics.MetricsRegistryMap; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; /** Runtime context for the Samza runner. 
*/ @SuppressWarnings({ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class SamzaExecutionContext implements ApplicationContainerContext { - private static final Logger LOG = LoggerFactory.getLogger(SamzaExecutionContext.class); - private static final String SAMZA_WORKER_ID = "samza_py_worker_id"; private final SamzaPipelineOptions options; private SamzaMetricsContainer metricsContainer; - private JobBundleFactory jobBundleFactory; - private GrpcFnServer fnControlServer; - private GrpcFnServer fnDataServer; - private GrpcFnServer fnStateServer; - private ControlClientPool controlClientPool; - private ExecutorService dataExecutor; - private IdGenerator idGenerator = IdGenerators.incrementingLongs(); public SamzaExecutionContext(SamzaPipelineOptions options) { this.options = options; @@ -81,93 +50,11 @@ void setMetricsContainer(SamzaMetricsContainer metricsContainer) { this.metricsContainer = metricsContainer; } - public JobBundleFactory getJobBundleFactory() { - return this.jobBundleFactory; - } - - void setJobBundleFactory(JobBundleFactory jobBundleFactory) { - this.jobBundleFactory = jobBundleFactory; - } - @Override - public void start() { - checkState(getJobBundleFactory() == null, "jobBundleFactory has been created!"); - - if (SamzaRunnerOverrideConfigs.isPortableMode(options)) { - try { - controlClientPool = MapControlClientPool.create(); - dataExecutor = Executors.newCachedThreadPool(); - - fnControlServer = - GrpcFnServer.allocatePortAndCreateFor( - FnApiControlClientPoolService.offeringClientsToPool( - controlClientPool.getSink(), () -> SAMZA_WORKER_ID), - ServerFactory.createWithPortSupplier( - () -> SamzaRunnerOverrideConfigs.getFnControlPort(options))); - LOG.info("Started control server on port {}", fnControlServer.getServer().getPort()); - - fnDataServer = - GrpcFnServer.allocatePortAndCreateFor( - GrpcDataService.create( - options, dataExecutor, OutboundObserverFactory.serverDirect()), - ServerFactory.createDefault()); - LOG.info("Started data server on port {}", fnDataServer.getServer().getPort()); - - fnStateServer = - GrpcFnServer.allocatePortAndCreateFor( - GrpcStateService.create(), ServerFactory.createDefault()); - LOG.info("Started state server on port {}", fnStateServer.getServer().getPort()); - - final long waitTimeoutMs = - SamzaRunnerOverrideConfigs.getControlClientWaitTimeoutMs(options); - LOG.info("Control client wait timeout config: " + waitTimeoutMs); - - final InstructionRequestHandler instructionHandler = - controlClientPool.getSource().take(SAMZA_WORKER_ID, Duration.ofMillis(waitTimeoutMs)); - final EnvironmentFactory environmentFactory = - (environment, workerId) -> - RemoteEnvironment.forHandler(environment, instructionHandler); - // TODO: use JobBundleFactoryBase.WrappedSdkHarnessClient.wrapping - jobBundleFactory = - SingleEnvironmentInstanceJobBundleFactory.create( - environmentFactory, fnDataServer, fnStateServer, idGenerator); - LOG.info("Started job bundle factory"); - } catch (Exception e) { - throw new RuntimeException( - "Running samza in Beam portable mode but failed to create job bundle factory", e); - } - - setJobBundleFactory(jobBundleFactory); - } - } + public void start() {} @Override - public void stop() { - closeAutoClosable(fnControlServer, "controlServer"); - fnControlServer = null; - closeAutoClosable(fnDataServer, "dataServer"); - fnDataServer = null; - closeAutoClosable(fnStateServer, "stateServer"); - fnStateServer = null; - if (dataExecutor != null) { - dataExecutor.shutdown(); - dataExecutor = null; 
- } - controlClientPool = null; - closeAutoClosable(jobBundleFactory, "jobBundle"); - jobBundleFactory = null; - } - - private static void closeAutoClosable(AutoCloseable closeable, String name) { - try (AutoCloseable closer = closeable) { - LOG.info("Closed {}", name); - } catch (Exception e) { - LOG.error( - "Failed to close {}. Ignore since this is shutdown process...", - closeable.getClass().getSimpleName(), - e); - } - } + public void stop() {} /** The factory to return this {@link SamzaExecutionContext}. */ public class Factory implements ApplicationContainerContextFactory { diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaJobInvoker.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaJobInvoker.java new file mode 100644 index 000000000000..45e87e753cf6 --- /dev/null +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaJobInvoker.java @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.samza; + +import java.util.UUID; +import javax.annotation.Nullable; +import org.apache.beam.model.pipeline.v1.RunnerApi; +import org.apache.beam.runners.core.construction.PipelineOptionsTranslation; +import org.apache.beam.runners.fnexecution.provisioning.JobInfo; +import org.apache.beam.runners.jobsubmission.JobInvocation; +import org.apache.beam.runners.jobsubmission.JobInvoker; +import org.apache.beam.runners.jobsubmission.PortablePipelineJarCreator; +import org.apache.beam.runners.jobsubmission.PortablePipelineRunner; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Struct; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Strings; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.ListeningExecutorService; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +@SuppressWarnings({ + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) +public class SamzaJobInvoker extends JobInvoker { + + private static final Logger LOG = LoggerFactory.getLogger(SamzaJobInvoker.class); + private final SamzaJobServerDriver.SamzaServerConfiguration configuration; + + public static SamzaJobInvoker create( + SamzaJobServerDriver.SamzaServerConfiguration configuration) { + return new SamzaJobInvoker(configuration); + } + + private SamzaJobInvoker(SamzaJobServerDriver.SamzaServerConfiguration configuration) { + super("samza-runner-job-invoker-%d"); + this.configuration = configuration; + } + + @Override + protected JobInvocation invokeWithExecutor( + RunnerApi.Pipeline pipeline, + Struct options, + @Nullable String retrievalToken, + ListeningExecutorService executorService) { + LOG.trace("Parsing pipeline options"); + final SamzaPortablePipelineOptions samzaOptions = + 
PipelineOptionsTranslation.fromProto(options).as(SamzaPortablePipelineOptions.class); + + final PortablePipelineRunner pipelineRunner; + if (Strings.isNullOrEmpty(samzaOptions.getOutputExecutablePath())) { + pipelineRunner = new SamzaPipelineRunner(samzaOptions); + } else { + /* + * To support --output_executable_path, which bundles the input pipeline along with all + * artifacts, etc. required to run the pipeline into a jar that can be executed later. + */ + pipelineRunner = new PortablePipelineJarCreator(SamzaPipelineRunner.class); + } + + final String invocationId = + String.format("%s_%s", samzaOptions.getJobName(), UUID.randomUUID().toString()); + final JobInfo jobInfo = + JobInfo.create(invocationId, samzaOptions.getJobName(), retrievalToken, options); + return new JobInvocation(jobInfo, executorService, pipeline, pipelineRunner); + } +} diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaJobServerDriver.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaJobServerDriver.java index c3d1603af4c1..ad0a3253bacb 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaJobServerDriver.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaJobServerDriver.java @@ -17,100 +17,77 @@ */ package org.apache.beam.runners.samza; -import java.io.IOException; -import java.util.HashMap; -import java.util.Map; -import java.util.UUID; -import org.apache.beam.model.pipeline.v1.RunnerApi; -import org.apache.beam.runners.core.construction.PipelineOptionsTranslation; -import org.apache.beam.runners.fnexecution.GrpcFnServer; -import org.apache.beam.runners.fnexecution.ServerFactory; -import org.apache.beam.runners.fnexecution.provisioning.JobInfo; -import org.apache.beam.runners.jobsubmission.InMemoryJobService; -import org.apache.beam.runners.jobsubmission.JobInvocation; -import org.apache.beam.runners.jobsubmission.JobInvoker; +import org.apache.beam.runners.jobsubmission.JobServerDriver; +import org.apache.beam.sdk.fn.server.ServerFactory; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.PipelineOptionsFactory; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Struct; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.ListeningExecutorService; -import org.checkerframework.checker.nullness.qual.Nullable; +import org.kohsuke.args4j.CmdLineException; +import org.kohsuke.args4j.CmdLineParser; import org.slf4j.Logger; import org.slf4j.LoggerFactory; -/** Driver program that starts a job server. */ -// TODO extend JobServerDriver -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) -public class SamzaJobServerDriver { +/** Driver program that starts a job server for the Samza runner. */ +public class SamzaJobServerDriver extends JobServerDriver { + private static final Logger LOG = LoggerFactory.getLogger(SamzaJobServerDriver.class); - private final SamzaPortablePipelineOptions pipelineOptions; + /** Samza runner-specific Configuration for the jobServer. */ + public static class SamzaServerConfiguration extends ServerConfiguration {} - private SamzaJobServerDriver(SamzaPortablePipelineOptions pipelineOptions) { - this.pipelineOptions = pipelineOptions; + public static void main(String[] args) { + // TODO: Expose the fileSystem related options. + PipelineOptions options = PipelineOptionsFactory.create(); + // Register standard file systems. 
+ FileSystems.setDefaultPipelineOptions(options); + fromParams(args).run(); } - public static void main(String[] args) throws Exception { - SamzaPortablePipelineOptions pipelineOptions = - PipelineOptionsFactory.fromArgs(args).as(SamzaPortablePipelineOptions.class); - fromOptions(pipelineOptions).run(); + private static SamzaJobServerDriver fromParams(String[] args) { + return fromConfig(parseArgs(args)); } - public static SamzaJobServerDriver fromOptions(SamzaPortablePipelineOptions pipelineOptions) { - Map overrideConfig = - pipelineOptions.getConfigOverride() != null - ? pipelineOptions.getConfigOverride() - : new HashMap<>(); - overrideConfig.put(SamzaRunnerOverrideConfigs.IS_PORTABLE_MODE, String.valueOf(true)); - overrideConfig.put( - SamzaRunnerOverrideConfigs.FN_CONTROL_PORT, - String.valueOf(pipelineOptions.getControlPort())); - pipelineOptions.setConfigOverride(overrideConfig); - return new SamzaJobServerDriver(pipelineOptions); + private static void printUsage(CmdLineParser parser) { + System.err.printf("Usage: java %s arguments...%n", SamzaJobServerDriver.class.getSimpleName()); + parser.printUsage(System.err); + System.err.println(); } - private static InMemoryJobService createJobService(SamzaPortablePipelineOptions pipelineOptions) - throws IOException { - JobInvoker jobInvoker = - new JobInvoker("samza-job-invoker") { - @Override - protected JobInvocation invokeWithExecutor( - RunnerApi.Pipeline pipeline, - Struct options, - @Nullable String retrievalToken, - ListeningExecutorService executorService) - throws IOException { - String invocationId = - String.format("%s_%s", pipelineOptions.getJobName(), UUID.randomUUID().toString()); - SamzaPipelineRunner pipelineRunner = new SamzaPipelineRunner(pipelineOptions); - JobInfo jobInfo = - JobInfo.create( - invocationId, - pipelineOptions.getJobName(), - retrievalToken, - PipelineOptionsTranslation.toProto(pipelineOptions)); - return new JobInvocation(jobInfo, executorService, pipeline, pipelineRunner); - } - }; - return InMemoryJobService.create( - null, - session -> session, - stagingSessionToken -> {}, - jobInvoker, - InMemoryJobService.DEFAULT_MAX_INVOCATION_HISTORY); + private static SamzaJobServerDriver fromConfig(SamzaServerConfiguration configuration) { + return create( + configuration, + createJobServerFactory(configuration), + createArtifactServerFactory(configuration)); } - public void run() throws Exception { - final InMemoryJobService service = createJobService(pipelineOptions); - final GrpcFnServer jobServiceGrpcFnServer = - GrpcFnServer.allocatePortAndCreateFor( - service, ServerFactory.createWithPortSupplier(pipelineOptions::getJobPort)); - LOG.info("JobServer started on {}", jobServiceGrpcFnServer.getApiServiceDescriptor().getUrl()); + public static SamzaServerConfiguration parseArgs(String[] args) { + SamzaServerConfiguration configuration = new SamzaServerConfiguration(); + CmdLineParser parser = new CmdLineParser(configuration); try { - jobServiceGrpcFnServer.getServer().awaitTermination(); - } finally { - LOG.info("JobServer closing"); - jobServiceGrpcFnServer.close(); + parser.parseArgument(args); + } catch (CmdLineException e) { + LOG.error("Unable to parse command line arguments.", e); + printUsage(parser); + throw new IllegalArgumentException("Unable to parse command line arguments.", e); } + return configuration; + } + + private static SamzaJobServerDriver create( + SamzaServerConfiguration configuration, + ServerFactory jobServerFactory, + ServerFactory artifactServerFactory) { + return new 
SamzaJobServerDriver(configuration, jobServerFactory, artifactServerFactory); + } + + private SamzaJobServerDriver( + SamzaServerConfiguration configuration, + ServerFactory jobServerFactory, + ServerFactory artifactServerFactory) { + super( + configuration, + jobServerFactory, + artifactServerFactory, + () -> SamzaJobInvoker.create(configuration)); } } diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaPipelineOptions.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaPipelineOptions.java index 3ff64e3afa70..c0af1fab0d90 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaPipelineOptions.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaPipelineOptions.java @@ -23,8 +23,8 @@ import org.apache.beam.sdk.options.Default; import org.apache.beam.sdk.options.Description; import org.apache.beam.sdk.options.PipelineOptions; -import org.apache.samza.config.ConfigFactory; -import org.apache.samza.config.factories.PropertiesConfigFactory; +import org.apache.samza.config.ConfigLoaderFactory; +import org.apache.samza.config.loaders.PropertiesConfigLoaderFactory; import org.apache.samza.metrics.MetricsReporter; /** Options which can be used to configure a Samza PortablePipelineRunner. */ @@ -38,10 +38,10 @@ public interface SamzaPipelineOptions extends PipelineOptions { void setConfigFilePath(String filePath); @Description("The factory to read config file from config file path.") - @Default.Class(PropertiesConfigFactory.class) - Class getConfigFactory(); + @Default.Class(PropertiesConfigLoaderFactory.class) + Class getConfigLoaderFactory(); - void setConfigFactory(Class configFactory); + void setConfigLoaderFactory(Class configLoaderFactory); @Description( "The config override to set programmatically. 
It will be applied on " @@ -76,6 +76,18 @@ public interface SamzaPipelineOptions extends PipelineOptions { void setSystemBufferSize(int consumerBufferSize); + @Description("The maximum number of event-time timers to buffer in memory for a PTransform") + @Default.Integer(50000) + int getEventTimerBufferSize(); + + void setEventTimerBufferSize(int eventTimerBufferSize); + + @Description("The maximum number of ready timers to process at once per watermark.") + @Default.Integer(Integer.MAX_VALUE) + int getMaxReadyTimersToProcessOnce(); + + void setMaxReadyTimersToProcessOnce(int maxReadyTimersToProcessOnce); + @Description("The maximum parallelism allowed for any data source.") @Default.Integer(1) int getMaxSourceParallelism(); diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaPipelineOptionsValidator.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaPipelineOptionsValidator.java index 591c0ee9b306..7702b6bb41fc 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaPipelineOptionsValidator.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaPipelineOptionsValidator.java @@ -18,11 +18,12 @@ package org.apache.beam.runners.samza; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; -import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; -import static org.apache.samza.config.TaskConfig.MAX_CONCURRENCY; +import static org.apache.samza.config.JobConfig.JOB_CONTAINER_THREAD_POOL_SIZE; import java.util.HashMap; import java.util.Map; +import org.apache.samza.config.JobConfig; +import org.apache.samza.config.MapConfig; /** Validates that the {@link SamzaPipelineOptions} conforms to all the criteria. */ public class SamzaPipelineOptionsValidator { @@ -32,30 +33,25 @@ public static void validate(SamzaPipelineOptions opts) { } /* - * Perform some bundling related validation for pipeline option . + * Perform some bundling related validation for pipeline option. + * Visible for testing. */ - private static void validateBundlingRelatedOptions(SamzaPipelineOptions pipelineOptions) { + static void validateBundlingRelatedOptions(SamzaPipelineOptions pipelineOptions) { if (pipelineOptions.getMaxBundleSize() > 1) { - // TODO: remove this check and implement bundling for side input, timer, etc in DoFnOp.java - checkState( - isPortable(pipelineOptions), - "Bundling is not supported in non portable mode. Please disable by setting maxBundleSize to 1."); - - String taskConcurrencyConfig = MAX_CONCURRENCY; - Map configs = + final Map configs = pipelineOptions.getConfigOverride() == null ? new HashMap<>() : pipelineOptions.getConfigOverride(); - long taskConcurrency = Long.parseLong(configs.getOrDefault(taskConcurrencyConfig, "1")); - checkState( - taskConcurrency == 1, - "Bundling is not supported if " - + taskConcurrencyConfig - + " is greater than 1. Please disable bundling by setting maxBundleSize to 1. Or disable task concurrency."); - } - } + final JobConfig jobConfig = new JobConfig(new MapConfig(configs)); - private static boolean isPortable(SamzaPipelineOptions options) { - return options instanceof SamzaPortablePipelineOptions; + // TODO: once Samza supports a better thread pool model, e.g. thread + per-task/key-range, this can be supported. 
+ checkArgument( + jobConfig.getThreadPoolSize() <= 1, + JOB_CONTAINER_THREAD_POOL_SIZE + + " cannot be configured to" + + " greater than 1 for max bundle size: " + + pipelineOptions.getMaxBundleSize()); + } } } diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaPipelineRunner.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaPipelineRunner.java index 375b0555441e..bbdd1f5d0e47 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaPipelineRunner.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaPipelineRunner.java @@ -20,13 +20,16 @@ import org.apache.beam.model.pipeline.v1.RunnerApi; import org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline; import org.apache.beam.runners.core.construction.PTransformTranslation; +import org.apache.beam.runners.core.construction.graph.ExecutableStage; import org.apache.beam.runners.core.construction.graph.GreedyPipelineFuser; import org.apache.beam.runners.core.construction.graph.ProtoOverrides; import org.apache.beam.runners.core.construction.graph.SplittableParDoExpander; +import org.apache.beam.runners.core.construction.graph.TrivialNativeTransformExpander; import org.apache.beam.runners.core.construction.renderer.PipelineDotRenderer; import org.apache.beam.runners.fnexecution.provisioning.JobInfo; import org.apache.beam.runners.jobsubmission.PortablePipelineResult; import org.apache.beam.runners.jobsubmission.PortablePipelineRunner; +import org.apache.beam.runners.samza.translation.SamzaPortablePipelineTranslator; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -46,9 +49,19 @@ public PortablePipelineResult run(final Pipeline pipeline, JobInfo jobInfo) { pipeline, SplittableParDoExpander.createSizedReplacement()); + // Don't let the fuser fuse any subcomponents of native transforms. + Pipeline trimmedPipeline = + TrivialNativeTransformExpander.forKnownUrns( + pipelineWithSdfExpanded, SamzaPortablePipelineTranslator.knownUrns()); + // Fused pipeline proto. - final RunnerApi.Pipeline fusedPipeline = - GreedyPipelineFuser.fuse(pipelineWithSdfExpanded).toPipeline(); + // TODO: Consider supporting partially-fused graphs. + RunnerApi.Pipeline fusedPipeline = + trimmedPipeline.getComponents().getTransformsMap().values().stream() + .anyMatch(proto -> ExecutableStage.URN.equals(proto.getSpec().getUrn())) + ? 
trimmedPipeline + : GreedyPipelineFuser.fuse(trimmedPipeline).toPipeline(); + LOG.info("Portable pipeline to run:"); LOG.info(PipelineDotRenderer.toDotString(fusedPipeline)); // the pipeline option coming from sdk will set the sdk specific runner which will break @@ -57,7 +70,15 @@ public PortablePipelineResult run(final Pipeline pipeline, JobInfo jobInfo) { options.setRunner(SamzaRunner.class); try { final SamzaRunner runner = SamzaRunner.fromOptions(options); - return runner.runPortablePipeline(fusedPipeline); + final PortablePipelineResult result = runner.runPortablePipeline(fusedPipeline, jobInfo); + + final SamzaExecutionEnvironment exeEnv = options.getSamzaExecutionEnvironment(); + if (exeEnv == SamzaExecutionEnvironment.LOCAL + || exeEnv == SamzaExecutionEnvironment.STANDALONE) { + // Make run() sync for local mode + result.waitUntilFinish(); + } + return result; } catch (Exception e) { throw new RuntimeException("Failed to invoke samza job", e); } diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaPortablePipelineOptions.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaPortablePipelineOptions.java index 661c1a554d34..aa8e7ceb71d7 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaPortablePipelineOptions.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaPortablePipelineOptions.java @@ -17,20 +17,16 @@ */ package org.apache.beam.runners.samza; -import org.apache.beam.sdk.options.Default; import org.apache.beam.sdk.options.Description; +import org.apache.beam.sdk.options.PortablePipelineOptions; /** Samza pipeline option that contains portability specific logic. For internal usage only. */ -public interface SamzaPortablePipelineOptions extends SamzaPipelineOptions { - @Description("The job service port. (Default: 11440) ") - @Default.Integer(11440) - int getJobPort(); +public interface SamzaPortablePipelineOptions + extends SamzaPipelineOptions, PortablePipelineOptions { + @Description( + "The file path for the local file system token. If not set (by default), then the runner would" + + " not use secure server factory.") + String getFsTokenPath(); - void setJobPort(int port); - - @Description("The FnControl port. 
(Default: 11441) ") - @Default.Integer(11441) - int getControlPort(); - - void setControlPort(int port); + void setFsTokenPath(String path); } diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaRunner.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaRunner.java index 573a2a1cd227..669c0c1c5749 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaRunner.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaRunner.java @@ -25,6 +25,7 @@ import org.apache.beam.model.pipeline.v1.RunnerApi; import org.apache.beam.runners.core.construction.SplittableParDo; import org.apache.beam.runners.core.construction.renderer.PipelineDotRenderer; +import org.apache.beam.runners.fnexecution.provisioning.JobInfo; import org.apache.beam.runners.jobsubmission.PortablePipelineResult; import org.apache.beam.runners.samza.translation.ConfigBuilder; import org.apache.beam.runners.samza.translation.PViewToIdMapper; @@ -33,9 +34,11 @@ import org.apache.beam.runners.samza.translation.SamzaPortablePipelineTranslator; import org.apache.beam.runners.samza.translation.SamzaTransformOverrides; import org.apache.beam.runners.samza.translation.TranslationContext; +import org.apache.beam.runners.samza.util.PipelineJsonRenderer; import org.apache.beam.sdk.Pipeline; import org.apache.beam.sdk.PipelineRunner; import org.apache.beam.sdk.metrics.MetricsEnvironment; +import org.apache.beam.sdk.options.ExperimentalOptions; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.PipelineOptionsValidator; import org.apache.beam.sdk.values.PValue; @@ -60,6 +63,7 @@ public class SamzaRunner extends PipelineRunner { private static final Logger LOG = LoggerFactory.getLogger(SamzaRunner.class); private static final String BEAM_DOT_GRAPH = "beamDotGraph"; + private static final String BEAM_JSON_GRAPH = "beamJsonGraph"; public static SamzaRunner fromOptions(PipelineOptions opts) { final SamzaPipelineOptions samzaOptions = @@ -78,9 +82,9 @@ private SamzaRunner(SamzaPipelineOptions options) { listenerReg.hasNext() ? Iterators.getOnlyElement(listenerReg).getLifeCycleListener() : null; } - public PortablePipelineResult runPortablePipeline(RunnerApi.Pipeline pipeline) { + public PortablePipelineResult runPortablePipeline(RunnerApi.Pipeline pipeline, JobInfo jobInfo) { final String dotGraph = PipelineDotRenderer.toDotString(pipeline); - LOG.info("Portable pipeline to run:\n{}", dotGraph); + LOG.info("Portable pipeline to run DOT graph:\n{}", dotGraph); final ConfigBuilder configBuilder = new ConfigBuilder(options); SamzaPortablePipelineTranslator.createConfig(pipeline, configBuilder, options); @@ -101,7 +105,7 @@ public PortablePipelineResult runPortablePipeline(RunnerApi.Pipeline pipeline) { .withApplicationContainerContextFactory(executionContext.new Factory()) .withMetricsReporterFactories(reporterFactories); SamzaPortablePipelineTranslator.translate( - pipeline, new PortableTranslationContext(appDescriptor, options)); + pipeline, new PortableTranslationContext(appDescriptor, options, jobInfo)); }; ApplicationRunner runner = runSamzaApp(app, config); @@ -110,11 +114,21 @@ public PortablePipelineResult runPortablePipeline(RunnerApi.Pipeline pipeline) { @Override public SamzaPipelineResult run(Pipeline pipeline) { - SplittableParDo.convertReadBasedSplittableDoFnsToPrimitiveReadsIfNecessary(pipeline); + // TODO(BEAM-10670): Use SDF read as default for non-portable execution when we address + // performance issue. 
+ if (!ExperimentalOptions.hasExperiment(pipeline.getOptions(), "beam_fn_api")) { + SplittableParDo.convertReadBasedSplittableDoFnsToPrimitiveReadsIfNecessary(pipeline); + } + MetricsEnvironment.setMetricsSupported(true); if (LOG.isDebugEnabled()) { - LOG.debug("Pre-processed Beam pipeline:\n{}", PipelineDotRenderer.toDotString(pipeline)); + LOG.debug( + "Pre-processed Beam pipeline in dot format:\n{}", + PipelineDotRenderer.toDotString(pipeline)); + LOG.debug( + "Pre-processed Beam pipeline in json format:\n{}", + PipelineJsonRenderer.toJsonString(pipeline)); } pipeline.replaceAll(SamzaTransformOverrides.getDefaultOverrides()); @@ -122,11 +136,15 @@ public SamzaPipelineResult run(Pipeline pipeline) { final String dotGraph = PipelineDotRenderer.toDotString(pipeline); LOG.info("Beam pipeline DOT graph:\n{}", dotGraph); + final String jsonGraph = PipelineJsonRenderer.toJsonString(pipeline); + LOG.info("Beam pipeline JSON graph:\n{}", jsonGraph); + final Map idMap = PViewToIdMapper.buildIdMap(pipeline); final ConfigBuilder configBuilder = new ConfigBuilder(options); SamzaPipelineTranslator.createConfig(pipeline, options, idMap, configBuilder); configBuilder.put(BEAM_DOT_GRAPH, dotGraph); + configBuilder.put(BEAM_JSON_GRAPH, jsonGraph); final Config config = configBuilder.build(); options.setConfigOverride(config); diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaRunnerOverrideConfigs.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaRunnerOverrideConfigs.java index e31fea9dcbbb..3546a1c5457c 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaRunnerOverrideConfigs.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaRunnerOverrideConfigs.java @@ -19,6 +19,10 @@ import java.time.Duration; +// TODO: can we get rid of this class? Right now the SamzaPipelineOptionsValidator would force +// the pipeline option to be the type SamzaPipelineOption. Ideally, we should be able to keep +// passing SamzaPortablePipelineOption. Alternatively, we could merge portable and non-portable +// pipeline option. /** A helper class for holding all the beam runner specific samza configs. */ @SuppressWarnings({ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) @@ -33,6 +37,8 @@ public class SamzaRunnerOverrideConfigs { public static final String CONTROL_CLIENT_MAX_WAIT_TIME_MS = "controL.wait.time.ms"; public static final long DEFAULT_CONTROL_CLIENT_MAX_WAIT_TIME_MS = Duration.ofMinutes(2).toMillis(); + public static final String FS_TOKEN_PATH = BEAM_RUNNER_CONFIG_PREFIX + "fs.token.path"; + public static final String DEFAULT_FS_TOKEN_PATH = null; private static boolean containsKey(SamzaPipelineOptions options, String configKey) { if (options == null || options.getConfigOverride() == null) { @@ -67,4 +73,13 @@ public static long getControlClientWaitTimeoutMs(SamzaPipelineOptions options) { return DEFAULT_CONTROL_CLIENT_MAX_WAIT_TIME_MS; } } + + /** Get fs token path for portable mode. 
*/ + public static String getFsTokenPath(SamzaPipelineOptions options) { + if (containsKey(options, FS_TOKEN_PATH)) { + return options.getConfigOverride().get(FS_TOKEN_PATH); + } else { + return DEFAULT_FS_TOKEN_PATH; + } + } } diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/adapter/UnboundedSourceSystem.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/adapter/UnboundedSourceSystem.java index ef9531b6f23a..f94caaefc39c 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/adapter/UnboundedSourceSystem.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/adapter/UnboundedSourceSystem.java @@ -372,7 +372,13 @@ private void updateWatermark() throws InterruptedException { final Instant nextWatermark = reader.getWatermark(); if (currentWatermark.isBefore(nextWatermark)) { currentWatermarks.put(ssp, nextWatermark); - enqueueWatermark(reader); + if (BoundedWindow.TIMESTAMP_MAX_VALUE.isAfter(nextWatermark)) { + enqueueWatermark(reader); + } else { + // Max watermark has been reached for this reader. + enqueueMaxWatermarkAndEndOfStream(reader); + running = false; + } } } @@ -403,6 +409,37 @@ private void enqueueMessage(UnboundedReader reader) throws InterruptedException queues.get(ssp).put(envelope); } + // Send a max watermark message and an end of stream message to the corresponding ssp to + // close windows and finish the task. + private void enqueueMaxWatermarkAndEndOfStream(UnboundedReader reader) { + final SystemStreamPartition ssp = readerToSsp.get(reader); + // Send the max watermark to force completion of any open windows. + final IncomingMessageEnvelope watermarkEnvelope = + IncomingMessageEnvelope.buildWatermarkEnvelope( + ssp, BoundedWindow.TIMESTAMP_MAX_VALUE.getMillis()); + enqueueUninterruptibly(watermarkEnvelope); + + final IncomingMessageEnvelope endOfStreamEnvelope = + IncomingMessageEnvelope.buildEndOfStreamEnvelope(ssp); + enqueueUninterruptibly(endOfStreamEnvelope); + } + + private void enqueueUninterruptibly(IncomingMessageEnvelope envelope) { + final BlockingQueue queue = + queues.get(envelope.getSystemStreamPartition()); + while (true) { + try { + queue.put(envelope); + return; + } catch (InterruptedException e) { + // Some events require that we post an envelope to the queue even if the interrupt + // flag was set (i.e. during a call to stop) to ensure that the consumer properly + // shuts down. Consequently, if we receive an interrupt here we ignore it and retry + // the put operation. 
+ } + } + } + void stop() { running = false; } diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/container/BeamContainerRunner.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/container/BeamContainerRunner.java index 60d7f69b1b2c..6ca8b29919aa 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/container/BeamContainerRunner.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/container/BeamContainerRunner.java @@ -40,8 +40,10 @@ public class BeamContainerRunner implements ApplicationRunner { private static final Logger LOG = LoggerFactory.getLogger(BeamContainerRunner.class); + @SuppressWarnings("rawtypes") private final ApplicationDescriptorImpl appDesc; + @SuppressWarnings("rawtypes") public BeamContainerRunner(SamzaApplication app, Config config) { this.appDesc = ApplicationDescriptorUtil.getAppDescriptor(app, config); } @@ -56,9 +58,7 @@ public void run(ExternalContext externalContext) { })); ContainerLaunchUtil.run( - appDesc, - System.getenv(ShellCommandConfig.ENV_CONTAINER_ID()), - ContainerCfgFactory.jobModel); + appDesc, System.getenv(ShellCommandConfig.ENV_CONTAINER_ID), ContainerCfgLoader.jobModel); } @Override diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/container/BeamJobCoordinatorRunner.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/container/BeamJobCoordinatorRunner.java new file mode 100644 index 000000000000..fb00a018fb29 --- /dev/null +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/container/BeamJobCoordinatorRunner.java @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.samza.container; + +import java.time.Duration; +import org.apache.samza.application.SamzaApplication; +import org.apache.samza.application.descriptors.ApplicationDescriptor; +import org.apache.samza.clustermanager.JobCoordinatorLaunchUtil; +import org.apache.samza.config.Config; +import org.apache.samza.context.ExternalContext; +import org.apache.samza.job.ApplicationStatus; +import org.apache.samza.runtime.ApplicationRunner; + +/** Runs on Yarn AM, execute planning and launches JobCoordinator. */ +public class BeamJobCoordinatorRunner implements ApplicationRunner { + + @SuppressWarnings("rawtypes") + private final SamzaApplication app; + + private final Config config; + + /** + * Constructors a {@link BeamJobCoordinatorRunner} to run the {@code app} with the {@code config}. 
+ * + * @param app application to run + * @param config configuration for the application + */ + @SuppressWarnings("rawtypes") + public BeamJobCoordinatorRunner( + SamzaApplication app, Config config) { + this.app = app; + this.config = config; + } + + @Override + public void run(ExternalContext externalContext) { + JobCoordinatorLaunchUtil.run(app, config); + } + + @Override + public void kill() { + throw new UnsupportedOperationException( + "BeamJobCoordinatorRunner#kill should never be invoked."); + } + + @Override + public ApplicationStatus status() { + throw new UnsupportedOperationException( + "BeamJobCoordinatorRunner#status should never be invoked."); + } + + @Override + public void waitForFinish() { + throw new UnsupportedOperationException( + "BeamJobCoordinatorRunner#waitForFinish should never be invoked."); + } + + @Override + public boolean waitForFinish(Duration timeout) { + throw new UnsupportedOperationException( + "BeamJobCoordinatorRunner#waitForFinish should never be invoked."); + } +} diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/container/ContainerCfgFactory.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/container/ContainerCfgLoader.java similarity index 79% rename from runners/samza/src/main/java/org/apache/beam/runners/samza/container/ContainerCfgFactory.java rename to runners/samza/src/main/java/org/apache/beam/runners/samza/container/ContainerCfgLoader.java index cb97b58e49dd..99455e74925a 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/container/ContainerCfgFactory.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/container/ContainerCfgLoader.java @@ -17,12 +17,11 @@ */ package org.apache.beam.runners.samza.container; -import java.net.URI; import java.util.HashMap; import java.util.Map; import java.util.Random; import org.apache.samza.config.Config; -import org.apache.samza.config.ConfigFactory; +import org.apache.samza.config.ConfigLoader; import org.apache.samza.config.MapConfig; import org.apache.samza.config.ShellCommandConfig; import org.apache.samza.container.SamzaContainer; @@ -30,26 +29,27 @@ import org.slf4j.Logger; import org.slf4j.LoggerFactory; -/** Factory for the Beam yarn container to load job model. */ +/** Loader for the Beam yarn container to load job model. 
*/ @SuppressWarnings({ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) -public class ContainerCfgFactory implements ConfigFactory { - private static final Logger LOG = LoggerFactory.getLogger(ContainerCfgFactory.class); +public class ContainerCfgLoader implements ConfigLoader { + private static final Logger LOG = LoggerFactory.getLogger(ContainerCfgLoader.class); private static final Object LOCK = new Object(); static volatile JobModel jobModel; @Override - public Config getConfig(URI configUri) { + public Config getConfig() { if (jobModel == null) { synchronized (LOCK) { if (jobModel == null) { - String containerId = System.getenv(ShellCommandConfig.ENV_CONTAINER_ID()); + final String containerId = System.getenv(ShellCommandConfig.ENV_CONTAINER_ID); LOG.info(String.format("Got container ID: %s", containerId)); - String coordinatorUrl = System.getenv(ShellCommandConfig.ENV_COORDINATOR_URL()); + final String coordinatorUrl = System.getenv(ShellCommandConfig.ENV_COORDINATOR_URL); LOG.info(String.format("Got coordinator URL: %s", coordinatorUrl)); - int delay = new Random().nextInt(SamzaContainer.DEFAULT_READ_JOBMODEL_DELAY_MS()) + 1; + final int delay = + new Random().nextInt(SamzaContainer.DEFAULT_READ_JOBMODEL_DELAY_MS()) + 1; jobModel = SamzaContainer.readJobModel(coordinatorUrl, delay); } } diff --git a/runners/flink/1.8/src/test/java/org/apache/beam/runners/flink/SourceTransformationCompat.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/container/ContainerCfgLoaderFactory.java similarity index 65% rename from runners/flink/1.8/src/test/java/org/apache/beam/runners/flink/SourceTransformationCompat.java rename to runners/samza/src/main/java/org/apache/beam/runners/samza/container/ContainerCfgLoaderFactory.java index f7cc67cd5480..d3b090d6e20a 100644 --- a/runners/flink/1.8/src/test/java/org/apache/beam/runners/flink/SourceTransformationCompat.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/container/ContainerCfgLoaderFactory.java @@ -15,14 +15,16 @@ * See the License for the specific language governing permissions and * limitations under the License. */ -package org.apache.beam.runners.flink; +package org.apache.beam.runners.samza.container; -import org.apache.flink.streaming.api.operators.StreamSource; -import org.apache.flink.streaming.api.transformations.SourceTransformation; +import org.apache.samza.config.Config; +import org.apache.samza.config.ConfigLoader; +import org.apache.samza.config.ConfigLoaderFactory; -/** Compatibility layer for {@link SourceTransformation} rename. */ -public class SourceTransformationCompat { - public static StreamSource getOperator(Object sourceTransform) { - return ((SourceTransformation) sourceTransform).getOperator(); +/** Factory for the Beam yarn container to get loader to load job model. 
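+ * <p>Wiring sketch (assumption, not shown in this diff): Samza is expected to instantiate this
+ * factory inside the YARN container through its config-loader mechanism, e.g. a job property along
+ * the lines of {@code job.config.loader.factory=org.apache.beam.runners.samza.container.ContainerCfgLoaderFactory};
+ * the exact property name depends on the Samza version in use.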
*/ +public class ContainerCfgLoaderFactory implements ConfigLoaderFactory { + @Override + public ConfigLoader getLoader(Config config) { + return new ContainerCfgLoader(); } } diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/metrics/SamzaMetricsContainer.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/metrics/SamzaMetricsContainer.java index 3ecf51902d45..84706ab269e3 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/metrics/SamzaMetricsContainer.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/metrics/SamzaMetricsContainer.java @@ -73,7 +73,7 @@ public void updateMetrics(String stepName) { final GaugeUpdater updateGauge = new GaugeUpdater(); results.getGauges().forEach(updateGauge); - // TODO: add distribution metrics to Samza + // TODO(BEAM-12614): add distribution metrics to Samza } private class CounterUpdater implements Consumer> { diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/BundleManager.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/BundleManager.java new file mode 100644 index 000000000000..e5bf8ec70b2e --- /dev/null +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/BundleManager.java @@ -0,0 +1,349 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.samza.runtime; + +import java.util.Collection; +import java.util.concurrent.CompletableFuture; +import java.util.concurrent.CompletionStage; +import java.util.concurrent.atomic.AtomicBoolean; +import java.util.concurrent.atomic.AtomicLong; +import java.util.concurrent.atomic.AtomicReference; +import java.util.function.BiConsumer; +import javax.annotation.Nullable; +import org.apache.beam.runners.core.StateNamespaces; +import org.apache.beam.runners.core.TimerInternals; +import org.apache.beam.sdk.state.TimeDomain; +import org.apache.beam.sdk.transforms.windowing.BoundedWindow; +import org.apache.beam.sdk.util.WindowedValue; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; +import org.apache.samza.operators.Scheduler; +import org.joda.time.Duration; +import org.joda.time.Instant; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * Bundle management for the {@link DoFnOp} that handles lifecycle of a bundle. It also serves as a + * proxy for the {@link DoFnOp} to process watermark and decides to 1. Hold watermark if there is at + * least one bundle in progress. 2. Propagates the watermark to downstream DAG, if all the previous + * bundles have completed. + * + *
<p>
    A bundle is considered complete only when the outputs corresponding to each element in the + * bundle have been resolved and the watermark associated with the bundle(if any) is propagated + * downstream. The output of an element is considered resolved based on the nature of the ParDoFn 1. + * In case of synchronous ParDo, outputs of the element is resolved immediately after the + * processElement returns. 2. In case of asynchronous ParDo, outputs of the element is resolved when + * all the future emitted by the processElement is resolved. + * + *
<p>
    This class is not thread safe and the current implementation relies on the assumption that + * messages are dispatched to BundleManager in a single threaded mode. + * + * @param output type of the {@link DoFnOp} + */ +@SuppressWarnings({ + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) +public class BundleManager { + private static final Logger LOG = LoggerFactory.getLogger(BundleManager.class); + private static final long MIN_BUNDLE_CHECK_TIME_MS = 10L; + + private final long maxBundleSize; + private final long maxBundleTimeMs; + private final BundleProgressListener bundleProgressListener; + private final FutureCollector futureCollector; + private final Scheduler> bundleTimerScheduler; + private final String bundleCheckTimerId; + + // Number elements belonging to the current active bundle + private transient AtomicLong currentBundleElementCount; + // Number of bundles that are in progress but not yet finished + private transient AtomicLong pendingBundleCount; + // Denotes the start time of the current active bundle + private transient AtomicLong bundleStartTime; + // Denotes if there is an active in progress bundle. Note at a given time, we can have multiple + // bundle in progress. + // This flag denotes if there is a bundle that is current and hasn't been closed. + private transient AtomicBoolean isBundleStarted; + // Holder for watermark which gets propagated when the bundle is finished. + private transient Instant bundleWatermarkHold; + // A future that is completed once all futures belonging to the current active bundle are + // completed. The value is null if there are no futures in the current active bundle. + private transient AtomicReference> currentActiveBundleDoneFutureReference; + private transient CompletionStage watermarkFuture; + + public BundleManager( + BundleProgressListener bundleProgressListener, + FutureCollector futureCollector, + long maxBundleSize, + long maxBundleTimeMs, + Scheduler> bundleTimerScheduler, + String bundleCheckTimerId) { + this.maxBundleSize = maxBundleSize; + this.maxBundleTimeMs = maxBundleTimeMs; + this.bundleProgressListener = bundleProgressListener; + this.bundleTimerScheduler = bundleTimerScheduler; + this.bundleCheckTimerId = bundleCheckTimerId; + this.futureCollector = futureCollector; + + if (maxBundleSize > 1) { + scheduleNextBundleCheck(); + } + + // instance variable initialization for bundle tracking + this.bundleStartTime = new AtomicLong(Long.MAX_VALUE); + this.currentActiveBundleDoneFutureReference = new AtomicReference<>(); + this.currentBundleElementCount = new AtomicLong(0L); + this.isBundleStarted = new AtomicBoolean(false); + this.pendingBundleCount = new AtomicLong(0L); + this.watermarkFuture = CompletableFuture.completedFuture(null); + } + + /* + * Schedule in processing time to check whether the current bundle should be closed. Note that + * we only approximately achieve max bundle time by checking as frequent as half of the max bundle + * time set by users. This would violate the max bundle time by up to half of it but should + * acceptable in most cases (and cheaper than scheduling a timer at the beginning of every bundle). 
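+   * For illustration (derived from the constants above, not new behavior): with
+   * maxBundleTimeMs = 1000 and MIN_BUNDLE_CHECK_TIME_MS = 10, a check fires every
+   * 1000 / 2 + 10 = 510 ms, so in the worst case a bundle is closed roughly 1510 ms after it
+   * started.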
+ */ + private void scheduleNextBundleCheck() { + final Instant nextBundleCheckTime = + Instant.now().plus(Duration.millis(maxBundleTimeMs / 2 + MIN_BUNDLE_CHECK_TIME_MS)); + final TimerInternals.TimerData timerData = + TimerInternals.TimerData.of( + this.bundleCheckTimerId, + StateNamespaces.global(), + nextBundleCheckTime, + nextBundleCheckTime, + TimeDomain.PROCESSING_TIME); + bundleTimerScheduler.schedule( + new KeyedTimerData<>(new byte[0], null, timerData), nextBundleCheckTime.getMillis()); + } + + void tryStartBundle() { + futureCollector.prepare(); + + if (isBundleStarted.compareAndSet(false, true)) { + LOG.debug("Starting a new bundle."); + // make sure the previous bundle is sealed and futures are cleared + Preconditions.checkArgument( + currentActiveBundleDoneFutureReference.get() == null, + "Current active bundle done future should be null before starting a new bundle."); + bundleStartTime.set(System.currentTimeMillis()); + pendingBundleCount.incrementAndGet(); + bundleProgressListener.onBundleStarted(); + } + + currentBundleElementCount.incrementAndGet(); + } + + void processWatermark(Instant watermark, OpEmitter emitter) { + // propagate watermark immediately if no bundle is in progress and all the previous bundles have + // completed. + if (!isBundleStarted() && pendingBundleCount.get() == 0) { + LOG.debug("Propagating watermark: {} directly since no bundle in progress.", watermark); + bundleProgressListener.onWatermark(watermark, emitter); + return; + } + + // hold back the watermark since there is either a bundle in progress or previously closed + // bundles are unfinished. + this.bundleWatermarkHold = watermark; + + // for batch mode, the max watermark should force the bundle to close + if (BoundedWindow.TIMESTAMP_MAX_VALUE.equals(watermark)) { + /* + * Due to lack of async watermark function, we block on the previous watermark futures before propagating the watermark + * downstream. If a bundle is in progress tryFinishBundle() fill force the bundle to close and emit watermark. + * If no bundle in progress, we progress watermark explicitly after the completion of previous watermark futures. + */ + if (isBundleStarted()) { + LOG.info( + "Received max watermark. Triggering finish bundle before flushing the watermark downstream."); + tryFinishBundle(emitter); + watermarkFuture.toCompletableFuture().join(); + } else { + LOG.info( + "Received max watermark. Waiting for previous bundles to complete before flushing the watermark downstream."); + watermarkFuture.toCompletableFuture().join(); + bundleProgressListener.onWatermark(watermark, emitter); + } + } + } + + void processTimer(KeyedTimerData keyedTimerData, OpEmitter emitter) { + // this is internal timer in processing time to check whether a bundle should be closed + if (bundleCheckTimerId.equals(keyedTimerData.getTimerData().getTimerId())) { + tryFinishBundle(emitter); + scheduleNextBundleCheck(); + } + } + + /** + * Signal the bundle manager to handle failure. We discard the output collected as part of + * processing the current element and reset the bundle count. + * + * @param t failure cause + */ + void signalFailure(Throwable t) { + LOG.error("Encountered error during processing the message. 
Discarding the output due to: ", t); + futureCollector.discard(); + // reset the bundle start flag only if the bundle has started + isBundleStarted.compareAndSet(true, false); + + // bundle start may not necessarily mean we have actually started the bundle since some of the + // invariant check conditions within bundle start could throw exceptions. so rely on bundle + // start time + if (bundleStartTime.get() != Long.MAX_VALUE) { + currentBundleElementCount.set(0L); + bundleStartTime.set(Long.MAX_VALUE); + pendingBundleCount.decrementAndGet(); + currentActiveBundleDoneFutureReference.set(null); + } + } + + void tryFinishBundle(OpEmitter emitter) { + + // we need to seal the output for each element within a bundle irrespective of the whether we + // decide to finish the + // bundle or not + CompletionStage>> outputFuture = futureCollector.finish(); + + if (shouldFinishBundle() && isBundleStarted.compareAndSet(true, false)) { + LOG.debug("Finishing the current bundle."); + + // reset the bundle count + // seal the bundle and emit the result future (collection of results) + // chain the finish bundle invocation on the finish bundle + currentBundleElementCount.set(0L); + bundleStartTime.set(Long.MAX_VALUE); + Instant watermarkHold = bundleWatermarkHold; + bundleWatermarkHold = null; + + CompletionStage currentActiveBundleDoneFuture = + currentActiveBundleDoneFutureReference.get(); + outputFuture = + outputFuture.thenCombine( + currentActiveBundleDoneFuture != null + ? currentActiveBundleDoneFuture + : CompletableFuture.completedFuture(null), + (res, ignored) -> { + bundleProgressListener.onBundleFinished(emitter); + return res; + }); + + BiConsumer>, Void> watermarkPropagationFn; + if (watermarkHold == null) { + watermarkPropagationFn = (ignored, res) -> pendingBundleCount.decrementAndGet(); + } else { + watermarkPropagationFn = + (ignored, res) -> { + LOG.debug("Propagating watermark: {} to downstream.", watermarkHold); + bundleProgressListener.onWatermark(watermarkHold, emitter); + pendingBundleCount.decrementAndGet(); + }; + } + + // We chain the current watermark emission with previous watermark and the output futures + // since bundles can finish out of order but we still want the watermark to be emitted in + // order. + watermarkFuture = outputFuture.thenAcceptBoth(watermarkFuture, watermarkPropagationFn); + currentActiveBundleDoneFutureReference.set(null); + } else if (isBundleStarted.get()) { + final CompletableFuture>> finalOutputFuture = + outputFuture.toCompletableFuture(); + currentActiveBundleDoneFutureReference.updateAndGet( + maybePrevFuture -> { + CompletableFuture prevFuture = + maybePrevFuture != null ? 
maybePrevFuture : CompletableFuture.completedFuture(null); + + return CompletableFuture.allOf(prevFuture, finalOutputFuture); + }); + } + + // emit the future to the propagate it to rest of the DAG + emitter.emitFuture(outputFuture); + } + + @VisibleForTesting + long getCurrentBundleElementCount() { + return currentBundleElementCount.longValue(); + } + + @VisibleForTesting + @Nullable + CompletionStage getCurrentBundleDoneFuture() { + return currentActiveBundleDoneFutureReference.get(); + } + + @VisibleForTesting + void setCurrentBundleDoneFuture(CompletableFuture currentBundleResultFuture) { + this.currentActiveBundleDoneFutureReference.set(currentBundleResultFuture); + } + + @VisibleForTesting + long getPendingBundleCount() { + return pendingBundleCount.longValue(); + } + + @VisibleForTesting + void setPendingBundleCount(long value) { + pendingBundleCount.set(value); + } + + @VisibleForTesting + boolean isBundleStarted() { + return isBundleStarted.get(); + } + + @VisibleForTesting + void setBundleWatermarkHold(Instant watermark) { + this.bundleWatermarkHold = watermark; + } + + /** + * We close the current bundle in progress if one of the following criteria is met 1. The bundle + * count ≥ maxBundleSize 2. Time elapsed since the bundle started is ≥ maxBundleTimeMs 3. + * Watermark hold equals to TIMESTAMP_MAX_VALUE which usually is the case for bounded jobs + * + * @return true - if one of the criteria above is satisfied; false - otherwise + */ + private boolean shouldFinishBundle() { + return isBundleStarted.get() + && (currentBundleElementCount.get() >= maxBundleSize + || System.currentTimeMillis() - bundleStartTime.get() >= maxBundleTimeMs + || BoundedWindow.TIMESTAMP_MAX_VALUE.equals(bundleWatermarkHold)); + } + + /** + * A listener used to track the lifecycle of a bundle. Typically, the lifecycle of a bundle + * consists of 1. Start bundle - Invoked when the bundle is started 2. Finish bundle - Invoked + * when the bundle is complete. Refer to the docs under {@link BundleManager} for definition on + * when a bundle is considered complete. 3. onWatermark - Invoked when watermark is ready to be + * propagated to downstream DAG. Refer to the docs under {@link BundleManager} on when watermark + * is held vs propagated. 
+ * + * @param + */ + public interface BundleProgressListener { + void onBundleStarted(); + + void onBundleFinished(OpEmitter emitter); + + void onWatermark(Instant watermark, OpEmitter emitter); + } +} diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/DoFnOp.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/DoFnOp.java index 9f2ea43fd387..79de63cc557b 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/DoFnOp.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/DoFnOp.java @@ -17,15 +17,19 @@ */ package org.apache.beam.runners.samza.runtime; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; + import java.util.ArrayList; import java.util.Collection; +import java.util.Collections; import java.util.HashMap; import java.util.Iterator; import java.util.List; import java.util.Map; import java.util.ServiceLoader; +import java.util.concurrent.CompletionStage; import java.util.concurrent.atomic.AtomicBoolean; -import java.util.concurrent.atomic.AtomicLong; +import java.util.function.Function; import org.apache.beam.model.pipeline.v1.RunnerApi; import org.apache.beam.runners.core.DoFnRunner; import org.apache.beam.runners.core.DoFnRunners; @@ -34,15 +38,15 @@ import org.apache.beam.runners.core.SimplePushbackSideInputDoFnRunner; import org.apache.beam.runners.core.StateNamespace; import org.apache.beam.runners.core.StateNamespaces; -import org.apache.beam.runners.core.StateTags; import org.apache.beam.runners.core.TimerInternals; import org.apache.beam.runners.core.construction.graph.ExecutableStage; +import org.apache.beam.runners.fnexecution.control.ExecutableStageContext; import org.apache.beam.runners.fnexecution.control.StageBundleFactory; +import org.apache.beam.runners.fnexecution.provisioning.JobInfo; import org.apache.beam.runners.samza.SamzaExecutionContext; import org.apache.beam.runners.samza.SamzaPipelineOptions; +import org.apache.beam.runners.samza.util.FutureUtils; import org.apache.beam.sdk.coders.Coder; -import org.apache.beam.sdk.state.BagState; -import org.apache.beam.sdk.state.TimeDomain; import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.DoFnSchemaInformation; import org.apache.beam.sdk.transforms.join.RawUnionValue; @@ -60,7 +64,6 @@ import org.apache.samza.config.Config; import org.apache.samza.context.Context; import org.apache.samza.operators.Scheduler; -import org.joda.time.Duration; import org.joda.time.Instant; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -72,7 +75,6 @@ }) public class DoFnOp implements Op { private static final Logger LOG = LoggerFactory.getLogger(DoFnOp.class); - private static final long MIN_BUNDLE_CHECK_TIME_MS = 10L; private final TupleTag mainOutputTag; private final DoFn doFn; @@ -82,6 +84,7 @@ public class DoFnOp implements Op { private final WindowingStrategy windowingStrategy; private final OutputManagerFactory outputManagerFactory; // NOTE: we use HashMap here to guarantee Serializability + // Mapping from view id to a view private final HashMap> idToViewMap; private final String transformFullName; private final String transformId; @@ -95,6 +98,7 @@ public class DoFnOp implements Op { // portable api related private final boolean isPortable; private final RunnerApi.ExecutableStagePayload stagePayload; + private final JobInfo jobInfo; private final HashMap> idToTupleTagMap; private transient SamzaTimerInternalsFactory timerInternalsFactory; @@ 
-113,18 +117,15 @@ public class DoFnOp implements Op { // TODO: add this to checkpointable state private transient Instant inputWatermark; - private transient Instant bundleWatermarkHold; + private transient BundleManager bundleManager; private transient Instant sideInputWatermark; private transient List> pushbackValues; + private transient ExecutableStageContext stageContext; private transient StageBundleFactory stageBundleFactory; - private transient long maxBundleSize; - private transient long maxBundleTimeMs; - private transient AtomicLong currentBundleElementCount; - private transient AtomicLong bundleStartTime; - private transient AtomicBoolean isBundleStarted; - private transient Scheduler> bundleTimerScheduler; - private DoFnSchemaInformation doFnSchemaInformation; - private Map> sideInputMapping; + private transient boolean bundleDisabled; + + private final DoFnSchemaInformation doFnSchemaInformation; + private final Map> sideInputMapping; public DoFnOp( TupleTag mainOutputTag, @@ -143,9 +144,10 @@ public DoFnOp( PCollection.IsBounded isBounded, boolean isPortable, RunnerApi.ExecutableStagePayload stagePayload, + JobInfo jobInfo, Map> idToTupleTagMap, DoFnSchemaInformation doFnSchemaInformation, - Map> sideInputMapping) { + Map> sideInputMapping) { this.mainOutputTag = mainOutputTag; this.doFn = doFn; this.sideInputs = sideInputs; @@ -162,6 +164,7 @@ public DoFnOp( this.isBounded = isBounded; this.isPortable = isPortable; this.stagePayload = stagePayload; + this.jobInfo = jobInfo; this.idToTupleTagMap = new HashMap<>(idToTupleTagMap); this.bundleCheckTimerId = "_samza_bundle_check_" + transformId; this.bundleStateId = "_samza_bundle_" + transformId; @@ -170,6 +173,7 @@ public DoFnOp( } @Override + @SuppressWarnings("unchecked") public void open( Config config, Context context, @@ -178,26 +182,27 @@ public void open( this.inputWatermark = BoundedWindow.TIMESTAMP_MIN_VALUE; this.sideInputWatermark = BoundedWindow.TIMESTAMP_MIN_VALUE; this.pushbackWatermarkHold = BoundedWindow.TIMESTAMP_MAX_VALUE; - this.currentBundleElementCount = new AtomicLong(0L); - this.bundleStartTime = new AtomicLong(Long.MAX_VALUE); - this.isBundleStarted = new AtomicBoolean(false); - this.bundleWatermarkHold = null; final DoFnSignature signature = DoFnSignatures.getSignature(doFn.getClass()); final SamzaExecutionContext samzaExecutionContext = (SamzaExecutionContext) context.getApplicationContainerContext(); this.samzaPipelineOptions = samzaExecutionContext.getPipelineOptions(); - this.maxBundleSize = samzaPipelineOptions.getMaxBundleSize(); - this.maxBundleTimeMs = samzaPipelineOptions.getMaxBundleTimeMs(); - this.bundleTimerScheduler = timerRegistry; - - if (this.maxBundleSize > 1) { - scheduleNextBundleCheck(); - } + this.bundleDisabled = samzaPipelineOptions.getMaxBundleSize() <= 1; + final String stateId = "pardo-" + transformId; final SamzaStoreStateInternals.Factory nonKeyedStateInternalsFactory = - SamzaStoreStateInternals.createStateInternalFactory( - transformId, null, context.getTaskContext(), samzaPipelineOptions, signature); + SamzaStoreStateInternals.createNonKeyedStateInternalsFactory( + stateId, context.getTaskContext(), samzaPipelineOptions); + final FutureCollector outputFutureCollector = createFutureCollector(); + + this.bundleManager = + new BundleManager<>( + createBundleProgressListener(), + outputFutureCollector, + samzaPipelineOptions.getMaxBundleSize(), + samzaPipelineOptions.getMaxBundleTimeMs(), + timerRegistry, + bundleCheckTimerId); this.timerInternalsFactory = 
SamzaTimerInternalsFactory.createTimerInternalFactory( @@ -213,18 +218,21 @@ public void open( new SideInputHandler(sideInputs, nonKeyedStateInternalsFactory.stateInternalsForKey(null)); if (isPortable) { - // storing events within a bundle in states - final BagState> bundledEventsBagState = - nonKeyedStateInternalsFactory - .stateInternalsForKey(null) - .state(StateNamespaces.global(), StateTags.bag(bundleStateId, windowedValueCoder)); final ExecutableStage executableStage = ExecutableStage.fromPayload(stagePayload); - stageBundleFactory = samzaExecutionContext.getJobBundleFactory().forStage(executableStage); + stageContext = SamzaExecutableStageContextFactory.getInstance().get(jobInfo); + stageBundleFactory = stageContext.getStageBundleFactory(executableStage); this.fnRunner = SamzaDoFnRunners.createPortable( + transformId, + bundleStateId, + windowedValueCoder, + executableStage, + sideInputMapping, + sideInputHandler, + nonKeyedStateInternalsFactory, + timerInternalsFactory, samzaPipelineOptions, - bundledEventsBagState, - outputManagerFactory.create(emitter), + outputManagerFactory.create(emitter, outputFutureCollector), stageBundleFactory, mainOutputTag, idToTupleTagMap, @@ -237,18 +245,18 @@ public void open( doFn, windowingStrategy, transformFullName, - transformId, + stateId, context, mainOutputTag, sideInputHandler, timerInternalsFactory, keyCoder, - outputManagerFactory.create(emitter), + outputManagerFactory.create(emitter, outputFutureCollector), inputCoder, sideOutputTags, outputCoders, doFnSchemaInformation, - sideInputMapping); + (Map>) sideInputMapping); } this.pushbackFnRunner = @@ -259,87 +267,44 @@ public void open( ServiceLoader.load(SamzaDoFnInvokerRegistrar.class).iterator(); if (!invokerReg.hasNext()) { // use the default invoker here - doFnInvoker = DoFnInvokers.invokerFor(doFn); + doFnInvoker = DoFnInvokers.tryInvokeSetupFor(doFn, samzaPipelineOptions); } else { - doFnInvoker = Iterators.getOnlyElement(invokerReg).invokerFor(doFn, context); + doFnInvoker = + Iterators.getOnlyElement(invokerReg).invokerSetupFor(doFn, samzaPipelineOptions, context); } - - doFnInvoker.invokeSetup(); } - /* - * Schedule in processing time to check whether the current bundle should be closed. Note that - * we only approximately achieve max bundle time by checking as frequent as half of the max bundle - * time set by users. This would violate the max bundle time by up to half of it but should - * acceptable in most cases (and cheaper than scheduling a timer at the beginning of every bundle). 
- */ - private void scheduleNextBundleCheck() { - final Instant nextBundleCheckTime = - Instant.now().plus(Duration.millis(maxBundleTimeMs / 2 + MIN_BUNDLE_CHECK_TIME_MS)); - final TimerInternals.TimerData timerData = - TimerInternals.TimerData.of( - bundleCheckTimerId, - StateNamespaces.global(), - nextBundleCheckTime, - nextBundleCheckTime, - TimeDomain.PROCESSING_TIME); - bundleTimerScheduler.schedule( - new KeyedTimerData<>(new byte[0], null, timerData), nextBundleCheckTime.getMillis()); + FutureCollector createFutureCollector() { + return new FutureCollectorImpl<>(); } private String getTimerStateId(DoFnSignature signature) { final StringBuilder builder = new StringBuilder("timer"); if (signature.usesTimers()) { - signature.timerDeclarations().keySet().forEach(key -> builder.append(key)); + signature.timerDeclarations().keySet().forEach(builder::append); } return builder.toString(); } - private void attemptStartBundle() { - if (isBundleStarted.compareAndSet(false, true)) { - currentBundleElementCount.set(0L); - bundleStartTime.set(System.currentTimeMillis()); - pushbackFnRunner.startBundle(); - } - } - - private void finishBundle(OpEmitter emitter) { - if (isBundleStarted.compareAndSet(true, false)) { - currentBundleElementCount.set(0L); - bundleStartTime.set(Long.MAX_VALUE); - pushbackFnRunner.finishBundle(); - if (bundleWatermarkHold != null) { - doProcessWatermark(bundleWatermarkHold, emitter); - } - bundleWatermarkHold = null; - } - } - - private void attemptFinishBundle(OpEmitter emitter) { - if (!isBundleStarted.get()) { - return; - } - if (currentBundleElementCount.get() >= maxBundleSize - || System.currentTimeMillis() - bundleStartTime.get() > maxBundleTimeMs) { - finishBundle(emitter); - } - } - @Override public void processElement(WindowedValue inputElement, OpEmitter emitter) { - attemptStartBundle(); - - final Iterable> rejectedValues = - pushbackFnRunner.processElementInReadyWindows(inputElement); - for (WindowedValue rejectedValue : rejectedValues) { - if (rejectedValue.getTimestamp().compareTo(pushbackWatermarkHold) < 0) { - pushbackWatermarkHold = rejectedValue.getTimestamp(); + try { + bundleManager.tryStartBundle(); + final Iterable> rejectedValues = + pushbackFnRunner.processElementInReadyWindows(inputElement); + for (WindowedValue rejectedValue : rejectedValues) { + if (rejectedValue.getTimestamp().compareTo(pushbackWatermarkHold) < 0) { + pushbackWatermarkHold = rejectedValue.getTimestamp(); + } + pushbackValues.add(rejectedValue); } - pushbackValues.add(rejectedValue); - } - currentBundleElementCount.incrementAndGet(); - attemptFinishBundle(emitter); + bundleManager.tryFinishBundle(emitter); + } catch (Throwable t) { + LOG.error("Encountered error during process element", t); + bundleManager.signalFailure(t); + throw t; + } } private void doProcessWatermark(Instant watermark, OpEmitter emitter) { @@ -373,21 +338,14 @@ private void doProcessWatermark(Instant watermark, OpEmitter emitter) { @Override public void processWatermark(Instant watermark, OpEmitter emitter) { - if (!isBundleStarted.get()) { - doProcessWatermark(watermark, emitter); - } else { - // if there is a bundle in progress, hold back the watermark until end of the bundle - this.bundleWatermarkHold = watermark; - if (watermark.isEqual(BoundedWindow.TIMESTAMP_MAX_VALUE)) { - // for batch mode, the max watermark should force the bundle to close - finishBundle(emitter); - } - } + bundleManager.processWatermark(watermark, emitter); } @Override public void processSideInput( String id, WindowedValue> 
elements, OpEmitter emitter) { + checkState( + bundleDisabled, "Side input not supported in bundling mode. Please disable bundling."); @SuppressWarnings("unchecked") final WindowedValue> retypedElements = (WindowedValue>) elements; @@ -413,6 +371,8 @@ public void processSideInput( @Override public void processSideInputWatermark(Instant watermark, OpEmitter emitter) { + checkState( + bundleDisabled, "Side input not supported in bundling mode. Please disable bundling."); sideInputWatermark = watermark; if (sideInputWatermark.isEqual(BoundedWindow.TIMESTAMP_MAX_VALUE)) { @@ -422,11 +382,11 @@ public void processSideInputWatermark(Instant watermark, OpEmitter emitter } @Override + @SuppressWarnings("unchecked") public void processTimer(KeyedTimerData keyedTimerData, OpEmitter emitter) { // this is internal timer in processing time to check whether a bundle should be closed if (bundleCheckTimerId.equals(keyedTimerData.getTimerData().getTimerId())) { - attemptFinishBundle(emitter); - scheduleNextBundleCheck(); + bundleManager.processTimer(keyedTimerData, emitter); return; } @@ -439,9 +399,9 @@ public void processTimer(KeyedTimerData keyedTimerData, OpEmitter em @Override public void close() { - bundleWatermarkHold = null; doFnInvoker.invokeTeardown(); - try (AutoCloseable closer = stageBundleFactory) { + try (AutoCloseable factory = stageBundleFactory; + AutoCloseable context = stageContext) { // do nothing } catch (Exception e) { LOG.error("Failed to close stage bundle factory", e); @@ -456,21 +416,17 @@ private void fireTimer(KeyedTimerData keyedTimerData) { // NOTE: not sure why this is safe, but DoFnOperator makes this assumption final BoundedWindow window = ((StateNamespaces.WindowNamespace) namespace).getWindow(); - if (fnRunner instanceof DoFnRunnerWithKeyedInternals) { - // Need to pass in the keyed TimerData here - ((DoFnRunnerWithKeyedInternals) fnRunner).onTimer(keyedTimerData, window); - } else { - pushbackFnRunner.onTimer( - timer.getTimerId(), - timer.getTimerFamilyId(), - null, - window, - timer.getTimestamp(), - timer.getOutputTimestamp(), - timer.getDomain()); - } + fnRunner.onTimer( + timer.getTimerId(), + timer.getTimerFamilyId(), + keyedTimerData.getKey(), + window, + timer.getTimestamp(), + timer.getOutputTimestamp(), + timer.getDomain()); } + // todo: should this go through bundle manager to start and finish the bundle? 
private void emitAllPushbackValues() { if (!pushbackValues.isEmpty()) { pushbackFnRunner.startBundle(); @@ -487,6 +443,88 @@ private void emitAllPushbackValues() { } } + private BundleManager.BundleProgressListener createBundleProgressListener() { + return new BundleManager.BundleProgressListener() { + @Override + public void onBundleStarted() { + pushbackFnRunner.startBundle(); + } + + @Override + public void onBundleFinished(OpEmitter emitter) { + pushbackFnRunner.finishBundle(); + } + + @Override + public void onWatermark(Instant watermark, OpEmitter emitter) { + doProcessWatermark(watermark, emitter); + } + }; + } + + static CompletionStage> createOutputFuture( + WindowedValue windowedValue, + CompletionStage valueFuture, + Function valueMapper) { + return valueFuture.thenApply( + res -> + WindowedValue.of( + valueMapper.apply(res), + windowedValue.getTimestamp(), + windowedValue.getWindows(), + windowedValue.getPane())); + } + + static class FutureCollectorImpl implements FutureCollector { + private final List>> outputFutures; + private final AtomicBoolean collectorSealed; + + FutureCollectorImpl() { + /* + * Choosing synchronized list here since the concurrency is low as the message dispatch thread is single threaded. + * We need this guard against scenarios when watermark/finish bundle trigger outputs. + */ + outputFutures = Collections.synchronizedList(new ArrayList<>()); + collectorSealed = new AtomicBoolean(true); + } + + @Override + public void add(CompletionStage> element) { + checkState( + !collectorSealed.get(), + "Cannot add elements to an unprepared collector. Make sure prepare() is invoked before adding elements."); + outputFutures.add(element); + } + + @Override + public void discard() { + collectorSealed.compareAndSet(false, true); + outputFutures.clear(); + } + + @Override + public CompletionStage>> finish() { + /* + * We can ignore the results here because its okay to call finish without invoking prepare. It will be a no-op + * and an empty collection will be returned. + */ + collectorSealed.compareAndSet(false, true); + + CompletionStage>> sealedOutputFuture = + FutureUtils.flattenFutures(outputFutures); + outputFutures.clear(); + return sealedOutputFuture; + } + + @Override + public void prepare() { + boolean isCollectorSealed = collectorSealed.compareAndSet(true, false); + checkState( + isCollectorSealed, + "Failed to prepare the collector. Collector needs to be sealed before prepare() is invoked."); + } + } + /** * Factory class to create an {@link org.apache.beam.runners.core.DoFnRunners.OutputManager} that * emits values to the main output only, which is a single {@link @@ -497,13 +535,31 @@ private void emitAllPushbackValues() { public static class SingleOutputManagerFactory implements OutputManagerFactory { @Override public DoFnRunners.OutputManager create(OpEmitter emitter) { + return createOutputManager(emitter, null); + } + + @Override + public DoFnRunners.OutputManager create( + OpEmitter emitter, FutureCollector collector) { + return createOutputManager(emitter, collector); + } + + private DoFnRunners.OutputManager createOutputManager( + OpEmitter emitter, FutureCollector collector) { return new DoFnRunners.OutputManager() { @Override + @SuppressWarnings("unchecked") public void output(TupleTag tupleTag, WindowedValue windowedValue) { // With only one input we know that T is of type OutT. 
- @SuppressWarnings("unchecked") - final WindowedValue retypedWindowedValue = (WindowedValue) windowedValue; - emitter.emitElement(retypedWindowedValue); + if (windowedValue.getValue() instanceof CompletionStage) { + CompletionStage valueFuture = (CompletionStage) windowedValue.getValue(); + if (collector != null) { + collector.add(createOutputFuture(windowedValue, valueFuture, value -> (OutT) value)); + } + } else { + final WindowedValue retypedWindowedValue = (WindowedValue) windowedValue; + emitter.emitElement(retypedWindowedValue); + } } }; } @@ -523,13 +579,34 @@ public MultiOutputManagerFactory(Map, Integer> tagToIndexMap) { @Override public DoFnRunners.OutputManager create(OpEmitter emitter) { + return createOutputManager(emitter, null); + } + + @Override + public DoFnRunners.OutputManager create( + OpEmitter emitter, FutureCollector collector) { + return createOutputManager(emitter, collector); + } + + private DoFnRunners.OutputManager createOutputManager( + OpEmitter emitter, FutureCollector collector) { return new DoFnRunners.OutputManager() { @Override + @SuppressWarnings("unchecked") public void output(TupleTag tupleTag, WindowedValue windowedValue) { final int index = tagToIndexMap.get(tupleTag); final T rawValue = windowedValue.getValue(); - final RawUnionValue rawUnionValue = new RawUnionValue(index, rawValue); - emitter.emitElement(windowedValue.withValue(rawUnionValue)); + if (rawValue instanceof CompletionStage) { + CompletionStage valueFuture = (CompletionStage) rawValue; + if (collector != null) { + collector.add( + createOutputFuture( + windowedValue, valueFuture, res -> new RawUnionValue(index, res))); + } + } else { + final RawUnionValue rawUnionValue = new RawUnionValue(index, rawValue); + emitter.emitElement(windowedValue.withValue(rawUnionValue)); + } } }; } diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/DoFnRunnerWithKeyedInternals.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/DoFnRunnerWithKeyedInternals.java index fb666445f6a6..ce89c36ac5b8 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/DoFnRunnerWithKeyedInternals.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/DoFnRunnerWithKeyedInternals.java @@ -17,11 +17,8 @@ */ package org.apache.beam.runners.samza.runtime; -import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; - import org.apache.beam.runners.core.DoFnRunner; import org.apache.beam.runners.core.KeyedWorkItem; -import org.apache.beam.runners.core.TimerInternals; import org.apache.beam.sdk.state.TimeDomain; import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; @@ -61,24 +58,6 @@ public void processElement(WindowedValue elem) { } } - public void onTimer(KeyedTimerData keyedTimerData, BoundedWindow window) { - setKeyedInternals(keyedTimerData); - - try { - final TimerInternals.TimerData timer = keyedTimerData.getTimerData(); - onTimer( - timer.getTimerId(), - timer.getTimerFamilyId(), - keyedTimerData.getKey(), - window, - timer.getTimestamp(), - timer.getOutputTimestamp(), - timer.getDomain()); - } finally { - clearKeyedInternals(); - } - } - @Override public void onTimer( String timerId, @@ -88,9 +67,16 @@ public void onTimer( Instant timestamp, Instant outputTimestamp, TimeDomain timeDomain) { - checkState(keyedInternals.getKey() != null, "Key is not set for timer"); + // Note: wrap with KV.of(key, null) as a special use case of 
setKeyedInternals() to set key + // directly. + setKeyedInternals(KV.of(key, null)); - underlying.onTimer(timerId, timerFamilyId, key, window, timestamp, outputTimestamp, timeDomain); + try { + underlying.onTimer( + timerId, timerFamilyId, key, window, timestamp, outputTimestamp, timeDomain); + } finally { + clearKeyedInternals(); + } } @Override @@ -108,7 +94,6 @@ public DoFn getFn() { return underlying.getFn(); } - @SuppressWarnings("unchecked") private void setKeyedInternals(Object value) { if (value instanceof KeyedWorkItem) { keyedInternals.setKey(((KeyedWorkItem) value).key()); @@ -117,8 +102,12 @@ private void setKeyedInternals(Object value) { if (key != null) { keyedInternals.setKey(key); } - } else { + } else if (value instanceof KV) { keyedInternals.setKey(((KV) value).getKey()); + } else { + throw new UnsupportedOperationException( + String.format( + "%s is not supported in %s", value.getClass(), DoFnRunnerWithKeyedInternals.class)); } } diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/FutureCollector.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/FutureCollector.java new file mode 100644 index 000000000000..acb2ebaa7136 --- /dev/null +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/FutureCollector.java @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.samza.runtime; + +import java.util.Collection; +import java.util.concurrent.CompletionStage; +import org.apache.beam.sdk.util.WindowedValue; + +/** + * A future collector that buffers the output from the users {@link + * org.apache.beam.sdk.transforms.DoFn} and propagates the result future to downstream operators + * only after {@link #finish()} is invoked. + * + * @param type of the output element + */ +public interface FutureCollector { + /** + * Outputs the element to the collector. + * + * @param element to add to the collector + */ + void add(CompletionStage> element); + + /** + * Discards the elements within the collector. Once the elements have been discarded, callers need + * to prepare the collector again before invoking {@link #add(CompletionStage)}. + */ + void discard(); + + /** + * Seals this {@link FutureCollector}, returning a {@link CompletionStage} containing all of the + * elements that were added to it. The {@link #add(CompletionStage)} method will throw an {@link + * IllegalStateException} if called after a call to finish. + * + *
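+   * <p>Illustrative lifecycle sketch (not part of this change), assuming a single-threaded caller:
+   * <pre>{@code
+   * collector.prepare();           // open the collector before processing an element
+   * collector.add(outputFuture);   // buffer each async output as it is produced
+   * CompletionStage<Collection<WindowedValue<OutT>>> sealed = collector.finish();  // seal and flatten
+   * // on a processing failure, call collector.discard() instead of finish()
+   * }</pre>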
<p>
    The {@link FutureCollector} needs to be started again to collect newer batch of output. + */ + CompletionStage>> finish(); + + /** + * Prepares the {@link FutureCollector} to accept output elements. The {@link + * #add(CompletionStage)} method will throw an {@link IllegalStateException} if called without + * preparing the collector. + */ + void prepare(); +} diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/GroupByKeyOp.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/GroupByKeyOp.java index a84dde67a11f..dd4157a96084 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/GroupByKeyOp.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/GroupByKeyOp.java @@ -32,8 +32,6 @@ import org.apache.beam.runners.core.SystemReduceFn; import org.apache.beam.runners.core.TimerInternals; import org.apache.beam.runners.core.TimerInternals.TimerData; -import org.apache.beam.runners.core.construction.SerializablePipelineOptions; -import org.apache.beam.runners.core.serialization.Base64Serializer; import org.apache.beam.runners.samza.SamzaExecutionContext; import org.apache.beam.runners.samza.SamzaPipelineOptions; import org.apache.beam.runners.samza.metrics.DoFnRunnerWithMetrics; @@ -111,15 +109,14 @@ public void open( Context context, Scheduler> timerRegistry, OpEmitter> emitter) { - this.pipelineOptions = - Base64Serializer.deserializeUnchecked( - config.get("beamPipelineOptions"), SerializablePipelineOptions.class) - .get() - .as(SamzaPipelineOptions.class); + + final SamzaExecutionContext samzaExecutionContext = + (SamzaExecutionContext) context.getApplicationContainerContext(); + this.pipelineOptions = samzaExecutionContext.getPipelineOptions(); final SamzaStoreStateInternals.Factory nonKeyedStateInternalsFactory = - SamzaStoreStateInternals.createStateInternalFactory( - transformId, null, context.getTaskContext(), pipelineOptions, null); + SamzaStoreStateInternals.createNonKeyedStateInternalsFactory( + transformId, context.getTaskContext(), pipelineOptions); final DoFnRunners.OutputManager outputManager = outputManagerFactory.create(emitter); diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/KeyedInternals.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/KeyedInternals.java index 2ac97d33e02f..6501247c3e54 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/KeyedInternals.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/KeyedInternals.java @@ -28,6 +28,8 @@ import org.apache.beam.runners.core.StateTag; import org.apache.beam.runners.core.TimerInternals; import org.apache.beam.runners.core.TimerInternalsFactory; +import org.apache.beam.runners.samza.state.SamzaMapState; +import org.apache.beam.runners.samza.state.SamzaSetState; import org.apache.beam.sdk.state.State; import org.apache.beam.sdk.state.StateContext; import org.apache.beam.sdk.state.TimeDomain; @@ -87,8 +89,10 @@ void clearKey() { final List states = threadLocalKeyedStates.get().states; states.forEach( state -> { - if (state instanceof SamzaStoreStateInternals.KeyValueIteratorState) { - ((SamzaStoreStateInternals.KeyValueIteratorState) state).closeIterators(); + if (state instanceof SamzaMapState) { + ((SamzaMapState) state).closeIterators(); + } else if (state instanceof SamzaSetState) { + ((SamzaSetState) state).closeIterators(); } }); states.clear(); @@ -138,8 +142,9 @@ public void setTimer(TimerData timerData) { } @Override - 
public void deleteTimer(StateNamespace namespace, String timerId, TimeDomain timeDomain) { - getInternals().deleteTimer(namespace, timerId, timeDomain); + public void deleteTimer( + StateNamespace namespace, String timerId, String timerFamilyId, TimeDomain timeDomain) { + getInternals().deleteTimer(namespace, timerId, timerFamilyId, timeDomain); } @Override diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/KeyedTimerData.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/KeyedTimerData.java index 9dcec3d954d7..9c6bf386cfdd 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/KeyedTimerData.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/KeyedTimerData.java @@ -42,6 +42,7 @@ * {@link Comparable} by first comparing the wrapped TimerData then the key. */ @SuppressWarnings({ + "keyfor", "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/OpAdapter.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/OpAdapter.java index 564ee463ee85..46746a6c50f0 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/OpAdapter.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/OpAdapter.java @@ -21,24 +21,30 @@ import java.util.ArrayList; import java.util.Collection; import java.util.List; +import java.util.concurrent.CompletableFuture; +import java.util.concurrent.CompletionStage; +import java.util.stream.Collectors; import org.apache.beam.sdk.util.UserCodeException; import org.apache.beam.sdk.util.WindowedValue; import org.apache.samza.config.Config; import org.apache.samza.context.Context; import org.apache.samza.operators.Scheduler; -import org.apache.samza.operators.functions.FlatMapFunction; +import org.apache.samza.operators.functions.AsyncFlatMapFunction; import org.apache.samza.operators.functions.ScheduledFunction; import org.apache.samza.operators.functions.WatermarkFunction; import org.joda.time.Instant; import org.slf4j.Logger; import org.slf4j.LoggerFactory; -/** Adaptor class that runs a Samza {@link Op} for BEAM in the Samza {@link FlatMapFunction}. */ +/** + * Adaptor class that runs a Samza {@link Op} for BEAM in the Samza {@link AsyncFlatMapFunction}. + * This class is initialized once for each Op within a Task for each Task. 
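+ * <p>For illustration (this mirrors the combining logic in {@code apply}; it is not new behavior):
+ * synchronous outputs and the optional future emitted via {@code emitFuture} are merged into a
+ * single result, roughly
+ * <pre>{@code
+ * CompletionStage<Collection<OpMessage<OutT>>> result =
+ *     CompletableFuture.completedFuture(new ArrayList<>(outputList))
+ *         .thenCombine(outputFuture, (sync, async) -> { sync.addAll(async); return sync; });
+ * }</pre>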
+ */ @SuppressWarnings({ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class OpAdapter - implements FlatMapFunction, OpMessage>, + implements AsyncFlatMapFunction, OpMessage>, WatermarkFunction>, ScheduledFunction, OpMessage>, Serializable { @@ -46,12 +52,13 @@ public class OpAdapter private final Op op; private transient List> outputList; + private transient CompletionStage>> outputFuture; private transient Instant outputWatermark; private transient OpEmitter emitter; private transient Config config; private transient Context context; - public static FlatMapFunction, OpMessage> adapt( + public static AsyncFlatMapFunction, OpMessage> adapt( Op op) { return new OpAdapter<>(op); } @@ -76,7 +83,7 @@ public final void schedule(Scheduler> timerRegistry) { } @Override - public Collection> apply(OpMessage message) { + public synchronized CompletionStage>> apply(OpMessage message) { assert outputList.isEmpty(); try { @@ -99,13 +106,26 @@ public Collection> apply(OpMessage message) { throw UserCodeException.wrap(e); } - final List> results = new ArrayList<>(outputList); + CompletionStage>> resultFuture = + CompletableFuture.completedFuture(new ArrayList<>(outputList)); + + if (outputFuture != null) { + resultFuture = + resultFuture.thenCombine( + outputFuture, + (res1, res2) -> { + res1.addAll(res2); + return res1; + }); + } + outputList.clear(); - return results; + outputFuture = null; + return resultFuture; } @Override - public Collection> processWatermark(long time) { + public synchronized Collection> processWatermark(long time) { assert outputList.isEmpty(); try { @@ -122,12 +142,13 @@ public Collection> processWatermark(long time) { } @Override - public Long getOutputWatermark() { + public synchronized Long getOutputWatermark() { return outputWatermark != null ? outputWatermark.getMillis() : null; } @Override - public Collection> onCallback(KeyedTimerData keyedTimerData, long time) { + public synchronized Collection> onCallback( + KeyedTimerData keyedTimerData, long time) { assert outputList.isEmpty(); try { @@ -153,6 +174,13 @@ public void emitElement(WindowedValue element) { outputList.add(OpMessage.ofElement(element)); } + @Override + public void emitFuture(CompletionStage>> resultFuture) { + outputFuture = + resultFuture.thenApply( + res -> res.stream().map(OpMessage::ofElement).collect(Collectors.toList())); + } + @Override public void emitWatermark(Instant watermark) { outputWatermark = watermark; diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/OpEmitter.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/OpEmitter.java index d1e1f0646eae..951f5df6e46d 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/OpEmitter.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/OpEmitter.java @@ -17,11 +17,16 @@ */ package org.apache.beam.runners.samza.runtime; +import java.util.Collection; +import java.util.concurrent.CompletionStage; import org.apache.beam.sdk.util.WindowedValue; import org.joda.time.Instant; /** Output emitter for Samza {@link Op}. 
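 * <p>Illustrative note (not part of this change): {@code emitFuture} allows an {@link Op} to hand
 * downstream a collection of outputs that is still being computed, e.g.
 * <pre>{@code
 * CompletionStage<Collection<WindowedValue<OutT>>> pending = futureCollector.finish();
 * emitter.emitFuture(pending);  // propagated once the underlying futures resolve
 * }</pre>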
*/ public interface OpEmitter { + + void emitFuture(CompletionStage>> resultFuture); + void emitElement(WindowedValue element); void emitWatermark(Instant watermark); diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/OutputManagerFactory.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/OutputManagerFactory.java index e404c5f660af..5d4047dc9672 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/OutputManagerFactory.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/OutputManagerFactory.java @@ -23,4 +23,9 @@ /** Factory class to create {@link DoFnRunners.OutputManager}. */ public interface OutputManagerFactory extends Serializable { DoFnRunners.OutputManager create(OpEmitter emitter); + + default DoFnRunners.OutputManager create( + OpEmitter emitter, FutureCollector collector) { + return create(emitter); + } } diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/SamzaDoFnInvokerRegistrar.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/SamzaDoFnInvokerRegistrar.java index ebb01b0e6107..c0536161cb79 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/SamzaDoFnInvokerRegistrar.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/SamzaDoFnInvokerRegistrar.java @@ -18,6 +18,7 @@ package org.apache.beam.runners.samza.runtime; import java.util.Map; +import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.reflect.DoFnInvoker; import org.apache.samza.context.Context; @@ -26,8 +27,8 @@ public interface SamzaDoFnInvokerRegistrar { /** Returns the invoker for a {@link DoFn}. */ - DoFnInvoker invokerFor( - DoFn fn, Context context); + DoFnInvoker invokerSetupFor( + DoFn fn, PipelineOptions options, Context context); /** Returns the configs for a {@link DoFn}. 
*/ Map configFor(DoFn fn); diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/SamzaDoFnRunners.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/SamzaDoFnRunners.java index 2232736fa12b..1848f3bf91cd 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/SamzaDoFnRunners.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/SamzaDoFnRunners.java @@ -17,24 +17,34 @@ */ package org.apache.beam.runners.samza.runtime; +import java.util.Collections; import java.util.List; +import java.util.Locale; import java.util.Map; import java.util.concurrent.LinkedBlockingQueue; +import org.apache.beam.model.pipeline.v1.RunnerApi; import org.apache.beam.runners.core.DoFnRunner; import org.apache.beam.runners.core.DoFnRunners; import org.apache.beam.runners.core.SideInputHandler; import org.apache.beam.runners.core.StateInternals; +import org.apache.beam.runners.core.StateNamespaces; +import org.apache.beam.runners.core.StateTags; import org.apache.beam.runners.core.StatefulDoFnRunner; import org.apache.beam.runners.core.StepContext; import org.apache.beam.runners.core.TimerInternals; +import org.apache.beam.runners.core.construction.Timer; +import org.apache.beam.runners.core.construction.graph.ExecutableStage; import org.apache.beam.runners.fnexecution.control.BundleProgressHandler; import org.apache.beam.runners.fnexecution.control.OutputReceiverFactory; import org.apache.beam.runners.fnexecution.control.RemoteBundle; import org.apache.beam.runners.fnexecution.control.StageBundleFactory; +import org.apache.beam.runners.fnexecution.control.TimerReceiverFactory; import org.apache.beam.runners.fnexecution.state.StateRequestHandler; import org.apache.beam.runners.samza.SamzaExecutionContext; import org.apache.beam.runners.samza.SamzaPipelineOptions; import org.apache.beam.runners.samza.metrics.DoFnRunnerWithMetrics; +import org.apache.beam.runners.samza.util.StateUtils; +import org.apache.beam.runners.samza.util.WindowUtils; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.fn.data.FnDataReceiver; import org.apache.beam.sdk.state.BagState; @@ -44,6 +54,7 @@ import org.apache.beam.sdk.transforms.reflect.DoFnSignature; import org.apache.beam.sdk.transforms.reflect.DoFnSignatures; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; +import org.apache.beam.sdk.transforms.windowing.PaneInfo; import org.apache.beam.sdk.util.WindowedValue; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollectionView; @@ -84,12 +95,12 @@ public static DoFnRunner create( final StateInternals stateInternals; final DoFnSignature signature = DoFnSignatures.getSignature(doFn.getClass()); final SamzaStoreStateInternals.Factory stateInternalsFactory = - SamzaStoreStateInternals.createStateInternalFactory( + SamzaStoreStateInternals.createStateInternalsFactory( transformId, keyCoder, context.getTaskContext(), pipelineOptions, signature); final SamzaExecutionContext executionContext = (SamzaExecutionContext) context.getApplicationContainerContext(); - if (DoFnSignatures.isStateful(doFn)) { + if (StateUtils.isStateful(doFn)) { keyedInternals = new KeyedInternals(stateInternalsFactory, timerInternalsFactory); stateInternals = keyedInternals.stateInternals(); timerInternals = keyedInternals.timerInternals(); @@ -170,25 +181,62 @@ private static StatefulDoFnRunner.StateCleaner createStateClean } /** Create DoFnRunner for portable runner. 
*/ + @SuppressWarnings("unchecked") public static DoFnRunner createPortable( + String transformId, + String bundleStateId, + Coder> windowedValueCoder, + ExecutableStage executableStage, + Map> sideInputMapping, + SideInputHandler sideInputHandler, + SamzaStoreStateInternals.Factory nonKeyedStateInternalsFactory, + SamzaTimerInternalsFactory timerInternalsFactory, SamzaPipelineOptions pipelineOptions, - BagState> bundledEventsBag, DoFnRunners.OutputManager outputManager, StageBundleFactory stageBundleFactory, TupleTag mainOutputTag, Map> idToTupleTagMap, Context context, String transformFullName) { + // storing events within a bundle in states + final BagState> bundledEventsBag = + nonKeyedStateInternalsFactory + .stateInternalsForKey(null) + .state(StateNamespaces.global(), StateTags.bag(bundleStateId, windowedValueCoder)); + + final StateRequestHandler stateRequestHandler = + SamzaStateRequestHandlers.of( + transformId, + context.getTaskContext(), + pipelineOptions, + executableStage, + stageBundleFactory, + (Map>) + sideInputMapping, + sideInputHandler); + final SamzaExecutionContext executionContext = (SamzaExecutionContext) context.getApplicationContainerContext(); - final DoFnRunner sdkHarnessDoFnRunner = + final DoFnRunner underlyingRunner = new SdkHarnessDoFnRunner<>( - outputManager, stageBundleFactory, mainOutputTag, idToTupleTagMap, bundledEventsBag); - return DoFnRunnerWithMetrics.wrap( - sdkHarnessDoFnRunner, executionContext.getMetricsContainer(), transformFullName); + timerInternalsFactory, + WindowUtils.getWindowStrategy( + executableStage.getInputPCollection().getId(), executableStage.getComponents()), + outputManager, + stageBundleFactory, + mainOutputTag, + idToTupleTagMap, + bundledEventsBag, + stateRequestHandler); + return pipelineOptions.getEnableMetrics() + ? 
DoFnRunnerWithMetrics.wrap( + underlyingRunner, executionContext.getMetricsContainer(), transformFullName) + : underlyingRunner; } private static class SdkHarnessDoFnRunner implements DoFnRunner { + private final SamzaTimerInternalsFactory timerInternalsFactory; + private final WindowingStrategy windowingStrategy; private final DoFnRunners.OutputManager outputManager; private final StageBundleFactory stageBundleFactory; private final TupleTag mainOutputTag; @@ -197,18 +245,36 @@ private static class SdkHarnessDoFnRunner implements DoFnRunner> bundledEventsBag; private RemoteBundle remoteBundle; private FnDataReceiver> inputReceiver; + private StateRequestHandler stateRequestHandler; private SdkHarnessDoFnRunner( + SamzaTimerInternalsFactory timerInternalsFactory, + WindowingStrategy windowingStrategy, DoFnRunners.OutputManager outputManager, StageBundleFactory stageBundleFactory, TupleTag mainOutputTag, Map> idToTupleTagMap, - BagState> bundledEventsBag) { + BagState> bundledEventsBag, + StateRequestHandler stateRequestHandler) { + this.timerInternalsFactory = timerInternalsFactory; + this.windowingStrategy = windowingStrategy; this.outputManager = outputManager; this.stageBundleFactory = stageBundleFactory; this.mainOutputTag = mainOutputTag; this.idToTupleTagMap = idToTupleTagMap; this.bundledEventsBag = bundledEventsBag; + this.stateRequestHandler = stateRequestHandler; + } + + @SuppressWarnings("unchecked") + private void timerDataConsumer(Timer timerElement, TimerInternals.TimerData timerData) { + TimerInternals timerInternals = + timerInternalsFactory.timerInternalsForKey(timerElement.getUserKey()); + if (timerElement.getClearBit()) { + timerInternals.deleteTimer(timerData); + } else { + timerInternals.setTimer(timerData); + } } @Override @@ -225,13 +291,17 @@ public FnDataReceiver create(String pCollectionId) { } }; + final Coder windowCoder = windowingStrategy.getWindowFn().windowCoder(); + final TimerReceiverFactory timerReceiverFactory = + new TimerReceiverFactory(stageBundleFactory, this::timerDataConsumer, windowCoder); + remoteBundle = stageBundleFactory.getBundle( receiverFactory, - StateRequestHandler.unsupported(), + timerReceiverFactory, + stateRequestHandler, BundleProgressHandler.ignored()); - // TODO: side input support needs to implement to handle this properly inputReceiver = Iterables.getOnlyElement(remoteBundle.getInputReceivers().values()); bundledEventsBag .read() @@ -275,7 +345,27 @@ public void onTimer( BoundedWindow window, Instant timestamp, Instant outputTimestamp, - TimeDomain timeDomain) {} + TimeDomain timeDomain) { + final KV timerReceiverKey = + TimerReceiverFactory.decodeTimerDataTimerId(timerFamilyId); + final FnDataReceiver timerReceiver = + remoteBundle.getTimerReceivers().get(timerReceiverKey); + final Timer timerValue = + Timer.of( + key, + timerId, + Collections.singletonList(window), + timestamp, + outputTimestamp, + // TODO: Support propagating the PaneInfo through. 
+ PaneInfo.NO_FIRING); + try { + timerReceiver.accept(timerValue); + } catch (Exception e) { + throw new RuntimeException( + String.format(Locale.ENGLISH, "Failed to process timer %s", timerReceiver), e); + } + } @Override public void finishBundle() { diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/SamzaExecutableStageContextFactory.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/SamzaExecutableStageContextFactory.java new file mode 100644 index 000000000000..f034d031f2c2 --- /dev/null +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/SamzaExecutableStageContextFactory.java @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.samza.runtime; + +import java.util.concurrent.ConcurrentHashMap; +import java.util.concurrent.ConcurrentMap; +import org.apache.beam.runners.fnexecution.control.DefaultExecutableStageContext; +import org.apache.beam.runners.fnexecution.control.ExecutableStageContext; +import org.apache.beam.runners.fnexecution.control.ReferenceCountingExecutableStageContextFactory; +import org.apache.beam.runners.fnexecution.provisioning.JobInfo; + +/** + * Singleton class that contains one {@link ExecutableStageContext.Factory} per job. Assumes it is + * safe to release the backing environment asynchronously. + */ +public class SamzaExecutableStageContextFactory implements ExecutableStageContext.Factory { + + private static final SamzaExecutableStageContextFactory instance = + new SamzaExecutableStageContextFactory(); + // This map should only ever have a single element, as each job will have its own + // classloader and therefore its own instance of SamzaExecutableStageContextFactory. This + // code supports multiple JobInfos in order to provide a sensible implementation of + // Factory.get(JobInfo), which in theory could be called with different JobInfos. + private static final ConcurrentMap jobFactories = + new ConcurrentHashMap<>(); + + private SamzaExecutableStageContextFactory() {} + + public static SamzaExecutableStageContextFactory getInstance() { + return instance; + } + + @Override + public ExecutableStageContext get(JobInfo jobInfo) { + ExecutableStageContext.Factory jobFactory = + jobFactories.computeIfAbsent( + jobInfo.jobId(), + k -> + ReferenceCountingExecutableStageContextFactory.create( + DefaultExecutableStageContext::create, + // Always release environment asynchronously. 
+ (caller) -> false)); + + return jobFactory.get(jobInfo); + } +} diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/SamzaStateRequestHandlers.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/SamzaStateRequestHandlers.java new file mode 100644 index 000000000000..d67a5a2984c3 --- /dev/null +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/SamzaStateRequestHandlers.java @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.samza.runtime; + +import java.io.IOException; +import java.util.EnumMap; +import java.util.Iterator; +import java.util.Map; +import org.apache.beam.model.fnexecution.v1.BeamFnApi; +import org.apache.beam.model.pipeline.v1.RunnerApi; +import org.apache.beam.runners.core.SideInputHandler; +import org.apache.beam.runners.core.StateNamespace; +import org.apache.beam.runners.core.StateNamespaces; +import org.apache.beam.runners.core.StateTags; +import org.apache.beam.runners.core.construction.graph.ExecutableStage; +import org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors; +import org.apache.beam.runners.fnexecution.control.StageBundleFactory; +import org.apache.beam.runners.fnexecution.state.StateRequestHandler; +import org.apache.beam.runners.fnexecution.state.StateRequestHandlers; +import org.apache.beam.runners.fnexecution.translation.StreamingSideInputHandlerFactory; +import org.apache.beam.runners.fnexecution.wire.ByteStringCoder; +import org.apache.beam.runners.samza.SamzaPipelineOptions; +import org.apache.beam.runners.samza.util.StateUtils; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.state.BagState; +import org.apache.beam.sdk.transforms.windowing.BoundedWindow; +import org.apache.beam.sdk.values.PCollectionView; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; +import org.apache.samza.context.TaskContext; + +/** + * This class creates {@link StateRequestHandler} for side inputs and states of the Samza portable + * runner. 
+ */ +public class SamzaStateRequestHandlers { + + public static StateRequestHandler of( + String transformId, + TaskContext context, + SamzaPipelineOptions pipelineOptions, + ExecutableStage executableStage, + StageBundleFactory stageBundleFactory, + Map> sideInputIds, + SideInputHandler sideInputHandler) { + final StateRequestHandler sideInputStateHandler = + createSideInputStateHandler(executableStage, sideInputIds, sideInputHandler); + final StateRequestHandler userStateRequestHandler = + createUserStateRequestHandler( + transformId, executableStage, context, pipelineOptions, stageBundleFactory); + final EnumMap handlerMap = + new EnumMap<>(BeamFnApi.StateKey.TypeCase.class); + handlerMap.put(BeamFnApi.StateKey.TypeCase.ITERABLE_SIDE_INPUT, sideInputStateHandler); + handlerMap.put(BeamFnApi.StateKey.TypeCase.MULTIMAP_SIDE_INPUT, sideInputStateHandler); + handlerMap.put(BeamFnApi.StateKey.TypeCase.MULTIMAP_KEYS_SIDE_INPUT, sideInputStateHandler); + handlerMap.put(BeamFnApi.StateKey.TypeCase.BAG_USER_STATE, userStateRequestHandler); + return StateRequestHandlers.delegateBasedUponType(handlerMap); + } + + private static StateRequestHandler createSideInputStateHandler( + ExecutableStage executableStage, + Map> sideInputIds, + SideInputHandler sideInputHandler) { + + if (executableStage.getSideInputs().size() <= 0) { + return StateRequestHandler.unsupported(); + } + + final StateRequestHandlers.SideInputHandlerFactory sideInputHandlerFactory = + Preconditions.checkNotNull( + StreamingSideInputHandlerFactory.forStage( + executableStage, sideInputIds, sideInputHandler)); + try { + return StateRequestHandlers.forSideInputHandlerFactory( + ProcessBundleDescriptors.getSideInputs(executableStage), sideInputHandlerFactory); + } catch (IOException e) { + throw new RuntimeException("Failed to initialize SideInputHandler", e); + } + } + + private static StateRequestHandler createUserStateRequestHandler( + String transformId, + ExecutableStage executableStage, + TaskContext context, + SamzaPipelineOptions pipelineOptions, + StageBundleFactory stageBundleFactory) { + + if (!StateUtils.isStateful(executableStage)) { + return StateRequestHandler.unsupported(); + } + + final SamzaStoreStateInternals.Factory stateInternalsFactory = + SamzaStoreStateInternals.createStateInternalsFactory( + transformId, ByteStringCoder.of(), context, pipelineOptions, executableStage); + + return StateRequestHandlers.forBagUserStateHandlerFactory( + stageBundleFactory.getProcessBundleDescriptor(), + new BagUserStateFactory<>(stateInternalsFactory)); + } + + /** + * Factory to create {@link StateRequestHandlers.BagUserStateHandler} to provide bag state access + * for the given {@link K key} and {@link W window} provided by SDK worker, unlike classic + * pipeline where {@link K key} is set at {@link DoFnRunnerWithKeyedInternals#processElement} and + * {@link W window} is set at {@link + * org.apache.beam.runners.core.SimpleDoFnRunner.DoFnProcessContext#window()}}. 
+ */ + static class BagUserStateFactory< + K extends ByteString, V extends ByteString, W extends BoundedWindow> + implements StateRequestHandlers.BagUserStateHandlerFactory { + + private final SamzaStoreStateInternals.Factory stateInternalsFactory; + + BagUserStateFactory(SamzaStoreStateInternals.Factory stateInternalsFactory) { + this.stateInternalsFactory = stateInternalsFactory; + } + + @Override + public StateRequestHandlers.BagUserStateHandler forUserState( + String pTransformId, + String userStateId, + Coder keyCoder, + Coder valueCoder, + Coder windowCoder) { + return new StateRequestHandlers.BagUserStateHandler() { + + /** {@inheritDoc} */ + @Override + public Iterable get(K key, W window) { + StateNamespace namespace = StateNamespaces.window(windowCoder, window); + BagState bagState = + stateInternalsFactory + .stateInternalsForKey(key) + .state(namespace, StateTags.bag(userStateId, valueCoder)); + return bagState.read(); + } + + /** {@inheritDoc} */ + @Override + public void append(K key, W window, Iterator values) { + StateNamespace namespace = StateNamespaces.window(windowCoder, window); + BagState bagState = + stateInternalsFactory + .stateInternalsForKey(key) + .state(namespace, StateTags.bag(userStateId, valueCoder)); + while (values.hasNext()) { + bagState.add(values.next()); + } + } + + /** {@inheritDoc} */ + @Override + public void clear(K key, W window) { + StateNamespace namespace = StateNamespaces.window(windowCoder, window); + BagState bagState = + stateInternalsFactory + .stateInternalsForKey(key) + .state(namespace, StateTags.bag(userStateId, valueCoder)); + bagState.clear(); + } + }; + } + } +} diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/SamzaStoreStateInternals.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/SamzaStoreStateInternals.java index a388c252759a..a9e38a7f2ede 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/SamzaStoreStateInternals.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/SamzaStoreStateInternals.java @@ -26,17 +26,23 @@ import java.util.AbstractMap; import java.util.ArrayList; import java.util.Arrays; +import java.util.Collection; import java.util.Collections; import java.util.HashMap; import java.util.Iterator; import java.util.List; import java.util.Map; import java.util.Objects; +import java.util.Set; +import java.util.function.Function; +import java.util.stream.Collectors; import javax.annotation.Nonnull; import org.apache.beam.runners.core.StateInternals; import org.apache.beam.runners.core.StateInternalsFactory; import org.apache.beam.runners.core.StateNamespace; import org.apache.beam.runners.core.StateTag; +import org.apache.beam.runners.core.construction.graph.ExecutableStage; +import org.apache.beam.runners.core.construction.graph.UserStateReference; import org.apache.beam.runners.samza.SamzaPipelineOptions; import org.apache.beam.runners.samza.state.SamzaMapState; import org.apache.beam.runners.samza.state.SamzaSetState; @@ -73,7 +79,10 @@ import org.apache.samza.storage.kv.Entry; import org.apache.samza.storage.kv.KeyValueIterator; import org.apache.samza.storage.kv.KeyValueStore; +import org.checkerframework.checker.initialization.qual.Initialized; +import org.checkerframework.checker.nullness.qual.NonNull; import org.checkerframework.checker.nullness.qual.Nullable; +import org.checkerframework.checker.nullness.qual.UnknownKeyFor; import org.joda.time.Instant; /** {@link StateInternals} that uses Samza local {@link 
KeyValueStore} to manage state. */ @@ -85,18 +94,18 @@ public class SamzaStoreStateInternals implements StateInternals { static final String BEAM_STORE = "beamStore"; - private static ThreadLocal> threadLocalBaos = + private static final ThreadLocal> threadLocalBaos = new ThreadLocal<>(); // the stores include both beamStore for system states as well as stores for user state - private final Map> stores; + private final Map>> stores; private final K key; private final byte[] keyBytes; private final int batchGetSize; private final String stageId; private SamzaStoreStateInternals( - Map> stores, + Map>> stores, @Nullable K key, byte @Nullable [] keyBytes, String stageId, @@ -109,32 +118,66 @@ private SamzaStoreStateInternals( } @SuppressWarnings("unchecked") - static KeyValueStore getBeamStore(TaskContext context) { - return (KeyValueStore) context.getStore(SamzaStoreStateInternals.BEAM_STORE); + static KeyValueStore> getBeamStore(TaskContext context) { + return (KeyValueStore>) + context.getStore(SamzaStoreStateInternals.BEAM_STORE); } - static Factory createStateInternalFactory( + /** + * Creates non keyed state internal factory to persist states in {@link + * SamzaStoreStateInternals#BEAM_STORE}. + */ + static Factory createNonKeyedStateInternalsFactory( + String id, TaskContext context, SamzaPipelineOptions pipelineOptions) { + return createStateInternalsFactory(id, null, context, pipelineOptions, Collections.emptySet()); + } + + static Factory createStateInternalsFactory( String id, - Coder keyCoder, + Coder keyCoder, TaskContext context, SamzaPipelineOptions pipelineOptions, DoFnSignature signature) { + + return createStateInternalsFactory( + id, keyCoder, context, pipelineOptions, signature.stateDeclarations().keySet()); + } + + static Factory createStateInternalsFactory( + String id, + Coder keyCoder, + TaskContext context, + SamzaPipelineOptions pipelineOptions, + ExecutableStage executableStage) { + + Set stateIds = + executableStage.getUserStates().stream() + .map(UserStateReference::localName) + .collect(Collectors.toSet()); + + return createStateInternalsFactory(id, keyCoder, context, pipelineOptions, stateIds); + } + + @SuppressWarnings("unchecked") + private static Factory createStateInternalsFactory( + String id, + @Nullable Coder keyCoder, + TaskContext context, + SamzaPipelineOptions pipelineOptions, + Collection stateIds) { final int batchGetSize = pipelineOptions.getStoreBatchGetSize(); - final Map> stores = new HashMap<>(); + final Map>> stores = new HashMap<>(); stores.put(BEAM_STORE, getBeamStore(context)); - final Coder stateKeyCoder; + final Coder stateKeyCoder; if (keyCoder != null) { - signature - .stateDeclarations() - .keySet() - .forEach( - stateId -> - stores.put( - stateId, (KeyValueStore) context.getStore(stateId))); + stateIds.forEach( + stateId -> + stores.put( + stateId, (KeyValueStore>) context.getStore(stateId))); stateKeyCoder = keyCoder; } else { - stateKeyCoder = VoidCoder.of(); + stateKeyCoder = (Coder) VoidCoder.of(); } return new Factory<>(Objects.toString(id), stores, stateKeyCoder, batchGetSize); } @@ -227,13 +270,13 @@ private static ByteArrayOutputStream getThreadLocalBaos() { /** Factory class to create {@link SamzaStoreStateInternals}. 
*/ public static class Factory implements StateInternalsFactory { private final String stageId; - private final Map> stores; + private final Map>> stores; private final Coder keyCoder; private final int batchGetSize; public Factory( String stageId, - Map> stores, + Map>> stores, Coder keyCoder, int batchGetSize) { this.stageId = stageId; @@ -264,40 +307,29 @@ public StateInternals stateInternalsForKey(@Nullable K key) { } } - /** An internal State interface that holds underlying KeyValueIterators. */ - interface KeyValueIteratorState { - void closeIterators(); - } - private abstract class AbstractSamzaState { - private final Coder coder; - private final byte[] encodedStoreKey; - private final String namespace; - protected final KeyValueStore store; + private final StateNamespace namespace; + private final String addressId; + private final boolean isBeamStore; + private final String stageId; + private final byte[] keyBytes; + private byte[] encodedStoreKey; + protected final Coder coder; + protected final KeyValueStore> store; + @SuppressWarnings({"unchecked", "rawtypes"}) protected AbstractSamzaState( StateNamespace namespace, StateTag address, Coder coder) { this.coder = coder; - this.namespace = namespace.stringKey(); - - final KeyValueStore userStore = stores.get(address.getId()); - this.store = userStore != null ? userStore : stores.get(BEAM_STORE); - - final ByteArrayOutputStream baos = getThreadLocalBaos(); - try (DataOutputStream dos = new DataOutputStream(baos)) { - dos.write(keyBytes); - dos.writeUTF(namespace.stringKey()); - - if (userStore == null) { - // for system state, we need to differentiate based on the following: - dos.writeUTF(stageId); - dos.writeUTF(address.getId()); - } - } catch (IOException e) { - throw new RuntimeException( - "Could not encode full address for state: " + address.getId(), e); - } - this.encodedStoreKey = baos.toByteArray(); + this.namespace = namespace; + this.addressId = address.getId(); + this.isBeamStore = !stores.containsKey(address.getId()); + this.store = + isBeamStore + ? 
(KeyValueStore) stores.get(BEAM_STORE) + : (KeyValueStore) stores.get(address.getId()); + this.stageId = SamzaStoreStateInternals.this.stageId; + this.keyBytes = SamzaStoreStateInternals.this.keyBytes; } protected void clearInternal() { @@ -305,12 +337,12 @@ protected void clearInternal() { } protected void writeInternal(T value) { - store.put(getEncodedStoreKey(), encodeValue(value)); + store.put(getEncodedStoreKey(), StateValue.of(value, coder)); } protected T readInternal() { - final byte[] valueBytes = store.get(getEncodedStoreKey()); - return decodeValue(valueBytes); + final StateValue stateValue = store.get(getEncodedStoreKey()); + return decodeValue(stateValue); } protected ReadableState isEmptyInternal() { @@ -328,32 +360,31 @@ public ReadableState readLater() { } protected ByteArray getEncodedStoreKey() { - return ByteArray.of(encodedStoreKey); + return ByteArray.of(getEncodedStoreKeyBytes()); } protected byte[] getEncodedStoreKeyBytes() { - return encodedStoreKey; - } - - protected byte[] encodeValue(T value) { - final ByteArrayOutputStream baos = getThreadLocalBaos(); - try { - coder.encode(value, baos); - } catch (IOException e) { - throw new RuntimeException("Could not encode state value: " + value, e); - } - return baos.toByteArray(); - } - - protected T decodeValue(byte[] valueBytes) { - if (valueBytes != null) { - try { - return coder.decode(new ByteArrayInputStream(valueBytes)); + if (encodedStoreKey == null) { + final ByteArrayOutputStream baos = getThreadLocalBaos(); + try (DataOutputStream dos = new DataOutputStream(baos)) { + dos.write(keyBytes); + dos.writeUTF(namespace.stringKey()); + + if (isBeamStore) { + // for system state, we need to differentiate based on the following: + dos.writeUTF(stageId); + dos.writeUTF(addressId); + } } catch (IOException e) { - throw new RuntimeException("Could not decode state", e); + throw new RuntimeException("Could not encode full address for state: " + addressId, e); } + this.encodedStoreKey = baos.toByteArray(); } - return null; + return encodedStoreKey; + } + + protected T decodeValue(StateValue stateValue) { + return stateValue == null ? null : stateValue.getValue(coder); } @Override @@ -367,13 +398,20 @@ public boolean equals(@Nullable Object o) { @SuppressWarnings("unchecked") final AbstractSamzaState that = (AbstractSamzaState) o; - return Arrays.equals(encodedStoreKey, that.encodedStoreKey); + if (isBeamStore || that.isBeamStore) { + if (!isBeamStore || !that.isBeamStore || !stageId.equals(that.stageId)) { + return false; + } + } + return Arrays.equals(keyBytes, that.keyBytes) + && addressId.equals(that.addressId) + && this.namespace.equals(that.namespace); } @Override public int hashCode() { int result = namespace.hashCode(); - result = 31 * result + Arrays.hashCode(encodedStoreKey); + result = 31 * result + Arrays.hashCode(getEncodedStoreKeyBytes()); return result; } } @@ -417,8 +455,8 @@ public void add(T value) { synchronized (store) { final int size = getSize(); final ByteArray encodedKey = encodeKey(size); - store.put(encodedKey, encodeValue(value)); - store.put(getEncodedStoreKey(), Ints.toByteArray(size + 1)); + store.put(encodedKey, StateValue.of(value, coder)); + store.put(getEncodedStoreKey(), StateValue.of(Ints.toByteArray(size + 1))); } } @@ -476,8 +514,10 @@ public void clear() { } private int getSize() { - final byte[] sizeBytes = store.get(getEncodedStoreKey()); - return sizeBytes == null ? 
0 : Ints.fromByteArray(sizeBytes); + final StateValue stateSize = store.get(getEncodedStoreKey()); + return (stateSize == null || stateSize.valueBytes == null) + ? 0 + : Ints.fromByteArray(stateSize.valueBytes); } private ByteArray encodeKey(int size) { @@ -492,7 +532,7 @@ private ByteArray encodeKey(int size) { } } - private class SamzaSetStateImpl implements SamzaSetState, KeyValueIteratorState { + private class SamzaSetStateImpl implements SamzaSetState { private final SamzaMapStateImpl mapState; private SamzaSetStateImpl( @@ -585,11 +625,11 @@ public void closeIterators() { } private class SamzaMapStateImpl extends AbstractSamzaState - implements SamzaMapState, KeyValueIteratorState { + implements SamzaMapState { private final Coder keyCoder; private final int storeKeySize; - private final List> openIterators = + private final List>> openIterators = Collections.synchronizedList(new ArrayList<>()); private int maxKeySize; @@ -611,15 +651,16 @@ protected SamzaMapStateImpl( public void put(KeyT key, ValueT value) { final ByteArray encodedKey = encodeKey(key); maxKeySize = Math.max(maxKeySize, encodedKey.getValue().length); - store.put(encodedKey, encodeValue(value)); + store.put(encodedKey, StateValue.of(value, coder)); } @Override - public @Nullable ReadableState putIfAbsent(KeyT key, ValueT value) { + public @Nullable ReadableState computeIfAbsent( + KeyT key, Function mappingFunction) { final ByteArray encodedKey = encodeKey(key); final ValueT current = decodeValue(store.get(encodedKey)); if (current == null) { - put(key, value); + put(key, mappingFunction.apply(key)); } return current == null ? null : ReadableStates.immediate(current); @@ -632,8 +673,24 @@ public void remove(KeyT key) { @Override public ReadableState get(KeyT key) { - ValueT value = decodeValue(store.get(encodeKey(key))); - return ReadableStates.immediate(value); + return getOrDefault(key, null); + } + + @Override + public @UnknownKeyFor @NonNull @Initialized ReadableState getOrDefault( + KeyT key, @Nullable ValueT defaultValue) { + return new ReadableState() { + @Override + public @Nullable ValueT read() { + ValueT value = decodeValue(store.get(encodeKey(key))); + return value != null ? value : defaultValue; + } + + @Override + public @UnknownKeyFor @NonNull @Initialized ReadableState readLater() { + return this; + } + }; } @Override @@ -684,10 +741,30 @@ public ReadableState>> readLater() { }; } + @Override + public @UnknownKeyFor @NonNull @Initialized ReadableState< + @UnknownKeyFor @NonNull @Initialized Boolean> + isEmpty() { + ReadableState> keys = this.keys(); + return new ReadableState() { + @Override + public @Nullable Boolean read() { + return Iterables.isEmpty(keys.read()); + } + + @Override + public @UnknownKeyFor @NonNull @Initialized ReadableState readLater() { + keys.readLater(); + return this; + } + }; + } + @Override public ReadableState>> readIterator() { final ByteArray maxKey = createMaxKey(); - final KeyValueIterator kvIter = store.range(getEncodedStoreKey(), maxKey); + final KeyValueIterator> kvIter = + store.range(getEncodedStoreKey(), maxKey); openIterators.add(kvIter); return new ReadableState>>() { @@ -707,7 +784,7 @@ public boolean hasNext() { @Override public Map.Entry next() { - Entry entry = kvIter.next(); + Entry> entry = kvIter.next(); return new AbstractMap.SimpleEntry<>( decodeKey(entry.getKey()), decodeValue(entry.getValue())); } @@ -726,16 +803,19 @@ public ReadableState>> readLater() { * properly, we need to load the content into memory. 
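(Editor's aside, not part of the patch: the createIterable() helper that follows copies the store's range iterator into an immutable list up front, precisely because callers cannot be trusted to close the returned Iterable. A stripped-down sketch of that copy-then-close idea, with a hypothetical materialize() helper over a plain Iterator:

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import java.util.function.Function;
    import java.util.stream.Collectors;

    /** Sketch only: drain a closeable iterator eagerly, then expose a transformed copy. */
    final class EagerRangeCopySketch {

      static <E, T> List<T> materialize(
          Iterator<E> rangeIterator, Runnable closeIterator, Function<E, T> transform) {
        final List<E> copy = new ArrayList<>();
        try {
          rangeIterator.forEachRemaining(copy::add);
        } finally {
          closeIterator.run(); // the underlying store iterator is released immediately
        }
        return copy.stream().map(transform).collect(Collectors.toList());
      }
    }

The trade-off, as the surrounding comment notes, is memory proportional to the size of the range being read.)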
*/ private Iterable createIterable( - SerializableFunction, OutputT> fn) { + SerializableFunction< + org.apache.samza.storage.kv.Entry>, OutputT> + fn) { final ByteArray maxKey = createMaxKey(); - final KeyValueIterator kvIter = store.range(getEncodedStoreKey(), maxKey); - final List> iterable = ImmutableList.copyOf(kvIter); + final KeyValueIterator> kvIter = + store.range(getEncodedStoreKey(), maxKey); + final List>> iterable = ImmutableList.copyOf(kvIter); kvIter.close(); return new Iterable() { @Override public Iterator iterator() { - final Iterator> iter = iterable.iterator(); + final Iterator>> iter = iterable.iterator(); return new Iterator() { @Override @@ -755,7 +835,8 @@ public OutputT next() { @Override public void clear() { final ByteArray maxKey = createMaxKey(); - final KeyValueIterator kvIter = store.range(getEncodedStoreKey(), maxKey); + final KeyValueIterator> kvIter = + store.range(getEncodedStoreKey(), maxKey); while (kvIter.hasNext()) { store.delete(kvIter.next().getKey()); } @@ -975,4 +1056,76 @@ public ByteArray fromBytes(byte[] bytes) { } } } + + /** + * Wrapper for state value so that unencoded value can be read directly from the cache of + * KeyValueStore. + */ + public static class StateValue implements Serializable { + private T value; + private Coder valueCoder; + private byte[] valueBytes; + + private StateValue(T value, Coder valueCoder, byte[] valueBytes) { + this.value = value; + this.valueCoder = valueCoder; + this.valueBytes = valueBytes; + } + + public static StateValue of(T value, Coder valueCoder) { + return new StateValue<>(value, valueCoder, null); + } + + public static StateValue of(byte[] valueBytes) { + return new StateValue<>(null, null, valueBytes); + } + + public T getValue(Coder coder) { + if (value == null && valueBytes != null) { + if (valueCoder == null) { + valueCoder = coder; + } + try { + value = valueCoder.decode(new ByteArrayInputStream(valueBytes)); + } catch (IOException e) { + throw new RuntimeException("Could not decode state", e); + } + } + return value; + } + + public byte[] getValueBytes() { + if (valueBytes == null && value != null) { + final ByteArrayOutputStream baos = new ByteArrayOutputStream(); + try { + valueCoder.encode(value, baos); + } catch (IOException e) { + throw new RuntimeException("Could not encode state value: " + value, e); + } + valueBytes = baos.toByteArray(); + } + return valueBytes; + } + } + + /** Factory class to provide {@link StateValueSerdeFactory.StateValueSerde}. */ + public static class StateValueSerdeFactory implements SerdeFactory> { + @Override + public Serde> getSerde(String name, Config config) { + return new StateValueSerde(); + } + + /** Serde for {@link StateValue}. */ + public static class StateValueSerde implements Serde> { + @Override + public StateValue fromBytes(byte[] bytes) { + return StateValue.of(bytes); + } + + @Override + public byte[] toBytes(StateValue stateValue) { + return stateValue == null ? 
null : stateValue.getValueBytes(); + } + } + } } diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/SamzaTimerInternalsFactory.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/SamzaTimerInternalsFactory.java index 1fc0324ef4ac..781421743233 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/SamzaTimerInternalsFactory.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/SamzaTimerInternalsFactory.java @@ -17,6 +17,7 @@ */ package org.apache.beam.runners.samza.runtime; +import com.google.auto.value.AutoValue; import java.io.ByteArrayOutputStream; import java.io.IOException; import java.io.InputStream; @@ -37,6 +38,7 @@ import org.apache.beam.runners.samza.SamzaPipelineOptions; import org.apache.beam.runners.samza.SamzaRunner; import org.apache.beam.runners.samza.state.SamzaMapState; +import org.apache.beam.runners.samza.state.SamzaSetState; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.CoderException; import org.apache.beam.sdk.coders.StringUtf8Coder; @@ -64,8 +66,7 @@ }) public class SamzaTimerInternalsFactory implements TimerInternalsFactory { private static final Logger LOG = LoggerFactory.getLogger(SamzaTimerInternalsFactory.class); - - private final NavigableSet> eventTimeTimers; + private final NavigableSet> eventTimeBuffer; private final Coder keyCoder; private final Scheduler> timerRegistry; private final SamzaTimerState state; @@ -74,16 +75,31 @@ public class SamzaTimerInternalsFactory implements TimerInternalsFactory { private Instant inputWatermark = BoundedWindow.TIMESTAMP_MIN_VALUE; private Instant outputWatermark = BoundedWindow.TIMESTAMP_MIN_VALUE; + // Size of each event timer is around 200B, by default with buffer size 50k, the default size is + // 10M + private final int maxEventTimerBufferSize; + // Max event time stored in eventTimerBuffer + // If it is set to long.MAX_VALUE, it indicates the State does not contain any KeyedTimerData + private long maxEventTimeInBuffer; + + // The maximum number of ready timers to process at once per watermark. + private final long maxReadyTimersToProcessOnce; + private SamzaTimerInternalsFactory( Coder keyCoder, Scheduler> timerRegistry, String timerStateId, SamzaStoreStateInternals.Factory nonKeyedStateInternalsFactory, Coder windowCoder, - IsBounded isBounded) { + IsBounded isBounded, + SamzaPipelineOptions pipelineOptions) { this.keyCoder = keyCoder; this.timerRegistry = timerRegistry; - this.eventTimeTimers = new TreeSet<>(); + this.eventTimeBuffer = new TreeSet<>(); + this.maxEventTimerBufferSize = + pipelineOptions.getEventTimerBufferSize(); // must be placed before state initialization + this.maxEventTimeInBuffer = Long.MAX_VALUE; + this.maxReadyTimersToProcessOnce = pipelineOptions.getMaxReadyTimersToProcessOnce(); this.state = new SamzaTimerState(timerStateId, nonKeyedStateInternalsFactory, windowCoder); this.isBounded = isBounded; } @@ -105,7 +121,8 @@ static SamzaTimerInternalsFactory createTimerInternalFactory( timerStateId, nonKeyedStateInternalsFactory, windowCoder, - isBounded); + isBounded, + pipelineOptions); } @Override @@ -152,16 +169,37 @@ public void setOutputWatermark(Instant watermark) { outputWatermark = watermark; } + /** + * The method is called when watermark comes. It compares timers in memory buffer with watermark + * to prepare ready timers. When memory buffer is empty, it asks store to reload timers into + * buffer. 
note that the number of timers returned may be larger than memory buffer size. + * + * @return a collection of ready timers to be fired + */ public Collection> removeReadyTimers() { final Collection> readyTimers = new ArrayList<>(); - while (!eventTimeTimers.isEmpty() - && eventTimeTimers.first().getTimerData().getTimestamp().isBefore(inputWatermark)) { - final KeyedTimerData keyedTimerData = eventTimeTimers.pollFirst(); + while (!eventTimeBuffer.isEmpty() + && !eventTimeBuffer.first().getTimerData().getTimestamp().isAfter(inputWatermark) + && readyTimers.size() < maxReadyTimersToProcessOnce) { + + final KeyedTimerData keyedTimerData = eventTimeBuffer.pollFirst(); readyTimers.add(keyedTimerData); state.deletePersisted(keyedTimerData); + + if (eventTimeBuffer.isEmpty()) { + state.reloadEventTimeTimers(); + } } + LOG.debug("Removed {} ready timers", readyTimers.size()); + if (readyTimers.size() == maxReadyTimersToProcessOnce + && !eventTimeBuffer.isEmpty() + && eventTimeBuffer.first().getTimerData().getTimestamp().isBefore(inputWatermark)) { + LOG.warn( + "Loaded {} expired timers, the remaining will be processed at next watermark.", + maxReadyTimersToProcessOnce); + } return readyTimers; } @@ -177,6 +215,11 @@ public Instant getOutputWatermark() { return outputWatermark; } + // for unit test only + NavigableSet> getEventTimeBuffer() { + return eventTimeBuffer; + } + private class SamzaTimerInternals implements TimerInternals { private final byte[] keyBytes; private final K key; @@ -202,77 +245,126 @@ public void setTimer( public void setTimer(TimerData timerData) { if (isBounded == IsBounded.UNBOUNDED && timerData.getTimestamp().getMillis() - >= GlobalWindow.INSTANCE.maxTimestamp().getMillis()) { - // No need to register a timer of max timestamp if the input is unbounded + > GlobalWindow.INSTANCE.maxTimestamp().getMillis()) { + // No need to register a timer greater than maxTimestamp if the input is unbounded. + // 1. It will ignore timers with (maxTimestamp + 1) created by stateful ParDo with global + // window. + // 2. It will register timers with maxTimestamp so that global window can be closed + // correctly when max watermark comes. 
return; } final KeyedTimerData keyedTimerData = new KeyedTimerData<>(keyBytes, key, timerData); - if (eventTimeTimers.contains(keyedTimerData)) { + if (eventTimeBuffer.contains(keyedTimerData)) { return; } final Long lastTimestamp = state.get(keyedTimerData); final Long newTimestamp = timerData.getTimestamp().getMillis(); - if (!newTimestamp.equals(lastTimestamp)) { - if (lastTimestamp != null) { - final TimerData lastTimerData = - TimerData.of( - timerData.getTimerId(), - timerData.getNamespace(), - new Instant(lastTimestamp), - new Instant(lastTimestamp), - timerData.getDomain()); - deleteTimer(lastTimerData, false); - } + if (newTimestamp.equals(lastTimestamp)) { + return; + } + + if (lastTimestamp != null) { + deleteTimer( + timerData.getNamespace(), + timerData.getTimerId(), + timerData.getTimerFamilyId(), + new Instant(lastTimestamp), + new Instant(lastTimestamp), + timerData.getDomain()); + } - // persist it first - state.persist(keyedTimerData); + // persist it first + state.persist(keyedTimerData); - switch (timerData.getDomain()) { - case EVENT_TIME: - eventTimeTimers.add(keyedTimerData); - break; + // TO-DO: apply the same memory optimization over processing timers + switch (timerData.getDomain()) { + case EVENT_TIME: + /* + * To determine if the upcoming KeyedTimerData could be added to the Buffer while + * guaranteeing the Buffer's timestamps are all <= than those in State Store to preserve + * timestamp eviction priority: + * + *

    1) If maxEventTimeInBuffer == long.MAX_VALUE, it indicates that the State is empty, + * so all existing event-time timers, whether earlier or later than newTimestamp, are already in the buffer; + * + *

    2) If newTimestamp < maxEventTimeInBuffer, it indicates that there are buffered entries + * greater than newTimestamp, so it is safe to add the new timer to the buffer + * + *

    In case that the Buffer is full, we remove the largest timer from memory according + * to {@link KeyedTimerData.compareTo()} + */ + if (newTimestamp < maxEventTimeInBuffer) { + eventTimeBuffer.add(keyedTimerData); + if (eventTimeBuffer.size() > maxEventTimerBufferSize) { + eventTimeBuffer.pollLast(); + maxEventTimeInBuffer = + eventTimeBuffer.last().getTimerData().getTimestamp().getMillis(); + } + } + break; - case PROCESSING_TIME: - timerRegistry.schedule(keyedTimerData, timerData.getTimestamp().getMillis()); - break; + case PROCESSING_TIME: + timerRegistry.schedule(keyedTimerData, timerData.getTimestamp().getMillis()); + break; - default: - throw new UnsupportedOperationException( - String.format( - "%s currently only supports even time or processing time", SamzaRunner.class)); - } + default: + throw new UnsupportedOperationException( + String.format( + "%s currently only supports even time or processing time", SamzaRunner.class)); } } + /** @deprecated use {@link #deleteTimer(StateNamespace, String, String, TimeDomain)}. */ @Override - public void deleteTimer(StateNamespace namespace, String timerId, TimeDomain timeDomain) { - Instant now = Instant.now(); - deleteTimer(TimerData.of(timerId, namespace, now, now, timeDomain)); + @Deprecated + public void deleteTimer(StateNamespace namespace, String timerId, String timerFamilyId) { + deleteTimer(namespace, timerId, timerFamilyId, TimeDomain.EVENT_TIME); } + /** @deprecated use {@link #deleteTimer(StateNamespace, String, String, TimeDomain)}. */ @Override - public void deleteTimer(StateNamespace namespace, String timerId, String timerFamilyId) { - Instant now = Instant.now(); - deleteTimer(TimerData.of(timerId, namespace, now, now, TimeDomain.EVENT_TIME)); + @Deprecated + public void deleteTimer(TimerData timerData) { + deleteTimer( + timerData.getNamespace(), + timerData.getTimerId(), + timerData.getTimerFamilyId(), + timerData.getDomain()); } @Override - public void deleteTimer(TimerData timerData) { - deleteTimer(timerData, true); + public void deleteTimer( + StateNamespace namespace, String timerId, String timerFamilyId, TimeDomain timeDomain) { + final TimerKey timerKey = TimerKey.of(key, namespace, timerId, timerFamilyId); + final Long lastTimestamp = state.get(timerKey, timeDomain); + + if (lastTimestamp == null) { + return; + } + + final Instant timestamp = Instant.ofEpochMilli(lastTimestamp); + deleteTimer(namespace, timerId, timerFamilyId, timestamp, timestamp, timeDomain); } - private void deleteTimer(TimerData timerData, boolean updateState) { + private void deleteTimer( + StateNamespace namespace, + String timerId, + String timerFamilyId, + Instant timestamp, + Instant outputTimestamp, + TimeDomain timeDomain) { + final TimerData timerData = + TimerData.of(timerId, timerFamilyId, namespace, timestamp, outputTimestamp, timeDomain); final KeyedTimerData keyedTimerData = new KeyedTimerData<>(keyBytes, key, timerData); - if (updateState) { - state.deletePersisted(keyedTimerData); - } + + state.deletePersisted(keyedTimerData); switch (timerData.getDomain()) { case EVENT_TIME: - eventTimeTimers.remove(keyedTimerData); + eventTimeBuffer.remove(keyedTimerData); break; case PROCESSING_TIME: @@ -281,7 +373,8 @@ private void deleteTimer(TimerData timerData, boolean updateState) { default: throw new UnsupportedOperationException( - String.format("%s currently only supports event time", SamzaRunner.class)); + String.format( + "%s currently only supports event time or processing time", SamzaRunner.class)); } } @@ -309,15 +402,16 @@ 
public Instant currentOutputWatermarkTime() { } private class SamzaTimerState { - private final SamzaMapState, Long> eventTimerTimerState; - private final SamzaMapState, Long> processingTimerTimerState; + private final SamzaMapState, Long> eventTimeTimerState; + private final SamzaSetState> timestampSortedEventTimeTimerState; + private final SamzaMapState, Long> processingTimeTimerState; SamzaTimerState( String timerStateId, SamzaStoreStateInternals.Factory nonKeyedStateInternalsFactory, Coder windowCoder) { - this.eventTimerTimerState = + this.eventTimeTimerState = (SamzaMapState, Long>) nonKeyedStateInternalsFactory .stateInternalsForKey(null) @@ -328,7 +422,17 @@ private class SamzaTimerState { new TimerKeyCoder<>(keyCoder, windowCoder), VarLongCoder.of())); - this.processingTimerTimerState = + this.timestampSortedEventTimeTimerState = + (SamzaSetState>) + nonKeyedStateInternalsFactory + .stateInternalsForKey(null) + .state( + StateNamespaces.global(), + StateTags.set( + timerStateId + "-ts", + new KeyedTimerData.KeyedTimerDataCoder<>(keyCoder, windowCoder))); + + this.processingTimeTimerState = (SamzaMapState, Long>) nonKeyedStateInternalsFactory .stateInternalsForKey(null) @@ -339,17 +443,20 @@ private class SamzaTimerState { new TimerKeyCoder<>(keyCoder, windowCoder), VarLongCoder.of())); - restore(); + init(); } Long get(KeyedTimerData keyedTimerData) { - final TimerKey timerKey = TimerKey.of(keyedTimerData); - switch (keyedTimerData.getTimerData().getDomain()) { + return get(TimerKey.of(keyedTimerData), keyedTimerData.getTimerData().getDomain()); + } + + Long get(TimerKey key, TimeDomain domain) { + switch (domain) { case EVENT_TIME: - return eventTimerTimerState.get(timerKey).read(); + return eventTimeTimerState.get(key).read(); case PROCESSING_TIME: - return processingTimerTimerState.get(timerKey).read(); + return processingTimeTimerState.get(key).read(); default: throw new UnsupportedOperationException( @@ -361,18 +468,29 @@ void persist(KeyedTimerData keyedTimerData) { final TimerKey timerKey = TimerKey.of(keyedTimerData); switch (keyedTimerData.getTimerData().getDomain()) { case EVENT_TIME: - eventTimerTimerState.put( + final Long timestamp = eventTimeTimerState.get(timerKey).read(); + + if (timestamp != null) { + final KeyedTimerData keyedTimerDataInStore = + TimerKey.toKeyedTimerData(timerKey, timestamp, TimeDomain.EVENT_TIME, keyCoder); + timestampSortedEventTimeTimerState.remove(keyedTimerDataInStore); + } + eventTimeTimerState.put( timerKey, keyedTimerData.getTimerData().getTimestamp().getMillis()); + + timestampSortedEventTimeTimerState.add(keyedTimerData); + break; case PROCESSING_TIME: - processingTimerTimerState.put( + processingTimeTimerState.put( timerKey, keyedTimerData.getTimerData().getTimestamp().getMillis()); break; default: throw new UnsupportedOperationException( - String.format("%s currently only supports event time", SamzaRunner.class)); + String.format( + "%s currently only supports event time or processing time", SamzaRunner.class)); } } @@ -380,38 +498,51 @@ void deletePersisted(KeyedTimerData keyedTimerData) { final TimerKey timerKey = TimerKey.of(keyedTimerData); switch (keyedTimerData.getTimerData().getDomain()) { case EVENT_TIME: - eventTimerTimerState.remove(timerKey); + eventTimeTimerState.remove(timerKey); + timestampSortedEventTimeTimerState.remove(keyedTimerData); break; case PROCESSING_TIME: - processingTimerTimerState.remove(timerKey); + processingTimeTimerState.remove(timerKey); break; default: throw new UnsupportedOperationException( - 
String.format("%s currently only supports event time", SamzaRunner.class)); + String.format( + "%s currently only supports event time or processing time", SamzaRunner.class)); } } - private void loadEventTimeTimers() { - final Iterator, Long>> iter = - eventTimerTimerState.readIterator().read(); - // since the iterator will reach to the end, it will be closed automatically - while (iter.hasNext()) { - final Map.Entry, Long> entry = iter.next(); - final KeyedTimerData keyedTimerData = - TimerKey.toKeyedTimerData( - entry.getKey(), entry.getValue(), TimeDomain.EVENT_TIME, keyCoder); + /** + * Reload event time timers from state to memory buffer. Buffer size is bound by + * maxEventTimerBufferSize + */ + private void reloadEventTimeTimers() { + final Iterator> iter = + timestampSortedEventTimeTimerState.readIterator().read(); - eventTimeTimers.add(keyedTimerData); + while (iter.hasNext() && eventTimeBuffer.size() < maxEventTimerBufferSize) { + final KeyedTimerData keyedTimerData = iter.next(); + eventTimeBuffer.add(keyedTimerData); + maxEventTimeInBuffer = keyedTimerData.getTimerData().getTimestamp().getMillis(); } - LOG.info("Loaded {} event time timers in memory", eventTimeTimers.size()); + timestampSortedEventTimeTimerState.closeIterators(); + LOG.info("Loaded {} event time timers in memory", eventTimeBuffer.size()); + + if (eventTimeBuffer.size() < maxEventTimerBufferSize) { + LOG.debug( + "Event time timers in State is empty, filled {} timers out of {} buffer capacity", + eventTimeBuffer.size(), + maxEventTimeInBuffer); + // Reset the flag variable to indicate there are no more KeyedTimerData in State + maxEventTimeInBuffer = Long.MAX_VALUE; + } } private void loadProcessingTimeTimers() { final Iterator, Long>> iter = - processingTimerTimerState.readIterator().read(); + processingTimeTimerState.readIterator().read(); // since the iterator will reach to the end, it will be closed automatically int count = 0; while (iter.hasNext()) { @@ -424,109 +555,113 @@ private void loadProcessingTimeTimers() { keyedTimerData, keyedTimerData.getTimerData().getTimestamp().getMillis()); ++count; } + processingTimeTimerState.closeIterators(); LOG.info("Loaded {} processing time timers in memory", count); } - private void restore() { - loadEventTimeTimers(); + /** + * Restore timer state from RocksDB. This is needed for migration of existing jobs. Give events + * in eventTimeTimerState, construct timestampSortedEventTimeTimerState preparing for memory + * reloading. 
TO-DO: processing time timers are still loaded into memory in one shot; will apply + * the same optimization mechanism as event time timer + */ + private void init() { + final Iterator, Long>> eventTimersIter = + eventTimeTimerState.readIterator().read(); + // use hasNext to check empty, because this is relatively cheap compared with Iterators.size() + if (eventTimersIter.hasNext()) { + final Iterator sortedEventTimerIter = + timestampSortedEventTimeTimerState.readIterator().read(); + + if (!sortedEventTimerIter.hasNext()) { + // inline the migration code + while (eventTimersIter.hasNext()) { + final Map.Entry, Long> entry = eventTimersIter.next(); + final KeyedTimerData keyedTimerData = + TimerKey.toKeyedTimerData( + entry.getKey(), entry.getValue(), TimeDomain.EVENT_TIME, keyCoder); + timestampSortedEventTimeTimerState.add(keyedTimerData); + } + } + timestampSortedEventTimeTimerState.closeIterators(); + } + eventTimeTimerState.closeIterators(); + + reloadEventTimeTimers(); loadProcessingTimeTimers(); } } - private static class TimerKey { - private final K key; - private final StateNamespace stateNamespace; - private final String timerId; + @AutoValue + abstract static class TimerKey { + abstract @Nullable K getKey(); + + abstract StateNamespace getStateNamespace(); + + abstract String getTimerId(); + + abstract String getTimerFamilyId(); + + static Builder builder() { + return new AutoValue_SamzaTimerInternalsFactory_TimerKey.Builder<>(); + } static TimerKey of(KeyedTimerData keyedTimerData) { final TimerInternals.TimerData timerData = keyedTimerData.getTimerData(); - return new TimerKey<>( - keyedTimerData.getKey(), timerData.getNamespace(), timerData.getTimerId()); + return of( + keyedTimerData.getKey(), + timerData.getNamespace(), + timerData.getTimerId(), + timerData.getTimerFamilyId()); + } + + static TimerKey of( + K key, StateNamespace namespace, String timerId, String timerFamilyId) { + return TimerKey.builder() + .setKey(key) + .setStateNamespace(namespace) + .setTimerId(timerId) + .setTimerFamilyId(timerFamilyId) + .build(); } static KeyedTimerData toKeyedTimerData( TimerKey timerKey, long timestamp, TimeDomain domain, Coder keyCoder) { byte[] keyBytes = null; - if (keyCoder != null && timerKey.key != null) { + if (keyCoder != null && timerKey.getKey() != null) { final ByteArrayOutputStream baos = new ByteArrayOutputStream(); try { - keyCoder.encode(timerKey.key, baos); + keyCoder.encode(timerKey.getKey(), baos); } catch (IOException e) { - throw new RuntimeException("Could not encode key: " + timerKey.key, e); + throw new RuntimeException("Could not encode key: " + timerKey.getKey(), e); } keyBytes = baos.toByteArray(); } - return new KeyedTimerData( + return new KeyedTimerData<>( keyBytes, - timerKey.key, + timerKey.getKey(), TimerInternals.TimerData.of( - timerKey.timerId, - timerKey.stateNamespace, + timerKey.getTimerId(), + timerKey.getTimerFamilyId(), + timerKey.getStateNamespace(), new Instant(timestamp), new Instant(timestamp), domain)); } - private TimerKey(K key, StateNamespace stateNamespace, String timerId) { - this.key = key; - this.stateNamespace = stateNamespace; - this.timerId = timerId; - } - - public K getKey() { - return key; - } - - public StateNamespace getStateNamespace() { - return stateNamespace; - } + @AutoValue.Builder + abstract static class Builder { + abstract Builder setKey(K key); - public String getTimerId() { - return timerId; - } + abstract Builder setStateNamespace(StateNamespace stateNamespace); - @Override - public boolean equals(@Nullable 
Object o) { - if (this == o) { - return true; - } - if (o == null || getClass() != o.getClass()) { - return false; - } + abstract Builder setTimerId(String timerId); - TimerKey timerKey = (TimerKey) o; + abstract Builder setTimerFamilyId(String timerFamilyId); - if (key != null ? !key.equals(timerKey.key) : timerKey.key != null) { - return false; - } - if (!stateNamespace.equals(timerKey.stateNamespace)) { - return false; - } - - return timerId.equals(timerKey.timerId); - } - - @Override - public int hashCode() { - int result = key != null ? key.hashCode() : 0; - result = 31 * result + stateNamespace.hashCode(); - result = 31 * result + timerId.hashCode(); - return result; - } - - @Override - public String toString() { - return "TimerKey{" - + "key=" - + key - + ", stateNamespace=" - + stateNamespace - + ", timerId='" - + timerId - + '\'' - + '}'; + abstract TimerKey build(); } } @@ -547,12 +682,14 @@ public void encode(TimerKey value, OutputStream outStream) throws CoderException, IOException { // encode the timestamp first - STRING_CODER.encode(value.timerId, outStream); - STRING_CODER.encode(value.stateNamespace.stringKey(), outStream); + STRING_CODER.encode(value.getTimerId(), outStream); + STRING_CODER.encode(value.getStateNamespace().stringKey(), outStream); if (keyCoder != null) { - keyCoder.encode(value.key, outStream); + keyCoder.encode(value.getKey(), outStream); } + + STRING_CODER.encode(value.getTimerFamilyId(), outStream); } @Override @@ -568,7 +705,16 @@ public TimerKey decode(InputStream inStream) throws CoderException, IOExcepti key = keyCoder.decode(inStream); } - return new TimerKey<>(key, namespace, timerId); + // check if the stream has more available bytes. This is to ensure backward compatibility with + // old rocksdb state which does not encode timer family data + final String timerFamilyId = inStream.available() > 0 ? 
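/*
 * A self-contained sketch of the compatibility trick the TimerKeyCoder change relies on: the new
 * timerFamilyId field is written strictly at the end of the encoding, and on decode it is read
 * only if bytes remain, so records written by the old coder still decode (with an empty family
 * id). The stand-in below uses DataOutputStream/DataInputStream rather than the Beam coders;
 * note that available() is only a reliable signal when decoding from an in-memory byte array,
 * as is the case for values read back from the store.
 */
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class BackwardCompatibleEncodingSketch {
  static byte[] encodeV2(String timerId, String timerFamilyId) throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(baos)) {
      out.writeUTF(timerId);       // field the old format already wrote
      out.writeUTF(timerFamilyId); // new field goes strictly at the end
    }
    return baos.toByteArray();
  }

  static String[] decode(byte[] bytes) throws IOException {
    ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
    DataInputStream in = new DataInputStream(bais);
    String timerId = in.readUTF();
    // Old records simply stop here; default the missing field instead of failing.
    String timerFamilyId = bais.available() > 0 ? in.readUTF() : "";
    return new String[] {timerId, timerFamilyId};
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream oldFormat = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(oldFormat)) {
      out.writeUTF("timer-1"); // simulate a record written before the upgrade
    }
    System.out.println(decode(oldFormat.toByteArray())[1].isEmpty()); // true: defaulted to ""
    System.out.println(decode(encodeV2("timer-1", "family-A"))[1]);   // family-A
  }
}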
STRING_CODER.decode(inStream) : ""; + + return TimerKey.builder() + .setTimerId(timerId) + .setStateNamespace(namespace) + .setKey(key) + .setTimerFamilyId(timerFamilyId) + .build(); } @Override diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/SplittableParDoProcessKeyedElementsOp.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/SplittableParDoProcessKeyedElementsOp.java index a09693f91f21..e2663a714c6f 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/SplittableParDoProcessKeyedElementsOp.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/SplittableParDoProcessKeyedElementsOp.java @@ -115,8 +115,8 @@ public void open( .as(SamzaPipelineOptions.class); final SamzaStoreStateInternals.Factory nonKeyedStateInternalsFactory = - SamzaStoreStateInternals.createStateInternalFactory( - transformId, null, context.getTaskContext(), pipelineOptions, null); + SamzaStoreStateInternals.createNonKeyedStateInternalsFactory( + transformId, context.getTaskContext(), pipelineOptions); final DoFnRunners.OutputManager outputManager = outputManagerFactory.create(emitter); @@ -145,7 +145,7 @@ public void open( SplittableParDoViaKeyedWorkItems.ProcessFn< InputT, OutputT, RestrictionT, PositionT, WatermarkEstimatorStateT> processFn = processElements.newProcessFn(processElements.getFn()); - DoFnInvokers.tryInvokeSetupFor(processFn); + DoFnInvokers.tryInvokeSetupFor(processFn, pipelineOptions); processFn.setStateInternalsFactory(stateInternalsFactory); processFn.setTimerInternalsFactory(timerInternalsFactory); processFn.setProcessElementInvoker( diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/state/SamzaMapState.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/state/SamzaMapState.java index 2afba06e7b20..a8741fbc8625 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/state/SamzaMapState.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/state/SamzaMapState.java @@ -34,4 +34,7 @@ public interface SamzaMapState extends MapState { * @return a {@link ReadableState} of an iterator */ ReadableState>> readIterator(); + + /** Closes the iterator returned from {@link SamzaMapState#readIterator()}. */ + void closeIterators(); } diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/state/SamzaSetState.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/state/SamzaSetState.java index a6785c7f3158..8af82fcd5b75 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/state/SamzaSetState.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/state/SamzaSetState.java @@ -33,4 +33,7 @@ public interface SamzaSetState extends SetState { * @return a {@link ReadableState} of an iterator */ ReadableState> readIterator(); + + /** Closes the iterator returned from {@link SamzaSetState#readIterator()}. 
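/*
 * A hedged usage sketch for the closeIterators() method added to SamzaMapState/SamzaSetState:
 * once reads no longer drain the iterator to the end (as with the bounded timer reload above),
 * the underlying store iterator has to be released explicitly. The IteratorBackedState interface
 * below is a simplified stand-in (the real readIterator() returns a ReadableState wrapping the
 * iterator); the point is only the pairing of readIterator() with closeIterators() in a finally
 * block.
 */
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

interface IteratorBackedState<T> {
  Iterator<T> readIterator();

  void closeIterators();
}

class CloseIteratorsSketch {
  /** Read at most {@code limit} entries and always release the underlying store iterator. */
  static <T> int readAtMost(IteratorBackedState<T> state, int limit) {
    int read = 0;
    try {
      Iterator<T> it = state.readIterator();
      while (it.hasNext() && read < limit) {
        it.next();
        read++;
      }
    } finally {
      state.closeIterators();
    }
    return read;
  }

  public static void main(String[] args) {
    final List<String> data = Arrays.asList("a", "b", "c");
    IteratorBackedState<String> state =
        new IteratorBackedState<String>() {
          @Override
          public Iterator<String> readIterator() {
            return data.iterator();
          }

          @Override
          public void closeIterators() {
            System.out.println("iterator released");
          }
        };
    System.out.println(readAtMost(state, 2)); // prints "iterator released", then 2
  }
}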
*/ + void closeIterators(); } diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/ConfigBuilder.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/ConfigBuilder.java index b8cfa2782f23..08bef1675662 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/ConfigBuilder.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/ConfigBuilder.java @@ -22,9 +22,10 @@ import static org.apache.samza.config.JobConfig.JOB_NAME; import static org.apache.samza.config.TaskConfig.COMMIT_MS; import static org.apache.samza.config.TaskConfig.GROUPER_FACTORY; +import static org.apache.samza.config.TaskConfig.MAX_CONCURRENCY; import java.io.File; -import java.net.URI; +import java.util.Collections; import java.util.HashMap; import java.util.Map; import java.util.UUID; @@ -34,21 +35,21 @@ import org.apache.beam.runners.samza.SamzaExecutionEnvironment; import org.apache.beam.runners.samza.SamzaPipelineOptions; import org.apache.beam.runners.samza.container.BeamContainerRunner; +import org.apache.beam.runners.samza.container.BeamJobCoordinatorRunner; import org.apache.beam.runners.samza.runtime.SamzaStoreStateInternals; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.samza.config.ApplicationConfig; import org.apache.samza.config.Config; -import org.apache.samza.config.ConfigFactory; +import org.apache.samza.config.ConfigLoaderFactory; import org.apache.samza.config.JobCoordinatorConfig; import org.apache.samza.config.MapConfig; import org.apache.samza.config.ZkConfig; -import org.apache.samza.config.factories.PropertiesConfigFactory; +import org.apache.samza.config.loaders.PropertiesConfigLoaderFactory; import org.apache.samza.container.grouper.task.SingleContainerGrouperFactory; import org.apache.samza.job.yarn.YarnJobFactory; import org.apache.samza.runtime.LocalApplicationRunner; import org.apache.samza.runtime.RemoteApplicationRunner; -import org.apache.samza.serializers.ByteSerdeFactory; import org.apache.samza.standalone.PassthroughJobCoordinatorFactory; import org.apache.samza.zk.ZkJobCoordinatorFactory; import org.slf4j.Logger; @@ -61,6 +62,7 @@ public class ConfigBuilder { private static final Logger LOG = LoggerFactory.getLogger(ConfigBuilder.class); + private static final String BEAM_STORE_FACTORY = "stores.beamStore.factory"; private static final String APP_RUNNER_CLASS = "app.runner.class"; private static final String YARN_PACKAGE_PATH = "yarn.package.path"; private static final String JOB_FACTORY_CLASS = "job.factory.class"; @@ -80,10 +82,11 @@ public void putAll(Map properties) { config.putAll(properties); } + /** @return built configuration */ public Config build() { try { // apply framework configs - config.putAll(createSystemConfig(options)); + config.putAll(createSystemConfig(options, config)); // apply user configs config.putAll(createUserConfig(options)); @@ -92,7 +95,10 @@ public Config build() { config.put(ApplicationConfig.APP_ID, options.getJobInstance()); config.put(JOB_NAME, options.getJobName()); config.put(JOB_ID, options.getJobInstance()); + config.put(MAX_CONCURRENCY, String.valueOf(options.getMaxBundleSize())); + // remove config overrides before serialization (LISAMZA-15259) + options.setConfigOverride(new HashMap<>()); config.put( "beamPipelineOptions", Base64Serializer.serializeUnchecked(new 
SerializablePipelineOptions(options))); @@ -116,21 +122,21 @@ private static Map createUserConfig(SamzaPipelineOptions options if (StringUtils.isNoneEmpty(configFilePath)) { LOG.info("configFilePath: " + configFilePath); - final File configFile = new File(configFilePath); - final URI configUri = configFile.toURI(); - final ConfigFactory configFactory = - options.getConfigFactory().getDeclaredConstructor().newInstance(); + final Config properties = new MapConfig(Collections.singletonMap("path", configFilePath)); + final ConfigLoaderFactory configLoaderFactory = + options.getConfigLoaderFactory().getDeclaredConstructor().newInstance(); - LOG.info("configFactory: " + configFactory.getClass().getName()); + LOG.info("configLoaderFactory: " + configLoaderFactory.getClass().getName()); // Config file must exist for default properties config // TODO: add check to all non-empty files once we don't need to // pass the command-line args through the containers - if (configFactory instanceof PropertiesConfigFactory) { - checkArgument(configFile.exists(), "Config file %s does not exist", configFilePath); + if (configLoaderFactory instanceof PropertiesConfigLoaderFactory) { + checkArgument( + new File(configFilePath).exists(), "Config file %s does not exist", configFilePath); } - config.putAll(configFactory.getConfig(configUri)); + config.putAll(configLoaderFactory.getLoader(properties).getConfig()); } // Apply override on top if (options.getConfigOverride() != null) { @@ -181,6 +187,7 @@ private static void validateYarnRun(Map config) { final String appRunner = config.get(APP_RUNNER_CLASS); checkArgument( appRunner == null + || BeamJobCoordinatorRunner.class.getName().equals(appRunner) || RemoteApplicationRunner.class.getName().equals(appRunner) || BeamContainerRunner.class.getName().equals(appRunner), "Config %s must be set to %s for %s Deployment", @@ -208,7 +215,7 @@ public static Map localRunConfig() { .put( // TODO: remove after SAMZA-1531 is resolved ApplicationConfig.APP_RUN_ID, - String.valueOf(System.currentTimeMillis()) + System.currentTimeMillis() + "-" // use the most significant bits in UUID (8 digits) to avoid collision + UUID.randomUUID().toString().substring(0, 8)) @@ -231,23 +238,26 @@ public static Map standAloneRunConfig() { .build(); } - private static Map createSystemConfig(SamzaPipelineOptions options) { - ImmutableMap.Builder configBuilder = + private static Map createSystemConfig( + SamzaPipelineOptions options, Map config) { + final ImmutableMap.Builder configBuilder = ImmutableMap.builder() - .put( - "stores.beamStore.factory", - "org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory") .put("stores.beamStore.key.serde", "byteArraySerde") - .put("stores.beamStore.msg.serde", "byteSerde") - .put("serializers.registry.byteSerde.class", ByteSerdeFactory.class.getName()) + .put("stores.beamStore.msg.serde", "stateValueSerde") + .put( + "serializers.registry.stateValueSerde.class", + SamzaStoreStateInternals.StateValueSerdeFactory.class.getName()) .put( "serializers.registry.byteArraySerde.class", SamzaStoreStateInternals.ByteArraySerdeFactory.class.getName()); - if (options.getStateDurable()) { - LOG.info("stateDurable is enabled"); - configBuilder.put("stores.beamStore.changelog", getChangelogTopic(options, "beamStore")); - configBuilder.put("job.host-affinity.enabled", "true"); + // if config does not contain "stores.beamStore.factory" at this moment, + // then it is a stateless job. 
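/*
 * The user-config hunk above switches from the removed ConfigFactory/URI API to Samza's
 * ConfigLoaderFactory, handing the properties file path over as a tiny MapConfig. A condensed
 * sketch of that call sequence, assuming the Samza config classes imported in this patch are on
 * the classpath:
 */
import java.util.Collections;
import org.apache.samza.config.Config;
import org.apache.samza.config.MapConfig;
import org.apache.samza.config.loaders.PropertiesConfigLoaderFactory;

class UserConfigLoadingSketch {
  static Config loadUserConfig(String configFilePath) {
    // The loader is pointed at the file through a "path" property instead of a URI.
    Config loaderProps = new MapConfig(Collections.singletonMap("path", configFilePath));
    return new PropertiesConfigLoaderFactory().getLoader(loaderProps).getConfig();
  }
}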
+ if (!config.containsKey(BEAM_STORE_FACTORY)) { + options.setStateDurable(false); + configBuilder.put( + BEAM_STORE_FACTORY, + "org.apache.samza.storage.kv.inmemory.InMemoryKeyValueStorageEngineFactory"); } LOG.info("Execution environment is " + options.getSamzaExecutionEnvironment()); @@ -269,6 +279,23 @@ private static Map createSystemConfig(SamzaPipelineOptions optio return configBuilder.build(); } + static Map createRocksDBStoreConfig(SamzaPipelineOptions options) { + final ImmutableMap.Builder configBuilder = + ImmutableMap.builder() + .put( + BEAM_STORE_FACTORY, + "org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory") + .put("stores.beamStore.rocksdb.compression", "lz4"); + + if (options.getStateDurable()) { + LOG.info("stateDurable is enabled"); + configBuilder.put("stores.beamStore.changelog", getChangelogTopic(options, "beamStore")); + configBuilder.put("job.host-affinity.enabled", "true"); + } + + return configBuilder.build(); + } + private static void validateConfigs(SamzaPipelineOptions options, Map config) { // validate execution environment diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/ConfigContext.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/ConfigContext.java index d77bc1e727d4..ea5f6d9e5b4b 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/ConfigContext.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/ConfigContext.java @@ -17,7 +17,9 @@ */ package org.apache.beam.runners.samza.translation; +import java.util.HashSet; import java.util.Map; +import java.util.Set; import org.apache.beam.runners.samza.SamzaPipelineOptions; import org.apache.beam.sdk.runners.AppliedPTransform; import org.apache.beam.sdk.runners.TransformHierarchy; @@ -33,10 +35,12 @@ public class ConfigContext { private final Map idMap; private AppliedPTransform currentTransform; private final SamzaPipelineOptions options; + private final Set stateIds; public ConfigContext(Map idMap, SamzaPipelineOptions options) { this.idMap = idMap; this.options = options; + this.stateIds = new HashSet<>(); } public void setCurrentTransform(AppliedPTransform currentTransform) { @@ -60,6 +64,10 @@ public SamzaPipelineOptions getPipelineOptions() { return this.options; } + public boolean addStateId(String stateId) { + return stateIds.add(stateId); + } + private String getIdForPValue(PValue pvalue) { final String id = idMap.get(pvalue); if (id == null) { diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/FlattenPCollectionsTranslator.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/FlattenPCollectionsTranslator.java index 50e62a2f0e28..ec7af07979fa 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/FlattenPCollectionsTranslator.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/FlattenPCollectionsTranslator.java @@ -61,7 +61,7 @@ private static void doTranslate( // for some of the validateRunner tests only final MessageStream> noOpStream = ctx.getDummyStream() - .flatMap(OpAdapter.adapt((Op) (inputElement, emitter) -> {})); + .flatMapAsync(OpAdapter.adapt((Op) (inputElement, emitter) -> {})); ctx.registerMessageStream(output, noOpStream); return; } diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/GroupByKeyTranslator.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/GroupByKeyTranslator.java index 
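/*
 * A sketch of the store configuration shape produced above: jobs with no stateful transforms
 * fall back to the in-memory key-value store, while stateful jobs get RocksDB with lz4
 * compression, plus a changelog and host affinity when durable state is requested. The key names
 * are taken from this hunk; the changelog topic argument is a placeholder for whatever
 * getChangelogTopic(options, "beamStore") would produce.
 */
import java.util.HashMap;
import java.util.Map;

class BeamStoreConfigSketch {
  static Map<String, String> beamStoreConfig(
      boolean stateful, boolean stateDurable, String changelogTopic) {
    Map<String, String> config = new HashMap<>();
    if (!stateful) {
      config.put(
          "stores.beamStore.factory",
          "org.apache.samza.storage.kv.inmemory.InMemoryKeyValueStorageEngineFactory");
      return config;
    }
    config.put(
        "stores.beamStore.factory",
        "org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory");
    config.put("stores.beamStore.rocksdb.compression", "lz4");
    if (stateDurable) {
      config.put("stores.beamStore.changelog", changelogTopic);
      config.put("job.host-affinity.enabled", "true");
    }
    return config;
  }

  public static void main(String[] args) {
    System.out.println(beamStoreConfig(true, true, "my-job-beamStore-changelog"));
  }
}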
6d2b2b6b1900..d5ded165dc50 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/GroupByKeyTranslator.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/GroupByKeyTranslator.java @@ -19,12 +19,14 @@ import static org.apache.beam.runners.samza.util.SamzaPipelineTranslatorUtils.escape; +import java.util.Map; import org.apache.beam.model.pipeline.v1.RunnerApi; import org.apache.beam.runners.core.KeyedWorkItem; import org.apache.beam.runners.core.KeyedWorkItemCoder; import org.apache.beam.runners.core.SystemReduceFn; import org.apache.beam.runners.core.construction.graph.PipelineNode; import org.apache.beam.runners.core.construction.graph.QueryablePipeline; +import org.apache.beam.runners.samza.SamzaPipelineOptions; import org.apache.beam.runners.samza.runtime.DoFnOp; import org.apache.beam.runners.samza.runtime.GroupByKeyOp; import org.apache.beam.runners.samza.runtime.KvToKeyedWorkItemOp; @@ -33,6 +35,7 @@ import org.apache.beam.runners.samza.transforms.GroupWithoutRepartition; import org.apache.beam.runners.samza.util.SamzaCoders; import org.apache.beam.runners.samza.util.SamzaPipelineTranslatorUtils; +import org.apache.beam.runners.samza.util.WindowUtils; import org.apache.beam.sdk.Pipeline; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.KvCoder; @@ -56,7 +59,9 @@ @SuppressWarnings({"keyfor", "nullness"}) // TODO(https://issues.apache.org/jira/browse/BEAM-10402) class GroupByKeyTranslator implements TransformTranslator< - PTransform>, PCollection>>> { + PTransform>, PCollection>>>, + TransformConfigGenerator< + PTransform>, PCollection>>> { @Override public void translate( @@ -108,51 +113,72 @@ public void translatePortable( PipelineNode.PTransformNode transform, QueryablePipeline pipeline, PortableTranslationContext ctx) { - doTranslatePortable(transform, pipeline, ctx); + final String inputId = ctx.getInputId(transform); + final RunnerApi.PCollection input = pipeline.getComponents().getPcollectionsOrThrow(inputId); + final MessageStream>> inputStream = ctx.getMessageStreamById(inputId); + final WindowingStrategy windowingStrategy = + WindowUtils.getWindowStrategy(inputId, pipeline.getComponents()); + final WindowedValue.WindowedValueCoder> windowedInputCoder = + WindowUtils.instantiateWindowedCoder(inputId, pipeline.getComponents()); + final TupleTag> outputTag = + new TupleTag<>(Iterables.getOnlyElement(transform.getTransform().getOutputsMap().keySet())); + + final MessageStream>> outputStream = + doTranslatePortable( + input, inputStream, windowingStrategy, windowedInputCoder, outputTag, ctx); + + ctx.registerMessageStream(ctx.getOutputId(transform), outputStream); } - private static void doTranslatePortable( - PipelineNode.PTransformNode transform, - QueryablePipeline pipeline, + @Override + public Map createConfig( + PTransform>, PCollection>> transform, + TransformHierarchy.Node node, + ConfigContext ctx) { + return ConfigBuilder.createRocksDBStoreConfig(ctx.getPipelineOptions()); + } + + @Override + public Map createPortableConfig( + PipelineNode.PTransformNode transform, SamzaPipelineOptions options) { + return ConfigBuilder.createRocksDBStoreConfig(options); + } + + /** + * The method is used to translate both portable GBK transform as well as grouping side inputs + * into Samza. 
+ */ + static MessageStream>> doTranslatePortable( + RunnerApi.PCollection input, + MessageStream>> inputStream, + WindowingStrategy windowingStrategy, + WindowedValue.WindowedValueCoder> windowedInputCoder, + TupleTag> outputTag, PortableTranslationContext ctx) { - final MessageStream>> inputStream = - ctx.getOneInputMessageStream(transform); final boolean needRepartition = ctx.getSamzaPipelineOptions().getMaxSourceParallelism() > 1; - final WindowingStrategy windowingStrategy = - ctx.getPortableWindowStrategy(transform, pipeline); final Coder windowCoder = windowingStrategy.getWindowFn().windowCoder(); - - final String inputId = ctx.getInputId(transform); - final WindowedValue.WindowedValueCoder> windowedInputCoder = - ctx.instantiateCoder(inputId, pipeline.getComponents()); final KvCoder kvInputCoder = (KvCoder) windowedInputCoder.getValueCoder(); final Coder>> elementCoder = WindowedValue.FullWindowedValueCoder.of(kvInputCoder, windowCoder); - final TupleTag> outputTag = - new TupleTag<>(Iterables.getOnlyElement(transform.getTransform().getOutputsMap().keySet())); - @SuppressWarnings("unchecked") final SystemReduceFn reduceFn = (SystemReduceFn) SystemReduceFn.buffering(kvInputCoder.getValueCoder()); - final RunnerApi.PCollection input = pipeline.getComponents().getPcollectionsOrThrow(inputId); final PCollection.IsBounded isBounded = SamzaPipelineTranslatorUtils.isBounded(input); - final MessageStream>> outputStream = - doTranslateGBK( - inputStream, - needRepartition, - reduceFn, - windowingStrategy, - kvInputCoder, - elementCoder, - ctx.getTransformFullName(), - ctx.getTransformId(), - outputTag, - isBounded); - ctx.registerMessageStream(ctx.getOutputId(transform), outputStream); + return doTranslateGBK( + inputStream, + needRepartition, + reduceFn, + windowingStrategy, + kvInputCoder, + elementCoder, + ctx.getTransformFullName(), + ctx.getTransformId(), + outputTag, + isBounded); } private static MessageStream>> doTranslateGBK( @@ -193,8 +219,8 @@ private static MessageStream>> doT final MessageStream>> outputStream = partitionedInputStream - .flatMap(OpAdapter.adapt(new KvToKeyedWorkItemOp<>())) - .flatMap( + .flatMapAsync(OpAdapter.adapt(new KvToKeyedWorkItemOp<>())) + .flatMapAsync( OpAdapter.adapt( new GroupByKeyOp<>( outputTag, diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/ImpulseTranslator.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/ImpulseTranslator.java index ba909225703f..25b39abc6d68 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/ImpulseTranslator.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/ImpulseTranslator.java @@ -20,6 +20,7 @@ import org.apache.beam.runners.core.construction.graph.PipelineNode; import org.apache.beam.runners.core.construction.graph.QueryablePipeline; import org.apache.beam.runners.samza.runtime.OpMessage; +import org.apache.beam.runners.samza.util.SamzaPipelineTranslatorUtils; import org.apache.beam.sdk.runners.TransformHierarchy.Node; import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.values.PBegin; @@ -65,13 +66,14 @@ public void translatePortable( PortableTranslationContext ctx) { final String outputId = ctx.getOutputId(transform); + final String escapedOutputId = SamzaPipelineTranslatorUtils.escape(outputId); final GenericSystemDescriptor systemDescriptor = - new GenericSystemDescriptor(outputId, SamzaImpulseSystemFactory.class.getName()); + new 
GenericSystemDescriptor(escapedOutputId, SamzaImpulseSystemFactory.class.getName()); // The KvCoder is needed here for Samza not to crop the key. final Serde>> kvSerde = KVSerde.of(new NoOpSerde(), new NoOpSerde<>()); final GenericInputDescriptor>> inputDescriptor = - systemDescriptor.getInputDescriptor(outputId, kvSerde); + systemDescriptor.getInputDescriptor(escapedOutputId, kvSerde); ctx.registerInputMessageStream(outputId, inputDescriptor); } diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/PViewToIdMapper.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/PViewToIdMapper.java index 017be6fdb718..693e624e3df1 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/PViewToIdMapper.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/PViewToIdMapper.java @@ -22,6 +22,7 @@ import java.util.Map; import org.apache.beam.sdk.Pipeline; import org.apache.beam.sdk.runners.TransformHierarchy; +import org.apache.beam.sdk.util.NameUtils; import org.apache.beam.sdk.values.PCollectionView; import org.apache.beam.sdk.values.PValue; @@ -47,7 +48,7 @@ private PViewToIdMapper() {} @Override public void visitValue(PValue value, TransformHierarchy.Node producer) { - final String valueDesc = value.toString().replaceFirst(".*:([a-zA-Z#0-9]+).*", "$1"); + final String valueDesc = pValueToString(value).replaceFirst(".*:([a-zA-Z#0-9]+).*", "$1"); final String samzaSafeValueDesc = valueDesc.replaceAll("[^A-Za-z0-9_-]", "_"); @@ -65,4 +66,18 @@ public void visitPrimitiveTransform(TransformHierarchy.Node node) { public Map getIdMap() { return Collections.unmodifiableMap(idMap); } + + /** + * This method is created to replace the {@link org.apache.beam.sdk.values.PValueBase#toString()} + * with the old implementation that doesn't contain the hashcode. 
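/*
 * Both the Impulse change (escaping the output id before it names a Samza system/stream) and the
 * PViewToIdMapper change rely on sanitizing Beam-generated ids, which routinely contain '/', '('
 * and '.'. The one-liner below uses the same character class as the PViewToIdMapper code in this
 * patch; the exact rules inside SamzaPipelineTranslatorUtils.escape may differ.
 */
class IdEscapingSketch {
  static String samzaSafe(String id) {
    return id.replaceAll("[^A-Za-z0-9_-]", "_");
  }

  public static void main(String[] args) {
    System.out.println(samzaSafe("MyTransform/ParMultiDo(Anonymous).output"));
    // -> MyTransform_ParMultiDo_Anonymous__output
  }
}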
+ */ + private static String pValueToString(PValue value) { + String name; + try { + name = value.getName(); + } catch (IllegalStateException e) { + name = ""; + } + return name + " [" + NameUtils.approximateSimpleName(value.getClass()) + "]"; + } } diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/ParDoBoundMultiTranslator.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/ParDoBoundMultiTranslator.java index 2369f1081264..84e2c2efec6c 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/ParDoBoundMultiTranslator.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/ParDoBoundMultiTranslator.java @@ -17,6 +17,8 @@ */ package org.apache.beam.runners.samza.translation; +import static org.apache.beam.runners.fnexecution.translation.PipelineTranslatorUtils.instantiateCoder; + import java.io.IOException; import java.util.ArrayList; import java.util.Collection; @@ -29,7 +31,9 @@ import java.util.concurrent.atomic.AtomicInteger; import java.util.stream.Collectors; import org.apache.beam.model.pipeline.v1.RunnerApi; +import org.apache.beam.model.pipeline.v1.RunnerApi.ExecutableStagePayload.SideInputId; import org.apache.beam.runners.core.construction.ParDoTranslation; +import org.apache.beam.runners.core.construction.RunnerPCollectionView; import org.apache.beam.runners.core.construction.graph.PipelineNode; import org.apache.beam.runners.core.construction.graph.QueryablePipeline; import org.apache.beam.runners.samza.SamzaPipelineOptions; @@ -40,19 +44,30 @@ import org.apache.beam.runners.samza.runtime.OpMessage; import org.apache.beam.runners.samza.runtime.SamzaDoFnInvokerRegistrar; import org.apache.beam.runners.samza.util.SamzaPipelineTranslatorUtils; +import org.apache.beam.runners.samza.util.StateUtils; +import org.apache.beam.runners.samza.util.WindowUtils; import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.IterableCoder; import org.apache.beam.sdk.coders.KvCoder; +import org.apache.beam.sdk.coders.VoidCoder; import org.apache.beam.sdk.runners.TransformHierarchy; import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.DoFnSchemaInformation; import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.ViewFn; import org.apache.beam.sdk.transforms.join.RawUnionValue; import org.apache.beam.sdk.transforms.reflect.DoFnSignature; import org.apache.beam.sdk.transforms.reflect.DoFnSignatures; +import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.util.WindowedValue; +import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionView; +import org.apache.beam.sdk.values.PCollectionViews; import org.apache.beam.sdk.values.TupleTag; +import org.apache.beam.sdk.values.TypeDescriptor; +import org.apache.beam.sdk.values.TypeDescriptors; +import org.apache.beam.sdk.values.WindowingStrategy; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterators; import org.apache.samza.operators.MessageStream; import org.apache.samza.operators.functions.FlatMapFunction; @@ -99,8 +114,10 @@ private static void doTranslate( .collect( Collectors.toMap(e -> e.getKey(), e -> ((PCollection) e.getValue()).getCoder())); - boolean isStateful = DoFnSignatures.isStateful(transform.getFn()); - final Coder keyCoder = isStateful ? 
((KvCoder) input.getCoder()).getKeyCoder() : null; + final Coder keyCoder = + StateUtils.isStateful(transform.getFn()) + ? ((KvCoder) input.getCoder()).getKeyCoder() + : null; if (DoFnSignatures.isSplittable(transform.getFn())) { throw new UnsupportedOperationException("Splittable DoFn is not currently supported"); @@ -162,6 +179,7 @@ private static void doTranslate( input.isBounded(), false, null, + null, Collections.emptyMap(), doFnSchemaInformation, sideInputMapping); @@ -176,7 +194,7 @@ private static void doTranslate( } final MessageStream> taggedOutputStream = - mergedStreams.flatMap(OpAdapter.adapt(op)); + mergedStreams.flatMapAsync(OpAdapter.adapt(op)); for (int outputIndex : tagToIndexMap.values()) { @SuppressWarnings("unchecked") @@ -186,7 +204,7 @@ private static void doTranslate( message -> message.getType() != OpMessage.Type.ELEMENT || message.getElement().getValue().getUnionTag() == outputIndex) - .flatMap(OpAdapter.adapt(new RawUnionValueToValue())); + .flatMapAsync(OpAdapter.adapt(new RawUnionValueToValue())); ctx.registerMessageStream(indexToPCollectionMap.get(outputIndex), outputStream); } @@ -218,12 +236,45 @@ private static void doTranslatePortable( } catch (IOException e) { throw new RuntimeException(e); } + String inputId = stagePayload.getInput(); final MessageStream> inputStream = ctx.getMessageStreamById(inputId); - // TODO: support side input - final List>> sideInputStreams = Collections.emptyList(); + + // Analyze side inputs + final List>>> sideInputStreams = new ArrayList<>(); + final Map> sideInputMapping = new HashMap<>(); + final Map> idToViewMapping = new HashMap<>(); + final RunnerApi.Components components = stagePayload.getComponents(); + for (SideInputId sideInputId : stagePayload.getSideInputsList()) { + final String sideInputCollectionId = + components + .getTransformsOrThrow(sideInputId.getTransformId()) + .getInputsOrThrow(sideInputId.getLocalName()); + final WindowingStrategy windowingStrategy = + WindowUtils.getWindowStrategy(sideInputCollectionId, components); + final WindowedValue.WindowedValueCoder coder = + (WindowedValue.WindowedValueCoder) instantiateCoder(sideInputCollectionId, components); + + // Create a runner-side view + final PCollectionView view = createPCollectionView(sideInputId, coder, windowingStrategy); + + // Use GBK to aggregate the side inputs and then broadcast it out + final MessageStream>> broadcastSideInput = + groupAndBroadcastSideInput( + sideInputId, + sideInputCollectionId, + components.getPcollectionsOrThrow(sideInputCollectionId), + (WindowingStrategy) windowingStrategy, + coder, + ctx); + + sideInputStreams.add(broadcastSideInput); + sideInputMapping.put(sideInputId, view); + idToViewMapping.put(getSideInputUniqueId(sideInputId), view); + } final Map, Integer> tagToIndexMap = new HashMap<>(); + final Map indexToIdMap = new HashMap<>(); final Map> idToTupleTagMap = new HashMap<>(); // first output as the main output @@ -238,41 +289,48 @@ private static void doTranslatePortable( outputName -> { TupleTag tupleTag = new TupleTag<>(outputName); tagToIndexMap.put(tupleTag, index.get()); - index.incrementAndGet(); String collectionId = outputs.get(outputName); + indexToIdMap.put(index.get(), collectionId); idToTupleTagMap.put(collectionId, tupleTag); + index.incrementAndGet(); }); WindowedValue.WindowedValueCoder windowedInputCoder = - ctx.instantiateCoder(inputId, pipeline.getComponents()); - - final DoFnSchemaInformation doFnSchemaInformation; - doFnSchemaInformation = 
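/*
 * In both the classic and portable ParDo paths above, a stateful DoFn forces the input coder to
 * be a KvCoder so the runner can partition and store state by key. A small sketch of that
 * extraction, assuming the Beam SDK coder classes are on the classpath; the isStateful flag
 * stands in for StateUtils.isStateful(...).
 */
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.KvCoder;

class KeyCoderExtractionSketch {
  static Coder<?> keyCoderOrNull(boolean isStateful, Coder<?> inputCoder) {
    if (!isStateful) {
      return null; // non-stateful DoFns need no key coder
    }
    if (!(inputCoder instanceof KvCoder)) {
      throw new IllegalArgumentException("Stateful ParDo requires a KV input coder");
    }
    return ((KvCoder<?, ?>) inputCoder).getKeyCoder();
  }
}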
ParDoTranslation.getSchemaInformation(transform.getTransform()); + WindowUtils.instantiateWindowedCoder(inputId, pipeline.getComponents()); - Map> sideInputMapping = - ParDoTranslation.getSideInputMapping(transform.getTransform()); + // TODO: support schema and side inputs for portable runner + // Note: transform.getTransform() is an ExecutableStage, not ParDo, so we need to extract + // these info from its components. + final DoFnSchemaInformation doFnSchemaInformation = null; final RunnerApi.PCollection input = pipeline.getComponents().getPcollectionsOrThrow(inputId); final PCollection.IsBounded isBounded = SamzaPipelineTranslatorUtils.isBounded(input); + final Coder keyCoder = + StateUtils.isStateful(stagePayload) + ? ((KvCoder) + ((WindowedValue.FullWindowedValueCoder) windowedInputCoder).getValueCoder()) + .getKeyCoder() + : null; final DoFnOp op = new DoFnOp<>( mainOutputTag, new NoOpDoFn<>(), - null, // key coder not in use + keyCoder, windowedInputCoder.getValueCoder(), // input coder not in use windowedInputCoder, Collections.emptyMap(), // output coders not in use - Collections.emptyList(), // sideInputs not in use until side input support + new ArrayList<>(sideInputMapping.values()), new ArrayList<>(idToTupleTagMap.values()), // used by java runner only - SamzaPipelineTranslatorUtils.getPortableWindowStrategy(transform, pipeline), - Collections.emptyMap(), // idToViewMap not in use until side input support + WindowUtils.getWindowStrategy(inputId, stagePayload.getComponents()), + idToViewMapping, new DoFnOp.MultiOutputManagerFactory(tagToIndexMap), ctx.getTransformFullName(), ctx.getTransformId(), isBounded, true, stagePayload, + ctx.getJobInfo(), idToTupleTagMap, doFnSchemaInformation, sideInputMapping); @@ -287,18 +345,19 @@ private static void doTranslatePortable( } final MessageStream> taggedOutputStream = - mergedStreams.flatMap(OpAdapter.adapt(op)); + mergedStreams.flatMapAsync(OpAdapter.adapt(op)); for (int outputIndex : tagToIndexMap.values()) { + @SuppressWarnings("unchecked") final MessageStream> outputStream = taggedOutputStream .filter( message -> message.getType() != OpMessage.Type.ELEMENT || message.getElement().getValue().getUnionTag() == outputIndex) - .flatMap(OpAdapter.adapt(new RawUnionValueToValue())); + .flatMapAsync(OpAdapter.adapt(new RawUnionValueToValue())); - ctx.registerMessageStream(ctx.getOutputId(transform), outputStream); + ctx.registerMessageStream(indexToIdMap.get(outputIndex), outputStream); } } @@ -309,15 +368,29 @@ public Map createConfig( final DoFnSignature signature = DoFnSignatures.getSignature(transform.getFn().getClass()); final SamzaPipelineOptions options = ctx.getPipelineOptions(); + // If a ParDo observes directly or indirectly with window, then this is a stateful ParDo + // in this case, we will use RocksDB as system store. + if (signature.processElement().observesWindow()) { + config.putAll(ConfigBuilder.createRocksDBStoreConfig(options)); + } + if (signature.usesState()) { // set up user state configs for (DoFnSignature.StateDeclaration state : signature.stateDeclarations().values()) { final String storeId = state.id(); + + // TODO: remove validation after we support same state id in different ParDo. + if (!ctx.addStateId(storeId)) { + throw new IllegalStateException( + "Duplicate StateId " + storeId + " found in multiple ParDo."); + } + config.put( "stores." + storeId + ".factory", "org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory"); config.put("stores." 
+ storeId + ".key.serde", "byteArraySerde"); - config.put("stores." + storeId + ".msg.serde", "byteSerde"); + config.put("stores." + storeId + ".msg.serde", "stateValueSerde"); + config.put("stores." + storeId + ".rocksdb.compression", "lz4"); if (options.getStateDurable()) { config.put( @@ -334,6 +407,118 @@ public Map createConfig( return config; } + @Override + public Map createPortableConfig( + PipelineNode.PTransformNode transform, SamzaPipelineOptions options) { + + final RunnerApi.ExecutableStagePayload stagePayload; + try { + stagePayload = + RunnerApi.ExecutableStagePayload.parseFrom( + transform.getTransform().getSpec().getPayload()); + } catch (IOException e) { + throw new RuntimeException(e); + } + + if (!StateUtils.isStateful(stagePayload)) { + return Collections.emptyMap(); + } + + final Map config = + new HashMap<>(ConfigBuilder.createRocksDBStoreConfig(options)); + for (RunnerApi.ExecutableStagePayload.UserStateId stateId : stagePayload.getUserStatesList()) { + final String storeId = stateId.getLocalName(); + + config.put( + "stores." + storeId + ".factory", + "org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory"); + config.put("stores." + storeId + ".key.serde", "byteArraySerde"); + config.put("stores." + storeId + ".msg.serde", "stateValueSerde"); + config.put("stores." + storeId + ".rocksdb.compression", "lz4"); + + if (options.getStateDurable()) { + config.put( + "stores." + storeId + ".changelog", ConfigBuilder.getChangelogTopic(options, storeId)); + } + } + + return config; + } + + @SuppressWarnings("unchecked") + private static final ViewFn>, ?> VIEW_FN = + (ViewFn) + new PCollectionViews.MultimapViewFn<>( + (PCollectionViews.TypeDescriptorSupplier>>) + () -> TypeDescriptors.iterables(new TypeDescriptor>() {}), + (PCollectionViews.TypeDescriptorSupplier) TypeDescriptors::voids); + + // This method follows the same way in Flink to create a runner-side Java + // PCollectionView to represent a portable side input. + private static PCollectionView createPCollectionView( + SideInputId sideInputId, + WindowedValue.WindowedValueCoder coder, + WindowingStrategy windowingStrategy) { + + return new RunnerPCollectionView<>( + null, + new TupleTag<>(sideInputId.getLocalName()), + VIEW_FN, + // TODO: support custom mapping fn + windowingStrategy.getWindowFn().getDefaultWindowMappingFn(), + windowingStrategy, + coder.getValueCoder()); + } + + // Group the side input globally with a null key and then broadcast it + // to all tasks. 
+ private static + MessageStream>> groupAndBroadcastSideInput( + SideInputId sideInputId, + String sideInputCollectionId, + RunnerApi.PCollection sideInputPCollection, + WindowingStrategy windowingStrategy, + WindowedValue.WindowedValueCoder coder, + PortableTranslationContext ctx) { + final MessageStream> sideInput = + ctx.getMessageStreamById(sideInputCollectionId); + final MessageStream>> keyedSideInput = + sideInput.map( + opMessage -> { + WindowedValue wv = opMessage.getElement(); + return OpMessage.ofElement(wv.withValue(KV.of(null, wv.getValue()))); + }); + final WindowedValue.WindowedValueCoder> kvCoder = + coder.withValueCoder(KvCoder.of(VoidCoder.of(), coder.getValueCoder())); + final MessageStream>>> groupedSideInput = + GroupByKeyTranslator.doTranslatePortable( + sideInputPCollection, + keyedSideInput, + windowingStrategy, + kvCoder, + new TupleTag<>("main output"), + ctx); + final MessageStream>> nonkeyGroupedSideInput = + groupedSideInput.map( + opMessage -> { + WindowedValue>> wv = opMessage.getElement(); + return OpMessage.ofElement(wv.withValue(wv.getValue().getValue())); + }); + final MessageStream>> broadcastSideInput = + SamzaPublishViewTranslator.doTranslate( + nonkeyGroupedSideInput, + coder.withValueCoder(IterableCoder.of(coder.getValueCoder())), + ctx.getTransformId(), + getSideInputUniqueId(sideInputId), + ctx.getSamzaPipelineOptions()); + + return broadcastSideInput; + } + + private static String getSideInputUniqueId(SideInputId sideInputId) { + return sideInputId.getTransformId() + "-" + sideInputId.getLocalName(); + } + static class SideInputWatermarkFn implements FlatMapFunction, OpMessage>, WatermarkFunction> { diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/PortableTranslationContext.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/PortableTranslationContext.java index cf0a874a7104..5708388eaaf3 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/PortableTranslationContext.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/PortableTranslationContext.java @@ -17,7 +17,6 @@ */ package org.apache.beam.runners.samza.translation; -import java.io.IOException; import java.util.Collection; import java.util.HashMap; import java.util.HashSet; @@ -25,18 +24,11 @@ import java.util.Map; import java.util.Set; import java.util.stream.Collectors; -import org.apache.beam.model.pipeline.v1.RunnerApi; -import org.apache.beam.runners.core.construction.RehydratedComponents; -import org.apache.beam.runners.core.construction.WindowingStrategyTranslation; import org.apache.beam.runners.core.construction.graph.PipelineNode; -import org.apache.beam.runners.core.construction.graph.QueryablePipeline; -import org.apache.beam.runners.fnexecution.wire.WireCoders; +import org.apache.beam.runners.fnexecution.provisioning.JobInfo; import org.apache.beam.runners.samza.SamzaPipelineOptions; import org.apache.beam.runners.samza.runtime.OpMessage; import org.apache.beam.runners.samza.util.HashIdGenerator; -import org.apache.beam.sdk.transforms.windowing.BoundedWindow; -import org.apache.beam.sdk.util.WindowedValue; -import org.apache.beam.sdk.values.WindowingStrategy; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.apache.samza.application.descriptors.StreamApplicationDescriptor; import org.apache.samza.operators.KV; @@ -57,8 +49,9 @@ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class 
PortableTranslationContext { - private final Map> messsageStreams = new HashMap<>(); + private final Map> messageStreams = new HashMap<>(); private final StreamApplicationDescriptor appDescriptor; + private final JobInfo jobInfo; private final SamzaPipelineOptions options; private final Set registeredInputStreams = new HashSet<>(); private final Map registeredTables = new HashMap<>(); @@ -67,7 +60,8 @@ public class PortableTranslationContext { private PipelineNode.PTransformNode currentTransform; public PortableTranslationContext( - StreamApplicationDescriptor appDescriptor, SamzaPipelineOptions options) { + StreamApplicationDescriptor appDescriptor, SamzaPipelineOptions options, JobInfo jobInfo) { + this.jobInfo = jobInfo; this.appDescriptor = appDescriptor; this.options = options; } @@ -90,7 +84,7 @@ public MessageStream> getOneInputMessageStream( @SuppressWarnings("unchecked") public MessageStream> getMessageStreamById(String id) { - return (MessageStream>) messsageStreams.get(id); + return (MessageStream>) messageStreams.get(id); } public String getInputId(PipelineNode.PTransformNode transform) { @@ -101,11 +95,15 @@ public String getOutputId(PipelineNode.PTransformNode transform) { return Iterables.getOnlyElement(transform.getTransform().getOutputsMap().values()); } + public JobInfo getJobInfo() { + return jobInfo; + } + public void registerMessageStream(String id, MessageStream> stream) { - if (messsageStreams.containsKey(id)) { + if (messageStreams.containsKey(id)) { throw new IllegalArgumentException("Stream already registered for id: " + id); } - messsageStreams.put(id, stream); + messageStreams.put(id, stream); } /** Get output stream by output descriptor. */ @@ -128,47 +126,6 @@ public void registerInputMessageStream( registeredInputStreams.add(streamId); } - public WindowedValue.WindowedValueCoder instantiateCoder( - String collectionId, RunnerApi.Components components) { - PipelineNode.PCollectionNode collectionNode = - PipelineNode.pCollection(collectionId, components.getPcollectionsOrThrow(collectionId)); - try { - return (WindowedValue.WindowedValueCoder) - WireCoders.instantiateRunnerWireCoder(collectionNode, components); - } catch (IOException e) { - throw new RuntimeException(e); - } - } - - public WindowingStrategy getPortableWindowStrategy( - PipelineNode.PTransformNode transform, QueryablePipeline pipeline) { - String inputId = Iterables.getOnlyElement(transform.getTransform().getInputsMap().values()); - RehydratedComponents rehydratedComponents = - RehydratedComponents.forComponents(pipeline.getComponents()); - - RunnerApi.WindowingStrategy windowingStrategyProto = - pipeline - .getComponents() - .getWindowingStrategiesOrThrow( - pipeline.getComponents().getPcollectionsOrThrow(inputId).getWindowingStrategyId()); - - WindowingStrategy windowingStrategy; - try { - windowingStrategy = - WindowingStrategyTranslation.fromProto(windowingStrategyProto, rehydratedComponents); - } catch (Exception e) { - throw new IllegalStateException( - String.format( - "Unable to hydrate GroupByKey windowing strategy %s.", windowingStrategyProto), - e); - } - - @SuppressWarnings("unchecked") - WindowingStrategy ret = - (WindowingStrategy) windowingStrategy; - return ret; - } - @SuppressWarnings("unchecked") public Table> getTable(TableDescriptor tableDesc) { return registeredTables.computeIfAbsent( diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/SamzaPipelineTranslator.java 
b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/SamzaPipelineTranslator.java index 8eb8746a7088..3514b4fd3195 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/SamzaPipelineTranslator.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/SamzaPipelineTranslator.java @@ -18,7 +18,6 @@ package org.apache.beam.runners.samza.translation; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; -import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; import com.google.auto.service.AutoService; import java.util.HashMap; @@ -58,9 +57,6 @@ private static Map> loadTranslators() { private SamzaPipelineTranslator() {} public static void translate(Pipeline pipeline, TranslationContext ctx) { - checkState( - ctx.getPipelineOptions().getMaxBundleSize() <= 1, - "bundling is not supported for non portable mode. Please disable bundling (by setting max bundle size to 1)."); final TransformVisitorFn translateFn = new TransformVisitorFn() { @@ -180,18 +176,19 @@ public static class SamzaTranslators implements SamzaTranslatorRegistrar { @Override public Map> getTransformTranslators() { return ImmutableMap.>builder() - .put(PTransformTranslation.READ_TRANSFORM_URN, new ReadTranslator()) - .put(PTransformTranslation.PAR_DO_TRANSFORM_URN, new ParDoBoundMultiTranslator()) - .put(PTransformTranslation.GROUP_BY_KEY_TRANSFORM_URN, new GroupByKeyTranslator()) - .put(PTransformTranslation.COMBINE_PER_KEY_TRANSFORM_URN, new GroupByKeyTranslator()) - .put(PTransformTranslation.ASSIGN_WINDOWS_TRANSFORM_URN, new WindowAssignTranslator()) - .put(PTransformTranslation.FLATTEN_TRANSFORM_URN, new FlattenPCollectionsTranslator()) - .put(SamzaPublishView.SAMZA_PUBLISH_VIEW_URN, new SamzaPublishViewTranslator()) + .put(PTransformTranslation.READ_TRANSFORM_URN, new ReadTranslator<>()) + .put(PTransformTranslation.PAR_DO_TRANSFORM_URN, new ParDoBoundMultiTranslator<>()) + .put(PTransformTranslation.GROUP_BY_KEY_TRANSFORM_URN, new GroupByKeyTranslator<>()) + .put(PTransformTranslation.COMBINE_PER_KEY_TRANSFORM_URN, new GroupByKeyTranslator<>()) + .put(PTransformTranslation.ASSIGN_WINDOWS_TRANSFORM_URN, new WindowAssignTranslator<>()) + .put(PTransformTranslation.FLATTEN_TRANSFORM_URN, new FlattenPCollectionsTranslator<>()) + .put(SamzaPublishView.SAMZA_PUBLISH_VIEW_URN, new SamzaPublishViewTranslator<>()) .put(PTransformTranslation.IMPULSE_TRANSFORM_URN, new ImpulseTranslator()) + .put(ExecutableStage.URN, new ParDoBoundMultiTranslator<>()) + .put(PTransformTranslation.TEST_STREAM_TRANSFORM_URN, new SamzaTestStreamTranslator()) .put( PTransformTranslation.SPLITTABLE_PROCESS_KEYED_URN, new SplittableParDoTranslators.ProcessKeyedElements<>()) - .put(ExecutableStage.URN, new ParDoBoundMultiTranslator()) .build(); } } diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/SamzaPortablePipelineTranslator.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/SamzaPortablePipelineTranslator.java index fcca6bbde7d4..9158cd4dfd91 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/SamzaPortablePipelineTranslator.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/SamzaPortablePipelineTranslator.java @@ -17,10 +17,14 @@ */ package org.apache.beam.runners.samza.translation; +import com.google.auto.service.AutoService; import java.util.HashMap; import 
java.util.Map; import java.util.ServiceLoader; +import java.util.Set; import org.apache.beam.model.pipeline.v1.RunnerApi; +import org.apache.beam.runners.core.construction.PTransformTranslation; +import org.apache.beam.runners.core.construction.graph.ExecutableStage; import org.apache.beam.runners.core.construction.graph.PipelineNode; import org.apache.beam.runners.core.construction.graph.QueryablePipeline; import org.apache.beam.runners.samza.SamzaPipelineOptions; @@ -33,6 +37,7 @@ * pipeline */ @SuppressWarnings({ + "keyfor", "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) @@ -43,7 +48,8 @@ public class SamzaPortablePipelineTranslator { private static Map> loadTranslators() { Map> translators = new HashMap<>(); - for (SamzaTranslatorRegistrar registrar : ServiceLoader.load(SamzaTranslatorRegistrar.class)) { + for (SamzaPortableTranslatorRegistrar registrar : + ServiceLoader.load(SamzaPortableTranslatorRegistrar.class)) { translators.putAll(registrar.getTransformTranslators()); } LOG.info("{} translators loaded.", translators.size()); @@ -85,4 +91,24 @@ public static void createConfig( } } } + + public static Set knownUrns() { + return TRANSLATORS.keySet(); + } + + /** Registers Samza translators. */ + @AutoService(SamzaPortableTranslatorRegistrar.class) + public static class SamzaTranslators implements SamzaPortableTranslatorRegistrar { + + @Override + public Map> getTransformTranslators() { + return ImmutableMap.>builder() + .put(PTransformTranslation.GROUP_BY_KEY_TRANSFORM_URN, new GroupByKeyTranslator<>()) + .put(PTransformTranslation.FLATTEN_TRANSFORM_URN, new FlattenPCollectionsTranslator<>()) + .put(PTransformTranslation.IMPULSE_TRANSFORM_URN, new ImpulseTranslator()) + .put(PTransformTranslation.TEST_STREAM_TRANSFORM_URN, new SamzaTestStreamTranslator<>()) + .put(ExecutableStage.URN, new ParDoBoundMultiTranslator<>()) + .build(); + } + } } diff --git a/sdks/java/io/thrift/src/test/resources/thrift/thrift_test.thrift b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/SamzaPortableTranslatorRegistrar.java similarity index 72% rename from sdks/java/io/thrift/src/test/resources/thrift/thrift_test.thrift rename to runners/samza/src/main/java/org/apache/beam/runners/samza/translation/SamzaPortableTranslatorRegistrar.java index 93b04b304c91..5eede8f65326 100644 --- a/sdks/java/io/thrift/src/test/resources/thrift/thrift_test.thrift +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/SamzaPortableTranslatorRegistrar.java @@ -15,17 +15,11 @@ * See the License for the specific language governing permissions and * limitations under the License. */ -// This thrift file is used to generate the TestThriftStruct class. +package org.apache.beam.runners.samza.translation; -namespace java test_thrift +import java.util.Map; -struct TestThriftStruct { - 1: i8 testByte - 2: i16 testShort - 3: i32 testInt - 4: i64 testLong - 5: double testDouble - 6: map stringIntMap - 7: binary testBinary - 8: bool testBool +/** A registrar of TransformTranslator in portable pipeline. 
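/*
 * A stripped-down sketch of the registrar pattern the portable translator uses: each registrar
 * contributes URN-to-translator entries and loadTranslators() merges everything found on the
 * classpath. Translators are plain strings here, and nothing will actually be discovered unless
 * an implementation is registered under META-INF/services (which is what @AutoService generates).
 */
import java.util.HashMap;
import java.util.Map;
import java.util.ServiceLoader;

class TranslatorRegistrarSketch {
  public interface TranslatorRegistrar {
    Map<String, String> getTransformTranslators();
  }

  static Map<String, String> loadTranslators() {
    Map<String, String> translators = new HashMap<>();
    for (TranslatorRegistrar registrar : ServiceLoader.load(TranslatorRegistrar.class)) {
      translators.putAll(registrar.getTransformTranslators());
    }
    return translators;
  }

  public static void main(String[] args) {
    // Empty unless a registrar implementation is registered as a service.
    System.out.println(loadTranslators().keySet());
  }
}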
*/ +public interface SamzaPortableTranslatorRegistrar { + Map> getTransformTranslators(); } diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/SamzaPublishViewTransformOverride.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/SamzaPublishViewTransformOverride.java index 8644e732b3f9..1f8fbcc14257 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/SamzaPublishViewTransformOverride.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/SamzaPublishViewTransformOverride.java @@ -17,12 +17,8 @@ */ package org.apache.beam.runners.samza.translation; -import java.util.ArrayList; -import java.util.List; +import org.apache.beam.runners.core.Concatenate; import org.apache.beam.runners.core.construction.SingleInputOutputOverrideFactory; -import org.apache.beam.sdk.coders.Coder; -import org.apache.beam.sdk.coders.CoderRegistry; -import org.apache.beam.sdk.coders.ListCoder; import org.apache.beam.sdk.runners.AppliedPTransform; import org.apache.beam.sdk.transforms.Combine; import org.apache.beam.sdk.transforms.PTransform; @@ -67,41 +63,4 @@ public PCollection expand(PCollection input) { return input; } } - - private static class Concatenate extends Combine.CombineFn, List> { - @Override - public List createAccumulator() { - return new ArrayList<>(); - } - - @Override - public List addInput(List accumulator, T input) { - accumulator.add(input); - return accumulator; - } - - @Override - public List mergeAccumulators(Iterable> accumulators) { - List result = createAccumulator(); - for (List accumulator : accumulators) { - result.addAll(accumulator); - } - return result; - } - - @Override - public List extractOutput(List accumulator) { - return accumulator; - } - - @Override - public Coder> getAccumulatorCoder(CoderRegistry registry, Coder inputCoder) { - return ListCoder.of(inputCoder); - } - - @Override - public Coder> getDefaultOutputCoder(CoderRegistry registry, Coder inputCoder) { - return ListCoder.of(inputCoder); - } - } } diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/SamzaPublishViewTranslator.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/SamzaPublishViewTranslator.java index 308be267a3aa..08b6196b6c43 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/SamzaPublishViewTranslator.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/SamzaPublishViewTranslator.java @@ -18,6 +18,7 @@ package org.apache.beam.runners.samza.translation; import java.util.List; +import org.apache.beam.runners.samza.SamzaPipelineOptions; import org.apache.beam.runners.samza.runtime.OpMessage; import org.apache.beam.runners.samza.util.SamzaCoders; import org.apache.beam.sdk.coders.Coder; @@ -35,18 +36,29 @@ public void translate( SamzaPublishView transform, TransformHierarchy.Node node, TranslationContext ctx) { - doTranslate(transform, node, ctx); - } - - private static void doTranslate( - SamzaPublishView transform, - TransformHierarchy.Node node, - TranslationContext ctx) { - final PCollection> input = ctx.getInput(transform); final MessageStream>> inputStream = ctx.getMessageStream(input); @SuppressWarnings("unchecked") final Coder>> elementCoder = (Coder) SamzaCoders.of(input); + final String viewId = ctx.getViewId(transform.getView()); + + final MessageStream>> outputStream = + doTranslate( + inputStream, elementCoder, ctx.getTransformId(), viewId, ctx.getPipelineOptions()); + + 
ctx.registerViewStream(transform.getView(), outputStream); + } + + /** + * This method is used to translate both native Java PublishView transform as well as portable + * side input broadcasting into Samza. + */ + static MessageStream>> doTranslate( + MessageStream>> inputStream, + Coder>> coder, + String transformId, + String viewId, + SamzaPipelineOptions options) { final MessageStream>> elementStream = inputStream @@ -55,15 +67,10 @@ private static void doTranslate( // TODO: once SAMZA-1580 is resolved, this optimization will go directly inside Samza final MessageStream>> broadcastStream = - ctx.getPipelineOptions().getMaxSourceParallelism() == 1 + options.getMaxSourceParallelism() == 1 ? elementStream - : elementStream.broadcast( - SamzaCoders.toSerde(elementCoder), "view-" + ctx.getTransformId()); + : elementStream.broadcast(SamzaCoders.toSerde(coder), "view-" + transformId); - final String viewId = ctx.getViewId(transform.getView()); - final MessageStream>> outputStream = - broadcastStream.map(element -> OpMessage.ofSideInput(viewId, element)); - - ctx.registerViewStream(transform.getView(), outputStream); + return broadcastStream.map(element -> OpMessage.ofSideInput(viewId, element)); } } diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/SamzaTestStreamSystemFactory.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/SamzaTestStreamSystemFactory.java new file mode 100644 index 000000000000..570be619695f --- /dev/null +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/SamzaTestStreamSystemFactory.java @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.samza.translation; + +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.function.Function; +import java.util.stream.Collectors; +import org.apache.beam.runners.core.serialization.Base64Serializer; +import org.apache.beam.runners.samza.runtime.OpMessage; +import org.apache.beam.sdk.testing.TestStream; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.transforms.windowing.BoundedWindow; +import org.apache.beam.sdk.util.WindowedValue; +import org.apache.beam.sdk.values.TimestampedValue; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.apache.samza.Partition; +import org.apache.samza.SamzaException; +import org.apache.samza.config.Config; +import org.apache.samza.config.SystemConfig; +import org.apache.samza.metrics.MetricsRegistry; +import org.apache.samza.system.IncomingMessageEnvelope; +import org.apache.samza.system.SystemAdmin; +import org.apache.samza.system.SystemConsumer; +import org.apache.samza.system.SystemFactory; +import org.apache.samza.system.SystemProducer; +import org.apache.samza.system.SystemStreamMetadata; +import org.apache.samza.system.SystemStreamPartition; + +/** + * A Samza system factory that supports consuming from {@link TestStream} and translating events + * into messages according to the {@link org.apache.beam.sdk.testing.TestStream.EventType} of the + * events. + */ +@SuppressWarnings({ + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) +public class SamzaTestStreamSystemFactory implements SystemFactory { + @Override + public SystemConsumer getConsumer(String systemName, Config config, MetricsRegistry registry) { + final String streamPrefix = String.format(SystemConfig.SYSTEM_ID_PREFIX, systemName); + final Config scopedConfig = config.subset(streamPrefix, true); + return new SamzaTestStreamSystemConsumer<>(getTestStream(scopedConfig)); + } + + @Override + public SystemProducer getProducer(String systemName, Config config, MetricsRegistry registry) { + throw new UnsupportedOperationException("SamzaTestStreamSystem doesn't support producing"); + } + + @Override + public SystemAdmin getAdmin(String systemName, Config config) { + return new SamzaTestStreamSystemAdmin(); + } + + /** A helper function to decode testStream from the config. */ + private static TestStream getTestStream(Config config) { + @SuppressWarnings("unchecked") + final SerializableFunction> testStreamDecoder = + Base64Serializer.deserializeUnchecked( + config.get(SamzaTestStreamTranslator.TEST_STREAM_DECODER), SerializableFunction.class); + return testStreamDecoder.apply(config.get(SamzaTestStreamTranslator.ENCODED_TEST_STREAM)); + } + + private static final String DUMMY_OFFSET = "0"; + + /** System admin for SamzaTestStreamSystem. 
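/*
 * getTestStream(...) above recovers a TestStream that was Java-serialized, Base64-encoded and
 * stashed under a string config key (Beam's Base64Serializer does the encoding). A self-contained
 * sketch of that round trip using only the JDK; the payload string stands in for the encoded
 * TestStream and its decoder function.
 */
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

class ConfigPayloadSketch {
  static String toConfigValue(Serializable payload) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(payload);
    }
    return Base64.getEncoder().encodeToString(bytes.toByteArray());
  }

  @SuppressWarnings("unchecked")
  static <T> T fromConfigValue(String value) throws IOException, ClassNotFoundException {
    byte[] bytes = Base64.getDecoder().decode(value);
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return (T) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    String configValue = toConfigValue("pretend-encoded-test-stream");
    String restored = fromConfigValue(configValue);
    System.out.println(restored); // pretend-encoded-test-stream
  }
}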
*/ + public static class SamzaTestStreamSystemAdmin implements SystemAdmin { + @Override + public Map getOffsetsAfter( + Map offsets) { + return offsets.keySet().stream() + .collect(Collectors.toMap(Function.identity(), k -> DUMMY_OFFSET)); + } + + @Override + public Map getSystemStreamMetadata(Set streamNames) { + return streamNames.stream() + .collect( + Collectors.toMap( + Function.identity(), + stream -> { + // TestStream will always be single partition + Map + partitionMetadata = + Collections.singletonMap( + new Partition(0), + new SystemStreamMetadata.SystemStreamPartitionMetadata( + DUMMY_OFFSET, DUMMY_OFFSET, DUMMY_OFFSET)); + return new SystemStreamMetadata(stream, partitionMetadata); + })); + } + + @Override + public Integer offsetComparator(String offset1, String offset2) { + return 0; + } + } + + /** System consumer for SamzaTestStreamSystem. */ + public static class SamzaTestStreamSystemConsumer implements SystemConsumer { + TestStream testStream; + + public SamzaTestStreamSystemConsumer(TestStream testStream) { + this.testStream = testStream; + } + + @Override + public void start() {} + + @Override + public void stop() {} + + @Override + public void register(SystemStreamPartition systemStreamPartition, String offset) {} + + @Override + public Map> poll( + Set systemStreamPartitions, long timeout) { + SystemStreamPartition ssp = systemStreamPartitions.iterator().next(); + ArrayList messages = new ArrayList<>(); + + for (TestStream.Event event : testStream.getEvents()) { + if (event.getType().equals(TestStream.EventType.ELEMENT)) { + // If event type is element, for each element, create a message with the element and + // timestamp. + for (TimestampedValue element : ((TestStream.ElementEvent) event).getElements()) { + WindowedValue windowedValue = + WindowedValue.timestampedValueInGlobalWindow( + element.getValue(), element.getTimestamp()); + final OpMessage opMessage = OpMessage.ofElement(windowedValue); + final IncomingMessageEnvelope envelope = + new IncomingMessageEnvelope(ssp, DUMMY_OFFSET, null, opMessage); + messages.add(envelope); + } + } else if (event.getType().equals(TestStream.EventType.WATERMARK)) { + // If event type is watermark, create a watermark message. 
+ long watermarkMillis = ((TestStream.WatermarkEvent) event).getWatermark().getMillis(); + final IncomingMessageEnvelope envelope = + IncomingMessageEnvelope.buildWatermarkEnvelope(ssp, watermarkMillis); + messages.add(envelope); + if (watermarkMillis == BoundedWindow.TIMESTAMP_MAX_VALUE.getMillis()) { + // If watermark reached max watermark, also create a end-of-stream message + final IncomingMessageEnvelope endOfStreamMessage = + IncomingMessageEnvelope.buildEndOfStreamEnvelope(ssp); + messages.add(endOfStreamMessage); + break; + } + } else if (event.getType().equals(TestStream.EventType.PROCESSING_TIME)) { + throw new UnsupportedOperationException( + "Advancing Processing time is not supported by the Samza Runner."); + } else { + throw new SamzaException("Unknown event type " + event.getType()); + } + } + + return ImmutableMap.of(ssp, messages); + } + } +} diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/SamzaTestStreamTranslator.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/SamzaTestStreamTranslator.java new file mode 100644 index 000000000000..e50dc2c405d3 --- /dev/null +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/SamzaTestStreamTranslator.java @@ -0,0 +1,148 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.samza.translation; + +import java.io.IOException; +import java.util.Map; +import org.apache.beam.model.pipeline.v1.RunnerApi; +import org.apache.beam.runners.core.construction.RehydratedComponents; +import org.apache.beam.runners.core.construction.TestStreamTranslation; +import org.apache.beam.runners.core.construction.graph.PipelineNode; +import org.apache.beam.runners.core.construction.graph.QueryablePipeline; +import org.apache.beam.runners.core.serialization.Base64Serializer; +import org.apache.beam.runners.samza.runtime.OpMessage; +import org.apache.beam.runners.samza.util.SamzaPipelineTranslatorUtils; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.CoderException; +import org.apache.beam.sdk.runners.TransformHierarchy; +import org.apache.beam.sdk.testing.TestStream; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.util.CoderUtils; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.apache.samza.operators.KV; +import org.apache.samza.serializers.KVSerde; +import org.apache.samza.serializers.NoOpSerde; +import org.apache.samza.serializers.Serde; +import org.apache.samza.system.descriptors.GenericInputDescriptor; +import org.apache.samza.system.descriptors.GenericSystemDescriptor; + +/** + * Translate {@link org.apache.beam.sdk.testing.TestStream} to a samza message stream produced by + * {@link SamzaTestStreamSystemFactory.SamzaTestStreamSystemConsumer}. + */ +@SuppressWarnings({"rawtypes"}) +public class SamzaTestStreamTranslator implements TransformTranslator> { + public static final String ENCODED_TEST_STREAM = "encodedTestStream"; + public static final String TEST_STREAM_DECODER = "testStreamDecoder"; + + @Override + public void translate( + TestStream testStream, TransformHierarchy.Node node, TranslationContext ctx) { + final PCollection output = ctx.getOutput(testStream); + final String outputId = ctx.getIdForPValue(output); + final Coder valueCoder = testStream.getValueCoder(); + final TestStream.TestStreamCoder testStreamCoder = TestStream.TestStreamCoder.of(valueCoder); + + // encode testStream as a string + final String encodedTestStream; + try { + encodedTestStream = CoderUtils.encodeToBase64(testStreamCoder, testStream); + } catch (CoderException e) { + throw new RuntimeException("Could not encode TestStream.", e); + } + + // the decoder for encodedTestStream + SerializableFunction> testStreamDecoder = + string -> { + try { + return CoderUtils.decodeFromBase64(TestStream.TestStreamCoder.of(valueCoder), string); + } catch (CoderException e) { + throw new RuntimeException("Could not decode TestStream.", e); + } + }; + + ctx.registerInputMessageStream( + output, createInputDescriptor(outputId, encodedTestStream, testStreamDecoder)); + } + + @Override + public void translatePortable( + PipelineNode.PTransformNode transform, + QueryablePipeline pipeline, + PortableTranslationContext ctx) { + final ByteString bytes = transform.getTransform().getSpec().getPayload(); + final SerializableFunction> testStreamDecoder = + createTestStreamDecoder(pipeline.getComponents(), bytes); + + final String outputId = ctx.getOutputId(transform); + final String escapedOutputId = SamzaPipelineTranslatorUtils.escape(outputId); + + ctx.registerInputMessageStream( + outputId, + createInputDescriptor( + escapedOutputId, 
Base64Serializer.serializeUnchecked(bytes), testStreamDecoder)); + } + + @SuppressWarnings("unchecked") + private static GenericInputDescriptor>> createInputDescriptor( + String id, + String encodedTestStream, + SerializableFunction> testStreamDecoder) { + final Map systemConfig = + ImmutableMap.of( + ENCODED_TEST_STREAM, + encodedTestStream, + TEST_STREAM_DECODER, + Base64Serializer.serializeUnchecked(testStreamDecoder)); + final GenericSystemDescriptor systemDescriptor = + new GenericSystemDescriptor(id, SamzaTestStreamSystemFactory.class.getName()) + .withSystemConfigs(systemConfig); + + // The KvCoder is needed here for Samza not to crop the key. + final Serde>> kvSerde = KVSerde.of(new NoOpSerde(), new NoOpSerde<>()); + return systemDescriptor.getInputDescriptor(id, kvSerde); + } + + @SuppressWarnings("unchecked") + private static SerializableFunction> createTestStreamDecoder( + RunnerApi.Components components, ByteString payload) { + Coder coder; + try { + coder = + (Coder) + RehydratedComponents.forComponents(components) + .getCoder(RunnerApi.TestStreamPayload.parseFrom(payload).getCoderId()); + } catch (IOException e) { + throw new RuntimeException(e); + } + + // the decoder for encodedTestStream + return encodedTestStream -> { + try { + return TestStreamTranslation.testStreamFromProtoPayload( + RunnerApi.TestStreamPayload.parseFrom( + Base64Serializer.deserializeUnchecked(encodedTestStream, ByteString.class)), + coder); + } catch (IOException e) { + throw new RuntimeException("Could not decode TestStream.", e); + } + }; + } +} diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/SplittableParDoTranslators.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/SplittableParDoTranslators.java index e2a37c207adb..91dc2a6e3dea 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/SplittableParDoTranslators.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/SplittableParDoTranslators.java @@ -119,8 +119,8 @@ public void translate( final MessageStream> taggedOutputStream = partitionedInputStream - .flatMap(OpAdapter.adapt(new KvToKeyedWorkItemOp<>())) - .flatMap( + .flatMapAsync(OpAdapter.adapt(new KvToKeyedWorkItemOp<>())) + .flatMapAsync( OpAdapter.adapt( new SplittableParDoProcessKeyedElementsOp<>( transform.getMainOutputTag(), @@ -139,7 +139,7 @@ public void translate( message -> message.getType() != OpMessage.Type.ELEMENT || message.getElement().getValue().getUnionTag() == outputIndex) - .flatMap(OpAdapter.adapt(new RawUnionValueToValue())); + .flatMapAsync(OpAdapter.adapt(new RawUnionValueToValue())); ctx.registerMessageStream(indexToPCollectionMap.get(outputIndex), outputStream); } diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/TranslationContext.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/TranslationContext.java index a298f3832a1e..5a6962878481 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/TranslationContext.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/TranslationContext.java @@ -17,8 +17,12 @@ */ package org.apache.beam.runners.samza.translation; +import java.util.Collections; import java.util.HashMap; +import java.util.HashSet; +import java.util.List; import java.util.Map; +import java.util.Set; import java.util.UUID; import java.util.function.Consumer; import org.apache.beam.runners.core.construction.TransformInputs; @@ -93,26 
+97,38 @@ public TranslationContext( } public void registerInputMessageStream( - PValue pvalue, - InputDescriptor>, ?> inputDescriptor) { - // we want to register it with the Samza graph only once per i/o stream - final String streamId = inputDescriptor.getStreamId(); - if (registeredInputStreams.containsKey(streamId)) { - MessageStream> messageStream = registeredInputStreams.get(streamId); - LOG.info( - String.format( - "Stream id %s has already been mapped to %s stream. Mapping %s to the same message stream.", - streamId, messageStream, pvalue)); - registerMessageStream(pvalue, messageStream); - - return; - } - @SuppressWarnings("unchecked") - final MessageStream> typedStream = - getValueStream(appDescriptor.getInputStream(inputDescriptor)); + PValue pvalue, InputDescriptor>, ?> inputDescriptor) { + registerInputMessageStreams(pvalue, Collections.singletonList(inputDescriptor)); + } - registerMessageStream(pvalue, typedStream); - registeredInputStreams.put(streamId, typedStream); + /** + * Registers a merged messageStream of all input messageStreams to a PCollection. + * + * @param pvalue output of a transform + * @param inputDescriptors a list of Samza InputDescriptors + */ + public void registerInputMessageStreams( + PValue pvalue, List>, ?>> inputDescriptors) { + final Set>> streamsToMerge = new HashSet<>(); + for (InputDescriptor>, ?> inputDescriptor : inputDescriptors) { + final String streamId = inputDescriptor.getStreamId(); + // each streamId registered in the map should already have been added to messageStreamMap + if (registeredInputStreams.containsKey(streamId)) { + @SuppressWarnings("unchecked") + MessageStream> messageStream = registeredInputStreams.get(streamId); + LOG.info( + String.format( + "Stream id %s has already been mapped to %s stream.
Mapping %s to the same message stream.", + streamId, messageStream, pvalue)); + streamsToMerge.add(messageStream); + } else { + final MessageStream> typedStream = + getValueStream(appDescriptor.getInputStream(inputDescriptor)); + registeredInputStreams.put(streamId, typedStream); + streamsToMerge.add(typedStream); + } + } + registerMessageStream(pvalue, MessageStream.mergeAll(streamsToMerge)); } public void registerMessageStream(PValue pvalue, MessageStream> stream) { @@ -204,9 +220,8 @@ public Table> getTable(TableDescriptor tableDesc) { tableDesc.getTableId(), id -> appDescriptor.getTable(tableDesc)); } - private static MessageStream getValueStream( - MessageStream> input) { - return input.map(org.apache.samza.operators.KV::getValue); + private static MessageStream getValueStream(MessageStream> input) { + return input.map(KV::getValue); } public String getIdForPValue(PValue pvalue) { diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/WindowAssignTranslator.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/WindowAssignTranslator.java index 114a256c8676..54000412c0be 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/WindowAssignTranslator.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/translation/WindowAssignTranslator.java @@ -28,7 +28,7 @@ import org.apache.beam.sdk.transforms.windowing.Window; import org.apache.beam.sdk.transforms.windowing.WindowFn; import org.apache.beam.sdk.values.PCollection; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; import org.apache.samza.operators.MessageStream; /** @@ -47,7 +47,7 @@ public void translate( final MessageStream> inputStream = ctx.getMessageStream(ctx.getInput(transform)); final MessageStream> outputStream = - inputStream.flatMap(OpAdapter.adapt(new WindowAssignOp<>(windowFn))); + inputStream.flatMapAsync(OpAdapter.adapt(new WindowAssignOp<>(windowFn))); ctx.registerMessageStream(output, outputStream); } @@ -73,7 +73,7 @@ public void translatePortable( final MessageStream> inputStream = ctx.getOneInputMessageStream(transform); final MessageStream> outputStream = - inputStream.flatMap(OpAdapter.adapt(new WindowAssignOp<>(windowFn))); + inputStream.flatMapAsync(OpAdapter.adapt(new WindowAssignOp<>(windowFn))); ctx.registerMessageStream(ctx.getOutputId(transform), outputStream); } diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/util/FutureUtils.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/util/FutureUtils.java new file mode 100644 index 000000000000..09ad77bac576 --- /dev/null +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/util/FutureUtils.java @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.samza.util; + +import java.util.Collection; +import java.util.List; +import java.util.concurrent.CompletableFuture; +import java.util.concurrent.CompletionStage; +import java.util.stream.Collectors; +import java.util.stream.Stream; + +/** A util class to handle java 8 {@link CompletableFuture} and {@link CompletionStage}. */ +@SuppressWarnings({"rawtypes"}) +public final class FutureUtils { + /** + * Flattens the input future collection and returns a single future comprising the results of all + * the futures. + * + * @param inputFutures input future collection + * @param result type of the input future + * @return a single {@link CompletionStage} that contains the results of all the input futures. + */ + public static CompletionStage> flattenFutures( + Collection> inputFutures) { + CompletableFuture[] futures = inputFutures.toArray(new CompletableFuture[0]); + + return CompletableFuture.allOf(futures) + .thenApply( + ignored -> { + final List result = + Stream.of(futures).map(CompletableFuture::join).collect(Collectors.toList()); + return result; + }); + } +} diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/util/PipelineJsonRenderer.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/util/PipelineJsonRenderer.java new file mode 100644 index 000000000000..cc53764d7414 --- /dev/null +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/util/PipelineJsonRenderer.java @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.samza.util; + +import java.util.HashMap; +import java.util.Iterator; +import java.util.Map; +import java.util.Optional; +import java.util.ServiceLoader; +import javax.annotation.Nullable; +import org.apache.beam.model.pipeline.v1.RunnerApi; +import org.apache.beam.sdk.Pipeline; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.runners.TransformHierarchy; +import org.apache.beam.sdk.values.PValue; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterators; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * A JSON renderer for BEAM {@link Pipeline} DAG. This can help us with visualization of the Beam + * DAG. 
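+ *
+ * <p>The rendered JSON contains a "RootNode" array describing the transform hierarchy (each entry
+ * has a "fullName", an optional "enclosingNode" and "ioInfo", and "ChildNodes") plus a "graphLinks"
+ * array of "from"/"to" edges between primitive transforms. A minimal sketch of the output shape,
+ * using hypothetical transform names, might look like:
+ *
+ * <pre>{@code
+ * {"RootNode": [{"fullName":"OuterMostNode", "ChildNodes":[ ... ]}],
+ *  "graphLinks": [{"from":"MyRead","to":"MyParDo"}]}
+ * }</pre>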
+ */ +@Experimental +public class PipelineJsonRenderer implements Pipeline.PipelineVisitor { + + /** + * Interface to get I/O information for a Beam job. This will help add I/O information to the Beam + * DAG. + */ + @Experimental + public interface SamzaIOInfo { + + /** Get I/O topic name and cluster. */ + Optional getIOInfo(TransformHierarchy.Node node); + } + + /** A registrar for {@link SamzaIOInfo}. */ + public interface SamzaIORegistrar { + + SamzaIOInfo getSamzaIO(); + } + + private static final Logger LOG = LoggerFactory.getLogger(PipelineJsonRenderer.class); + private static final String OUTERMOST_NODE = "OuterMostNode"; + @Nullable private static final SamzaIOInfo SAMZA_IO_INFO = loadSamzaIOInfo(); + + /** + * This method creates a JSON representation of the Beam pipeline. + * + * @param pipeline The beam pipeline + * @return JSON string representation of the pipeline + */ + public static String toJsonString(Pipeline pipeline) { + final PipelineJsonRenderer visitor = new PipelineJsonRenderer(); + pipeline.traverseTopologically(visitor); + return visitor.jsonBuilder.toString(); + } + + /** + * This method creates a JSON representation for Beam Portable Pipeline. + * + * @param pipeline The beam portable pipeline + * @return JSON string representation of the pipeline + */ + public static String toJsonString(RunnerApi.Pipeline pipeline) { + throw new UnsupportedOperationException("JSON DAG for portable pipeline is not supported yet."); + } + + private final StringBuilder jsonBuilder = new StringBuilder(); + private final StringBuilder graphLinks = new StringBuilder(); + private final Map valueToProducerNodeName = new HashMap<>(); + private int indent; + + private PipelineJsonRenderer() {} + + @Nullable + private static SamzaIOInfo loadSamzaIOInfo() { + final Iterator beamIORegistrarIterator = + ServiceLoader.load(SamzaIORegistrar.class).iterator(); + return beamIORegistrarIterator.hasNext() + ? 
Iterators.getOnlyElement(beamIORegistrarIterator).getSamzaIO() + : null; + } + + @Override + public void enterPipeline(Pipeline p) { + writeLine("{ \n \"RootNode\": ["); + graphLinks.append(",\"graphLinks\": ["); + enterBlock(); + } + + @Override + public CompositeBehavior enterCompositeTransform(TransformHierarchy.Node node) { + String fullName = node.getFullName(); + writeLine("{ \"fullName\":\"%s\",", assignNodeName(fullName)); + if (node.getEnclosingNode() != null) { + String enclosingNodeName = node.getEnclosingNode().getFullName(); + writeLine(" \"enclosingNode\":\"%s\",", assignNodeName(enclosingNodeName)); + } + + Optional ioInfo = getIOInfo(node); + if (ioInfo.isPresent() && !ioInfo.get().isEmpty()) { + writeLine(" \"ioInfo\":\"%s\",", escapeString(ioInfo.get())); + } + + writeLine(" \"ChildNodes\":["); + enterBlock(); + return CompositeBehavior.ENTER_TRANSFORM; + } + + @Override + public void leaveCompositeTransform(TransformHierarchy.Node node) { + exitBlock(); + writeLine("]},"); + } + + @Override + public void visitPrimitiveTransform(TransformHierarchy.Node node) { + String fullName = node.getFullName(); + writeLine("{ \"fullName\":\"%s\",", escapeString(fullName)); + String enclosingNodeName = node.getEnclosingNode().getFullName(); + writeLine(" \"enclosingNode\":\"%s\"},", assignNodeName(enclosingNodeName)); + + node.getOutputs().values().forEach(x -> valueToProducerNodeName.put(x, fullName)); + node.getInputs() + .forEach( + (key, value) -> { + final String producerName = valueToProducerNodeName.get(value); + graphLinks.append( + String.format("{\"from\":\"%s\"," + "\"to\":\"%s\"},", producerName, fullName)); + }); + } + + @Override + public void visitValue(PValue value, TransformHierarchy.Node producer) {} + + @Override + public void leavePipeline(Pipeline pipeline) { + exitBlock(); + writeLine("]"); + // delete the last comma + int lastIndex = graphLinks.length() - 1; + if (graphLinks.charAt(lastIndex) == ',') { + graphLinks.deleteCharAt(lastIndex); + } + graphLinks.append("]"); + jsonBuilder.append(graphLinks); + jsonBuilder.append("}"); + } + + private void enterBlock() { + indent += 4; + } + + private void exitBlock() { + indent -= 4; + } + + private void writeLine(String format, Object... args) { + // Since we append a comma after every entry to the graph, we will need to remove that one extra + // comma towards the end of the JSON. + int secondLastCharIndex = jsonBuilder.length() - 2; + if (jsonBuilder.length() > 1 + && jsonBuilder.charAt(secondLastCharIndex) == ',' + && (format.startsWith("}") || format.startsWith("]"))) { + jsonBuilder.deleteCharAt(secondLastCharIndex); + } + if (indent != 0) { + jsonBuilder.append(String.format("%-" + indent + "s", "")); + } + jsonBuilder.append(String.format(format, args)); + jsonBuilder.append("\n"); + } + + private static String escapeString(String x) { + return x.replace("\"", "\\\""); + } + + private static String shortenTag(String tag) { + return tag.replaceFirst(".*:([a-zA-Z#0-9]+).*", "$1"); + } + + private String assignNodeName(String nodeName) { + return escapeString(nodeName.isEmpty() ? 
OUTERMOST_NODE : nodeName); + } + + private Optional getIOInfo(TransformHierarchy.Node node) { + if (SAMZA_IO_INFO == null) { + return Optional.empty(); + } + return SAMZA_IO_INFO.getIOInfo(node); + } +} diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/util/SamzaPipelineTranslatorUtils.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/util/SamzaPipelineTranslatorUtils.java index 08117fc084e4..e265ab02e41a 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/util/SamzaPipelineTranslatorUtils.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/util/SamzaPipelineTranslatorUtils.java @@ -19,17 +19,10 @@ import java.io.IOException; import org.apache.beam.model.pipeline.v1.RunnerApi; -import org.apache.beam.runners.core.construction.RehydratedComponents; -import org.apache.beam.runners.core.construction.WindowingStrategyTranslation; import org.apache.beam.runners.core.construction.graph.PipelineNode; -import org.apache.beam.runners.core.construction.graph.QueryablePipeline; import org.apache.beam.runners.fnexecution.wire.WireCoders; -import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.util.WindowedValue; import org.apache.beam.sdk.values.PCollection; -import org.apache.beam.sdk.values.WindowingStrategy; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; /** Utilities for pipeline translation. */ @SuppressWarnings({ @@ -50,35 +43,6 @@ public static WindowedValue.WindowedValueCoder instantiateCoder( } } - public static WindowingStrategy getPortableWindowStrategy( - PipelineNode.PTransformNode transform, QueryablePipeline pipeline) { - String inputId = Iterables.getOnlyElement(transform.getTransform().getInputsMap().values()); - RehydratedComponents rehydratedComponents = - RehydratedComponents.forComponents(pipeline.getComponents()); - - RunnerApi.WindowingStrategy windowingStrategyProto = - pipeline - .getComponents() - .getWindowingStrategiesOrThrow( - pipeline.getComponents().getPcollectionsOrThrow(inputId).getWindowingStrategyId()); - - WindowingStrategy windowingStrategy; - try { - windowingStrategy = - WindowingStrategyTranslation.fromProto(windowingStrategyProto, rehydratedComponents); - } catch (InvalidProtocolBufferException e) { - throw new IllegalStateException( - String.format( - "Unable to hydrate GroupByKey windowing strategy %s.", windowingStrategyProto), - e); - } - - @SuppressWarnings("unchecked") - WindowingStrategy ret = - (WindowingStrategy) windowingStrategy; - return ret; - } - /** * Escape the non-alphabet chars in the name so we can create a physical stream out of it. * @@ -86,7 +50,7 @@ public static WindowingStrategy getPortableWindowStrategy( * non-alphabetic characters. 
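* For example, a hypothetical portable name like "pc:myStage#1" would become "myStage_1" under the replaceFirst/replaceAll logic below.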
*/ public static String escape(String name) { - return name.replaceAll("[\\.(/]", "-").replaceAll("[^A-Za-z0-9-_]", ""); + return name.replaceFirst(".*:([a-zA-Z#0-9]+).*", "$1").replaceAll("[^A-Za-z0-9_-]", "_"); } public static PCollection.IsBounded isBounded(RunnerApi.PCollection pCollection) { diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/util/StateUtils.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/util/StateUtils.java new file mode 100644 index 000000000000..0ebe89ca56c8 --- /dev/null +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/util/StateUtils.java @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.samza.util; + +import org.apache.beam.model.pipeline.v1.RunnerApi; +import org.apache.beam.runners.core.construction.graph.ExecutableStage; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.reflect.DoFnSignatures; + +/** Utils for determining stateful operators. */ +public class StateUtils { + + public static boolean isStateful(DoFn doFn) { + return DoFnSignatures.isStateful(doFn); + } + + public static boolean isStateful(RunnerApi.ExecutableStagePayload stagePayload) { + return stagePayload.getUserStatesCount() > 0 || stagePayload.getTimersCount() > 0; + } + + public static boolean isStateful(ExecutableStage executableStage) { + return executableStage.getUserStates().size() > 0 || executableStage.getTimers().size() > 0; + } +} diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/util/WindowUtils.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/util/WindowUtils.java new file mode 100644 index 000000000000..07aa2a4cb994 --- /dev/null +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/util/WindowUtils.java @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.samza.util; + +import java.io.IOException; +import org.apache.beam.model.pipeline.v1.RunnerApi; +import org.apache.beam.runners.core.construction.RehydratedComponents; +import org.apache.beam.runners.core.construction.WindowingStrategyTranslation; +import org.apache.beam.runners.core.construction.graph.PipelineNode; +import org.apache.beam.runners.fnexecution.wire.WireCoders; +import org.apache.beam.sdk.transforms.windowing.BoundedWindow; +import org.apache.beam.sdk.util.WindowedValue; +import org.apache.beam.sdk.values.WindowingStrategy; + +/** Utils for window operations. */ +public class WindowUtils { + + /** Get {@link WindowingStrategy} of given collection id from {@link RunnerApi.Components}. */ + public static WindowingStrategy getWindowStrategy( + String collectionId, RunnerApi.Components components) { + RehydratedComponents rehydratedComponents = RehydratedComponents.forComponents(components); + + RunnerApi.WindowingStrategy windowingStrategyProto = + components.getWindowingStrategiesOrThrow( + components.getPcollectionsOrThrow(collectionId).getWindowingStrategyId()); + + WindowingStrategy windowingStrategy; + try { + windowingStrategy = + WindowingStrategyTranslation.fromProto(windowingStrategyProto, rehydratedComponents); + } catch (Exception e) { + throw new IllegalStateException( + String.format( + "Unable to hydrate GroupByKey windowing strategy %s.", windowingStrategyProto), + e); + } + + @SuppressWarnings("unchecked") + WindowingStrategy ret = + (WindowingStrategy) windowingStrategy; + return ret; + } + + /** + * Instantiate {@link WindowedValue.WindowedValueCoder} for given collection id from {@link + * RunnerApi.Components}. + */ + public static WindowedValue.WindowedValueCoder instantiateWindowedCoder( + String collectionId, RunnerApi.Components components) { + PipelineNode.PCollectionNode collectionNode = + PipelineNode.pCollection(collectionId, components.getPcollectionsOrThrow(collectionId)); + try { + return (WindowedValue.WindowedValueCoder) + WireCoders.instantiateRunnerWireCoder(collectionNode, components); + } catch (IOException e) { + throw new RuntimeException(e); + } + } +} diff --git a/runners/samza/src/main/resources/log4j.properties b/runners/samza/src/main/resources/log4j.properties index e9822ecd5f6b..3ad91be1b55a 100644 --- a/runners/samza/src/main/resources/log4j.properties +++ b/runners/samza/src/main/resources/log4j.properties @@ -16,7 +16,8 @@ # limitations under the License. ################################################################################ -log4j.rootLogger=DEBUG,console +# Update root logger to WARN and add log4j.category.org.apache.beam=INFO if executing in Intellij +log4j.rootLogger=INFO,console log4j.appender.console=org.apache.log4j.ConsoleAppender log4j.appender.console.layout=org.apache.log4j.PatternLayout log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n diff --git a/runners/samza/src/test/java/org/apache/beam/runners/samza/SamzaPipelineOptionsValidatorTest.java b/runners/samza/src/test/java/org/apache/beam/runners/samza/SamzaPipelineOptionsValidatorTest.java new file mode 100644 index 000000000000..a5b03a23de36 --- /dev/null +++ b/runners/samza/src/test/java/org/apache/beam/runners/samza/SamzaPipelineOptionsValidatorTest.java @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.samza; + +import static org.apache.beam.runners.samza.SamzaPipelineOptionsValidator.validateBundlingRelatedOptions; +import static org.apache.samza.config.JobConfig.JOB_CONTAINER_THREAD_POOL_SIZE; +import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.when; + +import java.util.Collections; +import java.util.Map; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.junit.Test; + +/** Test for {@link SamzaPipelineOptionsValidator}. */ +public class SamzaPipelineOptionsValidatorTest { + + @Test(expected = IllegalArgumentException.class) + public void testBundleEnabledInMultiThreadedModeThrowsException() { + SamzaPipelineOptions mockOptions = mock(SamzaPipelineOptions.class); + Map config = ImmutableMap.of(JOB_CONTAINER_THREAD_POOL_SIZE, "10"); + + when(mockOptions.getMaxBundleSize()).thenReturn(2L); + when(mockOptions.getConfigOverride()).thenReturn(config); + validateBundlingRelatedOptions(mockOptions); + } + + @Test + public void testBundleEnabledInSingleThreadedMode() { + SamzaPipelineOptions mockOptions = mock(SamzaPipelineOptions.class); + when(mockOptions.getMaxBundleSize()).thenReturn(2L); + + try { + Map config = ImmutableMap.of(JOB_CONTAINER_THREAD_POOL_SIZE, "1"); + when(mockOptions.getConfigOverride()).thenReturn(config); + validateBundlingRelatedOptions(mockOptions); + + // In the absence of configuration make sure it is treated as single threaded mode. + when(mockOptions.getConfigOverride()).thenReturn(Collections.emptyMap()); + validateBundlingRelatedOptions(mockOptions); + } catch (Exception e) { + throw new AssertionError("Bundle size > 1 should be supported in single threaded mode"); + } + } +} diff --git a/runners/samza/src/test/java/org/apache/beam/runners/samza/adapter/BoundedSourceSystemTest.java b/runners/samza/src/test/java/org/apache/beam/runners/samza/adapter/BoundedSourceSystemTest.java index a95eff7744fb..ca31f8686aa3 100644 --- a/runners/samza/src/test/java/org/apache/beam/runners/samza/adapter/BoundedSourceSystemTest.java +++ b/runners/samza/src/test/java/org/apache/beam/runners/samza/adapter/BoundedSourceSystemTest.java @@ -49,9 +49,6 @@ import org.junit.Test; /** Tests for {@link BoundedSourceSystem}. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BoundedSourceSystemTest { private static final SystemStreamPartition DEFAULT_SSP = new SystemStreamPartition("default-system", "default-system", new Partition(0)); diff --git a/runners/samza/src/test/java/org/apache/beam/runners/samza/adapter/TestSourceHelpers.java b/runners/samza/src/test/java/org/apache/beam/runners/samza/adapter/TestSourceHelpers.java index e47503782d1f..b020a9acd6ce 100644 --- a/runners/samza/src/test/java/org/apache/beam/runners/samza/adapter/TestSourceHelpers.java +++ b/runners/samza/src/test/java/org/apache/beam/runners/samza/adapter/TestSourceHelpers.java @@ -35,9 +35,6 @@ import org.joda.time.Instant; /** Helper classes and functions to build source for testing. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TestSourceHelpers { private TestSourceHelpers() {} diff --git a/runners/samza/src/test/java/org/apache/beam/runners/samza/adapter/TestUnboundedSource.java b/runners/samza/src/test/java/org/apache/beam/runners/samza/adapter/TestUnboundedSource.java index 42edc7d5f47b..0439354d67ea 100644 --- a/runners/samza/src/test/java/org/apache/beam/runners/samza/adapter/TestUnboundedSource.java +++ b/runners/samza/src/test/java/org/apache/beam/runners/samza/adapter/TestUnboundedSource.java @@ -43,9 +43,6 @@ * * @param element type */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TestUnboundedSource extends UnboundedSource { // each list of events is a split private final List>> events; diff --git a/runners/samza/src/test/java/org/apache/beam/runners/samza/adapter/UnboundedSourceSystemTest.java b/runners/samza/src/test/java/org/apache/beam/runners/samza/adapter/UnboundedSourceSystemTest.java index fd61bc010ba3..bbc2e8cec91f 100644 --- a/runners/samza/src/test/java/org/apache/beam/runners/samza/adapter/UnboundedSourceSystemTest.java +++ b/runners/samza/src/test/java/org/apache/beam/runners/samza/adapter/UnboundedSourceSystemTest.java @@ -18,6 +18,7 @@ package org.apache.beam.runners.samza.adapter; import static org.apache.beam.runners.samza.adapter.TestSourceHelpers.createElementMessage; +import static org.apache.beam.runners.samza.adapter.TestSourceHelpers.createEndOfStreamMessage; import static org.apache.beam.runners.samza.adapter.TestSourceHelpers.createWatermarkMessage; import static org.apache.beam.runners.samza.adapter.TestSourceHelpers.expectWrappedException; import static org.junit.Assert.assertEquals; @@ -52,9 +53,6 @@ import org.junit.Test; /** Tests for {@link UnboundedSourceSystem}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class UnboundedSourceSystemTest { // A reasonable time to wait to get all messages from the source assuming no blocking. 
@@ -101,6 +99,33 @@ DEFAULT_SSP, offset(0), "test", BoundedWindow.TIMESTAMP_MIN_VALUE)), consumer.stop(); } + @Test + public void testMaxWatermarkTriggersEndOfStreamMessage() + throws IOException, InterruptedException { + final TestUnboundedSource source = + TestUnboundedSource.createBuilder() + .addElements("test") + .advanceWatermarkTo(BoundedWindow.TIMESTAMP_MAX_VALUE) + .build(); + + final UnboundedSourceSystem.Consumer consumer = + createConsumer(source); + + consumer.register(DEFAULT_SSP, NULL_STRING); + consumer.start(); + List actualList = + consumeUntilTimeoutOrWatermark(consumer, DEFAULT_SSP, DEFAULT_TIMEOUT_MILLIS); + actualList.addAll( + consumeUntilTimeoutOrWatermark(consumer, DEFAULT_SSP, DEFAULT_TIMEOUT_MILLIS)); + assertEquals( + Arrays.asList( + createElementMessage(DEFAULT_SSP, offset(0), "test", BoundedWindow.TIMESTAMP_MIN_VALUE), + createWatermarkMessage(DEFAULT_SSP, BoundedWindow.TIMESTAMP_MAX_VALUE), + createEndOfStreamMessage(DEFAULT_SSP)), + actualList); + consumer.stop(); + } + @Test public void testAdvanceTimestamp() throws IOException, InterruptedException { final Instant timestamp = Instant.now(); diff --git a/runners/samza/src/test/java/org/apache/beam/runners/samza/runtime/BundleManagerTest.java b/runners/samza/src/test/java/org/apache/beam/runners/samza/runtime/BundleManagerTest.java new file mode 100644 index 000000000000..91422097e83b --- /dev/null +++ b/runners/samza/src/test/java/org/apache/beam/runners/samza/runtime/BundleManagerTest.java @@ -0,0 +1,472 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.samza.runtime; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertNull; +import static org.junit.Assert.assertTrue; +import static org.mockito.Matchers.anyObject; +import static org.mockito.Mockito.doThrow; +import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.times; +import static org.mockito.Mockito.verify; +import static org.mockito.Mockito.when; + +import java.util.Collection; +import java.util.Collections; +import java.util.concurrent.CompletableFuture; +import java.util.concurrent.CompletionStage; +import java.util.concurrent.CountDownLatch; +import org.apache.beam.runners.core.TimerInternals; +import org.apache.beam.sdk.transforms.windowing.BoundedWindow; +import org.apache.beam.sdk.util.WindowedValue; +import org.apache.samza.operators.Scheduler; +import org.joda.time.Instant; +import org.junit.Before; +import org.junit.Test; +import org.mockito.ArgumentCaptor; + +/** Unit tests for {@linkplain BundleManager}. 
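+ * Covers bundle start/finish, watermark holds for pending bundles, the bundle-check timer, and failure handling.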
*/ +public final class BundleManagerTest { + private static final long MAX_BUNDLE_SIZE = 3; + private static final long MAX_BUNDLE_TIME_MS = 2000; + private static final String BUNDLE_CHECK_TIMER_ID = "bundle-check-test-timer"; + + private FutureCollector mockFutureCollector; + private BundleManager bundleManager; + private BundleManager.BundleProgressListener bundleProgressListener; + private Scheduler> mockScheduler; + + @Before + public void setUp() { + mockFutureCollector = mock(FutureCollector.class); + bundleProgressListener = mock(BundleManager.BundleProgressListener.class); + mockScheduler = mock(Scheduler.class); + bundleManager = + new BundleManager<>( + bundleProgressListener, + mockFutureCollector, + MAX_BUNDLE_SIZE, + MAX_BUNDLE_TIME_MS, + mockScheduler, + BUNDLE_CHECK_TIMER_ID); + } + + @Test + public void testTryStartBundleStartsBundle() { + bundleManager.tryStartBundle(); + + verify(bundleProgressListener, times(1)).onBundleStarted(); + assertEquals( + "Expected the number of element in the current bundle to be 1", + 1L, + bundleManager.getCurrentBundleElementCount()); + assertEquals( + "Expected the pending bundle count to be 1", 1L, bundleManager.getPendingBundleCount()); + assertTrue("tryStartBundle() did not start the bundle", bundleManager.isBundleStarted()); + } + + @Test + public void testTryStartBundleThrowsExceptionAndSignalError() { + bundleManager.setCurrentBundleDoneFuture(CompletableFuture.completedFuture(null)); + try { + bundleManager.tryStartBundle(); + } catch (IllegalArgumentException e) { + bundleManager.signalFailure(e); + } + + // verify if the signal failure only resets appropriate attributes of bundle + verify(mockFutureCollector, times(1)).prepare(); + verify(mockFutureCollector, times(1)).discard(); + assertEquals( + "Expected the number of element in the current bundle to 0", + 0L, + bundleManager.getCurrentBundleElementCount()); + assertEquals( + "Expected pending bundle count to be 0", 0L, bundleManager.getPendingBundleCount()); + assertFalse("Error didn't reset the bundle as expected.", bundleManager.isBundleStarted()); + } + + @Test + public void testTryStartBundleThrowsExceptionFromTheListener() { + doThrow(new RuntimeException("User start bundle threw an exception")) + .when(bundleProgressListener) + .onBundleStarted(); + + try { + bundleManager.tryStartBundle(); + } catch (RuntimeException e) { + bundleManager.signalFailure(e); + } + + // verify if the signal failure only resets appropriate attributes of bundle + verify(mockFutureCollector, times(1)).prepare(); + verify(mockFutureCollector, times(1)).discard(); + assertEquals( + "Expected the number of element in the current bundle to 0", + 0L, + bundleManager.getCurrentBundleElementCount()); + assertEquals( + "Expected pending bundle count to be 0", 0L, bundleManager.getPendingBundleCount()); + assertFalse("Error didn't reset the bundle as expected.", bundleManager.isBundleStarted()); + } + + @Test + public void testMultipleStartBundle() { + bundleManager.tryStartBundle(); + bundleManager.tryStartBundle(); + + // second invocation should not start the bundle + verify(bundleProgressListener, times(1)).onBundleStarted(); + assertEquals( + "Expected the number of element in the current bundle to be 2", + 2L, + bundleManager.getCurrentBundleElementCount()); + assertEquals( + "Expected the pending bundle count to be 1", 1L, bundleManager.getPendingBundleCount()); + assertTrue("tryStartBundle() did not start the bundle", bundleManager.isBundleStarted()); + } + + /* + * Setup the bundle manager 
with default max bundle size as 3 and max bundle close timeout to 2 seconds. + * The test verifies the following + * 1. Bundle gets closed on tryFinishBundle() + * a. pending bundle count == 0 + * b. element in current bundle == 0 + * c. isBundleStarted == false + * 2. onBundleFinished callback is invoked on the progress listener + */ + @Test + public void testTryFinishBundleClosesBundle() { + OpEmitter mockEmitter = mock(OpEmitter.class); + when(mockFutureCollector.finish()) + .thenReturn( + CompletableFuture.completedFuture(Collections.singleton(mock(WindowedValue.class)))); + + bundleManager.tryStartBundle(); + bundleManager.tryStartBundle(); + bundleManager.tryStartBundle(); + bundleManager.tryFinishBundle(mockEmitter); + + verify(mockEmitter, times(1)).emitFuture(anyObject()); + verify(bundleProgressListener, times(1)).onBundleFinished(mockEmitter); + assertEquals( + "Expected the number of element in the current bundle to be 0", + 0L, + bundleManager.getCurrentBundleElementCount()); + assertEquals( + "Expected the pending bundle count to be 0", 0L, bundleManager.getPendingBundleCount()); + assertFalse("tryFinishBundle() did not close the bundle", bundleManager.isBundleStarted()); + } + + @Test + public void testTryFinishBundleClosesBundleOnMaxWatermark() { + OpEmitter mockEmitter = mock(OpEmitter.class); + when(mockFutureCollector.finish()) + .thenReturn( + CompletableFuture.completedFuture(Collections.singleton(mock(WindowedValue.class)))); + bundleManager.setBundleWatermarkHold(BoundedWindow.TIMESTAMP_MAX_VALUE); + + bundleManager.tryStartBundle(); + bundleManager.tryStartBundle(); + bundleManager.tryFinishBundle(mockEmitter); + + verify(mockEmitter, times(1)).emitFuture(anyObject()); + verify(bundleProgressListener, times(1)).onBundleFinished(mockEmitter); + assertEquals( + "Expected the number of element in the current bundle to be 0", + 0L, + bundleManager.getCurrentBundleElementCount()); + assertEquals( + "Expected the pending bundle count to be 0", 0L, bundleManager.getPendingBundleCount()); + assertFalse("tryFinishBundle() did not close the bundle", bundleManager.isBundleStarted()); + } + + /* + * Set up the bundle manager with defaults and ensure the bundle manager doesn't close the current active bundle. 
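+ * Only a single element is added, so neither the max bundle size (3) nor a max-watermark hold is reached and the bundle stays open.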
+ */ + @Test + public void testTryFinishBundleShouldNotCloseBundle() { + OpEmitter mockEmitter = mock(OpEmitter.class); + when(mockFutureCollector.finish()) + .thenReturn( + CompletableFuture.completedFuture(Collections.singleton(mock(WindowedValue.class)))); + + bundleManager.tryStartBundle(); + bundleManager.tryFinishBundle(mockEmitter); + + verify(mockFutureCollector, times(1)).finish(); + verify(mockEmitter, times(1)).emitFuture(anyObject()); + verify(bundleProgressListener, times(0)).onBundleFinished(mockEmitter); + assertEquals( + "Expected the number of element in the current bundle to be 1", + 1L, + bundleManager.getCurrentBundleElementCount()); + assertEquals( + "Expected the pending bundle count to be 1", 1L, bundleManager.getPendingBundleCount()); + assertTrue("tryFinishBundle() did not close the bundle", bundleManager.isBundleStarted()); + } + + @Test + public void testTryFinishBundleWhenNoBundleInProgress() { + OpEmitter mockEmitter = mock(OpEmitter.class); + when(mockFutureCollector.finish()) + .thenReturn(CompletableFuture.completedFuture(Collections.emptyList())); + + bundleManager.tryFinishBundle(mockEmitter); + + verify(mockEmitter, times(1)).emitFuture(anyObject()); + assertNull( + "tryFinishBundle() should not set the future when no bundle in progress", + bundleManager.getCurrentBundleDoneFuture()); + } + + @Test + public void testProcessWatermarkWhenNoBundleInProgress() { + Instant now = Instant.now(); + OpEmitter mockEmitter = mock(OpEmitter.class); + bundleManager.processWatermark(now, mockEmitter); + verify(bundleProgressListener, times(1)).onWatermark(now, mockEmitter); + } + + /* + * The test validates processing watermark during an active bundle in progress and also validates + * if the watermark hold is propagated down stream after the output futures are resolved. + */ + @Test + public void testProcessWatermarkWithPendingBundles() { + CountDownLatch latch = new CountDownLatch(1); + Instant watermark = Instant.now(); + OpEmitter mockEmitter = mock(OpEmitter.class); + + // We need to capture the finish bundle future to know if we can check for output watermark + // and verify other callbacks get invoked. 
+ Class>>> outputFutureClass = + (Class>>>) (Class) CompletionStage.class; + ArgumentCaptor>>> captor = + ArgumentCaptor.forClass(outputFutureClass); + + when(mockFutureCollector.finish()) + .thenReturn( + CompletableFuture.supplyAsync( + () -> { + try { + latch.await(); + } catch (InterruptedException e) { + throw new AssertionError("Test interrupted when waiting for latch"); + } + + return Collections.singleton(mock(WindowedValue.class)); + })); + + testWatermarkHoldWhenPendingBundleInProgress(mockEmitter, captor, watermark); + testWatermarkHoldPropagatesAfterFutureResolution(mockEmitter, captor, latch, watermark); + } + + @Test + public void testMaxWatermarkPropagationForPendingBundle() { + Instant watermark = BoundedWindow.TIMESTAMP_MAX_VALUE; + OpEmitter mockEmitter = mock(OpEmitter.class); + bundleManager.setPendingBundleCount(1); + bundleManager.processWatermark(watermark, mockEmitter); + verify(bundleProgressListener, times(1)).onWatermark(watermark, mockEmitter); + } + + @Test + public void testMaxWatermarkWithBundleInProgress() { + Instant watermark = BoundedWindow.TIMESTAMP_MAX_VALUE; + OpEmitter mockEmitter = mock(OpEmitter.class); + + when(mockFutureCollector.finish()) + .thenReturn( + CompletableFuture.completedFuture(Collections.singleton(mock(WindowedValue.class)))); + + bundleManager.tryStartBundle(); + bundleManager.tryStartBundle(); + + // should force close bundle + bundleManager.processWatermark(watermark, mockEmitter); + verify(bundleProgressListener, times(1)).onWatermark(watermark, mockEmitter); + } + + @Test + public void testProcessTimerWithBundleTimeElapsed() { + BundleManager bundleManager = + new BundleManager<>( + bundleProgressListener, + mockFutureCollector, + MAX_BUNDLE_SIZE, + 0, + mockScheduler, + BUNDLE_CHECK_TIMER_ID); + OpEmitter mockEmitter = mock(OpEmitter.class); + KeyedTimerData mockTimer = mock(KeyedTimerData.class); + TimerInternals.TimerData mockTimerData = mock(TimerInternals.TimerData.class); + + when(mockFutureCollector.finish()) + .thenReturn( + CompletableFuture.completedFuture(Collections.singleton(mock(WindowedValue.class)))); + when(mockTimerData.getTimerId()).thenReturn(BUNDLE_CHECK_TIMER_ID); + when(mockTimer.getTimerData()).thenReturn(mockTimerData); + + bundleManager.tryStartBundle(); + bundleManager.processTimer(mockTimer, mockEmitter); + + verify(mockEmitter, times(1)).emitFuture(anyObject()); + verify(bundleProgressListener, times(1)).onBundleFinished(mockEmitter); + assertEquals( + "Expected the number of element in the current bundle to be 0", + 0L, + bundleManager.getCurrentBundleElementCount()); + assertEquals( + "Expected the pending bundle count to be 0", 0L, bundleManager.getPendingBundleCount()); + assertFalse("tryFinishBundle() did not close the bundle", bundleManager.isBundleStarted()); + } + + @Test + public void testProcessTimerWithTimeLessThanMaxBundleTime() { + OpEmitter mockEmitter = mock(OpEmitter.class); + KeyedTimerData mockTimer = mock(KeyedTimerData.class); + TimerInternals.TimerData mockTimerData = mock(TimerInternals.TimerData.class); + + when(mockTimerData.getTimerId()).thenReturn(BUNDLE_CHECK_TIMER_ID); + when(mockTimer.getTimerData()).thenReturn(mockTimerData); + + when(mockFutureCollector.finish()) + .thenReturn(CompletableFuture.completedFuture(Collections.emptyList())); + + bundleManager.tryStartBundle(); + bundleManager.processTimer(mockTimer, mockEmitter); + + verify(mockFutureCollector, times(1)).finish(); + verify(mockEmitter, times(1)).emitFuture(anyObject()); + verify(bundleProgressListener, 
times(0)).onBundleFinished(mockEmitter); + assertEquals( + "Expected the number of element in the current bundle to be 1", + 1L, + bundleManager.getCurrentBundleElementCount()); + assertEquals( + "Expected the pending bundle count to be 1", 1L, bundleManager.getPendingBundleCount()); + assertTrue("tryFinishBundle() closed the bundle", bundleManager.isBundleStarted()); + } + + @Test + public void testProcessTimerIgnoresNonBundleTimers() { + OpEmitter mockEmitter = mock(OpEmitter.class); + KeyedTimerData mockTimer = mock(KeyedTimerData.class); + TimerInternals.TimerData mockTimerData = mock(TimerInternals.TimerData.class); + + when(mockTimerData.getTimerId()).thenReturn("NotBundleTimer"); + when(mockTimer.getTimerData()).thenReturn(mockTimerData); + + bundleManager.tryStartBundle(); + bundleManager.processTimer(mockTimer, mockEmitter); + + verify(mockFutureCollector, times(0)).finish(); + verify(mockEmitter, times(0)).emitFuture(anyObject()); + verify(bundleProgressListener, times(0)).onBundleFinished(mockEmitter); + assertEquals( + "Expected the number of element in the current bundle to be 1", + 1L, + bundleManager.getCurrentBundleElementCount()); + assertEquals( + "Expected the pending bundle count to be 1", 1L, bundleManager.getPendingBundleCount()); + assertTrue("tryFinishBundle() closed the bundle", bundleManager.isBundleStarted()); + } + + @Test + public void testSignalFailureResetsTheBundleAndCollector() { + bundleManager.tryStartBundle(); + + bundleManager.signalFailure(mock(Throwable.class)); + verify(mockFutureCollector, times(1)).prepare(); + verify(mockFutureCollector, times(1)).discard(); + assertEquals( + "Expected the number of element in the current bundle to 0", + 0L, + bundleManager.getCurrentBundleElementCount()); + assertEquals( + "Expected pending bundle count to be 0", 0L, bundleManager.getPendingBundleCount()); + assertFalse("Error didn't reset the bundle as expected.", bundleManager.isBundleStarted()); + } + + /* + * We validate the following + * 1. Process watermark is held since there is a pending bundle. + * 2. Watermark propagates down stream once the output future is resolved. + * 3. The watermark propagated is the one that was held before closing the bundle + * 4. onBundleFinished and onWatermark callbacks are triggered + * 5. Pending bundle count is decremented once the future is resolved + */ + private void testWatermarkHoldPropagatesAfterFutureResolution( + OpEmitter mockEmitter, + ArgumentCaptor>>> captor, + CountDownLatch latch, + Instant sealedWatermark) { + Instant higherWatermark = Instant.now(); + + // Process watermark should result in watermark hold again since pending bundle count > 1 + bundleManager.processWatermark(higherWatermark, mockEmitter); + verify(bundleProgressListener, times(0)).onWatermark(higherWatermark, mockEmitter); + + // Resolving the process output futures should result in watermark propagation + latch.countDown(); + CompletionStage validationFuture = + captor + .getValue() + .thenAccept( + results -> { + verify(bundleProgressListener, times(1)).onBundleFinished(mockEmitter); + verify(bundleProgressListener, times(1)) + .onWatermark(sealedWatermark, mockEmitter); + assertEquals( + "Expected the pending bundle count to be 0", + 0L, + bundleManager.getPendingBundleCount()); + }); + + validationFuture.toCompletableFuture().join(); + } + + /* + * We validate the following + * 1. Watermark is held since there is a bundle in progress + * 2. Callbacks are not invoked when tryFinishBundle() is invoked since the future is unresolved + * 3. 
Watermark hold is sealed and output future is emitted + */ + private void testWatermarkHoldWhenPendingBundleInProgress( + OpEmitter mockEmitter, + ArgumentCaptor>>> captor, + Instant watermark) { + // Starts the bundle and reach the max bundle size so that tryFinishBundle() seals the current + // bundle + bundleManager.tryStartBundle(); + bundleManager.tryStartBundle(); + bundleManager.tryStartBundle(); + + bundleManager.processWatermark(watermark, mockEmitter); + verify(bundleProgressListener, times(0)).onWatermark(watermark, mockEmitter); + + // Bundle is still unresolved although sealed since count down the latch is not yet decremented. + bundleManager.tryFinishBundle(mockEmitter); + verify(mockFutureCollector, times(1)).finish(); + verify(mockEmitter, times(1)).emitFuture(captor.capture()); + assertFalse("tryFinishBundle() closed the bundle", bundleManager.isBundleStarted()); + } +} diff --git a/runners/samza/src/test/java/org/apache/beam/runners/samza/runtime/FutureCollectorImplTest.java b/runners/samza/src/test/java/org/apache/beam/runners/samza/runtime/FutureCollectorImplTest.java new file mode 100644 index 000000000000..f126dd14b835 --- /dev/null +++ b/runners/samza/src/test/java/org/apache/beam/runners/samza/runtime/FutureCollectorImplTest.java @@ -0,0 +1,92 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.samza.runtime; + +import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.when; + +import java.util.Collection; +import java.util.List; +import java.util.concurrent.CompletableFuture; +import java.util.concurrent.CompletionStage; +import java.util.stream.Collectors; +import org.apache.beam.sdk.util.WindowedValue; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; + +/** Unit tests for {@linkplain org.apache.beam.runners.samza.runtime.DoFnOp.FutureCollectorImpl}. 
*/ +public final class FutureCollectorImplTest { + private static final List RESULTS = ImmutableList.of("hello", "world"); + private FutureCollector futureCollector = new DoFnOp.FutureCollectorImpl<>(); + + @Before + public void setup() { + futureCollector = new DoFnOp.FutureCollectorImpl<>(); + } + + @Test(expected = IllegalStateException.class) + public void testAddWithoutPrepareCallThrowsException() { + futureCollector.add(mock(CompletionStage.class)); + } + + @Test + public void testFinishWithoutPrepareReturnsEmptyCollection() { + CompletionStage>> resultFuture = futureCollector.finish(); + CompletionStage validationFuture = + resultFuture.thenAccept( + result -> { + Assert.assertTrue("Expected the result to be empty", result.isEmpty()); + }); + validationFuture.toCompletableFuture().join(); + } + + @Test + public void testFinishReturnsExpectedResults() { + WindowedValue mockWindowedValue = mock(WindowedValue.class); + + when(mockWindowedValue.getValue()).thenReturn("hello").thenReturn("world"); + + futureCollector.prepare(); + futureCollector.add(CompletableFuture.completedFuture(mockWindowedValue)); + futureCollector.add(CompletableFuture.completedFuture(mockWindowedValue)); + + CompletionStage>> resultFuture = futureCollector.finish(); + CompletionStage validationFuture = + resultFuture.thenAccept( + results -> { + List actualResults = + results.stream().map(WindowedValue::getValue).collect(Collectors.toList()); + Assert.assertEquals( + "Expected the result to be {hello, world}", RESULTS, actualResults); + }); + validationFuture.toCompletableFuture().join(); + } + + @Test + public void testMultiplePrepareCallsWithoutFinishThrowsException() { + futureCollector.prepare(); + + try { + futureCollector.prepare(); + Assert.fail("Second invocation of prepare should throw IllegalStateException"); + } catch (IllegalStateException ex) { + } + } +} diff --git a/runners/samza/src/test/java/org/apache/beam/runners/samza/runtime/KeyedTimerDataTest.java b/runners/samza/src/test/java/org/apache/beam/runners/samza/runtime/KeyedTimerDataTest.java index a18d8752ac6b..d3da93a054bf 100644 --- a/runners/samza/src/test/java/org/apache/beam/runners/samza/runtime/KeyedTimerDataTest.java +++ b/runners/samza/src/test/java/org/apache/beam/runners/samza/runtime/KeyedTimerDataTest.java @@ -27,7 +27,6 @@ import org.apache.beam.sdk.transforms.windowing.GlobalWindow; import org.joda.time.DateTime; import org.joda.time.DateTimeZone; -import org.joda.time.Duration; import org.joda.time.Instant; import org.junit.Test; @@ -36,18 +35,14 @@ public class KeyedTimerDataTest { private static final Coder STRING_CODER = StringUtf8Coder.of(); private static final Instant TIMESTAMP = new DateTime(2020, 8, 11, 13, 42, 9, DateTimeZone.UTC).toInstant(); - private static final Instant OUTPUT_TIMESTAMP = TIMESTAMP.plus(Duration.standardSeconds(30)); + // TODO: LISAMZA-19205 Test OUTPUT_TIMESTAMP after outputTimestamp is encoded + // private static final Instant OUTPUT_TIMESTAMP = TIMESTAMP.plus(Duration.standardSeconds(30)); @Test public void testCoder() throws Exception { final TimerInternals.TimerData td = TimerInternals.TimerData.of( - "timer", - "timerFamily", - StateNamespaces.global(), - TIMESTAMP, - OUTPUT_TIMESTAMP, - TimeDomain.EVENT_TIME); + "timer", StateNamespaces.global(), TIMESTAMP, TIMESTAMP, TimeDomain.EVENT_TIME); final String key = "timer-key"; final ByteArrayOutputStream baos = new ByteArrayOutputStream(); @@ -58,6 +53,7 @@ public void testCoder() throws Exception { final KeyedTimerData.KeyedTimerDataCoder 
ktdCoder = new KeyedTimerData.KeyedTimerDataCoder<>(STRING_CODER, GlobalWindow.Coder.INSTANCE); - CoderProperties.coderDecodeEncodeEqual(ktdCoder, ktd); + // TODO: LISAMZA-19205: use CoderProperties.coderDecodeEncodeEqual + CoderProperties.coderDecodeEncodeEqualInContext(ktdCoder, Coder.Context.OUTER, ktd); } } diff --git a/runners/samza/src/test/java/org/apache/beam/runners/samza/runtime/SamzaStoreStateInternalsTest.java b/runners/samza/src/test/java/org/apache/beam/runners/samza/runtime/SamzaStoreStateInternalsTest.java index 1291e4c385bd..175d79edc7b2 100644 --- a/runners/samza/src/test/java/org/apache/beam/runners/samza/runtime/SamzaStoreStateInternalsTest.java +++ b/runners/samza/src/test/java/org/apache/beam/runners/samza/runtime/SamzaStoreStateInternalsTest.java @@ -19,9 +19,11 @@ import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertNull; import static org.junit.Assert.assertTrue; import java.io.File; +import java.io.IOException; import java.io.Serializable; import java.util.ArrayList; import java.util.Collections; @@ -32,9 +34,13 @@ import java.util.Map; import java.util.Set; import org.apache.beam.runners.samza.SamzaPipelineOptions; +import org.apache.beam.runners.samza.TestSamzaRunner; +import org.apache.beam.runners.samza.runtime.SamzaStoreStateInternals.StateValue; +import org.apache.beam.runners.samza.runtime.SamzaStoreStateInternals.StateValueSerdeFactory; import org.apache.beam.runners.samza.state.SamzaMapState; import org.apache.beam.runners.samza.state.SamzaSetState; import org.apache.beam.runners.samza.translation.ConfigBuilder; +import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.StringUtf8Coder; import org.apache.beam.sdk.coders.VarIntCoder; import org.apache.beam.sdk.options.PipelineOptionsFactory; @@ -58,6 +64,7 @@ import org.apache.samza.context.ContainerContext; import org.apache.samza.context.JobContext; import org.apache.samza.metrics.MetricsRegistry; +import org.apache.samza.serializers.Serde; import org.apache.samza.storage.StorageEngineFactory; import org.apache.samza.storage.kv.Entry; import org.apache.samza.storage.kv.KeyValueIterator; @@ -72,7 +79,6 @@ /** Tests for SamzaStoreStateInternals. */ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class SamzaStoreStateInternalsTest implements Serializable { @Rule @@ -206,16 +212,9 @@ public KeyValueStore getKVStore( /** A test store based on InMemoryKeyValueStore. 
*/ public static class TestStore extends InMemoryKeyValueStore { static List iterators = Collections.synchronizedList(new ArrayList<>()); - private final KeyValueStoreMetrics metrics; public TestStore(KeyValueStoreMetrics metrics) { super(metrics); - this.metrics = metrics; - } - - @Override - public KeyValueStoreMetrics metrics() { - return metrics; } @Override @@ -285,7 +284,9 @@ public void processElement( KV.of("hello", 97), KV.of("hello", 42), KV.of("hello", 42), KV.of("hello", 12))) .apply(ParDo.of(fn)); - Map configs = new HashMap(ConfigBuilder.localRunConfig()); + SamzaPipelineOptions options = PipelineOptionsFactory.create().as(SamzaPipelineOptions.class); + options.setRunner(TestSamzaRunner.class); + Map configs = new HashMap<>(ConfigBuilder.localRunConfig()); configs.put("stores.foo.factory", TestStorageEngine.class.getName()); pipeline.getOptions().as(SamzaPipelineOptions.class).setConfigOverride(configs); pipeline.run(); @@ -295,4 +296,24 @@ public void processElement( assertEquals(8, TestStore.iterators.size()); TestStore.iterators.forEach(iter -> assertTrue(iter.closed)); } + + @Test + public void testStateValueSerde() throws IOException { + StateValueSerdeFactory stateValueSerdeFactory = new StateValueSerdeFactory(); + Serde> serde = (Serde) stateValueSerdeFactory.getSerde("Test", null); + int value = 123; + Coder coder = VarIntCoder.of(); + + byte[] valueBytes = serde.toBytes(StateValue.of(value, coder)); + StateValue stateValue1 = serde.fromBytes(valueBytes); + StateValue stateValue2 = StateValue.of(valueBytes); + assertEquals(stateValue1.getValue(coder).intValue(), value); + assertEquals(stateValue2.getValue(coder).intValue(), value); + + Integer nullValue = null; + byte[] nullBytes = serde.toBytes(StateValue.of(nullValue, coder)); + StateValue nullStateValue = serde.fromBytes(nullBytes); + assertNull(nullBytes); + assertNull(nullStateValue.getValue(coder)); + } } diff --git a/runners/samza/src/test/java/org/apache/beam/runners/samza/runtime/SamzaTimerInternalsFactoryTest.java b/runners/samza/src/test/java/org/apache/beam/runners/samza/runtime/SamzaTimerInternalsFactoryTest.java index 9af37a92dc59..291519d6912f 100644 --- a/runners/samza/src/test/java/org/apache/beam/runners/samza/runtime/SamzaTimerInternalsFactoryTest.java +++ b/runners/samza/src/test/java/org/apache/beam/runners/samza/runtime/SamzaTimerInternalsFactoryTest.java @@ -38,17 +38,17 @@ import org.apache.beam.runners.samza.SamzaPipelineOptions; import org.apache.beam.runners.samza.runtime.SamzaStoreStateInternals.ByteArray; import org.apache.beam.runners.samza.runtime.SamzaStoreStateInternals.ByteArraySerdeFactory; +import org.apache.beam.runners.samza.runtime.SamzaStoreStateInternals.StateValue; +import org.apache.beam.runners.samza.runtime.SamzaStoreStateInternals.StateValueSerdeFactory; import org.apache.beam.sdk.coders.StringUtf8Coder; import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.state.TimeDomain; import org.apache.beam.sdk.values.PCollection; -import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.sdk.values.WindowingStrategy; import org.apache.samza.config.MapConfig; import org.apache.samza.context.TaskContext; import org.apache.samza.metrics.MetricsRegistryMap; import org.apache.samza.operators.Scheduler; -import org.apache.samza.serializers.ByteSerde; import org.apache.samza.serializers.Serde; import org.apache.samza.storage.kv.KeyValueStore; import org.apache.samza.storage.kv.KeyValueStoreMetrics; @@ -74,7 +74,7 @@ public class 
SamzaTimerInternalsFactoryTest { @Rule public transient TemporaryFolder temporaryFolder = new TemporaryFolder(); - private KeyValueStore createStore() { + private KeyValueStore> createStore() { final Options options = new Options(); options.setCreateIfMissing(true); @@ -92,25 +92,24 @@ private KeyValueStore createStore() { return new SerializedKeyValueStore<>( rocksStore, new ByteArraySerdeFactory.ByteArraySerde(), - new ByteSerde(), + new StateValueSerdeFactory.StateValueSerde(), new SerializedKeyValueStoreMetrics("beamStore", new MetricsRegistryMap())); } private static SamzaStoreStateInternals.Factory createNonKeyedStateInternalsFactory( - SamzaPipelineOptions pipelineOptions, KeyValueStore store) { + SamzaPipelineOptions pipelineOptions, KeyValueStore> store) { final TaskContext context = mock(TaskContext.class); when(context.getStore(anyString())).thenReturn((KeyValueStore) store); - final TupleTag mainOutputTag = new TupleTag<>("output"); - return SamzaStoreStateInternals.createStateInternalFactory( - "42", null, context, pipelineOptions, null); + return SamzaStoreStateInternals.createNonKeyedStateInternalsFactory( + "42", context, pipelineOptions); } private static SamzaTimerInternalsFactory createTimerInternalsFactory( Scheduler> timerRegistry, String timerStateId, SamzaPipelineOptions pipelineOptions, - KeyValueStore store) { + KeyValueStore> store) { final SamzaStoreStateInternals.Factory nonKeyedStateInternalsFactory = createNonKeyedStateInternalsFactory(pipelineOptions, store); @@ -144,7 +143,7 @@ public void testEventTimeTimers() { final SamzaPipelineOptions pipelineOptions = PipelineOptionsFactory.create().as(SamzaPipelineOptions.class); - final KeyValueStore store = createStore(); + final KeyValueStore> store = createStore(); final SamzaTimerInternalsFactory timerInternalsFactory = createTimerInternalsFactory(null, "timer", pipelineOptions, store); @@ -177,12 +176,69 @@ public void testEventTimeTimers() { store.close(); } + @Test + public void testRestoreEventBufferSize() throws Exception { + final SamzaPipelineOptions pipelineOptions = + PipelineOptionsFactory.create().as(SamzaPipelineOptions.class); + + KeyValueStore> store = createStore(); + final SamzaTimerInternalsFactory timerInternalsFactory = + createTimerInternalsFactory(null, "timer", pipelineOptions, store); + + final String key = "testKey"; + final StateNamespace nameSpace = StateNamespaces.global(); + final TimerInternals timerInternals = timerInternalsFactory.timerInternalsForKey(key); + final TimerInternals.TimerData timer1 = + TimerInternals.TimerData.of( + "timer1", nameSpace, new Instant(10), new Instant(10), TimeDomain.EVENT_TIME); + timerInternals.setTimer(timer1); + + store.close(); + + // restore by creating a new instance + store = createStore(); + + final SamzaTimerInternalsFactory restoredFactory = + createTimerInternalsFactory(null, "timer", pipelineOptions, store); + assertEquals(1, restoredFactory.getEventTimeBuffer().size()); + + restoredFactory.setInputWatermark(new Instant(150)); + Collection> readyTimers = restoredFactory.removeReadyTimers(); + assertEquals(1, readyTimers.size()); + + // Timer 1 should be evicted from buffer + assertTrue(restoredFactory.getEventTimeBuffer().isEmpty()); + final TimerInternals restoredTimerInternals = restoredFactory.timerInternalsForKey(key); + final TimerInternals.TimerData timer2 = + TimerInternals.TimerData.of( + "timer2", nameSpace, new Instant(200), new Instant(200), TimeDomain.EVENT_TIME); + restoredTimerInternals.setTimer(timer2); + + // Timer 2 should 
be added to the Event buffer + assertEquals(1, restoredFactory.getEventTimeBuffer().size()); + // Timer 2 should not be ready + readyTimers = restoredFactory.removeReadyTimers(); + assertEquals(0, readyTimers.size()); + + restoredFactory.setInputWatermark(new Instant(250)); + + // Timer 2 should be ready + readyTimers = restoredFactory.removeReadyTimers(); + assertEquals(1, readyTimers.size()); + ByteArrayOutputStream baos = new ByteArrayOutputStream(); + StringUtf8Coder.of().encode(key, baos); + byte[] keyBytes = baos.toByteArray(); + assertEquals(readyTimers, Arrays.asList(new KeyedTimerData<>(keyBytes, key, timer2))); + + store.close(); + } + @Test public void testRestore() throws Exception { final SamzaPipelineOptions pipelineOptions = PipelineOptionsFactory.create().as(SamzaPipelineOptions.class); - KeyValueStore store = createStore(); + KeyValueStore> store = createStore(); final SamzaTimerInternalsFactory timerInternalsFactory = createTimerInternalsFactory(null, "timer", pipelineOptions, store); @@ -227,7 +283,7 @@ public void testProcessingTimeTimers() throws IOException { final SamzaPipelineOptions pipelineOptions = PipelineOptionsFactory.create().as(SamzaPipelineOptions.class); - KeyValueStore store = createStore(); + KeyValueStore> store = createStore(); TestTimerRegistry timerRegistry = new TestTimerRegistry(); final SamzaTimerInternalsFactory timerInternalsFactory = @@ -245,7 +301,16 @@ public void testProcessingTimeTimers() throws IOException { "timer2", nameSpace, new Instant(100), new Instant(100), TimeDomain.PROCESSING_TIME); timerInternals.setTimer(timer2); - assertEquals(2, timerRegistry.timers.size()); + final TimerInternals.TimerData timer3 = + TimerInternals.TimerData.of( + "timer3", + "timerFamilyId3", + nameSpace, + new Instant(100), + new Instant(100), + TimeDomain.PROCESSING_TIME); + timerInternals.setTimer(timer3); + assertEquals(3, timerRegistry.timers.size()); store.close(); @@ -255,14 +320,14 @@ public void testProcessingTimeTimers() throws IOException { final SamzaTimerInternalsFactory restoredFactory = createTimerInternalsFactory(restoredRegistry, "timer", pipelineOptions, store); - assertEquals(2, restoredRegistry.timers.size()); + assertEquals(3, restoredRegistry.timers.size()); final ByteArrayOutputStream baos = new ByteArrayOutputStream(); StringUtf8Coder.of().encode("testKey", baos); final byte[] keyBytes = baos.toByteArray(); - restoredFactory.removeProcessingTimer(new KeyedTimerData(keyBytes, "testKey", timer1)); - restoredFactory.removeProcessingTimer(new KeyedTimerData(keyBytes, "testKey", timer2)); - + restoredFactory.removeProcessingTimer(new KeyedTimerData<>(keyBytes, "testKey", timer1)); + restoredFactory.removeProcessingTimer(new KeyedTimerData<>(keyBytes, "testKey", timer2)); + restoredFactory.removeProcessingTimer(new KeyedTimerData<>(keyBytes, "testKey", timer3)); store.close(); } @@ -271,7 +336,7 @@ public void testOverride() { final SamzaPipelineOptions pipelineOptions = PipelineOptionsFactory.create().as(SamzaPipelineOptions.class); - KeyValueStore store = createStore(); + KeyValueStore> store = createStore(); final SamzaTimerInternalsFactory timerInternalsFactory = createTimerInternalsFactory(null, "timer", pipelineOptions, store); @@ -309,6 +374,303 @@ public void testOverride() { store.close(); } + /** + * Test the number of expired event timers for each watermark does not exceed the predefined + * limit. 
+ */ + @Test + public void testMaxExpiredEventTimersProcessAtOnce() { + // If maxExpiredTimersToProcessOnce <= the number of expired timers, then load + // "maxExpiredTimersToProcessOnce" timers. + testMaxExpiredEventTimersProcessAtOnce(10, 10, 5, 5); + testMaxExpiredEventTimersProcessAtOnce(10, 10, 10, 10); + + // If maxExpiredTimersToProcessOnce > the number of expired timers, then load all the ready + // timers. + testMaxExpiredEventTimersProcessAtOnce(10, 10, 20, 10); + } + + private void testMaxExpiredEventTimersProcessAtOnce( + int totalNumberOfTimersInStore, + int totalNumberOfExpiredTimers, + int maxExpiredTimersToProcessOnce, + int expectedExpiredTimersToProcess) { + final SamzaPipelineOptions pipelineOptions = + PipelineOptionsFactory.create().as(SamzaPipelineOptions.class); + pipelineOptions.setMaxReadyTimersToProcessOnce(maxExpiredTimersToProcessOnce); + + final KeyValueStore> store = createStore(); + final SamzaTimerInternalsFactory timerInternalsFactory = + createTimerInternalsFactory(null, "timer", pipelineOptions, store); + + final StateNamespace nameSpace = StateNamespaces.global(); + final TimerInternals timerInternals = timerInternalsFactory.timerInternalsForKey("testKey"); + + TimerInternals.TimerData timer; + for (int i = 0; i < totalNumberOfTimersInStore; i++) { + timer = + TimerInternals.TimerData.of( + "timer" + i, nameSpace, new Instant(i), new Instant(i), TimeDomain.EVENT_TIME); + timerInternals.setTimer(timer); + } + + // Set the timestamp of the input watermark to be the value of totalNumberOfExpiredTimers + // so that totalNumberOfExpiredTimers timers are expected to be expired with respect to this + // watermark. + final Instant inputWatermark = new Instant(totalNumberOfExpiredTimers); + timerInternalsFactory.setInputWatermark(inputWatermark); + final Collection> readyTimers = + timerInternalsFactory.removeReadyTimers(); + assertEquals(expectedExpiredTimersToProcess, readyTimers.size()); + store.close(); + } + + @Test + public void testBufferSizeNotExceedingPipelineOptionValue() { + final SamzaPipelineOptions pipelineOptions = + PipelineOptionsFactory.create().as(SamzaPipelineOptions.class); + pipelineOptions.setEventTimerBufferSize(2); + + final KeyValueStore> store = createStore(); + final SamzaTimerInternalsFactory timerInternalsFactory = + createTimerInternalsFactory(null, "timer", pipelineOptions, store); + + final StateNamespace nameSpace = StateNamespaces.global(); + final TimerInternals timerInternals = timerInternalsFactory.timerInternalsForKey("testKey"); + + // prepare 5 timers. + // timers in memory are then timestamped from 0 - 1; + // timers in store are then timestamped from 0 - 4. + for (int i = 0; i < 5; i++) { + timerInternals.setTimer( + nameSpace, "timer" + i, "", new Instant(i), new Instant(i), TimeDomain.EVENT_TIME); + } + + // only two timers are supposed to be in the event time buffer + assertEquals(2, timerInternalsFactory.getEventTimeBuffer().size()); + + store.close(); + } + + @Test + public void testAllTimersAreFiredWithReload() { + final SamzaPipelineOptions pipelineOptions = + PipelineOptionsFactory.create().as(SamzaPipelineOptions.class); + pipelineOptions.setEventTimerBufferSize(2); + + final KeyValueStore> store = createStore(); + final SamzaTimerInternalsFactory timerInternalsFactory = + createTimerInternalsFactory(null, "timer", pipelineOptions, store); + + final StateNamespace nameSpace = StateNamespaces.global(); + final TimerInternals timerInternals = timerInternalsFactory.timerInternalsForKey("testKey"); + + // prepare 3 timers. 
+ // timers in memory now are timestamped from 0 - 1; + // timers in store now are timestamped from 0 - 2. + for (int i = 0; i < 3; i++) { + timerInternals.setTimer( + nameSpace, "timer" + i, "", new Instant(i), new Instant(i), TimeDomain.EVENT_TIME); + } + + // total number of event time timers to fire equals the number of timers in store + Collection> readyTimers; + timerInternalsFactory.setInputWatermark(new Instant(3)); + readyTimers = timerInternalsFactory.removeReadyTimers(); + // buffer should reload from store and all timers are supposed to be fired. + assertEquals(3, readyTimers.size()); + + store.close(); + } + + /** + * Test that the total number of event time timers reloaded into memory matches the number of + * event time timers written to the store. Moreover, event time timers reloaded into memory are + * maintained in order. + */ + @Test + public void testAllTimersAreFiredInOrder() { + final SamzaPipelineOptions pipelineOptions = + PipelineOptionsFactory.create().as(SamzaPipelineOptions.class); + pipelineOptions.setEventTimerBufferSize(5); + + final KeyValueStore> store = createStore(); + final SamzaTimerInternalsFactory timerInternalsFactory = + createTimerInternalsFactory(null, "timer", pipelineOptions, store); + + final StateNamespace nameSpace = StateNamespaces.global(); + final TimerInternals timerInternals = timerInternalsFactory.timerInternalsForKey("testKey"); + + // prepare 8 timers. + // timers in memory now are timestamped from 0 - 4; + // timers in store now are timestamped from 0 - 7. + for (int i = 0; i < 8; i++) { + timerInternals.setTimer( + nameSpace, "timer" + i, "", new Instant(i), new Instant(i), TimeDomain.EVENT_TIME); + } + + // fire the first 2 timers. + // timers in memory now are timestamped from 2 - 4; + // timers in store now are timestamped from 2 - 7. + Collection> readyTimers; + timerInternalsFactory.setInputWatermark(new Instant(1)); + long lastTimestamp = 0; + readyTimers = timerInternalsFactory.removeReadyTimers(); + for (KeyedTimerData keyedTimerData : readyTimers) { + final long currentTimeStamp = keyedTimerData.getTimerData().getTimestamp().getMillis(); + assertTrue(lastTimestamp <= currentTimeStamp); + lastTimestamp = currentTimeStamp; + } + assertEquals(2, readyTimers.size()); + + // add another 12 timers. + // timers in memory (reloaded three times) now are timestamped from 2 - 4; 5 - 9; 10 - 14; + // 15 - 19. + // timers in store now are timestamped from 2 - 19. + // the total number of timers to fire is 18. 
+ for (int i = 8; i < 20; i++) { + timerInternals.setTimer( + nameSpace, "timer" + i, "", new Instant(i), new Instant(i), TimeDomain.EVENT_TIME); + } + timerInternalsFactory.setInputWatermark(new Instant(20)); + lastTimestamp = 0; + readyTimers = timerInternalsFactory.removeReadyTimers(); + for (KeyedTimerData keyedTimerData : readyTimers) { + final long currentTimeStamp = keyedTimerData.getTimerData().getTimestamp().getMillis(); + assertTrue(lastTimestamp <= currentTimeStamp); + lastTimestamp = currentTimeStamp; + } + assertEquals(18, readyTimers.size()); + + store.close(); + } + + @Test + public void testNewTimersAreInsertedInOrder() { + final SamzaPipelineOptions pipelineOptions = + PipelineOptionsFactory.create().as(SamzaPipelineOptions.class); + pipelineOptions.setEventTimerBufferSize(5); + + final KeyValueStore> store = createStore(); + final SamzaTimerInternalsFactory timerInternalsFactory = + createTimerInternalsFactory(null, "timer", pipelineOptions, store); + + final StateNamespace nameSpace = StateNamespaces.global(); + final TimerInternals timerInternals = timerInternalsFactory.timerInternalsForKey("testKey"); + + // prepare 10 timers. + // timers in memory now are timestamped from 0 - 4; + // timers in store now are timestamped from 0 - 9. + for (int i = 0; i < 10; i++) { + timerInternals.setTimer( + nameSpace, "timer" + i, "", new Instant(i), new Instant(i), TimeDomain.EVENT_TIME); + } + + // fire the first 2 timers. + // timers in memory now are timestamped from 2 - 4; + // timers in store now are timestamped from 2 - 9. + Collection> readyTimers; + timerInternalsFactory.setInputWatermark(new Instant(1)); + long lastTimestamp = 0; + readyTimers = timerInternalsFactory.removeReadyTimers(); + for (KeyedTimerData keyedTimerData : readyTimers) { + final long currentTimeStamp = keyedTimerData.getTimerData().getTimestamp().getMillis(); + assertTrue(lastTimestamp <= currentTimeStamp); + lastTimestamp = currentTimeStamp; + } + assertEquals(2, readyTimers.size()); + + // add 3 timers, but timer 2 is a duplicate so it is dropped. + // timers in memory now are timestamped from 0 to 2 (prefixed with lateTimer) and 2 to + // 4 (prefixed with timer), in timestamp order; + // timers in store now are timestamped from 0 to 2 (prefixed with lateTimer) and 2 to 9 + // (prefixed with timer), in timestamp order; + for (int i = 0; i < 3; i++) { + timerInternals.setTimer( + nameSpace, "timer" + i, "", new Instant(i), new Instant(i), TimeDomain.EVENT_TIME); + } + + // there are 11 timers in state now. + // watermark 5 comes, so 6 timers will be evicted because their timestamp is less than 5. + // memory will be reloaded once, leaving 5 to 8 (the reload brings in 4 to 8, but 4 is evicted); 5 + // to 9 are left in store. + // all of them are in order for firing. + timerInternalsFactory.setInputWatermark(new Instant(5)); + lastTimestamp = 0; + readyTimers = timerInternalsFactory.removeReadyTimers(); + for (KeyedTimerData keyedTimerData : readyTimers) { + final long currentTimeStamp = keyedTimerData.getTimerData().getTimestamp().getMillis(); + assertTrue(lastTimestamp <= currentTimeStamp); + lastTimestamp = currentTimeStamp; + } + assertEquals(6, readyTimers.size()); + assertEquals(4, timerInternalsFactory.getEventTimeBuffer().size()); + + // watermark 10 comes, so all timers will be evicted in order. 
+ timerInternalsFactory.setInputWatermark(new Instant(10)); + readyTimers = timerInternalsFactory.removeReadyTimers(); + for (KeyedTimerData keyedTimerData : readyTimers) { + final long currentTimeStamp = keyedTimerData.getTimerData().getTimestamp().getMillis(); + assertTrue(lastTimestamp <= currentTimeStamp); + lastTimestamp = currentTimeStamp; + } + assertEquals(4, readyTimers.size()); + assertEquals(0, timerInternalsFactory.getEventTimeBuffer().size()); + + store.close(); + } + + @Test + public void testBufferRefilledAfterRestoreToNonFullState() { + final SamzaPipelineOptions pipelineOptions = + PipelineOptionsFactory.create().as(SamzaPipelineOptions.class); + pipelineOptions.setEventTimerBufferSize(5); + + final KeyValueStore> store = createStore(); + final SamzaTimerInternalsFactory timerInternalsFactory = + createTimerInternalsFactory(null, "timer", pipelineOptions, store); + + final StateNamespace nameSpace = StateNamespaces.global(); + final TimerInternals timerInternals = timerInternalsFactory.timerInternalsForKey("testKey"); + + // prepare 6 timers (buffer capacity + 1). + // timers in memory now are timestamped from 0 - 4; + // timer in store now is timestamped 5. + for (int i = 0; i < 6; i++) { + timerInternals.setTimer( + nameSpace, "timer" + i, "", new Instant(i), new Instant(i), TimeDomain.EVENT_TIME); + } + + // total number of event time timers to fire equals the number of timers in store + Collection> readyTimers; + timerInternalsFactory.setInputWatermark(new Instant(4)); + readyTimers = timerInternalsFactory.removeReadyTimers(); + assertEquals(5, readyTimers.size()); + // reloaded timer5 + assertEquals(1, timerInternalsFactory.getEventTimeBuffer().size()); + + for (int i = 6; i < 13; i++) { + timerInternals.setTimer( + nameSpace, "timer" + i, "", new Instant(i), new Instant(i), TimeDomain.EVENT_TIME); + } + // timers should go into the buffer, not the state + assertEquals(5, timerInternalsFactory.getEventTimeBuffer().size()); + + // watermark 10 comes, 6 timers will be evicted in order and 2 remain in the buffer. 
+ timerInternalsFactory.setInputWatermark(new Instant(10)); + readyTimers = timerInternalsFactory.removeReadyTimers(); + long lastTimestamp = 0; + for (KeyedTimerData keyedTimerData : readyTimers) { + final long currentTimeStamp = keyedTimerData.getTimerData().getTimestamp().getMillis(); + assertTrue(lastTimestamp <= currentTimeStamp); + lastTimestamp = currentTimeStamp; + } + assertEquals(6, readyTimers.size()); + assertEquals(2, timerInternalsFactory.getEventTimeBuffer().size()); + + store.close(); + } + @Test public void testByteArray() { ByteArray key1 = ByteArray.of("hello world".getBytes(StandardCharsets.UTF_8)); diff --git a/runners/samza/src/test/java/org/apache/beam/runners/samza/translation/ConfigGeneratorTest.java b/runners/samza/src/test/java/org/apache/beam/runners/samza/translation/ConfigGeneratorTest.java index 36317b16947e..bb8e67b453e5 100644 --- a/runners/samza/src/test/java/org/apache/beam/runners/samza/translation/ConfigGeneratorTest.java +++ b/runners/samza/src/test/java/org/apache/beam/runners/samza/translation/ConfigGeneratorTest.java @@ -19,9 +19,11 @@ import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNull; +import static org.junit.Assert.assertThrows; import static org.junit.Assert.assertTrue; import java.util.Map; +import java.util.Objects; import org.apache.beam.runners.samza.SamzaExecutionEnvironment; import org.apache.beam.runners.samza.SamzaPipelineOptions; import org.apache.beam.runners.samza.SamzaRunner; @@ -32,6 +34,8 @@ import org.apache.beam.sdk.state.ValueState; import org.apache.beam.sdk.transforms.Create; import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.Filter; +import org.apache.beam.sdk.transforms.Impulse; import org.apache.beam.sdk.transforms.ParDo; import org.apache.beam.sdk.transforms.Sum; import org.apache.beam.sdk.values.KV; @@ -46,19 +50,17 @@ import org.apache.samza.runtime.LocalApplicationRunner; import org.apache.samza.runtime.RemoteApplicationRunner; import org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory; +import org.apache.samza.storage.kv.inmemory.InMemoryKeyValueStorageEngineFactory; import org.apache.samza.zk.ZkJobCoordinatorFactory; import org.junit.Test; /** Test config generations for {@link org.apache.beam.runners.samza.SamzaRunner}. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ConfigGeneratorTest { private static final String APP_RUNNER_CLASS = "app.runner.class"; private static final String JOB_FACTORY_CLASS = "job.factory.class"; @Test - public void testBeamStoreConfig() { + public void testStatefulBeamStoreConfig() { SamzaPipelineOptions options = PipelineOptionsFactory.create().as(SamzaPipelineOptions.class); options.setJobName("TestStoreConfig"); options.setRunner(SamzaRunner.class); @@ -77,7 +79,7 @@ public void testBeamStoreConfig() { RocksDbKeyValueStorageEngineFactory.class.getName(), config.get("stores.beamStore.factory")); assertEquals("byteArraySerde", config.get("stores.beamStore.key.serde")); - assertEquals("byteSerde", config.get("stores.beamStore.msg.serde")); + assertEquals("stateValueSerde", config.get("stores.beamStore.msg.serde")); assertNull(config.get("stores.beamStore.changelog")); options.setStateDurable(true); @@ -87,6 +89,36 @@ public void testBeamStoreConfig() { "TestStoreConfig-1-beamStore-changelog", config2.get("stores.beamStore.changelog")); } + @Test + public void testStatelessBeamStoreConfig() { + SamzaPipelineOptions options = PipelineOptionsFactory.create().as(SamzaPipelineOptions.class); + options.setJobName("TestStoreConfig"); + options.setRunner(SamzaRunner.class); + + Pipeline pipeline = Pipeline.create(options); + pipeline.apply(Impulse.create()).apply(Filter.by(Objects::nonNull)); + + pipeline.replaceAll(SamzaTransformOverrides.getDefaultOverrides()); + + final Map idMap = PViewToIdMapper.buildIdMap(pipeline); + final ConfigBuilder configBuilder = new ConfigBuilder(options); + SamzaPipelineTranslator.createConfig(pipeline, options, idMap, configBuilder); + final Config config = configBuilder.build(); + + assertEquals( + InMemoryKeyValueStorageEngineFactory.class.getName(), + config.get("stores.beamStore.factory")); + assertEquals("byteArraySerde", config.get("stores.beamStore.key.serde")); + assertEquals("stateValueSerde", config.get("stores.beamStore.msg.serde")); + assertNull(config.get("stores.beamStore.changelog")); + + options.setStateDurable(true); + SamzaPipelineTranslator.createConfig(pipeline, options, idMap, configBuilder); + final Config config2 = configBuilder.build(); + // For stateless jobs, ignore state durable pipeline option. 
+ assertNull(config2.get("stores.beamStore.changelog")); + } + @Test public void testSamzaLocalExecutionEnvironmentConfig() { SamzaPipelineOptions options = PipelineOptionsFactory.create().as(SamzaPipelineOptions.class); @@ -207,7 +239,7 @@ public void processElement( RocksDbKeyValueStorageEngineFactory.class.getName(), config.get("stores.testState.factory")); assertEquals("byteArraySerde", config.get("stores.testState.key.serde")); - assertEquals("byteSerde", config.get("stores.testState.msg.serde")); + assertEquals("stateValueSerde", config.get("stores.testState.msg.serde")); assertNull(config.get("stores.testState.changelog")); options.setStateDurable(true); @@ -216,4 +248,49 @@ public void processElement( assertEquals( "TestStoreConfig-1-testState-changelog", config2.get("stores.testState.changelog")); } + + @Test + public void testDuplicateStateIdConfig() { + SamzaPipelineOptions options = PipelineOptionsFactory.create().as(SamzaPipelineOptions.class); + options.setJobName("TestStoreConfig"); + options.setRunner(SamzaRunner.class); + + Pipeline pipeline = Pipeline.create(options); + pipeline + .apply( + Create.empty(TypeDescriptors.kvs(TypeDescriptors.strings(), TypeDescriptors.strings()))) + .apply( + ParDo.of( + new DoFn, KV>() { + private static final String testState = "testState"; + + @StateId(testState) + private final StateSpec> state = StateSpecs.value(); + + @ProcessElement + public void processElement( + ProcessContext context, @StateId(testState) ValueState state) { + context.output(context.element()); + } + })) + .apply( + ParDo.of( + new DoFn, Void>() { + private static final String testState = "testState"; + + @StateId(testState) + private final StateSpec> state = StateSpecs.value(); + + @ProcessElement + public void processElement( + ProcessContext context, @StateId(testState) ValueState state) {} + })); + + final Map idMap = PViewToIdMapper.buildIdMap(pipeline); + final ConfigBuilder configBuilder = new ConfigBuilder(options); + + assertThrows( + IllegalStateException.class, + () -> SamzaPipelineTranslator.createConfig(pipeline, options, idMap, configBuilder)); + } } diff --git a/runners/samza/src/test/java/org/apache/beam/runners/samza/translation/SamzaImpulseSystemTest.java b/runners/samza/src/test/java/org/apache/beam/runners/samza/translation/SamzaImpulseSystemTest.java index 7eba0ff214cf..46a04f1ccca6 100644 --- a/runners/samza/src/test/java/org/apache/beam/runners/samza/translation/SamzaImpulseSystemTest.java +++ b/runners/samza/src/test/java/org/apache/beam/runners/samza/translation/SamzaImpulseSystemTest.java @@ -36,7 +36,6 @@ */ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class SamzaImpulseSystemTest { @Test diff --git a/runners/samza/src/test/java/org/apache/beam/runners/samza/translation/TranslationContextTest.java b/runners/samza/src/test/java/org/apache/beam/runners/samza/translation/TranslationContextTest.java new file mode 100644 index 000000000000..8827f1e5622c --- /dev/null +++ b/runners/samza/src/test/java/org/apache/beam/runners/samza/translation/TranslationContextTest.java @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.samza.translation; + +import static org.junit.Assert.assertNotNull; +import static org.mockito.Mockito.mock; + +import java.util.Arrays; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; +import org.apache.beam.runners.samza.SamzaPipelineOptions; +import org.apache.beam.runners.samza.runtime.OpMessage; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PValue; +import org.apache.samza.application.descriptors.StreamApplicationDescriptor; +import org.apache.samza.application.descriptors.StreamApplicationDescriptorImpl; +import org.apache.samza.config.Config; +import org.apache.samza.config.MapConfig; +import org.apache.samza.operators.KV; +import org.apache.samza.operators.MessageStream; +import org.apache.samza.operators.functions.MapFunction; +import org.apache.samza.serializers.KVSerde; +import org.apache.samza.serializers.NoOpSerde; +import org.apache.samza.serializers.Serde; +import org.apache.samza.system.descriptors.GenericInputDescriptor; +import org.apache.samza.system.descriptors.GenericSystemDescriptor; +import org.junit.Test; + +@SuppressWarnings({"rawtypes"}) +public class TranslationContextTest { + private final GenericInputDescriptor testInputDescriptor = + new GenericSystemDescriptor("mockSystem", "mockFactoryClassName") + .getInputDescriptor("test-input-1", mock(Serde.class)); + MapFunction keyFn = m -> m.toString(); + MapFunction valueFn = m -> m; + private final String streamName = "testStream"; + KVSerde serde = KVSerde.of(new NoOpSerde<>(), new NoOpSerde<>()); + StreamApplicationDescriptor streamApplicationDescriptor = + new StreamApplicationDescriptorImpl( + appDesc -> { + MessageStream inputStream = appDesc.getInputStream(testInputDescriptor); + inputStream.partitionBy(keyFn, valueFn, serde, streamName); + }, + getConfig()); + Map idMap = new HashMap<>(); + TranslationContext translationContext = + new TranslationContext(streamApplicationDescriptor, idMap, mock(SamzaPipelineOptions.class)); + + @Test + public void testRegisterInputMessageStreams() { + final PCollection output = mock(PCollection.class); + List topics = Arrays.asList("stream1", "stream2"); + List inputDescriptors = + topics.stream() + .map(topicName -> createSamzaInputDescriptor(topicName, topicName)) + .collect(Collectors.toList()); + + translationContext.registerInputMessageStreams(output, inputDescriptors); + + assertNotNull(translationContext.getMessageStream(output)); + } + + public GenericInputDescriptor>> createSamzaInputDescriptor( + String systemName, String streamId) { + final Serde>> kvSerde = + KVSerde.of(new NoOpSerde<>(), new NoOpSerde<>()); + return new GenericSystemDescriptor(systemName, "factoryClass") + .getInputDescriptor(streamId, kvSerde); + } + + private static Config getConfig() { + HashMap configMap = new HashMap<>(); + configMap.put("job.name", "testJobName"); + 
configMap.put("job.id", "testJobId"); + return new MapConfig(configMap); + } +} diff --git a/runners/samza/src/test/java/org/apache/beam/runners/samza/util/FutureUtilsTest.java b/runners/samza/src/test/java/org/apache/beam/runners/samza/util/FutureUtilsTest.java new file mode 100644 index 000000000000..357fc6fa2396 --- /dev/null +++ b/runners/samza/src/test/java/org/apache/beam/runners/samza/util/FutureUtilsTest.java @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.samza.util; + +import java.util.Collection; +import java.util.List; +import java.util.concurrent.CompletableFuture; +import java.util.concurrent.CompletionStage; +import java.util.concurrent.CountDownLatch; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.junit.Assert; +import org.junit.Test; + +/** Unit tests for {@linkplain FutureUtils}. */ +public final class FutureUtilsTest { + private static final List RESULTS = ImmutableList.of("hello", "world"); + + @Test + public void testFlattenFuturesForCollection() { + CompletionStage> resultFuture = + FutureUtils.flattenFutures( + ImmutableList.of( + CompletableFuture.completedFuture("hello"), + CompletableFuture.completedFuture("world"))); + + CompletionStage validationFuture = + resultFuture.thenAccept( + actualResults -> { + Assert.assertEquals( + "Expected flattened results to contain {hello, world}", RESULTS, actualResults); + }); + + validationFuture.toCompletableFuture().join(); + } + + @Test + public void testFlattenFuturesForFailedFuture() { + CompletionStage> resultFuture = + FutureUtils.flattenFutures( + ImmutableList.of( + CompletableFuture.completedFuture("hello"), + createFailedFuture(new RuntimeException()))); + + CompletionStage validationFuture = + resultFuture.handle( + (results, ex) -> { + Assert.assertTrue( + "Expected exception to be of RuntimeException", ex instanceof RuntimeException); + return null; + }); + + validationFuture.toCompletableFuture().join(); + } + + @Test + public void testWaitForAllFutures() { + CountDownLatch latch = new CountDownLatch(1); + CompletionStage> resultFuture = + FutureUtils.flattenFutures( + ImmutableList.of( + CompletableFuture.supplyAsync( + () -> { + try { + latch.await(); + } catch (InterruptedException e) { + return ""; + } + + return "hello"; + }), + CompletableFuture.supplyAsync( + () -> { + latch.countDown(); + return "world"; + }))); + + CompletionStage validationFuture = + resultFuture.thenAccept( + actualResults -> { + Assert.assertEquals( + "Expected flattened results to contain {hello, world}", RESULTS, actualResults); + }); + + validationFuture.toCompletableFuture().join(); + } + + private static CompletionStage createFailedFuture(Throwable t) { + CompletableFuture 
future = new CompletableFuture<>(); + future.completeExceptionally(t); + return future; + } +} diff --git a/runners/samza/src/test/java/org/apache/beam/runners/samza/util/PipelineJsonRendererTest.java b/runners/samza/src/test/java/org/apache/beam/runners/samza/util/PipelineJsonRendererTest.java new file mode 100644 index 000000000000..70812d69fe6d --- /dev/null +++ b/runners/samza/src/test/java/org/apache/beam/runners/samza/util/PipelineJsonRendererTest.java @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.samza.util; + +import static org.junit.Assert.assertEquals; + +import com.google.gson.JsonParser; +import java.io.IOException; +import java.nio.charset.StandardCharsets; +import java.nio.file.Files; +import java.nio.file.Paths; +import org.apache.beam.runners.samza.SamzaPipelineOptions; +import org.apache.beam.runners.samza.SamzaRunner; +import org.apache.beam.sdk.Pipeline; +import org.apache.beam.sdk.options.PipelineOptionsFactory; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.Sum; +import org.apache.beam.sdk.transforms.windowing.FixedWindows; +import org.apache.beam.sdk.transforms.windowing.Window; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.TimestampedValue; +import org.joda.time.Duration; +import org.joda.time.Instant; +import org.junit.Test; + +/** Tests for {@link org.apache.beam.runners.samza.util.PipelineJsonRenderer}. 
*/ +public class PipelineJsonRendererTest { + + @Test + public void testEmptyPipeline() { + SamzaPipelineOptions options = PipelineOptionsFactory.create().as(SamzaPipelineOptions.class); + options.setRunner(SamzaRunner.class); + + Pipeline p = Pipeline.create(options); + + String jsonDag = + "{ \"RootNode\": [" + + " { \"fullName\":\"OuterMostNode\"," + + " \"ChildNodes\":[ ]}],\"graphLinks\": []" + + "}"; + + System.out.println(PipelineJsonRenderer.toJsonString(p)); + assertEquals( + JsonParser.parseString(jsonDag), + JsonParser.parseString( + PipelineJsonRenderer.toJsonString(p).replaceAll(System.lineSeparator(), ""))); + } + + @Test + public void testCompositePipeline() throws IOException { + SamzaPipelineOptions options = PipelineOptionsFactory.create().as(SamzaPipelineOptions.class); + options.setRunner(SamzaRunner.class); + + Pipeline p = Pipeline.create(options); + + p.apply(Create.timestamped(TimestampedValue.of(KV.of(1, 1), new Instant(1)))) + .apply(Window.into(FixedWindows.of(Duration.millis(10)))) + .apply(Sum.integersPerKey()); + + String jsonDagFileName = "src/test/resources/ExpectedDag.json"; + String jsonDag = + new String(Files.readAllBytes(Paths.get(jsonDagFileName)), StandardCharsets.UTF_8); + + assertEquals( + JsonParser.parseString(jsonDag), + JsonParser.parseString( + PipelineJsonRenderer.toJsonString(p).replaceAll(System.lineSeparator(), ""))); + } +} diff --git a/runners/samza/src/test/java/org/apache/beam/runners/samza/util/TestHashIdGenerator.java b/runners/samza/src/test/java/org/apache/beam/runners/samza/util/TestHashIdGenerator.java index 82363d5f92b5..881ce7f91c43 100644 --- a/runners/samza/src/test/java/org/apache/beam/runners/samza/util/TestHashIdGenerator.java +++ b/runners/samza/src/test/java/org/apache/beam/runners/samza/util/TestHashIdGenerator.java @@ -31,9 +31,6 @@ import org.junit.Test; /** Test class for {@link HashIdGenerator}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TestHashIdGenerator { @Test diff --git a/runners/samza/src/test/java/org/apache/beam/runners/samza/util/WindowUtilsTest.java b/runners/samza/src/test/java/org/apache/beam/runners/samza/util/WindowUtilsTest.java new file mode 100644 index 000000000000..a76811b05102 --- /dev/null +++ b/runners/samza/src/test/java/org/apache/beam/runners/samza/util/WindowUtilsTest.java @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.samza.util; + +import static org.junit.Assert.assertEquals; + +import java.io.IOException; +import org.apache.beam.runners.core.construction.Environments; +import org.apache.beam.runners.core.construction.SdkComponents; +import org.apache.beam.sdk.Pipeline; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.KvCoder; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.coders.VarLongCoder; +import org.apache.beam.sdk.coders.VoidCoder; +import org.apache.beam.sdk.transforms.windowing.FixedWindows; +import org.apache.beam.sdk.transforms.windowing.IntervalWindow; +import org.apache.beam.sdk.transforms.windowing.TimestampCombiner; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.WindowingStrategy; +import org.joda.time.Duration; +import org.junit.Test; + +/** Unit tests for {@link WindowUtils}. */ +public class WindowUtilsTest { + + @Test + public void testGetWindowStrategy() throws IOException { + SdkComponents components = SdkComponents.create(); + String environmentId = + components.registerEnvironment(Environments.createDockerEnvironment("java")); + WindowingStrategy expected = + WindowingStrategy.of(FixedWindows.of(Duration.standardMinutes(1))) + .withMode(WindowingStrategy.AccumulationMode.DISCARDING_FIRED_PANES) + .withTimestampCombiner(TimestampCombiner.END_OF_WINDOW) + .withAllowedLateness(Duration.ZERO) + .withEnvironmentId(environmentId); + components.registerWindowingStrategy(expected); + String collectionId = + components.registerPCollection( + PCollection.createPrimitiveOutputInternal( + Pipeline.create(), expected, PCollection.IsBounded.BOUNDED, VoidCoder.of()) + .setName("name")); + + WindowingStrategy actual = + WindowUtils.getWindowStrategy(collectionId, components.toComponents()); + + assertEquals(expected, actual); + } + + @Test + public void testInstantiateWindowedCoder() throws IOException { + Coder> expectedValueCoder = + KvCoder.of(VarLongCoder.of(), StringUtf8Coder.of()); + SdkComponents components = SdkComponents.create(); + components.registerEnvironment(Environments.createDockerEnvironment("java")); + String collectionId = + components.registerPCollection( + PCollection.createPrimitiveOutputInternal( + Pipeline.create(), + WindowingStrategy.globalDefault(), + PCollection.IsBounded.BOUNDED, + expectedValueCoder) + .setName("name")); + + assertEquals( + expectedValueCoder, + WindowUtils.instantiateWindowedCoder(collectionId, components.toComponents()) + .getValueCoder()); + } +} diff --git a/runners/samza/src/test/resources/ExpectedDag.json b/runners/samza/src/test/resources/ExpectedDag.json new file mode 100644 index 000000000000..966aaf85a541 --- /dev/null +++ b/runners/samza/src/test/resources/ExpectedDag.json @@ -0,0 +1,67 @@ +{ + "RootNode": [ + { "fullName":"OuterMostNode", + "ChildNodes":[ + { "fullName":"Create.TimestampedValues", + "enclosingNode":"OuterMostNode", + "ChildNodes":[ + { "fullName":"Create.TimestampedValues/Create.Values", + "enclosingNode":"Create.TimestampedValues", + "ChildNodes":[ + { "fullName":"Create.TimestampedValues/Create.Values/Read(CreateSource)", + "enclosingNode":"Create.TimestampedValues/Create.Values", + "ChildNodes":[ + { "fullName":"Create.TimestampedValues/Create.Values/Read(CreateSource)/Impulse", + "enclosingNode":"Create.TimestampedValues/Create.Values/Read(CreateSource)"}, + { 
"fullName":"Create.TimestampedValues/Create.Values/Read(CreateSource)/ParDo(OutputSingleSource)", + "enclosingNode":"Create.TimestampedValues/Create.Values/Read(CreateSource)", + "ChildNodes":[ + { "fullName":"Create.TimestampedValues/Create.Values/Read(CreateSource)/ParDo(OutputSingleSource)/ParMultiDo(OutputSingleSource)", + "enclosingNode":"Create.TimestampedValues/Create.Values/Read(CreateSource)/ParDo(OutputSingleSource)"} + ]}, + { "fullName":"Create.TimestampedValues/Create.Values/Read(CreateSource)/ParDo(BoundedSourceAsSDFWrapper)", + "enclosingNode":"Create.TimestampedValues/Create.Values/Read(CreateSource)", + "ChildNodes":[ + { "fullName":"Create.TimestampedValues/Create.Values/Read(CreateSource)/ParDo(BoundedSourceAsSDFWrapper)/ParMultiDo(BoundedSourceAsSDFWrapper)", + "enclosingNode":"Create.TimestampedValues/Create.Values/Read(CreateSource)/ParDo(BoundedSourceAsSDFWrapper)"} + ]} + ]} + ]}, + { "fullName":"Create.TimestampedValues/ParDo(ConvertTimestamps)", + "enclosingNode":"Create.TimestampedValues", + "ChildNodes":[ + { "fullName":"Create.TimestampedValues/ParDo(ConvertTimestamps)/ParMultiDo(ConvertTimestamps)", + "enclosingNode":"Create.TimestampedValues/ParDo(ConvertTimestamps)"} + ]} + ]}, + { "fullName":"Window.Into()", + "enclosingNode":"OuterMostNode", + "ChildNodes":[ + { "fullName":"Window.Into()/Window.Assign", + "enclosingNode":"Window.Into()"} + ]}, + { "fullName":"Combine.perKey(SumInteger)", + "enclosingNode":"OuterMostNode", + "ChildNodes":[ + { "fullName":"Combine.perKey(SumInteger)/GroupByKey", + "enclosingNode":"Combine.perKey(SumInteger)"}, + { "fullName":"Combine.perKey(SumInteger)/Combine.GroupedValues", + "enclosingNode":"Combine.perKey(SumInteger)", + "ChildNodes":[ + { "fullName":"Combine.perKey(SumInteger)/Combine.GroupedValues/ParDo(Anonymous)", + "enclosingNode":"Combine.perKey(SumInteger)/Combine.GroupedValues", + "ChildNodes":[ + { "fullName":"Combine.perKey(SumInteger)/Combine.GroupedValues/ParDo(Anonymous)/ParMultiDo(Anonymous)", + "enclosingNode":"Combine.perKey(SumInteger)/Combine.GroupedValues/ParDo(Anonymous)"} + ]} + ]} + ]} + ]} + ] +,"graphLinks": [ + {"from":"Create.TimestampedValues/Create.Values/Read(CreateSource)/Impulse","to":"Create.TimestampedValues/Create.Values/Read(CreateSource)/ParDo(OutputSingleSource)/ParMultiDo(OutputSingleSource)"}, + {"from":"Create.TimestampedValues/Create.Values/Read(CreateSource)/ParDo(OutputSingleSource)/ParMultiDo(OutputSingleSource)","to":"Create.TimestampedValues/Create.Values/Read(CreateSource)/ParDo(BoundedSourceAsSDFWrapper)/ParMultiDo(BoundedSourceAsSDFWrapper)"}, + {"from":"Create.TimestampedValues/Create.Values/Read(CreateSource)/ParDo(BoundedSourceAsSDFWrapper)/ParMultiDo(BoundedSourceAsSDFWrapper)","to":"Create.TimestampedValues/ParDo(ConvertTimestamps)/ParMultiDo(ConvertTimestamps)"}, + {"from":"Create.TimestampedValues/ParDo(ConvertTimestamps)/ParMultiDo(ConvertTimestamps)","to":"Window.Into()/Window.Assign"}, + {"from":"Window.Into()/Window.Assign","to":"Combine.perKey(SumInteger)/GroupByKey"}, + {"from":"Combine.perKey(SumInteger)/GroupByKey","to":"Combine.perKey(SumInteger)/Combine.GroupedValues/ParDo(Anonymous)/ParMultiDo(Anonymous)"}]} \ No newline at end of file diff --git a/runners/flink/1.8/build.gradle b/runners/spark/2/build.gradle similarity index 80% rename from runners/flink/1.8/build.gradle rename to runners/spark/2/build.gradle index 5489aec11194..7bd7c7222d7d 100644 --- a/runners/flink/1.8/build.gradle +++ b/runners/spark/2/build.gradle @@ -17,18 +17,19 @@ */ def basePath 
= '..' - -/* All properties required for loading the Flink build script */ +/* All properties required for loading the Spark build script */ project.ext { - // Set the version of all Flink-related dependencies here. - flink_version = '1.8.3' + // Set the version of all Spark-related dependencies here. + spark_version = '2.4.8' + spark_scala_version = '2.11' + // Version specific code overrides. main_source_overrides = ['./src/main/java'] test_source_overrides = ['./src/test/java'] main_resources_overrides = [] test_resources_overrides = [] - archives_base_name = 'beam-runners-flink-1.8' + archives_base_name = 'beam-runners-spark' } // Load the main build script which contains all build logic. -apply from: "$basePath/flink_runner.gradle" +apply from: "$basePath/spark_runner.gradle" diff --git a/runners/flink/1.8/job-server/build.gradle b/runners/spark/2/job-server/build.gradle similarity index 91% rename from runners/flink/1.8/job-server/build.gradle rename to runners/spark/2/job-server/build.gradle index 562b6ca6480d..adb9121d1d0b 100644 --- a/runners/flink/1.8/job-server/build.gradle +++ b/runners/spark/2/job-server/build.gradle @@ -24,8 +24,8 @@ project.ext { test_source_dirs = ["$basePath/src/test/java"] main_resources_dirs = ["$basePath/src/main/resources"] test_resources_dirs = ["$basePath/src/test/resources"] - archives_base_name = 'beam-runners-flink-1.8-job-server' + archives_base_name = 'beam-runners-spark-job-server' } // Load the main build script which contains all build logic. -apply from: "$basePath/flink_job_server.gradle" +apply from: "$basePath/spark_job_server.gradle" diff --git a/runners/flink/1.9/job-server-container/build.gradle b/runners/spark/2/job-server/container/build.gradle similarity index 85% rename from runners/flink/1.9/job-server-container/build.gradle rename to runners/spark/2/job-server/container/build.gradle index afdb68a0fc91..10cacb109a2a 100644 --- a/runners/flink/1.9/job-server-container/build.gradle +++ b/runners/spark/2/job-server/container/build.gradle @@ -16,11 +16,12 @@ * limitations under the License. */ -def basePath = '../../job-server-container' +def basePath = '../../../job-server/container' project.ext { resource_path = basePath + spark_job_server_image = 'spark_job_server' } // Load the main build script which contains all build logic. -apply from: "$basePath/flink_job_server_container.gradle" +apply from: "$basePath/spark_job_server_container.gradle" diff --git a/runners/spark/2/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java b/runners/spark/2/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java new file mode 100644 index 000000000000..ece050e5cc9d --- /dev/null +++ b/runners/spark/2/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java @@ -0,0 +1,34 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.spark.structuredstreaming.translation; + +import org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingPipelineOptions; +import org.apache.spark.sql.streaming.DataStreamWriter; + +/** Subclass of {@link AbstractTranslationContext} that addresses Spark breaking changes. */ +public class TranslationContext extends AbstractTranslationContext { + + public TranslationContext(SparkStructuredStreamingPipelineOptions options) { + super(options); + } + + @Override + public void launchStreaming(DataStreamWriter dataStreamWriter) { + dataStreamWriter.start(); + } +} diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java b/runners/spark/2/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java similarity index 95% rename from runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java rename to runners/spark/2/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java index 3abe87c6b6b1..766d143d6d7d 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java +++ b/runners/spark/2/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java @@ -17,6 +17,9 @@ */ package org.apache.beam.runners.spark.structuredstreaming.translation.batch; +import static org.apache.beam.runners.spark.structuredstreaming.Constants.BEAM_SOURCE_OPTION; +import static org.apache.beam.runners.spark.structuredstreaming.Constants.DEFAULT_PARALLELISM; +import static org.apache.beam.runners.spark.structuredstreaming.Constants.PIPELINE_OPTIONS; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; import java.io.IOException; @@ -25,8 +28,8 @@ import java.util.List; import org.apache.beam.runners.core.construction.SerializablePipelineOptions; import org.apache.beam.runners.core.serialization.Base64Serializer; -import org.apache.beam.runners.spark.structuredstreaming.translation.SchemaHelpers; import org.apache.beam.runners.spark.structuredstreaming.translation.helpers.RowHelpers; +import org.apache.beam.runners.spark.structuredstreaming.translation.helpers.SchemaHelpers; import org.apache.beam.sdk.io.BoundedSource; import org.apache.beam.sdk.io.BoundedSource.BoundedReader; import org.apache.beam.sdk.options.PipelineOptions; @@ -48,10 +51,6 @@ */ public class DatasetSourceBatch implements DataSourceV2, ReadSupport { - static final String BEAM_SOURCE_OPTION = "beam-source"; - static final String DEFAULT_PARALLELISM = "default-parallelism"; - static final String PIPELINE_OPTIONS = "pipeline-options"; - @Override public DataSourceReader createReader(DataSourceOptions options) { return new DatasetReader<>(options); diff --git a/runners/spark/2/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/EncoderFactory.java
b/runners/spark/2/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/EncoderFactory.java new file mode 100644 index 000000000000..325d15075b67 --- /dev/null +++ b/runners/spark/2/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/EncoderFactory.java @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.spark.structuredstreaming.translation.helpers; + +import static org.apache.spark.sql.types.DataTypes.BinaryType; + +import java.util.Collections; +import java.util.List; +import org.apache.beam.sdk.coders.Coder; +import org.apache.spark.sql.Encoder; +import org.apache.spark.sql.catalyst.analysis.GetColumnByOrdinal; +import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder; +import org.apache.spark.sql.catalyst.expressions.BoundReference; +import org.apache.spark.sql.catalyst.expressions.Cast; +import org.apache.spark.sql.catalyst.expressions.Expression; +import org.apache.spark.sql.types.ObjectType; +import scala.collection.JavaConversions; +import scala.reflect.ClassTag; +import scala.reflect.ClassTag$; + +public class EncoderFactory { + + public static Encoder fromBeamCoder(Coder coder) { + Class clazz = coder.getEncodedTypeDescriptor().getRawType(); + ClassTag classTag = ClassTag$.MODULE$.apply(clazz); + List serializers = + Collections.singletonList( + new EncoderHelpers.EncodeUsingBeamCoder<>( + new BoundReference(0, new ObjectType(clazz), true), coder)); + + return new ExpressionEncoder<>( + SchemaHelpers.binarySchema(), + false, + JavaConversions.collectionAsScalaIterable(serializers).toSeq(), + new EncoderHelpers.DecodeUsingBeamCoder<>( + new Cast(new GetColumnByOrdinal(0, BinaryType), BinaryType), classTag, coder), + classTag); + } +} diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/DatasetSourceStreaming.java b/runners/spark/2/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/DatasetSourceStreaming.java similarity index 96% rename from runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/DatasetSourceStreaming.java rename to runners/spark/2/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/DatasetSourceStreaming.java index f70da385258f..a53f529ae1ef 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/DatasetSourceStreaming.java +++ b/runners/spark/2/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/DatasetSourceStreaming.java @@ -17,6 +17,9 @@ */ package org.apache.beam.runners.spark.structuredstreaming.translation.streaming; +import 
static org.apache.beam.runners.spark.structuredstreaming.Constants.BEAM_SOURCE_OPTION; +import static org.apache.beam.runners.spark.structuredstreaming.Constants.DEFAULT_PARALLELISM; +import static org.apache.beam.runners.spark.structuredstreaming.Constants.PIPELINE_OPTIONS; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; import edu.umd.cs.findbugs.annotations.SuppressFBWarnings; @@ -27,8 +30,8 @@ import java.util.Optional; import org.apache.beam.runners.core.construction.SerializablePipelineOptions; import org.apache.beam.runners.core.serialization.Base64Serializer; -import org.apache.beam.runners.spark.structuredstreaming.translation.SchemaHelpers; import org.apache.beam.runners.spark.structuredstreaming.translation.helpers.RowHelpers; +import org.apache.beam.runners.spark.structuredstreaming.translation.helpers.SchemaHelpers; import org.apache.beam.sdk.io.BoundedSource; import org.apache.beam.sdk.io.UnboundedSource; import org.apache.beam.sdk.io.UnboundedSource.CheckpointMark; @@ -69,10 +72,6 @@ }) class DatasetSourceStreaming implements DataSourceV2, MicroBatchReadSupport { - static final String BEAM_SOURCE_OPTION = "beam-source"; - static final String DEFAULT_PARALLELISM = "default-parallelism"; - static final String PIPELINE_OPTIONS = "pipeline-options"; - @Override public MicroBatchReader createMicroBatchReader( Optional schema, String checkpointLocation, DataSourceOptions options) { diff --git a/runners/flink/1.10/build.gradle b/runners/spark/3/build.gradle similarity index 68% rename from runners/flink/1.10/build.gradle rename to runners/spark/3/build.gradle index 5a79c3e3ecb9..061eaa6d5e2d 100644 --- a/runners/flink/1.10/build.gradle +++ b/runners/spark/3/build.gradle @@ -17,17 +17,19 @@ */ def basePath = '..' -/* All properties required for loading the Flink build script */ +/* All properties required for loading the Spark build script */ project.ext { - // Set the version of all Flink-related dependencies here. - flink_version = '1.10.1' + // Set the version of all Spark-related dependencies here. + spark_version = '3.1.2' + spark_scala_version = '2.12' + // Version specific code overrides. - main_source_overrides = ["${basePath}/1.8/src/main/java", "${basePath}/1.9/src/main/java", './src/main/java'] - test_source_overrides = ["${basePath}/1.8/src/test/java", "${basePath}/1.9/src/test/java", './src/test/java'] + main_source_overrides = ['./src/main/java'] + test_source_overrides = ['./src/test/java'] main_resources_overrides = [] test_resources_overrides = [] - archives_base_name = 'beam-runners-flink-1.10' + archives_base_name = 'beam-runners-spark-3' } // Load the main build script which contains all build logic. -apply from: "$basePath/flink_runner.gradle" +apply from: "$basePath/spark_runner.gradle" diff --git a/runners/flink/1.10/job-server/build.gradle b/runners/spark/3/job-server/build.gradle similarity index 91% rename from runners/flink/1.10/job-server/build.gradle rename to runners/spark/3/job-server/build.gradle index ee1e8c986970..a3aa69085a8b 100644 --- a/runners/flink/1.10/job-server/build.gradle +++ b/runners/spark/3/job-server/build.gradle @@ -24,8 +24,8 @@ project.ext { test_source_dirs = ["$basePath/src/test/java"] main_resources_dirs = ["$basePath/src/main/resources"] test_resources_dirs = ["$basePath/src/test/resources"] - archives_base_name = 'beam-runners-flink-1.10-job-server' + archives_base_name = 'beam-runners-spark-3-job-server' } // Load the main build script which contains all build logic. 
-apply from: "$basePath/flink_job_server.gradle" +apply from: "$basePath/spark_job_server.gradle" diff --git a/runners/flink/1.8/job-server-container/build.gradle b/runners/spark/3/job-server/container/build.gradle similarity index 85% rename from runners/flink/1.8/job-server-container/build.gradle rename to runners/spark/3/job-server/container/build.gradle index afdb68a0fc91..0e8b466048f8 100644 --- a/runners/flink/1.8/job-server-container/build.gradle +++ b/runners/spark/3/job-server/container/build.gradle @@ -16,11 +16,12 @@ * limitations under the License. */ -def basePath = '../../job-server-container' +def basePath = '../../../job-server/container' project.ext { resource_path = basePath + spark_job_server_image = 'spark3_job_server' } // Load the main build script which contains all build logic. -apply from: "$basePath/flink_job_server_container.gradle" +apply from: "$basePath/spark_job_server_container.gradle" diff --git a/runners/spark/3/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java b/runners/spark/3/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java new file mode 100644 index 000000000000..12cb2d2fef00 --- /dev/null +++ b/runners/spark/3/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.spark.structuredstreaming.translation; + +import java.util.concurrent.TimeoutException; +import org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingPipelineOptions; +import org.apache.spark.sql.streaming.DataStreamWriter; + +/** + * Subclass of {@link + * org.apache.beam.runners.spark.structuredstreaming.translation.AbstractTranslationContext} that + * address spark breaking changes. 
+ */ +public class TranslationContext extends AbstractTranslationContext { + + public TranslationContext(SparkStructuredStreamingPipelineOptions options) { + super(options); + } + + @Override + public void launchStreaming(DataStreamWriter dataStreamWriter) { + try { + dataStreamWriter.start(); + } catch (TimeoutException e) { + throw new RuntimeException("A timeout occurred when running the streaming pipeline", e); + } + } +} diff --git a/runners/spark/3/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java b/runners/spark/3/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java new file mode 100644 index 000000000000..f2fd80005fa9 --- /dev/null +++ b/runners/spark/3/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java @@ -0,0 +1,240 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.spark.structuredstreaming.translation.batch; + +import static org.apache.beam.runners.spark.structuredstreaming.Constants.BEAM_SOURCE_OPTION; +import static org.apache.beam.runners.spark.structuredstreaming.Constants.DEFAULT_PARALLELISM; +import static org.apache.beam.runners.spark.structuredstreaming.Constants.PIPELINE_OPTIONS; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import java.io.IOException; +import java.io.Serializable; +import java.util.List; +import java.util.Map; +import java.util.Set; +import org.apache.beam.runners.core.construction.SerializablePipelineOptions; +import org.apache.beam.runners.core.serialization.Base64Serializer; +import org.apache.beam.runners.spark.structuredstreaming.translation.helpers.RowHelpers; +import org.apache.beam.runners.spark.structuredstreaming.translation.helpers.SchemaHelpers; +import org.apache.beam.sdk.io.BoundedSource; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.util.WindowedValue; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; +import org.apache.parquet.Strings; +import org.apache.spark.sql.catalyst.InternalRow; +import org.apache.spark.sql.connector.catalog.SupportsRead; +import org.apache.spark.sql.connector.catalog.Table; +import org.apache.spark.sql.connector.catalog.TableCapability; +import org.apache.spark.sql.connector.catalog.TableProvider; +import org.apache.spark.sql.connector.expressions.Transform; +import org.apache.spark.sql.connector.read.Batch; +import org.apache.spark.sql.connector.read.InputPartition; +import org.apache.spark.sql.connector.read.PartitionReader; +import org.apache.spark.sql.connector.read.PartitionReaderFactory; +import 
org.apache.spark.sql.connector.read.Scan; +import org.apache.spark.sql.connector.read.ScanBuilder; +import org.apache.spark.sql.types.StructType; +import org.apache.spark.sql.util.CaseInsensitiveStringMap; + +/** + * Spark DataSourceV2 API was removed in Spark3. This is a Beam source wrapper using the new spark 3 + * source API. + */ +public class DatasetSourceBatch implements TableProvider { + + private static final StructType BINARY_SCHEMA = SchemaHelpers.binarySchema(); + + public DatasetSourceBatch() {} + + @Override + public StructType inferSchema(CaseInsensitiveStringMap options) { + return BINARY_SCHEMA; + } + + @Override + public boolean supportsExternalMetadata() { + return true; + } + + @Override + public Table getTable( + StructType schema, Transform[] partitioning, Map properties) { + return new DatasetSourceBatchTable(); + } + + private static class DatasetSourceBatchTable implements SupportsRead { + + @Override + public ScanBuilder newScanBuilder(CaseInsensitiveStringMap options) { + return new ScanBuilder() { + + @Override + public Scan build() { + return new Scan() { // scan for Batch reading + + @Override + public StructType readSchema() { + return BINARY_SCHEMA; + } + + @Override + public Batch toBatch() { + return new BeamBatch<>(options); + } + }; + } + }; + } + + @Override + public String name() { + return "BeamSource"; + } + + @Override + public StructType schema() { + return BINARY_SCHEMA; + } + + @Override + public Set capabilities() { + final ImmutableSet capabilities = + ImmutableSet.of(TableCapability.BATCH_READ); + return capabilities; + } + + private static class BeamBatch implements Batch, Serializable { + + private final int numPartitions; + private final BoundedSource source; + private final SerializablePipelineOptions serializablePipelineOptions; + + private BeamBatch(CaseInsensitiveStringMap options) { + if (Strings.isNullOrEmpty(options.get(BEAM_SOURCE_OPTION))) { + throw new RuntimeException("Beam source was not set in DataSource options"); + } + this.source = + Base64Serializer.deserializeUnchecked( + options.get(BEAM_SOURCE_OPTION), BoundedSource.class); + + if (Strings.isNullOrEmpty(DEFAULT_PARALLELISM)) { + throw new RuntimeException("Spark default parallelism was not set in DataSource options"); + } + this.numPartitions = Integer.parseInt(options.get(DEFAULT_PARALLELISM)); + checkArgument(numPartitions > 0, "Number of partitions must be greater than zero."); + + if (Strings.isNullOrEmpty(options.get(PIPELINE_OPTIONS))) { + throw new RuntimeException("Beam pipelineOptions were not set in DataSource options"); + } + this.serializablePipelineOptions = + new SerializablePipelineOptions(options.get(PIPELINE_OPTIONS)); + } + + @Override + public InputPartition[] planInputPartitions() { + PipelineOptions options = serializablePipelineOptions.get(); + long desiredSizeBytes; + + try { + desiredSizeBytes = source.getEstimatedSizeBytes(options) / numPartitions; + List> splits = source.split(desiredSizeBytes, options); + InputPartition[] result = new InputPartition[splits.size()]; + int i = 0; + for (BoundedSource split : splits) { + result[i++] = new BeamInputPartition<>(split); + } + return result; + } catch (Exception e) { + throw new RuntimeException( + "Error in splitting BoundedSource " + source.getClass().getCanonicalName(), e); + } + } + + @Override + public PartitionReaderFactory createReaderFactory() { + return new PartitionReaderFactory() { + + @Override + public PartitionReader createReader(InputPartition partition) { + return new 
BeamPartitionReader( + ((BeamInputPartition) partition).getSource(), serializablePipelineOptions); + } + }; + } + + private static class BeamInputPartition implements InputPartition { + + private final BoundedSource source; + + private BeamInputPartition(BoundedSource source) { + this.source = source; + } + + public BoundedSource getSource() { + return source; + } + } + + private static class BeamPartitionReader implements PartitionReader { + + private final BoundedSource source; + private final BoundedSource.BoundedReader reader; + private boolean started; + private boolean closed; + + BeamPartitionReader( + BoundedSource source, SerializablePipelineOptions serializablePipelineOptions) { + this.started = false; + this.closed = false; + this.source = source; + // reader is not serializable so lazy initialize it + try { + reader = + source.createReader(serializablePipelineOptions.get().as(PipelineOptions.class)); + } catch (IOException e) { + throw new RuntimeException("Error creating BoundedReader ", e); + } + } + + @Override + public boolean next() throws IOException { + if (!started) { + started = true; + return reader.start(); + } else { + return !closed && reader.advance(); + } + } + + @Override + public InternalRow get() { + WindowedValue windowedValue = + WindowedValue.timestampedValueInGlobalWindow( + reader.getCurrent(), reader.getCurrentTimestamp()); + return RowHelpers.storeWindowedValueInRow(windowedValue, source.getOutputCoder()); + } + + @Override + public void close() throws IOException { + closed = true; + reader.close(); + } + } + } + } +} diff --git a/runners/spark/3/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/EncoderFactory.java b/runners/spark/3/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/EncoderFactory.java new file mode 100644 index 000000000000..39a71507453e --- /dev/null +++ b/runners/spark/3/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/EncoderFactory.java @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.spark.structuredstreaming.translation.helpers; + +import static org.apache.spark.sql.types.DataTypes.BinaryType; + +import org.apache.beam.sdk.coders.Coder; +import org.apache.spark.sql.Encoder; +import org.apache.spark.sql.catalyst.analysis.GetColumnByOrdinal; +import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder; +import org.apache.spark.sql.catalyst.expressions.BoundReference; +import org.apache.spark.sql.catalyst.expressions.Cast; +import org.apache.spark.sql.catalyst.expressions.Expression; +import org.apache.spark.sql.types.ObjectType; +import scala.reflect.ClassTag; +import scala.reflect.ClassTag$; + +public class EncoderFactory { + + public static Encoder fromBeamCoder(Coder coder) { + Class clazz = coder.getEncodedTypeDescriptor().getRawType(); + ClassTag classTag = ClassTag$.MODULE$.apply(clazz); + Expression serializer = + new EncoderHelpers.EncodeUsingBeamCoder<>( + new BoundReference(0, new ObjectType(clazz), true), coder); + Expression deserializer = + new EncoderHelpers.DecodeUsingBeamCoder<>( + new Cast( + new GetColumnByOrdinal(0, BinaryType), BinaryType, scala.Option.empty()), + classTag, + coder); + return new ExpressionEncoder<>(serializer, deserializer, classTag); + } +} diff --git a/runners/spark/3/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/DatasetSourceStreaming.java b/runners/spark/3/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/DatasetSourceStreaming.java new file mode 100644 index 000000000000..b83a631a6bba --- /dev/null +++ b/runners/spark/3/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/DatasetSourceStreaming.java @@ -0,0 +1,25 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.spark.structuredstreaming.translation.streaming; + +/** + * Spark structured streaming framework does not support more than one aggregation in streaming mode + * because of watermark implementation. 
As a consequence, this runner does not support streaming + * mode yet; see https://issues.apache.org/jira/browse/BEAM-9933. + */ +class DatasetSourceStreaming {} diff --git a/runners/spark/job-server/container/build.gradle b/runners/spark/job-server/container/spark_job_server_container.gradle similarity index 87% rename from runners/spark/job-server/container/build.gradle rename to runners/spark/job-server/container/spark_job_server_container.gradle index fbd3b17a255a..a7646b19e28e 100644 --- a/runners/spark/job-server/container/build.gradle +++ b/runners/spark/job-server/container/spark_job_server_container.gradle @@ -28,7 +28,7 @@ applyDockerNature() def sparkJobServerProject = project.parent.path -description = "Apache Beam :: Runners :: Spark :: Job Server :: Container" +description = project(sparkJobServerProject).description + " :: Container" configurations { dockerDependency @@ -41,18 +41,18 @@ dependencies { task copyDockerfileDependencies(type: Copy) { // Required Jars from configurations.dockerDependency - rename 'beam-runners-spark-job-server.*.jar', 'beam-runners-spark-job-server.jar' + rename 'beam-runners-spark.*-job-server.*.jar', 'beam-runners-spark-job-server.jar' into "build" // Entry script - from "spark-job-server.sh" + from "../../../job-server/container/spark-job-server.sh" into "build" // Dockerfile - from "Dockerfile" + from "../../../job-server/container/Dockerfile" into "build" } docker { - name containerImageName(name: project.docker_image_default_repo_prefix + 'spark_job_server', + name containerImageName(name: project.docker_image_default_repo_prefix + spark_job_server_image, root: project.rootProject.hasProperty(["docker-repository-root"]) ? project.rootProject["docker-repository-root"] : project.docker_image_default_repo_root) diff --git a/runners/spark/job-server/build.gradle b/runners/spark/job-server/spark_job_server.gradle similarity index 96% rename from runners/spark/job-server/build.gradle rename to runners/spark/job-server/spark_job_server.gradle index e08f8ae40cfb..c32f7b0428da 100644 --- a/runners/spark/job-server/build.gradle +++ b/runners/spark/job-server/spark_job_server.gradle @@ -29,6 +29,7 @@ mainClassName = "org.apache.beam.runners.spark.SparkJobServerDriver" applyJavaNature( automaticModuleName: 'org.apache.beam.runners.spark.jobserver', + archivesBaseName: project.hasProperty('archives_base_name') ? archives_base_name : archivesBaseName, validateShadowJar: false, exportJavadoc: false, shadowClosure: { @@ -50,13 +51,14 @@ configurations.all { dependencies { compile project(sparkRunnerProject) + permitUnusedDeclared project(sparkRunnerProject) compile project(path: sparkRunnerProject, configuration: "provided") validatesPortableRunner project(path: sparkRunnerProject, configuration: "testRuntime") validatesPortableRunner project(path: sparkRunnerProject, configuration: "provided") validatesPortableRunner project(path: ":sdks:java:core", configuration: "shadowTest") validatesPortableRunner project(path: ":runners:core-java", configuration: "testRuntime") validatesPortableRunner project(path: ":runners:portability:java", configuration: "testRuntime") - compile project(":sdks:java:extensions:google-cloud-platform-core") + runtime project(":sdks:java:extensions:google-cloud-platform-core") // TODO: Enable AWS and HDFS file system.
} @@ -133,6 +135,7 @@ def portableValidatesRunnerTask(String name, Boolean streaming) { // // Assertion error: empty iterable output excludeTestsMatching 'org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testFixedWindowsCombine' excludeTestsMatching 'org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testSessionsCombine' + excludeTestsMatching 'org.apache.beam.sdk.transforms.GroupByKeyTest$WindowTests' excludeTestsMatching 'org.apache.beam.sdk.transforms.ReshuffleTest.testReshuffleAfterFixedWindowsAndGroupByKey' excludeTestsMatching 'org.apache.beam.sdk.transforms.ReshuffleTest.testReshuffleAfterSessionsAndGroupByKey' excludeTestsMatching 'org.apache.beam.sdk.transforms.ReshuffleTest.testReshuffleAfterSlidingWindowsAndGroupByKey' @@ -199,8 +202,8 @@ task validatesPortableRunner() { dependsOn validatesPortableRunnerStreaming } -def jobPort = BeamModulePlugin.startingExpansionPortNumber.getAndDecrement() -def artifactPort = BeamModulePlugin.startingExpansionPortNumber.getAndDecrement() +def jobPort = BeamModulePlugin.getRandomPort() +def artifactPort = BeamModulePlugin.getRandomPort() def setupTask = project.tasks.create(name: "sparkJobServerSetup", type: Exec) { dependsOn shadowJar diff --git a/runners/spark/build.gradle b/runners/spark/spark_runner.gradle similarity index 74% rename from runners/spark/build.gradle rename to runners/spark/spark_runner.gradle index 315f166e2ee5..5a2e0f98542b 100644 --- a/runners/spark/build.gradle +++ b/runners/spark/spark_runner.gradle @@ -19,9 +19,11 @@ import groovy.json.JsonOutput import java.util.stream.Collectors -plugins { id 'org.apache.beam.module' } +apply plugin: 'org.apache.beam.module' applyJavaNature( + enableStrictDependencies:true, automaticModuleName: 'org.apache.beam.runners.spark', + archivesBaseName: (project.hasProperty('archives_base_name') ? archives_base_name : archivesBaseName), classesTriggerCheckerBugs: [ 'SparkAssignWindowFn': 'https://github.com/typetools/checker-framework/issues/3793', 'SparkCombineFn': 'https://github.com/typetools/checker-framework/issues/3793', @@ -29,7 +31,7 @@ applyJavaNature( ], ) -description = "Apache Beam :: Runners :: Spark" +description = "Apache Beam :: Runners :: Spark $spark_version" /* * We need to rely on manually specifying these evaluationDependsOn to ensure that @@ -53,6 +55,63 @@ def hadoopVersions = [ hadoopVersions.each {kv -> configurations.create("hadoopVersion$kv.key")} +/* + * Copy & merge source overrides into build directory. 
+ */ +def sourceOverridesBase = "${project.buildDir}/source-overrides/src" + +def copySourceOverrides = tasks.register('copySourceOverrides', Copy) { + it.from main_source_overrides + it.into "${sourceOverridesBase}/main/java" + it.duplicatesStrategy DuplicatesStrategy.INCLUDE +} +compileJava.dependsOn copySourceOverrides + +def copyResourcesOverrides = tasks.register('copyResourcesOverrides', Copy) { + it.from main_resources_overrides + it.into "${sourceOverridesBase}/main/resources" + it.duplicatesStrategy DuplicatesStrategy.INCLUDE +} +compileJava.dependsOn copyResourcesOverrides + +def copyTestSourceOverrides = tasks.register('copyTestSourceOverrides', Copy) { + it.from test_source_overrides + it.into "${sourceOverridesBase}/test/java" + it.duplicatesStrategy DuplicatesStrategy.INCLUDE +} +compileTestJava.dependsOn copyTestSourceOverrides + +def copyTestResourcesOverrides = tasks.register('copyTestResourcesOverrides', Copy) { + it.from test_resources_overrides + it.into "${sourceOverridesBase}/test/resources" + it.duplicatesStrategy DuplicatesStrategy.INCLUDE +} +compileJava.dependsOn copyTestResourcesOverrides + +/* + * We have to explicitly set all directories here to make sure each + * version of Spark has the correct overrides set. + */ +def sourceBase = "${project.projectDir}/../src" +sourceSets { + main { + java { + srcDirs = ["${sourceBase}/main/java", "${sourceOverridesBase}/main/java"] + } + resources { + srcDirs = ["${sourceBase}/main/resources", "${sourceOverridesBase}/main/resources"] + } + } + test { + java { + srcDirs = ["${sourceBase}/test/java", "${sourceOverridesBase}/test/java"] + } + resources { + srcDirs = ["${sourceBase}/test/resources", "${sourceOverridesBase}/test/resources"] + } + } +} + test { systemProperty "beam.spark.test.reuseSparkContext", "true" systemProperty "spark.sql.shuffle.partitions", "4" @@ -63,6 +122,16 @@ test { "--streaming=false", "--enableSparkMetricSinks=true" ]""" + systemProperty "log4j.configuration", "log4j-test.properties" + // Change log level to debug: + // systemProperty "org.slf4j.simpleLogger.defaultLogLevel", "debug" + // Change log level to debug only for the package and nested packages: + // systemProperty "org.slf4j.simpleLogger.log.org.apache.beam.runners.spark.stateful", "debug" + jvmArgs "-XX:-UseGCOverheadLimit" + if (System.getProperty("beamSurefireArgline")) { + jvmArgs System.getProperty("beamSurefireArgline") + } + // Only one SparkContext may be running in a JVM (SPARK-2243) forkEvery 1 maxParallelForks 4 @@ -70,6 +139,10 @@ test { excludeCategories "org.apache.beam.runners.spark.StreamingTest" excludeCategories "org.apache.beam.runners.spark.UsesCheckpointRecovery" } + filter { + // BEAM-11653 MetricsSinkTest is failing with Spark 3 + excludeTestsMatching 'org.apache.beam.runners.spark.aggregators.metrics.sink.SparkMetricsSinkTest' + } } dependencies { @@ -84,28 +157,37 @@ dependencies { compile library.java.slf4j_api compile library.java.joda_time compile library.java.args4j - provided library.java.spark_core - provided library.java.spark_sql - provided library.java.spark_streaming - provided library.java.spark_network_common + compile project(path: ":model:fn-execution", configuration: "shadow") + compile project(path: ":model:job-management", configuration: "shadow") + compile project(":sdks:java:fn-execution") + compile library.java.vendored_grpc_1_36_0 + compile library.java.vendored_guava_26_0_jre + provided "org.apache.spark:spark-core_$spark_scala_version:$spark_version" + provided 
"org.apache.spark:spark-network-common_$spark_scala_version:$spark_version" + provided "org.apache.spark:spark-sql_$spark_scala_version:$spark_version" + provided "org.apache.spark:spark-streaming_$spark_scala_version:$spark_version" + if(project.property("spark_scala_version").equals("2.11")){ + runtimeOnly library.java.jackson_module_scala_2_11 + } else { + runtimeOnly library.java.jackson_module_scala_2_12 + } + // Force paranamer 2.8 to avoid issues when using Scala 2.12 + runtimeOnly "com.thoughtworks.paranamer:paranamer:2.8" provided library.java.hadoop_common provided library.java.commons_io provided library.java.hamcrest_core provided library.java.hamcrest_library provided "com.esotericsoftware.kryo:kryo:2.21" - runtimeOnly library.java.jackson_module_scala - runtimeOnly "org.scala-lang:scala-library:2.11.8" testCompile project(":sdks:java:io:kafka") testCompile project(path: ":sdks:java:core", configuration: "shadowTest") // SparkStateInternalsTest extends abstract StateInternalsTest testCompile project(path: ":runners:core-java", configuration: "testRuntime") testCompile project(":sdks:java:harness") testCompile library.java.avro - testCompile library.java.kafka + testCompile "org.apache.kafka:kafka_$spark_scala_version:2.4.1" testCompile library.java.kafka_clients testCompile library.java.junit testCompile library.java.mockito_core - testCompile library.java.jackson_dataformat_yaml testCompile "org.apache.zookeeper:zookeeper:3.4.11" validatesRunner project(path: ":sdks:java:core", configuration: "shadowTest") validatesRunner project(path: ":runners:core-java", configuration: "testRuntime") @@ -143,6 +225,8 @@ hadoopVersions.each {kv -> task validatesRunnerBatch(type: Test) { group = "Verification" + // Disable gradle cache + outputs.upToDateWhen { false } def pipelineOptions = JsonOutput.toJson([ "--runner=TestSparkRunner", "--streaming=false", @@ -180,6 +264,7 @@ task validatesRunnerBatch(type: Test) { excludeCategories 'org.apache.beam.sdk.testing.UsesUnboundedSplittableParDo' // Portability excludeCategories 'org.apache.beam.sdk.testing.UsesCrossLanguageTransforms' + excludeCategories 'org.apache.beam.sdk.testing.UsesPythonExpansionService' excludeCategories 'org.apache.beam.sdk.testing.UsesBundleFinalizer' } jvmArgs '-Xmx3g' @@ -187,6 +272,8 @@ task validatesRunnerBatch(type: Test) { task validatesRunnerStreaming(type: Test) { group = "Verification" + // Disable gradle cache + outputs.upToDateWhen { false } def pipelineOptions = JsonOutput.toJson([ "--runner=TestSparkRunner", "--forceStreaming=true", @@ -203,10 +290,16 @@ task validatesRunnerStreaming(type: Test) { useJUnit { includeCategories 'org.apache.beam.runners.spark.StreamingTest' } + filter { + // BEAM-11653 MetricsSinkTest is failing with Spark 3 + excludeTestsMatching 'org.apache.beam.runners.spark.aggregators.metrics.sink.SparkMetricsSinkTest' + } } task validatesStructuredStreamingRunnerBatch(type: Test) { group = "Verification" + // Disable gradle cache + outputs.upToDateWhen { false } def pipelineOptions = JsonOutput.toJson([ "--runner=SparkStructuredStreamingRunner", "--testMode=true", @@ -227,6 +320,8 @@ task validatesStructuredStreamingRunnerBatch(type: Test) { // Only one SparkContext may be running in a JVM (SPARK-2243) forkEvery 1 maxParallelForks 4 + // Increase memory heap in order to avoid OOM errors + jvmArgs '-Xmx7g' useJUnit { includeCategories 'org.apache.beam.sdk.testing.ValidatesRunner' // Unbounded @@ -250,6 +345,7 @@ task validatesStructuredStreamingRunnerBatch(type: Test) { excludeCategories 
'org.apache.beam.sdk.testing.UsesUnboundedSplittableParDo' // Portability excludeCategories 'org.apache.beam.sdk.testing.UsesCrossLanguageTransforms' + excludeCategories 'org.apache.beam.sdk.testing.UsesPythonExpansionService' excludeCategories 'org.apache.beam.sdk.testing.UsesBundleFinalizer' } filter { @@ -277,7 +373,7 @@ task validatesRunner { //dependsOn validatesStructuredStreamingRunnerBatch } -// Generates :runners:spark:runQuickstartJavaSpark +// Generates :runners:spark:*:runQuickstartJavaSpark task createJavaExamplesArchetypeValidationTask(type: 'Quickstart', runner: 'Spark') task hadoopVersionsTest(group: "Verification") { diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkCommonPipelineOptions.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkCommonPipelineOptions.java index 26111450662c..95e7865e6f9e 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkCommonPipelineOptions.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkCommonPipelineOptions.java @@ -17,11 +17,13 @@ */ package org.apache.beam.runners.spark; -import java.util.List; +import org.apache.beam.runners.core.construction.resources.PipelineResources; +import org.apache.beam.sdk.annotations.Internal; import org.apache.beam.sdk.options.ApplicationNameOptions; import org.apache.beam.sdk.options.Default; import org.apache.beam.sdk.options.DefaultValueFactory; import org.apache.beam.sdk.options.Description; +import org.apache.beam.sdk.options.FileStagingOptions; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.StreamingOptions; @@ -30,7 +32,8 @@ * master address, and other user-related knobs. */ public interface SparkCommonPipelineOptions - extends PipelineOptions, StreamingOptions, ApplicationNameOptions { + extends PipelineOptions, StreamingOptions, ApplicationNameOptions, FileStagingOptions { + String DEFAULT_MASTER_URL = "local[4]"; @Description("The url of the spark master to connect to, (e.g. spark://host:port, local[4]).") @@ -47,20 +50,6 @@ public interface SparkCommonPipelineOptions void setCheckpointDir(String checkpointDir); - /** - * List of local files to make available to workers. - * - *

    Jars are placed on the worker's classpath. - * - *

    The default value is the list of jars from the main program's classpath. - */ - @Description( - "Jar-Files to send to all workers and put on the classpath. " - + "The default value is all files from the classpath.") - List getFilesToStage(); - - void setFilesToStage(List value); - @Description("Enable/disable sending aggregator values to Spark's metric sinks") @Default.Boolean(true) Boolean getEnableSparkMetricSinks(); @@ -77,4 +66,16 @@ public String create(PipelineOptions options) { return "/tmp/" + options.getJobName(); } } + + /** + * Classpath contains non jar files (eg. directories with .class files or empty directories) will + * cause exception in running log. Though the {@link org.apache.spark.SparkContext} can handle + * this when running in local master, it's better not to include non-jars files in classpath. + */ + @Internal + static void prepareFilesToStage(SparkCommonPipelineOptions options) { + if (!options.getSparkMaster().matches("local\\[?\\d*]?")) { + PipelineResources.prepareFilesForStaging(options); + } + } } diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkJobInvoker.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkJobInvoker.java index 79c0633d8563..cb5bee3223a6 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkJobInvoker.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkJobInvoker.java @@ -26,7 +26,7 @@ import org.apache.beam.runners.jobsubmission.PortablePipelineJarCreator; import org.apache.beam.runners.jobsubmission.PortablePipelineRunner; import org.apache.beam.sdk.options.PortablePipelineOptions; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Struct; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Struct; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Strings; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.ListeningExecutorService; import org.checkerframework.checker.nullness.qual.Nullable; @@ -49,7 +49,7 @@ public static SparkJobInvoker create( } private SparkJobInvoker(SparkJobServerDriver.SparkServerConfiguration configuration) { - super("spark-runner-job-invoker"); + super("spark-runner-job-invoker-%d"); this.configuration = configuration; } diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkJobServerDriver.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkJobServerDriver.java index 056583e83388..abb4f87c541a 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkJobServerDriver.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkJobServerDriver.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.spark; -import org.apache.beam.runners.fnexecution.ServerFactory; import org.apache.beam.runners.jobsubmission.JobServerDriver; import org.apache.beam.sdk.extensions.gcp.options.GcsOptions; +import org.apache.beam.sdk.fn.server.ServerFactory; import org.apache.beam.sdk.io.FileSystems; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.PipelineOptionsFactory; @@ -62,7 +62,7 @@ private static void printUsage(CmdLineParser parser) { System.err.println(); } - private static SparkJobServerDriver fromParams(String[] args) { + public static SparkJobServerDriver fromParams(String[] args) { SparkServerConfiguration configuration = new SparkServerConfiguration(); CmdLineParser parser = new CmdLineParser(configuration); try { @@ -76,7 +76,7 @@ private static 
SparkJobServerDriver fromParams(String[] args) { return fromConfig(configuration); } - private static SparkJobServerDriver fromConfig(SparkServerConfiguration configuration) { + public static SparkJobServerDriver fromConfig(SparkServerConfiguration configuration) { return create( configuration, createJobServerFactory(configuration), diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineOptions.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineOptions.java index 4466aca46212..ecf933336c97 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineOptions.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineOptions.java @@ -17,16 +17,10 @@ */ package org.apache.beam.runners.spark; -import java.io.File; -import java.util.List; -import java.util.stream.Collectors; -import org.apache.beam.runners.core.construction.resources.PipelineResources; import org.apache.beam.sdk.annotations.Experimental; -import org.apache.beam.sdk.annotations.Internal; import org.apache.beam.sdk.options.Default; import org.apache.beam.sdk.options.Description; import org.apache.beam.sdk.options.PipelineOptions; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; /** * Spark runner {@link PipelineOptions} handles Spark execution-related configurations, such as the @@ -100,35 +94,4 @@ public interface SparkPipelineOptions extends SparkCommonPipelineOptions { boolean isCacheDisabled(); void setCacheDisabled(boolean value); - - /** Detects if the pipeline is run in spark local mode. */ - @Internal - static boolean isLocalSparkMaster(SparkPipelineOptions options) { - return options.getSparkMaster().matches("local\\[?\\d*]?"); - } - - /** - * Classpath contains non jar files (eg. directories with .class files or empty directories) will - * cause exception in running log. Though the {@link org.apache.spark.SparkContext} can handle - * this when running in local master, it's better not to include non-jars files in classpath. 
- */ - @Internal - static void prepareFilesToStage(SparkPipelineOptions options) { - if (!isLocalSparkMaster(options)) { - List filesToStage = - options.getFilesToStage().stream() - .map(File::new) - .filter(File::exists) - .map( - file -> { - return file.getAbsolutePath(); - }) - .collect(Collectors.toList()); - options.setFilesToStage( - PipelineResources.prepareFilesForStaging( - filesToStage, - MoreObjects.firstNonNull( - options.getTempLocation(), System.getProperty("java.io.tmpdir")))); - } - } } diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java index 4bbad35f2626..8f15e878f8bb 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java @@ -17,10 +17,11 @@ */ package org.apache.beam.runners.spark; -import static org.apache.beam.runners.core.construction.resources.PipelineResources.detectClassPathResourcesToStage; import static org.apache.beam.runners.fnexecution.translation.PipelineTranslatorUtils.hasUnboundedPCollections; -import static org.apache.beam.runners.spark.SparkPipelineOptions.prepareFilesToStage; +import static org.apache.beam.runners.spark.SparkCommonPipelineOptions.prepareFilesToStage; +import static org.apache.beam.runners.spark.util.SparkCommon.startEventLoggingListener; +import edu.umd.cs.findbugs.annotations.Nullable; import java.util.UUID; import java.util.concurrent.ExecutorService; import java.util.concurrent.Executors; @@ -49,16 +50,20 @@ import org.apache.beam.runners.spark.translation.SparkStreamingTranslationContext; import org.apache.beam.runners.spark.translation.SparkTranslationContext; import org.apache.beam.runners.spark.util.GlobalWatermarkHolder; +import org.apache.beam.runners.spark.util.SparkCompat; import org.apache.beam.sdk.io.FileSystems; import org.apache.beam.sdk.metrics.MetricsEnvironment; import org.apache.beam.sdk.metrics.MetricsOptions; import org.apache.beam.sdk.options.PipelineOptionsFactory; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Struct; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Struct; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; import org.apache.spark.api.java.JavaSparkContext; +import org.apache.spark.scheduler.EventLoggingListener; +import org.apache.spark.scheduler.SparkListenerApplicationEnd; import org.apache.spark.streaming.api.java.JavaStreamingContext; import org.apache.spark.streaming.api.java.JavaStreamingListener; import org.apache.spark.streaming.api.java.JavaStreamingListenerWrapper; +import org.joda.time.Instant; import org.kohsuke.args4j.CmdLineException; import org.kohsuke.args4j.CmdLineParser; import org.kohsuke.args4j.Option; @@ -67,11 +72,9 @@ /** Runs a portable pipeline on Apache Spark. */ @SuppressWarnings({ - "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) + "rawtypes" // TODO(https://issues.apache.org/jira/browse/BEAM-10556) }) public class SparkPipelineRunner implements PortablePipelineRunner { - private static final Logger LOG = LoggerFactory.getLogger(SparkPipelineRunner.class); private final SparkPipelineOptions pipelineOptions; @@ -110,24 +113,13 @@ public PortablePipelineResult run(RunnerApi.Pipeline pipeline, JobInfo jobInfo) ? 
trimmedPipeline : GreedyPipelineFuser.fuse(trimmedPipeline).toPipeline(); - // File staging. - if (pipelineOptions.getFilesToStage() == null) { - pipelineOptions.setFilesToStage( - detectClassPathResourcesToStage( - SparkPipelineRunner.class.getClassLoader(), pipelineOptions)); - LOG.info( - "PipelineOptions.filesToStage was not specified. Defaulting to files from the classpath"); - } prepareFilesToStage(pipelineOptions); - LOG.info( - "Will stage {} files. (Enable logging at DEBUG level to see which files will be staged.)", - pipelineOptions.getFilesToStage().size()); - LOG.debug("Staging files: {}", pipelineOptions.getFilesToStage()); - PortablePipelineResult result; final JavaSparkContext jsc = SparkContextFactory.getSparkContext(pipelineOptions); - LOG.info(String.format("Running job %s on Spark master %s", jobInfo.jobId(), jsc.master())); + final long startTime = Instant.now().getMillis(); + EventLoggingListener eventLoggingListener = + startEventLoggingListener(jsc, pipelineOptions, startTime); // Initialize accumulators. AggregatorsAccumulator.init(pipelineOptions, jsc); @@ -213,6 +205,14 @@ public PortablePipelineResult run(RunnerApi.Pipeline pipeline, JobInfo jobInfo) result); metricsPusher.start(); + if (eventLoggingListener != null) { + eventLoggingListener.onApplicationStart( + SparkCompat.buildSparkListenerApplicationStart(jsc, pipelineOptions, startTime, result)); + eventLoggingListener.onApplicationEnd( + new SparkListenerApplicationEnd(Instant.now().getMillis())); + eventLoggingListener.stop(); + } + return result; } @@ -268,6 +268,7 @@ public static void main(String[] args) throws Exception { } private static class SparkPipelineRunnerConfiguration { + @Nullable @Option( name = "--base-job-name", usage = diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunner.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunner.java index a159b1eff1d9..747a926e75b4 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunner.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunner.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.spark; -import static org.apache.beam.runners.core.construction.resources.PipelineResources.detectClassPathResourcesToStage; -import static org.apache.beam.runners.spark.SparkPipelineOptions.prepareFilesToStage; +import static org.apache.beam.runners.spark.SparkCommonPipelineOptions.prepareFilesToStage; +import static org.apache.beam.runners.spark.util.SparkCommon.startEventLoggingListener; import java.util.Collection; import java.util.HashMap; @@ -43,6 +43,7 @@ import org.apache.beam.runners.spark.translation.streaming.Checkpoint.CheckpointDir; import org.apache.beam.runners.spark.translation.streaming.SparkRunnerStreamingContextFactory; import org.apache.beam.runners.spark.util.GlobalWatermarkHolder.WatermarkAdvancingStreamingListener; +import org.apache.beam.runners.spark.util.SparkCompat; import org.apache.beam.sdk.Pipeline; import org.apache.beam.sdk.PipelineRunner; import org.apache.beam.sdk.metrics.MetricsEnvironment; @@ -67,9 +68,12 @@ import org.apache.spark.SparkEnv$; import org.apache.spark.api.java.JavaSparkContext; import org.apache.spark.metrics.MetricsSystem; +import org.apache.spark.scheduler.EventLoggingListener; +import org.apache.spark.scheduler.SparkListenerApplicationEnd; import org.apache.spark.streaming.api.java.JavaStreamingContext; import org.apache.spark.streaming.api.java.JavaStreamingListener; import 
org.apache.spark.streaming.api.java.JavaStreamingListenerWrapper; +import org.joda.time.Instant; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -97,7 +101,7 @@ public final class SparkRunner extends PipelineRunner { private static final Logger LOG = LoggerFactory.getLogger(SparkRunner.class); /** Options used in this pipeline runner. */ - private final SparkPipelineOptions mOptions; + private final SparkPipelineOptions pipelineOptions; /** * Creates and returns a new SparkRunner with default options. In particular, against a spark @@ -128,22 +132,7 @@ public static SparkRunner create(SparkPipelineOptions options) { * @return A pipeline runner that will execute with specified options. */ public static SparkRunner fromOptions(PipelineOptions options) { - SparkPipelineOptions sparkOptions = - PipelineOptionsValidator.validate(SparkPipelineOptions.class, options); - - if (sparkOptions.getFilesToStage() == null - && !SparkPipelineOptions.isLocalSparkMaster(sparkOptions)) { - sparkOptions.setFilesToStage( - detectClassPathResourcesToStage(SparkRunner.class.getClassLoader(), options)); - LOG.info( - "PipelineOptions.filesToStage was not specified. " - + "Defaulting to files from the classpath: will stage {} files. " - + "Enable logging at DEBUG level to see which files will be staged.", - sparkOptions.getFilesToStage().size()); - LOG.debug("Classpath elements: {}", sparkOptions.getFilesToStage()); - } - - return new SparkRunner(sparkOptions); + return new SparkRunner(PipelineOptionsValidator.validate(SparkPipelineOptions.class, options)); } /** @@ -151,7 +140,7 @@ public static SparkRunner fromOptions(PipelineOptions options) { * thread. */ private SparkRunner(SparkPipelineOptions options) { - mOptions = options; + pipelineOptions = options; } @Override @@ -170,26 +159,28 @@ public SparkPipelineResult run(final Pipeline pipeline) { // visit the pipeline to determine the translation mode detectTranslationMode(pipeline); - // Default to using the primitive versions of Read.Bounded and Read.Unbounded if we are - // executing an unbounded pipeline or the user specifically requested it. - if (mOptions.isStreaming() - || ExperimentalOptions.hasExperiment( - pipeline.getOptions(), "beam_fn_api_use_deprecated_read") - || ExperimentalOptions.hasExperiment(pipeline.getOptions(), "use_deprecated_read")) { - SplittableParDo.convertReadBasedSplittableDoFnsToPrimitiveReads(pipeline); + // Default to using the primitive versions of Read.Bounded and Read.Unbounded. + // TODO(BEAM-10670): Use SDF read as default when we address performance issue. 
+ if (!ExperimentalOptions.hasExperiment(pipeline.getOptions(), "beam_fn_api")) { + SplittableParDo.convertReadBasedSplittableDoFnsToPrimitiveReadsIfNecessary(pipeline); } - pipeline.replaceAll(SparkTransformOverrides.getDefaultOverrides(mOptions.isStreaming())); + pipeline.replaceAll(SparkTransformOverrides.getDefaultOverrides(pipelineOptions.isStreaming())); - prepareFilesToStage(mOptions); + prepareFilesToStage(pipelineOptions); - if (mOptions.isStreaming()) { - CheckpointDir checkpointDir = new CheckpointDir(mOptions.getCheckpointDir()); + final long startTime = Instant.now().getMillis(); + EventLoggingListener eventLoggingListener = null; + JavaSparkContext jsc = null; + if (pipelineOptions.isStreaming()) { + CheckpointDir checkpointDir = new CheckpointDir(pipelineOptions.getCheckpointDir()); SparkRunnerStreamingContextFactory streamingContextFactory = - new SparkRunnerStreamingContextFactory(pipeline, mOptions, checkpointDir); + new SparkRunnerStreamingContextFactory(pipeline, pipelineOptions, checkpointDir); final JavaStreamingContext jssc = JavaStreamingContext.getOrCreate( checkpointDir.getSparkCheckpointDir().toString(), streamingContextFactory); + jsc = jssc.sparkContext(); + eventLoggingListener = startEventLoggingListener(jsc, pipelineOptions, startTime); // Checkpoint aggregator/metrics values jssc.addStreamingListener( @@ -200,7 +191,8 @@ public SparkPipelineResult run(final Pipeline pipeline) { new MetricsAccumulator.AccumulatorCheckpointingSparkListener())); // register user-defined listeners. - for (JavaStreamingListener listener : mOptions.as(SparkContextOptions.class).getListeners()) { + for (JavaStreamingListener listener : + pipelineOptions.as(SparkContextOptions.class).getListeners()) { LOG.info("Registered listener {}." + listener.getClass().getSimpleName()); jssc.addStreamingListener(new JavaStreamingListenerWrapper(listener)); } @@ -213,7 +205,7 @@ public SparkPipelineResult run(final Pipeline pipeline) { // SparkRunnerStreamingContextFactory is because the factory is not called when resuming // from checkpoint (When not resuming from checkpoint initAccumulators will be called twice // but this is fine since it is idempotent). 
- initAccumulators(mOptions, jssc.sparkContext()); + initAccumulators(pipelineOptions, jssc.sparkContext()); startPipeline = executorService.submit( @@ -225,15 +217,16 @@ public SparkPipelineResult run(final Pipeline pipeline) { result = new SparkPipelineResult.StreamingMode(startPipeline, jssc); } else { - // create the evaluation context - final JavaSparkContext jsc = SparkContextFactory.getSparkContext(mOptions); - final EvaluationContext evaluationContext = new EvaluationContext(jsc, pipeline, mOptions); + jsc = SparkContextFactory.getSparkContext(pipelineOptions); + eventLoggingListener = startEventLoggingListener(jsc, pipelineOptions, startTime); + final EvaluationContext evaluationContext = + new EvaluationContext(jsc, pipeline, pipelineOptions); translator = new TransformTranslator.Translator(); // update the cache candidates updateCacheCandidates(pipeline, translator, evaluationContext); - initAccumulators(mOptions, jsc); + initAccumulators(pipelineOptions, jsc); startPipeline = executorService.submit( () -> { @@ -246,8 +239,8 @@ public SparkPipelineResult run(final Pipeline pipeline) { result = new SparkPipelineResult.BatchMode(startPipeline, jsc); } - if (mOptions.getEnableSparkMetricSinks()) { - registerMetricsSource(mOptions.getAppName()); + if (pipelineOptions.getEnableSparkMetricSinks()) { + registerMetricsSource(pipelineOptions.getAppName()); } // it would have been better to create MetricsPusher from runner-core but we need @@ -255,8 +248,19 @@ public SparkPipelineResult run(final Pipeline pipeline) { // MetricsContainerStepMap MetricsPusher metricsPusher = new MetricsPusher( - MetricsAccumulator.getInstance().value(), mOptions.as(MetricsOptions.class), result); + MetricsAccumulator.getInstance().value(), + pipelineOptions.as(MetricsOptions.class), + result); metricsPusher.start(); + + if (eventLoggingListener != null && jsc != null) { + eventLoggingListener.onApplicationStart( + SparkCompat.buildSparkListenerApplicationStart(jsc, pipelineOptions, startTime, result)); + eventLoggingListener.onApplicationEnd( + new SparkListenerApplicationEnd(Instant.now().getMillis())); + eventLoggingListener.stop(); + } + return result; } @@ -288,7 +292,7 @@ private void detectTranslationMode(Pipeline pipeline) { pipeline.traverseTopologically(detector); if (detector.getTranslationMode().equals(TranslationMode.STREAMING)) { // set streaming mode if it's a streaming pipeline - this.mOptions.setStreaming(true); + this.pipelineOptions.setStreaming(true); } } @@ -403,7 +407,7 @@ protected boolean shouldDefer(TransformHierarchy.Node node) { } PValue input = Iterables.getOnlyElement(nonAdditionalInputs); if (!(input instanceof PCollection) - || ((PCollection) input).getWindowingStrategy().getWindowFn().isNonMerging()) { + || !((PCollection) input).getWindowingStrategy().needsMerge()) { return false; } // so far we know that the input is a PCollection with merging windows. 
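/*
 * Editor's sketch (not part of the patch): the event-logging lifecycle that the SparkRunner and
 * SparkPipelineRunner changes above both introduce, collected in one place for readability. The
 * helper names startEventLoggingListener and SparkCompat.buildSparkListenerApplicationStart are
 * taken from the diff; the enclosing method, the runPipeline() call and the result variable are
 * hypothetical placeholders used only for illustration.
 */
static SparkPipelineResult runWithEventLogging(
    JavaSparkContext jsc, SparkPipelineOptions pipelineOptions) {
  final long startTime = Instant.now().getMillis();
  // May return null when Spark event logging is not enabled for this pipeline.
  EventLoggingListener eventLoggingListener =
      startEventLoggingListener(jsc, pipelineOptions, startTime);

  SparkPipelineResult result = runPipeline(jsc, pipelineOptions); // hypothetical submission step

  if (eventLoggingListener != null) {
    // Bracket the run with application start/end events so it shows up in the Spark history server.
    eventLoggingListener.onApplicationStart(
        SparkCompat.buildSparkListenerApplicationStart(jsc, pipelineOptions, startTime, result));
    eventLoggingListener.onApplicationEnd(
        new SparkListenerApplicationEnd(Instant.now().getMillis()));
    eventLoggingListener.stop();
  }
  return result;
}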
diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunnerDebugger.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunnerDebugger.java index 33b8408fa5be..65f83f5e8195 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunnerDebugger.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunnerDebugger.java @@ -79,14 +79,13 @@ public static SparkRunnerDebugger fromOptions(PipelineOptions options) { public SparkPipelineResult run(Pipeline pipeline) { boolean isStreaming = options.isStreaming() || options.as(TestSparkPipelineOptions.class).isForceStreaming(); - // Default to using the primitive versions of Read.Bounded and Read.Unbounded if we are - // executing an unbounded pipeline or the user specifically requested it. - if (isStreaming - || ExperimentalOptions.hasExperiment( - pipeline.getOptions(), "beam_fn_api_use_deprecated_read") - || ExperimentalOptions.hasExperiment(pipeline.getOptions(), "use_deprecated_read")) { - SplittableParDo.convertReadBasedSplittableDoFnsToPrimitiveReads(pipeline); + + // Default to using the primitive versions of Read.Bounded and Read.Unbounded. + // TODO(BEAM-10670): Use SDF read as default when we address performance issue. + if (!ExperimentalOptions.hasExperiment(pipeline.getOptions(), "beam_fn_api")) { + SplittableParDo.convertReadBasedSplittableDoFnsToPrimitiveReadsIfNecessary(pipeline); } + JavaSparkContext jsc = new JavaSparkContext("local[1]", "Debug_Pipeline"); JavaStreamingContext jssc = new JavaStreamingContext(jsc, new org.apache.spark.streaming.Duration(1000)); diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunnerRegistrar.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunnerRegistrar.java index 79f2a4e50d95..cfc07377b9ce 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunnerRegistrar.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunnerRegistrar.java @@ -18,8 +18,6 @@ package org.apache.beam.runners.spark; import com.google.auto.service.AutoService; -import org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingPipelineOptions; -import org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingRunner; import org.apache.beam.sdk.PipelineRunner; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.PipelineOptionsRegistrar; @@ -41,8 +39,7 @@ private SparkRunnerRegistrar() {} public static class Runner implements PipelineRunnerRegistrar { @Override public Iterable>> getPipelineRunners() { - return ImmutableList.of( - SparkRunner.class, TestSparkRunner.class, SparkStructuredStreamingRunner.class); + return ImmutableList.of(SparkRunner.class, TestSparkRunner.class); } } @@ -52,9 +49,7 @@ public static class Options implements PipelineOptionsRegistrar { @Override public Iterable> getPipelineOptions() { return ImmutableList.of( - SparkPipelineOptions.class, - SparkStructuredStreamingPipelineOptions.class, - SparkPortableStreamingPipelineOptions.class); + SparkPipelineOptions.class, SparkPortableStreamingPipelineOptions.class); } } } diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/aggregators/AggregatorsAccumulator.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/aggregators/AggregatorsAccumulator.java index b6433a042df7..09935e6e639c 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/aggregators/AggregatorsAccumulator.java +++ 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/aggregators/AggregatorsAccumulator.java @@ -81,7 +81,7 @@ public static void init(SparkPipelineOptions opts, JavaSparkContext jsc) { public static NamedAggregatorsAccumulator getInstance() { if (instance == null) { - throw new IllegalStateException("Aggregrators accumulator has not been instantiated"); + throw new IllegalStateException("Aggregators accumulator has not been instantiated"); } else { return instance; } diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/metrics/SparkBeamMetric.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/metrics/SparkBeamMetric.java index c43bb095e53b..02745c927c7e 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/metrics/SparkBeamMetric.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/metrics/SparkBeamMetric.java @@ -38,13 +38,11 @@ /** * An adapter between the {@link MetricsContainerStepMap} and Codahale's {@link Metric} interface. */ -class SparkBeamMetric implements Metric { +public class SparkBeamMetric implements Metric { private static final String ILLEGAL_CHARACTERS = "[^A-Za-z0-9-]"; - Map renderAll() { + static Map renderAll(MetricResults metricResults) { Map metrics = new HashMap<>(); - MetricResults metricResults = - asAttemptedOnlyMetricResults(MetricsAccumulator.getInstance().value()); MetricQueryResults metricQueryResults = metricResults.allMetrics(); for (MetricResult metricResult : metricQueryResults.getCounters()) { metrics.put(renderName(metricResult), metricResult.getAttempted()); @@ -63,8 +61,24 @@ class SparkBeamMetric implements Metric { return metrics; } + public static Map renderAllToString(MetricResults metricResults) { + Map metricsString = new HashMap<>(); + for (Map.Entry entry : renderAll(metricResults).entrySet()) { + String key = entry.getKey(); + String value = String.valueOf(entry.getValue()); + metricsString.put(key, value); + } + return metricsString; + } + + Map renderAll() { + MetricResults metricResults = + asAttemptedOnlyMetricResults(MetricsAccumulator.getInstance().value()); + return renderAll(metricResults); + } + @VisibleForTesting - String renderName(MetricResult metricResult) { + static String renderName(MetricResult metricResult) { MetricKey key = metricResult.getKey(); MetricName name = key.metricName(); String step = key.stepName(); diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/stateful/SparkTimerInternals.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/stateful/SparkTimerInternals.java index b726c233dcbe..d22fe8e6194b 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/stateful/SparkTimerInternals.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/stateful/SparkTimerInternals.java @@ -120,7 +120,7 @@ public void setTimer(TimerData timer) { } @Override - public void deleteTimer(StateNamespace namespace, String timerId, TimeDomain timeDomain) { + public void deleteTimer(StateNamespace namespace, String timerId, String timerFamilyId, TimeDomain timeDomain) { throw new UnsupportedOperationException("Deleting a timer by ID is not yet supported."); } diff --git a/sdks/go/pkg/beam/io/pubsubio/v1/v1.proto b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/Constants.java similarity index 68% rename from sdks/go/pkg/beam/io/pubsubio/v1/v1.proto rename to runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/Constants.java index 
e55930ed6ec3..23572de2e8cc 100644 --- a/sdks/go/pkg/beam/io/pubsubio/v1/v1.proto +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/Constants.java @@ -16,24 +16,11 @@ * limitations under the License. */ -/* - * Protocol Buffers describing pubsub payload (v1) to pass to beam.External. - */ -syntax = "proto3"; - -package v1; +package org.apache.beam.runners.spark.structuredstreaming; -message PubSubPayload { - enum Op { - INVALID = 0; - READ = 1; - WRITE = 2; - } - Op op = 1; +public class Constants { - string Topic = 2; - string Subscription = 3; - string IdAttribute = 4; - string TimestampAttribute = 5; - bool WithAttributes = 6; + public static final String BEAM_SOURCE_OPTION = "beam-source"; + public static final String DEFAULT_PARALLELISM = "default-parallelism"; + public static final String PIPELINE_OPTIONS = "pipeline-options"; } diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkStructuredStreamingRunner.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkStructuredStreamingRunner.java index 93dfbe759f9b..c0cb9628d015 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkStructuredStreamingRunner.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkStructuredStreamingRunner.java @@ -17,7 +17,7 @@ */ package org.apache.beam.runners.spark.structuredstreaming; -import static org.apache.beam.runners.core.construction.resources.PipelineResources.detectClassPathResourcesToStage; +import static org.apache.beam.runners.spark.SparkCommonPipelineOptions.prepareFilesToStage; import java.util.concurrent.ExecutorService; import java.util.concurrent.Executors; @@ -29,8 +29,8 @@ import org.apache.beam.runners.spark.structuredstreaming.metrics.CompositeSource; import org.apache.beam.runners.spark.structuredstreaming.metrics.MetricsAccumulator; import org.apache.beam.runners.spark.structuredstreaming.metrics.SparkBeamMetricSource; +import org.apache.beam.runners.spark.structuredstreaming.translation.AbstractTranslationContext; import org.apache.beam.runners.spark.structuredstreaming.translation.PipelineTranslator; -import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext; import org.apache.beam.runners.spark.structuredstreaming.translation.batch.PipelineTranslatorBatch; import org.apache.beam.runners.spark.structuredstreaming.translation.streaming.PipelineTranslatorStreaming; import org.apache.beam.sdk.Pipeline; @@ -109,23 +109,8 @@ public static SparkStructuredStreamingRunner create( * @return A pipeline runner that will execute with specified options. */ public static SparkStructuredStreamingRunner fromOptions(PipelineOptions options) { - SparkStructuredStreamingPipelineOptions sparkOptions = - PipelineOptionsValidator.validate(SparkStructuredStreamingPipelineOptions.class, options); - - if (sparkOptions.getFilesToStage() == null - && !PipelineTranslator.isLocalSparkMaster(sparkOptions)) { - sparkOptions.setFilesToStage( - detectClassPathResourcesToStage( - SparkStructuredStreamingRunner.class.getClassLoader(), options)); - LOG.info( - "PipelineOptions.filesToStage was not specified. " - + "Defaulting to files from the classpath: will stage {} files. 
" - + "Enable logging at DEBUG level to see which files will be staged.", - sparkOptions.getFilesToStage().size()); - LOG.debug("Classpath elements: {}", sparkOptions.getFilesToStage()); - } - - return new SparkStructuredStreamingRunner(sparkOptions); + return new SparkStructuredStreamingRunner( + PipelineOptionsValidator.validate(SparkStructuredStreamingPipelineOptions.class, options)); } /** @@ -150,7 +135,7 @@ public SparkStructuredStreamingPipelineResult run(final Pipeline pipeline) { AggregatorsAccumulator.clear(); MetricsAccumulator.clear(); - final TranslationContext translationContext = translatePipeline(pipeline); + final AbstractTranslationContext translationContext = translatePipeline(pipeline); final ExecutorService executorService = Executors.newSingleThreadExecutor(); final Future submissionFuture = @@ -183,20 +168,18 @@ public SparkStructuredStreamingPipelineResult run(final Pipeline pipeline) { return result; } - private TranslationContext translatePipeline(Pipeline pipeline) { + private AbstractTranslationContext translatePipeline(Pipeline pipeline) { PipelineTranslator.detectTranslationMode(pipeline, options); - // Default to using the primitive versions of Read.Bounded and Read.Unbounded if we are - // executing an unbounded pipeline or the user specifically requested it. - if (options.isStreaming() - || ExperimentalOptions.hasExperiment( - pipeline.getOptions(), "beam_fn_api_use_deprecated_read") - || ExperimentalOptions.hasExperiment(pipeline.getOptions(), "use_deprecated_read")) { - SplittableParDo.convertReadBasedSplittableDoFnsToPrimitiveReads(pipeline); + // Default to using the primitive versions of Read.Bounded and Read.Unbounded for non-portable + // execution. + // TODO(BEAM-10670): Use SDF read as default when we address performance issue. + if (!ExperimentalOptions.hasExperiment(pipeline.getOptions(), "beam_fn_api")) { + SplittableParDo.convertReadBasedSplittableDoFnsToPrimitiveReadsIfNecessary(pipeline); } PipelineTranslator.replaceTransforms(pipeline, options); - PipelineTranslator.prepareFilesToStageForRemoteClusterExecution(options); + prepareFilesToStage(options); PipelineTranslator pipelineTranslator = options.isStreaming() ? new PipelineTranslatorStreaming(options) diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkStructuredStreamingRunnerRegistrar.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkStructuredStreamingRunnerRegistrar.java new file mode 100644 index 000000000000..fb759da960ce --- /dev/null +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkStructuredStreamingRunnerRegistrar.java @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.spark.structuredstreaming; + +import com.google.auto.service.AutoService; +import org.apache.beam.sdk.PipelineRunner; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.options.PipelineOptionsRegistrar; +import org.apache.beam.sdk.runners.PipelineRunnerRegistrar; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; + +/** + * Contains the {@link PipelineRunnerRegistrar} and {@link PipelineOptionsRegistrar} for the {@link + * SparkStructuredStreamingRunner}. + * + *

    {@link AutoService} will register Spark's implementations of the {@link PipelineRunner} and + * {@link PipelineOptions} as available pipeline runner services. + */ +public final class SparkStructuredStreamingRunnerRegistrar { + private SparkStructuredStreamingRunnerRegistrar() {} + + /** Registers the {@link SparkStructuredStreamingRunner}. */ + @AutoService(PipelineRunnerRegistrar.class) + public static class Runner implements PipelineRunnerRegistrar { + @Override + public Iterable>> getPipelineRunners() { + return ImmutableList.of(SparkStructuredStreamingRunner.class); + } + } + + /** Registers the {@link SparkStructuredStreamingPipelineOptions}. */ + @AutoService(PipelineOptionsRegistrar.class) + public static class Options implements PipelineOptionsRegistrar { + @Override + public Iterable> getPipelineOptions() { + return ImmutableList.of(SparkStructuredStreamingPipelineOptions.class); + } + } +} diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/AbstractTranslationContext.java similarity index 94% rename from runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java rename to runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/AbstractTranslationContext.java index cf74d40495b8..6d66839b3ce7 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/AbstractTranslationContext.java @@ -44,7 +44,6 @@ import org.apache.spark.sql.ForeachWriter; import org.apache.spark.sql.SparkSession; import org.apache.spark.sql.streaming.DataStreamWriter; -import org.apache.spark.sql.streaming.StreamingQueryException; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -56,9 +55,9 @@ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) -public class TranslationContext { +public abstract class AbstractTranslationContext { - private static final Logger LOG = LoggerFactory.getLogger(TranslationContext.class); + private static final Logger LOG = LoggerFactory.getLogger(AbstractTranslationContext.class); /** All the datasets of the DAG. */ private final Map> datasets; @@ -75,7 +74,7 @@ public class TranslationContext { private final Map, Dataset> broadcastDataSets; - public TranslationContext(SparkStructuredStreamingPipelineOptions options) { + public AbstractTranslationContext(SparkStructuredStreamingPipelineOptions options) { SparkConf sparkConf = new SparkConf(); sparkConf.setMaster(options.getSparkMaster()); sparkConf.setAppName(options.getAppName()); @@ -201,8 +200,7 @@ public Map, Coder> getOutputCoders() { // -------------------------------------------------------------------------------------------- /** Starts the pipeline. */ - public void startPipeline() { - try { + public void startPipeline(){ SparkStructuredStreamingPipelineOptions options = serializablePipelineOptions.get().as(SparkStructuredStreamingPipelineOptions.class); int datasetIndex = 0; @@ -216,8 +214,7 @@ public void startPipeline() { dataStreamWriter = dataStreamWriter.option("checkpointLocation", options.getCheckpointDir()); } - // TODO: Do not await termination here. 
- dataStreamWriter.foreach(new NoOpForeachWriter<>()).start().awaitTermination(); + launchStreaming(dataStreamWriter.foreach(new NoOpForeachWriter<>())); } else { if (options.getTestMode()) { LOG.debug("**** dataset {} catalyst execution plans ****", ++datasetIndex); @@ -228,10 +225,8 @@ public void startPipeline() { dataset.foreach((ForeachFunction) t -> {}); } } - } catch (StreamingQueryException e) { - throw new RuntimeException("Pipeline execution failed: " + e); - } } + public abstract void launchStreaming(DataStreamWriter dataStreamWriter); public static void printDatasetContent(Dataset dataset) { // cannot use dataset.show because dataset schema is binary so it will print binary @@ -241,7 +236,6 @@ public static void printDatasetContent(Dataset dataset) { LOG.debug("**** dataset content {} ****", windowedValue.toString()); } } - private static class NoOpForeachWriter extends ForeachWriter { @Override @@ -259,4 +253,5 @@ public void close(Throwable errorOrNull) { // do nothing } } + } diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java index ce3cd730d0fb..b78907fe4962 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java @@ -18,8 +18,6 @@ package org.apache.beam.runners.spark.structuredstreaming.translation; import org.apache.beam.runners.core.construction.PTransformTranslation; -import org.apache.beam.runners.core.construction.resources.PipelineResources; -import org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingPipelineOptions; import org.apache.beam.runners.spark.structuredstreaming.translation.batch.PipelineTranslatorBatch; import org.apache.beam.runners.spark.structuredstreaming.translation.streaming.PipelineTranslatorStreaming; import org.apache.beam.sdk.Pipeline; @@ -44,25 +42,11 @@ public abstract class PipelineTranslator extends Pipeline.PipelineVisitor.Defaults { private int depth = 0; private static final Logger LOG = LoggerFactory.getLogger(PipelineTranslator.class); - protected TranslationContext translationContext; + protected AbstractTranslationContext translationContext; // -------------------------------------------------------------------------------------------- // Pipeline preparation methods // -------------------------------------------------------------------------------------------- - /** - * Local configurations work in the same JVM and have no problems with improperly formatted files - * on classpath (eg. directories with .class files or empty directories). Prepare files for - * staging only when using remote cluster (passing the master address explicitly). 
- */ - public static void prepareFilesToStageForRemoteClusterExecution( - SparkStructuredStreamingPipelineOptions options) { - if (!PipelineTranslator.isLocalSparkMaster(options)) { - options.setFilesToStage( - PipelineResources.prepareFilesForStaging( - options.getFilesToStage(), options.getTempLocation())); - } - } - public static void replaceTransforms(Pipeline pipeline, StreamingOptions options) { pipeline.replaceAll(SparkTransformOverrides.getDefaultOverrides(options.isStreaming())); } @@ -138,11 +122,6 @@ private static String genSpaces(int n) { return builder.toString(); } - /** Detects if the pipeline is run in local spark mode. */ - public static boolean isLocalSparkMaster(SparkStructuredStreamingPipelineOptions options) { - return options.getSparkMaster().matches("local\\[?\\d*]?"); - } - /** Get a {@link TransformTranslator} for the given {@link TransformHierarchy.Node}. */ protected abstract TransformTranslator getTransformTranslator(TransformHierarchy.Node node); @@ -217,7 +196,7 @@ public void visitPrimitiveTransform(TransformHierarchy.Node node) { applyTransformTranslator(node, transformTranslator); } - public TranslationContext getTranslationContext() { + public AbstractTranslationContext getTranslationContext() { return translationContext; } } diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java index 6d242ca33357..f7a5d5600223 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java @@ -27,5 +27,5 @@ public interface TransformTranslator extends Serializable { /** Base class for translators of {@link PTransform}. */ - void translateTransform(TransformT transform, TranslationContext context); + void translateTransform(TransformT transform, AbstractTranslationContext context); } diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/AggregatorCombiner.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/AggregatorCombiner.java index ec2f65e3703f..d0f46ea807c2 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/AggregatorCombiner.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/AggregatorCombiner.java @@ -161,11 +161,7 @@ public Iterable> merge( Tuple2 accumAndInstant = new Tuple2<>( accumT, - timestampCombiner.assign( - mergedWindowForAccumulator, - windowingStrategy - .getWindowFn() - .getOutputTime(accumulatorWv.getTimestamp(), mergedWindowForAccumulator))); + timestampCombiner.assign(mergedWindowForAccumulator, accumulatorWv.getTimestamp())); if (mergedWindowToAccumulators.get(mergedWindowForAccumulator) == null) { mergedWindowToAccumulators.put( mergedWindowForAccumulator, Lists.newArrayList(accumAndInstant)); @@ -238,7 +234,7 @@ private Map mergeWindows(WindowingStrategy windowingStrategy, S throws Exception { WindowFn windowFn = windowingStrategy.getWindowFn(); - if (windowingStrategy.getWindowFn().isNonMerging()) { + if (!windowingStrategy.needsMerge()) { // Return an empty map, indicating that every window is not merged. 
return Collections.emptyMap(); } diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/CombinePerKeyTranslatorBatch.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/CombinePerKeyTranslatorBatch.java index 8d64fd345423..4b9464ac847c 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/CombinePerKeyTranslatorBatch.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/CombinePerKeyTranslatorBatch.java @@ -19,8 +19,8 @@ import java.util.ArrayList; import java.util.List; +import org.apache.beam.runners.spark.structuredstreaming.translation.AbstractTranslationContext; import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator; -import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext; import org.apache.beam.runners.spark.structuredstreaming.translation.helpers.EncoderHelpers; import org.apache.beam.runners.spark.structuredstreaming.translation.helpers.KVHelpers; import org.apache.beam.sdk.coders.CannotProvideCoderException; @@ -48,7 +48,7 @@ class CombinePerKeyTranslatorBatch @Override public void translateTransform( PTransform>, PCollection>> transform, - TranslationContext context) { + AbstractTranslationContext context) { Combine.PerKey combineTransform = (Combine.PerKey) transform; @SuppressWarnings("unchecked") diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/CreatePCollectionViewTranslatorBatch.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/CreatePCollectionViewTranslatorBatch.java index 525cc74ae0b9..ba40a8576ae9 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/CreatePCollectionViewTranslatorBatch.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/CreatePCollectionViewTranslatorBatch.java @@ -19,8 +19,8 @@ import java.io.IOException; import org.apache.beam.runners.core.construction.CreatePCollectionViewTranslation; +import org.apache.beam.runners.spark.structuredstreaming.translation.AbstractTranslationContext; import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator; -import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext; import org.apache.beam.sdk.runners.AppliedPTransform; import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.util.WindowedValue; @@ -33,7 +33,7 @@ class CreatePCollectionViewTranslatorBatch @Override public void translateTransform( - PTransform, PCollection> transform, TranslationContext context) { + PTransform, PCollection> transform, AbstractTranslationContext context) { Dataset> inputDataSet = context.getDataset(context.getInput()); diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DoFnFunction.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DoFnFunction.java index 0e2a40f8f9f6..ccddf62cf463 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DoFnFunction.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DoFnFunction.java @@ -106,7 +106,7 @@ public 
DoFnFunction( public Iterator, WindowedValue>> call(Iterator> iter) throws Exception { if (!wasSetupCalled && iter.hasNext()) { - DoFnInvokers.tryInvokeSetupFor(doFn); + DoFnInvokers.tryInvokeSetupFor(doFn, serializableOptions.get()); wasSetupCalled = true; } diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/FlattenTranslatorBatch.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/FlattenTranslatorBatch.java index 8320fad07bc9..beb35424998b 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/FlattenTranslatorBatch.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/FlattenTranslatorBatch.java @@ -20,8 +20,8 @@ import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; import java.util.Collection; +import org.apache.beam.runners.spark.structuredstreaming.translation.AbstractTranslationContext; import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator; -import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext; import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.util.WindowedValue; import org.apache.beam.sdk.values.PCollection; @@ -37,7 +37,7 @@ class FlattenTranslatorBatch @Override public void translateTransform( - PTransform, PCollection> transform, TranslationContext context) { + PTransform, PCollection> transform, AbstractTranslationContext context) { Collection> pcollectionList = context.getInputs().values(); Dataset> result = null; if (pcollectionList.isEmpty()) { diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTranslatorBatch.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTranslatorBatch.java index ab3faab3925e..4fe26d7d67a6 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTranslatorBatch.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTranslatorBatch.java @@ -17,26 +17,27 @@ */ package org.apache.beam.runners.spark.structuredstreaming.translation.batch; -import java.io.Serializable; -import org.apache.beam.runners.core.InMemoryStateInternals; -import org.apache.beam.runners.core.StateInternals; -import org.apache.beam.runners.core.StateInternalsFactory; -import org.apache.beam.runners.core.SystemReduceFn; +import java.util.ArrayList; +import java.util.List; +import org.apache.beam.runners.core.Concatenate; +import org.apache.beam.runners.spark.structuredstreaming.translation.AbstractTranslationContext; import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator; -import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext; -import org.apache.beam.runners.spark.structuredstreaming.translation.batch.functions.GroupAlsoByWindowViaOutputBufferFn; import org.apache.beam.runners.spark.structuredstreaming.translation.helpers.EncoderHelpers; import org.apache.beam.runners.spark.structuredstreaming.translation.helpers.KVHelpers; +import org.apache.beam.sdk.coders.CannotProvideCoderException; import org.apache.beam.sdk.coders.Coder; -import org.apache.beam.sdk.coders.IterableCoder; import 
org.apache.beam.sdk.coders.KvCoder; +import org.apache.beam.sdk.transforms.Combine; import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.util.WindowedValue; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.WindowingStrategy; +import org.apache.spark.api.java.function.FlatMapFunction; import org.apache.spark.sql.Dataset; import org.apache.spark.sql.KeyValueGroupedDataset; +import scala.Tuple2; class GroupByKeyTranslatorBatch implements TransformTranslator< @@ -45,46 +46,65 @@ class GroupByKeyTranslatorBatch @Override public void translateTransform( PTransform>, PCollection>>> transform, - TranslationContext context) { + AbstractTranslationContext context) { @SuppressWarnings("unchecked") - final PCollection> inputPCollection = (PCollection>) context.getInput(); - Dataset>> input = context.getDataset(inputPCollection); - WindowingStrategy windowingStrategy = inputPCollection.getWindowingStrategy(); - KvCoder kvCoder = (KvCoder) inputPCollection.getCoder(); - Coder valueCoder = kvCoder.getValueCoder(); + final PCollection> input = (PCollection>) context.getInput(); + @SuppressWarnings("unchecked") + final PCollection>> output = (PCollection>>) context.getOutput(); + final Combine.CombineFn, List> combineFn = new Concatenate<>(); - // group by key only - Coder keyCoder = kvCoder.getKeyCoder(); - KeyValueGroupedDataset>> groupByKeyOnly = - input.groupByKey(KVHelpers.extractKey(), EncoderHelpers.fromBeamCoder(keyCoder)); + WindowingStrategy windowingStrategy = input.getWindowingStrategy(); - // group also by windows - WindowedValue.FullWindowedValueCoder>> outputCoder = - WindowedValue.FullWindowedValueCoder.of( - KvCoder.of(keyCoder, IterableCoder.of(valueCoder)), - windowingStrategy.getWindowFn().windowCoder()); - Dataset>>> output = - groupByKeyOnly.flatMapGroups( - new GroupAlsoByWindowViaOutputBufferFn<>( - windowingStrategy, - new InMemoryStateInternalsFactory<>(), - SystemReduceFn.buffering(valueCoder), - context.getSerializableOptions()), - EncoderHelpers.fromBeamCoder(outputCoder)); + Dataset>> inputDataset = context.getDataset(input); - context.putDataset(context.getOutput(), output); - } + KvCoder inputCoder = (KvCoder) input.getCoder(); + Coder keyCoder = inputCoder.getKeyCoder(); + KvCoder> outputKVCoder = (KvCoder>) output.getCoder(); + Coder> outputCoder = outputKVCoder.getValueCoder(); + + KeyValueGroupedDataset>> groupedDataset = + inputDataset.groupByKey(KVHelpers.extractKey(), EncoderHelpers.fromBeamCoder(keyCoder)); - /** - * In-memory state internals factory. - * - * @param State key type. 
- */ - static class InMemoryStateInternalsFactory implements StateInternalsFactory, Serializable { - @Override - public StateInternals stateInternalsForKey(K key) { - return InMemoryStateInternals.forKey(key); + Coder> accumulatorCoder = null; + try { + accumulatorCoder = + combineFn.getAccumulatorCoder( + input.getPipeline().getCoderRegistry(), inputCoder.getValueCoder()); + } catch (CannotProvideCoderException e) { + throw new RuntimeException(e); } + + Dataset>>>> combinedDataset = + groupedDataset.agg( + new AggregatorCombiner, List, BoundedWindow>( + combineFn, windowingStrategy, accumulatorCoder, outputCoder) + .toColumn()); + + // expand the list into separate elements and put the key back into the elements + WindowedValue.WindowedValueCoder>> wvCoder = + WindowedValue.FullWindowedValueCoder.of( + outputKVCoder, input.getWindowingStrategy().getWindowFn().windowCoder()); + Dataset>>> outputDataset = + combinedDataset.flatMap( + (FlatMapFunction< + Tuple2>>>, WindowedValue>>>) + tuple2 -> { + K key = tuple2._1(); + Iterable>> windowedValues = tuple2._2(); + List>>> result = new ArrayList<>(); + for (WindowedValue> windowedValue : windowedValues) { + KV> kv = KV.of(key, windowedValue.getValue()); + result.add( + WindowedValue.of( + kv, + windowedValue.getTimestamp(), + windowedValue.getWindows(), + windowedValue.getPane())); + } + return result.iterator(); + }, + EncoderHelpers.fromBeamCoder(wvCoder)); + context.putDataset(output, outputDataset); } } diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ImpulseTranslatorBatch.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ImpulseTranslatorBatch.java index dcdceadbccc7..65f496c772ba 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ImpulseTranslatorBatch.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ImpulseTranslatorBatch.java @@ -18,8 +18,8 @@ package org.apache.beam.runners.spark.structuredstreaming.translation.batch; import java.util.Collections; +import org.apache.beam.runners.spark.structuredstreaming.translation.AbstractTranslationContext; import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator; -import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext; import org.apache.beam.runners.spark.structuredstreaming.translation.helpers.EncoderHelpers; import org.apache.beam.sdk.coders.ByteArrayCoder; import org.apache.beam.sdk.coders.Coder; @@ -35,7 +35,7 @@ public class ImpulseTranslatorBatch @Override public void translateTransform( - PTransform> transform, TranslationContext context) { + PTransform> transform, AbstractTranslationContext context) { Coder> windowedValueCoder = WindowedValue.FullWindowedValueCoder.of(ByteArrayCoder.of(), GlobalWindow.Coder.INSTANCE); Dataset> dataset = diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ParDoTranslatorBatch.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ParDoTranslatorBatch.java index f2b121796155..bade668382f7 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ParDoTranslatorBatch.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ParDoTranslatorBatch.java @@ -27,11 +27,11 
@@ import org.apache.beam.runners.core.construction.ParDoTranslation; import org.apache.beam.runners.spark.structuredstreaming.metrics.MetricsAccumulator; import org.apache.beam.runners.spark.structuredstreaming.metrics.MetricsContainerStepMapAccumulator; +import org.apache.beam.runners.spark.structuredstreaming.translation.AbstractTranslationContext; import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator; -import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext; import org.apache.beam.runners.spark.structuredstreaming.translation.helpers.CoderHelpers; import org.apache.beam.runners.spark.structuredstreaming.translation.helpers.EncoderHelpers; -import org.apache.beam.runners.spark.structuredstreaming.translation.helpers.MultiOuputCoder; +import org.apache.beam.runners.spark.structuredstreaming.translation.helpers.MultiOutputCoder; import org.apache.beam.runners.spark.structuredstreaming.translation.helpers.SideInputBroadcast; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.SerializableCoder; @@ -68,7 +68,7 @@ class ParDoTranslatorBatch @Override public void translateTransform( - PTransform, PCollectionTuple> transform, TranslationContext context) { + PTransform, PCollectionTuple> transform, AbstractTranslationContext context) { String stepName = context.getCurrentTransform().getFullName(); // Check for not supported advanced features @@ -140,8 +140,8 @@ public void translateTransform( doFnSchemaInformation, sideInputMapping); - MultiOuputCoder multipleOutputCoder = - MultiOuputCoder.of(SerializableCoder.of(TupleTag.class), outputCoderMap, windowCoder); + MultiOutputCoder multipleOutputCoder = + MultiOutputCoder.of(SerializableCoder.of(TupleTag.class), outputCoderMap, windowCoder); Dataset, WindowedValue>> allOutputs = inputDataSet.mapPartitions(doFnWrapper, EncoderHelpers.fromBeamCoder(multipleOutputCoder)); if (outputs.entrySet().size() > 1) { @@ -163,7 +163,7 @@ public void translateTransform( } private static SideInputBroadcast createBroadcastSideInputs( - List> sideInputs, TranslationContext context) { + List> sideInputs, AbstractTranslationContext context) { JavaSparkContext jsc = JavaSparkContext.fromSparkContext(context.getSparkSession().sparkContext()); @@ -189,7 +189,7 @@ private static SideInputBroadcast createBroadcastSideInputs( return sideInputBroadcast; } - private List> getSideInputs(TranslationContext context) { + private List> getSideInputs(AbstractTranslationContext context) { List> sideInputs; try { sideInputs = ParDoTranslation.getSideInputs(context.getCurrentTransform()); @@ -199,7 +199,7 @@ private List> getSideInputs(TranslationContext context) { return sideInputs; } - private TupleTag getTupleTag(TranslationContext context) { + private TupleTag getTupleTag(AbstractTranslationContext context) { TupleTag mainOutputTag; try { mainOutputTag = ParDoTranslation.getMainOutputTag(context.getCurrentTransform()); @@ -210,7 +210,7 @@ private TupleTag getTupleTag(TranslationContext context) { } @SuppressWarnings("unchecked") - private DoFn getDoFn(TranslationContext context) { + private DoFn getDoFn(AbstractTranslationContext context) { DoFn doFn; try { doFn = (DoFn) ParDoTranslation.getDoFn(context.getCurrentTransform()); @@ -221,7 +221,7 @@ private DoFn getDoFn(TranslationContext context) { } private void pruneOutputFilteredByTag( - TranslationContext context, + AbstractTranslationContext context, Dataset, WindowedValue>> allOutputs, Map.Entry, PCollection> output, Coder windowCoder) 
{ diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java index 6af7f55877f4..c07e882d6d13 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java @@ -17,11 +17,15 @@ */ package org.apache.beam.runners.spark.structuredstreaming.translation.batch; +import static org.apache.beam.runners.spark.structuredstreaming.Constants.BEAM_SOURCE_OPTION; +import static org.apache.beam.runners.spark.structuredstreaming.Constants.DEFAULT_PARALLELISM; +import static org.apache.beam.runners.spark.structuredstreaming.Constants.PIPELINE_OPTIONS; + import java.io.IOException; import org.apache.beam.runners.core.construction.ReadTranslation; import org.apache.beam.runners.core.serialization.Base64Serializer; +import org.apache.beam.runners.spark.structuredstreaming.translation.AbstractTranslationContext; import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator; -import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext; import org.apache.beam.runners.spark.structuredstreaming.translation.helpers.EncoderHelpers; import org.apache.beam.runners.spark.structuredstreaming.translation.helpers.RowHelpers; import org.apache.beam.sdk.io.BoundedSource; @@ -43,7 +47,7 @@ class ReadSourceTranslatorBatch @SuppressWarnings("unchecked") @Override public void translateTransform( - PTransform> transform, TranslationContext context) { + PTransform> transform, AbstractTranslationContext context) { AppliedPTransform, PTransform>> rootTransform = (AppliedPTransform, PTransform>>) context.getCurrentTransform(); @@ -61,12 +65,12 @@ public void translateTransform( sparkSession .read() .format(sourceProviderClass) - .option(DatasetSourceBatch.BEAM_SOURCE_OPTION, serializedSource) + .option(BEAM_SOURCE_OPTION, serializedSource) .option( - DatasetSourceBatch.DEFAULT_PARALLELISM, + DEFAULT_PARALLELISM, String.valueOf(context.getSparkSession().sparkContext().defaultParallelism())) .option( - DatasetSourceBatch.PIPELINE_OPTIONS, context.getSerializableOptions().toString()) + PIPELINE_OPTIONS, context.getSerializableOptions().toString()) .load(); // extract windowedValue from Row diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReshuffleTranslatorBatch.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReshuffleTranslatorBatch.java index ca71dc7448c0..094328c912dd 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReshuffleTranslatorBatch.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReshuffleTranslatorBatch.java @@ -17,13 +17,13 @@ */ package org.apache.beam.runners.spark.structuredstreaming.translation.batch; +import org.apache.beam.runners.spark.structuredstreaming.translation.AbstractTranslationContext; import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator; -import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext; import org.apache.beam.sdk.transforms.Reshuffle; 
/** TODO: Should be removed if {@link Reshuffle} won't be translated. */ class ReshuffleTranslatorBatch implements TransformTranslator> { @Override - public void translateTransform(Reshuffle transform, TranslationContext context) {} + public void translateTransform(Reshuffle transform, AbstractTranslationContext context) {} } diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/WindowAssignTranslatorBatch.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/WindowAssignTranslatorBatch.java index b1f7c6ffd657..20f8f2f7a58a 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/WindowAssignTranslatorBatch.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/WindowAssignTranslatorBatch.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.spark.structuredstreaming.translation.batch; +import org.apache.beam.runners.spark.structuredstreaming.translation.AbstractTranslationContext; import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator; -import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext; import org.apache.beam.runners.spark.structuredstreaming.translation.helpers.EncoderHelpers; import org.apache.beam.runners.spark.structuredstreaming.translation.helpers.WindowingHelpers; import org.apache.beam.sdk.transforms.PTransform; @@ -36,7 +36,7 @@ class WindowAssignTranslatorBatch @Override public void translateTransform( - PTransform, PCollection> transform, TranslationContext context) { + PTransform, PCollection> transform, AbstractTranslationContext context) { Window.Assign assignTransform = (Window.Assign) transform; @SuppressWarnings("unchecked") diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/EncoderHelpers.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/EncoderHelpers.java index 2597e2cc5238..5a7ff8adb607 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/EncoderHelpers.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/EncoderHelpers.java @@ -21,17 +21,11 @@ import java.io.Serializable; import java.util.ArrayList; -import java.util.Collections; import java.util.List; import java.util.Objects; -import org.apache.beam.runners.spark.structuredstreaming.translation.SchemaHelpers; import org.apache.beam.sdk.coders.Coder; import org.apache.spark.sql.Encoder; import org.apache.spark.sql.Encoders; -import org.apache.spark.sql.catalyst.analysis.GetColumnByOrdinal; -import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder; -import org.apache.spark.sql.catalyst.expressions.BoundReference; -import org.apache.spark.sql.catalyst.expressions.Cast; import org.apache.spark.sql.catalyst.expressions.Expression; import org.apache.spark.sql.catalyst.expressions.NonSQLExpression; import org.apache.spark.sql.catalyst.expressions.UnaryExpression; @@ -45,7 +39,6 @@ import scala.StringContext; import scala.collection.JavaConversions; import scala.reflect.ClassTag; -import scala.reflect.ClassTag$; /** {@link Encoders} utility class. */ @SuppressWarnings({ @@ -57,19 +50,7 @@ public class EncoderHelpers { * generation). 
*/ public static Encoder fromBeamCoder(Coder coder) { - Class clazz = coder.getEncodedTypeDescriptor().getRawType(); - ClassTag classTag = ClassTag$.MODULE$.apply(clazz); - List serializers = - Collections.singletonList( - new EncodeUsingBeamCoder<>(new BoundReference(0, new ObjectType(clazz), true), coder)); - - return new ExpressionEncoder<>( - SchemaHelpers.binarySchema(), - false, - JavaConversions.collectionAsScalaIterable(serializers).toSeq(), - new DecodeUsingBeamCoder<>( - new Cast(new GetColumnByOrdinal(0, BinaryType), BinaryType), classTag, coder), - classTag); + return EncoderFactory.fromBeamCoder(coder); } /** diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/MultiOuputCoder.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/MultiOutputCoder.java similarity index 93% rename from runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/MultiOuputCoder.java rename to runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/MultiOutputCoder.java index db5d4c65a835..9cdb796aa845 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/MultiOuputCoder.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/MultiOutputCoder.java @@ -39,19 +39,19 @@ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) -public class MultiOuputCoder extends CustomCoder, WindowedValue>> { +public class MultiOutputCoder extends CustomCoder, WindowedValue>> { Coder tupleTagCoder; Map, Coder> coderMap; Coder windowCoder; - public static MultiOuputCoder of( + public static MultiOutputCoder of( Coder tupleTagCoder, Map, Coder> coderMap, Coder windowCoder) { - return new MultiOuputCoder(tupleTagCoder, coderMap, windowCoder); + return new MultiOutputCoder(tupleTagCoder, coderMap, windowCoder); } - private MultiOuputCoder( + private MultiOutputCoder( Coder tupleTagCoder, Map, Coder> coderMap, Coder windowCoder) { diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/SchemaHelpers.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/SchemaHelpers.java similarity index 99% rename from runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/SchemaHelpers.java rename to runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/SchemaHelpers.java index b778c4655939..71dca5264dd8 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/SchemaHelpers.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/SchemaHelpers.java @@ -15,7 +15,7 @@ * See the License for the specific language governing permissions and * limitations under the License. 
*/ -package org.apache.beam.runners.spark.structuredstreaming.translation; +package org.apache.beam.runners.spark.structuredstreaming.translation.helpers; import org.apache.spark.sql.types.DataTypes; import org.apache.spark.sql.types.Metadata; diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/WindowingHelpers.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/WindowingHelpers.java index 74272b58c670..5085eb9f7964 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/WindowingHelpers.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/WindowingHelpers.java @@ -18,7 +18,7 @@ package org.apache.beam.runners.spark.structuredstreaming.translation.helpers; import java.util.Collection; -import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext; +import org.apache.beam.runners.spark.structuredstreaming.translation.AbstractTranslationContext; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.transforms.windowing.GlobalWindows; import org.apache.beam.sdk.transforms.windowing.Window; @@ -41,7 +41,7 @@ public final class WindowingHelpers { */ @SuppressWarnings("unchecked") public static boolean skipAssignWindows( - Window.Assign transform, TranslationContext context) { + Window.Assign transform, AbstractTranslationContext context) { WindowFn windowFnToApply = (WindowFn) transform.getWindowFn(); PCollection input = (PCollection) context.getInput(); WindowFn windowFnOfInput = input.getWindowingStrategy().getWindowFn(); diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/ReadSourceTranslatorStreaming.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/ReadSourceTranslatorStreaming.java index ea1027250768..23b1d97968b1 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/ReadSourceTranslatorStreaming.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/ReadSourceTranslatorStreaming.java @@ -17,11 +17,15 @@ */ package org.apache.beam.runners.spark.structuredstreaming.translation.streaming; +import static org.apache.beam.runners.spark.structuredstreaming.Constants.BEAM_SOURCE_OPTION; +import static org.apache.beam.runners.spark.structuredstreaming.Constants.DEFAULT_PARALLELISM; +import static org.apache.beam.runners.spark.structuredstreaming.Constants.PIPELINE_OPTIONS; + import java.io.IOException; import org.apache.beam.runners.core.construction.ReadTranslation; import org.apache.beam.runners.core.serialization.Base64Serializer; +import org.apache.beam.runners.spark.structuredstreaming.translation.AbstractTranslationContext; import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator; -import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext; import org.apache.beam.runners.spark.structuredstreaming.translation.helpers.EncoderHelpers; import org.apache.beam.runners.spark.structuredstreaming.translation.helpers.RowHelpers; import org.apache.beam.sdk.io.UnboundedSource; @@ -43,7 +47,7 @@ class ReadSourceTranslatorStreaming @SuppressWarnings("unchecked") @Override public void translateTransform( - PTransform> transform, 
TranslationContext context) { + PTransform> transform, AbstractTranslationContext context) { AppliedPTransform, PTransform>> rootTransform = (AppliedPTransform, PTransform>>) context.getCurrentTransform(); @@ -61,12 +65,12 @@ public void translateTransform( sparkSession .readStream() .format(sourceProviderClass) - .option(DatasetSourceStreaming.BEAM_SOURCE_OPTION, serializedSource) + .option(BEAM_SOURCE_OPTION, serializedSource) .option( - DatasetSourceStreaming.DEFAULT_PARALLELISM, + DEFAULT_PARALLELISM, String.valueOf(context.getSparkSession().sparkContext().defaultParallelism())) .option( - DatasetSourceStreaming.PIPELINE_OPTIONS, + PIPELINE_OPTIONS, context.getSerializableOptions().toString()) .load(); diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/GroupNonMergingWindowsFunctions.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/GroupNonMergingWindowsFunctions.java index 0c0bc95b35d5..b75db5e0811f 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/GroupNonMergingWindowsFunctions.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/GroupNonMergingWindowsFunctions.java @@ -59,7 +59,7 @@ public class GroupNonMergingWindowsFunctions { * @return {@code true} if group by key and window can be used */ static boolean isEligibleForGroupByWindow(WindowingStrategy windowingStrategy) { - return windowingStrategy.getWindowFn().isNonMerging() + return !windowingStrategy.needsMerge() && windowingStrategy.getTimestampCombiner() == TimestampCombiner.END_OF_WINDOW && windowingStrategy.getWindowFn().windowCoder().consistentWithEquals(); } @@ -250,13 +250,7 @@ private WindowedValue> decodeItem(Tuple2 item) { @SuppressWarnings("unchecked") final W window = (W) Iterables.getOnlyElement(windowedValue.getWindows()); final Instant timestamp = - windowingStrategy - .getTimestampCombiner() - .assign( - window, - windowingStrategy - .getWindowFn() - .getOutputTime(windowedValue.getTimestamp(), window)); + windowingStrategy.getTimestampCombiner().assign(window, windowedValue.getTimestamp()); // BEAM-7341: Elements produced by GbK are always ON_TIME and ONLY_FIRING return WindowedValue.of( KV.of(key, value), timestamp, window, PaneInfo.ON_TIME_AND_ONLY_FIRING); diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/MultiDoFnFunction.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/MultiDoFnFunction.java index c9c7c8321be7..f19369cc4de5 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/MultiDoFnFunction.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/MultiDoFnFunction.java @@ -126,7 +126,7 @@ public MultiDoFnFunction( public Iterator, WindowedValue>> call(Iterator> iter) throws Exception { if (!wasSetupCalled && iter.hasNext()) { - DoFnInvokers.tryInvokeSetupFor(doFn); + DoFnInvokers.tryInvokeSetupFor(doFn, options.get()); wasSetupCalled = true; } diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkCombineFn.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkCombineFn.java index 2bc805b54f0c..cb46e1ddea54 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkCombineFn.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkCombineFn.java @@ -204,10 +204,7 @@ public void add(WindowedValue value, SparkCombineFn value, SparkCombineFn { @@ -728,6 
+722,8 @@ private static Comparator asWindowComparator( WindowedAccumulator.create(this, toValue, windowingStrategy, windowComparator); accumulator.add(value, this); return accumulator; + } catch (RuntimeException ex) { + throw ex; } catch (Exception ex) { throw new IllegalStateException(ex); } @@ -743,6 +739,8 @@ private static Comparator asWindowComparator( try { accumulator.add(value, this); return accumulator; + } catch (RuntimeException ex) { + throw ex; } catch (Exception ex) { throw new IllegalStateException(ex); } @@ -759,6 +757,8 @@ private static Comparator asWindowComparator( try { ac1.merge(ac2, this); return ac1; + } catch (RuntimeException ex) { + throw ex; } catch (Exception ex) { throw new IllegalStateException(ex); } @@ -796,7 +796,7 @@ boolean mustBringWindowToKey() { } private WindowedAccumulator.Type getType(WindowingStrategy windowingStrategy) { - if (windowingStrategy.getWindowFn().isNonMerging()) { + if (!windowingStrategy.needsMerge()) { if (globalCombine) { /* global combine must use map-based accumulator to incorporate multiple windows */ return WindowedAccumulator.Type.NON_MERGING; diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/util/SparkCommon.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/util/SparkCommon.java new file mode 100644 index 000000000000..6f1f1f72cf48 --- /dev/null +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/util/SparkCommon.java @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.spark.util; + +import java.net.URI; +import java.net.URISyntaxException; +import org.apache.beam.runners.spark.SparkPipelineOptions; +import org.apache.beam.sdk.annotations.Internal; +import org.apache.spark.api.java.JavaSparkContext; +import org.apache.spark.scheduler.EventLoggingListener; +import org.apache.spark.scheduler.SparkListenerExecutorAdded; +import org.apache.spark.scheduler.cluster.ExecutorInfo; +import org.checkerframework.checker.nullness.qual.Nullable; +import scala.Tuple2; + +/** Common methods to build Spark specific objects used by different runners. */ +@Internal +@SuppressWarnings({ + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) +public class SparkCommon { + + /** + * Starts an EventLoggingListener to save Beam Metrics on Spark's History Server if event logging + * is enabled. + * + * @return The associated EventLoggingListener or null if it could not be started. 
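The new SparkCommon.startEventLoggingListener helper only creates and starts the listener; callers still own the rest of its lifecycle. A minimal usage sketch follows (illustrative only, not part of this change; the surrounding runner code and helper names outside the diff are assumed):

import org.apache.beam.runners.spark.SparkPipelineOptions;
import org.apache.beam.runners.spark.util.SparkCommon;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.scheduler.EventLoggingListener;
import org.apache.spark.scheduler.SparkListenerApplicationEnd;

class EventLogLifecycleSketch {
  static void runWithEventLog(JavaSparkContext jsc, SparkPipelineOptions options) {
    long startTime = System.currentTimeMillis();
    // Returns null unless spark.eventLog.enabled=true in the Spark conf.
    EventLoggingListener listener =
        SparkCommon.startEventLoggingListener(jsc, options, startTime);
    try {
      // ... translate and run the pipeline ...
    } finally {
      if (listener != null) {
        // Close out the event log so the History Server can render the application.
        listener.onApplicationEnd(new SparkListenerApplicationEnd(System.currentTimeMillis()));
        listener.stop();
      }
    }
  }
}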
+ */ + public static @Nullable EventLoggingListener startEventLoggingListener( + final JavaSparkContext jsc, SparkPipelineOptions pipelineOptions, long startTime) { + EventLoggingListener eventLoggingListener = null; + try { + if (jsc.getConf().getBoolean("spark.eventLog.enabled", false)) { + eventLoggingListener = + new EventLoggingListener( + jsc.getConf().getAppId(), + scala.Option.apply("1"), + new URI(jsc.getConf().get("spark.eventLog.dir", null)), + jsc.getConf(), + jsc.hadoopConfiguration()); + eventLoggingListener.initializeLogIfNecessary(false, false); + eventLoggingListener.start(); + + scala.collection.immutable.Map logUrlMap = + new scala.collection.immutable.HashMap<>(); + Tuple2[] sparkMasters = jsc.getConf().getAllWithPrefix("spark.master"); + Tuple2[] sparkExecutors = + jsc.getConf().getAllWithPrefix("spark.executor.id"); + for (Tuple2 sparkExecutor : sparkExecutors) { + eventLoggingListener.onExecutorAdded( + new SparkListenerExecutorAdded( + startTime, + sparkExecutor._2(), + new ExecutorInfo(sparkMasters[0]._2(), 0, logUrlMap))); + } + return eventLoggingListener; + } + } catch (URISyntaxException e) { + throw new RuntimeException( + "The URI syntax in the Spark config \"spark.eventLog.dir\" is not correct", e); + } + return eventLoggingListener; + } +} diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/util/SparkCompat.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/util/SparkCompat.java index b8f37e565474..27b759baead0 100644 --- a/runners/spark/src/main/java/org/apache/beam/runners/spark/util/SparkCompat.java +++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/util/SparkCompat.java @@ -17,18 +17,27 @@ */ package org.apache.beam.runners.spark.util; +import java.lang.reflect.Constructor; import java.lang.reflect.InvocationTargetException; import java.lang.reflect.Method; import java.util.List; import java.util.stream.Collectors; +import org.apache.beam.runners.spark.SparkPipelineOptions; +import org.apache.beam.runners.spark.metrics.SparkBeamMetric; import org.apache.beam.runners.spark.translation.SparkCombineFn; +import org.apache.beam.sdk.PipelineResult; +import org.apache.beam.sdk.options.ApplicationNameOptions; import org.apache.beam.sdk.util.WindowedValue; import org.apache.beam.sdk.values.KV; import org.apache.spark.api.java.JavaPairRDD; +import org.apache.spark.api.java.JavaSparkContext; import org.apache.spark.api.java.function.FlatMapFunction; import org.apache.spark.api.java.function.Function; +import org.apache.spark.scheduler.SparkListenerApplicationStart; import org.apache.spark.streaming.api.java.JavaDStream; import org.apache.spark.streaming.api.java.JavaStreamingContext; +import scala.Option; +import scala.collection.JavaConverters; /** A set of functions to provide API compatibility between Spark 2 and Spark 3. */ @SuppressWarnings({ @@ -36,6 +45,8 @@ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class SparkCompat { + private SparkCompat() {} + /** * Union of dStreams in the given StreamingContext. 
* @@ -112,4 +123,56 @@ public static JavaPairRDD throw new RuntimeException("Error invoking Spark flatMapValues", e); } } + + public static SparkListenerApplicationStart buildSparkListenerApplicationStart( + final JavaSparkContext jsc, SparkPipelineOptions options, long time, PipelineResult result) { + String appName = options.as(ApplicationNameOptions.class).getAppName(); + Option appId = Option.apply(jsc.getConf().getAppId()); + Option appAttemptId = Option.apply("1"); + Option> driverLogs = + Option.apply( + JavaConverters.mapAsScalaMapConverter( + SparkBeamMetric.renderAllToString(result.metrics())) + .asScala()); + try { + Class clazz = Class.forName(SparkListenerApplicationStart.class.getName()); + if (jsc.version().startsWith("3")) { + // This invokes by Reflection the equivalent of + // return new SparkListenerApplicationStart( + // appName, appId, time, jsc.sparkUser(), appAttemptId, driverLogs, driverAttributes); + Class[] parameterTypes = { + String.class, + Option.class, + Long.TYPE, + String.class, + Option.class, + Option.class, + Option.class + }; + Constructor cons = clazz.getConstructor(parameterTypes); + Option> driverAttributes = + Option.apply(new scala.collection.immutable.HashMap<>()); + Object[] args = { + appName, appId, time, jsc.sparkUser(), appAttemptId, driverLogs, driverAttributes + }; + return (SparkListenerApplicationStart) cons.newInstance(args); + } else { + // This invokes by Reflection the equivalent of + // return new SparkListenerApplicationStart( + // appName, appId, time, jsc.sparkUser(), appAttemptId, driverLogs); + Class[] parameterTypes = { + String.class, Option.class, Long.TYPE, String.class, Option.class, Option.class + }; + Constructor cons = clazz.getConstructor(parameterTypes); + Object[] args = {appName, appId, time, jsc.sparkUser(), appAttemptId, driverLogs}; + return (SparkListenerApplicationStart) cons.newInstance(args); + } + } catch (ClassNotFoundException + | NoSuchMethodException + | IllegalAccessException + | InvocationTargetException + | InstantiationException e) { + throw new RuntimeException("Error building SparkListenerApplicationStart", e); + } + } } diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/CacheTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/CacheTest.java index b4ee37684752..8209d4302717 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/CacheTest.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/CacheTest.java @@ -45,7 +45,6 @@ /** Tests of {@link Dataset#cache(String, Coder)}} scenarios. 
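Spark 3 added a driverAttributes parameter to the SparkListenerApplicationStart constructor, which is why buildSparkListenerApplicationStart above selects the constructor reflectively based on the Spark version. A sketch of how the event could be fed to the event-logging listener so rendered Beam metrics reach the History Server (hypothetical wiring, not taken from this patch):

import org.apache.beam.runners.spark.SparkPipelineOptions;
import org.apache.beam.runners.spark.util.SparkCompat;
import org.apache.beam.sdk.PipelineResult;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.scheduler.EventLoggingListener;
import org.apache.spark.scheduler.SparkListenerApplicationStart;

class HistoryServerMetricsSketch {
  static void recordApplicationStart(
      JavaSparkContext jsc,
      SparkPipelineOptions options,
      EventLoggingListener listener,
      PipelineResult result,
      long startTime) {
    // Works on both Spark 2 and Spark 3 thanks to the reflective constructor lookup.
    SparkListenerApplicationStart appStart =
        SparkCompat.buildSparkListenerApplicationStart(jsc, options, startTime, result);
    // Replaying the start event writes the rendered Beam metrics (as driver logs)
    // into the event log that the History Server reads.
    listener.onApplicationStart(appStart);
  }
}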
*/ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class CacheTest { diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/GlobalWatermarkHolderTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/GlobalWatermarkHolderTest.java index 55c454d63d44..7dfbdfea58d5 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/GlobalWatermarkHolderTest.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/GlobalWatermarkHolderTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.spark; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.core.IsEqual.equalTo; -import static org.junit.Assert.assertThat; import org.apache.beam.runners.spark.translation.SparkContextFactory; import org.apache.beam.runners.spark.util.GlobalWatermarkHolder; @@ -33,9 +33,6 @@ import org.junit.rules.ExpectedException; /** A test suite for the propagation of watermarks in the Spark runner. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class GlobalWatermarkHolderTest { @Rule public ClearWatermarksRule clearWatermarksRule = new ClearWatermarksRule(); diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/ProvidedSparkContextTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/ProvidedSparkContextTest.java index 5c4701b4e81c..4a57ade09cb5 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/ProvidedSparkContextTest.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/ProvidedSparkContextTest.java @@ -36,9 +36,6 @@ import org.junit.Test; /** Provided Spark Context tests. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ProvidedSparkContextTest { private static final String[] WORDS_ARRAY = { "hi there", "hi", "hi sue bob", diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/SparkPipelineStateTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/SparkPipelineStateTest.java index 4749f9562331..b48f553d8fc5 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/SparkPipelineStateTest.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/SparkPipelineStateTest.java @@ -18,8 +18,8 @@ package org.apache.beam.runners.spark; import static org.hamcrest.CoreMatchers.instanceOf; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.is; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import java.io.Serializable; @@ -42,9 +42,6 @@ import org.junit.rules.TestName; /** This suite tests that various scenarios result in proper states of the pipeline. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SparkPipelineStateTest implements Serializable { private static class MyCustomException extends RuntimeException { diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/SparkRunnerDebuggerTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/SparkRunnerDebuggerTest.java index 88706098e6b8..c9bb83dd0c34 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/SparkRunnerDebuggerTest.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/SparkRunnerDebuggerTest.java @@ -17,7 +17,7 @@ */ package org.apache.beam.runners.spark; -import static org.junit.Assert.assertThat; +import static org.hamcrest.MatcherAssert.assertThat; import java.util.Collections; import org.apache.beam.runners.spark.examples.WordCount; @@ -81,7 +81,7 @@ public void debugBatchPipeline() { .apply(TextIO.write().to("!!PLACEHOLDER-OUTPUT-DIR!!").withNumShards(3).withSuffix(".txt")); final String expectedPipeline = - "_.\n" + "sparkContext.()\n" + "_.mapPartitions(" + "new org.apache.beam.runners.spark.examples.WordCount$ExtractWordsFn())\n" + "_.mapPartitions(new org.apache.beam.sdk.transforms.Contextful())\n" diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/SparkRunnerRegistrarTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/SparkRunnerRegistrarTest.java index 652bb8fa7323..2276ef8e0cda 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/SparkRunnerRegistrarTest.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/SparkRunnerRegistrarTest.java @@ -21,8 +21,6 @@ import static org.junit.Assert.fail; import java.util.ServiceLoader; -import org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingPipelineOptions; -import org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingRunner; import org.apache.beam.sdk.options.PipelineOptionsRegistrar; import org.apache.beam.sdk.runners.PipelineRunnerRegistrar; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; @@ -37,18 +35,14 @@ public class SparkRunnerRegistrarTest { @Test public void testOptions() { assertEquals( - ImmutableList.of( - SparkPipelineOptions.class, - SparkStructuredStreamingPipelineOptions.class, - SparkPortableStreamingPipelineOptions.class), + ImmutableList.of(SparkPipelineOptions.class, SparkPortableStreamingPipelineOptions.class), new SparkRunnerRegistrar.Options().getPipelineOptions()); } @Test public void testRunners() { assertEquals( - ImmutableList.of( - SparkRunner.class, TestSparkRunner.class, SparkStructuredStreamingRunner.class), + ImmutableList.of(SparkRunner.class, TestSparkRunner.class), new SparkRunnerRegistrar.Runner().getPipelineRunners()); } diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/aggregators/metrics/sink/InMemoryMetrics.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/aggregators/metrics/sink/InMemoryMetrics.java index a9523d8bf680..a4b3e5425c2e 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/aggregators/metrics/sink/InMemoryMetrics.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/aggregators/metrics/sink/InMemoryMetrics.java @@ -25,9 +25,6 @@ import org.apache.spark.metrics.sink.Sink; /** An in-memory {@link Sink} implementation for tests. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class InMemoryMetrics implements Sink { private static WithMetricsSupport extendedMetricsRegistry; diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/aggregators/metrics/sink/SparkMetricsSinkTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/aggregators/metrics/sink/SparkMetricsSinkTest.java index 4050ec178633..5e7c7600e51c 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/aggregators/metrics/sink/SparkMetricsSinkTest.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/aggregators/metrics/sink/SparkMetricsSinkTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.spark.aggregators.metrics.sink; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.is; import static org.hamcrest.Matchers.nullValue; -import static org.junit.Assert.assertThat; import org.apache.beam.runners.spark.ReuseSparkContextRule; import org.apache.beam.runners.spark.SparkPipelineOptions; diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/io/AvroPipelineTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/io/AvroPipelineTest.java index 4de3b39bb09c..fa49a9f5d12b 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/io/AvroPipelineTest.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/io/AvroPipelineTest.java @@ -41,9 +41,6 @@ import org.junit.rules.TemporaryFolder; /** Avro pipeline test. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class AvroPipelineTest { private File inputFile; diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/io/NumShardsTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/io/NumShardsTest.java index 65316d0c66fc..777d1d4dc5f0 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/io/NumShardsTest.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/io/NumShardsTest.java @@ -41,9 +41,6 @@ import org.junit.rules.TemporaryFolder; /** Number of shards test. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class NumShardsTest { private static final String[] WORDS_ARRAY = { diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/io/ReaderToIteratorAdapterTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/io/ReaderToIteratorAdapterTest.java index 0f548043b9b4..5b43e1b2aa8f 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/io/ReaderToIteratorAdapterTest.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/io/ReaderToIteratorAdapterTest.java @@ -18,8 +18,8 @@ package org.apache.beam.runners.spark.io; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.core.Is.is; -import static org.junit.Assert.assertThat; import java.io.IOException; import java.util.NoSuchElementException; @@ -31,9 +31,6 @@ import org.junit.rules.ExpectedException; /** Test for {@link SourceRDD.Bounded.ReaderToIteratorAdapter}. 
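A recurring change across the test files above and below is replacing org.junit.Assert.assertThat, deprecated as of JUnit 4.13, with org.hamcrest.MatcherAssert.assertThat. The two methods share a signature, so only the static import changes, as in this minimal sketch:

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.is;

import org.junit.Test;

public class AssertThatMigrationSketch {
  @Test
  public void hamcrestAssertThat() {
    // Same assertion as before; only the static import moved from
    // org.junit.Assert to org.hamcrest.MatcherAssert.
    assertThat(2 + 2, is(4));
  }
}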
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ReaderToIteratorAdapterTest { @Rule public ExpectedException exception = ExpectedException.none(); diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/metrics/SparkBeamMetricTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/metrics/SparkBeamMetricTest.java index a1d6fe83870f..1851e1db1305 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/metrics/SparkBeamMetricTest.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/metrics/SparkBeamMetricTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.spark.metrics; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import org.apache.beam.sdk.metrics.MetricKey; import org.apache.beam.sdk.metrics.MetricName; @@ -35,7 +35,7 @@ public void testRenderName() { "myStep.one.two(three)", MetricName.named("myNameSpace//", "myName()")), 123, 456); - String renderedName = new SparkBeamMetric().renderName(metricResult); + String renderedName = SparkBeamMetric.renderName(metricResult); assertThat( "Metric name was not rendered correctly", renderedName, diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/metrics/SparkMetricsPusherTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/metrics/SparkMetricsPusherTest.java index 59de4bbd2f55..2f96ccfab0ea 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/metrics/SparkMetricsPusherTest.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/metrics/SparkMetricsPusherTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.spark.metrics; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.is; -import static org.junit.Assert.assertThat; import org.apache.beam.runners.core.metrics.TestMetricsSink; import org.apache.beam.runners.spark.ReuseSparkContextRule; @@ -116,7 +116,7 @@ public void processElement(ProcessContext context) { @Category(UsesMetricsPusher.class) @Test - public void testInSBatchMode() throws Exception { + public void testInBatchMode() throws Exception { pipeline.apply(Create.of(1, 2, 3, 4, 5, 6)).apply(ParDo.of(new CountingDoFn())); pipeline.run(); diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/SparkStructuredStreamingRunnerRegistrarTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/SparkStructuredStreamingRunnerRegistrarTest.java new file mode 100644 index 000000000000..30d8297809b9 --- /dev/null +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/SparkStructuredStreamingRunnerRegistrarTest.java @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.spark.structuredstreaming; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.fail; + +import java.util.ServiceLoader; +import org.apache.beam.sdk.options.PipelineOptionsRegistrar; +import org.apache.beam.sdk.runners.PipelineRunnerRegistrar; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Test {@link SparkStructuredStreamingRunnerRegistrar}. */ +@RunWith(JUnit4.class) +public class SparkStructuredStreamingRunnerRegistrarTest { + @Test + public void testOptions() { + assertEquals( + ImmutableList.of(SparkStructuredStreamingPipelineOptions.class), + new SparkStructuredStreamingRunnerRegistrar.Options().getPipelineOptions()); + } + + @Test + public void testRunners() { + assertEquals( + ImmutableList.of(SparkStructuredStreamingRunner.class), + new SparkStructuredStreamingRunnerRegistrar.Runner().getPipelineRunners()); + } + + @Test + public void testServiceLoaderForOptions() { + for (PipelineOptionsRegistrar registrar : + Lists.newArrayList(ServiceLoader.load(PipelineOptionsRegistrar.class).iterator())) { + if (registrar instanceof SparkStructuredStreamingRunnerRegistrar.Options) { + return; + } + } + fail("Expected to find " + SparkStructuredStreamingRunnerRegistrar.Options.class); + } + + @Test + public void testServiceLoaderForRunner() { + for (PipelineRunnerRegistrar registrar : + Lists.newArrayList(ServiceLoader.load(PipelineRunnerRegistrar.class).iterator())) { + if (registrar instanceof SparkStructuredStreamingRunnerRegistrar.Runner) { + return; + } + } + fail("Expected to find " + SparkStructuredStreamingRunnerRegistrar.Runner.class); + } +} diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/StructuredStreamingPipelineStateTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/StructuredStreamingPipelineStateTest.java index 0dfb32109d38..b44df7bf101b 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/StructuredStreamingPipelineStateTest.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/StructuredStreamingPipelineStateTest.java @@ -46,9 +46,6 @@ /** This suite tests that various scenarios result in proper states of the pipeline. 
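With SparkStructuredStreamingRunnerRegistrar discoverable through ServiceLoader, as the new test above verifies, the runner and its options can be resolved by name from PipelineOptionsFactory. A minimal sketch of selecting the runner (illustrative pipeline code, not part of the patch):

import org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingPipelineOptions;
import org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

class StructuredStreamingRunnerSelectionSketch {
  public static void main(String[] args) {
    // Equivalent to passing --runner=SparkStructuredStreamingRunner on the command line,
    // which only resolves because the registrar is found via ServiceLoader.
    SparkStructuredStreamingPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args)
            .as(SparkStructuredStreamingPipelineOptions.class);
    options.setRunner(SparkStructuredStreamingRunner.class);
    Pipeline pipeline = Pipeline.create(options);
    // ... apply transforms ...
    pipeline.run().waitUntilFinish();
  }
}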
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class StructuredStreamingPipelineStateTest implements Serializable { private static class MyCustomException extends RuntimeException { diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/aggregators/metrics/sink/InMemoryMetrics.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/aggregators/metrics/sink/InMemoryMetrics.java index fd1159af8d24..8649e91c7611 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/aggregators/metrics/sink/InMemoryMetrics.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/aggregators/metrics/sink/InMemoryMetrics.java @@ -28,9 +28,6 @@ /** An in-memory {@link Sink} implementation for tests. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class InMemoryMetrics implements Sink { private static WithMetricsSupport extendedMetricsRegistry; diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/aggregators/metrics/sink/SparkMetricsSinkTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/aggregators/metrics/sink/SparkMetricsSinkTest.java index 1e8e7652d0a0..40b5036fe94f 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/aggregators/metrics/sink/SparkMetricsSinkTest.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/aggregators/metrics/sink/SparkMetricsSinkTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.spark.structuredstreaming.aggregators.metrics.sink; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.is; import static org.hamcrest.Matchers.nullValue; -import static org.junit.Assert.assertThat; import org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingPipelineOptions; import org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingRunner; @@ -49,9 +49,6 @@ */ @Ignore("Has been failing since at least c350188ef7a8704c7336f3c20a1ab2144abbcd4a") @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SparkMetricsSinkTest { @Rule public ExternalResource inMemoryMetricsSink = new InMemoryMetricsSinkRule(); diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/metrics/BeamMetricTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/metrics/BeamMetricTest.java index 45de622e82cd..a6989348e163 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/metrics/BeamMetricTest.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/metrics/BeamMetricTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.runners.spark.structuredstreaming.metrics; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import org.apache.beam.sdk.metrics.MetricKey; import org.apache.beam.sdk.metrics.MetricName; diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/CombineTest.java 
b/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/CombineTest.java index ec340c82ad1f..52e60a3db545 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/CombineTest.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/CombineTest.java @@ -45,9 +45,6 @@ /** Test class for beam to spark {@link org.apache.beam.sdk.transforms.Combine} translation. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class CombineTest implements Serializable { private static Pipeline pipeline; diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ComplexSourceTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ComplexSourceTest.java index b3e467c0d5b6..0175d03f8753 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ComplexSourceTest.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ComplexSourceTest.java @@ -41,9 +41,6 @@ /** Test class for beam to spark source translation. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ComplexSourceTest implements Serializable { @ClassRule public static final TemporaryFolder TEMPORARY_FOLDER = new TemporaryFolder(); private static File file; diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/FlattenTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/FlattenTest.java index f31154e2fa79..e126d06e6852 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/FlattenTest.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/FlattenTest.java @@ -34,9 +34,6 @@ /** Test class for beam to spark flatten translation. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FlattenTest implements Serializable { private static Pipeline pipeline; diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTest.java index 755d3efd90f1..07850232853a 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTest.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTest.java @@ -18,7 +18,7 @@ package org.apache.beam.runners.spark.structuredstreaming.translation.batch; import static org.apache.beam.sdk.testing.SerializableMatchers.containsInAnyOrder; -import static org.junit.Assert.assertThat; +import static org.hamcrest.MatcherAssert.assertThat; import java.io.Serializable; import java.util.ArrayList; @@ -47,9 +47,6 @@ /** Test class for beam to spark {@link ParDo} translation. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class GroupByKeyTest implements Serializable { private static Pipeline pipeline; diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ParDoTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ParDoTest.java index 408efdbc74c8..16d9a8b7fa87 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ParDoTest.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ParDoTest.java @@ -39,9 +39,6 @@ /** Test class for beam to spark {@link ParDo} translation. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ParDoTest implements Serializable { private static Pipeline pipeline; diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/SimpleSourceTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/SimpleSourceTest.java index b6eb81385673..70cdca630b9b 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/SimpleSourceTest.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/SimpleSourceTest.java @@ -17,41 +17,23 @@ */ package org.apache.beam.runners.spark.structuredstreaming.translation.batch; -import java.io.IOException; import java.io.Serializable; -import java.util.ArrayList; -import java.util.HashMap; -import java.util.List; -import java.util.Map; -import org.apache.beam.runners.core.construction.SerializablePipelineOptions; -import org.apache.beam.runners.core.serialization.Base64Serializer; import org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingPipelineOptions; import org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingRunner; -import org.apache.beam.runners.spark.structuredstreaming.utils.SerializationDebugger; import org.apache.beam.sdk.Pipeline; -import org.apache.beam.sdk.io.BoundedSource; -import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.testing.PAssert; import org.apache.beam.sdk.transforms.Create; import org.apache.beam.sdk.values.PCollection; -import org.apache.spark.sql.sources.v2.DataSourceOptions; -import org.apache.spark.sql.sources.v2.reader.DataSourceReader; import org.junit.BeforeClass; -import org.junit.ClassRule; import org.junit.Test; -import org.junit.rules.TemporaryFolder; import org.junit.runner.RunWith; import org.junit.runners.JUnit4; /** Test class for beam to spark source translation. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SimpleSourceTest implements Serializable { private static Pipeline pipeline; - @ClassRule public static final TemporaryFolder TEMPORARY_FOLDER = new TemporaryFolder(); @BeforeClass public static void beforeClass() { @@ -62,39 +44,6 @@ public static void beforeClass() { pipeline = Pipeline.create(options); } - @Test - public void testSerialization() throws IOException { - BoundedSource source = - new BoundedSource() { - - @Override - public List> split( - long desiredBundleSizeBytes, PipelineOptions options) throws Exception { - return new ArrayList<>(); - } - - @Override - public long getEstimatedSizeBytes(PipelineOptions options) throws Exception { - return 0; - } - - @Override - public BoundedReader createReader(PipelineOptions options) throws IOException { - return null; - } - }; - String serializedSource = Base64Serializer.serializeUnchecked(source); - Map dataSourceOptions = new HashMap<>(); - dataSourceOptions.put(DatasetSourceBatch.BEAM_SOURCE_OPTION, serializedSource); - dataSourceOptions.put(DatasetSourceBatch.DEFAULT_PARALLELISM, "4"); - dataSourceOptions.put( - DatasetSourceBatch.PIPELINE_OPTIONS, - new SerializablePipelineOptions(pipeline.getOptions()).toString()); - DataSourceReader objectToTest = - new DatasetSourceBatch().createReader(new DataSourceOptions(dataSourceOptions)); - SerializationDebugger.testSerialization(objectToTest, TEMPORARY_FOLDER.newFile()); - } - @Test public void testBoundedSource() { PCollection input = pipeline.apply(Create.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)); diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/WindowAssignTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/WindowAssignTest.java index b93c9c8a0505..b8b41010a24b 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/WindowAssignTest.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/WindowAssignTest.java @@ -38,9 +38,6 @@ /** Test class for beam to spark window assign translation. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class WindowAssignTest implements Serializable { private static Pipeline pipeline; diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/SimpleSourceTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/SimpleSourceTest.java index 36b532028a9e..a06d2cec1e9e 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/SimpleSourceTest.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/SimpleSourceTest.java @@ -33,9 +33,6 @@ /** Test class for beam to spark source translation. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SimpleSourceTest implements Serializable { private static Pipeline pipeline; @ClassRule public static final TemporaryFolder TEMPORARY_FOLDER = new TemporaryFolder(); diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/utils/SerializationDebugger.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/utils/SerializationDebugger.java index 7c3dee010938..b384b9b9d35d 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/utils/SerializationDebugger.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/utils/SerializationDebugger.java @@ -27,9 +27,6 @@ import java.util.List; /** A {@code SerializationDebugger} for Spark Runner. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SerializationDebugger { public static void testSerialization(Object object, File to) throws IOException { diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/SparkExecutableStageFunctionTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/SparkExecutableStageFunctionTest.java index 97e05cf6732f..24d69a5fb1d1 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/SparkExecutableStageFunctionTest.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/SparkExecutableStageFunctionTest.java @@ -66,7 +66,6 @@ /** Unit tests for {@link SparkExecutableStageFunction}. */ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class SparkExecutableStageFunctionTest { @Mock private SparkExecutableStageContextFactory contextFactory; diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/CreateStreamTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/CreateStreamTest.java index 6e795a5fa8cc..652beb4bef87 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/CreateStreamTest.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/CreateStreamTest.java @@ -17,13 +17,13 @@ */ package org.apache.beam.runners.spark.translation.streaming; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.allOf; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.greaterThanOrEqualTo; import static org.hamcrest.Matchers.is; import static org.hamcrest.Matchers.lessThanOrEqualTo; import static org.junit.Assert.assertNotEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import java.io.IOException; diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/ResumeFromCheckpointStreamingTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/ResumeFromCheckpointStreamingTest.java index f049c00b9927..6c107c474b66 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/ResumeFromCheckpointStreamingTest.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/ResumeFromCheckpointStreamingTest.java @@ -18,10 +18,10 @@ 
package org.apache.beam.runners.spark.translation.streaming; import static org.apache.beam.sdk.metrics.MetricResultsMatchers.attemptedMetricsResult; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.hasItem; import static org.hamcrest.Matchers.is; -import static org.junit.Assert.assertThat; import java.io.IOException; import java.io.Serializable; @@ -102,7 +102,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class ResumeFromCheckpointStreamingTest implements Serializable { private static final EmbeddedKafkaCluster.EmbeddedZookeeper EMBEDDED_ZOOKEEPER = diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/SparkCoGroupByKeyStreamingTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/SparkCoGroupByKeyStreamingTest.java index 3c140c9a585c..19011437a063 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/SparkCoGroupByKeyStreamingTest.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/SparkCoGroupByKeyStreamingTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.spark.translation.streaming; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import org.apache.beam.runners.spark.ReuseSparkContextRule; diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/StreamingSourceMetricsTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/StreamingSourceMetricsTest.java index d706c79903c4..75dccadca451 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/StreamingSourceMetricsTest.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/StreamingSourceMetricsTest.java @@ -18,10 +18,10 @@ package org.apache.beam.runners.spark.translation.streaming; import static org.apache.beam.sdk.metrics.MetricResultsMatchers.metricsResult; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.greaterThanOrEqualTo; import static org.hamcrest.Matchers.hasItem; import static org.hamcrest.Matchers.is; -import static org.junit.Assert.assertThat; import java.io.Serializable; import org.apache.beam.runners.spark.StreamingTest; diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/TrackStreamingSourcesTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/TrackStreamingSourcesTest.java index 568816f4f21b..5ede41aaedaf 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/TrackStreamingSourcesTest.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/TrackStreamingSourcesTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.runners.spark.translation.streaming; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.core.IsEqual.equalTo; -import static org.junit.Assert.assertThat; import java.util.List; import 
org.apache.beam.runners.spark.ReuseSparkContextRule; @@ -55,7 +55,6 @@ */ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class TrackStreamingSourcesTest { diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/utils/EmbeddedKafkaCluster.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/utils/EmbeddedKafkaCluster.java index 19152f4124cd..7794d5b4318e 100644 --- a/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/utils/EmbeddedKafkaCluster.java +++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/utils/EmbeddedKafkaCluster.java @@ -36,9 +36,6 @@ import org.slf4j.LoggerFactory; /** Embedded Kafka cluster. https://gist.github.com/fjavieralba/7930018 */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class EmbeddedKafkaCluster { private static final Logger LOG = LoggerFactory.getLogger(EmbeddedKafkaCluster.class); diff --git a/runners/twister2/build.gradle b/runners/twister2/build.gradle index ca44ed35dd24..f9621277395f 100644 --- a/runners/twister2/build.gradle +++ b/runners/twister2/build.gradle @@ -36,7 +36,21 @@ dependencies { compile (project(path: ":runners:core-java")){ exclude group: 'com.esotericsoftware.kryo', module: 'kryo' } - compile project(":runners:java-fn-execution") + compile project(path: ":model:pipeline", configuration: "shadow") + compile project(path: ":runners:core-construction-java") + compile project(path: ":sdks:java:core", configuration: "shadow") + compile library.java.jackson_annotations + compile library.java.joda_time + compile library.java.vendored_grpc_1_36_0 + compile library.java.slf4j_api + compile "org.mortbay.jetty:jetty-util:6.1.26" + compile "org.twister2:comms-api-java:$twister2_version" + compile "org.twister2:config-api-java:$twister2_version" + compile "org.twister2:dataset-api-java:$twister2_version" + compile "org.twister2:driver-api-java:$twister2_version" + compile "org.twister2:exceptions-java:$twister2_version" + compile "org.twister2:scheduler-api-java:$twister2_version" + compile "org.twister2:task-api-java:$twister2_version" compile "org.twister2:api-java:$twister2_version" compile "org.twister2:tset-api-java:$twister2_version" compile "org.twister2:tset-java:$twister2_version" diff --git a/runners/twister2/src/main/java/org/apache/beam/runners/twister2/Twister2PipelineOptions.java b/runners/twister2/src/main/java/org/apache/beam/runners/twister2/Twister2PipelineOptions.java index 0782271f506e..f69c46291ace 100644 --- a/runners/twister2/src/main/java/org/apache/beam/runners/twister2/Twister2PipelineOptions.java +++ b/runners/twister2/src/main/java/org/apache/beam/runners/twister2/Twister2PipelineOptions.java @@ -19,14 +19,15 @@ import com.fasterxml.jackson.annotation.JsonIgnore; import edu.iu.dsc.tws.tset.env.TSetEnvironment; -import java.util.List; import org.apache.beam.sdk.options.Default; import org.apache.beam.sdk.options.Description; +import org.apache.beam.sdk.options.FileStagingOptions; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.StreamingOptions; /** Twister2PipelineOptions. 
*/ -public interface Twister2PipelineOptions extends PipelineOptions, StreamingOptions { +public interface Twister2PipelineOptions + extends PipelineOptions, StreamingOptions, FileStagingOptions { @Description("set parallelism for Twister2 processor") @Default.Integer(1) @@ -46,12 +47,6 @@ public interface Twister2PipelineOptions extends PipelineOptions, StreamingOptio void setClusterType(String name); - @Description( - "Jar-Files to send to all workers and put on the classpath. The default value is all files from the classpath.") - List getFilesToStage(); - - void setFilesToStage(List value); - @Description("Job file zip") String getJobFileZip(); diff --git a/runners/twister2/src/main/java/org/apache/beam/runners/twister2/Twister2Runner.java b/runners/twister2/src/main/java/org/apache/beam/runners/twister2/Twister2Runner.java index 81e5e78abb84..b3a02d3998be 100644 --- a/runners/twister2/src/main/java/org/apache/beam/runners/twister2/Twister2Runner.java +++ b/runners/twister2/src/main/java/org/apache/beam/runners/twister2/Twister2Runner.java @@ -17,8 +17,6 @@ */ package org.apache.beam.runners.twister2; -import static org.apache.beam.runners.core.construction.resources.PipelineResources.detectClassPathResourcesToStage; - import edu.iu.dsc.tws.api.JobConfig; import edu.iu.dsc.tws.api.Twister2Job; import edu.iu.dsc.tws.api.config.Config; @@ -43,7 +41,6 @@ import java.util.Set; import java.util.logging.LogManager; import java.util.logging.Logger; -import java.util.stream.Collectors; import java.util.zip.ZipEntry; import java.util.zip.ZipOutputStream; import org.apache.beam.runners.core.construction.PTransformMatchers; @@ -54,10 +51,10 @@ import org.apache.beam.sdk.Pipeline; import org.apache.beam.sdk.PipelineResult; import org.apache.beam.sdk.PipelineRunner; +import org.apache.beam.sdk.options.ExperimentalOptions; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.PipelineOptionsValidator; import org.apache.beam.sdk.runners.PTransformOverride; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; /** @@ -83,18 +80,8 @@ protected Twister2Runner(Twister2PipelineOptions options) { } public static Twister2Runner fromOptions(PipelineOptions options) { - Twister2PipelineOptions pipelineOptions = - PipelineOptionsValidator.validate(Twister2PipelineOptions.class, options); - if (pipelineOptions.getFilesToStage() == null) { - pipelineOptions.setFilesToStage( - detectClassPathResourcesToStage(Twister2Runner.class.getClassLoader(), pipelineOptions)); - LOG.info( - "PipelineOptions.filesToStage was not specified. " - + "Defaulting to files from the classpath: will stage {} files. " - + "Enable logging at DEBUG level to see which files will be staged" - + pipelineOptions.getFilesToStage().size()); - } - return new Twister2Runner(pipelineOptions); + return new Twister2Runner( + PipelineOptionsValidator.validate(Twister2PipelineOptions.class, options)); } @Override @@ -103,7 +90,12 @@ public PipelineResult run(Pipeline pipeline) { Twister2PipelineExecutionEnvironment env = new Twister2PipelineExecutionEnvironment(options); LOG.info("Translating pipeline to Twister2 program."); pipeline.replaceAll(getDefaultOverrides()); - SplittableParDo.convertReadBasedSplittableDoFnsToPrimitiveReadsIfNecessary(pipeline); + + // TODO(BEAM-10670): Use SDF read as default when we address performance issue. 
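The guard added just below keeps the conversion to primitive reads as the Twister2 default while letting users opt into SDF-based reads via the beam_fn_api experiment. A sketch of building options with that experiment enabled (hypothetical usage, not part of the patch):

import org.apache.beam.runners.twister2.Twister2PipelineOptions;
import org.apache.beam.sdk.options.ExperimentalOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

class SdfReadExperimentSketch {
  static Twister2PipelineOptions optionsKeepingSdfReads() {
    Twister2PipelineOptions options = PipelineOptionsFactory.as(Twister2PipelineOptions.class);
    // With "beam_fn_api" set, hasExperiment(...) in the runner returns true and the
    // conversion back to primitive reads is skipped.
    ExperimentalOptions.addExperiment(options.as(ExperimentalOptions.class), "beam_fn_api");
    return options;
  }
}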
+ if (!ExperimentalOptions.hasExperiment(pipeline.getOptions(), "beam_fn_api")) { + SplittableParDo.convertReadBasedSplittableDoFnsToPrimitiveReadsIfNecessary(pipeline); + } + env.translate(pipeline); setupSystem(options); @@ -159,7 +151,12 @@ public PipelineResult runTest(Pipeline pipeline) { Twister2PipelineExecutionEnvironment env = new Twister2PipelineExecutionEnvironment(options); LOG.info("Translating pipeline to Twister2 program."); pipeline.replaceAll(getDefaultOverrides()); - SplittableParDo.convertReadBasedSplittableDoFnsToPrimitiveReadsIfNecessary(pipeline); + + // TODO(BEAM-10670): Use SDF read as default when we address performance issue. + if (!ExperimentalOptions.hasExperiment(pipeline.getOptions(), "beam_fn_api")) { + SplittableParDo.convertReadBasedSplittableDoFnsToPrimitiveReadsIfNecessary(pipeline); + } + env.translate(pipeline); setupSystemTest(options); Map configMap = new HashMap(); @@ -269,20 +266,7 @@ private Map extractNames(Map> sideInputs) { * cause exception in running log. */ private void prepareFilesToStage(Twister2PipelineOptions options) { - List filesToStage = - options.getFilesToStage().stream() - .map(File::new) - .filter(File::exists) - .map( - file -> { - return file.getAbsolutePath(); - }) - .collect(Collectors.toList()); - options.setFilesToStage( - PipelineResources.prepareFilesForStaging( - filesToStage, - MoreObjects.firstNonNull( - options.getTempLocation(), System.getProperty("java.io.tmpdir")))); + PipelineResources.prepareFilesForStaging(options); } /** diff --git a/runners/twister2/src/main/java/org/apache/beam/runners/twister2/translators/batch/PCollectionViewTranslatorBatch.java b/runners/twister2/src/main/java/org/apache/beam/runners/twister2/translators/batch/PCollectionViewTranslatorBatch.java index a8c177112484..9bc32fcbb3f1 100644 --- a/runners/twister2/src/main/java/org/apache/beam/runners/twister2/translators/batch/PCollectionViewTranslatorBatch.java +++ b/runners/twister2/src/main/java/org/apache/beam/runners/twister2/translators/batch/PCollectionViewTranslatorBatch.java @@ -22,11 +22,14 @@ import org.apache.beam.runners.core.construction.CreatePCollectionViewTranslation; import org.apache.beam.runners.twister2.Twister2BatchTranslationContext; import org.apache.beam.runners.twister2.translators.BatchTransformTranslator; +import org.apache.beam.runners.twister2.translators.functions.ByteToElemFunction; import org.apache.beam.runners.twister2.translators.functions.ByteToWindowFunctionPrimitive; +import org.apache.beam.runners.twister2.translators.functions.ElemToBytesFunction; import org.apache.beam.runners.twister2.translators.functions.MapToTupleFunction; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.KvCoder; import org.apache.beam.sdk.runners.AppliedPTransform; +import org.apache.beam.sdk.transforms.Materializations; import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.transforms.View; import org.apache.beam.sdk.transforms.windowing.WindowFn; @@ -58,23 +61,50 @@ public void translateNode( context.getCurrentTransform(); org.apache.beam.sdk.values.PCollectionView input; PCollection inputPCol = context.getInput(transform); - final KvCoder coder = (KvCoder) inputPCol.getCoder(); - Coder inputKeyCoder = coder.getKeyCoder(); + final Coder coder = inputPCol.getCoder(); WindowingStrategy windowingStrategy = inputPCol.getWindowingStrategy(); WindowFn windowFn = windowingStrategy.getWindowFn(); - final WindowedValue.WindowedValueCoder wvCoder = - 
WindowedValue.FullWindowedValueCoder.of(coder.getValueCoder(), windowFn.windowCoder()); - BatchTSet> inputGathered = - inputDataSet - .direct() - .map(new MapToTupleFunction<>(inputKeyCoder, wvCoder)) - .allGather() - .map(new ByteToWindowFunctionPrimitive(inputKeyCoder, wvCoder)); try { input = CreatePCollectionViewTranslation.getView(application); } catch (IOException e) { throw new RuntimeException(e); } - context.setSideInputDataSet(input.getTagInternal().getId(), inputGathered); + + switch (input.getViewFn().getMaterialization().getUrn()) { + case Materializations.MULTIMAP_MATERIALIZATION_URN: + KvCoder kvCoder = (KvCoder) coder; + final Coder keyCoder = kvCoder.getKeyCoder(); + final WindowedValue.WindowedValueCoder kvwvCoder = + WindowedValue.FullWindowedValueCoder.of( + kvCoder.getValueCoder(), windowFn.windowCoder()); + BatchTSet> multimapMaterialization = + inputDataSet + .direct() + .map(new MapToTupleFunction<>(keyCoder, kvwvCoder)) + .allGather() + .map(new ByteToWindowFunctionPrimitive(keyCoder, kvwvCoder)); + context.setSideInputDataSet(input.getTagInternal().getId(), multimapMaterialization); + break; + case Materializations.ITERABLE_MATERIALIZATION_URN: + final WindowedValue.WindowedValueCoder wvCoder = + WindowedValue.FullWindowedValueCoder.of(coder, windowFn.windowCoder()); + BatchTSet> iterableMaterialization = + inputDataSet + .direct() + .map(new ElemToBytesFunction<>(wvCoder)) + .allGather() + .map(new ByteToElemFunction(wvCoder)); + try { + input = CreatePCollectionViewTranslation.getView(application); + } catch (IOException e) { + throw new RuntimeException(e); + } + context.setSideInputDataSet(input.getTagInternal().getId(), iterableMaterialization); + break; + default: + throw new UnsupportedOperationException( + "Unknown side input materialization " + + input.getViewFn().getMaterialization().getUrn()); + } } } diff --git a/runners/twister2/src/main/java/org/apache/beam/runners/twister2/translators/functions/AssignWindowsFunction.java b/runners/twister2/src/main/java/org/apache/beam/runners/twister2/translators/functions/AssignWindowsFunction.java index 9655a87dd5bd..e974a121df96 100644 --- a/runners/twister2/src/main/java/org/apache/beam/runners/twister2/translators/functions/AssignWindowsFunction.java +++ b/runners/twister2/src/main/java/org/apache/beam/runners/twister2/translators/functions/AssignWindowsFunction.java @@ -35,7 +35,7 @@ import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.transforms.windowing.WindowFn; import org.apache.beam.sdk.util.WindowedValue; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; /** Assign Windows function. */ @SuppressWarnings({ diff --git a/runners/twister2/src/main/java/org/apache/beam/runners/twister2/translators/functions/ByteToElemFunction.java b/runners/twister2/src/main/java/org/apache/beam/runners/twister2/translators/functions/ByteToElemFunction.java new file mode 100644 index 000000000000..578225f00340 --- /dev/null +++ b/runners/twister2/src/main/java/org/apache/beam/runners/twister2/translators/functions/ByteToElemFunction.java @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.twister2.translators.functions; + +import edu.iu.dsc.tws.api.tset.TSetContext; +import edu.iu.dsc.tws.api.tset.fn.MapFunc; +import java.io.ObjectStreamException; +import java.util.logging.Logger; +import org.apache.beam.runners.twister2.utils.TranslationUtils; +import org.apache.beam.sdk.util.SerializableUtils; +import org.apache.beam.sdk.util.WindowedValue; +import org.apache.beam.sdk.util.WindowedValue.WindowedValueCoder; + +/** ByteToWindow function. */ +@SuppressWarnings({ + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) +public class ByteToElemFunction implements MapFunc, byte[]> { + private transient WindowedValueCoder wvCoder; + private static final Logger LOG = Logger.getLogger(ByteToElemFunction.class.getName()); + + private transient boolean isInitialized = false; + private byte[] wvCoderBytes; + + public ByteToElemFunction() { + // non arg constructor needed for kryo + isInitialized = false; + } + + public ByteToElemFunction(final WindowedValueCoder wvCoder) { + this.wvCoder = wvCoder; + + wvCoderBytes = SerializableUtils.serializeToByteArray(wvCoder); + } + + @Override + public WindowedValue map(byte[] input) { + return TranslationUtils.fromByteArray(input, wvCoder); + } + + @Override + public void prepare(TSetContext context) { + initTransient(); + } + + /** + * Method used to initialize the transient variables that were sent over as byte arrays or proto + * buffers. + */ + private void initTransient() { + if (isInitialized) { + return; + } + + wvCoder = + (WindowedValueCoder) + SerializableUtils.deserializeFromByteArray(wvCoderBytes, "Custom Coder Bytes"); + this.isInitialized = true; + } + + protected Object readResolve() throws ObjectStreamException { + return this; + } +} diff --git a/runners/twister2/src/main/java/org/apache/beam/runners/twister2/translators/functions/DoFnFunction.java b/runners/twister2/src/main/java/org/apache/beam/runners/twister2/translators/functions/DoFnFunction.java index 9488399a7f6b..a3a073577cf4 100644 --- a/runners/twister2/src/main/java/org/apache/beam/runners/twister2/translators/functions/DoFnFunction.java +++ b/runners/twister2/src/main/java/org/apache/beam/runners/twister2/translators/functions/DoFnFunction.java @@ -59,7 +59,7 @@ import org.apache.beam.sdk.values.PCollectionView; import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.sdk.values.WindowingStrategy; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; /** DoFn function. 
*/ @SuppressWarnings({ @@ -163,7 +163,7 @@ public void prepare(TSetContext context) { initTransient(); sideInputReader = new Twister2SideInputReader(sideInputs, context); outputManager.setup(mainOutput, sideOutputs); - doFnInvoker = DoFnInvokers.tryInvokeSetupFor(doFn); + doFnInvoker = DoFnInvokers.tryInvokeSetupFor(doFn, pipelineOptions); doFnRunner = DoFnRunners.simpleRunner( diff --git a/runners/twister2/src/main/java/org/apache/beam/runners/twister2/translators/functions/ElemToBytesFunction.java b/runners/twister2/src/main/java/org/apache/beam/runners/twister2/translators/functions/ElemToBytesFunction.java new file mode 100644 index 000000000000..c83acddd4798 --- /dev/null +++ b/runners/twister2/src/main/java/org/apache/beam/runners/twister2/translators/functions/ElemToBytesFunction.java @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.twister2.translators.functions; + +import edu.iu.dsc.tws.api.tset.TSetContext; +import edu.iu.dsc.tws.api.tset.fn.MapFunc; +import java.io.ObjectStreamException; +import java.util.logging.Logger; +import org.apache.beam.sdk.coders.CoderException; +import org.apache.beam.sdk.util.CoderUtils; +import org.apache.beam.sdk.util.SerializableUtils; +import org.apache.beam.sdk.util.WindowedValue; +import org.checkerframework.checker.nullness.qual.Nullable; + +/** Map to tuple function. */ +@SuppressWarnings({ + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) +public class ElemToBytesFunction implements MapFunc> { + + private transient WindowedValue.WindowedValueCoder wvCoder; + private static final Logger LOG = Logger.getLogger(ElemToBytesFunction.class.getName()); + + private transient boolean isInitialized = false; + private byte[] wvCoderBytes; + + public ElemToBytesFunction() { + // non arg constructor needed for kryo + this.isInitialized = false; + } + + public ElemToBytesFunction(WindowedValue.WindowedValueCoder wvCoder) { + this.wvCoder = wvCoder; + wvCoderBytes = SerializableUtils.serializeToByteArray(wvCoder); + } + + @Override + public @Nullable byte[] map(WindowedValue input) { + try { + return CoderUtils.encodeToByteArray(wvCoder, input); + } catch (CoderException e) { + LOG.info(e.getMessage()); + } + return null; + } + + @Override + public void prepare(TSetContext context) { + initTransient(); + } + + /** + * Method used to initialize the transient variables that were sent over as byte arrays or proto + * buffers. 
+ */ + private void initTransient() { + if (isInitialized) { + return; + } + wvCoder = + (WindowedValue.WindowedValueCoder) + SerializableUtils.deserializeFromByteArray(wvCoderBytes, "Coder"); + this.isInitialized = true; + } + + protected Object readResolve() throws ObjectStreamException { + return this; + } +} diff --git a/runners/twister2/src/main/java/org/apache/beam/runners/twister2/translators/functions/GroupByWindowFunction.java b/runners/twister2/src/main/java/org/apache/beam/runners/twister2/translators/functions/GroupByWindowFunction.java index 5b677f440a07..e0228db3a314 100644 --- a/runners/twister2/src/main/java/org/apache/beam/runners/twister2/translators/functions/GroupByWindowFunction.java +++ b/runners/twister2/src/main/java/org/apache/beam/runners/twister2/translators/functions/GroupByWindowFunction.java @@ -52,7 +52,7 @@ import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.sdk.values.WindowingStrategy; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; import org.joda.time.Instant; /** GroupBy window function. */ diff --git a/runners/twister2/src/main/java/org/apache/beam/runners/twister2/utils/Twister2SideInputReader.java b/runners/twister2/src/main/java/org/apache/beam/runners/twister2/utils/Twister2SideInputReader.java index bbcd3926ab42..6ed77c7434f0 100644 --- a/runners/twister2/src/main/java/org/apache/beam/runners/twister2/utils/Twister2SideInputReader.java +++ b/runners/twister2/src/main/java/org/apache/beam/runners/twister2/utils/Twister2SideInputReader.java @@ -23,6 +23,7 @@ import edu.iu.dsc.tws.api.dataset.DataPartitionConsumer; import edu.iu.dsc.tws.api.tset.TSetContext; import java.util.ArrayList; +import java.util.Collections; import java.util.HashMap; import java.util.List; import java.util.Map; @@ -31,11 +32,11 @@ import org.apache.beam.runners.core.SideInputReader; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.KvCoder; +import org.apache.beam.sdk.transforms.Materializations; import org.apache.beam.sdk.transforms.Materializations.MultimapView; import org.apache.beam.sdk.transforms.ViewFn; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.util.WindowedValue; -import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollectionView; import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.sdk.values.WindowingStrategy; @@ -75,40 +76,79 @@ public boolean isEmpty() { } private T getSideInput(PCollectionView view, BoundedWindow window) { - Map>>> partitionedElements = new HashMap<>(); + switch (view.getViewFn().getMaterialization().getUrn()) { + case Materializations.MULTIMAP_MATERIALIZATION_URN: + return getMultimapSideInput(view, window); + case Materializations.ITERABLE_MATERIALIZATION_URN: + return getIterableSideInput(view, window); + default: + throw new IllegalArgumentException( + "Unknown materialization type: " + view.getViewFn().getMaterialization().getUrn()); + } + } + + private T getMultimapSideInput(PCollectionView view, BoundedWindow window) { + Map>> partitionedElements = getPartitionedElements(view); + Map resultMap = new HashMap<>(); + + ViewFn viewFn = (ViewFn) view.getViewFn(); + for (Map.Entry>> elements : + partitionedElements.entrySet()) { + + Coder keyCoder = ((KvCoder) view.getCoderInternal()).getKeyCoder(); + resultMap.put( + elements.getKey(), + 
viewFn.apply( + InMemoryMultimapSideInputView.fromIterable( + keyCoder, + (Iterable) + elements.getValue().stream() + .map(WindowedValue::getValue) + .collect(Collectors.toList())))); + } + T result = resultMap.get(window); + if (result == null) { + result = viewFn.apply(InMemoryMultimapSideInputView.empty()); + } + return result; + } + + private Map>> getPartitionedElements( + PCollectionView view) { + Map>> partitionedElements = new HashMap<>(); DataPartition sideInput = runtimeContext.getInput(view.getTagInternal().getId()); DataPartitionConsumer dataPartitionConsumer = sideInput.getConsumer(); while (dataPartitionConsumer.hasNext()) { - WindowedValue> winValue = (WindowedValue>) dataPartitionConsumer.next(); + WindowedValue winValue = (WindowedValue) dataPartitionConsumer.next(); for (BoundedWindow tbw : winValue.getWindows()) { - List>> windowedValues = + List> windowedValues = partitionedElements.computeIfAbsent(tbw, k -> new ArrayList<>()); windowedValues.add(winValue); } } + return partitionedElements; + } + + private T getIterableSideInput(PCollectionView view, BoundedWindow window) { + Map>> partitionedElements = getPartitionedElements(view); + ViewFn viewFn = + (ViewFn) view.getViewFn(); Map resultMap = new HashMap<>(); - for (Map.Entry>>> elements : + for (Map.Entry>> elements : partitionedElements.entrySet()) { - - ViewFn viewFn = (ViewFn) view.getViewFn(); - Coder keyCoder = ((KvCoder) view.getCoderInternal()).getKeyCoder(); resultMap.put( elements.getKey(), - (T) - viewFn.apply( - InMemoryMultimapSideInputView.fromIterable( - keyCoder, - (Iterable) - elements.getValue().stream() - .map(WindowedValue::getValue) - .collect(Collectors.toList())))); + viewFn.apply( + () -> + elements.getValue().stream() + .map(WindowedValue::getValue) + .collect(Collectors.toList()))); } T result = resultMap.get(window); if (result == null) { - ViewFn viewFn = (ViewFn) view.getViewFn(); - result = viewFn.apply(InMemoryMultimapSideInputView.empty()); + result = viewFn.apply(() -> Collections.emptyList()); } return result; } diff --git a/sdks/go.mod b/sdks/go.mod new file mode 100644 index 000000000000..9cb3c90c60c6 --- /dev/null +++ b/sdks/go.mod @@ -0,0 +1,55 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +// This module contains all Go code used for Beam's SDKs. This file is placed +// in this directory in order to cover the go code required for Java and Python +// containers, as well as the entire Go SDK. Placing this file in the repository +// root is not possible because it causes conflicts with a pre-existing vendor +// directory. 
+module github.com/apache/beam/sdks/v2 + +go 1.16 + +require ( + cloud.google.com/go/bigquery v1.17.0 + cloud.google.com/go/datastore v1.5.0 + cloud.google.com/go/pubsub v1.11.0-beta.schemas + cloud.google.com/go/storage v1.15.0 + github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da // indirect + github.com/golang/protobuf v1.5.2 // TODO(danoliveira): Fully replace this with google.golang.org/protobuf + github.com/golang/snappy v0.0.4 // indirect + github.com/google/btree v1.0.0 // indirect + github.com/google/go-cmp v0.5.6 + github.com/google/martian/v3 v3.2.1 // indirect + github.com/google/uuid v1.3.0 + github.com/hashicorp/golang-lru v0.5.1 // indirect + github.com/kr/text v0.2.0 // indirect + github.com/linkedin/goavro v2.1.0+incompatible + github.com/niemeyer/pretty v0.0.0-20200227124842-a10e7caefd8e // indirect + github.com/nightlyone/lockfile v1.0.0 + github.com/spf13/cobra v1.2.1 + golang.org/x/exp v0.0.0-20200224162631-6cc2880d07d6 // indirect + golang.org/x/net v0.0.0-20210423184538-5f58ad60dda6 + golang.org/x/oauth2 v0.0.0-20210628180205-a41e5a781914 + golang.org/x/sys v0.0.0-20210630005230-0f9fa26af87c // indirect + golang.org/x/text v0.3.6 + google.golang.org/api v0.45.0 + google.golang.org/genproto v0.0.0-20210728212813-7823e685a01f + google.golang.org/grpc v1.39.0 + google.golang.org/protobuf v1.27.1 + gopkg.in/check.v1 v1.0.0-20200227125254-8fa46927fb4f // indirect + gopkg.in/linkedin/goavro.v1 v1.0.5 // indirect + gopkg.in/yaml.v2 v2.4.0 +) diff --git a/sdks/go.sum b/sdks/go.sum new file mode 100644 index 000000000000..97927327f322 --- /dev/null +++ b/sdks/go.sum @@ -0,0 +1,648 @@ +cloud.google.com/go v0.26.0/go.mod h1:aQUYkXzVsufM+DwF1aE+0xfcU+56JwCaLick0ClmMTw= +cloud.google.com/go v0.34.0/go.mod h1:aQUYkXzVsufM+DwF1aE+0xfcU+56JwCaLick0ClmMTw= +cloud.google.com/go v0.38.0/go.mod h1:990N+gfupTy94rShfmMCWGDn0LpTmnzTp2qbd1dvSRU= +cloud.google.com/go v0.44.1/go.mod h1:iSa0KzasP4Uvy3f1mN/7PiObzGgflwredwwASm/v6AU= +cloud.google.com/go v0.44.2/go.mod h1:60680Gw3Yr4ikxnPRS/oxxkBccT6SA1yMk63TGekxKY= +cloud.google.com/go v0.45.1/go.mod h1:RpBamKRgapWJb87xiFSdk4g1CME7QZg3uwTez+TSTjc= +cloud.google.com/go v0.46.3/go.mod h1:a6bKKbmY7er1mI7TEI4lsAkts/mkhTSZK8w33B4RAg0= +cloud.google.com/go v0.50.0/go.mod h1:r9sluTvynVuxRIOHXQEHMFffphuXHOMZMycpNR5e6To= +cloud.google.com/go v0.52.0/go.mod h1:pXajvRH/6o3+F9jDHZWQ5PbGhn+o8w9qiu/CffaVdO4= +cloud.google.com/go v0.53.0/go.mod h1:fp/UouUEsRkN6ryDKNW/Upv/JBKnv6WDthjR6+vze6M= +cloud.google.com/go v0.54.0/go.mod h1:1rq2OEkV3YMf6n/9ZvGWI3GWw0VoqH/1x2nd8Is/bPc= +cloud.google.com/go v0.56.0/go.mod h1:jr7tqZxxKOVYizybht9+26Z/gUq7tiRzu+ACVAMbKVk= +cloud.google.com/go v0.57.0/go.mod h1:oXiQ6Rzq3RAkkY7N6t3TcE6jE+CIBBbA36lwQ1JyzZs= +cloud.google.com/go v0.62.0/go.mod h1:jmCYTdRCQuc1PHIIJ/maLInMho30T/Y0M4hTdTShOYc= +cloud.google.com/go v0.65.0/go.mod h1:O5N8zS7uWy9vkA9vayVHs65eM1ubvY4h553ofrNHObY= +cloud.google.com/go v0.72.0/go.mod h1:M+5Vjvlc2wnp6tjzE102Dw08nGShTscUx2nZMufOKPI= +cloud.google.com/go v0.74.0/go.mod h1:VV1xSbzvo+9QJOxLDaJfTjx5e+MePCpCWwvftOeQmWk= +cloud.google.com/go v0.75.0/go.mod h1:VGuuCn7PG0dwsd5XPVm2Mm3wlh3EL55/79EKB6hlPTY= +cloud.google.com/go v0.78.0/go.mod h1:QjdrLG0uq+YwhjoVOLsS1t7TW8fs36kLs4XO5R5ECHg= +cloud.google.com/go v0.79.0/go.mod h1:3bzgcEeQlzbuEAYu4mrWhKqWjmpprinYgKJLgKHnbb8= +cloud.google.com/go v0.81.0 h1:at8Tk2zUz63cLPR0JPWm5vp77pEZmzxEQBEfRKn1VV8= +cloud.google.com/go v0.81.0/go.mod h1:mk/AM35KwGk/Nm2YSeZbxXdrNK3KZOYHmLkOqC2V6E0= +cloud.google.com/go/bigquery v1.0.1/go.mod 
h1:i/xbL2UlR5RvWAURpBYZTtm/cXjCha9lbfbpx4poX+o= +cloud.google.com/go/bigquery v1.3.0/go.mod h1:PjpwJnslEMmckchkHFfq+HTD2DmtT67aNFKH1/VBDHE= +cloud.google.com/go/bigquery v1.4.0/go.mod h1:S8dzgnTigyfTmLBfrtrhyYhwRxG72rYxvftPBK2Dvzc= +cloud.google.com/go/bigquery v1.5.0/go.mod h1:snEHRnqQbz117VIFhE8bmtwIDY80NLUZUMb4Nv6dBIg= +cloud.google.com/go/bigquery v1.7.0/go.mod h1://okPTzCYNXSlb24MZs83e2Do+h+VXtc4gLoIoXIAPc= +cloud.google.com/go/bigquery v1.8.0/go.mod h1:J5hqkt3O0uAFnINi6JXValWIb1v0goeZM77hZzJN/fQ= +cloud.google.com/go/bigquery v1.17.0 h1:oq1PIpl9u1thzdsX0K9w5H8OlqH5gRu3zGc7FCk19IY= +cloud.google.com/go/bigquery v1.17.0/go.mod h1:pUlbH9kNOnp6ayShsqKLB6w49z14ILAaq0hrjh93Ajw= +cloud.google.com/go/datastore v1.0.0/go.mod h1:LXYbyblFSglQ5pkeyhO+Qmw7ukd3C+pD7TKLgZqpHYE= +cloud.google.com/go/datastore v1.1.0/go.mod h1:umbIZjpQpHh4hmRpGhH4tLFup+FVzqBi1b3c64qFpCk= +cloud.google.com/go/datastore v1.5.0 h1:3En8Rj64Q5GxtjsTljiqm25LTzvPFbpK+WQrgeKOUvI= +cloud.google.com/go/datastore v1.5.0/go.mod h1:RGUNM0FFAVkYA94BLTxoXBgfIyY1Riq67TwaBXH0lwc= +cloud.google.com/go/firestore v1.1.0/go.mod h1:ulACoGHTpvq5r8rxGJ4ddJZBZqakUQqClKRT5SZwBmk= +cloud.google.com/go/pubsub v1.0.1/go.mod h1:R0Gpsv3s54REJCy4fxDixWD93lHJMoZTyQ2kNxGRt3I= +cloud.google.com/go/pubsub v1.1.0/go.mod h1:EwwdRX2sKPjnvnqCa270oGRyludottCI76h+R3AArQw= +cloud.google.com/go/pubsub v1.2.0/go.mod h1:jhfEVHT8odbXTkndysNHCcx0awwzvfOlguIAii9o8iA= +cloud.google.com/go/pubsub v1.3.1/go.mod h1:i+ucay31+CNRpDW4Lu78I4xXG+O1r/MAHgjpRVR+TSU= +cloud.google.com/go/pubsub v1.11.0-beta.schemas h1:w81PfKDbPt8GQZFQePf2V3dhlld+fynrwwLuKQ1xntw= +cloud.google.com/go/pubsub v1.11.0-beta.schemas/go.mod h1:llNLsvx+RnsZJoY481TzC1XcdB2hWdR6gSWM5O4vgfs= +cloud.google.com/go/storage v1.0.0/go.mod h1:IhtSnM/ZTZV8YYJWCY8RULGVqBDmpoyjwiyrjsg+URw= +cloud.google.com/go/storage v1.5.0/go.mod h1:tpKbwo567HUNpVclU5sGELwQWBDZ8gh0ZeosJ0Rtdos= +cloud.google.com/go/storage v1.6.0/go.mod h1:N7U0C8pVQ/+NIKOBQyamJIeKQKkZ+mxpohlUTyfDhBk= +cloud.google.com/go/storage v1.8.0/go.mod h1:Wv1Oy7z6Yz3DshWRJFhqM/UCfaWIRTdp0RXyy7KQOVs= +cloud.google.com/go/storage v1.10.0/go.mod h1:FLPqc6j+Ki4BU591ie1oL6qBQGu2Bl/tZ9ullr3+Kg0= +cloud.google.com/go/storage v1.15.0 h1:Ljj+ZXVEhCr/1+4ZhvtteN1ND7UUsNTlduGclLh8GO0= +cloud.google.com/go/storage v1.15.0/go.mod h1:mjjQMoxxyGH7Jr8K5qrx6N2O0AHsczI61sMNn03GIZI= +dmitri.shuralyov.com/gpu/mtl v0.0.0-20190408044501-666a987793e9/go.mod h1:H6x//7gZCb22OMCxBHrMx7a5I7Hp++hsVxbQ4BYO7hU= +github.com/BurntSushi/toml v0.3.1/go.mod h1:xHWCNGjB5oqiDr8zfno3MHue2Ht5sIBksp03qcyfWMU= +github.com/BurntSushi/xgb v0.0.0-20160522181843-27f122750802/go.mod h1:IVnqGOEym/WlBOVXweHU+Q+/VP0lqqI8lqeDx9IjBqo= +github.com/antihax/optional v1.0.0/go.mod h1:uupD/76wgC+ih3iEmQUL+0Ugr19nfwCT1kdvxnR2qWY= +github.com/armon/circbuf v0.0.0-20150827004946-bbbad097214e/go.mod h1:3U/XgcO3hCbHZ8TKRvWD2dDTCfh9M9ya+I9JpbB7O8o= +github.com/armon/go-metrics v0.0.0-20180917152333-f0300d1749da/go.mod h1:Q73ZrmVTwzkszR9V5SSuryQ31EELlFMUz1kKyl939pY= +github.com/armon/go-radix v0.0.0-20180808171621-7fddfc383310/go.mod h1:ufUuZ+zHj4x4TnLV4JWEpy2hxWSpsRywHrMgIH9cCH8= +github.com/bgentry/speakeasy v0.1.0/go.mod h1:+zsyZBPWlz7T6j88CTgSN5bM796AkVf0kBD4zp0CCIs= +github.com/bketelsen/crypt v0.0.4/go.mod h1:aI6NrJ0pMGgvZKL1iVgXLnfIFJtfV+bKCoqOes/6LfM= +github.com/census-instrumentation/opencensus-proto v0.2.1/go.mod h1:f6KPmirojxKA12rnyqOA5BBL4O983OfeGPqjHWSTneU= +github.com/chzyer/logex v1.1.10/go.mod h1:+Ywpsq7O8HXn0nuIou7OrIPyXbp3wmkHB+jjWRnGsAI= +github.com/chzyer/readline 
v0.0.0-20180603132655-2972be24d48e/go.mod h1:nSuG5e5PlCu98SY8svDHJxuZscDgtXS6KTTbou5AhLI= +github.com/chzyer/test v0.0.0-20180213035817-a1ea475d72b1/go.mod h1:Q3SI9o4m/ZMnBNeIyt5eFwwo7qiLfzFZmjNmxjkiQlU= +github.com/client9/misspell v0.3.4/go.mod h1:qj6jICC3Q7zFZvVWo7KLAzC3yx5G7kyvSDkc90ppPyw= +github.com/cncf/udpa/go v0.0.0-20191209042840-269d4d468f6f/go.mod h1:M8M6+tZqaGXZJjfX53e64911xZQV5JYwmTeXPW+k8Sc= +github.com/cncf/udpa/go v0.0.0-20200629203442-efcf912fb354/go.mod h1:WmhPx2Nbnhtbo57+VJT5O0JRkEi1Wbu0z5j0R8u5Hbk= +github.com/cncf/udpa/go v0.0.0-20201120205902-5459f2c99403/go.mod h1:WmhPx2Nbnhtbo57+VJT5O0JRkEi1Wbu0z5j0R8u5Hbk= +github.com/cncf/xds/go v0.0.0-20210312221358-fbca930ec8ed/go.mod h1:eXthEFrGJvWHgFFCl3hGmgk+/aYT6PnTQLykKQRLhEs= +github.com/coreos/go-semver v0.3.0/go.mod h1:nnelYz7RCh+5ahJtPPxZlU+153eP4D4r3EedlOD2RNk= +github.com/coreos/go-systemd/v22 v22.3.2/go.mod h1:Y58oyj3AT4RCenI/lSvhwexgC+NSVTIJ3seZv2GcEnc= +github.com/cpuguy83/go-md2man/v2 v2.0.0/go.mod h1:maD7wRr/U5Z6m/iR4s+kqSMx2CaBsrgA7czyZG/E6dU= +github.com/creack/pty v1.1.9/go.mod h1:oKZEueFk5CKHvIhNR5MUki03XCEU+Q6VDXinZuGJ33E= +github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= +github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= +github.com/envoyproxy/go-control-plane v0.9.0/go.mod h1:YTl/9mNaCwkRvm6d1a2C3ymFceY/DCBVvsKhRF0iEA4= +github.com/envoyproxy/go-control-plane v0.9.1-0.20191026205805-5f8ba28d4473/go.mod h1:YTl/9mNaCwkRvm6d1a2C3ymFceY/DCBVvsKhRF0iEA4= +github.com/envoyproxy/go-control-plane v0.9.4/go.mod h1:6rpuAdCZL397s3pYoYcLgu1mIlRU8Am5FuJP05cCM98= +github.com/envoyproxy/go-control-plane v0.9.7/go.mod h1:cwu0lG7PUMfa9snN8LXBig5ynNVH9qI8YYLbd1fK2po= +github.com/envoyproxy/go-control-plane v0.9.9-0.20201210154907-fd9021fe5dad/go.mod h1:cXg6YxExXjJnVBQHBLXeUAgxn2UodCpnH306RInaBQk= +github.com/envoyproxy/go-control-plane v0.9.9-0.20210217033140-668b12f5399d/go.mod h1:cXg6YxExXjJnVBQHBLXeUAgxn2UodCpnH306RInaBQk= +github.com/envoyproxy/go-control-plane v0.9.9-0.20210512163311-63b5d3c536b0/go.mod h1:hliV/p42l8fGbc6Y9bQ70uLwIvmJyVE5k4iMKlh8wCQ= +github.com/envoyproxy/protoc-gen-validate v0.1.0/go.mod h1:iSmxcyjqTsJpI2R4NaDN7+kN2VEUnK/pcBlmesArF7c= +github.com/fatih/color v1.7.0/go.mod h1:Zm6kSWBoL9eyXnKyktHP6abPY2pDugNf5KwzbycvMj4= +github.com/fsnotify/fsnotify v1.4.9/go.mod h1:znqG4EE+3YCdAaPaxE2ZRY/06pZUdp0tY4IgpuI1SZQ= +github.com/ghodss/yaml v1.0.0/go.mod h1:4dBDuWmgqj2HViK6kFavaiC9ZROes6MMH2rRYeMEF04= +github.com/go-gl/glfw v0.0.0-20190409004039-e6da0acd62b1/go.mod h1:vR7hzQXu2zJy9AVAgeJqvqgH9Q5CA+iKCZ2gyEVpxRU= +github.com/go-gl/glfw/v3.3/glfw v0.0.0-20191125211704-12ad95a8df72/go.mod h1:tQ2UAYgL5IevRw8kRxooKSPJfGvJ9fJQFa0TUsXzTg8= +github.com/go-gl/glfw/v3.3/glfw v0.0.0-20200222043503-6f7a984d4dc4/go.mod h1:tQ2UAYgL5IevRw8kRxooKSPJfGvJ9fJQFa0TUsXzTg8= +github.com/godbus/dbus/v5 v5.0.4/go.mod h1:xhWf0FNVPg57R7Z0UbKHbJfkEywrmjJnf7w5xrFpKfA= +github.com/gogo/protobuf v1.3.2/go.mod h1:P1XiOD3dCwIKUDQYPy72D8LYyHL2YPYrpS2s69NZV8Q= +github.com/golang/glog v0.0.0-20160126235308-23def4e6c14b/go.mod h1:SBH7ygxi8pfUlaOkMMuAQtPIUF8ecWP5IEl/CR7VP2Q= +github.com/golang/groupcache v0.0.0-20190702054246-869f871628b6/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc= +github.com/golang/groupcache v0.0.0-20191227052852-215e87163ea7/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc= +github.com/golang/groupcache v0.0.0-20200121045136-8c9f03a8e57e/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc= +github.com/golang/groupcache 
v0.0.0-20210331224755-41bb18bfe9da h1:oI5xCqsCo564l8iNU+DwB5epxmsaqB+rhGL0m5jtYqE= +github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc= +github.com/golang/mock v1.1.1/go.mod h1:oTYuIxOrZwtPieC+H1uAHpcLFnEyAGVDL/k47Jfbm0A= +github.com/golang/mock v1.2.0/go.mod h1:oTYuIxOrZwtPieC+H1uAHpcLFnEyAGVDL/k47Jfbm0A= +github.com/golang/mock v1.3.1/go.mod h1:sBzyDLLjw3U8JLTeZvSv8jJB+tU5PVekmnlKIyFUx0Y= +github.com/golang/mock v1.4.0/go.mod h1:UOMv5ysSaYNkG+OFQykRIcU/QvvxJf3p21QfJ2Bt3cw= +github.com/golang/mock v1.4.1/go.mod h1:UOMv5ysSaYNkG+OFQykRIcU/QvvxJf3p21QfJ2Bt3cw= +github.com/golang/mock v1.4.3/go.mod h1:UOMv5ysSaYNkG+OFQykRIcU/QvvxJf3p21QfJ2Bt3cw= +github.com/golang/mock v1.4.4/go.mod h1:l3mdAwkq5BuhzHwde/uurv3sEJeZMXNpwsxVWU71h+4= +github.com/golang/mock v1.5.0/go.mod h1:CWnOUgYIOo4TcNZ0wHX3YZCqsaM1I1Jvs6v3mP3KVu8= +github.com/golang/protobuf v1.2.0/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U= +github.com/golang/protobuf v1.3.1/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U= +github.com/golang/protobuf v1.3.2/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U= +github.com/golang/protobuf v1.3.3/go.mod h1:vzj43D7+SQXF/4pzW/hwtAqwc6iTitCiVSaWz5lYuqw= +github.com/golang/protobuf v1.3.4/go.mod h1:vzj43D7+SQXF/4pzW/hwtAqwc6iTitCiVSaWz5lYuqw= +github.com/golang/protobuf v1.3.5/go.mod h1:6O5/vntMXwX2lRkT1hjjk0nAC1IDOTvTlVgjlRvqsdk= +github.com/golang/protobuf v1.4.0-rc.1/go.mod h1:ceaxUfeHdC40wWswd/P6IGgMaK3YpKi5j83Wpe3EHw8= +github.com/golang/protobuf v1.4.0-rc.1.0.20200221234624-67d41d38c208/go.mod h1:xKAWHe0F5eneWXFV3EuXVDTCmh+JuBKY0li0aMyXATA= +github.com/golang/protobuf v1.4.0-rc.2/go.mod h1:LlEzMj4AhA7rCAGe4KMBDvJI+AwstrUpVNzEA03Pprs= +github.com/golang/protobuf v1.4.0-rc.4.0.20200313231945-b860323f09d0/go.mod h1:WU3c8KckQ9AFe+yFwt9sWVRKCVIyN9cPHBJSNnbL67w= +github.com/golang/protobuf v1.4.0/go.mod h1:jodUvKwWbYaEsadDk5Fwe5c77LiNKVO9IDvqG2KuDX0= +github.com/golang/protobuf v1.4.1/go.mod h1:U8fpvMrcmy5pZrNK1lt4xCsGvpyWQ/VVv6QDs8UjoX8= +github.com/golang/protobuf v1.4.2/go.mod h1:oDoupMAO8OvCJWAcko0GGGIgR6R6ocIYbsSw735rRwI= +github.com/golang/protobuf v1.4.3/go.mod h1:oDoupMAO8OvCJWAcko0GGGIgR6R6ocIYbsSw735rRwI= +github.com/golang/protobuf v1.5.0/go.mod h1:FsONVRAS9T7sI+LIUmWTfcYkHO4aIWwzhcaSAoJOfIk= +github.com/golang/protobuf v1.5.1/go.mod h1:DopwsBzvsk0Fs44TXzsVbJyPhcCPeIwnvohx4u74HPM= +github.com/golang/protobuf v1.5.2 h1:ROPKBNFfQgOUMifHyP+KYbvpjbdoFNs+aK7DXlji0Tw= +github.com/golang/protobuf v1.5.2/go.mod h1:XVQd3VNwM+JqD3oG2Ue2ip4fOMUkwXdXDdiuN0vRsmY= +github.com/golang/snappy v0.0.3/go.mod h1:/XxbfmMg8lxefKM7IXC3fBNl/7bRcc72aCRzEWrmP2Q= +github.com/golang/snappy v0.0.4 h1:yAGX7huGHXlcLOEtBnF4w7FQwA26wojNCwOYAEhLjQM= +github.com/golang/snappy v0.0.4/go.mod h1:/XxbfmMg8lxefKM7IXC3fBNl/7bRcc72aCRzEWrmP2Q= +github.com/google/btree v0.0.0-20180813153112-4030bb1f1f0c/go.mod h1:lNA+9X1NB3Zf8V7Ke586lFgjr2dZNuvo3lPJSGZ5JPQ= +github.com/google/btree v1.0.0/go.mod h1:lNA+9X1NB3Zf8V7Ke586lFgjr2dZNuvo3lPJSGZ5JPQ= +github.com/google/go-cmp v0.2.0/go.mod h1:oXzfMopK8JAjlY9xF4vHSVASa0yLyX7SntLO5aqRK0M= +github.com/google/go-cmp v0.3.0/go.mod h1:8QqcDgzrUqlUb/G2PQTWiueGozuR1884gddMywk6iLU= +github.com/google/go-cmp v0.3.1/go.mod h1:8QqcDgzrUqlUb/G2PQTWiueGozuR1884gddMywk6iLU= +github.com/google/go-cmp v0.4.0/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= +github.com/google/go-cmp v0.4.1/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= +github.com/google/go-cmp v0.5.0/go.mod 
h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= +github.com/google/go-cmp v0.5.1/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= +github.com/google/go-cmp v0.5.2/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= +github.com/google/go-cmp v0.5.3/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= +github.com/google/go-cmp v0.5.4/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= +github.com/google/go-cmp v0.5.5/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= +github.com/google/go-cmp v0.5.6 h1:BKbKCqvP6I+rmFHt06ZmyQtvB8xAkWdhFyr0ZUNZcxQ= +github.com/google/go-cmp v0.5.6/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= +github.com/google/gofuzz v1.0.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg= +github.com/google/martian v2.1.0+incompatible h1:/CP5g8u/VJHijgedC/Legn3BAbAaWPgecwXBIDzw5no= +github.com/google/martian v2.1.0+incompatible/go.mod h1:9I4somxYTbIHy5NJKHRl3wXiIaQGbYVAs8BPL6v8lEs= +github.com/google/martian/v3 v3.0.0/go.mod h1:y5Zk1BBys9G+gd6Jrk0W3cC1+ELVxBWuIGO+w/tUAp0= +github.com/google/martian/v3 v3.1.0/go.mod h1:y5Zk1BBys9G+gd6Jrk0W3cC1+ELVxBWuIGO+w/tUAp0= +github.com/google/martian/v3 v3.2.1 h1:d8MncMlErDFTwQGBK1xhv026j9kqhvw1Qv9IbWT1VLQ= +github.com/google/martian/v3 v3.2.1/go.mod h1:oBOf6HBosgwRXnUGWUB05QECsc6uvmMiJ3+6W4l/CUk= +github.com/google/pprof v0.0.0-20181206194817-3ea8567a2e57/go.mod h1:zfwlbNMJ+OItoe0UupaVj+oy1omPYYDuagoSzA8v9mc= +github.com/google/pprof v0.0.0-20190515194954-54271f7e092f/go.mod h1:zfwlbNMJ+OItoe0UupaVj+oy1omPYYDuagoSzA8v9mc= +github.com/google/pprof v0.0.0-20191218002539-d4f498aebedc/go.mod h1:ZgVRPoUq/hfqzAqh7sHMqb3I9Rq5C59dIz2SbBwJ4eM= +github.com/google/pprof v0.0.0-20200212024743-f11f1df84d12/go.mod h1:ZgVRPoUq/hfqzAqh7sHMqb3I9Rq5C59dIz2SbBwJ4eM= +github.com/google/pprof v0.0.0-20200229191704-1ebb73c60ed3/go.mod h1:ZgVRPoUq/hfqzAqh7sHMqb3I9Rq5C59dIz2SbBwJ4eM= +github.com/google/pprof v0.0.0-20200430221834-fc25d7d30c6d/go.mod h1:ZgVRPoUq/hfqzAqh7sHMqb3I9Rq5C59dIz2SbBwJ4eM= +github.com/google/pprof v0.0.0-20200708004538-1a94d8640e99/go.mod h1:ZgVRPoUq/hfqzAqh7sHMqb3I9Rq5C59dIz2SbBwJ4eM= +github.com/google/pprof v0.0.0-20201023163331-3e6fc7fc9c4c/go.mod h1:kpwsk12EmLew5upagYY7GY0pfYCcupk39gWOCRROcvE= +github.com/google/pprof v0.0.0-20201203190320-1bf35d6f28c2/go.mod h1:kpwsk12EmLew5upagYY7GY0pfYCcupk39gWOCRROcvE= +github.com/google/pprof v0.0.0-20201218002935-b9804c9f04c2/go.mod h1:kpwsk12EmLew5upagYY7GY0pfYCcupk39gWOCRROcvE= +github.com/google/pprof v0.0.0-20210122040257-d980be63207e/go.mod h1:kpwsk12EmLew5upagYY7GY0pfYCcupk39gWOCRROcvE= +github.com/google/pprof v0.0.0-20210226084205-cbba55b83ad5/go.mod h1:kpwsk12EmLew5upagYY7GY0pfYCcupk39gWOCRROcvE= +github.com/google/renameio v0.1.0/go.mod h1:KWCgfxg9yswjAJkECMjeO8J8rahYeXnNhOm40UhjYkI= +github.com/google/uuid v1.1.2/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo= +github.com/google/uuid v1.3.0 h1:t6JiXgmwXMjEs8VusXIJk2BXHsn+wx8BZdTaoZ5fu7I= +github.com/google/uuid v1.3.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo= +github.com/googleapis/gax-go/v2 v2.0.4/go.mod h1:0Wqv26UfaUD9n4G6kQubkQ+KchISgw+vpHVxEJEs9eg= +github.com/googleapis/gax-go/v2 v2.0.5 h1:sjZBwGj9Jlw33ImPtvFviGYvseOtDM7hkSKB7+Tv3SM= +github.com/googleapis/gax-go/v2 v2.0.5/go.mod h1:DWXyrwAJ9X0FpwwEdw+IPEYBICEFu5mhpdKc/us6bOk= +github.com/gopherjs/gopherjs v0.0.0-20181017120253-0766667cb4d1/go.mod h1:wJfORRmW1u3UXTncJ5qlYoELFm8eSnnEO6hX4iZ3EWY= +github.com/grpc-ecosystem/grpc-gateway v1.16.0/go.mod h1:BDjrQk3hbvj6Nolgz8mAMFbcEtjT1g+wF4CSlocrBnw= 
+github.com/hashicorp/consul/api v1.1.0/go.mod h1:VmuI/Lkw1nC05EYQWNKwWGbkg+FbDBtguAZLlVdkD9Q= +github.com/hashicorp/consul/sdk v0.1.1/go.mod h1:VKf9jXwCTEY1QZP2MOLRhb5i/I/ssyNV1vwHyQBF0x8= +github.com/hashicorp/errwrap v1.0.0/go.mod h1:YH+1FKiLXxHSkmPseP+kNlulaMuP3n2brvKWEqk/Jc4= +github.com/hashicorp/go-cleanhttp v0.5.1/go.mod h1:JpRdi6/HCYpAwUzNwuwqhbovhLtngrth3wmdIIUrZ80= +github.com/hashicorp/go-immutable-radix v1.0.0/go.mod h1:0y9vanUI8NX6FsYoO3zeMjhV/C5i9g4Q3DwcSNZ4P60= +github.com/hashicorp/go-msgpack v0.5.3/go.mod h1:ahLV/dePpqEmjfWmKiqvPkv/twdG7iPBM1vqhUKIvfM= +github.com/hashicorp/go-multierror v1.0.0/go.mod h1:dHtQlpGsu+cZNNAkkCN/P3hoUDHhCYQXV3UM06sGGrk= +github.com/hashicorp/go-rootcerts v1.0.0/go.mod h1:K6zTfqpRlCUIjkwsN4Z+hiSfzSTQa6eBIzfwKfwNnHU= +github.com/hashicorp/go-sockaddr v1.0.0/go.mod h1:7Xibr9yA9JjQq1JpNB2Vw7kxv8xerXegt+ozgdvDeDU= +github.com/hashicorp/go-syslog v1.0.0/go.mod h1:qPfqrKkXGihmCqbJM2mZgkZGvKG1dFdvsLplgctolz4= +github.com/hashicorp/go-uuid v1.0.0/go.mod h1:6SBZvOh/SIDV7/2o3Jml5SYk/TvGqwFJ/bN7x4byOro= +github.com/hashicorp/go-uuid v1.0.1/go.mod h1:6SBZvOh/SIDV7/2o3Jml5SYk/TvGqwFJ/bN7x4byOro= +github.com/hashicorp/go.net v0.0.1/go.mod h1:hjKkEWcCURg++eb33jQU7oqQcI9XDCnUzHA0oac0k90= +github.com/hashicorp/golang-lru v0.5.0/go.mod h1:/m3WP610KZHVQ1SGc6re/UDhFvYD7pJ4Ao+sR/qLZy8= +github.com/hashicorp/golang-lru v0.5.1/go.mod h1:/m3WP610KZHVQ1SGc6re/UDhFvYD7pJ4Ao+sR/qLZy8= +github.com/hashicorp/hcl v1.0.0/go.mod h1:E5yfLk+7swimpb2L/Alb/PJmXilQ/rhwaUYs4T20WEQ= +github.com/hashicorp/logutils v1.0.0/go.mod h1:QIAnNjmIWmVIIkWDTG1z5v++HQmx9WQRO+LraFDTW64= +github.com/hashicorp/mdns v1.0.0/go.mod h1:tL+uN++7HEJ6SQLQ2/p+z2pH24WQKWjBPkE0mNTz8vQ= +github.com/hashicorp/memberlist v0.1.3/go.mod h1:ajVTdAv/9Im8oMAAj5G31PhhMCZJV2pPBoIllUwCN7I= +github.com/hashicorp/serf v0.8.2/go.mod h1:6hOLApaqBFA1NXqRQAsxw9QxuDEvNxSQRwA/JwenrHc= +github.com/ianlancetaylor/demangle v0.0.0-20181102032728-5e5cf60278f6/go.mod h1:aSSvb/t6k1mPoxDqO4vJh6VOCGPwU4O0C2/Eqndh1Sc= +github.com/ianlancetaylor/demangle v0.0.0-20200824232613-28f6c0f3b639/go.mod h1:aSSvb/t6k1mPoxDqO4vJh6VOCGPwU4O0C2/Eqndh1Sc= +github.com/inconshreveable/mousetrap v1.0.0 h1:Z8tu5sraLXCXIcARxBp/8cbvlwVa7Z1NHg9XEKhtSvM= +github.com/inconshreveable/mousetrap v1.0.0/go.mod h1:PxqpIevigyE2G7u3NXJIT2ANytuPF1OarO4DADm73n8= +github.com/json-iterator/go v1.1.11/go.mod h1:KdQUCv79m/52Kvf8AW2vK1V8akMuk1QjK/uOdHXbAo4= +github.com/jstemmer/go-junit-report v0.0.0-20190106144839-af01ea7f8024/go.mod h1:6v2b51hI/fHJwM22ozAgKL4VKDeJcHhJFhtBdhmNjmU= +github.com/jstemmer/go-junit-report v0.9.1 h1:6QPYqodiu3GuPL+7mfx+NwDdp2eTkp9IfEUpgAwUN0o= +github.com/jstemmer/go-junit-report v0.9.1/go.mod h1:Brl9GWCQeLvo8nXZwPNNblvFj/XSXhF0NWZEnDohbsk= +github.com/jtolds/gls v4.20.0+incompatible/go.mod h1:QJZ7F/aHp+rZTRtaJ1ow/lLfFfVYBRgL+9YlvaHOwJU= +github.com/kisielk/errcheck v1.5.0/go.mod h1:pFxgyoBC7bSaBwPgfKdkLd5X25qrDl4LWUI2bnpBCr8= +github.com/kisielk/gotool v1.0.0/go.mod h1:XhKaO+MFFWcvkIS/tQcRk01m1F5IRFswLeQ+oQHNcck= +github.com/kr/fs v0.1.0/go.mod h1:FFnZGqtBN9Gxj7eW1uZ42v5BccTP0vu6NEaFoC2HwRg= +github.com/kr/pretty v0.1.0/go.mod h1:dAy3ld7l9f0ibDNOQOHHMYYIIbhfbHSm3C4ZsoJORNo= +github.com/kr/pty v1.1.1/go.mod h1:pFQYn66WHrOpPYNljwOMqo10TkYh1fy3cYio2l3bCsQ= +github.com/kr/text v0.1.0/go.mod h1:4Jbv+DJW3UT/LiOwJeYQe1efqtUx/iVham/4vfdArNI= +github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY= +github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE= +github.com/linkedin/goavro v2.1.0+incompatible 
h1:DV2aUlj2xZiuxQyvag8Dy7zjY69ENjS66bWkSfdpddY= +github.com/linkedin/goavro v2.1.0+incompatible/go.mod h1:bBCwI2eGYpUI/4820s67MElg9tdeLbINjLjiM2xZFYM= +github.com/magiconair/properties v1.8.5/go.mod h1:y3VJvCyxH9uVvJTWEGAELF3aiYNyPKd5NZ3oSwXrF60= +github.com/mattn/go-colorable v0.0.9/go.mod h1:9vuHe8Xs5qXnSaW/c/ABM9alt+Vo+STaOChaDxuIBZU= +github.com/mattn/go-isatty v0.0.3/go.mod h1:M+lRXTBqGeGNdLjl/ufCoiOlB5xdOkqRJdNxMWT7Zi4= +github.com/miekg/dns v1.0.14/go.mod h1:W1PPwlIAgtquWBMBEV9nkV9Cazfe8ScdGz/Lj7v3Nrg= +github.com/mitchellh/cli v1.0.0/go.mod h1:hNIlj7HEI86fIcpObd7a0FcrxTWetlwJDGcceTlRvqc= +github.com/mitchellh/go-homedir v1.0.0/go.mod h1:SfyaCUpYCn1Vlf4IUYiD9fPX4A5wJrkLzIz1N1q0pr0= +github.com/mitchellh/go-testing-interface v1.0.0/go.mod h1:kRemZodwjscx+RGhAo8eIhFbs2+BFgRtFPeD/KE+zxI= +github.com/mitchellh/gox v0.4.0/go.mod h1:Sd9lOJ0+aimLBi73mGofS1ycjY8lL3uZM3JPS42BGNg= +github.com/mitchellh/iochan v1.0.0/go.mod h1:JwYml1nuB7xOzsp52dPpHFffvOCDupsG0QubkSMEySY= +github.com/mitchellh/mapstructure v0.0.0-20160808181253-ca63d7c062ee/go.mod h1:FVVH3fgwuzCH5S8UJGiWEs2h04kUh9fWfEaFds41c1Y= +github.com/mitchellh/mapstructure v1.1.2/go.mod h1:FVVH3fgwuzCH5S8UJGiWEs2h04kUh9fWfEaFds41c1Y= +github.com/mitchellh/mapstructure v1.4.1/go.mod h1:bFUtVrKA4DC2yAKiSyO/QUcy7e+RRV2QTWOzhPopBRo= +github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q= +github.com/modern-go/reflect2 v0.0.0-20180701023420-4b7aa43c6742/go.mod h1:bx2lNnkwVCuqBIxFjflWJWanXIb3RllmbCylyMrvgv0= +github.com/modern-go/reflect2 v1.0.1/go.mod h1:bx2lNnkwVCuqBIxFjflWJWanXIb3RllmbCylyMrvgv0= +github.com/niemeyer/pretty v0.0.0-20200227124842-a10e7caefd8e h1:fD57ERR4JtEqsWbfPhv4DMiApHyliiK5xCTNVSPiaAs= +github.com/niemeyer/pretty v0.0.0-20200227124842-a10e7caefd8e/go.mod h1:zD1mROLANZcx1PVRCS0qkT7pwLkGfwJo4zjcN/Tysno= +github.com/nightlyone/lockfile v1.0.0 h1:RHep2cFKK4PonZJDdEl4GmkabuhbsRMgk/k3uAmxBiA= +github.com/nightlyone/lockfile v1.0.0/go.mod h1:rywoIealpdNse2r832aiD9jRk8ErCatROs6LzC841CI= +github.com/pascaldekloe/goe v0.0.0-20180627143212-57f6aae5913c/go.mod h1:lzWF7FIEvWOWxwDKqyGYQf6ZUaNfKdP144TG7ZOy1lc= +github.com/pelletier/go-toml v1.9.3/go.mod h1:u1nR/EPcESfeI/szUZKdtJ0xRNbUoANCkoOuaOx1Y+c= +github.com/pkg/errors v0.8.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0= +github.com/pkg/sftp v1.10.1/go.mod h1:lYOWFsE0bwd1+KfKJaKeuokY15vzFx25BLbzYYoAxZI= +github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= +github.com/posener/complete v1.1.1/go.mod h1:em0nMJCgc9GFtwrmVmEMR/ZL6WyhyjMBndrE9hABlRI= +github.com/prometheus/client_model v0.0.0-20190812154241-14fe0d1b01d4/go.mod h1:xMI15A0UPsDsEKsMN9yxemIoYk6Tm2C1GtYGdfGttqA= +github.com/rogpeppe/fastuuid v1.2.0/go.mod h1:jVj6XXZzXRy/MSR5jhDC/2q6DgLz+nrA6LYCDYWNEvQ= +github.com/rogpeppe/go-internal v1.3.0/go.mod h1:M8bDsm7K2OlrFYOpmOWEs/qY81heoFRclV5y23lUDJ4= +github.com/russross/blackfriday/v2 v2.0.1/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM= +github.com/ryanuber/columnize v0.0.0-20160712163229-9b3edd62028f/go.mod h1:sm1tb6uqfes/u+d4ooFouqFdy9/2g9QGwK3SQygK0Ts= +github.com/sean-/seed v0.0.0-20170313163322-e2103e2c3529/go.mod h1:DxrIzT+xaE7yg65j358z/aeFdxmN0P9QXhEzd20vsDc= +github.com/shurcooL/sanitized_anchor_name v1.0.0/go.mod h1:1NzhyTcUVG4SuEtjjoZeVRXNmyL/1OwPU0+IJeTBvfc= +github.com/smartystreets/assertions v0.0.0-20180927180507-b2de0cb4f26d/go.mod h1:OnSkiWE9lh6wB0YB77sQom3nweQdgAjqCqsofrRNTgc= +github.com/smartystreets/goconvey v1.6.4/go.mod 
h1:syvi0/a8iFYH4r/RixwvyeAJjdLS9QV7WQ/tjFTllLA= +github.com/spf13/afero v1.6.0/go.mod h1:Ai8FlHk4v/PARR026UzYexafAt9roJ7LcLMAmO6Z93I= +github.com/spf13/cast v1.3.1/go.mod h1:Qx5cxh0v+4UWYiBimWS+eyWzqEqokIECu5etghLkUJE= +github.com/spf13/cobra v1.2.1 h1:+KmjbUw1hriSNMF55oPrkZcb27aECyrj8V2ytv7kWDw= +github.com/spf13/cobra v1.2.1/go.mod h1:ExllRjgxM/piMAM+3tAZvg8fsklGAf3tPfi+i8t68Nk= +github.com/spf13/jwalterweatherman v1.1.0/go.mod h1:aNWZUN0dPAAO/Ljvb5BEdw96iTZ0EXowPYD95IqWIGo= +github.com/spf13/pflag v1.0.5 h1:iy+VFUOCP1a+8yFto/drg2CJ5u0yRoB7fZw3DKv/JXA= +github.com/spf13/pflag v1.0.5/go.mod h1:McXfInJRrz4CZXVZOBLb0bTZqETkiAhM9Iw0y3An2Bg= +github.com/spf13/viper v1.8.1/go.mod h1:o0Pch8wJ9BVSWGQMbra6iw0oQ5oktSIBaujf1rJH9Ns= +github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME= +github.com/stretchr/testify v1.2.2/go.mod h1:a8OnRcib4nhh0OaRAV+Yts87kKdq0PP7pXfy6kDkUVs= +github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI= +github.com/stretchr/testify v1.4.0/go.mod h1:j7eGeouHqKxXV5pUuKE4zz7dFj8WfuZ+81PSLYec5m4= +github.com/stretchr/testify v1.5.1/go.mod h1:5W2xD1RspED5o8YsWQXVCued0rvSQ+mT+I5cxcmMvtA= +github.com/stretchr/testify v1.6.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg= +github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg= +github.com/subosito/gotenv v1.2.0/go.mod h1:N0PQaV/YGNqwC0u51sEeR/aUtSLEXKX9iv69rRypqCw= +github.com/yuin/goldmark v1.1.25/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74= +github.com/yuin/goldmark v1.1.27/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74= +github.com/yuin/goldmark v1.1.32/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74= +github.com/yuin/goldmark v1.2.1/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74= +github.com/yuin/goldmark v1.3.5/go.mod h1:mwnBkeHKe2W/ZEtQ+71ViKU8L12m81fl3OWwC1Zlc8k= +go.etcd.io/etcd/api/v3 v3.5.0/go.mod h1:cbVKeC6lCfl7j/8jBhAK6aIYO9XOjdptoxU/nLQcPvs= +go.etcd.io/etcd/client/pkg/v3 v3.5.0/go.mod h1:IJHfcCEKxYu1Os13ZdwCwIUTUVGYTSAM3YSwc9/Ac1g= +go.etcd.io/etcd/client/v2 v2.305.0/go.mod h1:h9puh54ZTgAKtEbut2oe9P4L/oqKCVB6xsXlzd7alYQ= +go.opencensus.io v0.21.0/go.mod h1:mSImk1erAIZhrmZN+AvHh14ztQfjbGwt4TtuofqLduU= +go.opencensus.io v0.22.0/go.mod h1:+kGneAE2xo2IficOXnaByMWTGM9T73dGwxeWcUqIpI8= +go.opencensus.io v0.22.2/go.mod h1:yxeiOL68Rb0Xd1ddK5vPZ/oVn4vY4Ynel7k9FzqtOIw= +go.opencensus.io v0.22.3/go.mod h1:yxeiOL68Rb0Xd1ddK5vPZ/oVn4vY4Ynel7k9FzqtOIw= +go.opencensus.io v0.22.4/go.mod h1:yxeiOL68Rb0Xd1ddK5vPZ/oVn4vY4Ynel7k9FzqtOIw= +go.opencensus.io v0.22.5/go.mod h1:5pWMHQbX5EPX2/62yrJeAkowc+lfs/XD7Uxpq3pI6kk= +go.opencensus.io v0.23.0 h1:gqCw0LfLxScz8irSi8exQc7fyQ0fKQU/qnC/X8+V/1M= +go.opencensus.io v0.23.0/go.mod h1:XItmlyltB5F7CS4xOC1DcqMoFqwtC6OG2xF7mCv7P7E= +go.opentelemetry.io/proto/otlp v0.7.0/go.mod h1:PqfVotwruBrMGOCsRd/89rSnXhoiJIqeYNgFYFoEGnI= +go.uber.org/atomic v1.7.0/go.mod h1:fEN4uk6kAWBTFdckzkM89CLk9XfWZrxpCo0nPH17wJc= +go.uber.org/multierr v1.6.0/go.mod h1:cdWPpRnG4AhwMwsgIHip0KRBQjJy5kYEpYjJxpXp9iU= +go.uber.org/zap v1.17.0/go.mod h1:MXVU+bhUf/A7Xi2HNOnopQOrmycQ5Ih87HtOu4q5SSo= +golang.org/x/crypto v0.0.0-20181029021203-45a5f77698d3/go.mod h1:6SG95UA2DQfeDnfUPMdvaQW0Q7yPrPDi9nlGo2tz2b4= +golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w= +golang.org/x/crypto v0.0.0-20190510104115-cbcb75029529/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI= +golang.org/x/crypto v0.0.0-20190605123033-f99c8df09eb5/go.mod 
h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI= +golang.org/x/crypto v0.0.0-20190820162420-60c769a6c586/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI= +golang.org/x/crypto v0.0.0-20191011191535-87dc89f01550/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI= +golang.org/x/crypto v0.0.0-20200622213623-75b288015ac9/go.mod h1:LzIPMQfyMNhhGPhUkYOs5KpL4U8rLKemX1yGLhDgUto= +golang.org/x/exp v0.0.0-20190121172915-509febef88a4/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA= +golang.org/x/exp v0.0.0-20190306152737-a1d7652674e8/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA= +golang.org/x/exp v0.0.0-20190510132918-efd6b22b2522/go.mod h1:ZjyILWgesfNpC6sMxTJOJm9Kp84zZh5NQWvqDGG3Qr8= +golang.org/x/exp v0.0.0-20190829153037-c13cbed26979/go.mod h1:86+5VVa7VpoJ4kLfm080zCjGlMRFzhUhsZKEZO7MGek= +golang.org/x/exp v0.0.0-20191030013958-a1ab85dbe136/go.mod h1:JXzH8nQsPlswgeRAPE3MuO9GYsAcnJvJ4vnMwN/5qkY= +golang.org/x/exp v0.0.0-20191129062945-2f5052295587/go.mod h1:2RIsYlXP63K8oxa1u096TMicItID8zy7Y6sNkU49FU4= +golang.org/x/exp v0.0.0-20191227195350-da58074b4299/go.mod h1:2RIsYlXP63K8oxa1u096TMicItID8zy7Y6sNkU49FU4= +golang.org/x/exp v0.0.0-20200119233911-0405dc783f0a/go.mod h1:2RIsYlXP63K8oxa1u096TMicItID8zy7Y6sNkU49FU4= +golang.org/x/exp v0.0.0-20200207192155-f17229e696bd/go.mod h1:J/WKrq2StrnmMY6+EHIKF9dgMWnmCNThgcyBT1FY9mM= +golang.org/x/exp v0.0.0-20200224162631-6cc2880d07d6/go.mod h1:3jZMyOhIsHpP37uCMkUooju7aAi5cS1Q23tOzKc+0MU= +golang.org/x/image v0.0.0-20190227222117-0694c2d4d067/go.mod h1:kZ7UVZpmo3dzQBMxlp+ypCbDeSB+sBbTgSJuh5dn5js= +golang.org/x/image v0.0.0-20190802002840-cff245a6509b/go.mod h1:FeLwcggjj3mMvU+oOTbSwawSJRM1uh48EjtB4UJZlP0= +golang.org/x/lint v0.0.0-20181026193005-c67002cb31c3/go.mod h1:UVdnD1Gm6xHRNCYTkRU2/jEulfH38KcIWyp/GAMgvoE= +golang.org/x/lint v0.0.0-20190227174305-5b3e6a55c961/go.mod h1:wehouNa3lNwaWXcvxsM5YxQ5yQlVC4a0KAMCusXpPoU= +golang.org/x/lint v0.0.0-20190301231843-5614ed5bae6f/go.mod h1:UVdnD1Gm6xHRNCYTkRU2/jEulfH38KcIWyp/GAMgvoE= +golang.org/x/lint v0.0.0-20190313153728-d0100b6bd8b3/go.mod h1:6SW0HCj/g11FgYtHlgUYUwCkIfeOF89ocIRzGO/8vkc= +golang.org/x/lint v0.0.0-20190409202823-959b441ac422/go.mod h1:6SW0HCj/g11FgYtHlgUYUwCkIfeOF89ocIRzGO/8vkc= +golang.org/x/lint v0.0.0-20190909230951-414d861bb4ac/go.mod h1:6SW0HCj/g11FgYtHlgUYUwCkIfeOF89ocIRzGO/8vkc= +golang.org/x/lint v0.0.0-20190930215403-16217165b5de/go.mod h1:6SW0HCj/g11FgYtHlgUYUwCkIfeOF89ocIRzGO/8vkc= +golang.org/x/lint v0.0.0-20191125180803-fdd1cda4f05f/go.mod h1:5qLYkcX4OjUUV8bRuDixDT3tpyyb+LUpUlRWLxfhWrs= +golang.org/x/lint v0.0.0-20200130185559-910be7a94367/go.mod h1:3xt1FjdF8hUf6vQPIChWIBhFzV8gjjsPE/fR3IyQdNY= +golang.org/x/lint v0.0.0-20200302205851-738671d3881b/go.mod h1:3xt1FjdF8hUf6vQPIChWIBhFzV8gjjsPE/fR3IyQdNY= +golang.org/x/lint v0.0.0-20201208152925-83fdc39ff7b5/go.mod h1:3xt1FjdF8hUf6vQPIChWIBhFzV8gjjsPE/fR3IyQdNY= +golang.org/x/lint v0.0.0-20210508222113-6edffad5e616 h1:VLliZ0d+/avPrXXH+OakdXhpJuEoBZuwh1m2j7U6Iug= +golang.org/x/lint v0.0.0-20210508222113-6edffad5e616/go.mod h1:3xt1FjdF8hUf6vQPIChWIBhFzV8gjjsPE/fR3IyQdNY= +golang.org/x/mobile v0.0.0-20190312151609-d3739f865fa6/go.mod h1:z+o9i4GpDbdi3rU15maQ/Ox0txvL9dWGYEHz965HBQE= +golang.org/x/mobile v0.0.0-20190719004257-d2bd2a29d028/go.mod h1:E/iHnbuqvinMTCcRqshq8CkpyQDoeVncDDYHnLhea+o= +golang.org/x/mod v0.0.0-20190513183733-4bf6d317e70e/go.mod h1:mXi4GBBbnImb6dmsKGUJ2LatrhH/nqhxcFungHvyanc= +golang.org/x/mod v0.1.0/go.mod h1:0QHyrYULN0/3qlju5TqG8bIK38QM8yzMo5ekMj3DlcY= +golang.org/x/mod 
v0.1.1-0.20191105210325-c90efee705ee/go.mod h1:QqPTAvyqsEbceGzBzNggFXnrqF1CaUcvgkdR5Ot7KZg= +golang.org/x/mod v0.1.1-0.20191107180719-034126e5016b/go.mod h1:QqPTAvyqsEbceGzBzNggFXnrqF1CaUcvgkdR5Ot7KZg= +golang.org/x/mod v0.2.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA= +golang.org/x/mod v0.3.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA= +golang.org/x/mod v0.4.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA= +golang.org/x/mod v0.4.1/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA= +golang.org/x/mod v0.4.2 h1:Gz96sIWK3OalVv/I/qNygP42zyoKp3xptRVCWRFEBvo= +golang.org/x/mod v0.4.2/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA= +golang.org/x/net v0.0.0-20180724234803-3673e40ba225/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4= +golang.org/x/net v0.0.0-20180826012351-8a410e7b638d/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4= +golang.org/x/net v0.0.0-20181023162649-9b4f9f5ad519/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4= +golang.org/x/net v0.0.0-20181201002055-351d144fa1fc/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4= +golang.org/x/net v0.0.0-20190108225652-1e06a53dbb7e/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4= +golang.org/x/net v0.0.0-20190213061140-3a22650c66bd/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4= +golang.org/x/net v0.0.0-20190311183353-d8887717615a/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg= +golang.org/x/net v0.0.0-20190404232315-eb5bcb51f2a3/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg= +golang.org/x/net v0.0.0-20190501004415-9ce7a6920f09/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg= +golang.org/x/net v0.0.0-20190503192946-f4e77d36d62c/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg= +golang.org/x/net v0.0.0-20190603091049-60506f45cf65/go.mod h1:HSz+uSET+XFnRR8LxR5pz3Of3rY3CfYBVs4xY44aLks= +golang.org/x/net v0.0.0-20190620200207-3b0461eec859/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= +golang.org/x/net v0.0.0-20190628185345-da137c7871d7/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= +golang.org/x/net v0.0.0-20190724013045-ca1201d0de80/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= +golang.org/x/net v0.0.0-20191209160850-c0dbc17a3553/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= +golang.org/x/net v0.0.0-20200114155413-6afb5195e5aa/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= +golang.org/x/net v0.0.0-20200202094626-16171245cfb2/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= +golang.org/x/net v0.0.0-20200222125558-5a598a2470a0/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= +golang.org/x/net v0.0.0-20200226121028-0de0cce0169b/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= +golang.org/x/net v0.0.0-20200301022130-244492dfa37a/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= +golang.org/x/net v0.0.0-20200324143707-d3edc9973b7e/go.mod h1:qpuaurCH72eLCgpAm/N6yyVIVM9cpaDIP3A8BGJEC5A= +golang.org/x/net v0.0.0-20200501053045-e0ff5e5a1de5/go.mod h1:qpuaurCH72eLCgpAm/N6yyVIVM9cpaDIP3A8BGJEC5A= +golang.org/x/net v0.0.0-20200506145744-7e3656a0809f/go.mod h1:qpuaurCH72eLCgpAm/N6yyVIVM9cpaDIP3A8BGJEC5A= +golang.org/x/net v0.0.0-20200513185701-a91f0712d120/go.mod h1:qpuaurCH72eLCgpAm/N6yyVIVM9cpaDIP3A8BGJEC5A= +golang.org/x/net v0.0.0-20200520182314-0ba52f642ac2/go.mod h1:qpuaurCH72eLCgpAm/N6yyVIVM9cpaDIP3A8BGJEC5A= +golang.org/x/net v0.0.0-20200625001655-4c5254603344/go.mod h1:/O7V0waA8r7cgGh81Ro3o1hOxt32SMVPicZroKQ2sZA= +golang.org/x/net v0.0.0-20200707034311-ab3426394381/go.mod 
h1:/O7V0waA8r7cgGh81Ro3o1hOxt32SMVPicZroKQ2sZA= +golang.org/x/net v0.0.0-20200822124328-c89045814202/go.mod h1:/O7V0waA8r7cgGh81Ro3o1hOxt32SMVPicZroKQ2sZA= +golang.org/x/net v0.0.0-20201021035429-f5854403a974/go.mod h1:sp8m0HH+o8qH0wwXwYZr8TS3Oi6o0r6Gce1SSxlDquU= +golang.org/x/net v0.0.0-20201031054903-ff519b6c9102/go.mod h1:sp8m0HH+o8qH0wwXwYZr8TS3Oi6o0r6Gce1SSxlDquU= +golang.org/x/net v0.0.0-20201110031124-69a78807bb2b/go.mod h1:sp8m0HH+o8qH0wwXwYZr8TS3Oi6o0r6Gce1SSxlDquU= +golang.org/x/net v0.0.0-20201209123823-ac852fbbde11/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg= +golang.org/x/net v0.0.0-20201224014010-6772e930b67b/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg= +golang.org/x/net v0.0.0-20210119194325-5f4716e94777/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg= +golang.org/x/net v0.0.0-20210226172049-e18ecbb05110/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg= +golang.org/x/net v0.0.0-20210316092652-d523dce5a7f4/go.mod h1:RBQZq4jEuRlivfhVLdyRGr576XBO4/greRjx4P4O3yc= +golang.org/x/net v0.0.0-20210405180319-a5a99cb37ef4/go.mod h1:p54w0d4576C0XHj96bSt6lcn1PtDYWL6XObtHCRCNQM= +golang.org/x/net v0.0.0-20210423184538-5f58ad60dda6 h1:0PC75Fz/kyMGhL0e1QnypqK2kQMqKt9csD1GnMJR+Zk= +golang.org/x/net v0.0.0-20210423184538-5f58ad60dda6/go.mod h1:OJAsFXCWl8Ukc7SiCT/9KSuxbyM7479/AVlXFRxuMCk= +golang.org/x/oauth2 v0.0.0-20180821212333-d2e6202438be/go.mod h1:N/0e6XlmueqKjAGxoOufVs8QHGRruUQn6yWY3a++T0U= +golang.org/x/oauth2 v0.0.0-20190226205417-e64efc72b421/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw= +golang.org/x/oauth2 v0.0.0-20190604053449-0f29369cfe45/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw= +golang.org/x/oauth2 v0.0.0-20191202225959-858c2ad4c8b6/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw= +golang.org/x/oauth2 v0.0.0-20200107190931-bf48bf16ab8d/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw= +golang.org/x/oauth2 v0.0.0-20200902213428-5d25da1a8d43/go.mod h1:KelEdhl1UZF7XfJ4dDtk6s++YSgaE7mD/BuKKDLBl4A= +golang.org/x/oauth2 v0.0.0-20201109201403-9fd604954f58/go.mod h1:KelEdhl1UZF7XfJ4dDtk6s++YSgaE7mD/BuKKDLBl4A= +golang.org/x/oauth2 v0.0.0-20201208152858-08078c50e5b5/go.mod h1:KelEdhl1UZF7XfJ4dDtk6s++YSgaE7mD/BuKKDLBl4A= +golang.org/x/oauth2 v0.0.0-20210218202405-ba52d332ba99/go.mod h1:KelEdhl1UZF7XfJ4dDtk6s++YSgaE7mD/BuKKDLBl4A= +golang.org/x/oauth2 v0.0.0-20210220000619-9bb904979d93/go.mod h1:KelEdhl1UZF7XfJ4dDtk6s++YSgaE7mD/BuKKDLBl4A= +golang.org/x/oauth2 v0.0.0-20210313182246-cd4f82c27b84/go.mod h1:KelEdhl1UZF7XfJ4dDtk6s++YSgaE7mD/BuKKDLBl4A= +golang.org/x/oauth2 v0.0.0-20210402161424-2e8d93401602/go.mod h1:KelEdhl1UZF7XfJ4dDtk6s++YSgaE7mD/BuKKDLBl4A= +golang.org/x/oauth2 v0.0.0-20210413134643-5e61552d6c78/go.mod h1:KelEdhl1UZF7XfJ4dDtk6s++YSgaE7mD/BuKKDLBl4A= +golang.org/x/oauth2 v0.0.0-20210628180205-a41e5a781914 h1:3B43BWw0xEBsLZ/NO1VALz6fppU3481pik+2Ksv45z8= +golang.org/x/oauth2 v0.0.0-20210628180205-a41e5a781914/go.mod h1:KelEdhl1UZF7XfJ4dDtk6s++YSgaE7mD/BuKKDLBl4A= +golang.org/x/sync v0.0.0-20180314180146-1d60e4601c6f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= +golang.org/x/sync v0.0.0-20181108010431-42b317875d0f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= +golang.org/x/sync v0.0.0-20181221193216-37e7f081c4d4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= +golang.org/x/sync v0.0.0-20190227155943-e225da77a7e6/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= +golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= 
+golang.org/x/sync v0.0.0-20190911185100-cd5d95a43a6e/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= +golang.org/x/sync v0.0.0-20200317015054-43a5402ce75a/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= +golang.org/x/sync v0.0.0-20200625203802-6e8e738ad208/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= +golang.org/x/sync v0.0.0-20201020160332-67f06af15bc9/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= +golang.org/x/sync v0.0.0-20201207232520-09787c993a3a/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= +golang.org/x/sync v0.0.0-20210220032951-036812b2e83c h1:5KslGYwFpkhGh+Q16bwMP3cOontH8FOep7tGV86Y7SQ= +golang.org/x/sync v0.0.0-20210220032951-036812b2e83c/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= +golang.org/x/sys v0.0.0-20180823144017-11551d06cbcc/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= +golang.org/x/sys v0.0.0-20180830151530-49385e6e1522/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= +golang.org/x/sys v0.0.0-20181026203630-95b1ffbd15a5/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= +golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= +golang.org/x/sys v0.0.0-20190312061237-fead79001313/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20190412213103-97732733099d/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20190502145724-3ef323f4f1fd/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20190507160741-ecd444e8653b/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20190606165138-5da285871e9c/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20190624142023-c5567b49c5d0/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20190726091711-fc99dfbffb4e/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20191001151750-bb3f8db39f24/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20191005200804-aed5e4c7ecf9/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20191204072324-ce4227a45e2e/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20191228213918-04cbcbbfeed8/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20200113162924-86b910548bc1/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20200122134326-e047566fdf82/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20200202164722-d101bd2416d5/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20200212091648-12a6c2dcc1e4/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20200223170610-d5e6a3e2c0ae/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20200302150141-5c8b2ff67527/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20200323222414-85ca7c5b95cd/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20200331124033-c3d80250170d/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20200501052902-10377860bb8e/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20200511232937-7e40ca221e25/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20200515095857-1151b9dac4a9/go.mod 
h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20200523222454-059865788121/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20200803210538-64077c9b5642/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20200905004654-be1d3432aa8f/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20200930185726-fdedc70b468f/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20201119102817-f84b799fce68/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20201201145000-ef89a241ccb3/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20210104204734-6f8348627aad/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20210119212857-b64e53b001e4/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20210220050731-9a76102bfb43/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20210225134936-a50acf3fe073/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20210305230114-8fe3ee5dd75b/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20210315160823-c6e025ad8005/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20210320140829-1e4c9ba3b0c4/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20210330210617-4fbd30eecc44/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20210403161142-5e06dd20ab57/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20210412220455-f1c623a9e750/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20210423082822-04245dca01da/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20210510120138-977fb7262007/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= +golang.org/x/sys v0.0.0-20210630005230-0f9fa26af87c h1:F1jZWGFhYfh0Ci55sIpILtKKK8p3i2/krTr0H1rg74I= +golang.org/x/sys v0.0.0-20210630005230-0f9fa26af87c/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= +golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo= +golang.org/x/text v0.0.0-20170915032832-14c0d48ead0c/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ= +golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ= +golang.org/x/text v0.3.1-0.20180807135948-17ff2d5776d2/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ= +golang.org/x/text v0.3.2/go.mod h1:bEr9sfX3Q8Zfm5fL9x+3itogRgK3+ptLWKqgva+5dAk= +golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ= +golang.org/x/text v0.3.4/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ= +golang.org/x/text v0.3.5/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ= +golang.org/x/text v0.3.6 h1:aRYxNxv6iGQlyVaZmk6ZgYEDa+Jg18DxebPSrd6bg1M= +golang.org/x/text v0.3.6/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ= +golang.org/x/time v0.0.0-20181108054448-85acf8d2951c/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ= +golang.org/x/time v0.0.0-20190308202827-9d24e82272b4/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ= +golang.org/x/time v0.0.0-20191024005414-555d28b269f0/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ= +golang.org/x/time v0.0.0-20210220033141-f8bda1e9f3ba/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ= +golang.org/x/tools 
v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ= +golang.org/x/tools v0.0.0-20190114222345-bf090417da8b/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ= +golang.org/x/tools v0.0.0-20190226205152-f727befe758c/go.mod h1:9Yl7xja0Znq3iFh3HoIrodX9oNMXvdceNzlUR8zjMvY= +golang.org/x/tools v0.0.0-20190311212946-11955173bddd/go.mod h1:LCzVGOaR6xXOjkQ3onu1FJEFr0SW1gC7cKk1uF8kGRs= +golang.org/x/tools v0.0.0-20190312151545-0bb0c0a6e846/go.mod h1:LCzVGOaR6xXOjkQ3onu1FJEFr0SW1gC7cKk1uF8kGRs= +golang.org/x/tools v0.0.0-20190312170243-e65039ee4138/go.mod h1:LCzVGOaR6xXOjkQ3onu1FJEFr0SW1gC7cKk1uF8kGRs= +golang.org/x/tools v0.0.0-20190328211700-ab21143f2384/go.mod h1:LCzVGOaR6xXOjkQ3onu1FJEFr0SW1gC7cKk1uF8kGRs= +golang.org/x/tools v0.0.0-20190425150028-36563e24a262/go.mod h1:RgjU9mgBXZiqYHBnxXauZ1Gv1EHHAz9KjViQ78xBX0Q= +golang.org/x/tools v0.0.0-20190506145303-2d16b83fe98c/go.mod h1:RgjU9mgBXZiqYHBnxXauZ1Gv1EHHAz9KjViQ78xBX0Q= +golang.org/x/tools v0.0.0-20190524140312-2c0ae7006135/go.mod h1:RgjU9mgBXZiqYHBnxXauZ1Gv1EHHAz9KjViQ78xBX0Q= +golang.org/x/tools v0.0.0-20190606124116-d0a3d012864b/go.mod h1:/rFqwRUd4F7ZHNgwSSTFct+R/Kf4OFW1sUzUTQQTgfc= +golang.org/x/tools v0.0.0-20190621195816-6e04913cbbac/go.mod h1:/rFqwRUd4F7ZHNgwSSTFct+R/Kf4OFW1sUzUTQQTgfc= +golang.org/x/tools v0.0.0-20190628153133-6cdbf07be9d0/go.mod h1:/rFqwRUd4F7ZHNgwSSTFct+R/Kf4OFW1sUzUTQQTgfc= +golang.org/x/tools v0.0.0-20190816200558-6889da9d5479/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo= +golang.org/x/tools v0.0.0-20190911174233-4f2ddba30aff/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo= +golang.org/x/tools v0.0.0-20191012152004-8de300cfc20a/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo= +golang.org/x/tools v0.0.0-20191112195655-aa38f8e97acc/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo= +golang.org/x/tools v0.0.0-20191113191852-77e3bb0ad9e7/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo= +golang.org/x/tools v0.0.0-20191115202509-3a792d9c32b2/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo= +golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo= +golang.org/x/tools v0.0.0-20191125144606-a911d9008d1f/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo= +golang.org/x/tools v0.0.0-20191130070609-6e064ea0cf2d/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo= +golang.org/x/tools v0.0.0-20191216173652-a0e659d51361/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28= +golang.org/x/tools v0.0.0-20191227053925-7b8e75db28f4/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28= +golang.org/x/tools v0.0.0-20200117161641-43d50277825c/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28= +golang.org/x/tools v0.0.0-20200122220014-bf1340f18c4a/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28= +golang.org/x/tools v0.0.0-20200130002326-2f3ba24bd6e7/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28= +golang.org/x/tools v0.0.0-20200204074204-1cc6d1ef6c74/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28= +golang.org/x/tools v0.0.0-20200207183749-b753a1ba74fa/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28= +golang.org/x/tools v0.0.0-20200212150539-ea181f53ac56/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28= +golang.org/x/tools v0.0.0-20200224181240-023911ca70b2/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28= +golang.org/x/tools v0.0.0-20200227222343-706bc42d1f0d/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28= +golang.org/x/tools 
v0.0.0-20200304193943-95d2e580d8eb/go.mod h1:o4KQGtdN14AW+yjsvvwRTJJuXz8XRtIHtEnmAXLyFUw= +golang.org/x/tools v0.0.0-20200312045724-11d5b4c81c7d/go.mod h1:o4KQGtdN14AW+yjsvvwRTJJuXz8XRtIHtEnmAXLyFUw= +golang.org/x/tools v0.0.0-20200331025713-a30bf2db82d4/go.mod h1:Sl4aGygMT6LrqrWclx+PTx3U+LnKx/seiNR+3G19Ar8= +golang.org/x/tools v0.0.0-20200501065659-ab2804fb9c9d/go.mod h1:EkVYQZoAsY45+roYkvgYkIh4xh/qjgUK9TdY2XT94GE= +golang.org/x/tools v0.0.0-20200512131952-2bc93b1c0c88/go.mod h1:EkVYQZoAsY45+roYkvgYkIh4xh/qjgUK9TdY2XT94GE= +golang.org/x/tools v0.0.0-20200515010526-7d3b6ebf133d/go.mod h1:EkVYQZoAsY45+roYkvgYkIh4xh/qjgUK9TdY2XT94GE= +golang.org/x/tools v0.0.0-20200618134242-20370b0cb4b2/go.mod h1:EkVYQZoAsY45+roYkvgYkIh4xh/qjgUK9TdY2XT94GE= +golang.org/x/tools v0.0.0-20200619180055-7c47624df98f/go.mod h1:EkVYQZoAsY45+roYkvgYkIh4xh/qjgUK9TdY2XT94GE= +golang.org/x/tools v0.0.0-20200729194436-6467de6f59a7/go.mod h1:njjCfa9FT2d7l9Bc6FUM5FLjQPp3cFF28FI3qnDFljA= +golang.org/x/tools v0.0.0-20200804011535-6c149bb5ef0d/go.mod h1:njjCfa9FT2d7l9Bc6FUM5FLjQPp3cFF28FI3qnDFljA= +golang.org/x/tools v0.0.0-20200825202427-b303f430e36d/go.mod h1:njjCfa9FT2d7l9Bc6FUM5FLjQPp3cFF28FI3qnDFljA= +golang.org/x/tools v0.0.0-20200904185747-39188db58858/go.mod h1:Cj7w3i3Rnn0Xh82ur9kSqwfTHTeVxaDqrfMjpcNT6bE= +golang.org/x/tools v0.0.0-20201110124207-079ba7bd75cd/go.mod h1:emZCQorbCU4vsT4fOWvOPXz4eW1wZW4PmDk9uLelYpA= +golang.org/x/tools v0.0.0-20201201161351-ac6f37ff4c2a/go.mod h1:emZCQorbCU4vsT4fOWvOPXz4eW1wZW4PmDk9uLelYpA= +golang.org/x/tools v0.0.0-20201208233053-a543418bbed2/go.mod h1:emZCQorbCU4vsT4fOWvOPXz4eW1wZW4PmDk9uLelYpA= +golang.org/x/tools v0.0.0-20210105154028-b0ab187a4818/go.mod h1:emZCQorbCU4vsT4fOWvOPXz4eW1wZW4PmDk9uLelYpA= +golang.org/x/tools v0.0.0-20210106214847-113979e3529a/go.mod h1:emZCQorbCU4vsT4fOWvOPXz4eW1wZW4PmDk9uLelYpA= +golang.org/x/tools v0.0.0-20210108195828-e2f9c7f1fc8e/go.mod h1:emZCQorbCU4vsT4fOWvOPXz4eW1wZW4PmDk9uLelYpA= +golang.org/x/tools v0.1.0/go.mod h1:xkSsbof2nBLbhDlRMhhhyNLN/zl3eTqcnHD5viDpcZ0= +golang.org/x/tools v0.1.2/go.mod h1:o0xws9oXOQQZyjljx8fwUC0k7L1pTE6eaCbjGeHmOkk= +golang.org/x/tools v0.1.5 h1:ouewzE6p+/VEB31YYnTbEJdi8pFqKp4P4n85vwo3DHA= +golang.org/x/tools v0.1.5/go.mod h1:o0xws9oXOQQZyjljx8fwUC0k7L1pTE6eaCbjGeHmOkk= +golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= +golang.org/x/xerrors v0.0.0-20191011141410-1b5146add898/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= +golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= +golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1 h1:go1bK/D/BFZV2I8cIQd1NKEZ+0owSTG1fDTci4IqFcE= +golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= +google.golang.org/api v0.4.0/go.mod h1:8k5glujaEP+g9n7WNsDg8QP6cUVNI86fCNMcbazEtwE= +google.golang.org/api v0.7.0/go.mod h1:WtwebWUNSVBH/HAw79HIFXZNqEvBhG+Ra+ax0hx3E3M= +google.golang.org/api v0.8.0/go.mod h1:o4eAsZoiT+ibD93RtjEohWalFOjRDx6CVaqeizhEnKg= +google.golang.org/api v0.9.0/go.mod h1:o4eAsZoiT+ibD93RtjEohWalFOjRDx6CVaqeizhEnKg= +google.golang.org/api v0.13.0/go.mod h1:iLdEw5Ide6rF15KTC1Kkl0iskquN2gFfn9o9XIsbkAI= +google.golang.org/api v0.14.0/go.mod h1:iLdEw5Ide6rF15KTC1Kkl0iskquN2gFfn9o9XIsbkAI= +google.golang.org/api v0.15.0/go.mod h1:iLdEw5Ide6rF15KTC1Kkl0iskquN2gFfn9o9XIsbkAI= +google.golang.org/api v0.17.0/go.mod h1:BwFmGc8tA3vsd7r/7kR8DY7iEEGSU04BFxCo5jP/sfE= +google.golang.org/api 
v0.18.0/go.mod h1:BwFmGc8tA3vsd7r/7kR8DY7iEEGSU04BFxCo5jP/sfE= +google.golang.org/api v0.19.0/go.mod h1:BwFmGc8tA3vsd7r/7kR8DY7iEEGSU04BFxCo5jP/sfE= +google.golang.org/api v0.20.0/go.mod h1:BwFmGc8tA3vsd7r/7kR8DY7iEEGSU04BFxCo5jP/sfE= +google.golang.org/api v0.22.0/go.mod h1:BwFmGc8tA3vsd7r/7kR8DY7iEEGSU04BFxCo5jP/sfE= +google.golang.org/api v0.24.0/go.mod h1:lIXQywCXRcnZPGlsd8NbLnOjtAoL6em04bJ9+z0MncE= +google.golang.org/api v0.28.0/go.mod h1:lIXQywCXRcnZPGlsd8NbLnOjtAoL6em04bJ9+z0MncE= +google.golang.org/api v0.29.0/go.mod h1:Lcubydp8VUV7KeIHD9z2Bys/sm/vGKnG1UHuDBSrHWM= +google.golang.org/api v0.30.0/go.mod h1:QGmEvQ87FHZNiUVJkT14jQNYJ4ZJjdRF23ZXz5138Fc= +google.golang.org/api v0.35.0/go.mod h1:/XrVsuzM0rZmrsbjJutiuftIzeuTQcEeaYcSk/mQ1dg= +google.golang.org/api v0.36.0/go.mod h1:+z5ficQTmoYpPn8LCUNVpK5I7hwkpjbcgqA7I34qYtE= +google.golang.org/api v0.40.0/go.mod h1:fYKFpnQN0DsDSKRVRcQSDQNtqWPfM9i+zNPxepjRCQ8= +google.golang.org/api v0.41.0/go.mod h1:RkxM5lITDfTzmyKFPt+wGrCJbVfniCr2ool8kTBzRTU= +google.golang.org/api v0.43.0/go.mod h1:nQsDGjRXMo4lvh5hP0TKqF244gqhGcr/YSIykhUk/94= +google.golang.org/api v0.44.0/go.mod h1:EBOGZqzyhtvMDoxwS97ctnh0zUmYY6CxqXsc1AvkYD8= +google.golang.org/api v0.45.0 h1:pqMffJFLBVUDIoYsHcqtxgQVTsmxMDpYLOc5MT4Jrww= +google.golang.org/api v0.45.0/go.mod h1:ISLIJCedJolbZvDfAk+Ctuq5hf+aJ33WgtUsfyFoLXA= +google.golang.org/appengine v1.1.0/go.mod h1:EbEs0AVv82hx2wNQdGPgUI5lhzA/G0D9YwlJXL52JkM= +google.golang.org/appengine v1.4.0/go.mod h1:xpcJRLb0r/rnEns0DIKYYv+WjYCduHsrkT7/EB5XEv4= +google.golang.org/appengine v1.5.0/go.mod h1:xpcJRLb0r/rnEns0DIKYYv+WjYCduHsrkT7/EB5XEv4= +google.golang.org/appengine v1.6.1/go.mod h1:i06prIuMbXzDqacNJfV5OdTW448YApPu5ww/cMBSeb0= +google.golang.org/appengine v1.6.5/go.mod h1:8WjMMxjGQR8xUklV/ARdw2HLXBOI7O7uCIDZVag1xfc= +google.golang.org/appengine v1.6.6/go.mod h1:8WjMMxjGQR8xUklV/ARdw2HLXBOI7O7uCIDZVag1xfc= +google.golang.org/appengine v1.6.7 h1:FZR1q0exgwxzPzp/aF+VccGrSfxfPpkBqjIIEq3ru6c= +google.golang.org/appengine v1.6.7/go.mod h1:8WjMMxjGQR8xUklV/ARdw2HLXBOI7O7uCIDZVag1xfc= +google.golang.org/genproto v0.0.0-20180817151627-c66870c02cf8/go.mod h1:JiN7NxoALGmiZfu7CAH4rXhgtRTLTxftemlI0sWmxmc= +google.golang.org/genproto v0.0.0-20190307195333-5fe7a883aa19/go.mod h1:VzzqZJRnGkLBvHegQrXjBqPurQTc5/KpmUdxsrq26oE= +google.golang.org/genproto v0.0.0-20190418145605-e7d98fc518a7/go.mod h1:VzzqZJRnGkLBvHegQrXjBqPurQTc5/KpmUdxsrq26oE= +google.golang.org/genproto v0.0.0-20190425155659-357c62f0e4bb/go.mod h1:VzzqZJRnGkLBvHegQrXjBqPurQTc5/KpmUdxsrq26oE= +google.golang.org/genproto v0.0.0-20190502173448-54afdca5d873/go.mod h1:VzzqZJRnGkLBvHegQrXjBqPurQTc5/KpmUdxsrq26oE= +google.golang.org/genproto v0.0.0-20190801165951-fa694d86fc64/go.mod h1:DMBHOl98Agz4BDEuKkezgsaosCRResVns1a3J2ZsMNc= +google.golang.org/genproto v0.0.0-20190819201941-24fa4b261c55/go.mod h1:DMBHOl98Agz4BDEuKkezgsaosCRResVns1a3J2ZsMNc= +google.golang.org/genproto v0.0.0-20190911173649-1774047e7e51/go.mod h1:IbNlFCBrqXvoKpeg0TB2l7cyZUmoaFKYIwrEpbDKLA8= +google.golang.org/genproto v0.0.0-20191108220845-16a3f7862a1a/go.mod h1:n3cpQtvxv34hfy77yVDNjmbRyujviMdxYliBSkLhpCc= +google.golang.org/genproto v0.0.0-20191115194625-c23dd37a84c9/go.mod h1:n3cpQtvxv34hfy77yVDNjmbRyujviMdxYliBSkLhpCc= +google.golang.org/genproto v0.0.0-20191216164720-4f79533eabd1/go.mod h1:n3cpQtvxv34hfy77yVDNjmbRyujviMdxYliBSkLhpCc= +google.golang.org/genproto v0.0.0-20191230161307-f3c370f40bfb/go.mod h1:n3cpQtvxv34hfy77yVDNjmbRyujviMdxYliBSkLhpCc= +google.golang.org/genproto v0.0.0-20200115191322-ca5a22157cba/go.mod 
h1:n3cpQtvxv34hfy77yVDNjmbRyujviMdxYliBSkLhpCc= +google.golang.org/genproto v0.0.0-20200122232147-0452cf42e150/go.mod h1:n3cpQtvxv34hfy77yVDNjmbRyujviMdxYliBSkLhpCc= +google.golang.org/genproto v0.0.0-20200204135345-fa8e72b47b90/go.mod h1:GmwEX6Z4W5gMy59cAlVYjN9JhxgbQH6Gn+gFDQe2lzA= +google.golang.org/genproto v0.0.0-20200212174721-66ed5ce911ce/go.mod h1:55QSHmfGQM9UVYDPBsyGGes0y52j32PQ3BqQfXhyH3c= +google.golang.org/genproto v0.0.0-20200224152610-e50cd9704f63/go.mod h1:55QSHmfGQM9UVYDPBsyGGes0y52j32PQ3BqQfXhyH3c= +google.golang.org/genproto v0.0.0-20200228133532-8c2c7df3a383/go.mod h1:55QSHmfGQM9UVYDPBsyGGes0y52j32PQ3BqQfXhyH3c= +google.golang.org/genproto v0.0.0-20200305110556-506484158171/go.mod h1:55QSHmfGQM9UVYDPBsyGGes0y52j32PQ3BqQfXhyH3c= +google.golang.org/genproto v0.0.0-20200312145019-da6875a35672/go.mod h1:55QSHmfGQM9UVYDPBsyGGes0y52j32PQ3BqQfXhyH3c= +google.golang.org/genproto v0.0.0-20200331122359-1ee6d9798940/go.mod h1:55QSHmfGQM9UVYDPBsyGGes0y52j32PQ3BqQfXhyH3c= +google.golang.org/genproto v0.0.0-20200430143042-b979b6f78d84/go.mod h1:55QSHmfGQM9UVYDPBsyGGes0y52j32PQ3BqQfXhyH3c= +google.golang.org/genproto v0.0.0-20200511104702-f5ebc3bea380/go.mod h1:55QSHmfGQM9UVYDPBsyGGes0y52j32PQ3BqQfXhyH3c= +google.golang.org/genproto v0.0.0-20200513103714-09dca8ec2884/go.mod h1:55QSHmfGQM9UVYDPBsyGGes0y52j32PQ3BqQfXhyH3c= +google.golang.org/genproto v0.0.0-20200515170657-fc4c6c6a6587/go.mod h1:YsZOwe1myG/8QRHRsmBRE1LrgQY60beZKjly0O1fX9U= +google.golang.org/genproto v0.0.0-20200526211855-cb27e3aa2013/go.mod h1:NbSheEEYHJ7i3ixzK3sjbqSGDJWnxyFXZblF3eUsNvo= +google.golang.org/genproto v0.0.0-20200618031413-b414f8b61790/go.mod h1:jDfRM7FcilCzHH/e9qn6dsT145K34l5v+OpcnNgKAAA= +google.golang.org/genproto v0.0.0-20200729003335-053ba62fc06f/go.mod h1:FWY/as6DDZQgahTzZj3fqbO1CbirC29ZNUFHwi0/+no= +google.golang.org/genproto v0.0.0-20200804131852-c06518451d9c/go.mod h1:FWY/as6DDZQgahTzZj3fqbO1CbirC29ZNUFHwi0/+no= +google.golang.org/genproto v0.0.0-20200825200019-8632dd797987/go.mod h1:FWY/as6DDZQgahTzZj3fqbO1CbirC29ZNUFHwi0/+no= +google.golang.org/genproto v0.0.0-20200904004341-0bd0a958aa1d/go.mod h1:FWY/as6DDZQgahTzZj3fqbO1CbirC29ZNUFHwi0/+no= +google.golang.org/genproto v0.0.0-20201109203340-2640f1f9cdfb/go.mod h1:FWY/as6DDZQgahTzZj3fqbO1CbirC29ZNUFHwi0/+no= +google.golang.org/genproto v0.0.0-20201201144952-b05cb90ed32e/go.mod h1:FWY/as6DDZQgahTzZj3fqbO1CbirC29ZNUFHwi0/+no= +google.golang.org/genproto v0.0.0-20201210142538-e3217bee35cc/go.mod h1:FWY/as6DDZQgahTzZj3fqbO1CbirC29ZNUFHwi0/+no= +google.golang.org/genproto v0.0.0-20201214200347-8c77b98c765d/go.mod h1:FWY/as6DDZQgahTzZj3fqbO1CbirC29ZNUFHwi0/+no= +google.golang.org/genproto v0.0.0-20210108203827-ffc7fda8c3d7/go.mod h1:FWY/as6DDZQgahTzZj3fqbO1CbirC29ZNUFHwi0/+no= +google.golang.org/genproto v0.0.0-20210222152913-aa3ee6e6a81c/go.mod h1:FWY/as6DDZQgahTzZj3fqbO1CbirC29ZNUFHwi0/+no= +google.golang.org/genproto v0.0.0-20210226172003-ab064af71705/go.mod h1:FWY/as6DDZQgahTzZj3fqbO1CbirC29ZNUFHwi0/+no= +google.golang.org/genproto v0.0.0-20210303154014-9728d6b83eeb/go.mod h1:FWY/as6DDZQgahTzZj3fqbO1CbirC29ZNUFHwi0/+no= +google.golang.org/genproto v0.0.0-20210310155132-4ce2db91004e/go.mod h1:FWY/as6DDZQgahTzZj3fqbO1CbirC29ZNUFHwi0/+no= +google.golang.org/genproto v0.0.0-20210319143718-93e7006c17a6/go.mod h1:FWY/as6DDZQgahTzZj3fqbO1CbirC29ZNUFHwi0/+no= +google.golang.org/genproto v0.0.0-20210325141258-5636347f2b14/go.mod h1:f2Bd7+2PlaVKmvKQ52aspJZXIDaRQBVdOOBfJ5i8OEs= +google.golang.org/genproto v0.0.0-20210402141018-6c239bbf2bb1/go.mod 
h1:9lPAdzaEmUacj36I+k7YKbEc5CXzPIeORRgDAUOu28A= +google.golang.org/genproto v0.0.0-20210406143921-e86de6bf7a46/go.mod h1:P3QM42oQyzQSnHPnZ/vqoCdDmzH28fzWByN9asMeM8A= +google.golang.org/genproto v0.0.0-20210413151531-c14fb6ef47c3/go.mod h1:P3QM42oQyzQSnHPnZ/vqoCdDmzH28fzWByN9asMeM8A= +google.golang.org/genproto v0.0.0-20210420162539-3c870d7478d2/go.mod h1:P3QM42oQyzQSnHPnZ/vqoCdDmzH28fzWByN9asMeM8A= +google.golang.org/genproto v0.0.0-20210602131652-f16073e35f0c/go.mod h1:UODoCrxHCcBojKKwX1terBiRUaqAsFqJiF615XL43r0= +google.golang.org/genproto v0.0.0-20210728212813-7823e685a01f h1:4m1jFN3fHeKo0UvpraW2ipO2O0rgp5w2ugXeggtecAk= +google.golang.org/genproto v0.0.0-20210728212813-7823e685a01f/go.mod h1:ob2IJxKrgPT52GcgX759i1sleT07tiKowYBGbczaW48= +google.golang.org/grpc v1.19.0/go.mod h1:mqu4LbDTu4XGKhr4mRzUsmM4RtVoemTSY81AxZiDr8c= +google.golang.org/grpc v1.20.1/go.mod h1:10oTOabMzJvdu6/UiuZezV6QK5dSlG84ov/aaiqXj38= +google.golang.org/grpc v1.21.1/go.mod h1:oYelfM1adQP15Ek0mdvEgi9Df8B9CZIaU1084ijfRaM= +google.golang.org/grpc v1.23.0/go.mod h1:Y5yQAOtifL1yxbo5wqy6BxZv8vAUGQwXBOALyacEbxg= +google.golang.org/grpc v1.25.1/go.mod h1:c3i+UQWmh7LiEpx4sFZnkU36qjEYZ0imhYfXVyQciAY= +google.golang.org/grpc v1.26.0/go.mod h1:qbnxyOmOxrQa7FizSgH+ReBfzJrCY1pSN7KXBS8abTk= +google.golang.org/grpc v1.27.0/go.mod h1:qbnxyOmOxrQa7FizSgH+ReBfzJrCY1pSN7KXBS8abTk= +google.golang.org/grpc v1.27.1/go.mod h1:qbnxyOmOxrQa7FizSgH+ReBfzJrCY1pSN7KXBS8abTk= +google.golang.org/grpc v1.28.0/go.mod h1:rpkK4SK4GF4Ach/+MFLZUBavHOvF2JJB5uozKKal+60= +google.golang.org/grpc v1.29.1/go.mod h1:itym6AZVZYACWQqET3MqgPpjcuV5QH3BxFS3IjizoKk= +google.golang.org/grpc v1.30.0/go.mod h1:N36X2cJ7JwdamYAgDz+s+rVMFjt3numwzf/HckM8pak= +google.golang.org/grpc v1.31.0/go.mod h1:N36X2cJ7JwdamYAgDz+s+rVMFjt3numwzf/HckM8pak= +google.golang.org/grpc v1.31.1/go.mod h1:N36X2cJ7JwdamYAgDz+s+rVMFjt3numwzf/HckM8pak= +google.golang.org/grpc v1.33.1/go.mod h1:fr5YgcSWrqhRRxogOsw7RzIpsmvOZ6IcH4kBYTpR3n0= +google.golang.org/grpc v1.33.2/go.mod h1:JMHMWHQWaTccqQQlmk3MJZS+GWXOdAesneDmEnv2fbc= +google.golang.org/grpc v1.34.0/go.mod h1:WotjhfgOW/POjDeRt8vscBtXq+2VjORFy659qA51WJ8= +google.golang.org/grpc v1.35.0/go.mod h1:qjiiYl8FncCW8feJPdyg3v6XW24KsRHe+dy9BAGRRjU= +google.golang.org/grpc v1.36.0/go.mod h1:qjiiYl8FncCW8feJPdyg3v6XW24KsRHe+dy9BAGRRjU= +google.golang.org/grpc v1.36.1/go.mod h1:qjiiYl8FncCW8feJPdyg3v6XW24KsRHe+dy9BAGRRjU= +google.golang.org/grpc v1.37.0/go.mod h1:NREThFqKR1f3iQ6oBuvc5LadQuXVGo9rkm5ZGrQdJfM= +google.golang.org/grpc v1.38.0/go.mod h1:NREThFqKR1f3iQ6oBuvc5LadQuXVGo9rkm5ZGrQdJfM= +google.golang.org/grpc v1.39.0 h1:Klz8I9kdtkIN6EpHHUOMLCYhTn/2WAe5a0s1hcBkdTI= +google.golang.org/grpc v1.39.0/go.mod h1:PImNr+rS9TWYb2O4/emRugxiyHZ5JyHW5F+RPnDzfrE= +google.golang.org/grpc/cmd/protoc-gen-go-grpc v1.1.0/go.mod h1:6Kw0yEErY5E/yWrBtf03jp27GLLJujG4z/JK95pnjjw= +google.golang.org/protobuf v0.0.0-20200109180630-ec00e32a8dfd/go.mod h1:DFci5gLYBciE7Vtevhsrf46CRTquxDuWsQurQQe4oz8= +google.golang.org/protobuf v0.0.0-20200221191635-4d8936d0db64/go.mod h1:kwYJMbMJ01Woi6D6+Kah6886xMZcty6N08ah7+eCXa0= +google.golang.org/protobuf v0.0.0-20200228230310-ab0ca4ff8a60/go.mod h1:cfTl7dwQJ+fmap5saPgwCLgHXTUD7jkjRqWcaiX5VyM= +google.golang.org/protobuf v1.20.1-0.20200309200217-e05f789c0967/go.mod h1:A+miEFZTKqfCUM6K7xSMQL9OKL/b6hQv+e19PK+JZNE= +google.golang.org/protobuf v1.21.0/go.mod h1:47Nbq4nVaFHyn7ilMalzfO3qCViNmqZ2kzikPIcrTAo= +google.golang.org/protobuf v1.22.0/go.mod h1:EGpADcykh3NcUnDUJcl1+ZksZNG86OlYog2l/sGQquU= +google.golang.org/protobuf v1.23.0/go.mod 
h1:EGpADcykh3NcUnDUJcl1+ZksZNG86OlYog2l/sGQquU= +google.golang.org/protobuf v1.23.1-0.20200526195155-81db48ad09cc/go.mod h1:EGpADcykh3NcUnDUJcl1+ZksZNG86OlYog2l/sGQquU= +google.golang.org/protobuf v1.24.0/go.mod h1:r/3tXBNzIEhYS9I1OUVjXDlt8tc493IdKGjtUeSXeh4= +google.golang.org/protobuf v1.25.0/go.mod h1:9JNX74DMeImyA3h4bdi1ymwjUzf21/xIlbajtzgsN7c= +google.golang.org/protobuf v1.26.0-rc.1/go.mod h1:jlhhOSvTdKEhbULTjvd4ARK9grFBp09yW+WbY/TyQbw= +google.golang.org/protobuf v1.26.0/go.mod h1:9q0QmTI4eRPtz6boOQmLYwt+qCgq0jsYwAQnmE0givc= +google.golang.org/protobuf v1.27.1 h1:SnqbnDw1V7RiZcXPx5MEeqPv2s79L9i7BJUlG/+RurQ= +google.golang.org/protobuf v1.27.1/go.mod h1:9q0QmTI4eRPtz6boOQmLYwt+qCgq0jsYwAQnmE0givc= +gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= +gopkg.in/check.v1 v1.0.0-20180628173108-788fd7840127/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= +gopkg.in/check.v1 v1.0.0-20200227125254-8fa46927fb4f h1:BLraFXnmrev5lT+xlilqcH8XK9/i0At2xKjWk4p6zsU= +gopkg.in/check.v1 v1.0.0-20200227125254-8fa46927fb4f/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= +gopkg.in/errgo.v2 v2.1.0/go.mod h1:hNsd1EY+bozCKY1Ytp96fpM3vjJbqLJn88ws8XvfDNI= +gopkg.in/ini.v1 v1.62.0/go.mod h1:pNLf8WUiyNEtQjuu5G5vTm06TEv9tsIgeAvK8hOrP4k= +gopkg.in/linkedin/goavro.v1 v1.0.5 h1:BJa69CDh0awSsLUmZ9+BowBdokpduDZSM9Zk8oKHfN4= +gopkg.in/linkedin/goavro.v1 v1.0.5/go.mod h1:Aw5GdAbizjOEl0kAMHV9iHmA8reZzW/OKuJAl4Hb9F0= +gopkg.in/yaml.v2 v2.2.2/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI= +gopkg.in/yaml.v2 v2.2.3/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI= +gopkg.in/yaml.v2 v2.2.8/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI= +gopkg.in/yaml.v2 v2.4.0 h1:D8xgwECY7CYvx+Y2n4sBz93Jn9JRvxdiyyo8CTfuKaY= +gopkg.in/yaml.v2 v2.4.0/go.mod h1:RDklbk79AGWmwhnvt/jBztapEOGDOx6ZbXqjP6csGnQ= +gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= +gopkg.in/yaml.v3 v3.0.0-20210107192922-496545a6307b/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= +honnef.co/go/tools v0.0.0-20190102054323-c2f93a96b099/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4= +honnef.co/go/tools v0.0.0-20190106161140-3f1c8253044a/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4= +honnef.co/go/tools v0.0.0-20190418001031-e561f6794a2a/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4= +honnef.co/go/tools v0.0.0-20190523083050-ea95bdfd59fc/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4= +honnef.co/go/tools v0.0.1-2019.2.3/go.mod h1:a3bituU0lyd329TUQxRnasdCoJDkEUEAqEt0JzvZhAg= +honnef.co/go/tools v0.0.1-2020.1.3/go.mod h1:X/FiERA/W4tHapMX5mGpAtMSVEeEUOyHaw9vFzvIQ3k= +honnef.co/go/tools v0.0.1-2020.1.4/go.mod h1:X/FiERA/W4tHapMX5mGpAtMSVEeEUOyHaw9vFzvIQ3k= +rsc.io/binaryregexp v0.2.0/go.mod h1:qTv7/COck+e2FymRvadv62gMdZztPaShugOCi3I+8D8= +rsc.io/quote/v3 v3.1.0/go.mod h1:yEA65RcK8LyAZtP9Kv3t0HmxON59tX3rD+tICJqUlj0= +rsc.io/sampler v1.3.0/go.mod h1:T1hPZKmBbMNahiBKFy5HrXp6adAjACjK9JXDnKaTXpA= diff --git a/sdks/go/BUILD.md b/sdks/go/BUILD.md index 8d99b936451d..9834c8ddee89 100644 --- a/sdks/go/BUILD.md +++ b/sdks/go/BUILD.md @@ -20,8 +20,7 @@ # Go build This document describes the [Go](https://golang.org) code layout and build integration -with Gradle. The setup is non-trivial, because the Go toolchain expects a -certain layout and Gradle support is limited. +with Gradle. Goals: @@ -32,16 +31,25 @@ Goals: In short, the goals are to make both worlds work well. 
+## Go Modules + +Beam publishes a single Go Module for SDK development and usage, in the `sdks` directory. +This puts all Go code necessary for user pipeline development and for execution +under the same module. +This includes container bootloader code in the Java and Python SDK directories. + +Pipeline authors will require a dependency on `github.com/apache/beam/sdks/v2` in their +`go.mod` files to use Beam. + ### Gradle integration -The Go toolchain expects the package name to match the directory structure, -which in turn must be rooted in `github.com/apache/beam` for `go get` to work. -This directory prefix is beyond the repo itself and we must copy the Go source -code into such a layout to invoke the tool chain. We use a single directory -`sdks/go` for all shared library code and export it as a zip file during the -build process to be used by various tools, such as `sdks/java/container`. -This scheme balances the convenience of combined Go setup with the desire -for a unified layout across languages. Python seems to do the same. +To integrate with Gradle, we use a Gradle plugin called GoGradle. +However, we disable GoGradle vendoring in favour of using Go Modules +for dependency management. +GoGradle handles invoking the Go toolchain for Gradle and Jenkins, +using the same dependencies as SDK contributors and users. +For the rare Go binary, such as the container boot loaders, it should be +possible to build the same binary with both Gradle and the usual Go tool. The container build adds a small twist to the build integration, because container images use linux/amd64 but the development setup might not. We @@ -50,17 +58,24 @@ images where needed, generally placed in `target/linux_amd64`. ### Go development setup -Developers must clone their git repository into: -``` -$GOPATH/src/github.com/apache +To develop the SDK, it should be sufficient to clone the repository, make +changes and execute tests from within the module directory (`/sdks/...`). +Go users can just `go get` the code directly. For example: ``` -to match the package structure expected by the code imports. Go users can just -`go get` the code directly. For example: -``` -go get github.com/apache/beam/sdks/go/... +go get github.com/apache/beam/sdks/v2/go/pkg/beam ``` Developers must invoke Go for cross-compilation manually, if desired. If you make changes to .proto files, you will need to rebuild the generated code. Consult `pkg/beam/model/PROTOBUF.md`. + +If you make changes to .tmpl files, then add the specialize tool to your path. +You can install specialize using: +``` +go get github.com/apache/beam/sdks/v2/go/cmd/specialize +``` +Add it to your path: +``` +export PATH=$PATH:$GOROOT/bin:$GOPATH/bin +``` diff --git a/sdks/go/README.md b/sdks/go/README.md index b1ebef07fdce..153ed47191da 100644 --- a/sdks/go/README.md +++ b/sdks/go/README.md @@ -17,11 +17,10 @@ under the License. --> -# Go SDK (experimental) +# Go SDK -The Go SDK is currently an experimental feature of Apache Beam and -not suitable for production use. It is based on the following initial -[design](https://s.apache.org/beam-go-sdk-design-rfc). +The Apache Beam Go SDK is the Beam Model implemented in the [Go Programming Language](https://go.dev/). +It is based on the following initial [design](https://s.apache.org/beam-go-sdk-design-rfc). ## How to run the examples @@ -30,8 +29,9 @@ most examples), follow the setup [here](https://beam.apache.org/documentation/runners/dataflow/). You can verify that it works by running the corresponding Java example.
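An aside on the `go.mod` dependency mentioned in the BUILD.md hunk above: a pipeline author's module might declare it roughly as follows. The module name, Go version, and Beam release shown here are illustrative placeholders, not values taken from this change.

```
module example.com/my-pipeline

go 1.16

// Use any released Beam version that ships the v2 module layout.
require github.com/apache/beam/sdks/v2 v2.33.0
```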
-The examples are normal Go programs and are most easily run directly. They -are parameterized by Go flags. For example, to run wordcount on direct runner do: +The examples are normal Go programs and are most easily run directly. +They are parameterized by Go flags. +For example, to run wordcount on the Go direct runner, do: ``` $ pwd @@ -123,48 +123,17 @@ https://github.com/campoy/go-tooling-workshop is a great start on learning good ### Developing Go Beam SDK on Github -To make and test changes when working with Go, it's neecessary to clone your repository -in a subdirectory of your GOPATH. This permits existing gradle tools to use your in progress changes. +The Go SDK uses Go Modules for dependency management, so it's as simple as cloning +the repo, making necessary changes, and running tests. -``` -# Create a Go compatible place for the repo, using src/github.com/apache/ -# matches where Go will look for the files, or go get would put them. -$ mkdir -p $GOPATH/src/github.com/apache/ -$ cd $GOPATH/src/github.com/apache/ - - -# Clone the repo, and update your branch as normal -$ git clone https://github.com/apache/beam.git -$ cd beam -$ git remote add git@github.com:/beam.git -$ git fetch --all -# Get or Update all the Go SDK dependencies -$ go get -u ./... -# Test that the system compiles and runs. -$ go test ./... -``` +Executing all unit tests for the SDK is possible from the `sdks/go` directory by running `go test ./...`. -If you don’t have a GOPATH set, follow [these instructions](https://github.com/golang/go/wiki/SettingGOPATH) to create a new directory in your home directory, and use that. +To test your change as Jenkins would execute it from a PR, from the +beam root directory, run: + * `./gradlew :sdks:go:goTest` executes the unit tests. + * `./gradlew :sdks:go:test:ulrValidatesRunner` validates the SDK against the Portable Python runner. + * `./gradlew :sdks:go:test:flinkValidatesRunner` validates the SDK against the Flink runner. Follow the [contribution guide](https://beam.apache.org/contribute/contribution-guide/#code) to create branches, and submit pull requests as normal. -### Dependency management -Until [BEAM-5379](https://issues.apache.org/jira/browse/BEAM-5379) is resolved, -Beam locks versions of packages with the gogradle plugin. If new dependencies -are added in a PR then the lock file needs to be updated. -From the `$GOPATH/src/github.com/apache/beam` directory run - -``` -$ ./gradlew :sdks:go:goLock -`./gradlew :goPostcommit` -``` - - to update the lock file, and test your code under the locked versions. gogradle -will add vendor directories with the locked versions of the code. - -You can sanity check a PR on Jenkins by commenting `Run Go PostCommit` to trigger -the integration tests. This is important so that the Beam testing done on the -jenkins cluster can produce consistent results, and have the packages available. - diff --git a/sdks/go/build.gradle b/sdks/go/build.gradle index fa3f9ad39803..0752a2f2b676 100644 --- a/sdks/go/build.gradle +++ b/sdks/go/build.gradle @@ -21,19 +21,25 @@ applyGoNature() description = "Apache Beam :: SDKs :: Go" +// Disable gogradle's dependency resolution so it uses Go modules instead. +installDependencies.enabled = false +resolveBuildDependencies.enabled = false +resolveTestDependencies.enabled = false + golang { - packagePath = 'github.com/apache/beam/sdks/go' + packagePath = 'github.com/apache/beam/sdks/v2/go' goBuild { // The symlinks makes it hard (impossible?) to do a wildcard build // of pkg.
Go build refuses to follow symlinks. Drop for now. The files // are built when tested anyway. + targetPlatform = ['linux-amd64'] // Build debugging utilities - go 'build -o ./build/bin/beamctl github.com/apache/beam/sdks/go/cmd/beamctl' + go 'build -o ./build/bin/linux-amd64/beamctl github.com/apache/beam/sdks/v2/go/cmd/beamctl' // Build integration test driver - go 'build -o ./build/bin/integration github.com/apache/beam/sdks/go/test/integration' + go 'build -o ./build/bin/linux-amd64/integration/driver github.com/apache/beam/sdks/v2/go/test/integration/driver' } // Ignore spurious vet errors during check for [BEAM-4831]. diff --git a/sdks/go/cmd/beamctl/cmd/artifact.go b/sdks/go/cmd/beamctl/cmd/artifact.go index 8ce7e9611809..1b5b7c9df8f9 100644 --- a/sdks/go/cmd/beamctl/cmd/artifact.go +++ b/sdks/go/cmd/beamctl/cmd/artifact.go @@ -18,8 +18,8 @@ package cmd import ( "path/filepath" - "github.com/apache/beam/sdks/go/pkg/beam/artifact" - jobpb "github.com/apache/beam/sdks/go/pkg/beam/model/jobmanagement_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/artifact" + jobpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/jobmanagement_v1" "github.com/spf13/cobra" ) diff --git a/sdks/go/cmd/beamctl/cmd/provision.go b/sdks/go/cmd/beamctl/cmd/provision.go index 3f53a8406f8f..cab82f7bf9db 100644 --- a/sdks/go/cmd/beamctl/cmd/provision.go +++ b/sdks/go/cmd/beamctl/cmd/provision.go @@ -16,7 +16,7 @@ package cmd import ( - fnpb "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1" + fnpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/fnexecution_v1" "github.com/golang/protobuf/proto" "github.com/spf13/cobra" ) diff --git a/sdks/go/cmd/beamctl/cmd/root.go b/sdks/go/cmd/beamctl/cmd/root.go index 7aedb57c77ea..5515882510cc 100644 --- a/sdks/go/cmd/beamctl/cmd/root.go +++ b/sdks/go/cmd/beamctl/cmd/root.go @@ -21,7 +21,7 @@ import ( "errors" "time" - "github.com/apache/beam/sdks/go/pkg/beam/util/grpcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/grpcx" "github.com/spf13/cobra" "google.golang.org/grpc" ) diff --git a/sdks/go/cmd/beamctl/main.go b/sdks/go/cmd/beamctl/main.go index 7d6ae8a6c0d0..168a0950f83a 100644 --- a/sdks/go/cmd/beamctl/main.go +++ b/sdks/go/cmd/beamctl/main.go @@ -20,7 +20,7 @@ import ( "fmt" "os" - "github.com/apache/beam/sdks/go/cmd/beamctl/cmd" + "github.com/apache/beam/sdks/v2/go/cmd/beamctl/cmd" ) func main() { diff --git a/sdks/go/cmd/specialize/main.go b/sdks/go/cmd/specialize/main.go index 6ee82b589a38..71171189393f 100644 --- a/sdks/go/cmd/specialize/main.go +++ b/sdks/go/cmd/specialize/main.go @@ -100,7 +100,7 @@ var ( } packageMacros = map[string][]string{ - "typex": {"github.com/apache/beam/sdks/go/pkg/beam/core/typex"}, + "typex": {"github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex"}, } ) diff --git a/sdks/go/cmd/starcgen/starcgen.go b/sdks/go/cmd/starcgen/starcgen.go index 601ec1397c05..d6e6f1f10c55 100644 --- a/sdks/go/cmd/starcgen/starcgen.go +++ b/sdks/go/cmd/starcgen/starcgen.go @@ -27,7 +27,7 @@ // putting the types and functions used in a separate package from pipeline construction. // Then, the tool can be used as follows: // -// //go:generate go install github.com/apache/beam/sdks/go/cmd/starcgen +// //go:generate go install github.com/apache/beam/sdks/v2/go/cmd/starcgen // //go:generate starcgen --package= // //go:generate go fmt // @@ -37,7 +37,7 @@ // Alternatively, it's possible to specify the specific input files and identifiers within // the package for generation. 
// -// //go:generate go install github.com/apache/beam/sdks/go/cmd/starcgen +// //go:generate go install github.com/apache/beam/sdks/v2/go/cmd/starcgen // //go:generate starcgen --package= --inputs=foo.go --identifiers=myFn,myStructFn --output=custom.shims.go // //go:generate go fmt // @@ -56,7 +56,7 @@ import ( "path/filepath" "strings" - "github.com/apache/beam/sdks/go/pkg/beam/util/starcgenx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/starcgenx" ) var ( diff --git a/sdks/go/cmd/starcgen/starcgen_test.go b/sdks/go/cmd/starcgen/starcgen_test.go index 7282ada8a27b..ab8c564ee404 100644 --- a/sdks/go/cmd/starcgen/starcgen_test.go +++ b/sdks/go/cmd/starcgen/starcgen_test.go @@ -110,7 +110,7 @@ import ( "context" "strings" - "github.com/apache/beam/sdks/go/pkg/beam/util/shimx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/shimx" ) func anotherFn(v shimx.Emitter) string { diff --git a/sdks/go/cmd/symtab/main.go b/sdks/go/cmd/symtab/main.go index f4454b2b36d1..c9df1f0ad608 100644 --- a/sdks/go/cmd/symtab/main.go +++ b/sdks/go/cmd/symtab/main.go @@ -21,9 +21,9 @@ import ( "os" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/funcx" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/symtab" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/funcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/symtab" ) const ( diff --git a/sdks/go/container/boot.go b/sdks/go/container/boot.go index 0d354e5626eb..1f7ea5f09942 100644 --- a/sdks/go/container/boot.go +++ b/sdks/go/container/boot.go @@ -24,10 +24,10 @@ import ( "path/filepath" "strings" - "github.com/apache/beam/sdks/go/pkg/beam/artifact" - "github.com/apache/beam/sdks/go/pkg/beam/provision" - "github.com/apache/beam/sdks/go/pkg/beam/util/execx" - "github.com/apache/beam/sdks/go/pkg/beam/util/grpcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/artifact" + "github.com/apache/beam/sdks/v2/go/pkg/beam/provision" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/execx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/grpcx" ) var ( @@ -106,11 +106,12 @@ func main() { case 0: log.Fatal("No artifacts staged") case 1: - name = artifacts[0].Name + name, _ = artifact.MustExtractFilePayload(artifacts[0]) default: found := false for _, a := range artifacts { - if a.Name == worker { + n, _ := artifact.MustExtractFilePayload(a) + if n == worker { found = true break } diff --git a/sdks/go/container/build.gradle b/sdks/go/container/build.gradle index 74ca6c181aee..643ffdca4e72 100644 --- a/sdks/go/container/build.gradle +++ b/sdks/go/container/build.gradle @@ -22,10 +22,10 @@ applyDockerNature() description = "Apache Beam :: SDKs :: Go :: Container" -// Figure out why the golang plugin does not add a build dependency between projects. -// Without the line below, we get spurious errors about not being able to resolve -// "./github.com/apache/beam/sdks/go" -resolveBuildDependencies.dependsOn ":sdks:go:goBuild" +// Disable gogradle's dependency resolution so it uses Go modules instead. +installDependencies.enabled = false +resolveBuildDependencies.enabled = false +resolveTestDependencies.enabled = false clean.dependsOn cleanVendor @@ -33,13 +33,13 @@ dependencies { golang { // TODO(herohde): use "./" prefix to prevent gogradle use base github path, for now. // TODO(herohde): get the pkg subdirectory only, if possible. We spend mins pulling cmd/beamctl deps. 
- build name: './github.com/apache/beam/sdks/go', dir: project(':sdks:go').projectDir - test name: './github.com/apache/beam/sdks/go', dir: project(':sdks:go').projectDir + build name: './github.com/apache/beam/sdks/v2/go', dir: project(':sdks:go').projectDir + test name: './github.com/apache/beam/sdks/v2/go', dir: project(':sdks:go').projectDir } } golang { - packagePath = 'github.com/apache/beam/sdks/go/boot' + packagePath = 'github.com/apache/beam/sdks/v2/go/container' goBuild { // TODO(herohde): build local platform + linux-amd64, if possible. targetPlatform = ['linux-amd64'] @@ -53,6 +53,8 @@ docker { root: project.rootProject.hasProperty(["docker-repository-root"]) ? project.rootProject["docker-repository-root"] : project.docker_image_default_repo_root) + // tags used by dockerTag task + tags containerImageTags() files "./build/" buildArgs(['pull_licenses': project.rootProject.hasProperty(["docker-pull-licenses"]) || project.rootProject.hasProperty(["isRelease"])]) @@ -67,4 +69,14 @@ if (project.rootProject.hasProperty(["docker-pull-licenses"])) { dependsOn ':release:go-licenses:go:createLicenses' } dockerPrepare.dependsOn 'copyGolangLicenses' +} else { + task skipPullLicenses(type: Exec) { + executable "sh" + args "-c", "mkdir -p build/target/go-licenses" + } + dockerPrepare.dependsOn 'skipPullLicenses' } + +task pushAll { + dependsOn ":sdks:go:container:dockerPush" +} \ No newline at end of file diff --git a/sdks/go/examples/build.gradle b/sdks/go/examples/build.gradle index 8cf5979dfb1e..26f0fa868b19 100644 --- a/sdks/go/examples/build.gradle +++ b/sdks/go/examples/build.gradle @@ -21,17 +21,17 @@ applyGoNature() description = "Apache Beam :: SDKs :: Go :: Examples" +// Disable gogradle's dependency resolution so it uses Go modules instead. +installDependencies.enabled = false +resolveBuildDependencies.enabled = false +resolveTestDependencies.enabled = false + def getLocalPlatform = { String hostOs = com.github.blindpirate.gogradle.crossplatform.Os.getHostOs() String hostArch = com.github.blindpirate.gogradle.crossplatform.Arch.getHostArch() return hostOs + '-' + hostArch } -// Figure out why the golang plugin does not add a build dependency between projects. -// Without the line below, we get spurious errors about not being able to resolve -// "./github.com/apache/beam/sdks/go" -resolveBuildDependencies.dependsOn ":sdks:go:goBuild" - clean.dependsOn cleanVendor dependencies { @@ -44,7 +44,7 @@ dependencies { } golang { - packagePath = 'github.com/apache/beam/sdks/go/examples' + packagePath = 'github.com/apache/beam/sdks/v2/go/examples' goBuild { // We always want to build linux-amd64 in addition to the user host platform // so we can submit this as the remote binary used within the Go container. 
@@ -55,21 +55,21 @@ golang { targetPlatform = [getLocalPlatform(), 'linux-amd64'] // Build all the examples - go 'build -o ./build/bin/${GOOS}_${GOARCH}/autocomplete github.com/apache/beam/sdks/go/examples/complete/autocomplete' - go 'build -o ./build/bin/${GOOS}_${GOARCH}/combine github.com/apache/beam/sdks/go/examples/cookbook/combine' - go 'build -o ./build/bin/${GOOS}_${GOARCH}/contains github.com/apache/beam/sdks/go/examples/contains' - go 'build -o ./build/bin/${GOOS}_${GOARCH}/debugging_wordcount github.com/apache/beam/sdks/go/examples/debugging_wordcount' - go 'build -o ./build/bin/${GOOS}_${GOARCH}/filter github.com/apache/beam/sdks/go/examples/cookbook/filter' - go 'build -o ./build/bin/${GOOS}_${GOARCH}/forest github.com/apache/beam/sdks/go/examples/forest' - go 'build -o ./build/bin/${GOOS}_${GOARCH}/grades github.com/apache/beam/sdks/go/examples/grades' - go 'build -o ./build/bin/${GOOS}_${GOARCH}/join github.com/apache/beam/sdks/go/examples/cookbook/join' - go 'build -o ./build/bin/${GOOS}_${GOARCH}/max github.com/apache/beam/sdks/go/examples/cookbook/max' - go 'build -o ./build/bin/${GOOS}_${GOARCH}/minimal_wordcount github.com/apache/beam/sdks/go/examples/minimal_wordcount' - go 'build -o ./build/bin/${GOOS}_${GOARCH}/pingpong github.com/apache/beam/sdks/go/examples/pingpong' - go 'build -o ./build/bin/${GOOS}_${GOARCH}/tornadoes github.com/apache/beam/sdks/go/examples/cookbook/tornadoes' - go 'build -o ./build/bin/${GOOS}_${GOARCH}/windowed_wordcount github.com/apache/beam/sdks/go/examples/windowed_wordcount' - go 'build -o ./build/bin/${GOOS}_${GOARCH}/wordcount github.com/apache/beam/sdks/go/examples/wordcount' - go 'build -o ./build/bin/${GOOS}_${GOARCH}/yatzy github.com/apache/beam/sdks/go/examples/yatzy' + go 'build -o ./build/bin/${GOOS}_${GOARCH}/autocomplete github.com/apache/beam/sdks/v2/go/examples/complete/autocomplete' + go 'build -o ./build/bin/${GOOS}_${GOARCH}/combine github.com/apache/beam/sdks/v2/go/examples/cookbook/combine' + go 'build -o ./build/bin/${GOOS}_${GOARCH}/contains github.com/apache/beam/sdks/v2/go/examples/contains' + go 'build -o ./build/bin/${GOOS}_${GOARCH}/debugging_wordcount github.com/apache/beam/sdks/v2/go/examples/debugging_wordcount' + go 'build -o ./build/bin/${GOOS}_${GOARCH}/filter github.com/apache/beam/sdks/v2/go/examples/cookbook/filter' + go 'build -o ./build/bin/${GOOS}_${GOARCH}/forest github.com/apache/beam/sdks/v2/go/examples/forest' + go 'build -o ./build/bin/${GOOS}_${GOARCH}/grades github.com/apache/beam/sdks/v2/go/examples/grades' + go 'build -o ./build/bin/${GOOS}_${GOARCH}/join github.com/apache/beam/sdks/v2/go/examples/cookbook/join' + go 'build -o ./build/bin/${GOOS}_${GOARCH}/max github.com/apache/beam/sdks/v2/go/examples/cookbook/max' + go 'build -o ./build/bin/${GOOS}_${GOARCH}/minimal_wordcount github.com/apache/beam/sdks/v2/go/examples/minimal_wordcount' + go 'build -o ./build/bin/${GOOS}_${GOARCH}/pingpong github.com/apache/beam/sdks/v2/go/examples/pingpong' + go 'build -o ./build/bin/${GOOS}_${GOARCH}/tornadoes github.com/apache/beam/sdks/v2/go/examples/cookbook/tornadoes' + go 'build -o ./build/bin/${GOOS}_${GOARCH}/windowed_wordcount github.com/apache/beam/sdks/v2/go/examples/windowed_wordcount' + go 'build -o ./build/bin/${GOOS}_${GOARCH}/wordcount github.com/apache/beam/sdks/v2/go/examples/wordcount' + go 'build -o ./build/bin/${GOOS}_${GOARCH}/yatzy github.com/apache/beam/sdks/v2/go/examples/yatzy' } // Ignore spurious vet errors during check for [BEAM-8992]. 
@@ -77,3 +77,10 @@ golang { continueOnFailure = true } } + +// Run this task to validate the Go environment setup for contributors +task wordCount(type: com.github.blindpirate.gogradle.Go) { + description "Run the Go word count example" + dependsOn goVendor + go 'build -o ./build/bin/${GOOS}_${GOARCH}/wordcount github.com/apache/beam/sdks/v2/go/examples/wordcount' +} \ No newline at end of file diff --git a/sdks/go/examples/complete/autocomplete/autocomplete.go b/sdks/go/examples/complete/autocomplete/autocomplete.go index 3b8ae17bf10e..4c39e5336bb2 100644 --- a/sdks/go/examples/complete/autocomplete/autocomplete.go +++ b/sdks/go/examples/complete/autocomplete/autocomplete.go @@ -21,12 +21,12 @@ import ( "os" "regexp" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/io/textio" - "github.com/apache/beam/sdks/go/pkg/beam/log" - "github.com/apache/beam/sdks/go/pkg/beam/transforms/top" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" - "github.com/apache/beam/sdks/go/pkg/beam/x/debug" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/textio" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/top" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/debug" ) // TODO(herohde) 5/30/2017: fully implement https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/complete/AutoComplete.java diff --git a/sdks/go/examples/contains/contains.go b/sdks/go/examples/contains/contains.go index 45a8527a8511..5f7c14e0c2d3 100644 --- a/sdks/go/examples/contains/contains.go +++ b/sdks/go/examples/contains/contains.go @@ -23,12 +23,12 @@ import ( "regexp" "strings" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/io/textio" - "github.com/apache/beam/sdks/go/pkg/beam/log" - "github.com/apache/beam/sdks/go/pkg/beam/transforms/stats" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" - "github.com/apache/beam/sdks/go/pkg/beam/x/debug" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/textio" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/stats" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/debug" ) // Options used purely at pipeline construction-time can just be flags. 
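The contains.go hunk above keeps the comment that "Options used purely at pipeline construction-time can just be flags." A minimal, self-contained sketch of that pattern follows; the flag name and input path are illustrative, not taken from this change.

```
// Sketch of the "construction-time options as flags" pattern used by the examples.
package main

import (
	"context"
	"flag"

	"github.com/apache/beam/sdks/v2/go/pkg/beam"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/io/textio"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/x/debug"
)

// input is only consulted while the pipeline graph is constructed,
// so a plain flag is sufficient; it is never shipped to workers.
var input = flag.String("input", "gs://my-bucket/inputs/*", "File(s) to read.")

func main() {
	flag.Parse()
	beam.Init()

	p := beam.NewPipeline()
	s := p.Root()

	// The flag value is used here, at construction time only.
	lines := textio.Read(s, *input)
	debug.Print(s, lines)

	if err := beamx.Run(context.Background(), p); err != nil {
		panic(err)
	}
}
```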
diff --git a/sdks/go/examples/cookbook/combine/combine.go b/sdks/go/examples/cookbook/combine/combine.go index c42bf12e60c8..33023b28f710 100644 --- a/sdks/go/examples/cookbook/combine/combine.go +++ b/sdks/go/examples/cookbook/combine/combine.go @@ -23,18 +23,19 @@ import ( "fmt" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/io/bigqueryio" - "github.com/apache/beam/sdks/go/pkg/beam/log" - "github.com/apache/beam/sdks/go/pkg/beam/options/gcpopts" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/bigqueryio" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/options/gcpopts" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" ) var ( input = flag.String("input", "publicdata:samples.shakespeare", "Shakespeare plays BQ table.") output = flag.String("output", "", "Output BQ table.") - minLength = flag.Int("min_length", 9, "Minimum word length") + minLength = flag.Int("min_length", 9, "Minimum word length") + small_words = beam.NewCounter("extract", "small_words") ) func init() { @@ -67,11 +68,12 @@ type extractFn struct { MinLength int `json:"min_length"` } -func (f *extractFn) ProcessElement(row WordRow, emit func(string, string)) { +func (f *extractFn) ProcessElement(ctx context.Context, row WordRow, emit func(string, string)) { if len(row.Word) >= f.MinLength { emit(row.Word, row.Corpus) + } else { + small_words.Inc(ctx, 1) } - // TODO(herohde) 7/14/2017: increment counter for "small words" } // TODO(herohde) 7/14/2017: the choice of a string (instead of []string) for the diff --git a/sdks/go/examples/cookbook/filter/filter.go b/sdks/go/examples/cookbook/filter/filter.go index effc1225655f..6b3d45dcd66f 100644 --- a/sdks/go/examples/cookbook/filter/filter.go +++ b/sdks/go/examples/cookbook/filter/filter.go @@ -22,12 +22,12 @@ import ( "flag" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/io/bigqueryio" - "github.com/apache/beam/sdks/go/pkg/beam/log" - "github.com/apache/beam/sdks/go/pkg/beam/options/gcpopts" - "github.com/apache/beam/sdks/go/pkg/beam/transforms/stats" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/bigqueryio" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/options/gcpopts" + "github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/stats" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" ) var ( diff --git a/sdks/go/examples/cookbook/join/join.go b/sdks/go/examples/cookbook/join/join.go index 450dfb08e548..3adb33effc10 100644 --- a/sdks/go/examples/cookbook/join/join.go +++ b/sdks/go/examples/cookbook/join/join.go @@ -22,12 +22,12 @@ import ( "fmt" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/io/bigqueryio" - "github.com/apache/beam/sdks/go/pkg/beam/io/textio" - "github.com/apache/beam/sdks/go/pkg/beam/log" - "github.com/apache/beam/sdks/go/pkg/beam/options/gcpopts" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/bigqueryio" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/textio" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/options/gcpopts" + 
"github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" ) // See: https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/cookbook/JoinExamples.java diff --git a/sdks/go/examples/cookbook/max/max.go b/sdks/go/examples/cookbook/max/max.go index f0184354afac..2e596f052c20 100644 --- a/sdks/go/examples/cookbook/max/max.go +++ b/sdks/go/examples/cookbook/max/max.go @@ -22,12 +22,12 @@ import ( "flag" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/io/bigqueryio" - "github.com/apache/beam/sdks/go/pkg/beam/log" - "github.com/apache/beam/sdks/go/pkg/beam/options/gcpopts" - "github.com/apache/beam/sdks/go/pkg/beam/transforms/stats" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/bigqueryio" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/options/gcpopts" + "github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/stats" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" ) var ( diff --git a/sdks/go/examples/cookbook/tornadoes/tornadoes.go b/sdks/go/examples/cookbook/tornadoes/tornadoes.go index 058948ba32e1..2950a4c499d2 100644 --- a/sdks/go/examples/cookbook/tornadoes/tornadoes.go +++ b/sdks/go/examples/cookbook/tornadoes/tornadoes.go @@ -38,12 +38,12 @@ import ( "flag" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/io/bigqueryio" - "github.com/apache/beam/sdks/go/pkg/beam/log" - "github.com/apache/beam/sdks/go/pkg/beam/options/gcpopts" - "github.com/apache/beam/sdks/go/pkg/beam/transforms/stats" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/bigqueryio" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/options/gcpopts" + "github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/stats" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" ) var ( diff --git a/sdks/go/examples/debugging_wordcount/debugging_wordcount.go b/sdks/go/examples/debugging_wordcount/debugging_wordcount.go index a17a75f3104a..6d30c6d7e640 100644 --- a/sdks/go/examples/debugging_wordcount/debugging_wordcount.go +++ b/sdks/go/examples/debugging_wordcount/debugging_wordcount.go @@ -48,12 +48,12 @@ import ( "reflect" "regexp" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/io/textio" - "github.com/apache/beam/sdks/go/pkg/beam/log" - "github.com/apache/beam/sdks/go/pkg/beam/testing/passert" - "github.com/apache/beam/sdks/go/pkg/beam/transforms/stats" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/textio" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/stats" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" ) // TODO(herohde) 10/16/2017: support metrics and log level cutoff. 
diff --git a/sdks/go/examples/forest/forest.go b/sdks/go/examples/forest/forest.go index 06ae7831a457..5f6c55cd2d6e 100644 --- a/sdks/go/examples/forest/forest.go +++ b/sdks/go/examples/forest/forest.go @@ -31,10 +31,10 @@ import ( "context" "flag" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/log" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" - "github.com/apache/beam/sdks/go/pkg/beam/x/debug" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/debug" ) var ( diff --git a/sdks/go/examples/grades/grades.go b/sdks/go/examples/grades/grades.go index a96bcefbdabd..55268510488b 100644 --- a/sdks/go/examples/grades/grades.go +++ b/sdks/go/examples/grades/grades.go @@ -19,12 +19,12 @@ import ( "context" "flag" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/log" - "github.com/apache/beam/sdks/go/pkg/beam/transforms/stats" - "github.com/apache/beam/sdks/go/pkg/beam/transforms/top" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" - "github.com/apache/beam/sdks/go/pkg/beam/x/debug" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/stats" + "github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/top" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/debug" ) type Grade struct { diff --git a/sdks/go/examples/kafka/taxi.go b/sdks/go/examples/kafka/taxi.go new file mode 100644 index 000000000000..2c9c3c2c5ce7 --- /dev/null +++ b/sdks/go/examples/kafka/taxi.go @@ -0,0 +1,163 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +// taxi is an example using a cross-language Kafka pipeline to write and read +// to Kafka. This example reads from the PubSub NYC Taxi stream described in +// https://github.com/googlecodelabs/cloud-dataflow-nyc-taxi-tycoon, writes to +// a given Kafka topic and then reads back from the same Kafka topic, logging +// every element. This is done as a streaming pipeline and will not end +// unless the pipeline is stopped externally. +// +// Running this example requires a Kafka cluster accessible to the runner, and +// a cross-language expansion service that can expand Kafka read and write +// transforms. +// +// Setting Up a Kafka Cluster +// +// Setting up a Kafka cluster is more involved than can be covered in this +// example. In order for this example to work, all that is necessary is a Kafka +// cluster accessible through a bootstrap server address that is passed in as +// a flag. 
Some instructions for setting up a single node Kafka cluster in GCE +// can be found here: https://github.com/GoogleCloudPlatform/java-docs-samples/tree/master/dataflow/flex-templates/kafka_to_bigquery +// +// Running an Expansion Server +// +// These instructions will cover running the Java IO Expansion Service, and +// therefore require a JDK installation in a version supported by Beam. +// Depending on whether you are running this from a numbered Beam release, or a +// development environment, there are two sources you may use for the Expansion +// service. +// +// Numbered release: The expansion service jar is vendored as module +// org.apache.beam:beam-sdks-java-io-expansion-service in Maven Repository. +// This jar can be executed directly with the following command: +// `java -jar ` +// Development env: This requires that the JAVA_HOME environment variable +// points to your JDK installation. From the root `beam/` directory of the +// Apache Beam repository, the jar can be built (or built and run) with the +// following commands: +// Build: ./gradlew :sdks:java:io:expansion-service:build +// Build and Run: ./gradlew :sdks:java:io:expansion-service:runExpansionService -PconstructionService.port= +// +// Running the Example on GCP +// +// Running this pipeline requires providing an address for the Expansion Service +// and for the Kafka cluster's bootstrap servers as flags, in addition to the +// usual flags for pipelines. +// +// An example command for executing this pipeline on GCP is as follows: +// export PROJECT="$(gcloud config get-value project)" +// export TEMP_LOCATION="gs://MY-BUCKET/temp" +// export STAGING_LOCATION="gs://MY-BUCKET/staging" +// export REGION="us-central1" +// export JOB_NAME="kafka-taxi-`date +%Y%m%d-%H%M%S`" +// export BOOTSTRAP_SERVERS="123.45.67.89:1234" +// export EXPANSION_ADDR="localhost:1234" +// go run ./sdks/go/examples/kafka/taxi.go \ +// --runner=DataflowRunner \ +// --temp_location=$TEMP_LOCATION \ +// --staging_location=$STAGING_LOCATION \ +// --project=$PROJECT \ +// --region=$REGION \ +// --job_name="${JOB_NAME}" \ +// --bootstrap_servers=$BOOTSTRAP_SERVERS \ +// --experiments=use_portable_job_submission,use_runner_v2 \ +// --expansion_addr=$EXPANSION_ADDR +// +// Running the Example From a Git Clone +// +// When running on a development environment, a custom container will likely +// need to be provided for the cross-language SDK. First this will require +// building and pushing the SDK container to a container repository, such as +// Docker Hub. +// +// export DOCKER_ROOT="Your Docker Repository Root" +// ./gradlew :sdks:java:container:java8:docker -Pdocker-repository-root=$DOCKER_ROOT -Pdocker-tag=latest +// docker push $DOCKER_ROOT/beam_java8_sdk:latest +// +// For runners in local mode, simply building the container using the default +// values for docker-repository-root and docker-tag will work to have it +// accessible locally. +// +// Additionally, you must provide the location of your custom container to the +// pipeline with the --sdk_harness_container_image_override flag.
For example: +// +// --sdk_harness_container_image_override=".*java.*,${DOCKER_ROOT}/beam_java8_sdk:latest" +package main + +import ( + "context" + "flag" + "reflect" + "time" + + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/pubsubio" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/xlang/kafkaio" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" +) + +var ( + expansionAddr = flag.String("expansion_addr", "", "Address of Expansion Service") + bootstrapServers = flag.String("bootstrap_servers", "", + "URL of the bootstrap servers for the Kafka cluster. Should be accessible by the runner.") + topic = flag.String("topic", "kafka_taxirides_realtime", "Kafka topic to write to and read from.") +) + +func init() { + beam.RegisterType(reflect.TypeOf((*LogFn)(nil)).Elem()) +} + +// LogFn is a DoFn to log rides. +type LogFn struct{} + +// ProcessElement logs each element it receives. +func (fn *LogFn) ProcessElement(ctx context.Context, elm []byte) { + log.Infof(ctx, "Ride info: %v", string(elm)) +} + +// FinishBundle waits a bit so the job server finishes receiving logs. +func (fn *LogFn) FinishBundle() { + time.Sleep(2 * time.Second) +} + +func main() { + flag.Parse() + beam.Init() + + ctx := context.Background() + if *expansionAddr == "" { + log.Fatal(ctx, "No expansion address provided") + } + + p := beam.NewPipeline() + s := p.Root() + + // Read from Pubsub and write to Kafka. + data := pubsubio.Read(s, "pubsub-public-data", "taxirides-realtime", nil) + kvData := beam.ParDo(s, func(elm []byte) ([]byte, []byte) { return []byte(""), elm }, data) + windowed := beam.WindowInto(s, window.NewFixedWindows(15*time.Second), kvData) + kafkaio.Write(s, *expansionAddr, *bootstrapServers, *topic, windowed) + + // Simultaneously read from Kafka and log any element received. 
+ read := kafkaio.Read(s, *expansionAddr, *bootstrapServers, []string{*topic}) + vals := beam.DropKey(s, read) + beam.ParDo0(s, &LogFn{}, vals) + + if err := beamx.Run(ctx, p); err != nil { + log.Fatalf(ctx, "Failed to execute job: %v", err) + } +} diff --git a/sdks/go/examples/minimal_wordcount/minimal_wordcount.go b/sdks/go/examples/minimal_wordcount/minimal_wordcount.go index 5fe6acce9471..b35aab2beacb 100644 --- a/sdks/go/examples/minimal_wordcount/minimal_wordcount.go +++ b/sdks/go/examples/minimal_wordcount/minimal_wordcount.go @@ -41,13 +41,13 @@ import ( "fmt" "regexp" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/io/textio" - "github.com/apache/beam/sdks/go/pkg/beam/runners/direct" - "github.com/apache/beam/sdks/go/pkg/beam/transforms/stats" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/textio" + "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/direct" + "github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/stats" - _ "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/gcs" - _ "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/local" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/gcs" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/local" ) var wordRE = regexp.MustCompile(`[a-zA-Z]+('[a-z])?`) diff --git a/sdks/go/examples/multiout/multiout.go b/sdks/go/examples/multiout/multiout.go index 43b480dbf5d2..9c40514c647f 100644 --- a/sdks/go/examples/multiout/multiout.go +++ b/sdks/go/examples/multiout/multiout.go @@ -24,10 +24,10 @@ import ( "log" "regexp" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/io/textio" - "github.com/apache/beam/sdks/go/pkg/beam/transforms/stats" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/textio" + "github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/stats" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" ) var ( diff --git a/sdks/go/examples/pingpong/pingpong.go b/sdks/go/examples/pingpong/pingpong.go index 8c0bfe46420b..113f5828c832 100644 --- a/sdks/go/examples/pingpong/pingpong.go +++ b/sdks/go/examples/pingpong/pingpong.go @@ -23,10 +23,10 @@ import ( "os" "regexp" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/io/textio" - "github.com/apache/beam/sdks/go/pkg/beam/log" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/textio" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" ) var ( diff --git a/sdks/go/examples/readavro/readavro.go b/sdks/go/examples/readavro/readavro.go index 9fe778943d62..9ea08888e2bd 100644 --- a/sdks/go/examples/readavro/readavro.go +++ b/sdks/go/examples/readavro/readavro.go @@ -25,10 +25,10 @@ import ( "log" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/io/avroio" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" - "github.com/apache/beam/sdks/go/pkg/beam/x/debug" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/avroio" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/debug" ) var ( diff --git a/sdks/go/examples/snippets/01_03intro.go b/sdks/go/examples/snippets/01_03intro.go new file mode 100644 index 
000000000000..38d6fe4ec24a --- /dev/null +++ b/sdks/go/examples/snippets/01_03intro.go @@ -0,0 +1,102 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package snippets + +import ( + "flag" + + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/textio" +) + +// PipelineConstruction contains snippets for the initial sections of +// the Beam Programming Guide, from initializing to submitting a +// pipeline. +func PipelineConstruction() { + // [START pipeline_options] + // If beamx or Go flags are used, flags must be parsed first, + // before beam.Init() is called. + flag.Parse() + // [END pipeline_options] + + // [START pipelines_constructing_creating] + // beam.Init() is an initialization hook that must be called + // near the beginning of main(), before creating a pipeline. + beam.Init() + + // Create the Pipeline object and root scope. + pipeline, scope := beam.NewPipelineWithRoot() + // [END pipelines_constructing_creating] + + // [START pipelines_constructing_reading] + // Read the file at the URI 'gs://some/inputData.txt' and return + // the lines as a PCollection. + // Notice the scope as the first variable when calling + // the method as is needed when calling all transforms. + lines := textio.Read(scope, "gs://some/inputData.txt") + + // [END pipelines_constructing_reading] + + _ = []interface{}{pipeline, scope, lines} +} + +// Create demonstrates using beam.CreateList. +func Create() { + // [START model_pcollection] + lines := []string{ + "To be, or not to be: that is the question: ", + "Whether 'tis nobler in the mind to suffer ", + "The slings and arrows of outrageous fortune, ", + "Or to take arms against a sea of troubles, ", + } + + // Create the Pipeline object and root scope. + // It's conventional to use p as the Pipeline variable and + // s as the scope variable. + p, s := beam.NewPipelineWithRoot() + + // Pass the slice to beam.CreateList, to create the pcollection. + // The scope variable s is used to add the CreateList transform + // to the pipeline. + linesPCol := beam.CreateList(s, lines) + // [END model_pcollection] + _ = []interface{}{p, linesPCol} +} + +// PipelineOptions shows basic pipeline options using flags. +func PipelineOptions() { + // [START pipeline_options_define_custom] + // Use standard Go flags to define pipeline options. + var ( + input = flag.String("input", "", "") + output = flag.String("output", "", "") + ) + // [END pipeline_options_define_custom] + + _ = []interface{}{input, output} +} + +// PipelineOptionsCustom shows slightly less basic pipeline options using flags. 
+func PipelineOptionsCustom() { + // [START pipeline_options_define_custom_with_help_and_default] + var ( + input = flag.String("input", "gs://my-bucket/input", "Input for the pipeline") + output = flag.String("output", "gs://my-bucket/output", "Output for the pipeline") + ) + // [END pipeline_options_define_custom_with_help_and_default] + + _ = []interface{}{input, output} +} diff --git a/sdks/go/examples/snippets/04transforms.go b/sdks/go/examples/snippets/04transforms.go new file mode 100644 index 000000000000..4f07f04a5788 --- /dev/null +++ b/sdks/go/examples/snippets/04transforms.go @@ -0,0 +1,430 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package snippets + +import ( + "fmt" + "math" + "reflect" + "sort" + "strings" + + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/stats" +) + +// [START model_pardo_pardo] + +// ComputeWordLengthFn is the DoFn to perform on each element in the input PCollection. +type ComputeWordLengthFn struct{} + +// ProcessElement is the method to execute for each element. +func (fn *ComputeWordLengthFn) ProcessElement(word string, emit func(int)) { + emit(len(word)) +} + +// DoFns must be registered with beam. +func init() { + beam.RegisterType(reflect.TypeOf((*ComputeWordLengthFn)(nil))) +} + +// [END model_pardo_pardo] + +// applyWordLen applies ComputeWordLengthFn to words, which must be +// a PCollection +func applyWordLen(s beam.Scope, words beam.PCollection) beam.PCollection { + // [START model_pardo_apply] + wordLengths := beam.ParDo(s, &ComputeWordLengthFn{}, words) + // [END model_pardo_apply] + return wordLengths +} + +func applyWordLenAnon(s beam.Scope, words beam.PCollection) beam.PCollection { + // [START model_pardo_apply_anon] + // Apply an anonymous function as a DoFn PCollection words. + // Save the result as the PCollection wordLengths. + wordLengths := beam.ParDo(s, func(word string) int { + return len(word) + }, words) + // [END model_pardo_apply_anon] + return wordLengths +} + +// [START cogroupbykey_input_helpers] + +type stringPair struct { + K, V string +} + +func splitStringPair(e stringPair) (string, string) { + return e.K, e.V +} + +func init() { + // Register element types and DoFns. 
+ beam.RegisterType(reflect.TypeOf((*stringPair)(nil)).Elem()) + beam.RegisterFunction(splitStringPair) +} + +// CreateAndSplit is a helper function that creates +func CreateAndSplit(s beam.Scope, input []stringPair) beam.PCollection { + initial := beam.CreateList(s, input) + return beam.ParDo(s, splitStringPair, initial) +} + +// [END cogroupbykey_input_helpers] + +// [START cogroupbykey_output_helpers] + +func formatCoGBKResults(key string, emailIter, phoneIter func(*string) bool) string { + var s string + var emails, phones []string + for emailIter(&s) { + emails = append(emails, s) + } + for phoneIter(&s) { + phones = append(phones, s) + } + // Values have no guaranteed order, sort for deterministic output. + sort.Strings(emails) + sort.Strings(phones) + return fmt.Sprintf("%s; %s; %s", key, formatStringIter(emails), formatStringIter(phones)) +} + +func init() { + beam.RegisterFunction(formatCoGBKResults) +} + +// [END cogroupbykey_output_helpers] + +func formatStringIter(vs []string) string { + var b strings.Builder + b.WriteRune('[') + for i, v := range vs { + b.WriteRune('\'') + b.WriteString(v) + b.WriteRune('\'') + if i < len(vs)-1 { + b.WriteString(", ") + } + } + b.WriteRune(']') + return b.String() +} + +func coGBKExample(s beam.Scope) beam.PCollection { + // [START cogroupbykey_inputs] + var emailSlice = []stringPair{ + {"amy", "amy@example.com"}, + {"carl", "carl@example.com"}, + {"julia", "julia@example.com"}, + {"carl", "carl@email.com"}, + } + + var phoneSlice = []stringPair{ + {"amy", "111-222-3333"}, + {"james", "222-333-4444"}, + {"amy", "333-444-5555"}, + {"carl", "444-555-6666"}, + } + emails := CreateAndSplit(s.Scope("CreateEmails"), emailSlice) + phones := CreateAndSplit(s.Scope("CreatePhones"), phoneSlice) + // [END cogroupbykey_inputs] + + // [START cogroupbykey_outputs] + results := beam.CoGroupByKey(s, emails, phones) + + contactLines := beam.ParDo(s, formatCoGBKResults, results) + // [END cogroupbykey_outputs] + + return contactLines +} + +// [START combine_simple_sum] +func sumInts(a, v int) int { + return a + v +} + +func init() { + beam.RegisterFunction(sumInts) +} + +func globallySumInts(s beam.Scope, ints beam.PCollection) beam.PCollection { + return beam.Combine(s, sumInts, ints) +} + +type boundedSum struct { + Bound int +} + +func (fn *boundedSum) MergeAccumulators(a, v int) int { + sum := a + v + if fn.Bound > 0 && sum > fn.Bound { + return fn.Bound + } + return sum +} + +func init() { + beam.RegisterType(reflect.TypeOf((*boundedSum)(nil))) +} + +func globallyBoundedSumInts(s beam.Scope, bound int, ints beam.PCollection) beam.PCollection { + return beam.Combine(s, &boundedSum{Bound: bound}, ints) +} + +// [END combine_simple_sum] + +// [START combine_custom_average] + +type averageFn struct{} + +type averageAccum struct { + Count, Sum int +} + +func (fn *averageFn) CreateAccumulator() averageAccum { + return averageAccum{0, 0} +} + +func (fn *averageFn) AddInput(a averageAccum, v int) averageAccum { + return averageAccum{Count: a.Count + 1, Sum: a.Sum + v} +} + +func (fn *averageFn) MergeAccumulators(a, v averageAccum) averageAccum { + return averageAccum{Count: a.Count + v.Count, Sum: a.Sum + v.Sum} +} + +func (fn *averageFn) ExtractOutput(a averageAccum) float64 { + if a.Count == 0 { + return math.NaN() + } + return float64(a.Sum) / float64(a.Count) +} + +func init() { + beam.RegisterType(reflect.TypeOf((*averageFn)(nil))) +} + +// [END combine_custom_average] + +func globallyAverage(s beam.Scope, ints beam.PCollection) beam.PCollection { + // [START 
combine_global_average] + average := beam.Combine(s, &averageFn{}, ints) + // [END combine_global_average] + return average +} + +func globallyAverageWithDefault(s beam.Scope, ints beam.PCollection) beam.PCollection { + // [START combine_global_with_default] + // Setting combine defaults requires no helper function in the Go SDK. + average := beam.Combine(s, &averageFn{}, ints) + + // To add a default value: + defaultValue := beam.Create(s, float64(0)) + avgWithDefault := beam.ParDo(s, func(d float64, iter func(*float64) bool) float64 { + var c float64 + if iter(&c) { + // Side input has a value, so return it. + return c + } + // Otherwise, return the default. + return d + }, defaultValue, beam.SideInput{Input: average}) + // [END combine_global_with_default] + return avgWithDefault +} + +func perKeyAverage(s beam.Scope, playerAccuracies beam.PCollection) beam.PCollection { + // [START combine_per_key] + avgAccuracyPerPlayer := stats.MeanPerKey(s, playerAccuracies) + // [END combine_per_key] + return avgAccuracyPerPlayer +} + +func applyFlatten(s beam.Scope, pcol1, pcol2, pcol3 beam.PCollection) beam.PCollection { + // [START model_multiple_pcollections_flatten] + merged := beam.Flatten(s, pcol1, pcol2, pcol3) + // [END model_multiple_pcollections_flatten] + return merged +} + +type Student struct { + Percentile int +} + +// [START model_multiple_pcollections_partition_fn] + +func decileFn(student Student) int { + return int(float64(student.Percentile) / float64(10)) +} + +func init() { + beam.RegisterFunction(decileFn) +} + +// [END model_multiple_pcollections_partition_fn] + +// applyPartition returns the 40th percentile of students. +func applyPartition(s beam.Scope, students beam.PCollection) beam.PCollection { + // [START model_multiple_pcollections_partition] + // Partition returns a slice of PCollections + studentsByPercentile := beam.Partition(s, 10, decileFn, students) + // Each partition can be extracted by indexing into the slice. + fortiethPercentile := studentsByPercentile[4] + // [END model_multiple_pcollections_partition] + return fortiethPercentile +} + +// [START model_pardo_side_input_dofn] + +// filterWordsAbove is a DoFn that takes in a word, +// and a singleton side input iterator of a length cut off, +// and only emits words that are above that cut off. +// +// If the iterator has no elements, an error is returned, aborting processing. +func filterWordsAbove(word string, lengthCutOffIter func(*float64) bool, emitAboveCutoff func(string)) error { + var cutOff float64 + ok := lengthCutOffIter(&cutOff) + if !ok { + return fmt.Errorf("No length cutoff provided.") + } + if float64(len(word)) > cutOff { + emitAboveCutoff(word) + } + return nil +} + +// filterWordsBelow is a DoFn that takes in a word, +// and a singleton side input of a length cut off +// and only emits words that are beneath that cut off. +// +// If the side input isn't a singleton, a runtime panic will occur. +func filterWordsBelow(word string, lengthCutOff float64, emitBelowCutoff func(string)) { + if float64(len(word)) <= lengthCutOff { + emitBelowCutoff(word) + } +} + +func init() { + beam.RegisterFunction(filterWordsAbove) + beam.RegisterFunction(filterWordsBelow) +} + +// [END model_pardo_side_input_dofn] + +// addSideInput demonstrates passing a side input to a DoFn.
+func addSideInput(s beam.Scope, words beam.PCollection) (beam.PCollection, beam.PCollection) { + wordLengths := applyWordLen(s, words) + + // [START model_pardo_side_input] + // avgWordLength is a PCollection containing a single element, a singleton. + avgWordLength := stats.Mean(s, wordLengths) + + // Side inputs are added as with the beam.SideInput option to beam.ParDo. + wordsAboveCutOff := beam.ParDo(s, filterWordsAbove, words, beam.SideInput{Input: avgWordLength}) + wordsBelowCutOff := beam.ParDo(s, filterWordsBelow, words, beam.SideInput{Input: avgWordLength}) + // [END model_pardo_side_input] + return wordsAboveCutOff, wordsBelowCutOff +} + +// isMarkedWord is a dummy function. +func isMarkedWord(word string) bool { + return strings.HasPrefix(word, "MARKER") +} + +// [START model_multiple_output_dofn] + +// processWords is a DoFn that has 3 output PCollections. The emitter functions +// are matched in positional order to the PCollections returned by beam.ParDo3. +func processWords(word string, emitBelowCutoff, emitAboveCutoff, emitMarked func(string)) { + const cutOff = 5 + if len(word) < cutOff { + emitBelowCutoff(word) + } else { + emitAboveCutoff(word) + } + if isMarkedWord(word) { + emitMarked(word) + } +} + +// processWordsMixed demonstrates mixing an emitter, with a standard return. +// If a standard return is used, it will always be the first returned PCollection, +// followed in positional order by the emitter functions. +func processWordsMixed(word string, emitMarked func(string)) int { + if isMarkedWord(word) { + emitMarked(word) + } + return len(word) +} + +func init() { + beam.RegisterFunction(processWords) + beam.RegisterFunction(processWordsMixed) +} + +// [END model_multiple_output_dofn] + +func applyMultipleOut(s beam.Scope, words beam.PCollection) (belows, aboves, markeds, lengths, mixedMarkeds beam.PCollection) { + // [START model_multiple_output] + // beam.ParDo3 returns PCollections in the same order as + // the emit function parameters in processWords. + below, above, marked := beam.ParDo3(s, processWords, words) + + // processWordsMixed uses both a standard return and an emitter function. + // The standard return produces the first PCollection from beam.ParDo2, + // and the emitter produces the second PCollection. + length, mixedMarked := beam.ParDo2(s, processWordsMixed, words) + // [END model_multiple_output] + return below, above, marked, length, mixedMarked +} + +func extractWordsFn(line string, emitWords func(string)) { + words := strings.Split(line, " ") + for _, w := range words { + emitWords(w) + } +} + +func init() { + beam.RegisterFunction(extractWordsFn) +} + +// [START countwords_composite] +// CountWords is a function that builds a composite PTransform +// to count the number of times each word appears. +func CountWords(s beam.Scope, lines beam.PCollection) beam.PCollection { + // A subscope is required for a function to become a composite transform. + // We assign it to the original scope variable s to shadow the original + // for the rest of the CountWords function. + s = s.Scope("CountWords") + + // Since the same subscope is used for the following transforms, + // they are in the same composite PTransform. + + // Convert lines of text into individual words. + words := beam.ParDo(s, extractWordsFn, lines) + + // Count the number of times each word occurs. + wordCounts := stats.Count(s, words) + + // Return any PCollections that should be available after + // the composite transform. 
+ return wordCounts +} + +// [END countwords_composite] diff --git a/sdks/go/examples/snippets/04transforms_test.go b/sdks/go/examples/snippets/04transforms_test.go new file mode 100644 index 000000000000..8d888e028562 --- /dev/null +++ b/sdks/go/examples/snippets/04transforms_test.go @@ -0,0 +1,225 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package snippets + +import ( + "testing" + + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" +) + +func TestMain(m *testing.M) { + ptest.Main(m) +} + +func TestParDo(t *testing.T) { + p, s, input := ptest.CreateList([]string{"one", "two", "three"}) + lens := applyWordLen(s, input) + passert.Equals(s, lens, 3, 3, 5) + ptest.RunAndValidate(t, p) +} + +func TestParDo_anon(t *testing.T) { + p, s, input := ptest.CreateList([]string{"one", "two", "three"}) + lens := applyWordLenAnon(s, input) + passert.Equals(s, lens, 3, 3, 5) + ptest.RunAndValidate(t, p) +} + +func TestFormatCoGBKResults(t *testing.T) { + // [START cogroupbykey_outputs] + // Synthetic example results of a cogbk. + results := []struct { + Key string + Emails, Phones []string + }{ + { + Key: "amy", + Emails: []string{"amy@example.com"}, + Phones: []string{"111-222-3333", "333-444-5555"}, + }, { + Key: "carl", + Emails: []string{"carl@email.com", "carl@example.com"}, + Phones: []string{"444-555-6666"}, + }, { + Key: "james", + Emails: []string{}, + Phones: []string{"222-333-4444"}, + }, { + Key: "julia", + Emails: []string{"julia@example.com"}, + Phones: []string{}, + }, + } + // [END cogroupbykey_outputs] + + // [START cogroupbykey_formatted_outputs] + formattedResults := []string{ + "amy; ['amy@example.com']; ['111-222-3333', '333-444-5555']", + "carl; ['carl@email.com', 'carl@example.com']; ['444-555-6666']", + "james; []; ['222-333-4444']", + "julia; ['julia@example.com']; []", + } + // [END cogroupbykey_formatted_outputs] + + // Helper to fake iterators for unit testing. 
+ makeIter := func(vs []string) func(*string) bool { + i := 0 + return func(v *string) bool { + if i >= len(vs) { + return false + } + *v = vs[i] + i++ + return true + } + } + + for i, result := range results { + got := formatCoGBKResults(result.Key, makeIter(result.Emails), makeIter(result.Phones)) + want := formattedResults[i] + if got != want { + t.Errorf("%d.%v, got %q, want %q", i, result.Key, got, want) + } + } + + p, s := beam.NewPipelineWithRoot() + formattedCoGBK := coGBKExample(s) + passert.Equals(s, formattedCoGBK, formattedResults[0], formattedResults[1], formattedResults[2], formattedResults[3]) + ptest.RunAndValidate(t, p) +} + +func TestCombine(t *testing.T) { + p, s, input := ptest.CreateList([]int{1, 2, 3}) + avg := globallyAverage(s, input) + passert.Equals(s, avg, float64(2.0)) + ptest.RunAndValidate(t, p) +} + +func TestCombineWithDefault_useDefault(t *testing.T) { + p, s, input := ptest.CreateList([]int{}) + avg := globallyAverageWithDefault(s, input) + passert.Equals(s, avg, float64(0)) + ptest.RunAndValidate(t, p) +} + +func TestCombineWithDefault_useAverage(t *testing.T) { + p, s, input := ptest.CreateList([]int{1, 2, 3}) + avg := globallyAverageWithDefault(s, input) + passert.Equals(s, avg, float64(2.0)) + ptest.RunAndValidate(t, p) +} + +func TestCombine_sum(t *testing.T) { + p, s, input := ptest.CreateList([]int{1, 2, 3}) + avg := globallySumInts(s, input) + passert.Equals(s, avg, int(6)) + ptest.RunAndValidate(t, p) +} + +func TestCombine_sum_bounded(t *testing.T) { + p, s, input := ptest.CreateList([]int{1, 2, 3}) + bound := int(4) + avg := globallyBoundedSumInts(s, bound, input) + passert.Equals(s, avg, bound) + ptest.RunAndValidate(t, p) +} + +type player struct { + Name string + Accuracy float64 +} + +func splitPlayer(e player) (string, float64) { + return e.Name, e.Accuracy +} + +func mergePlayer(k string, v float64) player { + return player{Name: k, Accuracy: v} +} + +func init() { + beam.RegisterFunction(splitPlayer) + beam.RegisterFunction(mergePlayer) +} + +func TestCombinePerKey(t *testing.T) { + p, s, input := ptest.CreateList([]player{{"fred", 0.2}, {"velma", 0.4}, {"fred", 0.5}, {"velma", 1.0}, {"shaggy", 0.1}}) + kvs := beam.ParDo(s, splitPlayer, input) + avg := perKeyAverage(s, kvs) + results := beam.ParDo(s, mergePlayer, avg) + passert.Equals(s, results, player{"fred", 0.35}, player{"velma", 0.7}, player{"shaggy", 0.1}) + ptest.RunAndValidate(t, p) +} + +func TestFlatten(t *testing.T) { + p, s := beam.NewPipelineWithRoot() + a := beam.CreateList(s, []int{1, 2, 3}) + b := beam.CreateList(s, []int{5, 7, 9}) + c := beam.CreateList(s, []int{4, 6, 8}) + merged := applyFlatten(s, a, b, c) + passert.Equals(s, merged, 1, 2, 3, 4, 5, 6, 7, 8, 9) + ptest.RunAndValidate(t, p) +} + +func TestPartition(t *testing.T) { + p, s, input := ptest.CreateList([]Student{{42}, {57}, {23}, {89}, {99}, {5}}) + avg := applyPartition(s, input) + passert.Equals(s, avg, Student{42}) + ptest.RunAndValidate(t, p) +} + +func TestMultipleOutputs(t *testing.T) { + p, s, words := ptest.CreateList([]string{"a", "the", "pjamas", "art", "candy", "MARKERmarked"}) + below, above, marked, lengths, mixedMarked := applyMultipleOut(s, words) + + passert.Equals(s, below, "a", "the", "art") + passert.Equals(s, above, "pjamas", "candy", "MARKERmarked") + passert.Equals(s, marked, "MARKERmarked") + passert.Equals(s, lengths, 1, 3, 6, 3, 5, 12) + passert.Equals(s, mixedMarked, "MARKERmarked") + + ptest.RunAndValidate(t, p) +} + +func TestSideInputs(t *testing.T) { + p, s, words := 
ptest.CreateList([]string{"a", "the", "pjamas", "art", "candy", "garbage"}) + above, below := addSideInput(s, words) + passert.Equals(s, above, "pjamas", "candy", "garbage") + passert.Equals(s, below, "a", "the", "art") + ptest.RunAndValidate(t, p) +} + +func TestComposite(t *testing.T) { + p, s, lines := ptest.CreateList([]string{ + "this test dataset has the word test", + "at least twice, because to test the Composite", + "CountWords, one needs test data to run it with", + }) + // [START countwords_composite_call] + // A Composite PTransform function is called like any other function. + wordCounts := CountWords(s, lines) // returns a PCollection> + // [END countwords_composite_call] + testCount := beam.ParDo(s, func(k string, v int, emit func(int)) { + if k == "test" { + emit(v) + } + }, wordCounts) + passert.Equals(s, testCount, 4) + ptest.RunAndValidate(t, p) +} diff --git a/sdks/go/examples/snippets/doc.go b/sdks/go/examples/snippets/doc.go new file mode 100644 index 000000000000..95900b5c2517 --- /dev/null +++ b/sdks/go/examples/snippets/doc.go @@ -0,0 +1,23 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +// Package snippets contains code used in the Beam Programming Guide +// as examples for the Apache Beam Go SDK. These snippets are compiled +// and their tests run to ensure correctness. However, due to their +// piecemeal pedagogical use, they may not be the best example of +// production code. +// +// The Beam Programming Guide can be found at https://beam.apache.org/documentation/programming-guide/. 
+package snippets diff --git a/sdks/go/examples/streaming_wordcap/wordcap.go b/sdks/go/examples/streaming_wordcap/wordcap.go index 045fb3755776..f6849228d97c 100644 --- a/sdks/go/examples/streaming_wordcap/wordcap.go +++ b/sdks/go/examples/streaming_wordcap/wordcap.go @@ -28,14 +28,14 @@ import ( "os" "strings" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/stringx" - "github.com/apache/beam/sdks/go/pkg/beam/io/pubsubio" - "github.com/apache/beam/sdks/go/pkg/beam/log" - "github.com/apache/beam/sdks/go/pkg/beam/options/gcpopts" - "github.com/apache/beam/sdks/go/pkg/beam/util/pubsubx" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" - "github.com/apache/beam/sdks/go/pkg/beam/x/debug" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/stringx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/pubsubio" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/options/gcpopts" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/pubsubx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/debug" ) var ( diff --git a/sdks/go/examples/stringsplit/stringsplit.go b/sdks/go/examples/stringsplit/stringsplit.go index 13a1905269c0..20450dccd3fc 100644 --- a/sdks/go/examples/stringsplit/stringsplit.go +++ b/sdks/go/examples/stringsplit/stringsplit.go @@ -20,7 +20,7 @@ // // 1. From a command line, navigate to the top-level beam/ directory and run // the Flink job server: -// ./gradlew :runners:flink:1.10:job-server:runShadow -Djob-host=localhost -Dflink-master=local +// ./gradlew :runners:flink:1.13:job-server:runShadow -Djob-host=localhost -Dflink-master=local // // 2. The job server is ready to receive jobs once it outputs a log like the // following: `JobService started on localhost:8099`. 
Take note of the endpoint @@ -42,11 +42,11 @@ import ( "reflect" "time" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/core/sdf" - "github.com/apache/beam/sdks/go/pkg/beam/io/rtrackers/offsetrange" - "github.com/apache/beam/sdks/go/pkg/beam/log" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/sdf" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/rtrackers/offsetrange" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" ) func init() { diff --git a/sdks/go/examples/windowed_wordcount/windowed_wordcount.go b/sdks/go/examples/windowed_wordcount/windowed_wordcount.go index f6748ab2fa98..040c659b85aa 100644 --- a/sdks/go/examples/windowed_wordcount/windowed_wordcount.go +++ b/sdks/go/examples/windowed_wordcount/windowed_wordcount.go @@ -42,13 +42,13 @@ import ( "reflect" "time" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window" - "github.com/apache/beam/sdks/go/pkg/beam/io/textio" - "github.com/apache/beam/sdks/go/pkg/beam/log" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" - "github.com/apache/beam/sdks/go/test/integration/wordcount" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/mtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/textio" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/test/integration/wordcount" ) var ( @@ -62,6 +62,7 @@ var ( func init() { beam.RegisterType(reflect.TypeOf((*addTimestampFn)(nil)).Elem()) + beam.RegisterFunction(formatFn) } // Concept #2: A DoFn that sets the data element timestamp. This is a silly method, just for @@ -95,7 +96,7 @@ func main() { ctx := context.Background() if *output == "" { - log.Exit(ctx, "No output provided") + log.Exit(ctx, "No --output provided") } p := beam.NewPipeline() diff --git a/sdks/go/examples/wordcount/wordcount.go b/sdks/go/examples/wordcount/wordcount.go index d41ad010598b..08d9cb57f4e4 100644 --- a/sdks/go/examples/wordcount/wordcount.go +++ b/sdks/go/examples/wordcount/wordcount.go @@ -63,10 +63,10 @@ import ( "regexp" "strings" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/io/textio" - "github.com/apache/beam/sdks/go/pkg/beam/transforms/stats" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/textio" + "github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/stats" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" ) // Concept #2: Defining your own configuration options. Pipeline options can @@ -106,23 +106,33 @@ var ( // done automatically by the starcgen code generator, or it can be done manually // by calling beam.RegisterFunction in an init() call. 
func init() { - beam.RegisterFunction(extractFn) beam.RegisterFunction(formatFn) } var ( - wordRE = regexp.MustCompile(`[a-zA-Z]+('[a-z])?`) - empty = beam.NewCounter("extract", "emptyLines") - lineLen = beam.NewDistribution("extract", "lineLenDistro") + wordRE = regexp.MustCompile(`[a-zA-Z]+('[a-z])?`) + empty = beam.NewCounter("extract", "emptyLines") + small_word_length = flag.Int("small_word_length", 9, "small_word_length") + small_words = beam.NewCounter("extract", "small_words") + lineLen = beam.NewDistribution("extract", "lineLenDistro") ) -// extractFn is a DoFn that emits the words in a given line. -func extractFn(ctx context.Context, line string, emit func(string)) { +// extractFn is a DoFn that emits the words in a given line and keeps a count for small words. +type extractFn struct { + SmallWordLength int `json:"min_length"` +} + +func (f *extractFn) ProcessElement(ctx context.Context, line string, emit func(string)) { lineLen.Update(ctx, int64(len(line))) if len(strings.TrimSpace(line)) == 0 { empty.Inc(ctx, 1) } for _, word := range wordRE.FindAllString(line, -1) { + // increment the counter for small words if length of words is + // less than small_word_length + if len(word) < f.SmallWordLength { + small_words.Inc(ctx, 1) + } emit(word) } } @@ -150,7 +160,7 @@ func CountWords(s beam.Scope, lines beam.PCollection) beam.PCollection { s = s.Scope("CountWords") // Convert lines of text into individual words. - col := beam.ParDo(s, extractFn, lines) + col := beam.ParDo(s, &extractFn{SmallWordLength: *small_word_length}, lines) // Count the number of times each word occurs. return stats.Count(s, col) diff --git a/sdks/go/examples/xlang/cogroup_by/cogroup_by.go b/sdks/go/examples/xlang/cogroup_by/cogroup_by.go index daf4fefa6355..9a98c4ae97c0 100644 --- a/sdks/go/examples/xlang/cogroup_by/cogroup_by.go +++ b/sdks/go/examples/xlang/cogroup_by/cogroup_by.go @@ -30,15 +30,15 @@ import ( "reflect" "sort" - "github.com/apache/beam/sdks/go/examples/xlang" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/testing/passert" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/examples/xlang" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" // Imports to enable correct filesystem access and runner setup in LOOPBACK mode - _ "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/gcs" - _ "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/local" - _ "github.com/apache/beam/sdks/go/pkg/beam/runners/universal" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/gcs" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/local" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/universal" ) var ( diff --git a/sdks/go/examples/xlang/combine/combine.go b/sdks/go/examples/xlang/combine/combine.go index 7b528c207845..d522dfaa1303 100644 --- a/sdks/go/examples/xlang/combine/combine.go +++ b/sdks/go/examples/xlang/combine/combine.go @@ -29,15 +29,15 @@ import ( "log" "reflect" - "github.com/apache/beam/sdks/go/examples/xlang" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/testing/passert" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/examples/xlang" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" + 
"github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" // Imports to enable correct filesystem access and runner setup in LOOPBACK mode - _ "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/gcs" - _ "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/local" - _ "github.com/apache/beam/sdks/go/pkg/beam/runners/universal" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/gcs" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/local" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/universal" ) var ( diff --git a/sdks/go/examples/xlang/combine_globally/combine_globally.go b/sdks/go/examples/xlang/combine_globally/combine_globally.go index 8055af60d7c1..cdb9a6977aa5 100644 --- a/sdks/go/examples/xlang/combine_globally/combine_globally.go +++ b/sdks/go/examples/xlang/combine_globally/combine_globally.go @@ -28,15 +28,15 @@ import ( "fmt" "log" - "github.com/apache/beam/sdks/go/examples/xlang" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/testing/passert" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/examples/xlang" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" // Imports to enable correct filesystem access and runner setup in LOOPBACK mode - _ "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/gcs" - _ "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/local" - _ "github.com/apache/beam/sdks/go/pkg/beam/runners/universal" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/gcs" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/local" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/universal" ) var ( diff --git a/sdks/go/examples/xlang/flatten/flatten.go b/sdks/go/examples/xlang/flatten/flatten.go index 546e82782ecc..ab3818187bc5 100644 --- a/sdks/go/examples/xlang/flatten/flatten.go +++ b/sdks/go/examples/xlang/flatten/flatten.go @@ -28,15 +28,15 @@ import ( "fmt" "log" - "github.com/apache/beam/sdks/go/examples/xlang" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/testing/passert" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/examples/xlang" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" // Imports to enable correct filesystem access and runner setup in LOOPBACK mode - _ "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/gcs" - _ "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/local" - _ "github.com/apache/beam/sdks/go/pkg/beam/runners/universal" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/gcs" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/local" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/universal" ) var ( diff --git a/sdks/go/examples/xlang/group_by/group_by.go b/sdks/go/examples/xlang/group_by/group_by.go index 2a381fad2901..6748fc261057 100644 --- a/sdks/go/examples/xlang/group_by/group_by.go +++ b/sdks/go/examples/xlang/group_by/group_by.go @@ -30,15 +30,15 @@ import ( "reflect" "sort" - "github.com/apache/beam/sdks/go/examples/xlang" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/testing/passert" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/examples/xlang" + 
"github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" // Imports to enable correct filesystem access and runner setup in LOOPBACK mode - _ "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/gcs" - _ "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/local" - _ "github.com/apache/beam/sdks/go/pkg/beam/runners/universal" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/gcs" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/local" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/universal" ) var ( diff --git a/sdks/go/examples/xlang/multi_input_output/multi.go b/sdks/go/examples/xlang/multi_input_output/multi.go index eb65c671efb5..3270669f8241 100644 --- a/sdks/go/examples/xlang/multi_input_output/multi.go +++ b/sdks/go/examples/xlang/multi_input_output/multi.go @@ -28,15 +28,15 @@ import ( "flag" "log" - "github.com/apache/beam/sdks/go/examples/xlang" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/testing/passert" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/examples/xlang" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" // Imports to enable correct filesystem access and runner setup in LOOPBACK mode - _ "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/gcs" - _ "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/local" - _ "github.com/apache/beam/sdks/go/pkg/beam/runners/universal" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/gcs" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/local" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/universal" ) var ( diff --git a/sdks/go/examples/xlang/partition/partition.go b/sdks/go/examples/xlang/partition/partition.go index 6a5bf766ac59..65a0b3aa3a89 100644 --- a/sdks/go/examples/xlang/partition/partition.go +++ b/sdks/go/examples/xlang/partition/partition.go @@ -28,15 +28,15 @@ import ( "fmt" "log" - "github.com/apache/beam/sdks/go/examples/xlang" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/testing/passert" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/examples/xlang" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" // Imports to enable correct filesystem access and runner setup in LOOPBACK mode - _ "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/gcs" - _ "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/local" - _ "github.com/apache/beam/sdks/go/pkg/beam/runners/universal" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/gcs" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/local" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/universal" ) var ( diff --git a/sdks/go/examples/xlang/transforms.go b/sdks/go/examples/xlang/transforms.go index c9166ce643a5..3d410e8ac88a 100644 --- a/sdks/go/examples/xlang/transforms.go +++ b/sdks/go/examples/xlang/transforms.go @@ -17,32 +17,62 @@ package xlang import ( - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + 
"reflect" + + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) +func init() { + beam.RegisterType(reflect.TypeOf((*prefixPayload)(nil)).Elem()) +} + +// prefixPayload is a struct used to represent the payload of the Prefix +// cross-language transform. +// +// This must match the struct that the expansion service is expecting to +// receive. For example, at the time of writing this comment, that struct is +// the one in the following link. +// https://github.com/apache/beam/blob/v2.29.0/sdks/java/testing/expansion-service/src/test/java/org/apache/beam/sdk/testing/expansion/TestExpansionService.java#L191 +type prefixPayload struct { + Data string +} + +// Prefix wraps a cross-language transform call to the Prefix transform. This +// transform takes a PCollection of strings as input, and a payload defining a +// prefix string, and appends that as prefix to each input string. +// +// This serves as an example of a cross-language transform with a payload. +func Prefix(s beam.Scope, prefix string, addr string, col beam.PCollection) beam.PCollection { + s = s.Scope("XLangTest.Prefix") + + pl := beam.CrossLanguagePayload(prefixPayload{Data: prefix}) + outT := beam.UnnamedOutput(typex.New(reflectx.String)) + outs := beam.CrossLanguage(s, "beam:transforms:xlang:test:prefix", pl, addr, beam.UnnamedInput(col), outT) + return outs[beam.UnnamedOutputTag()] +} + func CoGroupByKey(s beam.Scope, addr string, col1, col2 beam.PCollection) beam.PCollection { s = s.Scope("XLangTest.CoGroupByKey") namedInputs := map[string]beam.PCollection{"col1": col1, "col2": col2} outT := beam.UnnamedOutput(typex.NewCoGBK(typex.New(reflectx.Int64), typex.New(reflectx.String))) outs := beam.CrossLanguage(s, "beam:transforms:xlang:test:cgbk", nil, addr, namedInputs, outT) - return outs[graph.UnnamedOutputTag] + return outs[beam.UnnamedOutputTag()] } func CombinePerKey(s beam.Scope, addr string, col beam.PCollection) beam.PCollection { s = s.Scope("XLangTest.CombinePerKey") outT := beam.UnnamedOutput(typex.NewKV(typex.New(reflectx.String), typex.New(reflectx.Int64))) outs := beam.CrossLanguage(s, "beam:transforms:xlang:test:compk", nil, addr, beam.UnnamedInput(col), outT) - return outs[graph.UnnamedOutputTag] + return outs[beam.UnnamedOutputTag()] } func CombineGlobally(s beam.Scope, addr string, col beam.PCollection) beam.PCollection { s = s.Scope("XLangTest.CombineGlobally") outT := beam.UnnamedOutput(typex.New(reflectx.Int64)) outs := beam.CrossLanguage(s, "beam:transforms:xlang:test:comgl", nil, addr, beam.UnnamedInput(col), outT) - return outs[graph.UnnamedOutputTag] + return outs[beam.UnnamedOutputTag()] } func Flatten(s beam.Scope, addr string, col1, col2 beam.PCollection) beam.PCollection { @@ -50,14 +80,14 @@ func Flatten(s beam.Scope, addr string, col1, col2 beam.PCollection) beam.PColle namedInputs := map[string]beam.PCollection{"col1": col1, "col2": col2} outT := beam.UnnamedOutput(typex.New(reflectx.Int64)) outs := beam.CrossLanguage(s, "beam:transforms:xlang:test:flatten", nil, addr, namedInputs, outT) - return outs[graph.UnnamedOutputTag] + return outs[beam.UnnamedOutputTag()] } func GroupByKey(s beam.Scope, addr string, col beam.PCollection) beam.PCollection { s = s.Scope("XLangTest.GroupByKey") outT := beam.UnnamedOutput(typex.NewCoGBK(typex.New(reflectx.String), typex.New(reflectx.Int64))) outs := beam.CrossLanguage(s, "beam:transforms:xlang:test:gbk", nil, addr, beam.UnnamedInput(col), 
outT) - return outs[graph.UnnamedOutputTag] + return outs[beam.UnnamedOutputTag()] } func Multi(s beam.Scope, addr string, main1, main2, side beam.PCollection) (mainOut, sideOut beam.PCollection) { @@ -81,5 +111,5 @@ func Count(s beam.Scope, addr string, col beam.PCollection) beam.PCollection { s = s.Scope("XLang.Count") outT := beam.UnnamedOutput(typex.NewKV(typex.New(reflectx.String), typex.New(reflectx.Int64))) c := beam.CrossLanguage(s, "beam:transforms:xlang:count", nil, addr, beam.UnnamedInput(col), outT) - return c[graph.UnnamedOutputTag] + return c[beam.UnnamedOutputTag()] } diff --git a/sdks/go/examples/xlang/wordcount/wordcount.go b/sdks/go/examples/xlang/wordcount/wordcount.go index 1c6ae1630935..a73dff52052c 100644 --- a/sdks/go/examples/xlang/wordcount/wordcount.go +++ b/sdks/go/examples/xlang/wordcount/wordcount.go @@ -31,15 +31,15 @@ import ( "regexp" "strings" - "github.com/apache/beam/sdks/go/examples/xlang" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/testing/passert" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/examples/xlang" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" // Imports to enable correct filesystem access and runner setup in LOOPBACK mode - _ "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/gcs" - _ "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/local" - _ "github.com/apache/beam/sdks/go/pkg/beam/runners/universal" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/gcs" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/local" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/universal" ) var ( diff --git a/sdks/go/examples/yatzy/yatzy.go b/sdks/go/examples/yatzy/yatzy.go index 07117e573700..59c2a66ab303 100644 --- a/sdks/go/examples/yatzy/yatzy.go +++ b/sdks/go/examples/yatzy/yatzy.go @@ -27,9 +27,9 @@ import ( "sort" "time" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/log" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" ) var ( diff --git a/sdks/go/gogradle.lock b/sdks/go/gogradle.lock deleted file mode 100644 index d3f0901b53b8..000000000000 --- a/sdks/go/gogradle.lock +++ /dev/null @@ -1,716 +0,0 @@ -# This file is generated by gogradle automatically, you should NEVER modify it manually. 
---- -apiVersion: "0.11.4" -dependencies: - build: - - vcs: "git" - name: "cloud.google.com/go" - commit: "03869a08dc16b35ad4968e92d34c5a2a2961b205" - url: "https://code.googlesource.com/gocloud" - transitive: false - - urls: - - "https://github.com/Shopify/sarama.git" - - "git@github.com:Shopify/sarama.git" - vcs: "git" - name: "github.com/Shopify/sarama" - commit: "541689b9f4212043471eb537fa72da507025d3ea" - transitive: false - - urls: - - "https://github.com/armon/consul-api.git" - - "git@github.com:armon/consul-api.git" - vcs: "git" - name: "github.com/armon/consul-api" - commit: "eb2c6b5be1b66bab83016e0b05f01b8d5496ffbd" - transitive: false - - name: "github.com/beorn7/perks" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/github.com/beorn7/perks" - transitive: false - - name: "github.com/bgentry/speakeasy" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/github.com/bgentry/speakeasy" - transitive: false - - name: "github.com/coreos/bbolt" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/github.com/coreos/bbolt" - transitive: false - - name: "github.com/coreos/go-semver" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/github.com/coreos/go-semver" - transitive: false - - name: "github.com/coreos/go-systemd" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/github.com/coreos/go-systemd" - transitive: false - - name: "github.com/coreos/pkg" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/github.com/coreos/pkg" - transitive: false - - urls: - - "https://github.com/cpuguy83/go-md2man.git" - - "git@github.com:cpuguy83/go-md2man.git" - vcs: "git" - name: "github.com/cpuguy83/go-md2man" - commit: "dc9f53734905c233adfc09fd4f063dce63ce3daf" - transitive: false - - urls: - - "https://github.com/davecgh/go-spew.git" - - "git@github.com:davecgh/go-spew.git" - vcs: "git" - name: "github.com/davecgh/go-spew" - commit: "87df7c60d5820d0f8ae11afede5aa52325c09717" - transitive: false - - name: "github.com/dgrijalva/jwt-go" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/github.com/dgrijalva/jwt-go" - transitive: false - - name: "github.com/dustin/go-humanize" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/github.com/dustin/go-humanize" - transitive: false - - urls: - - 
"https://github.com/eapache/go-resiliency.git" - - "git@github.com:eapache/go-resiliency.git" - vcs: "git" - name: "github.com/eapache/go-resiliency" - commit: "ef9aaa7ea8bd2448429af1a77cf41b2b3b34bdd6" - transitive: false - - urls: - - "https://github.com/eapache/go-xerial-snappy.git" - - "git@github.com:eapache/go-xerial-snappy.git" - vcs: "git" - name: "github.com/eapache/go-xerial-snappy" - commit: "bb955e01b9346ac19dc29eb16586c90ded99a98c" - transitive: false - - urls: - - "https://github.com/eapache/queue.git" - - "git@github.com:eapache/queue.git" - vcs: "git" - name: "github.com/eapache/queue" - commit: "44cc805cf13205b55f69e14bcb69867d1ae92f98" - transitive: false - - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - transitive: false - - urls: - - "https://github.com/fsnotify/fsnotify.git" - - "git@github.com:fsnotify/fsnotify.git" - vcs: "git" - name: "github.com/fsnotify/fsnotify" - commit: "c2828203cd70a50dcccfb2761f8b1f8ceef9a8e9" - transitive: false - - name: "github.com/ghodss/yaml" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/github.com/ghodss/yaml" - transitive: false - - name: "github.com/gogo/protobuf" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/github.com/gogo/protobuf" - transitive: false - - urls: - - "https://github.com/golang/glog.git" - - "git@github.com:golang/glog.git" - vcs: "git" - name: "github.com/golang/glog" - commit: "23def4e6c14b4da8ac2ed8007337bc5eb5007998" - transitive: false - - name: "github.com/golang/groupcache" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/github.com/golang/groupcache" - transitive: false - - urls: - - "https://github.com/golang/mock.git" - - "git@github.com:golang/mock.git" - vcs: "git" - name: "github.com/golang/mock" - commit: "b3e60bcdc577185fce3cf625fc96b62857ce5574" - transitive: false - - urls: - - "https://github.com/golang/protobuf.git" - - "git@github.com:golang/protobuf.git" - vcs: "git" - name: "github.com/golang/protobuf" - commit: "d04d7b157bb510b1e0c10132224b616ac0e26b17" - transitive: false - - urls: - - "https://github.com/golang/snappy.git" - - "git@github.com:golang/snappy.git" - vcs: "git" - name: "github.com/golang/snappy" - commit: "553a641470496b2327abcac10b36396bd98e45c9" - transitive: false - - name: "github.com/google/btree" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/github.com/google/btree" - transitive: false - - urls: - - "https://github.com/google/go-cmp.git" - - "git@github.com:google/go-cmp.git" - vcs: "git" - name: "github.com/google/go-cmp" - commit: "9680bfaf28748393e28e00238d94070fb9972fd8" - transitive: false - - urls: - - "https://github.com/google/pprof.git" - - "git@github.com:google/pprof.git" - vcs: "git" - name: "github.com/google/pprof" - commit: 
"a8f279b7952b27edbcb72e5a6c69ee9be4c8ad93" - transitive: false - - urls: - - "https://github.com/googleapis/gax-go.git" - - "git@github.com:googleapis/gax-go.git" - vcs: "git" - name: "github.com/googleapis/gax-go" - commit: "bd5b16380fd03dc758d11cef74ba2e3bc8b0e8c2" - transitive: false - - name: "github.com/gorilla/websocket" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/github.com/gorilla/websocket" - transitive: false - - name: "github.com/grpc-ecosystem/go-grpc-prometheus" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/github.com/grpc-ecosystem/go-grpc-prometheus" - transitive: false - - name: "github.com/grpc-ecosystem/grpc-gateway" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/github.com/grpc-ecosystem/grpc-gateway" - transitive: false - - urls: - - "https://github.com/hashicorp/hcl.git" - - "git@github.com:hashicorp/hcl.git" - vcs: "git" - name: "github.com/hashicorp/hcl" - commit: "23c074d0eceb2b8a5bfdbb271ab780cde70f05a8" - transitive: false - - urls: - - "https://github.com/ianlancetaylor/demangle.git" - - "git@github.com:ianlancetaylor/demangle.git" - vcs: "git" - name: "github.com/ianlancetaylor/demangle" - commit: "4883227f66371e02c4948937d3e2be1664d9be38" - transitive: false - - name: "github.com/inconshreveable/mousetrap" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/github.com/inconshreveable/mousetrap" - transitive: false - - name: "github.com/jonboulle/clockwork" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/github.com/jonboulle/clockwork" - transitive: false - - urls: - - "https://github.com/kr/fs.git" - - "git@github.com:kr/fs.git" - vcs: "git" - name: "github.com/kr/fs" - commit: "2788f0dbd16903de03cb8186e5c7d97b69ad387b" - transitive: false - - name: "github.com/kr/pty" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/github.com/kr/pty" - transitive: false - - urls: - - "https://github.com/linkedin/goavro.git" - - "git@github.com:linkedin/goavro.git" - vcs: "git" - name: "github.com/linkedin/goavro" - commit: "94a7a9db615f35a39dd2b82089398b92fabad2ba" - transitive: false - - urls: - - "https://github.com/magiconair/properties.git" - - "git@github.com:magiconair/properties.git" - vcs: "git" - name: "github.com/magiconair/properties" - commit: "49d762b9817ba1c2e9d0c69183c2b4a8b8f1d934" - transitive: false - - name: "github.com/mattn/go-runewidth" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: 
"vendor/github.com/mattn/go-runewidth" - transitive: false - - name: "github.com/matttproud/golang_protobuf_extensions" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/github.com/matttproud/golang_protobuf_extensions" - transitive: false - - urls: - - "https://github.com/mitchellh/go-homedir.git" - - "git@github.com:mitchellh/go-homedir.git" - vcs: "git" - name: "github.com/mitchellh/go-homedir" - commit: "b8bc1bf767474819792c23f32d8286a45736f1c6" - transitive: false - - urls: - - "https://github.com/mitchellh/mapstructure.git" - - "git@github.com:mitchellh/mapstructure.git" - vcs: "git" - name: "github.com/mitchellh/mapstructure" - commit: "a4e142e9c047c904fa2f1e144d9a84e6133024bc" - transitive: false - - urls: - - "https://github.com/nightlyone/lockfile" - - "git@github.com:nightlyone/lockfile.git" - vcs: "git" - name: "github.com/nightlyone/lockfile" - commit: "0ad87eef1443f64d3d8c50da647e2b1552851124" - transitive: false - - name: "github.com/olekukonko/tablewriter" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/github.com/olekukonko/tablewriter" - transitive: false - - urls: - - "https://github.com/openzipkin/zipkin-go.git" - - "git@github.com:openzipkin/zipkin-go.git" - vcs: "git" - name: "github.com/openzipkin/zipkin-go" - commit: "3741243b287094fda649c7f0fa74bd51f37dc122" - transitive: false - - urls: - - "https://github.com/pelletier/go-toml.git" - - "git@github.com:pelletier/go-toml.git" - vcs: "git" - name: "github.com/pelletier/go-toml" - commit: "acdc4509485b587f5e675510c4f2c63e90ff68a8" - transitive: false - - name: "github.com/petar/GoLLRB" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/github.com/petar/GoLLRB" - transitive: false - - urls: - - "https://github.com/pierrec/lz4.git" - - "git@github.com:pierrec/lz4.git" - vcs: "git" - name: "github.com/pierrec/lz4" - commit: "ed8d4cc3b461464e69798080a0092bd028910298" - transitive: false - - urls: - - "https://github.com/pierrec/xxHash.git" - - "git@github.com:pierrec/xxHash.git" - vcs: "git" - name: "github.com/pierrec/xxHash" - commit: "a0006b13c722f7f12368c00a3d3c2ae8a999a0c6" - transitive: false - - urls: - - "https://github.com/pkg/errors.git" - - "git@github.com:pkg/errors.git" - vcs: "git" - name: "github.com/pkg/errors" - commit: "30136e27e2ac8d167177e8a583aa4c3fea5be833" - transitive: false - - urls: - - "https://github.com/pkg/sftp.git" - - "git@github.com:pkg/sftp.git" - vcs: "git" - name: "github.com/pkg/sftp" - commit: "22e9c1ccc02fc1b9fa3264572e49109b68a86947" - transitive: false - - urls: - - "https://github.com/prometheus/client_golang.git" - - "git@github.com:prometheus/client_golang.git" - vcs: "git" - name: "github.com/prometheus/client_golang" - commit: "9bb6ab929dcbe1c8393cd9ef70387cb69811bd1c" - transitive: false - - name: "github.com/prometheus/client_model" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: 
"vendor/github.com/prometheus/client_model" - transitive: false - - name: "github.com/prometheus/common" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/github.com/prometheus/common" - transitive: false - - urls: - - "https://github.com/prometheus/procfs.git" - - "git@github.com:prometheus/procfs.git" - vcs: "git" - name: "github.com/prometheus/procfs" - commit: "cb4147076ac75738c9a7d279075a253c0cc5acbd" - transitive: false - - urls: - - "https://github.com/rcrowley/go-metrics.git" - - "git@github.com:rcrowley/go-metrics.git" - vcs: "git" - name: "github.com/rcrowley/go-metrics" - commit: "8732c616f52954686704c8645fe1a9d59e9df7c1" - transitive: false - - name: "github.com/russross/blackfriday" - host: - name: "github.com/cpuguy83/go-md2man" - commit: "dc9f53734905c233adfc09fd4f063dce63ce3daf" - urls: - - "https://github.com/cpuguy83/go-md2man.git" - - "git@github.com:cpuguy83/go-md2man.git" - vcs: "git" - vendorPath: "vendor/github.com/russross/blackfriday" - transitive: false - - name: "github.com/shurcooL/sanitized_anchor_name" - host: - name: "github.com/cpuguy83/go-md2man" - commit: "dc9f53734905c233adfc09fd4f063dce63ce3daf" - urls: - - "https://github.com/cpuguy83/go-md2man.git" - - "git@github.com:cpuguy83/go-md2man.git" - vcs: "git" - vendorPath: "vendor/github.com/shurcooL/sanitized_anchor_name" - transitive: false - - name: "github.com/sirupsen/logrus" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/github.com/sirupsen/logrus" - transitive: false - - name: "github.com/soheilhy/cmux" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/github.com/soheilhy/cmux" - transitive: false - - urls: - - "https://github.com/spf13/afero.git" - - "git@github.com:spf13/afero.git" - vcs: "git" - name: "github.com/spf13/afero" - commit: "bb8f1927f2a9d3ab41c9340aa034f6b803f4359c" - transitive: false - - urls: - - "https://github.com/spf13/cast.git" - - "git@github.com:spf13/cast.git" - vcs: "git" - name: "github.com/spf13/cast" - commit: "acbeb36b902d72a7a4c18e8f3241075e7ab763e4" - transitive: false - - urls: - - "https://github.com/spf13/cobra.git" - - "git@github.com:spf13/cobra.git" - vcs: "git" - name: "github.com/spf13/cobra" - commit: "93959269ad99e80983c9ba742a7e01203a4c0e4f" - transitive: false - - urls: - - "https://github.com/spf13/jwalterweatherman.git" - - "git@github.com:spf13/jwalterweatherman.git" - vcs: "git" - name: "github.com/spf13/jwalterweatherman" - commit: "7c0cea34c8ece3fbeb2b27ab9b59511d360fb394" - transitive: false - - name: "github.com/spf13/pflag" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/github.com/spf13/pflag" - transitive: false - - urls: - - "https://github.com/spf13/viper.git" - - "git@github.com:spf13/viper.git" - vcs: "git" - name: "github.com/spf13/viper" - commit: "aafc9e6bc7b7bb53ddaa75a5ef49a17d6e654be5" - transitive: false - - urls: - - "https://github.com/stathat/go.git" - - 
"git@github.com:stathat/go.git" - vcs: "git" - name: "github.com/stathat/go" - commit: "74669b9f388d9d788c97399a0824adbfee78400e" - transitive: false - - name: "github.com/tmc/grpc-websocket-proxy" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/github.com/tmc/grpc-websocket-proxy" - transitive: false - - name: "github.com/ugorji/go" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/github.com/ugorji/go" - transitive: false - - name: "github.com/urfave/cli" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/github.com/urfave/cli" - transitive: false - - name: "github.com/xiang90/probing" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/github.com/xiang90/probing" - transitive: false - - urls: - - "https://github.com/xordataexchange/crypt.git" - - "git@github.com:xordataexchange/crypt.git" - vcs: "git" - name: "github.com/xordataexchange/crypt" - commit: "b2862e3d0a775f18c7cfe02273500ae307b61218" - transitive: false - - vcs: "git" - name: "go.opencensus.io" - commit: "3fb168f674736c026e623310bfccb0691e6dec8a" - url: "https://github.com/census-instrumentation/opencensus-go" - transitive: false - - vcs: "git" - name: "golang.org/x/crypto" - commit: "d9133f5469342136e669e85192a26056b587f503" - url: "https://go.googlesource.com/crypto" - transitive: false - - vcs: "git" - name: "golang.org/x/debug" - commit: "95515998a8a4bd7448134b2cb5971dbeb12e0b77" - url: "https://go.googlesource.com/debug" - transitive: false - - vcs: "git" - name: "golang.org/x/net" - commit: "2fb46b16b8dda405028c50f7c7f0f9dd1fa6bfb1" - url: "https://go.googlesource.com/net" - transitive: false - - vcs: "git" - name: "golang.org/x/oauth2" - commit: "5d25da1a8d43b66f2898c444f899c7bcfd6a407e" - url: "https://go.googlesource.com/oauth2" - transitive: false - - vcs: "git" - name: "golang.org/x/sync" - commit: "fd80eb99c8f653c847d294a001bdf2a3a6f768f5" - url: "https://go.googlesource.com/sync" - transitive: false - - vcs: "git" - name: "golang.org/x/sys" - commit: "fde4db37ae7ad8191b03d30d27f258b5291ae4e3" - url: "https://go.googlesource.com/sys" - transitive: false - - vcs: "git" - name: "golang.org/x/text" - commit: "23ae387dee1f90d29a23c0e87ee0b46038fbed0e" - url: "https://go.googlesource.com/text" - transitive: false - - name: "golang.org/x/time" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/golang.org/x/time" - transitive: false - - vcs: "git" - name: "google.golang.org/api" - commit: "0324d5e90dc7753607860272666845fad9ceb97e" - url: "https://code.googlesource.com/google-api-go-client" - transitive: false - - vcs: "git" - name: "google.golang.org/genproto" - commit: "4d944d34d83c502a5f761500a14d8842648415c3" - url: "https://github.com/google/go-genproto" - transitive: false - - vcs: "git" - 
name: "google.golang.org/grpc" - commit: "5e8f83304c0563d1ba74db05fee83d9c18ab9a58" - url: "https://github.com/grpc/grpc-go" - transitive: false - - vcs: "git" - name: "google.golang.org/protobuf" - commit: "d165be301fb1e13390ad453281ded24385fd8ebc" - url: "https://go.googlesource.com/protobuf" - transitive: false - - name: "gopkg.in/cheggaaa/pb.v1" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/gopkg.in/cheggaaa/pb.v1" - transitive: false - - name: "gopkg.in/yaml.v2" - host: - name: "github.com/etcd-io/etcd" - commit: "11214aa33bf5a47d3d9d8dafe0f6b97237dfe921" - urls: - - "https://github.com/etcd-io/etcd.git" - - "git@github.com:etcd-io/etcd.git" - vcs: "git" - vendorPath: "vendor/gopkg.in/yaml.v2" - transitive: false - test: - - vcs: "git" - name: "golang.org/x/xerrors" - commit: "5ec99f83aff198f5fbd629d6c8d8eb38a04218ca" - url: "https://go.googlesource.com/xerrors" - transitive: false diff --git a/sdks/go/pkg/beam/artifact/gcsproxy/retrieval.go b/sdks/go/pkg/beam/artifact/gcsproxy/retrieval.go index 3c7da4317c02..15c2d9e2954a 100644 --- a/sdks/go/pkg/beam/artifact/gcsproxy/retrieval.go +++ b/sdks/go/pkg/beam/artifact/gcsproxy/retrieval.go @@ -19,9 +19,9 @@ import ( "io" "cloud.google.com/go/storage" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - jobpb "github.com/apache/beam/sdks/go/pkg/beam/model/jobmanagement_v1" - "github.com/apache/beam/sdks/go/pkg/beam/util/gcsx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + jobpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/jobmanagement_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/gcsx" "github.com/golang/protobuf/proto" "golang.org/x/net/context" ) diff --git a/sdks/go/pkg/beam/artifact/gcsproxy/staging.go b/sdks/go/pkg/beam/artifact/gcsproxy/staging.go index 095450eae579..a29508439807 100644 --- a/sdks/go/pkg/beam/artifact/gcsproxy/staging.go +++ b/sdks/go/pkg/beam/artifact/gcsproxy/staging.go @@ -25,9 +25,9 @@ import ( "sync" "cloud.google.com/go/storage" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - jobpb "github.com/apache/beam/sdks/go/pkg/beam/model/jobmanagement_v1" - "github.com/apache/beam/sdks/go/pkg/beam/util/gcsx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + jobpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/jobmanagement_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/gcsx" "github.com/golang/protobuf/proto" "golang.org/x/net/context" ) diff --git a/sdks/go/pkg/beam/artifact/materialize.go b/sdks/go/pkg/beam/artifact/materialize.go index 72556812915e..694f46f8459f 100644 --- a/sdks/go/pkg/beam/artifact/materialize.go +++ b/sdks/go/pkg/beam/artifact/materialize.go @@ -22,25 +22,30 @@ import ( "crypto/sha256" "encoding/hex" "io" + "log" "math/rand" "os" "path/filepath" + "strconv" "strings" "sync" + "sync/atomic" "time" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - jobpb "github.com/apache/beam/sdks/go/pkg/beam/model/jobmanagement_v1" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" - "github.com/apache/beam/sdks/go/pkg/beam/util/errorx" - "github.com/apache/beam/sdks/go/pkg/beam/util/grpcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + jobpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/jobmanagement_v1" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" + 
"github.com/apache/beam/sdks/v2/go/pkg/beam/util/errorx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/grpcx" "github.com/golang/protobuf/proto" ) // TODO(lostluck): 2018/05/28 Extract these from their enum descriptors in the pipeline_v1 proto const ( - URNStagingTo = "beam:artifact:role:staging_to:v1" - NoArtifactsStaged = "__no_artifacts_staged__" + URNFileArtifact = "beam:artifact:type:file:v1" + URNPipRequirementsFile = "beam:artifact:role:pip_requirements_file:v1" + URNStagingTo = "beam:artifact:role:staging_to:v1" + NoArtifactsStaged = "__no_artifacts_staged__" ) // Materialize is a convenience helper for ensuring that all artifacts are @@ -49,17 +54,17 @@ const ( // present. // TODO(BEAM-9577): Return a mapping of filename to dependency, rather than []*jobpb.ArtifactMetadata. // TODO(BEAM-9577): Leverage richness of roles rather than magic names to understand artifacts. -func Materialize(ctx context.Context, endpoint string, dependencies []*pipepb.ArtifactInformation, rt string, dest string) ([]*jobpb.ArtifactMetadata, error) { +func Materialize(ctx context.Context, endpoint string, dependencies []*pipepb.ArtifactInformation, rt string, dest string) ([]*pipepb.ArtifactInformation, error) { if len(dependencies) > 0 { return newMaterialize(ctx, endpoint, dependencies, dest) } else if rt == "" || rt == NoArtifactsStaged { - return []*jobpb.ArtifactMetadata{}, nil + return []*pipepb.ArtifactInformation{}, nil } else { return legacyMaterialize(ctx, endpoint, rt, dest) } } -func newMaterialize(ctx context.Context, endpoint string, dependencies []*pipepb.ArtifactInformation, dest string) ([]*jobpb.ArtifactMetadata, error) { +func newMaterialize(ctx context.Context, endpoint string, dependencies []*pipepb.ArtifactInformation, dest string) ([]*pipepb.ArtifactInformation, error) { cc, err := grpcx.Dial(ctx, endpoint, 2*time.Minute) if err != nil { return nil, err @@ -69,41 +74,97 @@ func newMaterialize(ctx context.Context, endpoint string, dependencies []*pipepb return newMaterializeWithClient(ctx, jobpb.NewArtifactRetrievalServiceClient(cc), dependencies, dest) } -func newMaterializeWithClient(ctx context.Context, client jobpb.ArtifactRetrievalServiceClient, dependencies []*pipepb.ArtifactInformation, dest string) ([]*jobpb.ArtifactMetadata, error) { +func newMaterializeWithClient(ctx context.Context, client jobpb.ArtifactRetrievalServiceClient, dependencies []*pipepb.ArtifactInformation, dest string) ([]*pipepb.ArtifactInformation, error) { resolution, err := client.ResolveArtifacts(ctx, &jobpb.ResolveArtifactsRequest{Artifacts: dependencies}) if err != nil { return nil, err } - var md []*jobpb.ArtifactMetadata + var artifacts []*pipepb.ArtifactInformation var list []retrievable for _, dep := range resolution.Replacements { path, err := extractStagingToPath(dep) if err != nil { return nil, err } - md = append(md, &jobpb.ArtifactMetadata{ - Name: path, + filePayload := pipepb.ArtifactFilePayload{ + Path: path, + } + if dep.TypeUrn == URNFileArtifact { + typePayload := pipepb.ArtifactFilePayload{} + if err := proto.Unmarshal(dep.TypePayload, &typePayload); err != nil { + return nil, errors.Wrap(err, "failed to parse artifact file payload") + } + filePayload.Sha256 = typePayload.Sha256 + } + newTypePayload, err := proto.Marshal(&filePayload) + if err != nil { + return nil, errors.Wrap(err, "failed to create artifact type payload") + } + artifacts = append(artifacts, &pipepb.ArtifactInformation{ + TypeUrn: URNFileArtifact, + TypePayload: newTypePayload, + RoleUrn: dep.RoleUrn, + 
RolePayload: dep.RolePayload, }) + rolePayload, err := proto.Marshal(&pipepb.ArtifactStagingToRolePayload{ + StagedName: path, + }) + if err != nil { + return nil, errors.Wrap(err, "failed to create artifact role payload") + } list = append(list, &artifact{ client: client, - dep: dep, + dep: &pipepb.ArtifactInformation{ + TypeUrn: dep.TypeUrn, + TypePayload: dep.TypePayload, + RoleUrn: URNStagingTo, + RolePayload: rolePayload, + }, }) } - return md, MultiRetrieve(ctx, 10, list, dest) + return artifacts, MultiRetrieve(ctx, 10, list, dest) +} + +// Used for generating unique IDs. We assign uniquely generated names to staged files without staging names. +var idCounter uint64 + +func generateId() string { + id := atomic.AddUint64(&idCounter, 1) + return strconv.FormatUint(id, 10) } func extractStagingToPath(artifact *pipepb.ArtifactInformation) (string, error) { - if artifact.RoleUrn != URNStagingTo { - return "", errors.Errorf("Unsupported artifact role %s", artifact.RoleUrn) + var stagedName string + if artifact.RoleUrn == URNStagingTo { + role := pipepb.ArtifactStagingToRolePayload{} + if err := proto.Unmarshal(artifact.RolePayload, &role); err != nil { + return "", err + } + stagedName = role.StagedName + } else if artifact.TypeUrn == URNFileArtifact { + ty := pipepb.ArtifactFilePayload{} + if err := proto.Unmarshal(artifact.TypePayload, &ty); err != nil { + return "", err + } + stagedName = generateId() + "-" + filepath.Base(ty.Path) + } else { + return "", errors.Errorf("failed to extract staging path for artifact type %v role %v", artifact.TypeUrn, artifact.RoleUrn) } - role := pipepb.ArtifactStagingToRolePayload{} - if err := proto.Unmarshal(artifact.RolePayload, &role); err != nil { - return "", err + return stagedName, nil +} + +func MustExtractFilePayload(artifact *pipepb.ArtifactInformation) (string, string) { + if artifact.TypeUrn != URNFileArtifact { + log.Fatalf("Unsupported artifact type #{artifact.TypeUrn}") } - return role.StagedName, nil + ty := pipepb.ArtifactFilePayload{} + if err := proto.Unmarshal(artifact.TypePayload, &ty); err != nil { + log.Fatalf("failed to parse artifact file payload: #{err}") + } + return ty.Path, ty.Sha256 } type artifact struct { @@ -143,7 +204,7 @@ func (a artifact) retrieve(ctx context.Context, dest string) error { } w := bufio.NewWriter(fd) - err = writeChunks(stream, w) + sha256Hash, err := writeChunks(stream, w) if err != nil { fd.Close() // drop any buffered content return errors.Wrapf(err, "failed to retrieve chunk for %v", filename) @@ -152,27 +213,33 @@ func (a artifact) retrieve(ctx context.Context, dest string) error { fd.Close() return errors.Wrapf(err, "failed to flush chunks for %v", filename) } + stat, _ := fd.Stat() + log.Printf("Downloaded: %v (sha256: %v, size: %v)", filename, sha256Hash, stat.Size()) + return fd.Close() } -func writeChunks(stream jobpb.ArtifactRetrievalService_GetArtifactClient, w io.Writer) error { +func writeChunks(stream jobpb.ArtifactRetrievalService_GetArtifactClient, w io.Writer) (string, error) { + sha256W := sha256.New() for { chunk, err := stream.Recv() if err == io.EOF { break } if err != nil { - return err + return "", err + } + if _, err := sha256W.Write(chunk.Data); err != nil { + panic(err) // cannot fail } - if _, err := w.Write(chunk.Data); err != nil { - return errors.Wrapf(err, "chunk write failed") + return "", errors.Wrapf(err, "chunk write failed") } } - return nil + return hex.EncodeToString(sha256W.Sum(nil)), nil } -func legacyMaterialize(ctx context.Context, endpoint string, rt string, 
dest string) ([]*jobpb.ArtifactMetadata, error) { +func legacyMaterialize(ctx context.Context, endpoint string, rt string, dest string) ([]*pipepb.ArtifactInformation, error) { cc, err := grpcx.Dial(ctx, endpoint, 2*time.Minute) if err != nil { return nil, err @@ -187,8 +254,28 @@ func legacyMaterialize(ctx context.Context, endpoint string, rt string, dest str } mds := m.GetManifest().GetArtifact() + var artifacts []*pipepb.ArtifactInformation var list []retrievable for _, md := range mds { + typePayload, err := proto.Marshal(&pipepb.ArtifactFilePayload{ + Path: md.Name, + Sha256: md.Sha256, + }) + if err != nil { + return nil, errors.Wrap(err, "failed to create artifact type payload") + } + rolePayload, err := proto.Marshal(&pipepb.ArtifactStagingToRolePayload{ + StagedName: md.Name, + }) + if err != nil { + return nil, errors.Wrap(err, "failed to create artifact role payload") + } + artifacts = append(artifacts, &pipepb.ArtifactInformation{ + TypeUrn: URNFileArtifact, + TypePayload: typePayload, + RoleUrn: URNStagingTo, + RolePayload: rolePayload, + }) list = append(list, &legacyArtifact{ client: client, rt: rt, @@ -196,7 +283,7 @@ func legacyMaterialize(ctx context.Context, endpoint string, rt string, dest str }) } - return mds, MultiRetrieve(ctx, 10, list, dest) + return artifacts, MultiRetrieve(ctx, 10, list, dest) } // MultiRetrieve retrieves multiple artifacts concurrently, using at most 'cpus' diff --git a/sdks/go/pkg/beam/artifact/materialize_test.go b/sdks/go/pkg/beam/artifact/materialize_test.go index 53e62ed6578b..8c5702a51366 100644 --- a/sdks/go/pkg/beam/artifact/materialize_test.go +++ b/sdks/go/pkg/beam/artifact/materialize_test.go @@ -25,10 +25,10 @@ import ( "path/filepath" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - jobpb "github.com/apache/beam/sdks/go/pkg/beam/model/jobmanagement_v1" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" - "github.com/apache/beam/sdks/go/pkg/beam/util/grpcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + jobpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/jobmanagement_v1" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/grpcx" "github.com/golang/protobuf/proto" "google.golang.org/grpc" "google.golang.org/grpc/metadata" @@ -212,12 +212,13 @@ func TestNewRetrieveWithResolution(t *testing.T) { checkStagedFiles(mds, dest, expected, t) } -func checkStagedFiles(mds []*jobpb.ArtifactMetadata, dest string, expected map[string]string, t *testing.T) { +func checkStagedFiles(mds []*pipepb.ArtifactInformation, dest string, expected map[string]string, t *testing.T) { if len(mds) != len(expected) { t.Errorf("wrong number of artifacts staged %v vs %v", len(mds), len(expected)) } for _, md := range mds { - filename := filepath.Join(dest, filepath.FromSlash(md.Name)) + name, _ := MustExtractFilePayload(md) + filename := filepath.Join(dest, filepath.FromSlash(name)) fd, err := os.Open(filename) if err != nil { t.Errorf("error opening file %v", err) @@ -230,8 +231,8 @@ func checkStagedFiles(mds []*jobpb.ArtifactMetadata, dest string, expected map[s t.Errorf("error reading file %v", err) } - if string(data[:n]) != expected[md.Name] { - t.Errorf("missmatched contents for %v: '%s' vs '%s'", md.Name, string(data[:n]), expected[md.Name]) + if string(data[:n]) != expected[name] { + t.Errorf("missmatched contents for %v: '%s' vs '%s'", name, string(data[:n]), expected[name]) } } } diff --git 
a/sdks/go/pkg/beam/artifact/server_test.go b/sdks/go/pkg/beam/artifact/server_test.go index b2617f833475..3fdc8fa5cac8 100644 --- a/sdks/go/pkg/beam/artifact/server_test.go +++ b/sdks/go/pkg/beam/artifact/server_test.go @@ -22,9 +22,9 @@ import ( "testing" "time" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - jobpb "github.com/apache/beam/sdks/go/pkg/beam/model/jobmanagement_v1" - "github.com/apache/beam/sdks/go/pkg/beam/util/grpcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + jobpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/jobmanagement_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/grpcx" "golang.org/x/net/context" "google.golang.org/grpc" ) @@ -68,6 +68,9 @@ type manifest struct { // server is a in-memory staging and retrieval artifact server for testing. type server struct { + jobpb.UnimplementedLegacyArtifactStagingServiceServer + jobpb.UnimplementedLegacyArtifactRetrievalServiceServer + m map[string]*manifest // token -> manifest mu sync.Mutex } diff --git a/sdks/go/pkg/beam/artifact/stage.go b/sdks/go/pkg/beam/artifact/stage.go index c8a5f9b31e45..6e6e1b84de3a 100644 --- a/sdks/go/pkg/beam/artifact/stage.go +++ b/sdks/go/pkg/beam/artifact/stage.go @@ -29,9 +29,9 @@ import ( "sync" "time" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - jobpb "github.com/apache/beam/sdks/go/pkg/beam/model/jobmanagement_v1" - "github.com/apache/beam/sdks/go/pkg/beam/util/errorx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + jobpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/jobmanagement_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/errorx" ) // Commit commits a manifest with the given staged artifacts. It returns the diff --git a/sdks/go/pkg/beam/artifact/stage_test.go b/sdks/go/pkg/beam/artifact/stage_test.go index 57121e5966ef..d2ee91c16c53 100644 --- a/sdks/go/pkg/beam/artifact/stage_test.go +++ b/sdks/go/pkg/beam/artifact/stage_test.go @@ -21,8 +21,8 @@ import ( "os" "testing" - jobpb "github.com/apache/beam/sdks/go/pkg/beam/model/jobmanagement_v1" - "github.com/apache/beam/sdks/go/pkg/beam/util/grpcx" + jobpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/jobmanagement_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/grpcx" "google.golang.org/grpc" ) diff --git a/sdks/go/pkg/beam/beam.shims.go b/sdks/go/pkg/beam/beam.shims.go index 106bdc148b6e..598c4619fa0c 100644 --- a/sdks/go/pkg/beam/beam.shims.go +++ b/sdks/go/pkg/beam/beam.shims.go @@ -23,10 +23,11 @@ import ( "reflect" // Library imports - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx/schema" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) func init() { @@ -43,8 +44,11 @@ func init() { runtime.RegisterFunction(schemaEnc) runtime.RegisterFunction(swapKVFn) runtime.RegisterType(reflect.TypeOf((*createFn)(nil)).Elem()) + schema.RegisterType(reflect.TypeOf((*createFn)(nil)).Elem()) runtime.RegisterType(reflect.TypeOf((*reflect.Type)(nil)).Elem()) + schema.RegisterType(reflect.TypeOf((*reflect.Type)(nil)).Elem()) runtime.RegisterType(reflect.TypeOf((*reflectx.Func)(nil)).Elem()) + 
schema.RegisterType(reflect.TypeOf((*reflectx.Func)(nil)).Elem()) reflectx.RegisterStructWrapper(reflect.TypeOf((*createFn)(nil)).Elem(), wrapMakerCreateFn) reflectx.RegisterFunc(reflect.TypeOf((*func(reflect.Type, []byte) (typex.T, error))(nil)).Elem(), funcMakerReflect۰TypeSliceOfByteГTypex۰TError) reflectx.RegisterFunc(reflect.TypeOf((*func(reflect.Type, typex.T) ([]byte, error))(nil)).Elem(), funcMakerReflect۰TypeTypex۰TГSliceOfByteError) diff --git a/sdks/go/pkg/beam/coder.go b/sdks/go/pkg/beam/coder.go index 5969afd861cc..fe6be4728538 100644 --- a/sdks/go/pkg/beam/coder.go +++ b/sdks/go/pkg/beam/coder.go @@ -23,18 +23,29 @@ import ( "reflect" "sync" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/coderx" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/jsonx" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/coderx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/jsonx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" protov1 "github.com/golang/protobuf/proto" protov2 "google.golang.org/protobuf/proto" "google.golang.org/protobuf/reflect/protoreflect" ) +// EnableSchemas is a temporary configuration variable +// to use Beam Schema encoding by default instead of JSON. +// Before it is removed, it will be set to true by default +// and then eventually removed. +// +// Only users who rely on default JSON marshalling behaviour should set +// this explicitly, and file an issue on the BEAM JIRA so the issue may +// be resolved. +// https://issues.apache.org/jira/projects/BEAM/issues/ +var EnableSchemas bool = true + type jsonCoder interface { json.Marshaler json.Unmarshaler @@ -183,6 +194,19 @@ func inferCoder(t FullType) (*coder.Coder, error) { if c := coder.LookupCustomCoder(et); c != nil { return coder.CoderFrom(c), nil } + + if EnableSchemas { + switch et.Kind() { + case reflect.Ptr: + if et.Elem().Kind() != reflect.Struct { + break + } + fallthrough + case reflect.Struct: + return &coder.Coder{Kind: coder.Row, T: t}, nil + } + } + // Interface types that implement JSON marshalling can be handled by the default coder. // otherwise, inference needs to fail here. 
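To make the effect of the new schema default concrete, here is a minimal sketch of assumed usage (the movie type and pipeline shape are invented for illustration and are not taken from this patch): a plain struct element type is now inferred as a schema Row coder, and a pipeline that depended on the old JSON default can opt out by setting beam.EnableSchemas to false before pipeline construction.

```go
package main

import (
	"context"
	"flag"
	"reflect"

	"github.com/apache/beam/sdks/v2/go/pkg/beam"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx"
)

// movie is an ordinary exported-field struct; with EnableSchemas on, a
// PCollection of movie is encoded with the schema Row coder rather than JSON.
type movie struct {
	Title string
	Year  int32
}

func init() {
	beam.RegisterType(reflect.TypeOf((*movie)(nil)).Elem())
}

func main() {
	flag.Parse()
	beam.Init()

	// beam.EnableSchemas = false // fall back to the previous JSON default

	p := beam.NewPipeline()
	s := p.Root()
	beam.Create(s, movie{Title: "The General", Year: 1926}) // inferred as a Row coder

	if err := beamx.Run(context.Background(), p); err != nil {
		panic(err)
	}
}
```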
if et.Kind() == reflect.Interface && !et.Implements(jsonCoderType) { diff --git a/sdks/go/pkg/beam/coder_test.go b/sdks/go/pkg/beam/coder_test.go index d33e9063c16c..795dceb911c6 100644 --- a/sdks/go/pkg/beam/coder_test.go +++ b/sdks/go/pkg/beam/coder_test.go @@ -21,7 +21,7 @@ import ( "reflect" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx/schema" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx/schema" ) func TestJSONCoder(t *testing.T) { diff --git a/sdks/go/pkg/beam/combine.go b/sdks/go/pkg/beam/combine.go index 126dbc475d06..a7a563794f80 100644 --- a/sdks/go/pkg/beam/combine.go +++ b/sdks/go/pkg/beam/combine.go @@ -16,9 +16,9 @@ package beam import ( - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // Combine inserts a global Combine transform into the pipeline. It diff --git a/sdks/go/pkg/beam/combine_test.go b/sdks/go/pkg/beam/combine_test.go index 14b0127f76fd..7338231f7c67 100644 --- a/sdks/go/pkg/beam/combine_test.go +++ b/sdks/go/pkg/beam/combine_test.go @@ -19,7 +19,7 @@ import ( "reflect" "testing" - "github.com/apache/beam/sdks/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam" ) // foolFn is a no-op CombineFn. diff --git a/sdks/go/test/validatesrunner/validatesrunner.go b/sdks/go/pkg/beam/core/core.go similarity index 55% rename from sdks/go/test/validatesrunner/validatesrunner.go rename to sdks/go/pkg/beam/core/core.go index ae2e5f75c6f2..74e42b49515a 100644 --- a/sdks/go/test/validatesrunner/validatesrunner.go +++ b/sdks/go/pkg/beam/core/core.go @@ -13,19 +13,19 @@ // See the License for the specific language governing permissions and // limitations under the License. -// Package validatesrunner contains Validates Runner tests, which are a type of -// integration test that execute short pipelines on various runners to validate -// runner behavior. +// Package core contains constants and other static data related to the SDK, +// such as the SDK Name and version. // -// These tests are intended to be used via "go test validatesrunner/...". Any -// flags usually necessary for running Go SDK pipelines should also be included, -// the notable flags being "--runner", "--endpoint", and "--expansion_addr". -package validatesrunner - -import "flag" +// As a rule, this package should not have dependencies, and should not depend +// on any package within the Apache Beam Go SDK. +// +// Files in this package may be generated or updated by release scripts, allowing +// for accurate version information to be included. +package core -var ( - // expansionAddr is the endpoint for an expansion service for cross-language - // transforms. - expansionAddr = flag.String("expansion_addr", "", "Address of Expansion Service") +const ( + // SdkName is the human readable name of the SDK for UserAgents. + SdkName = "Apache Beam SDK for Go" + // SdkVersion is the current version of the SDK. 
+ SdkVersion = "2.34.0.dev" ) diff --git a/sdks/go/pkg/beam/core/funcx/fn.go b/sdks/go/pkg/beam/core/funcx/fn.go index dd5341ba6970..88d7d432f351 100644 --- a/sdks/go/pkg/beam/core/funcx/fn.go +++ b/sdks/go/pkg/beam/core/funcx/fn.go @@ -17,12 +17,12 @@ package funcx import ( "fmt" - "github.com/apache/beam/sdks/go/pkg/beam/core/sdf" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/sdf" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // Note that we can't tell the difference between K, V and V, S before binding. diff --git a/sdks/go/pkg/beam/core/funcx/fn_test.go b/sdks/go/pkg/beam/core/funcx/fn_test.go index 410fb59df3e0..72c167718753 100644 --- a/sdks/go/pkg/beam/core/funcx/fn_test.go +++ b/sdks/go/pkg/beam/core/funcx/fn_test.go @@ -21,9 +21,9 @@ import ( "strings" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/mtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) type foo struct { diff --git a/sdks/go/pkg/beam/core/funcx/output.go b/sdks/go/pkg/beam/core/funcx/output.go index 116ecf17ef8d..0803c89be064 100644 --- a/sdks/go/pkg/beam/core/funcx/output.go +++ b/sdks/go/pkg/beam/core/funcx/output.go @@ -18,7 +18,7 @@ package funcx import ( "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" ) // IsEmit returns true iff the supplied type is an emitter. diff --git a/sdks/go/pkg/beam/core/funcx/output_test.go b/sdks/go/pkg/beam/core/funcx/output_test.go index 77de76b80a65..65ac8fd7b860 100644 --- a/sdks/go/pkg/beam/core/funcx/output_test.go +++ b/sdks/go/pkg/beam/core/funcx/output_test.go @@ -19,7 +19,7 @@ import ( "reflect" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" ) func TestIsEmit(t *testing.T) { diff --git a/sdks/go/pkg/beam/core/funcx/sideinput.go b/sdks/go/pkg/beam/core/funcx/sideinput.go index 748af78a56eb..d2a44c1ba610 100644 --- a/sdks/go/pkg/beam/core/funcx/sideinput.go +++ b/sdks/go/pkg/beam/core/funcx/sideinput.go @@ -18,8 +18,8 @@ package funcx import ( "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) // IsIter returns true iff the supplied type is a "single sweep functional iterator". 
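For context on the "single sweep functional iterator" shape that funcx.IsIter checks for, the sketch below shows a DoFn consuming such an iterator as a side input under the new v2 import path; the function names and element types are invented for illustration and are not part of this patch.

```go
package example

import (
	"fmt"

	"github.com/apache/beam/sdks/v2/go/pkg/beam"
)

func init() {
	beam.RegisterFunction(pairWithLengths)
}

// pairWithLengths takes a "single sweep functional iterator" side input:
// each call to lengths fills in the next element and returns false once
// the sweep is exhausted.
func pairWithLengths(word string, lengths func(*int) bool, emit func(string)) {
	var n int
	for lengths(&n) {
		emit(fmt.Sprintf("%s:%d", word, n))
	}
}

// expand wires the iterator in as a side input to a ParDo.
func expand(s beam.Scope, words, lengths beam.PCollection) beam.PCollection {
	return beam.ParDo(s, pairWithLengths, words, beam.SideInput{Input: lengths})
}
```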
diff --git a/sdks/go/pkg/beam/core/funcx/sideinput_test.go b/sdks/go/pkg/beam/core/funcx/sideinput_test.go index a1cc23626928..52532ba8cea5 100644 --- a/sdks/go/pkg/beam/core/funcx/sideinput_test.go +++ b/sdks/go/pkg/beam/core/funcx/sideinput_test.go @@ -19,7 +19,7 @@ import ( "reflect" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" ) func TestIsIter(t *testing.T) { diff --git a/sdks/go/pkg/beam/core/funcx/signature.go b/sdks/go/pkg/beam/core/funcx/signature.go index 8bcd3860113e..d880416f06f1 100644 --- a/sdks/go/pkg/beam/core/funcx/signature.go +++ b/sdks/go/pkg/beam/core/funcx/signature.go @@ -20,9 +20,9 @@ import ( "reflect" "strings" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // Signature is a concise representation of a group of function types. The diff --git a/sdks/go/pkg/beam/core/funcx/signature_test.go b/sdks/go/pkg/beam/core/funcx/signature_test.go index ce40ad116265..5ffc6a5d250a 100644 --- a/sdks/go/pkg/beam/core/funcx/signature_test.go +++ b/sdks/go/pkg/beam/core/funcx/signature_test.go @@ -19,8 +19,8 @@ import ( "reflect" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) func TestSatisfy(t *testing.T) { diff --git a/sdks/go/pkg/beam/core/graph/bind.go b/sdks/go/pkg/beam/core/graph/bind.go index f5cc7f82d54b..c0014d193d52 100644 --- a/sdks/go/pkg/beam/core/graph/bind.go +++ b/sdks/go/pkg/beam/core/graph/bind.go @@ -18,9 +18,9 @@ package graph import ( "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/funcx" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/funcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // TODO(herohde) 4/21/2017: Bind is where most user mistakes will likely show diff --git a/sdks/go/pkg/beam/core/graph/bind_test.go b/sdks/go/pkg/beam/core/graph/bind_test.go index c0ca0202ef0e..bfdd15903ee3 100644 --- a/sdks/go/pkg/beam/core/graph/bind_test.go +++ b/sdks/go/pkg/beam/core/graph/bind_test.go @@ -19,10 +19,10 @@ import ( "reflect" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/funcx" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/funcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/mtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) func TestBind(t *testing.T) { diff --git a/sdks/go/pkg/beam/core/graph/coder/bool.go b/sdks/go/pkg/beam/core/graph/coder/bool.go index 309ae24f1991..bf4fe4975979 100644 --- a/sdks/go/pkg/beam/core/graph/coder/bool.go +++ b/sdks/go/pkg/beam/core/graph/coder/bool.go @@ -18,8 +18,8 @@ package coder import ( "io" - 
"github.com/apache/beam/sdks/go/pkg/beam/core/util/ioutilx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/ioutilx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // EncodeBool encodes a boolean according to the beam protocol. diff --git a/sdks/go/pkg/beam/core/graph/coder/bool_test.go b/sdks/go/pkg/beam/core/graph/coder/bool_test.go index 77aaf8bff6f4..9aff71f2071a 100644 --- a/sdks/go/pkg/beam/core/graph/coder/bool_test.go +++ b/sdks/go/pkg/beam/core/graph/coder/bool_test.go @@ -20,7 +20,7 @@ import ( "fmt" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" "github.com/google/go-cmp/cmp" ) diff --git a/sdks/go/pkg/beam/core/graph/coder/bytes.go b/sdks/go/pkg/beam/core/graph/coder/bytes.go index 5bb5692f4217..aa97ef3141b1 100644 --- a/sdks/go/pkg/beam/core/graph/coder/bytes.go +++ b/sdks/go/pkg/beam/core/graph/coder/bytes.go @@ -19,8 +19,8 @@ import ( "fmt" "io" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/ioutilx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/ioutilx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // EncodeByte encodes a single byte. diff --git a/sdks/go/pkg/beam/core/graph/coder/coder.go b/sdks/go/pkg/beam/core/graph/coder/coder.go index 61908b809cd5..6eea66b0d317 100644 --- a/sdks/go/pkg/beam/core/graph/coder/coder.go +++ b/sdks/go/pkg/beam/core/graph/coder/coder.go @@ -23,10 +23,10 @@ import ( "reflect" "strings" - "github.com/apache/beam/sdks/go/pkg/beam/core/funcx" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/funcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // CustomCoder contains possibly untyped encode/decode user functions that are @@ -170,6 +170,7 @@ const ( Double Kind = "double" Row Kind = "R" Timer Kind = "T" + PaneInfo Kind = "PI" WindowedValue Kind = "W" ParamWindowedValue Kind = "PW" Iterable Kind = "I" @@ -297,6 +298,11 @@ func IsW(c *Coder) bool { return c.Kind == WindowedValue } +// NewPI returns a PaneInfo coder +func NewPI() *Coder { + return &Coder{Kind: PaneInfo, T: typex.New(typex.PaneInfoType)} +} + // NewW returns a WindowedValue coder for the window of elements. 
func NewW(c *Coder, w *WindowCoder) *Coder { if c == nil { diff --git a/sdks/go/pkg/beam/core/graph/coder/coder_test.go b/sdks/go/pkg/beam/core/graph/coder/coder_test.go index fe976d2e462f..762ed848f589 100644 --- a/sdks/go/pkg/beam/core/graph/coder/coder_test.go +++ b/sdks/go/pkg/beam/core/graph/coder/coder_test.go @@ -20,9 +20,9 @@ import ( "reflect" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" ) type MyType struct{} diff --git a/sdks/go/pkg/beam/core/graph/coder/double.go b/sdks/go/pkg/beam/core/graph/coder/double.go index bb47afe5f42f..d9d508b2711f 100644 --- a/sdks/go/pkg/beam/core/graph/coder/double.go +++ b/sdks/go/pkg/beam/core/graph/coder/double.go @@ -20,7 +20,7 @@ import ( "io" "math" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/ioutilx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/ioutilx" ) // EncodeDouble encodes a float64 in big endian format. diff --git a/sdks/go/pkg/beam/core/graph/coder/double_test.go b/sdks/go/pkg/beam/core/graph/coder/double_test.go new file mode 100644 index 000000000000..2fbb7c58183c --- /dev/null +++ b/sdks/go/pkg/beam/core/graph/coder/double_test.go @@ -0,0 +1,53 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +package coder + +import ( + "bytes" + "math" + "testing" +) + +func TestEncodeDecodeDouble(t *testing.T) { + var tests []float64 + for x := -100.0; x <= 100.0; x++ { + tests = append(tests, 0.1*x) + tests = append(tests, math.Pow(2, 0.1*x)) + } + tests = append(tests, -math.MaxFloat64) + tests = append(tests, math.MaxFloat64) + tests = append(tests, math.Inf(1)) + tests = append(tests, math.Inf(-1)) + for _, test := range tests { + var buf bytes.Buffer + if err := EncodeDouble(test, &buf); err != nil { + t.Fatalf("EncodeDouble(%v) failed: %v", test, err) + } + t.Logf("Encoded %v to %v", test, buf.Bytes()) + + if len(buf.Bytes()) != 8 { + t.Errorf("len(EncodeDouble(%v)) = %v, want 8", test, len(buf.Bytes())) + } + + actual, err := DecodeDouble(&buf) + if err != nil { + t.Fatalf("DecodeDouble(<%v>) failed: %v", test, err) + } + if actual != test { + t.Errorf("DecodeDouble(<%v>) = %v, want %v", test, actual, test) + } + } +} diff --git a/sdks/go/pkg/beam/core/graph/coder/int.go b/sdks/go/pkg/beam/core/graph/coder/int.go index 35a6cf2ac408..6eda98ccc38f 100644 --- a/sdks/go/pkg/beam/core/graph/coder/int.go +++ b/sdks/go/pkg/beam/core/graph/coder/int.go @@ -19,7 +19,7 @@ import ( "encoding/binary" "io" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/ioutilx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/ioutilx" ) // EncodeUint64 encodes an uint64 in big endian format. diff --git a/sdks/go/pkg/beam/core/graph/coder/iterable.go b/sdks/go/pkg/beam/core/graph/coder/iterable.go index c6affc6b9fda..36fb1c6f2805 100644 --- a/sdks/go/pkg/beam/core/graph/coder/iterable.go +++ b/sdks/go/pkg/beam/core/graph/coder/iterable.go @@ -19,21 +19,61 @@ import ( "io" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // TODO(lostluck): 2020.06.29 export these for use for others? +// EncoderForSlice returns an encoding function that encodes a struct type +// or a pointer to a struct type using the beam row encoding. +// +// Returns an error if the given type is invalid or not encodable to a beam +// schema row. +func EncoderForSlice(rt reflect.Type) (func(interface{}, io.Writer) error, error) { + var bld RowEncoderBuilder + eEnc, err := bld.encoderForSingleTypeReflect(rt.Elem()) + if err != nil { + return nil, err + } + enc := iterableEncoder(rt, eEnc) + return func(v interface{}, w io.Writer) error { + return enc(reflect.ValueOf(v), w) + }, nil +} + +// DecoderForSlice returns a decoding function that decodes the beam row encoding +// into the given type. +// +// Returns an error if the given type is invalid or not decodable from a beam +// schema row. +func DecoderForSlice(rt reflect.Type) (func(io.Reader) (interface{}, error), error) { + var bld RowDecoderBuilder + eDec, err := bld.decoderForSingleTypeReflect(rt.Elem()) + if err != nil { + return nil, err + } + dec := iterableDecoderForSlice(rt, eDec) + return func(r io.Reader) (interface{}, error) { + rv := reflect.New(rt) + err := dec(rv.Elem(), r) + return rv.Elem().Interface(), err + }, nil +} + // iterableEncoder reflectively encodes a slice or array type using // the beam fixed length iterable encoding. 
-func iterableEncoder(rt reflect.Type, encode func(reflect.Value, io.Writer) error) func(reflect.Value, io.Writer) error { +func iterableEncoder(rt reflect.Type, encode typeEncoderFieldReflect) func(reflect.Value, io.Writer) error { return func(rv reflect.Value, w io.Writer) error { size := rv.Len() if err := EncodeInt32((int32)(size), w); err != nil { return err } for i := 0; i < size; i++ { - if err := encode(rv.Index(i), w); err != nil { + iv := rv.Index(i) + if encode.addr { + iv = iv.Addr() + } + if err := encode.encode(iv, w); err != nil { return err } } @@ -44,7 +84,7 @@ func iterableEncoder(rt reflect.Type, encode func(reflect.Value, io.Writer) erro // iterableDecoderForSlice can decode from both the fixed sized and // multi-chunk variants of the beam iterable protocol. // Returns an error for other protocols (such as state backed). -func iterableDecoderForSlice(rt reflect.Type, decodeToElem func(reflect.Value, io.Reader) error) func(reflect.Value, io.Reader) error { +func iterableDecoderForSlice(rt reflect.Type, decodeToElem typeDecoderFieldReflect) func(reflect.Value, io.Reader) error { return func(ret reflect.Value, r io.Reader) error { // (1) Read count prefixed encoded data size, err := DecodeInt32(r) @@ -88,7 +128,7 @@ func iterableDecoderForSlice(rt reflect.Type, decodeToElem func(reflect.Value, i // iterableDecoderForArray can decode from only the fixed sized and // multi-chunk variant of the beam iterable protocol. // Returns an error for other protocols (such as state backed). -func iterableDecoderForArray(rt reflect.Type, decodeToElem func(reflect.Value, io.Reader) error) func(reflect.Value, io.Reader) error { +func iterableDecoderForArray(rt reflect.Type, decodeToElem typeDecoderFieldReflect) func(reflect.Value, io.Reader) error { return func(ret reflect.Value, r io.Reader) error { // (1) Read count prefixed encoded data size, err := DecodeInt32(r) @@ -111,10 +151,14 @@ func iterableDecoderForArray(rt reflect.Type, decodeToElem func(reflect.Value, i } } -func decodeToIterable(rv reflect.Value, r io.Reader, decodeTo func(reflect.Value, io.Reader) error) error { - for i := 0; i < rv.Len(); i++ { - err := decodeTo(rv.Index(i), r) - if err != nil { +func decodeToIterable(rv reflect.Value, r io.Reader, decodeTo typeDecoderFieldReflect) error { + size := rv.Len() + for i := 0; i < size; i++ { + iv := rv.Index(i) + if decodeTo.addr { + iv = iv.Addr() + } + if err := decodeTo.decode(iv, r); err != nil { return err } } diff --git a/sdks/go/pkg/beam/core/graph/coder/iterable_test.go b/sdks/go/pkg/beam/core/graph/coder/iterable_test.go index 977b58817573..b4d6ba33dd59 100644 --- a/sdks/go/pkg/beam/core/graph/coder/iterable_test.go +++ b/sdks/go/pkg/beam/core/graph/coder/iterable_test.go @@ -80,7 +80,7 @@ func TestEncodeDecodeIterable(t *testing.T) { if !test.decodeOnly { t.Run(fmt.Sprintf("encode %q", test.v), func(t *testing.T) { var buf bytes.Buffer - err := iterableEncoder(reflect.TypeOf(test.v), test.encElm)(reflect.ValueOf(test.v), &buf) + err := iterableEncoder(reflect.TypeOf(test.v), typeEncoderFieldReflect{encode: test.encElm})(reflect.ValueOf(test.v), &buf) if err != nil { t.Fatalf("EncodeBytes(%q) = %v", test.v, err) } @@ -95,9 +95,9 @@ func TestEncodeDecodeIterable(t *testing.T) { var dec func(reflect.Value, io.Reader) error switch rt.Kind() { case reflect.Slice: - dec = iterableDecoderForSlice(rt, test.decElm) + dec = iterableDecoderForSlice(rt, typeDecoderFieldReflect{decode: test.decElm}) case reflect.Array: - dec = iterableDecoderForArray(rt, test.decElm) + dec = 
iterableDecoderForArray(rt, typeDecoderFieldReflect{decode: test.decElm}) } rv := reflect.New(rt).Elem() err := dec(rv, buf) diff --git a/sdks/go/pkg/beam/core/graph/coder/map.go b/sdks/go/pkg/beam/core/graph/coder/map.go index 4e5dc2ccddb2..2d72446bf444 100644 --- a/sdks/go/pkg/beam/core/graph/coder/map.go +++ b/sdks/go/pkg/beam/core/graph/coder/map.go @@ -16,14 +16,16 @@ package coder import ( + "bytes" "io" "reflect" + "sort" ) // TODO(lostluck): 2020.08.04 export these for use for others? // mapDecoder produces a decoder for the beam schema map encoding. -func mapDecoder(rt reflect.Type, decodeToKey, decodeToElem func(reflect.Value, io.Reader) error) func(reflect.Value, io.Reader) error { +func mapDecoder(rt reflect.Type, decodeToKey, decodeToElem typeDecoderFieldReflect) func(reflect.Value, io.Reader) error { return func(ret reflect.Value, r io.Reader) error { // (1) Read count prefixed encoded data size, err := DecodeInt32(r) @@ -32,16 +34,29 @@ func mapDecoder(rt reflect.Type, decodeToKey, decodeToElem func(reflect.Value, i } n := int(size) ret.Set(reflect.MakeMapWithSize(rt, n)) + rtk := rt.Key() + rtv := rt.Elem() for i := 0; i < n; i++ { - rvk := reflect.New(rt.Key()).Elem() - if err := decodeToKey(rvk, r); err != nil { + rvk := reflect.New(rtk) + var err error + if decodeToKey.addr { + err = decodeToKey.decode(rvk, r) + } else { + err = decodeToKey.decode(rvk.Elem(), r) + } + if err != nil { return err } - rvv := reflect.New(rt.Elem()).Elem() - if err := decodeToElem(rvv, r); err != nil { + rvv := reflect.New(rtv) + if decodeToElem.addr { + err = decodeToElem.decode(rvv, r) + } else { + err = decodeToElem.decode(rvv.Elem(), r) + } + if err != nil { return err } - ret.SetMapIndex(rvk, rvv) + ret.SetMapIndex(rvk.Elem(), rvv.Elem()) } return nil } @@ -58,28 +73,59 @@ func containerNilDecoder(decodeToElem func(reflect.Value, io.Reader) error) func if !hasValue { return nil } - rv := reflect.New(ret.Type().Elem()) - if err := decodeToElem(rv.Elem(), r); err != nil { + if err := decodeToElem(ret, r); err != nil { return err } - ret.Set(rv) return nil } } // mapEncoder reflectively encodes a map or array type using the beam map encoding. -func mapEncoder(rt reflect.Type, encodeKey, encodeValue func(reflect.Value, io.Writer) error) func(reflect.Value, io.Writer) error { +func mapEncoder(rt reflect.Type, encodeKey, encodeValue typeEncoderFieldReflect) func(reflect.Value, io.Writer) error { return func(rv reflect.Value, w io.Writer) error { size := rv.Len() if err := EncodeInt32((int32)(size), w); err != nil { return err } - iter := rv.MapRange() - for iter.Next() { - if err := encodeKey(iter.Key(), w); err != nil { + keys := rv.MapKeys() + type pair struct { + v reflect.Value + b []byte + } + rtk := rv.Type().Key() + sorted := make([]pair, 0, size) + var buf bytes.Buffer // Re-use the same buffer. + for _, key := range keys { + var rvk reflect.Value + if encodeKey.addr { + rvk = reflect.New(rtk) + rvk.Elem().Set(key) + } else { + rvk = key + } + if err := encodeKey.encode(key, &buf); err != nil { + return err + } + p := pair{v: key, b: make([]byte, buf.Len(), buf.Len())} + copy(p.b, buf.Bytes()) + sorted = append(sorted, p) + buf.Reset() // Reset for next iteration. 
+ } + sort.Slice(sorted, func(i, j int) bool { return bytes.Compare(sorted[i].b, sorted[j].b) < 0 }) + rtv := rv.Type().Elem() + for _, kp := range sorted { + if _, err := w.Write(kp.b); err != nil { return err } - if err := encodeValue(iter.Value(), w); err != nil { + val := rv.MapIndex(kp.v) + var rvv reflect.Value + if encodeValue.addr { + rvv = reflect.New(rtv) + rvv.Elem().Set(val) + } else { + rvv = val + } + if err := encodeValue.encode(rvv, w); err != nil { return err } } @@ -97,6 +143,6 @@ func containerNilEncoder(encodeElem func(reflect.Value, io.Writer) error) func(r if err := EncodeBool(true, w); err != nil { return err } - return encodeElem(rv.Elem(), w) + return encodeElem(rv, w) } } diff --git a/sdks/go/pkg/beam/core/graph/coder/map_test.go b/sdks/go/pkg/beam/core/graph/coder/map_test.go index 0b825c2d5105..ee4c35afa609 100644 --- a/sdks/go/pkg/beam/core/graph/coder/map_test.go +++ b/sdks/go/pkg/beam/core/graph/coder/map_test.go @@ -22,15 +22,24 @@ import ( "reflect" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" "github.com/google/go-cmp/cmp" ) func TestEncodeDecodeMap(t *testing.T) { - byteEnc := containerEncoderForType(reflectx.Uint8) - byteDec := containerDecoderForType(reflectx.Uint8) - bytePtrEnc := containerEncoderForType(reflect.PtrTo(reflectx.Uint8)) - bytePtrDec := containerDecoderForType(reflect.PtrTo(reflectx.Uint8)) + byteEnc := func(v reflect.Value, w io.Writer) error { + return EncodeByte(byte(v.Uint()), w) + } + byteDec := reflectDecodeByte + bytePtrEnc := func(v reflect.Value, w io.Writer) error { + return byteEnc(v.Elem(), w) + } + bytePtrDec := func(v reflect.Value, r io.Reader) error { + v.Set(reflect.New(reflectx.Uint8)) + return byteDec(v.Elem(), r) + } + byteCtnrPtrEnc := containerNilEncoder(bytePtrEnc) + byteCtnrPtrDec := containerNilDecoder(bytePtrDec) ptrByte := byte(42) @@ -48,19 +57,26 @@ func TestEncodeDecodeMap(t *testing.T) { decK: byteDec, decV: byteDec, encoded: []byte{0, 0, 0, 1, 10, 42}, + }, { + v: map[byte]byte{10: 42, 12: 53, 15: 64}, + encK: byteEnc, + encV: byteEnc, + decK: byteDec, + decV: byteDec, + encoded: []byte{0, 0, 0, 3, 10, 42, 12, 53, 15, 64}, }, { v: map[byte]*byte{10: &ptrByte}, encK: byteEnc, - encV: bytePtrEnc, + encV: byteCtnrPtrEnc, decK: byteDec, - decV: bytePtrDec, + decV: byteCtnrPtrDec, encoded: []byte{0, 0, 0, 1, 10, 1, 42}, }, { v: map[byte]*byte{10: &ptrByte, 23: nil, 53: nil}, encK: byteEnc, - encV: bytePtrEnc, + encV: byteCtnrPtrEnc, decK: byteDec, - decV: bytePtrDec, + decV: byteCtnrPtrDec, encoded: []byte{0, 0, 0, 3, 10, 1, 42, 23, 0, 53, 0}, decodeOnly: true, }, @@ -70,7 +86,7 @@ func TestEncodeDecodeMap(t *testing.T) { if !test.decodeOnly { t.Run(fmt.Sprintf("encode %q", test.v), func(t *testing.T) { var buf bytes.Buffer - err := mapEncoder(reflect.TypeOf(test.v), test.encK, test.encV)(reflect.ValueOf(test.v), &buf) + err := mapEncoder(reflect.TypeOf(test.v), typeEncoderFieldReflect{encode: test.encK}, typeEncoderFieldReflect{encode: test.encV})(reflect.ValueOf(test.v), &buf) if err != nil { t.Fatalf("mapEncoder(%q) = %v", test.v, err) } @@ -83,7 +99,7 @@ func TestEncodeDecodeMap(t *testing.T) { buf := bytes.NewBuffer(test.encoded) rt := reflect.TypeOf(test.v) var dec func(reflect.Value, io.Reader) error - dec = mapDecoder(rt, test.decK, test.decV) + dec = mapDecoder(rt, typeDecoderFieldReflect{decode: test.decK}, typeDecoderFieldReflect{decode: test.decV}) rv := reflect.New(rt).Elem() err := dec(rv, buf) if err != 
nil { diff --git a/sdks/go/pkg/beam/core/graph/coder/panes.go b/sdks/go/pkg/beam/core/graph/coder/panes.go new file mode 100644 index 000000000000..3ccd987e765d --- /dev/null +++ b/sdks/go/pkg/beam/core/graph/coder/panes.go @@ -0,0 +1,116 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package coder + +import ( + "io" + + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/ioutilx" +) + +// EncodePane encodes a typex.PaneInfo. +func EncodePane(v typex.PaneInfo, w io.Writer) error { + // Encoding: typex.PaneInfo + + pane := byte(0) + if v.IsFirst { + pane |= 0x01 + } + if v.IsLast { + pane |= 0x02 + } + pane |= byte(v.Timing << 2) + + switch { + case v.Index == 0 || v.NonSpeculativeIndex == 0 || v.Timing == typex.PaneUnknown: + // The entire pane info is encoded as a single byte + paneByte := []byte{pane} + w.Write(paneByte) + case v.Index == v.NonSpeculativeIndex || v.Timing == typex.PaneEarly: + // The pane info is encoded as this byte plus a single VarInt encoded integer + paneByte := []byte{pane | 1<<4} + w.Write(paneByte) + EncodeVarInt(v.Index, w) + default: + // The pane info is encoded as this byte plus two VarInt encoded integer + paneByte := []byte{pane | 2<<4} + w.Write(paneByte) + EncodeVarInt(v.Index, w) + EncodeVarInt(v.NonSpeculativeIndex, w) + } + return nil +} + +// NewPane initializes the PaneInfo from a given byte. +// By default, PaneInfo is assigned to NoFiringPane. +func NewPane(b byte) typex.PaneInfo { + pn := typex.NoFiringPane() + + if b&0x01 == 1 { + pn.IsFirst = true + } + if b&0x02 == 2 { + pn.IsLast = true + } + + pn.Timing = typex.PaneTiming((b >> 2) & 0x03) + return pn +} + +// DecodePane decodes a single byte. +func DecodePane(r io.Reader) (typex.PaneInfo, error) { + // Decoding: typex.PaneInfo + + var data [1]byte + if err := ioutilx.ReadNBufUnsafe(r, data[:]); err != nil { + return typex.PaneInfo{}, err + } + + pn := NewPane(data[0] & 0x0f) + + switch data[0] >> 4 { + case 0: + // Result encoded in only one pane. + return pn, nil + case 1: + // Result encoded in one pane plus a VarInt encoded integer. + data, err := DecodeVarInt(r) + if err != nil { + return typex.PaneInfo{}, err + } + + pn.Index = data + if pn.Timing == typex.PaneEarly { + pn.NonSpeculativeIndex = -1 + } else { + pn.NonSpeculativeIndex = pn.Index + } + case 2: + // Result encoded in one pane plus two VarInt encoded integer. 
+ data, err := DecodeVarInt(r) + if err != nil { + return typex.PaneInfo{}, err + } + + pn.Index = data + pn.NonSpeculativeIndex, err = DecodeVarInt(r) + if err != nil { + return typex.PaneInfo{}, err + } + } + return pn, nil +} diff --git a/sdks/go/pkg/beam/core/graph/coder/registry.go b/sdks/go/pkg/beam/core/graph/coder/registry.go index 5be9556bf0c5..7f59588fa405 100644 --- a/sdks/go/pkg/beam/core/graph/coder/registry.go +++ b/sdks/go/pkg/beam/core/graph/coder/registry.go @@ -18,7 +18,7 @@ package coder import ( "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) var ( diff --git a/sdks/go/pkg/beam/core/graph/coder/registry_test.go b/sdks/go/pkg/beam/core/graph/coder/registry_test.go index 468be6ede583..0d27cb328fca 100644 --- a/sdks/go/pkg/beam/core/graph/coder/registry_test.go +++ b/sdks/go/pkg/beam/core/graph/coder/registry_test.go @@ -19,7 +19,7 @@ import ( "reflect" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" ) func clearRegistry() { @@ -131,7 +131,6 @@ func TestRegisterCoder(t *testing.T) { t.Run(test.name, func(t *testing.T) { defer func() { if p := recover(); p != nil { - t.Log(p) return } t.Fatalf("RegisterCoder(%v, %T, %T): want panic", msType, test.enc, test.dec) diff --git a/sdks/go/pkg/beam/core/graph/coder/row.go b/sdks/go/pkg/beam/core/graph/coder/row.go index 714e1500c1f5..c4661b5addc1 100644 --- a/sdks/go/pkg/beam/core/graph/coder/row.go +++ b/sdks/go/pkg/beam/core/graph/coder/row.go @@ -20,20 +20,38 @@ import ( "io" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/ioutilx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/ioutilx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) +var ( + defaultEnc RowEncoderBuilder + defaultDec RowDecoderBuilder +) + +// RequireAllFieldsExported when set to true will have the default coder buildings using +// RowEncoderForStruct and RowDecoderForStruct fail if there are any unexported fields. +// When set false, unexported fields in default destination structs will be silently +// ignored when coding. +// This has no effect on types with registered coder providers. +func RequireAllFieldsExported(require bool) { + defaultEnc.RequireAllFieldsExported = require + defaultDec.RequireAllFieldsExported = require +} + +// RegisterSchemaProviders Register Custom Schema providers. +func RegisterSchemaProviders(rt reflect.Type, enc, dec interface{}) { + defaultEnc.Register(rt, enc) + defaultDec.Register(rt, dec) +} + // RowEncoderForStruct returns an encoding function that encodes a struct type // or a pointer to a struct type using the beam row encoding. // // Returns an error if the given type is invalid or not encodable to a beam // schema row. func RowEncoderForStruct(rt reflect.Type) (func(interface{}, io.Writer) error, error) { - if err := rowTypeValidation(rt, true); err != nil { - return nil, err - } - return encoderForType(rt), nil + return defaultEnc.Build(rt) } // RowDecoderForStruct returns a decoding function that decodes the beam row encoding @@ -42,10 +60,7 @@ func RowEncoderForStruct(rt reflect.Type) (func(interface{}, io.Writer) error, e // Returns an error if the given type is invalid or not decodable from a beam // schema row. 
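// A hedged usage sketch for the default row coders built by
// RowEncoderForStruct and RowDecoderForStruct. The Pet type, its values, and
// the standalone main package are illustrative assumptions, not part of this
// change; error handling is kept deliberately blunt.
package main

import (
	"bytes"
	"fmt"
	"reflect"

	"github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder"
)

// Pet is an example struct; all fields are exported so the default
// reflection-based row coder can handle them.
type Pet struct {
	Name string
	Legs int64
}

func main() {
	rt := reflect.TypeOf(Pet{})
	enc, err := coder.RowEncoderForStruct(rt)
	if err != nil {
		panic(err)
	}
	dec, err := coder.RowDecoderForStruct(rt)
	if err != nil {
		panic(err)
	}

	var buf bytes.Buffer
	if err := enc(Pet{Name: "rex", Legs: 4}, &buf); err != nil {
		panic(err)
	}
	got, err := dec(&buf)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", got.(Pet)) // Expected output: {Name:rex Legs:4}
}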
func RowDecoderForStruct(rt reflect.Type) (func(io.Reader) (interface{}, error), error) { - if err := rowTypeValidation(rt, true); err != nil { - return nil, err - } - return decoderForType(rt), nil + return defaultDec.Build(rt) } func rowTypeValidation(rt reflect.Type, strictExportedFields bool) error { @@ -58,277 +73,6 @@ func rowTypeValidation(rt reflect.Type, strictExportedFields bool) error { return nil } -// decoderForType returns a decoder function for the struct or pointer to struct type. -func decoderForType(t reflect.Type) func(io.Reader) (interface{}, error) { - var isPtr bool - // Pointers become the value type for decomposition. - if t.Kind() == reflect.Ptr { - isPtr = true - t = t.Elem() - } - dec := decoderForStructReflect(t) - - if isPtr { - return func(r io.Reader) (interface{}, error) { - rv := reflect.New(t) - err := dec(rv.Elem(), r) - return rv.Interface(), err - } - } - return func(r io.Reader) (interface{}, error) { - rv := reflect.New(t) - err := dec(rv.Elem(), r) - return rv.Elem().Interface(), err - } -} - -// decoderForSingleTypeReflect returns a reflection based decoder function for the -// given type. -func decoderForSingleTypeReflect(t reflect.Type) func(reflect.Value, io.Reader) error { - switch t.Kind() { - case reflect.Struct: - return decoderForStructReflect(t) - case reflect.Bool: - return func(rv reflect.Value, r io.Reader) error { - v, err := DecodeBool(r) - if err != nil { - return errors.Wrap(err, "error decoding bool field") - } - rv.SetBool(v) - return nil - } - case reflect.Uint8: - return func(rv reflect.Value, r io.Reader) error { - b, err := DecodeByte(r) - if err != nil { - return errors.Wrap(err, "error decoding single byte field") - } - rv.SetUint(uint64(b)) - return nil - } - case reflect.String: - return func(rv reflect.Value, r io.Reader) error { - v, err := DecodeStringUTF8(r) - if err != nil { - return errors.Wrap(err, "error decoding string field") - } - rv.SetString(v) - return nil - } - case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64: - return func(rv reflect.Value, r io.Reader) error { - v, err := DecodeVarInt(r) - if err != nil { - return errors.Wrap(err, "error decoding varint field") - } - rv.SetInt(v) - return nil - } - case reflect.Float32, reflect.Float64: - return func(rv reflect.Value, r io.Reader) error { - v, err := DecodeDouble(r) - if err != nil { - return errors.Wrap(err, "error decoding double field") - } - rv.SetFloat(v) - return nil - } - case reflect.Ptr: - decf := decoderForSingleTypeReflect(t.Elem()) - return func(rv reflect.Value, r io.Reader) error { - nv := reflect.New(t.Elem()) - rv.Set(nv) - return decf(nv.Elem(), r) - } - case reflect.Slice: - // Special case handling for byte slices. 
- if t.Elem().Kind() == reflect.Uint8 { - return func(rv reflect.Value, r io.Reader) error { - b, err := DecodeBytes(r) - if err != nil { - return errors.Wrap(err, "error decoding []byte field") - } - rv.SetBytes(b) - return nil - } - } - decf := containerDecoderForType(t.Elem()) - return iterableDecoderForSlice(t, decf) - case reflect.Array: - decf := containerDecoderForType(t.Elem()) - return iterableDecoderForArray(t, decf) - case reflect.Map: - decK := containerDecoderForType(t.Key()) - decV := containerDecoderForType(t.Elem()) - return mapDecoder(t, decK, decV) - } - panic(fmt.Sprintf("unimplemented type to decode: %v", t)) -} - -func containerDecoderForType(t reflect.Type) func(reflect.Value, io.Reader) error { - if t.Kind() == reflect.Ptr { - return containerNilDecoder(decoderForSingleTypeReflect(t.Elem())) - } - return decoderForSingleTypeReflect(t) -} - -type typeDecoderReflect struct { - typ reflect.Type - fields []func(reflect.Value, io.Reader) error -} - -// decoderForStructReflect returns a reflection based decoder function for the -// given struct type. -func decoderForStructReflect(t reflect.Type) func(reflect.Value, io.Reader) error { - var coder typeDecoderReflect - for i := 0; i < t.NumField(); i++ { - i := i // avoid alias issues in the closures. - dec := decoderForSingleTypeReflect(t.Field(i).Type) - coder.fields = append(coder.fields, func(rv reflect.Value, r io.Reader) error { - return dec(rv.Field(i), r) - }) - } - - return func(rv reflect.Value, r io.Reader) error { - nf, nils, err := readRowHeader(rv, r) - if err != nil { - return err - } - if nf != len(coder.fields) { - return errors.Errorf("schema[%v] changed: got %d fields, want %d fields", "TODO", nf, len(coder.fields)) - } - for i, f := range coder.fields { - if isFieldNil(nils, i) { - continue - } - if err := f(rv, r); err != nil { - return err - } - } - return nil - } -} - -// isFieldNil examines the passed in packed bits nils buffer -// and returns true if the field at that index wasn't encoded -// and can be skipped in decoding. -func isFieldNil(nils []byte, f int) bool { - i, b := f/8, f%8 - return len(nils) != 0 && (nils[i]>>uint8(b))&0x1 == 1 -} - -// encoderForType returns an encoder function for the struct or pointer to struct type. -func encoderForType(t reflect.Type) func(interface{}, io.Writer) error { - var isPtr bool - // Pointers become the value type for decomposition. 
- if t.Kind() == reflect.Ptr { - isPtr = true - t = t.Elem() - } - enc := encoderForStructReflect(t) - - if isPtr { - return func(v interface{}, w io.Writer) error { - return enc(reflect.ValueOf(v).Elem(), w) - } - } - return func(v interface{}, w io.Writer) error { - return enc(reflect.ValueOf(v), w) - } -} - -// Generates coder using reflection for -func encoderForSingleTypeReflect(t reflect.Type) func(reflect.Value, io.Writer) error { - switch t.Kind() { - case reflect.Struct: - return encoderForStructReflect(t) - case reflect.Bool: - return func(rv reflect.Value, w io.Writer) error { - return EncodeBool(rv.Bool(), w) - } - case reflect.Uint8: - return func(rv reflect.Value, w io.Writer) error { - return EncodeByte(byte(rv.Uint()), w) - } - case reflect.String: - return func(rv reflect.Value, w io.Writer) error { - return EncodeStringUTF8(rv.String(), w) - } - case reflect.Int, reflect.Int64, reflect.Int16, reflect.Int32, reflect.Int8: - return func(rv reflect.Value, w io.Writer) error { - return EncodeVarInt(int64(rv.Int()), w) - } - case reflect.Float32, reflect.Float64: - return func(rv reflect.Value, w io.Writer) error { - return EncodeDouble(float64(rv.Float()), w) - } - case reflect.Ptr: - // Nils are handled at the struct field level. - encf := encoderForSingleTypeReflect(t.Elem()) - return func(rv reflect.Value, w io.Writer) error { - return encf(rv.Elem(), w) - } - case reflect.Slice: - // Special case handling for byte slices. - if t.Elem().Kind() == reflect.Uint8 { - return func(rv reflect.Value, w io.Writer) error { - return EncodeBytes(rv.Bytes(), w) - } - } - encf := containerEncoderForType(t.Elem()) - return iterableEncoder(t, encf) - case reflect.Array: - encf := containerEncoderForType(t.Elem()) - return iterableEncoder(t, encf) - case reflect.Map: - encK := containerEncoderForType(t.Key()) - encV := containerEncoderForType(t.Elem()) - return mapEncoder(t, encK, encV) - } - panic(fmt.Sprintf("unimplemented type to encode: %v", t)) -} - -func containerEncoderForType(t reflect.Type) func(reflect.Value, io.Writer) error { - if t.Kind() == reflect.Ptr { - return containerNilEncoder(encoderForSingleTypeReflect(t.Elem())) - } - return encoderForSingleTypeReflect(t) -} - -type typeEncoderReflect struct { - debug []string - fields []func(reflect.Value, io.Writer) error -} - -// encoderForStructReflect generates reflection field access closures for structs. -func encoderForStructReflect(t reflect.Type) func(reflect.Value, io.Writer) error { - var coder typeEncoderReflect - for i := 0; i < t.NumField(); i++ { - coder.debug = append(coder.debug, t.Field(i).Type.Name()) - coder.fields = append(coder.fields, encoderForSingleTypeReflect(t.Field(i).Type)) - } - - return func(rv reflect.Value, w io.Writer) error { - // Row/Structs are prefixed with the number of fields that are encoded in total. - if err := writeRowHeader(rv, w); err != nil { - return err - } - for i, f := range coder.fields { - rvf := rv.Field(i) - switch rvf.Kind() { - case reflect.Ptr, reflect.Map, reflect.Slice: - if rvf.IsNil() { - continue - } - } - if err := f(rvf, w); err != nil { - return errors.Wrapf(err, "encoding %v, expected: %v", rvf.Type(), coder.debug[i]) - } - } - return nil - } -} - // writeRowHeader handles the field header for row encodings. func writeRowHeader(rv reflect.Value, w io.Writer) error { // Row/Structs are prefixed with the number of fields that are encoded in total. 
@@ -370,11 +114,51 @@ func writeRowHeader(rv reflect.Value, w io.Writer) error { return nil } -// readRowHeader handles the field header for row decodings. +// WriteRowHeader handles the field header for row encodings. +func WriteRowHeader(n int, isNil func(int) bool, w io.Writer) error { + // Row/Structs are prefixed with the number of fields that are encoded in total. + if err := EncodeVarInt(int64(n), w); err != nil { + return err + } + // Followed by a packed bit array of the nil fields. + var curByte byte + var nils bool + var bytes = make([]byte, 0, n/8+1) + for i := 0; i < n; i++ { + shift := i % 8 + if i != 0 && shift == 0 { + bytes = append(bytes, curByte) + curByte = 0 + } + if isNil(i) { + curByte |= (1 << uint8(shift)) + nils = true + } + } + if nils { + bytes = append(bytes, curByte) + } else { + // If there are no nils, we write a 0 length byte array instead. + bytes = bytes[:0] + } + if err := EncodeVarInt(int64(len(bytes)), w); err != nil { + return err + } + if _, err := ioutilx.WriteUnsafe(w, bytes); err != nil { + return err + } + return nil +} + +// ReadRowHeader handles the field header for row decodings. +// +// This returns the number of encoded fileds, the raw bitpacked bytes and +// any error during decoding. Each bit only needs only needs to be +// examined once during decoding using the IsFieldNil helper function. // -// This returns the raw bitpacked byte slice because we only need to -// examine each bit once, so we may as well do so inline with field checking. -func readRowHeader(rv reflect.Value, r io.Reader) (int, []byte, error) { +// If there are no nil fields encoded,the byte array will be nil, and no +// encoded fields will be nil. +func ReadRowHeader(r io.Reader) (int, []byte, error) { nf, err := DecodeVarInt(r) // is for checksum purposes (old vs new versions of a schemas) if err != nil { return 0, nil, err @@ -387,10 +171,56 @@ func readRowHeader(rv reflect.Value, r io.Reader) (int, []byte, error) { // A zero length byte array means no nils. return int(nf), nil, nil } - var buf [32]byte // should get stack allocated? - nils := buf[:l] + if nf < l { + return int(nf), nil, fmt.Errorf("number of fields is less than byte array %v < %v", nf, l) + } + nils := make([]byte, l) if err := ioutilx.ReadNBufUnsafe(r, nils); err != nil { return int(nf), nil, err } return int(nf), nils, nil } + +// IsFieldNil examines the passed in packed bits nils buffer +// and returns true if the field at that index wasn't encoded +// and can be skipped in decoding. +func IsFieldNil(nils []byte, f int) bool { + i, b := f/8, f%8 + return len(nils) != 0 && (nils[i]>>uint8(b))&0x1 == 1 +} + +// WriteSimpleRowHeader is a convenience function to write Beam Schema Row Headers +// for values that do not have any nil fields. Writes the number of fields total +// and a 0 len byte slice to indicate no fields are nil. +func WriteSimpleRowHeader(fields int, w io.Writer) error { + if err := EncodeVarInt(int64(fields), w); err != nil { + return err + } + // Never nils, so we write the 0 byte header. + if err := EncodeVarInt(0, w); err != nil { + return fmt.Errorf("WriteSimpleRowHeader a 0 length nils bit field: %v", err) + } + return nil +} + +// ReadSimpleRowHeader is a convenience function to read Beam Schema Row Headers +// for values that do not have any nil fields. Reads and validates the number of +// fields total (returning an error for mismatches, and checks that there are +// no nils encoded as a bit field. 
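// A hedged sketch of the exported row-header helpers above (WriteRowHeader,
// ReadRowHeader, IsFieldNil), hand-encoding a three-field "row" whose middle
// field is nil. The field layout and values are illustrative assumptions,
// not part of this change.
package main

import (
	"bytes"
	"fmt"

	"github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder"
)

func main() {
	var buf bytes.Buffer

	// Header: 3 fields total, with field index 1 marked nil in the packed
	// bit array. Nil fields are skipped in the body that follows.
	nilFields := map[int]bool{1: true}
	if err := coder.WriteRowHeader(3, func(i int) bool { return nilFields[i] }, &buf); err != nil {
		panic(err)
	}
	if err := coder.EncodeStringUTF8("hello", &buf); err != nil { // field 0
		panic(err)
	}
	if err := coder.EncodeVarInt(42, &buf); err != nil { // field 2
		panic(err)
	}

	// Decode side: read the header once, then consult the bit array per field.
	n, nils, err := coder.ReadRowHeader(&buf)
	if err != nil {
		panic(err)
	}
	fmt.Println(n, coder.IsFieldNil(nils, 1)) // Expected output: 3 true
	s, _ := coder.DecodeStringUTF8(&buf)      // field 0
	v, _ := coder.DecodeVarInt(&buf)          // field 2 (field 1 was nil, nothing to read)
	fmt.Println(s, v)                         // Expected output: hello 42
}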
+func ReadSimpleRowHeader(fields int, r io.Reader) error { + n, err := DecodeVarInt(r) + if err != nil { + return fmt.Errorf("ReadSimpleRowHeader field count: %v, %v", n, err) + } + if int(n) != fields { + return fmt.Errorf("ReadSimpleRowHeader field count mismatch, got %v, want %v", n, fields) + } + n, err = DecodeVarInt(r) + if err != nil { + return fmt.Errorf("ReadSimpleRowHeader reading nils count: %v, %v", n, err) + } + if n != 0 { + return fmt.Errorf("ReadSimpleRowHeader expected no nils encoded count, got %v", n) + } + return nil +} diff --git a/sdks/go/pkg/beam/core/graph/coder/row_decoder.go b/sdks/go/pkg/beam/core/graph/coder/row_decoder.go new file mode 100644 index 000000000000..8e7124bfaf0f --- /dev/null +++ b/sdks/go/pkg/beam/core/graph/coder/row_decoder.go @@ -0,0 +1,396 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package coder + +import ( + "fmt" + "io" + "reflect" + + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" +) + +// RowDecoderBuilder allows one to build Beam Schema row encoders for provided types. +type RowDecoderBuilder struct { + allFuncs map[reflect.Type]decoderProvider + ifaceFuncs []reflect.Type + + // RequireAllFieldsExported when set to true will have the default decoder building fail if + // there are any unexported fields. When set false, unexported fields in default + // destination structs will be silently ignored when decoding. + // This has no effect on types with registered decoder providers. + RequireAllFieldsExported bool +} + +type decoderProvider = func(reflect.Type) (func(io.Reader) (interface{}, error), error) + +// Register accepts a provider to decode schema encoded values +// of that type. +// +// When decoding values, decoder functions produced by this builder will +// first check for exact type matches, then interfaces implemented by +// the type in recency order of registration, and then finally the +// default Beam Schema encoding behavior. +// +// TODO(BEAM-9615): Add final factory types. This interface is subject to change. 
+// Currently f must be a function func(reflect.Type) (func(io.Reader) (interface{}, error), error) +func (b *RowDecoderBuilder) Register(rt reflect.Type, f interface{}) { + fd, ok := f.(decoderProvider) + if !ok { + panic(fmt.Sprintf("%T isn't a supported decoder function type (passed with %v), currently expecting %T", f, rt, (decoderProvider)(nil))) + } + + if rt.Kind() == reflect.Interface && rt.NumMethod() == 0 { + panic(fmt.Sprintf("interface type %v must have methods", rt)) + } + + if b.allFuncs == nil { + b.allFuncs = make(map[reflect.Type]decoderProvider) + } + b.allFuncs[rt] = fd + if rt.Kind() == reflect.Interface { + b.ifaceFuncs = append(b.ifaceFuncs, rt) + } +} + +// Build constructs a Beam Schema coder for the given type, using any providers registered for +// itself or it's fields. +func (b *RowDecoderBuilder) Build(rt reflect.Type) (func(io.Reader) (interface{}, error), error) { + if err := rowTypeValidation(rt, true); err != nil { + return nil, err + } + return b.decoderForType(rt) +} + +// decoderForType returns a decoder function for the struct or pointer to struct type. +func (b *RowDecoderBuilder) decoderForType(t reflect.Type) (func(io.Reader) (interface{}, error), error) { + // Check if there are any providers registered for this type, or that this type adheres to any interfaces. + f, addr, err := b.customFunc(t) + if err != nil { + return nil, err + } + if addr { + // We cannot deal with address of here, only in embedded fields, indices, and keys/values. So clear f and continue. + f = nil + } + if f != nil { + return f, nil + } + + var isPtr bool + // Pointers become the value type for decomposition. + if t.Kind() == reflect.Ptr { + isPtr = true + t = t.Elem() + } + dec, err := b.decoderForStructReflect(t) + if err != nil { + return nil, err + } + + if isPtr { + return func(r io.Reader) (interface{}, error) { + rv := reflect.New(t) + err := dec(rv.Elem(), r) + return rv.Interface(), errors.Wrapf(err, "decoding a *%v", t) + }, nil + } + return func(r io.Reader) (interface{}, error) { + rv := reflect.New(t) + err := dec(rv.Elem(), r) + return rv.Elem().Interface(), errors.Wrapf(err, "decoding a *%v", t) + }, nil +} + +// decoderForStructReflect returns a reflection based decoder function for the +// given struct type. +func (b *RowDecoderBuilder) decoderForStructReflect(t reflect.Type) (func(reflect.Value, io.Reader) error, error) { + var coder typeDecoderReflect + coder.typ = t + for i := 0; i < t.NumField(); i++ { + i := i // avoid alias issues in the closures. + sf := t.Field(i) + isUnexported := sf.PkgPath != "" + if sf.Anonymous { + ft := sf.Type + if ft.Kind() == reflect.Ptr { + // If a struct embeds a pointer to an unexported type, + // it is not possible to set a newly allocated value + // since the field is unexported. + // + // See https://golang.org/issue/21357 + // + // Since the values are created by this package reflectively, + // there's no work around like pre-allocating the field + // manually. + if isUnexported { + return nil, errors.Errorf("cannot make schema decoder for type %v as it has an embedded field of a pointer to an unexported type %v. See https://golang.org/issue/21357", t, ft.Elem()) + } + ft = ft.Elem() + } + if isUnexported && ft.Kind() != reflect.Struct { + // Ignore embedded fields of unexported non-struct types. + continue + } + // Do not ignore embedded fields of unexported struct types + // since they may have exported fields. 
+ } else if isUnexported { + if b.RequireAllFieldsExported { + return nil, errors.Errorf("cannot make schema decoder for type %v as it has unexported fields such as %s.", t, sf.Name) + } + // Silently ignore, since we can't do anything about it. + // Add a no-op coder to fill in field index + coder.fields = append(coder.fields, typeDecoderFieldReflect{decode: func(rv reflect.Value, r io.Reader) error { + return nil + }}) + continue + } + dec, err := b.decoderForSingleTypeReflect(sf.Type) + if err != nil { + return nil, err + } + coder.fields = append(coder.fields, dec) + } + return func(rv reflect.Value, r io.Reader) error { + nf, nils, err := ReadRowHeader(r) + if err != nil { + return err + } + if nf != len(coder.fields) { + return errors.Errorf("schema[%v] changed: got %d fields, want %d fields", coder.typ, nf, len(coder.fields)) + } + for i, f := range coder.fields { + if IsFieldNil(nils, i) { + continue + } + fv := rv.Field(i) + if f.addr { + fv = fv.Addr() + } + if err := f.decode(fv, r); err != nil { + return err + } + } + return nil + }, nil +} + +func reflectDecodeBool(rv reflect.Value, r io.Reader) error { + v, err := DecodeBool(r) + if err != nil { + return errors.Wrap(err, "error decoding bool field") + } + rv.SetBool(v) + return nil +} + +func reflectDecodeByte(rv reflect.Value, r io.Reader) error { + b, err := DecodeByte(r) + if err != nil { + return errors.Wrap(err, "error decoding single byte field") + } + rv.SetUint(uint64(b)) + return nil +} + +func reflectDecodeString(rv reflect.Value, r io.Reader) error { + v, err := DecodeStringUTF8(r) + if err != nil { + return errors.Wrap(err, "error decoding string field") + } + rv.SetString(v) + return nil +} + +func reflectDecodeInt(rv reflect.Value, r io.Reader) error { + v, err := DecodeVarInt(r) + if err != nil { + return errors.Wrap(err, "error decoding varint field") + } + rv.SetInt(v) + return nil +} + +func reflectDecodeUint(rv reflect.Value, r io.Reader) error { + v, err := DecodeVarUint64(r) + if err != nil { + return errors.Wrap(err, "error decoding varint field") + } + rv.SetUint(v) + return nil +} + +func reflectDecodeFloat(rv reflect.Value, r io.Reader) error { + v, err := DecodeDouble(r) + if err != nil { + return errors.Wrap(err, "error decoding double field") + } + rv.SetFloat(v) + return nil +} + +func reflectDecodeByteSlice(rv reflect.Value, r io.Reader) error { + b, err := DecodeBytes(r) + if err != nil { + return errors.Wrap(err, "error decoding []byte field") + } + rv.SetBytes(b) + return nil +} + +// customFunc returns nil if no custom func exists for this type. +// If an error is returned, coder construction should be aborted. +func (b *RowDecoderBuilder) customFunc(t reflect.Type) (func(io.Reader) (interface{}, error), bool, error) { + if fact, ok := b.allFuncs[t]; ok { + f, err := fact(t) + + if err != nil { + return nil, false, err + } + return f, false, nil + } + // Check satisfaction of interface types in reverse registration order. + pt := reflect.PtrTo(t) + for i := len(b.ifaceFuncs) - 1; i >= 0; i-- { + it := b.ifaceFuncs[i] + if ok := t.Implements(it); ok { + if fact, ok := b.allFuncs[it]; ok { + f, err := fact(t) + if err != nil { + return nil, false, err + } + return f, false, nil + } + } + // This can occur when the type uses a pointer receiver but it is included as a value in a struct field. 
+ if ok := pt.Implements(it); ok { + if fact, ok := b.allFuncs[it]; ok { + f, err := fact(pt) + if err != nil { + return nil, true, err + } + return f, true, nil + } + } + } + return nil, false, nil +} + +// decoderForSingleTypeReflect returns a reflection based decoder function for the +// given type. +func (b *RowDecoderBuilder) decoderForSingleTypeReflect(t reflect.Type) (typeDecoderFieldReflect, error) { + // Check if there are any providers registered for this type, or that this type adheres to any interfaces. + dec, addr, err := b.customFunc(t) + if err != nil { + return typeDecoderFieldReflect{}, err + } + if dec != nil { + return typeDecoderFieldReflect{ + decode: func(v reflect.Value, r io.Reader) error { + elm, err := dec(r) + if err != nil { + return err + } + if addr { + v.Elem().Set(reflect.ValueOf(elm).Elem()) + } else { + v.Set(reflect.ValueOf(elm)) + } + return nil + }, + addr: addr, + }, nil + } + switch t.Kind() { + case reflect.Struct: + dec, err := b.decoderForStructReflect(t) + return typeDecoderFieldReflect{decode: dec}, err + case reflect.Bool: + return typeDecoderFieldReflect{decode: reflectDecodeBool}, nil + case reflect.Uint8: + return typeDecoderFieldReflect{decode: reflectDecodeByte}, nil + case reflect.String: + return typeDecoderFieldReflect{decode: reflectDecodeString}, nil + case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64: + return typeDecoderFieldReflect{decode: reflectDecodeInt}, nil + case reflect.Uint, reflect.Uint64, reflect.Uint32, reflect.Uint16: + return typeDecoderFieldReflect{decode: reflectDecodeUint}, nil + case reflect.Float32, reflect.Float64: + return typeDecoderFieldReflect{decode: reflectDecodeFloat}, nil + case reflect.Ptr: + decf, err := b.decoderForSingleTypeReflect(t.Elem()) + if err != nil { + return typeDecoderFieldReflect{}, err + } + return typeDecoderFieldReflect{decode: func(rv reflect.Value, r io.Reader) error { + rv.Set(reflect.New(t.Elem())) + if !decf.addr { + rv = rv.Elem() + } + return decf.decode(rv, r) + }}, nil + case reflect.Slice: + // Special case handling for byte slices. 
+ if t.Elem().Kind() == reflect.Uint8 { + return typeDecoderFieldReflect{decode: reflectDecodeByteSlice}, nil + } + decf, err := b.containerDecoderForType(t.Elem()) + if err != nil { + return typeDecoderFieldReflect{}, err + } + return typeDecoderFieldReflect{decode: iterableDecoderForSlice(t, decf)}, nil + case reflect.Array: + decf, err := b.containerDecoderForType(t.Elem()) + if err != nil { + return typeDecoderFieldReflect{}, err + } + return typeDecoderFieldReflect{decode: iterableDecoderForArray(t, decf)}, nil + case reflect.Map: + decK, err := b.containerDecoderForType(t.Key()) + if err != nil { + return typeDecoderFieldReflect{}, err + } + decV, err := b.containerDecoderForType(t.Elem()) + if err != nil { + return typeDecoderFieldReflect{}, err + } + return typeDecoderFieldReflect{decode: mapDecoder(t, decK, decV)}, nil + } + return typeDecoderFieldReflect{}, errors.Errorf("unable to decode type: %v", t) +} + +func (b *RowDecoderBuilder) containerDecoderForType(t reflect.Type) (typeDecoderFieldReflect, error) { + dec, err := b.decoderForSingleTypeReflect(t) + if err != nil { + return typeDecoderFieldReflect{}, err + } + if t.Kind() == reflect.Ptr { + return typeDecoderFieldReflect{decode: containerNilDecoder(dec.decode), addr: dec.addr}, nil + } + return dec, nil +} + +type typeDecoderReflect struct { + typ reflect.Type + fields []typeDecoderFieldReflect +} + +type typeDecoderFieldReflect struct { + decode func(reflect.Value, io.Reader) error + // If true the decoder is expecting us to pass it the address + // of the field value (i.e. &foo.bar) and not the field value (i.e. foo.bar). + addr bool +} diff --git a/sdks/go/pkg/beam/core/graph/coder/row_encoder.go b/sdks/go/pkg/beam/core/graph/coder/row_encoder.go new file mode 100644 index 000000000000..e12776459da8 --- /dev/null +++ b/sdks/go/pkg/beam/core/graph/coder/row_encoder.go @@ -0,0 +1,351 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package coder + +import ( + "fmt" + "io" + "reflect" + + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" +) + +// RowEncoderBuilder allows one to build Beam Schema row encoders for provided types. +type RowEncoderBuilder struct { + allFuncs map[reflect.Type]encoderProvider + ifaceFuncs []reflect.Type + + // RequireAllFieldsExported when set to true will have the default decoder building fail if + // there are any unexported fields. When set false, unexported fields in default + // destination structs will be silently ignored when decoding. + // This has no effect on types with registered decoder providers. + RequireAllFieldsExported bool +} + +type encoderProvider = func(reflect.Type) (func(interface{}, io.Writer) error, error) + +// Register accepts a provider for the given type to schema encode values of that type. 
+// +// When generating encoding functions, this builder will first check for exact type +// matches, then against interfaces with registered factories in recency order of +// registration, and then finally use the default Beam Schema encoding behavior. +// +// TODO(BEAM-9615): Add final factory types. This interface is subject to change. +// Currently f must be a function of the type func(reflect.Type) func(T, io.Writer) (error). +func (b *RowEncoderBuilder) Register(rt reflect.Type, f interface{}) { + fe, ok := f.(encoderProvider) + if !ok { + panic(fmt.Sprintf("%T isn't a supported encoder function type (passed with %v)", f, rt)) + } + + if rt.Kind() == reflect.Interface && rt.NumMethod() == 0 { + panic(fmt.Sprintf("interface type %v must have methods", rt)) + } + if b.allFuncs == nil { + b.allFuncs = make(map[reflect.Type]encoderProvider) + } + b.allFuncs[rt] = fe + if rt.Kind() == reflect.Interface { + b.ifaceFuncs = append(b.ifaceFuncs, rt) + } +} + +// Build constructs a Beam Schema coder for the given type, using any providers registered for +// itself or it's fields. +func (b *RowEncoderBuilder) Build(rt reflect.Type) (func(interface{}, io.Writer) error, error) { + if err := rowTypeValidation(rt, true); err != nil { + return nil, err + } + return b.encoderForType(rt) +} + +// customFunc returns nil if no custom func exists for this type. +// If an error is returned, coder construction should be aborted. +func (b *RowEncoderBuilder) customFunc(t reflect.Type) (func(interface{}, io.Writer) error, bool, error) { + if fact, ok := b.allFuncs[t]; ok { + f, err := fact(t) + + if err != nil { + return nil, false, err + } + return f, false, err + } + pt := reflect.PtrTo(t) + // Check satisfaction of interface types in reverse registration order. + for i := len(b.ifaceFuncs) - 1; i >= 0; i-- { + it := b.ifaceFuncs[i] + if ok := t.Implements(it); ok { + if fact, ok := b.allFuncs[it]; ok { + f, err := fact(t) + if err != nil { + return nil, false, err + } + return f, false, nil + } + } + // This can occur when the type uses a pointer receiver but it is included as a value in a struct field. + if ok := pt.Implements(it); ok { + if fact, ok := b.allFuncs[it]; ok { + f, err := fact(pt) + if err != nil { + return nil, true, err + } + return f, true, nil + } + } + } + return nil, false, nil +} + +// encoderForType returns an encoder function for the struct or pointer to struct type. +func (b *RowEncoderBuilder) encoderForType(t reflect.Type) (func(interface{}, io.Writer) error, error) { + // Check if there are any providers registered for this type, or that this type adheres to any interfaces. + var isPtr bool + // Pointers become the value type for decomposition. + if t.Kind() == reflect.Ptr { + // If we have something for the pointer version already, we're done. + enc, addr, err := b.customFunc(t) + if err != nil { + return nil, err + } + if addr { + // We cannot deal with address of here, only in embedded fields, indices, and keys/values. So clear f and continue. + enc = nil + } + if enc != nil { + return enc, nil + } + isPtr = true + t = t.Elem() + } + + { + enc, addr, err := b.customFunc(t) + if err != nil { + return nil, err + } + if addr { + // We cannot deal with address of here, only in embedded fields, indices, and keys/values. So clear f and continue. + enc = nil + } + if enc != nil { + if isPtr { + // We have the value version, but not a pointer version, so we jump through reflect to + // get the right type to pass in. 
+ return func(v interface{}, w io.Writer) error { + return enc(reflect.ValueOf(v).Elem().Interface(), w) + }, nil + } + return enc, nil + } + } + + enc, err := b.encoderForStructReflect(t) + if err != nil { + return nil, err + } + + if isPtr { + return func(v interface{}, w io.Writer) error { + return enc(reflect.ValueOf(v).Elem(), w) + }, nil + } + return func(v interface{}, w io.Writer) error { + return enc(reflect.ValueOf(v), w) + }, nil +} + +// Generates coder using reflection for +func (b *RowEncoderBuilder) encoderForSingleTypeReflect(t reflect.Type) (typeEncoderFieldReflect, error) { + // Check if there are any providers registered for this type, or that this type adheres to any interfaces. + enc, addr, err := b.customFunc(t) + if err != nil { + return typeEncoderFieldReflect{}, err + } + if enc != nil { + return typeEncoderFieldReflect{ + encode: func(v reflect.Value, w io.Writer) error { + return enc(v.Interface(), w) + }, + addr: addr, + }, nil + } + + switch t.Kind() { + case reflect.Struct: + enc, err := b.encoderForStructReflect(t) + return typeEncoderFieldReflect{encode: enc}, err + case reflect.Bool: + return typeEncoderFieldReflect{encode: func(rv reflect.Value, w io.Writer) error { + return EncodeBool(rv.Bool(), w) + }}, nil + case reflect.Uint8: + return typeEncoderFieldReflect{encode: func(rv reflect.Value, w io.Writer) error { + return EncodeByte(byte(rv.Uint()), w) + }}, nil + case reflect.String: + return typeEncoderFieldReflect{encode: func(rv reflect.Value, w io.Writer) error { + return EncodeStringUTF8(rv.String(), w) + }}, nil + case reflect.Int, reflect.Int64, reflect.Int16, reflect.Int32, reflect.Int8: + return typeEncoderFieldReflect{encode: func(rv reflect.Value, w io.Writer) error { + return EncodeVarInt(rv.Int(), w) + }}, nil + case reflect.Uint, reflect.Uint64, reflect.Uint32, reflect.Uint16: + return typeEncoderFieldReflect{encode: func(rv reflect.Value, w io.Writer) error { + return EncodeVarUint64(rv.Uint(), w) + }}, nil + case reflect.Float32, reflect.Float64: + return typeEncoderFieldReflect{encode: func(rv reflect.Value, w io.Writer) error { + return EncodeDouble(rv.Float(), w) + }}, nil + case reflect.Ptr: + // Nils are handled at the struct field level. + encf, err := b.encoderForSingleTypeReflect(t.Elem()) + if err != nil { + return typeEncoderFieldReflect{}, err + } + return typeEncoderFieldReflect{encode: func(rv reflect.Value, w io.Writer) error { + if !encf.addr { + rv = rv.Elem() + } + return encf.encode(rv, w) + }}, nil + case reflect.Slice: + // Special case handling for byte slices. 
+ if t.Elem().Kind() == reflect.Uint8 { + return typeEncoderFieldReflect{encode: func(rv reflect.Value, w io.Writer) error { + return EncodeBytes(rv.Bytes(), w) + }}, nil + } + encf, err := b.containerEncoderForType(t.Elem()) + if err != nil { + return typeEncoderFieldReflect{}, err + } + return typeEncoderFieldReflect{encode: iterableEncoder(t, encf)}, nil + case reflect.Array: + encf, err := b.containerEncoderForType(t.Elem()) + if err != nil { + return typeEncoderFieldReflect{}, err + } + return typeEncoderFieldReflect{encode: iterableEncoder(t, encf)}, nil + case reflect.Map: + encK, err := b.containerEncoderForType(t.Key()) + if err != nil { + return typeEncoderFieldReflect{}, err + } + encV, err := b.containerEncoderForType(t.Elem()) + if err != nil { + return typeEncoderFieldReflect{}, err + } + return typeEncoderFieldReflect{encode: mapEncoder(t, encK, encV)}, nil + } + return typeEncoderFieldReflect{}, errors.Errorf("unable to encode type: %v", t) +} + +func (b *RowEncoderBuilder) containerEncoderForType(t reflect.Type) (typeEncoderFieldReflect, error) { + encf, err := b.encoderForSingleTypeReflect(t) + if err != nil { + return typeEncoderFieldReflect{}, err + } + if t.Kind() == reflect.Ptr { + return typeEncoderFieldReflect{encode: containerNilEncoder(encf.encode), addr: encf.addr}, nil + } + return encf, nil +} + +type typeEncoderReflect struct { + debug []string + fields []typeEncoderFieldReflect +} + +type typeEncoderFieldReflect struct { + encode func(reflect.Value, io.Writer) error + // If true the encoder is expecting us to pass it the address + // of the field value (i.e. &foo.bar) and not the field value (i.e. foo.bar). + addr bool +} + +// encoderForStructReflect generates reflection field access closures for structs. +func (b *RowEncoderBuilder) encoderForStructReflect(t reflect.Type) (func(reflect.Value, io.Writer) error, error) { + var coder typeEncoderReflect + for i := 0; i < t.NumField(); i++ { + coder.debug = append(coder.debug, t.Field(i).Name+" "+t.Field(i).Type.String()) + sf := t.Field(i) + isUnexported := sf.PkgPath != "" + if sf.Anonymous { + ft := sf.Type + if ft.Kind() == reflect.Ptr { + // If a struct embeds a pointer to an unexported type, + // it is not possible to set a newly allocated value + // since the field is unexported. + // + // See https://golang.org/issue/21357 + // + // Since the values are created by this package reflectively, + // there's no work around like pre-allocating the field + // manually. + if isUnexported { + return nil, errors.Errorf("cannot make schema encoder for type %v as it has an embedded field of a pointer to an unexported type %v. See https://golang.org/issue/21357", t, ft.Elem()) + } + ft = ft.Elem() + } + if isUnexported && ft.Kind() != reflect.Struct { + // Ignore embedded fields of unexported non-struct types. + continue + } + // Do not ignore embedded fields of unexported struct types + // since they may have exported fields. + } else if isUnexported { + if b.RequireAllFieldsExported { + return nil, errors.Errorf("cannot make schema encoder for type %v as it has unexported fields such as %s.", t, sf.Name) + } + // Silently ignore, since we can't do anything about it. 
+ // Add a no-op coder to fill in field index + coder.fields = append(coder.fields, typeEncoderFieldReflect{encode: func(rv reflect.Value, w io.Writer) error { + return nil + }}) + continue + } + enc, err := b.encoderForSingleTypeReflect(sf.Type) + if err != nil { + return nil, err + } + coder.fields = append(coder.fields, enc) + } + + return func(rv reflect.Value, w io.Writer) error { + if err := writeRowHeader(rv, w); err != nil { + return err + } + for i, f := range coder.fields { + rvf := rv.Field(i) + switch rvf.Kind() { + case reflect.Ptr, reflect.Map, reflect.Slice: + if rvf.IsNil() { + continue + } + } + if f.addr { + rvf = rvf.Addr() + } + if err := f.encode(rvf, w); err != nil { + return errors.Wrapf(err, "encoding %v, expected: %q", rvf.Type(), coder.debug[i]) + } + } + return nil + }, nil +} diff --git a/sdks/go/pkg/beam/core/graph/coder/row_test.go b/sdks/go/pkg/beam/core/graph/coder/row_test.go index 38b7c5dfb1f9..b0193ee67e1a 100644 --- a/sdks/go/pkg/beam/core/graph/coder/row_test.go +++ b/sdks/go/pkg/beam/core/graph/coder/row_test.go @@ -18,9 +18,11 @@ package coder import ( "bytes" "fmt" + "io" "reflect" "testing" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/jsonx" "github.com/google/go-cmp/cmp" ) @@ -28,113 +30,151 @@ func TestReflectionRowCoderGeneration(t *testing.T) { num := 35 tests := []struct { want interface{} - }{{ - // Top level value check - want: UserType1{ - A: "cats", - B: 24, - C: "pjamas", - }, - }, { - // Top level pointer check - want: &UserType1{ - A: "marmalade", - B: 24, - C: "jam", - }, - }, { - // Inner pointer check. - want: UserType2{ - A: "dogs", - B: &UserType1{ + }{ + { + // Top level value check + want: UserType1{ A: "cats", B: 24, C: "pjamas", }, - C: &num, - }, - }, { - // nil pointer check. - want: UserType2{ - A: "dogs", - B: nil, - C: nil, - }, - }, { - // All zeroes - want: struct { - V00 bool - V01 byte // unsupported by spec (same as uint8) - V02 uint8 // unsupported by spec - V03 int16 - // V04 uint16 // unsupported by spec - V05 int32 - // V06 uint32 // unsupported by spec - V07 int64 - // V08 uint64 // unsupported by spec - V09 int - V10 struct{} - V11 *struct{} - V12 [0]int - V13 [2]int - V14 []int - V15 map[string]int - V16 float32 - V17 float64 - V18 []byte - V19 [2]*int - V20 map[*string]*int - }{}, - }, { - want: struct { - V00 bool - V01 byte // unsupported by spec (same as uint8) - V02 uint8 // unsupported by spec - V03 int16 - // V04 uint16 // unsupported by spec - V05 int32 - // V06 uint32 // unsupported by spec - V07 int64 - // V08 uint64 // unsupported by spec - V09 int - V10 struct{} - V11 *struct{} - V12 [0]int - V13 [2]int - V14 []int - V15 map[string]int - V16 float32 - V17 float64 - V18 []byte - V19 [2]*int - V20 map[string]*int - V21 []*int - }{ - V00: true, - V01: 1, - V02: 2, - V03: 3, - V05: 5, - V07: 7, - V09: 9, - V10: struct{}{}, - V11: &struct{}{}, - V12: [0]int{}, - V13: [2]int{72, 908}, - V14: []int{12, 9326, 641346, 6}, - V15: map[string]int{"pants": 42}, - V16: 3.14169, - V17: 2.6e100, - V18: []byte{21, 17, 65, 255, 0, 16}, - V19: [2]*int{nil, &num}, - V20: map[string]*int{ - "notnil": &num, - "nil": nil, - }, - V21: []*int{nil, &num, nil}, + }, { + // Top level pointer check + want: &UserType1{ + A: "marmalade", + B: 24, + C: "jam", + }, + }, { + // Inner pointer check. + want: UserType2{ + A: "dogs", + B: &UserType1{ + A: "cats", + B: 24, + C: "pjamas", + }, + C: &num, + }, + }, { + // nil pointer check. 
+ want: UserType2{ + A: "dogs", + B: nil, + C: nil, + }, + }, { + // nested struct check + want: UserType3{ + A: UserType1{ + A: "marmalade", + B: 24, + C: "jam", + }, + }, + }, { + // embedded struct check + want: UserType4{ + UserType1{ + A: "marmalade", + B: 24, + C: "jam", + }, + }, + }, { + // embedded struct check2 + want: userType5{ + unexportedUserType: unexportedUserType{ + A: 24, + B: "marmalade", + }, + C: 79, + }, + }, { + // embedded struct check3 + want: userType6{ + UserType1: &UserType1{ + A: "marmalade", + B: 24, + }, + C: 81, + }, + }, { + // All zeroes + want: struct { + V00 bool + V01 byte + V02 uint8 + V03 int16 + V04 uint16 + V05 int32 + V06 uint32 + V07 int64 + V08 uint64 + V09 int + V10 struct{} + V11 *struct{} + V12 [0]int + V13 [2]int + V14 []int + V15 map[string]int + V16 float32 + V17 float64 + V18 []byte + V19 [2]*int + V20 map[*string]*int + }{}, + }, { + want: struct { + V00 bool + V01 byte + V02 uint8 + V03 int16 + V04 uint16 + V05 int32 + V06 uint32 + V07 int64 + V08 uint64 + V09 int + V10 struct{} + V11 *struct{} + V12 [0]int + V13 [2]int + V14 []int + V15 map[string]int + V16 float32 + V17 float64 + V18 []byte + V19 [2]*int + V20 map[*string]*int + V21 []*int + }{ + V00: true, + V01: 1, + V02: 2, + V03: 3, + V04: 4, + V05: 5, + V06: 6, + V07: 7, + V08: 8, + V09: 9, + V10: struct{}{}, + V11: &struct{}{}, + V12: [0]int{}, + V13: [2]int{72, 908}, + V14: []int{12, 9326, 641346, 6}, + V15: map[string]int{"pants": 42}, + V16: 3.14169, + V17: 2.6e100, + V18: []byte{21, 17, 65, 255, 0, 16}, + V19: [2]*int{nil, &num}, + V20: map[*string]*int{ + nil: nil, + }, + V21: []*int{nil, &num, nil}, + }, }, - // TODO add custom types such as protocol buffers. - }, } for _, test := range tests { t.Run(fmt.Sprintf("%+v", test.want), func(t *testing.T) { @@ -157,12 +197,11 @@ func TestReflectionRowCoderGeneration(t *testing.T) { if err != nil { t.Fatalf("RowDecoderForStruct(%v) = %v, want nil error", rt, err) } - if d := cmp.Diff(test.want, got); d != "" { + if d := cmp.Diff(test.want, got, cmp.AllowUnexported(userType5{}, unexportedUserType{})); d != "" { t.Fatalf("dec(enc(%v)) = %v\ndiff (-want, +got): %v", test.want, got, d) } }) } - } type UserType1 struct { @@ -176,3 +215,614 @@ type UserType2 struct { B *UserType1 C *int } + +type UserType3 struct { + A UserType1 +} + +// Embedding check. +type UserType4 struct { + UserType1 +} + +type unexportedUserType struct { + A int + B string + c int32 +} + +// Embedding check with unexported type. +type userType5 struct { + unexportedUserType + C int32 +} + +// Embedding check with a pointer Exported type +type userType6 struct { + *UserType1 + C int32 +} + +// Note: pointers to unexported types can't be handled by +// this package. See https://golang.org/issue/21357. 
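// Illustrative sketch (hypothetical type, not declared in this file): an
// embedded pointer to an unexported type is rejected by the reflection-based
// row coder, while the embedded forms above (UserType4, userType5, userType6)
// are supported.
//
//	type badEmbed struct {
//		*unexportedUserType
//	}
//
//	_, err := RowEncoderForStruct(reflect.TypeOf(badEmbed{}))
//	// err reports an "embedded field of a pointer to an unexported type".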
+ +func ut1Enc(val interface{}, w io.Writer) error { + if err := WriteSimpleRowHeader(3, w); err != nil { + return err + } + elm := val.(UserType1) + if err := EncodeStringUTF8(elm.A, w); err != nil { + return err + } + if err := EncodeVarInt(int64(elm.B), w); err != nil { + return err + } + if err := EncodeStringUTF8(elm.C, w); err != nil { + return err + } + return nil +} + +func ut1Dec(r io.Reader) (interface{}, error) { + if err := ReadSimpleRowHeader(3, r); err != nil { + return nil, err + } + a, err := DecodeStringUTF8(r) + if err != nil { + return nil, fmt.Errorf("decoding string field A: %v", err) + } + b, err := DecodeVarInt(r) + if err != nil { + return nil, fmt.Errorf("decoding int field B: %v", err) + } + c, err := DecodeStringUTF8(r) + if err != nil { + return nil, fmt.Errorf("decoding string field C: %v, %v", c, err) + } + return UserType1{ + A: a, + B: int(b), + C: c, + }, nil +} + +func TestRowCoder_CustomCoder(t *testing.T) { + customRT := reflect.TypeOf(UserType1{}) + customEnc := ut1Enc + customDec := ut1Dec + + num := 35 + tests := []struct { + want interface{} + }{ + { + // Top level value check + want: UserType1{ + A: "cats", + B: 24, + C: "pjamas", + }, + }, { + // Top level pointer check + want: &UserType1{ + A: "marmalade", + B: 24, + C: "jam", + }, + }, { + // Inner pointer check. + want: UserType2{ + A: "dogs", + B: &UserType1{ + A: "cats", + B: 24, + C: "pjamas", + }, + C: &num, + }, + }, { + // nil pointer check. + want: UserType2{ + A: "dogs", + B: nil, + C: nil, + }, + }, { + // nested struct check + want: UserType3{ + A: UserType1{ + A: "marmalade", + B: 24, + C: "jam", + }, + }, + }, { + // embedded struct check + want: UserType4{ + UserType1{ + A: "marmalade", + B: 24, + C: "jam", + }, + }, + }, + } + for _, test := range tests { + t.Run(fmt.Sprintf("%+v", test.want), func(t *testing.T) { + rt := reflect.TypeOf(test.want) + var encB RowEncoderBuilder + encB.Register(customRT, func(reflect.Type) (func(interface{}, io.Writer) error, error) { return customEnc, nil }) + enc, err := encB.Build(rt) + if err != nil { + t.Fatalf("RowEncoderBuilder.Build(%v) = %v, want nil error", rt, err) + } + var decB RowDecoderBuilder + decB.Register(customRT, func(reflect.Type) (func(io.Reader) (interface{}, error), error) { return customDec, nil }) + dec, err := decB.Build(rt) + if err != nil { + t.Fatalf("RowDecoderBuilder.Build(%v) = %v, want nil error", rt, err) + } + var buf bytes.Buffer + if err := enc(test.want, &buf); err != nil { + t.Fatalf("enc(%v) = err, want nil error", err) + } + _, err = dec(&buf) + if err != nil { + t.Fatalf("BuildDecoder(%v) = %v, want nil error", rt, err) + } + }) + } +} + +func BenchmarkRowCoder_RoundTrip(b *testing.B) { + ut1Enc := func(val interface{}, w io.Writer) error { + elm := val.(UserType1) + // We have 3 fields we use. + if err := EncodeVarInt(3, w); err != nil { + return err + } + // Never nils, so we write the 0 byte header. + if err := EncodeVarInt(0, w); err != nil { + return err + } + if err := EncodeStringUTF8(elm.A, w); err != nil { + return err + } + if err := EncodeVarInt(int64(elm.B), w); err != nil { + return err + } + if err := EncodeStringUTF8(elm.C, w); err != nil { + return err + } + return nil + } + ut1Dec := func(r io.Reader) (interface{}, error) { + // We have 3 fields we use. 
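		// (Row wire format, as implied by the matching encoder above: a varint
		// field count, then a varint-prefixed bitmap of nil fields (zero here,
		// since no field is ever nil), followed by the field values in order.)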
+ n, err := DecodeVarInt(r) + if err != nil { + return nil, fmt.Errorf("decoding header fieldcount: %v, %v", n, err) + } + if n != 3 { + return nil, fmt.Errorf("decoding header field count, got %v, want %v", n, 3) + } + // Never nils, so we read the 0 byte header. + n, err = DecodeVarInt(r) + if err != nil { + return nil, fmt.Errorf("decoding header nils: %v, %v", n, err) + } + if n != 0 { + return nil, fmt.Errorf("decoding header nils count, got %v, want %v", n, 0) + } + a, err := DecodeStringUTF8(r) + if err != nil { + return nil, fmt.Errorf("decoding string field A: %v", err) + } + b, err := DecodeVarInt(r) + if err != nil { + return nil, fmt.Errorf("decoding int field B: %v", err) + } + c, err := DecodeStringUTF8(r) + if err != nil { + return nil, fmt.Errorf("decoding string field C: %v, %v", c, err) + } + return UserType1{ + A: a, + B: int(b), + C: c, + }, nil + } + + num := 35 + benches := []struct { + want interface{} + customRT reflect.Type + customEnc func(interface{}, io.Writer) error + customDec func(io.Reader) (interface{}, error) + }{ + { + // Top level value check + want: UserType1{ + A: "cats", + B: 24, + C: "pjamas", + }, + customRT: reflect.TypeOf(UserType1{}), + customEnc: ut1Enc, + customDec: ut1Dec, + }, { + // Top level pointer check + want: &UserType1{ + A: "marmalade", + B: 24, + C: "jam", + }, + customRT: reflect.TypeOf(UserType1{}), + customEnc: ut1Enc, + customDec: ut1Dec, + }, { + // Inner pointer check. + want: UserType2{ + A: "dogs", + B: &UserType1{ + A: "cats", + B: 24, + C: "pjamas", + }, + C: &num, + }, + customRT: reflect.TypeOf(UserType1{}), + customEnc: ut1Enc, + customDec: ut1Dec, + }, { + // nil pointer check. + want: UserType2{ + A: "dogs", + B: nil, + C: nil, + }, + customRT: reflect.TypeOf(UserType1{}), + customEnc: ut1Enc, + customDec: ut1Dec, + }, { + // nested struct check + want: UserType3{ + A: UserType1{ + A: "marmalade", + B: 24, + C: "jam", + }, + }, + customRT: reflect.TypeOf(UserType1{}), + customEnc: ut1Enc, + customDec: ut1Dec, + }, { + // embedded struct check + want: UserType4{ + UserType1{ + A: "marmalade", + B: 24, + C: "jam", + }, + }, + customRT: reflect.TypeOf(UserType1{}), + customEnc: ut1Enc, + customDec: ut1Dec, + }, { + // embedded struct check2 + want: userType5{ + unexportedUserType: unexportedUserType{ + B: "marmalade", + A: 24, + }, + C: 79, + }, + }, { + // embedded struct check3 + want: userType6{ + UserType1: &UserType1{ + A: "marmalade", + B: 24, + }, + C: 81, + }, + }, + } + for _, bench := range benches { + rt := reflect.TypeOf(bench.want) + { + enc, err := RowEncoderForStruct(rt) + if err != nil { + b.Fatalf("BuildEncoder(%v) = %v, want nil error", rt, err) + } + dec, err := RowDecoderForStruct(rt) + if err != nil { + b.Fatalf("BuildDecoder(%v) = %v, want nil error", rt, err) + } + var buf bytes.Buffer + b.Run(fmt.Sprintf("SCHEMA %+v", bench.want), func(b *testing.B) { + for i := 0; i < b.N; i++ { + if err := enc(bench.want, &buf); err != nil { + b.Fatalf("enc(%v) = err, want nil error", err) + } + _, err := dec(&buf) + if err != nil { + b.Fatalf("BuildDecoder(%v) = %v, want nil error", rt, err) + } + } + }) + } + if bench.customEnc != nil && bench.customDec != nil && rt == bench.customRT { + var buf bytes.Buffer + b.Run(fmt.Sprintf("CUSTOM %+v", bench.want), func(b *testing.B) { + for i := 0; i < b.N; i++ { + if err := bench.customEnc(bench.want, &buf); err != nil { + b.Fatalf("enc(%v) = err, want nil error", err) + } + _, err := bench.customDec(&buf) + if err != nil { + b.Fatalf("BuildDecoder(%v) = %v, want nil 
error", rt, err) + } + } + }) + } + if bench.customEnc != nil && bench.customDec != nil { + var encB RowEncoderBuilder + encB.Register(bench.customRT, func(reflect.Type) (func(interface{}, io.Writer) error, error) { return bench.customEnc, nil }) + enc, err := encB.Build(rt) + if err != nil { + b.Fatalf("RowEncoderBuilder.Build(%v) = %v, want nil error", rt, err) + } + var decB RowDecoderBuilder + decB.Register(bench.customRT, func(reflect.Type) (func(io.Reader) (interface{}, error), error) { return bench.customDec, nil }) + dec, err := decB.Build(rt) + if err != nil { + b.Fatalf("RowDecoderBuilder.Build(%v) = %v, want nil error", rt, err) + } + var buf bytes.Buffer + b.Run(fmt.Sprintf("REGISTERED %+v", bench.want), func(b *testing.B) { + for i := 0; i < b.N; i++ { + if err := enc(bench.want, &buf); err != nil { + b.Fatalf("enc(%v) = err, want nil error", err) + } + _, err := dec(&buf) + if err != nil { + b.Fatalf("BuildDecoder(%v) = %v, want nil error", rt, err) + } + } + }) + } + { + b.Run(fmt.Sprintf("JSON %+v", bench.want), func(b *testing.B) { + for i := 0; i < b.N; i++ { + data, err := jsonx.Marshal(bench.want) + if err != nil { + b.Fatalf("jsonx.Marshal(%v) = err, want nil error", err) + } + val := reflect.New(rt) + if err := jsonx.Unmarshal(val.Interface(), data); err != nil { + b.Fatalf("jsonx.Unmarshal(%v) = %v, want nil error; type: %v", rt, err, val.Type()) + } + } + }) + } + } +} + +type testInterface interface { + TestEncode() ([]byte, error) + TestDecode(b []byte) error +} + +var ( + testInterfaceType = reflect.TypeOf((*testInterface)(nil)).Elem() + testStorageType = reflect.TypeOf((*struct{ TestData []byte })(nil)).Elem() + testParDoType = reflect.TypeOf((*testParDo)(nil)) +) + +type testProvider struct{} + +func (p *testProvider) FromLogicalType(rt reflect.Type) (reflect.Type, error) { + if !rt.Implements(testInterfaceType) { + return nil, fmt.Errorf("%s does not implement testInterface", rt) + } + return testStorageType, nil +} + +func (p *testProvider) BuildEncoder(rt reflect.Type) (func(interface{}, io.Writer) error, error) { + if _, err := p.FromLogicalType(rt); err != nil { + return nil, err + } + + return func(iface interface{}, w io.Writer) error { + v := iface.(testInterface) + data, err := v.TestEncode() + if err != nil { + return err + } + if err := WriteSimpleRowHeader(1, w); err != nil { + return err + } + if err := EncodeBytes(data, w); err != nil { + return err + } + return nil + }, nil +} + +func (p *testProvider) BuildDecoder(rt reflect.Type) (func(io.Reader) (interface{}, error), error) { + if _, err := p.FromLogicalType(rt); err != nil { + return nil, err + } + if rt.Kind() == reflect.Ptr { + rt = rt.Elem() + return func(r io.Reader) (interface{}, error) { + if err := ReadSimpleRowHeader(1, r); err != nil { + return nil, err + } + data, err := DecodeBytes(r) + if err != nil { + return nil, err + } + v, ok := reflect.New(rt).Interface().(testInterface) + if !ok { + return nil, fmt.Errorf("%s is not %s", reflect.PtrTo(rt), testInterfaceType) + } + if err := v.TestDecode(data); err != nil { + return nil, err + } + return v, nil + }, nil + } + return func(r io.Reader) (interface{}, error) { + if err := ReadSimpleRowHeader(1, r); err != nil { + return nil, err + } + data, err := DecodeBytes(r) + if err != nil { + return nil, err + } + v, ok := reflect.New(rt).Elem().Interface().(testInterface) + if !ok { + return nil, fmt.Errorf("%s is not %s", rt, testInterfaceType) + } + if err := v.TestDecode(data); err != nil { + return nil, err + } + return v, nil + }, 
nil +} + +type testStruct struct { + A int64 + + b int64 +} + +func (s *testStruct) TestEncode() ([]byte, error) { + var buf bytes.Buffer + if err := EncodeVarInt(s.A, &buf); err != nil { + return nil, err + } + if err := EncodeVarInt(s.b, &buf); err != nil { + return nil, err + } + return buf.Bytes(), nil +} + +func (s *testStruct) TestDecode(b []byte) error { + buf := bytes.NewReader(b) + var err error + s.A, err = DecodeVarInt(buf) + if err != nil { + return err + } + s.b, err = DecodeVarInt(buf) + if err != nil { + return err + } + return nil +} + +var _ testInterface = &testStruct{} + +type testParDo struct { + Struct testStruct + StructPtr *testStruct + StructSlice []testStruct + StructPtrSlice []*testStruct + StructMap map[int64]testStruct + StructPtrMap map[int64]*testStruct +} + +func TestSchemaProviderInterface(t *testing.T) { + p := testProvider{} + encb := &RowEncoderBuilder{} + encb.Register(testInterfaceType, p.BuildEncoder) + enc, err := encb.Build(testParDoType) + if err != nil { + t.Fatalf("RowEncoderBuilder.Build(%v): %v", testParDoType, err) + } + decb := &RowDecoderBuilder{} + decb.Register(testInterfaceType, p.BuildDecoder) + dec, err := decb.Build(testParDoType) + if err != nil { + t.Fatalf("RowDecoderBuilder.Build(%v): %v", testParDoType, err) + } + want := &testParDo{ + Struct: testStruct{ + A: 1, + b: 2, + }, + StructPtr: &testStruct{ + A: 3, + b: 4, + }, + StructSlice: []testStruct{ + { + A: 5, + b: 6, + }, + { + A: 7, + b: 8, + }, + { + A: 9, + b: 10, + }, + }, + StructPtrSlice: []*testStruct{ + { + A: 11, + b: 12, + }, + { + A: 13, + b: 14, + }, + { + A: 15, + b: 16, + }, + }, + StructMap: map[int64]testStruct{ + 0: testStruct{ + A: 17, + b: 18, + }, + 1: testStruct{ + A: 19, + b: 20, + }, + 2: testStruct{ + A: 21, + b: 22, + }, + }, + StructPtrMap: map[int64]*testStruct{ + 0: &testStruct{ + A: 23, + b: 24, + }, + 1: &testStruct{ + A: 25, + b: 26, + }, + 2: &testStruct{ + A: 27, + b: 28, + }, + }, + } + var buf bytes.Buffer + if err := enc(want, &buf); err != nil { + t.Fatalf("Encode(%v): %v", want, err) + } + got, err := dec(&buf) + if err != nil { + t.Fatalf("Decode(%v): %v", buf.Bytes(), err) + } + if diff := cmp.Diff(want, got, cmp.AllowUnexported(testStruct{})); diff != "" { + t.Errorf("Decode(Encode(%v)): %v", want, diff) + } +} diff --git a/sdks/go/pkg/beam/core/graph/coder/stringutf8.go b/sdks/go/pkg/beam/core/graph/coder/stringutf8.go index d70df6acef68..6cd0105558b9 100644 --- a/sdks/go/pkg/beam/core/graph/coder/stringutf8.go +++ b/sdks/go/pkg/beam/core/graph/coder/stringutf8.go @@ -19,7 +19,7 @@ import ( "io" "strings" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/ioutilx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/ioutilx" ) const bufCap = 64 diff --git a/sdks/go/pkg/beam/core/graph/coder/testutil/testutil.go b/sdks/go/pkg/beam/core/graph/coder/testutil/testutil.go new file mode 100644 index 000000000000..3c7acaca053e --- /dev/null +++ b/sdks/go/pkg/beam/core/graph/coder/testutil/testutil.go @@ -0,0 +1,154 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. 
You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// Package testutil contains helpers to test and validate custom Beam Schema coders.
+package testutil
+
+import (
+	"bytes"
+	"fmt"
+	"reflect"
+	"testing"
+
+	"github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder"
+	"github.com/google/go-cmp/cmp"
+)
+
+// SchemaCoder helps validate custom schema coders.
+type SchemaCoder struct {
+	encBldUT, encBldSchema coder.RowEncoderBuilder
+	decBldUT, decBldSchema coder.RowDecoderBuilder
+
+	// CmpOptions are passed into the round-trip comparison.
+	CmpOptions cmp.Options
+}
+
+// Register adds additional custom types not under test to both the under-test
+// and default schema coders.
+func (v *SchemaCoder) Register(rt reflect.Type, encF, decF interface{}) {
+	v.encBldUT.Register(rt, encF)
+	v.encBldSchema.Register(rt, encF)
+	v.decBldUT.Register(rt, decF)
+	v.decBldSchema.Register(rt, decF)
+}
+
+// T is an interface to facilitate testing the tester. The methods need
+// to match the ones we use from *testing.T.
+type T interface {
+	Helper()
+	Run(string, func(*testing.T)) bool
+	Errorf(string, ...interface{})
+	Failed() bool
+	FailNow()
+}
+
+// Validate is a test utility to validate that custom schema coders generate
+// Beam schema encoded bytes.
+//
+// Validate accepts the reflect.Type to register, factory functions for
+// encoding and decoding, an anonymous struct type equivalent to the encoded
+// format produced and consumed by the factory-produced functions, and test
+// values. Test values must be either a struct, pointer to struct, or a slice
+// where each element is a struct or pointer to struct.
+//
+// TODO(lostluck): Improve documentation.
+// TODO(lostluck): Abstract into a configurable struct, to handle
+//
+// Validate will register the under-test factories and generate an encoder and
+// decoder function. These functions will be re-used for all test values. This
+// emulates coders being re-used for all elements within a bundle.
+//
+// Validate mutates the SchemaCoder, so the SchemaCoder may not be used more than once.
+func (v *SchemaCoder) Validate(t T, rt reflect.Type, encF, decF, schema interface{}, values interface{}) {
+	t.Helper()
+	testValues := reflect.ValueOf(values)
+	// Check whether we have a slice type or not.
+	if testValues.Type().Kind() != reflect.Slice {
+		vs := reflect.MakeSlice(reflect.SliceOf(testValues.Type()), 0, 1)
+		testValues = reflect.Append(vs, testValues)
+	}
+	if testValues.Len() == 0 {
+		t.Errorf("No test values provided for ValidateSchemaCoder(%v)", rt)
+	}
+	// We now have a non-empty slice of test values!
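	// The per-value checks below perform a byte-level round trip: encode the
	// test value with the coder under test, decode and re-encode those bytes
	// through the schema-equivalent struct type, require the two byte slices
	// to match, then decode with the coder under test and compare against the
	// original value.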
+ + v.encBldUT.Register(rt, encF) + v.decBldUT.Register(rt, decF) + + testRt := testValues.Type().Elem() + encUT, err := v.encBldUT.Build(testRt) + if err != nil { + t.Errorf("Unable to build encoder function with given factory: coder.RowEncoderBuilder.Build(%v) = %v, want nil error", rt, err) + } + decUT, err := v.decBldUT.Build(testRt) + if err != nil { + t.Errorf("Unable to build decoder function with given factory: coder.RowDecoderBuilder.Build(%v) = %v, want nil error", rt, err) + } + + schemaRt := reflect.TypeOf(schema) + encSchema, err := v.encBldSchema.Build(schemaRt) + if err != nil { + t.Errorf("Unable to build encoder function for schema equivalent type: coder.RowEncoderBuilder.Build(%v) = %v, want nil error", rt, err) + } + decSchema, err := v.decBldSchema.Build(schemaRt) + if err != nil { + t.Errorf("Unable to build decoder function for schema equivalent type: coder.RowDecoderBuilder.Build(%v) = %v, want nil error", rt, err) + } + // We use error messages instead of fatals to allow all the cases to be + // checked. None of the coder functions are used until the per value runs + // so a user can get additional information per run. + if t.Failed() { + t.FailNow() + } + for i := 0; i < testValues.Len(); i++ { + t.Run(fmt.Sprintf("%v[%d]", rt, i), func(t *testing.T) { + var buf bytes.Buffer + want := testValues.Index(i).Interface() + if err := encUT(want, &buf); err != nil { + t.Fatalf("error calling Under Test encoder[%v](%v) = %v", testRt, want, err) + } + initialBytes := clone(buf.Bytes()) + + bufSchema := bytes.NewBuffer(clone(initialBytes)) + + schemaV, err := decSchema(bufSchema) + if err != nil { + t.Fatalf("error calling Equivalent Schema decoder[%v]() = %v", schemaRt, err) + } + err = encSchema(schemaV, bufSchema) + if err != nil { + t.Fatalf("error calling Equivalent Schema encoder[%v](%v) = %v, want nil error", schemaRt, schemaV, err) + } + roundTripBytes := clone(bufSchema.Bytes()) + + if d := cmp.Diff(initialBytes, roundTripBytes); d != "" { + t.Errorf("round trip through equivalent schema type didn't produce equivalent byte slices (-initial,+roundTrip): \n%v", d) + } + got, err := decUT(bufSchema) + if err != nil { + t.Fatalf("Under Test decoder(%v) = %v, want nil error", rt, err) + } + if d := cmp.Diff(want, got, v.CmpOptions); d != "" { + t.Fatalf("round trip through custom coder produced diff: (-want, +got):\n%v", d) + } + }) + } +} + +func clone(b []byte) []byte { + c := make([]byte, len(b)) + copy(c, b) + return c +} diff --git a/sdks/go/pkg/beam/core/graph/coder/testutil/testutil_test.go b/sdks/go/pkg/beam/core/graph/coder/testutil/testutil_test.go new file mode 100644 index 000000000000..f3d5309beaf0 --- /dev/null +++ b/sdks/go/pkg/beam/core/graph/coder/testutil/testutil_test.go @@ -0,0 +1,201 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+// See the License for the specific language governing permissions and +// limitations under the License. + +package testutil + +import ( + "fmt" + "io" + "reflect" + "strings" + "testing" + + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" +) + +type UserInterface interface { + mark() +} + +type UserType1 struct { + A string + B int + C string +} + +func (UserType1) mark() {} + +func ut1EncDropB(val interface{}, w io.Writer) error { + if err := coder.WriteSimpleRowHeader(2, w); err != nil { + return err + } + elm := val.(UserType1) + if err := coder.EncodeStringUTF8(elm.A, w); err != nil { + return err + } + if err := coder.EncodeStringUTF8(elm.C, w); err != nil { + return err + } + return nil +} + +func ut1DecDropB(r io.Reader) (interface{}, error) { + if err := coder.ReadSimpleRowHeader(2, r); err != nil { + return nil, err + } + a, err := coder.DecodeStringUTF8(r) + if err != nil { + return nil, fmt.Errorf("decoding string field A: %v", err) + } + c, err := coder.DecodeStringUTF8(r) + if err != nil { + return nil, fmt.Errorf("decoding string field C: %v, %v", c, err) + } + return UserType1{ + A: a, + B: 42, + C: c, + }, nil +} + +type UserType2 struct { + A UserType1 +} + +func TestValidateCoder(t *testing.T) { + // Validates a custom UserType1 encoding, which drops encoding the "B" field, + // always setting it to a constant value. + t.Run("SingleValue", func(t *testing.T) { + (&SchemaCoder{}).Validate(t, reflect.TypeOf((*UserType1)(nil)).Elem(), + func(reflect.Type) (func(interface{}, io.Writer) error, error) { return ut1EncDropB, nil }, + func(reflect.Type) (func(io.Reader) (interface{}, error), error) { return ut1DecDropB, nil }, + struct{ A, C string }{}, + UserType1{ + A: "cats", + B: 42, + C: "pjamas", + }, + ) + }) + t.Run("SliceOfValues", func(t *testing.T) { + (&SchemaCoder{}).Validate(t, reflect.TypeOf((*UserType1)(nil)).Elem(), + func(reflect.Type) (func(interface{}, io.Writer) error, error) { return ut1EncDropB, nil }, + func(reflect.Type) (func(io.Reader) (interface{}, error), error) { return ut1DecDropB, nil }, + struct{ A, C string }{}, + []UserType1{ + { + A: "cats", + B: 42, + C: "pjamas", + }, { + A: "dogs", + B: 42, + C: "breakfast", + }, { + A: "fish", + B: 42, + C: "plenty of", + }, + }, + ) + }) + t.Run("InterfaceCoder", func(t *testing.T) { + (&SchemaCoder{}).Validate(t, reflect.TypeOf((*UserInterface)(nil)).Elem(), + func(rt reflect.Type) (func(interface{}, io.Writer) error, error) { + return ut1EncDropB, nil + }, + func(rt reflect.Type) (func(io.Reader) (interface{}, error), error) { + return ut1DecDropB, nil + }, + struct{ A, C string }{}, + UserType1{ + A: "cats", + B: 42, + C: "pjamas", + }, + ) + }) + t.Run("FailureCases", func(t *testing.T) { + var c checker + err := fmt.Errorf("FactoryError") + var v SchemaCoder + // Register the pointer type to the default encoder too. + v.Register(reflect.TypeOf((*UserType2)(nil)), + func(reflect.Type) (func(interface{}, io.Writer) error, error) { return nil, err }, + func(reflect.Type) (func(io.Reader) (interface{}, error), error) { return nil, err }, + ) + v.Validate(&c, reflect.TypeOf((*UserType1)(nil)).Elem(), + func(reflect.Type) (func(interface{}, io.Writer) error, error) { return ut1EncDropB, err }, + func(reflect.Type) (func(io.Reader) (interface{}, error), error) { return ut1DecDropB, err }, + struct { + A, C string + B *UserType2 // To trigger the bad factory registered earlier. 
+ }{}, + []UserType1{}, + ) + if got, want := len(c.errors), 5; got != want { + t.Fatalf("SchemaCoder.Validate did not fail as expected. Got %v errors logged, but want %v", got, want) + } + if !strings.Contains(c.errors[0].fmt, "No test values") { + t.Fatalf("SchemaCoder.Validate with no values did not fail. fmt: %q", c.errors[0].fmt) + } + if !strings.Contains(c.errors[1].fmt, "Unable to build encoder function with given factory") { + t.Fatalf("SchemaCoder.Validate with no values did not fail. fmt: %q", c.errors[1].fmt) + } + if !strings.Contains(c.errors[2].fmt, "Unable to build decoder function with given factory") { + t.Fatalf("SchemaCoder.Validate with no values did not fail. fmt: %q", c.errors[2].fmt) + } + if !strings.Contains(c.errors[3].fmt, "Unable to build encoder function for schema equivalent type") { + t.Fatalf("SchemaCoder.Validate with no values did not fail. fmt: %q", c.errors[3].fmt) + } + if !strings.Contains(c.errors[4].fmt, "Unable to build decoder function for schema equivalent type") { + t.Fatalf("SchemaCoder.Validate with no values did not fail. fmt: %q", c.errors[4].fmt) + } + }) +} + +type msg struct { + fmt string + params []interface{} +} + +type checker struct { + errors []msg + + runCount int + failNowCalled bool +} + +func (c *checker) Helper() {} + +func (c *checker) Run(string, func(*testing.T)) bool { + c.runCount++ + return true +} + +func (c *checker) Errorf(fmt string, params ...interface{}) { + c.errors = append(c.errors, msg{ + fmt: fmt, + params: params, + }) +} + +func (c *checker) Failed() bool { + return len(c.errors) > 0 +} + +func (c *checker) FailNow() { + c.failNowCalled = true +} diff --git a/sdks/go/pkg/beam/core/graph/coder/time.go b/sdks/go/pkg/beam/core/graph/coder/time.go index 2eac9bff7700..6b86d16444b0 100644 --- a/sdks/go/pkg/beam/core/graph/coder/time.go +++ b/sdks/go/pkg/beam/core/graph/coder/time.go @@ -19,8 +19,8 @@ import ( "io" "math" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/mtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" ) // EncodeEventTime encodes an EventTime as an uint64. 
The encoding is diff --git a/sdks/go/pkg/beam/core/graph/coder/time_test.go b/sdks/go/pkg/beam/core/graph/coder/time_test.go index 5553057b50f5..313f65284d65 100644 --- a/sdks/go/pkg/beam/core/graph/coder/time_test.go +++ b/sdks/go/pkg/beam/core/graph/coder/time_test.go @@ -19,7 +19,7 @@ import ( "bytes" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/mtime" ) func TestEncodeDecodeEventTime(t *testing.T) { diff --git a/sdks/go/pkg/beam/core/graph/coder/varint.go b/sdks/go/pkg/beam/core/graph/coder/varint.go index 4f311a0d3990..18780a3775de 100644 --- a/sdks/go/pkg/beam/core/graph/coder/varint.go +++ b/sdks/go/pkg/beam/core/graph/coder/varint.go @@ -18,8 +18,8 @@ package coder import ( "io" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/ioutilx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/ioutilx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // ErrVarIntTooLong indicates a data corruption issue that needs special diff --git a/sdks/go/pkg/beam/core/graph/edge.go b/sdks/go/pkg/beam/core/graph/edge.go index 8016f31e5be7..e1f0d0be9035 100644 --- a/sdks/go/pkg/beam/core/graph/edge.go +++ b/sdks/go/pkg/beam/core/graph/edge.go @@ -20,12 +20,12 @@ import ( "reflect" "sort" - "github.com/apache/beam/sdks/go/pkg/beam/core/funcx" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/funcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // Opcode represents a primitive Beam instruction kind. @@ -137,6 +137,13 @@ func (o *Outbound) String() string { type Payload struct { URN string Data []byte + + // Optional fields mapping tags to inputs. If present, will override + // the default IO tagging for the transform's input PCollections. + InputsMap map[string]int + // Optional fields mapping tags to outputs. If present, will override + // the default IO tagging for the transform's output PCollections. + OutputsMap map[string]int } // MultiEdge represents a primitive data processing operation. Each non-user @@ -289,9 +296,12 @@ func NewCrossLanguage(g *Graph, s *Scope, ext *ExternalTransform, ins []*Inbound edge.Op = External edge.External = ext - windowingStrategy := inputWindow([]*Node{ins[0].From}) + ws := window.DefaultWindowingStrategy() + if len(ins) > 0 { + ws = inputWindow([]*Node{ins[0].From}) + } for _, o := range outs { - o.To.w = windowingStrategy + o.To.w = ws } isBoundedUpdater := func(n *Node, bounded bool) { @@ -369,6 +379,31 @@ func NewExternal(g *Graph, s *Scope, payload *Payload, in []*Node, out []typex.F return edge } +// NewTaggedExternal inserts an External transform with tagged inbound and +// outbound connections. The system makes no assumptions about what this +// transform might do. 
+func NewTaggedExternal(g *Graph, s *Scope, payload *Payload, ins []*Inbound, outs []*Outbound, bounded bool) *MultiEdge { + edge := g.NewEdge(s) + edge.Op = External + edge.Payload = payload + + var windowingStrategy *window.WindowingStrategy + if len(ins) == 0 { + windowingStrategy = window.DefaultWindowingStrategy() + } else { + windowingStrategy = inputWindow([]*Node{ins[0].From}) + } + + for _, o := range outs { + o.To.w = windowingStrategy + o.To.bounded = bounded + } + + edge.Input = ins + edge.Output = outs + return edge +} + // NewParDo inserts a new ParDo edge into the graph. func NewParDo(g *Graph, s *Scope, u *DoFn, in []*Node, rc *coder.Coder, typedefs map[string]reflect.Type) (*MultiEdge, error) { return newDoFnNode(ParDo, g, s, u, in, rc, typedefs) @@ -498,13 +533,13 @@ func NewImpulse(g *Graph, s *Scope, value []byte) *MultiEdge { } // NewWindowInto inserts a new WindowInto edge into the graph. -func NewWindowInto(g *Graph, s *Scope, wfn *window.Fn, in *Node) *MultiEdge { - n := g.NewNode(in.Type(), &window.WindowingStrategy{Fn: wfn}, in.Bounded()) +func NewWindowInto(g *Graph, s *Scope, ws *window.WindowingStrategy, in *Node) *MultiEdge { + n := g.NewNode(in.Type(), ws, in.Bounded()) n.Coder = in.Coder edge := g.NewEdge(s) edge.Op = WindowInto - edge.WindowFn = wfn + edge.WindowFn = ws.Fn edge.Input = []*Inbound{{Kind: Main, From: in, Type: in.Type()}} edge.Output = []*Outbound{{To: n, Type: in.Type()}} return edge diff --git a/sdks/go/pkg/beam/core/graph/fn.go b/sdks/go/pkg/beam/core/graph/fn.go index 534f4f7d610e..6ad5c1dfaed8 100644 --- a/sdks/go/pkg/beam/core/graph/fn.go +++ b/sdks/go/pkg/beam/core/graph/fn.go @@ -19,11 +19,11 @@ import ( "fmt" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/funcx" - "github.com/apache/beam/sdks/go/pkg/beam/core/sdf" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/funcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/sdf" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // Fn holds either a function or struct receiver. @@ -41,6 +41,8 @@ type Fn struct { // methods holds the public methods (or the function) by their beam // names. methods map[string]*funcx.Fn + // annotations holds the annotations of the struct. + annotations map[string][]byte } // Name returns the name of the function or struct. @@ -104,6 +106,14 @@ func NewFn(fn interface{}) (*Fn, error) { case reflect.Struct: methods := make(map[string]*funcx.Fn) + annotations := make(map[string][]byte) + af := reflect.Indirect(val).FieldByName("Annotations") + if af.IsValid() { + a, ok := af.Interface().(map[string][]byte) + if ok { + annotations = a + } + } if methodsFuncs, ok := reflectx.WrapMethods(fn); ok { for name, mfn := range methodsFuncs { f, err := funcx.New(mfn) @@ -112,7 +122,7 @@ func NewFn(fn interface{}) (*Fn, error) { } methods[name] = f } - return &Fn{Recv: fn, methods: methods}, nil + return &Fn{Recv: fn, methods: methods, annotations: annotations}, nil } // TODO(lostluck): Consider moving this into the reflectx package. 
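	// Fallback: no pre-registered wrappers were found via reflectx.WrapMethods,
	// so walk the receiver's exported methods reflectively and wrap each one
	// with funcx.New.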
for i := 0; i < val.Type().NumMethod(); i++ { @@ -138,7 +148,7 @@ func NewFn(fn interface{}) (*Fn, error) { } methods[m.Name] = f } - return &Fn{Recv: fn, methods: methods}, nil + return &Fn{Recv: fn, methods: methods, annotations: annotations}, nil default: return nil, errors.Errorf("value %v must be function or (ptr to) struct", fn) @@ -240,6 +250,11 @@ func (f *DoFn) TeardownFn() *funcx.Fn { return f.methods[teardownName] } +// Annotations returns the optional annotations of the DoFn, if present. +func (f *DoFn) Annotations() map[string][]byte { + return f.annotations +} + // Name returns the name of the function or struct. func (f *DoFn) Name() string { return (*Fn)(f).Name() diff --git a/sdks/go/pkg/beam/core/graph/fn_test.go b/sdks/go/pkg/beam/core/graph/fn_test.go index f56aba3c19f3..84d318e866fd 100644 --- a/sdks/go/pkg/beam/core/graph/fn_test.go +++ b/sdks/go/pkg/beam/core/graph/fn_test.go @@ -20,7 +20,7 @@ import ( "reflect" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" ) func TestNewDoFn(t *testing.T) { diff --git a/sdks/go/pkg/beam/core/graph/graph.go b/sdks/go/pkg/beam/core/graph/graph.go index fb80cba8bcaa..474ab1cb37da 100644 --- a/sdks/go/pkg/beam/core/graph/graph.go +++ b/sdks/go/pkg/beam/core/graph/graph.go @@ -19,9 +19,9 @@ import ( "fmt" "strings" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // Graph represents an in-progress deferred execution graph and is easily diff --git a/sdks/go/pkg/beam/core/graph/node.go b/sdks/go/pkg/beam/core/graph/node.go index 9ed998f28b25..6ae3c30fe358 100644 --- a/sdks/go/pkg/beam/core/graph/node.go +++ b/sdks/go/pkg/beam/core/graph/node.go @@ -18,9 +18,9 @@ package graph import ( "fmt" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" ) // Node is a typed connector describing the data type and encoding. A node diff --git a/sdks/go/pkg/beam/core/graph/window/fn.go b/sdks/go/pkg/beam/core/graph/window/fn.go index 7197571ae664..df32a97b89c2 100644 --- a/sdks/go/pkg/beam/core/graph/window/fn.go +++ b/sdks/go/pkg/beam/core/graph/window/fn.go @@ -19,7 +19,7 @@ import ( "fmt" "time" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" ) // Kind is the semantic type of a window fn. @@ -29,7 +29,7 @@ const ( GlobalWindows Kind = "GLO" FixedWindows Kind = "FIX" SlidingWindows Kind = "SLI" - Sessions Kind = "SES" // TODO + Sessions Kind = "SES" ) // NewGlobalWindows returns the default WindowFn, which places all elements diff --git a/sdks/go/pkg/beam/core/graph/window/strategy.go b/sdks/go/pkg/beam/core/graph/window/strategy.go index 09609d62bba9..0dca6ab60320 100644 --- a/sdks/go/pkg/beam/core/graph/window/strategy.go +++ b/sdks/go/pkg/beam/core/graph/window/strategy.go @@ -16,12 +16,22 @@ // Package window contains window representation, windowing strategies and utilities. 
package window +type AccumulationMode string + +const ( + Unspecified AccumulationMode = "AccumulationMode_UNSPECIFIED" + Discarding AccumulationMode = "AccumulationMode_DISCARDING" + Accumulating AccumulationMode = "AccumulationMode_ACCUMULATING" + Retracting AccumulationMode = "AccumulationMode_RETRACTING" +) + // WindowingStrategy defines the types of windowing used in a pipeline and contains // the data to support executing a windowing strategy. type WindowingStrategy struct { - Fn *Fn - - // TODO(BEAM-3304): trigger support + Fn *Fn + Trigger Trigger + AccumulationMode AccumulationMode + AllowedLateness int // in milliseconds } func (ws *WindowingStrategy) Equals(o *WindowingStrategy) bool { @@ -34,5 +44,5 @@ func (ws *WindowingStrategy) String() string { // DefaultWindowingStrategy returns the default windowing strategy. func DefaultWindowingStrategy() *WindowingStrategy { - return &WindowingStrategy{Fn: NewGlobalWindows()} + return &WindowingStrategy{Fn: NewGlobalWindows(), Trigger: Trigger{Kind: DefaultTrigger}, AccumulationMode: Discarding, AllowedLateness: 0} } diff --git a/sdks/go/pkg/beam/core/graph/window/trigger.go b/sdks/go/pkg/beam/core/graph/window/trigger.go new file mode 100644 index 000000000000..9068076a9dd2 --- /dev/null +++ b/sdks/go/pkg/beam/core/graph/window/trigger.go @@ -0,0 +1,95 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package window + +import "fmt" + +type Trigger struct { + Kind string + SubTriggers []Trigger + Delay int64 // in milliseconds + ElementCount int32 + EarlyTrigger *Trigger + LateTrigger *Trigger +} + +const ( + DefaultTrigger string = "Trigger_Default_" + AlwaysTrigger string = "Trigger_Always_" + AfterAnyTrigger string = "Trigger_AfterAny_" + AfterAllTrigger string = "Trigger_AfterAll_" + AfterProcessingTimeTrigger string = "Trigger_AfterProcessing_Time_" + ElementCountTrigger string = "Trigger_ElementCount_" + AfterEndOfWindowTrigger string = "Trigger_AfterEndOfWindow_" + RepeatTrigger string = "Trigger_Repeat_" + OrFinallyTrigger string = "Trigger_OrFinally_" + NeverTrigger string = "Trigger_Never_" + AfterSynchronizedProcessingTimeTrigger string = "Trigger_AfterSynchronizedProcessingTime_" +) + +// TriggerDefault constructs a default trigger that fires once after the end of window. +// Late Data is discarded. +func TriggerDefault() Trigger { + return Trigger{Kind: DefaultTrigger} +} + +// TriggerAlways constructs an always trigger that keeps firing immediately after an element is processed. +// Equivalent to window.TriggerRepeat(window.TriggerAfterCount(1)) +func TriggerAlways() Trigger { + return Trigger{Kind: AlwaysTrigger} +} + +// TriggerAfterCount constructs an element count trigger that fires after atleast `count` number of elements are processed. 
+func TriggerAfterCount(count int32) Trigger { + return Trigger{Kind: ElementCountTrigger, ElementCount: count} +} + +// TriggerAfterProcessingTime constructs an after processing time trigger that fires after 'delay' milliseconds of processing time have passed. +func TriggerAfterProcessingTime(delay int64) Trigger { + return Trigger{Kind: AfterProcessingTimeTrigger, Delay: delay} +} + +// TriggerRepeat constructs a repeat trigger that fires a trigger repeatedly once the condition has been met. +// Ex: window.TriggerRepeat(window.TriggerAfterCount(1)) is same as window.TriggerAlways(). +func TriggerRepeat(tr Trigger) Trigger { + return Trigger{Kind: RepeatTrigger, SubTriggers: []Trigger{tr}} +} + +// TriggerAfterEndOfWindow constructs an end of window trigger that is configurable for early firing trigger(before the end of window) +// and late firing trigger(after the end of window). +// Default Options are: Default Trigger for EarlyFiring and No LateFiring. Override it with EarlyFiring and LateFiring methods on this trigger. +func TriggerAfterEndOfWindow() Trigger { + defaultEarly := TriggerDefault() + return Trigger{Kind: AfterEndOfWindowTrigger, EarlyTrigger: &defaultEarly, LateTrigger: nil} +} + +// EarlyFiring configures AfterEndOfWindow trigger with an early firing trigger. +func (tr Trigger) EarlyFiring(early Trigger) Trigger { + if tr.Kind != AfterEndOfWindowTrigger { + panic(fmt.Errorf("can't apply early firing to %s, got: %s, want: AfterEndOfWindowTrigger", tr.Kind, tr.Kind)) + } + tr.EarlyTrigger = &early + return tr +} + +// LateFiring configures AfterEndOfWindow trigger with a late firing trigger +func (tr Trigger) LateFiring(late Trigger) Trigger { + if tr.Kind != AfterEndOfWindowTrigger { + panic(fmt.Errorf("can't apply late firing to %s, got: %s, want: AfterEndOfWindowTrigger", tr.Kind, tr.Kind)) + } + tr.LateTrigger = &late + return tr +} diff --git a/sdks/go/pkg/beam/core/graph/window/windows.go b/sdks/go/pkg/beam/core/graph/window/windows.go index 330cf525545e..1a0643b802f7 100644 --- a/sdks/go/pkg/beam/core/graph/window/windows.go +++ b/sdks/go/pkg/beam/core/graph/window/windows.go @@ -18,8 +18,8 @@ package window import ( "fmt" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/mtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" ) var ( diff --git a/sdks/go/pkg/beam/core/graph/xlang.go b/sdks/go/pkg/beam/core/graph/xlang.go index 7b41babf39cb..964d271c3de7 100644 --- a/sdks/go/pkg/beam/core/graph/xlang.go +++ b/sdks/go/pkg/beam/core/graph/xlang.go @@ -20,7 +20,7 @@ import ( "strings" "time" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) var ( diff --git a/sdks/go/pkg/beam/core/metrics/dumper.go b/sdks/go/pkg/beam/core/metrics/dumper.go index c731f1ab663c..94ef7ed6657a 100644 --- a/sdks/go/pkg/beam/core/metrics/dumper.go +++ b/sdks/go/pkg/beam/core/metrics/dumper.go @@ -21,7 +21,7 @@ import ( "sort" "time" - "github.com/apache/beam/sdks/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" ) // DumpToLog is a debugging function that outputs all metrics available locally diff --git a/sdks/go/pkg/beam/core/metrics/metrics.go b/sdks/go/pkg/beam/core/metrics/metrics.go index 33b8ef2a26e4..3633e0c7cb49 100644 --- a/sdks/go/pkg/beam/core/metrics/metrics.go +++ b/sdks/go/pkg/beam/core/metrics/metrics.go @@ -50,11 +50,12 @@ import ( "context" "fmt" 
"hash/fnv" + "sort" "sync" "sync/atomic" "time" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/ioutilx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/ioutilx" ) // Metric cells are named and scoped by ptransform, and bundle, @@ -528,6 +529,31 @@ func (r CounterResult) Result() int64 { return r.Attempted } +// MergeCounters combines counter metrics that share a common key. +func MergeCounters( + attempted map[StepKey]int64, + committed map[StepKey]int64) []CounterResult { + res := make([]CounterResult, 0) + merged := map[StepKey]CounterResult{} + + for k, v := range attempted { + merged[k] = CounterResult{Attempted: v, Key: k} + } + for k, v := range committed { + m, ok := merged[k] + if ok { + merged[k] = CounterResult{Attempted: m.Attempted, Committed: v, Key: k} + } else { + merged[k] = CounterResult{Committed: v, Key: k} + } + } + + for _, v := range merged { + res = append(res, v) + } + return res +} + // DistributionResult is an attempted and a commited value of a distribution // metric plus key. type DistributionResult struct { @@ -545,6 +571,31 @@ func (r DistributionResult) Result() DistributionValue { return r.Attempted } +// MergeDistributions combines distribution metrics that share a common key. +func MergeDistributions( + attempted map[StepKey]DistributionValue, + committed map[StepKey]DistributionValue) []DistributionResult { + res := make([]DistributionResult, 0) + merged := map[StepKey]DistributionResult{} + + for k, v := range attempted { + merged[k] = DistributionResult{Attempted: v, Key: k} + } + for k, v := range committed { + m, ok := merged[k] + if ok { + merged[k] = DistributionResult{Attempted: m.Attempted, Committed: v, Key: k} + } else { + merged[k] = DistributionResult{Committed: v, Key: k} + } + } + + for _, v := range merged { + res = append(res, v) + } + return res +} + // GaugeResult is an attempted and a commited value of a gauge metric plus // key. type GaugeResult struct { @@ -566,3 +617,93 @@ func (r GaugeResult) Result() GaugeValue { type StepKey struct { Step, Name, Namespace string } + +// MergeGauges combines gauge metrics that share a common key. +func MergeGauges( + attempted map[StepKey]GaugeValue, + committed map[StepKey]GaugeValue) []GaugeResult { + res := make([]GaugeResult, 0) + merged := map[StepKey]GaugeResult{} + + for k, v := range attempted { + merged[k] = GaugeResult{Attempted: v, Key: k} + } + for k, v := range committed { + m, ok := merged[k] + if ok { + merged[k] = GaugeResult{Attempted: m.Attempted, Committed: v, Key: k} + } else { + merged[k] = GaugeResult{Committed: v, Key: k} + } + } + + for _, v := range merged { + res = append(res, v) + } + return res +} + +// MetricsExtractor extracts the metrics.Results from Store using ctx. +// This is same as what metrics.dumperExtractor and metrics.dumpTo would do together. 
+func MetricsExtractor(ctx context.Context) Results { + store := GetStore(ctx) + m := make(map[Labels]interface{}) + e := &Extractor{ + SumInt64: func(l Labels, v int64) { + m[l] = &counter{value: v} + }, + DistributionInt64: func(l Labels, count, sum, min, max int64) { + m[l] = &distribution{count: count, sum: sum, min: min, max: max} + }, + GaugeInt64: func(l Labels, v int64, t time.Time) { + m[l] = &gauge{v: v, t: t} + }, + } + e.ExtractFrom(store) + + var ls []Labels + for l := range m { + ls = append(ls, l) + } + + sort.Slice(ls, func(i, j int) bool { + if ls[i].transform < ls[j].transform { + return true + } + tEq := ls[i].transform == ls[j].transform + if tEq && ls[i].namespace < ls[j].namespace { + return true + } + nsEq := ls[i].namespace == ls[j].namespace + if tEq && nsEq && ls[i].name < ls[j].name { + return true + } + return false + }) + + r := Results{counters: []CounterResult{}, distributions: []DistributionResult{}, gauges: []GaugeResult{}} + for _, l := range ls { + key := StepKey{Step: l.transform, Name: l.name, Namespace: l.namespace} + switch opt := m[l]; opt.(type) { + case *counter: + attempted := make(map[StepKey]int64) + committed := make(map[StepKey]int64) + attempted[key] = 0 + committed[key] = opt.(*counter).value + r.counters = append(r.counters, MergeCounters(attempted, committed)...) + case *distribution: + attempted := make(map[StepKey]DistributionValue) + committed := make(map[StepKey]DistributionValue) + attempted[key] = DistributionValue{} + committed[key] = DistributionValue{opt.(*distribution).count, opt.(*distribution).sum, opt.(*distribution).min, opt.(*distribution).max} + r.distributions = append(r.distributions, MergeDistributions(attempted, committed)...) + case *gauge: + attempted := make(map[StepKey]GaugeValue) + committed := make(map[StepKey]GaugeValue) + attempted[key] = GaugeValue{} + committed[key] = GaugeValue{opt.(*gauge).v, opt.(*gauge).t} + r.gauges = append(r.gauges, MergeGauges(attempted, committed)...) + } + } + return r +} diff --git a/sdks/go/pkg/beam/core/metrics/metrics_test.go b/sdks/go/pkg/beam/core/metrics/metrics_test.go index a0127d99b220..db247291abb5 100644 --- a/sdks/go/pkg/beam/core/metrics/metrics_test.go +++ b/sdks/go/pkg/beam/core/metrics/metrics_test.go @@ -20,6 +20,9 @@ import ( "fmt" "testing" "time" + + "github.com/google/go-cmp/cmp" + "github.com/google/go-cmp/cmp/cmpopts" ) // bID is a bundleId to use in the tests, if nothing more specific is needed. 
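// Illustrative usage of the new merge helpers (a sketch; the step key below is
// hypothetical). Attempted and committed values that share a StepKey are
// folded into one result:
//
//	key := StepKey{Step: "s1", Name: "elements", Namespace: "example"}
//	got := MergeCounters(
//		map[StepKey]int64{key: 5}, // attempted
//		map[StepKey]int64{key: 7}, // committed
//	)
//	// got is []CounterResult{{Attempted: 5, Committed: 7, Key: key}}.
//
// MergeDistributions and MergeGauges follow the same shape for their value types.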
@@ -304,6 +307,147 @@ func TestNameCollisions(t *testing.T) { } } +func TestMergeCounters(t *testing.T) { + realKey := StepKey{Name: "real"} + tests := []struct { + name string + attempted, committed map[StepKey]int64 + want []CounterResult + }{ + { + name: "merge", + attempted: map[StepKey]int64{ + realKey: 5, + }, + committed: map[StepKey]int64{ + realKey: 7, + }, + want: []CounterResult{{Attempted: 5, Committed: 7, Key: realKey}}, + }, { + name: "attempted only", + attempted: map[StepKey]int64{ + realKey: 5, + }, + committed: map[StepKey]int64{}, + want: []CounterResult{{Attempted: 5, Key: realKey}}, + }, { + name: "committed only", + attempted: map[StepKey]int64{}, + committed: map[StepKey]int64{ + realKey: 7, + }, + want: []CounterResult{{Committed: 7, Key: realKey}}, + }, + } + less := func(a, b CounterResult) bool { + return a.Key.Name < b.Key.Name + } + for _, test := range tests { + t.Run(test.name, func(t *testing.T) { + got := MergeCounters(test.attempted, test.committed) + if d := cmp.Diff(test.want, got, cmpopts.SortSlices(less)); d != "" { + t.Errorf("MergeCounters(%+v, %+v) = %+v, want %+v\ndiff:\n%v", test.attempted, test.committed, got, test.want, d) + } + }) + } +} + +func TestMergeDistributions(t *testing.T) { + realKey := StepKey{Name: "real"} + distA := DistributionValue{Count: 2, Sum: 5, Min: 1, Max: 4} + distB := DistributionValue{Count: 3, Sum: 5, Min: 1, Max: 2} + tests := []struct { + name string + attempted, committed map[StepKey]DistributionValue + want []DistributionResult + }{ + { + name: "merge", + attempted: map[StepKey]DistributionValue{ + realKey: distA, + }, + committed: map[StepKey]DistributionValue{ + realKey: distB, + }, + want: []DistributionResult{{Attempted: distA, Committed: distB, Key: realKey}}, + }, { + name: "attempted only", + attempted: map[StepKey]DistributionValue{ + realKey: distA, + }, + committed: map[StepKey]DistributionValue{}, + want: []DistributionResult{{Attempted: distA, Key: realKey}}, + }, { + name: "committed only", + attempted: map[StepKey]DistributionValue{}, + committed: map[StepKey]DistributionValue{ + realKey: distB, + }, + want: []DistributionResult{{Committed: distB, Key: realKey}}, + }, + } + less := func(a, b DistributionResult) bool { + return a.Key.Name < b.Key.Name + } + for _, test := range tests { + t.Run(test.name, func(t *testing.T) { + got := MergeDistributions(test.attempted, test.committed) + if d := cmp.Diff(test.want, got, cmpopts.SortSlices(less)); d != "" { + t.Errorf("MergeDistributions(%+v, %+v) = %+v, want %+v\ndiff:\n%v", test.attempted, test.committed, got, test.want, d) + } + }) + } +} + +func TestMergeGauges(t *testing.T) { + realKey := StepKey{Name: "real"} + now := time.Now() + later := now.Add(time.Hour) + gaugeA := GaugeValue{Value: 2, Timestamp: now} + gaugeB := GaugeValue{Value: 3, Timestamp: later} + tests := []struct { + name string + attempted, committed map[StepKey]GaugeValue + want []GaugeResult + }{ + { + name: "merge", + attempted: map[StepKey]GaugeValue{ + realKey: gaugeA, + }, + committed: map[StepKey]GaugeValue{ + realKey: gaugeB, + }, + want: []GaugeResult{{Attempted: gaugeA, Committed: gaugeB, Key: realKey}}, + }, { + name: "attempted only", + attempted: map[StepKey]GaugeValue{ + realKey: gaugeA, + }, + committed: map[StepKey]GaugeValue{}, + want: []GaugeResult{{Attempted: gaugeA, Key: realKey}}, + }, { + name: "committed only", + attempted: map[StepKey]GaugeValue{}, + committed: map[StepKey]GaugeValue{ + realKey: gaugeB, + }, + want: []GaugeResult{{Committed: gaugeB, Key: 
realKey}}, + }, + } + less := func(a, b DistributionResult) bool { + return a.Key.Name < b.Key.Name + } + for _, test := range tests { + t.Run(test.name, func(t *testing.T) { + got := MergeGauges(test.attempted, test.committed) + if d := cmp.Diff(test.want, got, cmpopts.SortSlices(less)); d != "" { + t.Errorf("MergeGauges(%+v, %+v) = %+v, want %+v\ndiff:\n%v", test.attempted, test.committed, got, test.want, d) + } + }) + } +} + // Run on @lostluck's desktop (2020/01/21) go1.13.4 // // Allocs & bytes should be consistent within go versions, but ns/op is relative to the running machine. diff --git a/sdks/go/pkg/beam/core/metrics/store.go b/sdks/go/pkg/beam/core/metrics/store.go index e90cfeb1e671..c0355b544bc8 100644 --- a/sdks/go/pkg/beam/core/metrics/store.go +++ b/sdks/go/pkg/beam/core/metrics/store.go @@ -52,12 +52,34 @@ func PCollectionLabels(pcollection string) Labels { return Labels{pcollection: pcollection} } +// PCollection returns the PCollection id for this metric. +func (l Labels) PCollection() string { return l.pcollection } + // PTransformLabels builds a Labels for transform metrics. // Intended for framework use. func PTransformLabels(transform string) Labels { return Labels{transform: transform} } +// Map produces a map of present labels to their values. +// +// Returns nil map if invalid. +func (l Labels) Map() map[string]string { + if l.transform != "" { + return map[string]string{ + "PTRANSFORM": l.transform, + "NAMESPACE": l.namespace, + "NAME": l.name, + } + } + if l.pcollection != "" { + return map[string]string{ + "PCOLLECTION": l.pcollection, + } + } + return nil +} + // Extractor allows users to access metrics programatically after // pipeline completion. Users assign functions to fields that // interest them, and that function is called for each metric diff --git a/sdks/go/pkg/beam/core/runtime/coderx/coderx.shims.go b/sdks/go/pkg/beam/core/runtime/coderx/coderx.shims.go index be6e3627d16c..e51c81effec5 100644 --- a/sdks/go/pkg/beam/core/runtime/coderx/coderx.shims.go +++ b/sdks/go/pkg/beam/core/runtime/coderx/coderx.shims.go @@ -22,9 +22,10 @@ import ( "reflect" // Library imports - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx/schema" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) func init() { @@ -45,6 +46,7 @@ func init() { runtime.RegisterFunction(encVarIntZ) runtime.RegisterFunction(encVarUintZ) runtime.RegisterType(reflect.TypeOf((*reflect.Type)(nil)).Elem()) + schema.RegisterType(reflect.TypeOf((*reflect.Type)(nil)).Elem()) reflectx.RegisterFunc(reflect.TypeOf((*func(int32) []byte)(nil)).Elem(), funcMakerInt32ГSliceOfByte) reflectx.RegisterFunc(reflect.TypeOf((*func(int64) []byte)(nil)).Elem(), funcMakerInt64ГSliceOfByte) reflectx.RegisterFunc(reflect.TypeOf((*func(reflect.Type, []byte) (typex.T, error))(nil)).Elem(), funcMakerReflect۰TypeSliceOfByteГTypex۰TError) diff --git a/sdks/go/pkg/beam/core/runtime/coderx/doc.go b/sdks/go/pkg/beam/core/runtime/coderx/doc.go index 7cb2bf21cea6..a633fe53f8ab 100644 --- a/sdks/go/pkg/beam/core/runtime/coderx/doc.go +++ b/sdks/go/pkg/beam/core/runtime/coderx/doc.go @@ -17,6 +17,6 @@ // in the beam model. 
package coderx -//go:generate go install github.com/apache/beam/sdks/go/cmd/starcgen +//go:generate go install github.com/apache/beam/sdks/v2/go/cmd/starcgen //go:generate starcgen --package=coderx --identifiers=encString,decString,encUint32,decUint32,encInt32,decInt32,encUint64,decUint64,encInt64,decInt64,encVarIntZ,decVarIntZ,encVarUintZ,decVarUintZ,encFloat,decFloat //go:generate go fmt diff --git a/sdks/go/pkg/beam/core/runtime/coderx/float.go b/sdks/go/pkg/beam/core/runtime/coderx/float.go index b502d05f350b..f5e7f80adc27 100644 --- a/sdks/go/pkg/beam/core/runtime/coderx/float.go +++ b/sdks/go/pkg/beam/core/runtime/coderx/float.go @@ -21,10 +21,10 @@ import ( "math/bits" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) func encFloat(v typex.T) []byte { diff --git a/sdks/go/pkg/beam/core/runtime/coderx/float_test.go b/sdks/go/pkg/beam/core/runtime/coderx/float_test.go index 460057e900b8..76e1a34eaa10 100644 --- a/sdks/go/pkg/beam/core/runtime/coderx/float_test.go +++ b/sdks/go/pkg/beam/core/runtime/coderx/float_test.go @@ -19,7 +19,7 @@ import ( "reflect" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) func TestFloat(t *testing.T) { diff --git a/sdks/go/pkg/beam/core/runtime/coderx/int.go b/sdks/go/pkg/beam/core/runtime/coderx/int.go index 292262b912cc..e7971110c70b 100644 --- a/sdks/go/pkg/beam/core/runtime/coderx/int.go +++ b/sdks/go/pkg/beam/core/runtime/coderx/int.go @@ -18,8 +18,8 @@ package coderx import ( "encoding/binary" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) // Fixed-sized custom coders for integers. diff --git a/sdks/go/pkg/beam/core/runtime/coderx/string.go b/sdks/go/pkg/beam/core/runtime/coderx/string.go index 8eaa8e41e92a..21476baeacce 100644 --- a/sdks/go/pkg/beam/core/runtime/coderx/string.go +++ b/sdks/go/pkg/beam/core/runtime/coderx/string.go @@ -16,9 +16,9 @@ package coderx import ( - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) // NewString returns a coder for the string type. 
It uses the native diff --git a/sdks/go/pkg/beam/core/runtime/coderx/varint.go b/sdks/go/pkg/beam/core/runtime/coderx/varint.go index 07e4ce16dad0..c3919e846b37 100644 --- a/sdks/go/pkg/beam/core/runtime/coderx/varint.go +++ b/sdks/go/pkg/beam/core/runtime/coderx/varint.go @@ -20,9 +20,9 @@ import ( "fmt" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // NewVarIntZ returns a varint coder for the given integer type. It uses a zig-zag scheme, diff --git a/sdks/go/pkg/beam/core/runtime/coderx/varint_test.go b/sdks/go/pkg/beam/core/runtime/coderx/varint_test.go index 766ab08f6c83..b426b1ac4bf3 100644 --- a/sdks/go/pkg/beam/core/runtime/coderx/varint_test.go +++ b/sdks/go/pkg/beam/core/runtime/coderx/varint_test.go @@ -19,7 +19,7 @@ import ( "reflect" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) func TestVarIntZ(t *testing.T) { diff --git a/sdks/go/pkg/beam/core/runtime/exec/coder.go b/sdks/go/pkg/beam/core/runtime/exec/coder.go index 447d3ac03567..c7a19eae0470 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/coder.go +++ b/sdks/go/pkg/beam/core/runtime/exec/coder.go @@ -23,12 +23,12 @@ import ( "bytes" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/ioutilx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/mtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/ioutilx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // NOTE(herohde) 4/30/2017: The main complication is CoGBK results, which have @@ -122,6 +122,9 @@ func MakeElementEncoder(c *coder.Coder) ElementEncoder { enc: MakeWindowEncoder(c.Window), } + case coder.PaneInfo: + return &paneEncoder{} + case coder.Iterable: return &iterableEncoder{ enc: MakeElementEncoder(c.Components[0]), @@ -228,6 +231,9 @@ func MakeElementDecoder(c *coder.Coder) ElementDecoder { dec: MakeWindowDecoder(c.Window), } + case coder.PaneInfo: + return &paneDecoder{} + // Note: Iterables in CoGBK are handled in datasource.go instead. 
case coder.Iterable: return &iterableDecoder{ @@ -749,10 +755,11 @@ func (c *arrayDecoder) Decode(r io.Reader) (*FullValue, error) { type windowedValueEncoder struct { elm ElementEncoder win WindowEncoder + // need to add pane encoder here } func (e *windowedValueEncoder) Encode(val *FullValue, w io.Writer) error { - if err := EncodeWindowedValueHeader(e.win, val.Windows, val.Timestamp, w); err != nil { + if err := EncodeWindowedValueHeader(e.win, val.Windows, val.Timestamp, val.Pane, w); err != nil { return err } return e.elm.Encode(val, w) @@ -765,7 +772,7 @@ type windowedValueDecoder struct { func (d *windowedValueDecoder) DecodeTo(r io.Reader, fv *FullValue) error { // Encoding: beam utf8 string (length prefix + run of bytes) - w, et, err := DecodeWindowedValueHeader(d.win, r) + w, et, pn, err := DecodeWindowedValueHeader(d.win, r) if err != nil { return err } @@ -774,6 +781,7 @@ func (d *windowedValueDecoder) DecodeTo(r io.Reader, fv *FullValue) error { } fv.Windows = w fv.Timestamp = et + fv.Pane = pn return nil } @@ -842,6 +850,31 @@ func (d *timerDecoder) Decode(r io.Reader) (*FullValue, error) { return fv, nil } +type paneEncoder struct{} + +func (*paneEncoder) Encode(val *FullValue, w io.Writer) error { + return coder.EncodePane(val.Pane, w) +} + +type paneDecoder struct{} + +func (*paneDecoder) DecodeTo(r io.Reader, fv *FullValue) error { + data, err := coder.DecodePane(r) + if err != nil { + return err + } + *fv = FullValue{Pane: data} + return nil +} + +func (d *paneDecoder) Decode(r io.Reader) (*FullValue, error) { + fv := &FullValue{} + if err := d.DecodeTo(r, fv); err != nil { + return nil, err + } + return fv, nil +} + type rowEncoder struct { enc func(interface{}, io.Writer) error } @@ -1076,10 +1109,8 @@ func (*intervalWindowDecoder) DecodeSingle(r io.Reader) (typex.Window, error) { return window.IntervalWindow{Start: mtime.FromMilliseconds(end.Milliseconds() - int64(duration)), End: end}, nil } -var paneNoFiring = []byte{0xf} - // EncodeWindowedValueHeader serializes a windowed value header. -func EncodeWindowedValueHeader(enc WindowEncoder, ws []typex.Window, t typex.EventTime, w io.Writer) error { +func EncodeWindowedValueHeader(enc WindowEncoder, ws []typex.Window, t typex.EventTime, p typex.PaneInfo, w io.Writer) error { // Encoding: Timestamp, Window, Pane (header) + Element if err := coder.EncodeEventTime(t, w); err != nil { @@ -1088,25 +1119,30 @@ func EncodeWindowedValueHeader(enc WindowEncoder, ws []typex.Window, t typex.Eve if err := enc.Encode(ws, w); err != nil { return err } - _, err := w.Write(paneNoFiring) + err := coder.EncodePane(p, w) return err } // DecodeWindowedValueHeader deserializes a windowed value header. 
-func DecodeWindowedValueHeader(dec WindowDecoder, r io.Reader) ([]typex.Window, typex.EventTime, error) { +func DecodeWindowedValueHeader(dec WindowDecoder, r io.Reader) ([]typex.Window, typex.EventTime, typex.PaneInfo, error) { // Encoding: Timestamp, Window, Pane (header) + Element + onError := func(err error) ([]typex.Window, typex.EventTime, typex.PaneInfo, error) { + return nil, mtime.ZeroTimestamp, typex.NoFiringPane(), err + } + t, err := coder.DecodeEventTime(r) if err != nil { - return nil, mtime.ZeroTimestamp, err + return onError(err) } ws, err := dec.Decode(r) if err != nil { - return nil, mtime.ZeroTimestamp, err + return onError(err) } - var data [1]byte - if err := ioutilx.ReadNBufUnsafe(r, data[:]); err != nil { // NO_FIRING pane - return nil, mtime.ZeroTimestamp, err + pn, err := coder.DecodePane(r) + if err != nil { + return onError(err) } - return ws, t, nil + + return ws, t, pn, nil } diff --git a/sdks/go/pkg/beam/core/runtime/exec/coder_test.go b/sdks/go/pkg/beam/core/runtime/exec/coder_test.go index a06ff189372c..25812aca4e56 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/coder_test.go +++ b/sdks/go/pkg/beam/core/runtime/exec/coder_test.go @@ -21,11 +21,11 @@ import ( "reflect" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/coderx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/coderx" ) func TestCoders(t *testing.T) { diff --git a/sdks/go/pkg/beam/core/runtime/exec/cogbk.go b/sdks/go/pkg/beam/core/runtime/exec/cogbk.go index 5f207a24e713..7107118db7c0 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/cogbk.go +++ b/sdks/go/pkg/beam/core/runtime/exec/cogbk.go @@ -20,7 +20,7 @@ import ( "context" "fmt" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // TODO(BEAM-490): This file contains support for the handling of CoGBK diff --git a/sdks/go/pkg/beam/core/runtime/exec/combine.go b/sdks/go/pkg/beam/core/runtime/exec/combine.go index 0d5432a31a83..03cdd8bcfbde 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/combine.go +++ b/sdks/go/pkg/beam/core/runtime/exec/combine.go @@ -22,13 +22,13 @@ import ( "path" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/metrics" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - "github.com/apache/beam/sdks/go/pkg/beam/util/errorx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/metrics" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/errorx" ) // Combine is a Combine executor. Combiners do not have side inputs (or output). 
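For reference, the change above threads the element's PaneInfo through the windowed-value header in both directions, rather than writing a fixed NO_FIRING byte on encode and skipping one on decode. A minimal round-trip sketch, assuming the exported exec, coder, window, mtime, and typex helpers keep the signatures shown in this hunk (the v2 import paths follow the module rename made elsewhere in this change):

package main

import (
	"bytes"
	"fmt"

	"github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/mtime"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex"
)

func main() {
	var buf bytes.Buffer
	// Encode: timestamp, windows, then the pane (no more hard-coded 0xf byte).
	wenc := exec.MakeWindowEncoder(coder.NewGlobalWindow())
	if err := exec.EncodeWindowedValueHeader(wenc, window.SingleGlobalWindow, mtime.ZeroTimestamp, typex.NoFiringPane(), &buf); err != nil {
		panic(err)
	}
	// Decode: the pane now comes back alongside the windows and event time.
	wdec := exec.MakeWindowDecoder(coder.NewGlobalWindow())
	ws, ts, pn, err := exec.DecodeWindowedValueHeader(wdec, &buf)
	if err != nil {
		panic(err)
	}
	fmt.Println(ws, ts, pn)
}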
@@ -286,8 +286,17 @@ func (n *Combine) extract(ctx context.Context, accum interface{}) (interface{}, func (n *Combine) fail(err error) error { n.status = Broken - n.err.TrySetError(err) - return err + if err2, ok := err.(*doFnError); ok { + return err2 + } + combineError := &doFnError{ + doFn: n.Fn.Name(), + err: err, + uid: n.UID, + pid: n.PID, + } + n.err.TrySetError(combineError) + return combineError } func (n *Combine) String() string { @@ -303,7 +312,8 @@ func (n *Combine) String() string { // FinishBundle step. type LiftedCombine struct { *Combine - KeyCoder *coder.Coder + KeyCoder *coder.Coder + WindowCoder *coder.WindowCoder keyHash elementHasher cache map[uint64]FullValue @@ -318,7 +328,7 @@ func (n *LiftedCombine) Up(ctx context.Context) error { if err := n.Combine.Up(ctx); err != nil { return err } - n.keyHash = makeElementHasher(n.KeyCoder) + n.keyHash = makeElementHasher(n.KeyCoder, n.WindowCoder) return nil } @@ -338,8 +348,19 @@ func (n *LiftedCombine) ProcessElement(ctx context.Context, value *FullValue, va if n.status != Active { return errors.Errorf("invalid status for precombine %v: %v", n.UID, n.status) } + // The cache layer in lifted combines implicitly observes windows. Process each individually. + for _, w := range value.Windows { + err := n.processElementPerWindow(ctx, value, w) + if err != nil { + return n.fail(err) + } + } + return nil +} - key, err := n.keyHash.Hash(value.Elm) +func (n *LiftedCombine) processElementPerWindow(ctx context.Context, value *FullValue, w typex.Window) error { + // In lifted combines, the window is always observed, so it's included in the hash key. + key, err := n.keyHash.Hash(value.Elm, w) if err != nil { return n.fail(err) } @@ -392,7 +413,7 @@ func (n *LiftedCombine) ProcessElement(ctx context.Context, value *FullValue, va } // Cache the accumulator with the key - n.cache[key] = FullValue{Windows: value.Windows, Elm: value.Elm, Elm2: a, Timestamp: value.Timestamp} + n.cache[key] = FullValue{Windows: []typex.Window{w}, Elm: value.Elm, Elm2: a, Timestamp: value.Timestamp} return nil } diff --git a/sdks/go/pkg/beam/core/runtime/exec/combine_test.go b/sdks/go/pkg/beam/core/runtime/exec/combine_test.go index 5a90f73a2ba7..8badcbddbd7a 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/combine_test.go +++ b/sdks/go/pkg/beam/core/runtime/exec/combine_test.go @@ -24,13 +24,13 @@ import ( "strconv" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/coderx" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/coderx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) var intInput = []interface{}{int(1), int(2), int(3), int(4), int(5), int(6)} @@ -84,6 +84,8 @@ func TestCombine(t *testing.T) { // ExtractOutput nodes work correctly after the lift has been performed. 
func TestLiftedCombine(t *testing.T) { withCoder := func(t *testing.T, suffix string, key interface{}, keyCoder *coder.Coder) { + // The test values are all single global window. + wc := coder.NewGlobalWindow() for _, test := range tests { t.Run(fnName(test.Fn)+"_"+suffix, func(t *testing.T) { edge := getCombineEdge(t, test.Fn, reflectx.Int, test.AccumCoder) @@ -91,8 +93,8 @@ func TestLiftedCombine(t *testing.T) { out := &CaptureNode{UID: 1} extract := &ExtractOutput{Combine: &Combine{UID: 2, Fn: edge.CombineFn, Out: out}} merge := &MergeAccumulators{Combine: &Combine{UID: 3, Fn: edge.CombineFn, Out: extract}} - gbk := &simpleGBK{UID: 4, KeyCoder: keyCoder, Out: merge} - precombine := &LiftedCombine{Combine: &Combine{UID: 5, Fn: edge.CombineFn, Out: gbk}, KeyCoder: keyCoder} + gbk := &simpleGBK{UID: 4, KeyCoder: keyCoder, WindowCoder: wc, Out: merge} + precombine := &LiftedCombine{Combine: &Combine{UID: 5, Fn: edge.CombineFn, Out: gbk}, KeyCoder: keyCoder, WindowCoder: wc} n := &FixedRoot{UID: 6, Elements: makeKVInput(key, test.Input...), Out: precombine} constructAndExecutePlan(t, []Unit{n, precombine, gbk, merge, extract, out}) @@ -360,9 +362,10 @@ func intCoder(t reflect.Type) *coder.Coder { // simpleGBK buffers all input and continues on FinishBundle. Use with small single-bundle data only. type simpleGBK struct { - UID UnitID - Out Node - KeyCoder *coder.Coder + UID UnitID + Out Node + KeyCoder *coder.Coder + WindowCoder *coder.WindowCoder hasher elementHasher m map[uint64]*group @@ -379,7 +382,7 @@ func (n *simpleGBK) ID() UnitID { func (n *simpleGBK) Up(ctx context.Context) error { n.m = make(map[uint64]*group) - n.hasher = makeElementHasher(n.KeyCoder) + n.hasher = makeElementHasher(n.KeyCoder, n.WindowCoder) return nil } @@ -390,7 +393,10 @@ func (n *simpleGBK) StartBundle(ctx context.Context, id string, data DataContext func (n *simpleGBK) ProcessElement(ctx context.Context, elm *FullValue, _ ...ReStream) error { key := elm.Elm value := elm.Elm2 - keyHash, err := n.hasher.Hash(key) + + // Consider generalizing this to multiple windows. + // The test values are all single global window. + keyHash, err := n.hasher.Hash(key, elm.Windows[0]) if err != nil { return err } diff --git a/sdks/go/pkg/beam/core/runtime/exec/datasink.go b/sdks/go/pkg/beam/core/runtime/exec/datasink.go index e11c47a9440c..36f2a5195ca2 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/datasink.go +++ b/sdks/go/pkg/beam/core/runtime/exec/datasink.go @@ -21,70 +21,83 @@ import ( "fmt" "io" "sync/atomic" - "time" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - "github.com/apache/beam/sdks/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) -// DataSink is a Node. +// DataSink is a Node that writes element data to the data service.. type DataSink struct { UID UnitID SID StreamID Coder *coder.Coder + PCol *PCollection // Handles size metrics. - enc ElementEncoder - wEnc WindowEncoder - w io.WriteCloser - count int64 - start time.Time + enc ElementEncoder + wEnc WindowEncoder + w io.WriteCloser } +// ID returns the debug ID. func (n *DataSink) ID() UnitID { return n.UID } +// Up initializes the element and window encoders. func (n *DataSink) Up(ctx context.Context) error { n.enc = MakeElementEncoder(coder.SkipW(n.Coder)) n.wEnc = MakeWindowEncoder(n.Coder.Window) return nil } +// StartBundle opens the writer to the data service. 
func (n *DataSink) StartBundle(ctx context.Context, id string, data DataContext) error { w, err := data.Data.OpenWrite(ctx, n.SID) if err != nil { return err } n.w = w - atomic.StoreInt64(&n.count, 0) - n.start = time.Now() + // TODO[BEAM-6374): Properly handle the multiplex and flatten cases. + // Right now we just stop datasink collection. + if n.PCol != nil { + atomic.StoreInt64(&n.PCol.elementCount, 0) + n.PCol.resetSize() + } return nil } +// ProcessElement encodes the windowed value header for the element, followed by the element, +// emitting it to the data service. func (n *DataSink) ProcessElement(ctx context.Context, value *FullValue, values ...ReStream) error { // Marshal the pieces into a temporary buffer since they must be transmitted on FnAPI as a single // unit. var b bytes.Buffer - atomic.AddInt64(&n.count, 1) - if err := EncodeWindowedValueHeader(n.wEnc, value.Windows, value.Timestamp, &b); err != nil { + if err := EncodeWindowedValueHeader(n.wEnc, value.Windows, value.Timestamp, value.Pane, &b); err != nil { return err } if err := n.enc.Encode(value, &b); err != nil { return errors.WithContextf(err, "encoding element %v with coder %v", value, n.enc) } - if _, err := n.w.Write(b.Bytes()); err != nil { + byteCount, err := n.w.Write(b.Bytes()) + if err != nil { return err } + // TODO[BEAM-6374): Properly handle the multiplex and flatten cases. + // Right now we just stop datasink collection. + if n.PCol != nil { + atomic.AddInt64(&n.PCol.elementCount, 1) + n.PCol.addSize(int64(byteCount)) + } return nil } +// FinishBundle closes the write to the data channel. func (n *DataSink) FinishBundle(ctx context.Context) error { - log.Infof(ctx, "DataSink: %d elements in %d ns", atomic.LoadInt64(&n.count), time.Now().Sub(n.start)) return n.w.Close() } +// Down is a no-op. func (n *DataSink) Down(ctx context.Context) error { return nil } diff --git a/sdks/go/pkg/beam/core/runtime/exec/datasource.go b/sdks/go/pkg/beam/core/runtime/exec/datasource.go index fe9d54033e8a..2e8f25d41d31 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/datasource.go +++ b/sdks/go/pkg/beam/core/runtime/exec/datasource.go @@ -25,10 +25,9 @@ import ( "sync" "time" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/ioutilx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - "github.com/apache/beam/sdks/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/ioutilx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // DataSource is a Root execution unit. @@ -38,14 +37,14 @@ type DataSource struct { Name string Coder *coder.Coder Out Node + PCol PCollection // Handles size metrics. Value instead of pointer so it's initialized by default in tests. source DataManager state StateReader - // TODO(lostluck) 2020/02/06: refactor to support more general PCollection metrics on nodes. - outputPID string // The index is the output count for the PCollection. - index int64 - splitIdx int64 - start time.Time + + index int64 + splitIdx int64 + start time.Time // su is non-nil if this DataSource feeds directly to a splittable unit, // and receives that splittable unit when it is available for splitting. @@ -90,6 +89,29 @@ func (n *DataSource) StartBundle(ctx context.Context, id string, data DataContex return n.Out.StartBundle(ctx, id, data) } +// ByteCountReader is a passthrough reader that counts all the bytes read through it. 
+// It trusts the nested reader to return accurate byte information. +type byteCountReader struct { + count *int + reader io.ReadCloser +} + +func (r *byteCountReader) Read(p []byte) (int, error) { + n, err := r.reader.Read(p) + *r.count += n + return n, err +} + +func (r *byteCountReader) Close() error { + return r.reader.Close() +} + +func (r *byteCountReader) reset() int { + c := *r.count + *r.count = 0 + return c +} + // Process opens the data source, reads and decodes data, kicking off element processing. func (n *DataSource) Process(ctx context.Context) error { r, err := n.source.OpenRead(ctx, n.SID) @@ -97,6 +119,9 @@ func (n *DataSource) Process(ctx context.Context) error { return err } defer r.Close() + n.PCol.resetSize() // initialize the size distribution for this bundle. + var byteCount int + bcr := byteCountReader{reader: r, count: &byteCount} c := coder.SkipW(n.Coder) wc := MakeWindowDecoder(n.Coder.Window) @@ -119,7 +144,8 @@ func (n *DataSource) Process(ctx context.Context) error { if n.incrementIndexAndCheckSplit() { return nil } - ws, t, err := DecodeWindowedValueHeader(wc, r) + // TODO(lostluck) 2020/02/22: Should we include window headers or just count the element sizes? + ws, t, pn, err := DecodeWindowedValueHeader(wc, r) if err != nil { if err == io.EOF { return nil @@ -128,16 +154,17 @@ func (n *DataSource) Process(ctx context.Context) error { } // Decode key or parallel element. - pe, err := cp.Decode(r) + pe, err := cp.Decode(&bcr) if err != nil { return errors.Wrap(err, "source decode failed") } pe.Timestamp = t pe.Windows = ws + pe.Pane = pn var valReStreams []ReStream for _, cv := range cvs { - values, err := n.makeReStream(ctx, pe, cv, r) + values, err := n.makeReStream(ctx, pe, cv, &bcr) if err != nil { return err } @@ -147,11 +174,15 @@ func (n *DataSource) Process(ctx context.Context) error { if err := n.Out.ProcessElement(ctx, pe, valReStreams...); err != nil { return err } + // Collect the actual size of the element, and reset the bytecounter reader. + n.PCol.addSize(int64(bcr.reset())) + bcr.reader = r } } -func (n *DataSource) makeReStream(ctx context.Context, key *FullValue, cv ElementDecoder, r io.ReadCloser) (ReStream, error) { - size, err := coder.DecodeInt32(r) +func (n *DataSource) makeReStream(ctx context.Context, key *FullValue, cv ElementDecoder, bcr *byteCountReader) (ReStream, error) { + // TODO(lostluck) 2020/02/22: Do we include the chunk size, or just the element sizes? + size, err := coder.DecodeInt32(bcr.reader) if err != nil { return nil, errors.Wrap(err, "stream size decoding failed") } @@ -160,16 +191,16 @@ func (n *DataSource) makeReStream(ctx context.Context, key *FullValue, cv Elemen case size >= 0: // Single chunk streams are fully read in and buffered in memory. buf := make([]FullValue, 0, size) - buf, err = readStreamToBuffer(cv, r, int64(size), buf) + buf, err = readStreamToBuffer(cv, bcr, int64(size), buf) if err != nil { return nil, err } return &FixedReStream{Buf: buf}, nil - case size == -1: // Shouldn't this be 0? + case size == -1: // Multi-chunked stream. var buf []FullValue for { - chunk, err := coder.DecodeVarInt(r) + chunk, err := coder.DecodeVarInt(bcr.reader) if err != nil { return nil, errors.Wrap(err, "stream chunk size decoding failed") } @@ -179,17 +210,17 @@ func (n *DataSource) makeReStream(ctx context.Context, key *FullValue, cv Elemen return &FixedReStream{Buf: buf}, nil case chunk > 0: // Non-zero chunk, read that many elements from the stream, and buffer them. 
chunkBuf := make([]FullValue, 0, chunk) - chunkBuf, err = readStreamToBuffer(cv, r, chunk, chunkBuf) + chunkBuf, err = readStreamToBuffer(cv, bcr, chunk, chunkBuf) if err != nil { return nil, err } buf = append(buf, chunkBuf...) case chunk == -1: // State backed iterable! - chunk, err := coder.DecodeVarInt(r) + chunk, err := coder.DecodeVarInt(bcr.reader) if err != nil { return nil, err } - token, err := ioutilx.ReadN(r, (int)(chunk)) + token, err := ioutilx.ReadN(bcr.reader, (int)(chunk)) if err != nil { return nil, err } @@ -201,6 +232,9 @@ func (n *DataSource) makeReStream(ctx context.Context, key *FullValue, cv Elemen if err != nil { return nil, err } + // We can't re-use the original bcr, since we may get new iterables, + // or multiple of them at the same time, but we can re-use the count itself. + r = &byteCountReader{reader: r, count: bcr.count} return &elementStream{r: r, ec: cv}, nil }, }, @@ -229,7 +263,6 @@ func readStreamToBuffer(cv ElementDecoder, r io.ReadCloser, size int64, buf []Fu func (n *DataSource) FinishBundle(ctx context.Context) error { n.mu.Lock() defer n.mu.Unlock() - log.Infof(ctx, "DataSource: %d elements in %d ns", n.index, time.Now().Sub(n.start)) n.source = nil n.splitIdx = 0 // Ensure errors are returned for split requests if this plan is re-used. return n.Out.FinishBundle(ctx) @@ -261,12 +294,11 @@ func (n *DataSource) incrementIndexAndCheckSplit() bool { } // ProgressReportSnapshot captures the progress reading an input source. -// -// TODO(lostluck) 2020/02/06: Add a visitor pattern for collecting progress -// metrics from downstream Nodes. type ProgressReportSnapshot struct { - ID, Name, PID string - Count int64 + ID, Name string + Count int64 + + pcol PCollectionSnapshot } // Progress returns a snapshot of the source's progress. @@ -275,6 +307,7 @@ func (n *DataSource) Progress() ProgressReportSnapshot { return ProgressReportSnapshot{} } n.mu.Lock() + pcol := n.PCol.snapshot() // The count is the number of "completely processed elements" // which matches the index of the currently processing element. 
c := n.index @@ -283,7 +316,8 @@ func (n *DataSource) Progress() ProgressReportSnapshot { if c < 0 { c = 0 } - return ProgressReportSnapshot{PID: n.outputPID, ID: n.SID.PtransformID, Name: n.Name, Count: c} + pcol.ElementCount = c + return ProgressReportSnapshot{ID: n.SID.PtransformID, Name: n.Name, Count: c, pcol: pcol} } // Split takes a sorted set of potential split indices and a fraction of the @@ -506,7 +540,7 @@ func splitHelper( func encodeElm(elm *FullValue, wc WindowEncoder, ec ElementEncoder) ([]byte, error) { var b bytes.Buffer - if err := EncodeWindowedValueHeader(wc, elm.Windows, elm.Timestamp, &b); err != nil { + if err := EncodeWindowedValueHeader(wc, elm.Windows, elm.Timestamp, elm.Pane, &b); err != nil { return nil, err } if err := ec.Encode(elm, &b); err != nil { diff --git a/sdks/go/pkg/beam/core/runtime/exec/datasource_test.go b/sdks/go/pkg/beam/core/runtime/exec/datasource_test.go index 15094ff84095..85919334f39b 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/datasource_test.go +++ b/sdks/go/pkg/beam/core/runtime/exec/datasource_test.go @@ -22,10 +22,11 @@ import ( "math" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/mtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) func TestDataSource_PerElement(t *testing.T) { @@ -43,7 +44,7 @@ func TestDataSource_PerElement(t *testing.T) { wc := MakeWindowEncoder(c.Window) ec := MakeElementEncoder(coder.SkipW(c)) for _, v := range expected { - EncodeWindowedValueHeader(wc, window.SingleGlobalWindow, mtime.ZeroTimestamp, pw) + EncodeWindowedValueHeader(wc, window.SingleGlobalWindow, mtime.ZeroTimestamp, typex.NoFiringPane(), pw) ec.Encode(&FullValue{Elm: v}, pw) } pw.Close() @@ -98,7 +99,7 @@ func TestDataSource_Iterators(t *testing.T) { driver: func(c *coder.Coder, dmw io.WriteCloser, _ func() io.WriteCloser, ks, vs []interface{}) { wc, kc, vc := extractCoders(c) for _, k := range ks { - EncodeWindowedValueHeader(wc, window.SingleGlobalWindow, mtime.ZeroTimestamp, dmw) + EncodeWindowedValueHeader(wc, window.SingleGlobalWindow, mtime.ZeroTimestamp, typex.NoFiringPane(), dmw) kc.Encode(&FullValue{Elm: k}, dmw) coder.EncodeInt32(int32(len(vs)), dmw) // Number of elements. 
for _, v := range vs { @@ -116,7 +117,7 @@ func TestDataSource_Iterators(t *testing.T) { driver: func(c *coder.Coder, dmw io.WriteCloser, _ func() io.WriteCloser, ks, vs []interface{}) { wc, kc, vc := extractCoders(c) for _, k := range ks { - EncodeWindowedValueHeader(wc, window.SingleGlobalWindow, mtime.ZeroTimestamp, dmw) + EncodeWindowedValueHeader(wc, window.SingleGlobalWindow, mtime.ZeroTimestamp, typex.NoFiringPane(), dmw) kc.Encode(&FullValue{Elm: k}, dmw) coder.EncodeInt32(-1, dmw) // Mark this as a multi-Chunk (though beam runner proto says to use 0) @@ -137,7 +138,7 @@ func TestDataSource_Iterators(t *testing.T) { driver: func(c *coder.Coder, dmw io.WriteCloser, swFn func() io.WriteCloser, ks, vs []interface{}) { wc, kc, vc := extractCoders(c) for _, k := range ks { - EncodeWindowedValueHeader(wc, window.SingleGlobalWindow, mtime.ZeroTimestamp, dmw) + EncodeWindowedValueHeader(wc, window.SingleGlobalWindow, mtime.ZeroTimestamp, typex.NoFiringPane(), dmw) kc.Encode(&FullValue{Elm: k}, dmw) coder.EncodeInt32(-1, dmw) // Mark as multi-chunk (though beam, runner says to use 0) coder.EncodeVarInt(-1, dmw) // Mark subsequent chunks as "state backed" @@ -203,6 +204,16 @@ func TestDataSource_Iterators(t *testing.T) { if got, want := iVals, expectedKeys; !equalList(got, want) { t.Errorf("DataSource => %#v, want %#v", extractValues(got...), extractValues(want...)) } + + // We're using integers that encode to 1 byte, so do some quick math to validate. + sizeOfSmallInt := 1 + snap := quickTestSnapshot(source, int64(len(test.keys))) + snap.pcol.SizeSum = int64(len(test.keys) * (1 + len(test.vals)) * sizeOfSmallInt) + snap.pcol.SizeMin = int64((1 + len(test.vals)) * sizeOfSmallInt) + snap.pcol.SizeMax = int64((1 + len(test.vals)) * sizeOfSmallInt) + if got, want := source.Progress(), snap; got != want { + t.Errorf("progress didn't match: got %v, want %v", got, want) + } }) } } @@ -225,7 +236,7 @@ func TestDataSource_Split(t *testing.T) { wc := MakeWindowEncoder(c.Window) ec := MakeElementEncoder(coder.SkipW(c)) for _, v := range elements { - EncodeWindowedValueHeader(wc, window.SingleGlobalWindow, mtime.ZeroTimestamp, pw) + EncodeWindowedValueHeader(wc, window.SingleGlobalWindow, mtime.ZeroTimestamp, typex.NoFiringPane(), pw) ec.Encode(&FullValue{Elm: v}, pw) } pw.Close() @@ -371,15 +382,6 @@ func TestDataSource_Split(t *testing.T) { }) validateSource(t, out, source, makeValues(test.expected...)) - - // Adjust expectations to maximum number of elements. - adjustedExpectation := test.splitIdx - if adjustedExpectation > int64(len(elements)) { - adjustedExpectation = int64(len(elements)) - } - if got, want := source.Progress().Count, adjustedExpectation; got != want { - t.Fatalf("progress didn't match split: got %v, want %v", got, want) - } }) } }) @@ -871,13 +873,29 @@ func constructAndExecutePlanWithContext(t *testing.T, us []Unit, dc DataContext) } } +func quickTestSnapshot(source *DataSource, count int64) ProgressReportSnapshot { + return ProgressReportSnapshot{ + Name: source.Name, + ID: source.SID.PtransformID, + Count: count, + pcol: PCollectionSnapshot{ + ElementCount: count, + SizeCount: count, + SizeSum: count, + // We're only encoding small ints here, so size will only be 1. 
+ SizeMin: 1, + SizeMax: 1, + }, + } +} + func validateSource(t *testing.T, out *CaptureNode, source *DataSource, expected []FullValue) { t.Helper() if got, want := len(out.Elements), len(expected); got != want { t.Fatalf("lengths don't match: got %v, want %v", got, want) } - if got, want := source.Progress().Count, int64(len(expected)); got != want { - t.Fatalf("progress count didn't match: got %v, want %v", got, want) + if got, want := source.Progress(), quickTestSnapshot(source, int64(len(expected))); got != want { + t.Fatalf("progress snapshot didn't match: got %v, want %v", got, want) } if !equalList(out.Elements, expected) { t.Errorf("DataSource => %#v, want %#v", extractValues(out.Elements...), extractValues(expected...)) diff --git a/sdks/go/pkg/beam/core/runtime/exec/decode.go b/sdks/go/pkg/beam/core/runtime/exec/decode.go index 93fa3454d05a..a14713c7a815 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/decode.go +++ b/sdks/go/pkg/beam/core/runtime/exec/decode.go @@ -19,7 +19,7 @@ import ( "fmt" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) // Decoder is a uniform custom encoder interface. It wraps various diff --git a/sdks/go/pkg/beam/core/runtime/exec/dynsplit_test.go b/sdks/go/pkg/beam/core/runtime/exec/dynsplit_test.go index 2bac292c9497..2c87a3be29c1 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/dynsplit_test.go +++ b/sdks/go/pkg/beam/core/runtime/exec/dynsplit_test.go @@ -23,14 +23,14 @@ import ( "sync" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - "github.com/apache/beam/sdks/go/pkg/beam/io/rtrackers/offsetrange" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/mtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/rtrackers/offsetrange" "github.com/google/go-cmp/cmp" "github.com/google/go-cmp/cmp/cmpopts" ) @@ -279,7 +279,7 @@ func createSdfPlan(t *testing.T, name string, fn *graph.DoFn, cdr *coder.Coder) func writeElm(elm *FullValue, cdr *coder.Coder, pw *io.PipeWriter) { wc := MakeWindowEncoder(cdr.Window) ec := MakeElementEncoder(coder.SkipW(cdr)) - if err := EncodeWindowedValueHeader(wc, window.SingleGlobalWindow, mtime.ZeroTimestamp, pw); err != nil { + if err := EncodeWindowedValueHeader(wc, window.SingleGlobalWindow, mtime.ZeroTimestamp, typex.NoFiringPane(), pw); err != nil { panic("err") } if err := ec.Encode(elm, pw); err != nil { @@ -294,7 +294,7 @@ func decodeDynSplitElm(elm []byte, cdr *coder.Coder) (*FullValue, error) { wd := MakeWindowDecoder(cdr.Window) ed := MakeElementDecoder(coder.SkipW(cdr)) b := bytes.NewBuffer(elm) - w, t, err := DecodeWindowedValueHeader(wd, b) + w, t, pn, err := DecodeWindowedValueHeader(wd, b) if err != nil { return nil, err } @@ -304,6 +304,7 @@ func decodeDynSplitElm(elm []byte, cdr 
*coder.Coder) (*FullValue, error) { } e.Windows = w e.Timestamp = t + e.Pane = pn return e, nil } diff --git a/sdks/go/pkg/beam/core/runtime/exec/emit.go b/sdks/go/pkg/beam/core/runtime/exec/emit.go index 47a37b439576..c703a06aed0e 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/emit.go +++ b/sdks/go/pkg/beam/core/runtime/exec/emit.go @@ -21,8 +21,8 @@ import ( "reflect" "sync" - "github.com/apache/beam/sdks/go/pkg/beam/core/funcx" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/funcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" ) // ReusableEmitter is a resettable value needed to hold the implicit context and diff --git a/sdks/go/pkg/beam/core/runtime/exec/encode.go b/sdks/go/pkg/beam/core/runtime/exec/encode.go index 7f6bbeeef510..ada5d3c195a4 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/encode.go +++ b/sdks/go/pkg/beam/core/runtime/exec/encode.go @@ -19,7 +19,7 @@ import ( "fmt" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) // Encoder is a uniform custom encoder interface. It wraps various diff --git a/sdks/go/pkg/beam/core/runtime/exec/fn.go b/sdks/go/pkg/beam/core/runtime/exec/fn.go index ae147c4256f2..b35fb41be0f6 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/fn.go +++ b/sdks/go/pkg/beam/core/runtime/exec/fn.go @@ -20,13 +20,13 @@ import ( "fmt" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/funcx" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window" - "github.com/apache/beam/sdks/go/pkg/beam/core/sdf" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/funcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/mtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/sdf" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) //go:generate specialize --input=fn_arity.tmpl @@ -251,7 +251,16 @@ func (n *invoker) ret4(ws []typex.Window, ts typex.EventTime, r0, r1, r2, r3 int return &n.ret, nil } -func makeSideInputs(fn *funcx.Fn, in []*graph.Inbound, side []ReStream) ([]ReusableInput, error) { +func makeSideInputs(ctx context.Context, w typex.Window, side []SideInputAdapter, reader StateReader, fn *funcx.Fn, in []*graph.Inbound) ([]ReusableInput, error) { + streams := make([]ReStream, len(side), len(side)) + for i, adapter := range side { + s, err := adapter.NewIterable(ctx, reader, w) + if err != nil { + return nil, err + } + streams[i] = s + } + if len(side) == 0 { return nil, nil // ok: no side input } @@ -268,8 +277,8 @@ func makeSideInputs(fn *funcx.Fn, in []*graph.Inbound, side []ReStream) ([]Reusa offset := len(param) - len(side) var ret []ReusableInput - for i := 0; i < len(side); i++ { - s, err := makeSideInput(in[i+1].Kind, fn.Param[param[i+offset]].T, side[i]) + for i := 0; i < len(streams); i++ { + s, err := makeSideInput(in[i+1].Kind, fn.Param[param[i+offset]].T, streams[i]) if err != nil { return nil, errors.WithContextf(err, "making side input %v for %v", i, fn) } diff --git a/sdks/go/pkg/beam/core/runtime/exec/fn_arity.go 
b/sdks/go/pkg/beam/core/runtime/exec/fn_arity.go index ae4cd6cd4437..4e9fdf5b1cf5 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/fn_arity.go +++ b/sdks/go/pkg/beam/core/runtime/exec/fn_arity.go @@ -22,8 +22,8 @@ package exec import ( "fmt" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) // initCall initializes the caller for the invoker, avoiding slice allocation for the diff --git a/sdks/go/pkg/beam/core/runtime/exec/fn_arity.tmpl b/sdks/go/pkg/beam/core/runtime/exec/fn_arity.tmpl index 6c632a43fc1c..2f5a25d3e5eb 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/fn_arity.tmpl +++ b/sdks/go/pkg/beam/core/runtime/exec/fn_arity.tmpl @@ -20,8 +20,8 @@ package exec import ( "fmt" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" ) // initCall initializes the caller for the invoker, avoiding slice allocation for the diff --git a/sdks/go/pkg/beam/core/runtime/exec/fn_test.go b/sdks/go/pkg/beam/core/runtime/exec/fn_test.go index b92d9a6458ae..9ce2281da1b1 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/fn_test.go +++ b/sdks/go/pkg/beam/core/runtime/exec/fn_test.go @@ -22,11 +22,11 @@ import ( "testing" "time" - "github.com/apache/beam/sdks/go/pkg/beam/core/funcx" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/funcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/mtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) type testInt int32 diff --git a/sdks/go/pkg/beam/core/runtime/exec/fullvalue.go b/sdks/go/pkg/beam/core/runtime/exec/fullvalue.go index 792e1b440a32..c46d6551ec73 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/fullvalue.go +++ b/sdks/go/pkg/beam/core/runtime/exec/fullvalue.go @@ -20,8 +20,8 @@ import ( "io" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) // TODO(herohde) 1/29/2018: using FullValue for nested KVs is somewhat of a hack @@ -37,13 +37,14 @@ type FullValue struct { Timestamp typex.EventTime Windows []typex.Window + Pane typex.PaneInfo } func (v *FullValue) String() string { if v.Elm2 == nil { - return fmt.Sprintf("%v [@%v:%v]", v.Elm, v.Timestamp, v.Windows) + return fmt.Sprintf("%v [@%v:%v:%v]", v.Elm, v.Timestamp, v.Windows, v.Pane) } - return fmt.Sprintf("KV<%v,%v> [@%v:%v]", v.Elm, v.Elm2, v.Timestamp, v.Windows) + return fmt.Sprintf("KV<%v,%v> [@%v:%v:%v]", v.Elm, v.Elm2, v.Timestamp, v.Windows, v.Pane) } // Stream is a FullValue reader. 
It returns io.EOF when complete, but can be diff --git a/sdks/go/pkg/beam/core/runtime/exec/fullvalue_test.go b/sdks/go/pkg/beam/core/runtime/exec/fullvalue_test.go index d760a4419117..f24348530b54 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/fullvalue_test.go +++ b/sdks/go/pkg/beam/core/runtime/exec/fullvalue_test.go @@ -19,11 +19,11 @@ import ( "reflect" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/mtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" ) func makeInput(vs ...interface{}) []MainInput { diff --git a/sdks/go/pkg/beam/core/runtime/exec/hash.go b/sdks/go/pkg/beam/core/runtime/exec/hash.go index 10bbeecd40a9..bea0c6333f7d 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/hash.go +++ b/sdks/go/pkg/beam/core/runtime/exec/hash.go @@ -16,40 +16,44 @@ package exec import ( + "encoding/binary" "fmt" "hash" "hash/fnv" "math" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) // Infrastructure for hashing values for lifted combines. type elementHasher interface { - Hash(element interface{}) (uint64, error) + Hash(element interface{}, w typex.Window) (uint64, error) } -func makeElementHasher(c *coder.Coder) elementHasher { +func makeElementHasher(c *coder.Coder, wc *coder.WindowCoder) elementHasher { // TODO(lostluck): move to a faster hashing library once we can take dependencies easily. hasher := fnv.New64a() + we := MakeWindowEncoder(wc) switch c.Kind { case coder.Bytes: - return &bytesHasher{hash: hasher} + return &bytesHasher{hash: hasher, we: we} case coder.VarInt: - return &numberHasher{} + return newNumberHasher(hasher, we) case coder.String: - return &stringHasher{hash: hasher} + return &stringHasher{hash: hasher, we: we} case coder.Row: enc := MakeElementEncoder(c) return &rowHasher{ hash: hasher, coder: enc, + we: we, } case coder.Custom: @@ -58,7 +62,7 @@ func makeElementHasher(c *coder.Coder) elementHasher { case reflectx.Int, reflectx.Int8, reflectx.Int16, reflectx.Int32, reflectx.Int64, reflectx.Uint, reflectx.Uint8, reflectx.Uint16, reflectx.Uint32, reflectx.Uint64, reflectx.Float32, reflectx.Float64: - return &numberHasher{} + return newNumberHasher(hasher, we) } // TODO(lostluck): 2019.02.07 - consider supporting encoders that // take in a io.Writer instead. 
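Because the precombine cache now buckets accumulators per key and window, every hasher above folds the window's encoding into the same FNV-1a state as the element bytes. A self-contained toy sketch of that idea; toyWindow and hashKeyedWindow are hypothetical stand-ins for illustration, not the SDK's unexported elementHasher implementations:

package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// toyWindow stands in for typex.Window; only its encoded end timestamp matters here.
type toyWindow struct{ end int64 }

// hashKeyedWindow mirrors the pattern above: write the key bytes, then the
// window's encoding, into one hash state so per-window accumulators don't collide.
func hashKeyedWindow(key []byte, w toyWindow) uint64 {
	h := fnv.New64a()
	h.Write(key)
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], uint64(w.end))
	h.Write(buf[:])
	return h.Sum64()
}

func main() {
	k := []byte("user-42")
	a := hashKeyedWindow(k, toyWindow{end: 1000})
	b := hashKeyedWindow(k, toyWindow{end: 2000})
	fmt.Println(a != b) // same key, different windows: distinct hashes, distinct cache slots
}

The same key seen in two different windows now hashes to two different values, which is what lets LiftedCombine keep one accumulator per window in its cache.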
@@ -66,6 +70,7 @@ func makeElementHasher(c *coder.Coder) elementHasher { hash: hasher, t: c.Custom.Type, coder: makeEncoder(c.Custom.Enc.Fn), + we: we, } default: panic(fmt.Sprintf("Unexpected coder for hashing: %v", c)) @@ -74,19 +79,22 @@ func makeElementHasher(c *coder.Coder) elementHasher { type bytesHasher struct { hash hash.Hash64 + we WindowEncoder } -func (h *bytesHasher) Hash(element interface{}) (uint64, error) { +func (h *bytesHasher) Hash(element interface{}, w typex.Window) (uint64, error) { h.hash.Reset() h.hash.Write(element.([]byte)) + h.we.EncodeSingle(w, h.hash) return h.hash.Sum64(), nil } type stringHasher struct { hash hash.Hash64 + we WindowEncoder } -func (h *stringHasher) Hash(element interface{}) (uint64, error) { +func (h *stringHasher) Hash(element interface{}, w typex.Window) (uint64, error) { h.hash.Reset() s := element.(string) var b [64]byte @@ -101,13 +109,27 @@ func (h *stringHasher) Hash(element interface{}) (uint64, error) { n := l - i copy(b[:], s[i:]) h.hash.Write(b[:n]) + h.we.EncodeSingle(w, h.hash) return h.hash.Sum64(), nil } type numberHasher struct { + hash hash.Hash64 + we WindowEncoder + cache []byte +} + +func newNumberHasher(hash hash.Hash64, we WindowEncoder) *numberHasher { + return &numberHasher{ + hash: hash, + we: we, + // Pre allocate slice to avoid re-allocations. + cache: make([]byte, 8, 8), + } } -func (h *numberHasher) Hash(element interface{}) (uint64, error) { +func (h *numberHasher) Hash(element interface{}, w typex.Window) (uint64, error) { + h.hash.Reset() var val uint64 switch n := element.(type) { case int: @@ -137,22 +159,27 @@ func (h *numberHasher) Hash(element interface{}) (uint64, error) { default: panic(fmt.Sprintf("received unknown value type: want a number:, got %T", n)) } - return val, nil + binary.LittleEndian.PutUint64(h.cache, val) + h.hash.Write(h.cache) + h.we.EncodeSingle(w, h.hash) + return h.hash.Sum64(), nil } type rowHasher struct { hash hash.Hash64 coder ElementEncoder + we WindowEncoder fv FullValue } -func (h *rowHasher) Hash(element interface{}) (uint64, error) { +func (h *rowHasher) Hash(element interface{}, w typex.Window) (uint64, error) { h.hash.Reset() h.fv.Elm = element if err := h.coder.Encode(&h.fv, h.hash); err != nil { return 0, err } h.fv.Elm = nil + h.we.EncodeSingle(w, h.hash) return h.hash.Sum64(), nil } @@ -160,14 +187,16 @@ type customEncodedHasher struct { hash hash.Hash64 t reflect.Type coder Encoder + we WindowEncoder } -func (h *customEncodedHasher) Hash(element interface{}) (uint64, error) { +func (h *customEncodedHasher) Hash(element interface{}, w typex.Window) (uint64, error) { h.hash.Reset() b, err := h.coder.Encode(h.t, element) if err != nil { return 0, err } h.hash.Write(b) + h.we.EncodeSingle(w, h.hash) return h.hash.Sum64(), nil } diff --git a/sdks/go/pkg/beam/core/runtime/exec/hash_test.go b/sdks/go/pkg/beam/core/runtime/exec/hash_test.go index 00852072edf6..143b5ddd90f9 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/hash_test.go +++ b/sdks/go/pkg/beam/core/runtime/exec/hash_test.go @@ -23,10 +23,11 @@ import ( "strings" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/coderx" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/coderx" + 
"github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) func BenchmarkPrimitives(b *testing.B) { @@ -125,6 +126,7 @@ func (*jsonEncoder) Encode(t reflect.Type, element interface{}) ([]byte, error) func hashbench(b *testing.B, test interface{}, encoded, dedicated elementHasher) { var value FullValue + gw := window.SingleGlobalWindow[0] b.Run("interface", func(b *testing.B) { m := make(map[interface{}]FullValue) for i := 0; i < b.N; i++ { @@ -136,7 +138,7 @@ func hashbench(b *testing.B, test interface{}, encoded, dedicated elementHasher) b.Run("encodedHash", func(b *testing.B) { m := make(map[uint64]FullValue) for i := 0; i < b.N; i++ { - k, err := encoded.Hash(test) + k, err := encoded.Hash(test, gw) if err != nil { b.Fatal(err) } @@ -150,7 +152,7 @@ func hashbench(b *testing.B, test interface{}, encoded, dedicated elementHasher) b.Run("dedicatedHash", func(b *testing.B) { m := make(map[uint64]FullValue) for i := 0; i < b.N; i++ { - k, err := dedicated.Hash(test) + k, err := dedicated.Hash(test, gw) if err != nil { b.Fatal(err) } diff --git a/sdks/go/pkg/beam/core/runtime/exec/input.go b/sdks/go/pkg/beam/core/runtime/exec/input.go index 74851c73d61a..98922a24e9f5 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/input.go +++ b/sdks/go/pkg/beam/core/runtime/exec/input.go @@ -21,9 +21,9 @@ import ( "reflect" "sync" - "github.com/apache/beam/sdks/go/pkg/beam/core/funcx" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/funcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // TODO(herohde) 4/26/2017: SideInput representation? 
We want it to be amenable diff --git a/sdks/go/pkg/beam/core/runtime/exec/optimized/callers.go b/sdks/go/pkg/beam/core/runtime/exec/optimized/callers.go index 7ba93f4edc03..3d361b18c159 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/optimized/callers.go +++ b/sdks/go/pkg/beam/core/runtime/exec/optimized/callers.go @@ -20,8 +20,8 @@ package optimized import ( "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) // TODO(herohde) 1/4/2018: Potential targets for type-specialization include simple predicate, diff --git a/sdks/go/pkg/beam/core/runtime/exec/optimized/callers.tmpl b/sdks/go/pkg/beam/core/runtime/exec/optimized/callers.tmpl index da956a21688a..ed4990e5bc28 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/optimized/callers.tmpl +++ b/sdks/go/pkg/beam/core/runtime/exec/optimized/callers.tmpl @@ -18,7 +18,7 @@ package {{.Package}} import ( "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" {{- range $import := .Imports}} "{{$import}}" {{- end}} diff --git a/sdks/go/pkg/beam/core/runtime/exec/optimized/decoders.go b/sdks/go/pkg/beam/core/runtime/exec/optimized/decoders.go index a9eaf63f909f..d44c9ea861df 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/optimized/decoders.go +++ b/sdks/go/pkg/beam/core/runtime/exec/optimized/decoders.go @@ -20,8 +20,8 @@ package optimized import ( "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) // This template registers all forms of decoders as general diff --git a/sdks/go/pkg/beam/core/runtime/exec/optimized/decoders.tmpl b/sdks/go/pkg/beam/core/runtime/exec/optimized/decoders.tmpl index bab012734e25..46746bb15fa5 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/optimized/decoders.tmpl +++ b/sdks/go/pkg/beam/core/runtime/exec/optimized/decoders.tmpl @@ -18,7 +18,7 @@ package {{.Package}} import ( "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" {{- range $import := .Imports}} "{{$import}}" {{- end}} diff --git a/sdks/go/pkg/beam/core/runtime/exec/optimized/emitters.go b/sdks/go/pkg/beam/core/runtime/exec/optimized/emitters.go index 437b184ef12f..f034620cf1d0 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/optimized/emitters.go +++ b/sdks/go/pkg/beam/core/runtime/exec/optimized/emitters.go @@ -21,8 +21,8 @@ import ( "context" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" ) func init() { diff --git a/sdks/go/pkg/beam/core/runtime/exec/optimized/emitters.tmpl b/sdks/go/pkg/beam/core/runtime/exec/optimized/emitters.tmpl index eff40802cdda..3cddd0361c28 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/optimized/emitters.tmpl +++ b/sdks/go/pkg/beam/core/runtime/exec/optimized/emitters.tmpl @@ -19,8 +19,8 @@ import ( "context" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" + 
"github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" {{- range $import := .Imports}} "{{$import}}" {{- end}} diff --git a/sdks/go/pkg/beam/core/runtime/exec/optimized/encoders.go b/sdks/go/pkg/beam/core/runtime/exec/optimized/encoders.go index 831d42b6c8e9..1e670f0745e8 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/optimized/encoders.go +++ b/sdks/go/pkg/beam/core/runtime/exec/optimized/encoders.go @@ -20,8 +20,8 @@ package optimized import ( "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) // This template registers all forms of encoders as general diff --git a/sdks/go/pkg/beam/core/runtime/exec/optimized/encoders.tmpl b/sdks/go/pkg/beam/core/runtime/exec/optimized/encoders.tmpl index 68b7dc2a0edd..a960382cc2a3 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/optimized/encoders.tmpl +++ b/sdks/go/pkg/beam/core/runtime/exec/optimized/encoders.tmpl @@ -18,7 +18,7 @@ package {{.Package}} import ( "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" {{- range $import := .Imports}} "{{$import}}" {{- end}} diff --git a/sdks/go/pkg/beam/core/runtime/exec/optimized/inputs.go b/sdks/go/pkg/beam/core/runtime/exec/optimized/inputs.go index a1f8a047091c..d66f8ed75156 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/optimized/inputs.go +++ b/sdks/go/pkg/beam/core/runtime/exec/optimized/inputs.go @@ -22,8 +22,8 @@ import ( "io" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" ) func init() { diff --git a/sdks/go/pkg/beam/core/runtime/exec/optimized/inputs.tmpl b/sdks/go/pkg/beam/core/runtime/exec/optimized/inputs.tmpl index 5ea083c61fd4..338ce417eb8e 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/optimized/inputs.tmpl +++ b/sdks/go/pkg/beam/core/runtime/exec/optimized/inputs.tmpl @@ -20,8 +20,8 @@ import ( "fmt" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" {{- range $import := .Imports}} "{{$import}}" {{- end}} diff --git a/sdks/go/pkg/beam/core/runtime/exec/pardo.go b/sdks/go/pkg/beam/core/runtime/exec/pardo.go index 099763f4a13e..fdf20b0862fe 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/pardo.go +++ b/sdks/go/pkg/beam/core/runtime/exec/pardo.go @@ -20,14 +20,14 @@ import ( "fmt" "path" - "github.com/apache/beam/sdks/go/pkg/beam/core/funcx" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window" - "github.com/apache/beam/sdks/go/pkg/beam/core/metrics" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - "github.com/apache/beam/sdks/go/pkg/beam/util/errorx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/funcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/mtime" + 
"github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/metrics" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/errorx" ) // ParDo is a DoFn executor. @@ -43,8 +43,8 @@ type ParDo struct { ctx context.Context inv *invoker - side StateReader - cache *cacheElm + reader StateReader + cache *cacheElm status Status err errorx.GuardedError @@ -97,7 +97,7 @@ func (n *ParDo) StartBundle(ctx context.Context, id string, data DataContext) er return errors.Errorf("invalid status for pardo %v: %v, want Up", n.UID, n.status) } n.status = Active - n.side = data.State + n.reader = data.State // Allocating contexts all the time is expensive, but we seldom re-write them, // and never accept modified contexts from users, so we will cache them per-bundle // per-unit, to avoid the constant allocation overhead. @@ -201,7 +201,7 @@ func (n *ParDo) FinishBundle(_ context.Context) error { if _, err := n.invokeDataFn(n.ctx, window.SingleGlobalWindow, mtime.ZeroTimestamp, n.Fn.FinishBundleFn(), nil); err != nil { return n.fail(err) } - n.side = nil + n.reader = nil n.cache = nil if err := MultiFinishBundle(n.ctx, n.Out...); err != nil { @@ -216,7 +216,7 @@ func (n *ParDo) Down(ctx context.Context) error { return n.err.Error() } n.status = Down - n.side = nil + n.reader = nil n.cache = nil if _, err := InvokeWithoutEventTime(ctx, n.Fn.TeardownFn(), nil); err != nil { @@ -253,16 +253,7 @@ func (n *ParDo) initSideInput(ctx context.Context, w typex.Window) error { // Slow path: init side input for the given window - streams := make([]ReStream, len(n.Side), len(n.Side)) - for i, adapter := range n.Side { - s, err := adapter.NewIterable(ctx, n.side, w) - if err != nil { - return err - } - streams[i] = s - } - - sideinput, err := makeSideInputs(n.Fn.ProcessElementFn(), n.Inbound, streams) + sideinput, err := makeSideInputs(ctx, w, n.Side, n.reader, n.Fn.ProcessElementFn(), n.Inbound) if err != nil { return err } @@ -332,8 +323,18 @@ func (n *ParDo) postInvoke() error { func (n *ParDo) fail(err error) error { n.status = Broken - n.err.TrySetError(err) - return err + if err2, ok := err.(*doFnError); ok { + return err2 + } + + parDoError := &doFnError{ + doFn: n.Fn.Name(), + err: err, + uid: n.UID, + pid: n.PID, + } + n.err.TrySetError(parDoError) + return parDoError } func (n *ParDo) String() string { diff --git a/sdks/go/pkg/beam/core/runtime/exec/pardo_test.go b/sdks/go/pkg/beam/core/runtime/exec/pardo_test.go index bcb2f7b1f1f3..691c05781302 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/pardo_test.go +++ b/sdks/go/pkg/beam/core/runtime/exec/pardo_test.go @@ -19,11 +19,11 @@ import ( "context" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/mtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) func sumFn(n int, a int, b []int, c func(*int) bool, d func() func(*int) bool, e func(int)) int { diff --git 
a/sdks/go/pkg/beam/core/runtime/exec/pcollection.go b/sdks/go/pkg/beam/core/runtime/exec/pcollection.go new file mode 100644 index 000000000000..3b2e3ab3bf2c --- /dev/null +++ b/sdks/go/pkg/beam/core/runtime/exec/pcollection.go @@ -0,0 +1,158 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package exec + +import ( + "context" + "fmt" + "math" + "math/rand" + "sync" + "sync/atomic" + + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" +) + +// PCollection is a passthrough node to collect PCollection metrics, and +// must be placed as the Out node of any producer of a PCollection. +// +// In particular, must not be placed after a Multiplex, and must be placed +// after a Flatten. +type PCollection struct { + UID UnitID + PColID string + Out Node // Out is the consumer of this PCollection. + Coder *coder.Coder + Seed int64 + + r *rand.Rand + nextSampleIdx int64 // The index of the next value to sample. + elementCoder ElementEncoder + + elementCount int64 // must use atomic operations. + sizeMu sync.Mutex + sizeCount, sizeSum, sizeMin, sizeMax int64 +} + +// ID returns the debug id for this unit. +func (p *PCollection) ID() UnitID { + return p.UID +} + +// Up initializes the random sampling source and element encoder. +func (p *PCollection) Up(ctx context.Context) error { + // dedicated rand source + p.r = rand.New(rand.NewSource(p.Seed)) + p.elementCoder = MakeElementEncoder(p.Coder) + return nil +} + +// StartBundle resets collected metrics for this PCollection, and propagates bundle start. +func (p *PCollection) StartBundle(ctx context.Context, id string, data DataContext) error { + atomic.StoreInt64(&p.elementCount, 0) + p.nextSampleIdx = 1 + p.resetSize() + return MultiStartBundle(ctx, id, data, p.Out) +} + +type byteCounter struct { + count int +} + +func (w *byteCounter) Write(p []byte) (n int, err error) { + w.count += len(p) + return len(p), nil +} + +// ProcessElement increments the element count and sometimes takes size samples of the elements. +func (p *PCollection) ProcessElement(ctx context.Context, elm *FullValue, values ...ReStream) error { + cur := atomic.AddInt64(&p.elementCount, 1) + if cur == p.nextSampleIdx { + // Always encode the first 3 elements. Otherwise... + // We pick the next sampling index based on how large this pcollection already is. + // We don't want to necessarily wait until the pcollection has doubled, so we reduce the range. + // We don't want to always encode the first consecutive elements, so we add 2 to give some variance. + // Finally we add 1 no matter what, so that it can trigger again. + // Otherwise, there's the potential for the random int to be 0, which means we don't change the + // nextSampleIdx at all. 
+ if p.nextSampleIdx < 4 { + p.nextSampleIdx++ + } else { + p.nextSampleIdx = cur + p.r.Int63n(cur/10+2) + 1 + } + var w byteCounter + p.elementCoder.Encode(elm, &w) + p.addSize(int64(w.count)) + } + return p.Out.ProcessElement(ctx, elm, values...) +} + +func (p *PCollection) addSize(size int64) { + p.sizeMu.Lock() + defer p.sizeMu.Unlock() + p.sizeCount++ + p.sizeSum += size + if size > p.sizeMax { + p.sizeMax = size + } + if size < p.sizeMin { + p.sizeMin = size + } +} + +func (p *PCollection) resetSize() { + p.sizeMu.Lock() + defer p.sizeMu.Unlock() + p.sizeCount = 0 + p.sizeSum = 0 + p.sizeMax = math.MinInt64 + p.sizeMin = math.MaxInt64 +} + +// FinishBundle propagates bundle termination. +func (p *PCollection) FinishBundle(ctx context.Context) error { + return MultiFinishBundle(ctx, p.Out) +} + +// Down is a no-op. +func (p *PCollection) Down(ctx context.Context) error { + return nil +} + +func (p *PCollection) String() string { + return fmt.Sprintf("PCollection[%v] Out:%v", p.PColID, IDs(p.Out)) +} + +// PCollectionSnapshot contains the PCollectionID +type PCollectionSnapshot struct { + ID string + ElementCount int64 + // If SizeCount is zero, then no size metrics should be exported. + SizeCount, SizeSum, SizeMin, SizeMax int64 +} + +func (p *PCollection) snapshot() PCollectionSnapshot { + p.sizeMu.Lock() + defer p.sizeMu.Unlock() + return PCollectionSnapshot{ + ID: p.PColID, + ElementCount: atomic.LoadInt64(&p.elementCount), + SizeCount: p.sizeCount, + SizeSum: p.sizeSum, + SizeMin: p.sizeMin, + SizeMax: p.sizeMax, + } +} diff --git a/sdks/go/pkg/beam/core/runtime/exec/pcollection_test.go b/sdks/go/pkg/beam/core/runtime/exec/pcollection_test.go new file mode 100644 index 000000000000..1212c3c2bf76 --- /dev/null +++ b/sdks/go/pkg/beam/core/runtime/exec/pcollection_test.go @@ -0,0 +1,154 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package exec + +import ( + "context" + "math" + "testing" + + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/mtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" +) + +// TestPCollection verifies that the PCollection node works correctly. +// Seed is by default set to 0, so we have a "deterministic" set of +// randomness for the samples. +func TestPCollection(t *testing.T) { + a := &CaptureNode{UID: 1} + pcol := &PCollection{UID: 2, Out: a, Coder: coder.NewVarInt()} + // The "large" 2nd value is to ensure the values are encoded properly, + // and that Min & Max are behaving. 
+ inputs := []interface{}{int64(1), int64(2000000000), int64(3)} + in := &FixedRoot{UID: 3, Elements: makeInput(inputs...), Out: pcol} + + p, err := NewPlan("a", []Unit{a, pcol, in}) + if err != nil { + t.Fatalf("failed to construct plan: %v", err) + } + + if err := p.Execute(context.Background(), "1", DataContext{}); err != nil { + t.Fatalf("execute failed: %v", err) + } + if err := p.Down(context.Background()); err != nil { + t.Fatalf("down failed: %v", err) + } + + expected := makeValues(inputs...) + if !equalList(a.Elements, expected) { + t.Errorf("multiplex returned %v for a, want %v", extractValues(a.Elements...), extractValues(expected...)) + } + snap := pcol.snapshot() + if want, got := int64(len(expected)), snap.ElementCount; got != want { + t.Errorf("snapshot miscounted: got %v, want %v", got, want) + } + checkPCollectionSizeSample(t, snap, 3, 7, 1, 5) +} + +func TestPCollection_sizeReset(t *testing.T) { + // Check the initial values after resetting. + var pcol PCollection + pcol.resetSize() + snap := pcol.snapshot() + checkPCollectionSizeSample(t, snap, 0, 0, math.MaxInt64, math.MinInt64) +} + +func checkPCollectionSizeSample(t *testing.T, snap PCollectionSnapshot, count, sum, min, max int64) { + t.Helper() + if want, got := int64(count), snap.SizeCount; got != want { + t.Errorf("sample count incorrect: got %v, want %v", got, want) + } + if want, got := int64(sum), snap.SizeSum; got != want { + t.Errorf("sample sum incorrect: got %v, want %v", got, want) + } + if want, got := int64(min), snap.SizeMin; got != want { + t.Errorf("sample min incorrect: got %v, want %v", got, want) + } + if want, got := int64(max), snap.SizeMax; got != want { + t.Errorf("sample max incorrect: got %v, want %v", got, want) + } +} + +// BenchmarkPCollection measures the overhead of invoking a ParDo in a plan. +// +// On @lostluck's desktop (2020/02/20): +// BenchmarkPCollection-12 44699806 24.8 ns/op 0 B/op 0 allocs/op +func BenchmarkPCollection(b *testing.B) { + // Pre allocate the capture buffer and process buffer to avoid + // unnecessary overhead. + out := &CaptureNode{UID: 1, Elements: make([]FullValue, 0, b.N)} + process := make([]MainInput, 0, b.N) + for i := 0; i < b.N; i++ { + process = append(process, MainInput{Key: FullValue{ + Windows: window.SingleGlobalWindow, + Timestamp: mtime.ZeroTimestamp, + Elm: int64(1), + }}) + } + pcol := &PCollection{UID: 2, Out: out, Coder: coder.NewVarInt()} + n := &FixedRoot{UID: 3, Elements: process, Out: pcol} + p, err := NewPlan("a", []Unit{n, pcol, out}) + if err != nil { + b.Fatalf("failed to construct plan: %v", err) + } + b.ResetTimer() + if err := p.Execute(context.Background(), "1", DataContext{}); err != nil { + b.Fatalf("execute failed: %v", err) + } + if err := p.Down(context.Background()); err != nil { + b.Fatalf("down failed: %v", err) + } + if got, want := pcol.snapshot().ElementCount, int64(b.N); got != want { + b.Errorf("did not process all elements: got %v, want %v", got, want) + } + if got, want := len(out.Elements), b.N; got != want { + b.Errorf("did not process all elements: got %v, want %v", got, want) + } +} + +// BenchmarkPCollection_Baseline measures the baseline of the node benchmarking scaffold. +// +// On @lostluck's desktop (2020/02/20): +// BenchmarkPCollection_Baseline-12 62186372 18.8 ns/op 0 B/op 0 allocs/op +func BenchmarkPCollection_Baseline(b *testing.B) { + // Pre allocate the capture buffer and process buffer to avoid + // unnecessary overhead. 
+ out := &CaptureNode{UID: 1, Elements: make([]FullValue, 0, b.N)} + process := make([]MainInput, 0, b.N) + for i := 0; i < b.N; i++ { + process = append(process, MainInput{Key: FullValue{ + Windows: window.SingleGlobalWindow, + Timestamp: mtime.ZeroTimestamp, + Elm: 1, + }}) + } + n := &FixedRoot{UID: 3, Elements: process, Out: out} + p, err := NewPlan("a", []Unit{n, out}) + if err != nil { + b.Fatalf("failed to construct plan: %v", err) + } + b.ResetTimer() + if err := p.Execute(context.Background(), "1", DataContext{}); err != nil { + b.Fatalf("execute failed: %v", err) + } + if err := p.Down(context.Background()); err != nil { + b.Fatalf("down failed: %v", err) + } + if got, want := len(out.Elements), b.N; got != want { + b.Errorf("did not process all elements: got %v, want %v", got, want) + } +} diff --git a/sdks/go/pkg/beam/core/runtime/exec/plan.go b/sdks/go/pkg/beam/core/runtime/exec/plan.go index 13566d5fa72c..7f89ce37322c 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/plan.go +++ b/sdks/go/pkg/beam/core/runtime/exec/plan.go @@ -21,44 +21,30 @@ import ( "context" "fmt" "strings" - "sync" - "github.com/apache/beam/sdks/go/pkg/beam/core/metrics" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // Plan represents the bundle execution plan. It will generally be constructed // from a part of a pipeline. A plan can be used to process multiple bundles // serially. type Plan struct { - id string - roots []Root - units []Unit - parDoIDs []string + id string // id of the bundle descriptor for this plan + roots []Root + units []Unit + pcols []*PCollection status Status - // While the store is threadsafe, the reference to it - // is not, so we need to protect the store field to be - // able to asynchronously provide tentative metrics. - storeMu sync.Mutex - store *metrics.Store - // TODO: there can be more than 1 DataSource in a bundle. source *DataSource } -// hasPID provides a common interface for extracting PTransformIDs -// from Units. -type hasPID interface { - GetPID() string -} - // NewPlan returns a new bundle execution plan from the given units. func NewPlan(id string, units []Unit) (*Plan, error) { var roots []Root + var pcols []*PCollection var source *DataSource - var pardoIDs []string for _, u := range units { if u == nil { @@ -70,8 +56,8 @@ func NewPlan(id string, units []Unit) (*Plan, error) { if s, ok := u.(*DataSource); ok { source = s } - if p, ok := u.(hasPID); ok { - pardoIDs = append(pardoIDs, p.GetPID()) + if p, ok := u.(*PCollection); ok { + pcols = append(pcols, p) } } if len(roots) == 0 { @@ -79,12 +65,12 @@ func NewPlan(id string, units []Unit) (*Plan, error) { } return &Plan{ - id: id, - status: Initializing, - roots: roots, - units: units, - parDoIDs: pardoIDs, - source: source, + id: id, + status: Initializing, + roots: roots, + units: units, + pcols: pcols, + source: source, }, nil } @@ -102,10 +88,6 @@ func (p *Plan) SourcePTransformID() string { // are brought up on the first execution. If a bundle fails, the plan cannot // be reused for further bundles. Does not panic. Blocking. 
func (p *Plan) Execute(ctx context.Context, id string, manager DataContext) error { - ctx = metrics.SetBundleID(ctx, p.id) - p.storeMu.Lock() - p.store = metrics.GetStore(ctx) - p.storeMu.Unlock() if p.status == Initializing { for _, u := range p.units { if err := callNoPanic(ctx, u.Up); err != nil { @@ -181,19 +163,28 @@ func (p *Plan) String() string { return fmt.Sprintf("Plan[%v]:\n%v", p.ID(), strings.Join(units, "\n")) } -// Progress returns a snapshot of input progress of the plan, and associated metrics. -func (p *Plan) Progress() (ProgressReportSnapshot, bool) { - if p.source != nil { - return p.source.Progress(), true - } - return ProgressReportSnapshot{}, false +// PlanSnapshot contains system metrics for the current run of the plan. +type PlanSnapshot struct { + Source ProgressReportSnapshot + PCols []PCollectionSnapshot } -// Store returns the metric store for the last use of this plan. -func (p *Plan) Store() *metrics.Store { - p.storeMu.Lock() - defer p.storeMu.Unlock() - return p.store +// Progress returns a snapshot of progress of the plan, and associated metrics. +// The retuend boolean indicates whether the plan includes a DataSource, which is +// important for handling legacy metrics. This boolean will be removed once +// we no longer return legacy metrics. +func (p *Plan) Progress() (PlanSnapshot, bool) { + pcolSnaps := make([]PCollectionSnapshot, 0, len(p.pcols)+1) // include space for the datasource pcollection. + for _, pcol := range p.pcols { + pcolSnaps = append(pcolSnaps, pcol.snapshot()) + } + snap := PlanSnapshot{PCols: pcolSnaps} + if p.source != nil { + snap.Source = p.source.Progress() + snap.PCols = append(pcolSnaps, snap.Source.pcol) + return snap, true + } + return snap, false } // SplitPoints captures the split requested by the Runner. diff --git a/sdks/go/pkg/beam/core/runtime/exec/reshuffle.go b/sdks/go/pkg/beam/core/runtime/exec/reshuffle.go index 9f5355617901..3a1900f6dc85 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/reshuffle.go +++ b/sdks/go/pkg/beam/core/runtime/exec/reshuffle.go @@ -22,8 +22,8 @@ import ( "io" "math/rand" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // ReshuffleInput is a Node. 
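The Plan.Progress rewrite above replaces the old single-source progress report with a PlanSnapshot that aggregates a PCollectionSnapshot from every tracked PCollection node, appending the DataSource's own PCollection sample when a source is present. A rough sketch of that aggregation shape, using simplified stand-in types (pcolSnapshot, planSnapshot, and progress below are illustrative names, not the exec package's actual structs, and the real method also carries the source's ProgressReportSnapshot):

package main

import "fmt"

type pcolSnapshot struct {
	ID           string
	ElementCount int64
}

type planSnapshot struct {
	PCols []pcolSnapshot
}

// progress gathers one snapshot per tracked PCollection; when a data source
// is present its own PCollection snapshot is appended and the boolean result
// is true, mirroring the "has a DataSource" signal in the patch above.
func progress(pcols []pcolSnapshot, source *pcolSnapshot) (planSnapshot, bool) {
	snaps := make([]pcolSnapshot, 0, len(pcols)+1)
	snaps = append(snaps, pcols...)
	if source != nil {
		snaps = append(snaps, *source)
		return planSnapshot{PCols: snaps}, true
	}
	return planSnapshot{PCols: snaps}, false
}

func main() {
	snap, ok := progress([]pcolSnapshot{{ID: "pc1", ElementCount: 3}}, &pcolSnapshot{ID: "src", ElementCount: 3})
	fmt.Println(ok, len(snap.PCols)) // true 2
}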
@@ -62,7 +62,7 @@ func (n *ReshuffleInput) StartBundle(ctx context.Context, id string, data DataCo func (n *ReshuffleInput) ProcessElement(ctx context.Context, value *FullValue, values ...ReStream) error { n.b.Reset() - if err := EncodeWindowedValueHeader(n.wEnc, value.Windows, value.Timestamp, &n.b); err != nil { + if err := EncodeWindowedValueHeader(n.wEnc, value.Windows, value.Timestamp, value.Pane, &n.b); err != nil { return err } if err := n.enc.Encode(value, &n.b); err != nil { @@ -135,7 +135,7 @@ func (n *ReshuffleOutput) ProcessElement(ctx context.Context, value *FullValue, return errors.WithContextf(err, "reading values for %v", n) } n.b = *bytes.NewBuffer(v.Elm.([]byte)) - ws, ts, err := DecodeWindowedValueHeader(n.wDec, &n.b) + ws, ts, pn, err := DecodeWindowedValueHeader(n.wDec, &n.b) if err != nil { return errors.WithContextf(err, "decoding windows for %v", n) } @@ -144,6 +144,7 @@ func (n *ReshuffleOutput) ProcessElement(ctx context.Context, value *FullValue, } n.ret.Windows = ws n.ret.Timestamp = ts + n.ret.Pane = pn if err := n.Out.ProcessElement(ctx, &n.ret); err != nil { return err } diff --git a/sdks/go/pkg/beam/core/runtime/exec/sdf.go b/sdks/go/pkg/beam/core/runtime/exec/sdf.go index a0c9b7275823..3b14afd1323b 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/sdf.go +++ b/sdks/go/pkg/beam/core/runtime/exec/sdf.go @@ -21,10 +21,10 @@ import ( "math" "path" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/core/sdf" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/sdf" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // PairWithRestriction is an executor for the expanded SDF step of the same @@ -172,13 +172,13 @@ func (n *SplitAndSizeRestrictions) ProcessElement(ctx context.Context, elm *Full mainElm := elm.Elm.(*FullValue) splitRests := n.splitInv.Invoke(mainElm, rest) - if len(splitRests) == 0 { - err := errors.Errorf("initial splitting returned 0 restrictions.") - return errors.WithContextf(err, "%v", n) - } for _, splitRest := range splitRests { size := n.sizeInv.Invoke(mainElm, splitRest) + if size < 0 { + err := errors.Errorf("size returned expected to be non-negative but received %v.", size) + return errors.WithContextf(err, "%v", n) + } output := &FullValue{} output.Timestamp = elm.Timestamp @@ -475,8 +475,14 @@ func (n *ProcessSizedElementsAndRestrictions) singleWindowSplit(f float64) ([]*F return []*FullValue{}, []*FullValue{}, nil } - pfv := n.newSplitResult(p, n.elm.Windows) - rfv := n.newSplitResult(r, n.elm.Windows) + pfv, err := n.newSplitResult(p, n.elm.Windows) + if err != nil { + return nil, nil, err + } + rfv, err := n.newSplitResult(r, n.elm.Windows) + if err != nil { + return nil, nil, err + } return []*FullValue{pfv}, []*FullValue{rfv}, nil } @@ -548,16 +554,32 @@ func (n *ProcessSizedElementsAndRestrictions) currentWindowSplit(f float64) ([]* // Split of currently processing restriction in a single window. 
ps := make([]*FullValue, 1) - ps[0] = n.newSplitResult(p, n.elm.Windows[n.currW:n.currW+1]) + newP, err := n.newSplitResult(p, n.elm.Windows[n.currW:n.currW+1]) + if err != nil { + return nil, nil, err + } + ps[0] = newP rs := make([]*FullValue, 1) - rs[0] = n.newSplitResult(r, n.elm.Windows[n.currW:n.currW+1]) + newR, err := n.newSplitResult(r, n.elm.Windows[n.currW:n.currW+1]) + if err != nil { + return nil, nil, err + } + rs[0] = newR // Window boundary split surrounding the split restriction above. full := n.elm.Elm.(*FullValue).Elm2 if 0 < n.currW { - ps = append(ps, n.newSplitResult(full, n.elm.Windows[0:n.currW])) + newP, err := n.newSplitResult(full, n.elm.Windows[0:n.currW]) + if err != nil { + return nil, nil, err + } + ps = append(ps, newP) } if n.currW+1 < n.numW { - rs = append(rs, n.newSplitResult(full, n.elm.Windows[n.currW+1:n.numW])) + newR, err := n.newSplitResult(full, n.elm.Windows[n.currW+1:n.numW]) + if err != nil { + return nil, nil, err + } + rs = append(rs, newR) } n.numW = n.currW + 1 return ps, rs, nil @@ -572,8 +594,14 @@ func (n *ProcessSizedElementsAndRestrictions) windowBoundarySplit(splitPt int) ( return []*FullValue{}, []*FullValue{}, nil } full := n.elm.Elm.(*FullValue).Elm2 - pFv := n.newSplitResult(full, n.elm.Windows[0:splitPt]) - rFv := n.newSplitResult(full, n.elm.Windows[splitPt:n.numW]) + pFv, err := n.newSplitResult(full, n.elm.Windows[0:splitPt]) + if err != nil { + return nil, nil, err + } + rFv, err := n.newSplitResult(full, n.elm.Windows[splitPt:n.numW]) + if err != nil { + return nil, nil, err + } n.numW = splitPt return []*FullValue{pFv}, []*FullValue{rFv}, nil } @@ -582,14 +610,22 @@ func (n *ProcessSizedElementsAndRestrictions) windowBoundarySplit(splitPt int) ( // element restriction pair based on the currently processing element, but with // a modified restriction and windows. Intended for creating primaries and // residuals to return as split results. -func (n *ProcessSizedElementsAndRestrictions) newSplitResult(rest interface{}, w []typex.Window) *FullValue { +func (n *ProcessSizedElementsAndRestrictions) newSplitResult(rest interface{}, w []typex.Window) (*FullValue, error) { var size float64 elm := n.elm.Elm.(*FullValue).Elm if fv, ok := elm.(*FullValue); ok { size = n.sizeInv.Invoke(fv, rest) + if size < 0 { + err := errors.Errorf("size returned expected to be non-negative but received %v.", size) + return nil, errors.WithContextf(err, "%v", n) + } } else { fv := &FullValue{Elm: elm} size = n.sizeInv.Invoke(fv, rest) + if size < 0 { + err := errors.Errorf("size returned expected to be non-negative but received %v.", size) + return nil, errors.WithContextf(err, "%v", n) + } } return &FullValue{ Elm: &FullValue{ @@ -599,7 +635,7 @@ func (n *ProcessSizedElementsAndRestrictions) newSplitResult(rest interface{}, w Elm2: size, Timestamp: n.elm.Timestamp, Windows: w, - } + }, nil } // GetProgress returns the current restriction tracker's progress as a fraction. 
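The sdf.go changes above make newSplitResult (and the initial split-and-size path) return an error when an SDF's RestrictionSize produces a negative value, instead of emitting the bad size downstream. A minimal sketch of that contract, assuming a hypothetical validateSize helper that is not part of the SDK:

package main

import "fmt"

// validateSize mirrors the check added above: a negative RestrictionSize
// result is rejected before it can be used to build a split output.
func validateSize(size float64) (float64, error) {
	if size < 0 {
		return 0, fmt.Errorf("size returned expected to be non-negative but received %v", size)
	}
	return size, nil
}

func main() {
	if _, err := validateSize(-1); err != nil {
		fmt.Println("split rejected:", err)
	}
	if s, err := validateSize(4); err == nil {
		fmt.Println("accepted size:", s)
	}
}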
diff --git a/sdks/go/pkg/beam/core/runtime/exec/sdf_invokers.go b/sdks/go/pkg/beam/core/runtime/exec/sdf_invokers.go index 6beac0fbb373..905984004724 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/sdf_invokers.go +++ b/sdks/go/pkg/beam/core/runtime/exec/sdf_invokers.go @@ -16,10 +16,10 @@ package exec import ( - "github.com/apache/beam/sdks/go/pkg/beam/core/funcx" - "github.com/apache/beam/sdks/go/pkg/beam/core/sdf" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/funcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/sdf" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" "reflect" ) diff --git a/sdks/go/pkg/beam/core/runtime/exec/sdf_invokers_test.go b/sdks/go/pkg/beam/core/runtime/exec/sdf_invokers_test.go index 7dbe3e8420da..31faabd61ba2 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/sdf_invokers_test.go +++ b/sdks/go/pkg/beam/core/runtime/exec/sdf_invokers_test.go @@ -18,7 +18,7 @@ package exec import ( "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" "github.com/google/go-cmp/cmp" ) @@ -382,3 +382,50 @@ func (fn *VetKvSdf) ProcessElement(rt *VetRTracker, i, j int, emit func(*VetRest rest.ProcessElm = true emit(rest) } + +// VetEmptyInitialSplitSdf runs an SDF in order to test that these methods get called properly, +// each method will flip the corresponding flag in the passed in VetRestriction, +// overwrite the restriction's Key and Val with the last seen input elements, +// and retain the other fields in the VetRestriction. +type VetEmptyInitialSplitSdf struct { +} + +// CreateInitialRestriction creates a restriction with the given values and +// with the appropriate flags to track that this was called. +func (fn *VetEmptyInitialSplitSdf) CreateInitialRestriction(i int) *VetRestriction { + return &VetRestriction{ID: "EmptySdf", Val: i, CreateRest: true} +} + +// SplitRestriction outputs zero restrictions. +func (fn *VetEmptyInitialSplitSdf) SplitRestriction(i int, rest *VetRestriction) []*VetRestriction { + return []*VetRestriction{} +} + +// RestrictionSize just returns i as the size, as well as flipping appropriate +// flags on the restriction to track that this was called. +func (fn *VetEmptyInitialSplitSdf) RestrictionSize(i int, rest *VetRestriction) float64 { + rest.Key = nil + rest.Val = i + rest.RestSize = true + return (float64)(i) +} + +// CreateTracker creates an RTracker containing the given restriction and flips +// the appropriate flags on the restriction to track that this was called. +func (fn *VetEmptyInitialSplitSdf) CreateTracker(rest *VetRestriction) *VetRTracker { + rest.CreateTracker = true + return &VetRTracker{rest} +} + +// ProcessElement emits the restriction from the restriction tracker it +// received, with the appropriate flags flipped to track that this was called. +// Note that emitting restrictions is discouraged in normal usage. It is only +// done here to allow validating that ProcessElement is being executed +// properly. 
+func (fn *VetEmptyInitialSplitSdf) ProcessElement(rt *VetRTracker, i int, emit func(*VetRestriction)) { + rest := rt.Rest + rest.Key = nil + rest.Val = i + rest.ProcessElm = true + emit(rest) +} diff --git a/sdks/go/pkg/beam/core/runtime/exec/sdf_test.go b/sdks/go/pkg/beam/core/runtime/exec/sdf_test.go index 9bf3332319ca..e02606183931 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/sdf_test.go +++ b/sdks/go/pkg/beam/core/runtime/exec/sdf_test.go @@ -17,13 +17,14 @@ package exec import ( "context" + "strings" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window" - "github.com/apache/beam/sdks/go/pkg/beam/core/sdf" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/io/rtrackers/offsetrange" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/sdf" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/rtrackers/offsetrange" "github.com/google/go-cmp/cmp" ) @@ -57,6 +58,10 @@ func TestSdfNodes(t *testing.T) { if err != nil { t.Fatalf("invalid function: %v", err) } + emptydfn, err := graph.NewDoFn(&VetEmptyInitialSplitSdf{}, graph.NumMainInputs(graph.MainSingle)) + if err != nil { + t.Fatalf("invalid function: %v", err) + } // Validate PairWithRestriction matches its contract and properly invokes // SDF method CreateInitialRestriction. @@ -227,6 +232,22 @@ func TestSdfNodes(t *testing.T) { }, }, }, + { + name: "Empty", + fn: emptydfn, + in: FullValue{ + Elm: &FullValue{ + Elm: 1, + Elm2: nil, + Timestamp: testTimestamp, + Windows: testWindows, + }, + Elm2: &VetRestriction{ID: "Sdf"}, + Timestamp: testTimestamp, + Windows: testWindows, + }, + want: []FullValue{}, + }, } for _, test := range tests { test := test @@ -237,6 +258,10 @@ func TestSdfNodes(t *testing.T) { units := []Unit{root, node, capt} constructAndExecutePlan(t, units) + if len(capt.Elements) != len(test.want) { + t.Errorf("SplitAndSizeRestrictions(%v) has incorrect number of outputs got: %v, want: %v", + test.in, len(capt.Elements), len(test.want)) + } for i, got := range capt.Elements { if !cmp.Equal(got, test.want[i]) { t.Errorf("SplitAndSizeRestrictions(%v) has incorrect output %v: got: %v, want: %v", @@ -247,6 +272,56 @@ func TestSdfNodes(t *testing.T) { } }) + // Validate SplitAndSizeRestrictions matches its contract and properly + // invokes SDF methods SplitRestriction and RestrictionSize. 
+ t.Run("InvalidSplitAndSizeRestrictions", func(t *testing.T) { + idfn, err := graph.NewDoFn(&NegativeSizeSdf{rest: offsetrange.Restriction{Start: 0, End: 4}}, graph.NumMainInputs(graph.MainSingle)) + if err != nil { + t.Fatalf("invalid function: %v", err) + } + tests := []struct { + name string + fn *graph.DoFn + in FullValue + }{ + { + name: "InvalidSplit", + fn: idfn, + in: FullValue{ + Elm: &FullValue{ + Elm: 1, + Elm2: nil, + Timestamp: testTimestamp, + Windows: testWindows, + }, + Elm2: offsetrange.Restriction{Start: 0, End: 4}, + Timestamp: testTimestamp, + Windows: testWindows, + }, + }, + } + for _, test := range tests { + test := test + t.Run(test.name, func(t *testing.T) { + capt := &CaptureNode{UID: 2} + node := &SplitAndSizeRestrictions{UID: 1, Fn: test.fn, Out: capt} + root := &FixedRoot{UID: 0, Elements: []MainInput{{Key: test.in}}, Out: node} + units := []Unit{root, node, capt} + p, err := NewPlan("a", units) + if err != nil { + t.Fatalf("failed to construct plan: %v", err) + } + err = p.Execute(context.Background(), "1", DataContext{}) + if err == nil { + t.Errorf("execution was expected to fail.") + } + if !strings.Contains(err.Error(), "size returned expected to be non-negative but received") { + t.Errorf("SplitAndSizeRestrictions(%v) failed, got: %v, wanted: 'size returned expected to be non-negative but received'.", test.in, err) + } + }) + } + }) + // Validate ProcessSizedElementsAndRestrictions matches its contract and // properly invokes SDF methods CreateTracker and ProcessElement. t.Run("ProcessSizedElementsAndRestrictions", func(t *testing.T) { @@ -410,6 +485,14 @@ func TestAsSplittableUnit(t *testing.T) { if err != nil { t.Fatalf("invalid function: %v", err) } + pdfn, err := graph.NewDoFn(&NegativeSizeSdf{rest: offsetrange.Restriction{Start: 0, End: 2}}, graph.NumMainInputs(graph.MainSingle)) + if err != nil { + t.Fatalf("invalid function: %v", err) + } + rdfn, err := graph.NewDoFn(&NegativeSizeSdf{rest: offsetrange.Restriction{Start: 2, End: 4}}, graph.NumMainInputs(graph.MainSingle)) + if err != nil { + t.Fatalf("invalid function: %v", err) + } multiWindows := []typex.Window{ window.IntervalWindow{Start: 10, End: 20}, window.IntervalWindow{Start: 11, End: 21}, @@ -752,6 +835,67 @@ func TestAsSplittableUnit(t *testing.T) { }) } }) + + // Test that Split properly validates the results and returns an error if invalid + t.Run("InvalidSplitSize", func(t *testing.T) { + tests := []struct { + name string + fn *graph.DoFn + in FullValue + }{ + { + name: "Primary", + fn: pdfn, + in: FullValue{ + Elm: &FullValue{ + Elm: 1, + Elm2: &offsetrange.Restriction{Start: 0, End: 4}, + }, + Elm2: 1.0, + Timestamp: testTimestamp, + Windows: testWindows, + }, + }, + { + name: "Residual", + fn: rdfn, + in: FullValue{ + Elm: &FullValue{ + Elm: 1, + Elm2: &offsetrange.Restriction{Start: 0, End: 4}, + }, + Elm2: 1.0, + Timestamp: testTimestamp, + Windows: testWindows, + }, + }, + } + for _, test := range tests { + test := test + t.Run(test.name, func(t *testing.T) { + // Setup, create transforms, inputs, and desired outputs. + n := &ParDo{UID: 1, Fn: test.fn, Out: []Node{}} + node := &ProcessSizedElementsAndRestrictions{PDo: n} + node.rt = sdf.RTracker(offsetrange.NewTracker(*test.in.Elm.(*FullValue).Elm2.(*offsetrange.Restriction))) + node.elm = &test.in + node.numW = len(test.in.Windows) + node.currW = 0 + + // Call from SplittableUnit and check results. 
+ su := SplittableUnit(node) + if err := node.Up(context.Background()); err != nil { + t.Fatalf("ProcessSizedElementsAndRestrictions.Up() failed: %v", err) + } + _, _, err := su.Split(0.5) + if err == nil { + t.Errorf("SplittableUnit.Split(%v) was expected to fail.", test.in) + } + if !strings.Contains(err.Error(), "size returned expected to be non-negative but received") { + t.Errorf("SplittableUnit.Split(%v) failed, got: %v, wanted: 'size returned expected to be non-negative but received'.", test.in, err) + } + }) + } + }) } // TestMultiWindowProcessing tests that ProcessSizedElementsAndRestrictions @@ -856,6 +1000,44 @@ func TestMultiWindowProcessing(t *testing.T) { } } +// NegativeSizeSdf is a very basic SDF that returns a negative restriction size +// if the passed in restriction matches otherwise it uses offsetrange.Restriction's default size. +type NegativeSizeSdf struct { + rest offsetrange.Restriction +} + +// CreateInitialRestriction creates a four-element offset range. +func (fn *NegativeSizeSdf) CreateInitialRestriction(_ int) offsetrange.Restriction { + return offsetrange.Restriction{Start: 0, End: 4} +} + +// SplitRestriction is a no-op, and does not split. +func (fn *NegativeSizeSdf) SplitRestriction(_ int, rest offsetrange.Restriction) []offsetrange.Restriction { + return []offsetrange.Restriction{rest} +} + +// RestrictionSize returns the passed in size that should be used. +func (fn *NegativeSizeSdf) RestrictionSize(_ int, rest offsetrange.Restriction) float64 { + if fn.rest == rest { + return -1 + } + return rest.Size() +} + +// CreateTracker creates a LockRTracker wrapping an offset range RTracker. +func (fn *NegativeSizeSdf) CreateTracker(rest offsetrange.Restriction) *offsetrange.Tracker { + return offsetrange.NewTracker(rest) +} + +// ProcessElement emits the element after consuming the entire restriction tracker. +func (fn *NegativeSizeSdf) ProcessElement(rt *offsetrange.Tracker, elm int, emit func(int)) { + i := rt.GetRestriction().(offsetrange.Restriction).Start + for rt.TryClaim(i) { + i++ + } + emit(elm) +} + // WindowBlockingSdf is a very basic SDF that blocks execution once, in one // window and at one position within the restriction. type WindowBlockingSdf struct { diff --git a/sdks/go/pkg/beam/core/runtime/exec/sideinput.go b/sdks/go/pkg/beam/core/runtime/exec/sideinput.go index 08cc279b40df..c0a9b6f4c6a0 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/sideinput.go +++ b/sdks/go/pkg/beam/core/runtime/exec/sideinput.go @@ -20,8 +20,8 @@ import ( "fmt" "io" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" ) // This file contains support for side input. 
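The VetEmptyInitialSplitSdf fixture and the "Empty" test case above exercise the relaxed initial-split behavior: SplitRestriction may now return zero restrictions, and the element simply yields no sized outputs rather than failing the bundle (the old "initial splitting returned 0 restrictions" error is removed). A small sketch of that idea under assumed, illustrative names (restriction and splitRestriction are not SDK types):

package main

import "fmt"

type restriction struct{ start, end int64 }

// splitRestriction keeps the restriction when there is work to do and
// returns no restrictions (dropping the element) when it is empty.
func splitRestriction(rest restriction) []restriction {
	if rest.end <= rest.start {
		return nil // zero restrictions: the element contributes no work
	}
	return []restriction{rest}
}

func main() {
	fmt.Println(len(splitRestriction(restriction{0, 0}))) // 0 outputs, no error
	fmt.Println(len(splitRestriction(restriction{0, 4}))) // 1 output
}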
diff --git a/sdks/go/pkg/beam/core/runtime/exec/translate.go b/sdks/go/pkg/beam/core/runtime/exec/translate.go index fbbdab3fe104..61809dff1fad 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/translate.go +++ b/sdks/go/pkg/beam/core/runtime/exec/translate.go @@ -21,17 +21,17 @@ import ( "strconv" "strings" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx" - v1pb "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx/v1" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/protox" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/stringx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - fnpb "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx" + v1pb "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx/v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/protox" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/stringx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + fnpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/fnexecution_v1" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" "github.com/golang/protobuf/proto" "github.com/golang/protobuf/ptypes" ) @@ -82,12 +82,18 @@ func UnmarshalPlan(desc *fnpb.ProcessBundleDescriptor) (*Plan, error) { for key, pid := range transform.GetOutputs() { u.SID = StreamID{PtransformID: id, Port: port} u.Name = key - u.outputPID = pid u.Out, err = b.makePCollection(pid) if err != nil { return nil, err } + // Elide the PCollection Node for DataSources + // DataSources can get byte samples directly, and can handle CoGBKs. + // Copying the PCollection here is fine, as the PCollection will never + // have used it's mutex yet. + u.PCol = *u.Out.(*PCollection) + u.Out = u.PCol.Out + b.units = b.units[:len(b.units)-1] } b.units = append(b.units, u) @@ -103,8 +109,8 @@ type builder struct { succ map[string][]linkID // PCollectionID -> []linkID windowing map[string]*window.WindowingStrategy - nodes map[string]Node // PCollectionID -> Node (cache) - links map[linkID]Node // linkID -> Node (cache) + nodes map[string]*PCollection // PCollectionID -> Node (cache) + links map[linkID]Node // linkID -> Node (cache) units []Unit // result idgen *GenID @@ -146,7 +152,7 @@ func newBuilder(desc *fnpb.ProcessBundleDescriptor) (*builder, error) { succ: succ, windowing: make(map[string]*window.WindowingStrategy), - nodes: make(map[string]Node), + nodes: make(map[string]*PCollection), links: make(map[linkID]Node), idgen: &GenID{}, @@ -230,7 +236,22 @@ func (b *builder) makePCollections(out []string) ([]Node, error) { if err != nil { return nil, err } - ret = append(ret, n) + // This is the cleanest place to do this check and filtering, + // since DataSinks don't know their inputs, due to the construction + // call stack. + // A Source->Sink is both uncommon and inefficent, with the Source eliding the + // collection anyway. 
+ // TODO[BEAM-6374): Properly handle the multiplex and flatten cases. + // Right now we just stop datasink collection. + switch out := n.Out.(type) { + case *DataSink: + // We don't remove the PCollection from units here, since we + // want to ensure it's included in snapshots. + out.PCol = n + ret = append(ret, out) + default: + ret = append(ret, n) + } } return ret, nil } @@ -263,7 +284,7 @@ func (b *builder) makeCoderForPCollection(id string) (*coder.Coder, *coder.Windo return c, wc, nil } -func (b *builder) makePCollection(id string) (Node, error) { +func (b *builder) makePCollection(id string) (*PCollection, error) { if n, exists := b.nodes[id]; exists { return n, nil } @@ -278,8 +299,11 @@ func (b *builder) makePCollection(id string) (Node, error) { u = &Discard{UID: b.idgen.New()} case 1: - return b.makeLink(id, list[0]) - + out, err := b.makeLink(id, list[0]) + if err != nil { + return nil, err + } + return b.newPCollectionNode(id, out) default: // Multiplex. @@ -296,7 +320,16 @@ func (b *builder) makePCollection(id string) (Node, error) { b.units = append(b.units, u) u = &Flatten{UID: b.idgen.New(), N: count, Out: u} } + b.units = append(b.units, u) + return b.newPCollectionNode(id, u) +} +func (b *builder) newPCollectionNode(id string, out Node) (*PCollection, error) { + ec, _, err := b.makeCoderForPCollection(id) + if err != nil { + return nil, err + } + u := &PCollection{UID: b.idgen.New(), Out: out, PColID: id, Coder: ec, Seed: rand.Int63()} b.nodes[id] = u b.units = append(b.units, u) return u, nil @@ -440,16 +473,23 @@ func (b *builder) makeLink(from string, id linkID) (Node, error) { if len(inputs) != 1 { return nil, errors.Errorf("unexpected sideinput to combine: got %d, want 1", len(inputs)) } - ec, _, err := b.makeCoderForPCollection(inputs[0]) + ec, wc, err := b.makeCoderForPCollection(inputs[0]) if err != nil { return nil, err } if !coder.IsKV(ec) { return nil, errors.Errorf("unexpected non-KV coder PCollection input to combine: %v", ec) } - u = &LiftedCombine{Combine: cn, KeyCoder: ec.Components[0]} + u = &LiftedCombine{Combine: cn, KeyCoder: ec.Components[0], WindowCoder: wc} case urnPerKeyCombineMerge: - u = &MergeAccumulators{Combine: cn} + ma := &MergeAccumulators{Combine: cn} + if eo, ok := ma.Out.(*PCollection).Out.(*ExtractOutput); ok { + // Strip PCollections from between MergeAccumulators and ExtractOutputs + // as it's a synthetic PCollection. + b.units = b.units[:len(b.units)-1] + ma.Out = eo + } + u = ma case urnPerKeyCombineExtract: u = &ExtractOutput{Combine: cn} case urnPerKeyCombineConvert: @@ -472,7 +512,14 @@ func (b *builder) makeLink(from string, id linkID) (Node, error) { if !coder.IsKV(c) { return nil, errors.Errorf("unexpected inject coder: %v", c) } - u = &Inject{UID: b.idgen.New(), N: (int)(tp.Inject.N), ValueEncoder: MakeElementEncoder(c.Components[1]), Out: out[0]} + valCoder := c.Components[1] + // JIRA BEAM-12438 - an extra LP coder can get added here, but isn't added + // on decode. Strip them until we get a better fix. + if valCoder.Kind == coder.LP { + // strip unexpected length prefix coder. 
+ valCoder = valCoder.Components[0] + } + u = &Inject{UID: b.idgen.New(), N: (int)(tp.Inject.N), ValueEncoder: MakeElementEncoder(valCoder), Out: out[0]} case graphx.URNExpand: var pid string @@ -491,7 +538,11 @@ func (b *builder) makeLink(from string, id linkID) (Node, error) { for _, dc := range c.Components[1:] { decoders = append(decoders, MakeElementDecoder(dc)) } - u = &Expand{UID: b.idgen.New(), ValueDecoders: decoders, Out: out[0]} + // Strip PCollections from Expand nodes, as CoGBK metrics are handled by + // the DataSource that preceeds them. + trueOut := out[0].(*PCollection).Out + b.units = b.units[:len(b.units)-1] + u = &Expand{UID: b.idgen.New(), ValueDecoders: decoders, Out: trueOut} case graphx.URNReshuffleInput: c, w, err := b.makeCoderForPCollection(from) @@ -514,7 +565,7 @@ func (b *builder) makeLink(from string, id linkID) (Node, error) { u = &ReshuffleOutput{UID: b.idgen.New(), Coder: coder.NewW(c, w), Out: out[0]} default: - return nil, errors.Errorf("unexpected payload: %v", tp) + return nil, errors.Errorf("unexpected payload: %v", &tp) } case graphx.URNWindow: @@ -617,7 +668,7 @@ func inputIdToIndex(id string) (int, error) { return strconv.Atoi(strings.TrimPrefix(id, "i")) } -// inputIdToIndex converts an index into a local input ID for a transform. Use +// indexToInputId converts an index into a local input ID for a transform. Use // this to avoid relying on format details for input IDs. func indexToInputId(i int) string { return "i" + strconv.Itoa(i) diff --git a/sdks/go/pkg/beam/core/runtime/exec/unit_test.go b/sdks/go/pkg/beam/core/runtime/exec/unit_test.go index 3bdcd8e4cbce..76ff2ded7909 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/unit_test.go +++ b/sdks/go/pkg/beam/core/runtime/exec/unit_test.go @@ -19,8 +19,8 @@ import ( "context" "io" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // CaptureNode is a test Node that captures all elements for verification. It also diff --git a/sdks/go/pkg/beam/core/runtime/exec/util.go b/sdks/go/pkg/beam/core/runtime/exec/util.go index dc0608bb25f3..2996b6dd159b 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/util.go +++ b/sdks/go/pkg/beam/core/runtime/exec/util.go @@ -17,9 +17,10 @@ package exec import ( "context" + "fmt" "runtime/debug" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // GenID is a simple UnitID generator. @@ -33,13 +34,29 @@ func (g *GenID) New() UnitID { return UnitID(g.last) } +type doFnError struct { + doFn string + err error + uid UnitID + pid string +} + +func (e *doFnError) Error() string { + return fmt.Sprintf("DoFn[UID:%v, PID:%v, Name: %v] failed:\n%v", e.uid, e.pid, e.doFn, e.err) +} + // callNoPanic calls the given function and catches any panic. func callNoPanic(ctx context.Context, fn func(context.Context) error) (err error) { defer func() { if r := recover(); r != nil { - // Top level error is the panic itself, but also include the stack trace as the original error. - // Higher levels can then add appropriate context without getting pushed down by the stack trace. - err = errors.SetTopLevelMsgf(errors.Errorf("panic: %v %s", r, debug.Stack()), "panic: %v", r) + // Check if the panic value is from a failed DoFn, and return it without a panic trace. 
+ if e, ok := r.(*doFnError); ok { + err = e + } else { + // Top level error is the panic itself, but also include the stack trace as the original error. + // Higher levels can then add appropriate context without getting pushed down by the stack trace. + err = errors.SetTopLevelMsgf(errors.Errorf("panic: %v %s", r, debug.Stack()), "panic: %v", r) + } } }() return fn(ctx) diff --git a/sdks/go/pkg/beam/core/runtime/exec/util_test.go b/sdks/go/pkg/beam/core/runtime/exec/util_test.go new file mode 100644 index 000000000000..6e31d5b101ea --- /dev/null +++ b/sdks/go/pkg/beam/core/runtime/exec/util_test.go @@ -0,0 +1,67 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package exec + +import ( + "context" + "strings" + "testing" + + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/errorx" +) + +// testSimpleError tests for a simple case that doesn't panic +func TestCallNoPanic_simple(t *testing.T) { + ctx := context.Background() + want := errors.New("Simple error.") + got := callNoPanic(ctx, func(c context.Context) error { return errors.New("Simple error.") }) + + if got.Error() != want.Error() { + t.Errorf("callNoPanic() = %v, want %v", got, want) + } +} + +// testPanicError tests for the case in which a normal error is passed to panic, resulting in panic trace. +func TestCallNoPanic_panic(t *testing.T) { + ctx := context.Background() + got := callNoPanic(ctx, func(c context.Context) error { panic("Panic error") }) + if !strings.Contains(got.Error(), "panic:") { + t.Errorf("callNoPanic() didn't panic, got = %v", got) + } +} + +// testWrapPanicError tests for the case in which error is passed to panic from +// DoFn, resulting in formatted error message for DoFn. 
+func TestCallNoPanic_wrappedPanic(t *testing.T) { + ctx := context.Background() + errs := errors.New("SumFn error") + parDoError := &doFnError{ + doFn: "sumFn", + err: errs, + uid: 1, + pid: "Plan ID", + } + want := "DoFn[<1>;] returned error:" + var err errorx.GuardedError + err.TrySetError(parDoError) + + got := callNoPanic(ctx, func(c context.Context) error { panic(parDoError) }) + + if strings.Contains(got.Error(), "panic:") { + t.Errorf("callNoPanic() did not filter panic, want %v, got %v", want, got) + } +} diff --git a/sdks/go/pkg/beam/core/runtime/exec/window.go b/sdks/go/pkg/beam/core/runtime/exec/window.go index cc7accc48569..c6a6546bddd3 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/window.go +++ b/sdks/go/pkg/beam/core/runtime/exec/window.go @@ -20,9 +20,9 @@ import ( "fmt" "time" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/mtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" ) // WindowInto places each element in one or more windows. @@ -50,6 +50,7 @@ func (w *WindowInto) ProcessElement(ctx context.Context, elm *FullValue, values Timestamp: elm.Timestamp, Elm: elm.Elm, Elm2: elm.Elm2, + Pane: elm.Pane, } return w.Out.ProcessElement(ctx, windowed, values...) } @@ -73,6 +74,11 @@ func assignWindows(wfn *window.Fn, ts typex.EventTime) []typex.Window { ret = append(ret, window.IntervalWindow{Start: start, End: start.Add(wfn.Size)}) } return ret + case window.Sessions: + // Assign each element into a window from its timestamp until Gap in the + // future. Overlapping windows (representing elements within Gap of + // each other) will be merged. + return []typex.Window{window.IntervalWindow{Start: ts, End: ts.Add(wfn.Gap)}} default: panic(fmt.Sprintf("Unexpected window fn: %v", wfn)) diff --git a/sdks/go/pkg/beam/core/runtime/exec/window_test.go b/sdks/go/pkg/beam/core/runtime/exec/window_test.go index f2eaf1ecb353..d8770cc23f4d 100644 --- a/sdks/go/pkg/beam/core/runtime/exec/window_test.go +++ b/sdks/go/pkg/beam/core/runtime/exec/window_test.go @@ -19,9 +19,9 @@ import ( "testing" "time" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/mtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" ) // TestAssignWindow tests that each window fn assigns the diff --git a/sdks/go/pkg/beam/core/runtime/genx/genx.go b/sdks/go/pkg/beam/core/runtime/genx/genx.go index 017ab9d562b3..67209eaaf610 100644 --- a/sdks/go/pkg/beam/core/runtime/genx/genx.go +++ b/sdks/go/pkg/beam/core/runtime/genx/genx.go @@ -25,9 +25,9 @@ package genx import ( "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/funcx" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/funcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime" ) // RegisterDoFn is a convenience function for registering DoFns. 
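The window.go hunk above adds session window assignment: each element is placed in an interval window that starts at its event timestamp and ends Gap later, with overlapping windows merged downstream. A minimal sketch of the assignment rule alone, using plain int64 milliseconds instead of the SDK's mtime.Time and window.IntervalWindow (sessionInterval is an illustrative name, not an SDK function):

package main

import (
	"fmt"
	"time"
)

type interval struct{ start, end int64 } // milliseconds since epoch

// sessionInterval assigns an element at tsMillis to the half-open window
// [tsMillis, tsMillis+gap); elements within gap of each other produce
// overlapping windows, which a later merge step combines into one session.
func sessionInterval(tsMillis int64, gap time.Duration) interval {
	return interval{start: tsMillis, end: tsMillis + gap.Milliseconds()}
}

func main() {
	w := sessionInterval(1_000, 30*time.Second)
	fmt.Printf("[%d, %d)\n", w.start, w.end) // [1000, 31000)
}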
diff --git a/sdks/go/pkg/beam/core/runtime/genx/genx_test.go b/sdks/go/pkg/beam/core/runtime/genx/genx_test.go index 82e6c2f10113..21527ede9970 100644 --- a/sdks/go/pkg/beam/core/runtime/genx/genx_test.go +++ b/sdks/go/pkg/beam/core/runtime/genx/genx_test.go @@ -20,9 +20,9 @@ import ( "reflect" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/sdf" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/sdf" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" "github.com/google/go-cmp/cmp" "github.com/google/go-cmp/cmp/cmpopts" ) diff --git a/sdks/go/pkg/beam/core/runtime/graphx/coder.go b/sdks/go/pkg/beam/core/runtime/graphx/coder.go index 2c4bd43de978..9fd7d8aab41c 100644 --- a/sdks/go/pkg/beam/core/runtime/graphx/coder.go +++ b/sdks/go/pkg/beam/core/runtime/graphx/coder.go @@ -20,13 +20,13 @@ import ( "fmt" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx/schema" - v1pb "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx/v1" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/protox" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx/schema" + v1pb "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx/v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/protox" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" "github.com/golang/protobuf/proto" ) @@ -70,7 +70,7 @@ func knownStandardCoders() []string { urnWindowedValueCoder, urnGlobalWindow, urnIntervalWindow, - // TODO(BEAM-9615): Add urnRowCoder once finalized. + urnRowCoder, // TODO(BEAM-10660): Add urnTimerCoder once finalized. } } @@ -263,6 +263,11 @@ func (b *CoderUnmarshaller) makeCoder(id string, c *pipepb.Coder) (*coder.Coder, return nil, err } + // No payload means this coder was length prefixed by the runner + // but is likely self describing - AKA a beam coder. + if len(sub.GetSpec().GetPayload()) == 0 { + return b.makeCoder(components[0], sub) + } // TODO(lostluck) 2018/10/17: Make this strict again, once dataflow can use // the portable pipeline model directly (BEAM-2885) switch u := sub.GetSpec().GetUrn(); u { @@ -364,11 +369,11 @@ func (b *CoderUnmarshaller) makeCoder(id string, c *pipepb.Coder) (*coder.Coder, } return coder.NewR(typex.New(t)), nil - // Special handling for window coders so they can be treated as - // a general coder. Generally window coders are not used outside of - // specific contexts, but this enables improved testing. - // Window types are not permitted to be fulltypes, so - // we use assignably equivalent anonymous struct types. + // Special handling for window coders so they can be treated as + // a general coder. Generally window coders are not used outside of + // specific contexts, but this enables improved testing. + // Window types are not permitted to be fulltypes, so + // we use assignably equivalent anonymous struct types. 
case urnIntervalWindow: w, err := b.WindowCoder(id) if err != nil { diff --git a/sdks/go/pkg/beam/core/runtime/graphx/coder_test.go b/sdks/go/pkg/beam/core/runtime/graphx/coder_test.go index c781e30d1dda..718311b3a68f 100644 --- a/sdks/go/pkg/beam/core/runtime/graphx/coder_test.go +++ b/sdks/go/pkg/beam/core/runtime/graphx/coder_test.go @@ -19,12 +19,12 @@ import ( "reflect" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx/schema" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx/schema" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) func init() { @@ -99,6 +99,10 @@ func TestMarshalUnmarshalCoders(t *testing.T) { "CoGBK", coder.NewCoGBK([]*coder.Coder{foo, bar, baz}), }, + { + name: "R[graphx.registeredNamedTypeForTest]", + c: coder.NewR(typex.New(reflect.TypeOf((*registeredNamedTypeForTest)(nil)).Elem())), + }, { name: "R[*graphx.registeredNamedTypeForTest]", c: coder.NewR(typex.New(reflect.TypeOf((*registeredNamedTypeForTest)(nil)))), diff --git a/sdks/go/pkg/beam/core/runtime/graphx/cogbk.go b/sdks/go/pkg/beam/core/runtime/graphx/cogbk.go index ed5587e51865..4c7217326eaa 100644 --- a/sdks/go/pkg/beam/core/runtime/graphx/cogbk.go +++ b/sdks/go/pkg/beam/core/runtime/graphx/cogbk.go @@ -16,12 +16,12 @@ package graphx import ( - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/coderx" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/coderx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // CoGBK support diff --git a/sdks/go/pkg/beam/core/runtime/graphx/dataflow.go b/sdks/go/pkg/beam/core/runtime/graphx/dataflow.go index 3f68e634864c..77aa6ca46a57 100644 --- a/sdks/go/pkg/beam/core/runtime/graphx/dataflow.go +++ b/sdks/go/pkg/beam/core/runtime/graphx/dataflow.go @@ -16,13 +16,13 @@ package graphx import ( - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx/schema" - v1pb "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx/v1" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/protox" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx/schema" + v1pb 
"github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx/v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/protox" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" ) // TODO(herohde) 7/17/2018: move CoderRef to dataflowlib once Dataflow diff --git a/sdks/go/pkg/beam/core/runtime/graphx/schema/logicaltypes.go b/sdks/go/pkg/beam/core/runtime/graphx/schema/logicaltypes.go index e3d830201dbe..42bda4f79b6f 100644 --- a/sdks/go/pkg/beam/core/runtime/graphx/schema/logicaltypes.go +++ b/sdks/go/pkg/beam/core/runtime/graphx/schema/logicaltypes.go @@ -19,103 +19,156 @@ import ( "fmt" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" ) var ( - // Maps logical type identifiers to their reflect.Type and the schema representation. - // the type identifier is the reflect.Type name, and included in the proto as well. - // We don't treat all types as "logical" types. - // ... why don't we treat all types as Logical types? - logicalTypes = map[string]LogicalType{} - logicalIdentifiers = map[reflect.Type]string{} + defaultRegistry = NewRegistry() ) -// LogicalType is an interface between custom Go types, and schema storage types. -// -// A LogicalType is a way to define a new type that can be stored in a schema field -// using a known underlying type for storage. The storage type must be comprised of -// known schema field types, or pre-registered LogicalTypes. LogicalTypes may not be -// mutually recursive at any level of indirection. -type LogicalType interface { - ID() string - ArgumentType() reflect.Type - ArgumentValue() reflect.Value - GoType() reflect.Type - StorageType() reflect.Type - // ToStorageType converts an instance of the Go type to the schema storage type. - ToStorageType(input reflect.Value) reflect.Value - // ToGoType converts an instance of the given schema storage type to the Go type. - ToGoType(base reflect.Value) reflect.Value -} - // RegisterLogicalType registers a logical type with the beam schema system. // A logical type is a type that has distinct representations and storage. // // RegisterLogicalType will panic if the storage type of the LogicalType // instance is not a valid storage type. func RegisterLogicalType(lt LogicalType) { + defaultRegistry.RegisterLogicalType(lt) +} + +// RegisterLogicalTypeProvider allows registration of providers for interface types. +func RegisterLogicalTypeProvider(rt reflect.Type, ltp LogicalTypeProvider) { + defaultRegistry.RegisterLogicalTypeProvider(rt, ltp) +} + +// LogicalTypeProvider produces a logical type for a given Go type. +// +// If unable to produce a logical type, it instead produces an error. +// Typically used to handle mapping LogicalTypes from interface types +// to a concrete implementation. The provider will be passed a +// type, and will produce an appropriate LogicalType for it. +type LogicalTypeProvider = func(reflect.Type) (reflect.Type, error) + +// Registry retains mappings from go types to Schemas and LogicalTypes. 
+type Registry struct { + typeToSchema map[reflect.Type]*pipepb.Schema + idToType map[string]reflect.Type + syntheticToUser map[reflect.Type]reflect.Type + + logicalTypeProviders map[reflect.Type]LogicalTypeProvider + logicalTypeInterfaces []reflect.Type + + // Maps logical type identifiers to their reflect.Type and the schema representation. + // the type identifier is the reflect.Type name, and included in the proto as well. + // We don't treat all types as "logical" types. + // ... why don't we treat all types as Logical types? + logicalTypes map[string]LogicalType + logicalTypeIdentifiers map[reflect.Type]string + + // toReconcile contains a list of types that have been registered + // but not yet processed. Registration actually happens on first + // call to ToType or FromType or once Initialize is called on beam.Init. + toReconcile []reflect.Type +} + +// NewRegistry creates an initialized LogicalTypeRegistry. +func NewRegistry() *Registry { + return &Registry{ + typeToSchema: map[reflect.Type]*pipepb.Schema{}, + idToType: map[string]reflect.Type{}, + syntheticToUser: map[reflect.Type]reflect.Type{}, + + logicalTypes: map[string]LogicalType{}, + logicalTypeIdentifiers: map[reflect.Type]string{}, + logicalTypeProviders: map[reflect.Type]LogicalTypeProvider{}, + } +} + +// RegisterLogicalType registers a single logical type. +func (r *Registry) RegisterLogicalType(lt LogicalType) { // Validates that the storage type has known handling. st := lt.StorageType() - _, err := reflectTypeToFieldType(st) + _, err := r.reflectTypeToFieldType(st) if err != nil { panic(fmt.Sprintf("LogicalType[%v] has an invalid StorageType %v: %v", lt.ID(), st, err)) } - logicalIdentifiers[lt.GoType()] = lt.ID() - logicalTypes[lt.ID()] = lt + if len(lt.ID()) == 0 { + panic(fmt.Sprintf("invalid logical type, bad id: %v -> %v", lt.GoType(), lt.StorageType())) + } + // TODO add duplication checks. + r.logicalTypeIdentifiers[lt.GoType()] = lt.ID() + r.logicalTypes[lt.ID()] = lt } -// convertibleLogicalType uses reflect.Value.Convert to change the Go -// type to the schema storage type and back again. Does not support -// type arguments. +// RegisterLogicalTypeProvider allows registration of providers for interface types. +func (r *Registry) RegisterLogicalTypeProvider(rt reflect.Type, ltp LogicalTypeProvider) { + if rt.Kind() != reflect.Interface { + panic(fmt.Sprintf("Logical Types must be registered with interface types. %v is not an interface type.", rt)) + } + if rt.NumMethod() == 0 { + panic(fmt.Sprintf("Logical Types may not be registered with empty interface types. %v has no methods.", rt)) + } + r.logicalTypeProviders[rt] = ltp + r.logicalTypeInterfaces = append(r.logicalTypeInterfaces, rt) +} + +// LogicalType is a mapping between custom Go types, and their schema equivalent storage types. -// -// gotT and storageT must be convertible to each other. -type convertibleLogicalType struct { - identifier string - goT, storageT reflect.Type +// +// A LogicalType is a way to define a type that can be stored in a schema field +// using a known underlying type for storage. The storage type must be comprised of +// known schema field types, or pre-registered LogicalTypes. +// +// LogicalTypes may not be mutually recursive at any level of indirection. +// LogicalTypes must map from a Go type to a single Schema Equivalent storage type. 
+type LogicalType struct { + identifier string + goT, storageT, argT reflect.Type + argV reflect.Value + toStorage, toGo func(value reflect.Value) reflect.Value } -func (l *convertibleLogicalType) ID() string { +// ID is a unique identifier for the logical type. +func (l *LogicalType) ID() string { return l.identifier } -func (l *convertibleLogicalType) ArgumentType() reflect.Type { - return nil +// ArgumentType returns the Go type of the argument for parameterized types. +func (l *LogicalType) ArgumentType() reflect.Type { + return l.argT } -func (l *convertibleLogicalType) ArgumentValue() reflect.Value { - return reflect.Value{} +// ArgumentValue returns the Go value of the argument for parameterized types. +func (l *LogicalType) ArgumentValue() reflect.Value { + return l.argV } -func (l *convertibleLogicalType) GoType() reflect.Type { +// GoType returns the Go type of the logical type. This is the type in a Go +// field. +func (l *LogicalType) GoType() reflect.Type { return l.goT } -func (l *convertibleLogicalType) StorageType() reflect.Type { +// StorageType is the schema equivalent representation of this logical type. +// The storage type is how the logical type is encoded in bytes, and if the +// logical type is unknown, can be decoded into a value of this type instead. +func (l *LogicalType) StorageType() reflect.Type { return l.storageT } -func (l *convertibleLogicalType) ToStorageType(value reflect.Value) reflect.Value { - return value.Convert(l.storageT) -} - -func (l *convertibleLogicalType) ToGoType(storage reflect.Value) reflect.Value { - return storage.Convert(l.goT) +// ToLogicalType creates a LogicalType, indicating that there's a conversion from one to the other. +func ToLogicalType(identifier string, goType, storageType reflect.Type) LogicalType { + return LogicalType{identifier: identifier, goT: goType, storageT: storageType} } -// NewConvertibleLogicalType creates a LogicalType where the go type and storage representation -// can be converted between each other with reflect.Value.Convert. 
-func NewConvertibleLogicalType(identifier string, goType, storageType reflect.Type) LogicalType { - if !(goType.ConvertibleTo(storageType) && storageType.ConvertibleTo(goType)) { - panic(fmt.Sprintf("Can't create ConvertibleTo LogicalType: %v and %v are not convertable to each other", goType, storageType)) - } - return &convertibleLogicalType{identifier: identifier, goT: goType, storageT: storageType} +func preRegLogicalTypes(r *Registry) { + r.RegisterLogicalType(ToLogicalType("int", reflectx.Int, reflectx.Int64)) + r.RegisterLogicalType(ToLogicalType("int8", reflectx.Int8, reflectx.Int64)) + r.RegisterLogicalType(ToLogicalType("uint16", reflectx.Uint16, reflectx.Int16)) + r.RegisterLogicalType(ToLogicalType("uint32", reflectx.Uint32, reflectx.Int32)) + r.RegisterLogicalType(ToLogicalType("uint64", reflectx.Uint64, reflectx.Int64)) + r.RegisterLogicalType(ToLogicalType("uint", reflectx.Uint, reflectx.Int64)) } func init() { - RegisterLogicalType(NewConvertibleLogicalType("int", reflectx.Int, reflectx.Int64)) - RegisterLogicalType(NewConvertibleLogicalType("uint32", reflectx.Uint32, reflectx.Int32)) - RegisterLogicalType(NewConvertibleLogicalType("uint64", reflectx.Uint64, reflectx.Int64)) - RegisterLogicalType(NewConvertibleLogicalType("uint16", reflectx.Uint16, reflectx.Int16)) - RegisterLogicalType(NewConvertibleLogicalType("int8", reflectx.Int8, reflectx.Uint8)) + preRegLogicalTypes(defaultRegistry) } diff --git a/sdks/go/pkg/beam/core/runtime/graphx/schema/schema.go b/sdks/go/pkg/beam/core/runtime/graphx/schema/schema.go index 558505222834..7ca516b56944 100644 --- a/sdks/go/pkg/beam/core/runtime/graphx/schema/schema.go +++ b/sdks/go/pkg/beam/core/runtime/graphx/schema/schema.go @@ -26,164 +26,524 @@ package schema import ( + "bytes" "fmt" + "hash/fnv" "reflect" - "strconv" "strings" - "sync/atomic" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/sdf" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" + "github.com/golang/protobuf/proto" + "github.com/google/uuid" ) -var lastShortID int64 +// Initialize registered schemas. For use by the beam package at beam.Init time. +func Initialize() { + if err := defaultRegistry.reconcileRegistrations(); err != nil { + panic(err) + } +} -// TODO(BEAM-9615): Replace with UUIDs. -func getNextID() string { - id := atomic.AddInt64(&lastShortID, 1) - // No reason not to use the smallest string short ids possible. - return strconv.FormatInt(id, 36) +// FromType returns a Beam Schema of the passed in type. +// Returns an error if the type cannot be converted to a Schema. +func FromType(ot reflect.Type) (*pipepb.Schema, error) { + return defaultRegistry.FromType(ot) } -var ( - // Maps types to schemas for reuse, caching the UUIDs. - typeToSchema = map[reflect.Type]*pipepb.Schema{} - // Maps synthetic types to user types. Keys must be generated from a schema. - // This works around using the generated type assertion shims failing to type assert. - // Type assertion isn't assignability, which is closer to how the reflection based - // shims operate. 
- // User types are mapped to themselves to also signify they've been registered. - syntheticToUser = map[reflect.Type]reflect.Type{} -) +// ToType returns a Go type of the passed in Schema. +// Types returned by ToType are always of Struct kind. +// Returns an error if the Schema cannot be converted to a type. +func ToType(s *pipepb.Schema) (reflect.Type, error) { + return defaultRegistry.ToType(s) +} // Registered returns whether the given type has been registered with -// the schema package. +// the default schema registry. func Registered(ut reflect.Type) bool { - _, ok := syntheticToUser[ut] - return ok + return defaultRegistry.Registered(ut) } // RegisterType converts the type to it's schema representation, and converts it back to // a synthetic type so we can map from the synthetic type back to the user type. // Recursively registers other named struct types in any component parts. func RegisterType(ut reflect.Type) { - registerType(ut, map[reflect.Type]struct{}{}) + defaultRegistry.RegisterType(ut) +} + +// getUUID generates a UUID using the string form of the type name. +func getUUID(ut reflect.Type) string { + // String produces non-empty output for pointer and slice types. + typename := ut.String() + hasher := fnv.New128a() + if n, err := hasher.Write([]byte(typename)); err != nil || n != len(typename) { + panic(fmt.Sprintf("unable to generate schema uuid for %s, wrote out %d bytes, want %d: err %v", typename, n, len(typename), err)) + } + id, err := uuid.NewRandomFromReader(bytes.NewBuffer(hasher.Sum(nil))) + if err != nil { + panic(fmt.Sprintf("unable to generate schema uuid for type %s: %v", typename, err)) + } + return id.String() +} + +// Registered returns whether the given type has been registered with +// the schema package. +func (r *Registry) Registered(ut reflect.Type) bool { + _, ok := r.syntheticToUser[ut] + return ok +} + +var sdfRtrackerType = reflect.TypeOf((*sdf.RTracker)(nil)).Elem() + +// RegisterType converts the type to its schema representation, and converts it back to +// a synthetic type so we can map from the synthetic type back to the user type. +// Recursively registers other named struct types in any component parts. +func (r *Registry) RegisterType(ut reflect.Type) { + r.toReconcile = append(r.toReconcile, ut) +} + +// reconcileRegistrations actually finishes the registration process. 
+func (r *Registry) reconcileRegistrations() (deferedErr error) { + var ut reflect.Type + defer func() { + if r := recover(); r != nil { + deferedErr = errors.Errorf("panicked: %v", r) + deferedErr = errors.WithContextf(deferedErr, "reconciling schema registration for type %v", ut) + } + }() + for _, ut := range r.toReconcile { + check := func(ut reflect.Type) bool { + return coder.LookupCustomCoder(ut) != nil + } + if check(ut) || check(reflect.PtrTo(ut)) { + continue + } + if err := r.registerType(ut, map[reflect.Type]struct{}{}); err != nil { + return errors.Wrapf(err, "error reconciling type %v", ut) + } + } + r.toReconcile = nil + return nil +} + +func implements(ut, ifacet reflect.Type) bool { + if ut.Implements(ifacet) { + return true + } + switch ut.Kind() { + case reflect.Ptr: + t := ut.Elem() + if t.Implements(ifacet) { + return true + } + return implements(t, ifacet) + case reflect.Struct: + for i := 0; i < ut.NumField(); i++ { + sf := ut.Field(i) + if sf.Anonymous { + impls := implements(sf.Type, ifacet) + if impls { + return true + } + } + } + } + return false } -func registerType(ut reflect.Type, seen map[reflect.Type]struct{}) { - if _, ok := syntheticToUser[ut]; ok { - return +func ignoreField(t reflect.Type, sf reflect.StructField) (ignore, isAnon bool, err error) { + isUnexported := sf.PkgPath != "" + if sf.Anonymous { + ft := sf.Type + if ft.Kind() == reflect.Ptr { + // If a struct embeds a pointer to an unexported type, + // it is not possible to set a newly allocated value + // since the field is unexported. + // + // See https://golang.org/issue/21357 + // + // Since the values are created by the decoder reflectively, + // fail early here. + if isUnexported { + return false, false, errors.Errorf("cannot make schema for type %v as it has an embedded field of a pointer to an unexported type %v. See https://golang.org/issue/21357", t, ft.Elem()) + } + ft = ft.Elem() + } + if isUnexported && ft.Kind() != reflect.Struct { + // Ignore embedded fields of unexported non-struct types. + return true, true, nil + } + // Do not ignore embedded fields of unexported struct types + // since they may have exported fields. + return false, true, nil + } + if isUnexported { + // Schemas can't handle unexported fields at all. + return true, false, nil + } + if implements(sf.Type, sdfRtrackerType) { + // ignoring sdf.Rtracker interface + return true, false, nil + } + + return false, false, nil +} + +func (r *Registry) registerType(ut reflect.Type, seen map[reflect.Type]struct{}) error { + // Ignore rtrackers. + if implements(ut, sdfRtrackerType) { + return nil + } + if _, ok := r.syntheticToUser[ut]; ok { + return nil } if _, ok := seen[ut]; ok { - return // already processed in this pass, don't reprocess. + return nil // already processed in this pass, don't reprocess. } seen[ut] = struct{}{} // Lets do some recursion to register fundamental type parts. 
t := ut + if lID, ok := r.logicalTypeIdentifiers[t]; ok { + lt := r.logicalTypes[lID] + r.addToMaps(lt.StorageType(), t) + return nil + } + + for _, lti := range r.logicalTypeInterfaces { + if !t.Implements(lti) { + continue + } + p := r.logicalTypeProviders[lti] + st, err := p(t) + if err != nil { + return errors.Wrapf(err, "unable to convert LogicalType[%v] using provider for %v", t, lti) + } + if st == nil { + continue + } + r.RegisterLogicalType(ToLogicalType(t.String(), t, st)) + r.addToMaps(st, t) + return nil + } + switch t.Kind() { case reflect.Map: - registerType(t.Key(), seen) + if err := r.registerType(t.Key(), seen); err != nil { + return err + } fallthrough case reflect.Array, reflect.Slice, reflect.Ptr: - registerType(t.Elem(), seen) - return + if err := r.registerType(t.Elem(), seen); err != nil { + return errors.Wrapf(err, "type is of kind %v", t.Kind()) + } + return nil + case reflect.Interface, reflect.Func, reflect.Chan, reflect.Invalid, reflect.UnsafePointer, reflect.Uintptr: + // Ignore these, as they can't be serialized. + return nil + case reflect.Complex64, reflect.Complex128: + // TODO(BEAM-9615): Support complex number types. + return nil case reflect.Struct: // What we expect here. default: - return + rt, ok := reflectKindToTypeMap[t.Kind()] + if !ok { + // Kind is not listed, meaning it's unhandled somehow, which means either the map + // is missing an entry, or the switch cases above are missing one. + return errors.Errorf("unlisted kind %v for type %v reached.", t.Kind(), t) + } + if t != rt { + // It's only a logical type if it's not a built-in primitive type, which is returned by the map. + r.RegisterLogicalType(ToLogicalType(t.String(), t, rt)) + } + return nil } - runtime.RegisterType(ut) for i := 0; i < t.NumField(); i++ { sf := ut.Field(i) - registerType(sf.Type, seen) + ignore, _, err := ignoreField(t, sf) + if err != nil { + return err + } + if ignore { + continue + } + if err := r.registerType(sf.Type, seen); err != nil { + return errors.Wrapf(err, "registering type for field %v in %v", sf.Name, ut) + } } - schm, err := FromType(ut) + schm, err := r.fromType(ut) if err != nil { - panic(errors.WithContextf(err, "converting %v to schema", ut)) + return errors.WithContextf(err, "converting %v to schema", ut) } - synth, err := ToType(schm) + synth, err := r.toType(schm) if err != nil { - panic(errors.WithContextf(err, "converting %v's back to a synthetic type", ut)) + return errors.WithContextf(err, "converting %v's back to a synthetic type", ut) } + + r.addToMaps(synth, ut) + return nil +} + +func (r *Registry) addToMaps(synth, ut reflect.Type) { synth = reflectx.SkipPtr(synth) ut = reflectx.SkipPtr(ut) - syntheticToUser[synth] = ut - syntheticToUser[reflect.PtrTo(synth)] = reflect.PtrTo(ut) - syntheticToUser[ut] = ut - syntheticToUser[reflect.PtrTo(ut)] = reflect.PtrTo(ut) + // empty types have no value for lookups. + if synth != emptyStructType { + r.syntheticToUser[synth] = ut + r.syntheticToUser[reflect.PtrTo(synth)] = reflect.PtrTo(ut) + } + if ut != emptyStructType { + r.syntheticToUser[ut] = ut + r.syntheticToUser[reflect.PtrTo(ut)] = reflect.PtrTo(ut) + } } // FromType returns a Beam Schema of the passed in type. // Returns an error if the type cannot be converted to a Schema. 
-func FromType(ot reflect.Type) (*pipepb.Schema, error) { +func (r *Registry) FromType(ot reflect.Type) (*pipepb.Schema, error) { + if err := r.reconcileRegistrations(); err != nil { + return nil, errors.Wrap(err, "reconciling for FromType") + } if reflectx.SkipPtr(ot).Kind() != reflect.Struct { return nil, errors.Errorf("cannot convert %v to schema. FromType only converts structs to schemas", ot) } - schm, err := structToSchema(ot) + return r.fromType(ot) +} + +func (r *Registry) logicalTypeToFieldType(t reflect.Type) (*pipepb.FieldType, string, error) { + // Check if a logical type was registered that matches this struct type directly + // and if so, extract the schema from it for use. + if lID, ok := r.logicalTypeIdentifiers[t]; ok { + lt := r.logicalTypes[lID] + ftype, err := r.reflectTypeToFieldType(lt.StorageType()) + if err != nil { + return nil, "", errors.Wrapf(err, "unable to convert LogicalType[%v]'s storage type %v for Go type of %v to a schema", lID, lt.StorageType(), lt.GoType()) + } + return ftype, lID, nil + } + for _, lti := range r.logicalTypeInterfaces { + if !t.Implements(lti) { + continue + } + p := r.logicalTypeProviders[lti] + st, err := p(t) + if err != nil { + return nil, "", errors.Wrapf(err, "unable to convert LogicalType[%v] using provider for %v schema field", t, lti) + } + if st == nil { + continue + } + ftype, err := r.reflectTypeToFieldType(st) + if err != nil { + return nil, "", errors.Wrapf(err, "unable to convert LogicalType[%v]'s storage type %v for Go type of %v to a schema", "interface", st, t) + } + return ftype, t.String(), nil + } + return nil, "", nil +} + +// fromType handles whether the initial type is a pointer or not WRT lookups against +// registered types and then delegates to structToSchema for most of the conversion. +// For determinism in schema IDs, regardless of whether the original type is a pointer or not, +// both variants are cached for later reuse. +func (r *Registry) fromType(ot reflect.Type) (*pipepb.Schema, error) { + if schm, ok := r.typeToSchema[ot]; ok { + return schm, nil + } + ftype, lID, err := r.logicalTypeToFieldType(ot) if err != nil { return nil, err } - if ot.Kind() == reflect.Ptr { - schm.Options = append(schm.Options, &pipepb.Option{ - Name: optGoNillable, - }) + if ftype != nil { + schm := ftype.GetRowType().GetSchema() + schm = proto.Clone(schm).(*pipepb.Schema) + if ot.Kind() == reflect.Ptr { + schm.Options = append(schm.Options, optGoNillable()) + } + if lID != "" { + schm.Options = append(schm.Options, logicalOption(lID)) + } + schm.Id = getUUID(ot) + r.typeToSchema[ot] = schm + r.idToType[schm.GetId()] = ot + return schm, nil } - return schm, nil + + t := reflectx.SkipPtr(ot) + + schm, err := r.structToSchema(t) + if err != nil { + return nil, err + } + // Cache the pointer type here with its own id. + pt := reflect.PtrTo(t) + schm = proto.Clone(schm).(*pipepb.Schema) + schm.Id = getUUID(pt) + schm.Options = append(schm.Options, optGoNillable()) + r.idToType[schm.GetId()] = pt + r.typeToSchema[pt] = schm + + // Return whatever the original type was. + return r.typeToSchema[ot], nil } // Schema Option urns. const ( // optGoNillable indicates that this top level schema should be returned as a pointer type. - optGoNillable = "beam:schema:go:nillable:v1" + optGoNillableUrn = "beam:schema:go:nillable:v1" + // optGoEmbedded indicates that this field is an embedded type. 
+ optGoEmbeddedUrn = "beam:schema:go:embedded_field:v1" + // optGoLogical indicates that this top level schema has a logical type equivalent that needs to be looked up. + // It has a value type of String representing the URN for the logical type to look up. + optGoLogicalUrn = "beam:schema:go:logical:v1" ) -// nillableFromOptions converts the passed in type to it's pointer version -// if the option is present. This permits go types to be pointers. -func nillableFromOptions(opts []*pipepb.Option, t reflect.Type) reflect.Type { - return checkOptions(opts, optGoNillable, reflect.PtrTo(t)) +func optGoNillable() *pipepb.Option { + return newToggleOption(optGoNillableUrn) +} + +func optGoEmbedded() *pipepb.Option { + return newToggleOption(optGoEmbeddedUrn) } -func checkOptions(opts []*pipepb.Option, urn string, rt reflect.Type) reflect.Type { +// newToggleOption constructs an Option whose presence is all +// that matters, rather than other configuration. The option +// is not set if the toggle isn't true, so the value is always +// true. +func newToggleOption(urn string) *pipepb.Option { + return &pipepb.Option{ + Name: urn, + Type: &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_AtomicType{ + AtomicType: pipepb.AtomicType_BOOLEAN, + }, + }, + Value: &pipepb.FieldValue{ + FieldValue: &pipepb.FieldValue_AtomicValue{ + AtomicValue: &pipepb.AtomicTypeValue{ + Value: &pipepb.AtomicTypeValue_Boolean{ + Boolean: true, + }, + }, + }, + }, + } +} + +func checkOptions(opts []*pipepb.Option, urn string) *pipepb.Option { for _, opt := range opts { if opt.GetName() == urn { - return rt + return opt } } return nil } -func structToSchema(ot reflect.Type) (*pipepb.Schema, error) { - if schm, ok := typeToSchema[ot]; ok { +// nillableFromOptions converts the passed in type to its pointer version +// if the option is present. This permits go types to be pointers. +func nillableFromOptions(opts []*pipepb.Option, t reflect.Type) reflect.Type { + if checkOptions(opts, optGoNillableUrn) != nil { + return reflect.PtrTo(t) + } + return nil +} + +var optGoLogicalType = &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_AtomicType{ + AtomicType: pipepb.AtomicType_STRING, + }, +} + +func logicalOption(lID string) *pipepb.Option { + return &pipepb.Option{ + Name: optGoLogicalUrn, + Type: optGoLogicalType, + Value: &pipepb.FieldValue{ + FieldValue: &pipepb.FieldValue_AtomicValue{ + AtomicValue: &pipepb.AtomicTypeValue{ + Value: &pipepb.AtomicTypeValue_String_{ + String_: lID, + }, + }, + }, + }, + } +} + +// fromLogicalOption returns the logical type id of this top +// level type if this schema has a logical equivalent. 
+func fromLogicalOption(opts []*pipepb.Option) (string, bool) { + o := checkOptions(opts, optGoLogicalUrn) + if o == nil { + return "", false + } + lID := o.GetValue().GetAtomicValue().GetString_() + return lID, true +} + +func (r *Registry) structToSchema(t reflect.Type) (*pipepb.Schema, error) { + if t.Kind() != reflect.Struct { + return nil, errors.Errorf("non struct type received in structToSchema: %v is kind %v", t, t.Kind()) + } + if schm, ok := r.typeToSchema[t]; ok { return schm, nil } - t := reflectx.SkipPtr(ot) + + ftype, lID, err := r.logicalTypeToFieldType(t) + if err != nil { + return nil, err + } + if ftype != nil { + schm := ftype.GetRowType().GetSchema() + schm = proto.Clone(schm).(*pipepb.Schema) + schm.Options = append(schm.Options, logicalOption(lID)) + schm.Id = getUUID(t) + r.typeToSchema[t] = schm + r.idToType[schm.GetId()] = t + return schm, nil + } + fields := make([]*pipepb.Field, 0, t.NumField()) for i := 0; i < t.NumField(); i++ { - f, err := structFieldToField(t.Field(i)) + sf := t.Field(i) + ignore, isAnon, err := ignoreField(t, sf) + if err != nil { + return nil, err + } + if ignore { + continue + } + f, err := r.structFieldToField(sf) if err != nil { return nil, errors.Wrapf(err, "cannot convert field %v to schema", t.Field(i).Name) } + if isAnon { + f = proto.Clone(f).(*pipepb.Field) + f.Options = append(f.Options, optGoEmbedded()) + } fields = append(fields, f) } schm := &pipepb.Schema{ Fields: fields, - Id: getNextID(), + Id: getUUID(t), } - typeToSchema[ot] = schm + r.idToType[schm.GetId()] = t + r.typeToSchema[t] = schm return schm, nil } -func structFieldToField(sf reflect.StructField) (*pipepb.Field, error) { +func (r *Registry) structFieldToField(sf reflect.StructField) (*pipepb.Field, error) { name := sf.Name if tag := sf.Tag.Get("beam"); tag != "" { name, _ = parseTag(tag) } - ftype, err := reflectTypeToFieldType(sf.Type) + ftype, err := r.reflectTypeToFieldType(sf.Type) if err != nil { return nil, err } @@ -193,13 +553,12 @@ func structFieldToField(sf reflect.StructField) (*pipepb.Field, error) { }, nil } -func reflectTypeToFieldType(ot reflect.Type) (*pipepb.FieldType, error) { - if lID, ok := logicalIdentifiers[ot]; ok { - lt := logicalTypes[lID] - ftype, err := reflectTypeToFieldType(lt.StorageType()) - if err != nil { - return nil, errors.Wrapf(err, "unable to convert LogicalType[%v]'s storage type %v for Go type of %v to a schema field", lID, lt.StorageType(), lt.GoType()) - } +func (r *Registry) reflectTypeToFieldType(ot reflect.Type) (*pipepb.FieldType, error) { + ftype, lID, err := r.logicalTypeToFieldType(ot) + if err != nil { + return nil, err + } + if ftype != nil { return &pipepb.FieldType{ TypeInfo: &pipepb.FieldType_LogicalType{ LogicalType: &pipepb.LogicalType{ @@ -211,24 +570,25 @@ func reflectTypeToFieldType(ot reflect.Type) (*pipepb.FieldType, error) { }, nil } - var isPtr bool t := ot - if t.Kind() == reflect.Ptr { - isPtr = true - t = t.Elem() - } switch t.Kind() { + case reflect.Ptr: + vt, err := r.reflectTypeToFieldType(t.Elem()) + if err != nil { + return nil, errors.Wrapf(err, "unable to convert key of %v to schema field", ot) + } + vt.Nullable = true + return vt, nil case reflect.Map: - kt, err := reflectTypeToFieldType(t.Key()) + kt, err := r.reflectTypeToFieldType(t.Key()) if err != nil { return nil, errors.Wrapf(err, "unable to convert key of %v to schema field", ot) } - vt, err := reflectTypeToFieldType(t.Elem()) + vt, err := r.reflectTypeToFieldType(t.Elem()) if err != nil { return nil, errors.Wrapf(err, "unable to 
convert value of %v to schema field", ot) } return &pipepb.FieldType{ - Nullable: isPtr, TypeInfo: &pipepb.FieldType_MapType{ MapType: &pipepb.MapType{ KeyType: kt, @@ -237,12 +597,11 @@ func reflectTypeToFieldType(ot reflect.Type) (*pipepb.FieldType, error) { }, }, nil case reflect.Struct: - sch, err := structToSchema(t) + sch, err := r.structToSchema(t) if err != nil { return nil, errors.Wrapf(err, "unable to convert %v to schema field", ot) } return &pipepb.FieldType{ - Nullable: isPtr, TypeInfo: &pipepb.FieldType_RowType{ RowType: &pipepb.RowType{ Schema: sch, @@ -253,30 +612,27 @@ func reflectTypeToFieldType(ot reflect.Type) (*pipepb.FieldType, error) { // Special handling for []byte if t == reflectx.ByteSlice { return &pipepb.FieldType{ - Nullable: isPtr, TypeInfo: &pipepb.FieldType_AtomicType{ AtomicType: pipepb.AtomicType_BYTES, }, }, nil } - vt, err := reflectTypeToFieldType(t.Elem()) + vt, err := r.reflectTypeToFieldType(t.Elem()) if err != nil { return nil, errors.Wrapf(err, "unable to convert element type of %v to schema field", ot) } return &pipepb.FieldType{ - Nullable: isPtr, TypeInfo: &pipepb.FieldType_ArrayType{ ArrayType: &pipepb.ArrayType{ ElementType: vt, }, }, }, nil - case reflect.Interface, reflect.Chan, reflect.UnsafePointer, reflect.Complex128, reflect.Complex64: + case reflect.Interface, reflect.Func, reflect.Chan, reflect.UnsafePointer, reflect.Complex128, reflect.Complex64, reflect.Invalid: return nil, errors.Errorf("unable to convert unsupported type %v to schema", ot) default: // must be an atomic type if enum, ok := reflectTypeToAtomicTypeMap[t.Kind()]; ok { return &pipepb.FieldType{ - Nullable: isPtr, TypeInfo: &pipepb.FieldType_AtomicType{ AtomicType: enum, }, @@ -297,20 +653,58 @@ var reflectTypeToAtomicTypeMap = map[reflect.Kind]pipepb.AtomicType{ reflect.Bool: pipepb.AtomicType_BOOLEAN, } +var reflectKindToTypeMap = map[reflect.Kind]reflect.Type{ + reflect.Uint: reflectx.Uint, + reflect.Uint8: reflectx.Uint8, + reflect.Uint16: reflectx.Uint16, + reflect.Uint32: reflectx.Uint32, + reflect.Uint64: reflectx.Uint64, + reflect.Int: reflectx.Int, + reflect.Int8: reflectx.Int8, + reflect.Int16: reflectx.Int16, + reflect.Int32: reflectx.Int32, + reflect.Int64: reflectx.Int64, + reflect.Float32: reflectx.Float32, + reflect.Float64: reflectx.Float64, + reflect.String: reflectx.String, + reflect.Bool: reflectx.Bool, +} + +var emptyStructType = reflect.TypeOf((*struct{})(nil)).Elem() + // ToType returns a Go type of the passed in Schema. // Types returned by ToType are always of Struct kind. // Returns an error if the Schema cannot be converted to a type. 
-func ToType(s *pipepb.Schema) (reflect.Type, error) { +func (r *Registry) ToType(s *pipepb.Schema) (reflect.Type, error) { + if err := r.reconcileRegistrations(); err != nil { + return nil, errors.Wrap(err, "reconciling for ToType") + } + return r.toType(s) +} + +func (r *Registry) toType(s *pipepb.Schema) (reflect.Type, error) { + if t, ok := r.idToType[s.GetId()]; ok { + return t, nil + } + if lID, ok := fromLogicalOption(s.GetOptions()); ok { + if lt, ok := r.logicalTypes[lID]; ok { + return lt.GoType(), nil + } + } + fields := make([]reflect.StructField, 0, len(s.GetFields())) for _, sf := range s.GetFields() { - rf, err := fieldToStructField(sf) + rf, err := r.fieldToStructField(sf) if err != nil { return nil, errors.Wrapf(err, "cannot convert schema field %v to field", sf.GetName()) } + if checkOptions(sf.Options, optGoEmbeddedUrn) != nil { + rf.Anonymous = true + } fields = append(fields, rf) } ret := reflect.StructOf(fields) - if ut, ok := syntheticToUser[ret]; ok { + if ut, ok := r.syntheticToUser[ret]; ok { ret = ut } if t := nillableFromOptions(s.GetOptions(), ret); t != nil { @@ -319,9 +713,9 @@ func ToType(s *pipepb.Schema) (reflect.Type, error) { return ret, nil } -func fieldToStructField(sf *pipepb.Field) (reflect.StructField, error) { +func (r *Registry) fieldToStructField(sf *pipepb.Field) (reflect.StructField, error) { name := sf.GetName() - rt, err := fieldTypeToReflectType(sf.GetType(), sf.Options) + rt, err := r.fieldTypeToReflectType(sf.GetType(), sf.Options) if err != nil { return reflect.StructField{}, err } @@ -349,7 +743,7 @@ var atomicTypeToReflectType = map[pipepb.AtomicType]reflect.Type{ pipepb.AtomicType_BYTES: reflectx.ByteSlice, } -func fieldTypeToReflectType(sft *pipepb.FieldType, opts []*pipepb.Option) (reflect.Type, error) { +func (r *Registry) fieldTypeToReflectType(sft *pipepb.FieldType, opts []*pipepb.Option) (reflect.Type, error) { var t reflect.Type switch sft.GetTypeInfo().(type) { case *pipepb.FieldType_AtomicType: @@ -358,34 +752,34 @@ func fieldTypeToReflectType(sft *pipepb.FieldType, opts []*pipepb.Option) (refle return nil, errors.Errorf("unknown atomic type: %v", sft.GetAtomicType()) } case *pipepb.FieldType_ArrayType: - rt, err := fieldTypeToReflectType(sft.GetArrayType().GetElementType(), nil) + rt, err := r.fieldTypeToReflectType(sft.GetArrayType().GetElementType(), nil) if err != nil { return nil, errors.Wrap(err, "unable to convert array element type") } t = reflect.SliceOf(rt) case *pipepb.FieldType_MapType: - kt, err := fieldTypeToReflectType(sft.GetMapType().GetKeyType(), nil) + kt, err := r.fieldTypeToReflectType(sft.GetMapType().GetKeyType(), nil) if err != nil { return nil, errors.Wrap(err, "unable to convert map key type") } - vt, err := fieldTypeToReflectType(sft.GetMapType().GetValueType(), nil) + vt, err := r.fieldTypeToReflectType(sft.GetMapType().GetValueType(), nil) if err != nil { return nil, errors.Wrap(err, "unable to convert map value type") } t = reflect.MapOf(kt, vt) // Panics for invalid map keys (slices/iterables) case *pipepb.FieldType_RowType: - rt, err := ToType(sft.GetRowType().GetSchema()) + rt, err := r.toType(sft.GetRowType().GetSchema()) if err != nil { return nil, errors.Wrapf(err, "unable to convert row type: %v", sft.GetRowType().GetSchema().GetId()) } t = rt // case *pipepb.FieldType_IterableType: - // TODO(BEAM-9615): handle IterableTypes. + // TODO(BEAM-9615): handle IterableTypes (eg. 
CoGBK values) case *pipepb.FieldType_LogicalType: lst := sft.GetLogicalType() identifier := lst.GetUrn() - lt, ok := logicalTypes[identifier] + lt, ok := r.logicalTypes[identifier] if !ok { return nil, errors.Errorf("unknown logical type: %v", identifier) } diff --git a/sdks/go/pkg/beam/core/runtime/graphx/schema/schema_test.go b/sdks/go/pkg/beam/core/runtime/graphx/schema/schema_test.go index 8b22fe3ac6b7..ef66499de4ec 100644 --- a/sdks/go/pkg/beam/core/runtime/graphx/schema/schema_test.go +++ b/sdks/go/pkg/beam/core/runtime/graphx/schema/schema_test.go @@ -20,8 +20,8 @@ import ( "reflect" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" "github.com/golang/protobuf/proto" "github.com/google/go-cmp/cmp" "google.golang.org/protobuf/testing/protocmp" @@ -41,11 +41,64 @@ type justAType struct { C int } +type myInt int + +type anotherStruct struct { + Q myInt +} + func init() { runtime.RegisterType(reflect.TypeOf((*registeredType)(nil))) - RegisterType(reflect.TypeOf((*sRegisteredType)(nil))) } +type testInterface interface { + hidden() +} + +type unexportedFields struct { + d uint64 +} + +func (unexportedFields) hidden() {} + +type exportedFunc struct { + e int16 + F func() +} + +func (*exportedFunc) hidden() {} + +type Exported struct { + G myInt +} + +type hasEmbedded struct { + Exported +} + +type hasEmbeddedPtr struct { + *Exported +} + +type hasMap struct { + Cypher map[bool]float32 `beam:"cypher"` +} + +type nonRegisteredLogical struct { + k int32 +} + +var ( + unexportedFieldsType = reflect.TypeOf((*unexportedFields)(nil)).Elem() + exportedFuncType = reflect.TypeOf((*exportedFunc)(nil)) + anotherType = reflect.TypeOf((*anotherStruct)(nil)).Elem() + exportedType = reflect.TypeOf((*Exported)(nil)).Elem() + hasEmbeddedType = reflect.TypeOf((*hasEmbedded)(nil)).Elem() + hasEmbeddedPtrType = reflect.TypeOf((*hasEmbeddedPtr)(nil)).Elem() + hasMapType = reflect.TypeOf((*hasMap)(nil)).Elem() + nonRegisteredLogicalType = reflect.TypeOf((*nonRegisteredLogical)(nil)).Elem() +) + func TestSchemaConversion(t *testing.T) { tests := []struct { st *pipepb.Schema @@ -117,9 +170,7 @@ func TestSchemaConversion(t *testing.T) { }, }, }, - rt: reflect.TypeOf(struct { - Cypher map[bool]float32 `beam:"cypher"` - }{}), + rt: hasMapType, }, { st: &pipepb.Schema{ Fields: []*pipepb.Field{ @@ -353,25 +404,391 @@ func TestSchemaConversion(t *testing.T) { }, }, }, - Options: []*pipepb.Option{{ - Name: optGoNillable, - }}, + Options: []*pipepb.Option{optGoNillable()}, }, rt: reflect.TypeOf(&struct { SuperNES int16 }{}), + }, { + st: &pipepb.Schema{ + Options: []*pipepb.Option{ + logicalOption("schema.unexportedFields"), + }, + Fields: []*pipepb.Field{ + { + Name: "D", + Type: &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_LogicalType{ + LogicalType: &pipepb.LogicalType{ + Urn: "uint64", + Representation: &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_AtomicType{ + AtomicType: pipepb.AtomicType_INT64, + }, + }, + }, + }, + }, + }, + }, + }, + rt: unexportedFieldsType, + }, { + st: &pipepb.Schema{ + Fields: []*pipepb.Field{ + { + Name: "G", + Type: &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_LogicalType{ + LogicalType: &pipepb.LogicalType{ + Urn: "schema.unexportedFields", + Representation: &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_RowType{ + RowType: &pipepb.RowType{ + Schema: 
&pipepb.Schema{ + Fields: []*pipepb.Field{ + { + Name: "D", + Type: &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_LogicalType{ + LogicalType: &pipepb.LogicalType{ + Urn: "uint64", + Representation: &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_AtomicType{ + AtomicType: pipepb.AtomicType_INT64, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + rt: reflect.TypeOf(struct{ G unexportedFields }{}), + }, { + st: &pipepb.Schema{ + Fields: []*pipepb.Field{ + { + Name: "H", + Type: &pipepb.FieldType{ + Nullable: true, + TypeInfo: &pipepb.FieldType_LogicalType{ + LogicalType: &pipepb.LogicalType{ + Urn: "schema.unexportedFields", + Representation: &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_RowType{ + RowType: &pipepb.RowType{ + Schema: &pipepb.Schema{ + Fields: []*pipepb.Field{ + { + Name: "D", + Type: &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_LogicalType{ + LogicalType: &pipepb.LogicalType{ + Urn: "uint64", + Representation: &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_AtomicType{ + AtomicType: pipepb.AtomicType_INT64, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + rt: reflect.TypeOf(struct{ H *unexportedFields }{}), + }, { + st: &pipepb.Schema{ + Fields: []*pipepb.Field{ + { + Name: "E", + Type: &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_AtomicType{ + AtomicType: pipepb.AtomicType_INT16, + }, + }, + }, + }, + Options: []*pipepb.Option{optGoNillable(), logicalOption("*schema.exportedFunc")}, + }, + rt: exportedFuncType, + }, { + st: &pipepb.Schema{ + Fields: []*pipepb.Field{ + { + Name: "Q", + Type: &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_LogicalType{ + LogicalType: &pipepb.LogicalType{ + Urn: "schema.myInt", + Representation: &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_LogicalType{ + LogicalType: &pipepb.LogicalType{ + Urn: "int", + Representation: &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_AtomicType{ + AtomicType: pipepb.AtomicType_INT64, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + rt: anotherType, + }, { + st: &pipepb.Schema{ + Fields: []*pipepb.Field{ + { + Name: "Exported", + Options: []*pipepb.Option{optGoEmbedded()}, + Type: &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_RowType{ + RowType: &pipepb.RowType{ + Schema: &pipepb.Schema{ + Fields: []*pipepb.Field{ + { + Name: "G", + Type: &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_LogicalType{ + LogicalType: &pipepb.LogicalType{ + Urn: "schema.myInt", + Representation: &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_LogicalType{ + LogicalType: &pipepb.LogicalType{ + Urn: "int", + Representation: &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_AtomicType{ + AtomicType: pipepb.AtomicType_INT64, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + rt: hasEmbeddedType, + }, { + st: &pipepb.Schema{ + Fields: []*pipepb.Field{ + { + Name: "Exported", + Options: []*pipepb.Option{optGoEmbedded()}, + Type: &pipepb.FieldType{ + Nullable: true, + TypeInfo: &pipepb.FieldType_RowType{ + RowType: &pipepb.RowType{ + Schema: &pipepb.Schema{ + Fields: []*pipepb.Field{ + { + Name: "G", + Type: &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_LogicalType{ + LogicalType: &pipepb.LogicalType{ + Urn: "schema.myInt", + Representation: &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_LogicalType{ + LogicalType: &pipepb.LogicalType{ + Urn: "int", + Representation: &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_AtomicType{ + AtomicType: pipepb.AtomicType_INT64, + }, + }, + }, + }, + }, + }, 
+ }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + rt: hasEmbeddedPtrType, + }, { + st: &pipepb.Schema{ + Fields: []*pipepb.Field{ + { + Name: "T", + Type: &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_AtomicType{ + AtomicType: pipepb.AtomicType_STRING, + }, + }, + }, + }, + Options: []*pipepb.Option{optGoNillable()}, + }, + rt: reflect.TypeOf(&struct { + myInt + T string + i int + }{}), + }, { + st: &pipepb.Schema{ + Options: []*pipepb.Option{ + logicalOption("schema.exportedFunc"), + }, + Fields: []*pipepb.Field{ + { + Name: "V", + Type: &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_AtomicType{ + AtomicType: pipepb.AtomicType_INT16, + }, + }, + }, + }, + }, + rt: reflect.TypeOf(exportedFunc{}), + }, { + st: &pipepb.Schema{ + Fields: []*pipepb.Field{ + { + Name: "U", + Type: &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_LogicalType{ + LogicalType: &pipepb.LogicalType{ + Urn: "schema.exportedFunc", + Representation: &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_RowType{ + RowType: &pipepb.RowType{ + Schema: &pipepb.Schema{ + Fields: []*pipepb.Field{ + { + Name: "V", + Type: &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_AtomicType{ + AtomicType: pipepb.AtomicType_INT16, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + rt: reflect.TypeOf(struct { + U exportedFunc + }{}), + }, { + st: &pipepb.Schema{ + Fields: []*pipepb.Field{ + { + Name: "U", + Type: &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_LogicalType{ + LogicalType: &pipepb.LogicalType{ + Urn: "schema.nonRegisteredLogical", + Representation: &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_RowType{ + RowType: &pipepb.RowType{ + Schema: &pipepb.Schema{ + Fields: []*pipepb.Field{ + { + Name: "K", + Type: &pipepb.FieldType{ + TypeInfo: &pipepb.FieldType_AtomicType{ + AtomicType: pipepb.AtomicType_INT32, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + }, + rt: reflect.TypeOf(struct { + U nonRegisteredLogical + }{}), }, } for _, test := range tests { test := test t.Run(fmt.Sprintf("%v", test.rt), func(t *testing.T) { + reg := NewRegistry() + preRegLogicalTypes(reg) + reg.RegisterLogicalType(ToLogicalType(exportedFuncType.Elem().String(), exportedFuncType.Elem(), reflect.TypeOf(struct{ V int16 }{}))) + reg.RegisterLogicalType(ToLogicalType(nonRegisteredLogicalType.String(), nonRegisteredLogicalType, reflect.TypeOf(struct{ K int32 }{}))) + reg.RegisterType(reflect.TypeOf((*sRegisteredType)(nil))) + reg.RegisterLogicalTypeProvider(reflect.TypeOf((*testInterface)(nil)).Elem(), func(t reflect.Type) (reflect.Type, error) { + switch t { + case unexportedFieldsType: + return reflect.TypeOf(struct{ D uint64 }{}), nil + case exportedFuncType: + return reflect.TypeOf(struct{ E int16 }{}), nil + } + return nil, nil + }) + reg.RegisterType(unexportedFieldsType) + reg.RegisterType(exportedFuncType) + reg.RegisterType(anotherType) + reg.RegisterType(exportedType) + reg.RegisterType(hasEmbeddedType) + reg.RegisterType(hasEmbeddedPtrType) + reg.RegisterType(hasMapType) + { - got, err := ToType(test.st) + got, err := reg.ToType(test.st) if err != nil { t.Fatalf("error ToType(%v) = %v", test.st, err) } - if !test.rt.AssignableTo(got) { + // We can't validate that synthetic types from Schemas with embedded fields are + // assignable, as the anonymous struct field won't be equivalent to the + // real embedded type. 
+ if !hasEmbeddedField(test.rt) && !test.rt.AssignableTo(got) { t.Errorf("%v not assignable to %v", test.rt, got) if d := cmp.Diff(reflect.New(test.rt).Elem().Interface(), reflect.New(got).Elem().Interface()); d != "" { t.Errorf("diff (-want, +got): %v", d) @@ -379,7 +796,7 @@ func TestSchemaConversion(t *testing.T) { } } { - got, err := FromType(test.rt) + got, err := reg.FromType(test.rt) if err != nil { t.Fatalf("error FromType(%v) = %v", test.rt, err) } @@ -394,3 +811,18 @@ func TestSchemaConversion(t *testing.T) { }) } } + +func hasEmbeddedField(t reflect.Type) bool { + if t.Kind() == reflect.Ptr { + t = t.Elem() + } + if t.Kind() != reflect.Struct { + return false + } + for i := 0; i < t.NumField(); i++ { + if t.Field(i).Anonymous { + return true + } + } + return false +} diff --git a/sdks/go/pkg/beam/core/runtime/graphx/serialize.go b/sdks/go/pkg/beam/core/runtime/graphx/serialize.go index 9a9094f8e9c8..417a759b71a4 100644 --- a/sdks/go/pkg/beam/core/runtime/graphx/serialize.go +++ b/sdks/go/pkg/beam/core/runtime/graphx/serialize.go @@ -21,16 +21,16 @@ import ( "time" - "github.com/apache/beam/sdks/go/pkg/beam/core/funcx" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime" - v1pb "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx/v1" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/jsonx" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/funcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime" + v1pb "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx/v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/jsonx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) var genFnType = reflect.TypeOf((*func(string, reflect.Type, []byte) reflectx.Func)(nil)).Elem() diff --git a/sdks/go/pkg/beam/core/runtime/graphx/serialize_test.go b/sdks/go/pkg/beam/core/runtime/graphx/serialize_test.go index be3bbf371c3a..b1a3369e768e 100644 --- a/sdks/go/pkg/beam/core/runtime/graphx/serialize_test.go +++ b/sdks/go/pkg/beam/core/runtime/graphx/serialize_test.go @@ -20,8 +20,8 @@ import ( "strings" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime" - v1pb "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx/v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime" + v1pb "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx/v1" ) func TestEncodeType(t *testing.T) { diff --git a/sdks/go/pkg/beam/core/runtime/graphx/translate.go b/sdks/go/pkg/beam/core/runtime/graphx/translate.go index 17e8134641c2..c74b6458d737 100644 --- a/sdks/go/pkg/beam/core/runtime/graphx/translate.go +++ b/sdks/go/pkg/beam/core/runtime/graphx/translate.go @@ -19,14 +19,14 @@ import ( "context" "fmt" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window" - v1pb 
"github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx/v1" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/pipelinex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/protox" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" + v1pb "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx/v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/pipelinex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/protox" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" "github.com/golang/protobuf/proto" "github.com/golang/protobuf/ptypes" ) @@ -40,7 +40,7 @@ const ( URNGBK = "beam:transform:group_by_key:v1" URNReshuffle = "beam:transform:reshuffle:v1" URNCombinePerKey = "beam:transform:combine_per_key:v1" - URNWindow = "beam:transform:window:v1" + URNWindow = "beam:transform:window_into:v1" // URNIterableSideInput = "beam:side_input:iterable:v1" URNMultimapSideInput = "beam:side_input:multimap:v1" @@ -302,8 +302,10 @@ func (m *marshaller) addMultiEdge(edge NamedEdge) ([]string, error) { } return []string{reshuffleID}, nil case edge.Edge.Op == graph.External: - if edge.Edge.External.Expanded != nil { - m.needsExpansion = true + if edge.Edge.External != nil { + if edge.Edge.External.Expanded != nil { + m.needsExpansion = true + } } if edge.Edge.Payload == nil { edgeID, err := m.expandCrossLanguage(edge) @@ -328,6 +330,7 @@ func (m *marshaller) addMultiEdge(edge NamedEdge) ([]string, error) { } outputs[fmt.Sprintf("i%v", i)] = nodeID(out.To) } + var annotations map[string][]byte // allPIds tracks additional PTransformIDs generated for the pipeline var allPIds []string @@ -424,6 +427,7 @@ func (m *marshaller) addMultiEdge(edge NamedEdge) ([]string, error) { m.requirements[URNRequiresSplittableDoFn] = true } spec = &pipepb.FunctionSpec{Urn: URNParDo, Payload: protox.MustEncode(payload)} + annotations = edge.Edge.DoFn.Annotations() case graph.Combine: mustEncodeMultiEdge, err := mustEncodeMultiEdgeBase64(edge.Edge) @@ -455,7 +459,34 @@ func (m *marshaller) addMultiEdge(edge NamedEdge) ([]string, error) { spec = &pipepb.FunctionSpec{Urn: URNWindow, Payload: protox.MustEncode(payload)} case graph.External: - spec = &pipepb.FunctionSpec{Urn: edge.Edge.Payload.URN, Payload: edge.Edge.Payload.Data} + pyld := edge.Edge.Payload + spec = &pipepb.FunctionSpec{Urn: pyld.URN, Payload: pyld.Data} + + if len(pyld.InputsMap) != 0 { + if got, want := len(pyld.InputsMap), len(edge.Edge.Input); got != want { + return handleErr(errors.Errorf("mismatch'd counts between External tags (%v) and inputs (%v)", got, want)) + } + inputs = make(map[string]string) + for tag, in := range InboundTagToNode(pyld.InputsMap, edge.Edge.Input) { + if _, err := m.addNode(in); err != nil { + return handleErr(err) + } + inputs[tag] = nodeID(in) + } + } + + if len(pyld.OutputsMap) != 0 { + if got, want := len(pyld.OutputsMap), len(edge.Edge.Output); got != want { + return handleErr(errors.Errorf("mismatch'd counts between External tags (%v) and outputs (%v)", got, want)) + } + outputs = make(map[string]string) + for tag, out := range OutboundTagToNode(pyld.OutputsMap, edge.Edge.Output) { + if _, err := m.addNode(out); err != nil { + return handleErr(err) + } + 
outputs[tag] = nodeID(out) + } + } default: err := errors.Errorf("unexpected opcode: %v", edge.Edge.Op) @@ -473,6 +504,7 @@ func (m *marshaller) addMultiEdge(edge NamedEdge) ([]string, error) { Inputs: inputs, Outputs: outputs, EnvironmentId: transformEnvID, + Annotations: annotations, } m.transforms[id] = transform allPIds = append(allPIds, id) @@ -943,22 +975,144 @@ func marshalWindowingStrategy(c *CoderMarshaller, w *window.WindowingStrategy) ( if err != nil { return nil, err } + var mergeStat pipepb.MergeStatus_Enum + if w.Fn.Kind == window.Sessions { + mergeStat = pipepb.MergeStatus_NEEDS_MERGE + } else { + mergeStat = pipepb.MergeStatus_NON_MERGING + } + ws := &pipepb.WindowingStrategy{ WindowFn: windowFn, - MergeStatus: pipepb.MergeStatus_NON_MERGING, - AccumulationMode: pipepb.AccumulationMode_DISCARDING, + MergeStatus: mergeStat, WindowCoderId: windowCoderId, - Trigger: &pipepb.Trigger{ + Trigger: makeTrigger(w.Trigger), + AccumulationMode: makeAccumulationMode(w.AccumulationMode), + OutputTime: pipepb.OutputTime_END_OF_WINDOW, + ClosingBehavior: pipepb.ClosingBehavior_EMIT_IF_NONEMPTY, + AllowedLateness: 0, + OnTimeBehavior: pipepb.OnTimeBehavior_FIRE_IF_NONEMPTY, + } + return ws, nil +} + +func makeAccumulationMode(m window.AccumulationMode) pipepb.AccumulationMode_Enum { + switch m { + case window.Accumulating: + return pipepb.AccumulationMode_ACCUMULATING + case window.Discarding: + return pipepb.AccumulationMode_DISCARDING + case window.Unspecified: + return pipepb.AccumulationMode_UNSPECIFIED + case window.Retracting: + return pipepb.AccumulationMode_RETRACTING + default: + return pipepb.AccumulationMode_DISCARDING + } +} + +func makeTrigger(t window.Trigger) *pipepb.Trigger { + switch t.Kind { + case window.DefaultTrigger: + return &pipepb.Trigger{ Trigger: &pipepb.Trigger_Default_{ Default: &pipepb.Trigger_Default{}, }, - }, - OutputTime: pipepb.OutputTime_END_OF_WINDOW, - ClosingBehavior: pipepb.ClosingBehavior_EMIT_IF_NONEMPTY, - AllowedLateness: 0, - OnTimeBehavior: pipepb.OnTimeBehavior_FIRE_ALWAYS, + } + case window.AlwaysTrigger: + return &pipepb.Trigger{ + Trigger: &pipepb.Trigger_Always_{ + Always: &pipepb.Trigger_Always{}, + }, + } + case window.AfterAnyTrigger: + return &pipepb.Trigger{ + Trigger: &pipepb.Trigger_AfterAny_{ + AfterAny: &pipepb.Trigger_AfterAny{ + Subtriggers: extractSubtriggers(t.SubTriggers), + }, + }, + } + case window.AfterAllTrigger: + return &pipepb.Trigger{ + Trigger: &pipepb.Trigger_AfterAll_{ + AfterAll: &pipepb.Trigger_AfterAll{ + Subtriggers: extractSubtriggers(t.SubTriggers), + }, + }, + } + case window.AfterProcessingTimeTrigger: + // TODO(BEAM-3304) Right now would work only for single delay value. + // could be configured to take more than one delay values later. 
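		// Editor's note (not part of this patch): TimestampTransforms below is a
		// repeated proto field, so extending this to several delays would simply
		// mean appending one TimestampTransform_Delay per configured value; today
		// only the single t.Delay value (interpreted as DelayMillis) is encoded.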
+ ttd := &pipepb.TimestampTransform{ + TimestampTransform: &pipepb.TimestampTransform_Delay_{ + Delay: &pipepb.TimestampTransform_Delay{DelayMillis: t.Delay}, + }} + tt := []*pipepb.TimestampTransform{ttd} + return &pipepb.Trigger{ + Trigger: &pipepb.Trigger_AfterProcessingTime_{ + AfterProcessingTime: &pipepb.Trigger_AfterProcessingTime{TimestampTransforms: tt}, + }, + } + case window.ElementCountTrigger: + return &pipepb.Trigger{ + Trigger: &pipepb.Trigger_ElementCount_{ + ElementCount: &pipepb.Trigger_ElementCount{ElementCount: t.ElementCount}, + }, + } + case window.AfterEndOfWindowTrigger: + var lateTrigger *pipepb.Trigger + if t.LateTrigger != nil { + lateTrigger = makeTrigger(*t.LateTrigger) + } + return &pipepb.Trigger{ + Trigger: &pipepb.Trigger_AfterEndOfWindow_{ + AfterEndOfWindow: &pipepb.Trigger_AfterEndOfWindow{ + EarlyFirings: makeTrigger(*t.EarlyTrigger), + LateFirings: lateTrigger, + }, + }, + } + case window.RepeatTrigger: + if len(t.SubTriggers) != 1 { + panic("Only 1 Subtrigger should be passed to Repeat Trigger") + } + return &pipepb.Trigger{ + Trigger: &pipepb.Trigger_Repeat_{ + Repeat: &pipepb.Trigger_Repeat{Subtrigger: makeTrigger(t.SubTriggers[0])}, + }, + } + case window.NeverTrigger: + return &pipepb.Trigger{ + Trigger: &pipepb.Trigger_Never_{ + Never: &pipepb.Trigger_Never{}, + }, + } + case window.AfterSynchronizedProcessingTimeTrigger: + return &pipepb.Trigger{ + Trigger: &pipepb.Trigger_AfterSynchronizedProcessingTime_{ + AfterSynchronizedProcessingTime: &pipepb.Trigger_AfterSynchronizedProcessingTime{}, + }, + } + default: + return &pipepb.Trigger{ + Trigger: &pipepb.Trigger_Default_{ + Default: &pipepb.Trigger_Default{}, + }, + } } - return ws, nil +} + +func extractSubtriggers(t []window.Trigger) []*pipepb.Trigger { + if len(t) <= 0 { + panic("At least one subtrigger required for composite triggers.") + } + + var result []*pipepb.Trigger + for _, tr := range t { + result = append(result, makeTrigger(tr)) + } + return result } func makeWindowFn(w *window.Fn) (*pipepb.FunctionSpec, error) { @@ -1004,10 +1158,10 @@ func makeWindowCoder(w *window.Fn) (*coder.WindowCoder, error) { switch w.Kind { case window.GlobalWindows: return coder.NewGlobalWindow(), nil - case window.FixedWindows, window.SlidingWindows, URNSlidingWindowsWindowFn: + case window.FixedWindows, window.SlidingWindows, window.Sessions, URNSlidingWindowsWindowFn: return coder.NewIntervalWindow(), nil default: - return nil, errors.Errorf("unexpected windowing strategy: %v", w) + return nil, errors.Errorf("unexpected windowing strategy for coder: %v", w) } } diff --git a/sdks/go/pkg/beam/core/runtime/graphx/translate_test.go b/sdks/go/pkg/beam/core/runtime/graphx/translate_test.go index 8f3d5a50338f..411e0f22d551 100644 --- a/sdks/go/pkg/beam/core/runtime/graphx/translate_test.go +++ b/sdks/go/pkg/beam/core/runtime/graphx/translate_test.go @@ -21,14 +21,14 @@ import ( "github.com/google/go-cmp/cmp/cmpopts" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + 
"github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" "github.com/golang/protobuf/proto" "github.com/google/go-cmp/cmp" ) diff --git a/sdks/go/pkg/beam/core/runtime/graphx/tree.go b/sdks/go/pkg/beam/core/runtime/graphx/tree.go index c48fd8cf9c8b..b921d4a9d5f8 100644 --- a/sdks/go/pkg/beam/core/runtime/graphx/tree.go +++ b/sdks/go/pkg/beam/core/runtime/graphx/tree.go @@ -16,7 +16,7 @@ package graphx import ( - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" ) // NamedEdge is a named MultiEdge. diff --git a/sdks/go/pkg/beam/core/runtime/graphx/user.go b/sdks/go/pkg/beam/core/runtime/graphx/user.go index f6dd47148fa0..9e0fb1ae7d6a 100644 --- a/sdks/go/pkg/beam/core/runtime/graphx/user.go +++ b/sdks/go/pkg/beam/core/runtime/graphx/user.go @@ -22,11 +22,11 @@ import ( "encoding/base64" - "github.com/apache/beam/sdks/go/pkg/beam/core/funcx" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - v1pb "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx/v1" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/protox" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/funcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + v1pb "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx/v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/protox" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) // EncodeType encodes a type as a string. Unless registered, the decoded type diff --git a/sdks/go/pkg/beam/core/runtime/graphx/v1/gen.go b/sdks/go/pkg/beam/core/runtime/graphx/v1/gen.go index fb41a343d662..90db14565664 100644 --- a/sdks/go/pkg/beam/core/runtime/graphx/v1/gen.go +++ b/sdks/go/pkg/beam/core/runtime/graphx/v1/gen.go @@ -15,4 +15,4 @@ package v1 -//go:generate protoc -I . v1.proto --go_out=. +//go:generate protoc -I . v1.proto --go_out=../../../../../../../../../../../ diff --git a/sdks/go/pkg/beam/core/runtime/graphx/v1/v1.pb.go b/sdks/go/pkg/beam/core/runtime/graphx/v1/v1.pb.go index fc9dd0195b6f..8b23ff469907 100644 --- a/sdks/go/pkg/beam/core/runtime/graphx/v1/v1.pb.go +++ b/sdks/go/pkg/beam/core/runtime/graphx/v1/v1.pb.go @@ -1,40 +1,47 @@ +// +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +// +// Protocol Buffers describing graph serialization (v1) +// +// We need to serialize types and function symbols to be able to recreate +// UserFns on the worker side, in particular. We use protos for compactness. + // Code generated by protoc-gen-go. DO NOT EDIT. +// versions: +// protoc-gen-go v1.25.0-devel +// protoc v3.13.0 // source: v1.proto -/* -Package v1 is a generated protocol buffer package. - -It is generated from these files: - v1.proto - -It has these top-level messages: - Type - FullType - UserFn - DynFn - Fn - WindowFn - CustomCoder - MultiEdge - InjectPayload - TransformPayload -*/ package v1 -import proto "github.com/golang/protobuf/proto" -import fmt "fmt" -import math "math" - -// Reference imports to suppress errors if they are not otherwise used. -var _ = proto.Marshal -var _ = fmt.Errorf -var _ = math.Inf +import ( + protoreflect "google.golang.org/protobuf/reflect/protoreflect" + protoimpl "google.golang.org/protobuf/runtime/protoimpl" + reflect "reflect" + sync "sync" +) -// This is a compile-time assertion to ensure that this generated file -// is compatible with the proto package it is being compiled against. -// A compilation error at this line likely means your copy of the -// proto package needs to be updated. -const _ = proto.ProtoPackageIsVersion2 // please upgrade the proto package +const ( + // Verify that this generated code is sufficiently up-to-date. + _ = protoimpl.EnforceVersion(20 - protoimpl.MinVersion) + // Verify that runtime/protoimpl is sufficiently up-to-date. + _ = protoimpl.EnforceVersion(protoimpl.MaxVersion - 20) +) // Kind is mostly identical to reflect.TypeKind, expect we handle certain // types specially, such as "error". @@ -67,59 +74,84 @@ const ( Type_EXTERNAL Type_Kind = 26 ) -var Type_Kind_name = map[int32]string{ - 0: "INVALID", - 1: "BOOL", - 2: "INT", - 3: "INT8", - 4: "INT16", - 5: "INT32", - 6: "INT64", - 7: "UINT", - 8: "UINT8", - 9: "UINT16", - 10: "UINT32", - 11: "UINT64", - 12: "STRING", - 13: "FLOAT32", - 14: "FLOAT64", - 20: "SLICE", - 21: "STRUCT", - 22: "FUNC", - 23: "CHAN", - 24: "PTR", - 25: "SPECIAL", - 26: "EXTERNAL", -} -var Type_Kind_value = map[string]int32{ - "INVALID": 0, - "BOOL": 1, - "INT": 2, - "INT8": 3, - "INT16": 4, - "INT32": 5, - "INT64": 6, - "UINT": 7, - "UINT8": 8, - "UINT16": 9, - "UINT32": 10, - "UINT64": 11, - "STRING": 12, - "FLOAT32": 13, - "FLOAT64": 14, - "SLICE": 20, - "STRUCT": 21, - "FUNC": 22, - "CHAN": 23, - "PTR": 24, - "SPECIAL": 25, - "EXTERNAL": 26, +// Enum value maps for Type_Kind. 
+var ( + Type_Kind_name = map[int32]string{ + 0: "INVALID", + 1: "BOOL", + 2: "INT", + 3: "INT8", + 4: "INT16", + 5: "INT32", + 6: "INT64", + 7: "UINT", + 8: "UINT8", + 9: "UINT16", + 10: "UINT32", + 11: "UINT64", + 12: "STRING", + 13: "FLOAT32", + 14: "FLOAT64", + 20: "SLICE", + 21: "STRUCT", + 22: "FUNC", + 23: "CHAN", + 24: "PTR", + 25: "SPECIAL", + 26: "EXTERNAL", + } + Type_Kind_value = map[string]int32{ + "INVALID": 0, + "BOOL": 1, + "INT": 2, + "INT8": 3, + "INT16": 4, + "INT32": 5, + "INT64": 6, + "UINT": 7, + "UINT8": 8, + "UINT16": 9, + "UINT32": 10, + "UINT64": 11, + "STRING": 12, + "FLOAT32": 13, + "FLOAT64": 14, + "SLICE": 20, + "STRUCT": 21, + "FUNC": 22, + "CHAN": 23, + "PTR": 24, + "SPECIAL": 25, + "EXTERNAL": 26, + } +) + +func (x Type_Kind) Enum() *Type_Kind { + p := new(Type_Kind) + *p = x + return p } func (x Type_Kind) String() string { - return proto.EnumName(Type_Kind_name, int32(x)) + return protoimpl.X.EnumStringOf(x.Descriptor(), protoreflect.EnumNumber(x)) +} + +func (Type_Kind) Descriptor() protoreflect.EnumDescriptor { + return file_v1_proto_enumTypes[0].Descriptor() +} + +func (Type_Kind) Type() protoreflect.EnumType { + return &file_v1_proto_enumTypes[0] +} + +func (x Type_Kind) Number() protoreflect.EnumNumber { + return protoreflect.EnumNumber(x) +} + +// Deprecated: Use Type_Kind.Descriptor instead. +func (Type_Kind) EnumDescriptor() ([]byte, []int) { + return file_v1_proto_rawDescGZIP(), []int{0, 0} } -func (Type_Kind) EnumDescriptor() ([]byte, []int) { return fileDescriptor0, []int{0, 0} } // ChanDir matches reflect.ChanDir. type Type_ChanDir int32 @@ -130,21 +162,46 @@ const ( Type_BOTH Type_ChanDir = 2 ) -var Type_ChanDir_name = map[int32]string{ - 0: "RECV", - 1: "SEND", - 2: "BOTH", -} -var Type_ChanDir_value = map[string]int32{ - "RECV": 0, - "SEND": 1, - "BOTH": 2, +// Enum value maps for Type_ChanDir. +var ( + Type_ChanDir_name = map[int32]string{ + 0: "RECV", + 1: "SEND", + 2: "BOTH", + } + Type_ChanDir_value = map[string]int32{ + "RECV": 0, + "SEND": 1, + "BOTH": 2, + } +) + +func (x Type_ChanDir) Enum() *Type_ChanDir { + p := new(Type_ChanDir) + *p = x + return p } func (x Type_ChanDir) String() string { - return proto.EnumName(Type_ChanDir_name, int32(x)) + return protoimpl.X.EnumStringOf(x.Descriptor(), protoreflect.EnumNumber(x)) +} + +func (Type_ChanDir) Descriptor() protoreflect.EnumDescriptor { + return file_v1_proto_enumTypes[1].Descriptor() +} + +func (Type_ChanDir) Type() protoreflect.EnumType { + return &file_v1_proto_enumTypes[1] +} + +func (x Type_ChanDir) Number() protoreflect.EnumNumber { + return protoreflect.EnumNumber(x) +} + +// Deprecated: Use Type_ChanDir.Descriptor instead. +func (Type_ChanDir) EnumDescriptor() ([]byte, []int) { + return file_v1_proto_rawDescGZIP(), []int{0, 1} } -func (Type_ChanDir) EnumDescriptor() ([]byte, []int) { return fileDescriptor0, []int{0, 1} } type Type_Special int32 @@ -169,47 +226,72 @@ const ( Type_Z Type_Special = 21 ) -var Type_Special_name = map[int32]string{ - 0: "ILLEGAL", - 1: "ERROR", - 2: "CONTEXT", - 3: "TYPE", - 10: "EVENTTIME", - 22: "WINDOW", - 11: "KV", - 13: "COGBK", - 14: "WINDOWEDVALUE", - 15: "T", - 16: "U", - 17: "V", - 18: "W", - 19: "X", - 20: "Y", - 21: "Z", -} -var Type_Special_value = map[string]int32{ - "ILLEGAL": 0, - "ERROR": 1, - "CONTEXT": 2, - "TYPE": 3, - "EVENTTIME": 10, - "WINDOW": 22, - "KV": 11, - "COGBK": 13, - "WINDOWEDVALUE": 14, - "T": 15, - "U": 16, - "V": 17, - "W": 18, - "X": 19, - "Y": 20, - "Z": 21, +// Enum value maps for Type_Special. 
+var ( + Type_Special_name = map[int32]string{ + 0: "ILLEGAL", + 1: "ERROR", + 2: "CONTEXT", + 3: "TYPE", + 10: "EVENTTIME", + 22: "WINDOW", + 11: "KV", + 13: "COGBK", + 14: "WINDOWEDVALUE", + 15: "T", + 16: "U", + 17: "V", + 18: "W", + 19: "X", + 20: "Y", + 21: "Z", + } + Type_Special_value = map[string]int32{ + "ILLEGAL": 0, + "ERROR": 1, + "CONTEXT": 2, + "TYPE": 3, + "EVENTTIME": 10, + "WINDOW": 22, + "KV": 11, + "COGBK": 13, + "WINDOWEDVALUE": 14, + "T": 15, + "U": 16, + "V": 17, + "W": 18, + "X": 19, + "Y": 20, + "Z": 21, + } +) + +func (x Type_Special) Enum() *Type_Special { + p := new(Type_Special) + *p = x + return p } func (x Type_Special) String() string { - return proto.EnumName(Type_Special_name, int32(x)) + return protoimpl.X.EnumStringOf(x.Descriptor(), protoreflect.EnumNumber(x)) +} + +func (Type_Special) Descriptor() protoreflect.EnumDescriptor { + return file_v1_proto_enumTypes[2].Descriptor() +} + +func (Type_Special) Type() protoreflect.EnumType { + return &file_v1_proto_enumTypes[2] +} + +func (x Type_Special) Number() protoreflect.EnumNumber { + return protoreflect.EnumNumber(x) +} + +// Deprecated: Use Type_Special.Descriptor instead. +func (Type_Special) EnumDescriptor() ([]byte, []int) { + return file_v1_proto_rawDescGZIP(), []int{0, 2} } -func (Type_Special) EnumDescriptor() ([]byte, []int) { return fileDescriptor0, []int{0, 2} } type MultiEdge_Inbound_InputKind int32 @@ -224,215 +306,235 @@ const ( MultiEdge_Inbound_REITER MultiEdge_Inbound_InputKind = 7 ) -var MultiEdge_Inbound_InputKind_name = map[int32]string{ - 0: "INVALID", - 1: "MAIN", - 2: "SINGLETON", - 3: "SLICE", - 4: "MAP", - 5: "MULTIMAP", - 6: "ITER", - 7: "REITER", -} -var MultiEdge_Inbound_InputKind_value = map[string]int32{ - "INVALID": 0, - "MAIN": 1, - "SINGLETON": 2, - "SLICE": 3, - "MAP": 4, - "MULTIMAP": 5, - "ITER": 6, - "REITER": 7, +// Enum value maps for MultiEdge_Inbound_InputKind. +var ( + MultiEdge_Inbound_InputKind_name = map[int32]string{ + 0: "INVALID", + 1: "MAIN", + 2: "SINGLETON", + 3: "SLICE", + 4: "MAP", + 5: "MULTIMAP", + 6: "ITER", + 7: "REITER", + } + MultiEdge_Inbound_InputKind_value = map[string]int32{ + "INVALID": 0, + "MAIN": 1, + "SINGLETON": 2, + "SLICE": 3, + "MAP": 4, + "MULTIMAP": 5, + "ITER": 6, + "REITER": 7, + } +) + +func (x MultiEdge_Inbound_InputKind) Enum() *MultiEdge_Inbound_InputKind { + p := new(MultiEdge_Inbound_InputKind) + *p = x + return p } func (x MultiEdge_Inbound_InputKind) String() string { - return proto.EnumName(MultiEdge_Inbound_InputKind_name, int32(x)) + return protoimpl.X.EnumStringOf(x.Descriptor(), protoreflect.EnumNumber(x)) +} + +func (MultiEdge_Inbound_InputKind) Descriptor() protoreflect.EnumDescriptor { + return file_v1_proto_enumTypes[3].Descriptor() +} + +func (MultiEdge_Inbound_InputKind) Type() protoreflect.EnumType { + return &file_v1_proto_enumTypes[3] } + +func (x MultiEdge_Inbound_InputKind) Number() protoreflect.EnumNumber { + return protoreflect.EnumNumber(x) +} + +// Deprecated: Use MultiEdge_Inbound_InputKind.Descriptor instead. func (MultiEdge_Inbound_InputKind) EnumDescriptor() ([]byte, []int) { - return fileDescriptor0, []int{7, 0, 0} + return file_v1_proto_rawDescGZIP(), []int{7, 0, 0} } // Type represents a serializable reflect.Type. type Type struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + // (Required) Type kind. 
- Kind Type_Kind `protobuf:"varint,1,opt,name=kind,enum=v1.Type_Kind" json:"kind,omitempty"` + Kind Type_Kind `protobuf:"varint,1,opt,name=kind,proto3,enum=org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Type_Kind" json:"kind,omitempty"` // (Optional) Element type (if SLICE, PTR or CHAN) - Element *Type `protobuf:"bytes,2,opt,name=element" json:"element,omitempty"` + Element *Type `protobuf:"bytes,2,opt,name=element,proto3" json:"element,omitempty"` // (Optional) Fields (if STRUCT). - Fields []*Type_StructField `protobuf:"bytes,3,rep,name=fields" json:"fields,omitempty"` + Fields []*Type_StructField `protobuf:"bytes,3,rep,name=fields,proto3" json:"fields,omitempty"` // (Optional) Parameter types (if FUNC). - ParameterTypes []*Type `protobuf:"bytes,4,rep,name=parameter_types,json=parameterTypes" json:"parameter_types,omitempty"` + ParameterTypes []*Type `protobuf:"bytes,4,rep,name=parameter_types,json=parameterTypes,proto3" json:"parameter_types,omitempty"` // (Optional) Return types (if FUNC). - ReturnTypes []*Type `protobuf:"bytes,5,rep,name=return_types,json=returnTypes" json:"return_types,omitempty"` + ReturnTypes []*Type `protobuf:"bytes,5,rep,name=return_types,json=returnTypes,proto3" json:"return_types,omitempty"` // (Optional) Is variadic (if FUNC). - IsVariadic bool `protobuf:"varint,6,opt,name=is_variadic,json=isVariadic" json:"is_variadic,omitempty"` + IsVariadic bool `protobuf:"varint,6,opt,name=is_variadic,json=isVariadic,proto3" json:"is_variadic,omitempty"` // (Optional) Channel direction (if CHAN). - ChanDir Type_ChanDir `protobuf:"varint,7,opt,name=chan_dir,json=chanDir,enum=v1.Type_ChanDir" json:"chan_dir,omitempty"` + ChanDir Type_ChanDir `protobuf:"varint,7,opt,name=chan_dir,json=chanDir,proto3,enum=org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Type_ChanDir" json:"chan_dir,omitempty"` // (Optional) Special type (if SPECIAL) - Special Type_Special `protobuf:"varint,8,opt,name=special,enum=v1.Type_Special" json:"special,omitempty"` + Special Type_Special `protobuf:"varint,8,opt,name=special,proto3,enum=org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Type_Special" json:"special,omitempty"` // (Optional) Key for external types. // External types are types that are not directly serialized using // the types above, but rather indirectly serialized. The wire format // holds a lookup key into a registry to reify the types in a worker from a // registry. The main usage of external serialization is to preserve // methods attached to types. 
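	// Editor's note (illustrative sketch, not part of this patch): the external
	// key only resolves on a worker if the concrete type was registered ahead of
	// time. In pipeline code that is conventionally done at init, assuming the
	// standard top-level beam.RegisterType helper and a hypothetical user struct
	// myDoFn whose methods need to survive serialization:
	//
	//	func init() {
	//		beam.RegisterType(reflect.TypeOf((*myDoFn)(nil)).Elem())
	//	}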
- ExternalKey string `protobuf:"bytes,9,opt,name=external_key,json=externalKey" json:"external_key,omitempty"` + ExternalKey string `protobuf:"bytes,9,opt,name=external_key,json=externalKey,proto3" json:"external_key,omitempty"` +} + +func (x *Type) Reset() { + *x = Type{} + if protoimpl.UnsafeEnabled { + mi := &file_v1_proto_msgTypes[0] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *Type) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*Type) ProtoMessage() {} + +func (x *Type) ProtoReflect() protoreflect.Message { + mi := &file_v1_proto_msgTypes[0] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) } -func (m *Type) Reset() { *m = Type{} } -func (m *Type) String() string { return proto.CompactTextString(m) } -func (*Type) ProtoMessage() {} -func (*Type) Descriptor() ([]byte, []int) { return fileDescriptor0, []int{0} } +// Deprecated: Use Type.ProtoReflect.Descriptor instead. +func (*Type) Descriptor() ([]byte, []int) { + return file_v1_proto_rawDescGZIP(), []int{0} +} -func (m *Type) GetKind() Type_Kind { - if m != nil { - return m.Kind +func (x *Type) GetKind() Type_Kind { + if x != nil { + return x.Kind } return Type_INVALID } -func (m *Type) GetElement() *Type { - if m != nil { - return m.Element +func (x *Type) GetElement() *Type { + if x != nil { + return x.Element } return nil } -func (m *Type) GetFields() []*Type_StructField { - if m != nil { - return m.Fields +func (x *Type) GetFields() []*Type_StructField { + if x != nil { + return x.Fields } return nil } -func (m *Type) GetParameterTypes() []*Type { - if m != nil { - return m.ParameterTypes +func (x *Type) GetParameterTypes() []*Type { + if x != nil { + return x.ParameterTypes } return nil } -func (m *Type) GetReturnTypes() []*Type { - if m != nil { - return m.ReturnTypes +func (x *Type) GetReturnTypes() []*Type { + if x != nil { + return x.ReturnTypes } return nil } -func (m *Type) GetIsVariadic() bool { - if m != nil { - return m.IsVariadic +func (x *Type) GetIsVariadic() bool { + if x != nil { + return x.IsVariadic } return false } -func (m *Type) GetChanDir() Type_ChanDir { - if m != nil { - return m.ChanDir +func (x *Type) GetChanDir() Type_ChanDir { + if x != nil { + return x.ChanDir } return Type_RECV } -func (m *Type) GetSpecial() Type_Special { - if m != nil { - return m.Special +func (x *Type) GetSpecial() Type_Special { + if x != nil { + return x.Special } return Type_ILLEGAL } -func (m *Type) GetExternalKey() string { - if m != nil { - return m.ExternalKey +func (x *Type) GetExternalKey() string { + if x != nil { + return x.ExternalKey } return "" } -// StructField matches reflect.StructField. 
-type Type_StructField struct { - Name string `protobuf:"bytes,1,opt,name=name" json:"name,omitempty"` - PkgPath string `protobuf:"bytes,2,opt,name=pkg_path,json=pkgPath" json:"pkg_path,omitempty"` - Type *Type `protobuf:"bytes,3,opt,name=type" json:"type,omitempty"` - Tag string `protobuf:"bytes,4,opt,name=tag" json:"tag,omitempty"` - Offset int64 `protobuf:"varint,5,opt,name=offset" json:"offset,omitempty"` - Index []int32 `protobuf:"varint,6,rep,packed,name=index" json:"index,omitempty"` - Anonymous bool `protobuf:"varint,7,opt,name=anonymous" json:"anonymous,omitempty"` -} - -func (m *Type_StructField) Reset() { *m = Type_StructField{} } -func (m *Type_StructField) String() string { return proto.CompactTextString(m) } -func (*Type_StructField) ProtoMessage() {} -func (*Type_StructField) Descriptor() ([]byte, []int) { return fileDescriptor0, []int{0, 0} } +// FullType represents a serialized typex.FullType +type FullType struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields -func (m *Type_StructField) GetName() string { - if m != nil { - return m.Name - } - return "" + Type *Type `protobuf:"bytes,1,opt,name=type,proto3" json:"type,omitempty"` + Components []*FullType `protobuf:"bytes,2,rep,name=components,proto3" json:"components,omitempty"` } -func (m *Type_StructField) GetPkgPath() string { - if m != nil { - return m.PkgPath +func (x *FullType) Reset() { + *x = FullType{} + if protoimpl.UnsafeEnabled { + mi := &file_v1_proto_msgTypes[1] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) } - return "" } -func (m *Type_StructField) GetType() *Type { - if m != nil { - return m.Type - } - return nil +func (x *FullType) String() string { + return protoimpl.X.MessageStringOf(x) } -func (m *Type_StructField) GetTag() string { - if m != nil { - return m.Tag - } - return "" -} +func (*FullType) ProtoMessage() {} -func (m *Type_StructField) GetOffset() int64 { - if m != nil { - return m.Offset +func (x *FullType) ProtoReflect() protoreflect.Message { + mi := &file_v1_proto_msgTypes[1] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms } - return 0 + return mi.MessageOf(x) } -func (m *Type_StructField) GetIndex() []int32 { - if m != nil { - return m.Index - } - return nil +// Deprecated: Use FullType.ProtoReflect.Descriptor instead. 
+func (*FullType) Descriptor() ([]byte, []int) { + return file_v1_proto_rawDescGZIP(), []int{1} } -func (m *Type_StructField) GetAnonymous() bool { - if m != nil { - return m.Anonymous - } - return false -} - -// FullType represents a serialized typex.FullType -type FullType struct { - Type *Type `protobuf:"bytes,1,opt,name=type" json:"type,omitempty"` - Components []*FullType `protobuf:"bytes,2,rep,name=components" json:"components,omitempty"` -} - -func (m *FullType) Reset() { *m = FullType{} } -func (m *FullType) String() string { return proto.CompactTextString(m) } -func (*FullType) ProtoMessage() {} -func (*FullType) Descriptor() ([]byte, []int) { return fileDescriptor0, []int{1} } - -func (m *FullType) GetType() *Type { - if m != nil { - return m.Type +func (x *FullType) GetType() *Type { + if x != nil { + return x.Type } return nil } -func (m *FullType) GetComponents() []*FullType { - if m != nil { - return m.Components +func (x *FullType) GetComponents() []*FullType { + if x != nil { + return x.Components } return nil } @@ -441,204 +543,359 @@ func (m *FullType) GetComponents() []*FullType { // implementation is notably not serialized and must be present (and // somehow discoverable from the symbol name) on the decoding side. type UserFn struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + // (Required) Symbol name of function. - Name string `protobuf:"bytes,1,opt,name=name" json:"name,omitempty"` + Name string `protobuf:"bytes,1,opt,name=name,proto3" json:"name,omitempty"` // (Required) Function type. - Type *Type `protobuf:"bytes,2,opt,name=type" json:"type,omitempty"` + Type *Type `protobuf:"bytes,2,opt,name=type,proto3" json:"type,omitempty"` } -func (m *UserFn) Reset() { *m = UserFn{} } -func (m *UserFn) String() string { return proto.CompactTextString(m) } -func (*UserFn) ProtoMessage() {} -func (*UserFn) Descriptor() ([]byte, []int) { return fileDescriptor0, []int{2} } +func (x *UserFn) Reset() { + *x = UserFn{} + if protoimpl.UnsafeEnabled { + mi := &file_v1_proto_msgTypes[2] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *UserFn) String() string { + return protoimpl.X.MessageStringOf(x) +} -func (m *UserFn) GetName() string { - if m != nil { - return m.Name +func (*UserFn) ProtoMessage() {} + +func (x *UserFn) ProtoReflect() protoreflect.Message { + mi := &file_v1_proto_msgTypes[2] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use UserFn.ProtoReflect.Descriptor instead. +func (*UserFn) Descriptor() ([]byte, []int) { + return file_v1_proto_rawDescGZIP(), []int{2} +} + +func (x *UserFn) GetName() string { + if x != nil { + return x.Name } return "" } -func (m *UserFn) GetType() *Type { - if m != nil { - return m.Type +func (x *UserFn) GetType() *Type { + if x != nil { + return x.Type } return nil } // DynFn represents a serialized function generator. type DynFn struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + // (Required) Name of the generated function. - Name string `protobuf:"bytes,1,opt,name=name" json:"name,omitempty"` + Name string `protobuf:"bytes,1,opt,name=name,proto3" json:"name,omitempty"` // (Required) Type of the generated function. 
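	// Editor's note (illustrative sketch, not part of this patch): a UserFn only
	// carries the symbol name, so the decoding side must be able to look the
	// implementation up again. Assuming the standard top-level
	// beam.RegisterFunction helper and a hypothetical top-level DoFn named
	// extractWordsFn, pipeline code typically registers it once at init:
	//
	//	func init() {
	//		beam.RegisterFunction(extractWordsFn)
	//	}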
- Type *Type `protobuf:"bytes,2,opt,name=type" json:"type,omitempty"` + Type *Type `protobuf:"bytes,2,opt,name=type,proto3" json:"type,omitempty"` // (Required) Input to generator. Data []byte `protobuf:"bytes,3,opt,name=data,proto3" json:"data,omitempty"` // (Required) Symbol name of generator (of type []byte -> // []reflect.Value -> []reflect.Value). - Gen string `protobuf:"bytes,4,opt,name=gen" json:"gen,omitempty"` + Gen string `protobuf:"bytes,4,opt,name=gen,proto3" json:"gen,omitempty"` +} + +func (x *DynFn) Reset() { + *x = DynFn{} + if protoimpl.UnsafeEnabled { + mi := &file_v1_proto_msgTypes[3] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *DynFn) String() string { + return protoimpl.X.MessageStringOf(x) } -func (m *DynFn) Reset() { *m = DynFn{} } -func (m *DynFn) String() string { return proto.CompactTextString(m) } -func (*DynFn) ProtoMessage() {} -func (*DynFn) Descriptor() ([]byte, []int) { return fileDescriptor0, []int{3} } +func (*DynFn) ProtoMessage() {} -func (m *DynFn) GetName() string { - if m != nil { - return m.Name +func (x *DynFn) ProtoReflect() protoreflect.Message { + mi := &file_v1_proto_msgTypes[3] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use DynFn.ProtoReflect.Descriptor instead. +func (*DynFn) Descriptor() ([]byte, []int) { + return file_v1_proto_rawDescGZIP(), []int{3} +} + +func (x *DynFn) GetName() string { + if x != nil { + return x.Name } return "" } -func (m *DynFn) GetType() *Type { - if m != nil { - return m.Type +func (x *DynFn) GetType() *Type { + if x != nil { + return x.Type } return nil } -func (m *DynFn) GetData() []byte { - if m != nil { - return m.Data +func (x *DynFn) GetData() []byte { + if x != nil { + return x.Data } return nil } -func (m *DynFn) GetGen() string { - if m != nil { - return m.Gen +func (x *DynFn) GetGen() string { + if x != nil { + return x.Gen } return "" } // Fn represents a serialized function reference or struct. type Fn struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + // (Optional) Function reference. - Fn *UserFn `protobuf:"bytes,1,opt,name=fn" json:"fn,omitempty"` + Fn *UserFn `protobuf:"bytes,1,opt,name=fn,proto3" json:"fn,omitempty"` // (Optional) Struct type. - Type *Type `protobuf:"bytes,2,opt,name=type" json:"type,omitempty"` + Type *Type `protobuf:"bytes,2,opt,name=type,proto3" json:"type,omitempty"` // (Optional) JSON-serialized value, if struct. - Opt string `protobuf:"bytes,3,opt,name=opt" json:"opt,omitempty"` + Opt string `protobuf:"bytes,3,opt,name=opt,proto3" json:"opt,omitempty"` // (Optional) Function generator, if dynamic function. 
- Dynfn *DynFn `protobuf:"bytes,4,opt,name=dynfn" json:"dynfn,omitempty"` + Dynfn *DynFn `protobuf:"bytes,4,opt,name=dynfn,proto3" json:"dynfn,omitempty"` +} + +func (x *Fn) Reset() { + *x = Fn{} + if protoimpl.UnsafeEnabled { + mi := &file_v1_proto_msgTypes[4] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *Fn) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*Fn) ProtoMessage() {} + +func (x *Fn) ProtoReflect() protoreflect.Message { + mi := &file_v1_proto_msgTypes[4] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) } -func (m *Fn) Reset() { *m = Fn{} } -func (m *Fn) String() string { return proto.CompactTextString(m) } -func (*Fn) ProtoMessage() {} -func (*Fn) Descriptor() ([]byte, []int) { return fileDescriptor0, []int{4} } +// Deprecated: Use Fn.ProtoReflect.Descriptor instead. +func (*Fn) Descriptor() ([]byte, []int) { + return file_v1_proto_rawDescGZIP(), []int{4} +} -func (m *Fn) GetFn() *UserFn { - if m != nil { - return m.Fn +func (x *Fn) GetFn() *UserFn { + if x != nil { + return x.Fn } return nil } -func (m *Fn) GetType() *Type { - if m != nil { - return m.Type +func (x *Fn) GetType() *Type { + if x != nil { + return x.Type } return nil } -func (m *Fn) GetOpt() string { - if m != nil { - return m.Opt +func (x *Fn) GetOpt() string { + if x != nil { + return x.Opt } return "" } -func (m *Fn) GetDynfn() *DynFn { - if m != nil { - return m.Dynfn +func (x *Fn) GetDynfn() *DynFn { + if x != nil { + return x.Dynfn } return nil } // WindowFn represents a window fn. type WindowFn struct { - Kind string `protobuf:"bytes,1,opt,name=kind" json:"kind,omitempty"` - SizeMs int64 `protobuf:"varint,2,opt,name=size_ms,json=sizeMs" json:"size_ms,omitempty"` - PeriodMs int64 `protobuf:"varint,3,opt,name=period_ms,json=periodMs" json:"period_ms,omitempty"` - GapMs int64 `protobuf:"varint,4,opt,name=gap_ms,json=gapMs" json:"gap_ms,omitempty"` + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + Kind string `protobuf:"bytes,1,opt,name=kind,proto3" json:"kind,omitempty"` + SizeMs int64 `protobuf:"varint,2,opt,name=size_ms,json=sizeMs,proto3" json:"size_ms,omitempty"` + PeriodMs int64 `protobuf:"varint,3,opt,name=period_ms,json=periodMs,proto3" json:"period_ms,omitempty"` + GapMs int64 `protobuf:"varint,4,opt,name=gap_ms,json=gapMs,proto3" json:"gap_ms,omitempty"` +} + +func (x *WindowFn) Reset() { + *x = WindowFn{} + if protoimpl.UnsafeEnabled { + mi := &file_v1_proto_msgTypes[5] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *WindowFn) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*WindowFn) ProtoMessage() {} + +func (x *WindowFn) ProtoReflect() protoreflect.Message { + mi := &file_v1_proto_msgTypes[5] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) } -func (m *WindowFn) Reset() { *m = WindowFn{} } -func (m *WindowFn) String() string { return proto.CompactTextString(m) } -func (*WindowFn) ProtoMessage() {} -func (*WindowFn) Descriptor() ([]byte, []int) { return fileDescriptor0, []int{5} } +// Deprecated: Use WindowFn.ProtoReflect.Descriptor instead. 
+func (*WindowFn) Descriptor() ([]byte, []int) { + return file_v1_proto_rawDescGZIP(), []int{5} +} -func (m *WindowFn) GetKind() string { - if m != nil { - return m.Kind +func (x *WindowFn) GetKind() string { + if x != nil { + return x.Kind } return "" } -func (m *WindowFn) GetSizeMs() int64 { - if m != nil { - return m.SizeMs +func (x *WindowFn) GetSizeMs() int64 { + if x != nil { + return x.SizeMs } return 0 } -func (m *WindowFn) GetPeriodMs() int64 { - if m != nil { - return m.PeriodMs +func (x *WindowFn) GetPeriodMs() int64 { + if x != nil { + return x.PeriodMs } return 0 } -func (m *WindowFn) GetGapMs() int64 { - if m != nil { - return m.GapMs +func (x *WindowFn) GetGapMs() int64 { + if x != nil { + return x.GapMs } return 0 } // CustomCoder type CustomCoder struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + // (Required) Name of the coder. For informational purposes only. - Name string `protobuf:"bytes,1,opt,name=name" json:"name,omitempty"` + Name string `protobuf:"bytes,1,opt,name=name,proto3" json:"name,omitempty"` // (Required) Concrete type being coded. - Type *Type `protobuf:"bytes,2,opt,name=type" json:"type,omitempty"` + Type *Type `protobuf:"bytes,2,opt,name=type,proto3" json:"type,omitempty"` // (Required) Encoding function. - Enc *UserFn `protobuf:"bytes,3,opt,name=enc" json:"enc,omitempty"` + Enc *UserFn `protobuf:"bytes,3,opt,name=enc,proto3" json:"enc,omitempty"` // (Required) Decoding function. - Dec *UserFn `protobuf:"bytes,4,opt,name=dec" json:"dec,omitempty"` + Dec *UserFn `protobuf:"bytes,4,opt,name=dec,proto3" json:"dec,omitempty"` } -func (m *CustomCoder) Reset() { *m = CustomCoder{} } -func (m *CustomCoder) String() string { return proto.CompactTextString(m) } -func (*CustomCoder) ProtoMessage() {} -func (*CustomCoder) Descriptor() ([]byte, []int) { return fileDescriptor0, []int{6} } +func (x *CustomCoder) Reset() { + *x = CustomCoder{} + if protoimpl.UnsafeEnabled { + mi := &file_v1_proto_msgTypes[6] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *CustomCoder) String() string { + return protoimpl.X.MessageStringOf(x) +} -func (m *CustomCoder) GetName() string { - if m != nil { - return m.Name +func (*CustomCoder) ProtoMessage() {} + +func (x *CustomCoder) ProtoReflect() protoreflect.Message { + mi := &file_v1_proto_msgTypes[6] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use CustomCoder.ProtoReflect.Descriptor instead. +func (*CustomCoder) Descriptor() ([]byte, []int) { + return file_v1_proto_rawDescGZIP(), []int{6} +} + +func (x *CustomCoder) GetName() string { + if x != nil { + return x.Name } return "" } -func (m *CustomCoder) GetType() *Type { - if m != nil { - return m.Type +func (x *CustomCoder) GetType() *Type { + if x != nil { + return x.Type } return nil } -func (m *CustomCoder) GetEnc() *UserFn { - if m != nil { - return m.Enc +func (x *CustomCoder) GetEnc() *UserFn { + if x != nil { + return x.Enc } return nil } -func (m *CustomCoder) GetDec() *UserFn { - if m != nil { - return m.Dec +func (x *CustomCoder) GetDec() *UserFn { + if x != nil { + return x.Dec } return nil } @@ -646,106 +903,128 @@ func (m *CustomCoder) GetDec() *UserFn { // MultiEdge represents a partly-serialized MultiEdge. 
It does not include // node information, because runners manipulate the graph structure. type MultiEdge struct { - Fn *Fn `protobuf:"bytes,1,opt,name=fn" json:"fn,omitempty"` - Opcode string `protobuf:"bytes,4,opt,name=opcode" json:"opcode,omitempty"` - WindowFn *WindowFn `protobuf:"bytes,5,opt,name=window_fn,json=windowFn" json:"window_fn,omitempty"` - Inbound []*MultiEdge_Inbound `protobuf:"bytes,2,rep,name=inbound" json:"inbound,omitempty"` - Outbound []*MultiEdge_Outbound `protobuf:"bytes,3,rep,name=outbound" json:"outbound,omitempty"` + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + Fn *Fn `protobuf:"bytes,1,opt,name=fn,proto3" json:"fn,omitempty"` + Opcode string `protobuf:"bytes,4,opt,name=opcode,proto3" json:"opcode,omitempty"` + WindowFn *WindowFn `protobuf:"bytes,5,opt,name=window_fn,json=windowFn,proto3" json:"window_fn,omitempty"` + Inbound []*MultiEdge_Inbound `protobuf:"bytes,2,rep,name=inbound,proto3" json:"inbound,omitempty"` + Outbound []*MultiEdge_Outbound `protobuf:"bytes,3,rep,name=outbound,proto3" json:"outbound,omitempty"` +} + +func (x *MultiEdge) Reset() { + *x = MultiEdge{} + if protoimpl.UnsafeEnabled { + mi := &file_v1_proto_msgTypes[7] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *MultiEdge) String() string { + return protoimpl.X.MessageStringOf(x) } -func (m *MultiEdge) Reset() { *m = MultiEdge{} } -func (m *MultiEdge) String() string { return proto.CompactTextString(m) } -func (*MultiEdge) ProtoMessage() {} -func (*MultiEdge) Descriptor() ([]byte, []int) { return fileDescriptor0, []int{7} } +func (*MultiEdge) ProtoMessage() {} -func (m *MultiEdge) GetFn() *Fn { - if m != nil { - return m.Fn +func (x *MultiEdge) ProtoReflect() protoreflect.Message { + mi := &file_v1_proto_msgTypes[7] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use MultiEdge.ProtoReflect.Descriptor instead. 
+func (*MultiEdge) Descriptor() ([]byte, []int) { + return file_v1_proto_rawDescGZIP(), []int{7} +} + +func (x *MultiEdge) GetFn() *Fn { + if x != nil { + return x.Fn } return nil } -func (m *MultiEdge) GetOpcode() string { - if m != nil { - return m.Opcode +func (x *MultiEdge) GetOpcode() string { + if x != nil { + return x.Opcode } return "" } -func (m *MultiEdge) GetWindowFn() *WindowFn { - if m != nil { - return m.WindowFn +func (x *MultiEdge) GetWindowFn() *WindowFn { + if x != nil { + return x.WindowFn } return nil } -func (m *MultiEdge) GetInbound() []*MultiEdge_Inbound { - if m != nil { - return m.Inbound +func (x *MultiEdge) GetInbound() []*MultiEdge_Inbound { + if x != nil { + return x.Inbound } return nil } -func (m *MultiEdge) GetOutbound() []*MultiEdge_Outbound { - if m != nil { - return m.Outbound +func (x *MultiEdge) GetOutbound() []*MultiEdge_Outbound { + if x != nil { + return x.Outbound } return nil } -type MultiEdge_Inbound struct { - Kind MultiEdge_Inbound_InputKind `protobuf:"varint,1,opt,name=kind,enum=v1.MultiEdge_Inbound_InputKind" json:"kind,omitempty"` - Type *FullType `protobuf:"bytes,2,opt,name=type" json:"type,omitempty"` -} - -func (m *MultiEdge_Inbound) Reset() { *m = MultiEdge_Inbound{} } -func (m *MultiEdge_Inbound) String() string { return proto.CompactTextString(m) } -func (*MultiEdge_Inbound) ProtoMessage() {} -func (*MultiEdge_Inbound) Descriptor() ([]byte, []int) { return fileDescriptor0, []int{7, 0} } +// InjectPayload is the payload for the built-in Inject function. +type InjectPayload struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields -func (m *MultiEdge_Inbound) GetKind() MultiEdge_Inbound_InputKind { - if m != nil { - return m.Kind - } - return MultiEdge_Inbound_INVALID + N int32 `protobuf:"varint,1,opt,name=n,proto3" json:"n,omitempty"` } -func (m *MultiEdge_Inbound) GetType() *FullType { - if m != nil { - return m.Type +func (x *InjectPayload) Reset() { + *x = InjectPayload{} + if protoimpl.UnsafeEnabled { + mi := &file_v1_proto_msgTypes[8] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) } - return nil } -type MultiEdge_Outbound struct { - Type *FullType `protobuf:"bytes,1,opt,name=type" json:"type,omitempty"` +func (x *InjectPayload) String() string { + return protoimpl.X.MessageStringOf(x) } -func (m *MultiEdge_Outbound) Reset() { *m = MultiEdge_Outbound{} } -func (m *MultiEdge_Outbound) String() string { return proto.CompactTextString(m) } -func (*MultiEdge_Outbound) ProtoMessage() {} -func (*MultiEdge_Outbound) Descriptor() ([]byte, []int) { return fileDescriptor0, []int{7, 1} } +func (*InjectPayload) ProtoMessage() {} -func (m *MultiEdge_Outbound) GetType() *FullType { - if m != nil { - return m.Type +func (x *InjectPayload) ProtoReflect() protoreflect.Message { + mi := &file_v1_proto_msgTypes[8] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms } - return nil + return mi.MessageOf(x) } -// InjectPayload is the payload for the built-in Inject function. -type InjectPayload struct { - N int32 `protobuf:"varint,1,opt,name=n" json:"n,omitempty"` +// Deprecated: Use InjectPayload.ProtoReflect.Descriptor instead. 
+func (*InjectPayload) Descriptor() ([]byte, []int) { + return file_v1_proto_rawDescGZIP(), []int{8} } -func (m *InjectPayload) Reset() { *m = InjectPayload{} } -func (m *InjectPayload) String() string { return proto.CompactTextString(m) } -func (*InjectPayload) ProtoMessage() {} -func (*InjectPayload) Descriptor() ([]byte, []int) { return fileDescriptor0, []int{8} } - -func (m *InjectPayload) GetN() int32 { - if m != nil { - return m.N +func (x *InjectPayload) GetN() int32 { + if x != nil { + return x.N } return 0 } @@ -753,135 +1032,760 @@ func (m *InjectPayload) GetN() int32 { // TransformPayload represents the full payload for transforms, both // user defined and built-in. type TransformPayload struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + // urn is included here. It is also present in the model pipeline, but // not when submitting through Dataflow yet. - Urn string `protobuf:"bytes,1,opt,name=urn" json:"urn,omitempty"` - Edge *MultiEdge `protobuf:"bytes,2,opt,name=edge" json:"edge,omitempty"` - Inject *InjectPayload `protobuf:"bytes,3,opt,name=inject" json:"inject,omitempty"` + Urn string `protobuf:"bytes,1,opt,name=urn,proto3" json:"urn,omitempty"` + Edge *MultiEdge `protobuf:"bytes,2,opt,name=edge,proto3" json:"edge,omitempty"` + Inject *InjectPayload `protobuf:"bytes,3,opt,name=inject,proto3" json:"inject,omitempty"` +} + +func (x *TransformPayload) Reset() { + *x = TransformPayload{} + if protoimpl.UnsafeEnabled { + mi := &file_v1_proto_msgTypes[9] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *TransformPayload) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*TransformPayload) ProtoMessage() {} + +func (x *TransformPayload) ProtoReflect() protoreflect.Message { + mi := &file_v1_proto_msgTypes[9] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use TransformPayload.ProtoReflect.Descriptor instead. +func (*TransformPayload) Descriptor() ([]byte, []int) { + return file_v1_proto_rawDescGZIP(), []int{9} +} + +func (x *TransformPayload) GetUrn() string { + if x != nil { + return x.Urn + } + return "" +} + +func (x *TransformPayload) GetEdge() *MultiEdge { + if x != nil { + return x.Edge + } + return nil +} + +func (x *TransformPayload) GetInject() *InjectPayload { + if x != nil { + return x.Inject + } + return nil +} + +// StructField matches reflect.StructField. 
+type Type_StructField struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + Name string `protobuf:"bytes,1,opt,name=name,proto3" json:"name,omitempty"` + PkgPath string `protobuf:"bytes,2,opt,name=pkg_path,json=pkgPath,proto3" json:"pkg_path,omitempty"` + Type *Type `protobuf:"bytes,3,opt,name=type,proto3" json:"type,omitempty"` + Tag string `protobuf:"bytes,4,opt,name=tag,proto3" json:"tag,omitempty"` + Offset int64 `protobuf:"varint,5,opt,name=offset,proto3" json:"offset,omitempty"` + Index []int32 `protobuf:"varint,6,rep,packed,name=index,proto3" json:"index,omitempty"` + Anonymous bool `protobuf:"varint,7,opt,name=anonymous,proto3" json:"anonymous,omitempty"` +} + +func (x *Type_StructField) Reset() { + *x = Type_StructField{} + if protoimpl.UnsafeEnabled { + mi := &file_v1_proto_msgTypes[10] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *Type_StructField) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*Type_StructField) ProtoMessage() {} + +func (x *Type_StructField) ProtoReflect() protoreflect.Message { + mi := &file_v1_proto_msgTypes[10] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) } -func (m *TransformPayload) Reset() { *m = TransformPayload{} } -func (m *TransformPayload) String() string { return proto.CompactTextString(m) } -func (*TransformPayload) ProtoMessage() {} -func (*TransformPayload) Descriptor() ([]byte, []int) { return fileDescriptor0, []int{9} } +// Deprecated: Use Type_StructField.ProtoReflect.Descriptor instead. +func (*Type_StructField) Descriptor() ([]byte, []int) { + return file_v1_proto_rawDescGZIP(), []int{0, 0} +} -func (m *TransformPayload) GetUrn() string { - if m != nil { - return m.Urn +func (x *Type_StructField) GetName() string { + if x != nil { + return x.Name } return "" } -func (m *TransformPayload) GetEdge() *MultiEdge { - if m != nil { - return m.Edge +func (x *Type_StructField) GetPkgPath() string { + if x != nil { + return x.PkgPath + } + return "" +} + +func (x *Type_StructField) GetType() *Type { + if x != nil { + return x.Type } return nil } -func (m *TransformPayload) GetInject() *InjectPayload { - if m != nil { - return m.Inject +func (x *Type_StructField) GetTag() string { + if x != nil { + return x.Tag + } + return "" +} + +func (x *Type_StructField) GetOffset() int64 { + if x != nil { + return x.Offset + } + return 0 +} + +func (x *Type_StructField) GetIndex() []int32 { + if x != nil { + return x.Index } return nil } -func init() { - proto.RegisterType((*Type)(nil), "v1.Type") - proto.RegisterType((*Type_StructField)(nil), "v1.Type.StructField") - proto.RegisterType((*FullType)(nil), "v1.FullType") - proto.RegisterType((*UserFn)(nil), "v1.UserFn") - proto.RegisterType((*DynFn)(nil), "v1.DynFn") - proto.RegisterType((*Fn)(nil), "v1.Fn") - proto.RegisterType((*WindowFn)(nil), "v1.WindowFn") - proto.RegisterType((*CustomCoder)(nil), "v1.CustomCoder") - proto.RegisterType((*MultiEdge)(nil), "v1.MultiEdge") - proto.RegisterType((*MultiEdge_Inbound)(nil), "v1.MultiEdge.Inbound") - proto.RegisterType((*MultiEdge_Outbound)(nil), "v1.MultiEdge.Outbound") - proto.RegisterType((*InjectPayload)(nil), "v1.InjectPayload") - proto.RegisterType((*TransformPayload)(nil), "v1.TransformPayload") - proto.RegisterEnum("v1.Type_Kind", Type_Kind_name, 
Type_Kind_value) - proto.RegisterEnum("v1.Type_ChanDir", Type_ChanDir_name, Type_ChanDir_value) - proto.RegisterEnum("v1.Type_Special", Type_Special_name, Type_Special_value) - proto.RegisterEnum("v1.MultiEdge_Inbound_InputKind", MultiEdge_Inbound_InputKind_name, MultiEdge_Inbound_InputKind_value) -} - -func init() { proto.RegisterFile("v1.proto", fileDescriptor0) } - -var fileDescriptor0 = []byte{ - // 1180 bytes of a gzipped FileDescriptorProto - 0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0xff, 0x9c, 0x56, 0xdf, 0x72, 0xda, 0xc6, - 0x17, 0x8e, 0x10, 0xe8, 0xcf, 0x01, 0x9c, 0xcd, 0xfe, 0x6c, 0x47, 0xf1, 0x2f, 0x9d, 0x10, 0xdd, - 0x94, 0x34, 0x19, 0x77, 0x8c, 0x33, 0x9e, 0x4e, 0xef, 0x08, 0x96, 0x1d, 0x8d, 0x41, 0x30, 0x8b, - 0xc0, 0x49, 0x6f, 0x18, 0x05, 0x2d, 0x58, 0x35, 0xac, 0x54, 0x49, 0x38, 0xa1, 0x7d, 0x84, 0xbe, - 0x49, 0x9f, 0xa0, 0x6f, 0xd0, 0x8b, 0x3e, 0x54, 0x3b, 0xbb, 0x92, 0x88, 0x71, 0xdd, 0xe9, 0x4c, - 0xae, 0xf6, 0xec, 0xf9, 0xbe, 0xb3, 0x67, 0xf7, 0xdb, 0xb3, 0x47, 0x02, 0xed, 0xe6, 0xe8, 0x30, - 0x8a, 0xc3, 0x34, 0xc4, 0xa5, 0x9b, 0x23, 0xf3, 0x57, 0x0d, 0xca, 0xee, 0x3a, 0xa2, 0xf8, 0x39, - 0x94, 0xaf, 0x03, 0xe6, 0x1b, 0x52, 0x43, 0x6a, 0xee, 0xb4, 0xea, 0x87, 0x37, 0x47, 0x87, 0xdc, - 0x7f, 0x78, 0x11, 0x30, 0x9f, 0x08, 0x08, 0x9b, 0xa0, 0xd2, 0x05, 0x5d, 0x52, 0x96, 0x1a, 0xa5, - 0x86, 0xd4, 0xac, 0xb6, 0xb4, 0x82, 0x45, 0x0a, 0x00, 0xbf, 0x02, 0x65, 0x16, 0xd0, 0x85, 0x9f, - 0x18, 0x72, 0x43, 0x6e, 0x56, 0x5b, 0xbb, 0x9b, 0x85, 0x86, 0x69, 0xbc, 0x9a, 0xa6, 0x67, 0x1c, - 0x24, 0x39, 0x07, 0x1f, 0xc1, 0xc3, 0xc8, 0x8b, 0xbd, 0x25, 0x4d, 0x69, 0x3c, 0x49, 0xd7, 0x11, - 0x4d, 0x8c, 0xb2, 0x08, 0xfb, 0xbc, 0xf2, 0xce, 0x86, 0xc0, 0xa7, 0x09, 0x7e, 0x09, 0xb5, 0x98, - 0xa6, 0xab, 0x98, 0xe5, 0xfc, 0xca, 0x1d, 0x7e, 0x35, 0x43, 0x33, 0xf2, 0x33, 0xa8, 0x06, 0xc9, - 0xe4, 0xc6, 0x8b, 0x03, 0xcf, 0x0f, 0xa6, 0x86, 0xd2, 0x90, 0x9a, 0x1a, 0x81, 0x20, 0x19, 0xe7, - 0x1e, 0xfc, 0x12, 0xb4, 0xe9, 0x95, 0xc7, 0x26, 0x7e, 0x10, 0x1b, 0xaa, 0x38, 0x39, 0xda, 0x6c, - 0xb8, 0x73, 0xe5, 0xb1, 0xd3, 0x20, 0x26, 0xea, 0x34, 0x33, 0xf0, 0x37, 0xa0, 0x26, 0x11, 0x9d, - 0x06, 0xde, 0xc2, 0xd0, 0xee, 0x70, 0x87, 0x99, 0x9f, 0x14, 0x04, 0xfc, 0x1c, 0x6a, 0xf4, 0x53, - 0x4a, 0x63, 0xe6, 0x2d, 0x26, 0xd7, 0x74, 0x6d, 0xe8, 0x0d, 0xa9, 0xa9, 0x93, 0x6a, 0xe1, 0xbb, - 0xa0, 0xeb, 0x83, 0xdf, 0x25, 0xa8, 0xde, 0x12, 0x05, 0x63, 0x28, 0x33, 0x6f, 0x49, 0xc5, 0x0d, - 0xe8, 0x44, 0xd8, 0xf8, 0x09, 0x68, 0xd1, 0xf5, 0x7c, 0x12, 0x79, 0xe9, 0x95, 0xd0, 0x5c, 0x27, - 0x6a, 0x74, 0x3d, 0x1f, 0x78, 0xe9, 0x15, 0x7e, 0x0a, 0x65, 0xae, 0x80, 0x21, 0xdf, 0xb9, 0x0a, - 0xe1, 0xc5, 0x08, 0xe4, 0xd4, 0x9b, 0x1b, 0x65, 0x11, 0xc3, 0x4d, 0xbc, 0x0f, 0x4a, 0x38, 0x9b, - 0x25, 0x34, 0x35, 0x2a, 0x0d, 0xa9, 0x29, 0x93, 0x7c, 0x86, 0x77, 0xa1, 0x12, 0x30, 0x9f, 0x7e, - 0x32, 0x94, 0x86, 0xdc, 0xac, 0x90, 0x6c, 0x82, 0x9f, 0x82, 0xee, 0xb1, 0x90, 0xad, 0x97, 0xe1, - 0x2a, 0x11, 0xca, 0x68, 0xe4, 0xb3, 0xc3, 0xfc, 0x4b, 0x82, 0x32, 0x2f, 0x0c, 0x5c, 0x05, 0xd5, - 0x76, 0xc6, 0xed, 0xae, 0x7d, 0x8a, 0x1e, 0x60, 0x0d, 0xca, 0x6f, 0xfa, 0xfd, 0x2e, 0x92, 0xb0, - 0x0a, 0xb2, 0xed, 0xb8, 0xa8, 0xc4, 0x5d, 0xb6, 0xe3, 0x7e, 0x87, 0x64, 0xac, 0x43, 0xc5, 0x76, - 0xdc, 0xa3, 0x13, 0x54, 0xce, 0xcd, 0xe3, 0x16, 0xaa, 0xe4, 0xe6, 0xc9, 0x6b, 0xa4, 0x70, 0xea, - 0x88, 0x07, 0xa9, 0xdc, 0x39, 0x12, 0x51, 0x1a, 0x06, 0x50, 0x46, 0x59, 0x98, 0x5e, 0xd8, 0xc7, - 0x2d, 0x04, 0x85, 0x7d, 0xf2, 0x1a, 0x55, 0xb9, 0x3d, 0x74, 0x89, 0xed, 0x9c, 0xa3, 0x1a, 0xdf, - 0xcf, 0x59, 0xb7, 0xdf, 0xe6, 0xa4, 0xfa, 0x66, 0x72, 0xf2, 
0x1a, 0xed, 0xf0, 0x45, 0x87, 0x5d, - 0xbb, 0x63, 0xa1, 0xdd, 0x3c, 0x60, 0xd4, 0x71, 0xd1, 0x1e, 0xcf, 0x7a, 0x36, 0x72, 0x3a, 0x68, - 0x9f, 0x5b, 0x9d, 0xb7, 0x6d, 0x07, 0x3d, 0xe6, 0xbb, 0x1f, 0xb8, 0x04, 0x19, 0x7c, 0x81, 0xe1, - 0xc0, 0xea, 0xd8, 0xed, 0x2e, 0x7a, 0x82, 0x6b, 0xa0, 0x59, 0xef, 0x5c, 0x8b, 0x38, 0xed, 0x2e, - 0x3a, 0x30, 0xbf, 0x06, 0x35, 0xaf, 0x0f, 0x1e, 0x48, 0xac, 0xce, 0x38, 0x13, 0x60, 0x68, 0x39, - 0xa7, 0x48, 0xca, 0xa4, 0x70, 0xdf, 0xa2, 0x92, 0xf9, 0x9b, 0x04, 0x6a, 0x5e, 0x1d, 0x42, 0xad, - 0x6e, 0xd7, 0x3a, 0x6f, 0x77, 0xd1, 0x03, 0xbe, 0x21, 0x8b, 0x90, 0x3e, 0x41, 0x12, 0xf7, 0x77, - 0xfa, 0x8e, 0x6b, 0xbd, 0xcb, 0x25, 0x73, 0xdf, 0x0f, 0x2c, 0x24, 0xe3, 0x3a, 0xe8, 0xd6, 0xd8, - 0x72, 0x5c, 0xd7, 0xee, 0x59, 0xd9, 0x99, 0x2f, 0x6d, 0xe7, 0xb4, 0x7f, 0x89, 0xf6, 0xb1, 0x02, - 0xa5, 0x8b, 0x31, 0xaa, 0xf2, 0x45, 0x3a, 0xfd, 0xf3, 0x37, 0x17, 0xa8, 0x8e, 0x1f, 0x41, 0x3d, - 0x83, 0xad, 0xd3, 0x71, 0xbb, 0x3b, 0xb2, 0xd0, 0x0e, 0xae, 0x80, 0xe4, 0xa2, 0x87, 0x7c, 0x18, - 0x21, 0xc4, 0x87, 0x31, 0x7a, 0xc4, 0x87, 0x4b, 0x84, 0xf9, 0xf0, 0x0e, 0xfd, 0x8f, 0x0f, 0xef, - 0xd1, 0x2e, 0x1f, 0x7e, 0x40, 0x7b, 0xe6, 0x18, 0xb4, 0xb3, 0xd5, 0x62, 0x21, 0x1a, 0x42, 0x51, - 0x5f, 0xd2, 0xbd, 0xf5, 0xf5, 0x0a, 0x60, 0x1a, 0x2e, 0xa3, 0x90, 0x51, 0x96, 0x26, 0x46, 0x49, - 0x3c, 0xc2, 0x1a, 0xe7, 0x14, 0xf1, 0xe4, 0x16, 0x6e, 0x7e, 0x0f, 0xca, 0x28, 0xa1, 0xf1, 0x19, - 0xbb, 0xb7, 0xc8, 0x8b, 0x4c, 0xa5, 0xfb, 0x32, 0x99, 0x13, 0xa8, 0x9c, 0xae, 0xd9, 0x97, 0x84, - 0xf2, 0x08, 0xdf, 0x4b, 0x3d, 0xf1, 0x44, 0x6a, 0x44, 0xd8, 0xfc, 0x61, 0xcc, 0x29, 0x2b, 0x1e, - 0xc6, 0x9c, 0x32, 0xf3, 0x27, 0x28, 0x9d, 0x31, 0x7c, 0x00, 0xa5, 0x19, 0xcb, 0x0f, 0x0b, 0x7c, - 0x9d, 0x6c, 0xc3, 0xa4, 0x34, 0x63, 0xff, 0x91, 0x05, 0x81, 0x1c, 0x46, 0xa9, 0x48, 0xa2, 0x13, - 0x6e, 0xe2, 0x67, 0x50, 0xf1, 0xd7, 0x6c, 0x96, 0x65, 0xa9, 0xb6, 0x74, 0x1e, 0x20, 0xce, 0x40, - 0x32, 0xbf, 0x79, 0x0d, 0xda, 0x65, 0xc0, 0xfc, 0xf0, 0x63, 0x76, 0xac, 0x4d, 0xe3, 0xd5, 0xf3, - 0x4e, 0xfb, 0x18, 0xd4, 0x24, 0xf8, 0x99, 0x4e, 0x96, 0x89, 0xc8, 0x29, 0x13, 0x85, 0x4f, 0x7b, - 0x09, 0xfe, 0x3f, 0xe8, 0x11, 0x8d, 0x83, 0xd0, 0xe7, 0x90, 0x2c, 0x20, 0x2d, 0x73, 0xf4, 0x12, - 0xbc, 0x07, 0xca, 0xdc, 0x8b, 0x38, 0x52, 0x16, 0x48, 0x65, 0xee, 0x45, 0xbd, 0xc4, 0xfc, 0x05, - 0xaa, 0x9d, 0x55, 0x92, 0x86, 0xcb, 0x4e, 0xe8, 0xd3, 0xf8, 0x0b, 0x64, 0x7c, 0x0a, 0x32, 0x65, - 0xd3, 0xbc, 0xd1, 0xdc, 0xd6, 0x86, 0xbb, 0x39, 0xea, 0xd3, 0x69, 0x7e, 0xd4, 0x2d, 0xd4, 0xa7, - 0x53, 0xf3, 0x0f, 0x19, 0xf4, 0xde, 0x6a, 0x91, 0x06, 0x96, 0x3f, 0xa7, 0x78, 0xff, 0x96, 0xc8, - 0x8a, 0xa8, 0x96, 0x4c, 0x60, 0xde, 0x9b, 0xa2, 0x69, 0xe8, 0xd3, 0xfc, 0x5e, 0xf2, 0x19, 0x7e, - 0x01, 0xfa, 0x47, 0xa1, 0xd3, 0x64, 0xc6, 0x44, 0xdb, 0xca, 0x8b, 0xac, 0x10, 0x8f, 0x68, 0x1f, - 0x0b, 0x19, 0xbf, 0x05, 0x35, 0x60, 0x1f, 0xc2, 0x15, 0xf3, 0xf3, 0x6a, 0xdc, 0xe3, 0xc4, 0x4d, - 0xea, 0x43, 0x3b, 0x03, 0x49, 0xc1, 0xc2, 0x2d, 0xd0, 0xc2, 0x55, 0x9a, 0x45, 0x64, 0xdf, 0xaa, - 0xfd, 0xed, 0x88, 0x7e, 0x8e, 0x92, 0x0d, 0xef, 0xe0, 0x4f, 0x09, 0xd4, 0x7c, 0x21, 0x7c, 0xbc, - 0xf5, 0xc1, 0x7c, 0x76, 0x6f, 0xb6, 0x43, 0x9b, 0x45, 0xab, 0xf4, 0xd6, 0x27, 0xb4, 0xb1, 0x25, - 0xf4, 0xf6, 0x83, 0xc9, 0xca, 0x3d, 0x00, 0x7d, 0x13, 0xf4, 0x8f, 0xf6, 0xda, 0x6b, 0xdb, 0x0e, - 0x92, 0x78, 0x63, 0x18, 0xda, 0xce, 0x79, 0xd7, 0x72, 0xfb, 0x0e, 0x2a, 0x7d, 0x6e, 0x6d, 0x32, - 0x6f, 0x5d, 0xbd, 0xf6, 0x00, 0x95, 0x79, 0xb7, 0xea, 0x8d, 0xba, 0xae, 0xcd, 0x67, 0x15, 0xd1, - 0x86, 0x5d, 0x8b, 0x20, 0x85, 0x37, 0x11, 0x62, 0x09, 0x5b, 0x3d, 0x78, 0x05, 0x5a, 
0x71, 0xc6, - 0xcd, 0xc6, 0xa4, 0x7f, 0xdd, 0xd8, 0x57, 0x50, 0xb7, 0xd9, 0x8f, 0x74, 0x9a, 0x0e, 0xbc, 0xf5, - 0x22, 0xf4, 0x7c, 0x5c, 0x03, 0x29, 0xbb, 0xcb, 0x0a, 0x91, 0x98, 0x19, 0x03, 0x72, 0x63, 0x8f, - 0x25, 0xb3, 0x30, 0x5e, 0x16, 0x0c, 0x04, 0xf2, 0x2a, 0x66, 0x79, 0xa5, 0x71, 0x93, 0xff, 0x65, - 0x50, 0x7f, 0x5e, 0x9c, 0xbf, 0xbe, 0x25, 0x1a, 0x11, 0x10, 0x7e, 0x01, 0x4a, 0x20, 0xf2, 0xe4, - 0x05, 0xf7, 0x88, 0x93, 0xb6, 0x32, 0x93, 0x9c, 0xf0, 0x41, 0x11, 0xff, 0x31, 0xc7, 0x7f, 0x07, - 0x00, 0x00, 0xff, 0xff, 0xb5, 0x6b, 0xf5, 0xe3, 0xd3, 0x08, 0x00, 0x00, +func (x *Type_StructField) GetAnonymous() bool { + if x != nil { + return x.Anonymous + } + return false +} + +type MultiEdge_Inbound struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + Kind MultiEdge_Inbound_InputKind `protobuf:"varint,1,opt,name=kind,proto3,enum=org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.MultiEdge_Inbound_InputKind" json:"kind,omitempty"` + Type *FullType `protobuf:"bytes,2,opt,name=type,proto3" json:"type,omitempty"` +} + +func (x *MultiEdge_Inbound) Reset() { + *x = MultiEdge_Inbound{} + if protoimpl.UnsafeEnabled { + mi := &file_v1_proto_msgTypes[11] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *MultiEdge_Inbound) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*MultiEdge_Inbound) ProtoMessage() {} + +func (x *MultiEdge_Inbound) ProtoReflect() protoreflect.Message { + mi := &file_v1_proto_msgTypes[11] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use MultiEdge_Inbound.ProtoReflect.Descriptor instead. +func (*MultiEdge_Inbound) Descriptor() ([]byte, []int) { + return file_v1_proto_rawDescGZIP(), []int{7, 0} +} + +func (x *MultiEdge_Inbound) GetKind() MultiEdge_Inbound_InputKind { + if x != nil { + return x.Kind + } + return MultiEdge_Inbound_INVALID +} + +func (x *MultiEdge_Inbound) GetType() *FullType { + if x != nil { + return x.Type + } + return nil +} + +type MultiEdge_Outbound struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + Type *FullType `protobuf:"bytes,1,opt,name=type,proto3" json:"type,omitempty"` +} + +func (x *MultiEdge_Outbound) Reset() { + *x = MultiEdge_Outbound{} + if protoimpl.UnsafeEnabled { + mi := &file_v1_proto_msgTypes[12] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *MultiEdge_Outbound) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*MultiEdge_Outbound) ProtoMessage() {} + +func (x *MultiEdge_Outbound) ProtoReflect() protoreflect.Message { + mi := &file_v1_proto_msgTypes[12] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use MultiEdge_Outbound.ProtoReflect.Descriptor instead. 
+func (*MultiEdge_Outbound) Descriptor() ([]byte, []int) { + return file_v1_proto_rawDescGZIP(), []int{7, 1} +} + +func (x *MultiEdge_Outbound) GetType() *FullType { + if x != nil { + return x.Type + } + return nil +} + +var File_v1_proto protoreflect.FileDescriptor + +var file_v1_proto_rawDesc = []byte{ + 0x0a, 0x08, 0x76, 0x31, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x12, 0x37, 0x6f, 0x72, 0x67, 0x2e, + 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x73, 0x64, 0x6b, 0x73, + 0x2e, 0x67, 0x6f, 0x2e, 0x70, 0x6b, 0x67, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x63, 0x6f, 0x72, + 0x65, 0x2e, 0x72, 0x75, 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x2e, 0x67, 0x72, 0x61, 0x70, 0x68, 0x78, + 0x2e, 0x76, 0x31, 0x22, 0xb3, 0x0b, 0x0a, 0x04, 0x54, 0x79, 0x70, 0x65, 0x12, 0x56, 0x0a, 0x04, + 0x6b, 0x69, 0x6e, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0e, 0x32, 0x42, 0x2e, 0x6f, 0x72, 0x67, + 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x73, 0x64, 0x6b, + 0x73, 0x2e, 0x67, 0x6f, 0x2e, 0x70, 0x6b, 0x67, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x63, 0x6f, + 0x72, 0x65, 0x2e, 0x72, 0x75, 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x2e, 0x67, 0x72, 0x61, 0x70, 0x68, + 0x78, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x79, 0x70, 0x65, 0x2e, 0x4b, 0x69, 0x6e, 0x64, 0x52, 0x04, + 0x6b, 0x69, 0x6e, 0x64, 0x12, 0x57, 0x0a, 0x07, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x18, + 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x3d, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, + 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x73, 0x64, 0x6b, 0x73, 0x2e, 0x67, 0x6f, 0x2e, + 0x70, 0x6b, 0x67, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x63, 0x6f, 0x72, 0x65, 0x2e, 0x72, 0x75, + 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x2e, 0x67, 0x72, 0x61, 0x70, 0x68, 0x78, 0x2e, 0x76, 0x31, 0x2e, + 0x54, 0x79, 0x70, 0x65, 0x52, 0x07, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x12, 0x61, 0x0a, + 0x06, 0x66, 0x69, 0x65, 0x6c, 0x64, 0x73, 0x18, 0x03, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x49, 0x2e, + 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, + 0x73, 0x64, 0x6b, 0x73, 0x2e, 0x67, 0x6f, 0x2e, 0x70, 0x6b, 0x67, 0x2e, 0x62, 0x65, 0x61, 0x6d, + 0x2e, 0x63, 0x6f, 0x72, 0x65, 0x2e, 0x72, 0x75, 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x2e, 0x67, 0x72, + 0x61, 0x70, 0x68, 0x78, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x79, 0x70, 0x65, 0x2e, 0x53, 0x74, 0x72, + 0x75, 0x63, 0x74, 0x46, 0x69, 0x65, 0x6c, 0x64, 0x52, 0x06, 0x66, 0x69, 0x65, 0x6c, 0x64, 0x73, + 0x12, 0x66, 0x0a, 0x0f, 0x70, 0x61, 0x72, 0x61, 0x6d, 0x65, 0x74, 0x65, 0x72, 0x5f, 0x74, 0x79, + 0x70, 0x65, 0x73, 0x18, 0x04, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x3d, 0x2e, 0x6f, 0x72, 0x67, 0x2e, + 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x73, 0x64, 0x6b, 0x73, + 0x2e, 0x67, 0x6f, 0x2e, 0x70, 0x6b, 0x67, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x63, 0x6f, 0x72, + 0x65, 0x2e, 0x72, 0x75, 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x2e, 0x67, 0x72, 0x61, 0x70, 0x68, 0x78, + 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x79, 0x70, 0x65, 0x52, 0x0e, 0x70, 0x61, 0x72, 0x61, 0x6d, 0x65, + 0x74, 0x65, 0x72, 0x54, 0x79, 0x70, 0x65, 0x73, 0x12, 0x60, 0x0a, 0x0c, 0x72, 0x65, 0x74, 0x75, + 0x72, 0x6e, 0x5f, 0x74, 0x79, 0x70, 0x65, 0x73, 0x18, 0x05, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x3d, + 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, + 0x2e, 0x73, 0x64, 0x6b, 0x73, 0x2e, 0x67, 0x6f, 0x2e, 0x70, 0x6b, 0x67, 0x2e, 0x62, 0x65, 0x61, + 0x6d, 0x2e, 0x63, 0x6f, 0x72, 0x65, 0x2e, 0x72, 0x75, 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x2e, 0x67, + 0x72, 0x61, 
0x70, 0x68, 0x78, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x79, 0x70, 0x65, 0x52, 0x0b, 0x72, + 0x65, 0x74, 0x75, 0x72, 0x6e, 0x54, 0x79, 0x70, 0x65, 0x73, 0x12, 0x1f, 0x0a, 0x0b, 0x69, 0x73, + 0x5f, 0x76, 0x61, 0x72, 0x69, 0x61, 0x64, 0x69, 0x63, 0x18, 0x06, 0x20, 0x01, 0x28, 0x08, 0x52, + 0x0a, 0x69, 0x73, 0x56, 0x61, 0x72, 0x69, 0x61, 0x64, 0x69, 0x63, 0x12, 0x60, 0x0a, 0x08, 0x63, + 0x68, 0x61, 0x6e, 0x5f, 0x64, 0x69, 0x72, 0x18, 0x07, 0x20, 0x01, 0x28, 0x0e, 0x32, 0x45, 0x2e, + 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, + 0x73, 0x64, 0x6b, 0x73, 0x2e, 0x67, 0x6f, 0x2e, 0x70, 0x6b, 0x67, 0x2e, 0x62, 0x65, 0x61, 0x6d, + 0x2e, 0x63, 0x6f, 0x72, 0x65, 0x2e, 0x72, 0x75, 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x2e, 0x67, 0x72, + 0x61, 0x70, 0x68, 0x78, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x79, 0x70, 0x65, 0x2e, 0x43, 0x68, 0x61, + 0x6e, 0x44, 0x69, 0x72, 0x52, 0x07, 0x63, 0x68, 0x61, 0x6e, 0x44, 0x69, 0x72, 0x12, 0x5f, 0x0a, + 0x07, 0x73, 0x70, 0x65, 0x63, 0x69, 0x61, 0x6c, 0x18, 0x08, 0x20, 0x01, 0x28, 0x0e, 0x32, 0x45, + 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, + 0x2e, 0x73, 0x64, 0x6b, 0x73, 0x2e, 0x67, 0x6f, 0x2e, 0x70, 0x6b, 0x67, 0x2e, 0x62, 0x65, 0x61, + 0x6d, 0x2e, 0x63, 0x6f, 0x72, 0x65, 0x2e, 0x72, 0x75, 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x2e, 0x67, + 0x72, 0x61, 0x70, 0x68, 0x78, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x79, 0x70, 0x65, 0x2e, 0x53, 0x70, + 0x65, 0x63, 0x69, 0x61, 0x6c, 0x52, 0x07, 0x73, 0x70, 0x65, 0x63, 0x69, 0x61, 0x6c, 0x12, 0x21, + 0x0a, 0x0c, 0x65, 0x78, 0x74, 0x65, 0x72, 0x6e, 0x61, 0x6c, 0x5f, 0x6b, 0x65, 0x79, 0x18, 0x09, + 0x20, 0x01, 0x28, 0x09, 0x52, 0x0b, 0x65, 0x78, 0x74, 0x65, 0x72, 0x6e, 0x61, 0x6c, 0x4b, 0x65, + 0x79, 0x1a, 0xed, 0x01, 0x0a, 0x0b, 0x53, 0x74, 0x72, 0x75, 0x63, 0x74, 0x46, 0x69, 0x65, 0x6c, + 0x64, 0x12, 0x12, 0x0a, 0x04, 0x6e, 0x61, 0x6d, 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, + 0x04, 0x6e, 0x61, 0x6d, 0x65, 0x12, 0x19, 0x0a, 0x08, 0x70, 0x6b, 0x67, 0x5f, 0x70, 0x61, 0x74, + 0x68, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x07, 0x70, 0x6b, 0x67, 0x50, 0x61, 0x74, 0x68, + 0x12, 0x51, 0x0a, 0x04, 0x74, 0x79, 0x70, 0x65, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x3d, + 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, + 0x2e, 0x73, 0x64, 0x6b, 0x73, 0x2e, 0x67, 0x6f, 0x2e, 0x70, 0x6b, 0x67, 0x2e, 0x62, 0x65, 0x61, + 0x6d, 0x2e, 0x63, 0x6f, 0x72, 0x65, 0x2e, 0x72, 0x75, 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x2e, 0x67, + 0x72, 0x61, 0x70, 0x68, 0x78, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x79, 0x70, 0x65, 0x52, 0x04, 0x74, + 0x79, 0x70, 0x65, 0x12, 0x10, 0x0a, 0x03, 0x74, 0x61, 0x67, 0x18, 0x04, 0x20, 0x01, 0x28, 0x09, + 0x52, 0x03, 0x74, 0x61, 0x67, 0x12, 0x16, 0x0a, 0x06, 0x6f, 0x66, 0x66, 0x73, 0x65, 0x74, 0x18, + 0x05, 0x20, 0x01, 0x28, 0x03, 0x52, 0x06, 0x6f, 0x66, 0x66, 0x73, 0x65, 0x74, 0x12, 0x14, 0x0a, + 0x05, 0x69, 0x6e, 0x64, 0x65, 0x78, 0x18, 0x06, 0x20, 0x03, 0x28, 0x05, 0x52, 0x05, 0x69, 0x6e, + 0x64, 0x65, 0x78, 0x12, 0x1c, 0x0a, 0x09, 0x61, 0x6e, 0x6f, 0x6e, 0x79, 0x6d, 0x6f, 0x75, 0x73, + 0x18, 0x07, 0x20, 0x01, 0x28, 0x08, 0x52, 0x09, 0x61, 0x6e, 0x6f, 0x6e, 0x79, 0x6d, 0x6f, 0x75, + 0x73, 0x22, 0xff, 0x01, 0x0a, 0x04, 0x4b, 0x69, 0x6e, 0x64, 0x12, 0x0b, 0x0a, 0x07, 0x49, 0x4e, + 0x56, 0x41, 0x4c, 0x49, 0x44, 0x10, 0x00, 0x12, 0x08, 0x0a, 0x04, 0x42, 0x4f, 0x4f, 0x4c, 0x10, + 0x01, 0x12, 0x07, 0x0a, 0x03, 0x49, 0x4e, 0x54, 0x10, 0x02, 0x12, 0x08, 0x0a, 0x04, 0x49, 0x4e, + 0x54, 0x38, 0x10, 0x03, 0x12, 0x09, 
0x0a, 0x05, 0x49, 0x4e, 0x54, 0x31, 0x36, 0x10, 0x04, 0x12, + 0x09, 0x0a, 0x05, 0x49, 0x4e, 0x54, 0x33, 0x32, 0x10, 0x05, 0x12, 0x09, 0x0a, 0x05, 0x49, 0x4e, + 0x54, 0x36, 0x34, 0x10, 0x06, 0x12, 0x08, 0x0a, 0x04, 0x55, 0x49, 0x4e, 0x54, 0x10, 0x07, 0x12, + 0x09, 0x0a, 0x05, 0x55, 0x49, 0x4e, 0x54, 0x38, 0x10, 0x08, 0x12, 0x0a, 0x0a, 0x06, 0x55, 0x49, + 0x4e, 0x54, 0x31, 0x36, 0x10, 0x09, 0x12, 0x0a, 0x0a, 0x06, 0x55, 0x49, 0x4e, 0x54, 0x33, 0x32, + 0x10, 0x0a, 0x12, 0x0a, 0x0a, 0x06, 0x55, 0x49, 0x4e, 0x54, 0x36, 0x34, 0x10, 0x0b, 0x12, 0x0a, + 0x0a, 0x06, 0x53, 0x54, 0x52, 0x49, 0x4e, 0x47, 0x10, 0x0c, 0x12, 0x0b, 0x0a, 0x07, 0x46, 0x4c, + 0x4f, 0x41, 0x54, 0x33, 0x32, 0x10, 0x0d, 0x12, 0x0b, 0x0a, 0x07, 0x46, 0x4c, 0x4f, 0x41, 0x54, + 0x36, 0x34, 0x10, 0x0e, 0x12, 0x09, 0x0a, 0x05, 0x53, 0x4c, 0x49, 0x43, 0x45, 0x10, 0x14, 0x12, + 0x0a, 0x0a, 0x06, 0x53, 0x54, 0x52, 0x55, 0x43, 0x54, 0x10, 0x15, 0x12, 0x08, 0x0a, 0x04, 0x46, + 0x55, 0x4e, 0x43, 0x10, 0x16, 0x12, 0x08, 0x0a, 0x04, 0x43, 0x48, 0x41, 0x4e, 0x10, 0x17, 0x12, + 0x07, 0x0a, 0x03, 0x50, 0x54, 0x52, 0x10, 0x18, 0x12, 0x0b, 0x0a, 0x07, 0x53, 0x50, 0x45, 0x43, + 0x49, 0x41, 0x4c, 0x10, 0x19, 0x12, 0x0c, 0x0a, 0x08, 0x45, 0x58, 0x54, 0x45, 0x52, 0x4e, 0x41, + 0x4c, 0x10, 0x1a, 0x22, 0x27, 0x0a, 0x07, 0x43, 0x68, 0x61, 0x6e, 0x44, 0x69, 0x72, 0x12, 0x08, + 0x0a, 0x04, 0x52, 0x45, 0x43, 0x56, 0x10, 0x00, 0x12, 0x08, 0x0a, 0x04, 0x53, 0x45, 0x4e, 0x44, + 0x10, 0x01, 0x12, 0x08, 0x0a, 0x04, 0x42, 0x4f, 0x54, 0x48, 0x10, 0x02, 0x22, 0xaa, 0x01, 0x0a, + 0x07, 0x53, 0x70, 0x65, 0x63, 0x69, 0x61, 0x6c, 0x12, 0x0b, 0x0a, 0x07, 0x49, 0x4c, 0x4c, 0x45, + 0x47, 0x41, 0x4c, 0x10, 0x00, 0x12, 0x09, 0x0a, 0x05, 0x45, 0x52, 0x52, 0x4f, 0x52, 0x10, 0x01, + 0x12, 0x0b, 0x0a, 0x07, 0x43, 0x4f, 0x4e, 0x54, 0x45, 0x58, 0x54, 0x10, 0x02, 0x12, 0x08, 0x0a, + 0x04, 0x54, 0x59, 0x50, 0x45, 0x10, 0x03, 0x12, 0x0d, 0x0a, 0x09, 0x45, 0x56, 0x45, 0x4e, 0x54, + 0x54, 0x49, 0x4d, 0x45, 0x10, 0x0a, 0x12, 0x0a, 0x0a, 0x06, 0x57, 0x49, 0x4e, 0x44, 0x4f, 0x57, + 0x10, 0x16, 0x12, 0x06, 0x0a, 0x02, 0x4b, 0x56, 0x10, 0x0b, 0x12, 0x09, 0x0a, 0x05, 0x43, 0x4f, + 0x47, 0x42, 0x4b, 0x10, 0x0d, 0x12, 0x11, 0x0a, 0x0d, 0x57, 0x49, 0x4e, 0x44, 0x4f, 0x57, 0x45, + 0x44, 0x56, 0x41, 0x4c, 0x55, 0x45, 0x10, 0x0e, 0x12, 0x05, 0x0a, 0x01, 0x54, 0x10, 0x0f, 0x12, + 0x05, 0x0a, 0x01, 0x55, 0x10, 0x10, 0x12, 0x05, 0x0a, 0x01, 0x56, 0x10, 0x11, 0x12, 0x05, 0x0a, + 0x01, 0x57, 0x10, 0x12, 0x12, 0x05, 0x0a, 0x01, 0x58, 0x10, 0x13, 0x12, 0x05, 0x0a, 0x01, 0x59, + 0x10, 0x14, 0x12, 0x05, 0x0a, 0x01, 0x5a, 0x10, 0x15, 0x22, 0xc0, 0x01, 0x0a, 0x08, 0x46, 0x75, + 0x6c, 0x6c, 0x54, 0x79, 0x70, 0x65, 0x12, 0x51, 0x0a, 0x04, 0x74, 0x79, 0x70, 0x65, 0x18, 0x01, + 0x20, 0x01, 0x28, 0x0b, 0x32, 0x3d, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, + 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x73, 0x64, 0x6b, 0x73, 0x2e, 0x67, 0x6f, 0x2e, 0x70, + 0x6b, 0x67, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x63, 0x6f, 0x72, 0x65, 0x2e, 0x72, 0x75, 0x6e, + 0x74, 0x69, 0x6d, 0x65, 0x2e, 0x67, 0x72, 0x61, 0x70, 0x68, 0x78, 0x2e, 0x76, 0x31, 0x2e, 0x54, + 0x79, 0x70, 0x65, 0x52, 0x04, 0x74, 0x79, 0x70, 0x65, 0x12, 0x61, 0x0a, 0x0a, 0x63, 0x6f, 0x6d, + 0x70, 0x6f, 0x6e, 0x65, 0x6e, 0x74, 0x73, 0x18, 0x02, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x41, 0x2e, + 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, + 0x73, 0x64, 0x6b, 0x73, 0x2e, 0x67, 0x6f, 0x2e, 0x70, 0x6b, 0x67, 0x2e, 0x62, 0x65, 0x61, 0x6d, + 0x2e, 0x63, 0x6f, 0x72, 0x65, 0x2e, 0x72, 0x75, 0x6e, 0x74, 
0x69, 0x6d, 0x65, 0x2e, 0x67, 0x72, + 0x61, 0x70, 0x68, 0x78, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x75, 0x6c, 0x6c, 0x54, 0x79, 0x70, 0x65, + 0x52, 0x0a, 0x63, 0x6f, 0x6d, 0x70, 0x6f, 0x6e, 0x65, 0x6e, 0x74, 0x73, 0x22, 0x6f, 0x0a, 0x06, + 0x55, 0x73, 0x65, 0x72, 0x46, 0x6e, 0x12, 0x12, 0x0a, 0x04, 0x6e, 0x61, 0x6d, 0x65, 0x18, 0x01, + 0x20, 0x01, 0x28, 0x09, 0x52, 0x04, 0x6e, 0x61, 0x6d, 0x65, 0x12, 0x51, 0x0a, 0x04, 0x74, 0x79, + 0x70, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x3d, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, + 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x73, 0x64, 0x6b, 0x73, 0x2e, + 0x67, 0x6f, 0x2e, 0x70, 0x6b, 0x67, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x63, 0x6f, 0x72, 0x65, + 0x2e, 0x72, 0x75, 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x2e, 0x67, 0x72, 0x61, 0x70, 0x68, 0x78, 0x2e, + 0x76, 0x31, 0x2e, 0x54, 0x79, 0x70, 0x65, 0x52, 0x04, 0x74, 0x79, 0x70, 0x65, 0x22, 0x94, 0x01, + 0x0a, 0x05, 0x44, 0x79, 0x6e, 0x46, 0x6e, 0x12, 0x12, 0x0a, 0x04, 0x6e, 0x61, 0x6d, 0x65, 0x18, + 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x04, 0x6e, 0x61, 0x6d, 0x65, 0x12, 0x51, 0x0a, 0x04, 0x74, + 0x79, 0x70, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x3d, 0x2e, 0x6f, 0x72, 0x67, 0x2e, + 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x73, 0x64, 0x6b, 0x73, + 0x2e, 0x67, 0x6f, 0x2e, 0x70, 0x6b, 0x67, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x63, 0x6f, 0x72, + 0x65, 0x2e, 0x72, 0x75, 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x2e, 0x67, 0x72, 0x61, 0x70, 0x68, 0x78, + 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x79, 0x70, 0x65, 0x52, 0x04, 0x74, 0x79, 0x70, 0x65, 0x12, 0x12, + 0x0a, 0x04, 0x64, 0x61, 0x74, 0x61, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x04, 0x64, 0x61, + 0x74, 0x61, 0x12, 0x10, 0x0a, 0x03, 0x67, 0x65, 0x6e, 0x18, 0x04, 0x20, 0x01, 0x28, 0x09, 0x52, + 0x03, 0x67, 0x65, 0x6e, 0x22, 0x90, 0x02, 0x0a, 0x02, 0x46, 0x6e, 0x12, 0x4f, 0x0a, 0x02, 0x66, + 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x3f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, + 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x73, 0x64, 0x6b, 0x73, 0x2e, 0x67, + 0x6f, 0x2e, 0x70, 0x6b, 0x67, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x63, 0x6f, 0x72, 0x65, 0x2e, + 0x72, 0x75, 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x2e, 0x67, 0x72, 0x61, 0x70, 0x68, 0x78, 0x2e, 0x76, + 0x31, 0x2e, 0x55, 0x73, 0x65, 0x72, 0x46, 0x6e, 0x52, 0x02, 0x66, 0x6e, 0x12, 0x51, 0x0a, 0x04, + 0x74, 0x79, 0x70, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x3d, 0x2e, 0x6f, 0x72, 0x67, + 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x73, 0x64, 0x6b, + 0x73, 0x2e, 0x67, 0x6f, 0x2e, 0x70, 0x6b, 0x67, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x63, 0x6f, + 0x72, 0x65, 0x2e, 0x72, 0x75, 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x2e, 0x67, 0x72, 0x61, 0x70, 0x68, + 0x78, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x79, 0x70, 0x65, 0x52, 0x04, 0x74, 0x79, 0x70, 0x65, 0x12, + 0x10, 0x0a, 0x03, 0x6f, 0x70, 0x74, 0x18, 0x03, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x6f, 0x70, + 0x74, 0x12, 0x54, 0x0a, 0x05, 0x64, 0x79, 0x6e, 0x66, 0x6e, 0x18, 0x04, 0x20, 0x01, 0x28, 0x0b, + 0x32, 0x3e, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, + 0x61, 0x6d, 0x2e, 0x73, 0x64, 0x6b, 0x73, 0x2e, 0x67, 0x6f, 0x2e, 0x70, 0x6b, 0x67, 0x2e, 0x62, + 0x65, 0x61, 0x6d, 0x2e, 0x63, 0x6f, 0x72, 0x65, 0x2e, 0x72, 0x75, 0x6e, 0x74, 0x69, 0x6d, 0x65, + 0x2e, 0x67, 0x72, 0x61, 0x70, 0x68, 0x78, 0x2e, 0x76, 0x31, 0x2e, 0x44, 0x79, 0x6e, 0x46, 0x6e, + 0x52, 0x05, 0x64, 0x79, 0x6e, 0x66, 0x6e, 0x22, 0x6b, 0x0a, 0x08, 0x57, 0x69, 0x6e, 
0x64, 0x6f, + 0x77, 0x46, 0x6e, 0x12, 0x12, 0x0a, 0x04, 0x6b, 0x69, 0x6e, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, + 0x09, 0x52, 0x04, 0x6b, 0x69, 0x6e, 0x64, 0x12, 0x17, 0x0a, 0x07, 0x73, 0x69, 0x7a, 0x65, 0x5f, + 0x6d, 0x73, 0x18, 0x02, 0x20, 0x01, 0x28, 0x03, 0x52, 0x06, 0x73, 0x69, 0x7a, 0x65, 0x4d, 0x73, + 0x12, 0x1b, 0x0a, 0x09, 0x70, 0x65, 0x72, 0x69, 0x6f, 0x64, 0x5f, 0x6d, 0x73, 0x18, 0x03, 0x20, + 0x01, 0x28, 0x03, 0x52, 0x08, 0x70, 0x65, 0x72, 0x69, 0x6f, 0x64, 0x4d, 0x73, 0x12, 0x15, 0x0a, + 0x06, 0x67, 0x61, 0x70, 0x5f, 0x6d, 0x73, 0x18, 0x04, 0x20, 0x01, 0x28, 0x03, 0x52, 0x05, 0x67, + 0x61, 0x70, 0x4d, 0x73, 0x22, 0x9a, 0x02, 0x0a, 0x0b, 0x43, 0x75, 0x73, 0x74, 0x6f, 0x6d, 0x43, + 0x6f, 0x64, 0x65, 0x72, 0x12, 0x12, 0x0a, 0x04, 0x6e, 0x61, 0x6d, 0x65, 0x18, 0x01, 0x20, 0x01, + 0x28, 0x09, 0x52, 0x04, 0x6e, 0x61, 0x6d, 0x65, 0x12, 0x51, 0x0a, 0x04, 0x74, 0x79, 0x70, 0x65, + 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x3d, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, + 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x73, 0x64, 0x6b, 0x73, 0x2e, 0x67, 0x6f, + 0x2e, 0x70, 0x6b, 0x67, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x63, 0x6f, 0x72, 0x65, 0x2e, 0x72, + 0x75, 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x2e, 0x67, 0x72, 0x61, 0x70, 0x68, 0x78, 0x2e, 0x76, 0x31, + 0x2e, 0x54, 0x79, 0x70, 0x65, 0x52, 0x04, 0x74, 0x79, 0x70, 0x65, 0x12, 0x51, 0x0a, 0x03, 0x65, + 0x6e, 0x63, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x3f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, + 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x73, 0x64, 0x6b, 0x73, 0x2e, + 0x67, 0x6f, 0x2e, 0x70, 0x6b, 0x67, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x63, 0x6f, 0x72, 0x65, + 0x2e, 0x72, 0x75, 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x2e, 0x67, 0x72, 0x61, 0x70, 0x68, 0x78, 0x2e, + 0x76, 0x31, 0x2e, 0x55, 0x73, 0x65, 0x72, 0x46, 0x6e, 0x52, 0x03, 0x65, 0x6e, 0x63, 0x12, 0x51, + 0x0a, 0x03, 0x64, 0x65, 0x63, 0x18, 0x04, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x3f, 0x2e, 0x6f, 0x72, + 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x73, 0x64, + 0x6b, 0x73, 0x2e, 0x67, 0x6f, 0x2e, 0x70, 0x6b, 0x67, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x63, + 0x6f, 0x72, 0x65, 0x2e, 0x72, 0x75, 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x2e, 0x67, 0x72, 0x61, 0x70, + 0x68, 0x78, 0x2e, 0x76, 0x31, 0x2e, 0x55, 0x73, 0x65, 0x72, 0x46, 0x6e, 0x52, 0x03, 0x64, 0x65, + 0x63, 0x22, 0xba, 0x06, 0x0a, 0x09, 0x4d, 0x75, 0x6c, 0x74, 0x69, 0x45, 0x64, 0x67, 0x65, 0x12, + 0x4b, 0x0a, 0x02, 0x66, 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x3b, 0x2e, 0x6f, 0x72, + 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x73, 0x64, + 0x6b, 0x73, 0x2e, 0x67, 0x6f, 0x2e, 0x70, 0x6b, 0x67, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x63, + 0x6f, 0x72, 0x65, 0x2e, 0x72, 0x75, 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x2e, 0x67, 0x72, 0x61, 0x70, + 0x68, 0x78, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x6e, 0x52, 0x02, 0x66, 0x6e, 0x12, 0x16, 0x0a, 0x06, + 0x6f, 0x70, 0x63, 0x6f, 0x64, 0x65, 0x18, 0x04, 0x20, 0x01, 0x28, 0x09, 0x52, 0x06, 0x6f, 0x70, + 0x63, 0x6f, 0x64, 0x65, 0x12, 0x5e, 0x0a, 0x09, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x5f, 0x66, + 0x6e, 0x18, 0x05, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x41, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, + 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x73, 0x64, 0x6b, 0x73, 0x2e, 0x67, + 0x6f, 0x2e, 0x70, 0x6b, 0x67, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x63, 0x6f, 0x72, 0x65, 0x2e, + 0x72, 0x75, 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x2e, 0x67, 0x72, 0x61, 0x70, 0x68, 0x78, 0x2e, 0x76, + 0x31, 0x2e, 
0x57, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x46, 0x6e, 0x52, 0x08, 0x77, 0x69, 0x6e, 0x64, + 0x6f, 0x77, 0x46, 0x6e, 0x12, 0x64, 0x0a, 0x07, 0x69, 0x6e, 0x62, 0x6f, 0x75, 0x6e, 0x64, 0x18, + 0x02, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x4a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, + 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x73, 0x64, 0x6b, 0x73, 0x2e, 0x67, 0x6f, 0x2e, + 0x70, 0x6b, 0x67, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x63, 0x6f, 0x72, 0x65, 0x2e, 0x72, 0x75, + 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x2e, 0x67, 0x72, 0x61, 0x70, 0x68, 0x78, 0x2e, 0x76, 0x31, 0x2e, + 0x4d, 0x75, 0x6c, 0x74, 0x69, 0x45, 0x64, 0x67, 0x65, 0x2e, 0x49, 0x6e, 0x62, 0x6f, 0x75, 0x6e, + 0x64, 0x52, 0x07, 0x69, 0x6e, 0x62, 0x6f, 0x75, 0x6e, 0x64, 0x12, 0x67, 0x0a, 0x08, 0x6f, 0x75, + 0x74, 0x62, 0x6f, 0x75, 0x6e, 0x64, 0x18, 0x03, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x4b, 0x2e, 0x6f, + 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x73, + 0x64, 0x6b, 0x73, 0x2e, 0x67, 0x6f, 0x2e, 0x70, 0x6b, 0x67, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, + 0x63, 0x6f, 0x72, 0x65, 0x2e, 0x72, 0x75, 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x2e, 0x67, 0x72, 0x61, + 0x70, 0x68, 0x78, 0x2e, 0x76, 0x31, 0x2e, 0x4d, 0x75, 0x6c, 0x74, 0x69, 0x45, 0x64, 0x67, 0x65, + 0x2e, 0x4f, 0x75, 0x74, 0x62, 0x6f, 0x75, 0x6e, 0x64, 0x52, 0x08, 0x6f, 0x75, 0x74, 0x62, 0x6f, + 0x75, 0x6e, 0x64, 0x1a, 0xb5, 0x02, 0x0a, 0x07, 0x49, 0x6e, 0x62, 0x6f, 0x75, 0x6e, 0x64, 0x12, + 0x68, 0x0a, 0x04, 0x6b, 0x69, 0x6e, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0e, 0x32, 0x54, 0x2e, + 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, + 0x73, 0x64, 0x6b, 0x73, 0x2e, 0x67, 0x6f, 0x2e, 0x70, 0x6b, 0x67, 0x2e, 0x62, 0x65, 0x61, 0x6d, + 0x2e, 0x63, 0x6f, 0x72, 0x65, 0x2e, 0x72, 0x75, 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x2e, 0x67, 0x72, + 0x61, 0x70, 0x68, 0x78, 0x2e, 0x76, 0x31, 0x2e, 0x4d, 0x75, 0x6c, 0x74, 0x69, 0x45, 0x64, 0x67, + 0x65, 0x2e, 0x49, 0x6e, 0x62, 0x6f, 0x75, 0x6e, 0x64, 0x2e, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x4b, + 0x69, 0x6e, 0x64, 0x52, 0x04, 0x6b, 0x69, 0x6e, 0x64, 0x12, 0x55, 0x0a, 0x04, 0x74, 0x79, 0x70, + 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x41, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, + 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x73, 0x64, 0x6b, 0x73, 0x2e, 0x67, + 0x6f, 0x2e, 0x70, 0x6b, 0x67, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x63, 0x6f, 0x72, 0x65, 0x2e, + 0x72, 0x75, 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x2e, 0x67, 0x72, 0x61, 0x70, 0x68, 0x78, 0x2e, 0x76, + 0x31, 0x2e, 0x46, 0x75, 0x6c, 0x6c, 0x54, 0x79, 0x70, 0x65, 0x52, 0x04, 0x74, 0x79, 0x70, 0x65, + 0x22, 0x69, 0x0a, 0x09, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x4b, 0x69, 0x6e, 0x64, 0x12, 0x0b, 0x0a, + 0x07, 0x49, 0x4e, 0x56, 0x41, 0x4c, 0x49, 0x44, 0x10, 0x00, 0x12, 0x08, 0x0a, 0x04, 0x4d, 0x41, + 0x49, 0x4e, 0x10, 0x01, 0x12, 0x0d, 0x0a, 0x09, 0x53, 0x49, 0x4e, 0x47, 0x4c, 0x45, 0x54, 0x4f, + 0x4e, 0x10, 0x02, 0x12, 0x09, 0x0a, 0x05, 0x53, 0x4c, 0x49, 0x43, 0x45, 0x10, 0x03, 0x12, 0x07, + 0x0a, 0x03, 0x4d, 0x41, 0x50, 0x10, 0x04, 0x12, 0x0c, 0x0a, 0x08, 0x4d, 0x55, 0x4c, 0x54, 0x49, + 0x4d, 0x41, 0x50, 0x10, 0x05, 0x12, 0x08, 0x0a, 0x04, 0x49, 0x54, 0x45, 0x52, 0x10, 0x06, 0x12, + 0x0a, 0x0a, 0x06, 0x52, 0x45, 0x49, 0x54, 0x45, 0x52, 0x10, 0x07, 0x1a, 0x61, 0x0a, 0x08, 0x4f, + 0x75, 0x74, 0x62, 0x6f, 0x75, 0x6e, 0x64, 0x12, 0x55, 0x0a, 0x04, 0x74, 0x79, 0x70, 0x65, 0x18, + 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x41, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, + 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 
0x6d, 0x2e, 0x73, 0x64, 0x6b, 0x73, 0x2e, 0x67, 0x6f, 0x2e, + 0x70, 0x6b, 0x67, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x63, 0x6f, 0x72, 0x65, 0x2e, 0x72, 0x75, + 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x2e, 0x67, 0x72, 0x61, 0x70, 0x68, 0x78, 0x2e, 0x76, 0x31, 0x2e, + 0x46, 0x75, 0x6c, 0x6c, 0x54, 0x79, 0x70, 0x65, 0x52, 0x04, 0x74, 0x79, 0x70, 0x65, 0x22, 0x1d, + 0x0a, 0x0d, 0x49, 0x6e, 0x6a, 0x65, 0x63, 0x74, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, + 0x0c, 0x0a, 0x01, 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, 0x05, 0x52, 0x01, 0x6e, 0x22, 0xdc, 0x01, + 0x0a, 0x10, 0x54, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x50, 0x61, 0x79, 0x6c, 0x6f, + 0x61, 0x64, 0x12, 0x10, 0x0a, 0x03, 0x75, 0x72, 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, + 0x03, 0x75, 0x72, 0x6e, 0x12, 0x56, 0x0a, 0x04, 0x65, 0x64, 0x67, 0x65, 0x18, 0x02, 0x20, 0x01, + 0x28, 0x0b, 0x32, 0x42, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, + 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x73, 0x64, 0x6b, 0x73, 0x2e, 0x67, 0x6f, 0x2e, 0x70, 0x6b, 0x67, + 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x63, 0x6f, 0x72, 0x65, 0x2e, 0x72, 0x75, 0x6e, 0x74, 0x69, + 0x6d, 0x65, 0x2e, 0x67, 0x72, 0x61, 0x70, 0x68, 0x78, 0x2e, 0x76, 0x31, 0x2e, 0x4d, 0x75, 0x6c, + 0x74, 0x69, 0x45, 0x64, 0x67, 0x65, 0x52, 0x04, 0x65, 0x64, 0x67, 0x65, 0x12, 0x5e, 0x0a, 0x06, + 0x69, 0x6e, 0x6a, 0x65, 0x63, 0x74, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x46, 0x2e, 0x6f, + 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x73, + 0x64, 0x6b, 0x73, 0x2e, 0x67, 0x6f, 0x2e, 0x70, 0x6b, 0x67, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, + 0x63, 0x6f, 0x72, 0x65, 0x2e, 0x72, 0x75, 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x2e, 0x67, 0x72, 0x61, + 0x70, 0x68, 0x78, 0x2e, 0x76, 0x31, 0x2e, 0x49, 0x6e, 0x6a, 0x65, 0x63, 0x74, 0x50, 0x61, 0x79, + 0x6c, 0x6f, 0x61, 0x64, 0x52, 0x06, 0x69, 0x6e, 0x6a, 0x65, 0x63, 0x74, 0x42, 0x43, 0x5a, 0x41, + 0x67, 0x69, 0x74, 0x68, 0x75, 0x62, 0x2e, 0x63, 0x6f, 0x6d, 0x2f, 0x61, 0x70, 0x61, 0x63, 0x68, + 0x65, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x73, 0x64, 0x6b, 0x73, 0x2f, 0x67, 0x6f, 0x2f, 0x70, + 0x6b, 0x67, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x63, 0x6f, 0x72, 0x65, 0x2f, 0x72, 0x75, 0x6e, + 0x74, 0x69, 0x6d, 0x65, 0x2f, 0x67, 0x72, 0x61, 0x70, 0x68, 0x78, 0x2f, 0x76, 0x31, 0x3b, 0x76, + 0x31, 0x62, 0x06, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x33, +} + +var ( + file_v1_proto_rawDescOnce sync.Once + file_v1_proto_rawDescData = file_v1_proto_rawDesc +) + +func file_v1_proto_rawDescGZIP() []byte { + file_v1_proto_rawDescOnce.Do(func() { + file_v1_proto_rawDescData = protoimpl.X.CompressGZIP(file_v1_proto_rawDescData) + }) + return file_v1_proto_rawDescData +} + +var file_v1_proto_enumTypes = make([]protoimpl.EnumInfo, 4) +var file_v1_proto_msgTypes = make([]protoimpl.MessageInfo, 13) +var file_v1_proto_goTypes = []interface{}{ + (Type_Kind)(0), // 0: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Type.Kind + (Type_ChanDir)(0), // 1: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Type.ChanDir + (Type_Special)(0), // 2: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Type.Special + (MultiEdge_Inbound_InputKind)(0), // 3: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.MultiEdge.Inbound.InputKind + (*Type)(nil), // 4: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Type + (*FullType)(nil), // 5: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.FullType + (*UserFn)(nil), // 6: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.UserFn + (*DynFn)(nil), // 7: 
org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.DynFn + (*Fn)(nil), // 8: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Fn + (*WindowFn)(nil), // 9: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.WindowFn + (*CustomCoder)(nil), // 10: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.CustomCoder + (*MultiEdge)(nil), // 11: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.MultiEdge + (*InjectPayload)(nil), // 12: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.InjectPayload + (*TransformPayload)(nil), // 13: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.TransformPayload + (*Type_StructField)(nil), // 14: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Type.StructField + (*MultiEdge_Inbound)(nil), // 15: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.MultiEdge.Inbound + (*MultiEdge_Outbound)(nil), // 16: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.MultiEdge.Outbound +} +var file_v1_proto_depIdxs = []int32{ + 0, // 0: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Type.kind:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Type.Kind + 4, // 1: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Type.element:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Type + 14, // 2: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Type.fields:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Type.StructField + 4, // 3: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Type.parameter_types:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Type + 4, // 4: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Type.return_types:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Type + 1, // 5: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Type.chan_dir:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Type.ChanDir + 2, // 6: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Type.special:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Type.Special + 4, // 7: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.FullType.type:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Type + 5, // 8: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.FullType.components:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.FullType + 4, // 9: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.UserFn.type:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Type + 4, // 10: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.DynFn.type:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Type + 6, // 11: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Fn.fn:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.UserFn + 4, // 12: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Fn.type:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Type + 7, // 13: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Fn.dynfn:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.DynFn + 4, // 14: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.CustomCoder.type:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Type + 6, // 15: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.CustomCoder.enc:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.UserFn + 6, // 16: 
org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.CustomCoder.dec:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.UserFn + 8, // 17: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.MultiEdge.fn:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Fn + 9, // 18: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.MultiEdge.window_fn:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.WindowFn + 15, // 19: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.MultiEdge.inbound:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.MultiEdge.Inbound + 16, // 20: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.MultiEdge.outbound:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.MultiEdge.Outbound + 11, // 21: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.TransformPayload.edge:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.MultiEdge + 12, // 22: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.TransformPayload.inject:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.InjectPayload + 4, // 23: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Type.StructField.type:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.Type + 3, // 24: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.MultiEdge.Inbound.kind:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.MultiEdge.Inbound.InputKind + 5, // 25: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.MultiEdge.Inbound.type:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.FullType + 5, // 26: org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.MultiEdge.Outbound.type:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1.FullType + 27, // [27:27] is the sub-list for method output_type + 27, // [27:27] is the sub-list for method input_type + 27, // [27:27] is the sub-list for extension type_name + 27, // [27:27] is the sub-list for extension extendee + 0, // [0:27] is the sub-list for field type_name +} + +func init() { file_v1_proto_init() } +func file_v1_proto_init() { + if File_v1_proto != nil { + return + } + if !protoimpl.UnsafeEnabled { + file_v1_proto_msgTypes[0].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*Type); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_v1_proto_msgTypes[1].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*FullType); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_v1_proto_msgTypes[2].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*UserFn); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_v1_proto_msgTypes[3].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*DynFn); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_v1_proto_msgTypes[4].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*Fn); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_v1_proto_msgTypes[5].Exporter = func(v interface{}, i int) interface{} { + switch v 
:= v.(*WindowFn); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_v1_proto_msgTypes[6].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*CustomCoder); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_v1_proto_msgTypes[7].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*MultiEdge); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_v1_proto_msgTypes[8].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*InjectPayload); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_v1_proto_msgTypes[9].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*TransformPayload); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_v1_proto_msgTypes[10].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*Type_StructField); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_v1_proto_msgTypes[11].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*MultiEdge_Inbound); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_v1_proto_msgTypes[12].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*MultiEdge_Outbound); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + } + type x struct{} + out := protoimpl.TypeBuilder{ + File: protoimpl.DescBuilder{ + GoPackagePath: reflect.TypeOf(x{}).PkgPath(), + RawDescriptor: file_v1_proto_rawDesc, + NumEnums: 4, + NumMessages: 13, + NumExtensions: 0, + NumServices: 0, + }, + GoTypes: file_v1_proto_goTypes, + DependencyIndexes: file_v1_proto_depIdxs, + EnumInfos: file_v1_proto_enumTypes, + MessageInfos: file_v1_proto_msgTypes, + }.Build() + File_v1_proto = out.File + file_v1_proto_rawDesc = nil + file_v1_proto_goTypes = nil + file_v1_proto_depIdxs = nil } diff --git a/sdks/go/pkg/beam/core/runtime/graphx/v1/v1.proto b/sdks/go/pkg/beam/core/runtime/graphx/v1/v1.proto index 6f85e6ad5f0a..6c21a618054a 100644 --- a/sdks/go/pkg/beam/core/runtime/graphx/v1/v1.proto +++ b/sdks/go/pkg/beam/core/runtime/graphx/v1/v1.proto @@ -24,7 +24,9 @@ */ syntax = "proto3"; -package v1; +package org.apache.beam.sdks.go.pkg.beam.core.runtime.graphx.v1; + +option go_package = "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx/v1;v1"; // Type represents a serializable reflect.Type. 
message Type { diff --git a/sdks/go/pkg/beam/core/runtime/graphx/xlang.go b/sdks/go/pkg/beam/core/runtime/graphx/xlang.go index 5e064a0efa74..49c192e83050 100644 --- a/sdks/go/pkg/beam/core/runtime/graphx/xlang.go +++ b/sdks/go/pkg/beam/core/runtime/graphx/xlang.go @@ -18,9 +18,9 @@ package graphx import ( "fmt" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" ) // mergeExpandedWithPipeline adds expanded components of all ExternalTransforms to the existing pipeline @@ -28,49 +28,47 @@ func mergeExpandedWithPipeline(edges []*graph.MultiEdge, p *pipepb.Pipeline) { // Adding Expanded transforms to their counterparts in the Pipeline for _, e := range edges { - if e.Op == graph.External { - exp := e.External.Expanded - if exp == nil { - continue - } - id := fmt.Sprintf("e%v", e.ID()) + if e.Op != graph.External || e.External == nil || e.External.Expanded == nil { + continue + } + exp := e.External.Expanded + id := fmt.Sprintf("e%v", e.ID()) - p.Requirements = append(p.Requirements, exp.Requirements...) + p.Requirements = append(p.Requirements, exp.Requirements...) - // Adding components of the Expanded Transforms to the current Pipeline - components, err := ExpandedComponents(exp) - if err != nil { - panic(err) - } - for k, v := range components.GetTransforms() { - p.Components.Transforms[k] = v - } - for k, v := range components.GetPcollections() { - p.Components.Pcollections[k] = v - } - for k, v := range components.GetWindowingStrategies() { - p.Components.WindowingStrategies[k] = v - } - for k, v := range components.GetCoders() { - p.Components.Coders[k] = v - } - for k, v := range components.GetEnvironments() { - if k == defaultEnvId { - // This case is not an anomaly. It is expected to be always - // present. Any initial ExpansionRequest will have a - // component which requires the default environment. Scoping - // using unique namespace prevents collision. - continue - } - p.Components.Environments[k] = v + // Adding components of the Expanded Transforms to the current Pipeline + components, err := ExpandedComponents(exp) + if err != nil { + panic(err) + } + for k, v := range components.GetTransforms() { + p.Components.Transforms[k] = v + } + for k, v := range components.GetPcollections() { + p.Components.Pcollections[k] = v + } + for k, v := range components.GetWindowingStrategies() { + p.Components.WindowingStrategies[k] = v + } + for k, v := range components.GetCoders() { + p.Components.Coders[k] = v + } + for k, v := range components.GetEnvironments() { + if k == defaultEnvId { + // This case is not an anomaly. It is expected to be always + // present. Any initial ExpansionRequest will have a + // component which requires the default environment. Scoping + // using unique namespace prevents collision. 
+ continue } + p.Components.Environments[k] = v + } - transform, err := ExpandedTransform(exp) - if err != nil { - panic(err) - } - p.Components.Transforms[id] = transform + transform, err := ExpandedTransform(exp) + if err != nil { + panic(err) } + p.Components.Transforms[id] = transform } } @@ -83,30 +81,28 @@ func purgeOutputInput(edges []*graph.MultiEdge, p *pipepb.Pipeline) { // Generating map (oldID -> newID) of outputs to be purged for _, e := range edges { - if e.Op == graph.External { - if e.External.Expanded == nil { - continue - } - for tag, n := range ExternalOutputs(e) { - nodeID := fmt.Sprintf("n%v", n.ID()) + if e.Op != graph.External || e.External == nil || e.External.Expanded == nil { + continue + } + for tag, n := range ExternalOutputs(e) { + nodeID := fmt.Sprintf("n%v", n.ID()) - transform, err := ExpandedTransform(e.External.Expanded) - if err != nil { - panic(err) - } - expandedOutputs := transform.GetOutputs() - var pcolID string - if tag == graph.UnnamedOutputTag { - for _, pcolID = range expandedOutputs { - // easiest way to access map with one entry (key,value) - } - } else { - pcolID = expandedOutputs[tag] + transform, err := ExpandedTransform(e.External.Expanded) + if err != nil { + panic(err) + } + expandedOutputs := transform.GetOutputs() + var pcolID string + if tag == graph.UnnamedOutputTag { + for _, pcolID = range expandedOutputs { + // easiest way to access map with one entry (key,value) } - - idxMap[nodeID] = pcolID - delete(components.Pcollections, nodeID) + } else { + pcolID = expandedOutputs[tag] } + + idxMap[nodeID] = pcolID + delete(components.Pcollections, nodeID) } } @@ -258,10 +254,14 @@ func ExpandedTransform(exp *graph.ExpandedTransform) (*pipepb.PTransform, error) // pcollection) of input nodes with respect to the map (tag -> index of Inbound // in MultiEdge.Input) of named inputs func ExternalInputs(e *graph.MultiEdge) map[string]*graph.Node { - inputs := make(map[string]*graph.Node) + return InboundTagToNode(e.External.InputsMap, e.Input) +} - for tag, id := range e.External.InputsMap { - inputs[tag] = e.Input[id].From +// InboundTagToNode relates the tags from inbound links to their respective nodes. +func InboundTagToNode(inputsMap map[string]int, inbound []*graph.Inbound) map[string]*graph.Node { + inputs := make(map[string]*graph.Node) + for tag, id := range inputsMap { + inputs[tag] = inbound[id].From } return inputs } @@ -270,10 +270,14 @@ func ExternalInputs(e *graph.MultiEdge) map[string]*graph.Node { // pcollection) of output nodes with respect to the map (tag -> index of // Outbound in MultiEdge.Output) of named outputs func ExternalOutputs(e *graph.MultiEdge) map[string]*graph.Node { - outputs := make(map[string]*graph.Node) + return OutboundTagToNode(e.External.OutputsMap, e.Output) +} - for tag, id := range e.External.OutputsMap { - outputs[tag] = e.Output[id].To +// OutboundTagToNode relates the tags from outbound links to their respective nodes. 
+func OutboundTagToNode(outputsMap map[string]int, outbound []*graph.Outbound) map[string]*graph.Node { + outputs := make(map[string]*graph.Node) + for tag, id := range outputsMap { + outputs[tag] = outbound[id].To } return outputs } diff --git a/sdks/go/pkg/beam/core/runtime/graphx/xlang_test.go b/sdks/go/pkg/beam/core/runtime/graphx/xlang_test.go index 368332477dca..a94fce634e69 100644 --- a/sdks/go/pkg/beam/core/runtime/graphx/xlang_test.go +++ b/sdks/go/pkg/beam/core/runtime/graphx/xlang_test.go @@ -19,11 +19,11 @@ import ( "fmt" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" "github.com/google/go-cmp/cmp" "google.golang.org/protobuf/testing/protocmp" ) diff --git a/sdks/go/pkg/beam/core/runtime/harness/datamgr.go b/sdks/go/pkg/beam/core/runtime/harness/datamgr.go index ad57b8d2cd4e..3750621eac72 100644 --- a/sdks/go/pkg/beam/core/runtime/harness/datamgr.go +++ b/sdks/go/pkg/beam/core/runtime/harness/datamgr.go @@ -21,10 +21,10 @@ import ( "sync" "time" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - "github.com/apache/beam/sdks/go/pkg/beam/log" - fnpb "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + fnpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/fnexecution_v1" ) const ( diff --git a/sdks/go/pkg/beam/core/runtime/harness/datamgr_test.go b/sdks/go/pkg/beam/core/runtime/harness/datamgr_test.go index 31172e8a04b0..3ee15edc5c58 100644 --- a/sdks/go/pkg/beam/core/runtime/harness/datamgr_test.go +++ b/sdks/go/pkg/beam/core/runtime/harness/datamgr_test.go @@ -26,7 +26,7 @@ import ( "testing" "time" - fnpb "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1" + fnpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/fnexecution_v1" ) const extraData = 2 diff --git a/sdks/go/pkg/beam/core/runtime/harness/gen.go b/sdks/go/pkg/beam/core/runtime/harness/gen.go index cac56226ff0c..ab02ee2f8633 100644 --- a/sdks/go/pkg/beam/core/runtime/harness/gen.go +++ b/sdks/go/pkg/beam/core/runtime/harness/gen.go @@ -15,4 +15,4 @@ package harness -//go:generate protoc -I. -I../../../../../../../model/fn-execution/src/main/proto -I../../../../../../../model/pipeline/src/main/proto session.proto --go_out=Mbeam_fn_api.proto=github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1:session +//go:generate protoc -I. 
-I../../../../../../../model/fn-execution/src/main/proto -I../../../../../../../model/pipeline/src/main/proto session.proto --go_out=Mbeam_fn_api.proto=github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1:../../../../../../../../../../ diff --git a/sdks/go/pkg/beam/core/runtime/harness/harness.go b/sdks/go/pkg/beam/core/runtime/harness/harness.go index 2eff7fcf5245..5c5b7b23f110 100644 --- a/sdks/go/pkg/beam/core/runtime/harness/harness.go +++ b/sdks/go/pkg/beam/core/runtime/harness/harness.go @@ -24,12 +24,13 @@ import ( "sync/atomic" "time" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/hooks" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - "github.com/apache/beam/sdks/go/pkg/beam/log" - fnpb "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1" - "github.com/apache/beam/sdks/go/pkg/beam/util/grpcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/metrics" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/hooks" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + fnpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/fnexecution_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/grpcx" "github.com/golang/protobuf/proto" "google.golang.org/grpc" ) @@ -97,6 +98,7 @@ func Main(ctx context.Context, loggingEndpoint, controlEndpoint string) error { plans: make(map[bundleDescriptorID][]*exec.Plan), active: make(map[instructionID]*exec.Plan), inactive: newCircleBuffer(), + metStore: make(map[instructionID]*metrics.Store), failed: make(map[instructionID]error), data: &DataChannelManager{}, state: &StateChannelManager{}, @@ -221,6 +223,8 @@ type control struct { // a plan that's either about to start or has finished recently // instructions in this queue should return empty responses to control messages. inactive circleBuffer // protected by mu + // metric stores for active plans. + metStore map[instructionID]*metrics.Store // protected by mu // plans that have failed during execution failed map[instructionID]error // protected by mu mu sync.Mutex @@ -293,6 +297,10 @@ func (c *control) handleInstruction(ctx context.Context, req *fnpb.InstructionRe c.mu.Lock() c.inactive.Remove(instID) c.active[instID] = plan + // Get the user metrics store for this bundle. + ctx = metrics.SetBundleID(ctx, string(instID)) + store := metrics.GetStore(ctx) + c.metStore[instID] = store c.mu.Unlock() if err != nil { @@ -305,19 +313,21 @@ func (c *control) handleInstruction(ctx context.Context, req *fnpb.InstructionRe data.Close() state.Close() - mons, pylds := monitoring(plan) + mons, pylds := monitoring(plan, store) // Move the plan back to the candidate state c.mu.Lock() // Mark the instruction as failed. if err != nil { c.failed[instID] = err + } else { + // Non failure plans can be re-used. + c.plans[bdID] = append(c.plans[bdID], plan) } - c.plans[bdID] = append(c.plans[bdID], plan) delete(c.active, instID) - if removed, ok := c.inactive.Insert(instID); ok { delete(c.failed, removed) // Also GC old failed bundles. 
} + delete(c.metStore, instID) c.mu.Unlock() if err != nil { @@ -339,7 +349,7 @@ func (c *control) handleInstruction(ctx context.Context, req *fnpb.InstructionRe ref := instructionID(msg.GetInstructionId()) - plan, resp := c.getPlanOrResponse(ctx, "progress", instID, ref) + plan, store, resp := c.getPlanOrResponse(ctx, "progress", instID, ref) if resp != nil { return resp } @@ -352,7 +362,7 @@ func (c *control) handleInstruction(ctx context.Context, req *fnpb.InstructionRe } } - mons, pylds := monitoring(plan) + mons, pylds := monitoring(plan, store) return &fnpb.InstructionResponse{ InstructionId: string(instID), @@ -370,7 +380,7 @@ func (c *control) handleInstruction(ctx context.Context, req *fnpb.InstructionRe log.Debugf(ctx, "PB Split: %v", msg) ref := instructionID(msg.GetInstructionId()) - plan, resp := c.getPlanOrResponse(ctx, "split", instID, ref) + plan, _, resp := c.getPlanOrResponse(ctx, "split", instID, ref) if resp != nil { return resp } @@ -445,6 +455,16 @@ func (c *control) handleInstruction(ctx context.Context, req *fnpb.InstructionRe }, }, } + case req.GetHarnessMonitoringInfos() != nil: + return &fnpb.InstructionResponse{ + InstructionId: string(instID), + Response: &fnpb.InstructionResponse_HarnessMonitoringInfos{ + HarnessMonitoringInfos: &fnpb.HarnessMonitoringInfosResponse{ + // TODO(BEAM-11092): Populate with non-bundle metrics data. + MonitoringData: map[string][]byte{}, + }, + }, + } default: return fail(ctx, instID, "Unexpected request: %v", req) @@ -459,22 +479,23 @@ func (c *control) handleInstruction(ctx context.Context, req *fnpb.InstructionRe // them as a parameter here instead, and relying on those proto internal would be brittle. // // Since this logic is subtle, it's been abstracted to a method to scope the defer unlock. 
-func (c *control) getPlanOrResponse(ctx context.Context, kind string, instID, ref instructionID) (*exec.Plan, *fnpb.InstructionResponse) { +func (c *control) getPlanOrResponse(ctx context.Context, kind string, instID, ref instructionID) (*exec.Plan, *metrics.Store, *fnpb.InstructionResponse) { c.mu.Lock() plan, ok := c.active[ref] err := c.failed[ref] + store, _ := c.metStore[ref] defer c.mu.Unlock() if err != nil { - return nil, fail(ctx, instID, "failed to return %v: instruction %v failed: %v", kind, ref, err) + return nil, nil, fail(ctx, instID, "failed to return %v: instruction %v failed: %v", kind, ref, err) } if !ok { if c.inactive.Contains(ref) { - return nil, nil + return nil, nil, nil } - return nil, fail(ctx, instID, "failed to return %v: instruction %v not active", kind, ref) + return nil, nil, fail(ctx, instID, "failed to return %v: instruction %v not active", kind, ref) } - return plan, nil + return plan, store, nil } func fail(ctx context.Context, id instructionID, format string, args ...interface{}) *fnpb.InstructionResponse { diff --git a/sdks/go/pkg/beam/core/runtime/harness/harness_test.go b/sdks/go/pkg/beam/core/runtime/harness/harness_test.go index 2c83a73269db..84c5770c71a1 100644 --- a/sdks/go/pkg/beam/core/runtime/harness/harness_test.go +++ b/sdks/go/pkg/beam/core/runtime/harness/harness_test.go @@ -20,9 +20,9 @@ import ( "strings" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec" - fnpb "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec" + fnpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/fnexecution_v1" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" "github.com/golang/protobuf/proto" ) diff --git a/sdks/go/pkg/beam/core/runtime/harness/init/init.go b/sdks/go/pkg/beam/core/runtime/harness/init/init.go index 4043adaab633..aa207d176562 100644 --- a/sdks/go/pkg/beam/core/runtime/harness/init/init.go +++ b/sdks/go/pkg/beam/core/runtime/harness/init/init.go @@ -29,9 +29,9 @@ import ( "runtime/debug" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness" - "github.com/apache/beam/sdks/go/pkg/beam/util/grpcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/harness" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/grpcx" ) var ( diff --git a/sdks/go/pkg/beam/core/runtime/harness/logging.go b/sdks/go/pkg/beam/core/runtime/harness/logging.go index 2d26a9861989..5d31ca3dba43 100644 --- a/sdks/go/pkg/beam/core/runtime/harness/logging.go +++ b/sdks/go/pkg/beam/core/runtime/harness/logging.go @@ -23,9 +23,9 @@ import ( "runtime" "time" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - "github.com/apache/beam/sdks/go/pkg/beam/log" - fnpb "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + fnpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/fnexecution_v1" "github.com/golang/protobuf/ptypes" ) diff --git a/sdks/go/pkg/beam/core/runtime/harness/logging_test.go b/sdks/go/pkg/beam/core/runtime/harness/logging_test.go index 8817fe674e63..c833512a612f 100644 --- a/sdks/go/pkg/beam/core/runtime/harness/logging_test.go +++ b/sdks/go/pkg/beam/core/runtime/harness/logging_test.go @@ -20,8 +20,8 @@ import ( "strings" 
"testing" - "github.com/apache/beam/sdks/go/pkg/beam/log" - fnpb "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + fnpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/fnexecution_v1" ) func TestLogger(t *testing.T) { diff --git a/sdks/go/pkg/beam/core/runtime/harness/monitoring.go b/sdks/go/pkg/beam/core/runtime/harness/monitoring.go index c9ddb801d95c..f1dd93f5d896 100644 --- a/sdks/go/pkg/beam/core/runtime/harness/monitoring.go +++ b/sdks/go/pkg/beam/core/runtime/harness/monitoring.go @@ -21,10 +21,10 @@ import ( "sync/atomic" "time" - "github.com/apache/beam/sdks/go/pkg/beam/core/metrics" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/metricsx" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/metrics" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/metricsx" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" ) type shortKey struct { @@ -72,7 +72,7 @@ func (c *shortIDCache) getShortID(l metrics.Labels, urn metricsx.Urn) string { c.shortIds2Infos[s] = &pipepb.MonitoringInfo{ Urn: metricsx.UrnToString(urn), Type: metricsx.UrnToType(urn), - Labels: userLabels(l), + Labels: l.Map(), } return s } @@ -102,8 +102,7 @@ func shortIdsToInfos(shortids []string) map[string]*pipepb.MonitoringInfo { return defaultShortIDCache.shortIdsToInfos(shortids) } -func monitoring(p *exec.Plan) ([]*pipepb.MonitoringInfo, map[string][]byte) { - store := p.Store() +func monitoring(p *exec.Plan, store *metrics.Store) ([]*pipepb.MonitoringInfo, map[string][]byte) { if store == nil { return nil, nil } @@ -125,7 +124,7 @@ func monitoring(p *exec.Plan) ([]*pipepb.MonitoringInfo, map[string][]byte) { &pipepb.MonitoringInfo{ Urn: metricsx.UrnToString(metricsx.UrnUserSumInt64), Type: metricsx.UrnToType(metricsx.UrnUserSumInt64), - Labels: userLabels(l), + Labels: l.Map(), Payload: payload, }) }, @@ -140,7 +139,7 @@ func monitoring(p *exec.Plan) ([]*pipepb.MonitoringInfo, map[string][]byte) { &pipepb.MonitoringInfo{ Urn: metricsx.UrnToString(metricsx.UrnUserDistInt64), Type: metricsx.UrnToType(metricsx.UrnUserDistInt64), - Labels: userLabels(l), + Labels: l.Map(), Payload: payload, }) }, @@ -155,7 +154,7 @@ func monitoring(p *exec.Plan) ([]*pipepb.MonitoringInfo, map[string][]byte) { &pipepb.MonitoringInfo{ Urn: metricsx.UrnToString(metricsx.UrnUserLatestMsInt64), Type: metricsx.UrnToType(metricsx.UrnUserLatestMsInt64), - Labels: userLabels(l), + Labels: l.Map(), Payload: payload, }) @@ -163,44 +162,65 @@ func monitoring(p *exec.Plan) ([]*pipepb.MonitoringInfo, map[string][]byte) { }.ExtractFrom(store) // Get the execution monitoring information from the bundle plan. - if snapshot, ok := p.Progress(); ok { - payload, err := metricsx.Int64Counter(snapshot.Count) + + snapshot, ok := p.Progress() + if !ok { + return monitoringInfo, payloads + } + for _, pcol := range snapshot.PCols { + payload, err := metricsx.Int64Counter(pcol.ElementCount) if err != nil { panic(err) } // TODO(BEAM-9934): This metric should account for elements in multiple windows. 
- payloads[getShortID(metrics.PCollectionLabels(snapshot.PID), metricsx.UrnElementCount)] = payload + payloads[getShortID(metrics.PCollectionLabels(pcol.ID), metricsx.UrnElementCount)] = payload + monitoringInfo = append(monitoringInfo, &pipepb.MonitoringInfo{ Urn: metricsx.UrnToString(metricsx.UrnElementCount), Type: metricsx.UrnToType(metricsx.UrnElementCount), Labels: map[string]string{ - "PCOLLECTION": snapshot.PID, + "PCOLLECTION": pcol.ID, }, Payload: payload, }) - payloads[getShortID(metrics.PTransformLabels(snapshot.ID), metricsx.UrnDataChannelReadIndex)] = payload - monitoringInfo = append(monitoringInfo, - &pipepb.MonitoringInfo{ - Urn: metricsx.UrnToString(metricsx.UrnDataChannelReadIndex), - Type: metricsx.UrnToType(metricsx.UrnDataChannelReadIndex), - Labels: map[string]string{ - "PTRANSFORM": snapshot.ID, - }, - Payload: payload, - }) - } + // Skip pcollections without size + if pcol.SizeCount != 0 { + payload, err := metricsx.Int64Distribution(pcol.SizeCount, pcol.SizeSum, pcol.SizeMin, pcol.SizeMax) + if err != nil { + panic(err) + } + payloads[getShortID(metrics.PCollectionLabels(pcol.ID), metricsx.UrnSampledByteSize)] = payload - return monitoringInfo, - payloads -} + monitoringInfo = append(monitoringInfo, + &pipepb.MonitoringInfo{ + Urn: metricsx.UrnToString(metricsx.UrnSampledByteSize), + Type: metricsx.UrnToType(metricsx.UrnSampledByteSize), + Labels: map[string]string{ + "PCOLLECTION": pcol.ID, + }, + Payload: payload, + }) + } + } -func userLabels(l metrics.Labels) map[string]string { - return map[string]string{ - "PTRANSFORM": l.Transform(), - "NAMESPACE": l.Namespace(), - "NAME": l.Name(), + payload, err := metricsx.Int64Counter(snapshot.Source.Count) + if err != nil { + panic(err) } + + payloads[getShortID(metrics.PTransformLabels(snapshot.Source.ID), metricsx.UrnDataChannelReadIndex)] = payload + monitoringInfo = append(monitoringInfo, + &pipepb.MonitoringInfo{ + Urn: metricsx.UrnToString(metricsx.UrnDataChannelReadIndex), + Type: metricsx.UrnToType(metricsx.UrnDataChannelReadIndex), + Labels: map[string]string{ + "PTRANSFORM": snapshot.Source.ID, + }, + Payload: payload, + }) + + return monitoringInfo, payloads } diff --git a/sdks/go/pkg/beam/core/runtime/harness/monitoring_test.go b/sdks/go/pkg/beam/core/runtime/harness/monitoring_test.go index ed792dda3b57..35e53b12752e 100644 --- a/sdks/go/pkg/beam/core/runtime/harness/monitoring_test.go +++ b/sdks/go/pkg/beam/core/runtime/harness/monitoring_test.go @@ -19,8 +19,8 @@ import ( "strconv" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/metrics" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/metricsx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/metrics" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/metricsx" ) func TestGetShortID(t *testing.T) { diff --git a/sdks/go/pkg/beam/core/runtime/harness/session.go b/sdks/go/pkg/beam/core/runtime/harness/session.go index f9da58cc6f21..04478c896755 100644 --- a/sdks/go/pkg/beam/core/runtime/harness/session.go +++ b/sdks/go/pkg/beam/core/runtime/harness/session.go @@ -21,11 +21,11 @@ import ( "io" "sync" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness/session" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/hooks" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - fnpb "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime" + 
"github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/harness/session" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/hooks" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + fnpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/fnexecution_v1" "github.com/golang/protobuf/proto" ) diff --git a/sdks/go/pkg/beam/core/runtime/harness/session.proto b/sdks/go/pkg/beam/core/runtime/harness/session.proto index 04cc5de74127..cfc07390f7a8 100644 --- a/sdks/go/pkg/beam/core/runtime/harness/session.proto +++ b/sdks/go/pkg/beam/core/runtime/harness/session.proto @@ -22,7 +22,9 @@ syntax = "proto3"; -package session; +package org.apache.beam.sdks.go.pkg.beam.core.runtime.harness.session; + +option go_package = "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/harness/session;session"; import "beam_fn_api.proto"; diff --git a/sdks/go/pkg/beam/core/runtime/harness/session/session.pb.go b/sdks/go/pkg/beam/core/runtime/harness/session/session.pb.go index 80487530a00d..8a6ec3470127 100644 --- a/sdks/go/pkg/beam/core/runtime/harness/session/session.pb.go +++ b/sdks/go/pkg/beam/core/runtime/harness/session/session.pb.go @@ -1,35 +1,45 @@ +// +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +// +// Protocol buffer definition for session serialization files + // Code generated by protoc-gen-go. DO NOT EDIT. +// versions: +// protoc-gen-go v1.25.0-devel +// protoc v3.13.0 // source: session.proto -/* -Package session is a generated protocol buffer package. - -It is generated from these files: - session.proto - -It has these top-level messages: - Header - Footer - EntryHeader - Entry -*/ package session -import proto "github.com/golang/protobuf/proto" -import fmt "fmt" -import math "math" -import org_apache_beam_model_fn_execution_v1 "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1" - -// Reference imports to suppress errors if they are not otherwise used. -var _ = proto.Marshal -var _ = fmt.Errorf -var _ = math.Inf +import ( + fnexecution_v1 "github.com/apache/beam/sdks/v2/go/pkg/beam/model/fnexecution_v1" + protoreflect "google.golang.org/protobuf/reflect/protoreflect" + protoimpl "google.golang.org/protobuf/runtime/protoimpl" + reflect "reflect" + sync "sync" +) -// This is a compile-time assertion to ensure that this generated file -// is compatible with the proto package it is being compiled against. -// A compilation error at this line likely means your copy of the -// proto package needs to be updated. -const _ = proto.ProtoPackageIsVersion2 // please upgrade the proto package +const ( + // Verify that this generated code is sufficiently up-to-date. + _ = protoimpl.EnforceVersion(20 - protoimpl.MinVersion) + // Verify that runtime/protoimpl is sufficiently up-to-date. 
+ _ = protoimpl.EnforceVersion(protoimpl.MaxVersion - 20) +) type Kind int32 @@ -44,99 +54,220 @@ const ( Kind_FOOTER Kind = 7 ) -var Kind_name = map[int32]string{ - 0: "INVALID", - 1: "INSTRUCTION_REQUEST", - 2: "INSTRUCTION_RESPONSE", - 3: "DATA_RECEIVED", - 4: "DATA_SENT", - 5: "LOG_ENTRIES", - 6: "HEADER", - 7: "FOOTER", -} -var Kind_value = map[string]int32{ - "INVALID": 0, - "INSTRUCTION_REQUEST": 1, - "INSTRUCTION_RESPONSE": 2, - "DATA_RECEIVED": 3, - "DATA_SENT": 4, - "LOG_ENTRIES": 5, - "HEADER": 6, - "FOOTER": 7, +// Enum value maps for Kind. +var ( + Kind_name = map[int32]string{ + 0: "INVALID", + 1: "INSTRUCTION_REQUEST", + 2: "INSTRUCTION_RESPONSE", + 3: "DATA_RECEIVED", + 4: "DATA_SENT", + 5: "LOG_ENTRIES", + 6: "HEADER", + 7: "FOOTER", + } + Kind_value = map[string]int32{ + "INVALID": 0, + "INSTRUCTION_REQUEST": 1, + "INSTRUCTION_RESPONSE": 2, + "DATA_RECEIVED": 3, + "DATA_SENT": 4, + "LOG_ENTRIES": 5, + "HEADER": 6, + "FOOTER": 7, + } +) + +func (x Kind) Enum() *Kind { + p := new(Kind) + *p = x + return p } func (x Kind) String() string { - return proto.EnumName(Kind_name, int32(x)) + return protoimpl.X.EnumStringOf(x.Descriptor(), protoreflect.EnumNumber(x)) +} + +func (Kind) Descriptor() protoreflect.EnumDescriptor { + return file_session_proto_enumTypes[0].Descriptor() +} + +func (Kind) Type() protoreflect.EnumType { + return &file_session_proto_enumTypes[0] +} + +func (x Kind) Number() protoreflect.EnumNumber { + return protoreflect.EnumNumber(x) +} + +// Deprecated: Use Kind.Descriptor instead. +func (Kind) EnumDescriptor() ([]byte, []int) { + return file_session_proto_rawDescGZIP(), []int{0} } -func (Kind) EnumDescriptor() ([]byte, []int) { return fileDescriptor0, []int{0} } type Header struct { - Version string `protobuf:"bytes,1,opt,name=version" json:"version,omitempty"` - SdkVersion string `protobuf:"bytes,2,opt,name=sdk_version,json=sdkVersion" json:"sdk_version,omitempty"` - MaxMsgLen int64 `protobuf:"varint,3,opt,name=max_msg_len,json=maxMsgLen" json:"max_msg_len,omitempty"` + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + Version string `protobuf:"bytes,1,opt,name=version,proto3" json:"version,omitempty"` // version identifier of the session schema + SdkVersion string `protobuf:"bytes,2,opt,name=sdk_version,json=sdkVersion,proto3" json:"sdk_version,omitempty"` // build information for the SDK that generated the session. 
+ MaxMsgLen int64 `protobuf:"varint,3,opt,name=max_msg_len,json=maxMsgLen,proto3" json:"max_msg_len,omitempty"` // maximum length of a single entry } -func (m *Header) Reset() { *m = Header{} } -func (m *Header) String() string { return proto.CompactTextString(m) } -func (*Header) ProtoMessage() {} -func (*Header) Descriptor() ([]byte, []int) { return fileDescriptor0, []int{0} } +func (x *Header) Reset() { + *x = Header{} + if protoimpl.UnsafeEnabled { + mi := &file_session_proto_msgTypes[0] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} -func (m *Header) GetVersion() string { - if m != nil { - return m.Version +func (x *Header) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*Header) ProtoMessage() {} + +func (x *Header) ProtoReflect() protoreflect.Message { + mi := &file_session_proto_msgTypes[0] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use Header.ProtoReflect.Descriptor instead. +func (*Header) Descriptor() ([]byte, []int) { + return file_session_proto_rawDescGZIP(), []int{0} +} + +func (x *Header) GetVersion() string { + if x != nil { + return x.Version } return "" } -func (m *Header) GetSdkVersion() string { - if m != nil { - return m.SdkVersion +func (x *Header) GetSdkVersion() string { + if x != nil { + return x.SdkVersion } return "" } -func (m *Header) GetMaxMsgLen() int64 { - if m != nil { - return m.MaxMsgLen +func (x *Header) GetMaxMsgLen() int64 { + if x != nil { + return x.MaxMsgLen } return 0 } type Footer struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields +} + +func (x *Footer) Reset() { + *x = Footer{} + if protoimpl.UnsafeEnabled { + mi := &file_session_proto_msgTypes[1] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *Footer) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*Footer) ProtoMessage() {} + +func (x *Footer) ProtoReflect() protoreflect.Message { + mi := &file_session_proto_msgTypes[1] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) } -func (m *Footer) Reset() { *m = Footer{} } -func (m *Footer) String() string { return proto.CompactTextString(m) } -func (*Footer) ProtoMessage() {} -func (*Footer) Descriptor() ([]byte, []int) { return fileDescriptor0, []int{1} } +// Deprecated: Use Footer.ProtoReflect.Descriptor instead. 
+func (*Footer) Descriptor() ([]byte, []int) { + return file_session_proto_rawDescGZIP(), []int{1} +} type EntryHeader struct { - Len int64 `protobuf:"varint,1,opt,name=len" json:"len,omitempty"` - Kind Kind `protobuf:"varint,2,opt,name=kind,enum=session.Kind" json:"kind,omitempty"` + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + Len int64 `protobuf:"varint,1,opt,name=len,proto3" json:"len,omitempty"` + Kind Kind `protobuf:"varint,2,opt,name=kind,proto3,enum=org.apache.beam.sdks.go.pkg.beam.core.runtime.harness.session.Kind" json:"kind,omitempty"` } -func (m *EntryHeader) Reset() { *m = EntryHeader{} } -func (m *EntryHeader) String() string { return proto.CompactTextString(m) } -func (*EntryHeader) ProtoMessage() {} -func (*EntryHeader) Descriptor() ([]byte, []int) { return fileDescriptor0, []int{2} } +func (x *EntryHeader) Reset() { + *x = EntryHeader{} + if protoimpl.UnsafeEnabled { + mi := &file_session_proto_msgTypes[2] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} -func (m *EntryHeader) GetLen() int64 { - if m != nil { - return m.Len +func (x *EntryHeader) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*EntryHeader) ProtoMessage() {} + +func (x *EntryHeader) ProtoReflect() protoreflect.Message { + mi := &file_session_proto_msgTypes[2] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use EntryHeader.ProtoReflect.Descriptor instead. +func (*EntryHeader) Descriptor() ([]byte, []int) { + return file_session_proto_rawDescGZIP(), []int{2} +} + +func (x *EntryHeader) GetLen() int64 { + if x != nil { + return x.Len } return 0 } -func (m *EntryHeader) GetKind() Kind { - if m != nil { - return m.Kind +func (x *EntryHeader) GetKind() Kind { + if x != nil { + return x.Kind } return Kind_INVALID } type Entry struct { - Kind Kind `protobuf:"varint,1,opt,name=kind,enum=session.Kind" json:"kind,omitempty"` - // Types that are valid to be assigned to Msg: + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + Kind Kind `protobuf:"varint,1,opt,name=kind,proto3,enum=org.apache.beam.sdks.go.pkg.beam.core.runtime.harness.session.Kind" json:"kind,omitempty"` + // Types that are assignable to Msg: // *Entry_InstReq // *Entry_InstResp // *Entry_Elems @@ -144,43 +275,47 @@ type Entry struct { // *Entry_Header // *Entry_Footer Msg isEntry_Msg `protobuf_oneof:"msg"` - Timestamp int64 `protobuf:"varint,2,opt,name=timestamp" json:"timestamp,omitempty"` + Timestamp int64 `protobuf:"varint,2,opt,name=timestamp,proto3" json:"timestamp,omitempty"` } -func (m *Entry) Reset() { *m = Entry{} } -func (m *Entry) String() string { return proto.CompactTextString(m) } -func (*Entry) ProtoMessage() {} -func (*Entry) Descriptor() ([]byte, []int) { return fileDescriptor0, []int{3} } - -type isEntry_Msg interface { - isEntry_Msg() +func (x *Entry) Reset() { + *x = Entry{} + if protoimpl.UnsafeEnabled { + mi := &file_session_proto_msgTypes[3] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } } -type Entry_InstReq struct { - InstReq *org_apache_beam_model_fn_execution_v1.InstructionRequest `protobuf:"bytes,1000,opt,name=inst_req,json=instReq,oneof"` -} -type Entry_InstResp struct { - InstResp 
*org_apache_beam_model_fn_execution_v1.InstructionResponse `protobuf:"bytes,1001,opt,name=inst_resp,json=instResp,oneof"` -} -type Entry_Elems struct { - Elems *org_apache_beam_model_fn_execution_v1.Elements `protobuf:"bytes,1002,opt,name=elems,oneof"` +func (x *Entry) String() string { + return protoimpl.X.MessageStringOf(x) } -type Entry_LogEntries struct { - LogEntries *org_apache_beam_model_fn_execution_v1.LogEntry_List `protobuf:"bytes,1003,opt,name=log_entries,json=logEntries,oneof"` -} -type Entry_Header struct { - Header *Header `protobuf:"bytes,1004,opt,name=header,oneof"` + +func (*Entry) ProtoMessage() {} + +func (x *Entry) ProtoReflect() protoreflect.Message { + mi := &file_session_proto_msgTypes[3] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) } -type Entry_Footer struct { - Footer *Footer `protobuf:"bytes,1005,opt,name=footer,oneof"` + +// Deprecated: Use Entry.ProtoReflect.Descriptor instead. +func (*Entry) Descriptor() ([]byte, []int) { + return file_session_proto_rawDescGZIP(), []int{3} } -func (*Entry_InstReq) isEntry_Msg() {} -func (*Entry_InstResp) isEntry_Msg() {} -func (*Entry_Elems) isEntry_Msg() {} -func (*Entry_LogEntries) isEntry_Msg() {} -func (*Entry_Header) isEntry_Msg() {} -func (*Entry_Footer) isEntry_Msg() {} +func (x *Entry) GetKind() Kind { + if x != nil { + return x.Kind + } + return Kind_INVALID +} func (m *Entry) GetMsg() isEntry_Msg { if m != nil { @@ -189,256 +324,299 @@ func (m *Entry) GetMsg() isEntry_Msg { return nil } -func (m *Entry) GetKind() Kind { - if m != nil { - return m.Kind - } - return Kind_INVALID -} - -func (m *Entry) GetInstReq() *org_apache_beam_model_fn_execution_v1.InstructionRequest { - if x, ok := m.GetMsg().(*Entry_InstReq); ok { +func (x *Entry) GetInstReq() *fnexecution_v1.InstructionRequest { + if x, ok := x.GetMsg().(*Entry_InstReq); ok { return x.InstReq } return nil } -func (m *Entry) GetInstResp() *org_apache_beam_model_fn_execution_v1.InstructionResponse { - if x, ok := m.GetMsg().(*Entry_InstResp); ok { +func (x *Entry) GetInstResp() *fnexecution_v1.InstructionResponse { + if x, ok := x.GetMsg().(*Entry_InstResp); ok { return x.InstResp } return nil } -func (m *Entry) GetElems() *org_apache_beam_model_fn_execution_v1.Elements { - if x, ok := m.GetMsg().(*Entry_Elems); ok { +func (x *Entry) GetElems() *fnexecution_v1.Elements { + if x, ok := x.GetMsg().(*Entry_Elems); ok { return x.Elems } return nil } -func (m *Entry) GetLogEntries() *org_apache_beam_model_fn_execution_v1.LogEntry_List { - if x, ok := m.GetMsg().(*Entry_LogEntries); ok { +func (x *Entry) GetLogEntries() *fnexecution_v1.LogEntry_List { + if x, ok := x.GetMsg().(*Entry_LogEntries); ok { return x.LogEntries } return nil } -func (m *Entry) GetHeader() *Header { - if x, ok := m.GetMsg().(*Entry_Header); ok { +func (x *Entry) GetHeader() *Header { + if x, ok := x.GetMsg().(*Entry_Header); ok { return x.Header } return nil } -func (m *Entry) GetFooter() *Footer { - if x, ok := m.GetMsg().(*Entry_Footer); ok { +func (x *Entry) GetFooter() *Footer { + if x, ok := x.GetMsg().(*Entry_Footer); ok { return x.Footer } return nil } -func (m *Entry) GetTimestamp() int64 { - if m != nil { - return m.Timestamp +func (x *Entry) GetTimestamp() int64 { + if x != nil { + return x.Timestamp } return 0 } -// XXX_OneofFuncs is for the internal use of the proto package. 
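The regenerated `session.pb.go` switches from the legacy `golang/protobuf` output to the `protoreflect`/`protoimpl`-based generator, but the message and enum shapes (`Header`, `Footer`, `EntryHeader`, `Entry`, `Kind`) are unchanged, so code that records or replays sessions should only need the new import path. A small sketch of encoding an entry header with the regenerated types; the field values are made up, and `google.golang.org/protobuf/proto` is the runtime the new generator targets:

```go
package main

import (
	"fmt"

	"google.golang.org/protobuf/proto"

	"github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/harness/session"
)

func main() {
	hdr := &session.Header{
		Version:    "1",          // made-up schema version
		SdkVersion: "go-sdk-dev", // made-up build string
		MaxMsgLen:  1 << 20,
	}
	eh := &session.EntryHeader{
		Len:  int64(proto.Size(hdr)),
		Kind: session.Kind_HEADER,
	}
	b, err := proto.Marshal(eh)
	if err != nil {
		panic(err)
	}
	fmt.Printf("entry header encodes to %d bytes\n", len(b))
}
```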
-func (*Entry) XXX_OneofFuncs() (func(msg proto.Message, b *proto.Buffer) error, func(msg proto.Message, tag, wire int, b *proto.Buffer) (bool, error), func(msg proto.Message) (n int), []interface{}) { - return _Entry_OneofMarshaler, _Entry_OneofUnmarshaler, _Entry_OneofSizer, []interface{}{ - (*Entry_InstReq)(nil), - (*Entry_InstResp)(nil), - (*Entry_Elems)(nil), - (*Entry_LogEntries)(nil), - (*Entry_Header)(nil), - (*Entry_Footer)(nil), - } +type isEntry_Msg interface { + isEntry_Msg() } -func _Entry_OneofMarshaler(msg proto.Message, b *proto.Buffer) error { - m := msg.(*Entry) - // msg - switch x := m.Msg.(type) { - case *Entry_InstReq: - b.EncodeVarint(1000<<3 | proto.WireBytes) - if err := b.EncodeMessage(x.InstReq); err != nil { - return err - } - case *Entry_InstResp: - b.EncodeVarint(1001<<3 | proto.WireBytes) - if err := b.EncodeMessage(x.InstResp); err != nil { - return err - } - case *Entry_Elems: - b.EncodeVarint(1002<<3 | proto.WireBytes) - if err := b.EncodeMessage(x.Elems); err != nil { - return err - } - case *Entry_LogEntries: - b.EncodeVarint(1003<<3 | proto.WireBytes) - if err := b.EncodeMessage(x.LogEntries); err != nil { - return err - } - case *Entry_Header: - b.EncodeVarint(1004<<3 | proto.WireBytes) - if err := b.EncodeMessage(x.Header); err != nil { - return err - } - case *Entry_Footer: - b.EncodeVarint(1005<<3 | proto.WireBytes) - if err := b.EncodeMessage(x.Footer); err != nil { - return err - } - case nil: - default: - return fmt.Errorf("Entry.Msg has unexpected type %T", x) - } - return nil +type Entry_InstReq struct { + InstReq *fnexecution_v1.InstructionRequest `protobuf:"bytes,1000,opt,name=inst_req,json=instReq,proto3,oneof"` } -func _Entry_OneofUnmarshaler(msg proto.Message, tag, wire int, b *proto.Buffer) (bool, error) { - m := msg.(*Entry) - switch tag { - case 1000: // msg.inst_req - if wire != proto.WireBytes { - return true, proto.ErrInternalBadWireType - } - msg := new(org_apache_beam_model_fn_execution_v1.InstructionRequest) - err := b.DecodeMessage(msg) - m.Msg = &Entry_InstReq{msg} - return true, err - case 1001: // msg.inst_resp - if wire != proto.WireBytes { - return true, proto.ErrInternalBadWireType - } - msg := new(org_apache_beam_model_fn_execution_v1.InstructionResponse) - err := b.DecodeMessage(msg) - m.Msg = &Entry_InstResp{msg} - return true, err - case 1002: // msg.elems - if wire != proto.WireBytes { - return true, proto.ErrInternalBadWireType +type Entry_InstResp struct { + InstResp *fnexecution_v1.InstructionResponse `protobuf:"bytes,1001,opt,name=inst_resp,json=instResp,proto3,oneof"` +} + +type Entry_Elems struct { + Elems *fnexecution_v1.Elements `protobuf:"bytes,1002,opt,name=elems,proto3,oneof"` +} + +type Entry_LogEntries struct { + LogEntries *fnexecution_v1.LogEntry_List `protobuf:"bytes,1003,opt,name=log_entries,json=logEntries,proto3,oneof"` +} + +type Entry_Header struct { + Header *Header `protobuf:"bytes,1004,opt,name=header,proto3,oneof"` +} + +type Entry_Footer struct { + Footer *Footer `protobuf:"bytes,1005,opt,name=footer,proto3,oneof"` +} + +func (*Entry_InstReq) isEntry_Msg() {} + +func (*Entry_InstResp) isEntry_Msg() {} + +func (*Entry_Elems) isEntry_Msg() {} + +func (*Entry_LogEntries) isEntry_Msg() {} + +func (*Entry_Header) isEntry_Msg() {} + +func (*Entry_Footer) isEntry_Msg() {} + +var File_session_proto protoreflect.FileDescriptor + +var file_session_proto_rawDesc = []byte{ + 0x0a, 0x0d, 0x73, 0x65, 0x73, 0x73, 0x69, 0x6f, 0x6e, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x12, + 0x3d, 0x6f, 0x72, 0x67, 0x2e, 0x61, 
0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, + 0x2e, 0x73, 0x64, 0x6b, 0x73, 0x2e, 0x67, 0x6f, 0x2e, 0x70, 0x6b, 0x67, 0x2e, 0x62, 0x65, 0x61, + 0x6d, 0x2e, 0x63, 0x6f, 0x72, 0x65, 0x2e, 0x72, 0x75, 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x2e, 0x68, + 0x61, 0x72, 0x6e, 0x65, 0x73, 0x73, 0x2e, 0x73, 0x65, 0x73, 0x73, 0x69, 0x6f, 0x6e, 0x1a, 0x11, + 0x62, 0x65, 0x61, 0x6d, 0x5f, 0x66, 0x6e, 0x5f, 0x61, 0x70, 0x69, 0x2e, 0x70, 0x72, 0x6f, 0x74, + 0x6f, 0x22, 0x63, 0x0a, 0x06, 0x48, 0x65, 0x61, 0x64, 0x65, 0x72, 0x12, 0x18, 0x0a, 0x07, 0x76, + 0x65, 0x72, 0x73, 0x69, 0x6f, 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x07, 0x76, 0x65, + 0x72, 0x73, 0x69, 0x6f, 0x6e, 0x12, 0x1f, 0x0a, 0x0b, 0x73, 0x64, 0x6b, 0x5f, 0x76, 0x65, 0x72, + 0x73, 0x69, 0x6f, 0x6e, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0a, 0x73, 0x64, 0x6b, 0x56, + 0x65, 0x72, 0x73, 0x69, 0x6f, 0x6e, 0x12, 0x1e, 0x0a, 0x0b, 0x6d, 0x61, 0x78, 0x5f, 0x6d, 0x73, + 0x67, 0x5f, 0x6c, 0x65, 0x6e, 0x18, 0x03, 0x20, 0x01, 0x28, 0x03, 0x52, 0x09, 0x6d, 0x61, 0x78, + 0x4d, 0x73, 0x67, 0x4c, 0x65, 0x6e, 0x22, 0x08, 0x0a, 0x06, 0x46, 0x6f, 0x6f, 0x74, 0x65, 0x72, + 0x22, 0x78, 0x0a, 0x0b, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x48, 0x65, 0x61, 0x64, 0x65, 0x72, 0x12, + 0x10, 0x0a, 0x03, 0x6c, 0x65, 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, 0x03, 0x52, 0x03, 0x6c, 0x65, + 0x6e, 0x12, 0x57, 0x0a, 0x04, 0x6b, 0x69, 0x6e, 0x64, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0e, 0x32, + 0x43, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, + 0x6d, 0x2e, 0x73, 0x64, 0x6b, 0x73, 0x2e, 0x67, 0x6f, 0x2e, 0x70, 0x6b, 0x67, 0x2e, 0x62, 0x65, + 0x61, 0x6d, 0x2e, 0x63, 0x6f, 0x72, 0x65, 0x2e, 0x72, 0x75, 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x2e, + 0x68, 0x61, 0x72, 0x6e, 0x65, 0x73, 0x73, 0x2e, 0x73, 0x65, 0x73, 0x73, 0x69, 0x6f, 0x6e, 0x2e, + 0x4b, 0x69, 0x6e, 0x64, 0x52, 0x04, 0x6b, 0x69, 0x6e, 0x64, 0x22, 0xa2, 0x05, 0x0a, 0x05, 0x45, + 0x6e, 0x74, 0x72, 0x79, 0x12, 0x57, 0x0a, 0x04, 0x6b, 0x69, 0x6e, 0x64, 0x18, 0x01, 0x20, 0x01, + 0x28, 0x0e, 0x32, 0x43, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, + 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x73, 0x64, 0x6b, 0x73, 0x2e, 0x67, 0x6f, 0x2e, 0x70, 0x6b, 0x67, + 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x63, 0x6f, 0x72, 0x65, 0x2e, 0x72, 0x75, 0x6e, 0x74, 0x69, + 0x6d, 0x65, 0x2e, 0x68, 0x61, 0x72, 0x6e, 0x65, 0x73, 0x73, 0x2e, 0x73, 0x65, 0x73, 0x73, 0x69, + 0x6f, 0x6e, 0x2e, 0x4b, 0x69, 0x6e, 0x64, 0x52, 0x04, 0x6b, 0x69, 0x6e, 0x64, 0x12, 0x57, 0x0a, + 0x08, 0x69, 0x6e, 0x73, 0x74, 0x5f, 0x72, 0x65, 0x71, 0x18, 0xe8, 0x07, 0x20, 0x01, 0x28, 0x0b, + 0x32, 0x39, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, + 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, + 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x49, 0x6e, 0x73, 0x74, 0x72, 0x75, 0x63, + 0x74, 0x69, 0x6f, 0x6e, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x48, 0x00, 0x52, 0x07, 0x69, + 0x6e, 0x73, 0x74, 0x52, 0x65, 0x71, 0x12, 0x5a, 0x0a, 0x09, 0x69, 0x6e, 0x73, 0x74, 0x5f, 0x72, + 0x65, 0x73, 0x70, 0x18, 0xe9, 0x07, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x3a, 0x2e, 0x6f, 0x72, 0x67, + 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, + 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, + 0x76, 0x31, 0x2e, 0x49, 0x6e, 0x73, 0x74, 0x72, 0x75, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x52, 0x65, + 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x48, 0x00, 0x52, 0x08, 
0x69, 0x6e, 0x73, 0x74, 0x52, 0x65, + 0x73, 0x70, 0x12, 0x48, 0x0a, 0x05, 0x65, 0x6c, 0x65, 0x6d, 0x73, 0x18, 0xea, 0x07, 0x20, 0x01, + 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, + 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, + 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x45, 0x6c, 0x65, 0x6d, 0x65, + 0x6e, 0x74, 0x73, 0x48, 0x00, 0x52, 0x05, 0x65, 0x6c, 0x65, 0x6d, 0x73, 0x12, 0x58, 0x0a, 0x0b, + 0x6c, 0x6f, 0x67, 0x5f, 0x65, 0x6e, 0x74, 0x72, 0x69, 0x65, 0x73, 0x18, 0xeb, 0x07, 0x20, 0x01, + 0x28, 0x0b, 0x32, 0x34, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, + 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, + 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x4c, 0x6f, 0x67, 0x45, 0x6e, + 0x74, 0x72, 0x79, 0x2e, 0x4c, 0x69, 0x73, 0x74, 0x48, 0x00, 0x52, 0x0a, 0x6c, 0x6f, 0x67, 0x45, + 0x6e, 0x74, 0x72, 0x69, 0x65, 0x73, 0x12, 0x60, 0x0a, 0x06, 0x68, 0x65, 0x61, 0x64, 0x65, 0x72, + 0x18, 0xec, 0x07, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x45, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, + 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x73, 0x64, 0x6b, 0x73, 0x2e, 0x67, + 0x6f, 0x2e, 0x70, 0x6b, 0x67, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x63, 0x6f, 0x72, 0x65, 0x2e, + 0x72, 0x75, 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x2e, 0x68, 0x61, 0x72, 0x6e, 0x65, 0x73, 0x73, 0x2e, + 0x73, 0x65, 0x73, 0x73, 0x69, 0x6f, 0x6e, 0x2e, 0x48, 0x65, 0x61, 0x64, 0x65, 0x72, 0x48, 0x00, + 0x52, 0x06, 0x68, 0x65, 0x61, 0x64, 0x65, 0x72, 0x12, 0x60, 0x0a, 0x06, 0x66, 0x6f, 0x6f, 0x74, + 0x65, 0x72, 0x18, 0xed, 0x07, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x45, 0x2e, 0x6f, 0x72, 0x67, 0x2e, + 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x73, 0x64, 0x6b, 0x73, + 0x2e, 0x67, 0x6f, 0x2e, 0x70, 0x6b, 0x67, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x63, 0x6f, 0x72, + 0x65, 0x2e, 0x72, 0x75, 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x2e, 0x68, 0x61, 0x72, 0x6e, 0x65, 0x73, + 0x73, 0x2e, 0x73, 0x65, 0x73, 0x73, 0x69, 0x6f, 0x6e, 0x2e, 0x46, 0x6f, 0x6f, 0x74, 0x65, 0x72, + 0x48, 0x00, 0x52, 0x06, 0x66, 0x6f, 0x6f, 0x74, 0x65, 0x72, 0x12, 0x1c, 0x0a, 0x09, 0x74, 0x69, + 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, 0x18, 0x02, 0x20, 0x01, 0x28, 0x03, 0x52, 0x09, 0x74, + 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, 0x42, 0x05, 0x0a, 0x03, 0x6d, 0x73, 0x67, 0x2a, + 0x91, 0x01, 0x0a, 0x04, 0x4b, 0x69, 0x6e, 0x64, 0x12, 0x0b, 0x0a, 0x07, 0x49, 0x4e, 0x56, 0x41, + 0x4c, 0x49, 0x44, 0x10, 0x00, 0x12, 0x17, 0x0a, 0x13, 0x49, 0x4e, 0x53, 0x54, 0x52, 0x55, 0x43, + 0x54, 0x49, 0x4f, 0x4e, 0x5f, 0x52, 0x45, 0x51, 0x55, 0x45, 0x53, 0x54, 0x10, 0x01, 0x12, 0x18, + 0x0a, 0x14, 0x49, 0x4e, 0x53, 0x54, 0x52, 0x55, 0x43, 0x54, 0x49, 0x4f, 0x4e, 0x5f, 0x52, 0x45, + 0x53, 0x50, 0x4f, 0x4e, 0x53, 0x45, 0x10, 0x02, 0x12, 0x11, 0x0a, 0x0d, 0x44, 0x41, 0x54, 0x41, + 0x5f, 0x52, 0x45, 0x43, 0x45, 0x49, 0x56, 0x45, 0x44, 0x10, 0x03, 0x12, 0x0d, 0x0a, 0x09, 0x44, + 0x41, 0x54, 0x41, 0x5f, 0x53, 0x45, 0x4e, 0x54, 0x10, 0x04, 0x12, 0x0f, 0x0a, 0x0b, 0x4c, 0x4f, + 0x47, 0x5f, 0x45, 0x4e, 0x54, 0x52, 0x49, 0x45, 0x53, 0x10, 0x05, 0x12, 0x0a, 0x0a, 0x06, 0x48, + 0x45, 0x41, 0x44, 0x45, 0x52, 0x10, 0x06, 0x12, 0x0a, 0x0a, 0x06, 0x46, 0x4f, 0x4f, 0x54, 0x45, + 0x52, 0x10, 0x07, 0x42, 0x4e, 0x5a, 0x4c, 0x67, 0x69, 0x74, 0x68, 0x75, 0x62, 0x2e, 0x63, 0x6f, + 0x6d, 0x2f, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 
0x73, 0x64, + 0x6b, 0x73, 0x2f, 0x67, 0x6f, 0x2f, 0x70, 0x6b, 0x67, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x63, + 0x6f, 0x72, 0x65, 0x2f, 0x72, 0x75, 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x2f, 0x68, 0x61, 0x72, 0x6e, + 0x65, 0x73, 0x73, 0x2f, 0x73, 0x65, 0x73, 0x73, 0x69, 0x6f, 0x6e, 0x3b, 0x73, 0x65, 0x73, 0x73, + 0x69, 0x6f, 0x6e, 0x62, 0x06, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x33, +} + +var ( + file_session_proto_rawDescOnce sync.Once + file_session_proto_rawDescData = file_session_proto_rawDesc +) + +func file_session_proto_rawDescGZIP() []byte { + file_session_proto_rawDescOnce.Do(func() { + file_session_proto_rawDescData = protoimpl.X.CompressGZIP(file_session_proto_rawDescData) + }) + return file_session_proto_rawDescData +} + +var file_session_proto_enumTypes = make([]protoimpl.EnumInfo, 1) +var file_session_proto_msgTypes = make([]protoimpl.MessageInfo, 4) +var file_session_proto_goTypes = []interface{}{ + (Kind)(0), // 0: org.apache.beam.sdks.go.pkg.beam.core.runtime.harness.session.Kind + (*Header)(nil), // 1: org.apache.beam.sdks.go.pkg.beam.core.runtime.harness.session.Header + (*Footer)(nil), // 2: org.apache.beam.sdks.go.pkg.beam.core.runtime.harness.session.Footer + (*EntryHeader)(nil), // 3: org.apache.beam.sdks.go.pkg.beam.core.runtime.harness.session.EntryHeader + (*Entry)(nil), // 4: org.apache.beam.sdks.go.pkg.beam.core.runtime.harness.session.Entry + (*fnexecution_v1.InstructionRequest)(nil), // 5: org.apache.beam.model.fn_execution.v1.InstructionRequest + (*fnexecution_v1.InstructionResponse)(nil), // 6: org.apache.beam.model.fn_execution.v1.InstructionResponse + (*fnexecution_v1.Elements)(nil), // 7: org.apache.beam.model.fn_execution.v1.Elements + (*fnexecution_v1.LogEntry_List)(nil), // 8: org.apache.beam.model.fn_execution.v1.LogEntry.List +} +var file_session_proto_depIdxs = []int32{ + 0, // 0: org.apache.beam.sdks.go.pkg.beam.core.runtime.harness.session.EntryHeader.kind:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.harness.session.Kind + 0, // 1: org.apache.beam.sdks.go.pkg.beam.core.runtime.harness.session.Entry.kind:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.harness.session.Kind + 5, // 2: org.apache.beam.sdks.go.pkg.beam.core.runtime.harness.session.Entry.inst_req:type_name -> org.apache.beam.model.fn_execution.v1.InstructionRequest + 6, // 3: org.apache.beam.sdks.go.pkg.beam.core.runtime.harness.session.Entry.inst_resp:type_name -> org.apache.beam.model.fn_execution.v1.InstructionResponse + 7, // 4: org.apache.beam.sdks.go.pkg.beam.core.runtime.harness.session.Entry.elems:type_name -> org.apache.beam.model.fn_execution.v1.Elements + 8, // 5: org.apache.beam.sdks.go.pkg.beam.core.runtime.harness.session.Entry.log_entries:type_name -> org.apache.beam.model.fn_execution.v1.LogEntry.List + 1, // 6: org.apache.beam.sdks.go.pkg.beam.core.runtime.harness.session.Entry.header:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.harness.session.Header + 2, // 7: org.apache.beam.sdks.go.pkg.beam.core.runtime.harness.session.Entry.footer:type_name -> org.apache.beam.sdks.go.pkg.beam.core.runtime.harness.session.Footer + 8, // [8:8] is the sub-list for method output_type + 8, // [8:8] is the sub-list for method input_type + 8, // [8:8] is the sub-list for extension type_name + 8, // [8:8] is the sub-list for extension extendee + 0, // [0:8] is the sub-list for field type_name +} + +func init() { file_session_proto_init() } +func file_session_proto_init() { + if File_session_proto != nil { + return + } + if !protoimpl.UnsafeEnabled { + 
file_session_proto_msgTypes[0].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*Header); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } } - msg := new(org_apache_beam_model_fn_execution_v1.Elements) - err := b.DecodeMessage(msg) - m.Msg = &Entry_Elems{msg} - return true, err - case 1003: // msg.log_entries - if wire != proto.WireBytes { - return true, proto.ErrInternalBadWireType + file_session_proto_msgTypes[1].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*Footer); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } } - msg := new(org_apache_beam_model_fn_execution_v1.LogEntry_List) - err := b.DecodeMessage(msg) - m.Msg = &Entry_LogEntries{msg} - return true, err - case 1004: // msg.header - if wire != proto.WireBytes { - return true, proto.ErrInternalBadWireType + file_session_proto_msgTypes[2].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*EntryHeader); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } } - msg := new(Header) - err := b.DecodeMessage(msg) - m.Msg = &Entry_Header{msg} - return true, err - case 1005: // msg.footer - if wire != proto.WireBytes { - return true, proto.ErrInternalBadWireType + file_session_proto_msgTypes[3].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*Entry); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } } - msg := new(Footer) - err := b.DecodeMessage(msg) - m.Msg = &Entry_Footer{msg} - return true, err - default: - return false, nil } -} - -func _Entry_OneofSizer(msg proto.Message) (n int) { - m := msg.(*Entry) - // msg - switch x := m.Msg.(type) { - case *Entry_InstReq: - s := proto.Size(x.InstReq) - n += proto.SizeVarint(1000<<3 | proto.WireBytes) - n += proto.SizeVarint(uint64(s)) - n += s - case *Entry_InstResp: - s := proto.Size(x.InstResp) - n += proto.SizeVarint(1001<<3 | proto.WireBytes) - n += proto.SizeVarint(uint64(s)) - n += s - case *Entry_Elems: - s := proto.Size(x.Elems) - n += proto.SizeVarint(1002<<3 | proto.WireBytes) - n += proto.SizeVarint(uint64(s)) - n += s - case *Entry_LogEntries: - s := proto.Size(x.LogEntries) - n += proto.SizeVarint(1003<<3 | proto.WireBytes) - n += proto.SizeVarint(uint64(s)) - n += s - case *Entry_Header: - s := proto.Size(x.Header) - n += proto.SizeVarint(1004<<3 | proto.WireBytes) - n += proto.SizeVarint(uint64(s)) - n += s - case *Entry_Footer: - s := proto.Size(x.Footer) - n += proto.SizeVarint(1005<<3 | proto.WireBytes) - n += proto.SizeVarint(uint64(s)) - n += s - case nil: - default: - panic(fmt.Sprintf("proto: unexpected type %T in oneof", x)) + file_session_proto_msgTypes[3].OneofWrappers = []interface{}{ + (*Entry_InstReq)(nil), + (*Entry_InstResp)(nil), + (*Entry_Elems)(nil), + (*Entry_LogEntries)(nil), + (*Entry_Header)(nil), + (*Entry_Footer)(nil), } - return n -} - -func init() { - proto.RegisterType((*Header)(nil), "session.Header") - proto.RegisterType((*Footer)(nil), "session.Footer") - proto.RegisterType((*EntryHeader)(nil), "session.EntryHeader") - proto.RegisterType((*Entry)(nil), "session.Entry") - proto.RegisterEnum("session.Kind", Kind_name, Kind_value) -} - -func init() { proto.RegisterFile("session.proto", fileDescriptor0) } - -var fileDescriptor0 = []byte{ - // 532 bytes of a gzipped 
FileDescriptorProto - 0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0xff, 0x9c, 0x93, 0xcd, 0x6e, 0xd3, 0x40, - 0x10, 0x80, 0xe3, 0x3a, 0xb1, 0x9b, 0xb1, 0x42, 0xdd, 0x05, 0x09, 0x0b, 0x21, 0x28, 0x39, 0x55, - 0x3d, 0x18, 0x51, 0xb8, 0xc0, 0x2d, 0x6d, 0xb6, 0xd8, 0x22, 0x38, 0xb0, 0x76, 0x0b, 0xe2, 0x62, - 0xb9, 0xf1, 0xd4, 0xb5, 0x62, 0xaf, 0x1d, 0xaf, 0x53, 0x85, 0x1b, 0xaf, 0xc0, 0x5b, 0xf2, 0xfb, - 0x0e, 0xc8, 0xeb, 0xa4, 0x08, 0x24, 0xa4, 0x8a, 0xdb, 0xce, 0xcc, 0xb7, 0xdf, 0x6a, 0x67, 0x76, - 0x61, 0x20, 0x50, 0x88, 0xb4, 0xe0, 0x76, 0x59, 0x15, 0x75, 0x41, 0xf4, 0x75, 0x78, 0x6f, 0xf7, - 0x1c, 0xa3, 0x3c, 0xbc, 0xe0, 0x61, 0x54, 0xa6, 0x6d, 0x6d, 0x38, 0x03, 0xcd, 0xc1, 0x28, 0xc6, - 0x8a, 0x58, 0xa0, 0x5f, 0x61, 0xd5, 0x70, 0x96, 0xb2, 0xa7, 0xec, 0xf7, 0xd9, 0x26, 0x24, 0x0f, - 0xc1, 0x10, 0xf1, 0x3c, 0xdc, 0x54, 0xb7, 0x64, 0x15, 0x44, 0x3c, 0x3f, 0x5b, 0x03, 0x0f, 0xc0, - 0xc8, 0xa3, 0x55, 0x98, 0x8b, 0x24, 0xcc, 0x90, 0x5b, 0xea, 0x9e, 0xb2, 0xaf, 0xb2, 0x7e, 0x1e, - 0xad, 0x5e, 0x8b, 0x64, 0x82, 0x7c, 0xb8, 0x0d, 0xda, 0x49, 0x51, 0xd4, 0x58, 0x0d, 0x8f, 0xc0, - 0xa0, 0xbc, 0xae, 0x3e, 0xae, 0xcf, 0x34, 0x41, 0x6d, 0x36, 0x28, 0x72, 0x43, 0xb3, 0x24, 0x8f, - 0xa0, 0x3b, 0x4f, 0x79, 0x2c, 0x0f, 0xb9, 0x75, 0x38, 0xb0, 0x37, 0x37, 0x79, 0x95, 0xf2, 0x98, - 0xc9, 0xd2, 0xf0, 0x53, 0x17, 0x7a, 0x52, 0x72, 0x0d, 0x2b, 0xff, 0x84, 0xc9, 0x3b, 0xd8, 0x4e, - 0xb9, 0xa8, 0xc3, 0x0a, 0x17, 0xd6, 0x17, 0x7d, 0x4f, 0xd9, 0x37, 0x0e, 0x9f, 0xdb, 0x45, 0x95, - 0xd8, 0x51, 0x19, 0xcd, 0x2e, 0xd1, 0x6e, 0x3a, 0x62, 0xe7, 0x45, 0x8c, 0x99, 0x7d, 0xc1, 0x43, - 0x5c, 0xe1, 0x6c, 0x59, 0x37, 0x8a, 0xab, 0x27, 0xb6, 0xcb, 0x45, 0x5d, 0x2d, 0x67, 0x4d, 0xc8, - 0x70, 0xb1, 0x44, 0x51, 0x3b, 0x1d, 0xa6, 0x37, 0x36, 0x86, 0x0b, 0xf2, 0x01, 0xfa, 0x6b, 0xb1, - 0x28, 0xad, 0xaf, 0xad, 0xf9, 0xc5, 0xff, 0x98, 0x45, 0x59, 0x70, 0x81, 0x4e, 0x87, 0x6d, 0xb7, - 0x6a, 0x51, 0x12, 0x07, 0x7a, 0x98, 0x61, 0x2e, 0xac, 0x6f, 0xad, 0xf7, 0xf1, 0x0d, 0xbd, 0x34, - 0xc3, 0x1c, 0x79, 0x2d, 0x9c, 0x0e, 0x6b, 0x05, 0xe4, 0x3d, 0x18, 0x59, 0x91, 0x84, 0xc8, 0xeb, - 0x2a, 0x45, 0x61, 0x7d, 0x6f, 0x7d, 0xcf, 0x6e, 0xe8, 0x9b, 0x14, 0x89, 0x6c, 0xb4, 0x3d, 0x49, - 0xe5, 0xe5, 0x21, 0x6b, 0x13, 0x29, 0x0a, 0x72, 0x00, 0xda, 0xa5, 0x1c, 0xa2, 0xf5, 0xa3, 0x95, - 0xee, 0x5c, 0xb7, 0xbf, 0x1d, 0xae, 0xd3, 0x61, 0x6b, 0xa2, 0x61, 0x2f, 0xe4, 0xfc, 0xad, 0x9f, - 0x7f, 0xb3, 0xed, 0xbb, 0x68, 0xd8, 0x96, 0x20, 0xf7, 0xa1, 0x5f, 0xa7, 0x39, 0x8a, 0x3a, 0xca, - 0x4b, 0xf9, 0x0a, 0x54, 0xf6, 0x3b, 0x71, 0xd4, 0x03, 0x35, 0x17, 0xc9, 0xc1, 0x67, 0x05, 0xba, - 0xcd, 0x90, 0x89, 0x01, 0xba, 0xeb, 0x9d, 0x8d, 0x26, 0xee, 0xd8, 0xec, 0x90, 0xbb, 0x70, 0xdb, - 0xf5, 0xfc, 0x80, 0x9d, 0x1e, 0x07, 0xee, 0xd4, 0x0b, 0x19, 0x7d, 0x7b, 0x4a, 0xfd, 0xc0, 0x54, - 0x88, 0x05, 0x77, 0xfe, 0x2c, 0xf8, 0x6f, 0xa6, 0x9e, 0x4f, 0xcd, 0x2d, 0xb2, 0x0b, 0x83, 0xf1, - 0x28, 0x18, 0x85, 0x8c, 0x1e, 0x53, 0xf7, 0x8c, 0x8e, 0x4d, 0x95, 0x0c, 0xa0, 0x2f, 0x53, 0x3e, - 0xf5, 0x02, 0xb3, 0x4b, 0x76, 0xc0, 0x98, 0x4c, 0x5f, 0x86, 0xd4, 0x0b, 0x98, 0x4b, 0x7d, 0xb3, - 0x47, 0x00, 0x34, 0x87, 0x8e, 0xc6, 0x94, 0x99, 0x5a, 0xb3, 0x3e, 0x99, 0x4e, 0x03, 0xca, 0x4c, - 0xfd, 0x5c, 0x93, 0x1f, 0xea, 0xe9, 0xaf, 0x00, 0x00, 0x00, 0xff, 0xff, 0xc8, 0x2f, 0xbd, 0x8f, - 0x7d, 0x03, 0x00, 0x00, + type x struct{} + out := protoimpl.TypeBuilder{ + File: protoimpl.DescBuilder{ + GoPackagePath: reflect.TypeOf(x{}).PkgPath(), + RawDescriptor: file_session_proto_rawDesc, + NumEnums: 1, + NumMessages: 4, + NumExtensions: 0, + NumServices: 0, + }, + GoTypes: 
file_session_proto_goTypes, + DependencyIndexes: file_session_proto_depIdxs, + EnumInfos: file_session_proto_enumTypes, + MessageInfos: file_session_proto_msgTypes, + }.Build() + File_session_proto = out.File + file_session_proto_rawDesc = nil + file_session_proto_goTypes = nil + file_session_proto_depIdxs = nil } diff --git a/sdks/go/pkg/beam/core/runtime/harness/statecache/statecache.go b/sdks/go/pkg/beam/core/runtime/harness/statecache/statecache.go new file mode 100644 index 000000000000..5496d8b81252 --- /dev/null +++ b/sdks/go/pkg/beam/core/runtime/harness/statecache/statecache.go @@ -0,0 +1,215 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +// Package statecache implements the state caching feature described by the +// Beam Fn API +// +// The Beam State API and the intended caching behavior are described here: +// https://docs.google.com/document/d/1BOozW0bzBuz4oHJEuZNDOHdzaV5Y56ix58Ozrqm2jFg/edit#heading=h.7ghoih5aig5m +package statecache + +import ( + "sync" + + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + fnpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/fnexecution_v1" +) + +type token string + +// SideInputCache stores a cache of reusable inputs for the purposes of +// eliminating redundant calls to the runner during execution of ParDos +// using side inputs. +// +// A SideInputCache should be initialized when the SDK harness is initialized, +// creating storage for side input caching. On each ProcessBundleRequest, +// the cache will process the list of tokens for cacheable side inputs and +// be queried when side inputs are requested in bundle execution. Once a +// new bundle request comes in the valid tokens will be updated and the cache +// will be re-used. In the event that the cache reaches capacity, a random, +// currently invalid cached object will be evicted. +type SideInputCache struct { + capacity int + mu sync.Mutex + cache map[token]exec.ReusableInput + idsToTokens map[string]token + validTokens map[token]int8 // Maps tokens to active bundle counts + metrics CacheMetrics +} + +type CacheMetrics struct { + Hits int64 + Misses int64 + Evictions int64 + InUseEvictions int64 +} + +// Init makes the cache map and the map of IDs to cache tokens for the +// SideInputCache. Should only be called once. Returns an error for +// non-positive capacities. 
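The package comment above sketches the lifecycle: initialize the cache once when the harness starts, register the cache tokens from each ProcessBundleRequest, consult the cache while the bundle runs, and release the tokens when the bundle completes. A rough end-to-end sketch of that flow against the exported API added below; the IDs and token bytes are invented, and `constInput` assumes that `exec.ReusableInput` is the Init/Value/Reset contract the new tests also rely on:

```go
package main

import (
	"fmt"

	"github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/harness/statecache"
	fnpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/fnexecution_v1"
)

// constInput is a minimal stand-in for a materialized side input.
type constInput struct{ v interface{} }

func (c *constInput) Init() error        { return nil }
func (c *constInput) Value() interface{} { return c.v }
func (c *constInput) Reset() error       { c.v = nil; return nil }

func main() {
	var cache statecache.SideInputCache
	if err := cache.Init(16); err != nil { // once, at harness startup
		panic(err)
	}

	// A cacheable-side-input token as it would arrive on a ProcessBundleRequest.
	tok := fnpb.ProcessBundleRequest_CacheToken{
		Token: []byte("tok-1"),
		Type: &fnpb.ProcessBundleRequest_CacheToken_SideInput_{
			SideInput: &fnpb.ProcessBundleRequest_CacheToken_SideInput{
				TransformId: "t1",
				SideInputId: "s1",
			},
		},
	}

	cache.SetValidTokens(tok) // start of the bundle

	if cache.QueryCache("t1", "s1") == nil { // first read misses...
		cache.SetCache("t1", "s1", &constInput{v: 42}) // ...so materialize and cache
	}
	if in := cache.QueryCache("t1", "s1"); in != nil {
		fmt.Println("cache hit:", in.Value()) // 42
	}

	cache.CompleteBundle(tok) // end of the bundle; the token count drops back to 0
}
```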
+func (c *SideInputCache) Init(cap int) error {
+	if cap <= 0 {
+		return errors.Errorf("capacity must be a positive integer, got %v", cap)
+	}
+	c.mu.Lock()
+	defer c.mu.Unlock()
+	c.cache = make(map[token]exec.ReusableInput, cap)
+	c.idsToTokens = make(map[string]token)
+	c.validTokens = make(map[token]int8)
+	c.capacity = cap
+	return nil
+}
+
+// SetValidTokens clears the list of valid tokens then sets new ones, also updating the mapping of
+// transform and side input IDs to cache tokens in the process. Should be called at the start of every
+// new ProcessBundleRequest. If the runner does not support caching, the passed cache token values
+// should be empty and all get/set requests will silently be no-ops.
+func (c *SideInputCache) SetValidTokens(cacheTokens ...fnpb.ProcessBundleRequest_CacheToken) {
+	c.mu.Lock()
+	defer c.mu.Unlock()
+	for _, tok := range cacheTokens {
+		// User State caching is currently not supported, so these tokens are ignored
+		if tok.GetUserState() != nil {
+			continue
+		}
+		s := tok.GetSideInput()
+		transformID := s.GetTransformId()
+		sideInputID := s.GetSideInputId()
+		t := token(tok.GetToken())
+		c.setValidToken(transformID, sideInputID, t)
+	}
+}
+
+// setValidToken adds a new valid token for a request into the SideInputCache struct
+// by mapping the transform ID and side input ID pairing to the cache token.
+func (c *SideInputCache) setValidToken(transformID, sideInputID string, tok token) {
+	idKey := transformID + sideInputID
+	c.idsToTokens[idKey] = tok
+	count, ok := c.validTokens[tok]
+	if !ok {
+		c.validTokens[tok] = 1
+	} else {
+		c.validTokens[tok] = count + 1
+	}
+}
+
+// CompleteBundle decrements the usage count for the given cache tokens, tracking whether the
+// associated cached values are still in use. Should be called once ProcessBundle has completed.
+func (c *SideInputCache) CompleteBundle(cacheTokens ...fnpb.ProcessBundleRequest_CacheToken) {
+	c.mu.Lock()
+	defer c.mu.Unlock()
+	for _, tok := range cacheTokens {
+		// User State caching is currently not supported, so these tokens are ignored
+		if tok.GetUserState() != nil {
+			continue
+		}
+		t := token(tok.GetToken())
+		c.decrementTokenCount(t)
+	}
+}
+
+// decrementTokenCount decrements the validTokens entry for
+// a given token by 1. Should only be called when completing
+// a bundle.
+func (c *SideInputCache) decrementTokenCount(tok token) {
+	count := c.validTokens[tok]
+	if count == 1 {
+		delete(c.validTokens, tok)
+	} else {
+		c.validTokens[tok] = count - 1
+	}
+}
+
+func (c *SideInputCache) makeAndValidateToken(transformID, sideInputID string) (token, bool) {
+	idKey := transformID + sideInputID
+	// Check if it's a known token
+	tok, ok := c.idsToTokens[idKey]
+	if !ok {
+		return "", false
+	}
+	return tok, c.isValid(tok)
+}
+
+// QueryCache takes a transform ID and side input ID and checks whether a corresponding side
+// input has been cached. A query with a bad token (e.g. one that does not map to a known
+// token, or one that maps to a known but currently invalid token) is treated the same as a
+// cache miss.
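QueryCache treats unknown or currently invalid tokens as misses, and SetCache silently drops inputs the runner never issued a token for, so callers never need to ask whether caching is enabled. A hypothetical helper (not part of this diff; `materialize` stands in for the exec-layer state-reader path) showing where the two calls would sit in a side-input read:

```go
package sideinput // illustrative only; package and function names are invented

import (
	"github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/harness/statecache"
)

// getSideInput consults the cache first and only falls back to the expensive
// runner round trip on a miss; SetCache is a no-op for uncacheable inputs.
func getSideInput(c *statecache.SideInputCache, transformID, sideInputID string,
	materialize func() (exec.ReusableInput, error)) (exec.ReusableInput, error) {

	if in := c.QueryCache(transformID, sideInputID); in != nil {
		return in, nil // cache hit
	}
	in, err := materialize() // e.g. decode the side input from the state API
	if err != nil {
		return nil, err
	}
	c.SetCache(transformID, sideInputID, in)
	return in, nil
}
```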
+func (c *SideInputCache) QueryCache(transformID, sideInputID string) exec.ReusableInput { + c.mu.Lock() + defer c.mu.Unlock() + tok, ok := c.makeAndValidateToken(transformID, sideInputID) + if !ok { + return nil + } + // Check to see if cached + input, ok := c.cache[tok] + if !ok { + c.metrics.Misses++ + return nil + } + + c.metrics.Hits++ + return input +} + +// SetCache allows a user to place a ReusableInput materialized from the reader into the SideInputCache +// with its corresponding transform ID and side input ID. If the IDs do not pair with a known, valid token +// then we silently do not cache the input, as this is an indication that the runner is treating that input +// as uncacheable. +func (c *SideInputCache) SetCache(transformID, sideInputID string, input exec.ReusableInput) { + c.mu.Lock() + defer c.mu.Unlock() + tok, ok := c.makeAndValidateToken(transformID, sideInputID) + if !ok { + return + } + if len(c.cache) >= c.capacity { + c.evictElement() + } + c.cache[tok] = input +} + +func (c *SideInputCache) isValid(tok token) bool { + count, ok := c.validTokens[tok] + // If the token is not known or not in use, return false + return ok && count > 0 +} + +// evictElement randomly evicts a ReusableInput that is not currently valid from the cache. +// It should only be called by a goroutine that obtained the lock in SetCache. +func (c *SideInputCache) evictElement() { + deleted := false + // Select a key from the cache at random + for k := range c.cache { + // Do not evict an element if it's currently valid + if !c.isValid(k) { + delete(c.cache, k) + c.metrics.Evictions++ + deleted = true + break + } + } + // Nothing is deleted if every side input is still valid. Clear + // out a random entry and record the in-use eviction + if !deleted { + for k := range c.cache { + delete(c.cache, k) + c.metrics.InUseEvictions++ + break + } + } +} diff --git a/sdks/go/pkg/beam/core/runtime/harness/statecache/statecache_test.go b/sdks/go/pkg/beam/core/runtime/harness/statecache/statecache_test.go new file mode 100644 index 000000000000..b9970c398154 --- /dev/null +++ b/sdks/go/pkg/beam/core/runtime/harness/statecache/statecache_test.go @@ -0,0 +1,290 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package statecache + +import ( + "testing" + + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec" + fnpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/fnexecution_v1" +) + +// TestReusableInput implements the ReusableInput interface for the purposes +// of testing. 
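evictElement above only reclaims entries whose tokens have dropped out of use (CompleteBundle decremented them to zero); when every entry is still referenced it evicts an arbitrary one and records it under InUseEvictions rather than refusing to cache. The new TestSetCache_Eviction and TestSetCache_EvictionFailure cases later in this file cover both branches; the same rule, condensed into one illustrative in-package test that reuses the `makeRequest` and `makeTestReusableInput` helpers defined below:

```go
func TestEvictionPrefersExpiredTokens(t *testing.T) {
	var s SideInputCache
	if err := s.Init(1); err != nil {
		t.Fatalf("cache init failed, got %v", err)
	}

	tokA := makeRequest("tA", "sA", "tok-a")
	s.SetValidTokens(tokA)
	s.SetCache("tA", "sA", makeTestReusableInput("tA", "sA", 1))
	s.CompleteBundle(tokA) // tok-a is now unreferenced, so its entry is fair game

	tokB := makeRequest("tB", "sB", "tok-b")
	s.SetValidTokens(tokB)
	s.SetCache("tB", "sB", makeTestReusableInput("tB", "sB", 2)) // evicts the tok-a entry

	if got := s.metrics.Evictions; got != 1 {
		t.Errorf("expected 1 eviction of an expired entry, got %v", got)
	}
	if got := s.metrics.InUseEvictions; got != 0 {
		t.Errorf("expected no in-use evictions, got %v", got)
	}
}
```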
+type TestReusableInput struct { + transformID string + sideInputID string + value interface{} +} + +func makeTestReusableInput(transformID, sideInputID string, value interface{}) exec.ReusableInput { + return &TestReusableInput{transformID: transformID, sideInputID: sideInputID, value: value} +} + +// Init is a ReusableInput interface method, this is a no-op. +func (r *TestReusableInput) Init() error { + return nil +} + +// Value returns the stored value in the TestReusableInput. +func (r *TestReusableInput) Value() interface{} { + return r.value +} + +// Reset clears the value in the TestReusableInput. +func (r *TestReusableInput) Reset() error { + r.value = nil + return nil +} + +func TestInit(t *testing.T) { + var s SideInputCache + err := s.Init(5) + if err != nil { + t.Errorf("SideInputCache failed but should have succeeded, got %v", err) + } +} + +func TestInit_Bad(t *testing.T) { + var s SideInputCache + err := s.Init(0) + if err == nil { + t.Error("SideInputCache init succeeded but should have failed") + } +} + +func TestQueryCache_EmptyCase(t *testing.T) { + var s SideInputCache + err := s.Init(1) + if err != nil { + t.Fatalf("cache init failed, got %v", err) + } + output := s.QueryCache("side1", "transform1") + if output != nil { + t.Errorf("Cache hit when it should have missed, got %v", output) + } +} + +func TestSetCache_UncacheableCase(t *testing.T) { + var s SideInputCache + err := s.Init(1) + if err != nil { + t.Fatalf("cache init failed, got %v", err) + } + input := makeTestReusableInput("t1", "s1", 10) + s.SetCache("t1", "s1", input) + output := s.QueryCache("t1", "s1") + if output != nil { + t.Errorf("Cache hit when should have missed, got %v", output) + } +} + +func TestSetCache_CacheableCase(t *testing.T) { + var s SideInputCache + err := s.Init(1) + if err != nil { + t.Fatalf("cache init failed, got %v", err) + } + transID := "t1" + sideID := "s1" + tok := token("tok1") + s.setValidToken(transID, sideID, tok) + input := makeTestReusableInput(transID, sideID, 10) + s.SetCache(transID, sideID, input) + output := s.QueryCache(transID, sideID) + if output == nil { + t.Fatalf("call to query cache missed when should have hit") + } + val, ok := output.Value().(int) + if !ok { + t.Errorf("failed to convert value to integer, got %v", output.Value()) + } + if val != 10 { + t.Errorf("element mismatch, expected 10, got %v", val) + } +} + +func makeRequest(transformID, sideInputID string, t token) fnpb.ProcessBundleRequest_CacheToken { + var tok fnpb.ProcessBundleRequest_CacheToken + var wrap fnpb.ProcessBundleRequest_CacheToken_SideInput_ + var side fnpb.ProcessBundleRequest_CacheToken_SideInput + side.TransformId = transformID + side.SideInputId = sideInputID + wrap.SideInput = &side + tok.Type = &wrap + tok.Token = []byte(t) + return tok +} + +func TestSetValidTokens(t *testing.T) { + inputs := []struct { + transformID string + sideInputID string + tok token + }{ + { + "t1", + "s1", + "tok1", + }, + { + "t2", + "s2", + "tok2", + }, + { + "t3", + "s3", + "tok3", + }, + } + + var s SideInputCache + err := s.Init(3) + if err != nil { + t.Fatalf("cache init failed, got %v", err) + } + + var tokens []fnpb.ProcessBundleRequest_CacheToken + for _, input := range inputs { + t := makeRequest(input.transformID, input.sideInputID, input.tok) + tokens = append(tokens, t) + } + + s.SetValidTokens(tokens...) 
+ if len(s.idsToTokens) != len(inputs) { + t.Errorf("Missing tokens, expected %v, got %v", len(inputs), len(s.idsToTokens)) + } + + for i, input := range inputs { + // Check that the token is in the valid list + if !s.isValid(input.tok) { + t.Errorf("error in input %v, token %v is not valid", i, input.tok) + } + // Check that the mapping of IDs to tokens is correct + mapped := s.idsToTokens[input.transformID+input.sideInputID] + if mapped != input.tok { + t.Errorf("token mismatch for input %v, expected %v, got %v", i, input.tok, mapped) + } + } +} + +func TestSetValidTokens_ClearingBetween(t *testing.T) { + inputs := []struct { + transformID string + sideInputID string + tk token + }{ + { + "t1", + "s1", + "tok1", + }, + { + "t2", + "s2", + "tok2", + }, + { + "t3", + "s3", + "tok3", + }, + } + + var s SideInputCache + err := s.Init(1) + if err != nil { + t.Fatalf("cache init failed, got %v", err) + } + + for i, input := range inputs { + tok := makeRequest(input.transformID, input.sideInputID, input.tk) + + s.SetValidTokens(tok) + + // Check that the token is in the valid list + if !s.isValid(input.tk) { + t.Errorf("error in input %v, token %v is not valid", i, input.tk) + } + // Check that the mapping of IDs to tokens is correct + mapped := s.idsToTokens[input.transformID+input.sideInputID] + if mapped != input.tk { + t.Errorf("token mismatch for input %v, expected %v, got %v", i, input.tk, mapped) + } + + s.CompleteBundle(tok) + } + + for k, _ := range s.validTokens { + if s.validTokens[k] != 0 { + t.Errorf("token count mismatch for token %v, expected 0, got %v", k, s.validTokens[k]) + } + } +} + +func TestSetCache_Eviction(t *testing.T) { + var s SideInputCache + err := s.Init(1) + if err != nil { + t.Fatalf("cache init failed, got %v", err) + } + + tokOne := makeRequest("t1", "s1", "tok1") + inOne := makeTestReusableInput("t1", "s1", 10) + s.SetValidTokens(tokOne) + s.SetCache("t1", "s1", inOne) + // Mark bundle as complete, drop count for tokOne to 0 + s.CompleteBundle(tokOne) + + tokTwo := makeRequest("t2", "s2", "tok2") + inTwo := makeTestReusableInput("t2", "s2", 20) + s.SetValidTokens(tokTwo) + s.SetCache("t2", "s2", inTwo) + + if len(s.cache) != 1 { + t.Errorf("cache size incorrect, expected 1, got %v", len(s.cache)) + } + if s.metrics.Evictions != 1 { + t.Errorf("number evictions incorrect, expected 1, got %v", s.metrics.Evictions) + } +} + +func TestSetCache_EvictionFailure(t *testing.T) { + var s SideInputCache + err := s.Init(1) + if err != nil { + t.Fatalf("cache init failed, got %v", err) + } + + tokOne := makeRequest("t1", "s1", "tok1") + inOne := makeTestReusableInput("t1", "s1", 10) + + tokTwo := makeRequest("t2", "s2", "tok2") + inTwo := makeTestReusableInput("t2", "s2", 20) + + s.SetValidTokens(tokOne, tokTwo) + s.SetCache("t1", "s1", inOne) + // Should fail to evict because the first token is still valid + s.SetCache("t2", "s2", inTwo) + // Cache should not exceed size 1 + if len(s.cache) != 1 { + t.Errorf("cache size incorrect, expected 1, got %v", len(s.cache)) + } + if s.metrics.InUseEvictions != 1 { + t.Errorf("number of failed evicition calls incorrect, expected 1, got %v", s.metrics.InUseEvictions) + } +} diff --git a/sdks/go/pkg/beam/core/runtime/harness/statemgr.go b/sdks/go/pkg/beam/core/runtime/harness/statemgr.go index 95c5f7260000..09daa1185dc4 100644 --- a/sdks/go/pkg/beam/core/runtime/harness/statemgr.go +++ b/sdks/go/pkg/beam/core/runtime/harness/statemgr.go @@ -23,10 +23,10 @@ import ( "sync/atomic" "time" - 
"github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - "github.com/apache/beam/sdks/go/pkg/beam/log" - fnpb "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + fnpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/fnexecution_v1" "github.com/golang/protobuf/proto" ) diff --git a/sdks/go/pkg/beam/core/runtime/harness/statemgr_test.go b/sdks/go/pkg/beam/core/runtime/harness/statemgr_test.go index 9d0389164507..666f839e7a48 100644 --- a/sdks/go/pkg/beam/core/runtime/harness/statemgr_test.go +++ b/sdks/go/pkg/beam/core/runtime/harness/statemgr_test.go @@ -27,8 +27,8 @@ import ( "testing" "time" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - fnpb "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + fnpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/fnexecution_v1" ) // fakeStateClient replicates the call and response protocol diff --git a/sdks/go/pkg/beam/core/runtime/metricsx/metricsx.go b/sdks/go/pkg/beam/core/runtime/metricsx/metricsx.go index 6cd10b4e9a35..1d0dd32ec6a9 100644 --- a/sdks/go/pkg/beam/core/runtime/metricsx/metricsx.go +++ b/sdks/go/pkg/beam/core/runtime/metricsx/metricsx.go @@ -21,9 +21,9 @@ import ( "log" "time" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/metrics" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/metrics" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" ) // FromMonitoringInfos extracts metrics from monitored states and @@ -32,7 +32,7 @@ func FromMonitoringInfos(attempted []*pipepb.MonitoringInfo, committed []*pipepb ac, ad, ag := groupByType(attempted) cc, cd, cg := groupByType(committed) - return metrics.NewResults(mergeCounters(ac, cc), mergeDistributions(ad, cd), mergeGauges(ag, cg)) + return metrics.NewResults(metrics.MergeCounters(ac, cc), metrics.MergeDistributions(ad, cd), metrics.MergeGauges(ag, cg)) } func groupByType(minfos []*pipepb.MonitoringInfo) ( @@ -84,42 +84,6 @@ func groupByType(minfos []*pipepb.MonitoringInfo) ( return counters, distributions, gauges } -func mergeCounters( - attempted map[metrics.StepKey]int64, - committed map[metrics.StepKey]int64) []metrics.CounterResult { - res := make([]metrics.CounterResult, 0) - - for k := range attempted { - v := committed[k] - res = append(res, metrics.CounterResult{Attempted: attempted[k], Committed: v, Key: k}) - } - return res -} - -func mergeDistributions( - attempted map[metrics.StepKey]metrics.DistributionValue, - committed map[metrics.StepKey]metrics.DistributionValue) []metrics.DistributionResult { - res := make([]metrics.DistributionResult, 0) - - for k := range attempted { - v := committed[k] - res = append(res, metrics.DistributionResult{Attempted: attempted[k], Committed: v, Key: k}) - } - return res -} - -func mergeGauges( - attempted map[metrics.StepKey]metrics.GaugeValue, - committed map[metrics.StepKey]metrics.GaugeValue) []metrics.GaugeResult { - res := make([]metrics.GaugeResult, 0) - - for k := range attempted { - v := committed[k] - res = append(res, metrics.GaugeResult{Attempted: attempted[k], Committed: v, Key: 
k}) - } - return res -} - func extractKey(mi *pipepb.MonitoringInfo) (metrics.StepKey, error) { labels := newLabels(mi.GetLabels()) stepName := labels.Transform() diff --git a/sdks/go/pkg/beam/core/runtime/metricsx/metricsx_test.go b/sdks/go/pkg/beam/core/runtime/metricsx/metricsx_test.go index 83add8798cd0..d5406da270ed 100644 --- a/sdks/go/pkg/beam/core/runtime/metricsx/metricsx_test.go +++ b/sdks/go/pkg/beam/core/runtime/metricsx/metricsx_test.go @@ -19,8 +19,8 @@ import ( "testing" "time" - "github.com/apache/beam/sdks/go/pkg/beam/core/metrics" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/metrics" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" "github.com/google/go-cmp/cmp" ) diff --git a/sdks/go/pkg/beam/core/runtime/metricsx/urns.go b/sdks/go/pkg/beam/core/runtime/metricsx/urns.go index 67aed96a819f..814e3f3a697e 100644 --- a/sdks/go/pkg/beam/core/runtime/metricsx/urns.go +++ b/sdks/go/pkg/beam/core/runtime/metricsx/urns.go @@ -19,8 +19,8 @@ import ( "bytes" "time" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/mtime" ) // Urn is an enum type for representing urns of metrics and monitored states. diff --git a/sdks/go/pkg/beam/core/runtime/pipelinex/clone.go b/sdks/go/pkg/beam/core/runtime/pipelinex/clone.go index e48db2b5f5e3..2af1868fbf43 100644 --- a/sdks/go/pkg/beam/core/runtime/pipelinex/clone.go +++ b/sdks/go/pkg/beam/core/runtime/pipelinex/clone.go @@ -16,8 +16,8 @@ package pipelinex import ( - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" ) func shallowClonePipeline(p *pipepb.Pipeline) *pipepb.Pipeline { @@ -49,6 +49,7 @@ func ShallowClonePTransform(t *pipepb.PTransform) *pipepb.PTransform { UniqueName: t.UniqueName, Spec: t.Spec, DisplayData: t.DisplayData, + Annotations: t.Annotations, } ret.Subtransforms, _ = reflectx.ShallowClone(t.Subtransforms).([]string) ret.Inputs, _ = reflectx.ShallowClone(t.Inputs).(map[string]string) diff --git a/sdks/go/pkg/beam/core/runtime/pipelinex/clone_test.go b/sdks/go/pkg/beam/core/runtime/pipelinex/clone_test.go index bf18cab5cb40..695830a483c0 100644 --- a/sdks/go/pkg/beam/core/runtime/pipelinex/clone_test.go +++ b/sdks/go/pkg/beam/core/runtime/pipelinex/clone_test.go @@ -18,7 +18,7 @@ package pipelinex import ( "testing" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" "github.com/golang/protobuf/proto" "github.com/google/go-cmp/cmp" ) diff --git a/sdks/go/pkg/beam/core/runtime/pipelinex/replace.go b/sdks/go/pkg/beam/core/runtime/pipelinex/replace.go index 8a0a3b907709..b27c07885c1c 100644 --- a/sdks/go/pkg/beam/core/runtime/pipelinex/replace.go +++ b/sdks/go/pkg/beam/core/runtime/pipelinex/replace.go @@ -19,11 +19,16 @@ package pipelinex import ( "fmt" + "path" + "regexp" "sort" + "strconv" + "strings" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" + 
"github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" + "github.com/golang/protobuf/proto" ) // Update merges a pipeline with the given components, which may add, replace @@ -39,6 +44,11 @@ func Update(p *pipepb.Pipeline, values *pipepb.Components) (*pipepb.Pipeline, er return Normalize(ret) } +// IdempotentNormalize determines whether to use the idempotent version +// of ensureUniqueNames or the legacy version. +// TODO(BEAM-12341): Cleanup once nothing depends on the legacy implementation. +var IdempotentNormalize bool = true + // Normalize recomputes derivative information in the pipeline, such // as roots and input/output for composite transforms. It also // ensures that unique names are so and topologically sorts each @@ -49,7 +59,11 @@ func Normalize(p *pipepb.Pipeline) (*pipepb.Pipeline, error) { } ret := shallowClonePipeline(p) - ret.Components.Transforms = ensureUniqueNames(ret.Components.Transforms) + if IdempotentNormalize { + ret.Components.Transforms = ensureUniqueNames(ret.Components.Transforms) + } else { + ret.Components.Transforms = ensureUniqueNamesLegacy(ret.Components.Transforms) + } ret.Components.Transforms = computeCompositeInputOutput(ret.Components.Transforms) ret.RootTransformIds = computeRoots(ret.Components.Transforms) return ret, nil @@ -205,11 +219,148 @@ func externalIns(counted map[string]bool, xforms map[string]*pipepb.PTransform, } } -// ensureUniqueNames ensures that each name is unique. Any conflict is -// resolved by adding '1, '2, etc to the name. +type idSorted []string + +func (s idSorted) Len() int { + return len(s) +} +func (s idSorted) Swap(i, j int) { + s[i], s[j] = s[j], s[i] +} + +// Go SDK ids for transforms are "e#" or "s#" and we want to +// sort them properly at the root level at least. Cross lang +// transforms or expanded nodes (like CoGBK) don't follow this +// format should be sorted lexicographically, but are wrapped in +// a composite ptransform meaning they're compared to fewer +// transforms. +var idParseExp = regexp.MustCompile(`(\D*)(\d*)`) + +func (s idSorted) Less(i, j int) bool { + // We want to sort alphabetically by id prefix + // and numerically by id suffix. + // Otherwise, values are compared lexicographically. + iM := idParseExp.FindStringSubmatch(s[i]) + jM := idParseExp.FindStringSubmatch(s[j]) + if iM == nil || jM == nil { + return s[i] < s[j] + } + // check if the letters match. + if iM[1] < jM[1] { + return true + } + if iM[1] > jM[1] { + return false + } + // The letters match, check the numbers. + // We can ignore the errors here due to the regex check. + iN, _ := strconv.Atoi(iM[2]) + jN, _ := strconv.Atoi(jM[2]) + if iN < jN { + return true + } + return false +} + +func separateCompsAndLeaves(xforms map[string]*pipepb.PTransform) (comp, leaf []string) { + var cs, ls idSorted + for id, pt := range xforms { + if len(pt.GetSubtransforms()) == 0 { + // No subtransforms, it's a leaf! + ls = append(ls, id) + } else { + // Subtransforms, it's a composite + cs = append(cs, id) + } + } + // Sort the transforms to make to make renaming deterministic. + sort.Sort(cs) + sort.Sort(ls) + return []string(cs), []string(ls) +} + +// ensureUniqueNames ensures that each name is unique. +// +// Subtransforms are prefixed with the names of their parent, separated by a '/'. +// Any conflict is resolved by adding '1, '2, etc to the name. 
func ensureUniqueNames(xforms map[string]*pipepb.PTransform) map[string]*pipepb.PTransform { ret := reflectx.ShallowClone(xforms).(map[string]*pipepb.PTransform) + comp, leaf := separateCompsAndLeaves(xforms) + parentLookup := make(map[string]string) // childID -> parentID + for _, parentID := range comp { + t := xforms[parentID] + children := t.GetSubtransforms() + for _, childID := range children { + parentLookup[childID] = parentID + } + } + + parentNameCache := make(map[string]string) // parentID -> parentName + seen := make(map[string]bool) + // Closure to to make the names unique so we can handle all the parent ids first. + uniquify := func(id string) string { + t := xforms[id] + base := path.Base(t.GetUniqueName()) + var prefix string + if parentID, ok := parentLookup[id]; ok { + prefix = getParentName(parentNameCache, parentLookup, parentID, xforms) + } + base = prefix + base + name := findFreeName(seen, base) + seen[name] = true + + if name != t.UniqueName { + upd := ShallowClonePTransform(t) + upd.UniqueName = name + ret[id] = upd + } + return name + } + for _, id := range comp { + name := uniquify(id) + parentNameCache[id] = name + "/" + } + for _, id := range leaf { + uniquify(id) + } + return ret +} + +func getParentName(nameCache, parentLookup map[string]string, parentID string, xforms map[string]*pipepb.PTransform) string { + if name, ok := nameCache[parentID]; ok { + return name + } + var parts []string + curID := parentID + for { + t := xforms[curID] + // Construct composite names from scratch if the parent's not + // already in the cache. Otherwise there's a risk of errors from + // not following topological orderings. + parts = append(parts, path.Base(t.GetUniqueName())) + if pid, ok := parentLookup[curID]; ok { + curID = pid + continue + } + break + } + + // reverse the parts so parents are first. + for i, j := 0, len(parts)-1; i < j; i, j = i+1, j-1 { + parts[i], parts[j] = parts[j], parts[i] + } + name := strings.Join(parts, "/") + "/" + nameCache[parentID] = name + return name +} + +// ensureUniqueNamesLegacy ensures that each name is unique. Any conflict is +// resolved by adding '1, '2, etc to the name. +// Older version that wasn't idempotent. Sticking around for temporary migration purposes. +func ensureUniqueNamesLegacy(xforms map[string]*pipepb.PTransform) map[string]*pipepb.PTransform { + ret := reflectx.ShallowClone(xforms).(map[string]*pipepb.PTransform) + // Sort the transforms to make to make renaming deterministic. var ordering []string for id := range xforms { @@ -243,3 +394,45 @@ func findFreeName(seen map[string]bool, name string) string { } } } + +// ApplySdkImageOverrides takes a pipeline and a map of patterns to overrides, +// and proceeds to replace matching ContainerImages in any Environments +// present in the pipeline. Each environment is expected to match at most one +// pattern. If an environment matches two or more it is arbitrary which +// pattern will be applied. +func ApplySdkImageOverrides(p *pipepb.Pipeline, patterns map[string]string) error { + if len(patterns) == 0 { + return nil + } + + // Precompile all patterns as regexes. 
+ regexes := make(map[*regexp.Regexp]string, len(patterns)) + for p, r := range patterns { + re, err := regexp.Compile(p) + if err != nil { + return err + } + regexes[re] = r + } + + for _, env := range p.GetComponents().GetEnvironments() { + var payload pipepb.DockerPayload + if err := proto.Unmarshal(env.GetPayload(), &payload); err != nil { + return err + } + oldImg := payload.GetContainerImage() + for re, replacement := range regexes { + newImg := re.ReplaceAllLiteralString(oldImg, replacement) + if newImg != oldImg { + payload.ContainerImage = newImg + pl, err := proto.Marshal(&payload) + if err != nil { + return err + } + env.Payload = pl + break // Apply at most one override to each environment. + } + } + } + return nil +} diff --git a/sdks/go/pkg/beam/core/runtime/pipelinex/replace_test.go b/sdks/go/pkg/beam/core/runtime/pipelinex/replace_test.go index 7bb395a33861..c6333cf09060 100644 --- a/sdks/go/pkg/beam/core/runtime/pipelinex/replace_test.go +++ b/sdks/go/pkg/beam/core/runtime/pipelinex/replace_test.go @@ -18,16 +18,19 @@ package pipelinex import ( "testing" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" "github.com/golang/protobuf/proto" "github.com/google/go-cmp/cmp" + "google.golang.org/protobuf/testing/protocmp" ) func TestEnsureUniqueName(t *testing.T) { tests := []struct { + name string in, exp map[string]*pipepb.PTransform }{ { + name: "AlreadyUnique", in: map[string]*pipepb.PTransform{ "1": {UniqueName: "a"}, "2": {UniqueName: "b"}, @@ -40,6 +43,7 @@ func TestEnsureUniqueName(t *testing.T) { }, }, { + name: "NeedsUniqueLeaves", in: map[string]*pipepb.PTransform{ "2": {UniqueName: "a"}, "1": {UniqueName: "a"}, @@ -51,13 +55,118 @@ func TestEnsureUniqueName(t *testing.T) { "3": {UniqueName: "a'2"}, }, }, + { + name: "StripUniqueLeaves", + in: map[string]*pipepb.PTransform{ + "1": {UniqueName: "a"}, + "2": {UniqueName: "a'1"}, + "3": {UniqueName: "a'2"}, + }, + exp: map[string]*pipepb.PTransform{ + "1": {UniqueName: "a"}, + "2": {UniqueName: "a'1"}, + "3": {UniqueName: "a'2"}, + }, + }, + { + name: "NonTopologicalIdOrder", + in: map[string]*pipepb.PTransform{ + "e1": {UniqueName: "a"}, + "s1": {UniqueName: "a", Subtransforms: []string{"e1"}}, + "s2": {UniqueName: "a", Subtransforms: []string{"s1"}}, + "s3": {UniqueName: "a", Subtransforms: []string{"s2"}}, // root + }, + exp: map[string]*pipepb.PTransform{ + "e1": {UniqueName: "a/a/a/a"}, + "s1": {UniqueName: "a/a/a", Subtransforms: []string{"e1"}}, + "s2": {UniqueName: "a/a", Subtransforms: []string{"s1"}}, + "s3": {UniqueName: "a", Subtransforms: []string{"s2"}}, // root + }, + }, + { + name: "UniqueComps", + in: map[string]*pipepb.PTransform{ + "e1": {UniqueName: "a"}, + "e2": {UniqueName: "a"}, + "s1": {UniqueName: "a", Subtransforms: []string{"e1"}}, + "s2": {UniqueName: "a", Subtransforms: []string{"e2"}}, + }, + exp: map[string]*pipepb.PTransform{ + "e1": {UniqueName: "a/a"}, + "e2": {UniqueName: "a'1/a"}, + "s1": {UniqueName: "a", Subtransforms: []string{"e1"}}, + "s2": {UniqueName: "a'1", Subtransforms: []string{"e2"}}, + }, + }, + { + name: "StripComps", + in: map[string]*pipepb.PTransform{ + "e1": {UniqueName: "a/a"}, + "e2": {UniqueName: "a'1/a"}, + "s1": {UniqueName: "a", Subtransforms: []string{"e1"}}, + "s2": {UniqueName: "a'1", Subtransforms: []string{"e2"}}, + }, + exp: map[string]*pipepb.PTransform{ + "e1": {UniqueName: "a/a"}, + "e2": {UniqueName: "a'1/a"}, + "s1": {UniqueName: "a", Subtransforms: 
[]string{"e1"}}, + "s2": {UniqueName: "a'1", Subtransforms: []string{"e2"}}, + }, + }, + { + name: "large", + in: map[string]*pipepb.PTransform{ + "e1": {UniqueName: "a"}, + "e2": {UniqueName: "a"}, + "e3": {UniqueName: "a"}, + "e4": {UniqueName: "a"}, + "e5": {UniqueName: "a"}, + "e6": {UniqueName: "a"}, + "e7": {UniqueName: "a"}, + "e8": {UniqueName: "a"}, + "e9": {UniqueName: "a"}, + "e10": {UniqueName: "a"}, + "e11": {UniqueName: "a"}, + "e12": {UniqueName: "a"}, + "s1": {UniqueName: "a", Subtransforms: []string{"s2", "s3"}}, + "s2": {UniqueName: "a", Subtransforms: []string{"s4", "s5"}}, + "s3": {UniqueName: "a", Subtransforms: []string{"s6", "s7"}}, + "s4": {UniqueName: "a", Subtransforms: []string{"e1"}}, + "s5": {UniqueName: "a", Subtransforms: []string{"e2", "e3"}}, + "s6": {UniqueName: "a", Subtransforms: []string{"e4", "e5", "e6"}}, + "s7": {UniqueName: "a", Subtransforms: []string{"e7", "e8", "e9", "e10", "e11", "e12"}}, + }, + exp: map[string]*pipepb.PTransform{ + "e1": {UniqueName: "a/a/a/a"}, + "e2": {UniqueName: "a/a/a'1/a"}, + "e3": {UniqueName: "a/a/a'1/a'1"}, + "e4": {UniqueName: "a/a'1/a/a"}, + "e5": {UniqueName: "a/a'1/a/a'1"}, + "e6": {UniqueName: "a/a'1/a/a'2"}, + "e7": {UniqueName: "a/a'1/a'1/a"}, + "e8": {UniqueName: "a/a'1/a'1/a'1"}, + "e9": {UniqueName: "a/a'1/a'1/a'2"}, + "e10": {UniqueName: "a/a'1/a'1/a'3"}, + "e11": {UniqueName: "a/a'1/a'1/a'4"}, + "e12": {UniqueName: "a/a'1/a'1/a'5"}, + "s1": {UniqueName: "a", Subtransforms: []string{"s2", "s3"}}, + "s2": {UniqueName: "a/a", Subtransforms: []string{"s4", "s5"}}, + "s3": {UniqueName: "a/a'1", Subtransforms: []string{"s6", "s7"}}, + "s4": {UniqueName: "a/a/a", Subtransforms: []string{"e1"}}, + "s5": {UniqueName: "a/a/a'1", Subtransforms: []string{"e2", "e3"}}, + "s6": {UniqueName: "a/a'1/a", Subtransforms: []string{"e4", "e5", "e6"}}, + "s7": {UniqueName: "a/a'1/a'1", Subtransforms: []string{"e7", "e8", "e9", "e10", "e11", "e12"}}, + }, + }, } for _, test := range tests { - actual := ensureUniqueNames(test.in) - if !cmp.Equal(actual, test.exp, cmp.Comparer(proto.Equal)) { - t.Errorf("ensureUniqueName(%v) = %v, want %v", test.in, actual, test.exp) - } + t.Run(test.name, func(t *testing.T) { + actual := ensureUniqueNames(test.in) + if d := cmp.Diff(test.exp, actual, protocmp.Transform()); d != "" { + t.Errorf("ensureUniqueName(%v) = %v, want %v\n %v", test.in, actual, test.exp, d) + } + }) } } @@ -200,3 +309,95 @@ func TestComputeInputOutput(t *testing.T) { }) } } + +func TestApplySdkImageOverrides(t *testing.T) { + tests := []struct { + name string + patterns map[string]string + envs map[string]string // Environment ID to container image name. + want map[string]string // Environment ID to final container image names. 
+ }{ + { + name: "Basic", + patterns: map[string]string{".*foo.*": "foo:override"}, + envs: map[string]string{ + "foobar": "foo:invalid", + "bar": "bar:valid", + }, + want: map[string]string{ + "foobar": "foo:override", + "bar": "bar:valid", + }, + }, + { + name: "MultiplePatterns", + patterns: map[string]string{ + ".*foo.*": "foo:override", + ".*bar.*": "bar:override", + }, + envs: map[string]string{ + "foobaz": "foo:invalid", + "barbaz": "bar:invalid", + }, + want: map[string]string{ + "foobaz": "foo:override", + "barbaz": "bar:override", + }, + }, + { + name: "MultipleMatches", + patterns: map[string]string{".*foo.*": "foo:override"}, + envs: map[string]string{ + "foo1": "foo1:invalid", + "foo2": "foo2:invalid", + }, + want: map[string]string{ + "foo1": "foo:override", + "foo2": "foo:override", + }, + }, + } + + for _, test := range tests { + t.Run(test.name, func(t *testing.T) { + envs := make(map[string]*pipepb.Environment) + for id, ci := range test.envs { + env := buildEnvironment(t, ci) + envs[id] = env + } + wantEnvs := make(map[string]*pipepb.Environment) + for id, ci := range test.want { + env := buildEnvironment(t, ci) + wantEnvs[id] = env + } + + p := &pipepb.Pipeline{ + Components: &pipepb.Components{ + Environments: envs, + }, + } + if err := ApplySdkImageOverrides(p, test.patterns); err != nil { + t.Fatalf("ApplySdkImageOverrides failed: %v", err) + } + if diff := cmp.Diff(envs, wantEnvs, protocmp.Transform()); diff != "" { + t.Errorf("ApplySdkImageOverrides gave incorrect output: diff(-want,+got):\n %v", diff) + } + }) + } +} + +func buildEnvironment(t *testing.T, containerImg string) *pipepb.Environment { + t.Helper() + env := &pipepb.Environment{ + Urn: "alpha", + DisplayData: []*pipepb.DisplayData{{Urn: "beta"}}, + Capabilities: []string{"delta", "gamma"}, + } + pl := pipepb.DockerPayload{ContainerImage: containerImg} + plb, err := proto.Marshal(&pl) + if err != nil { + t.Fatalf("Failed to marshal DockerPayload with container image %v: %v", containerImg, err) + } + env.Payload = plb + return env +} diff --git a/sdks/go/pkg/beam/core/runtime/pipelinex/util.go b/sdks/go/pkg/beam/core/runtime/pipelinex/util.go index 9aae0454e56e..70f914b4440a 100644 --- a/sdks/go/pkg/beam/core/runtime/pipelinex/util.go +++ b/sdks/go/pkg/beam/core/runtime/pipelinex/util.go @@ -18,7 +18,7 @@ package pipelinex import ( "sort" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" "github.com/golang/protobuf/proto" ) diff --git a/sdks/go/pkg/beam/core/runtime/pipelinex/util_test.go b/sdks/go/pkg/beam/core/runtime/pipelinex/util_test.go index 8f19c12b3257..c8a6091e878f 100644 --- a/sdks/go/pkg/beam/core/runtime/pipelinex/util_test.go +++ b/sdks/go/pkg/beam/core/runtime/pipelinex/util_test.go @@ -19,7 +19,7 @@ import ( "fmt" "testing" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" "github.com/google/go-cmp/cmp" ) diff --git a/sdks/go/pkg/beam/core/runtime/symbols.go b/sdks/go/pkg/beam/core/runtime/symbols.go index 157d287b21a5..08e3de199c5e 100644 --- a/sdks/go/pkg/beam/core/runtime/symbols.go +++ b/sdks/go/pkg/beam/core/runtime/symbols.go @@ -20,9 +20,9 @@ import ( "reflect" "sync" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/symtab" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + 
"github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/symtab" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) var ( diff --git a/sdks/go/pkg/beam/core/runtime/types.go b/sdks/go/pkg/beam/core/runtime/types.go index b853a6fc3f0b..1e02b7dff9f9 100644 --- a/sdks/go/pkg/beam/core/runtime/types.go +++ b/sdks/go/pkg/beam/core/runtime/types.go @@ -19,7 +19,7 @@ import ( "fmt" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) var types = make(map[string]reflect.Type) diff --git a/sdks/go/pkg/beam/core/runtime/types_test.go b/sdks/go/pkg/beam/core/runtime/types_test.go index fd9fa6496ab6..ed9b9e9606ae 100644 --- a/sdks/go/pkg/beam/core/runtime/types_test.go +++ b/sdks/go/pkg/beam/core/runtime/types_test.go @@ -19,7 +19,7 @@ import ( "reflect" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) type S struct { diff --git a/sdks/go/pkg/beam/core/runtime/xlangx/expand.go b/sdks/go/pkg/beam/core/runtime/xlangx/expand.go index b871f8239b86..7623a044edec 100644 --- a/sdks/go/pkg/beam/core/runtime/xlangx/expand.go +++ b/sdks/go/pkg/beam/core/runtime/xlangx/expand.go @@ -13,17 +13,19 @@ // See the License for the specific language governing permissions and // limitations under the License. +// Package xlangx contains various low-level utilities needed for adding +// cross-language transforms to the pipeline. package xlangx import ( "context" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/pipelinex" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - jobpb "github.com/apache/beam/sdks/go/pkg/beam/model/jobmanagement_v1" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/pipelinex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + jobpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/jobmanagement_v1" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" "google.golang.org/grpc" ) @@ -48,10 +50,11 @@ func Expand(edge *graph.MultiEdge, ext *graph.ExternalTransform) error { extTransform := transforms[extTransformID] for extTransform.UniqueName != "External" { delete(transforms, extTransformID) - p, err := pipelinex.Normalize(p) + p, err = pipelinex.Normalize(p) // Update root transform IDs. 
if err != nil { return err } + transforms = p.GetComponents().GetTransforms() extTransformID = p.GetRootTransformIds()[0] extTransform = transforms[extTransformID] } @@ -59,8 +62,6 @@ func Expand(edge *graph.MultiEdge, ext *graph.ExternalTransform) error { // Scoping the ExternalTransform with respect to it's unique namespace, thus // avoiding future collisions addNamespace(extTransform, p.GetComponents(), ext.Namespace) - - graphx.AddFakeImpulses(p) // Inputs need to have sources delete(transforms, extTransformID) // Querying the expansion service @@ -73,7 +74,6 @@ func Expand(edge *graph.MultiEdge, ext *graph.ExternalTransform) error { // Previously added fake impulses need to be removed to avoid having // multiple sources to the same pcollection in the graph - graphx.RemoveFakeImpulses(res.GetComponents(), res.GetTransform()) exp := &graph.ExpandedTransform{ Components: res.GetComponents(), @@ -121,5 +121,10 @@ func queryExpansionService( err = errors.Wrapf(err, "expansion failed") return nil, errors.WithContextf(err, "expanding transform with ExpansionRequest: %v", req) } + if len(res.GetError()) != 0 { // ExpansionResponse includes an error. + err := errors.New(res.GetError()) + err = errors.Wrapf(err, "expansion failed") + return nil, errors.WithContextf(err, "expanding transform with ExpansionRequest: %v", req) + } return res, nil } diff --git a/sdks/go/pkg/beam/core/runtime/xlangx/namespace.go b/sdks/go/pkg/beam/core/runtime/xlangx/namespace.go index 06956a61cf82..e3887ff01d53 100644 --- a/sdks/go/pkg/beam/core/runtime/xlangx/namespace.go +++ b/sdks/go/pkg/beam/core/runtime/xlangx/namespace.go @@ -18,8 +18,8 @@ package xlangx import ( "fmt" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" ) func addCoderID(c *pipepb.Components, idMap map[string]string, cid string, newID func(string) string) string { @@ -67,11 +67,6 @@ func addWindowingStrategyID(c *pipepb.Components, idMap map[string]string, wid s windowingStrategy.WindowCoderId = addCoderID(c, idMap, windowingStrategy.WindowCoderId, newID) } - // Updating EnvironmentId of WindowingStrategy - if windowingStrategy.EnvironmentId != "" { - windowingStrategy.EnvironmentId = addEnvironmentID(c, idMap, windowingStrategy.EnvironmentId, newID) - } - idMap[wid] = newID(wid) // Updating WindowingStrategies map @@ -81,25 +76,6 @@ func addWindowingStrategyID(c *pipepb.Components, idMap map[string]string, wid s return idMap[wid] } -func addEnvironmentID(c *pipepb.Components, idMap map[string]string, eid string, newID func(string) string) string { - if _, exists := idMap[eid]; exists { - return idMap[eid] - } - - environment, exists := c.Environments[eid] - if !exists { - panic(errors.Errorf("attempted to add namespace to missing windowing strategy id: %v not in %v", eid, c.Environments)) - } - - idMap[eid] = newID(eid) - - // Updating Environments map - c.Environments[idMap[eid]] = environment - delete(c.Environments, eid) - - return idMap[eid] -} - func addNamespace(t *pipepb.PTransform, c *pipepb.Components, namespace string) { newID := func(id string) string { return fmt.Sprintf("%v@%v", id, namespace) @@ -107,10 +83,11 @@ func addNamespace(t *pipepb.PTransform, c *pipepb.Components, namespace string) idMap := make(map[string]string) - // Update Environment ID of PTransform - if t.EnvironmentId != "" { - t.EnvironmentId = 
addEnvironmentID(c, idMap, t.EnvironmentId, newID) - } + // Note: Currently environments are not namespaced. This works under the + // assumption that the unexpanded transform is using the default Go SDK + // environment. If multiple Go SDK environments become possible, then + // namespacing of non-default environments should happen here. + for _, pcolsMap := range []map[string]string{t.Inputs, t.Outputs} { for _, pid := range pcolsMap { if pcol, exists := c.Pcollections[pid]; exists { diff --git a/sdks/go/pkg/beam/core/runtime/xlangx/namespace_test.go b/sdks/go/pkg/beam/core/runtime/xlangx/namespace_test.go index 49b2e2032b84..e2ee43780bc2 100644 --- a/sdks/go/pkg/beam/core/runtime/xlangx/namespace_test.go +++ b/sdks/go/pkg/beam/core/runtime/xlangx/namespace_test.go @@ -19,7 +19,7 @@ import ( "strings" "testing" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" "github.com/google/go-cmp/cmp" "google.golang.org/protobuf/testing/protocmp" ) @@ -100,25 +100,25 @@ func TestAddNamespace(t *testing.T) { UniqueName: "t0", Inputs: map[string]string{"t0i0": "p0"}, Outputs: map[string]string{"t0o0": "p1", "t0o1": "p2"}, - EnvironmentId: "e0@daASxQwenJ", + EnvironmentId: "e0", }, "t1": { UniqueName: "t1", Inputs: map[string]string{"t1i0": "p1"}, Outputs: map[string]string{"t1o0": "p3"}, - EnvironmentId: "e1@daASxQwenJ", + EnvironmentId: "e1", }, "t2": { UniqueName: "t2", Inputs: map[string]string{"t2i0": "p2"}, Outputs: map[string]string{"t2o0": "p4"}, - EnvironmentId: "e0@daASxQwenJ", + EnvironmentId: "e0", }, "t3": { UniqueName: "t3", Inputs: map[string]string{"t3i0": "p3", "t3i1": "p4"}, Outputs: map[string]string{"t3o0": "p5"}, - EnvironmentId: "e1@daASxQwenJ", + EnvironmentId: "e1", }, }, Pcollections: map[string]*pipepb.PCollection{ @@ -131,7 +131,7 @@ func TestAddNamespace(t *testing.T) { }, WindowingStrategies: map[string]*pipepb.WindowingStrategy{ "w0": {WindowCoderId: "c3", EnvironmentId: "e0"}, - "w1@daASxQwenJ": {WindowCoderId: "c4@daASxQwenJ", EnvironmentId: "e1@daASxQwenJ"}, + "w1@daASxQwenJ": {WindowCoderId: "c4@daASxQwenJ", EnvironmentId: "e1"}, }, Coders: map[string]*pipepb.Coder{ "c0@daASxQwenJ": {Spec: &pipepb.FunctionSpec{Urn: "c0"}}, @@ -141,8 +141,8 @@ func TestAddNamespace(t *testing.T) { "c4@daASxQwenJ": {Spec: &pipepb.FunctionSpec{Urn: "c4"}}, }, Environments: map[string]*pipepb.Environment{ - "e0@daASxQwenJ": {Urn: "e0"}, - "e1@daASxQwenJ": {Urn: "e1"}, + "e0": {Urn: "e0"}, + "e1": {Urn: "e1"}, }, }, }, @@ -213,7 +213,7 @@ func TestAddNamespace(t *testing.T) { UniqueName: "t1", Inputs: map[string]string{"t1i0": "p1"}, Outputs: map[string]string{"t1o0": "p3"}, - EnvironmentId: "e1@daASxQwenJ", + EnvironmentId: "e1", }, "t2": { UniqueName: "t2", @@ -225,7 +225,7 @@ func TestAddNamespace(t *testing.T) { UniqueName: "t3", Inputs: map[string]string{"t3i0": "p3", "t3i1": "p4"}, Outputs: map[string]string{"t3o0": "p5"}, - EnvironmentId: "e1@daASxQwenJ", + EnvironmentId: "e1", }, }, Pcollections: map[string]*pipepb.PCollection{ @@ -238,7 +238,7 @@ func TestAddNamespace(t *testing.T) { }, WindowingStrategies: map[string]*pipepb.WindowingStrategy{ "w0": {WindowCoderId: "c3", EnvironmentId: "e0"}, - "w1@daASxQwenJ": {WindowCoderId: "c4@daASxQwenJ", EnvironmentId: "e1@daASxQwenJ"}, + "w1@daASxQwenJ": {WindowCoderId: "c4@daASxQwenJ", EnvironmentId: "e1"}, }, Coders: map[string]*pipepb.Coder{ "c0": {Spec: &pipepb.FunctionSpec{Urn: "c0"}}, @@ -248,8 +248,8 @@ func TestAddNamespace(t *testing.T) { 
"c4@daASxQwenJ": {Spec: &pipepb.FunctionSpec{Urn: "c4"}}, }, Environments: map[string]*pipepb.Environment{ - "e0": {Urn: "e0"}, - "e1@daASxQwenJ": {Urn: "e1"}, + "e0": {Urn: "e0"}, + "e1": {Urn: "e1"}, }, }, }, @@ -314,25 +314,25 @@ func TestAddNamespace(t *testing.T) { UniqueName: "t0", Inputs: map[string]string{"t0i0": "p0"}, Outputs: map[string]string{"t0o0": "p1", "t0o1": "p2"}, - EnvironmentId: "e0@daASxQwenJ", + EnvironmentId: "e0", }, "t1": { UniqueName: "t1", Inputs: map[string]string{"t1i0": "p1"}, Outputs: map[string]string{"t1o0": "p3"}, - EnvironmentId: "e1@daASxQwenJ", + EnvironmentId: "e1", }, "t2": { UniqueName: "t2", Inputs: map[string]string{"t2i0": "p2"}, Outputs: map[string]string{"t2o0": "p4"}, - EnvironmentId: "e0@daASxQwenJ", + EnvironmentId: "e0", }, "t3": { UniqueName: "t3", Inputs: map[string]string{"t3i0": "p3", "t3i1": "p4"}, Outputs: map[string]string{"t3o0": "p5"}, - EnvironmentId: "e1@daASxQwenJ", + EnvironmentId: "e1", }, }, Pcollections: map[string]*pipepb.PCollection{ @@ -344,8 +344,8 @@ func TestAddNamespace(t *testing.T) { "p5": {CoderId: "c2@daASxQwenJ", WindowingStrategyId: "w1@daASxQwenJ"}, }, WindowingStrategies: map[string]*pipepb.WindowingStrategy{ - "w0@daASxQwenJ": {WindowCoderId: "c3@daASxQwenJ", EnvironmentId: "e0@daASxQwenJ"}, - "w1@daASxQwenJ": {WindowCoderId: "c4@daASxQwenJ", EnvironmentId: "e1@daASxQwenJ"}, + "w0@daASxQwenJ": {WindowCoderId: "c3@daASxQwenJ", EnvironmentId: "e0"}, + "w1@daASxQwenJ": {WindowCoderId: "c4@daASxQwenJ", EnvironmentId: "e1"}, }, Coders: map[string]*pipepb.Coder{ "c0": {Spec: &pipepb.FunctionSpec{Urn: "c0"}}, @@ -355,8 +355,8 @@ func TestAddNamespace(t *testing.T) { "c4@daASxQwenJ": {Spec: &pipepb.FunctionSpec{Urn: "c4"}}, }, Environments: map[string]*pipepb.Environment{ - "e0@daASxQwenJ": {Urn: "e0"}, - "e1@daASxQwenJ": {Urn: "e1"}, + "e0": {Urn: "e0"}, + "e1": {Urn: "e1"}, }, }, }, @@ -406,13 +406,13 @@ func TestAddNamespace(t *testing.T) { UniqueName: "t0", Inputs: map[string]string{"t0i0": "p0"}, Outputs: map[string]string{"t0o0": "p1"}, - EnvironmentId: "e0@daASxQwenJ", + EnvironmentId: "e0", }, "t1": { UniqueName: "t1", Inputs: map[string]string{"t1i0": "p1"}, Outputs: map[string]string{"t1o0": "p2"}, - EnvironmentId: "e1@daASxQwenJ", + EnvironmentId: "e1", }, }, Pcollections: map[string]*pipepb.PCollection{ @@ -421,8 +421,8 @@ func TestAddNamespace(t *testing.T) { "p2": {CoderId: "c0@daASxQwenJ", WindowingStrategyId: "w1@daASxQwenJ"}, }, WindowingStrategies: map[string]*pipepb.WindowingStrategy{ - "w0@daASxQwenJ": {WindowCoderId: "c3@daASxQwenJ", EnvironmentId: "e0@daASxQwenJ"}, - "w1@daASxQwenJ": {WindowCoderId: "c4@daASxQwenJ", EnvironmentId: "e1@daASxQwenJ"}, + "w0@daASxQwenJ": {WindowCoderId: "c3@daASxQwenJ", EnvironmentId: "e0"}, + "w1@daASxQwenJ": {WindowCoderId: "c4@daASxQwenJ", EnvironmentId: "e1"}, }, Coders: map[string]*pipepb.Coder{ "c0@daASxQwenJ": {Spec: &pipepb.FunctionSpec{Urn: "c0"}, ComponentCoderIds: []string{"c2@daASxQwenJ"}}, @@ -432,8 +432,8 @@ func TestAddNamespace(t *testing.T) { "c4@daASxQwenJ": {Spec: &pipepb.FunctionSpec{Urn: "c4"}}, }, Environments: map[string]*pipepb.Environment{ - "e0@daASxQwenJ": {Urn: "e0"}, - "e1@daASxQwenJ": {Urn: "e1"}, + "e0": {Urn: "e0"}, + "e1": {Urn: "e1"}, }, }, }, diff --git a/sdks/go/pkg/beam/core/runtime/xlangx/payload.go b/sdks/go/pkg/beam/core/runtime/xlangx/payload.go new file mode 100644 index 000000000000..e1d97c4a6b64 --- /dev/null +++ b/sdks/go/pkg/beam/core/runtime/xlangx/payload.go @@ -0,0 +1,67 @@ +// Licensed to the Apache Software 
Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package xlangx + +import ( + "bytes" + "reflect" + + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx/schema" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" + "google.golang.org/protobuf/proto" +) + +// EncodeStructPayload takes a native Go struct and returns a marshaled +// ExternalConfigurationPayload proto, containing a Schema representation of +// the original type and the original value encoded as a Row. This is intended +// to be used as the expansion payload for an External transform. +func EncodeStructPayload(pl interface{}) ([]byte, error) { + rt := reflect.TypeOf(pl) + + // Encode payload value as a Row. + enc, err := coder.RowEncoderForStruct(rt) + if err != nil { + err = errors.WithContext(err, "creating Row encoder for payload") + return []byte{}, errors.WithContextf(err, "encoding external payload %v", pl) + } + var buf bytes.Buffer + if err := enc(pl, &buf); err != nil { + err = errors.WithContext(err, "encoding payload as Row") + return []byte{}, errors.WithContextf(err, "encoding external payload %v", pl) + } + + // Convert payload type into Schema representation. + scm, err := schema.FromType(rt) + if err != nil { + err = errors.WithContext(err, "creating schema for payload") + return []byte{}, errors.WithContextf(err, "encoding external payload %v", pl) + } + + // Put schema and row into payload proto, and marshal it. 
+ ecp := &pipepb.ExternalConfigurationPayload{ + Schema: scm, + Payload: buf.Bytes(), + } + plBytes, err := proto.Marshal(ecp) + if err != nil { + err = errors.Wrapf(err, "failed to marshal payload as proto") + return []byte{}, errors.WithContextf(err, "encoding external payload %v", pl) + } + + return plBytes, nil +} diff --git a/sdks/go/pkg/beam/core/runtime/xlangx/resolve.go b/sdks/go/pkg/beam/core/runtime/xlangx/resolve.go index 87dba0c441ff..581849a3166d 100644 --- a/sdks/go/pkg/beam/core/runtime/xlangx/resolve.go +++ b/sdks/go/pkg/beam/core/runtime/xlangx/resolve.go @@ -20,55 +20,110 @@ import ( "path/filepath" "strings" - "github.com/apache/beam/sdks/go/pkg/beam/artifact" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/protox" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/artifact" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/protox" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" ) // ResolveArtifacts acquires all dependencies for a cross-language transform func ResolveArtifacts(ctx context.Context, edges []*graph.MultiEdge, p *pipepb.Pipeline) { - path, err := filepath.Abs("/tmp/artifacts") + _, err := ResolveArtifactsWithConfig(ctx, edges, ResolveConfig{}) if err != nil { panic(err) } +} + +// ResolveConfig contains fields for configuring the behavior for resolving +// artifacts. +type ResolveConfig struct { + // SdkPath replaces the default filepath for dependencies, but only in the + // external environment proto to be used by the SDK Harness during pipeline + // execution. This is used to specify alternate staging directories, such + // as for staging artifacts remotely. + // + // Setting an SdkPath does not change staging behavior otherwise. All + // artifacts still get staged to the default local filepath, and it is the + // user's responsibility to stage those local artifacts to the SdkPath. + SdkPath string + + // JoinFn is a function for combining SdkPath and individual artifact names. + // If not specified, it defaults to using filepath.Join. + JoinFn func(path, name string) string +} + +func defaultJoinFn(path, name string) string { + return filepath.Join(path, "/", name) +} + +// ResolveArtifactsWithConfig acquires all dependencies for cross-language +// transforms, but with some additional configuration to behavior. By default, +// this function performs the following steps for each cross-language transform +// in the list of edges: +// 1. Retrieves a list of dependencies needed from the expansion service. +// 2. Retrieves each dependency as an artifact and stages it to a default +// local filepath. +// 3. Adds the dependencies to the transform's stored environment proto. +// The changes that can be configured are documented in ResolveConfig. +// +// This returns a map of "local path" to "sdk path". By default these are +// identical, unless ResolveConfig.SdkPath has been set. 
+func ResolveArtifactsWithConfig(ctx context.Context, edges []*graph.MultiEdge, cfg ResolveConfig) (paths map[string]string, err error) { + tmpPath, err := filepath.Abs("/tmp/artifacts") + if err != nil { + return nil, errors.WithContext(err, "resolving remote artifacts") + } + if cfg.JoinFn == nil { + cfg.JoinFn = defaultJoinFn + } + paths = make(map[string]string) for _, e := range edges { - if e.Op == graph.External { + if e.Op == graph.External && e.External != nil { components, err := graphx.ExpandedComponents(e.External.Expanded) if err != nil { - panic(err) + return nil, errors.WithContextf(err, + "resolving remote artifacts for edge %v", e.Name()) } envs := components.Environments for eid, env := range envs { - if strings.HasPrefix(eid, "go") { continue } deps := env.GetDependencies() - resolvedMeta, err := artifact.Materialize(ctx, e.External.ExpansionAddr, deps, "", path) + resolvedArtifacts, err := artifact.Materialize(ctx, e.External.ExpansionAddr, deps, "", tmpPath) if err != nil { - panic(err) + return nil, errors.WithContextf(err, + "resolving remote artifacts for env %v in edge %v", eid, e.Name()) } var resolvedDeps []*pipepb.ArtifactInformation - for _, meta := range resolvedMeta { - fullPath := filepath.Join(path, "/", meta.Name) + for _, a := range resolvedArtifacts { + name, sha256 := artifact.MustExtractFilePayload(a) + fullTmpPath := filepath.Join(tmpPath, "/", name) + fullSdkPath := fullTmpPath + if len(cfg.SdkPath) > 0 { + fullSdkPath = cfg.JoinFn(cfg.SdkPath, name) + } resolvedDeps = append(resolvedDeps, &pipepb.ArtifactInformation{ TypeUrn: "beam:artifact:type:file:v1", TypePayload: protox.MustEncode( &pipepb.ArtifactFilePayload{ - Path: fullPath, - Sha256: meta.Sha256, + Path: fullSdkPath, + Sha256: sha256, }, ), - RoleUrn: graphx.URNArtifactStagingTo, + RoleUrn: a.RoleUrn, + RolePayload: a.RolePayload, }, ) + paths[fullTmpPath] = fullSdkPath } env.Dependencies = resolvedDeps } } } + return paths, nil } diff --git a/sdks/go/pkg/beam/core/typex/class.go b/sdks/go/pkg/beam/core/typex/class.go index cb34ad7f6f5a..a14197fe1ddd 100644 --- a/sdks/go/pkg/beam/core/typex/class.go +++ b/sdks/go/pkg/beam/core/typex/class.go @@ -21,7 +21,7 @@ import ( "unicode" "unicode/utf8" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) // Class is the type "class" of data as distinguished by the runtime. The class @@ -108,6 +108,7 @@ func isConcrete(t reflect.Type, visited map[uintptr]bool) bool { if t == nil || t == EventTimeType || t.Implements(WindowType) || + t == PaneInfoType || t == reflectx.Error || t == reflectx.Context || IsUniversal(t) { diff --git a/sdks/go/pkg/beam/core/typex/class_test.go b/sdks/go/pkg/beam/core/typex/class_test.go index 1584dc63276f..4521554f644c 100644 --- a/sdks/go/pkg/beam/core/typex/class_test.go +++ b/sdks/go/pkg/beam/core/typex/class_test.go @@ -19,7 +19,7 @@ import ( "reflect" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) // TestClassOf tests that the type classification is correct. 
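As a usage note for the artifact-resolution change in resolve.go above: a minimal sketch of how a caller might use the new ResolveConfig so that environment protos reference remotely staged dependencies. The stageRemotely helper and the bucket path are hypothetical; ResolveConfig, its SdkPath and JoinFn fields, and ResolveArtifactsWithConfig are the APIs added in this change.

    import (
        "context"

        "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph"
        "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/xlangx"
    )

    // stageRemotely is a hypothetical caller of the new API. Artifacts are still
    // materialized to the default local directory; the returned map says which
    // local file must be copied to which SdkPath location before the pipeline runs.
    func stageRemotely(ctx context.Context, edges []*graph.MultiEdge) (map[string]string, error) {
        cfg := xlangx.ResolveConfig{
            // Assumed remote staging root; any location the runner can read works.
            SdkPath: "gs://my-bucket/staging",
            // Remote object names join with '/', so override the filepath.Join default.
            JoinFn: func(path, name string) string { return path + "/" + name },
        }
        return xlangx.ResolveArtifactsWithConfig(ctx, edges, cfg)
    }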
diff --git a/sdks/go/pkg/beam/core/typex/fulltype.go b/sdks/go/pkg/beam/core/typex/fulltype.go index 011c91d81c24..cbfb443755ba 100644 --- a/sdks/go/pkg/beam/core/typex/fulltype.go +++ b/sdks/go/pkg/beam/core/typex/fulltype.go @@ -22,7 +22,7 @@ import ( "reflect" "strings" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // FullType represents the tree structure of data types processed by the graph. @@ -416,3 +416,9 @@ func checkTypesNotNil(list []FullType) { } } } + +// NoFiringPane return PaneInfo assigned as NoFiringPane(0x0f) +func NoFiringPane() PaneInfo { + pn := PaneInfo{IsFirst: true, IsLast: true, Timing: PaneUnknown} + return pn +} diff --git a/sdks/go/pkg/beam/core/typex/fulltype_test.go b/sdks/go/pkg/beam/core/typex/fulltype_test.go index c27befeff96b..8c7c161b7010 100644 --- a/sdks/go/pkg/beam/core/typex/fulltype_test.go +++ b/sdks/go/pkg/beam/core/typex/fulltype_test.go @@ -19,7 +19,7 @@ import ( "reflect" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) func TestIsBound(t *testing.T) { diff --git a/sdks/go/pkg/beam/core/typex/special.go b/sdks/go/pkg/beam/core/typex/special.go index 003e6d02583e..dd4199c5628e 100644 --- a/sdks/go/pkg/beam/core/typex/special.go +++ b/sdks/go/pkg/beam/core/typex/special.go @@ -18,7 +18,7 @@ package typex import ( "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/mtime" ) // This file defines data types that programs use to indicate a @@ -35,6 +35,7 @@ var ( EventTimeType = reflect.TypeOf((*EventTime)(nil)).Elem() WindowType = reflect.TypeOf((*Window)(nil)).Elem() + PaneInfoType = reflect.TypeOf((*PaneInfo)(nil)).Elem() KVType = reflect.TypeOf((*KV)(nil)).Elem() CoGBKType = reflect.TypeOf((*CoGBK)(nil)).Elem() @@ -64,6 +65,21 @@ type Window interface { Equals(o Window) bool } +type PaneTiming byte + +const ( + PaneEarly PaneTiming = 0 + PaneOnTime PaneTiming = 1 + PaneLate PaneTiming = 2 + PaneUnknown PaneTiming = 3 +) + +type PaneInfo struct { + Timing PaneTiming + IsFirst, IsLast bool + Index, NonSpeculativeIndex int64 +} + // KV, CoGBK, WindowedValue represent composite generic types. They are not used // directly in user code signatures, but only in FullTypes. 
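The PaneInfo and PaneTiming types added in special.go and the NoFiringPane constructor added in fulltype.go are plain value types; the sketch below is hypothetical consumer code, not part of this change, showing how a firing's timing and position could be inspected. The timing comments follow standard Beam pane semantics.

    import (
        "fmt"

        "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex"
    )

    // describePane is a hypothetical helper illustrating the new typex.PaneInfo.
    func describePane(pn typex.PaneInfo) string {
        timing := "unknown timing"
        switch pn.Timing {
        case typex.PaneEarly:
            timing = "early" // fired before the watermark reached the end of the window
        case typex.PaneOnTime:
            timing = "on time" // fired when the watermark passed the end of the window
        case typex.PaneLate:
            timing = "late" // fired after the on-time pane
        }
        if pn.IsFirst && pn.IsLast {
            return "only firing, " + timing
        }
        return fmt.Sprintf("firing #%d, %s", pn.Index, timing)
    }

    // typex.NoFiringPane() yields the default pane: IsFirst and IsLast set,
    // Timing == typex.PaneUnknown.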
diff --git a/sdks/go/pkg/beam/core/util/dot/dot.go b/sdks/go/pkg/beam/core/util/dot/dot.go index 49e1d14cef6e..727151551f9f 100644 --- a/sdks/go/pkg/beam/core/util/dot/dot.go +++ b/sdks/go/pkg/beam/core/util/dot/dot.go @@ -22,8 +22,8 @@ import ( "path" "text/template" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) var ( diff --git a/sdks/go/pkg/beam/core/util/hooks/hooks.go b/sdks/go/pkg/beam/core/util/hooks/hooks.go index 9c873aa96431..141cb565c3d1 100644 --- a/sdks/go/pkg/beam/core/util/hooks/hooks.go +++ b/sdks/go/pkg/beam/core/util/hooks/hooks.go @@ -32,18 +32,30 @@ import ( "encoding/csv" "encoding/json" "strings" + "sync" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - "github.com/apache/beam/sdks/go/pkg/beam/log" - fnpb "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + fnpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/fnexecution_v1" ) -var ( - hookRegistry = make(map[string]HookFactory) - enabledHooks = make(map[string][]string) - activeHooks = make(map[string]Hook) -) +var defaultRegistry = newRegistry() + +type registry struct { + mu sync.Mutex + hookRegistry map[string]HookFactory + enabledHooks map[string][]string + activeHooks map[string]Hook +} + +func newRegistry() *registry { + return ®istry{ + hookRegistry: make(map[string]HookFactory), + enabledHooks: make(map[string][]string), + activeHooks: make(map[string]Hook), + } +} // A Hook is a set of hooks to run at various stages of executing a // pipeline. @@ -64,19 +76,26 @@ type InitHook func(context.Context) (context.Context, error) // HookFactory is a function that produces a Hook from the supplied arguments. type HookFactory func([]string) Hook +func (r *registry) RegisterHook(name string, h HookFactory) { + r.mu.Lock() + defer r.mu.Unlock() + r.hookRegistry[name] = h +} + // RegisterHook registers a Hook for the // supplied identifier. func RegisterHook(name string, h HookFactory) { - hookRegistry[name] = h + defaultRegistry.RegisterHook(name, h) } -// RunInitHooks runs the init hooks. -func RunInitHooks(ctx context.Context) (context.Context, error) { +func (r *registry) RunInitHooks(ctx context.Context) (context.Context, error) { // If an init hook fails to complete, the invariants of the // system are compromised and we can't run a workflow. // The hooks can run in any order. They should not be // interdependent or interfere with each other. - for _, h := range activeHooks { + r.mu.Lock() + defer r.mu.Unlock() + for _, h := range r.activeHooks { if h.Init != nil { var err error if ctx, err = h.Init(ctx); err != nil { @@ -87,15 +106,21 @@ func RunInitHooks(ctx context.Context) (context.Context, error) { return ctx, nil } +// RunInitHooks runs the init hooks. +func RunInitHooks(ctx context.Context) (context.Context, error) { + return defaultRegistry.RunInitHooks(ctx) +} + // RequestHook is called when handling a FnAPI instruction. It can return an updated // context to pass additional information to downstream callers, or return the // original context provided. 
type RequestHook func(context.Context, *fnpb.InstructionRequest) (context.Context, error) -// RunRequestHooks runs the hooks that handle a FnAPI request. -func RunRequestHooks(ctx context.Context, req *fnpb.InstructionRequest) context.Context { +func (r *registry) RunRequestHooks(ctx context.Context, req *fnpb.InstructionRequest) context.Context { + r.mu.Lock() + defer r.mu.Unlock() // The request hooks should not modify the request. - for n, h := range activeHooks { + for n, h := range r.activeHooks { if h.Req != nil { var err error if ctx, err = h.Req(ctx, req); err != nil { @@ -106,12 +131,18 @@ func RunRequestHooks(ctx context.Context, req *fnpb.InstructionRequest) context. return ctx } +// RunRequestHooks runs the hooks that handle a FnAPI request. +func RunRequestHooks(ctx context.Context, req *fnpb.InstructionRequest) context.Context { + return defaultRegistry.RunRequestHooks(ctx, req) +} + // ResponseHook is called when sending a FnAPI instruction response. type ResponseHook func(context.Context, *fnpb.InstructionRequest, *fnpb.InstructionResponse) error -// RunResponseHooks runs the hooks that handle a FnAPI response. -func RunResponseHooks(ctx context.Context, req *fnpb.InstructionRequest, resp *fnpb.InstructionResponse) { - for n, h := range activeHooks { +func (r *registry) RunResponseHooks(ctx context.Context, req *fnpb.InstructionRequest, resp *fnpb.InstructionResponse) { + r.mu.Lock() + defer r.mu.Unlock() + for n, h := range r.activeHooks { if h.Resp != nil { if err := h.Resp(ctx, req, resp); err != nil { log.Infof(ctx, "response hook %s failed: %v", n, err) @@ -120,10 +151,15 @@ func RunResponseHooks(ctx context.Context, req *fnpb.InstructionRequest, resp *f } } -// SerializeHooksToOptions serializes the activated hooks and their configuration into a JSON string -// that can be deserialized later by the runner. -func SerializeHooksToOptions() { - data, err := json.Marshal(enabledHooks) +// RunResponseHooks runs the hooks that handle a FnAPI response. +func RunResponseHooks(ctx context.Context, req *fnpb.InstructionRequest, resp *fnpb.InstructionResponse) { + defaultRegistry.RunResponseHooks(ctx, req, resp) +} + +func (r *registry) SerializeHooksToOptions() { + r.mu.Lock() + defer r.mu.Unlock() + data, err := json.Marshal(r.enabledHooks) if err != nil { // Shouldn't happen, since all the data is strings. panic(errors.Wrap(err, "Couldn't serialize hooks")) @@ -131,41 +167,65 @@ func SerializeHooksToOptions() { runtime.GlobalOptions.Set("hooks", string(data)) } -// DeserializeHooksFromOptions extracts the hook configuration information from the options and configures -// the hooks with the supplied options. -func DeserializeHooksFromOptions(ctx context.Context) { +// SerializeHooksToOptions serializes the activated hooks and their configuration into a JSON string +// that can be deserialized later by the runner. +func SerializeHooksToOptions() { + defaultRegistry.SerializeHooksToOptions() +} + +func (r *registry) DeserializeHooksFromOptions(ctx context.Context) { cfg := runtime.GlobalOptions.Get("hooks") if cfg == "" { log.Warn(ctx, "SerializeHooksToOptions was never called. No hooks enabled") return } - if err := json.Unmarshal([]byte(cfg), &enabledHooks); err != nil { + r.mu.Lock() + defer r.mu.Unlock() + if err := json.Unmarshal([]byte(cfg), &r.enabledHooks); err != nil { // Shouldn't happen, since all the data is strings. 
panic(errors.Wrapf(err, "DeserializeHooks failed on input %q", cfg)) } - for h, opts := range enabledHooks { - activeHooks[h] = hookRegistry[h](opts) + for h, opts := range r.enabledHooks { + r.activeHooks[h] = r.hookRegistry[h](opts) } } +// DeserializeHooksFromOptions extracts the hook configuration information from the options and configures +// the hooks with the supplied options. +func DeserializeHooksFromOptions(ctx context.Context) { + defaultRegistry.DeserializeHooksFromOptions(ctx) +} + +func (r *registry) EnableHook(name string, args ...string) error { + r.mu.Lock() + defer r.mu.Unlock() + if _, ok := r.hookRegistry[name]; !ok { + return errors.Errorf("EnableHook: hook %s not found", name) + } + r.enabledHooks[name] = args + return nil +} + // EnableHook enables the hook to be run for the pipline. It will be // receive the supplied args when the pipeline executes. It is safe // to enable the same hook with different options, as this is necessary // if a hook wants to compose behavior. func EnableHook(name string, args ...string) error { - if _, ok := hookRegistry[name]; !ok { - return errors.Errorf("EnableHook: hook %s not found", name) - } - enabledHooks[name] = args - return nil + return defaultRegistry.EnableHook(name, args...) +} + +func (r *registry) IsEnabled(name string) (bool, []string) { + r.mu.Lock() + defer r.mu.Unlock() + opts, ok := r.enabledHooks[name] + return ok, opts } // IsEnabled returns true and the registered options if the hook is // already enabled. func IsEnabled(name string) (bool, []string) { - opts, ok := enabledHooks[name] - return ok, opts + return defaultRegistry.IsEnabled(name) } // Encode encodes a hook name and its arguments into a single string. diff --git a/sdks/go/pkg/beam/core/util/hooks/hooks_test.go b/sdks/go/pkg/beam/core/util/hooks/hooks_test.go index 06b0d9acf4db..8781af835548 100644 --- a/sdks/go/pkg/beam/core/util/hooks/hooks_test.go +++ b/sdks/go/pkg/beam/core/util/hooks/hooks_test.go @@ -17,9 +17,11 @@ package hooks import ( "context" + "sync" "testing" + "time" - fnpb "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1" + fnpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/fnexecution_v1" ) type contextKey string @@ -31,8 +33,9 @@ const ( reqValue = "reqValue" ) -func initializeHooks() { - activeHooks["test"] = Hook{ +func initializeHooks() *registry { + var r = newRegistry() + r.activeHooks["test"] = Hook{ Init: func(ctx context.Context) (context.Context, error) { return context.WithValue(ctx, initKey, initValue), nil }, @@ -40,15 +43,16 @@ func initializeHooks() { return context.WithValue(ctx, reqKey, reqValue), nil }, } + return r } func TestInitContextPropagation(t *testing.T) { - initializeHooks() + r := initializeHooks() ctx := context.Background() var err error expected := initValue - ctx, err = RunInitHooks(ctx) + ctx, err = r.RunInitHooks(ctx) if err != nil { t.Errorf("got %v error, wanted no error", err) } @@ -59,13 +63,57 @@ func TestInitContextPropagation(t *testing.T) { } func TestRequestContextPropagation(t *testing.T) { - initializeHooks() + r := initializeHooks() ctx := context.Background() expected := reqValue - ctx = RunRequestHooks(ctx, nil) + ctx = r.RunRequestHooks(ctx, nil) actual := ctx.Value(reqKey) if actual != expected { t.Errorf("Got %s, wanted %s", actual, expected) } } + +// TestConcurrentWrites tests if the concurrent writes are handled properly. +// It uses go routines to test this on sample hook 'google_logging'. 
+func TestConcurrentWrites(t *testing.T) { + r := initializeHooks() + hf := func(opts []string) Hook { + return Hook{ + Req: func(ctx context.Context, req *fnpb.InstructionRequest) (context.Context, error) { + return ctx, nil + }, + } + } + r.RegisterHook("google_logging", hf) + + var actual, expected error + expected = nil + + ch := make(chan struct{}) + wg := sync.WaitGroup{} + + for i := 0; i < 5; i++ { + wg.Add(1) + go func() { + defer wg.Done() + for { + select { + case <-ch: + // When the channel is closed, exit. + return + default: + actual = r.EnableHook("google_logging") + if actual != expected { + t.Errorf("Got %s, wanted %s", actual, expected) + } + } + } + }() + } + // Let the goroutines execute for 5 seconds and then close the channel. + time.Sleep(time.Second * 5) + close(ch) + // Wait for all goroutines to exit properly. + wg.Wait() +} diff --git a/sdks/go/pkg/beam/core/util/ioutilx/read.go b/sdks/go/pkg/beam/core/util/ioutilx/read.go index cf3568cf9458..81f21c373374 100644 --- a/sdks/go/pkg/beam/core/util/ioutilx/read.go +++ b/sdks/go/pkg/beam/core/util/ioutilx/read.go @@ -20,7 +20,7 @@ import ( "io" "unsafe" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // ReadN reads exactly N bytes from the reader. Fails otherwise. diff --git a/sdks/go/pkg/beam/core/util/protox/any.go b/sdks/go/pkg/beam/core/util/protox/any.go index 0568f81045a2..e539a8c19dec 100644 --- a/sdks/go/pkg/beam/core/util/protox/any.go +++ b/sdks/go/pkg/beam/core/util/protox/any.go @@ -16,7 +16,7 @@ package protox import ( - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" "github.com/golang/protobuf/proto" protobuf "github.com/golang/protobuf/ptypes/any" protobufw "github.com/golang/protobuf/ptypes/wrappers" diff --git a/sdks/go/pkg/beam/core/util/protox/base64.go b/sdks/go/pkg/beam/core/util/protox/base64.go index 296d3975f37f..7f0f5a4bdeea 100644 --- a/sdks/go/pkg/beam/core/util/protox/base64.go +++ b/sdks/go/pkg/beam/core/util/protox/base64.go @@ -18,7 +18,7 @@ package protox import ( "encoding/base64" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" "github.com/golang/protobuf/proto" ) diff --git a/sdks/go/pkg/beam/core/util/reflectx/call.go b/sdks/go/pkg/beam/core/util/reflectx/call.go index 44f060254489..ace639cf00c9 100644 --- a/sdks/go/pkg/beam/core/util/reflectx/call.go +++ b/sdks/go/pkg/beam/core/util/reflectx/call.go @@ -19,8 +19,9 @@ import ( "reflect" "sync" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" "runtime/debug" + + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) //go:generate specialize --input=calls.tmpl diff --git a/sdks/go/pkg/beam/core/util/reflectx/calls.go b/sdks/go/pkg/beam/core/util/reflectx/calls.go index a1bfe6bc4da7..3297a49b57c9 100644 --- a/sdks/go/pkg/beam/core/util/reflectx/calls.go +++ b/sdks/go/pkg/beam/core/util/reflectx/calls.go @@ -17,7 +17,10 @@ package reflectx -import "reflect" +import ( + "fmt" + "reflect" +) // Generated arity-specialized Func implementations to avoid runtime temporary // slices. 
Code that knows the arity can potentially avoid that overhead in @@ -57,7 +60,7 @@ func (c *shimFunc0x0) Call0x0() { func ToFunc0x0(c Func) Func0x0 { if c.Type().NumIn() != 0 || c.Type().NumOut() != 0 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 0 inputs and 0 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func0x0); ok { return sc @@ -98,7 +101,7 @@ func (c *shimFunc0x1) Call0x1() interface{} { func ToFunc0x1(c Func) Func0x1 { if c.Type().NumIn() != 0 || c.Type().NumOut() != 1 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 0 inputs and 1 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func0x1); ok { return sc @@ -139,7 +142,7 @@ func (c *shimFunc0x2) Call0x2() (interface{}, interface{}) { func ToFunc0x2(c Func) Func0x2 { if c.Type().NumIn() != 0 || c.Type().NumOut() != 2 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 0 inputs and 2 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func0x2); ok { return sc @@ -180,7 +183,7 @@ func (c *shimFunc0x3) Call0x3() (interface{}, interface{}, interface{}) { func ToFunc0x3(c Func) Func0x3 { if c.Type().NumIn() != 0 || c.Type().NumOut() != 3 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 0 inputs and 3 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func0x3); ok { return sc @@ -221,7 +224,7 @@ func (c *shimFunc1x0) Call1x0(arg0 interface{}) { func ToFunc1x0(c Func) Func1x0 { if c.Type().NumIn() != 1 || c.Type().NumOut() != 0 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 1 inputs and 0 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func1x0); ok { return sc @@ -262,7 +265,7 @@ func (c *shimFunc1x1) Call1x1(arg0 interface{}) interface{} { func ToFunc1x1(c Func) Func1x1 { if c.Type().NumIn() != 1 || c.Type().NumOut() != 1 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 1 inputs and 1 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func1x1); ok { return sc @@ -303,7 +306,7 @@ func (c *shimFunc1x2) Call1x2(arg0 interface{}) (interface{}, interface{}) { func ToFunc1x2(c Func) Func1x2 { if c.Type().NumIn() != 1 || c.Type().NumOut() != 2 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 1 inputs and 2 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func1x2); ok { return sc @@ -344,7 +347,7 @@ func (c *shimFunc1x3) Call1x3(arg0 interface{}) (interface{}, interface{}, inter func ToFunc1x3(c Func) Func1x3 { if c.Type().NumIn() != 1 || c.Type().NumOut() != 3 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 1 inputs and 3 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func1x3); ok { return sc @@ -385,7 +388,7 @@ func (c *shimFunc2x0) Call2x0(arg0, arg1 interface{}) { func ToFunc2x0(c Func) Func2x0 { if c.Type().NumIn() != 2 || c.Type().NumOut() != 0 { - panic("incompatible func type") + 
panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 2 inputs and 0 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func2x0); ok { return sc @@ -426,7 +429,7 @@ func (c *shimFunc2x1) Call2x1(arg0, arg1 interface{}) interface{} { func ToFunc2x1(c Func) Func2x1 { if c.Type().NumIn() != 2 || c.Type().NumOut() != 1 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 2 inputs and 1 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func2x1); ok { return sc @@ -467,7 +470,7 @@ func (c *shimFunc2x2) Call2x2(arg0, arg1 interface{}) (interface{}, interface{}) func ToFunc2x2(c Func) Func2x2 { if c.Type().NumIn() != 2 || c.Type().NumOut() != 2 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 2 inputs and 2 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func2x2); ok { return sc @@ -508,7 +511,7 @@ func (c *shimFunc2x3) Call2x3(arg0, arg1 interface{}) (interface{}, interface{}, func ToFunc2x3(c Func) Func2x3 { if c.Type().NumIn() != 2 || c.Type().NumOut() != 3 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 2 inputs and 3 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func2x3); ok { return sc @@ -549,7 +552,7 @@ func (c *shimFunc3x0) Call3x0(arg0, arg1, arg2 interface{}) { func ToFunc3x0(c Func) Func3x0 { if c.Type().NumIn() != 3 || c.Type().NumOut() != 0 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 3 inputs and 0 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func3x0); ok { return sc @@ -590,7 +593,7 @@ func (c *shimFunc3x1) Call3x1(arg0, arg1, arg2 interface{}) interface{} { func ToFunc3x1(c Func) Func3x1 { if c.Type().NumIn() != 3 || c.Type().NumOut() != 1 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 3 inputs and 1 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func3x1); ok { return sc @@ -631,7 +634,7 @@ func (c *shimFunc3x2) Call3x2(arg0, arg1, arg2 interface{}) (interface{}, interf func ToFunc3x2(c Func) Func3x2 { if c.Type().NumIn() != 3 || c.Type().NumOut() != 2 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 3 inputs and 2 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func3x2); ok { return sc @@ -672,7 +675,7 @@ func (c *shimFunc3x3) Call3x3(arg0, arg1, arg2 interface{}) (interface{}, interf func ToFunc3x3(c Func) Func3x3 { if c.Type().NumIn() != 3 || c.Type().NumOut() != 3 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 3 inputs and 3 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func3x3); ok { return sc @@ -713,7 +716,7 @@ func (c *shimFunc4x0) Call4x0(arg0, arg1, arg2, arg3 interface{}) { func ToFunc4x0(c Func) Func4x0 { if c.Type().NumIn() != 4 || c.Type().NumOut() != 0 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 4 inputs and 0 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } 
if sc, ok := c.(Func4x0); ok { return sc @@ -754,7 +757,7 @@ func (c *shimFunc4x1) Call4x1(arg0, arg1, arg2, arg3 interface{}) interface{} { func ToFunc4x1(c Func) Func4x1 { if c.Type().NumIn() != 4 || c.Type().NumOut() != 1 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 4 inputs and 1 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func4x1); ok { return sc @@ -795,7 +798,7 @@ func (c *shimFunc4x2) Call4x2(arg0, arg1, arg2, arg3 interface{}) (interface{}, func ToFunc4x2(c Func) Func4x2 { if c.Type().NumIn() != 4 || c.Type().NumOut() != 2 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 4 inputs and 2 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func4x2); ok { return sc @@ -836,7 +839,7 @@ func (c *shimFunc4x3) Call4x3(arg0, arg1, arg2, arg3 interface{}) (interface{}, func ToFunc4x3(c Func) Func4x3 { if c.Type().NumIn() != 4 || c.Type().NumOut() != 3 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 4 inputs and 3 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func4x3); ok { return sc @@ -877,7 +880,7 @@ func (c *shimFunc5x0) Call5x0(arg0, arg1, arg2, arg3, arg4 interface{}) { func ToFunc5x0(c Func) Func5x0 { if c.Type().NumIn() != 5 || c.Type().NumOut() != 0 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 5 inputs and 0 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func5x0); ok { return sc @@ -918,7 +921,7 @@ func (c *shimFunc5x1) Call5x1(arg0, arg1, arg2, arg3, arg4 interface{}) interfac func ToFunc5x1(c Func) Func5x1 { if c.Type().NumIn() != 5 || c.Type().NumOut() != 1 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 5 inputs and 1 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func5x1); ok { return sc @@ -959,7 +962,7 @@ func (c *shimFunc5x2) Call5x2(arg0, arg1, arg2, arg3, arg4 interface{}) (interfa func ToFunc5x2(c Func) Func5x2 { if c.Type().NumIn() != 5 || c.Type().NumOut() != 2 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 5 inputs and 2 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func5x2); ok { return sc @@ -1000,7 +1003,7 @@ func (c *shimFunc5x3) Call5x3(arg0, arg1, arg2, arg3, arg4 interface{}) (interfa func ToFunc5x3(c Func) Func5x3 { if c.Type().NumIn() != 5 || c.Type().NumOut() != 3 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 5 inputs and 3 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func5x3); ok { return sc @@ -1041,7 +1044,7 @@ func (c *shimFunc6x0) Call6x0(arg0, arg1, arg2, arg3, arg4, arg5 interface{}) { func ToFunc6x0(c Func) Func6x0 { if c.Type().NumIn() != 6 || c.Type().NumOut() != 0 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 6 inputs and 0 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func6x0); ok { return sc @@ -1082,7 +1085,7 @@ func (c *shimFunc6x1) Call6x1(arg0, arg1, arg2, arg3, 
arg4, arg5 interface{}) in func ToFunc6x1(c Func) Func6x1 { if c.Type().NumIn() != 6 || c.Type().NumOut() != 1 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 6 inputs and 1 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func6x1); ok { return sc @@ -1123,7 +1126,7 @@ func (c *shimFunc6x2) Call6x2(arg0, arg1, arg2, arg3, arg4, arg5 interface{}) (i func ToFunc6x2(c Func) Func6x2 { if c.Type().NumIn() != 6 || c.Type().NumOut() != 2 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 6 inputs and 2 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func6x2); ok { return sc @@ -1164,7 +1167,7 @@ func (c *shimFunc6x3) Call6x3(arg0, arg1, arg2, arg3, arg4, arg5 interface{}) (i func ToFunc6x3(c Func) Func6x3 { if c.Type().NumIn() != 6 || c.Type().NumOut() != 3 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 6 inputs and 3 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func6x3); ok { return sc @@ -1205,7 +1208,7 @@ func (c *shimFunc7x0) Call7x0(arg0, arg1, arg2, arg3, arg4, arg5, arg6 interface func ToFunc7x0(c Func) Func7x0 { if c.Type().NumIn() != 7 || c.Type().NumOut() != 0 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 7 inputs and 0 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func7x0); ok { return sc @@ -1246,7 +1249,7 @@ func (c *shimFunc7x1) Call7x1(arg0, arg1, arg2, arg3, arg4, arg5, arg6 interface func ToFunc7x1(c Func) Func7x1 { if c.Type().NumIn() != 7 || c.Type().NumOut() != 1 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 7 inputs and 1 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func7x1); ok { return sc @@ -1287,7 +1290,7 @@ func (c *shimFunc7x2) Call7x2(arg0, arg1, arg2, arg3, arg4, arg5, arg6 interface func ToFunc7x2(c Func) Func7x2 { if c.Type().NumIn() != 7 || c.Type().NumOut() != 2 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 7 inputs and 2 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func7x2); ok { return sc @@ -1328,7 +1331,7 @@ func (c *shimFunc7x3) Call7x3(arg0, arg1, arg2, arg3, arg4, arg5, arg6 interface func ToFunc7x3(c Func) Func7x3 { if c.Type().NumIn() != 7 || c.Type().NumOut() != 3 { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want 7 inputs and 3 outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func7x3); ok { return sc diff --git a/sdks/go/pkg/beam/core/util/reflectx/calls.tmpl b/sdks/go/pkg/beam/core/util/reflectx/calls.tmpl index 60b6107a5765..75b80e27305d 100644 --- a/sdks/go/pkg/beam/core/util/reflectx/calls.tmpl +++ b/sdks/go/pkg/beam/core/util/reflectx/calls.tmpl @@ -15,7 +15,10 @@ package reflectx -import "reflect" +import ( + "reflect" + "fmt" +) // Generated arity-specialized Func implementations to avoid runtime temporary // slices. 
Code that knows the arity can potentially avoid that overhead in @@ -57,7 +60,7 @@ func (c *shimFunc{{$in}}x{{$out}}) Call{{$in}}x{{$out}}({{mkargs $in "arg%v" "in func ToFunc{{$in}}x{{$out}}(c Func) Func{{$in}}x{{$out}} { if c.Type().NumIn() != {{$in}} || c.Type().NumOut() != {{$out}} { - panic("incompatible func type") + panic(fmt.Sprintf("Incompatible func type: got func %v with %v inputs and %v outputs, want {{$in}} inputs and {{$out}} outputs", c.Type(), c.Type().NumIn(), c.Type().NumOut())) } if sc, ok := c.(Func{{$in}}x{{$out}}); ok { return sc diff --git a/sdks/go/pkg/beam/core/util/reflectx/structs.go b/sdks/go/pkg/beam/core/util/reflectx/structs.go index 84f9be9aad14..89026cb3bf5d 100644 --- a/sdks/go/pkg/beam/core/util/reflectx/structs.go +++ b/sdks/go/pkg/beam/core/util/reflectx/structs.go @@ -20,7 +20,7 @@ import ( "reflect" "sync" - "github.com/apache/beam/sdks/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" ) var ( diff --git a/sdks/go/pkg/beam/core/util/symtab/symtab.go b/sdks/go/pkg/beam/core/util/symtab/symtab.go index 5b8c925ff253..6a65a09caa39 100644 --- a/sdks/go/pkg/beam/core/util/symtab/symtab.go +++ b/sdks/go/pkg/beam/core/util/symtab/symtab.go @@ -25,7 +25,7 @@ import ( "reflect" "runtime" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // SymbolTable allows for mapping between symbols and their addresses. diff --git a/sdks/go/pkg/beam/core/util/symtab/symtab_test.go b/sdks/go/pkg/beam/core/util/symtab/symtab_test.go index a77053306c21..fa84275c4c33 100644 --- a/sdks/go/pkg/beam/core/util/symtab/symtab_test.go +++ b/sdks/go/pkg/beam/core/util/symtab/symtab_test.go @@ -34,7 +34,7 @@ import ( "os" "runtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/symtab" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/symtab" ) func die(format string, a ...interface{}) { diff --git a/sdks/go/pkg/beam/create.go b/sdks/go/pkg/beam/create.go index a0d2c0b28f2e..c7916d43f941 100644 --- a/sdks/go/pkg/beam/create.go +++ b/sdks/go/pkg/beam/create.go @@ -19,7 +19,7 @@ import ( "bytes" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // Create inserts a fixed non-empty set of values into the pipeline. 
The values must diff --git a/sdks/go/pkg/beam/create_test.go b/sdks/go/pkg/beam/create_test.go index b3a0dbb7ec51..3386469eb390 100644 --- a/sdks/go/pkg/beam/create_test.go +++ b/sdks/go/pkg/beam/create_test.go @@ -20,9 +20,9 @@ import ( "reflect" "testing" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/testing/passert" - "github.com/apache/beam/sdks/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" "github.com/golang/protobuf/proto" ) diff --git a/sdks/go/pkg/beam/doc_test.go b/sdks/go/pkg/beam/doc_test.go index 6c279805e33a..8ee3a1d37921 100644 --- a/sdks/go/pkg/beam/doc_test.go +++ b/sdks/go/pkg/beam/doc_test.go @@ -23,9 +23,9 @@ import ( "strconv" "strings" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/io/textio" - "github.com/apache/beam/sdks/go/pkg/beam/runners/direct" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/textio" + "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/direct" ) func Example_gettingStarted() { diff --git a/sdks/go/pkg/beam/encoding.go b/sdks/go/pkg/beam/encoding.go index 0ab524ddd966..bc66d601b989 100644 --- a/sdks/go/pkg/beam/encoding.go +++ b/sdks/go/pkg/beam/encoding.go @@ -17,12 +17,42 @@ package beam import ( "encoding/json" + "fmt" + "io" "reflect" + "time" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx/schema" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) +var ( + encodedTypeType = reflect.TypeOf((*EncodedType)(nil)).Elem() + encodedFuncType = reflect.TypeOf((*EncodedFunc)(nil)).Elem() + encodedCoderType = reflect.TypeOf((*EncodedCoder)(nil)).Elem() + encodedStorageType = reflect.TypeOf((*struct{ EncodedBeamData string })(nil)).Elem() + timeType = reflect.TypeOf((*time.Time)(nil)).Elem() +) + +func init() { + RegisterType(encodedTypeType) + RegisterType(encodedFuncType) + RegisterType(encodedCoderType) + RegisterType(timeType) + schema.RegisterLogicalType(schema.ToLogicalType("beam.EncodedType", encodedTypeType, encodedStorageType)) + schema.RegisterLogicalType(schema.ToLogicalType("beam.EncodedFunc", encodedFuncType, encodedStorageType)) + schema.RegisterLogicalType(schema.ToLogicalType("beam.EncodedCoder", encodedCoderType, encodedStorageType)) + coder.RegisterSchemaProviders(encodedTypeType, encodedTypeEnc, encodedTypeDec) + coder.RegisterSchemaProviders(encodedFuncType, encodedFuncEnc, encodedFuncDec) + coder.RegisterSchemaProviders(encodedCoderType, encodedCoderEnc, encodedCoderDec) + + schema.RegisterLogicalType(schema.ToLogicalType("time.Time", timeType, encodedStorageType)) + coder.RegisterSchemaProviders(timeType, timeEnc, timeDec) +} + // EncodedType is a serialization wrapper around a type for convenience. type EncodedType struct { // T is the type to preserve across serialization. 
@@ -52,6 +82,44 @@ func (w *EncodedType) UnmarshalJSON(buf []byte) error { return nil } +func encodedTypeEnc(reflect.Type) (func(interface{}, io.Writer) error, error) { + return func(iface interface{}, w io.Writer) error { + if err := coder.WriteSimpleRowHeader(1, w); err != nil { + return err + } + v := iface.(EncodedType) + str, err := graphx.EncodeType(v.T) + if err != nil { + return err + } + if err := coder.EncodeStringUTF8(str, w); err != nil { + return err + } + return nil + }, + nil +} + +func encodedTypeDec(reflect.Type) (func(io.Reader) (interface{}, error), error) { + return func(r io.Reader) (interface{}, error) { + if err := coder.ReadSimpleRowHeader(1, r); err != nil { + return nil, err + } + s, err := coder.DecodeStringUTF8(r) + if err != nil { + return nil, err + } + t, err := graphx.DecodeType(s) + if err != nil { + return nil, err + } + return EncodedType{ + T: t, + }, nil + }, + nil +} + // EncodedFunc is a serialization wrapper around a function for convenience. type EncodedFunc struct { // Fn is the function to preserve across serialization. @@ -81,6 +149,44 @@ func (w *EncodedFunc) UnmarshalJSON(buf []byte) error { return nil } +func encodedFuncEnc(reflect.Type) (func(interface{}, io.Writer) error, error) { + return func(iface interface{}, w io.Writer) error { + if err := coder.WriteSimpleRowHeader(1, w); err != nil { + return err + } + v := iface.(EncodedFunc) + str, err := graphx.EncodeFn(v.Fn) + if err != nil { + return err + } + if err := coder.EncodeStringUTF8(str, w); err != nil { + return err + } + return nil + }, + nil +} + +func encodedFuncDec(reflect.Type) (func(io.Reader) (interface{}, error), error) { + return func(r io.Reader) (interface{}, error) { + if err := coder.ReadSimpleRowHeader(1, r); err != nil { + return nil, err + } + s, err := coder.DecodeStringUTF8(r) + if err != nil { + return nil, err + } + fn, err := graphx.DecodeFn(s) + if err != nil { + return nil, err + } + return EncodedFunc{ + Fn: fn, + }, nil + }, + nil +} + // DecodeCoder decodes a coder. Any custom coder function symbol must be // resolvable via the runtime.GlobalSymbolResolver. The types must be encodable. 
func DecodeCoder(data string) (Coder, error) { @@ -119,3 +225,76 @@ func (w *EncodedCoder) UnmarshalJSON(buf []byte) error { w.Coder = Coder{coder: c} return nil } + +func encodedCoderEnc(reflect.Type) (func(interface{}, io.Writer) error, error) { + return func(iface interface{}, w io.Writer) error { + if err := coder.WriteSimpleRowHeader(1, w); err != nil { + return err + } + v := iface.(EncodedCoder) + str, err := graphx.EncodeCoder(v.Coder.coder) + if err != nil { + return err + } + if err := coder.EncodeStringUTF8(str, w); err != nil { + return err + } + return nil + }, + nil +} + +func encodedCoderDec(reflect.Type) (func(io.Reader) (interface{}, error), error) { + return func(r io.Reader) (interface{}, error) { + if err := coder.ReadSimpleRowHeader(1, r); err != nil { + return nil, err + } + s, err := coder.DecodeStringUTF8(r) + if err != nil { + return nil, err + } + c, err := graphx.DecodeCoder(s) + if err != nil { + return EncodedCoder{}, err + } + return EncodedCoder{Coder: Coder{coder: c}}, nil + }, + nil +} + +func timeEnc(reflect.Type) (func(interface{}, io.Writer) error, error) { + return func(iface interface{}, w io.Writer) error { + if err := coder.WriteSimpleRowHeader(1, w); err != nil { + return errors.Wrap(err, "encoding time.Time schema override") + } + t := iface.(time.Time) + // We use the text marshalling rather than the binary marshalling + // since it has more precision. Apparently some info isn't included + // in the binary marshal. + data, err := t.MarshalText() + if err != nil { + return fmt.Errorf("marshalling time: %v", err) + } + if err := coder.EncodeBytes(data, w); err != nil { + return err + } + return nil + }, nil +} + +func timeDec(reflect.Type) (func(io.Reader) (interface{}, error), error) { + return func(r io.Reader) (interface{}, error) { + if err := coder.ReadSimpleRowHeader(1, r); err != nil { + return nil, errors.Wrap(err, "decoding time.Time schema override") + } + data, err := coder.DecodeBytes(r) + if err != nil { + return nil, errors.Wrap(err, "retrieving time data: %v") + } + t := time.Time{} + if err := t.UnmarshalText(data); err != nil { + return nil, errors.Wrap(err, "decoding time: %v") + } + return t, nil + }, nil +} diff --git a/sdks/go/pkg/beam/encoding_test.go b/sdks/go/pkg/beam/encoding_test.go new file mode 100644 index 000000000000..4697da9575f4 --- /dev/null +++ b/sdks/go/pkg/beam/encoding_test.go @@ -0,0 +1,127 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +package beam + +import ( + "reflect" + "testing" + + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder/testutil" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + "github.com/google/go-cmp/cmp" +) + +type registeredTestType struct { + A int + B string + _ float64 + D []byte + // Currently Not encodable by default, so validates the registration. + Foo [0]int + Bar map[string]int +} + +var registeredTestTypeType = reflect.TypeOf((*registeredTestType)(nil)).Elem() + +func init() { + RegisterType(registeredTestTypeType) +} + +func TestEncodedType(t *testing.T) { + var v testutil.SchemaCoder + v.CmpOptions = []cmp.Option{ + cmp.Comparer(func(a reflect.Type, b reflect.Type) bool { + return a.AssignableTo(b) + }), + } + v.Validate(t, encodedTypeType, encodedTypeEnc, encodedTypeDec, struct{ Str string }{}, + []EncodedType{ + {reflectx.String}, + {reflectx.Int}, + {reflectx.Int8}, + {reflectx.Int16}, + {reflectx.Int32}, + {reflectx.Int64}, + {reflectx.Uint}, + {reflectx.Uint8}, + {reflectx.Uint16}, + {reflectx.Uint32}, + {reflectx.Uint64}, + {reflectx.Float32}, + {reflectx.Float64}, + {reflectx.ByteSlice}, + {reflectx.Error}, + {reflectx.Bool}, + {reflectx.Context}, + + {reflect.PtrTo(reflectx.String)}, + {reflect.PtrTo(reflectx.Int)}, + {reflect.PtrTo(reflectx.Int8)}, + {reflect.PtrTo(reflectx.Int16)}, + {reflect.PtrTo(reflectx.Int32)}, + {reflect.PtrTo(reflectx.Int64)}, + {reflect.PtrTo(reflectx.Uint)}, + {reflect.PtrTo(reflectx.Uint8)}, + {reflect.PtrTo(reflectx.Uint16)}, + {reflect.PtrTo(reflectx.Uint32)}, + {reflect.PtrTo(reflectx.Uint64)}, + {reflect.PtrTo(reflectx.Float32)}, + {reflect.PtrTo(reflectx.Float64)}, + {reflect.PtrTo(reflectx.ByteSlice)}, + {reflect.PtrTo(reflectx.Error)}, + {reflect.PtrTo(reflectx.Bool)}, + + {reflect.ChanOf(reflect.BothDir, reflectx.Bool)}, + {reflect.ChanOf(reflect.RecvDir, reflectx.Float64)}, + {reflect.ChanOf(reflect.SendDir, reflectx.ByteSlice)}, + + // Needs changes to Go SDK v1 type protos to support without registering. + // {reflect.ArrayOf(0, reflectx.Int)}, + // {reflect.ArrayOf(7, reflectx.Int)}, + // {reflect.MapOf(reflectx.String, reflectx.ByteSlice)}, + + {registeredTestTypeType}, + + {reflect.FuncOf([]reflect.Type{reflectx.Context, reflectx.String, reflectx.Int}, []reflect.Type{reflectx.Error}, false)}, + + {encodedTypeType}, + {encodedFuncType}, + {reflect.TypeOf((*struct{ A, B, C int })(nil))}, + {reflect.TypeOf((*struct{ A, B, C int })(nil)).Elem()}, + }) +} + +func registeredLocalTestFunction(_ string) int { + return 0 +} + +func init() { + // Functions must be registered anyway, in particular for unit tests. + RegisterFunction(registeredLocalTestFunction) +} + +func TestEncodedFunc(t *testing.T) { + var v testutil.SchemaCoder + v.CmpOptions = []cmp.Option{ + cmp.Comparer(func(a reflectx.Func, b reflectx.Func) bool { + return a.Type() == b.Type() && a.Name() == b.Name() + }), + } + v.Validate(t, encodedFuncType, encodedFuncEnc, encodedFuncDec, struct{ Str string }{}, + []EncodedFunc{ + {reflectx.MakeFunc(registeredLocalTestFunction)}, + }) +} diff --git a/sdks/go/pkg/beam/example_schema_test.go b/sdks/go/pkg/beam/example_schema_test.go new file mode 100644 index 000000000000..8dee0af01a71 --- /dev/null +++ b/sdks/go/pkg/beam/example_schema_test.go @@ -0,0 +1,254 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. 
+// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package beam_test + +import ( + "bytes" + "fmt" + "io" + "reflect" + + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/google/go-cmp/cmp" +) + +// RegisterSchemaProvider must be called before beam.Init, and conventionally in a package init block. +func init() { + beam.RegisterSchemaProvider(reflect.TypeOf((*Alphabet)(nil)).Elem(), &AlphabetProvider{}) + // TODO(BEAM-9615): Registerying a self encoding type causes a cycle. Needs resolving. + // beam.RegisterType(reflect.TypeOf((*Cyrillic)(nil))) + beam.RegisterType(reflect.TypeOf((*Latin)(nil))) + beam.RegisterType(reflect.TypeOf((*Ελληνικά)(nil))) +} + +type Alphabet interface { + alphabet() string +} + +type Cyrillic struct { + A, B int +} + +func (*Cyrillic) alphabet() string { + return "Cyrillic" +} + +type Latin struct { + // Unexported fields are not serializable by beam schemas by default + // so we need to handle this ourselves. + c uint64 + d *float32 +} + +func (*Latin) alphabet() string { + return "Latin" +} + +type Ελληνικά struct { + q string + G func() string +} + +func (*Ελληνικά) alphabet() string { + return "Ελληνικά" +} + +// AlphabetProvider provides encodings for types that implement the Alphabet interface. +type AlphabetProvider struct { + enc *coder.RowEncoderBuilder + dec *coder.RowDecoderBuilder +} + +var ( + typeCyrillic = reflect.TypeOf((*Cyrillic)(nil)) + typeLatin = reflect.TypeOf((*Latin)(nil)) + typeΕλληνικά = reflect.TypeOf((*Ελληνικά)(nil)) +) + +func (p *AlphabetProvider) FromLogicalType(rt reflect.Type) (reflect.Type, error) { + // FromLogicalType produces schema representative types, which match the encoders + // and decoders that this function generates for this type. + // While this example uses statically assigned schema representative types, it's + // possible to generate the returned reflect.Type dynamically instead, using the + // reflect package. + switch rt { + // The Cyrillic type is able to be encoded by default, so we simply use it directly + // as it's own representative type. + case typeCyrillic: + return typeCyrillic, nil + case typeLatin: + // The Latin type only has unexported fields, so we need to make the equivalent + // have exported fields. + return reflect.TypeOf((*struct { + C uint64 + D *float32 + })(nil)).Elem(), nil + case typeΕλληνικά: + return reflect.TypeOf((*struct{ Q string })(nil)).Elem(), nil + } + return nil, fmt.Errorf("Unknown Alphabet: %v", rt) +} + +// BuildEncoder returns beam schema encoder functions for types with the Alphabet interface. +func (p *AlphabetProvider) BuildEncoder(rt reflect.Type) (func(interface{}, io.Writer) error, error) { + switch rt { + case typeCyrillic: + if p.enc == nil { + p.enc = &coder.RowEncoderBuilder{} + } + // Since Cyrillic is by default encodable, defer to the standard schema row decoder for the type. 
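+			// The RowEncoderBuilder is created lazily on first use and reused by
+			// this provider for subsequent Cyrillic encodes.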
+ return p.enc.Build(rt) + case typeLatin: + return func(iface interface{}, w io.Writer) error { + v := iface.(*Latin) + // Beam Schema Rows have a header that indicates which fields if any, are nil. + if err := coder.WriteRowHeader(2, func(i int) bool { + if i == 1 { + return v.d == nil + } + return false + }, w); err != nil { + return err + } + // Afterwards, each field is encoded using the appropriate helper. + if err := coder.EncodeVarUint64(v.c, w); err != nil { + return err + } + // Nil fields have nothing written for them other than the header. + if v.d != nil { + if err := coder.EncodeDouble(float64(*v.d), w); err != nil { + return err + } + } + return nil + }, nil + case typeΕλληνικά: + return func(iface interface{}, w io.Writer) error { + // Since the representation for Ελληνικά never has nil fields + // we can use the simple header helper. + if err := coder.WriteSimpleRowHeader(1, w); err != nil { + return err + } + v := iface.(*Ελληνικά) + if err := coder.EncodeStringUTF8(v.q, w); err != nil { + return fmt.Errorf("decoding string field A: %v", err) + } + return nil + }, nil + } + return nil, fmt.Errorf("Unknown Alphabet: %v", rt) +} + +// BuildDecoder returns beam schema decoder functions for types with the Alphabet interface. +func (p *AlphabetProvider) BuildDecoder(rt reflect.Type) (func(io.Reader) (interface{}, error), error) { + switch rt { + case typeCyrillic: + if p.dec == nil { + p.dec = &coder.RowDecoderBuilder{} + } + // Since Cyrillic is by default encodable, defer to the standard schema row decoder for the type. + return p.dec.Build(rt) + case typeLatin: + return func(r io.Reader) (interface{}, error) { + // Since the d field can be nil, we use the header get the nil bits. + n, nils, err := coder.ReadRowHeader(r) + if err != nil { + return nil, err + } + // Header returns the number of fields, so we check if it has what we + // expect. This allows schemas to evolve if necessary. + if n != 2 { + return nil, fmt.Errorf("expected 2 fields, but got %v", n) + } + c, err := coder.DecodeVarUint64(r) + if err != nil { + return nil, err + } + // Check if the field is nil before trying to decode a value for it. + var d *float32 + if !coder.IsFieldNil(nils, 1) { + f, err := coder.DecodeDouble(r) + if err != nil { + return nil, err + } + f32 := float32(f) + d = &f32 + } + return &Latin{ + c: c, + d: d, + }, nil + }, nil + case typeΕλληνικά: + return func(r io.Reader) (interface{}, error) { + // Since the representation for Ελληνικά never has nil fields + // we can use the simple header helper. Returns an error if + // something unexpected occurs. + if err := coder.ReadSimpleRowHeader(1, r); err != nil { + return nil, err + } + q, err := coder.DecodeStringUTF8(r) + if err != nil { + return nil, fmt.Errorf("decoding string field A: %v", err) + } + return &Ελληνικά{ + q: q, + }, nil + }, nil + } + return nil, nil +} + +// Schema providers work on fields of schema encoded types. 
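+// The translation struct below is itself schema-encoded; its Alphabet-valued
+// fields are handled by the AlphabetProvider registered in the package init above.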
+type translation struct { + C *Cyrillic + L *Latin + E *Ελληνικά +} + +func ExampleRegisterSchemaProvider() { + f := float32(42.789) + want := translation{ + C: &Cyrillic{A: 123, B: 456}, + L: &Latin{c: 789, d: &f}, + E: &Ελληνικά{q: "testing"}, + } + rt := reflect.TypeOf((*translation)(nil)).Elem() + enc, err := coder.RowEncoderForStruct(rt) + if err != nil { + panic(err) + } + dec, err := coder.RowDecoderForStruct(rt) + if err != nil { + panic(err) + } + var buf bytes.Buffer + if err := enc(want, &buf); err != nil { + panic(err) + } + got, err := dec(&buf) + if err != nil { + panic(err) + } + if d := cmp.Diff(want, got, + cmp.AllowUnexported(Latin{}, Ελληνικά{})); d != "" { + fmt.Printf("diff in schema encoding translation: (-want,+got)\n%v\n", d) + } else { + fmt.Println("No diffs!") + } + // Output: No diffs! +} diff --git a/sdks/go/pkg/beam/external.go b/sdks/go/pkg/beam/external.go index 462b45bdbfab..a19767023056 100644 --- a/sdks/go/pkg/beam/external.go +++ b/sdks/go/pkg/beam/external.go @@ -16,22 +16,26 @@ package beam import ( - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // External defines a Beam external transform. The interpretation of this primitive is runner // specific. The runner is responsible for parsing the payload based on the -// spec provided to implement the behavior of the operation. Transform +// URN provided to implement the behavior of the operation. Transform // libraries should expose an API that captures the user's intent and serialize // the payload as a byte slice that the runner will deserialize. -func External(s Scope, spec string, payload []byte, in []PCollection, out []FullType, bounded bool) []PCollection { - return MustN(TryExternal(s, spec, payload, in, out, bounded)) +// +// Use ExternalTagged if the runner will need to associate the PTransforms local PCollection tags +// with values in the payload. +func External(s Scope, urn string, payload []byte, in []PCollection, out []FullType, bounded bool) []PCollection { + return MustN(TryExternal(s, urn, payload, in, out, bounded)) } -// TryExternal attempts to perform the work of External, returning an error indicating why the operation -// failed. -func TryExternal(s Scope, spec string, payload []byte, in []PCollection, out []FullType, bounded bool) ([]PCollection, error) { +// TryExternal attempts to perform the work of External, returning an error indicating +// why the operation failed. +func TryExternal(s Scope, urn string, payload []byte, in []PCollection, out []FullType, bounded bool) ([]PCollection, error) { if !s.IsValid() { return nil, errors.New("invalid scope") } @@ -45,7 +49,7 @@ func TryExternal(s Scope, spec string, payload []byte, in []PCollection, out []F for _, col := range in { ins = append(ins, col.n) } - edge := graph.NewExternal(s.real, s.scope, &graph.Payload{URN: spec, Data: payload}, ins, out, bounded) + edge := graph.NewExternal(s.real, s.scope, &graph.Payload{URN: urn, Data: payload}, ins, out, bounded) var ret []PCollection for _, out := range edge.Output { @@ -55,3 +59,50 @@ func TryExternal(s Scope, spec string, payload []byte, in []PCollection, out []F } return ret, nil } + +// ExternalTagged defines an external PTransform, and allows re-specifying the tags for the input +// and output PCollections. 
The interpretation of this primitive is runner specific. +// The runner is responsible for parsing the payload based on the URN provided to implement +// the behavior of the operation. Transform libraries should expose an API that captures +// the user's intent and serialize the payload as a byte slice that the runner will deserialize. +// +// Use ExternalTagged if the runner will need to associate the PTransforms local PCollection tags +// with values in the payload. Otherwise, prefer External. +func ExternalTagged( + s Scope, + urn string, + payload []byte, + namedInputs map[string]PCollection, + namedOutputTypes map[string]FullType, + bounded bool) map[string]PCollection { + return MustTaggedN(TryExternalTagged(s, urn, payload, namedInputs, namedOutputTypes, bounded)) +} + +// TryExternalTagged attempts to perform the work of ExternalTagged, returning an error +// indicating why the operation failed. +func TryExternalTagged( + s Scope, + urn string, + payload []byte, + namedInputs map[string]PCollection, + namedOutputTypes map[string]FullType, + bounded bool) (map[string]PCollection, error) { + if !s.IsValid() { + return nil, errors.New("invalid scope") + } + + inputsMap, inboundLinks := graph.NamedInboundLinks(mapPCollectionToNode(namedInputs)) + outputsMap, outboundLinks := graph.NamedOutboundLinks(s.real, namedOutputTypes) + + ext := graph.Payload{ + URN: urn, + Data: payload, + InputsMap: inputsMap, + OutputsMap: outputsMap, + } + + edge := graph.NewTaggedExternal(s.real, s.scope, &ext, inboundLinks, outboundLinks, bounded) + + namedOutputs := graphx.OutboundTagToNode(edge.Payload.OutputsMap, edge.Output) + return mapNodeToPCollection(namedOutputs), nil +} diff --git a/sdks/go/pkg/beam/external_test.go b/sdks/go/pkg/beam/external_test.go new file mode 100644 index 000000000000..4cc0d90d3a10 --- /dev/null +++ b/sdks/go/pkg/beam/external_test.go @@ -0,0 +1,92 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package beam + +import ( + "bytes" + "strings" + "testing" + + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" +) + +func TestExternalTagged(t *testing.T) { + urn := "my::test::urn" + payload := []byte("expected_payload") + inTag := "inTag" + outTag := "outTag" + + p := NewPipeline() + s := p.Root() + + col := Impulse(s) + ft := typex.New(reflectx.ByteSlice) + outMap := ExternalTagged(s, urn, payload, map[string]PCollection{inTag: col}, map[string]FullType{outTag: ft}, true) + + // Validate the in-construction PCollection types for ExternalTagged. 
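+	// The single output must carry the requested tag and the declared FullType.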
+ if len(outMap) != 1 { + t.Fatalf("ExternalTagged = %v, want 1 output with tag %q", outMap, outTag) + } + out, ok := outMap[outTag] + if !ok { + t.Fatalf("ExternalTagged = %v, want 1 output with tag %q", outMap, outTag) + } + if !typex.IsEqual(out.Type(), ft) { + t.Fatalf("ExternalTagged()[%q].Type() = %v, want %v", outTag, out.Type(), ft) + } + + // Validate the post-construction pipeline protocol buffer contents. + edges, _, err := p.Build() + if err != nil { + t.Fatalf("Pipeline couldn't build: %v", err) + } + pb, err := graphx.Marshal(edges, &graphx.Options{Environment: &pipepb.Environment{}}) + if err != nil { + t.Fatalf("Couldn't graphx.Marshal edges: %v", err) + } + components := pb.GetComponents() + transforms := components.GetTransforms() + + foundExternalTagged := false + for _, transform := range transforms { + spec := transform.GetSpec() + if !strings.Contains(spec.GetUrn(), urn) { + continue + } + foundExternalTagged = true + if bytes.Compare(spec.GetPayload(), payload) != 0 { + t.Errorf("Payload value: got %v, want %v", spec.GetPayload(), payload) + } + if got, want := len(transform.GetInputs()), 1; got != want { + t.Errorf("transform.Inputs has size %v, want %v: %v", got, want, transform.GetInputs()) + } else if _, ok := transform.GetInputs()[inTag]; !ok { + t.Errorf("transform.Inputs doesn't contain %v tag, got %v", inTag, transform.GetInputs()) + } + + if got, want := len(transform.GetOutputs()), 1; got != want { + t.Errorf("transform.Outputs has size %v, want %v: %v", got, want, transform.GetOutputs()) + } else if _, ok := transform.GetOutputs()[outTag]; !ok { + t.Errorf("transform.Outputs doesn't contain %v tag, got %v", outTag, transform.GetOutputs()) + } + break + } + if !foundExternalTagged { + t.Errorf("Couldn't find PTransform with urn %q in graph %v", urn, transforms) + } +} diff --git a/sdks/go/pkg/beam/flatten.go b/sdks/go/pkg/beam/flatten.go index 576bdc76cd78..c167131bdd91 100644 --- a/sdks/go/pkg/beam/flatten.go +++ b/sdks/go/pkg/beam/flatten.go @@ -16,8 +16,8 @@ package beam import ( - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // Flatten is a PTransform that takes either multiple PCollections of type 'A' diff --git a/sdks/go/pkg/beam/forward.go b/sdks/go/pkg/beam/forward.go index 7b93ca6e6975..5114faf27c03 100644 --- a/sdks/go/pkg/beam/forward.go +++ b/sdks/go/pkg/beam/forward.go @@ -18,10 +18,11 @@ package beam import ( "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/genx" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/genx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx/schema" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" ) // IMPLEMENTATION NOTE: functions and types in this file are assumed to be @@ -43,6 +44,17 @@ import ( // facing copy for this important concept. 
func RegisterType(t reflect.Type) { runtime.RegisterType(t) + if EnableSchemas { + schema.RegisterType(t) + } +} + +func init() { + runtime.RegisterInit(func() { + if EnableSchemas { + schema.Initialize() + } + }) } // RegisterFunction allows function registration. It is beneficial for performance diff --git a/sdks/go/pkg/beam/gbk.go b/sdks/go/pkg/beam/gbk.go index 293986e58122..2f41db9ff996 100644 --- a/sdks/go/pkg/beam/gbk.go +++ b/sdks/go/pkg/beam/gbk.go @@ -16,8 +16,8 @@ package beam import ( - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // GroupByKey is a PTransform that takes a PCollection of type KV, diff --git a/sdks/go/pkg/beam/impulse.go b/sdks/go/pkg/beam/impulse.go index 9543fdd5c81f..8a0cbe0e3f0b 100644 --- a/sdks/go/pkg/beam/impulse.go +++ b/sdks/go/pkg/beam/impulse.go @@ -16,7 +16,7 @@ package beam import ( - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" ) // Impulse emits a single empty []byte into the global window. The resulting diff --git a/sdks/go/pkg/beam/io/avroio/avroio.go b/sdks/go/pkg/beam/io/avroio/avroio.go index 6fedc013f3f2..c4b07cfc9bd3 100644 --- a/sdks/go/pkg/beam/io/avroio/avroio.go +++ b/sdks/go/pkg/beam/io/avroio/avroio.go @@ -22,9 +22,9 @@ import ( "reflect" "strings" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem" - "github.com/apache/beam/sdks/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" "github.com/linkedin/goavro" ) diff --git a/sdks/go/pkg/beam/io/bigqueryio/bigquery.go b/sdks/go/pkg/beam/io/bigqueryio/bigquery.go index dc54109645c6..c6976e453a55 100644 --- a/sdks/go/pkg/beam/io/bigqueryio/bigquery.go +++ b/sdks/go/pkg/beam/io/bigqueryio/bigquery.go @@ -26,9 +26,9 @@ import ( "time" "cloud.google.com/go/bigquery" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" bq "google.golang.org/api/bigquery/v2" "google.golang.org/api/googleapi" "google.golang.org/api/iterator" diff --git a/sdks/go/pkg/beam/io/databaseio/database.go b/sdks/go/pkg/beam/io/databaseio/database.go index 2b4167123a44..f49d6887413f 100644 --- a/sdks/go/pkg/beam/io/databaseio/database.go +++ b/sdks/go/pkg/beam/io/databaseio/database.go @@ -21,9 +21,9 @@ import ( "context" "database/sql" "fmt" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - "github.com/apache/beam/sdks/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" "reflect" "strings" ) diff --git a/sdks/go/pkg/beam/io/databaseio/mapper.go b/sdks/go/pkg/beam/io/databaseio/mapper.go index 56a98a5096e4..c327a07ff663 100644 --- a/sdks/go/pkg/beam/io/databaseio/mapper.go +++ b/sdks/go/pkg/beam/io/databaseio/mapper.go @@ -23,7 +23,7 @@ import ( "strings" "time" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + 
"github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) //rowMapper represents a record mapper diff --git a/sdks/go/pkg/beam/io/databaseio/util.go b/sdks/go/pkg/beam/io/databaseio/util.go index 04324fa941fe..97d098cb0732 100644 --- a/sdks/go/pkg/beam/io/databaseio/util.go +++ b/sdks/go/pkg/beam/io/databaseio/util.go @@ -21,7 +21,7 @@ import ( "reflect" "strings" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) //mapFields maps column into field index in record type diff --git a/sdks/go/pkg/beam/io/databaseio/writer.go b/sdks/go/pkg/beam/io/databaseio/writer.go index 1b904c88f6d0..83c74ff4abf2 100644 --- a/sdks/go/pkg/beam/io/databaseio/writer.go +++ b/sdks/go/pkg/beam/io/databaseio/writer.go @@ -23,8 +23,8 @@ import ( "golang.org/x/net/context" "strings" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - "github.com/apache/beam/sdks/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" ) // Writer returns a row of data to be inserted into a table. diff --git a/sdks/go/pkg/beam/io/datastoreio/datastore.go b/sdks/go/pkg/beam/io/datastoreio/datastore.go index d80395cc032c..8bce95a71e60 100644 --- a/sdks/go/pkg/beam/io/datastoreio/datastore.go +++ b/sdks/go/pkg/beam/io/datastoreio/datastore.go @@ -27,10 +27,10 @@ import ( "strings" "cloud.google.com/go/datastore" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - "github.com/apache/beam/sdks/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" "google.golang.org/api/iterator" ) diff --git a/sdks/go/pkg/beam/io/filesystem/filesystem.go b/sdks/go/pkg/beam/io/filesystem/filesystem.go index f979a7489547..61c8aa4f1761 100644 --- a/sdks/go/pkg/beam/io/filesystem/filesystem.go +++ b/sdks/go/pkg/beam/io/filesystem/filesystem.go @@ -23,7 +23,7 @@ import ( "io" "strings" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) var registry = make(map[string]func(context.Context) Interface) diff --git a/sdks/go/pkg/beam/io/filesystem/gcs/gcs.go b/sdks/go/pkg/beam/io/filesystem/gcs/gcs.go index 2cff9200ee89..192c471ec416 100644 --- a/sdks/go/pkg/beam/io/filesystem/gcs/gcs.go +++ b/sdks/go/pkg/beam/io/filesystem/gcs/gcs.go @@ -25,10 +25,10 @@ import ( "strings" "cloud.google.com/go/storage" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem" - "github.com/apache/beam/sdks/go/pkg/beam/log" - "github.com/apache/beam/sdks/go/pkg/beam/util/gcsx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/gcsx" "google.golang.org/api/iterator" "google.golang.org/api/option" ) diff --git a/sdks/go/pkg/beam/io/filesystem/local/local.go b/sdks/go/pkg/beam/io/filesystem/local/local.go index f7cd60072217..074029941700 100644 --- a/sdks/go/pkg/beam/io/filesystem/local/local.go +++ b/sdks/go/pkg/beam/io/filesystem/local/local.go @@ -22,7 +22,7 @@ import ( "os" "path/filepath" - 
"github.com/apache/beam/sdks/go/pkg/beam/io/filesystem" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem" ) func init() { diff --git a/sdks/go/pkg/beam/io/filesystem/memfs/memory.go b/sdks/go/pkg/beam/io/filesystem/memfs/memory.go index 8005815eed85..b38256809bb4 100644 --- a/sdks/go/pkg/beam/io/filesystem/memfs/memory.go +++ b/sdks/go/pkg/beam/io/filesystem/memfs/memory.go @@ -26,7 +26,7 @@ import ( "strings" "sync" - "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem" ) func init() { diff --git a/sdks/go/pkg/beam/io/filesystem/memfs/memory_test.go b/sdks/go/pkg/beam/io/filesystem/memfs/memory_test.go index 08b012d5b49f..ce35d15c6f5f 100644 --- a/sdks/go/pkg/beam/io/filesystem/memfs/memory_test.go +++ b/sdks/go/pkg/beam/io/filesystem/memfs/memory_test.go @@ -20,7 +20,7 @@ import ( "os" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem" ) // TestReadWrite tests that read and write from the memory filesystem diff --git a/sdks/go/pkg/beam/io/pubsubio/pubsubio.go b/sdks/go/pkg/beam/io/pubsubio/pubsubio.go index 5232a5e4e15d..88b7395f44eb 100644 --- a/sdks/go/pkg/beam/io/pubsubio/pubsubio.go +++ b/sdks/go/pkg/beam/io/pubsubio/pubsubio.go @@ -20,14 +20,19 @@ package pubsubio import ( "reflect" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/protox" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" - "github.com/apache/beam/sdks/go/pkg/beam/io/pubsubio/v1" - "github.com/apache/beam/sdks/go/pkg/beam/util/pubsubx" - "github.com/golang/protobuf/proto" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/protox" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/pubsubx" pb "google.golang.org/genproto/googleapis/pubsub/v1" + "google.golang.org/protobuf/proto" +) + +var ( + readURN = "beam:transform:pubsub_read:v1" + writeURN = "beam:transform:pubsub_write:v1" ) func init() { @@ -49,8 +54,7 @@ type ReadOptions struct { func Read(s beam.Scope, project, topic string, opts *ReadOptions) beam.PCollection { s = s.Scope("pubsubio.Read") - payload := &v1.PubSubPayload{ - Op: v1.PubSubPayload_READ, + payload := &pipepb.PubSubReadPayload{ Topic: pubsubx.MakeQualifiedTopicName(project, topic), } if opts != nil { @@ -62,8 +66,8 @@ func Read(s beam.Scope, project, topic string, opts *ReadOptions) beam.PCollecti payload.WithAttributes = opts.WithAttributes } - out := beam.External(s, v1.PubSubPayloadURN, protox.MustEncode(payload), nil, []beam.FullType{typex.New(reflectx.ByteSlice)}, false) - if opts.WithAttributes { + out := beam.External(s, readURN, protox.MustEncode(payload), nil, []beam.FullType{typex.New(reflectx.ByteSlice)}, false) + if opts != nil && opts.WithAttributes { return beam.ParDo(s, unmarshalMessageFn, out[0]) } return out[0] @@ -81,15 +85,13 @@ func unmarshalMessageFn(raw []byte) (*pb.PubsubMessage, error) { func Write(s beam.Scope, project, topic string, col beam.PCollection) { s = s.Scope("pubsubio.Write") - payload := &v1.PubSubPayload{ - Op: v1.PubSubPayload_WRITE, + payload := &pipepb.PubSubWritePayload{ Topic: pubsubx.MakeQualifiedTopicName(project, topic), } out := col if 
col.Type().Type() != reflectx.ByteSlice { out = beam.ParDo(s, proto.Marshal, col) - payload.WithAttributes = true } - beam.External(s, v1.PubSubPayloadURN, protox.MustEncode(payload), []beam.PCollection{out}, nil, false) + beam.External(s, writeURN, protox.MustEncode(payload), []beam.PCollection{out}, nil, false) } diff --git a/sdks/go/pkg/beam/io/pubsubio/v1/v1.pb.go b/sdks/go/pkg/beam/io/pubsubio/v1/v1.pb.go deleted file mode 100644 index e398a09d1ed4..000000000000 --- a/sdks/go/pkg/beam/io/pubsubio/v1/v1.pb.go +++ /dev/null @@ -1,134 +0,0 @@ -// Code generated by protoc-gen-go. DO NOT EDIT. -// source: v1.proto - -/* -Package v1 is a generated protocol buffer package. - -It is generated from these files: - v1.proto - -It has these top-level messages: - PubSubPayload -*/ -package v1 - -import proto "github.com/golang/protobuf/proto" -import fmt "fmt" -import math "math" - -// Reference imports to suppress errors if they are not otherwise used. -var _ = proto.Marshal -var _ = fmt.Errorf -var _ = math.Inf - -// This is a compile-time assertion to ensure that this generated file -// is compatible with the proto package it is being compiled against. -// A compilation error at this line likely means your copy of the -// proto package needs to be updated. -const _ = proto.ProtoPackageIsVersion2 // please upgrade the proto package - -type PubSubPayload_Op int32 - -const ( - PubSubPayload_INVALID PubSubPayload_Op = 0 - PubSubPayload_READ PubSubPayload_Op = 1 - PubSubPayload_WRITE PubSubPayload_Op = 2 -) - -var PubSubPayload_Op_name = map[int32]string{ - 0: "INVALID", - 1: "READ", - 2: "WRITE", -} -var PubSubPayload_Op_value = map[string]int32{ - "INVALID": 0, - "READ": 1, - "WRITE": 2, -} - -func (x PubSubPayload_Op) String() string { - return proto.EnumName(PubSubPayload_Op_name, int32(x)) -} -func (PubSubPayload_Op) EnumDescriptor() ([]byte, []int) { return fileDescriptor0, []int{0, 0} } - -type PubSubPayload struct { - Op PubSubPayload_Op `protobuf:"varint,1,opt,name=op,enum=v1.PubSubPayload_Op" json:"op,omitempty"` - Topic string `protobuf:"bytes,2,opt,name=Topic" json:"Topic,omitempty"` - Subscription string `protobuf:"bytes,3,opt,name=Subscription" json:"Subscription,omitempty"` - IdAttribute string `protobuf:"bytes,4,opt,name=IdAttribute" json:"IdAttribute,omitempty"` - TimestampAttribute string `protobuf:"bytes,5,opt,name=TimestampAttribute" json:"TimestampAttribute,omitempty"` - WithAttributes bool `protobuf:"varint,6,opt,name=WithAttributes" json:"WithAttributes,omitempty"` -} - -func (m *PubSubPayload) Reset() { *m = PubSubPayload{} } -func (m *PubSubPayload) String() string { return proto.CompactTextString(m) } -func (*PubSubPayload) ProtoMessage() {} -func (*PubSubPayload) Descriptor() ([]byte, []int) { return fileDescriptor0, []int{0} } - -func (m *PubSubPayload) GetOp() PubSubPayload_Op { - if m != nil { - return m.Op - } - return PubSubPayload_INVALID -} - -func (m *PubSubPayload) GetTopic() string { - if m != nil { - return m.Topic - } - return "" -} - -func (m *PubSubPayload) GetSubscription() string { - if m != nil { - return m.Subscription - } - return "" -} - -func (m *PubSubPayload) GetIdAttribute() string { - if m != nil { - return m.IdAttribute - } - return "" -} - -func (m *PubSubPayload) GetTimestampAttribute() string { - if m != nil { - return m.TimestampAttribute - } - return "" -} - -func (m *PubSubPayload) GetWithAttributes() bool { - if m != nil { - return m.WithAttributes - } - return false -} - -func init() { - proto.RegisterType((*PubSubPayload)(nil), 
"v1.PubSubPayload") - proto.RegisterEnum("v1.PubSubPayload_Op", PubSubPayload_Op_name, PubSubPayload_Op_value) -} - -func init() { proto.RegisterFile("v1.proto", fileDescriptor0) } - -var fileDescriptor0 = []byte{ - // 226 bytes of a gzipped FileDescriptorProto - 0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0xff, 0x6c, 0x90, 0x41, 0x4b, 0xc3, 0x30, - 0x18, 0x86, 0x4d, 0x5c, 0x67, 0xf7, 0x4d, 0x47, 0xf9, 0xd8, 0x21, 0xc7, 0x52, 0x64, 0xf4, 0x14, - 0xa8, 0xfe, 0x82, 0xc2, 0x76, 0x08, 0x88, 0x1b, 0x59, 0x71, 0xe7, 0x66, 0x1b, 0x18, 0x70, 0xe6, - 0xa3, 0x4d, 0x0a, 0xfe, 0x0c, 0xff, 0xb1, 0x10, 0x41, 0x9d, 0xec, 0xf8, 0x3e, 0xcf, 0x73, 0x7a, - 0x21, 0x1d, 0x2a, 0x49, 0x9d, 0xf3, 0x0e, 0xf9, 0x50, 0x15, 0x9f, 0x1c, 0xee, 0x36, 0xc1, 0x6c, - 0x83, 0xd9, 0xb4, 0x1f, 0x6f, 0xae, 0x3d, 0xe0, 0x3d, 0x70, 0x47, 0x82, 0xe5, 0xac, 0x9c, 0x3d, - 0xcc, 0xe5, 0x50, 0xc9, 0x33, 0x2d, 0xd7, 0xa4, 0xb9, 0x23, 0x9c, 0x43, 0xd2, 0x38, 0xb2, 0x7b, - 0xc1, 0x73, 0x56, 0x4e, 0xf4, 0xf7, 0xc0, 0x02, 0x6e, 0xb7, 0xc1, 0xf4, 0xfb, 0xce, 0x92, 0xb7, - 0xee, 0x5d, 0x5c, 0x47, 0x79, 0xc6, 0x30, 0x87, 0xa9, 0x3a, 0xd4, 0xde, 0x77, 0xd6, 0x04, 0x7f, - 0x14, 0xa3, 0x98, 0xfc, 0x45, 0x28, 0x01, 0x1b, 0x7b, 0x3a, 0xf6, 0xbe, 0x3d, 0xd1, 0x6f, 0x98, - 0xc4, 0xf0, 0x82, 0xc1, 0x05, 0xcc, 0x76, 0xd6, 0xbf, 0xfe, 0x80, 0x5e, 0x8c, 0x73, 0x56, 0xa6, - 0xfa, 0x1f, 0x2d, 0x16, 0xc0, 0xd7, 0x84, 0x53, 0xb8, 0x51, 0xcf, 0x2f, 0xf5, 0x93, 0x5a, 0x66, - 0x57, 0x98, 0xc2, 0x48, 0xaf, 0xea, 0x65, 0xc6, 0x70, 0x02, 0xc9, 0x4e, 0xab, 0x66, 0x95, 0x71, - 0x33, 0x8e, 0xf7, 0x3c, 0x7e, 0x05, 0x00, 0x00, 0xff, 0xff, 0x3a, 0x1b, 0xdf, 0x4b, 0x2a, 0x01, - 0x00, 0x00, -} diff --git a/sdks/go/pkg/beam/io/rtrackers/offsetrange/offsetrange.go b/sdks/go/pkg/beam/io/rtrackers/offsetrange/offsetrange.go index 3c767257c1ab..8cef51ada8ca 100644 --- a/sdks/go/pkg/beam/io/rtrackers/offsetrange/offsetrange.go +++ b/sdks/go/pkg/beam/io/rtrackers/offsetrange/offsetrange.go @@ -26,8 +26,8 @@ import ( "math" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime" ) func init() { diff --git a/sdks/go/pkg/beam/io/synthetic/source.go b/sdks/go/pkg/beam/io/synthetic/source.go index 67ad727a51c3..585427dca72e 100644 --- a/sdks/go/pkg/beam/io/synthetic/source.go +++ b/sdks/go/pkg/beam/io/synthetic/source.go @@ -30,13 +30,14 @@ import ( "reflect" "time" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/core/sdf" - "github.com/apache/beam/sdks/go/pkg/beam/io/rtrackers/offsetrange" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/sdf" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/rtrackers/offsetrange" ) func init() { beam.RegisterType(reflect.TypeOf((*sourceFn)(nil)).Elem()) + beam.RegisterType(reflect.TypeOf((*SourceConfig)(nil)).Elem()) } // Source creates a synthetic source transform that emits randomly diff --git a/sdks/go/pkg/beam/io/synthetic/step.go b/sdks/go/pkg/beam/io/synthetic/step.go index dccb71b8e529..35ed7f4ce0e0 100644 --- a/sdks/go/pkg/beam/io/synthetic/step.go +++ b/sdks/go/pkg/beam/io/synthetic/step.go @@ -17,13 +17,13 @@ package synthetic import ( "fmt" - "github.com/apache/beam/sdks/go/pkg/beam/core/sdf" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/sdf" "math/rand" "reflect" "time" - "github.com/apache/beam/sdks/go/pkg/beam" - 
"github.com/apache/beam/sdks/go/pkg/beam/io/rtrackers/offsetrange" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/rtrackers/offsetrange" ) func init() { diff --git a/sdks/go/pkg/beam/io/textio/sdf.go b/sdks/go/pkg/beam/io/textio/sdf.go index bfccca911d93..91149d05dd6a 100644 --- a/sdks/go/pkg/beam/io/textio/sdf.go +++ b/sdks/go/pkg/beam/io/textio/sdf.go @@ -22,12 +22,12 @@ import ( "reflect" "strings" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/core/sdf" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem" - "github.com/apache/beam/sdks/go/pkg/beam/io/rtrackers/offsetrange" - "github.com/apache/beam/sdks/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/sdf" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/rtrackers/offsetrange" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" ) func init() { diff --git a/sdks/go/pkg/beam/io/textio/sdf_test.go b/sdks/go/pkg/beam/io/textio/sdf_test.go index 733d79a9b17f..881b9ab51999 100644 --- a/sdks/go/pkg/beam/io/textio/sdf_test.go +++ b/sdks/go/pkg/beam/io/textio/sdf_test.go @@ -19,9 +19,9 @@ import ( "context" "testing" - "github.com/apache/beam/sdks/go/pkg/beam" - _ "github.com/apache/beam/sdks/go/pkg/beam/runners/direct" - "github.com/apache/beam/sdks/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/direct" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" ) // TestReadSdf tests that readSdf successfully reads a test text file, and diff --git a/sdks/go/pkg/beam/io/textio/textio.go b/sdks/go/pkg/beam/io/textio/textio.go index e8bc7ed43104..3b0abf58877f 100644 --- a/sdks/go/pkg/beam/io/textio/textio.go +++ b/sdks/go/pkg/beam/io/textio/textio.go @@ -24,9 +24,9 @@ import ( "reflect" "strings" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem" - "github.com/apache/beam/sdks/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" ) func init() { diff --git a/sdks/go/pkg/beam/io/textio/textio_test.go b/sdks/go/pkg/beam/io/textio/textio_test.go index 77807fb6dd0e..4090535226c7 100644 --- a/sdks/go/pkg/beam/io/textio/textio_test.go +++ b/sdks/go/pkg/beam/io/textio/textio_test.go @@ -19,7 +19,7 @@ package textio import ( "testing" - _ "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/local" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/local" ) func TestRead(t *testing.T) { diff --git a/sdks/go/pkg/beam/io/xlang/kafkaio/kafka.go b/sdks/go/pkg/beam/io/xlang/kafkaio/kafka.go new file mode 100644 index 000000000000..8781cff93527 --- /dev/null +++ b/sdks/go/pkg/beam/io/xlang/kafkaio/kafka.go @@ -0,0 +1,287 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. 
You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +// Package kafkaio contains cross-language functionality for using Apache Kafka +// (http://kafka.apache.org/). These transforms only work on runners that +// support cross-language transforms. +// +// Setup +// +// Transforms specified here are cross-language transforms implemented in a +// different SDK (listed below). During pipeline construction, the Go SDK will +// need to connect to an expansion service containing information on these +// transforms in their native SDK. +// +// To use an expansion service, it must be run as a separate process accessible +// during pipeline construction. The address of that process must be passed to +// the transforms in this package. +// +// The version of the expansion service should match the version of the Beam SDK +// being used. For numbered releases of Beam, these expansions services are +// released to the Maven repository as modules. For development versions of +// Beam, it is recommended to build and run it from source using Gradle. +// +// Current supported SDKs, including expansion service modules and reference +// documentation: +// * Java +// - Vendored Module: beam-sdks-java-io-expansion-service +// - Run via Gradle: ./gradlew :sdks:java:io:expansion-service:runExpansionService +// - Reference Class: org.apache.beam.sdk.io.kafka.KafkaIO +package kafkaio + +// TODO(BEAM-12492): Implement an API for specifying Kafka type serializers and +// deserializers. + +import ( + "reflect" + + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" +) + +func init() { + beam.RegisterType(reflect.TypeOf((*readPayload)(nil)).Elem()) + beam.RegisterType(reflect.TypeOf((*writePayload)(nil)).Elem()) +} + +type policy string + +const ( + ByteArrayDeserializer = "org.apache.kafka.common.serialization.ByteArrayDeserializer" + ByteArraySerializer = "org.apache.kafka.common.serialization.ByteArraySerializer" + + // ProcessingTime is a timestamp policy that assigns processing time to + // each record. Specifically, this is the timestamp when the record becomes + // "current" in the reader. Further documentation can be found in Java's + // KafkaIO documentation. + ProcessingTime policy = "ProcessingTime" + + // CreateTime is a timestamp policy based on the CREATE_TIME timestamps of + // kafka records. Requires the records to have a type set to + // org.apache.kafka.common.record.TimestampTypeCREATE_TIME. Further + // documentation can be found in Java's KafkaIO documentation. + CreateTime policy = "CreateTime" + + // LogAppendTime is a timestamp policy that assigns Kafka's log append time + // (server side ingestion time) to each record. Further documentation can + // be found in Java's KafkaIO documentation. + LogAppendTime policy = "LogAppendTime" + + readURN = "beam:external:java:kafkaio:typedwithoutmetadata:v1" + writeURN = "beam:external:java:kafka:write:v1" +) + +// Read is a cross-language PTransform which reads from Kafka and returns a +// KV pair for each item in the specified Kafka topics. 
By default, this runs +// as an unbounded transform and outputs keys and values as byte slices. +// These properties can be changed through optional parameters. +// +// Read requires the address for an expansion service for Kafka Read transforms, +// a comma-separated list of bootstrap server addresses (see the Kafka property +// "bootstrap.servers" for details), and at least one topic to read from. +// +// Read also accepts optional parameters as readOptions. All optional parameters +// are predefined in this package as functions that return readOption. To set +// an optional parameter, call the function within Read's function signature. +// +// Example of Read with required and optional parameters: +// +// expansionAddr := "localhost:1234" +// bootstrapServer := "bootstrap-server:1234" +// topic := "topic_name" +// pcol := kafkaio.Read(s, expansionAddr, bootstrapServer, []string{topic}, +// kafkaio.MaxNumRecords(100), kafkaio.CommitOffsetInFinalize(true)) +func Read(s beam.Scope, addr string, servers string, topics []string, opts ...readOption) beam.PCollection { + s = s.Scope("kafkaio.Read") + + if len(topics) == 0 { + panic("kafkaio.Read requires at least one topic to read from.") + } + + rpl := readPayload{ + ConsumerConfig: map[string]string{"bootstrap.servers": servers}, + Topics: topics, + KeyDeserializer: ByteArrayDeserializer, + ValueDeserializer: ByteArrayDeserializer, + TimestampPolicy: string(ProcessingTime), + } + rcfg := readConfig{ + pl: &rpl, + key: reflectx.ByteSlice, + val: reflectx.ByteSlice, + } + for _, opt := range opts { + opt(&rcfg) + } + + pl := beam.CrossLanguagePayload(rpl) + outT := beam.UnnamedOutput(typex.NewKV(typex.New(rcfg.key), typex.New(rcfg.val))) + out := beam.CrossLanguage(s, readURN, pl, addr, nil, outT) + return out[beam.UnnamedOutputTag()] +} + +type readOption func(*readConfig) +type readConfig struct { + pl *readPayload + key reflect.Type + val reflect.Type +} + +// ConsumerConfigs is a Read option that adds consumer properties to the +// Consumer configuration of the transform. Each usage of this adds the given +// elements to the existing map without removing existing elements. +// +// Note that the "bootstrap.servers" property is automatically set by +// kafkaio.Read and does not need to be specified via this option. +func ConsumerConfigs(cfgs map[string]string) readOption { + return func(cfg *readConfig) { + for k, v := range cfgs { + cfg.pl.ConsumerConfig[k] = v + } + } +} + +// StartReadTimestamp is a Read option that specifies a start timestamp in +// milliseconds since the epoch, so only records after that timestamp will be read. +// +// This results in failures if one or more partitions don't contain messages +// with a timestamp larger than or equal to the one specified, or if the +// message format version in a partition is before 0.10.0, meaning messages do +// not have timestamps. +func StartReadTimestamp(ts int64) readOption { + return func(cfg *readConfig) { + cfg.pl.StartReadTime = &ts + } +} + +// MaxNumRecords is a Read option that specifies the maximum number of records +// to be read. Setting this will cause the Read to execute as a bounded +// transform. Useful for tests and demo applications. +func MaxNumRecords(num int64) readOption { + return func(cfg *readConfig) { + cfg.pl.MaxNumRecords = &num + } +} + +// MaxReadSecs is a Read option that specifies the maximum amount of time in +// seconds the transform executes. Setting this will cause the Read to execute +// as a bounded transform. Useful for tests and demo applications. 
+func MaxReadSecs(secs int64) readOption { + return func(cfg *readConfig) { + cfg.pl.MaxReadTime = &secs + } +} + +// CommitOffsetInFinalize is a Read option that specifies whether to commit +// offsets when finalizing. +// +// Default: false +func CommitOffsetInFinalize(enabled bool) readOption { + return func(cfg *readConfig) { + cfg.pl.CommitOffsetInFinalize = enabled + } +} + +// TimestampPolicy is a Read option that specifies the timestamp policy to use +// for extracting timestamps from the KafkaRecord. Must be one of the predefined +// constant timestamp policies in this package. +// +// Default: kafkaio.ProcessingTime +func TimestampPolicy(name policy) readOption { + return func(cfg *readConfig) { + cfg.pl.TimestampPolicy = string(name) + } +} + +// readPayload should produce a schema matching the expected cross-language +// payload for Kafka reads. An example of this on the receiving end can be +// found in the Java SDK class +// org.apache.beam.sdk.io.kafka.KafkaIO.Read.External.Configuration. +type readPayload struct { + ConsumerConfig map[string]string + Topics []string + KeyDeserializer string + ValueDeserializer string + StartReadTime *int64 + MaxNumRecords *int64 + MaxReadTime *int64 + CommitOffsetInFinalize bool + TimestampPolicy string +} + +// Write is a cross-language PTransform which writes KV data to a specified +// Kafka topic. By default, this assumes keys and values to be received as +// byte slices. This can be changed through optional parameters. +// +// Write requires the address for an expansion service for Kafka Write +// transforms, a comma-separated list of bootstrap server addresses (see the +// Kafka property "bootstrap.servers" for details), and a topic to write to. +// +// Write also accepts optional parameters as writeOptions. All optional +// parameters are predefined in this package as functions that return +// writeOption. To set an optional parameter, call the function within Write's +// function signature. +// +// Example of Write with required and optional parameters: +// +// expansionAddr := "localhost:1234" +// bootstrapServer := "bootstrap-server:1234" +// topic := "topic_name" +// col := ... // a PCollection of KV<[]byte, []byte> elements to write +// kafkaio.Write(s, expansionAddr, bootstrapServer, topic, col, +// kafkaio.ProducerConfigs(map[string]string{"acks": "all"})) +func Write(s beam.Scope, addr, servers, topic string, col beam.PCollection, opts ...writeOption) { + s = s.Scope("kafkaio.Write") + + wpl := writePayload{ + ProducerConfig: map[string]string{"bootstrap.servers": servers}, + Topic: topic, + KeySerializer: ByteArraySerializer, + ValueSerializer: ByteArraySerializer, + } + for _, opt := range opts { + opt(&wpl) + } + + pl := beam.CrossLanguagePayload(wpl) + beam.CrossLanguage(s, writeURN, pl, addr, beam.UnnamedInput(col), nil) +} + +type writeOption func(*writePayload) + +// ProducerConfigs is a Write option that adds producer properties to the +// Producer configuration of the transform. Each usage of this adds the given +// elements to the existing map without removing existing elements. +// +// Note that the "bootstrap.servers" property is automatically set by +// kafkaio.Write and does not need to be specified via this option. +func ProducerConfigs(cfgs map[string]string) writeOption { + return func(pl *writePayload) { + for k, v := range cfgs { + pl.ProducerConfig[k] = v + } + } +} + +// writePayload should produce a schema matching the expected cross-language +// payload for Kafka writes. 
An example of this on the receiving end can be +// found in the Java SDK class +// org.apache.beam.sdk.io.kafka.KafkaIO.Write.External.Configuration. +type writePayload struct { + ProducerConfig map[string]string + Topic string + KeySerializer string + ValueSerializer string +} diff --git a/sdks/go/pkg/beam/metrics.go b/sdks/go/pkg/beam/metrics.go index 0d4c0bbb2161..07dbc8af5de1 100644 --- a/sdks/go/pkg/beam/metrics.go +++ b/sdks/go/pkg/beam/metrics.go @@ -18,7 +18,7 @@ package beam import ( "context" - "github.com/apache/beam/sdks/go/pkg/beam/core/metrics" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/metrics" ) // Implementation Note: The wrapping of the embedded methods diff --git a/sdks/go/pkg/beam/metrics_test.go b/sdks/go/pkg/beam/metrics_test.go index 1bfb712eaa78..76350d2cc136 100644 --- a/sdks/go/pkg/beam/metrics_test.go +++ b/sdks/go/pkg/beam/metrics_test.go @@ -20,8 +20,8 @@ import ( "regexp" "time" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/core/metrics" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/metrics" ) // A beam_test global context var to improve how the examples look. diff --git a/sdks/go/pkg/beam/model/PROTOBUF.md b/sdks/go/pkg/beam/model/PROTOBUF.md index 5c9b162d5050..86eefe2a9f75 100644 --- a/sdks/go/pkg/beam/model/PROTOBUF.md +++ b/sdks/go/pkg/beam/model/PROTOBUF.md @@ -19,49 +19,35 @@ # Rebuilding generated protobuf code -If you make changes to .proto files, you will need to rebuild the generated Go code. +If you make changes to .proto files, you will need to rebuild the generated Go +code. You may also need to rebuild the generated code if the required versions +of the [google.golang.org/protobuf](https://github.com/protocolbuffers/protobuf-go) +or [google.golang.org/grpc](https://github.com/grpc/grpc-go) modules defined +in [go.mod](https://github.com/apache/beam/blob/master/sdks/go.mod) change. First, follow this one-time setup: 1. Download [the protobuf compiler](https://github.com/google/protobuf/releases). - The simplest approach is to download one of the prebuilt binaries and extract - it somewhere in your machine's `$PATH`. -1. A properly installed Go development environment per [the official - instructions](https://golang.org/doc/install). `$GOPATH` must be set properly. - If it's not set, follow - [these instructions](https://github.com/golang/go/wiki/SettingGOPATH). + The simplest approach is to download one of the prebuilt binaries (named + `protoc`) and extract it somewhere in your machine's `$PATH`. 1. Add `$GOBIN` to your `$PATH`. (Note: If `$GOBIN` is not set, add `$GOPATH/bin` instead.) To generate the code: 1. Navigate to this directory (`pkg/beam/model`). -1. `go get -u github.com/golang/protobuf/protoc-gen-go` +1. Check [go.mod](https://github.com/apache/beam/blob/master/sdks/go.mod) and + make note of which versions of [google.golang.org/protobuf](https://github.com/protocolbuffers/protobuf-go) + and [google.golang.org/grpc](https://github.com/grpc/grpc-go) are required. +1. Install the compiler executables at the corresponding versions. + 1. `go install google.golang.org/protobuf/cmd/protoc-gen-go@<version>` + 1. `go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@<version>` 1. `go generate` ## Generated Go code fails to build -Occasionally, after following the steps above and updating the generated .pb.go -files, they may fail to build. This usually indicates a version mismatch in the -[golang/protobuf](https://github.com/golang/protobuf) package. 
Specifically, -the version of protoc-gen-go in the local Go workspace (used during -`go generate`) differs from the cached version of golang/protobuf used for -building Beam (specified in [gogradle.lock](https://github.com/apache/beam/blob/master/sdks/go/gogradle.lock)). - -The preferred way to fix this issue is to update the fixed Beam version of -golang/protobuf to a recent commit. This can be done by manually changing the -commit hash for golang/protobuf in [gogradle.lock](https://github.com/apache/beam/blob/master/sdks/go/gogradle.lock). - -If that fails due to dependency issues, an alternate approach is to downgrade -the local version of protoc-gen-go to match the commit in gogradle.lock, with -the following commands. - -```bash -# Replace with the commit of golang/protobuf in gogradle.lock. -go get -d -u github.com/golang/protobuf/protoc-gen-go -git -C "$(go env GOPATH)"/src/github.com/golang/protobuf checkout -go install github.com/golang/protobuf/protoc-gen-go -``` -> **Note:** This leaves the local repository of protoc-gen-go in a detached -> head state, which may cause problems when updating it in the future. To fix -> this, navigate to the protoc-gen-go directory and run `git checkout master`. +If the generated .pb.go code contains build errors, it indicates a version +mismatch somewhere between the packages required in go.mod, the installed Go +executables, and the prebuilt `protoc` binary (which should match the +google.golang.org/protobuf version). Following the steps above with matching +version numbers should fix the error. diff --git a/sdks/go/pkg/beam/model/fnexecution_v1/beam_fn_api.pb.go b/sdks/go/pkg/beam/model/fnexecution_v1/beam_fn_api.pb.go index ef11c79cc3b9..b03f6141f569 100644 --- a/sdks/go/pkg/beam/model/fnexecution_v1/beam_fn_api.pb.go +++ b/sdks/go/pkg/beam/model/fnexecution_v1/beam_fn_api.pb.go @@ -26,8 +26,8 @@ // Code generated by protoc-gen-go. DO NOT EDIT. // versions: -// protoc-gen-go v1.25.0-devel -// protoc v3.13.0 +// protoc-gen-go v1.27.1 +// protoc v3.17.3 // source: beam_fn_api.proto // TODO: Consider consolidating common components in another package @@ -36,16 +36,12 @@ package fnexecution_v1 import ( - context "context" - pipeline_v1 "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" - _ "github.com/golang/protobuf/protoc-gen-go/descriptor" - duration "github.com/golang/protobuf/ptypes/duration" - timestamp "github.com/golang/protobuf/ptypes/timestamp" - grpc "google.golang.org/grpc" - codes "google.golang.org/grpc/codes" - status "google.golang.org/grpc/status" + pipeline_v1 "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" protoreflect "google.golang.org/protobuf/reflect/protoreflect" protoimpl "google.golang.org/protobuf/runtime/protoimpl" + _ "google.golang.org/protobuf/types/descriptorpb" + durationpb "google.golang.org/protobuf/types/known/durationpb" + timestamppb "google.golang.org/protobuf/types/known/timestamppb" reflect "reflect" sync "sync" ) @@ -127,7 +123,7 @@ func (x LogEntry_Severity_Enum) Number() protoreflect.EnumNumber { // Deprecated: Use LogEntry_Severity_Enum.Descriptor instead. func (LogEntry_Severity_Enum) EnumDescriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{29, 1, 0} + return file_beam_fn_api_proto_rawDescGZIP(), []int{31, 1, 0} } // A descriptor for connecting to a remote port using the Beam Fn Data API. 
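A minimal sketch of how the new kafkaio cross-language transforms from this patch might be wired into a Go pipeline. The expansion service address, bootstrap server, and topic names below are placeholders, and a runner with cross-language support plus a running Java expansion service are assumed:

```go
package main

import (
	"context"
	"flag"
	"log"

	"github.com/apache/beam/sdks/v2/go/pkg/beam"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/io/xlang/kafkaio"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx"
)

func main() {
	flag.Parse()
	beam.Init()

	p := beam.NewPipeline()
	s := p.Root()

	// Placeholder endpoints: a running Java expansion service and a Kafka
	// broker are assumed to be reachable at these addresses.
	expansionAddr := "localhost:8097"
	bootstrapServers := "localhost:9092"

	// Read KV<[]byte, []byte> records from an input topic. MaxNumRecords
	// makes the read bounded, which keeps this sketch easy to run end to end.
	msgs := kafkaio.Read(s, expansionAddr, bootstrapServers, []string{"input_topic"},
		kafkaio.MaxNumRecords(100))

	// Write the same records to an output topic.
	kafkaio.Write(s, expansionAddr, bootstrapServers, "output_topic", msgs)

	// Requires a runner that supports cross-language transforms; the Go
	// direct runner will not expand these transforms.
	if err := beamx.Run(context.Background(), p); err != nil {
		log.Fatalf("pipeline failed: %v", err)
	}
}
```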
@@ -261,6 +257,7 @@ type InstructionRequest struct { // *InstructionRequest_ProcessBundleSplit // *InstructionRequest_FinalizeBundle // *InstructionRequest_MonitoringInfos + // *InstructionRequest_HarnessMonitoringInfos // *InstructionRequest_Register Request isInstructionRequest_Request `protobuf_oneof:"request"` } @@ -346,6 +343,13 @@ func (x *InstructionRequest) GetMonitoringInfos() *MonitoringInfosMetadataReques return nil } +func (x *InstructionRequest) GetHarnessMonitoringInfos() *HarnessMonitoringInfosRequest { + if x, ok := x.GetRequest().(*InstructionRequest_HarnessMonitoringInfos); ok { + return x.HarnessMonitoringInfos + } + return nil +} + func (x *InstructionRequest) GetRegister() *RegisterRequest { if x, ok := x.GetRequest().(*InstructionRequest_Register); ok { return x.Register @@ -377,6 +381,10 @@ type InstructionRequest_MonitoringInfos struct { MonitoringInfos *MonitoringInfosMetadataRequest `protobuf:"bytes,1005,opt,name=monitoring_infos,json=monitoringInfos,proto3,oneof"` } +type InstructionRequest_HarnessMonitoringInfos struct { + HarnessMonitoringInfos *HarnessMonitoringInfosRequest `protobuf:"bytes,1006,opt,name=harness_monitoring_infos,json=harnessMonitoringInfos,proto3,oneof"` +} + type InstructionRequest_Register struct { // DEPRECATED Register *RegisterRequest `protobuf:"bytes,1000,opt,name=register,proto3,oneof"` @@ -392,6 +400,8 @@ func (*InstructionRequest_FinalizeBundle) isInstructionRequest_Request() {} func (*InstructionRequest_MonitoringInfos) isInstructionRequest_Request() {} +func (*InstructionRequest_HarnessMonitoringInfos) isInstructionRequest_Request() {} + func (*InstructionRequest_Register) isInstructionRequest_Request() {} // The response for an associated request the SDK had been asked to fulfill. @@ -418,6 +428,7 @@ type InstructionResponse struct { // *InstructionResponse_ProcessBundleSplit // *InstructionResponse_FinalizeBundle // *InstructionResponse_MonitoringInfos + // *InstructionResponse_HarnessMonitoringInfos // *InstructionResponse_Register Response isInstructionResponse_Response `protobuf_oneof:"response"` } @@ -510,6 +521,13 @@ func (x *InstructionResponse) GetMonitoringInfos() *MonitoringInfosMetadataRespo return nil } +func (x *InstructionResponse) GetHarnessMonitoringInfos() *HarnessMonitoringInfosResponse { + if x, ok := x.GetResponse().(*InstructionResponse_HarnessMonitoringInfos); ok { + return x.HarnessMonitoringInfos + } + return nil +} + func (x *InstructionResponse) GetRegister() *RegisterResponse { if x, ok := x.GetResponse().(*InstructionResponse_Register); ok { return x.Register @@ -541,6 +559,10 @@ type InstructionResponse_MonitoringInfos struct { MonitoringInfos *MonitoringInfosMetadataResponse `protobuf:"bytes,1005,opt,name=monitoring_infos,json=monitoringInfos,proto3,oneof"` } +type InstructionResponse_HarnessMonitoringInfos struct { + HarnessMonitoringInfos *HarnessMonitoringInfosResponse `protobuf:"bytes,1006,opt,name=harness_monitoring_infos,json=harnessMonitoringInfos,proto3,oneof"` +} + type InstructionResponse_Register struct { // DEPRECATED Register *RegisterResponse `protobuf:"bytes,1000,opt,name=register,proto3,oneof"` @@ -556,8 +578,119 @@ func (*InstructionResponse_FinalizeBundle) isInstructionResponse_Response() {} func (*InstructionResponse_MonitoringInfos) isInstructionResponse_Response() {} +func (*InstructionResponse_HarnessMonitoringInfos) isInstructionResponse_Response() {} + func (*InstructionResponse_Register) isInstructionResponse_Response() {} +// A request to provide full MonitoringInfo associated 
with the entire SDK +// harness process, not specific to a bundle. +// +// An SDK can report metrics using an identifier that only contains the +// associated payload. A runner who wants to receive the full metrics +// information can request all the monitoring metadata via a +// MonitoringInfosMetadataRequest providing a list of ids as necessary. +// +// The SDK is allowed to reuse the identifiers +// for the lifetime of the associated control connection as long +// as the MonitoringInfo could be reconstructed fully by overwriting its +// payload field with the bytes specified here. +type HarnessMonitoringInfosRequest struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields +} + +func (x *HarnessMonitoringInfosRequest) Reset() { + *x = HarnessMonitoringInfosRequest{} + if protoimpl.UnsafeEnabled { + mi := &file_beam_fn_api_proto_msgTypes[4] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *HarnessMonitoringInfosRequest) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*HarnessMonitoringInfosRequest) ProtoMessage() {} + +func (x *HarnessMonitoringInfosRequest) ProtoReflect() protoreflect.Message { + mi := &file_beam_fn_api_proto_msgTypes[4] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use HarnessMonitoringInfosRequest.ProtoReflect.Descriptor instead. +func (*HarnessMonitoringInfosRequest) Descriptor() ([]byte, []int) { + return file_beam_fn_api_proto_rawDescGZIP(), []int{4} +} + +type HarnessMonitoringInfosResponse struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + // An identifier to MonitoringInfo.payload mapping containing + // Metrics associated with the SDK harness, not a specific bundle. + // + // An SDK can report metrics using an identifier that only contains the + // associated payload. A runner who wants to receive the full metrics + // information can request all the monitoring metadata via a + // MonitoringInfosMetadataRequest providing a list of ids as necessary. + // + // The SDK is allowed to reuse the identifiers + // for the lifetime of the associated control connection as long + // as the MonitoringInfo could be reconstructed fully by overwriting its + // payload field with the bytes specified here. 
+ MonitoringData map[string][]byte `protobuf:"bytes,1,rep,name=monitoring_data,json=monitoringData,proto3" json:"monitoring_data,omitempty" protobuf_key:"bytes,1,opt,name=key,proto3" protobuf_val:"bytes,2,opt,name=value,proto3"` +} + +func (x *HarnessMonitoringInfosResponse) Reset() { + *x = HarnessMonitoringInfosResponse{} + if protoimpl.UnsafeEnabled { + mi := &file_beam_fn_api_proto_msgTypes[5] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *HarnessMonitoringInfosResponse) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*HarnessMonitoringInfosResponse) ProtoMessage() {} + +func (x *HarnessMonitoringInfosResponse) ProtoReflect() protoreflect.Message { + mi := &file_beam_fn_api_proto_msgTypes[5] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use HarnessMonitoringInfosResponse.ProtoReflect.Descriptor instead. +func (*HarnessMonitoringInfosResponse) Descriptor() ([]byte, []int) { + return file_beam_fn_api_proto_rawDescGZIP(), []int{5} +} + +func (x *HarnessMonitoringInfosResponse) GetMonitoringData() map[string][]byte { + if x != nil { + return x.MonitoringData + } + return nil +} + // A list of objects which can be referred to by the runner in // future requests. // Stable @@ -573,7 +706,7 @@ type RegisterRequest struct { func (x *RegisterRequest) Reset() { *x = RegisterRequest{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[4] + mi := &file_beam_fn_api_proto_msgTypes[6] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -586,7 +719,7 @@ func (x *RegisterRequest) String() string { func (*RegisterRequest) ProtoMessage() {} func (x *RegisterRequest) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[4] + mi := &file_beam_fn_api_proto_msgTypes[6] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -599,7 +732,7 @@ func (x *RegisterRequest) ProtoReflect() protoreflect.Message { // Deprecated: Use RegisterRequest.ProtoReflect.Descriptor instead. func (*RegisterRequest) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{4} + return file_beam_fn_api_proto_rawDescGZIP(), []int{6} } func (x *RegisterRequest) GetProcessBundleDescriptor() []*ProcessBundleDescriptor { @@ -619,7 +752,7 @@ type RegisterResponse struct { func (x *RegisterResponse) Reset() { *x = RegisterResponse{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[5] + mi := &file_beam_fn_api_proto_msgTypes[7] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -632,7 +765,7 @@ func (x *RegisterResponse) String() string { func (*RegisterResponse) ProtoMessage() {} func (x *RegisterResponse) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[5] + mi := &file_beam_fn_api_proto_msgTypes[7] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -645,7 +778,7 @@ func (x *RegisterResponse) ProtoReflect() protoreflect.Message { // Deprecated: Use RegisterResponse.ProtoReflect.Descriptor instead. 
func (*RegisterResponse) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{5} + return file_beam_fn_api_proto_rawDescGZIP(), []int{7} } // Definitions that should be used to construct the bundle processing graph. @@ -679,7 +812,7 @@ type ProcessBundleDescriptor struct { func (x *ProcessBundleDescriptor) Reset() { *x = ProcessBundleDescriptor{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[6] + mi := &file_beam_fn_api_proto_msgTypes[8] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -692,7 +825,7 @@ func (x *ProcessBundleDescriptor) String() string { func (*ProcessBundleDescriptor) ProtoMessage() {} func (x *ProcessBundleDescriptor) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[6] + mi := &file_beam_fn_api_proto_msgTypes[8] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -705,7 +838,7 @@ func (x *ProcessBundleDescriptor) ProtoReflect() protoreflect.Message { // Deprecated: Use ProcessBundleDescriptor.ProtoReflect.Descriptor instead. func (*ProcessBundleDescriptor) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{6} + return file_beam_fn_api_proto_rawDescGZIP(), []int{8} } func (x *ProcessBundleDescriptor) GetId() string { @@ -786,7 +919,7 @@ type BundleApplication struct { // // If there is no watermark reported from RestrictionTracker, the runner will // use MIN_TIMESTAMP by default. - OutputWatermarks map[string]*timestamp.Timestamp `protobuf:"bytes,4,rep,name=output_watermarks,json=outputWatermarks,proto3" json:"output_watermarks,omitempty" protobuf_key:"bytes,1,opt,name=key,proto3" protobuf_val:"bytes,2,opt,name=value,proto3"` + OutputWatermarks map[string]*timestamppb.Timestamp `protobuf:"bytes,4,rep,name=output_watermarks,json=outputWatermarks,proto3" json:"output_watermarks,omitempty" protobuf_key:"bytes,1,opt,name=key,proto3" protobuf_val:"bytes,2,opt,name=value,proto3"` // Whether this application potentially produces an unbounded // amount of data. Note that this should only be set to BOUNDED if and // only if the application is known to produce a finite amount of output. @@ -796,7 +929,7 @@ type BundleApplication struct { func (x *BundleApplication) Reset() { *x = BundleApplication{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[7] + mi := &file_beam_fn_api_proto_msgTypes[9] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -809,7 +942,7 @@ func (x *BundleApplication) String() string { func (*BundleApplication) ProtoMessage() {} func (x *BundleApplication) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[7] + mi := &file_beam_fn_api_proto_msgTypes[9] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -822,7 +955,7 @@ func (x *BundleApplication) ProtoReflect() protoreflect.Message { // Deprecated: Use BundleApplication.ProtoReflect.Descriptor instead. 
func (*BundleApplication) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{7} + return file_beam_fn_api_proto_rawDescGZIP(), []int{9} } func (x *BundleApplication) GetTransformId() string { @@ -846,7 +979,7 @@ func (x *BundleApplication) GetElement() []byte { return nil } -func (x *BundleApplication) GetOutputWatermarks() map[string]*timestamp.Timestamp { +func (x *BundleApplication) GetOutputWatermarks() map[string]*timestamppb.Timestamp { if x != nil { return x.OutputWatermarks } @@ -857,7 +990,7 @@ func (x *BundleApplication) GetIsBounded() pipeline_v1.IsBounded_Enum { if x != nil { return x.IsBounded } - return pipeline_v1.IsBounded_UNSPECIFIED + return pipeline_v1.IsBounded_Enum(0) } // An Application should be scheduled for execution after a delay. @@ -873,13 +1006,13 @@ type DelayedBundleApplication struct { // Recommended time delay at which the application should be scheduled to // execute by the runner. Time delay that equals 0 may be scheduled to execute // immediately. The unit of time delay should be microsecond. - RequestedTimeDelay *duration.Duration `protobuf:"bytes,2,opt,name=requested_time_delay,json=requestedTimeDelay,proto3" json:"requested_time_delay,omitempty"` + RequestedTimeDelay *durationpb.Duration `protobuf:"bytes,2,opt,name=requested_time_delay,json=requestedTimeDelay,proto3" json:"requested_time_delay,omitempty"` } func (x *DelayedBundleApplication) Reset() { *x = DelayedBundleApplication{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[8] + mi := &file_beam_fn_api_proto_msgTypes[10] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -892,7 +1025,7 @@ func (x *DelayedBundleApplication) String() string { func (*DelayedBundleApplication) ProtoMessage() {} func (x *DelayedBundleApplication) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[8] + mi := &file_beam_fn_api_proto_msgTypes[10] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -905,7 +1038,7 @@ func (x *DelayedBundleApplication) ProtoReflect() protoreflect.Message { // Deprecated: Use DelayedBundleApplication.ProtoReflect.Descriptor instead. 
func (*DelayedBundleApplication) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{8} + return file_beam_fn_api_proto_rawDescGZIP(), []int{10} } func (x *DelayedBundleApplication) GetApplication() *BundleApplication { @@ -915,7 +1048,7 @@ func (x *DelayedBundleApplication) GetApplication() *BundleApplication { return nil } -func (x *DelayedBundleApplication) GetRequestedTimeDelay() *duration.Duration { +func (x *DelayedBundleApplication) GetRequestedTimeDelay() *durationpb.Duration { if x != nil { return x.RequestedTimeDelay } @@ -940,7 +1073,7 @@ type ProcessBundleRequest struct { func (x *ProcessBundleRequest) Reset() { *x = ProcessBundleRequest{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[9] + mi := &file_beam_fn_api_proto_msgTypes[11] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -953,7 +1086,7 @@ func (x *ProcessBundleRequest) String() string { func (*ProcessBundleRequest) ProtoMessage() {} func (x *ProcessBundleRequest) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[9] + mi := &file_beam_fn_api_proto_msgTypes[11] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -966,7 +1099,7 @@ func (x *ProcessBundleRequest) ProtoReflect() protoreflect.Message { // Deprecated: Use ProcessBundleRequest.ProtoReflect.Descriptor instead. func (*ProcessBundleRequest) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{9} + return file_beam_fn_api_proto_rawDescGZIP(), []int{11} } func (x *ProcessBundleRequest) GetProcessBundleDescriptorId() string { @@ -1017,7 +1150,7 @@ type ProcessBundleResponse struct { func (x *ProcessBundleResponse) Reset() { *x = ProcessBundleResponse{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[10] + mi := &file_beam_fn_api_proto_msgTypes[12] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -1030,7 +1163,7 @@ func (x *ProcessBundleResponse) String() string { func (*ProcessBundleResponse) ProtoMessage() {} func (x *ProcessBundleResponse) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[10] + mi := &file_beam_fn_api_proto_msgTypes[12] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -1043,7 +1176,7 @@ func (x *ProcessBundleResponse) ProtoReflect() protoreflect.Message { // Deprecated: Use ProcessBundleResponse.ProtoReflect.Descriptor instead. 
func (*ProcessBundleResponse) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{10} + return file_beam_fn_api_proto_rawDescGZIP(), []int{12} } func (x *ProcessBundleResponse) GetResidualRoots() []*DelayedBundleApplication { @@ -1090,7 +1223,7 @@ type ProcessBundleProgressRequest struct { func (x *ProcessBundleProgressRequest) Reset() { *x = ProcessBundleProgressRequest{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[11] + mi := &file_beam_fn_api_proto_msgTypes[13] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -1103,7 +1236,7 @@ func (x *ProcessBundleProgressRequest) String() string { func (*ProcessBundleProgressRequest) ProtoMessage() {} func (x *ProcessBundleProgressRequest) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[11] + mi := &file_beam_fn_api_proto_msgTypes[13] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -1116,7 +1249,7 @@ func (x *ProcessBundleProgressRequest) ProtoReflect() protoreflect.Message { // Deprecated: Use ProcessBundleProgressRequest.ProtoReflect.Descriptor instead. func (*ProcessBundleProgressRequest) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{11} + return file_beam_fn_api_proto_rawDescGZIP(), []int{13} } func (x *ProcessBundleProgressRequest) GetInstructionId() string { @@ -1126,29 +1259,30 @@ func (x *ProcessBundleProgressRequest) GetInstructionId() string { return "" } -// A request to provide full MonitoringInfo for a given id. +// A request to provide full MonitoringInfo for a set of provided ids. // // An SDK can report metrics using an identifier that only contains the // associated payload. A runner who wants to receive the full metrics // information can request all the monitoring metadata via a // MonitoringInfosMetadataRequest providing a list of ids as necessary. // -// The MonitoringInfo ids are scoped to the associated control connection. For -// example, an SDK may reuse the ids across multiple bundles. +// The SDK is allowed to reuse the identifiers for the lifetime of the +// associated control connection as long as the MonitoringInfo could be +// reconstructed fully by overwriting its payload field with the bytes specified +// here. type MonitoringInfosMetadataRequest struct { state protoimpl.MessageState sizeCache protoimpl.SizeCache unknownFields protoimpl.UnknownFields - // A list of ids for which MonitoringInfo are requested. All but the payload - // field will be populated. + // A list of ids for which the full MonitoringInfo is requested for. 
MonitoringInfoId []string `protobuf:"bytes,1,rep,name=monitoring_info_id,json=monitoringInfoId,proto3" json:"monitoring_info_id,omitempty"` } func (x *MonitoringInfosMetadataRequest) Reset() { *x = MonitoringInfosMetadataRequest{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[12] + mi := &file_beam_fn_api_proto_msgTypes[14] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -1161,7 +1295,7 @@ func (x *MonitoringInfosMetadataRequest) String() string { func (*MonitoringInfosMetadataRequest) ProtoMessage() {} func (x *MonitoringInfosMetadataRequest) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[12] + mi := &file_beam_fn_api_proto_msgTypes[14] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -1174,7 +1308,7 @@ func (x *MonitoringInfosMetadataRequest) ProtoReflect() protoreflect.Message { // Deprecated: Use MonitoringInfosMetadataRequest.ProtoReflect.Descriptor instead. func (*MonitoringInfosMetadataRequest) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{12} + return file_beam_fn_api_proto_rawDescGZIP(), []int{14} } func (x *MonitoringInfosMetadataRequest) GetMonitoringInfoId() []string { @@ -1199,7 +1333,8 @@ type ProcessBundleProgressResponse struct { // information can request all the monitoring metadata via a // MonitoringInfosMetadataRequest providing a list of ids as necessary. // - // The SDK is allowed to reuse the identifiers across multiple bundles as long + // The SDK is allowed to reuse the identifiers + // for the lifetime of the associated control connection as long // as the MonitoringInfo could be reconstructed fully by overwriting its // payload field with the bytes specified here. MonitoringData map[string][]byte `protobuf:"bytes,5,rep,name=monitoring_data,json=monitoringData,proto3" json:"monitoring_data,omitempty" protobuf_key:"bytes,1,opt,name=key,proto3" protobuf_val:"bytes,2,opt,name=value,proto3"` @@ -1208,7 +1343,7 @@ type ProcessBundleProgressResponse struct { func (x *ProcessBundleProgressResponse) Reset() { *x = ProcessBundleProgressResponse{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[13] + mi := &file_beam_fn_api_proto_msgTypes[15] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -1221,7 +1356,7 @@ func (x *ProcessBundleProgressResponse) String() string { func (*ProcessBundleProgressResponse) ProtoMessage() {} func (x *ProcessBundleProgressResponse) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[13] + mi := &file_beam_fn_api_proto_msgTypes[15] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -1234,7 +1369,7 @@ func (x *ProcessBundleProgressResponse) ProtoReflect() protoreflect.Message { // Deprecated: Use ProcessBundleProgressResponse.ProtoReflect.Descriptor instead. func (*ProcessBundleProgressResponse) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{13} + return file_beam_fn_api_proto_rawDescGZIP(), []int{15} } func (x *ProcessBundleProgressResponse) GetMonitoringInfos() []*pipeline_v1.MonitoringInfo { @@ -1259,22 +1394,23 @@ func (x *ProcessBundleProgressResponse) GetMonitoringData() map[string][]byte { // information can request all the monitoring metadata via a // MonitoringInfosMetadataRequest providing a list of ids as necessary. 
// -// The MonitoringInfo ids are scoped to the associated control connection. For -// example an SDK may reuse the ids across multiple bundles. +// The SDK is allowed to reuse the identifiers +// for the lifetime of the associated control connection as long +// as the MonitoringInfo could be reconstructed fully by overwriting its +// payload field with the bytes specified here. type MonitoringInfosMetadataResponse struct { state protoimpl.MessageState sizeCache protoimpl.SizeCache unknownFields protoimpl.UnknownFields - // A mapping from a requested identifier to a MonitoringInfo. All fields - // except for the payload of the MonitoringInfo will be specified. + // A mapping from an identifier to the full metrics information. MonitoringInfo map[string]*pipeline_v1.MonitoringInfo `protobuf:"bytes,1,rep,name=monitoring_info,json=monitoringInfo,proto3" json:"monitoring_info,omitempty" protobuf_key:"bytes,1,opt,name=key,proto3" protobuf_val:"bytes,2,opt,name=value,proto3"` } func (x *MonitoringInfosMetadataResponse) Reset() { *x = MonitoringInfosMetadataResponse{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[14] + mi := &file_beam_fn_api_proto_msgTypes[16] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -1287,7 +1423,7 @@ func (x *MonitoringInfosMetadataResponse) String() string { func (*MonitoringInfosMetadataResponse) ProtoMessage() {} func (x *MonitoringInfosMetadataResponse) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[14] + mi := &file_beam_fn_api_proto_msgTypes[16] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -1300,7 +1436,7 @@ func (x *MonitoringInfosMetadataResponse) ProtoReflect() protoreflect.Message { // Deprecated: Use MonitoringInfosMetadataResponse.ProtoReflect.Descriptor instead. func (*MonitoringInfosMetadataResponse) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{14} + return file_beam_fn_api_proto_rawDescGZIP(), []int{16} } func (x *MonitoringInfosMetadataResponse) GetMonitoringInfo() map[string]*pipeline_v1.MonitoringInfo { @@ -1330,7 +1466,7 @@ type ProcessBundleSplitRequest struct { func (x *ProcessBundleSplitRequest) Reset() { *x = ProcessBundleSplitRequest{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[15] + mi := &file_beam_fn_api_proto_msgTypes[17] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -1343,7 +1479,7 @@ func (x *ProcessBundleSplitRequest) String() string { func (*ProcessBundleSplitRequest) ProtoMessage() {} func (x *ProcessBundleSplitRequest) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[15] + mi := &file_beam_fn_api_proto_msgTypes[17] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -1356,7 +1492,7 @@ func (x *ProcessBundleSplitRequest) ProtoReflect() protoreflect.Message { // Deprecated: Use ProcessBundleSplitRequest.ProtoReflect.Descriptor instead. 
func (*ProcessBundleSplitRequest) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{15} + return file_beam_fn_api_proto_rawDescGZIP(), []int{17} } func (x *ProcessBundleSplitRequest) GetInstructionId() string { @@ -1402,7 +1538,7 @@ type ProcessBundleSplitResponse struct { func (x *ProcessBundleSplitResponse) Reset() { *x = ProcessBundleSplitResponse{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[16] + mi := &file_beam_fn_api_proto_msgTypes[18] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -1415,7 +1551,7 @@ func (x *ProcessBundleSplitResponse) String() string { func (*ProcessBundleSplitResponse) ProtoMessage() {} func (x *ProcessBundleSplitResponse) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[16] + mi := &file_beam_fn_api_proto_msgTypes[18] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -1428,7 +1564,7 @@ func (x *ProcessBundleSplitResponse) ProtoReflect() protoreflect.Message { // Deprecated: Use ProcessBundleSplitResponse.ProtoReflect.Descriptor instead. func (*ProcessBundleSplitResponse) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{16} + return file_beam_fn_api_proto_rawDescGZIP(), []int{18} } func (x *ProcessBundleSplitResponse) GetPrimaryRoots() []*BundleApplication { @@ -1465,7 +1601,7 @@ type FinalizeBundleRequest struct { func (x *FinalizeBundleRequest) Reset() { *x = FinalizeBundleRequest{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[17] + mi := &file_beam_fn_api_proto_msgTypes[19] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -1478,7 +1614,7 @@ func (x *FinalizeBundleRequest) String() string { func (*FinalizeBundleRequest) ProtoMessage() {} func (x *FinalizeBundleRequest) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[17] + mi := &file_beam_fn_api_proto_msgTypes[19] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -1491,7 +1627,7 @@ func (x *FinalizeBundleRequest) ProtoReflect() protoreflect.Message { // Deprecated: Use FinalizeBundleRequest.ProtoReflect.Descriptor instead. func (*FinalizeBundleRequest) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{17} + return file_beam_fn_api_proto_rawDescGZIP(), []int{19} } func (x *FinalizeBundleRequest) GetInstructionId() string { @@ -1510,7 +1646,7 @@ type FinalizeBundleResponse struct { func (x *FinalizeBundleResponse) Reset() { *x = FinalizeBundleResponse{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[18] + mi := &file_beam_fn_api_proto_msgTypes[20] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -1523,7 +1659,7 @@ func (x *FinalizeBundleResponse) String() string { func (*FinalizeBundleResponse) ProtoMessage() {} func (x *FinalizeBundleResponse) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[18] + mi := &file_beam_fn_api_proto_msgTypes[20] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -1536,7 +1672,7 @@ func (x *FinalizeBundleResponse) ProtoReflect() protoreflect.Message { // Deprecated: Use FinalizeBundleResponse.ProtoReflect.Descriptor instead. 
func (*FinalizeBundleResponse) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{18} + return file_beam_fn_api_proto_rawDescGZIP(), []int{20} } // Messages used to represent logical byte streams. @@ -1555,7 +1691,7 @@ type Elements struct { func (x *Elements) Reset() { *x = Elements{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[19] + mi := &file_beam_fn_api_proto_msgTypes[21] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -1568,7 +1704,7 @@ func (x *Elements) String() string { func (*Elements) ProtoMessage() {} func (x *Elements) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[19] + mi := &file_beam_fn_api_proto_msgTypes[21] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -1581,7 +1717,7 @@ func (x *Elements) ProtoReflect() protoreflect.Message { // Deprecated: Use Elements.ProtoReflect.Descriptor instead. func (*Elements) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{19} + return file_beam_fn_api_proto_rawDescGZIP(), []int{21} } func (x *Elements) GetData() []*Elements_Data { @@ -1625,7 +1761,7 @@ type StateRequest struct { func (x *StateRequest) Reset() { *x = StateRequest{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[20] + mi := &file_beam_fn_api_proto_msgTypes[22] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -1638,7 +1774,7 @@ func (x *StateRequest) String() string { func (*StateRequest) ProtoMessage() {} func (x *StateRequest) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[20] + mi := &file_beam_fn_api_proto_msgTypes[22] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -1651,7 +1787,7 @@ func (x *StateRequest) ProtoReflect() protoreflect.Message { // Deprecated: Use StateRequest.ProtoReflect.Descriptor instead. func (*StateRequest) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{20} + return file_beam_fn_api_proto_rawDescGZIP(), []int{22} } func (x *StateRequest) GetId() string { @@ -1753,7 +1889,7 @@ type StateResponse struct { func (x *StateResponse) Reset() { *x = StateResponse{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[21] + mi := &file_beam_fn_api_proto_msgTypes[23] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -1766,7 +1902,7 @@ func (x *StateResponse) String() string { func (*StateResponse) ProtoMessage() {} func (x *StateResponse) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[21] + mi := &file_beam_fn_api_proto_msgTypes[23] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -1779,7 +1915,7 @@ func (x *StateResponse) ProtoReflect() protoreflect.Message { // Deprecated: Use StateResponse.ProtoReflect.Descriptor instead. 
func (*StateResponse) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{21} + return file_beam_fn_api_proto_rawDescGZIP(), []int{23} } func (x *StateResponse) GetId() string { @@ -1868,7 +2004,7 @@ type StateKey struct { func (x *StateKey) Reset() { *x = StateKey{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[22] + mi := &file_beam_fn_api_proto_msgTypes[24] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -1881,7 +2017,7 @@ func (x *StateKey) String() string { func (*StateKey) ProtoMessage() {} func (x *StateKey) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[22] + mi := &file_beam_fn_api_proto_msgTypes[24] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -1894,7 +2030,7 @@ func (x *StateKey) ProtoReflect() protoreflect.Message { // Deprecated: Use StateKey.ProtoReflect.Descriptor instead. func (*StateKey) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{22} + return file_beam_fn_api_proto_rawDescGZIP(), []int{24} } func (m *StateKey) GetType() isStateKey_Type { @@ -1990,7 +2126,7 @@ type StateGetRequest struct { func (x *StateGetRequest) Reset() { *x = StateGetRequest{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[23] + mi := &file_beam_fn_api_proto_msgTypes[25] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -2003,7 +2139,7 @@ func (x *StateGetRequest) String() string { func (*StateGetRequest) ProtoMessage() {} func (x *StateGetRequest) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[23] + mi := &file_beam_fn_api_proto_msgTypes[25] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -2016,7 +2152,7 @@ func (x *StateGetRequest) ProtoReflect() protoreflect.Message { // Deprecated: Use StateGetRequest.ProtoReflect.Descriptor instead. func (*StateGetRequest) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{23} + return file_beam_fn_api_proto_rawDescGZIP(), []int{25} } func (x *StateGetRequest) GetContinuationToken() []byte { @@ -2046,7 +2182,7 @@ type StateGetResponse struct { func (x *StateGetResponse) Reset() { *x = StateGetResponse{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[24] + mi := &file_beam_fn_api_proto_msgTypes[26] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -2059,7 +2195,7 @@ func (x *StateGetResponse) String() string { func (*StateGetResponse) ProtoMessage() {} func (x *StateGetResponse) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[24] + mi := &file_beam_fn_api_proto_msgTypes[26] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -2072,7 +2208,7 @@ func (x *StateGetResponse) ProtoReflect() protoreflect.Message { // Deprecated: Use StateGetResponse.ProtoReflect.Descriptor instead. 
func (*StateGetResponse) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{24} + return file_beam_fn_api_proto_rawDescGZIP(), []int{26} } func (x *StateGetResponse) GetContinuationToken() []byte { @@ -2104,7 +2240,7 @@ type StateAppendRequest struct { func (x *StateAppendRequest) Reset() { *x = StateAppendRequest{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[25] + mi := &file_beam_fn_api_proto_msgTypes[27] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -2117,7 +2253,7 @@ func (x *StateAppendRequest) String() string { func (*StateAppendRequest) ProtoMessage() {} func (x *StateAppendRequest) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[25] + mi := &file_beam_fn_api_proto_msgTypes[27] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -2130,7 +2266,7 @@ func (x *StateAppendRequest) ProtoReflect() protoreflect.Message { // Deprecated: Use StateAppendRequest.ProtoReflect.Descriptor instead. func (*StateAppendRequest) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{25} + return file_beam_fn_api_proto_rawDescGZIP(), []int{27} } func (x *StateAppendRequest) GetData() []byte { @@ -2150,7 +2286,7 @@ type StateAppendResponse struct { func (x *StateAppendResponse) Reset() { *x = StateAppendResponse{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[26] + mi := &file_beam_fn_api_proto_msgTypes[28] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -2163,7 +2299,7 @@ func (x *StateAppendResponse) String() string { func (*StateAppendResponse) ProtoMessage() {} func (x *StateAppendResponse) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[26] + mi := &file_beam_fn_api_proto_msgTypes[28] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -2176,7 +2312,7 @@ func (x *StateAppendResponse) ProtoReflect() protoreflect.Message { // Deprecated: Use StateAppendResponse.ProtoReflect.Descriptor instead. func (*StateAppendResponse) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{26} + return file_beam_fn_api_proto_rawDescGZIP(), []int{28} } // A request to clear state. @@ -2189,7 +2325,7 @@ type StateClearRequest struct { func (x *StateClearRequest) Reset() { *x = StateClearRequest{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[27] + mi := &file_beam_fn_api_proto_msgTypes[29] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -2202,7 +2338,7 @@ func (x *StateClearRequest) String() string { func (*StateClearRequest) ProtoMessage() {} func (x *StateClearRequest) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[27] + mi := &file_beam_fn_api_proto_msgTypes[29] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -2215,7 +2351,7 @@ func (x *StateClearRequest) ProtoReflect() protoreflect.Message { // Deprecated: Use StateClearRequest.ProtoReflect.Descriptor instead. func (*StateClearRequest) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{27} + return file_beam_fn_api_proto_rawDescGZIP(), []int{29} } // A response to clear state. 
@@ -2228,7 +2364,7 @@ type StateClearResponse struct { func (x *StateClearResponse) Reset() { *x = StateClearResponse{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[28] + mi := &file_beam_fn_api_proto_msgTypes[30] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -2241,7 +2377,7 @@ func (x *StateClearResponse) String() string { func (*StateClearResponse) ProtoMessage() {} func (x *StateClearResponse) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[28] + mi := &file_beam_fn_api_proto_msgTypes[30] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -2254,7 +2390,7 @@ func (x *StateClearResponse) ProtoReflect() protoreflect.Message { // Deprecated: Use StateClearResponse.ProtoReflect.Descriptor instead. func (*StateClearResponse) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{28} + return file_beam_fn_api_proto_rawDescGZIP(), []int{30} } // A log entry @@ -2266,7 +2402,7 @@ type LogEntry struct { // (Required) The severity of the log statement. Severity LogEntry_Severity_Enum `protobuf:"varint,1,opt,name=severity,proto3,enum=org.apache.beam.model.fn_execution.v1.LogEntry_Severity_Enum" json:"severity,omitempty"` // (Required) The time at which this log statement occurred. - Timestamp *timestamp.Timestamp `protobuf:"bytes,2,opt,name=timestamp,proto3" json:"timestamp,omitempty"` + Timestamp *timestamppb.Timestamp `protobuf:"bytes,2,opt,name=timestamp,proto3" json:"timestamp,omitempty"` // (Required) A human readable message. Message string `protobuf:"bytes,3,opt,name=message,proto3" json:"message,omitempty"` // (Optional) An optional trace of the functions involved. For example, in @@ -2293,7 +2429,7 @@ type LogEntry struct { func (x *LogEntry) Reset() { *x = LogEntry{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[29] + mi := &file_beam_fn_api_proto_msgTypes[31] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -2306,7 +2442,7 @@ func (x *LogEntry) String() string { func (*LogEntry) ProtoMessage() {} func (x *LogEntry) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[29] + mi := &file_beam_fn_api_proto_msgTypes[31] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -2319,7 +2455,7 @@ func (x *LogEntry) ProtoReflect() protoreflect.Message { // Deprecated: Use LogEntry.ProtoReflect.Descriptor instead. 
func (*LogEntry) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{29} + return file_beam_fn_api_proto_rawDescGZIP(), []int{31} } func (x *LogEntry) GetSeverity() LogEntry_Severity_Enum { @@ -2329,7 +2465,7 @@ func (x *LogEntry) GetSeverity() LogEntry_Severity_Enum { return LogEntry_Severity_UNSPECIFIED } -func (x *LogEntry) GetTimestamp() *timestamp.Timestamp { +func (x *LogEntry) GetTimestamp() *timestamppb.Timestamp { if x != nil { return x.Timestamp } @@ -2387,7 +2523,7 @@ type LogControl struct { func (x *LogControl) Reset() { *x = LogControl{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[30] + mi := &file_beam_fn_api_proto_msgTypes[32] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -2400,7 +2536,7 @@ func (x *LogControl) String() string { func (*LogControl) ProtoMessage() {} func (x *LogControl) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[30] + mi := &file_beam_fn_api_proto_msgTypes[32] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -2413,7 +2549,7 @@ func (x *LogControl) ProtoReflect() protoreflect.Message { // Deprecated: Use LogControl.ProtoReflect.Descriptor instead. func (*LogControl) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{30} + return file_beam_fn_api_proto_rawDescGZIP(), []int{32} } type StartWorkerRequest struct { @@ -2432,7 +2568,7 @@ type StartWorkerRequest struct { func (x *StartWorkerRequest) Reset() { *x = StartWorkerRequest{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[31] + mi := &file_beam_fn_api_proto_msgTypes[33] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -2445,7 +2581,7 @@ func (x *StartWorkerRequest) String() string { func (*StartWorkerRequest) ProtoMessage() {} func (x *StartWorkerRequest) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[31] + mi := &file_beam_fn_api_proto_msgTypes[33] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -2458,7 +2594,7 @@ func (x *StartWorkerRequest) ProtoReflect() protoreflect.Message { // Deprecated: Use StartWorkerRequest.ProtoReflect.Descriptor instead. func (*StartWorkerRequest) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{31} + return file_beam_fn_api_proto_rawDescGZIP(), []int{33} } func (x *StartWorkerRequest) GetWorkerId() string { @@ -2514,7 +2650,7 @@ type StartWorkerResponse struct { func (x *StartWorkerResponse) Reset() { *x = StartWorkerResponse{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[32] + mi := &file_beam_fn_api_proto_msgTypes[34] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -2527,7 +2663,7 @@ func (x *StartWorkerResponse) String() string { func (*StartWorkerResponse) ProtoMessage() {} func (x *StartWorkerResponse) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[32] + mi := &file_beam_fn_api_proto_msgTypes[34] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -2540,7 +2676,7 @@ func (x *StartWorkerResponse) ProtoReflect() protoreflect.Message { // Deprecated: Use StartWorkerResponse.ProtoReflect.Descriptor instead. 
func (*StartWorkerResponse) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{32} + return file_beam_fn_api_proto_rawDescGZIP(), []int{34} } func (x *StartWorkerResponse) GetError() string { @@ -2561,7 +2697,7 @@ type StopWorkerRequest struct { func (x *StopWorkerRequest) Reset() { *x = StopWorkerRequest{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[33] + mi := &file_beam_fn_api_proto_msgTypes[35] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -2574,7 +2710,7 @@ func (x *StopWorkerRequest) String() string { func (*StopWorkerRequest) ProtoMessage() {} func (x *StopWorkerRequest) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[33] + mi := &file_beam_fn_api_proto_msgTypes[35] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -2587,7 +2723,7 @@ func (x *StopWorkerRequest) ProtoReflect() protoreflect.Message { // Deprecated: Use StopWorkerRequest.ProtoReflect.Descriptor instead. func (*StopWorkerRequest) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{33} + return file_beam_fn_api_proto_rawDescGZIP(), []int{35} } func (x *StopWorkerRequest) GetWorkerId() string { @@ -2608,7 +2744,7 @@ type StopWorkerResponse struct { func (x *StopWorkerResponse) Reset() { *x = StopWorkerResponse{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[34] + mi := &file_beam_fn_api_proto_msgTypes[36] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -2621,7 +2757,7 @@ func (x *StopWorkerResponse) String() string { func (*StopWorkerResponse) ProtoMessage() {} func (x *StopWorkerResponse) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[34] + mi := &file_beam_fn_api_proto_msgTypes[36] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -2634,7 +2770,7 @@ func (x *StopWorkerResponse) ProtoReflect() protoreflect.Message { // Deprecated: Use StopWorkerResponse.ProtoReflect.Descriptor instead. func (*StopWorkerResponse) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{34} + return file_beam_fn_api_proto_rawDescGZIP(), []int{36} } func (x *StopWorkerResponse) GetError() string { @@ -2658,7 +2794,7 @@ type WorkerStatusRequest struct { func (x *WorkerStatusRequest) Reset() { *x = WorkerStatusRequest{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[35] + mi := &file_beam_fn_api_proto_msgTypes[37] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -2671,7 +2807,7 @@ func (x *WorkerStatusRequest) String() string { func (*WorkerStatusRequest) ProtoMessage() {} func (x *WorkerStatusRequest) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[35] + mi := &file_beam_fn_api_proto_msgTypes[37] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -2684,7 +2820,7 @@ func (x *WorkerStatusRequest) ProtoReflect() protoreflect.Message { // Deprecated: Use WorkerStatusRequest.ProtoReflect.Descriptor instead. 
func (*WorkerStatusRequest) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{35} + return file_beam_fn_api_proto_rawDescGZIP(), []int{37} } func (x *WorkerStatusRequest) GetId() string { @@ -2715,7 +2851,7 @@ type WorkerStatusResponse struct { func (x *WorkerStatusResponse) Reset() { *x = WorkerStatusResponse{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[36] + mi := &file_beam_fn_api_proto_msgTypes[38] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -2728,7 +2864,7 @@ func (x *WorkerStatusResponse) String() string { func (*WorkerStatusResponse) ProtoMessage() {} func (x *WorkerStatusResponse) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[36] + mi := &file_beam_fn_api_proto_msgTypes[38] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -2741,7 +2877,7 @@ func (x *WorkerStatusResponse) ProtoReflect() protoreflect.Message { // Deprecated: Use WorkerStatusResponse.ProtoReflect.Descriptor instead. func (*WorkerStatusResponse) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{36} + return file_beam_fn_api_proto_rawDescGZIP(), []int{38} } func (x *WorkerStatusResponse) GetId() string { @@ -2785,7 +2921,7 @@ type ProcessBundleRequest_CacheToken struct { func (x *ProcessBundleRequest_CacheToken) Reset() { *x = ProcessBundleRequest_CacheToken{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[43] + mi := &file_beam_fn_api_proto_msgTypes[46] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -2798,7 +2934,7 @@ func (x *ProcessBundleRequest_CacheToken) String() string { func (*ProcessBundleRequest_CacheToken) ProtoMessage() {} func (x *ProcessBundleRequest_CacheToken) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[43] + mi := &file_beam_fn_api_proto_msgTypes[46] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -2811,7 +2947,7 @@ func (x *ProcessBundleRequest_CacheToken) ProtoReflect() protoreflect.Message { // Deprecated: Use ProcessBundleRequest_CacheToken.ProtoReflect.Descriptor instead. 
func (*ProcessBundleRequest_CacheToken) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{9, 0} + return file_beam_fn_api_proto_rawDescGZIP(), []int{11, 0} } func (m *ProcessBundleRequest_CacheToken) GetType() isProcessBundleRequest_CacheToken_Type { @@ -2868,7 +3004,7 @@ type ProcessBundleRequest_CacheToken_UserState struct { func (x *ProcessBundleRequest_CacheToken_UserState) Reset() { *x = ProcessBundleRequest_CacheToken_UserState{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[44] + mi := &file_beam_fn_api_proto_msgTypes[47] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -2881,7 +3017,7 @@ func (x *ProcessBundleRequest_CacheToken_UserState) String() string { func (*ProcessBundleRequest_CacheToken_UserState) ProtoMessage() {} func (x *ProcessBundleRequest_CacheToken_UserState) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[44] + mi := &file_beam_fn_api_proto_msgTypes[47] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -2894,7 +3030,7 @@ func (x *ProcessBundleRequest_CacheToken_UserState) ProtoReflect() protoreflect. // Deprecated: Use ProcessBundleRequest_CacheToken_UserState.ProtoReflect.Descriptor instead. func (*ProcessBundleRequest_CacheToken_UserState) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{9, 0, 0} + return file_beam_fn_api_proto_rawDescGZIP(), []int{11, 0, 0} } // A flag to indicate a cache token is valid for a side input. @@ -2912,7 +3048,7 @@ type ProcessBundleRequest_CacheToken_SideInput struct { func (x *ProcessBundleRequest_CacheToken_SideInput) Reset() { *x = ProcessBundleRequest_CacheToken_SideInput{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[45] + mi := &file_beam_fn_api_proto_msgTypes[48] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -2925,7 +3061,7 @@ func (x *ProcessBundleRequest_CacheToken_SideInput) String() string { func (*ProcessBundleRequest_CacheToken_SideInput) ProtoMessage() {} func (x *ProcessBundleRequest_CacheToken_SideInput) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[45] + mi := &file_beam_fn_api_proto_msgTypes[48] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -2938,7 +3074,7 @@ func (x *ProcessBundleRequest_CacheToken_SideInput) ProtoReflect() protoreflect. // Deprecated: Use ProcessBundleRequest_CacheToken_SideInput.ProtoReflect.Descriptor instead. 
func (*ProcessBundleRequest_CacheToken_SideInput) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{9, 0, 1} + return file_beam_fn_api_proto_rawDescGZIP(), []int{11, 0, 1} } func (x *ProcessBundleRequest_CacheToken_SideInput) GetTransformId() string { @@ -2979,7 +3115,7 @@ type ProcessBundleSplitRequest_DesiredSplit struct { func (x *ProcessBundleSplitRequest_DesiredSplit) Reset() { *x = ProcessBundleSplitRequest_DesiredSplit{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[49] + mi := &file_beam_fn_api_proto_msgTypes[52] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -2992,7 +3128,7 @@ func (x *ProcessBundleSplitRequest_DesiredSplit) String() string { func (*ProcessBundleSplitRequest_DesiredSplit) ProtoMessage() {} func (x *ProcessBundleSplitRequest_DesiredSplit) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[49] + mi := &file_beam_fn_api_proto_msgTypes[52] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -3005,7 +3141,7 @@ func (x *ProcessBundleSplitRequest_DesiredSplit) ProtoReflect() protoreflect.Mes // Deprecated: Use ProcessBundleSplitRequest_DesiredSplit.ProtoReflect.Descriptor instead. func (*ProcessBundleSplitRequest_DesiredSplit) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{15, 0} + return file_beam_fn_api_proto_rawDescGZIP(), []int{17, 0} } func (x *ProcessBundleSplitRequest_DesiredSplit) GetFractionOfRemainder() float64 { @@ -3057,7 +3193,7 @@ type ProcessBundleSplitResponse_ChannelSplit struct { func (x *ProcessBundleSplitResponse_ChannelSplit) Reset() { *x = ProcessBundleSplitResponse_ChannelSplit{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[51] + mi := &file_beam_fn_api_proto_msgTypes[54] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -3070,7 +3206,7 @@ func (x *ProcessBundleSplitResponse_ChannelSplit) String() string { func (*ProcessBundleSplitResponse_ChannelSplit) ProtoMessage() {} func (x *ProcessBundleSplitResponse_ChannelSplit) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[51] + mi := &file_beam_fn_api_proto_msgTypes[54] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -3083,7 +3219,7 @@ func (x *ProcessBundleSplitResponse_ChannelSplit) ProtoReflect() protoreflect.Me // Deprecated: Use ProcessBundleSplitResponse_ChannelSplit.ProtoReflect.Descriptor instead. 
func (*ProcessBundleSplitResponse_ChannelSplit) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{16, 0} + return file_beam_fn_api_proto_rawDescGZIP(), []int{18, 0} } func (x *ProcessBundleSplitResponse_ChannelSplit) GetTransformId() string { @@ -3138,7 +3274,7 @@ type Elements_Data struct { func (x *Elements_Data) Reset() { *x = Elements_Data{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[52] + mi := &file_beam_fn_api_proto_msgTypes[55] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -3151,7 +3287,7 @@ func (x *Elements_Data) String() string { func (*Elements_Data) ProtoMessage() {} func (x *Elements_Data) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[52] + mi := &file_beam_fn_api_proto_msgTypes[55] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -3164,7 +3300,7 @@ func (x *Elements_Data) ProtoReflect() protoreflect.Message { // Deprecated: Use Elements_Data.ProtoReflect.Descriptor instead. func (*Elements_Data) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{19, 0} + return file_beam_fn_api_proto_rawDescGZIP(), []int{21, 0} } func (x *Elements_Data) GetInstructionId() string { @@ -3224,7 +3360,7 @@ type Elements_Timers struct { func (x *Elements_Timers) Reset() { *x = Elements_Timers{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[53] + mi := &file_beam_fn_api_proto_msgTypes[56] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -3237,7 +3373,7 @@ func (x *Elements_Timers) String() string { func (*Elements_Timers) ProtoMessage() {} func (x *Elements_Timers) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[53] + mi := &file_beam_fn_api_proto_msgTypes[56] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -3250,7 +3386,7 @@ func (x *Elements_Timers) ProtoReflect() protoreflect.Message { // Deprecated: Use Elements_Timers.ProtoReflect.Descriptor instead. func (*Elements_Timers) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{19, 1} + return file_beam_fn_api_proto_rawDescGZIP(), []int{21, 1} } func (x *Elements_Timers) GetInstructionId() string { @@ -3306,7 +3442,7 @@ type StateKey_Runner struct { func (x *StateKey_Runner) Reset() { *x = StateKey_Runner{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[54] + mi := &file_beam_fn_api_proto_msgTypes[57] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -3319,7 +3455,7 @@ func (x *StateKey_Runner) String() string { func (*StateKey_Runner) ProtoMessage() {} func (x *StateKey_Runner) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[54] + mi := &file_beam_fn_api_proto_msgTypes[57] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -3332,7 +3468,7 @@ func (x *StateKey_Runner) ProtoReflect() protoreflect.Message { // Deprecated: Use StateKey_Runner.ProtoReflect.Descriptor instead. 
func (*StateKey_Runner) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{22, 0} + return file_beam_fn_api_proto_rawDescGZIP(), []int{24, 0} } func (x *StateKey_Runner) GetKey() []byte { @@ -3370,7 +3506,7 @@ type StateKey_IterableSideInput struct { func (x *StateKey_IterableSideInput) Reset() { *x = StateKey_IterableSideInput{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[55] + mi := &file_beam_fn_api_proto_msgTypes[58] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -3383,7 +3519,7 @@ func (x *StateKey_IterableSideInput) String() string { func (*StateKey_IterableSideInput) ProtoMessage() {} func (x *StateKey_IterableSideInput) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[55] + mi := &file_beam_fn_api_proto_msgTypes[58] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -3396,7 +3532,7 @@ func (x *StateKey_IterableSideInput) ProtoReflect() protoreflect.Message { // Deprecated: Use StateKey_IterableSideInput.ProtoReflect.Descriptor instead. func (*StateKey_IterableSideInput) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{22, 1} + return file_beam_fn_api_proto_rawDescGZIP(), []int{24, 1} } func (x *StateKey_IterableSideInput) GetTransformId() string { @@ -3451,7 +3587,7 @@ type StateKey_MultimapSideInput struct { func (x *StateKey_MultimapSideInput) Reset() { *x = StateKey_MultimapSideInput{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[56] + mi := &file_beam_fn_api_proto_msgTypes[59] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -3464,7 +3600,7 @@ func (x *StateKey_MultimapSideInput) String() string { func (*StateKey_MultimapSideInput) ProtoMessage() {} func (x *StateKey_MultimapSideInput) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[56] + mi := &file_beam_fn_api_proto_msgTypes[59] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -3477,7 +3613,7 @@ func (x *StateKey_MultimapSideInput) ProtoReflect() protoreflect.Message { // Deprecated: Use StateKey_MultimapSideInput.ProtoReflect.Descriptor instead. 
func (*StateKey_MultimapSideInput) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{22, 2} + return file_beam_fn_api_proto_rawDescGZIP(), []int{24, 2} } func (x *StateKey_MultimapSideInput) GetTransformId() string { @@ -3536,7 +3672,7 @@ type StateKey_MultimapKeysSideInput struct { func (x *StateKey_MultimapKeysSideInput) Reset() { *x = StateKey_MultimapKeysSideInput{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[57] + mi := &file_beam_fn_api_proto_msgTypes[60] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -3549,7 +3685,7 @@ func (x *StateKey_MultimapKeysSideInput) String() string { func (*StateKey_MultimapKeysSideInput) ProtoMessage() {} func (x *StateKey_MultimapKeysSideInput) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[57] + mi := &file_beam_fn_api_proto_msgTypes[60] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -3562,7 +3698,7 @@ func (x *StateKey_MultimapKeysSideInput) ProtoReflect() protoreflect.Message { // Deprecated: Use StateKey_MultimapKeysSideInput.ProtoReflect.Descriptor instead. func (*StateKey_MultimapKeysSideInput) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{22, 3} + return file_beam_fn_api_proto_rawDescGZIP(), []int{24, 3} } func (x *StateKey_MultimapKeysSideInput) GetTransformId() string { @@ -3605,7 +3741,7 @@ type StateKey_BagUserState struct { func (x *StateKey_BagUserState) Reset() { *x = StateKey_BagUserState{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[58] + mi := &file_beam_fn_api_proto_msgTypes[61] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -3618,7 +3754,7 @@ func (x *StateKey_BagUserState) String() string { func (*StateKey_BagUserState) ProtoMessage() {} func (x *StateKey_BagUserState) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[58] + mi := &file_beam_fn_api_proto_msgTypes[61] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -3631,7 +3767,7 @@ func (x *StateKey_BagUserState) ProtoReflect() protoreflect.Message { // Deprecated: Use StateKey_BagUserState.ProtoReflect.Descriptor instead. func (*StateKey_BagUserState) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{22, 4} + return file_beam_fn_api_proto_rawDescGZIP(), []int{24, 4} } func (x *StateKey_BagUserState) GetTransformId() string { @@ -3676,7 +3812,7 @@ type LogEntry_List struct { func (x *LogEntry_List) Reset() { *x = LogEntry_List{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[59] + mi := &file_beam_fn_api_proto_msgTypes[62] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -3689,7 +3825,7 @@ func (x *LogEntry_List) String() string { func (*LogEntry_List) ProtoMessage() {} func (x *LogEntry_List) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[59] + mi := &file_beam_fn_api_proto_msgTypes[62] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -3702,7 +3838,7 @@ func (x *LogEntry_List) ProtoReflect() protoreflect.Message { // Deprecated: Use LogEntry_List.ProtoReflect.Descriptor instead. 
func (*LogEntry_List) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{29, 0} + return file_beam_fn_api_proto_rawDescGZIP(), []int{31, 0} } func (x *LogEntry_List) GetLogEntries() []*LogEntry { @@ -3734,7 +3870,7 @@ type LogEntry_Severity struct { func (x *LogEntry_Severity) Reset() { *x = LogEntry_Severity{} if protoimpl.UnsafeEnabled { - mi := &file_beam_fn_api_proto_msgTypes[60] + mi := &file_beam_fn_api_proto_msgTypes[63] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -3747,7 +3883,7 @@ func (x *LogEntry_Severity) String() string { func (*LogEntry_Severity) ProtoMessage() {} func (x *LogEntry_Severity) ProtoReflect() protoreflect.Message { - mi := &file_beam_fn_api_proto_msgTypes[60] + mi := &file_beam_fn_api_proto_msgTypes[63] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -3760,7 +3896,7 @@ func (x *LogEntry_Severity) ProtoReflect() protoreflect.Message { // Deprecated: Use LogEntry_Severity.ProtoReflect.Descriptor instead. func (*LogEntry_Severity) Descriptor() ([]byte, []int) { - return file_beam_fn_api_proto_rawDescGZIP(), []int{29, 1} + return file_beam_fn_api_proto_rawDescGZIP(), []int{31, 1} } var File_beam_fn_api_proto protoreflect.FileDescriptor @@ -3795,7 +3931,7 @@ var file_beam_fn_api_proto_rawDesc = []byte{ 0x73, 0x5f, 0x62, 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x5f, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x6f, 0x72, 0x5f, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x19, 0x70, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x42, 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x44, 0x65, 0x73, 0x63, 0x72, - 0x69, 0x70, 0x74, 0x6f, 0x72, 0x49, 0x64, 0x22, 0xda, 0x05, 0x0a, 0x12, 0x49, 0x6e, 0x73, 0x74, + 0x69, 0x70, 0x74, 0x6f, 0x72, 0x49, 0x64, 0x22, 0xde, 0x06, 0x0a, 0x12, 0x49, 0x6e, 0x73, 0x74, 0x72, 0x75, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x12, 0x25, 0x0a, 0x0e, 0x69, 0x6e, 0x73, 0x74, 0x72, 0x75, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0d, 0x69, 0x6e, 0x73, 0x74, 0x72, 0x75, 0x63, 0x74, @@ -3835,716 +3971,749 @@ var file_beam_fn_api_proto_rawDesc = []byte{ 0x2e, 0x4d, 0x6f, 0x6e, 0x69, 0x74, 0x6f, 0x72, 0x69, 0x6e, 0x67, 0x49, 0x6e, 0x66, 0x6f, 0x73, 0x4d, 0x65, 0x74, 0x61, 0x64, 0x61, 0x74, 0x61, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x48, 0x00, 0x52, 0x0f, 0x6d, 0x6f, 0x6e, 0x69, 0x74, 0x6f, 0x72, 0x69, 0x6e, 0x67, 0x49, 0x6e, 0x66, - 0x6f, 0x73, 0x12, 0x55, 0x0a, 0x08, 0x72, 0x65, 0x67, 0x69, 0x73, 0x74, 0x65, 0x72, 0x18, 0xe8, - 0x07, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x36, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, - 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, - 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x52, 0x65, - 0x67, 0x69, 0x73, 0x74, 0x65, 0x72, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x48, 0x00, 0x52, - 0x08, 0x72, 0x65, 0x67, 0x69, 0x73, 0x74, 0x65, 0x72, 0x42, 0x09, 0x0a, 0x07, 0x72, 0x65, 0x71, - 0x75, 0x65, 0x73, 0x74, 0x22, 0xf8, 0x05, 0x0a, 0x13, 0x49, 0x6e, 0x73, 0x74, 0x72, 0x75, 0x63, - 0x74, 0x69, 0x6f, 0x6e, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x12, 0x25, 0x0a, 0x0e, - 0x69, 0x6e, 0x73, 0x74, 0x72, 0x75, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x69, 0x64, 0x18, 0x01, - 0x20, 0x01, 0x28, 0x09, 0x52, 0x0d, 0x69, 0x6e, 0x73, 0x74, 0x72, 0x75, 0x63, 0x74, 0x69, 0x6f, - 0x6e, 0x49, 0x64, 0x12, 0x14, 0x0a, 
0x05, 0x65, 0x72, 0x72, 0x6f, 0x72, 0x18, 0x02, 0x20, 0x01, - 0x28, 0x09, 0x52, 0x05, 0x65, 0x72, 0x72, 0x6f, 0x72, 0x12, 0x66, 0x0a, 0x0e, 0x70, 0x72, 0x6f, - 0x63, 0x65, 0x73, 0x73, 0x5f, 0x62, 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x18, 0xe9, 0x07, 0x20, 0x01, - 0x28, 0x0b, 0x32, 0x3c, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, - 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, - 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x50, 0x72, 0x6f, 0x63, 0x65, - 0x73, 0x73, 0x42, 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, - 0x48, 0x00, 0x52, 0x0d, 0x70, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x42, 0x75, 0x6e, 0x64, 0x6c, - 0x65, 0x12, 0x7f, 0x0a, 0x17, 0x70, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x5f, 0x62, 0x75, 0x6e, - 0x64, 0x6c, 0x65, 0x5f, 0x70, 0x72, 0x6f, 0x67, 0x72, 0x65, 0x73, 0x73, 0x18, 0xea, 0x07, 0x20, - 0x01, 0x28, 0x0b, 0x32, 0x44, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, - 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, - 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x50, 0x72, 0x6f, 0x63, - 0x65, 0x73, 0x73, 0x42, 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x50, 0x72, 0x6f, 0x67, 0x72, 0x65, 0x73, - 0x73, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x48, 0x00, 0x52, 0x15, 0x70, 0x72, 0x6f, - 0x63, 0x65, 0x73, 0x73, 0x42, 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x50, 0x72, 0x6f, 0x67, 0x72, 0x65, - 0x73, 0x73, 0x12, 0x76, 0x0a, 0x14, 0x70, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x5f, 0x62, 0x75, - 0x6e, 0x64, 0x6c, 0x65, 0x5f, 0x73, 0x70, 0x6c, 0x69, 0x74, 0x18, 0xeb, 0x07, 0x20, 0x01, 0x28, - 0x0b, 0x32, 0x41, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, - 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, - 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x50, 0x72, 0x6f, 0x63, 0x65, 0x73, - 0x73, 0x42, 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x53, 0x70, 0x6c, 0x69, 0x74, 0x52, 0x65, 0x73, 0x70, - 0x6f, 0x6e, 0x73, 0x65, 0x48, 0x00, 0x52, 0x12, 0x70, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x42, - 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x53, 0x70, 0x6c, 0x69, 0x74, 0x12, 0x69, 0x0a, 0x0f, 0x66, 0x69, - 0x6e, 0x61, 0x6c, 0x69, 0x7a, 0x65, 0x5f, 0x62, 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x18, 0xec, 0x07, - 0x20, 0x01, 0x28, 0x0b, 0x32, 0x3d, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, - 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, - 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x69, 0x6e, - 0x61, 0x6c, 0x69, 0x7a, 0x65, 0x42, 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x52, 0x65, 0x73, 0x70, 0x6f, - 0x6e, 0x73, 0x65, 0x48, 0x00, 0x52, 0x0e, 0x66, 0x69, 0x6e, 0x61, 0x6c, 0x69, 0x7a, 0x65, 0x42, - 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x12, 0x74, 0x0a, 0x10, 0x6d, 0x6f, 0x6e, 0x69, 0x74, 0x6f, 0x72, - 0x69, 0x6e, 0x67, 0x5f, 0x69, 0x6e, 0x66, 0x6f, 0x73, 0x18, 0xed, 0x07, 0x20, 0x01, 0x28, 0x0b, - 0x32, 0x46, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, - 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, - 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x4d, 0x6f, 0x6e, 0x69, 0x74, 0x6f, 0x72, - 0x69, 0x6e, 0x67, 0x49, 0x6e, 0x66, 0x6f, 0x73, 0x4d, 0x65, 0x74, 0x61, 0x64, 0x61, 0x74, 0x61, - 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x48, 0x00, 
0x52, 0x0f, 0x6d, 0x6f, 0x6e, 0x69, - 0x74, 0x6f, 0x72, 0x69, 0x6e, 0x67, 0x49, 0x6e, 0x66, 0x6f, 0x73, 0x12, 0x56, 0x0a, 0x08, 0x72, - 0x65, 0x67, 0x69, 0x73, 0x74, 0x65, 0x72, 0x18, 0xe8, 0x07, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x37, - 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, - 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, - 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x52, 0x65, 0x67, 0x69, 0x73, 0x74, 0x65, 0x72, 0x52, - 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x48, 0x00, 0x52, 0x08, 0x72, 0x65, 0x67, 0x69, 0x73, - 0x74, 0x65, 0x72, 0x42, 0x0a, 0x0a, 0x08, 0x72, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x22, - 0x8d, 0x01, 0x0a, 0x0f, 0x52, 0x65, 0x67, 0x69, 0x73, 0x74, 0x65, 0x72, 0x52, 0x65, 0x71, 0x75, - 0x65, 0x73, 0x74, 0x12, 0x7a, 0x0a, 0x19, 0x70, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x5f, 0x62, - 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x5f, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x6f, 0x72, - 0x18, 0x01, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x3e, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, + 0x6f, 0x73, 0x12, 0x81, 0x01, 0x0a, 0x18, 0x68, 0x61, 0x72, 0x6e, 0x65, 0x73, 0x73, 0x5f, 0x6d, + 0x6f, 0x6e, 0x69, 0x74, 0x6f, 0x72, 0x69, 0x6e, 0x67, 0x5f, 0x69, 0x6e, 0x66, 0x6f, 0x73, 0x18, + 0xee, 0x07, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x44, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, + 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, + 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x48, + 0x61, 0x72, 0x6e, 0x65, 0x73, 0x73, 0x4d, 0x6f, 0x6e, 0x69, 0x74, 0x6f, 0x72, 0x69, 0x6e, 0x67, + 0x49, 0x6e, 0x66, 0x6f, 0x73, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x48, 0x00, 0x52, 0x16, + 0x68, 0x61, 0x72, 0x6e, 0x65, 0x73, 0x73, 0x4d, 0x6f, 0x6e, 0x69, 0x74, 0x6f, 0x72, 0x69, 0x6e, + 0x67, 0x49, 0x6e, 0x66, 0x6f, 0x73, 0x12, 0x55, 0x0a, 0x08, 0x72, 0x65, 0x67, 0x69, 0x73, 0x74, + 0x65, 0x72, 0x18, 0xe8, 0x07, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x36, 0x2e, 0x6f, 0x72, 0x67, 0x2e, + 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, + 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, + 0x31, 0x2e, 0x52, 0x65, 0x67, 0x69, 0x73, 0x74, 0x65, 0x72, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, + 0x74, 0x48, 0x00, 0x52, 0x08, 0x72, 0x65, 0x67, 0x69, 0x73, 0x74, 0x65, 0x72, 0x42, 0x09, 0x0a, + 0x07, 0x72, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x22, 0xfd, 0x06, 0x0a, 0x13, 0x49, 0x6e, 0x73, + 0x74, 0x72, 0x75, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, + 0x12, 0x25, 0x0a, 0x0e, 0x69, 0x6e, 0x73, 0x74, 0x72, 0x75, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x5f, + 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0d, 0x69, 0x6e, 0x73, 0x74, 0x72, 0x75, + 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x49, 0x64, 0x12, 0x14, 0x0a, 0x05, 0x65, 0x72, 0x72, 0x6f, 0x72, + 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x05, 0x65, 0x72, 0x72, 0x6f, 0x72, 0x12, 0x66, 0x0a, + 0x0e, 0x70, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x5f, 0x62, 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x18, + 0xe9, 0x07, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x3c, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x50, - 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x42, 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x44, 0x65, 0x73, 
0x63, - 0x72, 0x69, 0x70, 0x74, 0x6f, 0x72, 0x52, 0x17, 0x70, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x42, - 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x44, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x6f, 0x72, 0x22, - 0x12, 0x0a, 0x10, 0x52, 0x65, 0x67, 0x69, 0x73, 0x74, 0x65, 0x72, 0x52, 0x65, 0x73, 0x70, 0x6f, - 0x6e, 0x73, 0x65, 0x22, 0x9d, 0x0b, 0x0a, 0x17, 0x50, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x42, - 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x44, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x6f, 0x72, 0x12, - 0x0e, 0x0a, 0x02, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x02, 0x69, 0x64, 0x12, - 0x6e, 0x0a, 0x0a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x73, 0x18, 0x02, 0x20, - 0x03, 0x28, 0x0b, 0x32, 0x4e, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, - 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, - 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x50, 0x72, 0x6f, 0x63, - 0x65, 0x73, 0x73, 0x42, 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x44, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, - 0x74, 0x6f, 0x72, 0x2e, 0x54, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x73, 0x45, 0x6e, - 0x74, 0x72, 0x79, 0x52, 0x0a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x73, 0x12, - 0x74, 0x0a, 0x0c, 0x70, 0x63, 0x6f, 0x6c, 0x6c, 0x65, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x73, 0x18, - 0x03, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x50, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, + 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x42, 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x52, 0x65, 0x73, 0x70, + 0x6f, 0x6e, 0x73, 0x65, 0x48, 0x00, 0x52, 0x0d, 0x70, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x42, + 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x12, 0x7f, 0x0a, 0x17, 0x70, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, + 0x5f, 0x62, 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x5f, 0x70, 0x72, 0x6f, 0x67, 0x72, 0x65, 0x73, 0x73, + 0x18, 0xea, 0x07, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x44, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, + 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, + 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, + 0x50, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x42, 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x50, 0x72, 0x6f, + 0x67, 0x72, 0x65, 0x73, 0x73, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x48, 0x00, 0x52, + 0x15, 0x70, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x42, 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x50, 0x72, + 0x6f, 0x67, 0x72, 0x65, 0x73, 0x73, 0x12, 0x76, 0x0a, 0x14, 0x70, 0x72, 0x6f, 0x63, 0x65, 0x73, + 0x73, 0x5f, 0x62, 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x5f, 0x73, 0x70, 0x6c, 0x69, 0x74, 0x18, 0xeb, + 0x07, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x41, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x50, 0x72, - 0x6f, 0x63, 0x65, 0x73, 0x73, 0x42, 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x44, 0x65, 0x73, 0x63, 0x72, - 0x69, 0x70, 0x74, 0x6f, 0x72, 0x2e, 0x50, 0x63, 0x6f, 0x6c, 0x6c, 0x65, 0x63, 0x74, 0x69, 0x6f, - 0x6e, 0x73, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x52, 0x0c, 0x70, 0x63, 0x6f, 0x6c, 0x6c, 0x65, 0x63, - 0x74, 0x69, 0x6f, 0x6e, 0x73, 0x12, 0x8a, 0x01, 0x0a, 0x14, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, - 0x69, 0x6e, 0x67, 0x5f, 0x73, 0x74, 0x72, 0x61, 0x74, 0x65, 0x67, 0x69, 0x65, 0x73, 0x18, 0x04, - 0x20, 0x03, 0x28, 0x0b, 0x32, 0x57, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, + 0x6f, 0x63, 0x65, 
[Generated-code hunk condensed: this portion of the diff rewrites the raw protobuf descriptor bytes (`rawDesc`) in the regenerated Go bindings for `org.apache.beam.model.fn_execution.v1` (the Beam Fn API). The re-encoded descriptor spells out, among others, the response oneof fields (`process_bundle_split`, `finalize_bundle`, `monitoring_infos`, `harness_monitoring_infos`, `register`), plus the messages HarnessMonitoringInfosRequest/Response, RegisterRequest/Response, ProcessBundleDescriptor (transforms, pcollections, windowing_strategies, coders, environments, state/timer ApiServiceDescriptors), BundleApplication, DelayedBundleApplication, ProcessBundleRequest (with CacheToken, UserState, SideInput), ProcessBundleResponse, ProcessBundleProgressRequest/Response, MonitoringInfosMetadataRequest/Response, ProcessBundleSplitRequest/Response (DesiredSplit, ChannelSplit), FinalizeBundleRequest/Response, Elements (Data, Timers), StateRequest, and StateResponse. The hex byte dump itself is machine-generated and is not reproduced here.]
0x6f, - 0x6e, 0x73, 0x65, 0x48, 0x00, 0x52, 0x05, 0x63, 0x6c, 0x65, 0x61, 0x72, 0x42, 0x0a, 0x0a, 0x08, - 0x72, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x22, 0xc7, 0x08, 0x0a, 0x08, 0x53, 0x74, 0x61, - 0x74, 0x65, 0x4b, 0x65, 0x79, 0x12, 0x50, 0x0a, 0x06, 0x72, 0x75, 0x6e, 0x6e, 0x65, 0x72, 0x18, - 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x36, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, - 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, - 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x53, 0x74, - 0x61, 0x74, 0x65, 0x4b, 0x65, 0x79, 0x2e, 0x52, 0x75, 0x6e, 0x6e, 0x65, 0x72, 0x48, 0x00, 0x52, - 0x06, 0x72, 0x75, 0x6e, 0x6e, 0x65, 0x72, 0x12, 0x73, 0x0a, 0x13, 0x6d, 0x75, 0x6c, 0x74, 0x69, - 0x6d, 0x61, 0x70, 0x5f, 0x73, 0x69, 0x64, 0x65, 0x5f, 0x69, 0x6e, 0x70, 0x75, 0x74, 0x18, 0x02, + 0x43, 0x6c, 0x65, 0x61, 0x72, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x48, 0x00, 0x52, 0x05, + 0x63, 0x6c, 0x65, 0x61, 0x72, 0x42, 0x09, 0x0a, 0x07, 0x72, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, + 0x22, 0xba, 0x02, 0x0a, 0x0d, 0x53, 0x74, 0x61, 0x74, 0x65, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, + 0x73, 0x65, 0x12, 0x0e, 0x0a, 0x02, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x02, + 0x69, 0x64, 0x12, 0x14, 0x0a, 0x05, 0x65, 0x72, 0x72, 0x6f, 0x72, 0x18, 0x02, 0x20, 0x01, 0x28, + 0x09, 0x52, 0x05, 0x65, 0x72, 0x72, 0x6f, 0x72, 0x12, 0x4c, 0x0a, 0x03, 0x67, 0x65, 0x74, 0x18, + 0xe8, 0x07, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x37, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, + 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, + 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x53, + 0x74, 0x61, 0x74, 0x65, 0x47, 0x65, 0x74, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x48, + 0x00, 0x52, 0x03, 0x67, 0x65, 0x74, 0x12, 0x55, 0x0a, 0x06, 0x61, 0x70, 0x70, 0x65, 0x6e, 0x64, + 0x18, 0xe9, 0x07, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x3a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, + 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, + 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, + 0x53, 0x74, 0x61, 0x74, 0x65, 0x41, 0x70, 0x70, 0x65, 0x6e, 0x64, 0x52, 0x65, 0x73, 0x70, 0x6f, + 0x6e, 0x73, 0x65, 0x48, 0x00, 0x52, 0x06, 0x61, 0x70, 0x70, 0x65, 0x6e, 0x64, 0x12, 0x52, 0x0a, + 0x05, 0x63, 0x6c, 0x65, 0x61, 0x72, 0x18, 0xea, 0x07, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x39, 0x2e, + 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, + 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, + 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x53, 0x74, 0x61, 0x74, 0x65, 0x43, 0x6c, 0x65, 0x61, 0x72, + 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x48, 0x00, 0x52, 0x05, 0x63, 0x6c, 0x65, 0x61, + 0x72, 0x42, 0x0a, 0x0a, 0x08, 0x72, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x22, 0xc7, 0x08, + 0x0a, 0x08, 0x53, 0x74, 0x61, 0x74, 0x65, 0x4b, 0x65, 0x79, 0x12, 0x50, 0x0a, 0x06, 0x72, 0x75, + 0x6e, 0x6e, 0x65, 0x72, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x36, 0x2e, 0x6f, 0x72, 0x67, + 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, + 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, + 0x76, 0x31, 0x2e, 0x53, 0x74, 0x61, 0x74, 0x65, 0x4b, 0x65, 0x79, 0x2e, 0x52, 0x75, 0x6e, 0x6e, + 0x65, 0x72, 0x48, 
0x00, 0x52, 0x06, 0x72, 0x75, 0x6e, 0x6e, 0x65, 0x72, 0x12, 0x73, 0x0a, 0x13, + 0x6d, 0x75, 0x6c, 0x74, 0x69, 0x6d, 0x61, 0x70, 0x5f, 0x73, 0x69, 0x64, 0x65, 0x5f, 0x69, 0x6e, + 0x70, 0x75, 0x74, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x41, 0x2e, 0x6f, 0x72, 0x67, 0x2e, + 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, + 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, + 0x31, 0x2e, 0x53, 0x74, 0x61, 0x74, 0x65, 0x4b, 0x65, 0x79, 0x2e, 0x4d, 0x75, 0x6c, 0x74, 0x69, + 0x6d, 0x61, 0x70, 0x53, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x48, 0x00, 0x52, 0x11, + 0x6d, 0x75, 0x6c, 0x74, 0x69, 0x6d, 0x61, 0x70, 0x53, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, + 0x74, 0x12, 0x64, 0x0a, 0x0e, 0x62, 0x61, 0x67, 0x5f, 0x75, 0x73, 0x65, 0x72, 0x5f, 0x73, 0x74, + 0x61, 0x74, 0x65, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x3c, 0x2e, 0x6f, 0x72, 0x67, 0x2e, + 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, + 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, + 0x31, 0x2e, 0x53, 0x74, 0x61, 0x74, 0x65, 0x4b, 0x65, 0x79, 0x2e, 0x42, 0x61, 0x67, 0x55, 0x73, + 0x65, 0x72, 0x53, 0x74, 0x61, 0x74, 0x65, 0x48, 0x00, 0x52, 0x0c, 0x62, 0x61, 0x67, 0x55, 0x73, + 0x65, 0x72, 0x53, 0x74, 0x61, 0x74, 0x65, 0x12, 0x73, 0x0a, 0x13, 0x69, 0x74, 0x65, 0x72, 0x61, + 0x62, 0x6c, 0x65, 0x5f, 0x73, 0x69, 0x64, 0x65, 0x5f, 0x69, 0x6e, 0x70, 0x75, 0x74, 0x18, 0x04, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x41, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x53, 0x74, 0x61, - 0x74, 0x65, 0x4b, 0x65, 0x79, 0x2e, 0x4d, 0x75, 0x6c, 0x74, 0x69, 0x6d, 0x61, 0x70, 0x53, 0x69, - 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x48, 0x00, 0x52, 0x11, 0x6d, 0x75, 0x6c, 0x74, 0x69, - 0x6d, 0x61, 0x70, 0x53, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x12, 0x64, 0x0a, 0x0e, - 0x62, 0x61, 0x67, 0x5f, 0x75, 0x73, 0x65, 0x72, 0x5f, 0x73, 0x74, 0x61, 0x74, 0x65, 0x18, 0x03, - 0x20, 0x01, 0x28, 0x0b, 0x32, 0x3c, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, - 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, - 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x53, 0x74, 0x61, - 0x74, 0x65, 0x4b, 0x65, 0x79, 0x2e, 0x42, 0x61, 0x67, 0x55, 0x73, 0x65, 0x72, 0x53, 0x74, 0x61, - 0x74, 0x65, 0x48, 0x00, 0x52, 0x0c, 0x62, 0x61, 0x67, 0x55, 0x73, 0x65, 0x72, 0x53, 0x74, 0x61, - 0x74, 0x65, 0x12, 0x73, 0x0a, 0x13, 0x69, 0x74, 0x65, 0x72, 0x61, 0x62, 0x6c, 0x65, 0x5f, 0x73, - 0x69, 0x64, 0x65, 0x5f, 0x69, 0x6e, 0x70, 0x75, 0x74, 0x18, 0x04, 0x20, 0x01, 0x28, 0x0b, 0x32, - 0x41, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, + 0x74, 0x65, 0x4b, 0x65, 0x79, 0x2e, 0x49, 0x74, 0x65, 0x72, 0x61, 0x62, 0x6c, 0x65, 0x53, 0x69, + 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x48, 0x00, 0x52, 0x11, 0x69, 0x74, 0x65, 0x72, 0x61, + 0x62, 0x6c, 0x65, 0x53, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x12, 0x80, 0x01, 0x0a, + 0x18, 0x6d, 0x75, 0x6c, 0x74, 0x69, 0x6d, 0x61, 0x70, 0x5f, 0x6b, 0x65, 0x79, 0x73, 0x5f, 0x73, + 0x69, 0x64, 0x65, 0x5f, 0x69, 0x6e, 0x70, 0x75, 0x74, 0x18, 0x05, 0x20, 0x01, 0x28, 0x0b, 0x32, + 0x45, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 
0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x53, 0x74, 0x61, 0x74, 0x65, 0x4b, 0x65, 0x79, - 0x2e, 0x49, 0x74, 0x65, 0x72, 0x61, 0x62, 0x6c, 0x65, 0x53, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, - 0x75, 0x74, 0x48, 0x00, 0x52, 0x11, 0x69, 0x74, 0x65, 0x72, 0x61, 0x62, 0x6c, 0x65, 0x53, 0x69, - 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x12, 0x80, 0x01, 0x0a, 0x18, 0x6d, 0x75, 0x6c, 0x74, - 0x69, 0x6d, 0x61, 0x70, 0x5f, 0x6b, 0x65, 0x79, 0x73, 0x5f, 0x73, 0x69, 0x64, 0x65, 0x5f, 0x69, - 0x6e, 0x70, 0x75, 0x74, 0x18, 0x05, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x45, 0x2e, 0x6f, 0x72, 0x67, - 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, - 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, - 0x76, 0x31, 0x2e, 0x53, 0x74, 0x61, 0x74, 0x65, 0x4b, 0x65, 0x79, 0x2e, 0x4d, 0x75, 0x6c, 0x74, - 0x69, 0x6d, 0x61, 0x70, 0x4b, 0x65, 0x79, 0x73, 0x53, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, - 0x74, 0x48, 0x00, 0x52, 0x15, 0x6d, 0x75, 0x6c, 0x74, 0x69, 0x6d, 0x61, 0x70, 0x4b, 0x65, 0x79, - 0x73, 0x53, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x1a, 0x1a, 0x0a, 0x06, 0x52, 0x75, - 0x6e, 0x6e, 0x65, 0x72, 0x12, 0x10, 0x0a, 0x03, 0x6b, 0x65, 0x79, 0x18, 0x01, 0x20, 0x01, 0x28, - 0x0c, 0x52, 0x03, 0x6b, 0x65, 0x79, 0x1a, 0x72, 0x0a, 0x11, 0x49, 0x74, 0x65, 0x72, 0x61, 0x62, - 0x6c, 0x65, 0x53, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x12, 0x21, 0x0a, 0x0c, 0x74, - 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x5f, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, - 0x09, 0x52, 0x0b, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x49, 0x64, 0x12, 0x22, - 0x0a, 0x0d, 0x73, 0x69, 0x64, 0x65, 0x5f, 0x69, 0x6e, 0x70, 0x75, 0x74, 0x5f, 0x69, 0x64, 0x18, - 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0b, 0x73, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, - 0x49, 0x64, 0x12, 0x16, 0x0a, 0x06, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x18, 0x03, 0x20, 0x01, - 0x28, 0x0c, 0x52, 0x06, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x1a, 0x84, 0x01, 0x0a, 0x11, 0x4d, - 0x75, 0x6c, 0x74, 0x69, 0x6d, 0x61, 0x70, 0x53, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, + 0x2e, 0x4d, 0x75, 0x6c, 0x74, 0x69, 0x6d, 0x61, 0x70, 0x4b, 0x65, 0x79, 0x73, 0x53, 0x69, 0x64, + 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x48, 0x00, 0x52, 0x15, 0x6d, 0x75, 0x6c, 0x74, 0x69, 0x6d, + 0x61, 0x70, 0x4b, 0x65, 0x79, 0x73, 0x53, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x1a, + 0x1a, 0x0a, 0x06, 0x52, 0x75, 0x6e, 0x6e, 0x65, 0x72, 0x12, 0x10, 0x0a, 0x03, 0x6b, 0x65, 0x79, + 0x18, 0x01, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x03, 0x6b, 0x65, 0x79, 0x1a, 0x72, 0x0a, 0x11, 0x49, + 0x74, 0x65, 0x72, 0x61, 0x62, 0x6c, 0x65, 0x53, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x12, 0x21, 0x0a, 0x0c, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x5f, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0b, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x49, 0x64, 0x12, 0x22, 0x0a, 0x0d, 0x73, 0x69, 0x64, 0x65, 0x5f, 0x69, 0x6e, 0x70, 0x75, 0x74, 0x5f, 0x69, 0x64, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0b, 0x73, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x49, 0x64, 0x12, 0x16, 0x0a, 0x06, 0x77, 0x69, 0x6e, 0x64, 0x6f, - 0x77, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x06, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x12, - 0x10, 0x0a, 0x03, 0x6b, 0x65, 0x79, 0x18, 0x04, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x03, 
0x6b, 0x65, - 0x79, 0x1a, 0x76, 0x0a, 0x15, 0x4d, 0x75, 0x6c, 0x74, 0x69, 0x6d, 0x61, 0x70, 0x4b, 0x65, 0x79, - 0x73, 0x53, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x12, 0x21, 0x0a, 0x0c, 0x74, 0x72, - 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x5f, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, - 0x52, 0x0b, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x49, 0x64, 0x12, 0x22, 0x0a, - 0x0d, 0x73, 0x69, 0x64, 0x65, 0x5f, 0x69, 0x6e, 0x70, 0x75, 0x74, 0x5f, 0x69, 0x64, 0x18, 0x02, - 0x20, 0x01, 0x28, 0x09, 0x52, 0x0b, 0x73, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x49, - 0x64, 0x12, 0x16, 0x0a, 0x06, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x18, 0x03, 0x20, 0x01, 0x28, - 0x0c, 0x52, 0x06, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x1a, 0x7f, 0x0a, 0x0c, 0x42, 0x61, 0x67, - 0x55, 0x73, 0x65, 0x72, 0x53, 0x74, 0x61, 0x74, 0x65, 0x12, 0x21, 0x0a, 0x0c, 0x74, 0x72, 0x61, - 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x5f, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, - 0x0b, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x49, 0x64, 0x12, 0x22, 0x0a, 0x0d, - 0x75, 0x73, 0x65, 0x72, 0x5f, 0x73, 0x74, 0x61, 0x74, 0x65, 0x5f, 0x69, 0x64, 0x18, 0x02, 0x20, - 0x01, 0x28, 0x09, 0x52, 0x0b, 0x75, 0x73, 0x65, 0x72, 0x53, 0x74, 0x61, 0x74, 0x65, 0x49, 0x64, - 0x12, 0x16, 0x0a, 0x06, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0c, - 0x52, 0x06, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x12, 0x10, 0x0a, 0x03, 0x6b, 0x65, 0x79, 0x18, - 0x04, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x03, 0x6b, 0x65, 0x79, 0x42, 0x06, 0x0a, 0x04, 0x74, 0x79, - 0x70, 0x65, 0x22, 0x40, 0x0a, 0x0f, 0x53, 0x74, 0x61, 0x74, 0x65, 0x47, 0x65, 0x74, 0x52, 0x65, - 0x71, 0x75, 0x65, 0x73, 0x74, 0x12, 0x2d, 0x0a, 0x12, 0x63, 0x6f, 0x6e, 0x74, 0x69, 0x6e, 0x75, - 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x74, 0x6f, 0x6b, 0x65, 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, - 0x0c, 0x52, 0x11, 0x63, 0x6f, 0x6e, 0x74, 0x69, 0x6e, 0x75, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x54, - 0x6f, 0x6b, 0x65, 0x6e, 0x22, 0x55, 0x0a, 0x10, 0x53, 0x74, 0x61, 0x74, 0x65, 0x47, 0x65, 0x74, - 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x12, 0x2d, 0x0a, 0x12, 0x63, 0x6f, 0x6e, 0x74, - 0x69, 0x6e, 0x75, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x74, 0x6f, 0x6b, 0x65, 0x6e, 0x18, 0x01, - 0x20, 0x01, 0x28, 0x0c, 0x52, 0x11, 0x63, 0x6f, 0x6e, 0x74, 0x69, 0x6e, 0x75, 0x61, 0x74, 0x69, - 0x6f, 0x6e, 0x54, 0x6f, 0x6b, 0x65, 0x6e, 0x12, 0x12, 0x0a, 0x04, 0x64, 0x61, 0x74, 0x61, 0x18, - 0x02, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x04, 0x64, 0x61, 0x74, 0x61, 0x22, 0x28, 0x0a, 0x12, 0x53, - 0x74, 0x61, 0x74, 0x65, 0x41, 0x70, 0x70, 0x65, 0x6e, 0x64, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, - 0x74, 0x12, 0x12, 0x0a, 0x04, 0x64, 0x61, 0x74, 0x61, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0c, 0x52, - 0x04, 0x64, 0x61, 0x74, 0x61, 0x22, 0x15, 0x0a, 0x13, 0x53, 0x74, 0x61, 0x74, 0x65, 0x41, 0x70, - 0x70, 0x65, 0x6e, 0x64, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x22, 0x13, 0x0a, 0x11, - 0x53, 0x74, 0x61, 0x74, 0x65, 0x43, 0x6c, 0x65, 0x61, 0x72, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, - 0x74, 0x22, 0x14, 0x0a, 0x12, 0x53, 0x74, 0x61, 0x74, 0x65, 0x43, 0x6c, 0x65, 0x61, 0x72, 0x52, - 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x22, 0xa2, 0x04, 0x0a, 0x08, 0x4c, 0x6f, 0x67, 0x45, - 0x6e, 0x74, 0x72, 0x79, 0x12, 0x59, 0x0a, 0x08, 0x73, 0x65, 0x76, 0x65, 0x72, 0x69, 0x74, 0x79, - 0x18, 0x01, 0x20, 0x01, 0x28, 0x0e, 0x32, 0x3d, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, - 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, - 0x6e, 0x5f, 
0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x4c, - 0x6f, 0x67, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x2e, 0x53, 0x65, 0x76, 0x65, 0x72, 0x69, 0x74, 0x79, - 0x2e, 0x45, 0x6e, 0x75, 0x6d, 0x52, 0x08, 0x73, 0x65, 0x76, 0x65, 0x72, 0x69, 0x74, 0x79, 0x12, - 0x38, 0x0a, 0x09, 0x74, 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, 0x18, 0x02, 0x20, 0x01, - 0x28, 0x0b, 0x32, 0x1a, 0x2e, 0x67, 0x6f, 0x6f, 0x67, 0x6c, 0x65, 0x2e, 0x70, 0x72, 0x6f, 0x74, - 0x6f, 0x62, 0x75, 0x66, 0x2e, 0x54, 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, 0x52, 0x09, - 0x74, 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, 0x12, 0x18, 0x0a, 0x07, 0x6d, 0x65, 0x73, - 0x73, 0x61, 0x67, 0x65, 0x18, 0x03, 0x20, 0x01, 0x28, 0x09, 0x52, 0x07, 0x6d, 0x65, 0x73, 0x73, - 0x61, 0x67, 0x65, 0x12, 0x14, 0x0a, 0x05, 0x74, 0x72, 0x61, 0x63, 0x65, 0x18, 0x04, 0x20, 0x01, - 0x28, 0x09, 0x52, 0x05, 0x74, 0x72, 0x61, 0x63, 0x65, 0x12, 0x25, 0x0a, 0x0e, 0x69, 0x6e, 0x73, - 0x74, 0x72, 0x75, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x69, 0x64, 0x18, 0x05, 0x20, 0x01, 0x28, - 0x09, 0x52, 0x0d, 0x69, 0x6e, 0x73, 0x74, 0x72, 0x75, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x49, 0x64, - 0x12, 0x21, 0x0a, 0x0c, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x5f, 0x69, 0x64, - 0x18, 0x06, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0b, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, - 0x6d, 0x49, 0x64, 0x12, 0x21, 0x0a, 0x0c, 0x6c, 0x6f, 0x67, 0x5f, 0x6c, 0x6f, 0x63, 0x61, 0x74, - 0x69, 0x6f, 0x6e, 0x18, 0x07, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0b, 0x6c, 0x6f, 0x67, 0x4c, 0x6f, - 0x63, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x16, 0x0a, 0x06, 0x74, 0x68, 0x72, 0x65, 0x61, 0x64, - 0x18, 0x08, 0x20, 0x01, 0x28, 0x09, 0x52, 0x06, 0x74, 0x68, 0x72, 0x65, 0x61, 0x64, 0x1a, 0x58, - 0x0a, 0x04, 0x4c, 0x69, 0x73, 0x74, 0x12, 0x50, 0x0a, 0x0b, 0x6c, 0x6f, 0x67, 0x5f, 0x65, 0x6e, - 0x74, 0x72, 0x69, 0x65, 0x73, 0x18, 0x01, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, + 0x77, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x06, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x1a, + 0x84, 0x01, 0x0a, 0x11, 0x4d, 0x75, 0x6c, 0x74, 0x69, 0x6d, 0x61, 0x70, 0x53, 0x69, 0x64, 0x65, + 0x49, 0x6e, 0x70, 0x75, 0x74, 0x12, 0x21, 0x0a, 0x0c, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, + 0x72, 0x6d, 0x5f, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0b, 0x74, 0x72, 0x61, + 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x49, 0x64, 0x12, 0x22, 0x0a, 0x0d, 0x73, 0x69, 0x64, 0x65, + 0x5f, 0x69, 0x6e, 0x70, 0x75, 0x74, 0x5f, 0x69, 0x64, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, + 0x0b, 0x73, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x49, 0x64, 0x12, 0x16, 0x0a, 0x06, + 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x06, 0x77, 0x69, + 0x6e, 0x64, 0x6f, 0x77, 0x12, 0x10, 0x0a, 0x03, 0x6b, 0x65, 0x79, 0x18, 0x04, 0x20, 0x01, 0x28, + 0x0c, 0x52, 0x03, 0x6b, 0x65, 0x79, 0x1a, 0x76, 0x0a, 0x15, 0x4d, 0x75, 0x6c, 0x74, 0x69, 0x6d, + 0x61, 0x70, 0x4b, 0x65, 0x79, 0x73, 0x53, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x12, + 0x21, 0x0a, 0x0c, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x5f, 0x69, 0x64, 0x18, + 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0b, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, + 0x49, 0x64, 0x12, 0x22, 0x0a, 0x0d, 0x73, 0x69, 0x64, 0x65, 0x5f, 0x69, 0x6e, 0x70, 0x75, 0x74, + 0x5f, 0x69, 0x64, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0b, 0x73, 0x69, 0x64, 0x65, 0x49, + 0x6e, 0x70, 0x75, 0x74, 0x49, 0x64, 0x12, 0x16, 0x0a, 0x06, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, + 0x18, 0x03, 0x20, 0x01, 0x28, 0x0c, 
0x52, 0x06, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x1a, 0x7f, + 0x0a, 0x0c, 0x42, 0x61, 0x67, 0x55, 0x73, 0x65, 0x72, 0x53, 0x74, 0x61, 0x74, 0x65, 0x12, 0x21, + 0x0a, 0x0c, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x5f, 0x69, 0x64, 0x18, 0x01, + 0x20, 0x01, 0x28, 0x09, 0x52, 0x0b, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x49, + 0x64, 0x12, 0x22, 0x0a, 0x0d, 0x75, 0x73, 0x65, 0x72, 0x5f, 0x73, 0x74, 0x61, 0x74, 0x65, 0x5f, + 0x69, 0x64, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0b, 0x75, 0x73, 0x65, 0x72, 0x53, 0x74, + 0x61, 0x74, 0x65, 0x49, 0x64, 0x12, 0x16, 0x0a, 0x06, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x18, + 0x03, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x06, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x12, 0x10, 0x0a, + 0x03, 0x6b, 0x65, 0x79, 0x18, 0x04, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x03, 0x6b, 0x65, 0x79, 0x42, + 0x06, 0x0a, 0x04, 0x74, 0x79, 0x70, 0x65, 0x22, 0x40, 0x0a, 0x0f, 0x53, 0x74, 0x61, 0x74, 0x65, + 0x47, 0x65, 0x74, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x12, 0x2d, 0x0a, 0x12, 0x63, 0x6f, + 0x6e, 0x74, 0x69, 0x6e, 0x75, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x74, 0x6f, 0x6b, 0x65, 0x6e, + 0x18, 0x01, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x11, 0x63, 0x6f, 0x6e, 0x74, 0x69, 0x6e, 0x75, 0x61, + 0x74, 0x69, 0x6f, 0x6e, 0x54, 0x6f, 0x6b, 0x65, 0x6e, 0x22, 0x55, 0x0a, 0x10, 0x53, 0x74, 0x61, + 0x74, 0x65, 0x47, 0x65, 0x74, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x12, 0x2d, 0x0a, + 0x12, 0x63, 0x6f, 0x6e, 0x74, 0x69, 0x6e, 0x75, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x74, 0x6f, + 0x6b, 0x65, 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x11, 0x63, 0x6f, 0x6e, 0x74, 0x69, + 0x6e, 0x75, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x54, 0x6f, 0x6b, 0x65, 0x6e, 0x12, 0x12, 0x0a, 0x04, + 0x64, 0x61, 0x74, 0x61, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x04, 0x64, 0x61, 0x74, 0x61, + 0x22, 0x28, 0x0a, 0x12, 0x53, 0x74, 0x61, 0x74, 0x65, 0x41, 0x70, 0x70, 0x65, 0x6e, 0x64, 0x52, + 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x12, 0x12, 0x0a, 0x04, 0x64, 0x61, 0x74, 0x61, 0x18, 0x01, + 0x20, 0x01, 0x28, 0x0c, 0x52, 0x04, 0x64, 0x61, 0x74, 0x61, 0x22, 0x15, 0x0a, 0x13, 0x53, 0x74, + 0x61, 0x74, 0x65, 0x41, 0x70, 0x70, 0x65, 0x6e, 0x64, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, + 0x65, 0x22, 0x13, 0x0a, 0x11, 0x53, 0x74, 0x61, 0x74, 0x65, 0x43, 0x6c, 0x65, 0x61, 0x72, 0x52, + 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x22, 0x14, 0x0a, 0x12, 0x53, 0x74, 0x61, 0x74, 0x65, 0x43, + 0x6c, 0x65, 0x61, 0x72, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x22, 0xa2, 0x04, 0x0a, + 0x08, 0x4c, 0x6f, 0x67, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x12, 0x59, 0x0a, 0x08, 0x73, 0x65, 0x76, + 0x65, 0x72, 0x69, 0x74, 0x79, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0e, 0x32, 0x3d, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, - 0x2e, 0x76, 0x31, 0x2e, 0x4c, 0x6f, 0x67, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x52, 0x0a, 0x6c, 0x6f, - 0x67, 0x45, 0x6e, 0x74, 0x72, 0x69, 0x65, 0x73, 0x1a, 0x72, 0x0a, 0x08, 0x53, 0x65, 0x76, 0x65, - 0x72, 0x69, 0x74, 0x79, 0x22, 0x66, 0x0a, 0x04, 0x45, 0x6e, 0x75, 0x6d, 0x12, 0x0f, 0x0a, 0x0b, - 0x55, 0x4e, 0x53, 0x50, 0x45, 0x43, 0x49, 0x46, 0x49, 0x45, 0x44, 0x10, 0x00, 0x12, 0x09, 0x0a, - 0x05, 0x54, 0x52, 0x41, 0x43, 0x45, 0x10, 0x01, 0x12, 0x09, 0x0a, 0x05, 0x44, 0x45, 0x42, 0x55, - 0x47, 0x10, 0x02, 0x12, 0x08, 0x0a, 0x04, 0x49, 0x4e, 0x46, 0x4f, 0x10, 0x03, 0x12, 0x0a, 0x0a, - 0x06, 0x4e, 0x4f, 0x54, 0x49, 0x43, 0x45, 0x10, 0x04, 0x12, 0x08, 
0x0a, 0x04, 0x57, 0x41, 0x52, - 0x4e, 0x10, 0x05, 0x12, 0x09, 0x0a, 0x05, 0x45, 0x52, 0x52, 0x4f, 0x52, 0x10, 0x06, 0x12, 0x0c, - 0x0a, 0x08, 0x43, 0x52, 0x49, 0x54, 0x49, 0x43, 0x41, 0x4c, 0x10, 0x07, 0x22, 0x0c, 0x0a, 0x0a, - 0x4c, 0x6f, 0x67, 0x43, 0x6f, 0x6e, 0x74, 0x72, 0x6f, 0x6c, 0x22, 0xe1, 0x04, 0x0a, 0x12, 0x53, - 0x74, 0x61, 0x72, 0x74, 0x57, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, - 0x74, 0x12, 0x1b, 0x0a, 0x09, 0x77, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x5f, 0x69, 0x64, 0x18, 0x01, - 0x20, 0x01, 0x28, 0x09, 0x52, 0x08, 0x77, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x49, 0x64, 0x12, 0x62, - 0x0a, 0x10, 0x63, 0x6f, 0x6e, 0x74, 0x72, 0x6f, 0x6c, 0x5f, 0x65, 0x6e, 0x64, 0x70, 0x6f, 0x69, - 0x6e, 0x74, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x37, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, - 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, - 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x41, 0x70, 0x69, - 0x53, 0x65, 0x72, 0x76, 0x69, 0x63, 0x65, 0x44, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x6f, - 0x72, 0x52, 0x0f, 0x63, 0x6f, 0x6e, 0x74, 0x72, 0x6f, 0x6c, 0x45, 0x6e, 0x64, 0x70, 0x6f, 0x69, - 0x6e, 0x74, 0x12, 0x62, 0x0a, 0x10, 0x6c, 0x6f, 0x67, 0x67, 0x69, 0x6e, 0x67, 0x5f, 0x65, 0x6e, - 0x64, 0x70, 0x6f, 0x69, 0x6e, 0x74, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x37, 0x2e, 0x6f, - 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, - 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, - 0x2e, 0x41, 0x70, 0x69, 0x53, 0x65, 0x72, 0x76, 0x69, 0x63, 0x65, 0x44, 0x65, 0x73, 0x63, 0x72, - 0x69, 0x70, 0x74, 0x6f, 0x72, 0x52, 0x0f, 0x6c, 0x6f, 0x67, 0x67, 0x69, 0x6e, 0x67, 0x45, 0x6e, - 0x64, 0x70, 0x6f, 0x69, 0x6e, 0x74, 0x12, 0x64, 0x0a, 0x11, 0x61, 0x72, 0x74, 0x69, 0x66, 0x61, - 0x63, 0x74, 0x5f, 0x65, 0x6e, 0x64, 0x70, 0x6f, 0x69, 0x6e, 0x74, 0x18, 0x04, 0x20, 0x01, 0x28, + 0x2e, 0x76, 0x31, 0x2e, 0x4c, 0x6f, 0x67, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x2e, 0x53, 0x65, 0x76, + 0x65, 0x72, 0x69, 0x74, 0x79, 0x2e, 0x45, 0x6e, 0x75, 0x6d, 0x52, 0x08, 0x73, 0x65, 0x76, 0x65, + 0x72, 0x69, 0x74, 0x79, 0x12, 0x38, 0x0a, 0x09, 0x74, 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, + 0x70, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x1a, 0x2e, 0x67, 0x6f, 0x6f, 0x67, 0x6c, 0x65, + 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x62, 0x75, 0x66, 0x2e, 0x54, 0x69, 0x6d, 0x65, 0x73, 0x74, + 0x61, 0x6d, 0x70, 0x52, 0x09, 0x74, 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, 0x12, 0x18, + 0x0a, 0x07, 0x6d, 0x65, 0x73, 0x73, 0x61, 0x67, 0x65, 0x18, 0x03, 0x20, 0x01, 0x28, 0x09, 0x52, + 0x07, 0x6d, 0x65, 0x73, 0x73, 0x61, 0x67, 0x65, 0x12, 0x14, 0x0a, 0x05, 0x74, 0x72, 0x61, 0x63, + 0x65, 0x18, 0x04, 0x20, 0x01, 0x28, 0x09, 0x52, 0x05, 0x74, 0x72, 0x61, 0x63, 0x65, 0x12, 0x25, + 0x0a, 0x0e, 0x69, 0x6e, 0x73, 0x74, 0x72, 0x75, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x69, 0x64, + 0x18, 0x05, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0d, 0x69, 0x6e, 0x73, 0x74, 0x72, 0x75, 0x63, 0x74, + 0x69, 0x6f, 0x6e, 0x49, 0x64, 0x12, 0x21, 0x0a, 0x0c, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, + 0x72, 0x6d, 0x5f, 0x69, 0x64, 0x18, 0x06, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0b, 0x74, 0x72, 0x61, + 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x49, 0x64, 0x12, 0x21, 0x0a, 0x0c, 0x6c, 0x6f, 0x67, 0x5f, + 0x6c, 0x6f, 0x63, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x18, 0x07, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0b, + 0x6c, 0x6f, 0x67, 0x4c, 0x6f, 0x63, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x16, 0x0a, 0x06, 
0x74, + 0x68, 0x72, 0x65, 0x61, 0x64, 0x18, 0x08, 0x20, 0x01, 0x28, 0x09, 0x52, 0x06, 0x74, 0x68, 0x72, + 0x65, 0x61, 0x64, 0x1a, 0x58, 0x0a, 0x04, 0x4c, 0x69, 0x73, 0x74, 0x12, 0x50, 0x0a, 0x0b, 0x6c, + 0x6f, 0x67, 0x5f, 0x65, 0x6e, 0x74, 0x72, 0x69, 0x65, 0x73, 0x18, 0x01, 0x20, 0x03, 0x28, 0x0b, + 0x32, 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, + 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, + 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x4c, 0x6f, 0x67, 0x45, 0x6e, 0x74, 0x72, + 0x79, 0x52, 0x0a, 0x6c, 0x6f, 0x67, 0x45, 0x6e, 0x74, 0x72, 0x69, 0x65, 0x73, 0x1a, 0x72, 0x0a, + 0x08, 0x53, 0x65, 0x76, 0x65, 0x72, 0x69, 0x74, 0x79, 0x22, 0x66, 0x0a, 0x04, 0x45, 0x6e, 0x75, + 0x6d, 0x12, 0x0f, 0x0a, 0x0b, 0x55, 0x4e, 0x53, 0x50, 0x45, 0x43, 0x49, 0x46, 0x49, 0x45, 0x44, + 0x10, 0x00, 0x12, 0x09, 0x0a, 0x05, 0x54, 0x52, 0x41, 0x43, 0x45, 0x10, 0x01, 0x12, 0x09, 0x0a, + 0x05, 0x44, 0x45, 0x42, 0x55, 0x47, 0x10, 0x02, 0x12, 0x08, 0x0a, 0x04, 0x49, 0x4e, 0x46, 0x4f, + 0x10, 0x03, 0x12, 0x0a, 0x0a, 0x06, 0x4e, 0x4f, 0x54, 0x49, 0x43, 0x45, 0x10, 0x04, 0x12, 0x08, + 0x0a, 0x04, 0x57, 0x41, 0x52, 0x4e, 0x10, 0x05, 0x12, 0x09, 0x0a, 0x05, 0x45, 0x52, 0x52, 0x4f, + 0x52, 0x10, 0x06, 0x12, 0x0c, 0x0a, 0x08, 0x43, 0x52, 0x49, 0x54, 0x49, 0x43, 0x41, 0x4c, 0x10, + 0x07, 0x22, 0x0c, 0x0a, 0x0a, 0x4c, 0x6f, 0x67, 0x43, 0x6f, 0x6e, 0x74, 0x72, 0x6f, 0x6c, 0x22, + 0xe1, 0x04, 0x0a, 0x12, 0x53, 0x74, 0x61, 0x72, 0x74, 0x57, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x52, + 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x12, 0x1b, 0x0a, 0x09, 0x77, 0x6f, 0x72, 0x6b, 0x65, 0x72, + 0x5f, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x08, 0x77, 0x6f, 0x72, 0x6b, 0x65, + 0x72, 0x49, 0x64, 0x12, 0x62, 0x0a, 0x10, 0x63, 0x6f, 0x6e, 0x74, 0x72, 0x6f, 0x6c, 0x5f, 0x65, + 0x6e, 0x64, 0x70, 0x6f, 0x69, 0x6e, 0x74, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x37, 0x2e, + 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, + 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, + 0x31, 0x2e, 0x41, 0x70, 0x69, 0x53, 0x65, 0x72, 0x76, 0x69, 0x63, 0x65, 0x44, 0x65, 0x73, 0x63, + 0x72, 0x69, 0x70, 0x74, 0x6f, 0x72, 0x52, 0x0f, 0x63, 0x6f, 0x6e, 0x74, 0x72, 0x6f, 0x6c, 0x45, + 0x6e, 0x64, 0x70, 0x6f, 0x69, 0x6e, 0x74, 0x12, 0x62, 0x0a, 0x10, 0x6c, 0x6f, 0x67, 0x67, 0x69, + 0x6e, 0x67, 0x5f, 0x65, 0x6e, 0x64, 0x70, 0x6f, 0x69, 0x6e, 0x74, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x37, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x41, 0x70, 0x69, 0x53, 0x65, 0x72, 0x76, 0x69, 0x63, 0x65, - 0x44, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x6f, 0x72, 0x52, 0x10, 0x61, 0x72, 0x74, 0x69, - 0x66, 0x61, 0x63, 0x74, 0x45, 0x6e, 0x64, 0x70, 0x6f, 0x69, 0x6e, 0x74, 0x12, 0x66, 0x0a, 0x12, - 0x70, 0x72, 0x6f, 0x76, 0x69, 0x73, 0x69, 0x6f, 0x6e, 0x5f, 0x65, 0x6e, 0x64, 0x70, 0x6f, 0x69, - 0x6e, 0x74, 0x18, 0x05, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x37, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, - 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, - 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x41, 0x70, 0x69, - 0x53, 0x65, 0x72, 0x76, 0x69, 0x63, 0x65, 0x44, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x6f, - 0x72, 0x52, 0x11, 0x70, 
0x72, 0x6f, 0x76, 0x69, 0x73, 0x69, 0x6f, 0x6e, 0x45, 0x6e, 0x64, 0x70, - 0x6f, 0x69, 0x6e, 0x74, 0x12, 0x5d, 0x0a, 0x06, 0x70, 0x61, 0x72, 0x61, 0x6d, 0x73, 0x18, 0x0a, - 0x20, 0x03, 0x28, 0x0b, 0x32, 0x45, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, - 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, - 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x53, 0x74, 0x61, - 0x72, 0x74, 0x57, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x2e, - 0x50, 0x61, 0x72, 0x61, 0x6d, 0x73, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x52, 0x06, 0x70, 0x61, 0x72, - 0x61, 0x6d, 0x73, 0x1a, 0x39, 0x0a, 0x0b, 0x50, 0x61, 0x72, 0x61, 0x6d, 0x73, 0x45, 0x6e, 0x74, - 0x72, 0x79, 0x12, 0x10, 0x0a, 0x03, 0x6b, 0x65, 0x79, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, - 0x03, 0x6b, 0x65, 0x79, 0x12, 0x14, 0x0a, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x18, 0x02, 0x20, - 0x01, 0x28, 0x09, 0x52, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x3a, 0x02, 0x38, 0x01, 0x22, 0x2b, - 0x0a, 0x13, 0x53, 0x74, 0x61, 0x72, 0x74, 0x57, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x52, 0x65, 0x73, - 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x12, 0x14, 0x0a, 0x05, 0x65, 0x72, 0x72, 0x6f, 0x72, 0x18, 0x01, - 0x20, 0x01, 0x28, 0x09, 0x52, 0x05, 0x65, 0x72, 0x72, 0x6f, 0x72, 0x22, 0x30, 0x0a, 0x11, 0x53, - 0x74, 0x6f, 0x70, 0x57, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, - 0x12, 0x1b, 0x0a, 0x09, 0x77, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x5f, 0x69, 0x64, 0x18, 0x01, 0x20, - 0x01, 0x28, 0x09, 0x52, 0x08, 0x77, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x49, 0x64, 0x22, 0x2a, 0x0a, - 0x12, 0x53, 0x74, 0x6f, 0x70, 0x57, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x52, 0x65, 0x73, 0x70, 0x6f, - 0x6e, 0x73, 0x65, 0x12, 0x14, 0x0a, 0x05, 0x65, 0x72, 0x72, 0x6f, 0x72, 0x18, 0x01, 0x20, 0x01, - 0x28, 0x09, 0x52, 0x05, 0x65, 0x72, 0x72, 0x6f, 0x72, 0x22, 0x25, 0x0a, 0x13, 0x57, 0x6f, 0x72, - 0x6b, 0x65, 0x72, 0x53, 0x74, 0x61, 0x74, 0x75, 0x73, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, - 0x12, 0x0e, 0x0a, 0x02, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x02, 0x69, 0x64, - 0x22, 0x5d, 0x0a, 0x14, 0x57, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x53, 0x74, 0x61, 0x74, 0x75, 0x73, - 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x12, 0x0e, 0x0a, 0x02, 0x69, 0x64, 0x18, 0x01, - 0x20, 0x01, 0x28, 0x09, 0x52, 0x02, 0x69, 0x64, 0x12, 0x14, 0x0a, 0x05, 0x65, 0x72, 0x72, 0x6f, - 0x72, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x05, 0x65, 0x72, 0x72, 0x6f, 0x72, 0x12, 0x1f, - 0x0a, 0x0b, 0x73, 0x74, 0x61, 0x74, 0x75, 0x73, 0x5f, 0x69, 0x6e, 0x66, 0x6f, 0x18, 0x03, 0x20, - 0x01, 0x28, 0x09, 0x52, 0x0a, 0x73, 0x74, 0x61, 0x74, 0x75, 0x73, 0x49, 0x6e, 0x66, 0x6f, 0x32, - 0xc3, 0x02, 0x0a, 0x0d, 0x42, 0x65, 0x61, 0x6d, 0x46, 0x6e, 0x43, 0x6f, 0x6e, 0x74, 0x72, 0x6f, - 0x6c, 0x12, 0x86, 0x01, 0x0a, 0x07, 0x43, 0x6f, 0x6e, 0x74, 0x72, 0x6f, 0x6c, 0x12, 0x3a, 0x2e, + 0x44, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x6f, 0x72, 0x52, 0x0f, 0x6c, 0x6f, 0x67, 0x67, + 0x69, 0x6e, 0x67, 0x45, 0x6e, 0x64, 0x70, 0x6f, 0x69, 0x6e, 0x74, 0x12, 0x64, 0x0a, 0x11, 0x61, + 0x72, 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, 0x5f, 0x65, 0x6e, 0x64, 0x70, 0x6f, 0x69, 0x6e, 0x74, + 0x18, 0x04, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x37, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, + 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, + 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x41, 0x70, 0x69, 0x53, 0x65, + 0x72, 0x76, 0x69, 0x63, 0x65, 0x44, 0x65, 0x73, 
0x63, 0x72, 0x69, 0x70, 0x74, 0x6f, 0x72, 0x52, + 0x10, 0x61, 0x72, 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, 0x45, 0x6e, 0x64, 0x70, 0x6f, 0x69, 0x6e, + 0x74, 0x12, 0x66, 0x0a, 0x12, 0x70, 0x72, 0x6f, 0x76, 0x69, 0x73, 0x69, 0x6f, 0x6e, 0x5f, 0x65, + 0x6e, 0x64, 0x70, 0x6f, 0x69, 0x6e, 0x74, 0x18, 0x05, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x37, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, - 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, - 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x49, 0x6e, 0x73, 0x74, 0x72, 0x75, 0x63, 0x74, 0x69, 0x6f, - 0x6e, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x1a, 0x39, 0x2e, 0x6f, 0x72, 0x67, 0x2e, - 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, - 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, - 0x31, 0x2e, 0x49, 0x6e, 0x73, 0x74, 0x72, 0x75, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x52, 0x65, 0x71, - 0x75, 0x65, 0x73, 0x74, 0x22, 0x00, 0x28, 0x01, 0x30, 0x01, 0x12, 0xa8, 0x01, 0x0a, 0x1a, 0x47, - 0x65, 0x74, 0x50, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x42, 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x44, - 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x6f, 0x72, 0x12, 0x48, 0x2e, 0x6f, 0x72, 0x67, 0x2e, + 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, + 0x31, 0x2e, 0x41, 0x70, 0x69, 0x53, 0x65, 0x72, 0x76, 0x69, 0x63, 0x65, 0x44, 0x65, 0x73, 0x63, + 0x72, 0x69, 0x70, 0x74, 0x6f, 0x72, 0x52, 0x11, 0x70, 0x72, 0x6f, 0x76, 0x69, 0x73, 0x69, 0x6f, + 0x6e, 0x45, 0x6e, 0x64, 0x70, 0x6f, 0x69, 0x6e, 0x74, 0x12, 0x5d, 0x0a, 0x06, 0x70, 0x61, 0x72, + 0x61, 0x6d, 0x73, 0x18, 0x0a, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x45, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, - 0x31, 0x2e, 0x47, 0x65, 0x74, 0x50, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x42, 0x75, 0x6e, 0x64, - 0x6c, 0x65, 0x44, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x6f, 0x72, 0x52, 0x65, 0x71, 0x75, - 0x65, 0x73, 0x74, 0x1a, 0x3e, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, - 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, - 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x50, 0x72, 0x6f, 0x63, - 0x65, 0x73, 0x73, 0x42, 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x44, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, - 0x74, 0x6f, 0x72, 0x22, 0x00, 0x32, 0x7c, 0x0a, 0x0a, 0x42, 0x65, 0x61, 0x6d, 0x46, 0x6e, 0x44, - 0x61, 0x74, 0x61, 0x12, 0x6e, 0x0a, 0x04, 0x44, 0x61, 0x74, 0x61, 0x12, 0x2f, 0x2e, 0x6f, 0x72, - 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, - 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, - 0x2e, 0x76, 0x31, 0x2e, 0x45, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x73, 0x1a, 0x2f, 0x2e, 0x6f, - 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, - 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, - 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x45, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x73, 0x22, 0x00, 0x28, - 0x01, 0x30, 0x01, 0x32, 0x87, 0x01, 0x0a, 0x0b, 0x42, 0x65, 0x61, 0x6d, 0x46, 0x6e, 0x53, 0x74, - 0x61, 0x74, 0x65, 0x12, 0x78, 0x0a, 0x05, 0x53, 0x74, 0x61, 0x74, 0x65, 0x12, 
0x33, 0x2e, 0x6f, - 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, - 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, - 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x53, 0x74, 0x61, 0x74, 0x65, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, - 0x74, 0x1a, 0x34, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, - 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, - 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x53, 0x74, 0x61, 0x74, 0x65, 0x52, - 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x22, 0x00, 0x28, 0x01, 0x30, 0x01, 0x32, 0x89, 0x01, - 0x0a, 0x0d, 0x42, 0x65, 0x61, 0x6d, 0x46, 0x6e, 0x4c, 0x6f, 0x67, 0x67, 0x69, 0x6e, 0x67, 0x12, - 0x78, 0x0a, 0x07, 0x4c, 0x6f, 0x67, 0x67, 0x69, 0x6e, 0x67, 0x12, 0x34, 0x2e, 0x6f, 0x72, 0x67, - 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, - 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, - 0x76, 0x31, 0x2e, 0x4c, 0x6f, 0x67, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x2e, 0x4c, 0x69, 0x73, 0x74, - 0x1a, 0x31, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, - 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, - 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x4c, 0x6f, 0x67, 0x43, 0x6f, 0x6e, 0x74, - 0x72, 0x6f, 0x6c, 0x22, 0x00, 0x28, 0x01, 0x30, 0x01, 0x32, 0xa9, 0x02, 0x0a, 0x18, 0x42, 0x65, - 0x61, 0x6d, 0x46, 0x6e, 0x45, 0x78, 0x74, 0x65, 0x72, 0x6e, 0x61, 0x6c, 0x57, 0x6f, 0x72, 0x6b, - 0x65, 0x72, 0x50, 0x6f, 0x6f, 0x6c, 0x12, 0x86, 0x01, 0x0a, 0x0b, 0x53, 0x74, 0x61, 0x72, 0x74, - 0x57, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x12, 0x39, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, - 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, - 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x53, - 0x74, 0x61, 0x72, 0x74, 0x57, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, - 0x74, 0x1a, 0x3a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, - 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, - 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x53, 0x74, 0x61, 0x72, 0x74, 0x57, - 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x22, 0x00, 0x12, - 0x83, 0x01, 0x0a, 0x0a, 0x53, 0x74, 0x6f, 0x70, 0x57, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x12, 0x38, + 0x31, 0x2e, 0x53, 0x74, 0x61, 0x72, 0x74, 0x57, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x52, 0x65, 0x71, + 0x75, 0x65, 0x73, 0x74, 0x2e, 0x50, 0x61, 0x72, 0x61, 0x6d, 0x73, 0x45, 0x6e, 0x74, 0x72, 0x79, + 0x52, 0x06, 0x70, 0x61, 0x72, 0x61, 0x6d, 0x73, 0x1a, 0x39, 0x0a, 0x0b, 0x50, 0x61, 0x72, 0x61, + 0x6d, 0x73, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x12, 0x10, 0x0a, 0x03, 0x6b, 0x65, 0x79, 0x18, 0x01, + 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x6b, 0x65, 0x79, 0x12, 0x14, 0x0a, 0x05, 0x76, 0x61, 0x6c, + 0x75, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x3a, + 0x02, 0x38, 0x01, 0x22, 0x2b, 0x0a, 0x13, 0x53, 0x74, 0x61, 0x72, 0x74, 0x57, 0x6f, 0x72, 0x6b, + 0x65, 0x72, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x12, 0x14, 0x0a, 0x05, 0x65, 0x72, + 0x72, 0x6f, 0x72, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x05, 0x65, 0x72, 0x72, 0x6f, 0x72, + 0x22, 
0x30, 0x0a, 0x11, 0x53, 0x74, 0x6f, 0x70, 0x57, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x52, 0x65, + 0x71, 0x75, 0x65, 0x73, 0x74, 0x12, 0x1b, 0x0a, 0x09, 0x77, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x5f, + 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x08, 0x77, 0x6f, 0x72, 0x6b, 0x65, 0x72, + 0x49, 0x64, 0x22, 0x2a, 0x0a, 0x12, 0x53, 0x74, 0x6f, 0x70, 0x57, 0x6f, 0x72, 0x6b, 0x65, 0x72, + 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x12, 0x14, 0x0a, 0x05, 0x65, 0x72, 0x72, 0x6f, + 0x72, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x05, 0x65, 0x72, 0x72, 0x6f, 0x72, 0x22, 0x25, + 0x0a, 0x13, 0x57, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x53, 0x74, 0x61, 0x74, 0x75, 0x73, 0x52, 0x65, + 0x71, 0x75, 0x65, 0x73, 0x74, 0x12, 0x0e, 0x0a, 0x02, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, + 0x09, 0x52, 0x02, 0x69, 0x64, 0x22, 0x5d, 0x0a, 0x14, 0x57, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x53, + 0x74, 0x61, 0x74, 0x75, 0x73, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x12, 0x0e, 0x0a, + 0x02, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x02, 0x69, 0x64, 0x12, 0x14, 0x0a, + 0x05, 0x65, 0x72, 0x72, 0x6f, 0x72, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x05, 0x65, 0x72, + 0x72, 0x6f, 0x72, 0x12, 0x1f, 0x0a, 0x0b, 0x73, 0x74, 0x61, 0x74, 0x75, 0x73, 0x5f, 0x69, 0x6e, + 0x66, 0x6f, 0x18, 0x03, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0a, 0x73, 0x74, 0x61, 0x74, 0x75, 0x73, + 0x49, 0x6e, 0x66, 0x6f, 0x32, 0xc3, 0x02, 0x0a, 0x0d, 0x42, 0x65, 0x61, 0x6d, 0x46, 0x6e, 0x43, + 0x6f, 0x6e, 0x74, 0x72, 0x6f, 0x6c, 0x12, 0x86, 0x01, 0x0a, 0x07, 0x43, 0x6f, 0x6e, 0x74, 0x72, + 0x6f, 0x6c, 0x12, 0x3a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, + 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, + 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x49, 0x6e, 0x73, 0x74, 0x72, + 0x75, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x1a, 0x39, + 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, + 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, + 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x49, 0x6e, 0x73, 0x74, 0x72, 0x75, 0x63, 0x74, 0x69, + 0x6f, 0x6e, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x22, 0x00, 0x28, 0x01, 0x30, 0x01, 0x12, + 0xa8, 0x01, 0x0a, 0x1a, 0x47, 0x65, 0x74, 0x50, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x42, 0x75, + 0x6e, 0x64, 0x6c, 0x65, 0x44, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x6f, 0x72, 0x12, 0x48, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, - 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x53, 0x74, 0x6f, 0x70, 0x57, 0x6f, 0x72, 0x6b, 0x65, - 0x72, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x1a, 0x39, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, + 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x47, 0x65, 0x74, 0x50, 0x72, 0x6f, 0x63, 0x65, 0x73, + 0x73, 0x42, 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x44, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x6f, + 0x72, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x1a, 0x3e, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, - 0x2e, 0x53, 0x74, 0x6f, 0x70, 0x57, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x52, 0x65, 0x73, 0x70, 0x6f, - 0x6e, 0x73, 0x65, 0x22, 0x00, 0x32, 
0xa4, 0x01, 0x0a, 0x12, 0x42, 0x65, 0x61, 0x6d, 0x46, 0x6e, - 0x57, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x53, 0x74, 0x61, 0x74, 0x75, 0x73, 0x12, 0x8d, 0x01, 0x0a, - 0x0c, 0x57, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x53, 0x74, 0x61, 0x74, 0x75, 0x73, 0x12, 0x3b, 0x2e, - 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, - 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, - 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x57, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x53, 0x74, 0x61, 0x74, - 0x75, 0x73, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x1a, 0x3a, 0x2e, 0x6f, 0x72, 0x67, - 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, - 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, - 0x76, 0x31, 0x2e, 0x57, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x53, 0x74, 0x61, 0x74, 0x75, 0x73, 0x52, - 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x22, 0x00, 0x28, 0x01, 0x30, 0x01, 0x42, 0x7e, 0x0a, 0x24, + 0x2e, 0x50, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x42, 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x44, 0x65, + 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x6f, 0x72, 0x22, 0x00, 0x32, 0x7c, 0x0a, 0x0a, 0x42, 0x65, + 0x61, 0x6d, 0x46, 0x6e, 0x44, 0x61, 0x74, 0x61, 0x12, 0x6e, 0x0a, 0x04, 0x44, 0x61, 0x74, 0x61, + 0x12, 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, + 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, + 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x45, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, + 0x73, 0x1a, 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, + 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, + 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x45, 0x6c, 0x65, 0x6d, 0x65, 0x6e, + 0x74, 0x73, 0x22, 0x00, 0x28, 0x01, 0x30, 0x01, 0x32, 0x87, 0x01, 0x0a, 0x0b, 0x42, 0x65, 0x61, + 0x6d, 0x46, 0x6e, 0x53, 0x74, 0x61, 0x74, 0x65, 0x12, 0x78, 0x0a, 0x05, 0x53, 0x74, 0x61, 0x74, + 0x65, 0x12, 0x33, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, + 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, + 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x53, 0x74, 0x61, 0x74, 0x65, 0x52, + 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x1a, 0x34, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, + 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, + 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x53, + 0x74, 0x61, 0x74, 0x65, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x22, 0x00, 0x28, 0x01, + 0x30, 0x01, 0x32, 0x89, 0x01, 0x0a, 0x0d, 0x42, 0x65, 0x61, 0x6d, 0x46, 0x6e, 0x4c, 0x6f, 0x67, + 0x67, 0x69, 0x6e, 0x67, 0x12, 0x78, 0x0a, 0x07, 0x4c, 0x6f, 0x67, 0x67, 0x69, 0x6e, 0x67, 0x12, + 0x34, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, + 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, + 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x4c, 0x6f, 0x67, 0x45, 0x6e, 0x74, 0x72, 0x79, + 0x2e, 0x4c, 0x69, 0x73, 0x74, 0x1a, 0x31, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, + 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, + 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 
0x2e, 0x76, 0x31, 0x2e, 0x4c, 0x6f, + 0x67, 0x43, 0x6f, 0x6e, 0x74, 0x72, 0x6f, 0x6c, 0x22, 0x00, 0x28, 0x01, 0x30, 0x01, 0x32, 0xa9, + 0x02, 0x0a, 0x18, 0x42, 0x65, 0x61, 0x6d, 0x46, 0x6e, 0x45, 0x78, 0x74, 0x65, 0x72, 0x6e, 0x61, + 0x6c, 0x57, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x50, 0x6f, 0x6f, 0x6c, 0x12, 0x86, 0x01, 0x0a, 0x0b, + 0x53, 0x74, 0x61, 0x72, 0x74, 0x57, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x12, 0x39, 0x2e, 0x6f, 0x72, + 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, + 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, + 0x2e, 0x76, 0x31, 0x2e, 0x53, 0x74, 0x61, 0x72, 0x74, 0x57, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x52, + 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x1a, 0x3a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, + 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, + 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x53, + 0x74, 0x61, 0x72, 0x74, 0x57, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, + 0x73, 0x65, 0x22, 0x00, 0x12, 0x83, 0x01, 0x0a, 0x0a, 0x53, 0x74, 0x6f, 0x70, 0x57, 0x6f, 0x72, + 0x6b, 0x65, 0x72, 0x12, 0x38, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, + 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, + 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x53, 0x74, 0x6f, 0x70, + 0x57, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x1a, 0x39, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, - 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, - 0x6e, 0x2e, 0x76, 0x31, 0x42, 0x09, 0x42, 0x65, 0x61, 0x6d, 0x46, 0x6e, 0x41, 0x70, 0x69, 0x5a, - 0x4b, 0x67, 0x69, 0x74, 0x68, 0x75, 0x62, 0x2e, 0x63, 0x6f, 0x6d, 0x2f, 0x61, 0x70, 0x61, 0x63, - 0x68, 0x65, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x73, 0x64, 0x6b, 0x73, 0x2f, 0x67, 0x6f, 0x2f, - 0x70, 0x6b, 0x67, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2f, 0x66, - 0x6e, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x76, 0x31, 0x3b, 0x66, 0x6e, - 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x76, 0x31, 0x62, 0x06, 0x70, 0x72, - 0x6f, 0x74, 0x6f, 0x33, + 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, + 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x53, 0x74, 0x6f, 0x70, 0x57, 0x6f, 0x72, 0x6b, 0x65, 0x72, + 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x22, 0x00, 0x32, 0xa4, 0x01, 0x0a, 0x12, 0x42, + 0x65, 0x61, 0x6d, 0x46, 0x6e, 0x57, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x53, 0x74, 0x61, 0x74, 0x75, + 0x73, 0x12, 0x8d, 0x01, 0x0a, 0x0c, 0x57, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x53, 0x74, 0x61, 0x74, + 0x75, 0x73, 0x12, 0x3b, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, + 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, + 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x57, 0x6f, 0x72, 0x6b, 0x65, + 0x72, 0x53, 0x74, 0x61, 0x74, 0x75, 0x73, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x1a, + 0x3a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, + 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, + 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x57, 0x6f, 
0x72, 0x6b, 0x65, 0x72, 0x53, 0x74, + 0x61, 0x74, 0x75, 0x73, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x22, 0x00, 0x28, 0x01, 0x30, + 0x01, 0x42, 0x81, 0x01, 0x0a, 0x24, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, + 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x65, 0x78, + 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x42, 0x09, 0x42, 0x65, 0x61, 0x6d, + 0x46, 0x6e, 0x41, 0x70, 0x69, 0x5a, 0x4e, 0x67, 0x69, 0x74, 0x68, 0x75, 0x62, 0x2e, 0x63, 0x6f, + 0x6d, 0x2f, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x73, 0x64, + 0x6b, 0x73, 0x2f, 0x76, 0x32, 0x2f, 0x67, 0x6f, 0x2f, 0x70, 0x6b, 0x67, 0x2f, 0x62, 0x65, 0x61, + 0x6d, 0x2f, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2f, 0x66, 0x6e, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, + 0x69, 0x6f, 0x6e, 0x5f, 0x76, 0x31, 0x3b, 0x66, 0x6e, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, + 0x6f, 0x6e, 0x5f, 0x76, 0x31, 0x62, 0x06, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x33, } var ( @@ -4560,172 +4729,178 @@ func file_beam_fn_api_proto_rawDescGZIP() []byte { } var file_beam_fn_api_proto_enumTypes = make([]protoimpl.EnumInfo, 1) -var file_beam_fn_api_proto_msgTypes = make([]protoimpl.MessageInfo, 62) +var file_beam_fn_api_proto_msgTypes = make([]protoimpl.MessageInfo, 65) var file_beam_fn_api_proto_goTypes = []interface{}{ (LogEntry_Severity_Enum)(0), // 0: org.apache.beam.model.fn_execution.v1.LogEntry.Severity.Enum (*RemoteGrpcPort)(nil), // 1: org.apache.beam.model.fn_execution.v1.RemoteGrpcPort (*GetProcessBundleDescriptorRequest)(nil), // 2: org.apache.beam.model.fn_execution.v1.GetProcessBundleDescriptorRequest (*InstructionRequest)(nil), // 3: org.apache.beam.model.fn_execution.v1.InstructionRequest (*InstructionResponse)(nil), // 4: org.apache.beam.model.fn_execution.v1.InstructionResponse - (*RegisterRequest)(nil), // 5: org.apache.beam.model.fn_execution.v1.RegisterRequest - (*RegisterResponse)(nil), // 6: org.apache.beam.model.fn_execution.v1.RegisterResponse - (*ProcessBundleDescriptor)(nil), // 7: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor - (*BundleApplication)(nil), // 8: org.apache.beam.model.fn_execution.v1.BundleApplication - (*DelayedBundleApplication)(nil), // 9: org.apache.beam.model.fn_execution.v1.DelayedBundleApplication - (*ProcessBundleRequest)(nil), // 10: org.apache.beam.model.fn_execution.v1.ProcessBundleRequest - (*ProcessBundleResponse)(nil), // 11: org.apache.beam.model.fn_execution.v1.ProcessBundleResponse - (*ProcessBundleProgressRequest)(nil), // 12: org.apache.beam.model.fn_execution.v1.ProcessBundleProgressRequest - (*MonitoringInfosMetadataRequest)(nil), // 13: org.apache.beam.model.fn_execution.v1.MonitoringInfosMetadataRequest - (*ProcessBundleProgressResponse)(nil), // 14: org.apache.beam.model.fn_execution.v1.ProcessBundleProgressResponse - (*MonitoringInfosMetadataResponse)(nil), // 15: org.apache.beam.model.fn_execution.v1.MonitoringInfosMetadataResponse - (*ProcessBundleSplitRequest)(nil), // 16: org.apache.beam.model.fn_execution.v1.ProcessBundleSplitRequest - (*ProcessBundleSplitResponse)(nil), // 17: org.apache.beam.model.fn_execution.v1.ProcessBundleSplitResponse - (*FinalizeBundleRequest)(nil), // 18: org.apache.beam.model.fn_execution.v1.FinalizeBundleRequest - (*FinalizeBundleResponse)(nil), // 19: org.apache.beam.model.fn_execution.v1.FinalizeBundleResponse - (*Elements)(nil), // 20: org.apache.beam.model.fn_execution.v1.Elements - (*StateRequest)(nil), // 21: org.apache.beam.model.fn_execution.v1.StateRequest - 
(*StateResponse)(nil), // 22: org.apache.beam.model.fn_execution.v1.StateResponse - (*StateKey)(nil), // 23: org.apache.beam.model.fn_execution.v1.StateKey - (*StateGetRequest)(nil), // 24: org.apache.beam.model.fn_execution.v1.StateGetRequest - (*StateGetResponse)(nil), // 25: org.apache.beam.model.fn_execution.v1.StateGetResponse - (*StateAppendRequest)(nil), // 26: org.apache.beam.model.fn_execution.v1.StateAppendRequest - (*StateAppendResponse)(nil), // 27: org.apache.beam.model.fn_execution.v1.StateAppendResponse - (*StateClearRequest)(nil), // 28: org.apache.beam.model.fn_execution.v1.StateClearRequest - (*StateClearResponse)(nil), // 29: org.apache.beam.model.fn_execution.v1.StateClearResponse - (*LogEntry)(nil), // 30: org.apache.beam.model.fn_execution.v1.LogEntry - (*LogControl)(nil), // 31: org.apache.beam.model.fn_execution.v1.LogControl - (*StartWorkerRequest)(nil), // 32: org.apache.beam.model.fn_execution.v1.StartWorkerRequest - (*StartWorkerResponse)(nil), // 33: org.apache.beam.model.fn_execution.v1.StartWorkerResponse - (*StopWorkerRequest)(nil), // 34: org.apache.beam.model.fn_execution.v1.StopWorkerRequest - (*StopWorkerResponse)(nil), // 35: org.apache.beam.model.fn_execution.v1.StopWorkerResponse - (*WorkerStatusRequest)(nil), // 36: org.apache.beam.model.fn_execution.v1.WorkerStatusRequest - (*WorkerStatusResponse)(nil), // 37: org.apache.beam.model.fn_execution.v1.WorkerStatusResponse - nil, // 38: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.TransformsEntry - nil, // 39: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.PcollectionsEntry - nil, // 40: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.WindowingStrategiesEntry - nil, // 41: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.CodersEntry - nil, // 42: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.EnvironmentsEntry - nil, // 43: org.apache.beam.model.fn_execution.v1.BundleApplication.OutputWatermarksEntry - (*ProcessBundleRequest_CacheToken)(nil), // 44: org.apache.beam.model.fn_execution.v1.ProcessBundleRequest.CacheToken - (*ProcessBundleRequest_CacheToken_UserState)(nil), // 45: org.apache.beam.model.fn_execution.v1.ProcessBundleRequest.CacheToken.UserState - (*ProcessBundleRequest_CacheToken_SideInput)(nil), // 46: org.apache.beam.model.fn_execution.v1.ProcessBundleRequest.CacheToken.SideInput - nil, // 47: org.apache.beam.model.fn_execution.v1.ProcessBundleResponse.MonitoringDataEntry - nil, // 48: org.apache.beam.model.fn_execution.v1.ProcessBundleProgressResponse.MonitoringDataEntry - nil, // 49: org.apache.beam.model.fn_execution.v1.MonitoringInfosMetadataResponse.MonitoringInfoEntry - (*ProcessBundleSplitRequest_DesiredSplit)(nil), // 50: org.apache.beam.model.fn_execution.v1.ProcessBundleSplitRequest.DesiredSplit - nil, // 51: org.apache.beam.model.fn_execution.v1.ProcessBundleSplitRequest.DesiredSplitsEntry - (*ProcessBundleSplitResponse_ChannelSplit)(nil), // 52: org.apache.beam.model.fn_execution.v1.ProcessBundleSplitResponse.ChannelSplit - (*Elements_Data)(nil), // 53: org.apache.beam.model.fn_execution.v1.Elements.Data - (*Elements_Timers)(nil), // 54: org.apache.beam.model.fn_execution.v1.Elements.Timers - (*StateKey_Runner)(nil), // 55: org.apache.beam.model.fn_execution.v1.StateKey.Runner - (*StateKey_IterableSideInput)(nil), // 56: org.apache.beam.model.fn_execution.v1.StateKey.IterableSideInput - (*StateKey_MultimapSideInput)(nil), // 57: org.apache.beam.model.fn_execution.v1.StateKey.MultimapSideInput - 
(*StateKey_MultimapKeysSideInput)(nil), // 58: org.apache.beam.model.fn_execution.v1.StateKey.MultimapKeysSideInput - (*StateKey_BagUserState)(nil), // 59: org.apache.beam.model.fn_execution.v1.StateKey.BagUserState - (*LogEntry_List)(nil), // 60: org.apache.beam.model.fn_execution.v1.LogEntry.List - (*LogEntry_Severity)(nil), // 61: org.apache.beam.model.fn_execution.v1.LogEntry.Severity - nil, // 62: org.apache.beam.model.fn_execution.v1.StartWorkerRequest.ParamsEntry - (*pipeline_v1.ApiServiceDescriptor)(nil), // 63: org.apache.beam.model.pipeline.v1.ApiServiceDescriptor - (pipeline_v1.IsBounded_Enum)(0), // 64: org.apache.beam.model.pipeline.v1.IsBounded.Enum - (*duration.Duration)(nil), // 65: google.protobuf.Duration - (*pipeline_v1.MonitoringInfo)(nil), // 66: org.apache.beam.model.pipeline.v1.MonitoringInfo - (*timestamp.Timestamp)(nil), // 67: google.protobuf.Timestamp - (*pipeline_v1.PTransform)(nil), // 68: org.apache.beam.model.pipeline.v1.PTransform - (*pipeline_v1.PCollection)(nil), // 69: org.apache.beam.model.pipeline.v1.PCollection - (*pipeline_v1.WindowingStrategy)(nil), // 70: org.apache.beam.model.pipeline.v1.WindowingStrategy - (*pipeline_v1.Coder)(nil), // 71: org.apache.beam.model.pipeline.v1.Coder - (*pipeline_v1.Environment)(nil), // 72: org.apache.beam.model.pipeline.v1.Environment + (*HarnessMonitoringInfosRequest)(nil), // 5: org.apache.beam.model.fn_execution.v1.HarnessMonitoringInfosRequest + (*HarnessMonitoringInfosResponse)(nil), // 6: org.apache.beam.model.fn_execution.v1.HarnessMonitoringInfosResponse + (*RegisterRequest)(nil), // 7: org.apache.beam.model.fn_execution.v1.RegisterRequest + (*RegisterResponse)(nil), // 8: org.apache.beam.model.fn_execution.v1.RegisterResponse + (*ProcessBundleDescriptor)(nil), // 9: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor + (*BundleApplication)(nil), // 10: org.apache.beam.model.fn_execution.v1.BundleApplication + (*DelayedBundleApplication)(nil), // 11: org.apache.beam.model.fn_execution.v1.DelayedBundleApplication + (*ProcessBundleRequest)(nil), // 12: org.apache.beam.model.fn_execution.v1.ProcessBundleRequest + (*ProcessBundleResponse)(nil), // 13: org.apache.beam.model.fn_execution.v1.ProcessBundleResponse + (*ProcessBundleProgressRequest)(nil), // 14: org.apache.beam.model.fn_execution.v1.ProcessBundleProgressRequest + (*MonitoringInfosMetadataRequest)(nil), // 15: org.apache.beam.model.fn_execution.v1.MonitoringInfosMetadataRequest + (*ProcessBundleProgressResponse)(nil), // 16: org.apache.beam.model.fn_execution.v1.ProcessBundleProgressResponse + (*MonitoringInfosMetadataResponse)(nil), // 17: org.apache.beam.model.fn_execution.v1.MonitoringInfosMetadataResponse + (*ProcessBundleSplitRequest)(nil), // 18: org.apache.beam.model.fn_execution.v1.ProcessBundleSplitRequest + (*ProcessBundleSplitResponse)(nil), // 19: org.apache.beam.model.fn_execution.v1.ProcessBundleSplitResponse + (*FinalizeBundleRequest)(nil), // 20: org.apache.beam.model.fn_execution.v1.FinalizeBundleRequest + (*FinalizeBundleResponse)(nil), // 21: org.apache.beam.model.fn_execution.v1.FinalizeBundleResponse + (*Elements)(nil), // 22: org.apache.beam.model.fn_execution.v1.Elements + (*StateRequest)(nil), // 23: org.apache.beam.model.fn_execution.v1.StateRequest + (*StateResponse)(nil), // 24: org.apache.beam.model.fn_execution.v1.StateResponse + (*StateKey)(nil), // 25: org.apache.beam.model.fn_execution.v1.StateKey + (*StateGetRequest)(nil), // 26: org.apache.beam.model.fn_execution.v1.StateGetRequest + (*StateGetResponse)(nil), 
// 27: org.apache.beam.model.fn_execution.v1.StateGetResponse + (*StateAppendRequest)(nil), // 28: org.apache.beam.model.fn_execution.v1.StateAppendRequest + (*StateAppendResponse)(nil), // 29: org.apache.beam.model.fn_execution.v1.StateAppendResponse + (*StateClearRequest)(nil), // 30: org.apache.beam.model.fn_execution.v1.StateClearRequest + (*StateClearResponse)(nil), // 31: org.apache.beam.model.fn_execution.v1.StateClearResponse + (*LogEntry)(nil), // 32: org.apache.beam.model.fn_execution.v1.LogEntry + (*LogControl)(nil), // 33: org.apache.beam.model.fn_execution.v1.LogControl + (*StartWorkerRequest)(nil), // 34: org.apache.beam.model.fn_execution.v1.StartWorkerRequest + (*StartWorkerResponse)(nil), // 35: org.apache.beam.model.fn_execution.v1.StartWorkerResponse + (*StopWorkerRequest)(nil), // 36: org.apache.beam.model.fn_execution.v1.StopWorkerRequest + (*StopWorkerResponse)(nil), // 37: org.apache.beam.model.fn_execution.v1.StopWorkerResponse + (*WorkerStatusRequest)(nil), // 38: org.apache.beam.model.fn_execution.v1.WorkerStatusRequest + (*WorkerStatusResponse)(nil), // 39: org.apache.beam.model.fn_execution.v1.WorkerStatusResponse + nil, // 40: org.apache.beam.model.fn_execution.v1.HarnessMonitoringInfosResponse.MonitoringDataEntry + nil, // 41: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.TransformsEntry + nil, // 42: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.PcollectionsEntry + nil, // 43: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.WindowingStrategiesEntry + nil, // 44: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.CodersEntry + nil, // 45: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.EnvironmentsEntry + nil, // 46: org.apache.beam.model.fn_execution.v1.BundleApplication.OutputWatermarksEntry + (*ProcessBundleRequest_CacheToken)(nil), // 47: org.apache.beam.model.fn_execution.v1.ProcessBundleRequest.CacheToken + (*ProcessBundleRequest_CacheToken_UserState)(nil), // 48: org.apache.beam.model.fn_execution.v1.ProcessBundleRequest.CacheToken.UserState + (*ProcessBundleRequest_CacheToken_SideInput)(nil), // 49: org.apache.beam.model.fn_execution.v1.ProcessBundleRequest.CacheToken.SideInput + nil, // 50: org.apache.beam.model.fn_execution.v1.ProcessBundleResponse.MonitoringDataEntry + nil, // 51: org.apache.beam.model.fn_execution.v1.ProcessBundleProgressResponse.MonitoringDataEntry + nil, // 52: org.apache.beam.model.fn_execution.v1.MonitoringInfosMetadataResponse.MonitoringInfoEntry + (*ProcessBundleSplitRequest_DesiredSplit)(nil), // 53: org.apache.beam.model.fn_execution.v1.ProcessBundleSplitRequest.DesiredSplit + nil, // 54: org.apache.beam.model.fn_execution.v1.ProcessBundleSplitRequest.DesiredSplitsEntry + (*ProcessBundleSplitResponse_ChannelSplit)(nil), // 55: org.apache.beam.model.fn_execution.v1.ProcessBundleSplitResponse.ChannelSplit + (*Elements_Data)(nil), // 56: org.apache.beam.model.fn_execution.v1.Elements.Data + (*Elements_Timers)(nil), // 57: org.apache.beam.model.fn_execution.v1.Elements.Timers + (*StateKey_Runner)(nil), // 58: org.apache.beam.model.fn_execution.v1.StateKey.Runner + (*StateKey_IterableSideInput)(nil), // 59: org.apache.beam.model.fn_execution.v1.StateKey.IterableSideInput + (*StateKey_MultimapSideInput)(nil), // 60: org.apache.beam.model.fn_execution.v1.StateKey.MultimapSideInput + (*StateKey_MultimapKeysSideInput)(nil), // 61: org.apache.beam.model.fn_execution.v1.StateKey.MultimapKeysSideInput + (*StateKey_BagUserState)(nil), // 62: 
org.apache.beam.model.fn_execution.v1.StateKey.BagUserState + (*LogEntry_List)(nil), // 63: org.apache.beam.model.fn_execution.v1.LogEntry.List + (*LogEntry_Severity)(nil), // 64: org.apache.beam.model.fn_execution.v1.LogEntry.Severity + nil, // 65: org.apache.beam.model.fn_execution.v1.StartWorkerRequest.ParamsEntry + (*pipeline_v1.ApiServiceDescriptor)(nil), // 66: org.apache.beam.model.pipeline.v1.ApiServiceDescriptor + (pipeline_v1.IsBounded_Enum)(0), // 67: org.apache.beam.model.pipeline.v1.IsBounded.Enum + (*durationpb.Duration)(nil), // 68: google.protobuf.Duration + (*pipeline_v1.MonitoringInfo)(nil), // 69: org.apache.beam.model.pipeline.v1.MonitoringInfo + (*timestamppb.Timestamp)(nil), // 70: google.protobuf.Timestamp + (*pipeline_v1.PTransform)(nil), // 71: org.apache.beam.model.pipeline.v1.PTransform + (*pipeline_v1.PCollection)(nil), // 72: org.apache.beam.model.pipeline.v1.PCollection + (*pipeline_v1.WindowingStrategy)(nil), // 73: org.apache.beam.model.pipeline.v1.WindowingStrategy + (*pipeline_v1.Coder)(nil), // 74: org.apache.beam.model.pipeline.v1.Coder + (*pipeline_v1.Environment)(nil), // 75: org.apache.beam.model.pipeline.v1.Environment } var file_beam_fn_api_proto_depIdxs = []int32{ - 63, // 0: org.apache.beam.model.fn_execution.v1.RemoteGrpcPort.api_service_descriptor:type_name -> org.apache.beam.model.pipeline.v1.ApiServiceDescriptor - 10, // 1: org.apache.beam.model.fn_execution.v1.InstructionRequest.process_bundle:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleRequest - 12, // 2: org.apache.beam.model.fn_execution.v1.InstructionRequest.process_bundle_progress:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleProgressRequest - 16, // 3: org.apache.beam.model.fn_execution.v1.InstructionRequest.process_bundle_split:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleSplitRequest - 18, // 4: org.apache.beam.model.fn_execution.v1.InstructionRequest.finalize_bundle:type_name -> org.apache.beam.model.fn_execution.v1.FinalizeBundleRequest - 13, // 5: org.apache.beam.model.fn_execution.v1.InstructionRequest.monitoring_infos:type_name -> org.apache.beam.model.fn_execution.v1.MonitoringInfosMetadataRequest - 5, // 6: org.apache.beam.model.fn_execution.v1.InstructionRequest.register:type_name -> org.apache.beam.model.fn_execution.v1.RegisterRequest - 11, // 7: org.apache.beam.model.fn_execution.v1.InstructionResponse.process_bundle:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleResponse - 14, // 8: org.apache.beam.model.fn_execution.v1.InstructionResponse.process_bundle_progress:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleProgressResponse - 17, // 9: org.apache.beam.model.fn_execution.v1.InstructionResponse.process_bundle_split:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleSplitResponse - 19, // 10: org.apache.beam.model.fn_execution.v1.InstructionResponse.finalize_bundle:type_name -> org.apache.beam.model.fn_execution.v1.FinalizeBundleResponse - 15, // 11: org.apache.beam.model.fn_execution.v1.InstructionResponse.monitoring_infos:type_name -> org.apache.beam.model.fn_execution.v1.MonitoringInfosMetadataResponse - 6, // 12: org.apache.beam.model.fn_execution.v1.InstructionResponse.register:type_name -> org.apache.beam.model.fn_execution.v1.RegisterResponse - 7, // 13: org.apache.beam.model.fn_execution.v1.RegisterRequest.process_bundle_descriptor:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor - 38, // 14: 
org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.transforms:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.TransformsEntry - 39, // 15: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.pcollections:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.PcollectionsEntry - 40, // 16: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.windowing_strategies:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.WindowingStrategiesEntry - 41, // 17: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.coders:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.CodersEntry - 42, // 18: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.environments:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.EnvironmentsEntry - 63, // 19: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.state_api_service_descriptor:type_name -> org.apache.beam.model.pipeline.v1.ApiServiceDescriptor - 63, // 20: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.timer_api_service_descriptor:type_name -> org.apache.beam.model.pipeline.v1.ApiServiceDescriptor - 43, // 21: org.apache.beam.model.fn_execution.v1.BundleApplication.output_watermarks:type_name -> org.apache.beam.model.fn_execution.v1.BundleApplication.OutputWatermarksEntry - 64, // 22: org.apache.beam.model.fn_execution.v1.BundleApplication.is_bounded:type_name -> org.apache.beam.model.pipeline.v1.IsBounded.Enum - 8, // 23: org.apache.beam.model.fn_execution.v1.DelayedBundleApplication.application:type_name -> org.apache.beam.model.fn_execution.v1.BundleApplication - 65, // 24: org.apache.beam.model.fn_execution.v1.DelayedBundleApplication.requested_time_delay:type_name -> google.protobuf.Duration - 44, // 25: org.apache.beam.model.fn_execution.v1.ProcessBundleRequest.cache_tokens:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleRequest.CacheToken - 9, // 26: org.apache.beam.model.fn_execution.v1.ProcessBundleResponse.residual_roots:type_name -> org.apache.beam.model.fn_execution.v1.DelayedBundleApplication - 66, // 27: org.apache.beam.model.fn_execution.v1.ProcessBundleResponse.monitoring_infos:type_name -> org.apache.beam.model.pipeline.v1.MonitoringInfo - 47, // 28: org.apache.beam.model.fn_execution.v1.ProcessBundleResponse.monitoring_data:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleResponse.MonitoringDataEntry - 66, // 29: org.apache.beam.model.fn_execution.v1.ProcessBundleProgressResponse.monitoring_infos:type_name -> org.apache.beam.model.pipeline.v1.MonitoringInfo - 48, // 30: org.apache.beam.model.fn_execution.v1.ProcessBundleProgressResponse.monitoring_data:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleProgressResponse.MonitoringDataEntry - 49, // 31: org.apache.beam.model.fn_execution.v1.MonitoringInfosMetadataResponse.monitoring_info:type_name -> org.apache.beam.model.fn_execution.v1.MonitoringInfosMetadataResponse.MonitoringInfoEntry - 51, // 32: org.apache.beam.model.fn_execution.v1.ProcessBundleSplitRequest.desired_splits:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleSplitRequest.DesiredSplitsEntry - 8, // 33: org.apache.beam.model.fn_execution.v1.ProcessBundleSplitResponse.primary_roots:type_name -> org.apache.beam.model.fn_execution.v1.BundleApplication - 9, // 34: org.apache.beam.model.fn_execution.v1.ProcessBundleSplitResponse.residual_roots:type_name -> 
org.apache.beam.model.fn_execution.v1.DelayedBundleApplication - 52, // 35: org.apache.beam.model.fn_execution.v1.ProcessBundleSplitResponse.channel_splits:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleSplitResponse.ChannelSplit - 53, // 36: org.apache.beam.model.fn_execution.v1.Elements.data:type_name -> org.apache.beam.model.fn_execution.v1.Elements.Data - 54, // 37: org.apache.beam.model.fn_execution.v1.Elements.timers:type_name -> org.apache.beam.model.fn_execution.v1.Elements.Timers - 23, // 38: org.apache.beam.model.fn_execution.v1.StateRequest.state_key:type_name -> org.apache.beam.model.fn_execution.v1.StateKey - 24, // 39: org.apache.beam.model.fn_execution.v1.StateRequest.get:type_name -> org.apache.beam.model.fn_execution.v1.StateGetRequest - 26, // 40: org.apache.beam.model.fn_execution.v1.StateRequest.append:type_name -> org.apache.beam.model.fn_execution.v1.StateAppendRequest - 28, // 41: org.apache.beam.model.fn_execution.v1.StateRequest.clear:type_name -> org.apache.beam.model.fn_execution.v1.StateClearRequest - 25, // 42: org.apache.beam.model.fn_execution.v1.StateResponse.get:type_name -> org.apache.beam.model.fn_execution.v1.StateGetResponse - 27, // 43: org.apache.beam.model.fn_execution.v1.StateResponse.append:type_name -> org.apache.beam.model.fn_execution.v1.StateAppendResponse - 29, // 44: org.apache.beam.model.fn_execution.v1.StateResponse.clear:type_name -> org.apache.beam.model.fn_execution.v1.StateClearResponse - 55, // 45: org.apache.beam.model.fn_execution.v1.StateKey.runner:type_name -> org.apache.beam.model.fn_execution.v1.StateKey.Runner - 57, // 46: org.apache.beam.model.fn_execution.v1.StateKey.multimap_side_input:type_name -> org.apache.beam.model.fn_execution.v1.StateKey.MultimapSideInput - 59, // 47: org.apache.beam.model.fn_execution.v1.StateKey.bag_user_state:type_name -> org.apache.beam.model.fn_execution.v1.StateKey.BagUserState - 56, // 48: org.apache.beam.model.fn_execution.v1.StateKey.iterable_side_input:type_name -> org.apache.beam.model.fn_execution.v1.StateKey.IterableSideInput - 58, // 49: org.apache.beam.model.fn_execution.v1.StateKey.multimap_keys_side_input:type_name -> org.apache.beam.model.fn_execution.v1.StateKey.MultimapKeysSideInput - 0, // 50: org.apache.beam.model.fn_execution.v1.LogEntry.severity:type_name -> org.apache.beam.model.fn_execution.v1.LogEntry.Severity.Enum - 67, // 51: org.apache.beam.model.fn_execution.v1.LogEntry.timestamp:type_name -> google.protobuf.Timestamp - 63, // 52: org.apache.beam.model.fn_execution.v1.StartWorkerRequest.control_endpoint:type_name -> org.apache.beam.model.pipeline.v1.ApiServiceDescriptor - 63, // 53: org.apache.beam.model.fn_execution.v1.StartWorkerRequest.logging_endpoint:type_name -> org.apache.beam.model.pipeline.v1.ApiServiceDescriptor - 63, // 54: org.apache.beam.model.fn_execution.v1.StartWorkerRequest.artifact_endpoint:type_name -> org.apache.beam.model.pipeline.v1.ApiServiceDescriptor - 63, // 55: org.apache.beam.model.fn_execution.v1.StartWorkerRequest.provision_endpoint:type_name -> org.apache.beam.model.pipeline.v1.ApiServiceDescriptor - 62, // 56: org.apache.beam.model.fn_execution.v1.StartWorkerRequest.params:type_name -> org.apache.beam.model.fn_execution.v1.StartWorkerRequest.ParamsEntry - 68, // 57: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.TransformsEntry.value:type_name -> org.apache.beam.model.pipeline.v1.PTransform - 69, // 58: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.PcollectionsEntry.value:type_name -> 
org.apache.beam.model.pipeline.v1.PCollection - 70, // 59: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.WindowingStrategiesEntry.value:type_name -> org.apache.beam.model.pipeline.v1.WindowingStrategy - 71, // 60: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.CodersEntry.value:type_name -> org.apache.beam.model.pipeline.v1.Coder - 72, // 61: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.EnvironmentsEntry.value:type_name -> org.apache.beam.model.pipeline.v1.Environment - 67, // 62: org.apache.beam.model.fn_execution.v1.BundleApplication.OutputWatermarksEntry.value:type_name -> google.protobuf.Timestamp - 45, // 63: org.apache.beam.model.fn_execution.v1.ProcessBundleRequest.CacheToken.user_state:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleRequest.CacheToken.UserState - 46, // 64: org.apache.beam.model.fn_execution.v1.ProcessBundleRequest.CacheToken.side_input:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleRequest.CacheToken.SideInput - 66, // 65: org.apache.beam.model.fn_execution.v1.MonitoringInfosMetadataResponse.MonitoringInfoEntry.value:type_name -> org.apache.beam.model.pipeline.v1.MonitoringInfo - 50, // 66: org.apache.beam.model.fn_execution.v1.ProcessBundleSplitRequest.DesiredSplitsEntry.value:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleSplitRequest.DesiredSplit - 30, // 67: org.apache.beam.model.fn_execution.v1.LogEntry.List.log_entries:type_name -> org.apache.beam.model.fn_execution.v1.LogEntry - 4, // 68: org.apache.beam.model.fn_execution.v1.BeamFnControl.Control:input_type -> org.apache.beam.model.fn_execution.v1.InstructionResponse - 2, // 69: org.apache.beam.model.fn_execution.v1.BeamFnControl.GetProcessBundleDescriptor:input_type -> org.apache.beam.model.fn_execution.v1.GetProcessBundleDescriptorRequest - 20, // 70: org.apache.beam.model.fn_execution.v1.BeamFnData.Data:input_type -> org.apache.beam.model.fn_execution.v1.Elements - 21, // 71: org.apache.beam.model.fn_execution.v1.BeamFnState.State:input_type -> org.apache.beam.model.fn_execution.v1.StateRequest - 60, // 72: org.apache.beam.model.fn_execution.v1.BeamFnLogging.Logging:input_type -> org.apache.beam.model.fn_execution.v1.LogEntry.List - 32, // 73: org.apache.beam.model.fn_execution.v1.BeamFnExternalWorkerPool.StartWorker:input_type -> org.apache.beam.model.fn_execution.v1.StartWorkerRequest - 34, // 74: org.apache.beam.model.fn_execution.v1.BeamFnExternalWorkerPool.StopWorker:input_type -> org.apache.beam.model.fn_execution.v1.StopWorkerRequest - 37, // 75: org.apache.beam.model.fn_execution.v1.BeamFnWorkerStatus.WorkerStatus:input_type -> org.apache.beam.model.fn_execution.v1.WorkerStatusResponse - 3, // 76: org.apache.beam.model.fn_execution.v1.BeamFnControl.Control:output_type -> org.apache.beam.model.fn_execution.v1.InstructionRequest - 7, // 77: org.apache.beam.model.fn_execution.v1.BeamFnControl.GetProcessBundleDescriptor:output_type -> org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor - 20, // 78: org.apache.beam.model.fn_execution.v1.BeamFnData.Data:output_type -> org.apache.beam.model.fn_execution.v1.Elements - 22, // 79: org.apache.beam.model.fn_execution.v1.BeamFnState.State:output_type -> org.apache.beam.model.fn_execution.v1.StateResponse - 31, // 80: org.apache.beam.model.fn_execution.v1.BeamFnLogging.Logging:output_type -> org.apache.beam.model.fn_execution.v1.LogControl - 33, // 81: org.apache.beam.model.fn_execution.v1.BeamFnExternalWorkerPool.StartWorker:output_type -> 
org.apache.beam.model.fn_execution.v1.StartWorkerResponse - 35, // 82: org.apache.beam.model.fn_execution.v1.BeamFnExternalWorkerPool.StopWorker:output_type -> org.apache.beam.model.fn_execution.v1.StopWorkerResponse - 36, // 83: org.apache.beam.model.fn_execution.v1.BeamFnWorkerStatus.WorkerStatus:output_type -> org.apache.beam.model.fn_execution.v1.WorkerStatusRequest - 76, // [76:84] is the sub-list for method output_type - 68, // [68:76] is the sub-list for method input_type - 68, // [68:68] is the sub-list for extension type_name - 68, // [68:68] is the sub-list for extension extendee - 0, // [0:68] is the sub-list for field type_name + 66, // 0: org.apache.beam.model.fn_execution.v1.RemoteGrpcPort.api_service_descriptor:type_name -> org.apache.beam.model.pipeline.v1.ApiServiceDescriptor + 12, // 1: org.apache.beam.model.fn_execution.v1.InstructionRequest.process_bundle:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleRequest + 14, // 2: org.apache.beam.model.fn_execution.v1.InstructionRequest.process_bundle_progress:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleProgressRequest + 18, // 3: org.apache.beam.model.fn_execution.v1.InstructionRequest.process_bundle_split:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleSplitRequest + 20, // 4: org.apache.beam.model.fn_execution.v1.InstructionRequest.finalize_bundle:type_name -> org.apache.beam.model.fn_execution.v1.FinalizeBundleRequest + 15, // 5: org.apache.beam.model.fn_execution.v1.InstructionRequest.monitoring_infos:type_name -> org.apache.beam.model.fn_execution.v1.MonitoringInfosMetadataRequest + 5, // 6: org.apache.beam.model.fn_execution.v1.InstructionRequest.harness_monitoring_infos:type_name -> org.apache.beam.model.fn_execution.v1.HarnessMonitoringInfosRequest + 7, // 7: org.apache.beam.model.fn_execution.v1.InstructionRequest.register:type_name -> org.apache.beam.model.fn_execution.v1.RegisterRequest + 13, // 8: org.apache.beam.model.fn_execution.v1.InstructionResponse.process_bundle:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleResponse + 16, // 9: org.apache.beam.model.fn_execution.v1.InstructionResponse.process_bundle_progress:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleProgressResponse + 19, // 10: org.apache.beam.model.fn_execution.v1.InstructionResponse.process_bundle_split:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleSplitResponse + 21, // 11: org.apache.beam.model.fn_execution.v1.InstructionResponse.finalize_bundle:type_name -> org.apache.beam.model.fn_execution.v1.FinalizeBundleResponse + 17, // 12: org.apache.beam.model.fn_execution.v1.InstructionResponse.monitoring_infos:type_name -> org.apache.beam.model.fn_execution.v1.MonitoringInfosMetadataResponse + 6, // 13: org.apache.beam.model.fn_execution.v1.InstructionResponse.harness_monitoring_infos:type_name -> org.apache.beam.model.fn_execution.v1.HarnessMonitoringInfosResponse + 8, // 14: org.apache.beam.model.fn_execution.v1.InstructionResponse.register:type_name -> org.apache.beam.model.fn_execution.v1.RegisterResponse + 40, // 15: org.apache.beam.model.fn_execution.v1.HarnessMonitoringInfosResponse.monitoring_data:type_name -> org.apache.beam.model.fn_execution.v1.HarnessMonitoringInfosResponse.MonitoringDataEntry + 9, // 16: org.apache.beam.model.fn_execution.v1.RegisterRequest.process_bundle_descriptor:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor + 41, // 17: 
org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.transforms:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.TransformsEntry + 42, // 18: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.pcollections:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.PcollectionsEntry + 43, // 19: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.windowing_strategies:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.WindowingStrategiesEntry + 44, // 20: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.coders:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.CodersEntry + 45, // 21: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.environments:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.EnvironmentsEntry + 66, // 22: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.state_api_service_descriptor:type_name -> org.apache.beam.model.pipeline.v1.ApiServiceDescriptor + 66, // 23: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.timer_api_service_descriptor:type_name -> org.apache.beam.model.pipeline.v1.ApiServiceDescriptor + 46, // 24: org.apache.beam.model.fn_execution.v1.BundleApplication.output_watermarks:type_name -> org.apache.beam.model.fn_execution.v1.BundleApplication.OutputWatermarksEntry + 67, // 25: org.apache.beam.model.fn_execution.v1.BundleApplication.is_bounded:type_name -> org.apache.beam.model.pipeline.v1.IsBounded.Enum + 10, // 26: org.apache.beam.model.fn_execution.v1.DelayedBundleApplication.application:type_name -> org.apache.beam.model.fn_execution.v1.BundleApplication + 68, // 27: org.apache.beam.model.fn_execution.v1.DelayedBundleApplication.requested_time_delay:type_name -> google.protobuf.Duration + 47, // 28: org.apache.beam.model.fn_execution.v1.ProcessBundleRequest.cache_tokens:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleRequest.CacheToken + 11, // 29: org.apache.beam.model.fn_execution.v1.ProcessBundleResponse.residual_roots:type_name -> org.apache.beam.model.fn_execution.v1.DelayedBundleApplication + 69, // 30: org.apache.beam.model.fn_execution.v1.ProcessBundleResponse.monitoring_infos:type_name -> org.apache.beam.model.pipeline.v1.MonitoringInfo + 50, // 31: org.apache.beam.model.fn_execution.v1.ProcessBundleResponse.monitoring_data:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleResponse.MonitoringDataEntry + 69, // 32: org.apache.beam.model.fn_execution.v1.ProcessBundleProgressResponse.monitoring_infos:type_name -> org.apache.beam.model.pipeline.v1.MonitoringInfo + 51, // 33: org.apache.beam.model.fn_execution.v1.ProcessBundleProgressResponse.monitoring_data:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleProgressResponse.MonitoringDataEntry + 52, // 34: org.apache.beam.model.fn_execution.v1.MonitoringInfosMetadataResponse.monitoring_info:type_name -> org.apache.beam.model.fn_execution.v1.MonitoringInfosMetadataResponse.MonitoringInfoEntry + 54, // 35: org.apache.beam.model.fn_execution.v1.ProcessBundleSplitRequest.desired_splits:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleSplitRequest.DesiredSplitsEntry + 10, // 36: org.apache.beam.model.fn_execution.v1.ProcessBundleSplitResponse.primary_roots:type_name -> org.apache.beam.model.fn_execution.v1.BundleApplication + 11, // 37: org.apache.beam.model.fn_execution.v1.ProcessBundleSplitResponse.residual_roots:type_name -> 
org.apache.beam.model.fn_execution.v1.DelayedBundleApplication + 55, // 38: org.apache.beam.model.fn_execution.v1.ProcessBundleSplitResponse.channel_splits:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleSplitResponse.ChannelSplit + 56, // 39: org.apache.beam.model.fn_execution.v1.Elements.data:type_name -> org.apache.beam.model.fn_execution.v1.Elements.Data + 57, // 40: org.apache.beam.model.fn_execution.v1.Elements.timers:type_name -> org.apache.beam.model.fn_execution.v1.Elements.Timers + 25, // 41: org.apache.beam.model.fn_execution.v1.StateRequest.state_key:type_name -> org.apache.beam.model.fn_execution.v1.StateKey + 26, // 42: org.apache.beam.model.fn_execution.v1.StateRequest.get:type_name -> org.apache.beam.model.fn_execution.v1.StateGetRequest + 28, // 43: org.apache.beam.model.fn_execution.v1.StateRequest.append:type_name -> org.apache.beam.model.fn_execution.v1.StateAppendRequest + 30, // 44: org.apache.beam.model.fn_execution.v1.StateRequest.clear:type_name -> org.apache.beam.model.fn_execution.v1.StateClearRequest + 27, // 45: org.apache.beam.model.fn_execution.v1.StateResponse.get:type_name -> org.apache.beam.model.fn_execution.v1.StateGetResponse + 29, // 46: org.apache.beam.model.fn_execution.v1.StateResponse.append:type_name -> org.apache.beam.model.fn_execution.v1.StateAppendResponse + 31, // 47: org.apache.beam.model.fn_execution.v1.StateResponse.clear:type_name -> org.apache.beam.model.fn_execution.v1.StateClearResponse + 58, // 48: org.apache.beam.model.fn_execution.v1.StateKey.runner:type_name -> org.apache.beam.model.fn_execution.v1.StateKey.Runner + 60, // 49: org.apache.beam.model.fn_execution.v1.StateKey.multimap_side_input:type_name -> org.apache.beam.model.fn_execution.v1.StateKey.MultimapSideInput + 62, // 50: org.apache.beam.model.fn_execution.v1.StateKey.bag_user_state:type_name -> org.apache.beam.model.fn_execution.v1.StateKey.BagUserState + 59, // 51: org.apache.beam.model.fn_execution.v1.StateKey.iterable_side_input:type_name -> org.apache.beam.model.fn_execution.v1.StateKey.IterableSideInput + 61, // 52: org.apache.beam.model.fn_execution.v1.StateKey.multimap_keys_side_input:type_name -> org.apache.beam.model.fn_execution.v1.StateKey.MultimapKeysSideInput + 0, // 53: org.apache.beam.model.fn_execution.v1.LogEntry.severity:type_name -> org.apache.beam.model.fn_execution.v1.LogEntry.Severity.Enum + 70, // 54: org.apache.beam.model.fn_execution.v1.LogEntry.timestamp:type_name -> google.protobuf.Timestamp + 66, // 55: org.apache.beam.model.fn_execution.v1.StartWorkerRequest.control_endpoint:type_name -> org.apache.beam.model.pipeline.v1.ApiServiceDescriptor + 66, // 56: org.apache.beam.model.fn_execution.v1.StartWorkerRequest.logging_endpoint:type_name -> org.apache.beam.model.pipeline.v1.ApiServiceDescriptor + 66, // 57: org.apache.beam.model.fn_execution.v1.StartWorkerRequest.artifact_endpoint:type_name -> org.apache.beam.model.pipeline.v1.ApiServiceDescriptor + 66, // 58: org.apache.beam.model.fn_execution.v1.StartWorkerRequest.provision_endpoint:type_name -> org.apache.beam.model.pipeline.v1.ApiServiceDescriptor + 65, // 59: org.apache.beam.model.fn_execution.v1.StartWorkerRequest.params:type_name -> org.apache.beam.model.fn_execution.v1.StartWorkerRequest.ParamsEntry + 71, // 60: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.TransformsEntry.value:type_name -> org.apache.beam.model.pipeline.v1.PTransform + 72, // 61: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.PcollectionsEntry.value:type_name -> 
org.apache.beam.model.pipeline.v1.PCollection + 73, // 62: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.WindowingStrategiesEntry.value:type_name -> org.apache.beam.model.pipeline.v1.WindowingStrategy + 74, // 63: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.CodersEntry.value:type_name -> org.apache.beam.model.pipeline.v1.Coder + 75, // 64: org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor.EnvironmentsEntry.value:type_name -> org.apache.beam.model.pipeline.v1.Environment + 70, // 65: org.apache.beam.model.fn_execution.v1.BundleApplication.OutputWatermarksEntry.value:type_name -> google.protobuf.Timestamp + 48, // 66: org.apache.beam.model.fn_execution.v1.ProcessBundleRequest.CacheToken.user_state:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleRequest.CacheToken.UserState + 49, // 67: org.apache.beam.model.fn_execution.v1.ProcessBundleRequest.CacheToken.side_input:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleRequest.CacheToken.SideInput + 69, // 68: org.apache.beam.model.fn_execution.v1.MonitoringInfosMetadataResponse.MonitoringInfoEntry.value:type_name -> org.apache.beam.model.pipeline.v1.MonitoringInfo + 53, // 69: org.apache.beam.model.fn_execution.v1.ProcessBundleSplitRequest.DesiredSplitsEntry.value:type_name -> org.apache.beam.model.fn_execution.v1.ProcessBundleSplitRequest.DesiredSplit + 32, // 70: org.apache.beam.model.fn_execution.v1.LogEntry.List.log_entries:type_name -> org.apache.beam.model.fn_execution.v1.LogEntry + 4, // 71: org.apache.beam.model.fn_execution.v1.BeamFnControl.Control:input_type -> org.apache.beam.model.fn_execution.v1.InstructionResponse + 2, // 72: org.apache.beam.model.fn_execution.v1.BeamFnControl.GetProcessBundleDescriptor:input_type -> org.apache.beam.model.fn_execution.v1.GetProcessBundleDescriptorRequest + 22, // 73: org.apache.beam.model.fn_execution.v1.BeamFnData.Data:input_type -> org.apache.beam.model.fn_execution.v1.Elements + 23, // 74: org.apache.beam.model.fn_execution.v1.BeamFnState.State:input_type -> org.apache.beam.model.fn_execution.v1.StateRequest + 63, // 75: org.apache.beam.model.fn_execution.v1.BeamFnLogging.Logging:input_type -> org.apache.beam.model.fn_execution.v1.LogEntry.List + 34, // 76: org.apache.beam.model.fn_execution.v1.BeamFnExternalWorkerPool.StartWorker:input_type -> org.apache.beam.model.fn_execution.v1.StartWorkerRequest + 36, // 77: org.apache.beam.model.fn_execution.v1.BeamFnExternalWorkerPool.StopWorker:input_type -> org.apache.beam.model.fn_execution.v1.StopWorkerRequest + 39, // 78: org.apache.beam.model.fn_execution.v1.BeamFnWorkerStatus.WorkerStatus:input_type -> org.apache.beam.model.fn_execution.v1.WorkerStatusResponse + 3, // 79: org.apache.beam.model.fn_execution.v1.BeamFnControl.Control:output_type -> org.apache.beam.model.fn_execution.v1.InstructionRequest + 9, // 80: org.apache.beam.model.fn_execution.v1.BeamFnControl.GetProcessBundleDescriptor:output_type -> org.apache.beam.model.fn_execution.v1.ProcessBundleDescriptor + 22, // 81: org.apache.beam.model.fn_execution.v1.BeamFnData.Data:output_type -> org.apache.beam.model.fn_execution.v1.Elements + 24, // 82: org.apache.beam.model.fn_execution.v1.BeamFnState.State:output_type -> org.apache.beam.model.fn_execution.v1.StateResponse + 33, // 83: org.apache.beam.model.fn_execution.v1.BeamFnLogging.Logging:output_type -> org.apache.beam.model.fn_execution.v1.LogControl + 35, // 84: org.apache.beam.model.fn_execution.v1.BeamFnExternalWorkerPool.StartWorker:output_type -> 
org.apache.beam.model.fn_execution.v1.StartWorkerResponse + 37, // 85: org.apache.beam.model.fn_execution.v1.BeamFnExternalWorkerPool.StopWorker:output_type -> org.apache.beam.model.fn_execution.v1.StopWorkerResponse + 38, // 86: org.apache.beam.model.fn_execution.v1.BeamFnWorkerStatus.WorkerStatus:output_type -> org.apache.beam.model.fn_execution.v1.WorkerStatusRequest + 79, // [79:87] is the sub-list for method output_type + 71, // [71:79] is the sub-list for method input_type + 71, // [71:71] is the sub-list for extension type_name + 71, // [71:71] is the sub-list for extension extendee + 0, // [0:71] is the sub-list for field type_name } func init() { file_beam_fn_api_proto_init() } @@ -4783,7 +4958,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[4].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*RegisterRequest); i { + switch v := v.(*HarnessMonitoringInfosRequest); i { case 0: return &v.state case 1: @@ -4795,7 +4970,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[5].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*RegisterResponse); i { + switch v := v.(*HarnessMonitoringInfosResponse); i { case 0: return &v.state case 1: @@ -4807,7 +4982,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[6].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*ProcessBundleDescriptor); i { + switch v := v.(*RegisterRequest); i { case 0: return &v.state case 1: @@ -4819,7 +4994,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[7].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*BundleApplication); i { + switch v := v.(*RegisterResponse); i { case 0: return &v.state case 1: @@ -4831,7 +5006,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[8].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*DelayedBundleApplication); i { + switch v := v.(*ProcessBundleDescriptor); i { case 0: return &v.state case 1: @@ -4843,7 +5018,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[9].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*ProcessBundleRequest); i { + switch v := v.(*BundleApplication); i { case 0: return &v.state case 1: @@ -4855,7 +5030,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[10].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*ProcessBundleResponse); i { + switch v := v.(*DelayedBundleApplication); i { case 0: return &v.state case 1: @@ -4867,7 +5042,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[11].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*ProcessBundleProgressRequest); i { + switch v := v.(*ProcessBundleRequest); i { case 0: return &v.state case 1: @@ -4879,7 +5054,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[12].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*MonitoringInfosMetadataRequest); i { + switch v := v.(*ProcessBundleResponse); i { case 0: return &v.state case 1: @@ -4891,7 +5066,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[13].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*ProcessBundleProgressResponse); i { + switch v := v.(*ProcessBundleProgressRequest); i { case 0: return &v.state case 1: @@ -4903,7 +5078,7 @@ func file_beam_fn_api_proto_init() { } } 
file_beam_fn_api_proto_msgTypes[14].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*MonitoringInfosMetadataResponse); i { + switch v := v.(*MonitoringInfosMetadataRequest); i { case 0: return &v.state case 1: @@ -4915,7 +5090,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[15].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*ProcessBundleSplitRequest); i { + switch v := v.(*ProcessBundleProgressResponse); i { case 0: return &v.state case 1: @@ -4927,7 +5102,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[16].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*ProcessBundleSplitResponse); i { + switch v := v.(*MonitoringInfosMetadataResponse); i { case 0: return &v.state case 1: @@ -4939,7 +5114,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[17].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*FinalizeBundleRequest); i { + switch v := v.(*ProcessBundleSplitRequest); i { case 0: return &v.state case 1: @@ -4951,7 +5126,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[18].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*FinalizeBundleResponse); i { + switch v := v.(*ProcessBundleSplitResponse); i { case 0: return &v.state case 1: @@ -4963,7 +5138,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[19].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*Elements); i { + switch v := v.(*FinalizeBundleRequest); i { case 0: return &v.state case 1: @@ -4975,7 +5150,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[20].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*StateRequest); i { + switch v := v.(*FinalizeBundleResponse); i { case 0: return &v.state case 1: @@ -4987,7 +5162,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[21].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*StateResponse); i { + switch v := v.(*Elements); i { case 0: return &v.state case 1: @@ -4999,7 +5174,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[22].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*StateKey); i { + switch v := v.(*StateRequest); i { case 0: return &v.state case 1: @@ -5011,7 +5186,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[23].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*StateGetRequest); i { + switch v := v.(*StateResponse); i { case 0: return &v.state case 1: @@ -5023,7 +5198,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[24].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*StateGetResponse); i { + switch v := v.(*StateKey); i { case 0: return &v.state case 1: @@ -5035,7 +5210,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[25].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*StateAppendRequest); i { + switch v := v.(*StateGetRequest); i { case 0: return &v.state case 1: @@ -5047,7 +5222,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[26].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*StateAppendResponse); i { + switch v := v.(*StateGetResponse); i { case 0: return &v.state case 1: @@ -5059,7 +5234,7 @@ func file_beam_fn_api_proto_init() { } } 
file_beam_fn_api_proto_msgTypes[27].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*StateClearRequest); i { + switch v := v.(*StateAppendRequest); i { case 0: return &v.state case 1: @@ -5071,7 +5246,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[28].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*StateClearResponse); i { + switch v := v.(*StateAppendResponse); i { case 0: return &v.state case 1: @@ -5083,7 +5258,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[29].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*LogEntry); i { + switch v := v.(*StateClearRequest); i { case 0: return &v.state case 1: @@ -5095,7 +5270,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[30].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*LogControl); i { + switch v := v.(*StateClearResponse); i { case 0: return &v.state case 1: @@ -5107,7 +5282,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[31].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*StartWorkerRequest); i { + switch v := v.(*LogEntry); i { case 0: return &v.state case 1: @@ -5119,7 +5294,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[32].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*StartWorkerResponse); i { + switch v := v.(*LogControl); i { case 0: return &v.state case 1: @@ -5131,7 +5306,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[33].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*StopWorkerRequest); i { + switch v := v.(*StartWorkerRequest); i { case 0: return &v.state case 1: @@ -5143,7 +5318,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[34].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*StopWorkerResponse); i { + switch v := v.(*StartWorkerResponse); i { case 0: return &v.state case 1: @@ -5155,7 +5330,7 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[35].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*WorkerStatusRequest); i { + switch v := v.(*StopWorkerRequest); i { case 0: return &v.state case 1: @@ -5167,6 +5342,30 @@ func file_beam_fn_api_proto_init() { } } file_beam_fn_api_proto_msgTypes[36].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*StopWorkerResponse); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_beam_fn_api_proto_msgTypes[37].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*WorkerStatusRequest); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_beam_fn_api_proto_msgTypes[38].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*WorkerStatusResponse); i { case 0: return &v.state @@ -5178,7 +5377,7 @@ func file_beam_fn_api_proto_init() { return nil } } - file_beam_fn_api_proto_msgTypes[43].Exporter = func(v interface{}, i int) interface{} { + file_beam_fn_api_proto_msgTypes[46].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*ProcessBundleRequest_CacheToken); i { case 0: return &v.state @@ -5190,7 +5389,7 @@ func file_beam_fn_api_proto_init() { return nil } } - file_beam_fn_api_proto_msgTypes[44].Exporter = func(v interface{}, i int) 
interface{} { + file_beam_fn_api_proto_msgTypes[47].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*ProcessBundleRequest_CacheToken_UserState); i { case 0: return &v.state @@ -5202,7 +5401,7 @@ func file_beam_fn_api_proto_init() { return nil } } - file_beam_fn_api_proto_msgTypes[45].Exporter = func(v interface{}, i int) interface{} { + file_beam_fn_api_proto_msgTypes[48].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*ProcessBundleRequest_CacheToken_SideInput); i { case 0: return &v.state @@ -5214,7 +5413,7 @@ func file_beam_fn_api_proto_init() { return nil } } - file_beam_fn_api_proto_msgTypes[49].Exporter = func(v interface{}, i int) interface{} { + file_beam_fn_api_proto_msgTypes[52].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*ProcessBundleSplitRequest_DesiredSplit); i { case 0: return &v.state @@ -5226,7 +5425,7 @@ func file_beam_fn_api_proto_init() { return nil } } - file_beam_fn_api_proto_msgTypes[51].Exporter = func(v interface{}, i int) interface{} { + file_beam_fn_api_proto_msgTypes[54].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*ProcessBundleSplitResponse_ChannelSplit); i { case 0: return &v.state @@ -5238,7 +5437,7 @@ func file_beam_fn_api_proto_init() { return nil } } - file_beam_fn_api_proto_msgTypes[52].Exporter = func(v interface{}, i int) interface{} { + file_beam_fn_api_proto_msgTypes[55].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*Elements_Data); i { case 0: return &v.state @@ -5250,7 +5449,7 @@ func file_beam_fn_api_proto_init() { return nil } } - file_beam_fn_api_proto_msgTypes[53].Exporter = func(v interface{}, i int) interface{} { + file_beam_fn_api_proto_msgTypes[56].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*Elements_Timers); i { case 0: return &v.state @@ -5262,7 +5461,7 @@ func file_beam_fn_api_proto_init() { return nil } } - file_beam_fn_api_proto_msgTypes[54].Exporter = func(v interface{}, i int) interface{} { + file_beam_fn_api_proto_msgTypes[57].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*StateKey_Runner); i { case 0: return &v.state @@ -5274,7 +5473,7 @@ func file_beam_fn_api_proto_init() { return nil } } - file_beam_fn_api_proto_msgTypes[55].Exporter = func(v interface{}, i int) interface{} { + file_beam_fn_api_proto_msgTypes[58].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*StateKey_IterableSideInput); i { case 0: return &v.state @@ -5286,7 +5485,7 @@ func file_beam_fn_api_proto_init() { return nil } } - file_beam_fn_api_proto_msgTypes[56].Exporter = func(v interface{}, i int) interface{} { + file_beam_fn_api_proto_msgTypes[59].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*StateKey_MultimapSideInput); i { case 0: return &v.state @@ -5298,7 +5497,7 @@ func file_beam_fn_api_proto_init() { return nil } } - file_beam_fn_api_proto_msgTypes[57].Exporter = func(v interface{}, i int) interface{} { + file_beam_fn_api_proto_msgTypes[60].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*StateKey_MultimapKeysSideInput); i { case 0: return &v.state @@ -5310,7 +5509,7 @@ func file_beam_fn_api_proto_init() { return nil } } - file_beam_fn_api_proto_msgTypes[58].Exporter = func(v interface{}, i int) interface{} { + file_beam_fn_api_proto_msgTypes[61].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*StateKey_BagUserState); i { case 0: return &v.state @@ -5322,7 +5521,7 @@ func file_beam_fn_api_proto_init() { return nil } } - 
file_beam_fn_api_proto_msgTypes[59].Exporter = func(v interface{}, i int) interface{} { + file_beam_fn_api_proto_msgTypes[62].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*LogEntry_List); i { case 0: return &v.state @@ -5334,7 +5533,7 @@ func file_beam_fn_api_proto_init() { return nil } } - file_beam_fn_api_proto_msgTypes[60].Exporter = func(v interface{}, i int) interface{} { + file_beam_fn_api_proto_msgTypes[63].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*LogEntry_Severity); i { case 0: return &v.state @@ -5353,6 +5552,7 @@ func file_beam_fn_api_proto_init() { (*InstructionRequest_ProcessBundleSplit)(nil), (*InstructionRequest_FinalizeBundle)(nil), (*InstructionRequest_MonitoringInfos)(nil), + (*InstructionRequest_HarnessMonitoringInfos)(nil), (*InstructionRequest_Register)(nil), } file_beam_fn_api_proto_msgTypes[3].OneofWrappers = []interface{}{ @@ -5361,26 +5561,27 @@ func file_beam_fn_api_proto_init() { (*InstructionResponse_ProcessBundleSplit)(nil), (*InstructionResponse_FinalizeBundle)(nil), (*InstructionResponse_MonitoringInfos)(nil), + (*InstructionResponse_HarnessMonitoringInfos)(nil), (*InstructionResponse_Register)(nil), } - file_beam_fn_api_proto_msgTypes[20].OneofWrappers = []interface{}{ + file_beam_fn_api_proto_msgTypes[22].OneofWrappers = []interface{}{ (*StateRequest_Get)(nil), (*StateRequest_Append)(nil), (*StateRequest_Clear)(nil), } - file_beam_fn_api_proto_msgTypes[21].OneofWrappers = []interface{}{ + file_beam_fn_api_proto_msgTypes[23].OneofWrappers = []interface{}{ (*StateResponse_Get)(nil), (*StateResponse_Append)(nil), (*StateResponse_Clear)(nil), } - file_beam_fn_api_proto_msgTypes[22].OneofWrappers = []interface{}{ + file_beam_fn_api_proto_msgTypes[24].OneofWrappers = []interface{}{ (*StateKey_Runner_)(nil), (*StateKey_MultimapSideInput_)(nil), (*StateKey_BagUserState_)(nil), (*StateKey_IterableSideInput_)(nil), (*StateKey_MultimapKeysSideInput_)(nil), } - file_beam_fn_api_proto_msgTypes[43].OneofWrappers = []interface{}{ + file_beam_fn_api_proto_msgTypes[46].OneofWrappers = []interface{}{ (*ProcessBundleRequest_CacheToken_UserState_)(nil), (*ProcessBundleRequest_CacheToken_SideInput_)(nil), } @@ -5390,7 +5591,7 @@ func file_beam_fn_api_proto_init() { GoPackagePath: reflect.TypeOf(x{}).PkgPath(), RawDescriptor: file_beam_fn_api_proto_rawDesc, NumEnums: 1, - NumMessages: 62, + NumMessages: 65, NumExtensions: 0, NumServices: 6, }, @@ -5404,696 +5605,3 @@ func file_beam_fn_api_proto_init() { file_beam_fn_api_proto_goTypes = nil file_beam_fn_api_proto_depIdxs = nil } - -// Reference imports to suppress errors if they are not otherwise used. -var _ context.Context -var _ grpc.ClientConnInterface - -// This is a compile-time assertion to ensure that this generated file -// is compatible with the grpc package it is being compiled against. -const _ = grpc.SupportPackageIsVersion6 - -// BeamFnControlClient is the client API for BeamFnControl service. -// -// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://godoc.org/google.golang.org/grpc#ClientConn.NewStream. -type BeamFnControlClient interface { - // Instructions sent by the runner to the SDK requesting different types - // of work. - Control(ctx context.Context, opts ...grpc.CallOption) (BeamFnControl_ControlClient, error) - // Used to get the full process bundle descriptors for bundles one - // is asked to process. 
- GetProcessBundleDescriptor(ctx context.Context, in *GetProcessBundleDescriptorRequest, opts ...grpc.CallOption) (*ProcessBundleDescriptor, error) -} - -type beamFnControlClient struct { - cc grpc.ClientConnInterface -} - -func NewBeamFnControlClient(cc grpc.ClientConnInterface) BeamFnControlClient { - return &beamFnControlClient{cc} -} - -func (c *beamFnControlClient) Control(ctx context.Context, opts ...grpc.CallOption) (BeamFnControl_ControlClient, error) { - stream, err := c.cc.NewStream(ctx, &_BeamFnControl_serviceDesc.Streams[0], "/org.apache.beam.model.fn_execution.v1.BeamFnControl/Control", opts...) - if err != nil { - return nil, err - } - x := &beamFnControlControlClient{stream} - return x, nil -} - -type BeamFnControl_ControlClient interface { - Send(*InstructionResponse) error - Recv() (*InstructionRequest, error) - grpc.ClientStream -} - -type beamFnControlControlClient struct { - grpc.ClientStream -} - -func (x *beamFnControlControlClient) Send(m *InstructionResponse) error { - return x.ClientStream.SendMsg(m) -} - -func (x *beamFnControlControlClient) Recv() (*InstructionRequest, error) { - m := new(InstructionRequest) - if err := x.ClientStream.RecvMsg(m); err != nil { - return nil, err - } - return m, nil -} - -func (c *beamFnControlClient) GetProcessBundleDescriptor(ctx context.Context, in *GetProcessBundleDescriptorRequest, opts ...grpc.CallOption) (*ProcessBundleDescriptor, error) { - out := new(ProcessBundleDescriptor) - err := c.cc.Invoke(ctx, "/org.apache.beam.model.fn_execution.v1.BeamFnControl/GetProcessBundleDescriptor", in, out, opts...) - if err != nil { - return nil, err - } - return out, nil -} - -// BeamFnControlServer is the server API for BeamFnControl service. -type BeamFnControlServer interface { - // Instructions sent by the runner to the SDK requesting different types - // of work. - Control(BeamFnControl_ControlServer) error - // Used to get the full process bundle descriptors for bundles one - // is asked to process. - GetProcessBundleDescriptor(context.Context, *GetProcessBundleDescriptorRequest) (*ProcessBundleDescriptor, error) -} - -// UnimplementedBeamFnControlServer can be embedded to have forward compatible implementations. 
-type UnimplementedBeamFnControlServer struct { -} - -func (*UnimplementedBeamFnControlServer) Control(BeamFnControl_ControlServer) error { - return status.Errorf(codes.Unimplemented, "method Control not implemented") -} -func (*UnimplementedBeamFnControlServer) GetProcessBundleDescriptor(context.Context, *GetProcessBundleDescriptorRequest) (*ProcessBundleDescriptor, error) { - return nil, status.Errorf(codes.Unimplemented, "method GetProcessBundleDescriptor not implemented") -} - -func RegisterBeamFnControlServer(s *grpc.Server, srv BeamFnControlServer) { - s.RegisterService(&_BeamFnControl_serviceDesc, srv) -} - -func _BeamFnControl_Control_Handler(srv interface{}, stream grpc.ServerStream) error { - return srv.(BeamFnControlServer).Control(&beamFnControlControlServer{stream}) -} - -type BeamFnControl_ControlServer interface { - Send(*InstructionRequest) error - Recv() (*InstructionResponse, error) - grpc.ServerStream -} - -type beamFnControlControlServer struct { - grpc.ServerStream -} - -func (x *beamFnControlControlServer) Send(m *InstructionRequest) error { - return x.ServerStream.SendMsg(m) -} - -func (x *beamFnControlControlServer) Recv() (*InstructionResponse, error) { - m := new(InstructionResponse) - if err := x.ServerStream.RecvMsg(m); err != nil { - return nil, err - } - return m, nil -} - -func _BeamFnControl_GetProcessBundleDescriptor_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { - in := new(GetProcessBundleDescriptorRequest) - if err := dec(in); err != nil { - return nil, err - } - if interceptor == nil { - return srv.(BeamFnControlServer).GetProcessBundleDescriptor(ctx, in) - } - info := &grpc.UnaryServerInfo{ - Server: srv, - FullMethod: "/org.apache.beam.model.fn_execution.v1.BeamFnControl/GetProcessBundleDescriptor", - } - handler := func(ctx context.Context, req interface{}) (interface{}, error) { - return srv.(BeamFnControlServer).GetProcessBundleDescriptor(ctx, req.(*GetProcessBundleDescriptorRequest)) - } - return interceptor(ctx, in, info, handler) -} - -var _BeamFnControl_serviceDesc = grpc.ServiceDesc{ - ServiceName: "org.apache.beam.model.fn_execution.v1.BeamFnControl", - HandlerType: (*BeamFnControlServer)(nil), - Methods: []grpc.MethodDesc{ - { - MethodName: "GetProcessBundleDescriptor", - Handler: _BeamFnControl_GetProcessBundleDescriptor_Handler, - }, - }, - Streams: []grpc.StreamDesc{ - { - StreamName: "Control", - Handler: _BeamFnControl_Control_Handler, - ServerStreams: true, - ClientStreams: true, - }, - }, - Metadata: "beam_fn_api.proto", -} - -// BeamFnDataClient is the client API for BeamFnData service. -// -// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://godoc.org/google.golang.org/grpc#ClientConn.NewStream. -type BeamFnDataClient interface { - // Used to send data between harnesses. - Data(ctx context.Context, opts ...grpc.CallOption) (BeamFnData_DataClient, error) -} - -type beamFnDataClient struct { - cc grpc.ClientConnInterface -} - -func NewBeamFnDataClient(cc grpc.ClientConnInterface) BeamFnDataClient { - return &beamFnDataClient{cc} -} - -func (c *beamFnDataClient) Data(ctx context.Context, opts ...grpc.CallOption) (BeamFnData_DataClient, error) { - stream, err := c.cc.NewStream(ctx, &_BeamFnData_serviceDesc.Streams[0], "/org.apache.beam.model.fn_execution.v1.BeamFnData/Data", opts...) 
- if err != nil { - return nil, err - } - x := &beamFnDataDataClient{stream} - return x, nil -} - -type BeamFnData_DataClient interface { - Send(*Elements) error - Recv() (*Elements, error) - grpc.ClientStream -} - -type beamFnDataDataClient struct { - grpc.ClientStream -} - -func (x *beamFnDataDataClient) Send(m *Elements) error { - return x.ClientStream.SendMsg(m) -} - -func (x *beamFnDataDataClient) Recv() (*Elements, error) { - m := new(Elements) - if err := x.ClientStream.RecvMsg(m); err != nil { - return nil, err - } - return m, nil -} - -// BeamFnDataServer is the server API for BeamFnData service. -type BeamFnDataServer interface { - // Used to send data between harnesses. - Data(BeamFnData_DataServer) error -} - -// UnimplementedBeamFnDataServer can be embedded to have forward compatible implementations. -type UnimplementedBeamFnDataServer struct { -} - -func (*UnimplementedBeamFnDataServer) Data(BeamFnData_DataServer) error { - return status.Errorf(codes.Unimplemented, "method Data not implemented") -} - -func RegisterBeamFnDataServer(s *grpc.Server, srv BeamFnDataServer) { - s.RegisterService(&_BeamFnData_serviceDesc, srv) -} - -func _BeamFnData_Data_Handler(srv interface{}, stream grpc.ServerStream) error { - return srv.(BeamFnDataServer).Data(&beamFnDataDataServer{stream}) -} - -type BeamFnData_DataServer interface { - Send(*Elements) error - Recv() (*Elements, error) - grpc.ServerStream -} - -type beamFnDataDataServer struct { - grpc.ServerStream -} - -func (x *beamFnDataDataServer) Send(m *Elements) error { - return x.ServerStream.SendMsg(m) -} - -func (x *beamFnDataDataServer) Recv() (*Elements, error) { - m := new(Elements) - if err := x.ServerStream.RecvMsg(m); err != nil { - return nil, err - } - return m, nil -} - -var _BeamFnData_serviceDesc = grpc.ServiceDesc{ - ServiceName: "org.apache.beam.model.fn_execution.v1.BeamFnData", - HandlerType: (*BeamFnDataServer)(nil), - Methods: []grpc.MethodDesc{}, - Streams: []grpc.StreamDesc{ - { - StreamName: "Data", - Handler: _BeamFnData_Data_Handler, - ServerStreams: true, - ClientStreams: true, - }, - }, - Metadata: "beam_fn_api.proto", -} - -// BeamFnStateClient is the client API for BeamFnState service. -// -// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://godoc.org/google.golang.org/grpc#ClientConn.NewStream. -type BeamFnStateClient interface { - // Used to get/append/clear state stored by the runner on behalf of the SDK. - State(ctx context.Context, opts ...grpc.CallOption) (BeamFnState_StateClient, error) -} - -type beamFnStateClient struct { - cc grpc.ClientConnInterface -} - -func NewBeamFnStateClient(cc grpc.ClientConnInterface) BeamFnStateClient { - return &beamFnStateClient{cc} -} - -func (c *beamFnStateClient) State(ctx context.Context, opts ...grpc.CallOption) (BeamFnState_StateClient, error) { - stream, err := c.cc.NewStream(ctx, &_BeamFnState_serviceDesc.Streams[0], "/org.apache.beam.model.fn_execution.v1.BeamFnState/State", opts...) 
- if err != nil { - return nil, err - } - x := &beamFnStateStateClient{stream} - return x, nil -} - -type BeamFnState_StateClient interface { - Send(*StateRequest) error - Recv() (*StateResponse, error) - grpc.ClientStream -} - -type beamFnStateStateClient struct { - grpc.ClientStream -} - -func (x *beamFnStateStateClient) Send(m *StateRequest) error { - return x.ClientStream.SendMsg(m) -} - -func (x *beamFnStateStateClient) Recv() (*StateResponse, error) { - m := new(StateResponse) - if err := x.ClientStream.RecvMsg(m); err != nil { - return nil, err - } - return m, nil -} - -// BeamFnStateServer is the server API for BeamFnState service. -type BeamFnStateServer interface { - // Used to get/append/clear state stored by the runner on behalf of the SDK. - State(BeamFnState_StateServer) error -} - -// UnimplementedBeamFnStateServer can be embedded to have forward compatible implementations. -type UnimplementedBeamFnStateServer struct { -} - -func (*UnimplementedBeamFnStateServer) State(BeamFnState_StateServer) error { - return status.Errorf(codes.Unimplemented, "method State not implemented") -} - -func RegisterBeamFnStateServer(s *grpc.Server, srv BeamFnStateServer) { - s.RegisterService(&_BeamFnState_serviceDesc, srv) -} - -func _BeamFnState_State_Handler(srv interface{}, stream grpc.ServerStream) error { - return srv.(BeamFnStateServer).State(&beamFnStateStateServer{stream}) -} - -type BeamFnState_StateServer interface { - Send(*StateResponse) error - Recv() (*StateRequest, error) - grpc.ServerStream -} - -type beamFnStateStateServer struct { - grpc.ServerStream -} - -func (x *beamFnStateStateServer) Send(m *StateResponse) error { - return x.ServerStream.SendMsg(m) -} - -func (x *beamFnStateStateServer) Recv() (*StateRequest, error) { - m := new(StateRequest) - if err := x.ServerStream.RecvMsg(m); err != nil { - return nil, err - } - return m, nil -} - -var _BeamFnState_serviceDesc = grpc.ServiceDesc{ - ServiceName: "org.apache.beam.model.fn_execution.v1.BeamFnState", - HandlerType: (*BeamFnStateServer)(nil), - Methods: []grpc.MethodDesc{}, - Streams: []grpc.StreamDesc{ - { - StreamName: "State", - Handler: _BeamFnState_State_Handler, - ServerStreams: true, - ClientStreams: true, - }, - }, - Metadata: "beam_fn_api.proto", -} - -// BeamFnLoggingClient is the client API for BeamFnLogging service. -// -// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://godoc.org/google.golang.org/grpc#ClientConn.NewStream. -type BeamFnLoggingClient interface { - // Allows for the SDK to emit log entries which the runner can - // associate with the active job. - Logging(ctx context.Context, opts ...grpc.CallOption) (BeamFnLogging_LoggingClient, error) -} - -type beamFnLoggingClient struct { - cc grpc.ClientConnInterface -} - -func NewBeamFnLoggingClient(cc grpc.ClientConnInterface) BeamFnLoggingClient { - return &beamFnLoggingClient{cc} -} - -func (c *beamFnLoggingClient) Logging(ctx context.Context, opts ...grpc.CallOption) (BeamFnLogging_LoggingClient, error) { - stream, err := c.cc.NewStream(ctx, &_BeamFnLogging_serviceDesc.Streams[0], "/org.apache.beam.model.fn_execution.v1.BeamFnLogging/Logging", opts...) 
- if err != nil { - return nil, err - } - x := &beamFnLoggingLoggingClient{stream} - return x, nil -} - -type BeamFnLogging_LoggingClient interface { - Send(*LogEntry_List) error - Recv() (*LogControl, error) - grpc.ClientStream -} - -type beamFnLoggingLoggingClient struct { - grpc.ClientStream -} - -func (x *beamFnLoggingLoggingClient) Send(m *LogEntry_List) error { - return x.ClientStream.SendMsg(m) -} - -func (x *beamFnLoggingLoggingClient) Recv() (*LogControl, error) { - m := new(LogControl) - if err := x.ClientStream.RecvMsg(m); err != nil { - return nil, err - } - return m, nil -} - -// BeamFnLoggingServer is the server API for BeamFnLogging service. -type BeamFnLoggingServer interface { - // Allows for the SDK to emit log entries which the runner can - // associate with the active job. - Logging(BeamFnLogging_LoggingServer) error -} - -// UnimplementedBeamFnLoggingServer can be embedded to have forward compatible implementations. -type UnimplementedBeamFnLoggingServer struct { -} - -func (*UnimplementedBeamFnLoggingServer) Logging(BeamFnLogging_LoggingServer) error { - return status.Errorf(codes.Unimplemented, "method Logging not implemented") -} - -func RegisterBeamFnLoggingServer(s *grpc.Server, srv BeamFnLoggingServer) { - s.RegisterService(&_BeamFnLogging_serviceDesc, srv) -} - -func _BeamFnLogging_Logging_Handler(srv interface{}, stream grpc.ServerStream) error { - return srv.(BeamFnLoggingServer).Logging(&beamFnLoggingLoggingServer{stream}) -} - -type BeamFnLogging_LoggingServer interface { - Send(*LogControl) error - Recv() (*LogEntry_List, error) - grpc.ServerStream -} - -type beamFnLoggingLoggingServer struct { - grpc.ServerStream -} - -func (x *beamFnLoggingLoggingServer) Send(m *LogControl) error { - return x.ServerStream.SendMsg(m) -} - -func (x *beamFnLoggingLoggingServer) Recv() (*LogEntry_List, error) { - m := new(LogEntry_List) - if err := x.ServerStream.RecvMsg(m); err != nil { - return nil, err - } - return m, nil -} - -var _BeamFnLogging_serviceDesc = grpc.ServiceDesc{ - ServiceName: "org.apache.beam.model.fn_execution.v1.BeamFnLogging", - HandlerType: (*BeamFnLoggingServer)(nil), - Methods: []grpc.MethodDesc{}, - Streams: []grpc.StreamDesc{ - { - StreamName: "Logging", - Handler: _BeamFnLogging_Logging_Handler, - ServerStreams: true, - ClientStreams: true, - }, - }, - Metadata: "beam_fn_api.proto", -} - -// BeamFnExternalWorkerPoolClient is the client API for BeamFnExternalWorkerPool service. -// -// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://godoc.org/google.golang.org/grpc#ClientConn.NewStream. -type BeamFnExternalWorkerPoolClient interface { - // Start the SDK worker with the given ID. - StartWorker(ctx context.Context, in *StartWorkerRequest, opts ...grpc.CallOption) (*StartWorkerResponse, error) - // Stop the SDK worker. - StopWorker(ctx context.Context, in *StopWorkerRequest, opts ...grpc.CallOption) (*StopWorkerResponse, error) -} - -type beamFnExternalWorkerPoolClient struct { - cc grpc.ClientConnInterface -} - -func NewBeamFnExternalWorkerPoolClient(cc grpc.ClientConnInterface) BeamFnExternalWorkerPoolClient { - return &beamFnExternalWorkerPoolClient{cc} -} - -func (c *beamFnExternalWorkerPoolClient) StartWorker(ctx context.Context, in *StartWorkerRequest, opts ...grpc.CallOption) (*StartWorkerResponse, error) { - out := new(StartWorkerResponse) - err := c.cc.Invoke(ctx, "/org.apache.beam.model.fn_execution.v1.BeamFnExternalWorkerPool/StartWorker", in, out, opts...) 
- if err != nil { - return nil, err - } - return out, nil -} - -func (c *beamFnExternalWorkerPoolClient) StopWorker(ctx context.Context, in *StopWorkerRequest, opts ...grpc.CallOption) (*StopWorkerResponse, error) { - out := new(StopWorkerResponse) - err := c.cc.Invoke(ctx, "/org.apache.beam.model.fn_execution.v1.BeamFnExternalWorkerPool/StopWorker", in, out, opts...) - if err != nil { - return nil, err - } - return out, nil -} - -// BeamFnExternalWorkerPoolServer is the server API for BeamFnExternalWorkerPool service. -type BeamFnExternalWorkerPoolServer interface { - // Start the SDK worker with the given ID. - StartWorker(context.Context, *StartWorkerRequest) (*StartWorkerResponse, error) - // Stop the SDK worker. - StopWorker(context.Context, *StopWorkerRequest) (*StopWorkerResponse, error) -} - -// UnimplementedBeamFnExternalWorkerPoolServer can be embedded to have forward compatible implementations. -type UnimplementedBeamFnExternalWorkerPoolServer struct { -} - -func (*UnimplementedBeamFnExternalWorkerPoolServer) StartWorker(context.Context, *StartWorkerRequest) (*StartWorkerResponse, error) { - return nil, status.Errorf(codes.Unimplemented, "method StartWorker not implemented") -} -func (*UnimplementedBeamFnExternalWorkerPoolServer) StopWorker(context.Context, *StopWorkerRequest) (*StopWorkerResponse, error) { - return nil, status.Errorf(codes.Unimplemented, "method StopWorker not implemented") -} - -func RegisterBeamFnExternalWorkerPoolServer(s *grpc.Server, srv BeamFnExternalWorkerPoolServer) { - s.RegisterService(&_BeamFnExternalWorkerPool_serviceDesc, srv) -} - -func _BeamFnExternalWorkerPool_StartWorker_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { - in := new(StartWorkerRequest) - if err := dec(in); err != nil { - return nil, err - } - if interceptor == nil { - return srv.(BeamFnExternalWorkerPoolServer).StartWorker(ctx, in) - } - info := &grpc.UnaryServerInfo{ - Server: srv, - FullMethod: "/org.apache.beam.model.fn_execution.v1.BeamFnExternalWorkerPool/StartWorker", - } - handler := func(ctx context.Context, req interface{}) (interface{}, error) { - return srv.(BeamFnExternalWorkerPoolServer).StartWorker(ctx, req.(*StartWorkerRequest)) - } - return interceptor(ctx, in, info, handler) -} - -func _BeamFnExternalWorkerPool_StopWorker_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { - in := new(StopWorkerRequest) - if err := dec(in); err != nil { - return nil, err - } - if interceptor == nil { - return srv.(BeamFnExternalWorkerPoolServer).StopWorker(ctx, in) - } - info := &grpc.UnaryServerInfo{ - Server: srv, - FullMethod: "/org.apache.beam.model.fn_execution.v1.BeamFnExternalWorkerPool/StopWorker", - } - handler := func(ctx context.Context, req interface{}) (interface{}, error) { - return srv.(BeamFnExternalWorkerPoolServer).StopWorker(ctx, req.(*StopWorkerRequest)) - } - return interceptor(ctx, in, info, handler) -} - -var _BeamFnExternalWorkerPool_serviceDesc = grpc.ServiceDesc{ - ServiceName: "org.apache.beam.model.fn_execution.v1.BeamFnExternalWorkerPool", - HandlerType: (*BeamFnExternalWorkerPoolServer)(nil), - Methods: []grpc.MethodDesc{ - { - MethodName: "StartWorker", - Handler: _BeamFnExternalWorkerPool_StartWorker_Handler, - }, - { - MethodName: "StopWorker", - Handler: _BeamFnExternalWorkerPool_StopWorker_Handler, - }, - }, - Streams: []grpc.StreamDesc{}, - Metadata: "beam_fn_api.proto", 
-} - -// BeamFnWorkerStatusClient is the client API for BeamFnWorkerStatus service. -// -// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://godoc.org/google.golang.org/grpc#ClientConn.NewStream. -type BeamFnWorkerStatusClient interface { - WorkerStatus(ctx context.Context, opts ...grpc.CallOption) (BeamFnWorkerStatus_WorkerStatusClient, error) -} - -type beamFnWorkerStatusClient struct { - cc grpc.ClientConnInterface -} - -func NewBeamFnWorkerStatusClient(cc grpc.ClientConnInterface) BeamFnWorkerStatusClient { - return &beamFnWorkerStatusClient{cc} -} - -func (c *beamFnWorkerStatusClient) WorkerStatus(ctx context.Context, opts ...grpc.CallOption) (BeamFnWorkerStatus_WorkerStatusClient, error) { - stream, err := c.cc.NewStream(ctx, &_BeamFnWorkerStatus_serviceDesc.Streams[0], "/org.apache.beam.model.fn_execution.v1.BeamFnWorkerStatus/WorkerStatus", opts...) - if err != nil { - return nil, err - } - x := &beamFnWorkerStatusWorkerStatusClient{stream} - return x, nil -} - -type BeamFnWorkerStatus_WorkerStatusClient interface { - Send(*WorkerStatusResponse) error - Recv() (*WorkerStatusRequest, error) - grpc.ClientStream -} - -type beamFnWorkerStatusWorkerStatusClient struct { - grpc.ClientStream -} - -func (x *beamFnWorkerStatusWorkerStatusClient) Send(m *WorkerStatusResponse) error { - return x.ClientStream.SendMsg(m) -} - -func (x *beamFnWorkerStatusWorkerStatusClient) Recv() (*WorkerStatusRequest, error) { - m := new(WorkerStatusRequest) - if err := x.ClientStream.RecvMsg(m); err != nil { - return nil, err - } - return m, nil -} - -// BeamFnWorkerStatusServer is the server API for BeamFnWorkerStatus service. -type BeamFnWorkerStatusServer interface { - WorkerStatus(BeamFnWorkerStatus_WorkerStatusServer) error -} - -// UnimplementedBeamFnWorkerStatusServer can be embedded to have forward compatible implementations. 
-type UnimplementedBeamFnWorkerStatusServer struct { -} - -func (*UnimplementedBeamFnWorkerStatusServer) WorkerStatus(BeamFnWorkerStatus_WorkerStatusServer) error { - return status.Errorf(codes.Unimplemented, "method WorkerStatus not implemented") -} - -func RegisterBeamFnWorkerStatusServer(s *grpc.Server, srv BeamFnWorkerStatusServer) { - s.RegisterService(&_BeamFnWorkerStatus_serviceDesc, srv) -} - -func _BeamFnWorkerStatus_WorkerStatus_Handler(srv interface{}, stream grpc.ServerStream) error { - return srv.(BeamFnWorkerStatusServer).WorkerStatus(&beamFnWorkerStatusWorkerStatusServer{stream}) -} - -type BeamFnWorkerStatus_WorkerStatusServer interface { - Send(*WorkerStatusRequest) error - Recv() (*WorkerStatusResponse, error) - grpc.ServerStream -} - -type beamFnWorkerStatusWorkerStatusServer struct { - grpc.ServerStream -} - -func (x *beamFnWorkerStatusWorkerStatusServer) Send(m *WorkerStatusRequest) error { - return x.ServerStream.SendMsg(m) -} - -func (x *beamFnWorkerStatusWorkerStatusServer) Recv() (*WorkerStatusResponse, error) { - m := new(WorkerStatusResponse) - if err := x.ServerStream.RecvMsg(m); err != nil { - return nil, err - } - return m, nil -} - -var _BeamFnWorkerStatus_serviceDesc = grpc.ServiceDesc{ - ServiceName: "org.apache.beam.model.fn_execution.v1.BeamFnWorkerStatus", - HandlerType: (*BeamFnWorkerStatusServer)(nil), - Methods: []grpc.MethodDesc{}, - Streams: []grpc.StreamDesc{ - { - StreamName: "WorkerStatus", - Handler: _BeamFnWorkerStatus_WorkerStatus_Handler, - ServerStreams: true, - ClientStreams: true, - }, - }, - Metadata: "beam_fn_api.proto", -} diff --git a/sdks/go/pkg/beam/model/fnexecution_v1/beam_fn_api_grpc.pb.go b/sdks/go/pkg/beam/model/fnexecution_v1/beam_fn_api_grpc.pb.go new file mode 100644 index 000000000000..18167a67578d --- /dev/null +++ b/sdks/go/pkg/beam/model/fnexecution_v1/beam_fn_api_grpc.pb.go @@ -0,0 +1,802 @@ +// +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +// Code generated by protoc-gen-go-grpc. DO NOT EDIT. + +package fnexecution_v1 + +import ( + context "context" + grpc "google.golang.org/grpc" + codes "google.golang.org/grpc/codes" + status "google.golang.org/grpc/status" +) + +// This is a compile-time assertion to ensure that this generated file +// is compatible with the grpc package it is being compiled against. +// Requires gRPC-Go v1.32.0 or later. +const _ = grpc.SupportPackageIsVersion7 + +// BeamFnControlClient is the client API for BeamFnControl service. +// +// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://pkg.go.dev/google.golang.org/grpc/?tab=doc#ClientConn.NewStream. +type BeamFnControlClient interface { + // Instructions sent by the runner to the SDK requesting different types + // of work. 
+ Control(ctx context.Context, opts ...grpc.CallOption) (BeamFnControl_ControlClient, error) + // Used to get the full process bundle descriptors for bundles one + // is asked to process. + GetProcessBundleDescriptor(ctx context.Context, in *GetProcessBundleDescriptorRequest, opts ...grpc.CallOption) (*ProcessBundleDescriptor, error) +} + +type beamFnControlClient struct { + cc grpc.ClientConnInterface +} + +func NewBeamFnControlClient(cc grpc.ClientConnInterface) BeamFnControlClient { + return &beamFnControlClient{cc} +} + +func (c *beamFnControlClient) Control(ctx context.Context, opts ...grpc.CallOption) (BeamFnControl_ControlClient, error) { + stream, err := c.cc.NewStream(ctx, &BeamFnControl_ServiceDesc.Streams[0], "/org.apache.beam.model.fn_execution.v1.BeamFnControl/Control", opts...) + if err != nil { + return nil, err + } + x := &beamFnControlControlClient{stream} + return x, nil +} + +type BeamFnControl_ControlClient interface { + Send(*InstructionResponse) error + Recv() (*InstructionRequest, error) + grpc.ClientStream +} + +type beamFnControlControlClient struct { + grpc.ClientStream +} + +func (x *beamFnControlControlClient) Send(m *InstructionResponse) error { + return x.ClientStream.SendMsg(m) +} + +func (x *beamFnControlControlClient) Recv() (*InstructionRequest, error) { + m := new(InstructionRequest) + if err := x.ClientStream.RecvMsg(m); err != nil { + return nil, err + } + return m, nil +} + +func (c *beamFnControlClient) GetProcessBundleDescriptor(ctx context.Context, in *GetProcessBundleDescriptorRequest, opts ...grpc.CallOption) (*ProcessBundleDescriptor, error) { + out := new(ProcessBundleDescriptor) + err := c.cc.Invoke(ctx, "/org.apache.beam.model.fn_execution.v1.BeamFnControl/GetProcessBundleDescriptor", in, out, opts...) + if err != nil { + return nil, err + } + return out, nil +} + +// BeamFnControlServer is the server API for BeamFnControl service. +// All implementations must embed UnimplementedBeamFnControlServer +// for forward compatibility +type BeamFnControlServer interface { + // Instructions sent by the runner to the SDK requesting different types + // of work. + Control(BeamFnControl_ControlServer) error + // Used to get the full process bundle descriptors for bundles one + // is asked to process. + GetProcessBundleDescriptor(context.Context, *GetProcessBundleDescriptorRequest) (*ProcessBundleDescriptor, error) + mustEmbedUnimplementedBeamFnControlServer() +} + +// UnimplementedBeamFnControlServer must be embedded to have forward compatible implementations. +type UnimplementedBeamFnControlServer struct { +} + +func (UnimplementedBeamFnControlServer) Control(BeamFnControl_ControlServer) error { + return status.Errorf(codes.Unimplemented, "method Control not implemented") +} +func (UnimplementedBeamFnControlServer) GetProcessBundleDescriptor(context.Context, *GetProcessBundleDescriptorRequest) (*ProcessBundleDescriptor, error) { + return nil, status.Errorf(codes.Unimplemented, "method GetProcessBundleDescriptor not implemented") +} +func (UnimplementedBeamFnControlServer) mustEmbedUnimplementedBeamFnControlServer() {} + +// UnsafeBeamFnControlServer may be embedded to opt out of forward compatibility for this service. +// Use of this interface is not recommended, as added methods to BeamFnControlServer will +// result in compilation errors. 
+type UnsafeBeamFnControlServer interface { + mustEmbedUnimplementedBeamFnControlServer() +} + +func RegisterBeamFnControlServer(s grpc.ServiceRegistrar, srv BeamFnControlServer) { + s.RegisterService(&BeamFnControl_ServiceDesc, srv) +} + +func _BeamFnControl_Control_Handler(srv interface{}, stream grpc.ServerStream) error { + return srv.(BeamFnControlServer).Control(&beamFnControlControlServer{stream}) +} + +type BeamFnControl_ControlServer interface { + Send(*InstructionRequest) error + Recv() (*InstructionResponse, error) + grpc.ServerStream +} + +type beamFnControlControlServer struct { + grpc.ServerStream +} + +func (x *beamFnControlControlServer) Send(m *InstructionRequest) error { + return x.ServerStream.SendMsg(m) +} + +func (x *beamFnControlControlServer) Recv() (*InstructionResponse, error) { + m := new(InstructionResponse) + if err := x.ServerStream.RecvMsg(m); err != nil { + return nil, err + } + return m, nil +} + +func _BeamFnControl_GetProcessBundleDescriptor_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { + in := new(GetProcessBundleDescriptorRequest) + if err := dec(in); err != nil { + return nil, err + } + if interceptor == nil { + return srv.(BeamFnControlServer).GetProcessBundleDescriptor(ctx, in) + } + info := &grpc.UnaryServerInfo{ + Server: srv, + FullMethod: "/org.apache.beam.model.fn_execution.v1.BeamFnControl/GetProcessBundleDescriptor", + } + handler := func(ctx context.Context, req interface{}) (interface{}, error) { + return srv.(BeamFnControlServer).GetProcessBundleDescriptor(ctx, req.(*GetProcessBundleDescriptorRequest)) + } + return interceptor(ctx, in, info, handler) +} + +// BeamFnControl_ServiceDesc is the grpc.ServiceDesc for BeamFnControl service. +// It's only intended for direct use with grpc.RegisterService, +// and not to be introspected or modified (even as a copy) +var BeamFnControl_ServiceDesc = grpc.ServiceDesc{ + ServiceName: "org.apache.beam.model.fn_execution.v1.BeamFnControl", + HandlerType: (*BeamFnControlServer)(nil), + Methods: []grpc.MethodDesc{ + { + MethodName: "GetProcessBundleDescriptor", + Handler: _BeamFnControl_GetProcessBundleDescriptor_Handler, + }, + }, + Streams: []grpc.StreamDesc{ + { + StreamName: "Control", + Handler: _BeamFnControl_Control_Handler, + ServerStreams: true, + ClientStreams: true, + }, + }, + Metadata: "beam_fn_api.proto", +} + +// BeamFnDataClient is the client API for BeamFnData service. +// +// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://pkg.go.dev/google.golang.org/grpc/?tab=doc#ClientConn.NewStream. +type BeamFnDataClient interface { + // Used to send data between harnesses. + Data(ctx context.Context, opts ...grpc.CallOption) (BeamFnData_DataClient, error) +} + +type beamFnDataClient struct { + cc grpc.ClientConnInterface +} + +func NewBeamFnDataClient(cc grpc.ClientConnInterface) BeamFnDataClient { + return &beamFnDataClient{cc} +} + +func (c *beamFnDataClient) Data(ctx context.Context, opts ...grpc.CallOption) (BeamFnData_DataClient, error) { + stream, err := c.cc.NewStream(ctx, &BeamFnData_ServiceDesc.Streams[0], "/org.apache.beam.model.fn_execution.v1.BeamFnData/Data", opts...) 
+ if err != nil { + return nil, err + } + x := &beamFnDataDataClient{stream} + return x, nil +} + +type BeamFnData_DataClient interface { + Send(*Elements) error + Recv() (*Elements, error) + grpc.ClientStream +} + +type beamFnDataDataClient struct { + grpc.ClientStream +} + +func (x *beamFnDataDataClient) Send(m *Elements) error { + return x.ClientStream.SendMsg(m) +} + +func (x *beamFnDataDataClient) Recv() (*Elements, error) { + m := new(Elements) + if err := x.ClientStream.RecvMsg(m); err != nil { + return nil, err + } + return m, nil +} + +// BeamFnDataServer is the server API for BeamFnData service. +// All implementations must embed UnimplementedBeamFnDataServer +// for forward compatibility +type BeamFnDataServer interface { + // Used to send data between harnesses. + Data(BeamFnData_DataServer) error + mustEmbedUnimplementedBeamFnDataServer() +} + +// UnimplementedBeamFnDataServer must be embedded to have forward compatible implementations. +type UnimplementedBeamFnDataServer struct { +} + +func (UnimplementedBeamFnDataServer) Data(BeamFnData_DataServer) error { + return status.Errorf(codes.Unimplemented, "method Data not implemented") +} +func (UnimplementedBeamFnDataServer) mustEmbedUnimplementedBeamFnDataServer() {} + +// UnsafeBeamFnDataServer may be embedded to opt out of forward compatibility for this service. +// Use of this interface is not recommended, as added methods to BeamFnDataServer will +// result in compilation errors. +type UnsafeBeamFnDataServer interface { + mustEmbedUnimplementedBeamFnDataServer() +} + +func RegisterBeamFnDataServer(s grpc.ServiceRegistrar, srv BeamFnDataServer) { + s.RegisterService(&BeamFnData_ServiceDesc, srv) +} + +func _BeamFnData_Data_Handler(srv interface{}, stream grpc.ServerStream) error { + return srv.(BeamFnDataServer).Data(&beamFnDataDataServer{stream}) +} + +type BeamFnData_DataServer interface { + Send(*Elements) error + Recv() (*Elements, error) + grpc.ServerStream +} + +type beamFnDataDataServer struct { + grpc.ServerStream +} + +func (x *beamFnDataDataServer) Send(m *Elements) error { + return x.ServerStream.SendMsg(m) +} + +func (x *beamFnDataDataServer) Recv() (*Elements, error) { + m := new(Elements) + if err := x.ServerStream.RecvMsg(m); err != nil { + return nil, err + } + return m, nil +} + +// BeamFnData_ServiceDesc is the grpc.ServiceDesc for BeamFnData service. +// It's only intended for direct use with grpc.RegisterService, +// and not to be introspected or modified (even as a copy) +var BeamFnData_ServiceDesc = grpc.ServiceDesc{ + ServiceName: "org.apache.beam.model.fn_execution.v1.BeamFnData", + HandlerType: (*BeamFnDataServer)(nil), + Methods: []grpc.MethodDesc{}, + Streams: []grpc.StreamDesc{ + { + StreamName: "Data", + Handler: _BeamFnData_Data_Handler, + ServerStreams: true, + ClientStreams: true, + }, + }, + Metadata: "beam_fn_api.proto", +} + +// BeamFnStateClient is the client API for BeamFnState service. +// +// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://pkg.go.dev/google.golang.org/grpc/?tab=doc#ClientConn.NewStream. +type BeamFnStateClient interface { + // Used to get/append/clear state stored by the runner on behalf of the SDK. 
+ State(ctx context.Context, opts ...grpc.CallOption) (BeamFnState_StateClient, error) +} + +type beamFnStateClient struct { + cc grpc.ClientConnInterface +} + +func NewBeamFnStateClient(cc grpc.ClientConnInterface) BeamFnStateClient { + return &beamFnStateClient{cc} +} + +func (c *beamFnStateClient) State(ctx context.Context, opts ...grpc.CallOption) (BeamFnState_StateClient, error) { + stream, err := c.cc.NewStream(ctx, &BeamFnState_ServiceDesc.Streams[0], "/org.apache.beam.model.fn_execution.v1.BeamFnState/State", opts...) + if err != nil { + return nil, err + } + x := &beamFnStateStateClient{stream} + return x, nil +} + +type BeamFnState_StateClient interface { + Send(*StateRequest) error + Recv() (*StateResponse, error) + grpc.ClientStream +} + +type beamFnStateStateClient struct { + grpc.ClientStream +} + +func (x *beamFnStateStateClient) Send(m *StateRequest) error { + return x.ClientStream.SendMsg(m) +} + +func (x *beamFnStateStateClient) Recv() (*StateResponse, error) { + m := new(StateResponse) + if err := x.ClientStream.RecvMsg(m); err != nil { + return nil, err + } + return m, nil +} + +// BeamFnStateServer is the server API for BeamFnState service. +// All implementations must embed UnimplementedBeamFnStateServer +// for forward compatibility +type BeamFnStateServer interface { + // Used to get/append/clear state stored by the runner on behalf of the SDK. + State(BeamFnState_StateServer) error + mustEmbedUnimplementedBeamFnStateServer() +} + +// UnimplementedBeamFnStateServer must be embedded to have forward compatible implementations. +type UnimplementedBeamFnStateServer struct { +} + +func (UnimplementedBeamFnStateServer) State(BeamFnState_StateServer) error { + return status.Errorf(codes.Unimplemented, "method State not implemented") +} +func (UnimplementedBeamFnStateServer) mustEmbedUnimplementedBeamFnStateServer() {} + +// UnsafeBeamFnStateServer may be embedded to opt out of forward compatibility for this service. +// Use of this interface is not recommended, as added methods to BeamFnStateServer will +// result in compilation errors. +type UnsafeBeamFnStateServer interface { + mustEmbedUnimplementedBeamFnStateServer() +} + +func RegisterBeamFnStateServer(s grpc.ServiceRegistrar, srv BeamFnStateServer) { + s.RegisterService(&BeamFnState_ServiceDesc, srv) +} + +func _BeamFnState_State_Handler(srv interface{}, stream grpc.ServerStream) error { + return srv.(BeamFnStateServer).State(&beamFnStateStateServer{stream}) +} + +type BeamFnState_StateServer interface { + Send(*StateResponse) error + Recv() (*StateRequest, error) + grpc.ServerStream +} + +type beamFnStateStateServer struct { + grpc.ServerStream +} + +func (x *beamFnStateStateServer) Send(m *StateResponse) error { + return x.ServerStream.SendMsg(m) +} + +func (x *beamFnStateStateServer) Recv() (*StateRequest, error) { + m := new(StateRequest) + if err := x.ServerStream.RecvMsg(m); err != nil { + return nil, err + } + return m, nil +} + +// BeamFnState_ServiceDesc is the grpc.ServiceDesc for BeamFnState service. 
+// It's only intended for direct use with grpc.RegisterService, +// and not to be introspected or modified (even as a copy) +var BeamFnState_ServiceDesc = grpc.ServiceDesc{ + ServiceName: "org.apache.beam.model.fn_execution.v1.BeamFnState", + HandlerType: (*BeamFnStateServer)(nil), + Methods: []grpc.MethodDesc{}, + Streams: []grpc.StreamDesc{ + { + StreamName: "State", + Handler: _BeamFnState_State_Handler, + ServerStreams: true, + ClientStreams: true, + }, + }, + Metadata: "beam_fn_api.proto", +} + +// BeamFnLoggingClient is the client API for BeamFnLogging service. +// +// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://pkg.go.dev/google.golang.org/grpc/?tab=doc#ClientConn.NewStream. +type BeamFnLoggingClient interface { + // Allows for the SDK to emit log entries which the runner can + // associate with the active job. + Logging(ctx context.Context, opts ...grpc.CallOption) (BeamFnLogging_LoggingClient, error) +} + +type beamFnLoggingClient struct { + cc grpc.ClientConnInterface +} + +func NewBeamFnLoggingClient(cc grpc.ClientConnInterface) BeamFnLoggingClient { + return &beamFnLoggingClient{cc} +} + +func (c *beamFnLoggingClient) Logging(ctx context.Context, opts ...grpc.CallOption) (BeamFnLogging_LoggingClient, error) { + stream, err := c.cc.NewStream(ctx, &BeamFnLogging_ServiceDesc.Streams[0], "/org.apache.beam.model.fn_execution.v1.BeamFnLogging/Logging", opts...) + if err != nil { + return nil, err + } + x := &beamFnLoggingLoggingClient{stream} + return x, nil +} + +type BeamFnLogging_LoggingClient interface { + Send(*LogEntry_List) error + Recv() (*LogControl, error) + grpc.ClientStream +} + +type beamFnLoggingLoggingClient struct { + grpc.ClientStream +} + +func (x *beamFnLoggingLoggingClient) Send(m *LogEntry_List) error { + return x.ClientStream.SendMsg(m) +} + +func (x *beamFnLoggingLoggingClient) Recv() (*LogControl, error) { + m := new(LogControl) + if err := x.ClientStream.RecvMsg(m); err != nil { + return nil, err + } + return m, nil +} + +// BeamFnLoggingServer is the server API for BeamFnLogging service. +// All implementations must embed UnimplementedBeamFnLoggingServer +// for forward compatibility +type BeamFnLoggingServer interface { + // Allows for the SDK to emit log entries which the runner can + // associate with the active job. + Logging(BeamFnLogging_LoggingServer) error + mustEmbedUnimplementedBeamFnLoggingServer() +} + +// UnimplementedBeamFnLoggingServer must be embedded to have forward compatible implementations. +type UnimplementedBeamFnLoggingServer struct { +} + +func (UnimplementedBeamFnLoggingServer) Logging(BeamFnLogging_LoggingServer) error { + return status.Errorf(codes.Unimplemented, "method Logging not implemented") +} +func (UnimplementedBeamFnLoggingServer) mustEmbedUnimplementedBeamFnLoggingServer() {} + +// UnsafeBeamFnLoggingServer may be embedded to opt out of forward compatibility for this service. +// Use of this interface is not recommended, as added methods to BeamFnLoggingServer will +// result in compilation errors. 
+type UnsafeBeamFnLoggingServer interface { + mustEmbedUnimplementedBeamFnLoggingServer() +} + +func RegisterBeamFnLoggingServer(s grpc.ServiceRegistrar, srv BeamFnLoggingServer) { + s.RegisterService(&BeamFnLogging_ServiceDesc, srv) +} + +func _BeamFnLogging_Logging_Handler(srv interface{}, stream grpc.ServerStream) error { + return srv.(BeamFnLoggingServer).Logging(&beamFnLoggingLoggingServer{stream}) +} + +type BeamFnLogging_LoggingServer interface { + Send(*LogControl) error + Recv() (*LogEntry_List, error) + grpc.ServerStream +} + +type beamFnLoggingLoggingServer struct { + grpc.ServerStream +} + +func (x *beamFnLoggingLoggingServer) Send(m *LogControl) error { + return x.ServerStream.SendMsg(m) +} + +func (x *beamFnLoggingLoggingServer) Recv() (*LogEntry_List, error) { + m := new(LogEntry_List) + if err := x.ServerStream.RecvMsg(m); err != nil { + return nil, err + } + return m, nil +} + +// BeamFnLogging_ServiceDesc is the grpc.ServiceDesc for BeamFnLogging service. +// It's only intended for direct use with grpc.RegisterService, +// and not to be introspected or modified (even as a copy) +var BeamFnLogging_ServiceDesc = grpc.ServiceDesc{ + ServiceName: "org.apache.beam.model.fn_execution.v1.BeamFnLogging", + HandlerType: (*BeamFnLoggingServer)(nil), + Methods: []grpc.MethodDesc{}, + Streams: []grpc.StreamDesc{ + { + StreamName: "Logging", + Handler: _BeamFnLogging_Logging_Handler, + ServerStreams: true, + ClientStreams: true, + }, + }, + Metadata: "beam_fn_api.proto", +} + +// BeamFnExternalWorkerPoolClient is the client API for BeamFnExternalWorkerPool service. +// +// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://pkg.go.dev/google.golang.org/grpc/?tab=doc#ClientConn.NewStream. +type BeamFnExternalWorkerPoolClient interface { + // Start the SDK worker with the given ID. + StartWorker(ctx context.Context, in *StartWorkerRequest, opts ...grpc.CallOption) (*StartWorkerResponse, error) + // Stop the SDK worker. + StopWorker(ctx context.Context, in *StopWorkerRequest, opts ...grpc.CallOption) (*StopWorkerResponse, error) +} + +type beamFnExternalWorkerPoolClient struct { + cc grpc.ClientConnInterface +} + +func NewBeamFnExternalWorkerPoolClient(cc grpc.ClientConnInterface) BeamFnExternalWorkerPoolClient { + return &beamFnExternalWorkerPoolClient{cc} +} + +func (c *beamFnExternalWorkerPoolClient) StartWorker(ctx context.Context, in *StartWorkerRequest, opts ...grpc.CallOption) (*StartWorkerResponse, error) { + out := new(StartWorkerResponse) + err := c.cc.Invoke(ctx, "/org.apache.beam.model.fn_execution.v1.BeamFnExternalWorkerPool/StartWorker", in, out, opts...) + if err != nil { + return nil, err + } + return out, nil +} + +func (c *beamFnExternalWorkerPoolClient) StopWorker(ctx context.Context, in *StopWorkerRequest, opts ...grpc.CallOption) (*StopWorkerResponse, error) { + out := new(StopWorkerResponse) + err := c.cc.Invoke(ctx, "/org.apache.beam.model.fn_execution.v1.BeamFnExternalWorkerPool/StopWorker", in, out, opts...) + if err != nil { + return nil, err + } + return out, nil +} + +// BeamFnExternalWorkerPoolServer is the server API for BeamFnExternalWorkerPool service. +// All implementations must embed UnimplementedBeamFnExternalWorkerPoolServer +// for forward compatibility +type BeamFnExternalWorkerPoolServer interface { + // Start the SDK worker with the given ID. + StartWorker(context.Context, *StartWorkerRequest) (*StartWorkerResponse, error) + // Stop the SDK worker. 
+ StopWorker(context.Context, *StopWorkerRequest) (*StopWorkerResponse, error) + mustEmbedUnimplementedBeamFnExternalWorkerPoolServer() +} + +// UnimplementedBeamFnExternalWorkerPoolServer must be embedded to have forward compatible implementations. +type UnimplementedBeamFnExternalWorkerPoolServer struct { +} + +func (UnimplementedBeamFnExternalWorkerPoolServer) StartWorker(context.Context, *StartWorkerRequest) (*StartWorkerResponse, error) { + return nil, status.Errorf(codes.Unimplemented, "method StartWorker not implemented") +} +func (UnimplementedBeamFnExternalWorkerPoolServer) StopWorker(context.Context, *StopWorkerRequest) (*StopWorkerResponse, error) { + return nil, status.Errorf(codes.Unimplemented, "method StopWorker not implemented") +} +func (UnimplementedBeamFnExternalWorkerPoolServer) mustEmbedUnimplementedBeamFnExternalWorkerPoolServer() { +} + +// UnsafeBeamFnExternalWorkerPoolServer may be embedded to opt out of forward compatibility for this service. +// Use of this interface is not recommended, as added methods to BeamFnExternalWorkerPoolServer will +// result in compilation errors. +type UnsafeBeamFnExternalWorkerPoolServer interface { + mustEmbedUnimplementedBeamFnExternalWorkerPoolServer() +} + +func RegisterBeamFnExternalWorkerPoolServer(s grpc.ServiceRegistrar, srv BeamFnExternalWorkerPoolServer) { + s.RegisterService(&BeamFnExternalWorkerPool_ServiceDesc, srv) +} + +func _BeamFnExternalWorkerPool_StartWorker_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { + in := new(StartWorkerRequest) + if err := dec(in); err != nil { + return nil, err + } + if interceptor == nil { + return srv.(BeamFnExternalWorkerPoolServer).StartWorker(ctx, in) + } + info := &grpc.UnaryServerInfo{ + Server: srv, + FullMethod: "/org.apache.beam.model.fn_execution.v1.BeamFnExternalWorkerPool/StartWorker", + } + handler := func(ctx context.Context, req interface{}) (interface{}, error) { + return srv.(BeamFnExternalWorkerPoolServer).StartWorker(ctx, req.(*StartWorkerRequest)) + } + return interceptor(ctx, in, info, handler) +} + +func _BeamFnExternalWorkerPool_StopWorker_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { + in := new(StopWorkerRequest) + if err := dec(in); err != nil { + return nil, err + } + if interceptor == nil { + return srv.(BeamFnExternalWorkerPoolServer).StopWorker(ctx, in) + } + info := &grpc.UnaryServerInfo{ + Server: srv, + FullMethod: "/org.apache.beam.model.fn_execution.v1.BeamFnExternalWorkerPool/StopWorker", + } + handler := func(ctx context.Context, req interface{}) (interface{}, error) { + return srv.(BeamFnExternalWorkerPoolServer).StopWorker(ctx, req.(*StopWorkerRequest)) + } + return interceptor(ctx, in, info, handler) +} + +// BeamFnExternalWorkerPool_ServiceDesc is the grpc.ServiceDesc for BeamFnExternalWorkerPool service. 
+// It's only intended for direct use with grpc.RegisterService, +// and not to be introspected or modified (even as a copy) +var BeamFnExternalWorkerPool_ServiceDesc = grpc.ServiceDesc{ + ServiceName: "org.apache.beam.model.fn_execution.v1.BeamFnExternalWorkerPool", + HandlerType: (*BeamFnExternalWorkerPoolServer)(nil), + Methods: []grpc.MethodDesc{ + { + MethodName: "StartWorker", + Handler: _BeamFnExternalWorkerPool_StartWorker_Handler, + }, + { + MethodName: "StopWorker", + Handler: _BeamFnExternalWorkerPool_StopWorker_Handler, + }, + }, + Streams: []grpc.StreamDesc{}, + Metadata: "beam_fn_api.proto", +} + +// BeamFnWorkerStatusClient is the client API for BeamFnWorkerStatus service. +// +// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://pkg.go.dev/google.golang.org/grpc/?tab=doc#ClientConn.NewStream. +type BeamFnWorkerStatusClient interface { + WorkerStatus(ctx context.Context, opts ...grpc.CallOption) (BeamFnWorkerStatus_WorkerStatusClient, error) +} + +type beamFnWorkerStatusClient struct { + cc grpc.ClientConnInterface +} + +func NewBeamFnWorkerStatusClient(cc grpc.ClientConnInterface) BeamFnWorkerStatusClient { + return &beamFnWorkerStatusClient{cc} +} + +func (c *beamFnWorkerStatusClient) WorkerStatus(ctx context.Context, opts ...grpc.CallOption) (BeamFnWorkerStatus_WorkerStatusClient, error) { + stream, err := c.cc.NewStream(ctx, &BeamFnWorkerStatus_ServiceDesc.Streams[0], "/org.apache.beam.model.fn_execution.v1.BeamFnWorkerStatus/WorkerStatus", opts...) + if err != nil { + return nil, err + } + x := &beamFnWorkerStatusWorkerStatusClient{stream} + return x, nil +} + +type BeamFnWorkerStatus_WorkerStatusClient interface { + Send(*WorkerStatusResponse) error + Recv() (*WorkerStatusRequest, error) + grpc.ClientStream +} + +type beamFnWorkerStatusWorkerStatusClient struct { + grpc.ClientStream +} + +func (x *beamFnWorkerStatusWorkerStatusClient) Send(m *WorkerStatusResponse) error { + return x.ClientStream.SendMsg(m) +} + +func (x *beamFnWorkerStatusWorkerStatusClient) Recv() (*WorkerStatusRequest, error) { + m := new(WorkerStatusRequest) + if err := x.ClientStream.RecvMsg(m); err != nil { + return nil, err + } + return m, nil +} + +// BeamFnWorkerStatusServer is the server API for BeamFnWorkerStatus service. +// All implementations must embed UnimplementedBeamFnWorkerStatusServer +// for forward compatibility +type BeamFnWorkerStatusServer interface { + WorkerStatus(BeamFnWorkerStatus_WorkerStatusServer) error + mustEmbedUnimplementedBeamFnWorkerStatusServer() +} + +// UnimplementedBeamFnWorkerStatusServer must be embedded to have forward compatible implementations. +type UnimplementedBeamFnWorkerStatusServer struct { +} + +func (UnimplementedBeamFnWorkerStatusServer) WorkerStatus(BeamFnWorkerStatus_WorkerStatusServer) error { + return status.Errorf(codes.Unimplemented, "method WorkerStatus not implemented") +} +func (UnimplementedBeamFnWorkerStatusServer) mustEmbedUnimplementedBeamFnWorkerStatusServer() {} + +// UnsafeBeamFnWorkerStatusServer may be embedded to opt out of forward compatibility for this service. +// Use of this interface is not recommended, as added methods to BeamFnWorkerStatusServer will +// result in compilation errors. 
+type UnsafeBeamFnWorkerStatusServer interface { + mustEmbedUnimplementedBeamFnWorkerStatusServer() +} + +func RegisterBeamFnWorkerStatusServer(s grpc.ServiceRegistrar, srv BeamFnWorkerStatusServer) { + s.RegisterService(&BeamFnWorkerStatus_ServiceDesc, srv) +} + +func _BeamFnWorkerStatus_WorkerStatus_Handler(srv interface{}, stream grpc.ServerStream) error { + return srv.(BeamFnWorkerStatusServer).WorkerStatus(&beamFnWorkerStatusWorkerStatusServer{stream}) +} + +type BeamFnWorkerStatus_WorkerStatusServer interface { + Send(*WorkerStatusRequest) error + Recv() (*WorkerStatusResponse, error) + grpc.ServerStream +} + +type beamFnWorkerStatusWorkerStatusServer struct { + grpc.ServerStream +} + +func (x *beamFnWorkerStatusWorkerStatusServer) Send(m *WorkerStatusRequest) error { + return x.ServerStream.SendMsg(m) +} + +func (x *beamFnWorkerStatusWorkerStatusServer) Recv() (*WorkerStatusResponse, error) { + m := new(WorkerStatusResponse) + if err := x.ServerStream.RecvMsg(m); err != nil { + return nil, err + } + return m, nil +} + +// BeamFnWorkerStatus_ServiceDesc is the grpc.ServiceDesc for BeamFnWorkerStatus service. +// It's only intended for direct use with grpc.RegisterService, +// and not to be introspected or modified (even as a copy) +var BeamFnWorkerStatus_ServiceDesc = grpc.ServiceDesc{ + ServiceName: "org.apache.beam.model.fn_execution.v1.BeamFnWorkerStatus", + HandlerType: (*BeamFnWorkerStatusServer)(nil), + Methods: []grpc.MethodDesc{}, + Streams: []grpc.StreamDesc{ + { + StreamName: "WorkerStatus", + Handler: _BeamFnWorkerStatus_WorkerStatus_Handler, + ServerStreams: true, + ClientStreams: true, + }, + }, + Metadata: "beam_fn_api.proto", +} diff --git a/sdks/go/pkg/beam/model/fnexecution_v1/beam_provision_api.pb.go b/sdks/go/pkg/beam/model/fnexecution_v1/beam_provision_api.pb.go index ff73cd2be901..9c19469350ed 100644 --- a/sdks/go/pkg/beam/model/fnexecution_v1/beam_provision_api.pb.go +++ b/sdks/go/pkg/beam/model/fnexecution_v1/beam_provision_api.pb.go @@ -21,21 +21,17 @@ // Code generated by protoc-gen-go. DO NOT EDIT. // versions: -// protoc-gen-go v1.25.0-devel -// protoc v3.13.0 +// protoc-gen-go v1.27.1 +// protoc v3.17.3 // source: beam_provision_api.proto package fnexecution_v1 import ( - context "context" - pipeline_v1 "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" - _struct "github.com/golang/protobuf/ptypes/struct" - grpc "google.golang.org/grpc" - codes "google.golang.org/grpc/codes" - status "google.golang.org/grpc/status" + pipeline_v1 "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" protoreflect "google.golang.org/protobuf/reflect/protoreflect" protoimpl "google.golang.org/protobuf/runtime/protoimpl" + structpb "google.golang.org/protobuf/types/known/structpb" reflect "reflect" sync "sync" ) @@ -143,7 +139,7 @@ type ProvisionInfo struct { // (required) Pipeline options. For non-template jobs, the options are // identical to what is passed to job submission. - PipelineOptions *_struct.Struct `protobuf:"bytes,3,opt,name=pipeline_options,json=pipelineOptions,proto3" json:"pipeline_options,omitempty"` + PipelineOptions *structpb.Struct `protobuf:"bytes,3,opt,name=pipeline_options,json=pipelineOptions,proto3" json:"pipeline_options,omitempty"` // (required) The artifact retrieval token produced by // LegacyArtifactStagingService.CommitManifestResponse. 
RetrievalToken string `protobuf:"bytes,6,opt,name=retrieval_token,json=retrievalToken,proto3" json:"retrieval_token,omitempty"` @@ -160,6 +156,12 @@ type ProvisionInfo struct { ControlEndpoint *pipeline_v1.ApiServiceDescriptor `protobuf:"bytes,10,opt,name=control_endpoint,json=controlEndpoint,proto3" json:"control_endpoint,omitempty"` // The set of dependencies that should be staged into this environment. Dependencies []*pipeline_v1.ArtifactInformation `protobuf:"bytes,11,rep,name=dependencies,proto3" json:"dependencies,omitempty"` + // (optional) A set of capabilities that this SDK is allowed to use in its + // interactions with this runner. + RunnerCapabilities []string `protobuf:"bytes,12,rep,name=runner_capabilities,json=runnerCapabilities,proto3" json:"runner_capabilities,omitempty"` + // (optional) Runtime environment metadata that are static throughout the + // pipeline execution. + Metadata map[string]string `protobuf:"bytes,13,rep,name=metadata,proto3" json:"metadata,omitempty" protobuf_key:"bytes,1,opt,name=key,proto3" protobuf_val:"bytes,2,opt,name=value,proto3"` } func (x *ProvisionInfo) Reset() { @@ -194,7 +196,7 @@ func (*ProvisionInfo) Descriptor() ([]byte, []int) { return file_beam_provision_api_proto_rawDescGZIP(), []int{2} } -func (x *ProvisionInfo) GetPipelineOptions() *_struct.Struct { +func (x *ProvisionInfo) GetPipelineOptions() *structpb.Struct { if x != nil { return x.PipelineOptions } @@ -243,6 +245,20 @@ func (x *ProvisionInfo) GetDependencies() []*pipeline_v1.ArtifactInformation { return nil } +func (x *ProvisionInfo) GetRunnerCapabilities() []string { + if x != nil { + return x.RunnerCapabilities + } + return nil +} + +func (x *ProvisionInfo) GetMetadata() map[string]string { + if x != nil { + return x.Metadata + } + return nil +} + var File_beam_provision_api_proto protoreflect.FileDescriptor var file_beam_provision_api_proto_rawDesc = []byte{ @@ -262,7 +278,7 @@ var file_beam_provision_api_proto_rawDesc = []byte{ 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x50, 0x72, 0x6f, 0x76, 0x69, 0x73, 0x69, 0x6f, 0x6e, 0x49, 0x6e, - 0x66, 0x6f, 0x52, 0x04, 0x69, 0x6e, 0x66, 0x6f, 0x22, 0xe8, 0x04, 0x0a, 0x0d, 0x50, 0x72, 0x6f, + 0x66, 0x6f, 0x52, 0x04, 0x69, 0x6e, 0x66, 0x6f, 0x22, 0xb6, 0x06, 0x0a, 0x0d, 0x50, 0x72, 0x6f, 0x76, 0x69, 0x73, 0x69, 0x6f, 0x6e, 0x49, 0x6e, 0x66, 0x6f, 0x12, 0x42, 0x0a, 0x10, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x5f, 0x6f, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x73, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x17, 0x2e, 0x67, 0x6f, 0x6f, 0x67, 0x6c, 0x65, 0x2e, 0x70, 0x72, @@ -301,26 +317,39 @@ var file_beam_provision_api_proto_rawDesc = []byte{ 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x41, 0x72, 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, 0x49, 0x6e, 0x66, 0x6f, 0x72, 0x6d, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x52, 0x0c, 0x64, 0x65, 0x70, 0x65, 0x6e, 0x64, 0x65, 0x6e, 0x63, - 0x69, 0x65, 0x73, 0x32, 0xa8, 0x01, 0x0a, 0x10, 0x50, 0x72, 0x6f, 0x76, 0x69, 0x73, 0x69, 0x6f, - 0x6e, 0x53, 0x65, 0x72, 0x76, 0x69, 0x63, 0x65, 0x12, 0x93, 0x01, 0x0a, 0x10, 0x47, 0x65, 0x74, - 0x50, 0x72, 0x6f, 0x76, 0x69, 0x73, 0x69, 0x6f, 0x6e, 0x49, 0x6e, 0x66, 0x6f, 0x12, 0x3e, 0x2e, - 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, - 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 
0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, - 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x47, 0x65, 0x74, 0x50, 0x72, 0x6f, 0x76, 0x69, 0x73, 0x69, - 0x6f, 0x6e, 0x49, 0x6e, 0x66, 0x6f, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x1a, 0x3f, 0x2e, - 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, - 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, - 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x47, 0x65, 0x74, 0x50, 0x72, 0x6f, 0x76, 0x69, 0x73, 0x69, - 0x6f, 0x6e, 0x49, 0x6e, 0x66, 0x6f, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x42, 0x81, - 0x01, 0x0a, 0x24, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, - 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x65, 0x78, 0x65, 0x63, 0x75, - 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x42, 0x0c, 0x50, 0x72, 0x6f, 0x76, 0x69, 0x73, 0x69, - 0x6f, 0x6e, 0x41, 0x70, 0x69, 0x5a, 0x4b, 0x67, 0x69, 0x74, 0x68, 0x75, 0x62, 0x2e, 0x63, 0x6f, - 0x6d, 0x2f, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x73, 0x64, - 0x6b, 0x73, 0x2f, 0x67, 0x6f, 0x2f, 0x70, 0x6b, 0x67, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x6d, - 0x6f, 0x64, 0x65, 0x6c, 0x2f, 0x66, 0x6e, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, - 0x5f, 0x76, 0x31, 0x3b, 0x66, 0x6e, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x5f, - 0x76, 0x31, 0x62, 0x06, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x33, + 0x69, 0x65, 0x73, 0x12, 0x2f, 0x0a, 0x13, 0x72, 0x75, 0x6e, 0x6e, 0x65, 0x72, 0x5f, 0x63, 0x61, + 0x70, 0x61, 0x62, 0x69, 0x6c, 0x69, 0x74, 0x69, 0x65, 0x73, 0x18, 0x0c, 0x20, 0x03, 0x28, 0x09, + 0x52, 0x12, 0x72, 0x75, 0x6e, 0x6e, 0x65, 0x72, 0x43, 0x61, 0x70, 0x61, 0x62, 0x69, 0x6c, 0x69, + 0x74, 0x69, 0x65, 0x73, 0x12, 0x5e, 0x0a, 0x08, 0x6d, 0x65, 0x74, 0x61, 0x64, 0x61, 0x74, 0x61, + 0x18, 0x0d, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x42, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, + 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, + 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x50, + 0x72, 0x6f, 0x76, 0x69, 0x73, 0x69, 0x6f, 0x6e, 0x49, 0x6e, 0x66, 0x6f, 0x2e, 0x4d, 0x65, 0x74, + 0x61, 0x64, 0x61, 0x74, 0x61, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x52, 0x08, 0x6d, 0x65, 0x74, 0x61, + 0x64, 0x61, 0x74, 0x61, 0x1a, 0x3b, 0x0a, 0x0d, 0x4d, 0x65, 0x74, 0x61, 0x64, 0x61, 0x74, 0x61, + 0x45, 0x6e, 0x74, 0x72, 0x79, 0x12, 0x10, 0x0a, 0x03, 0x6b, 0x65, 0x79, 0x18, 0x01, 0x20, 0x01, + 0x28, 0x09, 0x52, 0x03, 0x6b, 0x65, 0x79, 0x12, 0x14, 0x0a, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, + 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x3a, 0x02, 0x38, + 0x01, 0x32, 0xa8, 0x01, 0x0a, 0x10, 0x50, 0x72, 0x6f, 0x76, 0x69, 0x73, 0x69, 0x6f, 0x6e, 0x53, + 0x65, 0x72, 0x76, 0x69, 0x63, 0x65, 0x12, 0x93, 0x01, 0x0a, 0x10, 0x47, 0x65, 0x74, 0x50, 0x72, + 0x6f, 0x76, 0x69, 0x73, 0x69, 0x6f, 0x6e, 0x49, 0x6e, 0x66, 0x6f, 0x12, 0x3e, 0x2e, 0x6f, 0x72, + 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, + 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, + 0x2e, 0x76, 0x31, 0x2e, 0x47, 0x65, 0x74, 0x50, 0x72, 0x6f, 0x76, 0x69, 0x73, 0x69, 0x6f, 0x6e, + 0x49, 0x6e, 0x66, 0x6f, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x1a, 0x3f, 0x2e, 0x6f, 0x72, + 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, + 0x64, 0x65, 
0x6c, 0x2e, 0x66, 0x6e, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, + 0x2e, 0x76, 0x31, 0x2e, 0x47, 0x65, 0x74, 0x50, 0x72, 0x6f, 0x76, 0x69, 0x73, 0x69, 0x6f, 0x6e, + 0x49, 0x6e, 0x66, 0x6f, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x42, 0x84, 0x01, 0x0a, + 0x24, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, + 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x66, 0x6e, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, + 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x42, 0x0c, 0x50, 0x72, 0x6f, 0x76, 0x69, 0x73, 0x69, 0x6f, 0x6e, + 0x41, 0x70, 0x69, 0x5a, 0x4e, 0x67, 0x69, 0x74, 0x68, 0x75, 0x62, 0x2e, 0x63, 0x6f, 0x6d, 0x2f, + 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x73, 0x64, 0x6b, 0x73, + 0x2f, 0x76, 0x32, 0x2f, 0x67, 0x6f, 0x2f, 0x70, 0x6b, 0x67, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, + 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2f, 0x66, 0x6e, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, + 0x6e, 0x5f, 0x76, 0x31, 0x3b, 0x66, 0x6e, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, + 0x5f, 0x76, 0x31, 0x62, 0x06, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x33, } var ( @@ -335,30 +364,32 @@ func file_beam_provision_api_proto_rawDescGZIP() []byte { return file_beam_provision_api_proto_rawDescData } -var file_beam_provision_api_proto_msgTypes = make([]protoimpl.MessageInfo, 3) +var file_beam_provision_api_proto_msgTypes = make([]protoimpl.MessageInfo, 4) var file_beam_provision_api_proto_goTypes = []interface{}{ (*GetProvisionInfoRequest)(nil), // 0: org.apache.beam.model.fn_execution.v1.GetProvisionInfoRequest (*GetProvisionInfoResponse)(nil), // 1: org.apache.beam.model.fn_execution.v1.GetProvisionInfoResponse (*ProvisionInfo)(nil), // 2: org.apache.beam.model.fn_execution.v1.ProvisionInfo - (*_struct.Struct)(nil), // 3: google.protobuf.Struct - (*pipeline_v1.ApiServiceDescriptor)(nil), // 4: org.apache.beam.model.pipeline.v1.ApiServiceDescriptor - (*pipeline_v1.ArtifactInformation)(nil), // 5: org.apache.beam.model.pipeline.v1.ArtifactInformation + nil, // 3: org.apache.beam.model.fn_execution.v1.ProvisionInfo.MetadataEntry + (*structpb.Struct)(nil), // 4: google.protobuf.Struct + (*pipeline_v1.ApiServiceDescriptor)(nil), // 5: org.apache.beam.model.pipeline.v1.ApiServiceDescriptor + (*pipeline_v1.ArtifactInformation)(nil), // 6: org.apache.beam.model.pipeline.v1.ArtifactInformation } var file_beam_provision_api_proto_depIdxs = []int32{ 2, // 0: org.apache.beam.model.fn_execution.v1.GetProvisionInfoResponse.info:type_name -> org.apache.beam.model.fn_execution.v1.ProvisionInfo - 3, // 1: org.apache.beam.model.fn_execution.v1.ProvisionInfo.pipeline_options:type_name -> google.protobuf.Struct - 4, // 2: org.apache.beam.model.fn_execution.v1.ProvisionInfo.status_endpoint:type_name -> org.apache.beam.model.pipeline.v1.ApiServiceDescriptor - 4, // 3: org.apache.beam.model.fn_execution.v1.ProvisionInfo.logging_endpoint:type_name -> org.apache.beam.model.pipeline.v1.ApiServiceDescriptor - 4, // 4: org.apache.beam.model.fn_execution.v1.ProvisionInfo.artifact_endpoint:type_name -> org.apache.beam.model.pipeline.v1.ApiServiceDescriptor - 4, // 5: org.apache.beam.model.fn_execution.v1.ProvisionInfo.control_endpoint:type_name -> org.apache.beam.model.pipeline.v1.ApiServiceDescriptor - 5, // 6: org.apache.beam.model.fn_execution.v1.ProvisionInfo.dependencies:type_name -> org.apache.beam.model.pipeline.v1.ArtifactInformation - 0, // 7: org.apache.beam.model.fn_execution.v1.ProvisionService.GetProvisionInfo:input_type -> 
org.apache.beam.model.fn_execution.v1.GetProvisionInfoRequest - 1, // 8: org.apache.beam.model.fn_execution.v1.ProvisionService.GetProvisionInfo:output_type -> org.apache.beam.model.fn_execution.v1.GetProvisionInfoResponse - 8, // [8:9] is the sub-list for method output_type - 7, // [7:8] is the sub-list for method input_type - 7, // [7:7] is the sub-list for extension type_name - 7, // [7:7] is the sub-list for extension extendee - 0, // [0:7] is the sub-list for field type_name + 4, // 1: org.apache.beam.model.fn_execution.v1.ProvisionInfo.pipeline_options:type_name -> google.protobuf.Struct + 5, // 2: org.apache.beam.model.fn_execution.v1.ProvisionInfo.status_endpoint:type_name -> org.apache.beam.model.pipeline.v1.ApiServiceDescriptor + 5, // 3: org.apache.beam.model.fn_execution.v1.ProvisionInfo.logging_endpoint:type_name -> org.apache.beam.model.pipeline.v1.ApiServiceDescriptor + 5, // 4: org.apache.beam.model.fn_execution.v1.ProvisionInfo.artifact_endpoint:type_name -> org.apache.beam.model.pipeline.v1.ApiServiceDescriptor + 5, // 5: org.apache.beam.model.fn_execution.v1.ProvisionInfo.control_endpoint:type_name -> org.apache.beam.model.pipeline.v1.ApiServiceDescriptor + 6, // 6: org.apache.beam.model.fn_execution.v1.ProvisionInfo.dependencies:type_name -> org.apache.beam.model.pipeline.v1.ArtifactInformation + 3, // 7: org.apache.beam.model.fn_execution.v1.ProvisionInfo.metadata:type_name -> org.apache.beam.model.fn_execution.v1.ProvisionInfo.MetadataEntry + 0, // 8: org.apache.beam.model.fn_execution.v1.ProvisionService.GetProvisionInfo:input_type -> org.apache.beam.model.fn_execution.v1.GetProvisionInfoRequest + 1, // 9: org.apache.beam.model.fn_execution.v1.ProvisionService.GetProvisionInfo:output_type -> org.apache.beam.model.fn_execution.v1.GetProvisionInfoResponse + 9, // [9:10] is the sub-list for method output_type + 8, // [8:9] is the sub-list for method input_type + 8, // [8:8] is the sub-list for extension type_name + 8, // [8:8] is the sub-list for extension extendee + 0, // [0:8] is the sub-list for field type_name } func init() { file_beam_provision_api_proto_init() } @@ -410,7 +441,7 @@ func file_beam_provision_api_proto_init() { GoPackagePath: reflect.TypeOf(x{}).PkgPath(), RawDescriptor: file_beam_provision_api_proto_rawDesc, NumEnums: 0, - NumMessages: 3, + NumMessages: 4, NumExtensions: 0, NumServices: 1, }, @@ -423,85 +454,3 @@ func file_beam_provision_api_proto_init() { file_beam_provision_api_proto_goTypes = nil file_beam_provision_api_proto_depIdxs = nil } - -// Reference imports to suppress errors if they are not otherwise used. -var _ context.Context -var _ grpc.ClientConnInterface - -// This is a compile-time assertion to ensure that this generated file -// is compatible with the grpc package it is being compiled against. -const _ = grpc.SupportPackageIsVersion6 - -// ProvisionServiceClient is the client API for ProvisionService service. -// -// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://godoc.org/google.golang.org/grpc#ClientConn.NewStream. -type ProvisionServiceClient interface { - // Get provision information for the SDK harness worker instance. 
- GetProvisionInfo(ctx context.Context, in *GetProvisionInfoRequest, opts ...grpc.CallOption) (*GetProvisionInfoResponse, error) -} - -type provisionServiceClient struct { - cc grpc.ClientConnInterface -} - -func NewProvisionServiceClient(cc grpc.ClientConnInterface) ProvisionServiceClient { - return &provisionServiceClient{cc} -} - -func (c *provisionServiceClient) GetProvisionInfo(ctx context.Context, in *GetProvisionInfoRequest, opts ...grpc.CallOption) (*GetProvisionInfoResponse, error) { - out := new(GetProvisionInfoResponse) - err := c.cc.Invoke(ctx, "/org.apache.beam.model.fn_execution.v1.ProvisionService/GetProvisionInfo", in, out, opts...) - if err != nil { - return nil, err - } - return out, nil -} - -// ProvisionServiceServer is the server API for ProvisionService service. -type ProvisionServiceServer interface { - // Get provision information for the SDK harness worker instance. - GetProvisionInfo(context.Context, *GetProvisionInfoRequest) (*GetProvisionInfoResponse, error) -} - -// UnimplementedProvisionServiceServer can be embedded to have forward compatible implementations. -type UnimplementedProvisionServiceServer struct { -} - -func (*UnimplementedProvisionServiceServer) GetProvisionInfo(context.Context, *GetProvisionInfoRequest) (*GetProvisionInfoResponse, error) { - return nil, status.Errorf(codes.Unimplemented, "method GetProvisionInfo not implemented") -} - -func RegisterProvisionServiceServer(s *grpc.Server, srv ProvisionServiceServer) { - s.RegisterService(&_ProvisionService_serviceDesc, srv) -} - -func _ProvisionService_GetProvisionInfo_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { - in := new(GetProvisionInfoRequest) - if err := dec(in); err != nil { - return nil, err - } - if interceptor == nil { - return srv.(ProvisionServiceServer).GetProvisionInfo(ctx, in) - } - info := &grpc.UnaryServerInfo{ - Server: srv, - FullMethod: "/org.apache.beam.model.fn_execution.v1.ProvisionService/GetProvisionInfo", - } - handler := func(ctx context.Context, req interface{}) (interface{}, error) { - return srv.(ProvisionServiceServer).GetProvisionInfo(ctx, req.(*GetProvisionInfoRequest)) - } - return interceptor(ctx, in, info, handler) -} - -var _ProvisionService_serviceDesc = grpc.ServiceDesc{ - ServiceName: "org.apache.beam.model.fn_execution.v1.ProvisionService", - HandlerType: (*ProvisionServiceServer)(nil), - Methods: []grpc.MethodDesc{ - { - MethodName: "GetProvisionInfo", - Handler: _ProvisionService_GetProvisionInfo_Handler, - }, - }, - Streams: []grpc.StreamDesc{}, - Metadata: "beam_provision_api.proto", -} diff --git a/sdks/go/pkg/beam/model/fnexecution_v1/beam_provision_api_grpc.pb.go b/sdks/go/pkg/beam/model/fnexecution_v1/beam_provision_api_grpc.pb.go new file mode 100644 index 000000000000..4e8967e0e88d --- /dev/null +++ b/sdks/go/pkg/beam/model/fnexecution_v1/beam_provision_api_grpc.pb.go @@ -0,0 +1,120 @@ +// +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. 
You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +// Code generated by protoc-gen-go-grpc. DO NOT EDIT. + +package fnexecution_v1 + +import ( + context "context" + grpc "google.golang.org/grpc" + codes "google.golang.org/grpc/codes" + status "google.golang.org/grpc/status" +) + +// This is a compile-time assertion to ensure that this generated file +// is compatible with the grpc package it is being compiled against. +// Requires gRPC-Go v1.32.0 or later. +const _ = grpc.SupportPackageIsVersion7 + +// ProvisionServiceClient is the client API for ProvisionService service. +// +// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://pkg.go.dev/google.golang.org/grpc/?tab=doc#ClientConn.NewStream. +type ProvisionServiceClient interface { + // Get provision information for the SDK harness worker instance. + GetProvisionInfo(ctx context.Context, in *GetProvisionInfoRequest, opts ...grpc.CallOption) (*GetProvisionInfoResponse, error) +} + +type provisionServiceClient struct { + cc grpc.ClientConnInterface +} + +func NewProvisionServiceClient(cc grpc.ClientConnInterface) ProvisionServiceClient { + return &provisionServiceClient{cc} +} + +func (c *provisionServiceClient) GetProvisionInfo(ctx context.Context, in *GetProvisionInfoRequest, opts ...grpc.CallOption) (*GetProvisionInfoResponse, error) { + out := new(GetProvisionInfoResponse) + err := c.cc.Invoke(ctx, "/org.apache.beam.model.fn_execution.v1.ProvisionService/GetProvisionInfo", in, out, opts...) + if err != nil { + return nil, err + } + return out, nil +} + +// ProvisionServiceServer is the server API for ProvisionService service. +// All implementations must embed UnimplementedProvisionServiceServer +// for forward compatibility +type ProvisionServiceServer interface { + // Get provision information for the SDK harness worker instance. + GetProvisionInfo(context.Context, *GetProvisionInfoRequest) (*GetProvisionInfoResponse, error) + mustEmbedUnimplementedProvisionServiceServer() +} + +// UnimplementedProvisionServiceServer must be embedded to have forward compatible implementations. +type UnimplementedProvisionServiceServer struct { +} + +func (UnimplementedProvisionServiceServer) GetProvisionInfo(context.Context, *GetProvisionInfoRequest) (*GetProvisionInfoResponse, error) { + return nil, status.Errorf(codes.Unimplemented, "method GetProvisionInfo not implemented") +} +func (UnimplementedProvisionServiceServer) mustEmbedUnimplementedProvisionServiceServer() {} + +// UnsafeProvisionServiceServer may be embedded to opt out of forward compatibility for this service. +// Use of this interface is not recommended, as added methods to ProvisionServiceServer will +// result in compilation errors. 
+type UnsafeProvisionServiceServer interface { + mustEmbedUnimplementedProvisionServiceServer() +} + +func RegisterProvisionServiceServer(s grpc.ServiceRegistrar, srv ProvisionServiceServer) { + s.RegisterService(&ProvisionService_ServiceDesc, srv) +} + +func _ProvisionService_GetProvisionInfo_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { + in := new(GetProvisionInfoRequest) + if err := dec(in); err != nil { + return nil, err + } + if interceptor == nil { + return srv.(ProvisionServiceServer).GetProvisionInfo(ctx, in) + } + info := &grpc.UnaryServerInfo{ + Server: srv, + FullMethod: "/org.apache.beam.model.fn_execution.v1.ProvisionService/GetProvisionInfo", + } + handler := func(ctx context.Context, req interface{}) (interface{}, error) { + return srv.(ProvisionServiceServer).GetProvisionInfo(ctx, req.(*GetProvisionInfoRequest)) + } + return interceptor(ctx, in, info, handler) +} + +// ProvisionService_ServiceDesc is the grpc.ServiceDesc for ProvisionService service. +// It's only intended for direct use with grpc.RegisterService, +// and not to be introspected or modified (even as a copy) +var ProvisionService_ServiceDesc = grpc.ServiceDesc{ + ServiceName: "org.apache.beam.model.fn_execution.v1.ProvisionService", + HandlerType: (*ProvisionServiceServer)(nil), + Methods: []grpc.MethodDesc{ + { + MethodName: "GetProvisionInfo", + Handler: _ProvisionService_GetProvisionInfo_Handler, + }, + }, + Streams: []grpc.StreamDesc{}, + Metadata: "beam_provision_api.proto", +} diff --git a/sdks/go/pkg/beam/model/gen.go b/sdks/go/pkg/beam/model/gen.go index dc706a25704a..c7905ef1a482 100644 --- a/sdks/go/pkg/beam/model/gen.go +++ b/sdks/go/pkg/beam/model/gen.go @@ -19,9 +19,11 @@ package model // TODO(herohde) 9/1/2017: for now, install protoc as described on grpc.io before running go generate. // TODO(lostluck) 2019/05/03: Figure out how to avoid manually keeping these in sync with the // generator. protoc-gen-go can be declared as a versioned module dependency too. +// TODO(danoliveira) 2021/08/11: GRPC generated files are generated without a license which needs +// to be added in manually. Figure out how to get the licenses applying properly. // Until file is automatically generated, keep the listed proto files in alphabetical order. -//go:generate protoc -I../../../../../model/pipeline/src/main/proto ../../../../../model/pipeline/src/main/proto/beam_runner_api.proto ../../../../../model/pipeline/src/main/proto/endpoints.proto ../../../../../model/pipeline/src/main/proto/external_transforms.proto ../../../../../model/pipeline/src/main/proto/metrics.proto ../../../../../model/pipeline/src/main/proto/schema.proto ../../../../../model/pipeline/src/main/proto/standard_window_fns.proto --go_out=M../../../../../../../../,plugins=grpc:../../../../../../../.. -//go:generate protoc -I../../../../../model/pipeline/src/main/proto -I../../../../../model/job-management/src/main/proto ../../../../../model/job-management/src/main/proto/beam_artifact_api.proto ../../../../../model/job-management/src/main/proto/beam_expansion_api.proto ../../../../../model/job-management/src/main/proto/beam_job_api.proto --go_out=M../../../../../../../../,plugins=grpc:../../../../../../../.. 
-//go:generate protoc -I../../../../../model/pipeline/src/main/proto -I../../../../../model/fn-execution/src/main/proto ../../../../../model/fn-execution/src/main/proto/beam_fn_api.proto ../../../../../model/fn-execution/src/main/proto/beam_provision_api.proto --go_out=M../../../../../../../../,plugins=grpc:../../../../../../../.. +//go:generate protoc --go_opt=module=github.com/apache/beam/sdks/v2 --go-grpc_opt=module=github.com/apache/beam/sdks/v2 -I../../../../../model/pipeline/src/main/proto ../../../../../model/pipeline/src/main/proto/beam_runner_api.proto ../../../../../model/pipeline/src/main/proto/endpoints.proto ../../../../../model/pipeline/src/main/proto/external_transforms.proto ../../../../../model/pipeline/src/main/proto/metrics.proto ../../../../../model/pipeline/src/main/proto/schema.proto ../../../../../model/pipeline/src/main/proto/standard_window_fns.proto --go_out=../../../../ --go-grpc_out=../../../.. +//go:generate protoc --go_opt=module=github.com/apache/beam/sdks/v2 --go-grpc_opt=module=github.com/apache/beam/sdks/v2 -I../../../../../model/pipeline/src/main/proto -I../../../../../model/job-management/src/main/proto ../../../../../model/job-management/src/main/proto/beam_artifact_api.proto ../../../../../model/job-management/src/main/proto/beam_expansion_api.proto ../../../../../model/job-management/src/main/proto/beam_job_api.proto --go_out=../../../../ --go-grpc_out=../../../.. +//go:generate protoc --go_opt=module=github.com/apache/beam/sdks/v2 --go-grpc_opt=module=github.com/apache/beam/sdks/v2 -I../../../../../model/pipeline/src/main/proto -I../../../../../model/fn-execution/src/main/proto ../../../../../model/fn-execution/src/main/proto/beam_fn_api.proto ../../../../../model/fn-execution/src/main/proto/beam_provision_api.proto --go_out=../../../../ --go-grpc_out=../../../.. diff --git a/sdks/go/pkg/beam/model/jobmanagement_v1/beam_artifact_api.pb.go b/sdks/go/pkg/beam/model/jobmanagement_v1/beam_artifact_api.pb.go index 2cb93ba0b514..d0be17f3d755 100644 --- a/sdks/go/pkg/beam/model/jobmanagement_v1/beam_artifact_api.pb.go +++ b/sdks/go/pkg/beam/model/jobmanagement_v1/beam_artifact_api.pb.go @@ -21,18 +21,14 @@ // Code generated by protoc-gen-go. DO NOT EDIT. 
// versions: -// protoc-gen-go v1.25.0-devel -// protoc v3.13.0 +// protoc-gen-go v1.27.1 +// protoc v3.17.3 // source: beam_artifact_api.proto package jobmanagement_v1 import ( - context "context" - pipeline_v1 "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" - grpc "google.golang.org/grpc" - codes "google.golang.org/grpc/codes" - status "google.golang.org/grpc/status" + pipeline_v1 "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" protoreflect "google.golang.org/protobuf/reflect/protoreflect" protoimpl "google.golang.org/protobuf/runtime/protoimpl" reflect "reflect" @@ -1439,16 +1435,16 @@ var file_beam_artifact_api_proto_rawDesc = []byte{ 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x6a, 0x6f, 0x62, 0x5f, 0x6d, 0x61, 0x6e, 0x61, 0x67, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x2e, 0x76, 0x31, 0x2e, 0x41, 0x72, 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, - 0x43, 0x68, 0x75, 0x6e, 0x6b, 0x30, 0x01, 0x42, 0x86, 0x01, 0x0a, 0x26, 0x6f, 0x72, 0x67, 0x2e, + 0x43, 0x68, 0x75, 0x6e, 0x6b, 0x30, 0x01, 0x42, 0x89, 0x01, 0x0a, 0x26, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x6a, 0x6f, 0x62, 0x6d, 0x61, 0x6e, 0x61, 0x67, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x2e, 0x76, 0x31, 0x42, 0x0b, 0x41, 0x72, 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, 0x41, 0x70, 0x69, 0x5a, - 0x4f, 0x67, 0x69, 0x74, 0x68, 0x75, 0x62, 0x2e, 0x63, 0x6f, 0x6d, 0x2f, 0x61, 0x70, 0x61, 0x63, - 0x68, 0x65, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x73, 0x64, 0x6b, 0x73, 0x2f, 0x67, 0x6f, 0x2f, - 0x70, 0x6b, 0x67, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2f, 0x6a, - 0x6f, 0x62, 0x6d, 0x61, 0x6e, 0x61, 0x67, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x5f, 0x76, 0x31, 0x3b, - 0x6a, 0x6f, 0x62, 0x6d, 0x61, 0x6e, 0x61, 0x67, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x5f, 0x76, 0x31, - 0x62, 0x06, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x33, + 0x52, 0x67, 0x69, 0x74, 0x68, 0x75, 0x62, 0x2e, 0x63, 0x6f, 0x6d, 0x2f, 0x61, 0x70, 0x61, 0x63, + 0x68, 0x65, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x73, 0x64, 0x6b, 0x73, 0x2f, 0x76, 0x32, 0x2f, + 0x67, 0x6f, 0x2f, 0x70, 0x6b, 0x67, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x6d, 0x6f, 0x64, 0x65, + 0x6c, 0x2f, 0x6a, 0x6f, 0x62, 0x6d, 0x61, 0x6e, 0x61, 0x67, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x5f, + 0x76, 0x31, 0x3b, 0x6a, 0x6f, 0x62, 0x6d, 0x61, 0x6e, 0x61, 0x67, 0x65, 0x6d, 0x65, 0x6e, 0x74, + 0x5f, 0x76, 0x31, 0x62, 0x06, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x33, } var ( @@ -1792,556 +1788,3 @@ func file_beam_artifact_api_proto_init() { file_beam_artifact_api_proto_goTypes = nil file_beam_artifact_api_proto_depIdxs = nil } - -// Reference imports to suppress errors if they are not otherwise used. -var _ context.Context -var _ grpc.ClientConnInterface - -// This is a compile-time assertion to ensure that this generated file -// is compatible with the grpc package it is being compiled against. -const _ = grpc.SupportPackageIsVersion6 - -// ArtifactRetrievalServiceClient is the client API for ArtifactRetrievalService service. -// -// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://godoc.org/google.golang.org/grpc#ClientConn.NewStream. -type ArtifactRetrievalServiceClient interface { - // Resolves the given artifact references into one or more replacement - // artifact references (e.g. a Maven dependency into a (transitive) set - // of jars. 
- ResolveArtifacts(ctx context.Context, in *ResolveArtifactsRequest, opts ...grpc.CallOption) (*ResolveArtifactsResponse, error) - // Retrieves the given artifact as a stream of bytes. - GetArtifact(ctx context.Context, in *GetArtifactRequest, opts ...grpc.CallOption) (ArtifactRetrievalService_GetArtifactClient, error) -} - -type artifactRetrievalServiceClient struct { - cc grpc.ClientConnInterface -} - -func NewArtifactRetrievalServiceClient(cc grpc.ClientConnInterface) ArtifactRetrievalServiceClient { - return &artifactRetrievalServiceClient{cc} -} - -func (c *artifactRetrievalServiceClient) ResolveArtifacts(ctx context.Context, in *ResolveArtifactsRequest, opts ...grpc.CallOption) (*ResolveArtifactsResponse, error) { - out := new(ResolveArtifactsResponse) - err := c.cc.Invoke(ctx, "/org.apache.beam.model.job_management.v1.ArtifactRetrievalService/ResolveArtifacts", in, out, opts...) - if err != nil { - return nil, err - } - return out, nil -} - -func (c *artifactRetrievalServiceClient) GetArtifact(ctx context.Context, in *GetArtifactRequest, opts ...grpc.CallOption) (ArtifactRetrievalService_GetArtifactClient, error) { - stream, err := c.cc.NewStream(ctx, &_ArtifactRetrievalService_serviceDesc.Streams[0], "/org.apache.beam.model.job_management.v1.ArtifactRetrievalService/GetArtifact", opts...) - if err != nil { - return nil, err - } - x := &artifactRetrievalServiceGetArtifactClient{stream} - if err := x.ClientStream.SendMsg(in); err != nil { - return nil, err - } - if err := x.ClientStream.CloseSend(); err != nil { - return nil, err - } - return x, nil -} - -type ArtifactRetrievalService_GetArtifactClient interface { - Recv() (*GetArtifactResponse, error) - grpc.ClientStream -} - -type artifactRetrievalServiceGetArtifactClient struct { - grpc.ClientStream -} - -func (x *artifactRetrievalServiceGetArtifactClient) Recv() (*GetArtifactResponse, error) { - m := new(GetArtifactResponse) - if err := x.ClientStream.RecvMsg(m); err != nil { - return nil, err - } - return m, nil -} - -// ArtifactRetrievalServiceServer is the server API for ArtifactRetrievalService service. -type ArtifactRetrievalServiceServer interface { - // Resolves the given artifact references into one or more replacement - // artifact references (e.g. a Maven dependency into a (transitive) set - // of jars. - ResolveArtifacts(context.Context, *ResolveArtifactsRequest) (*ResolveArtifactsResponse, error) - // Retrieves the given artifact as a stream of bytes. - GetArtifact(*GetArtifactRequest, ArtifactRetrievalService_GetArtifactServer) error -} - -// UnimplementedArtifactRetrievalServiceServer can be embedded to have forward compatible implementations. 
-type UnimplementedArtifactRetrievalServiceServer struct { -} - -func (*UnimplementedArtifactRetrievalServiceServer) ResolveArtifacts(context.Context, *ResolveArtifactsRequest) (*ResolveArtifactsResponse, error) { - return nil, status.Errorf(codes.Unimplemented, "method ResolveArtifacts not implemented") -} -func (*UnimplementedArtifactRetrievalServiceServer) GetArtifact(*GetArtifactRequest, ArtifactRetrievalService_GetArtifactServer) error { - return status.Errorf(codes.Unimplemented, "method GetArtifact not implemented") -} - -func RegisterArtifactRetrievalServiceServer(s *grpc.Server, srv ArtifactRetrievalServiceServer) { - s.RegisterService(&_ArtifactRetrievalService_serviceDesc, srv) -} - -func _ArtifactRetrievalService_ResolveArtifacts_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { - in := new(ResolveArtifactsRequest) - if err := dec(in); err != nil { - return nil, err - } - if interceptor == nil { - return srv.(ArtifactRetrievalServiceServer).ResolveArtifacts(ctx, in) - } - info := &grpc.UnaryServerInfo{ - Server: srv, - FullMethod: "/org.apache.beam.model.job_management.v1.ArtifactRetrievalService/ResolveArtifacts", - } - handler := func(ctx context.Context, req interface{}) (interface{}, error) { - return srv.(ArtifactRetrievalServiceServer).ResolveArtifacts(ctx, req.(*ResolveArtifactsRequest)) - } - return interceptor(ctx, in, info, handler) -} - -func _ArtifactRetrievalService_GetArtifact_Handler(srv interface{}, stream grpc.ServerStream) error { - m := new(GetArtifactRequest) - if err := stream.RecvMsg(m); err != nil { - return err - } - return srv.(ArtifactRetrievalServiceServer).GetArtifact(m, &artifactRetrievalServiceGetArtifactServer{stream}) -} - -type ArtifactRetrievalService_GetArtifactServer interface { - Send(*GetArtifactResponse) error - grpc.ServerStream -} - -type artifactRetrievalServiceGetArtifactServer struct { - grpc.ServerStream -} - -func (x *artifactRetrievalServiceGetArtifactServer) Send(m *GetArtifactResponse) error { - return x.ServerStream.SendMsg(m) -} - -var _ArtifactRetrievalService_serviceDesc = grpc.ServiceDesc{ - ServiceName: "org.apache.beam.model.job_management.v1.ArtifactRetrievalService", - HandlerType: (*ArtifactRetrievalServiceServer)(nil), - Methods: []grpc.MethodDesc{ - { - MethodName: "ResolveArtifacts", - Handler: _ArtifactRetrievalService_ResolveArtifacts_Handler, - }, - }, - Streams: []grpc.StreamDesc{ - { - StreamName: "GetArtifact", - Handler: _ArtifactRetrievalService_GetArtifact_Handler, - ServerStreams: true, - }, - }, - Metadata: "beam_artifact_api.proto", -} - -// ArtifactStagingServiceClient is the client API for ArtifactStagingService service. -// -// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://godoc.org/google.golang.org/grpc#ClientConn.NewStream. 
-type ArtifactStagingServiceClient interface { - ReverseArtifactRetrievalService(ctx context.Context, opts ...grpc.CallOption) (ArtifactStagingService_ReverseArtifactRetrievalServiceClient, error) -} - -type artifactStagingServiceClient struct { - cc grpc.ClientConnInterface -} - -func NewArtifactStagingServiceClient(cc grpc.ClientConnInterface) ArtifactStagingServiceClient { - return &artifactStagingServiceClient{cc} -} - -func (c *artifactStagingServiceClient) ReverseArtifactRetrievalService(ctx context.Context, opts ...grpc.CallOption) (ArtifactStagingService_ReverseArtifactRetrievalServiceClient, error) { - stream, err := c.cc.NewStream(ctx, &_ArtifactStagingService_serviceDesc.Streams[0], "/org.apache.beam.model.job_management.v1.ArtifactStagingService/ReverseArtifactRetrievalService", opts...) - if err != nil { - return nil, err - } - x := &artifactStagingServiceReverseArtifactRetrievalServiceClient{stream} - return x, nil -} - -type ArtifactStagingService_ReverseArtifactRetrievalServiceClient interface { - Send(*ArtifactResponseWrapper) error - Recv() (*ArtifactRequestWrapper, error) - grpc.ClientStream -} - -type artifactStagingServiceReverseArtifactRetrievalServiceClient struct { - grpc.ClientStream -} - -func (x *artifactStagingServiceReverseArtifactRetrievalServiceClient) Send(m *ArtifactResponseWrapper) error { - return x.ClientStream.SendMsg(m) -} - -func (x *artifactStagingServiceReverseArtifactRetrievalServiceClient) Recv() (*ArtifactRequestWrapper, error) { - m := new(ArtifactRequestWrapper) - if err := x.ClientStream.RecvMsg(m); err != nil { - return nil, err - } - return m, nil -} - -// ArtifactStagingServiceServer is the server API for ArtifactStagingService service. -type ArtifactStagingServiceServer interface { - ReverseArtifactRetrievalService(ArtifactStagingService_ReverseArtifactRetrievalServiceServer) error -} - -// UnimplementedArtifactStagingServiceServer can be embedded to have forward compatible implementations. 
-type UnimplementedArtifactStagingServiceServer struct { -} - -func (*UnimplementedArtifactStagingServiceServer) ReverseArtifactRetrievalService(ArtifactStagingService_ReverseArtifactRetrievalServiceServer) error { - return status.Errorf(codes.Unimplemented, "method ReverseArtifactRetrievalService not implemented") -} - -func RegisterArtifactStagingServiceServer(s *grpc.Server, srv ArtifactStagingServiceServer) { - s.RegisterService(&_ArtifactStagingService_serviceDesc, srv) -} - -func _ArtifactStagingService_ReverseArtifactRetrievalService_Handler(srv interface{}, stream grpc.ServerStream) error { - return srv.(ArtifactStagingServiceServer).ReverseArtifactRetrievalService(&artifactStagingServiceReverseArtifactRetrievalServiceServer{stream}) -} - -type ArtifactStagingService_ReverseArtifactRetrievalServiceServer interface { - Send(*ArtifactRequestWrapper) error - Recv() (*ArtifactResponseWrapper, error) - grpc.ServerStream -} - -type artifactStagingServiceReverseArtifactRetrievalServiceServer struct { - grpc.ServerStream -} - -func (x *artifactStagingServiceReverseArtifactRetrievalServiceServer) Send(m *ArtifactRequestWrapper) error { - return x.ServerStream.SendMsg(m) -} - -func (x *artifactStagingServiceReverseArtifactRetrievalServiceServer) Recv() (*ArtifactResponseWrapper, error) { - m := new(ArtifactResponseWrapper) - if err := x.ServerStream.RecvMsg(m); err != nil { - return nil, err - } - return m, nil -} - -var _ArtifactStagingService_serviceDesc = grpc.ServiceDesc{ - ServiceName: "org.apache.beam.model.job_management.v1.ArtifactStagingService", - HandlerType: (*ArtifactStagingServiceServer)(nil), - Methods: []grpc.MethodDesc{}, - Streams: []grpc.StreamDesc{ - { - StreamName: "ReverseArtifactRetrievalService", - Handler: _ArtifactStagingService_ReverseArtifactRetrievalService_Handler, - ServerStreams: true, - ClientStreams: true, - }, - }, - Metadata: "beam_artifact_api.proto", -} - -// LegacyArtifactStagingServiceClient is the client API for LegacyArtifactStagingService service. -// -// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://godoc.org/google.golang.org/grpc#ClientConn.NewStream. -type LegacyArtifactStagingServiceClient interface { - // Stage an artifact to be available during job execution. The first request must contain the - // name of the artifact. All future requests must contain sequential chunks of the content of - // the artifact. - PutArtifact(ctx context.Context, opts ...grpc.CallOption) (LegacyArtifactStagingService_PutArtifactClient, error) - // Commit the manifest for a Job. All artifacts must have been successfully uploaded - // before this call is made. - // - // Throws error INVALID_ARGUMENT if not all of the members of the manifest are present - CommitManifest(ctx context.Context, in *CommitManifestRequest, opts ...grpc.CallOption) (*CommitManifestResponse, error) -} - -type legacyArtifactStagingServiceClient struct { - cc grpc.ClientConnInterface -} - -func NewLegacyArtifactStagingServiceClient(cc grpc.ClientConnInterface) LegacyArtifactStagingServiceClient { - return &legacyArtifactStagingServiceClient{cc} -} - -func (c *legacyArtifactStagingServiceClient) PutArtifact(ctx context.Context, opts ...grpc.CallOption) (LegacyArtifactStagingService_PutArtifactClient, error) { - stream, err := c.cc.NewStream(ctx, &_LegacyArtifactStagingService_serviceDesc.Streams[0], "/org.apache.beam.model.job_management.v1.LegacyArtifactStagingService/PutArtifact", opts...) 
- if err != nil { - return nil, err - } - x := &legacyArtifactStagingServicePutArtifactClient{stream} - return x, nil -} - -type LegacyArtifactStagingService_PutArtifactClient interface { - Send(*PutArtifactRequest) error - CloseAndRecv() (*PutArtifactResponse, error) - grpc.ClientStream -} - -type legacyArtifactStagingServicePutArtifactClient struct { - grpc.ClientStream -} - -func (x *legacyArtifactStagingServicePutArtifactClient) Send(m *PutArtifactRequest) error { - return x.ClientStream.SendMsg(m) -} - -func (x *legacyArtifactStagingServicePutArtifactClient) CloseAndRecv() (*PutArtifactResponse, error) { - if err := x.ClientStream.CloseSend(); err != nil { - return nil, err - } - m := new(PutArtifactResponse) - if err := x.ClientStream.RecvMsg(m); err != nil { - return nil, err - } - return m, nil -} - -func (c *legacyArtifactStagingServiceClient) CommitManifest(ctx context.Context, in *CommitManifestRequest, opts ...grpc.CallOption) (*CommitManifestResponse, error) { - out := new(CommitManifestResponse) - err := c.cc.Invoke(ctx, "/org.apache.beam.model.job_management.v1.LegacyArtifactStagingService/CommitManifest", in, out, opts...) - if err != nil { - return nil, err - } - return out, nil -} - -// LegacyArtifactStagingServiceServer is the server API for LegacyArtifactStagingService service. -type LegacyArtifactStagingServiceServer interface { - // Stage an artifact to be available during job execution. The first request must contain the - // name of the artifact. All future requests must contain sequential chunks of the content of - // the artifact. - PutArtifact(LegacyArtifactStagingService_PutArtifactServer) error - // Commit the manifest for a Job. All artifacts must have been successfully uploaded - // before this call is made. - // - // Throws error INVALID_ARGUMENT if not all of the members of the manifest are present - CommitManifest(context.Context, *CommitManifestRequest) (*CommitManifestResponse, error) -} - -// UnimplementedLegacyArtifactStagingServiceServer can be embedded to have forward compatible implementations. 
-type UnimplementedLegacyArtifactStagingServiceServer struct { -} - -func (*UnimplementedLegacyArtifactStagingServiceServer) PutArtifact(LegacyArtifactStagingService_PutArtifactServer) error { - return status.Errorf(codes.Unimplemented, "method PutArtifact not implemented") -} -func (*UnimplementedLegacyArtifactStagingServiceServer) CommitManifest(context.Context, *CommitManifestRequest) (*CommitManifestResponse, error) { - return nil, status.Errorf(codes.Unimplemented, "method CommitManifest not implemented") -} - -func RegisterLegacyArtifactStagingServiceServer(s *grpc.Server, srv LegacyArtifactStagingServiceServer) { - s.RegisterService(&_LegacyArtifactStagingService_serviceDesc, srv) -} - -func _LegacyArtifactStagingService_PutArtifact_Handler(srv interface{}, stream grpc.ServerStream) error { - return srv.(LegacyArtifactStagingServiceServer).PutArtifact(&legacyArtifactStagingServicePutArtifactServer{stream}) -} - -type LegacyArtifactStagingService_PutArtifactServer interface { - SendAndClose(*PutArtifactResponse) error - Recv() (*PutArtifactRequest, error) - grpc.ServerStream -} - -type legacyArtifactStagingServicePutArtifactServer struct { - grpc.ServerStream -} - -func (x *legacyArtifactStagingServicePutArtifactServer) SendAndClose(m *PutArtifactResponse) error { - return x.ServerStream.SendMsg(m) -} - -func (x *legacyArtifactStagingServicePutArtifactServer) Recv() (*PutArtifactRequest, error) { - m := new(PutArtifactRequest) - if err := x.ServerStream.RecvMsg(m); err != nil { - return nil, err - } - return m, nil -} - -func _LegacyArtifactStagingService_CommitManifest_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { - in := new(CommitManifestRequest) - if err := dec(in); err != nil { - return nil, err - } - if interceptor == nil { - return srv.(LegacyArtifactStagingServiceServer).CommitManifest(ctx, in) - } - info := &grpc.UnaryServerInfo{ - Server: srv, - FullMethod: "/org.apache.beam.model.job_management.v1.LegacyArtifactStagingService/CommitManifest", - } - handler := func(ctx context.Context, req interface{}) (interface{}, error) { - return srv.(LegacyArtifactStagingServiceServer).CommitManifest(ctx, req.(*CommitManifestRequest)) - } - return interceptor(ctx, in, info, handler) -} - -var _LegacyArtifactStagingService_serviceDesc = grpc.ServiceDesc{ - ServiceName: "org.apache.beam.model.job_management.v1.LegacyArtifactStagingService", - HandlerType: (*LegacyArtifactStagingServiceServer)(nil), - Methods: []grpc.MethodDesc{ - { - MethodName: "CommitManifest", - Handler: _LegacyArtifactStagingService_CommitManifest_Handler, - }, - }, - Streams: []grpc.StreamDesc{ - { - StreamName: "PutArtifact", - Handler: _LegacyArtifactStagingService_PutArtifact_Handler, - ClientStreams: true, - }, - }, - Metadata: "beam_artifact_api.proto", -} - -// LegacyArtifactRetrievalServiceClient is the client API for LegacyArtifactRetrievalService service. -// -// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://godoc.org/google.golang.org/grpc#ClientConn.NewStream. -type LegacyArtifactRetrievalServiceClient interface { - // Get the manifest for the job - GetManifest(ctx context.Context, in *GetManifestRequest, opts ...grpc.CallOption) (*GetManifestResponse, error) - // Get an artifact staged for the job. 
The requested artifact must be within the manifest - GetArtifact(ctx context.Context, in *LegacyGetArtifactRequest, opts ...grpc.CallOption) (LegacyArtifactRetrievalService_GetArtifactClient, error) -} - -type legacyArtifactRetrievalServiceClient struct { - cc grpc.ClientConnInterface -} - -func NewLegacyArtifactRetrievalServiceClient(cc grpc.ClientConnInterface) LegacyArtifactRetrievalServiceClient { - return &legacyArtifactRetrievalServiceClient{cc} -} - -func (c *legacyArtifactRetrievalServiceClient) GetManifest(ctx context.Context, in *GetManifestRequest, opts ...grpc.CallOption) (*GetManifestResponse, error) { - out := new(GetManifestResponse) - err := c.cc.Invoke(ctx, "/org.apache.beam.model.job_management.v1.LegacyArtifactRetrievalService/GetManifest", in, out, opts...) - if err != nil { - return nil, err - } - return out, nil -} - -func (c *legacyArtifactRetrievalServiceClient) GetArtifact(ctx context.Context, in *LegacyGetArtifactRequest, opts ...grpc.CallOption) (LegacyArtifactRetrievalService_GetArtifactClient, error) { - stream, err := c.cc.NewStream(ctx, &_LegacyArtifactRetrievalService_serviceDesc.Streams[0], "/org.apache.beam.model.job_management.v1.LegacyArtifactRetrievalService/GetArtifact", opts...) - if err != nil { - return nil, err - } - x := &legacyArtifactRetrievalServiceGetArtifactClient{stream} - if err := x.ClientStream.SendMsg(in); err != nil { - return nil, err - } - if err := x.ClientStream.CloseSend(); err != nil { - return nil, err - } - return x, nil -} - -type LegacyArtifactRetrievalService_GetArtifactClient interface { - Recv() (*ArtifactChunk, error) - grpc.ClientStream -} - -type legacyArtifactRetrievalServiceGetArtifactClient struct { - grpc.ClientStream -} - -func (x *legacyArtifactRetrievalServiceGetArtifactClient) Recv() (*ArtifactChunk, error) { - m := new(ArtifactChunk) - if err := x.ClientStream.RecvMsg(m); err != nil { - return nil, err - } - return m, nil -} - -// LegacyArtifactRetrievalServiceServer is the server API for LegacyArtifactRetrievalService service. -type LegacyArtifactRetrievalServiceServer interface { - // Get the manifest for the job - GetManifest(context.Context, *GetManifestRequest) (*GetManifestResponse, error) - // Get an artifact staged for the job. The requested artifact must be within the manifest - GetArtifact(*LegacyGetArtifactRequest, LegacyArtifactRetrievalService_GetArtifactServer) error -} - -// UnimplementedLegacyArtifactRetrievalServiceServer can be embedded to have forward compatible implementations. 
-type UnimplementedLegacyArtifactRetrievalServiceServer struct { -} - -func (*UnimplementedLegacyArtifactRetrievalServiceServer) GetManifest(context.Context, *GetManifestRequest) (*GetManifestResponse, error) { - return nil, status.Errorf(codes.Unimplemented, "method GetManifest not implemented") -} -func (*UnimplementedLegacyArtifactRetrievalServiceServer) GetArtifact(*LegacyGetArtifactRequest, LegacyArtifactRetrievalService_GetArtifactServer) error { - return status.Errorf(codes.Unimplemented, "method GetArtifact not implemented") -} - -func RegisterLegacyArtifactRetrievalServiceServer(s *grpc.Server, srv LegacyArtifactRetrievalServiceServer) { - s.RegisterService(&_LegacyArtifactRetrievalService_serviceDesc, srv) -} - -func _LegacyArtifactRetrievalService_GetManifest_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { - in := new(GetManifestRequest) - if err := dec(in); err != nil { - return nil, err - } - if interceptor == nil { - return srv.(LegacyArtifactRetrievalServiceServer).GetManifest(ctx, in) - } - info := &grpc.UnaryServerInfo{ - Server: srv, - FullMethod: "/org.apache.beam.model.job_management.v1.LegacyArtifactRetrievalService/GetManifest", - } - handler := func(ctx context.Context, req interface{}) (interface{}, error) { - return srv.(LegacyArtifactRetrievalServiceServer).GetManifest(ctx, req.(*GetManifestRequest)) - } - return interceptor(ctx, in, info, handler) -} - -func _LegacyArtifactRetrievalService_GetArtifact_Handler(srv interface{}, stream grpc.ServerStream) error { - m := new(LegacyGetArtifactRequest) - if err := stream.RecvMsg(m); err != nil { - return err - } - return srv.(LegacyArtifactRetrievalServiceServer).GetArtifact(m, &legacyArtifactRetrievalServiceGetArtifactServer{stream}) -} - -type LegacyArtifactRetrievalService_GetArtifactServer interface { - Send(*ArtifactChunk) error - grpc.ServerStream -} - -type legacyArtifactRetrievalServiceGetArtifactServer struct { - grpc.ServerStream -} - -func (x *legacyArtifactRetrievalServiceGetArtifactServer) Send(m *ArtifactChunk) error { - return x.ServerStream.SendMsg(m) -} - -var _LegacyArtifactRetrievalService_serviceDesc = grpc.ServiceDesc{ - ServiceName: "org.apache.beam.model.job_management.v1.LegacyArtifactRetrievalService", - HandlerType: (*LegacyArtifactRetrievalServiceServer)(nil), - Methods: []grpc.MethodDesc{ - { - MethodName: "GetManifest", - Handler: _LegacyArtifactRetrievalService_GetManifest_Handler, - }, - }, - Streams: []grpc.StreamDesc{ - { - StreamName: "GetArtifact", - Handler: _LegacyArtifactRetrievalService_GetArtifact_Handler, - ServerStreams: true, - }, - }, - Metadata: "beam_artifact_api.proto", -} diff --git a/sdks/go/pkg/beam/model/jobmanagement_v1/beam_artifact_api_grpc.pb.go b/sdks/go/pkg/beam/model/jobmanagement_v1/beam_artifact_api_grpc.pb.go new file mode 100644 index 000000000000..57b73e4c8a9e --- /dev/null +++ b/sdks/go/pkg/beam/model/jobmanagement_v1/beam_artifact_api_grpc.pb.go @@ -0,0 +1,637 @@ +// +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. 
You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +// Code generated by protoc-gen-go-grpc. DO NOT EDIT. + +package jobmanagement_v1 + +import ( + context "context" + grpc "google.golang.org/grpc" + codes "google.golang.org/grpc/codes" + status "google.golang.org/grpc/status" +) + +// This is a compile-time assertion to ensure that this generated file +// is compatible with the grpc package it is being compiled against. +// Requires gRPC-Go v1.32.0 or later. +const _ = grpc.SupportPackageIsVersion7 + +// ArtifactRetrievalServiceClient is the client API for ArtifactRetrievalService service. +// +// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://pkg.go.dev/google.golang.org/grpc/?tab=doc#ClientConn.NewStream. +type ArtifactRetrievalServiceClient interface { + // Resolves the given artifact references into one or more replacement + // artifact references (e.g. a Maven dependency into a (transitive) set + // of jars. + ResolveArtifacts(ctx context.Context, in *ResolveArtifactsRequest, opts ...grpc.CallOption) (*ResolveArtifactsResponse, error) + // Retrieves the given artifact as a stream of bytes. + GetArtifact(ctx context.Context, in *GetArtifactRequest, opts ...grpc.CallOption) (ArtifactRetrievalService_GetArtifactClient, error) +} + +type artifactRetrievalServiceClient struct { + cc grpc.ClientConnInterface +} + +func NewArtifactRetrievalServiceClient(cc grpc.ClientConnInterface) ArtifactRetrievalServiceClient { + return &artifactRetrievalServiceClient{cc} +} + +func (c *artifactRetrievalServiceClient) ResolveArtifacts(ctx context.Context, in *ResolveArtifactsRequest, opts ...grpc.CallOption) (*ResolveArtifactsResponse, error) { + out := new(ResolveArtifactsResponse) + err := c.cc.Invoke(ctx, "/org.apache.beam.model.job_management.v1.ArtifactRetrievalService/ResolveArtifacts", in, out, opts...) + if err != nil { + return nil, err + } + return out, nil +} + +func (c *artifactRetrievalServiceClient) GetArtifact(ctx context.Context, in *GetArtifactRequest, opts ...grpc.CallOption) (ArtifactRetrievalService_GetArtifactClient, error) { + stream, err := c.cc.NewStream(ctx, &ArtifactRetrievalService_ServiceDesc.Streams[0], "/org.apache.beam.model.job_management.v1.ArtifactRetrievalService/GetArtifact", opts...) + if err != nil { + return nil, err + } + x := &artifactRetrievalServiceGetArtifactClient{stream} + if err := x.ClientStream.SendMsg(in); err != nil { + return nil, err + } + if err := x.ClientStream.CloseSend(); err != nil { + return nil, err + } + return x, nil +} + +type ArtifactRetrievalService_GetArtifactClient interface { + Recv() (*GetArtifactResponse, error) + grpc.ClientStream +} + +type artifactRetrievalServiceGetArtifactClient struct { + grpc.ClientStream +} + +func (x *artifactRetrievalServiceGetArtifactClient) Recv() (*GetArtifactResponse, error) { + m := new(GetArtifactResponse) + if err := x.ClientStream.RecvMsg(m); err != nil { + return nil, err + } + return m, nil +} + +// ArtifactRetrievalServiceServer is the server API for ArtifactRetrievalService service. 
+// All implementations must embed UnimplementedArtifactRetrievalServiceServer +// for forward compatibility +type ArtifactRetrievalServiceServer interface { + // Resolves the given artifact references into one or more replacement + // artifact references (e.g. a Maven dependency into a (transitive) set + // of jars. + ResolveArtifacts(context.Context, *ResolveArtifactsRequest) (*ResolveArtifactsResponse, error) + // Retrieves the given artifact as a stream of bytes. + GetArtifact(*GetArtifactRequest, ArtifactRetrievalService_GetArtifactServer) error + mustEmbedUnimplementedArtifactRetrievalServiceServer() +} + +// UnimplementedArtifactRetrievalServiceServer must be embedded to have forward compatible implementations. +type UnimplementedArtifactRetrievalServiceServer struct { +} + +func (UnimplementedArtifactRetrievalServiceServer) ResolveArtifacts(context.Context, *ResolveArtifactsRequest) (*ResolveArtifactsResponse, error) { + return nil, status.Errorf(codes.Unimplemented, "method ResolveArtifacts not implemented") +} +func (UnimplementedArtifactRetrievalServiceServer) GetArtifact(*GetArtifactRequest, ArtifactRetrievalService_GetArtifactServer) error { + return status.Errorf(codes.Unimplemented, "method GetArtifact not implemented") +} +func (UnimplementedArtifactRetrievalServiceServer) mustEmbedUnimplementedArtifactRetrievalServiceServer() { +} + +// UnsafeArtifactRetrievalServiceServer may be embedded to opt out of forward compatibility for this service. +// Use of this interface is not recommended, as added methods to ArtifactRetrievalServiceServer will +// result in compilation errors. +type UnsafeArtifactRetrievalServiceServer interface { + mustEmbedUnimplementedArtifactRetrievalServiceServer() +} + +func RegisterArtifactRetrievalServiceServer(s grpc.ServiceRegistrar, srv ArtifactRetrievalServiceServer) { + s.RegisterService(&ArtifactRetrievalService_ServiceDesc, srv) +} + +func _ArtifactRetrievalService_ResolveArtifacts_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { + in := new(ResolveArtifactsRequest) + if err := dec(in); err != nil { + return nil, err + } + if interceptor == nil { + return srv.(ArtifactRetrievalServiceServer).ResolveArtifacts(ctx, in) + } + info := &grpc.UnaryServerInfo{ + Server: srv, + FullMethod: "/org.apache.beam.model.job_management.v1.ArtifactRetrievalService/ResolveArtifacts", + } + handler := func(ctx context.Context, req interface{}) (interface{}, error) { + return srv.(ArtifactRetrievalServiceServer).ResolveArtifacts(ctx, req.(*ResolveArtifactsRequest)) + } + return interceptor(ctx, in, info, handler) +} + +func _ArtifactRetrievalService_GetArtifact_Handler(srv interface{}, stream grpc.ServerStream) error { + m := new(GetArtifactRequest) + if err := stream.RecvMsg(m); err != nil { + return err + } + return srv.(ArtifactRetrievalServiceServer).GetArtifact(m, &artifactRetrievalServiceGetArtifactServer{stream}) +} + +type ArtifactRetrievalService_GetArtifactServer interface { + Send(*GetArtifactResponse) error + grpc.ServerStream +} + +type artifactRetrievalServiceGetArtifactServer struct { + grpc.ServerStream +} + +func (x *artifactRetrievalServiceGetArtifactServer) Send(m *GetArtifactResponse) error { + return x.ServerStream.SendMsg(m) +} + +// ArtifactRetrievalService_ServiceDesc is the grpc.ServiceDesc for ArtifactRetrievalService service. 
+// It's only intended for direct use with grpc.RegisterService, +// and not to be introspected or modified (even as a copy) +var ArtifactRetrievalService_ServiceDesc = grpc.ServiceDesc{ + ServiceName: "org.apache.beam.model.job_management.v1.ArtifactRetrievalService", + HandlerType: (*ArtifactRetrievalServiceServer)(nil), + Methods: []grpc.MethodDesc{ + { + MethodName: "ResolveArtifacts", + Handler: _ArtifactRetrievalService_ResolveArtifacts_Handler, + }, + }, + Streams: []grpc.StreamDesc{ + { + StreamName: "GetArtifact", + Handler: _ArtifactRetrievalService_GetArtifact_Handler, + ServerStreams: true, + }, + }, + Metadata: "beam_artifact_api.proto", +} + +// ArtifactStagingServiceClient is the client API for ArtifactStagingService service. +// +// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://pkg.go.dev/google.golang.org/grpc/?tab=doc#ClientConn.NewStream. +type ArtifactStagingServiceClient interface { + ReverseArtifactRetrievalService(ctx context.Context, opts ...grpc.CallOption) (ArtifactStagingService_ReverseArtifactRetrievalServiceClient, error) +} + +type artifactStagingServiceClient struct { + cc grpc.ClientConnInterface +} + +func NewArtifactStagingServiceClient(cc grpc.ClientConnInterface) ArtifactStagingServiceClient { + return &artifactStagingServiceClient{cc} +} + +func (c *artifactStagingServiceClient) ReverseArtifactRetrievalService(ctx context.Context, opts ...grpc.CallOption) (ArtifactStagingService_ReverseArtifactRetrievalServiceClient, error) { + stream, err := c.cc.NewStream(ctx, &ArtifactStagingService_ServiceDesc.Streams[0], "/org.apache.beam.model.job_management.v1.ArtifactStagingService/ReverseArtifactRetrievalService", opts...) + if err != nil { + return nil, err + } + x := &artifactStagingServiceReverseArtifactRetrievalServiceClient{stream} + return x, nil +} + +type ArtifactStagingService_ReverseArtifactRetrievalServiceClient interface { + Send(*ArtifactResponseWrapper) error + Recv() (*ArtifactRequestWrapper, error) + grpc.ClientStream +} + +type artifactStagingServiceReverseArtifactRetrievalServiceClient struct { + grpc.ClientStream +} + +func (x *artifactStagingServiceReverseArtifactRetrievalServiceClient) Send(m *ArtifactResponseWrapper) error { + return x.ClientStream.SendMsg(m) +} + +func (x *artifactStagingServiceReverseArtifactRetrievalServiceClient) Recv() (*ArtifactRequestWrapper, error) { + m := new(ArtifactRequestWrapper) + if err := x.ClientStream.RecvMsg(m); err != nil { + return nil, err + } + return m, nil +} + +// ArtifactStagingServiceServer is the server API for ArtifactStagingService service. +// All implementations must embed UnimplementedArtifactStagingServiceServer +// for forward compatibility +type ArtifactStagingServiceServer interface { + ReverseArtifactRetrievalService(ArtifactStagingService_ReverseArtifactRetrievalServiceServer) error + mustEmbedUnimplementedArtifactStagingServiceServer() +} + +// UnimplementedArtifactStagingServiceServer must be embedded to have forward compatible implementations. 
+type UnimplementedArtifactStagingServiceServer struct { +} + +func (UnimplementedArtifactStagingServiceServer) ReverseArtifactRetrievalService(ArtifactStagingService_ReverseArtifactRetrievalServiceServer) error { + return status.Errorf(codes.Unimplemented, "method ReverseArtifactRetrievalService not implemented") +} +func (UnimplementedArtifactStagingServiceServer) mustEmbedUnimplementedArtifactStagingServiceServer() { +} + +// UnsafeArtifactStagingServiceServer may be embedded to opt out of forward compatibility for this service. +// Use of this interface is not recommended, as added methods to ArtifactStagingServiceServer will +// result in compilation errors. +type UnsafeArtifactStagingServiceServer interface { + mustEmbedUnimplementedArtifactStagingServiceServer() +} + +func RegisterArtifactStagingServiceServer(s grpc.ServiceRegistrar, srv ArtifactStagingServiceServer) { + s.RegisterService(&ArtifactStagingService_ServiceDesc, srv) +} + +func _ArtifactStagingService_ReverseArtifactRetrievalService_Handler(srv interface{}, stream grpc.ServerStream) error { + return srv.(ArtifactStagingServiceServer).ReverseArtifactRetrievalService(&artifactStagingServiceReverseArtifactRetrievalServiceServer{stream}) +} + +type ArtifactStagingService_ReverseArtifactRetrievalServiceServer interface { + Send(*ArtifactRequestWrapper) error + Recv() (*ArtifactResponseWrapper, error) + grpc.ServerStream +} + +type artifactStagingServiceReverseArtifactRetrievalServiceServer struct { + grpc.ServerStream +} + +func (x *artifactStagingServiceReverseArtifactRetrievalServiceServer) Send(m *ArtifactRequestWrapper) error { + return x.ServerStream.SendMsg(m) +} + +func (x *artifactStagingServiceReverseArtifactRetrievalServiceServer) Recv() (*ArtifactResponseWrapper, error) { + m := new(ArtifactResponseWrapper) + if err := x.ServerStream.RecvMsg(m); err != nil { + return nil, err + } + return m, nil +} + +// ArtifactStagingService_ServiceDesc is the grpc.ServiceDesc for ArtifactStagingService service. +// It's only intended for direct use with grpc.RegisterService, +// and not to be introspected or modified (even as a copy) +var ArtifactStagingService_ServiceDesc = grpc.ServiceDesc{ + ServiceName: "org.apache.beam.model.job_management.v1.ArtifactStagingService", + HandlerType: (*ArtifactStagingServiceServer)(nil), + Methods: []grpc.MethodDesc{}, + Streams: []grpc.StreamDesc{ + { + StreamName: "ReverseArtifactRetrievalService", + Handler: _ArtifactStagingService_ReverseArtifactRetrievalService_Handler, + ServerStreams: true, + ClientStreams: true, + }, + }, + Metadata: "beam_artifact_api.proto", +} + +// LegacyArtifactStagingServiceClient is the client API for LegacyArtifactStagingService service. +// +// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://pkg.go.dev/google.golang.org/grpc/?tab=doc#ClientConn.NewStream. +type LegacyArtifactStagingServiceClient interface { + // Stage an artifact to be available during job execution. The first request must contain the + // name of the artifact. All future requests must contain sequential chunks of the content of + // the artifact. + PutArtifact(ctx context.Context, opts ...grpc.CallOption) (LegacyArtifactStagingService_PutArtifactClient, error) + // Commit the manifest for a Job. All artifacts must have been successfully uploaded + // before this call is made. 
+ // + // Throws error INVALID_ARGUMENT if not all of the members of the manifest are present + CommitManifest(ctx context.Context, in *CommitManifestRequest, opts ...grpc.CallOption) (*CommitManifestResponse, error) +} + +type legacyArtifactStagingServiceClient struct { + cc grpc.ClientConnInterface +} + +func NewLegacyArtifactStagingServiceClient(cc grpc.ClientConnInterface) LegacyArtifactStagingServiceClient { + return &legacyArtifactStagingServiceClient{cc} +} + +func (c *legacyArtifactStagingServiceClient) PutArtifact(ctx context.Context, opts ...grpc.CallOption) (LegacyArtifactStagingService_PutArtifactClient, error) { + stream, err := c.cc.NewStream(ctx, &LegacyArtifactStagingService_ServiceDesc.Streams[0], "/org.apache.beam.model.job_management.v1.LegacyArtifactStagingService/PutArtifact", opts...) + if err != nil { + return nil, err + } + x := &legacyArtifactStagingServicePutArtifactClient{stream} + return x, nil +} + +type LegacyArtifactStagingService_PutArtifactClient interface { + Send(*PutArtifactRequest) error + CloseAndRecv() (*PutArtifactResponse, error) + grpc.ClientStream +} + +type legacyArtifactStagingServicePutArtifactClient struct { + grpc.ClientStream +} + +func (x *legacyArtifactStagingServicePutArtifactClient) Send(m *PutArtifactRequest) error { + return x.ClientStream.SendMsg(m) +} + +func (x *legacyArtifactStagingServicePutArtifactClient) CloseAndRecv() (*PutArtifactResponse, error) { + if err := x.ClientStream.CloseSend(); err != nil { + return nil, err + } + m := new(PutArtifactResponse) + if err := x.ClientStream.RecvMsg(m); err != nil { + return nil, err + } + return m, nil +} + +func (c *legacyArtifactStagingServiceClient) CommitManifest(ctx context.Context, in *CommitManifestRequest, opts ...grpc.CallOption) (*CommitManifestResponse, error) { + out := new(CommitManifestResponse) + err := c.cc.Invoke(ctx, "/org.apache.beam.model.job_management.v1.LegacyArtifactStagingService/CommitManifest", in, out, opts...) + if err != nil { + return nil, err + } + return out, nil +} + +// LegacyArtifactStagingServiceServer is the server API for LegacyArtifactStagingService service. +// All implementations must embed UnimplementedLegacyArtifactStagingServiceServer +// for forward compatibility +type LegacyArtifactStagingServiceServer interface { + // Stage an artifact to be available during job execution. The first request must contain the + // name of the artifact. All future requests must contain sequential chunks of the content of + // the artifact. + PutArtifact(LegacyArtifactStagingService_PutArtifactServer) error + // Commit the manifest for a Job. All artifacts must have been successfully uploaded + // before this call is made. + // + // Throws error INVALID_ARGUMENT if not all of the members of the manifest are present + CommitManifest(context.Context, *CommitManifestRequest) (*CommitManifestResponse, error) + mustEmbedUnimplementedLegacyArtifactStagingServiceServer() +} + +// UnimplementedLegacyArtifactStagingServiceServer must be embedded to have forward compatible implementations. 
+type UnimplementedLegacyArtifactStagingServiceServer struct { +} + +func (UnimplementedLegacyArtifactStagingServiceServer) PutArtifact(LegacyArtifactStagingService_PutArtifactServer) error { + return status.Errorf(codes.Unimplemented, "method PutArtifact not implemented") +} +func (UnimplementedLegacyArtifactStagingServiceServer) CommitManifest(context.Context, *CommitManifestRequest) (*CommitManifestResponse, error) { + return nil, status.Errorf(codes.Unimplemented, "method CommitManifest not implemented") +} +func (UnimplementedLegacyArtifactStagingServiceServer) mustEmbedUnimplementedLegacyArtifactStagingServiceServer() { +} + +// UnsafeLegacyArtifactStagingServiceServer may be embedded to opt out of forward compatibility for this service. +// Use of this interface is not recommended, as added methods to LegacyArtifactStagingServiceServer will +// result in compilation errors. +type UnsafeLegacyArtifactStagingServiceServer interface { + mustEmbedUnimplementedLegacyArtifactStagingServiceServer() +} + +func RegisterLegacyArtifactStagingServiceServer(s grpc.ServiceRegistrar, srv LegacyArtifactStagingServiceServer) { + s.RegisterService(&LegacyArtifactStagingService_ServiceDesc, srv) +} + +func _LegacyArtifactStagingService_PutArtifact_Handler(srv interface{}, stream grpc.ServerStream) error { + return srv.(LegacyArtifactStagingServiceServer).PutArtifact(&legacyArtifactStagingServicePutArtifactServer{stream}) +} + +type LegacyArtifactStagingService_PutArtifactServer interface { + SendAndClose(*PutArtifactResponse) error + Recv() (*PutArtifactRequest, error) + grpc.ServerStream +} + +type legacyArtifactStagingServicePutArtifactServer struct { + grpc.ServerStream +} + +func (x *legacyArtifactStagingServicePutArtifactServer) SendAndClose(m *PutArtifactResponse) error { + return x.ServerStream.SendMsg(m) +} + +func (x *legacyArtifactStagingServicePutArtifactServer) Recv() (*PutArtifactRequest, error) { + m := new(PutArtifactRequest) + if err := x.ServerStream.RecvMsg(m); err != nil { + return nil, err + } + return m, nil +} + +func _LegacyArtifactStagingService_CommitManifest_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { + in := new(CommitManifestRequest) + if err := dec(in); err != nil { + return nil, err + } + if interceptor == nil { + return srv.(LegacyArtifactStagingServiceServer).CommitManifest(ctx, in) + } + info := &grpc.UnaryServerInfo{ + Server: srv, + FullMethod: "/org.apache.beam.model.job_management.v1.LegacyArtifactStagingService/CommitManifest", + } + handler := func(ctx context.Context, req interface{}) (interface{}, error) { + return srv.(LegacyArtifactStagingServiceServer).CommitManifest(ctx, req.(*CommitManifestRequest)) + } + return interceptor(ctx, in, info, handler) +} + +// LegacyArtifactStagingService_ServiceDesc is the grpc.ServiceDesc for LegacyArtifactStagingService service. 
+// It's only intended for direct use with grpc.RegisterService, +// and not to be introspected or modified (even as a copy) +var LegacyArtifactStagingService_ServiceDesc = grpc.ServiceDesc{ + ServiceName: "org.apache.beam.model.job_management.v1.LegacyArtifactStagingService", + HandlerType: (*LegacyArtifactStagingServiceServer)(nil), + Methods: []grpc.MethodDesc{ + { + MethodName: "CommitManifest", + Handler: _LegacyArtifactStagingService_CommitManifest_Handler, + }, + }, + Streams: []grpc.StreamDesc{ + { + StreamName: "PutArtifact", + Handler: _LegacyArtifactStagingService_PutArtifact_Handler, + ClientStreams: true, + }, + }, + Metadata: "beam_artifact_api.proto", +} + +// LegacyArtifactRetrievalServiceClient is the client API for LegacyArtifactRetrievalService service. +// +// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://pkg.go.dev/google.golang.org/grpc/?tab=doc#ClientConn.NewStream. +type LegacyArtifactRetrievalServiceClient interface { + // Get the manifest for the job + GetManifest(ctx context.Context, in *GetManifestRequest, opts ...grpc.CallOption) (*GetManifestResponse, error) + // Get an artifact staged for the job. The requested artifact must be within the manifest + GetArtifact(ctx context.Context, in *LegacyGetArtifactRequest, opts ...grpc.CallOption) (LegacyArtifactRetrievalService_GetArtifactClient, error) +} + +type legacyArtifactRetrievalServiceClient struct { + cc grpc.ClientConnInterface +} + +func NewLegacyArtifactRetrievalServiceClient(cc grpc.ClientConnInterface) LegacyArtifactRetrievalServiceClient { + return &legacyArtifactRetrievalServiceClient{cc} +} + +func (c *legacyArtifactRetrievalServiceClient) GetManifest(ctx context.Context, in *GetManifestRequest, opts ...grpc.CallOption) (*GetManifestResponse, error) { + out := new(GetManifestResponse) + err := c.cc.Invoke(ctx, "/org.apache.beam.model.job_management.v1.LegacyArtifactRetrievalService/GetManifest", in, out, opts...) + if err != nil { + return nil, err + } + return out, nil +} + +func (c *legacyArtifactRetrievalServiceClient) GetArtifact(ctx context.Context, in *LegacyGetArtifactRequest, opts ...grpc.CallOption) (LegacyArtifactRetrievalService_GetArtifactClient, error) { + stream, err := c.cc.NewStream(ctx, &LegacyArtifactRetrievalService_ServiceDesc.Streams[0], "/org.apache.beam.model.job_management.v1.LegacyArtifactRetrievalService/GetArtifact", opts...) + if err != nil { + return nil, err + } + x := &legacyArtifactRetrievalServiceGetArtifactClient{stream} + if err := x.ClientStream.SendMsg(in); err != nil { + return nil, err + } + if err := x.ClientStream.CloseSend(); err != nil { + return nil, err + } + return x, nil +} + +type LegacyArtifactRetrievalService_GetArtifactClient interface { + Recv() (*ArtifactChunk, error) + grpc.ClientStream +} + +type legacyArtifactRetrievalServiceGetArtifactClient struct { + grpc.ClientStream +} + +func (x *legacyArtifactRetrievalServiceGetArtifactClient) Recv() (*ArtifactChunk, error) { + m := new(ArtifactChunk) + if err := x.ClientStream.RecvMsg(m); err != nil { + return nil, err + } + return m, nil +} + +// LegacyArtifactRetrievalServiceServer is the server API for LegacyArtifactRetrievalService service. +// All implementations must embed UnimplementedLegacyArtifactRetrievalServiceServer +// for forward compatibility +type LegacyArtifactRetrievalServiceServer interface { + // Get the manifest for the job + GetManifest(context.Context, *GetManifestRequest) (*GetManifestResponse, error) + // Get an artifact staged for the job. 
The requested artifact must be within the manifest + GetArtifact(*LegacyGetArtifactRequest, LegacyArtifactRetrievalService_GetArtifactServer) error + mustEmbedUnimplementedLegacyArtifactRetrievalServiceServer() +} + +// UnimplementedLegacyArtifactRetrievalServiceServer must be embedded to have forward compatible implementations. +type UnimplementedLegacyArtifactRetrievalServiceServer struct { +} + +func (UnimplementedLegacyArtifactRetrievalServiceServer) GetManifest(context.Context, *GetManifestRequest) (*GetManifestResponse, error) { + return nil, status.Errorf(codes.Unimplemented, "method GetManifest not implemented") +} +func (UnimplementedLegacyArtifactRetrievalServiceServer) GetArtifact(*LegacyGetArtifactRequest, LegacyArtifactRetrievalService_GetArtifactServer) error { + return status.Errorf(codes.Unimplemented, "method GetArtifact not implemented") +} +func (UnimplementedLegacyArtifactRetrievalServiceServer) mustEmbedUnimplementedLegacyArtifactRetrievalServiceServer() { +} + +// UnsafeLegacyArtifactRetrievalServiceServer may be embedded to opt out of forward compatibility for this service. +// Use of this interface is not recommended, as added methods to LegacyArtifactRetrievalServiceServer will +// result in compilation errors. +type UnsafeLegacyArtifactRetrievalServiceServer interface { + mustEmbedUnimplementedLegacyArtifactRetrievalServiceServer() +} + +func RegisterLegacyArtifactRetrievalServiceServer(s grpc.ServiceRegistrar, srv LegacyArtifactRetrievalServiceServer) { + s.RegisterService(&LegacyArtifactRetrievalService_ServiceDesc, srv) +} + +func _LegacyArtifactRetrievalService_GetManifest_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { + in := new(GetManifestRequest) + if err := dec(in); err != nil { + return nil, err + } + if interceptor == nil { + return srv.(LegacyArtifactRetrievalServiceServer).GetManifest(ctx, in) + } + info := &grpc.UnaryServerInfo{ + Server: srv, + FullMethod: "/org.apache.beam.model.job_management.v1.LegacyArtifactRetrievalService/GetManifest", + } + handler := func(ctx context.Context, req interface{}) (interface{}, error) { + return srv.(LegacyArtifactRetrievalServiceServer).GetManifest(ctx, req.(*GetManifestRequest)) + } + return interceptor(ctx, in, info, handler) +} + +func _LegacyArtifactRetrievalService_GetArtifact_Handler(srv interface{}, stream grpc.ServerStream) error { + m := new(LegacyGetArtifactRequest) + if err := stream.RecvMsg(m); err != nil { + return err + } + return srv.(LegacyArtifactRetrievalServiceServer).GetArtifact(m, &legacyArtifactRetrievalServiceGetArtifactServer{stream}) +} + +type LegacyArtifactRetrievalService_GetArtifactServer interface { + Send(*ArtifactChunk) error + grpc.ServerStream +} + +type legacyArtifactRetrievalServiceGetArtifactServer struct { + grpc.ServerStream +} + +func (x *legacyArtifactRetrievalServiceGetArtifactServer) Send(m *ArtifactChunk) error { + return x.ServerStream.SendMsg(m) +} + +// LegacyArtifactRetrievalService_ServiceDesc is the grpc.ServiceDesc for LegacyArtifactRetrievalService service. 
+// It's only intended for direct use with grpc.RegisterService, +// and not to be introspected or modified (even as a copy) +var LegacyArtifactRetrievalService_ServiceDesc = grpc.ServiceDesc{ + ServiceName: "org.apache.beam.model.job_management.v1.LegacyArtifactRetrievalService", + HandlerType: (*LegacyArtifactRetrievalServiceServer)(nil), + Methods: []grpc.MethodDesc{ + { + MethodName: "GetManifest", + Handler: _LegacyArtifactRetrievalService_GetManifest_Handler, + }, + }, + Streams: []grpc.StreamDesc{ + { + StreamName: "GetArtifact", + Handler: _LegacyArtifactRetrievalService_GetArtifact_Handler, + ServerStreams: true, + }, + }, + Metadata: "beam_artifact_api.proto", +} diff --git a/sdks/go/pkg/beam/model/jobmanagement_v1/beam_expansion_api.pb.go b/sdks/go/pkg/beam/model/jobmanagement_v1/beam_expansion_api.pb.go index e9ec1d013cc0..a9893ae45d35 100644 --- a/sdks/go/pkg/beam/model/jobmanagement_v1/beam_expansion_api.pb.go +++ b/sdks/go/pkg/beam/model/jobmanagement_v1/beam_expansion_api.pb.go @@ -21,18 +21,14 @@ // Code generated by protoc-gen-go. DO NOT EDIT. // versions: -// protoc-gen-go v1.25.0-devel -// protoc v3.13.0 +// protoc-gen-go v1.27.1 +// protoc v3.17.3 // source: beam_expansion_api.proto package jobmanagement_v1 import ( - context "context" - pipeline_v1 "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" - grpc "google.golang.org/grpc" - codes "google.golang.org/grpc/codes" - status "google.golang.org/grpc/status" + pipeline_v1 "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" protoreflect "google.golang.org/protobuf/reflect/protoreflect" protoimpl "google.golang.org/protobuf/runtime/protoimpl" reflect "reflect" @@ -241,16 +237,16 @@ var file_beam_expansion_api_proto_rawDesc = []byte{ 0x71, 0x75, 0x65, 0x73, 0x74, 0x1a, 0x35, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x65, 0x78, 0x70, 0x61, 0x6e, 0x73, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x2e, 0x45, 0x78, 0x70, 0x61, 0x6e, - 0x73, 0x69, 0x6f, 0x6e, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x42, 0x83, 0x01, 0x0a, + 0x73, 0x69, 0x6f, 0x6e, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x42, 0x86, 0x01, 0x0a, 0x22, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x65, 0x78, 0x70, 0x61, 0x6e, 0x73, 0x69, 0x6f, 0x6e, 0x2e, 0x76, 0x31, 0x42, 0x0c, 0x45, 0x78, 0x70, 0x61, 0x6e, 0x73, 0x69, 0x6f, 0x6e, 0x41, 0x70, - 0x69, 0x5a, 0x4f, 0x67, 0x69, 0x74, 0x68, 0x75, 0x62, 0x2e, 0x63, 0x6f, 0x6d, 0x2f, 0x61, 0x70, - 0x61, 0x63, 0x68, 0x65, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x73, 0x64, 0x6b, 0x73, 0x2f, 0x67, - 0x6f, 0x2f, 0x70, 0x6b, 0x67, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x6d, 0x6f, 0x64, 0x65, 0x6c, - 0x2f, 0x6a, 0x6f, 0x62, 0x6d, 0x61, 0x6e, 0x61, 0x67, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x5f, 0x76, - 0x31, 0x3b, 0x6a, 0x6f, 0x62, 0x6d, 0x61, 0x6e, 0x61, 0x67, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x5f, - 0x76, 0x31, 0x62, 0x06, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x33, + 0x69, 0x5a, 0x52, 0x67, 0x69, 0x74, 0x68, 0x75, 0x62, 0x2e, 0x63, 0x6f, 0x6d, 0x2f, 0x61, 0x70, + 0x61, 0x63, 0x68, 0x65, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x73, 0x64, 0x6b, 0x73, 0x2f, 0x76, + 0x32, 0x2f, 0x67, 0x6f, 0x2f, 0x70, 0x6b, 0x67, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x6d, 0x6f, + 0x64, 0x65, 0x6c, 0x2f, 0x6a, 0x6f, 0x62, 0x6d, 0x61, 0x6e, 0x61, 0x67, 0x65, 0x6d, 0x65, 0x6e, + 0x74, 0x5f, 0x76, 0x31, 0x3b, 0x6a, 0x6f, 0x62, 0x6d, 0x61, 0x6e, 0x61, 0x67, 0x65, 
0x6d, 0x65, + 0x6e, 0x74, 0x5f, 0x76, 0x31, 0x62, 0x06, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x33, } var ( @@ -336,83 +332,3 @@ func file_beam_expansion_api_proto_init() { file_beam_expansion_api_proto_goTypes = nil file_beam_expansion_api_proto_depIdxs = nil } - -// Reference imports to suppress errors if they are not otherwise used. -var _ context.Context -var _ grpc.ClientConnInterface - -// This is a compile-time assertion to ensure that this generated file -// is compatible with the grpc package it is being compiled against. -const _ = grpc.SupportPackageIsVersion6 - -// ExpansionServiceClient is the client API for ExpansionService service. -// -// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://godoc.org/google.golang.org/grpc#ClientConn.NewStream. -type ExpansionServiceClient interface { - Expand(ctx context.Context, in *ExpansionRequest, opts ...grpc.CallOption) (*ExpansionResponse, error) -} - -type expansionServiceClient struct { - cc grpc.ClientConnInterface -} - -func NewExpansionServiceClient(cc grpc.ClientConnInterface) ExpansionServiceClient { - return &expansionServiceClient{cc} -} - -func (c *expansionServiceClient) Expand(ctx context.Context, in *ExpansionRequest, opts ...grpc.CallOption) (*ExpansionResponse, error) { - out := new(ExpansionResponse) - err := c.cc.Invoke(ctx, "/org.apache.beam.model.expansion.v1.ExpansionService/Expand", in, out, opts...) - if err != nil { - return nil, err - } - return out, nil -} - -// ExpansionServiceServer is the server API for ExpansionService service. -type ExpansionServiceServer interface { - Expand(context.Context, *ExpansionRequest) (*ExpansionResponse, error) -} - -// UnimplementedExpansionServiceServer can be embedded to have forward compatible implementations. 
-type UnimplementedExpansionServiceServer struct { -} - -func (*UnimplementedExpansionServiceServer) Expand(context.Context, *ExpansionRequest) (*ExpansionResponse, error) { - return nil, status.Errorf(codes.Unimplemented, "method Expand not implemented") -} - -func RegisterExpansionServiceServer(s *grpc.Server, srv ExpansionServiceServer) { - s.RegisterService(&_ExpansionService_serviceDesc, srv) -} - -func _ExpansionService_Expand_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { - in := new(ExpansionRequest) - if err := dec(in); err != nil { - return nil, err - } - if interceptor == nil { - return srv.(ExpansionServiceServer).Expand(ctx, in) - } - info := &grpc.UnaryServerInfo{ - Server: srv, - FullMethod: "/org.apache.beam.model.expansion.v1.ExpansionService/Expand", - } - handler := func(ctx context.Context, req interface{}) (interface{}, error) { - return srv.(ExpansionServiceServer).Expand(ctx, req.(*ExpansionRequest)) - } - return interceptor(ctx, in, info, handler) -} - -var _ExpansionService_serviceDesc = grpc.ServiceDesc{ - ServiceName: "org.apache.beam.model.expansion.v1.ExpansionService", - HandlerType: (*ExpansionServiceServer)(nil), - Methods: []grpc.MethodDesc{ - { - MethodName: "Expand", - Handler: _ExpansionService_Expand_Handler, - }, - }, - Streams: []grpc.StreamDesc{}, - Metadata: "beam_expansion_api.proto", -} diff --git a/sdks/go/pkg/beam/model/jobmanagement_v1/beam_expansion_api_grpc.pb.go b/sdks/go/pkg/beam/model/jobmanagement_v1/beam_expansion_api_grpc.pb.go new file mode 100644 index 000000000000..bc30ae2aebde --- /dev/null +++ b/sdks/go/pkg/beam/model/jobmanagement_v1/beam_expansion_api_grpc.pb.go @@ -0,0 +1,118 @@ +// +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +// Code generated by protoc-gen-go-grpc. DO NOT EDIT. + +package jobmanagement_v1 + +import ( + context "context" + grpc "google.golang.org/grpc" + codes "google.golang.org/grpc/codes" + status "google.golang.org/grpc/status" +) + +// This is a compile-time assertion to ensure that this generated file +// is compatible with the grpc package it is being compiled against. +// Requires gRPC-Go v1.32.0 or later. +const _ = grpc.SupportPackageIsVersion7 + +// ExpansionServiceClient is the client API for ExpansionService service. +// +// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://pkg.go.dev/google.golang.org/grpc/?tab=doc#ClientConn.NewStream. 
+type ExpansionServiceClient interface { + Expand(ctx context.Context, in *ExpansionRequest, opts ...grpc.CallOption) (*ExpansionResponse, error) +} + +type expansionServiceClient struct { + cc grpc.ClientConnInterface +} + +func NewExpansionServiceClient(cc grpc.ClientConnInterface) ExpansionServiceClient { + return &expansionServiceClient{cc} +} + +func (c *expansionServiceClient) Expand(ctx context.Context, in *ExpansionRequest, opts ...grpc.CallOption) (*ExpansionResponse, error) { + out := new(ExpansionResponse) + err := c.cc.Invoke(ctx, "/org.apache.beam.model.expansion.v1.ExpansionService/Expand", in, out, opts...) + if err != nil { + return nil, err + } + return out, nil +} + +// ExpansionServiceServer is the server API for ExpansionService service. +// All implementations must embed UnimplementedExpansionServiceServer +// for forward compatibility +type ExpansionServiceServer interface { + Expand(context.Context, *ExpansionRequest) (*ExpansionResponse, error) + mustEmbedUnimplementedExpansionServiceServer() +} + +// UnimplementedExpansionServiceServer must be embedded to have forward compatible implementations. +type UnimplementedExpansionServiceServer struct { +} + +func (UnimplementedExpansionServiceServer) Expand(context.Context, *ExpansionRequest) (*ExpansionResponse, error) { + return nil, status.Errorf(codes.Unimplemented, "method Expand not implemented") +} +func (UnimplementedExpansionServiceServer) mustEmbedUnimplementedExpansionServiceServer() {} + +// UnsafeExpansionServiceServer may be embedded to opt out of forward compatibility for this service. +// Use of this interface is not recommended, as added methods to ExpansionServiceServer will +// result in compilation errors. +type UnsafeExpansionServiceServer interface { + mustEmbedUnimplementedExpansionServiceServer() +} + +func RegisterExpansionServiceServer(s grpc.ServiceRegistrar, srv ExpansionServiceServer) { + s.RegisterService(&ExpansionService_ServiceDesc, srv) +} + +func _ExpansionService_Expand_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { + in := new(ExpansionRequest) + if err := dec(in); err != nil { + return nil, err + } + if interceptor == nil { + return srv.(ExpansionServiceServer).Expand(ctx, in) + } + info := &grpc.UnaryServerInfo{ + Server: srv, + FullMethod: "/org.apache.beam.model.expansion.v1.ExpansionService/Expand", + } + handler := func(ctx context.Context, req interface{}) (interface{}, error) { + return srv.(ExpansionServiceServer).Expand(ctx, req.(*ExpansionRequest)) + } + return interceptor(ctx, in, info, handler) +} + +// ExpansionService_ServiceDesc is the grpc.ServiceDesc for ExpansionService service. 
+// It's only intended for direct use with grpc.RegisterService, +// and not to be introspected or modified (even as a copy) +var ExpansionService_ServiceDesc = grpc.ServiceDesc{ + ServiceName: "org.apache.beam.model.expansion.v1.ExpansionService", + HandlerType: (*ExpansionServiceServer)(nil), + Methods: []grpc.MethodDesc{ + { + MethodName: "Expand", + Handler: _ExpansionService_Expand_Handler, + }, + }, + Streams: []grpc.StreamDesc{}, + Metadata: "beam_expansion_api.proto", +} diff --git a/sdks/go/pkg/beam/model/jobmanagement_v1/beam_job_api.pb.go b/sdks/go/pkg/beam/model/jobmanagement_v1/beam_job_api.pb.go index a4423f165957..e86ec22a73d9 100644 --- a/sdks/go/pkg/beam/model/jobmanagement_v1/beam_job_api.pb.go +++ b/sdks/go/pkg/beam/model/jobmanagement_v1/beam_job_api.pb.go @@ -21,22 +21,18 @@ // Code generated by protoc-gen-go. DO NOT EDIT. // versions: -// protoc-gen-go v1.25.0-devel -// protoc v3.13.0 +// protoc-gen-go v1.27.1 +// protoc v3.17.3 // source: beam_job_api.proto package jobmanagement_v1 import ( - context "context" - pipeline_v1 "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" - _struct "github.com/golang/protobuf/ptypes/struct" - timestamp "github.com/golang/protobuf/ptypes/timestamp" - grpc "google.golang.org/grpc" - codes "google.golang.org/grpc/codes" - status "google.golang.org/grpc/status" + pipeline_v1 "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" protoreflect "google.golang.org/protobuf/reflect/protoreflect" protoimpl "google.golang.org/protobuf/runtime/protoimpl" + structpb "google.golang.org/protobuf/types/known/structpb" + timestamppb "google.golang.org/protobuf/types/known/timestamppb" reflect "reflect" sync "sync" ) @@ -263,7 +259,7 @@ type PrepareJobRequest struct { unknownFields protoimpl.UnknownFields Pipeline *pipeline_v1.Pipeline `protobuf:"bytes,1,opt,name=pipeline,proto3" json:"pipeline,omitempty"` // (required) - PipelineOptions *_struct.Struct `protobuf:"bytes,2,opt,name=pipeline_options,json=pipelineOptions,proto3" json:"pipeline_options,omitempty"` // (required) + PipelineOptions *structpb.Struct `protobuf:"bytes,2,opt,name=pipeline_options,json=pipelineOptions,proto3" json:"pipeline_options,omitempty"` // (required) JobName string `protobuf:"bytes,3,opt,name=job_name,json=jobName,proto3" json:"job_name,omitempty"` // (required) } @@ -306,7 +302,7 @@ func (x *PrepareJobRequest) GetPipeline() *pipeline_v1.Pipeline { return nil } -func (x *PrepareJobRequest) GetPipelineOptions() *_struct.Struct { +func (x *PrepareJobRequest) GetPipelineOptions() *structpb.Struct { if x != nil { return x.PipelineOptions } @@ -603,10 +599,10 @@ type JobInfo struct { sizeCache protoimpl.SizeCache unknownFields protoimpl.UnknownFields - JobId string `protobuf:"bytes,1,opt,name=job_id,json=jobId,proto3" json:"job_id,omitempty"` // (required) - JobName string `protobuf:"bytes,2,opt,name=job_name,json=jobName,proto3" json:"job_name,omitempty"` // (required) - PipelineOptions *_struct.Struct `protobuf:"bytes,3,opt,name=pipeline_options,json=pipelineOptions,proto3" json:"pipeline_options,omitempty"` // (required) - State JobState_Enum `protobuf:"varint,4,opt,name=state,proto3,enum=org.apache.beam.model.job_management.v1.JobState_Enum" json:"state,omitempty"` // (required) + JobId string `protobuf:"bytes,1,opt,name=job_id,json=jobId,proto3" json:"job_id,omitempty"` // (required) + JobName string `protobuf:"bytes,2,opt,name=job_name,json=jobName,proto3" json:"job_name,omitempty"` // (required) + PipelineOptions *structpb.Struct 
`protobuf:"bytes,3,opt,name=pipeline_options,json=pipelineOptions,proto3" json:"pipeline_options,omitempty"` // (required) + State JobState_Enum `protobuf:"varint,4,opt,name=state,proto3,enum=org.apache.beam.model.job_management.v1.JobState_Enum" json:"state,omitempty"` // (required) } func (x *JobInfo) Reset() { @@ -655,7 +651,7 @@ func (x *JobInfo) GetJobName() string { return "" } -func (x *JobInfo) GetPipelineOptions() *_struct.Struct { +func (x *JobInfo) GetPipelineOptions() *structpb.Struct { if x != nil { return x.PipelineOptions } @@ -811,8 +807,8 @@ type JobStateEvent struct { sizeCache protoimpl.SizeCache unknownFields protoimpl.UnknownFields - State JobState_Enum `protobuf:"varint,1,opt,name=state,proto3,enum=org.apache.beam.model.job_management.v1.JobState_Enum" json:"state,omitempty"` // (required) - Timestamp *timestamp.Timestamp `protobuf:"bytes,2,opt,name=timestamp,proto3" json:"timestamp,omitempty"` // (required) + State JobState_Enum `protobuf:"varint,1,opt,name=state,proto3,enum=org.apache.beam.model.job_management.v1.JobState_Enum" json:"state,omitempty"` // (required) + Timestamp *timestamppb.Timestamp `protobuf:"bytes,2,opt,name=timestamp,proto3" json:"timestamp,omitempty"` // (required) } func (x *JobStateEvent) Reset() { @@ -854,7 +850,7 @@ func (x *JobStateEvent) GetState() JobState_Enum { return JobState_UNSPECIFIED } -func (x *JobStateEvent) GetTimestamp() *timestamp.Timestamp { +func (x *JobStateEvent) GetTimestamp() *timestamppb.Timestamp { if x != nil { return x.Timestamp } @@ -1865,16 +1861,16 @@ var file_beam_job_api_proto_rawDesc = []byte{ 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x6a, 0x6f, 0x62, 0x5f, 0x6d, 0x61, 0x6e, 0x61, 0x67, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x2e, 0x76, 0x31, 0x2e, 0x44, 0x65, 0x73, 0x63, 0x72, 0x69, 0x62, 0x65, 0x50, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x4f, 0x70, 0x74, - 0x69, 0x6f, 0x6e, 0x73, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x42, 0x81, 0x01, 0x0a, + 0x69, 0x6f, 0x6e, 0x73, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x42, 0x84, 0x01, 0x0a, 0x26, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x6a, 0x6f, 0x62, 0x6d, 0x61, 0x6e, 0x61, 0x67, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x2e, 0x76, 0x31, 0x42, 0x06, 0x4a, 0x6f, 0x62, 0x41, 0x70, 0x69, 0x5a, - 0x4f, 0x67, 0x69, 0x74, 0x68, 0x75, 0x62, 0x2e, 0x63, 0x6f, 0x6d, 0x2f, 0x61, 0x70, 0x61, 0x63, - 0x68, 0x65, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x73, 0x64, 0x6b, 0x73, 0x2f, 0x67, 0x6f, 0x2f, - 0x70, 0x6b, 0x67, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2f, 0x6a, - 0x6f, 0x62, 0x6d, 0x61, 0x6e, 0x61, 0x67, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x5f, 0x76, 0x31, 0x3b, - 0x6a, 0x6f, 0x62, 0x6d, 0x61, 0x6e, 0x61, 0x67, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x5f, 0x76, 0x31, - 0x62, 0x06, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x33, + 0x52, 0x67, 0x69, 0x74, 0x68, 0x75, 0x62, 0x2e, 0x63, 0x6f, 0x6d, 0x2f, 0x61, 0x70, 0x61, 0x63, + 0x68, 0x65, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x73, 0x64, 0x6b, 0x73, 0x2f, 0x76, 0x32, 0x2f, + 0x67, 0x6f, 0x2f, 0x70, 0x6b, 0x67, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x6d, 0x6f, 0x64, 0x65, + 0x6c, 0x2f, 0x6a, 0x6f, 0x62, 0x6d, 0x61, 0x6e, 0x61, 0x67, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x5f, + 0x76, 0x31, 0x3b, 0x6a, 0x6f, 0x62, 0x6d, 0x61, 0x6e, 0x61, 0x67, 0x65, 0x6d, 0x65, 0x6e, 0x74, + 0x5f, 0x76, 0x31, 0x62, 0x06, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x33, } var ( @@ -1920,9 +1916,9 @@ var file_beam_job_api_proto_goTypes = []interface{}{ 
(*PipelineOptionDescriptor)(nil), // 25: org.apache.beam.model.job_management.v1.PipelineOptionDescriptor (*DescribePipelineOptionsResponse)(nil), // 26: org.apache.beam.model.job_management.v1.DescribePipelineOptionsResponse (*pipeline_v1.Pipeline)(nil), // 27: org.apache.beam.model.pipeline.v1.Pipeline - (*_struct.Struct)(nil), // 28: google.protobuf.Struct + (*structpb.Struct)(nil), // 28: google.protobuf.Struct (*pipeline_v1.ApiServiceDescriptor)(nil), // 29: org.apache.beam.model.pipeline.v1.ApiServiceDescriptor - (*timestamp.Timestamp)(nil), // 30: google.protobuf.Timestamp + (*timestamppb.Timestamp)(nil), // 30: google.protobuf.Timestamp (*pipeline_v1.MonitoringInfo)(nil), // 31: org.apache.beam.model.pipeline.v1.MonitoringInfo } var file_beam_job_api_proto_depIdxs = []int32{ @@ -2290,484 +2286,3 @@ func file_beam_job_api_proto_init() { file_beam_job_api_proto_goTypes = nil file_beam_job_api_proto_depIdxs = nil } - -// Reference imports to suppress errors if they are not otherwise used. -var _ context.Context -var _ grpc.ClientConnInterface - -// This is a compile-time assertion to ensure that this generated file -// is compatible with the grpc package it is being compiled against. -const _ = grpc.SupportPackageIsVersion6 - -// JobServiceClient is the client API for JobService service. -// -// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://godoc.org/google.golang.org/grpc#ClientConn.NewStream. -type JobServiceClient interface { - // Prepare a job for execution. The job will not be executed until a call is made to run with the - // returned preparationId. - Prepare(ctx context.Context, in *PrepareJobRequest, opts ...grpc.CallOption) (*PrepareJobResponse, error) - // Submit the job for execution - Run(ctx context.Context, in *RunJobRequest, opts ...grpc.CallOption) (*RunJobResponse, error) - // Get a list of all invoked jobs - GetJobs(ctx context.Context, in *GetJobsRequest, opts ...grpc.CallOption) (*GetJobsResponse, error) - // Get the current state of the job - GetState(ctx context.Context, in *GetJobStateRequest, opts ...grpc.CallOption) (*JobStateEvent, error) - // Get the job's pipeline - GetPipeline(ctx context.Context, in *GetJobPipelineRequest, opts ...grpc.CallOption) (*GetJobPipelineResponse, error) - // Cancel the job - Cancel(ctx context.Context, in *CancelJobRequest, opts ...grpc.CallOption) (*CancelJobResponse, error) - // Subscribe to a stream of state changes of the job, will immediately return the current state of the job as the first response. 
- GetStateStream(ctx context.Context, in *GetJobStateRequest, opts ...grpc.CallOption) (JobService_GetStateStreamClient, error) - // Subscribe to a stream of state changes and messages from the job - GetMessageStream(ctx context.Context, in *JobMessagesRequest, opts ...grpc.CallOption) (JobService_GetMessageStreamClient, error) - // Fetch metrics for a given job - GetJobMetrics(ctx context.Context, in *GetJobMetricsRequest, opts ...grpc.CallOption) (*GetJobMetricsResponse, error) - // Get the supported pipeline options of the runner - DescribePipelineOptions(ctx context.Context, in *DescribePipelineOptionsRequest, opts ...grpc.CallOption) (*DescribePipelineOptionsResponse, error) -} - -type jobServiceClient struct { - cc grpc.ClientConnInterface -} - -func NewJobServiceClient(cc grpc.ClientConnInterface) JobServiceClient { - return &jobServiceClient{cc} -} - -func (c *jobServiceClient) Prepare(ctx context.Context, in *PrepareJobRequest, opts ...grpc.CallOption) (*PrepareJobResponse, error) { - out := new(PrepareJobResponse) - err := c.cc.Invoke(ctx, "/org.apache.beam.model.job_management.v1.JobService/Prepare", in, out, opts...) - if err != nil { - return nil, err - } - return out, nil -} - -func (c *jobServiceClient) Run(ctx context.Context, in *RunJobRequest, opts ...grpc.CallOption) (*RunJobResponse, error) { - out := new(RunJobResponse) - err := c.cc.Invoke(ctx, "/org.apache.beam.model.job_management.v1.JobService/Run", in, out, opts...) - if err != nil { - return nil, err - } - return out, nil -} - -func (c *jobServiceClient) GetJobs(ctx context.Context, in *GetJobsRequest, opts ...grpc.CallOption) (*GetJobsResponse, error) { - out := new(GetJobsResponse) - err := c.cc.Invoke(ctx, "/org.apache.beam.model.job_management.v1.JobService/GetJobs", in, out, opts...) - if err != nil { - return nil, err - } - return out, nil -} - -func (c *jobServiceClient) GetState(ctx context.Context, in *GetJobStateRequest, opts ...grpc.CallOption) (*JobStateEvent, error) { - out := new(JobStateEvent) - err := c.cc.Invoke(ctx, "/org.apache.beam.model.job_management.v1.JobService/GetState", in, out, opts...) - if err != nil { - return nil, err - } - return out, nil -} - -func (c *jobServiceClient) GetPipeline(ctx context.Context, in *GetJobPipelineRequest, opts ...grpc.CallOption) (*GetJobPipelineResponse, error) { - out := new(GetJobPipelineResponse) - err := c.cc.Invoke(ctx, "/org.apache.beam.model.job_management.v1.JobService/GetPipeline", in, out, opts...) - if err != nil { - return nil, err - } - return out, nil -} - -func (c *jobServiceClient) Cancel(ctx context.Context, in *CancelJobRequest, opts ...grpc.CallOption) (*CancelJobResponse, error) { - out := new(CancelJobResponse) - err := c.cc.Invoke(ctx, "/org.apache.beam.model.job_management.v1.JobService/Cancel", in, out, opts...) - if err != nil { - return nil, err - } - return out, nil -} - -func (c *jobServiceClient) GetStateStream(ctx context.Context, in *GetJobStateRequest, opts ...grpc.CallOption) (JobService_GetStateStreamClient, error) { - stream, err := c.cc.NewStream(ctx, &_JobService_serviceDesc.Streams[0], "/org.apache.beam.model.job_management.v1.JobService/GetStateStream", opts...) 
- if err != nil { - return nil, err - } - x := &jobServiceGetStateStreamClient{stream} - if err := x.ClientStream.SendMsg(in); err != nil { - return nil, err - } - if err := x.ClientStream.CloseSend(); err != nil { - return nil, err - } - return x, nil -} - -type JobService_GetStateStreamClient interface { - Recv() (*JobStateEvent, error) - grpc.ClientStream -} - -type jobServiceGetStateStreamClient struct { - grpc.ClientStream -} - -func (x *jobServiceGetStateStreamClient) Recv() (*JobStateEvent, error) { - m := new(JobStateEvent) - if err := x.ClientStream.RecvMsg(m); err != nil { - return nil, err - } - return m, nil -} - -func (c *jobServiceClient) GetMessageStream(ctx context.Context, in *JobMessagesRequest, opts ...grpc.CallOption) (JobService_GetMessageStreamClient, error) { - stream, err := c.cc.NewStream(ctx, &_JobService_serviceDesc.Streams[1], "/org.apache.beam.model.job_management.v1.JobService/GetMessageStream", opts...) - if err != nil { - return nil, err - } - x := &jobServiceGetMessageStreamClient{stream} - if err := x.ClientStream.SendMsg(in); err != nil { - return nil, err - } - if err := x.ClientStream.CloseSend(); err != nil { - return nil, err - } - return x, nil -} - -type JobService_GetMessageStreamClient interface { - Recv() (*JobMessagesResponse, error) - grpc.ClientStream -} - -type jobServiceGetMessageStreamClient struct { - grpc.ClientStream -} - -func (x *jobServiceGetMessageStreamClient) Recv() (*JobMessagesResponse, error) { - m := new(JobMessagesResponse) - if err := x.ClientStream.RecvMsg(m); err != nil { - return nil, err - } - return m, nil -} - -func (c *jobServiceClient) GetJobMetrics(ctx context.Context, in *GetJobMetricsRequest, opts ...grpc.CallOption) (*GetJobMetricsResponse, error) { - out := new(GetJobMetricsResponse) - err := c.cc.Invoke(ctx, "/org.apache.beam.model.job_management.v1.JobService/GetJobMetrics", in, out, opts...) - if err != nil { - return nil, err - } - return out, nil -} - -func (c *jobServiceClient) DescribePipelineOptions(ctx context.Context, in *DescribePipelineOptionsRequest, opts ...grpc.CallOption) (*DescribePipelineOptionsResponse, error) { - out := new(DescribePipelineOptionsResponse) - err := c.cc.Invoke(ctx, "/org.apache.beam.model.job_management.v1.JobService/DescribePipelineOptions", in, out, opts...) - if err != nil { - return nil, err - } - return out, nil -} - -// JobServiceServer is the server API for JobService service. -type JobServiceServer interface { - // Prepare a job for execution. The job will not be executed until a call is made to run with the - // returned preparationId. - Prepare(context.Context, *PrepareJobRequest) (*PrepareJobResponse, error) - // Submit the job for execution - Run(context.Context, *RunJobRequest) (*RunJobResponse, error) - // Get a list of all invoked jobs - GetJobs(context.Context, *GetJobsRequest) (*GetJobsResponse, error) - // Get the current state of the job - GetState(context.Context, *GetJobStateRequest) (*JobStateEvent, error) - // Get the job's pipeline - GetPipeline(context.Context, *GetJobPipelineRequest) (*GetJobPipelineResponse, error) - // Cancel the job - Cancel(context.Context, *CancelJobRequest) (*CancelJobResponse, error) - // Subscribe to a stream of state changes of the job, will immediately return the current state of the job as the first response. 
- GetStateStream(*GetJobStateRequest, JobService_GetStateStreamServer) error - // Subscribe to a stream of state changes and messages from the job - GetMessageStream(*JobMessagesRequest, JobService_GetMessageStreamServer) error - // Fetch metrics for a given job - GetJobMetrics(context.Context, *GetJobMetricsRequest) (*GetJobMetricsResponse, error) - // Get the supported pipeline options of the runner - DescribePipelineOptions(context.Context, *DescribePipelineOptionsRequest) (*DescribePipelineOptionsResponse, error) -} - -// UnimplementedJobServiceServer can be embedded to have forward compatible implementations. -type UnimplementedJobServiceServer struct { -} - -func (*UnimplementedJobServiceServer) Prepare(context.Context, *PrepareJobRequest) (*PrepareJobResponse, error) { - return nil, status.Errorf(codes.Unimplemented, "method Prepare not implemented") -} -func (*UnimplementedJobServiceServer) Run(context.Context, *RunJobRequest) (*RunJobResponse, error) { - return nil, status.Errorf(codes.Unimplemented, "method Run not implemented") -} -func (*UnimplementedJobServiceServer) GetJobs(context.Context, *GetJobsRequest) (*GetJobsResponse, error) { - return nil, status.Errorf(codes.Unimplemented, "method GetJobs not implemented") -} -func (*UnimplementedJobServiceServer) GetState(context.Context, *GetJobStateRequest) (*JobStateEvent, error) { - return nil, status.Errorf(codes.Unimplemented, "method GetState not implemented") -} -func (*UnimplementedJobServiceServer) GetPipeline(context.Context, *GetJobPipelineRequest) (*GetJobPipelineResponse, error) { - return nil, status.Errorf(codes.Unimplemented, "method GetPipeline not implemented") -} -func (*UnimplementedJobServiceServer) Cancel(context.Context, *CancelJobRequest) (*CancelJobResponse, error) { - return nil, status.Errorf(codes.Unimplemented, "method Cancel not implemented") -} -func (*UnimplementedJobServiceServer) GetStateStream(*GetJobStateRequest, JobService_GetStateStreamServer) error { - return status.Errorf(codes.Unimplemented, "method GetStateStream not implemented") -} -func (*UnimplementedJobServiceServer) GetMessageStream(*JobMessagesRequest, JobService_GetMessageStreamServer) error { - return status.Errorf(codes.Unimplemented, "method GetMessageStream not implemented") -} -func (*UnimplementedJobServiceServer) GetJobMetrics(context.Context, *GetJobMetricsRequest) (*GetJobMetricsResponse, error) { - return nil, status.Errorf(codes.Unimplemented, "method GetJobMetrics not implemented") -} -func (*UnimplementedJobServiceServer) DescribePipelineOptions(context.Context, *DescribePipelineOptionsRequest) (*DescribePipelineOptionsResponse, error) { - return nil, status.Errorf(codes.Unimplemented, "method DescribePipelineOptions not implemented") -} - -func RegisterJobServiceServer(s *grpc.Server, srv JobServiceServer) { - s.RegisterService(&_JobService_serviceDesc, srv) -} - -func _JobService_Prepare_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { - in := new(PrepareJobRequest) - if err := dec(in); err != nil { - return nil, err - } - if interceptor == nil { - return srv.(JobServiceServer).Prepare(ctx, in) - } - info := &grpc.UnaryServerInfo{ - Server: srv, - FullMethod: "/org.apache.beam.model.job_management.v1.JobService/Prepare", - } - handler := func(ctx context.Context, req interface{}) (interface{}, error) { - return srv.(JobServiceServer).Prepare(ctx, req.(*PrepareJobRequest)) - } - return interceptor(ctx, in, info, handler) -} - 
-func _JobService_Run_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { - in := new(RunJobRequest) - if err := dec(in); err != nil { - return nil, err - } - if interceptor == nil { - return srv.(JobServiceServer).Run(ctx, in) - } - info := &grpc.UnaryServerInfo{ - Server: srv, - FullMethod: "/org.apache.beam.model.job_management.v1.JobService/Run", - } - handler := func(ctx context.Context, req interface{}) (interface{}, error) { - return srv.(JobServiceServer).Run(ctx, req.(*RunJobRequest)) - } - return interceptor(ctx, in, info, handler) -} - -func _JobService_GetJobs_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { - in := new(GetJobsRequest) - if err := dec(in); err != nil { - return nil, err - } - if interceptor == nil { - return srv.(JobServiceServer).GetJobs(ctx, in) - } - info := &grpc.UnaryServerInfo{ - Server: srv, - FullMethod: "/org.apache.beam.model.job_management.v1.JobService/GetJobs", - } - handler := func(ctx context.Context, req interface{}) (interface{}, error) { - return srv.(JobServiceServer).GetJobs(ctx, req.(*GetJobsRequest)) - } - return interceptor(ctx, in, info, handler) -} - -func _JobService_GetState_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { - in := new(GetJobStateRequest) - if err := dec(in); err != nil { - return nil, err - } - if interceptor == nil { - return srv.(JobServiceServer).GetState(ctx, in) - } - info := &grpc.UnaryServerInfo{ - Server: srv, - FullMethod: "/org.apache.beam.model.job_management.v1.JobService/GetState", - } - handler := func(ctx context.Context, req interface{}) (interface{}, error) { - return srv.(JobServiceServer).GetState(ctx, req.(*GetJobStateRequest)) - } - return interceptor(ctx, in, info, handler) -} - -func _JobService_GetPipeline_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { - in := new(GetJobPipelineRequest) - if err := dec(in); err != nil { - return nil, err - } - if interceptor == nil { - return srv.(JobServiceServer).GetPipeline(ctx, in) - } - info := &grpc.UnaryServerInfo{ - Server: srv, - FullMethod: "/org.apache.beam.model.job_management.v1.JobService/GetPipeline", - } - handler := func(ctx context.Context, req interface{}) (interface{}, error) { - return srv.(JobServiceServer).GetPipeline(ctx, req.(*GetJobPipelineRequest)) - } - return interceptor(ctx, in, info, handler) -} - -func _JobService_Cancel_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { - in := new(CancelJobRequest) - if err := dec(in); err != nil { - return nil, err - } - if interceptor == nil { - return srv.(JobServiceServer).Cancel(ctx, in) - } - info := &grpc.UnaryServerInfo{ - Server: srv, - FullMethod: "/org.apache.beam.model.job_management.v1.JobService/Cancel", - } - handler := func(ctx context.Context, req interface{}) (interface{}, error) { - return srv.(JobServiceServer).Cancel(ctx, req.(*CancelJobRequest)) - } - return interceptor(ctx, in, info, handler) -} - -func _JobService_GetStateStream_Handler(srv interface{}, stream grpc.ServerStream) error { - m := new(GetJobStateRequest) - if err := stream.RecvMsg(m); err != nil { - return err - } - return srv.(JobServiceServer).GetStateStream(m, 
&jobServiceGetStateStreamServer{stream}) -} - -type JobService_GetStateStreamServer interface { - Send(*JobStateEvent) error - grpc.ServerStream -} - -type jobServiceGetStateStreamServer struct { - grpc.ServerStream -} - -func (x *jobServiceGetStateStreamServer) Send(m *JobStateEvent) error { - return x.ServerStream.SendMsg(m) -} - -func _JobService_GetMessageStream_Handler(srv interface{}, stream grpc.ServerStream) error { - m := new(JobMessagesRequest) - if err := stream.RecvMsg(m); err != nil { - return err - } - return srv.(JobServiceServer).GetMessageStream(m, &jobServiceGetMessageStreamServer{stream}) -} - -type JobService_GetMessageStreamServer interface { - Send(*JobMessagesResponse) error - grpc.ServerStream -} - -type jobServiceGetMessageStreamServer struct { - grpc.ServerStream -} - -func (x *jobServiceGetMessageStreamServer) Send(m *JobMessagesResponse) error { - return x.ServerStream.SendMsg(m) -} - -func _JobService_GetJobMetrics_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { - in := new(GetJobMetricsRequest) - if err := dec(in); err != nil { - return nil, err - } - if interceptor == nil { - return srv.(JobServiceServer).GetJobMetrics(ctx, in) - } - info := &grpc.UnaryServerInfo{ - Server: srv, - FullMethod: "/org.apache.beam.model.job_management.v1.JobService/GetJobMetrics", - } - handler := func(ctx context.Context, req interface{}) (interface{}, error) { - return srv.(JobServiceServer).GetJobMetrics(ctx, req.(*GetJobMetricsRequest)) - } - return interceptor(ctx, in, info, handler) -} - -func _JobService_DescribePipelineOptions_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { - in := new(DescribePipelineOptionsRequest) - if err := dec(in); err != nil { - return nil, err - } - if interceptor == nil { - return srv.(JobServiceServer).DescribePipelineOptions(ctx, in) - } - info := &grpc.UnaryServerInfo{ - Server: srv, - FullMethod: "/org.apache.beam.model.job_management.v1.JobService/DescribePipelineOptions", - } - handler := func(ctx context.Context, req interface{}) (interface{}, error) { - return srv.(JobServiceServer).DescribePipelineOptions(ctx, req.(*DescribePipelineOptionsRequest)) - } - return interceptor(ctx, in, info, handler) -} - -var _JobService_serviceDesc = grpc.ServiceDesc{ - ServiceName: "org.apache.beam.model.job_management.v1.JobService", - HandlerType: (*JobServiceServer)(nil), - Methods: []grpc.MethodDesc{ - { - MethodName: "Prepare", - Handler: _JobService_Prepare_Handler, - }, - { - MethodName: "Run", - Handler: _JobService_Run_Handler, - }, - { - MethodName: "GetJobs", - Handler: _JobService_GetJobs_Handler, - }, - { - MethodName: "GetState", - Handler: _JobService_GetState_Handler, - }, - { - MethodName: "GetPipeline", - Handler: _JobService_GetPipeline_Handler, - }, - { - MethodName: "Cancel", - Handler: _JobService_Cancel_Handler, - }, - { - MethodName: "GetJobMetrics", - Handler: _JobService_GetJobMetrics_Handler, - }, - { - MethodName: "DescribePipelineOptions", - Handler: _JobService_DescribePipelineOptions_Handler, - }, - }, - Streams: []grpc.StreamDesc{ - { - StreamName: "GetStateStream", - Handler: _JobService_GetStateStream_Handler, - ServerStreams: true, - }, - { - StreamName: "GetMessageStream", - Handler: _JobService_GetMessageStream_Handler, - ServerStreams: true, - }, - }, - Metadata: "beam_job_api.proto", -} diff --git 
a/sdks/go/pkg/beam/model/jobmanagement_v1/beam_job_api_grpc.pb.go b/sdks/go/pkg/beam/model/jobmanagement_v1/beam_job_api_grpc.pb.go new file mode 100644 index 000000000000..a69232129944 --- /dev/null +++ b/sdks/go/pkg/beam/model/jobmanagement_v1/beam_job_api_grpc.pb.go @@ -0,0 +1,519 @@ +// +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +// Code generated by protoc-gen-go-grpc. DO NOT EDIT. + +package jobmanagement_v1 + +import ( + context "context" + grpc "google.golang.org/grpc" + codes "google.golang.org/grpc/codes" + status "google.golang.org/grpc/status" +) + +// This is a compile-time assertion to ensure that this generated file +// is compatible with the grpc package it is being compiled against. +// Requires gRPC-Go v1.32.0 or later. +const _ = grpc.SupportPackageIsVersion7 + +// JobServiceClient is the client API for JobService service. +// +// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://pkg.go.dev/google.golang.org/grpc/?tab=doc#ClientConn.NewStream. +type JobServiceClient interface { + // Prepare a job for execution. The job will not be executed until a call is made to run with the + // returned preparationId. + Prepare(ctx context.Context, in *PrepareJobRequest, opts ...grpc.CallOption) (*PrepareJobResponse, error) + // Submit the job for execution + Run(ctx context.Context, in *RunJobRequest, opts ...grpc.CallOption) (*RunJobResponse, error) + // Get a list of all invoked jobs + GetJobs(ctx context.Context, in *GetJobsRequest, opts ...grpc.CallOption) (*GetJobsResponse, error) + // Get the current state of the job + GetState(ctx context.Context, in *GetJobStateRequest, opts ...grpc.CallOption) (*JobStateEvent, error) + // Get the job's pipeline + GetPipeline(ctx context.Context, in *GetJobPipelineRequest, opts ...grpc.CallOption) (*GetJobPipelineResponse, error) + // Cancel the job + Cancel(ctx context.Context, in *CancelJobRequest, opts ...grpc.CallOption) (*CancelJobResponse, error) + // Subscribe to a stream of state changes of the job, will immediately return the current state of the job as the first response. 
+ GetStateStream(ctx context.Context, in *GetJobStateRequest, opts ...grpc.CallOption) (JobService_GetStateStreamClient, error) + // Subscribe to a stream of state changes and messages from the job + GetMessageStream(ctx context.Context, in *JobMessagesRequest, opts ...grpc.CallOption) (JobService_GetMessageStreamClient, error) + // Fetch metrics for a given job + GetJobMetrics(ctx context.Context, in *GetJobMetricsRequest, opts ...grpc.CallOption) (*GetJobMetricsResponse, error) + // Get the supported pipeline options of the runner + DescribePipelineOptions(ctx context.Context, in *DescribePipelineOptionsRequest, opts ...grpc.CallOption) (*DescribePipelineOptionsResponse, error) +} + +type jobServiceClient struct { + cc grpc.ClientConnInterface +} + +func NewJobServiceClient(cc grpc.ClientConnInterface) JobServiceClient { + return &jobServiceClient{cc} +} + +func (c *jobServiceClient) Prepare(ctx context.Context, in *PrepareJobRequest, opts ...grpc.CallOption) (*PrepareJobResponse, error) { + out := new(PrepareJobResponse) + err := c.cc.Invoke(ctx, "/org.apache.beam.model.job_management.v1.JobService/Prepare", in, out, opts...) + if err != nil { + return nil, err + } + return out, nil +} + +func (c *jobServiceClient) Run(ctx context.Context, in *RunJobRequest, opts ...grpc.CallOption) (*RunJobResponse, error) { + out := new(RunJobResponse) + err := c.cc.Invoke(ctx, "/org.apache.beam.model.job_management.v1.JobService/Run", in, out, opts...) + if err != nil { + return nil, err + } + return out, nil +} + +func (c *jobServiceClient) GetJobs(ctx context.Context, in *GetJobsRequest, opts ...grpc.CallOption) (*GetJobsResponse, error) { + out := new(GetJobsResponse) + err := c.cc.Invoke(ctx, "/org.apache.beam.model.job_management.v1.JobService/GetJobs", in, out, opts...) + if err != nil { + return nil, err + } + return out, nil +} + +func (c *jobServiceClient) GetState(ctx context.Context, in *GetJobStateRequest, opts ...grpc.CallOption) (*JobStateEvent, error) { + out := new(JobStateEvent) + err := c.cc.Invoke(ctx, "/org.apache.beam.model.job_management.v1.JobService/GetState", in, out, opts...) + if err != nil { + return nil, err + } + return out, nil +} + +func (c *jobServiceClient) GetPipeline(ctx context.Context, in *GetJobPipelineRequest, opts ...grpc.CallOption) (*GetJobPipelineResponse, error) { + out := new(GetJobPipelineResponse) + err := c.cc.Invoke(ctx, "/org.apache.beam.model.job_management.v1.JobService/GetPipeline", in, out, opts...) + if err != nil { + return nil, err + } + return out, nil +} + +func (c *jobServiceClient) Cancel(ctx context.Context, in *CancelJobRequest, opts ...grpc.CallOption) (*CancelJobResponse, error) { + out := new(CancelJobResponse) + err := c.cc.Invoke(ctx, "/org.apache.beam.model.job_management.v1.JobService/Cancel", in, out, opts...) + if err != nil { + return nil, err + } + return out, nil +} + +func (c *jobServiceClient) GetStateStream(ctx context.Context, in *GetJobStateRequest, opts ...grpc.CallOption) (JobService_GetStateStreamClient, error) { + stream, err := c.cc.NewStream(ctx, &JobService_ServiceDesc.Streams[0], "/org.apache.beam.model.job_management.v1.JobService/GetStateStream", opts...) 
+ if err != nil { + return nil, err + } + x := &jobServiceGetStateStreamClient{stream} + if err := x.ClientStream.SendMsg(in); err != nil { + return nil, err + } + if err := x.ClientStream.CloseSend(); err != nil { + return nil, err + } + return x, nil +} + +type JobService_GetStateStreamClient interface { + Recv() (*JobStateEvent, error) + grpc.ClientStream +} + +type jobServiceGetStateStreamClient struct { + grpc.ClientStream +} + +func (x *jobServiceGetStateStreamClient) Recv() (*JobStateEvent, error) { + m := new(JobStateEvent) + if err := x.ClientStream.RecvMsg(m); err != nil { + return nil, err + } + return m, nil +} + +func (c *jobServiceClient) GetMessageStream(ctx context.Context, in *JobMessagesRequest, opts ...grpc.CallOption) (JobService_GetMessageStreamClient, error) { + stream, err := c.cc.NewStream(ctx, &JobService_ServiceDesc.Streams[1], "/org.apache.beam.model.job_management.v1.JobService/GetMessageStream", opts...) + if err != nil { + return nil, err + } + x := &jobServiceGetMessageStreamClient{stream} + if err := x.ClientStream.SendMsg(in); err != nil { + return nil, err + } + if err := x.ClientStream.CloseSend(); err != nil { + return nil, err + } + return x, nil +} + +type JobService_GetMessageStreamClient interface { + Recv() (*JobMessagesResponse, error) + grpc.ClientStream +} + +type jobServiceGetMessageStreamClient struct { + grpc.ClientStream +} + +func (x *jobServiceGetMessageStreamClient) Recv() (*JobMessagesResponse, error) { + m := new(JobMessagesResponse) + if err := x.ClientStream.RecvMsg(m); err != nil { + return nil, err + } + return m, nil +} + +func (c *jobServiceClient) GetJobMetrics(ctx context.Context, in *GetJobMetricsRequest, opts ...grpc.CallOption) (*GetJobMetricsResponse, error) { + out := new(GetJobMetricsResponse) + err := c.cc.Invoke(ctx, "/org.apache.beam.model.job_management.v1.JobService/GetJobMetrics", in, out, opts...) + if err != nil { + return nil, err + } + return out, nil +} + +func (c *jobServiceClient) DescribePipelineOptions(ctx context.Context, in *DescribePipelineOptionsRequest, opts ...grpc.CallOption) (*DescribePipelineOptionsResponse, error) { + out := new(DescribePipelineOptionsResponse) + err := c.cc.Invoke(ctx, "/org.apache.beam.model.job_management.v1.JobService/DescribePipelineOptions", in, out, opts...) + if err != nil { + return nil, err + } + return out, nil +} + +// JobServiceServer is the server API for JobService service. +// All implementations must embed UnimplementedJobServiceServer +// for forward compatibility +type JobServiceServer interface { + // Prepare a job for execution. The job will not be executed until a call is made to run with the + // returned preparationId. + Prepare(context.Context, *PrepareJobRequest) (*PrepareJobResponse, error) + // Submit the job for execution + Run(context.Context, *RunJobRequest) (*RunJobResponse, error) + // Get a list of all invoked jobs + GetJobs(context.Context, *GetJobsRequest) (*GetJobsResponse, error) + // Get the current state of the job + GetState(context.Context, *GetJobStateRequest) (*JobStateEvent, error) + // Get the job's pipeline + GetPipeline(context.Context, *GetJobPipelineRequest) (*GetJobPipelineResponse, error) + // Cancel the job + Cancel(context.Context, *CancelJobRequest) (*CancelJobResponse, error) + // Subscribe to a stream of state changes of the job, will immediately return the current state of the job as the first response. 
+ GetStateStream(*GetJobStateRequest, JobService_GetStateStreamServer) error + // Subscribe to a stream of state changes and messages from the job + GetMessageStream(*JobMessagesRequest, JobService_GetMessageStreamServer) error + // Fetch metrics for a given job + GetJobMetrics(context.Context, *GetJobMetricsRequest) (*GetJobMetricsResponse, error) + // Get the supported pipeline options of the runner + DescribePipelineOptions(context.Context, *DescribePipelineOptionsRequest) (*DescribePipelineOptionsResponse, error) + mustEmbedUnimplementedJobServiceServer() +} + +// UnimplementedJobServiceServer must be embedded to have forward compatible implementations. +type UnimplementedJobServiceServer struct { +} + +func (UnimplementedJobServiceServer) Prepare(context.Context, *PrepareJobRequest) (*PrepareJobResponse, error) { + return nil, status.Errorf(codes.Unimplemented, "method Prepare not implemented") +} +func (UnimplementedJobServiceServer) Run(context.Context, *RunJobRequest) (*RunJobResponse, error) { + return nil, status.Errorf(codes.Unimplemented, "method Run not implemented") +} +func (UnimplementedJobServiceServer) GetJobs(context.Context, *GetJobsRequest) (*GetJobsResponse, error) { + return nil, status.Errorf(codes.Unimplemented, "method GetJobs not implemented") +} +func (UnimplementedJobServiceServer) GetState(context.Context, *GetJobStateRequest) (*JobStateEvent, error) { + return nil, status.Errorf(codes.Unimplemented, "method GetState not implemented") +} +func (UnimplementedJobServiceServer) GetPipeline(context.Context, *GetJobPipelineRequest) (*GetJobPipelineResponse, error) { + return nil, status.Errorf(codes.Unimplemented, "method GetPipeline not implemented") +} +func (UnimplementedJobServiceServer) Cancel(context.Context, *CancelJobRequest) (*CancelJobResponse, error) { + return nil, status.Errorf(codes.Unimplemented, "method Cancel not implemented") +} +func (UnimplementedJobServiceServer) GetStateStream(*GetJobStateRequest, JobService_GetStateStreamServer) error { + return status.Errorf(codes.Unimplemented, "method GetStateStream not implemented") +} +func (UnimplementedJobServiceServer) GetMessageStream(*JobMessagesRequest, JobService_GetMessageStreamServer) error { + return status.Errorf(codes.Unimplemented, "method GetMessageStream not implemented") +} +func (UnimplementedJobServiceServer) GetJobMetrics(context.Context, *GetJobMetricsRequest) (*GetJobMetricsResponse, error) { + return nil, status.Errorf(codes.Unimplemented, "method GetJobMetrics not implemented") +} +func (UnimplementedJobServiceServer) DescribePipelineOptions(context.Context, *DescribePipelineOptionsRequest) (*DescribePipelineOptionsResponse, error) { + return nil, status.Errorf(codes.Unimplemented, "method DescribePipelineOptions not implemented") +} +func (UnimplementedJobServiceServer) mustEmbedUnimplementedJobServiceServer() {} + +// UnsafeJobServiceServer may be embedded to opt out of forward compatibility for this service. +// Use of this interface is not recommended, as added methods to JobServiceServer will +// result in compilation errors. 
+type UnsafeJobServiceServer interface { + mustEmbedUnimplementedJobServiceServer() +} + +func RegisterJobServiceServer(s grpc.ServiceRegistrar, srv JobServiceServer) { + s.RegisterService(&JobService_ServiceDesc, srv) +} + +func _JobService_Prepare_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { + in := new(PrepareJobRequest) + if err := dec(in); err != nil { + return nil, err + } + if interceptor == nil { + return srv.(JobServiceServer).Prepare(ctx, in) + } + info := &grpc.UnaryServerInfo{ + Server: srv, + FullMethod: "/org.apache.beam.model.job_management.v1.JobService/Prepare", + } + handler := func(ctx context.Context, req interface{}) (interface{}, error) { + return srv.(JobServiceServer).Prepare(ctx, req.(*PrepareJobRequest)) + } + return interceptor(ctx, in, info, handler) +} + +func _JobService_Run_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { + in := new(RunJobRequest) + if err := dec(in); err != nil { + return nil, err + } + if interceptor == nil { + return srv.(JobServiceServer).Run(ctx, in) + } + info := &grpc.UnaryServerInfo{ + Server: srv, + FullMethod: "/org.apache.beam.model.job_management.v1.JobService/Run", + } + handler := func(ctx context.Context, req interface{}) (interface{}, error) { + return srv.(JobServiceServer).Run(ctx, req.(*RunJobRequest)) + } + return interceptor(ctx, in, info, handler) +} + +func _JobService_GetJobs_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { + in := new(GetJobsRequest) + if err := dec(in); err != nil { + return nil, err + } + if interceptor == nil { + return srv.(JobServiceServer).GetJobs(ctx, in) + } + info := &grpc.UnaryServerInfo{ + Server: srv, + FullMethod: "/org.apache.beam.model.job_management.v1.JobService/GetJobs", + } + handler := func(ctx context.Context, req interface{}) (interface{}, error) { + return srv.(JobServiceServer).GetJobs(ctx, req.(*GetJobsRequest)) + } + return interceptor(ctx, in, info, handler) +} + +func _JobService_GetState_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { + in := new(GetJobStateRequest) + if err := dec(in); err != nil { + return nil, err + } + if interceptor == nil { + return srv.(JobServiceServer).GetState(ctx, in) + } + info := &grpc.UnaryServerInfo{ + Server: srv, + FullMethod: "/org.apache.beam.model.job_management.v1.JobService/GetState", + } + handler := func(ctx context.Context, req interface{}) (interface{}, error) { + return srv.(JobServiceServer).GetState(ctx, req.(*GetJobStateRequest)) + } + return interceptor(ctx, in, info, handler) +} + +func _JobService_GetPipeline_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { + in := new(GetJobPipelineRequest) + if err := dec(in); err != nil { + return nil, err + } + if interceptor == nil { + return srv.(JobServiceServer).GetPipeline(ctx, in) + } + info := &grpc.UnaryServerInfo{ + Server: srv, + FullMethod: "/org.apache.beam.model.job_management.v1.JobService/GetPipeline", + } + handler := func(ctx context.Context, req interface{}) (interface{}, error) { + return srv.(JobServiceServer).GetPipeline(ctx, req.(*GetJobPipelineRequest)) + } + return interceptor(ctx, in, info, handler) +} + +func 
_JobService_Cancel_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { + in := new(CancelJobRequest) + if err := dec(in); err != nil { + return nil, err + } + if interceptor == nil { + return srv.(JobServiceServer).Cancel(ctx, in) + } + info := &grpc.UnaryServerInfo{ + Server: srv, + FullMethod: "/org.apache.beam.model.job_management.v1.JobService/Cancel", + } + handler := func(ctx context.Context, req interface{}) (interface{}, error) { + return srv.(JobServiceServer).Cancel(ctx, req.(*CancelJobRequest)) + } + return interceptor(ctx, in, info, handler) +} + +func _JobService_GetStateStream_Handler(srv interface{}, stream grpc.ServerStream) error { + m := new(GetJobStateRequest) + if err := stream.RecvMsg(m); err != nil { + return err + } + return srv.(JobServiceServer).GetStateStream(m, &jobServiceGetStateStreamServer{stream}) +} + +type JobService_GetStateStreamServer interface { + Send(*JobStateEvent) error + grpc.ServerStream +} + +type jobServiceGetStateStreamServer struct { + grpc.ServerStream +} + +func (x *jobServiceGetStateStreamServer) Send(m *JobStateEvent) error { + return x.ServerStream.SendMsg(m) +} + +func _JobService_GetMessageStream_Handler(srv interface{}, stream grpc.ServerStream) error { + m := new(JobMessagesRequest) + if err := stream.RecvMsg(m); err != nil { + return err + } + return srv.(JobServiceServer).GetMessageStream(m, &jobServiceGetMessageStreamServer{stream}) +} + +type JobService_GetMessageStreamServer interface { + Send(*JobMessagesResponse) error + grpc.ServerStream +} + +type jobServiceGetMessageStreamServer struct { + grpc.ServerStream +} + +func (x *jobServiceGetMessageStreamServer) Send(m *JobMessagesResponse) error { + return x.ServerStream.SendMsg(m) +} + +func _JobService_GetJobMetrics_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { + in := new(GetJobMetricsRequest) + if err := dec(in); err != nil { + return nil, err + } + if interceptor == nil { + return srv.(JobServiceServer).GetJobMetrics(ctx, in) + } + info := &grpc.UnaryServerInfo{ + Server: srv, + FullMethod: "/org.apache.beam.model.job_management.v1.JobService/GetJobMetrics", + } + handler := func(ctx context.Context, req interface{}) (interface{}, error) { + return srv.(JobServiceServer).GetJobMetrics(ctx, req.(*GetJobMetricsRequest)) + } + return interceptor(ctx, in, info, handler) +} + +func _JobService_DescribePipelineOptions_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { + in := new(DescribePipelineOptionsRequest) + if err := dec(in); err != nil { + return nil, err + } + if interceptor == nil { + return srv.(JobServiceServer).DescribePipelineOptions(ctx, in) + } + info := &grpc.UnaryServerInfo{ + Server: srv, + FullMethod: "/org.apache.beam.model.job_management.v1.JobService/DescribePipelineOptions", + } + handler := func(ctx context.Context, req interface{}) (interface{}, error) { + return srv.(JobServiceServer).DescribePipelineOptions(ctx, req.(*DescribePipelineOptionsRequest)) + } + return interceptor(ctx, in, info, handler) +} + +// JobService_ServiceDesc is the grpc.ServiceDesc for JobService service. 
+// It's only intended for direct use with grpc.RegisterService, +// and not to be introspected or modified (even as a copy) +var JobService_ServiceDesc = grpc.ServiceDesc{ + ServiceName: "org.apache.beam.model.job_management.v1.JobService", + HandlerType: (*JobServiceServer)(nil), + Methods: []grpc.MethodDesc{ + { + MethodName: "Prepare", + Handler: _JobService_Prepare_Handler, + }, + { + MethodName: "Run", + Handler: _JobService_Run_Handler, + }, + { + MethodName: "GetJobs", + Handler: _JobService_GetJobs_Handler, + }, + { + MethodName: "GetState", + Handler: _JobService_GetState_Handler, + }, + { + MethodName: "GetPipeline", + Handler: _JobService_GetPipeline_Handler, + }, + { + MethodName: "Cancel", + Handler: _JobService_Cancel_Handler, + }, + { + MethodName: "GetJobMetrics", + Handler: _JobService_GetJobMetrics_Handler, + }, + { + MethodName: "DescribePipelineOptions", + Handler: _JobService_DescribePipelineOptions_Handler, + }, + }, + Streams: []grpc.StreamDesc{ + { + StreamName: "GetStateStream", + Handler: _JobService_GetStateStream_Handler, + ServerStreams: true, + }, + { + StreamName: "GetMessageStream", + Handler: _JobService_GetMessageStream_Handler, + ServerStreams: true, + }, + }, + Metadata: "beam_job_api.proto", +} diff --git a/sdks/go/pkg/beam/model/pipeline_v1/beam_runner_api.pb.go b/sdks/go/pkg/beam/model/pipeline_v1/beam_runner_api.pb.go index 68c777bf7838..f36d767a10a5 100644 --- a/sdks/go/pkg/beam/model/pipeline_v1/beam_runner_api.pb.go +++ b/sdks/go/pkg/beam/model/pipeline_v1/beam_runner_api.pb.go @@ -21,20 +21,16 @@ // Code generated by protoc-gen-go. DO NOT EDIT. // versions: -// protoc-gen-go v1.25.0-devel -// protoc v3.13.0 +// protoc-gen-go v1.27.1 +// protoc v3.17.3 // source: beam_runner_api.proto package pipeline_v1 import ( - context "context" - descriptor "github.com/golang/protobuf/protoc-gen-go/descriptor" - grpc "google.golang.org/grpc" - codes "google.golang.org/grpc/codes" - status "google.golang.org/grpc/status" protoreflect "google.golang.org/protobuf/reflect/protoreflect" protoimpl "google.golang.org/protobuf/runtime/protoimpl" + descriptorpb "google.golang.org/protobuf/types/descriptorpb" reflect "reflect" sync "sync" ) @@ -204,6 +200,21 @@ const ( // // Payload: WindowingStrategy#window_fn FunctionSpec StandardPTransforms_MERGE_WINDOWS StandardPTransforms_Primitives = 7 + // A transform that translates a given element to its human-readable + // representation. + // + // Input: KV + // Output: KV + // + // For each given element, the implementation returns the best-effort + // human-readable representation. When possible, the implementation could + // call a user-overridable method on the type. For example, Java could + // call `toString()`, Python could call `str()`, Golang could call + // `String()`. The nonce is used by a runner to associate each input with + // its output. The nonce is represented as an opaque set of bytes. + // + // Payload: none + StandardPTransforms_TO_STRING StandardPTransforms_Primitives = 8 ) // Enum value maps for StandardPTransforms_Primitives. @@ -217,6 +228,7 @@ var ( 5: "TEST_STREAM", 6: "MAP_WINDOWS", 7: "MERGE_WINDOWS", + 8: "TO_STRING", } StandardPTransforms_Primitives_value = map[string]int32{ "PAR_DO": 0, @@ -227,6 +239,7 @@ var ( "TEST_STREAM": 5, "MAP_WINDOWS": 6, "MERGE_WINDOWS": 7, + "TO_STRING": 8, } ) @@ -328,6 +341,9 @@ const ( StandardPTransforms_PUBSUB_READ StandardPTransforms_Composites = 4 // Payload: PubSubWritePayload. 
StandardPTransforms_PUBSUB_WRITE StandardPTransforms_Composites = 5 + // Represents the GroupIntoBatches.WithShardedKey operation. + // Payload: GroupIntoBatchesPayload + StandardPTransforms_GROUP_INTO_BATCHES_WITH_SHARDED_KEY StandardPTransforms_Composites = 6 ) // Enum value maps for StandardPTransforms_Composites. @@ -339,14 +355,16 @@ var ( 3: "WRITE_FILES", 4: "PUBSUB_READ", 5: "PUBSUB_WRITE", + 6: "GROUP_INTO_BATCHES_WITH_SHARDED_KEY", } StandardPTransforms_Composites_value = map[string]int32{ - "COMBINE_PER_KEY": 0, - "COMBINE_GLOBALLY": 1, - "RESHUFFLE": 2, - "WRITE_FILES": 3, - "PUBSUB_READ": 4, - "PUBSUB_WRITE": 5, + "COMBINE_PER_KEY": 0, + "COMBINE_GLOBALLY": 1, + "RESHUFFLE": 2, + "WRITE_FILES": 3, + "PUBSUB_READ": 4, + "PUBSUB_WRITE": 5, + "GROUP_INTO_BATCHES_WITH_SHARDED_KEY": 6, } ) @@ -531,6 +549,50 @@ func (StandardPTransforms_SplittableParDoComponents) EnumDescriptor() ([]byte, [ return file_beam_runner_api_proto_rawDescGZIP(), []int{4, 4} } +// Payload for all of these: GroupIntoBatchesPayload +type StandardPTransforms_GroupIntoBatchesComponents int32 + +const ( + StandardPTransforms_GROUP_INTO_BATCHES StandardPTransforms_GroupIntoBatchesComponents = 0 +) + +// Enum value maps for StandardPTransforms_GroupIntoBatchesComponents. +var ( + StandardPTransforms_GroupIntoBatchesComponents_name = map[int32]string{ + 0: "GROUP_INTO_BATCHES", + } + StandardPTransforms_GroupIntoBatchesComponents_value = map[string]int32{ + "GROUP_INTO_BATCHES": 0, + } +) + +func (x StandardPTransforms_GroupIntoBatchesComponents) Enum() *StandardPTransforms_GroupIntoBatchesComponents { + p := new(StandardPTransforms_GroupIntoBatchesComponents) + *p = x + return p +} + +func (x StandardPTransforms_GroupIntoBatchesComponents) String() string { + return protoimpl.X.EnumStringOf(x.Descriptor(), protoreflect.EnumNumber(x)) +} + +func (StandardPTransforms_GroupIntoBatchesComponents) Descriptor() protoreflect.EnumDescriptor { + return file_beam_runner_api_proto_enumTypes[6].Descriptor() +} + +func (StandardPTransforms_GroupIntoBatchesComponents) Type() protoreflect.EnumType { + return &file_beam_runner_api_proto_enumTypes[6] +} + +func (x StandardPTransforms_GroupIntoBatchesComponents) Number() protoreflect.EnumNumber { + return protoreflect.EnumNumber(x) +} + +// Deprecated: Use StandardPTransforms_GroupIntoBatchesComponents.Descriptor instead. 
+func (StandardPTransforms_GroupIntoBatchesComponents) EnumDescriptor() ([]byte, []int) { + return file_beam_runner_api_proto_rawDescGZIP(), []int{4, 5} +} + type StandardSideInputTypes_Enum int32 const ( @@ -569,11 +631,11 @@ func (x StandardSideInputTypes_Enum) String() string { } func (StandardSideInputTypes_Enum) Descriptor() protoreflect.EnumDescriptor { - return file_beam_runner_api_proto_enumTypes[6].Descriptor() + return file_beam_runner_api_proto_enumTypes[7].Descriptor() } func (StandardSideInputTypes_Enum) Type() protoreflect.EnumType { - return &file_beam_runner_api_proto_enumTypes[6] + return &file_beam_runner_api_proto_enumTypes[7] } func (x StandardSideInputTypes_Enum) Number() protoreflect.EnumNumber { @@ -618,11 +680,11 @@ func (x IsBounded_Enum) String() string { } func (IsBounded_Enum) Descriptor() protoreflect.EnumDescriptor { - return file_beam_runner_api_proto_enumTypes[7].Descriptor() + return file_beam_runner_api_proto_enumTypes[8].Descriptor() } func (IsBounded_Enum) Type() protoreflect.EnumType { - return &file_beam_runner_api_proto_enumTypes[7] + return &file_beam_runner_api_proto_enumTypes[8] } func (x IsBounded_Enum) Number() protoreflect.EnumNumber { @@ -775,6 +837,29 @@ const ( // Components: Coder for a single element. // Experimental. StandardCoders_STATE_BACKED_ITERABLE StandardCoders_Enum = 9 + // Encodes an arbitrary user defined window and its max timestamp (inclusive). + // The encoding format is: + // maxTimestamp window + // + // maxTimestamp - A big endian 8 byte integer representing millis-since-epoch. + // The encoded representation is shifted so that the byte representation + // of negative values are lexicographically ordered before the byte + // representation of positive values. This is typically done by + // subtracting -9223372036854775808 from the value and encoding it as a + // signed big endian integer. Example values: + // + // -9223372036854775808: 00 00 00 00 00 00 00 00 + // -255: 7F FF FF FF FF FF FF 01 + // -1: 7F FF FF FF FF FF FF FF + // 0: 80 00 00 00 00 00 00 00 + // 1: 80 00 00 00 00 00 00 01 + // 256: 80 00 00 00 00 00 01 00 + // 9223372036854775807: FF FF FF FF FF FF FF FF + // + // window - the window is encoded using the supplied window coder. + // + // Components: Coder for the custom window type. + StandardCoders_CUSTOM_WINDOW StandardCoders_Enum = 16 // Encodes a "row", an element with a known schema, defined by an // instance of Schema from schema.proto. // @@ -829,6 +914,27 @@ const ( // Components: None // Experimental. StandardCoders_ROW StandardCoders_Enum = 13 + // Encodes a user key and a shard id which is an opaque byte string. + // + // The encoding for a sharded key consists of a shard id byte string and the + // encoded user key in the following order: + // + // - shard id using beam:coder:bytes:v1 + // - encoded user key + // + // Examples: + // user key with an empty shard id + // 0x00 + // encode(user_key) + // + // user key with a shard id taking up two bytes. + // 0x02 + // 0x11 0x22 + // encode(user_key) + // + // Components: the user key coder. + // Experimental. + StandardCoders_SHARDED_KEY StandardCoders_Enum = 15 ) // Enum value maps for StandardCoders_Enum. 
@@ -848,7 +954,9 @@ var ( 8: "WINDOWED_VALUE", 14: "PARAM_WINDOWED_VALUE", 9: "STATE_BACKED_ITERABLE", + 16: "CUSTOM_WINDOW", 13: "ROW", + 15: "SHARDED_KEY", } StandardCoders_Enum_value = map[string]int32{ "BYTES": 0, @@ -865,7 +973,9 @@ var ( "WINDOWED_VALUE": 8, "PARAM_WINDOWED_VALUE": 14, "STATE_BACKED_ITERABLE": 9, + "CUSTOM_WINDOW": 16, "ROW": 13, + "SHARDED_KEY": 15, } ) @@ -880,11 +990,11 @@ func (x StandardCoders_Enum) String() string { } func (StandardCoders_Enum) Descriptor() protoreflect.EnumDescriptor { - return file_beam_runner_api_proto_enumTypes[8].Descriptor() + return file_beam_runner_api_proto_enumTypes[9].Descriptor() } func (StandardCoders_Enum) Type() protoreflect.EnumType { - return &file_beam_runner_api_proto_enumTypes[8] + return &file_beam_runner_api_proto_enumTypes[9] } func (x StandardCoders_Enum) Number() protoreflect.EnumNumber { @@ -893,7 +1003,7 @@ func (x StandardCoders_Enum) Number() protoreflect.EnumNumber { // Deprecated: Use StandardCoders_Enum.Descriptor instead. func (StandardCoders_Enum) EnumDescriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{26, 0} + return file_beam_runner_api_proto_rawDescGZIP(), []int{27, 0} } type MergeStatus_Enum int32 @@ -940,11 +1050,11 @@ func (x MergeStatus_Enum) String() string { } func (MergeStatus_Enum) Descriptor() protoreflect.EnumDescriptor { - return file_beam_runner_api_proto_enumTypes[9].Descriptor() + return file_beam_runner_api_proto_enumTypes[10].Descriptor() } func (MergeStatus_Enum) Type() protoreflect.EnumType { - return &file_beam_runner_api_proto_enumTypes[9] + return &file_beam_runner_api_proto_enumTypes[10] } func (x MergeStatus_Enum) Number() protoreflect.EnumNumber { @@ -953,7 +1063,7 @@ func (x MergeStatus_Enum) Number() protoreflect.EnumNumber { // Deprecated: Use MergeStatus_Enum.Descriptor instead. func (MergeStatus_Enum) EnumDescriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{28, 0} + return file_beam_runner_api_proto_rawDescGZIP(), []int{29, 0} } type AccumulationMode_Enum int32 @@ -995,11 +1105,11 @@ func (x AccumulationMode_Enum) String() string { } func (AccumulationMode_Enum) Descriptor() protoreflect.EnumDescriptor { - return file_beam_runner_api_proto_enumTypes[10].Descriptor() + return file_beam_runner_api_proto_enumTypes[11].Descriptor() } func (AccumulationMode_Enum) Type() protoreflect.EnumType { - return &file_beam_runner_api_proto_enumTypes[10] + return &file_beam_runner_api_proto_enumTypes[11] } func (x AccumulationMode_Enum) Number() protoreflect.EnumNumber { @@ -1008,7 +1118,7 @@ func (x AccumulationMode_Enum) Number() protoreflect.EnumNumber { // Deprecated: Use AccumulationMode_Enum.Descriptor instead. 
func (AccumulationMode_Enum) EnumDescriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{29, 0} + return file_beam_runner_api_proto_rawDescGZIP(), []int{30, 0} } type ClosingBehavior_Enum int32 @@ -1047,11 +1157,11 @@ func (x ClosingBehavior_Enum) String() string { } func (ClosingBehavior_Enum) Descriptor() protoreflect.EnumDescriptor { - return file_beam_runner_api_proto_enumTypes[11].Descriptor() + return file_beam_runner_api_proto_enumTypes[12].Descriptor() } func (ClosingBehavior_Enum) Type() protoreflect.EnumType { - return &file_beam_runner_api_proto_enumTypes[11] + return &file_beam_runner_api_proto_enumTypes[12] } func (x ClosingBehavior_Enum) Number() protoreflect.EnumNumber { @@ -1060,7 +1170,7 @@ func (x ClosingBehavior_Enum) Number() protoreflect.EnumNumber { // Deprecated: Use ClosingBehavior_Enum.Descriptor instead. func (ClosingBehavior_Enum) EnumDescriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{30, 0} + return file_beam_runner_api_proto_rawDescGZIP(), []int{31, 0} } type OnTimeBehavior_Enum int32 @@ -1099,11 +1209,11 @@ func (x OnTimeBehavior_Enum) String() string { } func (OnTimeBehavior_Enum) Descriptor() protoreflect.EnumDescriptor { - return file_beam_runner_api_proto_enumTypes[12].Descriptor() + return file_beam_runner_api_proto_enumTypes[13].Descriptor() } func (OnTimeBehavior_Enum) Type() protoreflect.EnumType { - return &file_beam_runner_api_proto_enumTypes[12] + return &file_beam_runner_api_proto_enumTypes[13] } func (x OnTimeBehavior_Enum) Number() protoreflect.EnumNumber { @@ -1112,7 +1222,7 @@ func (x OnTimeBehavior_Enum) Number() protoreflect.EnumNumber { // Deprecated: Use OnTimeBehavior_Enum.Descriptor instead. func (OnTimeBehavior_Enum) EnumDescriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{31, 0} + return file_beam_runner_api_proto_rawDescGZIP(), []int{32, 0} } type OutputTime_Enum int32 @@ -1156,11 +1266,11 @@ func (x OutputTime_Enum) String() string { } func (OutputTime_Enum) Descriptor() protoreflect.EnumDescriptor { - return file_beam_runner_api_proto_enumTypes[13].Descriptor() + return file_beam_runner_api_proto_enumTypes[14].Descriptor() } func (OutputTime_Enum) Type() protoreflect.EnumType { - return &file_beam_runner_api_proto_enumTypes[13] + return &file_beam_runner_api_proto_enumTypes[14] } func (x OutputTime_Enum) Number() protoreflect.EnumNumber { @@ -1169,7 +1279,7 @@ func (x OutputTime_Enum) Number() protoreflect.EnumNumber { // Deprecated: Use OutputTime_Enum.Descriptor instead. func (OutputTime_Enum) EnumDescriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{32, 0} + return file_beam_runner_api_proto_rawDescGZIP(), []int{33, 0} } type TimeDomain_Enum int32 @@ -1181,12 +1291,6 @@ const ( // Processing time is time from the perspective of the // execution of your pipeline TimeDomain_PROCESSING_TIME TimeDomain_Enum = 2 - // Synchronized processing time is the minimum of the - // processing time of all pending elements. - // - // The "processing time" of an element refers to - // the local processing time at which it was emitted - TimeDomain_SYNCHRONIZED_PROCESSING_TIME TimeDomain_Enum = 3 ) // Enum value maps for TimeDomain_Enum. 
@@ -1195,13 +1299,11 @@ var ( 0: "UNSPECIFIED", 1: "EVENT_TIME", 2: "PROCESSING_TIME", - 3: "SYNCHRONIZED_PROCESSING_TIME", } TimeDomain_Enum_value = map[string]int32{ - "UNSPECIFIED": 0, - "EVENT_TIME": 1, - "PROCESSING_TIME": 2, - "SYNCHRONIZED_PROCESSING_TIME": 3, + "UNSPECIFIED": 0, + "EVENT_TIME": 1, + "PROCESSING_TIME": 2, } ) @@ -1216,11 +1318,11 @@ func (x TimeDomain_Enum) String() string { } func (TimeDomain_Enum) Descriptor() protoreflect.EnumDescriptor { - return file_beam_runner_api_proto_enumTypes[14].Descriptor() + return file_beam_runner_api_proto_enumTypes[15].Descriptor() } func (TimeDomain_Enum) Type() protoreflect.EnumType { - return &file_beam_runner_api_proto_enumTypes[14] + return &file_beam_runner_api_proto_enumTypes[15] } func (x TimeDomain_Enum) Number() protoreflect.EnumNumber { @@ -1229,7 +1331,7 @@ func (x TimeDomain_Enum) Number() protoreflect.EnumNumber { // Deprecated: Use TimeDomain_Enum.Descriptor instead. func (TimeDomain_Enum) EnumDescriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{33, 0} + return file_beam_runner_api_proto_rawDescGZIP(), []int{34, 0} } type StandardArtifacts_Types int32 @@ -1286,11 +1388,11 @@ func (x StandardArtifacts_Types) String() string { } func (StandardArtifacts_Types) Descriptor() protoreflect.EnumDescriptor { - return file_beam_runner_api_proto_enumTypes[15].Descriptor() + return file_beam_runner_api_proto_enumTypes[16].Descriptor() } func (StandardArtifacts_Types) Type() protoreflect.EnumType { - return &file_beam_runner_api_proto_enumTypes[15] + return &file_beam_runner_api_proto_enumTypes[16] } func (x StandardArtifacts_Types) Number() protoreflect.EnumNumber { @@ -1299,7 +1401,7 @@ func (x StandardArtifacts_Types) Number() protoreflect.EnumNumber { // Deprecated: Use StandardArtifacts_Types.Descriptor instead. func (StandardArtifacts_Types) EnumDescriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{37, 0} + return file_beam_runner_api_proto_rawDescGZIP(), []int{38, 0} } type StandardArtifacts_Roles int32 @@ -1308,15 +1410,20 @@ const ( // A URN for staging-to role. // payload: ArtifactStagingToRolePayload StandardArtifacts_STAGING_TO StandardArtifacts_Roles = 0 + // A URN for pip-requirements-file role. + // payload: None + StandardArtifacts_PIP_REQUIREMENTS_FILE StandardArtifacts_Roles = 1 ) // Enum value maps for StandardArtifacts_Roles. var ( StandardArtifacts_Roles_name = map[int32]string{ 0: "STAGING_TO", + 1: "PIP_REQUIREMENTS_FILE", } StandardArtifacts_Roles_value = map[string]int32{ - "STAGING_TO": 0, + "STAGING_TO": 0, + "PIP_REQUIREMENTS_FILE": 1, } ) @@ -1331,11 +1438,11 @@ func (x StandardArtifacts_Roles) String() string { } func (StandardArtifacts_Roles) Descriptor() protoreflect.EnumDescriptor { - return file_beam_runner_api_proto_enumTypes[16].Descriptor() + return file_beam_runner_api_proto_enumTypes[17].Descriptor() } func (StandardArtifacts_Roles) Type() protoreflect.EnumType { - return &file_beam_runner_api_proto_enumTypes[16] + return &file_beam_runner_api_proto_enumTypes[17] } func (x StandardArtifacts_Roles) Number() protoreflect.EnumNumber { @@ -1344,7 +1451,7 @@ func (x StandardArtifacts_Roles) Number() protoreflect.EnumNumber { // Deprecated: Use StandardArtifacts_Roles.Descriptor instead. 
func (StandardArtifacts_Roles) EnumDescriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{37, 1} + return file_beam_runner_api_proto_rawDescGZIP(), []int{38, 1} } type StandardEnvironments_Environments int32 @@ -1353,6 +1460,7 @@ const ( StandardEnvironments_DOCKER StandardEnvironments_Environments = 0 // A managed docker container to run user code. StandardEnvironments_PROCESS StandardEnvironments_Environments = 1 // A managed native process to run user code. StandardEnvironments_EXTERNAL StandardEnvironments_Environments = 2 // An external non managed process to run user code. + StandardEnvironments_DEFAULT StandardEnvironments_Environments = 3 // Used as a stub when context is missing a runner-provided default environment. ) // Enum value maps for StandardEnvironments_Environments. @@ -1361,11 +1469,13 @@ var ( 0: "DOCKER", 1: "PROCESS", 2: "EXTERNAL", + 3: "DEFAULT", } StandardEnvironments_Environments_value = map[string]int32{ "DOCKER": 0, "PROCESS": 1, "EXTERNAL": 2, + "DEFAULT": 3, } ) @@ -1380,11 +1490,11 @@ func (x StandardEnvironments_Environments) String() string { } func (StandardEnvironments_Environments) Descriptor() protoreflect.EnumDescriptor { - return file_beam_runner_api_proto_enumTypes[17].Descriptor() + return file_beam_runner_api_proto_enumTypes[18].Descriptor() } func (StandardEnvironments_Environments) Type() protoreflect.EnumType { - return &file_beam_runner_api_proto_enumTypes[17] + return &file_beam_runner_api_proto_enumTypes[18] } func (x StandardEnvironments_Environments) Number() protoreflect.EnumNumber { @@ -1393,7 +1503,7 @@ func (x StandardEnvironments_Environments) Number() protoreflect.EnumNumber { // Deprecated: Use StandardEnvironments_Environments.Descriptor instead. func (StandardEnvironments_Environments) EnumDescriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{47, 0} + return file_beam_runner_api_proto_rawDescGZIP(), []int{48, 0} } type StandardProtocols_Enum int32 @@ -1412,6 +1522,11 @@ const ( // simply indicates this SDK can actually parallelize the work across multiple // cores. StandardProtocols_MULTI_CORE_BUNDLE_PROCESSING StandardProtocols_Enum = 3 + // Indicates that this SDK handles the InstructionRequest of type + // HarnessMonitoringInfosRequest. + // A request to provide full MonitoringInfo data associated with + // the entire SDK harness process, not specific to a bundle. + StandardProtocols_HARNESS_MONITORING_INFOS StandardProtocols_Enum = 4 ) // Enum value maps for StandardProtocols_Enum. 
@@ -1421,12 +1536,14 @@ var ( 1: "PROGRESS_REPORTING", 2: "WORKER_STATUS", 3: "MULTI_CORE_BUNDLE_PROCESSING", + 4: "HARNESS_MONITORING_INFOS", } StandardProtocols_Enum_value = map[string]int32{ "LEGACY_PROGRESS_REPORTING": 0, "PROGRESS_REPORTING": 1, "WORKER_STATUS": 2, "MULTI_CORE_BUNDLE_PROCESSING": 3, + "HARNESS_MONITORING_INFOS": 4, } ) @@ -1441,11 +1558,11 @@ func (x StandardProtocols_Enum) String() string { } func (StandardProtocols_Enum) Descriptor() protoreflect.EnumDescriptor { - return file_beam_runner_api_proto_enumTypes[18].Descriptor() + return file_beam_runner_api_proto_enumTypes[19].Descriptor() } func (StandardProtocols_Enum) Type() protoreflect.EnumType { - return &file_beam_runner_api_proto_enumTypes[18] + return &file_beam_runner_api_proto_enumTypes[19] } func (x StandardProtocols_Enum) Number() protoreflect.EnumNumber { @@ -1454,7 +1571,51 @@ func (x StandardProtocols_Enum) Number() protoreflect.EnumNumber { // Deprecated: Use StandardProtocols_Enum.Descriptor instead. func (StandardProtocols_Enum) EnumDescriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{51, 0} + return file_beam_runner_api_proto_rawDescGZIP(), []int{52, 0} +} + +type StandardRunnerProtocols_Enum int32 + +const ( + // Indicates suport the MonitoringInfo short id protocol. + StandardRunnerProtocols_MONITORING_INFO_SHORT_IDS StandardRunnerProtocols_Enum = 0 +) + +// Enum value maps for StandardRunnerProtocols_Enum. +var ( + StandardRunnerProtocols_Enum_name = map[int32]string{ + 0: "MONITORING_INFO_SHORT_IDS", + } + StandardRunnerProtocols_Enum_value = map[string]int32{ + "MONITORING_INFO_SHORT_IDS": 0, + } +) + +func (x StandardRunnerProtocols_Enum) Enum() *StandardRunnerProtocols_Enum { + p := new(StandardRunnerProtocols_Enum) + *p = x + return p +} + +func (x StandardRunnerProtocols_Enum) String() string { + return protoimpl.X.EnumStringOf(x.Descriptor(), protoreflect.EnumNumber(x)) +} + +func (StandardRunnerProtocols_Enum) Descriptor() protoreflect.EnumDescriptor { + return file_beam_runner_api_proto_enumTypes[20].Descriptor() +} + +func (StandardRunnerProtocols_Enum) Type() protoreflect.EnumType { + return &file_beam_runner_api_proto_enumTypes[20] +} + +func (x StandardRunnerProtocols_Enum) Number() protoreflect.EnumNumber { + return protoreflect.EnumNumber(x) +} + +// Deprecated: Use StandardRunnerProtocols_Enum.Descriptor instead. +func (StandardRunnerProtocols_Enum) EnumDescriptor() ([]byte, []int) { + return file_beam_runner_api_proto_rawDescGZIP(), []int{53, 0} } type StandardRequirements_Enum int32 @@ -1506,11 +1667,11 @@ func (x StandardRequirements_Enum) String() string { } func (StandardRequirements_Enum) Descriptor() protoreflect.EnumDescriptor { - return file_beam_runner_api_proto_enumTypes[19].Descriptor() + return file_beam_runner_api_proto_enumTypes[21].Descriptor() } func (StandardRequirements_Enum) Type() protoreflect.EnumType { - return &file_beam_runner_api_proto_enumTypes[19] + return &file_beam_runner_api_proto_enumTypes[21] } func (x StandardRequirements_Enum) Number() protoreflect.EnumNumber { @@ -1519,24 +1680,24 @@ func (x StandardRequirements_Enum) Number() protoreflect.EnumNumber { // Deprecated: Use StandardRequirements_Enum.Descriptor instead. func (StandardRequirements_Enum) EnumDescriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{52, 0} + return file_beam_runner_api_proto_rawDescGZIP(), []int{54, 0} } type StandardDisplayData_DisplayData int32 const ( // A string label and value. 
Has a payload containing an encoded - // LabelledStringPayload. - StandardDisplayData_LABELLED_STRING StandardDisplayData_DisplayData = 0 + // LabelledPayload. + StandardDisplayData_LABELLED StandardDisplayData_DisplayData = 0 ) // Enum value maps for StandardDisplayData_DisplayData. var ( StandardDisplayData_DisplayData_name = map[int32]string{ - 0: "LABELLED_STRING", + 0: "LABELLED", } StandardDisplayData_DisplayData_value = map[string]int32{ - "LABELLED_STRING": 0, + "LABELLED": 0, } ) @@ -1551,11 +1712,11 @@ func (x StandardDisplayData_DisplayData) String() string { } func (StandardDisplayData_DisplayData) Descriptor() protoreflect.EnumDescriptor { - return file_beam_runner_api_proto_enumTypes[20].Descriptor() + return file_beam_runner_api_proto_enumTypes[22].Descriptor() } func (StandardDisplayData_DisplayData) Type() protoreflect.EnumType { - return &file_beam_runner_api_proto_enumTypes[20] + return &file_beam_runner_api_proto_enumTypes[22] } func (x StandardDisplayData_DisplayData) Number() protoreflect.EnumNumber { @@ -1564,7 +1725,56 @@ func (x StandardDisplayData_DisplayData) Number() protoreflect.EnumNumber { // Deprecated: Use StandardDisplayData_DisplayData.Descriptor instead. func (StandardDisplayData_DisplayData) EnumDescriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{54, 0} + return file_beam_runner_api_proto_rawDescGZIP(), []int{56, 0} +} + +type StandardResourceHints_Enum int32 + +const ( + // Describes hardware accelerators that are desired to have in the execution environment. + StandardResourceHints_ACCELERATOR StandardResourceHints_Enum = 0 + // Describes desired minimal available RAM size in transform's execution environment. + // SDKs should convert the size to bytes, but can allow users to specify human-friendly units (e.g. GiB). + StandardResourceHints_MIN_RAM_BYTES StandardResourceHints_Enum = 1 +) + +// Enum value maps for StandardResourceHints_Enum. +var ( + StandardResourceHints_Enum_name = map[int32]string{ + 0: "ACCELERATOR", + 1: "MIN_RAM_BYTES", + } + StandardResourceHints_Enum_value = map[string]int32{ + "ACCELERATOR": 0, + "MIN_RAM_BYTES": 1, + } +) + +func (x StandardResourceHints_Enum) Enum() *StandardResourceHints_Enum { + p := new(StandardResourceHints_Enum) + *p = x + return p +} + +func (x StandardResourceHints_Enum) String() string { + return protoimpl.X.EnumStringOf(x.Descriptor(), protoreflect.EnumNumber(x)) +} + +func (StandardResourceHints_Enum) Descriptor() protoreflect.EnumDescriptor { + return file_beam_runner_api_proto_enumTypes[23].Descriptor() +} + +func (StandardResourceHints_Enum) Type() protoreflect.EnumType { + return &file_beam_runner_api_proto_enumTypes[23] +} + +func (x StandardResourceHints_Enum) Number() protoreflect.EnumNumber { + return protoreflect.EnumNumber(x) +} + +// Deprecated: Use StandardResourceHints_Enum.Descriptor instead. +func (StandardResourceHints_Enum) EnumDescriptor() ([]byte, []int) { + return file_beam_runner_api_proto_rawDescGZIP(), []int{61, 0} } type BeamConstants struct { @@ -1613,6 +1823,10 @@ type Components struct { unknownFields protoimpl.UnknownFields // (Required) A map from pipeline-scoped id to PTransform. + // + // Keys of the transforms map may be used by runners to identify pipeline + // steps. Hence it's recommended to use strings that are not too long that + // match regex '[A-Za-z0-9-_]+'. 
Transforms map[string]*PTransform `protobuf:"bytes,1,rep,name=transforms,proto3" json:"transforms,omitempty" protobuf_key:"bytes,1,opt,name=key,proto3" protobuf_val:"bytes,2,opt,name=value,proto3"` // (Required) A map from pipeline-scoped id to PCollection. Pcollections map[string]*PCollection `protobuf:"bytes,2,rep,name=pcollections,proto3" json:"pcollections,omitempty" protobuf_key:"bytes,1,opt,name=key,proto3" protobuf_val:"bytes,2,opt,name=value,proto3"` @@ -1830,7 +2044,11 @@ type PTransform struct { Spec *FunctionSpec `protobuf:"bytes,1,opt,name=spec,proto3" json:"spec,omitempty"` // (Optional) A list of the ids of transforms that it contains. // - // Primitive transforms are not allowed to specify this. + // Primitive transforms (see StandardPTransforms.Primitives) are not allowed + // to specify subtransforms. + // + // Note that a composite transform may have zero subtransforms as long as it + // only outputs PCollections that are in its inputs. Subtransforms []string `protobuf:"bytes,2,rep,name=subtransforms,proto3" json:"subtransforms,omitempty"` // (Required) A map from local names of inputs (unique only with this map, and // likely embedded in the transform payload and serialized user code) to @@ -1863,6 +2081,14 @@ type PTransform struct { // Transforms that are required to be implemented by a runner must omit this. // All other transforms are required to specify this. EnvironmentId string `protobuf:"bytes,7,opt,name=environment_id,json=environmentId,proto3" json:"environment_id,omitempty"` + // (Optional) A map from URNs designating a type of annotation, to the + // annotation in binary format. For example, an annotation could indicate + // that this PTransform has specific privacy properties. + // + // A runner MAY ignore types of annotations it doesn't understand. Therefore + // annotations MUST NOT be used for metadata that can affect correct + // execution of the transform. + Annotations map[string][]byte `protobuf:"bytes,8,rep,name=annotations,proto3" json:"annotations,omitempty" protobuf_key:"bytes,1,opt,name=key,proto3" protobuf_val:"bytes,2,opt,name=value,proto3"` } func (x *PTransform) Reset() { @@ -1946,6 +2172,13 @@ func (x *PTransform) GetEnvironmentId() string { return "" } +func (x *PTransform) GetAnnotations() map[string][]byte { + if x != nil { + return x.Annotations + } + return nil +} + type StandardPTransforms struct { state protoimpl.MessageState sizeCache protoimpl.SizeCache @@ -3136,14 +3369,20 @@ func (x *WriteFilesPayload) GetSideInputs() map[string]*SideInput { // Payload used by Google Cloud Pub/Sub read transform. // This can be used by runners that wish to override Beam Pub/Sub read transform // with a native implementation. +// The SDK should guarantee that only one of topic, subscription, +// topic_runtime_overridden and subscription_runtime_overridden is set. +// The output of PubSubReadPayload should be bytes of serialized PubsubMessage +// proto if with_attributes == true. Otherwise, the bytes is the raw payload. type PubSubReadPayload struct { state protoimpl.MessageState sizeCache protoimpl.SizeCache unknownFields protoimpl.UnknownFields // Topic to read from. Exactly one of topic or subscription should be set. + // Topic format is: /topics/project_id/subscription_name Topic string `protobuf:"bytes,1,opt,name=topic,proto3" json:"topic,omitempty"` // Subscription to read from. Exactly one of topic or subscription should be set. 
+ // Subscription format is: /subscriptions/project_id/subscription_name Subscription string `protobuf:"bytes,2,opt,name=subscription,proto3" json:"subscription,omitempty"` // Attribute that provides element timestamps. TimestampAttribute string `protobuf:"bytes,3,opt,name=timestamp_attribute,json=timestampAttribute,proto3" json:"timestamp_attribute,omitempty"` @@ -3151,6 +3390,10 @@ type PubSubReadPayload struct { IdAttribute string `protobuf:"bytes,4,opt,name=id_attribute,json=idAttribute,proto3" json:"id_attribute,omitempty"` // If true, reads Pub/Sub payload as well as attributes. If false, reads only the payload. WithAttributes bool `protobuf:"varint,5,opt,name=with_attributes,json=withAttributes,proto3" json:"with_attributes,omitempty"` + // If set, the topic is expected to be provided during runtime. + TopicRuntimeOverridden string `protobuf:"bytes,6,opt,name=topic_runtime_overridden,json=topicRuntimeOverridden,proto3" json:"topic_runtime_overridden,omitempty"` + // If set, the subscription that is expected to be provided during runtime. + SubscriptionRuntimeOverridden string `protobuf:"bytes,7,opt,name=subscription_runtime_overridden,json=subscriptionRuntimeOverridden,proto3" json:"subscription_runtime_overridden,omitempty"` } func (x *PubSubReadPayload) Reset() { @@ -3220,23 +3463,41 @@ func (x *PubSubReadPayload) GetWithAttributes() bool { return false } +func (x *PubSubReadPayload) GetTopicRuntimeOverridden() string { + if x != nil { + return x.TopicRuntimeOverridden + } + return "" +} + +func (x *PubSubReadPayload) GetSubscriptionRuntimeOverridden() string { + if x != nil { + return x.SubscriptionRuntimeOverridden + } + return "" +} + // Payload used by Google Cloud Pub/Sub write transform. // This can be used by runners that wish to override Beam Pub/Sub write transform // with a native implementation. +// The SDK should guarantee that only one of topic and topic_runtime_overridden +// is set. +// The output of PubSubWritePayload should be bytes if serialized PubsubMessage +// proto. type PubSubWritePayload struct { state protoimpl.MessageState sizeCache protoimpl.SizeCache unknownFields protoimpl.UnknownFields // Topic to write to. + // Topic format is: /topics/project_id/subscription_name Topic string `protobuf:"bytes,1,opt,name=topic,proto3" json:"topic,omitempty"` // Attribute that provides element timestamps. TimestampAttribute string `protobuf:"bytes,2,opt,name=timestamp_attribute,json=timestampAttribute,proto3" json:"timestamp_attribute,omitempty"` // Attribute that uniquely identify messages. IdAttribute string `protobuf:"bytes,3,opt,name=id_attribute,json=idAttribute,proto3" json:"id_attribute,omitempty"` - // If true, writes Pub/Sub payload as well as attributes. If false, reads only the payload. - // TODO(BEAM-10869): consider removing/deprecating this field when fixed. - WithAttributes bool `protobuf:"varint,4,opt,name=with_attributes,json=withAttributes,proto3" json:"with_attributes,omitempty"` + // If set, the topic is expected to be provided during runtime. 
+ TopicRuntimeOverridden string `protobuf:"bytes,4,opt,name=topic_runtime_overridden,json=topicRuntimeOverridden,proto3" json:"topic_runtime_overridden,omitempty"` } func (x *PubSubWritePayload) Reset() { @@ -3292,11 +3553,78 @@ func (x *PubSubWritePayload) GetIdAttribute() string { return "" } -func (x *PubSubWritePayload) GetWithAttributes() bool { +func (x *PubSubWritePayload) GetTopicRuntimeOverridden() string { if x != nil { - return x.WithAttributes + return x.TopicRuntimeOverridden } - return false + return "" +} + +// Payload for GroupIntoBatches composite transform. +type GroupIntoBatchesPayload struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + // Max size of a batch. + BatchSize int64 `protobuf:"varint,1,opt,name=batch_size,json=batchSize,proto3" json:"batch_size,omitempty"` + // Max byte size of a batch in element. + BatchSizeBytes int64 `protobuf:"varint,3,opt,name=batch_size_bytes,json=batchSizeBytes,proto3" json:"batch_size_bytes,omitempty"` + // (Optional) Max duration a batch is allowed to be cached in states. + MaxBufferingDurationMillis int64 `protobuf:"varint,2,opt,name=max_buffering_duration_millis,json=maxBufferingDurationMillis,proto3" json:"max_buffering_duration_millis,omitempty"` +} + +func (x *GroupIntoBatchesPayload) Reset() { + *x = GroupIntoBatchesPayload{} + if protoimpl.UnsafeEnabled { + mi := &file_beam_runner_api_proto_msgTypes[25] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *GroupIntoBatchesPayload) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*GroupIntoBatchesPayload) ProtoMessage() {} + +func (x *GroupIntoBatchesPayload) ProtoReflect() protoreflect.Message { + mi := &file_beam_runner_api_proto_msgTypes[25] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use GroupIntoBatchesPayload.ProtoReflect.Descriptor instead. +func (*GroupIntoBatchesPayload) Descriptor() ([]byte, []int) { + return file_beam_runner_api_proto_rawDescGZIP(), []int{25} +} + +func (x *GroupIntoBatchesPayload) GetBatchSize() int64 { + if x != nil { + return x.BatchSize + } + return 0 +} + +func (x *GroupIntoBatchesPayload) GetBatchSizeBytes() int64 { + if x != nil { + return x.BatchSizeBytes + } + return 0 +} + +func (x *GroupIntoBatchesPayload) GetMaxBufferingDurationMillis() int64 { + if x != nil { + return x.MaxBufferingDurationMillis + } + return 0 } // A coder, the binary format for serialization and deserialization of data in @@ -3321,7 +3649,7 @@ type Coder struct { func (x *Coder) Reset() { *x = Coder{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[25] + mi := &file_beam_runner_api_proto_msgTypes[26] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -3334,7 +3662,7 @@ func (x *Coder) String() string { func (*Coder) ProtoMessage() {} func (x *Coder) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[25] + mi := &file_beam_runner_api_proto_msgTypes[26] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -3347,7 +3675,7 @@ func (x *Coder) ProtoReflect() protoreflect.Message { // Deprecated: Use Coder.ProtoReflect.Descriptor instead. 
func (*Coder) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{25} + return file_beam_runner_api_proto_rawDescGZIP(), []int{26} } func (x *Coder) GetSpec() *FunctionSpec { @@ -3373,7 +3701,7 @@ type StandardCoders struct { func (x *StandardCoders) Reset() { *x = StandardCoders{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[26] + mi := &file_beam_runner_api_proto_msgTypes[27] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -3386,7 +3714,7 @@ func (x *StandardCoders) String() string { func (*StandardCoders) ProtoMessage() {} func (x *StandardCoders) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[26] + mi := &file_beam_runner_api_proto_msgTypes[27] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -3399,7 +3727,7 @@ func (x *StandardCoders) ProtoReflect() protoreflect.Message { // Deprecated: Use StandardCoders.ProtoReflect.Descriptor instead. func (*StandardCoders) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{26} + return file_beam_runner_api_proto_rawDescGZIP(), []int{27} } // A windowing strategy describes the window function, triggering, allowed @@ -3457,7 +3785,7 @@ type WindowingStrategy struct { func (x *WindowingStrategy) Reset() { *x = WindowingStrategy{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[27] + mi := &file_beam_runner_api_proto_msgTypes[28] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -3470,7 +3798,7 @@ func (x *WindowingStrategy) String() string { func (*WindowingStrategy) ProtoMessage() {} func (x *WindowingStrategy) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[27] + mi := &file_beam_runner_api_proto_msgTypes[28] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -3483,7 +3811,7 @@ func (x *WindowingStrategy) ProtoReflect() protoreflect.Message { // Deprecated: Use WindowingStrategy.ProtoReflect.Descriptor instead. func (*WindowingStrategy) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{27} + return file_beam_runner_api_proto_rawDescGZIP(), []int{28} } func (x *WindowingStrategy) GetWindowFn() *FunctionSpec { @@ -3575,7 +3903,7 @@ type MergeStatus struct { func (x *MergeStatus) Reset() { *x = MergeStatus{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[28] + mi := &file_beam_runner_api_proto_msgTypes[29] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -3588,7 +3916,7 @@ func (x *MergeStatus) String() string { func (*MergeStatus) ProtoMessage() {} func (x *MergeStatus) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[28] + mi := &file_beam_runner_api_proto_msgTypes[29] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -3601,7 +3929,7 @@ func (x *MergeStatus) ProtoReflect() protoreflect.Message { // Deprecated: Use MergeStatus.ProtoReflect.Descriptor instead. 
func (*MergeStatus) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{28} + return file_beam_runner_api_proto_rawDescGZIP(), []int{29} } // Whether or not subsequent outputs of aggregations should be entire @@ -3616,7 +3944,7 @@ type AccumulationMode struct { func (x *AccumulationMode) Reset() { *x = AccumulationMode{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[29] + mi := &file_beam_runner_api_proto_msgTypes[30] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -3629,7 +3957,7 @@ func (x *AccumulationMode) String() string { func (*AccumulationMode) ProtoMessage() {} func (x *AccumulationMode) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[29] + mi := &file_beam_runner_api_proto_msgTypes[30] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -3642,7 +3970,7 @@ func (x *AccumulationMode) ProtoReflect() protoreflect.Message { // Deprecated: Use AccumulationMode.ProtoReflect.Descriptor instead. func (*AccumulationMode) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{29} + return file_beam_runner_api_proto_rawDescGZIP(), []int{30} } // Controls whether or not an aggregating transform should output data @@ -3656,7 +3984,7 @@ type ClosingBehavior struct { func (x *ClosingBehavior) Reset() { *x = ClosingBehavior{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[30] + mi := &file_beam_runner_api_proto_msgTypes[31] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -3669,7 +3997,7 @@ func (x *ClosingBehavior) String() string { func (*ClosingBehavior) ProtoMessage() {} func (x *ClosingBehavior) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[30] + mi := &file_beam_runner_api_proto_msgTypes[31] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -3682,7 +4010,7 @@ func (x *ClosingBehavior) ProtoReflect() protoreflect.Message { // Deprecated: Use ClosingBehavior.ProtoReflect.Descriptor instead. func (*ClosingBehavior) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{30} + return file_beam_runner_api_proto_rawDescGZIP(), []int{31} } // Controls whether or not an aggregating transform should output data @@ -3696,7 +4024,7 @@ type OnTimeBehavior struct { func (x *OnTimeBehavior) Reset() { *x = OnTimeBehavior{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[31] + mi := &file_beam_runner_api_proto_msgTypes[32] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -3709,7 +4037,7 @@ func (x *OnTimeBehavior) String() string { func (*OnTimeBehavior) ProtoMessage() {} func (x *OnTimeBehavior) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[31] + mi := &file_beam_runner_api_proto_msgTypes[32] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -3722,7 +4050,7 @@ func (x *OnTimeBehavior) ProtoReflect() protoreflect.Message { // Deprecated: Use OnTimeBehavior.ProtoReflect.Descriptor instead. 
func (*OnTimeBehavior) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{31} + return file_beam_runner_api_proto_rawDescGZIP(), []int{32} } // When a number of windowed, timestamped inputs are aggregated, the timestamp @@ -3736,7 +4064,7 @@ type OutputTime struct { func (x *OutputTime) Reset() { *x = OutputTime{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[32] + mi := &file_beam_runner_api_proto_msgTypes[33] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -3749,7 +4077,7 @@ func (x *OutputTime) String() string { func (*OutputTime) ProtoMessage() {} func (x *OutputTime) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[32] + mi := &file_beam_runner_api_proto_msgTypes[33] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -3762,7 +4090,7 @@ func (x *OutputTime) ProtoReflect() protoreflect.Message { // Deprecated: Use OutputTime.ProtoReflect.Descriptor instead. func (*OutputTime) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{32} + return file_beam_runner_api_proto_rawDescGZIP(), []int{33} } // The different time domains in the Beam model. @@ -3775,7 +4103,7 @@ type TimeDomain struct { func (x *TimeDomain) Reset() { *x = TimeDomain{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[33] + mi := &file_beam_runner_api_proto_msgTypes[34] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -3788,7 +4116,7 @@ func (x *TimeDomain) String() string { func (*TimeDomain) ProtoMessage() {} func (x *TimeDomain) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[33] + mi := &file_beam_runner_api_proto_msgTypes[34] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -3801,7 +4129,7 @@ func (x *TimeDomain) ProtoReflect() protoreflect.Message { // Deprecated: Use TimeDomain.ProtoReflect.Descriptor instead. func (*TimeDomain) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{33} + return file_beam_runner_api_proto_rawDescGZIP(), []int{34} } // A small DSL for expressing when to emit new aggregations @@ -3834,7 +4162,7 @@ type Trigger struct { func (x *Trigger) Reset() { *x = Trigger{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[34] + mi := &file_beam_runner_api_proto_msgTypes[35] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -3847,7 +4175,7 @@ func (x *Trigger) String() string { func (*Trigger) ProtoMessage() {} func (x *Trigger) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[34] + mi := &file_beam_runner_api_proto_msgTypes[35] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -3860,7 +4188,7 @@ func (x *Trigger) ProtoReflect() protoreflect.Message { // Deprecated: Use Trigger.ProtoReflect.Descriptor instead. 
func (*Trigger) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{34} + return file_beam_runner_api_proto_rawDescGZIP(), []int{35} } func (m *Trigger) GetTrigger() isTrigger_Trigger { @@ -4048,7 +4376,7 @@ type TimestampTransform struct { func (x *TimestampTransform) Reset() { *x = TimestampTransform{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[35] + mi := &file_beam_runner_api_proto_msgTypes[36] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -4061,7 +4389,7 @@ func (x *TimestampTransform) String() string { func (*TimestampTransform) ProtoMessage() {} func (x *TimestampTransform) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[35] + mi := &file_beam_runner_api_proto_msgTypes[36] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -4074,7 +4402,7 @@ func (x *TimestampTransform) ProtoReflect() protoreflect.Message { // Deprecated: Use TimestampTransform.ProtoReflect.Descriptor instead. func (*TimestampTransform) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{35} + return file_beam_runner_api_proto_rawDescGZIP(), []int{36} } func (m *TimestampTransform) GetTimestampTransform() isTimestampTransform_TimestampTransform { @@ -4148,7 +4476,7 @@ type SideInput struct { func (x *SideInput) Reset() { *x = SideInput{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[36] + mi := &file_beam_runner_api_proto_msgTypes[37] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -4161,7 +4489,7 @@ func (x *SideInput) String() string { func (*SideInput) ProtoMessage() {} func (x *SideInput) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[36] + mi := &file_beam_runner_api_proto_msgTypes[37] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -4174,7 +4502,7 @@ func (x *SideInput) ProtoReflect() protoreflect.Message { // Deprecated: Use SideInput.ProtoReflect.Descriptor instead. func (*SideInput) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{36} + return file_beam_runner_api_proto_rawDescGZIP(), []int{37} } func (x *SideInput) GetAccessPattern() *FunctionSpec { @@ -4207,7 +4535,7 @@ type StandardArtifacts struct { func (x *StandardArtifacts) Reset() { *x = StandardArtifacts{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[37] + mi := &file_beam_runner_api_proto_msgTypes[38] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -4220,7 +4548,7 @@ func (x *StandardArtifacts) String() string { func (*StandardArtifacts) ProtoMessage() {} func (x *StandardArtifacts) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[37] + mi := &file_beam_runner_api_proto_msgTypes[38] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -4233,7 +4561,7 @@ func (x *StandardArtifacts) ProtoReflect() protoreflect.Message { // Deprecated: Use StandardArtifacts.ProtoReflect.Descriptor instead. 
func (*StandardArtifacts) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{37} + return file_beam_runner_api_proto_rawDescGZIP(), []int{38} } type ArtifactFilePayload struct { @@ -4250,7 +4578,7 @@ type ArtifactFilePayload struct { func (x *ArtifactFilePayload) Reset() { *x = ArtifactFilePayload{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[38] + mi := &file_beam_runner_api_proto_msgTypes[39] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -4263,7 +4591,7 @@ func (x *ArtifactFilePayload) String() string { func (*ArtifactFilePayload) ProtoMessage() {} func (x *ArtifactFilePayload) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[38] + mi := &file_beam_runner_api_proto_msgTypes[39] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -4276,7 +4604,7 @@ func (x *ArtifactFilePayload) ProtoReflect() protoreflect.Message { // Deprecated: Use ArtifactFilePayload.ProtoReflect.Descriptor instead. func (*ArtifactFilePayload) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{38} + return file_beam_runner_api_proto_rawDescGZIP(), []int{39} } func (x *ArtifactFilePayload) GetPath() string { @@ -4300,12 +4628,14 @@ type ArtifactUrlPayload struct { // a string for an artifact URL e.g. "https://.../foo.jar" or "gs://tmp/foo.jar" Url string `protobuf:"bytes,1,opt,name=url,proto3" json:"url,omitempty"` + // (Optional) The hex-encoded sha256 checksum of the artifact if available. + Sha256 string `protobuf:"bytes,2,opt,name=sha256,proto3" json:"sha256,omitempty"` } func (x *ArtifactUrlPayload) Reset() { *x = ArtifactUrlPayload{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[39] + mi := &file_beam_runner_api_proto_msgTypes[40] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -4318,7 +4648,7 @@ func (x *ArtifactUrlPayload) String() string { func (*ArtifactUrlPayload) ProtoMessage() {} func (x *ArtifactUrlPayload) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[39] + mi := &file_beam_runner_api_proto_msgTypes[40] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -4331,7 +4661,7 @@ func (x *ArtifactUrlPayload) ProtoReflect() protoreflect.Message { // Deprecated: Use ArtifactUrlPayload.ProtoReflect.Descriptor instead. 
func (*ArtifactUrlPayload) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{39} + return file_beam_runner_api_proto_rawDescGZIP(), []int{40} } func (x *ArtifactUrlPayload) GetUrl() string { @@ -4341,6 +4671,13 @@ func (x *ArtifactUrlPayload) GetUrl() string { return "" } +func (x *ArtifactUrlPayload) GetSha256() string { + if x != nil { + return x.Sha256 + } + return "" +} + type EmbeddedFilePayload struct { state protoimpl.MessageState sizeCache protoimpl.SizeCache @@ -4353,7 +4690,7 @@ type EmbeddedFilePayload struct { func (x *EmbeddedFilePayload) Reset() { *x = EmbeddedFilePayload{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[40] + mi := &file_beam_runner_api_proto_msgTypes[41] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -4366,7 +4703,7 @@ func (x *EmbeddedFilePayload) String() string { func (*EmbeddedFilePayload) ProtoMessage() {} func (x *EmbeddedFilePayload) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[40] + mi := &file_beam_runner_api_proto_msgTypes[41] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -4379,7 +4716,7 @@ func (x *EmbeddedFilePayload) ProtoReflect() protoreflect.Message { // Deprecated: Use EmbeddedFilePayload.ProtoReflect.Descriptor instead. func (*EmbeddedFilePayload) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{40} + return file_beam_runner_api_proto_rawDescGZIP(), []int{41} } func (x *EmbeddedFilePayload) GetData() []byte { @@ -4403,7 +4740,7 @@ type PyPIPayload struct { func (x *PyPIPayload) Reset() { *x = PyPIPayload{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[41] + mi := &file_beam_runner_api_proto_msgTypes[42] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -4416,7 +4753,7 @@ func (x *PyPIPayload) String() string { func (*PyPIPayload) ProtoMessage() {} func (x *PyPIPayload) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[41] + mi := &file_beam_runner_api_proto_msgTypes[42] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -4429,7 +4766,7 @@ func (x *PyPIPayload) ProtoReflect() protoreflect.Message { // Deprecated: Use PyPIPayload.ProtoReflect.Descriptor instead. 
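The new optional `Sha256` field on `ArtifactUrlPayload` is documented above as the hex-encoded sha256 checksum of the artifact, when available. Below is a minimal sketch of populating it on the SDK side; only the `Url`/`Sha256` fields and their getters come from the generated code above, while the helper name and the package import path are assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"

	// Assumed import path for the generated package; adjust to your Beam version.
	pipeline_v1 "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1"
)

// newURLArtifact is an illustrative helper: it records a URL artifact together
// with the hex-encoded SHA-256 of its (already fetched) contents, matching the
// comment on ArtifactUrlPayload.Sha256.
func newURLArtifact(url string, contents []byte) *pipeline_v1.ArtifactUrlPayload {
	sum := sha256.Sum256(contents)
	return &pipeline_v1.ArtifactUrlPayload{
		Url:    url,
		Sha256: hex.EncodeToString(sum[:]),
	}
}

func main() {
	a := newURLArtifact("gs://tmp/foo.jar", []byte("jar bytes ..."))
	// GetSha256 returns "" when the optional checksum was not set.
	fmt.Println(a.GetUrl(), a.GetSha256())
}
```

A consumer that later fetches the URL can recompute the digest and compare it against `GetSha256()` before trusting the staged file.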
func (*PyPIPayload) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{41} + return file_beam_runner_api_proto_rawDescGZIP(), []int{42} } func (x *PyPIPayload) GetArtifactId() string { @@ -4461,7 +4798,7 @@ type MavenPayload struct { func (x *MavenPayload) Reset() { *x = MavenPayload{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[42] + mi := &file_beam_runner_api_proto_msgTypes[43] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -4474,7 +4811,7 @@ func (x *MavenPayload) String() string { func (*MavenPayload) ProtoMessage() {} func (x *MavenPayload) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[42] + mi := &file_beam_runner_api_proto_msgTypes[43] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -4487,7 +4824,7 @@ func (x *MavenPayload) ProtoReflect() protoreflect.Message { // Deprecated: Use MavenPayload.ProtoReflect.Descriptor instead. func (*MavenPayload) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{42} + return file_beam_runner_api_proto_rawDescGZIP(), []int{43} } func (x *MavenPayload) GetArtifact() string { @@ -4519,7 +4856,7 @@ type DeferredArtifactPayload struct { func (x *DeferredArtifactPayload) Reset() { *x = DeferredArtifactPayload{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[43] + mi := &file_beam_runner_api_proto_msgTypes[44] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -4532,7 +4869,7 @@ func (x *DeferredArtifactPayload) String() string { func (*DeferredArtifactPayload) ProtoMessage() {} func (x *DeferredArtifactPayload) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[43] + mi := &file_beam_runner_api_proto_msgTypes[44] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -4545,7 +4882,7 @@ func (x *DeferredArtifactPayload) ProtoReflect() protoreflect.Message { // Deprecated: Use DeferredArtifactPayload.ProtoReflect.Descriptor instead. func (*DeferredArtifactPayload) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{43} + return file_beam_runner_api_proto_rawDescGZIP(), []int{44} } func (x *DeferredArtifactPayload) GetKey() string { @@ -4574,7 +4911,7 @@ type ArtifactStagingToRolePayload struct { func (x *ArtifactStagingToRolePayload) Reset() { *x = ArtifactStagingToRolePayload{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[44] + mi := &file_beam_runner_api_proto_msgTypes[45] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -4587,7 +4924,7 @@ func (x *ArtifactStagingToRolePayload) String() string { func (*ArtifactStagingToRolePayload) ProtoMessage() {} func (x *ArtifactStagingToRolePayload) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[44] + mi := &file_beam_runner_api_proto_msgTypes[45] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -4600,7 +4937,7 @@ func (x *ArtifactStagingToRolePayload) ProtoReflect() protoreflect.Message { // Deprecated: Use ArtifactStagingToRolePayload.ProtoReflect.Descriptor instead. 
func (*ArtifactStagingToRolePayload) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{44} + return file_beam_runner_api_proto_rawDescGZIP(), []int{45} } func (x *ArtifactStagingToRolePayload) GetStagedName() string { @@ -4626,7 +4963,7 @@ type ArtifactInformation struct { func (x *ArtifactInformation) Reset() { *x = ArtifactInformation{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[45] + mi := &file_beam_runner_api_proto_msgTypes[46] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -4639,7 +4976,7 @@ func (x *ArtifactInformation) String() string { func (*ArtifactInformation) ProtoMessage() {} func (x *ArtifactInformation) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[45] + mi := &file_beam_runner_api_proto_msgTypes[46] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -4652,7 +4989,7 @@ func (x *ArtifactInformation) ProtoReflect() protoreflect.Message { // Deprecated: Use ArtifactInformation.ProtoReflect.Descriptor instead. func (*ArtifactInformation) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{45} + return file_beam_runner_api_proto_rawDescGZIP(), []int{46} } func (x *ArtifactInformation) GetTypeUrn() string { @@ -4706,12 +5043,16 @@ type Environment struct { Capabilities []string `protobuf:"bytes,5,rep,name=capabilities,proto3" json:"capabilities,omitempty"` // (Optional) artifact dependency information used for executing UDFs in this environment. Dependencies []*ArtifactInformation `protobuf:"bytes,6,rep,name=dependencies,proto3" json:"dependencies,omitempty"` + // (Optional) A mapping of resource URNs to requested values. The encoding + // of the values is specified by the URN. Resource hints are advisory; + // a runner is free to ignore resource hints that it does not understand. + ResourceHints map[string][]byte `protobuf:"bytes,7,rep,name=resource_hints,json=resourceHints,proto3" json:"resource_hints,omitempty" protobuf_key:"bytes,1,opt,name=key,proto3" protobuf_val:"bytes,2,opt,name=value,proto3"` } func (x *Environment) Reset() { *x = Environment{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[46] + mi := &file_beam_runner_api_proto_msgTypes[47] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -4724,7 +5065,7 @@ func (x *Environment) String() string { func (*Environment) ProtoMessage() {} func (x *Environment) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[46] + mi := &file_beam_runner_api_proto_msgTypes[47] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -4737,7 +5078,7 @@ func (x *Environment) ProtoReflect() protoreflect.Message { // Deprecated: Use Environment.ProtoReflect.Descriptor instead. 
func (*Environment) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{46} + return file_beam_runner_api_proto_rawDescGZIP(), []int{47} } func (x *Environment) GetUrn() string { @@ -4775,6 +5116,13 @@ func (x *Environment) GetDependencies() []*ArtifactInformation { return nil } +func (x *Environment) GetResourceHints() map[string][]byte { + if x != nil { + return x.ResourceHints + } + return nil +} + type StandardEnvironments struct { state protoimpl.MessageState sizeCache protoimpl.SizeCache @@ -4784,7 +5132,7 @@ type StandardEnvironments struct { func (x *StandardEnvironments) Reset() { *x = StandardEnvironments{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[47] + mi := &file_beam_runner_api_proto_msgTypes[48] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -4797,7 +5145,7 @@ func (x *StandardEnvironments) String() string { func (*StandardEnvironments) ProtoMessage() {} func (x *StandardEnvironments) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[47] + mi := &file_beam_runner_api_proto_msgTypes[48] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -4810,7 +5158,7 @@ func (x *StandardEnvironments) ProtoReflect() protoreflect.Message { // Deprecated: Use StandardEnvironments.ProtoReflect.Descriptor instead. func (*StandardEnvironments) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{47} + return file_beam_runner_api_proto_rawDescGZIP(), []int{48} } // The payload of a Docker image @@ -4825,7 +5173,7 @@ type DockerPayload struct { func (x *DockerPayload) Reset() { *x = DockerPayload{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[48] + mi := &file_beam_runner_api_proto_msgTypes[49] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -4838,7 +5186,7 @@ func (x *DockerPayload) String() string { func (*DockerPayload) ProtoMessage() {} func (x *DockerPayload) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[48] + mi := &file_beam_runner_api_proto_msgTypes[49] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -4851,7 +5199,7 @@ func (x *DockerPayload) ProtoReflect() protoreflect.Message { // Deprecated: Use DockerPayload.ProtoReflect.Descriptor instead. 
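The added `ResourceHints` map on `Environment` is keyed by URN, with the value encoding defined by each URN, and the comment stresses that hints are advisory. The sketch below sets a hint and consumes it under those rules; the URNs and the value encoding are illustrative rather than taken from the diff, and the import path is an assumption.

```go
package main

import (
	"fmt"

	// Assumed import path for the generated package; adjust to your Beam version.
	pipeline_v1 "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1"
)

func main() {
	// The URNs and the value encoding below are illustrative only; per the proto
	// comment, the encoding of each hint value is specified by its URN.
	env := &pipeline_v1.Environment{
		Urn:          "beam:env:docker:v1",          // illustrative environment URN
		Capabilities: []string{"beam:coder:bytes:v1"}, // illustrative capability URN
		ResourceHints: map[string][]byte{
			"beam:resources:min_ram_bytes:v1": []byte("4294967296"),
		},
	}

	// Runner side: resource hints are advisory, so anything the runner does not
	// recognize is skipped rather than treated as an error.
	known := map[string]bool{"beam:resources:min_ram_bytes:v1": true}
	for urn, value := range env.GetResourceHints() {
		if !known[urn] {
			continue // unknown hint: ignore, per the proto comment
		}
		fmt.Printf("applying hint %s = %s\n", urn, value)
	}
}
```

Because hints are advisory, the consuming loop skips unrecognized URNs instead of failing, which mirrors the proto comment.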
func (*DockerPayload) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{48} + return file_beam_runner_api_proto_rawDescGZIP(), []int{49} } func (x *DockerPayload) GetContainerImage() string { @@ -4875,7 +5223,7 @@ type ProcessPayload struct { func (x *ProcessPayload) Reset() { *x = ProcessPayload{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[49] + mi := &file_beam_runner_api_proto_msgTypes[50] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -4888,7 +5236,7 @@ func (x *ProcessPayload) String() string { func (*ProcessPayload) ProtoMessage() {} func (x *ProcessPayload) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[49] + mi := &file_beam_runner_api_proto_msgTypes[50] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -4901,7 +5249,7 @@ func (x *ProcessPayload) ProtoReflect() protoreflect.Message { // Deprecated: Use ProcessPayload.ProtoReflect.Descriptor instead. func (*ProcessPayload) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{49} + return file_beam_runner_api_proto_rawDescGZIP(), []int{50} } func (x *ProcessPayload) GetOs() string { @@ -4944,7 +5292,7 @@ type ExternalPayload struct { func (x *ExternalPayload) Reset() { *x = ExternalPayload{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[50] + mi := &file_beam_runner_api_proto_msgTypes[51] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -4957,7 +5305,7 @@ func (x *ExternalPayload) String() string { func (*ExternalPayload) ProtoMessage() {} func (x *ExternalPayload) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[50] + mi := &file_beam_runner_api_proto_msgTypes[51] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -4970,7 +5318,7 @@ func (x *ExternalPayload) ProtoReflect() protoreflect.Message { // Deprecated: Use ExternalPayload.ProtoReflect.Descriptor instead. func (*ExternalPayload) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{50} + return file_beam_runner_api_proto_rawDescGZIP(), []int{51} } func (x *ExternalPayload) GetEndpoint() *ApiServiceDescriptor { @@ -4999,7 +5347,7 @@ type StandardProtocols struct { func (x *StandardProtocols) Reset() { *x = StandardProtocols{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[51] + mi := &file_beam_runner_api_proto_msgTypes[52] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -5012,7 +5360,7 @@ func (x *StandardProtocols) String() string { func (*StandardProtocols) ProtoMessage() {} func (x *StandardProtocols) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[51] + mi := &file_beam_runner_api_proto_msgTypes[52] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -5025,7 +5373,47 @@ func (x *StandardProtocols) ProtoReflect() protoreflect.Message { // Deprecated: Use StandardProtocols.ProtoReflect.Descriptor instead. 
func (*StandardProtocols) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{51} + return file_beam_runner_api_proto_rawDescGZIP(), []int{52} +} + +// These URNs are used to indicate capabilities of runner that an environment +// may take advantage of when interacting with this runner. +type StandardRunnerProtocols struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields +} + +func (x *StandardRunnerProtocols) Reset() { + *x = StandardRunnerProtocols{} + if protoimpl.UnsafeEnabled { + mi := &file_beam_runner_api_proto_msgTypes[53] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *StandardRunnerProtocols) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*StandardRunnerProtocols) ProtoMessage() {} + +func (x *StandardRunnerProtocols) ProtoReflect() protoreflect.Message { + mi := &file_beam_runner_api_proto_msgTypes[53] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use StandardRunnerProtocols.ProtoReflect.Descriptor instead. +func (*StandardRunnerProtocols) Descriptor() ([]byte, []int) { + return file_beam_runner_api_proto_rawDescGZIP(), []int{53} } // These URNs are used to indicate requirements of a pipeline that cannot @@ -5042,7 +5430,7 @@ type StandardRequirements struct { func (x *StandardRequirements) Reset() { *x = StandardRequirements{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[52] + mi := &file_beam_runner_api_proto_msgTypes[54] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -5055,7 +5443,7 @@ func (x *StandardRequirements) String() string { func (*StandardRequirements) ProtoMessage() {} func (x *StandardRequirements) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[52] + mi := &file_beam_runner_api_proto_msgTypes[54] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -5068,7 +5456,7 @@ func (x *StandardRequirements) ProtoReflect() protoreflect.Message { // Deprecated: Use StandardRequirements.ProtoReflect.Descriptor instead. func (*StandardRequirements) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{52} + return file_beam_runner_api_proto_rawDescGZIP(), []int{54} } // A URN along with a parameter object whose schema is determined by the @@ -5116,7 +5504,7 @@ type FunctionSpec struct { func (x *FunctionSpec) Reset() { *x = FunctionSpec{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[53] + mi := &file_beam_runner_api_proto_msgTypes[55] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -5129,7 +5517,7 @@ func (x *FunctionSpec) String() string { func (*FunctionSpec) ProtoMessage() {} func (x *FunctionSpec) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[53] + mi := &file_beam_runner_api_proto_msgTypes[55] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -5142,7 +5530,7 @@ func (x *FunctionSpec) ProtoReflect() protoreflect.Message { // Deprecated: Use FunctionSpec.ProtoReflect.Descriptor instead. 
func (*FunctionSpec) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{53} + return file_beam_runner_api_proto_rawDescGZIP(), []int{55} } func (x *FunctionSpec) GetUrn() string { @@ -5174,7 +5562,7 @@ type StandardDisplayData struct { func (x *StandardDisplayData) Reset() { *x = StandardDisplayData{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[54] + mi := &file_beam_runner_api_proto_msgTypes[56] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -5187,7 +5575,7 @@ func (x *StandardDisplayData) String() string { func (*StandardDisplayData) ProtoMessage() {} func (x *StandardDisplayData) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[54] + mi := &file_beam_runner_api_proto_msgTypes[56] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -5200,38 +5588,42 @@ func (x *StandardDisplayData) ProtoReflect() protoreflect.Message { // Deprecated: Use StandardDisplayData.ProtoReflect.Descriptor instead. func (*StandardDisplayData) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{54} + return file_beam_runner_api_proto_rawDescGZIP(), []int{56} } -type LabelledStringPayload struct { +type LabelledPayload struct { state protoimpl.MessageState sizeCache protoimpl.SizeCache unknownFields protoimpl.UnknownFields // (Required) A human readable label for the value. Label string `protobuf:"bytes,1,opt,name=label,proto3" json:"label,omitempty"` - // (Required) A value which will be displayed to the user. The urn describes - // how the value can be interpreted and/or categorized. - Value string `protobuf:"bytes,2,opt,name=value,proto3" json:"value,omitempty"` + // (Required) A value which will be displayed to the user. + // + // Types that are assignable to Value: + // *LabelledPayload_StringValue + // *LabelledPayload_BoolValue + // *LabelledPayload_DoubleValue + Value isLabelledPayload_Value `protobuf_oneof:"value"` } -func (x *LabelledStringPayload) Reset() { - *x = LabelledStringPayload{} +func (x *LabelledPayload) Reset() { + *x = LabelledPayload{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[55] + mi := &file_beam_runner_api_proto_msgTypes[57] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } } -func (x *LabelledStringPayload) String() string { +func (x *LabelledPayload) String() string { return protoimpl.X.MessageStringOf(x) } -func (*LabelledStringPayload) ProtoMessage() {} +func (*LabelledPayload) ProtoMessage() {} -func (x *LabelledStringPayload) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[55] +func (x *LabelledPayload) ProtoReflect() protoreflect.Message { + mi := &file_beam_runner_api_proto_msgTypes[57] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -5242,25 +5634,68 @@ func (x *LabelledStringPayload) ProtoReflect() protoreflect.Message { return mi.MessageOf(x) } -// Deprecated: Use LabelledStringPayload.ProtoReflect.Descriptor instead. -func (*LabelledStringPayload) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{55} +// Deprecated: Use LabelledPayload.ProtoReflect.Descriptor instead. 
+func (*LabelledPayload) Descriptor() ([]byte, []int) { + return file_beam_runner_api_proto_rawDescGZIP(), []int{57} } -func (x *LabelledStringPayload) GetLabel() string { +func (x *LabelledPayload) GetLabel() string { if x != nil { return x.Label } return "" } -func (x *LabelledStringPayload) GetValue() string { - if x != nil { - return x.Value +func (m *LabelledPayload) GetValue() isLabelledPayload_Value { + if m != nil { + return m.Value + } + return nil +} + +func (x *LabelledPayload) GetStringValue() string { + if x, ok := x.GetValue().(*LabelledPayload_StringValue); ok { + return x.StringValue } return "" } +func (x *LabelledPayload) GetBoolValue() bool { + if x, ok := x.GetValue().(*LabelledPayload_BoolValue); ok { + return x.BoolValue + } + return false +} + +func (x *LabelledPayload) GetDoubleValue() float64 { + if x, ok := x.GetValue().(*LabelledPayload_DoubleValue); ok { + return x.DoubleValue + } + return 0 +} + +type isLabelledPayload_Value interface { + isLabelledPayload_Value() +} + +type LabelledPayload_StringValue struct { + StringValue string `protobuf:"bytes,2,opt,name=string_value,json=stringValue,proto3,oneof"` +} + +type LabelledPayload_BoolValue struct { + BoolValue bool `protobuf:"varint,3,opt,name=bool_value,json=boolValue,proto3,oneof"` +} + +type LabelledPayload_DoubleValue struct { + DoubleValue float64 `protobuf:"fixed64,4,opt,name=double_value,json=doubleValue,proto3,oneof"` +} + +func (*LabelledPayload_StringValue) isLabelledPayload_Value() {} + +func (*LabelledPayload_BoolValue) isLabelledPayload_Value() {} + +func (*LabelledPayload_DoubleValue) isLabelledPayload_Value() {} + // Static display data associated with a pipeline component. Display data is // useful for pipeline runners IOs and diagnostic dashboards to display details // about annotated components. @@ -5281,7 +5716,7 @@ type DisplayData struct { func (x *DisplayData) Reset() { *x = DisplayData{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[56] + mi := &file_beam_runner_api_proto_msgTypes[58] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -5294,7 +5729,7 @@ func (x *DisplayData) String() string { func (*DisplayData) ProtoMessage() {} func (x *DisplayData) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[56] + mi := &file_beam_runner_api_proto_msgTypes[58] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -5307,7 +5742,7 @@ func (x *DisplayData) ProtoReflect() protoreflect.Message { // Deprecated: Use DisplayData.ProtoReflect.Descriptor instead. 
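`LabelledStringPayload` is replaced here by `LabelledPayload`, whose value becomes a oneof over string, bool and double. A short sketch of building one and branching on the value type, using only the wrapper types and getters generated above (the import path is assumed):

```go
package main

import (
	"fmt"

	// Assumed import path for the generated package; adjust to your Beam version.
	pipeline_v1 "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1"
)

func describe(p *pipeline_v1.LabelledPayload) string {
	// Exactly one branch of the oneof is set; the getters for unset branches
	// return zero values.
	switch v := p.GetValue().(type) {
	case *pipeline_v1.LabelledPayload_StringValue:
		return fmt.Sprintf("%s: %q", p.GetLabel(), v.StringValue)
	case *pipeline_v1.LabelledPayload_BoolValue:
		return fmt.Sprintf("%s: %t", p.GetLabel(), v.BoolValue)
	case *pipeline_v1.LabelledPayload_DoubleValue:
		return fmt.Sprintf("%s: %g", p.GetLabel(), v.DoubleValue)
	default:
		return p.GetLabel() + ": <unset>"
	}
}

func main() {
	p := &pipeline_v1.LabelledPayload{
		Label: "fanout",
		Value: &pipeline_v1.LabelledPayload_DoubleValue{DoubleValue: 2.5},
	}
	fmt.Println(describe(p))
}
```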
func (*DisplayData) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{56} + return file_beam_runner_api_proto_rawDescGZIP(), []int{58} } func (x *DisplayData) GetUrn() string { @@ -5357,7 +5792,7 @@ type MessageWithComponents struct { func (x *MessageWithComponents) Reset() { *x = MessageWithComponents{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[57] + mi := &file_beam_runner_api_proto_msgTypes[59] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -5370,7 +5805,7 @@ func (x *MessageWithComponents) String() string { func (*MessageWithComponents) ProtoMessage() {} func (x *MessageWithComponents) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[57] + mi := &file_beam_runner_api_proto_msgTypes[59] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -5383,7 +5818,7 @@ func (x *MessageWithComponents) ProtoReflect() protoreflect.Message { // Deprecated: Use MessageWithComponents.ProtoReflect.Descriptor instead. func (*MessageWithComponents) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{57} + return file_beam_runner_api_proto_rawDescGZIP(), []int{59} } func (x *MessageWithComponents) GetComponents() *Components { @@ -5577,7 +6012,7 @@ type ExecutableStagePayload struct { func (x *ExecutableStagePayload) Reset() { *x = ExecutableStagePayload{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[58] + mi := &file_beam_runner_api_proto_msgTypes[60] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -5590,7 +6025,7 @@ func (x *ExecutableStagePayload) String() string { func (*ExecutableStagePayload) ProtoMessage() {} func (x *ExecutableStagePayload) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[58] + mi := &file_beam_runner_api_proto_msgTypes[60] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -5603,7 +6038,7 @@ func (x *ExecutableStagePayload) ProtoReflect() protoreflect.Message { // Deprecated: Use ExecutableStagePayload.ProtoReflect.Descriptor instead. 
func (*ExecutableStagePayload) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{58} + return file_beam_runner_api_proto_rawDescGZIP(), []int{60} } func (x *ExecutableStagePayload) GetEnvironment() *Environment { @@ -5676,12 +6111,50 @@ func (x *ExecutableStagePayload) GetTimerFamilies() []*ExecutableStagePayload_Ti return nil } -type TestStreamPayload_Event struct { +type StandardResourceHints struct { state protoimpl.MessageState sizeCache protoimpl.SizeCache unknownFields protoimpl.UnknownFields +} - // Types that are assignable to Event: +func (x *StandardResourceHints) Reset() { + *x = StandardResourceHints{} + if protoimpl.UnsafeEnabled { + mi := &file_beam_runner_api_proto_msgTypes[61] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *StandardResourceHints) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*StandardResourceHints) ProtoMessage() {} + +func (x *StandardResourceHints) ProtoReflect() protoreflect.Message { + mi := &file_beam_runner_api_proto_msgTypes[61] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use StandardResourceHints.ProtoReflect.Descriptor instead. +func (*StandardResourceHints) Descriptor() ([]byte, []int) { + return file_beam_runner_api_proto_rawDescGZIP(), []int{61} +} + +type TestStreamPayload_Event struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + // Types that are assignable to Event: // *TestStreamPayload_Event_WatermarkEvent // *TestStreamPayload_Event_ProcessingTimeEvent // *TestStreamPayload_Event_ElementEvent @@ -5691,7 +6164,7 @@ type TestStreamPayload_Event struct { func (x *TestStreamPayload_Event) Reset() { *x = TestStreamPayload_Event{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[69] + mi := &file_beam_runner_api_proto_msgTypes[73] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -5704,7 +6177,7 @@ func (x *TestStreamPayload_Event) String() string { func (*TestStreamPayload_Event) ProtoMessage() {} func (x *TestStreamPayload_Event) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[69] + mi := &file_beam_runner_api_proto_msgTypes[73] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -5779,14 +6252,14 @@ type TestStreamPayload_TimestampedElement struct { // (Required) The element encoded. Currently the TestStream only supports // encoding primitives. EncodedElement []byte `protobuf:"bytes,1,opt,name=encoded_element,json=encodedElement,proto3" json:"encoded_element,omitempty"` - // (Required) The event timestamp of this element. + // (Required) The event timestamp in millisecond of this element. 
Timestamp int64 `protobuf:"varint,2,opt,name=timestamp,proto3" json:"timestamp,omitempty"` } func (x *TestStreamPayload_TimestampedElement) Reset() { *x = TestStreamPayload_TimestampedElement{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[70] + mi := &file_beam_runner_api_proto_msgTypes[74] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -5799,7 +6272,7 @@ func (x *TestStreamPayload_TimestampedElement) String() string { func (*TestStreamPayload_TimestampedElement) ProtoMessage() {} func (x *TestStreamPayload_TimestampedElement) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[70] + mi := &file_beam_runner_api_proto_msgTypes[74] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -5835,7 +6308,7 @@ type TestStreamPayload_Event_AdvanceWatermark struct { sizeCache protoimpl.SizeCache unknownFields protoimpl.UnknownFields - // (Required) The watermark to advance to. + // (Required) The watermark in millisecond to advance to. NewWatermark int64 `protobuf:"varint,1,opt,name=new_watermark,json=newWatermark,proto3" json:"new_watermark,omitempty"` // (Optional) The output watermark tag for a PCollection. If unspecified // or with an empty string, this will default to the Main PCollection @@ -5846,7 +6319,7 @@ type TestStreamPayload_Event_AdvanceWatermark struct { func (x *TestStreamPayload_Event_AdvanceWatermark) Reset() { *x = TestStreamPayload_Event_AdvanceWatermark{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[71] + mi := &file_beam_runner_api_proto_msgTypes[75] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -5859,7 +6332,7 @@ func (x *TestStreamPayload_Event_AdvanceWatermark) String() string { func (*TestStreamPayload_Event_AdvanceWatermark) ProtoMessage() {} func (x *TestStreamPayload_Event_AdvanceWatermark) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[71] + mi := &file_beam_runner_api_proto_msgTypes[75] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -5895,14 +6368,14 @@ type TestStreamPayload_Event_AdvanceProcessingTime struct { sizeCache protoimpl.SizeCache unknownFields protoimpl.UnknownFields - // (Required) The duration to advance by. + // (Required) The duration in millisecond to advance by. 
AdvanceDuration int64 `protobuf:"varint,1,opt,name=advance_duration,json=advanceDuration,proto3" json:"advance_duration,omitempty"` } func (x *TestStreamPayload_Event_AdvanceProcessingTime) Reset() { *x = TestStreamPayload_Event_AdvanceProcessingTime{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[72] + mi := &file_beam_runner_api_proto_msgTypes[76] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -5915,7 +6388,7 @@ func (x *TestStreamPayload_Event_AdvanceProcessingTime) String() string { func (*TestStreamPayload_Event_AdvanceProcessingTime) ProtoMessage() {} func (x *TestStreamPayload_Event_AdvanceProcessingTime) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[72] + mi := &file_beam_runner_api_proto_msgTypes[76] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -5955,7 +6428,7 @@ type TestStreamPayload_Event_AddElements struct { func (x *TestStreamPayload_Event_AddElements) Reset() { *x = TestStreamPayload_Event_AddElements{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[73] + mi := &file_beam_runner_api_proto_msgTypes[77] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -5968,7 +6441,7 @@ func (x *TestStreamPayload_Event_AddElements) String() string { func (*TestStreamPayload_Event_AddElements) ProtoMessage() {} func (x *TestStreamPayload_Event_AddElements) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[73] + mi := &file_beam_runner_api_proto_msgTypes[77] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -6010,7 +6483,7 @@ type Trigger_AfterAll struct { func (x *Trigger_AfterAll) Reset() { *x = Trigger_AfterAll{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[75] + mi := &file_beam_runner_api_proto_msgTypes[79] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -6023,7 +6496,7 @@ func (x *Trigger_AfterAll) String() string { func (*Trigger_AfterAll) ProtoMessage() {} func (x *Trigger_AfterAll) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[75] + mi := &file_beam_runner_api_proto_msgTypes[79] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -6036,7 +6509,7 @@ func (x *Trigger_AfterAll) ProtoReflect() protoreflect.Message { // Deprecated: Use Trigger_AfterAll.ProtoReflect.Descriptor instead. 
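The updated TestStream comments make the units explicit: element timestamps and watermarks are millisecond instants, while processing-time advances are millisecond durations. A sketch of filling those fields from `time.Time` / `time.Duration` values follows; the fields are the ones shown above, the non-oneof getters are assumed to follow the usual protoc-gen-go pattern, and the import path is an assumption.

```go
package main

import (
	"fmt"
	"time"

	// Assumed import path for the generated package; adjust to your Beam version.
	pipeline_v1 "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1"
)

func main() {
	now := time.Date(2021, time.March, 1, 12, 0, 0, 0, time.UTC)

	// Element event timestamps are millisecond instants.
	elem := &pipeline_v1.TestStreamPayload_TimestampedElement{
		EncodedElement: []byte("encoded"),
		Timestamp:      now.UnixNano() / int64(time.Millisecond),
	}

	// Watermark advances are also millisecond instants...
	wm := &pipeline_v1.TestStreamPayload_Event_AdvanceWatermark{
		NewWatermark: elem.GetTimestamp() + 1,
	}

	// ...while processing-time advances are millisecond durations.
	pt := &pipeline_v1.TestStreamPayload_Event_AdvanceProcessingTime{
		AdvanceDuration: (30 * time.Second).Milliseconds(),
	}

	fmt.Println(elem.GetTimestamp(), wm.GetNewWatermark(), pt.GetAdvanceDuration())
}
```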
func (*Trigger_AfterAll) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{34, 0} + return file_beam_runner_api_proto_rawDescGZIP(), []int{35, 0} } func (x *Trigger_AfterAll) GetSubtriggers() []*Trigger { @@ -6058,7 +6531,7 @@ type Trigger_AfterAny struct { func (x *Trigger_AfterAny) Reset() { *x = Trigger_AfterAny{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[76] + mi := &file_beam_runner_api_proto_msgTypes[80] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -6071,7 +6544,7 @@ func (x *Trigger_AfterAny) String() string { func (*Trigger_AfterAny) ProtoMessage() {} func (x *Trigger_AfterAny) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[76] + mi := &file_beam_runner_api_proto_msgTypes[80] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -6084,7 +6557,7 @@ func (x *Trigger_AfterAny) ProtoReflect() protoreflect.Message { // Deprecated: Use Trigger_AfterAny.ProtoReflect.Descriptor instead. func (*Trigger_AfterAny) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{34, 1} + return file_beam_runner_api_proto_rawDescGZIP(), []int{35, 1} } func (x *Trigger_AfterAny) GetSubtriggers() []*Trigger { @@ -6107,7 +6580,7 @@ type Trigger_AfterEach struct { func (x *Trigger_AfterEach) Reset() { *x = Trigger_AfterEach{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[77] + mi := &file_beam_runner_api_proto_msgTypes[81] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -6120,7 +6593,7 @@ func (x *Trigger_AfterEach) String() string { func (*Trigger_AfterEach) ProtoMessage() {} func (x *Trigger_AfterEach) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[77] + mi := &file_beam_runner_api_proto_msgTypes[81] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -6133,7 +6606,7 @@ func (x *Trigger_AfterEach) ProtoReflect() protoreflect.Message { // Deprecated: Use Trigger_AfterEach.ProtoReflect.Descriptor instead. func (*Trigger_AfterEach) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{34, 2} + return file_beam_runner_api_proto_rawDescGZIP(), []int{35, 2} } func (x *Trigger_AfterEach) GetSubtriggers() []*Trigger { @@ -6162,7 +6635,7 @@ type Trigger_AfterEndOfWindow struct { func (x *Trigger_AfterEndOfWindow) Reset() { *x = Trigger_AfterEndOfWindow{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[78] + mi := &file_beam_runner_api_proto_msgTypes[82] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -6175,7 +6648,7 @@ func (x *Trigger_AfterEndOfWindow) String() string { func (*Trigger_AfterEndOfWindow) ProtoMessage() {} func (x *Trigger_AfterEndOfWindow) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[78] + mi := &file_beam_runner_api_proto_msgTypes[82] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -6188,7 +6661,7 @@ func (x *Trigger_AfterEndOfWindow) ProtoReflect() protoreflect.Message { // Deprecated: Use Trigger_AfterEndOfWindow.ProtoReflect.Descriptor instead. 
func (*Trigger_AfterEndOfWindow) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{34, 3} + return file_beam_runner_api_proto_rawDescGZIP(), []int{35, 3} } func (x *Trigger_AfterEndOfWindow) GetEarlyFirings() *Trigger { @@ -6219,7 +6692,7 @@ type Trigger_AfterProcessingTime struct { func (x *Trigger_AfterProcessingTime) Reset() { *x = Trigger_AfterProcessingTime{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[79] + mi := &file_beam_runner_api_proto_msgTypes[83] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -6232,7 +6705,7 @@ func (x *Trigger_AfterProcessingTime) String() string { func (*Trigger_AfterProcessingTime) ProtoMessage() {} func (x *Trigger_AfterProcessingTime) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[79] + mi := &file_beam_runner_api_proto_msgTypes[83] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -6245,7 +6718,7 @@ func (x *Trigger_AfterProcessingTime) ProtoReflect() protoreflect.Message { // Deprecated: Use Trigger_AfterProcessingTime.ProtoReflect.Descriptor instead. func (*Trigger_AfterProcessingTime) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{34, 4} + return file_beam_runner_api_proto_rawDescGZIP(), []int{35, 4} } func (x *Trigger_AfterProcessingTime) GetTimestampTransforms() []*TimestampTransform { @@ -6266,7 +6739,7 @@ type Trigger_AfterSynchronizedProcessingTime struct { func (x *Trigger_AfterSynchronizedProcessingTime) Reset() { *x = Trigger_AfterSynchronizedProcessingTime{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[80] + mi := &file_beam_runner_api_proto_msgTypes[84] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -6279,7 +6752,7 @@ func (x *Trigger_AfterSynchronizedProcessingTime) String() string { func (*Trigger_AfterSynchronizedProcessingTime) ProtoMessage() {} func (x *Trigger_AfterSynchronizedProcessingTime) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[80] + mi := &file_beam_runner_api_proto_msgTypes[84] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -6292,7 +6765,7 @@ func (x *Trigger_AfterSynchronizedProcessingTime) ProtoReflect() protoreflect.Me // Deprecated: Use Trigger_AfterSynchronizedProcessingTime.ProtoReflect.Descriptor instead. func (*Trigger_AfterSynchronizedProcessingTime) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{34, 5} + return file_beam_runner_api_proto_rawDescGZIP(), []int{35, 5} } // The default trigger. 
Equivalent to Repeat { AfterEndOfWindow } but @@ -6306,7 +6779,7 @@ type Trigger_Default struct { func (x *Trigger_Default) Reset() { *x = Trigger_Default{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[81] + mi := &file_beam_runner_api_proto_msgTypes[85] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -6319,7 +6792,7 @@ func (x *Trigger_Default) String() string { func (*Trigger_Default) ProtoMessage() {} func (x *Trigger_Default) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[81] + mi := &file_beam_runner_api_proto_msgTypes[85] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -6332,7 +6805,7 @@ func (x *Trigger_Default) ProtoReflect() protoreflect.Message { // Deprecated: Use Trigger_Default.ProtoReflect.Descriptor instead. func (*Trigger_Default) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{34, 6} + return file_beam_runner_api_proto_rawDescGZIP(), []int{35, 6} } // Ready whenever the requisite number of input elements have arrived @@ -6347,7 +6820,7 @@ type Trigger_ElementCount struct { func (x *Trigger_ElementCount) Reset() { *x = Trigger_ElementCount{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[82] + mi := &file_beam_runner_api_proto_msgTypes[86] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -6360,7 +6833,7 @@ func (x *Trigger_ElementCount) String() string { func (*Trigger_ElementCount) ProtoMessage() {} func (x *Trigger_ElementCount) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[82] + mi := &file_beam_runner_api_proto_msgTypes[86] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -6373,7 +6846,7 @@ func (x *Trigger_ElementCount) ProtoReflect() protoreflect.Message { // Deprecated: Use Trigger_ElementCount.ProtoReflect.Descriptor instead. func (*Trigger_ElementCount) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{34, 7} + return file_beam_runner_api_proto_rawDescGZIP(), []int{35, 7} } func (x *Trigger_ElementCount) GetElementCount() int32 { @@ -6394,7 +6867,7 @@ type Trigger_Never struct { func (x *Trigger_Never) Reset() { *x = Trigger_Never{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[83] + mi := &file_beam_runner_api_proto_msgTypes[87] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -6407,7 +6880,7 @@ func (x *Trigger_Never) String() string { func (*Trigger_Never) ProtoMessage() {} func (x *Trigger_Never) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[83] + mi := &file_beam_runner_api_proto_msgTypes[87] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -6420,7 +6893,7 @@ func (x *Trigger_Never) ProtoReflect() protoreflect.Message { // Deprecated: Use Trigger_Never.ProtoReflect.Descriptor instead. func (*Trigger_Never) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{34, 8} + return file_beam_runner_api_proto_rawDescGZIP(), []int{35, 8} } // Always ready. 
This can also be expressed as ElementCount(1) but @@ -6434,7 +6907,7 @@ type Trigger_Always struct { func (x *Trigger_Always) Reset() { *x = Trigger_Always{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[84] + mi := &file_beam_runner_api_proto_msgTypes[88] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -6447,7 +6920,7 @@ func (x *Trigger_Always) String() string { func (*Trigger_Always) ProtoMessage() {} func (x *Trigger_Always) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[84] + mi := &file_beam_runner_api_proto_msgTypes[88] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -6460,7 +6933,7 @@ func (x *Trigger_Always) ProtoReflect() protoreflect.Message { // Deprecated: Use Trigger_Always.ProtoReflect.Descriptor instead. func (*Trigger_Always) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{34, 9} + return file_beam_runner_api_proto_rawDescGZIP(), []int{35, 9} } // Ready whenever either of its subtriggers are ready, but finishes output @@ -6479,7 +6952,7 @@ type Trigger_OrFinally struct { func (x *Trigger_OrFinally) Reset() { *x = Trigger_OrFinally{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[85] + mi := &file_beam_runner_api_proto_msgTypes[89] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -6492,7 +6965,7 @@ func (x *Trigger_OrFinally) String() string { func (*Trigger_OrFinally) ProtoMessage() {} func (x *Trigger_OrFinally) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[85] + mi := &file_beam_runner_api_proto_msgTypes[89] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -6505,7 +6978,7 @@ func (x *Trigger_OrFinally) ProtoReflect() protoreflect.Message { // Deprecated: Use Trigger_OrFinally.ProtoReflect.Descriptor instead. func (*Trigger_OrFinally) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{34, 10} + return file_beam_runner_api_proto_rawDescGZIP(), []int{35, 10} } func (x *Trigger_OrFinally) GetMain() *Trigger { @@ -6536,7 +7009,7 @@ type Trigger_Repeat struct { func (x *Trigger_Repeat) Reset() { *x = Trigger_Repeat{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[86] + mi := &file_beam_runner_api_proto_msgTypes[90] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -6549,7 +7022,7 @@ func (x *Trigger_Repeat) String() string { func (*Trigger_Repeat) ProtoMessage() {} func (x *Trigger_Repeat) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[86] + mi := &file_beam_runner_api_proto_msgTypes[90] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -6562,7 +7035,7 @@ func (x *Trigger_Repeat) ProtoReflect() protoreflect.Message { // Deprecated: Use Trigger_Repeat.ProtoReflect.Descriptor instead. 
func (*Trigger_Repeat) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{34, 11} + return file_beam_runner_api_proto_rawDescGZIP(), []int{35, 11} } func (x *Trigger_Repeat) GetSubtrigger() *Trigger { @@ -6584,7 +7057,7 @@ type TimestampTransform_Delay struct { func (x *TimestampTransform_Delay) Reset() { *x = TimestampTransform_Delay{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[87] + mi := &file_beam_runner_api_proto_msgTypes[91] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -6597,7 +7070,7 @@ func (x *TimestampTransform_Delay) String() string { func (*TimestampTransform_Delay) ProtoMessage() {} func (x *TimestampTransform_Delay) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[87] + mi := &file_beam_runner_api_proto_msgTypes[91] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -6610,7 +7083,7 @@ func (x *TimestampTransform_Delay) ProtoReflect() protoreflect.Message { // Deprecated: Use TimestampTransform_Delay.ProtoReflect.Descriptor instead. func (*TimestampTransform_Delay) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{35, 0} + return file_beam_runner_api_proto_rawDescGZIP(), []int{36, 0} } func (x *TimestampTransform_Delay) GetDelayMillis() int64 { @@ -6636,7 +7109,7 @@ type TimestampTransform_AlignTo struct { func (x *TimestampTransform_AlignTo) Reset() { *x = TimestampTransform_AlignTo{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[88] + mi := &file_beam_runner_api_proto_msgTypes[92] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -6649,7 +7122,7 @@ func (x *TimestampTransform_AlignTo) String() string { func (*TimestampTransform_AlignTo) ProtoMessage() {} func (x *TimestampTransform_AlignTo) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[88] + mi := &file_beam_runner_api_proto_msgTypes[92] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -6662,7 +7135,7 @@ func (x *TimestampTransform_AlignTo) ProtoReflect() protoreflect.Message { // Deprecated: Use TimestampTransform_AlignTo.ProtoReflect.Descriptor instead. 
func (*TimestampTransform_AlignTo) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{35, 1} + return file_beam_runner_api_proto_rawDescGZIP(), []int{36, 1} } func (x *TimestampTransform_AlignTo) GetPeriod() int64 { @@ -6695,7 +7168,7 @@ type ExecutableStagePayload_SideInputId struct { func (x *ExecutableStagePayload_SideInputId) Reset() { *x = ExecutableStagePayload_SideInputId{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[91] + mi := &file_beam_runner_api_proto_msgTypes[96] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -6708,7 +7181,7 @@ func (x *ExecutableStagePayload_SideInputId) String() string { func (*ExecutableStagePayload_SideInputId) ProtoMessage() {} func (x *ExecutableStagePayload_SideInputId) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[91] + mi := &file_beam_runner_api_proto_msgTypes[96] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -6721,7 +7194,7 @@ func (x *ExecutableStagePayload_SideInputId) ProtoReflect() protoreflect.Message // Deprecated: Use ExecutableStagePayload_SideInputId.ProtoReflect.Descriptor instead. func (*ExecutableStagePayload_SideInputId) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{58, 0} + return file_beam_runner_api_proto_rawDescGZIP(), []int{60, 0} } func (x *ExecutableStagePayload_SideInputId) GetTransformId() string { @@ -6754,7 +7227,7 @@ type ExecutableStagePayload_UserStateId struct { func (x *ExecutableStagePayload_UserStateId) Reset() { *x = ExecutableStagePayload_UserStateId{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[92] + mi := &file_beam_runner_api_proto_msgTypes[97] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -6767,7 +7240,7 @@ func (x *ExecutableStagePayload_UserStateId) String() string { func (*ExecutableStagePayload_UserStateId) ProtoMessage() {} func (x *ExecutableStagePayload_UserStateId) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[92] + mi := &file_beam_runner_api_proto_msgTypes[97] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -6780,7 +7253,7 @@ func (x *ExecutableStagePayload_UserStateId) ProtoReflect() protoreflect.Message // Deprecated: Use ExecutableStagePayload_UserStateId.ProtoReflect.Descriptor instead. 
func (*ExecutableStagePayload_UserStateId) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{58, 1} + return file_beam_runner_api_proto_rawDescGZIP(), []int{60, 1} } func (x *ExecutableStagePayload_UserStateId) GetTransformId() string { @@ -6813,7 +7286,7 @@ type ExecutableStagePayload_TimerId struct { func (x *ExecutableStagePayload_TimerId) Reset() { *x = ExecutableStagePayload_TimerId{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[93] + mi := &file_beam_runner_api_proto_msgTypes[98] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -6826,7 +7299,7 @@ func (x *ExecutableStagePayload_TimerId) String() string { func (*ExecutableStagePayload_TimerId) ProtoMessage() {} func (x *ExecutableStagePayload_TimerId) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[93] + mi := &file_beam_runner_api_proto_msgTypes[98] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -6839,7 +7312,7 @@ func (x *ExecutableStagePayload_TimerId) ProtoReflect() protoreflect.Message { // Deprecated: Use ExecutableStagePayload_TimerId.ProtoReflect.Descriptor instead. func (*ExecutableStagePayload_TimerId) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{58, 2} + return file_beam_runner_api_proto_rawDescGZIP(), []int{60, 2} } func (x *ExecutableStagePayload_TimerId) GetTransformId() string { @@ -6872,7 +7345,7 @@ type ExecutableStagePayload_TimerFamilyId struct { func (x *ExecutableStagePayload_TimerFamilyId) Reset() { *x = ExecutableStagePayload_TimerFamilyId{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[94] + mi := &file_beam_runner_api_proto_msgTypes[99] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -6885,7 +7358,7 @@ func (x *ExecutableStagePayload_TimerFamilyId) String() string { func (*ExecutableStagePayload_TimerFamilyId) ProtoMessage() {} func (x *ExecutableStagePayload_TimerFamilyId) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[94] + mi := &file_beam_runner_api_proto_msgTypes[99] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -6898,7 +7371,7 @@ func (x *ExecutableStagePayload_TimerFamilyId) ProtoReflect() protoreflect.Messa // Deprecated: Use ExecutableStagePayload_TimerFamilyId.ProtoReflect.Descriptor instead. 
func (*ExecutableStagePayload_TimerFamilyId) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{58, 3} + return file_beam_runner_api_proto_rawDescGZIP(), []int{60, 3} } func (x *ExecutableStagePayload_TimerFamilyId) GetTransformId() string { @@ -6942,7 +7415,7 @@ type ExecutableStagePayload_WireCoderSetting struct { func (x *ExecutableStagePayload_WireCoderSetting) Reset() { *x = ExecutableStagePayload_WireCoderSetting{} if protoimpl.UnsafeEnabled { - mi := &file_beam_runner_api_proto_msgTypes[95] + mi := &file_beam_runner_api_proto_msgTypes[100] ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) ms.StoreMessageInfo(mi) } @@ -6955,7 +7428,7 @@ func (x *ExecutableStagePayload_WireCoderSetting) String() string { func (*ExecutableStagePayload_WireCoderSetting) ProtoMessage() {} func (x *ExecutableStagePayload_WireCoderSetting) ProtoReflect() protoreflect.Message { - mi := &file_beam_runner_api_proto_msgTypes[95] + mi := &file_beam_runner_api_proto_msgTypes[100] if protoimpl.UnsafeEnabled && x != nil { ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) if ms.LoadMessageInfo() == nil { @@ -6968,7 +7441,7 @@ func (x *ExecutableStagePayload_WireCoderSetting) ProtoReflect() protoreflect.Me // Deprecated: Use ExecutableStagePayload_WireCoderSetting.ProtoReflect.Descriptor instead. func (*ExecutableStagePayload_WireCoderSetting) Descriptor() ([]byte, []int) { - return file_beam_runner_api_proto_rawDescGZIP(), []int{58, 4} + return file_beam_runner_api_proto_rawDescGZIP(), []int{60, 4} } func (x *ExecutableStagePayload_WireCoderSetting) GetUrn() string { @@ -7028,7 +7501,7 @@ func (*ExecutableStagePayload_WireCoderSetting_Timer) isExecutableStagePayload_W var file_beam_runner_api_proto_extTypes = []protoimpl.ExtensionInfo{ { - ExtendedType: (*descriptor.EnumValueOptions)(nil), + ExtendedType: (*descriptorpb.EnumValueOptions)(nil), ExtensionType: (*string)(nil), Field: 185324356, Name: "org.apache.beam.model.pipeline.v1.beam_urn", @@ -7036,7 +7509,7 @@ var file_beam_runner_api_proto_extTypes = []protoimpl.ExtensionInfo{ Filename: "beam_runner_api.proto", }, { - ExtendedType: (*descriptor.EnumValueOptions)(nil), + ExtendedType: (*descriptorpb.EnumValueOptions)(nil), ExtensionType: (*string)(nil), Field: 185324357, Name: "org.apache.beam.model.pipeline.v1.beam_constant", @@ -7045,7 +7518,7 @@ var file_beam_runner_api_proto_extTypes = []protoimpl.ExtensionInfo{ }, } -// Extension fields to descriptor.EnumValueOptions. +// Extension fields to descriptorpb.EnumValueOptions. var ( // An extension to be used for specifying the standard URN of various // pipeline entities, e.g. transforms, functions, coders etc. 
@@ -7187,7 +7660,7 @@ var file_beam_runner_api_proto_rawDesc = []byte{ 0x44, 0x69, 0x73, 0x70, 0x6c, 0x61, 0x79, 0x44, 0x61, 0x74, 0x61, 0x52, 0x0b, 0x64, 0x69, 0x73, 0x70, 0x6c, 0x61, 0x79, 0x44, 0x61, 0x74, 0x61, 0x12, 0x22, 0x0a, 0x0c, 0x72, 0x65, 0x71, 0x75, 0x69, 0x72, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x73, 0x18, 0x04, 0x20, 0x03, 0x28, 0x09, 0x52, 0x0c, - 0x72, 0x65, 0x71, 0x75, 0x69, 0x72, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x73, 0x22, 0xb2, 0x04, 0x0a, + 0x72, 0x65, 0x71, 0x75, 0x69, 0x72, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x73, 0x22, 0xd4, 0x05, 0x0a, 0x0a, 0x50, 0x54, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x12, 0x1f, 0x0a, 0x0b, 0x75, 0x6e, 0x69, 0x71, 0x75, 0x65, 0x5f, 0x6e, 0x61, 0x6d, 0x65, 0x18, 0x05, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0a, 0x75, 0x6e, 0x69, 0x71, 0x75, 0x65, 0x4e, 0x61, 0x6d, 0x65, 0x12, 0x43, 0x0a, 0x04, @@ -7215,1102 +7688,1200 @@ var file_beam_runner_api_proto_rawDesc = []byte{ 0x61, 0x79, 0x44, 0x61, 0x74, 0x61, 0x52, 0x0b, 0x64, 0x69, 0x73, 0x70, 0x6c, 0x61, 0x79, 0x44, 0x61, 0x74, 0x61, 0x12, 0x25, 0x0a, 0x0e, 0x65, 0x6e, 0x76, 0x69, 0x72, 0x6f, 0x6e, 0x6d, 0x65, 0x6e, 0x74, 0x5f, 0x69, 0x64, 0x18, 0x07, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0d, 0x65, 0x6e, 0x76, - 0x69, 0x72, 0x6f, 0x6e, 0x6d, 0x65, 0x6e, 0x74, 0x49, 0x64, 0x1a, 0x39, 0x0a, 0x0b, 0x49, 0x6e, - 0x70, 0x75, 0x74, 0x73, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x12, 0x10, 0x0a, 0x03, 0x6b, 0x65, 0x79, - 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x6b, 0x65, 0x79, 0x12, 0x14, 0x0a, 0x05, 0x76, - 0x61, 0x6c, 0x75, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x05, 0x76, 0x61, 0x6c, 0x75, - 0x65, 0x3a, 0x02, 0x38, 0x01, 0x1a, 0x3a, 0x0a, 0x0c, 0x4f, 0x75, 0x74, 0x70, 0x75, 0x74, 0x73, - 0x45, 0x6e, 0x74, 0x72, 0x79, 0x12, 0x10, 0x0a, 0x03, 0x6b, 0x65, 0x79, 0x18, 0x01, 0x20, 0x01, - 0x28, 0x09, 0x52, 0x03, 0x6b, 0x65, 0x79, 0x12, 0x14, 0x0a, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, - 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x3a, 0x02, 0x38, - 0x01, 0x22, 0x90, 0x0e, 0x0a, 0x13, 0x53, 0x74, 0x61, 0x6e, 0x64, 0x61, 0x72, 0x64, 0x50, 0x54, - 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x73, 0x22, 0xa9, 0x03, 0x0a, 0x0a, 0x50, 0x72, - 0x69, 0x6d, 0x69, 0x74, 0x69, 0x76, 0x65, 0x73, 0x12, 0x29, 0x0a, 0x06, 0x50, 0x41, 0x52, 0x5f, - 0x44, 0x4f, 0x10, 0x00, 0x1a, 0x1d, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x17, 0x62, 0x65, 0x61, 0x6d, - 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x70, 0x61, 0x72, 0x64, 0x6f, - 0x3a, 0x76, 0x31, 0x12, 0x2c, 0x0a, 0x07, 0x46, 0x4c, 0x41, 0x54, 0x54, 0x45, 0x4e, 0x10, 0x01, - 0x1a, 0x1f, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x19, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, - 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x66, 0x6c, 0x61, 0x74, 0x74, 0x65, 0x6e, 0x3a, 0x76, - 0x31, 0x12, 0x36, 0x0a, 0x0c, 0x47, 0x52, 0x4f, 0x55, 0x50, 0x5f, 0x42, 0x59, 0x5f, 0x4b, 0x45, - 0x59, 0x10, 0x02, 0x1a, 0x24, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1e, 0x62, 0x65, 0x61, 0x6d, 0x3a, - 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x67, 0x72, 0x6f, 0x75, 0x70, 0x5f, - 0x62, 0x79, 0x5f, 0x6b, 0x65, 0x79, 0x3a, 0x76, 0x31, 0x12, 0x2c, 0x0a, 0x07, 0x49, 0x4d, 0x50, - 0x55, 0x4c, 0x53, 0x45, 0x10, 0x03, 0x1a, 0x1f, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x19, 0x62, 0x65, - 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x69, 0x6d, 0x70, - 0x75, 0x6c, 0x73, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x37, 0x0a, 0x0e, 0x41, 0x53, 0x53, 0x49, 0x47, - 0x4e, 0x5f, 0x57, 0x49, 0x4e, 0x44, 0x4f, 0x57, 0x53, 0x10, 0x04, 0x1a, 0x23, 0xa2, 0xb4, 
0xfa, - 0xc2, 0x05, 0x1d, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, - 0x6d, 0x3a, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x5f, 0x69, 0x6e, 0x74, 0x6f, 0x3a, 0x76, 0x31, - 0x12, 0x33, 0x0a, 0x0b, 0x54, 0x45, 0x53, 0x54, 0x5f, 0x53, 0x54, 0x52, 0x45, 0x41, 0x4d, 0x10, - 0x05, 0x1a, 0x22, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1c, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, - 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x74, 0x65, 0x73, 0x74, 0x73, 0x74, 0x72, 0x65, - 0x61, 0x6d, 0x3a, 0x76, 0x31, 0x12, 0x34, 0x0a, 0x0b, 0x4d, 0x41, 0x50, 0x5f, 0x57, 0x49, 0x4e, - 0x44, 0x4f, 0x57, 0x53, 0x10, 0x06, 0x1a, 0x23, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1d, 0x62, 0x65, - 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x6d, 0x61, 0x70, - 0x5f, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x73, 0x3a, 0x76, 0x31, 0x12, 0x38, 0x0a, 0x0d, 0x4d, - 0x45, 0x52, 0x47, 0x45, 0x5f, 0x57, 0x49, 0x4e, 0x44, 0x4f, 0x57, 0x53, 0x10, 0x07, 0x1a, 0x25, - 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1f, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, - 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x6d, 0x65, 0x72, 0x67, 0x65, 0x5f, 0x77, 0x69, 0x6e, 0x64, 0x6f, - 0x77, 0x73, 0x3a, 0x76, 0x31, 0x22, 0x74, 0x0a, 0x14, 0x44, 0x65, 0x70, 0x72, 0x65, 0x63, 0x61, - 0x74, 0x65, 0x64, 0x50, 0x72, 0x69, 0x6d, 0x69, 0x74, 0x69, 0x76, 0x65, 0x73, 0x12, 0x26, 0x0a, - 0x04, 0x52, 0x45, 0x41, 0x44, 0x10, 0x00, 0x1a, 0x1c, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x16, 0x62, - 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x72, 0x65, - 0x61, 0x64, 0x3a, 0x76, 0x31, 0x12, 0x34, 0x0a, 0x0b, 0x43, 0x52, 0x45, 0x41, 0x54, 0x45, 0x5f, - 0x56, 0x49, 0x45, 0x57, 0x10, 0x01, 0x1a, 0x23, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1d, 0x62, 0x65, - 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x63, 0x72, 0x65, - 0x61, 0x74, 0x65, 0x5f, 0x76, 0x69, 0x65, 0x77, 0x3a, 0x76, 0x31, 0x22, 0xe0, 0x02, 0x0a, 0x0a, - 0x43, 0x6f, 0x6d, 0x70, 0x6f, 0x73, 0x69, 0x74, 0x65, 0x73, 0x12, 0x3c, 0x0a, 0x0f, 0x43, 0x4f, - 0x4d, 0x42, 0x49, 0x4e, 0x45, 0x5f, 0x50, 0x45, 0x52, 0x5f, 0x4b, 0x45, 0x59, 0x10, 0x00, 0x1a, - 0x27, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x21, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, - 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x63, 0x6f, 0x6d, 0x62, 0x69, 0x6e, 0x65, 0x5f, 0x70, 0x65, - 0x72, 0x5f, 0x6b, 0x65, 0x79, 0x3a, 0x76, 0x31, 0x12, 0x3e, 0x0a, 0x10, 0x43, 0x4f, 0x4d, 0x42, - 0x49, 0x4e, 0x45, 0x5f, 0x47, 0x4c, 0x4f, 0x42, 0x41, 0x4c, 0x4c, 0x59, 0x10, 0x01, 0x1a, 0x28, - 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x22, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, - 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x63, 0x6f, 0x6d, 0x62, 0x69, 0x6e, 0x65, 0x5f, 0x67, 0x6c, 0x6f, - 0x62, 0x61, 0x6c, 0x6c, 0x79, 0x3a, 0x76, 0x31, 0x12, 0x30, 0x0a, 0x09, 0x52, 0x45, 0x53, 0x48, - 0x55, 0x46, 0x46, 0x4c, 0x45, 0x10, 0x02, 0x1a, 0x21, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1b, 0x62, - 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x72, 0x65, - 0x73, 0x68, 0x75, 0x66, 0x66, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x34, 0x0a, 0x0b, 0x57, 0x52, - 0x49, 0x54, 0x45, 0x5f, 0x46, 0x49, 0x4c, 0x45, 0x53, 0x10, 0x03, 0x1a, 0x23, 0xa2, 0xb4, 0xfa, - 0xc2, 0x05, 0x1d, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, - 0x6d, 0x3a, 0x77, 0x72, 0x69, 0x74, 0x65, 0x5f, 0x66, 0x69, 0x6c, 0x65, 0x73, 0x3a, 0x76, 0x31, - 0x12, 0x34, 0x0a, 0x0b, 0x50, 0x55, 0x42, 0x53, 0x55, 0x42, 0x5f, 0x52, 0x45, 0x41, 0x44, 0x10, - 0x04, 0x1a, 0x23, 
0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1d, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, - 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x70, 0x75, 0x62, 0x73, 0x75, 0x62, 0x5f, 0x72, - 0x65, 0x61, 0x64, 0x3a, 0x76, 0x31, 0x12, 0x36, 0x0a, 0x0c, 0x50, 0x55, 0x42, 0x53, 0x55, 0x42, - 0x5f, 0x57, 0x52, 0x49, 0x54, 0x45, 0x10, 0x05, 0x1a, 0x24, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1e, - 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x70, - 0x75, 0x62, 0x73, 0x75, 0x62, 0x5f, 0x77, 0x72, 0x69, 0x74, 0x65, 0x3a, 0x76, 0x31, 0x22, 0xe3, - 0x03, 0x0a, 0x11, 0x43, 0x6f, 0x6d, 0x62, 0x69, 0x6e, 0x65, 0x43, 0x6f, 0x6d, 0x70, 0x6f, 0x6e, - 0x65, 0x6e, 0x74, 0x73, 0x12, 0x52, 0x0a, 0x1a, 0x43, 0x4f, 0x4d, 0x42, 0x49, 0x4e, 0x45, 0x5f, - 0x50, 0x45, 0x52, 0x5f, 0x4b, 0x45, 0x59, 0x5f, 0x50, 0x52, 0x45, 0x43, 0x4f, 0x4d, 0x42, 0x49, - 0x4e, 0x45, 0x10, 0x00, 0x1a, 0x32, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x2c, 0x62, 0x65, 0x61, 0x6d, - 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x63, 0x6f, 0x6d, 0x62, 0x69, - 0x6e, 0x65, 0x5f, 0x70, 0x65, 0x72, 0x5f, 0x6b, 0x65, 0x79, 0x5f, 0x70, 0x72, 0x65, 0x63, 0x6f, - 0x6d, 0x62, 0x69, 0x6e, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x62, 0x0a, 0x22, 0x43, 0x4f, 0x4d, 0x42, - 0x49, 0x4e, 0x45, 0x5f, 0x50, 0x45, 0x52, 0x5f, 0x4b, 0x45, 0x59, 0x5f, 0x4d, 0x45, 0x52, 0x47, - 0x45, 0x5f, 0x41, 0x43, 0x43, 0x55, 0x4d, 0x55, 0x4c, 0x41, 0x54, 0x4f, 0x52, 0x53, 0x10, 0x01, - 0x1a, 0x3a, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x34, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, - 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x63, 0x6f, 0x6d, 0x62, 0x69, 0x6e, 0x65, 0x5f, 0x70, - 0x65, 0x72, 0x5f, 0x6b, 0x65, 0x79, 0x5f, 0x6d, 0x65, 0x72, 0x67, 0x65, 0x5f, 0x61, 0x63, 0x63, - 0x75, 0x6d, 0x75, 0x6c, 0x61, 0x74, 0x6f, 0x72, 0x73, 0x3a, 0x76, 0x31, 0x12, 0x5c, 0x0a, 0x1f, - 0x43, 0x4f, 0x4d, 0x42, 0x49, 0x4e, 0x45, 0x5f, 0x50, 0x45, 0x52, 0x5f, 0x4b, 0x45, 0x59, 0x5f, - 0x45, 0x58, 0x54, 0x52, 0x41, 0x43, 0x54, 0x5f, 0x4f, 0x55, 0x54, 0x50, 0x55, 0x54, 0x53, 0x10, - 0x02, 0x1a, 0x37, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x31, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, - 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x63, 0x6f, 0x6d, 0x62, 0x69, 0x6e, 0x65, 0x5f, - 0x70, 0x65, 0x72, 0x5f, 0x6b, 0x65, 0x79, 0x5f, 0x65, 0x78, 0x74, 0x72, 0x61, 0x63, 0x74, 0x5f, - 0x6f, 0x75, 0x74, 0x70, 0x75, 0x74, 0x73, 0x3a, 0x76, 0x31, 0x12, 0x4a, 0x0a, 0x16, 0x43, 0x4f, - 0x4d, 0x42, 0x49, 0x4e, 0x45, 0x5f, 0x47, 0x52, 0x4f, 0x55, 0x50, 0x45, 0x44, 0x5f, 0x56, 0x41, - 0x4c, 0x55, 0x45, 0x53, 0x10, 0x03, 0x1a, 0x2e, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x28, 0x62, 0x65, - 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x63, 0x6f, 0x6d, - 0x62, 0x69, 0x6e, 0x65, 0x5f, 0x67, 0x72, 0x6f, 0x75, 0x70, 0x65, 0x64, 0x5f, 0x76, 0x61, 0x6c, - 0x75, 0x65, 0x73, 0x3a, 0x76, 0x31, 0x12, 0x6c, 0x0a, 0x27, 0x43, 0x4f, 0x4d, 0x42, 0x49, 0x4e, - 0x45, 0x5f, 0x50, 0x45, 0x52, 0x5f, 0x4b, 0x45, 0x59, 0x5f, 0x43, 0x4f, 0x4e, 0x56, 0x45, 0x52, - 0x54, 0x5f, 0x54, 0x4f, 0x5f, 0x41, 0x43, 0x43, 0x55, 0x4d, 0x55, 0x4c, 0x41, 0x54, 0x4f, 0x52, - 0x53, 0x10, 0x04, 0x1a, 0x3f, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x39, 0x62, 0x65, 0x61, 0x6d, 0x3a, + 0x69, 0x72, 0x6f, 0x6e, 0x6d, 0x65, 0x6e, 0x74, 0x49, 0x64, 0x12, 0x60, 0x0a, 0x0b, 0x61, 0x6e, + 0x6e, 0x6f, 0x74, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x73, 0x18, 0x08, 0x20, 0x03, 0x28, 0x0b, 0x32, + 0x3e, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, + 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 
0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, + 0x2e, 0x76, 0x31, 0x2e, 0x50, 0x54, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x2e, 0x41, + 0x6e, 0x6e, 0x6f, 0x74, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x73, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x52, + 0x0b, 0x61, 0x6e, 0x6e, 0x6f, 0x74, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x73, 0x1a, 0x39, 0x0a, 0x0b, + 0x49, 0x6e, 0x70, 0x75, 0x74, 0x73, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x12, 0x10, 0x0a, 0x03, 0x6b, + 0x65, 0x79, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x6b, 0x65, 0x79, 0x12, 0x14, 0x0a, + 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x05, 0x76, 0x61, + 0x6c, 0x75, 0x65, 0x3a, 0x02, 0x38, 0x01, 0x1a, 0x3a, 0x0a, 0x0c, 0x4f, 0x75, 0x74, 0x70, 0x75, + 0x74, 0x73, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x12, 0x10, 0x0a, 0x03, 0x6b, 0x65, 0x79, 0x18, 0x01, + 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x6b, 0x65, 0x79, 0x12, 0x14, 0x0a, 0x05, 0x76, 0x61, 0x6c, + 0x75, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x3a, + 0x02, 0x38, 0x01, 0x1a, 0x3e, 0x0a, 0x10, 0x41, 0x6e, 0x6e, 0x6f, 0x74, 0x61, 0x74, 0x69, 0x6f, + 0x6e, 0x73, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x12, 0x10, 0x0a, 0x03, 0x6b, 0x65, 0x79, 0x18, 0x01, + 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x6b, 0x65, 0x79, 0x12, 0x14, 0x0a, 0x05, 0x76, 0x61, 0x6c, + 0x75, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x3a, + 0x02, 0x38, 0x01, 0x22, 0x8a, 0x10, 0x0a, 0x13, 0x53, 0x74, 0x61, 0x6e, 0x64, 0x61, 0x72, 0x64, + 0x50, 0x54, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x73, 0x22, 0xdb, 0x03, 0x0a, 0x0a, + 0x50, 0x72, 0x69, 0x6d, 0x69, 0x74, 0x69, 0x76, 0x65, 0x73, 0x12, 0x29, 0x0a, 0x06, 0x50, 0x41, + 0x52, 0x5f, 0x44, 0x4f, 0x10, 0x00, 0x1a, 0x1d, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x17, 0x62, 0x65, + 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x70, 0x61, 0x72, + 0x64, 0x6f, 0x3a, 0x76, 0x31, 0x12, 0x2c, 0x0a, 0x07, 0x46, 0x4c, 0x41, 0x54, 0x54, 0x45, 0x4e, + 0x10, 0x01, 0x1a, 0x1f, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x19, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, + 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x66, 0x6c, 0x61, 0x74, 0x74, 0x65, 0x6e, + 0x3a, 0x76, 0x31, 0x12, 0x36, 0x0a, 0x0c, 0x47, 0x52, 0x4f, 0x55, 0x50, 0x5f, 0x42, 0x59, 0x5f, + 0x4b, 0x45, 0x59, 0x10, 0x02, 0x1a, 0x24, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1e, 0x62, 0x65, 0x61, + 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x67, 0x72, 0x6f, 0x75, + 0x70, 0x5f, 0x62, 0x79, 0x5f, 0x6b, 0x65, 0x79, 0x3a, 0x76, 0x31, 0x12, 0x2c, 0x0a, 0x07, 0x49, + 0x4d, 0x50, 0x55, 0x4c, 0x53, 0x45, 0x10, 0x03, 0x1a, 0x1f, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x19, + 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x69, + 0x6d, 0x70, 0x75, 0x6c, 0x73, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x37, 0x0a, 0x0e, 0x41, 0x53, 0x53, + 0x49, 0x47, 0x4e, 0x5f, 0x57, 0x49, 0x4e, 0x44, 0x4f, 0x57, 0x53, 0x10, 0x04, 0x1a, 0x23, 0xa2, + 0xb4, 0xfa, 0xc2, 0x05, 0x1d, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, + 0x6f, 0x72, 0x6d, 0x3a, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x5f, 0x69, 0x6e, 0x74, 0x6f, 0x3a, + 0x76, 0x31, 0x12, 0x33, 0x0a, 0x0b, 0x54, 0x45, 0x53, 0x54, 0x5f, 0x53, 0x54, 0x52, 0x45, 0x41, + 0x4d, 0x10, 0x05, 0x1a, 0x22, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1c, 0x62, 0x65, 0x61, 0x6d, 0x3a, + 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x74, 0x65, 0x73, 0x74, 0x73, 0x74, + 0x72, 0x65, 0x61, 0x6d, 0x3a, 0x76, 0x31, 0x12, 0x34, 0x0a, 0x0b, 
0x4d, 0x41, 0x50, 0x5f, 0x57, + 0x49, 0x4e, 0x44, 0x4f, 0x57, 0x53, 0x10, 0x06, 0x1a, 0x23, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1d, + 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x6d, + 0x61, 0x70, 0x5f, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x73, 0x3a, 0x76, 0x31, 0x12, 0x38, 0x0a, + 0x0d, 0x4d, 0x45, 0x52, 0x47, 0x45, 0x5f, 0x57, 0x49, 0x4e, 0x44, 0x4f, 0x57, 0x53, 0x10, 0x07, + 0x1a, 0x25, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1f, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, + 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x6d, 0x65, 0x72, 0x67, 0x65, 0x5f, 0x77, 0x69, 0x6e, + 0x64, 0x6f, 0x77, 0x73, 0x3a, 0x76, 0x31, 0x12, 0x30, 0x0a, 0x09, 0x54, 0x4f, 0x5f, 0x53, 0x54, + 0x52, 0x49, 0x4e, 0x47, 0x10, 0x08, 0x1a, 0x21, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1b, 0x62, 0x65, + 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x74, 0x6f, 0x5f, + 0x73, 0x74, 0x72, 0x69, 0x6e, 0x67, 0x3a, 0x76, 0x31, 0x22, 0x74, 0x0a, 0x14, 0x44, 0x65, 0x70, + 0x72, 0x65, 0x63, 0x61, 0x74, 0x65, 0x64, 0x50, 0x72, 0x69, 0x6d, 0x69, 0x74, 0x69, 0x76, 0x65, + 0x73, 0x12, 0x26, 0x0a, 0x04, 0x52, 0x45, 0x41, 0x44, 0x10, 0x00, 0x1a, 0x1c, 0xa2, 0xb4, 0xfa, + 0xc2, 0x05, 0x16, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, + 0x6d, 0x3a, 0x72, 0x65, 0x61, 0x64, 0x3a, 0x76, 0x31, 0x12, 0x34, 0x0a, 0x0b, 0x43, 0x52, 0x45, + 0x41, 0x54, 0x45, 0x5f, 0x56, 0x49, 0x45, 0x57, 0x10, 0x01, 0x1a, 0x23, 0xa2, 0xb4, 0xfa, 0xc2, + 0x05, 0x1d, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, + 0x3a, 0x63, 0x72, 0x65, 0x61, 0x74, 0x65, 0x5f, 0x76, 0x69, 0x65, 0x77, 0x3a, 0x76, 0x31, 0x22, + 0xc6, 0x03, 0x0a, 0x0a, 0x43, 0x6f, 0x6d, 0x70, 0x6f, 0x73, 0x69, 0x74, 0x65, 0x73, 0x12, 0x3c, + 0x0a, 0x0f, 0x43, 0x4f, 0x4d, 0x42, 0x49, 0x4e, 0x45, 0x5f, 0x50, 0x45, 0x52, 0x5f, 0x4b, 0x45, + 0x59, 0x10, 0x00, 0x1a, 0x27, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x21, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x63, 0x6f, 0x6d, 0x62, 0x69, 0x6e, - 0x65, 0x5f, 0x70, 0x65, 0x72, 0x5f, 0x6b, 0x65, 0x79, 0x5f, 0x63, 0x6f, 0x6e, 0x76, 0x65, 0x72, - 0x74, 0x5f, 0x74, 0x6f, 0x5f, 0x61, 0x63, 0x63, 0x75, 0x6d, 0x75, 0x6c, 0x61, 0x74, 0x6f, 0x72, - 0x73, 0x3a, 0x76, 0x31, 0x22, 0x8d, 0x03, 0x0a, 0x19, 0x53, 0x70, 0x6c, 0x69, 0x74, 0x74, 0x61, - 0x62, 0x6c, 0x65, 0x50, 0x61, 0x72, 0x44, 0x6f, 0x43, 0x6f, 0x6d, 0x70, 0x6f, 0x6e, 0x65, 0x6e, - 0x74, 0x73, 0x12, 0x4c, 0x0a, 0x15, 0x50, 0x41, 0x49, 0x52, 0x5f, 0x57, 0x49, 0x54, 0x48, 0x5f, - 0x52, 0x45, 0x53, 0x54, 0x52, 0x49, 0x43, 0x54, 0x49, 0x4f, 0x4e, 0x10, 0x00, 0x1a, 0x31, 0xa2, - 0xb4, 0xfa, 0xc2, 0x05, 0x2b, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, - 0x6f, 0x72, 0x6d, 0x3a, 0x73, 0x64, 0x66, 0x5f, 0x70, 0x61, 0x69, 0x72, 0x5f, 0x77, 0x69, 0x74, - 0x68, 0x5f, 0x72, 0x65, 0x73, 0x74, 0x72, 0x69, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x3a, 0x76, 0x31, - 0x12, 0x58, 0x0a, 0x1b, 0x53, 0x50, 0x4c, 0x49, 0x54, 0x5f, 0x41, 0x4e, 0x44, 0x5f, 0x53, 0x49, - 0x5a, 0x45, 0x5f, 0x52, 0x45, 0x53, 0x54, 0x52, 0x49, 0x43, 0x54, 0x49, 0x4f, 0x4e, 0x53, 0x10, - 0x01, 0x1a, 0x37, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x31, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, - 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x73, 0x64, 0x66, 0x5f, 0x73, 0x70, 0x6c, 0x69, - 0x74, 0x5f, 0x61, 0x6e, 0x64, 0x5f, 0x73, 0x69, 0x7a, 0x65, 0x5f, 0x72, 0x65, 0x73, 0x74, 0x72, - 0x69, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x73, 0x3a, 0x76, 0x31, 0x12, 0x6f, 0x0a, 0x27, 0x50, 
0x52, - 0x4f, 0x43, 0x45, 0x53, 0x53, 0x5f, 0x53, 0x49, 0x5a, 0x45, 0x44, 0x5f, 0x45, 0x4c, 0x45, 0x4d, - 0x45, 0x4e, 0x54, 0x53, 0x5f, 0x41, 0x4e, 0x44, 0x5f, 0x52, 0x45, 0x53, 0x54, 0x52, 0x49, 0x43, - 0x54, 0x49, 0x4f, 0x4e, 0x53, 0x10, 0x02, 0x1a, 0x42, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x3c, 0x62, + 0x65, 0x5f, 0x70, 0x65, 0x72, 0x5f, 0x6b, 0x65, 0x79, 0x3a, 0x76, 0x31, 0x12, 0x3e, 0x0a, 0x10, + 0x43, 0x4f, 0x4d, 0x42, 0x49, 0x4e, 0x45, 0x5f, 0x47, 0x4c, 0x4f, 0x42, 0x41, 0x4c, 0x4c, 0x59, + 0x10, 0x01, 0x1a, 0x28, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x22, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, + 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x63, 0x6f, 0x6d, 0x62, 0x69, 0x6e, 0x65, + 0x5f, 0x67, 0x6c, 0x6f, 0x62, 0x61, 0x6c, 0x6c, 0x79, 0x3a, 0x76, 0x31, 0x12, 0x30, 0x0a, 0x09, + 0x52, 0x45, 0x53, 0x48, 0x55, 0x46, 0x46, 0x4c, 0x45, 0x10, 0x02, 0x1a, 0x21, 0xa2, 0xb4, 0xfa, + 0xc2, 0x05, 0x1b, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, + 0x6d, 0x3a, 0x72, 0x65, 0x73, 0x68, 0x75, 0x66, 0x66, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x34, + 0x0a, 0x0b, 0x57, 0x52, 0x49, 0x54, 0x45, 0x5f, 0x46, 0x49, 0x4c, 0x45, 0x53, 0x10, 0x03, 0x1a, + 0x23, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1d, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, + 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x77, 0x72, 0x69, 0x74, 0x65, 0x5f, 0x66, 0x69, 0x6c, 0x65, + 0x73, 0x3a, 0x76, 0x31, 0x12, 0x34, 0x0a, 0x0b, 0x50, 0x55, 0x42, 0x53, 0x55, 0x42, 0x5f, 0x52, + 0x45, 0x41, 0x44, 0x10, 0x04, 0x1a, 0x23, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1d, 0x62, 0x65, 0x61, + 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x70, 0x75, 0x62, 0x73, + 0x75, 0x62, 0x5f, 0x72, 0x65, 0x61, 0x64, 0x3a, 0x76, 0x31, 0x12, 0x36, 0x0a, 0x0c, 0x50, 0x55, + 0x42, 0x53, 0x55, 0x42, 0x5f, 0x57, 0x52, 0x49, 0x54, 0x45, 0x10, 0x05, 0x1a, 0x24, 0xa2, 0xb4, + 0xfa, 0xc2, 0x05, 0x1e, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, + 0x72, 0x6d, 0x3a, 0x70, 0x75, 0x62, 0x73, 0x75, 0x62, 0x5f, 0x77, 0x72, 0x69, 0x74, 0x65, 0x3a, + 0x76, 0x31, 0x12, 0x64, 0x0a, 0x23, 0x47, 0x52, 0x4f, 0x55, 0x50, 0x5f, 0x49, 0x4e, 0x54, 0x4f, + 0x5f, 0x42, 0x41, 0x54, 0x43, 0x48, 0x45, 0x53, 0x5f, 0x57, 0x49, 0x54, 0x48, 0x5f, 0x53, 0x48, + 0x41, 0x52, 0x44, 0x45, 0x44, 0x5f, 0x4b, 0x45, 0x59, 0x10, 0x06, 0x1a, 0x3b, 0xa2, 0xb4, 0xfa, + 0xc2, 0x05, 0x35, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, + 0x6d, 0x3a, 0x67, 0x72, 0x6f, 0x75, 0x70, 0x5f, 0x69, 0x6e, 0x74, 0x6f, 0x5f, 0x62, 0x61, 0x74, + 0x63, 0x68, 0x65, 0x73, 0x5f, 0x77, 0x69, 0x74, 0x68, 0x5f, 0x73, 0x68, 0x61, 0x72, 0x64, 0x65, + 0x64, 0x5f, 0x6b, 0x65, 0x79, 0x3a, 0x76, 0x31, 0x22, 0xe3, 0x03, 0x0a, 0x11, 0x43, 0x6f, 0x6d, + 0x62, 0x69, 0x6e, 0x65, 0x43, 0x6f, 0x6d, 0x70, 0x6f, 0x6e, 0x65, 0x6e, 0x74, 0x73, 0x12, 0x52, + 0x0a, 0x1a, 0x43, 0x4f, 0x4d, 0x42, 0x49, 0x4e, 0x45, 0x5f, 0x50, 0x45, 0x52, 0x5f, 0x4b, 0x45, + 0x59, 0x5f, 0x50, 0x52, 0x45, 0x43, 0x4f, 0x4d, 0x42, 0x49, 0x4e, 0x45, 0x10, 0x00, 0x1a, 0x32, + 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x2c, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, + 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x63, 0x6f, 0x6d, 0x62, 0x69, 0x6e, 0x65, 0x5f, 0x70, 0x65, 0x72, + 0x5f, 0x6b, 0x65, 0x79, 0x5f, 0x70, 0x72, 0x65, 0x63, 0x6f, 0x6d, 0x62, 0x69, 0x6e, 0x65, 0x3a, + 0x76, 0x31, 0x12, 0x62, 0x0a, 0x22, 0x43, 0x4f, 0x4d, 0x42, 0x49, 0x4e, 0x45, 0x5f, 0x50, 0x45, + 0x52, 0x5f, 0x4b, 0x45, 0x59, 0x5f, 0x4d, 0x45, 0x52, 0x47, 0x45, 0x5f, 0x41, 0x43, 0x43, 0x55, + 0x4d, 0x55, 0x4c, 
0x41, 0x54, 0x4f, 0x52, 0x53, 0x10, 0x01, 0x1a, 0x3a, 0xa2, 0xb4, 0xfa, 0xc2, + 0x05, 0x34, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, + 0x3a, 0x63, 0x6f, 0x6d, 0x62, 0x69, 0x6e, 0x65, 0x5f, 0x70, 0x65, 0x72, 0x5f, 0x6b, 0x65, 0x79, + 0x5f, 0x6d, 0x65, 0x72, 0x67, 0x65, 0x5f, 0x61, 0x63, 0x63, 0x75, 0x6d, 0x75, 0x6c, 0x61, 0x74, + 0x6f, 0x72, 0x73, 0x3a, 0x76, 0x31, 0x12, 0x5c, 0x0a, 0x1f, 0x43, 0x4f, 0x4d, 0x42, 0x49, 0x4e, + 0x45, 0x5f, 0x50, 0x45, 0x52, 0x5f, 0x4b, 0x45, 0x59, 0x5f, 0x45, 0x58, 0x54, 0x52, 0x41, 0x43, + 0x54, 0x5f, 0x4f, 0x55, 0x54, 0x50, 0x55, 0x54, 0x53, 0x10, 0x02, 0x1a, 0x37, 0xa2, 0xb4, 0xfa, + 0xc2, 0x05, 0x31, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, + 0x6d, 0x3a, 0x63, 0x6f, 0x6d, 0x62, 0x69, 0x6e, 0x65, 0x5f, 0x70, 0x65, 0x72, 0x5f, 0x6b, 0x65, + 0x79, 0x5f, 0x65, 0x78, 0x74, 0x72, 0x61, 0x63, 0x74, 0x5f, 0x6f, 0x75, 0x74, 0x70, 0x75, 0x74, + 0x73, 0x3a, 0x76, 0x31, 0x12, 0x4a, 0x0a, 0x16, 0x43, 0x4f, 0x4d, 0x42, 0x49, 0x4e, 0x45, 0x5f, + 0x47, 0x52, 0x4f, 0x55, 0x50, 0x45, 0x44, 0x5f, 0x56, 0x41, 0x4c, 0x55, 0x45, 0x53, 0x10, 0x03, + 0x1a, 0x2e, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x28, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, + 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x63, 0x6f, 0x6d, 0x62, 0x69, 0x6e, 0x65, 0x5f, 0x67, + 0x72, 0x6f, 0x75, 0x70, 0x65, 0x64, 0x5f, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x73, 0x3a, 0x76, 0x31, + 0x12, 0x6c, 0x0a, 0x27, 0x43, 0x4f, 0x4d, 0x42, 0x49, 0x4e, 0x45, 0x5f, 0x50, 0x45, 0x52, 0x5f, + 0x4b, 0x45, 0x59, 0x5f, 0x43, 0x4f, 0x4e, 0x56, 0x45, 0x52, 0x54, 0x5f, 0x54, 0x4f, 0x5f, 0x41, + 0x43, 0x43, 0x55, 0x4d, 0x55, 0x4c, 0x41, 0x54, 0x4f, 0x52, 0x53, 0x10, 0x04, 0x1a, 0x3f, 0xa2, + 0xb4, 0xfa, 0xc2, 0x05, 0x39, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, + 0x6f, 0x72, 0x6d, 0x3a, 0x63, 0x6f, 0x6d, 0x62, 0x69, 0x6e, 0x65, 0x5f, 0x70, 0x65, 0x72, 0x5f, + 0x6b, 0x65, 0x79, 0x5f, 0x63, 0x6f, 0x6e, 0x76, 0x65, 0x72, 0x74, 0x5f, 0x74, 0x6f, 0x5f, 0x61, + 0x63, 0x63, 0x75, 0x6d, 0x75, 0x6c, 0x61, 0x74, 0x6f, 0x72, 0x73, 0x3a, 0x76, 0x31, 0x22, 0x8d, + 0x03, 0x0a, 0x19, 0x53, 0x70, 0x6c, 0x69, 0x74, 0x74, 0x61, 0x62, 0x6c, 0x65, 0x50, 0x61, 0x72, + 0x44, 0x6f, 0x43, 0x6f, 0x6d, 0x70, 0x6f, 0x6e, 0x65, 0x6e, 0x74, 0x73, 0x12, 0x4c, 0x0a, 0x15, + 0x50, 0x41, 0x49, 0x52, 0x5f, 0x57, 0x49, 0x54, 0x48, 0x5f, 0x52, 0x45, 0x53, 0x54, 0x52, 0x49, + 0x43, 0x54, 0x49, 0x4f, 0x4e, 0x10, 0x00, 0x1a, 0x31, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x2b, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x73, 0x64, - 0x66, 0x5f, 0x70, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x5f, 0x73, 0x69, 0x7a, 0x65, 0x64, 0x5f, - 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x5f, 0x61, 0x6e, 0x64, 0x5f, 0x72, 0x65, 0x73, 0x74, - 0x72, 0x69, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x73, 0x3a, 0x76, 0x31, 0x12, 0x57, 0x0a, 0x1a, 0x54, - 0x52, 0x55, 0x4e, 0x43, 0x41, 0x54, 0x45, 0x5f, 0x53, 0x49, 0x5a, 0x45, 0x44, 0x5f, 0x52, 0x45, - 0x53, 0x54, 0x52, 0x49, 0x43, 0x54, 0x49, 0x4f, 0x4e, 0x10, 0x03, 0x1a, 0x37, 0xa2, 0xb4, 0xfa, + 0x66, 0x5f, 0x70, 0x61, 0x69, 0x72, 0x5f, 0x77, 0x69, 0x74, 0x68, 0x5f, 0x72, 0x65, 0x73, 0x74, + 0x72, 0x69, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x3a, 0x76, 0x31, 0x12, 0x58, 0x0a, 0x1b, 0x53, 0x50, + 0x4c, 0x49, 0x54, 0x5f, 0x41, 0x4e, 0x44, 0x5f, 0x53, 0x49, 0x5a, 0x45, 0x5f, 0x52, 0x45, 0x53, + 0x54, 0x52, 0x49, 0x43, 0x54, 0x49, 0x4f, 0x4e, 0x53, 0x10, 0x01, 0x1a, 0x37, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x31, 0x62, 0x65, 0x61, 0x6d, 0x3a, 
0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, - 0x6d, 0x3a, 0x73, 0x64, 0x66, 0x5f, 0x74, 0x72, 0x75, 0x6e, 0x63, 0x61, 0x74, 0x65, 0x5f, 0x73, - 0x69, 0x7a, 0x65, 0x64, 0x5f, 0x72, 0x65, 0x73, 0x74, 0x72, 0x69, 0x63, 0x74, 0x69, 0x6f, 0x6e, - 0x73, 0x3a, 0x76, 0x31, 0x22, 0x82, 0x01, 0x0a, 0x16, 0x53, 0x74, 0x61, 0x6e, 0x64, 0x61, 0x72, - 0x64, 0x53, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x54, 0x79, 0x70, 0x65, 0x73, 0x22, - 0x68, 0x0a, 0x04, 0x45, 0x6e, 0x75, 0x6d, 0x12, 0x2f, 0x0a, 0x08, 0x49, 0x54, 0x45, 0x52, 0x41, - 0x42, 0x4c, 0x45, 0x10, 0x00, 0x1a, 0x21, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1b, 0x62, 0x65, 0x61, - 0x6d, 0x3a, 0x73, 0x69, 0x64, 0x65, 0x5f, 0x69, 0x6e, 0x70, 0x75, 0x74, 0x3a, 0x69, 0x74, 0x65, - 0x72, 0x61, 0x62, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x2f, 0x0a, 0x08, 0x4d, 0x55, 0x4c, 0x54, - 0x49, 0x4d, 0x41, 0x50, 0x10, 0x01, 0x1a, 0x21, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1b, 0x62, 0x65, - 0x61, 0x6d, 0x3a, 0x73, 0x69, 0x64, 0x65, 0x5f, 0x69, 0x6e, 0x70, 0x75, 0x74, 0x3a, 0x6d, 0x75, - 0x6c, 0x74, 0x69, 0x6d, 0x61, 0x70, 0x3a, 0x76, 0x31, 0x22, 0xa2, 0x02, 0x0a, 0x0b, 0x50, 0x43, - 0x6f, 0x6c, 0x6c, 0x65, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x1f, 0x0a, 0x0b, 0x75, 0x6e, 0x69, - 0x71, 0x75, 0x65, 0x5f, 0x6e, 0x61, 0x6d, 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0a, - 0x75, 0x6e, 0x69, 0x71, 0x75, 0x65, 0x4e, 0x61, 0x6d, 0x65, 0x12, 0x19, 0x0a, 0x08, 0x63, 0x6f, - 0x64, 0x65, 0x72, 0x5f, 0x69, 0x64, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x07, 0x63, 0x6f, - 0x64, 0x65, 0x72, 0x49, 0x64, 0x12, 0x50, 0x0a, 0x0a, 0x69, 0x73, 0x5f, 0x62, 0x6f, 0x75, 0x6e, - 0x64, 0x65, 0x64, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0e, 0x32, 0x31, 0x2e, 0x6f, 0x72, 0x67, 0x2e, - 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, - 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x49, 0x73, - 0x42, 0x6f, 0x75, 0x6e, 0x64, 0x65, 0x64, 0x2e, 0x45, 0x6e, 0x75, 0x6d, 0x52, 0x09, 0x69, 0x73, - 0x42, 0x6f, 0x75, 0x6e, 0x64, 0x65, 0x64, 0x12, 0x32, 0x0a, 0x15, 0x77, 0x69, 0x6e, 0x64, 0x6f, - 0x77, 0x69, 0x6e, 0x67, 0x5f, 0x73, 0x74, 0x72, 0x61, 0x74, 0x65, 0x67, 0x79, 0x5f, 0x69, 0x64, - 0x18, 0x04, 0x20, 0x01, 0x28, 0x09, 0x52, 0x13, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x69, 0x6e, - 0x67, 0x53, 0x74, 0x72, 0x61, 0x74, 0x65, 0x67, 0x79, 0x49, 0x64, 0x12, 0x51, 0x0a, 0x0c, 0x64, - 0x69, 0x73, 0x70, 0x6c, 0x61, 0x79, 0x5f, 0x64, 0x61, 0x74, 0x61, 0x18, 0x05, 0x20, 0x03, 0x28, - 0x0b, 0x32, 0x2e, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, - 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, - 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x44, 0x69, 0x73, 0x70, 0x6c, 0x61, 0x79, 0x44, 0x61, 0x74, - 0x61, 0x52, 0x0b, 0x64, 0x69, 0x73, 0x70, 0x6c, 0x61, 0x79, 0x44, 0x61, 0x74, 0x61, 0x22, 0xbe, - 0x07, 0x0a, 0x0c, 0x50, 0x61, 0x72, 0x44, 0x6f, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, - 0x44, 0x0a, 0x05, 0x64, 0x6f, 0x5f, 0x66, 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2f, - 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, - 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, - 0x76, 0x31, 0x2e, 0x46, 0x75, 0x6e, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x53, 0x70, 0x65, 0x63, 0x52, - 0x04, 0x64, 0x6f, 0x46, 0x6e, 0x12, 0x60, 0x0a, 0x0b, 0x73, 0x69, 0x64, 0x65, 0x5f, 0x69, 0x6e, - 0x70, 0x75, 0x74, 0x73, 0x18, 0x03, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x3f, 
0x2e, 0x6f, 0x72, 0x67, - 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, - 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x50, - 0x61, 0x72, 0x44, 0x6f, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x2e, 0x53, 0x69, 0x64, 0x65, - 0x49, 0x6e, 0x70, 0x75, 0x74, 0x73, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x52, 0x0a, 0x73, 0x69, 0x64, - 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x73, 0x12, 0x60, 0x0a, 0x0b, 0x73, 0x74, 0x61, 0x74, 0x65, - 0x5f, 0x73, 0x70, 0x65, 0x63, 0x73, 0x18, 0x04, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x3f, 0x2e, 0x6f, - 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, - 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, - 0x2e, 0x50, 0x61, 0x72, 0x44, 0x6f, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x2e, 0x53, 0x74, - 0x61, 0x74, 0x65, 0x53, 0x70, 0x65, 0x63, 0x73, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x52, 0x0a, 0x73, - 0x74, 0x61, 0x74, 0x65, 0x53, 0x70, 0x65, 0x63, 0x73, 0x12, 0x73, 0x0a, 0x12, 0x74, 0x69, 0x6d, - 0x65, 0x72, 0x5f, 0x66, 0x61, 0x6d, 0x69, 0x6c, 0x79, 0x5f, 0x73, 0x70, 0x65, 0x63, 0x73, 0x18, - 0x09, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x45, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, + 0x6d, 0x3a, 0x73, 0x64, 0x66, 0x5f, 0x73, 0x70, 0x6c, 0x69, 0x74, 0x5f, 0x61, 0x6e, 0x64, 0x5f, + 0x73, 0x69, 0x7a, 0x65, 0x5f, 0x72, 0x65, 0x73, 0x74, 0x72, 0x69, 0x63, 0x74, 0x69, 0x6f, 0x6e, + 0x73, 0x3a, 0x76, 0x31, 0x12, 0x6f, 0x0a, 0x27, 0x50, 0x52, 0x4f, 0x43, 0x45, 0x53, 0x53, 0x5f, + 0x53, 0x49, 0x5a, 0x45, 0x44, 0x5f, 0x45, 0x4c, 0x45, 0x4d, 0x45, 0x4e, 0x54, 0x53, 0x5f, 0x41, + 0x4e, 0x44, 0x5f, 0x52, 0x45, 0x53, 0x54, 0x52, 0x49, 0x43, 0x54, 0x49, 0x4f, 0x4e, 0x53, 0x10, + 0x02, 0x1a, 0x42, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x3c, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x74, 0x72, + 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x73, 0x64, 0x66, 0x5f, 0x70, 0x72, 0x6f, 0x63, + 0x65, 0x73, 0x73, 0x5f, 0x73, 0x69, 0x7a, 0x65, 0x64, 0x5f, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, + 0x74, 0x5f, 0x61, 0x6e, 0x64, 0x5f, 0x72, 0x65, 0x73, 0x74, 0x72, 0x69, 0x63, 0x74, 0x69, 0x6f, + 0x6e, 0x73, 0x3a, 0x76, 0x31, 0x12, 0x57, 0x0a, 0x1a, 0x54, 0x52, 0x55, 0x4e, 0x43, 0x41, 0x54, + 0x45, 0x5f, 0x53, 0x49, 0x5a, 0x45, 0x44, 0x5f, 0x52, 0x45, 0x53, 0x54, 0x52, 0x49, 0x43, 0x54, + 0x49, 0x4f, 0x4e, 0x10, 0x03, 0x1a, 0x37, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x31, 0x62, 0x65, 0x61, + 0x6d, 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x73, 0x64, 0x66, 0x5f, + 0x74, 0x72, 0x75, 0x6e, 0x63, 0x61, 0x74, 0x65, 0x5f, 0x73, 0x69, 0x7a, 0x65, 0x64, 0x5f, 0x72, + 0x65, 0x73, 0x74, 0x72, 0x69, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x73, 0x3a, 0x76, 0x31, 0x22, 0x60, + 0x0a, 0x1a, 0x47, 0x72, 0x6f, 0x75, 0x70, 0x49, 0x6e, 0x74, 0x6f, 0x42, 0x61, 0x74, 0x63, 0x68, + 0x65, 0x73, 0x43, 0x6f, 0x6d, 0x70, 0x6f, 0x6e, 0x65, 0x6e, 0x74, 0x73, 0x12, 0x42, 0x0a, 0x12, + 0x47, 0x52, 0x4f, 0x55, 0x50, 0x5f, 0x49, 0x4e, 0x54, 0x4f, 0x5f, 0x42, 0x41, 0x54, 0x43, 0x48, + 0x45, 0x53, 0x10, 0x00, 0x1a, 0x2a, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x24, 0x62, 0x65, 0x61, 0x6d, + 0x3a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x3a, 0x67, 0x72, 0x6f, 0x75, 0x70, + 0x5f, 0x69, 0x6e, 0x74, 0x6f, 0x5f, 0x62, 0x61, 0x74, 0x63, 0x68, 0x65, 0x73, 0x3a, 0x76, 0x31, + 0x22, 0x82, 0x01, 0x0a, 0x16, 0x53, 0x74, 0x61, 0x6e, 0x64, 0x61, 0x72, 0x64, 0x53, 0x69, 0x64, + 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x54, 0x79, 0x70, 0x65, 0x73, 0x22, 0x68, 0x0a, 0x04, 0x45, + 
0x6e, 0x75, 0x6d, 0x12, 0x2f, 0x0a, 0x08, 0x49, 0x54, 0x45, 0x52, 0x41, 0x42, 0x4c, 0x45, 0x10, + 0x00, 0x1a, 0x21, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1b, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x73, 0x69, + 0x64, 0x65, 0x5f, 0x69, 0x6e, 0x70, 0x75, 0x74, 0x3a, 0x69, 0x74, 0x65, 0x72, 0x61, 0x62, 0x6c, + 0x65, 0x3a, 0x76, 0x31, 0x12, 0x2f, 0x0a, 0x08, 0x4d, 0x55, 0x4c, 0x54, 0x49, 0x4d, 0x41, 0x50, + 0x10, 0x01, 0x1a, 0x21, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1b, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x73, + 0x69, 0x64, 0x65, 0x5f, 0x69, 0x6e, 0x70, 0x75, 0x74, 0x3a, 0x6d, 0x75, 0x6c, 0x74, 0x69, 0x6d, + 0x61, 0x70, 0x3a, 0x76, 0x31, 0x22, 0xa2, 0x02, 0x0a, 0x0b, 0x50, 0x43, 0x6f, 0x6c, 0x6c, 0x65, + 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x1f, 0x0a, 0x0b, 0x75, 0x6e, 0x69, 0x71, 0x75, 0x65, 0x5f, + 0x6e, 0x61, 0x6d, 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0a, 0x75, 0x6e, 0x69, 0x71, + 0x75, 0x65, 0x4e, 0x61, 0x6d, 0x65, 0x12, 0x19, 0x0a, 0x08, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x5f, + 0x69, 0x64, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x07, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x49, + 0x64, 0x12, 0x50, 0x0a, 0x0a, 0x69, 0x73, 0x5f, 0x62, 0x6f, 0x75, 0x6e, 0x64, 0x65, 0x64, 0x18, + 0x03, 0x20, 0x01, 0x28, 0x0e, 0x32, 0x31, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, - 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x50, 0x61, 0x72, 0x44, 0x6f, 0x50, - 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x2e, 0x54, 0x69, 0x6d, 0x65, 0x72, 0x46, 0x61, 0x6d, 0x69, - 0x6c, 0x79, 0x53, 0x70, 0x65, 0x63, 0x73, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x52, 0x10, 0x74, 0x69, - 0x6d, 0x65, 0x72, 0x46, 0x61, 0x6d, 0x69, 0x6c, 0x79, 0x53, 0x70, 0x65, 0x63, 0x73, 0x12, 0x30, - 0x0a, 0x14, 0x72, 0x65, 0x73, 0x74, 0x72, 0x69, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x63, 0x6f, - 0x64, 0x65, 0x72, 0x5f, 0x69, 0x64, 0x18, 0x07, 0x20, 0x01, 0x28, 0x09, 0x52, 0x12, 0x72, 0x65, - 0x73, 0x74, 0x72, 0x69, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x43, 0x6f, 0x64, 0x65, 0x72, 0x49, 0x64, - 0x12, 0x33, 0x0a, 0x15, 0x72, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x73, 0x5f, 0x66, 0x69, 0x6e, - 0x61, 0x6c, 0x69, 0x7a, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x18, 0x08, 0x20, 0x01, 0x28, 0x08, 0x52, - 0x14, 0x72, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x73, 0x46, 0x69, 0x6e, 0x61, 0x6c, 0x69, 0x7a, - 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x3b, 0x0a, 0x1a, 0x72, 0x65, 0x71, 0x75, 0x69, 0x72, 0x65, - 0x73, 0x5f, 0x74, 0x69, 0x6d, 0x65, 0x5f, 0x73, 0x6f, 0x72, 0x74, 0x65, 0x64, 0x5f, 0x69, 0x6e, - 0x70, 0x75, 0x74, 0x18, 0x0a, 0x20, 0x01, 0x28, 0x08, 0x52, 0x17, 0x72, 0x65, 0x71, 0x75, 0x69, - 0x72, 0x65, 0x73, 0x54, 0x69, 0x6d, 0x65, 0x53, 0x6f, 0x72, 0x74, 0x65, 0x64, 0x49, 0x6e, 0x70, - 0x75, 0x74, 0x12, 0x32, 0x0a, 0x15, 0x72, 0x65, 0x71, 0x75, 0x69, 0x72, 0x65, 0x73, 0x5f, 0x73, - 0x74, 0x61, 0x62, 0x6c, 0x65, 0x5f, 0x69, 0x6e, 0x70, 0x75, 0x74, 0x18, 0x0b, 0x20, 0x01, 0x28, - 0x08, 0x52, 0x13, 0x72, 0x65, 0x71, 0x75, 0x69, 0x72, 0x65, 0x73, 0x53, 0x74, 0x61, 0x62, 0x6c, - 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x1a, 0x6b, 0x0a, 0x0f, 0x53, 0x69, 0x64, 0x65, 0x49, 0x6e, - 0x70, 0x75, 0x74, 0x73, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x12, 0x10, 0x0a, 0x03, 0x6b, 0x65, 0x79, - 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x6b, 0x65, 0x79, 0x12, 0x42, 0x0a, 0x05, 0x76, - 0x61, 0x6c, 0x75, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2c, 0x2e, 0x6f, 0x72, 0x67, - 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, - 0x65, 0x6c, 0x2e, 0x70, 
0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x53, - 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x52, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x3a, - 0x02, 0x38, 0x01, 0x1a, 0x6b, 0x0a, 0x0f, 0x53, 0x74, 0x61, 0x74, 0x65, 0x53, 0x70, 0x65, 0x63, - 0x73, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x12, 0x10, 0x0a, 0x03, 0x6b, 0x65, 0x79, 0x18, 0x01, 0x20, - 0x01, 0x28, 0x09, 0x52, 0x03, 0x6b, 0x65, 0x79, 0x12, 0x42, 0x0a, 0x05, 0x76, 0x61, 0x6c, 0x75, - 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2c, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, - 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, - 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x53, 0x74, 0x61, 0x74, - 0x65, 0x53, 0x70, 0x65, 0x63, 0x52, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x3a, 0x02, 0x38, 0x01, - 0x1a, 0x77, 0x0a, 0x15, 0x54, 0x69, 0x6d, 0x65, 0x72, 0x46, 0x61, 0x6d, 0x69, 0x6c, 0x79, 0x53, - 0x70, 0x65, 0x63, 0x73, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x12, 0x10, 0x0a, 0x03, 0x6b, 0x65, 0x79, - 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x6b, 0x65, 0x79, 0x12, 0x48, 0x0a, 0x05, 0x76, - 0x61, 0x6c, 0x75, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x32, 0x2e, 0x6f, 0x72, 0x67, - 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, - 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, - 0x69, 0x6d, 0x65, 0x72, 0x46, 0x61, 0x6d, 0x69, 0x6c, 0x79, 0x53, 0x70, 0x65, 0x63, 0x52, 0x05, - 0x76, 0x61, 0x6c, 0x75, 0x65, 0x3a, 0x02, 0x38, 0x01, 0x4a, 0x04, 0x08, 0x06, 0x10, 0x07, 0x22, - 0xb8, 0x04, 0x0a, 0x09, 0x53, 0x74, 0x61, 0x74, 0x65, 0x53, 0x70, 0x65, 0x63, 0x12, 0x72, 0x0a, - 0x16, 0x72, 0x65, 0x61, 0x64, 0x5f, 0x6d, 0x6f, 0x64, 0x69, 0x66, 0x79, 0x5f, 0x77, 0x72, 0x69, - 0x74, 0x65, 0x5f, 0x73, 0x70, 0x65, 0x63, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x3b, 0x2e, + 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x49, 0x73, 0x42, 0x6f, 0x75, 0x6e, + 0x64, 0x65, 0x64, 0x2e, 0x45, 0x6e, 0x75, 0x6d, 0x52, 0x09, 0x69, 0x73, 0x42, 0x6f, 0x75, 0x6e, + 0x64, 0x65, 0x64, 0x12, 0x32, 0x0a, 0x15, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x69, 0x6e, 0x67, + 0x5f, 0x73, 0x74, 0x72, 0x61, 0x74, 0x65, 0x67, 0x79, 0x5f, 0x69, 0x64, 0x18, 0x04, 0x20, 0x01, + 0x28, 0x09, 0x52, 0x13, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x69, 0x6e, 0x67, 0x53, 0x74, 0x72, + 0x61, 0x74, 0x65, 0x67, 0x79, 0x49, 0x64, 0x12, 0x51, 0x0a, 0x0c, 0x64, 0x69, 0x73, 0x70, 0x6c, + 0x61, 0x79, 0x5f, 0x64, 0x61, 0x74, 0x61, 0x18, 0x05, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x2e, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, - 0x31, 0x2e, 0x52, 0x65, 0x61, 0x64, 0x4d, 0x6f, 0x64, 0x69, 0x66, 0x79, 0x57, 0x72, 0x69, 0x74, - 0x65, 0x53, 0x74, 0x61, 0x74, 0x65, 0x53, 0x70, 0x65, 0x63, 0x48, 0x00, 0x52, 0x13, 0x72, 0x65, - 0x61, 0x64, 0x4d, 0x6f, 0x64, 0x69, 0x66, 0x79, 0x57, 0x72, 0x69, 0x74, 0x65, 0x53, 0x70, 0x65, - 0x63, 0x12, 0x4c, 0x0a, 0x08, 0x62, 0x61, 0x67, 0x5f, 0x73, 0x70, 0x65, 0x63, 0x18, 0x02, 0x20, - 0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, - 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, - 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x42, 0x61, 0x67, 0x53, 0x74, 0x61, 0x74, 0x65, - 0x53, 0x70, 0x65, 0x63, 0x48, 0x00, 0x52, 0x07, 0x62, 
0x61, 0x67, 0x53, 0x70, 0x65, 0x63, 0x12, - 0x5e, 0x0a, 0x0e, 0x63, 0x6f, 0x6d, 0x62, 0x69, 0x6e, 0x69, 0x6e, 0x67, 0x5f, 0x73, 0x70, 0x65, - 0x63, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x35, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, - 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, - 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x43, 0x6f, 0x6d, 0x62, - 0x69, 0x6e, 0x69, 0x6e, 0x67, 0x53, 0x74, 0x61, 0x74, 0x65, 0x53, 0x70, 0x65, 0x63, 0x48, 0x00, - 0x52, 0x0d, 0x63, 0x6f, 0x6d, 0x62, 0x69, 0x6e, 0x69, 0x6e, 0x67, 0x53, 0x70, 0x65, 0x63, 0x12, - 0x4c, 0x0a, 0x08, 0x6d, 0x61, 0x70, 0x5f, 0x73, 0x70, 0x65, 0x63, 0x18, 0x04, 0x20, 0x01, 0x28, - 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, - 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, - 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x4d, 0x61, 0x70, 0x53, 0x74, 0x61, 0x74, 0x65, 0x53, 0x70, - 0x65, 0x63, 0x48, 0x00, 0x52, 0x07, 0x6d, 0x61, 0x70, 0x53, 0x70, 0x65, 0x63, 0x12, 0x4c, 0x0a, - 0x08, 0x73, 0x65, 0x74, 0x5f, 0x73, 0x70, 0x65, 0x63, 0x18, 0x05, 0x20, 0x01, 0x28, 0x0b, 0x32, - 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, - 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, - 0x2e, 0x76, 0x31, 0x2e, 0x53, 0x65, 0x74, 0x53, 0x74, 0x61, 0x74, 0x65, 0x53, 0x70, 0x65, 0x63, - 0x48, 0x00, 0x52, 0x07, 0x73, 0x65, 0x74, 0x53, 0x70, 0x65, 0x63, 0x12, 0x65, 0x0a, 0x11, 0x6f, - 0x72, 0x64, 0x65, 0x72, 0x65, 0x64, 0x5f, 0x6c, 0x69, 0x73, 0x74, 0x5f, 0x73, 0x70, 0x65, 0x63, - 0x18, 0x06, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x37, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, - 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, - 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x4f, 0x72, 0x64, 0x65, 0x72, - 0x65, 0x64, 0x4c, 0x69, 0x73, 0x74, 0x53, 0x74, 0x61, 0x74, 0x65, 0x53, 0x70, 0x65, 0x63, 0x48, - 0x00, 0x52, 0x0f, 0x6f, 0x72, 0x64, 0x65, 0x72, 0x65, 0x64, 0x4c, 0x69, 0x73, 0x74, 0x53, 0x70, - 0x65, 0x63, 0x42, 0x06, 0x0a, 0x04, 0x73, 0x70, 0x65, 0x63, 0x22, 0x35, 0x0a, 0x18, 0x52, 0x65, - 0x61, 0x64, 0x4d, 0x6f, 0x64, 0x69, 0x66, 0x79, 0x57, 0x72, 0x69, 0x74, 0x65, 0x53, 0x74, 0x61, - 0x74, 0x65, 0x53, 0x70, 0x65, 0x63, 0x12, 0x19, 0x0a, 0x08, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x5f, - 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x07, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x49, - 0x64, 0x22, 0x38, 0x0a, 0x0c, 0x42, 0x61, 0x67, 0x53, 0x74, 0x61, 0x74, 0x65, 0x53, 0x70, 0x65, - 0x63, 0x12, 0x28, 0x0a, 0x10, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x5f, 0x63, 0x6f, 0x64, - 0x65, 0x72, 0x5f, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0e, 0x65, 0x6c, 0x65, - 0x6d, 0x65, 0x6e, 0x74, 0x43, 0x6f, 0x64, 0x65, 0x72, 0x49, 0x64, 0x22, 0x40, 0x0a, 0x14, 0x4f, - 0x72, 0x64, 0x65, 0x72, 0x65, 0x64, 0x4c, 0x69, 0x73, 0x74, 0x53, 0x74, 0x61, 0x74, 0x65, 0x53, - 0x70, 0x65, 0x63, 0x12, 0x28, 0x0a, 0x10, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x5f, 0x63, - 0x6f, 0x64, 0x65, 0x72, 0x5f, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0e, 0x65, - 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x43, 0x6f, 0x64, 0x65, 0x72, 0x49, 0x64, 0x22, 0x96, 0x01, - 0x0a, 0x12, 0x43, 0x6f, 0x6d, 0x62, 0x69, 0x6e, 0x69, 0x6e, 0x67, 0x53, 0x74, 0x61, 0x74, 0x65, - 0x53, 0x70, 0x65, 0x63, 0x12, 0x30, 0x0a, 0x14, 0x61, 0x63, 0x63, 0x75, 0x6d, 
0x75, 0x6c, 0x61, - 0x74, 0x6f, 0x72, 0x5f, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x5f, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, - 0x28, 0x09, 0x52, 0x12, 0x61, 0x63, 0x63, 0x75, 0x6d, 0x75, 0x6c, 0x61, 0x74, 0x6f, 0x72, 0x43, - 0x6f, 0x64, 0x65, 0x72, 0x49, 0x64, 0x12, 0x4e, 0x0a, 0x0a, 0x63, 0x6f, 0x6d, 0x62, 0x69, 0x6e, - 0x65, 0x5f, 0x66, 0x6e, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, 0x67, + 0x31, 0x2e, 0x44, 0x69, 0x73, 0x70, 0x6c, 0x61, 0x79, 0x44, 0x61, 0x74, 0x61, 0x52, 0x0b, 0x64, + 0x69, 0x73, 0x70, 0x6c, 0x61, 0x79, 0x44, 0x61, 0x74, 0x61, 0x22, 0xbe, 0x07, 0x0a, 0x0c, 0x50, + 0x61, 0x72, 0x44, 0x6f, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x44, 0x0a, 0x05, 0x64, + 0x6f, 0x5f, 0x66, 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, - 0x75, 0x6e, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x53, 0x70, 0x65, 0x63, 0x52, 0x09, 0x63, 0x6f, 0x6d, - 0x62, 0x69, 0x6e, 0x65, 0x46, 0x6e, 0x22, 0x56, 0x0a, 0x0c, 0x4d, 0x61, 0x70, 0x53, 0x74, 0x61, - 0x74, 0x65, 0x53, 0x70, 0x65, 0x63, 0x12, 0x20, 0x0a, 0x0c, 0x6b, 0x65, 0x79, 0x5f, 0x63, 0x6f, - 0x64, 0x65, 0x72, 0x5f, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0a, 0x6b, 0x65, - 0x79, 0x43, 0x6f, 0x64, 0x65, 0x72, 0x49, 0x64, 0x12, 0x24, 0x0a, 0x0e, 0x76, 0x61, 0x6c, 0x75, - 0x65, 0x5f, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x5f, 0x69, 0x64, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, - 0x52, 0x0c, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x43, 0x6f, 0x64, 0x65, 0x72, 0x49, 0x64, 0x22, 0x38, - 0x0a, 0x0c, 0x53, 0x65, 0x74, 0x53, 0x74, 0x61, 0x74, 0x65, 0x53, 0x70, 0x65, 0x63, 0x12, 0x28, - 0x0a, 0x10, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x5f, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x5f, - 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0e, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, - 0x74, 0x43, 0x6f, 0x64, 0x65, 0x72, 0x49, 0x64, 0x22, 0x99, 0x01, 0x0a, 0x0f, 0x54, 0x69, 0x6d, - 0x65, 0x72, 0x46, 0x61, 0x6d, 0x69, 0x6c, 0x79, 0x53, 0x70, 0x65, 0x63, 0x12, 0x53, 0x0a, 0x0b, - 0x74, 0x69, 0x6d, 0x65, 0x5f, 0x64, 0x6f, 0x6d, 0x61, 0x69, 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, - 0x0e, 0x32, 0x32, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, + 0x75, 0x6e, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x53, 0x70, 0x65, 0x63, 0x52, 0x04, 0x64, 0x6f, 0x46, + 0x6e, 0x12, 0x60, 0x0a, 0x0b, 0x73, 0x69, 0x64, 0x65, 0x5f, 0x69, 0x6e, 0x70, 0x75, 0x74, 0x73, + 0x18, 0x03, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x3f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, + 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, + 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x50, 0x61, 0x72, 0x44, 0x6f, + 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x2e, 0x53, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, + 0x74, 0x73, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x52, 0x0a, 0x73, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, + 0x75, 0x74, 0x73, 0x12, 0x60, 0x0a, 0x0b, 0x73, 0x74, 0x61, 0x74, 0x65, 0x5f, 0x73, 0x70, 0x65, + 0x63, 0x73, 0x18, 0x04, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x3f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, + 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, + 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x50, 0x61, 0x72, + 0x44, 0x6f, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x2e, 0x53, 0x74, 0x61, 0x74, 0x65, 0x53, + 0x70, 
0x65, 0x63, 0x73, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x52, 0x0a, 0x73, 0x74, 0x61, 0x74, 0x65, + 0x53, 0x70, 0x65, 0x63, 0x73, 0x12, 0x73, 0x0a, 0x12, 0x74, 0x69, 0x6d, 0x65, 0x72, 0x5f, 0x66, + 0x61, 0x6d, 0x69, 0x6c, 0x79, 0x5f, 0x73, 0x70, 0x65, 0x63, 0x73, 0x18, 0x09, 0x20, 0x03, 0x28, + 0x0b, 0x32, 0x45, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, - 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x69, 0x6d, 0x65, 0x44, 0x6f, 0x6d, 0x61, 0x69, 0x6e, - 0x2e, 0x45, 0x6e, 0x75, 0x6d, 0x52, 0x0a, 0x74, 0x69, 0x6d, 0x65, 0x44, 0x6f, 0x6d, 0x61, 0x69, - 0x6e, 0x12, 0x31, 0x0a, 0x15, 0x74, 0x69, 0x6d, 0x65, 0x72, 0x5f, 0x66, 0x61, 0x6d, 0x69, 0x6c, - 0x79, 0x5f, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x5f, 0x69, 0x64, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, - 0x52, 0x12, 0x74, 0x69, 0x6d, 0x65, 0x72, 0x46, 0x61, 0x6d, 0x69, 0x6c, 0x79, 0x43, 0x6f, 0x64, - 0x65, 0x72, 0x49, 0x64, 0x22, 0x40, 0x0a, 0x09, 0x49, 0x73, 0x42, 0x6f, 0x75, 0x6e, 0x64, 0x65, - 0x64, 0x22, 0x33, 0x0a, 0x04, 0x45, 0x6e, 0x75, 0x6d, 0x12, 0x0f, 0x0a, 0x0b, 0x55, 0x4e, 0x53, - 0x50, 0x45, 0x43, 0x49, 0x46, 0x49, 0x45, 0x44, 0x10, 0x00, 0x12, 0x0d, 0x0a, 0x09, 0x55, 0x4e, - 0x42, 0x4f, 0x55, 0x4e, 0x44, 0x45, 0x44, 0x10, 0x01, 0x12, 0x0b, 0x0a, 0x07, 0x42, 0x4f, 0x55, - 0x4e, 0x44, 0x45, 0x44, 0x10, 0x02, 0x22, 0xa8, 0x01, 0x0a, 0x0b, 0x52, 0x65, 0x61, 0x64, 0x50, - 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x47, 0x0a, 0x06, 0x73, 0x6f, 0x75, 0x72, 0x63, 0x65, - 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, + 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x50, 0x61, 0x72, 0x44, 0x6f, 0x50, 0x61, 0x79, 0x6c, 0x6f, + 0x61, 0x64, 0x2e, 0x54, 0x69, 0x6d, 0x65, 0x72, 0x46, 0x61, 0x6d, 0x69, 0x6c, 0x79, 0x53, 0x70, + 0x65, 0x63, 0x73, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x52, 0x10, 0x74, 0x69, 0x6d, 0x65, 0x72, 0x46, + 0x61, 0x6d, 0x69, 0x6c, 0x79, 0x53, 0x70, 0x65, 0x63, 0x73, 0x12, 0x30, 0x0a, 0x14, 0x72, 0x65, + 0x73, 0x74, 0x72, 0x69, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x5f, + 0x69, 0x64, 0x18, 0x07, 0x20, 0x01, 0x28, 0x09, 0x52, 0x12, 0x72, 0x65, 0x73, 0x74, 0x72, 0x69, + 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x43, 0x6f, 0x64, 0x65, 0x72, 0x49, 0x64, 0x12, 0x33, 0x0a, 0x15, + 0x72, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x73, 0x5f, 0x66, 0x69, 0x6e, 0x61, 0x6c, 0x69, 0x7a, + 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x18, 0x08, 0x20, 0x01, 0x28, 0x08, 0x52, 0x14, 0x72, 0x65, 0x71, + 0x75, 0x65, 0x73, 0x74, 0x73, 0x46, 0x69, 0x6e, 0x61, 0x6c, 0x69, 0x7a, 0x61, 0x74, 0x69, 0x6f, + 0x6e, 0x12, 0x3b, 0x0a, 0x1a, 0x72, 0x65, 0x71, 0x75, 0x69, 0x72, 0x65, 0x73, 0x5f, 0x74, 0x69, + 0x6d, 0x65, 0x5f, 0x73, 0x6f, 0x72, 0x74, 0x65, 0x64, 0x5f, 0x69, 0x6e, 0x70, 0x75, 0x74, 0x18, + 0x0a, 0x20, 0x01, 0x28, 0x08, 0x52, 0x17, 0x72, 0x65, 0x71, 0x75, 0x69, 0x72, 0x65, 0x73, 0x54, + 0x69, 0x6d, 0x65, 0x53, 0x6f, 0x72, 0x74, 0x65, 0x64, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x12, 0x32, + 0x0a, 0x15, 0x72, 0x65, 0x71, 0x75, 0x69, 0x72, 0x65, 0x73, 0x5f, 0x73, 0x74, 0x61, 0x62, 0x6c, + 0x65, 0x5f, 0x69, 0x6e, 0x70, 0x75, 0x74, 0x18, 0x0b, 0x20, 0x01, 0x28, 0x08, 0x52, 0x13, 0x72, + 0x65, 0x71, 0x75, 0x69, 0x72, 0x65, 0x73, 0x53, 0x74, 0x61, 0x62, 0x6c, 0x65, 0x49, 0x6e, 0x70, + 0x75, 0x74, 0x1a, 0x6b, 0x0a, 0x0f, 0x53, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x73, + 0x45, 0x6e, 0x74, 0x72, 0x79, 0x12, 0x10, 0x0a, 0x03, 0x6b, 0x65, 0x79, 0x18, 0x01, 0x20, 0x01, + 0x28, 0x09, 0x52, 0x03, 0x6b, 
0x65, 0x79, 0x12, 0x42, 0x0a, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, + 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2c, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, - 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x75, 0x6e, 0x63, 0x74, - 0x69, 0x6f, 0x6e, 0x53, 0x70, 0x65, 0x63, 0x52, 0x06, 0x73, 0x6f, 0x75, 0x72, 0x63, 0x65, 0x12, - 0x50, 0x0a, 0x0a, 0x69, 0x73, 0x5f, 0x62, 0x6f, 0x75, 0x6e, 0x64, 0x65, 0x64, 0x18, 0x02, 0x20, - 0x01, 0x28, 0x0e, 0x32, 0x31, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, + 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x53, 0x69, 0x64, 0x65, 0x49, + 0x6e, 0x70, 0x75, 0x74, 0x52, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x3a, 0x02, 0x38, 0x01, 0x1a, + 0x6b, 0x0a, 0x0f, 0x53, 0x74, 0x61, 0x74, 0x65, 0x53, 0x70, 0x65, 0x63, 0x73, 0x45, 0x6e, 0x74, + 0x72, 0x79, 0x12, 0x10, 0x0a, 0x03, 0x6b, 0x65, 0x79, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, + 0x03, 0x6b, 0x65, 0x79, 0x12, 0x42, 0x0a, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x18, 0x02, 0x20, + 0x01, 0x28, 0x0b, 0x32, 0x2c, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, - 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x49, 0x73, 0x42, 0x6f, 0x75, 0x6e, 0x64, 0x65, - 0x64, 0x2e, 0x45, 0x6e, 0x75, 0x6d, 0x52, 0x09, 0x69, 0x73, 0x42, 0x6f, 0x75, 0x6e, 0x64, 0x65, - 0x64, 0x22, 0x61, 0x0a, 0x11, 0x57, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x49, 0x6e, 0x74, 0x6f, 0x50, - 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x4c, 0x0a, 0x09, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, - 0x5f, 0x66, 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, + 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x53, 0x74, 0x61, 0x74, 0x65, 0x53, 0x70, 0x65, + 0x63, 0x52, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x3a, 0x02, 0x38, 0x01, 0x1a, 0x77, 0x0a, 0x15, + 0x54, 0x69, 0x6d, 0x65, 0x72, 0x46, 0x61, 0x6d, 0x69, 0x6c, 0x79, 0x53, 0x70, 0x65, 0x63, 0x73, + 0x45, 0x6e, 0x74, 0x72, 0x79, 0x12, 0x10, 0x0a, 0x03, 0x6b, 0x65, 0x79, 0x18, 0x01, 0x20, 0x01, + 0x28, 0x09, 0x52, 0x03, 0x6b, 0x65, 0x79, 0x12, 0x48, 0x0a, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, + 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x32, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, + 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, + 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x69, 0x6d, 0x65, 0x72, + 0x46, 0x61, 0x6d, 0x69, 0x6c, 0x79, 0x53, 0x70, 0x65, 0x63, 0x52, 0x05, 0x76, 0x61, 0x6c, 0x75, + 0x65, 0x3a, 0x02, 0x38, 0x01, 0x4a, 0x04, 0x08, 0x06, 0x10, 0x07, 0x22, 0xb8, 0x04, 0x0a, 0x09, + 0x53, 0x74, 0x61, 0x74, 0x65, 0x53, 0x70, 0x65, 0x63, 0x12, 0x72, 0x0a, 0x16, 0x72, 0x65, 0x61, + 0x64, 0x5f, 0x6d, 0x6f, 0x64, 0x69, 0x66, 0x79, 0x5f, 0x77, 0x72, 0x69, 0x74, 0x65, 0x5f, 0x73, + 0x70, 0x65, 0x63, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x3b, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, - 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x75, - 0x6e, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x53, 0x70, 0x65, 0x63, 0x52, 0x08, 0x77, 0x69, 0x6e, 0x64, - 0x6f, 0x77, 0x46, 0x6e, 0x22, 0x92, 0x01, 0x0a, 0x0e, 0x43, 0x6f, 0x6d, 0x62, 0x69, 0x6e, 0x65, - 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x4e, 0x0a, 
0x0a, 0x63, 0x6f, 0x6d, 0x62, 0x69, - 0x6e, 0x65, 0x5f, 0x66, 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, + 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x52, 0x65, + 0x61, 0x64, 0x4d, 0x6f, 0x64, 0x69, 0x66, 0x79, 0x57, 0x72, 0x69, 0x74, 0x65, 0x53, 0x74, 0x61, + 0x74, 0x65, 0x53, 0x70, 0x65, 0x63, 0x48, 0x00, 0x52, 0x13, 0x72, 0x65, 0x61, 0x64, 0x4d, 0x6f, + 0x64, 0x69, 0x66, 0x79, 0x57, 0x72, 0x69, 0x74, 0x65, 0x53, 0x70, 0x65, 0x63, 0x12, 0x4c, 0x0a, + 0x08, 0x62, 0x61, 0x67, 0x5f, 0x73, 0x70, 0x65, 0x63, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, + 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, + 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, + 0x2e, 0x76, 0x31, 0x2e, 0x42, 0x61, 0x67, 0x53, 0x74, 0x61, 0x74, 0x65, 0x53, 0x70, 0x65, 0x63, + 0x48, 0x00, 0x52, 0x07, 0x62, 0x61, 0x67, 0x53, 0x70, 0x65, 0x63, 0x12, 0x5e, 0x0a, 0x0e, 0x63, + 0x6f, 0x6d, 0x62, 0x69, 0x6e, 0x69, 0x6e, 0x67, 0x5f, 0x73, 0x70, 0x65, 0x63, 0x18, 0x03, 0x20, + 0x01, 0x28, 0x0b, 0x32, 0x35, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, + 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, + 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x43, 0x6f, 0x6d, 0x62, 0x69, 0x6e, 0x69, 0x6e, + 0x67, 0x53, 0x74, 0x61, 0x74, 0x65, 0x53, 0x70, 0x65, 0x63, 0x48, 0x00, 0x52, 0x0d, 0x63, 0x6f, + 0x6d, 0x62, 0x69, 0x6e, 0x69, 0x6e, 0x67, 0x53, 0x70, 0x65, 0x63, 0x12, 0x4c, 0x0a, 0x08, 0x6d, + 0x61, 0x70, 0x5f, 0x73, 0x70, 0x65, 0x63, 0x18, 0x04, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, + 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, + 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, + 0x31, 0x2e, 0x4d, 0x61, 0x70, 0x53, 0x74, 0x61, 0x74, 0x65, 0x53, 0x70, 0x65, 0x63, 0x48, 0x00, + 0x52, 0x07, 0x6d, 0x61, 0x70, 0x53, 0x70, 0x65, 0x63, 0x12, 0x4c, 0x0a, 0x08, 0x73, 0x65, 0x74, + 0x5f, 0x73, 0x70, 0x65, 0x63, 0x18, 0x05, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, - 0x46, 0x75, 0x6e, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x53, 0x70, 0x65, 0x63, 0x52, 0x09, 0x63, 0x6f, - 0x6d, 0x62, 0x69, 0x6e, 0x65, 0x46, 0x6e, 0x12, 0x30, 0x0a, 0x14, 0x61, 0x63, 0x63, 0x75, 0x6d, - 0x75, 0x6c, 0x61, 0x74, 0x6f, 0x72, 0x5f, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x5f, 0x69, 0x64, 0x18, - 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x12, 0x61, 0x63, 0x63, 0x75, 0x6d, 0x75, 0x6c, 0x61, 0x74, - 0x6f, 0x72, 0x43, 0x6f, 0x64, 0x65, 0x72, 0x49, 0x64, 0x22, 0xcd, 0x07, 0x0a, 0x11, 0x54, 0x65, - 0x73, 0x74, 0x53, 0x74, 0x72, 0x65, 0x61, 0x6d, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, - 0x19, 0x0a, 0x08, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x5f, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, - 0x09, 0x52, 0x07, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x49, 0x64, 0x12, 0x52, 0x0a, 0x06, 0x65, 0x76, - 0x65, 0x6e, 0x74, 0x73, 0x18, 0x02, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x3a, 0x2e, 0x6f, 0x72, 0x67, - 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, - 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, - 0x65, 0x73, 0x74, 0x53, 0x74, 0x72, 0x65, 0x61, 0x6d, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 
0x64, - 0x2e, 0x45, 0x76, 0x65, 0x6e, 0x74, 0x52, 0x06, 0x65, 0x76, 0x65, 0x6e, 0x74, 0x73, 0x12, 0x53, - 0x0a, 0x08, 0x65, 0x6e, 0x64, 0x70, 0x6f, 0x69, 0x6e, 0x74, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0b, - 0x32, 0x37, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, - 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, - 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x41, 0x70, 0x69, 0x53, 0x65, 0x72, 0x76, 0x69, 0x63, 0x65, 0x44, - 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x6f, 0x72, 0x52, 0x08, 0x65, 0x6e, 0x64, 0x70, 0x6f, - 0x69, 0x6e, 0x74, 0x1a, 0x96, 0x05, 0x0a, 0x05, 0x45, 0x76, 0x65, 0x6e, 0x74, 0x12, 0x76, 0x0a, - 0x0f, 0x77, 0x61, 0x74, 0x65, 0x72, 0x6d, 0x61, 0x72, 0x6b, 0x5f, 0x65, 0x76, 0x65, 0x6e, 0x74, - 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x4b, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, + 0x53, 0x65, 0x74, 0x53, 0x74, 0x61, 0x74, 0x65, 0x53, 0x70, 0x65, 0x63, 0x48, 0x00, 0x52, 0x07, + 0x73, 0x65, 0x74, 0x53, 0x70, 0x65, 0x63, 0x12, 0x65, 0x0a, 0x11, 0x6f, 0x72, 0x64, 0x65, 0x72, + 0x65, 0x64, 0x5f, 0x6c, 0x69, 0x73, 0x74, 0x5f, 0x73, 0x70, 0x65, 0x63, 0x18, 0x06, 0x20, 0x01, + 0x28, 0x0b, 0x32, 0x37, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, + 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, + 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x4f, 0x72, 0x64, 0x65, 0x72, 0x65, 0x64, 0x4c, 0x69, + 0x73, 0x74, 0x53, 0x74, 0x61, 0x74, 0x65, 0x53, 0x70, 0x65, 0x63, 0x48, 0x00, 0x52, 0x0f, 0x6f, + 0x72, 0x64, 0x65, 0x72, 0x65, 0x64, 0x4c, 0x69, 0x73, 0x74, 0x53, 0x70, 0x65, 0x63, 0x42, 0x06, + 0x0a, 0x04, 0x73, 0x70, 0x65, 0x63, 0x22, 0x35, 0x0a, 0x18, 0x52, 0x65, 0x61, 0x64, 0x4d, 0x6f, + 0x64, 0x69, 0x66, 0x79, 0x57, 0x72, 0x69, 0x74, 0x65, 0x53, 0x74, 0x61, 0x74, 0x65, 0x53, 0x70, + 0x65, 0x63, 0x12, 0x19, 0x0a, 0x08, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x5f, 0x69, 0x64, 0x18, 0x01, + 0x20, 0x01, 0x28, 0x09, 0x52, 0x07, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x49, 0x64, 0x22, 0x38, 0x0a, + 0x0c, 0x42, 0x61, 0x67, 0x53, 0x74, 0x61, 0x74, 0x65, 0x53, 0x70, 0x65, 0x63, 0x12, 0x28, 0x0a, + 0x10, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x5f, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x5f, 0x69, + 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0e, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, + 0x43, 0x6f, 0x64, 0x65, 0x72, 0x49, 0x64, 0x22, 0x40, 0x0a, 0x14, 0x4f, 0x72, 0x64, 0x65, 0x72, + 0x65, 0x64, 0x4c, 0x69, 0x73, 0x74, 0x53, 0x74, 0x61, 0x74, 0x65, 0x53, 0x70, 0x65, 0x63, 0x12, + 0x28, 0x0a, 0x10, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x5f, 0x63, 0x6f, 0x64, 0x65, 0x72, + 0x5f, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0e, 0x65, 0x6c, 0x65, 0x6d, 0x65, + 0x6e, 0x74, 0x43, 0x6f, 0x64, 0x65, 0x72, 0x49, 0x64, 0x22, 0x96, 0x01, 0x0a, 0x12, 0x43, 0x6f, + 0x6d, 0x62, 0x69, 0x6e, 0x69, 0x6e, 0x67, 0x53, 0x74, 0x61, 0x74, 0x65, 0x53, 0x70, 0x65, 0x63, + 0x12, 0x30, 0x0a, 0x14, 0x61, 0x63, 0x63, 0x75, 0x6d, 0x75, 0x6c, 0x61, 0x74, 0x6f, 0x72, 0x5f, + 0x63, 0x6f, 0x64, 0x65, 0x72, 0x5f, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x12, + 0x61, 0x63, 0x63, 0x75, 0x6d, 0x75, 0x6c, 0x61, 0x74, 0x6f, 0x72, 0x43, 0x6f, 0x64, 0x65, 0x72, + 0x49, 0x64, 0x12, 0x4e, 0x0a, 0x0a, 0x63, 0x6f, 0x6d, 0x62, 0x69, 0x6e, 0x65, 0x5f, 0x66, 0x6e, + 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, - 0x69, 0x70, 0x65, 
0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x65, 0x73, 0x74, 0x53, - 0x74, 0x72, 0x65, 0x61, 0x6d, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x2e, 0x45, 0x76, 0x65, - 0x6e, 0x74, 0x2e, 0x41, 0x64, 0x76, 0x61, 0x6e, 0x63, 0x65, 0x57, 0x61, 0x74, 0x65, 0x72, 0x6d, - 0x61, 0x72, 0x6b, 0x48, 0x00, 0x52, 0x0e, 0x77, 0x61, 0x74, 0x65, 0x72, 0x6d, 0x61, 0x72, 0x6b, - 0x45, 0x76, 0x65, 0x6e, 0x74, 0x12, 0x86, 0x01, 0x0a, 0x15, 0x70, 0x72, 0x6f, 0x63, 0x65, 0x73, - 0x73, 0x69, 0x6e, 0x67, 0x5f, 0x74, 0x69, 0x6d, 0x65, 0x5f, 0x65, 0x76, 0x65, 0x6e, 0x74, 0x18, - 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x50, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, - 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, - 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x65, 0x73, 0x74, 0x53, 0x74, - 0x72, 0x65, 0x61, 0x6d, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x2e, 0x45, 0x76, 0x65, 0x6e, - 0x74, 0x2e, 0x41, 0x64, 0x76, 0x61, 0x6e, 0x63, 0x65, 0x50, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, - 0x69, 0x6e, 0x67, 0x54, 0x69, 0x6d, 0x65, 0x48, 0x00, 0x52, 0x13, 0x70, 0x72, 0x6f, 0x63, 0x65, - 0x73, 0x73, 0x69, 0x6e, 0x67, 0x54, 0x69, 0x6d, 0x65, 0x45, 0x76, 0x65, 0x6e, 0x74, 0x12, 0x6d, - 0x0a, 0x0d, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x5f, 0x65, 0x76, 0x65, 0x6e, 0x74, 0x18, - 0x03, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x46, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, - 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, - 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x65, 0x73, 0x74, 0x53, 0x74, - 0x72, 0x65, 0x61, 0x6d, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x2e, 0x45, 0x76, 0x65, 0x6e, - 0x74, 0x2e, 0x41, 0x64, 0x64, 0x45, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x73, 0x48, 0x00, 0x52, - 0x0c, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x45, 0x76, 0x65, 0x6e, 0x74, 0x1a, 0x49, 0x0a, - 0x10, 0x41, 0x64, 0x76, 0x61, 0x6e, 0x63, 0x65, 0x57, 0x61, 0x74, 0x65, 0x72, 0x6d, 0x61, 0x72, - 0x6b, 0x12, 0x23, 0x0a, 0x0d, 0x6e, 0x65, 0x77, 0x5f, 0x77, 0x61, 0x74, 0x65, 0x72, 0x6d, 0x61, - 0x72, 0x6b, 0x18, 0x01, 0x20, 0x01, 0x28, 0x03, 0x52, 0x0c, 0x6e, 0x65, 0x77, 0x57, 0x61, 0x74, - 0x65, 0x72, 0x6d, 0x61, 0x72, 0x6b, 0x12, 0x10, 0x0a, 0x03, 0x74, 0x61, 0x67, 0x18, 0x02, 0x20, - 0x01, 0x28, 0x09, 0x52, 0x03, 0x74, 0x61, 0x67, 0x1a, 0x42, 0x0a, 0x15, 0x41, 0x64, 0x76, 0x61, - 0x6e, 0x63, 0x65, 0x50, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x69, 0x6e, 0x67, 0x54, 0x69, 0x6d, - 0x65, 0x12, 0x29, 0x0a, 0x10, 0x61, 0x64, 0x76, 0x61, 0x6e, 0x63, 0x65, 0x5f, 0x64, 0x75, 0x72, - 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, 0x03, 0x52, 0x0f, 0x61, 0x64, 0x76, - 0x61, 0x6e, 0x63, 0x65, 0x44, 0x75, 0x72, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x1a, 0x84, 0x01, 0x0a, - 0x0b, 0x41, 0x64, 0x64, 0x45, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x73, 0x12, 0x63, 0x0a, 0x08, - 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x73, 0x18, 0x01, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x47, - 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, - 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, - 0x76, 0x31, 0x2e, 0x54, 0x65, 0x73, 0x74, 0x53, 0x74, 0x72, 0x65, 0x61, 0x6d, 0x50, 0x61, 0x79, - 0x6c, 0x6f, 0x61, 0x64, 0x2e, 0x54, 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, 0x65, 0x64, - 0x45, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x52, 0x08, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, - 0x73, 0x12, 0x10, 0x0a, 0x03, 0x74, 0x61, 
0x67, 0x18, 0x03, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, - 0x74, 0x61, 0x67, 0x42, 0x07, 0x0a, 0x05, 0x65, 0x76, 0x65, 0x6e, 0x74, 0x1a, 0x5b, 0x0a, 0x12, - 0x54, 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, 0x65, 0x64, 0x45, 0x6c, 0x65, 0x6d, 0x65, - 0x6e, 0x74, 0x12, 0x27, 0x0a, 0x0f, 0x65, 0x6e, 0x63, 0x6f, 0x64, 0x65, 0x64, 0x5f, 0x65, 0x6c, - 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x0e, 0x65, 0x6e, 0x63, - 0x6f, 0x64, 0x65, 0x64, 0x45, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x12, 0x1c, 0x0a, 0x09, 0x74, - 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, 0x18, 0x02, 0x20, 0x01, 0x28, 0x03, 0x52, 0x09, - 0x74, 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, 0x22, 0x2e, 0x0a, 0x0d, 0x45, 0x76, 0x65, - 0x6e, 0x74, 0x73, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x12, 0x1d, 0x0a, 0x0a, 0x6f, 0x75, - 0x74, 0x70, 0x75, 0x74, 0x5f, 0x69, 0x64, 0x73, 0x18, 0x01, 0x20, 0x03, 0x28, 0x09, 0x52, 0x09, - 0x6f, 0x75, 0x74, 0x70, 0x75, 0x74, 0x49, 0x64, 0x73, 0x22, 0xed, 0x03, 0x0a, 0x11, 0x57, 0x72, - 0x69, 0x74, 0x65, 0x46, 0x69, 0x6c, 0x65, 0x73, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, - 0x43, 0x0a, 0x04, 0x73, 0x69, 0x6e, 0x6b, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, - 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, - 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, - 0x31, 0x2e, 0x46, 0x75, 0x6e, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x53, 0x70, 0x65, 0x63, 0x52, 0x04, - 0x73, 0x69, 0x6e, 0x6b, 0x12, 0x58, 0x0a, 0x0f, 0x66, 0x6f, 0x72, 0x6d, 0x61, 0x74, 0x5f, 0x66, - 0x75, 0x6e, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, + 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x75, 0x6e, 0x63, 0x74, + 0x69, 0x6f, 0x6e, 0x53, 0x70, 0x65, 0x63, 0x52, 0x09, 0x63, 0x6f, 0x6d, 0x62, 0x69, 0x6e, 0x65, + 0x46, 0x6e, 0x22, 0x56, 0x0a, 0x0c, 0x4d, 0x61, 0x70, 0x53, 0x74, 0x61, 0x74, 0x65, 0x53, 0x70, + 0x65, 0x63, 0x12, 0x20, 0x0a, 0x0c, 0x6b, 0x65, 0x79, 0x5f, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x5f, + 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0a, 0x6b, 0x65, 0x79, 0x43, 0x6f, 0x64, + 0x65, 0x72, 0x49, 0x64, 0x12, 0x24, 0x0a, 0x0e, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x5f, 0x63, 0x6f, + 0x64, 0x65, 0x72, 0x5f, 0x69, 0x64, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0c, 0x76, 0x61, + 0x6c, 0x75, 0x65, 0x43, 0x6f, 0x64, 0x65, 0x72, 0x49, 0x64, 0x22, 0x38, 0x0a, 0x0c, 0x53, 0x65, + 0x74, 0x53, 0x74, 0x61, 0x74, 0x65, 0x53, 0x70, 0x65, 0x63, 0x12, 0x28, 0x0a, 0x10, 0x65, 0x6c, + 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x5f, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x5f, 0x69, 0x64, 0x18, 0x01, + 0x20, 0x01, 0x28, 0x09, 0x52, 0x0e, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x43, 0x6f, 0x64, + 0x65, 0x72, 0x49, 0x64, 0x22, 0x99, 0x01, 0x0a, 0x0f, 0x54, 0x69, 0x6d, 0x65, 0x72, 0x46, 0x61, + 0x6d, 0x69, 0x6c, 0x79, 0x53, 0x70, 0x65, 0x63, 0x12, 0x53, 0x0a, 0x0b, 0x74, 0x69, 0x6d, 0x65, + 0x5f, 0x64, 0x6f, 0x6d, 0x61, 0x69, 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0e, 0x32, 0x32, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, - 0x31, 0x2e, 0x46, 0x75, 0x6e, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x53, 0x70, 0x65, 0x63, 0x52, 0x0e, - 0x66, 0x6f, 0x72, 0x6d, 0x61, 0x74, 0x46, 0x75, 0x6e, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x27, - 0x0a, 0x0f, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x65, 0x64, 0x5f, 0x77, 
0x72, 0x69, 0x74, 0x65, - 0x73, 0x18, 0x03, 0x20, 0x01, 0x28, 0x08, 0x52, 0x0e, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x65, - 0x64, 0x57, 0x72, 0x69, 0x74, 0x65, 0x73, 0x12, 0x3c, 0x0a, 0x1a, 0x72, 0x75, 0x6e, 0x6e, 0x65, - 0x72, 0x5f, 0x64, 0x65, 0x74, 0x65, 0x72, 0x6d, 0x69, 0x6e, 0x65, 0x64, 0x5f, 0x73, 0x68, 0x61, - 0x72, 0x64, 0x69, 0x6e, 0x67, 0x18, 0x04, 0x20, 0x01, 0x28, 0x08, 0x52, 0x18, 0x72, 0x75, 0x6e, - 0x6e, 0x65, 0x72, 0x44, 0x65, 0x74, 0x65, 0x72, 0x6d, 0x69, 0x6e, 0x65, 0x64, 0x53, 0x68, 0x61, - 0x72, 0x64, 0x69, 0x6e, 0x67, 0x12, 0x65, 0x0a, 0x0b, 0x73, 0x69, 0x64, 0x65, 0x5f, 0x69, 0x6e, - 0x70, 0x75, 0x74, 0x73, 0x18, 0x05, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x44, 0x2e, 0x6f, 0x72, 0x67, - 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, - 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x57, - 0x72, 0x69, 0x74, 0x65, 0x46, 0x69, 0x6c, 0x65, 0x73, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, - 0x2e, 0x53, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x73, 0x45, 0x6e, 0x74, 0x72, 0x79, - 0x52, 0x0a, 0x73, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x73, 0x1a, 0x6b, 0x0a, 0x0f, - 0x53, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x73, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x12, - 0x10, 0x0a, 0x03, 0x6b, 0x65, 0x79, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x6b, 0x65, - 0x79, 0x12, 0x42, 0x0a, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, - 0x32, 0x2c, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, - 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, - 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x53, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x52, 0x05, - 0x76, 0x61, 0x6c, 0x75, 0x65, 0x3a, 0x02, 0x38, 0x01, 0x22, 0xca, 0x01, 0x0a, 0x11, 0x50, 0x75, - 0x62, 0x53, 0x75, 0x62, 0x52, 0x65, 0x61, 0x64, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, - 0x14, 0x0a, 0x05, 0x74, 0x6f, 0x70, 0x69, 0x63, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x05, - 0x74, 0x6f, 0x70, 0x69, 0x63, 0x12, 0x22, 0x0a, 0x0c, 0x73, 0x75, 0x62, 0x73, 0x63, 0x72, 0x69, - 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0c, 0x73, 0x75, 0x62, - 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x2f, 0x0a, 0x13, 0x74, 0x69, 0x6d, - 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, 0x5f, 0x61, 0x74, 0x74, 0x72, 0x69, 0x62, 0x75, 0x74, 0x65, - 0x18, 0x03, 0x20, 0x01, 0x28, 0x09, 0x52, 0x12, 0x74, 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, - 0x70, 0x41, 0x74, 0x74, 0x72, 0x69, 0x62, 0x75, 0x74, 0x65, 0x12, 0x21, 0x0a, 0x0c, 0x69, 0x64, - 0x5f, 0x61, 0x74, 0x74, 0x72, 0x69, 0x62, 0x75, 0x74, 0x65, 0x18, 0x04, 0x20, 0x01, 0x28, 0x09, - 0x52, 0x0b, 0x69, 0x64, 0x41, 0x74, 0x74, 0x72, 0x69, 0x62, 0x75, 0x74, 0x65, 0x12, 0x27, 0x0a, - 0x0f, 0x77, 0x69, 0x74, 0x68, 0x5f, 0x61, 0x74, 0x74, 0x72, 0x69, 0x62, 0x75, 0x74, 0x65, 0x73, - 0x18, 0x05, 0x20, 0x01, 0x28, 0x08, 0x52, 0x0e, 0x77, 0x69, 0x74, 0x68, 0x41, 0x74, 0x74, 0x72, - 0x69, 0x62, 0x75, 0x74, 0x65, 0x73, 0x22, 0xa7, 0x01, 0x0a, 0x12, 0x50, 0x75, 0x62, 0x53, 0x75, - 0x62, 0x57, 0x72, 0x69, 0x74, 0x65, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x14, 0x0a, - 0x05, 0x74, 0x6f, 0x70, 0x69, 0x63, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x05, 0x74, 0x6f, - 0x70, 0x69, 0x63, 0x12, 0x2f, 0x0a, 0x13, 0x74, 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, - 0x5f, 0x61, 0x74, 0x74, 0x72, 0x69, 0x62, 0x75, 0x74, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, - 
0x52, 0x12, 0x74, 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, 0x41, 0x74, 0x74, 0x72, 0x69, - 0x62, 0x75, 0x74, 0x65, 0x12, 0x21, 0x0a, 0x0c, 0x69, 0x64, 0x5f, 0x61, 0x74, 0x74, 0x72, 0x69, - 0x62, 0x75, 0x74, 0x65, 0x18, 0x03, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0b, 0x69, 0x64, 0x41, 0x74, - 0x74, 0x72, 0x69, 0x62, 0x75, 0x74, 0x65, 0x12, 0x27, 0x0a, 0x0f, 0x77, 0x69, 0x74, 0x68, 0x5f, - 0x61, 0x74, 0x74, 0x72, 0x69, 0x62, 0x75, 0x74, 0x65, 0x73, 0x18, 0x04, 0x20, 0x01, 0x28, 0x08, - 0x52, 0x0e, 0x77, 0x69, 0x74, 0x68, 0x41, 0x74, 0x74, 0x72, 0x69, 0x62, 0x75, 0x74, 0x65, 0x73, - 0x22, 0x7c, 0x0a, 0x05, 0x43, 0x6f, 0x64, 0x65, 0x72, 0x12, 0x43, 0x0a, 0x04, 0x73, 0x70, 0x65, - 0x63, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, + 0x31, 0x2e, 0x54, 0x69, 0x6d, 0x65, 0x44, 0x6f, 0x6d, 0x61, 0x69, 0x6e, 0x2e, 0x45, 0x6e, 0x75, + 0x6d, 0x52, 0x0a, 0x74, 0x69, 0x6d, 0x65, 0x44, 0x6f, 0x6d, 0x61, 0x69, 0x6e, 0x12, 0x31, 0x0a, + 0x15, 0x74, 0x69, 0x6d, 0x65, 0x72, 0x5f, 0x66, 0x61, 0x6d, 0x69, 0x6c, 0x79, 0x5f, 0x63, 0x6f, + 0x64, 0x65, 0x72, 0x5f, 0x69, 0x64, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x12, 0x74, 0x69, + 0x6d, 0x65, 0x72, 0x46, 0x61, 0x6d, 0x69, 0x6c, 0x79, 0x43, 0x6f, 0x64, 0x65, 0x72, 0x49, 0x64, + 0x22, 0x40, 0x0a, 0x09, 0x49, 0x73, 0x42, 0x6f, 0x75, 0x6e, 0x64, 0x65, 0x64, 0x22, 0x33, 0x0a, + 0x04, 0x45, 0x6e, 0x75, 0x6d, 0x12, 0x0f, 0x0a, 0x0b, 0x55, 0x4e, 0x53, 0x50, 0x45, 0x43, 0x49, + 0x46, 0x49, 0x45, 0x44, 0x10, 0x00, 0x12, 0x0d, 0x0a, 0x09, 0x55, 0x4e, 0x42, 0x4f, 0x55, 0x4e, + 0x44, 0x45, 0x44, 0x10, 0x01, 0x12, 0x0b, 0x0a, 0x07, 0x42, 0x4f, 0x55, 0x4e, 0x44, 0x45, 0x44, + 0x10, 0x02, 0x22, 0xa8, 0x01, 0x0a, 0x0b, 0x52, 0x65, 0x61, 0x64, 0x50, 0x61, 0x79, 0x6c, 0x6f, + 0x61, 0x64, 0x12, 0x47, 0x0a, 0x06, 0x73, 0x6f, 0x75, 0x72, 0x63, 0x65, 0x18, 0x01, 0x20, 0x01, + 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, + 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, + 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x75, 0x6e, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x53, + 0x70, 0x65, 0x63, 0x52, 0x06, 0x73, 0x6f, 0x75, 0x72, 0x63, 0x65, 0x12, 0x50, 0x0a, 0x0a, 0x69, + 0x73, 0x5f, 0x62, 0x6f, 0x75, 0x6e, 0x64, 0x65, 0x64, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0e, 0x32, + 0x31, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, + 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, + 0x2e, 0x76, 0x31, 0x2e, 0x49, 0x73, 0x42, 0x6f, 0x75, 0x6e, 0x64, 0x65, 0x64, 0x2e, 0x45, 0x6e, + 0x75, 0x6d, 0x52, 0x09, 0x69, 0x73, 0x42, 0x6f, 0x75, 0x6e, 0x64, 0x65, 0x64, 0x22, 0x61, 0x0a, + 0x11, 0x57, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x49, 0x6e, 0x74, 0x6f, 0x50, 0x61, 0x79, 0x6c, 0x6f, + 0x61, 0x64, 0x12, 0x4c, 0x0a, 0x09, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x5f, 0x66, 0x6e, 0x18, + 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, + 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, + 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x75, 0x6e, 0x63, 0x74, 0x69, + 0x6f, 0x6e, 0x53, 0x70, 0x65, 0x63, 0x52, 0x08, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x46, 0x6e, + 0x22, 0x92, 0x01, 0x0a, 0x0e, 0x43, 0x6f, 0x6d, 0x62, 0x69, 0x6e, 0x65, 0x50, 0x61, 0x79, 0x6c, + 0x6f, 0x61, 0x64, 0x12, 0x4e, 0x0a, 0x0a, 0x63, 0x6f, 0x6d, 0x62, 0x69, 0x6e, 0x65, 0x5f, 0x66, + 0x6e, 0x18, 0x01, 0x20, 
0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x75, 0x6e, 0x63, - 0x74, 0x69, 0x6f, 0x6e, 0x53, 0x70, 0x65, 0x63, 0x52, 0x04, 0x73, 0x70, 0x65, 0x63, 0x12, 0x2e, - 0x0a, 0x13, 0x63, 0x6f, 0x6d, 0x70, 0x6f, 0x6e, 0x65, 0x6e, 0x74, 0x5f, 0x63, 0x6f, 0x64, 0x65, - 0x72, 0x5f, 0x69, 0x64, 0x73, 0x18, 0x02, 0x20, 0x03, 0x28, 0x09, 0x52, 0x11, 0x63, 0x6f, 0x6d, - 0x70, 0x6f, 0x6e, 0x65, 0x6e, 0x74, 0x43, 0x6f, 0x64, 0x65, 0x72, 0x49, 0x64, 0x73, 0x22, 0xe1, - 0x05, 0x0a, 0x0e, 0x53, 0x74, 0x61, 0x6e, 0x64, 0x61, 0x72, 0x64, 0x43, 0x6f, 0x64, 0x65, 0x72, - 0x73, 0x22, 0xce, 0x05, 0x0a, 0x04, 0x45, 0x6e, 0x75, 0x6d, 0x12, 0x24, 0x0a, 0x05, 0x42, 0x59, - 0x54, 0x45, 0x53, 0x10, 0x00, 0x1a, 0x19, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x13, 0x62, 0x65, 0x61, - 0x6d, 0x3a, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x3a, 0x62, 0x79, 0x74, 0x65, 0x73, 0x3a, 0x76, 0x31, - 0x12, 0x30, 0x0a, 0x0b, 0x53, 0x54, 0x52, 0x49, 0x4e, 0x47, 0x5f, 0x55, 0x54, 0x46, 0x38, 0x10, - 0x0a, 0x1a, 0x1f, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x19, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x63, 0x6f, - 0x64, 0x65, 0x72, 0x3a, 0x73, 0x74, 0x72, 0x69, 0x6e, 0x67, 0x5f, 0x75, 0x74, 0x66, 0x38, 0x3a, - 0x76, 0x31, 0x12, 0x1e, 0x0a, 0x02, 0x4b, 0x56, 0x10, 0x01, 0x1a, 0x16, 0xa2, 0xb4, 0xfa, 0xc2, - 0x05, 0x10, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x3a, 0x6b, 0x76, 0x3a, - 0x76, 0x31, 0x12, 0x22, 0x0a, 0x04, 0x42, 0x4f, 0x4f, 0x4c, 0x10, 0x0c, 0x1a, 0x18, 0xa2, 0xb4, - 0xfa, 0xc2, 0x05, 0x12, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x3a, 0x62, - 0x6f, 0x6f, 0x6c, 0x3a, 0x76, 0x31, 0x12, 0x26, 0x0a, 0x06, 0x56, 0x41, 0x52, 0x49, 0x4e, 0x54, - 0x10, 0x02, 0x1a, 0x1a, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x14, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x63, - 0x6f, 0x64, 0x65, 0x72, 0x3a, 0x76, 0x61, 0x72, 0x69, 0x6e, 0x74, 0x3a, 0x76, 0x31, 0x12, 0x26, - 0x0a, 0x06, 0x44, 0x4f, 0x55, 0x42, 0x4c, 0x45, 0x10, 0x0b, 0x1a, 0x1a, 0xa2, 0xb4, 0xfa, 0xc2, - 0x05, 0x14, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x3a, 0x64, 0x6f, 0x75, - 0x62, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x2a, 0x0a, 0x08, 0x49, 0x54, 0x45, 0x52, 0x41, 0x42, - 0x4c, 0x45, 0x10, 0x03, 0x1a, 0x1c, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x16, 0x62, 0x65, 0x61, 0x6d, - 0x3a, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x3a, 0x69, 0x74, 0x65, 0x72, 0x61, 0x62, 0x6c, 0x65, 0x3a, - 0x76, 0x31, 0x12, 0x24, 0x0a, 0x05, 0x54, 0x49, 0x4d, 0x45, 0x52, 0x10, 0x04, 0x1a, 0x19, 0xa2, - 0xb4, 0xfa, 0xc2, 0x05, 0x13, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x3a, - 0x74, 0x69, 0x6d, 0x65, 0x72, 0x3a, 0x76, 0x31, 0x12, 0x38, 0x0a, 0x0f, 0x49, 0x4e, 0x54, 0x45, - 0x52, 0x56, 0x41, 0x4c, 0x5f, 0x57, 0x49, 0x4e, 0x44, 0x4f, 0x57, 0x10, 0x05, 0x1a, 0x23, 0xa2, - 0xb4, 0xfa, 0xc2, 0x05, 0x1d, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x3a, - 0x69, 0x6e, 0x74, 0x65, 0x72, 0x76, 0x61, 0x6c, 0x5f, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x3a, - 0x76, 0x31, 0x12, 0x34, 0x0a, 0x0d, 0x4c, 0x45, 0x4e, 0x47, 0x54, 0x48, 0x5f, 0x50, 0x52, 0x45, - 0x46, 0x49, 0x58, 0x10, 0x06, 0x1a, 0x21, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1b, 0x62, 0x65, 0x61, - 0x6d, 0x3a, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x3a, 0x6c, 0x65, 0x6e, 0x67, 0x74, 0x68, 0x5f, 0x70, - 0x72, 0x65, 0x66, 0x69, 0x78, 0x3a, 0x76, 0x31, 0x12, 0x34, 0x0a, 0x0d, 0x47, 0x4c, 0x4f, 0x42, - 0x41, 0x4c, 0x5f, 0x57, 0x49, 0x4e, 0x44, 0x4f, 0x57, 
0x10, 0x07, 0x1a, 0x21, 0xa2, 0xb4, 0xfa, - 0xc2, 0x05, 0x1b, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x3a, 0x67, 0x6c, - 0x6f, 0x62, 0x61, 0x6c, 0x5f, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x3a, 0x76, 0x31, 0x12, 0x36, - 0x0a, 0x0e, 0x57, 0x49, 0x4e, 0x44, 0x4f, 0x57, 0x45, 0x44, 0x5f, 0x56, 0x41, 0x4c, 0x55, 0x45, - 0x10, 0x08, 0x1a, 0x22, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1c, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x63, - 0x6f, 0x64, 0x65, 0x72, 0x3a, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x65, 0x64, 0x5f, 0x76, 0x61, - 0x6c, 0x75, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x42, 0x0a, 0x14, 0x50, 0x41, 0x52, 0x41, 0x4d, 0x5f, - 0x57, 0x49, 0x4e, 0x44, 0x4f, 0x57, 0x45, 0x44, 0x5f, 0x56, 0x41, 0x4c, 0x55, 0x45, 0x10, 0x0e, - 0x1a, 0x28, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x22, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x63, 0x6f, 0x64, - 0x65, 0x72, 0x3a, 0x70, 0x61, 0x72, 0x61, 0x6d, 0x5f, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x65, - 0x64, 0x5f, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x44, 0x0a, 0x15, 0x53, 0x54, - 0x41, 0x54, 0x45, 0x5f, 0x42, 0x41, 0x43, 0x4b, 0x45, 0x44, 0x5f, 0x49, 0x54, 0x45, 0x52, 0x41, - 0x42, 0x4c, 0x45, 0x10, 0x09, 0x1a, 0x29, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x23, 0x62, 0x65, 0x61, - 0x6d, 0x3a, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x3a, 0x73, 0x74, 0x61, 0x74, 0x65, 0x5f, 0x62, 0x61, - 0x63, 0x6b, 0x65, 0x64, 0x5f, 0x69, 0x74, 0x65, 0x72, 0x61, 0x62, 0x6c, 0x65, 0x3a, 0x76, 0x31, - 0x12, 0x20, 0x0a, 0x03, 0x52, 0x4f, 0x57, 0x10, 0x0d, 0x1a, 0x17, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, - 0x11, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x3a, 0x72, 0x6f, 0x77, 0x3a, - 0x76, 0x31, 0x22, 0xac, 0x06, 0x0a, 0x11, 0x57, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x69, 0x6e, 0x67, - 0x53, 0x74, 0x72, 0x61, 0x74, 0x65, 0x67, 0x79, 0x12, 0x4c, 0x0a, 0x09, 0x77, 0x69, 0x6e, 0x64, - 0x6f, 0x77, 0x5f, 0x66, 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, - 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, - 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, - 0x46, 0x75, 0x6e, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x53, 0x70, 0x65, 0x63, 0x52, 0x08, 0x77, 0x69, - 0x6e, 0x64, 0x6f, 0x77, 0x46, 0x6e, 0x12, 0x56, 0x0a, 0x0c, 0x6d, 0x65, 0x72, 0x67, 0x65, 0x5f, - 0x73, 0x74, 0x61, 0x74, 0x75, 0x73, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0e, 0x32, 0x33, 0x2e, 0x6f, + 0x74, 0x69, 0x6f, 0x6e, 0x53, 0x70, 0x65, 0x63, 0x52, 0x09, 0x63, 0x6f, 0x6d, 0x62, 0x69, 0x6e, + 0x65, 0x46, 0x6e, 0x12, 0x30, 0x0a, 0x14, 0x61, 0x63, 0x63, 0x75, 0x6d, 0x75, 0x6c, 0x61, 0x74, + 0x6f, 0x72, 0x5f, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x5f, 0x69, 0x64, 0x18, 0x02, 0x20, 0x01, 0x28, + 0x09, 0x52, 0x12, 0x61, 0x63, 0x63, 0x75, 0x6d, 0x75, 0x6c, 0x61, 0x74, 0x6f, 0x72, 0x43, 0x6f, + 0x64, 0x65, 0x72, 0x49, 0x64, 0x22, 0xcd, 0x07, 0x0a, 0x11, 0x54, 0x65, 0x73, 0x74, 0x53, 0x74, + 0x72, 0x65, 0x61, 0x6d, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x19, 0x0a, 0x08, 0x63, + 0x6f, 0x64, 0x65, 0x72, 0x5f, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x07, 0x63, + 0x6f, 0x64, 0x65, 0x72, 0x49, 0x64, 0x12, 0x52, 0x0a, 0x06, 0x65, 0x76, 0x65, 0x6e, 0x74, 0x73, + 0x18, 0x02, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x3a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, + 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, + 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x65, 0x73, 0x74, 0x53, + 0x74, 0x72, 0x65, 0x61, 0x6d, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x2e, 
0x45, 0x76, 0x65, + 0x6e, 0x74, 0x52, 0x06, 0x65, 0x76, 0x65, 0x6e, 0x74, 0x73, 0x12, 0x53, 0x0a, 0x08, 0x65, 0x6e, + 0x64, 0x70, 0x6f, 0x69, 0x6e, 0x74, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x37, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, - 0x2e, 0x4d, 0x65, 0x72, 0x67, 0x65, 0x53, 0x74, 0x61, 0x74, 0x75, 0x73, 0x2e, 0x45, 0x6e, 0x75, - 0x6d, 0x52, 0x0b, 0x6d, 0x65, 0x72, 0x67, 0x65, 0x53, 0x74, 0x61, 0x74, 0x75, 0x73, 0x12, 0x26, - 0x0a, 0x0f, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x5f, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x5f, 0x69, - 0x64, 0x18, 0x03, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0d, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x43, - 0x6f, 0x64, 0x65, 0x72, 0x49, 0x64, 0x12, 0x44, 0x0a, 0x07, 0x74, 0x72, 0x69, 0x67, 0x67, 0x65, - 0x72, 0x18, 0x04, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, - 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, - 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, - 0x67, 0x65, 0x72, 0x52, 0x07, 0x74, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x12, 0x65, 0x0a, 0x11, - 0x61, 0x63, 0x63, 0x75, 0x6d, 0x75, 0x6c, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x6d, 0x6f, 0x64, - 0x65, 0x18, 0x05, 0x20, 0x01, 0x28, 0x0e, 0x32, 0x38, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, - 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, - 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x41, 0x63, 0x63, 0x75, - 0x6d, 0x75, 0x6c, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x4d, 0x6f, 0x64, 0x65, 0x2e, 0x45, 0x6e, 0x75, - 0x6d, 0x52, 0x10, 0x61, 0x63, 0x63, 0x75, 0x6d, 0x75, 0x6c, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x4d, - 0x6f, 0x64, 0x65, 0x12, 0x53, 0x0a, 0x0b, 0x6f, 0x75, 0x74, 0x70, 0x75, 0x74, 0x5f, 0x74, 0x69, - 0x6d, 0x65, 0x18, 0x06, 0x20, 0x01, 0x28, 0x0e, 0x32, 0x32, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, - 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, - 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x4f, 0x75, 0x74, - 0x70, 0x75, 0x74, 0x54, 0x69, 0x6d, 0x65, 0x2e, 0x45, 0x6e, 0x75, 0x6d, 0x52, 0x0a, 0x6f, 0x75, - 0x74, 0x70, 0x75, 0x74, 0x54, 0x69, 0x6d, 0x65, 0x12, 0x62, 0x0a, 0x10, 0x63, 0x6c, 0x6f, 0x73, - 0x69, 0x6e, 0x67, 0x5f, 0x62, 0x65, 0x68, 0x61, 0x76, 0x69, 0x6f, 0x72, 0x18, 0x07, 0x20, 0x01, - 0x28, 0x0e, 0x32, 0x37, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, + 0x2e, 0x41, 0x70, 0x69, 0x53, 0x65, 0x72, 0x76, 0x69, 0x63, 0x65, 0x44, 0x65, 0x73, 0x63, 0x72, + 0x69, 0x70, 0x74, 0x6f, 0x72, 0x52, 0x08, 0x65, 0x6e, 0x64, 0x70, 0x6f, 0x69, 0x6e, 0x74, 0x1a, + 0x96, 0x05, 0x0a, 0x05, 0x45, 0x76, 0x65, 0x6e, 0x74, 0x12, 0x76, 0x0a, 0x0f, 0x77, 0x61, 0x74, + 0x65, 0x72, 0x6d, 0x61, 0x72, 0x6b, 0x5f, 0x65, 0x76, 0x65, 0x6e, 0x74, 0x18, 0x01, 0x20, 0x01, + 0x28, 0x0b, 0x32, 0x4b, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, - 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x43, 0x6c, 0x6f, 0x73, 0x69, 0x6e, 0x67, 0x42, 0x65, - 0x68, 0x61, 0x76, 0x69, 0x6f, 0x72, 0x2e, 0x45, 0x6e, 0x75, 0x6d, 0x52, 0x0f, 0x63, 0x6c, 0x6f, - 0x73, 0x69, 0x6e, 0x67, 0x42, 0x65, 0x68, 0x61, 0x76, 0x69, 0x6f, 0x72, 0x12, 0x29, 0x0a, 0x10, - 0x61, 0x6c, 
0x6c, 0x6f, 0x77, 0x65, 0x64, 0x5f, 0x6c, 0x61, 0x74, 0x65, 0x6e, 0x65, 0x73, 0x73, - 0x18, 0x08, 0x20, 0x01, 0x28, 0x03, 0x52, 0x0f, 0x61, 0x6c, 0x6c, 0x6f, 0x77, 0x65, 0x64, 0x4c, - 0x61, 0x74, 0x65, 0x6e, 0x65, 0x73, 0x73, 0x12, 0x5e, 0x0a, 0x0e, 0x4f, 0x6e, 0x54, 0x69, 0x6d, - 0x65, 0x42, 0x65, 0x68, 0x61, 0x76, 0x69, 0x6f, 0x72, 0x18, 0x09, 0x20, 0x01, 0x28, 0x0e, 0x32, - 0x36, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, - 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, - 0x2e, 0x76, 0x31, 0x2e, 0x4f, 0x6e, 0x54, 0x69, 0x6d, 0x65, 0x42, 0x65, 0x68, 0x61, 0x76, 0x69, - 0x6f, 0x72, 0x2e, 0x45, 0x6e, 0x75, 0x6d, 0x52, 0x0e, 0x4f, 0x6e, 0x54, 0x69, 0x6d, 0x65, 0x42, - 0x65, 0x68, 0x61, 0x76, 0x69, 0x6f, 0x72, 0x12, 0x31, 0x0a, 0x15, 0x61, 0x73, 0x73, 0x69, 0x67, - 0x6e, 0x73, 0x5f, 0x74, 0x6f, 0x5f, 0x6f, 0x6e, 0x65, 0x5f, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, - 0x18, 0x0a, 0x20, 0x01, 0x28, 0x08, 0x52, 0x12, 0x61, 0x73, 0x73, 0x69, 0x67, 0x6e, 0x73, 0x54, - 0x6f, 0x4f, 0x6e, 0x65, 0x57, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x12, 0x25, 0x0a, 0x0e, 0x65, 0x6e, - 0x76, 0x69, 0x72, 0x6f, 0x6e, 0x6d, 0x65, 0x6e, 0x74, 0x5f, 0x69, 0x64, 0x18, 0x0b, 0x20, 0x01, - 0x28, 0x09, 0x52, 0x0d, 0x65, 0x6e, 0x76, 0x69, 0x72, 0x6f, 0x6e, 0x6d, 0x65, 0x6e, 0x74, 0x49, - 0x64, 0x22, 0x5c, 0x0a, 0x0b, 0x4d, 0x65, 0x72, 0x67, 0x65, 0x53, 0x74, 0x61, 0x74, 0x75, 0x73, - 0x22, 0x4d, 0x0a, 0x04, 0x45, 0x6e, 0x75, 0x6d, 0x12, 0x0f, 0x0a, 0x0b, 0x55, 0x4e, 0x53, 0x50, - 0x45, 0x43, 0x49, 0x46, 0x49, 0x45, 0x44, 0x10, 0x00, 0x12, 0x0f, 0x0a, 0x0b, 0x4e, 0x4f, 0x4e, - 0x5f, 0x4d, 0x45, 0x52, 0x47, 0x49, 0x4e, 0x47, 0x10, 0x01, 0x12, 0x0f, 0x0a, 0x0b, 0x4e, 0x45, - 0x45, 0x44, 0x53, 0x5f, 0x4d, 0x45, 0x52, 0x47, 0x45, 0x10, 0x02, 0x12, 0x12, 0x0a, 0x0e, 0x41, - 0x4c, 0x52, 0x45, 0x41, 0x44, 0x59, 0x5f, 0x4d, 0x45, 0x52, 0x47, 0x45, 0x44, 0x10, 0x03, 0x22, - 0x5d, 0x0a, 0x10, 0x41, 0x63, 0x63, 0x75, 0x6d, 0x75, 0x6c, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x4d, - 0x6f, 0x64, 0x65, 0x22, 0x49, 0x0a, 0x04, 0x45, 0x6e, 0x75, 0x6d, 0x12, 0x0f, 0x0a, 0x0b, 0x55, - 0x4e, 0x53, 0x50, 0x45, 0x43, 0x49, 0x46, 0x49, 0x45, 0x44, 0x10, 0x00, 0x12, 0x0e, 0x0a, 0x0a, - 0x44, 0x49, 0x53, 0x43, 0x41, 0x52, 0x44, 0x49, 0x4e, 0x47, 0x10, 0x01, 0x12, 0x10, 0x0a, 0x0c, - 0x41, 0x43, 0x43, 0x55, 0x4d, 0x55, 0x4c, 0x41, 0x54, 0x49, 0x4e, 0x47, 0x10, 0x02, 0x12, 0x0e, - 0x0a, 0x0a, 0x52, 0x45, 0x54, 0x52, 0x41, 0x43, 0x54, 0x49, 0x4e, 0x47, 0x10, 0x03, 0x22, 0x51, - 0x0a, 0x0f, 0x43, 0x6c, 0x6f, 0x73, 0x69, 0x6e, 0x67, 0x42, 0x65, 0x68, 0x61, 0x76, 0x69, 0x6f, - 0x72, 0x22, 0x3e, 0x0a, 0x04, 0x45, 0x6e, 0x75, 0x6d, 0x12, 0x0f, 0x0a, 0x0b, 0x55, 0x4e, 0x53, - 0x50, 0x45, 0x43, 0x49, 0x46, 0x49, 0x45, 0x44, 0x10, 0x00, 0x12, 0x0f, 0x0a, 0x0b, 0x45, 0x4d, - 0x49, 0x54, 0x5f, 0x41, 0x4c, 0x57, 0x41, 0x59, 0x53, 0x10, 0x01, 0x12, 0x14, 0x0a, 0x10, 0x45, - 0x4d, 0x49, 0x54, 0x5f, 0x49, 0x46, 0x5f, 0x4e, 0x4f, 0x4e, 0x45, 0x4d, 0x50, 0x54, 0x59, 0x10, - 0x02, 0x22, 0x50, 0x0a, 0x0e, 0x4f, 0x6e, 0x54, 0x69, 0x6d, 0x65, 0x42, 0x65, 0x68, 0x61, 0x76, - 0x69, 0x6f, 0x72, 0x22, 0x3e, 0x0a, 0x04, 0x45, 0x6e, 0x75, 0x6d, 0x12, 0x0f, 0x0a, 0x0b, 0x55, - 0x4e, 0x53, 0x50, 0x45, 0x43, 0x49, 0x46, 0x49, 0x45, 0x44, 0x10, 0x00, 0x12, 0x0f, 0x0a, 0x0b, - 0x46, 0x49, 0x52, 0x45, 0x5f, 0x41, 0x4c, 0x57, 0x41, 0x59, 0x53, 0x10, 0x01, 0x12, 0x14, 0x0a, - 0x10, 0x46, 0x49, 0x52, 0x45, 0x5f, 0x49, 0x46, 0x5f, 0x4e, 0x4f, 0x4e, 0x45, 0x4d, 0x50, 0x54, - 0x59, 0x10, 0x02, 0x22, 0x62, 0x0a, 
0x0a, 0x4f, 0x75, 0x74, 0x70, 0x75, 0x74, 0x54, 0x69, 0x6d, - 0x65, 0x22, 0x54, 0x0a, 0x04, 0x45, 0x6e, 0x75, 0x6d, 0x12, 0x0f, 0x0a, 0x0b, 0x55, 0x4e, 0x53, - 0x50, 0x45, 0x43, 0x49, 0x46, 0x49, 0x45, 0x44, 0x10, 0x00, 0x12, 0x11, 0x0a, 0x0d, 0x45, 0x4e, - 0x44, 0x5f, 0x4f, 0x46, 0x5f, 0x57, 0x49, 0x4e, 0x44, 0x4f, 0x57, 0x10, 0x01, 0x12, 0x12, 0x0a, - 0x0e, 0x4c, 0x41, 0x54, 0x45, 0x53, 0x54, 0x5f, 0x49, 0x4e, 0x5f, 0x50, 0x41, 0x4e, 0x45, 0x10, - 0x02, 0x12, 0x14, 0x0a, 0x10, 0x45, 0x41, 0x52, 0x4c, 0x49, 0x45, 0x53, 0x54, 0x5f, 0x49, 0x4e, - 0x5f, 0x50, 0x41, 0x4e, 0x45, 0x10, 0x03, 0x22, 0x6c, 0x0a, 0x0a, 0x54, 0x69, 0x6d, 0x65, 0x44, - 0x6f, 0x6d, 0x61, 0x69, 0x6e, 0x22, 0x5e, 0x0a, 0x04, 0x45, 0x6e, 0x75, 0x6d, 0x12, 0x0f, 0x0a, - 0x0b, 0x55, 0x4e, 0x53, 0x50, 0x45, 0x43, 0x49, 0x46, 0x49, 0x45, 0x44, 0x10, 0x00, 0x12, 0x0e, - 0x0a, 0x0a, 0x45, 0x56, 0x45, 0x4e, 0x54, 0x5f, 0x54, 0x49, 0x4d, 0x45, 0x10, 0x01, 0x12, 0x13, - 0x0a, 0x0f, 0x50, 0x52, 0x4f, 0x43, 0x45, 0x53, 0x53, 0x49, 0x4e, 0x47, 0x5f, 0x54, 0x49, 0x4d, - 0x45, 0x10, 0x02, 0x12, 0x20, 0x0a, 0x1c, 0x53, 0x59, 0x4e, 0x43, 0x48, 0x52, 0x4f, 0x4e, 0x49, - 0x5a, 0x45, 0x44, 0x5f, 0x50, 0x52, 0x4f, 0x43, 0x45, 0x53, 0x53, 0x49, 0x4e, 0x47, 0x5f, 0x54, - 0x49, 0x4d, 0x45, 0x10, 0x03, 0x22, 0xa3, 0x10, 0x0a, 0x07, 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, - 0x72, 0x12, 0x52, 0x0a, 0x09, 0x61, 0x66, 0x74, 0x65, 0x72, 0x5f, 0x61, 0x6c, 0x6c, 0x18, 0x01, - 0x20, 0x01, 0x28, 0x0b, 0x32, 0x33, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, - 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, - 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, - 0x2e, 0x41, 0x66, 0x74, 0x65, 0x72, 0x41, 0x6c, 0x6c, 0x48, 0x00, 0x52, 0x08, 0x61, 0x66, 0x74, - 0x65, 0x72, 0x41, 0x6c, 0x6c, 0x12, 0x52, 0x0a, 0x09, 0x61, 0x66, 0x74, 0x65, 0x72, 0x5f, 0x61, - 0x6e, 0x79, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x33, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, - 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, - 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x72, 0x69, - 0x67, 0x67, 0x65, 0x72, 0x2e, 0x41, 0x66, 0x74, 0x65, 0x72, 0x41, 0x6e, 0x79, 0x48, 0x00, 0x52, - 0x08, 0x61, 0x66, 0x74, 0x65, 0x72, 0x41, 0x6e, 0x79, 0x12, 0x55, 0x0a, 0x0a, 0x61, 0x66, 0x74, - 0x65, 0x72, 0x5f, 0x65, 0x61, 0x63, 0x68, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x34, 0x2e, - 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, - 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, - 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x2e, 0x41, 0x66, 0x74, 0x65, 0x72, 0x45, - 0x61, 0x63, 0x68, 0x48, 0x00, 0x52, 0x09, 0x61, 0x66, 0x74, 0x65, 0x72, 0x45, 0x61, 0x63, 0x68, - 0x12, 0x6c, 0x0a, 0x13, 0x61, 0x66, 0x74, 0x65, 0x72, 0x5f, 0x65, 0x6e, 0x64, 0x5f, 0x6f, 0x66, - 0x5f, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x18, 0x04, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x3b, 0x2e, - 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, - 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, - 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x2e, 0x41, 0x66, 0x74, 0x65, 0x72, 0x45, - 0x6e, 0x64, 0x4f, 0x66, 0x57, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x48, 0x00, 0x52, 0x10, 0x61, 0x66, - 0x74, 0x65, 0x72, 0x45, 0x6e, 0x64, 0x4f, 0x66, 0x57, 0x69, 
0x6e, 0x64, 0x6f, 0x77, 0x12, 0x74, - 0x0a, 0x15, 0x61, 0x66, 0x74, 0x65, 0x72, 0x5f, 0x70, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x69, - 0x6e, 0x67, 0x5f, 0x74, 0x69, 0x6d, 0x65, 0x18, 0x05, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x3e, 0x2e, - 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, - 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, - 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x2e, 0x41, 0x66, 0x74, 0x65, 0x72, 0x50, - 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x69, 0x6e, 0x67, 0x54, 0x69, 0x6d, 0x65, 0x48, 0x00, 0x52, - 0x13, 0x61, 0x66, 0x74, 0x65, 0x72, 0x50, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x69, 0x6e, 0x67, - 0x54, 0x69, 0x6d, 0x65, 0x12, 0x99, 0x01, 0x0a, 0x22, 0x61, 0x66, 0x74, 0x65, 0x72, 0x5f, 0x73, - 0x79, 0x6e, 0x63, 0x68, 0x72, 0x6f, 0x6e, 0x69, 0x7a, 0x65, 0x64, 0x5f, 0x70, 0x72, 0x6f, 0x63, - 0x65, 0x73, 0x73, 0x69, 0x6e, 0x67, 0x5f, 0x74, 0x69, 0x6d, 0x65, 0x18, 0x06, 0x20, 0x01, 0x28, - 0x0b, 0x32, 0x4a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, + 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x65, 0x73, 0x74, 0x53, 0x74, 0x72, 0x65, 0x61, + 0x6d, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x2e, 0x45, 0x76, 0x65, 0x6e, 0x74, 0x2e, 0x41, + 0x64, 0x76, 0x61, 0x6e, 0x63, 0x65, 0x57, 0x61, 0x74, 0x65, 0x72, 0x6d, 0x61, 0x72, 0x6b, 0x48, + 0x00, 0x52, 0x0e, 0x77, 0x61, 0x74, 0x65, 0x72, 0x6d, 0x61, 0x72, 0x6b, 0x45, 0x76, 0x65, 0x6e, + 0x74, 0x12, 0x86, 0x01, 0x0a, 0x15, 0x70, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x69, 0x6e, 0x67, + 0x5f, 0x74, 0x69, 0x6d, 0x65, 0x5f, 0x65, 0x76, 0x65, 0x6e, 0x74, 0x18, 0x02, 0x20, 0x01, 0x28, + 0x0b, 0x32, 0x50, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, - 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x2e, 0x41, 0x66, - 0x74, 0x65, 0x72, 0x53, 0x79, 0x6e, 0x63, 0x68, 0x72, 0x6f, 0x6e, 0x69, 0x7a, 0x65, 0x64, 0x50, - 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x69, 0x6e, 0x67, 0x54, 0x69, 0x6d, 0x65, 0x48, 0x00, 0x52, - 0x1f, 0x61, 0x66, 0x74, 0x65, 0x72, 0x53, 0x79, 0x6e, 0x63, 0x68, 0x72, 0x6f, 0x6e, 0x69, 0x7a, - 0x65, 0x64, 0x50, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x69, 0x6e, 0x67, 0x54, 0x69, 0x6d, 0x65, - 0x12, 0x4b, 0x0a, 0x06, 0x61, 0x6c, 0x77, 0x61, 0x79, 0x73, 0x18, 0x0c, 0x20, 0x01, 0x28, 0x0b, - 0x32, 0x31, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, - 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, - 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x2e, 0x41, 0x6c, 0x77, - 0x61, 0x79, 0x73, 0x48, 0x00, 0x52, 0x06, 0x61, 0x6c, 0x77, 0x61, 0x79, 0x73, 0x12, 0x4e, 0x0a, - 0x07, 0x64, 0x65, 0x66, 0x61, 0x75, 0x6c, 0x74, 0x18, 0x07, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x32, - 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, - 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, - 0x76, 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x2e, 0x44, 0x65, 0x66, 0x61, 0x75, - 0x6c, 0x74, 0x48, 0x00, 0x52, 0x07, 0x64, 0x65, 0x66, 0x61, 0x75, 0x6c, 0x74, 0x12, 0x5e, 0x0a, - 0x0d, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x5f, 0x63, 0x6f, 0x75, 0x6e, 0x74, 0x18, 0x08, - 0x20, 0x01, 0x28, 0x0b, 0x32, 0x37, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 
0x63, 0x68, - 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, - 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, - 0x2e, 0x45, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x43, 0x6f, 0x75, 0x6e, 0x74, 0x48, 0x00, 0x52, - 0x0c, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x43, 0x6f, 0x75, 0x6e, 0x74, 0x12, 0x48, 0x0a, - 0x05, 0x6e, 0x65, 0x76, 0x65, 0x72, 0x18, 0x09, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x30, 0x2e, 0x6f, + 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x65, 0x73, 0x74, 0x53, 0x74, 0x72, 0x65, 0x61, 0x6d, + 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x2e, 0x45, 0x76, 0x65, 0x6e, 0x74, 0x2e, 0x41, 0x64, + 0x76, 0x61, 0x6e, 0x63, 0x65, 0x50, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x69, 0x6e, 0x67, 0x54, + 0x69, 0x6d, 0x65, 0x48, 0x00, 0x52, 0x13, 0x70, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x69, 0x6e, + 0x67, 0x54, 0x69, 0x6d, 0x65, 0x45, 0x76, 0x65, 0x6e, 0x74, 0x12, 0x6d, 0x0a, 0x0d, 0x65, 0x6c, + 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x5f, 0x65, 0x76, 0x65, 0x6e, 0x74, 0x18, 0x03, 0x20, 0x01, 0x28, + 0x0b, 0x32, 0x46, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, + 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, + 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x65, 0x73, 0x74, 0x53, 0x74, 0x72, 0x65, 0x61, 0x6d, + 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x2e, 0x45, 0x76, 0x65, 0x6e, 0x74, 0x2e, 0x41, 0x64, + 0x64, 0x45, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x73, 0x48, 0x00, 0x52, 0x0c, 0x65, 0x6c, 0x65, + 0x6d, 0x65, 0x6e, 0x74, 0x45, 0x76, 0x65, 0x6e, 0x74, 0x1a, 0x49, 0x0a, 0x10, 0x41, 0x64, 0x76, + 0x61, 0x6e, 0x63, 0x65, 0x57, 0x61, 0x74, 0x65, 0x72, 0x6d, 0x61, 0x72, 0x6b, 0x12, 0x23, 0x0a, + 0x0d, 0x6e, 0x65, 0x77, 0x5f, 0x77, 0x61, 0x74, 0x65, 0x72, 0x6d, 0x61, 0x72, 0x6b, 0x18, 0x01, + 0x20, 0x01, 0x28, 0x03, 0x52, 0x0c, 0x6e, 0x65, 0x77, 0x57, 0x61, 0x74, 0x65, 0x72, 0x6d, 0x61, + 0x72, 0x6b, 0x12, 0x10, 0x0a, 0x03, 0x74, 0x61, 0x67, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, + 0x03, 0x74, 0x61, 0x67, 0x1a, 0x42, 0x0a, 0x15, 0x41, 0x64, 0x76, 0x61, 0x6e, 0x63, 0x65, 0x50, + 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x69, 0x6e, 0x67, 0x54, 0x69, 0x6d, 0x65, 0x12, 0x29, 0x0a, + 0x10, 0x61, 0x64, 0x76, 0x61, 0x6e, 0x63, 0x65, 0x5f, 0x64, 0x75, 0x72, 0x61, 0x74, 0x69, 0x6f, + 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, 0x03, 0x52, 0x0f, 0x61, 0x64, 0x76, 0x61, 0x6e, 0x63, 0x65, + 0x44, 0x75, 0x72, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x1a, 0x84, 0x01, 0x0a, 0x0b, 0x41, 0x64, 0x64, + 0x45, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x73, 0x12, 0x63, 0x0a, 0x08, 0x65, 0x6c, 0x65, 0x6d, + 0x65, 0x6e, 0x74, 0x73, 0x18, 0x01, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x47, 0x2e, 0x6f, 0x72, 0x67, + 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, + 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, + 0x65, 0x73, 0x74, 0x53, 0x74, 0x72, 0x65, 0x61, 0x6d, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, + 0x2e, 0x54, 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, 0x65, 0x64, 0x45, 0x6c, 0x65, 0x6d, + 0x65, 0x6e, 0x74, 0x52, 0x08, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x73, 0x12, 0x10, 0x0a, + 0x03, 0x74, 0x61, 0x67, 0x18, 0x03, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x74, 0x61, 0x67, 0x42, + 0x07, 0x0a, 0x05, 0x65, 0x76, 0x65, 0x6e, 0x74, 0x1a, 0x5b, 0x0a, 0x12, 0x54, 0x69, 0x6d, 0x65, + 0x73, 0x74, 0x61, 0x6d, 0x70, 0x65, 0x64, 0x45, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x12, 0x27, + 0x0a, 0x0f, 
0x65, 0x6e, 0x63, 0x6f, 0x64, 0x65, 0x64, 0x5f, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, + 0x74, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x0e, 0x65, 0x6e, 0x63, 0x6f, 0x64, 0x65, 0x64, + 0x45, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x12, 0x1c, 0x0a, 0x09, 0x74, 0x69, 0x6d, 0x65, 0x73, + 0x74, 0x61, 0x6d, 0x70, 0x18, 0x02, 0x20, 0x01, 0x28, 0x03, 0x52, 0x09, 0x74, 0x69, 0x6d, 0x65, + 0x73, 0x74, 0x61, 0x6d, 0x70, 0x22, 0x2e, 0x0a, 0x0d, 0x45, 0x76, 0x65, 0x6e, 0x74, 0x73, 0x52, + 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x12, 0x1d, 0x0a, 0x0a, 0x6f, 0x75, 0x74, 0x70, 0x75, 0x74, + 0x5f, 0x69, 0x64, 0x73, 0x18, 0x01, 0x20, 0x03, 0x28, 0x09, 0x52, 0x09, 0x6f, 0x75, 0x74, 0x70, + 0x75, 0x74, 0x49, 0x64, 0x73, 0x22, 0xed, 0x03, 0x0a, 0x11, 0x57, 0x72, 0x69, 0x74, 0x65, 0x46, + 0x69, 0x6c, 0x65, 0x73, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x43, 0x0a, 0x04, 0x73, + 0x69, 0x6e, 0x6b, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, + 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, + 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x75, + 0x6e, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x53, 0x70, 0x65, 0x63, 0x52, 0x04, 0x73, 0x69, 0x6e, 0x6b, + 0x12, 0x58, 0x0a, 0x0f, 0x66, 0x6f, 0x72, 0x6d, 0x61, 0x74, 0x5f, 0x66, 0x75, 0x6e, 0x63, 0x74, + 0x69, 0x6f, 0x6e, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, + 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, + 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x75, + 0x6e, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x53, 0x70, 0x65, 0x63, 0x52, 0x0e, 0x66, 0x6f, 0x72, 0x6d, + 0x61, 0x74, 0x46, 0x75, 0x6e, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x27, 0x0a, 0x0f, 0x77, 0x69, + 0x6e, 0x64, 0x6f, 0x77, 0x65, 0x64, 0x5f, 0x77, 0x72, 0x69, 0x74, 0x65, 0x73, 0x18, 0x03, 0x20, + 0x01, 0x28, 0x08, 0x52, 0x0e, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x65, 0x64, 0x57, 0x72, 0x69, + 0x74, 0x65, 0x73, 0x12, 0x3c, 0x0a, 0x1a, 0x72, 0x75, 0x6e, 0x6e, 0x65, 0x72, 0x5f, 0x64, 0x65, + 0x74, 0x65, 0x72, 0x6d, 0x69, 0x6e, 0x65, 0x64, 0x5f, 0x73, 0x68, 0x61, 0x72, 0x64, 0x69, 0x6e, + 0x67, 0x18, 0x04, 0x20, 0x01, 0x28, 0x08, 0x52, 0x18, 0x72, 0x75, 0x6e, 0x6e, 0x65, 0x72, 0x44, + 0x65, 0x74, 0x65, 0x72, 0x6d, 0x69, 0x6e, 0x65, 0x64, 0x53, 0x68, 0x61, 0x72, 0x64, 0x69, 0x6e, + 0x67, 0x12, 0x65, 0x0a, 0x0b, 0x73, 0x69, 0x64, 0x65, 0x5f, 0x69, 0x6e, 0x70, 0x75, 0x74, 0x73, + 0x18, 0x05, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x44, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, + 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, + 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x57, 0x72, 0x69, 0x74, 0x65, + 0x46, 0x69, 0x6c, 0x65, 0x73, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x2e, 0x53, 0x69, 0x64, + 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x73, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x52, 0x0a, 0x73, 0x69, + 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x73, 0x1a, 0x6b, 0x0a, 0x0f, 0x53, 0x69, 0x64, 0x65, + 0x49, 0x6e, 0x70, 0x75, 0x74, 0x73, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x12, 0x10, 0x0a, 0x03, 0x6b, + 0x65, 0x79, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x6b, 0x65, 0x79, 0x12, 0x42, 0x0a, + 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2c, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 
0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, - 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x2e, 0x4e, 0x65, 0x76, 0x65, 0x72, 0x48, 0x00, - 0x52, 0x05, 0x6e, 0x65, 0x76, 0x65, 0x72, 0x12, 0x55, 0x0a, 0x0a, 0x6f, 0x72, 0x5f, 0x66, 0x69, - 0x6e, 0x61, 0x6c, 0x6c, 0x79, 0x18, 0x0a, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x34, 0x2e, 0x6f, 0x72, - 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, - 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, - 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x2e, 0x4f, 0x72, 0x46, 0x69, 0x6e, 0x61, 0x6c, 0x6c, - 0x79, 0x48, 0x00, 0x52, 0x09, 0x6f, 0x72, 0x46, 0x69, 0x6e, 0x61, 0x6c, 0x6c, 0x79, 0x12, 0x4b, - 0x0a, 0x06, 0x72, 0x65, 0x70, 0x65, 0x61, 0x74, 0x18, 0x0b, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x31, + 0x2e, 0x53, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x52, 0x05, 0x76, 0x61, 0x6c, 0x75, + 0x65, 0x3a, 0x02, 0x38, 0x01, 0x22, 0xcc, 0x02, 0x0a, 0x11, 0x50, 0x75, 0x62, 0x53, 0x75, 0x62, + 0x52, 0x65, 0x61, 0x64, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x14, 0x0a, 0x05, 0x74, + 0x6f, 0x70, 0x69, 0x63, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x05, 0x74, 0x6f, 0x70, 0x69, + 0x63, 0x12, 0x22, 0x0a, 0x0c, 0x73, 0x75, 0x62, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, + 0x6e, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0c, 0x73, 0x75, 0x62, 0x73, 0x63, 0x72, 0x69, + 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x2f, 0x0a, 0x13, 0x74, 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, + 0x6d, 0x70, 0x5f, 0x61, 0x74, 0x74, 0x72, 0x69, 0x62, 0x75, 0x74, 0x65, 0x18, 0x03, 0x20, 0x01, + 0x28, 0x09, 0x52, 0x12, 0x74, 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, 0x41, 0x74, 0x74, + 0x72, 0x69, 0x62, 0x75, 0x74, 0x65, 0x12, 0x21, 0x0a, 0x0c, 0x69, 0x64, 0x5f, 0x61, 0x74, 0x74, + 0x72, 0x69, 0x62, 0x75, 0x74, 0x65, 0x18, 0x04, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0b, 0x69, 0x64, + 0x41, 0x74, 0x74, 0x72, 0x69, 0x62, 0x75, 0x74, 0x65, 0x12, 0x27, 0x0a, 0x0f, 0x77, 0x69, 0x74, + 0x68, 0x5f, 0x61, 0x74, 0x74, 0x72, 0x69, 0x62, 0x75, 0x74, 0x65, 0x73, 0x18, 0x05, 0x20, 0x01, + 0x28, 0x08, 0x52, 0x0e, 0x77, 0x69, 0x74, 0x68, 0x41, 0x74, 0x74, 0x72, 0x69, 0x62, 0x75, 0x74, + 0x65, 0x73, 0x12, 0x38, 0x0a, 0x18, 0x74, 0x6f, 0x70, 0x69, 0x63, 0x5f, 0x72, 0x75, 0x6e, 0x74, + 0x69, 0x6d, 0x65, 0x5f, 0x6f, 0x76, 0x65, 0x72, 0x72, 0x69, 0x64, 0x64, 0x65, 0x6e, 0x18, 0x06, + 0x20, 0x01, 0x28, 0x09, 0x52, 0x16, 0x74, 0x6f, 0x70, 0x69, 0x63, 0x52, 0x75, 0x6e, 0x74, 0x69, + 0x6d, 0x65, 0x4f, 0x76, 0x65, 0x72, 0x72, 0x69, 0x64, 0x64, 0x65, 0x6e, 0x12, 0x46, 0x0a, 0x1f, + 0x73, 0x75, 0x62, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x72, 0x75, 0x6e, + 0x74, 0x69, 0x6d, 0x65, 0x5f, 0x6f, 0x76, 0x65, 0x72, 0x72, 0x69, 0x64, 0x64, 0x65, 0x6e, 0x18, + 0x07, 0x20, 0x01, 0x28, 0x09, 0x52, 0x1d, 0x73, 0x75, 0x62, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, + 0x69, 0x6f, 0x6e, 0x52, 0x75, 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x4f, 0x76, 0x65, 0x72, 0x72, 0x69, + 0x64, 0x64, 0x65, 0x6e, 0x22, 0xb8, 0x01, 0x0a, 0x12, 0x50, 0x75, 0x62, 0x53, 0x75, 0x62, 0x57, + 0x72, 0x69, 0x74, 0x65, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x14, 0x0a, 0x05, 0x74, + 0x6f, 0x70, 0x69, 0x63, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x05, 0x74, 0x6f, 0x70, 0x69, + 0x63, 0x12, 0x2f, 0x0a, 0x13, 0x74, 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, 0x5f, 0x61, + 0x74, 0x74, 0x72, 0x69, 0x62, 0x75, 0x74, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x12, + 0x74, 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, 0x41, 0x74, 
0x74, 0x72, 0x69, 0x62, 0x75, + 0x74, 0x65, 0x12, 0x21, 0x0a, 0x0c, 0x69, 0x64, 0x5f, 0x61, 0x74, 0x74, 0x72, 0x69, 0x62, 0x75, + 0x74, 0x65, 0x18, 0x03, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0b, 0x69, 0x64, 0x41, 0x74, 0x74, 0x72, + 0x69, 0x62, 0x75, 0x74, 0x65, 0x12, 0x38, 0x0a, 0x18, 0x74, 0x6f, 0x70, 0x69, 0x63, 0x5f, 0x72, + 0x75, 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x5f, 0x6f, 0x76, 0x65, 0x72, 0x72, 0x69, 0x64, 0x64, 0x65, + 0x6e, 0x18, 0x04, 0x20, 0x01, 0x28, 0x09, 0x52, 0x16, 0x74, 0x6f, 0x70, 0x69, 0x63, 0x52, 0x75, + 0x6e, 0x74, 0x69, 0x6d, 0x65, 0x4f, 0x76, 0x65, 0x72, 0x72, 0x69, 0x64, 0x64, 0x65, 0x6e, 0x22, + 0xa5, 0x01, 0x0a, 0x17, 0x47, 0x72, 0x6f, 0x75, 0x70, 0x49, 0x6e, 0x74, 0x6f, 0x42, 0x61, 0x74, + 0x63, 0x68, 0x65, 0x73, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x1d, 0x0a, 0x0a, 0x62, + 0x61, 0x74, 0x63, 0x68, 0x5f, 0x73, 0x69, 0x7a, 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x03, 0x52, + 0x09, 0x62, 0x61, 0x74, 0x63, 0x68, 0x53, 0x69, 0x7a, 0x65, 0x12, 0x28, 0x0a, 0x10, 0x62, 0x61, + 0x74, 0x63, 0x68, 0x5f, 0x73, 0x69, 0x7a, 0x65, 0x5f, 0x62, 0x79, 0x74, 0x65, 0x73, 0x18, 0x03, + 0x20, 0x01, 0x28, 0x03, 0x52, 0x0e, 0x62, 0x61, 0x74, 0x63, 0x68, 0x53, 0x69, 0x7a, 0x65, 0x42, + 0x79, 0x74, 0x65, 0x73, 0x12, 0x41, 0x0a, 0x1d, 0x6d, 0x61, 0x78, 0x5f, 0x62, 0x75, 0x66, 0x66, + 0x65, 0x72, 0x69, 0x6e, 0x67, 0x5f, 0x64, 0x75, 0x72, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x6d, + 0x69, 0x6c, 0x6c, 0x69, 0x73, 0x18, 0x02, 0x20, 0x01, 0x28, 0x03, 0x52, 0x1a, 0x6d, 0x61, 0x78, + 0x42, 0x75, 0x66, 0x66, 0x65, 0x72, 0x69, 0x6e, 0x67, 0x44, 0x75, 0x72, 0x61, 0x74, 0x69, 0x6f, + 0x6e, 0x4d, 0x69, 0x6c, 0x6c, 0x69, 0x73, 0x22, 0x7c, 0x0a, 0x05, 0x43, 0x6f, 0x64, 0x65, 0x72, + 0x12, 0x43, 0x0a, 0x04, 0x73, 0x70, 0x65, 0x63, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, - 0x76, 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x2e, 0x52, 0x65, 0x70, 0x65, 0x61, - 0x74, 0x48, 0x00, 0x52, 0x06, 0x72, 0x65, 0x70, 0x65, 0x61, 0x74, 0x1a, 0x58, 0x0a, 0x08, 0x41, - 0x66, 0x74, 0x65, 0x72, 0x41, 0x6c, 0x6c, 0x12, 0x4c, 0x0a, 0x0b, 0x73, 0x75, 0x62, 0x74, 0x72, - 0x69, 0x67, 0x67, 0x65, 0x72, 0x73, 0x18, 0x01, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x2a, 0x2e, 0x6f, - 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, - 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, - 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x52, 0x0b, 0x73, 0x75, 0x62, 0x74, 0x72, 0x69, - 0x67, 0x67, 0x65, 0x72, 0x73, 0x1a, 0x58, 0x0a, 0x08, 0x41, 0x66, 0x74, 0x65, 0x72, 0x41, 0x6e, - 0x79, 0x12, 0x4c, 0x0a, 0x0b, 0x73, 0x75, 0x62, 0x74, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x73, - 0x18, 0x01, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x2a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, + 0x76, 0x31, 0x2e, 0x46, 0x75, 0x6e, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x53, 0x70, 0x65, 0x63, 0x52, + 0x04, 0x73, 0x70, 0x65, 0x63, 0x12, 0x2e, 0x0a, 0x13, 0x63, 0x6f, 0x6d, 0x70, 0x6f, 0x6e, 0x65, + 0x6e, 0x74, 0x5f, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x5f, 0x69, 0x64, 0x73, 0x18, 0x02, 0x20, 0x03, + 0x28, 0x09, 0x52, 0x11, 0x63, 0x6f, 0x6d, 0x70, 0x6f, 0x6e, 0x65, 0x6e, 0x74, 0x43, 0x6f, 0x64, + 0x65, 0x72, 0x49, 0x64, 0x73, 0x22, 0xc9, 0x06, 0x0a, 0x0e, 0x53, 0x74, 0x61, 0x6e, 0x64, 0x61, + 0x72, 0x64, 0x43, 0x6f, 0x64, 0x65, 0x72, 0x73, 0x22, 0xb6, 0x06, 0x0a, 0x04, 0x45, 0x6e, 0x75, 
+ 0x6d, 0x12, 0x24, 0x0a, 0x05, 0x42, 0x59, 0x54, 0x45, 0x53, 0x10, 0x00, 0x1a, 0x19, 0xa2, 0xb4, + 0xfa, 0xc2, 0x05, 0x13, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x3a, 0x62, + 0x79, 0x74, 0x65, 0x73, 0x3a, 0x76, 0x31, 0x12, 0x30, 0x0a, 0x0b, 0x53, 0x54, 0x52, 0x49, 0x4e, + 0x47, 0x5f, 0x55, 0x54, 0x46, 0x38, 0x10, 0x0a, 0x1a, 0x1f, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x19, + 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x3a, 0x73, 0x74, 0x72, 0x69, 0x6e, + 0x67, 0x5f, 0x75, 0x74, 0x66, 0x38, 0x3a, 0x76, 0x31, 0x12, 0x1e, 0x0a, 0x02, 0x4b, 0x56, 0x10, + 0x01, 0x1a, 0x16, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x10, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x63, 0x6f, + 0x64, 0x65, 0x72, 0x3a, 0x6b, 0x76, 0x3a, 0x76, 0x31, 0x12, 0x22, 0x0a, 0x04, 0x42, 0x4f, 0x4f, + 0x4c, 0x10, 0x0c, 0x1a, 0x18, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x12, 0x62, 0x65, 0x61, 0x6d, 0x3a, + 0x63, 0x6f, 0x64, 0x65, 0x72, 0x3a, 0x62, 0x6f, 0x6f, 0x6c, 0x3a, 0x76, 0x31, 0x12, 0x26, 0x0a, + 0x06, 0x56, 0x41, 0x52, 0x49, 0x4e, 0x54, 0x10, 0x02, 0x1a, 0x1a, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, + 0x14, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x3a, 0x76, 0x61, 0x72, 0x69, + 0x6e, 0x74, 0x3a, 0x76, 0x31, 0x12, 0x26, 0x0a, 0x06, 0x44, 0x4f, 0x55, 0x42, 0x4c, 0x45, 0x10, + 0x0b, 0x1a, 0x1a, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x14, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x63, 0x6f, + 0x64, 0x65, 0x72, 0x3a, 0x64, 0x6f, 0x75, 0x62, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x2a, 0x0a, + 0x08, 0x49, 0x54, 0x45, 0x52, 0x41, 0x42, 0x4c, 0x45, 0x10, 0x03, 0x1a, 0x1c, 0xa2, 0xb4, 0xfa, + 0xc2, 0x05, 0x16, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x3a, 0x69, 0x74, + 0x65, 0x72, 0x61, 0x62, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x24, 0x0a, 0x05, 0x54, 0x49, 0x4d, + 0x45, 0x52, 0x10, 0x04, 0x1a, 0x19, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x13, 0x62, 0x65, 0x61, 0x6d, + 0x3a, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x3a, 0x74, 0x69, 0x6d, 0x65, 0x72, 0x3a, 0x76, 0x31, 0x12, + 0x38, 0x0a, 0x0f, 0x49, 0x4e, 0x54, 0x45, 0x52, 0x56, 0x41, 0x4c, 0x5f, 0x57, 0x49, 0x4e, 0x44, + 0x4f, 0x57, 0x10, 0x05, 0x1a, 0x23, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1d, 0x62, 0x65, 0x61, 0x6d, + 0x3a, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x3a, 0x69, 0x6e, 0x74, 0x65, 0x72, 0x76, 0x61, 0x6c, 0x5f, + 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x3a, 0x76, 0x31, 0x12, 0x34, 0x0a, 0x0d, 0x4c, 0x45, 0x4e, + 0x47, 0x54, 0x48, 0x5f, 0x50, 0x52, 0x45, 0x46, 0x49, 0x58, 0x10, 0x06, 0x1a, 0x21, 0xa2, 0xb4, + 0xfa, 0xc2, 0x05, 0x1b, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x3a, 0x6c, + 0x65, 0x6e, 0x67, 0x74, 0x68, 0x5f, 0x70, 0x72, 0x65, 0x66, 0x69, 0x78, 0x3a, 0x76, 0x31, 0x12, + 0x34, 0x0a, 0x0d, 0x47, 0x4c, 0x4f, 0x42, 0x41, 0x4c, 0x5f, 0x57, 0x49, 0x4e, 0x44, 0x4f, 0x57, + 0x10, 0x07, 0x1a, 0x21, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1b, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x63, + 0x6f, 0x64, 0x65, 0x72, 0x3a, 0x67, 0x6c, 0x6f, 0x62, 0x61, 0x6c, 0x5f, 0x77, 0x69, 0x6e, 0x64, + 0x6f, 0x77, 0x3a, 0x76, 0x31, 0x12, 0x36, 0x0a, 0x0e, 0x57, 0x49, 0x4e, 0x44, 0x4f, 0x57, 0x45, + 0x44, 0x5f, 0x56, 0x41, 0x4c, 0x55, 0x45, 0x10, 0x08, 0x1a, 0x22, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, + 0x1c, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x3a, 0x77, 0x69, 0x6e, 0x64, + 0x6f, 0x77, 0x65, 0x64, 0x5f, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x42, 0x0a, + 0x14, 0x50, 0x41, 0x52, 0x41, 0x4d, 0x5f, 0x57, 0x49, 0x4e, 0x44, 0x4f, 0x57, 0x45, 0x44, 0x5f, + 0x56, 0x41, 0x4c, 0x55, 0x45, 0x10, 0x0e, 0x1a, 0x28, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x22, 0x62, + 0x65, 0x61, 0x6d, 0x3a, 
0x63, 0x6f, 0x64, 0x65, 0x72, 0x3a, 0x70, 0x61, 0x72, 0x61, 0x6d, 0x5f, + 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x65, 0x64, 0x5f, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x3a, 0x76, + 0x31, 0x12, 0x44, 0x0a, 0x15, 0x53, 0x54, 0x41, 0x54, 0x45, 0x5f, 0x42, 0x41, 0x43, 0x4b, 0x45, + 0x44, 0x5f, 0x49, 0x54, 0x45, 0x52, 0x41, 0x42, 0x4c, 0x45, 0x10, 0x09, 0x1a, 0x29, 0xa2, 0xb4, + 0xfa, 0xc2, 0x05, 0x23, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x3a, 0x73, + 0x74, 0x61, 0x74, 0x65, 0x5f, 0x62, 0x61, 0x63, 0x6b, 0x65, 0x64, 0x5f, 0x69, 0x74, 0x65, 0x72, + 0x61, 0x62, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x34, 0x0a, 0x0d, 0x43, 0x55, 0x53, 0x54, 0x4f, + 0x4d, 0x5f, 0x57, 0x49, 0x4e, 0x44, 0x4f, 0x57, 0x10, 0x10, 0x1a, 0x21, 0xa2, 0xb4, 0xfa, 0xc2, + 0x05, 0x1b, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x3a, 0x63, 0x75, 0x73, + 0x74, 0x6f, 0x6d, 0x5f, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x3a, 0x76, 0x31, 0x12, 0x20, 0x0a, + 0x03, 0x52, 0x4f, 0x57, 0x10, 0x0d, 0x1a, 0x17, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x11, 0x62, 0x65, + 0x61, 0x6d, 0x3a, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x3a, 0x72, 0x6f, 0x77, 0x3a, 0x76, 0x31, 0x12, + 0x30, 0x0a, 0x0b, 0x53, 0x48, 0x41, 0x52, 0x44, 0x45, 0x44, 0x5f, 0x4b, 0x45, 0x59, 0x10, 0x0f, + 0x1a, 0x1f, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x19, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x63, 0x6f, 0x64, + 0x65, 0x72, 0x3a, 0x73, 0x68, 0x61, 0x72, 0x64, 0x65, 0x64, 0x5f, 0x6b, 0x65, 0x79, 0x3a, 0x76, + 0x31, 0x22, 0xac, 0x06, 0x0a, 0x11, 0x57, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x69, 0x6e, 0x67, 0x53, + 0x74, 0x72, 0x61, 0x74, 0x65, 0x67, 0x79, 0x12, 0x4c, 0x0a, 0x09, 0x77, 0x69, 0x6e, 0x64, 0x6f, + 0x77, 0x5f, 0x66, 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, 0x67, + 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, + 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, + 0x75, 0x6e, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x53, 0x70, 0x65, 0x63, 0x52, 0x08, 0x77, 0x69, 0x6e, + 0x64, 0x6f, 0x77, 0x46, 0x6e, 0x12, 0x56, 0x0a, 0x0c, 0x6d, 0x65, 0x72, 0x67, 0x65, 0x5f, 0x73, + 0x74, 0x61, 0x74, 0x75, 0x73, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0e, 0x32, 0x33, 0x2e, 0x6f, 0x72, + 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, + 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, + 0x4d, 0x65, 0x72, 0x67, 0x65, 0x53, 0x74, 0x61, 0x74, 0x75, 0x73, 0x2e, 0x45, 0x6e, 0x75, 0x6d, + 0x52, 0x0b, 0x6d, 0x65, 0x72, 0x67, 0x65, 0x53, 0x74, 0x61, 0x74, 0x75, 0x73, 0x12, 0x26, 0x0a, + 0x0f, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x5f, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x5f, 0x69, 0x64, + 0x18, 0x03, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0d, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x43, 0x6f, + 0x64, 0x65, 0x72, 0x49, 0x64, 0x12, 0x44, 0x0a, 0x07, 0x74, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, + 0x18, 0x04, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, - 0x65, 0x72, 0x52, 0x0b, 0x73, 0x75, 0x62, 0x74, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x73, 0x1a, - 0x59, 0x0a, 0x09, 0x41, 0x66, 0x74, 0x65, 0x72, 0x45, 0x61, 0x63, 0x68, 0x12, 0x4c, 0x0a, 0x0b, - 0x73, 0x75, 0x62, 0x74, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x73, 0x18, 0x01, 0x20, 0x03, 0x28, - 0x0b, 0x32, 0x2a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 
0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, + 0x65, 0x72, 0x52, 0x07, 0x74, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x12, 0x65, 0x0a, 0x11, 0x61, + 0x63, 0x63, 0x75, 0x6d, 0x75, 0x6c, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x6d, 0x6f, 0x64, 0x65, + 0x18, 0x05, 0x20, 0x01, 0x28, 0x0e, 0x32, 0x38, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, + 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, + 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x41, 0x63, 0x63, 0x75, 0x6d, + 0x75, 0x6c, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x4d, 0x6f, 0x64, 0x65, 0x2e, 0x45, 0x6e, 0x75, 0x6d, + 0x52, 0x10, 0x61, 0x63, 0x63, 0x75, 0x6d, 0x75, 0x6c, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x4d, 0x6f, + 0x64, 0x65, 0x12, 0x53, 0x0a, 0x0b, 0x6f, 0x75, 0x74, 0x70, 0x75, 0x74, 0x5f, 0x74, 0x69, 0x6d, + 0x65, 0x18, 0x06, 0x20, 0x01, 0x28, 0x0e, 0x32, 0x32, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, + 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, + 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x4f, 0x75, 0x74, 0x70, + 0x75, 0x74, 0x54, 0x69, 0x6d, 0x65, 0x2e, 0x45, 0x6e, 0x75, 0x6d, 0x52, 0x0a, 0x6f, 0x75, 0x74, + 0x70, 0x75, 0x74, 0x54, 0x69, 0x6d, 0x65, 0x12, 0x62, 0x0a, 0x10, 0x63, 0x6c, 0x6f, 0x73, 0x69, + 0x6e, 0x67, 0x5f, 0x62, 0x65, 0x68, 0x61, 0x76, 0x69, 0x6f, 0x72, 0x18, 0x07, 0x20, 0x01, 0x28, + 0x0e, 0x32, 0x37, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, - 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x52, 0x0b, 0x73, - 0x75, 0x62, 0x74, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x73, 0x1a, 0xb2, 0x01, 0x0a, 0x10, 0x41, + 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x43, 0x6c, 0x6f, 0x73, 0x69, 0x6e, 0x67, 0x42, 0x65, 0x68, + 0x61, 0x76, 0x69, 0x6f, 0x72, 0x2e, 0x45, 0x6e, 0x75, 0x6d, 0x52, 0x0f, 0x63, 0x6c, 0x6f, 0x73, + 0x69, 0x6e, 0x67, 0x42, 0x65, 0x68, 0x61, 0x76, 0x69, 0x6f, 0x72, 0x12, 0x29, 0x0a, 0x10, 0x61, + 0x6c, 0x6c, 0x6f, 0x77, 0x65, 0x64, 0x5f, 0x6c, 0x61, 0x74, 0x65, 0x6e, 0x65, 0x73, 0x73, 0x18, + 0x08, 0x20, 0x01, 0x28, 0x03, 0x52, 0x0f, 0x61, 0x6c, 0x6c, 0x6f, 0x77, 0x65, 0x64, 0x4c, 0x61, + 0x74, 0x65, 0x6e, 0x65, 0x73, 0x73, 0x12, 0x5e, 0x0a, 0x0e, 0x4f, 0x6e, 0x54, 0x69, 0x6d, 0x65, + 0x42, 0x65, 0x68, 0x61, 0x76, 0x69, 0x6f, 0x72, 0x18, 0x09, 0x20, 0x01, 0x28, 0x0e, 0x32, 0x36, + 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, + 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, + 0x76, 0x31, 0x2e, 0x4f, 0x6e, 0x54, 0x69, 0x6d, 0x65, 0x42, 0x65, 0x68, 0x61, 0x76, 0x69, 0x6f, + 0x72, 0x2e, 0x45, 0x6e, 0x75, 0x6d, 0x52, 0x0e, 0x4f, 0x6e, 0x54, 0x69, 0x6d, 0x65, 0x42, 0x65, + 0x68, 0x61, 0x76, 0x69, 0x6f, 0x72, 0x12, 0x31, 0x0a, 0x15, 0x61, 0x73, 0x73, 0x69, 0x67, 0x6e, + 0x73, 0x5f, 0x74, 0x6f, 0x5f, 0x6f, 0x6e, 0x65, 0x5f, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x18, + 0x0a, 0x20, 0x01, 0x28, 0x08, 0x52, 0x12, 0x61, 0x73, 0x73, 0x69, 0x67, 0x6e, 0x73, 0x54, 0x6f, + 0x4f, 0x6e, 0x65, 0x57, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x12, 0x25, 0x0a, 0x0e, 0x65, 0x6e, 0x76, + 0x69, 0x72, 0x6f, 0x6e, 0x6d, 0x65, 0x6e, 0x74, 0x5f, 0x69, 0x64, 0x18, 0x0b, 0x20, 0x01, 0x28, + 0x09, 0x52, 0x0d, 0x65, 0x6e, 0x76, 0x69, 0x72, 0x6f, 0x6e, 0x6d, 0x65, 0x6e, 0x74, 0x49, 0x64, + 0x22, 0x5c, 0x0a, 0x0b, 0x4d, 0x65, 0x72, 0x67, 0x65, 0x53, 0x74, 0x61, 0x74, 
0x75, 0x73, 0x22, + 0x4d, 0x0a, 0x04, 0x45, 0x6e, 0x75, 0x6d, 0x12, 0x0f, 0x0a, 0x0b, 0x55, 0x4e, 0x53, 0x50, 0x45, + 0x43, 0x49, 0x46, 0x49, 0x45, 0x44, 0x10, 0x00, 0x12, 0x0f, 0x0a, 0x0b, 0x4e, 0x4f, 0x4e, 0x5f, + 0x4d, 0x45, 0x52, 0x47, 0x49, 0x4e, 0x47, 0x10, 0x01, 0x12, 0x0f, 0x0a, 0x0b, 0x4e, 0x45, 0x45, + 0x44, 0x53, 0x5f, 0x4d, 0x45, 0x52, 0x47, 0x45, 0x10, 0x02, 0x12, 0x12, 0x0a, 0x0e, 0x41, 0x4c, + 0x52, 0x45, 0x41, 0x44, 0x59, 0x5f, 0x4d, 0x45, 0x52, 0x47, 0x45, 0x44, 0x10, 0x03, 0x22, 0x5d, + 0x0a, 0x10, 0x41, 0x63, 0x63, 0x75, 0x6d, 0x75, 0x6c, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x4d, 0x6f, + 0x64, 0x65, 0x22, 0x49, 0x0a, 0x04, 0x45, 0x6e, 0x75, 0x6d, 0x12, 0x0f, 0x0a, 0x0b, 0x55, 0x4e, + 0x53, 0x50, 0x45, 0x43, 0x49, 0x46, 0x49, 0x45, 0x44, 0x10, 0x00, 0x12, 0x0e, 0x0a, 0x0a, 0x44, + 0x49, 0x53, 0x43, 0x41, 0x52, 0x44, 0x49, 0x4e, 0x47, 0x10, 0x01, 0x12, 0x10, 0x0a, 0x0c, 0x41, + 0x43, 0x43, 0x55, 0x4d, 0x55, 0x4c, 0x41, 0x54, 0x49, 0x4e, 0x47, 0x10, 0x02, 0x12, 0x0e, 0x0a, + 0x0a, 0x52, 0x45, 0x54, 0x52, 0x41, 0x43, 0x54, 0x49, 0x4e, 0x47, 0x10, 0x03, 0x22, 0x51, 0x0a, + 0x0f, 0x43, 0x6c, 0x6f, 0x73, 0x69, 0x6e, 0x67, 0x42, 0x65, 0x68, 0x61, 0x76, 0x69, 0x6f, 0x72, + 0x22, 0x3e, 0x0a, 0x04, 0x45, 0x6e, 0x75, 0x6d, 0x12, 0x0f, 0x0a, 0x0b, 0x55, 0x4e, 0x53, 0x50, + 0x45, 0x43, 0x49, 0x46, 0x49, 0x45, 0x44, 0x10, 0x00, 0x12, 0x0f, 0x0a, 0x0b, 0x45, 0x4d, 0x49, + 0x54, 0x5f, 0x41, 0x4c, 0x57, 0x41, 0x59, 0x53, 0x10, 0x01, 0x12, 0x14, 0x0a, 0x10, 0x45, 0x4d, + 0x49, 0x54, 0x5f, 0x49, 0x46, 0x5f, 0x4e, 0x4f, 0x4e, 0x45, 0x4d, 0x50, 0x54, 0x59, 0x10, 0x02, + 0x22, 0x50, 0x0a, 0x0e, 0x4f, 0x6e, 0x54, 0x69, 0x6d, 0x65, 0x42, 0x65, 0x68, 0x61, 0x76, 0x69, + 0x6f, 0x72, 0x22, 0x3e, 0x0a, 0x04, 0x45, 0x6e, 0x75, 0x6d, 0x12, 0x0f, 0x0a, 0x0b, 0x55, 0x4e, + 0x53, 0x50, 0x45, 0x43, 0x49, 0x46, 0x49, 0x45, 0x44, 0x10, 0x00, 0x12, 0x0f, 0x0a, 0x0b, 0x46, + 0x49, 0x52, 0x45, 0x5f, 0x41, 0x4c, 0x57, 0x41, 0x59, 0x53, 0x10, 0x01, 0x12, 0x14, 0x0a, 0x10, + 0x46, 0x49, 0x52, 0x45, 0x5f, 0x49, 0x46, 0x5f, 0x4e, 0x4f, 0x4e, 0x45, 0x4d, 0x50, 0x54, 0x59, + 0x10, 0x02, 0x22, 0x62, 0x0a, 0x0a, 0x4f, 0x75, 0x74, 0x70, 0x75, 0x74, 0x54, 0x69, 0x6d, 0x65, + 0x22, 0x54, 0x0a, 0x04, 0x45, 0x6e, 0x75, 0x6d, 0x12, 0x0f, 0x0a, 0x0b, 0x55, 0x4e, 0x53, 0x50, + 0x45, 0x43, 0x49, 0x46, 0x49, 0x45, 0x44, 0x10, 0x00, 0x12, 0x11, 0x0a, 0x0d, 0x45, 0x4e, 0x44, + 0x5f, 0x4f, 0x46, 0x5f, 0x57, 0x49, 0x4e, 0x44, 0x4f, 0x57, 0x10, 0x01, 0x12, 0x12, 0x0a, 0x0e, + 0x4c, 0x41, 0x54, 0x45, 0x53, 0x54, 0x5f, 0x49, 0x4e, 0x5f, 0x50, 0x41, 0x4e, 0x45, 0x10, 0x02, + 0x12, 0x14, 0x0a, 0x10, 0x45, 0x41, 0x52, 0x4c, 0x49, 0x45, 0x53, 0x54, 0x5f, 0x49, 0x4e, 0x5f, + 0x50, 0x41, 0x4e, 0x45, 0x10, 0x03, 0x22, 0x6e, 0x0a, 0x0a, 0x54, 0x69, 0x6d, 0x65, 0x44, 0x6f, + 0x6d, 0x61, 0x69, 0x6e, 0x22, 0x60, 0x0a, 0x04, 0x45, 0x6e, 0x75, 0x6d, 0x12, 0x0f, 0x0a, 0x0b, + 0x55, 0x4e, 0x53, 0x50, 0x45, 0x43, 0x49, 0x46, 0x49, 0x45, 0x44, 0x10, 0x00, 0x12, 0x0e, 0x0a, + 0x0a, 0x45, 0x56, 0x45, 0x4e, 0x54, 0x5f, 0x54, 0x49, 0x4d, 0x45, 0x10, 0x01, 0x12, 0x13, 0x0a, + 0x0f, 0x50, 0x52, 0x4f, 0x43, 0x45, 0x53, 0x53, 0x49, 0x4e, 0x47, 0x5f, 0x54, 0x49, 0x4d, 0x45, + 0x10, 0x02, 0x22, 0x04, 0x08, 0x03, 0x10, 0x03, 0x2a, 0x1c, 0x53, 0x59, 0x4e, 0x43, 0x48, 0x52, + 0x4f, 0x4e, 0x49, 0x5a, 0x45, 0x44, 0x5f, 0x50, 0x52, 0x4f, 0x43, 0x45, 0x53, 0x53, 0x49, 0x4e, + 0x47, 0x5f, 0x54, 0x49, 0x4d, 0x45, 0x22, 0xa3, 0x10, 0x0a, 0x07, 0x54, 0x72, 0x69, 0x67, 0x67, + 0x65, 0x72, 0x12, 0x52, 0x0a, 0x09, 0x61, 0x66, 0x74, 0x65, 0x72, 0x5f, 0x61, 0x6c, 0x6c, 0x18, + 0x01, 
0x20, 0x01, 0x28, 0x0b, 0x32, 0x33, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, + 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, + 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, + 0x72, 0x2e, 0x41, 0x66, 0x74, 0x65, 0x72, 0x41, 0x6c, 0x6c, 0x48, 0x00, 0x52, 0x08, 0x61, 0x66, + 0x74, 0x65, 0x72, 0x41, 0x6c, 0x6c, 0x12, 0x52, 0x0a, 0x09, 0x61, 0x66, 0x74, 0x65, 0x72, 0x5f, + 0x61, 0x6e, 0x79, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x33, 0x2e, 0x6f, 0x72, 0x67, 0x2e, + 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, + 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x72, + 0x69, 0x67, 0x67, 0x65, 0x72, 0x2e, 0x41, 0x66, 0x74, 0x65, 0x72, 0x41, 0x6e, 0x79, 0x48, 0x00, + 0x52, 0x08, 0x61, 0x66, 0x74, 0x65, 0x72, 0x41, 0x6e, 0x79, 0x12, 0x55, 0x0a, 0x0a, 0x61, 0x66, + 0x74, 0x65, 0x72, 0x5f, 0x65, 0x61, 0x63, 0x68, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x34, + 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, + 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, + 0x76, 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x2e, 0x41, 0x66, 0x74, 0x65, 0x72, + 0x45, 0x61, 0x63, 0x68, 0x48, 0x00, 0x52, 0x09, 0x61, 0x66, 0x74, 0x65, 0x72, 0x45, 0x61, 0x63, + 0x68, 0x12, 0x6c, 0x0a, 0x13, 0x61, 0x66, 0x74, 0x65, 0x72, 0x5f, 0x65, 0x6e, 0x64, 0x5f, 0x6f, + 0x66, 0x5f, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x18, 0x04, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x3b, + 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, + 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, + 0x76, 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x2e, 0x41, 0x66, 0x74, 0x65, 0x72, + 0x45, 0x6e, 0x64, 0x4f, 0x66, 0x57, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x48, 0x00, 0x52, 0x10, 0x61, 0x66, 0x74, 0x65, 0x72, 0x45, 0x6e, 0x64, 0x4f, 0x66, 0x57, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x12, - 0x4f, 0x0a, 0x0d, 0x65, 0x61, 0x72, 0x6c, 0x79, 0x5f, 0x66, 0x69, 0x72, 0x69, 0x6e, 0x67, 0x73, - 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, - 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, - 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, - 0x65, 0x72, 0x52, 0x0c, 0x65, 0x61, 0x72, 0x6c, 0x79, 0x46, 0x69, 0x72, 0x69, 0x6e, 0x67, 0x73, - 0x12, 0x4d, 0x0a, 0x0c, 0x6c, 0x61, 0x74, 0x65, 0x5f, 0x66, 0x69, 0x72, 0x69, 0x6e, 0x67, 0x73, - 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, - 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, - 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, - 0x65, 0x72, 0x52, 0x0b, 0x6c, 0x61, 0x74, 0x65, 0x46, 0x69, 0x72, 0x69, 0x6e, 0x67, 0x73, 0x1a, - 0x7f, 0x0a, 0x13, 0x41, 0x66, 0x74, 0x65, 0x72, 0x50, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x69, - 0x6e, 0x67, 0x54, 0x69, 0x6d, 0x65, 0x12, 0x68, 0x0a, 0x14, 0x74, 0x69, 0x6d, 0x65, 0x73, 0x74, - 0x61, 0x6d, 0x70, 0x5f, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x73, 0x18, 0x01, - 0x20, 0x03, 0x28, 0x0b, 0x32, 0x35, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, - 0x65, 0x2e, 0x62, 0x65, 0x61, 
0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, - 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, - 0x6d, 0x70, 0x54, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x52, 0x13, 0x74, 0x69, 0x6d, - 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, 0x54, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x73, - 0x1a, 0x21, 0x0a, 0x1f, 0x41, 0x66, 0x74, 0x65, 0x72, 0x53, 0x79, 0x6e, 0x63, 0x68, 0x72, 0x6f, - 0x6e, 0x69, 0x7a, 0x65, 0x64, 0x50, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x69, 0x6e, 0x67, 0x54, - 0x69, 0x6d, 0x65, 0x1a, 0x09, 0x0a, 0x07, 0x44, 0x65, 0x66, 0x61, 0x75, 0x6c, 0x74, 0x1a, 0x33, - 0x0a, 0x0c, 0x45, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x43, 0x6f, 0x75, 0x6e, 0x74, 0x12, 0x23, + 0x74, 0x0a, 0x15, 0x61, 0x66, 0x74, 0x65, 0x72, 0x5f, 0x70, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, + 0x69, 0x6e, 0x67, 0x5f, 0x74, 0x69, 0x6d, 0x65, 0x18, 0x05, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x3e, + 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, + 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, + 0x76, 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x2e, 0x41, 0x66, 0x74, 0x65, 0x72, + 0x50, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x69, 0x6e, 0x67, 0x54, 0x69, 0x6d, 0x65, 0x48, 0x00, + 0x52, 0x13, 0x61, 0x66, 0x74, 0x65, 0x72, 0x50, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x69, 0x6e, + 0x67, 0x54, 0x69, 0x6d, 0x65, 0x12, 0x99, 0x01, 0x0a, 0x22, 0x61, 0x66, 0x74, 0x65, 0x72, 0x5f, + 0x73, 0x79, 0x6e, 0x63, 0x68, 0x72, 0x6f, 0x6e, 0x69, 0x7a, 0x65, 0x64, 0x5f, 0x70, 0x72, 0x6f, + 0x63, 0x65, 0x73, 0x73, 0x69, 0x6e, 0x67, 0x5f, 0x74, 0x69, 0x6d, 0x65, 0x18, 0x06, 0x20, 0x01, + 0x28, 0x0b, 0x32, 0x4a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, + 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, + 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x2e, 0x41, + 0x66, 0x74, 0x65, 0x72, 0x53, 0x79, 0x6e, 0x63, 0x68, 0x72, 0x6f, 0x6e, 0x69, 0x7a, 0x65, 0x64, + 0x50, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x69, 0x6e, 0x67, 0x54, 0x69, 0x6d, 0x65, 0x48, 0x00, + 0x52, 0x1f, 0x61, 0x66, 0x74, 0x65, 0x72, 0x53, 0x79, 0x6e, 0x63, 0x68, 0x72, 0x6f, 0x6e, 0x69, + 0x7a, 0x65, 0x64, 0x50, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x69, 0x6e, 0x67, 0x54, 0x69, 0x6d, + 0x65, 0x12, 0x4b, 0x0a, 0x06, 0x61, 0x6c, 0x77, 0x61, 0x79, 0x73, 0x18, 0x0c, 0x20, 0x01, 0x28, + 0x0b, 0x32, 0x31, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, + 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, + 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x2e, 0x41, 0x6c, + 0x77, 0x61, 0x79, 0x73, 0x48, 0x00, 0x52, 0x06, 0x61, 0x6c, 0x77, 0x61, 0x79, 0x73, 0x12, 0x4e, + 0x0a, 0x07, 0x64, 0x65, 0x66, 0x61, 0x75, 0x6c, 0x74, 0x18, 0x07, 0x20, 0x01, 0x28, 0x0b, 0x32, + 0x32, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, + 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, + 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x2e, 0x44, 0x65, 0x66, 0x61, + 0x75, 0x6c, 0x74, 0x48, 0x00, 0x52, 0x07, 0x64, 0x65, 0x66, 0x61, 0x75, 0x6c, 0x74, 0x12, 0x5e, 0x0a, 0x0d, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x5f, 0x63, 0x6f, 0x75, 0x6e, 0x74, 0x18, - 0x01, 0x20, 0x01, 0x28, 0x05, 0x52, 0x0c, 0x65, 0x6c, 
0x65, 0x6d, 0x65, 0x6e, 0x74, 0x43, 0x6f, - 0x75, 0x6e, 0x74, 0x1a, 0x07, 0x0a, 0x05, 0x4e, 0x65, 0x76, 0x65, 0x72, 0x1a, 0x08, 0x0a, 0x06, - 0x41, 0x6c, 0x77, 0x61, 0x79, 0x73, 0x1a, 0x91, 0x01, 0x0a, 0x09, 0x4f, 0x72, 0x46, 0x69, 0x6e, - 0x61, 0x6c, 0x6c, 0x79, 0x12, 0x3e, 0x0a, 0x04, 0x6d, 0x61, 0x69, 0x6e, 0x18, 0x01, 0x20, 0x01, + 0x08, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x37, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, + 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, + 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, + 0x72, 0x2e, 0x45, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x43, 0x6f, 0x75, 0x6e, 0x74, 0x48, 0x00, + 0x52, 0x0c, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x43, 0x6f, 0x75, 0x6e, 0x74, 0x12, 0x48, + 0x0a, 0x05, 0x6e, 0x65, 0x76, 0x65, 0x72, 0x18, 0x09, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x30, 0x2e, + 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, + 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, + 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x2e, 0x4e, 0x65, 0x76, 0x65, 0x72, 0x48, + 0x00, 0x52, 0x05, 0x6e, 0x65, 0x76, 0x65, 0x72, 0x12, 0x55, 0x0a, 0x0a, 0x6f, 0x72, 0x5f, 0x66, + 0x69, 0x6e, 0x61, 0x6c, 0x6c, 0x79, 0x18, 0x0a, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x34, 0x2e, 0x6f, + 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, + 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, + 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x2e, 0x4f, 0x72, 0x46, 0x69, 0x6e, 0x61, 0x6c, + 0x6c, 0x79, 0x48, 0x00, 0x52, 0x09, 0x6f, 0x72, 0x46, 0x69, 0x6e, 0x61, 0x6c, 0x6c, 0x79, 0x12, + 0x4b, 0x0a, 0x06, 0x72, 0x65, 0x70, 0x65, 0x61, 0x74, 0x18, 0x0b, 0x20, 0x01, 0x28, 0x0b, 0x32, + 0x31, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, + 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, + 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x2e, 0x52, 0x65, 0x70, 0x65, + 0x61, 0x74, 0x48, 0x00, 0x52, 0x06, 0x72, 0x65, 0x70, 0x65, 0x61, 0x74, 0x1a, 0x58, 0x0a, 0x08, + 0x41, 0x66, 0x74, 0x65, 0x72, 0x41, 0x6c, 0x6c, 0x12, 0x4c, 0x0a, 0x0b, 0x73, 0x75, 0x62, 0x74, + 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x73, 0x18, 0x01, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x2a, 0x2e, + 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, + 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, + 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x52, 0x0b, 0x73, 0x75, 0x62, 0x74, 0x72, + 0x69, 0x67, 0x67, 0x65, 0x72, 0x73, 0x1a, 0x58, 0x0a, 0x08, 0x41, 0x66, 0x74, 0x65, 0x72, 0x41, + 0x6e, 0x79, 0x12, 0x4c, 0x0a, 0x0b, 0x73, 0x75, 0x62, 0x74, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, + 0x73, 0x18, 0x01, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x2a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, + 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, + 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, + 0x67, 0x65, 0x72, 0x52, 0x0b, 0x73, 0x75, 0x62, 0x74, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x73, + 0x1a, 0x59, 0x0a, 0x09, 0x41, 0x66, 0x74, 0x65, 0x72, 0x45, 0x61, 0x63, 0x68, 0x12, 0x4c, 0x0a, + 0x0b, 0x73, 0x75, 0x62, 0x74, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x73, 0x18, 
0x01, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x2a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, - 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x52, 0x04, - 0x6d, 0x61, 0x69, 0x6e, 0x12, 0x44, 0x0a, 0x07, 0x66, 0x69, 0x6e, 0x61, 0x6c, 0x6c, 0x79, 0x18, - 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, + 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x52, 0x0b, + 0x73, 0x75, 0x62, 0x74, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x73, 0x1a, 0xb2, 0x01, 0x0a, 0x10, + 0x41, 0x66, 0x74, 0x65, 0x72, 0x45, 0x6e, 0x64, 0x4f, 0x66, 0x57, 0x69, 0x6e, 0x64, 0x6f, 0x77, + 0x12, 0x4f, 0x0a, 0x0d, 0x65, 0x61, 0x72, 0x6c, 0x79, 0x5f, 0x66, 0x69, 0x72, 0x69, 0x6e, 0x67, + 0x73, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, + 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, + 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, + 0x67, 0x65, 0x72, 0x52, 0x0c, 0x65, 0x61, 0x72, 0x6c, 0x79, 0x46, 0x69, 0x72, 0x69, 0x6e, 0x67, + 0x73, 0x12, 0x4d, 0x0a, 0x0c, 0x6c, 0x61, 0x74, 0x65, 0x5f, 0x66, 0x69, 0x72, 0x69, 0x6e, 0x67, + 0x73, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, + 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, + 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, + 0x67, 0x65, 0x72, 0x52, 0x0b, 0x6c, 0x61, 0x74, 0x65, 0x46, 0x69, 0x72, 0x69, 0x6e, 0x67, 0x73, + 0x1a, 0x7f, 0x0a, 0x13, 0x41, 0x66, 0x74, 0x65, 0x72, 0x50, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, + 0x69, 0x6e, 0x67, 0x54, 0x69, 0x6d, 0x65, 0x12, 0x68, 0x0a, 0x14, 0x74, 0x69, 0x6d, 0x65, 0x73, + 0x74, 0x61, 0x6d, 0x70, 0x5f, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x73, 0x18, + 0x01, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x35, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, - 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, - 0x72, 0x52, 0x07, 0x66, 0x69, 0x6e, 0x61, 0x6c, 0x6c, 0x79, 0x1a, 0x54, 0x0a, 0x06, 0x52, 0x65, - 0x70, 0x65, 0x61, 0x74, 0x12, 0x4a, 0x0a, 0x0a, 0x73, 0x75, 0x62, 0x74, 0x72, 0x69, 0x67, 0x67, - 0x65, 0x72, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, - 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, - 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x72, 0x69, - 0x67, 0x67, 0x65, 0x72, 0x52, 0x0a, 0x73, 0x75, 0x62, 0x74, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, - 0x42, 0x09, 0x0a, 0x07, 0x74, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x22, 0xc3, 0x02, 0x0a, 0x12, - 0x54, 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, 0x54, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, - 0x72, 0x6d, 0x12, 0x53, 0x0a, 0x05, 0x64, 0x65, 0x6c, 0x61, 0x79, 0x18, 0x01, 0x20, 0x01, 0x28, - 0x0b, 0x32, 0x3b, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, - 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, - 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, 0x54, - 0x72, 0x61, 
0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x2e, 0x44, 0x65, 0x6c, 0x61, 0x79, 0x48, 0x00, - 0x52, 0x05, 0x64, 0x65, 0x6c, 0x61, 0x79, 0x12, 0x5a, 0x0a, 0x08, 0x61, 0x6c, 0x69, 0x67, 0x6e, - 0x5f, 0x74, 0x6f, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x3d, 0x2e, 0x6f, 0x72, 0x67, 0x2e, - 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, - 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x69, + 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x69, 0x6d, 0x65, 0x73, 0x74, + 0x61, 0x6d, 0x70, 0x54, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x52, 0x13, 0x74, 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, 0x54, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, - 0x2e, 0x41, 0x6c, 0x69, 0x67, 0x6e, 0x54, 0x6f, 0x48, 0x00, 0x52, 0x07, 0x61, 0x6c, 0x69, 0x67, - 0x6e, 0x54, 0x6f, 0x1a, 0x2a, 0x0a, 0x05, 0x44, 0x65, 0x6c, 0x61, 0x79, 0x12, 0x21, 0x0a, 0x0c, - 0x64, 0x65, 0x6c, 0x61, 0x79, 0x5f, 0x6d, 0x69, 0x6c, 0x6c, 0x69, 0x73, 0x18, 0x01, 0x20, 0x01, - 0x28, 0x03, 0x52, 0x0b, 0x64, 0x65, 0x6c, 0x61, 0x79, 0x4d, 0x69, 0x6c, 0x6c, 0x69, 0x73, 0x1a, - 0x39, 0x0a, 0x07, 0x41, 0x6c, 0x69, 0x67, 0x6e, 0x54, 0x6f, 0x12, 0x16, 0x0a, 0x06, 0x70, 0x65, - 0x72, 0x69, 0x6f, 0x64, 0x18, 0x03, 0x20, 0x01, 0x28, 0x03, 0x52, 0x06, 0x70, 0x65, 0x72, 0x69, - 0x6f, 0x64, 0x12, 0x16, 0x0a, 0x06, 0x6f, 0x66, 0x66, 0x73, 0x65, 0x74, 0x18, 0x04, 0x20, 0x01, - 0x28, 0x03, 0x52, 0x06, 0x6f, 0x66, 0x66, 0x73, 0x65, 0x74, 0x42, 0x15, 0x0a, 0x13, 0x74, 0x69, - 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, 0x5f, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, - 0x6d, 0x22, 0x8a, 0x02, 0x0a, 0x09, 0x53, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x12, - 0x56, 0x0a, 0x0e, 0x61, 0x63, 0x63, 0x65, 0x73, 0x73, 0x5f, 0x70, 0x61, 0x74, 0x74, 0x65, 0x72, - 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, - 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, - 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x75, 0x6e, 0x63, - 0x74, 0x69, 0x6f, 0x6e, 0x53, 0x70, 0x65, 0x63, 0x52, 0x0d, 0x61, 0x63, 0x63, 0x65, 0x73, 0x73, - 0x50, 0x61, 0x74, 0x74, 0x65, 0x72, 0x6e, 0x12, 0x48, 0x0a, 0x07, 0x76, 0x69, 0x65, 0x77, 0x5f, - 0x66, 0x6e, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, + 0x73, 0x1a, 0x21, 0x0a, 0x1f, 0x41, 0x66, 0x74, 0x65, 0x72, 0x53, 0x79, 0x6e, 0x63, 0x68, 0x72, + 0x6f, 0x6e, 0x69, 0x7a, 0x65, 0x64, 0x50, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x69, 0x6e, 0x67, + 0x54, 0x69, 0x6d, 0x65, 0x1a, 0x09, 0x0a, 0x07, 0x44, 0x65, 0x66, 0x61, 0x75, 0x6c, 0x74, 0x1a, + 0x33, 0x0a, 0x0c, 0x45, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x43, 0x6f, 0x75, 0x6e, 0x74, 0x12, + 0x23, 0x0a, 0x0d, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x5f, 0x63, 0x6f, 0x75, 0x6e, 0x74, + 0x18, 0x01, 0x20, 0x01, 0x28, 0x05, 0x52, 0x0c, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x43, + 0x6f, 0x75, 0x6e, 0x74, 0x1a, 0x07, 0x0a, 0x05, 0x4e, 0x65, 0x76, 0x65, 0x72, 0x1a, 0x08, 0x0a, + 0x06, 0x41, 0x6c, 0x77, 0x61, 0x79, 0x73, 0x1a, 0x91, 0x01, 0x0a, 0x09, 0x4f, 0x72, 0x46, 0x69, + 0x6e, 0x61, 0x6c, 0x6c, 0x79, 0x12, 0x3e, 0x0a, 0x04, 0x6d, 0x61, 0x69, 0x6e, 0x18, 0x01, 0x20, + 0x01, 0x28, 0x0b, 0x32, 0x2a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, + 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, + 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 
0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x52, + 0x04, 0x6d, 0x61, 0x69, 0x6e, 0x12, 0x44, 0x0a, 0x07, 0x66, 0x69, 0x6e, 0x61, 0x6c, 0x6c, 0x79, + 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, + 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, + 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x72, 0x69, 0x67, 0x67, + 0x65, 0x72, 0x52, 0x07, 0x66, 0x69, 0x6e, 0x61, 0x6c, 0x6c, 0x79, 0x1a, 0x54, 0x0a, 0x06, 0x52, + 0x65, 0x70, 0x65, 0x61, 0x74, 0x12, 0x4a, 0x0a, 0x0a, 0x73, 0x75, 0x62, 0x74, 0x72, 0x69, 0x67, + 0x67, 0x65, 0x72, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, + 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, + 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x72, + 0x69, 0x67, 0x67, 0x65, 0x72, 0x52, 0x0a, 0x73, 0x75, 0x62, 0x74, 0x72, 0x69, 0x67, 0x67, 0x65, + 0x72, 0x42, 0x09, 0x0a, 0x07, 0x74, 0x72, 0x69, 0x67, 0x67, 0x65, 0x72, 0x22, 0xc3, 0x02, 0x0a, + 0x12, 0x54, 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, 0x54, 0x72, 0x61, 0x6e, 0x73, 0x66, + 0x6f, 0x72, 0x6d, 0x12, 0x53, 0x0a, 0x05, 0x64, 0x65, 0x6c, 0x61, 0x79, 0x18, 0x01, 0x20, 0x01, + 0x28, 0x0b, 0x32, 0x3b, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, + 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, + 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, + 0x54, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x2e, 0x44, 0x65, 0x6c, 0x61, 0x79, 0x48, + 0x00, 0x52, 0x05, 0x64, 0x65, 0x6c, 0x61, 0x79, 0x12, 0x5a, 0x0a, 0x08, 0x61, 0x6c, 0x69, 0x67, + 0x6e, 0x5f, 0x74, 0x6f, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x3d, 0x2e, 0x6f, 0x72, 0x67, + 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, + 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, + 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, 0x54, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, + 0x6d, 0x2e, 0x41, 0x6c, 0x69, 0x67, 0x6e, 0x54, 0x6f, 0x48, 0x00, 0x52, 0x07, 0x61, 0x6c, 0x69, + 0x67, 0x6e, 0x54, 0x6f, 0x1a, 0x2a, 0x0a, 0x05, 0x44, 0x65, 0x6c, 0x61, 0x79, 0x12, 0x21, 0x0a, + 0x0c, 0x64, 0x65, 0x6c, 0x61, 0x79, 0x5f, 0x6d, 0x69, 0x6c, 0x6c, 0x69, 0x73, 0x18, 0x01, 0x20, + 0x01, 0x28, 0x03, 0x52, 0x0b, 0x64, 0x65, 0x6c, 0x61, 0x79, 0x4d, 0x69, 0x6c, 0x6c, 0x69, 0x73, + 0x1a, 0x39, 0x0a, 0x07, 0x41, 0x6c, 0x69, 0x67, 0x6e, 0x54, 0x6f, 0x12, 0x16, 0x0a, 0x06, 0x70, + 0x65, 0x72, 0x69, 0x6f, 0x64, 0x18, 0x03, 0x20, 0x01, 0x28, 0x03, 0x52, 0x06, 0x70, 0x65, 0x72, + 0x69, 0x6f, 0x64, 0x12, 0x16, 0x0a, 0x06, 0x6f, 0x66, 0x66, 0x73, 0x65, 0x74, 0x18, 0x04, 0x20, + 0x01, 0x28, 0x03, 0x52, 0x06, 0x6f, 0x66, 0x66, 0x73, 0x65, 0x74, 0x42, 0x15, 0x0a, 0x13, 0x74, + 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, 0x5f, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, + 0x72, 0x6d, 0x22, 0x8a, 0x02, 0x0a, 0x09, 0x53, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, + 0x12, 0x56, 0x0a, 0x0e, 0x61, 0x63, 0x63, 0x65, 0x73, 0x73, 0x5f, 0x70, 0x61, 0x74, 0x74, 0x65, + 0x72, 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 
0x31, 0x2e, 0x46, 0x75, 0x6e, - 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x53, 0x70, 0x65, 0x63, 0x52, 0x06, 0x76, 0x69, 0x65, 0x77, 0x46, - 0x6e, 0x12, 0x5b, 0x0a, 0x11, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x5f, 0x6d, 0x61, 0x70, 0x70, - 0x69, 0x6e, 0x67, 0x5f, 0x66, 0x6e, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, - 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, - 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, - 0x2e, 0x46, 0x75, 0x6e, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x53, 0x70, 0x65, 0x63, 0x52, 0x0f, 0x77, - 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x4d, 0x61, 0x70, 0x70, 0x69, 0x6e, 0x67, 0x46, 0x6e, 0x22, 0xf6, - 0x02, 0x0a, 0x11, 0x53, 0x74, 0x61, 0x6e, 0x64, 0x61, 0x72, 0x64, 0x41, 0x72, 0x74, 0x69, 0x66, - 0x61, 0x63, 0x74, 0x73, 0x22, 0x9f, 0x02, 0x0a, 0x05, 0x54, 0x79, 0x70, 0x65, 0x73, 0x12, 0x2a, - 0x0a, 0x04, 0x46, 0x49, 0x4c, 0x45, 0x10, 0x00, 0x1a, 0x20, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1a, - 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x61, 0x72, 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, 0x3a, 0x74, 0x79, - 0x70, 0x65, 0x3a, 0x66, 0x69, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x28, 0x0a, 0x03, 0x55, 0x52, - 0x4c, 0x10, 0x01, 0x1a, 0x1f, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x19, 0x62, 0x65, 0x61, 0x6d, 0x3a, - 0x61, 0x72, 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, 0x3a, 0x74, 0x79, 0x70, 0x65, 0x3a, 0x75, 0x72, - 0x6c, 0x3a, 0x76, 0x31, 0x12, 0x32, 0x0a, 0x08, 0x45, 0x4d, 0x42, 0x45, 0x44, 0x44, 0x45, 0x44, - 0x10, 0x02, 0x1a, 0x24, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1e, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x61, - 0x72, 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, 0x3a, 0x74, 0x79, 0x70, 0x65, 0x3a, 0x65, 0x6d, 0x62, - 0x65, 0x64, 0x64, 0x65, 0x64, 0x3a, 0x76, 0x31, 0x12, 0x2a, 0x0a, 0x04, 0x50, 0x59, 0x50, 0x49, - 0x10, 0x03, 0x1a, 0x20, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1a, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x61, - 0x72, 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, 0x3a, 0x74, 0x79, 0x70, 0x65, 0x3a, 0x70, 0x79, 0x70, - 0x69, 0x3a, 0x76, 0x31, 0x12, 0x2c, 0x0a, 0x05, 0x4d, 0x41, 0x56, 0x45, 0x4e, 0x10, 0x04, 0x1a, - 0x21, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1b, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x61, 0x72, 0x74, 0x69, - 0x66, 0x61, 0x63, 0x74, 0x3a, 0x74, 0x79, 0x70, 0x65, 0x3a, 0x6d, 0x61, 0x76, 0x65, 0x6e, 0x3a, - 0x76, 0x31, 0x12, 0x32, 0x0a, 0x08, 0x44, 0x45, 0x46, 0x45, 0x52, 0x52, 0x45, 0x44, 0x10, 0x05, - 0x1a, 0x24, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1e, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x61, 0x72, 0x74, - 0x69, 0x66, 0x61, 0x63, 0x74, 0x3a, 0x74, 0x79, 0x70, 0x65, 0x3a, 0x64, 0x65, 0x66, 0x65, 0x72, - 0x72, 0x65, 0x64, 0x3a, 0x76, 0x31, 0x22, 0x3f, 0x0a, 0x05, 0x52, 0x6f, 0x6c, 0x65, 0x73, 0x12, - 0x36, 0x0a, 0x0a, 0x53, 0x54, 0x41, 0x47, 0x49, 0x4e, 0x47, 0x5f, 0x54, 0x4f, 0x10, 0x00, 0x1a, - 0x26, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x20, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x61, 0x72, 0x74, 0x69, - 0x66, 0x61, 0x63, 0x74, 0x3a, 0x72, 0x6f, 0x6c, 0x65, 0x3a, 0x73, 0x74, 0x61, 0x67, 0x69, 0x6e, - 0x67, 0x5f, 0x74, 0x6f, 0x3a, 0x76, 0x31, 0x22, 0x41, 0x0a, 0x13, 0x41, 0x72, 0x74, 0x69, 0x66, + 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x53, 0x70, 0x65, 0x63, 0x52, 0x0d, 0x61, 0x63, 0x63, 0x65, 0x73, + 0x73, 0x50, 0x61, 0x74, 0x74, 0x65, 0x72, 0x6e, 0x12, 0x48, 0x0a, 0x07, 0x76, 0x69, 0x65, 0x77, + 0x5f, 0x66, 0x6e, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, + 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, + 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 
0x75, + 0x6e, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x53, 0x70, 0x65, 0x63, 0x52, 0x06, 0x76, 0x69, 0x65, 0x77, + 0x46, 0x6e, 0x12, 0x5b, 0x0a, 0x11, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x5f, 0x6d, 0x61, 0x70, + 0x70, 0x69, 0x6e, 0x67, 0x5f, 0x66, 0x6e, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, + 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, + 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, + 0x31, 0x2e, 0x46, 0x75, 0x6e, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x53, 0x70, 0x65, 0x63, 0x52, 0x0f, + 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x4d, 0x61, 0x70, 0x70, 0x69, 0x6e, 0x67, 0x46, 0x6e, 0x22, + 0xc5, 0x03, 0x0a, 0x11, 0x53, 0x74, 0x61, 0x6e, 0x64, 0x61, 0x72, 0x64, 0x41, 0x72, 0x74, 0x69, + 0x66, 0x61, 0x63, 0x74, 0x73, 0x22, 0x9f, 0x02, 0x0a, 0x05, 0x54, 0x79, 0x70, 0x65, 0x73, 0x12, + 0x2a, 0x0a, 0x04, 0x46, 0x49, 0x4c, 0x45, 0x10, 0x00, 0x1a, 0x20, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, + 0x1a, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x61, 0x72, 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, 0x3a, 0x74, + 0x79, 0x70, 0x65, 0x3a, 0x66, 0x69, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x28, 0x0a, 0x03, 0x55, + 0x52, 0x4c, 0x10, 0x01, 0x1a, 0x1f, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x19, 0x62, 0x65, 0x61, 0x6d, + 0x3a, 0x61, 0x72, 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, 0x3a, 0x74, 0x79, 0x70, 0x65, 0x3a, 0x75, + 0x72, 0x6c, 0x3a, 0x76, 0x31, 0x12, 0x32, 0x0a, 0x08, 0x45, 0x4d, 0x42, 0x45, 0x44, 0x44, 0x45, + 0x44, 0x10, 0x02, 0x1a, 0x24, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1e, 0x62, 0x65, 0x61, 0x6d, 0x3a, + 0x61, 0x72, 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, 0x3a, 0x74, 0x79, 0x70, 0x65, 0x3a, 0x65, 0x6d, + 0x62, 0x65, 0x64, 0x64, 0x65, 0x64, 0x3a, 0x76, 0x31, 0x12, 0x2a, 0x0a, 0x04, 0x50, 0x59, 0x50, + 0x49, 0x10, 0x03, 0x1a, 0x20, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1a, 0x62, 0x65, 0x61, 0x6d, 0x3a, + 0x61, 0x72, 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, 0x3a, 0x74, 0x79, 0x70, 0x65, 0x3a, 0x70, 0x79, + 0x70, 0x69, 0x3a, 0x76, 0x31, 0x12, 0x2c, 0x0a, 0x05, 0x4d, 0x41, 0x56, 0x45, 0x4e, 0x10, 0x04, + 0x1a, 0x21, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1b, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x61, 0x72, 0x74, + 0x69, 0x66, 0x61, 0x63, 0x74, 0x3a, 0x74, 0x79, 0x70, 0x65, 0x3a, 0x6d, 0x61, 0x76, 0x65, 0x6e, + 0x3a, 0x76, 0x31, 0x12, 0x32, 0x0a, 0x08, 0x44, 0x45, 0x46, 0x45, 0x52, 0x52, 0x45, 0x44, 0x10, + 0x05, 0x1a, 0x24, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1e, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x61, 0x72, + 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, 0x3a, 0x74, 0x79, 0x70, 0x65, 0x3a, 0x64, 0x65, 0x66, 0x65, + 0x72, 0x72, 0x65, 0x64, 0x3a, 0x76, 0x31, 0x22, 0x8d, 0x01, 0x0a, 0x05, 0x52, 0x6f, 0x6c, 0x65, + 0x73, 0x12, 0x36, 0x0a, 0x0a, 0x53, 0x54, 0x41, 0x47, 0x49, 0x4e, 0x47, 0x5f, 0x54, 0x4f, 0x10, + 0x00, 0x1a, 0x26, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x20, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x61, 0x72, + 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, 0x3a, 0x72, 0x6f, 0x6c, 0x65, 0x3a, 0x73, 0x74, 0x61, 0x67, + 0x69, 0x6e, 0x67, 0x5f, 0x74, 0x6f, 0x3a, 0x76, 0x31, 0x12, 0x4c, 0x0a, 0x15, 0x50, 0x49, 0x50, + 0x5f, 0x52, 0x45, 0x51, 0x55, 0x49, 0x52, 0x45, 0x4d, 0x45, 0x4e, 0x54, 0x53, 0x5f, 0x46, 0x49, + 0x4c, 0x45, 0x10, 0x01, 0x1a, 0x31, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x2b, 0x62, 0x65, 0x61, 0x6d, + 0x3a, 0x61, 0x72, 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, 0x3a, 0x72, 0x6f, 0x6c, 0x65, 0x3a, 0x70, + 0x69, 0x70, 0x5f, 0x72, 0x65, 0x71, 0x75, 0x69, 0x72, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x73, 0x5f, + 0x66, 0x69, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x22, 0x41, 0x0a, 0x13, 0x41, 0x72, 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, 
0x46, 0x69, 0x6c, 0x65, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x12, 0x0a, 0x04, 0x70, 0x61, 0x74, 0x68, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x04, 0x70, 0x61, 0x74, 0x68, 0x12, 0x16, 0x0a, 0x06, 0x73, 0x68, 0x61, 0x32, 0x35, 0x36, 0x18, 0x02, 0x20, 0x01, - 0x28, 0x09, 0x52, 0x06, 0x73, 0x68, 0x61, 0x32, 0x35, 0x36, 0x22, 0x26, 0x0a, 0x12, 0x41, 0x72, + 0x28, 0x09, 0x52, 0x06, 0x73, 0x68, 0x61, 0x32, 0x35, 0x36, 0x22, 0x3e, 0x0a, 0x12, 0x41, 0x72, 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, 0x55, 0x72, 0x6c, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x10, 0x0a, 0x03, 0x75, 0x72, 0x6c, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x75, - 0x72, 0x6c, 0x22, 0x29, 0x0a, 0x13, 0x45, 0x6d, 0x62, 0x65, 0x64, 0x64, 0x65, 0x64, 0x46, 0x69, - 0x6c, 0x65, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x12, 0x0a, 0x04, 0x64, 0x61, 0x74, - 0x61, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x04, 0x64, 0x61, 0x74, 0x61, 0x22, 0x48, 0x0a, - 0x0b, 0x50, 0x79, 0x50, 0x49, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x1f, 0x0a, 0x0b, - 0x61, 0x72, 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, 0x5f, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, - 0x09, 0x52, 0x0a, 0x61, 0x72, 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, 0x49, 0x64, 0x12, 0x18, 0x0a, - 0x07, 0x76, 0x65, 0x72, 0x73, 0x69, 0x6f, 0x6e, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x07, - 0x76, 0x65, 0x72, 0x73, 0x69, 0x6f, 0x6e, 0x22, 0x51, 0x0a, 0x0c, 0x4d, 0x61, 0x76, 0x65, 0x6e, - 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x1a, 0x0a, 0x08, 0x61, 0x72, 0x74, 0x69, 0x66, - 0x61, 0x63, 0x74, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x08, 0x61, 0x72, 0x74, 0x69, 0x66, - 0x61, 0x63, 0x74, 0x12, 0x25, 0x0a, 0x0e, 0x72, 0x65, 0x70, 0x6f, 0x73, 0x69, 0x74, 0x6f, 0x72, - 0x79, 0x5f, 0x75, 0x72, 0x6c, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0d, 0x72, 0x65, 0x70, - 0x6f, 0x73, 0x69, 0x74, 0x6f, 0x72, 0x79, 0x55, 0x72, 0x6c, 0x22, 0x3f, 0x0a, 0x17, 0x44, 0x65, - 0x66, 0x65, 0x72, 0x72, 0x65, 0x64, 0x41, 0x72, 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, 0x50, 0x61, - 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x10, 0x0a, 0x03, 0x6b, 0x65, 0x79, 0x18, 0x01, 0x20, 0x01, - 0x28, 0x09, 0x52, 0x03, 0x6b, 0x65, 0x79, 0x12, 0x12, 0x0a, 0x04, 0x64, 0x61, 0x74, 0x61, 0x18, - 0x02, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x04, 0x64, 0x61, 0x74, 0x61, 0x22, 0x3f, 0x0a, 0x1c, 0x41, - 0x72, 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, 0x53, 0x74, 0x61, 0x67, 0x69, 0x6e, 0x67, 0x54, 0x6f, - 0x52, 0x6f, 0x6c, 0x65, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x1f, 0x0a, 0x0b, 0x73, - 0x74, 0x61, 0x67, 0x65, 0x64, 0x5f, 0x6e, 0x61, 0x6d, 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, - 0x52, 0x0a, 0x73, 0x74, 0x61, 0x67, 0x65, 0x64, 0x4e, 0x61, 0x6d, 0x65, 0x22, 0x91, 0x01, 0x0a, - 0x13, 0x41, 0x72, 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, 0x49, 0x6e, 0x66, 0x6f, 0x72, 0x6d, 0x61, - 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x19, 0x0a, 0x08, 0x74, 0x79, 0x70, 0x65, 0x5f, 0x75, 0x72, 0x6e, - 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x07, 0x74, 0x79, 0x70, 0x65, 0x55, 0x72, 0x6e, 0x12, - 0x21, 0x0a, 0x0c, 0x74, 0x79, 0x70, 0x65, 0x5f, 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x18, - 0x02, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x0b, 0x74, 0x79, 0x70, 0x65, 0x50, 0x61, 0x79, 0x6c, 0x6f, - 0x61, 0x64, 0x12, 0x19, 0x0a, 0x08, 0x72, 0x6f, 0x6c, 0x65, 0x5f, 0x75, 0x72, 0x6e, 0x18, 0x03, - 0x20, 0x01, 0x28, 0x09, 0x52, 0x07, 0x72, 0x6f, 0x6c, 0x65, 0x55, 0x72, 0x6e, 0x12, 0x21, 0x0a, - 0x0c, 0x72, 0x6f, 0x6c, 0x65, 0x5f, 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x18, 0x04, 0x20, - 0x01, 0x28, 0x0c, 0x52, 0x0b, 0x72, 0x6f, 0x6c, 
0x65, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, - 0x22, 0x92, 0x02, 0x0a, 0x0b, 0x45, 0x6e, 0x76, 0x69, 0x72, 0x6f, 0x6e, 0x6d, 0x65, 0x6e, 0x74, - 0x12, 0x10, 0x0a, 0x03, 0x75, 0x72, 0x6e, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x75, - 0x72, 0x6e, 0x12, 0x18, 0x0a, 0x07, 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x18, 0x03, 0x20, - 0x01, 0x28, 0x0c, 0x52, 0x07, 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x51, 0x0a, 0x0c, - 0x64, 0x69, 0x73, 0x70, 0x6c, 0x61, 0x79, 0x5f, 0x64, 0x61, 0x74, 0x61, 0x18, 0x04, 0x20, 0x03, - 0x28, 0x0b, 0x32, 0x2e, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, - 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, - 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x44, 0x69, 0x73, 0x70, 0x6c, 0x61, 0x79, 0x44, 0x61, - 0x74, 0x61, 0x52, 0x0b, 0x64, 0x69, 0x73, 0x70, 0x6c, 0x61, 0x79, 0x44, 0x61, 0x74, 0x61, 0x12, - 0x22, 0x0a, 0x0c, 0x63, 0x61, 0x70, 0x61, 0x62, 0x69, 0x6c, 0x69, 0x74, 0x69, 0x65, 0x73, 0x18, - 0x05, 0x20, 0x03, 0x28, 0x09, 0x52, 0x0c, 0x63, 0x61, 0x70, 0x61, 0x62, 0x69, 0x6c, 0x69, 0x74, - 0x69, 0x65, 0x73, 0x12, 0x5a, 0x0a, 0x0c, 0x64, 0x65, 0x70, 0x65, 0x6e, 0x64, 0x65, 0x6e, 0x63, - 0x69, 0x65, 0x73, 0x18, 0x06, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x36, 0x2e, 0x6f, 0x72, 0x67, 0x2e, - 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, - 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x41, 0x72, - 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, 0x49, 0x6e, 0x66, 0x6f, 0x72, 0x6d, 0x61, 0x74, 0x69, 0x6f, - 0x6e, 0x52, 0x0c, 0x64, 0x65, 0x70, 0x65, 0x6e, 0x64, 0x65, 0x6e, 0x63, 0x69, 0x65, 0x73, 0x4a, - 0x04, 0x08, 0x01, 0x10, 0x02, 0x22, 0x9f, 0x01, 0x0a, 0x14, 0x53, 0x74, 0x61, 0x6e, 0x64, 0x61, - 0x72, 0x64, 0x45, 0x6e, 0x76, 0x69, 0x72, 0x6f, 0x6e, 0x6d, 0x65, 0x6e, 0x74, 0x73, 0x22, 0x86, - 0x01, 0x0a, 0x0c, 0x45, 0x6e, 0x76, 0x69, 0x72, 0x6f, 0x6e, 0x6d, 0x65, 0x6e, 0x74, 0x73, 0x12, - 0x24, 0x0a, 0x06, 0x44, 0x4f, 0x43, 0x4b, 0x45, 0x52, 0x10, 0x00, 0x1a, 0x18, 0xa2, 0xb4, 0xfa, - 0xc2, 0x05, 0x12, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x65, 0x6e, 0x76, 0x3a, 0x64, 0x6f, 0x63, 0x6b, - 0x65, 0x72, 0x3a, 0x76, 0x31, 0x12, 0x26, 0x0a, 0x07, 0x50, 0x52, 0x4f, 0x43, 0x45, 0x53, 0x53, - 0x10, 0x01, 0x1a, 0x19, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x13, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x65, - 0x6e, 0x76, 0x3a, 0x70, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x3a, 0x76, 0x31, 0x12, 0x28, 0x0a, - 0x08, 0x45, 0x58, 0x54, 0x45, 0x52, 0x4e, 0x41, 0x4c, 0x10, 0x02, 0x1a, 0x1a, 0xa2, 0xb4, 0xfa, - 0xc2, 0x05, 0x14, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x65, 0x6e, 0x76, 0x3a, 0x65, 0x78, 0x74, 0x65, - 0x72, 0x6e, 0x61, 0x6c, 0x3a, 0x76, 0x31, 0x22, 0x38, 0x0a, 0x0d, 0x44, 0x6f, 0x63, 0x6b, 0x65, - 0x72, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x27, 0x0a, 0x0f, 0x63, 0x6f, 0x6e, 0x74, - 0x61, 0x69, 0x6e, 0x65, 0x72, 0x5f, 0x69, 0x6d, 0x61, 0x67, 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, - 0x09, 0x52, 0x0e, 0x63, 0x6f, 0x6e, 0x74, 0x61, 0x69, 0x6e, 0x65, 0x72, 0x49, 0x6d, 0x61, 0x67, - 0x65, 0x22, 0xd4, 0x01, 0x0a, 0x0e, 0x50, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x50, 0x61, 0x79, - 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x0e, 0x0a, 0x02, 0x6f, 0x73, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, - 0x52, 0x02, 0x6f, 0x73, 0x12, 0x12, 0x0a, 0x04, 0x61, 0x72, 0x63, 0x68, 0x18, 0x02, 0x20, 0x01, - 0x28, 0x09, 0x52, 0x04, 0x61, 0x72, 0x63, 0x68, 0x12, 0x18, 0x0a, 0x07, 0x63, 0x6f, 0x6d, 0x6d, - 0x61, 0x6e, 0x64, 0x18, 0x03, 0x20, 0x01, 0x28, 0x09, 0x52, 0x07, 0x63, 
0x6f, 0x6d, 0x6d, 0x61, - 0x6e, 0x64, 0x12, 0x4c, 0x0a, 0x03, 0x65, 0x6e, 0x76, 0x18, 0x04, 0x20, 0x03, 0x28, 0x0b, 0x32, - 0x3a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, - 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, - 0x2e, 0x76, 0x31, 0x2e, 0x50, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x50, 0x61, 0x79, 0x6c, 0x6f, - 0x61, 0x64, 0x2e, 0x45, 0x6e, 0x76, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x52, 0x03, 0x65, 0x6e, 0x76, - 0x1a, 0x36, 0x0a, 0x08, 0x45, 0x6e, 0x76, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x12, 0x10, 0x0a, 0x03, - 0x6b, 0x65, 0x79, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x6b, 0x65, 0x79, 0x12, 0x14, - 0x0a, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x05, 0x76, - 0x61, 0x6c, 0x75, 0x65, 0x3a, 0x02, 0x38, 0x01, 0x22, 0xf9, 0x01, 0x0a, 0x0f, 0x45, 0x78, 0x74, - 0x65, 0x72, 0x6e, 0x61, 0x6c, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x53, 0x0a, 0x08, - 0x65, 0x6e, 0x64, 0x70, 0x6f, 0x69, 0x6e, 0x74, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x37, - 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, - 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, - 0x76, 0x31, 0x2e, 0x41, 0x70, 0x69, 0x53, 0x65, 0x72, 0x76, 0x69, 0x63, 0x65, 0x44, 0x65, 0x73, - 0x63, 0x72, 0x69, 0x70, 0x74, 0x6f, 0x72, 0x52, 0x08, 0x65, 0x6e, 0x64, 0x70, 0x6f, 0x69, 0x6e, - 0x74, 0x12, 0x56, 0x0a, 0x06, 0x70, 0x61, 0x72, 0x61, 0x6d, 0x73, 0x18, 0x02, 0x20, 0x03, 0x28, - 0x0b, 0x32, 0x3e, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, + 0x72, 0x6c, 0x12, 0x16, 0x0a, 0x06, 0x73, 0x68, 0x61, 0x32, 0x35, 0x36, 0x18, 0x02, 0x20, 0x01, + 0x28, 0x09, 0x52, 0x06, 0x73, 0x68, 0x61, 0x32, 0x35, 0x36, 0x22, 0x29, 0x0a, 0x13, 0x45, 0x6d, + 0x62, 0x65, 0x64, 0x64, 0x65, 0x64, 0x46, 0x69, 0x6c, 0x65, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, + 0x64, 0x12, 0x12, 0x0a, 0x04, 0x64, 0x61, 0x74, 0x61, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0c, 0x52, + 0x04, 0x64, 0x61, 0x74, 0x61, 0x22, 0x48, 0x0a, 0x0b, 0x50, 0x79, 0x50, 0x49, 0x50, 0x61, 0x79, + 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x1f, 0x0a, 0x0b, 0x61, 0x72, 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, + 0x5f, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0a, 0x61, 0x72, 0x74, 0x69, 0x66, + 0x61, 0x63, 0x74, 0x49, 0x64, 0x12, 0x18, 0x0a, 0x07, 0x76, 0x65, 0x72, 0x73, 0x69, 0x6f, 0x6e, + 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x07, 0x76, 0x65, 0x72, 0x73, 0x69, 0x6f, 0x6e, 0x22, + 0x51, 0x0a, 0x0c, 0x4d, 0x61, 0x76, 0x65, 0x6e, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, + 0x1a, 0x0a, 0x08, 0x61, 0x72, 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, 0x18, 0x01, 0x20, 0x01, 0x28, + 0x09, 0x52, 0x08, 0x61, 0x72, 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, 0x12, 0x25, 0x0a, 0x0e, 0x72, + 0x65, 0x70, 0x6f, 0x73, 0x69, 0x74, 0x6f, 0x72, 0x79, 0x5f, 0x75, 0x72, 0x6c, 0x18, 0x02, 0x20, + 0x01, 0x28, 0x09, 0x52, 0x0d, 0x72, 0x65, 0x70, 0x6f, 0x73, 0x69, 0x74, 0x6f, 0x72, 0x79, 0x55, + 0x72, 0x6c, 0x22, 0x3f, 0x0a, 0x17, 0x44, 0x65, 0x66, 0x65, 0x72, 0x72, 0x65, 0x64, 0x41, 0x72, + 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x10, 0x0a, + 0x03, 0x6b, 0x65, 0x79, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x6b, 0x65, 0x79, 0x12, + 0x12, 0x0a, 0x04, 0x64, 0x61, 0x74, 0x61, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x04, 0x64, + 0x61, 0x74, 0x61, 0x22, 0x3f, 0x0a, 0x1c, 0x41, 0x72, 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, 0x53, + 
0x74, 0x61, 0x67, 0x69, 0x6e, 0x67, 0x54, 0x6f, 0x52, 0x6f, 0x6c, 0x65, 0x50, 0x61, 0x79, 0x6c, + 0x6f, 0x61, 0x64, 0x12, 0x1f, 0x0a, 0x0b, 0x73, 0x74, 0x61, 0x67, 0x65, 0x64, 0x5f, 0x6e, 0x61, + 0x6d, 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0a, 0x73, 0x74, 0x61, 0x67, 0x65, 0x64, + 0x4e, 0x61, 0x6d, 0x65, 0x22, 0x91, 0x01, 0x0a, 0x13, 0x41, 0x72, 0x74, 0x69, 0x66, 0x61, 0x63, + 0x74, 0x49, 0x6e, 0x66, 0x6f, 0x72, 0x6d, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x19, 0x0a, 0x08, + 0x74, 0x79, 0x70, 0x65, 0x5f, 0x75, 0x72, 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x07, + 0x74, 0x79, 0x70, 0x65, 0x55, 0x72, 0x6e, 0x12, 0x21, 0x0a, 0x0c, 0x74, 0x79, 0x70, 0x65, 0x5f, + 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x0b, 0x74, + 0x79, 0x70, 0x65, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x19, 0x0a, 0x08, 0x72, 0x6f, + 0x6c, 0x65, 0x5f, 0x75, 0x72, 0x6e, 0x18, 0x03, 0x20, 0x01, 0x28, 0x09, 0x52, 0x07, 0x72, 0x6f, + 0x6c, 0x65, 0x55, 0x72, 0x6e, 0x12, 0x21, 0x0a, 0x0c, 0x72, 0x6f, 0x6c, 0x65, 0x5f, 0x70, 0x61, + 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x18, 0x04, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x0b, 0x72, 0x6f, 0x6c, + 0x65, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x22, 0xbe, 0x03, 0x0a, 0x0b, 0x45, 0x6e, 0x76, + 0x69, 0x72, 0x6f, 0x6e, 0x6d, 0x65, 0x6e, 0x74, 0x12, 0x10, 0x0a, 0x03, 0x75, 0x72, 0x6e, 0x18, + 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x75, 0x72, 0x6e, 0x12, 0x18, 0x0a, 0x07, 0x70, 0x61, + 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x07, 0x70, 0x61, 0x79, + 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x51, 0x0a, 0x0c, 0x64, 0x69, 0x73, 0x70, 0x6c, 0x61, 0x79, 0x5f, + 0x64, 0x61, 0x74, 0x61, 0x18, 0x04, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x2e, 0x2e, 0x6f, 0x72, 0x67, + 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, + 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x44, + 0x69, 0x73, 0x70, 0x6c, 0x61, 0x79, 0x44, 0x61, 0x74, 0x61, 0x52, 0x0b, 0x64, 0x69, 0x73, 0x70, + 0x6c, 0x61, 0x79, 0x44, 0x61, 0x74, 0x61, 0x12, 0x22, 0x0a, 0x0c, 0x63, 0x61, 0x70, 0x61, 0x62, + 0x69, 0x6c, 0x69, 0x74, 0x69, 0x65, 0x73, 0x18, 0x05, 0x20, 0x03, 0x28, 0x09, 0x52, 0x0c, 0x63, + 0x61, 0x70, 0x61, 0x62, 0x69, 0x6c, 0x69, 0x74, 0x69, 0x65, 0x73, 0x12, 0x5a, 0x0a, 0x0c, 0x64, + 0x65, 0x70, 0x65, 0x6e, 0x64, 0x65, 0x6e, 0x63, 0x69, 0x65, 0x73, 0x18, 0x06, 0x20, 0x03, 0x28, + 0x0b, 0x32, 0x36, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, - 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x45, 0x78, 0x74, 0x65, 0x72, 0x6e, 0x61, 0x6c, 0x50, 0x61, - 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x2e, 0x50, 0x61, 0x72, 0x61, 0x6d, 0x73, 0x45, 0x6e, 0x74, 0x72, - 0x79, 0x52, 0x06, 0x70, 0x61, 0x72, 0x61, 0x6d, 0x73, 0x1a, 0x39, 0x0a, 0x0b, 0x50, 0x61, 0x72, - 0x61, 0x6d, 0x73, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x12, 0x10, 0x0a, 0x03, 0x6b, 0x65, 0x79, 0x18, + 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x41, 0x72, 0x74, 0x69, 0x66, 0x61, 0x63, 0x74, 0x49, 0x6e, + 0x66, 0x6f, 0x72, 0x6d, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x52, 0x0c, 0x64, 0x65, 0x70, 0x65, 0x6e, + 0x64, 0x65, 0x6e, 0x63, 0x69, 0x65, 0x73, 0x12, 0x68, 0x0a, 0x0e, 0x72, 0x65, 0x73, 0x6f, 0x75, + 0x72, 0x63, 0x65, 0x5f, 0x68, 0x69, 0x6e, 0x74, 0x73, 0x18, 0x07, 0x20, 0x03, 0x28, 0x0b, 0x32, + 0x41, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, + 0x6d, 0x2e, 0x6d, 0x6f, 
0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, + 0x2e, 0x76, 0x31, 0x2e, 0x45, 0x6e, 0x76, 0x69, 0x72, 0x6f, 0x6e, 0x6d, 0x65, 0x6e, 0x74, 0x2e, + 0x52, 0x65, 0x73, 0x6f, 0x75, 0x72, 0x63, 0x65, 0x48, 0x69, 0x6e, 0x74, 0x73, 0x45, 0x6e, 0x74, + 0x72, 0x79, 0x52, 0x0d, 0x72, 0x65, 0x73, 0x6f, 0x75, 0x72, 0x63, 0x65, 0x48, 0x69, 0x6e, 0x74, + 0x73, 0x1a, 0x40, 0x0a, 0x12, 0x52, 0x65, 0x73, 0x6f, 0x75, 0x72, 0x63, 0x65, 0x48, 0x69, 0x6e, + 0x74, 0x73, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x12, 0x10, 0x0a, 0x03, 0x6b, 0x65, 0x79, 0x18, 0x01, + 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x6b, 0x65, 0x79, 0x12, 0x14, 0x0a, 0x05, 0x76, 0x61, 0x6c, + 0x75, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x3a, + 0x02, 0x38, 0x01, 0x4a, 0x04, 0x08, 0x01, 0x10, 0x02, 0x22, 0xc7, 0x01, 0x0a, 0x14, 0x53, 0x74, + 0x61, 0x6e, 0x64, 0x61, 0x72, 0x64, 0x45, 0x6e, 0x76, 0x69, 0x72, 0x6f, 0x6e, 0x6d, 0x65, 0x6e, + 0x74, 0x73, 0x22, 0xae, 0x01, 0x0a, 0x0c, 0x45, 0x6e, 0x76, 0x69, 0x72, 0x6f, 0x6e, 0x6d, 0x65, + 0x6e, 0x74, 0x73, 0x12, 0x24, 0x0a, 0x06, 0x44, 0x4f, 0x43, 0x4b, 0x45, 0x52, 0x10, 0x00, 0x1a, + 0x18, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x12, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x65, 0x6e, 0x76, 0x3a, + 0x64, 0x6f, 0x63, 0x6b, 0x65, 0x72, 0x3a, 0x76, 0x31, 0x12, 0x26, 0x0a, 0x07, 0x50, 0x52, 0x4f, + 0x43, 0x45, 0x53, 0x53, 0x10, 0x01, 0x1a, 0x19, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x13, 0x62, 0x65, + 0x61, 0x6d, 0x3a, 0x65, 0x6e, 0x76, 0x3a, 0x70, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x3a, 0x76, + 0x31, 0x12, 0x28, 0x0a, 0x08, 0x45, 0x58, 0x54, 0x45, 0x52, 0x4e, 0x41, 0x4c, 0x10, 0x02, 0x1a, + 0x1a, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x14, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x65, 0x6e, 0x76, 0x3a, + 0x65, 0x78, 0x74, 0x65, 0x72, 0x6e, 0x61, 0x6c, 0x3a, 0x76, 0x31, 0x12, 0x26, 0x0a, 0x07, 0x44, + 0x45, 0x46, 0x41, 0x55, 0x4c, 0x54, 0x10, 0x03, 0x1a, 0x19, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x13, + 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x65, 0x6e, 0x76, 0x3a, 0x64, 0x65, 0x66, 0x61, 0x75, 0x6c, 0x74, + 0x3a, 0x76, 0x31, 0x22, 0x38, 0x0a, 0x0d, 0x44, 0x6f, 0x63, 0x6b, 0x65, 0x72, 0x50, 0x61, 0x79, + 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x27, 0x0a, 0x0f, 0x63, 0x6f, 0x6e, 0x74, 0x61, 0x69, 0x6e, 0x65, + 0x72, 0x5f, 0x69, 0x6d, 0x61, 0x67, 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0e, 0x63, + 0x6f, 0x6e, 0x74, 0x61, 0x69, 0x6e, 0x65, 0x72, 0x49, 0x6d, 0x61, 0x67, 0x65, 0x22, 0xd4, 0x01, + 0x0a, 0x0e, 0x50, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, + 0x12, 0x0e, 0x0a, 0x02, 0x6f, 0x73, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x02, 0x6f, 0x73, + 0x12, 0x12, 0x0a, 0x04, 0x61, 0x72, 0x63, 0x68, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x04, + 0x61, 0x72, 0x63, 0x68, 0x12, 0x18, 0x0a, 0x07, 0x63, 0x6f, 0x6d, 0x6d, 0x61, 0x6e, 0x64, 0x18, + 0x03, 0x20, 0x01, 0x28, 0x09, 0x52, 0x07, 0x63, 0x6f, 0x6d, 0x6d, 0x61, 0x6e, 0x64, 0x12, 0x4c, + 0x0a, 0x03, 0x65, 0x6e, 0x76, 0x18, 0x04, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x3a, 0x2e, 0x6f, 0x72, + 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, + 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, + 0x50, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x2e, 0x45, + 0x6e, 0x76, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x52, 0x03, 0x65, 0x6e, 0x76, 0x1a, 0x36, 0x0a, 0x08, + 0x45, 0x6e, 0x76, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x12, 0x10, 0x0a, 0x03, 0x6b, 0x65, 0x79, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x6b, 
0x65, 0x79, 0x12, 0x14, 0x0a, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, - 0x3a, 0x02, 0x38, 0x01, 0x22, 0xb9, 0x02, 0x0a, 0x11, 0x53, 0x74, 0x61, 0x6e, 0x64, 0x61, 0x72, - 0x64, 0x50, 0x72, 0x6f, 0x74, 0x6f, 0x63, 0x6f, 0x6c, 0x73, 0x22, 0xa3, 0x02, 0x0a, 0x04, 0x45, - 0x6e, 0x75, 0x6d, 0x12, 0x48, 0x0a, 0x19, 0x4c, 0x45, 0x47, 0x41, 0x43, 0x59, 0x5f, 0x50, 0x52, - 0x4f, 0x47, 0x52, 0x45, 0x53, 0x53, 0x5f, 0x52, 0x45, 0x50, 0x4f, 0x52, 0x54, 0x49, 0x4e, 0x47, - 0x10, 0x00, 0x1a, 0x29, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x23, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x70, - 0x72, 0x6f, 0x74, 0x6f, 0x63, 0x6f, 0x6c, 0x3a, 0x70, 0x72, 0x6f, 0x67, 0x72, 0x65, 0x73, 0x73, - 0x5f, 0x72, 0x65, 0x70, 0x6f, 0x72, 0x74, 0x69, 0x6e, 0x67, 0x3a, 0x76, 0x30, 0x12, 0x41, 0x0a, - 0x12, 0x50, 0x52, 0x4f, 0x47, 0x52, 0x45, 0x53, 0x53, 0x5f, 0x52, 0x45, 0x50, 0x4f, 0x52, 0x54, - 0x49, 0x4e, 0x47, 0x10, 0x01, 0x1a, 0x29, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x23, 0x62, 0x65, 0x61, - 0x6d, 0x3a, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x63, 0x6f, 0x6c, 0x3a, 0x70, 0x72, 0x6f, 0x67, 0x72, - 0x65, 0x73, 0x73, 0x5f, 0x72, 0x65, 0x70, 0x6f, 0x72, 0x74, 0x69, 0x6e, 0x67, 0x3a, 0x76, 0x31, - 0x12, 0x37, 0x0a, 0x0d, 0x57, 0x4f, 0x52, 0x4b, 0x45, 0x52, 0x5f, 0x53, 0x54, 0x41, 0x54, 0x55, - 0x53, 0x10, 0x02, 0x1a, 0x24, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1e, 0x62, 0x65, 0x61, 0x6d, 0x3a, - 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x63, 0x6f, 0x6c, 0x3a, 0x77, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x5f, - 0x73, 0x74, 0x61, 0x74, 0x75, 0x73, 0x3a, 0x76, 0x31, 0x12, 0x55, 0x0a, 0x1c, 0x4d, 0x55, 0x4c, - 0x54, 0x49, 0x5f, 0x43, 0x4f, 0x52, 0x45, 0x5f, 0x42, 0x55, 0x4e, 0x44, 0x4c, 0x45, 0x5f, 0x50, - 0x52, 0x4f, 0x43, 0x45, 0x53, 0x53, 0x49, 0x4e, 0x47, 0x10, 0x03, 0x1a, 0x33, 0xa2, 0xb4, 0xfa, - 0xc2, 0x05, 0x2d, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x63, 0x6f, 0x6c, - 0x3a, 0x6d, 0x75, 0x6c, 0x74, 0x69, 0x5f, 0x63, 0x6f, 0x72, 0x65, 0x5f, 0x62, 0x75, 0x6e, 0x64, - 0x6c, 0x65, 0x5f, 0x70, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x69, 0x6e, 0x67, 0x3a, 0x76, 0x31, - 0x22, 0xa6, 0x03, 0x0a, 0x14, 0x53, 0x74, 0x61, 0x6e, 0x64, 0x61, 0x72, 0x64, 0x52, 0x65, 0x71, - 0x75, 0x69, 0x72, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x73, 0x22, 0x8d, 0x03, 0x0a, 0x04, 0x45, 0x6e, - 0x75, 0x6d, 0x12, 0x4a, 0x0a, 0x1c, 0x52, 0x45, 0x51, 0x55, 0x49, 0x52, 0x45, 0x53, 0x5f, 0x53, - 0x54, 0x41, 0x54, 0x45, 0x46, 0x55, 0x4c, 0x5f, 0x50, 0x52, 0x4f, 0x43, 0x45, 0x53, 0x53, 0x49, - 0x4e, 0x47, 0x10, 0x00, 0x1a, 0x28, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x22, 0x62, 0x65, 0x61, 0x6d, - 0x3a, 0x72, 0x65, 0x71, 0x75, 0x69, 0x72, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x3a, 0x70, 0x61, 0x72, - 0x64, 0x6f, 0x3a, 0x73, 0x74, 0x61, 0x74, 0x65, 0x66, 0x75, 0x6c, 0x3a, 0x76, 0x31, 0x12, 0x4e, - 0x0a, 0x1c, 0x52, 0x45, 0x51, 0x55, 0x49, 0x52, 0x45, 0x53, 0x5f, 0x42, 0x55, 0x4e, 0x44, 0x4c, - 0x45, 0x5f, 0x46, 0x49, 0x4e, 0x41, 0x4c, 0x49, 0x5a, 0x41, 0x54, 0x49, 0x4f, 0x4e, 0x10, 0x01, - 0x1a, 0x2c, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x26, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x72, 0x65, 0x71, - 0x75, 0x69, 0x72, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x3a, 0x70, 0x61, 0x72, 0x64, 0x6f, 0x3a, 0x66, - 0x69, 0x6e, 0x61, 0x6c, 0x69, 0x7a, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x3a, 0x76, 0x31, 0x12, 0x47, - 0x0a, 0x15, 0x52, 0x45, 0x51, 0x55, 0x49, 0x52, 0x45, 0x53, 0x5f, 0x53, 0x54, 0x41, 0x42, 0x4c, - 0x45, 0x5f, 0x49, 0x4e, 0x50, 0x55, 0x54, 0x10, 0x02, 0x1a, 0x2c, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, - 0x26, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x72, 0x65, 0x71, 0x75, 0x69, 0x72, 
0x65, 0x6d, 0x65, 0x6e, - 0x74, 0x3a, 0x70, 0x61, 0x72, 0x64, 0x6f, 0x3a, 0x73, 0x74, 0x61, 0x62, 0x6c, 0x65, 0x5f, 0x69, - 0x6e, 0x70, 0x75, 0x74, 0x3a, 0x76, 0x31, 0x12, 0x51, 0x0a, 0x1a, 0x52, 0x45, 0x51, 0x55, 0x49, - 0x52, 0x45, 0x53, 0x5f, 0x54, 0x49, 0x4d, 0x45, 0x5f, 0x53, 0x4f, 0x52, 0x54, 0x45, 0x44, 0x5f, - 0x49, 0x4e, 0x50, 0x55, 0x54, 0x10, 0x03, 0x1a, 0x31, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x2b, 0x62, - 0x65, 0x61, 0x6d, 0x3a, 0x72, 0x65, 0x71, 0x75, 0x69, 0x72, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x3a, - 0x70, 0x61, 0x72, 0x64, 0x6f, 0x3a, 0x74, 0x69, 0x6d, 0x65, 0x5f, 0x73, 0x6f, 0x72, 0x74, 0x65, - 0x64, 0x5f, 0x69, 0x6e, 0x70, 0x75, 0x74, 0x3a, 0x76, 0x31, 0x12, 0x4d, 0x0a, 0x18, 0x52, 0x45, - 0x51, 0x55, 0x49, 0x52, 0x45, 0x53, 0x5f, 0x53, 0x50, 0x4c, 0x49, 0x54, 0x54, 0x41, 0x42, 0x4c, - 0x45, 0x5f, 0x44, 0x4f, 0x46, 0x4e, 0x10, 0x04, 0x1a, 0x2f, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x29, + 0x3a, 0x02, 0x38, 0x01, 0x22, 0xf9, 0x01, 0x0a, 0x0f, 0x45, 0x78, 0x74, 0x65, 0x72, 0x6e, 0x61, + 0x6c, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x53, 0x0a, 0x08, 0x65, 0x6e, 0x64, 0x70, + 0x6f, 0x69, 0x6e, 0x74, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x37, 0x2e, 0x6f, 0x72, 0x67, + 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, + 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x41, + 0x70, 0x69, 0x53, 0x65, 0x72, 0x76, 0x69, 0x63, 0x65, 0x44, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, + 0x74, 0x6f, 0x72, 0x52, 0x08, 0x65, 0x6e, 0x64, 0x70, 0x6f, 0x69, 0x6e, 0x74, 0x12, 0x56, 0x0a, + 0x06, 0x70, 0x61, 0x72, 0x61, 0x6d, 0x73, 0x18, 0x02, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x3e, 0x2e, + 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, + 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, + 0x31, 0x2e, 0x45, 0x78, 0x74, 0x65, 0x72, 0x6e, 0x61, 0x6c, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, + 0x64, 0x2e, 0x50, 0x61, 0x72, 0x61, 0x6d, 0x73, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x52, 0x06, 0x70, + 0x61, 0x72, 0x61, 0x6d, 0x73, 0x1a, 0x39, 0x0a, 0x0b, 0x50, 0x61, 0x72, 0x61, 0x6d, 0x73, 0x45, + 0x6e, 0x74, 0x72, 0x79, 0x12, 0x10, 0x0a, 0x03, 0x6b, 0x65, 0x79, 0x18, 0x01, 0x20, 0x01, 0x28, + 0x09, 0x52, 0x03, 0x6b, 0x65, 0x79, 0x12, 0x14, 0x0a, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x18, + 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x3a, 0x02, 0x38, 0x01, + 0x22, 0x88, 0x03, 0x0a, 0x11, 0x53, 0x74, 0x61, 0x6e, 0x64, 0x61, 0x72, 0x64, 0x50, 0x72, 0x6f, + 0x74, 0x6f, 0x63, 0x6f, 0x6c, 0x73, 0x22, 0xf2, 0x02, 0x0a, 0x04, 0x45, 0x6e, 0x75, 0x6d, 0x12, + 0x48, 0x0a, 0x19, 0x4c, 0x45, 0x47, 0x41, 0x43, 0x59, 0x5f, 0x50, 0x52, 0x4f, 0x47, 0x52, 0x45, + 0x53, 0x53, 0x5f, 0x52, 0x45, 0x50, 0x4f, 0x52, 0x54, 0x49, 0x4e, 0x47, 0x10, 0x00, 0x1a, 0x29, + 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x23, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x70, 0x72, 0x6f, 0x74, 0x6f, + 0x63, 0x6f, 0x6c, 0x3a, 0x70, 0x72, 0x6f, 0x67, 0x72, 0x65, 0x73, 0x73, 0x5f, 0x72, 0x65, 0x70, + 0x6f, 0x72, 0x74, 0x69, 0x6e, 0x67, 0x3a, 0x76, 0x30, 0x12, 0x41, 0x0a, 0x12, 0x50, 0x52, 0x4f, + 0x47, 0x52, 0x45, 0x53, 0x53, 0x5f, 0x52, 0x45, 0x50, 0x4f, 0x52, 0x54, 0x49, 0x4e, 0x47, 0x10, + 0x01, 0x1a, 0x29, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x23, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x70, 0x72, + 0x6f, 0x74, 0x6f, 0x63, 0x6f, 0x6c, 0x3a, 0x70, 0x72, 0x6f, 0x67, 0x72, 0x65, 0x73, 0x73, 0x5f, + 0x72, 0x65, 0x70, 0x6f, 0x72, 0x74, 0x69, 0x6e, 0x67, 0x3a, 0x76, 0x31, 0x12, 0x37, 0x0a, 0x0d, + 
0x57, 0x4f, 0x52, 0x4b, 0x45, 0x52, 0x5f, 0x53, 0x54, 0x41, 0x54, 0x55, 0x53, 0x10, 0x02, 0x1a, + 0x24, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1e, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x70, 0x72, 0x6f, 0x74, + 0x6f, 0x63, 0x6f, 0x6c, 0x3a, 0x77, 0x6f, 0x72, 0x6b, 0x65, 0x72, 0x5f, 0x73, 0x74, 0x61, 0x74, + 0x75, 0x73, 0x3a, 0x76, 0x31, 0x12, 0x55, 0x0a, 0x1c, 0x4d, 0x55, 0x4c, 0x54, 0x49, 0x5f, 0x43, + 0x4f, 0x52, 0x45, 0x5f, 0x42, 0x55, 0x4e, 0x44, 0x4c, 0x45, 0x5f, 0x50, 0x52, 0x4f, 0x43, 0x45, + 0x53, 0x53, 0x49, 0x4e, 0x47, 0x10, 0x03, 0x1a, 0x33, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x2d, 0x62, + 0x65, 0x61, 0x6d, 0x3a, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x63, 0x6f, 0x6c, 0x3a, 0x6d, 0x75, 0x6c, + 0x74, 0x69, 0x5f, 0x63, 0x6f, 0x72, 0x65, 0x5f, 0x62, 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x5f, 0x70, + 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x69, 0x6e, 0x67, 0x3a, 0x76, 0x31, 0x12, 0x4d, 0x0a, 0x18, + 0x48, 0x41, 0x52, 0x4e, 0x45, 0x53, 0x53, 0x5f, 0x4d, 0x4f, 0x4e, 0x49, 0x54, 0x4f, 0x52, 0x49, + 0x4e, 0x47, 0x5f, 0x49, 0x4e, 0x46, 0x4f, 0x53, 0x10, 0x04, 0x1a, 0x2f, 0xa2, 0xb4, 0xfa, 0xc2, + 0x05, 0x29, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x63, 0x6f, 0x6c, 0x3a, + 0x68, 0x61, 0x72, 0x6e, 0x65, 0x73, 0x73, 0x5f, 0x6d, 0x6f, 0x6e, 0x69, 0x74, 0x6f, 0x72, 0x69, + 0x6e, 0x67, 0x5f, 0x69, 0x6e, 0x66, 0x6f, 0x73, 0x3a, 0x76, 0x31, 0x22, 0x72, 0x0a, 0x17, 0x53, + 0x74, 0x61, 0x6e, 0x64, 0x61, 0x72, 0x64, 0x52, 0x75, 0x6e, 0x6e, 0x65, 0x72, 0x50, 0x72, 0x6f, + 0x74, 0x6f, 0x63, 0x6f, 0x6c, 0x73, 0x22, 0x57, 0x0a, 0x04, 0x45, 0x6e, 0x75, 0x6d, 0x12, 0x4f, + 0x0a, 0x19, 0x4d, 0x4f, 0x4e, 0x49, 0x54, 0x4f, 0x52, 0x49, 0x4e, 0x47, 0x5f, 0x49, 0x4e, 0x46, + 0x4f, 0x5f, 0x53, 0x48, 0x4f, 0x52, 0x54, 0x5f, 0x49, 0x44, 0x53, 0x10, 0x00, 0x1a, 0x30, 0xa2, + 0xb4, 0xfa, 0xc2, 0x05, 0x2a, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x63, + 0x6f, 0x6c, 0x3a, 0x6d, 0x6f, 0x6e, 0x69, 0x74, 0x6f, 0x72, 0x69, 0x6e, 0x67, 0x5f, 0x69, 0x6e, + 0x66, 0x6f, 0x5f, 0x73, 0x68, 0x6f, 0x72, 0x74, 0x5f, 0x69, 0x64, 0x73, 0x3a, 0x76, 0x31, 0x22, + 0xa6, 0x03, 0x0a, 0x14, 0x53, 0x74, 0x61, 0x6e, 0x64, 0x61, 0x72, 0x64, 0x52, 0x65, 0x71, 0x75, + 0x69, 0x72, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x73, 0x22, 0x8d, 0x03, 0x0a, 0x04, 0x45, 0x6e, 0x75, + 0x6d, 0x12, 0x4a, 0x0a, 0x1c, 0x52, 0x45, 0x51, 0x55, 0x49, 0x52, 0x45, 0x53, 0x5f, 0x53, 0x54, + 0x41, 0x54, 0x45, 0x46, 0x55, 0x4c, 0x5f, 0x50, 0x52, 0x4f, 0x43, 0x45, 0x53, 0x53, 0x49, 0x4e, + 0x47, 0x10, 0x00, 0x1a, 0x28, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x22, 0x62, 0x65, 0x61, 0x6d, 0x3a, + 0x72, 0x65, 0x71, 0x75, 0x69, 0x72, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x3a, 0x70, 0x61, 0x72, 0x64, + 0x6f, 0x3a, 0x73, 0x74, 0x61, 0x74, 0x65, 0x66, 0x75, 0x6c, 0x3a, 0x76, 0x31, 0x12, 0x4e, 0x0a, + 0x1c, 0x52, 0x45, 0x51, 0x55, 0x49, 0x52, 0x45, 0x53, 0x5f, 0x42, 0x55, 0x4e, 0x44, 0x4c, 0x45, + 0x5f, 0x46, 0x49, 0x4e, 0x41, 0x4c, 0x49, 0x5a, 0x41, 0x54, 0x49, 0x4f, 0x4e, 0x10, 0x01, 0x1a, + 0x2c, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x26, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x72, 0x65, 0x71, 0x75, + 0x69, 0x72, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x3a, 0x70, 0x61, 0x72, 0x64, 0x6f, 0x3a, 0x66, 0x69, + 0x6e, 0x61, 0x6c, 0x69, 0x7a, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x3a, 0x76, 0x31, 0x12, 0x47, 0x0a, + 0x15, 0x52, 0x45, 0x51, 0x55, 0x49, 0x52, 0x45, 0x53, 0x5f, 0x53, 0x54, 0x41, 0x42, 0x4c, 0x45, + 0x5f, 0x49, 0x4e, 0x50, 0x55, 0x54, 0x10, 0x02, 0x1a, 0x2c, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x26, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x72, 0x65, 0x71, 0x75, 0x69, 0x72, 0x65, 0x6d, 0x65, 0x6e, 0x74, - 0x3a, 0x70, 0x61, 0x72, 
0x64, 0x6f, 0x3a, 0x73, 0x70, 0x6c, 0x69, 0x74, 0x74, 0x61, 0x62, 0x6c, - 0x65, 0x5f, 0x64, 0x6f, 0x66, 0x6e, 0x3a, 0x76, 0x31, 0x22, 0x3a, 0x0a, 0x0c, 0x46, 0x75, 0x6e, - 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x53, 0x70, 0x65, 0x63, 0x12, 0x10, 0x0a, 0x03, 0x75, 0x72, 0x6e, - 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x75, 0x72, 0x6e, 0x12, 0x18, 0x0a, 0x07, 0x70, - 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x07, 0x70, 0x61, - 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x22, 0x65, 0x0a, 0x13, 0x53, 0x74, 0x61, 0x6e, 0x64, 0x61, 0x72, - 0x64, 0x44, 0x69, 0x73, 0x70, 0x6c, 0x61, 0x79, 0x44, 0x61, 0x74, 0x61, 0x22, 0x4e, 0x0a, 0x0b, - 0x44, 0x69, 0x73, 0x70, 0x6c, 0x61, 0x79, 0x44, 0x61, 0x74, 0x61, 0x12, 0x3f, 0x0a, 0x0f, 0x4c, - 0x41, 0x42, 0x45, 0x4c, 0x4c, 0x45, 0x44, 0x5f, 0x53, 0x54, 0x52, 0x49, 0x4e, 0x47, 0x10, 0x00, - 0x1a, 0x2a, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x24, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x64, 0x69, 0x73, - 0x70, 0x6c, 0x61, 0x79, 0x5f, 0x64, 0x61, 0x74, 0x61, 0x3a, 0x6c, 0x61, 0x62, 0x65, 0x6c, 0x6c, - 0x65, 0x64, 0x5f, 0x73, 0x74, 0x72, 0x69, 0x6e, 0x67, 0x3a, 0x76, 0x31, 0x22, 0x43, 0x0a, 0x15, - 0x4c, 0x61, 0x62, 0x65, 0x6c, 0x6c, 0x65, 0x64, 0x53, 0x74, 0x72, 0x69, 0x6e, 0x67, 0x50, 0x61, - 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x14, 0x0a, 0x05, 0x6c, 0x61, 0x62, 0x65, 0x6c, 0x18, 0x01, - 0x20, 0x01, 0x28, 0x09, 0x52, 0x05, 0x6c, 0x61, 0x62, 0x65, 0x6c, 0x12, 0x14, 0x0a, 0x05, 0x76, - 0x61, 0x6c, 0x75, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x05, 0x76, 0x61, 0x6c, 0x75, - 0x65, 0x22, 0x39, 0x0a, 0x0b, 0x44, 0x69, 0x73, 0x70, 0x6c, 0x61, 0x79, 0x44, 0x61, 0x74, 0x61, - 0x12, 0x10, 0x0a, 0x03, 0x75, 0x72, 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x75, - 0x72, 0x6e, 0x12, 0x18, 0x0a, 0x07, 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x18, 0x02, 0x20, - 0x01, 0x28, 0x0c, 0x52, 0x07, 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x22, 0xd7, 0x07, 0x0a, - 0x15, 0x4d, 0x65, 0x73, 0x73, 0x61, 0x67, 0x65, 0x57, 0x69, 0x74, 0x68, 0x43, 0x6f, 0x6d, 0x70, - 0x6f, 0x6e, 0x65, 0x6e, 0x74, 0x73, 0x12, 0x4d, 0x0a, 0x0a, 0x63, 0x6f, 0x6d, 0x70, 0x6f, 0x6e, - 0x65, 0x6e, 0x74, 0x73, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2d, 0x2e, 0x6f, 0x72, 0x67, + 0x3a, 0x70, 0x61, 0x72, 0x64, 0x6f, 0x3a, 0x73, 0x74, 0x61, 0x62, 0x6c, 0x65, 0x5f, 0x69, 0x6e, + 0x70, 0x75, 0x74, 0x3a, 0x76, 0x31, 0x12, 0x51, 0x0a, 0x1a, 0x52, 0x45, 0x51, 0x55, 0x49, 0x52, + 0x45, 0x53, 0x5f, 0x54, 0x49, 0x4d, 0x45, 0x5f, 0x53, 0x4f, 0x52, 0x54, 0x45, 0x44, 0x5f, 0x49, + 0x4e, 0x50, 0x55, 0x54, 0x10, 0x03, 0x1a, 0x31, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x2b, 0x62, 0x65, + 0x61, 0x6d, 0x3a, 0x72, 0x65, 0x71, 0x75, 0x69, 0x72, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x3a, 0x70, + 0x61, 0x72, 0x64, 0x6f, 0x3a, 0x74, 0x69, 0x6d, 0x65, 0x5f, 0x73, 0x6f, 0x72, 0x74, 0x65, 0x64, + 0x5f, 0x69, 0x6e, 0x70, 0x75, 0x74, 0x3a, 0x76, 0x31, 0x12, 0x4d, 0x0a, 0x18, 0x52, 0x45, 0x51, + 0x55, 0x49, 0x52, 0x45, 0x53, 0x5f, 0x53, 0x50, 0x4c, 0x49, 0x54, 0x54, 0x41, 0x42, 0x4c, 0x45, + 0x5f, 0x44, 0x4f, 0x46, 0x4e, 0x10, 0x04, 0x1a, 0x2f, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x29, 0x62, + 0x65, 0x61, 0x6d, 0x3a, 0x72, 0x65, 0x71, 0x75, 0x69, 0x72, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x3a, + 0x70, 0x61, 0x72, 0x64, 0x6f, 0x3a, 0x73, 0x70, 0x6c, 0x69, 0x74, 0x74, 0x61, 0x62, 0x6c, 0x65, + 0x5f, 0x64, 0x6f, 0x66, 0x6e, 0x3a, 0x76, 0x31, 0x22, 0x3a, 0x0a, 0x0c, 0x46, 0x75, 0x6e, 0x63, + 0x74, 0x69, 0x6f, 0x6e, 0x53, 0x70, 0x65, 0x63, 0x12, 0x10, 0x0a, 0x03, 0x75, 0x72, 0x6e, 0x18, + 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x75, 
0x72, 0x6e, 0x12, 0x18, 0x0a, 0x07, 0x70, 0x61, + 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x07, 0x70, 0x61, 0x79, + 0x6c, 0x6f, 0x61, 0x64, 0x22, 0x57, 0x0a, 0x13, 0x53, 0x74, 0x61, 0x6e, 0x64, 0x61, 0x72, 0x64, + 0x44, 0x69, 0x73, 0x70, 0x6c, 0x61, 0x79, 0x44, 0x61, 0x74, 0x61, 0x22, 0x40, 0x0a, 0x0b, 0x44, + 0x69, 0x73, 0x70, 0x6c, 0x61, 0x79, 0x44, 0x61, 0x74, 0x61, 0x12, 0x31, 0x0a, 0x08, 0x4c, 0x41, + 0x42, 0x45, 0x4c, 0x4c, 0x45, 0x44, 0x10, 0x00, 0x1a, 0x23, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1d, + 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x64, 0x69, 0x73, 0x70, 0x6c, 0x61, 0x79, 0x5f, 0x64, 0x61, 0x74, + 0x61, 0x3a, 0x6c, 0x61, 0x62, 0x65, 0x6c, 0x6c, 0x65, 0x64, 0x3a, 0x76, 0x31, 0x22, 0x9b, 0x01, + 0x0a, 0x0f, 0x4c, 0x61, 0x62, 0x65, 0x6c, 0x6c, 0x65, 0x64, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, + 0x64, 0x12, 0x14, 0x0a, 0x05, 0x6c, 0x61, 0x62, 0x65, 0x6c, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, + 0x52, 0x05, 0x6c, 0x61, 0x62, 0x65, 0x6c, 0x12, 0x23, 0x0a, 0x0c, 0x73, 0x74, 0x72, 0x69, 0x6e, + 0x67, 0x5f, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x48, 0x00, 0x52, + 0x0b, 0x73, 0x74, 0x72, 0x69, 0x6e, 0x67, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x12, 0x1f, 0x0a, 0x0a, + 0x62, 0x6f, 0x6f, 0x6c, 0x5f, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x18, 0x03, 0x20, 0x01, 0x28, 0x08, + 0x48, 0x00, 0x52, 0x09, 0x62, 0x6f, 0x6f, 0x6c, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x12, 0x23, 0x0a, + 0x0c, 0x64, 0x6f, 0x75, 0x62, 0x6c, 0x65, 0x5f, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x18, 0x04, 0x20, + 0x01, 0x28, 0x01, 0x48, 0x00, 0x52, 0x0b, 0x64, 0x6f, 0x75, 0x62, 0x6c, 0x65, 0x56, 0x61, 0x6c, + 0x75, 0x65, 0x42, 0x07, 0x0a, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x22, 0x39, 0x0a, 0x0b, 0x44, + 0x69, 0x73, 0x70, 0x6c, 0x61, 0x79, 0x44, 0x61, 0x74, 0x61, 0x12, 0x10, 0x0a, 0x03, 0x75, 0x72, + 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x75, 0x72, 0x6e, 0x12, 0x18, 0x0a, 0x07, + 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x07, 0x70, + 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x22, 0xd7, 0x07, 0x0a, 0x15, 0x4d, 0x65, 0x73, 0x73, 0x61, + 0x67, 0x65, 0x57, 0x69, 0x74, 0x68, 0x43, 0x6f, 0x6d, 0x70, 0x6f, 0x6e, 0x65, 0x6e, 0x74, 0x73, + 0x12, 0x4d, 0x0a, 0x0a, 0x63, 0x6f, 0x6d, 0x70, 0x6f, 0x6e, 0x65, 0x6e, 0x74, 0x73, 0x18, 0x01, + 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2d, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, + 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, + 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x43, 0x6f, 0x6d, 0x70, 0x6f, 0x6e, 0x65, + 0x6e, 0x74, 0x73, 0x52, 0x0a, 0x63, 0x6f, 0x6d, 0x70, 0x6f, 0x6e, 0x65, 0x6e, 0x74, 0x73, 0x12, + 0x40, 0x0a, 0x05, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x28, + 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, + 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, + 0x76, 0x31, 0x2e, 0x43, 0x6f, 0x64, 0x65, 0x72, 0x48, 0x00, 0x52, 0x05, 0x63, 0x6f, 0x64, 0x65, + 0x72, 0x12, 0x5c, 0x0a, 0x0f, 0x63, 0x6f, 0x6d, 0x62, 0x69, 0x6e, 0x65, 0x5f, 0x70, 0x61, 0x79, + 0x6c, 0x6f, 0x61, 0x64, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x31, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x43, - 0x6f, 0x6d, 0x70, 0x6f, 0x6e, 0x65, 0x6e, 0x74, 0x73, 0x52, 0x0a, 0x63, 0x6f, 
0x6d, 0x70, 0x6f, - 0x6e, 0x65, 0x6e, 0x74, 0x73, 0x12, 0x40, 0x0a, 0x05, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x18, 0x02, - 0x20, 0x01, 0x28, 0x0b, 0x32, 0x28, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, + 0x6f, 0x6d, 0x62, 0x69, 0x6e, 0x65, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x48, 0x00, 0x52, + 0x0e, 0x63, 0x6f, 0x6d, 0x62, 0x69, 0x6e, 0x65, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, + 0x56, 0x0a, 0x0d, 0x66, 0x75, 0x6e, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x73, 0x70, 0x65, 0x63, + 0x18, 0x04, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, + 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, + 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x75, 0x6e, 0x63, 0x74, + 0x69, 0x6f, 0x6e, 0x53, 0x70, 0x65, 0x63, 0x48, 0x00, 0x52, 0x0c, 0x66, 0x75, 0x6e, 0x63, 0x74, + 0x69, 0x6f, 0x6e, 0x53, 0x70, 0x65, 0x63, 0x12, 0x57, 0x0a, 0x0e, 0x70, 0x61, 0x72, 0x5f, 0x64, + 0x6f, 0x5f, 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x18, 0x06, 0x20, 0x01, 0x28, 0x0b, 0x32, + 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, + 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, + 0x2e, 0x76, 0x31, 0x2e, 0x50, 0x61, 0x72, 0x44, 0x6f, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, + 0x48, 0x00, 0x52, 0x0c, 0x70, 0x61, 0x72, 0x44, 0x6f, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, + 0x12, 0x4f, 0x0a, 0x0a, 0x70, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x18, 0x07, + 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2d, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, - 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x43, 0x6f, 0x64, 0x65, 0x72, 0x48, 0x00, - 0x52, 0x05, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x12, 0x5c, 0x0a, 0x0f, 0x63, 0x6f, 0x6d, 0x62, 0x69, - 0x6e, 0x65, 0x5f, 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0b, - 0x32, 0x31, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, + 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x50, 0x54, 0x72, 0x61, 0x6e, 0x73, 0x66, + 0x6f, 0x72, 0x6d, 0x48, 0x00, 0x52, 0x0a, 0x70, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, + 0x6d, 0x12, 0x52, 0x0a, 0x0b, 0x70, 0x63, 0x6f, 0x6c, 0x6c, 0x65, 0x63, 0x74, 0x69, 0x6f, 0x6e, + 0x18, 0x08, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2e, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, + 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, + 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x50, 0x43, 0x6f, 0x6c, 0x6c, + 0x65, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x48, 0x00, 0x52, 0x0b, 0x70, 0x63, 0x6f, 0x6c, 0x6c, 0x65, + 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x53, 0x0a, 0x0c, 0x72, 0x65, 0x61, 0x64, 0x5f, 0x70, 0x61, + 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x18, 0x09, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2e, 0x2e, 0x6f, 0x72, + 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, + 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, + 0x52, 0x65, 0x61, 0x64, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x48, 0x00, 0x52, 0x0b, 0x72, + 0x65, 0x61, 0x64, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x4d, 0x0a, 0x0a, 0x73, 0x69, + 0x64, 0x65, 0x5f, 0x69, 0x6e, 0x70, 0x75, 0x74, 0x18, 0x0b, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2c, + 0x2e, 
0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, + 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, + 0x76, 0x31, 0x2e, 0x53, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x48, 0x00, 0x52, 0x09, + 0x73, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x12, 0x66, 0x0a, 0x13, 0x77, 0x69, 0x6e, + 0x64, 0x6f, 0x77, 0x5f, 0x69, 0x6e, 0x74, 0x6f, 0x5f, 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, + 0x18, 0x0c, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x34, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, + 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, + 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x57, 0x69, 0x6e, 0x64, 0x6f, + 0x77, 0x49, 0x6e, 0x74, 0x6f, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x48, 0x00, 0x52, 0x11, + 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x49, 0x6e, 0x74, 0x6f, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, + 0x64, 0x12, 0x65, 0x0a, 0x12, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x69, 0x6e, 0x67, 0x5f, 0x73, + 0x74, 0x72, 0x61, 0x74, 0x65, 0x67, 0x79, 0x18, 0x0d, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x34, 0x2e, + 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, + 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, + 0x31, 0x2e, 0x57, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x69, 0x6e, 0x67, 0x53, 0x74, 0x72, 0x61, 0x74, + 0x65, 0x67, 0x79, 0x48, 0x00, 0x52, 0x11, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x69, 0x6e, 0x67, + 0x53, 0x74, 0x72, 0x61, 0x74, 0x65, 0x67, 0x79, 0x42, 0x06, 0x0a, 0x04, 0x72, 0x6f, 0x6f, 0x74, + 0x22, 0xb6, 0x0a, 0x0a, 0x16, 0x45, 0x78, 0x65, 0x63, 0x75, 0x74, 0x61, 0x62, 0x6c, 0x65, 0x53, + 0x74, 0x61, 0x67, 0x65, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x50, 0x0a, 0x0b, 0x65, + 0x6e, 0x76, 0x69, 0x72, 0x6f, 0x6e, 0x6d, 0x65, 0x6e, 0x74, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, + 0x32, 0x2e, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, - 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x43, 0x6f, 0x6d, 0x62, 0x69, 0x6e, 0x65, 0x50, 0x61, 0x79, 0x6c, - 0x6f, 0x61, 0x64, 0x48, 0x00, 0x52, 0x0e, 0x63, 0x6f, 0x6d, 0x62, 0x69, 0x6e, 0x65, 0x50, 0x61, - 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x56, 0x0a, 0x0d, 0x66, 0x75, 0x6e, 0x63, 0x74, 0x69, 0x6f, - 0x6e, 0x5f, 0x73, 0x70, 0x65, 0x63, 0x18, 0x04, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, - 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, - 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, - 0x2e, 0x46, 0x75, 0x6e, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x53, 0x70, 0x65, 0x63, 0x48, 0x00, 0x52, - 0x0c, 0x66, 0x75, 0x6e, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x53, 0x70, 0x65, 0x63, 0x12, 0x57, 0x0a, - 0x0e, 0x70, 0x61, 0x72, 0x5f, 0x64, 0x6f, 0x5f, 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x18, - 0x06, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, - 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, - 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x50, 0x61, 0x72, 0x44, 0x6f, 0x50, - 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x48, 0x00, 0x52, 0x0c, 0x70, 0x61, 0x72, 0x44, 0x6f, 0x50, - 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x4f, 0x0a, 0x0a, 0x70, 0x74, 0x72, 0x61, 0x6e, 0x73, - 0x66, 0x6f, 0x72, 0x6d, 0x18, 
0x07, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2d, 0x2e, 0x6f, 0x72, 0x67, + 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x45, 0x6e, 0x76, 0x69, 0x72, 0x6f, 0x6e, 0x6d, 0x65, 0x6e, 0x74, + 0x52, 0x0b, 0x65, 0x6e, 0x76, 0x69, 0x72, 0x6f, 0x6e, 0x6d, 0x65, 0x6e, 0x74, 0x12, 0x7a, 0x0a, + 0x13, 0x77, 0x69, 0x72, 0x65, 0x5f, 0x63, 0x6f, 0x64, 0x65, 0x72, 0x5f, 0x73, 0x65, 0x74, 0x74, + 0x69, 0x6e, 0x67, 0x73, 0x18, 0x09, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x4a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, - 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x50, - 0x54, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x48, 0x00, 0x52, 0x0a, 0x70, 0x74, 0x72, - 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x12, 0x52, 0x0a, 0x0b, 0x70, 0x63, 0x6f, 0x6c, 0x6c, - 0x65, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x18, 0x08, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2e, 0x2e, 0x6f, - 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, - 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, - 0x2e, 0x50, 0x43, 0x6f, 0x6c, 0x6c, 0x65, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x48, 0x00, 0x52, 0x0b, - 0x70, 0x63, 0x6f, 0x6c, 0x6c, 0x65, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x53, 0x0a, 0x0c, 0x72, - 0x65, 0x61, 0x64, 0x5f, 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x18, 0x09, 0x20, 0x01, 0x28, - 0x0b, 0x32, 0x2e, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, - 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, - 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x52, 0x65, 0x61, 0x64, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, - 0x64, 0x48, 0x00, 0x52, 0x0b, 0x72, 0x65, 0x61, 0x64, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, - 0x12, 0x4d, 0x0a, 0x0a, 0x73, 0x69, 0x64, 0x65, 0x5f, 0x69, 0x6e, 0x70, 0x75, 0x74, 0x18, 0x0b, - 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2c, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, + 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x45, + 0x78, 0x65, 0x63, 0x75, 0x74, 0x61, 0x62, 0x6c, 0x65, 0x53, 0x74, 0x61, 0x67, 0x65, 0x50, 0x61, + 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x2e, 0x57, 0x69, 0x72, 0x65, 0x43, 0x6f, 0x64, 0x65, 0x72, 0x53, + 0x65, 0x74, 0x74, 0x69, 0x6e, 0x67, 0x52, 0x11, 0x77, 0x69, 0x72, 0x65, 0x43, 0x6f, 0x64, 0x65, + 0x72, 0x53, 0x65, 0x74, 0x74, 0x69, 0x6e, 0x67, 0x73, 0x12, 0x14, 0x0a, 0x05, 0x69, 0x6e, 0x70, + 0x75, 0x74, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x05, 0x69, 0x6e, 0x70, 0x75, 0x74, 0x12, + 0x66, 0x0a, 0x0b, 0x73, 0x69, 0x64, 0x65, 0x5f, 0x69, 0x6e, 0x70, 0x75, 0x74, 0x73, 0x18, 0x03, + 0x20, 0x03, 0x28, 0x0b, 0x32, 0x45, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, - 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x53, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, - 0x75, 0x74, 0x48, 0x00, 0x52, 0x09, 0x73, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x12, - 0x66, 0x0a, 0x13, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x5f, 0x69, 0x6e, 0x74, 0x6f, 0x5f, 0x70, - 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x18, 0x0c, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x34, 0x2e, 0x6f, - 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, - 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, - 0x2e, 0x57, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x49, 0x6e, 0x74, 
0x6f, 0x50, 0x61, 0x79, 0x6c, 0x6f, - 0x61, 0x64, 0x48, 0x00, 0x52, 0x11, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x49, 0x6e, 0x74, 0x6f, - 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x65, 0x0a, 0x12, 0x77, 0x69, 0x6e, 0x64, 0x6f, - 0x77, 0x69, 0x6e, 0x67, 0x5f, 0x73, 0x74, 0x72, 0x61, 0x74, 0x65, 0x67, 0x79, 0x18, 0x0d, 0x20, - 0x01, 0x28, 0x0b, 0x32, 0x34, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, - 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, - 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x57, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x69, 0x6e, - 0x67, 0x53, 0x74, 0x72, 0x61, 0x74, 0x65, 0x67, 0x79, 0x48, 0x00, 0x52, 0x11, 0x77, 0x69, 0x6e, - 0x64, 0x6f, 0x77, 0x69, 0x6e, 0x67, 0x53, 0x74, 0x72, 0x61, 0x74, 0x65, 0x67, 0x79, 0x42, 0x06, - 0x0a, 0x04, 0x72, 0x6f, 0x6f, 0x74, 0x22, 0xb6, 0x0a, 0x0a, 0x16, 0x45, 0x78, 0x65, 0x63, 0x75, - 0x74, 0x61, 0x62, 0x6c, 0x65, 0x53, 0x74, 0x61, 0x67, 0x65, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, - 0x64, 0x12, 0x50, 0x0a, 0x0b, 0x65, 0x6e, 0x76, 0x69, 0x72, 0x6f, 0x6e, 0x6d, 0x65, 0x6e, 0x74, - 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2e, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, - 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, - 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x45, 0x6e, 0x76, 0x69, 0x72, - 0x6f, 0x6e, 0x6d, 0x65, 0x6e, 0x74, 0x52, 0x0b, 0x65, 0x6e, 0x76, 0x69, 0x72, 0x6f, 0x6e, 0x6d, - 0x65, 0x6e, 0x74, 0x12, 0x7a, 0x0a, 0x13, 0x77, 0x69, 0x72, 0x65, 0x5f, 0x63, 0x6f, 0x64, 0x65, - 0x72, 0x5f, 0x73, 0x65, 0x74, 0x74, 0x69, 0x6e, 0x67, 0x73, 0x18, 0x09, 0x20, 0x03, 0x28, 0x0b, - 0x32, 0x4a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, - 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, - 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x45, 0x78, 0x65, 0x63, 0x75, 0x74, 0x61, 0x62, 0x6c, 0x65, 0x53, - 0x74, 0x61, 0x67, 0x65, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x2e, 0x57, 0x69, 0x72, 0x65, - 0x43, 0x6f, 0x64, 0x65, 0x72, 0x53, 0x65, 0x74, 0x74, 0x69, 0x6e, 0x67, 0x52, 0x11, 0x77, 0x69, - 0x72, 0x65, 0x43, 0x6f, 0x64, 0x65, 0x72, 0x53, 0x65, 0x74, 0x74, 0x69, 0x6e, 0x67, 0x73, 0x12, - 0x14, 0x0a, 0x05, 0x69, 0x6e, 0x70, 0x75, 0x74, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x05, - 0x69, 0x6e, 0x70, 0x75, 0x74, 0x12, 0x66, 0x0a, 0x0b, 0x73, 0x69, 0x64, 0x65, 0x5f, 0x69, 0x6e, - 0x70, 0x75, 0x74, 0x73, 0x18, 0x03, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x45, 0x2e, 0x6f, 0x72, 0x67, + 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x45, 0x78, 0x65, 0x63, 0x75, 0x74, 0x61, + 0x62, 0x6c, 0x65, 0x53, 0x74, 0x61, 0x67, 0x65, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x2e, + 0x53, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x49, 0x64, 0x52, 0x0a, 0x73, 0x69, 0x64, + 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x73, 0x12, 0x1e, 0x0a, 0x0a, 0x74, 0x72, 0x61, 0x6e, 0x73, + 0x66, 0x6f, 0x72, 0x6d, 0x73, 0x18, 0x04, 0x20, 0x03, 0x28, 0x09, 0x52, 0x0a, 0x74, 0x72, 0x61, + 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x73, 0x12, 0x18, 0x0a, 0x07, 0x6f, 0x75, 0x74, 0x70, 0x75, + 0x74, 0x73, 0x18, 0x05, 0x20, 0x03, 0x28, 0x09, 0x52, 0x07, 0x6f, 0x75, 0x74, 0x70, 0x75, 0x74, + 0x73, 0x12, 0x4d, 0x0a, 0x0a, 0x63, 0x6f, 0x6d, 0x70, 0x6f, 0x6e, 0x65, 0x6e, 0x74, 0x73, 0x18, + 0x06, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2d, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, + 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 
0x70, 0x69, + 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x43, 0x6f, 0x6d, 0x70, 0x6f, 0x6e, + 0x65, 0x6e, 0x74, 0x73, 0x52, 0x0a, 0x63, 0x6f, 0x6d, 0x70, 0x6f, 0x6e, 0x65, 0x6e, 0x74, 0x73, + 0x12, 0x66, 0x0a, 0x0b, 0x75, 0x73, 0x65, 0x72, 0x5f, 0x73, 0x74, 0x61, 0x74, 0x65, 0x73, 0x18, + 0x07, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x45, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, + 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, + 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x45, 0x78, 0x65, 0x63, 0x75, 0x74, + 0x61, 0x62, 0x6c, 0x65, 0x53, 0x74, 0x61, 0x67, 0x65, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, + 0x2e, 0x55, 0x73, 0x65, 0x72, 0x53, 0x74, 0x61, 0x74, 0x65, 0x49, 0x64, 0x52, 0x0a, 0x75, 0x73, + 0x65, 0x72, 0x53, 0x74, 0x61, 0x74, 0x65, 0x73, 0x12, 0x59, 0x0a, 0x06, 0x74, 0x69, 0x6d, 0x65, + 0x72, 0x73, 0x18, 0x08, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x41, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, + 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, + 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x45, 0x78, 0x65, + 0x63, 0x75, 0x74, 0x61, 0x62, 0x6c, 0x65, 0x53, 0x74, 0x61, 0x67, 0x65, 0x50, 0x61, 0x79, 0x6c, + 0x6f, 0x61, 0x64, 0x2e, 0x54, 0x69, 0x6d, 0x65, 0x72, 0x49, 0x64, 0x52, 0x06, 0x74, 0x69, 0x6d, + 0x65, 0x72, 0x73, 0x12, 0x6d, 0x0a, 0x0d, 0x74, 0x69, 0x6d, 0x65, 0x72, 0x46, 0x61, 0x6d, 0x69, + 0x6c, 0x69, 0x65, 0x73, 0x18, 0x0a, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x47, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x45, 0x78, 0x65, 0x63, 0x75, 0x74, 0x61, 0x62, 0x6c, 0x65, 0x53, 0x74, 0x61, 0x67, 0x65, 0x50, 0x61, - 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x2e, 0x53, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x49, - 0x64, 0x52, 0x0a, 0x73, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x73, 0x12, 0x1e, 0x0a, - 0x0a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x73, 0x18, 0x04, 0x20, 0x03, 0x28, - 0x09, 0x52, 0x0a, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x73, 0x12, 0x18, 0x0a, - 0x07, 0x6f, 0x75, 0x74, 0x70, 0x75, 0x74, 0x73, 0x18, 0x05, 0x20, 0x03, 0x28, 0x09, 0x52, 0x07, - 0x6f, 0x75, 0x74, 0x70, 0x75, 0x74, 0x73, 0x12, 0x4d, 0x0a, 0x0a, 0x63, 0x6f, 0x6d, 0x70, 0x6f, - 0x6e, 0x65, 0x6e, 0x74, 0x73, 0x18, 0x06, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2d, 0x2e, 0x6f, 0x72, - 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, - 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, - 0x43, 0x6f, 0x6d, 0x70, 0x6f, 0x6e, 0x65, 0x6e, 0x74, 0x73, 0x52, 0x0a, 0x63, 0x6f, 0x6d, 0x70, - 0x6f, 0x6e, 0x65, 0x6e, 0x74, 0x73, 0x12, 0x66, 0x0a, 0x0b, 0x75, 0x73, 0x65, 0x72, 0x5f, 0x73, - 0x74, 0x61, 0x74, 0x65, 0x73, 0x18, 0x07, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x45, 0x2e, 0x6f, 0x72, + 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x2e, 0x54, 0x69, 0x6d, 0x65, 0x72, 0x46, 0x61, 0x6d, 0x69, 0x6c, + 0x79, 0x49, 0x64, 0x52, 0x0d, 0x74, 0x69, 0x6d, 0x65, 0x72, 0x46, 0x61, 0x6d, 0x69, 0x6c, 0x69, + 0x65, 0x73, 0x1a, 0x4f, 0x0a, 0x0b, 0x53, 0x69, 0x64, 0x65, 0x49, 0x6e, 0x70, 0x75, 0x74, 0x49, + 0x64, 0x12, 0x21, 0x0a, 0x0c, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x5f, 0x69, + 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0b, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, + 0x72, 0x6d, 0x49, 
0x64, 0x12, 0x1d, 0x0a, 0x0a, 0x6c, 0x6f, 0x63, 0x61, 0x6c, 0x5f, 0x6e, 0x61, + 0x6d, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x09, 0x6c, 0x6f, 0x63, 0x61, 0x6c, 0x4e, + 0x61, 0x6d, 0x65, 0x1a, 0x4f, 0x0a, 0x0b, 0x55, 0x73, 0x65, 0x72, 0x53, 0x74, 0x61, 0x74, 0x65, + 0x49, 0x64, 0x12, 0x21, 0x0a, 0x0c, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x5f, + 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0b, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, + 0x6f, 0x72, 0x6d, 0x49, 0x64, 0x12, 0x1d, 0x0a, 0x0a, 0x6c, 0x6f, 0x63, 0x61, 0x6c, 0x5f, 0x6e, + 0x61, 0x6d, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x09, 0x6c, 0x6f, 0x63, 0x61, 0x6c, + 0x4e, 0x61, 0x6d, 0x65, 0x1a, 0x4b, 0x0a, 0x07, 0x54, 0x69, 0x6d, 0x65, 0x72, 0x49, 0x64, 0x12, + 0x21, 0x0a, 0x0c, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x5f, 0x69, 0x64, 0x18, + 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0b, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, + 0x49, 0x64, 0x12, 0x1d, 0x0a, 0x0a, 0x6c, 0x6f, 0x63, 0x61, 0x6c, 0x5f, 0x6e, 0x61, 0x6d, 0x65, + 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x09, 0x6c, 0x6f, 0x63, 0x61, 0x6c, 0x4e, 0x61, 0x6d, + 0x65, 0x1a, 0x51, 0x0a, 0x0d, 0x54, 0x69, 0x6d, 0x65, 0x72, 0x46, 0x61, 0x6d, 0x69, 0x6c, 0x79, + 0x49, 0x64, 0x12, 0x21, 0x0a, 0x0c, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x5f, + 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0b, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, + 0x6f, 0x72, 0x6d, 0x49, 0x64, 0x12, 0x1d, 0x0a, 0x0a, 0x6c, 0x6f, 0x63, 0x61, 0x6c, 0x5f, 0x6e, + 0x61, 0x6d, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x09, 0x6c, 0x6f, 0x63, 0x61, 0x6c, + 0x4e, 0x61, 0x6d, 0x65, 0x1a, 0xd2, 0x01, 0x0a, 0x10, 0x57, 0x69, 0x72, 0x65, 0x43, 0x6f, 0x64, + 0x65, 0x72, 0x53, 0x65, 0x74, 0x74, 0x69, 0x6e, 0x67, 0x12, 0x10, 0x0a, 0x03, 0x75, 0x72, 0x6e, + 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x75, 0x72, 0x6e, 0x12, 0x18, 0x0a, 0x07, 0x70, + 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x07, 0x70, 0x61, + 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x2d, 0x0a, 0x12, 0x69, 0x6e, 0x70, 0x75, 0x74, 0x5f, 0x6f, + 0x72, 0x5f, 0x6f, 0x75, 0x74, 0x70, 0x75, 0x74, 0x5f, 0x69, 0x64, 0x18, 0x03, 0x20, 0x01, 0x28, + 0x09, 0x48, 0x00, 0x52, 0x0f, 0x69, 0x6e, 0x70, 0x75, 0x74, 0x4f, 0x72, 0x4f, 0x75, 0x74, 0x70, + 0x75, 0x74, 0x49, 0x64, 0x12, 0x59, 0x0a, 0x05, 0x74, 0x69, 0x6d, 0x65, 0x72, 0x18, 0x04, 0x20, + 0x01, 0x28, 0x0b, 0x32, 0x41, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, + 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, + 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x45, 0x78, 0x65, 0x63, 0x75, 0x74, 0x61, 0x62, + 0x6c, 0x65, 0x53, 0x74, 0x61, 0x67, 0x65, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x2e, 0x54, + 0x69, 0x6d, 0x65, 0x72, 0x49, 0x64, 0x48, 0x00, 0x52, 0x05, 0x74, 0x69, 0x6d, 0x65, 0x72, 0x42, + 0x08, 0x0a, 0x06, 0x74, 0x61, 0x72, 0x67, 0x65, 0x74, 0x22, 0x8f, 0x01, 0x0a, 0x15, 0x53, 0x74, + 0x61, 0x6e, 0x64, 0x61, 0x72, 0x64, 0x52, 0x65, 0x73, 0x6f, 0x75, 0x72, 0x63, 0x65, 0x48, 0x69, + 0x6e, 0x74, 0x73, 0x22, 0x76, 0x0a, 0x04, 0x45, 0x6e, 0x75, 0x6d, 0x12, 0x34, 0x0a, 0x0b, 0x41, + 0x43, 0x43, 0x45, 0x4c, 0x45, 0x52, 0x41, 0x54, 0x4f, 0x52, 0x10, 0x00, 0x1a, 0x23, 0xa2, 0xb4, + 0xfa, 0xc2, 0x05, 0x1d, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x72, 0x65, 0x73, 0x6f, 0x75, 0x72, 0x63, + 0x65, 0x73, 0x3a, 0x61, 0x63, 0x63, 0x65, 0x6c, 0x65, 0x72, 0x61, 0x74, 0x6f, 0x72, 0x3a, 0x76, + 0x31, 0x12, 0x38, 0x0a, 0x0d, 0x4d, 0x49, 
0x4e, 0x5f, 0x52, 0x41, 0x4d, 0x5f, 0x42, 0x59, 0x54, + 0x45, 0x53, 0x10, 0x01, 0x1a, 0x25, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1f, 0x62, 0x65, 0x61, 0x6d, + 0x3a, 0x72, 0x65, 0x73, 0x6f, 0x75, 0x72, 0x63, 0x65, 0x73, 0x3a, 0x6d, 0x69, 0x6e, 0x5f, 0x72, + 0x61, 0x6d, 0x5f, 0x62, 0x79, 0x74, 0x65, 0x73, 0x3a, 0x76, 0x31, 0x32, 0x8f, 0x01, 0x0a, 0x11, + 0x54, 0x65, 0x73, 0x74, 0x53, 0x74, 0x72, 0x65, 0x61, 0x6d, 0x53, 0x65, 0x72, 0x76, 0x69, 0x63, + 0x65, 0x12, 0x7a, 0x0a, 0x06, 0x45, 0x76, 0x65, 0x6e, 0x74, 0x73, 0x12, 0x30, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, - 0x45, 0x78, 0x65, 0x63, 0x75, 0x74, 0x61, 0x62, 0x6c, 0x65, 0x53, 0x74, 0x61, 0x67, 0x65, 0x50, - 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x2e, 0x55, 0x73, 0x65, 0x72, 0x53, 0x74, 0x61, 0x74, 0x65, - 0x49, 0x64, 0x52, 0x0a, 0x75, 0x73, 0x65, 0x72, 0x53, 0x74, 0x61, 0x74, 0x65, 0x73, 0x12, 0x59, - 0x0a, 0x06, 0x74, 0x69, 0x6d, 0x65, 0x72, 0x73, 0x18, 0x08, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x41, - 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, - 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, - 0x76, 0x31, 0x2e, 0x45, 0x78, 0x65, 0x63, 0x75, 0x74, 0x61, 0x62, 0x6c, 0x65, 0x53, 0x74, 0x61, - 0x67, 0x65, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x2e, 0x54, 0x69, 0x6d, 0x65, 0x72, 0x49, - 0x64, 0x52, 0x06, 0x74, 0x69, 0x6d, 0x65, 0x72, 0x73, 0x12, 0x6d, 0x0a, 0x0d, 0x74, 0x69, 0x6d, - 0x65, 0x72, 0x46, 0x61, 0x6d, 0x69, 0x6c, 0x69, 0x65, 0x73, 0x18, 0x0a, 0x20, 0x03, 0x28, 0x0b, - 0x32, 0x47, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, - 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, - 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x45, 0x78, 0x65, 0x63, 0x75, 0x74, 0x61, 0x62, 0x6c, 0x65, 0x53, - 0x74, 0x61, 0x67, 0x65, 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x2e, 0x54, 0x69, 0x6d, 0x65, - 0x72, 0x46, 0x61, 0x6d, 0x69, 0x6c, 0x79, 0x49, 0x64, 0x52, 0x0d, 0x74, 0x69, 0x6d, 0x65, 0x72, - 0x46, 0x61, 0x6d, 0x69, 0x6c, 0x69, 0x65, 0x73, 0x1a, 0x4f, 0x0a, 0x0b, 0x53, 0x69, 0x64, 0x65, - 0x49, 0x6e, 0x70, 0x75, 0x74, 0x49, 0x64, 0x12, 0x21, 0x0a, 0x0c, 0x74, 0x72, 0x61, 0x6e, 0x73, - 0x66, 0x6f, 0x72, 0x6d, 0x5f, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0b, 0x74, - 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x49, 0x64, 0x12, 0x1d, 0x0a, 0x0a, 0x6c, 0x6f, - 0x63, 0x61, 0x6c, 0x5f, 0x6e, 0x61, 0x6d, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x09, - 0x6c, 0x6f, 0x63, 0x61, 0x6c, 0x4e, 0x61, 0x6d, 0x65, 0x1a, 0x4f, 0x0a, 0x0b, 0x55, 0x73, 0x65, - 0x72, 0x53, 0x74, 0x61, 0x74, 0x65, 0x49, 0x64, 0x12, 0x21, 0x0a, 0x0c, 0x74, 0x72, 0x61, 0x6e, - 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x5f, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0b, - 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x49, 0x64, 0x12, 0x1d, 0x0a, 0x0a, 0x6c, - 0x6f, 0x63, 0x61, 0x6c, 0x5f, 0x6e, 0x61, 0x6d, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, - 0x09, 0x6c, 0x6f, 0x63, 0x61, 0x6c, 0x4e, 0x61, 0x6d, 0x65, 0x1a, 0x4b, 0x0a, 0x07, 0x54, 0x69, - 0x6d, 0x65, 0x72, 0x49, 0x64, 0x12, 0x21, 0x0a, 0x0c, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, - 0x72, 0x6d, 0x5f, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0b, 0x74, 0x72, 0x61, - 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x49, 0x64, 0x12, 0x1d, 0x0a, 0x0a, 
0x6c, 0x6f, 0x63, 0x61, - 0x6c, 0x5f, 0x6e, 0x61, 0x6d, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x09, 0x6c, 0x6f, - 0x63, 0x61, 0x6c, 0x4e, 0x61, 0x6d, 0x65, 0x1a, 0x51, 0x0a, 0x0d, 0x54, 0x69, 0x6d, 0x65, 0x72, - 0x46, 0x61, 0x6d, 0x69, 0x6c, 0x79, 0x49, 0x64, 0x12, 0x21, 0x0a, 0x0c, 0x74, 0x72, 0x61, 0x6e, - 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x5f, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0b, - 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x49, 0x64, 0x12, 0x1d, 0x0a, 0x0a, 0x6c, - 0x6f, 0x63, 0x61, 0x6c, 0x5f, 0x6e, 0x61, 0x6d, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, - 0x09, 0x6c, 0x6f, 0x63, 0x61, 0x6c, 0x4e, 0x61, 0x6d, 0x65, 0x1a, 0xd2, 0x01, 0x0a, 0x10, 0x57, - 0x69, 0x72, 0x65, 0x43, 0x6f, 0x64, 0x65, 0x72, 0x53, 0x65, 0x74, 0x74, 0x69, 0x6e, 0x67, 0x12, - 0x10, 0x0a, 0x03, 0x75, 0x72, 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x75, 0x72, - 0x6e, 0x12, 0x18, 0x0a, 0x07, 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x18, 0x02, 0x20, 0x01, - 0x28, 0x0c, 0x52, 0x07, 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x2d, 0x0a, 0x12, 0x69, - 0x6e, 0x70, 0x75, 0x74, 0x5f, 0x6f, 0x72, 0x5f, 0x6f, 0x75, 0x74, 0x70, 0x75, 0x74, 0x5f, 0x69, - 0x64, 0x18, 0x03, 0x20, 0x01, 0x28, 0x09, 0x48, 0x00, 0x52, 0x0f, 0x69, 0x6e, 0x70, 0x75, 0x74, - 0x4f, 0x72, 0x4f, 0x75, 0x74, 0x70, 0x75, 0x74, 0x49, 0x64, 0x12, 0x59, 0x0a, 0x05, 0x74, 0x69, - 0x6d, 0x65, 0x72, 0x18, 0x04, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x41, 0x2e, 0x6f, 0x72, 0x67, 0x2e, - 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, - 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x45, 0x78, - 0x65, 0x63, 0x75, 0x74, 0x61, 0x62, 0x6c, 0x65, 0x53, 0x74, 0x61, 0x67, 0x65, 0x50, 0x61, 0x79, - 0x6c, 0x6f, 0x61, 0x64, 0x2e, 0x54, 0x69, 0x6d, 0x65, 0x72, 0x49, 0x64, 0x48, 0x00, 0x52, 0x05, - 0x74, 0x69, 0x6d, 0x65, 0x72, 0x42, 0x08, 0x0a, 0x06, 0x74, 0x61, 0x72, 0x67, 0x65, 0x74, 0x32, - 0x8f, 0x01, 0x0a, 0x11, 0x54, 0x65, 0x73, 0x74, 0x53, 0x74, 0x72, 0x65, 0x61, 0x6d, 0x53, 0x65, - 0x72, 0x76, 0x69, 0x63, 0x65, 0x12, 0x7a, 0x0a, 0x06, 0x45, 0x76, 0x65, 0x6e, 0x74, 0x73, 0x12, - 0x30, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, - 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, - 0x2e, 0x76, 0x31, 0x2e, 0x45, 0x76, 0x65, 0x6e, 0x74, 0x73, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, - 0x74, 0x1a, 0x3a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, - 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, - 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x65, 0x73, 0x74, 0x53, 0x74, 0x72, 0x65, 0x61, 0x6d, - 0x50, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x2e, 0x45, 0x76, 0x65, 0x6e, 0x74, 0x22, 0x00, 0x30, - 0x01, 0x3a, 0x3f, 0x0a, 0x08, 0x62, 0x65, 0x61, 0x6d, 0x5f, 0x75, 0x72, 0x6e, 0x12, 0x21, 0x2e, - 0x67, 0x6f, 0x6f, 0x67, 0x6c, 0x65, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x62, 0x75, 0x66, 0x2e, - 0x45, 0x6e, 0x75, 0x6d, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x4f, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x73, - 0x18, 0xc4, 0xa6, 0xaf, 0x58, 0x20, 0x01, 0x28, 0x09, 0x52, 0x07, 0x62, 0x65, 0x61, 0x6d, 0x55, - 0x72, 0x6e, 0x3a, 0x49, 0x0a, 0x0d, 0x62, 0x65, 0x61, 0x6d, 0x5f, 0x63, 0x6f, 0x6e, 0x73, 0x74, - 0x61, 0x6e, 0x74, 0x12, 0x21, 0x2e, 0x67, 0x6f, 0x6f, 0x67, 0x6c, 0x65, 0x2e, 0x70, 0x72, 0x6f, - 0x74, 0x6f, 0x62, 0x75, 0x66, 0x2e, 0x45, 0x6e, 0x75, 0x6d, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x4f, - 
0x70, 0x74, 0x69, 0x6f, 0x6e, 0x73, 0x18, 0xc5, 0xa6, 0xaf, 0x58, 0x20, 0x01, 0x28, 0x09, 0x52, - 0x0c, 0x62, 0x65, 0x61, 0x6d, 0x43, 0x6f, 0x6e, 0x73, 0x74, 0x61, 0x6e, 0x74, 0x42, 0x75, 0x0a, - 0x21, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, - 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, - 0x76, 0x31, 0x42, 0x09, 0x52, 0x75, 0x6e, 0x6e, 0x65, 0x72, 0x41, 0x70, 0x69, 0x5a, 0x45, 0x67, - 0x69, 0x74, 0x68, 0x75, 0x62, 0x2e, 0x63, 0x6f, 0x6d, 0x2f, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, - 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x73, 0x64, 0x6b, 0x73, 0x2f, 0x67, 0x6f, 0x2f, 0x70, 0x6b, - 0x67, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2f, 0x70, 0x69, 0x70, - 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x5f, 0x76, 0x31, 0x3b, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, - 0x65, 0x5f, 0x76, 0x31, 0x62, 0x06, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x33, + 0x45, 0x76, 0x65, 0x6e, 0x74, 0x73, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x1a, 0x3a, 0x2e, + 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, + 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, + 0x31, 0x2e, 0x54, 0x65, 0x73, 0x74, 0x53, 0x74, 0x72, 0x65, 0x61, 0x6d, 0x50, 0x61, 0x79, 0x6c, + 0x6f, 0x61, 0x64, 0x2e, 0x45, 0x76, 0x65, 0x6e, 0x74, 0x22, 0x00, 0x30, 0x01, 0x3a, 0x3f, 0x0a, + 0x08, 0x62, 0x65, 0x61, 0x6d, 0x5f, 0x75, 0x72, 0x6e, 0x12, 0x21, 0x2e, 0x67, 0x6f, 0x6f, 0x67, + 0x6c, 0x65, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x62, 0x75, 0x66, 0x2e, 0x45, 0x6e, 0x75, 0x6d, + 0x56, 0x61, 0x6c, 0x75, 0x65, 0x4f, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x73, 0x18, 0xc4, 0xa6, 0xaf, + 0x58, 0x20, 0x01, 0x28, 0x09, 0x52, 0x07, 0x62, 0x65, 0x61, 0x6d, 0x55, 0x72, 0x6e, 0x3a, 0x49, + 0x0a, 0x0d, 0x62, 0x65, 0x61, 0x6d, 0x5f, 0x63, 0x6f, 0x6e, 0x73, 0x74, 0x61, 0x6e, 0x74, 0x12, + 0x21, 0x2e, 0x67, 0x6f, 0x6f, 0x67, 0x6c, 0x65, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x62, 0x75, + 0x66, 0x2e, 0x45, 0x6e, 0x75, 0x6d, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x4f, 0x70, 0x74, 0x69, 0x6f, + 0x6e, 0x73, 0x18, 0xc5, 0xa6, 0xaf, 0x58, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0c, 0x62, 0x65, 0x61, + 0x6d, 0x43, 0x6f, 0x6e, 0x73, 0x74, 0x61, 0x6e, 0x74, 0x42, 0x78, 0x0a, 0x21, 0x6f, 0x72, 0x67, + 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, + 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x42, 0x09, + 0x52, 0x75, 0x6e, 0x6e, 0x65, 0x72, 0x41, 0x70, 0x69, 0x5a, 0x48, 0x67, 0x69, 0x74, 0x68, 0x75, + 0x62, 0x2e, 0x63, 0x6f, 0x6d, 0x2f, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2f, 0x62, 0x65, 0x61, + 0x6d, 0x2f, 0x73, 0x64, 0x6b, 0x73, 0x2f, 0x76, 0x32, 0x2f, 0x67, 0x6f, 0x2f, 0x70, 0x6b, 0x67, + 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2f, 0x70, 0x69, 0x70, 0x65, + 0x6c, 0x69, 0x6e, 0x65, 0x5f, 0x76, 0x31, 0x3b, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, + 0x5f, 0x76, 0x31, 0x62, 0x06, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x33, } var ( @@ -8325,244 +8896,254 @@ func file_beam_runner_api_proto_rawDescGZIP() []byte { return file_beam_runner_api_proto_rawDescData } -var file_beam_runner_api_proto_enumTypes = make([]protoimpl.EnumInfo, 21) -var file_beam_runner_api_proto_msgTypes = make([]protoimpl.MessageInfo, 96) +var file_beam_runner_api_proto_enumTypes = make([]protoimpl.EnumInfo, 24) +var file_beam_runner_api_proto_msgTypes = make([]protoimpl.MessageInfo, 101) var 
file_beam_runner_api_proto_goTypes = []interface{}{ - (BeamConstants_Constants)(0), // 0: org.apache.beam.model.pipeline.v1.BeamConstants.Constants - (StandardPTransforms_Primitives)(0), // 1: org.apache.beam.model.pipeline.v1.StandardPTransforms.Primitives - (StandardPTransforms_DeprecatedPrimitives)(0), // 2: org.apache.beam.model.pipeline.v1.StandardPTransforms.DeprecatedPrimitives - (StandardPTransforms_Composites)(0), // 3: org.apache.beam.model.pipeline.v1.StandardPTransforms.Composites - (StandardPTransforms_CombineComponents)(0), // 4: org.apache.beam.model.pipeline.v1.StandardPTransforms.CombineComponents - (StandardPTransforms_SplittableParDoComponents)(0), // 5: org.apache.beam.model.pipeline.v1.StandardPTransforms.SplittableParDoComponents - (StandardSideInputTypes_Enum)(0), // 6: org.apache.beam.model.pipeline.v1.StandardSideInputTypes.Enum - (IsBounded_Enum)(0), // 7: org.apache.beam.model.pipeline.v1.IsBounded.Enum - (StandardCoders_Enum)(0), // 8: org.apache.beam.model.pipeline.v1.StandardCoders.Enum - (MergeStatus_Enum)(0), // 9: org.apache.beam.model.pipeline.v1.MergeStatus.Enum - (AccumulationMode_Enum)(0), // 10: org.apache.beam.model.pipeline.v1.AccumulationMode.Enum - (ClosingBehavior_Enum)(0), // 11: org.apache.beam.model.pipeline.v1.ClosingBehavior.Enum - (OnTimeBehavior_Enum)(0), // 12: org.apache.beam.model.pipeline.v1.OnTimeBehavior.Enum - (OutputTime_Enum)(0), // 13: org.apache.beam.model.pipeline.v1.OutputTime.Enum - (TimeDomain_Enum)(0), // 14: org.apache.beam.model.pipeline.v1.TimeDomain.Enum - (StandardArtifacts_Types)(0), // 15: org.apache.beam.model.pipeline.v1.StandardArtifacts.Types - (StandardArtifacts_Roles)(0), // 16: org.apache.beam.model.pipeline.v1.StandardArtifacts.Roles - (StandardEnvironments_Environments)(0), // 17: org.apache.beam.model.pipeline.v1.StandardEnvironments.Environments - (StandardProtocols_Enum)(0), // 18: org.apache.beam.model.pipeline.v1.StandardProtocols.Enum - (StandardRequirements_Enum)(0), // 19: org.apache.beam.model.pipeline.v1.StandardRequirements.Enum - (StandardDisplayData_DisplayData)(0), // 20: org.apache.beam.model.pipeline.v1.StandardDisplayData.DisplayData - (*BeamConstants)(nil), // 21: org.apache.beam.model.pipeline.v1.BeamConstants - (*Components)(nil), // 22: org.apache.beam.model.pipeline.v1.Components - (*Pipeline)(nil), // 23: org.apache.beam.model.pipeline.v1.Pipeline - (*PTransform)(nil), // 24: org.apache.beam.model.pipeline.v1.PTransform - (*StandardPTransforms)(nil), // 25: org.apache.beam.model.pipeline.v1.StandardPTransforms - (*StandardSideInputTypes)(nil), // 26: org.apache.beam.model.pipeline.v1.StandardSideInputTypes - (*PCollection)(nil), // 27: org.apache.beam.model.pipeline.v1.PCollection - (*ParDoPayload)(nil), // 28: org.apache.beam.model.pipeline.v1.ParDoPayload - (*StateSpec)(nil), // 29: org.apache.beam.model.pipeline.v1.StateSpec - (*ReadModifyWriteStateSpec)(nil), // 30: org.apache.beam.model.pipeline.v1.ReadModifyWriteStateSpec - (*BagStateSpec)(nil), // 31: org.apache.beam.model.pipeline.v1.BagStateSpec - (*OrderedListStateSpec)(nil), // 32: org.apache.beam.model.pipeline.v1.OrderedListStateSpec - (*CombiningStateSpec)(nil), // 33: org.apache.beam.model.pipeline.v1.CombiningStateSpec - (*MapStateSpec)(nil), // 34: org.apache.beam.model.pipeline.v1.MapStateSpec - (*SetStateSpec)(nil), // 35: org.apache.beam.model.pipeline.v1.SetStateSpec - (*TimerFamilySpec)(nil), // 36: org.apache.beam.model.pipeline.v1.TimerFamilySpec - (*IsBounded)(nil), // 37: 
org.apache.beam.model.pipeline.v1.IsBounded - (*ReadPayload)(nil), // 38: org.apache.beam.model.pipeline.v1.ReadPayload - (*WindowIntoPayload)(nil), // 39: org.apache.beam.model.pipeline.v1.WindowIntoPayload - (*CombinePayload)(nil), // 40: org.apache.beam.model.pipeline.v1.CombinePayload - (*TestStreamPayload)(nil), // 41: org.apache.beam.model.pipeline.v1.TestStreamPayload - (*EventsRequest)(nil), // 42: org.apache.beam.model.pipeline.v1.EventsRequest - (*WriteFilesPayload)(nil), // 43: org.apache.beam.model.pipeline.v1.WriteFilesPayload - (*PubSubReadPayload)(nil), // 44: org.apache.beam.model.pipeline.v1.PubSubReadPayload - (*PubSubWritePayload)(nil), // 45: org.apache.beam.model.pipeline.v1.PubSubWritePayload - (*Coder)(nil), // 46: org.apache.beam.model.pipeline.v1.Coder - (*StandardCoders)(nil), // 47: org.apache.beam.model.pipeline.v1.StandardCoders - (*WindowingStrategy)(nil), // 48: org.apache.beam.model.pipeline.v1.WindowingStrategy - (*MergeStatus)(nil), // 49: org.apache.beam.model.pipeline.v1.MergeStatus - (*AccumulationMode)(nil), // 50: org.apache.beam.model.pipeline.v1.AccumulationMode - (*ClosingBehavior)(nil), // 51: org.apache.beam.model.pipeline.v1.ClosingBehavior - (*OnTimeBehavior)(nil), // 52: org.apache.beam.model.pipeline.v1.OnTimeBehavior - (*OutputTime)(nil), // 53: org.apache.beam.model.pipeline.v1.OutputTime - (*TimeDomain)(nil), // 54: org.apache.beam.model.pipeline.v1.TimeDomain - (*Trigger)(nil), // 55: org.apache.beam.model.pipeline.v1.Trigger - (*TimestampTransform)(nil), // 56: org.apache.beam.model.pipeline.v1.TimestampTransform - (*SideInput)(nil), // 57: org.apache.beam.model.pipeline.v1.SideInput - (*StandardArtifacts)(nil), // 58: org.apache.beam.model.pipeline.v1.StandardArtifacts - (*ArtifactFilePayload)(nil), // 59: org.apache.beam.model.pipeline.v1.ArtifactFilePayload - (*ArtifactUrlPayload)(nil), // 60: org.apache.beam.model.pipeline.v1.ArtifactUrlPayload - (*EmbeddedFilePayload)(nil), // 61: org.apache.beam.model.pipeline.v1.EmbeddedFilePayload - (*PyPIPayload)(nil), // 62: org.apache.beam.model.pipeline.v1.PyPIPayload - (*MavenPayload)(nil), // 63: org.apache.beam.model.pipeline.v1.MavenPayload - (*DeferredArtifactPayload)(nil), // 64: org.apache.beam.model.pipeline.v1.DeferredArtifactPayload - (*ArtifactStagingToRolePayload)(nil), // 65: org.apache.beam.model.pipeline.v1.ArtifactStagingToRolePayload - (*ArtifactInformation)(nil), // 66: org.apache.beam.model.pipeline.v1.ArtifactInformation - (*Environment)(nil), // 67: org.apache.beam.model.pipeline.v1.Environment - (*StandardEnvironments)(nil), // 68: org.apache.beam.model.pipeline.v1.StandardEnvironments - (*DockerPayload)(nil), // 69: org.apache.beam.model.pipeline.v1.DockerPayload - (*ProcessPayload)(nil), // 70: org.apache.beam.model.pipeline.v1.ProcessPayload - (*ExternalPayload)(nil), // 71: org.apache.beam.model.pipeline.v1.ExternalPayload - (*StandardProtocols)(nil), // 72: org.apache.beam.model.pipeline.v1.StandardProtocols - (*StandardRequirements)(nil), // 73: org.apache.beam.model.pipeline.v1.StandardRequirements - (*FunctionSpec)(nil), // 74: org.apache.beam.model.pipeline.v1.FunctionSpec - (*StandardDisplayData)(nil), // 75: org.apache.beam.model.pipeline.v1.StandardDisplayData - (*LabelledStringPayload)(nil), // 76: org.apache.beam.model.pipeline.v1.LabelledStringPayload - (*DisplayData)(nil), // 77: org.apache.beam.model.pipeline.v1.DisplayData - (*MessageWithComponents)(nil), // 78: org.apache.beam.model.pipeline.v1.MessageWithComponents - (*ExecutableStagePayload)(nil), // 
79: org.apache.beam.model.pipeline.v1.ExecutableStagePayload
-	nil, // 80: org.apache.beam.model.pipeline.v1.Components.TransformsEntry
-	nil, // 81: org.apache.beam.model.pipeline.v1.Components.PcollectionsEntry
-	nil, // 82: org.apache.beam.model.pipeline.v1.Components.WindowingStrategiesEntry
-	nil, // 83: org.apache.beam.model.pipeline.v1.Components.CodersEntry
-	nil, // 84: org.apache.beam.model.pipeline.v1.Components.EnvironmentsEntry
-	nil, // 85: org.apache.beam.model.pipeline.v1.PTransform.InputsEntry
-	nil, // 86: org.apache.beam.model.pipeline.v1.PTransform.OutputsEntry
-	nil, // 87: org.apache.beam.model.pipeline.v1.ParDoPayload.SideInputsEntry
-	nil, // 88: org.apache.beam.model.pipeline.v1.ParDoPayload.StateSpecsEntry
-	nil, // 89: org.apache.beam.model.pipeline.v1.ParDoPayload.TimerFamilySpecsEntry
-	(*TestStreamPayload_Event)(nil), // 90: org.apache.beam.model.pipeline.v1.TestStreamPayload.Event
-	(*TestStreamPayload_TimestampedElement)(nil), // 91: org.apache.beam.model.pipeline.v1.TestStreamPayload.TimestampedElement
-	(*TestStreamPayload_Event_AdvanceWatermark)(nil), // 92: org.apache.beam.model.pipeline.v1.TestStreamPayload.Event.AdvanceWatermark
-	(*TestStreamPayload_Event_AdvanceProcessingTime)(nil), // 93: org.apache.beam.model.pipeline.v1.TestStreamPayload.Event.AdvanceProcessingTime
-	(*TestStreamPayload_Event_AddElements)(nil), // 94: org.apache.beam.model.pipeline.v1.TestStreamPayload.Event.AddElements
-	nil, // 95: org.apache.beam.model.pipeline.v1.WriteFilesPayload.SideInputsEntry
-	(*Trigger_AfterAll)(nil), // 96: org.apache.beam.model.pipeline.v1.Trigger.AfterAll
-	(*Trigger_AfterAny)(nil), // 97: org.apache.beam.model.pipeline.v1.Trigger.AfterAny
-	(*Trigger_AfterEach)(nil), // 98: org.apache.beam.model.pipeline.v1.Trigger.AfterEach
-	(*Trigger_AfterEndOfWindow)(nil), // 99: org.apache.beam.model.pipeline.v1.Trigger.AfterEndOfWindow
-	(*Trigger_AfterProcessingTime)(nil), // 100: org.apache.beam.model.pipeline.v1.Trigger.AfterProcessingTime
-	(*Trigger_AfterSynchronizedProcessingTime)(nil), // 101: org.apache.beam.model.pipeline.v1.Trigger.AfterSynchronizedProcessingTime
-	(*Trigger_Default)(nil), // 102: org.apache.beam.model.pipeline.v1.Trigger.Default
-	(*Trigger_ElementCount)(nil), // 103: org.apache.beam.model.pipeline.v1.Trigger.ElementCount
-	(*Trigger_Never)(nil), // 104: org.apache.beam.model.pipeline.v1.Trigger.Never
-	(*Trigger_Always)(nil), // 105: org.apache.beam.model.pipeline.v1.Trigger.Always
-	(*Trigger_OrFinally)(nil), // 106: org.apache.beam.model.pipeline.v1.Trigger.OrFinally
-	(*Trigger_Repeat)(nil), // 107: org.apache.beam.model.pipeline.v1.Trigger.Repeat
-	(*TimestampTransform_Delay)(nil), // 108: org.apache.beam.model.pipeline.v1.TimestampTransform.Delay
-	(*TimestampTransform_AlignTo)(nil), // 109: org.apache.beam.model.pipeline.v1.TimestampTransform.AlignTo
-	nil, // 110: org.apache.beam.model.pipeline.v1.ProcessPayload.EnvEntry
-	nil, // 111: org.apache.beam.model.pipeline.v1.ExternalPayload.ParamsEntry
-	(*ExecutableStagePayload_SideInputId)(nil), // 112: org.apache.beam.model.pipeline.v1.ExecutableStagePayload.SideInputId
-	(*ExecutableStagePayload_UserStateId)(nil), // 113: org.apache.beam.model.pipeline.v1.ExecutableStagePayload.UserStateId
-	(*ExecutableStagePayload_TimerId)(nil), // 114: org.apache.beam.model.pipeline.v1.ExecutableStagePayload.TimerId
-	(*ExecutableStagePayload_TimerFamilyId)(nil), // 115: org.apache.beam.model.pipeline.v1.ExecutableStagePayload.TimerFamilyId
-	(*ExecutableStagePayload_WireCoderSetting)(nil), // 116: org.apache.beam.model.pipeline.v1.ExecutableStagePayload.WireCoderSetting
-	(*ApiServiceDescriptor)(nil), // 117: org.apache.beam.model.pipeline.v1.ApiServiceDescriptor
-	(*descriptor.EnumValueOptions)(nil), // 118: google.protobuf.EnumValueOptions
+	(BeamConstants_Constants)(0), // 0: org.apache.beam.model.pipeline.v1.BeamConstants.Constants
+	(StandardPTransforms_Primitives)(0), // 1: org.apache.beam.model.pipeline.v1.StandardPTransforms.Primitives
+	(StandardPTransforms_DeprecatedPrimitives)(0), // 2: org.apache.beam.model.pipeline.v1.StandardPTransforms.DeprecatedPrimitives
+	(StandardPTransforms_Composites)(0), // 3: org.apache.beam.model.pipeline.v1.StandardPTransforms.Composites
+	(StandardPTransforms_CombineComponents)(0), // 4: org.apache.beam.model.pipeline.v1.StandardPTransforms.CombineComponents
+	(StandardPTransforms_SplittableParDoComponents)(0), // 5: org.apache.beam.model.pipeline.v1.StandardPTransforms.SplittableParDoComponents
+	(StandardPTransforms_GroupIntoBatchesComponents)(0), // 6: org.apache.beam.model.pipeline.v1.StandardPTransforms.GroupIntoBatchesComponents
+	(StandardSideInputTypes_Enum)(0), // 7: org.apache.beam.model.pipeline.v1.StandardSideInputTypes.Enum
+	(IsBounded_Enum)(0), // 8: org.apache.beam.model.pipeline.v1.IsBounded.Enum
+	(StandardCoders_Enum)(0), // 9: org.apache.beam.model.pipeline.v1.StandardCoders.Enum
+	(MergeStatus_Enum)(0), // 10: org.apache.beam.model.pipeline.v1.MergeStatus.Enum
+	(AccumulationMode_Enum)(0), // 11: org.apache.beam.model.pipeline.v1.AccumulationMode.Enum
+	(ClosingBehavior_Enum)(0), // 12: org.apache.beam.model.pipeline.v1.ClosingBehavior.Enum
+	(OnTimeBehavior_Enum)(0), // 13: org.apache.beam.model.pipeline.v1.OnTimeBehavior.Enum
+	(OutputTime_Enum)(0), // 14: org.apache.beam.model.pipeline.v1.OutputTime.Enum
+	(TimeDomain_Enum)(0), // 15: org.apache.beam.model.pipeline.v1.TimeDomain.Enum
+	(StandardArtifacts_Types)(0), // 16: org.apache.beam.model.pipeline.v1.StandardArtifacts.Types
+	(StandardArtifacts_Roles)(0), // 17: org.apache.beam.model.pipeline.v1.StandardArtifacts.Roles
+	(StandardEnvironments_Environments)(0), // 18: org.apache.beam.model.pipeline.v1.StandardEnvironments.Environments
+	(StandardProtocols_Enum)(0), // 19: org.apache.beam.model.pipeline.v1.StandardProtocols.Enum
+	(StandardRunnerProtocols_Enum)(0), // 20: org.apache.beam.model.pipeline.v1.StandardRunnerProtocols.Enum
+	(StandardRequirements_Enum)(0), // 21: org.apache.beam.model.pipeline.v1.StandardRequirements.Enum
+	(StandardDisplayData_DisplayData)(0), // 22: org.apache.beam.model.pipeline.v1.StandardDisplayData.DisplayData
+	(StandardResourceHints_Enum)(0), // 23: org.apache.beam.model.pipeline.v1.StandardResourceHints.Enum
+	(*BeamConstants)(nil), // 24: org.apache.beam.model.pipeline.v1.BeamConstants
+	(*Components)(nil), // 25: org.apache.beam.model.pipeline.v1.Components
+	(*Pipeline)(nil), // 26: org.apache.beam.model.pipeline.v1.Pipeline
+	(*PTransform)(nil), // 27: org.apache.beam.model.pipeline.v1.PTransform
+	(*StandardPTransforms)(nil), // 28: org.apache.beam.model.pipeline.v1.StandardPTransforms
+	(*StandardSideInputTypes)(nil), // 29: org.apache.beam.model.pipeline.v1.StandardSideInputTypes
+	(*PCollection)(nil), // 30: org.apache.beam.model.pipeline.v1.PCollection
+	(*ParDoPayload)(nil), // 31: org.apache.beam.model.pipeline.v1.ParDoPayload
+	(*StateSpec)(nil), // 32: org.apache.beam.model.pipeline.v1.StateSpec
+	(*ReadModifyWriteStateSpec)(nil), // 33: org.apache.beam.model.pipeline.v1.ReadModifyWriteStateSpec
(*BagStateSpec)(nil), // 34: org.apache.beam.model.pipeline.v1.BagStateSpec + (*OrderedListStateSpec)(nil), // 35: org.apache.beam.model.pipeline.v1.OrderedListStateSpec + (*CombiningStateSpec)(nil), // 36: org.apache.beam.model.pipeline.v1.CombiningStateSpec + (*MapStateSpec)(nil), // 37: org.apache.beam.model.pipeline.v1.MapStateSpec + (*SetStateSpec)(nil), // 38: org.apache.beam.model.pipeline.v1.SetStateSpec + (*TimerFamilySpec)(nil), // 39: org.apache.beam.model.pipeline.v1.TimerFamilySpec + (*IsBounded)(nil), // 40: org.apache.beam.model.pipeline.v1.IsBounded + (*ReadPayload)(nil), // 41: org.apache.beam.model.pipeline.v1.ReadPayload + (*WindowIntoPayload)(nil), // 42: org.apache.beam.model.pipeline.v1.WindowIntoPayload + (*CombinePayload)(nil), // 43: org.apache.beam.model.pipeline.v1.CombinePayload + (*TestStreamPayload)(nil), // 44: org.apache.beam.model.pipeline.v1.TestStreamPayload + (*EventsRequest)(nil), // 45: org.apache.beam.model.pipeline.v1.EventsRequest + (*WriteFilesPayload)(nil), // 46: org.apache.beam.model.pipeline.v1.WriteFilesPayload + (*PubSubReadPayload)(nil), // 47: org.apache.beam.model.pipeline.v1.PubSubReadPayload + (*PubSubWritePayload)(nil), // 48: org.apache.beam.model.pipeline.v1.PubSubWritePayload + (*GroupIntoBatchesPayload)(nil), // 49: org.apache.beam.model.pipeline.v1.GroupIntoBatchesPayload + (*Coder)(nil), // 50: org.apache.beam.model.pipeline.v1.Coder + (*StandardCoders)(nil), // 51: org.apache.beam.model.pipeline.v1.StandardCoders + (*WindowingStrategy)(nil), // 52: org.apache.beam.model.pipeline.v1.WindowingStrategy + (*MergeStatus)(nil), // 53: org.apache.beam.model.pipeline.v1.MergeStatus + (*AccumulationMode)(nil), // 54: org.apache.beam.model.pipeline.v1.AccumulationMode + (*ClosingBehavior)(nil), // 55: org.apache.beam.model.pipeline.v1.ClosingBehavior + (*OnTimeBehavior)(nil), // 56: org.apache.beam.model.pipeline.v1.OnTimeBehavior + (*OutputTime)(nil), // 57: org.apache.beam.model.pipeline.v1.OutputTime + (*TimeDomain)(nil), // 58: org.apache.beam.model.pipeline.v1.TimeDomain + (*Trigger)(nil), // 59: org.apache.beam.model.pipeline.v1.Trigger + (*TimestampTransform)(nil), // 60: org.apache.beam.model.pipeline.v1.TimestampTransform + (*SideInput)(nil), // 61: org.apache.beam.model.pipeline.v1.SideInput + (*StandardArtifacts)(nil), // 62: org.apache.beam.model.pipeline.v1.StandardArtifacts + (*ArtifactFilePayload)(nil), // 63: org.apache.beam.model.pipeline.v1.ArtifactFilePayload + (*ArtifactUrlPayload)(nil), // 64: org.apache.beam.model.pipeline.v1.ArtifactUrlPayload + (*EmbeddedFilePayload)(nil), // 65: org.apache.beam.model.pipeline.v1.EmbeddedFilePayload + (*PyPIPayload)(nil), // 66: org.apache.beam.model.pipeline.v1.PyPIPayload + (*MavenPayload)(nil), // 67: org.apache.beam.model.pipeline.v1.MavenPayload + (*DeferredArtifactPayload)(nil), // 68: org.apache.beam.model.pipeline.v1.DeferredArtifactPayload + (*ArtifactStagingToRolePayload)(nil), // 69: org.apache.beam.model.pipeline.v1.ArtifactStagingToRolePayload + (*ArtifactInformation)(nil), // 70: org.apache.beam.model.pipeline.v1.ArtifactInformation + (*Environment)(nil), // 71: org.apache.beam.model.pipeline.v1.Environment + (*StandardEnvironments)(nil), // 72: org.apache.beam.model.pipeline.v1.StandardEnvironments + (*DockerPayload)(nil), // 73: org.apache.beam.model.pipeline.v1.DockerPayload + (*ProcessPayload)(nil), // 74: org.apache.beam.model.pipeline.v1.ProcessPayload + (*ExternalPayload)(nil), // 75: org.apache.beam.model.pipeline.v1.ExternalPayload + (*StandardProtocols)(nil), 
// 76: org.apache.beam.model.pipeline.v1.StandardProtocols + (*StandardRunnerProtocols)(nil), // 77: org.apache.beam.model.pipeline.v1.StandardRunnerProtocols + (*StandardRequirements)(nil), // 78: org.apache.beam.model.pipeline.v1.StandardRequirements + (*FunctionSpec)(nil), // 79: org.apache.beam.model.pipeline.v1.FunctionSpec + (*StandardDisplayData)(nil), // 80: org.apache.beam.model.pipeline.v1.StandardDisplayData + (*LabelledPayload)(nil), // 81: org.apache.beam.model.pipeline.v1.LabelledPayload + (*DisplayData)(nil), // 82: org.apache.beam.model.pipeline.v1.DisplayData + (*MessageWithComponents)(nil), // 83: org.apache.beam.model.pipeline.v1.MessageWithComponents + (*ExecutableStagePayload)(nil), // 84: org.apache.beam.model.pipeline.v1.ExecutableStagePayload + (*StandardResourceHints)(nil), // 85: org.apache.beam.model.pipeline.v1.StandardResourceHints + nil, // 86: org.apache.beam.model.pipeline.v1.Components.TransformsEntry + nil, // 87: org.apache.beam.model.pipeline.v1.Components.PcollectionsEntry + nil, // 88: org.apache.beam.model.pipeline.v1.Components.WindowingStrategiesEntry + nil, // 89: org.apache.beam.model.pipeline.v1.Components.CodersEntry + nil, // 90: org.apache.beam.model.pipeline.v1.Components.EnvironmentsEntry + nil, // 91: org.apache.beam.model.pipeline.v1.PTransform.InputsEntry + nil, // 92: org.apache.beam.model.pipeline.v1.PTransform.OutputsEntry + nil, // 93: org.apache.beam.model.pipeline.v1.PTransform.AnnotationsEntry + nil, // 94: org.apache.beam.model.pipeline.v1.ParDoPayload.SideInputsEntry + nil, // 95: org.apache.beam.model.pipeline.v1.ParDoPayload.StateSpecsEntry + nil, // 96: org.apache.beam.model.pipeline.v1.ParDoPayload.TimerFamilySpecsEntry + (*TestStreamPayload_Event)(nil), // 97: org.apache.beam.model.pipeline.v1.TestStreamPayload.Event + (*TestStreamPayload_TimestampedElement)(nil), // 98: org.apache.beam.model.pipeline.v1.TestStreamPayload.TimestampedElement + (*TestStreamPayload_Event_AdvanceWatermark)(nil), // 99: org.apache.beam.model.pipeline.v1.TestStreamPayload.Event.AdvanceWatermark + (*TestStreamPayload_Event_AdvanceProcessingTime)(nil), // 100: org.apache.beam.model.pipeline.v1.TestStreamPayload.Event.AdvanceProcessingTime + (*TestStreamPayload_Event_AddElements)(nil), // 101: org.apache.beam.model.pipeline.v1.TestStreamPayload.Event.AddElements + nil, // 102: org.apache.beam.model.pipeline.v1.WriteFilesPayload.SideInputsEntry + (*Trigger_AfterAll)(nil), // 103: org.apache.beam.model.pipeline.v1.Trigger.AfterAll + (*Trigger_AfterAny)(nil), // 104: org.apache.beam.model.pipeline.v1.Trigger.AfterAny + (*Trigger_AfterEach)(nil), // 105: org.apache.beam.model.pipeline.v1.Trigger.AfterEach + (*Trigger_AfterEndOfWindow)(nil), // 106: org.apache.beam.model.pipeline.v1.Trigger.AfterEndOfWindow + (*Trigger_AfterProcessingTime)(nil), // 107: org.apache.beam.model.pipeline.v1.Trigger.AfterProcessingTime + (*Trigger_AfterSynchronizedProcessingTime)(nil), // 108: org.apache.beam.model.pipeline.v1.Trigger.AfterSynchronizedProcessingTime + (*Trigger_Default)(nil), // 109: org.apache.beam.model.pipeline.v1.Trigger.Default + (*Trigger_ElementCount)(nil), // 110: org.apache.beam.model.pipeline.v1.Trigger.ElementCount + (*Trigger_Never)(nil), // 111: org.apache.beam.model.pipeline.v1.Trigger.Never + (*Trigger_Always)(nil), // 112: org.apache.beam.model.pipeline.v1.Trigger.Always + (*Trigger_OrFinally)(nil), // 113: org.apache.beam.model.pipeline.v1.Trigger.OrFinally + (*Trigger_Repeat)(nil), // 114: org.apache.beam.model.pipeline.v1.Trigger.Repeat + 
(*TimestampTransform_Delay)(nil), // 115: org.apache.beam.model.pipeline.v1.TimestampTransform.Delay
+	(*TimestampTransform_AlignTo)(nil), // 116: org.apache.beam.model.pipeline.v1.TimestampTransform.AlignTo
+	nil, // 117: org.apache.beam.model.pipeline.v1.Environment.ResourceHintsEntry
+	nil, // 118: org.apache.beam.model.pipeline.v1.ProcessPayload.EnvEntry
+	nil, // 119: org.apache.beam.model.pipeline.v1.ExternalPayload.ParamsEntry
+	(*ExecutableStagePayload_SideInputId)(nil), // 120: org.apache.beam.model.pipeline.v1.ExecutableStagePayload.SideInputId
+	(*ExecutableStagePayload_UserStateId)(nil), // 121: org.apache.beam.model.pipeline.v1.ExecutableStagePayload.UserStateId
+	(*ExecutableStagePayload_TimerId)(nil), // 122: org.apache.beam.model.pipeline.v1.ExecutableStagePayload.TimerId
+	(*ExecutableStagePayload_TimerFamilyId)(nil), // 123: org.apache.beam.model.pipeline.v1.ExecutableStagePayload.TimerFamilyId
+	(*ExecutableStagePayload_WireCoderSetting)(nil), // 124: org.apache.beam.model.pipeline.v1.ExecutableStagePayload.WireCoderSetting
+	(*ApiServiceDescriptor)(nil), // 125: org.apache.beam.model.pipeline.v1.ApiServiceDescriptor
+	(*descriptorpb.EnumValueOptions)(nil), // 126: google.protobuf.EnumValueOptions
 }
 var file_beam_runner_api_proto_depIdxs = []int32{
-	80, // 0: org.apache.beam.model.pipeline.v1.Components.transforms:type_name -> org.apache.beam.model.pipeline.v1.Components.TransformsEntry
-	81, // 1: org.apache.beam.model.pipeline.v1.Components.pcollections:type_name -> org.apache.beam.model.pipeline.v1.Components.PcollectionsEntry
-	82, // 2: org.apache.beam.model.pipeline.v1.Components.windowing_strategies:type_name -> org.apache.beam.model.pipeline.v1.Components.WindowingStrategiesEntry
-	83, // 3: org.apache.beam.model.pipeline.v1.Components.coders:type_name -> org.apache.beam.model.pipeline.v1.Components.CodersEntry
-	84, // 4: org.apache.beam.model.pipeline.v1.Components.environments:type_name -> org.apache.beam.model.pipeline.v1.Components.EnvironmentsEntry
-	22, // 5: org.apache.beam.model.pipeline.v1.Pipeline.components:type_name -> org.apache.beam.model.pipeline.v1.Components
-	77, // 6: org.apache.beam.model.pipeline.v1.Pipeline.display_data:type_name -> org.apache.beam.model.pipeline.v1.DisplayData
-	74, // 7: org.apache.beam.model.pipeline.v1.PTransform.spec:type_name -> org.apache.beam.model.pipeline.v1.FunctionSpec
-	85, // 8: org.apache.beam.model.pipeline.v1.PTransform.inputs:type_name -> org.apache.beam.model.pipeline.v1.PTransform.InputsEntry
-	86, // 9: org.apache.beam.model.pipeline.v1.PTransform.outputs:type_name -> org.apache.beam.model.pipeline.v1.PTransform.OutputsEntry
-	77, // 10: org.apache.beam.model.pipeline.v1.PTransform.display_data:type_name -> org.apache.beam.model.pipeline.v1.DisplayData
-	7, // 11: org.apache.beam.model.pipeline.v1.PCollection.is_bounded:type_name -> org.apache.beam.model.pipeline.v1.IsBounded.Enum
-	77, // 12: org.apache.beam.model.pipeline.v1.PCollection.display_data:type_name -> org.apache.beam.model.pipeline.v1.DisplayData
-	74, // 13: org.apache.beam.model.pipeline.v1.ParDoPayload.do_fn:type_name -> org.apache.beam.model.pipeline.v1.FunctionSpec
-	87, // 14: org.apache.beam.model.pipeline.v1.ParDoPayload.side_inputs:type_name -> org.apache.beam.model.pipeline.v1.ParDoPayload.SideInputsEntry
-	88, // 15: org.apache.beam.model.pipeline.v1.ParDoPayload.state_specs:type_name -> org.apache.beam.model.pipeline.v1.ParDoPayload.StateSpecsEntry
-	89, // 16:
org.apache.beam.model.pipeline.v1.ParDoPayload.timer_family_specs:type_name -> org.apache.beam.model.pipeline.v1.ParDoPayload.TimerFamilySpecsEntry - 30, // 17: org.apache.beam.model.pipeline.v1.StateSpec.read_modify_write_spec:type_name -> org.apache.beam.model.pipeline.v1.ReadModifyWriteStateSpec - 31, // 18: org.apache.beam.model.pipeline.v1.StateSpec.bag_spec:type_name -> org.apache.beam.model.pipeline.v1.BagStateSpec - 33, // 19: org.apache.beam.model.pipeline.v1.StateSpec.combining_spec:type_name -> org.apache.beam.model.pipeline.v1.CombiningStateSpec - 34, // 20: org.apache.beam.model.pipeline.v1.StateSpec.map_spec:type_name -> org.apache.beam.model.pipeline.v1.MapStateSpec - 35, // 21: org.apache.beam.model.pipeline.v1.StateSpec.set_spec:type_name -> org.apache.beam.model.pipeline.v1.SetStateSpec - 32, // 22: org.apache.beam.model.pipeline.v1.StateSpec.ordered_list_spec:type_name -> org.apache.beam.model.pipeline.v1.OrderedListStateSpec - 74, // 23: org.apache.beam.model.pipeline.v1.CombiningStateSpec.combine_fn:type_name -> org.apache.beam.model.pipeline.v1.FunctionSpec - 14, // 24: org.apache.beam.model.pipeline.v1.TimerFamilySpec.time_domain:type_name -> org.apache.beam.model.pipeline.v1.TimeDomain.Enum - 74, // 25: org.apache.beam.model.pipeline.v1.ReadPayload.source:type_name -> org.apache.beam.model.pipeline.v1.FunctionSpec - 7, // 26: org.apache.beam.model.pipeline.v1.ReadPayload.is_bounded:type_name -> org.apache.beam.model.pipeline.v1.IsBounded.Enum - 74, // 27: org.apache.beam.model.pipeline.v1.WindowIntoPayload.window_fn:type_name -> org.apache.beam.model.pipeline.v1.FunctionSpec - 74, // 28: org.apache.beam.model.pipeline.v1.CombinePayload.combine_fn:type_name -> org.apache.beam.model.pipeline.v1.FunctionSpec - 90, // 29: org.apache.beam.model.pipeline.v1.TestStreamPayload.events:type_name -> org.apache.beam.model.pipeline.v1.TestStreamPayload.Event - 117, // 30: org.apache.beam.model.pipeline.v1.TestStreamPayload.endpoint:type_name -> org.apache.beam.model.pipeline.v1.ApiServiceDescriptor - 74, // 31: org.apache.beam.model.pipeline.v1.WriteFilesPayload.sink:type_name -> org.apache.beam.model.pipeline.v1.FunctionSpec - 74, // 32: org.apache.beam.model.pipeline.v1.WriteFilesPayload.format_function:type_name -> org.apache.beam.model.pipeline.v1.FunctionSpec - 95, // 33: org.apache.beam.model.pipeline.v1.WriteFilesPayload.side_inputs:type_name -> org.apache.beam.model.pipeline.v1.WriteFilesPayload.SideInputsEntry - 74, // 34: org.apache.beam.model.pipeline.v1.Coder.spec:type_name -> org.apache.beam.model.pipeline.v1.FunctionSpec - 74, // 35: org.apache.beam.model.pipeline.v1.WindowingStrategy.window_fn:type_name -> org.apache.beam.model.pipeline.v1.FunctionSpec - 9, // 36: org.apache.beam.model.pipeline.v1.WindowingStrategy.merge_status:type_name -> org.apache.beam.model.pipeline.v1.MergeStatus.Enum - 55, // 37: org.apache.beam.model.pipeline.v1.WindowingStrategy.trigger:type_name -> org.apache.beam.model.pipeline.v1.Trigger - 10, // 38: org.apache.beam.model.pipeline.v1.WindowingStrategy.accumulation_mode:type_name -> org.apache.beam.model.pipeline.v1.AccumulationMode.Enum - 13, // 39: org.apache.beam.model.pipeline.v1.WindowingStrategy.output_time:type_name -> org.apache.beam.model.pipeline.v1.OutputTime.Enum - 11, // 40: org.apache.beam.model.pipeline.v1.WindowingStrategy.closing_behavior:type_name -> org.apache.beam.model.pipeline.v1.ClosingBehavior.Enum - 12, // 41: org.apache.beam.model.pipeline.v1.WindowingStrategy.OnTimeBehavior:type_name -> 
org.apache.beam.model.pipeline.v1.OnTimeBehavior.Enum - 96, // 42: org.apache.beam.model.pipeline.v1.Trigger.after_all:type_name -> org.apache.beam.model.pipeline.v1.Trigger.AfterAll - 97, // 43: org.apache.beam.model.pipeline.v1.Trigger.after_any:type_name -> org.apache.beam.model.pipeline.v1.Trigger.AfterAny - 98, // 44: org.apache.beam.model.pipeline.v1.Trigger.after_each:type_name -> org.apache.beam.model.pipeline.v1.Trigger.AfterEach - 99, // 45: org.apache.beam.model.pipeline.v1.Trigger.after_end_of_window:type_name -> org.apache.beam.model.pipeline.v1.Trigger.AfterEndOfWindow - 100, // 46: org.apache.beam.model.pipeline.v1.Trigger.after_processing_time:type_name -> org.apache.beam.model.pipeline.v1.Trigger.AfterProcessingTime - 101, // 47: org.apache.beam.model.pipeline.v1.Trigger.after_synchronized_processing_time:type_name -> org.apache.beam.model.pipeline.v1.Trigger.AfterSynchronizedProcessingTime - 105, // 48: org.apache.beam.model.pipeline.v1.Trigger.always:type_name -> org.apache.beam.model.pipeline.v1.Trigger.Always - 102, // 49: org.apache.beam.model.pipeline.v1.Trigger.default:type_name -> org.apache.beam.model.pipeline.v1.Trigger.Default - 103, // 50: org.apache.beam.model.pipeline.v1.Trigger.element_count:type_name -> org.apache.beam.model.pipeline.v1.Trigger.ElementCount - 104, // 51: org.apache.beam.model.pipeline.v1.Trigger.never:type_name -> org.apache.beam.model.pipeline.v1.Trigger.Never - 106, // 52: org.apache.beam.model.pipeline.v1.Trigger.or_finally:type_name -> org.apache.beam.model.pipeline.v1.Trigger.OrFinally - 107, // 53: org.apache.beam.model.pipeline.v1.Trigger.repeat:type_name -> org.apache.beam.model.pipeline.v1.Trigger.Repeat - 108, // 54: org.apache.beam.model.pipeline.v1.TimestampTransform.delay:type_name -> org.apache.beam.model.pipeline.v1.TimestampTransform.Delay - 109, // 55: org.apache.beam.model.pipeline.v1.TimestampTransform.align_to:type_name -> org.apache.beam.model.pipeline.v1.TimestampTransform.AlignTo - 74, // 56: org.apache.beam.model.pipeline.v1.SideInput.access_pattern:type_name -> org.apache.beam.model.pipeline.v1.FunctionSpec - 74, // 57: org.apache.beam.model.pipeline.v1.SideInput.view_fn:type_name -> org.apache.beam.model.pipeline.v1.FunctionSpec - 74, // 58: org.apache.beam.model.pipeline.v1.SideInput.window_mapping_fn:type_name -> org.apache.beam.model.pipeline.v1.FunctionSpec - 77, // 59: org.apache.beam.model.pipeline.v1.Environment.display_data:type_name -> org.apache.beam.model.pipeline.v1.DisplayData - 66, // 60: org.apache.beam.model.pipeline.v1.Environment.dependencies:type_name -> org.apache.beam.model.pipeline.v1.ArtifactInformation - 110, // 61: org.apache.beam.model.pipeline.v1.ProcessPayload.env:type_name -> org.apache.beam.model.pipeline.v1.ProcessPayload.EnvEntry - 117, // 62: org.apache.beam.model.pipeline.v1.ExternalPayload.endpoint:type_name -> org.apache.beam.model.pipeline.v1.ApiServiceDescriptor - 111, // 63: org.apache.beam.model.pipeline.v1.ExternalPayload.params:type_name -> org.apache.beam.model.pipeline.v1.ExternalPayload.ParamsEntry - 22, // 64: org.apache.beam.model.pipeline.v1.MessageWithComponents.components:type_name -> org.apache.beam.model.pipeline.v1.Components - 46, // 65: org.apache.beam.model.pipeline.v1.MessageWithComponents.coder:type_name -> org.apache.beam.model.pipeline.v1.Coder - 40, // 66: org.apache.beam.model.pipeline.v1.MessageWithComponents.combine_payload:type_name -> org.apache.beam.model.pipeline.v1.CombinePayload - 74, // 67: 
org.apache.beam.model.pipeline.v1.MessageWithComponents.function_spec:type_name -> org.apache.beam.model.pipeline.v1.FunctionSpec - 28, // 68: org.apache.beam.model.pipeline.v1.MessageWithComponents.par_do_payload:type_name -> org.apache.beam.model.pipeline.v1.ParDoPayload - 24, // 69: org.apache.beam.model.pipeline.v1.MessageWithComponents.ptransform:type_name -> org.apache.beam.model.pipeline.v1.PTransform - 27, // 70: org.apache.beam.model.pipeline.v1.MessageWithComponents.pcollection:type_name -> org.apache.beam.model.pipeline.v1.PCollection - 38, // 71: org.apache.beam.model.pipeline.v1.MessageWithComponents.read_payload:type_name -> org.apache.beam.model.pipeline.v1.ReadPayload - 57, // 72: org.apache.beam.model.pipeline.v1.MessageWithComponents.side_input:type_name -> org.apache.beam.model.pipeline.v1.SideInput - 39, // 73: org.apache.beam.model.pipeline.v1.MessageWithComponents.window_into_payload:type_name -> org.apache.beam.model.pipeline.v1.WindowIntoPayload - 48, // 74: org.apache.beam.model.pipeline.v1.MessageWithComponents.windowing_strategy:type_name -> org.apache.beam.model.pipeline.v1.WindowingStrategy - 67, // 75: org.apache.beam.model.pipeline.v1.ExecutableStagePayload.environment:type_name -> org.apache.beam.model.pipeline.v1.Environment - 116, // 76: org.apache.beam.model.pipeline.v1.ExecutableStagePayload.wire_coder_settings:type_name -> org.apache.beam.model.pipeline.v1.ExecutableStagePayload.WireCoderSetting - 112, // 77: org.apache.beam.model.pipeline.v1.ExecutableStagePayload.side_inputs:type_name -> org.apache.beam.model.pipeline.v1.ExecutableStagePayload.SideInputId - 22, // 78: org.apache.beam.model.pipeline.v1.ExecutableStagePayload.components:type_name -> org.apache.beam.model.pipeline.v1.Components - 113, // 79: org.apache.beam.model.pipeline.v1.ExecutableStagePayload.user_states:type_name -> org.apache.beam.model.pipeline.v1.ExecutableStagePayload.UserStateId - 114, // 80: org.apache.beam.model.pipeline.v1.ExecutableStagePayload.timers:type_name -> org.apache.beam.model.pipeline.v1.ExecutableStagePayload.TimerId - 115, // 81: org.apache.beam.model.pipeline.v1.ExecutableStagePayload.timerFamilies:type_name -> org.apache.beam.model.pipeline.v1.ExecutableStagePayload.TimerFamilyId - 24, // 82: org.apache.beam.model.pipeline.v1.Components.TransformsEntry.value:type_name -> org.apache.beam.model.pipeline.v1.PTransform - 27, // 83: org.apache.beam.model.pipeline.v1.Components.PcollectionsEntry.value:type_name -> org.apache.beam.model.pipeline.v1.PCollection - 48, // 84: org.apache.beam.model.pipeline.v1.Components.WindowingStrategiesEntry.value:type_name -> org.apache.beam.model.pipeline.v1.WindowingStrategy - 46, // 85: org.apache.beam.model.pipeline.v1.Components.CodersEntry.value:type_name -> org.apache.beam.model.pipeline.v1.Coder - 67, // 86: org.apache.beam.model.pipeline.v1.Components.EnvironmentsEntry.value:type_name -> org.apache.beam.model.pipeline.v1.Environment - 57, // 87: org.apache.beam.model.pipeline.v1.ParDoPayload.SideInputsEntry.value:type_name -> org.apache.beam.model.pipeline.v1.SideInput - 29, // 88: org.apache.beam.model.pipeline.v1.ParDoPayload.StateSpecsEntry.value:type_name -> org.apache.beam.model.pipeline.v1.StateSpec - 36, // 89: org.apache.beam.model.pipeline.v1.ParDoPayload.TimerFamilySpecsEntry.value:type_name -> org.apache.beam.model.pipeline.v1.TimerFamilySpec - 92, // 90: org.apache.beam.model.pipeline.v1.TestStreamPayload.Event.watermark_event:type_name -> 
org.apache.beam.model.pipeline.v1.TestStreamPayload.Event.AdvanceWatermark - 93, // 91: org.apache.beam.model.pipeline.v1.TestStreamPayload.Event.processing_time_event:type_name -> org.apache.beam.model.pipeline.v1.TestStreamPayload.Event.AdvanceProcessingTime - 94, // 92: org.apache.beam.model.pipeline.v1.TestStreamPayload.Event.element_event:type_name -> org.apache.beam.model.pipeline.v1.TestStreamPayload.Event.AddElements - 91, // 93: org.apache.beam.model.pipeline.v1.TestStreamPayload.Event.AddElements.elements:type_name -> org.apache.beam.model.pipeline.v1.TestStreamPayload.TimestampedElement - 57, // 94: org.apache.beam.model.pipeline.v1.WriteFilesPayload.SideInputsEntry.value:type_name -> org.apache.beam.model.pipeline.v1.SideInput - 55, // 95: org.apache.beam.model.pipeline.v1.Trigger.AfterAll.subtriggers:type_name -> org.apache.beam.model.pipeline.v1.Trigger - 55, // 96: org.apache.beam.model.pipeline.v1.Trigger.AfterAny.subtriggers:type_name -> org.apache.beam.model.pipeline.v1.Trigger - 55, // 97: org.apache.beam.model.pipeline.v1.Trigger.AfterEach.subtriggers:type_name -> org.apache.beam.model.pipeline.v1.Trigger - 55, // 98: org.apache.beam.model.pipeline.v1.Trigger.AfterEndOfWindow.early_firings:type_name -> org.apache.beam.model.pipeline.v1.Trigger - 55, // 99: org.apache.beam.model.pipeline.v1.Trigger.AfterEndOfWindow.late_firings:type_name -> org.apache.beam.model.pipeline.v1.Trigger - 56, // 100: org.apache.beam.model.pipeline.v1.Trigger.AfterProcessingTime.timestamp_transforms:type_name -> org.apache.beam.model.pipeline.v1.TimestampTransform - 55, // 101: org.apache.beam.model.pipeline.v1.Trigger.OrFinally.main:type_name -> org.apache.beam.model.pipeline.v1.Trigger - 55, // 102: org.apache.beam.model.pipeline.v1.Trigger.OrFinally.finally:type_name -> org.apache.beam.model.pipeline.v1.Trigger - 55, // 103: org.apache.beam.model.pipeline.v1.Trigger.Repeat.subtrigger:type_name -> org.apache.beam.model.pipeline.v1.Trigger - 114, // 104: org.apache.beam.model.pipeline.v1.ExecutableStagePayload.WireCoderSetting.timer:type_name -> org.apache.beam.model.pipeline.v1.ExecutableStagePayload.TimerId - 118, // 105: org.apache.beam.model.pipeline.v1.beam_urn:extendee -> google.protobuf.EnumValueOptions - 118, // 106: org.apache.beam.model.pipeline.v1.beam_constant:extendee -> google.protobuf.EnumValueOptions - 42, // 107: org.apache.beam.model.pipeline.v1.TestStreamService.Events:input_type -> org.apache.beam.model.pipeline.v1.EventsRequest - 90, // 108: org.apache.beam.model.pipeline.v1.TestStreamService.Events:output_type -> org.apache.beam.model.pipeline.v1.TestStreamPayload.Event - 108, // [108:109] is the sub-list for method output_type - 107, // [107:108] is the sub-list for method input_type - 107, // [107:107] is the sub-list for extension type_name - 105, // [105:107] is the sub-list for extension extendee - 0, // [0:105] is the sub-list for field type_name + 86, // 0: org.apache.beam.model.pipeline.v1.Components.transforms:type_name -> org.apache.beam.model.pipeline.v1.Components.TransformsEntry + 87, // 1: org.apache.beam.model.pipeline.v1.Components.pcollections:type_name -> org.apache.beam.model.pipeline.v1.Components.PcollectionsEntry + 88, // 2: org.apache.beam.model.pipeline.v1.Components.windowing_strategies:type_name -> org.apache.beam.model.pipeline.v1.Components.WindowingStrategiesEntry + 89, // 3: org.apache.beam.model.pipeline.v1.Components.coders:type_name -> org.apache.beam.model.pipeline.v1.Components.CodersEntry + 90, // 4: 
org.apache.beam.model.pipeline.v1.Components.environments:type_name -> org.apache.beam.model.pipeline.v1.Components.EnvironmentsEntry + 25, // 5: org.apache.beam.model.pipeline.v1.Pipeline.components:type_name -> org.apache.beam.model.pipeline.v1.Components + 82, // 6: org.apache.beam.model.pipeline.v1.Pipeline.display_data:type_name -> org.apache.beam.model.pipeline.v1.DisplayData + 79, // 7: org.apache.beam.model.pipeline.v1.PTransform.spec:type_name -> org.apache.beam.model.pipeline.v1.FunctionSpec + 91, // 8: org.apache.beam.model.pipeline.v1.PTransform.inputs:type_name -> org.apache.beam.model.pipeline.v1.PTransform.InputsEntry + 92, // 9: org.apache.beam.model.pipeline.v1.PTransform.outputs:type_name -> org.apache.beam.model.pipeline.v1.PTransform.OutputsEntry + 82, // 10: org.apache.beam.model.pipeline.v1.PTransform.display_data:type_name -> org.apache.beam.model.pipeline.v1.DisplayData + 93, // 11: org.apache.beam.model.pipeline.v1.PTransform.annotations:type_name -> org.apache.beam.model.pipeline.v1.PTransform.AnnotationsEntry + 8, // 12: org.apache.beam.model.pipeline.v1.PCollection.is_bounded:type_name -> org.apache.beam.model.pipeline.v1.IsBounded.Enum + 82, // 13: org.apache.beam.model.pipeline.v1.PCollection.display_data:type_name -> org.apache.beam.model.pipeline.v1.DisplayData + 79, // 14: org.apache.beam.model.pipeline.v1.ParDoPayload.do_fn:type_name -> org.apache.beam.model.pipeline.v1.FunctionSpec + 94, // 15: org.apache.beam.model.pipeline.v1.ParDoPayload.side_inputs:type_name -> org.apache.beam.model.pipeline.v1.ParDoPayload.SideInputsEntry + 95, // 16: org.apache.beam.model.pipeline.v1.ParDoPayload.state_specs:type_name -> org.apache.beam.model.pipeline.v1.ParDoPayload.StateSpecsEntry + 96, // 17: org.apache.beam.model.pipeline.v1.ParDoPayload.timer_family_specs:type_name -> org.apache.beam.model.pipeline.v1.ParDoPayload.TimerFamilySpecsEntry + 33, // 18: org.apache.beam.model.pipeline.v1.StateSpec.read_modify_write_spec:type_name -> org.apache.beam.model.pipeline.v1.ReadModifyWriteStateSpec + 34, // 19: org.apache.beam.model.pipeline.v1.StateSpec.bag_spec:type_name -> org.apache.beam.model.pipeline.v1.BagStateSpec + 36, // 20: org.apache.beam.model.pipeline.v1.StateSpec.combining_spec:type_name -> org.apache.beam.model.pipeline.v1.CombiningStateSpec + 37, // 21: org.apache.beam.model.pipeline.v1.StateSpec.map_spec:type_name -> org.apache.beam.model.pipeline.v1.MapStateSpec + 38, // 22: org.apache.beam.model.pipeline.v1.StateSpec.set_spec:type_name -> org.apache.beam.model.pipeline.v1.SetStateSpec + 35, // 23: org.apache.beam.model.pipeline.v1.StateSpec.ordered_list_spec:type_name -> org.apache.beam.model.pipeline.v1.OrderedListStateSpec + 79, // 24: org.apache.beam.model.pipeline.v1.CombiningStateSpec.combine_fn:type_name -> org.apache.beam.model.pipeline.v1.FunctionSpec + 15, // 25: org.apache.beam.model.pipeline.v1.TimerFamilySpec.time_domain:type_name -> org.apache.beam.model.pipeline.v1.TimeDomain.Enum + 79, // 26: org.apache.beam.model.pipeline.v1.ReadPayload.source:type_name -> org.apache.beam.model.pipeline.v1.FunctionSpec + 8, // 27: org.apache.beam.model.pipeline.v1.ReadPayload.is_bounded:type_name -> org.apache.beam.model.pipeline.v1.IsBounded.Enum + 79, // 28: org.apache.beam.model.pipeline.v1.WindowIntoPayload.window_fn:type_name -> org.apache.beam.model.pipeline.v1.FunctionSpec + 79, // 29: org.apache.beam.model.pipeline.v1.CombinePayload.combine_fn:type_name -> org.apache.beam.model.pipeline.v1.FunctionSpec + 97, // 30: 
org.apache.beam.model.pipeline.v1.TestStreamPayload.events:type_name -> org.apache.beam.model.pipeline.v1.TestStreamPayload.Event + 125, // 31: org.apache.beam.model.pipeline.v1.TestStreamPayload.endpoint:type_name -> org.apache.beam.model.pipeline.v1.ApiServiceDescriptor + 79, // 32: org.apache.beam.model.pipeline.v1.WriteFilesPayload.sink:type_name -> org.apache.beam.model.pipeline.v1.FunctionSpec + 79, // 33: org.apache.beam.model.pipeline.v1.WriteFilesPayload.format_function:type_name -> org.apache.beam.model.pipeline.v1.FunctionSpec + 102, // 34: org.apache.beam.model.pipeline.v1.WriteFilesPayload.side_inputs:type_name -> org.apache.beam.model.pipeline.v1.WriteFilesPayload.SideInputsEntry + 79, // 35: org.apache.beam.model.pipeline.v1.Coder.spec:type_name -> org.apache.beam.model.pipeline.v1.FunctionSpec + 79, // 36: org.apache.beam.model.pipeline.v1.WindowingStrategy.window_fn:type_name -> org.apache.beam.model.pipeline.v1.FunctionSpec + 10, // 37: org.apache.beam.model.pipeline.v1.WindowingStrategy.merge_status:type_name -> org.apache.beam.model.pipeline.v1.MergeStatus.Enum + 59, // 38: org.apache.beam.model.pipeline.v1.WindowingStrategy.trigger:type_name -> org.apache.beam.model.pipeline.v1.Trigger + 11, // 39: org.apache.beam.model.pipeline.v1.WindowingStrategy.accumulation_mode:type_name -> org.apache.beam.model.pipeline.v1.AccumulationMode.Enum + 14, // 40: org.apache.beam.model.pipeline.v1.WindowingStrategy.output_time:type_name -> org.apache.beam.model.pipeline.v1.OutputTime.Enum + 12, // 41: org.apache.beam.model.pipeline.v1.WindowingStrategy.closing_behavior:type_name -> org.apache.beam.model.pipeline.v1.ClosingBehavior.Enum + 13, // 42: org.apache.beam.model.pipeline.v1.WindowingStrategy.OnTimeBehavior:type_name -> org.apache.beam.model.pipeline.v1.OnTimeBehavior.Enum + 103, // 43: org.apache.beam.model.pipeline.v1.Trigger.after_all:type_name -> org.apache.beam.model.pipeline.v1.Trigger.AfterAll + 104, // 44: org.apache.beam.model.pipeline.v1.Trigger.after_any:type_name -> org.apache.beam.model.pipeline.v1.Trigger.AfterAny + 105, // 45: org.apache.beam.model.pipeline.v1.Trigger.after_each:type_name -> org.apache.beam.model.pipeline.v1.Trigger.AfterEach + 106, // 46: org.apache.beam.model.pipeline.v1.Trigger.after_end_of_window:type_name -> org.apache.beam.model.pipeline.v1.Trigger.AfterEndOfWindow + 107, // 47: org.apache.beam.model.pipeline.v1.Trigger.after_processing_time:type_name -> org.apache.beam.model.pipeline.v1.Trigger.AfterProcessingTime + 108, // 48: org.apache.beam.model.pipeline.v1.Trigger.after_synchronized_processing_time:type_name -> org.apache.beam.model.pipeline.v1.Trigger.AfterSynchronizedProcessingTime + 112, // 49: org.apache.beam.model.pipeline.v1.Trigger.always:type_name -> org.apache.beam.model.pipeline.v1.Trigger.Always + 109, // 50: org.apache.beam.model.pipeline.v1.Trigger.default:type_name -> org.apache.beam.model.pipeline.v1.Trigger.Default + 110, // 51: org.apache.beam.model.pipeline.v1.Trigger.element_count:type_name -> org.apache.beam.model.pipeline.v1.Trigger.ElementCount + 111, // 52: org.apache.beam.model.pipeline.v1.Trigger.never:type_name -> org.apache.beam.model.pipeline.v1.Trigger.Never + 113, // 53: org.apache.beam.model.pipeline.v1.Trigger.or_finally:type_name -> org.apache.beam.model.pipeline.v1.Trigger.OrFinally + 114, // 54: org.apache.beam.model.pipeline.v1.Trigger.repeat:type_name -> org.apache.beam.model.pipeline.v1.Trigger.Repeat + 115, // 55: org.apache.beam.model.pipeline.v1.TimestampTransform.delay:type_name -> 
org.apache.beam.model.pipeline.v1.TimestampTransform.Delay + 116, // 56: org.apache.beam.model.pipeline.v1.TimestampTransform.align_to:type_name -> org.apache.beam.model.pipeline.v1.TimestampTransform.AlignTo + 79, // 57: org.apache.beam.model.pipeline.v1.SideInput.access_pattern:type_name -> org.apache.beam.model.pipeline.v1.FunctionSpec + 79, // 58: org.apache.beam.model.pipeline.v1.SideInput.view_fn:type_name -> org.apache.beam.model.pipeline.v1.FunctionSpec + 79, // 59: org.apache.beam.model.pipeline.v1.SideInput.window_mapping_fn:type_name -> org.apache.beam.model.pipeline.v1.FunctionSpec + 82, // 60: org.apache.beam.model.pipeline.v1.Environment.display_data:type_name -> org.apache.beam.model.pipeline.v1.DisplayData + 70, // 61: org.apache.beam.model.pipeline.v1.Environment.dependencies:type_name -> org.apache.beam.model.pipeline.v1.ArtifactInformation + 117, // 62: org.apache.beam.model.pipeline.v1.Environment.resource_hints:type_name -> org.apache.beam.model.pipeline.v1.Environment.ResourceHintsEntry + 118, // 63: org.apache.beam.model.pipeline.v1.ProcessPayload.env:type_name -> org.apache.beam.model.pipeline.v1.ProcessPayload.EnvEntry + 125, // 64: org.apache.beam.model.pipeline.v1.ExternalPayload.endpoint:type_name -> org.apache.beam.model.pipeline.v1.ApiServiceDescriptor + 119, // 65: org.apache.beam.model.pipeline.v1.ExternalPayload.params:type_name -> org.apache.beam.model.pipeline.v1.ExternalPayload.ParamsEntry + 25, // 66: org.apache.beam.model.pipeline.v1.MessageWithComponents.components:type_name -> org.apache.beam.model.pipeline.v1.Components + 50, // 67: org.apache.beam.model.pipeline.v1.MessageWithComponents.coder:type_name -> org.apache.beam.model.pipeline.v1.Coder + 43, // 68: org.apache.beam.model.pipeline.v1.MessageWithComponents.combine_payload:type_name -> org.apache.beam.model.pipeline.v1.CombinePayload + 79, // 69: org.apache.beam.model.pipeline.v1.MessageWithComponents.function_spec:type_name -> org.apache.beam.model.pipeline.v1.FunctionSpec + 31, // 70: org.apache.beam.model.pipeline.v1.MessageWithComponents.par_do_payload:type_name -> org.apache.beam.model.pipeline.v1.ParDoPayload + 27, // 71: org.apache.beam.model.pipeline.v1.MessageWithComponents.ptransform:type_name -> org.apache.beam.model.pipeline.v1.PTransform + 30, // 72: org.apache.beam.model.pipeline.v1.MessageWithComponents.pcollection:type_name -> org.apache.beam.model.pipeline.v1.PCollection + 41, // 73: org.apache.beam.model.pipeline.v1.MessageWithComponents.read_payload:type_name -> org.apache.beam.model.pipeline.v1.ReadPayload + 61, // 74: org.apache.beam.model.pipeline.v1.MessageWithComponents.side_input:type_name -> org.apache.beam.model.pipeline.v1.SideInput + 42, // 75: org.apache.beam.model.pipeline.v1.MessageWithComponents.window_into_payload:type_name -> org.apache.beam.model.pipeline.v1.WindowIntoPayload + 52, // 76: org.apache.beam.model.pipeline.v1.MessageWithComponents.windowing_strategy:type_name -> org.apache.beam.model.pipeline.v1.WindowingStrategy + 71, // 77: org.apache.beam.model.pipeline.v1.ExecutableStagePayload.environment:type_name -> org.apache.beam.model.pipeline.v1.Environment + 124, // 78: org.apache.beam.model.pipeline.v1.ExecutableStagePayload.wire_coder_settings:type_name -> org.apache.beam.model.pipeline.v1.ExecutableStagePayload.WireCoderSetting + 120, // 79: org.apache.beam.model.pipeline.v1.ExecutableStagePayload.side_inputs:type_name -> org.apache.beam.model.pipeline.v1.ExecutableStagePayload.SideInputId + 25, // 80: 
org.apache.beam.model.pipeline.v1.ExecutableStagePayload.components:type_name -> org.apache.beam.model.pipeline.v1.Components + 121, // 81: org.apache.beam.model.pipeline.v1.ExecutableStagePayload.user_states:type_name -> org.apache.beam.model.pipeline.v1.ExecutableStagePayload.UserStateId + 122, // 82: org.apache.beam.model.pipeline.v1.ExecutableStagePayload.timers:type_name -> org.apache.beam.model.pipeline.v1.ExecutableStagePayload.TimerId + 123, // 83: org.apache.beam.model.pipeline.v1.ExecutableStagePayload.timerFamilies:type_name -> org.apache.beam.model.pipeline.v1.ExecutableStagePayload.TimerFamilyId + 27, // 84: org.apache.beam.model.pipeline.v1.Components.TransformsEntry.value:type_name -> org.apache.beam.model.pipeline.v1.PTransform + 30, // 85: org.apache.beam.model.pipeline.v1.Components.PcollectionsEntry.value:type_name -> org.apache.beam.model.pipeline.v1.PCollection + 52, // 86: org.apache.beam.model.pipeline.v1.Components.WindowingStrategiesEntry.value:type_name -> org.apache.beam.model.pipeline.v1.WindowingStrategy + 50, // 87: org.apache.beam.model.pipeline.v1.Components.CodersEntry.value:type_name -> org.apache.beam.model.pipeline.v1.Coder + 71, // 88: org.apache.beam.model.pipeline.v1.Components.EnvironmentsEntry.value:type_name -> org.apache.beam.model.pipeline.v1.Environment + 61, // 89: org.apache.beam.model.pipeline.v1.ParDoPayload.SideInputsEntry.value:type_name -> org.apache.beam.model.pipeline.v1.SideInput + 32, // 90: org.apache.beam.model.pipeline.v1.ParDoPayload.StateSpecsEntry.value:type_name -> org.apache.beam.model.pipeline.v1.StateSpec + 39, // 91: org.apache.beam.model.pipeline.v1.ParDoPayload.TimerFamilySpecsEntry.value:type_name -> org.apache.beam.model.pipeline.v1.TimerFamilySpec + 99, // 92: org.apache.beam.model.pipeline.v1.TestStreamPayload.Event.watermark_event:type_name -> org.apache.beam.model.pipeline.v1.TestStreamPayload.Event.AdvanceWatermark + 100, // 93: org.apache.beam.model.pipeline.v1.TestStreamPayload.Event.processing_time_event:type_name -> org.apache.beam.model.pipeline.v1.TestStreamPayload.Event.AdvanceProcessingTime + 101, // 94: org.apache.beam.model.pipeline.v1.TestStreamPayload.Event.element_event:type_name -> org.apache.beam.model.pipeline.v1.TestStreamPayload.Event.AddElements + 98, // 95: org.apache.beam.model.pipeline.v1.TestStreamPayload.Event.AddElements.elements:type_name -> org.apache.beam.model.pipeline.v1.TestStreamPayload.TimestampedElement + 61, // 96: org.apache.beam.model.pipeline.v1.WriteFilesPayload.SideInputsEntry.value:type_name -> org.apache.beam.model.pipeline.v1.SideInput + 59, // 97: org.apache.beam.model.pipeline.v1.Trigger.AfterAll.subtriggers:type_name -> org.apache.beam.model.pipeline.v1.Trigger + 59, // 98: org.apache.beam.model.pipeline.v1.Trigger.AfterAny.subtriggers:type_name -> org.apache.beam.model.pipeline.v1.Trigger + 59, // 99: org.apache.beam.model.pipeline.v1.Trigger.AfterEach.subtriggers:type_name -> org.apache.beam.model.pipeline.v1.Trigger + 59, // 100: org.apache.beam.model.pipeline.v1.Trigger.AfterEndOfWindow.early_firings:type_name -> org.apache.beam.model.pipeline.v1.Trigger + 59, // 101: org.apache.beam.model.pipeline.v1.Trigger.AfterEndOfWindow.late_firings:type_name -> org.apache.beam.model.pipeline.v1.Trigger + 60, // 102: org.apache.beam.model.pipeline.v1.Trigger.AfterProcessingTime.timestamp_transforms:type_name -> org.apache.beam.model.pipeline.v1.TimestampTransform + 59, // 103: org.apache.beam.model.pipeline.v1.Trigger.OrFinally.main:type_name -> 
org.apache.beam.model.pipeline.v1.Trigger + 59, // 104: org.apache.beam.model.pipeline.v1.Trigger.OrFinally.finally:type_name -> org.apache.beam.model.pipeline.v1.Trigger + 59, // 105: org.apache.beam.model.pipeline.v1.Trigger.Repeat.subtrigger:type_name -> org.apache.beam.model.pipeline.v1.Trigger + 122, // 106: org.apache.beam.model.pipeline.v1.ExecutableStagePayload.WireCoderSetting.timer:type_name -> org.apache.beam.model.pipeline.v1.ExecutableStagePayload.TimerId + 126, // 107: org.apache.beam.model.pipeline.v1.beam_urn:extendee -> google.protobuf.EnumValueOptions + 126, // 108: org.apache.beam.model.pipeline.v1.beam_constant:extendee -> google.protobuf.EnumValueOptions + 45, // 109: org.apache.beam.model.pipeline.v1.TestStreamService.Events:input_type -> org.apache.beam.model.pipeline.v1.EventsRequest + 97, // 110: org.apache.beam.model.pipeline.v1.TestStreamService.Events:output_type -> org.apache.beam.model.pipeline.v1.TestStreamPayload.Event + 110, // [110:111] is the sub-list for method output_type + 109, // [109:110] is the sub-list for method input_type + 109, // [109:109] is the sub-list for extension type_name + 107, // [107:109] is the sub-list for extension extendee + 0, // [0:107] is the sub-list for field type_name } func init() { file_beam_runner_api_proto_init() } @@ -8873,7 +9454,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[25].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*Coder); i { + switch v := v.(*GroupIntoBatchesPayload); i { case 0: return &v.state case 1: @@ -8885,7 +9466,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[26].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*StandardCoders); i { + switch v := v.(*Coder); i { case 0: return &v.state case 1: @@ -8897,7 +9478,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[27].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*WindowingStrategy); i { + switch v := v.(*StandardCoders); i { case 0: return &v.state case 1: @@ -8909,7 +9490,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[28].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*MergeStatus); i { + switch v := v.(*WindowingStrategy); i { case 0: return &v.state case 1: @@ -8921,7 +9502,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[29].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*AccumulationMode); i { + switch v := v.(*MergeStatus); i { case 0: return &v.state case 1: @@ -8933,7 +9514,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[30].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*ClosingBehavior); i { + switch v := v.(*AccumulationMode); i { case 0: return &v.state case 1: @@ -8945,7 +9526,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[31].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*OnTimeBehavior); i { + switch v := v.(*ClosingBehavior); i { case 0: return &v.state case 1: @@ -8957,7 +9538,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[32].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*OutputTime); i { + switch v := v.(*OnTimeBehavior); i { case 0: return &v.state case 1: @@ -8969,7 +9550,7 @@ func file_beam_runner_api_proto_init() { } } 
file_beam_runner_api_proto_msgTypes[33].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*TimeDomain); i { + switch v := v.(*OutputTime); i { case 0: return &v.state case 1: @@ -8981,7 +9562,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[34].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*Trigger); i { + switch v := v.(*TimeDomain); i { case 0: return &v.state case 1: @@ -8993,7 +9574,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[35].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*TimestampTransform); i { + switch v := v.(*Trigger); i { case 0: return &v.state case 1: @@ -9005,7 +9586,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[36].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*SideInput); i { + switch v := v.(*TimestampTransform); i { case 0: return &v.state case 1: @@ -9017,7 +9598,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[37].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*StandardArtifacts); i { + switch v := v.(*SideInput); i { case 0: return &v.state case 1: @@ -9029,7 +9610,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[38].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*ArtifactFilePayload); i { + switch v := v.(*StandardArtifacts); i { case 0: return &v.state case 1: @@ -9041,7 +9622,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[39].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*ArtifactUrlPayload); i { + switch v := v.(*ArtifactFilePayload); i { case 0: return &v.state case 1: @@ -9053,7 +9634,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[40].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*EmbeddedFilePayload); i { + switch v := v.(*ArtifactUrlPayload); i { case 0: return &v.state case 1: @@ -9065,7 +9646,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[41].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*PyPIPayload); i { + switch v := v.(*EmbeddedFilePayload); i { case 0: return &v.state case 1: @@ -9077,7 +9658,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[42].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*MavenPayload); i { + switch v := v.(*PyPIPayload); i { case 0: return &v.state case 1: @@ -9089,7 +9670,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[43].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*DeferredArtifactPayload); i { + switch v := v.(*MavenPayload); i { case 0: return &v.state case 1: @@ -9101,7 +9682,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[44].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*ArtifactStagingToRolePayload); i { + switch v := v.(*DeferredArtifactPayload); i { case 0: return &v.state case 1: @@ -9113,7 +9694,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[45].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*ArtifactInformation); i { + switch v := v.(*ArtifactStagingToRolePayload); i { case 0: return &v.state case 1: @@ -9125,7 +9706,7 @@ func file_beam_runner_api_proto_init() { } } 
file_beam_runner_api_proto_msgTypes[46].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*Environment); i { + switch v := v.(*ArtifactInformation); i { case 0: return &v.state case 1: @@ -9137,7 +9718,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[47].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*StandardEnvironments); i { + switch v := v.(*Environment); i { case 0: return &v.state case 1: @@ -9149,7 +9730,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[48].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*DockerPayload); i { + switch v := v.(*StandardEnvironments); i { case 0: return &v.state case 1: @@ -9161,7 +9742,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[49].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*ProcessPayload); i { + switch v := v.(*DockerPayload); i { case 0: return &v.state case 1: @@ -9173,7 +9754,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[50].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*ExternalPayload); i { + switch v := v.(*ProcessPayload); i { case 0: return &v.state case 1: @@ -9185,7 +9766,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[51].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*StandardProtocols); i { + switch v := v.(*ExternalPayload); i { case 0: return &v.state case 1: @@ -9197,7 +9778,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[52].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*StandardRequirements); i { + switch v := v.(*StandardProtocols); i { case 0: return &v.state case 1: @@ -9209,7 +9790,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[53].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*FunctionSpec); i { + switch v := v.(*StandardRunnerProtocols); i { case 0: return &v.state case 1: @@ -9221,7 +9802,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[54].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*StandardDisplayData); i { + switch v := v.(*StandardRequirements); i { case 0: return &v.state case 1: @@ -9233,7 +9814,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[55].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*LabelledStringPayload); i { + switch v := v.(*FunctionSpec); i { case 0: return &v.state case 1: @@ -9245,7 +9826,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[56].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*DisplayData); i { + switch v := v.(*StandardDisplayData); i { case 0: return &v.state case 1: @@ -9257,7 +9838,7 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[57].Exporter = func(v interface{}, i int) interface{} { - switch v := v.(*MessageWithComponents); i { + switch v := v.(*LabelledPayload); i { case 0: return &v.state case 1: @@ -9269,6 +9850,30 @@ func file_beam_runner_api_proto_init() { } } file_beam_runner_api_proto_msgTypes[58].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*DisplayData); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + 
file_beam_runner_api_proto_msgTypes[59].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*MessageWithComponents); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_beam_runner_api_proto_msgTypes[60].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*ExecutableStagePayload); i { case 0: return &v.state @@ -9280,7 +9885,19 @@ func file_beam_runner_api_proto_init() { return nil } } - file_beam_runner_api_proto_msgTypes[69].Exporter = func(v interface{}, i int) interface{} { + file_beam_runner_api_proto_msgTypes[61].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*StandardResourceHints); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_beam_runner_api_proto_msgTypes[73].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*TestStreamPayload_Event); i { case 0: return &v.state @@ -9292,7 +9909,7 @@ func file_beam_runner_api_proto_init() { return nil } } - file_beam_runner_api_proto_msgTypes[70].Exporter = func(v interface{}, i int) interface{} { + file_beam_runner_api_proto_msgTypes[74].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*TestStreamPayload_TimestampedElement); i { case 0: return &v.state @@ -9304,7 +9921,7 @@ func file_beam_runner_api_proto_init() { return nil } } - file_beam_runner_api_proto_msgTypes[71].Exporter = func(v interface{}, i int) interface{} { + file_beam_runner_api_proto_msgTypes[75].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*TestStreamPayload_Event_AdvanceWatermark); i { case 0: return &v.state @@ -9316,7 +9933,7 @@ func file_beam_runner_api_proto_init() { return nil } } - file_beam_runner_api_proto_msgTypes[72].Exporter = func(v interface{}, i int) interface{} { + file_beam_runner_api_proto_msgTypes[76].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*TestStreamPayload_Event_AdvanceProcessingTime); i { case 0: return &v.state @@ -9328,7 +9945,7 @@ func file_beam_runner_api_proto_init() { return nil } } - file_beam_runner_api_proto_msgTypes[73].Exporter = func(v interface{}, i int) interface{} { + file_beam_runner_api_proto_msgTypes[77].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*TestStreamPayload_Event_AddElements); i { case 0: return &v.state @@ -9340,7 +9957,7 @@ func file_beam_runner_api_proto_init() { return nil } } - file_beam_runner_api_proto_msgTypes[75].Exporter = func(v interface{}, i int) interface{} { + file_beam_runner_api_proto_msgTypes[79].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*Trigger_AfterAll); i { case 0: return &v.state @@ -9352,7 +9969,7 @@ func file_beam_runner_api_proto_init() { return nil } } - file_beam_runner_api_proto_msgTypes[76].Exporter = func(v interface{}, i int) interface{} { + file_beam_runner_api_proto_msgTypes[80].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*Trigger_AfterAny); i { case 0: return &v.state @@ -9364,7 +9981,7 @@ func file_beam_runner_api_proto_init() { return nil } } - file_beam_runner_api_proto_msgTypes[77].Exporter = func(v interface{}, i int) interface{} { + file_beam_runner_api_proto_msgTypes[81].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*Trigger_AfterEach); i { case 0: return &v.state @@ -9376,7 +9993,7 @@ func file_beam_runner_api_proto_init() { return nil } } - 
file_beam_runner_api_proto_msgTypes[78].Exporter = func(v interface{}, i int) interface{} { + file_beam_runner_api_proto_msgTypes[82].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*Trigger_AfterEndOfWindow); i { case 0: return &v.state @@ -9388,7 +10005,7 @@ func file_beam_runner_api_proto_init() { return nil } } - file_beam_runner_api_proto_msgTypes[79].Exporter = func(v interface{}, i int) interface{} { + file_beam_runner_api_proto_msgTypes[83].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*Trigger_AfterProcessingTime); i { case 0: return &v.state @@ -9400,7 +10017,7 @@ func file_beam_runner_api_proto_init() { return nil } } - file_beam_runner_api_proto_msgTypes[80].Exporter = func(v interface{}, i int) interface{} { + file_beam_runner_api_proto_msgTypes[84].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*Trigger_AfterSynchronizedProcessingTime); i { case 0: return &v.state @@ -9412,7 +10029,7 @@ func file_beam_runner_api_proto_init() { return nil } } - file_beam_runner_api_proto_msgTypes[81].Exporter = func(v interface{}, i int) interface{} { + file_beam_runner_api_proto_msgTypes[85].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*Trigger_Default); i { case 0: return &v.state @@ -9424,7 +10041,7 @@ func file_beam_runner_api_proto_init() { return nil } } - file_beam_runner_api_proto_msgTypes[82].Exporter = func(v interface{}, i int) interface{} { + file_beam_runner_api_proto_msgTypes[86].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*Trigger_ElementCount); i { case 0: return &v.state @@ -9436,7 +10053,7 @@ func file_beam_runner_api_proto_init() { return nil } } - file_beam_runner_api_proto_msgTypes[83].Exporter = func(v interface{}, i int) interface{} { + file_beam_runner_api_proto_msgTypes[87].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*Trigger_Never); i { case 0: return &v.state @@ -9448,7 +10065,7 @@ func file_beam_runner_api_proto_init() { return nil } } - file_beam_runner_api_proto_msgTypes[84].Exporter = func(v interface{}, i int) interface{} { + file_beam_runner_api_proto_msgTypes[88].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*Trigger_Always); i { case 0: return &v.state @@ -9460,7 +10077,7 @@ func file_beam_runner_api_proto_init() { return nil } } - file_beam_runner_api_proto_msgTypes[85].Exporter = func(v interface{}, i int) interface{} { + file_beam_runner_api_proto_msgTypes[89].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*Trigger_OrFinally); i { case 0: return &v.state @@ -9472,7 +10089,7 @@ func file_beam_runner_api_proto_init() { return nil } } - file_beam_runner_api_proto_msgTypes[86].Exporter = func(v interface{}, i int) interface{} { + file_beam_runner_api_proto_msgTypes[90].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*Trigger_Repeat); i { case 0: return &v.state @@ -9484,7 +10101,7 @@ func file_beam_runner_api_proto_init() { return nil } } - file_beam_runner_api_proto_msgTypes[87].Exporter = func(v interface{}, i int) interface{} { + file_beam_runner_api_proto_msgTypes[91].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*TimestampTransform_Delay); i { case 0: return &v.state @@ -9496,7 +10113,7 @@ func file_beam_runner_api_proto_init() { return nil } } - file_beam_runner_api_proto_msgTypes[88].Exporter = func(v interface{}, i int) interface{} { + file_beam_runner_api_proto_msgTypes[92].Exporter = func(v interface{}, i int) interface{} { switch v := 
v.(*TimestampTransform_AlignTo); i { case 0: return &v.state @@ -9508,7 +10125,7 @@ func file_beam_runner_api_proto_init() { return nil } } - file_beam_runner_api_proto_msgTypes[91].Exporter = func(v interface{}, i int) interface{} { + file_beam_runner_api_proto_msgTypes[96].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*ExecutableStagePayload_SideInputId); i { case 0: return &v.state @@ -9520,7 +10137,7 @@ func file_beam_runner_api_proto_init() { return nil } } - file_beam_runner_api_proto_msgTypes[92].Exporter = func(v interface{}, i int) interface{} { + file_beam_runner_api_proto_msgTypes[97].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*ExecutableStagePayload_UserStateId); i { case 0: return &v.state @@ -9532,7 +10149,7 @@ func file_beam_runner_api_proto_init() { return nil } } - file_beam_runner_api_proto_msgTypes[93].Exporter = func(v interface{}, i int) interface{} { + file_beam_runner_api_proto_msgTypes[98].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*ExecutableStagePayload_TimerId); i { case 0: return &v.state @@ -9544,7 +10161,7 @@ func file_beam_runner_api_proto_init() { return nil } } - file_beam_runner_api_proto_msgTypes[94].Exporter = func(v interface{}, i int) interface{} { + file_beam_runner_api_proto_msgTypes[99].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*ExecutableStagePayload_TimerFamilyId); i { case 0: return &v.state @@ -9556,7 +10173,7 @@ func file_beam_runner_api_proto_init() { return nil } } - file_beam_runner_api_proto_msgTypes[95].Exporter = func(v interface{}, i int) interface{} { + file_beam_runner_api_proto_msgTypes[100].Exporter = func(v interface{}, i int) interface{} { switch v := v.(*ExecutableStagePayload_WireCoderSetting); i { case 0: return &v.state @@ -9577,7 +10194,7 @@ func file_beam_runner_api_proto_init() { (*StateSpec_SetSpec)(nil), (*StateSpec_OrderedListSpec)(nil), } - file_beam_runner_api_proto_msgTypes[34].OneofWrappers = []interface{}{ + file_beam_runner_api_proto_msgTypes[35].OneofWrappers = []interface{}{ (*Trigger_AfterAll_)(nil), (*Trigger_AfterAny_)(nil), (*Trigger_AfterEach_)(nil), @@ -9591,11 +10208,16 @@ func file_beam_runner_api_proto_init() { (*Trigger_OrFinally_)(nil), (*Trigger_Repeat_)(nil), } - file_beam_runner_api_proto_msgTypes[35].OneofWrappers = []interface{}{ + file_beam_runner_api_proto_msgTypes[36].OneofWrappers = []interface{}{ (*TimestampTransform_Delay_)(nil), (*TimestampTransform_AlignTo_)(nil), } file_beam_runner_api_proto_msgTypes[57].OneofWrappers = []interface{}{ + (*LabelledPayload_StringValue)(nil), + (*LabelledPayload_BoolValue)(nil), + (*LabelledPayload_DoubleValue)(nil), + } + file_beam_runner_api_proto_msgTypes[59].OneofWrappers = []interface{}{ (*MessageWithComponents_Coder)(nil), (*MessageWithComponents_CombinePayload)(nil), (*MessageWithComponents_FunctionSpec)(nil), @@ -9607,12 +10229,12 @@ func file_beam_runner_api_proto_init() { (*MessageWithComponents_WindowIntoPayload)(nil), (*MessageWithComponents_WindowingStrategy)(nil), } - file_beam_runner_api_proto_msgTypes[69].OneofWrappers = []interface{}{ + file_beam_runner_api_proto_msgTypes[73].OneofWrappers = []interface{}{ (*TestStreamPayload_Event_WatermarkEvent)(nil), (*TestStreamPayload_Event_ProcessingTimeEvent)(nil), (*TestStreamPayload_Event_ElementEvent)(nil), } - file_beam_runner_api_proto_msgTypes[95].OneofWrappers = []interface{}{ + file_beam_runner_api_proto_msgTypes[100].OneofWrappers = []interface{}{ 
(*ExecutableStagePayload_WireCoderSetting_InputOrOutputId)(nil), (*ExecutableStagePayload_WireCoderSetting_Timer)(nil), } @@ -9621,8 +10243,8 @@ func file_beam_runner_api_proto_init() { File: protoimpl.DescBuilder{ GoPackagePath: reflect.TypeOf(x{}).PkgPath(), RawDescriptor: file_beam_runner_api_proto_rawDesc, - NumEnums: 21, - NumMessages: 96, + NumEnums: 24, + NumMessages: 101, NumExtensions: 2, NumServices: 1, }, @@ -9637,112 +10259,3 @@ func file_beam_runner_api_proto_init() { file_beam_runner_api_proto_goTypes = nil file_beam_runner_api_proto_depIdxs = nil } - -// Reference imports to suppress errors if they are not otherwise used. -var _ context.Context -var _ grpc.ClientConnInterface - -// This is a compile-time assertion to ensure that this generated file -// is compatible with the grpc package it is being compiled against. -const _ = grpc.SupportPackageIsVersion6 - -// TestStreamServiceClient is the client API for TestStreamService service. -// -// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://godoc.org/google.golang.org/grpc#ClientConn.NewStream. -type TestStreamServiceClient interface { - // A TestStream will request for events using this RPC. - Events(ctx context.Context, in *EventsRequest, opts ...grpc.CallOption) (TestStreamService_EventsClient, error) -} - -type testStreamServiceClient struct { - cc grpc.ClientConnInterface -} - -func NewTestStreamServiceClient(cc grpc.ClientConnInterface) TestStreamServiceClient { - return &testStreamServiceClient{cc} -} - -func (c *testStreamServiceClient) Events(ctx context.Context, in *EventsRequest, opts ...grpc.CallOption) (TestStreamService_EventsClient, error) { - stream, err := c.cc.NewStream(ctx, &_TestStreamService_serviceDesc.Streams[0], "/org.apache.beam.model.pipeline.v1.TestStreamService/Events", opts...) - if err != nil { - return nil, err - } - x := &testStreamServiceEventsClient{stream} - if err := x.ClientStream.SendMsg(in); err != nil { - return nil, err - } - if err := x.ClientStream.CloseSend(); err != nil { - return nil, err - } - return x, nil -} - -type TestStreamService_EventsClient interface { - Recv() (*TestStreamPayload_Event, error) - grpc.ClientStream -} - -type testStreamServiceEventsClient struct { - grpc.ClientStream -} - -func (x *testStreamServiceEventsClient) Recv() (*TestStreamPayload_Event, error) { - m := new(TestStreamPayload_Event) - if err := x.ClientStream.RecvMsg(m); err != nil { - return nil, err - } - return m, nil -} - -// TestStreamServiceServer is the server API for TestStreamService service. -type TestStreamServiceServer interface { - // A TestStream will request for events using this RPC. - Events(*EventsRequest, TestStreamService_EventsServer) error -} - -// UnimplementedTestStreamServiceServer can be embedded to have forward compatible implementations. 
-type UnimplementedTestStreamServiceServer struct { -} - -func (*UnimplementedTestStreamServiceServer) Events(*EventsRequest, TestStreamService_EventsServer) error { - return status.Errorf(codes.Unimplemented, "method Events not implemented") -} - -func RegisterTestStreamServiceServer(s *grpc.Server, srv TestStreamServiceServer) { - s.RegisterService(&_TestStreamService_serviceDesc, srv) -} - -func _TestStreamService_Events_Handler(srv interface{}, stream grpc.ServerStream) error { - m := new(EventsRequest) - if err := stream.RecvMsg(m); err != nil { - return err - } - return srv.(TestStreamServiceServer).Events(m, &testStreamServiceEventsServer{stream}) -} - -type TestStreamService_EventsServer interface { - Send(*TestStreamPayload_Event) error - grpc.ServerStream -} - -type testStreamServiceEventsServer struct { - grpc.ServerStream -} - -func (x *testStreamServiceEventsServer) Send(m *TestStreamPayload_Event) error { - return x.ServerStream.SendMsg(m) -} - -var _TestStreamService_serviceDesc = grpc.ServiceDesc{ - ServiceName: "org.apache.beam.model.pipeline.v1.TestStreamService", - HandlerType: (*TestStreamServiceServer)(nil), - Methods: []grpc.MethodDesc{}, - Streams: []grpc.StreamDesc{ - { - StreamName: "Events", - Handler: _TestStreamService_Events_Handler, - ServerStreams: true, - }, - }, - Metadata: "beam_runner_api.proto", -} diff --git a/sdks/go/pkg/beam/model/pipeline_v1/beam_runner_api_grpc.pb.go b/sdks/go/pkg/beam/model/pipeline_v1/beam_runner_api_grpc.pb.go new file mode 100644 index 000000000000..a3051c9c717f --- /dev/null +++ b/sdks/go/pkg/beam/model/pipeline_v1/beam_runner_api_grpc.pb.go @@ -0,0 +1,147 @@ +// +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +// Code generated by protoc-gen-go-grpc. DO NOT EDIT. + +package pipeline_v1 + +import ( + context "context" + grpc "google.golang.org/grpc" + codes "google.golang.org/grpc/codes" + status "google.golang.org/grpc/status" +) + +// This is a compile-time assertion to ensure that this generated file +// is compatible with the grpc package it is being compiled against. +// Requires gRPC-Go v1.32.0 or later. +const _ = grpc.SupportPackageIsVersion7 + +// TestStreamServiceClient is the client API for TestStreamService service. +// +// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://pkg.go.dev/google.golang.org/grpc/?tab=doc#ClientConn.NewStream. +type TestStreamServiceClient interface { + // A TestStream will request for events using this RPC. 
+ Events(ctx context.Context, in *EventsRequest, opts ...grpc.CallOption) (TestStreamService_EventsClient, error) +} + +type testStreamServiceClient struct { + cc grpc.ClientConnInterface +} + +func NewTestStreamServiceClient(cc grpc.ClientConnInterface) TestStreamServiceClient { + return &testStreamServiceClient{cc} +} + +func (c *testStreamServiceClient) Events(ctx context.Context, in *EventsRequest, opts ...grpc.CallOption) (TestStreamService_EventsClient, error) { + stream, err := c.cc.NewStream(ctx, &TestStreamService_ServiceDesc.Streams[0], "/org.apache.beam.model.pipeline.v1.TestStreamService/Events", opts...) + if err != nil { + return nil, err + } + x := &testStreamServiceEventsClient{stream} + if err := x.ClientStream.SendMsg(in); err != nil { + return nil, err + } + if err := x.ClientStream.CloseSend(); err != nil { + return nil, err + } + return x, nil +} + +type TestStreamService_EventsClient interface { + Recv() (*TestStreamPayload_Event, error) + grpc.ClientStream +} + +type testStreamServiceEventsClient struct { + grpc.ClientStream +} + +func (x *testStreamServiceEventsClient) Recv() (*TestStreamPayload_Event, error) { + m := new(TestStreamPayload_Event) + if err := x.ClientStream.RecvMsg(m); err != nil { + return nil, err + } + return m, nil +} + +// TestStreamServiceServer is the server API for TestStreamService service. +// All implementations must embed UnimplementedTestStreamServiceServer +// for forward compatibility +type TestStreamServiceServer interface { + // A TestStream will request for events using this RPC. + Events(*EventsRequest, TestStreamService_EventsServer) error + mustEmbedUnimplementedTestStreamServiceServer() +} + +// UnimplementedTestStreamServiceServer must be embedded to have forward compatible implementations. +type UnimplementedTestStreamServiceServer struct { +} + +func (UnimplementedTestStreamServiceServer) Events(*EventsRequest, TestStreamService_EventsServer) error { + return status.Errorf(codes.Unimplemented, "method Events not implemented") +} +func (UnimplementedTestStreamServiceServer) mustEmbedUnimplementedTestStreamServiceServer() {} + +// UnsafeTestStreamServiceServer may be embedded to opt out of forward compatibility for this service. +// Use of this interface is not recommended, as added methods to TestStreamServiceServer will +// result in compilation errors. +type UnsafeTestStreamServiceServer interface { + mustEmbedUnimplementedTestStreamServiceServer() +} + +func RegisterTestStreamServiceServer(s grpc.ServiceRegistrar, srv TestStreamServiceServer) { + s.RegisterService(&TestStreamService_ServiceDesc, srv) +} + +func _TestStreamService_Events_Handler(srv interface{}, stream grpc.ServerStream) error { + m := new(EventsRequest) + if err := stream.RecvMsg(m); err != nil { + return err + } + return srv.(TestStreamServiceServer).Events(m, &testStreamServiceEventsServer{stream}) +} + +type TestStreamService_EventsServer interface { + Send(*TestStreamPayload_Event) error + grpc.ServerStream +} + +type testStreamServiceEventsServer struct { + grpc.ServerStream +} + +func (x *testStreamServiceEventsServer) Send(m *TestStreamPayload_Event) error { + return x.ServerStream.SendMsg(m) +} + +// TestStreamService_ServiceDesc is the grpc.ServiceDesc for TestStreamService service. 
+// It's only intended for direct use with grpc.RegisterService, +// and not to be introspected or modified (even as a copy) +var TestStreamService_ServiceDesc = grpc.ServiceDesc{ + ServiceName: "org.apache.beam.model.pipeline.v1.TestStreamService", + HandlerType: (*TestStreamServiceServer)(nil), + Methods: []grpc.MethodDesc{}, + Streams: []grpc.StreamDesc{ + { + StreamName: "Events", + Handler: _TestStreamService_Events_Handler, + ServerStreams: true, + }, + }, + Metadata: "beam_runner_api.proto", +} diff --git a/sdks/go/pkg/beam/model/pipeline_v1/endpoints.pb.go b/sdks/go/pkg/beam/model/pipeline_v1/endpoints.pb.go index af1a0ddebc19..6245c617457d 100644 --- a/sdks/go/pkg/beam/model/pipeline_v1/endpoints.pb.go +++ b/sdks/go/pkg/beam/model/pipeline_v1/endpoints.pb.go @@ -20,8 +20,8 @@ // Code generated by protoc-gen-go. DO NOT EDIT. // versions: -// protoc-gen-go v1.25.0-devel -// protoc v3.13.0 +// protoc-gen-go v1.27.1 +// protoc v3.17.3 // source: endpoints.proto package pipeline_v1 @@ -180,14 +180,15 @@ var file_endpoints_proto_rawDesc = []byte{ 0x53, 0x70, 0x65, 0x63, 0x12, 0x10, 0x0a, 0x03, 0x75, 0x72, 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x75, 0x72, 0x6e, 0x12, 0x18, 0x0a, 0x07, 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x07, 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, - 0x42, 0x75, 0x0a, 0x21, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, + 0x42, 0x78, 0x0a, 0x21, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x42, 0x09, 0x45, 0x6e, 0x64, 0x70, 0x6f, 0x69, 0x6e, 0x74, 0x73, - 0x5a, 0x45, 0x67, 0x69, 0x74, 0x68, 0x75, 0x62, 0x2e, 0x63, 0x6f, 0x6d, 0x2f, 0x61, 0x70, 0x61, - 0x63, 0x68, 0x65, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x73, 0x64, 0x6b, 0x73, 0x2f, 0x67, 0x6f, - 0x2f, 0x70, 0x6b, 0x67, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2f, - 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x5f, 0x76, 0x31, 0x3b, 0x70, 0x69, 0x70, 0x65, - 0x6c, 0x69, 0x6e, 0x65, 0x5f, 0x76, 0x31, 0x62, 0x06, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x33, + 0x5a, 0x48, 0x67, 0x69, 0x74, 0x68, 0x75, 0x62, 0x2e, 0x63, 0x6f, 0x6d, 0x2f, 0x61, 0x70, 0x61, + 0x63, 0x68, 0x65, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x73, 0x64, 0x6b, 0x73, 0x2f, 0x76, 0x32, + 0x2f, 0x67, 0x6f, 0x2f, 0x70, 0x6b, 0x67, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x6d, 0x6f, 0x64, + 0x65, 0x6c, 0x2f, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x5f, 0x76, 0x31, 0x3b, 0x70, + 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x5f, 0x76, 0x31, 0x62, 0x06, 0x70, 0x72, 0x6f, 0x74, + 0x6f, 0x33, } var ( diff --git a/sdks/go/pkg/beam/model/pipeline_v1/external_transforms.pb.go b/sdks/go/pkg/beam/model/pipeline_v1/external_transforms.pb.go index d27d0f9f1f4d..1050d066afbe 100644 --- a/sdks/go/pkg/beam/model/pipeline_v1/external_transforms.pb.go +++ b/sdks/go/pkg/beam/model/pipeline_v1/external_transforms.pb.go @@ -20,8 +20,8 @@ // Code generated by protoc-gen-go. DO NOT EDIT. 
// versions: -// protoc-gen-go v1.25.0-devel -// protoc v3.13.0 +// protoc-gen-go v1.27.1 +// protoc v3.17.3 // source: external_transforms.proto package pipeline_v1 @@ -115,16 +115,16 @@ var file_external_transforms_proto_rawDesc = []byte{ 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x53, 0x63, 0x68, 0x65, 0x6d, 0x61, 0x52, 0x06, 0x73, 0x63, 0x68, 0x65, 0x6d, 0x61, 0x12, 0x18, 0x0a, 0x07, 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0c, - 0x52, 0x07, 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x42, 0x7e, 0x0a, 0x21, 0x6f, 0x72, 0x67, - 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, - 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x42, 0x12, - 0x45, 0x78, 0x74, 0x65, 0x72, 0x6e, 0x61, 0x6c, 0x54, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, - 0x6d, 0x73, 0x5a, 0x45, 0x67, 0x69, 0x74, 0x68, 0x75, 0x62, 0x2e, 0x63, 0x6f, 0x6d, 0x2f, 0x61, - 0x70, 0x61, 0x63, 0x68, 0x65, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x73, 0x64, 0x6b, 0x73, 0x2f, - 0x67, 0x6f, 0x2f, 0x70, 0x6b, 0x67, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x6d, 0x6f, 0x64, 0x65, - 0x6c, 0x2f, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x5f, 0x76, 0x31, 0x3b, 0x70, 0x69, - 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x5f, 0x76, 0x31, 0x62, 0x06, 0x70, 0x72, 0x6f, 0x74, 0x6f, - 0x33, + 0x52, 0x07, 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x42, 0x81, 0x01, 0x0a, 0x21, 0x6f, 0x72, + 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, + 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x42, + 0x12, 0x45, 0x78, 0x74, 0x65, 0x72, 0x6e, 0x61, 0x6c, 0x54, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, + 0x72, 0x6d, 0x73, 0x5a, 0x48, 0x67, 0x69, 0x74, 0x68, 0x75, 0x62, 0x2e, 0x63, 0x6f, 0x6d, 0x2f, + 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x73, 0x64, 0x6b, 0x73, + 0x2f, 0x76, 0x32, 0x2f, 0x67, 0x6f, 0x2f, 0x70, 0x6b, 0x67, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, + 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2f, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x5f, 0x76, + 0x31, 0x3b, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x5f, 0x76, 0x31, 0x62, 0x06, 0x70, + 0x72, 0x6f, 0x74, 0x6f, 0x33, } var ( diff --git a/sdks/go/pkg/beam/model/pipeline_v1/metrics.pb.go b/sdks/go/pkg/beam/model/pipeline_v1/metrics.pb.go index 16d4fcb554db..cd05bfd575f5 100644 --- a/sdks/go/pkg/beam/model/pipeline_v1/metrics.pb.go +++ b/sdks/go/pkg/beam/model/pipeline_v1/metrics.pb.go @@ -20,16 +20,17 @@ // Code generated by protoc-gen-go. DO NOT EDIT. // versions: -// protoc-gen-go v1.25.0-devel -// protoc v3.13.0 +// protoc-gen-go v1.27.1 +// protoc v3.17.3 // source: metrics.proto package pipeline_v1 import ( - descriptor "github.com/golang/protobuf/protoc-gen-go/descriptor" protoreflect "google.golang.org/protobuf/reflect/protoreflect" protoimpl "google.golang.org/protobuf/runtime/protoimpl" + descriptorpb "google.golang.org/protobuf/types/descriptorpb" + timestamppb "google.golang.org/protobuf/types/known/timestamppb" reflect "reflect" sync "sync" ) @@ -94,6 +95,8 @@ const ( // For an SDK that processes items sequentially, this is equivalently the // number of items fully processed (or -1 if processing has not yet started). 
MonitoringInfoSpecs_DATA_CHANNEL_READ_INDEX MonitoringInfoSpecs_Enum = 18 + MonitoringInfoSpecs_API_REQUEST_COUNT MonitoringInfoSpecs_Enum = 19 + MonitoringInfoSpecs_API_REQUEST_LATENCIES MonitoringInfoSpecs_Enum = 20 ) // Enum value maps for MonitoringInfoSpecs_Enum. @@ -118,6 +121,8 @@ var ( 16: "WORK_REMAINING", 17: "WORK_COMPLETED", 18: "DATA_CHANNEL_READ_INDEX", + 19: "API_REQUEST_COUNT", + 20: "API_REQUEST_LATENCIES", } MonitoringInfoSpecs_Enum_value = map[string]int32{ "USER_SUM_INT64": 0, @@ -139,6 +144,8 @@ var ( "WORK_REMAINING": 16, "WORK_COMPLETED": 17, "DATA_CHANNEL_READ_INDEX": 18, + "API_REQUEST_COUNT": 19, + "API_REQUEST_LATENCIES": 20, } ) @@ -177,34 +184,67 @@ const ( // refer to them. For actively processed bundles, these should match the // values within the ProcessBundleDescriptor. For job management APIs, // these should match values within the original pipeline representation. - MonitoringInfo_TRANSFORM MonitoringInfo_MonitoringInfoLabels = 0 - MonitoringInfo_PCOLLECTION MonitoringInfo_MonitoringInfoLabels = 1 - MonitoringInfo_WINDOWING_STRATEGY MonitoringInfo_MonitoringInfoLabels = 2 - MonitoringInfo_CODER MonitoringInfo_MonitoringInfoLabels = 3 - MonitoringInfo_ENVIRONMENT MonitoringInfo_MonitoringInfoLabels = 4 - MonitoringInfo_NAMESPACE MonitoringInfo_MonitoringInfoLabels = 5 - MonitoringInfo_NAME MonitoringInfo_MonitoringInfoLabels = 6 + MonitoringInfo_TRANSFORM MonitoringInfo_MonitoringInfoLabels = 0 + MonitoringInfo_PCOLLECTION MonitoringInfo_MonitoringInfoLabels = 1 + MonitoringInfo_WINDOWING_STRATEGY MonitoringInfo_MonitoringInfoLabels = 2 + MonitoringInfo_CODER MonitoringInfo_MonitoringInfoLabels = 3 + MonitoringInfo_ENVIRONMENT MonitoringInfo_MonitoringInfoLabels = 4 + MonitoringInfo_NAMESPACE MonitoringInfo_MonitoringInfoLabels = 5 + MonitoringInfo_NAME MonitoringInfo_MonitoringInfoLabels = 6 + MonitoringInfo_SERVICE MonitoringInfo_MonitoringInfoLabels = 7 + MonitoringInfo_METHOD MonitoringInfo_MonitoringInfoLabels = 8 + MonitoringInfo_RESOURCE MonitoringInfo_MonitoringInfoLabels = 9 + MonitoringInfo_STATUS MonitoringInfo_MonitoringInfoLabels = 10 + MonitoringInfo_BIGQUERY_PROJECT_ID MonitoringInfo_MonitoringInfoLabels = 11 + MonitoringInfo_BIGQUERY_DATASET MonitoringInfo_MonitoringInfoLabels = 12 + MonitoringInfo_BIGQUERY_TABLE MonitoringInfo_MonitoringInfoLabels = 13 + MonitoringInfo_BIGQUERY_VIEW MonitoringInfo_MonitoringInfoLabels = 14 + MonitoringInfo_BIGQUERY_QUERY_NAME MonitoringInfo_MonitoringInfoLabels = 15 + MonitoringInfo_GCS_BUCKET MonitoringInfo_MonitoringInfoLabels = 16 + MonitoringInfo_GCS_PROJECT_ID MonitoringInfo_MonitoringInfoLabels = 17 ) // Enum value maps for MonitoringInfo_MonitoringInfoLabels. 
var ( MonitoringInfo_MonitoringInfoLabels_name = map[int32]string{ - 0: "TRANSFORM", - 1: "PCOLLECTION", - 2: "WINDOWING_STRATEGY", - 3: "CODER", - 4: "ENVIRONMENT", - 5: "NAMESPACE", - 6: "NAME", + 0: "TRANSFORM", + 1: "PCOLLECTION", + 2: "WINDOWING_STRATEGY", + 3: "CODER", + 4: "ENVIRONMENT", + 5: "NAMESPACE", + 6: "NAME", + 7: "SERVICE", + 8: "METHOD", + 9: "RESOURCE", + 10: "STATUS", + 11: "BIGQUERY_PROJECT_ID", + 12: "BIGQUERY_DATASET", + 13: "BIGQUERY_TABLE", + 14: "BIGQUERY_VIEW", + 15: "BIGQUERY_QUERY_NAME", + 16: "GCS_BUCKET", + 17: "GCS_PROJECT_ID", } MonitoringInfo_MonitoringInfoLabels_value = map[string]int32{ - "TRANSFORM": 0, - "PCOLLECTION": 1, - "WINDOWING_STRATEGY": 2, - "CODER": 3, - "ENVIRONMENT": 4, - "NAMESPACE": 5, - "NAME": 6, + "TRANSFORM": 0, + "PCOLLECTION": 1, + "WINDOWING_STRATEGY": 2, + "CODER": 3, + "ENVIRONMENT": 4, + "NAMESPACE": 5, + "NAME": 6, + "SERVICE": 7, + "METHOD": 8, + "RESOURCE": 9, + "STATUS": 10, + "BIGQUERY_PROJECT_ID": 11, + "BIGQUERY_DATASET": 12, + "BIGQUERY_TABLE": 13, + "BIGQUERY_VIEW": 14, + "BIGQUERY_QUERY_NAME": 15, + "GCS_BUCKET": 16, + "GCS_PROJECT_ID": 17, } ) @@ -634,6 +674,17 @@ type MonitoringInfo struct { // as Stackdriver will be able to aggregate the metrics using a subset of the // provided labels Labels map[string]string `protobuf:"bytes,4,rep,name=labels,proto3" json:"labels,omitempty" protobuf_key:"bytes,1,opt,name=key,proto3" protobuf_val:"bytes,2,opt,name=value,proto3"` + // This indicates the start of the time range over which this value was + // measured. + // This is needed by some external metric aggregation services + // to indicate when the reporter of the metric first began collecting the + // cumulative value for the timeseries. + // If the SDK Harness restarts, it should reset the start_time, and reset + // the collection of cumulative metrics (i.e. start to count again from 0). + // HarnessMonitoringInfos should set this start_time once, when the + // MonitoringInfo is first reported. + // ProcessBundle MonitoringInfos should set a start_time for each bundle. + StartTime *timestamppb.Timestamp `protobuf:"bytes,5,opt,name=start_time,json=startTime,proto3" json:"start_time,omitempty"` } func (x *MonitoringInfo) Reset() { @@ -696,6 +747,13 @@ func (x *MonitoringInfo) GetLabels() map[string]string { return nil } +func (x *MonitoringInfo) GetStartTime() *timestamppb.Timestamp { + if x != nil { + return x.StartTime + } + return nil +} + // A set of well known URNs that specify the encoding and aggregation method. type MonitoringInfoTypeUrns struct { state protoimpl.MessageState @@ -737,7 +795,7 @@ func (*MonitoringInfoTypeUrns) Descriptor() ([]byte, []int) { var file_metrics_proto_extTypes = []protoimpl.ExtensionInfo{ { - ExtendedType: (*descriptor.EnumValueOptions)(nil), + ExtendedType: (*descriptorpb.EnumValueOptions)(nil), ExtensionType: (*MonitoringInfoLabelProps)(nil), Field: 127337796, Name: "org.apache.beam.model.pipeline.v1.label_props", @@ -745,7 +803,7 @@ var file_metrics_proto_extTypes = []protoimpl.ExtensionInfo{ Filename: "metrics.proto", }, { - ExtendedType: (*descriptor.EnumValueOptions)(nil), + ExtendedType: (*descriptorpb.EnumValueOptions)(nil), ExtensionType: (*MonitoringInfoSpec)(nil), Field: 207174266, Name: "org.apache.beam.model.pipeline.v1.monitoring_info_spec", @@ -754,7 +812,7 @@ var file_metrics_proto_extTypes = []protoimpl.ExtensionInfo{ }, } -// Extension fields to descriptor.EnumValueOptions. +// Extension fields to descriptorpb.EnumValueOptions. 
var ( // optional org.apache.beam.model.pipeline.v1.MonitoringInfoLabelProps label_props = 127337796; E_LabelProps = &file_metrics_proto_extTypes[0] @@ -773,375 +831,444 @@ var file_metrics_proto_rawDesc = []byte{ 0x76, 0x31, 0x1a, 0x15, 0x62, 0x65, 0x61, 0x6d, 0x5f, 0x72, 0x75, 0x6e, 0x6e, 0x65, 0x72, 0x5f, 0x61, 0x70, 0x69, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x1a, 0x20, 0x67, 0x6f, 0x6f, 0x67, 0x6c, 0x65, 0x2f, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x62, 0x75, 0x66, 0x2f, 0x64, 0x65, 0x73, 0x63, 0x72, - 0x69, 0x70, 0x74, 0x6f, 0x72, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x22, 0xb4, 0x01, 0x0a, 0x12, - 0x4d, 0x6f, 0x6e, 0x69, 0x74, 0x6f, 0x72, 0x69, 0x6e, 0x67, 0x49, 0x6e, 0x66, 0x6f, 0x53, 0x70, - 0x65, 0x63, 0x12, 0x10, 0x0a, 0x03, 0x75, 0x72, 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, - 0x03, 0x75, 0x72, 0x6e, 0x12, 0x12, 0x0a, 0x04, 0x74, 0x79, 0x70, 0x65, 0x18, 0x02, 0x20, 0x01, - 0x28, 0x09, 0x52, 0x04, 0x74, 0x79, 0x70, 0x65, 0x12, 0x27, 0x0a, 0x0f, 0x72, 0x65, 0x71, 0x75, - 0x69, 0x72, 0x65, 0x64, 0x5f, 0x6c, 0x61, 0x62, 0x65, 0x6c, 0x73, 0x18, 0x03, 0x20, 0x03, 0x28, - 0x09, 0x52, 0x0e, 0x72, 0x65, 0x71, 0x75, 0x69, 0x72, 0x65, 0x64, 0x4c, 0x61, 0x62, 0x65, 0x6c, - 0x73, 0x12, 0x4f, 0x0a, 0x0b, 0x61, 0x6e, 0x6e, 0x6f, 0x74, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x73, - 0x18, 0x04, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x2d, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, - 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, - 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x41, 0x6e, 0x6e, 0x6f, 0x74, - 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x52, 0x0b, 0x61, 0x6e, 0x6e, 0x6f, 0x74, 0x61, 0x74, 0x69, 0x6f, - 0x6e, 0x73, 0x22, 0x34, 0x0a, 0x0a, 0x41, 0x6e, 0x6e, 0x6f, 0x74, 0x61, 0x74, 0x69, 0x6f, 0x6e, - 0x12, 0x10, 0x0a, 0x03, 0x6b, 0x65, 0x79, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x6b, - 0x65, 0x79, 0x12, 0x14, 0x0a, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, - 0x09, 0x52, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x22, 0x8e, 0x1f, 0x0a, 0x13, 0x4d, 0x6f, 0x6e, - 0x69, 0x74, 0x6f, 0x72, 0x69, 0x6e, 0x67, 0x49, 0x6e, 0x66, 0x6f, 0x53, 0x70, 0x65, 0x63, 0x73, - 0x22, 0xf6, 0x1e, 0x0a, 0x04, 0x45, 0x6e, 0x75, 0x6d, 0x12, 0xa7, 0x01, 0x0a, 0x0e, 0x55, 0x53, - 0x45, 0x52, 0x5f, 0x53, 0x55, 0x4d, 0x5f, 0x49, 0x4e, 0x54, 0x36, 0x34, 0x10, 0x00, 0x1a, 0x92, - 0x01, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0x8b, 0x01, 0x0a, 0x1d, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, - 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x75, 0x73, 0x65, 0x72, 0x3a, 0x73, 0x75, 0x6d, 0x5f, 0x69, - 0x6e, 0x74, 0x36, 0x34, 0x3a, 0x76, 0x31, 0x12, 0x19, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, - 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x73, 0x75, 0x6d, 0x5f, 0x69, 0x6e, 0x74, 0x36, 0x34, 0x3a, - 0x76, 0x31, 0x1a, 0x0a, 0x50, 0x54, 0x52, 0x41, 0x4e, 0x53, 0x46, 0x4f, 0x52, 0x4d, 0x1a, 0x09, - 0x4e, 0x41, 0x4d, 0x45, 0x53, 0x50, 0x41, 0x43, 0x45, 0x1a, 0x04, 0x4e, 0x41, 0x4d, 0x45, 0x22, - 0x32, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x23, - 0x55, 0x52, 0x4e, 0x20, 0x75, 0x74, 0x69, 0x6c, 0x69, 0x7a, 0x65, 0x64, 0x20, 0x74, 0x6f, 0x20, - 0x72, 0x65, 0x70, 0x6f, 0x72, 0x74, 0x20, 0x75, 0x73, 0x65, 0x72, 0x20, 0x6d, 0x65, 0x74, 0x72, - 0x69, 0x63, 0x2e, 0x12, 0xaa, 0x01, 0x0a, 0x0f, 0x55, 0x53, 0x45, 0x52, 0x5f, 0x53, 0x55, 0x4d, - 0x5f, 0x44, 0x4f, 0x55, 0x42, 0x4c, 0x45, 0x10, 0x01, 0x1a, 0x94, 0x01, 0xd2, 0xa7, 0xa7, 0x96, - 0x06, 0x8d, 0x01, 0x0a, 0x1e, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72,
0x69, 0x63, - 0x3a, 0x75, 0x73, 0x65, 0x72, 0x3a, 0x73, 0x75, 0x6d, 0x5f, 0x64, 0x6f, 0x75, 0x62, 0x6c, 0x65, - 0x3a, 0x76, 0x31, 0x12, 0x1a, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, - 0x73, 0x3a, 0x73, 0x75, 0x6d, 0x5f, 0x64, 0x6f, 0x75, 0x62, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x1a, - 0x0a, 0x50, 0x54, 0x52, 0x41, 0x4e, 0x53, 0x46, 0x4f, 0x52, 0x4d, 0x1a, 0x09, 0x4e, 0x41, 0x4d, - 0x45, 0x53, 0x50, 0x41, 0x43, 0x45, 0x1a, 0x04, 0x4e, 0x41, 0x4d, 0x45, 0x22, 0x32, 0x0a, 0x0b, - 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x23, 0x55, 0x52, 0x4e, - 0x20, 0x75, 0x74, 0x69, 0x6c, 0x69, 0x7a, 0x65, 0x64, 0x20, 0x74, 0x6f, 0x20, 0x72, 0x65, 0x70, - 0x6f, 0x72, 0x74, 0x20, 0x75, 0x73, 0x65, 0x72, 0x20, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x2e, - 0x12, 0xc2, 0x01, 0x0a, 0x17, 0x55, 0x53, 0x45, 0x52, 0x5f, 0x44, 0x49, 0x53, 0x54, 0x52, 0x49, - 0x42, 0x55, 0x54, 0x49, 0x4f, 0x4e, 0x5f, 0x49, 0x4e, 0x54, 0x36, 0x34, 0x10, 0x02, 0x1a, 0xa4, - 0x01, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0x9d, 0x01, 0x0a, 0x26, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, - 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x75, 0x73, 0x65, 0x72, 0x3a, 0x64, 0x69, 0x73, 0x74, 0x72, - 0x69, 0x62, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x69, 0x6e, 0x74, 0x36, 0x34, 0x3a, 0x76, 0x31, - 0x12, 0x22, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x64, - 0x69, 0x73, 0x74, 0x72, 0x69, 0x62, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x69, 0x6e, 0x74, 0x36, - 0x34, 0x3a, 0x76, 0x31, 0x1a, 0x0a, 0x50, 0x54, 0x52, 0x41, 0x4e, 0x53, 0x46, 0x4f, 0x52, 0x4d, - 0x1a, 0x09, 0x4e, 0x41, 0x4d, 0x45, 0x53, 0x50, 0x41, 0x43, 0x45, 0x1a, 0x04, 0x4e, 0x41, 0x4d, - 0x45, 0x22, 0x32, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, - 0x12, 0x23, 0x55, 0x52, 0x4e, 0x20, 0x75, 0x74, 0x69, 0x6c, 0x69, 0x7a, 0x65, 0x64, 0x20, 0x74, - 0x6f, 0x20, 0x72, 0x65, 0x70, 0x6f, 0x72, 0x74, 0x20, 0x75, 0x73, 0x65, 0x72, 0x20, 0x6d, 0x65, - 0x74, 0x72, 0x69, 0x63, 0x2e, 0x12, 0xc5, 0x01, 0x0a, 0x18, 0x55, 0x53, 0x45, 0x52, 0x5f, 0x44, - 0x49, 0x53, 0x54, 0x52, 0x49, 0x42, 0x55, 0x54, 0x49, 0x4f, 0x4e, 0x5f, 0x44, 0x4f, 0x55, 0x42, - 0x4c, 0x45, 0x10, 0x03, 0x1a, 0xa6, 0x01, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0x9f, 0x01, 0x0a, 0x27, - 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x75, 0x73, 0x65, 0x72, - 0x3a, 0x64, 0x69, 0x73, 0x74, 0x72, 0x69, 0x62, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x64, 0x6f, - 0x75, 0x62, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x23, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, - 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x64, 0x69, 0x73, 0x74, 0x72, 0x69, 0x62, 0x75, 0x74, 0x69, - 0x6f, 0x6e, 0x5f, 0x64, 0x6f, 0x75, 0x62, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x1a, 0x0a, 0x50, 0x54, - 0x52, 0x41, 0x4e, 0x53, 0x46, 0x4f, 0x52, 0x4d, 0x1a, 0x09, 0x4e, 0x41, 0x4d, 0x45, 0x53, 0x50, - 0x41, 0x43, 0x45, 0x1a, 0x04, 0x4e, 0x41, 0x4d, 0x45, 0x22, 0x32, 0x0a, 0x0b, 0x64, 0x65, 0x73, - 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x23, 0x55, 0x52, 0x4e, 0x20, 0x75, 0x74, - 0x69, 0x6c, 0x69, 0x7a, 0x65, 0x64, 0x20, 0x74, 0x6f, 0x20, 0x72, 0x65, 0x70, 0x6f, 0x72, 0x74, - 0x20, 0x75, 0x73, 0x65, 0x72, 0x20, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x2e, 0x12, 0xb0, 0x01, - 0x0a, 0x11, 0x55, 0x53, 0x45, 0x52, 0x5f, 0x4c, 0x41, 0x54, 0x45, 0x53, 0x54, 0x5f, 0x49, 0x4e, - 0x54, 0x36, 0x34, 0x10, 0x04, 0x1a, 0x98, 0x01, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0x91, 0x01, 0x0a, - 0x20, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x75, 0x73, 0x65, - 0x72, 0x3a, 
0x6c, 0x61, 0x74, 0x65, 0x73, 0x74, 0x5f, 0x69, 0x6e, 0x74, 0x36, 0x34, 0x3a, 0x76, - 0x31, 0x12, 0x1c, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, - 0x6c, 0x61, 0x74, 0x65, 0x73, 0x74, 0x5f, 0x69, 0x6e, 0x74, 0x36, 0x34, 0x3a, 0x76, 0x31, 0x1a, - 0x0a, 0x50, 0x54, 0x52, 0x41, 0x4e, 0x53, 0x46, 0x4f, 0x52, 0x4d, 0x1a, 0x09, 0x4e, 0x41, 0x4d, - 0x45, 0x53, 0x50, 0x41, 0x43, 0x45, 0x1a, 0x04, 0x4e, 0x41, 0x4d, 0x45, 0x22, 0x32, 0x0a, 0x0b, - 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x23, 0x55, 0x52, 0x4e, - 0x20, 0x75, 0x74, 0x69, 0x6c, 0x69, 0x7a, 0x65, 0x64, 0x20, 0x74, 0x6f, 0x20, 0x72, 0x65, 0x70, - 0x6f, 0x72, 0x74, 0x20, 0x75, 0x73, 0x65, 0x72, 0x20, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x2e, - 0x12, 0xb3, 0x01, 0x0a, 0x12, 0x55, 0x53, 0x45, 0x52, 0x5f, 0x4c, 0x41, 0x54, 0x45, 0x53, 0x54, - 0x5f, 0x44, 0x4f, 0x55, 0x42, 0x4c, 0x45, 0x10, 0x05, 0x1a, 0x9a, 0x01, 0xd2, 0xa7, 0xa7, 0x96, - 0x06, 0x93, 0x01, 0x0a, 0x21, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, - 0x3a, 0x75, 0x73, 0x65, 0x72, 0x3a, 0x6c, 0x61, 0x74, 0x65, 0x73, 0x74, 0x5f, 0x64, 0x6f, 0x75, - 0x62, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x1d, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, - 0x72, 0x69, 0x63, 0x73, 0x3a, 0x6c, 0x61, 0x74, 0x65, 0x73, 0x74, 0x5f, 0x64, 0x6f, 0x75, 0x62, - 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x1a, 0x0a, 0x50, 0x54, 0x52, 0x41, 0x4e, 0x53, 0x46, 0x4f, 0x52, - 0x4d, 0x1a, 0x09, 0x4e, 0x41, 0x4d, 0x45, 0x53, 0x50, 0x41, 0x43, 0x45, 0x1a, 0x04, 0x4e, 0x41, - 0x4d, 0x45, 0x22, 0x32, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, - 0x6e, 0x12, 0x23, 0x55, 0x52, 0x4e, 0x20, 0x75, 0x74, 0x69, 0x6c, 0x69, 0x7a, 0x65, 0x64, 0x20, - 0x74, 0x6f, 0x20, 0x72, 0x65, 0x70, 0x6f, 0x72, 0x74, 0x20, 0x75, 0x73, 0x65, 0x72, 0x20, 0x6d, - 0x65, 0x74, 0x72, 0x69, 0x63, 0x2e, 0x12, 0xad, 0x01, 0x0a, 0x10, 0x55, 0x53, 0x45, 0x52, 0x5f, - 0x54, 0x4f, 0x50, 0x5f, 0x4e, 0x5f, 0x49, 0x4e, 0x54, 0x36, 0x34, 0x10, 0x06, 0x1a, 0x96, 0x01, - 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0x8f, 0x01, 0x0a, 0x1f, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, - 0x74, 0x72, 0x69, 0x63, 0x3a, 0x75, 0x73, 0x65, 0x72, 0x3a, 0x74, 0x6f, 0x70, 0x5f, 0x6e, 0x5f, - 0x69, 0x6e, 0x74, 0x36, 0x34, 0x3a, 0x76, 0x31, 0x12, 0x1b, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, - 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x74, 0x6f, 0x70, 0x5f, 0x6e, 0x5f, 0x69, 0x6e, 0x74, + 0x69, 0x70, 0x74, 0x6f, 0x72, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x1a, 0x1f, 0x67, 0x6f, 0x6f, + 0x67, 0x6c, 0x65, 0x2f, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x62, 0x75, 0x66, 0x2f, 0x74, 0x69, 0x6d, + 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x22, 0xb4, 0x01, 0x0a, + 0x12, 0x4d, 0x6f, 0x6e, 0x69, 0x74, 0x6f, 0x72, 0x69, 0x6e, 0x67, 0x49, 0x6e, 0x66, 0x6f, 0x53, + 0x70, 0x65, 0x63, 0x12, 0x10, 0x0a, 0x03, 0x75, 0x72, 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, + 0x52, 0x03, 0x75, 0x72, 0x6e, 0x12, 0x12, 0x0a, 0x04, 0x74, 0x79, 0x70, 0x65, 0x18, 0x02, 0x20, + 0x01, 0x28, 0x09, 0x52, 0x04, 0x74, 0x79, 0x70, 0x65, 0x12, 0x27, 0x0a, 0x0f, 0x72, 0x65, 0x71, + 0x75, 0x69, 0x72, 0x65, 0x64, 0x5f, 0x6c, 0x61, 0x62, 0x65, 0x6c, 0x73, 0x18, 0x03, 0x20, 0x03, + 0x28, 0x09, 0x52, 0x0e, 0x72, 0x65, 0x71, 0x75, 0x69, 0x72, 0x65, 0x64, 0x4c, 0x61, 0x62, 0x65, + 0x6c, 0x73, 0x12, 0x4f, 0x0a, 0x0b, 0x61, 0x6e, 0x6e, 0x6f, 0x74, 0x61, 0x74, 0x69, 0x6f, 0x6e, + 0x73, 0x18, 0x04, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x2d, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, + 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 
0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, + 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x41, 0x6e, 0x6e, 0x6f, + 0x74, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x52, 0x0b, 0x61, 0x6e, 0x6e, 0x6f, 0x74, 0x61, 0x74, 0x69, + 0x6f, 0x6e, 0x73, 0x22, 0x34, 0x0a, 0x0a, 0x41, 0x6e, 0x6e, 0x6f, 0x74, 0x61, 0x74, 0x69, 0x6f, + 0x6e, 0x12, 0x10, 0x0a, 0x03, 0x6b, 0x65, 0x79, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, + 0x6b, 0x65, 0x79, 0x12, 0x14, 0x0a, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x18, 0x02, 0x20, 0x01, + 0x28, 0x09, 0x52, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x22, 0xd1, 0x23, 0x0a, 0x13, 0x4d, 0x6f, + 0x6e, 0x69, 0x74, 0x6f, 0x72, 0x69, 0x6e, 0x67, 0x49, 0x6e, 0x66, 0x6f, 0x53, 0x70, 0x65, 0x63, + 0x73, 0x22, 0xb9, 0x23, 0x0a, 0x04, 0x45, 0x6e, 0x75, 0x6d, 0x12, 0xa7, 0x01, 0x0a, 0x0e, 0x55, + 0x53, 0x45, 0x52, 0x5f, 0x53, 0x55, 0x4d, 0x5f, 0x49, 0x4e, 0x54, 0x36, 0x34, 0x10, 0x00, 0x1a, + 0x92, 0x01, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0x8b, 0x01, 0x0a, 0x1d, 0x62, 0x65, 0x61, 0x6d, 0x3a, + 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x75, 0x73, 0x65, 0x72, 0x3a, 0x73, 0x75, 0x6d, 0x5f, + 0x69, 0x6e, 0x74, 0x36, 0x34, 0x3a, 0x76, 0x31, 0x12, 0x19, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, + 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x73, 0x75, 0x6d, 0x5f, 0x69, 0x6e, 0x74, 0x36, 0x34, + 0x3a, 0x76, 0x31, 0x1a, 0x0a, 0x50, 0x54, 0x52, 0x41, 0x4e, 0x53, 0x46, 0x4f, 0x52, 0x4d, 0x1a, + 0x09, 0x4e, 0x41, 0x4d, 0x45, 0x53, 0x50, 0x41, 0x43, 0x45, 0x1a, 0x04, 0x4e, 0x41, 0x4d, 0x45, + 0x22, 0x32, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x12, + 0x23, 0x55, 0x52, 0x4e, 0x20, 0x75, 0x74, 0x69, 0x6c, 0x69, 0x7a, 0x65, 0x64, 0x20, 0x74, 0x6f, + 0x20, 0x72, 0x65, 0x70, 0x6f, 0x72, 0x74, 0x20, 0x75, 0x73, 0x65, 0x72, 0x20, 0x6d, 0x65, 0x74, + 0x72, 0x69, 0x63, 0x2e, 0x12, 0xaa, 0x01, 0x0a, 0x0f, 0x55, 0x53, 0x45, 0x52, 0x5f, 0x53, 0x55, + 0x4d, 0x5f, 0x44, 0x4f, 0x55, 0x42, 0x4c, 0x45, 0x10, 0x01, 0x1a, 0x94, 0x01, 0xd2, 0xa7, 0xa7, + 0x96, 0x06, 0x8d, 0x01, 0x0a, 0x1e, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, + 0x63, 0x3a, 0x75, 0x73, 0x65, 0x72, 0x3a, 0x73, 0x75, 0x6d, 0x5f, 0x64, 0x6f, 0x75, 0x62, 0x6c, + 0x65, 0x3a, 0x76, 0x31, 0x12, 0x1a, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, + 0x63, 0x73, 0x3a, 0x73, 0x75, 0x6d, 0x5f, 0x64, 0x6f, 0x75, 0x62, 0x6c, 0x65, 0x3a, 0x76, 0x31, + 0x1a, 0x0a, 0x50, 0x54, 0x52, 0x41, 0x4e, 0x53, 0x46, 0x4f, 0x52, 0x4d, 0x1a, 0x09, 0x4e, 0x41, + 0x4d, 0x45, 0x53, 0x50, 0x41, 0x43, 0x45, 0x1a, 0x04, 0x4e, 0x41, 0x4d, 0x45, 0x22, 0x32, 0x0a, + 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x23, 0x55, 0x52, + 0x4e, 0x20, 0x75, 0x74, 0x69, 0x6c, 0x69, 0x7a, 0x65, 0x64, 0x20, 0x74, 0x6f, 0x20, 0x72, 0x65, + 0x70, 0x6f, 0x72, 0x74, 0x20, 0x75, 0x73, 0x65, 0x72, 0x20, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, + 0x2e, 0x12, 0xc2, 0x01, 0x0a, 0x17, 0x55, 0x53, 0x45, 0x52, 0x5f, 0x44, 0x49, 0x53, 0x54, 0x52, + 0x49, 0x42, 0x55, 0x54, 0x49, 0x4f, 0x4e, 0x5f, 0x49, 0x4e, 0x54, 0x36, 0x34, 0x10, 0x02, 0x1a, + 0xa4, 0x01, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0x9d, 0x01, 0x0a, 0x26, 0x62, 0x65, 0x61, 0x6d, 0x3a, + 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x75, 0x73, 0x65, 0x72, 0x3a, 0x64, 0x69, 0x73, 0x74, + 0x72, 0x69, 0x62, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x69, 0x6e, 0x74, 0x36, 0x34, 0x3a, 0x76, + 0x31, 0x12, 0x22, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, + 0x64, 0x69, 0x73, 0x74, 0x72, 0x69, 0x62, 0x75, 0x74, 0x69, 
0x6f, 0x6e, 0x5f, 0x69, 0x6e, 0x74, 0x36, 0x34, 0x3a, 0x76, 0x31, 0x1a, 0x0a, 0x50, 0x54, 0x52, 0x41, 0x4e, 0x53, 0x46, 0x4f, 0x52, 0x4d, 0x1a, 0x09, 0x4e, 0x41, 0x4d, 0x45, 0x53, 0x50, 0x41, 0x43, 0x45, 0x1a, 0x04, 0x4e, 0x41, 0x4d, 0x45, 0x22, 0x32, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x23, 0x55, 0x52, 0x4e, 0x20, 0x75, 0x74, 0x69, 0x6c, 0x69, 0x7a, 0x65, 0x64, 0x20, 0x74, 0x6f, 0x20, 0x72, 0x65, 0x70, 0x6f, 0x72, 0x74, 0x20, 0x75, 0x73, 0x65, 0x72, 0x20, 0x6d, - 0x65, 0x74, 0x72, 0x69, 0x63, 0x2e, 0x12, 0xb0, 0x01, 0x0a, 0x11, 0x55, 0x53, 0x45, 0x52, 0x5f, - 0x54, 0x4f, 0x50, 0x5f, 0x4e, 0x5f, 0x44, 0x4f, 0x55, 0x42, 0x4c, 0x45, 0x10, 0x07, 0x1a, 0x98, - 0x01, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0x91, 0x01, 0x0a, 0x20, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, - 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x75, 0x73, 0x65, 0x72, 0x3a, 0x74, 0x6f, 0x70, 0x5f, 0x6e, - 0x5f, 0x64, 0x6f, 0x75, 0x62, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x1c, 0x62, 0x65, 0x61, 0x6d, - 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x74, 0x6f, 0x70, 0x5f, 0x6e, 0x5f, 0x64, - 0x6f, 0x75, 0x62, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x1a, 0x0a, 0x50, 0x54, 0x52, 0x41, 0x4e, 0x53, - 0x46, 0x4f, 0x52, 0x4d, 0x1a, 0x09, 0x4e, 0x41, 0x4d, 0x45, 0x53, 0x50, 0x41, 0x43, 0x45, 0x1a, - 0x04, 0x4e, 0x41, 0x4d, 0x45, 0x22, 0x32, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, - 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x23, 0x55, 0x52, 0x4e, 0x20, 0x75, 0x74, 0x69, 0x6c, 0x69, 0x7a, - 0x65, 0x64, 0x20, 0x74, 0x6f, 0x20, 0x72, 0x65, 0x70, 0x6f, 0x72, 0x74, 0x20, 0x75, 0x73, 0x65, - 0x72, 0x20, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x2e, 0x12, 0xb6, 0x01, 0x0a, 0x13, 0x55, 0x53, - 0x45, 0x52, 0x5f, 0x42, 0x4f, 0x54, 0x54, 0x4f, 0x4d, 0x5f, 0x4e, 0x5f, 0x49, 0x4e, 0x54, 0x36, - 0x34, 0x10, 0x08, 0x1a, 0x9c, 0x01, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0x95, 0x01, 0x0a, 0x22, 0x62, - 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x75, 0x73, 0x65, 0x72, 0x3a, - 0x62, 0x6f, 0x74, 0x74, 0x6f, 0x6d, 0x5f, 0x6e, 0x5f, 0x69, 0x6e, 0x74, 0x36, 0x34, 0x3a, 0x76, - 0x31, 0x12, 0x1e, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, - 0x62, 0x6f, 0x74, 0x74, 0x6f, 0x6d, 0x5f, 0x6e, 0x5f, 0x69, 0x6e, 0x74, 0x36, 0x34, 0x3a, 0x76, - 0x31, 0x1a, 0x0a, 0x50, 0x54, 0x52, 0x41, 0x4e, 0x53, 0x46, 0x4f, 0x52, 0x4d, 0x1a, 0x09, 0x4e, - 0x41, 0x4d, 0x45, 0x53, 0x50, 0x41, 0x43, 0x45, 0x1a, 0x04, 0x4e, 0x41, 0x4d, 0x45, 0x22, 0x32, - 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x23, 0x55, - 0x52, 0x4e, 0x20, 0x75, 0x74, 0x69, 0x6c, 0x69, 0x7a, 0x65, 0x64, 0x20, 0x74, 0x6f, 0x20, 0x72, - 0x65, 0x70, 0x6f, 0x72, 0x74, 0x20, 0x75, 0x73, 0x65, 0x72, 0x20, 0x6d, 0x65, 0x74, 0x72, 0x69, - 0x63, 0x2e, 0x12, 0xb9, 0x01, 0x0a, 0x14, 0x55, 0x53, 0x45, 0x52, 0x5f, 0x42, 0x4f, 0x54, 0x54, - 0x4f, 0x4d, 0x5f, 0x4e, 0x5f, 0x44, 0x4f, 0x55, 0x42, 0x4c, 0x45, 0x10, 0x09, 0x1a, 0x9e, 0x01, - 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0x97, 0x01, 0x0a, 0x23, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, - 0x74, 0x72, 0x69, 0x63, 0x3a, 0x75, 0x73, 0x65, 0x72, 0x3a, 0x62, 0x6f, 0x74, 0x74, 0x6f, 0x6d, - 0x5f, 0x6e, 0x5f, 0x64, 0x6f, 0x75, 0x62, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x1f, 0x62, 0x65, - 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x62, 0x6f, 0x74, 0x74, 0x6f, - 0x6d, 0x5f, 0x6e, 0x5f, 0x64, 0x6f, 0x75, 0x62, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x1a, 0x0a, 0x50, + 0x65, 0x74, 0x72, 0x69, 0x63, 0x2e, 0x12, 0xc5, 0x01, 0x0a, 0x18, 0x55, 0x53, 0x45, 0x52, 0x5f, 
+ 0x44, 0x49, 0x53, 0x54, 0x52, 0x49, 0x42, 0x55, 0x54, 0x49, 0x4f, 0x4e, 0x5f, 0x44, 0x4f, 0x55, + 0x42, 0x4c, 0x45, 0x10, 0x03, 0x1a, 0xa6, 0x01, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0x9f, 0x01, 0x0a, + 0x27, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x75, 0x73, 0x65, + 0x72, 0x3a, 0x64, 0x69, 0x73, 0x74, 0x72, 0x69, 0x62, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x64, + 0x6f, 0x75, 0x62, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x23, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, + 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x64, 0x69, 0x73, 0x74, 0x72, 0x69, 0x62, 0x75, 0x74, + 0x69, 0x6f, 0x6e, 0x5f, 0x64, 0x6f, 0x75, 0x62, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x1a, 0x0a, 0x50, 0x54, 0x52, 0x41, 0x4e, 0x53, 0x46, 0x4f, 0x52, 0x4d, 0x1a, 0x09, 0x4e, 0x41, 0x4d, 0x45, 0x53, 0x50, 0x41, 0x43, 0x45, 0x1a, 0x04, 0x4e, 0x41, 0x4d, 0x45, 0x22, 0x32, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x23, 0x55, 0x52, 0x4e, 0x20, 0x75, 0x74, 0x69, 0x6c, 0x69, 0x7a, 0x65, 0x64, 0x20, 0x74, 0x6f, 0x20, 0x72, 0x65, 0x70, 0x6f, 0x72, - 0x74, 0x20, 0x75, 0x73, 0x65, 0x72, 0x20, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x2e, 0x12, 0xad, - 0x01, 0x0a, 0x0d, 0x45, 0x4c, 0x45, 0x4d, 0x45, 0x4e, 0x54, 0x5f, 0x43, 0x4f, 0x55, 0x4e, 0x54, - 0x10, 0x0a, 0x1a, 0x99, 0x01, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0x92, 0x01, 0x0a, 0x1c, 0x62, 0x65, - 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, - 0x74, 0x5f, 0x63, 0x6f, 0x75, 0x6e, 0x74, 0x3a, 0x76, 0x31, 0x12, 0x19, 0x62, 0x65, 0x61, 0x6d, - 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x73, 0x75, 0x6d, 0x5f, 0x69, 0x6e, 0x74, - 0x36, 0x34, 0x3a, 0x76, 0x31, 0x1a, 0x0b, 0x50, 0x43, 0x4f, 0x4c, 0x4c, 0x45, 0x43, 0x54, 0x49, - 0x4f, 0x4e, 0x22, 0x4a, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, - 0x6e, 0x12, 0x3b, 0x54, 0x68, 0x65, 0x20, 0x74, 0x6f, 0x74, 0x61, 0x6c, 0x20, 0x65, 0x6c, 0x65, - 0x6d, 0x65, 0x6e, 0x74, 0x73, 0x20, 0x6f, 0x75, 0x74, 0x70, 0x75, 0x74, 0x20, 0x74, 0x6f, 0x20, - 0x61, 0x20, 0x50, 0x63, 0x6f, 0x6c, 0x6c, 0x65, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x20, 0x62, 0x79, - 0x20, 0x61, 0x20, 0x50, 0x54, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x2e, 0x12, 0xcd, - 0x02, 0x0a, 0x11, 0x53, 0x41, 0x4d, 0x50, 0x4c, 0x45, 0x44, 0x5f, 0x42, 0x59, 0x54, 0x45, 0x5f, - 0x53, 0x49, 0x5a, 0x45, 0x10, 0x0b, 0x1a, 0xb5, 0x02, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0xae, 0x02, - 0x0a, 0x20, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x73, 0x61, - 0x6d, 0x70, 0x6c, 0x65, 0x64, 0x5f, 0x62, 0x79, 0x74, 0x65, 0x5f, 0x73, 0x69, 0x7a, 0x65, 0x3a, - 0x76, 0x31, 0x12, 0x22, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, - 0x3a, 0x64, 0x69, 0x73, 0x74, 0x72, 0x69, 0x62, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x69, 0x6e, + 0x74, 0x20, 0x75, 0x73, 0x65, 0x72, 0x20, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x2e, 0x12, 0xb0, + 0x01, 0x0a, 0x11, 0x55, 0x53, 0x45, 0x52, 0x5f, 0x4c, 0x41, 0x54, 0x45, 0x53, 0x54, 0x5f, 0x49, + 0x4e, 0x54, 0x36, 0x34, 0x10, 0x04, 0x1a, 0x98, 0x01, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0x91, 0x01, + 0x0a, 0x20, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x75, 0x73, + 0x65, 0x72, 0x3a, 0x6c, 0x61, 0x74, 0x65, 0x73, 0x74, 0x5f, 0x69, 0x6e, 0x74, 0x36, 0x34, 0x3a, + 0x76, 0x31, 0x12, 0x1c, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, + 0x3a, 0x6c, 0x61, 0x74, 0x65, 0x73, 0x74, 0x5f, 0x69, 0x6e, 0x74, 0x36, 0x34, 0x3a, 0x76, 0x31, + 0x1a, 0x0a, 0x50, 0x54, 0x52, 
0x41, 0x4e, 0x53, 0x46, 0x4f, 0x52, 0x4d, 0x1a, 0x09, 0x4e, 0x41, + 0x4d, 0x45, 0x53, 0x50, 0x41, 0x43, 0x45, 0x1a, 0x04, 0x4e, 0x41, 0x4d, 0x45, 0x22, 0x32, 0x0a, + 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x23, 0x55, 0x52, + 0x4e, 0x20, 0x75, 0x74, 0x69, 0x6c, 0x69, 0x7a, 0x65, 0x64, 0x20, 0x74, 0x6f, 0x20, 0x72, 0x65, + 0x70, 0x6f, 0x72, 0x74, 0x20, 0x75, 0x73, 0x65, 0x72, 0x20, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, + 0x2e, 0x12, 0xb3, 0x01, 0x0a, 0x12, 0x55, 0x53, 0x45, 0x52, 0x5f, 0x4c, 0x41, 0x54, 0x45, 0x53, + 0x54, 0x5f, 0x44, 0x4f, 0x55, 0x42, 0x4c, 0x45, 0x10, 0x05, 0x1a, 0x9a, 0x01, 0xd2, 0xa7, 0xa7, + 0x96, 0x06, 0x93, 0x01, 0x0a, 0x21, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, + 0x63, 0x3a, 0x75, 0x73, 0x65, 0x72, 0x3a, 0x6c, 0x61, 0x74, 0x65, 0x73, 0x74, 0x5f, 0x64, 0x6f, + 0x75, 0x62, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x1d, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, + 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x6c, 0x61, 0x74, 0x65, 0x73, 0x74, 0x5f, 0x64, 0x6f, 0x75, + 0x62, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x1a, 0x0a, 0x50, 0x54, 0x52, 0x41, 0x4e, 0x53, 0x46, 0x4f, + 0x52, 0x4d, 0x1a, 0x09, 0x4e, 0x41, 0x4d, 0x45, 0x53, 0x50, 0x41, 0x43, 0x45, 0x1a, 0x04, 0x4e, + 0x41, 0x4d, 0x45, 0x22, 0x32, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, + 0x6f, 0x6e, 0x12, 0x23, 0x55, 0x52, 0x4e, 0x20, 0x75, 0x74, 0x69, 0x6c, 0x69, 0x7a, 0x65, 0x64, + 0x20, 0x74, 0x6f, 0x20, 0x72, 0x65, 0x70, 0x6f, 0x72, 0x74, 0x20, 0x75, 0x73, 0x65, 0x72, 0x20, + 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x2e, 0x12, 0xad, 0x01, 0x0a, 0x10, 0x55, 0x53, 0x45, 0x52, + 0x5f, 0x54, 0x4f, 0x50, 0x5f, 0x4e, 0x5f, 0x49, 0x4e, 0x54, 0x36, 0x34, 0x10, 0x06, 0x1a, 0x96, + 0x01, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0x8f, 0x01, 0x0a, 0x1f, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, + 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x75, 0x73, 0x65, 0x72, 0x3a, 0x74, 0x6f, 0x70, 0x5f, 0x6e, + 0x5f, 0x69, 0x6e, 0x74, 0x36, 0x34, 0x3a, 0x76, 0x31, 0x12, 0x1b, 0x62, 0x65, 0x61, 0x6d, 0x3a, + 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x74, 0x6f, 0x70, 0x5f, 0x6e, 0x5f, 0x69, 0x6e, + 0x74, 0x36, 0x34, 0x3a, 0x76, 0x31, 0x1a, 0x0a, 0x50, 0x54, 0x52, 0x41, 0x4e, 0x53, 0x46, 0x4f, + 0x52, 0x4d, 0x1a, 0x09, 0x4e, 0x41, 0x4d, 0x45, 0x53, 0x50, 0x41, 0x43, 0x45, 0x1a, 0x04, 0x4e, + 0x41, 0x4d, 0x45, 0x22, 0x32, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, + 0x6f, 0x6e, 0x12, 0x23, 0x55, 0x52, 0x4e, 0x20, 0x75, 0x74, 0x69, 0x6c, 0x69, 0x7a, 0x65, 0x64, + 0x20, 0x74, 0x6f, 0x20, 0x72, 0x65, 0x70, 0x6f, 0x72, 0x74, 0x20, 0x75, 0x73, 0x65, 0x72, 0x20, + 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x2e, 0x12, 0xb0, 0x01, 0x0a, 0x11, 0x55, 0x53, 0x45, 0x52, + 0x5f, 0x54, 0x4f, 0x50, 0x5f, 0x4e, 0x5f, 0x44, 0x4f, 0x55, 0x42, 0x4c, 0x45, 0x10, 0x07, 0x1a, + 0x98, 0x01, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0x91, 0x01, 0x0a, 0x20, 0x62, 0x65, 0x61, 0x6d, 0x3a, + 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x75, 0x73, 0x65, 0x72, 0x3a, 0x74, 0x6f, 0x70, 0x5f, + 0x6e, 0x5f, 0x64, 0x6f, 0x75, 0x62, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x1c, 0x62, 0x65, 0x61, + 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x74, 0x6f, 0x70, 0x5f, 0x6e, 0x5f, + 0x64, 0x6f, 0x75, 0x62, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x1a, 0x0a, 0x50, 0x54, 0x52, 0x41, 0x4e, + 0x53, 0x46, 0x4f, 0x52, 0x4d, 0x1a, 0x09, 0x4e, 0x41, 0x4d, 0x45, 0x53, 0x50, 0x41, 0x43, 0x45, + 0x1a, 0x04, 0x4e, 0x41, 0x4d, 0x45, 0x22, 0x32, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, + 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x23, 0x55, 0x52, 
0x4e, 0x20, 0x75, 0x74, 0x69, 0x6c, 0x69, + 0x7a, 0x65, 0x64, 0x20, 0x74, 0x6f, 0x20, 0x72, 0x65, 0x70, 0x6f, 0x72, 0x74, 0x20, 0x75, 0x73, + 0x65, 0x72, 0x20, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x2e, 0x12, 0xb6, 0x01, 0x0a, 0x13, 0x55, + 0x53, 0x45, 0x52, 0x5f, 0x42, 0x4f, 0x54, 0x54, 0x4f, 0x4d, 0x5f, 0x4e, 0x5f, 0x49, 0x4e, 0x54, + 0x36, 0x34, 0x10, 0x08, 0x1a, 0x9c, 0x01, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0x95, 0x01, 0x0a, 0x22, + 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x75, 0x73, 0x65, 0x72, + 0x3a, 0x62, 0x6f, 0x74, 0x74, 0x6f, 0x6d, 0x5f, 0x6e, 0x5f, 0x69, 0x6e, 0x74, 0x36, 0x34, 0x3a, + 0x76, 0x31, 0x12, 0x1e, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, + 0x3a, 0x62, 0x6f, 0x74, 0x74, 0x6f, 0x6d, 0x5f, 0x6e, 0x5f, 0x69, 0x6e, 0x74, 0x36, 0x34, 0x3a, + 0x76, 0x31, 0x1a, 0x0a, 0x50, 0x54, 0x52, 0x41, 0x4e, 0x53, 0x46, 0x4f, 0x52, 0x4d, 0x1a, 0x09, + 0x4e, 0x41, 0x4d, 0x45, 0x53, 0x50, 0x41, 0x43, 0x45, 0x1a, 0x04, 0x4e, 0x41, 0x4d, 0x45, 0x22, + 0x32, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x23, + 0x55, 0x52, 0x4e, 0x20, 0x75, 0x74, 0x69, 0x6c, 0x69, 0x7a, 0x65, 0x64, 0x20, 0x74, 0x6f, 0x20, + 0x72, 0x65, 0x70, 0x6f, 0x72, 0x74, 0x20, 0x75, 0x73, 0x65, 0x72, 0x20, 0x6d, 0x65, 0x74, 0x72, + 0x69, 0x63, 0x2e, 0x12, 0xb9, 0x01, 0x0a, 0x14, 0x55, 0x53, 0x45, 0x52, 0x5f, 0x42, 0x4f, 0x54, + 0x54, 0x4f, 0x4d, 0x5f, 0x4e, 0x5f, 0x44, 0x4f, 0x55, 0x42, 0x4c, 0x45, 0x10, 0x09, 0x1a, 0x9e, + 0x01, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0x97, 0x01, 0x0a, 0x23, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, + 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x75, 0x73, 0x65, 0x72, 0x3a, 0x62, 0x6f, 0x74, 0x74, 0x6f, + 0x6d, 0x5f, 0x6e, 0x5f, 0x64, 0x6f, 0x75, 0x62, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x1f, 0x62, + 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x62, 0x6f, 0x74, 0x74, + 0x6f, 0x6d, 0x5f, 0x6e, 0x5f, 0x64, 0x6f, 0x75, 0x62, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x1a, 0x0a, + 0x50, 0x54, 0x52, 0x41, 0x4e, 0x53, 0x46, 0x4f, 0x52, 0x4d, 0x1a, 0x09, 0x4e, 0x41, 0x4d, 0x45, + 0x53, 0x50, 0x41, 0x43, 0x45, 0x1a, 0x04, 0x4e, 0x41, 0x4d, 0x45, 0x22, 0x32, 0x0a, 0x0b, 0x64, + 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x23, 0x55, 0x52, 0x4e, 0x20, + 0x75, 0x74, 0x69, 0x6c, 0x69, 0x7a, 0x65, 0x64, 0x20, 0x74, 0x6f, 0x20, 0x72, 0x65, 0x70, 0x6f, + 0x72, 0x74, 0x20, 0x75, 0x73, 0x65, 0x72, 0x20, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x2e, 0x12, + 0xad, 0x01, 0x0a, 0x0d, 0x45, 0x4c, 0x45, 0x4d, 0x45, 0x4e, 0x54, 0x5f, 0x43, 0x4f, 0x55, 0x4e, + 0x54, 0x10, 0x0a, 0x1a, 0x99, 0x01, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0x92, 0x01, 0x0a, 0x1c, 0x62, + 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x65, 0x6c, 0x65, 0x6d, 0x65, + 0x6e, 0x74, 0x5f, 0x63, 0x6f, 0x75, 0x6e, 0x74, 0x3a, 0x76, 0x31, 0x12, 0x19, 0x62, 0x65, 0x61, + 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x73, 0x75, 0x6d, 0x5f, 0x69, 0x6e, 0x74, 0x36, 0x34, 0x3a, 0x76, 0x31, 0x1a, 0x0b, 0x50, 0x43, 0x4f, 0x4c, 0x4c, 0x45, 0x43, 0x54, - 0x49, 0x4f, 0x4e, 0x22, 0xd8, 0x01, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, - 0x69, 0x6f, 0x6e, 0x12, 0xc8, 0x01, 0x54, 0x68, 0x65, 0x20, 0x74, 0x6f, 0x74, 0x61, 0x6c, 0x20, - 0x62, 0x79, 0x74, 0x65, 0x20, 0x73, 0x69, 0x7a, 0x65, 0x20, 0x61, 0x6e, 0x64, 0x20, 0x63, 0x6f, - 0x75, 0x6e, 0x74, 0x20, 0x6f, 0x66, 0x20, 0x61, 0x20, 0x73, 0x61, 0x6d, 0x70, 0x6c, 0x65, 0x64, - 0x20, 0x20, 0x73, 0x65, 0x74, 0x20, 0x28, 0x6f, 0x72, 0x20, 0x61, 0x6c, 0x6c, 
0x29, 0x20, 0x6f, - 0x66, 0x20, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x73, 0x20, 0x69, 0x6e, 0x20, 0x74, 0x68, - 0x65, 0x20, 0x70, 0x63, 0x6f, 0x6c, 0x6c, 0x65, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x20, 0x53, - 0x61, 0x6d, 0x70, 0x6c, 0x69, 0x6e, 0x67, 0x20, 0x69, 0x73, 0x20, 0x75, 0x73, 0x65, 0x64, 0x20, - 0x20, 0x62, 0x65, 0x63, 0x61, 0x75, 0x73, 0x65, 0x20, 0x63, 0x61, 0x6c, 0x63, 0x75, 0x6c, 0x61, - 0x74, 0x69, 0x6e, 0x67, 0x20, 0x74, 0x68, 0x65, 0x20, 0x62, 0x79, 0x74, 0x65, 0x20, 0x63, 0x6f, - 0x75, 0x6e, 0x74, 0x20, 0x69, 0x6e, 0x76, 0x6f, 0x6c, 0x76, 0x65, 0x73, 0x20, 0x73, 0x65, 0x72, - 0x69, 0x61, 0x6c, 0x69, 0x7a, 0x69, 0x6e, 0x67, 0x20, 0x74, 0x68, 0x65, 0x20, 0x20, 0x65, 0x6c, - 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x73, 0x20, 0x77, 0x68, 0x69, 0x63, 0x68, 0x20, 0x69, 0x73, 0x20, - 0x43, 0x50, 0x55, 0x20, 0x69, 0x6e, 0x74, 0x65, 0x6e, 0x73, 0x69, 0x76, 0x65, 0x2e, 0x12, 0xd9, - 0x01, 0x0a, 0x12, 0x53, 0x54, 0x41, 0x52, 0x54, 0x5f, 0x42, 0x55, 0x4e, 0x44, 0x4c, 0x45, 0x5f, - 0x4d, 0x53, 0x45, 0x43, 0x53, 0x10, 0x0c, 0x1a, 0xc0, 0x01, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0xb9, - 0x01, 0x0a, 0x36, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x70, - 0x61, 0x72, 0x64, 0x6f, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x74, - 0x69, 0x6d, 0x65, 0x3a, 0x73, 0x74, 0x61, 0x72, 0x74, 0x5f, 0x62, 0x75, 0x6e, 0x64, 0x6c, 0x65, - 0x5f, 0x6d, 0x73, 0x65, 0x63, 0x73, 0x3a, 0x76, 0x31, 0x12, 0x19, 0x62, 0x65, 0x61, 0x6d, 0x3a, - 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x73, 0x75, 0x6d, 0x5f, 0x69, 0x6e, 0x74, 0x36, - 0x34, 0x3a, 0x76, 0x31, 0x1a, 0x0a, 0x50, 0x54, 0x52, 0x41, 0x4e, 0x53, 0x46, 0x4f, 0x52, 0x4d, - 0x22, 0x58, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x12, - 0x49, 0x54, 0x68, 0x65, 0x20, 0x74, 0x6f, 0x74, 0x61, 0x6c, 0x20, 0x65, 0x73, 0x74, 0x69, 0x6d, - 0x61, 0x74, 0x65, 0x64, 0x20, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x20, 0x74, - 0x69, 0x6d, 0x65, 0x20, 0x6f, 0x66, 0x20, 0x74, 0x68, 0x65, 0x20, 0x73, 0x74, 0x61, 0x72, 0x74, - 0x20, 0x62, 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x66, 0x75, 0x6e, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x20, - 0x69, 0x6e, 0x20, 0x61, 0x20, 0x70, 0x61, 0x72, 0x64, 0x6f, 0x12, 0xdf, 0x01, 0x0a, 0x14, 0x50, - 0x52, 0x4f, 0x43, 0x45, 0x53, 0x53, 0x5f, 0x42, 0x55, 0x4e, 0x44, 0x4c, 0x45, 0x5f, 0x4d, 0x53, - 0x45, 0x43, 0x53, 0x10, 0x0d, 0x1a, 0xc4, 0x01, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0xbd, 0x01, 0x0a, - 0x38, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x70, 0x61, 0x72, - 0x64, 0x6f, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x74, 0x69, 0x6d, - 0x65, 0x3a, 0x70, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x5f, 0x62, 0x75, 0x6e, 0x64, 0x6c, 0x65, - 0x5f, 0x6d, 0x73, 0x65, 0x63, 0x73, 0x3a, 0x76, 0x31, 0x12, 0x19, 0x62, 0x65, 0x61, 0x6d, 0x3a, - 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x73, 0x75, 0x6d, 0x5f, 0x69, 0x6e, 0x74, 0x36, - 0x34, 0x3a, 0x76, 0x31, 0x1a, 0x0a, 0x50, 0x54, 0x52, 0x41, 0x4e, 0x53, 0x46, 0x4f, 0x52, 0x4d, - 0x22, 0x5a, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x12, - 0x4b, 0x54, 0x68, 0x65, 0x20, 0x74, 0x6f, 0x74, 0x61, 0x6c, 0x20, 0x65, 0x73, 0x74, 0x69, 0x6d, - 0x61, 0x74, 0x65, 0x64, 0x20, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x20, 0x74, - 0x69, 0x6d, 0x65, 0x20, 0x6f, 0x66, 0x20, 0x74, 0x68, 0x65, 0x20, 0x70, 0x72, 0x6f, 0x63, 0x65, - 0x73, 0x73, 0x20, 0x62, 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x66, 0x75, 0x6e, 0x63, 0x74, 0x69, 0x6f, - 0x6e, 
0x20, 0x69, 0x6e, 0x20, 0x61, 0x20, 0x70, 0x61, 0x72, 0x64, 0x6f, 0x12, 0xdd, 0x01, 0x0a, - 0x13, 0x46, 0x49, 0x4e, 0x49, 0x53, 0x48, 0x5f, 0x42, 0x55, 0x4e, 0x44, 0x4c, 0x45, 0x5f, 0x4d, - 0x53, 0x45, 0x43, 0x53, 0x10, 0x0e, 0x1a, 0xc3, 0x01, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0xbc, 0x01, - 0x0a, 0x37, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x70, 0x61, + 0x49, 0x4f, 0x4e, 0x22, 0x4a, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, + 0x6f, 0x6e, 0x12, 0x3b, 0x54, 0x68, 0x65, 0x20, 0x74, 0x6f, 0x74, 0x61, 0x6c, 0x20, 0x65, 0x6c, + 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x73, 0x20, 0x6f, 0x75, 0x74, 0x70, 0x75, 0x74, 0x20, 0x74, 0x6f, + 0x20, 0x61, 0x20, 0x50, 0x63, 0x6f, 0x6c, 0x6c, 0x65, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x20, 0x62, + 0x79, 0x20, 0x61, 0x20, 0x50, 0x54, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x2e, 0x12, + 0xcd, 0x02, 0x0a, 0x11, 0x53, 0x41, 0x4d, 0x50, 0x4c, 0x45, 0x44, 0x5f, 0x42, 0x59, 0x54, 0x45, + 0x5f, 0x53, 0x49, 0x5a, 0x45, 0x10, 0x0b, 0x1a, 0xb5, 0x02, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0xae, + 0x02, 0x0a, 0x20, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x73, + 0x61, 0x6d, 0x70, 0x6c, 0x65, 0x64, 0x5f, 0x62, 0x79, 0x74, 0x65, 0x5f, 0x73, 0x69, 0x7a, 0x65, + 0x3a, 0x76, 0x31, 0x12, 0x22, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, + 0x73, 0x3a, 0x64, 0x69, 0x73, 0x74, 0x72, 0x69, 0x62, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x69, + 0x6e, 0x74, 0x36, 0x34, 0x3a, 0x76, 0x31, 0x1a, 0x0b, 0x50, 0x43, 0x4f, 0x4c, 0x4c, 0x45, 0x43, + 0x54, 0x49, 0x4f, 0x4e, 0x22, 0xd8, 0x01, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, + 0x74, 0x69, 0x6f, 0x6e, 0x12, 0xc8, 0x01, 0x54, 0x68, 0x65, 0x20, 0x74, 0x6f, 0x74, 0x61, 0x6c, + 0x20, 0x62, 0x79, 0x74, 0x65, 0x20, 0x73, 0x69, 0x7a, 0x65, 0x20, 0x61, 0x6e, 0x64, 0x20, 0x63, + 0x6f, 0x75, 0x6e, 0x74, 0x20, 0x6f, 0x66, 0x20, 0x61, 0x20, 0x73, 0x61, 0x6d, 0x70, 0x6c, 0x65, + 0x64, 0x20, 0x20, 0x73, 0x65, 0x74, 0x20, 0x28, 0x6f, 0x72, 0x20, 0x61, 0x6c, 0x6c, 0x29, 0x20, + 0x6f, 0x66, 0x20, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x73, 0x20, 0x69, 0x6e, 0x20, 0x74, + 0x68, 0x65, 0x20, 0x70, 0x63, 0x6f, 0x6c, 0x6c, 0x65, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x20, + 0x53, 0x61, 0x6d, 0x70, 0x6c, 0x69, 0x6e, 0x67, 0x20, 0x69, 0x73, 0x20, 0x75, 0x73, 0x65, 0x64, + 0x20, 0x20, 0x62, 0x65, 0x63, 0x61, 0x75, 0x73, 0x65, 0x20, 0x63, 0x61, 0x6c, 0x63, 0x75, 0x6c, + 0x61, 0x74, 0x69, 0x6e, 0x67, 0x20, 0x74, 0x68, 0x65, 0x20, 0x62, 0x79, 0x74, 0x65, 0x20, 0x63, + 0x6f, 0x75, 0x6e, 0x74, 0x20, 0x69, 0x6e, 0x76, 0x6f, 0x6c, 0x76, 0x65, 0x73, 0x20, 0x73, 0x65, + 0x72, 0x69, 0x61, 0x6c, 0x69, 0x7a, 0x69, 0x6e, 0x67, 0x20, 0x74, 0x68, 0x65, 0x20, 0x20, 0x65, + 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x73, 0x20, 0x77, 0x68, 0x69, 0x63, 0x68, 0x20, 0x69, 0x73, + 0x20, 0x43, 0x50, 0x55, 0x20, 0x69, 0x6e, 0x74, 0x65, 0x6e, 0x73, 0x69, 0x76, 0x65, 0x2e, 0x12, + 0xd9, 0x01, 0x0a, 0x12, 0x53, 0x54, 0x41, 0x52, 0x54, 0x5f, 0x42, 0x55, 0x4e, 0x44, 0x4c, 0x45, + 0x5f, 0x4d, 0x53, 0x45, 0x43, 0x53, 0x10, 0x0c, 0x1a, 0xc0, 0x01, 0xd2, 0xa7, 0xa7, 0x96, 0x06, + 0xb9, 0x01, 0x0a, 0x36, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, + 0x70, 0x61, 0x72, 0x64, 0x6f, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x5f, + 0x74, 0x69, 0x6d, 0x65, 0x3a, 0x73, 0x74, 0x61, 0x72, 0x74, 0x5f, 0x62, 0x75, 0x6e, 0x64, 0x6c, + 0x65, 0x5f, 0x6d, 0x73, 0x65, 0x63, 0x73, 0x3a, 0x76, 0x31, 0x12, 0x19, 0x62, 0x65, 0x61, 0x6d, + 0x3a, 0x6d, 0x65, 0x74, 0x72, 
0x69, 0x63, 0x73, 0x3a, 0x73, 0x75, 0x6d, 0x5f, 0x69, 0x6e, 0x74, + 0x36, 0x34, 0x3a, 0x76, 0x31, 0x1a, 0x0a, 0x50, 0x54, 0x52, 0x41, 0x4e, 0x53, 0x46, 0x4f, 0x52, + 0x4d, 0x22, 0x58, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, + 0x12, 0x49, 0x54, 0x68, 0x65, 0x20, 0x74, 0x6f, 0x74, 0x61, 0x6c, 0x20, 0x65, 0x73, 0x74, 0x69, + 0x6d, 0x61, 0x74, 0x65, 0x64, 0x20, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x20, + 0x74, 0x69, 0x6d, 0x65, 0x20, 0x6f, 0x66, 0x20, 0x74, 0x68, 0x65, 0x20, 0x73, 0x74, 0x61, 0x72, + 0x74, 0x20, 0x62, 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x66, 0x75, 0x6e, 0x63, 0x74, 0x69, 0x6f, 0x6e, + 0x20, 0x69, 0x6e, 0x20, 0x61, 0x20, 0x70, 0x61, 0x72, 0x64, 0x6f, 0x12, 0xdf, 0x01, 0x0a, 0x14, + 0x50, 0x52, 0x4f, 0x43, 0x45, 0x53, 0x53, 0x5f, 0x42, 0x55, 0x4e, 0x44, 0x4c, 0x45, 0x5f, 0x4d, + 0x53, 0x45, 0x43, 0x53, 0x10, 0x0d, 0x1a, 0xc4, 0x01, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0xbd, 0x01, + 0x0a, 0x38, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x70, 0x61, 0x72, 0x64, 0x6f, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x74, 0x69, - 0x6d, 0x65, 0x3a, 0x66, 0x69, 0x6e, 0x69, 0x73, 0x68, 0x5f, 0x62, 0x75, 0x6e, 0x64, 0x6c, 0x65, - 0x5f, 0x6d, 0x73, 0x65, 0x63, 0x73, 0x3a, 0x76, 0x31, 0x12, 0x19, 0x62, 0x65, 0x61, 0x6d, 0x3a, - 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x73, 0x75, 0x6d, 0x5f, 0x69, 0x6e, 0x74, 0x36, - 0x34, 0x3a, 0x76, 0x31, 0x1a, 0x0a, 0x50, 0x54, 0x52, 0x41, 0x4e, 0x53, 0x46, 0x4f, 0x52, 0x4d, - 0x22, 0x5a, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x12, - 0x4b, 0x54, 0x68, 0x65, 0x20, 0x74, 0x6f, 0x74, 0x61, 0x6c, 0x20, 0x65, 0x73, 0x74, 0x69, 0x6d, - 0x61, 0x74, 0x65, 0x64, 0x20, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x20, 0x74, - 0x69, 0x6d, 0x65, 0x20, 0x6f, 0x66, 0x20, 0x74, 0x68, 0x65, 0x20, 0x66, 0x69, 0x6e, 0x69, 0x73, - 0x68, 0x20, 0x62, 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x20, 0x66, 0x75, 0x6e, 0x63, 0x74, 0x69, 0x6f, - 0x6e, 0x20, 0x69, 0x6e, 0x20, 0x61, 0x20, 0x70, 0x61, 0x72, 0x64, 0x6f, 0x12, 0xbb, 0x01, 0x0a, - 0x0b, 0x54, 0x4f, 0x54, 0x41, 0x4c, 0x5f, 0x4d, 0x53, 0x45, 0x43, 0x53, 0x10, 0x0f, 0x1a, 0xa9, - 0x01, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0xa2, 0x01, 0x0a, 0x34, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, - 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x70, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, - 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x74, 0x69, 0x6d, 0x65, 0x3a, - 0x74, 0x6f, 0x74, 0x61, 0x6c, 0x5f, 0x6d, 0x73, 0x65, 0x63, 0x73, 0x3a, 0x76, 0x31, 0x12, 0x19, - 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x73, 0x75, 0x6d, - 0x5f, 0x69, 0x6e, 0x74, 0x36, 0x34, 0x3a, 0x76, 0x31, 0x1a, 0x0a, 0x50, 0x54, 0x52, 0x41, 0x4e, - 0x53, 0x46, 0x4f, 0x52, 0x4d, 0x22, 0x43, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, - 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x34, 0x54, 0x68, 0x65, 0x20, 0x74, 0x6f, 0x74, 0x61, 0x6c, 0x20, - 0x65, 0x73, 0x74, 0x69, 0x6d, 0x61, 0x74, 0x65, 0x64, 0x20, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, - 0x69, 0x6f, 0x6e, 0x20, 0x74, 0x69, 0x6d, 0x65, 0x20, 0x6f, 0x66, 0x20, 0x74, 0x68, 0x65, 0x20, - 0x70, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x12, 0x9f, 0x02, 0x0a, 0x0e, 0x57, - 0x4f, 0x52, 0x4b, 0x5f, 0x52, 0x45, 0x4d, 0x41, 0x49, 0x4e, 0x49, 0x4e, 0x47, 0x10, 0x10, 0x1a, - 0x8a, 0x02, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0x83, 0x02, 0x0a, 0x2c, 0x62, 0x65, 0x61, 0x6d, 0x3a, + 0x6d, 0x65, 0x3a, 0x70, 0x72, 0x6f, 0x63, 0x65, 0x73, 
0x73, 0x5f, 0x62, 0x75, 0x6e, 0x64, 0x6c, + 0x65, 0x5f, 0x6d, 0x73, 0x65, 0x63, 0x73, 0x3a, 0x76, 0x31, 0x12, 0x19, 0x62, 0x65, 0x61, 0x6d, + 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x73, 0x75, 0x6d, 0x5f, 0x69, 0x6e, 0x74, + 0x36, 0x34, 0x3a, 0x76, 0x31, 0x1a, 0x0a, 0x50, 0x54, 0x52, 0x41, 0x4e, 0x53, 0x46, 0x4f, 0x52, + 0x4d, 0x22, 0x5a, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, + 0x12, 0x4b, 0x54, 0x68, 0x65, 0x20, 0x74, 0x6f, 0x74, 0x61, 0x6c, 0x20, 0x65, 0x73, 0x74, 0x69, + 0x6d, 0x61, 0x74, 0x65, 0x64, 0x20, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x20, + 0x74, 0x69, 0x6d, 0x65, 0x20, 0x6f, 0x66, 0x20, 0x74, 0x68, 0x65, 0x20, 0x70, 0x72, 0x6f, 0x63, + 0x65, 0x73, 0x73, 0x20, 0x62, 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x66, 0x75, 0x6e, 0x63, 0x74, 0x69, + 0x6f, 0x6e, 0x20, 0x69, 0x6e, 0x20, 0x61, 0x20, 0x70, 0x61, 0x72, 0x64, 0x6f, 0x12, 0xdd, 0x01, + 0x0a, 0x13, 0x46, 0x49, 0x4e, 0x49, 0x53, 0x48, 0x5f, 0x42, 0x55, 0x4e, 0x44, 0x4c, 0x45, 0x5f, + 0x4d, 0x53, 0x45, 0x43, 0x53, 0x10, 0x0e, 0x1a, 0xc3, 0x01, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0xbc, + 0x01, 0x0a, 0x37, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x70, + 0x61, 0x72, 0x64, 0x6f, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x74, + 0x69, 0x6d, 0x65, 0x3a, 0x66, 0x69, 0x6e, 0x69, 0x73, 0x68, 0x5f, 0x62, 0x75, 0x6e, 0x64, 0x6c, + 0x65, 0x5f, 0x6d, 0x73, 0x65, 0x63, 0x73, 0x3a, 0x76, 0x31, 0x12, 0x19, 0x62, 0x65, 0x61, 0x6d, + 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x73, 0x75, 0x6d, 0x5f, 0x69, 0x6e, 0x74, + 0x36, 0x34, 0x3a, 0x76, 0x31, 0x1a, 0x0a, 0x50, 0x54, 0x52, 0x41, 0x4e, 0x53, 0x46, 0x4f, 0x52, + 0x4d, 0x22, 0x5a, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, + 0x12, 0x4b, 0x54, 0x68, 0x65, 0x20, 0x74, 0x6f, 0x74, 0x61, 0x6c, 0x20, 0x65, 0x73, 0x74, 0x69, + 0x6d, 0x61, 0x74, 0x65, 0x64, 0x20, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x20, + 0x74, 0x69, 0x6d, 0x65, 0x20, 0x6f, 0x66, 0x20, 0x74, 0x68, 0x65, 0x20, 0x66, 0x69, 0x6e, 0x69, + 0x73, 0x68, 0x20, 0x62, 0x75, 0x6e, 0x64, 0x6c, 0x65, 0x20, 0x66, 0x75, 0x6e, 0x63, 0x74, 0x69, + 0x6f, 0x6e, 0x20, 0x69, 0x6e, 0x20, 0x61, 0x20, 0x70, 0x61, 0x72, 0x64, 0x6f, 0x12, 0xbb, 0x01, + 0x0a, 0x0b, 0x54, 0x4f, 0x54, 0x41, 0x4c, 0x5f, 0x4d, 0x53, 0x45, 0x43, 0x53, 0x10, 0x0f, 0x1a, + 0xa9, 0x01, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0xa2, 0x01, 0x0a, 0x34, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x70, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, - 0x6d, 0x5f, 0x70, 0x72, 0x6f, 0x67, 0x72, 0x65, 0x73, 0x73, 0x3a, 0x72, 0x65, 0x6d, 0x61, 0x69, - 0x6e, 0x69, 0x6e, 0x67, 0x3a, 0x76, 0x31, 0x12, 0x18, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, - 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x70, 0x72, 0x6f, 0x67, 0x72, 0x65, 0x73, 0x73, 0x3a, 0x76, - 0x31, 0x1a, 0x0a, 0x50, 0x54, 0x52, 0x41, 0x4e, 0x53, 0x46, 0x4f, 0x52, 0x4d, 0x22, 0xac, 0x01, - 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x9c, 0x01, - 0x54, 0x68, 0x65, 0x20, 0x72, 0x65, 0x6d, 0x61, 0x69, 0x6e, 0x69, 0x6e, 0x67, 0x20, 0x61, 0x6d, - 0x6f, 0x75, 0x6e, 0x74, 0x20, 0x6f, 0x66, 0x20, 0x77, 0x6f, 0x72, 0x6b, 0x20, 0x66, 0x6f, 0x72, - 0x20, 0x65, 0x61, 0x63, 0x68, 0x20, 0x61, 0x63, 0x74, 0x69, 0x76, 0x65, 0x20, 0x65, 0x6c, 0x65, - 0x6d, 0x65, 0x6e, 0x74, 0x2e, 0x20, 0x45, 0x61, 0x63, 0x68, 0x20, 0x61, 0x63, 0x74, 0x69, 0x76, - 0x65, 0x20, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x20, 0x72, 0x65, 0x70, 
0x72, 0x65, 0x73, - 0x65, 0x6e, 0x74, 0x73, 0x20, 0x61, 0x6e, 0x20, 0x69, 0x6e, 0x64, 0x65, 0x70, 0x65, 0x6e, 0x64, - 0x65, 0x6e, 0x74, 0x20, 0x61, 0x6d, 0x6f, 0x75, 0x6e, 0x74, 0x20, 0x6f, 0x66, 0x20, 0x77, 0x6f, - 0x72, 0x6b, 0x20, 0x6e, 0x6f, 0x74, 0x20, 0x73, 0x68, 0x61, 0x72, 0x65, 0x64, 0x20, 0x77, 0x69, - 0x74, 0x68, 0x20, 0x61, 0x6e, 0x79, 0x20, 0x6f, 0x74, 0x68, 0x65, 0x72, 0x20, 0x61, 0x63, 0x74, - 0x69, 0x76, 0x65, 0x20, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x2e, 0x12, 0x9f, 0x02, 0x0a, - 0x0e, 0x57, 0x4f, 0x52, 0x4b, 0x5f, 0x43, 0x4f, 0x4d, 0x50, 0x4c, 0x45, 0x54, 0x45, 0x44, 0x10, - 0x11, 0x1a, 0x8a, 0x02, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0x83, 0x02, 0x0a, 0x2c, 0x62, 0x65, 0x61, - 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x70, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, - 0x6f, 0x72, 0x6d, 0x5f, 0x70, 0x72, 0x6f, 0x67, 0x72, 0x65, 0x73, 0x73, 0x3a, 0x63, 0x6f, 0x6d, - 0x70, 0x6c, 0x65, 0x74, 0x65, 0x64, 0x3a, 0x76, 0x31, 0x12, 0x18, 0x62, 0x65, 0x61, 0x6d, 0x3a, - 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x70, 0x72, 0x6f, 0x67, 0x72, 0x65, 0x73, 0x73, - 0x3a, 0x76, 0x31, 0x1a, 0x0a, 0x50, 0x54, 0x52, 0x41, 0x4e, 0x53, 0x46, 0x4f, 0x52, 0x4d, 0x22, - 0xac, 0x01, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x12, - 0x9c, 0x01, 0x54, 0x68, 0x65, 0x20, 0x72, 0x65, 0x6d, 0x61, 0x69, 0x6e, 0x69, 0x6e, 0x67, 0x20, - 0x61, 0x6d, 0x6f, 0x75, 0x6e, 0x74, 0x20, 0x6f, 0x66, 0x20, 0x77, 0x6f, 0x72, 0x6b, 0x20, 0x66, - 0x6f, 0x72, 0x20, 0x65, 0x61, 0x63, 0x68, 0x20, 0x61, 0x63, 0x74, 0x69, 0x76, 0x65, 0x20, 0x65, - 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x2e, 0x20, 0x45, 0x61, 0x63, 0x68, 0x20, 0x61, 0x63, 0x74, - 0x69, 0x76, 0x65, 0x20, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x20, 0x72, 0x65, 0x70, 0x72, - 0x65, 0x73, 0x65, 0x6e, 0x74, 0x73, 0x20, 0x61, 0x6e, 0x20, 0x69, 0x6e, 0x64, 0x65, 0x70, 0x65, - 0x6e, 0x64, 0x65, 0x6e, 0x74, 0x20, 0x61, 0x6d, 0x6f, 0x75, 0x6e, 0x74, 0x20, 0x6f, 0x66, 0x20, - 0x77, 0x6f, 0x72, 0x6b, 0x20, 0x6e, 0x6f, 0x74, 0x20, 0x73, 0x68, 0x61, 0x72, 0x65, 0x64, 0x20, - 0x77, 0x69, 0x74, 0x68, 0x20, 0x61, 0x6e, 0x79, 0x20, 0x6f, 0x74, 0x68, 0x65, 0x72, 0x20, 0x61, - 0x63, 0x74, 0x69, 0x76, 0x65, 0x20, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x2e, 0x12, 0xa8, - 0x01, 0x0a, 0x17, 0x44, 0x41, 0x54, 0x41, 0x5f, 0x43, 0x48, 0x41, 0x4e, 0x4e, 0x45, 0x4c, 0x5f, - 0x52, 0x45, 0x41, 0x44, 0x5f, 0x49, 0x4e, 0x44, 0x45, 0x58, 0x10, 0x12, 0x1a, 0x8a, 0x01, 0xd2, - 0xa7, 0xa7, 0x96, 0x06, 0x83, 0x01, 0x0a, 0x26, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, - 0x72, 0x69, 0x63, 0x3a, 0x64, 0x61, 0x74, 0x61, 0x5f, 0x63, 0x68, 0x61, 0x6e, 0x6e, 0x65, 0x6c, - 0x3a, 0x72, 0x65, 0x61, 0x64, 0x5f, 0x69, 0x6e, 0x64, 0x65, 0x78, 0x3a, 0x76, 0x31, 0x12, 0x19, - 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x73, 0x75, 0x6d, - 0x5f, 0x69, 0x6e, 0x74, 0x36, 0x34, 0x3a, 0x76, 0x31, 0x1a, 0x0a, 0x50, 0x54, 0x52, 0x41, 0x4e, - 0x53, 0x46, 0x4f, 0x52, 0x4d, 0x22, 0x32, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, - 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x23, 0x54, 0x68, 0x65, 0x20, 0x72, 0x65, 0x61, 0x64, 0x20, 0x69, - 0x6e, 0x64, 0x65, 0x78, 0x20, 0x6f, 0x66, 0x20, 0x74, 0x68, 0x65, 0x20, 0x64, 0x61, 0x74, 0x61, - 0x20, 0x63, 0x68, 0x61, 0x6e, 0x6e, 0x65, 0x6c, 0x2e, 0x22, 0x2e, 0x0a, 0x18, 0x4d, 0x6f, 0x6e, - 0x69, 0x74, 0x6f, 0x72, 0x69, 0x6e, 0x67, 0x49, 0x6e, 0x66, 0x6f, 0x4c, 0x61, 0x62, 0x65, 0x6c, - 0x50, 0x72, 0x6f, 0x70, 0x73, 0x12, 0x12, 0x0a, 0x04, 0x6e, 0x61, 0x6d, 0x65, 0x18, 0x01, 0x20, - 0x01, 
0x28, 0x09, 0x52, 0x04, 0x6e, 0x61, 0x6d, 0x65, 0x22, 0xf2, 0x03, 0x0a, 0x0e, 0x4d, 0x6f, - 0x6e, 0x69, 0x74, 0x6f, 0x72, 0x69, 0x6e, 0x67, 0x49, 0x6e, 0x66, 0x6f, 0x12, 0x10, 0x0a, 0x03, - 0x75, 0x72, 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x75, 0x72, 0x6e, 0x12, 0x12, - 0x0a, 0x04, 0x74, 0x79, 0x70, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x04, 0x74, 0x79, - 0x70, 0x65, 0x12, 0x18, 0x0a, 0x07, 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x18, 0x03, 0x20, - 0x01, 0x28, 0x0c, 0x52, 0x07, 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x55, 0x0a, 0x06, - 0x6c, 0x61, 0x62, 0x65, 0x6c, 0x73, 0x18, 0x04, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x3d, 0x2e, 0x6f, - 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, - 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, - 0x2e, 0x4d, 0x6f, 0x6e, 0x69, 0x74, 0x6f, 0x72, 0x69, 0x6e, 0x67, 0x49, 0x6e, 0x66, 0x6f, 0x2e, - 0x4c, 0x61, 0x62, 0x65, 0x6c, 0x73, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x52, 0x06, 0x6c, 0x61, 0x62, - 0x65, 0x6c, 0x73, 0x1a, 0x39, 0x0a, 0x0b, 0x4c, 0x61, 0x62, 0x65, 0x6c, 0x73, 0x45, 0x6e, 0x74, - 0x72, 0x79, 0x12, 0x10, 0x0a, 0x03, 0x6b, 0x65, 0x79, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, - 0x03, 0x6b, 0x65, 0x79, 0x12, 0x14, 0x0a, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x18, 0x02, 0x20, - 0x01, 0x28, 0x09, 0x52, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x3a, 0x02, 0x38, 0x01, 0x22, 0x8d, - 0x02, 0x0a, 0x14, 0x4d, 0x6f, 0x6e, 0x69, 0x74, 0x6f, 0x72, 0x69, 0x6e, 0x67, 0x49, 0x6e, 0x66, - 0x6f, 0x4c, 0x61, 0x62, 0x65, 0x6c, 0x73, 0x12, 0x21, 0x0a, 0x09, 0x54, 0x52, 0x41, 0x4e, 0x53, - 0x46, 0x4f, 0x52, 0x4d, 0x10, 0x00, 0x1a, 0x12, 0xa2, 0xd4, 0xe0, 0xe5, 0x03, 0x0c, 0x0a, 0x0a, - 0x50, 0x54, 0x52, 0x41, 0x4e, 0x53, 0x46, 0x4f, 0x52, 0x4d, 0x12, 0x24, 0x0a, 0x0b, 0x50, 0x43, - 0x4f, 0x4c, 0x4c, 0x45, 0x43, 0x54, 0x49, 0x4f, 0x4e, 0x10, 0x01, 0x1a, 0x13, 0xa2, 0xd4, 0xe0, - 0xe5, 0x03, 0x0d, 0x0a, 0x0b, 0x50, 0x43, 0x4f, 0x4c, 0x4c, 0x45, 0x43, 0x54, 0x49, 0x4f, 0x4e, - 0x12, 0x32, 0x0a, 0x12, 0x57, 0x49, 0x4e, 0x44, 0x4f, 0x57, 0x49, 0x4e, 0x47, 0x5f, 0x53, 0x54, - 0x52, 0x41, 0x54, 0x45, 0x47, 0x59, 0x10, 0x02, 0x1a, 0x1a, 0xa2, 0xd4, 0xe0, 0xe5, 0x03, 0x14, - 0x0a, 0x12, 0x57, 0x49, 0x4e, 0x44, 0x4f, 0x57, 0x49, 0x4e, 0x47, 0x5f, 0x53, 0x54, 0x52, 0x41, - 0x54, 0x45, 0x47, 0x59, 0x12, 0x18, 0x0a, 0x05, 0x43, 0x4f, 0x44, 0x45, 0x52, 0x10, 0x03, 0x1a, - 0x0d, 0xa2, 0xd4, 0xe0, 0xe5, 0x03, 0x07, 0x0a, 0x05, 0x43, 0x4f, 0x44, 0x45, 0x52, 0x12, 0x24, - 0x0a, 0x0b, 0x45, 0x4e, 0x56, 0x49, 0x52, 0x4f, 0x4e, 0x4d, 0x45, 0x4e, 0x54, 0x10, 0x04, 0x1a, - 0x13, 0xa2, 0xd4, 0xe0, 0xe5, 0x03, 0x0d, 0x0a, 0x0b, 0x45, 0x4e, 0x56, 0x49, 0x52, 0x4f, 0x4e, - 0x4d, 0x45, 0x4e, 0x54, 0x12, 0x20, 0x0a, 0x09, 0x4e, 0x41, 0x4d, 0x45, 0x53, 0x50, 0x41, 0x43, - 0x45, 0x10, 0x05, 0x1a, 0x11, 0xa2, 0xd4, 0xe0, 0xe5, 0x03, 0x0b, 0x0a, 0x09, 0x4e, 0x41, 0x4d, - 0x45, 0x53, 0x50, 0x41, 0x43, 0x45, 0x12, 0x16, 0x0a, 0x04, 0x4e, 0x41, 0x4d, 0x45, 0x10, 0x06, - 0x1a, 0x0c, 0xa2, 0xd4, 0xe0, 0xe5, 0x03, 0x06, 0x0a, 0x04, 0x4e, 0x41, 0x4d, 0x45, 0x22, 0xbc, - 0x05, 0x0a, 0x16, 0x4d, 0x6f, 0x6e, 0x69, 0x74, 0x6f, 0x72, 0x69, 0x6e, 0x67, 0x49, 0x6e, 0x66, - 0x6f, 0x54, 0x79, 0x70, 0x65, 0x55, 0x72, 0x6e, 0x73, 0x22, 0xa1, 0x05, 0x0a, 0x04, 0x45, 0x6e, - 0x75, 0x6d, 0x12, 0x33, 0x0a, 0x0e, 0x53, 0x55, 0x4d, 0x5f, 0x49, 0x4e, 0x54, 0x36, 0x34, 0x5f, - 0x54, 0x59, 0x50, 0x45, 0x10, 0x00, 0x1a, 0x1f, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x19, 0x62, 0x65, - 0x61, 0x6d, 0x3a, 0x6d, 0x65, 
0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x73, 0x75, 0x6d, 0x5f, 0x69, - 0x6e, 0x74, 0x36, 0x34, 0x3a, 0x76, 0x31, 0x12, 0x35, 0x0a, 0x0f, 0x53, 0x55, 0x4d, 0x5f, 0x44, - 0x4f, 0x55, 0x42, 0x4c, 0x45, 0x5f, 0x54, 0x59, 0x50, 0x45, 0x10, 0x01, 0x1a, 0x20, 0xa2, 0xb4, - 0xfa, 0xc2, 0x05, 0x1a, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, - 0x3a, 0x73, 0x75, 0x6d, 0x5f, 0x64, 0x6f, 0x75, 0x62, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x45, - 0x0a, 0x17, 0x44, 0x49, 0x53, 0x54, 0x52, 0x49, 0x42, 0x55, 0x54, 0x49, 0x4f, 0x4e, 0x5f, 0x49, - 0x4e, 0x54, 0x36, 0x34, 0x5f, 0x54, 0x59, 0x50, 0x45, 0x10, 0x02, 0x1a, 0x28, 0xa2, 0xb4, 0xfa, - 0xc2, 0x05, 0x22, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, - 0x64, 0x69, 0x73, 0x74, 0x72, 0x69, 0x62, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x69, 0x6e, 0x74, - 0x36, 0x34, 0x3a, 0x76, 0x31, 0x12, 0x47, 0x0a, 0x18, 0x44, 0x49, 0x53, 0x54, 0x52, 0x49, 0x42, - 0x55, 0x54, 0x49, 0x4f, 0x4e, 0x5f, 0x44, 0x4f, 0x55, 0x42, 0x4c, 0x45, 0x5f, 0x54, 0x59, 0x50, - 0x45, 0x10, 0x03, 0x1a, 0x29, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x23, 0x62, 0x65, 0x61, 0x6d, 0x3a, - 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x64, 0x69, 0x73, 0x74, 0x72, 0x69, 0x62, 0x75, - 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x64, 0x6f, 0x75, 0x62, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x39, - 0x0a, 0x11, 0x4c, 0x41, 0x54, 0x45, 0x53, 0x54, 0x5f, 0x49, 0x4e, 0x54, 0x36, 0x34, 0x5f, 0x54, - 0x59, 0x50, 0x45, 0x10, 0x04, 0x1a, 0x22, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1c, 0x62, 0x65, 0x61, - 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x6c, 0x61, 0x74, 0x65, 0x73, 0x74, - 0x5f, 0x69, 0x6e, 0x74, 0x36, 0x34, 0x3a, 0x76, 0x31, 0x12, 0x3b, 0x0a, 0x12, 0x4c, 0x41, 0x54, - 0x45, 0x53, 0x54, 0x5f, 0x44, 0x4f, 0x55, 0x42, 0x4c, 0x45, 0x5f, 0x54, 0x59, 0x50, 0x45, 0x10, - 0x05, 0x1a, 0x23, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1d, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, - 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x6c, 0x61, 0x74, 0x65, 0x73, 0x74, 0x5f, 0x64, 0x6f, 0x75, - 0x62, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x37, 0x0a, 0x10, 0x54, 0x4f, 0x50, 0x5f, 0x4e, 0x5f, - 0x49, 0x4e, 0x54, 0x36, 0x34, 0x5f, 0x54, 0x59, 0x50, 0x45, 0x10, 0x06, 0x1a, 0x21, 0xa2, 0xb4, - 0xfa, 0xc2, 0x05, 0x1b, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, - 0x3a, 0x74, 0x6f, 0x70, 0x5f, 0x6e, 0x5f, 0x69, 0x6e, 0x74, 0x36, 0x34, 0x3a, 0x76, 0x31, 0x12, - 0x39, 0x0a, 0x11, 0x54, 0x4f, 0x50, 0x5f, 0x4e, 0x5f, 0x44, 0x4f, 0x55, 0x42, 0x4c, 0x45, 0x5f, - 0x54, 0x59, 0x50, 0x45, 0x10, 0x07, 0x1a, 0x22, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1c, 0x62, 0x65, - 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x74, 0x6f, 0x70, 0x5f, 0x6e, - 0x5f, 0x64, 0x6f, 0x75, 0x62, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x3d, 0x0a, 0x13, 0x42, 0x4f, - 0x54, 0x54, 0x4f, 0x4d, 0x5f, 0x4e, 0x5f, 0x49, 0x4e, 0x54, 0x36, 0x34, 0x5f, 0x54, 0x59, 0x50, - 0x45, 0x10, 0x08, 0x1a, 0x24, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1e, 0x62, 0x65, 0x61, 0x6d, 0x3a, - 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x62, 0x6f, 0x74, 0x74, 0x6f, 0x6d, 0x5f, 0x6e, - 0x5f, 0x69, 0x6e, 0x74, 0x36, 0x34, 0x3a, 0x76, 0x31, 0x12, 0x3f, 0x0a, 0x14, 0x42, 0x4f, 0x54, - 0x54, 0x4f, 0x4d, 0x5f, 0x4e, 0x5f, 0x44, 0x4f, 0x55, 0x42, 0x4c, 0x45, 0x5f, 0x54, 0x59, 0x50, - 0x45, 0x10, 0x09, 0x1a, 0x25, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1f, 0x62, 0x65, 0x61, 0x6d, 0x3a, - 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x62, 0x6f, 0x74, 0x74, 0x6f, 0x6d, 0x5f, 0x6e, - 0x5f, 0x64, 0x6f, 0x75, 0x62, 0x6c, 0x65, 0x3a, 0x76, 
0x31, 0x12, 0x31, 0x0a, 0x0d, 0x50, 0x52, - 0x4f, 0x47, 0x52, 0x45, 0x53, 0x53, 0x5f, 0x54, 0x59, 0x50, 0x45, 0x10, 0x0a, 0x1a, 0x1e, 0xa2, - 0xb4, 0xfa, 0xc2, 0x05, 0x18, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, - 0x73, 0x3a, 0x70, 0x72, 0x6f, 0x67, 0x72, 0x65, 0x73, 0x73, 0x3a, 0x76, 0x31, 0x3a, 0x82, 0x01, - 0x0a, 0x0b, 0x6c, 0x61, 0x62, 0x65, 0x6c, 0x5f, 0x70, 0x72, 0x6f, 0x70, 0x73, 0x12, 0x21, 0x2e, - 0x67, 0x6f, 0x6f, 0x67, 0x6c, 0x65, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x62, 0x75, 0x66, 0x2e, - 0x45, 0x6e, 0x75, 0x6d, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x4f, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x73, - 0x18, 0xc4, 0x8a, 0xdc, 0x3c, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x3b, 0x2e, 0x6f, 0x72, 0x67, 0x2e, - 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, - 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x4d, 0x6f, - 0x6e, 0x69, 0x74, 0x6f, 0x72, 0x69, 0x6e, 0x67, 0x49, 0x6e, 0x66, 0x6f, 0x4c, 0x61, 0x62, 0x65, - 0x6c, 0x50, 0x72, 0x6f, 0x70, 0x73, 0x52, 0x0a, 0x6c, 0x61, 0x62, 0x65, 0x6c, 0x50, 0x72, 0x6f, - 0x70, 0x73, 0x3a, 0x8d, 0x01, 0x0a, 0x14, 0x6d, 0x6f, 0x6e, 0x69, 0x74, 0x6f, 0x72, 0x69, 0x6e, - 0x67, 0x5f, 0x69, 0x6e, 0x66, 0x6f, 0x5f, 0x73, 0x70, 0x65, 0x63, 0x12, 0x21, 0x2e, 0x67, 0x6f, - 0x6f, 0x67, 0x6c, 0x65, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x62, 0x75, 0x66, 0x2e, 0x45, 0x6e, - 0x75, 0x6d, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x4f, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x73, 0x18, 0xfa, - 0xf4, 0xe4, 0x62, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x35, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, - 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, - 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x4d, 0x6f, 0x6e, 0x69, - 0x74, 0x6f, 0x72, 0x69, 0x6e, 0x67, 0x49, 0x6e, 0x66, 0x6f, 0x53, 0x70, 0x65, 0x63, 0x52, 0x12, - 0x6d, 0x6f, 0x6e, 0x69, 0x74, 0x6f, 0x72, 0x69, 0x6e, 0x67, 0x49, 0x6e, 0x66, 0x6f, 0x53, 0x70, - 0x65, 0x63, 0x42, 0x76, 0x0a, 0x21, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, - 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, - 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x42, 0x0a, 0x4d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, - 0x41, 0x70, 0x69, 0x5a, 0x45, 0x67, 0x69, 0x74, 0x68, 0x75, 0x62, 0x2e, 0x63, 0x6f, 0x6d, 0x2f, - 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x73, 0x64, 0x6b, 0x73, - 0x2f, 0x67, 0x6f, 0x2f, 0x70, 0x6b, 0x67, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x6d, 0x6f, 0x64, - 0x65, 0x6c, 0x2f, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x5f, 0x76, 0x31, 0x3b, 0x70, - 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x5f, 0x76, 0x31, 0x62, 0x06, 0x70, 0x72, 0x6f, 0x74, - 0x6f, 0x33, + 0x6d, 0x5f, 0x65, 0x78, 0x65, 0x63, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x74, 0x69, 0x6d, 0x65, + 0x3a, 0x74, 0x6f, 0x74, 0x61, 0x6c, 0x5f, 0x6d, 0x73, 0x65, 0x63, 0x73, 0x3a, 0x76, 0x31, 0x12, + 0x19, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x73, 0x75, + 0x6d, 0x5f, 0x69, 0x6e, 0x74, 0x36, 0x34, 0x3a, 0x76, 0x31, 0x1a, 0x0a, 0x50, 0x54, 0x52, 0x41, + 0x4e, 0x53, 0x46, 0x4f, 0x52, 0x4d, 0x22, 0x43, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, + 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x34, 0x54, 0x68, 0x65, 0x20, 0x74, 0x6f, 0x74, 0x61, 0x6c, + 0x20, 0x65, 0x73, 0x74, 0x69, 0x6d, 0x61, 0x74, 0x65, 0x64, 0x20, 0x65, 0x78, 0x65, 0x63, 0x75, + 0x74, 0x69, 0x6f, 0x6e, 0x20, 0x74, 0x69, 0x6d, 0x65, 0x20, 0x6f, 
0x66, 0x20, 0x74, 0x68, 0x65, + 0x20, 0x70, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, 0x72, 0x6d, 0x12, 0x9f, 0x02, 0x0a, 0x0e, + 0x57, 0x4f, 0x52, 0x4b, 0x5f, 0x52, 0x45, 0x4d, 0x41, 0x49, 0x4e, 0x49, 0x4e, 0x47, 0x10, 0x10, + 0x1a, 0x8a, 0x02, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0x83, 0x02, 0x0a, 0x2c, 0x62, 0x65, 0x61, 0x6d, + 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x70, 0x74, 0x72, 0x61, 0x6e, 0x73, 0x66, 0x6f, + 0x72, 0x6d, 0x5f, 0x70, 0x72, 0x6f, 0x67, 0x72, 0x65, 0x73, 0x73, 0x3a, 0x72, 0x65, 0x6d, 0x61, + 0x69, 0x6e, 0x69, 0x6e, 0x67, 0x3a, 0x76, 0x31, 0x12, 0x18, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, + 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x70, 0x72, 0x6f, 0x67, 0x72, 0x65, 0x73, 0x73, 0x3a, + 0x76, 0x31, 0x1a, 0x0a, 0x50, 0x54, 0x52, 0x41, 0x4e, 0x53, 0x46, 0x4f, 0x52, 0x4d, 0x22, 0xac, + 0x01, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x9c, + 0x01, 0x54, 0x68, 0x65, 0x20, 0x72, 0x65, 0x6d, 0x61, 0x69, 0x6e, 0x69, 0x6e, 0x67, 0x20, 0x61, + 0x6d, 0x6f, 0x75, 0x6e, 0x74, 0x20, 0x6f, 0x66, 0x20, 0x77, 0x6f, 0x72, 0x6b, 0x20, 0x66, 0x6f, + 0x72, 0x20, 0x65, 0x61, 0x63, 0x68, 0x20, 0x61, 0x63, 0x74, 0x69, 0x76, 0x65, 0x20, 0x65, 0x6c, + 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x2e, 0x20, 0x45, 0x61, 0x63, 0x68, 0x20, 0x61, 0x63, 0x74, 0x69, + 0x76, 0x65, 0x20, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x20, 0x72, 0x65, 0x70, 0x72, 0x65, + 0x73, 0x65, 0x6e, 0x74, 0x73, 0x20, 0x61, 0x6e, 0x20, 0x69, 0x6e, 0x64, 0x65, 0x70, 0x65, 0x6e, + 0x64, 0x65, 0x6e, 0x74, 0x20, 0x61, 0x6d, 0x6f, 0x75, 0x6e, 0x74, 0x20, 0x6f, 0x66, 0x20, 0x77, + 0x6f, 0x72, 0x6b, 0x20, 0x6e, 0x6f, 0x74, 0x20, 0x73, 0x68, 0x61, 0x72, 0x65, 0x64, 0x20, 0x77, + 0x69, 0x74, 0x68, 0x20, 0x61, 0x6e, 0x79, 0x20, 0x6f, 0x74, 0x68, 0x65, 0x72, 0x20, 0x61, 0x63, + 0x74, 0x69, 0x76, 0x65, 0x20, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x2e, 0x12, 0x9f, 0x02, + 0x0a, 0x0e, 0x57, 0x4f, 0x52, 0x4b, 0x5f, 0x43, 0x4f, 0x4d, 0x50, 0x4c, 0x45, 0x54, 0x45, 0x44, + 0x10, 0x11, 0x1a, 0x8a, 0x02, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0x83, 0x02, 0x0a, 0x2c, 0x62, 0x65, + 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x70, 0x74, 0x72, 0x61, 0x6e, 0x73, + 0x66, 0x6f, 0x72, 0x6d, 0x5f, 0x70, 0x72, 0x6f, 0x67, 0x72, 0x65, 0x73, 0x73, 0x3a, 0x63, 0x6f, + 0x6d, 0x70, 0x6c, 0x65, 0x74, 0x65, 0x64, 0x3a, 0x76, 0x31, 0x12, 0x18, 0x62, 0x65, 0x61, 0x6d, + 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x70, 0x72, 0x6f, 0x67, 0x72, 0x65, 0x73, + 0x73, 0x3a, 0x76, 0x31, 0x1a, 0x0a, 0x50, 0x54, 0x52, 0x41, 0x4e, 0x53, 0x46, 0x4f, 0x52, 0x4d, + 0x22, 0xac, 0x01, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, + 0x12, 0x9c, 0x01, 0x54, 0x68, 0x65, 0x20, 0x72, 0x65, 0x6d, 0x61, 0x69, 0x6e, 0x69, 0x6e, 0x67, + 0x20, 0x61, 0x6d, 0x6f, 0x75, 0x6e, 0x74, 0x20, 0x6f, 0x66, 0x20, 0x77, 0x6f, 0x72, 0x6b, 0x20, + 0x66, 0x6f, 0x72, 0x20, 0x65, 0x61, 0x63, 0x68, 0x20, 0x61, 0x63, 0x74, 0x69, 0x76, 0x65, 0x20, + 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x2e, 0x20, 0x45, 0x61, 0x63, 0x68, 0x20, 0x61, 0x63, + 0x74, 0x69, 0x76, 0x65, 0x20, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x20, 0x72, 0x65, 0x70, + 0x72, 0x65, 0x73, 0x65, 0x6e, 0x74, 0x73, 0x20, 0x61, 0x6e, 0x20, 0x69, 0x6e, 0x64, 0x65, 0x70, + 0x65, 0x6e, 0x64, 0x65, 0x6e, 0x74, 0x20, 0x61, 0x6d, 0x6f, 0x75, 0x6e, 0x74, 0x20, 0x6f, 0x66, + 0x20, 0x77, 0x6f, 0x72, 0x6b, 0x20, 0x6e, 0x6f, 0x74, 0x20, 0x73, 0x68, 0x61, 0x72, 0x65, 0x64, + 0x20, 0x77, 0x69, 0x74, 0x68, 0x20, 0x61, 0x6e, 0x79, 0x20, 0x6f, 0x74, 0x68, 0x65, 0x72, 
0x20, + 0x61, 0x63, 0x74, 0x69, 0x76, 0x65, 0x20, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x2e, 0x12, + 0xa8, 0x01, 0x0a, 0x17, 0x44, 0x41, 0x54, 0x41, 0x5f, 0x43, 0x48, 0x41, 0x4e, 0x4e, 0x45, 0x4c, + 0x5f, 0x52, 0x45, 0x41, 0x44, 0x5f, 0x49, 0x4e, 0x44, 0x45, 0x58, 0x10, 0x12, 0x1a, 0x8a, 0x01, + 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0x83, 0x01, 0x0a, 0x26, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, + 0x74, 0x72, 0x69, 0x63, 0x3a, 0x64, 0x61, 0x74, 0x61, 0x5f, 0x63, 0x68, 0x61, 0x6e, 0x6e, 0x65, + 0x6c, 0x3a, 0x72, 0x65, 0x61, 0x64, 0x5f, 0x69, 0x6e, 0x64, 0x65, 0x78, 0x3a, 0x76, 0x31, 0x12, + 0x19, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x73, 0x75, + 0x6d, 0x5f, 0x69, 0x6e, 0x74, 0x36, 0x34, 0x3a, 0x76, 0x31, 0x1a, 0x0a, 0x50, 0x54, 0x52, 0x41, + 0x4e, 0x53, 0x46, 0x4f, 0x52, 0x4d, 0x22, 0x32, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, + 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x23, 0x54, 0x68, 0x65, 0x20, 0x72, 0x65, 0x61, 0x64, 0x20, + 0x69, 0x6e, 0x64, 0x65, 0x78, 0x20, 0x6f, 0x66, 0x20, 0x74, 0x68, 0x65, 0x20, 0x64, 0x61, 0x74, + 0x61, 0x20, 0x63, 0x68, 0x61, 0x6e, 0x6e, 0x65, 0x6c, 0x2e, 0x12, 0x8a, 0x02, 0x0a, 0x11, 0x41, + 0x50, 0x49, 0x5f, 0x52, 0x45, 0x51, 0x55, 0x45, 0x53, 0x54, 0x5f, 0x43, 0x4f, 0x55, 0x4e, 0x54, + 0x10, 0x13, 0x1a, 0xf2, 0x01, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0xeb, 0x01, 0x0a, 0x23, 0x62, 0x65, + 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x69, 0x6f, 0x3a, 0x61, 0x70, 0x69, + 0x5f, 0x72, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x5f, 0x63, 0x6f, 0x75, 0x6e, 0x74, 0x3a, 0x76, + 0x31, 0x12, 0x19, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, + 0x73, 0x75, 0x6d, 0x5f, 0x69, 0x6e, 0x74, 0x36, 0x34, 0x3a, 0x76, 0x31, 0x1a, 0x07, 0x53, 0x45, + 0x52, 0x56, 0x49, 0x43, 0x45, 0x1a, 0x06, 0x4d, 0x45, 0x54, 0x48, 0x4f, 0x44, 0x1a, 0x08, 0x52, + 0x45, 0x53, 0x4f, 0x55, 0x52, 0x43, 0x45, 0x1a, 0x0a, 0x50, 0x54, 0x52, 0x41, 0x4e, 0x53, 0x46, + 0x4f, 0x52, 0x4d, 0x1a, 0x06, 0x53, 0x54, 0x41, 0x54, 0x55, 0x53, 0x22, 0x62, 0x0a, 0x0b, 0x64, + 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x53, 0x52, 0x65, 0x71, 0x75, + 0x65, 0x73, 0x74, 0x20, 0x63, 0x6f, 0x75, 0x6e, 0x74, 0x73, 0x20, 0x77, 0x69, 0x74, 0x68, 0x20, + 0x73, 0x74, 0x61, 0x74, 0x75, 0x73, 0x20, 0x6d, 0x61, 0x64, 0x65, 0x20, 0x74, 0x6f, 0x20, 0x49, + 0x4f, 0x20, 0x73, 0x65, 0x72, 0x76, 0x69, 0x63, 0x65, 0x20, 0x41, 0x50, 0x49, 0x73, 0x20, 0x74, + 0x6f, 0x20, 0x62, 0x61, 0x74, 0x63, 0x68, 0x20, 0x72, 0x65, 0x61, 0x64, 0x20, 0x6f, 0x72, 0x20, + 0x77, 0x72, 0x69, 0x74, 0x65, 0x20, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x73, 0x2e, 0x22, + 0x16, 0x0a, 0x0e, 0x70, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x5f, 0x6d, 0x65, 0x74, 0x72, 0x69, + 0x63, 0x12, 0x04, 0x74, 0x72, 0x75, 0x65, 0x12, 0xb3, 0x02, 0x0a, 0x15, 0x41, 0x50, 0x49, 0x5f, + 0x52, 0x45, 0x51, 0x55, 0x45, 0x53, 0x54, 0x5f, 0x4c, 0x41, 0x54, 0x45, 0x4e, 0x43, 0x49, 0x45, + 0x53, 0x10, 0x14, 0x1a, 0x97, 0x02, 0xd2, 0xa7, 0xa7, 0x96, 0x06, 0x90, 0x02, 0x0a, 0x27, 0x62, + 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x3a, 0x69, 0x6f, 0x3a, 0x61, 0x70, + 0x69, 0x5f, 0x72, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x5f, 0x6c, 0x61, 0x74, 0x65, 0x6e, 0x63, + 0x69, 0x65, 0x73, 0x3a, 0x76, 0x31, 0x12, 0x1f, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, + 0x72, 0x69, 0x63, 0x73, 0x3a, 0x68, 0x69, 0x73, 0x74, 0x6f, 0x67, 0x72, 0x61, 0x6d, 0x5f, 0x69, + 0x6e, 0x74, 0x36, 0x34, 0x3a, 0x76, 0x31, 0x1a, 0x07, 0x53, 0x45, 0x52, 0x56, 0x49, 0x43, 0x45, + 0x1a, 0x06, 0x4d, 
0x45, 0x54, 0x48, 0x4f, 0x44, 0x1a, 0x08, 0x52, 0x45, 0x53, 0x4f, 0x55, 0x52, + 0x43, 0x45, 0x1a, 0x0a, 0x50, 0x54, 0x52, 0x41, 0x4e, 0x53, 0x46, 0x4f, 0x52, 0x4d, 0x22, 0x6e, + 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x5f, 0x48, + 0x69, 0x73, 0x74, 0x6f, 0x67, 0x72, 0x61, 0x6d, 0x20, 0x63, 0x6f, 0x75, 0x6e, 0x74, 0x73, 0x20, + 0x66, 0x6f, 0x72, 0x20, 0x72, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x20, 0x6c, 0x61, 0x74, 0x65, + 0x6e, 0x63, 0x69, 0x65, 0x73, 0x20, 0x6d, 0x61, 0x64, 0x65, 0x20, 0x74, 0x6f, 0x20, 0x49, 0x4f, + 0x20, 0x73, 0x65, 0x72, 0x76, 0x69, 0x63, 0x65, 0x20, 0x41, 0x50, 0x49, 0x73, 0x20, 0x74, 0x6f, + 0x20, 0x62, 0x61, 0x74, 0x63, 0x68, 0x20, 0x72, 0x65, 0x61, 0x64, 0x20, 0x6f, 0x72, 0x20, 0x77, + 0x72, 0x69, 0x74, 0x65, 0x20, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x73, 0x2e, 0x22, 0x15, + 0x0a, 0x05, 0x75, 0x6e, 0x69, 0x74, 0x73, 0x12, 0x0c, 0x4d, 0x69, 0x6c, 0x6c, 0x69, 0x73, 0x65, + 0x63, 0x6f, 0x6e, 0x64, 0x73, 0x22, 0x16, 0x0a, 0x0e, 0x70, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, + 0x5f, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x12, 0x04, 0x74, 0x72, 0x75, 0x65, 0x22, 0x2e, 0x0a, + 0x18, 0x4d, 0x6f, 0x6e, 0x69, 0x74, 0x6f, 0x72, 0x69, 0x6e, 0x67, 0x49, 0x6e, 0x66, 0x6f, 0x4c, + 0x61, 0x62, 0x65, 0x6c, 0x50, 0x72, 0x6f, 0x70, 0x73, 0x12, 0x12, 0x0a, 0x04, 0x6e, 0x61, 0x6d, + 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x04, 0x6e, 0x61, 0x6d, 0x65, 0x22, 0xe5, 0x07, + 0x0a, 0x0e, 0x4d, 0x6f, 0x6e, 0x69, 0x74, 0x6f, 0x72, 0x69, 0x6e, 0x67, 0x49, 0x6e, 0x66, 0x6f, + 0x12, 0x10, 0x0a, 0x03, 0x75, 0x72, 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x75, + 0x72, 0x6e, 0x12, 0x12, 0x0a, 0x04, 0x74, 0x79, 0x70, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, + 0x52, 0x04, 0x74, 0x79, 0x70, 0x65, 0x12, 0x18, 0x0a, 0x07, 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, + 0x64, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x07, 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, + 0x12, 0x55, 0x0a, 0x06, 0x6c, 0x61, 0x62, 0x65, 0x6c, 0x73, 0x18, 0x04, 0x20, 0x03, 0x28, 0x0b, + 0x32, 0x3d, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, + 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, + 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x4d, 0x6f, 0x6e, 0x69, 0x74, 0x6f, 0x72, 0x69, 0x6e, 0x67, 0x49, + 0x6e, 0x66, 0x6f, 0x2e, 0x4c, 0x61, 0x62, 0x65, 0x6c, 0x73, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x52, + 0x06, 0x6c, 0x61, 0x62, 0x65, 0x6c, 0x73, 0x12, 0x39, 0x0a, 0x0a, 0x73, 0x74, 0x61, 0x72, 0x74, + 0x5f, 0x74, 0x69, 0x6d, 0x65, 0x18, 0x05, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x1a, 0x2e, 0x67, 0x6f, + 0x6f, 0x67, 0x6c, 0x65, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x62, 0x75, 0x66, 0x2e, 0x54, 0x69, + 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, 0x52, 0x09, 0x73, 0x74, 0x61, 0x72, 0x74, 0x54, 0x69, + 0x6d, 0x65, 0x1a, 0x39, 0x0a, 0x0b, 0x4c, 0x61, 0x62, 0x65, 0x6c, 0x73, 0x45, 0x6e, 0x74, 0x72, + 0x79, 0x12, 0x10, 0x0a, 0x03, 0x6b, 0x65, 0x79, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, + 0x6b, 0x65, 0x79, 0x12, 0x14, 0x0a, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x18, 0x02, 0x20, 0x01, + 0x28, 0x09, 0x52, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x3a, 0x02, 0x38, 0x01, 0x22, 0xc5, 0x05, + 0x0a, 0x14, 0x4d, 0x6f, 0x6e, 0x69, 0x74, 0x6f, 0x72, 0x69, 0x6e, 0x67, 0x49, 0x6e, 0x66, 0x6f, + 0x4c, 0x61, 0x62, 0x65, 0x6c, 0x73, 0x12, 0x21, 0x0a, 0x09, 0x54, 0x52, 0x41, 0x4e, 0x53, 0x46, + 0x4f, 0x52, 0x4d, 0x10, 0x00, 0x1a, 0x12, 0xa2, 0xd4, 0xe0, 0xe5, 0x03, 0x0c, 0x0a, 0x0a, 0x50, + 0x54, 0x52, 0x41, 0x4e, 0x53, 0x46, 0x4f, 
0x52, 0x4d, 0x12, 0x24, 0x0a, 0x0b, 0x50, 0x43, 0x4f, + 0x4c, 0x4c, 0x45, 0x43, 0x54, 0x49, 0x4f, 0x4e, 0x10, 0x01, 0x1a, 0x13, 0xa2, 0xd4, 0xe0, 0xe5, + 0x03, 0x0d, 0x0a, 0x0b, 0x50, 0x43, 0x4f, 0x4c, 0x4c, 0x45, 0x43, 0x54, 0x49, 0x4f, 0x4e, 0x12, + 0x32, 0x0a, 0x12, 0x57, 0x49, 0x4e, 0x44, 0x4f, 0x57, 0x49, 0x4e, 0x47, 0x5f, 0x53, 0x54, 0x52, + 0x41, 0x54, 0x45, 0x47, 0x59, 0x10, 0x02, 0x1a, 0x1a, 0xa2, 0xd4, 0xe0, 0xe5, 0x03, 0x14, 0x0a, + 0x12, 0x57, 0x49, 0x4e, 0x44, 0x4f, 0x57, 0x49, 0x4e, 0x47, 0x5f, 0x53, 0x54, 0x52, 0x41, 0x54, + 0x45, 0x47, 0x59, 0x12, 0x18, 0x0a, 0x05, 0x43, 0x4f, 0x44, 0x45, 0x52, 0x10, 0x03, 0x1a, 0x0d, + 0xa2, 0xd4, 0xe0, 0xe5, 0x03, 0x07, 0x0a, 0x05, 0x43, 0x4f, 0x44, 0x45, 0x52, 0x12, 0x24, 0x0a, + 0x0b, 0x45, 0x4e, 0x56, 0x49, 0x52, 0x4f, 0x4e, 0x4d, 0x45, 0x4e, 0x54, 0x10, 0x04, 0x1a, 0x13, + 0xa2, 0xd4, 0xe0, 0xe5, 0x03, 0x0d, 0x0a, 0x0b, 0x45, 0x4e, 0x56, 0x49, 0x52, 0x4f, 0x4e, 0x4d, + 0x45, 0x4e, 0x54, 0x12, 0x20, 0x0a, 0x09, 0x4e, 0x41, 0x4d, 0x45, 0x53, 0x50, 0x41, 0x43, 0x45, + 0x10, 0x05, 0x1a, 0x11, 0xa2, 0xd4, 0xe0, 0xe5, 0x03, 0x0b, 0x0a, 0x09, 0x4e, 0x41, 0x4d, 0x45, + 0x53, 0x50, 0x41, 0x43, 0x45, 0x12, 0x16, 0x0a, 0x04, 0x4e, 0x41, 0x4d, 0x45, 0x10, 0x06, 0x1a, + 0x0c, 0xa2, 0xd4, 0xe0, 0xe5, 0x03, 0x06, 0x0a, 0x04, 0x4e, 0x41, 0x4d, 0x45, 0x12, 0x1c, 0x0a, + 0x07, 0x53, 0x45, 0x52, 0x56, 0x49, 0x43, 0x45, 0x10, 0x07, 0x1a, 0x0f, 0xa2, 0xd4, 0xe0, 0xe5, + 0x03, 0x09, 0x0a, 0x07, 0x53, 0x45, 0x52, 0x56, 0x49, 0x43, 0x45, 0x12, 0x1a, 0x0a, 0x06, 0x4d, + 0x45, 0x54, 0x48, 0x4f, 0x44, 0x10, 0x08, 0x1a, 0x0e, 0xa2, 0xd4, 0xe0, 0xe5, 0x03, 0x08, 0x0a, + 0x06, 0x4d, 0x45, 0x54, 0x48, 0x4f, 0x44, 0x12, 0x1e, 0x0a, 0x08, 0x52, 0x45, 0x53, 0x4f, 0x55, + 0x52, 0x43, 0x45, 0x10, 0x09, 0x1a, 0x10, 0xa2, 0xd4, 0xe0, 0xe5, 0x03, 0x0a, 0x0a, 0x08, 0x52, + 0x45, 0x53, 0x4f, 0x55, 0x52, 0x43, 0x45, 0x12, 0x1a, 0x0a, 0x06, 0x53, 0x54, 0x41, 0x54, 0x55, + 0x53, 0x10, 0x0a, 0x1a, 0x0e, 0xa2, 0xd4, 0xe0, 0xe5, 0x03, 0x08, 0x0a, 0x06, 0x53, 0x54, 0x41, + 0x54, 0x55, 0x53, 0x12, 0x34, 0x0a, 0x13, 0x42, 0x49, 0x47, 0x51, 0x55, 0x45, 0x52, 0x59, 0x5f, + 0x50, 0x52, 0x4f, 0x4a, 0x45, 0x43, 0x54, 0x5f, 0x49, 0x44, 0x10, 0x0b, 0x1a, 0x1b, 0xa2, 0xd4, + 0xe0, 0xe5, 0x03, 0x15, 0x0a, 0x13, 0x42, 0x49, 0x47, 0x51, 0x55, 0x45, 0x52, 0x59, 0x5f, 0x50, + 0x52, 0x4f, 0x4a, 0x45, 0x43, 0x54, 0x5f, 0x49, 0x44, 0x12, 0x2e, 0x0a, 0x10, 0x42, 0x49, 0x47, + 0x51, 0x55, 0x45, 0x52, 0x59, 0x5f, 0x44, 0x41, 0x54, 0x41, 0x53, 0x45, 0x54, 0x10, 0x0c, 0x1a, + 0x18, 0xa2, 0xd4, 0xe0, 0xe5, 0x03, 0x12, 0x0a, 0x10, 0x42, 0x49, 0x47, 0x51, 0x55, 0x45, 0x52, + 0x59, 0x5f, 0x44, 0x41, 0x54, 0x41, 0x53, 0x45, 0x54, 0x12, 0x2a, 0x0a, 0x0e, 0x42, 0x49, 0x47, + 0x51, 0x55, 0x45, 0x52, 0x59, 0x5f, 0x54, 0x41, 0x42, 0x4c, 0x45, 0x10, 0x0d, 0x1a, 0x16, 0xa2, + 0xd4, 0xe0, 0xe5, 0x03, 0x10, 0x0a, 0x0e, 0x42, 0x49, 0x47, 0x51, 0x55, 0x45, 0x52, 0x59, 0x5f, + 0x54, 0x41, 0x42, 0x4c, 0x45, 0x12, 0x28, 0x0a, 0x0d, 0x42, 0x49, 0x47, 0x51, 0x55, 0x45, 0x52, + 0x59, 0x5f, 0x56, 0x49, 0x45, 0x57, 0x10, 0x0e, 0x1a, 0x15, 0xa2, 0xd4, 0xe0, 0xe5, 0x03, 0x0f, + 0x0a, 0x0d, 0x42, 0x49, 0x47, 0x51, 0x55, 0x45, 0x52, 0x59, 0x5f, 0x56, 0x49, 0x45, 0x57, 0x12, + 0x34, 0x0a, 0x13, 0x42, 0x49, 0x47, 0x51, 0x55, 0x45, 0x52, 0x59, 0x5f, 0x51, 0x55, 0x45, 0x52, + 0x59, 0x5f, 0x4e, 0x41, 0x4d, 0x45, 0x10, 0x0f, 0x1a, 0x1b, 0xa2, 0xd4, 0xe0, 0xe5, 0x03, 0x15, + 0x0a, 0x13, 0x42, 0x49, 0x47, 0x51, 0x55, 0x45, 0x52, 0x59, 0x5f, 0x51, 0x55, 0x45, 0x52, 0x59, + 0x5f, 0x4e, 0x41, 0x4d, 0x45, 0x12, 0x22, 0x0a, 0x0a, 0x47, 0x43, 
0x53, 0x5f, 0x42, 0x55, 0x43, + 0x4b, 0x45, 0x54, 0x10, 0x10, 0x1a, 0x12, 0xa2, 0xd4, 0xe0, 0xe5, 0x03, 0x0c, 0x0a, 0x0a, 0x47, + 0x43, 0x53, 0x5f, 0x42, 0x55, 0x43, 0x4b, 0x45, 0x54, 0x12, 0x2a, 0x0a, 0x0e, 0x47, 0x43, 0x53, + 0x5f, 0x50, 0x52, 0x4f, 0x4a, 0x45, 0x43, 0x54, 0x5f, 0x49, 0x44, 0x10, 0x11, 0x1a, 0x16, 0xa2, + 0xd4, 0xe0, 0xe5, 0x03, 0x10, 0x0a, 0x0e, 0x47, 0x43, 0x53, 0x5f, 0x50, 0x52, 0x4f, 0x4a, 0x45, + 0x43, 0x54, 0x5f, 0x49, 0x44, 0x22, 0xbc, 0x05, 0x0a, 0x16, 0x4d, 0x6f, 0x6e, 0x69, 0x74, 0x6f, + 0x72, 0x69, 0x6e, 0x67, 0x49, 0x6e, 0x66, 0x6f, 0x54, 0x79, 0x70, 0x65, 0x55, 0x72, 0x6e, 0x73, + 0x22, 0xa1, 0x05, 0x0a, 0x04, 0x45, 0x6e, 0x75, 0x6d, 0x12, 0x33, 0x0a, 0x0e, 0x53, 0x55, 0x4d, + 0x5f, 0x49, 0x4e, 0x54, 0x36, 0x34, 0x5f, 0x54, 0x59, 0x50, 0x45, 0x10, 0x00, 0x1a, 0x1f, 0xa2, + 0xb4, 0xfa, 0xc2, 0x05, 0x19, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, + 0x73, 0x3a, 0x73, 0x75, 0x6d, 0x5f, 0x69, 0x6e, 0x74, 0x36, 0x34, 0x3a, 0x76, 0x31, 0x12, 0x35, + 0x0a, 0x0f, 0x53, 0x55, 0x4d, 0x5f, 0x44, 0x4f, 0x55, 0x42, 0x4c, 0x45, 0x5f, 0x54, 0x59, 0x50, + 0x45, 0x10, 0x01, 0x1a, 0x20, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1a, 0x62, 0x65, 0x61, 0x6d, 0x3a, + 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x73, 0x75, 0x6d, 0x5f, 0x64, 0x6f, 0x75, 0x62, + 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x45, 0x0a, 0x17, 0x44, 0x49, 0x53, 0x54, 0x52, 0x49, 0x42, + 0x55, 0x54, 0x49, 0x4f, 0x4e, 0x5f, 0x49, 0x4e, 0x54, 0x36, 0x34, 0x5f, 0x54, 0x59, 0x50, 0x45, + 0x10, 0x02, 0x1a, 0x28, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x22, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, + 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x64, 0x69, 0x73, 0x74, 0x72, 0x69, 0x62, 0x75, 0x74, + 0x69, 0x6f, 0x6e, 0x5f, 0x69, 0x6e, 0x74, 0x36, 0x34, 0x3a, 0x76, 0x31, 0x12, 0x47, 0x0a, 0x18, + 0x44, 0x49, 0x53, 0x54, 0x52, 0x49, 0x42, 0x55, 0x54, 0x49, 0x4f, 0x4e, 0x5f, 0x44, 0x4f, 0x55, + 0x42, 0x4c, 0x45, 0x5f, 0x54, 0x59, 0x50, 0x45, 0x10, 0x03, 0x1a, 0x29, 0xa2, 0xb4, 0xfa, 0xc2, + 0x05, 0x23, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x64, + 0x69, 0x73, 0x74, 0x72, 0x69, 0x62, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x5f, 0x64, 0x6f, 0x75, 0x62, + 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x39, 0x0a, 0x11, 0x4c, 0x41, 0x54, 0x45, 0x53, 0x54, 0x5f, + 0x49, 0x4e, 0x54, 0x36, 0x34, 0x5f, 0x54, 0x59, 0x50, 0x45, 0x10, 0x04, 0x1a, 0x22, 0xa2, 0xb4, + 0xfa, 0xc2, 0x05, 0x1c, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, + 0x3a, 0x6c, 0x61, 0x74, 0x65, 0x73, 0x74, 0x5f, 0x69, 0x6e, 0x74, 0x36, 0x34, 0x3a, 0x76, 0x31, + 0x12, 0x3b, 0x0a, 0x12, 0x4c, 0x41, 0x54, 0x45, 0x53, 0x54, 0x5f, 0x44, 0x4f, 0x55, 0x42, 0x4c, + 0x45, 0x5f, 0x54, 0x59, 0x50, 0x45, 0x10, 0x05, 0x1a, 0x23, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1d, + 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x6c, 0x61, 0x74, + 0x65, 0x73, 0x74, 0x5f, 0x64, 0x6f, 0x75, 0x62, 0x6c, 0x65, 0x3a, 0x76, 0x31, 0x12, 0x37, 0x0a, + 0x10, 0x54, 0x4f, 0x50, 0x5f, 0x4e, 0x5f, 0x49, 0x4e, 0x54, 0x36, 0x34, 0x5f, 0x54, 0x59, 0x50, + 0x45, 0x10, 0x06, 0x1a, 0x21, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x1b, 0x62, 0x65, 0x61, 0x6d, 0x3a, + 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x74, 0x6f, 0x70, 0x5f, 0x6e, 0x5f, 0x69, 0x6e, + 0x74, 0x36, 0x34, 0x3a, 0x76, 0x31, 0x12, 0x39, 0x0a, 0x11, 0x54, 0x4f, 0x50, 0x5f, 0x4e, 0x5f, + 0x44, 0x4f, 0x55, 0x42, 0x4c, 0x45, 0x5f, 0x54, 0x59, 0x50, 0x45, 0x10, 0x07, 0x1a, 0x22, 0xa2, + 0xb4, 0xfa, 0xc2, 0x05, 0x1c, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 
0x63, + 0x73, 0x3a, 0x74, 0x6f, 0x70, 0x5f, 0x6e, 0x5f, 0x64, 0x6f, 0x75, 0x62, 0x6c, 0x65, 0x3a, 0x76, + 0x31, 0x12, 0x3d, 0x0a, 0x13, 0x42, 0x4f, 0x54, 0x54, 0x4f, 0x4d, 0x5f, 0x4e, 0x5f, 0x49, 0x4e, + 0x54, 0x36, 0x34, 0x5f, 0x54, 0x59, 0x50, 0x45, 0x10, 0x08, 0x1a, 0x24, 0xa2, 0xb4, 0xfa, 0xc2, + 0x05, 0x1e, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x62, + 0x6f, 0x74, 0x74, 0x6f, 0x6d, 0x5f, 0x6e, 0x5f, 0x69, 0x6e, 0x74, 0x36, 0x34, 0x3a, 0x76, 0x31, + 0x12, 0x3f, 0x0a, 0x14, 0x42, 0x4f, 0x54, 0x54, 0x4f, 0x4d, 0x5f, 0x4e, 0x5f, 0x44, 0x4f, 0x55, + 0x42, 0x4c, 0x45, 0x5f, 0x54, 0x59, 0x50, 0x45, 0x10, 0x09, 0x1a, 0x25, 0xa2, 0xb4, 0xfa, 0xc2, + 0x05, 0x1f, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x62, + 0x6f, 0x74, 0x74, 0x6f, 0x6d, 0x5f, 0x6e, 0x5f, 0x64, 0x6f, 0x75, 0x62, 0x6c, 0x65, 0x3a, 0x76, + 0x31, 0x12, 0x31, 0x0a, 0x0d, 0x50, 0x52, 0x4f, 0x47, 0x52, 0x45, 0x53, 0x53, 0x5f, 0x54, 0x59, + 0x50, 0x45, 0x10, 0x0a, 0x1a, 0x1e, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x18, 0x62, 0x65, 0x61, 0x6d, + 0x3a, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x3a, 0x70, 0x72, 0x6f, 0x67, 0x72, 0x65, 0x73, + 0x73, 0x3a, 0x76, 0x31, 0x3a, 0x82, 0x01, 0x0a, 0x0b, 0x6c, 0x61, 0x62, 0x65, 0x6c, 0x5f, 0x70, + 0x72, 0x6f, 0x70, 0x73, 0x12, 0x21, 0x2e, 0x67, 0x6f, 0x6f, 0x67, 0x6c, 0x65, 0x2e, 0x70, 0x72, + 0x6f, 0x74, 0x6f, 0x62, 0x75, 0x66, 0x2e, 0x45, 0x6e, 0x75, 0x6d, 0x56, 0x61, 0x6c, 0x75, 0x65, + 0x4f, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x73, 0x18, 0xc4, 0x8a, 0xdc, 0x3c, 0x20, 0x01, 0x28, 0x0b, + 0x32, 0x3b, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, + 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, + 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x4d, 0x6f, 0x6e, 0x69, 0x74, 0x6f, 0x72, 0x69, 0x6e, 0x67, 0x49, + 0x6e, 0x66, 0x6f, 0x4c, 0x61, 0x62, 0x65, 0x6c, 0x50, 0x72, 0x6f, 0x70, 0x73, 0x52, 0x0a, 0x6c, + 0x61, 0x62, 0x65, 0x6c, 0x50, 0x72, 0x6f, 0x70, 0x73, 0x3a, 0x8d, 0x01, 0x0a, 0x14, 0x6d, 0x6f, + 0x6e, 0x69, 0x74, 0x6f, 0x72, 0x69, 0x6e, 0x67, 0x5f, 0x69, 0x6e, 0x66, 0x6f, 0x5f, 0x73, 0x70, + 0x65, 0x63, 0x12, 0x21, 0x2e, 0x67, 0x6f, 0x6f, 0x67, 0x6c, 0x65, 0x2e, 0x70, 0x72, 0x6f, 0x74, + 0x6f, 0x62, 0x75, 0x66, 0x2e, 0x45, 0x6e, 0x75, 0x6d, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x4f, 0x70, + 0x74, 0x69, 0x6f, 0x6e, 0x73, 0x18, 0xfa, 0xf4, 0xe4, 0x62, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x35, + 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, + 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, + 0x76, 0x31, 0x2e, 0x4d, 0x6f, 0x6e, 0x69, 0x74, 0x6f, 0x72, 0x69, 0x6e, 0x67, 0x49, 0x6e, 0x66, + 0x6f, 0x53, 0x70, 0x65, 0x63, 0x52, 0x12, 0x6d, 0x6f, 0x6e, 0x69, 0x74, 0x6f, 0x72, 0x69, 0x6e, + 0x67, 0x49, 0x6e, 0x66, 0x6f, 0x53, 0x70, 0x65, 0x63, 0x42, 0x79, 0x0a, 0x21, 0x6f, 0x72, 0x67, + 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, + 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x42, 0x0a, + 0x4d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x41, 0x70, 0x69, 0x5a, 0x48, 0x67, 0x69, 0x74, 0x68, + 0x75, 0x62, 0x2e, 0x63, 0x6f, 0x6d, 0x2f, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2f, 0x62, 0x65, + 0x61, 0x6d, 0x2f, 0x73, 0x64, 0x6b, 0x73, 0x2f, 0x76, 0x32, 0x2f, 0x67, 0x6f, 0x2f, 0x70, 0x6b, + 0x67, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2f, 0x70, 0x69, 0x70, + 0x65, 0x6c, 0x69, 
0x6e, 0x65, 0x5f, 0x76, 0x31, 0x3b, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, + 0x65, 0x5f, 0x76, 0x31, 0x62, 0x06, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x33, } var ( @@ -1169,20 +1296,22 @@ var file_metrics_proto_goTypes = []interface{}{ (*MonitoringInfo)(nil), // 7: org.apache.beam.model.pipeline.v1.MonitoringInfo (*MonitoringInfoTypeUrns)(nil), // 8: org.apache.beam.model.pipeline.v1.MonitoringInfoTypeUrns nil, // 9: org.apache.beam.model.pipeline.v1.MonitoringInfo.LabelsEntry - (*descriptor.EnumValueOptions)(nil), // 10: google.protobuf.EnumValueOptions + (*timestamppb.Timestamp)(nil), // 10: google.protobuf.Timestamp + (*descriptorpb.EnumValueOptions)(nil), // 11: google.protobuf.EnumValueOptions } var file_metrics_proto_depIdxs = []int32{ 4, // 0: org.apache.beam.model.pipeline.v1.MonitoringInfoSpec.annotations:type_name -> org.apache.beam.model.pipeline.v1.Annotation 9, // 1: org.apache.beam.model.pipeline.v1.MonitoringInfo.labels:type_name -> org.apache.beam.model.pipeline.v1.MonitoringInfo.LabelsEntry - 10, // 2: org.apache.beam.model.pipeline.v1.label_props:extendee -> google.protobuf.EnumValueOptions - 10, // 3: org.apache.beam.model.pipeline.v1.monitoring_info_spec:extendee -> google.protobuf.EnumValueOptions - 6, // 4: org.apache.beam.model.pipeline.v1.label_props:type_name -> org.apache.beam.model.pipeline.v1.MonitoringInfoLabelProps - 3, // 5: org.apache.beam.model.pipeline.v1.monitoring_info_spec:type_name -> org.apache.beam.model.pipeline.v1.MonitoringInfoSpec - 6, // [6:6] is the sub-list for method output_type - 6, // [6:6] is the sub-list for method input_type - 4, // [4:6] is the sub-list for extension type_name - 2, // [2:4] is the sub-list for extension extendee - 0, // [0:2] is the sub-list for field type_name + 10, // 2: org.apache.beam.model.pipeline.v1.MonitoringInfo.start_time:type_name -> google.protobuf.Timestamp + 11, // 3: org.apache.beam.model.pipeline.v1.label_props:extendee -> google.protobuf.EnumValueOptions + 11, // 4: org.apache.beam.model.pipeline.v1.monitoring_info_spec:extendee -> google.protobuf.EnumValueOptions + 6, // 5: org.apache.beam.model.pipeline.v1.label_props:type_name -> org.apache.beam.model.pipeline.v1.MonitoringInfoLabelProps + 3, // 6: org.apache.beam.model.pipeline.v1.monitoring_info_spec:type_name -> org.apache.beam.model.pipeline.v1.MonitoringInfoSpec + 7, // [7:7] is the sub-list for method output_type + 7, // [7:7] is the sub-list for method input_type + 5, // [5:7] is the sub-list for extension type_name + 3, // [3:5] is the sub-list for extension extendee + 0, // [0:3] is the sub-list for field type_name } func init() { file_metrics_proto_init() } diff --git a/sdks/go/pkg/beam/model/pipeline_v1/schema.pb.go b/sdks/go/pkg/beam/model/pipeline_v1/schema.pb.go index 720b19e5e948..cb42a17ff9b9 100644 --- a/sdks/go/pkg/beam/model/pipeline_v1/schema.pb.go +++ b/sdks/go/pkg/beam/model/pipeline_v1/schema.pb.go @@ -24,8 +24,8 @@ // Code generated by protoc-gen-go. DO NOT EDIT. // versions: -// protoc-gen-go v1.25.0-devel -// protoc v3.13.0 +// protoc-gen-go v1.27.1 +// protoc v3.17.3 // source: schema.proto package pipeline_v1 @@ -124,6 +124,8 @@ type Schema struct { // REQUIRED. An RFC 4122 UUID. Id string `protobuf:"bytes,2,opt,name=id,proto3" json:"id,omitempty"` Options []*Option `protobuf:"bytes,3,rep,name=options,proto3" json:"options,omitempty"` + // Indicates that encoding positions have been overridden. 
+ EncodingPositionsSet bool `protobuf:"varint,4,opt,name=encoding_positions_set,json=encodingPositionsSet,proto3" json:"encoding_positions_set,omitempty"` } func (x *Schema) Reset() { @@ -179,6 +181,13 @@ func (x *Schema) GetOptions() []*Option { return nil } +func (x *Schema) GetEncodingPositionsSet() bool { + if x != nil { + return x.EncodingPositionsSet + } + return false +} + type Field struct { state protoimpl.MessageState sizeCache protoimpl.SizeCache @@ -195,6 +204,8 @@ type Field struct { // or all of them are. Used to support backwards compatibility with schema // changes. // If no fields have encoding position populated the order of encoding is the same as the order in the Schema. + // If this Field is part of a Schema where encoding_positions_set is True then encoding_position must be + // defined, otherwise this field is ignored. EncodingPosition int32 `protobuf:"varint,5,opt,name=encoding_position,json=encodingPosition,proto3" json:"encoding_position,omitempty"` Options []*Option `protobuf:"bytes,6,rep,name=options,proto3" json:"options,omitempty"` } @@ -699,9 +710,9 @@ type Option struct { // REQUIRED. Identifier for the option. Name string `protobuf:"bytes,1,opt,name=name,proto3" json:"name,omitempty"` - // OPTIONAL. Type specifer for the structure of value. - // If not present, assumes no additional configuration is needed - // for this option and value is ignored. + // REQUIRED. Type specifer for the structure of value. + // Conventionally, options that don't require additional configuration should + // use a boolean type, with the value set to true. Type *FieldType `protobuf:"bytes,2,opt,name=type,proto3" json:"type,omitempty"` Value *FieldValue `protobuf:"bytes,3,opt,name=value,proto3" json:"value,omitempty"` } @@ -1369,7 +1380,7 @@ var file_schema_proto_rawDesc = []byte{ 0x0a, 0x0c, 0x73, 0x63, 0x68, 0x65, 0x6d, 0x61, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x12, 0x21, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, - 0x31, 0x22, 0x9f, 0x01, 0x0a, 0x06, 0x53, 0x63, 0x68, 0x65, 0x6d, 0x61, 0x12, 0x40, 0x0a, 0x06, + 0x31, 0x22, 0xd5, 0x01, 0x0a, 0x06, 0x53, 0x63, 0x68, 0x65, 0x6d, 0x61, 0x12, 0x40, 0x0a, 0x06, 0x66, 0x69, 0x65, 0x6c, 0x64, 0x73, 0x18, 0x01, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x28, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, @@ -1379,216 +1390,220 @@ var file_schema_proto_rawDesc = []byte{ 0x29, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x4f, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x52, 0x07, 0x6f, 0x70, 0x74, 0x69, - 0x6f, 0x6e, 0x73, 0x22, 0x81, 0x02, 0x0a, 0x05, 0x46, 0x69, 0x65, 0x6c, 0x64, 0x12, 0x12, 0x0a, - 0x04, 0x6e, 0x61, 0x6d, 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x04, 0x6e, 0x61, 0x6d, - 0x65, 0x12, 0x20, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, - 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, - 0x69, 0x6f, 0x6e, 0x12, 0x40, 0x0a, 0x04, 0x74, 0x79, 0x70, 0x65, 0x18, 0x03, 0x20, 0x01, 0x28, - 0x0b, 0x32, 0x2c, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, - 0x65, 0x61, 
0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, - 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x69, 0x65, 0x6c, 0x64, 0x54, 0x79, 0x70, 0x65, 0x52, - 0x04, 0x74, 0x79, 0x70, 0x65, 0x12, 0x0e, 0x0a, 0x02, 0x69, 0x64, 0x18, 0x04, 0x20, 0x01, 0x28, - 0x05, 0x52, 0x02, 0x69, 0x64, 0x12, 0x2b, 0x0a, 0x11, 0x65, 0x6e, 0x63, 0x6f, 0x64, 0x69, 0x6e, - 0x67, 0x5f, 0x70, 0x6f, 0x73, 0x69, 0x74, 0x69, 0x6f, 0x6e, 0x18, 0x05, 0x20, 0x01, 0x28, 0x05, - 0x52, 0x10, 0x65, 0x6e, 0x63, 0x6f, 0x64, 0x69, 0x6e, 0x67, 0x50, 0x6f, 0x73, 0x69, 0x74, 0x69, - 0x6f, 0x6e, 0x12, 0x43, 0x0a, 0x07, 0x6f, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x73, 0x18, 0x06, 0x20, - 0x03, 0x28, 0x0b, 0x32, 0x29, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, - 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, - 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x4f, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x52, 0x07, - 0x6f, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x73, 0x22, 0x94, 0x04, 0x0a, 0x09, 0x46, 0x69, 0x65, 0x6c, - 0x64, 0x54, 0x79, 0x70, 0x65, 0x12, 0x1a, 0x0a, 0x08, 0x6e, 0x75, 0x6c, 0x6c, 0x61, 0x62, 0x6c, - 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x08, 0x52, 0x08, 0x6e, 0x75, 0x6c, 0x6c, 0x61, 0x62, 0x6c, - 0x65, 0x12, 0x50, 0x0a, 0x0b, 0x61, 0x74, 0x6f, 0x6d, 0x69, 0x63, 0x5f, 0x74, 0x79, 0x70, 0x65, - 0x18, 0x02, 0x20, 0x01, 0x28, 0x0e, 0x32, 0x2d, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, - 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, - 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x41, 0x74, 0x6f, 0x6d, 0x69, - 0x63, 0x54, 0x79, 0x70, 0x65, 0x48, 0x00, 0x52, 0x0a, 0x61, 0x74, 0x6f, 0x6d, 0x69, 0x63, 0x54, - 0x79, 0x70, 0x65, 0x12, 0x4d, 0x0a, 0x0a, 0x61, 0x72, 0x72, 0x61, 0x79, 0x5f, 0x74, 0x79, 0x70, + 0x6f, 0x6e, 0x73, 0x12, 0x34, 0x0a, 0x16, 0x65, 0x6e, 0x63, 0x6f, 0x64, 0x69, 0x6e, 0x67, 0x5f, + 0x70, 0x6f, 0x73, 0x69, 0x74, 0x69, 0x6f, 0x6e, 0x73, 0x5f, 0x73, 0x65, 0x74, 0x18, 0x04, 0x20, + 0x01, 0x28, 0x08, 0x52, 0x14, 0x65, 0x6e, 0x63, 0x6f, 0x64, 0x69, 0x6e, 0x67, 0x50, 0x6f, 0x73, + 0x69, 0x74, 0x69, 0x6f, 0x6e, 0x73, 0x53, 0x65, 0x74, 0x22, 0x81, 0x02, 0x0a, 0x05, 0x46, 0x69, + 0x65, 0x6c, 0x64, 0x12, 0x12, 0x0a, 0x04, 0x6e, 0x61, 0x6d, 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, + 0x09, 0x52, 0x04, 0x6e, 0x61, 0x6d, 0x65, 0x12, 0x20, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, + 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0b, 0x64, 0x65, + 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x40, 0x0a, 0x04, 0x74, 0x79, 0x70, 0x65, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2c, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, - 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x41, 0x72, 0x72, 0x61, - 0x79, 0x54, 0x79, 0x70, 0x65, 0x48, 0x00, 0x52, 0x09, 0x61, 0x72, 0x72, 0x61, 0x79, 0x54, 0x79, - 0x70, 0x65, 0x12, 0x56, 0x0a, 0x0d, 0x69, 0x74, 0x65, 0x72, 0x61, 0x62, 0x6c, 0x65, 0x5f, 0x74, - 0x79, 0x70, 0x65, 0x18, 0x04, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, + 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x69, 0x65, 0x6c, + 0x64, 0x54, 0x79, 0x70, 0x65, 0x52, 0x04, 0x74, 0x79, 0x70, 0x65, 0x12, 0x0e, 0x0a, 0x02, 0x69, + 0x64, 0x18, 0x04, 0x20, 0x01, 0x28, 0x05, 0x52, 0x02, 0x69, 0x64, 0x12, 0x2b, 0x0a, 0x11, 0x65, + 0x6e, 0x63, 0x6f, 0x64, 0x69, 0x6e, 0x67, 
0x5f, 0x70, 0x6f, 0x73, 0x69, 0x74, 0x69, 0x6f, 0x6e, + 0x18, 0x05, 0x20, 0x01, 0x28, 0x05, 0x52, 0x10, 0x65, 0x6e, 0x63, 0x6f, 0x64, 0x69, 0x6e, 0x67, + 0x50, 0x6f, 0x73, 0x69, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x43, 0x0a, 0x07, 0x6f, 0x70, 0x74, 0x69, + 0x6f, 0x6e, 0x73, 0x18, 0x06, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x29, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, - 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x49, 0x74, - 0x65, 0x72, 0x61, 0x62, 0x6c, 0x65, 0x54, 0x79, 0x70, 0x65, 0x48, 0x00, 0x52, 0x0c, 0x69, 0x74, - 0x65, 0x72, 0x61, 0x62, 0x6c, 0x65, 0x54, 0x79, 0x70, 0x65, 0x12, 0x47, 0x0a, 0x08, 0x6d, 0x61, - 0x70, 0x5f, 0x74, 0x79, 0x70, 0x65, 0x18, 0x05, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2a, 0x2e, 0x6f, + 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x4f, 0x70, + 0x74, 0x69, 0x6f, 0x6e, 0x52, 0x07, 0x6f, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x73, 0x22, 0x94, 0x04, + 0x0a, 0x09, 0x46, 0x69, 0x65, 0x6c, 0x64, 0x54, 0x79, 0x70, 0x65, 0x12, 0x1a, 0x0a, 0x08, 0x6e, + 0x75, 0x6c, 0x6c, 0x61, 0x62, 0x6c, 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x08, 0x52, 0x08, 0x6e, + 0x75, 0x6c, 0x6c, 0x61, 0x62, 0x6c, 0x65, 0x12, 0x50, 0x0a, 0x0b, 0x61, 0x74, 0x6f, 0x6d, 0x69, + 0x63, 0x5f, 0x74, 0x79, 0x70, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0e, 0x32, 0x2d, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, - 0x2e, 0x4d, 0x61, 0x70, 0x54, 0x79, 0x70, 0x65, 0x48, 0x00, 0x52, 0x07, 0x6d, 0x61, 0x70, 0x54, - 0x79, 0x70, 0x65, 0x12, 0x47, 0x0a, 0x08, 0x72, 0x6f, 0x77, 0x5f, 0x74, 0x79, 0x70, 0x65, 0x18, - 0x06, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, - 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, - 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x52, 0x6f, 0x77, 0x54, 0x79, 0x70, - 0x65, 0x48, 0x00, 0x52, 0x07, 0x72, 0x6f, 0x77, 0x54, 0x79, 0x70, 0x65, 0x12, 0x53, 0x0a, 0x0c, - 0x6c, 0x6f, 0x67, 0x69, 0x63, 0x61, 0x6c, 0x5f, 0x74, 0x79, 0x70, 0x65, 0x18, 0x07, 0x20, 0x01, - 0x28, 0x0b, 0x32, 0x2e, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, - 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, - 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x4c, 0x6f, 0x67, 0x69, 0x63, 0x61, 0x6c, 0x54, 0x79, - 0x70, 0x65, 0x48, 0x00, 0x52, 0x0b, 0x6c, 0x6f, 0x67, 0x69, 0x63, 0x61, 0x6c, 0x54, 0x79, 0x70, - 0x65, 0x42, 0x0b, 0x0a, 0x09, 0x74, 0x79, 0x70, 0x65, 0x5f, 0x69, 0x6e, 0x66, 0x6f, 0x22, 0x5c, - 0x0a, 0x09, 0x41, 0x72, 0x72, 0x61, 0x79, 0x54, 0x79, 0x70, 0x65, 0x12, 0x4f, 0x0a, 0x0c, 0x65, - 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x5f, 0x74, 0x79, 0x70, 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, - 0x0b, 0x32, 0x2c, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, - 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, - 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x69, 0x65, 0x6c, 0x64, 0x54, 0x79, 0x70, 0x65, 0x52, - 0x0b, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x54, 0x79, 0x70, 0x65, 0x22, 0x5f, 0x0a, 0x0c, - 0x49, 0x74, 0x65, 0x72, 0x61, 0x62, 0x6c, 0x65, 0x54, 0x79, 0x70, 0x65, 0x12, 0x4f, 0x0a, 0x0c, - 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x5f, 0x74, 0x79, 0x70, 0x65, 
0x18, 0x01, 0x20, 0x01, - 0x28, 0x0b, 0x32, 0x2c, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, + 0x2e, 0x41, 0x74, 0x6f, 0x6d, 0x69, 0x63, 0x54, 0x79, 0x70, 0x65, 0x48, 0x00, 0x52, 0x0a, 0x61, + 0x74, 0x6f, 0x6d, 0x69, 0x63, 0x54, 0x79, 0x70, 0x65, 0x12, 0x4d, 0x0a, 0x0a, 0x61, 0x72, 0x72, + 0x61, 0x79, 0x5f, 0x74, 0x79, 0x70, 0x65, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2c, 0x2e, + 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, + 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, + 0x31, 0x2e, 0x41, 0x72, 0x72, 0x61, 0x79, 0x54, 0x79, 0x70, 0x65, 0x48, 0x00, 0x52, 0x09, 0x61, + 0x72, 0x72, 0x61, 0x79, 0x54, 0x79, 0x70, 0x65, 0x12, 0x56, 0x0a, 0x0d, 0x69, 0x74, 0x65, 0x72, + 0x61, 0x62, 0x6c, 0x65, 0x5f, 0x74, 0x79, 0x70, 0x65, 0x18, 0x04, 0x20, 0x01, 0x28, 0x0b, 0x32, + 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, + 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, + 0x2e, 0x76, 0x31, 0x2e, 0x49, 0x74, 0x65, 0x72, 0x61, 0x62, 0x6c, 0x65, 0x54, 0x79, 0x70, 0x65, + 0x48, 0x00, 0x52, 0x0c, 0x69, 0x74, 0x65, 0x72, 0x61, 0x62, 0x6c, 0x65, 0x54, 0x79, 0x70, 0x65, + 0x12, 0x47, 0x0a, 0x08, 0x6d, 0x61, 0x70, 0x5f, 0x74, 0x79, 0x70, 0x65, 0x18, 0x05, 0x20, 0x01, + 0x28, 0x0b, 0x32, 0x2a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, - 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x69, 0x65, 0x6c, 0x64, 0x54, 0x79, 0x70, 0x65, - 0x52, 0x0b, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x54, 0x79, 0x70, 0x65, 0x22, 0x9f, 0x01, - 0x0a, 0x07, 0x4d, 0x61, 0x70, 0x54, 0x79, 0x70, 0x65, 0x12, 0x47, 0x0a, 0x08, 0x6b, 0x65, 0x79, - 0x5f, 0x74, 0x79, 0x70, 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2c, 0x2e, 0x6f, 0x72, + 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x4d, 0x61, 0x70, 0x54, 0x79, 0x70, 0x65, 0x48, 0x00, + 0x52, 0x07, 0x6d, 0x61, 0x70, 0x54, 0x79, 0x70, 0x65, 0x12, 0x47, 0x0a, 0x08, 0x72, 0x6f, 0x77, + 0x5f, 0x74, 0x79, 0x70, 0x65, 0x18, 0x06, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2a, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, - 0x46, 0x69, 0x65, 0x6c, 0x64, 0x54, 0x79, 0x70, 0x65, 0x52, 0x07, 0x6b, 0x65, 0x79, 0x54, 0x79, - 0x70, 0x65, 0x12, 0x4b, 0x0a, 0x0a, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x5f, 0x74, 0x79, 0x70, 0x65, - 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2c, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, - 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, - 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x69, 0x65, 0x6c, 0x64, - 0x54, 0x79, 0x70, 0x65, 0x52, 0x09, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x54, 0x79, 0x70, 0x65, 0x22, - 0x4c, 0x0a, 0x07, 0x52, 0x6f, 0x77, 0x54, 0x79, 0x70, 0x65, 0x12, 0x41, 0x0a, 0x06, 0x73, 0x63, - 0x68, 0x65, 0x6d, 0x61, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x29, 0x2e, 0x6f, 0x72, 0x67, - 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, - 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x53, - 0x63, 0x68, 0x65, 0x6d, 0x61, 0x52, 0x06, 0x73, 0x63, 0x68, 0x65, 0x6d, 0x61, 0x22, 0xad, 0x02, - 0x0a, 
0x0b, 0x4c, 0x6f, 0x67, 0x69, 0x63, 0x61, 0x6c, 0x54, 0x79, 0x70, 0x65, 0x12, 0x10, 0x0a, - 0x03, 0x75, 0x72, 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x75, 0x72, 0x6e, 0x12, - 0x18, 0x0a, 0x07, 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0c, - 0x52, 0x07, 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, 0x54, 0x0a, 0x0e, 0x72, 0x65, 0x70, - 0x72, 0x65, 0x73, 0x65, 0x6e, 0x74, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x18, 0x03, 0x20, 0x01, 0x28, + 0x52, 0x6f, 0x77, 0x54, 0x79, 0x70, 0x65, 0x48, 0x00, 0x52, 0x07, 0x72, 0x6f, 0x77, 0x54, 0x79, + 0x70, 0x65, 0x12, 0x53, 0x0a, 0x0c, 0x6c, 0x6f, 0x67, 0x69, 0x63, 0x61, 0x6c, 0x5f, 0x74, 0x79, + 0x70, 0x65, 0x18, 0x07, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2e, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, + 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, + 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x4c, 0x6f, 0x67, + 0x69, 0x63, 0x61, 0x6c, 0x54, 0x79, 0x70, 0x65, 0x48, 0x00, 0x52, 0x0b, 0x6c, 0x6f, 0x67, 0x69, + 0x63, 0x61, 0x6c, 0x54, 0x79, 0x70, 0x65, 0x42, 0x0b, 0x0a, 0x09, 0x74, 0x79, 0x70, 0x65, 0x5f, + 0x69, 0x6e, 0x66, 0x6f, 0x22, 0x5c, 0x0a, 0x09, 0x41, 0x72, 0x72, 0x61, 0x79, 0x54, 0x79, 0x70, + 0x65, 0x12, 0x4f, 0x0a, 0x0c, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x5f, 0x74, 0x79, 0x70, + 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2c, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, + 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, + 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x69, 0x65, 0x6c, + 0x64, 0x54, 0x79, 0x70, 0x65, 0x52, 0x0b, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x54, 0x79, + 0x70, 0x65, 0x22, 0x5f, 0x0a, 0x0c, 0x49, 0x74, 0x65, 0x72, 0x61, 0x62, 0x6c, 0x65, 0x54, 0x79, + 0x70, 0x65, 0x12, 0x4f, 0x0a, 0x0c, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x5f, 0x74, 0x79, + 0x70, 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2c, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, + 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, + 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x69, 0x65, + 0x6c, 0x64, 0x54, 0x79, 0x70, 0x65, 0x52, 0x0b, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x54, + 0x79, 0x70, 0x65, 0x22, 0x9f, 0x01, 0x0a, 0x07, 0x4d, 0x61, 0x70, 0x54, 0x79, 0x70, 0x65, 0x12, + 0x47, 0x0a, 0x08, 0x6b, 0x65, 0x79, 0x5f, 0x74, 0x79, 0x70, 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2c, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x69, 0x65, 0x6c, 0x64, 0x54, 0x79, 0x70, 0x65, 0x52, - 0x0e, 0x72, 0x65, 0x70, 0x72, 0x65, 0x73, 0x65, 0x6e, 0x74, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x12, - 0x51, 0x0a, 0x0d, 0x61, 0x72, 0x67, 0x75, 0x6d, 0x65, 0x6e, 0x74, 0x5f, 0x74, 0x79, 0x70, 0x65, - 0x18, 0x04, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2c, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, - 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, - 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x69, 0x65, 0x6c, 0x64, - 0x54, 0x79, 0x70, 0x65, 0x52, 0x0c, 0x61, 0x72, 0x67, 0x75, 0x6d, 0x65, 0x6e, 0x74, 0x54, 0x79, - 0x70, 0x65, 0x12, 0x49, 0x0a, 0x08, 0x61, 0x72, 0x67, 0x75, 0x6d, 0x65, 0x6e, 0x74, 0x18, 0x05, - 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2d, 
0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, - 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, - 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x69, 0x65, 0x6c, 0x64, 0x56, 0x61, - 0x6c, 0x75, 0x65, 0x52, 0x08, 0x61, 0x72, 0x67, 0x75, 0x6d, 0x65, 0x6e, 0x74, 0x22, 0xa3, 0x01, - 0x0a, 0x06, 0x4f, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x12, 0x0a, 0x04, 0x6e, 0x61, 0x6d, 0x65, - 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x04, 0x6e, 0x61, 0x6d, 0x65, 0x12, 0x40, 0x0a, 0x04, - 0x74, 0x79, 0x70, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2c, 0x2e, 0x6f, 0x72, 0x67, + 0x07, 0x6b, 0x65, 0x79, 0x54, 0x79, 0x70, 0x65, 0x12, 0x4b, 0x0a, 0x0a, 0x76, 0x61, 0x6c, 0x75, + 0x65, 0x5f, 0x74, 0x79, 0x70, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2c, 0x2e, 0x6f, + 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, + 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, + 0x2e, 0x46, 0x69, 0x65, 0x6c, 0x64, 0x54, 0x79, 0x70, 0x65, 0x52, 0x09, 0x76, 0x61, 0x6c, 0x75, + 0x65, 0x54, 0x79, 0x70, 0x65, 0x22, 0x4c, 0x0a, 0x07, 0x52, 0x6f, 0x77, 0x54, 0x79, 0x70, 0x65, + 0x12, 0x41, 0x0a, 0x06, 0x73, 0x63, 0x68, 0x65, 0x6d, 0x61, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, + 0x32, 0x29, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, + 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, + 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x53, 0x63, 0x68, 0x65, 0x6d, 0x61, 0x52, 0x06, 0x73, 0x63, 0x68, + 0x65, 0x6d, 0x61, 0x22, 0xad, 0x02, 0x0a, 0x0b, 0x4c, 0x6f, 0x67, 0x69, 0x63, 0x61, 0x6c, 0x54, + 0x79, 0x70, 0x65, 0x12, 0x10, 0x0a, 0x03, 0x75, 0x72, 0x6e, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, + 0x52, 0x03, 0x75, 0x72, 0x6e, 0x12, 0x18, 0x0a, 0x07, 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, + 0x18, 0x02, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x07, 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x12, + 0x54, 0x0a, 0x0e, 0x72, 0x65, 0x70, 0x72, 0x65, 0x73, 0x65, 0x6e, 0x74, 0x61, 0x74, 0x69, 0x6f, + 0x6e, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2c, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, + 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, + 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x69, 0x65, 0x6c, + 0x64, 0x54, 0x79, 0x70, 0x65, 0x52, 0x0e, 0x72, 0x65, 0x70, 0x72, 0x65, 0x73, 0x65, 0x6e, 0x74, + 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x51, 0x0a, 0x0d, 0x61, 0x72, 0x67, 0x75, 0x6d, 0x65, 0x6e, + 0x74, 0x5f, 0x74, 0x79, 0x70, 0x65, 0x18, 0x04, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2c, 0x2e, 0x6f, + 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, + 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, + 0x2e, 0x46, 0x69, 0x65, 0x6c, 0x64, 0x54, 0x79, 0x70, 0x65, 0x52, 0x0c, 0x61, 0x72, 0x67, 0x75, + 0x6d, 0x65, 0x6e, 0x74, 0x54, 0x79, 0x70, 0x65, 0x12, 0x49, 0x0a, 0x08, 0x61, 0x72, 0x67, 0x75, + 0x6d, 0x65, 0x6e, 0x74, 0x18, 0x05, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2d, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, - 0x69, 0x65, 0x6c, 0x64, 0x54, 0x79, 0x70, 0x65, 0x52, 0x04, 0x74, 0x79, 0x70, 0x65, 0x12, 0x43, - 0x0a, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x18, 0x03, 0x20, 0x01, 
0x28, 0x0b, 0x32, 0x2d, 0x2e, + 0x69, 0x65, 0x6c, 0x64, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x52, 0x08, 0x61, 0x72, 0x67, 0x75, 0x6d, + 0x65, 0x6e, 0x74, 0x22, 0xa3, 0x01, 0x0a, 0x06, 0x4f, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x12, 0x12, + 0x0a, 0x04, 0x6e, 0x61, 0x6d, 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x04, 0x6e, 0x61, + 0x6d, 0x65, 0x12, 0x40, 0x0a, 0x04, 0x74, 0x79, 0x70, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, + 0x32, 0x2c, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, + 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, + 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x69, 0x65, 0x6c, 0x64, 0x54, 0x79, 0x70, 0x65, 0x52, 0x04, + 0x74, 0x79, 0x70, 0x65, 0x12, 0x43, 0x0a, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x18, 0x03, 0x20, + 0x01, 0x28, 0x0b, 0x32, 0x2d, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, + 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, + 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x69, 0x65, 0x6c, 0x64, 0x56, 0x61, 0x6c, + 0x75, 0x65, 0x52, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x22, 0x4c, 0x0a, 0x03, 0x52, 0x6f, 0x77, + 0x12, 0x45, 0x0a, 0x06, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x73, 0x18, 0x01, 0x20, 0x03, 0x28, 0x0b, + 0x32, 0x2d, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, + 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, + 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x69, 0x65, 0x6c, 0x64, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x52, + 0x06, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x73, 0x22, 0xa5, 0x04, 0x0a, 0x0a, 0x46, 0x69, 0x65, 0x6c, + 0x64, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x12, 0x57, 0x0a, 0x0c, 0x61, 0x74, 0x6f, 0x6d, 0x69, 0x63, + 0x5f, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x32, 0x2e, 0x6f, + 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, + 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, + 0x2e, 0x41, 0x74, 0x6f, 0x6d, 0x69, 0x63, 0x54, 0x79, 0x70, 0x65, 0x56, 0x61, 0x6c, 0x75, 0x65, + 0x48, 0x00, 0x52, 0x0b, 0x61, 0x74, 0x6f, 0x6d, 0x69, 0x63, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x12, + 0x54, 0x0a, 0x0b, 0x61, 0x72, 0x72, 0x61, 0x79, 0x5f, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x18, 0x02, + 0x20, 0x01, 0x28, 0x0b, 0x32, 0x31, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, + 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, + 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x41, 0x72, 0x72, 0x61, 0x79, 0x54, 0x79, + 0x70, 0x65, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x48, 0x00, 0x52, 0x0a, 0x61, 0x72, 0x72, 0x61, 0x79, + 0x56, 0x61, 0x6c, 0x75, 0x65, 0x12, 0x5d, 0x0a, 0x0e, 0x69, 0x74, 0x65, 0x72, 0x61, 0x62, 0x6c, + 0x65, 0x5f, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x34, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, - 0x31, 0x2e, 0x46, 0x69, 0x65, 0x6c, 0x64, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x52, 0x05, 0x76, 0x61, - 0x6c, 0x75, 0x65, 0x22, 0x4c, 0x0a, 0x03, 0x52, 0x6f, 0x77, 0x12, 0x45, 0x0a, 0x06, 0x76, 0x61, - 0x6c, 0x75, 0x65, 0x73, 0x18, 0x01, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x2d, 0x2e, 0x6f, 0x72, 0x67, - 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 
- 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, - 0x69, 0x65, 0x6c, 0x64, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x52, 0x06, 0x76, 0x61, 0x6c, 0x75, 0x65, - 0x73, 0x22, 0xa5, 0x04, 0x0a, 0x0a, 0x46, 0x69, 0x65, 0x6c, 0x64, 0x56, 0x61, 0x6c, 0x75, 0x65, - 0x12, 0x57, 0x0a, 0x0c, 0x61, 0x74, 0x6f, 0x6d, 0x69, 0x63, 0x5f, 0x76, 0x61, 0x6c, 0x75, 0x65, - 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x32, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, - 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, - 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x41, 0x74, 0x6f, 0x6d, 0x69, - 0x63, 0x54, 0x79, 0x70, 0x65, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x48, 0x00, 0x52, 0x0b, 0x61, 0x74, - 0x6f, 0x6d, 0x69, 0x63, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x12, 0x54, 0x0a, 0x0b, 0x61, 0x72, 0x72, - 0x61, 0x79, 0x5f, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x31, - 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, - 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, - 0x76, 0x31, 0x2e, 0x41, 0x72, 0x72, 0x61, 0x79, 0x54, 0x79, 0x70, 0x65, 0x56, 0x61, 0x6c, 0x75, - 0x65, 0x48, 0x00, 0x52, 0x0a, 0x61, 0x72, 0x72, 0x61, 0x79, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x12, - 0x5d, 0x0a, 0x0e, 0x69, 0x74, 0x65, 0x72, 0x61, 0x62, 0x6c, 0x65, 0x5f, 0x76, 0x61, 0x6c, 0x75, - 0x65, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x34, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, + 0x31, 0x2e, 0x49, 0x74, 0x65, 0x72, 0x61, 0x62, 0x6c, 0x65, 0x54, 0x79, 0x70, 0x65, 0x56, 0x61, + 0x6c, 0x75, 0x65, 0x48, 0x00, 0x52, 0x0d, 0x69, 0x74, 0x65, 0x72, 0x61, 0x62, 0x6c, 0x65, 0x56, + 0x61, 0x6c, 0x75, 0x65, 0x12, 0x4e, 0x0a, 0x09, 0x6d, 0x61, 0x70, 0x5f, 0x76, 0x61, 0x6c, 0x75, + 0x65, 0x18, 0x04, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, - 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x49, 0x74, 0x65, 0x72, - 0x61, 0x62, 0x6c, 0x65, 0x54, 0x79, 0x70, 0x65, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x48, 0x00, 0x52, - 0x0d, 0x69, 0x74, 0x65, 0x72, 0x61, 0x62, 0x6c, 0x65, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x12, 0x4e, - 0x0a, 0x09, 0x6d, 0x61, 0x70, 0x5f, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x18, 0x04, 0x20, 0x01, 0x28, - 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, - 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, - 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x4d, 0x61, 0x70, 0x54, 0x79, 0x70, 0x65, 0x56, 0x61, 0x6c, - 0x75, 0x65, 0x48, 0x00, 0x52, 0x08, 0x6d, 0x61, 0x70, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x12, 0x45, - 0x0a, 0x09, 0x72, 0x6f, 0x77, 0x5f, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x18, 0x05, 0x20, 0x01, 0x28, - 0x0b, 0x32, 0x26, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, - 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, - 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x52, 0x6f, 0x77, 0x48, 0x00, 0x52, 0x08, 0x72, 0x6f, 0x77, - 0x56, 0x61, 0x6c, 0x75, 0x65, 0x12, 0x63, 0x0a, 0x12, 0x6c, 0x6f, 0x67, 0x69, 0x63, 0x61, 0x6c, - 0x5f, 0x74, 0x79, 0x70, 0x65, 0x5f, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x18, 0x06, 0x20, 0x01, 0x28, - 0x0b, 0x32, 0x33, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, - 0x65, 0x61, 0x6d, 0x2e, 
0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, - 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x4c, 0x6f, 0x67, 0x69, 0x63, 0x61, 0x6c, 0x54, 0x79, 0x70, - 0x65, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x48, 0x00, 0x52, 0x10, 0x6c, 0x6f, 0x67, 0x69, 0x63, 0x61, - 0x6c, 0x54, 0x79, 0x70, 0x65, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x42, 0x0d, 0x0a, 0x0b, 0x66, 0x69, - 0x65, 0x6c, 0x64, 0x5f, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x22, 0xf8, 0x01, 0x0a, 0x0f, 0x41, 0x74, - 0x6f, 0x6d, 0x69, 0x63, 0x54, 0x79, 0x70, 0x65, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x12, 0x14, 0x0a, - 0x04, 0x62, 0x79, 0x74, 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x05, 0x48, 0x00, 0x52, 0x04, 0x62, - 0x79, 0x74, 0x65, 0x12, 0x16, 0x0a, 0x05, 0x69, 0x6e, 0x74, 0x31, 0x36, 0x18, 0x02, 0x20, 0x01, - 0x28, 0x05, 0x48, 0x00, 0x52, 0x05, 0x69, 0x6e, 0x74, 0x31, 0x36, 0x12, 0x16, 0x0a, 0x05, 0x69, - 0x6e, 0x74, 0x33, 0x32, 0x18, 0x03, 0x20, 0x01, 0x28, 0x05, 0x48, 0x00, 0x52, 0x05, 0x69, 0x6e, - 0x74, 0x33, 0x32, 0x12, 0x16, 0x0a, 0x05, 0x69, 0x6e, 0x74, 0x36, 0x34, 0x18, 0x04, 0x20, 0x01, - 0x28, 0x03, 0x48, 0x00, 0x52, 0x05, 0x69, 0x6e, 0x74, 0x36, 0x34, 0x12, 0x16, 0x0a, 0x05, 0x66, - 0x6c, 0x6f, 0x61, 0x74, 0x18, 0x05, 0x20, 0x01, 0x28, 0x02, 0x48, 0x00, 0x52, 0x05, 0x66, 0x6c, - 0x6f, 0x61, 0x74, 0x12, 0x18, 0x0a, 0x06, 0x64, 0x6f, 0x75, 0x62, 0x6c, 0x65, 0x18, 0x06, 0x20, - 0x01, 0x28, 0x01, 0x48, 0x00, 0x52, 0x06, 0x64, 0x6f, 0x75, 0x62, 0x6c, 0x65, 0x12, 0x18, 0x0a, - 0x06, 0x73, 0x74, 0x72, 0x69, 0x6e, 0x67, 0x18, 0x07, 0x20, 0x01, 0x28, 0x09, 0x48, 0x00, 0x52, - 0x06, 0x73, 0x74, 0x72, 0x69, 0x6e, 0x67, 0x12, 0x1a, 0x0a, 0x07, 0x62, 0x6f, 0x6f, 0x6c, 0x65, - 0x61, 0x6e, 0x18, 0x08, 0x20, 0x01, 0x28, 0x08, 0x48, 0x00, 0x52, 0x07, 0x62, 0x6f, 0x6f, 0x6c, - 0x65, 0x61, 0x6e, 0x12, 0x16, 0x0a, 0x05, 0x62, 0x79, 0x74, 0x65, 0x73, 0x18, 0x09, 0x20, 0x01, - 0x28, 0x0c, 0x48, 0x00, 0x52, 0x05, 0x62, 0x79, 0x74, 0x65, 0x73, 0x42, 0x07, 0x0a, 0x05, 0x76, - 0x61, 0x6c, 0x75, 0x65, 0x22, 0x59, 0x0a, 0x0e, 0x41, 0x72, 0x72, 0x61, 0x79, 0x54, 0x79, 0x70, - 0x65, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x12, 0x47, 0x0a, 0x07, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, - 0x74, 0x18, 0x01, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x2d, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, + 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x4d, 0x61, 0x70, 0x54, + 0x79, 0x70, 0x65, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x48, 0x00, 0x52, 0x08, 0x6d, 0x61, 0x70, 0x56, + 0x61, 0x6c, 0x75, 0x65, 0x12, 0x45, 0x0a, 0x09, 0x72, 0x6f, 0x77, 0x5f, 0x76, 0x61, 0x6c, 0x75, + 0x65, 0x18, 0x05, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x26, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, - 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x69, 0x65, 0x6c, - 0x64, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x52, 0x07, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x22, - 0x5c, 0x0a, 0x11, 0x49, 0x74, 0x65, 0x72, 0x61, 0x62, 0x6c, 0x65, 0x54, 0x79, 0x70, 0x65, 0x56, - 0x61, 0x6c, 0x75, 0x65, 0x12, 0x47, 0x0a, 0x07, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x18, - 0x01, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x2d, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, - 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, - 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x69, 0x65, 0x6c, 0x64, 0x56, - 0x61, 0x6c, 0x75, 0x65, 0x52, 0x07, 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x22, 0x59, 0x0a, - 0x0c, 0x4d, 0x61, 0x70, 0x54, 0x79, 0x70, 0x65, 
0x56, 0x61, 0x6c, 0x75, 0x65, 0x12, 0x49, 0x0a, - 0x07, 0x65, 0x6e, 0x74, 0x72, 0x69, 0x65, 0x73, 0x18, 0x01, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x2f, - 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, - 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, - 0x76, 0x31, 0x2e, 0x4d, 0x61, 0x70, 0x54, 0x79, 0x70, 0x65, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x52, - 0x07, 0x65, 0x6e, 0x74, 0x72, 0x69, 0x65, 0x73, 0x22, 0x94, 0x01, 0x0a, 0x0c, 0x4d, 0x61, 0x70, - 0x54, 0x79, 0x70, 0x65, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x12, 0x3f, 0x0a, 0x03, 0x6b, 0x65, 0x79, - 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2d, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, - 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, - 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x69, 0x65, 0x6c, 0x64, - 0x56, 0x61, 0x6c, 0x75, 0x65, 0x52, 0x03, 0x6b, 0x65, 0x79, 0x12, 0x43, 0x0a, 0x05, 0x76, 0x61, - 0x6c, 0x75, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2d, 0x2e, 0x6f, 0x72, 0x67, 0x2e, - 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, - 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x69, - 0x65, 0x6c, 0x64, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x52, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x22, - 0x57, 0x0a, 0x10, 0x4c, 0x6f, 0x67, 0x69, 0x63, 0x61, 0x6c, 0x54, 0x79, 0x70, 0x65, 0x56, 0x61, - 0x6c, 0x75, 0x65, 0x12, 0x43, 0x0a, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x18, 0x01, 0x20, 0x01, - 0x28, 0x0b, 0x32, 0x2d, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, - 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, - 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x69, 0x65, 0x6c, 0x64, 0x56, 0x61, 0x6c, 0x75, - 0x65, 0x52, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x2a, 0x83, 0x01, 0x0a, 0x0a, 0x41, 0x74, 0x6f, - 0x6d, 0x69, 0x63, 0x54, 0x79, 0x70, 0x65, 0x12, 0x0f, 0x0a, 0x0b, 0x55, 0x4e, 0x53, 0x50, 0x45, - 0x43, 0x49, 0x46, 0x49, 0x45, 0x44, 0x10, 0x00, 0x12, 0x08, 0x0a, 0x04, 0x42, 0x59, 0x54, 0x45, - 0x10, 0x01, 0x12, 0x09, 0x0a, 0x05, 0x49, 0x4e, 0x54, 0x31, 0x36, 0x10, 0x02, 0x12, 0x09, 0x0a, - 0x05, 0x49, 0x4e, 0x54, 0x33, 0x32, 0x10, 0x03, 0x12, 0x09, 0x0a, 0x05, 0x49, 0x4e, 0x54, 0x36, - 0x34, 0x10, 0x04, 0x12, 0x09, 0x0a, 0x05, 0x46, 0x4c, 0x4f, 0x41, 0x54, 0x10, 0x05, 0x12, 0x0a, - 0x0a, 0x06, 0x44, 0x4f, 0x55, 0x42, 0x4c, 0x45, 0x10, 0x06, 0x12, 0x0a, 0x0a, 0x06, 0x53, 0x54, - 0x52, 0x49, 0x4e, 0x47, 0x10, 0x07, 0x12, 0x0b, 0x0a, 0x07, 0x42, 0x4f, 0x4f, 0x4c, 0x45, 0x41, - 0x4e, 0x10, 0x08, 0x12, 0x09, 0x0a, 0x05, 0x42, 0x59, 0x54, 0x45, 0x53, 0x10, 0x09, 0x42, 0x75, - 0x0a, 0x21, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, + 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x52, 0x6f, 0x77, 0x48, + 0x00, 0x52, 0x08, 0x72, 0x6f, 0x77, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x12, 0x63, 0x0a, 0x12, 0x6c, + 0x6f, 0x67, 0x69, 0x63, 0x61, 0x6c, 0x5f, 0x74, 0x79, 0x70, 0x65, 0x5f, 0x76, 0x61, 0x6c, 0x75, + 0x65, 0x18, 0x06, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x33, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, + 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, + 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x4c, 0x6f, 0x67, 0x69, + 0x63, 0x61, 0x6c, 0x54, 0x79, 0x70, 0x65, 0x56, 0x61, 0x6c, 0x75, 0x65, 
0x48, 0x00, 0x52, 0x10, + 0x6c, 0x6f, 0x67, 0x69, 0x63, 0x61, 0x6c, 0x54, 0x79, 0x70, 0x65, 0x56, 0x61, 0x6c, 0x75, 0x65, + 0x42, 0x0d, 0x0a, 0x0b, 0x66, 0x69, 0x65, 0x6c, 0x64, 0x5f, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x22, + 0xf8, 0x01, 0x0a, 0x0f, 0x41, 0x74, 0x6f, 0x6d, 0x69, 0x63, 0x54, 0x79, 0x70, 0x65, 0x56, 0x61, + 0x6c, 0x75, 0x65, 0x12, 0x14, 0x0a, 0x04, 0x62, 0x79, 0x74, 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, + 0x05, 0x48, 0x00, 0x52, 0x04, 0x62, 0x79, 0x74, 0x65, 0x12, 0x16, 0x0a, 0x05, 0x69, 0x6e, 0x74, + 0x31, 0x36, 0x18, 0x02, 0x20, 0x01, 0x28, 0x05, 0x48, 0x00, 0x52, 0x05, 0x69, 0x6e, 0x74, 0x31, + 0x36, 0x12, 0x16, 0x0a, 0x05, 0x69, 0x6e, 0x74, 0x33, 0x32, 0x18, 0x03, 0x20, 0x01, 0x28, 0x05, + 0x48, 0x00, 0x52, 0x05, 0x69, 0x6e, 0x74, 0x33, 0x32, 0x12, 0x16, 0x0a, 0x05, 0x69, 0x6e, 0x74, + 0x36, 0x34, 0x18, 0x04, 0x20, 0x01, 0x28, 0x03, 0x48, 0x00, 0x52, 0x05, 0x69, 0x6e, 0x74, 0x36, + 0x34, 0x12, 0x16, 0x0a, 0x05, 0x66, 0x6c, 0x6f, 0x61, 0x74, 0x18, 0x05, 0x20, 0x01, 0x28, 0x02, + 0x48, 0x00, 0x52, 0x05, 0x66, 0x6c, 0x6f, 0x61, 0x74, 0x12, 0x18, 0x0a, 0x06, 0x64, 0x6f, 0x75, + 0x62, 0x6c, 0x65, 0x18, 0x06, 0x20, 0x01, 0x28, 0x01, 0x48, 0x00, 0x52, 0x06, 0x64, 0x6f, 0x75, + 0x62, 0x6c, 0x65, 0x12, 0x18, 0x0a, 0x06, 0x73, 0x74, 0x72, 0x69, 0x6e, 0x67, 0x18, 0x07, 0x20, + 0x01, 0x28, 0x09, 0x48, 0x00, 0x52, 0x06, 0x73, 0x74, 0x72, 0x69, 0x6e, 0x67, 0x12, 0x1a, 0x0a, + 0x07, 0x62, 0x6f, 0x6f, 0x6c, 0x65, 0x61, 0x6e, 0x18, 0x08, 0x20, 0x01, 0x28, 0x08, 0x48, 0x00, + 0x52, 0x07, 0x62, 0x6f, 0x6f, 0x6c, 0x65, 0x61, 0x6e, 0x12, 0x16, 0x0a, 0x05, 0x62, 0x79, 0x74, + 0x65, 0x73, 0x18, 0x09, 0x20, 0x01, 0x28, 0x0c, 0x48, 0x00, 0x52, 0x05, 0x62, 0x79, 0x74, 0x65, + 0x73, 0x42, 0x07, 0x0a, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x22, 0x59, 0x0a, 0x0e, 0x41, 0x72, + 0x72, 0x61, 0x79, 0x54, 0x79, 0x70, 0x65, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x12, 0x47, 0x0a, 0x07, + 0x65, 0x6c, 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x18, 0x01, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x2d, 0x2e, + 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, + 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, + 0x31, 0x2e, 0x46, 0x69, 0x65, 0x6c, 0x64, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x52, 0x07, 0x65, 0x6c, + 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x22, 0x5c, 0x0a, 0x11, 0x49, 0x74, 0x65, 0x72, 0x61, 0x62, 0x6c, + 0x65, 0x54, 0x79, 0x70, 0x65, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x12, 0x47, 0x0a, 0x07, 0x65, 0x6c, + 0x65, 0x6d, 0x65, 0x6e, 0x74, 0x18, 0x01, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x2d, 0x2e, 0x6f, 0x72, + 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, + 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, + 0x46, 0x69, 0x65, 0x6c, 0x64, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x52, 0x07, 0x65, 0x6c, 0x65, 0x6d, + 0x65, 0x6e, 0x74, 0x22, 0x59, 0x0a, 0x0c, 0x4d, 0x61, 0x70, 0x54, 0x79, 0x70, 0x65, 0x56, 0x61, + 0x6c, 0x75, 0x65, 0x12, 0x49, 0x0a, 0x07, 0x65, 0x6e, 0x74, 0x72, 0x69, 0x65, 0x73, 0x18, 0x01, + 0x20, 0x03, 0x28, 0x0b, 0x32, 0x2f, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, + 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, + 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x4d, 0x61, 0x70, 0x54, 0x79, 0x70, 0x65, + 0x45, 0x6e, 0x74, 0x72, 0x79, 0x52, 0x07, 0x65, 0x6e, 0x74, 0x72, 0x69, 0x65, 0x73, 0x22, 0x94, + 0x01, 0x0a, 0x0c, 0x4d, 0x61, 0x70, 0x54, 0x79, 0x70, 0x65, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x12, + 
0x3f, 0x0a, 0x03, 0x6b, 0x65, 0x79, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2d, 0x2e, 0x6f, + 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, + 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, + 0x2e, 0x46, 0x69, 0x65, 0x6c, 0x64, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x52, 0x03, 0x6b, 0x65, 0x79, + 0x12, 0x43, 0x0a, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, + 0x2d, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, - 0x2e, 0x76, 0x31, 0x42, 0x09, 0x53, 0x63, 0x68, 0x65, 0x6d, 0x61, 0x41, 0x70, 0x69, 0x5a, 0x45, - 0x67, 0x69, 0x74, 0x68, 0x75, 0x62, 0x2e, 0x63, 0x6f, 0x6d, 0x2f, 0x61, 0x70, 0x61, 0x63, 0x68, - 0x65, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x73, 0x64, 0x6b, 0x73, 0x2f, 0x67, 0x6f, 0x2f, 0x70, - 0x6b, 0x67, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2f, 0x70, 0x69, - 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x5f, 0x76, 0x31, 0x3b, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, - 0x6e, 0x65, 0x5f, 0x76, 0x31, 0x62, 0x06, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x33, + 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x69, 0x65, 0x6c, 0x64, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x52, 0x05, + 0x76, 0x61, 0x6c, 0x75, 0x65, 0x22, 0x57, 0x0a, 0x10, 0x4c, 0x6f, 0x67, 0x69, 0x63, 0x61, 0x6c, + 0x54, 0x79, 0x70, 0x65, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x12, 0x43, 0x0a, 0x05, 0x76, 0x61, 0x6c, + 0x75, 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x2d, 0x2e, 0x6f, 0x72, 0x67, 0x2e, 0x61, + 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, + 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x2e, 0x46, 0x69, 0x65, + 0x6c, 0x64, 0x56, 0x61, 0x6c, 0x75, 0x65, 0x52, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x2a, 0x83, + 0x01, 0x0a, 0x0a, 0x41, 0x74, 0x6f, 0x6d, 0x69, 0x63, 0x54, 0x79, 0x70, 0x65, 0x12, 0x0f, 0x0a, + 0x0b, 0x55, 0x4e, 0x53, 0x50, 0x45, 0x43, 0x49, 0x46, 0x49, 0x45, 0x44, 0x10, 0x00, 0x12, 0x08, + 0x0a, 0x04, 0x42, 0x59, 0x54, 0x45, 0x10, 0x01, 0x12, 0x09, 0x0a, 0x05, 0x49, 0x4e, 0x54, 0x31, + 0x36, 0x10, 0x02, 0x12, 0x09, 0x0a, 0x05, 0x49, 0x4e, 0x54, 0x33, 0x32, 0x10, 0x03, 0x12, 0x09, + 0x0a, 0x05, 0x49, 0x4e, 0x54, 0x36, 0x34, 0x10, 0x04, 0x12, 0x09, 0x0a, 0x05, 0x46, 0x4c, 0x4f, + 0x41, 0x54, 0x10, 0x05, 0x12, 0x0a, 0x0a, 0x06, 0x44, 0x4f, 0x55, 0x42, 0x4c, 0x45, 0x10, 0x06, + 0x12, 0x0a, 0x0a, 0x06, 0x53, 0x54, 0x52, 0x49, 0x4e, 0x47, 0x10, 0x07, 0x12, 0x0b, 0x0a, 0x07, + 0x42, 0x4f, 0x4f, 0x4c, 0x45, 0x41, 0x4e, 0x10, 0x08, 0x12, 0x09, 0x0a, 0x05, 0x42, 0x59, 0x54, + 0x45, 0x53, 0x10, 0x09, 0x42, 0x78, 0x0a, 0x21, 0x6f, 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, + 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, + 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x42, 0x09, 0x53, 0x63, 0x68, 0x65, 0x6d, + 0x61, 0x41, 0x70, 0x69, 0x5a, 0x48, 0x67, 0x69, 0x74, 0x68, 0x75, 0x62, 0x2e, 0x63, 0x6f, 0x6d, + 0x2f, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x73, 0x64, 0x6b, + 0x73, 0x2f, 0x76, 0x32, 0x2f, 0x67, 0x6f, 0x2f, 0x70, 0x6b, 0x67, 0x2f, 0x62, 0x65, 0x61, 0x6d, + 0x2f, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2f, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x5f, + 0x76, 0x31, 0x3b, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x5f, 0x76, 0x31, 0x62, 0x06, + 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x33, } var ( 
diff --git a/sdks/go/pkg/beam/model/pipeline_v1/standard_window_fns.pb.go b/sdks/go/pkg/beam/model/pipeline_v1/standard_window_fns.pb.go index 30c92044beca..0dbf8830e0a0 100644 --- a/sdks/go/pkg/beam/model/pipeline_v1/standard_window_fns.pb.go +++ b/sdks/go/pkg/beam/model/pipeline_v1/standard_window_fns.pb.go @@ -21,17 +21,17 @@ // Code generated by protoc-gen-go. DO NOT EDIT. // versions: -// protoc-gen-go v1.25.0-devel -// protoc v3.13.0 +// protoc-gen-go v1.27.1 +// protoc v3.17.3 // source: standard_window_fns.proto package pipeline_v1 import ( - duration "github.com/golang/protobuf/ptypes/duration" - timestamp "github.com/golang/protobuf/ptypes/timestamp" protoreflect "google.golang.org/protobuf/reflect/protoreflect" protoimpl "google.golang.org/protobuf/runtime/protoimpl" + durationpb "google.golang.org/protobuf/types/known/durationpb" + timestamppb "google.golang.org/protobuf/types/known/timestamppb" reflect "reflect" sync "sync" ) @@ -269,10 +269,10 @@ type FixedWindowsPayload struct { unknownFields protoimpl.UnknownFields // (Required) Represents the size of the window. - Size *duration.Duration `protobuf:"bytes,1,opt,name=size,proto3" json:"size,omitempty"` + Size *durationpb.Duration `protobuf:"bytes,1,opt,name=size,proto3" json:"size,omitempty"` // (Required) Represents the timestamp of when the first window begins. // Window N will start at offset + N * size. - Offset *timestamp.Timestamp `protobuf:"bytes,2,opt,name=offset,proto3" json:"offset,omitempty"` + Offset *timestamppb.Timestamp `protobuf:"bytes,2,opt,name=offset,proto3" json:"offset,omitempty"` } func (x *FixedWindowsPayload) Reset() { @@ -307,14 +307,14 @@ func (*FixedWindowsPayload) Descriptor() ([]byte, []int) { return file_standard_window_fns_proto_rawDescGZIP(), []int{1} } -func (x *FixedWindowsPayload) GetSize() *duration.Duration { +func (x *FixedWindowsPayload) GetSize() *durationpb.Duration { if x != nil { return x.Size } return nil } -func (x *FixedWindowsPayload) GetOffset() *timestamp.Timestamp { +func (x *FixedWindowsPayload) GetOffset() *timestamppb.Timestamp { if x != nil { return x.Offset } @@ -341,12 +341,12 @@ type SlidingWindowsPayload struct { unknownFields protoimpl.UnknownFields // (Required) Represents the size of the window. - Size *duration.Duration `protobuf:"bytes,1,opt,name=size,proto3" json:"size,omitempty"` + Size *durationpb.Duration `protobuf:"bytes,1,opt,name=size,proto3" json:"size,omitempty"` // (Required) Represents the timestamp of when the first window begins. // Window N will start at offset + N * period. - Offset *timestamp.Timestamp `protobuf:"bytes,2,opt,name=offset,proto3" json:"offset,omitempty"` + Offset *timestamppb.Timestamp `protobuf:"bytes,2,opt,name=offset,proto3" json:"offset,omitempty"` // (Required) Represents the amount of time between each start of a window. 
- Period *duration.Duration `protobuf:"bytes,3,opt,name=period,proto3" json:"period,omitempty"` + Period *durationpb.Duration `protobuf:"bytes,3,opt,name=period,proto3" json:"period,omitempty"` } func (x *SlidingWindowsPayload) Reset() { @@ -381,21 +381,21 @@ func (*SlidingWindowsPayload) Descriptor() ([]byte, []int) { return file_standard_window_fns_proto_rawDescGZIP(), []int{2} } -func (x *SlidingWindowsPayload) GetSize() *duration.Duration { +func (x *SlidingWindowsPayload) GetSize() *durationpb.Duration { if x != nil { return x.Size } return nil } -func (x *SlidingWindowsPayload) GetOffset() *timestamp.Timestamp { +func (x *SlidingWindowsPayload) GetOffset() *timestamppb.Timestamp { if x != nil { return x.Offset } return nil } -func (x *SlidingWindowsPayload) GetPeriod() *duration.Duration { +func (x *SlidingWindowsPayload) GetPeriod() *durationpb.Duration { if x != nil { return x.Period } @@ -418,7 +418,7 @@ type SessionWindowsPayload struct { unknownFields protoimpl.UnknownFields // (Required) Minimum duration of gaps between sessions. - GapSize *duration.Duration `protobuf:"bytes,1,opt,name=gap_size,json=gapSize,proto3" json:"gap_size,omitempty"` + GapSize *durationpb.Duration `protobuf:"bytes,1,opt,name=gap_size,json=gapSize,proto3" json:"gap_size,omitempty"` } func (x *SessionWindowsPayload) Reset() { @@ -453,7 +453,7 @@ func (*SessionWindowsPayload) Descriptor() ([]byte, []int) { return file_standard_window_fns_proto_rawDescGZIP(), []int{3} } -func (x *SessionWindowsPayload) GetGapSize() *duration.Duration { +func (x *SessionWindowsPayload) GetGapSize() *durationpb.Duration { if x != nil { return x.GapSize } @@ -513,16 +513,16 @@ var file_standard_window_fns_proto_rawDesc = []byte{ 0x6d, 0x12, 0x37, 0x0a, 0x0a, 0x50, 0x52, 0x4f, 0x50, 0x45, 0x52, 0x54, 0x49, 0x45, 0x53, 0x10, 0x00, 0x1a, 0x27, 0xa2, 0xb4, 0xfa, 0xc2, 0x05, 0x21, 0x62, 0x65, 0x61, 0x6d, 0x3a, 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x5f, 0x66, 0x6e, 0x3a, 0x73, 0x65, 0x73, 0x73, 0x69, 0x6f, 0x6e, 0x5f, - 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x73, 0x3a, 0x76, 0x31, 0x42, 0x7d, 0x0a, 0x21, 0x6f, 0x72, - 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, 0x6f, - 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, 0x42, - 0x11, 0x53, 0x74, 0x61, 0x6e, 0x64, 0x61, 0x72, 0x64, 0x57, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x46, - 0x6e, 0x73, 0x5a, 0x45, 0x67, 0x69, 0x74, 0x68, 0x75, 0x62, 0x2e, 0x63, 0x6f, 0x6d, 0x2f, 0x61, - 0x70, 0x61, 0x63, 0x68, 0x65, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x73, 0x64, 0x6b, 0x73, 0x2f, - 0x67, 0x6f, 0x2f, 0x70, 0x6b, 0x67, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x6d, 0x6f, 0x64, 0x65, - 0x6c, 0x2f, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x5f, 0x76, 0x31, 0x3b, 0x70, 0x69, - 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x5f, 0x76, 0x31, 0x62, 0x06, 0x70, 0x72, 0x6f, 0x74, 0x6f, - 0x33, + 0x77, 0x69, 0x6e, 0x64, 0x6f, 0x77, 0x73, 0x3a, 0x76, 0x31, 0x42, 0x80, 0x01, 0x0a, 0x21, 0x6f, + 0x72, 0x67, 0x2e, 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2e, 0x62, 0x65, 0x61, 0x6d, 0x2e, 0x6d, + 0x6f, 0x64, 0x65, 0x6c, 0x2e, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x2e, 0x76, 0x31, + 0x42, 0x11, 0x53, 0x74, 0x61, 0x6e, 0x64, 0x61, 0x72, 0x64, 0x57, 0x69, 0x6e, 0x64, 0x6f, 0x77, + 0x46, 0x6e, 0x73, 0x5a, 0x48, 0x67, 0x69, 0x74, 0x68, 0x75, 0x62, 0x2e, 0x63, 0x6f, 0x6d, 0x2f, + 0x61, 0x70, 0x61, 0x63, 0x68, 0x65, 0x2f, 0x62, 0x65, 0x61, 0x6d, 0x2f, 0x73, 0x64, 0x6b, 0x73, + 0x2f, 0x76, 0x32, 0x2f, 0x67, 0x6f, 0x2f, 0x70, 0x6b, 0x67, 0x2f, 
0x62, 0x65, 0x61, 0x6d, 0x2f, + 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x2f, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x5f, 0x76, + 0x31, 0x3b, 0x70, 0x69, 0x70, 0x65, 0x6c, 0x69, 0x6e, 0x65, 0x5f, 0x76, 0x31, 0x62, 0x06, 0x70, + 0x72, 0x6f, 0x74, 0x6f, 0x33, } var ( @@ -548,8 +548,8 @@ var file_standard_window_fns_proto_goTypes = []interface{}{ (*FixedWindowsPayload)(nil), // 5: org.apache.beam.model.pipeline.v1.FixedWindowsPayload (*SlidingWindowsPayload)(nil), // 6: org.apache.beam.model.pipeline.v1.SlidingWindowsPayload (*SessionWindowsPayload)(nil), // 7: org.apache.beam.model.pipeline.v1.SessionWindowsPayload - (*duration.Duration)(nil), // 8: google.protobuf.Duration - (*timestamp.Timestamp)(nil), // 9: google.protobuf.Timestamp + (*durationpb.Duration)(nil), // 8: google.protobuf.Duration + (*timestamppb.Timestamp)(nil), // 9: google.protobuf.Timestamp } var file_standard_window_fns_proto_depIdxs = []int32{ 8, // 0: org.apache.beam.model.pipeline.v1.FixedWindowsPayload.size:type_name -> google.protobuf.Duration diff --git a/sdks/go/pkg/beam/options/gcpopts/options.go b/sdks/go/pkg/beam/options/gcpopts/options.go index 29b5dc95d701..20cbb5b61c1e 100644 --- a/sdks/go/pkg/beam/options/gcpopts/options.go +++ b/sdks/go/pkg/beam/options/gcpopts/options.go @@ -23,7 +23,7 @@ import ( "os/exec" "strings" - "github.com/apache/beam/sdks/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" ) var ( diff --git a/sdks/go/pkg/beam/options/jobopts/options.go b/sdks/go/pkg/beam/options/jobopts/options.go index f1a159d19801..205804e77ddc 100644 --- a/sdks/go/pkg/beam/options/jobopts/options.go +++ b/sdks/go/pkg/beam/options/jobopts/options.go @@ -27,10 +27,26 @@ import ( "sync/atomic" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - "github.com/apache/beam/sdks/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" ) +func init() { + flag.Var(&SdkHarnessContainerImageOverrides, + "sdk_harness_container_image_override", + "Overrides for SDK harness container images. Could be for the "+ + "local SDK or for a remote SDK that pipeline has to support due "+ + "to a cross-language transform. Each entry consists of two values "+ + "separated by a comma where first value gives a regex to "+ + "identify the container image to override and the second value "+ + "gives the replacement container image. Multiple entries can be "+ + "specified by using this flag multiple times. A container will "+ + "have no more than 1 override applied to it. If multiple "+ + "overrides match a container image it is arbitrary which "+ + "will be applied.") +} + var ( // Endpoint is the job service endpoint. Endpoint = flag.String("endpoint", "", "Job service endpoint (required).") @@ -52,6 +68,10 @@ var ( "\"env\":{\"\": \"\"} }. "+ "All fields in the json are optional except command.") + // SdkHarnessContainerImageOverrides contains patterns for overriding + // container image names in a pipeline. + SdkHarnessContainerImageOverrides stringSlice + // WorkerBinary is the location of the compiled worker binary. If not // specified, the binary is produced via go build. WorkerBinary = flag.String("worker_binary", "", "Worker binary (optional)") @@ -122,12 +142,24 @@ func IsLoopback() bool { // Convenience function. 
func GetEnvironmentConfig(ctx context.Context) string { if *EnvironmentConfig == "" { - *EnvironmentConfig = os.ExpandEnv("apache/beam_go_sdk:latest") + *EnvironmentConfig = os.ExpandEnv("apache/beam_go_sdk:" + core.SdkVersion) log.Infof(ctx, "No environment config specified. Using default config: '%v'", *EnvironmentConfig) } return *EnvironmentConfig } +// GetSdkImageOverrides gets the specified overrides as a map where each key is +// a regular expression pattern to match, and each value is the string to +// replace matching containers with. +func GetSdkImageOverrides() map[string]string { + ret := make(map[string]string) + for _, pattern := range SdkHarnessContainerImageOverrides { + splits := strings.SplitN(pattern, ",", 2) + ret[splits[0]] = splits[1] + } + return ret +} + // GetExperiments returns the experiments. func GetExperiments() []string { if *Experiments == "" { diff --git a/sdks/go/pkg/beam/options/jobopts/stringSlice.go b/sdks/go/pkg/beam/options/jobopts/stringSlice.go new file mode 100644 index 000000000000..f6e7565662da --- /dev/null +++ b/sdks/go/pkg/beam/options/jobopts/stringSlice.go @@ -0,0 +1,49 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package jobopts + +import ( + "fmt" +) + +// stringSlice is a flag.Value implementation for string slices, that allows +// multiple strings to be assigned to one flag by specifying multiple instances +// of the flag. +// +// Example: +// var myFlags stringSlice +// flag.Var(&myFlags, "my_flag", "A list of flags") +// With the example above, the slice can be set to contain ["foo", "bar"]: +// cmd -my_flag foo -my_flag bar +type stringSlice []string + +// String implements the String method of flag.Value. This outputs the value +// of the flag as a string. +func (s *stringSlice) String() string { + return fmt.Sprintf("%v", *s) +} + +// Set implements the Set method of flag.Value. This stores a string input to +// the flag into a stringSlice representation. +func (s *stringSlice) Set(value string) error { + *s = append(*s, value) + return nil +} + +// Get returns the instance itself. 
+func (s stringSlice) Get() interface{} { + return s +} diff --git a/sdks/go/pkg/beam/pardo.go b/sdks/go/pkg/beam/pardo.go index ab2a8e8abf20..dfbbc7e42305 100644 --- a/sdks/go/pkg/beam/pardo.go +++ b/sdks/go/pkg/beam/pardo.go @@ -17,10 +17,10 @@ package beam import ( "fmt" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) func addParDoCtx(err error, s Scope) error { @@ -287,11 +287,11 @@ func ParDo0(s Scope, dofn interface{}, col PCollection, opts ...Option) { // called on each newly created restriction before they are processed. // * `RestrictionSize(elem, restriction) float64` // RestrictionSize returns a cheap size estimation for a restriction. This -// size is an abstract scalar value that represents how much work a -// restriction takes compared to other restrictions in the same DoFn. For -// example, a size of 200 represents twice as much work as a size of +// size is an abstract non-negative scalar value that represents how much +// work a restriction takes compared to other restrictions in the same DoFn. +// For example, a size of 200 represents twice as much work as a size of // 100, but the numbers do not represent anything on their own. Size is -// used by runners to estimate work for liquid sharding. +// used by runners to estimate work for dynamic work rebalancing. // * `CreateTracker(restriction) restrictionTracker` // CreateTracker creates and returns a restriction tracker (a concrete type // implementing the `sdf.RTracker` interface) given a restriction. 
The diff --git a/sdks/go/pkg/beam/pardo_test.go b/sdks/go/pkg/beam/pardo_test.go index 6f421dd3ef88..ae65241d5486 100644 --- a/sdks/go/pkg/beam/pardo_test.go +++ b/sdks/go/pkg/beam/pardo_test.go @@ -16,10 +16,20 @@ package beam import ( + "bytes" + "context" + "reflect" "strings" "testing" + + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/options/jobopts" ) +func init() { + RegisterType(reflect.TypeOf((*AnnotationsFn)(nil))) +} + func TestParDoForSize(t *testing.T) { var tests = []struct { name string @@ -61,3 +71,64 @@ func TestFormatParDoError(t *testing.T) { t.Errorf("formatParDoError(testFunction,2,1) = %v, want = %v", got, want) } } + +func TestAnnotations(t *testing.T) { + m := make(map[string][]byte) + m["privacy_property"] = []byte("differential_privacy") + doFn := &AnnotationsFn{Annotations: m} + + p := NewPipeline() + s := p.Root() + + values := [2]int{0, 1} + col := CreateList(s, values) + ParDo(s, doFn, col) + + ctx := context.Background() + envUrn := jobopts.GetEnvironmentUrn(ctx) + getEnvCfg := jobopts.GetEnvironmentConfig + + environment, err := graphx.CreateEnvironment(ctx, envUrn, getEnvCfg) + if err != nil { + t.Fatalf("Couldn't create environment build: %v", err) + } + + edges, _, err := p.Build() + if err != nil { + t.Fatalf("Pipeline couldn't build: %v", err) + } + pb, err := graphx.Marshal(edges, &graphx.Options{Environment: environment}) + if err != nil { + t.Fatalf("Couldn't graphx.Marshal edges: %v", err) + } + components := pb.GetComponents() + transforms := components.GetTransforms() + + foundAnnotationsFn := false + for _, transform := range transforms { + if strings.Contains(transform.GetUniqueName(), "AnnotationsFn") { + foundAnnotationsFn = true + annotations := transform.GetAnnotations() + for name, annotation := range annotations { + if strings.Compare(name, "privacy_property") != 0 { + t.Errorf("Annotation name: got %v, want %v", name, "privacy_property") + } + if bytes.Compare(annotation, []byte("differential_privacy")) != 0 { + t.Errorf("Annotation value: got %v, want %v", annotation, []byte("differential_privacy")) + } + } + } + } + if !foundAnnotationsFn { + t.Errorf("Couldn't find AnnotationsFn in graph %v", transforms) + } +} + +// AnnotationsFn is a dummy DoFn with an annotation. 
+type AnnotationsFn struct { + Annotations map[string][]byte +} + +func (fn *AnnotationsFn) ProcessElement(v int) int { + return v +} diff --git a/sdks/go/pkg/beam/partition.go b/sdks/go/pkg/beam/partition.go index 071fb48e017f..a209e5adc9a2 100644 --- a/sdks/go/pkg/beam/partition.go +++ b/sdks/go/pkg/beam/partition.go @@ -21,11 +21,11 @@ import ( "encoding/json" - "github.com/apache/beam/sdks/go/pkg/beam/core/funcx" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/funcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) var ( diff --git a/sdks/go/pkg/beam/partition_test.go b/sdks/go/pkg/beam/partition_test.go index 7d285cd5fc92..8fc4bf16703a 100644 --- a/sdks/go/pkg/beam/partition_test.go +++ b/sdks/go/pkg/beam/partition_test.go @@ -18,9 +18,9 @@ package beam_test import ( "testing" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/testing/passert" - "github.com/apache/beam/sdks/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" ) func init() { diff --git a/sdks/go/pkg/beam/pcollection.go b/sdks/go/pkg/beam/pcollection.go index 12110cd41c58..e5dc63289f39 100644 --- a/sdks/go/pkg/beam/pcollection.go +++ b/sdks/go/pkg/beam/pcollection.go @@ -16,9 +16,9 @@ package beam import ( - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // PCollection is an immutable collection of values of type 'A', which must be @@ -49,7 +49,6 @@ func (p PCollection) IsValid() bool { } // TODO(herohde) 5/30/2017: add name for PCollections? Java supports it. -// TODO(herohde) 5/30/2017: add windowing strategy and documentation. // Type returns the full type 'A' of the elements. 'A' must be a concrete // type, such as int or KV. diff --git a/sdks/go/pkg/beam/pipeline.go b/sdks/go/pkg/beam/pipeline.go index 26087d483c3d..b3a2a10dc1ba 100644 --- a/sdks/go/pkg/beam/pipeline.go +++ b/sdks/go/pkg/beam/pipeline.go @@ -16,8 +16,8 @@ package beam import ( - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/core/metrics" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/metrics" ) // Scope is a hierarchical grouping for composite transforms. Scopes can be @@ -90,4 +90,5 @@ func (p *Pipeline) String() string { // PipelineResult is the result of beamx.RunWithMetrics. 
type PipelineResult interface { Metrics() metrics.Results + JobID() string } diff --git a/sdks/go/pkg/beam/provision/provision.go b/sdks/go/pkg/beam/provision/provision.go index c53ce616ed3f..934a74c124c8 100644 --- a/sdks/go/pkg/beam/provision/provision.go +++ b/sdks/go/pkg/beam/provision/provision.go @@ -23,9 +23,9 @@ import ( "time" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - fnpb "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1" - "github.com/apache/beam/sdks/go/pkg/beam/util/grpcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + fnpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/fnexecution_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/grpcx" "github.com/golang/protobuf/jsonpb" google_pb "github.com/golang/protobuf/ptypes/struct" ) diff --git a/sdks/go/pkg/beam/runner.go b/sdks/go/pkg/beam/runner.go index 1b6b41fecd5b..43f6ccce5cd0 100644 --- a/sdks/go/pkg/beam/runner.go +++ b/sdks/go/pkg/beam/runner.go @@ -19,7 +19,7 @@ import ( "context" "fmt" - "github.com/apache/beam/sdks/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" ) // TODO(herohde) 7/6/2017: do we want to make the selected runner visible to diff --git a/sdks/go/pkg/beam/runners/dataflow/dataflow.go b/sdks/go/pkg/beam/runners/dataflow/dataflow.go index e29b17582a86..9032dea00adb 100644 --- a/sdks/go/pkg/beam/runners/dataflow/dataflow.go +++ b/sdks/go/pkg/beam/runners/dataflow/dataflow.go @@ -15,6 +15,11 @@ // Package dataflow contains the Dataflow runner for submitting pipelines // to Google Cloud Dataflow. +// +// This package infers Pipeline Options from flags automatically on job +// submission, for display in the Dataflow UI. +// Use the DontUseFlagAsPipelineOption function to prevent using a given +// flag as a PipelineOption. 
package dataflow import ( @@ -29,16 +34,17 @@ import ( "time" "cloud.google.com/go/storage" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/hooks" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - "github.com/apache/beam/sdks/go/pkg/beam/log" - "github.com/apache/beam/sdks/go/pkg/beam/options/gcpopts" - "github.com/apache/beam/sdks/go/pkg/beam/options/jobopts" - "github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib" - "github.com/apache/beam/sdks/go/pkg/beam/util/gcsx" - "github.com/apache/beam/sdks/go/pkg/beam/x/hooks/perf" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/pipelinex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/hooks" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/options/gcpopts" + "github.com/apache/beam/sdks/v2/go/pkg/beam/options/jobopts" + "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/dataflow/dataflowlib" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/gcsx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/hooks/perf" "github.com/golang/protobuf/proto" ) @@ -52,6 +58,7 @@ var ( serviceAccountEmail = flag.String("service_account_email", "", "Service account email (optional).") numWorkers = flag.Int64("num_workers", 0, "Number of workers (optional).") maxNumWorkers = flag.Int64("max_num_workers", 0, "Maximum number of workers during scaling (optional).") + diskSizeGb = flag.Int64("disk_size_gb", 0, "Size of root disk for VMs, in GB (optional).") autoscalingAlgorithm = flag.String("autoscaling_algorithm", "", "Autoscaling mode to use (optional).") zone = flag.String("zone", "", "GCP zone (optional)") network = flag.String("network", "", "GCP network (optional)") @@ -73,6 +80,66 @@ var ( sessionRecording = flag.String("session_recording", "", "Job records session transcripts") ) +// flagFilter filters flags that are already represented by the above flags +// or in the JobOpts to prevent them from appearing duplicated +// as PipelineOption display data. +// +// New flags that are already put into pipeline options +// should be added to this map. +var flagFilter = map[string]bool{ + "dataflow_endpoint": true, + "staging_location": true, + "worker_harness_container_image": true, + "labels": true, + "service_account_email": true, + "num_workers": true, + "max_num_workers": true, + "disk_size_gb": true, + "autoscaling_algorithm": true, + "zone": true, + "network": true, + "subnetwork": true, + "no_use_public_ips": true, + "temp_location": true, + "worker_machine_type": true, + "min_cpu_platform": true, + "dataflow_worker_jar": true, + "worker_region": true, + "worker_zone": true, + "teardown_policy": true, + "cpu_profiling": true, + "session_recording": true, + + // Job Options flags + "endpoint": true, + "job_name": true, + "environment_type": true, + "environment_config": true, + "experiments": true, + "async": true, + "retain_docker_containers": true, + "parallelism": true, + + // GCP opts + "project": true, + "region": true, + + // Other common beam flags. + "runner": true, + + // Don't filter these to note override. 
+ // "beam_strict": true, + // "sdk_harness_container_image_override": true, + // "worker_binary": true, +} + +// DontUseFlagAsPipelineOption prevents a set flag from appearing +// as a PipelineOption in the Dataflow UI. Useful for sensitive, +// noisy, or irrelevant configuration. +func DontUseFlagAsPipelineOption(s string) { + flagFilter[s] = true +} + func init() { // Note that we also _ import harness/init to setup the remote execution hook. beam.RegisterRunner("dataflow", Execute) @@ -129,16 +196,23 @@ func Execute(ctx context.Context, p *beam.Pipeline) (beam.PipelineResult, error) experiments := jobopts.GetExperiments() // Always use runner v2, unless set already. - var v2set bool + var v2set, portaSubmission bool for _, e := range experiments { if strings.Contains(e, "use_runner_v2") || strings.Contains(e, "use_unified_worker") { v2set = true - break + } + if strings.Contains(e, "use_portable_job_submission") { + portaSubmission = true } } + // Enable by default unified worker, and portable job submission. if !v2set { experiments = append(experiments, "use_unified_worker") } + if !portaSubmission { + experiments = append(experiments, "use_portable_job_submission") + } + if *minCPUPlatform != "" { experiments = append(experiments, fmt.Sprintf("min_cpu_platform=%v", *minCPUPlatform)) } @@ -155,6 +229,7 @@ func Execute(ctx context.Context, p *beam.Pipeline) (beam.PipelineResult, error) NoUsePublicIPs: *noUsePublicIPs, NumWorkers: *numWorkers, MaxNumWorkers: *maxNumWorkers, + DiskSizeGb: *diskSizeGb, Algorithm: *autoscalingAlgorithm, MachineType: *machineType, Labels: jobLabels, @@ -165,32 +240,49 @@ func Execute(ctx context.Context, p *beam.Pipeline) (beam.PipelineResult, error) WorkerRegion: *workerRegion, WorkerZone: *workerZone, TeardownPolicy: *teardownPolicy, + ContainerImage: getContainerImage(ctx), } if opts.TempLocation == "" { opts.TempLocation = gcsx.Join(*stagingLocation, "tmp") } // (1) Build and submit + // NOTE(herohde) 10/8/2018: the last segment of the names must be "worker" and "dataflow-worker.jar". 
+ id := fmt.Sprintf("go-%v-%v", atomic.AddInt32(&unique, 1), time.Now().UnixNano()) + + modelURL := gcsx.Join(*stagingLocation, id, "model") + workerURL := gcsx.Join(*stagingLocation, id, "worker") + jarURL := gcsx.Join(*stagingLocation, id, "dataflow-worker.jar") + xlangURL := gcsx.Join(*stagingLocation, id, "xlang") edges, _, err := p.Build() if err != nil { return nil, err } - enviroment, err := graphx.CreateEnvironment(ctx, jobopts.GetEnvironmentUrn(ctx), getContainerImage) + artifactURLs, err := dataflowlib.ResolveXLangArtifacts(ctx, edges, opts.Project, xlangURL) if err != nil { - return nil, errors.WithContext(err, "generating model pipeline") + return nil, errors.WithContext(err, "resolving cross-language artifacts") } - model, err := graphx.Marshal(edges, &graphx.Options{Environment: enviroment}) + opts.ArtifactURLs = artifactURLs + environment, err := graphx.CreateEnvironment(ctx, jobopts.GetEnvironmentUrn(ctx), getContainerImage) + if err != nil { + return nil, errors.WithContext(err, "creating environment for model pipeline") + } + model, err := graphx.Marshal(edges, &graphx.Options{Environment: environment}) if err != nil { return nil, errors.WithContext(err, "generating model pipeline") } + err = pipelinex.ApplySdkImageOverrides(model, jobopts.GetSdkImageOverrides()) + if err != nil { + return nil, errors.WithContext(err, "applying container image overrides") + } - // NOTE(herohde) 10/8/2018: the last segment of the names must be "worker" and "dataflow-worker.jar". - id := fmt.Sprintf("go-%v-%v", atomic.AddInt32(&unique, 1), time.Now().UnixNano()) - - modelURL := gcsx.Join(*stagingLocation, id, "model") - workerURL := gcsx.Join(*stagingLocation, id, "worker") - jarURL := gcsx.Join(*stagingLocation, id, "dataflow-worker.jar") + // Apply the all the as Go Options + flag.Visit(func(f *flag.Flag) { + if !flagFilter[f.Name] { + opts.Options.Options[f.Name] = f.Value.String() + } + }) if *dryRun { log.Info(ctx, "Dry-run: not submitting job!") @@ -204,8 +296,7 @@ func Execute(ctx context.Context, p *beam.Pipeline) (beam.PipelineResult, error) return nil, nil } - _, err = dataflowlib.Execute(ctx, model, opts, workerURL, jarURL, modelURL, *endpoint, *executeAsync) - return nil, err + return dataflowlib.Execute(ctx, model, opts, workerURL, jarURL, modelURL, *endpoint, *executeAsync) } func gcsRecorderHook(opts []string) perf.CaptureHook { bucket, prefix, err := gcsx.ParseObject(opts[0]) diff --git a/sdks/go/pkg/beam/runners/dataflow/dataflow_test.go b/sdks/go/pkg/beam/runners/dataflow/dataflow_test.go new file mode 100644 index 000000000000..1e2844630c4b --- /dev/null +++ b/sdks/go/pkg/beam/runners/dataflow/dataflow_test.go @@ -0,0 +1,29 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
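For context, the forwarding above relies on flag.Visit, which walks only flags that were explicitly set on the command line, so flags left at their defaults never become PipelineOption display data. A small self-contained illustration (flag names are illustrative):

package main

import (
	"flag"
	"fmt"
)

func main() {
	fs := flag.NewFlagSet("example", flag.ContinueOnError)
	fs.String("alpha", "", "set explicitly below")
	fs.String("beta", "", "left at its default")
	fs.Parse([]string{"-alpha=x"})

	// Visit reports only flags that were set: "alpha" appears, "beta" does not.
	fs.Visit(func(f *flag.Flag) {
		fmt.Printf("would forward %s=%q as a PipelineOption\n", f.Name, f.Value.String())
	})
}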
+ +package dataflow + +import "testing" + +func TestDontUseFlagAsPipelineOption(t *testing.T) { + f := "dummy_flag" + if flagFilter[f] { + t.Fatalf("%q is already a flag, but should be unset", f) + } + DontUseFlagAsPipelineOption(f) + if !flagFilter[f] { + t.Fatalf("%q should be in the filter, but isn't set", f) + } +} diff --git a/sdks/go/pkg/beam/runners/dataflow/dataflowlib/execute.go b/sdks/go/pkg/beam/runners/dataflow/dataflowlib/execute.go index 784059c02834..ae916447df38 100644 --- a/sdks/go/pkg/beam/runners/dataflow/dataflowlib/execute.go +++ b/sdks/go/pkg/beam/runners/dataflow/dataflowlib/execute.go @@ -22,17 +22,20 @@ import ( "encoding/json" "os" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/protox" - "github.com/apache/beam/sdks/go/pkg/beam/log" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" - "github.com/apache/beam/sdks/go/pkg/beam/runners/universal/runnerlib" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/metrics" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/protox" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/universal/runnerlib" "github.com/golang/protobuf/proto" df "google.golang.org/api/dataflow/v1b3" ) // Execute submits a pipeline as a Dataflow job. -func Execute(ctx context.Context, raw *pipepb.Pipeline, opts *JobOptions, workerURL, jarURL, modelURL, endpoint string, async bool) (string, error) { +func Execute(ctx context.Context, raw *pipepb.Pipeline, opts *JobOptions, workerURL, jarURL, modelURL, endpoint string, async bool) (*dataflowPipelineResult, error) { // (1) Upload Go binary to GCS. 
+ presult := &dataflowPipelineResult{} bin := opts.Worker if bin == "" { @@ -44,7 +47,7 @@ func Execute(ctx context.Context, raw *pipepb.Pipeline, opts *JobOptions, worker worker, err := runnerlib.BuildTempWorkerBinary(ctx) if err != nil { - return "", err + return presult, err } defer os.Remove(worker) @@ -57,7 +60,7 @@ func Execute(ctx context.Context, raw *pipepb.Pipeline, opts *JobOptions, worker log.Infof(ctx, "Staging worker binary: %v", bin) if err := StageFile(ctx, opts.Project, workerURL, bin); err != nil { - return "", err + return presult, err } log.Infof(ctx, "Staged worker binary: %v", workerURL) @@ -65,7 +68,7 @@ func Execute(ctx context.Context, raw *pipepb.Pipeline, opts *JobOptions, worker log.Infof(ctx, "Staging Dataflow worker jar: %v", opts.WorkerJar) if err := StageFile(ctx, opts.Project, jarURL, opts.WorkerJar); err != nil { - return "", err + return presult, err } log.Infof(ctx, "Staged worker jar: %v", jarURL) } @@ -74,12 +77,12 @@ func Execute(ctx context.Context, raw *pipepb.Pipeline, opts *JobOptions, worker p, err := Fixup(raw) if err != nil { - return "", err + return presult, err } log.Info(ctx, proto.MarshalTextString(p)) if err := StageModel(ctx, opts.Project, modelURL, protox.MustEncode(p)); err != nil { - return "", err + return presult, err } log.Infof(ctx, "Staged model pipeline: %v", modelURL) @@ -87,31 +90,41 @@ func Execute(ctx context.Context, raw *pipepb.Pipeline, opts *JobOptions, worker job, err := Translate(ctx, p, opts, workerURL, jarURL, modelURL) if err != nil { - return "", err + return presult, err } PrintJob(ctx, job) client, err := NewClient(ctx, endpoint) if err != nil { - return "", err + return presult, err } upd, err := Submit(ctx, client, opts.Project, opts.Region, job) if err != nil { - return "", err + return presult, err } log.Infof(ctx, "Submitted job: %v", upd.Id) if endpoint == "" { - log.Infof(ctx, "Console: https://console.cloud.google.com/dataflow/job/%v?project=%v", upd.Id, opts.Project) + log.Infof(ctx, "Console: https://console.cloud.google.com/dataflow/jobs/%v/%v?project=%v", opts.Region, upd.Id, opts.Project) } log.Infof(ctx, "Logs: https://console.cloud.google.com/logs/viewer?project=%v&resource=dataflow_step%%2Fjob_id%%2F%v", opts.Project, upd.Id) + presult.jobID = upd.Id + if async { - return upd.Id, nil + return presult, nil } // (4) Wait for completion. + err = WaitForCompletion(ctx, client, opts.Project, opts.Region, upd.Id) - return upd.Id, WaitForCompletion(ctx, client, opts.Project, opts.Region, upd.Id) + res, presultErr := newDataflowPipelineResult(ctx, client, p, opts.Project, opts.Region, upd.Id) + if presultErr != nil { + if err != nil { + return presult, errors.Wrap(err, presultErr.Error()) + } + return presult, presultErr + } + return res, err } // PrintJob logs the Dataflow job. 
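With this change the Dataflow runner surfaces the submitted job through beam.PipelineResult rather than returning a bare job ID string. A rough sketch of how a caller might consume it, assuming the usual Dataflow flags (project, region, staging_location, ...) are set and the job runs to completion (not async or dry-run); pipeline construction is elided:

package main

import (
	"context"
	"flag"
	"log"

	"github.com/apache/beam/sdks/v2/go/pkg/beam"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/runners/dataflow"
)

func main() {
	flag.Parse()
	beam.Init()

	p, s := beam.NewPipelineWithRoot()
	_ = s // build the pipeline on s here

	// Normally reached via the registered "dataflow" runner; called directly
	// here to keep the sketch short.
	res, err := dataflow.Execute(context.Background(), p)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("Dataflow job ID: %s", res.JobID())

	// Metrics are fetched from the service once the job completes; an async
	// submission only carries the job ID.
	m := res.Metrics()
	for _, c := range m.AllMetrics().Counters() {
		log.Printf("counter %v = %d", c.Key, c.Committed)
	}
}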
@@ -122,3 +135,24 @@ func PrintJob(ctx context.Context, job *df.Job) { } log.Info(ctx, string(str)) } + +type dataflowPipelineResult struct { + jobID string + metrics *metrics.Results +} + +func newDataflowPipelineResult(ctx context.Context, client *df.Service, p *pipepb.Pipeline, project, region, jobID string) (*dataflowPipelineResult, error) { + res, err := GetMetrics(ctx, client, project, region, jobID) + if err != nil { + return &dataflowPipelineResult{jobID, nil}, errors.Wrap(err, "failed to get metrics") + } + return &dataflowPipelineResult{jobID, FromMetricUpdates(res.Metrics, p)}, nil +} + +func (pr dataflowPipelineResult) Metrics() metrics.Results { + return *pr.metrics +} + +func (pr dataflowPipelineResult) JobID() string { + return pr.jobID +} diff --git a/sdks/go/pkg/beam/runners/dataflow/dataflowlib/fixup.go b/sdks/go/pkg/beam/runners/dataflow/dataflowlib/fixup.go index 013ec30e2964..b56acc962d21 100644 --- a/sdks/go/pkg/beam/runners/dataflow/dataflowlib/fixup.go +++ b/sdks/go/pkg/beam/runners/dataflow/dataflowlib/fixup.go @@ -16,7 +16,7 @@ package dataflowlib import ( - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" ) // Fixup proto pipeline with Dataflow quirks. diff --git a/sdks/go/pkg/beam/runners/dataflow/dataflowlib/job.go b/sdks/go/pkg/beam/runners/dataflow/dataflowlib/job.go index b9e913edc38b..eee14f4e8cb8 100644 --- a/sdks/go/pkg/beam/runners/dataflow/dataflowlib/job.go +++ b/sdks/go/pkg/beam/runners/dataflow/dataflowlib/job.go @@ -17,16 +17,18 @@ package dataflowlib import ( "context" + "fmt" "strings" "time" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime" // Importing to get the side effect of the remote execution hook. See init(). - _ "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness/init" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/pipelinex" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - "github.com/apache/beam/sdks/go/pkg/beam/log" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/harness/init" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/pipelinex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" "golang.org/x/oauth2/google" df "google.golang.org/api/dataflow/v1b3" ) @@ -48,11 +50,14 @@ type JobOptions struct { Subnetwork string NoUsePublicIPs bool NumWorkers int64 + DiskSizeGb int64 MachineType string Labels map[string]string ServiceAccountEmail string WorkerRegion string WorkerZone string + ContainerImage string + ArtifactURLs []string // Additional packages for workers. // Autoscaling settings Algorithm string @@ -74,9 +79,22 @@ type JobOptions struct { func Translate(ctx context.Context, p *pipepb.Pipeline, opts *JobOptions, workerURL, jarURL, modelURL string) (*df.Job, error) { // (1) Translate pipeline to v1b3 speak. - steps, err := translate(p) - if err != nil { - return nil, err + isPortableJob := false + for _, exp := range opts.Experiments { + if exp == "use_portable_job_submission" { + isPortableJob = true + } + } + + var steps []*df.Step + if isPortableJob { // Portable jobs do not need to provide dataflow steps. 
+ steps = make([]*df.Step, 0) + } else { + var err error + steps, err = translate(p) + if err != nil { + return nil, err + } } jobType := "JOB_TYPE_BATCH" @@ -89,8 +107,12 @@ func Translate(ctx context.Context, p *pipepb.Pipeline, opts *JobOptions, worker } images := pipelinex.ContainerImages(p) - if len(images) != 1 { - return nil, errors.Errorf("Dataflow supports one container image only: %v", images) + dfImages := make([]*df.SdkHarnessContainerImage, 0, len(images)) + for _, img := range images { + dfImages = append(dfImages, &df.SdkHarnessContainerImage{ + ContainerImage: img, + UseSingleCorePerContainer: false, + }) } packages := []*df.Package{{ @@ -108,6 +130,15 @@ func Translate(ctx context.Context, p *pipepb.Pipeline, opts *JobOptions, worker experiments = append(experiments, "use_staged_dataflow_worker_jar") } + for _, url := range opts.ArtifactURLs { + name := url[strings.LastIndexAny(url, "/")+1:] + pkg := &df.Package{ + Name: name, + Location: url, + } + packages = append(packages, pkg) + } + ipConfiguration := "WORKER_IP_UNSPECIFIED" if opts.NoUsePublicIPs { ipConfiguration = "WORKER_IP_PRIVATE" @@ -124,8 +155,8 @@ func Translate(ctx context.Context, p *pipepb.Pipeline, opts *JobOptions, worker Environment: &df.Environment{ ServiceAccountEmail: opts.ServiceAccountEmail, UserAgent: newMsg(userAgent{ - Name: "Apache Beam SDK for Go", - Version: "2.27.0.dev", + Name: core.SdkName, + Version: core.SdkVersion, }), Version: newMsg(version{ JobType: apiJobType, @@ -144,10 +175,12 @@ func Translate(ctx context.Context, p *pipepb.Pipeline, opts *JobOptions, worker AutoscalingSettings: &df.AutoscalingSettings{ MaxNumWorkers: opts.MaxNumWorkers, }, + DiskSizeGb: opts.DiskSizeGb, IpConfiguration: ipConfiguration, Kind: "harness", Packages: packages, - WorkerHarnessContainerImage: images[0], + WorkerHarnessContainerImage: opts.ContainerImage, + SdkHarnessContainerImages: dfImages, NumWorkers: 1, MachineType: opts.MachineType, Network: opts.Network, @@ -240,6 +273,12 @@ func NewClient(ctx context.Context, endpoint string) (*df.Service, error) { return client, nil } +// GetMetrics returns a collection of metrics describing the progress of a +// job by making a call to Cloud Monitoring service. 
+func GetMetrics(ctx context.Context, client *df.Service, project, region, jobID string) (*df.JobMetrics, error) { + return client.Projects.Locations.Jobs.GetMetrics(project, region, jobID).Do() +} + type dataflowOptions struct { Experiments []string `json:"experiments,omitempty"` PipelineURL string `json:"pipelineUrl"` @@ -307,5 +346,17 @@ func validateWorkerSettings(ctx context.Context, opts *JobOptions) error { opts.WorkerZone = opts.Zone opts.Zone = "" } + + numWorkers := opts.NumWorkers + maxNumWorkers := opts.MaxNumWorkers + if numWorkers < 0 { + return fmt.Errorf("num_workers (%d) cannot be negative", numWorkers) + } + if maxNumWorkers < 0 { + return fmt.Errorf("max_num_workers (%d) cannot be negative", maxNumWorkers) + } + if numWorkers > 0 && maxNumWorkers > 0 && numWorkers > maxNumWorkers { + return fmt.Errorf("num_workers (%d) cannot exceed max_num_workers (%d)", numWorkers, maxNumWorkers) + } return nil } diff --git a/sdks/go/pkg/beam/runners/dataflow/dataflowlib/job_test.go b/sdks/go/pkg/beam/runners/dataflow/dataflowlib/job_test.go index a7602292642b..1bf178f2700c 100644 --- a/sdks/go/pkg/beam/runners/dataflow/dataflowlib/job_test.go +++ b/sdks/go/pkg/beam/runners/dataflow/dataflowlib/job_test.go @@ -76,6 +76,28 @@ func TestValidateWorkerSettings(t *testing.T) { }, errMessage: "experiment worker_region and option Zone are mutually exclusive", }, + { + name: "test_num_workers_cannot_be_negative", + jobOptions: JobOptions{ + NumWorkers: -1, + }, + errMessage: "num_workers (-1) cannot be negative", + }, + { + name: "test_max_num_workers_cannot_be_negative", + jobOptions: JobOptions{ + MaxNumWorkers: -1, + }, + errMessage: "max_num_workers (-1) cannot be negative", + }, + { + name: "test_num_workers_cannot_exceed_max_num_workers", + jobOptions: JobOptions{ + NumWorkers: 43, + MaxNumWorkers: 42, + }, + errMessage: "num_workers (43) cannot exceed max_num_workers (42)", + }, } for _, test := range testsWithErr { @@ -110,6 +132,17 @@ func TestValidateWorkerSettings(t *testing.T) { opts: JobOptions{WorkerRegion: "foo"}, expected: JobOptions{WorkerRegion: "foo"}, }, + { + name: "test_num_workers_can_equal_max_num_workers", + opts: JobOptions{ + NumWorkers: 42, + MaxNumWorkers: 42, + }, + expected: JobOptions{ + NumWorkers: 42, + MaxNumWorkers: 42, + }, + }, } for _, test := range tests { diff --git a/sdks/go/pkg/beam/runners/dataflow/dataflowlib/messages.go b/sdks/go/pkg/beam/runners/dataflow/dataflowlib/messages.go index 864181169756..c8667c377246 100644 --- a/sdks/go/pkg/beam/runners/dataflow/dataflowlib/messages.go +++ b/sdks/go/pkg/beam/runners/dataflow/dataflowlib/messages.go @@ -19,8 +19,8 @@ import ( "encoding/json" "fmt" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx" "google.golang.org/api/googleapi" ) diff --git a/sdks/go/pkg/beam/runners/dataflow/dataflowlib/metrics.go b/sdks/go/pkg/beam/runners/dataflow/dataflowlib/metrics.go new file mode 100644 index 000000000000..0d1dde0dc0a6 --- /dev/null +++ b/sdks/go/pkg/beam/runners/dataflow/dataflowlib/metrics.go @@ -0,0 +1,119 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. 
+// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package dataflowlib + +import ( + "fmt" + + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/metrics" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" + df "google.golang.org/api/dataflow/v1b3" +) + +// FromMetricUpdates extracts metrics from a slice of MetricUpdate objects and +// groups them into counters, distributions and gauges. +// +// Dataflow currently only reports Counter and Distribution metrics to Cloud +// Monitoring. Gauge metrics are not supported. The output metrics.Results will +// not contain any gauges. +func FromMetricUpdates(allMetrics []*df.MetricUpdate, p *pipepb.Pipeline) *metrics.Results { + ac, ad := groupByType(allMetrics, p, true) + cc, cd := groupByType(allMetrics, p, false) + + return metrics.NewResults(metrics.MergeCounters(ac, cc), metrics.MergeDistributions(ad, cd), make([]metrics.GaugeResult, 0)) +} + +func groupByType(allMetrics []*df.MetricUpdate, p *pipepb.Pipeline, tentative bool) ( + map[metrics.StepKey]int64, + map[metrics.StepKey]metrics.DistributionValue) { + counters := make(map[metrics.StepKey]int64) + distributions := make(map[metrics.StepKey]metrics.DistributionValue) + + for _, metric := range allMetrics { + isTentative := metric.Name.Context["tentative"] == "true" + if isTentative != tentative { + continue + } + + key, err := extractKey(metric, p) + if err != nil { + continue + } + + if metric.Scalar != nil { + v, err := extractCounterValue(metric.Scalar) + if err != nil { + continue + } + counters[key] = v + } else if metric.Distribution != nil { + v, err := extractDistributionValue(metric.Distribution) + if err != nil { + continue + } + distributions[key] = v + } + } + return counters, distributions +} + +func extractKey(metric *df.MetricUpdate, p *pipepb.Pipeline) (metrics.StepKey, error) { + stepName, ok := metric.Name.Context["step"] + if !ok { + return metrics.StepKey{}, fmt.Errorf("could not find the internal step name") + } + userStepName := "" + + for k, transform := range p.GetComponents().GetTransforms() { + if k == stepName { + userStepName = transform.GetUniqueName() + break + } + } + if userStepName == "" { + return metrics.StepKey{}, fmt.Errorf("could not translate the internal step name %v", stepName) + } + + namespace := metric.Name.Context["namespace"] + if namespace == "" { + namespace = "dataflow/v1b3" + } + + return metrics.StepKey{Step: userStepName, Name: metric.Name.Name, Namespace: namespace}, nil +} + +func extractCounterValue(obj interface{}) (int64, error) { + v, ok := obj.(float64) + if !ok { + return -1, fmt.Errorf("expected float64, got data of type %T instead", obj) + } + return int64(v), nil +} + +func extractDistributionValue(obj interface{}) (metrics.DistributionValue, error) { + m := obj.(map[string]interface{}) + propertiesToVisit := []string{"count", "sum", "min", "max"} + var values [4]int64 + + for i, p := range propertiesToVisit { + v, ok := m[p].(float64) + if !ok { + return 
metrics.DistributionValue{}, fmt.Errorf("expected float64, got data of type %T instead", m[p]) + } + values[i] = int64(v) + } + return metrics.DistributionValue{Count: values[0], Sum: values[1], Min: values[2], Max: values[3]}, nil +} diff --git a/sdks/go/pkg/beam/runners/dataflow/dataflowlib/metrics_test.go b/sdks/go/pkg/beam/runners/dataflow/dataflowlib/metrics_test.go new file mode 100644 index 000000000000..3c0adab3c913 --- /dev/null +++ b/sdks/go/pkg/beam/runners/dataflow/dataflowlib/metrics_test.go @@ -0,0 +1,127 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package dataflowlib + +import ( + "testing" + + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/metrics" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" + "github.com/google/go-cmp/cmp" + df "google.golang.org/api/dataflow/v1b3" +) + +func TestFromMetricUpdates_Counters(t *testing.T) { + want := metrics.CounterResult{ + Attempted: 15, + Committed: 15, + Key: metrics.StepKey{ + Step: "main.customDoFn", + Name: "customCounter", + Namespace: "customDoFn", + }} + cName := newMetricStructuredName("customCounter", "customDoFn", false) + committed := df.MetricUpdate{Name: &cName, Scalar: 15.0} + + aName := newMetricStructuredName("customCounter", "customDoFn", true) + attempted := df.MetricUpdate{Name: &aName, Scalar: 15.0} + + p, err := newPipeline("main.customDoFn") + if err != nil { + t.Fatal(err) + } + + got := FromMetricUpdates([]*df.MetricUpdate{&attempted, &committed}, p).AllMetrics().Counters() + size := len(got) + if size < 1 { + t.Fatalf("Invalid array's size: got: %v, want: %v", size, 1) + } + if d := cmp.Diff(want, got[0]); d != "" { + t.Fatalf("Invalid counter: got: %v, want: %v, diff(-want,+got):\n %v", + got[0], want, d) + } +} + +func TestFromMetricUpdates_Distributions(t *testing.T) { + want := metrics.DistributionResult{ + Attempted: metrics.DistributionValue{ + Count: 100, + Sum: 5, + Min: -12, + Max: 30, + }, + Committed: metrics.DistributionValue{ + Count: 100, + Sum: 5, + Min: -12, + Max: 30, + }, + Key: metrics.StepKey{ + Step: "main.customDoFn", + Name: "customDist", + Namespace: "customDoFn", + }} + distribution := map[string]interface{}{ + "count": 100.0, + "sum": 5.0, + "min": -12.0, + "max": 30.0, + } + cName := newMetricStructuredName("customDist", "customDoFn", false) + committed := df.MetricUpdate{Name: &cName, Distribution: distribution} + + aName := newMetricStructuredName("customDist", "customDoFn", true) + attempted := df.MetricUpdate{Name: &aName, Distribution: distribution} + + p, err := newPipeline("main.customDoFn") + if err != nil { + t.Fatal(err) + } + + got := FromMetricUpdates([]*df.MetricUpdate{&attempted, &committed}, p).AllMetrics().Distributions() + size := len(got) + if size < 1 { + t.Fatalf("Invalid array's size: 
got: %v, want: %v", size, 1) + } + if d := cmp.Diff(want, got[0]); d != "" { + t.Fatalf("Invalid distribution: got: %v, want: %v, diff(-want,+got):\n %v", + got[0], want, d) + } +} + +func newMetricStructuredName(name, namespace string, attempted bool) df.MetricStructuredName { + context := map[string]string{ + "step": "e5", + "namespace": namespace, + } + if attempted { + context["tentative"] = "true" + } + return df.MetricStructuredName{Context: context, Name: name} +} + +func newPipeline(stepName string) (*pipepb.Pipeline, error) { + p := &pipepb.Pipeline{ + Components: &pipepb.Components{ + Transforms: map[string]*pipepb.PTransform{ + "e5": &pipepb.PTransform{ + UniqueName: stepName, + }, + }, + }, + } + return p, nil +} diff --git a/sdks/go/pkg/beam/runners/dataflow/dataflowlib/stage.go b/sdks/go/pkg/beam/runners/dataflow/dataflowlib/stage.go index 67a2bdefb4fe..bd0d4147d04f 100644 --- a/sdks/go/pkg/beam/runners/dataflow/dataflowlib/stage.go +++ b/sdks/go/pkg/beam/runners/dataflow/dataflowlib/stage.go @@ -22,8 +22,10 @@ import ( "os" "cloud.google.com/go/storage" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - "github.com/apache/beam/sdks/go/pkg/beam/util/gcsx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/xlangx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/gcsx" ) // StageModel uploads the pipeline model to GCS as a unique object. @@ -54,3 +56,28 @@ func upload(ctx context.Context, project, object string, r io.Reader) error { _, err = gcsx.Upload(ctx, client, project, bucket, obj, r) return err } + +// ResolveXLangArtifacts resolves cross-language artifacts with a given GCS +// URL as a destination, and then stages all local artifacts to that URL. This +// function returns a list of staged artifact URLs. 
+func ResolveXLangArtifacts(ctx context.Context, edges []*graph.MultiEdge, project, url string) ([]string, error) { + cfg := xlangx.ResolveConfig{ + SdkPath: url, + JoinFn: func(url, name string) string { + return gcsx.Join(url, "/", name) + }, + } + paths, err := xlangx.ResolveArtifactsWithConfig(ctx, edges, cfg) + if err != nil { + return nil, err + } + var urls []string + for local, remote := range paths { + err := StageFile(ctx, project, remote, local) + if err != nil { + return nil, errors.WithContextf(err, "staging file to %v", remote) + } + urls = append(urls, remote) + } + return urls, nil +} diff --git a/sdks/go/pkg/beam/runners/dataflow/dataflowlib/translate.go b/sdks/go/pkg/beam/runners/dataflow/dataflowlib/translate.go index 2bddbaff21df..e18ff6c74e14 100644 --- a/sdks/go/pkg/beam/runners/dataflow/dataflowlib/translate.go +++ b/sdks/go/pkg/beam/runners/dataflow/dataflowlib/translate.go @@ -22,18 +22,18 @@ import ( "net/url" "path" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/pipelinex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/protox" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/stringx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - pubsub_v1 "github.com/apache/beam/sdks/go/pkg/beam/io/pubsubio/v1" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/mtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/pipelinex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/protox" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/stringx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" "github.com/golang/protobuf/proto" df "google.golang.org/api/dataflow/v1b3" ) @@ -118,7 +118,7 @@ func (x *translator) translateTransform(trunk string, id string) ([]*df.Step, er // URL Query-escaped windowed _unnested_ value. It is read back in // a nested context at runtime. var buf bytes.Buffer - if err := exec.EncodeWindowedValueHeader(exec.MakeWindowEncoder(coder.NewGlobalWindow()), window.SingleGlobalWindow, mtime.ZeroTimestamp, &buf); err != nil { + if err := exec.EncodeWindowedValueHeader(exec.MakeWindowEncoder(coder.NewGlobalWindow()), window.SingleGlobalWindow, mtime.ZeroTimestamp, typex.NoFiringPane(), &buf); err != nil { return nil, err } value := string(append(buf.Bytes(), t.GetSpec().Payload...)) @@ -246,40 +246,6 @@ func (x *translator) translateTransform(trunk string, id string) ([]*df.Step, er prop.SerializedFn = encodeSerializedFn(x.extractWindowingStrategy(out)) return []*df.Step{x.newStep(id, windowIntoKind, prop)}, nil - case pubsub_v1.PubSubPayloadURN: - // Translate to native handling of PubSub I/O. 
- - var msg pubsub_v1.PubSubPayload - if err := proto.Unmarshal(t.Spec.Payload, &msg); err != nil { - return nil, errors.Wrap(err, "bad pubsub payload") - } - - prop.Format = "pubsub" - prop.PubSubTopic = msg.GetTopic() - prop.PubSubSubscription = msg.GetSubscription() - prop.PubSubIDLabel = msg.GetIdAttribute() - prop.PubSubTimestampLabel = msg.GetTimestampAttribute() - prop.PubSubWithAttributes = msg.GetWithAttributes() - - if prop.PubSubSubscription != "" { - prop.PubSubTopic = "" - } - - switch msg.Op { - case pubsub_v1.PubSubPayload_READ: - return []*df.Step{x.newStep(id, readKind, prop)}, nil - - case pubsub_v1.PubSubPayload_WRITE: - in := stringx.SingleValue(t.Inputs) - - prop.ParallelInput = x.pcollections[in] - prop.Encoding = x.wrapCoder(x.comp.Pcollections[in], coder.NewBytes()) - return []*df.Step{x.newStep(id, writeKind, prop)}, nil - - default: - return nil, errors.Errorf("bad pubsub op: %v", msg.Op) - } - default: if len(t.Subtransforms) > 0 { return x.translateTransforms(fmt.Sprintf("%v%v/", trunk, path.Base(t.UniqueName)), t.Subtransforms) diff --git a/sdks/go/pkg/beam/runners/direct/buffer.go b/sdks/go/pkg/beam/runners/direct/buffer.go index 05c15262b0e8..9606a55144aa 100644 --- a/sdks/go/pkg/beam/runners/direct/buffer.go +++ b/sdks/go/pkg/beam/runners/direct/buffer.go @@ -19,9 +19,9 @@ import ( "context" "fmt" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" ) // buffer buffers all input and notifies on FinishBundle. It is also a SideInputAdapter. diff --git a/sdks/go/pkg/beam/runners/direct/direct.go b/sdks/go/pkg/beam/runners/direct/direct.go index d304443f76fe..172915a5bff3 100644 --- a/sdks/go/pkg/beam/runners/direct/direct.go +++ b/sdks/go/pkg/beam/runners/direct/direct.go @@ -21,15 +21,15 @@ import ( "context" "path" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/core/metrics" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - "github.com/apache/beam/sdks/go/pkg/beam/log" - "github.com/apache/beam/sdks/go/pkg/beam/options/jobopts" - "github.com/apache/beam/sdks/go/pkg/beam/runners/vet" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/metrics" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/options/jobopts" + "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/vet" ) func init() { @@ -47,6 +47,7 @@ func Execute(ctx context.Context, p *beam.Pipeline) (beam.PipelineResult, error) log.Info(ctx, "Pipeline:") log.Info(ctx, p) + ctx = metrics.SetBundleID(ctx, "direct") // Ensure a metrics.Store exists. 
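+	// Attaching a bundle ID to the context ensures that user metrics recorded
+	// while the plan executes are captured in a metrics.Store, which
+	// MetricsExtractor reads below to populate the directPipelineResult.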
if *jobopts.Strict { log.Info(ctx, "Strict mode enabled, applying additional validation.") @@ -73,10 +74,26 @@ func Execute(ctx context.Context, p *beam.Pipeline) (beam.PipelineResult, error) if err = plan.Down(ctx); err != nil { return nil, err } - // TODO(lostluck) 2020/01/24: What's the right way to expose the - // metrics store for the direct runner? - metrics.DumpToLogFromStore(ctx, plan.Store()) - return nil, nil + + return newDirectPipelineResult(ctx) +} + +type directPipelineResult struct { + jobID string + metrics *metrics.Results +} + +func newDirectPipelineResult(ctx context.Context) (*directPipelineResult, error) { + metrics := metrics.MetricsExtractor(ctx) + return &directPipelineResult{metrics: &metrics}, nil +} + +func (pr directPipelineResult) Metrics() metrics.Results { + return *pr.metrics +} + +func (pr directPipelineResult) JobID() string { + return pr.jobID } // Compile translates a pipeline to a multi-bundle execution plan. diff --git a/sdks/go/pkg/beam/runners/direct/gbk.go b/sdks/go/pkg/beam/runners/direct/gbk.go index 13763ecb8136..f1edb8129248 100644 --- a/sdks/go/pkg/beam/runners/direct/gbk.go +++ b/sdks/go/pkg/beam/runners/direct/gbk.go @@ -19,11 +19,13 @@ import ( "bytes" "context" "fmt" + "sort" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) type group struct { @@ -40,6 +42,7 @@ type CoGBK struct { enc exec.ElementEncoder // key encoder for coder-equality wEnc exec.WindowEncoder // window encoder for windowing m map[string]*group + wins []typex.Window } func (n *CoGBK) ID() exec.UnitID { @@ -63,30 +66,52 @@ func (n *CoGBK) ProcessElement(ctx context.Context, elm *exec.FullValue, _ ...ex for _, w := range elm.Windows { ws := []typex.Window{w} + n.wins = append(n.wins, ws...) 
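+			// Every observed window is retained so that, under a merging (session)
+			// windowing strategy, FinishBundle can sort the windows by end time,
+			// merge overlapping IntervalWindows, and re-group the values under the
+			// merged windows.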
- var buf bytes.Buffer - if err := n.enc.Encode(&exec.FullValue{Elm: value.Elm}, &buf); err != nil { - return errors.WithContextf(err, "encoding key %v for CoGBK", elm) - } - if err := n.wEnc.Encode(ws, &buf); err != nil { - return errors.WithContextf(err, "encoding window %v for CoGBK", w) - } - key := buf.String() - - g, ok := n.m[key] - if !ok { - g = &group{ - key: exec.FullValue{Elm: value.Elm, Timestamp: value.Timestamp, Windows: ws}, - values: make([][]exec.FullValue, len(n.Edge.Input)), - } - n.m[key] = g + key, err := n.encodeKey(value.Elm, ws) + if err != nil { + return errors.Errorf("failed encoding key for %v: %v", elm, err) } + g := n.getGroup(n.m, key, value, ws) g.values[index] = append(g.values[index], exec.FullValue{Elm: value.Elm2, Timestamp: value.Timestamp}) } return nil } +func (n *CoGBK) encodeKey(elm interface{}, ws []typex.Window) (string, error) { + var buf bytes.Buffer + if err := n.enc.Encode(&exec.FullValue{Elm: elm}, &buf); err != nil { + return "", errors.WithContextf(err, "encoding key %v for CoGBK", elm) + } + if err := n.wEnc.Encode(ws, &buf); err != nil { + return "", errors.WithContextf(err, "encoding window %v for CoGBK", ws) + } + return buf.String(), nil +} + +func (n *CoGBK) getGroup(m map[string]*group, key string, value *exec.FullValue, ws []typex.Window) *group { + g, ok := m[key] + if !ok { + g = &group{ + key: exec.FullValue{Elm: value.Elm, Timestamp: value.Timestamp, Windows: ws}, + values: make([][]exec.FullValue, len(n.Edge.Input)), + } + m[key] = g + } + return g +} + func (n *CoGBK) FinishBundle(ctx context.Context) error { + winKind := n.Edge.Input[0].From.WindowingStrategy().Fn.Kind + if winKind == window.Sessions { + mergeMap, mergeErr := n.mergeWindows() + if mergeErr != nil { + return errors.Errorf("failed to merge windows, got: %v", mergeErr) + } + if reprocessErr := n.reprocessByWindow(mergeMap); reprocessErr != nil { + return errors.Errorf("failed to reprocess with merged windows, got :%v", reprocessErr) + } + } for key, g := range n.m { values := make([]exec.ReStream, len(g.values)) for i, list := range g.values { @@ -100,6 +125,58 @@ func (n *CoGBK) FinishBundle(ctx context.Context) error { return n.Out.FinishBundle(ctx) } +func (n *CoGBK) mergeWindows() (map[typex.Window]int, error) { + sort.Slice(n.wins, func(i int, j int) bool { + return n.wins[i].MaxTimestamp() < n.wins[j].MaxTimestamp() + }) + // mergeMap is a map from the oringal windows to the index of the new window + // in the mergedWins slice + mergeMap := make(map[typex.Window]int) + mergedWins := []typex.Window{} + for i := 0; i < len(n.wins); { + intWin, ok := n.wins[i].(window.IntervalWindow) + if !ok { + return nil, errors.Errorf("tried to merge non-interval window type %T", n.wins[i]) + } + mergeStart := intWin.Start + mergeEnd := intWin.End + j := i + 1 + for j < len(n.wins) { + candidateWin := n.wins[j].(window.IntervalWindow) + if candidateWin.Start <= mergeEnd { + mergeEnd = candidateWin.End + j++ + } else { + break + } + } + for k := i; k < j; k++ { + mergeMap[n.wins[k]] = len(mergedWins) + } + mergedWins = append(mergedWins, window.IntervalWindow{Start: mergeStart, End: mergeEnd}) + i = j + } + n.wins = mergedWins + return mergeMap, nil +} + +func (n *CoGBK) reprocessByWindow(mergeMap map[typex.Window]int) error { + newGroups := make(map[string]*group) + for _, g := range n.m { + ws := []typex.Window{n.wins[mergeMap[g.key.Windows[0]]]} + key, err := n.encodeKey(g.key.Elm, ws) + if err != nil { + return errors.Errorf("failed encoding key for %v: %v", g.key.Elm, 
err) + } + gr := n.getGroup(newGroups, key, &g.key, ws) + for i, list := range g.values { + gr.values[i] = append(gr.values[i], list...) + } + } + n.m = newGroups + return nil +} + func (n *CoGBK) Down(ctx context.Context) error { return nil } diff --git a/sdks/go/pkg/beam/runners/direct/gbk_test.go b/sdks/go/pkg/beam/runners/direct/gbk_test.go new file mode 100644 index 000000000000..d6c5450039fe --- /dev/null +++ b/sdks/go/pkg/beam/runners/direct/gbk_test.go @@ -0,0 +1,88 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package direct + +import ( + "strings" + "testing" + + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" +) + +func TestMergeWindows(t *testing.T) { + tests := []struct { + name string + wins []typex.Window + expectedMerge []typex.Window + expectedMapping []int + }{ + { + "two to one", + []typex.Window{window.IntervalWindow{Start: 0, End: 1000}, window.IntervalWindow{Start: 900, End: 1200}}, + []typex.Window{window.IntervalWindow{Start: 0, End: 1200}}, + []int{0, 0}, + }, + { + "four to two", + []typex.Window{window.IntervalWindow{Start: 0, End: 1000}, window.IntervalWindow{Start: 900, End: 1200}, + window.IntervalWindow{Start: 1750, End: 1900}, window.IntervalWindow{Start: 1900, End: 2000}}, + []typex.Window{window.IntervalWindow{Start: 0, End: 1200}, window.IntervalWindow{Start: 1750, End: 2000}}, + []int{0, 0, 1, 1}, + }, + { + "no merge", + []typex.Window{window.IntervalWindow{Start: 0, End: 800}, window.IntervalWindow{Start: 900, End: 1200}}, + []typex.Window{window.IntervalWindow{Start: 0, End: 800}, window.IntervalWindow{Start: 900, End: 1200}}, + []int{0, 1}, + }, + { + "no windows", + []typex.Window{}, + []typex.Window{}, + []int{}, + }, + } + for _, tc := range tests { + c := CoGBK{wins: tc.wins} + m, err := c.mergeWindows() + if err != nil { + t.Errorf("mergeWindows returned error, got %v", err) + } + if len(c.wins) != len(tc.expectedMerge) { + t.Errorf("%v got %v windows instead of 1", tc.name, len(c.wins)) + } + for i, win := range c.wins { + if !win.Equals(tc.expectedMerge[i]) { + t.Errorf("%v got window %v, expected %v", tc.name, win, tc.expectedMerge[i]) + } + if m[win] != tc.expectedMapping[i] { + t.Errorf("%v window %v mapped to index %v, expected %v", tc.name, win, m[win], tc.expectedMapping[i]) + } + } + } +} + +func TestMergeWindows_BadType(t *testing.T) { + c := CoGBK{wins: []typex.Window{window.GlobalWindow{}}} + _, err := c.mergeWindows() + if err == nil { + t.Fatalf("mergeWindows() succeeded when it should have failed") + } + if !strings.Contains(err.Error(), "tried to merge non-interval window type window.GlobalWindow") { + t.Errorf("mergeWindows failed but got incorrect error %v", err) + } +} diff --git 
a/sdks/go/pkg/beam/runners/direct/impulse.go b/sdks/go/pkg/beam/runners/direct/impulse.go index 932ee8cfbd67..1d6f78d06200 100644 --- a/sdks/go/pkg/beam/runners/direct/impulse.go +++ b/sdks/go/pkg/beam/runners/direct/impulse.go @@ -19,9 +19,9 @@ import ( "context" "fmt" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/mtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec" ) // Impulse emits its single element in one invocation. diff --git a/sdks/go/pkg/beam/runners/dot/dot.go b/sdks/go/pkg/beam/runners/dot/dot.go index 9a95ece8dcad..3e6370692bd7 100644 --- a/sdks/go/pkg/beam/runners/dot/dot.go +++ b/sdks/go/pkg/beam/runners/dot/dot.go @@ -23,9 +23,9 @@ import ( "flag" "io/ioutil" - "github.com/apache/beam/sdks/go/pkg/beam" - dotlib "github.com/apache/beam/sdks/go/pkg/beam/core/util/dot" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + dotlib "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/dot" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) func init() { diff --git a/sdks/go/pkg/beam/runners/flink/flink.go b/sdks/go/pkg/beam/runners/flink/flink.go index 45b13f9e9ac5..c79c778a7781 100644 --- a/sdks/go/pkg/beam/runners/flink/flink.go +++ b/sdks/go/pkg/beam/runners/flink/flink.go @@ -19,8 +19,8 @@ package flink import ( "context" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/runners/universal" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/universal" ) func init() { diff --git a/sdks/go/pkg/beam/runners/samza/samza.go b/sdks/go/pkg/beam/runners/samza/samza.go new file mode 100644 index 000000000000..01a7c5233af2 --- /dev/null +++ b/sdks/go/pkg/beam/runners/samza/samza.go @@ -0,0 +1,35 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +// Package samza contains the Samza runner. +package samza + +import ( + "context" + + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/universal" +) + +func init() { + beam.RegisterRunner("samza", Execute) + beam.RegisterRunner("SamzaRunner", Execute) +} + +// Execute runs the given pipeline on Samza. Convenience wrapper over the +// universal runner. 
+func Execute(ctx context.Context, p *beam.Pipeline) (beam.PipelineResult, error) { + return universal.Execute(ctx, p) +} diff --git a/sdks/go/pkg/beam/runners/session/session.go b/sdks/go/pkg/beam/runners/session/session.go index 2c5a55dcacaa..3902beab03b2 100644 --- a/sdks/go/pkg/beam/runners/session/session.go +++ b/sdks/go/pkg/beam/runners/session/session.go @@ -30,12 +30,12 @@ import ( "sync" "time" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness/session" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - "github.com/apache/beam/sdks/go/pkg/beam/log" - fnpb "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/harness/session" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + fnpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/fnexecution_v1" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" "github.com/golang/protobuf/proto" "google.golang.org/grpc" ) @@ -54,6 +54,8 @@ var sessionFile = flag.String("session_file", "", "Session file for the runner") // controlServer manages the FnAPI control channel. type controlServer struct { + fnpb.UnimplementedBeamFnControlServer + filename string wg *sync.WaitGroup // used to signal when the session is completed ctrlStream fnpb.BeamFnControl_ControlServer @@ -214,6 +216,8 @@ func extractPortSpec(spec *pipepb.FunctionSpec) string { // dataServer manages the FnAPI data channel. type dataServer struct { + fnpb.UnimplementedBeamFnDataServer + ctrl *controlServer } @@ -241,7 +245,9 @@ func (d *dataServer) Data(stream fnpb.BeamFnData_DataServer) error { } // loggingServer manages the FnAPI logging channel. -type loggingServer struct{} // no data content +type loggingServer struct { + fnpb.UnimplementedBeamFnLoggingServer +} func (l *loggingServer) Logging(stream fnpb.BeamFnLogging_LoggingServer) error { // This stream object is only used here. 
The stream is used for receiving, and diff --git a/sdks/go/pkg/beam/runners/spark/spark.go b/sdks/go/pkg/beam/runners/spark/spark.go index 7ee10fd4f889..4a37daf3c803 100644 --- a/sdks/go/pkg/beam/runners/spark/spark.go +++ b/sdks/go/pkg/beam/runners/spark/spark.go @@ -19,8 +19,8 @@ package spark import ( "context" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/runners/universal" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/universal" ) func init() { diff --git a/sdks/go/pkg/beam/runners/universal/extworker/extworker.go b/sdks/go/pkg/beam/runners/universal/extworker/extworker.go index 70d637a314e6..dc75c7c8ca5b 100644 --- a/sdks/go/pkg/beam/runners/universal/extworker/extworker.go +++ b/sdks/go/pkg/beam/runners/universal/extworker/extworker.go @@ -22,10 +22,10 @@ import ( "net" "sync" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness" - "github.com/apache/beam/sdks/go/pkg/beam/log" - fnpb "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1" - "github.com/apache/beam/sdks/go/pkg/beam/util/grpcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/harness" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + fnpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/fnexecution_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/grpcx" "google.golang.org/grpc" ) @@ -48,6 +48,8 @@ func StartLoopback(ctx context.Context, port int) (*Loopback, error) { // Loopback implements fnpb.BeamFnExternalWorkerPoolServer type Loopback struct { + fnpb.UnimplementedBeamFnExternalWorkerPoolServer + lis net.Listener root context.Context rootCancel context.CancelFunc diff --git a/sdks/go/pkg/beam/runners/universal/extworker/extworker_test.go b/sdks/go/pkg/beam/runners/universal/extworker/extworker_test.go index 3c80065e39d9..e7444570b9c6 100644 --- a/sdks/go/pkg/beam/runners/universal/extworker/extworker_test.go +++ b/sdks/go/pkg/beam/runners/universal/extworker/extworker_test.go @@ -19,8 +19,8 @@ import ( "context" "testing" - fnpb "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" + fnpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/fnexecution_v1" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" ) func TestLoopback(t *testing.T) { diff --git a/sdks/go/pkg/beam/runners/universal/runnerlib/compile.go b/sdks/go/pkg/beam/runners/universal/runnerlib/compile.go index 6510ad38fe51..a83e3653f12a 100644 --- a/sdks/go/pkg/beam/runners/universal/runnerlib/compile.go +++ b/sdks/go/pkg/beam/runners/universal/runnerlib/compile.go @@ -29,8 +29,8 @@ import ( "sync/atomic" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - "github.com/apache/beam/sdks/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" ) // IsWorkerCompatibleBinary returns the path to itself and true if running diff --git a/sdks/go/pkg/beam/runners/universal/runnerlib/execute.go b/sdks/go/pkg/beam/runners/universal/runnerlib/execute.go index 8e863b769c3c..2d7672a3a4b8 100644 --- a/sdks/go/pkg/beam/runners/universal/runnerlib/execute.go +++ b/sdks/go/pkg/beam/runners/universal/runnerlib/execute.go @@ -20,20 +20,20 @@ import ( "os" "time" - "github.com/apache/beam/sdks/go/pkg/beam/core/metrics" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/metricsx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - 
"github.com/apache/beam/sdks/go/pkg/beam/log" - jobpb "github.com/apache/beam/sdks/go/pkg/beam/model/jobmanagement_v1" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" - "github.com/apache/beam/sdks/go/pkg/beam/util/grpcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/metrics" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/metricsx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + jobpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/jobmanagement_v1" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/grpcx" ) // Execute executes a pipeline on the universal runner serving the given endpoint. // Convenience function. func Execute(ctx context.Context, p *pipepb.Pipeline, endpoint string, opt *JobOptions, async bool) (*universalPipelineResult, error) { // (1) Prepare job to obtain artifact staging instructions. - presult := &universalPipelineResult{JobID: ""} + presult := &universalPipelineResult{} cc, err := grpcx.Dial(ctx, endpoint, 2*time.Minute) if err != nil { @@ -105,7 +105,7 @@ func Execute(ctx context.Context, p *pipepb.Pipeline, endpoint string, opt *JobO } type universalPipelineResult struct { - JobID string + jobID string metrics *metrics.Results } @@ -124,3 +124,7 @@ func newUniversalPipelineResult(ctx context.Context, jobID string, client jobpb. func (pr universalPipelineResult) Metrics() metrics.Results { return *pr.metrics } + +func (pr universalPipelineResult) JobID() string { + return pr.jobID +} diff --git a/sdks/go/pkg/beam/runners/universal/runnerlib/job.go b/sdks/go/pkg/beam/runners/universal/runnerlib/job.go index 6fe606ecf6d3..12c17b35580e 100644 --- a/sdks/go/pkg/beam/runners/universal/runnerlib/job.go +++ b/sdks/go/pkg/beam/runners/universal/runnerlib/job.go @@ -20,14 +20,14 @@ import ( "fmt" "io" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/hooks" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - "github.com/apache/beam/sdks/go/pkg/beam/log" - jobpb "github.com/apache/beam/sdks/go/pkg/beam/model/jobmanagement_v1" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" - "github.com/apache/beam/sdks/go/pkg/beam/provision" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/hooks" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + jobpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/jobmanagement_v1" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/provision" "github.com/golang/protobuf/proto" ) diff --git a/sdks/go/pkg/beam/runners/universal/runnerlib/stage.go b/sdks/go/pkg/beam/runners/universal/runnerlib/stage.go index bd9470b591a8..3c218a4b27e0 100644 --- a/sdks/go/pkg/beam/runners/universal/runnerlib/stage.go +++ b/sdks/go/pkg/beam/runners/universal/runnerlib/stage.go @@ -21,11 +21,13 @@ import ( "os" "time" - "github.com/apache/beam/sdks/go/pkg/beam/artifact" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - jobpb "github.com/apache/beam/sdks/go/pkg/beam/model/jobmanagement_v1" - "github.com/apache/beam/sdks/go/pkg/beam/util/grpcx" + 
"github.com/apache/beam/sdks/v2/go/pkg/beam/artifact" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + jobpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/jobmanagement_v1" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/grpcx" + "github.com/golang/protobuf/proto" "google.golang.org/grpc" ) @@ -82,9 +84,17 @@ func StageViaPortableApi(ctx context.Context, cc *grpc.ClientConn, binary, st st case *jobpb.ArtifactRequestWrapper_GetArtifact: switch typeUrn := request.GetArtifact.Artifact.TypeUrn; typeUrn { case graphx.URNArtifactGoWorker: - StageFile(binary, stream) + if err := StageFile(binary, stream); err != nil { + return errors.Wrap(err, "failed to stage Go worker binary") + } case "beam:artifact:type:file:v1": - StageFile(binary, stream) + typePl := pipepb.ArtifactFilePayload{} + if err := proto.Unmarshal(request.GetArtifact.Artifact.TypePayload, &typePl); err != nil { + return errors.Wrap(err, "failed to parse artifact file payload") + } + if err := StageFile(typePl.GetPath(), stream); err != nil { + return errors.Wrapf(err, "failed to stage file %v", typePl.GetPath()) + } default: return errors.Errorf("request has unexpected artifact type %s", typeUrn) } diff --git a/sdks/go/pkg/beam/runners/universal/universal.go b/sdks/go/pkg/beam/runners/universal/universal.go index d55d27001c6b..b31845fe3074 100644 --- a/sdks/go/pkg/beam/runners/universal/universal.go +++ b/sdks/go/pkg/beam/runners/universal/universal.go @@ -21,18 +21,18 @@ import ( "context" "fmt" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/xlangx" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/xlangx" // Importing to get the side effect of the remote execution hook. See init(). 
- _ "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness/init" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - "github.com/apache/beam/sdks/go/pkg/beam/log" - "github.com/apache/beam/sdks/go/pkg/beam/options/jobopts" - "github.com/apache/beam/sdks/go/pkg/beam/runners/universal/extworker" - "github.com/apache/beam/sdks/go/pkg/beam/runners/universal/runnerlib" - "github.com/apache/beam/sdks/go/pkg/beam/runners/vet" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/harness/init" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/options/jobopts" + "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/universal/extworker" + "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/universal/runnerlib" + "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/vet" "github.com/golang/protobuf/proto" ) @@ -79,18 +79,18 @@ func Execute(ctx context.Context, p *beam.Pipeline) (beam.PipelineResult, error) getEnvCfg = srv.EnvironmentConfig } - enviroment, err := graphx.CreateEnvironment(ctx, envUrn, getEnvCfg) + // Fetch all dependencies for cross-language transforms + xlangx.ResolveArtifacts(ctx, edges, nil) + + environment, err := graphx.CreateEnvironment(ctx, envUrn, getEnvCfg) if err != nil { return nil, errors.WithContextf(err, "generating model pipeline") } - pipeline, err := graphx.Marshal(edges, &graphx.Options{Environment: enviroment}) + pipeline, err := graphx.Marshal(edges, &graphx.Options{Environment: environment}) if err != nil { return nil, errors.WithContextf(err, "generating model pipeline") } - // Fetch all dependencies for cross-language transforms - xlangx.ResolveArtifacts(ctx, edges, pipeline) - log.Info(ctx, proto.MarshalTextString(pipeline)) opt := &runnerlib.JobOptions{ diff --git a/sdks/go/pkg/beam/runners/vet/testpipeline/functions.go b/sdks/go/pkg/beam/runners/vet/testpipeline/functions.go index ce3d3cf06691..4e375598e1d4 100644 --- a/sdks/go/pkg/beam/runners/vet/testpipeline/functions.go +++ b/sdks/go/pkg/beam/runners/vet/testpipeline/functions.go @@ -15,9 +15,9 @@ package testpipeline -import "github.com/apache/beam/sdks/go/pkg/beam" +import "github.com/apache/beam/sdks/v2/go/pkg/beam" -//go:generate go install github.com/apache/beam/sdks/go/cmd/starcgen +//go:generate go install github.com/apache/beam/sdks/v2/go/cmd/starcgen //go:generate starcgen --package=testpipeline --identifiers=VFn,KvFn,KvEmitFn,SCombine //go:generate go fmt diff --git a/sdks/go/pkg/beam/runners/vet/testpipeline/testpipeline.go b/sdks/go/pkg/beam/runners/vet/testpipeline/testpipeline.go index 3ce3c174ee71..d206765b4782 100644 --- a/sdks/go/pkg/beam/runners/vet/testpipeline/testpipeline.go +++ b/sdks/go/pkg/beam/runners/vet/testpipeline/testpipeline.go @@ -20,7 +20,7 @@ package testpipeline import ( - "github.com/apache/beam/sdks/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam" ) // Performant constructs a performant pipeline. 
diff --git a/sdks/go/pkg/beam/runners/vet/testpipeline/testpipeline.shims.go b/sdks/go/pkg/beam/runners/vet/testpipeline/testpipeline.shims.go index 9d32d1fc753b..9608d9464f84 100644 --- a/sdks/go/pkg/beam/runners/vet/testpipeline/testpipeline.shims.go +++ b/sdks/go/pkg/beam/runners/vet/testpipeline/testpipeline.shims.go @@ -23,10 +23,11 @@ import ( "reflect" // Library imports - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx/schema" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) func init() { @@ -34,6 +35,7 @@ func init() { runtime.RegisterFunction(KvFn) runtime.RegisterFunction(VFn) runtime.RegisterType(reflect.TypeOf((*SCombine)(nil)).Elem()) + schema.RegisterType(reflect.TypeOf((*SCombine)(nil)).Elem()) reflectx.RegisterStructWrapper(reflect.TypeOf((*SCombine)(nil)).Elem(), wrapMakerSCombine) reflectx.RegisterFunc(reflect.TypeOf((*func(int, int) int)(nil)).Elem(), funcMakerIntIntГInt) reflectx.RegisterFunc(reflect.TypeOf((*func(int) (string, int))(nil)).Elem(), funcMakerIntГStringInt) diff --git a/sdks/go/pkg/beam/runners/vet/vet.go b/sdks/go/pkg/beam/runners/vet/vet.go index 268852abd076..d382a41f65df 100644 --- a/sdks/go/pkg/beam/runners/vet/vet.go +++ b/sdks/go/pkg/beam/runners/vet/vet.go @@ -33,16 +33,16 @@ import ( "unicode" "unicode/utf8" - "github.com/apache/beam/sdks/go/pkg/beam/util/shimx" - - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/core/funcx" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/shimx" + + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/funcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) func init() { diff --git a/sdks/go/pkg/beam/runners/vet/vet_test.go b/sdks/go/pkg/beam/runners/vet/vet_test.go index dc7076a5091e..7b58042ceeb5 100644 --- a/sdks/go/pkg/beam/runners/vet/vet_test.go +++ b/sdks/go/pkg/beam/runners/vet/vet_test.go @@ -17,10 +17,10 @@ package vet import ( "context" - "github.com/apache/beam/sdks/go/pkg/beam/runners/vet/testpipeline" + "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/vet/testpipeline" "testing" - "github.com/apache/beam/sdks/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam" ) func TestEvaluate(t *testing.T) { diff --git a/sdks/go/pkg/beam/schema.go b/sdks/go/pkg/beam/schema.go new file mode 100644 index 000000000000..95811bfd1086 --- /dev/null +++ b/sdks/go/pkg/beam/schema.go @@ -0,0 +1,72 @@ +// Licensed to the 
Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package beam + +import ( + "io" + "reflect" + + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx/schema" +) + +// RegisterSchemaProvider allows pipeline authors to provide special handling +// to convert types to schema representations, when those types are used as +// fields in types being encoded as schema rows. +// +// At present, the only supported provider interface is SchemaProvider, +// though this may change in the future. +// +// Providers only need to support a limited set of types for conversion, +// specifically a single struct type or a pointer to struct type, +// or an interface type, which they are registered with. +// +// Providers have three tasks with respect to a given supported logical type: +// +// * Producing schema representative types for their logical types. +// * Producing schema encoders for values of that type, writing beam +// schema encoded bytes for a value, matching the schema representative type. +// * Producing schema decoders for values of that type, reading beam +// schema encoded bytes, and producing a value of that type. +// +// Representative Schema types must be structs with only exported fields. +// +// A provider should be thread safe, but it's not required that a produced +// encoder or decoder is thread safe, since a separate encoder or decoder +// will be used for simultaneously executed bundles. +// +// If the supported type is an interface, that interface must have a non-empty +// method set. That is, it cannot be the empty interface. +// +// RegisterSchemaProvider must be called before beam.Init(), and conventionally +// is called in a package init() function. +func RegisterSchemaProvider(rt reflect.Type, provider interface{}) { + p := provider.(SchemaProvider) + schema.RegisterLogicalTypeProvider(rt, p.FromLogicalType) + coder.RegisterSchemaProviders(rt, p.BuildEncoder, p.BuildDecoder) +} + +// SchemaProvider specializes schema handling for complex types, including conversion to a +// valid schema base type. +// +// In particular, providers are intended to handle schemas for interface types. +// +// Separating the acting type from the provider implementation is good.
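Purely as an illustration (not part of this change), here is a minimal sketch of what a provider registered through RegisterSchemaProvider could look like, using the SchemaProvider methods declared just below. The Celsius interface, the celsiusProvider type, and the binary stand-in encoding are invented; a real provider would have to emit bytes matching the Beam schema row encoding of its representative struct.

package example

import (
	"encoding/binary"
	"io"
	"reflect"

	"github.com/apache/beam/sdks/v2/go/pkg/beam"
)

// Celsius is the logical interface type; it has a non-empty method set.
type Celsius interface{ Degrees() float64 }

// celsius is a concrete implementation returned by the decoder.
type celsius float64

func (c celsius) Degrees() float64 { return float64(c) }

// celsiusStorage is the schema-representative type: a struct with only exported fields.
type celsiusStorage struct {
	Degrees float64
}

// celsiusProvider supplies the SchemaProvider methods for Celsius.
type celsiusProvider struct{}

func (celsiusProvider) FromLogicalType(reflect.Type) (reflect.Type, error) {
	return reflect.TypeOf(celsiusStorage{}), nil
}

func (celsiusProvider) BuildEncoder(rt reflect.Type) (func(interface{}, io.Writer) error, error) {
	return func(v interface{}, w io.Writer) error {
		// A real encoder must produce the schema row encoding of celsiusStorage;
		// binary.Write is only a placeholder for that byte format.
		return binary.Write(w, binary.BigEndian, celsiusStorage{Degrees: v.(Celsius).Degrees()})
	}, nil
}

func (celsiusProvider) BuildDecoder(rt reflect.Type) (func(io.Reader) (interface{}, error), error) {
	return func(r io.Reader) (interface{}, error) {
		var s celsiusStorage
		if err := binary.Read(r, binary.BigEndian, &s); err != nil {
			return nil, err
		}
		return celsius(s.Degrees), nil
	}, nil
}

func init() {
	// Registration must happen before beam.Init(), conventionally in a package init().
	beam.RegisterSchemaProvider(reflect.TypeOf((*Celsius)(nil)).Elem(), celsiusProvider{})
}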
+type SchemaProvider interface { + FromLogicalType(reflect.Type) (reflect.Type, error) + BuildEncoder(rt reflect.Type) (func(interface{}, io.Writer) error, error) + BuildDecoder(rt reflect.Type) (func(io.Reader) (interface{}, error), error) +} diff --git a/sdks/go/pkg/beam/testing/passert/count.go b/sdks/go/pkg/beam/testing/passert/count.go index a5803c3291ec..6bc5ad89e87a 100644 --- a/sdks/go/pkg/beam/testing/passert/count.go +++ b/sdks/go/pkg/beam/testing/passert/count.go @@ -18,9 +18,9 @@ package passert import ( "fmt" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // Count verifies the given PCollection has the specified number of elements. diff --git a/sdks/go/pkg/beam/testing/passert/count_test.go b/sdks/go/pkg/beam/testing/passert/count_test.go index 4201ce7e5360..36f6c2aeeebe 100644 --- a/sdks/go/pkg/beam/testing/passert/count_test.go +++ b/sdks/go/pkg/beam/testing/passert/count_test.go @@ -18,8 +18,8 @@ package passert import ( "testing" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" ) func TestCount_Good(t *testing.T) { diff --git a/sdks/go/pkg/beam/testing/passert/equals.go b/sdks/go/pkg/beam/testing/passert/equals.go index 0c23aaef96a8..2311e961e432 100644 --- a/sdks/go/pkg/beam/testing/passert/equals.go +++ b/sdks/go/pkg/beam/testing/passert/equals.go @@ -21,7 +21,7 @@ import ( "sort" "strings" - "github.com/apache/beam/sdks/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam" ) // Equals verifies the given collection has the same values as the given @@ -40,6 +40,19 @@ func Equals(s beam.Scope, col beam.PCollection, values ...interface{}) beam.PCol return equals(subScope, col, other) } +// EqualsList verifies that the given collection has the same values as a +// given list, under coder equality. The values must be provided as an +// array or slice. This is equivalent to passing a beam.CreateList PCollection +// to Equals. +func EqualsList(s beam.Scope, col beam.PCollection, list interface{}) beam.PCollection { + subScope := s.Scope("passert.EqualsList") + if list == nil { + return Empty(subScope, col) + } + listCollection := beam.CreateList(subScope, list) + return equals(subScope, col, listCollection) +} + // equals verifies that the actual values match the expected ones. 
func equals(s beam.Scope, actual, expected beam.PCollection) beam.PCollection { unexpected, correct, missing := Diff(s, actual, expected) diff --git a/sdks/go/pkg/beam/testing/passert/equals_test.go b/sdks/go/pkg/beam/testing/passert/equals_test.go index 77f269c91c50..b0ddeae8d6f7 100644 --- a/sdks/go/pkg/beam/testing/passert/equals_test.go +++ b/sdks/go/pkg/beam/testing/passert/equals_test.go @@ -16,14 +16,15 @@ package passert import ( + "fmt" "strings" "testing" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" ) -func TestGood(t *testing.T) { +func TestEquals_Good(t *testing.T) { p, s := beam.NewPipelineWithRoot() wantC := beam.Create(s, "c", "b", "a") gotC := beam.Create(s, "a", "b", "c") @@ -34,51 +35,63 @@ func TestGood(t *testing.T) { } } -func TestBad(t *testing.T) { - tests := []struct { - name string - actual []string - expected []string - errorParts []string - }{ - { - "missing entry", - []string{"a", "b"}, - []string{"a", "b", "MISSING"}, - []string{"2 correct entries", "0 unexpected entries", "1 missing entries", "MISSING"}, - }, - { - "unexpected entry", - []string{"a", "b", "UNEXPECTED"}, - []string{"a", "b"}, - []string{"2 correct entries", "1 unexpected entries", "0 missing entries", "UNEXPECTED"}, - }, - { - "both kinds of problem", - []string{"a", "b", "UNEXPECTED"}, - []string{"a", "b", "MISSING"}, - []string{"2 correct entries", "1 unexpected entries", "1 missing entries", "UNEXPECTED", "MISSING"}, - }, - { - "not enough", - []string{"not enough"}, - []string{"not enough", "not enough"}, - []string{"1 correct entries", "0 unexpected entries", "1 missing entries", "not enough"}, - }, - { - "too many", - []string{"too many", "too many"}, - []string{"too many"}, - []string{"1 correct entries", "1 unexpected entries", "0 missing entries", "too many"}, - }, - { - "both kinds of wrong count", - []string{"too many", "too many", "not enough"}, - []string{"not enough", "too many", "not enough"}, - []string{"2 correct entries", "1 unexpected entries", "1 missing entries", "too many", "not enough"}, - }, +func TestEqualsList_Good(t *testing.T) { + p, s := beam.NewPipelineWithRoot() + wantL := [3]string{"c", "b", "a"} + gotC := beam.Create(s, "a", "b", "c") + + EqualsList(s, gotC, wantL) + if err := ptest.Run(p); err != nil { + t.Errorf("Pipeline failed: %v", err) } - for _, tc := range tests { +} + +var badEqualsTests = []struct { + name string + actual []string + expected []string + errorParts []string +}{ + { + "missing entry", + []string{"a", "b"}, + []string{"a", "b", "MISSING"}, + []string{"2 correct entries", "0 unexpected entries", "1 missing entries", "MISSING"}, + }, + { + "unexpected entry", + []string{"a", "b", "UNEXPECTED"}, + []string{"a", "b"}, + []string{"2 correct entries", "1 unexpected entries", "0 missing entries", "UNEXPECTED"}, + }, + { + "both kinds of problem", + []string{"a", "b", "UNEXPECTED"}, + []string{"a", "b", "MISSING"}, + []string{"2 correct entries", "1 unexpected entries", "1 missing entries", "UNEXPECTED", "MISSING"}, + }, + { + "not enough", + []string{"not enough"}, + []string{"not enough", "not enough"}, + []string{"1 correct entries", "0 unexpected entries", "1 missing entries", "not enough"}, + }, + { + "too many", + []string{"too many", "too many"}, + []string{"too many"}, + []string{"1 correct entries", "1 unexpected entries", "0 missing entries", "too many"}, + }, + { + "both 
kinds of wrong count", + []string{"too many", "too many", "not enough"}, + []string{"not enough", "too many", "not enough"}, + []string{"2 correct entries", "1 unexpected entries", "1 missing entries", "too many", "not enough"}, + }, +} + +func TestEquals_Bad(t *testing.T) { + for _, tc := range badEqualsTests { p, s := beam.NewPipelineWithRoot() out := Equals(s, beam.CreateList(s, tc.actual), beam.CreateList(s, tc.expected)) if err := ptest.Run(p); err == nil { @@ -97,3 +110,91 @@ func TestBad(t *testing.T) { } } } + +func TestEqualsList_Bad(t *testing.T) { + for _, tc := range badEqualsTests { + p, s := beam.NewPipelineWithRoot() + out := EqualsList(s, beam.CreateList(s, tc.actual), tc.expected) + if err := ptest.Run(p); err == nil { + t.Errorf("%v: pipeline SUCCEEDED but should have failed; got %v", tc.name, out) + } else { + str := err.Error() + missing := []string{} + for _, part := range tc.errorParts { + if !strings.Contains(str, part) { + missing = append(missing, part) + } + } + if len(missing) != 0 { + t.Errorf("%v: pipeline failed correctly, but substrings %#v are not present in message:\n%v", tc.name, missing, str) + } + } + } +} + +func ExampleEquals() { + p, s := beam.NewPipelineWithRoot() + col := beam.Create(s, "some", "example", "strings") + + Equals(s, col, "example", "some", "strings") + err := ptest.Run(p) + fmt.Println(err == nil) + + // Output: true +} + +func ExampleEquals_pcollection() { + p, s := beam.NewPipelineWithRoot() + col := beam.Create(s, "some", "example", "strings") + exp := beam.Create(s, "example", "some", "strings") + + Equals(s, col, exp) + err := ptest.Run(p) + fmt.Println(err == nil) + + // Output: true +} + +func ExampleEqualsList() { + p, s := beam.NewPipelineWithRoot() + col := beam.Create(s, "example", "inputs", "here") + list := [3]string{"here", "example", "inputs"} + + EqualsList(s, col, list) + err := ptest.Run(p) + fmt.Println(err == nil) + + // Output: true +} + +func unwrapError(err error) error { + if wrapper, ok := err.(interface{ Unwrap() error }); ok { + return wrapper.Unwrap() + } + return err +} + +func ExampleEqualsList_mismatch() { + p, s := beam.NewPipelineWithRoot() + col := beam.Create(s, "example", "inputs", "here") + list := [3]string{"wrong", "inputs", "here"} + + EqualsList(s, col, list) + err := ptest.Run(p) + err = unwrapError(err) + fmt.Println(err) + + // Output: + // DoFn[UID:1, PID:passert.failIfBadEntries, Name: github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert.failIfBadEntries] failed: + // actual PCollection does not match expected values + // ========= + // 2 correct entries (present in both) + // ========= + // 1 unexpected entries (present in actual, missing in expected) + // +++ + // example + // ========= + // 1 missing entries (missing in actual, present in expected) + // --- + // wrong +} diff --git a/sdks/go/pkg/beam/testing/passert/floats.go b/sdks/go/pkg/beam/testing/passert/floats.go new file mode 100644 index 000000000000..8bccd6989956 --- /dev/null +++ b/sdks/go/pkg/beam/testing/passert/floats.go @@ -0,0 +1,155 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. 
You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package passert + +import ( + "fmt" + "reflect" + "sort" + "strings" + + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" +) + +// EqualsFloat calls into TryEqualsFloat, checking that two PCollections of non-complex +// numeric types are equal, with each element being within a provided threshold of an +// expected value. Panics if TryEqualsFloat returns an error. +func EqualsFloat(s beam.Scope, observed, expected beam.PCollection, threshold float64) { + if err := TryEqualsFloat(s, observed, expected, threshold); err != nil { + panic(fmt.Sprintf("TryEqualsFloat failed: %v", err)) + } +} + +// TryEqualsFloat checks that two PCollections of floats are equal, with each element +// being within a specified threshold of its corresponding element. Both PCollections +// are loaded into memory, sorted, and compared element by element. Returns an error if +// the PCollection types are complex or non-numeric. +func TryEqualsFloat(s beam.Scope, observed, expected beam.PCollection, threshold float64) error { + errorStrings := []string{} + observedT := beam.ValidateNonCompositeType(observed) + if obsErr := validateNonComplexNumber(observedT.Type()); obsErr != nil { + errorStrings = append(errorStrings, fmt.Sprintf("observed PCollection has incompatible type: %v", obsErr)) + } + expectedT := beam.ValidateNonCompositeType(expected) + if expErr := validateNonComplexNumber(expectedT.Type()); expErr != nil { + errorStrings = append(errorStrings, fmt.Sprintf("expected PCollection has incompatible type: %v", expErr)) + } + if len(errorStrings) != 0 { + return errors.New(strings.Join(errorStrings, "\n")) + } + s = s.Scope(fmt.Sprintf("passert.EqualsFloat[%v]", threshold)) + beam.ParDo0(s, &thresholdFn{Threshold: threshold}, beam.Impulse(s), beam.SideInput{Input: observed}, beam.SideInput{Input: expected}) + return nil +} + +type thresholdFn struct { + Threshold float64 +} + +func (f *thresholdFn) ProcessElement(_ []byte, observed, expected func(*beam.T) bool) error { + var observedValues, expectedValues []float64 + var observedInput, expectedInput beam.T + for observed(&observedInput) { + val := toFloat(observedInput) + observedValues = append(observedValues, val) + } + for expected(&expectedInput) { + val := toFloat(expectedInput) + expectedValues = append(expectedValues, val) + } + if len(observedValues) != len(expectedValues) { + return errors.Errorf("PCollections of different lengths, got %v expected %v", len(observedValues), len(expectedValues)) + } + sort.Float64s(observedValues) + sort.Float64s(expectedValues) + var tooLow, tooHigh []string + for i := 0; i < len(observedValues); i++ { + delta := observedValues[i] - expectedValues[i] + if delta > f.Threshold { + tooHigh = append(tooHigh, fmt.Sprintf("%v > %v,", observedValues[i], expectedValues[i])) + } else if delta < f.Threshold*-1 { + tooLow = append(tooLow, fmt.Sprintf("%v < %v,", observedValues[i], expectedValues[i])) + } + } + if len(tooLow)+len(tooHigh) == 0 { +
return nil + } + errorStrings := []string{} + if len(tooLow) != 0 { + errorStrings = append(errorStrings, fmt.Sprintf("values below expected: %v", tooLow)) + } + if len(tooHigh) != 0 { + errorStrings = append(errorStrings, fmt.Sprintf("values above expected: %v", tooHigh)) + } + return errors.New(strings.Join(errorStrings, "\n")) +} + +// AllWithinBounds checks that a PCollection of numeric types is within the bounds +// [lo, high]. Checks for case where bounds are flipped and swaps them so the bounds +// passed to the doFn are always lo <= hi. +func AllWithinBounds(s beam.Scope, col beam.PCollection, lo, hi float64) { + t := beam.ValidateNonCompositeType(col) + validateNonComplexNumber(t.Type()) + if lo > hi { + lo, hi = hi, lo + } + s = s.Scope(fmt.Sprintf("passert.AllWithinBounds([%v, %v])", lo, hi)) + beam.ParDo0(s, &boundsFn{lo: lo, hi: hi}, beam.Impulse(s), beam.SideInput{Input: col}) +} + +type boundsFn struct { + lo, hi float64 +} + +func (f *boundsFn) ProcessElement(_ []byte, col func(*beam.T) bool) error { + var tooLow, tooHigh []float64 + var input beam.T + for col(&input) { + val := toFloat(input) + if val < f.lo { + tooLow = append(tooLow, val) + } else if val > f.hi { + tooHigh = append(tooHigh, val) + } + } + if len(tooLow)+len(tooHigh) == 0 { + return nil + } + errorStrings := []string{} + if len(tooLow) != 0 { + sort.Float64s(tooLow) + errorStrings = append(errorStrings, fmt.Sprintf("values below minimum value %v: %v", f.lo, tooLow)) + } + if len(tooHigh) != 0 { + sort.Float64s(tooHigh) + errorStrings = append(errorStrings, fmt.Sprintf("values above maximum value %v: %v", f.hi, tooHigh)) + } + return errors.New(strings.Join(errorStrings, "\n")) +} + +func toFloat(input beam.T) float64 { + return reflect.ValueOf(input.(interface{})).Convert(reflectx.Float64).Interface().(float64) +} + +func validateNonComplexNumber(t reflect.Type) error { + if !reflectx.IsNumber(t) || reflectx.IsComplex(t) { + return errors.Errorf("type must be a non-complex number: %v", t) + } + return nil +} diff --git a/sdks/go/pkg/beam/testing/passert/floats_test.go b/sdks/go/pkg/beam/testing/passert/floats_test.go new file mode 100644 index 000000000000..88bdc90757f6 --- /dev/null +++ b/sdks/go/pkg/beam/testing/passert/floats_test.go @@ -0,0 +1,323 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
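As a usage illustration (not part of this change; scaleFn and the values below are invented), a test might combine EqualsFloat and AllWithinBounds on the output of a simple DoFn, running on the default direct runner:

package example

import (
	"testing"

	"github.com/apache/beam/sdks/v2/go/pkg/beam"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest"
)

// scaleFn converts centimeters to meters, yielding float64 elements.
func scaleFn(cm int) float64 { return float64(cm) / 100.0 }

func TestScale(t *testing.T) {
	p, s := beam.NewPipelineWithRoot()
	meters := beam.ParDo(s, scaleFn, beam.Create(s, 100, 150, 210))

	// Observed values only need to land within 0.001 of the expected ones.
	passert.EqualsFloat(s, meters, beam.Create(s, 1.0, 1.5, 2.1), 0.001)
	// Every observed value must also fall inside [0, 3].
	passert.AllWithinBounds(s, meters, 0.0, 3.0)

	if err := ptest.Run(p); err != nil {
		t.Errorf("pipeline failed: %v", err)
	}
}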
+ +package passert + +import ( + "strings" + "testing" + + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" +) + +func TestEqualsFloat_exact(t *testing.T) { + p, s := beam.NewPipelineWithRoot() + left := beam.Create(s, 1.2, 4.6, 3.79) + right := beam.Create(s, 1.2, 4.6, 3.79) + EqualsFloat(s, left, right, 0.001) + err := ptest.Run(p) + if err != nil { + t.Errorf("Pipeline failed: %v", err) + } +} + +func TestEqualsFloat_withinThreshold(t *testing.T) { + p, s := beam.NewPipelineWithRoot() + left := beam.Create(s, 1.1996, 4.60002, 3.79) + right := beam.Create(s, 1.2, 4.6, 3.79) + EqualsFloat(s, left, right, 0.001) + err := ptest.Run(p) + if err != nil { + t.Errorf("Pipeline failed: %v", err) + } +} + +func TestEqualsFloat_bad(t *testing.T) { + var tests = []struct { + name string + observed []float64 + expected []float64 + threshold float64 + errorParts []string + }{ + { + "length mismatch", + []float64{1.2, 3.4}, + []float64{1.2, 3.4, 5.6}, + 0.001, + []string{"PCollections of different lengths", "got 2", "expected 3"}, + }, + { + "too low", + []float64{1.198, 3.3}, + []float64{1.2, 3.4}, + 0.001, + []string{"values below expected", "1.198 < 1.2", "3.3 < 3.4"}, + }, + { + "too high", + []float64{1.3, 3.402}, + []float64{1.2, 3.4}, + 0.001, + []string{"values above expected", "1.3 > 1.2", "3.402 > 3.4"}, + }, + { + "too high and too low", + []float64{1.198, 3.402}, + []float64{1.2, 3.4}, + 0.001, + []string{"values below expected", "1.198 < 1.2", "values above expected", "3.402 > 3.4"}, + }, + } + for _, tc := range tests { + p, s := beam.NewPipelineWithRoot() + left := beam.CreateList(s, tc.observed) + right := beam.CreateList(s, tc.expected) + EqualsFloat(s, left, right, tc.threshold) + err := ptest.Run(p) + if err == nil { + t.Fatalf("Pipeline succeeded but should have failed.") + } + str := err.Error() + missing := []string{} + for _, part := range tc.errorParts { + if !strings.Contains(str, part) { + missing = append(missing, part) + } + } + if len(missing) != 0 { + t.Errorf("%v: pipeline failed correctly but substrings %#v are not present in message:\n%v", tc.name, missing, str) + } + } +} + +func TestEqualsFloat_nonNumeric(t *testing.T) { + p, s := beam.NewPipelineWithRoot() + obs := beam.Create(s, "a", "b", "c") + exp := beam.Create(s, "a", "b", "c") + err := TryEqualsFloat(s, obs, exp, 0.001) + ptest.Run(p) + if err == nil { + t.Fatalf("Pipeline succeeded but should have failed.") + } + str := err.Error() + expErr := "type must be a non-complex number" + if !strings.Contains(str, expErr) { + t.Errorf("pipeline failed correctly but did not contain substring %#v in message: \n%v", expErr, str) + } +} + +func TestAllWithinBounds_GoodFloats(t *testing.T) { + p, s := beam.NewPipelineWithRoot() + col := beam.Create(s, 0.0, 0.5, 1.0) + AllWithinBounds(s, col, 0.0, 1.0) + err := ptest.Run(p) + if err != nil { + t.Errorf("Pipeline failed: %v", err) + } +} + +func TestAllWithinBounds_GoodInts(t *testing.T) { + p, s := beam.NewPipelineWithRoot() + col := beam.Create(s, 0, 1, 2) + AllWithinBounds(s, col, 0.0, 2.0) + err := ptest.Run(p) + if err != nil { + t.Errorf("Pipeline failed: %v", err) + } +} + +func TestAllWithinBounds_FlippedBounds(t *testing.T) { + p, s := beam.NewPipelineWithRoot() + col := beam.Create(s, 0.0, 0.5, 1.0) + AllWithinBounds(s, col, 1.0, 0.0) + err := ptest.Run(p) + if err != nil { + t.Errorf("Pipeline failed: %v", err) + } +} + +func TestAllWithinBounds_BadFloats(t *testing.T) { + var badBoundsTests = []struct { + 
name string + inputs []float64 + lo float64 + hi float64 + errorParts []string + }{ + { + "out of int bounds low", + []float64{-1.0, 0.5, 1.0}, + 0, + 1, + []string{"values below minimum value 0:", "[-1]"}, + }, + { + "out of int bounds high", + []float64{0.0, 0.5, 2.0}, + 0, + 1, + []string{"values above maximum value 1:", "[2]"}, + }, + { + "out of int bounds both", + []float64{-0.5, 0.0, 0.5, 2.0}, + 0, + 1, + []string{"values below minimum value 0:", "[-0.5]", "values above maximum value 1:", "[2]"}, + }, + { + "out of int bounds in-order", + []float64{-0.5, -1.0, 0.0, 0.5}, + 0, + 1, + []string{"values below minimum value 0:", "[-1 -0.5]"}, + }, + { + "out of float bounds low", + []float64{0.2, 1.0, 1.5}, + 0.5, + 1.5, + []string{"values below minimum value 0.5:", "[0.2]"}, + }, + { + "out of float bounds high", + []float64{0.5, 1.0, 2.0}, + 0.5, + 1.5, + []string{"values above maximum value 1.5:", "[2]"}, + }, + { + "out of float bounds both", + []float64{0.0, 0.5, 1.0, 2.0}, + 0.5, + 1.5, + []string{"values below minimum value 0.5:", "[0]", "values above maximum value 1.5:", "[2]"}, + }, + { + "out of float bounds in-order", + []float64{0.5, 1.0, 2.0, 2.5}, + 0.5, + 1.5, + []string{"values above maximum value 1.5:", "[2 2.5]"}, + }, + } + for _, tc := range badBoundsTests { + p, s := beam.NewPipelineWithRoot() + col := beam.CreateList(s, tc.inputs) + AllWithinBounds(s, col, tc.lo, tc.hi) + err := ptest.Run(p) + if err == nil { + t.Fatalf("Pipeline succeeded but should have failed.") + } + str := err.Error() + missing := []string{} + for _, part := range tc.errorParts { + if !strings.Contains(str, part) { + missing = append(missing, part) + } + } + if len(missing) != 0 { + t.Errorf("%v: pipeline failed correctly but substrings %#v are not present in message:\n%v", tc.name, missing, str) + } + } +} + +func TestAllWithinBounds_BadInts(t *testing.T) { + var badBoundsTests = []struct { + name string + inputs []int + lo float64 + hi float64 + errorParts []string + }{ + { + "out of int bounds low", + []int{-1, 0, 1}, + 0, + 1, + []string{"values below minimum value 0:", "[-1]"}, + }, + { + "out of int bounds high", + []int{0, 1, 2}, + 0, + 1, + []string{"values above maximum value 1:", "[2]"}, + }, + { + "out of int bounds both", + []int{-1, 0, 1, 2}, + 0, + 1, + []string{"values below minimum value 0:", "[-1]", "values above maximum value 1:", "[2]"}, + }, + { + "out of int bounds in-order", + []int{0, 1, 2, 3}, + 0, + 1, + []string{"values above maximum value 1:", "[2 3]"}, + }, + { + "out of float bounds low", + []int{0, 1, 2}, + 0.5, + 2.5, + []string{"values below minimum value 0.5:", "[0]"}, + }, + { + "out of float bounds high", + []int{1, 2, 3}, + 0.5, + 2.5, + []string{"values above maximum value 2.5:", "[3]"}, + }, + { + "out of float bounds both", + []int{0, 1, 2, 3}, + 0.5, + 2.5, + []string{"values below minimum value 0.5:", "[0]", "values above maximum value 2.5:", "[3]"}, + }, + { + "out of float bounds in-order", + []int{0, -1, 1, 2}, + 0.5, + 2.5, + []string{"values below minimum value 0.5:", "[-1 0]"}, + }, + } + for _, tc := range badBoundsTests { + p, s := beam.NewPipelineWithRoot() + col := beam.CreateList(s, tc.inputs) + AllWithinBounds(s, col, tc.lo, tc.hi) + err := ptest.Run(p) + if err == nil { + t.Fatalf("Pipeline succeeded but should have failed.") + } + str := err.Error() + missing := []string{} + for _, part := range tc.errorParts { + if !strings.Contains(str, part) { + missing = append(missing, part) + } + } + if len(missing) != 0 { + t.Errorf("%v: pipeline 
failed correctly but substrings %#v are not present in message:\n%v", tc.name, missing, str) + } + } +} diff --git a/sdks/go/pkg/beam/testing/passert/hash.go b/sdks/go/pkg/beam/testing/passert/hash.go index 225e5f7dffe1..678c86dc59c0 100644 --- a/sdks/go/pkg/beam/testing/passert/hash.go +++ b/sdks/go/pkg/beam/testing/passert/hash.go @@ -21,8 +21,8 @@ import ( "fmt" "sort" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // Hash validates that the incoming PCollection has the given size and diff --git a/sdks/go/pkg/beam/testing/passert/passert.go b/sdks/go/pkg/beam/testing/passert/passert.go index e77c5579c651..6530c67e2d6a 100644 --- a/sdks/go/pkg/beam/testing/passert/passert.go +++ b/sdks/go/pkg/beam/testing/passert/passert.go @@ -22,13 +22,13 @@ import ( "bytes" "fmt" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - "github.com/apache/beam/sdks/go/pkg/beam/transforms/filter" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/filter" ) -//go:generate go install github.com/apache/beam/sdks/go/cmd/starcgen +//go:generate go install github.com/apache/beam/sdks/v2/go/cmd/starcgen //go:generate starcgen --package=passert --identifiers=diffFn,failFn,failIfBadEntries,failKVFn,failGBKFn,hashFn,sumFn,errFn,elmCountCombineFn //go:generate go fmt diff --git a/sdks/go/pkg/beam/testing/passert/passert.shims.go b/sdks/go/pkg/beam/testing/passert/passert.shims.go index 5aef1410102b..a090aef743a4 100644 --- a/sdks/go/pkg/beam/testing/passert/passert.shims.go +++ b/sdks/go/pkg/beam/testing/passert/passert.shims.go @@ -25,22 +25,31 @@ import ( "reflect" // Library imports - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx/schema" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) func init() { runtime.RegisterFunction(failIfBadEntries) runtime.RegisterType(reflect.TypeOf((*diffFn)(nil)).Elem()) + schema.RegisterType(reflect.TypeOf((*diffFn)(nil)).Elem()) runtime.RegisterType(reflect.TypeOf((*elmCountCombineFn)(nil)).Elem()) + schema.RegisterType(reflect.TypeOf((*elmCountCombineFn)(nil)).Elem()) runtime.RegisterType(reflect.TypeOf((*errFn)(nil)).Elem()) + schema.RegisterType(reflect.TypeOf((*errFn)(nil)).Elem()) runtime.RegisterType(reflect.TypeOf((*failFn)(nil)).Elem()) + schema.RegisterType(reflect.TypeOf((*failFn)(nil)).Elem()) runtime.RegisterType(reflect.TypeOf((*failGBKFn)(nil)).Elem()) + schema.RegisterType(reflect.TypeOf((*failGBKFn)(nil)).Elem()) runtime.RegisterType(reflect.TypeOf((*failKVFn)(nil)).Elem()) + schema.RegisterType(reflect.TypeOf((*failKVFn)(nil)).Elem()) runtime.RegisterType(reflect.TypeOf((*hashFn)(nil)).Elem()) + schema.RegisterType(reflect.TypeOf((*hashFn)(nil)).Elem()) 
runtime.RegisterType(reflect.TypeOf((*sumFn)(nil)).Elem()) + schema.RegisterType(reflect.TypeOf((*sumFn)(nil)).Elem()) reflectx.RegisterStructWrapper(reflect.TypeOf((*diffFn)(nil)).Elem(), wrapMakerDiffFn) reflectx.RegisterStructWrapper(reflect.TypeOf((*elmCountCombineFn)(nil)).Elem(), wrapMakerElmCountCombineFn) reflectx.RegisterStructWrapper(reflect.TypeOf((*errFn)(nil)).Elem(), wrapMakerErrFn) diff --git a/sdks/go/pkg/beam/testing/passert/passert_test.go b/sdks/go/pkg/beam/testing/passert/passert_test.go new file mode 100644 index 000000000000..221b2f399c6c --- /dev/null +++ b/sdks/go/pkg/beam/testing/passert/passert_test.go @@ -0,0 +1,120 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package passert + +import ( + "strings" + "testing" + + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" +) + +func TestTrue_string(t *testing.T) { + p, s := beam.NewPipelineWithRoot() + col := beam.Create(s, "a", "a", "a") + True(s, col, func(input string) bool { + return input == "a" + }) + if err := ptest.Run(p); err != nil { + t.Errorf("Pipeline failed: %v", err) + } +} + +func TestTrue_numeric(t *testing.T) { + p, s := beam.NewPipelineWithRoot() + col := beam.Create(s, 3, 3, 6) + True(s, col, func(input int) bool { + return input < 13 + }) + if err := ptest.Run(p); err != nil { + t.Errorf("Pipeline failed: %v", err) + } +} + +func TestTrue_bad(t *testing.T) { + p, s := beam.NewPipelineWithRoot() + col := beam.Create(s, "a", "a", "b") + True(s, col, func(input string) bool { + return input == "a" + }) + err := ptest.Run(p) + if err == nil { + t.Fatalf("Pipeline succeeded when it should haved failed, got %v", err) + } + if !strings.Contains(err.Error(), "predicate(b) = false, want true") { + t.Errorf("Pipeline failed but did not produce the expected error, got %v", err) + } +} + +func TestFalse_string(t *testing.T) { + p, s := beam.NewPipelineWithRoot() + col := beam.Create(s, "a", "a", "a") + False(s, col, func(input string) bool { + return input == "b" + }) + if err := ptest.Run(p); err != nil { + t.Errorf("Pipeline failed: %v", err) + } +} + +func TestFalse_numeric(t *testing.T) { + p, s := beam.NewPipelineWithRoot() + col := beam.Create(s, 3, 3, 6) + False(s, col, func(input int) bool { + return input > 13 + }) + if err := ptest.Run(p); err != nil { + t.Errorf("Pipeline failed: %v", err) + } +} + +func TestFalse_bad(t *testing.T) { + p, s := beam.NewPipelineWithRoot() + col := beam.Create(s, "a", "a", "b") + False(s, col, func(input string) bool { + return input == "b" + }) + err := ptest.Run(p) + if err == nil { + t.Fatalf("Pipeline succeeded when it should haved failed, got %v", err) + } + if !strings.Contains(err.Error(), "predicate(b) = true, want false") { + t.Errorf("Pipeline failed but 
did not produce the expected error, got %v", err) + } +} + +func TestEmpty_good(t *testing.T) { + p, s := beam.NewPipelineWithRoot() + col := beam.CreateList(s, []string{}) + Empty(s, col) + if err := ptest.Run(p); err != nil { + t.Errorf("Pipeline failed: %v", err) + } +} + +func TestEmpty_bad(t *testing.T) { + p, s := beam.NewPipelineWithRoot() + col := beam.Create(s, "a") + Empty(s, col) + err := ptest.Run(p) + if err == nil { + t.Fatalf("Pipeline succeeded when it should haved failed, got %v", err) + } + if !strings.Contains(err.Error(), "PCollection contains a, want empty collection") { + t.Errorf("Pipeline failed but did not produce the expected error, got %v", err) + } +} diff --git a/sdks/go/pkg/beam/testing/passert/sum.go b/sdks/go/pkg/beam/testing/passert/sum.go index 99d7a954d0c6..5e89ba5ed4c5 100644 --- a/sdks/go/pkg/beam/testing/passert/sum.go +++ b/sdks/go/pkg/beam/testing/passert/sum.go @@ -18,8 +18,8 @@ package passert import ( "fmt" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // Sum validates that the sum and count of elements in the incoming PCollection is diff --git a/sdks/go/pkg/beam/testing/passert/sum_test.go b/sdks/go/pkg/beam/testing/passert/sum_test.go new file mode 100644 index 000000000000..1b6f391642a9 --- /dev/null +++ b/sdks/go/pkg/beam/testing/passert/sum_test.go @@ -0,0 +1,110 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +package passert + +import ( + "strings" + "testing" + + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" +) + +func TestSum_good(t *testing.T) { + tests := []struct { + name string + values []int + size int + total int + }{ + { + "all positive", + []int{1, 2, 3, 4, 5}, + 5, + 15, + }, + { + "all negative", + []int{-1, -2, -3, -4, -5}, + 5, + -15, + }, + { + "mixed", + []int{1, -2, 3, -4, 5}, + 5, + 3, + }, + { + "empty", + []int{}, + 0, + 0, + }, + } + for _, tc := range tests { + p, s := beam.NewPipelineWithRoot() + col := beam.CreateList(s, tc.values) + Sum(s, col, tc.name, tc.size, tc.total) + if err := ptest.Run(p); err != nil { + t.Errorf("Pipeline failed: %v", err) + } + } +} + +func TestSum_bad(t *testing.T) { + tests := []struct { + name string + col []int + size int + total int + errorParts []string + }{ + { + "bad size", + []int{1, 2, 3, 4, 5}, + 4, + 15, + []string{"{15, size: 5}", "want {15, size:4}"}, + }, + { + "bad total", + []int{1, 2, 3, 4, 5}, + 5, + 16, + []string{"{15, size: 5}", "want {16, size:5}"}, + }, + } + for _, tc := range tests { + p, s := beam.NewPipelineWithRoot() + col := beam.CreateList(s, tc.col) + Sum(s, col, tc.name, tc.size, tc.total) + err := ptest.Run(p) + if err == nil { + t.Fatalf("Pipeline succeeded but should have failed: %v", tc.name) + } + str := err.Error() + missing := []string{} + for _, part := range tc.errorParts { + if !strings.Contains(str, part) { + missing = append(missing, part) + } + } + if len(missing) != 0 { + t.Errorf("%v: pipeline failed correctly, but substrings %#v are not present in message:\n%v", tc.name, missing, str) + } + } +} diff --git a/sdks/go/pkg/beam/testing/ptest/ptest.go b/sdks/go/pkg/beam/testing/ptest/ptest.go index b4ea7db2a7dd..e45e28d71c41 100644 --- a/sdks/go/pkg/beam/testing/ptest/ptest.go +++ b/sdks/go/pkg/beam/testing/ptest/ptest.go @@ -22,10 +22,10 @@ import ( "os" "testing" - "github.com/apache/beam/sdks/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam" // ptest uses the direct runner to execute pipelines by default. - _ "github.com/apache/beam/sdks/go/pkg/beam/runners/direct" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/direct" ) // TODO(herohde) 7/10/2017: add hooks to verify counters, logs, etc. @@ -67,6 +67,10 @@ var ( defaultRunner = "direct" ) +// DefaultRunner returns the name of the runner used when none is otherwise specified. +func DefaultRunner() string { + return defaultRunner +} + // Run runs a pipeline for testing. The semantics of the pipeline is expected // to be verified through passert. func Run(p *beam.Pipeline) error { @@ -77,18 +81,32 @@ func Run(p *beam.Pipeline) error { return err } +// RunWithMetrics runs a pipeline for testing and returns a beam.PipelineResult, +// which gives access to the pipeline's metrics.Results. +func RunWithMetrics(p *beam.Pipeline) (beam.PipelineResult, error) { + if *Runner == "" { + *Runner = defaultRunner + } + return beam.Run(context.Background(), *Runner, p) +} + // RunAndValidate runs a pipeline for testing and validates the result, failing // the test if the pipeline fails. -func RunAndValidate(t *testing.T, p *beam.Pipeline) { - if err := Run(p); err != nil { +func RunAndValidate(t *testing.T, p *beam.Pipeline) beam.PipelineResult { + pr, err := RunWithMetrics(p) + if err != nil { t.Fatalf("Failed to execute job: %v", err) } + return pr } // Main is an implementation of testing's TestMain to permit testing // pipelines on runners other than the direct runner. // -// To enable this behavior, _ import the desired runner, and set the flag accordingly.
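To illustrate the new return value (a sketch, not part of this change; countFn and the counter name are invented), a test can keep its passert-style checks while reaching the job's metrics through the returned beam.PipelineResult:

package example

import (
	"context"
	"testing"

	"github.com/apache/beam/sdks/v2/go/pkg/beam"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest"
)

var processed = beam.NewCounter("example", "processed")

// countFn bumps a counter for every element it sees and passes the element on.
func countFn(ctx context.Context, v int) int {
	processed.Inc(ctx, 1)
	return v
}

func TestProcessedCount(t *testing.T) {
	p, s := beam.NewPipelineWithRoot()
	beam.ParDo(s, countFn, beam.Create(s, 1, 2, 3))

	// RunAndValidate fails the test on pipeline error and hands back the
	// PipelineResult, whose Metrics() can be inspected for the "processed" counter.
	pr := ptest.RunAndValidate(t, p)
	_ = pr.Metrics()
}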
+// To enable this behavior, _ import the desired runner, and set the flag +// accordingly. For example: +// +// import _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/flink" // // func TestMain(m *testing.M) { // ptest.Main(m) // } @@ -109,3 +127,25 @@ func MainWithDefault(m *testing.M, runner string) { beam.Init() os.Exit(m.Run()) } + +// MainRet is equivalent to Main, but returns an exit code to pass to os.Exit(). +// +// Example: +// +// func TestMain(m *testing.M) { +// os.Exit(ptest.MainRet(m)) +// } +func MainRet(m *testing.M) int { + return MainRetWithDefault(m, "direct") +} + +// MainRetWithDefault is equivalent to MainWithDefault but returns an exit code +// to pass to os.Exit(). +func MainRetWithDefault(m *testing.M, runner string) int { + defaultRunner = runner + if !flag.Parsed() { + flag.Parse() + } + beam.Init() + return m.Run() +} diff --git a/sdks/go/pkg/beam/testing/teststream/teststream.go b/sdks/go/pkg/beam/testing/teststream/teststream.go new file mode 100644 index 000000000000..11f2e1cea5f9 --- /dev/null +++ b/sdks/go/pkg/beam/testing/teststream/teststream.go @@ -0,0 +1,182 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +// Package teststream contains code configuring the TestStream primitive for +// use in testing code that is meant to be run on streaming data sources. +// +// See https://beam.apache.org/blog/test-stream/ for more information. +// +// TestStream is supported on the Flink runner and currently supports int64, +// float64, and boolean types. +// +// TODO(BEAM-12753): Flink currently displays unexpected behavior with TestStream, +// so it should not be used until this issue is resolved. +package teststream + +import ( + "bytes" + "fmt" + "reflect" + + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/mtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/protox" + + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" +) + +const urn = "beam:transform:teststream:v1" + +// Config holds information used to create a TestStreamPayload object. +type Config struct { + elmType beam.FullType + events []*pipepb.TestStreamPayload_Event + endpoint *pipepb.ApiServiceDescriptor + watermark int64 +} + +// NewConfig returns a Config to build a sequence of a test stream's events. +// Requires that users provide the coder for the elements they are trying to emit. +func NewConfig() Config { + return Config{elmType: nil, + events: []*pipepb.TestStreamPayload_Event{}, + endpoint: &pipepb.ApiServiceDescriptor{}, + watermark: mtime.MinTimestamp.Milliseconds(), + } +} + +// setEndpoint sets a URL for a TestStreamService that will emit events instead of having them +// defined manually. 
Currently does not support authentication, so the TestStreamService should +// be accessed in a trusted context. +func (c *Config) setEndpoint(url string) { + c.endpoint.Url = url +} + +// createPayload converts the Config object into a TestStreamPayload to be sent to the runner. +func (c *Config) createPayload() *pipepb.TestStreamPayload { + // c0 is always the first coder in the pipeline, and inserting the TestStream as the first + // element in the pipeline guarantees that the c0 coder corresponds to the type it outputs. + return &pipepb.TestStreamPayload{CoderId: "c0", Events: c.events, Endpoint: c.endpoint} +} + +// AdvanceWatermark adds an event to the Config advancing the watermark for the PCollection +// to the given timestamp. The timestamp is in milliseconds. +func (c *Config) AdvanceWatermark(timestamp int64) error { + if c.watermark >= timestamp { + return fmt.Errorf("watermark must be monotonically increasing, is at %v, got %v", c.watermark, timestamp) + } + watermarkAdvance := &pipepb.TestStreamPayload_Event_AdvanceWatermark{NewWatermark: timestamp} + watermarkEvent := &pipepb.TestStreamPayload_Event_WatermarkEvent{WatermarkEvent: watermarkAdvance} + c.events = append(c.events, &pipepb.TestStreamPayload_Event{Event: watermarkEvent}) + c.watermark = timestamp + return nil +} + +// AdvanceWatermarkToInfinity advances the watermark to the maximum timestamp. +func (c *Config) AdvanceWatermarkToInfinity() error { + return c.AdvanceWatermark(mtime.MaxTimestamp.Milliseconds()) +} + +// AdvanceProcessingTime adds an event advancing the processing time by a given duration. +// This advancement is applied to all of the PCollections output by the TestStream. +func (c *Config) AdvanceProcessingTime(duration int64) { + processingAdvance := &pipepb.TestStreamPayload_Event_AdvanceProcessingTime{AdvanceDuration: duration} + processingEvent := &pipepb.TestStreamPayload_Event_ProcessingTimeEvent{ProcessingTimeEvent: processingAdvance} + c.events = append(c.events, &pipepb.TestStreamPayload_Event{Event: processingEvent}) +} + +// AdvanceProcessingTimeToInfinity moves the TestStream processing time to the largest possible +// timestamp. +func (c *Config) AdvanceProcessingTimeToInfinity() { + c.AdvanceProcessingTime(mtime.MaxTimestamp.Milliseconds()) +} + +// AddElements adds a number of elements to the stream at the specified event timestamp. Must be called with +// at least one element. +// +// On the first call, a type will be inferred from the passed in elements, which must all be of the same type. +// Type mismatches on this or subsequent calls will cause AddElements to return an error. +// +// Element types must have built-in coders in Beam. 
+func (c *Config) AddElements(timestamp int64, elements ...interface{}) error { + t := reflect.TypeOf(elements[0]) + if c.elmType == nil { + c.elmType = typex.New(t) + } else if c.elmType.Type() != t { + return fmt.Errorf("element type mismatch, previous additions were of type %v, tried to add type %v", c.elmType, t) + } + for i, ele := range elements { + if reflect.TypeOf(ele) != c.elmType.Type() { + return fmt.Errorf("element %d was type %T, previous additions were of type %v", i, ele, c.elmType) + } + } + newElements := []*pipepb.TestStreamPayload_TimestampedElement{} + enc := beam.NewElementEncoder(t) + for _, e := range elements { + var buf bytes.Buffer + if err := enc.Encode(e, &buf); err != nil { + return fmt.Errorf("encoding value %v failed, got %v", e, err) + } + newElements = append(newElements, &pipepb.TestStreamPayload_TimestampedElement{EncodedElement: buf.Bytes(), Timestamp: timestamp}) + } + addElementsEvent := &pipepb.TestStreamPayload_Event_AddElements{Elements: newElements} + elementEvent := &pipepb.TestStreamPayload_Event_ElementEvent{ElementEvent: addElementsEvent} + c.events = append(c.events, &pipepb.TestStreamPayload_Event{Event: elementEvent}) + return nil +} + +// AddElementList inserts a slice of elements into the stream at the specified event timestamp. Must be called with +// at least one element. +// +// Calls into AddElements, which returns an error if an inserted type does not match a previously inserted element type. +// +// Element types must have built-in coders in Beam. +func (c *Config) AddElementList(timestamp int64, elements interface{}) error { + val := reflect.ValueOf(elements) + if val.Kind() != reflect.Slice && val.Kind() != reflect.Array { + return fmt.Errorf("input %v must be a slice or array", elements) + } + + var inputs []interface{} + for i := 0; i < val.Len(); i++ { + inputs = append(inputs, val.Index(i).Interface()) + } + return c.AddElements(timestamp, inputs...) +} + +// Create inserts a TestStream primitive into a pipeline, taking a scope and a Config object and +// producing an output PCollection. The TestStream must be the first PTransform in the +// pipeline. +func Create(s beam.Scope, c Config) beam.PCollection { + pyld := protox.MustEncode(c.createPayload()) + outputs := []beam.FullType{c.elmType} + + output := beam.External(s, urn, pyld, []beam.PCollection{}, outputs, false) + + // This should only ever contain one PCollection. + return output[0] +} + +// CreateWithEndpoint inserts a TestStream primitive into a pipeline, taking a scope, a URL to a +// TestStreamService, and a FullType object describing the elements that will be returned by the +// TestStreamService. Authentication is currently not supported, so the service the URL points to +// should be accessed in a trusted context. +func CreateWithEndpoint(s beam.Scope, url string, elementType beam.FullType) beam.PCollection { + c := NewConfig() + c.setEndpoint(url) + c.elmType = elementType + return Create(s, c) +} diff --git a/sdks/go/pkg/beam/testing/teststream/teststream_test.go b/sdks/go/pkg/beam/testing/teststream/teststream_test.go new file mode 100644 index 000000000000..7b4a0c25637d --- /dev/null +++ b/sdks/go/pkg/beam/testing/teststream/teststream_test.go @@ -0,0 +1,172 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. 
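Tying the configuration API together, a rough sketch of the intended use (not part of this change; the values are arbitrary, and per the package comment this only runs on a runner that supports TestStream, such as Flink):

package example

import (
	"github.com/apache/beam/sdks/v2/go/pkg/beam"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/testing/teststream"
)

// buildStreamPipeline emits three float64 elements as a test stream and
// asserts on the resulting PCollection. TestStream must be the first
// PTransform in the pipeline.
func buildStreamPipeline() (*beam.Pipeline, error) {
	p, s := beam.NewPipelineWithRoot()

	con := teststream.NewConfig()
	// Two elements stamped at t=100ms, one more at t=200ms.
	if err := con.AddElements(100, 1.0, 2.0); err != nil {
		return nil, err
	}
	if err := con.AddElements(200, 3.0); err != nil {
		return nil, err
	}
	// Close the stream by advancing the watermark past every element timestamp.
	if err := con.AdvanceWatermarkToInfinity(); err != nil {
		return nil, err
	}

	col := teststream.Create(s, con)
	passert.EqualsList(s, col, []float64{1.0, 2.0, 3.0})
	return p, nil
}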
+// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package teststream + +import ( + "bytes" + "reflect" + "strings" + "testing" + + "github.com/apache/beam/sdks/v2/go/pkg/beam" +) + +func TestNewConfig(t *testing.T) { + con := NewConfig() + if con.elmType != nil { + t.Errorf("type is not correct, expected nil, got %v", con.elmType) + } + if len(con.events) != 0 { + t.Errorf("config has too many elements, expected 0, got %v", len(con.events)) + } + if con.endpoint.Url != "" { + t.Errorf("config has URL endpoint when it should be empty") + } +} + +func TestAdvanceWatermark(t *testing.T) { + con := NewConfig() + con.AdvanceWatermark(500) + if w := con.watermark; w != 500 { + t.Errorf("want default watermark to be 500, got %v", w) + } + if len(con.events) != 1 { + t.Fatalf("want only 1 event in config, got %v", len(con.events)) + } + if eventWatermark := con.events[0].GetWatermarkEvent().NewWatermark; eventWatermark != 500 { + t.Errorf("want watermark in event to be 500, got %v", eventWatermark) + } +} + +func TestAdvanceWatermark_Bad(t *testing.T) { + con := NewConfig() + if errOne := con.AdvanceWatermark(500); errOne != nil { + t.Fatalf("first advance watermark failed when it should have succeeded, got %v", errOne) + } + if errTwo := con.AdvanceWatermark(200); errTwo == nil { + t.Errorf("second advance watermark succeeded when it should have failed") + } +} + +func TestAdvanceProcessingTime(t *testing.T) { + con := NewConfig() + con.AdvanceProcessingTime(100) + if len(con.events) != 1 { + t.Fatalf("want only 1 event in config, got %v", len(con.events)) + } + event := con.events[0].GetProcessingTimeEvent() + if event.GetAdvanceDuration() != 100 { + t.Errorf("want duration of 100, got %v", event.GetAdvanceDuration()) + } +} + +func TestAddElements(t *testing.T) { + tests := []struct { + name string + elementGroups [][]interface{} + }{ + { + "bools", + [][]interface{}{{true, false}}, + }, + { + "multiple bools", + [][]interface{}{{true, false}, {true, false}}, + }, + { + "strings", + [][]interface{}{{"test", "other test"}}, + }, + { + "floats", + [][]interface{}{{1.1, 2.2, 3.3}}, + }, + } + for _, tc := range tests { + con := NewConfig() + for i, elements := range tc.elementGroups { + if err := con.AddElements(100, elements...); err != nil { + t.Fatalf("%v failed to add elements to config, got %v", tc.name, err) + } + for j, event := range con.events[i].GetElementEvent().GetElements() { + dec := beam.NewElementDecoder(reflect.TypeOf(elements[j])) + buf := bytes.NewReader(event.GetEncodedElement()) + val, err := dec.Decode(buf) + if err != nil { + t.Errorf("%v, error decoding element, got %v", tc.name, err) + } + if val != elements[j] { + t.Errorf("%v added element mismatch, want %v, got %v", tc.name, elements[j], val) + } + } + } + } +} + +func TestAddElementList(t *testing.T) { + tests := []struct { + name string + elementGroups [][]interface{} + }{ + { + "bools", + [][]interface{}{{true, false}}, + }, + { + "multiple bools", + [][]interface{}{{true, false}, {true, 
false}}, + }, + { + "strings", + [][]interface{}{{"test", "other test"}}, + }, + { + "floats", + [][]interface{}{{1.1, 2.2, 3.3}}, + }, + } + for _, tc := range tests { + con := NewConfig() + for i, elements := range tc.elementGroups { + if err := con.AddElementList(100, elements); err != nil { + t.Fatalf("%v failed to add elements to config, got %v", tc.name, err) + } + for j, event := range con.events[i].GetElementEvent().GetElements() { + dec := beam.NewElementDecoder(reflect.TypeOf(elements[j])) + buf := bytes.NewReader(event.GetEncodedElement()) + val, err := dec.Decode(buf) + if err != nil { + t.Errorf("%v, error decoding element, got %v", tc.name, err) + } + if val != elements[j] { + t.Errorf("%v added element mismatch, want %v, got %v", tc.name, elements[j], val) + } + } + } + } +} + +func TestAddElementList_Bad(t *testing.T) { + con := NewConfig() + err := con.AddElementList(100, true) + if err == nil { + t.Fatalf("pipeline succeeded when it should have failed") + } + str := err.Error() + if !strings.Contains(str, "must be a slice or array") { + t.Errorf("pipeline failed but got unexpected error message, got %v", err) + } +} diff --git a/sdks/go/pkg/beam/transforms/filter/distinct.go b/sdks/go/pkg/beam/transforms/filter/distinct.go index 1d1e02553cc5..269789da4074 100644 --- a/sdks/go/pkg/beam/transforms/filter/distinct.go +++ b/sdks/go/pkg/beam/transforms/filter/distinct.go @@ -16,7 +16,7 @@ package filter import ( - "github.com/apache/beam/sdks/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam" ) // Distinct removes all duplicates from a collection, under coder equality. It diff --git a/sdks/go/pkg/beam/transforms/filter/distinct_test.go b/sdks/go/pkg/beam/transforms/filter/distinct_test.go index e1fe9f5b232b..142213f1e6fa 100644 --- a/sdks/go/pkg/beam/transforms/filter/distinct_test.go +++ b/sdks/go/pkg/beam/transforms/filter/distinct_test.go @@ -18,9 +18,9 @@ package filter_test import ( "testing" - "github.com/apache/beam/sdks/go/pkg/beam/testing/passert" - "github.com/apache/beam/sdks/go/pkg/beam/testing/ptest" - "github.com/apache/beam/sdks/go/pkg/beam/transforms/filter" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/filter" ) type s struct { diff --git a/sdks/go/pkg/beam/transforms/filter/filter.go b/sdks/go/pkg/beam/transforms/filter/filter.go index 428fd6265974..1b289bbe3dd6 100644 --- a/sdks/go/pkg/beam/transforms/filter/filter.go +++ b/sdks/go/pkg/beam/transforms/filter/filter.go @@ -18,12 +18,12 @@ package filter import ( - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/core/funcx" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/funcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) -//go:generate go install github.com/apache/beam/sdks/go/cmd/starcgen +//go:generate go install github.com/apache/beam/sdks/v2/go/cmd/starcgen //go:generate starcgen --package=filter --identifiers=filterFn,mapFn,mergeFn //go:generate go fmt diff --git a/sdks/go/pkg/beam/transforms/filter/filter.shims.go b/sdks/go/pkg/beam/transforms/filter/filter.shims.go index aa6ebd3cfe32..9296daf22c75 100644 --- a/sdks/go/pkg/beam/transforms/filter/filter.shims.go +++ b/sdks/go/pkg/beam/transforms/filter/filter.shims.go @@ -23,16 +23,18 @@ import ( "reflect" // Library imports - 
"github.com/apache/beam/sdks/go/pkg/beam/core/runtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx/schema" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) func init() { runtime.RegisterFunction(mapFn) runtime.RegisterFunction(mergeFn) runtime.RegisterType(reflect.TypeOf((*filterFn)(nil)).Elem()) + schema.RegisterType(reflect.TypeOf((*filterFn)(nil)).Elem()) reflectx.RegisterStructWrapper(reflect.TypeOf((*filterFn)(nil)).Elem(), wrapMakerFilterFn) reflectx.RegisterFunc(reflect.TypeOf((*func(int, int) int)(nil)).Elem(), funcMakerIntIntГInt) reflectx.RegisterFunc(reflect.TypeOf((*func(typex.T, func(typex.T)))(nil)).Elem(), funcMakerTypex۰TEmitTypex۰TГ) diff --git a/sdks/go/pkg/beam/transforms/filter/filter_test.go b/sdks/go/pkg/beam/transforms/filter/filter_test.go index 4e3c6377ec3b..0a42826c5664 100644 --- a/sdks/go/pkg/beam/transforms/filter/filter_test.go +++ b/sdks/go/pkg/beam/transforms/filter/filter_test.go @@ -18,9 +18,9 @@ package filter_test import ( "testing" - "github.com/apache/beam/sdks/go/pkg/beam/testing/passert" - "github.com/apache/beam/sdks/go/pkg/beam/testing/ptest" - "github.com/apache/beam/sdks/go/pkg/beam/transforms/filter" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/filter" ) func TestInclude(t *testing.T) { diff --git a/sdks/go/pkg/beam/transforms/stats/count.go b/sdks/go/pkg/beam/transforms/stats/count.go index d7fbab87ba43..201df514d671 100644 --- a/sdks/go/pkg/beam/transforms/stats/count.go +++ b/sdks/go/pkg/beam/transforms/stats/count.go @@ -17,8 +17,8 @@ package stats import ( - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" ) // Count counts the number of appearances of each element in a collection. 
It diff --git a/sdks/go/pkg/beam/transforms/stats/count_test.go b/sdks/go/pkg/beam/transforms/stats/count_test.go index 8665593f0e6a..23627a92f799 100644 --- a/sdks/go/pkg/beam/transforms/stats/count_test.go +++ b/sdks/go/pkg/beam/transforms/stats/count_test.go @@ -19,9 +19,9 @@ import ( "fmt" "testing" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/testing/passert" - "github.com/apache/beam/sdks/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" ) type count struct { diff --git a/sdks/go/pkg/beam/transforms/stats/max.go b/sdks/go/pkg/beam/transforms/stats/max.go index a87712db68da..faef087aff61 100644 --- a/sdks/go/pkg/beam/transforms/stats/max.go +++ b/sdks/go/pkg/beam/transforms/stats/max.go @@ -16,7 +16,7 @@ package stats import ( - "github.com/apache/beam/sdks/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam" ) //go:generate specialize --input=max_switch.tmpl --x=integers,floats diff --git a/sdks/go/pkg/beam/transforms/stats/max_test.go b/sdks/go/pkg/beam/transforms/stats/max_test.go index e7eef4097446..af817527dc91 100644 --- a/sdks/go/pkg/beam/transforms/stats/max_test.go +++ b/sdks/go/pkg/beam/transforms/stats/max_test.go @@ -18,9 +18,9 @@ package stats import ( "testing" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/testing/passert" - "github.com/apache/beam/sdks/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" ) type student struct { diff --git a/sdks/go/pkg/beam/transforms/stats/mean.go b/sdks/go/pkg/beam/transforms/stats/mean.go index 3b74cde92793..2c875e766d5c 100644 --- a/sdks/go/pkg/beam/transforms/stats/mean.go +++ b/sdks/go/pkg/beam/transforms/stats/mean.go @@ -18,8 +18,8 @@ package stats import ( "reflect" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) // Mean returns the arithmetic mean (or average) of the elements in a collection. diff --git a/sdks/go/pkg/beam/transforms/stats/mean_test.go b/sdks/go/pkg/beam/transforms/stats/mean_test.go index cae6c64d17af..019f93925140 100644 --- a/sdks/go/pkg/beam/transforms/stats/mean_test.go +++ b/sdks/go/pkg/beam/transforms/stats/mean_test.go @@ -18,9 +18,9 @@ package stats import ( "testing" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/testing/passert" - "github.com/apache/beam/sdks/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" ) // TestMeanInt verifies that Mean works correctly for ints. 
diff --git a/sdks/go/pkg/beam/transforms/stats/min.go b/sdks/go/pkg/beam/transforms/stats/min.go index fc1528588885..675c5df0048e 100644 --- a/sdks/go/pkg/beam/transforms/stats/min.go +++ b/sdks/go/pkg/beam/transforms/stats/min.go @@ -16,7 +16,7 @@ package stats import ( - "github.com/apache/beam/sdks/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam" ) //go:generate specialize --input=min_switch.tmpl --x=integers,floats diff --git a/sdks/go/pkg/beam/transforms/stats/min_test.go b/sdks/go/pkg/beam/transforms/stats/min_test.go index b71114c5b171..ee7ab8298d51 100644 --- a/sdks/go/pkg/beam/transforms/stats/min_test.go +++ b/sdks/go/pkg/beam/transforms/stats/min_test.go @@ -18,9 +18,9 @@ package stats import ( "testing" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/testing/passert" - "github.com/apache/beam/sdks/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" ) // TestMinInt verifies that Min works correctly for ints. diff --git a/sdks/go/pkg/beam/transforms/stats/quantiles.go b/sdks/go/pkg/beam/transforms/stats/quantiles.go new file mode 100644 index 000000000000..17f2a1dad699 --- /dev/null +++ b/sdks/go/pkg/beam/transforms/stats/quantiles.go @@ -0,0 +1,711 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package stats + +// Approximate quantiles is implemented based on https://arxiv.org/pdf/1907.00236.pdf. + +import ( + "bytes" + "container/heap" + "context" + "encoding/gob" + "encoding/json" + "hash/crc32" + "io" + "math" + "reflect" + "sort" + + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" +) + +func init() { + compactorsType := reflect.TypeOf((**compactors)(nil)).Elem() + weightedElementType := reflect.TypeOf((*weightedElement)(nil)).Elem() + beam.RegisterType(compactorsType) + beam.RegisterType(weightedElementType) + beam.RegisterType(reflect.TypeOf((*approximateQuantilesInputFn)(nil)).Elem()) + beam.RegisterType(reflect.TypeOf((*approximateQuantilesMergeOnlyFn)(nil)).Elem()) + beam.RegisterType(reflect.TypeOf((*approximateQuantilesOutputFn)(nil)).Elem()) + beam.RegisterType(reflect.TypeOf((*shardElementsFn)(nil)).Elem()) + beam.RegisterCoder(compactorsType, encodeCompactors, decodeCompactors) + beam.RegisterCoder(weightedElementType, encodeWeightedElement, decodeWeightedElement) +} + +// Opts contains settings used to configure how approximate quantiles are computed. +type Opts struct { + // Controls the memory used and approximation error (difference between the quantile returned and the true quantile.) + K int + // Number of quantiles to return. 
The algorithm will return NumQuantiles - 1 numbers + NumQuantiles int + // For extremely large datasets, runners may have issues with out of memory errors or taking too long to finish. + // If ApproximateQuantiles is failing, you can use this option to tune how the data is sharded internally. + // This parameter is optional. If unspecified, Beam will compact all elements into a single compactor at once using a single machine. + // For example, if this is set to [8, 4, 2]: First, elements will be assigned to 8 shards which will run in parallel. Then the intermediate results from those 8 shards will be reassigned to 4 shards and merged in parallel. Then once again to 2 shards. Finally the intermediate results of those two shards will be merged on one machine before returning the final result. + InternalSharding []int +} + +// The paper suggests reducing the size of the lower-level compactors as we grow. +// We reduce the capacity at this rate. +// The paper suggests 1/sqrt(2) is ideal. That's approximately 0.7. +const capacityCoefficient float64 = 0.7 + +type sortListHeap struct { + data [][]beam.T + less reflectx.Func2x1 +} + +func (s sortListHeap) Len() int { return len(s.data) } +func (s sortListHeap) Less(i, j int) bool { return s.less.Call2x1(s.data[i][0], s.data[j][0]).(bool) } +func (s sortListHeap) Swap(i, j int) { s.data[i], s.data[j] = s.data[j], s.data[i] } +func (s *sortListHeap) Push(x interface{}) { s.data = append(s.data, x.([]beam.T)) } +func (s *sortListHeap) Pop() interface{} { + var x beam.T + x, s.data = s.data[len(s.data)-1], s.data[:len(s.data)-1] + return x +} + +// compactor contains elements to be compacted. +type compactor struct { + // Compaction needs to sort elements before compacting. Thus in practice, we should often have some pre-sorted data. + // We want to keep it separate so we can sort only the unsorted data and merge the two sorted lists together. + // If we're only receiving elements of weight 1, only level 0 will ever contain unsorted data and the rest of the levels will always remain sorted. + // To prevent repeated allocation/copying, we keep multiple sorted lists and then merge them together + sorted [][]beam.T + unsorted []beam.T + // How many items should be stored in this compactor before it should get compacted. + // Note that this is not a hard limit. + // The paper suggests implementing lazy compaction which would allow + // compactors to temporarily exceed their capacity as long as the total + // elements in all compactors doesn't exceed the total capacity in all + // compactors. In other words, compactors can temporarily borrow capacity + // from each other. + // In the paper, this is referred to as the variable k_h. + capacity int +} + +// serializedList represents a list of elements serialized to a byte array. +type serializedList struct { + // Number of elements serialized to elements. 
+ Count int + Elements []byte +} + +type compactorAsGob struct { + Sorted []serializedList + Unsorted serializedList + EncodedTypeAsJSON []byte +} + +func encodeElements(enc beam.ElementEncoder, elements []beam.T) ([]byte, error) { + var buf bytes.Buffer + for _, e := range elements { + if err := enc.Encode(e, &buf); err != nil { + return nil, err + } + } + return buf.Bytes(), nil +} + +func (c *compactor) getElementType() reflect.Type { + for _, e := range c.sorted { + for _, e2 := range e { + return reflect.TypeOf(e2) + } + } + for _, e := range c.unsorted { + return reflect.TypeOf(e) + } + return nil +} + +func (c *compactor) MarshalBinary() ([]byte, error) { + t := c.getElementType() + var buf bytes.Buffer + if t == nil { + enc := gob.NewEncoder(&buf) + if err := enc.Encode(compactorAsGob{}); err != nil { + return nil, err + } + return buf.Bytes(), nil + } + enc := beam.NewElementEncoder(t) + encodedSorted := make([]serializedList, 0, len(c.sorted)) + for _, sorted := range c.sorted { + encoded, err := encodeElements(enc, sorted) + if err != nil { + return nil, err + } + encodedSorted = append(encodedSorted, serializedList{Count: len(sorted), Elements: encoded}) + } + encodedUnsortedSerialized, err := encodeElements(enc, c.unsorted) + encodedUnsorted := serializedList{Count: len(c.unsorted), Elements: encodedUnsortedSerialized} + if err != nil { + return nil, err + } + tAsJSON, err := beam.EncodedType{T: t}.MarshalJSON() + if err != nil { + return nil, err + } + gobEnc := gob.NewEncoder(&buf) + if err = gobEnc.Encode(compactorAsGob{ + Sorted: encodedSorted, + Unsorted: encodedUnsorted, + EncodedTypeAsJSON: tAsJSON, + }); err != nil { + return nil, err + } + return buf.Bytes(), nil +} + +func (s serializedList) decodeElements(dec beam.ElementDecoder) ([]beam.T, error) { + buf := bytes.NewBuffer(s.Elements) + ret := make([]beam.T, 0, s.Count) + for { + element, err := dec.Decode(buf) + if err == io.EOF { + return ret, nil + } else if err != nil { + return nil, err + } + ret = append(ret, element) + } +} + +func (c *compactor) UnmarshalBinary(data []byte) error { + var g compactorAsGob + var err error + gobDec := gob.NewDecoder(bytes.NewBuffer(data)) + if err = gobDec.Decode(&g); err != nil { + return err + } + if len(g.EncodedTypeAsJSON) == 0 { + return nil + } + var t beam.EncodedType + if err = json.Unmarshal(g.EncodedTypeAsJSON, &t); err != nil { + return err + } + dec := beam.NewElementDecoder(t.T) + decodedSorted := make([][]beam.T, 0, len(g.Sorted)) + for _, sorted := range g.Sorted { + decoded, err := sorted.decodeElements(dec) + if err != nil { + return err + } + decodedSorted = append(decodedSorted, decoded) + } + c.sorted = decodedSorted + if c.unsorted, err = g.Unsorted.decodeElements(dec); err != nil { + return err + } + return nil +} + +// update inserts an element into the compactor. +func (c *compactor) update(element beam.T) { + c.unsorted = append(c.unsorted, element) +} + +// size returns the number of elements stored in this compactor. +func (c *compactor) size() int { + size := 0 + for _, s := range c.sorted { + size += len(s) + } + return len(c.unsorted) + size +} + +type sorter struct { + less reflectx.Func2x1 + data []beam.T +} + +func (s sorter) Len() int { return len(s.data) } +func (s sorter) Less(i, j int) bool { return s.less.Call2x1(s.data[i], s.data[j]).(bool) } +func (s sorter) Swap(i, j int) { s.data[i], s.data[j] = s.data[j], s.data[i] } + +// sort sorts the compactor and returns all the elements in sorted order. 
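Before the sort implementation below, an aside on the serialization code just shown: it follows the Go SDK's custom-coder pattern, marshalling the accumulator with gob plus the per-element coders and then registering encode/decode functions with beam.RegisterCoder (see the init block earlier in this file). A minimal standalone sketch of that pattern, using a hypothetical histogram type rather than compactors:

```go
// Standalone sketch of the gob-based custom coder pattern; the histogram type is hypothetical.
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"reflect"

	"github.com/apache/beam/sdks/v2/go/pkg/beam"
)

// histogram stands in for an accumulator type that the default coder handles poorly.
type histogram struct {
	Counts []int
}

func encodeHistogram(h histogram) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(h); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func decodeHistogram(data []byte) (histogram, error) {
	var h histogram
	err := gob.NewDecoder(bytes.NewReader(data)).Decode(&h)
	return h, err
}

func init() {
	t := reflect.TypeOf((*histogram)(nil)).Elem()
	beam.RegisterType(t)                                     // make the type known to the runtime
	beam.RegisterCoder(t, encodeHistogram, decodeHistogram)  // use gob instead of the default coder
}

func main() {
	data, _ := encodeHistogram(histogram{Counts: []int{1, 2, 3}})
	h, _ := decodeHistogram(data)
	fmt.Println(h.Counts) // [1 2 3]
}
```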
+func (c *compactor) sort(less reflectx.Func2x1) []beam.T { + sort.Sort(sorter{data: c.unsorted, less: less}) + h := sortListHeap{data: c.sorted, less: less} + heap.Init(&h) + sorted := make([]beam.T, 0, c.size()-len(c.unsorted)) + for h.Len() > 0 { + s := heap.Pop(&h).([]beam.T) + sorted = append(sorted, s[0]) + if len(s) > 1 { + heap.Push(&h, s[1:]) + } + } + c.sorted = [][]beam.T{mergeSorted(sorted, c.unsorted, func(a, b interface{}) bool { return less.Call2x1(a, b).(bool) })} + c.unsorted = nil + if len(c.sorted[0]) == 0 { + c.sorted = nil + return nil + } + return c.sorted[0] +} + +// Compactors holds the state of the quantile approximation compactors. +type compactors struct { + // References "K" from the paper which influences the amount of memory used. + K int + // When compacting, we want to alternate between taking elements at even vs odd indices. + // The paper suggests using a random variable but we'd prefer to stay deterministic. + // Especially when merging two compactors we want to keep track of how often we've selected odds vs evens. + NumberOfCompactions int + + // Each compactor takes a sample of elements. + // The "height" (also known as the index in this slice) of the compactor determines the weight of its elements. + // The weight of a compactor of height h is 2^h. + // For example, for h = 3 (which would be compactors[3]), the weight is 2^3 = 8. That means each element in that compactor represents 8 instances of itself. + Compactors []compactor +} + +func (c *compactors) totalCapacity() int { + totalCapacity := 0 + for _, compactor := range c.Compactors { + totalCapacity += compactor.capacity + } + return totalCapacity +} + +func (c *compactors) size() int { + size := 0 + for _, compactor := range c.Compactors { + size += compactor.size() + } + return size +} + +// capacity computes the capacity of a compactor at a certain level. +// The paper suggests decreasing the capacity of lower-leveled compactors as we add more elements. +func (c *compactors) capacity(compactorLevel int) int { + return int(math.Ceil(math.Pow(capacityCoefficient, float64(len(c.Compactors)-compactorLevel-1))*float64(c.K))) + 1 +} + +// compact compacts all compactors until the total size is less than the maximum capacity of all compactors. +func (c *compactors) compact(less reflectx.Func2x1) { + for c.size() > c.totalCapacity() { + for level, compactor := range c.Compactors { + if compactor.size() > compactor.capacity { + c.compactLevel(level, less) + // Merging compactors can cause us to exceed max capacity in multiple compactors. + if c.size() < c.totalCapacity() { + // Do lazy compaction as described in the paper. + break + } + } + } + } +} + +// update inserts the given element into the compactors. If this element causes the compactors to grow too large, we perform the compaction here. +func (c *compactors) update(element beam.T, weight int, less reflectx.Func2x1) { + level := int(math.Log2(float64(weight))) + c.growToIncludeLevel(level) + c.Compactors[level].update(element) + // From the paper, we're using the "Splitting the Input" approach. + remainingWeight := weight - (1 << uint(level)) + // Only attempt compaction if we're doing the last update. Otherwise we'd be compacting too often. + if remainingWeight <= 0 { + c.compact(less) + } else { + c.update(element, remainingWeight, less) + } +} + +// growToIncludeLevel ensures we have compactors available at the given level. 
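Before growToIncludeLevel below, a small standalone sketch of the two rules just described: the per-level capacity ceil(0.7^(numLevels-level-1) * K) + 1, and the "Splitting the Input" rule that inserts a weight-w element once per power of two in w. The concrete numbers in main (K = 200, four levels, weight 13) are hypothetical and only meant to make the formulas tangible.

```go
// Standalone sketch, not part of the diff: the capacity formula and weight splitting.
package main

import (
	"fmt"
	"math"
)

const capacityCoefficient = 0.7

// capacity mirrors compactors.capacity: lower levels get geometrically smaller budgets.
func capacity(k, numLevels, level int) int {
	return int(math.Ceil(math.Pow(capacityCoefficient, float64(numLevels-level-1))*float64(k))) + 1
}

// splitWeight mirrors the recursion in compactors.update: a weight-w element is
// placed once per set bit of w, at the level corresponding to that bit.
func splitWeight(w int) []int {
	var levels []int
	for w > 0 {
		level := int(math.Log2(float64(w)))
		levels = append(levels, level)
		w -= 1 << uint(level)
	}
	return levels
}

func main() {
	// With K=200 and 4 levels, capacities come out to roughly 70, 99, 141, 201
	// (exact values depend on floating-point rounding).
	for level := 0; level < 4; level++ {
		fmt.Println(level, capacity(200, 4, level))
	}
	// An element of weight 13 = 8 + 4 + 1 is inserted at levels 3, 2 and 0.
	fmt.Println(splitWeight(13)) // [3 2 0]
}
```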
+func (c *compactors) growToIncludeLevel(level int) { + if len(c.Compactors)-1 >= level { + return + } + for i := len(c.Compactors) - 1; i < level; i++ { + c.Compactors = append(c.Compactors, compactor{}) + } + for level := range c.Compactors { + c.Compactors[level].capacity = c.capacity(level) + } +} + +// compact compacts elements in compactors. +func (c *compactors) compactLevel(level int, less reflectx.Func2x1) { + c.growToIncludeLevel(level + 1) + jitterIndex := 0 + // Create a temporary buffer to hold the compacted elements. + // Buffering the elements like this makes it easier to call mergeSorted. + compacted := make([]beam.T, 0, c.Compactors[level].size()/2) + selectEvens := c.NumberOfCompactions%2 == 0 + c.NumberOfCompactions++ + for _, element := range c.Compactors[level].sort(less) { + if (jitterIndex%2 == 0) == selectEvens { + compacted = append(compacted, element) + } + jitterIndex++ + } + if len(compacted) > 0 { + c.Compactors[level+1].sorted = append(c.Compactors[level+1].sorted, compacted) + } + // Clear out the compactor at this level since we've finished compacting it. The compacted elements have already been moved to the next compactor. + c.Compactors[level].sorted = nil + c.Compactors[level].unsorted = nil +} + +func encodeCompactors(c *compactors) ([]byte, error) { + var buf bytes.Buffer + enc := gob.NewEncoder(&buf) + if err := enc.Encode(c); err != nil { + return nil, err + } + return buf.Bytes(), nil +} + +func decodeCompactors(data []byte) (*compactors, error) { + var compactors compactors + dec := gob.NewDecoder(bytes.NewBuffer(data)) + if err := dec.Decode(&compactors); err != nil { + return nil, err + } + for level := range compactors.Compactors { + compactors.Compactors[level].capacity = compactors.capacity(level) + } + return &compactors, nil +} + +// mergeSorted takes two slices which are already sorted and returns a new slice containing all elements sorted together. +func mergeSorted(a, b []beam.T, less func(interface{}, interface{}) bool) []beam.T { + output := make([]beam.T, 0, len(a)+len(b)) + for len(a) > 0 && len(b) > 0 { + if less(a[0], b[0]) { + output = append(output, a[0]) + a = a[1:] + } else { + output = append(output, b[0]) + b = b[1:] + } + } + if len(a) > 0 { + output = append(output, a...) + } else { + output = append(output, b...) + } + return output +} + +// mergeSortedWeighted takes two slices which are already sorted and returns a new slice containing all elements sorted together. +func mergeSortedWeighted(a, b []weightedElement, less func(interface{}, interface{}) bool) []weightedElement { + output := make([]weightedElement, 0, len(a)+len(b)) + for len(a) > 0 && len(b) > 0 { + if less(a[0], b[0]) { + output = append(output, a[0]) + a = a[1:] + } else { + output = append(output, b[0]) + b = b[1:] + } + } + if len(a) > 0 { + output = append(output, a...) + } else { + output = append(output, b...) + } + return output +} + +// merge joins two compactors together. +func (c *compactors) merge(other *compactors, less reflectx.Func2x1) { + for level := range c.Compactors { + if len(other.Compactors)-1 < level { + break + } + c.Compactors[level].unsorted = append(c.Compactors[level].unsorted, other.Compactors[level].unsorted...) + c.Compactors[level].sorted = append(c.Compactors[level].sorted, other.Compactors[level].sorted...) + } + if len(other.Compactors) > len(c.Compactors) { + c.Compactors = append(c.Compactors, other.Compactors[len(c.Compactors):]...) 
+ } + c.NumberOfCompactions += other.NumberOfCompactions + c.compact(less) +} + +// approximateQuantilesCombineFnState contains the payload for the combiners. +// Ideally this would be a single combine function but if we do that, runners attempt to do all the merges on a single machine. +// Unfortunately the merges can be slow for extremely large datasets and large values of K. If the merge takes too long, it will get canceled and the job will never complete. +// Thus we split up the combiners into multiple functions to force the runner to do the work in parallel. +// This state can be shared across all of the split-up functions. +type approximateQuantilesCombineFnState struct { + // The size of the compactors. + // The memory consumed, and the error are controlled by this parameter. + K int `json:"k"` + // Used to compare elements. + LessFunc beam.EncodedFunc + // Internally cached instance. + less reflectx.Func2x1 + NumQuantiles int `json:"numQuantiles"` +} + +func (f *approximateQuantilesCombineFnState) setup() error { + f.less = reflectx.ToFunc2x1(f.LessFunc.Fn) + return nil +} + +func (f *approximateQuantilesCombineFnState) createAccumulator() *compactors { + return &compactors{ + K: f.K, + Compactors: []compactor{{capacity: f.K}}, + } +} + +// approximateQuantilesOutputFn extracts the final output containing the quantiles. +type approximateQuantilesOutputFn struct { + State approximateQuantilesCombineFnState `json:"state"` +} + +func (f *approximateQuantilesOutputFn) Setup() error { + return f.State.setup() +} + +func (f *approximateQuantilesOutputFn) CreateAccumulator() *compactors { + return f.State.createAccumulator() +} + +func (f *approximateQuantilesOutputFn) AddInput(compactors *compactors, element *compactors) *compactors { + compactors.merge(element, f.State.less) + return compactors +} + +func (f *approximateQuantilesOutputFn) MergeAccumulators(ctx context.Context, a, b *compactors) *compactors { + a.merge(b, f.State.less) + return a +} + +type weightedElementAsGob struct { + EncodedTypeAsJSON []byte + Weight int + Element []byte +} + +func encodeWeightedElement(element weightedElement) ([]byte, error) { + t := reflect.TypeOf(element.element) + enc := beam.NewElementEncoder(t) + var buf bytes.Buffer + if err := enc.Encode(element.element, &buf); err != nil { + return nil, err + } + tAsJSON, err := beam.EncodedType{T: t}.MarshalJSON() + if err != nil { + return nil, err + } + var gobBuf bytes.Buffer + if err := gob.NewEncoder(&gobBuf).Encode(weightedElementAsGob{ + EncodedTypeAsJSON: tAsJSON, + Weight: element.weight, + Element: buf.Bytes(), + }); err != nil { + return nil, err + } + return gobBuf.Bytes(), nil +} + +func decodeWeightedElement(data []byte) (weightedElement, error) { + var g weightedElementAsGob + dec := gob.NewDecoder(bytes.NewBuffer(data)) + if err := dec.Decode(&g); err != nil { + return weightedElement{}, err + } + var t beam.EncodedType + if err := t.UnmarshalJSON(g.EncodedTypeAsJSON); err != nil { + return weightedElement{}, err + } + element, err := beam.NewElementDecoder(t.T).Decode(bytes.NewBuffer(g.Element)) + if err != nil { + return weightedElement{}, err + } + return weightedElement{ + weight: g.Weight, + element: element, + }, nil +} + +type weightedElement struct { + weight int + element beam.T +} + +func toWeightedSlice(compactor compactor, less reflectx.Func2x1, weight int) []weightedElement { + sorted := compactor.sort(less) + weightedElements := make([]weightedElement, 0, len(sorted)) + for _, element := range sorted { + weightedElements = 
append(weightedElements, weightedElement{weight: weight, element: element}) + } + return weightedElements +} +func (f *approximateQuantilesOutputFn) ExtractOutput(ctx context.Context, compactors *compactors) []beam.T { + sorted := toWeightedSlice(compactors.Compactors[0], f.State.less, 1) + for level, compactor := range compactors.Compactors[1:] { + sorted = mergeSortedWeighted(sorted, toWeightedSlice(compactor, f.State.less, 1<<uint(level+1)), func(a, b interface{}) bool { return f.State.less.Call2x1(a.(weightedElement).element, b.(weightedElement).element).(bool) }) + } + totalWeight := 0 + for _, element := range sorted { + totalWeight += element.weight + } + ret := make([]beam.T, 0, f.State.NumQuantiles-1) + currentQuantile := float64(1) + seenWeight := 0 + for _, element := range sorted { + seenWeight += element.weight + if float64(seenWeight)/float64(totalWeight) >= currentQuantile/float64(f.State.NumQuantiles) { + ret = append(ret, element.element) + currentQuantile++ + } + if currentQuantile >= float64(f.State.NumQuantiles) { + break + } + } + return ret +} + +// approximateQuantilesInputFn combines elements into compactors, but not necessarily the final compactor. +type approximateQuantilesInputFn approximateQuantilesOutputFn + +func (f *approximateQuantilesInputFn) Setup() error { + return f.State.setup() +} + +func (f *approximateQuantilesInputFn) CreateAccumulator() *compactors { + return f.State.createAccumulator() +} + +func (f *approximateQuantilesInputFn) AddInput(compactors *compactors, element weightedElement) *compactors { + compactors.update(element.element, element.weight, f.State.less) + return compactors +} + +func (f *approximateQuantilesInputFn) MergeAccumulators(ctx context.Context, a, b *compactors) *compactors { + a.merge(b, f.State.less) + return a +} + +func (f *approximateQuantilesInputFn) ExtractOutput(ctx context.Context, compactors *compactors) *compactors { + for i := range compactors.Compactors { + // Sort the compactors here so when we're merging them for the final output, they're already sorted and we can merge elements in order. + compactors.Compactors[i].sort(f.State.less) + } + return compactors +} + +// approximateQuantilesMergeOnlyFn combines compactors into smaller compactors, but not necessarily the final compactor. +type approximateQuantilesMergeOnlyFn approximateQuantilesOutputFn + +func (f *approximateQuantilesMergeOnlyFn) Setup() error { + return f.State.setup() +} + +func (f *approximateQuantilesMergeOnlyFn) CreateAccumulator() *compactors { + return f.State.createAccumulator() +} + +func (f *approximateQuantilesMergeOnlyFn) AddInput(compactors *compactors, element *compactors) *compactors { + compactors.merge(element, f.State.less) + return compactors +} + +func (f *approximateQuantilesMergeOnlyFn) MergeAccumulators(ctx context.Context, a, b *compactors) *compactors { + a.merge(b, f.State.less) + return a +} + +func (f *approximateQuantilesMergeOnlyFn) ExtractOutput(ctx context.Context, compactors *compactors) *compactors { + for i := range compactors.Compactors { + // Sort the compactors here so when we're merging them for the final output, they're already sorted and we can merge elements in order. + compactors.Compactors[i].sort(f.State.less) + } + return compactors +} + +type shardElementsFn struct { + Shards int `json:"shards"` + T beam.EncodedType `json:"t"` + elementEncoder beam.ElementEncoder +} + +func (s *shardElementsFn) Setup() { + s.elementEncoder = beam.NewElementEncoder(s.T.T) +} + +func (s *shardElementsFn) ProcessElement(element beam.T) (int, beam.T) { + h := crc32.NewIEEE() + s.elementEncoder.Encode(element, h) + return int(h.Sum32()) % s.Shards, element +} + +func makeWeightedElement(weight int, element beam.T) weightedElement { + return weightedElement{weight: weight, element: element} +} + +// ApproximateQuantiles computes approximate quantiles for the input PCollection<T>.
+// +// The output PCollection contains a single element: a list of numQuantiles - 1 elements approximately splitting up the input collection into numQuantiles separate quantiles. +// For example, if numQuantiles = 2, the returned list would contain a single element such that approximately half of the input would be less than that element and half would be greater. +func ApproximateQuantiles(s beam.Scope, pc beam.PCollection, less interface{}, opts Opts) beam.PCollection { + return ApproximateWeightedQuantiles(s, beam.ParDo(s, func(e beam.T) (int, beam.T) { return 1, e }, pc), less, opts) +} + +// reduce takes a PCollection and returns a PCollection<*compactors>. The output PCollection may have at most shardSizes[len(shardSizes) - 1] compactors. +func reduce(s beam.Scope, weightedElements beam.PCollection, state approximateQuantilesCombineFnState, shardSizes []int) beam.PCollection { + if len(shardSizes) == 0 { + shardSizes = []int{1} + } + elementsWithShardNumber := beam.ParDo(s, &shardElementsFn{Shards: shardSizes[0], T: beam.EncodedType{T: reflect.TypeOf((*weightedElement)(nil)).Elem()}}, weightedElements) + reducedCompactorsWithShardNumber := beam.CombinePerKey(s, &approximateQuantilesInputFn{State: state}, elementsWithShardNumber) + shardedCompactors := beam.DropKey(s, reducedCompactorsWithShardNumber) + shardSizes = shardSizes[1:] + compactorsType := reflect.TypeOf((**compactors)(nil)).Elem() + for _, shardSize := range shardSizes { + compactorsWithShardNumber := beam.ParDo(s, &shardElementsFn{Shards: shardSize, T: beam.EncodedType{T: compactorsType}}, shardedCompactors) + reducedCompactorsWithShardNumber = beam.CombinePerKey(s, &approximateQuantilesMergeOnlyFn{State: state}, compactorsWithShardNumber) + shardedCompactors = beam.DropKey(s, reducedCompactorsWithShardNumber) + } + return shardedCompactors +} + +// ApproximateWeightedQuantiles computes approximate quantiles for the input PCollection<(weight int, T)>. +// +// The output PCollection contains a single element: a list of numQuantiles - 1 elements approximately splitting up the input collection into numQuantiles separate quantiles. +// For example, if numQuantiles = 2, the returned list would contain a single element such that approximately half of the input would be less than that element and half would be greater or equal. +func ApproximateWeightedQuantiles(s beam.Scope, pc beam.PCollection, less interface{}, opts Opts) beam.PCollection { + _, t := beam.ValidateKVType(pc) + state := approximateQuantilesCombineFnState{ + K: opts.K, + NumQuantiles: opts.NumQuantiles, + LessFunc: beam.EncodedFunc{Fn: reflectx.MakeFunc(less)}, + } + weightedElements := beam.ParDo(s, makeWeightedElement, pc) + shardedCompactors := reduce(s, weightedElements, state, opts.InternalSharding) + return beam.Combine( + s, + &approximateQuantilesOutputFn{State: state}, + shardedCompactors, + beam.TypeDefinition{Var: beam.TType, T: t.Type()}, + ) +} diff --git a/sdks/go/pkg/beam/transforms/stats/quantiles_test.go b/sdks/go/pkg/beam/transforms/stats/quantiles_test.go new file mode 100644 index 000000000000..b2440da827d3 --- /dev/null +++ b/sdks/go/pkg/beam/transforms/stats/quantiles_test.go @@ -0,0 +1,275 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. 
+// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package stats + +import ( + "reflect" + "testing" + + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" + "github.com/google/go-cmp/cmp" +) + +func init() { + beam.RegisterFunction(weightedElementToKv) + + // In practice, this runs faster than plain reflection. + // TODO(BEAM-9616): Remove once collisions don't occur for starcgen over test code and an equivalent is generated for us. + reflectx.RegisterFunc(reflect.ValueOf(less).Type(), func(_ interface{}) reflectx.Func { + return newIntLess() + }) +} + +type intLess struct { + name string + t reflect.Type +} + +func newIntLess() *intLess { + return &intLess{ + name: reflectx.FunctionName(reflect.ValueOf(less).Interface()), + t: reflect.ValueOf(less).Type(), + } +} + +func (i *intLess) Name() string { + return i.name +} +func (i *intLess) Type() reflect.Type { + return i.t +} +func (i *intLess) Call(args []interface{}) []interface{} { + return []interface{}{args[0].(int) < args[1].(int)} +} + +func less(a, b int) bool { + return a < b +} + +func TestLargeQuantiles(t *testing.T) { + const numElements int = 30000 + inputSlice := make([]int, 0, numElements) + for i := 0; i < numElements; i++ { + inputSlice = append(inputSlice, i) + } + p, s, input, expected := ptest.CreateList2(inputSlice, [][]int{[]int{10006, 19973}}) + quantiles := ApproximateQuantiles(s, input, less, Opts{ + K: 200, + NumQuantiles: 3, + }) + passert.Equals(s, quantiles, expected) + if err := ptest.Run(p); err != nil { + t.Errorf("ApproximateQuantiles failed: %v", err) + } +} + +func TestLargeQuantilesReversed(t *testing.T) { + const numElements int = 30000 + inputSlice := make([]int, 0, numElements) + for i := numElements - 1; i >= 0; i-- { + inputSlice = append(inputSlice, i) + } + p, s, input, expected := ptest.CreateList2(inputSlice, [][]int{[]int{9985, 19959}}) + quantiles := ApproximateQuantiles(s, input, less, Opts{ + K: 200, + NumQuantiles: 3, + }) + passert.Equals(s, quantiles, expected) + if err := ptest.Run(p); err != nil { + t.Errorf("ApproximateQuantiles failed: %v", err) + } +} + +func TestBasicQuantiles(t *testing.T) { + // Test asking for 3 quantiles for k=3. 
+ tests := []struct { + Input []int + Expected [][]int + }{ + {[]int{}, [][]int{}}, + {[]int{1}, [][]int{[]int{1}}}, + {[]int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}, [][]int{[]int{6, 13}}}, + } + + for _, test := range tests { + p, s, in, exp := ptest.CreateList2(test.Input, test.Expected) + quantiles := ApproximateQuantiles(s, in, less, Opts{ + K: 3, + NumQuantiles: 3, + }) + passert.Equals(s, quantiles, exp) + + if err := ptest.Run(p); err != nil { + t.Errorf("ApproximateQuantiles(%v) != %v: %v", test.Input, test.Expected, err) + } + } +} + +func weightedElementToKv(e testWeightedElement) (int, int) { + return e.Weight, e.Element +} + +type testWeightedElement struct { + Weight int + Element int +} + +func TestWeightedQuantiles(t *testing.T) { + // Test asking for 3 quantiles for k=3. + input := []testWeightedElement{ + {Weight: 1, Element: 1}, + {Weight: 10, Element: 2}, + {Weight: 1, Element: 3}, + {Weight: 10, Element: 4}} + expected := []int{2, 4} + p, s, in, exp := ptest.CreateList2(input, [][]int{expected}) + elementWeightKvs := beam.ParDo(s, weightedElementToKv, in, beam.TypeDefinition{Var: beam.TType, T: reflectx.Int}) + opts := Opts{ + K: 3, + NumQuantiles: 3, + } + quantiles := ApproximateWeightedQuantiles(s, elementWeightKvs, less, opts) + passert.Equals(s, quantiles, exp) + + if err := ptest.Run(p); err != nil { + t.Errorf("ApproximateQuantiles(%v) != %v: %v", input, expected, err) + } +} + +func TestWeightedQuantilesWithInternalSharding(t *testing.T) { + // Test shard reduction. + input := []testWeightedElement{ + {Weight: 1, Element: 1}, + {Weight: 10, Element: 2}, + {Weight: 1, Element: 3}, + {Weight: 10, Element: 4}} + expected := []int{2, 4} + p, s, in, exp := ptest.CreateList2(input, [][]int{expected}) + elementWeightKvs := beam.ParDo(s, weightedElementToKv, in, beam.TypeDefinition{Var: beam.TType, T: reflectx.Int}) + opts := Opts{ + K: 3, + NumQuantiles: 3, + InternalSharding: []int{4, 3, 2}, + } + quantiles := ApproximateWeightedQuantiles(s, elementWeightKvs, less, opts) + passert.Equals(s, quantiles, exp) + + if err := ptest.Run(p); err != nil { + t.Errorf("ApproximateQuantiles(%v) != %v: %v", input, expected, err) + } +} + +func TestMerging(t *testing.T) { + compactors1 := compactors{ + K: 3, + NumberOfCompactions: 1, + Compactors: []compactor{{ + sorted: [][]beam.T{[]beam.T{1}, []beam.T{2}, []beam.T{3}}, + unsorted: []beam.T{6, 5, 4}, + capacity: 4, + }}, + } + + compactors2 := compactors{ + K: 3, + NumberOfCompactions: 1, + Compactors: []compactor{ + { + sorted: [][]beam.T{[]beam.T{7}, []beam.T{8}, []beam.T{9}}, + unsorted: []beam.T{12, 11, 10}, + capacity: 4}, + }, + } + + compactors1.merge(&compactors2, reflectx.MakeFunc2x1(less)) + + expectedCompactors := compactors{ + K: 3, + NumberOfCompactions: 3, + Compactors: []compactor{ + {capacity: 4}, + { + sorted: [][]beam.T{[]beam.T{1, 3, 5, 7, 9, 11}}, + capacity: 4, + }, + }, + } + if d := cmp.Diff(expectedCompactors, compactors1, cmp.AllowUnexported(compactor{})); d != "" { + t.Errorf("Failed. 
Expected %v, was %v, diff: %v", expectedCompactors, compactors1, d) + } +} + +func TestCompactorsEncoding(t *testing.T) { + compactors := compactors{ + K: 3, + NumberOfCompactions: 1, + Compactors: []compactor{ + { + capacity: 4, + sorted: [][]beam.T{[]beam.T{1, 2}}, + unsorted: []beam.T{3, 4}, + }, + { + capacity: 4, + sorted: [][]beam.T{[]beam.T{5, 6}}, + unsorted: []beam.T{7, 8}, + }, + }, + } + compactors.Compactors[0].update(1) + data, err := encodeCompactors(&compactors) + if err != nil { + t.Errorf("Failed to encode, %v", err) + } + decodedCompactors, err := decodeCompactors(data) + if err != nil { + t.Errorf("Failed to decode, %v", err) + } + // We want to use cmp.Diff which makes a distinction between empty and nil slices. + // So we need to clean up empty slices to be nil. + for i := range decodedCompactors.Compactors { + if len(decodedCompactors.Compactors[i].sorted) == 0 { + decodedCompactors.Compactors[i].sorted = nil + } + if len(decodedCompactors.Compactors[i].unsorted) == 0 { + decodedCompactors.Compactors[i].unsorted = nil + } + } + if d := cmp.Diff(&compactors, decodedCompactors, cmp.AllowUnexported(compactor{})); d != "" { + t.Errorf("Invalid coder. Wanted %v, got %v, diff: %v", &compactors, decodedCompactors, d) + } +} + +func TestWeightedElementEncoding(t *testing.T) { + w := weightedElement{ + weight: 10, + element: 1, + } + data, err := encodeWeightedElement(w) + if err != nil { + t.Errorf("Failed to encode %v", err) + } + decoded, err := decodeWeightedElement(data) + if err != nil { + t.Errorf("Failed to decode %v", err) + } + if d := cmp.Diff(w, decoded, cmp.AllowUnexported(weightedElement{})); d != "" { + t.Errorf("Invalid coder. Wanted %v got %v", w, decoded) + } +} diff --git a/sdks/go/pkg/beam/transforms/stats/stats.shims.go b/sdks/go/pkg/beam/transforms/stats/stats.shims.go index daa8988bcc6d..3a99269616a4 100644 --- a/sdks/go/pkg/beam/transforms/stats/stats.shims.go +++ b/sdks/go/pkg/beam/transforms/stats/stats.shims.go @@ -22,9 +22,10 @@ import ( "reflect" // Library imports - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx/schema" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) func init() { @@ -67,44 +68,46 @@ func init() { runtime.RegisterFunction(sumUint8Fn) runtime.RegisterFunction(sumUintFn) runtime.RegisterType(reflect.TypeOf((*meanAccum)(nil)).Elem()) + schema.RegisterType(reflect.TypeOf((*meanAccum)(nil)).Elem()) runtime.RegisterType(reflect.TypeOf((*meanFn)(nil)).Elem()) + schema.RegisterType(reflect.TypeOf((*meanFn)(nil)).Elem()) reflectx.RegisterStructWrapper(reflect.TypeOf((*meanFn)(nil)).Elem(), wrapMakerMeanFn) - reflectx.RegisterFunc(reflect.TypeOf((*func(float32,float32) (float32))(nil)).Elem(), funcMakerFloat32Float32ГFloat32) - reflectx.RegisterFunc(reflect.TypeOf((*func(float64,float64) (float64))(nil)).Elem(), funcMakerFloat64Float64ГFloat64) - reflectx.RegisterFunc(reflect.TypeOf((*func(int16,int16) (int16))(nil)).Elem(), funcMakerInt16Int16ГInt16) - reflectx.RegisterFunc(reflect.TypeOf((*func(int32,int32) (int32))(nil)).Elem(), funcMakerInt32Int32ГInt32) - reflectx.RegisterFunc(reflect.TypeOf((*func(int64,int64) (int64))(nil)).Elem(), funcMakerInt64Int64ГInt64) - 
reflectx.RegisterFunc(reflect.TypeOf((*func(int8,int8) (int8))(nil)).Elem(), funcMakerInt8Int8ГInt8) - reflectx.RegisterFunc(reflect.TypeOf((*func(int,int) (int))(nil)).Elem(), funcMakerIntIntГInt) - reflectx.RegisterFunc(reflect.TypeOf((*func(meanAccum,meanAccum) (meanAccum))(nil)).Elem(), funcMakerMeanAccumMeanAccumГMeanAccum) - reflectx.RegisterFunc(reflect.TypeOf((*func(meanAccum,typex.T) (meanAccum))(nil)).Elem(), funcMakerMeanAccumTypex۰TГMeanAccum) - reflectx.RegisterFunc(reflect.TypeOf((*func(meanAccum) (float64))(nil)).Elem(), funcMakerMeanAccumГFloat64) - reflectx.RegisterFunc(reflect.TypeOf((*func(typex.T) (int))(nil)).Elem(), funcMakerTypex۰TГInt) - reflectx.RegisterFunc(reflect.TypeOf((*func(typex.T) (typex.T,int))(nil)).Elem(), funcMakerTypex۰TГTypex۰TInt) - reflectx.RegisterFunc(reflect.TypeOf((*func(uint16,uint16) (uint16))(nil)).Elem(), funcMakerUint16Uint16ГUint16) - reflectx.RegisterFunc(reflect.TypeOf((*func(uint32,uint32) (uint32))(nil)).Elem(), funcMakerUint32Uint32ГUint32) - reflectx.RegisterFunc(reflect.TypeOf((*func(uint64,uint64) (uint64))(nil)).Elem(), funcMakerUint64Uint64ГUint64) - reflectx.RegisterFunc(reflect.TypeOf((*func(uint8,uint8) (uint8))(nil)).Elem(), funcMakerUint8Uint8ГUint8) - reflectx.RegisterFunc(reflect.TypeOf((*func(uint,uint) (uint))(nil)).Elem(), funcMakerUintUintГUint) - reflectx.RegisterFunc(reflect.TypeOf((*func() (meanAccum))(nil)).Elem(), funcMakerГMeanAccum) + reflectx.RegisterFunc(reflect.TypeOf((*func(float32, float32) float32)(nil)).Elem(), funcMakerFloat32Float32ГFloat32) + reflectx.RegisterFunc(reflect.TypeOf((*func(float64, float64) float64)(nil)).Elem(), funcMakerFloat64Float64ГFloat64) + reflectx.RegisterFunc(reflect.TypeOf((*func(int16, int16) int16)(nil)).Elem(), funcMakerInt16Int16ГInt16) + reflectx.RegisterFunc(reflect.TypeOf((*func(int32, int32) int32)(nil)).Elem(), funcMakerInt32Int32ГInt32) + reflectx.RegisterFunc(reflect.TypeOf((*func(int64, int64) int64)(nil)).Elem(), funcMakerInt64Int64ГInt64) + reflectx.RegisterFunc(reflect.TypeOf((*func(int8, int8) int8)(nil)).Elem(), funcMakerInt8Int8ГInt8) + reflectx.RegisterFunc(reflect.TypeOf((*func(int, int) int)(nil)).Elem(), funcMakerIntIntГInt) + reflectx.RegisterFunc(reflect.TypeOf((*func(meanAccum, meanAccum) meanAccum)(nil)).Elem(), funcMakerMeanAccumMeanAccumГMeanAccum) + reflectx.RegisterFunc(reflect.TypeOf((*func(meanAccum, typex.T) meanAccum)(nil)).Elem(), funcMakerMeanAccumTypex۰TГMeanAccum) + reflectx.RegisterFunc(reflect.TypeOf((*func(meanAccum) float64)(nil)).Elem(), funcMakerMeanAccumГFloat64) + reflectx.RegisterFunc(reflect.TypeOf((*func(typex.T) int)(nil)).Elem(), funcMakerTypex۰TГInt) + reflectx.RegisterFunc(reflect.TypeOf((*func(typex.T) (typex.T, int))(nil)).Elem(), funcMakerTypex۰TГTypex۰TInt) + reflectx.RegisterFunc(reflect.TypeOf((*func(uint16, uint16) uint16)(nil)).Elem(), funcMakerUint16Uint16ГUint16) + reflectx.RegisterFunc(reflect.TypeOf((*func(uint32, uint32) uint32)(nil)).Elem(), funcMakerUint32Uint32ГUint32) + reflectx.RegisterFunc(reflect.TypeOf((*func(uint64, uint64) uint64)(nil)).Elem(), funcMakerUint64Uint64ГUint64) + reflectx.RegisterFunc(reflect.TypeOf((*func(uint8, uint8) uint8)(nil)).Elem(), funcMakerUint8Uint8ГUint8) + reflectx.RegisterFunc(reflect.TypeOf((*func(uint, uint) uint)(nil)).Elem(), funcMakerUintUintГUint) + reflectx.RegisterFunc(reflect.TypeOf((*func() meanAccum)(nil)).Elem(), funcMakerГMeanAccum) } func wrapMakerMeanFn(fn interface{}) map[string]reflectx.Func { dfn := fn.(*meanFn) return map[string]reflectx.Func{ - "AddInput": 
reflectx.MakeFunc(func(a0 meanAccum, a1 typex.T) (meanAccum) { return dfn.AddInput(a0, a1) }), - "CreateAccumulator": reflectx.MakeFunc(func() (meanAccum) { return dfn.CreateAccumulator() }), - "ExtractOutput": reflectx.MakeFunc(func(a0 meanAccum) (float64) { return dfn.ExtractOutput(a0) }), - "MergeAccumulators": reflectx.MakeFunc(func(a0 meanAccum, a1 meanAccum) (meanAccum) { return dfn.MergeAccumulators(a0, a1) }), + "AddInput": reflectx.MakeFunc(func(a0 meanAccum, a1 typex.T) meanAccum { return dfn.AddInput(a0, a1) }), + "CreateAccumulator": reflectx.MakeFunc(func() meanAccum { return dfn.CreateAccumulator() }), + "ExtractOutput": reflectx.MakeFunc(func(a0 meanAccum) float64 { return dfn.ExtractOutput(a0) }), + "MergeAccumulators": reflectx.MakeFunc(func(a0 meanAccum, a1 meanAccum) meanAccum { return dfn.MergeAccumulators(a0, a1) }), } } type callerFloat32Float32ГFloat32 struct { - fn func(float32,float32) (float32) + fn func(float32, float32) float32 } func funcMakerFloat32Float32ГFloat32(fn interface{}) reflectx.Func { - f := fn.(func(float32,float32) (float32)) + f := fn.(func(float32, float32) float32) return &callerFloat32Float32ГFloat32{fn: f} } @@ -121,16 +124,16 @@ func (c *callerFloat32Float32ГFloat32) Call(args []interface{}) []interface{} { return []interface{}{out0} } -func (c *callerFloat32Float32ГFloat32) Call2x1(arg0, arg1 interface{}) (interface{}) { +func (c *callerFloat32Float32ГFloat32) Call2x1(arg0, arg1 interface{}) interface{} { return c.fn(arg0.(float32), arg1.(float32)) } type callerFloat64Float64ГFloat64 struct { - fn func(float64,float64) (float64) + fn func(float64, float64) float64 } func funcMakerFloat64Float64ГFloat64(fn interface{}) reflectx.Func { - f := fn.(func(float64,float64) (float64)) + f := fn.(func(float64, float64) float64) return &callerFloat64Float64ГFloat64{fn: f} } @@ -147,16 +150,16 @@ func (c *callerFloat64Float64ГFloat64) Call(args []interface{}) []interface{} { return []interface{}{out0} } -func (c *callerFloat64Float64ГFloat64) Call2x1(arg0, arg1 interface{}) (interface{}) { +func (c *callerFloat64Float64ГFloat64) Call2x1(arg0, arg1 interface{}) interface{} { return c.fn(arg0.(float64), arg1.(float64)) } type callerInt16Int16ГInt16 struct { - fn func(int16,int16) (int16) + fn func(int16, int16) int16 } func funcMakerInt16Int16ГInt16(fn interface{}) reflectx.Func { - f := fn.(func(int16,int16) (int16)) + f := fn.(func(int16, int16) int16) return &callerInt16Int16ГInt16{fn: f} } @@ -173,16 +176,16 @@ func (c *callerInt16Int16ГInt16) Call(args []interface{}) []interface{} { return []interface{}{out0} } -func (c *callerInt16Int16ГInt16) Call2x1(arg0, arg1 interface{}) (interface{}) { +func (c *callerInt16Int16ГInt16) Call2x1(arg0, arg1 interface{}) interface{} { return c.fn(arg0.(int16), arg1.(int16)) } type callerInt32Int32ГInt32 struct { - fn func(int32,int32) (int32) + fn func(int32, int32) int32 } func funcMakerInt32Int32ГInt32(fn interface{}) reflectx.Func { - f := fn.(func(int32,int32) (int32)) + f := fn.(func(int32, int32) int32) return &callerInt32Int32ГInt32{fn: f} } @@ -199,16 +202,16 @@ func (c *callerInt32Int32ГInt32) Call(args []interface{}) []interface{} { return []interface{}{out0} } -func (c *callerInt32Int32ГInt32) Call2x1(arg0, arg1 interface{}) (interface{}) { +func (c *callerInt32Int32ГInt32) Call2x1(arg0, arg1 interface{}) interface{} { return c.fn(arg0.(int32), arg1.(int32)) } type callerInt64Int64ГInt64 struct { - fn func(int64,int64) (int64) + fn func(int64, int64) int64 } func funcMakerInt64Int64ГInt64(fn 
interface{}) reflectx.Func { - f := fn.(func(int64,int64) (int64)) + f := fn.(func(int64, int64) int64) return &callerInt64Int64ГInt64{fn: f} } @@ -225,16 +228,16 @@ func (c *callerInt64Int64ГInt64) Call(args []interface{}) []interface{} { return []interface{}{out0} } -func (c *callerInt64Int64ГInt64) Call2x1(arg0, arg1 interface{}) (interface{}) { +func (c *callerInt64Int64ГInt64) Call2x1(arg0, arg1 interface{}) interface{} { return c.fn(arg0.(int64), arg1.(int64)) } type callerInt8Int8ГInt8 struct { - fn func(int8,int8) (int8) + fn func(int8, int8) int8 } func funcMakerInt8Int8ГInt8(fn interface{}) reflectx.Func { - f := fn.(func(int8,int8) (int8)) + f := fn.(func(int8, int8) int8) return &callerInt8Int8ГInt8{fn: f} } @@ -251,16 +254,16 @@ func (c *callerInt8Int8ГInt8) Call(args []interface{}) []interface{} { return []interface{}{out0} } -func (c *callerInt8Int8ГInt8) Call2x1(arg0, arg1 interface{}) (interface{}) { +func (c *callerInt8Int8ГInt8) Call2x1(arg0, arg1 interface{}) interface{} { return c.fn(arg0.(int8), arg1.(int8)) } type callerIntIntГInt struct { - fn func(int,int) (int) + fn func(int, int) int } func funcMakerIntIntГInt(fn interface{}) reflectx.Func { - f := fn.(func(int,int) (int)) + f := fn.(func(int, int) int) return &callerIntIntГInt{fn: f} } @@ -277,16 +280,16 @@ func (c *callerIntIntГInt) Call(args []interface{}) []interface{} { return []interface{}{out0} } -func (c *callerIntIntГInt) Call2x1(arg0, arg1 interface{}) (interface{}) { +func (c *callerIntIntГInt) Call2x1(arg0, arg1 interface{}) interface{} { return c.fn(arg0.(int), arg1.(int)) } type callerMeanAccumMeanAccumГMeanAccum struct { - fn func(meanAccum,meanAccum) (meanAccum) + fn func(meanAccum, meanAccum) meanAccum } func funcMakerMeanAccumMeanAccumГMeanAccum(fn interface{}) reflectx.Func { - f := fn.(func(meanAccum,meanAccum) (meanAccum)) + f := fn.(func(meanAccum, meanAccum) meanAccum) return &callerMeanAccumMeanAccumГMeanAccum{fn: f} } @@ -303,16 +306,16 @@ func (c *callerMeanAccumMeanAccumГMeanAccum) Call(args []interface{}) []interfa return []interface{}{out0} } -func (c *callerMeanAccumMeanAccumГMeanAccum) Call2x1(arg0, arg1 interface{}) (interface{}) { +func (c *callerMeanAccumMeanAccumГMeanAccum) Call2x1(arg0, arg1 interface{}) interface{} { return c.fn(arg0.(meanAccum), arg1.(meanAccum)) } type callerMeanAccumTypex۰TГMeanAccum struct { - fn func(meanAccum,typex.T) (meanAccum) + fn func(meanAccum, typex.T) meanAccum } func funcMakerMeanAccumTypex۰TГMeanAccum(fn interface{}) reflectx.Func { - f := fn.(func(meanAccum,typex.T) (meanAccum)) + f := fn.(func(meanAccum, typex.T) meanAccum) return &callerMeanAccumTypex۰TГMeanAccum{fn: f} } @@ -329,16 +332,16 @@ func (c *callerMeanAccumTypex۰TГMeanAccum) Call(args []interface{}) []interfac return []interface{}{out0} } -func (c *callerMeanAccumTypex۰TГMeanAccum) Call2x1(arg0, arg1 interface{}) (interface{}) { +func (c *callerMeanAccumTypex۰TГMeanAccum) Call2x1(arg0, arg1 interface{}) interface{} { return c.fn(arg0.(meanAccum), arg1.(typex.T)) } type callerMeanAccumГFloat64 struct { - fn func(meanAccum) (float64) + fn func(meanAccum) float64 } func funcMakerMeanAccumГFloat64(fn interface{}) reflectx.Func { - f := fn.(func(meanAccum) (float64)) + f := fn.(func(meanAccum) float64) return &callerMeanAccumГFloat64{fn: f} } @@ -355,16 +358,16 @@ func (c *callerMeanAccumГFloat64) Call(args []interface{}) []interface{} { return []interface{}{out0} } -func (c *callerMeanAccumГFloat64) Call1x1(arg0 interface{}) (interface{}) { +func (c *callerMeanAccumГFloat64) 
Call1x1(arg0 interface{}) interface{} { return c.fn(arg0.(meanAccum)) } type callerTypex۰TГInt struct { - fn func(typex.T) (int) + fn func(typex.T) int } func funcMakerTypex۰TГInt(fn interface{}) reflectx.Func { - f := fn.(func(typex.T) (int)) + f := fn.(func(typex.T) int) return &callerTypex۰TГInt{fn: f} } @@ -381,16 +384,16 @@ func (c *callerTypex۰TГInt) Call(args []interface{}) []interface{} { return []interface{}{out0} } -func (c *callerTypex۰TГInt) Call1x1(arg0 interface{}) (interface{}) { +func (c *callerTypex۰TГInt) Call1x1(arg0 interface{}) interface{} { return c.fn(arg0.(typex.T)) } type callerTypex۰TГTypex۰TInt struct { - fn func(typex.T) (typex.T,int) + fn func(typex.T) (typex.T, int) } func funcMakerTypex۰TГTypex۰TInt(fn interface{}) reflectx.Func { - f := fn.(func(typex.T) (typex.T,int)) + f := fn.(func(typex.T) (typex.T, int)) return &callerTypex۰TГTypex۰TInt{fn: f} } @@ -412,11 +415,11 @@ func (c *callerTypex۰TГTypex۰TInt) Call1x2(arg0 interface{}) (interface{}, in } type callerUint16Uint16ГUint16 struct { - fn func(uint16,uint16) (uint16) + fn func(uint16, uint16) uint16 } func funcMakerUint16Uint16ГUint16(fn interface{}) reflectx.Func { - f := fn.(func(uint16,uint16) (uint16)) + f := fn.(func(uint16, uint16) uint16) return &callerUint16Uint16ГUint16{fn: f} } @@ -433,16 +436,16 @@ func (c *callerUint16Uint16ГUint16) Call(args []interface{}) []interface{} { return []interface{}{out0} } -func (c *callerUint16Uint16ГUint16) Call2x1(arg0, arg1 interface{}) (interface{}) { +func (c *callerUint16Uint16ГUint16) Call2x1(arg0, arg1 interface{}) interface{} { return c.fn(arg0.(uint16), arg1.(uint16)) } type callerUint32Uint32ГUint32 struct { - fn func(uint32,uint32) (uint32) + fn func(uint32, uint32) uint32 } func funcMakerUint32Uint32ГUint32(fn interface{}) reflectx.Func { - f := fn.(func(uint32,uint32) (uint32)) + f := fn.(func(uint32, uint32) uint32) return &callerUint32Uint32ГUint32{fn: f} } @@ -459,16 +462,16 @@ func (c *callerUint32Uint32ГUint32) Call(args []interface{}) []interface{} { return []interface{}{out0} } -func (c *callerUint32Uint32ГUint32) Call2x1(arg0, arg1 interface{}) (interface{}) { +func (c *callerUint32Uint32ГUint32) Call2x1(arg0, arg1 interface{}) interface{} { return c.fn(arg0.(uint32), arg1.(uint32)) } type callerUint64Uint64ГUint64 struct { - fn func(uint64,uint64) (uint64) + fn func(uint64, uint64) uint64 } func funcMakerUint64Uint64ГUint64(fn interface{}) reflectx.Func { - f := fn.(func(uint64,uint64) (uint64)) + f := fn.(func(uint64, uint64) uint64) return &callerUint64Uint64ГUint64{fn: f} } @@ -485,16 +488,16 @@ func (c *callerUint64Uint64ГUint64) Call(args []interface{}) []interface{} { return []interface{}{out0} } -func (c *callerUint64Uint64ГUint64) Call2x1(arg0, arg1 interface{}) (interface{}) { +func (c *callerUint64Uint64ГUint64) Call2x1(arg0, arg1 interface{}) interface{} { return c.fn(arg0.(uint64), arg1.(uint64)) } type callerUint8Uint8ГUint8 struct { - fn func(uint8,uint8) (uint8) + fn func(uint8, uint8) uint8 } func funcMakerUint8Uint8ГUint8(fn interface{}) reflectx.Func { - f := fn.(func(uint8,uint8) (uint8)) + f := fn.(func(uint8, uint8) uint8) return &callerUint8Uint8ГUint8{fn: f} } @@ -511,16 +514,16 @@ func (c *callerUint8Uint8ГUint8) Call(args []interface{}) []interface{} { return []interface{}{out0} } -func (c *callerUint8Uint8ГUint8) Call2x1(arg0, arg1 interface{}) (interface{}) { +func (c *callerUint8Uint8ГUint8) Call2x1(arg0, arg1 interface{}) interface{} { return c.fn(arg0.(uint8), arg1.(uint8)) } type callerUintUintГUint struct { 
- fn func(uint,uint) (uint) + fn func(uint, uint) uint } func funcMakerUintUintГUint(fn interface{}) reflectx.Func { - f := fn.(func(uint,uint) (uint)) + f := fn.(func(uint, uint) uint) return &callerUintUintГUint{fn: f} } @@ -537,16 +540,16 @@ func (c *callerUintUintГUint) Call(args []interface{}) []interface{} { return []interface{}{out0} } -func (c *callerUintUintГUint) Call2x1(arg0, arg1 interface{}) (interface{}) { +func (c *callerUintUintГUint) Call2x1(arg0, arg1 interface{}) interface{} { return c.fn(arg0.(uint), arg1.(uint)) } type callerГMeanAccum struct { - fn func() (meanAccum) + fn func() meanAccum } func funcMakerГMeanAccum(fn interface{}) reflectx.Func { - f := fn.(func() (meanAccum)) + f := fn.(func() meanAccum) return &callerГMeanAccum{fn: f} } @@ -563,9 +566,8 @@ func (c *callerГMeanAccum) Call(args []interface{}) []interface{} { return []interface{}{out0} } -func (c *callerГMeanAccum) Call0x1() (interface{}) { +func (c *callerГMeanAccum) Call0x1() interface{} { return c.fn() } - // DO NOT MODIFY: GENERATED CODE diff --git a/sdks/go/pkg/beam/transforms/stats/sum.go b/sdks/go/pkg/beam/transforms/stats/sum.go index fa399e4bc453..a1f6ecf14c3e 100644 --- a/sdks/go/pkg/beam/transforms/stats/sum.go +++ b/sdks/go/pkg/beam/transforms/stats/sum.go @@ -16,7 +16,7 @@ package stats import ( - "github.com/apache/beam/sdks/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam" ) //go:generate specialize --input=sum_switch.tmpl --x=integers,floats diff --git a/sdks/go/pkg/beam/transforms/stats/sum_test.go b/sdks/go/pkg/beam/transforms/stats/sum_test.go index 079b4e90d9f9..fa924792d3ae 100644 --- a/sdks/go/pkg/beam/transforms/stats/sum_test.go +++ b/sdks/go/pkg/beam/transforms/stats/sum_test.go @@ -18,9 +18,9 @@ package stats import ( "testing" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/testing/passert" - "github.com/apache/beam/sdks/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" ) // TestSumInt verifies that Sum adds ints correctly. diff --git a/sdks/go/pkg/beam/transforms/stats/util.go b/sdks/go/pkg/beam/transforms/stats/util.go index 87762070b909..c1bb8974c565 100644 --- a/sdks/go/pkg/beam/transforms/stats/util.go +++ b/sdks/go/pkg/beam/transforms/stats/util.go @@ -28,8 +28,8 @@ import ( "fmt" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) func combine(s beam.Scope, makeCombineFn func(reflect.Type) interface{}, col beam.PCollection) beam.PCollection { diff --git a/sdks/go/pkg/beam/transforms/stats/util_gen.go b/sdks/go/pkg/beam/transforms/stats/util_gen.go index 4be40d77b459..6380edae7cb9 100644 --- a/sdks/go/pkg/beam/transforms/stats/util_gen.go +++ b/sdks/go/pkg/beam/transforms/stats/util_gen.go @@ -21,5 +21,5 @@ package stats // so that their generated functions exist when this one is run. // `go generate` executes go:generate commands in lexical file order, top to bottom. 
-//go:generate go install github.com/apache/beam/sdks/go/cmd/starcgen +//go:generate go install github.com/apache/beam/sdks/v2/go/cmd/starcgen //go:generate starcgen --package=stats --identifiers=countFn,keyedCountFn,meanFn,maxIntFn,minIntFn,sumIntFn,maxInt8Fn,minInt8Fn,sumInt8Fn,maxInt16Fn,minInt16Fn,sumInt16Fn,maxInt32Fn,minInt32Fn,sumInt32Fn,maxInt64Fn,minInt64Fn,sumInt64Fn,maxUintFn,minUintFn,sumUintFn,maxUint8Fn,minUint8Fn,sumUint8Fn,maxUint16Fn,minUint16Fn,sumUint16Fn,maxUint32Fn,minUint32Fn,sumUint32Fn,maxUint64Fn,minUint64Fn,sumUint64Fn,maxFloat32Fn,minFloat32Fn,sumFloat32Fn,maxFloat64Fn,minFloat64Fn,sumFloat64Fn diff --git a/sdks/go/pkg/beam/transforms/stats/util_gen.tmpl b/sdks/go/pkg/beam/transforms/stats/util_gen.tmpl index 2d41e31282e5..cf3253f5e24c 100644 --- a/sdks/go/pkg/beam/transforms/stats/util_gen.tmpl +++ b/sdks/go/pkg/beam/transforms/stats/util_gen.tmpl @@ -19,7 +19,7 @@ package stats // so that their generated functions exist when this one is run. // `go generate` executes go:generate commands in lexical file order, top to bottom. -//go:generate go install github.com/apache/beam/sdks/go/cmd/starcgen +//go:generate go install github.com/apache/beam/sdks/v2/go/cmd/starcgen {{- with $x := .X }} //go:generate starcgen --package=stats --identifiers=countFn,keyedCountFn,meanFn{{- range $i, $t := $x -}},max{{$t.Name}}Fn,min{{$t.Name}}Fn,sum{{$t.Name}}Fn{{- end -}} {{end}} diff --git a/sdks/go/pkg/beam/transforms/top/top.go b/sdks/go/pkg/beam/transforms/top/top.go index e342b3f5e97b..42b4d86a65d4 100644 --- a/sdks/go/pkg/beam/transforms/top/top.go +++ b/sdks/go/pkg/beam/transforms/top/top.go @@ -19,19 +19,19 @@ package top import ( "bytes" - "encoding/json" "fmt" "reflect" "sort" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/core/funcx" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/funcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) -//go:generate go install github.com/apache/beam/sdks/go/cmd/starcgen +//go:generate go install github.com/apache/beam/sdks/v2/go/cmd/starcgen //go:generate starcgen --package=top //go:generate go fmt @@ -129,13 +129,6 @@ type accum struct { list []interface{} } -// UnmarshalJSON allows accum to hook into the JSON Decoder, and -// deserialize it's own representation. -func (a *accum) UnmarshalJSON(b []byte) error { - json.Unmarshal(b, &a.data) - return nil -} - func (a *accum) unmarshal() error { if a.data == nil { return nil @@ -151,21 +144,63 @@ func (a *accum) unmarshal() error { return nil } -// MarshalJSON uses the hook into the JSON encoder library to encode the accumulator. 
-func (a accum) MarshalJSON() ([]byte, error) { - if a.enc == nil { - return nil, errors.Errorf("top.accum: element encoder unspecified") +var ( + accumType = reflect.TypeOf((*accum)(nil)).Elem() +) + +func init() { + beam.RegisterType(accumType) + beam.RegisterCoder(accumType, accumEnc(), accumDec()) +} + +func accumEnc() func(accum) ([]byte, error) { + byteEnc, err := coder.EncoderForSlice(reflect.TypeOf((*[][]byte)(nil)).Elem()) + if err != nil { + panic(err) } - var values [][]byte - for _, value := range a.list { + return func(a accum) ([]byte, error) { + if a.enc == nil { + return nil, errors.Errorf("top.accum: element encoder unspecified") + } + var values [][]byte + for _, value := range a.list { + var buf bytes.Buffer + if err := a.enc.Encode(value, &buf); err != nil { + return nil, errors.WithContextf(err, "top.accum: marshalling %v", value) + } + values = append(values, buf.Bytes()) + } + a.list = nil + var buf bytes.Buffer - if err := a.enc.Encode(value, &buf); err != nil { - return nil, errors.WithContextf(err, "top.accum: marshalling %v", value) + if err := coder.WriteSimpleRowHeader(1, &buf); err != nil { + return nil, err + } + if err := byteEnc(values, &buf); err != nil { + return nil, err + } + return buf.Bytes(), nil + } +} + +func accumDec() func([]byte) (accum, error) { + byteDec, err := coder.DecoderForSlice(reflect.TypeOf((*[][]byte)(nil)).Elem()) + if err != nil { + panic(err) + } + return func(b []byte) (accum, error) { + buf := bytes.NewBuffer(b) + if err := coder.ReadSimpleRowHeader(1, buf); err != nil { + return accum{}, err + } + s, err := byteDec(buf) + if err != nil { + return accum{}, err } - values = append(values, buf.Bytes()) + return accum{ + data: s.([][]byte), + }, nil } - a.list = nil - return json.Marshal(values) } // combineFn is the internal CombineFn. 
It maintains accumulators containing diff --git a/sdks/go/pkg/beam/transforms/top/top.shims.go b/sdks/go/pkg/beam/transforms/top/top.shims.go index ac93f9176b55..b14fc738e12c 100644 --- a/sdks/go/pkg/beam/transforms/top/top.shims.go +++ b/sdks/go/pkg/beam/transforms/top/top.shims.go @@ -22,14 +22,17 @@ import ( "reflect" // Library imports - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx/schema" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) func init() { runtime.RegisterType(reflect.TypeOf((*accum)(nil)).Elem()) + schema.RegisterType(reflect.TypeOf((*accum)(nil)).Elem()) runtime.RegisterType(reflect.TypeOf((*combineFn)(nil)).Elem()) + schema.RegisterType(reflect.TypeOf((*combineFn)(nil)).Elem()) reflectx.RegisterStructWrapper(reflect.TypeOf((*combineFn)(nil)).Elem(), wrapMakerCombineFn) reflectx.RegisterFunc(reflect.TypeOf((*func(accum, accum) accum)(nil)).Elem(), funcMakerAccumAccumГAccum) reflectx.RegisterFunc(reflect.TypeOf((*func(accum, typex.T) accum)(nil)).Elem(), funcMakerAccumTypex۰TГAccum) diff --git a/sdks/go/pkg/beam/transforms/top/top_test.go b/sdks/go/pkg/beam/transforms/top/top_test.go index 964e2930cdb4..077e731b3be9 100644 --- a/sdks/go/pkg/beam/transforms/top/top_test.go +++ b/sdks/go/pkg/beam/transforms/top/top_test.go @@ -20,7 +20,7 @@ import ( "reflect" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) // TestCombineFn3String verifies that the accumulator correctly @@ -124,13 +124,16 @@ func load(fn *combineFn, elms ...string) accum { func merge(t *testing.T, fn *combineFn, as ...accum) accum { t.Helper() a := fn.CreateAccumulator() + enc := accumEnc() + dec := accumDec() + for i, b := range as { - buf, err := b.MarshalJSON() + buf, err := enc(b) if err != nil { t.Fatalf("failure marshalling accum[%d]: %v, %+v", i, err, b) } - var c accum - if err := c.UnmarshalJSON(buf); err != nil { + c, err := dec(buf) + if err != nil { t.Fatalf("failure unmarshalling accum[%d]: %v, %+v", i, err, b) } a = fn.MergeAccumulators(a, c) @@ -139,12 +142,15 @@ func merge(t *testing.T, fn *combineFn, as ...accum) accum { } func outputUnmarshal(t *testing.T, fn *combineFn, a accum) []string { - buf, err := a.MarshalJSON() + enc := accumEnc() + dec := accumDec() + + buf, err := enc(a) if err != nil { t.Fatalf("failure marshalling accum: %v, %+v", err, a) } - var b accum - if err := b.UnmarshalJSON(buf); err != nil { + b, err := dec(buf) + if err != nil { t.Fatalf("failure unmarshalling accum: %v, %+v", err, b) } return output(fn, b) diff --git a/sdks/go/pkg/beam/util.go b/sdks/go/pkg/beam/util.go index f6a605f9122a..3616ef2ee0d2 100644 --- a/sdks/go/pkg/beam/util.go +++ b/sdks/go/pkg/beam/util.go @@ -15,7 +15,7 @@ package beam -//go:generate go install github.com/apache/beam/sdks/go/cmd/starcgen +//go:generate go install github.com/apache/beam/sdks/v2/go/cmd/starcgen //go:generate starcgen --package=beam --identifiers=addFixedKeyFn,dropKeyFn,dropValueFn,swapKVFn,explodeFn,jsonDec,jsonEnc,protoEnc,protoDec,schemaEnc,schemaDec,makePartitionFn,createFn //go:generate go fmt @@ -106,6 +106,14 @@ func MustN(list []PCollection, err error) []PCollection { return list } +// MustTaggedN 
returns the input, but panics if err != nil. +func MustTaggedN(ret map[string]PCollection, err error) map[string]PCollection { + if err != nil { + panic(err) + } + return ret +} + // Must returns the input, but panics if err != nil. func Must(a PCollection, err error) PCollection { if err != nil { diff --git a/sdks/go/pkg/beam/util/gcsx/gcs.go b/sdks/go/pkg/beam/util/gcsx/gcs.go index 70a325b27fde..703eca4b2664 100644 --- a/sdks/go/pkg/beam/util/gcsx/gcs.go +++ b/sdks/go/pkg/beam/util/gcsx/gcs.go @@ -25,7 +25,7 @@ import ( "path" "cloud.google.com/go/storage" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" "google.golang.org/api/option" ) diff --git a/sdks/go/pkg/beam/util/gcsx/gcs_test.go b/sdks/go/pkg/beam/util/gcsx/gcs_test.go index 9cc449ca514c..52664a054a87 100644 --- a/sdks/go/pkg/beam/util/gcsx/gcs_test.go +++ b/sdks/go/pkg/beam/util/gcsx/gcs_test.go @@ -20,7 +20,7 @@ import ( "time" "cloud.google.com/go/storage" - "github.com/apache/beam/sdks/go/pkg/beam/util/gcsx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/gcsx" ) func Example() { diff --git a/sdks/go/pkg/beam/util/grpcx/dial.go b/sdks/go/pkg/beam/util/grpcx/dial.go index c5496654105c..cc7fc2b5c9e8 100644 --- a/sdks/go/pkg/beam/util/grpcx/dial.go +++ b/sdks/go/pkg/beam/util/grpcx/dial.go @@ -20,7 +20,7 @@ import ( "math" "time" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" "google.golang.org/grpc" ) diff --git a/sdks/go/pkg/beam/util/grpcx/hook.go b/sdks/go/pkg/beam/util/grpcx/hook.go index 457aa4f019b3..0e312997aad5 100644 --- a/sdks/go/pkg/beam/util/grpcx/hook.go +++ b/sdks/go/pkg/beam/util/grpcx/hook.go @@ -20,7 +20,7 @@ import ( "fmt" "time" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/hooks" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/hooks" "google.golang.org/grpc" ) diff --git a/sdks/go/pkg/beam/util/grpcx/metadata.go b/sdks/go/pkg/beam/util/grpcx/metadata.go index 7c0651f3e2a8..48988a6a39d4 100644 --- a/sdks/go/pkg/beam/util/grpcx/metadata.go +++ b/sdks/go/pkg/beam/util/grpcx/metadata.go @@ -19,7 +19,7 @@ package grpcx import ( "context" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" "google.golang.org/grpc/metadata" ) diff --git a/sdks/go/pkg/beam/util/pubsubx/pubsub.go b/sdks/go/pkg/beam/util/pubsubx/pubsub.go index 585b4dc1ce47..b7769a9a5810 100644 --- a/sdks/go/pkg/beam/util/pubsubx/pubsub.go +++ b/sdks/go/pkg/beam/util/pubsubx/pubsub.go @@ -22,8 +22,8 @@ import ( "time" "cloud.google.com/go/pubsub" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - "github.com/apache/beam/sdks/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" ) // MakeQualifiedTopicName returns a fully-qualified topic name for diff --git a/sdks/go/pkg/beam/util/shimx/generate.go b/sdks/go/pkg/beam/util/shimx/generate.go index 4bd70a090227..7b48e74fec0d 100644 --- a/sdks/go/pkg/beam/util/shimx/generate.go +++ b/sdks/go/pkg/beam/util/shimx/generate.go @@ -43,10 +43,11 @@ import ( // Beam imports that the generated code requires. 
var ( - ExecImport = "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec" - TypexImport = "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - ReflectxImport = "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" - RuntimeImport = "github.com/apache/beam/sdks/go/pkg/beam/core/runtime" + ExecImport = "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec" + TypexImport = "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + ReflectxImport = "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + RuntimeImport = "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime" + SchemaImport = "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx/schema" ) func validateBeamImports() { @@ -54,6 +55,7 @@ func validateBeamImports() { checkImportSuffix(TypexImport, "typex") checkImportSuffix(ReflectxImport, "reflectx") checkImportSuffix(RuntimeImport, "runtime") + checkImportSuffix(SchemaImport, "schema") } func checkImportSuffix(path, suffix string) { @@ -112,6 +114,18 @@ func (t Top) processImports() *Top { pred["fmt"] = true pred["io"] = true } + // This should definitley be happening earlier though. + var filteredTypes []string + for _, t := range t.Types { + if !strings.HasPrefix(t, "beam.") { + filteredTypes = append(filteredTypes, t) + } + } + t.Types = filteredTypes + if len(t.Types) > 0 { + filtered = append(filtered, SchemaImport) + pred[SchemaImport] = true + } if len(t.Types) > 0 || len(t.Functions) > 0 { filtered = append(filtered, RuntimeImport) pred[RuntimeImport] = true @@ -245,6 +259,7 @@ func init() { {{- end}} {{- range $x := .Types}} runtime.RegisterType(reflect.TypeOf((*{{$x}})(nil)).Elem()) + schema.RegisterType(reflect.TypeOf((*{{$x}})(nil)).Elem()) {{- end}} {{- range $x := .Wraps}} reflectx.RegisterStructWrapper(reflect.TypeOf((*{{$x.Type}})(nil)).Elem(), wrapMaker{{$x.Name}}) diff --git a/sdks/go/pkg/beam/util/shimx/generate_test.go b/sdks/go/pkg/beam/util/shimx/generate_test.go index 581d89696753..cc80d0b1d446 100644 --- a/sdks/go/pkg/beam/util/shimx/generate_test.go +++ b/sdks/go/pkg/beam/util/shimx/generate_test.go @@ -76,7 +76,7 @@ func TestTop_ProcessImports(t *testing.T) { {name: "shim", got: &Top{Shims: []Func{{Name: "emit"}}}, want: []string{ReflectxImport, "context", "keepit", "fmt", "io", "unrelated"}}, {name: "iter&emit", got: &Top{Emitters: []Emitter{{Name: "emit"}}, Inputs: []Input{{Name: "iter"}}}, want: []string{ExecImport, TypexImport, "keepit", "unrelated"}}, {name: "functions", got: &Top{Functions: []string{"func1"}}, want: []string{RuntimeImport, "context", "keepit", "fmt", "io", "unrelated"}}, - {name: "types", got: &Top{Types: []string{"func1"}}, want: []string{RuntimeImport, "context", "keepit", "fmt", "io", "unrelated"}}, + {name: "types", got: &Top{Types: []string{"func1"}}, want: []string{SchemaImport, RuntimeImport, "context", "keepit", "fmt", "io", "unrelated"}}, } for _, test := range tests { t.Run(test.name, func(t *testing.T) { diff --git a/sdks/go/pkg/beam/util/starcgenx/starcgenx.go b/sdks/go/pkg/beam/util/starcgenx/starcgenx.go index 9e662272131e..2637d89d2b87 100644 --- a/sdks/go/pkg/beam/util/starcgenx/starcgenx.go +++ b/sdks/go/pkg/beam/util/starcgenx/starcgenx.go @@ -30,9 +30,9 @@ import ( "strconv" "strings" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" - "github.com/apache/beam/sdks/go/pkg/beam/util/shimx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" 
+ "github.com/apache/beam/sdks/v2/go/pkg/beam/util/shimx" ) // NewExtractor returns an extractor for the given package. diff --git a/sdks/go/pkg/beam/util/starcgenx/starcgenx_test.go b/sdks/go/pkg/beam/util/starcgenx/starcgenx_test.go index a4c4e6d38981..4c59a490507c 100644 --- a/sdks/go/pkg/beam/util/starcgenx/starcgenx_test.go +++ b/sdks/go/pkg/beam/util/starcgenx/starcgenx_test.go @@ -47,7 +47,7 @@ func TestExtractor(t *testing.T) { expected: []string{"runtime.RegisterType(reflect.TypeOf((*myDoFn)(nil)).Elem())", "funcMakerEmitIntГ", "emitMakerInt", "funcMakerValTypeValTypeEmitIntГ", "runtime.RegisterType(reflect.TypeOf((*valType)(nil)).Elem())", "reflectx.RegisterStructWrapper(reflect.TypeOf((*myDoFn)(nil)).Elem(), wrapMakerMyDoFn)"}, excluded: []string{"funcMakerStringГ", "emitMakerString", "nonPipelineType", "UnrelatedMethod1", "UnrelatedMethod2", "UnrelatedMethod3", "nonLifecycleMethod"}, }, - {name: "excludedtypes", files: []string{excludedtypes}, pkg: "excludedtypes", imports: []string{"github.com/apache/beam/sdks/go/pkg/beam"}, + {name: "excludedtypes", files: []string{excludedtypes}, pkg: "excludedtypes", imports: []string{"github.com/apache/beam/sdks/v2/go/pkg/beam"}, expected: []string{"runtime.RegisterFunction(ShouldExist)", "funcMakerTypex۰TГTypex۰XError"}, excluded: []string{"runtime.RegisterType(reflect.TypeOf((*typex.T)(nil)).Elem())", "runtime.RegisterType(reflect.TypeOf((*beam.T)(nil)).Elem())", "runtime.RegisterType(reflect.TypeOf((*typex.X)(nil)).Elem())", "runtime.RegisterType(reflect.TypeOf((*beam.X)(nil)).Elem())", "runtime.RegisterType(reflect.TypeOf((*error)(nil)).Elem())", "runtime.RegisterType(reflect.TypeOf((*context.Context)(nil)).Elem())"}, }, @@ -69,7 +69,7 @@ func TestExtractor(t *testing.T) { {name: "vars", files: []string{vars}, pkg: "vars", imports: []string{"strings"}, excluded: []string{"runtime.RegisterFunction(strings.MyTitle)", "runtime.RegisterFunction(anonFunction)"}, }, - {name: "registerDoFn", files: []string{pardo, registerDoFn}, pkg: "pardo", imports: []string{"github.com/apache/beam/sdks/go/pkg/beam"}, + {name: "registerDoFn", files: []string{pardo, registerDoFn}, pkg: "pardo", imports: []string{"github.com/apache/beam/sdks/v2/go/pkg/beam"}, expected: []string{"runtime.RegisterFunction(MyIdent)", "runtime.RegisterFunction(MyOtherDoFn)", "runtime.RegisterType(reflect.TypeOf((*foo)(nil)).Elem())", "funcMakerStringГString", "funcMakerFooГStringFoo"}, excluded: []string{"runtime.RegisterFunction(MyDropVal)", "funcMakerIntStringГInt"}, }, @@ -182,7 +182,7 @@ const excludedtypes = ` package excludedtypes import ( - "github.com/apache/beam/sdks/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam" ) func ShouldExist(v beam.T) (beam.X, error) { diff --git a/sdks/go/pkg/beam/util/syscallx/syscall.go b/sdks/go/pkg/beam/util/syscallx/syscall.go index 4d0babcb99db..dc1dc63932c0 100644 --- a/sdks/go/pkg/beam/util/syscallx/syscall.go +++ b/sdks/go/pkg/beam/util/syscallx/syscall.go @@ -20,7 +20,7 @@ package syscallx import ( - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // ErrUnsupported is the error returned for unsupported operations. 
diff --git a/sdks/go/pkg/beam/validate.go b/sdks/go/pkg/beam/validate.go index 1318b8a8f4a4..9987bfe5705e 100644 --- a/sdks/go/pkg/beam/validate.go +++ b/sdks/go/pkg/beam/validate.go @@ -19,8 +19,8 @@ import ( "fmt" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) // ValidateKVType panics if the type of the PCollection is not KV. diff --git a/sdks/go/pkg/beam/windowing.go b/sdks/go/pkg/beam/windowing.go index 63bdf48feb19..58254acacbe4 100644 --- a/sdks/go/pkg/beam/windowing.go +++ b/sdks/go/pkg/beam/windowing.go @@ -16,26 +16,70 @@ package beam import ( - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "fmt" + + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) +type WindowIntoOption interface { + windowIntoOption() +} + +type WindowTrigger struct { + Name window.Trigger +} + +func (t WindowTrigger) windowIntoOption() {} + +// Trigger applies `tr` trigger to the window. +func Trigger(tr window.Trigger) WindowTrigger { + return WindowTrigger{Name: tr} +} + +type AccumulationMode struct { + Mode window.AccumulationMode +} + +func (m AccumulationMode) windowIntoOption() {} + +// PanesAccumulate applies an Accumulating AccumulationMode to the window. +func PanesAccumulate() AccumulationMode { + return AccumulationMode{Mode: window.Accumulating} +} + +// PanesDiscard applies a Discarding AccumulationMode to the window. +func PanesDiscard() AccumulationMode { + return AccumulationMode{Mode: window.Discarding} +} + // WindowInto applies the windowing strategy to each element. -func WindowInto(s Scope, ws *window.Fn, col PCollection) PCollection { - return Must(TryWindowInto(s, ws, col)) +func WindowInto(s Scope, ws *window.Fn, col PCollection, opts ...WindowIntoOption) PCollection { + return Must(TryWindowInto(s, ws, col, opts...)) } // TryWindowInto attempts to insert a WindowInto transform. -func TryWindowInto(s Scope, ws *window.Fn, col PCollection) (PCollection, error) { +func TryWindowInto(s Scope, wfn *window.Fn, col PCollection, opts ...WindowIntoOption) (PCollection, error) { if !s.IsValid() { return PCollection{}, errors.New("invalid scope") } if !col.IsValid() { return PCollection{}, errors.New("invalid input pcollection") } + ws := window.WindowingStrategy{Fn: wfn, Trigger: window.Trigger{}} + for _, opt := range opts { + switch opt := opt.(type) { + case WindowTrigger: + ws.Trigger = opt.Name + case AccumulationMode: + ws.AccumulationMode = opt.Mode + default: + panic(fmt.Sprintf("Unknown WindowInto option type: %T: %v", opt, opt)) + } + } - edge := graph.NewWindowInto(s.real, s.scope, ws, col.n) + edge := graph.NewWindowInto(s.real, s.scope, &ws, col.n) ret := PCollection{edge.Output[0].To} return ret, nil } diff --git a/sdks/go/pkg/beam/x/beamx/run.go b/sdks/go/pkg/beam/x/beamx/run.go index 77d766b28ec5..9ff0d0346991 100644 --- a/sdks/go/pkg/beam/x/beamx/run.go +++ b/sdks/go/pkg/beam/x/beamx/run.go @@ -20,18 +20,19 @@ import ( "context" "flag" - "github.com/apache/beam/sdks/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam" // Import the reflection-optimized runtime. 
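The windowing.go changes above add variadic WindowIntoOption arguments to WindowInto. A minimal sketch of how a pipeline might use them, assuming `col` is an existing PCollection and `tr` is a window.Trigger configured elsewhere (imports shown only for context):

	import (
		"time"

		"github.com/apache/beam/sdks/v2/go/pkg/beam"
		"github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window"
	)

	// windowWithTrigger applies fixed 30-second windows to col, attaching the
	// caller-supplied trigger and discarding pane contents after each firing.
	func windowWithTrigger(s beam.Scope, col beam.PCollection, tr window.Trigger) beam.PCollection {
		return beam.WindowInto(s, window.NewFixedWindows(30*time.Second), col,
			beam.Trigger(tr),
			beam.PanesDiscard(),
		)
	}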
- _ "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/optimized" - _ "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/gcs" - _ "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/local" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec/optimized" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/gcs" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/local" // The imports here are for the side effect of runner registration. - _ "github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow" - _ "github.com/apache/beam/sdks/go/pkg/beam/runners/direct" - _ "github.com/apache/beam/sdks/go/pkg/beam/runners/dot" - _ "github.com/apache/beam/sdks/go/pkg/beam/runners/flink" - _ "github.com/apache/beam/sdks/go/pkg/beam/runners/spark" - _ "github.com/apache/beam/sdks/go/pkg/beam/runners/universal" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/dataflow" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/direct" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/dot" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/flink" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/samza" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/spark" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/universal" ) var runner = flag.String("runner", "direct", "Pipeline runner.") diff --git a/sdks/go/pkg/beam/x/debug/debug.shims.go b/sdks/go/pkg/beam/x/debug/debug.shims.go index 94b9034ff745..4abbf1ec7a01 100644 --- a/sdks/go/pkg/beam/x/debug/debug.shims.go +++ b/sdks/go/pkg/beam/x/debug/debug.shims.go @@ -25,20 +25,27 @@ import ( "reflect" // Library imports - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx/schema" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" ) func init() { runtime.RegisterFunction(discardFn) runtime.RegisterType(reflect.TypeOf((*context.Context)(nil)).Elem()) + schema.RegisterType(reflect.TypeOf((*context.Context)(nil)).Elem()) runtime.RegisterType(reflect.TypeOf((*headFn)(nil)).Elem()) + schema.RegisterType(reflect.TypeOf((*headFn)(nil)).Elem()) runtime.RegisterType(reflect.TypeOf((*headKVFn)(nil)).Elem()) + schema.RegisterType(reflect.TypeOf((*headKVFn)(nil)).Elem()) runtime.RegisterType(reflect.TypeOf((*printFn)(nil)).Elem()) + schema.RegisterType(reflect.TypeOf((*printFn)(nil)).Elem()) runtime.RegisterType(reflect.TypeOf((*printGBKFn)(nil)).Elem()) + schema.RegisterType(reflect.TypeOf((*printGBKFn)(nil)).Elem()) runtime.RegisterType(reflect.TypeOf((*printKVFn)(nil)).Elem()) + schema.RegisterType(reflect.TypeOf((*printKVFn)(nil)).Elem()) reflectx.RegisterStructWrapper(reflect.TypeOf((*headFn)(nil)).Elem(), wrapMakerHeadFn) reflectx.RegisterStructWrapper(reflect.TypeOf((*headKVFn)(nil)).Elem(), wrapMakerHeadKVFn) reflectx.RegisterStructWrapper(reflect.TypeOf((*printFn)(nil)).Elem(), wrapMakerPrintFn) diff --git a/sdks/go/pkg/beam/x/debug/doc.go b/sdks/go/pkg/beam/x/debug/doc.go index 20345bfb8898..4ce31db9c7d5 100644 --- a/sdks/go/pkg/beam/x/debug/doc.go +++ b/sdks/go/pkg/beam/x/debug/doc.go @@ -17,6 +17,6 @@ // in debugging pipeline issues. 
package debug -//go:generate go install github.com/apache/beam/sdks/go/cmd/starcgen +//go:generate go install github.com/apache/beam/sdks/v2/go/cmd/starcgen //go:generate starcgen --package=debug --identifiers=headFn,headKVFn,discardFn,printFn,printKVFn,printGBKFn //go:generate go fmt diff --git a/sdks/go/pkg/beam/x/debug/head.go b/sdks/go/pkg/beam/x/debug/head.go index e0ca09432288..a3451a009b18 100644 --- a/sdks/go/pkg/beam/x/debug/head.go +++ b/sdks/go/pkg/beam/x/debug/head.go @@ -16,8 +16,8 @@ package debug import ( - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" ) // Head returns the first "n" elements it sees, it doesn't enforce any logic diff --git a/sdks/go/pkg/beam/x/debug/print.go b/sdks/go/pkg/beam/x/debug/print.go index 514de84e66d2..b37c35d1c711 100644 --- a/sdks/go/pkg/beam/x/debug/print.go +++ b/sdks/go/pkg/beam/x/debug/print.go @@ -19,9 +19,9 @@ import ( "context" "fmt" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/core/typex" - "github.com/apache/beam/sdks/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/typex" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" ) // Print prints out all data. Use with care. diff --git a/sdks/go/pkg/beam/x/hooks/perf/perf.go b/sdks/go/pkg/beam/x/hooks/perf/perf.go index d982b605f7dc..e45949df4534 100644 --- a/sdks/go/pkg/beam/x/hooks/perf/perf.go +++ b/sdks/go/pkg/beam/x/hooks/perf/perf.go @@ -24,8 +24,8 @@ import ( "runtime/pprof" "runtime/trace" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/hooks" - fnpb "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/hooks" + fnpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/fnexecution_v1" ) // CaptureHook is used by the harness to have the runner diff --git a/sdks/go/pkg/beam/xlang.go b/sdks/go/pkg/beam/xlang.go index a95b5f8306ff..b9c6ec7f7c82 100644 --- a/sdks/go/pkg/beam/xlang.go +++ b/sdks/go/pkg/beam/xlang.go @@ -16,37 +16,125 @@ package beam import ( - "github.com/apache/beam/sdks/go/pkg/beam/core/graph" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/xlangx" - "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/xlangx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/internal/errors" ) -// xlang exposes an API to execute cross-language transforms within the Go SDK. -// It is experimental and likely to change. It exposes convenient wrappers -// around the core functions to pass in any combination of named/unnamed -// inputs/outputs. - // UnnamedInput is a helper function for passing single unnamed inputs to -// `beam.CrossLanguage`. +// beam.CrossLanguage. // // Example: -// beam.CrossLanguage(s, urn, payload, addr, UnnamedInput(input), outputs); +// beam.CrossLanguage(s, urn, payload, addr, UnnamedInput(input), outputs) func UnnamedInput(col PCollection) map[string]PCollection { return map[string]PCollection{graph.UnnamedInputTag: col} } // UnnamedOutput is a helper function for passing single unnamed output types to -// `beam.CrossLanguage`. +// beam.CrossLanguage. 
The associated output can be accessed with beam.UnnamedOutputTag. // // Example: -// beam.CrossLanguage(s, urn, payload, addr, inputs, UnnamedOutput(output)); +// resultMap := beam.CrossLanguage(s, urn, payload, addr, inputs, UnnamedOutput(output)); +// result := resultMap[beam.UnnamedOutputTag()] func UnnamedOutput(t FullType) map[string]FullType { return map[string]FullType{graph.UnnamedOutputTag: t} } -// CrossLanguage executes a cross-language transform that uses named inputs and -// returns named outputs. +// UnnamedOutputTag provides the output tag used for an output passed to beam.UnnamedOutput. +// Needed to retrieve the unnamed output PCollection from the result of beam.CrossLanguage. +func UnnamedOutputTag() string { + return graph.UnnamedOutputTag +} + +// CrossLanguagePayload encodes a native Go struct into a payload for cross-language transforms. +// payloads are []byte encoded ExternalConfigurationPayload protobufs. In order to fill the +// contents of the protobuf, the provided struct will be used to converted to a row encoded +// representation with an accompanying schema, so the input struct must be compatible with schemas. +// +// See https://beam.apache.org/documentation/programming-guide/#schemas for basic information on +// schemas, and pkg/beam/core/runtime/graphx/schema for details on schemas in the Go SDK. +// +// Example: +// type stringPayload struct { +// Data string +// } +// encodedPl := beam.CrossLanguagePayload(stringPayload{Data: "foo"}) +func CrossLanguagePayload(pl interface{}) []byte { + bytes, err := xlangx.EncodeStructPayload(pl) + if err != nil { + panic(err) + } + return bytes +} + +// CrossLanguage is a low-level transform for executing cross-language transforms written in other +// SDKs. Because this is low-level, it is recommended to use one of the higher-level IO-specific +// wrappers where available. These can be found in the pkg/beam/io/xlang subdirectory. +// CrossLanguage is useful for executing cross-language transforms which do not have any existing +// IO wrappers. +// +// Usage requires an address for an expansion service accessible during pipeline construction, a +// URN identifying the desired transform, an optional payload with configuration information, and +// input and output names. It outputs a map of named output PCollections. +// +// For more information on expansion services and other aspects of cross-language transforms in +// general, refer to the Beam programming guide: https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines +// +// Payload +// +// Payloads are configuration data that some cross-language transforms require for expansion. +// Consult the documentation of the transform in the source SDK to find out what payload data it +// requires. If no payload is required, pass in nil. +// +// CrossLanguage accepts payloads as a []byte containing an encoded ExternalConfigurationPayload +// protobuf. The helper function beam.CrossLanguagePayload is the recommended way to easily encode +// a standard Go struct for use as a payload. +// +// Inputs and Outputs +// +// Like most transforms, any input PCollections must be provided. Unlike most transforms, output +// types must be provided because Go cannot infer output types from external transforms. +// +// Inputs and outputs to a cross-language transform may be either named or unnamed. Named +// inputs/outputs are used when there are more than one input/output, and are provided as maps with +// names as keys. 
Unnamed inputs/outputs are used when there is only one, and a map can be quickly +// constructed with the UnnamedInput and UnnamedOutput functions. +// +// An example of defining named inputs and outputs: +// +// namedInputs := map[string]beam.PCollection{"pcol1": pcol1, "pcol2": pcol2} +// namedOutputTypes := map[string]typex.FullType{ +// "main": typex.New(reflectx.String), +// "side": typex.New(reflectx.Int64), +// } +// +// CrossLanguage outputs a map of PCollections with associated names. These names will match those +// from provided named outputs. If the beam.UnnamedOutput function was used, the PCollection can be +// retrieved with beam.UnnamedOutputTag(). +// +// An example of retrieving named outputs from a call to CrossLanguage: +// +// outputs := beam.CrossLanguage(...) +// mainPcol := outputs["main"] +// sidePcol := outputs["side"] +// +// Example +// +// This example shows using CrossLanguage to execute the Prefix cross-language transform using an +// expansion service running on localhost:8099. Prefix requires a payload containing a prefix to +// prepend to every input string. +// +// type prefixPayload struct { +// Data string +// } +// encodedPl := beam.CrossLanguagePayload(prefixPayload{Data: "foo"}) +// urn := "beam:transforms:xlang:test:prefix" +// expansionAddr := "localhost:8099" +// outputType := beam.UnnamedOutput(typex.New(reflectx.String)) +// input := beam.UnnamedInput(inputPcol) +// outs := beam.CrossLanguage(s, urn, encodedPl, expansionAddr, input, outputType) +// outPcol := outs[beam.UnnamedOutputTag()] func CrossLanguage( s Scope, urn string, @@ -75,7 +163,9 @@ func CrossLanguage( return mapNodeToPCollection(namedOutputs) } -// TryCrossLanguage coordinates the core functions required to execute the cross-language transform +// TryCrossLanguage coordinates the core functions required to execute the cross-language transform. +// This is mainly intended for internal use. For the general-use entry point, see +// beam.CrossLanguage. func TryCrossLanguage(s Scope, ext *graph.ExternalTransform, ins []*graph.Inbound, outs []*graph.Outbound) (map[string]*graph.Node, error) { // Adding an edge in the graph corresponding to the ExternalTransform edge, isBoundedUpdater := graph.NewCrossLanguage(s.real, s.scope, ext, ins, outs) diff --git a/sdks/go/test/build.gradle b/sdks/go/test/build.gradle index db29d86d2d0e..90fd52f69031 100644 --- a/sdks/go/test/build.gradle +++ b/sdks/go/test/build.gradle @@ -22,10 +22,10 @@ applyPythonNature() description = "Apache Beam :: SDKs :: Go :: Test" -// Figure out why the golang plugin does not add a build dependency between projects. -// Without the line below, we get spurious errors about not being able to resolve -// "./github.com/apache/beam/sdks/go" -resolveBuildDependencies.dependsOn ":sdks:go:goBuild" +// Disable gogradle's dependency resolution so it uses Go modules instead. +installDependencies.enabled = false +resolveBuildDependencies.enabled = false +resolveTestDependencies.enabled = false clean.dependsOn cleanVendor @@ -39,28 +39,29 @@ dependencies { } golang { - packagePath = 'github.com/apache/beam/sdks/go/test' + packagePath = 'github.com/apache/beam/sdks/v2/go/test' goBuild { // Build the linux-amd64 worker. The native version is built in the parent to // have a fixed name, which is not possible with multiple target platforms. The // script would otherwise have to figure out which arch/platform binary to invoke.
targetPlatform = ['linux-amd64'] - go 'build -o ./build/bin/linux-amd64/worker github.com/apache/beam/sdks/go/test/integration' + go 'build -o ./build/bin/linux-amd64/worker github.com/apache/beam/sdks/v2/go/test/integration/driver' } } -// ValidatesRunner tests for Flink using an updated framework that supports -// cross-language tests. -// TODO(BEAM-11415): Merge this into existing ValidatesRunner gradle rules. -task flinkXlangValidatesRunner { - dependsOn ":runners:flink:1.10:job-server:shadowJar" +// ValidatesRunner tests for Dataflow. Runs tests in the integration directory +// with Dataflow to validate that the runner behaves as expected. +task dataflowValidatesRunner() { + dependsOn ":sdks:go:test:goBuild" + dependsOn ":runners:google-cloud-dataflow-java:worker:shadowJar" dependsOn ":sdks:java:testing:expansion-service:buildTestExpansionServiceJar" + doLast { def options = [ - "--runner flink", - "--flink_job_server_jar ${project(":runners:flink:1.10:job-server").shadowJar.archivePath}", - "--expansion_service_jar ${project(":sdks:java:testing:expansion-service").buildTestExpansionServiceJar.archivePath}", + "--runner dataflow", + "--dataflow_worker_jar ${project(":runners:google-cloud-dataflow-java:worker").shadowJar.archivePath}", + "--test_expansion_jar ${project(":sdks:java:testing:expansion-service").buildTestExpansionServiceJar.archivePath}", ] exec { executable "sh" @@ -69,47 +70,90 @@ task flinkXlangValidatesRunner { } } +// ValidatesRunner tests for Flink. Runs tests in the integration directory +// with Flink to validate that the runner behaves as expected. task flinkValidatesRunner { dependsOn ":sdks:go:test:goBuild" - dependsOn ":runners:flink:1.10:job-server:shadowJar" + dependsOn ":sdks:go:container:docker" + dependsOn ":sdks:java:container:java8:docker" + dependsOn ":sdks:java:container:java11:docker" + dependsOn ":runners:flink:${project.ext.latestFlinkVersion}:job-server:shadowJar" + dependsOn ":sdks:java:testing:expansion-service:buildTestExpansionServiceJar" doLast { def options = [ "--runner flink", - "--parallel 1", // prevent memory overuse - "--flink_job_server_jar ${project(":runners:flink:1.10:job-server").shadowJar.archivePath}", + "--flink_job_server_jar ${project(":runners:flink:${project.ext.latestFlinkVersion}:job-server").shadowJar.archivePath}", + "--test_expansion_jar ${project(":sdks:java:testing:expansion-service").buildTestExpansionServiceJar.archivePath}", + ] + exec { + executable "sh" + args "-c", "./run_validatesrunner_tests.sh ${options.join(' ')}" + } + } +} + +// ValidatesRunner tests for Samza. Runs tests in the integration directory +// with Samza to validate that the runner behaves as expected. +task samzaValidatesRunner { + dependsOn ":sdks:go:test:goBuild" + dependsOn ":sdks:go:container:docker" + dependsOn ":sdks:java:container:java8:docker" + dependsOn ":sdks:java:container:java11:docker" + dependsOn ":runners:samza:job-server:shadowJar" + dependsOn ":sdks:java:testing:expansion-service:buildTestExpansionServiceJar" + doLast { + def options = [ + "--runner samza", + "--samza_job_server_jar ${project(":runners:samza:job-server").shadowJar.archivePath}", + "--test_expansion_jar ${project(":sdks:java:testing:expansion-service").buildTestExpansionServiceJar.archivePath}", ] exec { executable "sh" - args "-c", "./run_integration_tests.sh ${options.join(' ')}" + args "-c", "./run_validatesrunner_tests.sh ${options.join(' ')}" } } } +// ValidatesRunner tests for Spark. 
Runs tests in the integration directory +// with Spark to validate that the runner behaves as expected. task sparkValidatesRunner { dependsOn ":sdks:go:test:goBuild" - dependsOn ":runners:spark:job-server:shadowJar" + dependsOn ":sdks:java:container:java8:docker" + dependsOn ":sdks:java:container:java11:docker" + dependsOn ":runners:spark:2:job-server:shadowJar" + dependsOn ":sdks:java:testing:expansion-service:buildTestExpansionServiceJar" doLast { def options = [ "--runner spark", - "--parallel 1", // prevent memory overuse - "--spark_job_server_jar ${project(":runners:spark:job-server").shadowJar.archivePath}", + "--spark_job_server_jar ${project(":runners:spark:2:job-server").shadowJar.archivePath}", + "--test_expansion_jar ${project(":sdks:java:testing:expansion-service").buildTestExpansionServiceJar.archivePath}", ] exec { executable "sh" - args "-c", "./run_integration_tests.sh ${options.join(' ')}" + args "-c", "./run_validatesrunner_tests.sh ${options.join(' ')}" } } } - +// ValidatesRunner tests for the Python Portable runner (aka. ULR). Runs tests +// in the integration directory with the ULR to validate that the runner behaves +// as expected. +// +// The ULR can exhibit strange behavior when containers are built with outdated +// vendored directories. For best results use the clean task, like so: +// ./gradlew clean :sdks:go:test:ulrValidatesRunner task ulrValidatesRunner { dependsOn ":sdks:go:test:goBuild" dependsOn ":sdks:go:container:docker" + dependsOn ":sdks:java:container:java8:docker" + dependsOn ":sdks:java:container:java11:docker" dependsOn "setupVirtualenv" dependsOn ":sdks:python:buildPython" + dependsOn ":sdks:java:testing:expansion-service:buildTestExpansionServiceJar" doLast { def options = [ - "--runner universal", + "--runner portable", + "--test_expansion_jar ${project(":sdks:java:testing:expansion-service").buildTestExpansionServiceJar.archivePath}", ] exec { executable "sh" @@ -118,8 +162,7 @@ task ulrValidatesRunner { } exec { executable "sh" - args "-c", ". ${envdir}/bin/activate && ./run_integration_tests.sh ${options.join(' ')}" + args "-c", ". 
${envdir}/bin/activate && ./run_validatesrunner_tests.sh ${options.join(' ')}" } } } - diff --git a/sdks/go/test/integration/driver.go b/sdks/go/test/integration/driver/driver.go similarity index 83% rename from sdks/go/test/integration/driver.go rename to sdks/go/test/integration/driver/driver.go index 3ac5645d5259..b8e11abe397b 100644 --- a/sdks/go/test/integration/driver.go +++ b/sdks/go/test/integration/driver/driver.go @@ -23,13 +23,13 @@ import ( "sync" "sync/atomic" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/memfs" - "github.com/apache/beam/sdks/go/pkg/beam/log" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" - "github.com/apache/beam/sdks/go/test/integration/primitives" - "github.com/apache/beam/sdks/go/test/integration/synthetic" - "github.com/apache/beam/sdks/go/test/integration/wordcount" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/memfs" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/test/integration/primitives" + "github.com/apache/beam/sdks/v2/go/test/integration/synthetic" + "github.com/apache/beam/sdks/v2/go/test/integration/wordcount" ) var ( @@ -69,6 +69,13 @@ func main() { // {"flatten:dup", primitives.FlattenDup()}, {"reshuffle:reshuffle", primitives.Reshuffle()}, {"reshuffle:reshufflekv", primitives.ReshuffleKV()}, + {"window:sums", func() *beam.Pipeline { + p, s := beam.NewPipelineWithRoot() + primitives.WindowSums_GBK(s) + primitives.WindowSums_Lifted(s) + return p + }(), + }, } re := regexp.MustCompile(*filter) diff --git a/sdks/go/test/integration/flags.go b/sdks/go/test/integration/flags.go new file mode 100644 index 000000000000..6eee0e06a99a --- /dev/null +++ b/sdks/go/test/integration/flags.go @@ -0,0 +1,42 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package integration + +import "flag" + +// The following flags are flags used in one or more integration tests, and that +// may be used by scripts that execute "go test ./sdks/go/test/integration/...". +// Because any flags used with those commands are used for each package, every +// integration test package must import these flags, even if they are not used. +var ( + // TestExpansionAddr is the endpoint for the expansion service for test-only + // cross-language transforms. + TestExpansionAddr = flag.String("test_expansion_addr", "", "Address of Expansion Service for test cross-language transforms.") + + // IoExpansionAddr is the endpoint for the expansion service for + // cross-language IO transforms. 
+ IoExpansionAddr = flag.String("io_expansion_addr", "", "Address of Expansion Service for cross-language IOs.") + + // BootstrapServers is the address of the bootstrap servers for a Kafka + // cluster, used for Kafka IO tests. + BootstrapServers = flag.String("bootstrap_servers", "", + "URL of the bootstrap servers for the Kafka cluster. Should be accessible by the runner.") + + // KafkaJar is a filepath to a jar for starting a Kafka cluster, used for + // Kafka IO tests. + KafkaJar = flag.String("kafka_jar", "", + "The filepath to a jar for starting up a Kafka cluster. Only used if bootstrap_servers is unspecified.") +) diff --git a/sdks/go/test/integration/integration.go b/sdks/go/test/integration/integration.go new file mode 100644 index 000000000000..beeacdf8bc1c --- /dev/null +++ b/sdks/go/test/integration/integration.go @@ -0,0 +1,171 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +// Package integration provides functionality that needs to be shared between all +// integration tests. +// +// Integration tests are implemented through Go's test framework, as test +// functions that create and execute pipelines using the ptest package. Tests +// should be placed in smaller sub-packages for organizational purposes and +// parallelism (tests are only run in parallel across different packages). +// Integration tests should always begin with a call to CheckFilters to ensure +// test filters can be applied, and each package containing integration tests +// should call ptest.Main in a TestMain function if it uses ptest. +// +// Running integration tests can be done with a go test call with any flags that +// are required by the test pipelines, such as --runner or --endpoint. +// Example: +// go test -v ./sdks/go/test/integration/... --runner=portable --endpoint=localhost:8099 +// +// Alternatively, tests can be executed by running the +// run_validatesrunner_tests.sh script, which also performs much of the +// environment setup, or by calling gradle commands in :sdks:go:test. +package integration + +import ( + "fmt" + "regexp" + "testing" + + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" +) + +// Filters for temporarily skipping integration tests. All filters are regex +// matchers that must match the full name of a test at the point where +// CheckFilters is called. Multiple tests can be skipped by using regex +// wildcards. (ex. "TestXLang_.*" filters all tests starting with TestXLang_) +// +// It is strongly recommended to include TODOs, Jira issues, or just comments +// describing why tests are being skipped. + +// sickbay filters tests that fail due to Go SDK errors. These tests will not +// execute on any runners.
+var sickbay = []string{} + +// Runner-specific test filters, for features that are not yet supported on +// specific runners. + +var directFilters = []string{ + // The direct runner does not yet support cross-language. + "TestXLang.*", + // Triggers are not yet supported + "TestTrigger.*", + // The direct runner does not support the TestStream primitive + "TestTestStream.*", +} + +var portableFilters = []string{ + // The portable runner does not support the TestStream primitive + "TestTestStream.*", + // The trigger tests use TestStream + "TestTrigger.*", +} + +var flinkFilters = []string{ + // TODO(BEAM-11500): Flink tests timing out on reads. + "TestXLang_Combine.*", + // TODO(BEAM-12753): Flink test stream fails for non-string/byte slice inputs + "TestTestStream.*Sequence.*", + // Triggers are not yet supported + "TestTrigger.*", +} + +var samzaFilters = []string{ + // TODO(BEAM-12608): Samza tests invalid encoding. + "TestReshuffle", + "TestReshuffleKV", + // The Samza runner does not support the TestStream primitive + "TestTestStream.*", + // The trigger tests use TestStream + "TestTrigger.*", +} + +var sparkFilters = []string{ + // TODO(BEAM-11498): XLang tests broken with Spark runner. + "TestXLang.*", + "TestParDoSideInput", + "TestParDoKVSideInput", + // The Spark runner does not support the TestStream primitive + "TestTestStream.*", + // The trigger tests use TestStream + "TestTrigger.*", +} + +var dataflowFilters = []string{ + // TODO(BEAM-11576): TestFlattenDup failing on this runner. + "TestFlattenDup", + // The Dataflow runner does not support the TestStream primitive + "TestTestStream.*", + // The trigger tests use TestStream + "TestTrigger.*", +} + +// CheckFilters checks if an integration test is filtered to be skipped, either +// because the intended runner does not support it, or the test is sickbayed. +// This function should be called at the beginning of any integration test. If +// t.Run is used, CheckFilters should be called within the t.Run callback, so +// that sub-tests can be skipped individually. +func CheckFilters(t *testing.T) { + // Check for sickbaying first. + n := t.Name() + for _, f := range sickbay { + // Add start and end of string regexp matchers so only a full match is + // counted. + f = fmt.Sprintf("^%v$", f) + match, err := regexp.MatchString(f, n) + if err != nil { + t.Errorf("Matching of regex '%v' with test '%v' failed: %v", f, n, err) + } + if match { + t.Skipf("Test %v is currently sickbayed on all runners", n) + } + } + + // Test for runner-specific skipping second. + var filters []string + runner := *ptest.Runner + if runner == "" { + runner = ptest.DefaultRunner() + } + switch runner { + case "direct", "DirectRunner": + filters = directFilters + case "portable", "PortableRunner": + filters = portableFilters + case "flink", "FlinkRunner": + filters = flinkFilters + case "samza", "SamzaRunner": + filters = samzaFilters + case "spark", "SparkRunner": + filters = sparkFilters + case "dataflow", "DataflowRunner": + filters = dataflowFilters + default: + return + } + + for _, f := range filters { + // Add start and end of string regexp matchers so only a full match is + // counted.
+ f = fmt.Sprintf("^%v$", f) + match, err := regexp.MatchString(f, n) + if err != nil { + t.Errorf("Matching of regex '%v' with test '%v' failed: %v", f, n, err) + } + if match { + t.Skipf("Test %v is currently filtered for runner %v", n, runner) + } + } +} diff --git a/sdks/go/test/integration/io/xlang/kafka/jar.go b/sdks/go/test/integration/io/xlang/kafka/jar.go new file mode 100644 index 000000000000..36bffa3e0bf4 --- /dev/null +++ b/sdks/go/test/integration/io/xlang/kafka/jar.go @@ -0,0 +1,65 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package kafka + +import ( + "net" + "os" + "os/exec" + "strconv" + "time" +) + +// kafkaCluster contains anything needed to use and clean up the Kafka cluster +// once it's been started. +type kafkaCluster struct { + proc *os.Process // The process information for the running jar. + bootstrapAddr string // The bootstrap address to connect to Kafka. +} + +// runLocalKafka takes a Kafka jar filepath and runs a local Kafka cluster, +// returning the bootstrap server for that cluster. +func runLocalKafka(jar string) (*kafkaCluster, error) { + port, err := getOpenPort() + if err != nil { + return nil, err + } + kafkaPort := strconv.Itoa(port) + port, err = getOpenPort() + if err != nil { + return nil, err + } + zookeeperPort := strconv.Itoa(port) + + cmd := exec.Command("java", "-jar", jar, kafkaPort, zookeeperPort) + err = cmd.Start() + if err != nil { + return nil, err + } + time.Sleep(3 * time.Second) // Wait a bit for the cluster to start. + + return &kafkaCluster{proc: cmd.Process, bootstrapAddr: "localhost:" + kafkaPort}, nil +} + +// getOpenPort gets an open TCP port and returns it, or an error on failure. +func getOpenPort() (int, error) { + listener, err := net.Listen("tcp", ":0") + if err != nil { + return 0, err + } + defer listener.Close() + return listener.Addr().(*net.TCPAddr).Port, nil +} diff --git a/sdks/go/test/integration/io/xlang/kafka/kafka.go b/sdks/go/test/integration/io/xlang/kafka/kafka.go new file mode 100644 index 000000000000..f7b51adb1ef9 --- /dev/null +++ b/sdks/go/test/integration/io/xlang/kafka/kafka.go @@ -0,0 +1,77 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. 
You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +// Package kafka contains integration tests for cross-language Kafka IO +// transforms. +package kafka + +import ( + "bytes" + "fmt" + + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/xlang/kafkaio" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" + "github.com/google/uuid" +) + +func appendUuid(prefix string) string { + return fmt.Sprintf("%v_%v", prefix, uuid.New()) +} + +// Constants for the BasicPipeline. +const ( + numRecords = 1000 + basicTopic = "xlang_kafkaio_basic_test" +) + +// BasicPipeline creates a pipeline that writes and then reads a range of ints +// to and from a Kafka topic and asserts that all elements are present. This +// function requires an expansion service address and a Kafka bootstrap server +// address. +func BasicPipeline(expansionAddr, bootstrapAddr string) *beam.Pipeline { + topic := appendUuid(basicTopic) + inputs := make([]int, numRecords) + for i := 0; i < numRecords; i++ { + inputs[i] = i + } + p, s := beam.NewPipelineWithRoot() + ins := beam.CreateList(s, inputs) + + // Write to Kafka + encoded := beam.ParDo(s, func(i int) ([]byte, error) { + var buf bytes.Buffer + err := coder.EncodeVarInt(int64(i), &buf) + return buf.Bytes(), err + }, ins) + keyed := beam.ParDo(s, func(b []byte) ([]byte, []byte) { + return []byte(""), b + }, encoded) + kafkaio.Write(s, expansionAddr, bootstrapAddr, topic, keyed) + + // Read from Kafka + reads := kafkaio.Read(s, expansionAddr, bootstrapAddr, []string{topic}) + vals := beam.DropKey(s, reads) + decoded := beam.ParDo(s, func(b []byte) (int, error) { + buf := bytes.NewBuffer(b) + i, err := coder.DecodeVarInt(buf) + return int(i), err + }, vals) + + passert.Equals(s, decoded, ins) + + return p +} diff --git a/sdks/go/test/integration/io/xlang/kafka/kafka_test.go b/sdks/go/test/integration/io/xlang/kafka/kafka_test.go new file mode 100644 index 000000000000..db26eeb5e5c5 --- /dev/null +++ b/sdks/go/test/integration/io/xlang/kafka/kafka_test.go @@ -0,0 +1,73 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
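A side note on BasicPipeline above: each element is encoded as a Beam varint and keyed with an empty byte slice before the Kafka write, and the read path reverses that. A minimal sketch of that round trip outside a pipeline, using only the coder package already imported above (the example itself is illustrative and not part of this change):

package main

import (
	"bytes"
	"fmt"

	"github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coder"
)

func main() {
	var buf bytes.Buffer
	// Encode an int the same way BasicPipeline encodes elements before writing to Kafka.
	if err := coder.EncodeVarInt(42, &buf); err != nil {
		panic(err)
	}
	// Decode it back, as the ParDo after kafkaio.Read does.
	got, err := coder.DecodeVarInt(&buf)
	if err != nil {
		panic(err)
	}
	fmt.Println(got) // prints 42
}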
+ +package kafka + +import ( + "log" + "os" + "testing" + + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/dataflow" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/flink" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/samza" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/spark" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/test/integration" +) + +// bootstrapAddr should be set by TestMain once a Kafka cluster has been +// started, and is used by each test. +var bootstrapAddr string + +func checkFlags(t *testing.T) { + if *integration.IoExpansionAddr == "" { + t.Skip("No IO expansion address provided.") + } + if bootstrapAddr == "" { + t.Skip("No bootstrap server address provided.") + } +} + +// TestBasicPipeline tests a basic Kafka pipeline that writes to and reads from +// Kafka with no optional parameters or extra features. +func TestBasicPipeline(t *testing.T) { + integration.CheckFilters(t) + checkFlags(t) + p := BasicPipeline(*integration.IoExpansionAddr, bootstrapAddr) + ptest.RunAndValidate(t, p) +} + +// TestMain starts up a Kafka cluster from integration.KafkaJar before running +// tests through ptest.Main. +func TestMain(m *testing.M) { + // Defer os.Exit so it happens after other defers. + var retCode int + defer func() { os.Exit(retCode) }() + + // Start local Kafka cluster and defer its shutdown. + if *integration.BootstrapServers != "" { + bootstrapAddr = *integration.BootstrapServers + } else if *integration.KafkaJar != "" { + cluster, err := runLocalKafka(*integration.KafkaJar) + if err != nil { + log.Fatalf("Kafka cluster failed to start: %v", err) + } + defer func() { cluster.proc.Kill() }() + bootstrapAddr = cluster.bootstrapAddr + } + + retCode = ptest.MainRet(m) +} diff --git a/sdks/go/test/integration/primitives/cogbk.go b/sdks/go/test/integration/primitives/cogbk.go index 3203bd93e971..b3b65cfec90b 100644 --- a/sdks/go/test/integration/primitives/cogbk.go +++ b/sdks/go/test/integration/primitives/cogbk.go @@ -18,8 +18,8 @@ package primitives import ( "fmt" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" ) func genA(_ []byte, emit func(string, int)) { diff --git a/sdks/go/test/integration/primitives/cogbk_test.go b/sdks/go/test/integration/primitives/cogbk_test.go index 493ce8302b59..e464e7423f1b 100644 --- a/sdks/go/test/integration/primitives/cogbk_test.go +++ b/sdks/go/test/integration/primitives/cogbk_test.go @@ -13,28 +13,28 @@ // See the License for the specific language governing permissions and // limitations under the License. +// Package primitives contains tests on basic well-known Beam transforms, such +// as ParDo, Flatten, etc. 
package primitives import ( "testing" - "github.com/apache/beam/sdks/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/test/integration" ) func TestCoGBK(t *testing.T) { - if err := ptest.Run(CoGBK()); err != nil { - t.Error(err) - } + integration.CheckFilters(t) + ptest.RunAndValidate(t, CoGBK()) } func TestReshuffle(t *testing.T) { - if err := ptest.Run(Reshuffle()); err != nil { - t.Error(err) - } + integration.CheckFilters(t) + ptest.RunAndValidate(t, Reshuffle()) } func TestReshuffleKV(t *testing.T) { - if err := ptest.Run(ReshuffleKV()); err != nil { - t.Error(err) - } + integration.CheckFilters(t) + ptest.RunAndValidate(t, ReshuffleKV()) } diff --git a/sdks/go/test/integration/primitives/flatten.go b/sdks/go/test/integration/primitives/flatten.go index 34d40851c4de..8bde73e038e7 100644 --- a/sdks/go/test/integration/primitives/flatten.go +++ b/sdks/go/test/integration/primitives/flatten.go @@ -16,8 +16,8 @@ package primitives import ( - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" ) // Flatten tests flatten. diff --git a/sdks/go/test/integration/primitives/flatten_test.go b/sdks/go/test/integration/primitives/flatten_test.go index 04710e48d46c..217816e8d772 100644 --- a/sdks/go/test/integration/primitives/flatten_test.go +++ b/sdks/go/test/integration/primitives/flatten_test.go @@ -18,17 +18,16 @@ package primitives import ( "testing" - "github.com/apache/beam/sdks/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/test/integration" ) func TestFlatten(t *testing.T) { - if err := ptest.Run(Flatten()); err != nil { - t.Error(err) - } + integration.CheckFilters(t) + ptest.RunAndValidate(t, Flatten()) } func TestFlattenDup(t *testing.T) { - if err := ptest.Run(FlattenDup()); err != nil { - t.Error(err) - } + integration.CheckFilters(t) + ptest.RunAndValidate(t, FlattenDup()) } diff --git a/sdks/go/test/integration/primitives/pardo.go b/sdks/go/test/integration/primitives/pardo.go index 62654e8f3487..c46f52388243 100644 --- a/sdks/go/test/integration/primitives/pardo.go +++ b/sdks/go/test/integration/primitives/pardo.go @@ -16,8 +16,8 @@ package primitives import ( - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" ) func emit3Fn(elm int, emit, emit2, emit3 func(int)) { diff --git a/sdks/go/test/integration/primitives/pardo_test.go b/sdks/go/test/integration/primitives/pardo_test.go index f837689868dc..0d78403f8e9b 100644 --- a/sdks/go/test/integration/primitives/pardo_test.go +++ b/sdks/go/test/integration/primitives/pardo_test.go @@ -18,23 +18,21 @@ package primitives import ( "testing" - "github.com/apache/beam/sdks/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/test/integration" ) func TestParDoMultiOutput(t *testing.T) { - if err := ptest.Run(ParDoMultiOutput()); err != nil { - t.Error(err) - } + integration.CheckFilters(t) + ptest.RunAndValidate(t, ParDoMultiOutput()) } func TestParDoSideInput(t *testing.T) { - if err := ptest.Run(ParDoSideInput()); err != nil { - t.Error(err) - } + integration.CheckFilters(t) + ptest.RunAndValidate(t, 
ParDoSideInput()) } func TestParDoKVSideInput(t *testing.T) { - if err := ptest.Run(ParDoKVSideInput()); err != nil { - t.Error(err) - } + integration.CheckFilters(t) + ptest.RunAndValidate(t, ParDoKVSideInput()) } diff --git a/sdks/go/test/integration/primitives/primitives_test.go b/sdks/go/test/integration/primitives/primitives_test.go index 34c8d133c7d9..ef8a265b8bfa 100644 --- a/sdks/go/test/integration/primitives/primitives_test.go +++ b/sdks/go/test/integration/primitives/primitives_test.go @@ -18,7 +18,11 @@ package primitives import ( "testing" - "github.com/apache/beam/sdks/go/pkg/beam/testing/ptest" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/dataflow" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/flink" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/samza" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/spark" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" ) // TestMain invokes ptest.Main to allow running these tests on diff --git a/sdks/go/test/integration/primitives/teststream.go b/sdks/go/test/integration/primitives/teststream.go new file mode 100644 index 000000000000..7df6a485804f --- /dev/null +++ b/sdks/go/test/integration/primitives/teststream.go @@ -0,0 +1,156 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package primitives + +import ( + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/teststream" +) + +// TestStreamSequence tests the TestStream primitive by inserting string elements +// then advancing the watermark past the point where they were inserted. +func TestStreamStrings() *beam.Pipeline { + p, s := beam.NewPipelineWithRoot() + con := teststream.NewConfig() + con.AddElements(100, "a", "b", "c") + con.AdvanceWatermarkToInfinity() + + col := teststream.Create(s, con) + + passert.Count(s, col, "teststream strings", 3) + + return p +} + +// TestStreamByteSliceSequence tests the TestStream primitive by inserting byte slice elements +// then advancing the watermark to infinity and comparing the output.. +func TestStreamByteSliceSequence() *beam.Pipeline { + p, s := beam.NewPipelineWithRoot() + con := teststream.NewConfig() + b := []byte{91, 92, 93} + con.AddElements(1, b) + con.AdvanceWatermarkToInfinity() + col := teststream.Create(s, con) + passert.Count(s, col, "teststream byte", 1) + passert.Equals(s, col, append([]byte{3}, b...)) + return p +} + +// TestStreamInt64Sequence tests the TestStream primitive by inserting int64 elements +// then advancing the watermark past the point where they were inserted. 
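// (AddElementList, used below, is the slice form of AddElements: every element in the given
// slice is inserted into the stream at the same timestamp.)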
+func TestStreamInt64Sequence() *beam.Pipeline { + p, s := beam.NewPipelineWithRoot() + con := teststream.NewConfig() + ele := []int64{91, 92, 93} + con.AddElementList(100, ele) + con.AdvanceWatermarkToInfinity() + + col := teststream.Create(s, con) + + passert.Count(s, col, "teststream int64", 3) + passert.EqualsList(s, col, ele) + return p +} + +// TestStreamTwoInt64Sequences tests the TestStream primitive by inserting two sets of +// int64 elements that arrive on-time into the TestStream +func TestStreamTwoInt64Sequences() *beam.Pipeline { + p, s := beam.NewPipelineWithRoot() + con := teststream.NewConfig() + eo := []int64{91, 92, 93} + et := []int64{96, 97, 98} + con.AddElementList(100, eo) + con.AdvanceWatermark(110) + con.AddElementList(120, et) + con.AdvanceWatermark(130) + + col := teststream.Create(s, con) + + passert.Count(s, col, "teststream int64", 6) + passert.EqualsList(s, col, append(eo, et...)) + return p +} + +// TestStreamFloat64Sequence tests the TestStream primitive by inserting float64 elements +// then advancing the watermark past the point where they were inserted. +func TestStreamFloat64Sequence() *beam.Pipeline { + p, s := beam.NewPipelineWithRoot() + con := teststream.NewConfig() + ele := []float64{91.1, 92.2, 93.3} + con.AddElementList(100, ele) + con.AdvanceWatermarkToInfinity() + + col := teststream.Create(s, con) + + passert.Count(s, col, "teststream float64", 3) + passert.EqualsList(s, col, ele) + return p +} + +// TestStreamTwoFloat64Sequences tests the TestStream primitive by inserting two sets of +// float64 elements that arrive on-time into the TestStream +func TestStreamTwoFloat64Sequences() *beam.Pipeline { + p, s := beam.NewPipelineWithRoot() + con := teststream.NewConfig() + eo := []float64{91.1, 92.2, 93.3} + et := []float64{96.4, 97.5, 98.6} + con.AddElementList(100, eo) + con.AdvanceWatermark(110) + con.AddElementList(120, et) + con.AdvanceWatermark(130) + + col := teststream.Create(s, con) + + passert.Count(s, col, "teststream float64", 6) + passert.EqualsList(s, col, append(eo, et...)) + return p +} + +// TestStreamBoolSequence tests the TestStream primitive by inserting boolean elements +// then advancing the watermark past the point where they were inserted. 
+func TestStreamBoolSequence() *beam.Pipeline { + p, s := beam.NewPipelineWithRoot() + con := teststream.NewConfig() + ele := []bool{true, false, true} + con.AddElementList(100, ele) + con.AdvanceWatermarkToInfinity() + + col := teststream.Create(s, con) + + passert.Count(s, col, "teststream bool", 3) + passert.EqualsList(s, col, ele) + return p +} + +// TestStreamTwoBoolSequences tests the TestStream primitive by inserting two sets of +// boolean elements that arrive on-time into the TestStream +func TestStreamTwoBoolSequences() *beam.Pipeline { + p, s := beam.NewPipelineWithRoot() + con := teststream.NewConfig() + eo := []bool{true, false, true} + et := []bool{false, true, false} + con.AddElementList(100, eo) + con.AdvanceWatermark(110) + con.AddElementList(120, et) + con.AdvanceWatermark(130) + + col := teststream.Create(s, con) + + passert.Count(s, col, "teststream bool", 6) + passert.EqualsList(s, col, append(eo, et...)) + return p +} diff --git a/sdks/go/test/integration/primitives/teststream_test.go b/sdks/go/test/integration/primitives/teststream_test.go new file mode 100644 index 000000000000..322cac81f228 --- /dev/null +++ b/sdks/go/test/integration/primitives/teststream_test.go @@ -0,0 +1,63 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
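Each pipeline constructor in teststream.go above gets a thin wrapper in the test file that follows, which applies the runner filters and then runs the pipeline. A hypothetical new case (names and values are illustrative, not part of this change, and the same imports as the surrounding files are assumed) would follow the same shape:

// TestStreamFloat64Singleton builds a TestStream pipeline with a single float64 element.
func TestStreamFloat64Singleton() *beam.Pipeline {
	p, s := beam.NewPipelineWithRoot()
	con := teststream.NewConfig()
	con.AddElements(100, 1.5)
	con.AdvanceWatermarkToInfinity()
	col := teststream.Create(s, con)
	passert.Count(s, col, "teststream single float64", 1)
	return p
}

func TestTestStreamFloat64Singleton(t *testing.T) {
	integration.CheckFilters(t)
	ptest.RunAndValidate(t, TestStreamFloat64Singleton())
}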
+ +package primitives + +import ( + "testing" + + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/test/integration" +) + +func TestTestStreamStrings(t *testing.T) { + integration.CheckFilters(t) + ptest.RunAndValidate(t, TestStreamStrings()) +} + +func TestTestStreamByteSliceSequence(t *testing.T) { + integration.CheckFilters(t) + ptest.RunAndValidate(t, TestStreamByteSliceSequence()) +} + +func TestTestStreamInt64Sequence(t *testing.T) { + integration.CheckFilters(t) + ptest.RunAndValidate(t, TestStreamInt64Sequence()) +} + +func TestTestStreamTwoInt64Sequences(t *testing.T) { + integration.CheckFilters(t) + ptest.RunAndValidate(t, TestStreamTwoInt64Sequences()) +} + +func TestTestStreamFloat64Sequence(t *testing.T) { + integration.CheckFilters(t) + ptest.RunAndValidate(t, TestStreamFloat64Sequence()) +} + +func TestTestStreamTwoFloat64Sequences(t *testing.T) { + integration.CheckFilters(t) + ptest.RunAndValidate(t, TestStreamTwoFloat64Sequences()) +} + +func TestTestStreamBoolSequence(t *testing.T) { + integration.CheckFilters(t) + ptest.RunAndValidate(t, TestStreamBoolSequence()) +} + +func TestTestStreamTwoBoolSequences(t *testing.T) { + integration.CheckFilters(t) + ptest.RunAndValidate(t, TestStreamTwoBoolSequences()) +} diff --git a/sdks/go/test/integration/primitives/windowinto.go b/sdks/go/test/integration/primitives/windowinto.go new file mode 100644 index 000000000000..1c2286d162ff --- /dev/null +++ b/sdks/go/test/integration/primitives/windowinto.go @@ -0,0 +1,195 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package primitives + +import ( + "reflect" + "time" + + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/mtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/teststream" + "github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/stats" +) + +func init() { + beam.RegisterFunction(sumPerKey) + beam.RegisterType(reflect.TypeOf((*createTimestampedData)(nil)).Elem()) +} + +// createTimestampedData produces data timestamped with the ordinal. +type createTimestampedData struct { + Data []int +} + +func (f *createTimestampedData) ProcessElement(_ []byte, emit func(beam.EventTime, string, int)) { + for i, v := range f.Data { + timestamp := mtime.FromMilliseconds(int64((i + 1) * 1000)).Subtract(10 * time.Millisecond) + emit(timestamp, "magic", v) + } +} + +// WindowSums produces a pipeline that generates the numbers of a 3x3 magic square as +// timestamped data, windows that data in a few different ways, and validates the per-window +// sums. sumPerKey is a closure that handles summing the data over each window.
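// The values 4 9 2 / 3 5 7 / 8 1 6 arrive one second apart, so each 3-second fixed window
// holds one row of the magic square (sum 15), the overlapping sliding windows sum to
// multiples of 15, and a single session captures all nine values (sum 45), as validated below.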
+func WindowSums(s beam.Scope, sumPerKey func(beam.Scope, beam.PCollection) beam.PCollection) { + timestampedData := beam.ParDo(s, &createTimestampedData{Data: []int{4, 9, 2, 3, 5, 7, 8, 1, 6}}, beam.Impulse(s)) + + windowSize := 3 * time.Second + + validate := func(s beam.Scope, wfn *window.Fn, in beam.PCollection, expected ...interface{}) { + // Window the data. + windowed := beam.WindowInto(s, wfn, in) + // Perform the appropriate sum operation. + sums := sumPerKey(s, windowed) + // Drop back to Global windows, and drop the key otherwise passert.Equals doesn't work. + sums = beam.WindowInto(s, window.NewGlobalWindows(), sums) + sums = beam.DropKey(s, sums) + passert.Equals(s, sums, expected...) + } + + // Use fixed windows to divide the data into 3 chunks. + validate(s.Scope("Fixed"), window.NewFixedWindows(windowSize), timestampedData, 15, 15, 15) + // This should be identical to the "fixed" windows. + validate(s.Scope("SlidingFixed"), window.NewSlidingWindows(windowSize, windowSize), timestampedData, 15, 15, 15) + // This will have overlap, but each value should be a multiple of the magic number. + validate(s.Scope("Sliding"), window.NewSlidingWindows(windowSize, 3*windowSize), timestampedData, 15, 30, 45, 30, 15) + // With such a large gap, there should be a single session which will sum to 45. + validate(s.Scope("Session"), window.NewSessions(windowSize), timestampedData, 45) +} + +func sumPerKey(ws beam.Window, ts beam.EventTime, key beam.U, iter func(*int) bool) (beam.U, int) { + var v, sum int + for iter(&v) { + sum += v + } + return key, sum +} + +func gbkSumPerKey(s beam.Scope, in beam.PCollection) beam.PCollection { + grouped := beam.GroupByKey(s, in) + return beam.ParDo(s, sumPerKey, grouped) +} + +func WindowSums_GBK(s beam.Scope) { + WindowSums(s.Scope("GBK"), gbkSumPerKey) +} + +func WindowSums_Lifted(s beam.Scope) { + WindowSums(s.Scope("Lifted"), stats.SumPerKey) +} + +func validateEquals(s beam.Scope, wfn *window.Fn, in beam.PCollection, tr window.Trigger, m beam.AccumulationMode, expected ...interface{}) { + windowed := beam.WindowInto(s, wfn, in, beam.Trigger(tr), m) + sums := stats.Sum(s, windowed) + sums = beam.WindowInto(s, window.NewGlobalWindows(), sums) + passert.Equals(s, sums, expected...) +} + +// TriggerDefault tests the default trigger which fires the pane after the end of the window +func TriggerDefault(s beam.Scope) { + con := teststream.NewConfig() + con.AddElements(1000, 1.0, 2.0, 3.0) + con.AdvanceWatermark(11000) + con.AddElements(12000, 4.0, 5.0) + con.AdvanceWatermark(13000) + + col := teststream.Create(s, con) + windowSize := 10 * time.Second + validateEquals(s.Scope("Fixed"), window.NewFixedWindows(windowSize), col, window.TriggerDefault(), beam.PanesDiscard(), 6.0, 9.0) +} + +// TriggerAlways tests the Always trigger, it is expected to receive every input value as the output. 
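// With discarding panes, each firing emits only the elements that arrived since the previous
// firing, so the expected output is the individual sums 1.0, 2.0, and 3.0 rather than a single 6.0.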
+func TriggerAlways(s beam.Scope) { + con := teststream.NewConfig() + con.AddElements(1000, 1.0, 2.0, 3.0) + con.AdvanceWatermark(11000) + col := teststream.Create(s, con) + windowSize := 10 * time.Second + + validateEquals(s.Scope("Fixed"), window.NewFixedWindows(windowSize), col, window.TriggerAlways(), beam.PanesDiscard(), 1.0, 2.0, 3.0) +} + +func validateCount(s beam.Scope, wfn *window.Fn, in beam.PCollection, tr window.Trigger, m beam.AccumulationMode, expected int) { + windowed := beam.WindowInto(s, wfn, in, beam.Trigger(tr), m) + sums := stats.Sum(s, windowed) + sums = beam.WindowInto(s, window.NewGlobalWindows(), sums) + passert.Count(s, sums, "total collections", expected) +} + +// TriggerElementCount tests the ElementCount trigger. It waits for at least N elements to be +// ready before firing an output pane. +func TriggerElementCount(s beam.Scope) { + con := teststream.NewConfig() + con.AddElements(1000, 1.0, 2.0, 3.0) + con.AdvanceWatermark(2000) + con.AddElements(6000, 4.0, 5.0) + con.AdvanceWatermark(10000) + con.AddElements(52000, 10.0) + con.AdvanceWatermark(53000) + + col := teststream.Create(s, con) + windowSize := 10 * time.Second + + // Waits for two elements to arrive, fires a single output pane, and never fires again. + // For the trigger to fire after every 2 elements, combine it with the Repeat trigger. + validateCount(s.Scope("Fixed"), window.NewFixedWindows(windowSize), col, window.TriggerAfterCount(2), beam.PanesDiscard(), 2) +} + +// TriggerAfterProcessingTime tests the AfterProcessingTime trigger. It fires output panes once 't' processing time has passed. +// Not yet supported by the Flink runner: +// java.lang.UnsupportedOperationException: Advancing Processing time is not supported by the Flink Runner. +func TriggerAfterProcessingTime(s beam.Scope) { + con := teststream.NewConfig() + con.AdvanceProcessingTime(100) + con.AddElements(1000, 1.0, 2.0, 3.0) + con.AdvanceProcessingTime(2000) + con.AddElements(22000, 4.0) + + col := teststream.Create(s, con) + + validateEquals(s.Scope("Global"), window.NewGlobalWindows(), col, window.TriggerAfterProcessingTime(5000), beam.PanesDiscard(), 6.0) +} + +// TriggerRepeat tests the Repeat trigger. As of now, it is configured to take only one trigger as a subtrigger. +// In the test below, it is expected to receive three output panes with two elements each. +func TriggerRepeat(s beam.Scope) { + // Create a teststream pipeline and get the PCollection. + con := teststream.NewConfig() + con.AddElements(1000, 1.0, 2.0, 3.0) + con.AdvanceWatermark(2000) + con.AddElements(6000, 4.0, 5.0, 6.0) + con.AdvanceWatermark(10000) + + col := teststream.Create(s, con) + + validateCount(s.Scope("Global"), window.NewGlobalWindows(), col, window.TriggerRepeat(window.TriggerAfterCount(2)), beam.PanesDiscard(), 3) +} + +// TriggerAfterEndOfWindow tests the AfterEndOfWindow trigger, with AfterCount(2) as the early firing trigger and AfterCount(1) as the late firing trigger. +// It fires twice: an early firing once two elements have arrived while the third element waits, and a late firing that emits that third element.
+func TriggerAfterEndOfWindow(s beam.Scope) { + con := teststream.NewConfig() + con.AddElements(1000, 1.0, 2.0, 3.0) + con.AdvanceWatermark(11000) + + col := teststream.Create(s, con) + windowSize := 10 * time.Second + trigger := window.TriggerAfterEndOfWindow().EarlyFiring(window.TriggerAfterCount(2)).LateFiring(window.TriggerAfterCount(1)) + + validateCount(s.Scope("Fixed"), window.NewFixedWindows(windowSize), col, trigger, beam.PanesDiscard(), 2) +} diff --git a/sdks/go/test/integration/primitives/windowinto_test.go b/sdks/go/test/integration/primitives/windowinto_test.go new file mode 100644 index 000000000000..a0cfdee37c42 --- /dev/null +++ b/sdks/go/test/integration/primitives/windowinto_test.go @@ -0,0 +1,73 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package primitives + +import ( + "testing" + + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/test/integration" +) + +func TestWindowSums_Lifted(t *testing.T) { + integration.CheckFilters(t) + p, s := beam.NewPipelineWithRoot() + WindowSums_Lifted(s) + ptest.RunAndValidate(t, p) +} + +func TestWindowSums_GBK(t *testing.T) { + integration.CheckFilters(t) + p, s := beam.NewPipelineWithRoot() + WindowSums_GBK(s) + ptest.RunAndValidate(t, p) +} + +func TestTriggerDefault(t *testing.T) { + integration.CheckFilters(t) + p, s := beam.NewPipelineWithRoot() + TriggerDefault(s) + ptest.RunAndValidate(t, p) +} + +func TestTriggerAlways(t *testing.T) { + integration.CheckFilters(t) + p, s := beam.NewPipelineWithRoot() + TriggerAlways(s) + ptest.RunAndValidate(t, p) +} + +func TestTriggerElementCount(t *testing.T) { + integration.CheckFilters(t) + p, s := beam.NewPipelineWithRoot() + TriggerElementCount(s) + ptest.RunAndValidate(t, p) +} + +func TestTriggerRepeat(t *testing.T) { + integration.CheckFilters(t) + p, s := beam.NewPipelineWithRoot() + TriggerRepeat(s) + ptest.RunAndValidate(t, p) +} + +func TestTriggerAfterEndOfWindow(t *testing.T) { + integration.CheckFilters(t) + p, s := beam.NewPipelineWithRoot() + TriggerAfterEndOfWindow(s) + ptest.RunAndValidate(t, p) +} diff --git a/sdks/go/test/integration/synthetic/synthetic.go b/sdks/go/test/integration/synthetic/synthetic.go index 0cd9051de0dd..e854d31df690 100644 --- a/sdks/go/test/integration/synthetic/synthetic.go +++ b/sdks/go/test/integration/synthetic/synthetic.go @@ -17,9 +17,9 @@ package synthetic import ( - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/io/synthetic" - "github.com/apache/beam/sdks/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/synthetic" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" ) // 
SimplePipeline creates a very simple synthetic pipeline to test that basic diff --git a/sdks/go/test/integration/synthetic/synthetic_test.go b/sdks/go/test/integration/synthetic/synthetic_test.go new file mode 100644 index 000000000000..3161012975bd --- /dev/null +++ b/sdks/go/test/integration/synthetic/synthetic_test.go @@ -0,0 +1,79 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package synthetic + +import ( + "testing" + + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/synthetic" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/dataflow" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/flink" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/samza" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/spark" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/test/integration" +) + +// TestSimplePipeline creates a very simple synthetic pipeline to test that +// basic synthetic pipelines work. +func TestSimplePipeline(t *testing.T) { + integration.CheckFilters(t) + + p, s := beam.NewPipelineWithRoot() + const size = 100 + + src := synthetic.SourceSingle(s, + synthetic.DefaultSourceConfig().NumElements(size).Build()) + step := synthetic.Step(s, synthetic.DefaultStepConfig().Build(), src) + passert.Count(s, step, "out", size) + + ptest.RunAndValidate(t, p) +} + +// TestSplittablePipeline creates a simple synthetic pipeline that exercises +// splitting-related behavior. +func TestSplittablePipeline(t *testing.T) { + integration.CheckFilters(t) + + p, s := beam.NewPipelineWithRoot() + const srcSize1 = 50 + const srcSize2 = 10 + const stepMult = 500 + const outCount = (srcSize1 + srcSize2) * stepMult + + configs := beam.Create(s, + synthetic.DefaultSourceConfig().NumElements(srcSize1).InitialSplits(3).Build(), + synthetic.DefaultSourceConfig().NumElements(srcSize2).InitialSplits(3).Build()) + src := synthetic.Source(s, configs) + step := synthetic.Step( + s, + synthetic. + DefaultStepConfig(). + OutputPerInput(stepMult). + Splittable(true). + InitialSplits(8). 
+ Build(), + src) + passert.Count(s, step, "out", outCount) + + ptest.RunAndValidate(t, p) +} + +func TestMain(m *testing.M) { + ptest.Main(m) +} diff --git a/sdks/go/test/integration/wordcount/wordcount.go b/sdks/go/test/integration/wordcount/wordcount.go index 109b94769bdd..bcc40b55a2d5 100644 --- a/sdks/go/test/integration/wordcount/wordcount.go +++ b/sdks/go/test/integration/wordcount/wordcount.go @@ -23,10 +23,10 @@ import ( "fmt" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/io/textio" - "github.com/apache/beam/sdks/go/pkg/beam/testing/passert" - "github.com/apache/beam/sdks/go/pkg/beam/transforms/stats" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/textio" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/stats" ) var ( diff --git a/sdks/go/test/integration/wordcount/wordcount_test.go b/sdks/go/test/integration/wordcount/wordcount_test.go index e86810d63f04..482d9a3ac872 100644 --- a/sdks/go/test/integration/wordcount/wordcount_test.go +++ b/sdks/go/test/integration/wordcount/wordcount_test.go @@ -19,8 +19,13 @@ import ( "strings" "testing" - "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/memfs" - "github.com/apache/beam/sdks/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/memfs" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/dataflow" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/flink" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/samza" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/spark" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/test/integration" ) func TestWordCount(t *testing.T) { @@ -74,12 +79,18 @@ func TestWordCount(t *testing.T) { } for _, test := range tests { + integration.CheckFilters(t) const filename = "memfs://input" memfs.Write(filename, []byte(strings.Join(test.lines, "\n"))) p := WordCount(filename, test.hash, test.words) - if err := ptest.Run(p); err != nil { + _, err := ptest.RunWithMetrics(p) + if err != nil { t.Errorf("WordCount(\"%v\") failed: %v", strings.Join(test.lines, "|"), err) } } } + +func TestMain(m *testing.M) { + ptest.Main(m) +} diff --git a/sdks/go/pkg/beam/io/pubsubio/v1/gen.go b/sdks/go/test/integration/xlang/xlang.go similarity index 82% rename from sdks/go/pkg/beam/io/pubsubio/v1/gen.go rename to sdks/go/test/integration/xlang/xlang.go index 55191112ce7d..e65ddcd45bf5 100644 --- a/sdks/go/pkg/beam/io/pubsubio/v1/gen.go +++ b/sdks/go/test/integration/xlang/xlang.go @@ -13,9 +13,5 @@ // See the License for the specific language governing permissions and // limitations under the License. -package v1 - -//go:generate protoc -I . v1.proto --go_out=. - -// PubSubPayloadURN is the URN of the pubsub proto payload. -const PubSubPayloadURN = "beam:go:payload:pubsub:v1" +// Package xlang contains integration tests for cross-language transforms. +package xlang diff --git a/sdks/go/test/validatesrunner/xlang_test.go b/sdks/go/test/integration/xlang/xlang_test.go similarity index 68% rename from sdks/go/test/validatesrunner/xlang_test.go rename to sdks/go/test/integration/xlang/xlang_test.go index cda57263d661..560bc5c17beb 100644 --- a/sdks/go/test/validatesrunner/xlang_test.go +++ b/sdks/go/test/integration/xlang/xlang_test.go @@ -13,7 +13,7 @@ // See the License for the specific language governing permissions and // limitations under the License. 
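For context on the synthetic tests above: source and step configurations are assembled with a fluent builder before being handed to the synthetic transforms. A small sketch of that pattern, mirroring the calls used in TestSplittablePipeline (the helper name and values are illustrative, not part of this change, and the imports already present in synthetic_test.go above are assumed):

// smallSplittablePipeline builds a tiny splittable synthetic pipeline: 20 elements
// pre-split into 2 restrictions, with each input producing 10 outputs.
func smallSplittablePipeline() *beam.Pipeline {
	p, s := beam.NewPipelineWithRoot()
	configs := beam.Create(s,
		synthetic.DefaultSourceConfig().NumElements(20).InitialSplits(2).Build())
	src := synthetic.Source(s, configs)
	step := synthetic.Step(s,
		synthetic.DefaultStepConfig().OutputPerInput(10).Splittable(true).InitialSplits(4).Build(),
		src)
	passert.Count(s, step, "out", 200)
	return p
}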
-package validatesrunner +package xlang import ( "fmt" @@ -21,10 +21,15 @@ import ( "sort" "testing" - "github.com/apache/beam/sdks/go/examples/xlang" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/testing/passert" - "github.com/apache/beam/sdks/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/examples/xlang" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/dataflow" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/flink" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/samza" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/spark" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/test/integration" ) func init() { @@ -40,6 +45,12 @@ func init() { beam.RegisterFunction(collectValues) } +func checkFlags(t *testing.T) { + if *integration.TestExpansionAddr == "" { + t.Skip("No expansion address provided.") + } +} + // formatIntStringsFn is a DoFn that formats an int64 and a list of strings. func formatIntStringsFn(i int64, s []string) string { sort.Strings(s) @@ -101,17 +112,24 @@ func collectValues(key string, iter func(*int64) bool) (string, []int) { return key, values } -func TestXLang_CoGroupBy(t *testing.T) { - // TODO(BEAM-11418): Enable test once this bug is fixed. - t.Skip("Sickbayed: This test currently fails for unknown reasons.") - // TODO(BEAM-11416): Filter this test out from direct runner. - if *ptest.Runner == "direct" { - t.Skip("Direct runner does not support cross-language.") - } +func TestXLang_Prefix(t *testing.T) { + integration.CheckFilters(t) + checkFlags(t) - if *expansionAddr == "" { - t.Fatal("No expansion address provided") - } + p := beam.NewPipeline() + s := p.Root() + + // Using the cross-language transform + strings := beam.Create(s, "a", "b", "c") + prefixed := xlang.Prefix(s, "prefix_", *integration.TestExpansionAddr, strings) + passert.Equals(s, prefixed, "prefix_a", "prefix_b", "prefix_c") + + ptest.RunAndValidate(t, p) +} + +func TestXLang_CoGroupBy(t *testing.T) { + integration.CheckFilters(t) + checkFlags(t) p := beam.NewPipeline() s := p.Root() @@ -119,7 +137,7 @@ func TestXLang_CoGroupBy(t *testing.T) { // Using the cross-language transform col1 := beam.ParDo(s, getIntString, beam.Create(s, IntString{X: 0, Y: "1"}, IntString{X: 0, Y: "2"}, IntString{X: 1, Y: "3"})) col2 := beam.ParDo(s, getIntString, beam.Create(s, IntString{X: 0, Y: "4"}, IntString{X: 1, Y: "5"}, IntString{X: 1, Y: "6"})) - c := xlang.CoGroupByKey(s, *expansionAddr, col1, col2) + c := xlang.CoGroupByKey(s, *integration.TestExpansionAddr, col1, col2) sums := beam.ParDo(s, sumCounts, c) formatted := beam.ParDo(s, formatIntStringsFn, sums) passert.Equals(s, formatted, "0:[1 2 4]", "1:[3 5 6]") @@ -128,14 +146,8 @@ func TestXLang_CoGroupBy(t *testing.T) { } func TestXLang_Combine(t *testing.T) { - // TODO(BEAM-11416): Filter this test out from direct runner. 
- if *ptest.Runner == "direct" { - t.Skip("Direct runner does not support cross-language.") - } - - if *expansionAddr == "" { - t.Fatal("No expansion address provided") - } + integration.CheckFilters(t) + checkFlags(t) p := beam.NewPipeline() s := p.Root() @@ -143,7 +155,7 @@ func TestXLang_Combine(t *testing.T) { // Using the cross-language transform kvs := beam.Create(s, StringInt{X: "a", Y: 1}, StringInt{X: "a", Y: 2}, StringInt{X: "b", Y: 3}) ins := beam.ParDo(s, getStringInt, kvs) - c := xlang.CombinePerKey(s, *expansionAddr, ins) + c := xlang.CombinePerKey(s, *integration.TestExpansionAddr, ins) formatted := beam.ParDo(s, formatStringIntFn, c) passert.Equals(s, formatted, "a:3", "b:3") @@ -152,14 +164,8 @@ func TestXLang_Combine(t *testing.T) { } func TestXLang_CombineGlobally(t *testing.T) { - // TODO(BEAM-11416): Filter this test out from direct runner. - if *ptest.Runner == "direct" { - t.Skip("Direct runner does not support cross-language.") - } - - if *expansionAddr == "" { - t.Fatal("No expansion address provided") - } + integration.CheckFilters(t) + checkFlags(t) p := beam.NewPipeline() s := p.Root() @@ -167,7 +173,7 @@ func TestXLang_CombineGlobally(t *testing.T) { in := beam.CreateList(s, []int64{1, 2, 3}) // Using the cross-language transform - c := xlang.CombineGlobally(s, *expansionAddr, in) + c := xlang.CombineGlobally(s, *integration.TestExpansionAddr, in) formatted := beam.ParDo(s, formatIntFn, c) passert.Equals(s, formatted, "6") @@ -176,14 +182,8 @@ func TestXLang_CombineGlobally(t *testing.T) { } func TestXLang_Flatten(t *testing.T) { - // TODO(BEAM-11416): Filter this test out from direct runner. - if *ptest.Runner == "direct" { - t.Skip("Direct runner does not support cross-language.") - } - - if *expansionAddr == "" { - t.Fatal("No expansion address provided") - } + integration.CheckFilters(t) + checkFlags(t) p := beam.NewPipeline() s := p.Root() @@ -192,7 +192,7 @@ func TestXLang_Flatten(t *testing.T) { col2 := beam.CreateList(s, []int64{4, 5, 6}) // Using the cross-language transform - c := xlang.Flatten(s, *expansionAddr, col1, col2) + c := xlang.Flatten(s, *integration.TestExpansionAddr, col1, col2) formatted := beam.ParDo(s, formatIntFn, c) passert.Equals(s, formatted, "1", "2", "3", "4", "5", "6") @@ -201,14 +201,8 @@ func TestXLang_Flatten(t *testing.T) { } func TestXLang_GroupBy(t *testing.T) { - // TODO(BEAM-11416): Filter this test out from direct runner. - if *ptest.Runner == "direct" { - t.Skip("Direct runner does not support cross-language.") - } - - if *expansionAddr == "" { - t.Fatal("No expansion address provided") - } + integration.CheckFilters(t) + checkFlags(t) p := beam.NewPipeline() s := p.Root() @@ -216,7 +210,7 @@ func TestXLang_GroupBy(t *testing.T) { // Using the cross-language transform kvs := beam.Create(s, StringInt{X: "0", Y: 1}, StringInt{X: "0", Y: 2}, StringInt{X: "1", Y: 3}) in := beam.ParDo(s, getStringInt, kvs) - out := xlang.GroupByKey(s, *expansionAddr, in) + out := xlang.GroupByKey(s, *integration.TestExpansionAddr, in) vals := beam.ParDo(s, collectValues, out) formatted := beam.ParDo(s, formatStringIntsFn, vals) @@ -226,16 +220,8 @@ func TestXLang_GroupBy(t *testing.T) { } func TestXLang_Multi(t *testing.T) { - // TODO(BEAM-11418): Enable test once this bug is fixed. - t.Skip("Sickbayed: This test currently fails for unknown reasons.") - // TODO(BEAM-11416): Filter this test out from direct runner. 
- if *ptest.Runner == "direct" { - t.Skip("Direct runner does not support cross-language.") - } - - if *expansionAddr == "" { - t.Fatal("No expansion address provided") - } + integration.CheckFilters(t) + checkFlags(t) p := beam.NewPipeline() s := p.Root() @@ -245,7 +231,7 @@ func TestXLang_Multi(t *testing.T) { side := beam.CreateList(s, []string{"s"}) // Using the cross-language transform - mainOut, sideOut := xlang.Multi(s, *expansionAddr, main1, main2, side) + mainOut, sideOut := xlang.Multi(s, *integration.TestExpansionAddr, main1, main2, side) passert.Equals(s, mainOut, "as", "bbs", "xs", "yys", "zzzs") passert.Equals(s, sideOut, "ss") @@ -254,16 +240,8 @@ func TestXLang_Multi(t *testing.T) { } func TestXLang_Partition(t *testing.T) { - // TODO(BEAM-11418): Enable test once this bug is fixed. - t.Skip("Sickbayed: This test currently fails for unknown reasons.") - // TODO(BEAM-11416): Filter this test out from direct runner. - if *ptest.Runner == "direct" { - t.Skip("Direct runner does not support cross-language.") - } - - if *expansionAddr == "" { - t.Fatal("No expansion address provided") - } + integration.CheckFilters(t) + checkFlags(t) p := beam.NewPipeline() s := p.Root() @@ -271,7 +249,7 @@ func TestXLang_Partition(t *testing.T) { col := beam.CreateList(s, []int64{1, 2, 3, 4, 5, 6}) // Using the cross-language transform - out0, out1 := xlang.Partition(s, *expansionAddr, col) + out0, out1 := xlang.Partition(s, *integration.TestExpansionAddr, col) formatted0 := beam.ParDo(s, formatIntFn, out0) formatted1 := beam.ParDo(s, formatIntFn, out1) @@ -280,3 +258,7 @@ func TestXLang_Partition(t *testing.T) { ptest.RunAndValidate(t, p) } + +func TestMain(m *testing.M) { + ptest.Main(m) +} diff --git a/sdks/go/test/load/build.gradle b/sdks/go/test/load/build.gradle index 83160e81a91b..a006fcda7a0d 100644 --- a/sdks/go/test/load/build.gradle +++ b/sdks/go/test/load/build.gradle @@ -21,6 +21,11 @@ applyGoNature() description = "Apache Beam :: SDKs :: Go :: Test :: Load" +// Disable gogradle's dependency resolution so it uses Go modules instead. +installDependencies.enabled = false +resolveBuildDependencies.enabled = false +resolveTestDependencies.enabled = false + def getLocalPlatform = { String hostOs = com.github.blindpirate.gogradle.crossplatform.Os.getHostOs() String hostArch = com.github.blindpirate.gogradle.crossplatform.Arch.getHostArch() @@ -37,7 +42,7 @@ dependencies { } golang { - packagePath = 'github.com/apache/beam/sdks/go/test/load' + packagePath = 'github.com/apache/beam/sdks/v2/go/test/load' goBuild { // We always want to build linux-amd64 in addition to the user host platform // so we can submit this as the remote binary used within the Go container. 
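A note on the recurring path change in this diff: the Gradle packagePath above and every Go import path in this change gain a "/v2" segment because, per these paths, the Go module is declared as github.com/apache/beam/sdks/v2, and Go semantic import versioning requires the major-version suffix to appear in import paths. The files themselves stay under sdks/go; only the import paths carry the version. For example:

import (
	// The /v2 element is part of the module path, not a directory rename.
	"github.com/apache/beam/sdks/v2/go/pkg/beam"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx"
)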
@@ -47,10 +52,11 @@ golang { // is not found within when invoked from within build_rules.gradle applyGoNature targetPlatform = [getLocalPlatform(), 'linux-amd64'] // Build all the tests - go 'build -o ./build/bin/${GOOS}_${GOARCH}/pardo github.com/apache/beam/sdks/go/test/load/pardo' - go 'build -o ./build/bin/${GOOS}_${GOARCH}/combine github.com/apache/beam/sdks/go/test/load/combine' - go 'build -o ./build/bin/${GOOS}_${GOARCH}/group_by_key github.com/apache/beam/sdks/go/test/load/group_by_key' - go 'build -o ./build/bin/${GOOS}_${GOARCH}/cogbk github.com/apache/beam/sdks/go/test/load/cogbk' + go 'build -o ./build/bin/${GOOS}_${GOARCH}/pardo github.com/apache/beam/sdks/v2/go/test/load/pardo' + go 'build -o ./build/bin/${GOOS}_${GOARCH}/combine github.com/apache/beam/sdks/v2/go/test/load/combine' + go 'build -o ./build/bin/${GOOS}_${GOARCH}/group_by_key github.com/apache/beam/sdks/v2/go/test/load/group_by_key' + go 'build -o ./build/bin/${GOOS}_${GOARCH}/sideinput github.com/apache/beam/sdks/v2/go/test/load/sideinput' + go 'build -o ./build/bin/${GOOS}_${GOARCH}/cogbk github.com/apache/beam/sdks/v2/go/test/load/cogbk' } } diff --git a/sdks/go/test/load/cogbk/cogbk.go b/sdks/go/test/load/cogbk/cogbk.go index a3de69bc185a..77b196f1620f 100644 --- a/sdks/go/test/load/cogbk/cogbk.go +++ b/sdks/go/test/load/cogbk/cogbk.go @@ -20,11 +20,11 @@ import ( "flag" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/io/synthetic" - "github.com/apache/beam/sdks/go/pkg/beam/log" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" - "github.com/apache/beam/sdks/go/test/load" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/synthetic" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/test/load" ) var ( diff --git a/sdks/go/test/load/combine/combine.go b/sdks/go/test/load/combine/combine.go index 8027afc39d66..32d46a1d94a7 100644 --- a/sdks/go/test/load/combine/combine.go +++ b/sdks/go/test/load/combine/combine.go @@ -20,12 +20,12 @@ import ( "context" "flag" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/io/synthetic" - "github.com/apache/beam/sdks/go/pkg/beam/log" - "github.com/apache/beam/sdks/go/pkg/beam/transforms/top" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" - "github.com/apache/beam/sdks/go/test/load" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/synthetic" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/top" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/test/load" ) var ( diff --git a/sdks/go/test/load/group_by_key/group_by_key.go b/sdks/go/test/load/group_by_key/group_by_key.go index b32640f1a762..78871c96d6f9 100644 --- a/sdks/go/test/load/group_by_key/group_by_key.go +++ b/sdks/go/test/load/group_by_key/group_by_key.go @@ -21,11 +21,11 @@ import ( "context" "flag" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/io/synthetic" - "github.com/apache/beam/sdks/go/pkg/beam/log" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" - "github.com/apache/beam/sdks/go/test/load" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/synthetic" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + 
"github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/test/load" ) var ( diff --git a/sdks/go/test/load/pardo/pardo.go b/sdks/go/test/load/pardo/pardo.go index 9ed8c7eefbd7..8114ac8a9060 100644 --- a/sdks/go/test/load/pardo/pardo.go +++ b/sdks/go/test/load/pardo/pardo.go @@ -21,11 +21,11 @@ import ( "fmt" "reflect" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/io/synthetic" - "github.com/apache/beam/sdks/go/pkg/beam/log" - "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" - "github.com/apache/beam/sdks/go/test/load" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/synthetic" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/test/load" ) var ( diff --git a/sdks/go/test/load/sideinput/sideinput.go b/sdks/go/test/load/sideinput/sideinput.go new file mode 100644 index 000000000000..fe4f4527075a --- /dev/null +++ b/sdks/go/test/load/sideinput/sideinput.go @@ -0,0 +1,100 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +package main + +import ( + "context" + "flag" + "reflect" + + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/synthetic" + "github.com/apache/beam/sdks/v2/go/pkg/beam/log" + "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx" + "github.com/apache/beam/sdks/v2/go/test/load" +) + +func init() { + beam.RegisterDoFn(reflect.TypeOf((*doFn)(nil))) +} + +var ( + accessPercentage = flag.Int( + "access_percentage", + 100, + "Specifies the percentage of elements in the side input to be accessed.") + syntheticSourceConfig = flag.String( + "input_options", + "", + "A JSON object that describes the configuration for synthetic source") +) + +func parseSyntheticConfig() synthetic.SourceConfig { + if *syntheticSourceConfig == "" { + panic("--input_options not provided") + } else { + encoded := []byte(*syntheticSourceConfig) + return synthetic.DefaultSourceConfig().BuildFromJSON(encoded) + } +} + +type doFn struct { + ElementsToAccess int +} + +func (fn *doFn) ProcessElement(_ []byte, values func(*[]byte, *[]byte) bool, emit func([]byte, []byte)) { + var key []byte + var value []byte + i := 0 + for values(&key, &value) { + if i >= fn.ElementsToAccess { + break + } + emit(key, value) + i++ + } +} + +func main() { + flag.Parse() + beam.Init() + ctx := context.Background() + p, s := beam.NewPipelineWithRoot() + + syntheticConfig := parseSyntheticConfig() + elementsToAccess := syntheticConfig.NumElements * *accessPercentage / 100 + + src := synthetic.SourceSingle(s, syntheticConfig) + src = beam.ParDo(s, &load.RuntimeMonitor{}, src) + + src = beam.ParDo( + s, + &doFn{ElementsToAccess: elementsToAccess}, + beam.Impulse(s), + beam.SideInput{Input: src}) + + beam.ParDo(s, &load.RuntimeMonitor{}, src) + + presult, err := beamx.RunWithMetrics(ctx, p) + if err != nil { + log.Fatalf(ctx, "Failed to execute job: %v", err) + } + + if presult != nil { + metrics := presult.Metrics().AllMetrics() + load.PublishMetrics(metrics) + } +} diff --git a/sdks/go/test/load/util.go b/sdks/go/test/load/util.go index 2f9baf832936..96436d0d6782 100644 --- a/sdks/go/test/load/util.go +++ b/sdks/go/test/load/util.go @@ -28,8 +28,8 @@ import ( "strings" "time" - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/core/metrics" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/metrics" ) const ( diff --git a/sdks/go/test/regression/coders/fromyaml/fromyaml.go b/sdks/go/test/regression/coders/fromyaml/fromyaml.go index 3544a5f56f44..337c48e9b470 100644 --- a/sdks/go/test/regression/coders/fromyaml/fromyaml.go +++ b/sdks/go/test/regression/coders/fromyaml/fromyaml.go @@ -30,12 +30,12 @@ import ( "strconv" "strings" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime" - "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec" - "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx" - "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/mtime" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/graphx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" "github.com/google/go-cmp/cmp" 
"golang.org/x/text/encoding/charmap" yaml "gopkg.in/yaml.v2" @@ -45,6 +45,7 @@ var unimplementedCoders = map[string]bool{ "beam:coder:param_windowed_value:v1": true, "beam:coder:timer:v1": true, "beam:coder:sharded_key:v1": true, + "beam:coder:custom_window:v1": true, } // Coder is a representation a serialized beam coder. diff --git a/sdks/go/test/regression/lperror.go b/sdks/go/test/regression/lperror.go new file mode 100644 index 000000000000..4555e5036833 --- /dev/null +++ b/sdks/go/test/regression/lperror.go @@ -0,0 +1,63 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package regression + +import ( + "github.com/apache/beam/sdks/v2/go/pkg/beam" +) + +// REPRO found by https://github.com/zelliott + +type fruit struct { + Name string +} + +func toFoo(id int, _ func(**fruit) bool) (int, string) { + return id, "Foo" +} + +func toID(id int, fruitIter func(**fruit) bool, _ func(*string) bool) int { + var fruit *fruit + for fruitIter(&fruit) { + } + return id +} + +// LPErrorPipeline constructs a pipeline that has a GBK followed by a CoGBK using the same +// input, with schema encoded structs as elements. This ends up having the stage after the +// CoGBK fail since the decoder post-cogbk is missing a Length Prefix coder that was +// applied to the GBK input, but not the CoGBK output. +// Root is likely in that there's no Beam standard CoGBK format for inject and expand. +// JIRA: BEAM-12438 +func LPErrorPipeline(s beam.Scope) beam.PCollection { + // ["Apple", "Banana", "Cherry"] + fruits := beam.CreateList(s, []*fruit{{"Apple"}, {"Banana"}, {"Cherry"}}) + + // [0 "Apple", 0 "Banana", 0 "Cherry"] + fruitsKV := beam.AddFixedKey(s, fruits) + + // [0 ["Apple", "Banana", "Cherry"]] + fruitsGBK := beam.GroupByKey(s, fruitsKV) + + // [0 "Foo"] + fooKV := beam.ParDo(s, toFoo, fruitsGBK) + + // [0 ["Foo"] ["Apple", "Banana", "Cherry"]] + fruitsFooCoGBK := beam.CoGroupByKey(s, fruitsKV, fooKV) + + // [0] + return beam.ParDo(s, toID, fruitsFooCoGBK) +} diff --git a/sdks/go/test/regression/lperror_test.go b/sdks/go/test/regression/lperror_test.go new file mode 100644 index 000000000000..a638dbeb47d5 --- /dev/null +++ b/sdks/go/test/regression/lperror_test.go @@ -0,0 +1,41 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. 
You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package regression + +import ( + "testing" + + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/test/integration" + + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/dataflow" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/flink" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/samza" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/spark" +) + +func TestLPErrorPipeline(t *testing.T) { + integration.CheckFilters(t) + + pipeline, s := beam.NewPipelineWithRoot() + want := beam.CreateList(s, []int{0}) + got := LPErrorPipeline(s) + passert.Equals(s, got, want) + + ptest.RunAndValidate(t, pipeline) +} diff --git a/sdks/go/test/regression/pardo.go b/sdks/go/test/regression/pardo.go index e158e3b6d142..7ca3880a0e64 100644 --- a/sdks/go/test/regression/pardo.go +++ b/sdks/go/test/regression/pardo.go @@ -17,9 +17,9 @@ package regression import ( - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/testing/passert" - "github.com/apache/beam/sdks/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" ) func directFn(elm int) int { diff --git a/sdks/go/test/regression/pardo_test.go b/sdks/go/test/regression/pardo_test.go index 322dd69c6f7d..fc6d240fbf5a 100644 --- a/sdks/go/test/regression/pardo_test.go +++ b/sdks/go/test/regression/pardo_test.go @@ -18,41 +18,41 @@ package regression import ( "testing" - "github.com/apache/beam/sdks/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/test/integration" + + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/dataflow" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/flink" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/samza" + _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/spark" ) func TestDirectParDo(t *testing.T) { - if err := ptest.Run(DirectParDo()); err != nil { - t.Error(err) - } + integration.CheckFilters(t) + ptest.RunAndValidate(t, DirectParDo()) } func TestEmitParDo(t *testing.T) { - if err := ptest.Run(EmitParDo()); err != nil { - t.Error(err) - } + integration.CheckFilters(t) + ptest.RunAndValidate(t, EmitParDo()) } func TestMultiEmitParDo(t *testing.T) { - if err := ptest.Run(MultiEmitParDo()); err != nil { - t.Error(err) - } + integration.CheckFilters(t) + ptest.RunAndValidate(t, MultiEmitParDo()) } func TestMixedOutputParDo(t *testing.T) { - if err := ptest.Run(MixedOutputParDo()); err != nil { - t.Error(err) - } + integration.CheckFilters(t) + ptest.RunAndValidate(t, MixedOutputParDo()) } func TestDirectParDoAfterGBK(t *testing.T) { - if err := ptest.Run(DirectParDoAfterGBK()); err != nil { - t.Error(err) - } + integration.CheckFilters(t) + ptest.RunAndValidate(t, DirectParDoAfterGBK()) } func TestEmitParDoAfterGBK(t *testing.T) { - if err := 
ptest.Run(EmitParDoAfterGBK()); err != nil { - t.Error(err) - } + integration.CheckFilters(t) + ptest.RunAndValidate(t, EmitParDoAfterGBK()) } diff --git a/sdks/go/test/validatesrunner/validatesrunner_test.go b/sdks/go/test/regression/regression_test.go similarity index 76% rename from sdks/go/test/validatesrunner/validatesrunner_test.go rename to sdks/go/test/regression/regression_test.go index 826666fd004f..5ef278f91a72 100644 --- a/sdks/go/test/validatesrunner/validatesrunner_test.go +++ b/sdks/go/test/regression/regression_test.go @@ -13,17 +13,16 @@ // See the License for the specific language governing permissions and // limitations under the License. -package validatesrunner +package regression import ( "testing" - _ "github.com/apache/beam/sdks/go/pkg/beam/runners/flink" - _ "github.com/apache/beam/sdks/go/pkg/beam/runners/spark" - _ "github.com/apache/beam/sdks/go/pkg/beam/runners/universal" - "github.com/apache/beam/sdks/go/pkg/beam/testing/ptest" + "github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest" ) +// TestMain invokes ptest.Main to allow running these tests on +// non-direct runners. func TestMain(m *testing.M) { ptest.Main(m) } diff --git a/sdks/go/test/run_integration_tests.sh b/sdks/go/test/run_integration_tests.sh deleted file mode 100755 index 11230638ded2..000000000000 --- a/sdks/go/test/run_integration_tests.sh +++ /dev/null @@ -1,239 +0,0 @@ -#!/bin/bash -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. -# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -# This script will be run by Jenkins as a post commit test. In order to run -# locally use the following flags: -# -# --gcs_location -> Temporary location to use for service tests. -# --project -> Project name to use for docker images. -# --dataflow_project -> Project name to use for dataflow. -# -# Execute from the root of the repository. It assumes binaries are built. - -set -e -set -v - -RUNNER=dataflow - -# Where to store integration test outputs. 
-GCS_LOCATION=gs://temp-storage-for-end-to-end-tests - -# Project for the container and integration test -PROJECT=apache-beam-testing -DATAFLOW_PROJECT=apache-beam-testing -REGION=us-central1 - -# Number of tests to run in parallel -PARALLEL=10 - -while [[ $# -gt 0 ]] -do -key="$1" -case $key in - --runner) - RUNNER="$2" - shift # past argument - shift # past value - ;; - --project) - PROJECT="$2" - shift # past argument - shift # past value - ;; - --region) - REGION="$2" - shift # past argument - shift # past value - ;; - --dataflow_project) - DATAFLOW_PROJECT="$2" - shift # past argument - shift # past value - ;; - --gcs_location) - GCS_LOCATION="$2" - shift # past argument - shift # past value - ;; - --dataflow_worker_jar) - DATAFLOW_WORKER_JAR="$2" - shift # past argument - shift # past value - ;; - --flink_job_server_jar) - FLINK_JOB_SERVER_JAR="$2" - shift # past argument - shift # past value - ;; - --spark_job_server_jar) - SPARK_JOB_SERVER_JAR="$2" - shift # past argument - shift # past value - ;; - --endpoint) - ENDPOINT="$2" - shift # past argument - shift # past value - ;; - --parallel) - PARALLEL="$2" - shift # past argument - shift # past value - ;; - --filter) - FILTER="$2" - shift - shift - ;; - *) # unknown option - echo "Unknown option: $1" - exit 1 - ;; -esac -done - -if [[ "$RUNNER" == "universal" ]]; then - PUSH_CONTAINER_TO_GCR='' -else - PUSH_CONTAINER_TO_GCR='yes' -fi - -# Go to the root of the repository -cd $(git rev-parse --show-toplevel) - -# Verify in the root of the repository -test -d sdks/go/test - -# Verify docker and gcloud commands exist -command -v docker -docker -v - -if [[ "$PUSH_CONTAINER_TO_GCR" == "yes" ]]; then - command -v gcloud - gcloud --version - - # ensure gcloud is version 186 or above - TMPDIR=$(mktemp -d) - gcloud_ver=$(gcloud -v | head -1 | awk '{print $4}') - if [[ "$gcloud_ver" < "186" ]] - then - pushd $TMPDIR - curl https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-186.0.0-linux-x86_64.tar.gz --output gcloud.tar.gz - tar xf gcloud.tar.gz - ./google-cloud-sdk/install.sh --quiet - . ./google-cloud-sdk/path.bash.inc - popd - gcloud components update --quiet || echo 'gcloud components update failed' - gcloud -v - fi - - # Build the container - TAG=$(date +%Y%m%d-%H%M%S) - CONTAINER=us.gcr.io/$PROJECT/$USER/beam_go_sdk - echo "Using container $CONTAINER" - ./gradlew :sdks:go:container:docker -Pdocker-repository-root=us.gcr.io/$PROJECT/$USER -Pdocker-tag=$TAG - - # Verify it exists - docker images | grep $TAG - - # Push the container - gcloud docker -- push $CONTAINER -else - TAG=dev - ./gradlew :sdks:go:container:docker -Pdocker-tag=$TAG - CONTAINER=apache/beam_go_sdk -fi - -if [[ "$RUNNER" == "dataflow" ]]; then - if [[ -z "$DATAFLOW_WORKER_JAR" ]]; then - DATAFLOW_WORKER_JAR=$(find ./runners/google-cloud-dataflow-java/worker/build/libs/beam-runners-google-cloud-dataflow-java-fn-api-worker-*.jar) - fi - echo "Using Dataflow worker jar: $DATAFLOW_WORKER_JAR" -elif [[ "$RUNNER" == "flink" || "$RUNNER" == "spark" || "$RUNNER" == "universal" ]]; then - if [[ -z "$ENDPOINT" ]]; then - # Hacky python script to find a free port. Note there is a small chance the chosen port could - # get taken before being claimed by the job server. 
- SOCKET_SCRIPT=" -import socket -s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) -s.bind(('localhost', 0)) -print(s.getsockname()[1]) -s.close() - " - JOB_PORT=$(python -c "$SOCKET_SCRIPT") - ENDPOINT="localhost:$JOB_PORT" - echo "No endpoint specified; starting a new $RUNNER job server on $ENDPOINT" - if [[ "$RUNNER" == "flink" ]]; then - java \ - -jar $FLINK_JOB_SERVER_JAR \ - --flink-master [local] \ - --job-port $JOB_PORT \ - --artifact-port 0 & - elif [[ "$RUNNER" == "spark" ]]; then - java \ - -jar $SPARK_JOB_SERVER_JAR \ - --spark-master-url local \ - --job-port $JOB_PORT \ - --artifact-port 0 & - elif [[ "$RUNNER" == "universal" ]]; then - python \ - -m apache_beam.runners.portability.local_job_service_main \ - --port $JOB_PORT & - else - echo "Unknown runner: $RUNNER" - exit 1; - fi - fi -fi - -echo ">>> RUNNING $RUNNER INTEGRATION TESTS" -./sdks/go/build/bin/integration \ - --runner=$RUNNER \ - --project=$DATAFLOW_PROJECT \ - --region=$REGION \ - --environment_type=DOCKER \ - --environment_config=$CONTAINER:$TAG \ - --staging_location=$GCS_LOCATION/staging-validatesrunner-test \ - --temp_location=$GCS_LOCATION/temp-validatesrunner-test \ - --worker_binary=./sdks/go/test/build/bin/linux-amd64/worker \ - --dataflow_worker_jar=$DATAFLOW_WORKER_JAR \ - --endpoint=$ENDPOINT \ - --parallel=$PARALLEL \ - --filter=$FILTER \ - || TEST_EXIT_CODE=$? # don't fail fast here; clean up environment before exiting - -if [[ ! -z "$JOB_PORT" ]]; then - # Shut down the job server - kill %1 || echo "Failed to shut down job server" -fi - -if [[ "$PUSH_CONTAINER_TO_GCR" = 'yes' ]]; then - # Delete the container locally and remotely - docker rmi $CONTAINER:$TAG || echo "Failed to remove container" - gcloud --quiet container images delete $CONTAINER:$TAG || echo "Failed to delete container" - - # Clean up tempdir - rm -rf $TMPDIR -fi - -if [[ "$TEST_EXIT_CODE" -eq 0 ]]; then - echo ">>> SUCCESS" -else - echo ">>> FAILURE" -fi -exit $TEST_EXIT_CODE diff --git a/sdks/go/test/run_validatesrunner_tests.sh b/sdks/go/test/run_validatesrunner_tests.sh index c25f9c58c2ab..49c196e27fc2 100755 --- a/sdks/go/test/run_validatesrunner_tests.sh +++ b/sdks/go/test/run_validatesrunner_tests.sh @@ -16,9 +16,9 @@ # limitations under the License. # This script executes ValidatesRunner tests including launching any additional -# services needed, such as job services or expansion services. The following -# runners are supported, and selected via a flag: +# services needed, such as job services or expansion services. # +# The following runners are supported, and selected via a flag: # --runner {portable|direct|flink} (default: portable) # Select which runner to execute tests on. This flag also determines which # services to start up and which tests may be skipped. @@ -26,29 +26,83 @@ # portable - (default) Python Portable Runner (aka. Reference Runner or FnAPI Runner) # flink - Java Flink Runner (local mode) # spark - Java Spark Runner (local mode) +# dataflow - Dataflow Runner # -# --flink_job_server_jar -> Filepath to jar, used if runner is Flink. -# --spark_job_server_jar -> Filepath to jar, used if runner is Spark. -# --endpoint -> Replaces jar filepath with existing job server endpoint. +# General flags: +# --tests -> A space-seperated list of targets for "go test", written with +# beam/sdks/go as the working directory. Defaults to all packages in the +# integration and regression directories. +# --timeout -> Timeout for the go test command, on a per-package level. 
+# --simultaneous -> Number of simultaneous packages to test. +# Controls the -p flag for the go test command. +# Not used for Flink, Spark, or Samza runners. Defaults to 3 otherwise. +# --endpoint -> An endpoint for an existing job server outside the script. +# If present, job server jar flags are ignored. +# --test_expansion_jar -> Filepath to jar for an expansion service, for +# runners that support cross-language. The test expansion service is one +# that can expand test-only cross-language transforms. +# --test_expansion_addr -> An endpoint for an existing test expansion service +# outside the script. If present, --test_expansion_jar is ignored. +# --io_expansion_jar -> Filepath to jar for an expansion service, for +# runners that support cross-language. The IO expansion service is one +# that can expand cross-language transforms for Beam IOs. +# --io_expansion_addr -> An endpoint for an existing expansion service +# outside the script. If present, --io_expansion_jar is ignored. +# --sdk_overrides -> Only needed if performing cross-lanaguage tests with +# a staged SDK harness container. Note for Dataflow: Using this flag +# prevents the script from creating and staging a container. +# --pipeline_opts -> Appends additional pipeline options to the test command, +# in addition to those already added by this script. # -# --expansion_service_jar -> Filepath to jar for expansion service. -# --expansion_addr -> Replaces jar filepath with existing expansion service endpoint. -# -# Execute from the root of the repository. This script requires that necessary -# services can be built from the repository. +# Runner-specific flags: +# Flink +# --flink_job_server_jar -> Filepath to jar, used if runner is Flink. +# Spark +# --spark_job_server_jar -> Filepath to jar, used if runner is Spark. +# Dataflow +# --dataflow_project -> GCP project to run Dataflow jobs on. +# --project -> Same project as --dataflow-project, but in URL format, for +# example in the format "us.gcr.io/". +# --region -> GCP region to run Dataflow jobs on. +# --gcs_location -> GCS URL for storing temporary files for Dataflow jobs. +# --dataflow_worker_jar -> The Dataflow worker jar to use when running jobs. +# If not specified, the script attempts to retrieve a previously built +# jar from the appropriate gradle module, which may not succeed. set -e set -v +# Default test targets. +TESTS="./test/integration/... ./test/regression" + +# Default runner. RUNNER=portable +# Default timeout. This timeout is applied per-package, as tests in different +# packages are executed in parallel. +TIMEOUT=1h + +# Default limit on simultaneous test binaries/packages being executed. +SIMULTANEOUS=3 + +# Where to store integration test outputs. +GCS_LOCATION=gs://temp-storage-for-end-to-end-tests + +# Project for the container and integration test +PROJECT=apache-beam-testing +DATAFLOW_PROJECT=apache-beam-testing +REGION=us-central1 + # Set up trap to close any running background processes when script ends. exit_background_processes () { - if [[ -n "$JOBSERVER_PID" ]]; then - kill -s SIGKILL $JOBSERVER_PID + if [[ ! -z "$JOBSERVER_PID" ]]; then + kill -9 $JOBSERVER_PID || true fi - if [[ -n "$EXPANSION_PID" ]]; then - kill -s SIGKILL $EXPANSION_PID + if [[ ! -z "$TEST_EXPANSION_PID" ]]; then + kill -9 $TEST_EXPANSION_PID + fi + if [[ ! 
-z "$IO_EXPANSION_PID" ]]; then + kill -9 $IO_EXPANSION_PID fi } trap exit_background_processes SIGINT SIGTERM EXIT @@ -57,16 +111,61 @@ while [[ $# -gt 0 ]] do key="$1" case $key in + --tests) + TESTS="$2" + shift # past argument + shift # past value + ;; --runner) RUNNER="$2" shift # past argument shift # past value ;; + --timeout) + TIMEOUT="$2" + shift # past argument + shift # past value + ;; + --simultaneous) + SIMULTANEOUS="$2" + shift # past argument + shift # past value + ;; + --project) + PROJECT="$2" + shift # past argument + shift # past value + ;; + --region) + REGION="$2" + shift # past argument + shift # past value + ;; + --dataflow_project) + DATAFLOW_PROJECT="$2" + shift # past argument + shift # past value + ;; + --gcs_location) + GCS_LOCATION="$2" + shift # past argument + shift # past value + ;; + --dataflow_worker_jar) + DATAFLOW_WORKER_JAR="$2" + shift # past argument + shift # past value + ;; --flink_job_server_jar) FLINK_JOB_SERVER_JAR="$2" shift # past argument shift # past value ;; + --samza_job_server_jar) + SAMZA_JOB_SERVER_JAR="$2" + shift # past argument + shift # past value + ;; --spark_job_server_jar) SPARK_JOB_SERVER_JAR="$2" shift # past argument @@ -77,13 +176,33 @@ case $key in shift # past argument shift # past value ;; - --expansion_service_jar) - EXPANSION_SERVICE_JAR="$2" + --test_expansion_jar) + TEST_EXPANSION_JAR="$2" + shift # past argument + shift # past value + ;; + --test_expansion_addr) + TEST_EXPANSION_ADDR="$2" + shift # past argument + shift # past value + ;; + --io_expansion_jar) + IO_EXPANSION_JAR="$2" + shift # past argument + shift # past value + ;; + --io_expansion_addr) + IO_EXPANSION_ADDR="$2" + shift # past argument + shift # past value + ;; + --sdk_overrides) + SDK_OVERRIDES="$2" shift # past argument shift # past value ;; - --expansion_addr) - EXPANSION_ADDR="$2" + --pipeline_opts) + PIPELINE_OPTS="$2" shift # past argument shift # past value ;; @@ -100,7 +219,6 @@ cd $(git rev-parse --show-toplevel) # Verify in the root of the repository test -d sdks/go/test - # Hacky python script to find a free port. Note there is a small chance the chosen port could # get taken before being claimed by the job server. SOCKET_SCRIPT=" @@ -112,10 +230,29 @@ s.close() " # Set up environment based on runner. -ARGS=--runner=$RUNNER -if [[ "$RUNNER" == "flink" || "$RUNNER" == "spark" || "$RUNNER" == "portable" ]]; then +if [[ "$RUNNER" == "dataflow" ]]; then + if [[ -z "$DATAFLOW_WORKER_JAR" ]]; then + DATAFLOW_WORKER_JAR=$(find $(pwd)/runners/google-cloud-dataflow-java/worker/build/libs/beam-runners-google-cloud-dataflow-java-fn-api-worker-*.jar) + fi + echo "Using Dataflow worker jar: $DATAFLOW_WORKER_JAR" + + if [[ -z "$TEST_EXPANSION_ADDR" && -n "$TEST_EXPANSION_JAR" ]]; then + EXPANSION_PORT=$(python3 -c "$SOCKET_SCRIPT") + TEST_EXPANSION_ADDR="localhost:$EXPANSION_PORT" + echo "No test expansion address specified; starting a new test expansion server on $TEST_EXPANSION_ADDR" + java -jar $TEST_EXPANSION_JAR $EXPANSION_PORT & + TEST_EXPANSION_PID=$! + fi + if [[ -z "$IO_EXPANSION_ADDR" && -n "$IO_EXPANSION_JAR" ]]; then + EXPANSION_PORT=$(python3 -c "$SOCKET_SCRIPT") + IO_EXPANSION_ADDR="localhost:$EXPANSION_PORT" + echo "No IO expansion address specified; starting a new IO expansion server on $IO_EXPANSION_ADDR" + java -jar $IO_EXPANSION_JAR $EXPANSION_PORT & + IO_EXPANSION_PID=$! 
+ fi +elif [[ "$RUNNER" == "flink" || "$RUNNER" == "spark" || "$RUNNER" == "samza" || "$RUNNER" == "portable" ]]; then if [[ -z "$ENDPOINT" ]]; then - JOB_PORT=$(python -c "$SOCKET_SCRIPT") + JOB_PORT=$(python3 -c "$SOCKET_SCRIPT") ENDPOINT="localhost:$JOB_PORT" echo "No endpoint specified; starting a new $RUNNER job server on $ENDPOINT" if [[ "$RUNNER" == "flink" ]]; then @@ -125,6 +262,12 @@ if [[ "$RUNNER" == "flink" || "$RUNNER" == "spark" || "$RUNNER" == "portable" ]] --job-port $JOB_PORT \ --expansion-port 0 \ --artifact-port 0 & + elif [[ "$RUNNER" == "samza" ]]; then + java \ + -jar $SAMZA_JOB_SERVER_JAR \ + --job-port $JOB_PORT \ + --expansion-port 0 \ + --artifact-port 0 & elif [[ "$RUNNER" == "spark" ]]; then java \ -jar $SPARK_JOB_SERVER_JAR \ @@ -133,7 +276,7 @@ if [[ "$RUNNER" == "flink" || "$RUNNER" == "spark" || "$RUNNER" == "portable" ]] --expansion-port 0 \ --artifact-port 0 & elif [[ "$RUNNER" == "portable" ]]; then - python \ + python3 \ -m apache_beam.runners.portability.local_job_service_main \ --port $JOB_PORT & else @@ -143,16 +286,132 @@ if [[ "$RUNNER" == "flink" || "$RUNNER" == "spark" || "$RUNNER" == "portable" ]] JOBSERVER_PID=$! fi - if [[ -z "$EXPANSION_ADDR" ]]; then - EXPANSION_PORT=$(python -c "$SOCKET_SCRIPT") - EXPANSION_ADDR="localhost:$EXPANSION_PORT" - echo "No expansion address specified; starting a new expansion server on $EXPANSION_ADDR" - java -jar $EXPANSION_SERVICE_JAR $EXPANSION_PORT & - EXPANSION_PID=$! + if [[ -z "$TEST_EXPANSION_ADDR" && -n "$TEST_EXPANSION_JAR" ]]; then + EXPANSION_PORT=$(python3 -c "$SOCKET_SCRIPT") + TEST_EXPANSION_ADDR="localhost:$EXPANSION_PORT" + echo "No test expansion address specified; starting a new test expansion server on $TEST_EXPANSION_ADDR" + java -jar $TEST_EXPANSION_JAR $EXPANSION_PORT & + TEST_EXPANSION_PID=$! + fi + if [[ -z "$IO_EXPANSION_ADDR" && -n "$IO_EXPANSION_JAR" ]]; then + EXPANSION_PORT=$(python3 -c "$SOCKET_SCRIPT") + IO_EXPANSION_ADDR="localhost:$EXPANSION_PORT" + echo "No IO expansion address specified; starting a new IO expansion server on $IO_EXPANSION_ADDR" + java -jar $IO_EXPANSION_JAR $EXPANSION_PORT & + IO_EXPANSION_PID=$! + fi +fi + +# Disable parallelism on runners that don't support it. +if [[ "$RUNNER" == "flink" || "$RUNNER" == "spark" || "$RUNNER" == "samza" ]]; then + SIMULTANEOUS=1 +fi + +if [[ "$RUNNER" == "dataflow" ]]; then + # Verify docker and gcloud commands exist + command -v docker + docker -v + command -v gcloud + gcloud --version + + # ensure gcloud is version 186 or above + TMPDIR=$(mktemp -d) + gcloud_ver=$(gcloud -v | head -1 | awk '{print $4}') + if [[ "$gcloud_ver" < "186" ]] + then + pushd $TMPDIR + curl https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-186.0.0-linux-x86_64.tar.gz --output gcloud.tar.gz + tar xf gcloud.tar.gz + ./google-cloud-sdk/install.sh --quiet + . 
./google-cloud-sdk/path.bash.inc + popd + gcloud components update --quiet || echo 'gcloud components update failed' + gcloud -v + fi + + # Build the container + TAG=$(date +%Y%m%d-%H%M%S) + CONTAINER=us.gcr.io/$PROJECT/$USER/beam_go_sdk + echo "Using container $CONTAINER" + ./gradlew :sdks:go:container:docker -Pdocker-repository-root=us.gcr.io/$PROJECT/$USER -Pdocker-tag=$TAG + + # Verify it exists + docker images | grep $TAG + + # Push the container + gcloud docker -- push $CONTAINER:$TAG + + if [[ -n "$TEST_EXPANSION_ADDR" || -n "$IO_EXPANSION_ADDR" ]]; then + ARGS="$ARGS --experiments=use_portable_job_submission" + + if [[ -z "$SDK_OVERRIDES" ]]; then + # Build the java container for cross-language + JAVA_TAG=$(date +%Y%m%d-%H%M%S) + JAVA_CONTAINER=us.gcr.io/$PROJECT/$USER/beam_java11_sdk + echo "Using container $JAVA_CONTAINER for cross-language java transforms" + ./gradlew :sdks:java:container:java11:docker -Pdocker-repository-root=us.gcr.io/$PROJECT/$USER -Pdocker-tag=$JAVA_TAG + + # Verify it exists + docker images | grep $JAVA_TAG + + # Push the container + gcloud docker -- push $JAVA_CONTAINER:$JAVA_TAG + + SDK_OVERRIDES=".*java.*,$JAVA_CONTAINER:$JAVA_TAG" + fi + fi +else + TAG=dev + ./gradlew :sdks:go:container:docker -Pdocker-tag=$TAG + CONTAINER=apache/beam_go_sdk +fi + +# The go test flag -p dictates the number of simultaneous test binaries running tests. +# Note that --parallel indicates within a test binary level of parallism. +ARGS="$ARGS -p $SIMULTANEOUS" + +# Assemble test arguments and pipeline options. +ARGS="$ARGS --timeout=$TIMEOUT" +ARGS="$ARGS --runner=$RUNNER" +ARGS="$ARGS --project=$DATAFLOW_PROJECT" +ARGS="$ARGS --region=$REGION" +ARGS="$ARGS --environment_type=DOCKER" +ARGS="$ARGS --environment_config=$CONTAINER:$TAG" +ARGS="$ARGS --staging_location=$GCS_LOCATION/staging-validatesrunner-test" +ARGS="$ARGS --temp_location=$GCS_LOCATION/temp-validatesrunner-test" +ARGS="$ARGS --dataflow_worker_jar=$DATAFLOW_WORKER_JAR" +ARGS="$ARGS --endpoint=$ENDPOINT" +if [[ -n "$TEST_EXPANSION_ADDR" ]]; then + ARGS="$ARGS --test_expansion_addr=$TEST_EXPANSION_ADDR" +fi +if [[ -n "$IO_EXPANSION_ADDR" ]]; then + ARGS="$ARGS --io_expansion_addr=$IO_EXPANSION_ADDR" +fi +if [[ -n "$SDK_OVERRIDES" ]]; then + OVERRIDE=--sdk_harness_container_image_override="$SDK_OVERRIDES" + ARGS="$ARGS $OVERRIDE" +fi +ARGS="$ARGS $PIPELINE_OPTS" + +cd sdks/go +echo ">>> RUNNING $RUNNER integration tests with pipeline options: $ARGS" +go test -v $TESTS $ARGS \ + || TEST_EXIT_CODE=$? # don't fail fast here; clean up environment before exiting +cd ../.. + +if [[ "$RUNNER" == "dataflow" ]]; then + # Delete the container locally and remotely + docker rmi $CONTAINER:$TAG || echo "Failed to remove container" + gcloud --quiet container images delete $CONTAINER:$TAG || echo "Failed to delete container" + + if [[ -n "$TEST_EXPANSION_ADDR" || -n "$IO_EXPANSION_ADDR" ]]; then + # Delete the java cross-language container locally and remotely + docker rmi $JAVA_CONTAINER:$JAVA_TAG || echo "Failed to remove container" + gcloud --quiet container images delete $JAVA_CONTAINER:$JAVA_TAG || echo "Failed to delete container" fi - ARGS="$ARGS --endpoint=$ENDPOINT --expansion_addr=$EXPANSION_ADDR" + # Clean up tempdir + rm -rf $TMPDIR fi -echo ">>> RUNNING $RUNNER VALIDATESRUNNER TESTS" -go test ./sdks/go/test/validatesrunner/... 
$ARGS +exit $TEST_EXIT_CODE diff --git a/sdks/go/test/validatesrunner/reshuffle_test.go b/sdks/go/test/validatesrunner/reshuffle_test.go deleted file mode 100644 index a3b37295b562..000000000000 --- a/sdks/go/test/validatesrunner/reshuffle_test.go +++ /dev/null @@ -1,130 +0,0 @@ -// Licensed to the Apache Software Foundation (ASF) under one or more -// contributor license agreements. See the NOTICE file distributed with -// this work for additional information regarding copyright ownership. -// The ASF licenses this file to You under the Apache License, Version 2.0 -// (the "License"); you may not use this file except in compliance with -// the License. You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. - -package validatesrunner - -import ( - "fmt" - "testing" - - "github.com/apache/beam/sdks/go/pkg/beam" - "github.com/apache/beam/sdks/go/pkg/beam/testing/passert" - "github.com/apache/beam/sdks/go/pkg/beam/testing/ptest" -) - -func init() { - beam.RegisterFunction(genA) - beam.RegisterFunction(genB) - beam.RegisterFunction(genC) - beam.RegisterFunction(joinFn) - beam.RegisterFunction(sum) - beam.RegisterFunction(lenSum) - beam.RegisterFunction(splitFn) -} - -func genA(_ []byte, emit func(string, int)) { - emit("a", 1) - emit("a", 2) - emit("a", 3) - emit("b", 4) - emit("b", 5) - emit("c", 6) -} - -func genB(_ []byte, emit func(string, int)) { - emit("a", 7) - emit("b", 8) - emit("d", 9) -} - -func genC(_ []byte, emit func(string, string)) { - emit("a", "alpha") - emit("c", "charlie") - emit("d", "delta") -} - -func sum(nums func(*int) bool) int { - var ret, i int - for nums(&i) { - ret += i - } - return ret -} - -func lenSum(strings func(*string) bool) int { - var ret int - var s string - for strings(&s) { - ret += len(s) - } - return ret -} - -func joinFn(key string, as, bs func(*int) bool, cs func(*string) bool, emit func(string, int)) { - emit(key, sum(as)+sum(bs)+lenSum(cs)) -} - -func splitFn(key string, v int, a, b, c, d func(int)) { - switch key { - case "a": - a(v) - case "b": - b(v) - case "c": - c(v) - case "d": - d(v) - default: - panic(fmt.Sprintf("bad key: %v", key)) - } -} - -// TestReshuffle tests Reshuffle. -func TestReshuffle(t *testing.T) { - p, s := beam.NewPipelineWithRoot() - - in := beam.Create(s, 1, 2, 3, 4, 5, 6, 7, 8, 9) - in = beam.Reshuffle(s, in) - passert.Sum(s, in, "reshuffled", 9, 45) - - ptest.RunAndValidate(t, p) -} - -// TestReshuffleKV tests Reshuffle with KV PCollections. 
-func TestReshuffleKV(t *testing.T) { - p, s := beam.NewPipelineWithRoot() - - s2 := s.Scope("SubScope") - as := beam.ParDo(s2, genA, beam.Impulse(s)) - bs := beam.ParDo(s2, genB, beam.Impulse(s)) - cs := beam.ParDo(s2, genC, beam.Impulse(s)) - - as = beam.Reshuffle(s2, as) - cs = beam.Reshuffle(s2, cs) - - grouped := beam.CoGroupByKey(s2, as, bs, cs) - joined := beam.ParDo(s2, joinFn, grouped) - - joined = beam.Reshuffle(s, joined) - - a, b, c, d := beam.ParDo4(s, splitFn, joined) - - passert.Sum(s, a, "a", 1, 18) - passert.Sum(s, b, "b", 1, 17) - passert.Sum(s, c, "c", 1, 13) - passert.Sum(s, d, "d", 1, 14) - - ptest.RunAndValidate(t, p) -} diff --git a/sdks/java/bom/build.gradle b/sdks/java/bom/build.gradle index b8a71db095ad..4421897b4e25 100644 --- a/sdks/java/bom/build.gradle +++ b/sdks/java/bom/build.gradle @@ -38,7 +38,7 @@ ext { } for (p in rootProject.subprojects) { - if (!p.path.equals(project.path)) { + if (!p.path.startsWith(project.path)) { evaluationDependsOn(p.path) } } @@ -84,6 +84,13 @@ tasks.whenTaskAdded { task -> // check fails without generating the jar. jar.enabled = true +// Starting in Gradle 6.0, the Gradle module metadata is generated automatically +// Disable generating the metadata until this project uses java-platform to +// publish the BOM (see BEAM-11709) +tasks.withType(GenerateModuleMetadata) { + enabled = false +} + // Remove the default jar archive which is added by the 'java' plugin. configurations.archives.artifacts.with { archives -> def artifacts = [] diff --git a/sdks/java/bom/common.gradle b/sdks/java/bom/common.gradle new file mode 100644 index 000000000000..cf45bdd07d5d --- /dev/null +++ b/sdks/java/bom/common.gradle @@ -0,0 +1,128 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +apply plugin: 'org.apache.beam.module' +apply plugin: 'maven-publish' +apply plugin: 'java-platform' + +javaPlatform { + allowDependencies() +} + +def isRelease(Project project) { + return project.hasProperty('isRelease') +} + +publishing { + publications { + mavenJava(MavenPublication) { + groupId = project.mavenGroupId + artifactId = archivesBaseName + version = project.version + pom { + name = project.description + if (project.hasProperty("summary")) { + description = project.summary + } + url = "https://beam.apache.org" + inceptionYear = "2016" + licenses { + license { + name = "Apache License, Version 2.0" + url = "https://www.apache.org/licenses/LICENSE-2.0.txt" + distribution = "repo" + } + } + scm { + connection = "scm:git:https://gitbox.apache.org/repos/asf/beam.git" + developerConnection = "scm:git:https://gitbox.apache.org/repos/asf/beam.git" + url = "https://gitbox.apache.org/repos/asf?p=beam.git;a=summary" + } + issueManagement { + system = "jira" + url = "https://issues.apache.org/jira/browse/BEAM" + } + mailingLists { + mailingList { + name = "Beam Dev" + subscribe = "dev-subscribe@beam.apache.org" + unsubscribe = "dev-unsubscribe@beam.apache.org" + post = "dev@beam.apache.org" + archive = "https://www.mail-archive.com/dev@beam.apache.org" + } + mailingList { + name = "Beam User" + subscribe = "user-subscribe@beam.apache.org" + unsubscribe = "user-unsubscribe@beam.apache.org" + post = "user@beam.apache.org" + archive = "https://www.mail-archive.com/user@beam.apache.org" + } + mailingList { + name = "Beam Commits" + subscribe = "commits-subscribe@beam.apache.org" + unsubscribe = "commits-unsubscribe@beam.apache.org" + post = "commits@beam.apache.org" + archive = "https://www.mail-archive.com/commits@beam.apache.org" + } + } + developers { + developer { + name = "The Apache Beam Team" + email = "dev@beam.apache.org" + url = "https://beam.apache.org" + organization = "Apache Software Foundation" + organizationUrl = "https://www.apache.org" + } + } + } + + pom.withXml { + def elem = asElement() + def hdr = elem.getOwnerDocument().createComment( + ''' + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +''') + elem.insertBefore(hdr, elem.getFirstChild()) + } + + from components.javaPlatform + } + } + + repositories project.ext.repositories +} + +// Only sign artifacts if we are performing a release +if (isRelease(project) && !project.hasProperty('noSigning')) { + apply plugin: "signing" + signing { + useGpgCmd() + sign publishing.publications + } +} diff --git a/sdks/java/bom/gcp/build.gradle b/sdks/java/bom/gcp/build.gradle new file mode 100644 index 000000000000..b9c16ac72bb0 --- /dev/null +++ b/sdks/java/bom/gcp/build.gradle @@ -0,0 +1,35 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +apply from: '../common.gradle' + +dependencies { + api platform(project(":sdks:java:bom")) + api platform(project.library.java.google_cloud_platform_libraries_bom) + constraints { + api project.library.java.guava + } +} + +publishing { + publications { + mavenJava(MavenPublication) { + artifactId = 'beam-sdks-java-google-cloud-platform-bom' + } + } +} \ No newline at end of file diff --git a/sdks/java/build-tools/beam-linkage-check.sh b/sdks/java/build-tools/beam-linkage-check.sh index 0672c6077ab5..69d25dad15e6 100755 --- a/sdks/java/build-tools/beam-linkage-check.sh +++ b/sdks/java/build-tools/beam-linkage-check.sh @@ -17,11 +17,10 @@ # # This script compares linkage errors (checkJavaLinkage task in the root gradle project) between -# one branch and master branch. -# This is a temporary solution before Linkage Checker implements exclusion rules (BEAM-9206). +# one branch and another. # Usage: -# /bin/bash sdks/java/build-tools/beam-linkage-check.sh +# /bin/bash sdks/java/build-tools/beam-linkage-check.sh origin/master # # By default, this checks the Maven artifacts listed in ARTIFACTS variable below. # @@ -87,7 +86,7 @@ function runLinkageCheck () { MODE=$1 # "baseline" or "validate" for ARTIFACT_LIST in $ARTIFACT_LISTS; do - echo "`date`:" "Running linkage check (${MODE}) for ${ARTIFACT_LISTS}" + echo "`date`:" "Running linkage check (${MODE}) for ${ARTIFACT_LIST}" BASELINE_FILE=${OUTPUT_DIR}/baseline-${ARTIFACT_LIST}.xml if [ "$MODE" = "baseline" ]; then @@ -96,6 +95,12 @@ function runLinkageCheck () { elif [ "$MODE" = "validate" ]; then BASELINE_OPTION="-PjavaLinkageReadBaseline=${BASELINE_FILE}" echo "`date`:" "using baseline $BASELINE_FILE" + if [ ! -r "${BASELINE_FILE}" ]; then + # If baseline generation failed in previous baseline step, no need to build the project. + echo "`date`:" "Error: Baseline file not found" + ACCUMULATED_RESULT=1 + continue + fi else BASELINE_OPTION="" echo "`date`:" "Unexpected mode: ${MODE}. Not using baseline file." @@ -107,7 +112,10 @@ function runLinkageCheck () { RESULT=$? set -e set +x - if [ "$MODE" = "validate" ]; then + if [ "$MODE" = "baseline" ] && [ ! -r "${BASELINE_FILE}" ]; then + echo "`date`:" "Failed to generate the baseline file. Check the build error above." 
+ ACCUMULATED_RESULT=1 + elif [ "$MODE" = "validate" ]; then echo "`date`:" "Done: ${RESULT}" ACCUMULATED_RESULT=$((ACCUMULATED_RESULT | RESULT)) fi diff --git a/sdks/java/build-tools/src/main/resources/beam/checkstyle.xml b/sdks/java/build-tools/src/main/resources/beam/checkstyle.xml index c168453be8fd..0ccd87a806e2 100644 --- a/sdks/java/build-tools/src/main/resources/beam/checkstyle.xml +++ b/sdks/java/build-tools/src/main/resources/beam/checkstyle.xml @@ -119,6 +119,14 @@ page at http://checkstyle.sourceforge.net/config.html --> + + + + + + + + @@ -135,6 +143,15 @@ page at http://checkstyle.sourceforge.net/config.html --> + + + + + + + + + diff --git a/sdks/java/build-tools/src/main/resources/beam/spotbugs-filter.xml b/sdks/java/build-tools/src/main/resources/beam/spotbugs-filter.xml index 939b58c036a6..b28371c6b5fc 100644 --- a/sdks/java/build-tools/src/main/resources/beam/spotbugs-filter.xml +++ b/sdks/java/build-tools/src/main/resources/beam/spotbugs-filter.xml @@ -63,6 +63,9 @@ + + + @@ -72,4 +75,7 @@ + + + diff --git a/sdks/java/build-tools/src/main/resources/beam/suppressions.xml b/sdks/java/build-tools/src/main/resources/beam/suppressions.xml index 0d7c3bfab130..439a628f3c1b 100644 --- a/sdks/java/build-tools/src/main/resources/beam/suppressions.xml +++ b/sdks/java/build-tools/src/main/resources/beam/suppressions.xml @@ -24,6 +24,7 @@ + @@ -86,8 +87,7 @@ - - + @@ -98,6 +98,9 @@ + + + @@ -105,6 +108,7 @@ + @@ -115,8 +119,21 @@ + + + + + + + + + + + + + diff --git a/sdks/java/container/Dockerfile b/sdks/java/container/Dockerfile index 95f20cdfb5d5..1330cfdcf176 100644 --- a/sdks/java/container/Dockerfile +++ b/sdks/java/container/Dockerfile @@ -48,4 +48,7 @@ RUN if [ "${pull_licenses}" = "false" ] ; then \ rm -rf /opt/apache/beam/third_party_licenses ; \ fi +# Add Google Cloud Profiler agent. 
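+# The profiler agent is not checked into the repository; it appears to be
+# fetched at image build time by the container project's
+# downloadCloudProfilerAgent Gradle task (added in this change), which untars
+# the published profiler_java_agent.tar.gz into build/target/profiler/ so the
+# files are available under target/ for the COPY below.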
+COPY target/profiler/* /opt/google_cloud_profiler/ + ENTRYPOINT ["/opt/apache/beam/boot"] diff --git a/sdks/java/container/boot.go b/sdks/java/container/boot.go index 3787a437a4b8..70335d328a75 100644 --- a/sdks/java/container/boot.go +++ b/sdks/java/container/boot.go @@ -20,19 +20,20 @@ package main import ( "context" "flag" + "fmt" "log" "os" "path/filepath" "strconv" "strings" - "github.com/apache/beam/sdks/go/pkg/beam/artifact" - fnpb "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" - "github.com/apache/beam/sdks/go/pkg/beam/provision" - "github.com/apache/beam/sdks/go/pkg/beam/util/execx" - "github.com/apache/beam/sdks/go/pkg/beam/util/grpcx" - "github.com/apache/beam/sdks/go/pkg/beam/util/syscallx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/artifact" + fnpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/fnexecution_v1" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/provision" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/execx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/grpcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/syscallx" "github.com/golang/protobuf/proto" ) @@ -47,6 +48,13 @@ var ( semiPersistDir = flag.String("semi_persist_dir", "/tmp", "Local semi-persistent directory (optional).") ) +const ( + enableGoogleCloudProfilerOption = "enable_google_cloud_profiler" + enableGoogleCloudHeapSamplingOption = "enable_google_cloud_heap_sampling" + googleCloudProfilerAgentBaseArgs = "-agentpath:/opt/google_cloud_profiler/profiler_java_agent.so=-logtostderr,-cprof_service=%s,-cprof_service_version=%s" + googleCloudProfilerAgentHeapArgs = googleCloudProfilerAgentBaseArgs + ",-cprof_enable_heap_sampling,-cprof_heap_sampling_interval=2097152" +) + func main() { flag.Parse() if *id == "" { @@ -97,7 +105,12 @@ func main() { // (2) Retrieve the staged user jars. We ignore any disk limit, // because the staged jars are mandatory. - dir := filepath.Join(*semiPersistDir, "staged") + // Using the SDK Harness ID in the artifact destination path to make sure that dependencies used by multiple + // SDK Harnesses in the same VM do not conflict. This is needed since some runners (for example, Dataflow) + // may share the artifact staging directory across multiple SDK Harnesses + // TODO(BEAM-9455): consider removing the SDK Harness ID from the staging path after Dataflow can properly + // seperate out dependencies per environment. 
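+	// For example (illustrative values only): with the default
+	// --semi_persist_dir=/tmp and two harnesses with IDs "1" and "2" on the
+	// same VM, artifacts are materialized under /tmp/1/staged/ and
+	// /tmp/2/staged/ instead of a single shared /tmp/staged/ directory.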
+ dir := filepath.Join(*semiPersistDir, *id, "staged") artifacts, err := artifact.Materialize(ctx, *artifactEndpoint, info.GetDependencies(), info.GetRetrievalToken(), dir) if err != nil { @@ -110,6 +123,7 @@ func main() { os.Setenv("PIPELINE_OPTIONS", options) os.Setenv("LOGGING_API_SERVICE_DESCRIPTOR", proto.MarshalTextString(&pipepb.ApiServiceDescriptor{Url: *loggingEndpoint})) os.Setenv("CONTROL_API_SERVICE_DESCRIPTOR", proto.MarshalTextString(&pipepb.ApiServiceDescriptor{Url: *controlEndpoint})) + os.Setenv("RUNNER_CAPABILITIES", strings.Join(info.GetRunnerCapabilities(), " ")) if info.GetStatusEndpoint() != nil { os.Setenv("STATUS_API_SERVICE_DESCRIPTOR", proto.MarshalTextString(info.GetStatusEndpoint())) @@ -125,25 +139,49 @@ func main() { } var hasWorkerExperiment = strings.Contains(options, "use_staged_dataflow_worker_jar") - for _, md := range artifacts { + for _, a := range artifacts { + name, _ := artifact.MustExtractFilePayload(a) if hasWorkerExperiment { - if strings.HasPrefix(md.Name, "beam-runners-google-cloud-dataflow-java-fn-api-worker") { + if strings.HasPrefix(name, "beam-runners-google-cloud-dataflow-java-fn-api-worker") { continue } - if md.Name == "dataflow-worker.jar" { + if name == "dataflow-worker.jar" { continue } } - cp = append(cp, filepath.Join(dir, filepath.FromSlash(md.Name))) + cp = append(cp, filepath.Join(dir, filepath.FromSlash(name))) } args := []string{ "-Xmx" + strconv.FormatUint(heapSizeLimit(info), 10), "-XX:-OmitStackTraceInFastThrow", "-cp", strings.Join(cp, ":"), - "org.apache.beam.fn.harness.FnHarness", } + enableGoogleCloudProfiler := strings.Contains(options, enableGoogleCloudProfilerOption) + enableGoogleCloudHeapSampling := strings.Contains(options, enableGoogleCloudHeapSamplingOption) + if enableGoogleCloudProfiler { + if metadata := info.GetMetadata(); metadata != nil { + if jobName, nameExists := metadata["job_name"]; nameExists { + if jobId, idExists := metadata["job_id"]; idExists { + if enableGoogleCloudHeapSampling { + args = append(args, fmt.Sprintf(googleCloudProfilerAgentHeapArgs, jobName, jobId)) + } else { + args = append(args, fmt.Sprintf(googleCloudProfilerAgentBaseArgs, jobName, jobId)) + } + log.Printf("Turning on Cloud Profiling. Profile heap: %t", enableGoogleCloudHeapSampling) + } else { + log.Println("Required job_id missing from metadata, profiling will not be enabled without it.") + } + } else { + log.Println("Required job_name missing from metadata, profiling will not be enabled without it.") + } + } else { + log.Println("enable_google_cloud_profiler is set to true, but no metadata is received from provision server, profiling will not be enabled.") + } + } + + args = append(args, "org.apache.beam.fn.harness.FnHarness") log.Printf("Executing: java %v", strings.Join(args, " ")) log.Fatalf("Java exited: %v", execx.Execute("java", args...)) diff --git a/sdks/java/container/build.gradle b/sdks/java/container/build.gradle index 3919816d4c16..23cfd4219f87 100644 --- a/sdks/java/container/build.gradle +++ b/sdks/java/container/build.gradle @@ -18,7 +18,7 @@ plugins { id 'org.apache.beam.module' - id 'com.github.jk1.dependency-license-report' version '1.13' + id 'com.github.jk1.dependency-license-report' version '1.16' } applyGoNature() @@ -28,15 +28,14 @@ evaluationDependsOn(":sdks:java:io:kafka") description = "Apache Beam :: SDKs :: Java :: Container" -// Figure out why the golang plugin does not add a build dependency between projects. 
-// Without the line below, we get spurious errors about not being able to resolve -// "./github.com/apache/beam/sdks/go" -resolveBuildDependencies.dependsOn ":sdks:go:goBuild" +// Disable gogradle's dependency resolution so it uses Go modules instead. +installDependencies.enabled = false +resolveBuildDependencies.enabled = false +resolveTestDependencies.enabled = false configurations { dockerDependency sdkHarnessLauncher - pulledLicenses } dependencies { @@ -56,7 +55,7 @@ dependencies { } golang { - packagePath = 'github.com/apache/beam/sdks/java/boot' + packagePath = 'github.com/apache/beam/sdks/v2/java/container' goBuild { // TODO(herohde): build local platform + linux-amd64, if possible. targetPlatform = ['linux-amd64'] @@ -85,7 +84,17 @@ task pullLicenses(type: Exec) { outputs.dir('build/target/java_third_party_licenses') } +task downloadCloudProfilerAgent(type: Exec) { + executable "sh" + args "-c", "wget -q -O- https://storage.googleapis.com/cloud-profiler/java/latest/profiler_java_agent.tar.gz | tar xzv -C build/target/profiler/" + outputs.dir('build/target/profiler') +} + artifacts { sdkHarnessLauncher file: file('./build/target'), builtBy: goBuild - pulledLicenses file: file('./build/target'), builtBy: pullLicenses -} \ No newline at end of file +} + +task pushAll { + dependsOn ":sdks:java:container:java8:dockerPush" + dependsOn ":sdks:java:container:java11:dockerPush" +} diff --git a/sdks/java/container/common.gradle b/sdks/java/container/common.gradle index a18cdde44783..6f828ece9259 100644 --- a/sdks/java/container/common.gradle +++ b/sdks/java/container/common.gradle @@ -40,7 +40,6 @@ configurations { } dependencies { - pulledLicenses project(path: ":sdks:java:container", configuration: "pulledLicenses") dockerDependency project(path: ":sdks:java:container", configuration: "dockerDependency") sdkHarnessLauncher project(path: ":sdks:java:container", configuration: "sdkHarnessLauncher") } @@ -60,9 +59,10 @@ task copySdkHarnessLauncher(type: Copy) { into "build/target" } -task copyPulledLicenses(type: Copy) { - from configurations.pulledLicenses - into "build/target" +task copyJavaThirdPartyLicenses(type: Copy) { + from("${project(':sdks:java:container').buildDir}/target/third_party_licenses") + into "build/target/third_party_licenses" + dependsOn ':sdks:java:container:pullLicenses' } task copyGolangLicenses(type: Copy) { @@ -73,7 +73,7 @@ task copyGolangLicenses(type: Copy) { task skipPullLicenses(type: Exec) { executable "sh" - args "-c", "mkdir -p build/target/third_party_licenses && touch build/target/third_party_licenses/skip" + args "-c", "mkdir -p build/target/go-licenses build/target/third_party_licenses && touch build/target/third_party_licenses/skip" } docker { @@ -84,6 +84,8 @@ docker { project.docker_image_default_repo_root, tag: project.rootProject.hasProperty(["docker-tag"]) ? 
project.rootProject["docker-tag"] : project.sdk_version) + // tags used by dockerTag task + tags containerImageTags() dockerfile project.file("../Dockerfile") files "./build/" buildArgs([ @@ -95,10 +97,11 @@ docker { if (project.rootProject.hasProperty(["docker-pull-licenses"]) || project.rootProject.hasProperty(["isRelease"])) { - dockerPrepare.dependsOn copyPulledLicenses + dockerPrepare.dependsOn copyJavaThirdPartyLicenses dockerPrepare.dependsOn copyGolangLicenses } else { dockerPrepare.dependsOn skipPullLicenses } dockerPrepare.dependsOn copySdkHarnessLauncher dockerPrepare.dependsOn copyDockerfileDependencies +dockerPrepare.dependsOn ":sdks:java:container:downloadCloudProfilerAgent" diff --git a/sdks/java/container/license_scripts/dep_urls_java.yaml b/sdks/java/container/license_scripts/dep_urls_java.yaml index 280b1a8cee79..14dbb3e74188 100644 --- a/sdks/java/container/license_scripts/dep_urls_java.yaml +++ b/sdks/java/container/license_scripts/dep_urls_java.yaml @@ -41,12 +41,18 @@ jaxen: '1.1.6': type: "3-Clause BSD" libraries-bom: - '13.2.0': + '22.0.0': license: "https://raw.githubusercontent.com/GoogleCloudPlatform/cloud-opensource-java/master/LICENSE" type: "Apache License 2.0" paranamer: '2.7': license: "https://raw.githubusercontent.com/paul-hammant/paranamer/master/LICENSE.txt" + '2.8': + license: "https://raw.githubusercontent.com/paul-hammant/paranamer/master/LICENSE.txt" xz: - '1.8': - license: "https://git.tukaani.org/?p=xz-java.git;a=blob_plain;f=COPYING;h=c1d404dc7a6f06a0437bf1055fedaa4a4c89d728;hb=HEAD" + '1.5': + license: "https://git.tukaani.org/?p=xz-java.git;a=blob;f=COPYING;h=c1d404dc7a6f06a0437bf1055fedaa4a4c89d728;hb=9f1f97a26f090ffec6568c004a38c6534aa82b94" +jackson-bom: + '2.12.4': + license: "https://raw.githubusercontent.com/FasterXML/jackson-bom/master/LICENSE" + type: "Apache License 2.0" diff --git a/sdks/java/container/license_scripts/pull_licenses_java.py b/sdks/java/container/license_scripts/pull_licenses_java.py index 3ca24bde45ed..696a916392fe 100644 --- a/sdks/java/container/license_scripts/pull_licenses_java.py +++ b/sdks/java/container/license_scripts/pull_licenses_java.py @@ -38,7 +38,7 @@ from tenacity import wait_fixed from urllib.request import urlopen, URLError, HTTPError -SOURCE_CODE_REQUIRED_LICENSES = ['lgpl', 'glp', 'cddl', 'mpl', 'gnu', 'mozilla public license'] +SOURCE_CODE_REQUIRED_LICENSES = ['lgpl', 'gpl', 'cddl', 'mpl', 'gnu', 'mozilla public license'] RETRY_NUM = 9 THREADS = 16 diff --git a/sdks/java/core/build.gradle b/sdks/java/core/build.gradle index e8b07f66acae..0d01c1fbadba 100644 --- a/sdks/java/core/build.gradle +++ b/sdks/java/core/build.gradle @@ -24,7 +24,6 @@ applyJavaNature( 'MergingActiveWindowSetTest': 'https://github.com/typetools/checker-framework/issues/3776', 'WindowFnTestUtils': 'https://github.com/typetools/checker-framework/issues/3776', ], - checkerTooSlowOnTests: true, shadowClosure: { dependencies { include(dependency("org.apache.commons:.*")) @@ -49,12 +48,15 @@ interface for processing virtually any size data. 
This artifact includes entire Apache Beam Java SDK.""" processResources { + inputs.property('version', version) + inputs.property('sdk_version', sdk_version) + inputs.property('docker_image_default_repo_root', docker_image_default_repo_root) + inputs.property('docker_image_default_repo_prefix', docker_image_default_repo_prefix) filter org.apache.tools.ant.filters.ReplaceTokens, tokens: [ 'pom.version': version, 'pom.sdk_version': sdk_version, 'pom.docker_image_default_repo_root': docker_image_default_repo_root, 'pom.docker_image_default_repo_prefix': docker_image_default_repo_prefix, - 'timestamp': new Date().format("yyyy-MM-dd HH:mm") ] } @@ -68,16 +70,19 @@ test { dependencies { antlr library.java.antlr + // antlr is used to generate code from sdks/java/core/src/main/antlr/ + permitUnusedDeclared library.java.antlr // Required to load constants from the model, e.g. max timestamp for global window shadow project(path: ":model:pipeline", configuration: "shadow") shadow project(path: ":model:job-management", configuration: "shadow") - shadow library.java.vendored_bytebuddy_1_10_8 - shadow library.java.vendored_grpc_1_26_0 + shadow library.java.vendored_bytebuddy_1_11_0 + shadow library.java.vendored_grpc_1_36_0 shadow library.java.vendored_guava_26_0_jre compile library.java.antlr_runtime compile library.java.commons_compress compile library.java.commons_lang3 shadow library.java.jsr305 + shadow library.java.error_prone_annotations shadow library.java.jackson_core shadow library.java.jackson_annotations shadow library.java.jackson_databind @@ -85,11 +90,10 @@ dependencies { shadow library.java.avro shadow library.java.snappy_java shadow library.java.joda_time - shadow "org.tukaani:xz:1.8" provided library.java.junit provided library.java.hamcrest_core provided library.java.hamcrest_library - provided 'io.airlift:aircompressor:0.16' + provided 'io.airlift:aircompressor:0.18' provided 'com.facebook.presto.hadoop:hadoop-apache2:3.2.0-1' shadowTest library.java.jackson_dataformat_yaml shadowTest library.java.guava_testlib diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/Pipeline.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/Pipeline.java index 3c5db964bb12..f48d8846a589 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/Pipeline.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/Pipeline.java @@ -45,6 +45,7 @@ import org.apache.beam.sdk.schemas.SchemaRegistry; import org.apache.beam.sdk.transforms.Create; import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; import org.apache.beam.sdk.util.UserCodeException; import org.apache.beam.sdk.values.PBegin; import org.apache.beam.sdk.values.PCollection; @@ -519,7 +520,7 @@ private Pipeline(TransformHierarchy transforms, PipelineOptions options) { } protected Pipeline(PipelineOptions options) { - this(new TransformHierarchy(), options); + this(new TransformHierarchy(ResourceHints.fromOptions(options)), options); } @Override diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroCoder.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroCoder.java index f6508e624c35..b2367837cc45 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroCoder.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroCoder.java @@ -54,6 +54,9 @@ import org.apache.avro.reflect.ReflectDatumWriter; import org.apache.avro.reflect.Union; import org.apache.avro.specific.SpecificData; +import 
org.apache.avro.specific.SpecificDatumReader; +import org.apache.avro.specific.SpecificDatumWriter; +import org.apache.avro.specific.SpecificRecord; import org.apache.avro.util.ClassUtils; import org.apache.avro.util.Utf8; import org.apache.beam.sdk.util.EmptyOnDeserializationThreadLocal; @@ -125,7 +128,7 @@ public static AvroCoder of(TypeDescriptor type) { * @param the element type */ public static AvroCoder of(Class clazz) { - return new AvroCoder<>(clazz, new ReflectData(clazz.getClassLoader()).getSchema(clazz)); + return of(clazz, false); } /** @@ -136,6 +139,16 @@ public static AvroGenericCoder of(Schema schema) { return AvroGenericCoder.of(schema); } + /** + * Returns an {@code AvroCoder} instance for the given class using Avro's Reflection API for + * encoding and decoding. + * + * @param the element type + */ + public static AvroCoder of(Class type, boolean useReflectApi) { + return of(type, new ReflectData(type.getClassLoader()).getSchema(type), useReflectApi); + } + /** * Returns an {@code AvroCoder} instance for the provided element type using the provided Avro * schema. @@ -145,7 +158,17 @@ public static AvroGenericCoder of(Schema schema) { * @param the element type */ public static AvroCoder of(Class type, Schema schema) { - return new AvroCoder<>(type, schema); + return of(type, schema, false); + } + + /** + * Returns an {@code AvroCoder} instance for the given class and schema using Avro's Reflection + * API for encoding and decoding. + * + * @param the element type + */ + public static AvroCoder of(Class type, Schema schema, boolean useReflectApi) { + return new AvroCoder<>(type, schema, useReflectApi); } /** @@ -267,6 +290,10 @@ public ReflectData get() { private final Supplier reflectData; protected AvroCoder(Class type, Schema schema) { + this(type, schema, false); + } + + protected AvroCoder(Class type, Schema schema, boolean useReflectApi) { this.type = type; this.schemaSupplier = new SerializableSchemaSupplier(schema); typeDescriptor = TypeDescriptor.of(type); @@ -286,10 +313,13 @@ protected AvroCoder(Class type, Schema schema) { @Override public DatumReader initialValue() { - return myCoder.getType().equals(GenericRecord.class) - ? new GenericDatumReader<>(myCoder.getSchema()) - : new ReflectDatumReader<>( - myCoder.getSchema(), myCoder.getSchema(), myCoder.reflectData.get()); + if (myCoder.getType().equals(GenericRecord.class)) { + return new GenericDatumReader<>(myCoder.getSchema()); + } else if (SpecificRecord.class.isAssignableFrom(myCoder.getType()) && !useReflectApi) { + return new SpecificDatumReader<>(myCoder.getType()); + } + return new ReflectDatumReader<>( + myCoder.getSchema(), myCoder.getSchema(), myCoder.reflectData.get()); } }; @@ -299,9 +329,12 @@ public DatumReader initialValue() { @Override public DatumWriter initialValue() { - return myCoder.getType().equals(GenericRecord.class) - ? 
new GenericDatumWriter<>(myCoder.getSchema()) - : new ReflectDatumWriter<>(myCoder.getSchema(), myCoder.reflectData.get()); + if (myCoder.getType().equals(GenericRecord.class)) { + return new GenericDatumWriter<>(myCoder.getSchema()); + } else if (SpecificRecord.class.isAssignableFrom(myCoder.getType()) && !useReflectApi) { + return new SpecificDatumWriter<>(myCoder.getType()); + } + return new ReflectDatumWriter<>(myCoder.getSchema(), myCoder.reflectData.get()); } }; } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/DequeCoder.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/DequeCoder.java new file mode 100644 index 000000000000..81f706881416 --- /dev/null +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/DequeCoder.java @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.coders; + +import java.util.ArrayDeque; +import java.util.Deque; +import java.util.List; +import org.apache.beam.sdk.values.TypeDescriptor; +import org.apache.beam.sdk.values.TypeParameter; + +/** + * A {@link Coder} for {@link Deque}, using the format of {@link IterableLikeCoder}. + * + * @param the type of the elements of the Deques being transcoded + */ +public class DequeCoder extends IterableLikeCoder> { + + public static DequeCoder of(Coder elemCoder) { + return new DequeCoder<>(elemCoder); + } + + ///////////////////////////////////////////////////////////////////////////// + // Internal operations below here. + + @Override + protected Deque decodeToIterable(List decodedElements) { + return new ArrayDeque<>(decodedElements); + } + + protected DequeCoder(Coder elemCoder) { + super(elemCoder, "Deque"); + } + + @Override + public boolean consistentWithEquals() { + return getElemCoder().consistentWithEquals(); + } + + @Override + public Object structuralValue(Deque values) { + if (consistentWithEquals()) { + return values; + } else { + final Deque ret = new ArrayDeque<>(values.size()); + for (T value : values) { + ret.add(getElemCoder().structuralValue(value)); + } + return ret; + } + } + + /** + * Deque sizes are always known, so DequeIterable may be deterministic while the general + * IterableLikeCoder is not. 
+ */ + @Override + public void verifyDeterministic() throws NonDeterministicException { + verifyDeterministic( + this, "Coder for elements of DequeCoder must be determistic", getElemCoder()); + } + + @Override + public TypeDescriptor> getEncodedTypeDescriptor() { + return new TypeDescriptor>(getClass()) {}.where( + new TypeParameter() {}, getElemCoder().getEncodedTypeDescriptor()); + } +} diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoder.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoder.java index 9ef658d2bff1..2d51411b2ef4 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoder.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoder.java @@ -17,7 +17,9 @@ */ package org.apache.beam.sdk.coders; +import java.util.Map; import java.util.Objects; +import java.util.UUID; import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.annotations.Experimental.Kind; import org.apache.beam.sdk.schemas.Schema; @@ -34,6 +36,11 @@ public static RowCoder of(Schema schema) { return new RowCoder(schema); } + /** Override encoding positions for the given schema. */ + public static void overrideEncodingPositions(UUID uuid, Map encodingPositions) { + SchemaCoder.overrideEncodingPositions(uuid, encodingPositions); + } + private RowCoder(Schema schema) { super( schema, diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoderGenerator.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoderGenerator.java index a21433f16f92..be7731c3b640 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoderGenerator.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoderGenerator.java @@ -24,10 +24,11 @@ import java.io.OutputStream; import java.lang.reflect.InvocationTargetException; import java.lang.reflect.Modifier; +import java.util.Arrays; import java.util.BitSet; -import java.util.List; import java.util.Map; import java.util.UUID; +import javax.annotation.Nullable; import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.annotations.Experimental.Kind; import org.apache.beam.sdk.schemas.Schema; @@ -35,28 +36,30 @@ import org.apache.beam.sdk.schemas.Schema.FieldType; import org.apache.beam.sdk.schemas.SchemaCoder; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.ByteBuddy; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.modifier.FieldManifestation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.modifier.Ownership; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.modifier.Visibility; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.type.TypeDescription; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.type.TypeDescription.ForLoadedType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.DynamicType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.loading.ClassLoadingStrategy; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.scaffold.InstrumentedType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.FixedValue; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.Implementation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.ByteCodeAppender; -import 
org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.ByteCodeAppender.Size; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.Duplication; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.StackManipulation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.FieldAccess; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodInvocation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodReturn; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodVariableAccess; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.matcher.ElementMatchers; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.ByteBuddy; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.modifier.FieldManifestation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.modifier.Ownership; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.modifier.Visibility; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.type.TypeDescription; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.type.TypeDescription.ForLoadedType; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.DynamicType; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.loading.ClassLoadingStrategy; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.scaffold.InstrumentedType; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.FixedValue; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.Implementation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.ByteCodeAppender; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.ByteCodeAppender.Size; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.Duplication; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.StackManipulation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.FieldAccess; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodInvocation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodReturn; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodVariableAccess; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.matcher.ElementMatchers; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; /** * A utility for automatically generating a {@link Coder} for {@link Row} objects corresponding to a @@ -103,11 +106,23 @@ public abstract class RowCoderGenerator { private static final ByteBuddy BYTE_BUDDY = new ByteBuddy(); private static final BitSetCoder NULL_LIST_CODER = BitSetCoder.of(); private static final VarIntCoder VAR_INT_CODER = VarIntCoder.of(); + // BitSet.get(n) will return false for any n >= 
nbits, so a BitSet with 0 bits will return false + // for all calls to get. + private static final BitSet EMPTY_BIT_SET = new BitSet(0); private static final String CODERS_FIELD_NAME = "FIELD_CODERS"; + private static final String POSITIONS_FIELD_NAME = "FIELD_ENCODING_POSITIONS"; // Cache for Coder class that are already generated. private static final Map> GENERATED_CODERS = Maps.newConcurrentMap(); + private static final Map> ENCODING_POSITION_OVERRIDES = + Maps.newConcurrentMap(); + + private static final Logger LOG = LoggerFactory.getLogger(RowCoderGenerator.class); + + public static void overrideEncodingPositions(UUID uuid, Map encodingPositions) { + ENCODING_POSITION_OVERRIDES.put(uuid, encodingPositions); + } @SuppressWarnings("unchecked") public static Coder generate(Schema schema) { @@ -121,22 +136,37 @@ public static Coder generate(Schema schema) { (DynamicType.Builder) BYTE_BUDDY.subclass(coderType); builder = implementMethods(schema, builder); + int[] encodingPosToRowIndex = new int[schema.getFieldCount()]; + Map encodingPositions = + ENCODING_POSITION_OVERRIDES.getOrDefault(schema.getUUID(), schema.getEncodingPositions()); + for (int recordIndex = 0; recordIndex < schema.getFieldCount(); ++recordIndex) { + String name = schema.getField(recordIndex).getName(); + int encodingPosition = encodingPositions.get(name); + encodingPosToRowIndex[encodingPosition] = recordIndex; + } + // There should never be duplicate encoding positions. + Preconditions.checkState( + schema.getFieldCount() == Arrays.stream(encodingPosToRowIndex).distinct().count()); + + // Component coders are ordered by encoding position, but may encode a field with a different + // row index. Coder[] componentCoders = new Coder[schema.getFieldCount()]; for (int i = 0; i < schema.getFieldCount(); ++i) { + int rowIndex = encodingPosToRowIndex[i]; // We use withNullable(false) as nulls are handled by the RowCoder and the individual // component coders therefore do not need to handle nulls. componentCoders[i] = - SchemaCoder.coderForFieldType(schema.getField(i).getType().withNullable(false)); + SchemaCoder.coderForFieldType(schema.getField(rowIndex).getType().withNullable(false)); } - builder = - builder.defineField( - CODERS_FIELD_NAME, Coder[].class, Visibility.PRIVATE, FieldManifestation.FINAL); - builder = builder + .defineField( + CODERS_FIELD_NAME, Coder[].class, Visibility.PRIVATE, FieldManifestation.FINAL) + .defineField( + POSITIONS_FIELD_NAME, int[].class, Visibility.PRIVATE, FieldManifestation.FINAL) .defineConstructor(Modifier.PUBLIC) - .withParameters(Coder[].class) + .withParameters(Coder[].class, int[].class) .intercept(new GeneratedCoderConstructor()); try { @@ -145,8 +175,8 @@ public static Coder generate(Schema schema) { .make() .load(Coder.class.getClassLoader(), ClassLoadingStrategy.Default.INJECTION) .getLoaded() - .getDeclaredConstructor(Coder[].class) - .newInstance((Object) componentCoders); + .getDeclaredConstructor(Coder[].class, int[].class) + .newInstance((Object) componentCoders, (Object) encodingPosToRowIndex); } catch (InstantiationException | IllegalAccessException | NoSuchMethodException @@ -179,6 +209,7 @@ public ByteCodeAppender appender(final Target implementationTarget) { .filter( ElementMatchers.isConstructor().and(ElementMatchers.takesArguments(0))) .getOnly()), + Duplication.SINGLE, // Store the list of Coders as a member variable. 
MethodVariableAccess.REFERENCE.loadFrom(1), FieldAccess.forField( @@ -188,6 +219,15 @@ public ByteCodeAppender appender(final Target implementationTarget) { .filter(ElementMatchers.named(CODERS_FIELD_NAME)) .getOnly()) .write(), + // Store the list of encoding offsets as a member variable. + MethodVariableAccess.REFERENCE.loadFrom(2), + FieldAccess.forField( + implementationTarget + .getInstrumentedType() + .getDeclaredFields() + .filter(ElementMatchers.named(POSITIONS_FIELD_NAME)) + .getOnly()) + .write(), MethodReturn.VOID); StackManipulation.Size size = stackManipulation.apply(methodVisitor, implementationContext); return new Size(size.getMaximalSize(), numLocals); @@ -227,6 +267,14 @@ public ByteCodeAppender appender(Target implementationTarget) { .filter(ElementMatchers.named(CODERS_FIELD_NAME)) .getOnly()) .read(), + MethodVariableAccess.loadThis(), + FieldAccess.forField( + implementationContext + .getInstrumentedType() + .getDeclaredFields() + .filter(ElementMatchers.named(POSITIONS_FIELD_NAME)) + .getOnly()) + .read(), // Element to encode. (offset 1, as offset 0 is always "this"). MethodVariableAccess.REFERENCE.loadFrom(1), // OutputStream. @@ -260,33 +308,58 @@ public InstrumentedType prepare(InstrumentedType instrumentedType) { // per-field Coders. @SuppressWarnings("unchecked") static void encodeDelegate( - Coder[] coders, Row value, OutputStream outputStream, boolean hasNullableFields) + Coder[] coders, + int[] encodingPosToIndex, + Row value, + OutputStream outputStream, + boolean hasNullableFields) throws IOException { checkState(value.getFieldCount() == value.getSchema().getFieldCount()); + checkState(encodingPosToIndex.length == value.getFieldCount()); // Encode the field count. This allows us to handle compatible schema changes. VAR_INT_CODER.encode(value.getFieldCount(), outputStream); - // Encode a bitmap for the null fields to save having to encode a bunch of nulls. - NULL_LIST_CODER.encode(scanNullFields(value, hasNullableFields), outputStream); - for (int idx = 0; idx < value.getFieldCount(); ++idx) { - Object fieldValue = value.getValue(idx); - if (value.getValue(idx) != null) { - coders[idx].encode(fieldValue, outputStream); + + if (hasNullableFields) { + // If the row has null fields, extract the values out once so that both scanNullFields and + // the encoding can share it and avoid having to extract them twice. + + Object[] fieldValues = new Object[value.getFieldCount()]; + for (int idx = 0; idx < fieldValues.length; ++idx) { + fieldValues[idx] = value.getValue(idx); + } + + // Encode a bitmap for the null fields to save having to encode a bunch of nulls. + NULL_LIST_CODER.encode(scanNullFields(fieldValues), outputStream); + for (int encodingPos = 0; encodingPos < fieldValues.length; ++encodingPos) { + @Nullable Object fieldValue = fieldValues[encodingPosToIndex[encodingPos]]; + if (fieldValue != null) { + coders[encodingPos].encode(fieldValue, outputStream); + } + } + } else { + // Otherwise, we know all fields are non-null, so the null list is always empty. + + NULL_LIST_CODER.encode(EMPTY_BIT_SET, outputStream); + for (int encodingPos = 0; encodingPos < value.getFieldCount(); ++encodingPos) { + @Nullable Object fieldValue = value.getValue(encodingPosToIndex[encodingPos]); + if (fieldValue != null) { + coders[encodingPos].encode(fieldValue, outputStream); + } } } } // Figure out which fields of the Row are null, and returns a BitSet. This allows us to save // on encoding each null field separately. 
- private static BitSet scanNullFields(Row row, boolean hasNullableFields) { - BitSet nullFields = new BitSet(row.getFieldCount()); - if (hasNullableFields) { - for (int idx = 0; idx < row.getFieldCount(); ++idx) { - if (row.getValue(idx) == null) { - nullFields.set(idx); - } + private static BitSet scanNullFields(Object[] fieldValues) { + BitSet nullFields = new BitSet(fieldValues.length); + for (int idx = 0; idx < fieldValues.length; ++idx) { + if (fieldValues[idx] == null) { + nullFields.set(idx); } } + return nullFields; } } @@ -315,6 +388,14 @@ public ByteCodeAppender appender(Target implementationTarget) { .filter(ElementMatchers.named(CODERS_FIELD_NAME)) .getOnly()) .read(), + MethodVariableAccess.loadThis(), + FieldAccess.forField( + implementationContext + .getInstrumentedType() + .getDeclaredFields() + .filter(ElementMatchers.named(POSITIONS_FIELD_NAME)) + .getOnly()) + .read(), // read the InputStream. (offset 1, as offset 0 is always "this"). MethodVariableAccess.REFERENCE.loadFrom(1), MethodInvocation.invoke( @@ -336,27 +417,30 @@ public InstrumentedType prepare(InstrumentedType instrumentedType) { // The decode method of the generated Coder delegates to this method to evaluate all of the // per-field Coders. - static Row decodeDelegate(Schema schema, Coder[] coders, InputStream inputStream) + static Row decodeDelegate( + Schema schema, Coder[] coders, int[] encodingPosToIndex, InputStream inputStream) throws IOException { int fieldCount = VAR_INT_CODER.decode(inputStream); BitSet nullFields = NULL_LIST_CODER.decode(inputStream); - List fieldValues = Lists.newArrayListWithCapacity(coders.length); - for (int i = 0; i < fieldCount; ++i) { + Object[] fieldValues = new Object[coders.length]; + for (int encodingPos = 0; encodingPos < fieldCount; ++encodingPos) { // In the case of a schema change going backwards, fieldCount might be > coders.length, // in which case we drop the extra fields. - if (i < coders.length) { - if (nullFields.get(i)) { - fieldValues.add(null); + if (encodingPos < coders.length) { + int rowIndex = encodingPosToIndex[encodingPos]; + if (nullFields.get(rowIndex)) { + fieldValues[rowIndex] = null; } else { - Object fieldValue = coders[i].decode(inputStream); - fieldValues.add(fieldValue); + Object fieldValue = coders[encodingPos].decode(inputStream); + fieldValues[rowIndex] = fieldValue; } } } // If the schema was evolved to contain more fields, we fill them in with nulls. - for (int i = fieldCount; i < coders.length; i++) { - fieldValues.add(null); + for (int encodingPos = fieldCount; encodingPos < coders.length; encodingPos++) { + int rowIndex = encodingPosToIndex[encodingPos]; + fieldValues[rowIndex] = null; } // We call attachValues instead of setValues. setValues validates every element in the list // is of the proper type, potentially converts to the internal type Row stores, and copies diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/SerializableCoder.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/SerializableCoder.java index ffdf3fb0248e..4f7addd539e6 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/SerializableCoder.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/SerializableCoder.java @@ -39,6 +39,9 @@ /** * A {@link Coder} for Java classes that implement {@link Serializable}. * + *

+ * <p>{@link SerializableCoder} should be used only for objects that have proper {@link
+ * Object#equals} and {@link Object#hashCode} implementations.
+ *
 * <p>To use, specify the coder type on a PCollection:
 *
 * <pre>{@code
    diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/TimestampPrefixingWindowCoder.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/TimestampPrefixingWindowCoder.java
    new file mode 100644
    index 000000000000..ea7d56eb0c8b
    --- /dev/null
    +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/TimestampPrefixingWindowCoder.java
    @@ -0,0 +1,91 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.beam.sdk.coders;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.io.OutputStream;
    +import java.util.List;
    +import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
    +import org.apache.beam.sdk.util.common.ElementByteSizeObserver;
    +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists;
    +
    +/**
+ * A {@link TimestampPrefixingWindowCoder} wraps an arbitrary user-provided window coder. While
+ * encoding the custom window type, it extracts the maxTimestamp (inclusive) of the window and
+ * prefixes it to the bytes produced by the wrapped window coder.
+ *
+ * @param <T> The custom window type.
+ */
+public class TimestampPrefixingWindowCoder<T extends BoundedWindow> extends StructuredCoder<T> {
+  private final Coder<T> windowCoder;
+
+  public static <T extends BoundedWindow> TimestampPrefixingWindowCoder<T> of(
+      Coder<T> windowCoder) {
    +    return new TimestampPrefixingWindowCoder<>(windowCoder);
    +  }
    +
+  TimestampPrefixingWindowCoder(Coder<T> windowCoder) {
    +    this.windowCoder = windowCoder;
    +  }
    +
+  public Coder<T> getWindowCoder() {
    +    return windowCoder;
    +  }
    +
    +  @Override
    +  public void encode(T value, OutputStream outStream) throws CoderException, IOException {
    +    if (value == null) {
    +      throw new CoderException("Cannot encode null window");
    +    }
    +    InstantCoder.of().encode(value.maxTimestamp(), outStream);
    +    windowCoder.encode(value, outStream);
    +  }
    +
    +  @Override
    +  public T decode(InputStream inStream) throws CoderException, IOException {
    +    InstantCoder.of().decode(inStream);
    +    return windowCoder.decode(inStream);
    +  }
    +
    +  @Override
+  public List<? extends Coder<?>> getCoderArguments() {
    +    return Lists.newArrayList(windowCoder);
    +  }
    +
    +  @Override
    +  public void verifyDeterministic() throws NonDeterministicException {
    +    windowCoder.verifyDeterministic();
    +  }
    +
    +  @Override
    +  public boolean consistentWithEquals() {
    +    return windowCoder.consistentWithEquals();
    +  }
    +
    +  @Override
    +  public boolean isRegisterByteSizeObserverCheap(T value) {
    +    return windowCoder.isRegisterByteSizeObserverCheap(value);
    +  }
    +
    +  @Override
    +  public void registerByteSizeObserver(T value, ElementByteSizeObserver observer) throws Exception {
    +    InstantCoder.of().registerByteSizeObserver(value.maxTimestamp(), observer);
    +    windowCoder.registerByteSizeObserver(value, observer);
    +  }
    +}
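A minimal sketch of how the new coder could be exercised, assuming the SDK's existing IntervalWindow.getCoder() and CoderUtils helpers; the class name below is illustrative and not part of this change:

import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.TimestampPrefixingWindowCoder;
import org.apache.beam.sdk.transforms.windowing.IntervalWindow;
import org.apache.beam.sdk.util.CoderUtils;
import org.joda.time.Instant;

class TimestampPrefixingWindowCoderRoundTrip {
  public static void main(String[] args) throws Exception {
    // Wrap the standard IntervalWindow coder; the encoded form is the window's
    // maxTimestamp (written with InstantCoder) followed by the wrapped coder's own bytes.
    Coder<IntervalWindow> coder = TimestampPrefixingWindowCoder.of(IntervalWindow.getCoder());

    IntervalWindow window = new IntervalWindow(new Instant(0L), new Instant(60_000L));

    byte[] encoded = CoderUtils.encodeToByteArray(coder, window);
    IntervalWindow decoded = CoderUtils.decodeFromByteArray(coder, encoded);

    System.out.println(decoded.equals(window)); // prints "true"
  }
}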
    diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/expansion/ExternalTransformRegistrar.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/expansion/ExternalTransformRegistrar.java
    index 297e0df885c9..ac691075ee6c 100644
    --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/expansion/ExternalTransformRegistrar.java
    +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/expansion/ExternalTransformRegistrar.java
    @@ -69,7 +69,7 @@ public interface ExternalTransformRegistrar {
             throw e;
           } catch (Exception e) {
             throw new RuntimeException(
    -            "Unable to instantiate ExternalTransformBuilder from constructor.");
    +            "Unable to instantiate ExternalTransformBuilder from constructor.", e);
           }
         }
         return builder.build();
    diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
    index 8fb41861a305..353d822fd033 100644
    --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
    +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
    @@ -18,6 +18,7 @@
     package org.apache.beam.sdk.io;
     
     import static org.apache.beam.sdk.io.FileIO.ReadMatches.DirectoryTreatment;
    +import static org.apache.beam.sdk.io.ReadAllViaFileBasedSource.ReadFileRangesFnExceptionHandler;
     import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
     import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
     
    @@ -44,11 +45,13 @@
     import org.apache.beam.sdk.coders.StringUtf8Coder;
     import org.apache.beam.sdk.io.FileBasedSink.FilenamePolicy;
     import org.apache.beam.sdk.io.FileIO.MatchConfiguration;
    +import org.apache.beam.sdk.io.FileIO.ReadableFile;
     import org.apache.beam.sdk.io.fs.EmptyMatchTreatment;
     import org.apache.beam.sdk.io.fs.ResourceId;
     import org.apache.beam.sdk.options.ValueProvider;
     import org.apache.beam.sdk.options.ValueProvider.NestedValueProvider;
     import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
    +import org.apache.beam.sdk.schemas.utils.AvroUtils;
     import org.apache.beam.sdk.transforms.Create;
     import org.apache.beam.sdk.transforms.PTransform;
     import org.apache.beam.sdk.transforms.SerializableFunction;
    @@ -58,7 +61,6 @@
     import org.apache.beam.sdk.values.PBegin;
     import org.apache.beam.sdk.values.PCollection;
     import org.apache.beam.sdk.values.PDone;
    -import org.apache.beam.sdk.values.TypeDescriptor;
     import org.apache.beam.sdk.values.TypeDescriptors;
     import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting;
     import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Function;
    @@ -197,7 +199,7 @@
      *
 * <pre>{@code
      * PCollection records =
    - *     p.apply(AvroIO.read(...).from(...).withBeamSchemas(true);
    + *     p.apply(AvroIO.read(...).from(...).withBeamSchemas(true));
 * }</pre>
 *
 * <h3>Inferring Beam schemas from Avro PCollections</h3>
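A hedged sketch of what the corrected withBeamSchemas(true) snippet enables; here p is a Pipeline, schemaJson is an Avro schema string, and the path and field names are illustrative, while the AvroIO and Select calls come from the SDK:

PCollection<GenericRecord> records =
    p.apply(
        AvroIO.readGenericRecords(schemaJson)     // parse records with the given Avro schema
            .from("gs://my-bucket/input-*.avro")  // illustrative input path
            .withBeamSchemas(true));              // also attach an inferred Beam schema

// With a Beam schema attached, schema-aware transforms such as Select can be applied directly.
records.apply(Select.fieldNames("id", "name"));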

    @@ -346,8 +348,8 @@ public static Read read(Class recordClass) { } /** - * Like {@link #read}, but reads each file in a {@link PCollection} of {@link - * FileIO.ReadableFile}, returned by {@link FileIO#readMatches}. + * Like {@link #read}, but reads each file in a {@link PCollection} of {@link ReadableFile}, + * returned by {@link FileIO#readMatches}. * *

    You can read {@link GenericRecord} by using {@code #readFiles(GenericRecord.class)} or * {@code #readFiles(new Schema.Parser().parse(schema))} if the schema is a String. @@ -358,6 +360,8 @@ public static ReadFiles readFiles(Class recordClass) { .setSchema(ReflectData.get().getSchema(recordClass)) .setInferBeamSchema(false) .setDesiredBundleSizeBytes(DEFAULT_BUNDLE_SIZE_BYTES) + .setUsesReshuffle(ReadAllViaFileBasedSource.DEFAULT_USES_RESHUFFLE) + .setFileExceptionHandler(new ReadFileRangesFnExceptionHandler()) .build(); } @@ -392,7 +396,7 @@ public static Read readGenericRecords(Schema schema) { /** * Like {@link #readGenericRecords(Schema)}, but for a {@link PCollection} of {@link - * FileIO.ReadableFile}, for example, returned by {@link FileIO#readMatches}. + * ReadableFile}, for example, returned by {@link FileIO#readMatches}. */ public static ReadFiles readFilesGenericRecords(Schema schema) { return new AutoValue_AvroIO_ReadFiles.Builder() @@ -400,12 +404,14 @@ public static ReadFiles readFilesGenericRecords(Schema schema) { .setSchema(schema) .setInferBeamSchema(false) .setDesiredBundleSizeBytes(DEFAULT_BUNDLE_SIZE_BYTES) + .setUsesReshuffle(ReadAllViaFileBasedSource.DEFAULT_USES_RESHUFFLE) + .setFileExceptionHandler(new ReadFileRangesFnExceptionHandler()) .build(); } /** * Like {@link #readGenericRecords(Schema)}, but for a {@link PCollection} of {@link - * FileIO.ReadableFile}, for example, returned by {@link FileIO#readMatches}. + * ReadableFile}, for example, returned by {@link FileIO#readMatches}. * * @deprecated You can achieve The functionality of {@link #readAllGenericRecords(Schema)} using * {@link FileIO} matching plus {@link #readFilesGenericRecords(Schema)}. This is the @@ -431,7 +437,7 @@ public static Read readGenericRecords(String schema) { return readGenericRecords(new Schema.Parser().parse(schema)); } - /** Like {@link #readGenericRecords(String)}, but for {@link FileIO.ReadableFile} collections. */ + /** Like {@link #readGenericRecords(String)}, but for {@link ReadableFile} collections. */ public static ReadFiles readFilesGenericRecords(String schema) { return readFilesGenericRecords(new Schema.Parser().parse(schema)); } @@ -463,14 +469,16 @@ public static Parse parseGenericRecords(SerializableFunction ParseFiles parseFilesGenericRecords( SerializableFunction parseFn) { return new AutoValue_AvroIO_ParseFiles.Builder() .setParseFn(parseFn) .setDesiredBundleSizeBytes(DEFAULT_BUNDLE_SIZE_BYTES) + .setUsesReshuffle(ReadAllViaFileBasedSource.DEFAULT_USES_RESHUFFLE) + .setFileExceptionHandler(new ReadFileRangesFnExceptionHandler()) .build(); } @@ -569,16 +577,7 @@ private static TypedWrite.Builder default @Experimental(Kind.SCHEMAS) private static PCollection setBeamSchema( PCollection pc, Class clazz, @Nullable Schema schema) { - org.apache.beam.sdk.schemas.Schema beamSchema = - org.apache.beam.sdk.schemas.utils.AvroUtils.getSchema(clazz, schema); - if (beamSchema != null) { - pc.setSchema( - beamSchema, - TypeDescriptor.of(clazz), - org.apache.beam.sdk.schemas.utils.AvroUtils.getToRowFunction(clazz, schema), - org.apache.beam.sdk.schemas.utils.AvroUtils.getFromRowFunction(clazz)); - } - return pc; + return pc.setCoder(AvroUtils.schemaCoder(clazz, schema)); } /** @@ -755,12 +754,16 @@ private static AvroSource createSource( /** Implementation of {@link #readFiles}. 
*/ @AutoValue public abstract static class ReadFiles - extends PTransform, PCollection> { + extends PTransform, PCollection> { abstract @Nullable Class getRecordClass(); abstract @Nullable Schema getSchema(); + abstract boolean getUsesReshuffle(); + + abstract ReadFileRangesFnExceptionHandler getFileExceptionHandler(); + abstract long getDesiredBundleSizeBytes(); abstract boolean getInferBeamSchema(); @@ -775,6 +778,11 @@ abstract static class Builder { abstract Builder setSchema(Schema schema); + abstract Builder setUsesReshuffle(boolean usesReshuffle); + + abstract Builder setFileExceptionHandler( + ReadFileRangesFnExceptionHandler exceptionHandler); + abstract Builder setDesiredBundleSizeBytes(long desiredBundleSizeBytes); abstract Builder setInferBeamSchema(boolean infer); @@ -789,6 +797,19 @@ ReadFiles withDesiredBundleSizeBytes(long desiredBundleSizeBytes) { return toBuilder().setDesiredBundleSizeBytes(desiredBundleSizeBytes).build(); } + /** Specifies if a Reshuffle should run before file reads occur. */ + @Experimental(Kind.FILESYSTEM) + public ReadFiles withUsesReshuffle(boolean usesReshuffle) { + return toBuilder().setUsesReshuffle(usesReshuffle).build(); + } + + /** Specifies if exceptions should be logged only for streaming pipelines. */ + @Experimental(Kind.FILESYSTEM) + public ReadFiles withFileExceptionHandler( + ReadFileRangesFnExceptionHandler exceptionHandler) { + return toBuilder().setFileExceptionHandler(exceptionHandler).build(); + } + /** * If set to true, a Beam schema will be inferred from the AVRO schema. This allows the output * to be used by SQL and by the schema-transform library. @@ -803,7 +824,7 @@ public ReadFiles withDatumReaderFactory(AvroSource.DatumReaderFactory fact } @Override - public PCollection expand(PCollection input) { + public PCollection expand(PCollection input) { checkNotNull(getSchema(), "schema"); PCollection read = input.apply( @@ -812,7 +833,9 @@ public PCollection expand(PCollection input) { getDesiredBundleSizeBytes(), new CreateSourceFn<>( getRecordClass(), getSchema().toString(), getDatumReaderFactory()), - AvroCoder.of(getRecordClass(), getSchema()))); + AvroCoder.of(getRecordClass(), getSchema()), + getUsesReshuffle(), + getFileExceptionHandler())); return getInferBeamSchema() ? setBeamSchema(read, getRecordClass(), getSchema()) : read; } @@ -1078,11 +1101,15 @@ public void populateDisplayData(DisplayData.Builder builder) { /** Implementation of {@link #parseFilesGenericRecords}. */ @AutoValue public abstract static class ParseFiles - extends PTransform, PCollection> { + extends PTransform, PCollection> { abstract SerializableFunction getParseFn(); abstract @Nullable Coder getCoder(); + abstract boolean getUsesReshuffle(); + + abstract ReadFileRangesFnExceptionHandler getFileExceptionHandler(); + abstract long getDesiredBundleSizeBytes(); abstract Builder toBuilder(); @@ -1093,6 +1120,11 @@ abstract static class Builder { abstract Builder setCoder(Coder coder); + abstract Builder setUsesReshuffle(boolean usesReshuffle); + + abstract Builder setFileExceptionHandler( + ReadFileRangesFnExceptionHandler exceptionHandler); + abstract Builder setDesiredBundleSizeBytes(long desiredBundleSizeBytes); abstract ParseFiles build(); @@ -1103,13 +1135,26 @@ public ParseFiles withCoder(Coder coder) { return toBuilder().setCoder(coder).build(); } + /** Specifies if a Reshuffle should run before file reads occur. 
*/ + @Experimental(Kind.FILESYSTEM) + public ParseFiles withUsesReshuffle(boolean usesReshuffle) { + return toBuilder().setUsesReshuffle(usesReshuffle).build(); + } + + /** Specifies if exceptions should be logged only for streaming pipelines. */ + @Experimental(Kind.FILESYSTEM) + public ParseFiles withFileExceptionHandler( + ReadFileRangesFnExceptionHandler exceptionHandler) { + return toBuilder().setFileExceptionHandler(exceptionHandler).build(); + } + @VisibleForTesting ParseFiles withDesiredBundleSizeBytes(long desiredBundleSizeBytes) { return toBuilder().setDesiredBundleSizeBytes(desiredBundleSizeBytes).build(); } @Override - public PCollection expand(PCollection input) { + public PCollection expand(PCollection input) { final Coder coder = Parse.inferCoder(getCoder(), getParseFn(), input.getPipeline().getCoderRegistry()); final SerializableFunction parseFn = getParseFn(); @@ -1117,7 +1162,12 @@ public PCollection expand(PCollection input) { new CreateParseSourceFn<>(parseFn, coder); return input.apply( "Parse Files via FileBasedSource", - new ReadAllViaFileBasedSource<>(getDesiredBundleSizeBytes(), createSource, coder)); + new ReadAllViaFileBasedSource<>( + getDesiredBundleSizeBytes(), + createSource, + coder, + getUsesReshuffle(), + getFileExceptionHandler())); } @Override diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSchemaIOProvider.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSchemaIOProvider.java index 73d457f520fe..16a92ad1bbfc 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSchemaIOProvider.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSchemaIOProvider.java @@ -103,20 +103,18 @@ public PCollection expand(PBegin begin) { AvroIO.readGenericRecords(AvroUtils.toAvroSchema(dataSchema, null, null)) .withBeamSchemas(true) .from(location)) - .apply("GenericRecordToRow", Convert.toRows()); + .apply("ToRows", Convert.toRows()); } }; } @Override public PTransform, POutput> buildWriter() { - PTransform, PCollection> writeConverter = - GenericRecordWriteConverter.builder().beamSchema(dataSchema).build(); return new PTransform, POutput>() { @Override public PDone expand(PCollection input) { return input - .apply("GenericRecordToRow", writeConverter) + .apply("ToGenericRecords", Convert.to(GenericRecord.class)) .apply( "AvroIOWrite", AvroIO.writeGenericRecords(AvroUtils.toAvroSchema(dataSchema, null, null)) diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/ClassLoaderFileSystem.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/ClassLoaderFileSystem.java index 79aa85facc9c..22290300117b 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/ClassLoaderFileSystem.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/ClassLoaderFileSystem.java @@ -30,6 +30,7 @@ import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.io.fs.CreateOptions; import org.apache.beam.sdk.io.fs.MatchResult; +import org.apache.beam.sdk.io.fs.MoveOptions; import org.apache.beam.sdk.io.fs.ResolveOptions; import org.apache.beam.sdk.io.fs.ResourceId; import org.apache.beam.sdk.options.PipelineOptions; @@ -85,7 +86,9 @@ protected void copy( @Override protected void rename( - List srcResourceIds, List destResourceIds) + List srcResourceIds, + List destResourceIds, + MoveOptions... 
moveOptions) throws IOException { throw new UnsupportedOperationException("Read-only filesystem."); } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/CompressedSource.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/CompressedSource.java index b704b9a62878..f6fb657fd654 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/CompressedSource.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/CompressedSource.java @@ -49,15 +49,18 @@ * }

    * *

    Supported compression algorithms are {@link Compression#GZIP}, {@link Compression#BZIP2}, - * {@link Compression#ZIP}, {@link Compression#ZSTD}, and {@link Compression#DEFLATE}. User-defined + * {@link Compression#ZIP}, {@link Compression#ZSTD}, {@link Compression#LZO}, {@link + * Compression#LZOP}, {@link Compression#SNAPPY}, and {@link Compression#DEFLATE}. User-defined * compression types are supported by implementing a {@link DecompressingChannelFactory}. * *

    By default, the compression algorithm is selected from those supported in {@link Compression} * based on the file name provided to the source, namely {@code ".bz2"} indicates {@link * Compression#BZIP2}, {@code ".gz"} indicates {@link Compression#GZIP}, {@code ".zip"} indicates - * {@link Compression#ZIP}, {@code ".zst"} indicates {@link Compression#ZSTD}, and {@code - * ".deflate"} indicates {@link Compression#DEFLATE}. If the file name does not match any of the - * supported algorithms, it is assumed to be uncompressed data. + * {@link Compression#ZIP}, {@code ".zst"} indicates {@link Compression#ZSTD}, {@code + * ".lzo_deflate"} indicates {@link Compression#LZO}, {@code ".lzo"} indicates {@link + * Compression#LZOP}, {@code ".snappy"} indicted {@link Compression#SNAPPY}, and {@code ".deflate"} + * indicates {@link Compression#DEFLATE}. If the file name does not match any of the supported + * algorithms, it is assumed to be uncompressed data. * * @param The type to read from the compressed file. */ @@ -102,7 +105,10 @@ public enum CompressionMode implements DecompressingChannelFactory { LZOP(Compression.LZOP), /** @see Compression#DEFLATE */ - DEFLATE(Compression.DEFLATE); + DEFLATE(Compression.DEFLATE), + + /** @see Compression#SNAPPY */ + SNAPPY(Compression.SNAPPY); private final Compression canonical; @@ -158,6 +164,9 @@ static DecompressingChannelFactory fromCanonical(Compression compression) { case DEFLATE: return DEFLATE; + case SNAPPY: + return SNAPPY; + default: throw new IllegalArgumentException("Unsupported compression type: " + compression); } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Compression.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Compression.java index 25a6fd7134ac..9e81aa96aa5c 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Compression.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Compression.java @@ -36,6 +36,8 @@ import org.apache.commons.compress.compressors.deflate.DeflateCompressorInputStream; import org.apache.commons.compress.compressors.deflate.DeflateCompressorOutputStream; import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream; +import org.apache.commons.compress.compressors.snappy.SnappyCompressorInputStream; +import org.apache.commons.compress.compressors.snappy.SnappyCompressorOutputStream; import org.apache.commons.compress.compressors.zstandard.ZstdCompressorInputStream; import org.apache.commons.compress.compressors.zstandard.ZstdCompressorOutputStream; @@ -219,6 +221,25 @@ public WritableByteChannel writeCompressed(WritableByteChannel channel) throws I return Channels.newChannel( new DeflateCompressorOutputStream(Channels.newOutputStream(channel))); } + }, + + /** Google Snappy compression. 
*/ + SNAPPY(".snappy", ".snappy") { + private int uncompressedSize; + + @Override + public ReadableByteChannel readDecompressed(ReadableByteChannel channel) throws IOException { + SnappyCompressorInputStream is = + new SnappyCompressorInputStream(Channels.newInputStream(channel)); + uncompressedSize = is.getSize(); + return Channels.newChannel(is); + } + + @Override + public WritableByteChannel writeCompressed(WritableByteChannel channel) throws IOException { + return Channels.newChannel( + new SnappyCompressorOutputStream(Channels.newOutputStream(channel), uncompressedSize)); + } }; private final String suggestedSuffix; diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java index d1f3f3f46f92..f735ec350bcf 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java @@ -150,7 +150,10 @@ public enum CompressionType implements WritableByteChannelFactory { LZOP(Compression.LZOP), /** @see Compression#DEFLATE */ - DEFLATE(Compression.DEFLATE); + DEFLATE(Compression.DEFLATE), + + /** @see Compression#SNAPPY */ + SNAPPY(Compression.SNAPPY); private final Compression canonical; @@ -202,6 +205,9 @@ public static CompressionType fromCanonical(Compression canonical) { case DEFLATE: return DEFLATE; + case SNAPPY: + return SNAPPY; + default: throw new UnsupportedOperationException("Unsupported compression type: " + canonical); } @@ -764,8 +770,15 @@ final void moveToOutputFiles( } // During a failure case, files may have been deleted in an earlier step. Thus // we ignore missing files here. - FileSystems.rename(srcFiles, dstFiles, StandardMoveOptions.IGNORE_MISSING_FILES); - removeTemporaryFiles(srcFiles); + FileSystems.rename( + srcFiles, + dstFiles, + StandardMoveOptions.IGNORE_MISSING_FILES, + StandardMoveOptions.SKIP_IF_DESTINATION_EXISTS); + + // The rename ensures that the source files are deleted. However we may still need to clean + // up the directory or orphaned files. + removeTemporaryFiles(Collections.emptyList()); } /** diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystem.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystem.java index 34ba69df358c..ba2c8251da87 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystem.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystem.java @@ -27,6 +27,7 @@ import org.apache.beam.sdk.annotations.Experimental.Kind; import org.apache.beam.sdk.io.fs.CreateOptions; import org.apache.beam.sdk.io.fs.MatchResult; +import org.apache.beam.sdk.io.fs.MoveOptions; import org.apache.beam.sdk.io.fs.ResourceId; /** @@ -113,6 +114,9 @@ protected abstract void copy(List srcResourceIds, List * * @param srcResourceIds the references of the source resources * @param destResourceIds the references of the destination resources + * @param moveOptions move options specifying handling of error conditions + * @throws UnsupportedOperationException if move options are specified and not supported by the + * FileSystem * @throws FileNotFoundException if the source resources are missing. When rename throws, the * state of the resources is unknown but safe: for every (source, destination) pair of * resources, the following are possible: a) source exists, b) destination exists, c) source @@ -121,7 +125,10 @@ protected abstract void copy(List srcResourceIds, List * resource. 
*/ protected abstract void rename( - List srcResourceIds, List destResourceIds) throws IOException; + List srcResourceIds, + List destResourceIds, + MoveOptions... moveOptions) + throws IOException; /** * Deletes a collection of resources. diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java index a43c9be23545..46573c680700 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java @@ -48,10 +48,10 @@ import org.apache.beam.sdk.io.fs.MatchResult.Metadata; import org.apache.beam.sdk.io.fs.MatchResult.Status; import org.apache.beam.sdk.io.fs.MoveOptions; +import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions; import org.apache.beam.sdk.io.fs.ResourceId; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.util.common.ReflectHelpers; -import org.apache.beam.sdk.values.KV; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Function; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Joiner; @@ -274,23 +274,13 @@ public static void copy( throws IOException { validateSrcDestLists(srcResourceIds, destResourceIds); if (srcResourceIds.isEmpty()) { - // Short-circuit. return; } - - List srcToCopy = srcResourceIds; - List destToCopy = destResourceIds; - if (Sets.newHashSet(moveOptions) - .contains(MoveOptions.StandardMoveOptions.IGNORE_MISSING_FILES)) { - KV, List> existings = - filterMissingFiles(srcResourceIds, destResourceIds); - srcToCopy = existings.getKey(); - destToCopy = existings.getValue(); + FileSystem fileSystem = getFileSystemInternal(srcResourceIds.iterator().next().getScheme()); + FilterResult filtered = filterFiles(fileSystem, srcResourceIds, destResourceIds, moveOptions); + if (!filtered.resultSources.isEmpty()) { + fileSystem.copy(filtered.resultSources, filtered.resultDestinations); } - if (srcToCopy.isEmpty()) { - return; - } - getFileSystemInternal(srcToCopy.iterator().next().getScheme()).copy(srcToCopy, destToCopy); } /** @@ -303,6 +293,8 @@ public static void copy( * *

    It doesn't support renaming globs. * + *

    Src files will be removed, even if the copy is skipped due to specified move options. + * * @param srcResourceIds the references of the source resources * @param destResourceIds the references of the destination resources */ @@ -311,24 +303,35 @@ public static void rename( throws IOException { validateSrcDestLists(srcResourceIds, destResourceIds); if (srcResourceIds.isEmpty()) { - // Short-circuit. return; } + renameInternal( + getFileSystemInternal(srcResourceIds.iterator().next().getScheme()), + srcResourceIds, + destResourceIds, + moveOptions); + } - List srcToRename = srcResourceIds; - List destToRename = destResourceIds; - if (Sets.newHashSet(moveOptions) - .contains(MoveOptions.StandardMoveOptions.IGNORE_MISSING_FILES)) { - KV, List> existings = - filterMissingFiles(srcResourceIds, destResourceIds); - srcToRename = existings.getKey(); - destToRename = existings.getValue(); - } - if (srcToRename.isEmpty()) { - return; + @VisibleForTesting + static void renameInternal( + FileSystem fileSystem, + List srcResourceIds, + List destResourceIds, + MoveOptions... moveOptions) + throws IOException { + try { + fileSystem.rename(srcResourceIds, destResourceIds, moveOptions); + } catch (UnsupportedOperationException e) { + // Some file systems do not yet support specifying the move options. Instead we + // perform filtering using match calls before renaming. + FilterResult filtered = filterFiles(fileSystem, srcResourceIds, destResourceIds, moveOptions); + if (!filtered.resultSources.isEmpty()) { + fileSystem.rename(filtered.resultSources, filtered.resultDestinations); + } + if (!filtered.filteredExistingSrcs.isEmpty()) { + fileSystem.delete(filtered.filteredExistingSrcs); + } } - getFileSystemInternal(srcToRename.iterator().next().getScheme()) - .rename(srcToRename, destToRename); } /** @@ -390,25 +393,72 @@ public ResourceId apply(@Nonnull Metadata input) { .delete(resourceIdsToDelete); } - private static KV, List> filterMissingFiles( - List srcResourceIds, List destResourceIds) throws IOException { - validateSrcDestLists(srcResourceIds, destResourceIds); - if (srcResourceIds.isEmpty()) { - // Short-circuit. - return KV.of(Collections.emptyList(), Collections.emptyList()); + private static class FilterResult { + public List resultSources = new ArrayList(); + public List resultDestinations = new ArrayList(); + public List filteredExistingSrcs = new ArrayList(); + }; + + private static FilterResult filterFiles( + FileSystem fileSystem, + List srcResourceIds, + List destResourceIds, + MoveOptions... moveOptions) + throws IOException { + FilterResult result = new FilterResult(); + if (moveOptions.length == 0 || srcResourceIds.isEmpty()) { + // Nothing will be filtered. + result.resultSources = srcResourceIds; + result.resultDestinations = destResourceIds; + return result; } - - List srcToHandle = new ArrayList<>(); - List destToHandle = new ArrayList<>(); - - List matchResults = matchResources(srcResourceIds); - for (int i = 0; i < matchResults.size(); ++i) { - if (!matchResults.get(i).status().equals(Status.NOT_FOUND)) { - srcToHandle.add(srcResourceIds.get(i)); - destToHandle.add(destResourceIds.get(i)); + Set moveOptionSet = Sets.newHashSet(moveOptions); + final boolean ignoreMissingSrc = + moveOptionSet.contains(StandardMoveOptions.IGNORE_MISSING_FILES); + final boolean skipExistingDest = + moveOptionSet.contains(StandardMoveOptions.SKIP_IF_DESTINATION_EXISTS); + final int size = srcResourceIds.size(); + + // Match necessary srcs and dests with a single match call. 
+ List matchResources = new ArrayList<>(); + if (ignoreMissingSrc) { + matchResources.addAll(srcResourceIds); + } + if (skipExistingDest) { + matchResources.addAll(destResourceIds); + } + List matchResults = + fileSystem.match( + FluentIterable.from(matchResources).transform(ResourceId::toString).toList()); + List matchSrcResults = ignoreMissingSrc ? matchResults.subList(0, size) : null; + List matchDestResults = + skipExistingDest + ? matchResults.subList(matchResults.size() - size, matchResults.size()) + : null; + + for (int i = 0; i < size; ++i) { + if (matchSrcResults != null && matchSrcResults.get(i).status().equals(Status.NOT_FOUND)) { + // If the source is not found, and we are ignoring missing source files, then we skip it. + continue; } + if (matchDestResults != null + && matchDestResults.get(i).status().equals(Status.OK) + && checksumMatch( + matchDestResults.get(i).metadata().get(0), + matchSrcResults.get(i).metadata().get(0))) { + // If the destination exists, and we are skipping when destinations exist, then we skip + // the copy but note that the source exists in case it should be deleted. + result.filteredExistingSrcs.add(srcResourceIds.get(i)); + continue; + } + result.resultSources.add(srcResourceIds.get(i)); + result.resultDestinations.add(destResourceIds.get(i)); } - return KV.of(srcToHandle, destToHandle); + return result; + } + + private static boolean checksumMatch(MatchResult.Metadata first, MatchResult.Metadata second) { + return first.checksum() != null && first.checksum().equals(second.checksum()); } private static void validateSrcDestLists( diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/GenericRecordWriteConverter.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/GenericRecordWriteConverter.java deleted file mode 100644 index f532f77eb1d0..000000000000 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/GenericRecordWriteConverter.java +++ /dev/null @@ -1,67 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package org.apache.beam.sdk.io; - -import com.google.auto.value.AutoValue; -import java.io.Serializable; -import org.apache.avro.generic.GenericRecord; -import org.apache.beam.sdk.coders.AvroCoder; -import org.apache.beam.sdk.schemas.Schema; -import org.apache.beam.sdk.schemas.utils.AvroUtils; -import org.apache.beam.sdk.transforms.DoFn; -import org.apache.beam.sdk.transforms.PTransform; -import org.apache.beam.sdk.transforms.ParDo; -import org.apache.beam.sdk.values.PCollection; -import org.apache.beam.sdk.values.Row; - -/** A {@link PTransform} to convert {@link Row} to {@link GenericRecord}. 
*/ -@AutoValue -public abstract class GenericRecordWriteConverter - extends PTransform, PCollection> implements Serializable { - - public abstract Schema beamSchema(); - - public static Builder builder() { - return new AutoValue_GenericRecordWriteConverter.Builder(); - } - - @Override - public PCollection expand(PCollection input) { - return input - .apply( - "RowsToGenericRecord", - ParDo.of( - new DoFn() { - @ProcessElement - public void processElement(ProcessContext c) { - GenericRecord genericRecord = - AvroUtils.toGenericRecord( - c.element(), AvroUtils.toAvroSchema(beamSchema())); - c.output(genericRecord); - } - })) - .setCoder(AvroCoder.of(GenericRecord.class, AvroUtils.toAvroSchema(beamSchema()))); - } - - @AutoValue.Builder - abstract static class Builder { - public abstract Builder beamSchema(Schema beamSchema); - - public abstract GenericRecordWriteConverter build(); - } -} diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/LocalFileSystem.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/LocalFileSystem.java index 7404f78b799a..042e6c52cc5e 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/LocalFileSystem.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/LocalFileSystem.java @@ -47,6 +47,7 @@ import org.apache.beam.sdk.io.fs.MatchResult; import org.apache.beam.sdk.io.fs.MatchResult.Metadata; import org.apache.beam.sdk.io.fs.MatchResult.Status; +import org.apache.beam.sdk.io.fs.MoveOptions; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Predicates; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; @@ -162,8 +163,14 @@ protected void copy(List srcResourceIds, List } @Override - protected void rename(List srcResourceIds, List destResourceIds) + protected void rename( + List srcResourceIds, + List destResourceIds, + MoveOptions... 
moveOptions) throws IOException { + if (moveOptions.length > 0) { + throw new UnsupportedOperationException("Support for move options is not yet implemented."); + } checkArgument( srcResourceIds.size() == destResourceIds.size(), "Number of source files %s must equal number of destination files %s", diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java index e2f7a8f6b2e3..52e87d59abdd 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java @@ -17,6 +17,7 @@ */ package org.apache.beam.sdk.io; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; import com.google.auto.value.AutoValue; @@ -27,6 +28,7 @@ import java.util.Arrays; import java.util.List; import java.util.NoSuchElementException; +import java.util.concurrent.TimeUnit; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.CoderException; import org.apache.beam.sdk.coders.InstantCoder; @@ -44,7 +46,6 @@ import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.transforms.ParDo; import org.apache.beam.sdk.transforms.display.DisplayData; -import org.apache.beam.sdk.transforms.display.HasDisplayData; import org.apache.beam.sdk.transforms.splittabledofn.ManualWatermarkEstimator; import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker; import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker.HasProgress; @@ -60,6 +61,11 @@ import org.apache.beam.sdk.values.ValueWithRecordId; import org.apache.beam.sdk.values.ValueWithRecordId.StripIdsDoFn; import org.apache.beam.sdk.values.ValueWithRecordId.ValueWithRecordIdCoder; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.Cache; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.CacheBuilder; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.RemovalListener; import org.checkerframework.checker.nullness.qual.Nullable; import org.joda.time.Duration; import org.joda.time.Instant; @@ -144,7 +150,8 @@ public final PCollection expand(PBegin input) { .apply(ParDo.of(new OutputSingleSource<>(source))) .setCoder(SerializableCoder.of(new TypeDescriptor>() {})) .apply(ParDo.of(new BoundedSourceAsSDFWrapperFn<>())) - .setCoder(source.getOutputCoder()); + .setCoder(source.getOutputCoder()) + .setTypeDescriptor(source.getOutputCoder().getEncodedTypeDescriptor()); } /** Returns the {@code BoundedSource} used to create this {@code Read} {@code PTransform}. 
*/ @@ -170,7 +177,8 @@ public void populateDisplayData(DisplayData.Builder builder) { public static class Unbounded extends PTransform> { private final UnboundedSource source; - private Unbounded(@Nullable String name, UnboundedSource source) { + @VisibleForTesting + Unbounded(@Nullable String name, UnboundedSource source) { super(name); this.source = (UnboundedSource) SerializableUtils.ensureSerializable(source); @@ -209,10 +217,7 @@ public final PCollection expand(PBegin input) { .apply(ParDo.of(new OutputSingleSource<>(source))) .setCoder( SerializableCoder.of(new TypeDescriptor>() {})) - .apply( - ParDo.of( - new UnboundedSourceAsSDFWrapperFn<>( - (Coder) source.getCheckpointMarkCoder()))) + .apply(ParDo.of(createUnboundedSdfWrapper())) .setCoder(ValueWithRecordIdCoder.of(source.getOutputCoder())); if (source.requiresDeduping()) { @@ -224,6 +229,11 @@ public final PCollection expand(PBegin input) { return outputWithIds.apply(ParDo.of(new StripIdsDoFn<>())); } + @VisibleForTesting + UnboundedSourceAsSDFWrapperFn createUnboundedSdfWrapper() { + return new UnboundedSourceAsSDFWrapperFn<>(source.getCheckpointMarkCoder()); + } + /** Returns the {@code UnboundedSource} used to create this {@code Read} {@code PTransform}. */ public UnboundedSource getSource() { return source; @@ -249,26 +259,26 @@ public void populateDisplayData(DisplayData.Builder builder) { *

    We model the element as the original source and the restriction as the sub-source. This * allows us to split the sub-source over and over yet still receive "source" objects as inputs. */ - static class BoundedSourceAsSDFWrapperFn extends DoFn, T> { + static class BoundedSourceAsSDFWrapperFn> + extends DoFn { private static final Logger LOG = LoggerFactory.getLogger(BoundedSourceAsSDFWrapperFn.class); private static final long DEFAULT_DESIRED_BUNDLE_SIZE_BYTES = 64 * (1 << 20); @GetInitialRestriction - public BoundedSource initialRestriction(@Element BoundedSource element) { + public BoundedSourceT initialRestriction(@Element BoundedSourceT element) { return element; } @GetSize - public double getSize( - @Restriction BoundedSource restriction, PipelineOptions pipelineOptions) + public double getSize(@Restriction BoundedSourceT restriction, PipelineOptions pipelineOptions) throws Exception { return restriction.getEstimatedSizeBytes(pipelineOptions); } @SplitRestriction public void splitRestriction( - @Restriction BoundedSource restriction, - OutputReceiver> receiver, + @Restriction BoundedSourceT restriction, + OutputReceiver receiver, PipelineOptions pipelineOptions) throws Exception { long estimatedSize = restriction.getEstimatedSizeBytes(pipelineOptions); @@ -278,20 +288,22 @@ public void splitRestriction( Math.min( DEFAULT_DESIRED_BUNDLE_SIZE_BYTES, Math.max(1L, estimatedSize / DEFAULT_DESIRED_NUM_SPLITS)); - for (BoundedSource split : restriction.split(splitBundleSize, pipelineOptions)) { + List splits = + (List) restriction.split(splitBundleSize, pipelineOptions); + for (BoundedSourceT split : splits) { receiver.output(split); } } @NewTracker - public RestrictionTracker, TimestampedValue[]> restrictionTracker( - @Restriction BoundedSource restriction, PipelineOptions pipelineOptions) { + public RestrictionTracker[]> restrictionTracker( + @Restriction BoundedSourceT restriction, PipelineOptions pipelineOptions) { return new BoundedSourceAsSDFRestrictionTracker<>(restriction, pipelineOptions); } @ProcessElement public void processElement( - RestrictionTracker, TimestampedValue[]> tracker, + RestrictionTracker[]> tracker, OutputReceiver receiver) throws IOException { TimestampedValue[] out = new TimestampedValue[1]; @@ -301,23 +313,24 @@ public void processElement( } @GetRestrictionCoder - public Coder> restrictionCoder() { - return SerializableCoder.of(new TypeDescriptor>() {}); + public Coder restrictionCoder() { + return SerializableCoder.of(new TypeDescriptor() {}); } /** * A fake restriction tracker which adapts to the {@link BoundedSource} API. The restriction * object is used to advance the underlying source and to "return" the current element. 
*/ - private static class BoundedSourceAsSDFRestrictionTracker - extends RestrictionTracker, TimestampedValue[]> { - private final BoundedSource initialRestriction; + private static class BoundedSourceAsSDFRestrictionTracker< + BoundedSourceT extends BoundedSource, T> + extends RestrictionTracker[]> { + private final BoundedSourceT initialRestriction; private final PipelineOptions pipelineOptions; private BoundedSource.BoundedReader currentReader; private boolean claimedAll; BoundedSourceAsSDFRestrictionTracker( - BoundedSource initialRestriction, PipelineOptions pipelineOptions) { + BoundedSourceT initialRestriction, PipelineOptions pipelineOptions) { this.initialRestriction = initialRestriction; this.pipelineOptions = pipelineOptions; } @@ -383,15 +396,15 @@ protected void finalize() throws Throwable { /** The value is invalid if {@link #tryClaim} has ever thrown an exception. */ @Override - public BoundedSource currentRestriction() { + public BoundedSourceT currentRestriction() { if (currentReader == null) { return initialRestriction; } - return currentReader.getCurrentSource(); + return (BoundedSourceT) currentReader.getCurrentSource(); } @Override - public SplitResult> trySplit(double fractionOfRemainder) { + public SplitResult trySplit(double fractionOfRemainder) { if (currentReader == null) { return null; } @@ -406,7 +419,7 @@ public SplitResult> trySplit(double fractionOfRemainder) { return null; } BoundedSource primary = currentReader.getCurrentSource(); - return SplitResult.of(primary, residual); + return (SplitResult) SplitResult.of(primary, residual); } @Override @@ -439,8 +452,11 @@ static class UnboundedSourceAsSDFWrapperFn checkpointCoder; + private Cache> cachedReaders; + private Coder> restrictionCoder; - private UnboundedSourceAsSDFWrapperFn(Coder checkpointCoder) { + @VisibleForTesting + UnboundedSourceAsSDFWrapperFn(Coder checkpointCoder) { this.checkpointCoder = checkpointCoder; } @@ -450,6 +466,27 @@ public UnboundedSourceRestriction initialRestriction( return UnboundedSourceRestriction.create(element, null, BoundedWindow.TIMESTAMP_MIN_VALUE); } + @Setup + public void setUp() throws Exception { + restrictionCoder = restrictionCoder(); + cachedReaders = + CacheBuilder.newBuilder() + .expireAfterWrite(1, TimeUnit.MINUTES) + .maximumSize(100) + .removalListener( + (RemovalListener) + removalNotification -> { + if (removalNotification.wasEvicted()) { + try { + removalNotification.getValue().close(); + } catch (IOException e) { + LOG.warn("Failed to close UnboundedReader.", e); + } + } + }) + .build(); + } + @SplitRestriction public void splitRestriction( @Restriction UnboundedSourceRestriction restriction, @@ -488,7 +525,10 @@ public void splitRestriction( restrictionTracker( @Restriction UnboundedSourceRestriction restriction, PipelineOptions pipelineOptions) { - return new UnboundedSourceAsSDFRestrictionTracker(restriction, pipelineOptions); + checkNotNull(restrictionCoder); + checkNotNull(cachedReaders); + return new UnboundedSourceAsSDFRestrictionTracker( + restriction, pipelineOptions, cachedReaders, restrictionCoder); } @ProcessElement @@ -504,6 +544,7 @@ public ProcessContinuation processElement( UnboundedSourceValue[] out = new UnboundedSourceValue[1]; while (tracker.tryClaim(out) && out[0] != null) { + watermarkEstimator.setWatermark(out[0].getWatermark()); receiver.outputWithTimestamp( new ValueWithRecordId<>(out[0].getValue(), out[0].getId()), out[0].getTimestamp()); } @@ -511,9 +552,6 @@ public ProcessContinuation processElement( UnboundedSourceRestriction 
currentRestriction = tracker.currentRestriction(); - // Advance the watermark even if zero elements may have been output. - watermarkEstimator.setWatermark(currentRestriction.getWatermark()); - // Add the checkpoint mark to be finalized if the checkpoint mark isn't trivial and is not // the initial restriction. The initial restriction would have been finalized as part of // a prior bundle being executed. @@ -533,6 +571,10 @@ public ProcessContinuation processElement( if (currentRestriction.getSource() instanceof EmptyUnboundedSource) { return ProcessContinuation.stop(); } + + // Advance the watermark even if zero elements may have been output. Only report it if we + // are resuming. + watermarkEstimator.setWatermark(currentRestriction.getWatermark()); return ProcessContinuation.resume(); } @@ -570,10 +612,11 @@ public Coder> restrictionCoder( */ @AutoValue abstract static class UnboundedSourceValue { + public static UnboundedSourceValue create( + byte[] id, T value, Instant timestamp, Instant watermark) { - public static UnboundedSourceValue create(byte[] id, T value, Instant timestamp) { return new AutoValue_Read_UnboundedSourceAsSDFWrapperFn_UnboundedSourceValue( - id, value, timestamp); + id, value, timestamp, watermark); } @SuppressWarnings("mutable") @@ -582,6 +625,8 @@ public static UnboundedSourceValue create(byte[] id, T value, Instant tim public abstract T getValue(); public abstract Instant getTimestamp(); + + public abstract Instant getWatermark(); } /** @@ -756,22 +801,62 @@ private static class UnboundedSourceAsSDFRestrictionTracker< private final PipelineOptions pipelineOptions; private UnboundedSource.UnboundedReader currentReader; private boolean readerHasBeenStarted; + private Cache> cachedReaders; + private Coder> restrictionCoder; UnboundedSourceAsSDFRestrictionTracker( UnboundedSourceRestriction initialRestriction, - PipelineOptions pipelineOptions) { + PipelineOptions pipelineOptions, + Cache> cachedReaders, + Coder> restrictionCoder) { this.initialRestriction = initialRestriction; this.pipelineOptions = pipelineOptions; + this.cachedReaders = cachedReaders; + this.restrictionCoder = restrictionCoder; + } + + private Object createCacheKey( + UnboundedSource source, CheckpointT checkpoint) { + checkNotNull(restrictionCoder); + // For caching reader, we don't care about the watermark. + return restrictionCoder.structuralValue( + UnboundedSourceRestriction.create( + source, checkpoint, BoundedWindow.TIMESTAMP_MIN_VALUE)); + } + + private void initializeCurrentReader() throws IOException { + Preconditions.checkState(currentReader == null); + Object cacheKey = + createCacheKey(initialRestriction.getSource(), initialRestriction.getCheckpoint()); + currentReader = cachedReaders.getIfPresent(cacheKey); + if (currentReader == null) { + currentReader = + initialRestriction + .getSource() + .createReader(pipelineOptions, initialRestriction.getCheckpoint()); + } else { + // If the reader is from cache, then we know that the reader has been started. + // We also remove this cache entry to avoid eviction. + readerHasBeenStarted = true; + cachedReaders.invalidate(cacheKey); + } + } + + private void cacheCurrentReader( + UnboundedSourceRestriction restriction) { + if (!(currentReader instanceof EmptyUnboundedSource.EmptyUnboundedReader)) { + // We only put the reader into the cache when we know it possibly will be reused by + // residuals. 
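Editor's note: the reader cache introduced in `setUp()` follows a close-on-eviction pattern: entries that age out or overflow the size bound are closed by the removal listener, while a residual that resumes from the same (source, checkpoint) key takes the reader back via `getIfPresent` plus `invalidate`. A standalone sketch of that pattern, using plain Guava rather than Beam's vendored copy and a stand-in `Handle` class:

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.RemovalListener;
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.TimeUnit;

public class CachedReaderSketch {
  /** Stand-in for an expensive handle such as an open UnboundedReader. */
  static class Handle implements Closeable {
    @Override
    public void close() throws IOException {
      System.out.println("closed");
    }
  }

  public static void main(String[] args) {
    // Close handles that age out or are evicted for size so they do not leak.
    RemovalListener<String, Handle> closeOnEviction =
        notification -> {
          if (notification.wasEvicted()) {
            try {
              notification.getValue().close();
            } catch (IOException e) {
              // Best effort: failing to close an evicted handle must not fail the caller.
            }
          }
        };

    Cache<String, Handle> readers =
        CacheBuilder.newBuilder()
            .expireAfterWrite(1, TimeUnit.MINUTES)
            .maximumSize(100)
            .removalListener(closeOnEviction)
            .build();

    // A bundle parks its reader under a (source, checkpoint)-derived key...
    readers.put("source@checkpoint", new Handle());

    // ...and the residual that resumes from the same key takes it back instead of re-opening.
    Handle reused = readers.getIfPresent("source@checkpoint");
    if (reused != null) {
      // Explicit invalidation does not fire the close (wasEvicted() is false), so the
      // caller now owns the handle again and the cache cannot close it out from under us.
      readers.invalidate("source@checkpoint");
    }
  }
}
```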
+ cachedReaders.put( + createCacheKey(restriction.getSource(), restriction.getCheckpoint()), currentReader); + } } @Override public boolean tryClaim(UnboundedSourceValue[] position) { try { if (currentReader == null) { - currentReader = - initialRestriction - .getSource() - .createReader(pipelineOptions, initialRestriction.getCheckpoint()); + initializeCurrentReader(); } if (currentReader instanceof EmptyUnboundedSource.EmptyUnboundedReader) { return false; @@ -790,7 +875,8 @@ public boolean tryClaim(UnboundedSourceValue[] position) { UnboundedSourceValue.create( currentReader.getCurrentRecordId(), currentReader.getCurrent(), - currentReader.getCurrentTimestamp()); + currentReader.getCurrentTimestamp(), + currentReader.getWatermark()); return true; } catch (IOException e) { if (currentReader != null) { @@ -798,23 +884,14 @@ public boolean tryClaim(UnboundedSourceValue[] position) { currentReader.close(); } catch (IOException closeException) { e.addSuppressed(closeException); + } finally { + currentReader = null; } } throw new RuntimeException(e); } } - @Override - protected void finalize() throws Throwable { - if (currentReader != null) { - try { - currentReader.close(); - } catch (IOException e) { - LOG.error("Failed to close UnboundedReader due to failure processing bundle.", e); - } - } - } - /** The value is invalid if {@link #tryClaim} has ever thrown an exception. */ @Override public UnboundedSourceRestriction currentRestriction() { @@ -858,14 +935,10 @@ public SplitResult> trySplit( UnboundedSourceRestriction.create( EmptyUnboundedSource.INSTANCE, null, BoundedWindow.TIMESTAMP_MAX_VALUE), currentRestriction); - try { - currentReader.close(); - } catch (IOException e) { - LOG.warn("Failed to close UnboundedReader.", e); - } finally { - currentReader = - EmptyUnboundedSource.INSTANCE.createReader(null, currentRestriction.getCheckpoint()); - } + + cacheCurrentReader(currentRestriction); + currentReader = + EmptyUnboundedSource.INSTANCE.createReader(null, currentRestriction.getCheckpoint()); return result; } @@ -889,45 +962,52 @@ public Progress getProgress() { return RestrictionTracker.Progress.from(1, 0); } + boolean resetReaderAfter = false; if (currentReader == null) { try { - currentReader = - initialRestriction - .getSource() - .createReader(pipelineOptions, initialRestriction.getCheckpoint()); + initializeCurrentReader(); + resetReaderAfter = true; } catch (IOException e) { throw new RuntimeException(e); } } - long size = currentReader.getSplitBacklogBytes(); - if (size != UnboundedReader.BACKLOG_UNKNOWN) { - // The UnboundedSource/UnboundedReader API has no way of reporting how much work - // has been completed so runners can only see the work remaining changing. - return RestrictionTracker.Progress.from(0, size); - } + try { + long size = currentReader.getSplitBacklogBytes(); + if (size != UnboundedReader.BACKLOG_UNKNOWN) { + // The UnboundedSource/UnboundedReader API has no way of reporting how much work + // has been completed so runners can only see the work remaining changing. 
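Editor's note: the `getProgress()` change below only wraps the backlog query so a temporarily opened reader is cached and released again; the shape of the reported progress is unchanged. A distilled sketch of that mapping, using only the constants referenced in the diff:

```java
import org.apache.beam.sdk.io.UnboundedSource.UnboundedReader;
import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker;

public class BacklogProgressSketch {
  // Completed work is always reported as 0: the UnboundedReader API only exposes the
  // remaining backlog, never how much has already been processed.
  static RestrictionTracker.Progress fromBacklog(long splitBacklogBytes) {
    if (splitBacklogBytes != UnboundedReader.BACKLOG_UNKNOWN) {
      return RestrictionTracker.Progress.from(0, splitBacklogBytes);
    }
    // Unknown backlog is reported as one unit of remaining work.
    return RestrictionTracker.Progress.from(0, 1);
  }

  public static void main(String[] args) {
    System.out.println(fromBacklog(1024));
    System.out.println(fromBacklog(UnboundedReader.BACKLOG_UNKNOWN));
  }
}
```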
+ return RestrictionTracker.Progress.from(0, size); + } - // TODO: Support "global" backlog reporting - // size = reader.getTotalBacklogBytes(); - // if (size != UnboundedReader.BACKLOG_UNKNOWN) { - // return size; - // } + // TODO: Support "global" backlog reporting + // size = reader.getTotalBacklogBytes(); + // if (size != UnboundedReader.BACKLOG_UNKNOWN) { + // return size; + // } - // We treat unknown as 0 progress - return RestrictionTracker.Progress.from(0, 1); + // We treat unknown as 0 progress + return RestrictionTracker.Progress.from(0, 1); + } finally { + if (resetReaderAfter) { + cacheCurrentReader(initialRestriction); + currentReader = null; + } + } } } } - private static class OutputSingleSource extends DoFn { - private final T source; + private static class OutputSingleSource> + extends DoFn { + private final SourceT source; - private OutputSingleSource(T source) { + private OutputSingleSource(SourceT source) { this.source = source; } @ProcessElement - public void processElement(OutputReceiver outputReceiver) { + public void processElement(OutputReceiver outputReceiver) { outputReceiver.output(source); } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/ReadAllViaFileBasedSource.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/ReadAllViaFileBasedSource.java index 2c66ad538d3a..f21ba97e82bb 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/ReadAllViaFileBasedSource.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/ReadAllViaFileBasedSource.java @@ -19,6 +19,7 @@ import edu.umd.cs.findbugs.annotations.SuppressFBWarnings; import java.io.IOException; +import java.io.Serializable; import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.annotations.Experimental.Kind; import org.apache.beam.sdk.coders.Coder; @@ -33,6 +34,8 @@ import org.apache.beam.sdk.transforms.SerializableFunction; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollection; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; /** * Reads each file in the input {@link PCollection} of {@link ReadableFile} using given parameters @@ -45,25 +48,49 @@ @Experimental(Kind.SOURCE_SINK) public class ReadAllViaFileBasedSource extends PTransform, PCollection> { + + private static final Logger LOG = LoggerFactory.getLogger(ReadAllViaFileBasedSource.class); + protected static final boolean DEFAULT_USES_RESHUFFLE = true; private final long desiredBundleSizeBytes; private final SerializableFunction> createSource; private final Coder coder; + private final ReadFileRangesFnExceptionHandler exceptionHandler; + private final boolean usesReshuffle; public ReadAllViaFileBasedSource( long desiredBundleSizeBytes, SerializableFunction> createSource, Coder coder) { + this( + desiredBundleSizeBytes, + createSource, + coder, + DEFAULT_USES_RESHUFFLE, + new ReadFileRangesFnExceptionHandler()); + } + + public ReadAllViaFileBasedSource( + long desiredBundleSizeBytes, + SerializableFunction> createSource, + Coder coder, + boolean usesReshuffle, + ReadFileRangesFnExceptionHandler exceptionHandler) { this.desiredBundleSizeBytes = desiredBundleSizeBytes; this.createSource = createSource; this.coder = coder; + this.usesReshuffle = usesReshuffle; + this.exceptionHandler = exceptionHandler; } @Override public PCollection expand(PCollection input) { - return input - .apply("Split into ranges", ParDo.of(new SplitIntoRangesFn(desiredBundleSizeBytes))) - .apply("Reshuffle", Reshuffle.viaRandomKey()) - .apply("Read ranges", ParDo.of(new 
ReadFileRangesFn<>(createSource))) + PCollection> ranges = + input.apply("Split into ranges", ParDo.of(new SplitIntoRangesFn(desiredBundleSizeBytes))); + if (usesReshuffle) { + ranges = ranges.apply("Reshuffle", Reshuffle.viaRandomKey()); + } + return ranges + .apply("Read ranges", ParDo.of(new ReadFileRangesFn(createSource, exceptionHandler))) .setCoder(coder); } @@ -90,10 +117,13 @@ public void process(ProcessContext c) { private static class ReadFileRangesFn extends DoFn, T> { private final SerializableFunction> createSource; + private final ReadFileRangesFnExceptionHandler exceptionHandler; private ReadFileRangesFn( - SerializableFunction> createSource) { + SerializableFunction> createSource, + ReadFileRangesFnExceptionHandler exceptionHandler) { this.createSource = createSource; + this.exceptionHandler = exceptionHandler; } @ProcessElement @@ -113,7 +143,23 @@ public void process(ProcessContext c) throws IOException { for (boolean more = reader.start(); more; more = reader.advance()) { c.output(reader.getCurrent()); } + } catch (RuntimeException e) { + if (exceptionHandler.apply(file, range, e)) { + throw e; + } } } } + + /** A class to handle errors which occur during file reads. */ + public static class ReadFileRangesFnExceptionHandler implements Serializable { + + /* + * Applies the desired handler logic to the given exception and returns + * if the exception should be thrown. + */ + public boolean apply(ReadableFile file, OffsetRange range, Exception e) { + return true; + } + } } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextSource.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextSource.java index fbe6389152e6..76fe11ecf87c 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextSource.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextSource.java @@ -30,7 +30,7 @@ import org.apache.beam.sdk.io.fs.MatchResult; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.ValueProvider; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.checkerframework.checker.nullness.qual.Nullable; diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java index 6c358e00c358..9dabb4026e1d 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java @@ -22,10 +22,12 @@ import com.google.auto.value.AutoValue; import java.io.IOException; +import java.util.ArrayList; import java.util.Collection; import java.util.List; import java.util.Map; import java.util.UUID; +import java.util.concurrent.CompletionStage; import java.util.concurrent.ThreadLocalRandom; import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.annotations.Experimental.Kind; @@ -33,6 +35,7 @@ import org.apache.beam.sdk.coders.CannotProvideCoderException; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.Coder.NonDeterministicException; +import org.apache.beam.sdk.coders.IterableCoder; import org.apache.beam.sdk.coders.KvCoder; import org.apache.beam.sdk.coders.ListCoder; import org.apache.beam.sdk.coders.ShardedKeyCoder; @@ -50,10 +53,13 @@ import org.apache.beam.sdk.transforms.DoFn; import 
org.apache.beam.sdk.transforms.Flatten; import org.apache.beam.sdk.transforms.GroupByKey; +import org.apache.beam.sdk.transforms.GroupIntoBatches; +import org.apache.beam.sdk.transforms.MapElements; import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.transforms.ParDo; import org.apache.beam.sdk.transforms.Reify; import org.apache.beam.sdk.transforms.Reshuffle; +import org.apache.beam.sdk.transforms.SimpleFunction; import org.apache.beam.sdk.transforms.Values; import org.apache.beam.sdk.transforms.View; import org.apache.beam.sdk.transforms.WithKeys; @@ -65,6 +71,7 @@ import org.apache.beam.sdk.transforms.windowing.PaneInfo; import org.apache.beam.sdk.transforms.windowing.Window; import org.apache.beam.sdk.util.CoderUtils; +import org.apache.beam.sdk.util.MoreFutures; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollection.IsBounded; @@ -84,6 +91,8 @@ import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Multimap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.hash.Hashing; import org.checkerframework.checker.nullness.qual.Nullable; +import org.joda.time.Duration; +import org.joda.time.Instant; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -133,7 +142,14 @@ public abstract class WriteFiles // We could consider making this a parameter. private static final int SPILLED_RECORD_SHARDING_FACTOR = 10; + // The record count and buffering duration to trigger flushing records to a tmp file. Mainly used + // for writing unbounded data to avoid generating too many small files. + private static final int FILE_TRIGGERING_RECORD_COUNT = 100000; + private static final Duration FILE_TRIGGERING_RECORD_BUFFERING_DURATION = + Duration.standardSeconds(5); + static final int UNKNOWN_SHARDNUM = -1; + static final int DUMMY_SHARDNUM = 0; private @Nullable WriteOperation writeOperation; /** @@ -313,13 +329,16 @@ public WriteFilesResult expand(PCollection input) { getWindowedWrites(), "Must use windowed writes when applying %s to an unbounded PCollection", WriteFiles.class.getSimpleName()); - // The reason for this is https://issues.apache.org/jira/browse/BEAM-1438 - // and similar behavior in other runners. - checkArgument( - getComputeNumShards() != null || getNumShardsProvider() != null, - "When applying %s to an unbounded PCollection, " - + "must specify number of output shards explicitly", - WriteFiles.class.getSimpleName()); + // Sharding used to be required due to https://issues.apache.org/jira/browse/BEAM-1438 and + // similar behavior in other runners. Some runners may support runner determined sharding now. + // Check merging window here due to https://issues.apache.org/jira/browse/BEAM-12040. + if (input.getWindowingStrategy().needsMerge()) { + checkArgument( + getComputeNumShards() != null || getNumShardsProvider() != null, + "When applying %s to an unbounded PCollection with merging windows," + + " must specify number of output shards explicitly", + WriteFiles.class.getSimpleName()); + } } this.writeOperation = getSink().createWriteOperation(); this.writeOperation.setWindowedWrites(getWindowedWrites()); @@ -352,21 +371,34 @@ public WriteFilesResult expand(PCollection input) { PCollectionView numShardsView = (getComputeNumShards() == null) ? null : input.apply(getComputeNumShards()); - PCollection> tempFileResults = - (getComputeNumShards() == null && getNumShardsProvider() == null) - ? 
input.apply( - "WriteUnshardedBundlesToTempFiles", - new WriteUnshardedBundlesToTempFiles(destinationCoder, fileResultCoder)) - : input.apply( - "WriteShardedBundlesToTempFiles", - new WriteShardedBundlesToTempFiles( - destinationCoder, fileResultCoder, numShardsView)); - - return tempFileResults - .apply("GatherTempFileResults", new GatherResults<>(fileResultCoder)) - .apply( - "FinalizeTempFileBundles", - new FinalizeTempFileBundles(numShardsView, destinationCoder)); + boolean fixedSharding = getComputeNumShards() != null || getNumShardsProvider() != null; + PCollection>> tempFileResults; + if (fixedSharding) { + tempFileResults = + input + .apply( + "WriteShardedBundlesToTempFiles", + new WriteShardedBundlesToTempFiles( + destinationCoder, fileResultCoder, numShardsView)) + .apply("GatherTempFileResults", new GatherResults<>(fileResultCoder)); + } else { + if (input.isBounded() == IsBounded.BOUNDED) { + tempFileResults = + input + .apply( + "WriteUnshardedBundlesToTempFiles", + new WriteUnshardedBundlesToTempFiles(destinationCoder, fileResultCoder)) + .apply("GatherTempFileResults", new GatherResults<>(fileResultCoder)); + } else { + tempFileResults = + input.apply( + "WriteAutoShardedBundlesToTempFiles", + new WriteAutoShardedBundlesToTempFiles(destinationCoder, fileResultCoder)); + } + } + + return tempFileResults.apply( + "FinalizeTempFileBundles", new FinalizeTempFileBundles(numShardsView, destinationCoder)); } @Override @@ -679,6 +711,126 @@ public PCollection> expand(PCollection input) { } } + private class WriteAutoShardedBundlesToTempFiles + extends PTransform, PCollection>>> { + private final Coder destinationCoder; + private final Coder> fileResultCoder; + + private WriteAutoShardedBundlesToTempFiles( + Coder destinationCoder, Coder> fileResultCoder) { + this.destinationCoder = destinationCoder; + this.fileResultCoder = fileResultCoder; + } + + @Override + public PCollection>> expand(PCollection input) { + // Auto-sharding is achieved via GroupIntoBatches.WithShardedKey which shards, groups and at + // the same time batches the input records. The sharding behavior depends on runners. The + // batching is per window and we also emit the batches if there are a certain number of + // records buffered or they have been buffered for a certain time, controlled by + // FILE_TRIGGERING_RECORD_COUNT and BUFFERING_DURATION respectively. + // + // TODO(BEAM-12040): The implementation doesn't currently work with merging windows. + PCollection, Iterable>> shardedInput = + input + .apply( + "KeyedByDestinationHash", + ParDo.of( + new DoFn>() { + @ProcessElement + public void processElement(@Element UserT element, ProcessContext context) + throws Exception { + getDynamicDestinations().setSideInputAccessorFromProcessContext(context); + DestinationT destination = + getDynamicDestinations().getDestination(context.element()); + context.output( + KV.of(hashDestination(destination, destinationCoder), element)); + } + })) + .setCoder(KvCoder.of(VarIntCoder.of(), input.getCoder())) + .apply( + "ShardAndBatch", + GroupIntoBatches.ofSize(FILE_TRIGGERING_RECORD_COUNT) + .withMaxBufferingDuration(FILE_TRIGGERING_RECORD_BUFFERING_DURATION) + .withShardedKey()) + .setCoder( + KvCoder.of( + org.apache.beam.sdk.util.ShardedKey.Coder.of(VarIntCoder.of()), + IterableCoder.of(input.getCoder()))); + + // Write grouped elements to temp files. 
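Editor's note: the comment above describes the auto-sharding path: `GroupIntoBatches` with a sharded key leaves shard assignment to the runner while still bounding batch size and buffering time. A minimal sketch of that transform shape with made-up elements and a made-up batch size (the real trigger values are FILE_TRIGGERING_RECORD_COUNT and FILE_TRIGGERING_RECORD_BUFFERING_DURATION above); running it assumes a runner such as the DirectRunner on the classpath:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.coders.VarIntCoder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.GroupIntoBatches;
import org.apache.beam.sdk.util.ShardedKey;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class ShardedBatchingSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());

    PCollection<KV<Integer, String>> keyed =
        p.apply(
            Create.of(KV.of(1, "a"), KV.of(1, "b"), KV.of(2, "c"))
                .withCoder(KvCoder.of(VarIntCoder.of(), StringUtf8Coder.of())));

    // Emit a batch for a key once 10 elements are buffered or 5 seconds have passed,
    // whichever comes first; shard assignment within a key is left to the runner.
    PCollection<KV<ShardedKey<Integer>, Iterable<String>>> batches =
        keyed.apply(
            GroupIntoBatches.<Integer, String>ofSize(10)
                .withMaxBufferingDuration(Duration.standardSeconds(5))
                .withShardedKey());

    p.run().waitUntilFinish();
  }
}
```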
+ PCollection> tempFiles = + shardedInput + .apply( + "AddDummyShard", + MapElements.via( + new SimpleFunction< + KV, Iterable>, + KV, Iterable>>() { + @Override + public KV, Iterable> apply( + KV, Iterable> + input) { + // Add dummy shard since it is required by WriteShardsIntoTempFilesFn. It + // will be dropped after we generate the temp files. + return KV.of( + ShardedKey.of(input.getKey().getKey(), DUMMY_SHARDNUM), + input.getValue()); + } + })) + .setCoder( + KvCoder.of( + ShardedKeyCoder.of(VarIntCoder.of()), IterableCoder.of(input.getCoder()))) + .apply( + "WriteShardsIntoTempFiles", + ParDo.of(new WriteShardsIntoTempFilesFn()).withSideInputs(getSideInputs())) + .setCoder(fileResultCoder) + .apply( + "DropShardNum", + ParDo.of( + new DoFn, FileResult>() { + @ProcessElement + public void process(ProcessContext c) { + c.output(c.element().withShard(UNKNOWN_SHARDNUM)); + } + })); + + // Group temp file results by destinations again to gather all the results in the same window. + // This is needed since we don't have shard idx associated with each temp file so have to rely + // on the indexing within a bundle. + return tempFiles + .apply( + "KeyedByDestination", + WithKeys.of( + new SimpleFunction, DestinationT>() { + @Override + public DestinationT apply(FileResult input) { + return input.getDestination(); + } + })) + .setCoder(KvCoder.of(destinationCoder, fileResultCoder)) + .apply(GroupByKey.create()) + .apply( + "ExtractValuesToList", + ParDo.of( + new DoFn< + KV>>, + List>>() { + @ProcessElement + public void processElement( + @Element KV>> element, + ProcessContext c) { + List> result = new ArrayList<>(); + for (FileResult e : element.getValue()) { + result.add(e); + } + c.output(result); + } + })) + .setCoder(ListCoder.of(fileResultCoder)); + } + } + private class RandomShardingFunction implements ShardingFunction { private final Coder destinationCoder; @@ -749,6 +901,15 @@ public void processElement(ProcessContext context) throws Exception { private class WriteShardsIntoTempFilesFn extends DoFn, Iterable>, FileResult> { + private transient @Nullable List> closeFutures = null; + private transient @Nullable List>> deferredOutput = null; + + @StartBundle + public void startBundle() { + closeFutures = new ArrayList<>(); + deferredOutput = new ArrayList<>(); + } + @ProcessElement public void processElement(ProcessContext c, BoundedWindow window) throws Exception { getDynamicDestinations().setSideInputAccessorFromProcessContext(c); @@ -777,21 +938,43 @@ public void processElement(ProcessContext c, BoundedWindow window) throws Except // Close all writers. for (Map.Entry> entry : writers.entrySet()) { - Writer writer = entry.getValue(); - try { - // Close the writer; if this throws let the error propagate. - writer.close(); - } catch (Exception e) { - // If anything goes wrong, make sure to delete the temporary file. - writer.cleanup(); - throw e; - } int shard = c.element().getKey().getShardNumber(); checkArgument( shard != UNKNOWN_SHARDNUM, "Shard should have been set, but is unset for element %s", c.element()); - c.output(new FileResult<>(writer.getOutputFile(), shard, window, c.pane(), entry.getKey())); + Writer writer = entry.getValue(); + deferredOutput.add( + KV.of( + c.timestamp(), + new FileResult<>(writer.getOutputFile(), shard, window, c.pane(), entry.getKey()))); + closeWriterInBackground(writer); + } + } + + private void closeWriterInBackground(Writer writer) { + // Close in parallel so flushing of buffered writes to files for many windows happens in + // parallel. 
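Editor's note: `closeWriterInBackground` and `finishBundle` below implement a fan-out-then-join pattern with `MoreFutures`: every writer close runs asynchronously, and file results are only emitted once all closes have succeeded. A self-contained sketch of the same pattern, with the flush-and-close work simulated by a print:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionStage;
import org.apache.beam.sdk.util.MoreFutures;

public class ParallelCloseSketch {
  public static void main(String[] args) throws Exception {
    List<CompletionStage<Void>> closeFutures = new ArrayList<>();

    // Pretend these are per-destination writers whose close() flushes buffered bytes.
    for (int i = 0; i < 3; i++) {
      final int writerId = i;
      closeFutures.add(
          MoreFutures.runAsync(
              () -> {
                // Expensive flush-and-close work runs off the bundle thread.
                System.out.println("closed writer " + writerId);
              }));
    }

    // Only after every close has succeeded is it safe to emit results downstream.
    MoreFutures.get(MoreFutures.allAsList(closeFutures));
  }
}
```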
+ closeFutures.add( + MoreFutures.runAsync( + () -> { + try { + // Close the writer; if this throws let the error propagate. + writer.close(); + } catch (Exception e) { + // If anything goes wrong, make sure to delete the temporary file. + writer.cleanup(); + throw e; + } + })); + } + + @FinishBundle + public void finishBundle(FinishBundleContext c) throws Exception { + MoreFutures.get(MoreFutures.allAsList(closeFutures)); + // If all writers were closed without exception, output the results to the next stage. + for (KV> result : deferredOutput) { + c.output(result.getValue(), result.getKey(), result.getValue().getWindow()); } } } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/fs/MatchResult.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/fs/MatchResult.java index 3447687319be..ceb8ae8bf9ec 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/fs/MatchResult.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/fs/MatchResult.java @@ -25,6 +25,7 @@ import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.annotations.Experimental.Kind; import org.apache.beam.sdk.io.FileSystems; +import org.checkerframework.checker.nullness.qual.Nullable; /** The result of {@link org.apache.beam.sdk.io.FileSystem#match}. */ public abstract class MatchResult { @@ -89,6 +90,9 @@ public abstract static class Metadata implements Serializable { public abstract boolean isReadSeekEfficient(); + /** An optional checksum to identify the contents of a file. */ + public abstract @Nullable String checksum(); + /** * Last modification timestamp in milliseconds since Unix epoch. * @@ -113,6 +117,7 @@ public abstract static class Metadata implements Serializable { public static Builder builder() { return new AutoValue_MatchResult_Metadata.Builder() + .setIsReadSeekEfficient(false) .setLastModifiedMillis(UNKNOWN_LAST_MODIFIED_MILLIS); } @@ -127,6 +132,8 @@ public abstract static class Builder { public abstract Builder setLastModifiedMillis(long value); + public abstract Builder setChecksum(String value); + public abstract Metadata build(); } } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/fs/MoveOptions.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/fs/MoveOptions.java index 5ffcdae99640..937921030526 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/fs/MoveOptions.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/fs/MoveOptions.java @@ -28,5 +28,6 @@ public interface MoveOptions { /** Defines the standard {@link MoveOptions}. 
*/ enum StandardMoveOptions implements MoveOptions { IGNORE_MISSING_FILES, + SKIP_IF_DESTINATION_EXISTS, } } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/range/ByteKey.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/range/ByteKey.java index 0cf62dcce1fc..eef8252ddd42 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/range/ByteKey.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/range/ByteKey.java @@ -22,8 +22,8 @@ import java.io.Serializable; import java.nio.ByteBuffer; import javax.annotation.Nonnull; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString.ByteIterator; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString.ByteIterator; import org.checkerframework.checker.nullness.qual.Nullable; /** diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/DelegatingCounter.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/DelegatingCounter.java index 7566dd8c7f83..82808d8cb633 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/DelegatingCounter.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/DelegatingCounter.java @@ -24,9 +24,15 @@ @Internal public class DelegatingCounter implements Metric, Counter, Serializable { private final MetricName name; + private final boolean processWideContainer; public DelegatingCounter(MetricName name) { + this(name, false); + } + + public DelegatingCounter(MetricName name, boolean processWideContainer) { this.name = name; + this.processWideContainer = processWideContainer; } /** Increment the counter. */ @@ -38,7 +44,10 @@ public void inc() { /** Increment the counter by the given amount. */ @Override public void inc(long n) { - MetricsContainer container = MetricsEnvironment.getCurrentContainer(); + MetricsContainer container = + this.processWideContainer + ? MetricsEnvironment.getProcessWideContainer() + : MetricsEnvironment.getCurrentContainer(); if (container != null) { container.getCounter(name).inc(n); } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/DelegatingDistribution.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/DelegatingDistribution.java new file mode 100644 index 000000000000..6cfe98e01931 --- /dev/null +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/DelegatingDistribution.java @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.metrics; + +import java.io.Serializable; +import org.apache.beam.sdk.annotations.Internal; + +/** + * Implementation of {@link Distribution} that delegates to the instance for the current context. + */ +@Internal +public class DelegatingDistribution implements Metric, Distribution, Serializable { + private final MetricName name; + private final boolean processWideContainer; + + public DelegatingDistribution(MetricName name) { + this(name, false); + } + + public DelegatingDistribution(MetricName name, boolean processWideContainer) { + this.name = name; + this.processWideContainer = processWideContainer; + } + + @Override + public void update(long value) { + MetricsContainer container = + this.processWideContainer + ? MetricsEnvironment.getProcessWideContainer() + : MetricsEnvironment.getCurrentContainer(); + if (container != null) { + container.getDistribution(name).update(value); + } + } + + @Override + public void update(long sum, long count, long min, long max) { + MetricsContainer container = + this.processWideContainer + ? MetricsEnvironment.getProcessWideContainer() + : MetricsEnvironment.getCurrentContainer(); + if (container != null) { + container.getDistribution(name).update(sum, count, min, max); + } + } + + @Override + public MetricName getName() { + return name; + } +} diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/DelegatingHistogram.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/DelegatingHistogram.java new file mode 100644 index 000000000000..74e3cf5719f7 --- /dev/null +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/DelegatingHistogram.java @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.metrics; + +import java.io.Serializable; +import org.apache.beam.sdk.annotations.Internal; +import org.apache.beam.sdk.util.HistogramData; + +/** Implementation of {@link Histogram} that delegates to the instance for the current context. */ +@Internal +public class DelegatingHistogram implements Metric, Histogram, Serializable { + private final MetricName name; + private final HistogramData.BucketType bucketType; + private final boolean processWideContainer; + + public DelegatingHistogram( + MetricName name, HistogramData.BucketType bucketType, boolean processWideContainer) { + this.name = name; + this.bucketType = bucketType; + this.processWideContainer = processWideContainer; + } + + @Override + public void update(double value) { + MetricsContainer container = + processWideContainer + ? 
MetricsEnvironment.getProcessWideContainer() + : MetricsEnvironment.getCurrentContainer(); + if (container != null) { + container.getHistogram(name, bucketType).update(value); + } + } + + @Override + public MetricName getName() { + return name; + } +} diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/Histogram.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/Histogram.java new file mode 100644 index 000000000000..542a6ed79af0 --- /dev/null +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/Histogram.java @@ -0,0 +1,27 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.metrics; + +import org.apache.beam.sdk.annotations.Experimental; + +/** A metric that reports information about the histogram of reported values. */ +@Experimental(Experimental.Kind.METRICS) +public interface Histogram extends Metric { + /** Add an observation to this histogram. */ + void update(double value); +} diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/MetricsContainer.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/MetricsContainer.java index d40619554232..f10fccdd44d3 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/MetricsContainer.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/MetricsContainer.java @@ -18,8 +18,10 @@ package org.apache.beam.sdk.metrics; import java.io.Serializable; +import org.apache.beam.model.pipeline.v1.MetricsApi; import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.annotations.Experimental.Kind; +import org.apache.beam.sdk.util.HistogramData; /** * Holds the metrics for a single step. Each of the methods should return an implementation of the @@ -45,4 +47,17 @@ public interface MetricsContainer extends Serializable { * this container. */ Gauge getGauge(MetricName metricName); + + /** + * Return the {@link Histogram} that should be used for implementing the given {@code metricName} + * in this container. + */ + default Histogram getHistogram(MetricName metricName, HistogramData.BucketType bucketType) { + throw new RuntimeException("Histogram metric is not supported yet."); + } + + /** Return the cumulative values for any metrics in this container as MonitoringInfos. 
*/ + default Iterable getMonitoringInfos() { + throw new RuntimeException("getMonitoringInfos is not implemented on this MetricsContainer."); + } } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/MetricsEnvironment.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/MetricsEnvironment.java index 4df429572b51..d01ce4ecc93a 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/MetricsEnvironment.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/MetricsEnvironment.java @@ -20,9 +20,11 @@ import java.io.Closeable; import java.io.IOException; import java.util.concurrent.atomic.AtomicBoolean; +import java.util.concurrent.atomic.AtomicReference; import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.annotations.Experimental.Kind; import org.apache.beam.sdk.annotations.Internal; +import org.checkerframework.checker.nullness.qual.NonNull; import org.checkerframework.checker.nullness.qual.Nullable; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -51,8 +53,11 @@ public class MetricsEnvironment { private static final AtomicBoolean METRICS_SUPPORTED = new AtomicBoolean(false); private static final AtomicBoolean REPORTED_MISSING_CONTAINER = new AtomicBoolean(false); - private static final ThreadLocal<@Nullable MetricsContainer> CONTAINER_FOR_THREAD = - new ThreadLocal<>(); + private static final ThreadLocal<@Nullable MetricsContainerHolder> CONTAINER_FOR_THREAD = + ThreadLocal.withInitial(MetricsContainerHolder::new); + + private static final AtomicReference<@Nullable MetricsContainer> PROCESS_WIDE_METRICS_CONTAINER = + new AtomicReference<>(); /** * Set the {@link MetricsContainer} for the current thread. @@ -61,15 +66,24 @@ public class MetricsEnvironment { */ public static @Nullable MetricsContainer setCurrentContainer( @Nullable MetricsContainer container) { - MetricsContainer previous = CONTAINER_FOR_THREAD.get(); - if (container == null) { - CONTAINER_FOR_THREAD.remove(); - } else { - CONTAINER_FOR_THREAD.set(container); - } + @SuppressWarnings("nullness") // Non-null due to withInitialValue + @NonNull + MetricsContainerHolder holder = CONTAINER_FOR_THREAD.get(); + @Nullable MetricsContainer previous = holder.container; + holder.container = container; return previous; } + /** + * Set the {@link MetricsContainer} for the current process. + * + * @return The previous container for the current process. + */ + public static @Nullable MetricsContainer setProcessWideContainer( + @Nullable MetricsContainer container) { + return PROCESS_WIDE_METRICS_CONTAINER.getAndSet(container); + } + /** Called by the run to indicate whether metrics reporting is supported. */ public static void setMetricsSupported(boolean supported) { METRICS_SUPPORTED.set(supported); @@ -91,16 +105,20 @@ public static Closeable scopedMetricsContainer(MetricsContainer container) { } private static class ScopedContainer implements Closeable { - + private final MetricsContainerHolder holder; private final @Nullable MetricsContainer oldContainer; + @SuppressWarnings("nullness") // Non-null due to withInitialValue private ScopedContainer(MetricsContainer newContainer) { - this.oldContainer = setCurrentContainer(newContainer); + // It is safe to cache the thread-local holder because it never changes for the thread. 
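Editor's note: the process-wide container added here pairs with the new two-argument `DelegatingCounter` (and `DelegatingDistribution`/`DelegatingHistogram`) constructors: metrics created with `processWideContainer = true` report to whatever container was installed via `setProcessWideContainer` rather than the thread-local, per-bundle one. A small sketch with a hypothetical metric name; until a process-wide container is installed, the updates are simply dropped:

```java
import org.apache.beam.sdk.metrics.DelegatingCounter;
import org.apache.beam.sdk.metrics.MetricName;
import org.apache.beam.sdk.metrics.MetricsEnvironment;

public class ProcessWideCounterSketch {
  public static void main(String[] args) {
    // A process-wide container is normally installed by runner or harness code, e.g.
    // MetricsEnvironment.setProcessWideContainer(container); shown only as a comment
    // because MetricsContainer implementations live outside the core SDK.

    // The second argument selects the process-wide container instead of the
    // thread-local, per-bundle one.
    DelegatingCounter rpcCalls =
        new DelegatingCounter(MetricName.named("hypothetical.namespace", "rpc_calls"), true);

    rpcCalls.inc();
    rpcCalls.inc(10);

    if (MetricsEnvironment.getProcessWideContainer() == null) {
      System.out.println("No process-wide container installed; the updates above were dropped.");
    }
  }
}
```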
+ holder = CONTAINER_FOR_THREAD.get(); + this.oldContainer = holder.container; + holder.container = newContainer; } @Override public void close() throws IOException { - setCurrentContainer(oldContainer); + holder.container = oldContainer; } } @@ -112,7 +130,8 @@ public void close() throws IOException { * diagnostic message. */ public static @Nullable MetricsContainer getCurrentContainer() { - MetricsContainer container = CONTAINER_FOR_THREAD.get(); + @SuppressWarnings("nullness") // Non-null due to withInitialValue + MetricsContainer container = CONTAINER_FOR_THREAD.get().container; if (container == null && REPORTED_MISSING_CONTAINER.compareAndSet(false, true)) { if (isMetricsSupported()) { LOG.error( @@ -124,4 +143,13 @@ public void close() throws IOException { } return container; } + + /** Return the {@link MetricsContainer} for the current process. */ + public static @Nullable MetricsContainer getProcessWideContainer() { + return PROCESS_WIDE_METRICS_CONTAINER.get(); + } + + private static class MetricsContainerHolder { + public @Nullable MetricsContainer container = null; + } } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/MetricsFilter.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/MetricsFilter.java index dbdec821a1d7..c08403e224bf 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/MetricsFilter.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/MetricsFilter.java @@ -80,6 +80,9 @@ public Builder addNameFilter(MetricNameFilter nameFilter) { *

    Step name filters may be either a full name (such as "foo/bar/baz") or a partial name such * as "foo", "bar" or "foo/bar". However, each component of the step name must be completely * matched, so the filter "foo" will not match a step name such as "fool/bar/foot" + * + *

    TODO(BEAM-12154): Beam does not guarantee a specific format for step IDs hence we should + * not assume a "foo/bar/baz" format here. */ public Builder addStep(String step) { immutableStepsBuilder().add(step); diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ExperimentalOptions.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ExperimentalOptions.java index cfc9f6d3cf46..8c15b8b5793f 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ExperimentalOptions.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ExperimentalOptions.java @@ -36,6 +36,8 @@ public interface ExperimentalOptions extends PipelineOptions { String STATE_CACHE_SIZE = "state_cache_size"; + String STATE_SAMPLING_PERIOD_MILLIS = "state_sampling_period_millis"; + @Description( "[Experimental] Apache Beam provides a number of experimental features that can " + "be enabled with this flag. If executing against a managed service, please contact the " diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/FileStagingOptions.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/FileStagingOptions.java new file mode 100644 index 000000000000..d480d123bc20 --- /dev/null +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/FileStagingOptions.java @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.options; + +import java.util.List; +import org.apache.beam.sdk.annotations.Experimental; + +/** File staging related options. */ +@Experimental +public interface FileStagingOptions extends PipelineOptions { + /** + * List of local files to make available to workers. + * + *

    Files are placed on the worker's classpath. + * + *

    The default value is the list of jars from the main program's classpath. + */ + @Description( + "Files to stage to the artifact service and make available to workers. Files are placed on " + + "the worker's classpath. The default value is all files from the classpath.") + List getFilesToStage(); + + void setFilesToStage(List value); +} diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptions.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptions.java index 9ffd1f75d423..8a5e07b18ff8 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptions.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptions.java @@ -136,6 +136,8 @@ *

  • Only getters may be annotated with {@link JsonIgnore @JsonIgnore}. *
  • If any getter is annotated with {@link JsonIgnore @JsonIgnore}, then all getters for this * property must be annotated with {@link JsonIgnore @JsonIgnore}. + *
  • If any getter is annotated with {@link JsonDeserialize} and {@link JsonSerialize}, then all + * getters for this property must also be. * * *

    Annotations For PipelineOptions

    @@ -161,6 +163,10 @@ *

    {@link JsonIgnore @JsonIgnore} is used to prevent a property from being serialized and * available during execution of {@link DoFn}. See the Serialization section below for more details. * + *

    {@link JsonSerialize @JsonSerialize} and {@link JsonDeserialize @JsonDeserialize} are used to + * control how a property is (de)serialized when the PipelineOptions are (de)serialized to JSON. See + * the Serialization section below for more details. + *

    Registration Of PipelineOptions

    * *

    Registration of {@link PipelineOptions} by an application guarantees that the {@link @@ -198,6 +204,14 @@ * Jackson's ability to automatically configure the {@link ObjectMapper} with additional modules via * {@link ObjectMapper#findModules()}. * + *

    To further customize serialization, getter methods may be annotated with {@link + * JsonSerialize @JsonSerialize} and {@link JsonDeserialize @JsonDeserialize}. {@link + * JsonDeserialize @JsonDeserialize} is also used when parsing command line arguments. + * + *

    Note: A property must be annotated with BOTH {@link JsonDeserialize @JsonDeserialize} + * and {@link JsonSerialize @JsonSerialize} or neither. It is an error to have a property annotated + * with only {@link JsonDeserialize @JsonDeserialize} or {@link JsonSerialize @JsonSerialize}. + *
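A minimal sketch of what such a property looks like; the option name, the Endpoint value type, and the serializer/deserializer classes are hypothetical and only illustrate the annotation pairing described above.

```java
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.JsonDeserializer;
import com.fasterxml.jackson.databind.JsonSerializer;
import com.fasterxml.jackson.databind.SerializerProvider;
import com.fasterxml.jackson.databind.annotation.JsonDeserialize;
import com.fasterxml.jackson.databind.annotation.JsonSerialize;
import java.io.IOException;
import org.apache.beam.sdk.options.PipelineOptions;

public interface EndpointOptions extends PipelineOptions {

  // Both annotations are required together, per the validation added in this change.
  @JsonSerialize(using = EndpointSerializer.class)
  @JsonDeserialize(using = EndpointDeserializer.class)
  Endpoint getServiceEndpoint();

  void setServiceEndpoint(Endpoint value);

  /** Hypothetical value type serialized as "host:port". */
  class Endpoint {
    public final String host;
    public final int port;

    public Endpoint(String host, int port) {
      this.host = host;
      this.port = port;
    }
  }

  class EndpointSerializer extends JsonSerializer<Endpoint> {
    @Override
    public void serialize(Endpoint value, JsonGenerator gen, SerializerProvider provider)
        throws IOException {
      gen.writeString(value.host + ":" + value.port);
    }
  }

  class EndpointDeserializer extends JsonDeserializer<Endpoint> {
    @Override
    public Endpoint deserialize(JsonParser p, DeserializationContext ctxt) throws IOException {
      String[] parts = p.getValueAsString().split(":", 2);
      return new Endpoint(parts[0], Integer.parseInt(parts[1]));
    }
  }
}
```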

    Note: It is an error to have the same property available in multiple interfaces with only some * of them being annotated with {@link JsonIgnore @JsonIgnore}. It is also an error to mark a setter * for a property with {@link JsonIgnore @JsonIgnore}. diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java index de0e5140abda..ea9d77a7b20a 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java @@ -22,8 +22,30 @@ import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; import com.fasterxml.jackson.annotation.JsonIgnore; +import com.fasterxml.jackson.core.JsonParseException; +import com.fasterxml.jackson.core.JsonParser; +import com.fasterxml.jackson.databind.BeanProperty; +import com.fasterxml.jackson.databind.InjectableValues; import com.fasterxml.jackson.databind.JavaType; +import com.fasterxml.jackson.databind.JsonDeserializer; +import com.fasterxml.jackson.databind.JsonMappingException; +import com.fasterxml.jackson.databind.JsonNode; +import com.fasterxml.jackson.databind.JsonSerializer; import com.fasterxml.jackson.databind.ObjectMapper; +import com.fasterxml.jackson.databind.annotation.JsonDeserialize; +import com.fasterxml.jackson.databind.annotation.JsonSerialize; +import com.fasterxml.jackson.databind.deser.DefaultDeserializationContext; +import com.fasterxml.jackson.databind.deser.impl.MethodProperty; +import com.fasterxml.jackson.databind.introspect.AnnotatedMember; +import com.fasterxml.jackson.databind.introspect.AnnotatedMethod; +import com.fasterxml.jackson.databind.introspect.AnnotationCollector; +import com.fasterxml.jackson.databind.introspect.BeanPropertyDefinition; +import com.fasterxml.jackson.databind.introspect.TypeResolutionContext; +import com.fasterxml.jackson.databind.node.TreeTraversingParser; +import com.fasterxml.jackson.databind.ser.DefaultSerializerProvider; +import com.fasterxml.jackson.databind.type.TypeBindings; +import com.fasterxml.jackson.databind.util.SimpleBeanPropertyDefinition; +import com.fasterxml.jackson.databind.util.TokenBuffer; import edu.umd.cs.findbugs.annotations.SuppressFBWarnings; import java.beans.BeanInfo; import java.beans.IntrospectionException; @@ -477,6 +499,17 @@ Class getProxyClass() { new ObjectMapper() .registerModules(ObjectMapper.findModules(ReflectHelpers.findClassLoader())); + private static final DefaultDeserializationContext DESERIALIZATION_CONTEXT = + new DefaultDeserializationContext.Impl(MAPPER.getDeserializationContext().getFactory()) + .createInstance( + MAPPER.getDeserializationConfig(), + new TokenBuffer(MAPPER, false).asParser(), + new InjectableValues.Std()); + + static final DefaultSerializerProvider SERIALIZER_PROVIDER = + new DefaultSerializerProvider.Impl() + .createInstance(MAPPER.getSerializationConfig(), MAPPER.getSerializerFactory()); + /** Classes that are used as the boundary in the stack trace to find the callers class name. 
*/ private static final ImmutableSet PIPELINE_OPTIONS_FACTORY_CLASSES = ImmutableSet.of(PipelineOptionsFactory.class.getName(), Builder.class.getName()); @@ -1058,6 +1091,17 @@ private static void validateMethodAnnotations( validateGettersHaveConsistentAnnotation( methodNameToAllMethodMap, descriptors, AnnotationPredicates.DEFAULT_VALUE); + // Verify that there is no getter with a mixed @JsonDeserialize annotation. + validateGettersHaveConsistentAnnotation( + methodNameToAllMethodMap, descriptors, AnnotationPredicates.JSON_DESERIALIZE); + + // Verify that there is no getter with a mixed @JsonSerialize annotation. + validateGettersHaveConsistentAnnotation( + methodNameToAllMethodMap, descriptors, AnnotationPredicates.JSON_SERIALIZE); + + // Verify that if a method has either @JsonSerialize or @JsonDeserialize then it has both. + validateMethodsHaveBothJsonSerializeAndDeserialize(descriptors); + // Verify that no setter has @JsonIgnore. validateSettersDoNotHaveAnnotation( methodNameToAllMethodMap, descriptors, AnnotationPredicates.JSON_IGNORE); @@ -1065,6 +1109,14 @@ private static void validateMethodAnnotations( // Verify that no setter has @Default. validateSettersDoNotHaveAnnotation( methodNameToAllMethodMap, descriptors, AnnotationPredicates.DEFAULT_VALUE); + + // Verify that no setter has @JsonDeserialize. + validateSettersDoNotHaveAnnotation( + methodNameToAllMethodMap, descriptors, AnnotationPredicates.JSON_DESERIALIZE); + + // Verify that no setter has @JsonSerialize. + validateSettersDoNotHaveAnnotation( + methodNameToAllMethodMap, descriptors, AnnotationPredicates.JSON_SERIALIZE); } /** Validates that getters don't have mixed annotation. */ @@ -1246,6 +1298,31 @@ private static void validateMethodsAreEitherBeanMethodOrKnownMethod( iface.getName()); } + private static void validateMethodsHaveBothJsonSerializeAndDeserialize( + List descriptors) { + List inconsistentMethods = + Lists.newArrayList(); + for (final PropertyDescriptor descriptor : descriptors) { + Method readMethod = descriptor.getReadMethod(); + if (readMethod == null || IGNORED_METHODS.contains(descriptor.getReadMethod())) { + continue; + } + + boolean hasJsonSerialize = AnnotationPredicates.JSON_SERIALIZE.forMethod.apply(readMethod); + boolean hasJsonDeserialize = + AnnotationPredicates.JSON_DESERIALIZE.forMethod.apply(readMethod); + if (hasJsonSerialize ^ hasJsonDeserialize) { + InconsistentJsonSerializeAndDeserializeAnnotation inconsistentAnnotation = + new InconsistentJsonSerializeAndDeserializeAnnotation(); + inconsistentAnnotation.property = descriptor; + inconsistentAnnotation.hasJsonDeserializeAttribute = hasJsonDeserialize; + inconsistentMethods.add(inconsistentAnnotation); + } + } + + throwForInconsistentJsonSerializeAndDeserializeAnnotation(inconsistentMethods); + } + private static void checkInheritedFrom( Class checkClass, Class fromClass, Set> nonPipelineOptions) { if (checkClass.equals(fromClass)) { @@ -1415,6 +1492,38 @@ private static void throwForMissingBeanMethod( } } + private static class InconsistentJsonSerializeAndDeserializeAnnotation { + PropertyDescriptor property; + boolean hasJsonDeserializeAttribute; + } + + private static void throwForInconsistentJsonSerializeAndDeserializeAnnotation( + List inconsistentAnnotations) + throws IllegalArgumentException { + if (inconsistentAnnotations.isEmpty()) { + return; + } + + StringBuilder builder = + new StringBuilder( + "Found incorrectly annotated property methods, if a method is annotated with either @JsonSerialize or @JsonDeserialize then it must be 
annotated with both."); + + for (InconsistentJsonSerializeAndDeserializeAnnotation annotation : inconsistentAnnotations) { + String presentAnnotation; + if (annotation.hasJsonDeserializeAttribute) { + presentAnnotation = "JsonDeserialize"; + } else { + presentAnnotation = "JsonSerialize"; + } + builder.append( + String.format( + "%n - Property [%s] had only @%s", + annotation.property.getName(), presentAnnotation)); + } + + throw new IllegalArgumentException(builder.toString()); + } + /** A {@link Comparator} that uses the classes name to compare them. */ private static class ClassNameComparator implements Comparator> { static final ClassNameComparator INSTANCE = new ClassNameComparator(); @@ -1500,6 +1609,18 @@ static class AnnotationPredicates { return false; }); + static final AnnotationPredicates JSON_DESERIALIZE = + new AnnotationPredicates( + JsonDeserialize.class, + input -> JsonDeserialize.class.equals(input.annotationType()), + input -> input.isAnnotationPresent(JsonDeserialize.class)); + + static final AnnotationPredicates JSON_SERIALIZE = + new AnnotationPredicates( + JsonSerialize.class, + input -> JsonSerialize.class.equals(input.annotationType()), + input -> input.isAnnotationPresent(JsonSerialize.class)); + final Class annotationClass; final Predicate forAnnotation; final Predicate forMethod; @@ -1572,6 +1693,141 @@ private static ListMultimap parseCommandLine( return builder.build(); } + private static BeanProperty createBeanProperty(Method method) { + AnnotationCollector ac = AnnotationCollector.emptyCollector(); + for (Annotation ann : method.getAnnotations()) { + ac = ac.addOrOverride(ann); + } + + AnnotatedMethod annotatedMethod = + new AnnotatedMethod( + new TypeResolutionContext.Basic(MAPPER.getTypeFactory(), TypeBindings.emptyBindings()), + method, + ac.asAnnotationMap(), + null); + + BeanPropertyDefinition propDef = + SimpleBeanPropertyDefinition.construct(MAPPER.getDeserializationConfig(), annotatedMethod); + + JavaType type = MAPPER.constructType(method.getGenericReturnType()); + + try { + return new MethodProperty( + propDef, + type, + MAPPER.getDeserializationConfig().findTypeDeserializer(type), + annotatedMethod.getAllAnnotations(), + annotatedMethod); + } catch (JsonMappingException e) { + throw new RuntimeException(e); + } + } + + private static JsonDeserializer computeDeserializerForMethod(Method method) { + try { + BeanProperty prop = createBeanProperty(method); + AnnotatedMember annotatedMethod = prop.getMember(); + + Object maybeDeserializerClass = + DESERIALIZATION_CONTEXT.getAnnotationIntrospector().findDeserializer(annotatedMethod); + + JsonDeserializer jsonDeserializer = + DESERIALIZATION_CONTEXT.deserializerInstance(annotatedMethod, maybeDeserializerClass); + + if (jsonDeserializer == null) { + jsonDeserializer = + DESERIALIZATION_CONTEXT.findContextualValueDeserializer(prop.getType(), prop); + } + return jsonDeserializer; + } catch (JsonMappingException e) { + throw new RuntimeException(e); + } + } + + private static Optional> computeCustomSerializerForMethod(Method method) { + try { + BeanProperty prop = createBeanProperty(method); + AnnotatedMember annotatedMethod = prop.getMember(); + + Object maybeSerializerClass = + SERIALIZER_PROVIDER.getAnnotationIntrospector().findSerializer(annotatedMethod); + + return Optional.fromNullable( + SERIALIZER_PROVIDER.serializerInstance(annotatedMethod, maybeSerializerClass)); + } catch (JsonMappingException e) { + throw new RuntimeException(e); + } + } + + /** + * Get a {@link JsonDeserializer} for a given 
method. If the method is annotated with {@link + * JsonDeserialize} the specified deserializer from the annotation is returned, otherwise the + * default is returned. + */ + private static JsonDeserializer getDeserializerForMethod(Method method) { + return CACHE + .get() + .deserializerCache + .computeIfAbsent(method, PipelineOptionsFactory::computeDeserializerForMethod); + } + + /** + * Get a {@link JsonSerializer} for a given method. If the method is annotated with {@link + * JsonDeserialize} the specified serializer from the annotation is returned, otherwise null is + * returned. + */ + static @Nullable JsonSerializer getCustomSerializerForMethod(Method method) { + return CACHE + .get() + .serializerCache + .computeIfAbsent(method, PipelineOptionsFactory::computeCustomSerializerForMethod) + .orNull(); + } + + static Object deserializeNode(JsonNode node, Method method) throws IOException { + if (node.isNull()) { + return null; + } + + JsonParser parser = new TreeTraversingParser(node, MAPPER); + parser.nextToken(); + + JsonDeserializer jsonDeserializer = getDeserializerForMethod(method); + return jsonDeserializer.deserialize(parser, DESERIALIZATION_CONTEXT); + } + + /** + * Attempt to parse an input string into an instance of `type` using an {@link ObjectMapper}. + * + *

    If the getter method is annotated with {@link + * com.fasterxml.jackson.databind.annotation.JsonDeserialize}, the specified deserializer will be + * used; otherwise the default ObjectMapper deserialization strategy is used. + *
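For illustration, reusing the hypothetical EndpointOptions sketch from above: a raw command-line value that is not itself valid JSON still reaches the custom deserializer, because parsing falls back to quoting the value as described in the next paragraph.

```java
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class CustomOptionParsingExample {
  public static void main(String[] args) {
    // "localhost:8443" is not valid JSON on its own; the factory retries it as the
    // JSON string "localhost:8443" and then invokes the @JsonDeserialize deserializer
    // declared on getServiceEndpoint().
    EndpointOptions options =
        PipelineOptionsFactory.fromArgs("--serviceEndpoint=localhost:8443")
            .as(EndpointOptions.class);
    System.out.println(options.getServiceEndpoint().port); // 8443
  }
}
```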

    Parsing is attempted twice, once with the raw string value. If that attempt fails, another + * attempt is made by wrapping the value in quotes so that it is interpreted as a JSON string. + */ + private static Object tryParseObject(String value, Method method) throws IOException { + + JsonNode tree; + try { + tree = MAPPER.readTree(value); + } catch (JsonParseException e) { + // try again, quoting the input string if it wasn't already + if (!(value.startsWith("\"") && value.endsWith("\""))) { + try { + tree = MAPPER.readTree("\"" + value + "\""); + } catch (JsonParseException inner) { + // rethrow the original exception rather the one thrown from the fallback attempt + throw e; + } + } else { + throw e; + } + } + + return deserializeNode(tree, method); + } + /** * Using the parsed string arguments, we convert the strings to the expected return type of the * methods that are found on the passed-in class. @@ -1632,6 +1888,7 @@ private static Map parseObjects( // Only allow empty argument values for String, String Array, and Collection. Class returnType = method.getReturnType(); JavaType type = MAPPER.getTypeFactory().constructType(method.getGenericReturnType()); + if ("runner".equals(entry.getKey())) { String runner = Iterables.getOnlyElement(entry.getValue()); final Map>> pipelineRunners = @@ -1680,7 +1937,7 @@ private static Map parseObjects( checkEmptyStringAllowed(returnType, type, method.getGenericReturnType().toString()); } try { - convertedOptions.put(entry.getKey(), MAPPER.readValue(value, type)); + convertedOptions.put(entry.getKey(), tryParseObject(value, method)); } catch (IOException e) { throw new IllegalArgumentException("Unable to parse JSON value " + value, e); } @@ -1791,6 +2048,11 @@ static final class Cache { private final Map>, Registration> combinedCache = Maps.newConcurrentMap(); + private final Map> deserializerCache = Maps.newConcurrentMap(); + + private final Map>> serializerCache = + Maps.newConcurrentMap(); + private Cache() { final ClassLoader loader = ReflectHelpers.findClassLoader(); // Store the list of all available pipeline runners. diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java index 226ee8055181..627077311f8b 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java @@ -25,23 +25,7 @@ /** Pipeline options common to all portable runners. */ @Experimental(Kind.PORTABILITY) -public interface PortablePipelineOptions extends PipelineOptions { - - // TODO: https://issues.apache.org/jira/browse/BEAM-4106: Consider pulling this out into a new - // options interface, e.g., FileStagingOptions. - /** - * List of local files to make available to workers. - * - *

    Files are placed on the worker's classpath. - * - *

    The default value is the list of jars from the main program's classpath. - */ - @Description( - "Files to stage to the artifact service and make available to workers. Files are placed on " - + "the worker's classpath. The default value is all files from the classpath.") - List getFilesToStage(); - - void setFilesToStage(List value); +public interface PortablePipelineOptions extends PipelineOptions, FileStagingOptions { @Description( "Job service endpoint to use. Should be in the form of address and port, e.g. localhost:3000") @@ -50,6 +34,17 @@ public interface PortablePipelineOptions extends PipelineOptions { void setJobEndpoint(String endpoint); + @Description( + "Job service request timeout in seconds. The timeout " + + "determines the max time the driver program will wait to " + + "get a response from the job server. NOTE: the timeout does not " + + "apply to the actual pipeline run time. The driver program will " + + "still wait for job completion indefinitely.") + @Default.Integer(60) + int getJobServerTimeout(); + + void setJobServerTimeout(int timeout); + @Description( "Set the default environment type for running user code. " + "Possible options are DOCKER and PROCESS.") @@ -103,4 +98,32 @@ public interface PortablePipelineOptions extends PipelineOptions { String getOutputExecutablePath(); void setOutputExecutablePath(String outputExecutablePath); + + @Description( + "Options for configuring the default environment of portable workers. This environment will be used for all executable stages except for external transforms. Recognized options depend on the value of defaultEnvironmentType:\n" + + "DOCKER: docker_container_image (optional), e.g. 'apache/beam_java8_sdk:latest'. If unset, will default to the latest official release of the Beam Java SDK corresponding to your Java runtime version (8 or 11).\n" + + "EXTERNAL: external_service_address (required), e.g. 'localhost:50000'\n" + + "PROCESS: process_command (required), process_variables (optional). process_command must be the location of an executable file that starts a Beam SDK worker. process_variables is a comma-separated list of environment variable assignments which will be set before running the process, e.g. 'FOO=a,BAR=b'\n\n" + + "environmentOptions and defaultEnvironmentConfig are mutually exclusive. Prefer environmentOptions.") + List getEnvironmentOptions(); + + void setEnvironmentOptions(List value); + + /** Return the value for the specified environment option or empty string if not present. 
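A brief sketch of setting and reading these options; the container image is the one named in the description above and stands in for whatever image a deployment actually uses.

```java
import java.util.Arrays;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.PortablePipelineOptions;

public class EnvironmentOptionsExample {
  public static void main(String[] args) {
    PortablePipelineOptions options =
        PipelineOptionsFactory.create().as(PortablePipelineOptions.class);
    options.setDefaultEnvironmentType("DOCKER");
    options.setEnvironmentOptions(
        Arrays.asList("docker_container_image=apache/beam_java8_sdk:latest"));

    // Returns "apache/beam_java8_sdk:latest"; returns "" when the option is absent.
    String image =
        PortablePipelineOptions.getEnvironmentOption(options, "docker_container_image");
    System.out.println(image);
  }
}
```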
*/ + static String getEnvironmentOption( + PortablePipelineOptions options, String environmentOptionName) { + List environmentOptions = options.getEnvironmentOptions(); + if (environmentOptions == null) { + return ""; + } + + for (String environmentEntry : environmentOptions) { + String[] tokens = environmentEntry.split(environmentOptionName + "=", -1); + if (tokens.length > 1) { + return tokens[1]; + } + } + + return ""; + } } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java index 9d7a11ef6d01..92835094fab8 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java @@ -23,15 +23,14 @@ import com.fasterxml.jackson.annotation.JsonIgnore; import com.fasterxml.jackson.core.JsonGenerator; import com.fasterxml.jackson.core.JsonParser; -import com.fasterxml.jackson.core.JsonProcessingException; import com.fasterxml.jackson.databind.DeserializationContext; -import com.fasterxml.jackson.databind.JavaType; import com.fasterxml.jackson.databind.JsonDeserializer; import com.fasterxml.jackson.databind.JsonNode; import com.fasterxml.jackson.databind.JsonSerializer; import com.fasterxml.jackson.databind.ObjectMapper; import com.fasterxml.jackson.databind.SerializerProvider; import com.fasterxml.jackson.databind.node.ObjectNode; +import com.fasterxml.jackson.databind.util.TokenBuffer; import com.google.auto.value.AutoValue; import edu.umd.cs.findbugs.annotations.SuppressFBWarnings; import java.beans.PropertyDescriptor; @@ -43,7 +42,6 @@ import java.lang.reflect.Method; import java.lang.reflect.ParameterizedType; import java.lang.reflect.Proxy; -import java.lang.reflect.Type; import java.util.Arrays; import java.util.Collection; import java.util.HashMap; @@ -324,6 +322,10 @@ public void populateDisplayData(DisplayData.Builder builder) { continue; } + if (optionSpec.getGetterMethod().isAnnotationPresent(Hidden.class)) { + continue; + } + builder.add( DisplayData.item(option.getKey(), resolved.getType(), resolved.getValue()) .withNamespace(optionSpec.getDefiningInterface())); @@ -350,6 +352,10 @@ public void populateDisplayData(DisplayData.Builder builder) { continue; } + if (spec.getGetterMethod().isAnnotationPresent(Hidden.class)) { + continue; + } + Object value = getValueFromJson(jsonOption.getKey(), spec.getGetterMethod()); DisplayDataValue resolved = DisplayDataValue.resolve(value); builder.add( @@ -502,13 +508,13 @@ public synchronized String toString() { * @return An object matching the return type of the method passed in. 
*/ private Object getValueFromJson(String propertyName, Method method) { + JsonNode jsonNode = jsonOptions.get(propertyName); + return getValueFromJson(jsonNode, method); + } + + private static Object getValueFromJson(JsonNode node, Method method) { try { - JavaType type = - PipelineOptionsFactory.MAPPER - .getTypeFactory() - .constructType(method.getGenericReturnType()); - JsonNode jsonNode = jsonOptions.get(propertyName); - return PipelineOptionsFactory.MAPPER.readValue(jsonNode.toString(), type); + return PipelineOptionsFactory.deserializeNode(node, method); } catch (IOException e) { throw new RuntimeException("Unable to parse representation", e); } @@ -646,18 +652,35 @@ private static Map generateSettersToPropertyNames( } static class Serializer extends JsonSerializer { + private void serializeEntry( + String name, + Object value, + JsonGenerator jgen, + Map> customSerializers) + throws IOException { + + JsonSerializer customSerializer = customSerializers.get(name); + if (value == null || customSerializer == null || value instanceof JsonNode) { + jgen.writeObject(value); + } else { + customSerializer.serialize(value, jgen, PipelineOptionsFactory.SERIALIZER_PROVIDER); + } + } + @Override public void serialize(PipelineOptions value, JsonGenerator jgen, SerializerProvider provider) - throws IOException, JsonProcessingException { + throws IOException { ProxyInvocationHandler handler = (ProxyInvocationHandler) Proxy.getInvocationHandler(value); synchronized (handler) { + PipelineOptionsFactory.Cache cache = PipelineOptionsFactory.CACHE.get(); // We first filter out any properties that have been modified since // the last serialization of this PipelineOptions and then verify that // they are all serializable. Map filteredOptions = Maps.newHashMap(handler.options); - PipelineOptionsFactory.Cache cache = PipelineOptionsFactory.CACHE.get(); + Map> propertyToSerializer = + getSerializerMap(cache, handler.knownInterfaces); removeIgnoredOptions(cache, handler.knownInterfaces, filteredOptions); - ensureSerializable(cache, handler.knownInterfaces, filteredOptions); + ensureSerializable(cache, handler.knownInterfaces, filteredOptions, propertyToSerializer); // Now we create the map of serializable options by taking the original // set of serialized options (if any) and updating them with any properties @@ -669,7 +692,15 @@ public void serialize(PipelineOptions value, JsonGenerator jgen, SerializerProvi jgen.writeStartObject(); jgen.writeFieldName("options"); - jgen.writeObject(serializableOptions); + + jgen.writeStartObject(); + + for (Map.Entry entry : serializableOptions.entrySet()) { + jgen.writeFieldName(entry.getKey()); + serializeEntry(entry.getKey(), entry.getValue(), jgen, propertyToSerializer); + } + + jgen.writeEndObject(); List> serializedDisplayData = Lists.newArrayList(); DisplayData displayData = DisplayData.from(value); @@ -686,6 +717,23 @@ public void serialize(PipelineOptions value, JsonGenerator jgen, SerializerProvi } } + private Map> getSerializerMap( + PipelineOptionsFactory.Cache cache, Set> interfaces) { + + Map> propertyToSerializer = Maps.newHashMap(); + for (PropertyDescriptor descriptor : cache.getPropertyDescriptors(interfaces)) { + if (descriptor.getReadMethod() != null) { + JsonSerializer maybeSerializer = + PipelineOptionsFactory.getCustomSerializerForMethod(descriptor.getReadMethod()); + if (maybeSerializer != null) { + propertyToSerializer.put(descriptor.getName(), maybeSerializer); + } + } + } + + return propertyToSerializer; + } + /** * We remove all properties 
within the passed in options where there getter is annotated with * {@link JsonIgnore @JsonIgnore} from the passed in options using the passed in interfaces. @@ -716,27 +764,27 @@ private void removeIgnoredOptions( private void ensureSerializable( PipelineOptionsFactory.Cache cache, Set> interfaces, - Map options) + Map options, + Map> propertyToSerializer) throws IOException { // Construct a map from property name to the return type of the getter. - Map propertyToReturnType = Maps.newHashMap(); + Map propertyToReadMethod = Maps.newHashMap(); for (PropertyDescriptor descriptor : cache.getPropertyDescriptors(interfaces)) { if (descriptor.getReadMethod() != null) { - propertyToReturnType.put( - descriptor.getName(), descriptor.getReadMethod().getGenericReturnType()); + propertyToReadMethod.put(descriptor.getName(), descriptor.getReadMethod()); } } // Attempt to serialize and deserialize each property. for (Map.Entry entry : options.entrySet()) { try { - String serializedValue = - PipelineOptionsFactory.MAPPER.writeValueAsString(entry.getValue().getValue()); - JavaType type = - PipelineOptionsFactory.MAPPER - .getTypeFactory() - .constructType(propertyToReturnType.get(entry.getKey())); - PipelineOptionsFactory.MAPPER.readValue(serializedValue, type); + Object boundValue = entry.getValue().getValue(); + if (boundValue != null) { + TokenBuffer buffer = new TokenBuffer(PipelineOptionsFactory.MAPPER, false); + serializeEntry(entry.getKey(), boundValue, buffer, propertyToSerializer); + Method method = propertyToReadMethod.get(entry.getKey()); + getValueFromJson(buffer.asParser().readValueAsTree(), method); + } } catch (Exception e) { throw new IOException( String.format( @@ -751,7 +799,7 @@ private void ensureSerializable( static class Deserializer extends JsonDeserializer { @Override public PipelineOptions deserialize(JsonParser jp, DeserializationContext ctxt) - throws IOException, JsonProcessingException { + throws IOException { ObjectNode objectNode = jp.readValueAsTree(); JsonNode rawOptionsNode = objectNode.get("options"); diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/AppliedPTransform.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/AppliedPTransform.java index 34d3d2ce1750..ab1401e69abd 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/AppliedPTransform.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/AppliedPTransform.java @@ -23,6 +23,7 @@ import org.apache.beam.sdk.Pipeline; import org.apache.beam.sdk.annotations.Internal; import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PInput; import org.apache.beam.sdk.values.POutput; @@ -59,8 +60,9 @@ AppliedPTransform of( Map, PCollection> input, Map, PCollection> output, TransformT transform, + ResourceHints resourceHints, Pipeline p) { - return new AutoValue_AppliedPTransform<>(fullName, input, output, transform, p); + return new AutoValue_AppliedPTransform<>(fullName, input, output, transform, resourceHints, p); } public abstract String getFullName(); @@ -71,6 +73,8 @@ AppliedPTransform of( public abstract TransformT getTransform(); + public abstract ResourceHints getResourceHints(); + public abstract Pipeline getPipeline(); /** @return map of {@link TupleTag TupleTags} which are not side inputs. 
*/ diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/TransformHierarchy.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/TransformHierarchy.java index df53e01e78a8..b04266cdd418 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/TransformHierarchy.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/TransformHierarchy.java @@ -35,6 +35,7 @@ import org.apache.beam.sdk.annotations.Internal; import org.apache.beam.sdk.runners.PTransformOverrideFactory.ReplacementOutput; import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PInput; import org.apache.beam.sdk.values.POutput; @@ -71,11 +72,11 @@ public class TransformHierarchy { // Maintain a stack based on the enclosing nodes private Node current; - public TransformHierarchy() { + public TransformHierarchy(ResourceHints resourceHints) { producers = new HashMap<>(); producerInput = new HashMap<>(); unexpandedInputs = new HashMap<>(); - root = new Node(); + root = new Node(resourceHints); current = root; } @@ -257,18 +258,21 @@ public class Node { // Output of the transform, in expanded form. Null if not yet set. private @Nullable Map, PCollection> outputs; + private final ResourceHints resourceHints; + @VisibleForTesting boolean finishedSpecifying = false; /** * Creates the root-level node. The root level node has a null enclosing node, a null transform, * an empty map of inputs, an empty map of outputs, and a name equal to the empty string. */ - private Node() { + private Node(ResourceHints resourceHints) { this.enclosingNode = null; this.transform = null; this.fullName = ""; this.inputs = Collections.emptyMap(); this.outputs = Collections.emptyMap(); + this.resourceHints = resourceHints; } /** @@ -287,6 +291,7 @@ private Node(Node enclosingNode, PTransform transform, String fullName, PI inputs.putAll(PValues.expandInput(input)); inputs.putAll(PValues.fullyExpand(transform.getAdditionalInputs())); this.inputs = inputs.build(); + this.resourceHints = transform.getResourceHints().mergeWithOuter(enclosingNode.resourceHints); } /** @@ -313,6 +318,7 @@ private Node( this.fullName = fullName; this.inputs = inputs == null ? Collections.emptyMap() : inputs; this.outputs = outputs == null ? Collections.emptyMap() : outputs; + this.resourceHints = transform.getResourceHints().mergeWithOuter(enclosingNode.resourceHints); } /** @@ -405,12 +411,12 @@ private void setOutput(POutput output) { checkState( this.outputs == null, "Tried to specify more than one output for %s", getFullName()); checkNotNull(output, "Tried to set the output of %s to null", getFullName()); - this.outputs = PValues.fullyExpand(output.expand()); + this.outputs = PValues.expandOutput(output); // Validate that a primitive transform produces only primitive output, and a composite // transform does not produce primitive output. Set outputProducers = new HashSet<>(); - for (PCollection outputValue : PValues.fullyExpand(output.expand()).values()) { + for (PCollection outputValue : PValues.expandOutput(output).values()) { outputProducers.add(getProducer(outputValue)); } if (outputProducers.contains(this) && (!parts.isEmpty() || outputProducers.size() > 1)) { @@ -490,7 +496,7 @@ public Map, PCollection> getOutputs() { /** Returns the {@link AppliedPTransform} representing this {@link Node}. 
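The hints threaded through here originate on individual transforms and are merged with the enclosing composite's hints via mergeWithOuter. A sketch of where they come from, assuming the PTransform#setResourceHints and ResourceHints#withMinRam methods that exist outside this diff:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.SimpleFunction;
import org.apache.beam.sdk.transforms.resourcehints.ResourceHints;

public class ResourceHintsExample {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.create());
    pipeline
        .apply(Create.of(1, 2, 3))
        .apply(
            "Square",
            MapElements.via(
                    new SimpleFunction<Integer, Integer>() {
                      @Override
                      public Integer apply(Integer x) {
                        return x * x;
                      }
                    })
                // Hints set on a transform are merged with those of the enclosing
                // composite and surface on the resulting AppliedPTransform via
                // getResourceHints().
                .setResourceHints(ResourceHints.create().withMinRam("4 GiB")));
    // pipeline.run() omitted; running requires a runner on the classpath.
  }
}
```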
*/ public AppliedPTransform toAppliedPTransform(Pipeline pipeline) { return AppliedPTransform.of( - getFullName(), inputs, outputs, (PTransform) getTransform(), pipeline); + getFullName(), inputs, outputs, (PTransform) getTransform(), resourceHints, pipeline); } /** diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/CachingFactory.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/CachingFactory.java index 1a817cfb9d14..e870b495d496 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/CachingFactory.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/CachingFactory.java @@ -35,7 +35,7 @@ "nullness", // TODO(https://issues.apache.org/jira/browse/BEAM-10402) "rawtypes" }) -class CachingFactory implements Factory { +public class CachingFactory implements Factory { private transient @Nullable ConcurrentHashMap cache = null; private final Factory innerFactory; diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/FieldValueTypeInformation.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/FieldValueTypeInformation.java index e512a05484d6..022097052349 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/FieldValueTypeInformation.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/FieldValueTypeInformation.java @@ -27,6 +27,7 @@ import java.util.Arrays; import java.util.Collections; import java.util.Map; +import java.util.stream.Stream; import org.apache.beam.sdk.schemas.annotations.SchemaCaseFormat; import org.apache.beam.sdk.schemas.annotations.SchemaFieldName; import org.apache.beam.sdk.schemas.logicaltypes.OneOfType; @@ -176,19 +177,12 @@ public static FieldValueTypeInformation forGetter(Method method) { } private static boolean hasNullableAnnotation(Field field) { - for (Annotation annotation : field.getAnnotations()) { - if (isNullableAnnotation(annotation)) { - return true; - } - } - - for (Annotation annotation : field.getAnnotatedType().getAnnotations()) { - if (isNullableAnnotation(annotation)) { - return true; - } - } + Stream annotations = + Stream.concat( + Stream.of(field.getAnnotations()), + Stream.of(field.getAnnotatedType().getAnnotations())); - return false; + return annotations.anyMatch(FieldValueTypeInformation::isNullableAnnotation); } /** @@ -196,19 +190,26 @@ private static boolean hasNullableAnnotation(Field field) { * field is nullable. */ private static boolean hasNullableReturnType(Method method) { - for (Annotation annotation : method.getAnnotations()) { - if (isNullableAnnotation(annotation)) { - return true; - } - } + Stream annotations = + Stream.concat( + Stream.of(method.getAnnotations()), + Stream.of(method.getAnnotatedReturnType().getAnnotations())); - for (Annotation annotation : method.getAnnotatedReturnType().getAnnotations()) { - if (isNullableAnnotation(annotation)) { - return true; - } + return annotations.anyMatch(FieldValueTypeInformation::isNullableAnnotation); + } + + private static boolean hasSingleNullableParameter(Method method) { + if (method.getParameterCount() != 1) { + throw new RuntimeException( + "Setter methods should take a single argument " + method.getName()); } - return false; + Stream annotations = + Stream.concat( + Arrays.stream(method.getAnnotatedParameterTypes()[0].getAnnotations()), + Arrays.stream(method.getParameterAnnotations()[0])); + + return annotations.anyMatch(FieldValueTypeInformation::isNullableAnnotation); } /** Try to accept any Nullable annotation. 
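For context, a small sketch of the kind of bean this logic inspects; the class and field names are illustrative. Whether a field is nullable in the inferred schema now depends on any annotation simply named Nullable found on the getter, its annotated return type, or the setter's single parameter.

```java
import org.apache.beam.sdk.schemas.JavaBeanSchema;
import org.apache.beam.sdk.schemas.annotations.DefaultSchema;
import org.checkerframework.checker.nullness.qual.Nullable;

@DefaultSchema(JavaBeanSchema.class)
public class UserEvent {
  private String id = "";
  private @Nullable String comment;

  /** Inferred as a non-nullable schema field. */
  public String getId() {
    return id;
  }

  public void setId(String id) {
    this.id = id;
  }

  /** Inferred as nullable: the annotation is found on the annotated return/parameter type. */
  public @Nullable String getComment() {
    return comment;
  }

  public void setComment(@Nullable String comment) {
    this.comment = comment;
  }
}
```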
*/ @@ -227,13 +228,9 @@ public static FieldValueTypeInformation forSetter(Method method, String setterPr } else { throw new RuntimeException("Setter has wrong prefix " + method.getName()); } - if (method.getParameterCount() != 1) { - throw new RuntimeException("Setter methods should take a single argument."); - } + TypeDescriptor type = TypeDescriptor.of(method.getGenericParameterTypes()[0]); - boolean nullable = - Arrays.stream(method.getParameters()[0].getAnnotatedType().getAnnotations()) - .anyMatch(annotation -> isNullableAnnotation(annotation)); + boolean nullable = hasSingleNullableParameter(method); return new AutoValue_FieldValueTypeInformation.Builder() .setName(name) .setNullable(nullable) diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/RowMessages.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/RowMessages.java new file mode 100644 index 000000000000..af2ff7cedbdf --- /dev/null +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/RowMessages.java @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.schemas; + +import static org.apache.beam.sdk.util.Preconditions.checkArgumentNotNull; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import org.apache.beam.sdk.annotations.Internal; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.transforms.ProcessFunction; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.transforms.SimpleFunction; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.sdk.values.TypeDescriptor; + +@Internal +public final class RowMessages { + + private RowMessages() {} + + public static SimpleFunction bytesToRowFn( + SchemaProvider schemaProvider, + TypeDescriptor typeDescriptor, + ProcessFunction fromBytesFn) { + final SerializableFunction toRowFn = + checkArgumentNotNull(schemaProvider.toRowFunction(typeDescriptor)); + return new BytesToRowFn<>(fromBytesFn, toRowFn); + } + + public static SimpleFunction bytesToRowFn( + SchemaProvider schemaProvider, TypeDescriptor typeDescriptor, Coder coder) { + return bytesToRowFn( + schemaProvider, typeDescriptor, bytes -> coder.decode(new ByteArrayInputStream(bytes))); + } + + private static final class BytesToRowFn extends SimpleFunction { + + private final ProcessFunction fromBytesFn; + private final SerializableFunction toRowFn; + + private BytesToRowFn( + ProcessFunction fromBytesFn, SerializableFunction toRowFn) { + this.fromBytesFn = fromBytesFn; + this.toRowFn = toRowFn; + } + + @Override + public Row apply(byte[] bytes) { + final T message; + try { + message = fromBytesFn.apply(bytes); + } catch (Exception e) { + throw new IllegalStateException("Could not decode bytes as message", e); + } + return toRowFn.apply(message); + } + } + + public static SimpleFunction rowToBytesFn( + SchemaProvider schemaProvider, + TypeDescriptor typeDescriptor, + ProcessFunction toBytesFn) { + final Schema schema = checkArgumentNotNull(schemaProvider.schemaFor(typeDescriptor)); + final SerializableFunction fromRowFn = + checkArgumentNotNull(schemaProvider.fromRowFunction(typeDescriptor)); + toBytesFn = checkArgumentNotNull(toBytesFn); + return new RowToBytesFn<>(schema, fromRowFn, toBytesFn); + } + + public static SimpleFunction rowToBytesFn( + SchemaProvider schemaProvider, TypeDescriptor typeDescriptor, Coder coder) { + return rowToBytesFn(schemaProvider, typeDescriptor, message -> toBytes(coder, message)); + } + + private static byte[] toBytes(Coder coder, T message) throws IOException { + final ByteArrayOutputStream out = new ByteArrayOutputStream(); + coder.encode(message, out); + return out.toByteArray(); + } + + private static final class RowToBytesFn extends SimpleFunction { + + private final Schema schema; + private final SerializableFunction fromRowFn; + private final ProcessFunction toBytesFn; + + private RowToBytesFn( + Schema schema, + SerializableFunction fromRowFn, + ProcessFunction toBytesFn) { + this.schema = schema; + this.fromRowFn = fromRowFn; + this.toBytesFn = toBytesFn; + } + + @Override + public byte[] apply(Row row) { + if (!schema.equivalent(row.getSchema())) { + row = switchFieldsOrder(row); + } + final T message = fromRowFn.apply(row); + try { + return toBytesFn.apply(message); + } catch (Exception e) { + throw new IllegalStateException("Could not encode message as bytes", e); + } + } + + private Row switchFieldsOrder(Row row) { + Row.Builder convertedRow = Row.withSchema(schema); + schema.getFields().forEach(field -> 
convertedRow.addValue(row.getValue(field.getName()))); + return convertedRow.build(); + } + } +} diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java index ff03db9561a3..2d35ab0994c0 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java @@ -42,6 +42,7 @@ import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.BiMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.HashBiMap; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; @@ -95,6 +96,7 @@ public String toString() { // A mapping between field names an indices. private final BiMap fieldIndices = HashBiMap.create(); private Map encodingPositions = Maps.newHashMap(); + private boolean encodingPositionsOverridden = false; private final List fields; // Cache the hashCode, so it doesn't have to be recomputed. Schema objects are immutable, so this @@ -286,9 +288,15 @@ public Map getEncodingPositions() { return encodingPositions; } + /** Returns whether encoding positions have been explicitly overridden. */ + public boolean isEncodingPositionsOverridden() { + return encodingPositionsOverridden; + } + /** Sets the encoding positions for this schema. */ public void setEncodingPositions(Map encodingPositions) { this.encodingPositions = encodingPositions; + this.encodingPositionsOverridden = true; } /** Get this schema's UUID. */ @@ -391,8 +399,13 @@ public String toString() { builder.append(field); builder.append(System.lineSeparator()); } + builder.append("Encoding positions:"); + builder.append(System.lineSeparator()); + builder.append(encodingPositions); + builder.append(System.lineSeparator()); builder.append("Options:"); builder.append(options); + builder.append("UUID: " + uuid); return builder.toString(); } @@ -949,6 +962,40 @@ public int hashCode() { getMetadata() }); } + + @Override + public String toString() { + StringBuilder builder = new StringBuilder(); + switch (getTypeName()) { + case ROW: + builder.append("ROW<"); + ImmutableList.Builder fieldEntries = ImmutableList.builder(); + for (Field field : getRowSchema().getFields()) { + fieldEntries.add(field.getName() + " " + field.getType().toString()); + } + builder.append(String.join(", ", fieldEntries.build())); + builder.append(">"); + break; + case ARRAY: + builder.append("ARRAY<"); + builder.append(getCollectionElementType().toString()); + builder.append(">"); + break; + case MAP: + builder.append("MAP<"); + builder.append(getMapKeyType().toString()); + builder.append(", "); + builder.append(getMapValueType().toString()); + builder.append(">"); + break; + default: + builder.append(getTypeName().toString()); + } + if (!getNullable()) { + builder.append(" NOT NULL"); + } + return builder.toString(); + } } /** Field of a row. Contains the {@link FieldType} along with associated metadata. 
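A small sketch of the new override flag (field names illustrative): after an explicit setEncodingPositions call the schema reports the positions as overridden, which downstream translation and coder-generation code can consult.

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.beam.sdk.schemas.Schema;

public class EncodingPositionsExample {
  public static void main(String[] args) {
    Schema schema =
        Schema.builder().addStringField("userId").addInt64Field("score").build();

    // Encode "score" first regardless of declared field order, e.g. to remain
    // wire-compatible with an earlier version of the schema.
    Map<String, Integer> positions = new HashMap<>();
    positions.put("score", 0);
    positions.put("userId", 1);
    schema.setEncodingPositions(positions);

    System.out.println(schema.isEncodingPositionsOverridden()); // true
    System.out.println(schema.getEncodingPositions());
  }
}
```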
*/ diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/SchemaCoder.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/SchemaCoder.java index 015ff7361b3e..fe097d3dfd5a 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/SchemaCoder.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/SchemaCoder.java @@ -22,6 +22,7 @@ import java.io.IOException; import java.io.InputStream; import java.io.OutputStream; +import java.util.Map; import java.util.Objects; import java.util.UUID; import org.apache.beam.sdk.annotations.Experimental; @@ -89,6 +90,11 @@ public static SchemaCoder of(Schema schema) { return RowCoder.of(schema); } + /** Override encoding positions for the given schema. */ + public static void overrideEncodingPositions(UUID uuid, Map encodingPositions) { + RowCoderGenerator.overrideEncodingPositions(uuid, encodingPositions); + } + /** Returns the schema associated with this type. */ public Schema getSchema() { return schema; diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/SchemaTranslation.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/SchemaTranslation.java index 9ed2ce4fc144..2215f3dc9f72 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/SchemaTranslation.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/SchemaTranslation.java @@ -37,13 +37,15 @@ import org.apache.beam.sdk.schemas.Schema.LogicalType; import org.apache.beam.sdk.schemas.Schema.TypeName; import org.apache.beam.sdk.schemas.logicaltypes.MicrosInstant; +import org.apache.beam.sdk.schemas.logicaltypes.UnknownLogicalType; import org.apache.beam.sdk.util.SerializableUtils; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; +import org.checkerframework.checker.nullness.qual.Nullable; /** Utility methods for translating schemas. 
*/ @Experimental(Kind.SCHEMAS) @@ -136,6 +138,21 @@ private static SchemaApi.FieldType fieldTypeToProto( .setRepresentation( fieldTypeToProto(logicalType.getBaseType(), serializeLogicalType)) .setUrn(logicalType.getIdentifier()); + } else if (logicalType instanceof UnknownLogicalType) { + logicalTypeBuilder = + SchemaApi.LogicalType.newBuilder() + .setUrn(logicalType.getIdentifier()) + .setPayload(ByteString.copyFrom(((UnknownLogicalType) logicalType).getPayload())) + .setRepresentation( + fieldTypeToProto(logicalType.getBaseType(), serializeLogicalType)); + + if (logicalType.getArgumentType() != null) { + logicalTypeBuilder + .setArgumentType( + fieldTypeToProto(logicalType.getArgumentType(), serializeLogicalType)) + .setArgument( + fieldValueToProto(logicalType.getArgumentType(), logicalType.getArgument())); + } } else { logicalTypeBuilder = SchemaApi.LogicalType.newBuilder() @@ -215,7 +232,19 @@ public static Schema schemaFromProto(SchemaApi.Schema protoSchema) { } builder.setOptions(optionsFromProto(protoSchema.getOptionsList())); Schema schema = builder.build(); - schema.setEncodingPositions(encodingLocationMap); + + Preconditions.checkState(encodingLocationMap.size() == schema.getFieldCount()); + long dinstictEncodingPositions = encodingLocationMap.values().stream().distinct().count(); + Preconditions.checkState(dinstictEncodingPositions <= schema.getFieldCount()); + if (dinstictEncodingPositions < schema.getFieldCount() && schema.getFieldCount() > 0) { + // This means that encoding positions were not specified in the proto. Generally, we don't + // expect this to happen, + // but if it does happen, we expect none to be specified - in which case the should all be + // zero. + Preconditions.checkState(dinstictEncodingPositions == 1); + } else if (protoSchema.getEncodingPositionsSet()) { + schema.setEncodingPositions(encodingLocationMap); + } if (!protoSchema.getId().isEmpty()) { schema.setUUID(UUID.fromString(protoSchema.getId())); } @@ -313,7 +342,20 @@ private static FieldType fieldTypeFromProtoWithoutNullable(SchemaApi.FieldType p SerializableUtils.deserializeFromByteArray( protoFieldType.getLogicalType().getPayload().toByteArray(), "logicalType")); } else { - throw new IllegalArgumentException("Encountered unsupported logical type URN: " + urn); + @Nullable FieldType argumentType = null; + @Nullable Object argumentValue = null; + if (protoFieldType.getLogicalType().hasArgumentType()) { + argumentType = fieldTypeFromProto(protoFieldType.getLogicalType().getArgumentType()); + argumentValue = + fieldValueFromProto(argumentType, protoFieldType.getLogicalType().getArgument()); + } + return FieldType.logicalType( + new UnknownLogicalType( + urn, + protoFieldType.getLogicalType().getPayload().toByteArray(), + argumentType, + argumentValue, + fieldTypeFromProto(protoFieldType.getLogicalType().getRepresentation()))); } default: throw new IllegalArgumentException( diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/DeadLetteredTransform.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/DeadLetteredTransform.java new file mode 100644 index 000000000000..c1fa11cec55c --- /dev/null +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/DeadLetteredTransform.java @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.schemas.io; + +import java.io.ByteArrayOutputStream; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; +import org.apache.beam.sdk.annotations.Internal; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.transforms.MapElements; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.transforms.SimpleFunction; +import org.apache.beam.sdk.transforms.WithFailures.Result; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PDone; +import org.apache.beam.sdk.values.TypeDescriptor; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; +import org.apache.commons.lang3.exception.ExceptionUtils; + +@Internal +@Experimental(Kind.SCHEMAS) +public class DeadLetteredTransform + extends PTransform, PCollection> { + private final SimpleFunction transform; + private final PTransform, PDone> deadLetter; + + public DeadLetteredTransform(SimpleFunction transform, String deadLetterConfig) { + this(transform, GenericDlq.getDlqTransform(deadLetterConfig)); + } + + @VisibleForTesting + DeadLetteredTransform( + SimpleFunction transform, + PTransform, PDone> deadLetter) { + this.transform = transform; + this.deadLetter = deadLetter; + } + + // Required to capture the generic type parameter of the PCollection. + private PCollection expandInternal( + PCollection input) { + Coder coder = input.getCoder(); + SerializableFunction localTransform = transform::apply; + MapElements.MapWithFailures mapWithFailures = + MapElements.into(transform.getOutputTypeDescriptor()) + .via(localTransform) + .exceptionsInto(TypeDescriptor.of(Failure.class)) + .exceptionsVia( + x -> { + try (ByteArrayOutputStream os = new ByteArrayOutputStream()) { + coder.encode(x.element(), os); + return Failure.newBuilder() + .setPayload(os.toByteArray()) + .setError( + String.format( + "%s%n%n%s", + x.exception().getMessage(), + ExceptionUtils.getStackTrace(x.exception()))) + .build(); + } + }); + Result, Failure> result = mapWithFailures.expand(input); + result.failures().apply(deadLetter); + return result.output(); + } + + @Override + public PCollection expand(PCollection input) { + return expandInternal(input); + } +} diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/Failure.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/Failure.java new file mode 100644 index 000000000000..44dce68885b5 --- /dev/null +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/Failure.java @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.schemas.io; + +import com.google.auto.value.AutoValue; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; +import org.apache.beam.sdk.annotations.Internal; +import org.apache.beam.sdk.schemas.AutoValueSchema; +import org.apache.beam.sdk.schemas.annotations.DefaultSchema; + +/** A generic failure of an SQL transform. */ +@Internal +@Experimental(Kind.SCHEMAS) +@DefaultSchema(AutoValueSchema.class) +@AutoValue +public abstract class Failure { + /** Bytes containing the payload which has failed. */ + @SuppressWarnings("mutable") + public abstract byte[] getPayload(); + /** Information about the cause of the failure. */ + public abstract String getError(); + + public static Builder newBuilder() { + return new AutoValue_Failure.Builder(); + } + + @AutoValue.Builder + public abstract static class Builder { + public abstract Builder setPayload(byte[] payload); + + public abstract Builder setError(String error); + + public abstract Failure build(); + } +} diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/GenericDlq.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/GenericDlq.java new file mode 100644 index 000000000000..03c9912e442f --- /dev/null +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/GenericDlq.java @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.schemas.io; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import java.util.List; +import java.util.Map; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; +import org.apache.beam.sdk.annotations.Internal; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PDone; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Splitter; + +/** Helper to generate a DLQ transform to write PCollection to an external system. 
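A sketch of how these pieces compose; the "logging" provider identifier and the mapping function are hypothetical. The identifier must match a GenericDlqProvider registered for ServiceLoader discovery (for example via @AutoService(GenericDlqProvider.class)), which is how Providers.loadProviders finds it.

```java
import org.apache.beam.sdk.schemas.io.DeadLetteredTransform;
import org.apache.beam.sdk.transforms.SimpleFunction;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

public class DeadLetterExample {
  /**
   * Elements whose conversion throws are encoded with the input coder and routed to the
   * DLQ provider whose identifier precedes the ':' in the config string.
   */
  static PCollection<Long> extractScoresWithDlq(PCollection<Row> rows) {
    SimpleFunction<Row, Long> extractScore =
        new SimpleFunction<Row, Long>() {
          @Override
          public Long apply(Row row) {
            return row.getInt64("score"); // throws if the field is missing
          }
        };
    return rows.apply(new DeadLetteredTransform<>(extractScore, "logging: ignored-config"));
  }
}
```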
*/ +@Internal +@Experimental(Kind.SCHEMAS) +public final class GenericDlq { + private GenericDlq() {} + + private static final Map PROVIDERS = + Providers.loadProviders(GenericDlqProvider.class); + + @SuppressWarnings("dereference.of.nullable") + public static PTransform, PDone> getDlqTransform(String fullConfig) { + List strings = Splitter.on(":").limit(2).splitToList(fullConfig); + checkArgument( + strings.size() == 2, "Invalid config, must start with `identifier:`. %s", fullConfig); + String key = strings.get(0); + String config = strings.get(1).trim(); + GenericDlqProvider provider = PROVIDERS.get(key); + checkArgument( + provider != null, "Invalid config, no DLQ provider exists with identifier `%s`.", key); + return provider.newDlqTransform(config); + } +} diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/GenericDlqProvider.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/GenericDlqProvider.java new file mode 100644 index 000000000000..a74a6fd0ffe7 --- /dev/null +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/GenericDlqProvider.java @@ -0,0 +1,34 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.schemas.io; + +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; +import org.apache.beam.sdk.annotations.Internal; +import org.apache.beam.sdk.schemas.io.Providers.Identifyable; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PDone; + +/** A Provider for generic DLQ transforms that handle deserialization failures. */ +@Internal +@Experimental(Kind.SCHEMAS) +public interface GenericDlqProvider extends Identifyable { + /** Generate a DLQ output from the provided config value. */ + PTransform, PDone> newDlqTransform(String config); +} diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/Providers.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/Providers.java new file mode 100644 index 000000000000..dc0f758b4aba --- /dev/null +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/Providers.java @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.schemas.io; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import java.util.HashMap; +import java.util.Map; +import java.util.ServiceLoader; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; +import org.apache.beam.sdk.annotations.Internal; + +/** Helpers for implementing the "Provider" pattern. */ +@Internal +@Experimental(Kind.SCHEMAS) +public final class Providers { + public interface Identifyable { + /** + * Returns an id that uniquely represents this among others implementing its derived interface. + */ + String identifier(); + } + + private Providers() {} + + public static Map loadProviders(Class klass) { + Map providers = new HashMap<>(); + for (T provider : ServiceLoader.load(klass)) { + checkArgument( + !providers.containsKey(provider.identifier()), + "Duplicate providers exist with identifier `%s` for class %s.", + provider.identifier(), + klass); + providers.put(provider.identifier(), provider); + } + return providers; + } +} diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/PushdownProjector.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/PushdownProjector.java new file mode 100644 index 000000000000..5bfb55e48f88 --- /dev/null +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/PushdownProjector.java @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.schemas.io; + +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.schemas.FieldAccessDescriptor; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PInput; +import org.apache.beam.sdk.values.Row; + +/** + * Factory for creating a {@link PTransform} that can execute a projection. + * + *
<p>
    Typically this interface will be implemented by a reader {@link PTransform} that is capable of + * pushing down projection to an external source. For example, {@link SchemaIO#buildReader()} may + * return a {@link PushdownProjector} to which a projection may be applied later. + */ +@Experimental +public interface PushdownProjector { + /** + * Returns a {@link PTransform} that will execute the projection specified by the {@link + * FieldAccessDescriptor}. + */ + PTransform> withProjectionPushdown( + FieldAccessDescriptor fieldAccessDescriptor); + + /** + * Returns true if this instance can do a projection that returns fields in a different order than + * the projection's inputs. + */ + boolean supportsFieldReordering(); +} diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTableProviderAvroIT.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/payloads/AvroPayloadSerializerProvider.java similarity index 55% rename from sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTableProviderAvroIT.java rename to sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/payloads/AvroPayloadSerializerProvider.java index 22c834ec0cef..ace671fccd6f 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTableProviderAvroIT.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/payloads/AvroPayloadSerializerProvider.java @@ -15,25 +15,28 @@ * See the License for the specific language governing permissions and * limitations under the License. */ -package org.apache.beam.sdk.extensions.sql.meta.provider.kafka; +package org.apache.beam.sdk.schemas.io.payloads; +import com.google.auto.service.AutoService; +import java.util.Map; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; +import org.apache.beam.sdk.annotations.Internal; +import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.schemas.utils.AvroUtils; -import org.apache.beam.sdk.transforms.SimpleFunction; -import org.apache.beam.sdk.values.Row; -import org.apache.kafka.clients.producer.ProducerRecord; - -public class KafkaTableProviderAvroIT extends KafkaTableProviderIT { - private final SimpleFunction toBytesFn = - AvroUtils.getRowToAvroBytesFunction(TEST_TABLE_SCHEMA); +@Internal +@Experimental(Kind.SCHEMAS) +@AutoService(PayloadSerializerProvider.class) +public class AvroPayloadSerializerProvider implements PayloadSerializerProvider { @Override - protected ProducerRecord generateProducerRecord(int i) { - return new ProducerRecord<>( - kafkaOptions.getKafkaTopic(), "k" + i, toBytesFn.apply(generateRow(i))); + public String identifier() { + return "avro"; } @Override - protected String getPayloadFormat() { - return "avro"; + public PayloadSerializer getSerializer(Schema schema, Map tableParams) { + return PayloadSerializer.of( + AvroUtils.getRowToAvroBytesFunction(schema), AvroUtils.getAvroBytesToRowFunction(schema)); } } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/payloads/JsonPayloadSerializerProvider.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/payloads/JsonPayloadSerializerProvider.java new file mode 100644 index 000000000000..a8a85edeabc6 --- /dev/null +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/payloads/JsonPayloadSerializerProvider.java @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software 
Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.schemas.io.payloads; + +import static java.nio.charset.StandardCharsets.UTF_8; + +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.auto.service.AutoService; +import java.util.Map; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; +import org.apache.beam.sdk.annotations.Internal; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.util.RowJson.RowJsonDeserializer; +import org.apache.beam.sdk.util.RowJson.RowJsonSerializer; +import org.apache.beam.sdk.util.RowJsonUtils; + +@Internal +@Experimental(Kind.SCHEMAS) +@AutoService(PayloadSerializerProvider.class) +public class JsonPayloadSerializerProvider implements PayloadSerializerProvider { + @Override + public String identifier() { + return "json"; + } + + @Override + public PayloadSerializer getSerializer(Schema schema, Map tableParams) { + ObjectMapper deserializeMapper = + RowJsonUtils.newObjectMapperWith(RowJsonDeserializer.forSchema(schema)); + ObjectMapper serializeMapper = + RowJsonUtils.newObjectMapperWith(RowJsonSerializer.forSchema(schema)); + return PayloadSerializer.of( + row -> RowJsonUtils.rowToJson(serializeMapper, row).getBytes(UTF_8), + bytes -> RowJsonUtils.jsonToRow(deserializeMapper, new String(bytes, UTF_8))); + } +} diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/payloads/PayloadSerializer.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/payloads/PayloadSerializer.java new file mode 100644 index 000000000000..a90255319b1e --- /dev/null +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/payloads/PayloadSerializer.java @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.schemas.io.payloads; + +import java.io.Serializable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; +import org.apache.beam.sdk.annotations.Internal; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.values.Row; + +@Internal +@Experimental(Kind.SCHEMAS) +public interface PayloadSerializer extends Serializable { + long serialVersionUID = 5645783967169L; + + byte[] serialize(Row row); + + Row deserialize(byte[] bytes); + + static PayloadSerializer of( + SerializableFunction serializeFn, + SerializableFunction deserializeFn) { + return new PayloadSerializer() { + @Override + public byte[] serialize(Row row) { + return serializeFn.apply(row); + } + + @Override + public Row deserialize(byte[] bytes) { + return deserializeFn.apply(bytes); + } + }; + } +} diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/payloads/PayloadSerializerProvider.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/payloads/PayloadSerializerProvider.java new file mode 100644 index 000000000000..eeda798f08b1 --- /dev/null +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/payloads/PayloadSerializerProvider.java @@ -0,0 +1,37 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.schemas.io.payloads; + +import java.util.Map; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; +import org.apache.beam.sdk.annotations.Internal; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.io.Providers.Identifyable; + +@Internal +@Experimental(Kind.SCHEMAS) +public interface PayloadSerializerProvider extends Identifyable { + /** + * Get a PayloadSerializer. + * + * @param schema the schema of the payload + * @param tableParams parameters passed at table declaration time for configuring the serializer + */ + PayloadSerializer getSerializer(Schema schema, Map tableParams); +} diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/payloads/PayloadSerializers.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/payloads/PayloadSerializers.java new file mode 100644 index 000000000000..0a8199fa7db3 --- /dev/null +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/payloads/PayloadSerializers.java @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.schemas.io.payloads; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import java.util.Map; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; +import org.apache.beam.sdk.annotations.Internal; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.io.Providers; + +@Internal +@Experimental(Kind.SCHEMAS) +public final class PayloadSerializers { + private PayloadSerializers() {} + + private static final Map PROVIDERS = + Providers.loadProviders(PayloadSerializerProvider.class); + + @SuppressWarnings("dereference.of.nullable") + public static PayloadSerializer getSerializer( + String id, Schema schema, Map tableParams) { + PayloadSerializerProvider provider = PROVIDERS.get(id); + checkArgument( + provider != null, + "Invalid config, no serializer provider exists with identifier `%s`.", + id); + return provider.getSerializer(schema, tableParams); + } +} diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/payloads/package-info.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/payloads/package-info.java new file mode 100644 index 000000000000..91fd27c3363b --- /dev/null +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/payloads/package-info.java @@ -0,0 +1,24 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** Provides abstractions for schema-aware IOs. 
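+ *
+ * <p>Illustrative sketch of how a serializer is obtained and used (the schema, the {@code "json"}
+ * identifier and the empty parameter map are examples only):
+ *
+ * <pre>{@code
+ * Schema schema = Schema.builder().addStringField("name").addInt64Field("count").build();
+ * Row row = Row.withSchema(schema).addValues("a", 1L).build();
+ * PayloadSerializer serializer =
+ *     PayloadSerializers.getSerializer("json", schema, ImmutableMap.of());
+ * byte[] bytes = serializer.serialize(row);        // Row -> UTF-8 JSON bytes
+ * Row roundTripped = serializer.deserialize(bytes);
+ * }</pre>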
*/ +@DefaultAnnotation(NonNull.class) +package org.apache.beam.sdk.schemas.io.payloads; + +import edu.umd.cs.findbugs.annotations.DefaultAnnotation; +import org.checkerframework.checker.nullness.qual.NonNull; diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/logicaltypes/UnknownLogicalType.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/logicaltypes/UnknownLogicalType.java new file mode 100644 index 000000000000..af19f8f33e5d --- /dev/null +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/logicaltypes/UnknownLogicalType.java @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.schemas.logicaltypes; + +import org.apache.beam.sdk.schemas.Schema.FieldType; + +/** + * A base class for logical types that are not understood by the Java SDK. + * + *
<p>
    Unknown logical types are passed through and treated like their Base type in the Java SDK. + * + *
<p>
    Java transforms and JVM runners should take care when processing these types as they may have + * a particular semantic meaning in the context that created them. For example, consider an + * enumerated type backed by a primitive {@class FieldType.INT8}. A Java transform can clearly pass + * through this value and pass it back to a context that understands it, but that transform should + * not blindly perform arithmetic on this type. + */ +public class UnknownLogicalType extends PassThroughLogicalType { + private byte[] payload; + + public UnknownLogicalType( + String identifier, + byte[] payload, + FieldType argumentType, + Object argument, + FieldType fieldType) { + super(identifier, argumentType, argument, fieldType); + this.payload = payload; + } + + public byte[] getPayload() { + return payload; + } +} diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Convert.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Convert.java index ce7507cd92ae..05fa13e4789a 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Convert.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Convert.java @@ -21,6 +21,7 @@ import org.apache.beam.sdk.annotations.Experimental.Kind; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.schemas.SchemaCoder; import org.apache.beam.sdk.schemas.SchemaRegistry; import org.apache.beam.sdk.schemas.utils.ByteBuddyUtils.DefaultTypeConversionsFactory; import org.apache.beam.sdk.schemas.utils.ConvertHelpers; @@ -125,6 +126,10 @@ public PCollection expand(PCollection input) { throw new RuntimeException("Convert requires a schema on the input."); } + SchemaCoder coder = (SchemaCoder) input.getCoder(); + if (coder.getEncodedTypeDescriptor().equals(outputTypeDescriptor)) { + return (PCollection) input; + } SchemaRegistry registry = input.getPipeline().getSchemaRegistry(); ConvertHelpers.ConvertedSchemaInformation converted = ConvertHelpers.getConvertedSchemaInformation( diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/RenameFields.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/RenameFields.java index 7b8da25a8462..bdb5bea453cf 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/RenameFields.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/RenameFields.java @@ -17,11 +17,16 @@ */ package org.apache.beam.sdk.schemas.transforms; +import com.google.auto.value.AutoValue; import java.io.Serializable; +import java.util.BitSet; import java.util.Collection; +import java.util.Collections; import java.util.List; import java.util.Map; +import java.util.UUID; import java.util.stream.Collectors; +import javax.annotation.Nullable; import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.annotations.Experimental.Kind; import org.apache.beam.sdk.schemas.FieldAccessDescriptor; @@ -33,12 +38,13 @@ import org.apache.beam.sdk.transforms.ParDo; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ArrayListMultimap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; 
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Multimap; -import org.apache.commons.compress.utils.Lists; /** * A transform for renaming fields inside an existing schema. Top level or nested fields can be @@ -64,64 +70,95 @@ public static Inner create() { return new Inner<>(); } - // Describes a single renameSchema rule. - private static class RenamePair implements Serializable { + // Describes a single renameSchema rule + @AutoValue + abstract static class RenamePair implements Serializable { // The FieldAccessDescriptor describing the field to renameSchema. Must reference a singleton // field. - private final FieldAccessDescriptor fieldAccessDescriptor; + abstract FieldAccessDescriptor getFieldAccessDescriptor(); // The new name for the field. - private final String newName; + abstract String getNewName(); - RenamePair(FieldAccessDescriptor fieldAccessDescriptor, String newName) { - this.fieldAccessDescriptor = fieldAccessDescriptor; - this.newName = newName; + static RenamePair of(FieldAccessDescriptor fieldAccessDescriptor, String newName) { + return new AutoValue_RenameFields_RenamePair(fieldAccessDescriptor, newName); } RenamePair resolve(Schema schema) { - FieldAccessDescriptor resolved = fieldAccessDescriptor.resolve(schema); + FieldAccessDescriptor resolved = getFieldAccessDescriptor().resolve(schema); if (!resolved.referencesSingleField()) { throw new IllegalArgumentException(resolved + " references multiple fields."); } - return new RenamePair(resolved, newName); + return RenamePair.of(resolved, getNewName()); } } - private static FieldType renameFieldType(FieldType inputType, Collection renames) { + private static FieldType renameFieldType( + FieldType inputType, + Collection renames, + Map renamedSchemasMap, + Map nestedFieldRenamedMap) { + if (renames.isEmpty()) { + return inputType; + } + switch (inputType.getTypeName()) { case ROW: - return FieldType.row(renameSchema(inputType.getRowSchema(), renames)); + renameSchema(inputType.getRowSchema(), renames, renamedSchemasMap, nestedFieldRenamedMap); + return FieldType.row(renamedSchemasMap.get(inputType.getRowSchema().getUUID())); case ARRAY: - return FieldType.array(renameFieldType(inputType.getCollectionElementType(), renames)); + return FieldType.array( + renameFieldType( + inputType.getCollectionElementType(), + renames, + renamedSchemasMap, + nestedFieldRenamedMap)); case ITERABLE: - return FieldType.iterable(renameFieldType(inputType.getCollectionElementType(), renames)); + return FieldType.iterable( + renameFieldType( + inputType.getCollectionElementType(), + renames, + renamedSchemasMap, + nestedFieldRenamedMap)); case MAP: return FieldType.map( - renameFieldType(inputType.getMapKeyType(), renames), - renameFieldType(inputType.getMapValueType(), renames)); + renameFieldType( + inputType.getMapKeyType(), renames, renamedSchemasMap, nestedFieldRenamedMap), + renameFieldType( + inputType.getMapValueType(), renames, renamedSchemasMap, nestedFieldRenamedMap)); + case LOGICAL_TYPE: + throw new RuntimeException("RenameFields does not support renaming logical types."); default: return inputType; } } // Apply the user-specified renames to the input schema. 
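+  // The result is recorded in renamedSchemasMap, keyed by the input schema's UUID; for each
+  // schema, nestedFieldRenamedMap records which top-level fields have renames that apply to
+  // their nested fields.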
- private static Schema renameSchema(Schema inputSchema, Collection renames) { + @VisibleForTesting + static void renameSchema( + Schema inputSchema, + Collection renames, + Map renamedSchemasMap, + Map nestedFieldRenamedMap) { // The mapping of renames to apply at this level of the schema. Map topLevelRenames = Maps.newHashMap(); // For nested schemas, collect all applicable renames here. Multimap nestedRenames = ArrayListMultimap.create(); for (RenamePair rename : renames) { - FieldAccessDescriptor access = rename.fieldAccessDescriptor; + FieldAccessDescriptor access = rename.getFieldAccessDescriptor(); if (!access.fieldIdsAccessed().isEmpty()) { // This references a field at this level of the schema. Integer fieldId = Iterables.getOnlyElement(access.fieldIdsAccessed()); - topLevelRenames.put(fieldId, rename.newName); + topLevelRenames.put(fieldId, rename.getNewName()); } else { // This references a nested field. Map.Entry nestedAccess = Iterables.getOnlyElement(access.nestedFieldsById().entrySet()); + nestedFieldRenamedMap + .computeIfAbsent(inputSchema.getUUID(), s -> new BitSet(inputSchema.getFieldCount())) + .set(nestedAccess.getKey()); nestedRenames.put( - nestedAccess.getKey(), new RenamePair(nestedAccess.getValue(), rename.newName)); + nestedAccess.getKey(), RenamePair.of(nestedAccess.getValue(), rename.getNewName())); } } @@ -130,16 +167,13 @@ private static Schema renameSchema(Schema inputSchema, Collection re Field field = inputSchema.getField(i); FieldType fieldType = field.getType(); String newName = topLevelRenames.getOrDefault(i, field.getName()); - Collection nestedFieldRenames = nestedRenames.asMap().get(i); - if (nestedFieldRenames != null) { - // There are nested field renames. Recursively renameSchema the rest of the schema. - builder.addField(newName, renameFieldType(fieldType, nestedFieldRenames)); - } else { - // No renameSchema for this field. Just add it back as is, potentially with a new name. - builder.addField(newName, fieldType); - } + Collection nestedFieldRenames = + nestedRenames.asMap().getOrDefault(i, Collections.emptyList()); + builder.addField( + newName, + renameFieldType(fieldType, nestedFieldRenames, renamedSchemasMap, nestedFieldRenamedMap)); } - return builder.build(); + renamedSchemasMap.put(inputSchema.getUUID(), builder.build()); } /** The class implementing the actual PTransform. 
*/ @@ -164,7 +198,7 @@ public Inner rename(FieldAccessDescriptor field, String newName) { List newList = ImmutableList.builder() .addAll(renames) - .add(new RenamePair(field, newName)) + .add(RenamePair.of(field, newName)) .build(); return new Inner<>(newList); @@ -172,21 +206,104 @@ public Inner rename(FieldAccessDescriptor field, String newName) { @Override public PCollection expand(PCollection input) { - Schema inputSchema = input.getSchema(); + final Map renamedSchemasMap = Maps.newHashMap(); + final Map nestedFieldRenamedMap = Maps.newHashMap(); - List pairs = - renames.stream().map(r -> r.resolve(inputSchema)).collect(Collectors.toList()); - final Schema outputSchema = renameSchema(inputSchema, pairs); + List resolvedRenames = + renames.stream().map(r -> r.resolve(input.getSchema())).collect(Collectors.toList()); + renameSchema(input.getSchema(), resolvedRenames, renamedSchemasMap, nestedFieldRenamedMap); + final Schema outputSchema = renamedSchemasMap.get(input.getSchema().getUUID()); + final BitSet nestedRenames = nestedFieldRenamedMap.get(input.getSchema().getUUID()); return input .apply( ParDo.of( new DoFn() { @ProcessElement public void processElement(@Element Row row, OutputReceiver o) { - o.output(Row.withSchema(outputSchema).attachValues(row.getValues())); + o.output( + renameRow( + row, + outputSchema, + nestedRenames, + renamedSchemasMap, + nestedFieldRenamedMap)); } })) .setRowSchema(outputSchema); } } + + // TODO(reuvenlax): For better performance, we should reuse functionality in + // SelectByteBuddyHelpers to generate + // byte code to do the rename. This would allow us to skip walking over the schema on each row. + // For now we added + // the optimization to skip schema walking if there are no nested renames (as determined by the + // nestedFieldRenamedMap). + @VisibleForTesting + static Row renameRow( + Row row, + Schema schema, + @Nullable BitSet nestedRenames, + Map renamedSubSchemasMap, + Map nestedFieldRenamedMap) { + if (nestedRenames == null || nestedRenames.isEmpty()) { + // Fast path, short circuit subschems. 
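+      // No nested fields were renamed, so the original values can be reattached to the renamed
+      // top-level schema without walking each row.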
+ return Row.withSchema(schema).attachValues(row.getValues()); + } else { + List values = Lists.newArrayListWithCapacity(row.getValues().size()); + for (int i = 0; i < schema.getFieldCount(); ++i) { + if (nestedRenames.get(i)) { + values.add( + renameFieldValue( + row.getValue(i), + schema.getField(i).getType(), + renamedSubSchemasMap, + nestedFieldRenamedMap)); + } else { + values.add(row.getValue(i)); + } + } + return Row.withSchema(schema).attachValues(values); + } + } + + private static Object renameFieldValue( + Object value, + FieldType fieldType, + Map renamedSubSchemas, + Map nestedFieldRenamed) { + switch (fieldType.getTypeName()) { + case ARRAY: + case ITERABLE: + List renamedValues = Lists.newArrayList(); + for (Object o : (List) value) { + renamedValues.add( + renameFieldValue( + o, fieldType.getCollectionElementType(), renamedSubSchemas, nestedFieldRenamed)); + } + return renamedValues; + case MAP: + Map renamedMap = Maps.newHashMap(); + for (Map.Entry entry : ((Map) value).entrySet()) { + renamedMap.put( + renameFieldValue( + entry.getKey(), fieldType.getMapKeyType(), renamedSubSchemas, nestedFieldRenamed), + renameFieldValue( + entry.getValue(), + fieldType.getMapValueType(), + renamedSubSchemas, + nestedFieldRenamed)); + } + return renamedMap; + case ROW: + return renameRow( + (Row) value, + fieldType.getRowSchema(), + nestedFieldRenamed.get(fieldType.getRowSchema().getUUID()), + renamedSubSchemas, + nestedFieldRenamed); + default: + return value; + } + } } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AutoValueUtils.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AutoValueUtils.java index 027832ccdf77..c2d6e34e50a1 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AutoValueUtils.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AutoValueUtils.java @@ -42,28 +42,28 @@ import org.apache.beam.sdk.schemas.utils.ByteBuddyUtils.TypeConversionsFactory; import org.apache.beam.sdk.util.common.ReflectHelpers; import org.apache.beam.sdk.values.TypeDescriptor; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.ByteBuddy; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.asm.AsmVisitorWrapper; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.method.MethodDescription.ForLoadedMethod; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.type.TypeDescription.ForLoadedType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.DynamicType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.loading.ClassLoadingStrategy; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.scaffold.InstrumentedType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.Implementation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.ByteCodeAppender; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.ByteCodeAppender.Size; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.Duplication; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.Removal; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.StackManipulation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.TypeCreation; -import 
org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.assign.TypeCasting; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.collection.ArrayAccess; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.constant.IntegerConstant; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodInvocation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodReturn; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodVariableAccess; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.jar.asm.ClassWriter; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.matcher.ElementMatchers; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.ByteBuddy; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.asm.AsmVisitorWrapper; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.method.MethodDescription.ForLoadedMethod; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.type.TypeDescription.ForLoadedType; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.DynamicType; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.loading.ClassLoadingStrategy; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.scaffold.InstrumentedType; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.Implementation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.ByteCodeAppender; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.ByteCodeAppender.Size; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.Duplication; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.Removal; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.StackManipulation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.TypeCreation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.assign.TypeCasting; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.collection.ArrayAccess; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.constant.IntegerConstant; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodInvocation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodReturn; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodVariableAccess; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.jar.asm.ClassWriter; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.matcher.ElementMatchers; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; import org.checkerframework.checker.nullness.qual.Nullable; diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroByteBuddyUtils.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroByteBuddyUtils.java index b31c369cd00c..0e98ed36146e 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroByteBuddyUtils.java +++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroByteBuddyUtils.java @@ -33,19 +33,19 @@ import org.apache.beam.sdk.schemas.utils.ReflectUtils.ClassWithSchema; import org.apache.beam.sdk.util.common.ReflectHelpers; import org.apache.beam.sdk.values.TypeDescriptor; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.ByteBuddy; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.asm.AsmVisitorWrapper; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.type.TypeDescription.ForLoadedType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.DynamicType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.loading.ClassLoadingStrategy; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.MethodCall; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.StackManipulation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.assign.TypeCasting; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.collection.ArrayAccess; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.constant.IntegerConstant; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodVariableAccess; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.jar.asm.ClassWriter; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.matcher.ElementMatchers; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.ByteBuddy; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.asm.AsmVisitorWrapper; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.type.TypeDescription.ForLoadedType; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.DynamicType; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.loading.ClassLoadingStrategy; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.MethodCall; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.StackManipulation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.assign.TypeCasting; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.collection.ArrayAccess; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.constant.IntegerConstant; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodVariableAccess; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.jar.asm.ClassWriter; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.matcher.ElementMatchers; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; @Experimental(Kind.SCHEMAS) diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java index 4b66409f5d8d..3a10692173ab 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java @@ -28,7 +28,7 @@ import java.lang.reflect.Method; import java.math.BigDecimal; import java.nio.ByteBuffer; -import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; import java.util.ArrayList; import java.util.HashMap; 
import java.util.List; @@ -76,14 +76,14 @@ import org.apache.beam.sdk.transforms.SimpleFunction; import org.apache.beam.sdk.values.Row; import org.apache.beam.sdk.values.TypeDescriptor; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.type.TypeDescription.ForLoadedType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.Duplication; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.StackManipulation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.StackManipulation.Compound; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.TypeCreation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.assign.TypeCasting; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodInvocation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.matcher.ElementMatchers; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.type.TypeDescription.ForLoadedType; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.Duplication; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.StackManipulation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.StackManipulation.Compound; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.TypeCreation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.assign.TypeCasting; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodInvocation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.matcher.ElementMatchers; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.CaseFormat; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Strings; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; @@ -369,6 +369,14 @@ public static Row toBeamRowStrict(GenericRecord record, @Nullable Schema schema) return builder.build(); } + /** + * Convert from a Beam Row to an AVRO GenericRecord. The Avro Schema is inferred from the Beam + * schema on the row. + */ + public static GenericRecord toGenericRecord(Row row) { + return toGenericRecord(row, null); + } + /** * Convert from a Beam Row to an AVRO GenericRecord. If a Schema is not provided, one is inferred * from the Beam schema on the row. 
@@ -453,7 +461,7 @@ public Row apply(byte[] bytes) { } catch (Exception e) { throw new AvroRuntimeException( "Could not decode avro record from given bytes " - + new String(bytes, Charset.defaultCharset()), + + new String(bytes, StandardCharsets.UTF_8), e); } } @@ -517,7 +525,7 @@ public boolean equals(@Nullable Object other) { return false; } GenericRecordToRowFn that = (GenericRecordToRowFn) other; - return schema.equals(that.schema); + return Objects.equals(this.schema, that.schema); } @Override @@ -556,7 +564,7 @@ public boolean equals(@Nullable Object other) { return false; } RowToGenericRecordFn that = (RowToGenericRecordFn) other; - return avroSchema.equals(that.avroSchema); + return Objects.equals(this.avroSchema, that.avroSchema); } @Override @@ -906,6 +914,26 @@ private static org.apache.avro.Schema getFieldSchema( .map(x -> getFieldSchema(x.getType(), x.getName(), namespace)) .collect(Collectors.toList())); break; + case "CHAR": + case "NCHAR": + baseType = + buildHiveLogicalTypeSchema("char", (int) fieldType.getLogicalType().getArgument()); + break; + case "NVARCHAR": + case "VARCHAR": + case "LONGNVARCHAR": + case "LONGVARCHAR": + baseType = + buildHiveLogicalTypeSchema( + "varchar", (int) fieldType.getLogicalType().getArgument()); + break; + case "DATE": + baseType = LogicalTypes.date().addToSchema(org.apache.avro.Schema.create(Type.INT)); + break; + case "TIME": + baseType = + LogicalTypes.timeMillis().addToSchema(org.apache.avro.Schema.create(Type.INT)); + break; default: throw new RuntimeException( "Unhandled logical type " + fieldType.getLogicalType().getIdentifier()); @@ -1017,6 +1045,15 @@ private static org.apache.avro.Schema getFieldSchema( typeWithNullability.type.getTypes().get(oneOfValue.getCaseType().getValue()), oneOfValue.getValue()); } + case "NVARCHAR": + case "VARCHAR": + case "LONGNVARCHAR": + case "LONGVARCHAR": + return new Utf8((String) value); + case "DATE": + return Days.daysBetween(Instant.EPOCH, (Instant) value).getDays(); + case "TIME": + return (int) ((Instant) value).getMillis(); default: throw new RuntimeException( "Unhandled logical type " + fieldType.getLogicalType().getIdentifier()); @@ -1275,7 +1312,20 @@ private static Object convertMapStrict( private static void checkTypeName(Schema.TypeName got, Schema.TypeName expected, String label) { checkArgument( - got.equals(expected), - "Can't convert '" + label + "' to " + got + ", expected: " + expected); + got.equals(expected), "Can't convert '%s' to %s, expected: %s", label, got, expected); + } + + /** + * Helper factory to build Avro Logical types schemas for SQL *CHAR types. This method represents + * the logical as Hive does. 
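+   * <p>For example, a SQL VARCHAR(255) column maps to an Avro string schema carrying
+   * {@code "logicalType": "varchar"} and {@code "maxLength": 255}.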
+ */ + private static org.apache.avro.Schema buildHiveLogicalTypeSchema( + String hiveLogicalType, int size) { + String schemaJson = + String.format( + "{\"type\": \"string\", \"logicalType\": \"%s\", \"maxLength\": %s}", + hiveLogicalType, size); + return new org.apache.avro.Schema.Parser().parse(schemaJson); } } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/ByteBuddyLocalVariableManager.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/ByteBuddyLocalVariableManager.java index d988e76bf3f0..55dba5ae761d 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/ByteBuddyLocalVariableManager.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/ByteBuddyLocalVariableManager.java @@ -19,11 +19,11 @@ import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.type.TypeDescription.ForLoadedType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.StackManipulation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.StackManipulation.Compound; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.assign.TypeCasting; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodVariableAccess; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.type.TypeDescription.ForLoadedType; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.StackManipulation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.StackManipulation.Compound; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.assign.TypeCasting; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodVariableAccess; /** This class allows managing local variables in a ByteBuddy-generated function. 
*/ class ByteBuddyLocalVariableManager { diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/ByteBuddyUtils.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/ByteBuddyUtils.java index 1448e45f74eb..1cc7f50643a1 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/ByteBuddyUtils.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/ByteBuddyUtils.java @@ -41,42 +41,42 @@ import org.apache.beam.sdk.util.common.ReflectHelpers; import org.apache.beam.sdk.values.TypeDescriptor; import org.apache.beam.sdk.values.TypeParameter; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.ByteBuddy; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.NamingStrategy; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.NamingStrategy.SuffixingRandom.BaseNameResolver; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.asm.AsmVisitorWrapper; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.method.MethodDescription.ForLoadedConstructor; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.method.MethodDescription.ForLoadedMethod; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.type.TypeDescription; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.type.TypeDescription.ForLoadedType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.DynamicType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.loading.ClassLoadingStrategy; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.scaffold.InstrumentedType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.Implementation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.Implementation.Context; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.ByteCodeAppender; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.ByteCodeAppender.Size; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.Duplication; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.StackManipulation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.StackManipulation.Compound; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.TypeCreation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.assign.Assigner; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.assign.Assigner.Typing; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.assign.TypeCasting; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.collection.ArrayAccess; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.collection.ArrayFactory; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.constant.IntegerConstant; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.constant.NullConstant; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.FieldAccess; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodInvocation; -import 
org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodReturn; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodVariableAccess; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.jar.asm.ClassWriter; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.jar.asm.Label; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.jar.asm.MethodVisitor; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.jar.asm.Opcodes; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.matcher.ElementMatchers; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.utility.RandomString; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.ByteBuddy; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.NamingStrategy; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.NamingStrategy.SuffixingRandom.BaseNameResolver; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.asm.AsmVisitorWrapper; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.method.MethodDescription.ForLoadedConstructor; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.method.MethodDescription.ForLoadedMethod; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.type.TypeDescription; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.type.TypeDescription.ForLoadedType; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.DynamicType; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.loading.ClassLoadingStrategy; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.scaffold.InstrumentedType; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.Implementation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.Implementation.Context; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.ByteCodeAppender; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.ByteCodeAppender.Size; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.Duplication; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.StackManipulation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.StackManipulation.Compound; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.TypeCreation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.assign.Assigner; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.assign.Assigner.Typing; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.assign.TypeCasting; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.collection.ArrayAccess; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.collection.ArrayFactory; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.constant.IntegerConstant; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.constant.NullConstant; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.FieldAccess; +import 
org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodInvocation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodReturn; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodVariableAccess; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.jar.asm.ClassWriter; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.jar.asm.Label; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.jar.asm.MethodVisitor; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.jar.asm.Opcodes; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.matcher.ElementMatchers; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.utility.RandomString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Function; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Collections2; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/ConvertHelpers.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/ConvertHelpers.java index e644f661ba96..32bf016b898e 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/ConvertHelpers.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/ConvertHelpers.java @@ -20,6 +20,7 @@ import java.io.Serializable; import java.lang.reflect.InvocationTargetException; import java.lang.reflect.Type; +import org.apache.avro.generic.GenericRecord; import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.annotations.Experimental.Kind; import org.apache.beam.sdk.schemas.JavaFieldSchema.JavaFieldTypeSupplier; @@ -33,22 +34,24 @@ import org.apache.beam.sdk.util.common.ReflectHelpers; import org.apache.beam.sdk.values.Row; import org.apache.beam.sdk.values.TypeDescriptor; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.ByteBuddy; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.asm.AsmVisitorWrapper; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.type.TypeDescription; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.DynamicType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.loading.ClassLoadingStrategy; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.scaffold.InstrumentedType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.Implementation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.ByteCodeAppender; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.ByteCodeAppender.Size; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.StackManipulation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodReturn; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodVariableAccess; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.jar.asm.ClassWriter; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.matcher.ElementMatchers; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.ByteBuddy; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.asm.AsmVisitorWrapper; +import 
org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.type.TypeDescription; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.DynamicType; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.loading.ClassLoadingStrategy; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.scaffold.InstrumentedType; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.Implementation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.ByteCodeAppender; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.ByteCodeAppender.Size; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.StackManipulation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodReturn; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodVariableAccess; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.jar.asm.ClassWriter; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.matcher.ElementMatchers; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.primitives.Primitives; import org.checkerframework.checker.nullness.qual.Nullable; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; /** Helper functions for converting between equivalent schema types. */ @Experimental(Kind.SCHEMAS) @@ -57,6 +60,8 @@ "rawtypes" }) public class ConvertHelpers { + private static final Logger LOG = LoggerFactory.getLogger(ConvertHelpers.class); + /** Return value after converting a schema. */ public static class ConvertedSchemaInformation implements Serializable { // If the output type is a composite type, this is the schema coder. @@ -78,12 +83,15 @@ public ConvertedSchemaInformation( public static ConvertedSchemaInformation getConvertedSchemaInformation( Schema inputSchema, TypeDescriptor outputType, SchemaRegistry schemaRegistry) { ConvertedSchemaInformation convertedSchema = null; - boolean toRow = outputType.equals(TypeDescriptor.of(Row.class)); - if (toRow) { + if (outputType.equals(TypeDescriptor.of(Row.class))) { // If the output is of type Row, then just forward the schema of the input type to the // output. convertedSchema = new ConvertedSchemaInformation<>((SchemaCoder) SchemaCoder.of(inputSchema), null); + } else if (outputType.equals(TypeDescriptor.of(GenericRecord.class))) { + convertedSchema = + new ConvertedSchemaInformation( + (SchemaCoder) AvroUtils.schemaCoder(AvroUtils.toAvroSchema(inputSchema)), null); } else { // Otherwise, try to find a schema for the output type in the schema registry. Schema outputSchema = null; @@ -97,7 +105,7 @@ public static ConvertedSchemaInformation getConvertedSchemaInformation( schemaRegistry.getToRowFunction(outputType), schemaRegistry.getFromRowFunction(outputType)); } catch (NoSuchSchemaException e) { - + LOG.debug("No schema found for type " + outputType, e); } FieldType unboxedType = null; // TODO: Properly handle nullable. 
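For illustration only, a minimal sketch of how the new `GenericRecord` branch in `ConvertHelpers.getConvertedSchemaInformation` could be exercised from user code; the `MyEvent` POJO and the pipeline below are assumptions for this example and are not part of the change itself:

```java
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.schemas.JavaFieldSchema;
import org.apache.beam.sdk.schemas.annotations.DefaultSchema;
import org.apache.beam.sdk.schemas.transforms.Convert;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;

public class ConvertToGenericRecordExample {

  // Hypothetical schema-aware POJO; any class with an inferable Beam schema would do.
  @DefaultSchema(JavaFieldSchema.class)
  public static class MyEvent {
    public String user;
    public long count;

    public MyEvent() {}

    public MyEvent(String user, long count) {
      this.user = user;
      this.count = count;
    }
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    PCollection<MyEvent> events =
        p.apply(Create.of(new MyEvent("alice", 3L), new MyEvent("bob", 5L)));

    // With the branch added above, converting to GenericRecord maps the input schema to an
    // Avro schema and uses an Avro-based SchemaCoder, instead of requiring a schema for
    // GenericRecord to be found in the schema registry.
    PCollection<GenericRecord> records = events.apply(Convert.to(GenericRecord.class));

    p.run().waitUntilFinish();
  }
}
```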
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/JavaBeanUtils.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/JavaBeanUtils.java index bdb821a513d7..8dbd9b694e9b 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/JavaBeanUtils.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/JavaBeanUtils.java @@ -37,23 +37,23 @@ import org.apache.beam.sdk.schemas.utils.ByteBuddyUtils.TypeConversionsFactory; import org.apache.beam.sdk.schemas.utils.ReflectUtils.ClassWithSchema; import org.apache.beam.sdk.util.common.ReflectHelpers; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.ByteBuddy; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.asm.AsmVisitorWrapper; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.method.MethodDescription.ForLoadedMethod; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.DynamicType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.loading.ClassLoadingStrategy; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.scaffold.InstrumentedType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.FixedValue; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.Implementation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.ByteCodeAppender; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.ByteCodeAppender.Size; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.Removal; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.StackManipulation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodInvocation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodReturn; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodVariableAccess; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.jar.asm.ClassWriter; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.matcher.ElementMatchers; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.ByteBuddy; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.asm.AsmVisitorWrapper; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.method.MethodDescription.ForLoadedMethod; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.DynamicType; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.loading.ClassLoadingStrategy; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.scaffold.InstrumentedType; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.FixedValue; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.Implementation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.ByteCodeAppender; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.ByteCodeAppender.Size; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.Removal; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.StackManipulation; +import 
org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodInvocation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodReturn; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodVariableAccess; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.jar.asm.ClassWriter; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.matcher.ElementMatchers; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; /** A set of utilities to generate getter and setter classes for JavaBean objects. */ diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/POJOUtils.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/POJOUtils.java index 1c77c5972adc..20e9f326b5e7 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/POJOUtils.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/POJOUtils.java @@ -40,29 +40,29 @@ import org.apache.beam.sdk.schemas.utils.ReflectUtils.ClassWithSchema; import org.apache.beam.sdk.util.common.ReflectHelpers; import org.apache.beam.sdk.values.TypeDescriptor; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.ByteBuddy; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.asm.AsmVisitorWrapper; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.field.FieldDescription.ForLoadedField; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.type.TypeDescription.ForLoadedType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.DynamicType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.loading.ClassLoadingStrategy; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.scaffold.InstrumentedType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.FixedValue; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.Implementation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.ByteCodeAppender; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.ByteCodeAppender.Size; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.Duplication; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.StackManipulation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.TypeCreation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.assign.TypeCasting; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.collection.ArrayAccess; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.constant.IntegerConstant; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.FieldAccess; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodInvocation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodReturn; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodVariableAccess; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.jar.asm.ClassWriter; -import 
org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.matcher.ElementMatchers; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.ByteBuddy; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.asm.AsmVisitorWrapper; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.field.FieldDescription.ForLoadedField; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.type.TypeDescription.ForLoadedType; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.DynamicType; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.loading.ClassLoadingStrategy; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.scaffold.InstrumentedType; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.FixedValue; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.Implementation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.ByteCodeAppender; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.ByteCodeAppender.Size; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.Duplication; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.StackManipulation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.TypeCreation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.assign.TypeCasting; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.collection.ArrayAccess; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.constant.IntegerConstant; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.FieldAccess; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodInvocation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodReturn; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodVariableAccess; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.jar.asm.ClassWriter; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.matcher.ElementMatchers; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; import org.checkerframework.checker.nullness.qual.Nullable; diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/SelectByteBuddyHelpers.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/SelectByteBuddyHelpers.java index a9edc603fa99..91b4e3bfdb77 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/SelectByteBuddyHelpers.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/SelectByteBuddyHelpers.java @@ -36,37 +36,37 @@ import org.apache.beam.sdk.schemas.utils.ByteBuddyUtils.IfNullElse; import org.apache.beam.sdk.schemas.utils.ByteBuddyUtils.ShortCircuitReturnNull; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.ByteBuddy; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.asm.AsmVisitorWrapper; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.method.MethodDescription; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.modifier.FieldManifestation; -import 
org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.modifier.Visibility; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.type.TypeDescription.ForLoadedType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.type.TypeDescription.Generic; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.DynamicType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.loading.ClassLoadingStrategy; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.scaffold.InstrumentedType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.Implementation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.Implementation.Context; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.ByteCodeAppender; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.ByteCodeAppender.Size; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.Duplication; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.Removal; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.StackManipulation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.StackSize; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.assign.TypeCasting; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.collection.ArrayAccess; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.constant.IntegerConstant; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.constant.NullConstant; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.FieldAccess; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodInvocation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodReturn; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodVariableAccess; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.jar.asm.ClassWriter; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.jar.asm.Label; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.jar.asm.MethodVisitor; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.jar.asm.Opcodes; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.matcher.ElementMatchers; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.ByteBuddy; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.asm.AsmVisitorWrapper; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.method.MethodDescription; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.modifier.FieldManifestation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.modifier.Visibility; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.type.TypeDescription.ForLoadedType; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.type.TypeDescription.Generic; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.DynamicType; +import 
org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.loading.ClassLoadingStrategy; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.scaffold.InstrumentedType; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.Implementation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.Implementation.Context; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.ByteCodeAppender; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.ByteCodeAppender.Size; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.Duplication; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.Removal; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.StackManipulation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.StackSize; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.assign.TypeCasting; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.collection.ArrayAccess; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.constant.IntegerConstant; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.constant.NullConstant; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.FieldAccess; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodInvocation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodReturn; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodVariableAccess; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.jar.asm.ClassWriter; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.jar.asm.Label; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.jar.asm.MethodVisitor; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.jar.asm.Opcodes; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.matcher.ElementMatchers; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/SelectHelpers.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/SelectHelpers.java index fa55509a5a9b..884ed683eded 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/SelectHelpers.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/SelectHelpers.java @@ -447,8 +447,15 @@ private static void allLeafFields( Map fieldsSelected) { for (Field field : schema.getFields()) { nameComponents.add(field.getName()); - if (field.getType().getTypeName().isCompositeType()) { - allLeafFields(field.getType().getRowSchema(), nameComponents, nameFn, fieldsSelected); + + FieldType fieldType = field.getType(); + FieldType collectionElementType = fieldType.getCollectionElementType(); + + if (fieldType.getTypeName().isCompositeType()) { + allLeafFields(fieldType.getRowSchema(), nameComponents, nameFn, fieldsSelected); + } else if (collectionElementType != null + && 
collectionElementType.getTypeName().isCompositeType()) { + allLeafFields(collectionElementType.getRowSchema(), nameComponents, nameFn, fieldsSelected); } else { String newName = nameFn.apply(nameComponents); fieldsSelected.put(String.join(".", nameComponents), newName); diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/state/MapState.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/state/MapState.java index f5f5dd2b4744..6c05ba8940a7 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/state/MapState.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/state/MapState.java @@ -18,6 +18,8 @@ package org.apache.beam.sdk.state; import java.util.Map; +import java.util.function.Function; +import javax.annotation.Nullable; import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.annotations.Experimental.Kind; @@ -54,8 +56,37 @@ public interface MapState extends State { *

    Changes will not be reflected in the results returned by previous calls to {@link * ReadableState#read} on the results any of the reading methods ({@link #get}, {@link #keys}, * {@link #values}, and {@link #entries}). + * + *

    Since the condition is not evaluated until {@link ReadableState#read} is called, a call to + * {@link #putIfAbsent} followed by a call to {@link #remove} followed by a read on the + * putIfAbsent return will result in the item being written to the map. Similarly, if there are + * multiple calls to {@link #putIfAbsent} for the same key, precedence will be given to the first + * one on which read is called. + */ + default ReadableState putIfAbsent(K key, V value) { + return computeIfAbsent(key, k -> value); + } + + /** + * A deferred read-followed-by-write. + * + *

    When {@code read()} is called on the result, or when the state is committed, it forces a read of the + * map and reconciliation with any pending modifications. + * + *

    If the specified key is not already associated with a value (or is mapped to {@code null}), + * associates it with the computed value and returns {@code null}; otherwise returns the current value. + * + *

    Changes will not be reflected in the results returned by previous calls to {@link + * ReadableState#read} on the results any of the reading methods ({@link #get}, {@link #keys}, + * {@link #values}, and {@link #entries}). + * + *

    Since the condition is not evaluated until {@link ReadableState#read} is called, a call to + * {@link #putIfAbsent} followed by a call to {@link #remove} followed by a read on the + * putIfAbsent return will result in the item being written to the map. Similarly, if there are + * multiple calls to {@link #putIfAbsent} for the same key, precedence will be given to the first + * one on which read is called. */ - ReadableState putIfAbsent(K key, V value); + ReadableState computeIfAbsent(K key, Function mappingFunction); /** * Remove the mapping for a key from this map if it is present. @@ -67,7 +98,7 @@ public interface MapState extends State { void remove(K key); /** - * A deferred lookup. + * A deferred lookup, using null values if the item is not found. * *

    A user is encouraged to call {@code get} for all relevant keys and call {@code readLater()} * on the results. @@ -77,6 +108,17 @@ public interface MapState extends State { */ ReadableState get(K key); + /** + * A deferred lookup. + * + *

    A user is encouraged to call {@code get} for all relevant keys and call {@code readLater()} + * on the results. + * + *

    When {@code read()} is called, a particular state implementation is encouraged to perform + * all pending reads in a single batch. + */ + ReadableState getOrDefault(K key, @Nullable V defaultValue); + /** Returns an {@link Iterable} over the keys contained in this map. */ ReadableState> keys(); @@ -85,4 +127,10 @@ public interface MapState extends State { /** Returns an {@link Iterable} over the key-value pairs contained in this map. */ ReadableState>> entries(); + + /** + * Returns a {@link ReadableState} whose {@link ReadableState#read} method will return true if + * this state is empty at the point when that {@link ReadableState#read} call returns. + */ + ReadableState isEmpty(); } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/state/StateSpecs.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/state/StateSpecs.java index 000ab549fe11..b812080f5acf 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/state/StateSpecs.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/state/StateSpecs.java @@ -23,6 +23,7 @@ import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.annotations.Experimental.Kind; import org.apache.beam.sdk.annotations.Internal; +import org.apache.beam.sdk.coders.BooleanCoder; import org.apache.beam.sdk.coders.CannotProvideCoderException; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.CoderRegistry; @@ -320,6 +321,25 @@ public static StateSpec> convertToBag } } + /** + * For internal use only; no backwards-compatibility guarantees. + * + *

    Convert a set state spec to a map-state spec. + */ + @Internal + public static StateSpec> convertToMapSpecInternal( + StateSpec> setStateSpec) { + if (setStateSpec instanceof SetStateSpec) { + // Checked above; conversion to a map spec depends on the provided spec being one of those + // created via the factory methods in this class. + @SuppressWarnings("unchecked") + SetStateSpec typedSpec = (SetStateSpec) setStateSpec; + return typedSpec.asMapSpec(); + } else { + throw new IllegalArgumentException("Unexpected StateSpec " + setStateSpec); + } + } + /** * A specification for a state cell holding a settable value of type {@code T}. * @@ -773,6 +793,10 @@ public boolean equals(@Nullable Object obj) { public int hashCode() { return Objects.hash(getClass(), elemCoder); } + + private StateSpec> asMapSpec() { + return new MapStateSpec<>(this.elemCoder, BooleanCoder.of()); + } } /** diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/state/Timer.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/state/Timer.java index 437df4d3fc32..78453ee7c1b6 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/state/Timer.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/state/Timer.java @@ -81,6 +81,9 @@ public interface Timer { */ void setRelative(); + /** Clears a timer. Previous set timers will become unset. */ + void clear(); + /** Offsets the target timestamp used by {@link #setRelative()} by the given duration. */ Timer offset(Duration offset); @@ -95,4 +98,11 @@ public interface Timer { * the timer fires. */ Timer withOutputTimestamp(Instant outputTime); + + /** + * Returns the current relative time used by {@link #setRelative()} and {@link #offset}. This can + * be used by a client that self-manages relative timers (e.g. one that stores the current timer + * time in a state variable. + */ + Instant getCurrentRelativeTime(); } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/CoderProperties.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/CoderProperties.java index 4e8195753381..4af1c47cc5ac 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/CoderProperties.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/CoderProperties.java @@ -17,12 +17,12 @@ */ package org.apache.beam.sdk.testing; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.emptyIterable; import static org.hamcrest.Matchers.equalTo; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import java.io.ByteArrayInputStream; import java.io.ByteArrayOutputStream; @@ -244,6 +244,21 @@ public static void structuralValueDecodeEncodeEqual(Coder coder, T value) } } + /** + * Verifies that for the given {@code Coder} and value of type {@code T}, the structural value + * of the content of the Iterable is equal to the structural value yield by encoding and decoding + * the original value. + * + *

    This is useful to test the correct implementation of a Coder structural equality with values + * that don't implement the equals contract. + */ + public static > void structuralValueDecodeEncodeEqualIterable( + Coder coder, T value) throws Exception { + for (Coder.Context context : ALL_CONTEXTS) { + CoderProperties.structuralValueDecodeEncodeEqualIterableInContext(coder, context, value); + } + } + /** * Verifies that for the given {@code Coder}, {@code Coder.Context}, and value of type {@code * T}, the structural value is equal to the structural value yield by encoding and decoding the @@ -255,6 +270,20 @@ public static void structuralValueDecodeEncodeEqualInContext( coder.structuralValue(value), coder.structuralValue(decodeEncode(coder, context, value))); } + /** + * Verifies that for the given {@code Coder}, {@code Coder.Context}, and value of type {@code + * T}, the structural content of the Iterable of the value is equal to the structural value yield + * by encoding and decoding the original value, in any {@code Coder.Context}. + */ + public static > void structuralValueDecodeEncodeEqualIterableInContext( + Coder coder, Coder.Context context, T value) throws Exception { + assertThat( + "The original value changed after, encoding and decoding.", + Iterables.elementsEqual( + (Iterable) coder.structuralValue(value), + (Iterable) coder.structuralValue(decodeEncode(coder, context, value)))); + } + private static final String DECODING_WIRE_FORMAT_MESSAGE = "Decoded value from known wire format does not match expected value." + " This probably means that this Coder no longer correctly decodes" diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/CombineFnTester.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/CombineFnTester.java index 05a55fe84547..2c4af78abd43 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/CombineFnTester.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/CombineFnTester.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.testing; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.is; -import static org.junit.Assert.assertThat; import java.util.ArrayList; import java.util.Arrays; diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/PAssert.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/PAssert.java index 5d625b5eeebf..fbf49dbea35e 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/PAssert.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/PAssert.java @@ -18,10 +18,10 @@ package org.apache.beam.sdk.testing; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.not; -import static org.junit.Assert.assertThat; import edu.umd.cs.findbugs.annotations.SuppressFBWarnings; import java.io.IOException; @@ -29,6 +29,7 @@ import java.util.Arrays; import java.util.Collection; import java.util.Collections; +import java.util.List; import java.util.Map; import org.apache.beam.sdk.Pipeline; import org.apache.beam.sdk.Pipeline.PipelineVisitor; @@ -512,8 +513,78 @@ public static SingletonAssert> thatMap( PAssertionSite.capture(reason)); } + /** + * Constructs an {@link PCollectionListContentsAssert} for the provided {@link PCollectionList}. 
+ */ + public static PCollectionListContentsAssert thatList(PCollectionList actual) { + return new PCollectionListContentsAssert<>(actual); + } + + /** + * Constructs an {@link IterableAssert} for the elements of the flattened {@link PCollectionList}. + */ + public static IterableAssert thatFlattened(PCollectionList actual) { + PCollection flatten = actual.apply(Flatten.pCollections()); + return that(flatten.getName(), flatten); + } + + /** + * Constructs an {@link IterableAssert} for the elements of the flattened {@link PCollectionList} + * with the specified reason. + */ + public static IterableAssert thatFlattened(String reason, PCollectionList actual) { + return new PCollectionContentsAssert<>( + actual.apply(Flatten.pCollections()), PAssertionSite.capture(reason)); + } + //////////////////////////////////////////////////////////// + /** + * An assert about the contents of each {@link PCollection} in the given {@link PCollectionList}. + */ + protected static class PCollectionListContentsAssert { + private final PCollectionList pCollectionList; + + public PCollectionListContentsAssert(PCollectionList actual) { + this.pCollectionList = actual; + } + + /** + * Applies one {@link SerializableFunction} to check the elements of each {@link PCollection} in + * the {@link PCollectionList}. + * + *

    Returns this {@code PCollectionListContentsAssert}. + */ + public PCollectionListContentsAssert satisfies( + SerializableFunction, Void> checkerFn) { + for (int i = 0; i < pCollectionList.size(); i++) { + PAssert.that(pCollectionList.get(i)).satisfies(checkerFn); + } + return this; + } + + /** + * Takes list of {@link SerializableFunction}s of the same size as {@link #pCollectionList}, and + * applies each matcher to the {@code PCollection} with the identical index in the {@link + * #pCollectionList}. + * + *

    Returns this {@code PCollectionListContentsAssert}. + */ + public PCollectionListContentsAssert satisfies( + List, Void>> checkerFnList) { + if (checkerFnList == null) { + throw new IllegalArgumentException("List of SerializableFunction must not be null"); + } else if (checkerFnList.size() != pCollectionList.size()) { + throw new IllegalArgumentException( + "List of SerializableFunction must be the same size as the PCollectionList"); + } + for (int i = 0; i < pCollectionList.size(); i++) { + PAssert.that(pCollectionList.get(i)).satisfies(checkerFnList.get(i)); + } + return this; + } + } + /** * An {@link IterableAssert} about the contents of a {@link PCollection}. This does not require * the runner to support side inputs. diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/SourceTestUtils.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/SourceTestUtils.java index c101e8c5f08e..ada42f7fb832 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/SourceTestUtils.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/SourceTestUtils.java @@ -18,12 +18,12 @@ package org.apache.beam.sdk.testing; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.equalTo; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertNotNull; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import edu.umd.cs.findbugs.annotations.SuppressFBWarnings; diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/TestPipeline.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/TestPipeline.java index f613d6b9bfb6..581c5f1e93d4 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/TestPipeline.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/TestPipeline.java @@ -334,6 +334,51 @@ public PipelineResult run() { return run(getOptions()); } + /** + * Runs this {@link TestPipeline} with additional cmd pipeline option args. + * + *

    This is useful when using {@link PipelineOptions#as(Class)} directly would introduce a circular + * dependency. + * + *

    Most of logic is similar to {@link #testingPipelineOptions}. + */ + public PipelineResult runWithAdditionalOptionArgs(List additionalArgs) { + try { + @Nullable + String beamTestPipelineOptions = System.getProperty(PROPERTY_BEAM_TEST_PIPELINE_OPTIONS); + PipelineOptions options; + if (Strings.isNullOrEmpty(beamTestPipelineOptions)) { + options = PipelineOptionsFactory.create(); + } else { + List args = MAPPER.readValue(beamTestPipelineOptions, List.class); + args.addAll(additionalArgs); + String[] newArgs = new String[args.size()]; + newArgs = args.toArray(newArgs); + options = PipelineOptionsFactory.fromArgs(newArgs).as(TestPipelineOptions.class); + } + + // If no options were specified, set some reasonable defaults + if (Strings.isNullOrEmpty(beamTestPipelineOptions)) { + // If there are no provided options, check to see if a dummy runner should be used. + String useDefaultDummy = System.getProperty(PROPERTY_USE_DEFAULT_DUMMY_RUNNER); + if (!Strings.isNullOrEmpty(useDefaultDummy) && Boolean.valueOf(useDefaultDummy)) { + options.setRunner(CrashingRunner.class); + } + } + options.setStableUniqueNames(CheckEnabled.ERROR); + + FileSystems.setDefaultPipelineOptions(options); + return run(options); + } catch (IOException e) { + throw new RuntimeException( + "Unable to instantiate test options from system property " + + PROPERTY_BEAM_TEST_PIPELINE_OPTIONS + + ":" + + System.getProperty(PROPERTY_BEAM_TEST_PIPELINE_OPTIONS), + e); + } + } + /** Like {@link #run} but with the given potentially modified options. */ @Override public PipelineResult run(PipelineOptions options) { diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/TestPipelineOptions.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/TestPipelineOptions.java index edd83f1d9648..3327ae8fc747 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/TestPipelineOptions.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/TestPipelineOptions.java @@ -54,6 +54,16 @@ public interface TestPipelineOptions extends PipelineOptions { void setTestTimeoutSeconds(Long value); + @Default.Boolean(true) + @org.apache.beam.sdk.options.Description( + "If the pipeline should block awaiting completion of the pipeline. If set to true, " + + "a call to Pipeline#run() will block until all PTransforms are complete. Otherwise, " + + "the Pipeline will execute asynchronously. If set to false, use " + + "PipelineResult#waitUntilFinish() to block until the Pipeline is complete.") + boolean isBlockOnRun(); + + void setBlockOnRun(boolean value); + /** Factory for {@link PipelineResult} matchers which always pass. */ class AlwaysPassMatcherFactory implements DefaultValueFactory> { diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/UsesLoopingTimer.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/UsesLoopingTimer.java new file mode 100644 index 000000000000..c74d8eb333ae --- /dev/null +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/UsesLoopingTimer.java @@ -0,0 +1,25 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.testing; + +import org.apache.beam.sdk.annotations.Internal; +import org.apache.beam.sdk.transforms.ParDo; + +/** Category tag for validation tests which utilize looping timers in {@link ParDo}. */ +@Internal +public interface UsesLoopingTimer {} diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/UsesPythonExpansionService.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/UsesPythonExpansionService.java new file mode 100644 index 000000000000..b92742e5db8b --- /dev/null +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/UsesPythonExpansionService.java @@ -0,0 +1,27 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.testing; + +import org.apache.beam.sdk.annotations.Internal; + +/** + * Category tag for tests which use the expansion service in Python. Tests tagged with {@link + * UsesPythonExpansionService} should be run for runners which support cross-language transforms. + */ +@Internal +public interface UsesPythonExpansionService {} diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/WindowFnTestUtils.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/WindowFnTestUtils.java index 61a209ba0055..10c2d2517b2f 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/WindowFnTestUtils.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/WindowFnTestUtils.java @@ -18,14 +18,11 @@ package org.apache.beam.sdk.testing; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.greaterThan; -import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; -import static org.junit.Assert.assertTrue; import java.util.ArrayList; import java.util.Collection; -import java.util.Comparator; import java.util.HashMap; import java.util.HashSet; import java.util.List; @@ -201,96 +198,6 @@ public Set get(W window) { } } - /** - * Assigns the given {@code timestamp} to windows using the specified {@code windowFn}, and - * verifies that result of {@code windowFn.getOutputTimestamp} for each window is within the - * proper bound. 
- */ - public static void validateNonInterferingOutputTimes( - WindowFn windowFn, long timestamp) throws Exception { - validateNonInterferingOutputTimesWithValue( - windowFn, TimestampedValue.of((T) null, new Instant(timestamp))); - } - /** - * Assigns the given {@code timestampedValue} to windows using the specified {@code windowFn}, and - * verifies that result of {@code windowFn.getOutputTimestamp} for each window is within the - * proper bound. This version allows passing a {@link TimestampedValue} in case the value is - * needed to assign windows. - */ - public static void validateNonInterferingOutputTimesWithValue( - WindowFn windowFn, TimestampedValue timestampedValue) throws Exception { - Collection windows = assignedWindowsWithValue(windowFn, timestampedValue); - - Instant instant = timestampedValue.getTimestamp(); - for (W window : windows) { - Instant outputTimestamp = windowFn.getOutputTime(instant, window); - assertFalse( - "getOutputTime must be greater than or equal to input timestamp", - outputTimestamp.isBefore(instant)); - assertFalse( - "getOutputTime must be less than or equal to the max timestamp", - outputTimestamp.isAfter(window.maxTimestamp())); - } - } - - /** - * Assigns the given {@code timestamp} to windows using the specified {@code windowFn}, and - * verifies that result of {@link WindowFn#getOutputTime windowFn.getOutputTime} for later windows - * (as defined by {@code maxTimestamp} won't prevent the watermark from passing the end of earlier - * windows. - * - *

    This verifies that overlapping windows don't interfere at all. Depending on the {@code - * windowFn} this may be stricter than desired. - */ - public static void validateGetOutputTimestamp( - WindowFn windowFn, long timestamp) throws Exception { - validateGetOutputTimestampWithValue( - windowFn, TimestampedValue.of((T) null, new Instant(timestamp))); - } - - /** - * Assigns the given {@code timestampedValue} to windows using the specified {@code windowFn}, and - * verifies that result of {@link WindowFn#getOutputTime windowFn.getOutputTime} for later windows - * (as defined by {@code maxTimestamp} won't prevent the watermark from passing the end of earlier - * windows. - * - *

    This verifies that overlapping windows don't interfere at all. Depending on the {@code - * windowFn} this may be stricter than desired. This version allows passing a {@link - * TimestampedValue} in case the value is needed to assign windows. - */ - public static void validateGetOutputTimestampWithValue( - WindowFn windowFn, TimestampedValue timestampedValue) throws Exception { - Collection windows = assignedWindowsWithValue(windowFn, timestampedValue); - List sortedWindows = new ArrayList<>(windows); - sortedWindows.sort(Comparator.comparing(BoundedWindow::maxTimestamp)); - - Instant instant = timestampedValue.getTimestamp(); - Instant endOfPrevious = null; - for (W window : sortedWindows) { - Instant outputTimestamp = windowFn.getOutputTime(instant, window); - if (endOfPrevious == null) { - // If this is the first window, the output timestamp can be anything, as long as it is in - // the valid range. - assertFalse( - "getOutputTime must be greater than or equal to input timestamp", - outputTimestamp.isBefore(instant)); - assertFalse( - "getOutputTime must be less than or equal to the max timestamp", - outputTimestamp.isAfter(window.maxTimestamp())); - } else { - // If this is a later window, the output timestamp must be after the end of the previous - // window - assertTrue( - "getOutputTime must be greater than the end of the previous window", - outputTimestamp.isAfter(endOfPrevious)); - assertFalse( - "getOutputTime must be less than or equal to the max timestamp", - outputTimestamp.isAfter(window.maxTimestamp())); - } - endOfPrevious = window.maxTimestamp(); - } - } - /** * Verifies that later-ending merged windows from any of the timestamps hold up output of * earlier-ending windows, using the provided {@link WindowFn} and {@link TimestampCombiner}. diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateUnique.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateUnique.java index c9430842018a..760883abac92 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateUnique.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateUnique.java @@ -43,20 +43,22 @@ * {@code PTransform}s for estimating the number of distinct elements in a {@code PCollection}, or * the number of distinct values associated with each key in a {@code PCollection} of {@code KV}s. * - *

    Consider using {@code HllCount} in the {@code zetasketch} extension module if you need better - * performance or need to save intermediate aggregation result into a sketch for later processing. - * - *

    For example, to estimate the number of distinct elements in a {@code PCollection}: - * - *

    {@code
    + * @deprecated
    + *     

    Consider using {@code ApproximateCountDistinct} in the {@code zetasketch} extension + * module, which makes use of the {@code HllCount} implementation. + *

    If {@code ApproximateCountDistinct} does not meet your needs then you can directly use + * {@code HllCount}. Direct usage will also give you access to save intermediate aggregation + * result into a sketch for later processing. + *

    For example, to estimate the number of distinct elements in a {@code PCollection}: + *

    {@code
      * PCollection input = ...;
      * PCollection countDistinct =
      *     input.apply(HllCount.Init.forStrings().globally()).apply(HllCount.Extract.globally());
      * }
    - * - * For more details about using {@code HllCount} and the {@code zetasketch} extension module, see - * https://s.apache.org/hll-in-beam#bookmark=id.v6chsij1ixo7. + * For more details about using {@code HllCount} and the {@code zetasketch} extension module, + * see https://s.apache.org/hll-in-beam#bookmark=id.v6chsij1ixo7. */ +@Deprecated public class ApproximateUnique { /** diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Combine.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Combine.java index 63b41910dab6..a3b56657877f 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Combine.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Combine.java @@ -17,7 +17,6 @@ */ package org.apache.beam.sdk.transforms; -import static org.apache.beam.sdk.options.ExperimentalOptions.hasExperiment; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; import java.io.IOException; @@ -47,7 +46,6 @@ import org.apache.beam.sdk.transforms.CombineWithContext.Context; import org.apache.beam.sdk.transforms.CombineWithContext.RequiresContextInternal; import org.apache.beam.sdk.transforms.View.CreatePCollectionView; -import org.apache.beam.sdk.transforms.View.VoidKeyToMultimapMaterialization; import org.apache.beam.sdk.transforms.display.DisplayData; import org.apache.beam.sdk.transforms.display.DisplayData.Builder; import org.apache.beam.sdk.transforms.display.HasDisplayData; @@ -1308,43 +1306,21 @@ private GloballyAsSingletonView( @Override public PCollectionView expand(PCollection input) { - // TODO(BEAM-10097): Make this the default expansion for all portable runners. - if (hasExperiment(input.getPipeline().getOptions(), "beam_fn_api") - && (hasExperiment(input.getPipeline().getOptions(), "use_runner_v2") - || hasExperiment(input.getPipeline().getOptions(), "use_unified_worker"))) { - PCollection combined = - input.apply( - "CombineValues", - Combine.globally(fn).withoutDefaults().withFanout(fanout)); - Coder outputCoder = combined.getCoder(); - PCollectionView view = - PCollectionViews.singletonView( - combined, - (TypeDescriptorSupplier) - () -> outputCoder != null ? outputCoder.getEncodedTypeDescriptor() : null, - input.getWindowingStrategy(), - insertDefault, - insertDefault ? fn.defaultValue() : null, - combined.getCoder()); - combined.apply("CreatePCollectionView", CreatePCollectionView.of(view)); - return view; - } - PCollection combined = - input.apply(Combine.globally(fn).withoutDefaults().withFanout(fanout)); - PCollection> materializationInput = - combined.apply(new VoidKeyToMultimapMaterialization<>()); + input.apply( + "CombineValues", + Combine.globally(fn).withoutDefaults().withFanout(fanout)); Coder outputCoder = combined.getCoder(); PCollectionView view = - PCollectionViews.singletonViewUsingVoidKey( - materializationInput, + PCollectionViews.singletonView( + combined, (TypeDescriptorSupplier) () -> outputCoder != null ? outputCoder.getEncodedTypeDescriptor() : null, input.getWindowingStrategy(), insertDefault, insertDefault ? 
fn.defaultValue() : null, combined.getCoder()); - materializationInput.apply(CreatePCollectionView.of(view)); + combined.apply("CreatePCollectionView", CreatePCollectionView.of(view)); return view; } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Create.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Create.java index 20d123a45150..3ed4a3b706c9 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Create.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Create.java @@ -24,6 +24,7 @@ import java.util.ArrayList; import java.util.Arrays; import java.util.Collection; +import java.util.Deque; import java.util.Iterator; import java.util.List; import java.util.Map; @@ -38,6 +39,7 @@ import org.apache.beam.sdk.coders.CoderException; import org.apache.beam.sdk.coders.CoderRegistry; import org.apache.beam.sdk.coders.CollectionCoder; +import org.apache.beam.sdk.coders.DequeCoder; import org.apache.beam.sdk.coders.IterableCoder; import org.apache.beam.sdk.coders.KvCoder; import org.apache.beam.sdk.coders.ListCoder; @@ -794,6 +796,8 @@ private static Coder inferCoderFromObject( return ListCoder.of(inferCoderFromObjects(coderRegistry, schemaRegistry, (Iterable) o)); } else if (o instanceof Set) { return SetCoder.of(inferCoderFromObjects(coderRegistry, schemaRegistry, (Iterable) o)); + } else if (o instanceof Deque) { + return DequeCoder.of(inferCoderFromObjects(coderRegistry, schemaRegistry, (Iterable) o)); } else if (o instanceof Collection) { return CollectionCoder.of(inferCoderFromObjects(coderRegistry, schemaRegistry, (Iterable) o)); } else if (o instanceof Iterable) { diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Deduplicate.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Deduplicate.java index 42b9e96f71c1..6b78056bdddf 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Deduplicate.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Deduplicate.java @@ -27,6 +27,7 @@ import org.apache.beam.sdk.state.TimerSpec; import org.apache.beam.sdk.state.TimerSpecs; import org.apache.beam.sdk.state.ValueState; +import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.TypeDescriptor; @@ -304,13 +305,17 @@ private DeduplicateFn(TimeDomain timeDomain, Duration duration) { @ProcessElement public void processElement( @Element KV element, + BoundedWindow window, OutputReceiver> receiver, @StateId(SEEN_STATE) ValueState seenState, @TimerId(EXPIRY_TIMER) Timer expiryTimer) { Boolean seen = seenState.read(); // Seen state is either set or not set so if it has been set then it must be true. if (seen == null) { - expiryTimer.offset(duration).setRelative(); + // We don't want the expiry timer to hold up watermarks, so we set its output timestamp to + // the end of the + // window. 
+ expiryTimer.offset(duration).withOutputTimestamp(window.maxTimestamp()).setRelative(); seenState.write(true); receiver.output(element); } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Distinct.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Distinct.java index dab147c6a4aa..a8cfebc51946 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Distinct.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Distinct.java @@ -88,7 +88,7 @@ public static WithRepresentativeValues withRepresentativeValueF private static void validateWindowStrategy( WindowingStrategy strategy) { - if (!strategy.getWindowFn().isNonMerging() + if (strategy.needsMerge() && (!strategy.getTrigger().getClass().equals(DefaultTrigger.class) || strategy.getAllowedLateness().isLongerThan(Duration.ZERO))) { throw new UnsupportedOperationException( diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java index 7e27b5fa55f0..47eac66c9af8 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java @@ -872,7 +872,10 @@ public interface MultiOutputReceiver { * crash, hardware failure, etc.) or unnecessary (e.g. the pipeline is shutting down and the * process is about to be killed anyway, so all transient resources will be released automatically * by the OS). In these cases, the call may not happen. It will also not be retried, because in - * such situations the DoFn instance no longer exists, so there's no instance to retry it on. + * such situations the DoFn instance no longer exists, so there's no instance to retry it on. In + * portable execution(with {@code --experiments=beam_fn_api}), the exception thrown calling {@link + * Teardown} will not fail the bundle execution. Instead, an error message will be shown on sdk + * harness log. * *
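The added Javadoc above notes that under portable execution a failure in Teardown only surfaces in the SDK harness log instead of failing the bundle. A minimal sketch of what that implies for user code follows; MyServiceClient and its connect/writeAll/close methods are hypothetical stand-ins, not part of this change:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.transforms.DoFn;

// Hypothetical DoFn following the guidance above: externally important work is done in
// @FinishBundle, which runs before a bundle's results are committed; @Teardown is only
// best-effort cleanup and may never be called.
class WriteToServiceFn extends DoFn<String, Void> {
  private transient MyServiceClient client; // hypothetical client type
  private transient List<String> buffered;

  @Setup
  public void setup() {
    client = MyServiceClient.connect(); // hypothetical call
    buffered = new ArrayList<>();
  }

  @ProcessElement
  public void processElement(@Element String element) {
    buffered.add(element);
  }

  @FinishBundle
  public void finishBundle() {
    client.writeAll(buffered); // side effects that matter belong here or in @ProcessElement
    buffered.clear();
  }

  @Teardown
  public void teardown() {
    client.close(); // best effort only; never rely on this for correctness
  }
}
```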

    Thus, all work that depends on input elements, and all externally important side effects, * must be performed in the {@link ProcessElement} or {@link FinishBundle} methods. @@ -972,7 +975,7 @@ public interface MultiOutputReceiver { * options for the current pipeline. * * - *

    Returns a double representing the size of the current element and restriction. + *

    Returns a non-negative double representing the size of the current element and restriction. * *
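The hunk above tightens the GetSize contract from "a double" to "a non-negative double". Purely as an illustration (this splittable DoFn is hypothetical, not part of the patch), a size method usually reports the remaining work in the current restriction:

```java
import org.apache.beam.sdk.io.range.OffsetRange;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker;

// Hypothetical splittable DoFn that emits every offset in [0, element).
class EmitOffsetsFn extends DoFn<Long, Long> {

  @GetInitialRestriction
  public OffsetRange getInitialRestriction(@Element Long element) {
    return new OffsetRange(0, element);
  }

  // Reports a non-negative size: the number of offsets still to be processed.
  @GetSize
  public double getSize(@Restriction OffsetRange restriction) {
    return Math.max(0, restriction.getTo() - restriction.getFrom());
  }

  @ProcessElement
  public void processElement(
      RestrictionTracker<OffsetRange, Long> tracker, OutputReceiver<Long> out) {
    for (long i = tracker.currentRestriction().getFrom(); tracker.tryClaim(i); ++i) {
      out.output(i);
    }
  }
}
```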

    Splittable {@link DoFn}s should only provide this method if the default {@link * RestrictionTracker.HasProgress} implementation within the {@link RestrictionTracker} is an diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFnTester.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFnTester.java index a06d5c53a231..e1acb0dba0a3 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFnTester.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFnTester.java @@ -723,7 +723,19 @@ private void initializeState() throws Exception { SerializableUtils.serializeToByteArray(origFn), origFn.toString()); } fnInvoker = DoFnInvokers.invokerFor(fn); - fnInvoker.invokeSetup(); + fnInvoker.invokeSetup(new TestSetupArgumentProvider()); + } + + private class TestSetupArgumentProvider extends BaseArgumentProvider { + @Override + public PipelineOptions pipelineOptions() { + return options; + } + + @Override + public String getErrorContext() { + return "DoFnTester/Setup"; + } } private Map getOutputs() { diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupByKey.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupByKey.java index 62c795a4d1f7..6dc7aaa3e3fd 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupByKey.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupByKey.java @@ -26,7 +26,6 @@ import org.apache.beam.sdk.transforms.windowing.AfterWatermark.FromEndOfWindow; import org.apache.beam.sdk.transforms.windowing.DefaultTrigger; import org.apache.beam.sdk.transforms.windowing.GlobalWindows; -import org.apache.beam.sdk.transforms.windowing.InvalidWindows; import org.apache.beam.sdk.transforms.windowing.Never.NeverTrigger; import org.apache.beam.sdk.transforms.windowing.TimestampCombiner; import org.apache.beam.sdk.transforms.windowing.Window; @@ -158,13 +157,6 @@ public static void applicableTo(PCollection input) { + " trigger. Use a Window.into or Window.triggering transform prior to GroupByKey."); } - // Validate the window merge function. - if (windowingStrategy.getWindowFn() instanceof InvalidWindows) { - String cause = ((InvalidWindows) windowingStrategy.getWindowFn()).getCause(); - throw new IllegalStateException( - "GroupByKey must have a valid Window merge function. " + "Invalid because: " + cause); - } - // Validate that the trigger does not finish before garbage collection time if (!triggerIsSafe(windowingStrategy)) { throw new IllegalArgumentException( @@ -208,18 +200,10 @@ private static boolean triggerIsSafe(WindowingStrategy windowingStrategy) } public WindowingStrategy updateWindowingStrategy(WindowingStrategy inputStrategy) { - WindowFn inputWindowFn = inputStrategy.getWindowFn(); - if (!inputWindowFn.isNonMerging()) { - // Prevent merging windows again, without explicit user - // involvement, e.g., by Window.into() or Window.remerge(). - inputWindowFn = - new InvalidWindows<>( - "WindowFn has already been consumed by previous GroupByKey", inputWindowFn); - } - - // We also switch to the continuation trigger associated with the current trigger. + // If the WindowFn was merging, set the bit to indicate it is already merged. + // Switch to the continuation trigger associated with the current trigger. 
return inputStrategy - .withWindowFn(inputWindowFn) + .withAlreadyMerged(!inputStrategy.getWindowFn().isNonMerging()) .withTrigger(inputStrategy.getTrigger().getContinuationTrigger()); } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupIntoBatches.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupIntoBatches.java index 59f74ba54631..f5fbf5d3bccb 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupIntoBatches.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupIntoBatches.java @@ -19,6 +19,8 @@ import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import com.google.auto.value.AutoValue; +import java.io.Serializable; import java.nio.ByteBuffer; import java.util.UUID; import javax.annotation.Nullable; @@ -36,9 +38,12 @@ import org.apache.beam.sdk.state.ValueState; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.util.ShardedKey; +import org.apache.beam.sdk.util.common.ElementByteSizeObserver; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.joda.time.Duration; import org.joda.time.Instant; @@ -49,10 +54,19 @@ * A {@link PTransform} that batches inputs to a desired batch size. Batches will contain only * elements of a single key. * - *

    Elements are buffered until there are {@code batchSize} elements, at which point they are + *

    Elements are buffered until there are enough elements for a batch, at which point they are * emitted to the output {@link PCollection}. A {@code maxBufferingDuration} can be set to emit * output early and avoid waiting for a full batch forever. * + *

    Batches can be triggered either based on element count or byte size. {@link #ofSize} is used + * to specify a maximum element count while {@link #ofByteSize} is used to specify a maximum byte + * size. The single-argument {@link #ofByteSize} uses the input coder to determine the encoded byte + * size of each element. However, this may not always be what is desired. A user may want to control + * batching based on a different byte size (e.g. the memory usage of the decoded Java object) or the + * input coder may not be able to efficiently determine the elements' byte size. For these cases, we + * also provide the two-argument {@link #ofByteSize} allowing the user to pass in a function to be + * used to determine the byte size of an element. + * *
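To make the new batching surface concrete, here is a small hypothetical pipeline sketch (element values, sizes, and step names are illustrative) exercising ofSize with withMaxBufferingDuration and the two-argument ofByteSize:

```java
import java.util.Arrays;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.GroupIntoBatches;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class GroupIntoBatchesSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    PCollection<KV<String, String>> events =
        p.apply(Create.of(Arrays.asList(KV.of("user-1", "click"), KV.of("user-1", "view"))));

    // Batch by element count; flush an incomplete batch after 10 seconds of buffering.
    PCollection<KV<String, Iterable<String>>> byCount =
        events.apply(
            "BatchByCount",
            GroupIntoBatches.<String, String>ofSize(100)
                .withMaxBufferingDuration(Duration.standardSeconds(10)));

    // Batch by byte size, supplying a weigher instead of relying on the value coder.
    PCollection<KV<String, Iterable<String>>> byBytes =
        events.apply(
            "BatchByBytes",
            GroupIntoBatches.<String, String>ofByteSize(1 << 20, s -> (long) s.length()));

    p.run().waitUntilFinish();
  }
}
```

Each factory leaves the other limit at Long.MAX_VALUE, so a given transform flushes on whichever of its configured thresholds (count, bytes, or buffering time) is reached first.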

    Windows are preserved (batches contain elements from the same window). Batches may contain * elements from more than one bundle. * @@ -94,32 +108,116 @@ public class GroupIntoBatches extends PTransform>, PCollection>>> { - private final long batchSize; - @Nullable private final Duration maxBufferingDuration; + /** + * Wrapper class for batching parameters supplied by users. Shared by both {@link + * GroupIntoBatches} and {@link GroupIntoBatches.WithShardedKey}. + */ + @AutoValue + public abstract static class BatchingParams implements Serializable { + public static BatchingParams create( + long batchSize, + long batchSizeBytes, + SerializableFunction elementByteSize, + Duration maxBufferingDuration) { + return new AutoValue_GroupIntoBatches_BatchingParams( + batchSize, batchSizeBytes, elementByteSize, maxBufferingDuration); + } + + public abstract long getBatchSize(); + + public abstract long getBatchSizeBytes(); + + @Nullable + public abstract SerializableFunction getElementByteSize(); + + public abstract Duration getMaxBufferingDuration(); + + public SerializableFunction getWeigher(Coder valueCoder) { + SerializableFunction weigher = getElementByteSize(); + if (getBatchSizeBytes() < Long.MAX_VALUE) { + if (weigher == null) { + // If the user didn't specify a byte-size function, then use the Coder to determine the + // byte + // size. + // Note: if Coder.isRegisterByteSizeObserverCheap == false, then this will be expensive. + weigher = + (InputT element) -> { + try { + ByteSizeObserver observer = new ByteSizeObserver(); + valueCoder.registerByteSizeObserver(element, observer); + observer.advance(); + return observer.getElementByteSize(); + } catch (Exception e) { + throw new RuntimeException(e); + } + }; + } + } + return weigher; + } + } + + private final BatchingParams params; private static final UUID workerUuid = UUID.randomUUID(); - private GroupIntoBatches(long batchSize, @Nullable Duration maxBufferingDuration) { - this.batchSize = batchSize; - this.maxBufferingDuration = maxBufferingDuration; + private GroupIntoBatches(BatchingParams params) { + this.params = params; } + /** Aim to create batches each with the specified element count. */ public static GroupIntoBatches ofSize(long batchSize) { - return new GroupIntoBatches<>(batchSize, null); + Preconditions.checkState(batchSize < Long.MAX_VALUE); + return new GroupIntoBatches<>( + BatchingParams.create(batchSize, Long.MAX_VALUE, null, Duration.ZERO)); + } + + /** + * Aim to create batches each with the specified byte size. + * + *

    This option uses the PCollection's coder to determine the byte size of each element. This + * may not always be what is desired (e.g. the encoded size is not the same as the memory usage of + * the Java object). This is also only recommended if the coder returns true for + * isRegisterByteSizeObserverCheap, otherwise the transform will perform a possibly-expensive + * encoding of each element in order to measure its byte size. An alternate approach is to use + * {@link #ofByteSize(long, SerializableFunction)} to specify code to calculate the byte size. + */ + public static GroupIntoBatches ofByteSize(long batchSizeBytes) { + Preconditions.checkState(batchSizeBytes < Long.MAX_VALUE); + return new GroupIntoBatches<>( + BatchingParams.create(Long.MAX_VALUE, batchSizeBytes, null, Duration.ZERO)); } - /** Returns the size of the batch. */ - public long getBatchSize() { - return batchSize; + /** + * Aim to create batches each with the specified byte size. The provided function is used to + * determine the byte size of each element. + */ + public static GroupIntoBatches ofByteSize( + long batchSizeBytes, SerializableFunction getElementByteSize) { + Preconditions.checkState(batchSizeBytes < Long.MAX_VALUE); + return new GroupIntoBatches<>( + BatchingParams.create(Long.MAX_VALUE, batchSizeBytes, getElementByteSize, Duration.ZERO)); + } + + /** Returns user supplied parameters for batching. */ + public BatchingParams getBatchingParams() { + return params; } /** * Sets a time limit (in processing time) on how long an incomplete batch of elements is allowed - * to be buffered. Once a batch is flushed to output, the timer is reset. + * to be buffered. Once a batch is flushed to output, the timer is reset. The provided limit must + * be a positive duration or zero; a zero buffering duration effectively means no limit. */ public GroupIntoBatches withMaxBufferingDuration(Duration duration) { checkArgument( - duration.isLongerThan(Duration.ZERO), "max buffering duration should be a positive value"); - return new GroupIntoBatches<>(batchSize, duration); + duration != null && !duration.isShorterThan(Duration.ZERO), + "max buffering duration should be a non-negative value"); + return new GroupIntoBatches<>( + BatchingParams.create( + params.getBatchSize(), + params.getBatchSizeBytes(), + params.getElementByteSize(), + duration)); } /** @@ -138,16 +236,14 @@ public class WithShardedKey PCollection>, PCollection, Iterable>>> { private WithShardedKey() {} - /** Returns the size of the batch. */ - public long getBatchSize() { - return batchSize; + /** Returns user supplied parameters for batching. 
*/ + public BatchingParams getBatchingParams() { + return params; } @Override public PCollection, Iterable>> expand( PCollection> input) { - Duration allowedLateness = input.getWindowingStrategy().getAllowedLateness(); - checkArgument( input.getCoder() instanceof KvCoder, "coder specified in the input PCollection is not a KvCoder"); @@ -170,17 +266,23 @@ public KV, InputT> apply(KV input) { } })) .setCoder(KvCoder.of(ShardedKey.Coder.of(keyCoder), valueCoder)) - .apply( - ParDo.of( - new GroupIntoBatchesDoFn<>( - batchSize, - allowedLateness, - maxBufferingDuration, - ShardedKey.Coder.of(keyCoder), - valueCoder))); + .apply(new GroupIntoBatches<>(getBatchingParams())); } } + private static class ByteSizeObserver extends ElementByteSizeObserver { + private long elementByteSize = 0; + + @Override + protected void reportElementSize(long elementByteSize) { + this.elementByteSize += elementByteSize; + } + + public long getElementByteSize() { + return this.elementByteSize; + } + }; + @Override public PCollection>> expand(PCollection> input) { Duration allowedLateness = input.getWindowingStrategy().getAllowedLateness(); @@ -189,13 +291,18 @@ public PCollection>> expand(PCollection> in input.getCoder() instanceof KvCoder, "coder specified in the input PCollection is not a KvCoder"); KvCoder inputCoder = (KvCoder) input.getCoder(); - Coder keyCoder = (Coder) inputCoder.getCoderArguments().get(0); - Coder valueCoder = (Coder) inputCoder.getCoderArguments().get(1); + final Coder valueCoder = (Coder) inputCoder.getCoderArguments().get(1); + SerializableFunction weigher = params.getWeigher(valueCoder); return input.apply( ParDo.of( new GroupIntoBatchesDoFn<>( - batchSize, allowedLateness, maxBufferingDuration, keyCoder, valueCoder))); + params.getBatchSize(), + params.getBatchSizeBytes(), + weigher, + allowedLateness, + params.getMaxBufferingDuration(), + valueCoder))); } @VisibleForTesting @@ -203,93 +310,170 @@ private static class GroupIntoBatchesDoFn extends DoFn, KV>> { private static final Logger LOG = LoggerFactory.getLogger(GroupIntoBatchesDoFn.class); - private static final String END_OF_WINDOW_ID = "endOFWindow"; - private static final String END_OF_BUFFERING_ID = "endOfBuffering"; - private static final String BATCH_ID = "batch"; - private static final String NUM_ELEMENTS_IN_BATCH_ID = "numElementsInBatch"; - private static final String KEY_ID = "key"; private final long batchSize; + private final long batchSizeBytes; + @Nullable private final SerializableFunction weigher; private final Duration allowedLateness; private final Duration maxBufferingDuration; + // The following timer is no longer set. We maintain the spec for update compatibility. + private static final String END_OF_WINDOW_ID = "endOFWindow"; + @TimerId(END_OF_WINDOW_ID) private final TimerSpec windowTimer = TimerSpecs.timer(TimeDomain.EVENT_TIME); + // This timer expires when it's time to batch and output the buffered data. + private static final String END_OF_BUFFERING_ID = "endOfBuffering"; + @TimerId(END_OF_BUFFERING_ID) private final TimerSpec bufferingTimer = TimerSpecs.timer(TimeDomain.PROCESSING_TIME); + // The set of elements that will go in the next batch. + private static final String BATCH_ID = "batch"; + @StateId(BATCH_ID) private final StateSpec> batchSpec; + // The size of the current batch. 
+ private static final String NUM_ELEMENTS_IN_BATCH_ID = "numElementsInBatch"; + @StateId(NUM_ELEMENTS_IN_BATCH_ID) - private final StateSpec> numElementsInBatchSpec; + private final StateSpec> batchSizeSpec; + + private static final String NUM_BYTES_IN_BATCH_ID = "numBytesInBatch"; + + // The byte size of the current batch. + @StateId(NUM_BYTES_IN_BATCH_ID) + private final StateSpec> batchSizeBytesSpec; + + private static final String TIMER_TIMESTAMP = "timerTs"; - @StateId(KEY_ID) - private final StateSpec> keySpec; + // The timestamp of the current active timer. + @StateId(TIMER_TIMESTAMP) + private final StateSpec> timerTsSpec; + + // The minimum element timestamp currently buffered in the bag. This is used to set the output + // timestamp + // on the timer which ensures that the watermark correctly tracks the buffered elements. + private static final String MIN_BUFFERED_TS = "minBufferedTs"; + + @StateId(MIN_BUFFERED_TS) + private final StateSpec> minBufferedTsSpec; private final long prefetchFrequency; GroupIntoBatchesDoFn( long batchSize, + long batchSizeBytes, + @Nullable SerializableFunction weigher, Duration allowedLateness, Duration maxBufferingDuration, - Coder inputKeyCoder, Coder inputValueCoder) { this.batchSize = batchSize; + this.batchSizeBytes = batchSizeBytes; + this.weigher = weigher; this.allowedLateness = allowedLateness; this.maxBufferingDuration = maxBufferingDuration; this.batchSpec = StateSpecs.bag(inputValueCoder); - this.numElementsInBatchSpec = - StateSpecs.combining( - new Combine.BinaryCombineLongFn() { - @Override - public long identity() { - return 0L; - } + Combine.BinaryCombineLongFn sumCombineFn = + new Combine.BinaryCombineLongFn() { + @Override + public long identity() { + return 0L; + } + + @Override + public long apply(long left, long right) { + return left + right; + } + }; + + Combine.BinaryCombineLongFn minCombineFn = + new Combine.BinaryCombineLongFn() { + @Override + public long identity() { + return BoundedWindow.TIMESTAMP_MAX_VALUE.getMillis(); + } + + @Override + public long apply(long left, long right) { + return Math.min(left, right); + } + }; + + this.batchSizeSpec = StateSpecs.combining(sumCombineFn); + this.batchSizeBytesSpec = StateSpecs.combining(sumCombineFn); + this.timerTsSpec = StateSpecs.value(); + this.minBufferedTsSpec = StateSpecs.combining(minCombineFn); - @Override - public long apply(long left, long right) { - return left + right; - } - }); - - this.keySpec = StateSpecs.value(inputKeyCoder); // Prefetch every 20% of batchSize elements. Do not prefetch if batchSize is too little this.prefetchFrequency = ((batchSize / 5) <= 1) ? 
Long.MAX_VALUE : (batchSize / 5); } @ProcessElement public void processElement( - @TimerId(END_OF_WINDOW_ID) Timer windowTimer, @TimerId(END_OF_BUFFERING_ID) Timer bufferingTimer, @StateId(BATCH_ID) BagState batch, - @StateId(NUM_ELEMENTS_IN_BATCH_ID) CombiningState numElementsInBatch, - @StateId(KEY_ID) ValueState key, + @StateId(NUM_ELEMENTS_IN_BATCH_ID) CombiningState storedBatchSize, + @StateId(NUM_BYTES_IN_BATCH_ID) CombiningState storedBatchSizeBytes, + @StateId(TIMER_TIMESTAMP) ValueState timerTs, + @StateId(MIN_BUFFERED_TS) CombiningState minBufferedTs, @Element KV element, + @Timestamp Instant elementTs, BoundedWindow window, OutputReceiver>> receiver) { - Instant windowEnds = window.maxTimestamp().plus(allowedLateness); - LOG.debug("*** SET TIMER *** to point in time {} for window {}", windowEnds, window); - windowTimer.set(windowEnds); - key.write(element.getKey()); LOG.debug("*** BATCH *** Add element for window {} ", window); batch.add(element.getValue()); // Blind add is supported with combiningState - numElementsInBatch.add(1L); + storedBatchSize.add(1L); + if (weigher != null) { + storedBatchSizeBytes.add(weigher.apply(element.getValue())); + storedBatchSizeBytes.readLater(); + } - long num = numElementsInBatch.read(); - if (num == 1 && maxBufferingDuration != null) { - // This is the first element in batch. Start counting buffering time if a limit was set. - bufferingTimer.offset(maxBufferingDuration).setRelative(); + long num; + if (maxBufferingDuration.isLongerThan(Duration.ZERO)) { + minBufferedTs.readLater(); + num = storedBatchSize.read(); + + long oldOutputTs = + MoreObjects.firstNonNull( + minBufferedTs.read(), BoundedWindow.TIMESTAMP_MAX_VALUE.getMillis()); + minBufferedTs.add(elementTs.getMillis()); + // If this is the first element in the batch or if the timer's output timestamp needs + // modifying, then set a + // timer. 
+ if (num == 1 || minBufferedTs.read() != oldOutputTs) { + long targetTs = + MoreObjects.firstNonNull( + timerTs.read(), + bufferingTimer.getCurrentRelativeTime().getMillis() + + maxBufferingDuration.getMillis()); + bufferingTimer + .withOutputTimestamp(Instant.ofEpochMilli(minBufferedTs.read())) + .set(Instant.ofEpochMilli(targetTs)); + } } + num = storedBatchSize.read(); + if (num % prefetchFrequency == 0) { // Prefetch data and modify batch state (readLater() modifies this) batch.readLater(); } - if (num >= batchSize) { + + if (num >= batchSize + || (batchSizeBytes != Long.MAX_VALUE && storedBatchSizeBytes.read() >= batchSizeBytes)) { LOG.debug("*** END OF BATCH *** for window {}", window.toString()); - flushBatch(receiver, key, batch, numElementsInBatch, bufferingTimer); + flushBatch( + receiver, + element.getKey(), + batch, + storedBatchSize, + storedBatchSizeBytes, + timerTs, + minBufferedTs); + bufferingTimer.clear(); } } @@ -297,54 +481,84 @@ public void processElement( public void onBufferingTimer( OutputReceiver>> receiver, @Timestamp Instant timestamp, - @StateId(KEY_ID) ValueState key, + @Key K key, @StateId(BATCH_ID) BagState batch, - @StateId(NUM_ELEMENTS_IN_BATCH_ID) CombiningState numElementsInBatch, + @StateId(NUM_ELEMENTS_IN_BATCH_ID) CombiningState storedBatchSize, + @StateId(NUM_BYTES_IN_BATCH_ID) CombiningState storedBatchSizeBytes, + @StateId(TIMER_TIMESTAMP) ValueState timerTs, + @StateId(MIN_BUFFERED_TS) CombiningState minBufferedTs, @TimerId(END_OF_BUFFERING_ID) Timer bufferingTimer) { LOG.debug( "*** END OF BUFFERING *** for timer timestamp {} with buffering duration {}", timestamp, maxBufferingDuration); - flushBatch(receiver, key, batch, numElementsInBatch, null); + flushBatch( + receiver, key, batch, storedBatchSize, storedBatchSizeBytes, timerTs, minBufferedTs); + } + + @OnWindowExpiration + public void onWindowExpiration( + OutputReceiver>> receiver, + @Key K key, + @StateId(BATCH_ID) BagState batch, + @StateId(NUM_ELEMENTS_IN_BATCH_ID) CombiningState storedBatchSize, + @StateId(NUM_BYTES_IN_BATCH_ID) CombiningState storedBatchSizeBytes, + @StateId(TIMER_TIMESTAMP) ValueState timerTs, + @StateId(MIN_BUFFERED_TS) CombiningState minBufferedTs) { + flushBatch( + receiver, key, batch, storedBatchSize, storedBatchSizeBytes, timerTs, minBufferedTs); } + // We no longer set this timer, since OnWindowExpiration takes care of his. However we leave the + // callback in place + // for existing jobs that have already set these timers. 
@OnTimer(END_OF_WINDOW_ID) public void onWindowTimer( OutputReceiver>> receiver, @Timestamp Instant timestamp, - @StateId(KEY_ID) ValueState key, + @Key K key, @StateId(BATCH_ID) BagState batch, - @StateId(NUM_ELEMENTS_IN_BATCH_ID) CombiningState numElementsInBatch, - @TimerId(END_OF_BUFFERING_ID) Timer bufferingTimer, + @StateId(NUM_ELEMENTS_IN_BATCH_ID) CombiningState storedBatchSize, + @StateId(NUM_BYTES_IN_BATCH_ID) CombiningState storedBatchSizeBytes, + @StateId(TIMER_TIMESTAMP) ValueState timerTs, + @StateId(MIN_BUFFERED_TS) CombiningState minBufferedTs, BoundedWindow window) { LOG.debug( "*** END OF WINDOW *** for timer timestamp {} in windows {}", timestamp, window.toString()); - flushBatch(receiver, key, batch, numElementsInBatch, bufferingTimer); + flushBatch( + receiver, key, batch, storedBatchSize, storedBatchSizeBytes, timerTs, minBufferedTs); } private void flushBatch( OutputReceiver>> receiver, - ValueState key, + K key, BagState batch, - CombiningState numElementsInBatch, - @Nullable Timer bufferingTimer) { + CombiningState storedBatchSize, + CombiningState storedBatchSizeBytes, + ValueState timerTs, + CombiningState minBufferedTs) { Iterable values = batch.read(); // When the timer fires, batch state might be empty if (!Iterables.isEmpty(values)) { - receiver.output(KV.of(key.read(), values)); + receiver.output(KV.of(key, values)); } + clearState(batch, storedBatchSize, storedBatchSizeBytes, timerTs, minBufferedTs); + ; + } + + private void clearState( + BagState batch, + CombiningState storedBatchSize, + CombiningState storedBatchSizeBytes, + ValueState timerTs, + CombiningState minBufferedTs) { batch.clear(); - LOG.debug("*** BATCH *** clear"); - numElementsInBatch.clear(); - // We might reach here due to batch size being reached or window expiration. Reset the - // buffering timer (if not null) since the state is empty now. It'll be extended again if a - // new element arrives prior to the expiration time set here. - // TODO(BEAM-10887): Use clear() when it's available. - if (bufferingTimer != null && maxBufferingDuration != null) { - bufferingTimer.offset(maxBufferingDuration).setRelative(); - } + storedBatchSize.clear(); + storedBatchSizeBytes.clear(); + timerTs.clear(); + minBufferedTs.clear(); } } } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/MapKeys.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/MapKeys.java new file mode 100644 index 000000000000..579bbc6663e7 --- /dev/null +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/MapKeys.java @@ -0,0 +1,163 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.transforms; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import org.apache.beam.sdk.transforms.Contextful.Fn; +import org.apache.beam.sdk.transforms.WithFailures.ExceptionElement; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.TypeDescriptor; +import org.apache.beam.sdk.values.TypeParameter; +import org.checkerframework.checker.nullness.qual.Nullable; +import org.checkerframework.checker.nullness.qual.RequiresNonNull; + +/** + * {@code MapKeys} maps a {@code SerializableFunction} over keys of a {@code + * PCollection>} and returns a {@code PCollection>}. + * + *

    Example of use: + * + *

    {@code
+ * PCollection<KV<Integer, String>> input = ...;
+ * PCollection<KV<Double, String>> output =
+ *      input.apply(MapKeys.into(TypeDescriptors.doubles()).via(Integer::doubleValue));
    + * }
    + * + *

    See also {@link MapValues}. + * + * @param the type of the keys in the input {@code PCollection} + * @param the type of the keys in the output {@code PCollection} + */ +public class MapKeys extends PTransform>, PCollection>> { + + private final transient TypeDescriptor outputType; + private final @Nullable Contextful, KV>> fn; + + /** + * Returns a {@code MapKeys} {@code PTransform} for a {@code ProcessFunction} with predefined {@link #outputType}. + * + * @param the type of the keys in the input {@code PCollection} + * @param the type of the values in the input and output {@code PCollection}s + */ + public MapKeys via( + SerializableFunction fn) { + return new MapKeys<>( + Contextful.fn( + ((element, c) -> KV.of(fn.apply(element.getKey()), element.getValue())), + Requirements.empty()), + outputType); + } + + /** + * Returns a new {@link MapKeys} transform with the given type descriptor for the output type, but + * the mapping function yet to be specified using {@link #via(SerializableFunction)}. + */ + public static MapKeys into(final TypeDescriptor outputType) { + return new MapKeys<>(null, outputType); + } + + private MapKeys( + @Nullable Contextful, KV>> fn, TypeDescriptor outputType) { + this.fn = fn; + this.outputType = outputType; + } + + /** + * Returns a new {@link SimpleMapWithFailures} transform that catches exceptions raised while + * mapping elements, with the given type descriptor used for the failure collection but the + * exception handler yet to be specified using {@link + * SimpleMapWithFailures#exceptionsVia(ProcessFunction)}. + * + *

    See {@link WithFailures} documentation for usage patterns of the returned {@link + * WithFailures.Result}. + * + *

    Example usage: + * + *

    {@code
+   * Result<PCollection<KV<Integer, String>>, String> result =
    +   *         input.apply(
    +   *             MapKeys.into(TypeDescriptors.integers())
    +   *                 .via(word -> 1 / word.length)  // Could throw ArithmeticException
    +   *                 .exceptionsInto(TypeDescriptors.strings())
    +   *                 .exceptionsVia(ee -> ee.exception().getMessage()));
+   * PCollection<KV<Integer, String>> output = result.output();
+   * PCollection<String> failures = result.failures();
    +   * }
    + */ + @RequiresNonNull("fn") + public SimpleMapWithFailures, KV, FailureT> exceptionsInto( + TypeDescriptor failureTypeDescriptor) { + return new SimpleMapWithFailures<>( + "MapKeysWithFailures", fn, getKvTypeDescriptor(), null, failureTypeDescriptor); + } + + /** + * Returns a new {@link SimpleMapWithFailures} transform that catches exceptions raised while + * mapping elements, passing the raised exception instance and the input element being processed + * through the given {@code exceptionHandler} and emitting the result to a failure collection. + * + *

    This method takes advantage of the type information provided by {@link InferableFunction}, + * meaning that a call to {@link #exceptionsInto(TypeDescriptor)} may not be necessary. + * + *

    See {@link WithFailures} documentation for usage patterns of the returned {@link + * WithFailures.Result}. + * + *

    Example usage: + * + *

    {@code
+   * Result<PCollection<KV<Integer, String>>, String> result =
    +   *         input.apply(
    +   *             MapKeys.into(TypeDescriptors.integers())
    +   *                 .via(word -> 1 / word.length)  // Could throw ArithmeticException
    +   *                 .exceptionsVia(
+   *                     new InferableFunction<ExceptionElement<KV<String, String>>, String>() {
    +   *                       @Override
+   *                       public String apply(ExceptionElement<KV<String, String>> input) {
    +   *                         return input.exception().getMessage();
    +   *                       }
    +   *                     }));
+   * PCollection<KV<Integer, String>> output = result.output();
+   * PCollection<String> failures = result.failures();
    +   * }
    + */ + @RequiresNonNull("fn") + public SimpleMapWithFailures, KV, FailureT> exceptionsVia( + InferableFunction>, FailureT> exceptionHandler) { + return new SimpleMapWithFailures<>( + "MapKeysWithFailures", + fn, + getKvTypeDescriptor(), + exceptionHandler, + exceptionHandler.getOutputTypeDescriptor()); + } + + @Override + public PCollection> expand(PCollection> input) { + return input.apply( + "MapKeys", + MapElements.into(getKvTypeDescriptor()) + .via(checkNotNull(fn, "Must specify a function on MapKeys using .via()"))); + } + + private TypeDescriptor> getKvTypeDescriptor() { + return new TypeDescriptor>() {}.where(new TypeParameter() {}, outputType); + } +} diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/MapValues.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/MapValues.java new file mode 100644 index 000000000000..ae1fe8ea6a45 --- /dev/null +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/MapValues.java @@ -0,0 +1,164 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.transforms; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import org.apache.beam.sdk.transforms.Contextful.Fn; +import org.apache.beam.sdk.transforms.WithFailures.ExceptionElement; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.TypeDescriptor; +import org.apache.beam.sdk.values.TypeParameter; +import org.checkerframework.checker.nullness.qual.Nullable; +import org.checkerframework.checker.nullness.qual.RequiresNonNull; + +/** + * {@code MapValues} maps a {@code SerializableFunction} over values of a {@code + * PCollection>} and returns a {@code PCollection>}. + * + *

    Example of use: + * + *

    {@code
+ * PCollection<KV<String, Integer>> input = ...;
+ * PCollection<KV<String, Double>> output =
+ *      input.apply(MapValues.into(TypeDescriptors.doubles()).via(Integer::doubleValue));
    + * }
    + * + *

    See also {@link MapKeys}. + * + * @param the type of the values in the input {@code PCollection} + * @param the type of the elements in the output {@code PCollection} + */ +public class MapValues + extends PTransform>, PCollection>> { + + private final transient TypeDescriptor outputType; + private final @Nullable Contextful, KV>> fn; + + /** + * Returns a {@link MapValues} transform for a {@code ProcessFunction} with predefined + * {@link #outputType}. + * + * @param the type of the keys in the input and output {@code PCollection}s + * @param the type of the values in the input {@code PCollection} + */ + public MapValues via( + SerializableFunction fn) { + return new MapValues<>( + Contextful.fn( + ((element, c) -> KV.of(element.getKey(), fn.apply(element.getValue()))), + Requirements.empty()), + outputType); + } + + /** + * Returns a new {@link MapValues} transform with the given type descriptor for the output type, + * but the mapping function yet to be specified using {@link #via(SerializableFunction)}. + */ + public static MapValues into(final TypeDescriptor outputType) { + return new MapValues<>(null, outputType); + } + + private MapValues( + @Nullable Contextful, KV>> fn, TypeDescriptor outputType) { + this.fn = fn; + this.outputType = outputType; + } + + /** + * Returns a new {@link SimpleMapWithFailures} transform that catches exceptions raised while + * mapping elements, with the given type descriptor used for the failure collection but the + * exception handler yet to be specified using {@link + * SimpleMapWithFailures#exceptionsVia(ProcessFunction)}. + * + *

    See {@link WithFailures} documentation for usage patterns of the returned {@link + * WithFailures.Result}. + * + *

    Example usage: + * + *

    {@code
+   * Result<PCollection<KV<String, Integer>>, String> result =
    +   *         input.apply(
    +   *             MapValues.into(TypeDescriptors.integers())
    +   *                 .via(word -> 1 / word.length)  // Could throw ArithmeticException
    +   *                 .exceptionsInto(TypeDescriptors.strings())
    +   *                 .exceptionsVia(ee -> ee.exception().getMessage()));
+   * PCollection<KV<String, Integer>> output = result.output();
+   * PCollection<String> failures = result.failures();
    +   * }
    + */ + @RequiresNonNull("fn") + public SimpleMapWithFailures, KV, FailureT> exceptionsInto( + TypeDescriptor failureTypeDescriptor) { + return new SimpleMapWithFailures<>( + "MapValuesWithFailures", fn, getKvTypeDescriptor(), null, failureTypeDescriptor); + } + + /** + * Returns a new {@link SimpleMapWithFailures} transform that catches exceptions raised while + * mapping elements, passing the raised exception instance and the input element being processed + * through the given {@code exceptionHandler} and emitting the result to a failure collection. + * + *

    This method takes advantage of the type information provided by {@link InferableFunction}, + * meaning that a call to {@link #exceptionsInto(TypeDescriptor)} may not be necessary. + * + *

    See {@link WithFailures} documentation for usage patterns of the returned {@link + * WithFailures.Result}. + * + *

    Example usage: + * + *

    {@code
+   * Result<PCollection<KV<String, Integer>>, String> result =
    +   *         input.apply(
    +   *             MapValues.into(TypeDescriptors.integers())
    +   *                 .via(word -> 1 / word.length)  // Could throw ArithmeticException
    +   *                 .exceptionsVia(
+   *                     new InferableFunction<ExceptionElement<KV<String, String>>, String>() {
    +   *                       @Override
+   *                       public String apply(ExceptionElement<KV<String, String>> input) {
    +   *                         return input.exception().getMessage();
    +   *                       }
    +   *                     }));
+   * PCollection<KV<String, Integer>> output = result.output();
+   * PCollection<String> failures = result.failures();
    +   * }
    + */ + @RequiresNonNull("fn") + public SimpleMapWithFailures, KV, FailureT> exceptionsVia( + InferableFunction>, FailureT> exceptionHandler) { + return new SimpleMapWithFailures<>( + "MapValuesWithFailures", + fn, + getKvTypeDescriptor(), + exceptionHandler, + exceptionHandler.getOutputTypeDescriptor()); + } + + @Override + public PCollection> expand(PCollection> input) { + return input.apply( + "MapValues", + MapElements.into(getKvTypeDescriptor()) + .via(checkNotNull(fn, "Must specify a function on MapValues using .via()"))); + } + + private TypeDescriptor> getKvTypeDescriptor() { + return new TypeDescriptor>() {}.where(new TypeParameter() {}, outputType); + } +} diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java index 1216b31ad657..85cfef979141 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java @@ -29,12 +29,14 @@ import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.transforms.display.DisplayData.Builder; import org.apache.beam.sdk.transforms.display.HasDisplayData; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; import org.apache.beam.sdk.util.NameUtils; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PInput; import org.apache.beam.sdk.values.POutput; import org.apache.beam.sdk.values.PValue; import org.apache.beam.sdk.values.TupleTag; +import org.checkerframework.checker.nullness.qual.NonNull; import org.checkerframework.checker.nullness.qual.Nullable; /** @@ -178,6 +180,30 @@ public String getName() { return name != null ? name : getKindString(); } + /** + * Sets resource hints for the transform. + * + * @param resourceHints a {@link ResourceHints} instance. + * @return a reference to the same transfrom instance. + *

    For example: + *

    {@code
    +   * Pipeline p = ...
    +   * ...
    +   * p.apply(new SomeTransform().setResourceHints(ResourceHints.create().withMinRam("6 GiB")))
    +   * ...
    +   *
    +   * }
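The snippet above is schematic; a compilable variant might look like the following sketch. The Create/MapElements step is illustrative; only setResourceHints, ResourceHints.create() and withMinRam come from this change:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.resourcehints.ResourceHints;
import org.apache.beam.sdk.values.TypeDescriptors;

public class ResourceHintsSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply(Create.of("a", "bb", "ccc"))
        .apply(
            "CountChars",
            MapElements.into(TypeDescriptors.integers())
                .via((String s) -> s.length())
                // Ask the runner to place this step on workers with at least 6 GiB of RAM.
                .setResourceHints(ResourceHints.create().withMinRam("6 GiB")));

    p.run().waitUntilFinish();
  }
}
```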
    + */ + public PTransform setResourceHints(@NonNull ResourceHints resourceHints) { + this.resourceHints = resourceHints; + return this; + } + + /** Returns resource hints set on the transform. */ + public ResourceHints getResourceHints() { + return resourceHints; + } + ///////////////////////////////////////////////////////////////////////////// // See the note about about PTransform's fake Serializability, to @@ -189,6 +215,8 @@ public String getName() { */ protected final transient @Nullable String name; + protected transient @NonNull ResourceHints resourceHints = ResourceHints.create(); + protected PTransform() { this.name = null; } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Reshuffle.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Reshuffle.java index 761e9bb91697..3ea95f5df8f8 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Reshuffle.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Reshuffle.java @@ -29,6 +29,7 @@ import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.TimestampedValue; import org.apache.beam.sdk.values.WindowingStrategy; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.primitives.UnsignedInteger; import org.checkerframework.checker.nullness.qual.Nullable; import org.joda.time.Duration; @@ -129,37 +130,38 @@ public PCollection expand(PCollection input) { .apply(Reshuffle.of()) .apply(Values.create()); } + } - private static class AssignShardFn extends DoFn> { - private int shard; - private @Nullable Integer numBuckets; + public static class AssignShardFn extends DoFn> { + private int shard; + private @Nullable Integer numBuckets; - private AssignShardFn(@Nullable Integer numBuckets) { - this.numBuckets = numBuckets; - } + public AssignShardFn(@Nullable Integer numBuckets) { + this.numBuckets = numBuckets; + } - @Setup - public void setup() { - shard = ThreadLocalRandom.current().nextInt(); - } + @Setup + public void setup() { + shard = ThreadLocalRandom.current().nextInt(); + } - @ProcessElement - public void processElement(@Element T element, OutputReceiver> r) { - ++shard; - // Smear the shard into something more random-looking, to avoid issues - // with runners that don't properly hash the key being shuffled, but rely - // on it being random-looking. E.g. Spark takes the Java hashCode() of keys, - // which for Integer is a no-op and it is an issue: - // http://hydronitrogen.com/poor-hash-partitioning-of-timestamps-integers-and-longs-in- - // spark.html - // This hashing strategy is copied from - // org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Hashing.smear(). - int hashOfShard = 0x1b873593 * Integer.rotateLeft(shard * 0xcc9e2d51, 15); - if (numBuckets != null) { - hashOfShard %= numBuckets; - } - r.output(KV.of(hashOfShard, element)); + @ProcessElement + public void processElement(@Element T element, OutputReceiver> r) { + ++shard; + // Smear the shard into something more random-looking, to avoid issues + // with runners that don't properly hash the key being shuffled, but rely + // on it being random-looking. E.g. Spark takes the Java hashCode() of keys, + // which for Integer is a no-op and it is an issue: + // http://hydronitrogen.com/poor-hash-partitioning-of-timestamps-integers-and-longs-in- + // spark.html + // This hashing strategy is copied from + // org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Hashing.smear(). 
+ int hashOfShard = 0x1b873593 * Integer.rotateLeft(shard * 0xcc9e2d51, 15); + if (numBuckets != null) { + UnsignedInteger unsignedNumBuckets = UnsignedInteger.fromIntBits(numBuckets); + hashOfShard = UnsignedInteger.fromIntBits(hashOfShard).mod(unsignedNumBuckets).intValue(); } + r.output(KV.of(hashOfShard, element)); } } } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/SimpleMapWithFailures.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/SimpleMapWithFailures.java new file mode 100644 index 000000000000..8d58f36eff9b --- /dev/null +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/SimpleMapWithFailures.java @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.transforms; + +import org.apache.beam.sdk.transforms.Contextful.Fn; +import org.apache.beam.sdk.transforms.WithFailures.ExceptionElement; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.TypeDescriptor; +import org.checkerframework.checker.nullness.qual.Nullable; + +/** + * A {@code PTransform} that adds exception handling to {@link MapKeys} and {@link MapValues} using + * {@link MapElements.MapWithFailures}. + */ +class SimpleMapWithFailures + extends PTransform, WithFailures.Result, FailureT>> { + + private final transient TypeDescriptor outputType; + private final Contextful> fn; + private final transient TypeDescriptor failureType; + private final @Nullable ProcessFunction, FailureT> exceptionHandler; + private final String transformName; + + SimpleMapWithFailures( + String transformName, + Contextful> fn, + TypeDescriptor outputType, + @Nullable ProcessFunction, FailureT> exceptionHandler, + TypeDescriptor failureType) { + this.transformName = transformName; + this.fn = fn; + this.outputType = outputType; + this.exceptionHandler = exceptionHandler; + this.failureType = failureType; + } + + @Override + public WithFailures.Result, FailureT> expand(PCollection input) { + if (exceptionHandler == null) { + throw new NullPointerException(".exceptionsVia() is required"); + } + return input.apply( + transformName, + MapElements.into(outputType) + .via(fn) + .exceptionsInto(failureType) + .exceptionsVia(exceptionHandler)); + } + + /** + * Returns a {@code PTransform} that catches exceptions raised while mapping elements, passing the + * raised exception instance and the input element being processed through the given {@code + * exceptionHandler} and emitting the result to a failure collection. 
+ */ + public SimpleMapWithFailures exceptionsVia( + ProcessFunction, FailureT> exceptionHandler) { + return new SimpleMapWithFailures<>( + transformName, fn, outputType, exceptionHandler, failureType); + } +} diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/View.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/View.java index 904575c4b4ae..e81f0b85cd73 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/View.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/View.java @@ -17,8 +17,6 @@ */ package org.apache.beam.sdk.transforms; -import static org.apache.beam.sdk.options.ExperimentalOptions.hasExperiment; - import java.util.HashMap; import java.util.List; import java.util.Map; @@ -29,7 +27,6 @@ import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.CoderException; import org.apache.beam.sdk.coders.KvCoder; -import org.apache.beam.sdk.coders.VoidCoder; import org.apache.beam.sdk.io.range.OffsetRange; import org.apache.beam.sdk.runners.TransformHierarchy.Node; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; @@ -260,33 +257,16 @@ public PCollectionView> expand(PCollection input) { * Long#MIN_VALUE} key is used to store all known {@link OffsetRange ranges} allowing us to * compute such an ordering. */ - - // TODO(BEAM-10097): Make this the default expansion for all portable runners. - if (hasExperiment(input.getPipeline().getOptions(), "beam_fn_api") - && (hasExperiment(input.getPipeline().getOptions(), "use_runner_v2") - || hasExperiment(input.getPipeline().getOptions(), "use_unified_worker"))) { - Coder inputCoder = input.getCoder(); - PCollection>> materializationInput = - input - .apply("IndexElements", ParDo.of(new ToListViewDoFn<>())) - .setCoder( - KvCoder.of( - BigEndianLongCoder.of(), - ValueOrMetadataCoder.create(inputCoder, OffsetRange.Coder.of()))); - PCollectionView> view = - PCollectionViews.listView( - materializationInput, - (TypeDescriptorSupplier) inputCoder::getEncodedTypeDescriptor, - input.getWindowingStrategy()); - materializationInput.apply(CreatePCollectionView.of(view)); - return view; - } - - PCollection> materializationInput = - input.apply(new VoidKeyToMultimapMaterialization<>()); Coder inputCoder = input.getCoder(); + PCollection>> materializationInput = + input + .apply("IndexElements", ParDo.of(new ToListViewDoFn<>())) + .setCoder( + KvCoder.of( + BigEndianLongCoder.of(), + ValueOrMetadataCoder.create(inputCoder, OffsetRange.Coder.of()))); PCollectionView> view = - PCollectionViews.listViewUsingVoidKey( + PCollectionViews.listView( materializationInput, (TypeDescriptorSupplier) inputCoder::getEncodedTypeDescriptor, materializationInput.getWindowingStrategy()); @@ -300,8 +280,8 @@ public PCollectionView> expand(PCollection input) { * range for each window seen. We use random offset ranges to minimize the chance that two ranges * overlap increasing the odds that each "key" represents a single index. */ - private static class ToListViewDoFn - extends DoFn>> { + @Internal + public static class ToListViewDoFn extends DoFn>> { private Map windowsToOffsets = new HashMap<>(); private OffsetRange generateRange(BoundedWindow window) { @@ -350,29 +330,19 @@ public PCollectionView> expand(PCollection input) { throw new IllegalStateException("Unable to create a side-input view from input", e); } - // TODO(BEAM-10097): Make this the default expansion for all portable runners. 
- if (hasExperiment(input.getPipeline().getOptions(), "beam_fn_api") - && (hasExperiment(input.getPipeline().getOptions(), "use_runner_v2") - || hasExperiment(input.getPipeline().getOptions(), "use_unified_worker"))) { - Coder inputCoder = input.getCoder(); - PCollectionView> view = - PCollectionViews.iterableView( - input, - (TypeDescriptorSupplier) inputCoder::getEncodedTypeDescriptor, - input.getWindowingStrategy()); - input.apply(CreatePCollectionView.of(view)); - return view; - } - - PCollection> materializationInput = - input.apply(new VoidKeyToMultimapMaterialization<>()); Coder inputCoder = input.getCoder(); + // HACK to work around https://issues.apache.org/jira/browse/BEAM-12228: + // There are bugs in "composite" vs "primitive" transform distinction + // in TransformHierachy. This noop transform works around them and should be zero + // cost. + PCollection materializationInput = + input.apply(MapElements.via(new SimpleFunction(x -> x) {})); PCollectionView> view = - PCollectionViews.iterableViewUsingVoidKey( + PCollectionViews.iterableView( materializationInput, (TypeDescriptorSupplier) inputCoder::getEncodedTypeDescriptor, materializationInput.getWindowingStrategy()); - materializationInput.apply(CreatePCollectionView.of(view)); + input.apply(CreatePCollectionView.of(view)); return view; } } @@ -508,35 +478,22 @@ public PCollectionView>> expand(PCollection> input) throw new IllegalStateException("Unable to create a side-input view from input", e); } - // TODO(BEAM-10097): Make this the default expansion for all portable runners. - if (hasExperiment(input.getPipeline().getOptions(), "beam_fn_api") - && (hasExperiment(input.getPipeline().getOptions(), "use_runner_v2") - || hasExperiment(input.getPipeline().getOptions(), "use_unified_worker"))) { - KvCoder kvCoder = (KvCoder) input.getCoder(); - Coder keyCoder = kvCoder.getKeyCoder(); - Coder valueCoder = kvCoder.getValueCoder(); - PCollectionView>> view = - PCollectionViews.multimapView( - input, - (TypeDescriptorSupplier) keyCoder::getEncodedTypeDescriptor, - (TypeDescriptorSupplier) valueCoder::getEncodedTypeDescriptor, - input.getWindowingStrategy()); - input.apply(CreatePCollectionView.of(view)); - return view; - } - KvCoder kvCoder = (KvCoder) input.getCoder(); Coder keyCoder = kvCoder.getKeyCoder(); Coder valueCoder = kvCoder.getValueCoder(); - PCollection>> materializationInput = - input.apply(new VoidKeyToMultimapMaterialization<>()); + // HACK to work around https://issues.apache.org/jira/browse/BEAM-12228: + // There are bugs in "composite" vs "primitive" transform distinction + // in TransformHierachy. This noop transform works around them and should be zero + // cost. + PCollection> materializationInput = + input.apply(MapElements.via(new SimpleFunction, KV>(x -> x) {})); PCollectionView>> view = - PCollectionViews.multimapViewUsingVoidKey( + PCollectionViews.multimapView( materializationInput, (TypeDescriptorSupplier) keyCoder::getEncodedTypeDescriptor, (TypeDescriptorSupplier) valueCoder::getEncodedTypeDescriptor, materializationInput.getWindowingStrategy()); - materializationInput.apply(CreatePCollectionView.of(view)); + input.apply(CreatePCollectionView.of(view)); return view; } } @@ -567,37 +524,19 @@ public PCollectionView> expand(PCollection> input) { throw new IllegalStateException("Unable to create a side-input view from input", e); } - // TODO(BEAM-10097): Make this the default expansion for all portable runners. 
- if (hasExperiment(input.getPipeline().getOptions(), "beam_fn_api") - && (hasExperiment(input.getPipeline().getOptions(), "use_runner_v2") - || hasExperiment(input.getPipeline().getOptions(), "use_unified_worker"))) { - KvCoder kvCoder = (KvCoder) input.getCoder(); - Coder keyCoder = kvCoder.getKeyCoder(); - Coder valueCoder = kvCoder.getValueCoder(); - - PCollectionView> view = - PCollectionViews.mapView( - input, - (TypeDescriptorSupplier) keyCoder::getEncodedTypeDescriptor, - (TypeDescriptorSupplier) valueCoder::getEncodedTypeDescriptor, - input.getWindowingStrategy()); - input.apply(CreatePCollectionView.of(view)); - return view; - } - KvCoder kvCoder = (KvCoder) input.getCoder(); Coder keyCoder = kvCoder.getKeyCoder(); Coder valueCoder = kvCoder.getValueCoder(); - PCollection>> materializationInput = - input.apply(new VoidKeyToMultimapMaterialization<>()); + PCollection> materializationInput = + input.apply(MapElements.via(new SimpleFunction, KV>(x -> x) {})); PCollectionView> view = - PCollectionViews.mapViewUsingVoidKey( + PCollectionViews.mapView( materializationInput, (TypeDescriptorSupplier) keyCoder::getEncodedTypeDescriptor, (TypeDescriptorSupplier) valueCoder::getEncodedTypeDescriptor, materializationInput.getWindowingStrategy()); - materializationInput.apply(CreatePCollectionView.of(view)); + input.apply(CreatePCollectionView.of(view)); return view; } } @@ -605,35 +544,12 @@ public PCollectionView> expand(PCollection> input) { //////////////////////////////////////////////////////////////////////////// // Internal details below - /** - * A {@link PTransform} which converts all values into {@link KV}s with {@link Void} keys. - * - *

    TODO(BEAM-10097): Replace this materialization with specializations that optimize the - * various SDK requested views. - */ - @Internal - public static class VoidKeyToMultimapMaterialization - extends PTransform, PCollection>> { - - private static class VoidKeyToMultimapMaterializationDoFn extends DoFn> { - @ProcessElement - public void processElement(@Element T element, OutputReceiver> r) { - r.output(KV.of((Void) null, element)); - } - } - - @Override - public PCollection> expand(PCollection input) { - PCollection output = input.apply(ParDo.of(new VoidKeyToMultimapMaterializationDoFn<>())); - output.setCoder(KvCoder.of(VoidCoder.of(), input.getCoder())); - return output; - } - } - /** * For internal use only; no backwards-compatibility guarantees. * - *
<p>
    Creates a primitive {@link PCollectionView}. + *
<p>
    Placeholder transform for runners to have a hook to materialize a {@link PCollection} as a + * side input. The metadata included in the {@link PCollectionView} is how the {@link PCollection} + * will be read as a side input. * * @param The type of the elements of the input PCollection * @param The type associated with the {@link PCollectionView} used as a side input diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java index 42d8b74e5fda..0f96986578cf 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java @@ -317,7 +317,17 @@ public interface TerminationCondition extends Serializable { * calling the {@link PollFn} for the current input, the {@link PollResult} included a * previously unseen {@code OutputT}. */ - StateT onSeenNewOutput(Instant now, StateT state); + default StateT onSeenNewOutput(Instant now, StateT state) { + return state; + } + + /** + * Called by the {@link Watch} transform to compute a new termination state after every poll + * completion. + */ + default StateT onPollComplete(StateT state) { + return state; + } /** * Called by the {@link Watch} transform to determine whether the given termination state @@ -379,6 +389,14 @@ public static AfterTimeSinceNewOutput afterTimeSinceNewOutput( return new AfterTimeSinceNewOutput<>(timeSinceNewOutput); } + /** + * Returns a {@link TerminationCondition} that holds after the given number of polling + * iterations have occurred per-input. Useful for deterministic testing of Watch users. + */ + public static AfterIterations afterIterations(int iterations) { + return new AfterIterations<>(iterations); + } + /** * Returns a {@link TerminationCondition} that holds when at least one of the given two * conditions holds. 
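Because the new afterIterations(...) factory and the onPollComplete hook are easiest to understand from the caller's side, the following hedged sketch shows one way a pipeline might bound continuous file matching with them; FileIO.match().continuously(...) is used only as a familiar entry point into Watch, and the file pattern and durations are illustrative assumptions.

// Sketch under assumptions: FileIO.match().continuously(...) forwards its termination
// condition into Watch, so the new afterIterations(n) can cap how many polls happen per
// input pattern (handy for deterministic tests, as the javadoc above notes). The pattern
// and durations below are illustrative only.
PCollection<MatchResult.Metadata> newFiles =
    pipeline.apply(
        FileIO.match()
            .filepattern("gs://example-bucket/incoming/*.csv")
            .continuously(
                Duration.standardSeconds(30),
                Watch.Growth.eitherOf(
                    Watch.Growth.afterTimeSinceNewOutput(Duration.standardHours(1)),
                    Watch.Growth.afterIterations(20))));

Composing the conditions with eitherOf also exercises the new BinaryCombined.onPollComplete override added below, which forwards the per-poll update to both wrapped conditions.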
@@ -412,11 +430,6 @@ public Integer forNewInput(Instant now, InputT input) { return 0; } - @Override - public Integer onSeenNewOutput(Instant now, Integer state) { - return state; - } - @Override public boolean canStopPolling(Instant now, Integer state) { return false; @@ -450,6 +463,11 @@ public StateT onSeenNewOutput(Instant now, StateT state) { return wrapped.onSeenNewOutput(now, state); } + @Override + public StateT onPollComplete(StateT state) { + return wrapped.onPollComplete(state); + } + @Override public boolean canStopPolling(Instant now, StateT state) { return wrapped.canStopPolling(now, state); @@ -480,12 +498,6 @@ public KV forNewInput(Instant now, InputT input) { return KV.of(now, maxTimeSinceInput.apply(input)); } - @Override - public KV onSeenNewOutput( - Instant now, KV state) { - return state; - } - @Override public boolean canStopPolling(Instant now, KV state) { return new Duration(state.getKey(), now).isLongerThan(state.getValue()); @@ -502,6 +514,44 @@ public String toString(KV state) { } } + static class AfterIterations implements TerminationCondition { + private final int maxIterations; + + private AfterIterations(int maxIterations) { + this.maxIterations = maxIterations; + } + + @Override + public Coder getStateCoder() { + return VarIntCoder.of(); + } + + @Override + public Integer forNewInput(Instant now, InputT input) { + return 0; + } + + @Override + public Integer onPollComplete(Integer state) { + return state + 1; + } + + @Override + public boolean canStopPolling(Instant now, Integer state) { + return state >= maxIterations; + } + + @Override + public String toString(Integer state) { + return "AfterIterations{" + + "iterations=" + + state + + ", maxIterations=" + + maxIterations + + '}'; + } + } + static class AfterTimeSinceNewOutput implements TerminationCondition< InputT, @@ -586,6 +636,11 @@ public KV onSeenNewOutput( second.onSeenNewOutput(now, state.getValue())); } + @Override + public KV onPollComplete(KV state) { + return KV.of(first.onPollComplete(state.getKey()), second.onPollComplete(state.getValue())); + } + @Override public boolean canStopPolling(Instant now, KV state) { switch (operation) { @@ -785,7 +840,8 @@ public Coder getRestrictionCoder() { } @UnboundedPerElement - private static class WatchGrowthFn + @VisibleForTesting + protected static class WatchGrowthFn extends DoFn>>> { private final Watch.Growth spec; private final Coder outputCoder; @@ -793,7 +849,7 @@ private static class WatchGrowthFn private final Coder outputKeyCoder; private final Funnel coderFunnel; - private WatchGrowthFn( + WatchGrowthFn( Growth spec, Coder outputCoder, SerializableFunction outputKeyFn, @@ -844,7 +900,6 @@ public ProcessContinuation process( priorPoll.getOutputs().size()); c.output(KV.of(c.element(), priorPoll.getOutputs())); } - watermarkEstimator.setWatermark(priorPoll.getWatermark()); } return stop(); } @@ -856,7 +911,8 @@ public ProcessContinuation process( PollingGrowthState pollingRestriction = (PollingGrowthState) currentRestriction; - // Produce a poll result that only contains never seen before results. + // Produce a poll result that only contains never seen before results in timestamp + // sorted order. 
Growth.PollResult newResults = computeNeverSeenBeforeResults(pollingRestriction, res); @@ -875,6 +931,7 @@ public ProcessContinuation process( terminationState = getTerminationCondition().onSeenNewOutput(Instant.now(), terminationState); } + terminationState = getTerminationCondition().onPollComplete(terminationState); if (!tracker.tryClaim(KV.of(newResults, terminationState))) { LOG.info("{} - will not emit poll result tryClaim failed.", c.element()); @@ -885,8 +942,13 @@ public ProcessContinuation process( c.output(KV.of(c.element(), newResults.getOutputs())); } + Instant computedWatermark = null; if (newResults.getWatermark() != null) { - watermarkEstimator.setWatermark(newResults.getWatermark()); + computedWatermark = newResults.getWatermark(); + } else if (!newResults.getOutputs().isEmpty()) { + // computeNeverSeenBeforeResults returns the elements in timestamp sorted order so + // we can get the timestamp from the first element. + computedWatermark = newResults.getOutputs().get(0).getTimestamp(); } Instant currentTime = Instant.now(); @@ -899,11 +961,15 @@ public ProcessContinuation process( return stop(); } - if (BoundedWindow.TIMESTAMP_MAX_VALUE.equals(newResults.getWatermark())) { + if (BoundedWindow.TIMESTAMP_MAX_VALUE.equals(computedWatermark)) { LOG.info("{} - will stop polling, reached max timestamp.", c.element()); return stop(); } + if (computedWatermark != null) { + watermarkEstimator.setWatermark(computedWatermark); + } + LOG.info( "{} - will resume polling in {} ms.", c.element(), spec.getPollInterval().getMillis()); return resume().withResumeDelay(spec.getPollInterval()); diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/WithFailures.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/WithFailures.java index b03ff8d1d2ee..1fc37aba2bc6 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/WithFailures.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/WithFailures.java @@ -25,6 +25,7 @@ import org.apache.beam.sdk.Pipeline; import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.annotations.Experimental.Kind; +import org.apache.beam.sdk.values.EncodableThrowable; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionTuple; @@ -92,6 +93,24 @@ public static ExceptionElement of(T element, Exception exception) { } } + /** + * A handler that holds onto the {@link Throwable} that led to the exception, returning it along + * with the original value as a {@link KV}. + * + *
<p>
    Extends {@link SimpleFunction} so that full type information is captured. {@link KV} and + * {@link EncodableThrowable} coders can be easily inferred by Beam, so coder inference can be + * successfully applied if the consuming transform passes type information to the failure + * collection's {@link TupleTag}. This may require creating an instance of an anonymous inherited + * class rather than of this class directly. + */ + public static class ThrowableHandler + extends SimpleFunction, KV> { + @Override + public KV apply(ExceptionElement f) { + return KV.of(f.element(), EncodableThrowable.forThrowable(f.exception())); + } + } + /** * A simple handler that extracts information from an exception to a {@code Map} * and returns a {@link KV} where the key is the input element that failed processing, and the diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/ByteBuddyDoFnInvokerFactory.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/ByteBuddyDoFnInvokerFactory.java index 6de6bc4f6fe4..ef0449d69f3a 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/ByteBuddyDoFnInvokerFactory.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/ByteBuddyDoFnInvokerFactory.java @@ -33,6 +33,7 @@ import org.apache.beam.sdk.state.Timer; import org.apache.beam.sdk.state.TimerMap; import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.DoFn.GetSize; import org.apache.beam.sdk.transforms.DoFn.ProcessElement; import org.apache.beam.sdk.transforms.DoFn.TruncateRestriction; import org.apache.beam.sdk.transforms.reflect.DoFnSignature.OnTimerMethod; @@ -69,40 +70,40 @@ import org.apache.beam.sdk.util.UserCodeException; import org.apache.beam.sdk.values.TypeDescriptor; import org.apache.beam.sdk.values.TypeDescriptors; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.ByteBuddy; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.field.FieldDescription; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.method.MethodDescription; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.modifier.Visibility; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.type.TypeDescription; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.type.TypeDescription.ForLoadedType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.type.TypeList; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.DynamicType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.loading.ClassLoadingStrategy; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.scaffold.InstrumentedType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.scaffold.subclass.ConstructorStrategy; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.ExceptionMethod; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.FixedValue; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.Implementation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.Implementation.Context; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.MethodDelegation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.ByteCodeAppender; -import 
org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.StackManipulation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.StackManipulation.Compound; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.Throw; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.assign.Assigner; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.assign.Assigner.Typing; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.assign.TypeCasting; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.constant.IntegerConstant; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.constant.TextConstant; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.FieldAccess; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodInvocation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodReturn; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodVariableAccess; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.jar.asm.Label; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.jar.asm.MethodVisitor; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.jar.asm.Opcodes; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.jar.asm.Type; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.matcher.ElementMatchers; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.ByteBuddy; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.field.FieldDescription; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.method.MethodDescription; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.modifier.Visibility; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.type.TypeDescription; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.type.TypeDescription.ForLoadedType; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.type.TypeList; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.DynamicType; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.loading.ClassLoadingStrategy; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.scaffold.InstrumentedType; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.scaffold.subclass.ConstructorStrategy; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.ExceptionMethod; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.FixedValue; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.Implementation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.Implementation.Context; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.MethodDelegation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.ByteCodeAppender; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.StackManipulation; +import 
org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.StackManipulation.Compound; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.Throw; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.assign.Assigner; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.assign.Assigner.Typing; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.assign.TypeCasting; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.constant.IntegerConstant; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.constant.TextConstant; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.FieldAccess; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodInvocation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodReturn; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodVariableAccess; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.jar.asm.Label; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.jar.asm.MethodVisitor; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.jar.asm.Opcodes; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.jar.asm.Type; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.matcher.ElementMatchers; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.primitives.Primitives; import org.checkerframework.checker.nullness.qual.Nullable; @@ -115,6 +116,7 @@ }) class ByteBuddyDoFnInvokerFactory implements DoFnInvokerFactory { + public static final String SETUP_CONTEXT_PARAMETER_METHOD = "setupContext"; public static final String START_BUNDLE_CONTEXT_PARAMETER_METHOD = "startBundleContext"; public static final String FINISH_BUNDLE_CONTEXT_PARAMETER_METHOD = "finishBundleContext"; public static final String PROCESS_CONTEXT_PARAMETER_METHOD = "processContext"; @@ -429,6 +431,14 @@ public static double invokeGetSize( return 1.0; } } + + public static double validateSize(double size) { + if (size < 0) { + throw new IllegalArgumentException( + String.format("Expected size >= 0 but received %s.", size)); + } + return size; + } } /** Generates a {@link DoFnInvoker} class for the given {@link DoFnSignature}. 
*/ @@ -468,7 +478,7 @@ public static double invokeGetSize( .intercept( delegateMethodWithExtraParametersOrNoop(clazzDescription, signature.finishBundle())) .method(ElementMatchers.named("invokeSetup")) - .intercept(delegateOrNoop(clazzDescription, signature.setup())) + .intercept(delegateMethodWithExtraParametersOrNoop(clazzDescription, signature.setup())) .method(ElementMatchers.named("invokeTeardown")) .intercept(delegateOrNoop(clazzDescription, signature.teardown())) .method(ElementMatchers.named("invokeOnWindowExpiration")) @@ -599,7 +609,7 @@ private static Implementation getSizeDelegation( if (signature == null) { return MethodDelegation.to(DefaultGetSize.class); } else { - return new DoFnMethodWithExtraParametersDelegation(doFnType, signature); + return new GetSizeDelegation(doFnType, signature); } } @@ -861,7 +871,6 @@ static StackManipulation getExtraContextParameter( return parameter.match( new Cases() { - @Override public StackManipulation dispatch(StartBundleContextParameter p) { return new StackManipulation.Compound( @@ -1134,6 +1143,35 @@ protected StackManipulation afterDelegation(MethodDescription instrumentedMethod } } + /** + * Implements the invoker's {@link DoFnInvoker#invokeGetSize} method by delegating to the {@link + * GetSize} method. + */ + private static final class GetSizeDelegation extends DoFnMethodWithExtraParametersDelegation { + private static final MethodDescription VALIDATE_SIZE_METHOD; + + static { + try { + VALIDATE_SIZE_METHOD = + new MethodDescription.ForLoadedMethod( + DefaultGetSize.class.getMethod("validateSize", double.class)); + } catch (NoSuchMethodException e) { + throw new RuntimeException("Failed to locate DefaultGetSize.validateSize()"); + } + } + + /** Implementation of {@link MethodDelegation} for the {@link GetSize} method. 
*/ + private GetSizeDelegation(TypeDescription doFnType, DoFnSignature.GetSizeMethod signature) { + super(doFnType, signature); + } + + @Override + protected StackManipulation afterDelegation(MethodDescription instrumentedMethod) { + return new StackManipulation.Compound( + MethodInvocation.invoke(VALIDATE_SIZE_METHOD), MethodReturn.DOUBLE); + } + } + private static class UserCodeMethodInvocation implements StackManipulation { private final @Nullable Integer returnVarIndex; diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/ByteBuddyOnTimerInvokerFactory.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/ByteBuddyOnTimerInvokerFactory.java index 6c0aac72e91c..51ae2436a7e1 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/ByteBuddyOnTimerInvokerFactory.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/ByteBuddyOnTimerInvokerFactory.java @@ -26,22 +26,22 @@ import org.apache.beam.sdk.transforms.DoFn.OnTimer; import org.apache.beam.sdk.transforms.DoFn.TimerId; import org.apache.beam.sdk.transforms.reflect.ByteBuddyDoFnInvokerFactory.DoFnMethodWithExtraParametersDelegation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.ByteBuddy; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.modifier.FieldManifestation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.modifier.Visibility; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.type.TypeDescription; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.DynamicType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.loading.ClassLoadingStrategy; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.scaffold.InstrumentedType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.scaffold.subclass.ConstructorStrategy; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.Implementation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.ByteCodeAppender; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.StackManipulation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.FieldAccess; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodInvocation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodReturn; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodVariableAccess; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.matcher.ElementMatchers; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.ByteBuddy; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.modifier.FieldManifestation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.modifier.Visibility; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.type.TypeDescription; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.DynamicType; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.loading.ClassLoadingStrategy; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.scaffold.InstrumentedType; +import 
org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.scaffold.subclass.ConstructorStrategy; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.Implementation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.ByteCodeAppender; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.StackManipulation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.FieldAccess; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodInvocation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodReturn; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodVariableAccess; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.matcher.ElementMatchers; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.CharMatcher; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.CacheBuilder; diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnInvoker.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnInvoker.java index a877303a1b66..a1ab7762791f 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnInvoker.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnInvoker.java @@ -54,7 +54,7 @@ @Internal public interface DoFnInvoker { /** Invoke the {@link DoFn.Setup} method on the bound {@link DoFn}. */ - void invokeSetup(); + void invokeSetup(ArgumentProvider arguments); /** Invoke the {@link DoFn.StartBundle} method on the bound {@link DoFn}. */ void invokeStartBundle(ArgumentProvider arguments); diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnInvokers.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnInvokers.java index 31fa13e0543b..bd82db3a0406 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnInvokers.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnInvokers.java @@ -18,6 +18,7 @@ package org.apache.beam.sdk.transforms.reflect; import org.apache.beam.sdk.annotations.Internal; +import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.transforms.DoFn; /** Static utilities for working with {@link DoFnInvoker}. */ @@ -46,10 +47,10 @@ private DoFnInvokers() {} *
<p>
    On success returns an {@link DoFnInvoker} for the given {@link DoFn}. */ public static DoFnInvoker tryInvokeSetupFor( - DoFn fn) { + DoFn fn, PipelineOptions options) { DoFnInvoker doFnInvoker = invokerFor(fn); try { - doFnInvoker.invokeSetup(); + doFnInvoker.invokeSetup(new DoFnSetupArgumentProvider<>(fn, options)); } catch (Exception e) { try { doFnInvoker.invokeTeardown(); @@ -60,4 +61,26 @@ public static DoFnInvoker tryInvokeSetupFor( } return doFnInvoker; } + + /** An {@link DoFnInvoker.ArgumentProvider} for {@link DoFn.Setup @Setup}. */ + private static class DoFnSetupArgumentProvider + extends DoFnInvoker.BaseArgumentProvider { + private final DoFn fn; + private final PipelineOptions options; + + private DoFnSetupArgumentProvider(DoFn fn, PipelineOptions options) { + this.fn = fn; + this.options = options; + } + + @Override + public PipelineOptions pipelineOptions() { + return options; + } + + @Override + public String getErrorContext() { + return "SimpleDoFnRunner/Setup"; + } + } } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignature.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignature.java index 3846ab2d2e9a..6b59fa223d93 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignature.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignature.java @@ -1248,13 +1248,17 @@ static FieldAccessDeclaration create(String id, Field field) { /** Describes a {@link DoFn.Setup} or {@link DoFn.Teardown} method. */ @AutoValue - public abstract static class LifecycleMethod implements DoFnMethod { + public abstract static class LifecycleMethod implements MethodWithExtraParameters { /** The annotated method itself. */ @Override public abstract Method targetMethod(); - static LifecycleMethod create(Method targetMethod) { - return new AutoValue_DoFnSignature_LifecycleMethod(targetMethod); + /** Types of optional parameters of the annotated method, in the order they appear. 
*/ + @Override + public abstract List extraParameters(); + + static LifecycleMethod create(Method targetMethod, List extraParameters) { + return new AutoValue_DoFnSignature_LifecycleMethod(null, targetMethod, extraParameters); } } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignatures.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignatures.java index e730e8ab8449..d39fc8369da0 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignatures.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignatures.java @@ -146,6 +146,9 @@ private DoFnSignatures() {} Parameter.SideInputParameter.class, Parameter.BundleFinalizerParameter.class); + private static final ImmutableList> ALLOWED_SETUP_PARAMETERS = + ImmutableList.of(Parameter.PipelineOptionsParameter.class); + private static final ImmutableList> ALLOWED_START_BUNDLE_PARAMETERS = ImmutableList.of( Parameter.PipelineOptionsParameter.class, @@ -186,7 +189,8 @@ private DoFnSignatures() {} Parameter.TimerParameter.class, Parameter.StateParameter.class, Parameter.TimerFamilyParameter.class, - Parameter.TimerIdParameter.class); + Parameter.TimerIdParameter.class, + Parameter.KeyParameter.class); private static final Collection> ALLOWED_ON_WINDOW_EXPIRATION_PARAMETERS = @@ -640,13 +644,14 @@ private static DoFnSignature parseSignature(Class> fnClass) } if (setupMethod != null) { + ErrorReporter setupErrors = errors.forMethod(DoFn.Setup.class, setupMethod); signatureBuilder.setSetup( - analyzeLifecycleMethod(errors.forMethod(DoFn.Setup.class, setupMethod), setupMethod)); + analyzeSetupMethod(setupErrors, fnT, setupMethod, inputT, outputT, fnContext)); } if (teardownMethod != null) { signatureBuilder.setTeardown( - analyzeLifecycleMethod( + analyzeShutdownMethod( errors.forMethod(DoFn.Teardown.class, teardownMethod), teardownMethod)); } @@ -1646,11 +1651,43 @@ static DoFnSignature.BundleMethod analyzeFinishBundleMethod( return DoFnSignature.BundleMethod.create(m, methodContext.extraParameters); } - private static DoFnSignature.LifecycleMethod analyzeLifecycleMethod( + @VisibleForTesting + static DoFnSignature.LifecycleMethod analyzeSetupMethod( + ErrorReporter errors, + TypeDescriptor> fnT, + Method m, + TypeDescriptor inputT, + TypeDescriptor outputT, + FnAnalysisContext fnContext) { + errors.checkArgument(void.class.equals(m.getReturnType()), "Must return void"); + Type[] params = m.getGenericParameterTypes(); + MethodAnalysisContext methodContext = MethodAnalysisContext.create(); + for (int i = 0; i < params.length; ++i) { + Parameter extraParam = + analyzeExtraParameter( + errors, + fnContext, + methodContext, + fnT, + ParameterDescription.of( + m, i, fnT.resolveType(params[i]), Arrays.asList(m.getParameterAnnotations()[i])), + inputT, + outputT); + methodContext.addParameter(extraParam); + } + + for (Parameter parameter : methodContext.getExtraParameters()) { + checkParameterOneOf(errors, parameter, ALLOWED_SETUP_PARAMETERS); + } + + return DoFnSignature.LifecycleMethod.create(m, methodContext.extraParameters); + } + + private static DoFnSignature.LifecycleMethod analyzeShutdownMethod( ErrorReporter errors, Method m) { errors.checkArgument(void.class.equals(m.getReturnType()), "Must return void"); errors.checkArgument(m.getGenericParameterTypes().length == 0, "Must take zero arguments"); - return DoFnSignature.LifecycleMethod.create(m); + return DoFnSignature.LifecycleMethod.create(m, Collections.emptyList()); } 
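To make the new ALLOWED_SETUP_PARAMETERS concrete, here is a hedged sketch of the DoFn shape this change admits: a @Setup method that declares a PipelineOptions parameter, supplied at invokeSetup time via the new DoFnSetupArgumentProvider. MyServiceOptions and ServiceClient are hypothetical names introduced only for this example.

// Sketch of what analyzeSetupMethod now accepts; MyServiceOptions and ServiceClient are
// invented for illustration and not part of this change.
class EnrichFn extends DoFn<String, String> {
  private transient ServiceClient client;

  @Setup
  public void setup(PipelineOptions options) {
    // Read a custom option at setup time instead of serializing its value into the DoFn.
    MyServiceOptions my = options.as(MyServiceOptions.class);
    client = ServiceClient.connect(my.getEndpoint());
  }

  @ProcessElement
  public void processElement(@Element String id, OutputReceiver<String> out) {
    out.output(client.lookup(id));
  }

  @Teardown
  public void teardown() {
    if (client != null) {
      client.close();
    }
  }
}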
@VisibleForTesting diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/StableInvokerNamingStrategy.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/StableInvokerNamingStrategy.java index c7b9ce9e8902..aecd89601689 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/StableInvokerNamingStrategy.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/StableInvokerNamingStrategy.java @@ -21,8 +21,8 @@ import com.google.auto.value.AutoValue; import org.apache.beam.sdk.transforms.DoFn; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.NamingStrategy; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.type.TypeDescription; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.NamingStrategy; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.type.TypeDescription; import org.checkerframework.checker.nullness.qual.Nullable; /** diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/resourcehints/ResourceHint.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/resourcehints/ResourceHint.java new file mode 100644 index 000000000000..dfd0ac4ac2ff --- /dev/null +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/resourcehints/ResourceHint.java @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.transforms.resourcehints; + +import org.checkerframework.checker.nullness.qual.Nullable; + +/** Provides a definition of a resource hint known to the SDK. */ +public abstract class ResourceHint { + + /** + * Reconciles values of a hint when the hint specified on a transform is also defined in an outer + * context, for example on a composite transform, or specified in the transform's execution + * environment. Override this method for a custom reconciliation logic. + */ + public ResourceHint mergeWithOuter(ResourceHint outer) { + // Defaults to the inner value as it is the most specific one. + return this; + } + + /** Defines how to represent the as bytestring. 
*/ + public abstract byte[] toBytes(); + + @Override + public abstract boolean equals(@Nullable Object other); + + @Override + public abstract int hashCode(); +} diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/resourcehints/ResourceHints.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/resourcehints/ResourceHints.java new file mode 100644 index 000000000000..e89f6eb74a06 --- /dev/null +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/resourcehints/ResourceHints.java @@ -0,0 +1,288 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.transforms.resourcehints; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; + +import java.util.List; +import java.util.Map; +import java.util.function.Function; +import java.util.regex.Matcher; +import java.util.regex.Pattern; +import org.apache.beam.model.pipeline.v1.RunnerApi; +import org.apache.beam.model.pipeline.v1.RunnerApi.StandardResourceHints; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ProtocolMessageEnum; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Splitter; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.checkerframework.checker.nullness.qual.Nullable; + +/** + * Pipeline authors can use resource hints to provide additional information to runners about the + * desired aspects of the execution environment. Resource hints can be specified via {@link + * org.apache.beam.sdk.transforms.PTransform PTransform#setResourceHints} for parts of the pipeline, + * or globally via {@link ResourceHintsOptions resourceHints} pipeline option. + * + *
<p>
    Interpretation of hints is provided by Beam runners. + */ +public class ResourceHints { + private static final String MIN_RAM_URN = "beam:resources:min_ram_bytes:v1"; + private static final String ACCELERATOR_URN = "beam:resources:accelerator:v1"; + + // TODO: reference this from a common location in all packages that use this. + private static String getUrn(ProtocolMessageEnum value) { + return value.getValueDescriptor().getOptions().getExtension(RunnerApi.beamUrn); + } + + static { + checkState(MIN_RAM_URN.equals(getUrn(StandardResourceHints.Enum.MIN_RAM_BYTES))); + checkState(ACCELERATOR_URN.equals(getUrn(StandardResourceHints.Enum.ACCELERATOR))); + } + + private static ImmutableMap hintNameToUrn = + ImmutableMap.builder() + .put("minRam", MIN_RAM_URN) + .put("min_ram", MIN_RAM_URN) // Courtesy alias. + .put("accelerator", ACCELERATOR_URN) + .build(); + + private static ImmutableMap> parsers = + ImmutableMap.>builder() + .put(MIN_RAM_URN, s -> new BytesHint(BytesHint.parse(s))) + .put(ACCELERATOR_URN, s -> new StringHint(s)) + .build(); + + private static final ResourceHints EMPTY = new ResourceHints(ImmutableMap.of()); + + private final ImmutableMap hints; + + private ResourceHints(ImmutableMap hints) { + this.hints = hints; + } + + /** Creates a {@link ResourceHints} instance with no hints. */ + public static ResourceHints create() { + return EMPTY; + } + + /** Creates a {@link ResourceHints} instance with hints supplied in options. */ + public static ResourceHints fromOptions(PipelineOptions options) { + ResourceHintsOptions resourceHintsOptions = options.as(ResourceHintsOptions.class); + ResourceHints result = create(); + List hints = resourceHintsOptions.getResourceHints(); + Splitter splitter = Splitter.on('=').limit(2); + for (String hint : hints) { + List parts = splitter.splitToList(hint); + if (parts.size() != 2) { + throw new IllegalArgumentException("Unparsable resource hint: " + hint); + } + String nameOrUrn = parts.get(0); + String stringValue = parts.get(1); + String urn; + if (hintNameToUrn.containsKey(nameOrUrn)) { + urn = hintNameToUrn.get(nameOrUrn); + } else if (!nameOrUrn.startsWith("beam:resources:")) { + // Allow unknown hints to be passed, but validate a little bit to prevent typos. 
+ throw new IllegalArgumentException("Unknown resource hint: " + hint); + } else { + urn = nameOrUrn; + } + ResourceHint value = parsers.getOrDefault(urn, s -> new StringHint(s)).apply(stringValue); + result = result.withHint(urn, value); + } + return result; + } + + /*package*/ static class BytesHint extends ResourceHint { + private static Map suffixes = + ImmutableMap.builder() + .put("B", 1L) + .put("KB", 1000L) + .put("MB", 1000_000L) + .put("GB", 1000_000_000L) + .put("TB", 1000_000_000_000L) + .put("PB", 1000_000_000_000_000L) + .put("KiB", 1L << 10) + .put("MiB", 1L << 20) + .put("GiB", 1L << 30) + .put("TiB", 1L << 40) + .put("PiB", 1L << 50) + .build(); + + private final long value; + + @Override + public boolean equals(@Nullable Object other) { + if (other == null) { + return false; + } else if (this == other) { + return true; + } else if (other instanceof BytesHint) { + return ((BytesHint) other).value == value; + } else { + return false; + } + } + + @Override + public int hashCode() { + return Long.hashCode(value); + } + + public BytesHint(long value) { + this.value = value; + } + + public static long parse(String s) { + Matcher m = Pattern.compile("([\\d.]+)[\\s]?([\\D]+$)").matcher(s); + if (m.find()) { + String number = m.group(1); + String suffix = m.group(2); + if (number != null && suffix != null && suffixes.containsKey(suffix)) { + return (long) (Double.valueOf(number) * suffixes.get(suffix)); + } + } + throw new IllegalArgumentException("Unable to parse '" + s + "' as a byte value."); + } + + @Override + public ResourceHint mergeWithOuter(ResourceHint outer) { + return new BytesHint(Math.max(value, ((BytesHint) outer).value)); + } + + @Override + public byte[] toBytes() { + return String.valueOf(value).getBytes(Charsets.US_ASCII); + } + } + + /*package*/ static class StringHint extends ResourceHint { + private final String value; + + public StringHint(String value) { + this.value = value; + } + + public static String parse(String s) { + return s; + } + + @Override + public byte[] toBytes() { + return value.getBytes(Charsets.US_ASCII); + } + + @Override + public boolean equals(@Nullable Object other) { + if (other == null) { + return false; + } else if (this == other) { + return true; + } else if (other instanceof StringHint) { + return ((StringHint) other).value.equals(value); + } else { + return false; + } + } + + @Override + public int hashCode() { + return value.hashCode(); + } + } + + /** Sets desired minimal available RAM size to have in transform's execution environment. */ + public ResourceHints withMinRam(long ramBytes) { + return withHint(MIN_RAM_URN, new BytesHint(ramBytes)); + } + + /** + * Sets desired minimal available RAM size to have in transform's execution environment. + * + * @param ramBytes specifies a human-friendly size string, for example: '10.5 GiB', '4096 MiB', + * etc. + */ + public ResourceHints withMinRam(String ramBytes) { + return withMinRam(BytesHint.parse(ramBytes)); + } + + /** Declares hardware accelerators that are desired to have in the execution environment. */ + public ResourceHints withAccelerator(String accelerator) { + return withHint(ACCELERATOR_URN, new StringHint(accelerator)); + } + + /** Declares a custom resource hint that has a specified URN. 
*/ + public ResourceHints withHint(String urn, ResourceHint hint) { + ImmutableMap.Builder newHints = ImmutableMap.builder(); + newHints.put(urn, hint); + for (Map.Entry oldHint : hints.entrySet()) { + if (!oldHint.getKey().equals(urn)) { + newHints.put(oldHint.getKey(), oldHint.getValue()); + } + } + return new ResourceHints(newHints.build()); + } + + public Map hints() { + return hints; + } + + public ResourceHints mergeWithOuter(ResourceHints outer) { + if (outer.hints.isEmpty()) { + return this; + } else if (hints.isEmpty()) { + return outer; + } else { + ImmutableMap.Builder newHints = ImmutableMap.builder(); + for (Map.Entry outerHint : outer.hints().entrySet()) { + if (hints.containsKey(outerHint.getKey())) { + newHints.put( + outerHint.getKey(), + hints.get(outerHint.getKey()).mergeWithOuter(outerHint.getValue())); + } else { + newHints.put(outerHint); + } + } + for (Map.Entry hint : hints.entrySet()) { + if (!outer.hints.containsKey(hint.getKey())) { + newHints.put(hint); + } + } + return new ResourceHints(newHints.build()); + } + } + + @Override + public boolean equals(@Nullable Object other) { + if (other == null) { + return false; + } else if (this == other) { + return true; + } else if (other instanceof ResourceHints) { + return ((ResourceHints) other).hints.equals(hints); + } else { + return false; + } + } + + @Override + public int hashCode() { + return hints.hashCode(); + } +} diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/resourcehints/ResourceHintsOptions.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/resourcehints/ResourceHintsOptions.java new file mode 100644 index 000000000000..f63ee2c71a26 --- /dev/null +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/resourcehints/ResourceHintsOptions.java @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.transforms.resourcehints; + +import com.google.auto.service.AutoService; +import java.util.List; +import org.apache.beam.sdk.options.Default; +import org.apache.beam.sdk.options.DefaultValueFactory; +import org.apache.beam.sdk.options.Description; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.options.PipelineOptionsRegistrar; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; + +/** Options that are used to control configuration of the remote environment. 
*/ +public interface ResourceHintsOptions extends PipelineOptions { + class EmptyListDefault implements DefaultValueFactory { + @Override + public List create(PipelineOptions options) { + return ImmutableList.of(); + } + } + + @Description("Resource hints used for all transform execution environments.") + @Default.InstanceFactory(EmptyListDefault.class) + List getResourceHints(); + + void setResourceHints(List value); + + /** Register the {@link ResourceHintsOptions}. */ + @AutoService(PipelineOptionsRegistrar.class) + class Options implements PipelineOptionsRegistrar { + @Override + public Iterable> getPipelineOptions() { + return ImmutableList.of(ResourceHintsOptions.class); + } + } +} diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/resourcehints/package-info.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/resourcehints/package-info.java new file mode 100644 index 000000000000..dbfeacf52bae --- /dev/null +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/resourcehints/package-info.java @@ -0,0 +1,25 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +/** + * Defines {@link org.apache.beam.sdk.transforms.resourcehints.ResourceHints} for configuring + * pipeline execution. + * + *
<p>
    {@link org.apache.beam.sdk.transforms.resourcehints.ResourceHints} encapsulates environment + * parameters that describe the environment in which transforms should be run. + */ +package org.apache.beam.sdk.transforms.resourcehints; diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/SplitResult.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/SplitResult.java index 8c44ae403354..dd02101811cd 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/SplitResult.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/SplitResult.java @@ -28,7 +28,7 @@ public abstract class SplitResult { /** Returns a {@link SplitResult} for the specified primary and residual restrictions. */ public static SplitResult of( - RestrictionT primary, RestrictionT residual) { + @Nullable RestrictionT primary, @Nullable RestrictionT residual) { return new AutoValue_SplitResult(primary, residual); } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/GlobalWindows.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/GlobalWindows.java index db07b15fe013..d15ffa8242c7 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/GlobalWindows.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/GlobalWindows.java @@ -22,7 +22,6 @@ import java.util.Collections; import org.apache.beam.sdk.coders.Coder; import org.checkerframework.checker.nullness.qual.Nullable; -import org.joda.time.Instant; /** * A {@link WindowFn} that assigns all data to the same window. @@ -77,11 +76,6 @@ public GlobalWindow getSideInputWindow(BoundedWindow mainWindow) { } } - @Override - public Instant getOutputTime(Instant inputTimestamp, GlobalWindow window) { - return inputTimestamp; - } - @Override public boolean assignsToOneWindow() { return true; diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/IntervalWindow.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/IntervalWindow.java index 87ffa4a611c6..143901a10b87 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/IntervalWindow.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/IntervalWindow.java @@ -27,6 +27,7 @@ import org.apache.beam.sdk.coders.DurationCoder; import org.apache.beam.sdk.coders.InstantCoder; import org.apache.beam.sdk.coders.StructuredCoder; +import org.apache.beam.sdk.util.common.ElementByteSizeObserver; import org.checkerframework.checker.nullness.qual.Nullable; import org.joda.time.Duration; import org.joda.time.Instant; @@ -174,6 +175,19 @@ public boolean consistentWithEquals() { return instantCoder.consistentWithEquals() && durationCoder.consistentWithEquals(); } + @Override + public boolean isRegisterByteSizeObserverCheap(IntervalWindow value) { + return instantCoder.isRegisterByteSizeObserverCheap(value.end) + && durationCoder.isRegisterByteSizeObserverCheap(new Duration(value.start, value.end)); + } + + @Override + public void registerByteSizeObserver(IntervalWindow value, ElementByteSizeObserver observer) + throws Exception { + instantCoder.registerByteSizeObserver(value.end, observer); + durationCoder.registerByteSizeObserver(new Duration(value.start, value.end), observer); + } + @Override public List> getCoderArguments() { return Collections.emptyList(); diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/InvalidWindows.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/InvalidWindows.java deleted file mode 100644 index 0525e77099e4..000000000000 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/InvalidWindows.java +++ /dev/null @@ -1,90 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package org.apache.beam.sdk.transforms.windowing; - -import java.util.Collection; -import org.apache.beam.sdk.coders.Coder; -import org.joda.time.Instant; - -/** - * A {@link WindowFn} that represents an invalid pipeline state. - * - * @param window type - */ -public class InvalidWindows extends WindowFn { - private String cause; - private WindowFn originalWindowFn; - - public InvalidWindows(String cause, WindowFn originalWindowFn) { - this.originalWindowFn = originalWindowFn; - this.cause = cause; - } - - /** Returns the reason that this {@link WindowFn} is invalid. */ - public String getCause() { - return cause; - } - - /** Returns the original windowFn that this InvalidWindows replaced. */ - public WindowFn getOriginalWindowFn() { - return originalWindowFn; - } - - @Override - public Collection assignWindows(AssignContext c) { - throw new UnsupportedOperationException(); - } - - @Override - public void mergeWindows(MergeContext c) { - throw new UnsupportedOperationException(); - } - - @Override - public Coder windowCoder() { - return originalWindowFn.windowCoder(); - } - - /** {@code InvalidWindows} objects with the same {@code originalWindowFn} are compatible. 
*/ - @Override - public boolean isCompatible(WindowFn other) { - return getClass() == other.getClass() - && getOriginalWindowFn().isCompatible(((InvalidWindows) other).getOriginalWindowFn()); - } - - @Override - public void verifyCompatibility(WindowFn other) throws IncompatibleWindowException { - if (!this.isCompatible(other)) { - throw new IncompatibleWindowException( - other, - String.format( - "Only %s objects with the same originalWindowFn are compatible.", - InvalidWindows.class.getSimpleName())); - } - } - - @Override - public WindowMappingFn getDefaultWindowMappingFn() { - throw new UnsupportedOperationException("InvalidWindows is not allowed in side inputs"); - } - - @Override - public Instant getOutputTime(Instant inputTimestamp, W window) { - return inputTimestamp; - } -} diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/PartitioningWindowFn.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/PartitioningWindowFn.java index b4f8a4660d2d..ae36a4bf59dd 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/PartitioningWindowFn.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/PartitioningWindowFn.java @@ -52,11 +52,6 @@ public W getSideInputWindow(BoundedWindow mainWindow) { }; } - @Override - public Instant getOutputTime(Instant inputTimestamp, W window) { - return inputTimestamp; - } - @Override public final boolean assignsToOneWindow() { return true; diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/SlidingWindows.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/SlidingWindows.java index acc01a02bd8e..5bbc2c9a720a 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/SlidingWindows.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/SlidingWindows.java @@ -21,8 +21,6 @@ import java.util.Collection; import java.util.List; import java.util.Objects; -import org.apache.beam.sdk.annotations.Experimental; -import org.apache.beam.sdk.annotations.Experimental.Kind; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.transforms.display.DisplayData; import org.checkerframework.checker.nullness.qual.Nullable; @@ -198,21 +196,6 @@ public Duration getOffset() { return offset; } - /** - * Ensures that later sliding windows have an output time that is past the end of earlier windows. - * - *
<p>
    If this is the earliest sliding window containing {@code inputTimestamp}, that's fine. - * Otherwise, we pick the earliest time that doesn't overlap with earlier windows. - */ - @Experimental(Kind.OUTPUT_TIME) - @Override - public Instant getOutputTime(Instant inputTimestamp, IntervalWindow window) { - Instant startOfLastSegment = window.maxTimestamp().minus(period); - return startOfLastSegment.isBefore(inputTimestamp) - ? inputTimestamp - : startOfLastSegment.plus(1); - } - @Override public boolean equals(@Nullable Object object) { if (!(object instanceof SlidingWindows)) { diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/Trigger.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/Trigger.java index e25c96068ccc..bf7cd36265cd 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/Trigger.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/Trigger.java @@ -27,6 +27,7 @@ import org.apache.beam.sdk.annotations.Internal; import org.apache.beam.sdk.transforms.GroupByKey; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Joiner; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; import org.checkerframework.checker.nullness.qual.Nullable; import org.joda.time.Instant; @@ -87,7 +88,7 @@ protected Trigger() { } public List subTriggers() { - return subTriggers; + return MoreObjects.firstNonNull(subTriggers, Collections.emptyList()); } /** diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/Window.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/Window.java index ddcd788f6dfd..b7e846b68e01 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/Window.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/Window.java @@ -332,7 +332,7 @@ public Window withOnTimeBehavior(OnTimeBehavior behavior) { public WindowingStrategy getOutputStrategyInternal(WindowingStrategy inputStrategy) { WindowingStrategy result = inputStrategy; if (getWindowFn() != null) { - result = result.withWindowFn(getWindowFn()); + result = result.withAlreadyMerged(false).withWindowFn(getWindowFn()); } if (getTrigger() != null) { result = result.withTrigger(getTrigger()); @@ -509,9 +509,6 @@ public static Remerge remerge() { private static class Remerge extends PTransform, PCollection> { @Override public PCollection expand(PCollection input) { - WindowingStrategy outputWindowingStrategy = - getOutputWindowing(input.getWindowingStrategy()); - return input // We first apply a (trivial) transform to the input PCollection to produce a new // PCollection. This ensures that we don't modify the windowing strategy of the input @@ -526,18 +523,7 @@ public T apply(T element) { } })) // Then we modify the windowing strategy. 
- .setWindowingStrategyInternal(outputWindowingStrategy); - } - - private WindowingStrategy getOutputWindowing( - WindowingStrategy inputStrategy) { - if (inputStrategy.getWindowFn() instanceof InvalidWindows) { - @SuppressWarnings("unchecked") - InvalidWindows invalidWindows = (InvalidWindows) inputStrategy.getWindowFn(); - return inputStrategy.withWindowFn(invalidWindows.getOriginalWindowFn()); - } else { - return inputStrategy; - } + .setWindowingStrategyInternal(input.getWindowingStrategy().withAlreadyMerged(false)); } } } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/WindowFn.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/WindowFn.java index 869bd868c821..e9a348362c01 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/WindowFn.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/WindowFn.java @@ -19,8 +19,6 @@ import java.io.Serializable; import java.util.Collection; -import org.apache.beam.sdk.annotations.Experimental; -import org.apache.beam.sdk.annotations.Experimental.Kind; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.transforms.display.DisplayData; import org.apache.beam.sdk.transforms.display.HasDisplayData; @@ -124,27 +122,6 @@ public void verifyCompatibility(WindowFn other) throws IncompatibleWindowE */ public abstract WindowMappingFn getDefaultWindowMappingFn(); - /** - * Returns the output timestamp to use for data depending on the given {@code inputTimestamp} in - * the specified {@code window}. - * - *

<p>The result of this method must be between {@code inputTimestamp} and {@code - * window.maxTimestamp()} (inclusive on both sides). - * - *
<p>This function must be monotonic across input timestamps. Specifically, if {@code A < B}, - * then {@code getOutputTime(A, window) <= getOutputTime(B, window)}. - * - *
<p>
    For a {@link WindowFn} that doesn't produce overlapping windows, this can (and typically - * should) just return {@code inputTimestamp}. In the presence of overlapping windows, it is - * suggested that the result in later overlapping windows is past the end of earlier windows so - * that the later windows don't prevent the watermark from progressing past the end of the earlier - * window. - */ - @Experimental(Kind.OUTPUT_TIME) - public Instant getOutputTime(Instant inputTimestamp, W window) { - return inputTimestamp; - } - /** Returns true if this {@code WindowFn} never needs to merge any windows. */ public boolean isNonMerging() { return false; diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Histogram.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/HistogramData.java similarity index 67% rename from sdks/java/core/src/main/java/org/apache/beam/sdk/util/Histogram.java rename to sdks/java/core/src/main/java/org/apache/beam/sdk/util/HistogramData.java index 3cb0899b6b8a..494cdf4499cc 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Histogram.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/HistogramData.java @@ -18,6 +18,7 @@ package org.apache.beam.sdk.util; import com.google.auto.value.AutoValue; +import java.io.Serializable; import java.math.RoundingMode; import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.math.DoubleMath; @@ -34,26 +35,38 @@ * in future versions of the Apache Beam SDK. */ @Experimental -public class Histogram { - private static final Logger LOG = LoggerFactory.getLogger(Histogram.class); +public class HistogramData implements Serializable { + private static final Logger LOG = LoggerFactory.getLogger(HistogramData.class); private final BucketType bucketType; + // TODO(BEAM-12103): Update this function to remove the numTopRecords and numBottomRecords + // and include those counters in the buckets array. private long[] buckets; - private long numOfRecords; + private long numBoundedBucketRecords; private long numTopRecords; private long numBottomRecords; - private Histogram(BucketType bucketType) { + /** + * Create a histogram. + * + * @param bucketType a bucket type for a new histogram instance. + */ + public HistogramData(BucketType bucketType) { this.bucketType = bucketType; this.buckets = new long[bucketType.getNumBuckets()]; - this.numOfRecords = 0; + this.numBoundedBucketRecords = 0; this.numTopRecords = 0; this.numBottomRecords = 0; } + public BucketType getBucketType() { + return this.bucketType; + } + /** - * Create a histogram with linear buckets. + * TODO(BEAM-12103): Update this function to define numBuckets total, including the infinite + * buckets. Create a histogram with linear buckets. * * @param start Lower bound of a starting bucket. * @param width Bucket width. Smaller width implies a better resolution for percentile estimation. @@ -61,8 +74,8 @@ private Histogram(BucketType bucketType) { * width * numBuckets. * @return a new Histogram instance. */ - public static Histogram linear(double start, double width, int numBuckets) { - return new Histogram(LinearBuckets.of(start, width, numBuckets)); + public static HistogramData linear(double start, double width, int numBuckets) { + return new HistogramData(LinearBuckets.of(start, width, numBuckets)); } public void record(double... values) { @@ -71,9 +84,41 @@ public void record(double... 
values) { } } + public synchronized void update(HistogramData other) { + synchronized (other) { + if (!this.bucketType.equals(other.bucketType) + || this.buckets.length != other.buckets.length) { + LOG.warn("Failed to update HistogramData from another with a different buckets"); + return; + } + + incTopBucketCount(other.numTopRecords); + incBottomBucketCount(other.numBottomRecords); + for (int i = 0; i < other.buckets.length; i++) { + incBucketCount(i, other.buckets[i]); + } + } + } + + // TODO(BEAM-12103): Update this function to allow incrementing the infinite buckets as well. + // and remove the incTopBucketCount and incBotBucketCount methods. + // Using 0 and length -1 as the bucketIndex. + public synchronized void incBucketCount(int bucketIndex, long count) { + this.buckets[bucketIndex] += count; + this.numBoundedBucketRecords += count; + } + + public synchronized void incTopBucketCount(long count) { + this.numTopRecords += count; + } + + public synchronized void incBottomBucketCount(long count) { + this.numBottomRecords += count; + } + public synchronized void clear() { this.buckets = new long[bucketType.getNumBuckets()]; - this.numOfRecords = 0; + this.numBoundedBucketRecords = 0; this.numTopRecords = 0; this.numBottomRecords = 0; } @@ -82,23 +127,28 @@ public synchronized void record(double value) { double rangeTo = bucketType.getRangeTo(); double rangeFrom = bucketType.getRangeFrom(); if (value >= rangeTo) { - LOG.warn("record is out of upper bound {}: {}", rangeTo, value); numTopRecords++; } else if (value < rangeFrom) { - LOG.warn("record is out of lower bound {}: {}", rangeFrom, value); numBottomRecords++; } else { buckets[bucketType.getBucketIndex(value)]++; - numOfRecords++; + numBoundedBucketRecords++; } } public synchronized long getTotalCount() { - return numOfRecords + numTopRecords + numBottomRecords; + return numBoundedBucketRecords + numTopRecords + numBottomRecords; + } + + public synchronized String getPercentileString(String elemType, String unit) { + return String.format( + "Total number of %s: %s, P99: %.0f %s, P90: %.0f %s, P50: %.0f %s", + elemType, getTotalCount(), p99(), unit, p90(), unit, p50(), unit); } /** - * Get the bucket count for the given bucketIndex. + * TODO(BEAM-12103): Update this function to allow indexing the -INF and INF bucket (using 0 and + * length -1) Get the bucket count for the given bucketIndex. * *
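
For orientation, here is a minimal sketch of how the renamed HistogramData API in the hunks above might be used. It relies only on the methods shown there (linear, record, update, getTotalCount, getPercentileString); the bucket layout and values are invented for the example.

```java
import org.apache.beam.sdk.util.HistogramData;

public class HistogramDataExample {
  public static void main(String[] args) {
    // Ten linear buckets of width 10 covering [0, 100); values outside that range
    // are counted in the bottom/top overflow counters rather than dropped.
    HistogramData latencies = HistogramData.linear(0, 10, 10);
    latencies.record(3.0, 17.0, 95.0, 250.0); // 250.0 lands in the top bucket

    // Merging is only valid for histograms with the same bucket layout.
    HistogramData shard = HistogramData.linear(0, 10, 10);
    shard.record(42.0);
    latencies.update(shard);

    System.out.println(latencies.getTotalCount()); // 5, including the out-of-range record
    System.out.println(latencies.getPercentileString("latencies", "ms"));
  }
}
```
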

    This method does not guarantee the atomicity when sequentially accessing the multiple * buckets i.e. other threads may alter the value between consecutive invocations. For summing the @@ -111,6 +161,14 @@ public synchronized long getCount(int bucketIndex) { return buckets[bucketIndex]; } + public synchronized long getTopBucketCount() { + return numTopRecords; + } + + public synchronized long getBottomBucketCount() { + return numBottomRecords; + } + public double p99() { return getLinearInterpolation(0.99); } @@ -131,7 +189,7 @@ public double p50() { private synchronized double getLinearInterpolation(double percentile) { long totalNumOfRecords = getTotalCount(); if (totalNumOfRecords == 0) { - throw new RuntimeException("histogram has no record."); + return Double.NaN; } int index; double recordSum = numBottomRecords; @@ -153,7 +211,7 @@ private synchronized double getLinearInterpolation(double percentile) { return bucketType.getRangeFrom() + bucketType.getAccumulatedBucketSize(index) + fracBucketSize; } - public interface BucketType { + public interface BucketType extends Serializable { // Lower bound of a starting bucket. double getRangeFrom(); // Upper bound of an ending bucket. @@ -180,13 +238,14 @@ public abstract static class LinearBuckets implements BucketType { public static LinearBuckets of(double start, double width, int numBuckets) { if (width <= 0) { - throw new RuntimeException(String.format("width should be greater than zero: %f", width)); + throw new IllegalArgumentException( + String.format("width should be greater than zero: %f", width)); } if (numBuckets <= 0) { - throw new RuntimeException( + throw new IllegalArgumentException( String.format("numBuckets should be greater than zero: %d", numBuckets)); } - return new AutoValue_Histogram_LinearBuckets(start, width, numBuckets); + return new AutoValue_HistogramData_LinearBuckets(start, width, numBuckets); } @Override @@ -213,5 +272,7 @@ public double getRangeFrom() { public double getRangeTo() { return getStart() + getNumBuckets() * getWidth(); } + + // Note: equals() and hashCode() are implemented by the AutoValue. } } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/IdentityWindowFn.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/IdentityWindowFn.java index 7b5be6bdc7bb..ca78dc60f2fe 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/IdentityWindowFn.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/IdentityWindowFn.java @@ -25,26 +25,18 @@ import org.apache.beam.sdk.transforms.GroupByKey; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.transforms.windowing.IncompatibleWindowException; -import org.apache.beam.sdk.transforms.windowing.InvalidWindows; import org.apache.beam.sdk.transforms.windowing.NonMergingWindowFn; import org.apache.beam.sdk.transforms.windowing.Window; import org.apache.beam.sdk.transforms.windowing.WindowFn; import org.apache.beam.sdk.transforms.windowing.WindowMappingFn; import org.apache.beam.sdk.values.PCollection; -import org.apache.beam.sdk.values.WindowingStrategy; -import org.joda.time.Instant; /** * A {@link WindowFn} that leaves all associations between elements and windows unchanged. * *

    This {@link WindowFn} is applied when elements must be passed through a {@link GroupByKey}, - * but should maintain their existing {@link Window} assignments. Because windows may have been - * merged, the earlier {@link WindowFn} may not appropriately maintain the existing window - * assignments. For example, if the earlier {@link WindowFn} merges windows, after a {@link - * GroupByKey} the {@link WindowingStrategy} uses {@link InvalidWindows}, and no further {@link - * GroupByKey} can be applied without applying a new {@link WindowFn}. This {@link WindowFn} allows - * existing window assignments to be maintained across a single group by key, at which point the - * earlier {@link WindowingStrategy} should be restored. + * but should maintain their existing {@link Window} assignments. This will prevent merging if the + * underlying {@link WindowFn} would otherwise do so. * *

    This {@link WindowFn} is an internal implementation detail of sdk-provided utilities, and * should not be used by {@link Pipeline} writers. @@ -113,11 +105,6 @@ public WindowMappingFn getDefaultWindowMappingFn() { getClass().getCanonicalName())); } - @Override - public Instant getOutputTime(Instant inputTimestamp, BoundedWindow window) { - return inputTimestamp; - } - @Override public boolean assignsToOneWindow() { return true; diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/InstanceBuilder.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/InstanceBuilder.java index f6d9ffe2277d..1639186a599d 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/InstanceBuilder.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/InstanceBuilder.java @@ -221,11 +221,20 @@ private T buildFromMethod(Class[] types) { String.format( "Unable to find factory method %s#%s(%s)", factoryClass.getSimpleName(), methodName, Joiner.on(", ").join(types))); - - } catch (IllegalAccessException | InvocationTargetException e) { + } catch (InvocationTargetException e) { + if (e.getTargetException() instanceof RuntimeException) { + // If underlying exception is unchecked re-raise it as-is + throw (RuntimeException) e.getTargetException(); + } + throw new RuntimeException( + String.format( + "Encountered checked exception when constructing an instance from factory method %s#%s(%s)", + factoryClass.getSimpleName(), methodName, Joiner.on(", ").join(types)), + e.getTargetException()); + } catch (IllegalAccessException e) { throw new RuntimeException( String.format( - "Failed to construct instance from factory method %s#%s(%s)", + "Failed to construct instance from factory method %s#%s(%s) due to access restriction", factoryClass.getSimpleName(), methodName, Joiner.on(", ").join(types)), e); } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/ReleaseInfo.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/ReleaseInfo.java index c284d4627a3d..3f21570a662e 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/ReleaseInfo.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/ReleaseInfo.java @@ -54,16 +54,21 @@ public String getName() { return getProperties().get("name"); } - /** Provides the BEAM version. ie: 2.18.0-SNAPSHOT */ + /** Provides the BEAM version e.g. 2.18.0-SNAPSHOT */ public String getVersion() { return getProperties().get("version"); } - /** Provides the SDK version. ie: 2.18.0 or 2.18.0.dev */ + /** Provides the SDK version. e.g. 2.18.0 or 2.18.0.dev */ public String getSdkVersion() { return getProperties().get("sdk_version"); } + /** Returns true if SDK version is a dev version (e.g. 2.18.0.dev) */ + public boolean isDevSdkVersion() { + return getProperties().get("sdk_version").contains("dev"); + } + /** Provides docker image default root (apache). 
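
A small sketch of how the new isDevSdkVersion() helper from the ReleaseInfo hunk above might be consulted. ReleaseInfo.getReleaseInfo() is assumed to be the existing singleton accessor; the logging is purely illustrative.

```java
import org.apache.beam.sdk.util.ReleaseInfo;

public class SdkVersionCheck {
  public static void main(String[] args) {
    ReleaseInfo info = ReleaseInfo.getReleaseInfo();
    if (info.isDevSdkVersion()) {
      // e.g. "2.18.0.dev": point at locally built or snapshot container images.
      System.out.println("Running a dev SDK: " + info.getSdkVersion());
    } else {
      System.out.println("Running released SDK " + info.getSdkVersion());
    }
  }
}
```
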
*/ public String getDefaultDockerRepoRoot() { return getProperties().get("docker_image_default_repo_root"); diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJson.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJson.java index 63da5c06d3c2..351eacbec94f 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJson.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJson.java @@ -19,6 +19,7 @@ import static org.apache.beam.sdk.schemas.Schema.TypeName.BOOLEAN; import static org.apache.beam.sdk.schemas.Schema.TypeName.BYTE; +import static org.apache.beam.sdk.schemas.Schema.TypeName.DATETIME; import static org.apache.beam.sdk.schemas.Schema.TypeName.DECIMAL; import static org.apache.beam.sdk.schemas.Schema.TypeName.DOUBLE; import static org.apache.beam.sdk.schemas.Schema.TypeName.FLOAT; @@ -28,13 +29,17 @@ import static org.apache.beam.sdk.schemas.Schema.TypeName.STRING; import static org.apache.beam.sdk.util.RowJsonValueExtractors.booleanValueExtractor; import static org.apache.beam.sdk.util.RowJsonValueExtractors.byteValueExtractor; +import static org.apache.beam.sdk.util.RowJsonValueExtractors.dateValueExtractor; +import static org.apache.beam.sdk.util.RowJsonValueExtractors.datetimeValueExtractor; import static org.apache.beam.sdk.util.RowJsonValueExtractors.decimalValueExtractor; import static org.apache.beam.sdk.util.RowJsonValueExtractors.doubleValueExtractor; import static org.apache.beam.sdk.util.RowJsonValueExtractors.floatValueExtractor; import static org.apache.beam.sdk.util.RowJsonValueExtractors.intValueExtractor; +import static org.apache.beam.sdk.util.RowJsonValueExtractors.localDatetimeValueExtractor; import static org.apache.beam.sdk.util.RowJsonValueExtractors.longValueExtractor; import static org.apache.beam.sdk.util.RowJsonValueExtractors.shortValueExtractor; import static org.apache.beam.sdk.util.RowJsonValueExtractors.stringValueExtractor; +import static org.apache.beam.sdk.util.RowJsonValueExtractors.timeValueExtractor; import static org.apache.beam.sdk.values.Row.toRow; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList.toImmutableList; @@ -49,6 +54,9 @@ import com.google.auto.value.AutoValue; import java.io.IOException; import java.math.BigDecimal; +import java.time.LocalDate; +import java.time.LocalDateTime; +import java.time.LocalTime; import java.util.stream.Stream; import java.util.stream.StreamSupport; import org.apache.beam.sdk.annotations.Experimental; @@ -57,12 +65,14 @@ import org.apache.beam.sdk.schemas.Schema.Field; import org.apache.beam.sdk.schemas.Schema.FieldType; import org.apache.beam.sdk.schemas.Schema.TypeName; +import org.apache.beam.sdk.schemas.logicaltypes.SqlTypes; import org.apache.beam.sdk.util.RowJsonValueExtractors.ValueExtractor; import org.apache.beam.sdk.values.Row; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; import org.checkerframework.checker.nullness.qual.Nullable; +import org.joda.time.DateTime; /** * Jackson serializer and deserializer for {@link Row Rows}. @@ -78,6 +88,8 @@ *

  • {@link Schema.TypeName#DOUBLE} *
  • {@link Schema.TypeName#BOOLEAN} *
  • {@link Schema.TypeName#STRING} + *
  • {@link Schema.TypeName#DECIMAL} + *
  • {@link Schema.TypeName#DATETIME} * */ @Experimental(Kind.SCHEMAS) @@ -86,7 +98,12 @@ }) public class RowJson { private static final ImmutableSet SUPPORTED_TYPES = - ImmutableSet.of(BYTE, INT16, INT32, INT64, FLOAT, DOUBLE, BOOLEAN, STRING, DECIMAL); + ImmutableSet.of(BYTE, INT16, INT32, INT64, FLOAT, DOUBLE, BOOLEAN, STRING, DECIMAL, DATETIME); + private static final ImmutableSet KNOWN_LOGICAL_TYPE_IDENTIFIERS = + ImmutableSet.of( + SqlTypes.DATE.getIdentifier(), + SqlTypes.TIME.getIdentifier(), + SqlTypes.DATETIME.getIdentifier()); /** * Throws {@link UnsupportedRowJsonException} if {@code schema} contains an unsupported field @@ -147,7 +164,11 @@ private static ImmutableList findUnsupportedFields( } if (fieldTypeName.isLogicalType()) { - return findUnsupportedFields(fieldType.getLogicalType().getBaseType(), fieldName); + if (KNOWN_LOGICAL_TYPE_IDENTIFIERS.contains(fieldType.getLogicalType().getIdentifier())) { + return ImmutableList.of(); + } else { + return findUnsupportedFields(fieldType.getLogicalType().getBaseType(), fieldName); + } } if (!SUPPORTED_TYPES.contains(fieldTypeName)) { @@ -206,6 +227,7 @@ public enum NullBehavior { .put(BOOLEAN, booleanValueExtractor()) .put(STRING, stringValueExtractor()) .put(DECIMAL, decimalValueExtractor()) + .put(DATETIME, datetimeValueExtractor()) .build(); private final Schema schema; @@ -285,11 +307,20 @@ private Object extractJsonNodeValue(FieldValue fieldValue) { } if (fieldValue.typeName().isLogicalType()) { - return extractJsonNodeValue( - FieldValue.of( - fieldValue.name(), - fieldValue.type().getLogicalType().getBaseType(), - fieldValue.jsonValue())); + String identifier = fieldValue.type().getLogicalType().getIdentifier(); + if (SqlTypes.DATE.getIdentifier().equals(identifier)) { + return dateValueExtractor().extractValue(fieldValue.jsonValue()); + } else if (SqlTypes.TIME.getIdentifier().equals(identifier)) { + return timeValueExtractor().extractValue(fieldValue.jsonValue()); + } else if (SqlTypes.DATETIME.getIdentifier().equals(identifier)) { + return localDatetimeValueExtractor().extractValue(fieldValue.jsonValue()); + } else { + return extractJsonNodeValue( + FieldValue.of( + fieldValue.name(), + fieldValue.type().getLogicalType().getBaseType(), + fieldValue.jsonValue())); + } } return extractJsonPrimitiveValue(fieldValue); @@ -495,6 +526,9 @@ private void writeValue(JsonGenerator gen, FieldType type, Object value) throws case DECIMAL: gen.writeNumber((BigDecimal) value); break; + case DATETIME: + gen.writeString(((DateTime) value).toString()); // ISO 8601 format + break; case ARRAY: case ITERABLE: gen.writeStartArray(); @@ -507,7 +541,16 @@ private void writeValue(JsonGenerator gen, FieldType type, Object value) throws writeRow((Row) value, type.getRowSchema(), gen); break; case LOGICAL_TYPE: - writeValue(gen, type.getLogicalType().getBaseType(), value); + String identifier = type.getLogicalType().getIdentifier(); + if (SqlTypes.DATE.getIdentifier().equals(identifier)) { + gen.writeString(((LocalDate) value).toString()); // ISO 8601 format + } else if (SqlTypes.TIME.getIdentifier().equals(identifier)) { + gen.writeString(((LocalTime) value).toString()); // ISO 8601 format + } else if (SqlTypes.DATETIME.getIdentifier().equals(identifier)) { + gen.writeString(((LocalDateTime) value).toString()); // ISO 8601 format + } else { + writeValue(gen, type.getLogicalType().getBaseType(), value); + } break; default: throw new IllegalArgumentException("Unsupported field type: " + type); diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJsonUtils.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJsonUtils.java index 956cdd0894bf..6538a1459290 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJsonUtils.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJsonUtils.java @@ -68,7 +68,7 @@ public static String rowToJson(ObjectMapper objectMapper, Row row) { try { return objectMapper.writeValueAsString(row); } catch (JsonProcessingException e) { - throw new IllegalArgumentException("Unable to serilize row: " + row, e); + throw new IllegalArgumentException("Unable to serialize row: " + row, e); } } } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJsonValueExtractors.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJsonValueExtractors.java index 1b071fea3294..d60e84dca18a 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJsonValueExtractors.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJsonValueExtractors.java @@ -20,9 +20,13 @@ import com.fasterxml.jackson.databind.JsonNode; import com.google.auto.value.AutoValue; import java.math.BigDecimal; +import java.time.LocalDate; +import java.time.LocalDateTime; +import java.time.LocalTime; import java.util.function.Function; import java.util.function.Predicate; import org.apache.beam.sdk.util.RowJson.UnsupportedRowJsonException; +import org.joda.time.DateTime; /** * Contains utilities for extracting primitive values from JSON nodes. @@ -177,6 +181,54 @@ static ValueExtractor decimalValueExtractor() { .build(); } + /** + * Extracts DateTime from the JsonNode (ISO 8601 format string) if it is valid. + * + *
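
A sketch of round-tripping the newly supported DATETIME and SqlTypes.DATE fields through RowJson. It assumes the pre-existing RowJson.RowJsonSerializer.forSchema(...) and RowJsonUtils.newObjectMapperWith(...) helpers; the schema, values, and the exact JSON text are illustrative only.

```java
import java.time.LocalDate;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.schemas.logicaltypes.SqlTypes;
import org.apache.beam.sdk.util.RowJson;
import org.apache.beam.sdk.util.RowJsonUtils;
import org.apache.beam.sdk.values.Row;
import org.joda.time.DateTime;

public class RowJsonDateTimeExample {
  public static void main(String[] args) {
    Schema schema =
        Schema.builder()
            .addDateTimeField("createdAt")                   // DATETIME, written as ISO 8601
            .addLogicalTypeField("birthDate", SqlTypes.DATE) // logical DATE, written as ISO 8601
            .build();

    Row row =
        Row.withSchema(schema)
            .addValues(DateTime.parse("2021-04-01T12:00:00Z"), LocalDate.of(1979, 3, 14))
            .build();

    String json =
        RowJsonUtils.rowToJson(
            RowJsonUtils.newObjectMapperWith(RowJson.RowJsonSerializer.forSchema(schema)), row);
    // Expected to look roughly like:
    // {"createdAt":"2021-04-01T12:00:00.000Z","birthDate":"1979-03-14"}
    System.out.println(json);
  }
}
```
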

    Throws {@link UnsupportedRowJsonException} if value is out of bounds. + */ + static ValueExtractor datetimeValueExtractor() { + return ValidatingValueExtractor.builder() + .setExtractor(jsonNode -> DateTime.parse(jsonNode.textValue())) + .setValidator(JsonNode::isTextual) + .build(); + } + + /** + * Extracts LocalDate from the JsonNode (ISO 8601 format string) if it is valid. + * + *

    Throws {@link UnsupportedRowJsonException} if value is out of bounds. + */ + static ValueExtractor dateValueExtractor() { + return ValidatingValueExtractor.builder() + .setExtractor(jsonNode -> LocalDate.parse(jsonNode.textValue())) + .setValidator(JsonNode::isTextual) + .build(); + } + + /** + * Extracts LocalTime from the JsonNode (ISO 8601 format string) if it is valid. + * + *

    Throws {@link UnsupportedRowJsonException} if value is out of bounds. + */ + static ValueExtractor timeValueExtractor() { + return ValidatingValueExtractor.builder() + .setExtractor(jsonNode -> LocalTime.parse(jsonNode.textValue())) + .setValidator(JsonNode::isTextual) + .build(); + } + + /** + * Extracts LocalDateTime from the JsonNode (ISO 8601 format string) if it is valid. + * + *

    Throws {@link UnsupportedRowJsonException} if value is out of bounds. + */ + static ValueExtractor localDatetimeValueExtractor() { + return ValidatingValueExtractor.builder() + .setExtractor(jsonNode -> LocalDateTime.parse(jsonNode.textValue())) + .setValidator(JsonNode::isTextual) + .build(); + } + @AutoValue public abstract static class ValidatingValueExtractor implements ValueExtractor { diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/ZipFiles.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/ZipFiles.java index f4bfb97ebaff..9ae1363e2f1b 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/ZipFiles.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/ZipFiles.java @@ -37,7 +37,6 @@ import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterators; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.CharSource; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.Closer; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.Files; /** @@ -176,27 +175,42 @@ private static void checkName(String name) throws IOException { * as parameter, not absolute. * @param zipFile the zip-file to write to. * @throws IOException the zipping failed, e.g. because the input was not readable. + * @throws IllegalArgumentException sourceDirectory is not a directory, or zipFile already exists. */ public static void zipDirectory(File sourceDirectory, File zipFile) throws IOException { + zipDirectory(sourceDirectory, zipFile, false); + } + + /** + * Zips an entire directory specified by the path. + * + * @param sourceDirectory the directory to read from. This directory and all subdirectories will + * be added to the zip-file. The path within the zip file is relative to the directory given + * as parameter, not absolute. + * @param zipFile the zip-file to write to. Will be overwritten if it already exists. + * @throws IOException the zipping failed, e.g. because the input was not readable. + * @throws IllegalArgumentException sourceDirectory is not a directory. 
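
A sketch contrasting the existing zipDirectory with the new zipDirectoryOverwrite introduced above; the paths are placeholders.

```java
import java.io.File;
import java.io.IOException;
import org.apache.beam.sdk.util.ZipFiles;

public class StagingZipExample {
  public static void main(String[] args) throws IOException {
    File stagingDir = new File("/tmp/staging");  // placeholder directory
    File archive = new File("/tmp/staging.zip"); // placeholder target file

    // First packaging run: the target must not exist yet, otherwise
    // zipDirectory throws IllegalArgumentException.
    ZipFiles.zipDirectory(stagingDir, archive);

    // Re-packaging after the staging directory changed: replace the old archive.
    ZipFiles.zipDirectoryOverwrite(stagingDir, archive);
  }
}
```
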
+ */ + public static void zipDirectoryOverwrite(File sourceDirectory, File zipFile) throws IOException { + zipDirectory(sourceDirectory, zipFile, true); + } + + private static void zipDirectory(File sourceDirectory, File zipFile, boolean allowOverwrite) + throws IOException { checkNotNull(sourceDirectory); checkNotNull(zipFile); checkArgument( sourceDirectory.isDirectory(), "%s is not a valid directory", sourceDirectory.getAbsolutePath()); - checkArgument( - !zipFile.exists(), - "%s does already exist, files are not being overwritten", - zipFile.getAbsolutePath()); - Closer closer = Closer.create(); - try { - OutputStream outputStream = - closer.register(new BufferedOutputStream(new FileOutputStream(zipFile))); + if (!allowOverwrite) { + checkArgument( + !zipFile.exists(), + "%s already exists, file is not not being overwritten", + zipFile.getAbsolutePath()); + } + try (OutputStream outputStream = new BufferedOutputStream(new FileOutputStream(zipFile))) { zipDirectory(sourceDirectory, outputStream); - } catch (Throwable t) { - throw closer.rethrow(t); - } finally { - closer.close(); } } @@ -219,11 +233,12 @@ public static void zipDirectory(File sourceDirectory, OutputStream outputStream) "%s is not a valid directory", sourceDirectory.getAbsolutePath()); - ZipOutputStream zos = new ZipOutputStream(outputStream); - for (File file : sourceDirectory.listFiles()) { - zipDirectoryInternal(file, "", zos); + try (ZipOutputStream zos = new ZipOutputStream(outputStream)) { + for (File file : sourceDirectory.listFiles()) { + zipDirectoryInternal(file, "", zos); + } + zos.finish(); } - zos.finish(); } /** diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/common/ReflectHelpers.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/common/ReflectHelpers.java index 9c3652ddb216..7ca9249873a0 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/common/ReflectHelpers.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/common/ReflectHelpers.java @@ -250,7 +250,7 @@ public static ClassLoader findClassLoader(final Class... classes) { /** * Finds the appropriate {@code ClassLoader} to be used by the {@link ServiceLoader#load} call, * which by default would use the context {@code ClassLoader}, which can be null. The fallback is - * as follows: context ClassLoader, class ClassLoader and finaly the system ClassLoader. + * as follows: context ClassLoader, class ClassLoader and finally the system ClassLoader. */ public static ClassLoader findClassLoader() { return findClassLoader(Thread.currentThread().getContextClassLoader()); diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/values/EncodableThrowable.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/values/EncodableThrowable.java new file mode 100644 index 000000000000..008226217e05 --- /dev/null +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/values/EncodableThrowable.java @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.values; + +import java.io.Serializable; +import javax.annotation.Nullable; + +/** + * A wrapper around a {@link Throwable} for use with coders. + * + *

    Though {@link Throwable} is serializable, it doesn't override {@link Object#equals(Object)}, + * which can lead to false positives in mutation detection for coders. This class provides a coder- + * safe way to pass exceptions around without running into problems like log spam. + * + *
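
A sketch of the shallow equality contract of the new EncodableThrowable wrapper defined in this file; the exception and messages are invented for the example.

```java
import org.apache.beam.sdk.values.EncodableThrowable;

public class DeadLetterExample {
  public static void main(String[] args) {
    Throwable failure = new IllegalStateException("record could not be parsed");

    // Wrap the exception so it can be carried in a PCollection element without
    // coder mutation checks flagging it (Throwable has no equals()).
    EncodableThrowable wrapped = EncodableThrowable.forThrowable(failure);

    // Equality is deliberately shallow: two wrappers are equal when the wrapped
    // Throwables have the same class, regardless of message or stack trace.
    EncodableThrowable other =
        EncodableThrowable.forThrowable(new IllegalStateException("different message"));
    System.out.println(wrapped.equals(other)); // true

    System.out.println(wrapped.throwable().getMessage()); // "record could not be parsed"
  }
}
```
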

    This class is not suitable for general-purpose equality comparison among {@link Throwable}s + * and should only be used to pass a {@link Throwable} from one PTransform to another. + */ +public final class EncodableThrowable implements Serializable { + private Throwable throwable; + + private EncodableThrowable() { + // Can't set this to null without adding a pointless @Nullable annotation to the field. It also + // needs to be set from the constructor to avoid a checkstyle violation. + this.throwable = new Throwable(); + } + + /** Wraps {@code throwable} and returns the result. */ + public static EncodableThrowable forThrowable(Throwable throwable) { + EncodableThrowable comparable = new EncodableThrowable(); + comparable.throwable = throwable; + return comparable; + } + + /** Returns the underlying {@link Throwable}. */ + public Throwable throwable() { + return throwable; + } + + @Override + public int hashCode() { + return throwable.hashCode(); + } + + @Override + public boolean equals(@Nullable Object obj) { + if (!(obj instanceof EncodableThrowable)) { + return false; + } + Throwable other = ((EncodableThrowable) obj).throwable; + + // Assuming class preservation is enough to know that serialization/deserialization worked. + return throwable.getClass().equals(other.getClass()); + } +} diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollection.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollection.java index a637951ff1d8..5e85eb8c4adb 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollection.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollection.java @@ -124,7 +124,13 @@ public void finishSpecifying(PInput input, PTransform transform) { * override this to enable better {@code Coder} inference. 
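
A sketch of the PCollection#getTypeDescriptor change in the next hunk, which lets the type be recovered from the coder when no TypeDescriptor was set explicitly; the Create-based pipeline is only for illustration.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptor;

public class TypeDescriptorFallbackExample {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());

    PCollection<String> words =
        p.apply(Create.of("hello", "beam").withCoder(StringUtf8Coder.of()));

    // Previously this could return null when no TypeDescriptor had been attached;
    // now the coder's encoded type is used as a fallback.
    TypeDescriptor<String> type = words.getTypeDescriptor();
    System.out.println(type); // java.lang.String
  }
}
```
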
*/ public @Nullable TypeDescriptor getTypeDescriptor() { - return typeDescriptor; + if (typeDescriptor != null) { + return typeDescriptor; + } + if (coderOrFailure.coder != null) { + return coderOrFailure.coder.getEncodedTypeDescriptor(); + } + return null; } /** diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionViews.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionViews.java index 56ade4edeff0..360c1afa39e1 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionViews.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionViews.java @@ -58,7 +58,6 @@ import org.apache.beam.sdk.transforms.View; import org.apache.beam.sdk.transforms.ViewFn; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; -import org.apache.beam.sdk.transforms.windowing.InvalidWindows; import org.apache.beam.sdk.transforms.windowing.WindowMappingFn; import org.apache.beam.sdk.util.CoderUtils; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; @@ -121,6 +120,7 @@ public static PCollectionView singletonView( */ @Deprecated public static PCollectionView singletonViewUsingVoidKey( + TupleTag> tag, PCollection> pCollection, TypeDescriptorSupplier typeDescriptorSupplier, WindowingStrategy windowingStrategy, @@ -129,6 +129,7 @@ public static PCollectionView singletonViewUsing Coder defaultValueCoder) { return new SimplePCollectionView<>( pCollection, + tag, new SingletonViewFn<>(hasDefault, defaultValue, defaultValueCoder, typeDescriptorSupplier), windowingStrategy.getWindowFn().getDefaultWindowMappingFn(), windowingStrategy); @@ -157,11 +158,13 @@ public static PCollectionView> iterable */ @Deprecated public static PCollectionView> iterableViewUsingVoidKey( + TupleTag> tag, PCollection> pCollection, TypeDescriptorSupplier typeDescriptorSupplier, WindowingStrategy windowingStrategy) { return new SimplePCollectionView<>( pCollection, + tag, new IterableViewFn<>(typeDescriptorSupplier), windowingStrategy.getWindowFn().getDefaultWindowMappingFn(), windowingStrategy); @@ -182,6 +185,23 @@ public static PCollectionView> listView( windowingStrategy); } + /** + * Returns a {@code PCollectionView>} capable of processing elements windowed using the + * provided {@link WindowingStrategy}. + */ + public static PCollectionView> listView( + PCollection>> pCollection, + TupleTag>> tag, + TypeDescriptorSupplier typeDescriptorSupplier, + WindowingStrategy windowingStrategy) { + return new SimplePCollectionView<>( + pCollection, + tag, + new ListViewFn2<>(typeDescriptorSupplier), + windowingStrategy.getWindowFn().getDefaultWindowMappingFn(), + windowingStrategy); + } + /** * Returns a {@code PCollectionView>} capable of processing elements windowed using the * provided {@link WindowingStrategy}. 
@@ -190,11 +210,13 @@ public static PCollectionView> listView( */ @Deprecated public static PCollectionView> listViewUsingVoidKey( + TupleTag> tag, PCollection> pCollection, TypeDescriptorSupplier typeDescriptorSupplier, WindowingStrategy windowingStrategy) { return new SimplePCollectionView<>( pCollection, + tag, new ListViewFn<>(typeDescriptorSupplier), windowingStrategy.getWindowFn().getDefaultWindowMappingFn(), windowingStrategy); @@ -244,12 +266,14 @@ public static PCollectionView> mapView */ @Deprecated public static PCollectionView> mapViewUsingVoidKey( + TupleTag>> tag, PCollection>> pCollection, TypeDescriptorSupplier keyTypeDescriptorSupplier, TypeDescriptorSupplier valueTypeDescriptorSupplier, WindowingStrategy windowingStrategy) { return new SimplePCollectionView<>( pCollection, + tag, new MapViewFn<>(keyTypeDescriptorSupplier, valueTypeDescriptorSupplier), windowingStrategy.getWindowFn().getDefaultWindowMappingFn(), windowingStrategy); @@ -280,12 +304,14 @@ public static PCollectionView @Deprecated public static PCollectionView>> multimapViewUsingVoidKey( + TupleTag>> tag, PCollection>> pCollection, TypeDescriptorSupplier keyTypeDescriptorSupplier, TypeDescriptorSupplier valueTypeDescriptorSupplier, WindowingStrategy windowingStrategy) { return new SimplePCollectionView<>( pCollection, + tag, new MultimapViewFn<>(keyTypeDescriptorSupplier, valueTypeDescriptorSupplier), windowingStrategy.getWindowFn().getDefaultWindowMappingFn(), windowingStrategy); @@ -313,7 +339,9 @@ public static Map, PValue> toAdditionalInputs(Iterable{@link SingletonViewFn} is meant to be removed in the future and replaced with this class. */ @Experimental(Kind.CORE_RUNNERS_ONLY) - private static class SingletonViewFn2 extends ViewFn, T> { + @Internal + public static class SingletonViewFn2 extends ViewFn, T> + implements HasDefaultValue { private byte @Nullable [] encodedDefaultValue; private transient @Nullable T defaultValue; private @Nullable Coder valueCoder; @@ -351,6 +379,7 @@ public boolean hasDefault() { * * @throws NoSuchElementException if no default was specified. */ + @Override public T getDefaultValue() { if (!hasDefault) { throw new NoSuchElementException("Empty PCollection accessed as a singleton view."); @@ -394,6 +423,11 @@ public TypeDescriptor getTypeDescriptor() { } } + @Internal + public interface HasDefaultValue { + T getDefaultValue(); + } + /** * Implementation which is able to adapt a multimap materialization to a {@code T}. * @@ -403,7 +437,8 @@ public TypeDescriptor getTypeDescriptor() { */ @Deprecated @Experimental(Kind.CORE_RUNNERS_ONLY) - public static class SingletonViewFn extends ViewFn, T> { + public static class SingletonViewFn extends ViewFn, T> + implements HasDefaultValue { private byte @Nullable [] encodedDefaultValue; private transient @Nullable T defaultValue; private @Nullable Coder valueCoder; @@ -441,6 +476,7 @@ public boolean hasDefault() { * * @throws NoSuchElementException if no default was specified. */ + @Override public T getDefaultValue() { if (!hasDefault) { throw new NoSuchElementException("Empty PCollection accessed as a singleton view."); @@ -494,7 +530,8 @@ public TypeDescriptor getTypeDescriptor() { *

    {@link IterableViewFn} is meant to be removed in the future and replaced with this class. */ @Experimental(Kind.CORE_RUNNERS_ONLY) - private static class IterableViewFn2 extends ViewFn, Iterable> { + @Internal + public static class IterableViewFn2 extends ViewFn, Iterable> { private TypeDescriptorSupplier typeDescriptorSupplier; public IterableViewFn2(TypeDescriptorSupplier typeDescriptorSupplier) { @@ -560,7 +597,7 @@ public TypeDescriptor> getTypeDescriptor() { */ @Experimental(Kind.CORE_RUNNERS_ONLY) @VisibleForTesting - static class ListViewFn2 + public static class ListViewFn2 extends ViewFn>, List> { private TypeDescriptorSupplier typeDescriptorSupplier; @@ -1004,7 +1041,8 @@ public int hashCode() { *

    {@link MultimapViewFn} is meant to be removed in the future and replaced with this class. */ @Experimental(Kind.CORE_RUNNERS_ONLY) - private static class MultimapViewFn2 + @Internal + public static class MultimapViewFn2 extends ViewFn, Map>> { private TypeDescriptorSupplier keyTypeDescriptorSupplier; private TypeDescriptorSupplier valueTypeDescriptorSupplier; @@ -1092,7 +1130,8 @@ public TypeDescriptor>> getTypeDescriptor() { * *

    {@link MapViewFn} is meant to be removed in the future and replaced with this class. */ - private static class MapViewFn2 extends ViewFn, Map> { + @Internal + public static class MapViewFn2 extends ViewFn, Map> { private TypeDescriptorSupplier keyTypeDescriptorSupplier; private TypeDescriptorSupplier valueTypeDescriptorSupplier; @@ -1203,9 +1242,6 @@ private SimplePCollectionView( WindowingStrategy windowingStrategy) { super(pCollection.getPipeline()); this.pCollection = pCollection; - if (windowingStrategy.getWindowFn() instanceof InvalidWindows) { - throw new IllegalArgumentException("WindowFn of PCollectionView cannot be InvalidWindows"); - } this.windowMappingFn = windowMappingFn; this.tag = tag; this.windowingStrategy = windowingStrategy; @@ -1283,7 +1319,13 @@ public boolean equals(@Nullable Object other) { @Override public String toString() { - return MoreObjects.toStringHelper(this).add("tag", tag).toString(); + return MoreObjects.toStringHelper(this) + .add("tag", tag) + .add("viewFn", viewFn) + .add("coder", coder) + .add("windowMappingFn", windowMappingFn) + .add("pCollection", pCollection) + .toString(); } @Override diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/values/Row.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/values/Row.java index 93544f599e1e..9aa927787ad3 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/values/Row.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/values/Row.java @@ -48,7 +48,6 @@ import org.apache.beam.sdk.values.RowUtils.FieldOverrides; import org.apache.beam.sdk.values.RowUtils.RowFieldMatcher; import org.apache.beam.sdk.values.RowUtils.RowPosition; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; import org.checkerframework.checker.nullness.qual.Nullable; import org.joda.time.DateTime; @@ -579,7 +578,60 @@ static int deepHashCodeForIterable(Iterable a, Schema.FieldType elementT @Override public String toString() { - return "Row:" + Arrays.deepToString(Iterables.toArray(getValues(), Object.class)); + return toString(true); + } + + /** Convert Row to String. 
*/ + public String toString(boolean includeFieldNames) { + StringBuilder builder = new StringBuilder(); + builder.append("Row: "); + builder.append(System.lineSeparator()); + for (int i = 0; i < getSchema().getFieldCount(); ++i) { + Schema.Field field = getSchema().getField(i); + if (includeFieldNames) { + builder.append(field.getName() + ":"); + } + builder.append(toString(field.getType(), getValue(i), includeFieldNames)); + builder.append(System.lineSeparator()); + } + return builder.toString(); + } + + private String toString(Schema.FieldType fieldType, Object value, boolean includeFieldNames) { + StringBuilder builder = new StringBuilder(); + switch (fieldType.getTypeName()) { + case ARRAY: + case ITERABLE: + builder.append("["); + for (Object element : (Iterable) value) { + builder.append( + toString(fieldType.getCollectionElementType(), element, includeFieldNames)); + builder.append(", "); + } + builder.append("]"); + break; + case MAP: + builder.append("{"); + for (Map.Entry entry : ((Map) value).entrySet()) { + builder.append("("); + builder.append(toString(fieldType.getMapKeyType(), entry.getKey(), includeFieldNames)); + builder.append(", "); + builder.append( + toString(fieldType.getMapValueType(), entry.getValue(), includeFieldNames)); + builder.append("), "); + } + builder.append("}"); + break; + case BYTES: + builder.append(Arrays.toString((byte[]) value)); + break; + case ROW: + builder.append(((Row) value).toString(includeFieldNames)); + break; + default: + builder.append(value); + } + return builder.toString(); } /** @@ -795,23 +847,19 @@ public Row build() { + " fields."); } - FieldOverrides fieldOverrides = new FieldOverrides(schema); - fieldOverrides.setOverrides(this.values); - - Row row; - if (!fieldOverrides.isEmpty()) { - row = - (Row) - new RowFieldMatcher() - .match( - new CapturingRowCases(schema, fieldOverrides), - FieldType.row(schema), - new RowPosition(FieldAccessDescriptor.create()), - null); - } else { - row = new RowWithStorage(schema, Collections.emptyList()); + if (!values.isEmpty()) { + FieldOverrides fieldOverrides = new FieldOverrides(schema, this.values); + if (!fieldOverrides.isEmpty()) { + return (Row) + new RowFieldMatcher() + .match( + new CapturingRowCases(schema, fieldOverrides), + FieldType.row(schema), + new RowPosition(FieldAccessDescriptor.create()), + null); + } } - return row; + return new RowWithStorage(schema, Collections.emptyList()); } } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/values/RowUtils.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/values/RowUtils.java index d3b1c21852d9..a5f62103837a 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/values/RowUtils.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/values/RowUtils.java @@ -236,6 +236,11 @@ static class FieldOverrides { this.rootSchema = rootSchema; } + FieldOverrides(Schema rootSchema, List overrides) { + this.topNode = new FieldAccessNode(rootSchema, overrides); + this.rootSchema = rootSchema; + } + boolean isEmpty() { return topNode.isEmpty(); } @@ -270,6 +275,14 @@ private static class FieldAccessNode { nestedAccess = Lists.newArrayList(); } + FieldAccessNode(Schema schema, List overrides) { + fieldOverrides = Lists.newArrayListWithExpectedSize(schema.getFieldCount()); + for (Object value : overrides) { + fieldOverrides.add(new FieldOverride(value)); + } + nestedAccess = Lists.newArrayList(); + } + boolean isEmpty() { return fieldOverrides.isEmpty() && nestedAccess.isEmpty(); } diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/values/WindowingStrategy.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/values/WindowingStrategy.java index a66b22958074..5ec10e073aa0 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/values/WindowingStrategy.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/values/WindowingStrategy.java @@ -73,6 +73,7 @@ public enum AccumulationMode { private final OnTimeBehavior onTimeBehavior; private final TimestampCombiner timestampCombiner; private final String environmentId; + private final boolean alreadyMerged; private final boolean triggerSpecified; private final boolean modeSpecified; private final boolean allowedLatenessSpecified; @@ -90,7 +91,8 @@ private WindowingStrategy( boolean timestampCombinerSpecified, ClosingBehavior closingBehavior, OnTimeBehavior onTimeBehavior, - String environmentId) { + String environmentId, + boolean alreadyMerged) { this.windowFn = windowFn; this.trigger = trigger; this.triggerSpecified = triggerSpecified; @@ -103,6 +105,7 @@ private WindowingStrategy( this.timestampCombiner = timestampCombiner; this.timestampCombinerSpecified = timestampCombinerSpecified; this.environmentId = environmentId; + this.alreadyMerged = alreadyMerged; } /** Return a fully specified, default windowing strategy. */ @@ -123,7 +126,8 @@ public static WindowingStrategy of(WindowFn getWindowFn() { @@ -154,6 +158,14 @@ public boolean isModeSpecified() { return modeSpecified; } + public boolean isAlreadyMerged() { + return alreadyMerged; + } + + public boolean needsMerge() { + return !getWindowFn().isNonMerging() && !isAlreadyMerged(); + } + public ClosingBehavior getClosingBehavior() { return closingBehavior; } @@ -191,7 +203,8 @@ public WindowingStrategy withTrigger(Trigger trigger) { timestampCombinerSpecified, closingBehavior, onTimeBehavior, - environmentId); + environmentId, + alreadyMerged); } /** @@ -211,7 +224,8 @@ public WindowingStrategy withMode(AccumulationMode mode) { timestampCombinerSpecified, closingBehavior, onTimeBehavior, - environmentId); + environmentId, + alreadyMerged); } /** @@ -234,7 +248,8 @@ public WindowingStrategy withWindowFn(WindowFn wildcardWindowFn) { timestampCombinerSpecified, closingBehavior, onTimeBehavior, - environmentId); + environmentId, + alreadyMerged); } /** @@ -254,7 +269,8 @@ public WindowingStrategy withAllowedLateness(Duration allowedLateness) { timestampCombinerSpecified, closingBehavior, onTimeBehavior, - environmentId); + environmentId, + alreadyMerged); } public WindowingStrategy withClosingBehavior(ClosingBehavior closingBehavior) { @@ -270,7 +286,8 @@ public WindowingStrategy withClosingBehavior(ClosingBehavior closingBehavi timestampCombinerSpecified, closingBehavior, onTimeBehavior, - environmentId); + environmentId, + alreadyMerged); } public WindowingStrategy withOnTimeBehavior(OnTimeBehavior onTimeBehavior) { @@ -286,7 +303,8 @@ public WindowingStrategy withOnTimeBehavior(OnTimeBehavior onTimeBehavior) timestampCombinerSpecified, closingBehavior, onTimeBehavior, - environmentId); + environmentId, + alreadyMerged); } @Experimental(Kind.OUTPUT_TIME) @@ -304,7 +322,8 @@ public WindowingStrategy withTimestampCombiner(TimestampCombiner timestamp true, closingBehavior, onTimeBehavior, - environmentId); + environmentId, + alreadyMerged); } public WindowingStrategy withEnvironmentId(String environmentId) { @@ -320,7 +339,25 @@ public WindowingStrategy withEnvironmentId(String environmentId) { timestampCombinerSpecified, closingBehavior, onTimeBehavior, - 
environmentId); + environmentId, + alreadyMerged); + } + + public WindowingStrategy withAlreadyMerged(boolean alreadyMerged) { + return new WindowingStrategy<>( + windowFn, + trigger, + triggerSpecified, + mode, + modeSpecified, + allowedLateness, + allowedLatenessSpecified, + timestampCombiner, + timestampCombinerSpecified, + closingBehavior, + onTimeBehavior, + environmentId, + alreadyMerged); } @Override @@ -332,6 +369,7 @@ public String toString() { .add("accumulationMode", mode) .add("timestampCombiner", timestampCombiner) .add("environmentId", environmentId) + .add("alreadyMerged", alreadyMerged) .toString(); } @@ -344,6 +382,7 @@ public boolean equals(@Nullable Object object) { return isAllowedLatenessSpecified() == other.isAllowedLatenessSpecified() && isModeSpecified() == other.isModeSpecified() && isTimestampCombinerSpecified() == other.isTimestampCombinerSpecified() + && isAlreadyMerged() == other.isAlreadyMerged() && getMode().equals(other.getMode()) && getAllowedLateness().equals(other.getAllowedLateness()) && getClosingBehavior().equals(other.getClosingBehavior()) @@ -366,7 +405,8 @@ public int hashCode() { trigger, timestampCombiner, windowFn, - environmentId); + environmentId, + alreadyMerged); } /** @@ -387,6 +427,7 @@ public WindowingStrategy fixDefaults() { true, closingBehavior, onTimeBehavior, - environmentId); + environmentId, + alreadyMerged); } } diff --git a/sdks/java/core/src/main/resources/org/apache/beam/sdk/sdk.properties b/sdks/java/core/src/main/resources/org/apache/beam/sdk/sdk.properties index 3320a4c49357..36f0ce9eaf5e 100644 --- a/sdks/java/core/src/main/resources/org/apache/beam/sdk/sdk.properties +++ b/sdks/java/core/src/main/resources/org/apache/beam/sdk/sdk.properties @@ -18,6 +18,5 @@ version=@pom.version@ sdk_version=@pom.sdk_version@ - -build.date=@timestamp@ - +docker_image_default_repo_root=@pom.docker_image_default_repo_root@ +docker_image_default_repo_prefix=@pom.docker_image_default_repo_prefix@ diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/PipelineTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/PipelineTest.java index 2a9b11fad327..c409870f6c18 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/PipelineTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/PipelineTest.java @@ -17,6 +17,7 @@ */ package org.apache.beam.sdk; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.anyOf; import static org.hamcrest.Matchers.containsString; import static org.hamcrest.Matchers.hasItem; @@ -24,7 +25,6 @@ import static org.hamcrest.Matchers.isA; import static org.hamcrest.Matchers.not; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import static org.mockito.Mockito.mock; @@ -83,7 +83,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class PipelineTest { diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/TestUtils.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/TestUtils.java index 644bf1ba54f1..d37b50e1ff8d 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/TestUtils.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/TestUtils.java @@ -25,9 +25,6 @@ import org.hamcrest.TypeSafeMatcher; /** Utilities for tests. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TestUtils { // Do not instantiate. private TestUtils() {} diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/AvroCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/AvroCoderTest.java index 2b493097acc0..d7886c30b661 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/AvroCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/AvroCoderTest.java @@ -17,10 +17,12 @@ */ package org.apache.beam.sdk.coders; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsString; import static org.hamcrest.Matchers.equalTo; import static org.junit.Assert.assertArrayEquals; -import static org.junit.Assert.assertThat; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertTrue; import static org.junit.Assert.fail; import com.esotericsoftware.kryo.Kryo; @@ -30,6 +32,7 @@ import java.io.ByteArrayOutputStream; import java.io.ObjectInputStream; import java.io.ObjectOutputStream; +import java.nio.ByteBuffer; import java.util.ArrayList; import java.util.Collection; import java.util.HashSet; @@ -52,9 +55,14 @@ import org.apache.avro.reflect.Stringable; import org.apache.avro.reflect.Union; import org.apache.avro.specific.SpecificData; +import org.apache.avro.specific.SpecificRecord; import org.apache.avro.util.Utf8; import org.apache.beam.sdk.coders.Coder.Context; import org.apache.beam.sdk.coders.Coder.NonDeterministicException; +import org.apache.beam.sdk.schemas.TestAvro; +import org.apache.beam.sdk.schemas.TestAvroNested; +import org.apache.beam.sdk.schemas.TestEnum; +import org.apache.beam.sdk.schemas.fixed4; import org.apache.beam.sdk.testing.CoderProperties; import org.apache.beam.sdk.testing.InterceptingUrlClassLoader; import org.apache.beam.sdk.testing.NeedsRunner; @@ -68,6 +76,8 @@ import org.apache.beam.sdk.util.SerializableUtils; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.TypeDescriptor; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.checkerframework.checker.nullness.qual.Nullable; import org.hamcrest.Description; import org.hamcrest.Matcher; @@ -75,7 +85,7 @@ import org.hamcrest.TypeSafeMatcher; import org.joda.time.DateTime; import org.joda.time.DateTimeZone; -import org.junit.Assert; +import org.joda.time.LocalDate; import org.junit.Rule; import org.junit.Test; import org.junit.experimental.categories.Category; @@ -85,15 +95,29 @@ /** Tests for {@link AvroCoder}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class AvroCoderTest { public static final DateTime DATETIME_A = new DateTime().withDate(1994, 10, 31).withZone(DateTimeZone.UTC); public static final DateTime DATETIME_B = new DateTime().withDate(1997, 4, 25).withZone(DateTimeZone.UTC); + private static final TestAvroNested AVRO_NESTED_SPECIFIC_RECORD = new TestAvroNested(true, 42); + private static final TestAvro AVRO_SPECIFIC_RECORD = + new TestAvro( + true, + 43, + 44L, + 44.1f, + 44.2d, + "mystring", + ByteBuffer.wrap(new byte[] {1, 2, 3, 4}), + new fixed4(new byte[] {1, 2, 3, 4}), + new LocalDate(1979, 3, 14), + new DateTime().withDate(1979, 3, 14).withTime(1, 2, 3, 4), + TestEnum.abc, + AVRO_NESTED_SPECIFIC_RECORD, + ImmutableList.of(AVRO_NESTED_SPECIFIC_RECORD, AVRO_NESTED_SPECIFIC_RECORD), + ImmutableMap.of("k1", AVRO_NESTED_SPECIFIC_RECORD, "k2", AVRO_NESTED_SPECIFIC_RECORD)); @DefaultCoder(AvroCoder.class) private static class Pojo { @@ -297,6 +321,27 @@ public void testPojoEncoding() throws Exception { CoderProperties.coderDecodeEncodeEqual(coder, value); } + @Test + public void testSpecificRecordEncoding() throws Exception { + AvroCoder coder = AvroCoder.of(TestAvro.class, AVRO_SPECIFIC_RECORD.getSchema()); + + assertTrue(SpecificRecord.class.isAssignableFrom(coder.getType())); + CoderProperties.coderDecodeEncodeEqual(coder, AVRO_SPECIFIC_RECORD); + } + + @Test + public void testReflectRecordEncoding() throws Exception { + AvroCoder coder = AvroCoder.of(TestAvro.class, true); + AvroCoder coderWithSchema = + AvroCoder.of(TestAvro.class, AVRO_SPECIFIC_RECORD.getSchema(), true); + + assertTrue(SpecificRecord.class.isAssignableFrom(coder.getType())); + assertTrue(SpecificRecord.class.isAssignableFrom(coderWithSchema.getType())); + + CoderProperties.coderDecodeEncodeEqual(coder, AVRO_SPECIFIC_RECORD); + CoderProperties.coderDecodeEncodeEqual(coderWithSchema, AVRO_SPECIFIC_RECORD); + } + @Test public void testGenericRecordEncoding() throws Exception { String schemaString = @@ -319,7 +364,7 @@ public void testGenericRecordEncoding() throws Exception { AvroCoder coder = AvroCoder.of(GenericRecord.class, schema); CoderProperties.coderDecodeEncodeEqual(coder, before); - Assert.assertEquals(schema, coder.getSchema()); + assertEquals(schema, coder.getSchema()); } @Test @@ -341,10 +386,10 @@ public void testEncodingNotBuffered() throws Exception { ByteArrayInputStream inStream = new ByteArrayInputStream(outStream.toByteArray()); Pojo after = coder.decode(inStream, context); - Assert.assertEquals(before, after); + assertEquals(before, after); Integer intAfter = intCoder.decode(inStream, context); - Assert.assertEquals(Integer.valueOf(10), intAfter); + assertEquals(Integer.valueOf(10), intAfter); } @Test @@ -369,6 +414,14 @@ public void testAvroCoderIsSerializable() throws Exception { SerializableUtils.ensureSerializable(coder); } + @Test + public void testAvroReflectCoderIsSerializable() throws Exception { + AvroCoder coder = AvroCoder.of(Pojo.class, true); + + // Check that the coder is serializable using the regular JSON approach. 
+ SerializableUtils.ensureSerializable(coder); + } + private void assertDeterministic(AvroCoder coder) { try { coder.verifyDeterministic(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/AvroCoderTestPojo.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/AvroCoderTestPojo.java index 2b9ec5ee5bcc..d48ae9091f7c 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/AvroCoderTestPojo.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/AvroCoderTestPojo.java @@ -22,9 +22,6 @@ import org.checkerframework.checker.nullness.qual.Nullable; /** A Pojo at the top level for use in tests. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) class AvroCoderTestPojo { public String text; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/BigDecimalCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/BigDecimalCoderTest.java index 9411e8a98675..f52b433d79d5 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/BigDecimalCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/BigDecimalCoderTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.coders; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.math.BigDecimal; import org.apache.beam.sdk.testing.CoderProperties; @@ -33,9 +33,6 @@ /** Test case for {@link BigDecimalCoder}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigDecimalCoderTest { @Rule public ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/BigEndianIntegerCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/BigEndianIntegerCoderTest.java index 877099d80421..d38b5cb02eb2 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/BigEndianIntegerCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/BigEndianIntegerCoderTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.coders; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.util.Arrays; import java.util.List; @@ -33,9 +33,6 @@ /** Test case for {@link BigEndianIntegerCoder}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigEndianIntegerCoderTest { private static final Coder TEST_CODER = BigEndianIntegerCoder.of(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/BigEndianLongCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/BigEndianLongCoderTest.java index f79dcb52a3d3..fa289b7eff3a 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/BigEndianLongCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/BigEndianLongCoderTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.coders; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.util.Arrays; import java.util.List; @@ -33,9 +33,6 @@ /** Test case for {@link BigEndianLongCoder}. 
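 * <p>Like the other coder tests in this package, it round-trips values with CoderProperties and
 * pins the base64-encoded wire format against golden values generated by {@link
 * PrintBase64Encodings}.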
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigEndianLongCoderTest { private static final Coder TEST_CODER = BigEndianLongCoder.of(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/BigIntegerCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/BigIntegerCoderTest.java index 8097de4d6175..860315c826ec 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/BigIntegerCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/BigIntegerCoderTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.coders; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.math.BigInteger; import org.apache.beam.sdk.testing.CoderProperties; @@ -33,9 +33,6 @@ /** Test case for {@link BigIntegerCoder}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigIntegerCoderTest { @Rule public ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/BitSetCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/BitSetCoderTest.java index d4fe5f462856..7d53d530e005 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/BitSetCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/BitSetCoderTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.coders; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.util.Arrays; import java.util.BitSet; @@ -34,9 +34,6 @@ /** Tests for {@link BitSetCoder}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BitSetCoderTest { private static final Coder TEST_CODER = BitSetCoder.of(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/ByteArrayCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/ByteArrayCoderTest.java index 61617e27bce3..fb676492ba17 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/ByteArrayCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/ByteArrayCoderTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.sdk.coders; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.not; -import static org.junit.Assert.assertThat; import java.io.ByteArrayOutputStream; import java.io.IOException; @@ -36,9 +36,6 @@ /** Unit tests for {@link ByteArrayCoder}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ByteArrayCoderTest { private static final ByteArrayCoder TEST_CODER = ByteArrayCoder.of(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/ByteCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/ByteCoderTest.java index a1059b1b16f4..51c0b5a51827 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/ByteCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/ByteCoderTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.coders; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.util.Arrays; import java.util.List; @@ -33,9 +33,6 @@ /** Test case for {@link ByteCoder}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ByteCoderTest { private static final Coder TEST_CODER = ByteCoder.of(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/CoderRegistryTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/CoderRegistryTest.java index 10d77decdc0f..f06d79132f58 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/CoderRegistryTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/CoderRegistryTest.java @@ -54,9 +54,6 @@ /** Tests for CoderRegistry. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class CoderRegistryTest { @Rule public TestPipeline pipeline = TestPipeline.create(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/CoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/CoderTest.java index 31f15a8f9b28..8d2010900bda 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/CoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/CoderTest.java @@ -17,12 +17,12 @@ */ package org.apache.beam.sdk.coders; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.containsString; import static org.hamcrest.Matchers.equalTo; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNotEquals; -import static org.junit.Assert.assertThat; import java.util.Arrays; import java.util.Collections; @@ -37,9 +37,6 @@ /** Tests for constructs defined within {@link Coder}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class CoderTest { @Rule public ExpectedException expectedException = ExpectedException.none(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/CollectionCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/CollectionCoderTest.java index 2bb9bcfbc670..bf507a02e2c5 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/CollectionCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/CollectionCoderTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.coders; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.util.ArrayList; import java.util.Arrays; @@ -38,9 +38,6 @@ /** Test case for {@link CollectionCoder}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class CollectionCoderTest { private static final Coder> TEST_CODER = CollectionCoder.of(VarIntCoder.of()); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/DefaultCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/DefaultCoderTest.java index f34dc35aeb86..b062a356c002 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/DefaultCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/DefaultCoderTest.java @@ -18,8 +18,8 @@ package org.apache.beam.sdk.coders; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.instanceOf; -import static org.junit.Assert.assertThat; import java.io.Serializable; import java.util.Collections; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/DelegateCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/DelegateCoderTest.java index a659b7af0577..fdb3e876ef8a 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/DelegateCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/DelegateCoderTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.sdk.coders; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNotEquals; -import static org.junit.Assert.assertThat; import java.io.Serializable; import java.util.Arrays; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/DequeCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/DequeCoderTest.java new file mode 100644 index 000000000000..b2c1f5378365 --- /dev/null +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/DequeCoderTest.java @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+package org.apache.beam.sdk.coders;
+
+import static org.hamcrest.MatcherAssert.assertThat;
+import static org.hamcrest.Matchers.equalTo;
+
+import java.util.ArrayDeque;
+import java.util.Collections;
+import java.util.Deque;
+import java.util.List;
+import org.apache.beam.sdk.testing.CoderProperties;
+import org.apache.beam.sdk.transforms.windowing.GlobalWindow;
+import org.apache.beam.sdk.util.CoderUtils;
+import org.apache.beam.sdk.values.TypeDescriptor;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.ExpectedException;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Test case for {@link DequeCoder}. */
+@RunWith(JUnit4.class)
+public class DequeCoderTest {
+
+  private static final Coder<Deque<Integer>> TEST_CODER = DequeCoder.of(VarIntCoder.of());
+
+  private static final List<Deque<Integer>> TEST_VALUES =
+      ImmutableList.of(
+          new ArrayDeque<>(),
+          new ArrayDeque<>(Collections.singleton(13)),
+          new ArrayDeque<>(ImmutableList.of(31, -5, 83)));
+
+  @Test
+  public void testCoderIsSerializableWithWellKnownCoderType() throws Exception {
+    CoderProperties.coderSerializable(DequeCoder.of(GlobalWindow.Coder.INSTANCE));
+  }
+
+  @Test
+  public void testDecodeEncodeContentsEqual() throws Exception {
+    for (Deque<Integer> value : TEST_VALUES) {
+      CoderProperties.coderDecodeEncodeContentsEqual(TEST_CODER, value);
+    }
+  }
+
+  /**
+   * Generated data to check that the wire format has not changed. To regenerate, see {@link
+   * org.apache.beam.sdk.coders.PrintBase64Encodings}.
+   */
+  private static final List<String> TEST_ENCODINGS =
+      ImmutableList.of("AAAAAA", "AAAAAQ0", "AAAAAx_7____D1M");
+
+  @Test
+  public void testWireFormatEncode() throws Exception {
+    CoderProperties.coderEncodesBase64(TEST_CODER, TEST_VALUES, TEST_ENCODINGS);
+  }
+
+  @Rule public ExpectedException thrown = ExpectedException.none();
+
+  @Test
+  public void encodeNullThrowsCoderException() throws Exception {
+    thrown.expect(CoderException.class);
+    thrown.expectMessage("cannot encode a null Deque");
+
+    CoderUtils.encodeToBase64(TEST_CODER, null);
+  }
+
+  @Test
+  public void structuralValueDecodeEncodeEqualIterable() throws Exception {
+    DequeCoder<byte[]> coder = DequeCoder.of(ByteArrayCoder.of());
+    Deque<byte[]> value = new ArrayDeque<>(Collections.singletonList(new byte[] {1, 2, 3, 4}));
+    CoderProperties.structuralValueDecodeEncodeEqualIterable(coder, value);
+  }
+
+  @Test
+  public void encodeDequeWithList() throws Exception {
+    DequeCoder<List<Long>> listListLongCoder = DequeCoder.of(ListCoder.of(VarLongCoder.of()));
+
+    CoderProperties.coderDecodeEncodeContentsEqual(
+        listListLongCoder,
+        new ArrayDeque<>(
+            ImmutableList.of(
+                ImmutableList.of(18L, 15L), ImmutableList.of(19L, 25L), ImmutableList.of(22L))));
+  }
+
+  @Test
+  public void testEncodedTypeDescriptor() throws Exception {
+    TypeDescriptor<Deque<Integer>> typeDescriptor = new TypeDescriptor<Deque<Integer>>() {};
+    assertThat(TEST_CODER.getEncodedTypeDescriptor(), equalTo(typeDescriptor));
+  }
+}
diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/DoubleCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/DoubleCoderTest.java
index 9711c7824131..1133ee8bf4cf 100644
--- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/DoubleCoderTest.java
+++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/DoubleCoderTest.java
@@ -17,8 +17,8 @@
  */
 package org.apache.beam.sdk.coders;
 
+import static org.hamcrest.MatcherAssert.assertThat;
 import static org.hamcrest.Matchers.equalTo;
-import static org.junit.Assert.assertThat; import java.util.Arrays; import java.util.List; @@ -33,9 +33,6 @@ /** Test case for {@link DoubleCoder}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DoubleCoderTest { private static final Coder TEST_CODER = DoubleCoder.of(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/DurationCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/DurationCoderTest.java index cad73abddeeb..688dbd32ca86 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/DurationCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/DurationCoderTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.coders; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.util.Arrays; import java.util.List; @@ -36,9 +36,6 @@ /** Unit tests for {@link DurationCoder}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DurationCoderTest { private static final DurationCoder TEST_CODER = DurationCoder.of(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/FloatCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/FloatCoderTest.java index 94ee5b13f715..d77d3fff36d9 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/FloatCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/FloatCoderTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.coders; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.util.Arrays; import java.util.List; @@ -33,9 +33,6 @@ /** Test case for {@link FloatCoder}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FloatCoderTest { private static final Coder TEST_CODER = FloatCoder.of(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/InstantCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/InstantCoderTest.java index 442f045c3cd0..41f7077d3300 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/InstantCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/InstantCoderTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.coders; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.util.ArrayList; import java.util.Arrays; @@ -39,9 +39,6 @@ /** Unit tests for {@link InstantCoder}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class InstantCoderTest { private static final InstantCoder TEST_CODER = InstantCoder.of(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/IterableCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/IterableCoderTest.java index 57e1e253823e..73dff39d866d 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/IterableCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/IterableCoderTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.coders; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.util.ArrayList; import java.util.Arrays; @@ -36,9 +36,6 @@ /** Unit tests for {@link IterableCoder}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class IterableCoderTest { private static final Coder> TEST_CODER = IterableCoder.of(VarIntCoder.of()); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/KvCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/KvCoderTest.java index f5796d97474c..436f4c1ae910 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/KvCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/KvCoderTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.coders; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.util.Arrays; import java.util.Collections; @@ -36,9 +36,6 @@ /** Test case for {@link KvCoder}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class KvCoderTest { private static class CoderAndData { Coder coder; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/ListCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/ListCoderTest.java index 7f9136f13d02..e05cac578376 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/ListCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/ListCoderTest.java @@ -18,9 +18,9 @@ package org.apache.beam.sdk.coders; import static junit.framework.TestCase.assertTrue; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import java.util.ArrayList; import java.util.Arrays; @@ -38,9 +38,6 @@ /** Unit tests for {@link ListCoder}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ListCoderTest { private static final Coder> TEST_CODER = ListCoder.of(VarIntCoder.of()); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/MapCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/MapCoderTest.java index 07ffdbc6aee8..0777ae0ad615 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/MapCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/MapCoderTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.sdk.coders; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.util.Arrays; @@ -40,9 +40,6 @@ /** Unit tests for {@link MapCoder}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class MapCoderTest { private static final Coder> TEST_CODER = diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/NullableCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/NullableCoderTest.java index 2294aef70f45..d67b08e7c758 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/NullableCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/NullableCoderTest.java @@ -18,11 +18,11 @@ package org.apache.beam.sdk.coders; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.theInstance; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.io.ByteArrayInputStream; @@ -45,9 +45,6 @@ /** Unit tests for {@link NullableCoder}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class NullableCoderTest { private static final Coder TEST_CODER = NullableCoder.of(StringUtf8Coder.of()); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/PCollectionCustomCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/PCollectionCustomCoderTest.java index 769023a65c7c..78dd8625e437 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/PCollectionCustomCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/PCollectionCustomCoderTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.coders; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.collection.IsIterableContainingInAnyOrder.containsInAnyOrder; -import static org.junit.Assert.assertThat; import java.io.IOException; import java.io.InputStream; @@ -53,9 +53,6 @@ /** Tests for coder exception handling in runners. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PCollectionCustomCoderTest { private static final Logger LOG = LoggerFactory.getLogger(PCollectionCustomCoderTest.class); /** diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/PrintBase64Encodings.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/PrintBase64Encodings.java index 52ede71f36a4..5b97ecff2a8c 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/PrintBase64Encodings.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/PrintBase64Encodings.java @@ -35,9 +35,6 @@ * -Dexec.args='org.apache.beam.sdk.coders.BigEndianIntegerCoderTest.TEST_CODER \ * org.apache.beam.sdk.coders.BigEndianIntegerCoderTest.TEST_VALUES' } */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PrintBase64Encodings { /** Gets a field even if it is private, which the test data generally will be. */ diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/RowCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/RowCoderTest.java index 79202869bcaf..ca2d20c48d9e 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/RowCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/RowCoderTest.java @@ -17,6 +17,10 @@ */ package org.apache.beam.sdk.coders; +import static org.junit.Assert.assertEquals; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; import java.math.BigDecimal; import java.util.Arrays; import java.util.Collections; @@ -31,6 +35,7 @@ import org.apache.beam.sdk.testing.CoderProperties; import org.apache.beam.sdk.values.Row; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; import org.joda.time.DateTime; import org.joda.time.DateTimeZone; @@ -359,4 +364,133 @@ public void testConsistentWithEqualsMapWithNull() throws Exception { Row row = Row.withSchema(schema).addValue(Collections.singletonMap(1, null)).build(); CoderProperties.coderDecodeEncodeEqual(RowCoder.of(schema), row); } + + @Test + public void testEncodingPositionReorderFields() throws Exception { + Schema schema1 = + Schema.builder() + .addNullableField("f_int32", FieldType.INT32) + .addNullableField("f_string", FieldType.STRING) + .build(); + Schema schema2 = + Schema.builder() + .addNullableField("f_string", FieldType.STRING) + .addNullableField("f_int32", FieldType.INT32) + .build(); + schema2.setEncodingPositions(ImmutableMap.of("f_int32", 0, "f_string", 1)); + Row row = + Row.withSchema(schema1) + .withFieldValue("f_int32", 42) + .withFieldValue("f_string", "hello world!") + .build(); + + Row expected = + Row.withSchema(schema2) + .withFieldValue("f_int32", 42) + .withFieldValue("f_string", "hello world!") + .build(); + + ByteArrayOutputStream os = new ByteArrayOutputStream(); + RowCoder.of(schema1).encode(row, os); + Row decoded = RowCoder.of(schema2).decode(new ByteArrayInputStream(os.toByteArray())); + assertEquals(expected, decoded); + } + + @Test + public void testEncodingPositionAddNewFields() throws Exception { + Schema schema1 = + Schema.builder() + .addNullableField("f_int32", FieldType.INT32) + .addNullableField("f_string", FieldType.STRING) + .build(); + Schema schema2 = + Schema.builder() 
+ .addNullableField("f_int32", FieldType.INT32) + .addNullableField("f_string", FieldType.STRING) + .addNullableField("f_boolean", FieldType.BOOLEAN) + .build(); + Row row = + Row.withSchema(schema1) + .withFieldValue("f_int32", 42) + .withFieldValue("f_string", "hello world!") + .build(); + + Row expected = + Row.withSchema(schema2) + .withFieldValue("f_int32", 42) + .withFieldValue("f_string", "hello world!") + .build(); + + ByteArrayOutputStream os = new ByteArrayOutputStream(); + RowCoder.of(schema1).encode(row, os); + Row decoded = RowCoder.of(schema2).decode(new ByteArrayInputStream(os.toByteArray())); + assertEquals(expected, decoded); + } + + @Test + public void testEncodingPositionAddNewFieldsAndReorderExisting() throws Exception { + Schema schema1 = + Schema.builder() + .addNullableField("f_int32", FieldType.INT32) + .addNullableField("f_string", FieldType.STRING) + .build(); + Schema schema2 = + Schema.builder() + .addNullableField("f_int32", FieldType.INT32) + .addNullableField("f_boolean", FieldType.BOOLEAN) + .addNullableField("f_string", FieldType.STRING) + .build(); + schema2.setEncodingPositions(ImmutableMap.of("f_int32", 0, "f_string", 1, "f_boolean", 2)); + + Row row = + Row.withSchema(schema1) + .withFieldValue("f_int32", 42) + .withFieldValue("f_string", "hello world!") + .build(); + + Row expected = + Row.withSchema(schema2) + .withFieldValue("f_int32", 42) + .withFieldValue("f_string", "hello world!") + .build(); + + ByteArrayOutputStream os = new ByteArrayOutputStream(); + RowCoder.of(schema1).encode(row, os); + Row decoded = RowCoder.of(schema2).decode(new ByteArrayInputStream(os.toByteArray())); + assertEquals(expected, decoded); + } + + @Test + public void testEncodingPositionRemoveFields() throws Exception { + Schema schema1 = + Schema.builder() + .addNullableField("f_int32", FieldType.INT32) + .addNullableField("f_string", FieldType.STRING) + .addNullableField("f_boolean", FieldType.BOOLEAN) + .build(); + + Schema schema2 = + Schema.builder() + .addNullableField("f_int32", FieldType.INT32) + .addNullableField("f_string", FieldType.STRING) + .build(); + + Row row = + Row.withSchema(schema1) + .withFieldValue("f_int32", 42) + .withFieldValue("f_string", "hello world!") + .withFieldValue("f_boolean", true) + .build(); + + Row expected = + Row.withSchema(schema2) + .withFieldValue("f_int32", 42) + .withFieldValue("f_string", "hello world!") + .build(); + + ByteArrayOutputStream os = new ByteArrayOutputStream(); + RowCoder.of(schema1).encode(row, os); + Row decoded = RowCoder.of(schema2).decode(new ByteArrayInputStream(os.toByteArray())); + assertEquals(expected, decoded); + } } diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/SerializableCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/SerializableCoderTest.java index 36ae842109f3..38b982a09378 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/SerializableCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/SerializableCoderTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.sdk.coders; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.instanceOf; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNull; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.mock; import java.io.ByteArrayInputStream; @@ -59,9 +59,6 @@ /** Tests SerializableCoder. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SerializableCoderTest implements Serializable { @Rule public ExpectedLogs expectedLogs = ExpectedLogs.none(SerializableCoder.class); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/SetCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/SetCoderTest.java index 3359379565e9..3cce288571a7 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/SetCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/SetCoderTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.coders; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.util.Arrays; import java.util.Collections; @@ -37,9 +37,6 @@ /** Test case for {@link SetCoder}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SetCoderTest { private static final Coder> TEST_CODER = SetCoder.of(VarIntCoder.of()); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/StringDelegateCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/StringDelegateCoderTest.java index f88146124af7..f5dba3a777f0 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/StringDelegateCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/StringDelegateCoderTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.coders; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.net.URI; import java.util.Arrays; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/StringUtf8CoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/StringUtf8CoderTest.java index ba211dda1c8a..c51a3f7d8d9e 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/StringUtf8CoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/StringUtf8CoderTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.coders; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.util.Arrays; import java.util.List; @@ -33,9 +33,6 @@ /** Test case for {@link StringUtf8Coder}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class StringUtf8CoderTest { private static final Coder TEST_CODER = StringUtf8Coder.of(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/StructuredCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/StructuredCoderTest.java index c58475a3576e..b3a20a62391e 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/StructuredCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/StructuredCoderTest.java @@ -17,6 +17,8 @@ */ package org.apache.beam.sdk.coders; +import static org.hamcrest.MatcherAssert.assertThat; + import java.io.IOException; import java.io.InputStream; import java.io.OutputStream; @@ -28,16 +30,12 @@ import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.checkerframework.checker.nullness.qual.Nullable; import org.hamcrest.CoreMatchers; -import org.junit.Assert; import org.junit.Test; import org.junit.runner.RunWith; import org.junit.runners.JUnit4; /** Test case for {@link StructuredCoder}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class StructuredCoderTest { /** A coder for nullable {@code Boolean} values that is consistent with equals. */ @@ -170,7 +168,7 @@ public void testStructuralValue() throws Exception { /** Test for verifying {@link StructuredCoder#toString()}. */ @Test public void testToString() { - Assert.assertThat( + assertThat( new ObjectIdentityBooleanCoder().toString(), CoreMatchers.equalTo("StructuredCoderTest$ObjectIdentityBooleanCoder")); @@ -184,21 +182,21 @@ public List> getCoderArguments() { } }; - Assert.assertThat( + assertThat( coderWithArgs.toString(), CoreMatchers.equalTo("StructuredCoderTest$1(BigDecimalCoder,BigIntegerCoder)")); } @Test public void testGenericStandardCoderFallsBackToT() throws Exception { - Assert.assertThat( + assertThat( new Foo().getEncodedTypeDescriptor().getType(), CoreMatchers.not(TypeDescriptor.of(String.class).getType())); } @Test public void testGenericStandardCoder() throws Exception { - Assert.assertThat( + assertThat( new FooTwo().getEncodedTypeDescriptor(), CoreMatchers.equalTo(TypeDescriptor.of(String.class))); } diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/TextualIntegerCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/TextualIntegerCoderTest.java index f8cd5350a337..b36c574b2ad3 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/TextualIntegerCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/TextualIntegerCoderTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.coders; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.util.Arrays; import java.util.List; @@ -33,9 +33,6 @@ /** Test case for {@link TextualIntegerCoder}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TextualIntegerCoderTest { private static final Coder TEST_CODER = TextualIntegerCoder.of(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/TimestampPrefixingWindowCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/TimestampPrefixingWindowCoderTest.java new file mode 100644 index 000000000000..a9f8123505aa --- /dev/null +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/TimestampPrefixingWindowCoderTest.java @@ -0,0 +1,185 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.coders; + +import static org.hamcrest.MatcherAssert.assertThat; +import static org.hamcrest.Matchers.equalTo; + +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.util.List; +import java.util.Objects; +import org.apache.beam.sdk.testing.CoderProperties; +import org.apache.beam.sdk.testing.CoderProperties.TestElementByteSizeObserver; +import org.apache.beam.sdk.transforms.windowing.BoundedWindow; +import org.apache.beam.sdk.transforms.windowing.GlobalWindow; +import org.apache.beam.sdk.transforms.windowing.IntervalWindow; +import org.apache.beam.sdk.util.common.ElementByteSizeObserver; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; +import org.checkerframework.checker.nullness.qual.Nullable; +import org.joda.time.Instant; +import org.junit.Test; + +public class TimestampPrefixingWindowCoderTest { + + private static class CustomWindow extends IntervalWindow { + private boolean isBig; + + CustomWindow(Instant start, Instant end, boolean isBig) { + super(start, end); + this.isBig = isBig; + } + + @Override + public boolean equals(@Nullable Object o) { + if (this == o) { + return true; + } + if (o == null || getClass() != o.getClass()) { + return false; + } + CustomWindow that = (CustomWindow) o; + return super.equals(o) && this.isBig == that.isBig; + } + + @Override + public int hashCode() { + return Objects.hash(super.hashCode(), isBig); + } + } + + private static class CustomWindowCoder extends CustomCoder { + + private static final Coder INTERVAL_WINDOW_CODER = IntervalWindow.getCoder(); + private static final int REGISTER_BYTE_SIZE = 1234; + private final boolean isConsistentWithEqual; + private final boolean isRegisterByteSizeCheap; + + public static CustomWindowCoder of( + boolean isConsistentWithEqual, boolean isRegisterByteSizeCheap) { + return new CustomWindowCoder(isConsistentWithEqual, isRegisterByteSizeCheap); + } + + private CustomWindowCoder(boolean isConsistentWithEqual, boolean isRegisterByteSizeCheap) { + this.isConsistentWithEqual = isConsistentWithEqual; + 
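+      // Tests flip these two flags to verify that TimestampPrefixingWindowCoder defers
+      // consistentWithEquals() and isRegisterByteSizeObserverCheap() to the wrapped window coder.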
this.isRegisterByteSizeCheap = isRegisterByteSizeCheap; + } + + @Override + public void encode(CustomWindow window, OutputStream outStream) throws IOException { + INTERVAL_WINDOW_CODER.encode(window, outStream); + BooleanCoder.of().encode(window.isBig, outStream); + } + + @Override + public CustomWindow decode(InputStream inStream) throws IOException { + IntervalWindow superWindow = INTERVAL_WINDOW_CODER.decode(inStream); + boolean isBig = BooleanCoder.of().decode(inStream); + return new CustomWindow(superWindow.start(), superWindow.end(), isBig); + } + + @Override + public void verifyDeterministic() throws NonDeterministicException { + INTERVAL_WINDOW_CODER.verifyDeterministic(); + BooleanCoder.of().verifyDeterministic(); + } + + @Override + public boolean consistentWithEquals() { + return isConsistentWithEqual; + } + + @Override + public boolean isRegisterByteSizeObserverCheap(CustomWindow value) { + return isRegisterByteSizeCheap; + } + + @Override + public void registerByteSizeObserver(CustomWindow value, ElementByteSizeObserver observer) + throws Exception { + observer.update(REGISTER_BYTE_SIZE); + } + } + + private static final List CUSTOM_WINDOW_LIST = + Lists.newArrayList( + new CustomWindow(new Instant(0L), new Instant(1L), true), + new CustomWindow(new Instant(100L), new Instant(200L), false), + new CustomWindow(new Instant(0L), BoundedWindow.TIMESTAMP_MAX_VALUE, true)); + + @Test + public void testEncodeAndDecode() throws Exception { + List intervalWindowsToTest = + Lists.newArrayList( + new IntervalWindow(new Instant(0L), new Instant(1L)), + new IntervalWindow(new Instant(100L), new Instant(200L)), + new IntervalWindow(new Instant(0L), BoundedWindow.TIMESTAMP_MAX_VALUE)); + TimestampPrefixingWindowCoder coder1 = + TimestampPrefixingWindowCoder.of(IntervalWindow.getCoder()); + for (IntervalWindow window : intervalWindowsToTest) { + CoderProperties.coderDecodeEncodeEqual(coder1, window); + } + + GlobalWindow globalWindow = GlobalWindow.INSTANCE; + TimestampPrefixingWindowCoder coder2 = + TimestampPrefixingWindowCoder.of(GlobalWindow.Coder.INSTANCE); + CoderProperties.coderDecodeEncodeEqual(coder2, globalWindow); + TimestampPrefixingWindowCoder coder3 = + TimestampPrefixingWindowCoder.of(CustomWindowCoder.of(true, true)); + for (CustomWindow window : CUSTOM_WINDOW_LIST) { + CoderProperties.coderDecodeEncodeEqual(coder3, window); + } + } + + @Test + public void testConsistentWithEquals() { + TimestampPrefixingWindowCoder coder1 = + TimestampPrefixingWindowCoder.of(CustomWindowCoder.of(true, true)); + assertThat(coder1.consistentWithEquals(), equalTo(true)); + TimestampPrefixingWindowCoder coder2 = + TimestampPrefixingWindowCoder.of(CustomWindowCoder.of(false, true)); + assertThat(coder2.consistentWithEquals(), equalTo(false)); + } + + @Test + public void testIsRegisterByteSizeObserverCheap() { + TimestampPrefixingWindowCoder coder1 = + TimestampPrefixingWindowCoder.of(CustomWindowCoder.of(true, true)); + assertThat(coder1.isRegisterByteSizeObserverCheap(CUSTOM_WINDOW_LIST.get(0)), equalTo(true)); + TimestampPrefixingWindowCoder coder2 = + TimestampPrefixingWindowCoder.of(CustomWindowCoder.of(true, false)); + assertThat(coder2.isRegisterByteSizeObserverCheap(CUSTOM_WINDOW_LIST.get(0)), equalTo(false)); + } + + @Test + public void testGetEncodedElementByteSize() throws Exception { + TestElementByteSizeObserver observer = new TestElementByteSizeObserver(); + TimestampPrefixingWindowCoder coder = + TimestampPrefixingWindowCoder.of(CustomWindowCoder.of(true, true)); + for (CustomWindow 
value : CUSTOM_WINDOW_LIST) { + coder.registerByteSizeObserver(value, observer); + observer.advance(); + assertThat( + observer.getSumAndReset(), + equalTo( + CustomWindowCoder.REGISTER_BYTE_SIZE + + InstantCoder.of().getEncodedElementByteSize(value.maxTimestamp()))); + } + } +} diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/VarIntCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/VarIntCoderTest.java index adb8629efeaf..0f388809f6bb 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/VarIntCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/VarIntCoderTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.coders; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.util.Arrays; import java.util.List; @@ -33,9 +33,6 @@ /** Test case for {@link VarIntCoder}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class VarIntCoderTest { private static final Coder TEST_CODER = VarIntCoder.of(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/VarLongCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/VarLongCoderTest.java index 8af237853a65..b76b1f27df4e 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/VarLongCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/VarLongCoderTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.coders; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.util.Arrays; import java.util.List; @@ -33,9 +33,6 @@ /** Test case for {@link VarLongCoder}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class VarLongCoderTest { private static final Coder TEST_CODER = VarLongCoder.of(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/VoidCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/VoidCoderTest.java index 9f9beb2932e4..dbbef0252f68 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/VoidCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/VoidCoderTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.sdk.coders; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertSame; -import static org.junit.Assert.assertThat; import org.apache.beam.sdk.values.TypeDescriptor; import org.junit.Test; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/AvroIOTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/AvroIOTest.java index 1262f8203f27..1663ecf23cd7 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/AvroIOTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/AvroIOTest.java @@ -24,10 +24,10 @@ import static org.apache.beam.sdk.transforms.Requirements.requiresSideInputs; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasDisplayItem; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects.firstNonNull; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.junit.Assert.assertArrayEquals; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.io.File; @@ -119,7 +119,6 @@ /** Tests for AvroIO Read and Write transforms. */ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class AvroIOTest implements Serializable { /** Unit tests. 
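 * <p>The change below extends testWriteSingleFileThenReadUsingAllMethods so that
 * AvroIO.parseFilesGenericRecords is exercised both with and without the new withUsesReshuffle
 * option.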
*/ @@ -476,6 +475,20 @@ public void testWriteSingleFileThenReadUsingAllMethods() throws Throwable { "ParseFilesGenericRecords", AvroIO.parseFilesGenericRecords(new ParseGenericClass()) .withCoder(AvroCoder.of(GenericClass.class)) + .withUsesReshuffle(false) + .withDesiredBundleSizeBytes(10))) + .containsInAnyOrder(values); + PAssert.that( + path.apply("MatchAllParseFilesGenericRecordsWithShuffle", FileIO.matchAll()) + .apply( + "ReadMatchesParseFilesGenericRecordsWithShuffle", + FileIO.readMatches() + .withDirectoryTreatment(FileIO.ReadMatches.DirectoryTreatment.PROHIBIT)) + .apply( + "ParseFilesGenericRecordsWithShuffle", + AvroIO.parseFilesGenericRecords(new ParseGenericClass()) + .withCoder(AvroCoder.of(GenericClass.class)) + .withUsesReshuffle(true) .withDesiredBundleSizeBytes(10))) .containsInAnyOrder(values); PAssert.that( diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/AvroSourceTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/AvroSourceTest.java index 2aac671ae4c8..c58872ea13eb 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/AvroSourceTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/AvroSourceTest.java @@ -18,11 +18,11 @@ package org.apache.beam.sdk.io; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasDisplayItem; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertSame; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.io.ByteArrayInputStream; @@ -74,9 +74,6 @@ /** Tests for AvroSource. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class AvroSourceTest { @Rule public TemporaryFolder tmpFolder = new TemporaryFolder(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/BoundedReadFromUnboundedSourceTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/BoundedReadFromUnboundedSourceTest.java index 5722df104efc..887b23b680c5 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/BoundedReadFromUnboundedSourceTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/BoundedReadFromUnboundedSourceTest.java @@ -19,9 +19,9 @@ import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.includesDisplayDataFor; import static org.apache.beam.sdk.util.CoderUtils.encodeToByteArray; +import static org.hamcrest.MatcherAssert.assertThat; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.io.IOException; @@ -57,9 +57,6 @@ /** Unit tests for {@link BoundedReadFromUnboundedSource}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BoundedReadFromUnboundedSourceTest implements Serializable { private static final int NUM_RECORDS = 100; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/ClassLoaderFileSystemTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/ClassLoaderFileSystemTest.java index 616d4519a834..2667196fb338 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/ClassLoaderFileSystemTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/ClassLoaderFileSystemTest.java @@ -31,9 +31,6 @@ import org.junit.runners.JUnit4; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ClassLoaderFileSystemTest { private static final String SOME_CLASS = diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/CompressedSourceTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/CompressedSourceTest.java index e91fe30ac1b2..001aedb4a315 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/CompressedSourceTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/CompressedSourceTest.java @@ -19,6 +19,7 @@ import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasDisplayItem; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.includesDisplayDataFor; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.instanceOf; import static org.hamcrest.Matchers.not; import static org.junit.Assert.assertEquals; @@ -26,7 +27,6 @@ import static org.junit.Assert.assertNotEquals; import static org.junit.Assert.assertNotNull; import static org.junit.Assert.assertNull; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.io.ByteArrayOutputStream; @@ -48,15 +48,12 @@ import org.apache.beam.sdk.coders.SerializableCoder; import org.apache.beam.sdk.io.BoundedSource.BoundedReader; import org.apache.beam.sdk.io.CompressedSource.CompressedReader; -import org.apache.beam.sdk.io.CompressedSource.CompressionMode; -import org.apache.beam.sdk.io.CompressedSource.DecompressingChannelFactory; import org.apache.beam.sdk.io.FileBasedSource.FileBasedReader; import org.apache.beam.sdk.io.fs.MatchResult.Metadata; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider; import org.apache.beam.sdk.testing.SourceTestUtils; -import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.display.DisplayData; import org.apache.beam.sdk.util.LzoCompression; import org.apache.beam.sdk.values.KV; @@ -68,6 +65,7 @@ import org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream; import org.apache.commons.compress.compressors.deflate.DeflateCompressorOutputStream; import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream; +import org.apache.commons.compress.compressors.snappy.SnappyCompressorOutputStream; import org.apache.commons.compress.compressors.zstandard.ZstdCompressorOutputStream; import org.checkerframework.checker.nullness.qual.Nullable; import org.joda.time.Instant; @@ -80,9 +78,6 @@ /** Tests for CompressedSource. 
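 * <p>These tests now use the Compression enum via withCompression (replacing the removed
 * CompressionMode/withDecompression calls) and add SNAPPY coverage alongside GZIP, BZIP2, ZIP,
 * DEFLATE, LZO and LZOP.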
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class CompressedSourceTest { private final double delta = 1e-6; @@ -95,26 +90,33 @@ public class CompressedSourceTest { @Test public void testReadGzip() throws Exception { byte[] input = generateInput(5000); - runReadTest(input, CompressionMode.GZIP); + runReadTest(input, Compression.GZIP); } /** Test reading nonempty input with lzo. */ @Test public void testReadLzo() throws Exception { byte[] input = generateInput(5000); - runReadTest(input, CompressionMode.LZO); + runReadTest(input, Compression.LZO); } /** Test reading nonempty input with lzop. */ @Test public void testReadLzop() throws Exception { byte[] input = generateInput(5000); - runReadTest(input, CompressionMode.LZOP); + runReadTest(input, Compression.LZOP); + } + + /** Test reading nonempty input with snappy. */ + @Test + public void testReadSnappy() throws Exception { + byte[] input = generateInput(5000); + runReadTest(input, Compression.SNAPPY); } /** Test splittability of files in AUTO mode. */ @Test - public void testAutoSplittable() throws Exception { + public void testAutoSplittable() { CompressedSource source; // GZip files are not splittable @@ -160,6 +162,12 @@ public void testAutoSplittable() throws Exception { source = CompressedSource.from(new ByteSource("input.DEFLATE", 1)); assertFalse(source.isSplittable()); + // SNAPPY files are not splittable + source = CompressedSource.from(new ByteSource("input.snappy", 1)); + assertFalse(source.isSplittable()); + source = CompressedSource.from(new ByteSource("input.SNAPPY", 1)); + assertFalse(source.isSplittable()); + // Other extensions are assumed to be splittable. source = CompressedSource.from(new ByteSource("input.txt", 1)); assertTrue(source.isSplittable()); @@ -169,72 +177,74 @@ public void testAutoSplittable() throws Exception { /** Test splittability of files in GZIP mode -- none should be splittable. */ @Test - public void testGzipSplittable() throws Exception { + public void testGzipSplittable() { CompressedSource source; // GZip files are not splittable - source = - CompressedSource.from(new ByteSource("input.gz", 1)) - .withDecompression(CompressionMode.GZIP); + source = CompressedSource.from(new ByteSource("input.gz", 1)).withCompression(Compression.GZIP); assertFalse(source.isSplittable()); - source = - CompressedSource.from(new ByteSource("input.GZ", 1)) - .withDecompression(CompressionMode.GZIP); + source = CompressedSource.from(new ByteSource("input.GZ", 1)).withCompression(Compression.GZIP); assertFalse(source.isSplittable()); // Other extensions are also not splittable. source = - CompressedSource.from(new ByteSource("input.txt", 1)) - .withDecompression(CompressionMode.GZIP); + CompressedSource.from(new ByteSource("input.txt", 1)).withCompression(Compression.GZIP); assertFalse(source.isSplittable()); source = - CompressedSource.from(new ByteSource("input.csv", 1)) - .withDecompression(CompressionMode.GZIP); + CompressedSource.from(new ByteSource("input.csv", 1)).withCompression(Compression.GZIP); assertFalse(source.isSplittable()); } /** Test splittability of files in LZO mode -- none should be splittable. 
*/ @Test - public void testLzoSplittable() throws Exception { + public void testLzoSplittable() { CompressedSource source; // LZO files are not splittable source = CompressedSource.from(new ByteSource("input.lzo_deflate", 1)) - .withDecompression(CompressionMode.LZO); + .withCompression(Compression.LZO); assertFalse(source.isSplittable()); // Other extensions are also not splittable. - source = - CompressedSource.from(new ByteSource("input.txt", 1)) - .withDecompression(CompressionMode.LZO); + source = CompressedSource.from(new ByteSource("input.txt", 1)).withCompression(Compression.LZO); assertFalse(source.isSplittable()); - source = - CompressedSource.from(new ByteSource("input.csv", 1)) - .withDecompression(CompressionMode.LZO); + source = CompressedSource.from(new ByteSource("input.csv", 1)).withCompression(Compression.LZO); assertFalse(source.isSplittable()); } /** Test splittability of files in LZOP mode -- none should be splittable. */ @Test - public void testLzopSplittable() throws Exception { + public void testLzopSplittable() { CompressedSource source; // LZO files are not splittable source = - CompressedSource.from(new ByteSource("input.lzo", 1)) - .withDecompression(CompressionMode.LZOP); + CompressedSource.from(new ByteSource("input.lzo", 1)).withCompression(Compression.LZOP); assertFalse(source.isSplittable()); // Other extensions are also not splittable. source = - CompressedSource.from(new ByteSource("input.txt", 1)) - .withDecompression(CompressionMode.LZOP); + CompressedSource.from(new ByteSource("input.txt", 1)).withCompression(Compression.LZOP); assertFalse(source.isSplittable()); source = - CompressedSource.from(new ByteSource("input.csv", 1)) - .withDecompression(CompressionMode.LZOP); + CompressedSource.from(new ByteSource("input.csv", 1)).withCompression(Compression.LZOP); + assertFalse(source.isSplittable()); + } + + /** Test splittability of files in SNAPPY mode -- none should be splittable. */ + @Test + public void testSnappySplittable() { + CompressedSource source; + + source = + CompressedSource.from(new ByteSource("input.snappy", 1)) + .withCompression(Compression.SNAPPY); + assertFalse(source.isSplittable()); + source = + CompressedSource.from(new ByteSource("input.snappy", 1)) + .withCompression(Compression.SNAPPY); assertFalse(source.isSplittable()); } @@ -242,49 +252,56 @@ public void testLzopSplittable() throws Exception { @Test public void testReadBzip2() throws Exception { byte[] input = generateInput(5000); - runReadTest(input, CompressionMode.BZIP2); + runReadTest(input, Compression.BZIP2); } /** Test reading nonempty input with zip. */ @Test public void testReadZip() throws Exception { byte[] input = generateInput(5000); - runReadTest(input, CompressionMode.ZIP); + runReadTest(input, Compression.ZIP); } /** Test reading nonempty input with deflate. */ @Test public void testReadDeflate() throws Exception { byte[] input = generateInput(5000); - runReadTest(input, CompressionMode.DEFLATE); + runReadTest(input, Compression.DEFLATE); } /** Test reading empty input with gzip. */ @Test public void testEmptyReadGzip() throws Exception { byte[] input = generateInput(0); - runReadTest(input, CompressionMode.GZIP); + runReadTest(input, Compression.GZIP); } /** Test reading empty input with zstd. */ @Test public void testEmptyReadZstd() throws Exception { byte[] input = generateInput(0); - runReadTest(input, CompressionMode.ZSTD); + runReadTest(input, Compression.ZSTD); } /** Test reading empty input with lzo. 
*/ @Test public void testEmptyReadLzo() throws Exception { byte[] input = generateInput(0); - runReadTest(input, CompressionMode.LZO); + runReadTest(input, Compression.LZO); } /** Test reading empty input with lzop. */ @Test public void testEmptyReadLzop() throws Exception { byte[] input = generateInput(0); - runReadTest(input, CompressionMode.LZOP); + runReadTest(input, Compression.LZOP); + } + + /** Test reading empty input with snappy. */ + @Test + public void testEmptyReadSnappy() throws Exception { + byte[] input = generateInput(0); + runReadTest(input, Compression.SNAPPY); } private static byte[] compressGzip(byte[] input) throws IOException { @@ -311,6 +328,14 @@ private static byte[] compressLzop(byte[] input) throws IOException { return res.toByteArray(); } + private static byte[] compressSnappy(byte[] input) throws IOException { + ByteArrayOutputStream res = new ByteArrayOutputStream(); + try (OutputStream snappyStream = new SnappyCompressorOutputStream(res, input.length)) { + snappyStream.write(input); + } + return res.toByteArray(); + } + private static byte[] concat(byte[] first, byte[] second) { byte[] res = new byte[first.length + second.length]; System.arraycopy(first, 0, res, 0, first.length); @@ -337,7 +362,7 @@ public void testReadConcatenatedGzip() throws IOException { CompressedSource source = CompressedSource.from(new ByteSource(tmpFile.getAbsolutePath(), 1)) - .withDecompression(CompressionMode.GZIP); + .withCompression(Compression.GZIP); List actual = SourceTestUtils.readFromSource(source, PipelineOptionsFactory.create()); assertEquals(Bytes.asList(expected), actual); } @@ -361,7 +386,7 @@ public void testReadConcatenatedLzo() throws IOException { CompressedSource source = CompressedSource.from(new ByteSource(tmpFile.getAbsolutePath(), 1)) - .withDecompression(CompressionMode.LZO); + .withCompression(Compression.LZO); List actual = SourceTestUtils.readFromSource(source, PipelineOptionsFactory.create()); assertEquals(Bytes.asList(expected), actual); } @@ -384,11 +409,36 @@ public void testFalseReadConcatenatedLzop() throws IOException { CompressedSource source = CompressedSource.from(new ByteSource(tmpFile.getAbsolutePath(), 1)) - .withDecompression(CompressionMode.LZOP); + .withCompression(Compression.LZOP); List actual = SourceTestUtils.readFromSource(source, PipelineOptionsFactory.create()); assertNotEquals(Bytes.asList(expected), actual); } + /** + * Test that a concatenation of snappy files is not correctly decompressed. The current behaviour + * of the decompressor returns the contents of the first file only. + */ + @Test + public void testFalseReadConcatenatedSnappy() throws IOException { + byte[] header = "a,b,c\n".getBytes(StandardCharsets.UTF_8); + byte[] body = "1,2,3\n4,5,6\n7,8,9\n".getBytes(StandardCharsets.UTF_8); + byte[] headerAndBody = concat(header, body); + byte[] totalSnappy = concat(compressSnappy(header), compressSnappy(body)); + + File tmpFile = tmpFolder.newFile(); + try (FileOutputStream os = new FileOutputStream(tmpFile)) { + os.write(totalSnappy); + } + + CompressedSource source = + CompressedSource.from(new ByteSource(tmpFile.getAbsolutePath(), 1)) + .withCompression(Compression.SNAPPY); + List actual = SourceTestUtils.readFromSource(source, PipelineOptionsFactory.create()); + + assertNotEquals(Bytes.asList(headerAndBody), actual); + assertEquals(Bytes.asList(header), actual); + } + /** * Test a bzip2 file containing multiple streams is correctly decompressed. 
* @@ -397,17 +447,17 @@ public void testFalseReadConcatenatedLzop() throws IOException { */ @Test public void testReadMultiStreamBzip2() throws IOException { - CompressionMode mode = CompressionMode.BZIP2; + Compression compression = Compression.BZIP2; byte[] input1 = generateInput(5, 587973); byte[] input2 = generateInput(5, 387374); ByteArrayOutputStream stream1 = new ByteArrayOutputStream(); - try (OutputStream os = getOutputStreamForMode(mode, stream1)) { + try (OutputStream os = getOutputStreamForMode(compression, stream1, input1)) { os.write(input1); } ByteArrayOutputStream stream2 = new ByteArrayOutputStream(); - try (OutputStream os = getOutputStreamForMode(mode, stream2)) { + try (OutputStream os = getOutputStreamForMode(compression, stream2, input2)) { os.write(input2); } @@ -418,7 +468,7 @@ public void testReadMultiStreamBzip2() throws IOException { } byte[] output = Bytes.concat(input1, input2); - verifyReadContents(output, tmpFile, mode); + verifyReadContents(output, tmpFile, compression); } /** @@ -429,17 +479,17 @@ public void testReadMultiStreamBzip2() throws IOException { */ @Test public void testReadMultiStreamLzo() throws IOException { - CompressionMode mode = CompressionMode.LZO; + Compression compression = Compression.LZO; byte[] input1 = generateInput(5, 587973); byte[] input2 = generateInput(5, 387374); ByteArrayOutputStream stream1 = new ByteArrayOutputStream(); - try (OutputStream os = getOutputStreamForMode(mode, stream1)) { + try (OutputStream os = getOutputStreamForMode(compression, stream1, input1)) { os.write(input1); } ByteArrayOutputStream stream2 = new ByteArrayOutputStream(); - try (OutputStream os = getOutputStreamForMode(mode, stream2)) { + try (OutputStream os = getOutputStreamForMode(compression, stream2, input2)) { os.write(input2); } @@ -450,7 +500,7 @@ public void testReadMultiStreamLzo() throws IOException { } byte[] output = Bytes.concat(input1, input2); - verifyReadContents(output, tmpFile, mode); + verifyReadContents(output, tmpFile, compression); } /** @@ -459,17 +509,48 @@ public void testReadMultiStreamLzo() throws IOException { */ @Test public void testFalseReadMultiStreamLzop() throws IOException { - CompressionMode mode = CompressionMode.LZOP; + Compression compression = Compression.LZOP; + byte[] input1 = generateInput(5, 587973); + byte[] input2 = generateInput(5, 387374); + + ByteArrayOutputStream stream1 = new ByteArrayOutputStream(); + try (OutputStream os = getOutputStreamForMode(compression, stream1, input1)) { + os.write(input1); + } + + ByteArrayOutputStream stream2 = new ByteArrayOutputStream(); + try (OutputStream os = getOutputStreamForMode(compression, stream2, input2)) { + os.write(input2); + } + + File tmpFile = tmpFolder.newFile(); + try (OutputStream os = new FileOutputStream(tmpFile)) { + os.write(stream1.toByteArray()); + os.write(stream2.toByteArray()); + } + + byte[] output = Bytes.concat(input1, input2); + thrown.expectMessage("expected"); + verifyReadContents(output, tmpFile, compression); + } + + /** + * Test a snappy file containing multiple streams is not correctly decompressed. The current + * behavior is that it only reads the contents of the first file. 
+ */ + @Test + public void testFalseReadMultiStreamSnappy() throws IOException { + Compression compression = Compression.SNAPPY; byte[] input1 = generateInput(5, 587973); byte[] input2 = generateInput(5, 387374); ByteArrayOutputStream stream1 = new ByteArrayOutputStream(); - try (OutputStream os = getOutputStreamForMode(mode, stream1)) { + try (OutputStream os = getOutputStreamForMode(compression, stream1, input1)) { os.write(input1); } ByteArrayOutputStream stream2 = new ByteArrayOutputStream(); - try (OutputStream os = getOutputStreamForMode(mode, stream2)) { + try (OutputStream os = getOutputStreamForMode(compression, stream2, input2)) { os.write(input2); } @@ -481,21 +562,21 @@ public void testFalseReadMultiStreamLzop() throws IOException { byte[] output = Bytes.concat(input1, input2); thrown.expectMessage("expected"); - verifyReadContents(output, tmpFile, mode); + verifyReadContents(output, tmpFile, compression); } /** Test reading empty input with bzip2. */ @Test public void testCompressedReadBzip2() throws Exception { byte[] input = generateInput(0); - runReadTest(input, CompressionMode.BZIP2); + runReadTest(input, Compression.BZIP2); } /** Test reading empty input with zstd. */ @Test public void testCompressedReadZstd() throws Exception { byte[] input = generateInput(0); - runReadTest(input, CompressionMode.ZSTD); + runReadTest(input, Compression.ZSTD); } /** Test reading according to filepattern when the file is gzipped. */ @@ -503,7 +584,7 @@ public void testCompressedReadZstd() throws Exception { public void testCompressedAccordingToFilepatternGzip() throws Exception { byte[] input = generateInput(100); File tmpFile = tmpFolder.newFile("test.gz"); - writeFile(tmpFile, input, CompressionMode.GZIP); + writeFile(tmpFile, input, Compression.GZIP); verifyReadContents(input, tmpFile, null /* default auto decompression factory */); } @@ -512,7 +593,7 @@ public void testCompressedAccordingToFilepatternGzip() throws Exception { public void testCompressedAccordingToFilepatternBzip2() throws Exception { byte[] input = generateInput(100); File tmpFile = tmpFolder.newFile("test.bz2"); - writeFile(tmpFile, input, CompressionMode.BZIP2); + writeFile(tmpFile, input, Compression.BZIP2); verifyReadContents(input, tmpFile, null /* default auto decompression factory */); } @@ -521,7 +602,7 @@ public void testCompressedAccordingToFilepatternBzip2() throws Exception { public void testCompressedAccordingToFilepatternZstd() throws Exception { byte[] input = generateInput(100); File tmpFile = tmpFolder.newFile("test.zst"); - writeFile(tmpFile, input, CompressionMode.ZSTD); + writeFile(tmpFile, input, Compression.ZSTD); verifyReadContents(input, tmpFile, null /* default auto decompression factory */); } @@ -530,7 +611,7 @@ public void testCompressedAccordingToFilepatternZstd() throws Exception { public void testCompressedAccordingToFilepatternLzo() throws Exception { byte[] input = generateInput(100); File tmpFile = tmpFolder.newFile("test.lzo_deflate"); - writeFile(tmpFile, input, CompressionMode.LZO); + writeFile(tmpFile, input, Compression.LZO); verifyReadContents(input, tmpFile, null /* default auto decompression factory */); } @@ -539,7 +620,16 @@ public void testCompressedAccordingToFilepatternLzo() throws Exception { public void testCompressedAccordingToFilepatternLzop() throws Exception { byte[] input = generateInput(100); File tmpFile = tmpFolder.newFile("test.lzo"); - writeFile(tmpFile, input, CompressionMode.LZOP); + writeFile(tmpFile, input, Compression.LZOP); + verifyReadContents(input, 
tmpFile, null /* default auto decompression factory */); + } + + /** Test reading according to filepattern when the file is snappy compressed. */ + @Test + public void testCompressedAccordingToFilepatternSnappy() throws Exception { + byte[] input = generateInput(100); + File tmpFile = tmpFolder.newFile("test.snappy"); + writeFile(tmpFile, input, Compression.SNAPPY); verifyReadContents(input, tmpFile, null /* default auto decompression factory */); } @@ -549,33 +639,37 @@ public void testHeterogeneousCompression() throws Exception { String baseName = "test-input"; // Expected data - byte[] generated = generateInput(1000); - List expected = new ArrayList<>(); + byte[] generated; // Every sort of compression File uncompressedFile = tmpFolder.newFile(baseName + ".bin"); generated = generateInput(1000, 1); Files.write(generated, uncompressedFile); - expected.addAll(Bytes.asList(generated)); + List expected = new ArrayList<>(Bytes.asList(generated)); File gzipFile = tmpFolder.newFile(baseName + ".gz"); generated = generateInput(1000, 2); - writeFile(gzipFile, generated, CompressionMode.GZIP); + writeFile(gzipFile, generated, Compression.GZIP); expected.addAll(Bytes.asList(generated)); File bzip2File = tmpFolder.newFile(baseName + ".bz2"); generated = generateInput(1000, 3); - writeFile(bzip2File, generated, CompressionMode.BZIP2); + writeFile(bzip2File, generated, Compression.BZIP2); expected.addAll(Bytes.asList(generated)); File zstdFile = tmpFolder.newFile(baseName + ".zst"); generated = generateInput(1000, 4); - writeFile(zstdFile, generated, CompressionMode.ZSTD); + writeFile(zstdFile, generated, Compression.ZSTD); expected.addAll(Bytes.asList(generated)); File lzoFile = tmpFolder.newFile(baseName + ".lzo_deflate"); generated = generateInput(1000, 4); - writeFile(lzoFile, generated, CompressionMode.LZO); + writeFile(lzoFile, generated, Compression.LZO); + expected.addAll(Bytes.asList(generated)); + + File snappyFile = tmpFolder.newFile(baseName + ".snappy"); + generated = generateInput(1000, 4); + writeFile(snappyFile, generated, Compression.SNAPPY); expected.addAll(Bytes.asList(generated)); String filePattern = new File(tmpFolder.getRoot().toString(), baseName + ".*").toString(); @@ -607,7 +701,7 @@ public void testUncompressedFileWithUncompressedIsSplittable() throws Exception CompressedSource source = CompressedSource.from(new ByteSource(uncompressedFile.getPath(), 1)) - .withDecompression(CompressionMode.UNCOMPRESSED); + .withCompression(Compression.UNCOMPRESSED); assertTrue(source.isSplittable()); SourceTestUtils.assertSplitAtFractionExhaustive(source, PipelineOptionsFactory.create()); } @@ -617,7 +711,7 @@ public void testGzipFileIsNotSplittable() throws Exception { String baseName = "test-input"; File compressedFile = tmpFolder.newFile(baseName + ".gz"); - writeFile(compressedFile, generateInput(10), CompressionMode.GZIP); + writeFile(compressedFile, generateInput(10), Compression.GZIP); CompressedSource source = CompressedSource.from(new ByteSource(compressedFile.getPath(), 1)); @@ -629,7 +723,7 @@ public void testBzip2FileIsNotSplittable() throws Exception { String baseName = "test-input"; File compressedFile = tmpFolder.newFile(baseName + ".bz2"); - writeFile(compressedFile, generateInput(10), CompressionMode.BZIP2); + writeFile(compressedFile, generateInput(10), Compression.BZIP2); CompressedSource source = CompressedSource.from(new ByteSource(compressedFile.getPath(), 1)); @@ -641,7 +735,7 @@ public void testZstdFileIsNotSplittable() throws Exception { String baseName = 
"test-input"; File compressedFile = tmpFolder.newFile(baseName + ".zst"); - writeFile(compressedFile, generateInput(10), CompressionMode.ZSTD); + writeFile(compressedFile, generateInput(10), Compression.ZSTD); CompressedSource source = CompressedSource.from(new ByteSource(compressedFile.getPath(), 1)); @@ -653,7 +747,7 @@ public void testLzoFileIsNotSplittable() throws Exception { String baseName = "test-input"; File compressedFile = tmpFolder.newFile(baseName + ".lzo_deflate"); - writeFile(compressedFile, generateInput(10), CompressionMode.LZO); + writeFile(compressedFile, generateInput(10), Compression.LZO); CompressedSource source = CompressedSource.from(new ByteSource(compressedFile.getPath(), 1)); @@ -665,7 +759,19 @@ public void testLzopFileIsNotSplittable() throws Exception { String baseName = "test-input"; File compressedFile = tmpFolder.newFile(baseName + ".lzo"); - writeFile(compressedFile, generateInput(10), CompressionMode.LZOP); + writeFile(compressedFile, generateInput(10), Compression.LZOP); + + CompressedSource source = + CompressedSource.from(new ByteSource(compressedFile.getPath(), 1)); + assertFalse(source.isSplittable()); + } + + @Test + public void testSnappyFileIsNotSplittable() throws Exception { + String baseName = "test-input"; + + File compressedFile = tmpFolder.newFile(baseName + ".snappy"); + writeFile(compressedFile, generateInput(10), Compression.SNAPPY); CompressedSource source = CompressedSource.from(new ByteSource(compressedFile.getPath(), 1)); @@ -673,27 +779,25 @@ public void testLzopFileIsNotSplittable() throws Exception { } /** - * Test reading an uncompressed file with {@link CompressionMode#GZIP}, since we must support this - * due to properties of services that we read from. + * Test reading an uncompressed file with {@link Compression#GZIP}, since we must support this due + * to properties of services that we read from. */ @Test public void testFalseGzipStream() throws Exception { byte[] input = generateInput(1000); File tmpFile = tmpFolder.newFile("test.gz"); Files.write(input, tmpFile); - verifyReadContents(input, tmpFile, CompressionMode.GZIP); + verifyReadContents(input, tmpFile, Compression.GZIP); } - /** - * Test reading an uncompressed file with {@link CompressionMode#BZIP2}, and show that we fail. - */ + /** Test reading an uncompressed file with {@link Compression#BZIP2}, and show that we fail. */ @Test public void testFalseBzip2Stream() throws Exception { byte[] input = generateInput(1000); File tmpFile = tmpFolder.newFile("test.bz2"); Files.write(input, tmpFile); thrown.expectMessage("Stream is not in the BZip2 format"); - verifyReadContents(input, tmpFile, CompressionMode.BZIP2); + verifyReadContents(input, tmpFile, Compression.BZIP2); } /** Test reading an uncompressed file with {@link Compression#ZSTD}, and show that we fail. */ @@ -703,7 +807,7 @@ public void testFalseZstdStream() throws Exception { File tmpFile = tmpFolder.newFile("test.zst"); Files.write(input, tmpFile); thrown.expectMessage("Decompression error: Unknown frame descriptor"); - verifyReadContents(input, tmpFile, CompressionMode.ZSTD); + verifyReadContents(input, tmpFile, Compression.ZSTD); } /** Test reading an uncompressed file with {@link Compression#LZO}, and show that we fail. 
*/ @@ -713,7 +817,7 @@ public void testFalseLzoStream() throws Exception { File tmpFile = tmpFolder.newFile("test.lzo_deflate"); Files.write(input, tmpFile); thrown.expectMessage("expected:"); - verifyReadContents(input, tmpFile, CompressionMode.LZO); + verifyReadContents(input, tmpFile, Compression.LZO); } /** Test reading an uncompressed file with {@link Compression#LZOP}, and show that we fail. */ @@ -723,7 +827,17 @@ public void testFalseLzopStream() throws Exception { File tmpFile = tmpFolder.newFile("test.lzo"); Files.write(input, tmpFile); thrown.expectMessage("Not an LZOP file"); - verifyReadContents(input, tmpFile, CompressionMode.LZOP); + verifyReadContents(input, tmpFile, Compression.LZOP); + } + + /** Test reading an uncompressed file with {@link Compression#SNAPPY}, and show that we fail. */ + @Test + public void testFalseSnappyStream() throws Exception { + byte[] input = generateInput(1000); + File tmpFile = tmpFolder.newFile("test.snappy"); + Files.write(input, tmpFile); + thrown.expectMessage("Illegal block with bad offset found"); + verifyReadContents(input, tmpFile, Compression.SNAPPY); } /** @@ -735,7 +849,7 @@ public void testEmptyReadGzipUncompressed() throws Exception { byte[] input = generateInput(0); File tmpFile = tmpFolder.newFile("test.gz"); Files.write(input, tmpFile); - verifyReadContents(input, tmpFile, CompressionMode.GZIP); + verifyReadContents(input, tmpFile, Compression.GZIP); } /** @@ -747,7 +861,7 @@ public void testOneByteReadGzipUncompressed() throws Exception { byte[] input = generateInput(1); File tmpFile = tmpFolder.newFile("test.gz"); Files.write(input, tmpFile); - verifyReadContents(input, tmpFile, CompressionMode.GZIP); + verifyReadContents(input, tmpFile, Compression.GZIP); } /** Test reading multiple files. 
*/ @@ -761,13 +875,12 @@ public void testCompressedReadMultipleFiles() throws Exception { for (int i = 0; i < numFiles; i++) { byte[] generated = generateInput(100); File tmpFile = tmpFolder.newFile(baseName + i); - writeFile(tmpFile, generated, CompressionMode.GZIP); + writeFile(tmpFile, generated, Compression.GZIP); expected.addAll(Bytes.asList(generated)); } CompressedSource source = - CompressedSource.from(new ByteSource(filePattern, 1)) - .withDecompression(CompressionMode.GZIP); + CompressedSource.from(new ByteSource(filePattern, 1)).withCompression(Compression.GZIP); List actual = SourceTestUtils.readFromSource(source, PipelineOptionsFactory.create()); assertEquals(HashMultiset.create(expected), HashMultiset.create(actual)); } @@ -783,13 +896,12 @@ public void testCompressedReadMultipleLzoFiles() throws Exception { for (int i = 0; i < numFiles; i++) { byte[] generated = generateInput(100); File tmpFile = tmpFolder.newFile(baseName + i); - writeFile(tmpFile, generated, CompressionMode.LZO); + writeFile(tmpFile, generated, Compression.LZO); expected.addAll(Bytes.asList(generated)); } CompressedSource source = - CompressedSource.from(new ByteSource(filePattern, 1)) - .withDecompression(CompressionMode.LZO); + CompressedSource.from(new ByteSource(filePattern, 1)).withCompression(Compression.LZO); List actual = SourceTestUtils.readFromSource(source, PipelineOptionsFactory.create()); assertEquals(HashMultiset.create(expected), HashMultiset.create(actual)); } @@ -805,13 +917,33 @@ public void testCompressedReadMultipleLzopFiles() throws Exception { for (int i = 0; i < numFiles; i++) { byte[] generated = generateInput(100); File tmpFile = tmpFolder.newFile(baseName + i); - writeFile(tmpFile, generated, CompressionMode.LZOP); + writeFile(tmpFile, generated, Compression.LZOP); + expected.addAll(Bytes.asList(generated)); + } + + CompressedSource source = + CompressedSource.from(new ByteSource(filePattern, 1)).withCompression(Compression.LZOP); + List actual = SourceTestUtils.readFromSource(source, PipelineOptionsFactory.create()); + assertEquals(HashMultiset.create(expected), HashMultiset.create(actual)); + } + + /** Test reading multiple files that are snappy compressed. 
*/ + @Test + public void testCompressedReadMultipleSnappyFiles() throws Exception { + int numFiles = 3; + String baseName = "test_input-"; + String filePattern = new File(tmpFolder.getRoot().toString(), baseName + "*").toString(); + List expected = new ArrayList<>(); + + for (int i = 0; i < numFiles; i++) { + byte[] generated = generateInput(100); + File tmpFile = tmpFolder.newFile(baseName + i); + writeFile(tmpFile, generated, Compression.SNAPPY); expected.addAll(Bytes.asList(generated)); } CompressedSource source = - CompressedSource.from(new ByteSource(filePattern, 1)) - .withDecompression(CompressionMode.LZOP); + CompressedSource.from(new ByteSource(filePattern, 1)).withCompression(Compression.SNAPPY); List actual = SourceTestUtils.readFromSource(source, PipelineOptionsFactory.create()); assertEquals(HashMultiset.create(expected), HashMultiset.create(actual)); } @@ -827,13 +959,13 @@ public void populateDisplayData(DisplayData.Builder builder) { }; CompressedSource compressedSource = CompressedSource.from(inputSource); - CompressedSource gzipSource = compressedSource.withDecompression(CompressionMode.GZIP); + CompressedSource gzipSource = compressedSource.withCompression(Compression.GZIP); DisplayData compressedSourceDisplayData = DisplayData.from(compressedSource); DisplayData gzipDisplayData = DisplayData.from(gzipSource); assertThat(compressedSourceDisplayData, hasDisplayItem("compressionMode")); - assertThat(gzipDisplayData, hasDisplayItem("compressionMode", CompressionMode.GZIP.toString())); + assertThat(gzipDisplayData, hasDisplayItem("compressionMode", Compression.GZIP.toString())); assertThat(compressedSourceDisplayData, hasDisplayItem("source", inputSource.getClass())); assertThat(compressedSourceDisplayData, includesDisplayDataFor("source", inputSource)); } @@ -854,9 +986,9 @@ private byte[] generateInput(int size, int seed) { } /** Get a compressing stream for a given compression mode. */ - private OutputStream getOutputStreamForMode(CompressionMode mode, OutputStream stream) - throws IOException { - switch (mode) { + private OutputStream getOutputStreamForMode( + Compression compression, OutputStream stream, byte[] input) throws IOException { + switch (compression) { case GZIP: return new GzipCompressorOutputStream(stream); case BZIP2: @@ -871,6 +1003,8 @@ private OutputStream getOutputStreamForMode(CompressionMode mode, OutputStream s return LzoCompression.createLzoOutputStream(stream); case LZOP: return LzoCompression.createLzopOutputStream(stream); + case SNAPPY: + return new SnappyCompressorOutputStream(stream, input.length); default: throw new RuntimeException("Unexpected compression mode"); } @@ -879,7 +1013,7 @@ private OutputStream getOutputStreamForMode(CompressionMode mode, OutputStream s /** Extend of {@link ZipOutputStream} that splits up bytes into multiple entries. */ private static class TestZipOutputStream extends OutputStream { - private ZipOutputStream zipOutputStream; + private final ZipOutputStream zipOutputStream; private long offset = 0; private int entry = 0; @@ -907,30 +1041,25 @@ public void close() throws IOException { } /** Writes a single output file. 
*/ - private void writeFile(File file, byte[] input, CompressionMode mode) throws IOException { - try (OutputStream os = getOutputStreamForMode(mode, new FileOutputStream(file))) { + private void writeFile(File file, byte[] input, Compression compression) throws IOException { + try (OutputStream os = getOutputStreamForMode(compression, new FileOutputStream(file), input)) { os.write(input); } } /** Run a single read test, writing and reading back input with the given compression mode. */ - private void runReadTest( - byte[] input, - CompressionMode inputCompressionMode, - @Nullable DecompressingChannelFactory decompressionFactory) - throws IOException { + private void runReadTest(byte[] input, Compression compression) throws IOException { File tmpFile = tmpFolder.newFile(); - writeFile(tmpFile, input, inputCompressionMode); - verifyReadContents(input, tmpFile, decompressionFactory); + writeFile(tmpFile, input, compression); + verifyReadContents(input, tmpFile, compression); } private void verifyReadContents( - byte[] expected, File inputFile, @Nullable DecompressingChannelFactory decompressionFactory) - throws IOException { + byte[] expected, File inputFile, @Nullable Compression compression) throws IOException { CompressedSource source = CompressedSource.from(new ByteSource(inputFile.toPath().toString(), 1)); - if (decompressionFactory != null) { - source = source.withDecompression(decompressionFactory); + if (compression != null) { + source = source.withCompression(compression); } List> actualOutput = Lists.newArrayList(); try (BoundedReader reader = source.createReader(PipelineOptionsFactory.create())) { @@ -945,11 +1074,6 @@ private void verifyReadContents( assertEquals(expectedOutput, actualOutput); } - /** Run a single read test, writing and reading back input with the given compression mode. */ - private void runReadTest(byte[] input, CompressionMode mode) throws IOException { - runReadTest(input, mode, mode); - } - /** Dummy source for use in tests. 
*/ private static class ByteSource extends FileBasedSource { public ByteSource(String fileOrPatternSpec, long minBundleSize) { @@ -997,7 +1121,7 @@ protected boolean isAtSplitPoint() { } @Override - protected void startReading(ReadableByteChannel channel) throws IOException { + protected void startReading(ReadableByteChannel channel) { this.channel = channel; } @@ -1024,18 +1148,11 @@ public Instant getCurrentTimestamp() throws NoSuchElementException { } } - private static class ExtractIndexFromTimestamp extends DoFn> { - @ProcessElement - public void processElement(ProcessContext context) { - context.output(KV.of(context.timestamp().getMillis(), context.element())); - } - } - @Test public void testEmptyGzipProgress() throws IOException { File tmpFile = tmpFolder.newFile("empty.gz"); String filename = tmpFile.toPath().toString(); - writeFile(tmpFile, new byte[0], CompressionMode.GZIP); + writeFile(tmpFile, new byte[0], Compression.GZIP); PipelineOptions options = PipelineOptionsFactory.create(); CompressedSource source = CompressedSource.from(new ByteSource(filename, 1)); @@ -1062,7 +1179,7 @@ public void testGzipProgress() throws IOException { int numRecords = 3; File tmpFile = tmpFolder.newFile("nonempty.gz"); String filename = tmpFile.toPath().toString(); - writeFile(tmpFile, new byte[numRecords], CompressionMode.GZIP); + writeFile(tmpFile, new byte[numRecords], Compression.GZIP); PipelineOptions options = PipelineOptionsFactory.create(); CompressedSource source = CompressedSource.from(new ByteSource(filename, 1)); @@ -1097,11 +1214,11 @@ public void testGzipProgress() throws IOException { public void testEmptyLzoProgress() throws IOException { File tmpFile = tmpFolder.newFile("empty.lzo_deflate"); String filename = tmpFile.toPath().toString(); - writeFile(tmpFile, new byte[0], CompressionMode.LZO); + writeFile(tmpFile, new byte[0], Compression.LZO); PipelineOptions options = PipelineOptionsFactory.create(); CompressedSource source = - CompressedSource.from(new ByteSource(filename, 1)).withDecompression(CompressionMode.LZO); + CompressedSource.from(new ByteSource(filename, 1)).withCompression(Compression.LZO); try (BoundedReader readerOrig = source.createReader(options)) { assertThat(readerOrig, instanceOf(CompressedReader.class)); CompressedReader reader = (CompressedReader) readerOrig; @@ -1123,11 +1240,11 @@ public void testLzoProgress() throws IOException { int numRecords = 3; File tmpFile = tmpFolder.newFile("nonempty.lzo"); String filename = tmpFile.toPath().toString(); - writeFile(tmpFile, new byte[numRecords], CompressionMode.LZO); + writeFile(tmpFile, new byte[numRecords], Compression.LZO); PipelineOptions options = PipelineOptionsFactory.create(); CompressedSource source = - CompressedSource.from(new ByteSource(filename, 1)).withDecompression(CompressionMode.LZO); + CompressedSource.from(new ByteSource(filename, 1)).withCompression(Compression.LZO); try (BoundedReader readerOrig = source.createReader(options)) { assertThat(readerOrig, instanceOf(CompressedReader.class)); CompressedReader reader = (CompressedReader) readerOrig; @@ -1159,11 +1276,11 @@ public void testLzoProgress() throws IOException { public void testEmptyLzopProgress() throws IOException { File tmpFile = tmpFolder.newFile("empty.lzo"); String filename = tmpFile.toPath().toString(); - writeFile(tmpFile, new byte[0], CompressionMode.LZOP); + writeFile(tmpFile, new byte[0], Compression.LZOP); PipelineOptions options = PipelineOptionsFactory.create(); CompressedSource source = - CompressedSource.from(new 
ByteSource(filename, 1)).withDecompression(CompressionMode.LZOP); + CompressedSource.from(new ByteSource(filename, 1)).withCompression(Compression.LZOP); try (BoundedReader readerOrig = source.createReader(options)) { assertThat(readerOrig, instanceOf(CompressedReader.class)); CompressedReader reader = (CompressedReader) readerOrig; @@ -1187,11 +1304,75 @@ public void testLzopProgress() throws IOException { int numRecords = 3; File tmpFile = tmpFolder.newFile("nonempty.lzo"); String filename = tmpFile.toPath().toString(); - writeFile(tmpFile, new byte[numRecords], CompressionMode.LZOP); + writeFile(tmpFile, new byte[numRecords], Compression.LZOP); + + PipelineOptions options = PipelineOptionsFactory.create(); + CompressedSource source = + CompressedSource.from(new ByteSource(filename, 1)).withCompression(Compression.LZOP); + try (BoundedReader readerOrig = source.createReader(options)) { + assertThat(readerOrig, instanceOf(CompressedReader.class)); + CompressedReader reader = (CompressedReader) readerOrig; + // before starting + assertEquals(0.0, reader.getFractionConsumed(), delta); + assertEquals(0, reader.getSplitPointsConsumed()); + assertEquals(1, reader.getSplitPointsRemaining()); + + // confirm has three records + for (int i = 0; i < numRecords; ++i) { + if (i == 0) { + assertTrue(reader.start()); + } else { + assertTrue(reader.advance()); + } + assertEquals(0, reader.getSplitPointsConsumed()); + assertEquals(1, reader.getSplitPointsRemaining()); + } + assertFalse(reader.advance()); + + // after reading source + assertEquals(1.0, reader.getFractionConsumed(), delta); + assertEquals(1, reader.getSplitPointsConsumed()); + assertEquals(0, reader.getSplitPointsRemaining()); + } + } + + @Test + public void testEmptySnappyProgress() throws IOException { + File tmpFile = tmpFolder.newFile("empty.snappy"); + String filename = tmpFile.toPath().toString(); + writeFile(tmpFile, new byte[0], Compression.SNAPPY); + + PipelineOptions options = PipelineOptionsFactory.create(); + CompressedSource source = + CompressedSource.from(new ByteSource(filename, 1)).withCompression(Compression.SNAPPY); + try (BoundedReader readerOrig = source.createReader(options)) { + assertThat(readerOrig, instanceOf(CompressedReader.class)); + CompressedReader reader = (CompressedReader) readerOrig; + // before starting + assertEquals(0.0, reader.getFractionConsumed(), delta); + assertEquals(0, reader.getSplitPointsConsumed()); + assertEquals(1, reader.getSplitPointsRemaining()); + + // confirm empty + assertFalse(reader.start()); + + // after reading empty source + assertEquals(1.0, reader.getFractionConsumed(), delta); + assertEquals(0, reader.getSplitPointsConsumed()); + assertEquals(0, reader.getSplitPointsRemaining()); + } + } + + @Test + public void testSnappyProgress() throws IOException { + int numRecords = 3; + File tmpFile = tmpFolder.newFile("nonempty.snappy"); + String filename = tmpFile.toPath().toString(); + writeFile(tmpFile, new byte[numRecords], Compression.SNAPPY); PipelineOptions options = PipelineOptionsFactory.create(); CompressedSource source = - CompressedSource.from(new ByteSource(filename, 1)).withDecompression(CompressionMode.LZOP); + CompressedSource.from(new ByteSource(filename, 1)).withCompression(Compression.SNAPPY); try (BoundedReader readerOrig = source.createReader(options)) { assertThat(readerOrig, instanceOf(CompressedReader.class)); CompressedReader reader = (CompressedReader) readerOrig; @@ -1224,7 +1405,7 @@ public void testUnsplittable() throws IOException { String baseName = 
"test-input"; File compressedFile = tmpFolder.newFile(baseName + ".gz"); byte[] input = generateInput(10000); - writeFile(compressedFile, input, CompressionMode.GZIP); + writeFile(compressedFile, input, Compression.GZIP); CompressedSource source = CompressedSource.from(new ByteSource(compressedFile.getPath(), 1)); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/CountingSourceTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/CountingSourceTest.java index 04ca76769794..9db9c8979b5b 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/CountingSourceTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/CountingSourceTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.sdk.io; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.is; import static org.hamcrest.Matchers.lessThan; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.io.IOException; @@ -56,9 +56,6 @@ /** Tests of {@link CountingSource}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class CountingSourceTest { public static void addCountingAsserts(PCollection input, long numElements) { diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/DefaultFilenamePolicyTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/DefaultFilenamePolicyTest.java index a1b031ac8d3f..ec959f6bd976 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/DefaultFilenamePolicyTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/DefaultFilenamePolicyTest.java @@ -31,9 +31,6 @@ /** Tests of {@link DefaultFilenamePolicy}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DefaultFilenamePolicyTest { @Before diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/FileBasedSinkTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/FileBasedSinkTest.java index fb1f4450c056..deb0d6b4e09a 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/FileBasedSinkTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/FileBasedSinkTest.java @@ -19,11 +19,11 @@ import static org.apache.beam.sdk.io.WriteFiles.UNKNOWN_SHARDNUM; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets.UTF_8; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsString; import static org.hamcrest.Matchers.is; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.junit.Assert.fail; import static org.junit.Assume.assumeFalse; @@ -75,7 +75,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class FileBasedSinkTest { @Rule public TemporaryFolder tmpFolder = new TemporaryFolder(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/FileBasedSourceTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/FileBasedSourceTest.java index ce0f7410431c..1ee9c8c6f427 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/FileBasedSourceTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/FileBasedSourceTest.java @@ -21,9 +21,9 @@ import static org.apache.beam.sdk.testing.SourceTestUtils.assertSplitAtFractionFails; import static org.apache.beam.sdk.testing.SourceTestUtils.assertSplitAtFractionSucceedsAndConsistent; import static org.apache.beam.sdk.testing.SourceTestUtils.readFromSource; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.io.ByteArrayOutputStream; @@ -66,9 +66,6 @@ /** Tests code common to all file-based sources. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FileBasedSourceTest { private Random random = new Random(0L); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/FileIOTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/FileIOTest.java index 4ff45ed270e7..f219ee894994 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/FileIOTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/FileIOTest.java @@ -68,9 +68,6 @@ /** Tests for {@link FileIO}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FileIOTest implements Serializable { @Rule public transient TestPipeline p = TestPipeline.create(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/FileSystemsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/FileSystemsTest.java index f2c4dbce07ec..023aaeccdf41 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/FileSystemsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/FileSystemsTest.java @@ -17,13 +17,18 @@ */ package org.apache.beam.sdk.io; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.junit.Assume.assumeFalse; +import static org.mockito.Mockito.doThrow; +import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.verify; +import static org.mockito.Mockito.when; +import java.io.FileNotFoundException; import java.io.Writer; import java.nio.channels.Channels; import java.nio.charset.StandardCharsets; @@ -32,6 +37,7 @@ import java.nio.file.Paths; import java.util.List; import org.apache.beam.sdk.io.fs.CreateOptions; +import org.apache.beam.sdk.io.fs.MatchResult; import org.apache.beam.sdk.io.fs.MoveOptions; import org.apache.beam.sdk.io.fs.ResourceId; import org.apache.beam.sdk.options.PipelineOptionsFactory; @@ -50,9 +56,6 @@ /** Tests for {@link FileSystems}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FileSystemsTest { @Rule public TemporaryFolder temporaryFolder = new TemporaryFolder(); @@ -192,6 +195,86 @@ public void testRenameIgnoreMissingFiles() throws Exception { containsInAnyOrder("content3")); } + @Test + public void testRenameWithFilteringAfterUnsupportedOptions() throws Exception { + FileSystem mockFileSystem = mock(FileSystem.class); + + Path srcPath1 = temporaryFolder.newFile().toPath(); + Path nonExistentPath = srcPath1.resolveSibling("non-existent"); + Path srcPath3 = temporaryFolder.newFile().toPath(); + + Path destPath1 = srcPath1.resolveSibling("dest1"); + Path destPath2 = nonExistentPath.resolveSibling("dest2"); + Path destPath3 = srcPath1.resolveSibling("dest3"); + + doThrow(new UnsupportedOperationException("move options not supported.")) + .when(mockFileSystem) + .rename( + toResourceIds(ImmutableList.of(srcPath1, nonExistentPath, srcPath3), false), + toResourceIds(ImmutableList.of(destPath1, destPath2, destPath3), false), + MoveOptions.StandardMoveOptions.SKIP_IF_DESTINATION_EXISTS, + MoveOptions.StandardMoveOptions.IGNORE_MISSING_FILES); + when(mockFileSystem.match( + ImmutableList.of( + srcPath1.toString(), + nonExistentPath.toString(), + srcPath3.toString(), + destPath1.toString(), + destPath2.toString(), + destPath3.toString()))) + .thenReturn( + ImmutableList.of( + MatchResult.create( + MatchResult.Status.OK, + ImmutableList.of( + MatchResult.Metadata.builder() + .setChecksum("1") + .setResourceId(LocalResourceId.fromPath(srcPath1, false)) + .setSizeBytes(1) + .build())), + MatchResult.create(MatchResult.Status.NOT_FOUND, new FileNotFoundException("")), + MatchResult.create( + MatchResult.Status.OK, + ImmutableList.of( + MatchResult.Metadata.builder() + .setChecksum("3") + 
.setResourceId(LocalResourceId.fromPath(srcPath3, false)) + .setSizeBytes(1) + .build())), + MatchResult.create(MatchResult.Status.NOT_FOUND, new FileNotFoundException("")), + MatchResult.create( + MatchResult.Status.OK, + ImmutableList.of( + MatchResult.Metadata.builder() + .setChecksum("2") + .setResourceId(LocalResourceId.fromPath(destPath2, false)) + .setSizeBytes(1) + .build())), + MatchResult.create( + MatchResult.Status.OK, + ImmutableList.of( + MatchResult.Metadata.builder() + .setChecksum("3") + .setResourceId(LocalResourceId.fromPath(destPath3, false)) + .setSizeBytes(1) + .build())))); + + FileSystems.renameInternal( + mockFileSystem, + toResourceIds( + ImmutableList.of(srcPath1, nonExistentPath, srcPath3), false /* isDirectory */), + toResourceIds(ImmutableList.of(destPath1, destPath2, destPath3), false /* isDirectory */), + MoveOptions.StandardMoveOptions.SKIP_IF_DESTINATION_EXISTS, + MoveOptions.StandardMoveOptions.IGNORE_MISSING_FILES); + + verify(mockFileSystem) + .rename( + toResourceIds(ImmutableList.of(srcPath1), false /* isDirectory */), + toResourceIds(ImmutableList.of(destPath1), false /* isDirectory */)); + verify(mockFileSystem) + .delete(toResourceIds(ImmutableList.of(srcPath3), false /* isDirectory */)); + } + @Test public void testValidMatchNewResourceForLocalFileSystem() { assertEquals("file", FileSystems.matchNewResource("/tmp/f1", false).getScheme()); @@ -205,7 +288,7 @@ public void testInvalidSchemaMatchNewResource() { assertEquals("file", FileSystems.matchNewResource("c:/tmp/f1", false)); } - private List toResourceIds(List paths, final boolean isDirectory) { + private static List toResourceIds(List paths, final boolean isDirectory) { return FluentIterable.from(paths) .transform(path -> (ResourceId) LocalResourceId.fromPath(path, isDirectory)) .toList(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/LocalFileSystemRegistrarTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/LocalFileSystemRegistrarTest.java index c263146bab46..b1f77e4c4b2a 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/LocalFileSystemRegistrarTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/LocalFileSystemRegistrarTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.sdk.io; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.instanceOf; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import java.util.ServiceLoader; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/LocalFileSystemTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/LocalFileSystemTest.java index b937920df757..d88aab863473 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/LocalFileSystemTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/LocalFileSystemTest.java @@ -17,12 +17,12 @@ */ package org.apache.beam.sdk.io; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.equalTo; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertNotEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertThrows; import static org.junit.Assert.assertTrue; import static org.junit.Assume.assumeFalse; @@ -60,9 +60,6 @@ /** Tests for {@link LocalFileSystem}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class LocalFileSystemTest { @Rule public ExpectedException thrown = ExpectedException.none(); @Rule public TemporaryFolder temporaryFolder = new TemporaryFolder(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/LocalResourceIdTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/LocalResourceIdTest.java index a221f2c64957..effcd9d1a905 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/LocalResourceIdTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/LocalResourceIdTest.java @@ -48,9 +48,6 @@ *

    TODO: re-enable unicode tests when BEAM-1453 is resolved. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class LocalResourceIdTest { @Rule public ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/OffsetBasedSourceTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/OffsetBasedSourceTest.java index 7e93156f2d63..10de308f551c 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/OffsetBasedSourceTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/OffsetBasedSourceTest.java @@ -41,9 +41,6 @@ /** Tests code common to all offset-based sources. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class OffsetBasedSourceTest { // An offset-based source with 4 bytes per offset that yields its own current offset // and rounds the start and end offset to the nearest multiple of a given number, diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/ReadTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/ReadTest.java index 8a1c51e64d18..8dd4a5f29c23 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/ReadTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/ReadTest.java @@ -17,24 +17,63 @@ */ package org.apache.beam.sdk.io; +import static org.apache.beam.sdk.testing.SerializableMatchers.greaterThanOrEqualTo; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasDisplayItem; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.includesDisplayDataFor; import static org.hamcrest.MatcherAssert.assertThat; +import static org.junit.Assert.assertEquals; import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; import java.io.Serializable; +import java.util.ArrayList; +import java.util.Collections; import java.util.List; +import java.util.Map; +import java.util.NoSuchElementException; +import java.util.Optional; +import java.util.UUID; +import java.util.concurrent.ConcurrentHashMap; +import java.util.function.Consumer; +import java.util.stream.Collectors; +import java.util.stream.LongStream; +import org.apache.beam.sdk.coders.AvroCoder; import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.CoderException; +import org.apache.beam.sdk.coders.CustomCoder; +import org.apache.beam.sdk.coders.ListCoder; import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.coders.VarLongCoder; +import org.apache.beam.sdk.io.CountingSource.CounterMark; import org.apache.beam.sdk.io.UnboundedSource.CheckpointMark; +import org.apache.beam.sdk.io.UnboundedSource.UnboundedReader; +import org.apache.beam.sdk.options.ExperimentalOptions; import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.testing.NeedsRunner; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.testing.UsesUnboundedPCollections; +import org.apache.beam.sdk.testing.UsesUnboundedSplittableParDo; +import org.apache.beam.sdk.transforms.Count; import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.display.DisplayData; import org.apache.beam.sdk.transforms.reflect.DoFnInvokers; +import org.apache.beam.sdk.transforms.splittabledofn.WatermarkEstimators; +import org.apache.beam.sdk.transforms.windowing.AfterWatermark; +import 
org.apache.beam.sdk.transforms.windowing.BoundedWindow; +import org.apache.beam.sdk.transforms.windowing.GlobalWindows; +import org.apache.beam.sdk.transforms.windowing.Window; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.TypeDescriptor; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.checkerframework.checker.nullness.qual.Nullable; import org.joda.time.Duration; +import org.joda.time.Instant; import org.junit.Rule; import org.junit.Test; +import org.junit.experimental.categories.Category; import org.junit.rules.ExpectedException; import org.junit.runner.RunWith; import org.junit.runners.JUnit4; @@ -43,10 +82,14 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class ReadTest implements Serializable { + + private static final Map> STATIC_INSTANT_LIST_MAP = + new ConcurrentHashMap<>(); + @Rule public transient ExpectedException thrown = ExpectedException.none(); + @Rule public final transient TestPipeline pipeline = TestPipeline.create(); @Test public void testInstantiationOfBoundedSourceAsSDFWrapper() { @@ -109,6 +152,106 @@ public void populateDisplayData(DisplayData.Builder builder) { assertThat(unboundedDisplayData, hasDisplayItem("maxReadTime", maxReadTime)); } + @Test + public void testReadBoundedPreservesTypeDescriptor() { + PCollection input = pipeline.apply(Read.from(new SerializableBoundedSource())); + TypeDescriptor typeDescriptor = input.getTypeDescriptor(); + assertEquals(String.class, typeDescriptor.getType()); + + ListBoundedSource longs = new ListBoundedSource<>(VarLongCoder.of()); + PCollection> numbers = pipeline.apply(Read.from(longs)); + assertEquals(new TypeDescriptor>() {}, numbers.getTypeDescriptor()); + } + + @Test + @Category({ + NeedsRunner.class, + UsesUnboundedPCollections.class, + UsesUnboundedSplittableParDo.class + }) + public void testUnboundedSdfWrapperCacheStartedReaders() { + long numElements = 1000L; + PCollection input = + pipeline.apply(Read.from(new ExpectCacheUnboundedSource(numElements))); + PAssert.that(input) + .containsInAnyOrder( + LongStream.rangeClosed(1L, numElements).boxed().collect(Collectors.toList())); + // TODO(BEAM-10670): Remove additional experiments when SDF read is default. + ExperimentalOptions.addExperiment( + pipeline.getOptions().as(ExperimentalOptions.class), "use_sdf_read"); + // Force the pipeline to run with one thread to ensure the reader will be reused on one DoFn + // instance. + // We are not able to use DirectOptions because of circular dependency. 
+ pipeline + .runWithAdditionalOptionArgs(ImmutableList.of("--targetParallelism=1")) + .waitUntilFinish(); + } + + @Test + @Category({ + NeedsRunner.class, + UsesUnboundedPCollections.class, + UsesUnboundedSplittableParDo.class + }) + public void testWatermarkAdvanceOnClaimFail() { + // NOTE: this test is supposed to run only against DirectRunner + // as for other runners it might not be working the interception of watermark + // through the STATIC_INSTANT_LIST_MAP + int numElements = 1000; + final String uuid = UUID.randomUUID().toString(); + List interceptedWatermark = + STATIC_INSTANT_LIST_MAP.computeIfAbsent(uuid, tmp -> new ArrayList<>()); + PCollection counted = + pipeline + .apply( + newUnboundedReadInterceptingWatermark( + numElements, + (Serializable & Consumer) + (instant -> STATIC_INSTANT_LIST_MAP.get(uuid).add(instant)))) + .apply( + Window.into(new GlobalWindows()) + .discardingFiredPanes() + .triggering(AfterWatermark.pastEndOfWindow())) + .apply(Count.globally()); + PAssert.that(counted).containsInAnyOrder((long) numElements); + pipeline.run().waitUntilFinish(); + // verify that the observed watermark gradually moves + assertThat(interceptedWatermark.size(), greaterThanOrEqualTo(numElements)); + Instant watermark = interceptedWatermark.get(0); + for (int i = 1; i < interceptedWatermark.size(); i++) { + assertThat( + "Watermarks should be non-decreasing sequence, got " + interceptedWatermark, + !watermark.isAfter(interceptedWatermark.get(i))); + watermark = interceptedWatermark.get(i); + } + } + + private > + Read.Unbounded newUnboundedReadInterceptingWatermark( + long numElements, T interceptedWatermarkReceiver) { + + UnboundedLongSource source = new UnboundedLongSource(numElements); + return new Read.Unbounded(null, source) { + @Override + @SuppressWarnings("unchecked") + Read.UnboundedSourceAsSDFWrapperFn createUnboundedSdfWrapper() { + return new Read.UnboundedSourceAsSDFWrapperFn( + (Coder) source.getCheckpointMarkCoder()) { + @Override + public WatermarkEstimators.Manual newWatermarkEstimator(Instant watermarkEstimatorState) { + return new WatermarkEstimators.Manual(watermarkEstimatorState) { + @Override + public void setWatermark(Instant watermark) { + super.setWatermark(watermark); + interceptedWatermarkReceiver.accept(watermark); + } + }; + } + }; + } + }; + } + private abstract static class CustomBoundedSource extends BoundedSource { @Override public List> split( @@ -132,6 +275,35 @@ public Coder getOutputCoder() { } } + private static class ListBoundedSource extends BoundedSource> { + private Coder coder; + + ListBoundedSource(Coder coder) { + this.coder = coder; + } + + @Override + public List>> split( + long desiredBundleSizeBytes, PipelineOptions options) throws Exception { + return null; + } + + @Override + public long getEstimatedSizeBytes(PipelineOptions options) throws Exception { + return 0; + } + + @Override + public BoundedReader> createReader(PipelineOptions options) throws IOException { + return null; + } + + @Override + public Coder> getOutputCoder() { + return ListCoder.of(coder); + } + } + private static class NotSerializableBoundedSource extends CustomBoundedSource { @SuppressWarnings("unused") private final NotSerializableClass notSerializableClass = new NotSerializableClass(); @@ -139,6 +311,103 @@ private static class NotSerializableBoundedSource extends CustomBoundedSource { private static class SerializableBoundedSource extends CustomBoundedSource {} + private static class ExpectCacheUnboundedSource + extends UnboundedSource { + + private final 
long numElements; + + ExpectCacheUnboundedSource(long numElements) { + this.numElements = numElements; + } + + @Override + public List> split( + int desiredNumSplits, PipelineOptions options) throws Exception { + return ImmutableList.of(this); + } + + @Override + public UnboundedReader createReader( + PipelineOptions options, @Nullable CounterMark checkpointMark) throws IOException { + if (checkpointMark != null) { + throw new IOException("The reader should be retrieved from cache instead of a new one"); + } + return new ExpectCacheReader(this, checkpointMark); + } + + @Override + public Coder getOutputCoder() { + return VarLongCoder.of(); + } + + @Override + public Coder getCheckpointMarkCoder() { + return AvroCoder.of(CountingSource.CounterMark.class); + } + } + + private static class ExpectCacheReader extends UnboundedReader { + private long current; + private ExpectCacheUnboundedSource source; + + ExpectCacheReader(ExpectCacheUnboundedSource source, CounterMark checkpointMark) { + this.source = source; + if (checkpointMark == null) { + current = 0L; + } else { + current = checkpointMark.getLastEmitted(); + } + } + + @Override + public boolean start() throws IOException { + return advance(); + } + + @Override + public boolean advance() throws IOException { + current += 1; + if (current > source.numElements) { + return false; + } + return true; + } + + @Override + public Long getCurrent() throws NoSuchElementException { + return current; + } + + @Override + public Instant getCurrentTimestamp() throws NoSuchElementException { + return getWatermark(); + } + + @Override + public void close() throws IOException {} + + @Override + public Instant getWatermark() { + if (current > source.numElements) { + return BoundedWindow.TIMESTAMP_MAX_VALUE; + } + return BoundedWindow.TIMESTAMP_MIN_VALUE; + } + + @Override + public CheckpointMark getCheckpointMark() { + if (current <= 0) { + return null; + } + return new CounterMark(current, BoundedWindow.TIMESTAMP_MIN_VALUE); + } + + @Override + public UnboundedSource getCurrentSource() { + return source; + } + } + private abstract static class CustomUnboundedSource extends UnboundedSource { @Override @@ -182,4 +451,114 @@ private static class NotSerializableUnboundedSource extends CustomUnboundedSourc private static class SerializableUnboundedSource extends CustomUnboundedSource {} private static class NotSerializableClass {} + + private static class OffsetCheckpointMark implements CheckpointMark { + + private static final Coder CODER = + new CustomCoder() { + private final VarLongCoder longCoder = VarLongCoder.of(); + + @Override + public void encode(OffsetCheckpointMark value, OutputStream outStream) + throws CoderException, IOException { + longCoder.encode(value.offset, outStream); + } + + @Override + public OffsetCheckpointMark decode(InputStream inStream) + throws CoderException, IOException { + return new OffsetCheckpointMark(longCoder.decode(inStream)); + } + }; + + private final long offset; + + OffsetCheckpointMark(Long offset) { + this.offset = MoreObjects.firstNonNull(offset, -1L); + } + + @Override + public void finalizeCheckpoint() {} + } + + private class UnboundedLongSource extends UnboundedSource { + + private final long numElements; + + public UnboundedLongSource(long numElements) { + this.numElements = numElements; + } + + @Override + public List> split( + int desiredNumSplits, PipelineOptions options) { + + return Collections.singletonList(this); + } + + @Override + public UnboundedReader createReader( + PipelineOptions options, 
@Nullable OffsetCheckpointMark checkpointMark) { + + return new UnboundedLongSourceReader( + Optional.ofNullable(checkpointMark).map(m -> m.offset).orElse(-1L)); + } + + @Override + public Coder getOutputCoder() { + return VarLongCoder.of(); + } + + @Override + public Coder getCheckpointMarkCoder() { + return OffsetCheckpointMark.CODER; + } + + private class UnboundedLongSourceReader extends UnboundedReader { + private final Instant now = Instant.now(); + private long current; + + UnboundedLongSourceReader(long current) { + this.current = current; + } + + @Override + public Long getCurrent() throws NoSuchElementException { + return current; + } + + @Override + public Instant getCurrentTimestamp() throws NoSuchElementException { + return current < 0 ? now : now.plus(current); + } + + @Override + public void close() throws IOException {} + + @Override + public boolean start() throws IOException { + return advance(); + } + + @Override + public boolean advance() throws IOException { + return ++current < numElements; + } + + @Override + public Instant getWatermark() { + return current < numElements ? getCurrentTimestamp() : BoundedWindow.TIMESTAMP_MAX_VALUE; + } + + @Override + public CheckpointMark getCheckpointMark() { + return new OffsetCheckpointMark(current); + } + + @Override + public UnboundedSource getCurrentSource() { + return UnboundedLongSource.this; + } + } + } } diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/SimpleSink.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/SimpleSink.java index 7d12ac6cfa52..1501a8199ab8 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/SimpleSink.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/SimpleSink.java @@ -29,9 +29,6 @@ /** * A simple {@link FileBasedSink} that writes {@link String} values as lines with header and footer. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) class SimpleSink extends FileBasedSink { public SimpleSink( ResourceId tempDirectory, diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TFRecordIOTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TFRecordIOTest.java index b7b16bef7f65..0be211fd1de4 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TFRecordIOTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TFRecordIOTest.java @@ -82,9 +82,6 @@ /** Tests for TFRecordIO Read and Write transforms. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TFRecordIOTest { /* diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOReadTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOReadTest.java index 5fa89be19126..b113dac40d3f 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOReadTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOReadTest.java @@ -27,6 +27,7 @@ import static org.apache.beam.sdk.io.Compression.UNCOMPRESSED; import static org.apache.beam.sdk.io.Compression.ZIP; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasDisplayItem; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.greaterThan; @@ -35,16 +36,17 @@ import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertNotNull; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.junit.Assume.assumeFalse; +import java.io.BufferedWriter; import java.io.File; import java.io.FileOutputStream; import java.io.IOException; import java.io.OutputStream; import java.io.PrintStream; import java.io.Writer; +import java.nio.charset.Charset; import java.nio.file.Files; import java.nio.file.Path; import java.util.ArrayList; @@ -66,6 +68,9 @@ import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.testing.UsesUnboundedSplittableParDo; import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; import org.apache.beam.sdk.transforms.ToString; import org.apache.beam.sdk.transforms.Watch; import org.apache.beam.sdk.transforms.display.DisplayData; @@ -95,9 +100,6 @@ /** Tests for {@link TextIO.Read}. */ @RunWith(Enclosed.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TextIOReadTest { private static final int LINES_NUMBER_FOR_LARGE = 1000; private static final List EMPTY = Collections.emptyList(); @@ -854,4 +856,121 @@ public void testReadWatchForNewFiles() throws IOException, InterruptedException p.run(); } } + + /** Tests for TextSource class. */ + @RunWith(JUnit4.class) + public static class TextSourceTest { + @Rule public transient TestPipeline pipeline = TestPipeline.create(); + + @Test + @Category(NeedsRunner.class) + public void testRemoveUtf8BOM() throws Exception { + Path p1 = createTestFile("test_txt_ascii", Charset.forName("US-ASCII"), "1,p1", "2,p1"); + Path p2 = + createTestFile( + "test_txt_utf8_no_bom", + Charset.forName("UTF-8"), + "1,p2-Japanese:テスト", + "2,p2-Japanese:テスト"); + Path p3 = + createTestFile( + "test_txt_utf8_bom", + Charset.forName("UTF-8"), + "\uFEFF1,p3-テストBOM", + "\uFEFF2,p3-テストBOM"); + PCollection contents = + pipeline + .apply("Create", Create.of(p1.toString(), p2.toString(), p3.toString())) + .setCoder(StringUtf8Coder.of()) + // PCollection + .apply("Read file", new TextIOReadTest.TextSourceTest.TextFileReadTransform()); + // PCollection>: tableName, line + + // Validate that the BOM bytes (\uFEFF) at the beginning of the first line have been removed. 
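+      // Only a BOM at the very start of the file is stripped; the BOM on the second line of
+      // p3 is therefore expected to survive in the output asserted below.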
+ PAssert.that(contents) + .containsInAnyOrder( + "1,p1", + "2,p1", + "1,p2-Japanese:テスト", + "2,p2-Japanese:テスト", + "1,p3-テストBOM", + "\uFEFF2,p3-テストBOM"); + + pipeline.run(); + } + + @Test + @Category(NeedsRunner.class) + public void testPreserveNonBOMBytes() throws Exception { + // Contains \uFEFE, not UTF BOM. + Path p1 = + createTestFile( + "test_txt_utf_bom", Charset.forName("UTF-8"), "\uFEFE1,p1テスト", "\uFEFE2,p1テスト"); + PCollection contents = + pipeline + .apply("Create", Create.of(p1.toString())) + .setCoder(StringUtf8Coder.of()) + // PCollection + .apply("Read file", new TextIOReadTest.TextSourceTest.TextFileReadTransform()); + + PAssert.that(contents).containsInAnyOrder("\uFEFE1,p1テスト", "\uFEFE2,p1テスト"); + + pipeline.run(); + } + + private static class FileReadDoFn extends DoFn { + + @ProcessElement + public void processElement(ProcessContext c) { + FileIO.ReadableFile file = c.element(); + ValueProvider filenameProvider = + ValueProvider.StaticValueProvider.of(file.getMetadata().resourceId().getFilename()); + // Create a TextSource, passing null as the delimiter to use the default + // delimiters ('\n', '\r', or '\r\n'). + TextSource textSource = new TextSource(filenameProvider, null, null); + try { + BoundedSource.BoundedReader reader = + textSource + .createForSubrangeOfFile(file.getMetadata(), 0, file.getMetadata().sizeBytes()) + .createReader(c.getPipelineOptions()); + for (boolean more = reader.start(); more; more = reader.advance()) { + c.output(reader.getCurrent()); + } + } catch (IOException e) { + throw new RuntimeException( + "Unable to readFile: " + file.getMetadata().resourceId().toString()); + } + } + } + + /** A transform that reads CSV file records. */ + private static class TextFileReadTransform + extends PTransform, PCollection> { + public TextFileReadTransform() {} + + @Override + public PCollection expand(PCollection files) { + return files + // PCollection + .apply(FileIO.matchAll().withEmptyMatchTreatment(EmptyMatchTreatment.DISALLOW)) + // PCollection + .apply(FileIO.readMatches()) + // PCollection + .apply("Read lines", ParDo.of(new TextIOReadTest.TextSourceTest.FileReadDoFn())); + // PCollection: line + } + } + + private Path createTestFile(String filename, Charset charset, String... lines) + throws IOException { + Path path = Files.createTempFile(filename, ".csv"); + try (BufferedWriter writer = Files.newBufferedWriter(path, charset)) { + for (String line : lines) { + writer.write(line); + writer.write('\n'); + } + } + return path; + } + } } diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOWriteTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOWriteTest.java index 0879fe90d164..4cb8448ba922 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOWriteTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOWriteTest.java @@ -22,9 +22,9 @@ import static org.apache.beam.sdk.TestUtils.NO_LINES_ARRAY; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasDisplayItem; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects.firstNonNull; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.junit.Assume.assumeFalse; @@ -88,9 +88,6 @@ /** Tests for {@link TextIO.Write}. 
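 * Covers header and footer handling and writing via {@link TextIO#sink()}.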
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TextIOWriteTest { private static final String MY_HEADER = "myHeader"; private static final String MY_FOOTER = "myFooter"; @@ -692,20 +689,6 @@ public void testWriteViaSink() throws Exception { .apply("Read Files", TextIO.readFiles())) .containsInAnyOrder(data); - PAssert.that( - p.apply("Create Data ReadAll", Create.of(data)) - .apply( - "Write ReadAll", - FileIO.write() - .to(tempFolder.getRoot().toString()) - .withSuffix(".txt") - .via(TextIO.sink()) - .withIgnoreWindowing()) - .getPerDestinationOutputFilenames() - .apply("Extract Values ReadAll", Values.create()) - .apply("Read All", TextIO.readAll())) - .containsInAnyOrder(data); - p.run(); } diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextSourceTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextSourceTest.java deleted file mode 100644 index 4e20753c3c1d..000000000000 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextSourceTest.java +++ /dev/null @@ -1,161 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package org.apache.beam.sdk.io; - -import java.io.BufferedWriter; -import java.io.IOException; -import java.nio.charset.Charset; -import java.nio.file.Files; -import java.nio.file.Path; -import org.apache.beam.sdk.coders.StringUtf8Coder; -import org.apache.beam.sdk.io.FileIO.ReadableFile; -import org.apache.beam.sdk.io.fs.EmptyMatchTreatment; -import org.apache.beam.sdk.options.ValueProvider; -import org.apache.beam.sdk.testing.NeedsRunner; -import org.apache.beam.sdk.testing.PAssert; -import org.apache.beam.sdk.testing.TestPipeline; -import org.apache.beam.sdk.transforms.Create; -import org.apache.beam.sdk.transforms.DoFn; -import org.apache.beam.sdk.transforms.PTransform; -import org.apache.beam.sdk.transforms.ParDo; -import org.apache.beam.sdk.values.PCollection; -import org.junit.Rule; -import org.junit.Test; -import org.junit.experimental.categories.Category; -import org.junit.runner.RunWith; -import org.junit.runners.JUnit4; - -/** Tests for TextSource class. 
*/ -@RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) -public class TextSourceTest { - @Rule public transient TestPipeline pipeline = TestPipeline.create(); - - @Test - @Category(NeedsRunner.class) - public void testRemoveUtf8BOM() throws Exception { - Path p1 = createTestFile("test_txt_ascii", Charset.forName("US-ASCII"), "1,p1", "2,p1"); - Path p2 = - createTestFile( - "test_txt_utf8_no_bom", - Charset.forName("UTF-8"), - "1,p2-Japanese:テスト", - "2,p2-Japanese:テスト"); - Path p3 = - createTestFile( - "test_txt_utf8_bom", - Charset.forName("UTF-8"), - "\uFEFF1,p3-テストBOM", - "\uFEFF2,p3-テストBOM"); - PCollection contents = - pipeline - .apply("Create", Create.of(p1.toString(), p2.toString(), p3.toString())) - .setCoder(StringUtf8Coder.of()) - // PCollection - .apply("Read file", new TextFileReadTransform()); - // PCollection>: tableName, line - - // Validate that the BOM bytes (\uFEFF) at the beginning of the first line have been removed. - PAssert.that(contents) - .containsInAnyOrder( - "1,p1", - "2,p1", - "1,p2-Japanese:テスト", - "2,p2-Japanese:テスト", - "1,p3-テストBOM", - "\uFEFF2,p3-テストBOM"); - - pipeline.run(); - } - - @Test - @Category(NeedsRunner.class) - public void testPreserveNonBOMBytes() throws Exception { - // Contains \uFEFE, not UTF BOM. - Path p1 = - createTestFile( - "test_txt_utf_bom", Charset.forName("UTF-8"), "\uFEFE1,p1テスト", "\uFEFE2,p1テスト"); - PCollection contents = - pipeline - .apply("Create", Create.of(p1.toString())) - .setCoder(StringUtf8Coder.of()) - // PCollection - .apply("Read file", new TextFileReadTransform()); - - PAssert.that(contents).containsInAnyOrder("\uFEFE1,p1テスト", "\uFEFE2,p1テスト"); - - pipeline.run(); - } - - private static class FileReadDoFn extends DoFn { - - @ProcessElement - public void processElement(ProcessContext c) { - ReadableFile file = c.element(); - ValueProvider filenameProvider = - ValueProvider.StaticValueProvider.of(file.getMetadata().resourceId().getFilename()); - // Create a TextSource, passing null as the delimiter to use the default - // delimiters ('\n', '\r', or '\r\n'). - TextSource textSource = new TextSource(filenameProvider, null, null); - try { - BoundedSource.BoundedReader reader = - textSource - .createForSubrangeOfFile(file.getMetadata(), 0, file.getMetadata().sizeBytes()) - .createReader(c.getPipelineOptions()); - for (boolean more = reader.start(); more; more = reader.advance()) { - c.output(reader.getCurrent()); - } - } catch (IOException e) { - throw new RuntimeException( - "Unable to readFile: " + file.getMetadata().resourceId().toString()); - } - } - } - - /** A transform that reads CSV file records. */ - private static class TextFileReadTransform - extends PTransform, PCollection> { - public TextFileReadTransform() {} - - @Override - public PCollection expand(PCollection files) { - return files - // PCollection - .apply(FileIO.matchAll().withEmptyMatchTreatment(EmptyMatchTreatment.DISALLOW)) - // PCollection - .apply(FileIO.readMatches()) - // PCollection - .apply("Read lines", ParDo.of(new FileReadDoFn())); - // PCollection: line - } - } - - private Path createTestFile(String filename, Charset charset, String... 
lines) - throws IOException { - Path path = Files.createTempFile(filename, ".csv"); - try (BufferedWriter writer = Files.newBufferedWriter(path, charset)) { - for (String line : lines) { - writer.write(line); - writer.write('\n'); - } - } - return path; - } -} diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/WriteFilesTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/WriteFilesTest.java index 7900b8a33a74..059abcf8a5c6 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/WriteFilesTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/WriteFilesTest.java @@ -20,6 +20,7 @@ import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasDisplayItem; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.includesDisplayDataFor; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects.firstNonNull; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.containsString; import static org.hamcrest.Matchers.equalTo; @@ -29,7 +30,6 @@ import static org.hamcrest.Matchers.nullValue; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.io.BufferedReader; @@ -62,6 +62,9 @@ import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider; import org.apache.beam.sdk.testing.NeedsRunner; import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.testing.TestStream; +import org.apache.beam.sdk.testing.UsesTestStream; +import org.apache.beam.sdk.testing.UsesTestStreamWithProcessingTime; import org.apache.beam.sdk.testing.UsesUnboundedPCollections; import org.apache.beam.sdk.transforms.Create; import org.apache.beam.sdk.transforms.DoFn; @@ -89,10 +92,12 @@ import org.apache.beam.sdk.values.ShardedKey; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Optional; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; import org.apache.commons.compress.utils.Sets; import org.hamcrest.Matchers; import org.joda.time.Duration; +import org.joda.time.Instant; import org.joda.time.format.DateTimeFormatter; import org.joda.time.format.ISODateTimeFormat; import org.junit.Rule; @@ -105,9 +110,6 @@ /** Tests for the WriteFiles PTransform. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class WriteFilesTest { @Rule public TemporaryFolder tmpFolder = new TemporaryFolder(); @Rule public final TestPipeline p = TestPipeline.create(); @@ -311,7 +313,69 @@ public Void apply(Integer shardNumber, List shardContent) { } return null; } - }); + }, + false); + } + + @Test + @Category({NeedsRunner.class, UsesUnboundedPCollections.class}) + public void testWithRunnerDeterminedShardingUnbounded() throws IOException { + runShardedWrite( + Arrays.asList("one", "two", "three", "four", "five", "six"), + Window.into(FixedWindows.of(Duration.standardSeconds(10))), + getBaseOutputFilename(), + WriteFiles.to(makeSimpleSink()).withWindowedWrites().withRunnerDeterminedSharding(), + null, + true); + } + + @Test + @Category({ + NeedsRunner.class, + UsesUnboundedPCollections.class, + UsesTestStream.class, + UsesTestStreamWithProcessingTime.class + }) + public void testWithRunnerDeterminedShardingTestStream() throws IOException { + List elements = Lists.newArrayList(); + for (int i = 0; i < 30; ++i) { + elements.add("number: " + i); + } + Instant startInstant = new Instant(0L); + TestStream testStream = + TestStream.create(StringUtf8Coder.of()) + // Initialize watermark for timer to be triggered correctly. + .advanceWatermarkTo(startInstant) + // Add 10 elements in the first window. + .addElements(elements.get(0), Iterables.toArray(elements.subList(1, 10), String.class)) + .advanceProcessingTime(Duration.standardMinutes(1)) + .advanceWatermarkTo(startInstant.plus(Duration.standardSeconds(5))) + // Add 10 more elements in the first window. + .addElements( + elements.get(10), Iterables.toArray(elements.subList(11, 20), String.class)) + .advanceProcessingTime(Duration.standardMinutes(1)) + .advanceWatermarkTo(startInstant.plus(Duration.standardSeconds(10))) + // Add the remaining relements in the second window. + .addElements( + elements.get(20), Iterables.toArray(elements.subList(21, 30), String.class)) + .advanceProcessingTime(Duration.standardMinutes(1)) + .advanceWatermarkToInfinity(); + + // Flag to validate that the pipeline options are passed to the Sink + WriteOptions options = TestPipeline.testingPipelineOptions().as(WriteOptions.class); + options.setTestFlag("test_value"); + Pipeline p = TestPipeline.create(options); + WriteFiles write = + WriteFiles.to(makeSimpleSink()).withWindowedWrites().withRunnerDeterminedSharding(); + p.apply(testStream) + .apply(Window.into(FixedWindows.of(Duration.standardSeconds(10)))) + .apply(write) + .getPerDestinationOutputFilenames() + .apply(new VerifyFilesExist<>()); + p.run(); + + checkFileContents( + getBaseOutputFilename(), elements, Optional.absent(), !write.getWindowedWrites(), null); } /** Test a WriteFiles transform with an empty PCollection. 
*/ @@ -453,15 +517,16 @@ public void testUnboundedNeedsWindowed() { @Test @Category(NeedsRunner.class) - public void testUnboundedWritesNeedSharding() { + public void testUnboundedWritesWithMergingWindowNeedSharding() { thrown.expect(IllegalArgumentException.class); thrown.expectMessage( - "When applying WriteFiles to an unbounded PCollection, " + "When applying WriteFiles to an unbounded PCollection with merging windows, " + "must specify number of output shards explicitly"); SimpleSink sink = makeSimpleSink(); p.apply(Create.of("foo")) .setIsBoundedInternal(IsBounded.UNBOUNDED) + .apply(Window.into(Sessions.withGapDuration(Duration.millis(100)))) .apply(WriteFiles.to(sink).withWindowedWrites()); p.run(); } @@ -706,9 +771,9 @@ public ResourceId unwindowedFilename( } /** - * Same as {@link #runShardedWrite(List, PTransform, String, WriteFiles, BiFunction)} but without - * shard content check. This means content will be checked only globally, that shards together - * contains written input and not content per shard + * Same as {@link #runShardedWrite(List, PTransform, String, WriteFiles, BiFunction, boolean)} but + * without shard content check. This means content will be checked only globally, that shards + * together contains written input and not content per shard */ private void runShardedWrite( List inputs, @@ -716,7 +781,7 @@ private void runShardedWrite( String baseName, WriteFiles write) throws IOException { - runShardedWrite(inputs, transform, baseName, write, null); + runShardedWrite(inputs, transform, baseName, write, null, false); } /** @@ -730,7 +795,8 @@ private void runShardedWrite( PTransform, PCollection> transform, String baseName, WriteFiles write, - BiFunction, Void> shardContentChecker) + BiFunction, Void> shardContentChecker, + boolean isUnbounded) throws IOException { // Flag to validate that the pipeline options are passed to the Sink WriteOptions options = TestPipeline.testingPipelineOptions().as(WriteOptions.class); @@ -743,6 +809,7 @@ private void runShardedWrite( timestamps.add(i + 1); } p.apply(Create.timestamped(inputs, timestamps).withCoder(StringUtf8Coder.of())) + .setIsBoundedInternal(isUnbounded ? 
IsBounded.UNBOUNDED : IsBounded.BOUNDED) .apply(transform) .apply(write) .getPerDestinationOutputFilenames() diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/range/ByteKeyRangeEstimateFractionTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/range/ByteKeyRangeEstimateFractionTest.java index 7093d1177392..1320e159cc41 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/range/ByteKeyRangeEstimateFractionTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/range/ByteKeyRangeEstimateFractionTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.io.range; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.greaterThanOrEqualTo; -import static org.junit.Assert.assertThat; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.junit.Test; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/range/ByteKeyRangeInterpolateKeyTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/range/ByteKeyRangeInterpolateKeyTest.java index 551815fed400..e3845435a8dd 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/range/ByteKeyRangeInterpolateKeyTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/range/ByteKeyRangeInterpolateKeyTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.sdk.io.range; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.closeTo; import static org.hamcrest.Matchers.containsString; import static org.hamcrest.Matchers.greaterThanOrEqualTo; -import static org.junit.Assert.assertThat; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.junit.Test; @@ -34,9 +34,6 @@ * ByteKeyRange#estimateFractionForKey} by converting the interpolated keys back to fractions. */ @RunWith(Parameterized.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ByteKeyRangeInterpolateKeyTest { private static final ByteKey[] TEST_KEYS = ByteKeyRangeTest.RANGE_TEST_KEYS; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/range/ByteKeyRangeTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/range/ByteKeyRangeTest.java index 0b8d91b16b34..523e680aaa88 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/range/ByteKeyRangeTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/range/ByteKeyRangeTest.java @@ -17,11 +17,11 @@ */ package org.apache.beam.sdk.io.range; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.greaterThanOrEqualTo; import static org.hamcrest.Matchers.lessThan; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.junit.Assert.fail; @@ -35,9 +35,6 @@ /** Tests for {@link ByteKeyRange}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ByteKeyRangeTest { // A set of ranges for testing. 
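  // Each range contains its start key and excludes its end key.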
private static final ByteKeyRange RANGE_1_10 = ByteKeyRange.of(ByteKey.of(1), ByteKey.of(10)); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/range/ByteKeyTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/range/ByteKeyTest.java index 7add55374acb..13353f88a922 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/range/ByteKeyTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/range/ByteKeyTest.java @@ -17,11 +17,11 @@ */ package org.apache.beam.sdk.io.range; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.lessThan; import static org.junit.Assert.assertArrayEquals; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.junit.Assert.fail; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/metrics/MetricResultsMatchers.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/metrics/MetricResultsMatchers.java index 52ba9b974dfd..2cd89c4f9f6e 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/metrics/MetricResultsMatchers.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/metrics/MetricResultsMatchers.java @@ -23,9 +23,6 @@ import org.hamcrest.TypeSafeMatcher; /** Matchers for {@link MetricResults}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class MetricResultsMatchers { /** @@ -157,7 +154,7 @@ static Matcher> distributionCommittedMinMax( return distributionMinMax(namespace, name, step, committedMin, committedMax, true); } - public static Matcher> distributionMinMax( + public static Matcher> distributionMinMax( final String namespace, final String name, final String step, @@ -165,14 +162,13 @@ public static Matcher> distributionMinMax( final Long max, final boolean isCommitted) { final String metricState = isCommitted ? "committed" : "attempted"; - return new MatchNameAndKey(namespace, name, step) { + return new MatchNameAndKey(namespace, name, step) { @Override - protected boolean matchesSafely(MetricResult item) { - final DistributionResult metricValue = - isCommitted ? item.getCommitted() : item.getAttempted(); + protected boolean matchesSafely(MetricResult item) { + final T metricValue = isCommitted ? item.getCommitted() : item.getAttempted(); return super.matchesSafely(item) - && Objects.equals(min, metricValue.getMin()) - && Objects.equals(max, metricValue.getMax()); + && Objects.equals(min, ((DistributionResult) metricValue).getMin()) + && Objects.equals(max, ((DistributionResult) metricValue).getMax()); } @Override @@ -187,25 +183,23 @@ public void describeTo(Description description) { } @Override - protected void describeMismatchSafely( - MetricResult item, Description mismatchDescription) { - final DistributionResult metricValue = - isCommitted ? item.getCommitted() : item.getAttempted(); + protected void describeMismatchSafely(MetricResult item, Description mismatchDescription) { + final T metricValue = isCommitted ? 
item.getCommitted() : item.getAttempted(); super.describeMismatchSafely(item, mismatchDescription); - if (!Objects.equals(min, metricValue.getMin())) { + if (!Objects.equals(min, ((DistributionResult) metricValue).getMin())) { mismatchDescription .appendText(String.format("%sMin: ", metricState)) .appendValue(min) .appendText(" != ") - .appendValue(metricValue.getMin()); + .appendValue(((DistributionResult) metricValue).getMin()); } - if (!Objects.equals(max, metricValue.getMax())) { + if (!Objects.equals(max, ((DistributionResult) metricValue).getMax())) { mismatchDescription .appendText(String.format("%sMax: ", metricState)) .appendValue(max) .appendText(" != ") - .appendValue(metricValue.getMax()); + .appendValue(((DistributionResult) metricValue).getMax()); } mismatchDescription.appendText("}"); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/metrics/MetricsEnvironmentTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/metrics/MetricsEnvironmentTest.java index e3c8f19ea40c..c49a59cbb58f 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/metrics/MetricsEnvironmentTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/metrics/MetricsEnvironmentTest.java @@ -17,11 +17,14 @@ */ package org.apache.beam.sdk.metrics; +import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNull; import static org.mockito.Mockito.verify; import static org.mockito.Mockito.verifyNoMoreInteractions; import static org.mockito.Mockito.when; +import java.io.Closeable; +import java.io.IOException; import org.junit.After; import org.junit.Test; import org.junit.runner.RunWith; @@ -30,9 +33,6 @@ /** Tests for {@link MetricsEnvironment}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class MetricsEnvironmentTest { @After public void teardown() { @@ -50,11 +50,38 @@ public void testUsesAppropriateMetricsContainer() { when(c1.getCounter(MetricName.named("ns", "name"))).thenReturn(counter1); when(c2.getCounter(MetricName.named("ns", "name"))).thenReturn(counter2); - MetricsEnvironment.setCurrentContainer(c1); + assertNull(MetricsEnvironment.setCurrentContainer(c1)); counter.inc(); - MetricsEnvironment.setCurrentContainer(c2); + assertEquals(c1, MetricsEnvironment.setCurrentContainer(c2)); counter.dec(); - MetricsEnvironment.setCurrentContainer(null); + assertEquals(c2, MetricsEnvironment.setCurrentContainer(null)); + + verify(counter1).inc(1L); + verify(counter2).inc(-1L); + verifyNoMoreInteractions(counter1, counter2); + } + + @Test + public void testScopedMetricsContainers() throws IOException { + Counter counter = Metrics.counter("ns", "name"); + + MetricsContainer c1 = Mockito.mock(MetricsContainer.class); + MetricsContainer c2 = Mockito.mock(MetricsContainer.class); + Counter counter1 = Mockito.mock(Counter.class); + Counter counter2 = Mockito.mock(Counter.class); + when(c2.getCounter(MetricName.named("ns", "name"))).thenReturn(counter2); + when(c1.getCounter(MetricName.named("ns", "name"))).thenReturn(counter1); + + try (Closeable close = MetricsEnvironment.scopedMetricsContainer(c1)) { + try (Closeable close2 = MetricsEnvironment.scopedMetricsContainer(null)) { + counter.inc(1000); + try (Closeable close3 = MetricsEnvironment.scopedMetricsContainer(c2)) { + counter.dec(); + } + } + counter.inc(); + } + assertEquals(null, MetricsEnvironment.setCurrentContainer(null)); verify(counter1).inc(1L); verify(counter2).inc(-1L); diff --git 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/metrics/MetricsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/metrics/MetricsTest.java index e89430848bda..089d67993314 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/metrics/MetricsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/metrics/MetricsTest.java @@ -20,9 +20,10 @@ import static org.apache.beam.sdk.metrics.MetricResultsMatchers.attemptedMetricsResult; import static org.apache.beam.sdk.metrics.MetricResultsMatchers.distributionMinMax; import static org.apache.beam.sdk.metrics.MetricResultsMatchers.metricsResult; +import static org.hamcrest.MatcherAssert.assertThat; +import static org.hamcrest.Matchers.anyOf; import static org.hamcrest.Matchers.hasItem; import static org.junit.Assert.assertNull; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.verify; import static org.mockito.Mockito.when; @@ -54,9 +55,6 @@ import org.mockito.Mockito; /** Tests for {@link Metrics}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class MetricsTest implements Serializable { private static final String NS = "test"; @@ -290,12 +288,20 @@ public void testBoundedSourceMetrics() { assertThat( metrics.getCounters(), - hasItem( - attemptedMetricsResult( - ELEMENTS_READ.getNamespace(), - ELEMENTS_READ.getName(), - "Read(BoundedCountingSource)", - 1000L))); + anyOf( + // Step names are different for portable and non-portable runners. + hasItem( + attemptedMetricsResult( + ELEMENTS_READ.getNamespace(), + ELEMENTS_READ.getName(), + "Read(BoundedCountingSource)", + 1000L)), + hasItem( + attemptedMetricsResult( + ELEMENTS_READ.getNamespace(), + ELEMENTS_READ.getName(), + "Read-BoundedCountingSource-", + 1000L)))); } @Test @@ -321,12 +327,20 @@ public void testUnboundedSourceMetrics() { assertThat( metrics.getCounters(), - hasItem( - attemptedMetricsResult( - ELEMENTS_READ.getNamespace(), - ELEMENTS_READ.getName(), - "Read(UnboundedCountingSource)", - 1000L))); + anyOf( + // Step names are different for portable and non-portable runners. + hasItem( + attemptedMetricsResult( + ELEMENTS_READ.getNamespace(), + ELEMENTS_READ.getName(), + "Read(UnboundedCountingSource)", + 1000L)), + hasItem( + attemptedMetricsResult( + ELEMENTS_READ.getNamespace(), + ELEMENTS_READ.getName(), + "Read-UnboundedCountingSource-", + 1000L)))); } } @@ -377,7 +391,13 @@ public void testAttemptedGaugeMetrics() { private static void assertCounterMetrics(MetricQueryResults metrics, boolean isCommitted) { assertThat( metrics.getCounters(), - hasItem(metricsResult(NAMESPACE, "count", "MyStep1", 3L, isCommitted))); + anyOf( + // Step names are different for portable and non-portable runners. + hasItem(metricsResult(NAMESPACE, "count", "MyStep1", 3L, isCommitted)), + hasItem( + metricsResult( + NAMESPACE, "count", "MyStep1-ParMultiDo-Anonymous-", 3L, isCommitted)))); + assertThat( metrics.getCounters(), hasItem(metricsResult(NAMESPACE, "count", "MyStep2", 6L, isCommitted))); @@ -398,13 +418,22 @@ private static void assertGaugeMetrics(MetricQueryResults metrics, boolean isCom private static void assertDistributionMetrics(MetricQueryResults metrics, boolean isCommitted) { assertThat( metrics.getDistributions(), - hasItem( - metricsResult( - NAMESPACE, - "input", - "MyStep1", - DistributionResult.create(26L, 3L, 5L, 13L), - isCommitted))); + anyOf( + // Step names are different for portable and non-portable runners. 
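+            // Non-portable execution reports the user-given step name ("MyStep1"), while
+            // portable execution reports the expanded name ("MyStep1-ParMultiDo-Anonymous-"),
+            // so either form is accepted.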
+ hasItem( + metricsResult( + NAMESPACE, + "input", + "MyStep1", + DistributionResult.create(26L, 3L, 5L, 13L), + isCommitted)), + hasItem( + metricsResult( + NAMESPACE, + "input", + "MyStep1-ParMultiDo-Anonymous-", + DistributionResult.create(26L, 3L, 5L, 13L), + isCommitted)))); assertThat( metrics.getDistributions(), @@ -417,7 +446,12 @@ private static void assertDistributionMetrics(MetricQueryResults metrics, boolea isCommitted))); assertThat( metrics.getDistributions(), - hasItem(distributionMinMax(NAMESPACE, "bundle", "MyStep1", 10L, 40L, isCommitted))); + anyOf( + // Step names are different for portable and non-portable runners. + hasItem(distributionMinMax(NAMESPACE, "bundle", "MyStep1", 10L, 40L, isCommitted)), + hasItem( + distributionMinMax( + NAMESPACE, "bundle", "MyStep1-ParMultiDo-Anonymous-", 10L, 40L, isCommitted)))); } private static void assertAllMetrics(MetricQueryResults metrics, boolean isCommitted) { diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/options/PipelineOptionsFactoryTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/options/PipelineOptionsFactoryTest.java index ea54562b19ba..94fd3f41faac 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/options/PipelineOptionsFactoryTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/options/PipelineOptionsFactoryTest.java @@ -19,6 +19,7 @@ import static java.util.Locale.ROOT; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps.uniqueIndex; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.anyOf; import static org.hamcrest.Matchers.containsString; import static org.hamcrest.Matchers.equalTo; @@ -32,7 +33,7 @@ import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertNull; import static org.junit.Assert.assertSame; -import static org.junit.Assert.assertThat; +import static org.junit.Assert.assertThrows; import static org.junit.Assert.assertTrue; import static org.junit.Assume.assumeFalse; @@ -49,7 +50,9 @@ import com.fasterxml.jackson.databind.SerializerProvider; import com.fasterxml.jackson.databind.annotation.JsonDeserialize; import com.fasterxml.jackson.databind.annotation.JsonSerialize; +import com.fasterxml.jackson.databind.deser.std.StdDeserializer; import com.fasterxml.jackson.databind.module.SimpleModule; +import com.fasterxml.jackson.databind.ser.std.StdSerializer; import com.google.auto.service.AutoService; import java.io.ByteArrayOutputStream; import java.io.IOException; @@ -57,6 +60,7 @@ import java.util.Collection; import java.util.List; import java.util.Map; +import java.util.Objects; import java.util.Set; import org.apache.beam.model.jobmanagement.v1.JobApi.PipelineOptionDescriptor; import org.apache.beam.model.jobmanagement.v1.JobApi.PipelineOptionType; @@ -88,7 +92,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class PipelineOptionsFactoryTest { private static final String DEFAULT_RUNNER_NAME = "DirectRunner"; @@ -2081,6 +2084,161 @@ static String myStaticMethod(OptionsWithStaticMethod o) { } } + public static class SimpleParsedObject { + public String value; + + public SimpleParsedObject(String value) { + this.value = value; + } + } + + public interface OptionsWithParsing extends PipelineOptions { + SimpleParsedObject getSimple(); + + void setSimple(SimpleParsedObject value); + } + + @Test + public void 
testAutoQuoteStringArgumentsForComplexObjects() { + OptionsWithParsing options = + PipelineOptionsFactory.fromArgs("--simple=test").as(OptionsWithParsing.class); + + assertEquals("test", options.getSimple().value); + } + + public static class ComplexType2 { + public String value; + + @Override + public boolean equals(Object o) { + if (this == o) { + return true; + } + if (o == null || getClass() != o.getClass()) { + return false; + } + ComplexType2 that = (ComplexType2) o; + return value.equals(that.value); + } + + @Override + public int hashCode() { + return value.hashCode(); + } + } + + public interface OptionsWithJsonDeserialize1 extends PipelineOptions { + @JsonDeserialize(using = ComplexType2Deserializer1.class) + @JsonSerialize(using = ComplexType2Serializer1.class) + ComplexType2 getComplexType(); + + void setComplexType(ComplexType2 value); + } + + public interface OptionsWithJsonDeserialize2 extends PipelineOptions { + @JsonDeserialize(using = ComplexType2Deserializer2.class) + ComplexType2 getComplexType(); + + void setComplexType(ComplexType2 value); + } + + public static class ComplexType2Deserializer1 extends StdDeserializer { + public ComplexType2Deserializer1() { + super(ComplexType2.class); + } + + @Override + public ComplexType2 deserialize(JsonParser p, DeserializationContext ctxt) + throws IOException, JsonProcessingException { + ComplexType2 ct = new ComplexType2(); + ct.value = p.getText(); + return ct; + } + } + + public static class ComplexType2Serializer1 extends StdSerializer { + public ComplexType2Serializer1() { + super(ComplexType2.class); + } + + @Override + public void serialize(ComplexType2 value, JsonGenerator gen, SerializerProvider provider) + throws IOException { + gen.writeString(value.value); + } + } + + public static class ComplexType2Deserializer2 extends StdDeserializer { + public ComplexType2Deserializer2() { + super(ComplexType2.class); + } + + @Override + public ComplexType2 deserialize(JsonParser p, DeserializationContext ctxt) + throws IOException, JsonProcessingException { + ComplexType2 ct = new ComplexType2(); + ct.value = p.getText(); + return ct; + } + } + + @Test + public void testJsonDeserializeAttribute_NoConflict() { + OptionsWithJsonDeserialize1 options = + PipelineOptionsFactory.fromArgs("--complexType=test").as(OptionsWithJsonDeserialize1.class); + + assertEquals("test", options.getComplexType().value); + } + + @Test + public void testJsonDeserializeAttribute_Conflict() { + OptionsWithJsonDeserialize1 options = + PipelineOptionsFactory.fromArgs("--complexType=test").as(OptionsWithJsonDeserialize1.class); + + IllegalArgumentException thrown = + assertThrows( + IllegalArgumentException.class, () -> options.as(OptionsWithJsonDeserialize2.class)); + assertThat( + thrown.getMessage(), + containsString("Property [complexType] is marked with contradictory annotations")); + } + + public interface InconsistentJsonDeserializeAttributes extends PipelineOptions { + @JsonDeserialize() + String getString(); + + void setString(String value); + } + + public interface InconsistentJsonSerializeAttributes extends PipelineOptions { + @JsonSerialize() + String getString(); + + void setString(String value); + } + + @Test + public void testJsonDeserializeAttributeValidation() { + IllegalArgumentException thrown = + assertThrows( + IllegalArgumentException.class, + () -> + PipelineOptionsFactory.fromArgs("--string=test") + .as(InconsistentJsonDeserializeAttributes.class)); + assertThat(thrown.getMessage(), containsString("Property [string] had only 
@JsonDeserialize")); + } + + @Test + public void testJsonSerializeAttributeValidation() { + IllegalArgumentException thrown = + assertThrows( + IllegalArgumentException.class, + () -> + PipelineOptionsFactory.fromArgs("--string=test") + .as(InconsistentJsonSerializeAttributes.class)); + assertThat(thrown.getMessage(), containsString("Property [string] had only @JsonSerialize")); + } + /** Test interface. */ public interface TestDescribeOptions extends PipelineOptions { String getString(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/options/PipelineOptionsValidatorTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/options/PipelineOptionsValidatorTest.java index b7eeef16ad28..a98d0276c494 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/options/PipelineOptionsValidatorTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/options/PipelineOptionsValidatorTest.java @@ -26,9 +26,6 @@ /** Tests for {@link PipelineOptionsValidator}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PipelineOptionsValidatorTest { @Rule public ExpectedException expectedException = ExpectedException.none(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/options/PortablePipelineOptionsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/options/PortablePipelineOptionsTest.java index 675822c0ca08..47e6d11f5042 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/options/PortablePipelineOptionsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/options/PortablePipelineOptionsTest.java @@ -20,7 +20,9 @@ import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.core.Is.is; import static org.hamcrest.core.IsNull.nullValue; +import static org.junit.Assert.assertEquals; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.junit.Test; public class PortablePipelineOptionsTest { @@ -37,5 +39,33 @@ public void testDefaults() { assertThat(options.getEnvironmentCacheMillis(), is(0)); assertThat(options.getEnvironmentExpirationMillis(), is(0)); assertThat(options.getOutputExecutablePath(), is(nullValue())); + assertThat(options.getEnvironmentOptions(), is(nullValue())); + } + + @Test + public void getEnvironmentOption() { + PortablePipelineOptions options = PipelineOptionsFactory.as(PortablePipelineOptions.class); + options.setEnvironmentOptions(ImmutableList.of("foo=bar")); + assertEquals("bar", PortablePipelineOptions.getEnvironmentOption(options, "foo")); + } + + @Test + public void getEnvironmentOptionContainingEquals() { + PortablePipelineOptions options = PipelineOptionsFactory.as(PortablePipelineOptions.class); + options.setEnvironmentOptions(ImmutableList.of("foo=bar=baz")); + assertEquals("bar=baz", PortablePipelineOptions.getEnvironmentOption(options, "foo")); + } + + @Test + public void getEnvironmentOptionFromEmptyListReturnsEmptyString() { + PortablePipelineOptions options = PipelineOptionsFactory.as(PortablePipelineOptions.class); + assertEquals("", PortablePipelineOptions.getEnvironmentOption(options, "foo")); + } + + @Test + public void getEnvironmentOptionMissingOptionReturnsEmptyString() { + PortablePipelineOptions options = PipelineOptionsFactory.as(PortablePipelineOptions.class); + options.setEnvironmentOptions(ImmutableList.of("foo=bar")); + assertEquals("", PortablePipelineOptions.getEnvironmentOption(options, "baz")); } } diff --git 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/options/ProxyInvocationHandlerTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/options/ProxyInvocationHandlerTest.java index db5e8ee9898f..4cf9b2f0c8bb 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/options/ProxyInvocationHandlerTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/options/ProxyInvocationHandlerTest.java @@ -22,6 +22,7 @@ import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasNamespace; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasType; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasValue; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.allOf; import static org.hamcrest.Matchers.hasItem; import static org.hamcrest.Matchers.not; @@ -29,15 +30,23 @@ import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertNotNull; import static org.junit.Assert.assertNull; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.junit.Assume.assumeFalse; import com.fasterxml.jackson.annotation.JsonIgnore; import com.fasterxml.jackson.annotation.JsonProperty; import com.fasterxml.jackson.annotation.JsonValue; +import com.fasterxml.jackson.core.JsonGenerator; +import com.fasterxml.jackson.core.JsonParser; +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.DeserializationContext; import com.fasterxml.jackson.databind.JsonMappingException; import com.fasterxml.jackson.databind.ObjectMapper; +import com.fasterxml.jackson.databind.SerializerProvider; +import com.fasterxml.jackson.databind.annotation.JsonDeserialize; +import com.fasterxml.jackson.databind.annotation.JsonSerialize; +import com.fasterxml.jackson.databind.deser.std.StdDeserializer; +import com.fasterxml.jackson.databind.ser.std.StdSerializer; import com.google.common.testing.EqualsTester; import java.io.IOException; import java.io.NotSerializableException; @@ -72,9 +81,6 @@ /** Tests for {@link ProxyInvocationHandler}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ProxyInvocationHandlerTest { @Rule public ExpectedException expectedException = ExpectedException.none(); @@ -907,6 +913,24 @@ public interface ObjectPipelineOptions extends PipelineOptions { void setValue(Object value); } + public interface PrimitiveIntOptions extends PipelineOptions { + int getInt(); + + void setInt(int value); + } + + @Test + public void testPrimitiveIntegerFromJsonOptions() throws Exception { + String optionsJson = + "{\"options\":{\"appName\":\"ProxyInvocationHandlerTest\",\"optionsId\":1,\"int\":\"100\"},\"display_data\":[{\"namespace\":\"org.apache.beam.sdk.options.ProxyInvocationHandlerTest$DisplayDataOptions\",\"key\":\"int\",\"type\":\"INTEGER\",\"value\":100},{\"namespace\":\"org.apache.beam.sdk.options.ApplicationNameOptions\",\"key\":\"appName\",\"type\":\"STRING\",\"value\":\"ProxyInvocationHandlerTest\"}]}"; + + PrimitiveIntOptions options = + MAPPER.readValue(optionsJson, PipelineOptions.class).as(PrimitiveIntOptions.class); + + int value = options.getInt(); + assertEquals(100, value); + } + @Test public void testDisplayDataInheritanceNamespace() { ExtendsBaseOptions options = PipelineOptionsFactory.as(ExtendsBaseOptions.class); @@ -972,6 +996,56 @@ public interface BarOptions extends PipelineOptions { void setBar(String value); } + public static class JacksonObject { + String value; + } + + public static class JacksonObjectSerializer extends StdSerializer { + public JacksonObjectSerializer() { + super(JacksonObject.class); + } + + @Override + public void serialize(JacksonObject value, JsonGenerator gen, SerializerProvider provider) + throws IOException { + gen.writeString(value.value); + } + } + + public static class JacksonObjectDeserializer extends StdDeserializer { + public JacksonObjectDeserializer() { + super(JacksonObject.class); + } + + @Override + public JacksonObject deserialize(JsonParser p, DeserializationContext ctxt) + throws IOException, JsonProcessingException { + JacksonObject obj = new JacksonObject(); + obj.value = p.getValueAsString(); + return obj; + } + } + + public interface JacksonOptions extends PipelineOptions { + @JsonSerialize(using = JacksonObjectSerializer.class) + @JsonDeserialize(using = JacksonObjectDeserializer.class) + JacksonObject getJacksonObject(); + + void setJacksonObject(JacksonObject value); + } + + @Test + public void testJacksonSerializeAndDeserialize() throws Exception { + JacksonOptions options = PipelineOptionsFactory.as(JacksonOptions.class); + JacksonObject value = new JacksonObject(); + value.value = "foo"; + + options.setJacksonObject(value); + + JacksonOptions deserializedOptions = serializeDeserialize(JacksonOptions.class, options); + assertEquals(options.getJacksonObject().value, deserializedOptions.getJacksonObject().value); + } + @Test public void testDisplayDataExcludesDefaultValues() { PipelineOptions options = PipelineOptionsFactory.as(HasDefaults.class); @@ -997,6 +1071,23 @@ public void testDisplayDataExcludesValuesAccessedButNeverSet() { assertThat(data, not(hasDisplayItem("foo"))); } + @Test + public void testDisplayDataExcludesHiddenValues() { + HasHidden options = PipelineOptionsFactory.as(HasHidden.class); + options.setFoo("bar"); + + DisplayData data = DisplayData.from(options); + assertThat(data, not(hasDisplayItem("foo"))); + } + + /** Test interface. 
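+   * Its single property is annotated with {@link Hidden} and must therefore be excluded from
+   * display data.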
*/ + public interface HasHidden extends PipelineOptions { + @Hidden + String getFoo(); + + void setFoo(String value); + } + @Test public void testDisplayDataIncludesExplicitlySetDefaults() { HasDefaults options = PipelineOptionsFactory.as(HasDefaults.class); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/options/RemoteEnvironmentOptionsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/options/RemoteEnvironmentOptionsTest.java index 23989e99b1c3..ef4b1646f70f 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/options/RemoteEnvironmentOptionsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/options/RemoteEnvironmentOptionsTest.java @@ -25,9 +25,6 @@ /** Tests for {@link RemoteEnvironmentOptions}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class RemoteEnvironmentOptionsTest { @Test diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/options/SdkHarnessOptionsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/options/SdkHarnessOptionsTest.java index 36fa8582fb49..c5744957d478 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/options/SdkHarnessOptionsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/options/SdkHarnessOptionsTest.java @@ -32,9 +32,6 @@ /** Tests for {@link SdkHarnessOptions}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SdkHarnessOptionsTest { private static final ObjectMapper MAPPER = new ObjectMapper() diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/runners/PipelineRunnerTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/runners/PipelineRunnerTest.java index dd65e4f6f097..f23ce67caa05 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/runners/PipelineRunnerTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/runners/PipelineRunnerTest.java @@ -18,8 +18,8 @@ package org.apache.beam.sdk.runners; import static org.apache.beam.sdk.metrics.MetricResultsMatchers.metricsResult; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.hasItem; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import org.apache.beam.sdk.PipelineResult; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/runners/TransformHierarchyTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/runners/TransformHierarchyTest.java index 304c2c032f23..aeb729198632 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/runners/TransformHierarchyTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/runners/TransformHierarchyTest.java @@ -17,12 +17,12 @@ */ package org.apache.beam.sdk.runners; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.hasItem; import static org.hamcrest.Matchers.is; import static org.hamcrest.Matchers.not; -import static org.junit.Assert.assertThat; import java.io.Serializable; import java.util.Collections; @@ -47,6 +47,7 @@ import org.apache.beam.sdk.transforms.ParDo; import org.apache.beam.sdk.transforms.ParDo.MultiOutput; import org.apache.beam.sdk.transforms.ParDo.SingleOutput; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; import org.apache.beam.sdk.values.PBegin; import org.apache.beam.sdk.values.PCollection; 
import org.apache.beam.sdk.values.PCollection.IsBounded; @@ -71,9 +72,6 @@ /** Tests for {@link TransformHierarchy}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TransformHierarchyTest implements Serializable { @Rule public final transient TestPipeline pipeline = @@ -84,7 +82,7 @@ public class TransformHierarchyTest implements Serializable { @Before public void setup() { - hierarchy = new TransformHierarchy(); + hierarchy = new TransformHierarchy(ResourceHints.create()); } @Test diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/runners/TransformTreeTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/runners/TransformTreeTest.java index 2dabd2455647..b8ebd30b2e98 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/runners/TransformTreeTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/runners/TransformTreeTest.java @@ -17,11 +17,11 @@ */ package org.apache.beam.sdk.runners; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.instanceOf; import static org.hamcrest.Matchers.not; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNotNull; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.io.File; @@ -54,9 +54,6 @@ /** Tests for {@link TransformHierarchy.Node} and {@link TransformHierarchy}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TransformTreeTest { @Rule public final TestPipeline p = TestPipeline.create(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/AutoValueSchemaTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/AutoValueSchemaTest.java index 867b87c1493f..db67a1f52030 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/AutoValueSchemaTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/AutoValueSchemaTest.java @@ -26,7 +26,7 @@ import com.google.auto.value.AutoValue; import java.math.BigDecimal; import java.nio.ByteBuffer; -import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; import org.apache.beam.sdk.schemas.Schema.Field; import org.apache.beam.sdk.schemas.Schema.FieldType; import org.apache.beam.sdk.schemas.annotations.DefaultSchema; @@ -45,12 +45,9 @@ /** Tests for {@link AutoValueSchema}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class AutoValueSchemaTest { static final DateTime DATE = DateTime.parse("1979-03-14"); - static final byte[] BYTE_ARRAY = "bytearray".getBytes(Charset.defaultCharset()); + static final byte[] BYTE_ARRAY = "bytearray".getBytes(StandardCharsets.UTF_8); static final StringBuilder STRING_BUILDER = new StringBuilder("stringbuilder"); static final Schema SIMPLE_SCHEMA = diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/AvroSchemaTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/AvroSchemaTest.java index a8622de983f8..c776d18f3cc0 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/AvroSchemaTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/AvroSchemaTest.java @@ -58,9 +58,6 @@ import org.junit.experimental.categories.Category; /** Tests for AVRO schema classes. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class AvroSchemaTest { /** A test POJO that corresponds to our AVRO schema. */ public static class AvroSubPojo { diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/FieldAccessDescriptorTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/FieldAccessDescriptorTest.java index fe6f25748a11..4b3cf150b69b 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/FieldAccessDescriptorTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/FieldAccessDescriptorTest.java @@ -30,9 +30,6 @@ import org.junit.rules.ExpectedException; /** Tests for {@link FieldAccessDescriptor}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FieldAccessDescriptorTest { private static final Schema SIMPLE_SCHEMA = Schema.builder() diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/JavaBeanSchemaTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/JavaBeanSchemaTest.java index 9833aceb81b9..5a0a40baea2c 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/JavaBeanSchemaTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/JavaBeanSchemaTest.java @@ -26,6 +26,7 @@ import static org.apache.beam.sdk.schemas.utils.TestJavaBeans.NESTED_ARRAY_BEAN_SCHEMA; import static org.apache.beam.sdk.schemas.utils.TestJavaBeans.NESTED_BEAN_SCHEMA; import static org.apache.beam.sdk.schemas.utils.TestJavaBeans.NESTED_MAP_BEAN_SCHEMA; +import static org.apache.beam.sdk.schemas.utils.TestJavaBeans.PARAMETER_NULLABLE_BEAN_SCHEMA; import static org.apache.beam.sdk.schemas.utils.TestJavaBeans.PRIMITIVE_ARRAY_BEAN_SCHEMA; import static org.apache.beam.sdk.schemas.utils.TestJavaBeans.RENAMED_FIELDS_AND_SETTERS_BEAM_SCHEMA; import static org.apache.beam.sdk.schemas.utils.TestJavaBeans.SIMPLE_BEAN_SCHEMA; @@ -38,9 +39,10 @@ import static org.junit.Assert.assertThrows; import static org.junit.Assert.assertTrue; +import java.lang.reflect.Executable; import java.math.BigDecimal; import java.nio.ByteBuffer; -import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; import java.util.Arrays; import java.util.List; import java.util.Map; @@ -56,6 +58,7 @@ import org.apache.beam.sdk.schemas.utils.TestJavaBeans.NestedArraysBean; import org.apache.beam.sdk.schemas.utils.TestJavaBeans.NestedBean; import org.apache.beam.sdk.schemas.utils.TestJavaBeans.NestedMapBean; +import org.apache.beam.sdk.schemas.utils.TestJavaBeans.ParameterNullableBean; import org.apache.beam.sdk.schemas.utils.TestJavaBeans.PrimitiveArrayBean; import org.apache.beam.sdk.schemas.utils.TestJavaBeans.SimpleBean; import org.apache.beam.sdk.schemas.utils.TestJavaBeans.SimpleBeanWithAnnotations; @@ -72,12 +75,9 @@ import org.junit.rules.ExpectedException; /** Tests for the {@link JavaBeanSchema} schema provider. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class JavaBeanSchemaTest { static final DateTime DATE = DateTime.parse("1979-03-14"); - static final byte[] BYTE_ARRAY = "bytearray".getBytes(Charset.defaultCharset()); + static final byte[] BYTE_ARRAY = "bytearray".getBytes(StandardCharsets.UTF_8); private SimpleBean createSimple(String name) { return new SimpleBean( @@ -216,6 +216,37 @@ public void testNullableFromRow() throws NoSuchSchemaException { assertNull(bean.getStringBuilder()); } + /** + * [BEAM-11530] Java distinguishes between parameter annotations and type annotations. Therefore + * annotations declared without {@link java.lang.annotation.ElementType#TYPE_USE} can't be + * accessed through {@link Executable#getAnnotatedParameterTypes()}. Some {@code @Nullable} + * annotations like {@link org.apache.avro.reflect.Nullable} do not declare {@link + * java.lang.annotation.ElementType#TYPE_USE} which makes them parameter annotations once placed + * in front of a parameter. + * + * @see https://stackoverflow.com/a/37587590/5896429 + */ + @Test + public void testParameterNullableToRow() throws NoSuchSchemaException { + SchemaRegistry registry = SchemaRegistry.createDefault(); + ParameterNullableBean bean = new ParameterNullableBean(); + Row row = registry.getToRowFunction(ParameterNullableBean.class).apply(bean); + + assertEquals(1, row.getFieldCount()); + assertNull(row.getInt64("value")); + } + + @Test + public void testParameterNullableFromRow() throws NoSuchSchemaException { + SchemaRegistry registry = SchemaRegistry.createDefault(); + Row row = Row.nullRow(PARAMETER_NULLABLE_BEAN_SCHEMA); + + ParameterNullableBean bean = + registry.getFromRowFunction(ParameterNullableBean.class).apply(row); + assertNull(bean.getValue()); + } + @Test public void testToRowSerializable() throws NoSuchSchemaException { SchemaRegistry registry = SchemaRegistry.createDefault(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/JavaFieldSchemaTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/JavaFieldSchemaTest.java index bb9223ae2fe6..4f8f6bd8a600 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/JavaFieldSchemaTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/JavaFieldSchemaTest.java @@ -43,7 +43,7 @@ import java.math.BigDecimal; import java.nio.ByteBuffer; -import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; import java.util.Arrays; import java.util.List; import java.util.Map; @@ -80,15 +80,12 @@ import org.junit.Test; /** Tests for the {@link JavaFieldSchema} schema provider. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class JavaFieldSchemaTest { static final DateTime DATE = DateTime.parse("1979-03-14"); static final Instant INSTANT = DateTime.parse("1979-03-15").toInstant(); - static final byte[] BYTE_ARRAY = "bytearray".getBytes(Charset.defaultCharset()); + static final byte[] BYTE_ARRAY = "bytearray".getBytes(StandardCharsets.UTF_8); static final ByteBuffer BYTE_BUFFER = - ByteBuffer.wrap("byteBuffer".getBytes(Charset.defaultCharset())); + ByteBuffer.wrap("byteBuffer".getBytes(StandardCharsets.UTF_8)); private SimplePOJO createSimple(String name) { return new SimplePOJO( diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/SchemaCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/SchemaCoderTest.java index ca60267bb302..516b5a23bf6c 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/SchemaCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/SchemaCoderTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.schemas; +import static org.hamcrest.MatcherAssert.assertThat; import static org.junit.Assert.assertNotEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import com.google.auto.value.AutoValue; @@ -51,7 +51,6 @@ @RunWith(Enclosed.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class SchemaCoderTest { diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/SchemaOptionsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/SchemaOptionsTest.java index 0e9aae0d953d..81897a88a5c9 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/SchemaOptionsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/SchemaOptionsTest.java @@ -40,9 +40,6 @@ import org.junit.rules.ExpectedException; /** Unit tests for {@link Schema.Options}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SchemaOptionsTest { private static final String OPTION_NAME = "beam:test:field_i1"; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/SchemaRegistryTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/SchemaRegistryTest.java index db4e4e50f1cf..a00b80c58089 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/SchemaRegistryTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/SchemaRegistryTest.java @@ -39,7 +39,6 @@ /** Unit tests for {@link SchemaRegistry}. */ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class SchemaRegistryTest { static final Schema EMPTY_SCHEMA = Schema.builder().build(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/SchemaTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/SchemaTest.java index 2fa80df929be..5477885c62bb 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/SchemaTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/SchemaTest.java @@ -32,9 +32,6 @@ import org.junit.rules.ExpectedException; /** Unit tests for {@link Schema}. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SchemaTest { @Rule public ExpectedException thrown = ExpectedException.none(); @@ -374,4 +371,23 @@ public void testIllegalNameOf() { schema.nameOf(1); } + + @Test + public void testFieldTypeToString() { + assertEquals("STRING NOT NULL", FieldType.STRING.toString()); + assertEquals("INT64", FieldType.INT64.withNullable(true).toString()); + assertEquals("ARRAY NOT NULL", FieldType.array(FieldType.INT32).toString()); + assertEquals( + "MAP NOT NULL", + FieldType.map(FieldType.INT16, FieldType.FLOAT.withNullable(true)).toString()); + assertEquals( + "ROW", + FieldType.row( + Schema.builder() + .addByteArrayField("field1") + .addField("time", FieldType.DATETIME.withNullable(true)) + .build()) + .withNullable(true) + .toString()); + } } diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/io/AvroPayloadSerializerProviderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/io/AvroPayloadSerializerProviderTest.java new file mode 100644 index 000000000000..dbe9ae06ed1d --- /dev/null +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/io/AvroPayloadSerializerProviderTest.java @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.schemas.io; + +import static org.junit.Assert.assertEquals; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import org.apache.avro.generic.GenericRecord; +import org.apache.avro.generic.GenericRecordBuilder; +import org.apache.beam.sdk.coders.AvroCoder; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.io.payloads.AvroPayloadSerializerProvider; +import org.apache.beam.sdk.schemas.utils.AvroUtils; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +@RunWith(JUnit4.class) +public class AvroPayloadSerializerProviderTest { + private static final Schema SCHEMA = + Schema.builder().addInt64Field("abc").addStringField("xyz").build(); + private static final org.apache.avro.Schema AVRO_SCHEMA = AvroUtils.toAvroSchema(SCHEMA); + private static final AvroCoder AVRO_CODER = AvroCoder.of(AVRO_SCHEMA); + private static final Row DESERIALIZED = + Row.withSchema(SCHEMA).withFieldValue("abc", 3L).withFieldValue("xyz", "qqq").build(); + private static final GenericRecord SERIALIZED = + new GenericRecordBuilder(AVRO_SCHEMA).set("abc", 3L).set("xyz", "qqq").build(); + + private final AvroPayloadSerializerProvider provider = new AvroPayloadSerializerProvider(); + + @Test + public void serialize() throws Exception { + byte[] bytes = provider.getSerializer(SCHEMA, ImmutableMap.of()).serialize(DESERIALIZED); + GenericRecord record = AVRO_CODER.decode(new ByteArrayInputStream(bytes)); + assertEquals(3L, record.get("abc")); + assertEquals("qqq", record.get("xyz").toString()); + } + + @Test + public void deserialize() throws Exception { + ByteArrayOutputStream os = new ByteArrayOutputStream(); + AVRO_CODER.encode(SERIALIZED, os); + Row row = provider.getSerializer(SCHEMA, ImmutableMap.of()).deserialize(os.toByteArray()); + assertEquals(DESERIALIZED, row); + } +} diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/io/DeadLetteredTransformTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/io/DeadLetteredTransformTest.java new file mode 100644 index 000000000000..600e5931268c --- /dev/null +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/io/DeadLetteredTransformTest.java @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.schemas.io; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertTrue; + +import java.io.ByteArrayInputStream; +import java.util.ArrayList; +import java.util.List; +import org.apache.beam.sdk.coders.VarLongCoder; +import org.apache.beam.sdk.testing.NeedsRunner; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.MapElements; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.SimpleFunction; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PDone; +import org.apache.beam.sdk.values.TypeDescriptor; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.junit.Rule; +import org.junit.Test; +import org.junit.experimental.categories.Category; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +@RunWith(JUnit4.class) +public class DeadLetteredTransformTest { + @Rule public final transient TestPipeline p = TestPipeline.create(); + + private static final String FAILURE_KEY = "KLJSDHFLKJDHF"; + + private static final List FAILURES = new ArrayList<>(); + + private static synchronized void capture(Failure val) { + FAILURES.add(val); + } + + private static synchronized List getFailures() { + return ImmutableList.copyOf(FAILURES); + } + + private static synchronized void resetFailures() { + FAILURES.clear(); + } + + @Test + @Category(NeedsRunner.class) + public void testDeadLettersOnlyFailures() throws Exception { + resetFailures(); + PCollection elements = p.apply(Create.of(10L, 20L).withCoder(VarLongCoder.of())); + PCollection results = + elements.apply( + new DeadLetteredTransform<>( + SimpleFunction.fromSerializableFunctionWithOutputType( + x -> { + if (x == 10L) { + throw new RuntimeException(FAILURE_KEY); + } + return x; + }, + TypeDescriptor.of(Long.class)), + new PTransform, PDone>() { + @Override + public PDone expand(PCollection input) { + input.apply( + MapElements.into(TypeDescriptor.of(Void.class)) + .via( + failure -> { + capture(failure); + return null; + })); + return PDone.in(input.getPipeline()); + } + })); + PAssert.that(results).containsInAnyOrder(20L); + p.run().waitUntilFinish(); + List failures = getFailures(); + assertEquals(1, failures.size()); + Failure failure = failures.iterator().next(); + assertEquals( + 10L, VarLongCoder.of().decode(new ByteArrayInputStream(failure.getPayload())).longValue()); + assertTrue(failure.getError().contains(FAILURE_KEY)); + } +} diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/io/GenericDlqTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/io/GenericDlqTest.java new file mode 100644 index 000000000000..4379101329dd --- /dev/null +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/io/GenericDlqTest.java @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.schemas.io; + +import static java.nio.charset.StandardCharsets.UTF_8; +import static org.hamcrest.MatcherAssert.assertThat; +import static org.junit.Assert.assertThrows; + +import org.apache.beam.sdk.testing.NeedsRunner; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.transforms.Create; +import org.hamcrest.CoreMatchers; +import org.junit.Rule; +import org.junit.Test; +import org.junit.experimental.categories.Category; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +@RunWith(JUnit4.class) +public class GenericDlqTest { + @Rule public final transient TestPipeline p = TestPipeline.create(); + + @Test + @Category(NeedsRunner.class) + public void testDlq() { + StoringDlqProvider.reset(); + Failure failure1 = Failure.newBuilder().setError("a").setPayload("b".getBytes(UTF_8)).build(); + Failure failure2 = Failure.newBuilder().setError("c").setPayload("d".getBytes(UTF_8)).build(); + p.apply(Create.of(failure1, failure2)) + .apply( + GenericDlq.getDlqTransform( + StoringDlqProvider.ID + ": " + StoringDlqProvider.CONFIG + " ")); + p.run().waitUntilFinish(); + assertThat(StoringDlqProvider.getFailures(), CoreMatchers.hasItems(failure1, failure2)); + } + + @Test + public void testParseFailures() { + assertThrows( + IllegalArgumentException.class, () -> GenericDlq.getDlqTransform("no colon present")); + assertThrows(IllegalArgumentException.class, () -> GenericDlq.getDlqTransform("bad_id:xxx")); + assertThrows( + IllegalArgumentException.class, + () -> GenericDlq.getDlqTransform(StoringDlqProvider.ID + ": not config")); + } +} diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/io/JsonPayloadSerializerProviderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/io/JsonPayloadSerializerProviderTest.java new file mode 100644 index 000000000000..51cecad38492 --- /dev/null +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/io/JsonPayloadSerializerProviderTest.java @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.schemas.io; + +import static org.junit.Assert.assertEquals; + +import com.fasterxml.jackson.databind.ObjectMapper; +import java.nio.charset.StandardCharsets; +import java.util.Map; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.io.payloads.JsonPayloadSerializerProvider; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +@RunWith(JUnit4.class) +public class JsonPayloadSerializerProviderTest { + private static final Schema SCHEMA = + Schema.builder().addInt64Field("abc").addStringField("xyz").build(); + private static final String SERIALIZED = "{ \"abc\": 3, \"xyz\": \"qqq\" }"; + private static final Row DESERIALIZED = + Row.withSchema(SCHEMA).withFieldValue("abc", 3L).withFieldValue("xyz", "qqq").build(); + + private final JsonPayloadSerializerProvider provider = new JsonPayloadSerializerProvider(); + + @Test + public void serialize() throws Exception { + byte[] bytes = provider.getSerializer(SCHEMA, ImmutableMap.of()).serialize(DESERIALIZED); + ObjectMapper mapper = new ObjectMapper(); + Map result = mapper.readValue(bytes, Map.class); + assertEquals(3, result.get("abc")); + assertEquals("qqq", result.get("xyz")); + } + + @Test + public void deserialize() { + Row row = + provider + .getSerializer(SCHEMA, ImmutableMap.of()) + .deserialize(SERIALIZED.getBytes(StandardCharsets.UTF_8)); + assertEquals(DESERIALIZED, row); + } +} diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/io/StoringDlqProvider.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/io/StoringDlqProvider.java new file mode 100644 index 000000000000..b7a689dc26d5 --- /dev/null +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/io/StoringDlqProvider.java @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.schemas.io; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import com.google.auto.service.AutoService; +import java.util.ArrayList; +import java.util.List; +import org.apache.beam.sdk.transforms.MapElements; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PDone; +import org.apache.beam.sdk.values.TypeDescriptor; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; + +@AutoService(GenericDlqProvider.class) +public class StoringDlqProvider implements GenericDlqProvider { + static final String ID = "storing_dlq_provider_testonly_do_not_use"; + static final String CONFIG = "storing_dlq_provider_required_config_value"; + private static final List FAILURES = new ArrayList<>(); + + @Override + public String identifier() { + return ID; + } + + public static synchronized void reset() { + FAILURES.clear(); + } + + private static synchronized void addValue(Failure failure) { + FAILURES.add(failure); + } + + public static synchronized List getFailures() { + return ImmutableList.copyOf(FAILURES); + } + + @Override + public PTransform, PDone> newDlqTransform(String config) { + checkArgument(config.equals(CONFIG)); + return new PTransform, PDone>() { + @Override + public PDone expand(PCollection input) { + input.apply( + MapElements.into(TypeDescriptor.of(Void.class)) + .via( + x -> { + addValue(x); + return null; + })); + return PDone.in(input.getPipeline()); + } + }; + } +} diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/logicaltypes/LogicalTypesTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/logicaltypes/LogicalTypesTest.java index 6b825339985e..f52f36057a3a 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/logicaltypes/LogicalTypesTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/logicaltypes/LogicalTypesTest.java @@ -31,9 +31,6 @@ import org.junit.Test; /** Unit tests for logical types. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class LogicalTypesTest { @Test public void testEnumeration() { diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/AddFieldsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/AddFieldsTest.java index e33c195b1515..6f9590d957f2 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/AddFieldsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/AddFieldsTest.java @@ -37,9 +37,6 @@ import org.junit.rules.ExpectedException; /** Tests for {@link AddFields}. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class AddFieldsTest { @Rule public final transient TestPipeline pipeline = TestPipeline.create(); @Rule public transient ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/CastValidatorTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/CastValidatorTest.java index 19456f936452..ecc0434b2999 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/CastValidatorTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/CastValidatorTest.java @@ -17,11 +17,11 @@ */ package org.apache.beam.sdk.schemas.transforms; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.empty; import static org.hamcrest.Matchers.not; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.math.BigDecimal; @@ -42,9 +42,6 @@ /** Tests for {@link Cast.Widening}, {@link Cast.Narrowing}. */ @Category(UsesSchema.class) @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class CastValidatorTest { public static final Map NUMERICS = diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/CoGroupTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/CoGroupTest.java index fdbf9afcc4ed..9102544235d3 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/CoGroupTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/CoGroupTest.java @@ -19,8 +19,8 @@ import static junit.framework.TestCase.assertEquals; import static org.hamcrest.CoreMatchers.allOf; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.collection.IsIterableContainingInAnyOrder.containsInAnyOrder; -import static org.junit.Assert.assertThat; import java.util.List; import java.util.Objects; @@ -54,7 +54,6 @@ @Category(UsesSchema.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class CoGroupTest { @Rule public final transient TestPipeline pipeline = TestPipeline.create(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/ConvertTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/ConvertTest.java index 2cdd01c41dfa..d64848ee9ce5 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/ConvertTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/ConvertTest.java @@ -20,10 +20,12 @@ import java.util.Arrays; import java.util.Map; import java.util.Objects; +import org.apache.avro.generic.GenericRecord; import org.apache.beam.sdk.schemas.JavaFieldSchema; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.schemas.Schema.FieldType; import org.apache.beam.sdk.schemas.annotations.DefaultSchema; +import org.apache.beam.sdk.schemas.utils.AvroUtils; import org.apache.beam.sdk.testing.NeedsRunner; import org.apache.beam.sdk.testing.PAssert; import org.apache.beam.sdk.testing.TestPipeline; @@ -45,9 +47,6 @@ /** Tests for the {@link Convert} class. 
*/ @RunWith(JUnit4.class) @Category(UsesSchema.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ConvertTest { @Rule public final transient TestPipeline pipeline = TestPipeline.create(); @@ -125,6 +124,7 @@ public int hashCode() { private static final Row EXPECTED_ROW1_NESTED = Row.withSchema(EXPECTED_SCHEMA1_NESTED).addValues("yard2", 43L).build(); + private static final Row EXPECTED_ROW1 = Row.withSchema(EXPECTED_SCHEMA1) .addValue("field1") @@ -134,6 +134,9 @@ public int hashCode() { .addValue(ImmutableMap.of("first", EXPECTED_ROW1_NESTED, "second", EXPECTED_ROW1_NESTED)) .build(); + private static final GenericRecord EXPECTED_GENERICRECORD1 = + AvroUtils.toGenericRecord(EXPECTED_ROW1, AvroUtils.toAvroSchema(EXPECTED_SCHEMA1)); + /** Test outer POJO. Different but equivalent schema. * */ @DefaultSchema(JavaFieldSchema.class) public static class POJO2 { @@ -245,4 +248,13 @@ public void testFromRowsUnboxingPrimitive() { PAssert.that(longs).containsInAnyOrder((Long) EXPECTED_ROW1.getValue("field2")); pipeline.run(); } + + @Test + @Category(NeedsRunner.class) + public void testToGenericRecords() { + PCollection records = + pipeline.apply(Create.of(new POJO1())).apply(Convert.to(GenericRecord.class)); + PAssert.that(records).containsInAnyOrder(EXPECTED_GENERICRECORD1); + pipeline.run(); + } } diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/FilterTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/FilterTest.java index fc29fa418cc4..9a5e2e48414f 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/FilterTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/FilterTest.java @@ -37,9 +37,6 @@ /** Test for {@link Filter}. 
* */ @RunWith(JUnit4.class) @Category(UsesSchema.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FilterTest { @Rule public final transient TestPipeline pipeline = TestPipeline.create(); @Rule public transient ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/GroupTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/GroupTest.java index 42308b184516..a0630317fd31 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/GroupTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/GroupTest.java @@ -20,8 +20,8 @@ import static junit.framework.TestCase.assertEquals; import static org.hamcrest.CoreMatchers.allOf; import static org.hamcrest.CoreMatchers.equalTo; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.collection.IsIterableContainingInAnyOrder.containsInAnyOrder; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import com.google.auto.value.AutoValue; @@ -70,7 +70,6 @@ @Category(UsesSchema.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class GroupTest implements Serializable { @Rule public final transient TestPipeline pipeline = TestPipeline.create(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/JoinTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/JoinTest.java index 7917ecb67b03..ef8daf1f0a99 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/JoinTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/JoinTest.java @@ -40,9 +40,6 @@ /** Tests for {@link org.apache.beam.sdk.schemas.transforms.Join}. 
*/ @RunWith(JUnit4.class) @Category(UsesSchema.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class JoinTest { @Rule public final transient TestPipeline pipeline = TestPipeline.create(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/RenameFieldsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/RenameFieldsTest.java index 963fff8f6e77..df2af4410810 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/RenameFieldsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/RenameFieldsTest.java @@ -19,8 +19,14 @@ import static junit.framework.TestCase.assertEquals; +import java.util.BitSet; import java.util.List; +import java.util.Map; +import java.util.UUID; +import java.util.stream.Collectors; +import org.apache.beam.sdk.schemas.FieldAccessDescriptor; import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.transforms.RenameFields.RenamePair; import org.apache.beam.sdk.testing.NeedsRunner; import org.apache.beam.sdk.testing.PAssert; import org.apache.beam.sdk.testing.TestPipeline; @@ -29,6 +35,7 @@ import org.apache.beam.sdk.values.Row; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; import org.junit.Rule; import org.junit.Test; import org.junit.experimental.categories.Category; @@ -258,4 +265,65 @@ public void renameNestedInMapFields() { PAssert.that(renamed).containsInAnyOrder(expectedRows); pipeline.run(); } + + @Test + public void testRenameRow() { + Schema nestedSchema = Schema.builder().addStringField("field1").addInt32Field("field2").build(); + Schema schema = + Schema.builder().addStringField("field1").addRowField("nested", nestedSchema).build(); + + Schema expectedNestedSchema = + Schema.builder().addStringField("bottom1").addInt32Field("bottom2").build(); + Schema expectedSchema = + Schema.builder() + .addStringField("top1") + .addRowField("top_nested", expectedNestedSchema) + .build(); + + List renames = + ImmutableList.of( + RenamePair.of(FieldAccessDescriptor.withFieldNames("field1"), "top1"), + RenamePair.of(FieldAccessDescriptor.withFieldNames("nested"), "top_nested"), + RenamePair.of(FieldAccessDescriptor.withFieldNames("nested.field1"), "bottom1"), + RenamePair.of(FieldAccessDescriptor.withFieldNames("nested.field2"), "bottom2")) + .stream() + .map(r -> r.resolve(schema)) + .collect(Collectors.toList()); + + final Map renamedSchemasMap = Maps.newHashMap(); + final Map nestedFieldRenamedMap = Maps.newHashMap(); + RenameFields.renameSchema(schema, renames, renamedSchemasMap, nestedFieldRenamedMap); + + assertEquals(expectedSchema, renamedSchemasMap.get(schema.getUUID())); + + Row row = + Row.withSchema(schema) + .withFieldValue("field1", "one") + .withFieldValue( + "nested", + Row.withSchema(nestedSchema) + .withFieldValue("field1", "one") + .withFieldValue("field2", 1) + .build()) + .build(); + Row expectedRow = + Row.withSchema(expectedSchema) + .withFieldValue("top1", "one") + .withFieldValue( + "top_nested", + Row.withSchema(expectedNestedSchema) + .withFieldValue("bottom1", "one") + .withFieldValue("bottom2", 1) + .build()) + .build(); + + Row renamedRow = + RenameFields.renameRow( + row, + renamedSchemasMap.get(schema.getUUID()), + 
nestedFieldRenamedMap.get(schema.getUUID()), + renamedSchemasMap, + nestedFieldRenamedMap); + assertEquals(expectedRow, renamedRow); + } } diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/SelectTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/SelectTest.java index 308038a5c985..dd7ac9741cdc 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/SelectTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/SelectTest.java @@ -20,12 +20,14 @@ import static org.junit.Assert.assertEquals; import com.google.auto.value.AutoValue; +import java.util.Arrays; import java.util.List; import java.util.Map; import java.util.stream.Collectors; import java.util.stream.IntStream; import org.apache.beam.sdk.schemas.AutoValueSchema; import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.FieldType; import org.apache.beam.sdk.schemas.annotations.DefaultSchema; import org.apache.beam.sdk.testing.NeedsRunner; import org.apache.beam.sdk.testing.PAssert; @@ -46,9 +48,6 @@ /** Test for {@link Select}. */ @RunWith(JUnit4.class) @Category(UsesSchema.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SelectTest { @Rule public final transient TestPipeline pipeline = TestPipeline.create(); @Rule public transient ExpectedException thrown = ExpectedException.none(); @@ -733,4 +732,163 @@ public void testFlattenWithOutputSchema() { assertEquals(CLASHING_NAME_UNNESTED_SCHEMA, unnested.getSchema()); pipeline.run(); } + + /** + * Test that {@link Select#flattenedSchema()} transform is able to flatten the nested fields of an + * array of {@link Row}. + * + *
<p>Example:
+   *
+   * <pre>
+   * Row[] transactions {
+   *   String[] banks,
+   *   Double purchaseAmount
+   * }
+   * ->
+   * String[][] transactions_banks,
+   * Double[] transactions_purchaseAmount
+   * </pre>
    + */ + @Test + @Category(NeedsRunner.class) + public void testFlatSchemaWithArrayNestedField() { + + Schema shippingAddressSchema = + Schema.builder().addStringField("streetAddress").addStringField("city").build(); + Schema transactionSchema = + Schema.builder() + .addArrayField("banks", FieldType.STRING) + .addDoubleField("purchaseAmount") + .build(); + Schema nestedSchema = + Schema.builder() + .addStringField("userId") + .addRowField("shippingAddress", shippingAddressSchema) + .addArrayField("transactions", Schema.FieldType.row(transactionSchema)) + .build(); + + String userId = "user"; + String street = "street"; + String city = "city"; + String bank1 = "bank1_1"; + String bank2 = "bank1_2"; + String bank3 = "bank2_1"; + String bank4 = "bank2_2"; + double purchaseAmount1 = 1.0; + double purchaseAmount2 = 2.0; + + Row transactionOne = + Row.withSchema(transactionSchema).addArray(bank1, bank2).addValue(purchaseAmount1).build(); + Row transactionTwo = + Row.withSchema(transactionSchema).addArray(bank3, bank4).addValue(purchaseAmount2).build(); + Row address = Row.withSchema(shippingAddressSchema).addValues(street, city).build(); + Row row = + Row.withSchema(nestedSchema) + .addValues(userId, address) + .addArray(transactionOne, transactionTwo) + .build(); + + PCollection unnested = + pipeline.apply(Create.of(row).withRowSchema(nestedSchema)).apply(Select.flattenedSchema()); + + Schema expectedUnnestedSchema = + Schema.builder() + .addStringField("userId") + .addStringField("shippingAddress_streetAddress") + .addStringField("shippingAddress_city") + .addArrayField("transactions_banks", FieldType.array(FieldType.STRING)) + .addArrayField("transactions_purchaseAmount", FieldType.DOUBLE) + .build(); + assertEquals(expectedUnnestedSchema, unnested.getSchema()); + + Row expectedUnnestedRow = + Row.withSchema(unnested.getSchema()) + .addValues(userId, street, city) + .addArray(Arrays.asList(bank1, bank2), Arrays.asList(bank3, bank4)) + .addArray(purchaseAmount1, purchaseAmount2) + .build(); + PAssert.that(unnested).containsInAnyOrder(expectedUnnestedRow); + + pipeline.run(); + } + + /** + * Test that {@link Select#flattenedSchema()} transform is able to flatten the nested fields of an + * 2D array of {@link Row}. + * + *
<p>Example:
+   *
+   * <pre>
+   * Row[] transactions {
+   *   Row[] banks {
+   *     String name
+   *     String address
+   *   },
+   *   Double purchaseAmount
+   * }
+   * ->
+   * String[][] transactions_banks_name,
+   * String[][] transactions_banks_address,
+   * Double[] transactions_purchaseAmount
+   * </pre>
    + */ + @Test + @Category(NeedsRunner.class) + public void testFlatSchemaWith2DArrayNestedField() { + + Schema banksSchema = Schema.builder().addStringField("name").addStringField("address").build(); + Schema transactionSchema = + Schema.builder() + .addArrayField("banks", Schema.FieldType.row(banksSchema)) + .addDoubleField("purchaseAmount") + .build(); + Schema nestedSchema = + Schema.builder() + .addArrayField("transactions", Schema.FieldType.row(transactionSchema)) + .build(); + + String bankName1 = "bank1_1"; + String bankName2 = "bank1_2"; + String bankName3 = "bank2_1"; + String bankName4 = "bank2_2"; + String bankAddress1 = "address1_1"; + String bankAddress2 = "address1_2"; + String bankAddress3 = "address2_1"; + String bankAddress4 = "address2_2"; + double purchaseAmount1 = 1.0; + double purchaseAmount2 = 2.0; + + Row bank1 = Row.withSchema(banksSchema).addValues(bankName1, bankAddress1).build(); + Row bank2 = Row.withSchema(banksSchema).addValues(bankName2, bankAddress2).build(); + Row bank3 = Row.withSchema(banksSchema).addValues(bankName3, bankAddress3).build(); + Row bank4 = Row.withSchema(banksSchema).addValues(bankName4, bankAddress4).build(); + Row transactionOne = + Row.withSchema(transactionSchema).addArray(bank1, bank2).addValue(purchaseAmount1).build(); + Row transactionTwo = + Row.withSchema(transactionSchema).addArray(bank3, bank4).addValue(purchaseAmount2).build(); + Row row = Row.withSchema(nestedSchema).addArray(transactionOne, transactionTwo).build(); + + PCollection unnested = + pipeline.apply(Create.of(row).withRowSchema(nestedSchema)).apply(Select.flattenedSchema()); + + Schema expectedUnnestedSchema = + Schema.builder() + .addArrayField("transactions_purchaseAmount", FieldType.DOUBLE) + .addArrayField("transactions_banks_name", FieldType.array(FieldType.STRING)) + .addArrayField("transactions_banks_address", FieldType.array(FieldType.STRING)) + .build(); + assertEquals(expectedUnnestedSchema, unnested.getSchema()); + + Row expectedUnnestedRow = + Row.withSchema(unnested.getSchema()) + .addArray(purchaseAmount1, purchaseAmount2) + .addArray(Arrays.asList(bankName1, bankName2), Arrays.asList(bankName3, bankName4)) + .addArray( + Arrays.asList(bankAddress1, bankAddress2), + Arrays.asList(bankAddress3, bankAddress4)) + .build(); + PAssert.that(unnested).containsInAnyOrder(expectedUnnestedRow); + + pipeline.run(); + } } diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/AvroGenerators.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/AvroGenerators.java index b209e4959961..cbeceab24c67 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/AvroGenerators.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/AvroGenerators.java @@ -34,9 +34,6 @@ import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ObjectArrays; /** QuickCheck generators for AVRO. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) class AvroGenerators { /** Generates arbitrary AVRO schemas. 
*/ diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/AvroUtilsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/AvroUtilsTest.java index 8b225a258b44..835acb3ef067 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/AvroUtilsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/AvroUtilsTest.java @@ -26,6 +26,7 @@ import com.pholser.junit.quickcheck.runner.JUnitQuickcheck; import java.math.BigDecimal; import java.nio.ByteBuffer; +import java.sql.JDBCType; import java.util.List; import java.util.Map; import org.apache.avro.Conversions; @@ -57,8 +58,13 @@ import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; +import org.checkerframework.checker.nullness.qual.NonNull; +import org.checkerframework.checker.nullness.qual.Nullable; import org.joda.time.DateTime; import org.joda.time.DateTimeZone; +import org.joda.time.Days; +import org.joda.time.Instant; +import org.joda.time.LocalTime; import org.junit.Test; import org.junit.runner.RunWith; @@ -66,7 +72,6 @@ @RunWith(JUnitQuickcheck.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class AvroUtilsTest { @@ -551,6 +556,164 @@ public void testUnionFieldInBeamSchema() { assertEquals(expectedGenericRecord, AvroUtils.toGenericRecord(row, avroSchema)); } + @Test + public void testJdbcLogicalVarCharRowDataToAvroSchema() { + String expectedAvroSchemaJson = + "{ " + + " \"name\": \"topLevelRecord\", " + + " \"type\": \"record\", " + + " \"fields\": [{ " + + " \"name\": \"my_varchar_field\", " + + " \"type\": {\"type\": \"string\", \"logicalType\": \"varchar\", \"maxLength\": 10}" + + " }, " + + " { " + + " \"name\": \"my_longvarchar_field\", " + + " \"type\": {\"type\": \"string\", \"logicalType\": \"varchar\", \"maxLength\": 50}" + + " }, " + + " { " + + " \"name\": \"my_nvarchar_field\", " + + " \"type\": {\"type\": \"string\", \"logicalType\": \"varchar\", \"maxLength\": 10}" + + " }, " + + " { " + + " \"name\": \"my_longnvarchar_field\", " + + " \"type\": {\"type\": \"string\", \"logicalType\": \"varchar\", \"maxLength\": 50}" + + " }, " + + " { " + + " \"name\": \"fixed_length_char_field\", " + + " \"type\": {\"type\": \"string\", \"logicalType\": \"char\", \"maxLength\": 25}" + + " } " + + " ] " + + "}"; + + Schema beamSchema = + Schema.builder() + .addField( + Field.of( + "my_varchar_field", FieldType.logicalType(JdbcType.StringType.varchar(10)))) + .addField( + Field.of( + "my_longvarchar_field", + FieldType.logicalType(JdbcType.StringType.longvarchar(50)))) + .addField( + Field.of( + "my_nvarchar_field", FieldType.logicalType(JdbcType.StringType.nvarchar(10)))) + .addField( + Field.of( + "my_longnvarchar_field", + FieldType.logicalType(JdbcType.StringType.longnvarchar(50)))) + .addField( + Field.of( + "fixed_length_char_field", + FieldType.logicalType(JdbcType.StringType.fixedLengthChar(25)))) + .build(); + + assertEquals( + new org.apache.avro.Schema.Parser().parse(expectedAvroSchemaJson), + AvroUtils.toAvroSchema(beamSchema)); + } + + @Test + public void testJdbcLogicalVarCharRowDataToGenericRecord() { + Schema beamSchema = + Schema.builder() + .addField( + Field.of( + "my_varchar_field", 
FieldType.logicalType(JdbcType.StringType.varchar(10)))) + .addField( + Field.of( + "my_longvarchar_field", + FieldType.logicalType(JdbcType.StringType.longvarchar(50)))) + .addField( + Field.of( + "my_nvarchar_field", FieldType.logicalType(JdbcType.StringType.nvarchar(10)))) + .addField( + Field.of( + "my_longnvarchar_field", + FieldType.logicalType(JdbcType.StringType.longnvarchar(50)))) + .build(); + + Row rowData = + Row.withSchema(beamSchema) + .addValue("varchar_value") + .addValue("longvarchar_value") + .addValue("nvarchar_value") + .addValue("longnvarchar_value") + .build(); + + org.apache.avro.Schema avroSchema = AvroUtils.toAvroSchema(beamSchema); + GenericRecord expectedRecord = + new GenericRecordBuilder(avroSchema) + .set("my_varchar_field", "varchar_value") + .set("my_longvarchar_field", "longvarchar_value") + .set("my_nvarchar_field", "nvarchar_value") + .set("my_longnvarchar_field", "longnvarchar_value") + .build(); + + assertEquals(expectedRecord, AvroUtils.toGenericRecord(rowData, avroSchema)); + } + + @Test + public void testJdbcLogicalDateAndTimeRowDataToAvroSchema() { + String expectedAvroSchemaJson = + "{ " + + " \"name\": \"topLevelRecord\", " + + " \"type\": \"record\", " + + " \"fields\": [{ " + + " \"name\": \"my_date_field\", " + + " \"type\": { \"type\": \"int\", \"logicalType\": \"date\" }" + + " }, " + + " { " + + " \"name\": \"my_time_field\", " + + " \"type\": { \"type\": \"int\", \"logicalType\": \"time-millis\" }" + + " }" + + " ] " + + "}"; + + Schema beamSchema = + Schema.builder() + .addField(Field.of("my_date_field", FieldType.logicalType(JdbcType.DATE))) + .addField(Field.of("my_time_field", FieldType.logicalType(JdbcType.TIME))) + .build(); + + assertEquals( + new org.apache.avro.Schema.Parser().parse(expectedAvroSchemaJson), + AvroUtils.toAvroSchema(beamSchema)); + } + + @Test + public void testJdbcLogicalDateAndTimeRowDataToGenericRecord() { + // Test Fixed clock at + DateTime testDateTime = DateTime.parse("2021-05-29T11:15:16.234Z"); + + Schema beamSchema = + Schema.builder() + .addField(Field.of("my_date_field", FieldType.logicalType(JdbcType.DATE))) + .addField(Field.of("my_time_field", FieldType.logicalType(JdbcType.TIME))) + .build(); + + Row rowData = + Row.withSchema(beamSchema) + .addValue(testDateTime.toLocalDate().toDateTime(LocalTime.MIDNIGHT).toInstant()) + .addValue(Instant.ofEpochMilli(testDateTime.toLocalTime().millisOfDay().get())) + .build(); + + int daysFromEpoch = + Days.daysBetween( + Instant.EPOCH, + testDateTime.toLocalDate().toDateTime(LocalTime.MIDNIGHT).toInstant()) + .getDays(); + int timeSinceMidNight = testDateTime.toLocalTime().getMillisOfDay(); + + org.apache.avro.Schema avroSchema = AvroUtils.toAvroSchema(beamSchema); + GenericRecord expectedRecord = + new GenericRecordBuilder(avroSchema) + .set("my_date_field", daysFromEpoch) + .set("my_time_field", timeSinceMidNight) + .build(); + + assertEquals(expectedRecord, AvroUtils.toGenericRecord(rowData, avroSchema)); + } + @Test public void testBeamRowToGenericRecord() { GenericRecord genericRecord = AvroUtils.toGenericRecord(getBeamRow(), null); @@ -558,6 +721,13 @@ public void testBeamRowToGenericRecord() { assertEquals(getGenericRecord(), genericRecord); } + @Test + public void testBeamRowToGenericRecordInferSchema() { + GenericRecord genericRecord = AvroUtils.toGenericRecord(getBeamRow()); + assertEquals(getAvroSchema(), genericRecord.getSchema()); + assertEquals(getGenericRecord(), genericRecord); + } + @Test public void testRowToGenericRecordFunction() { 
SerializableUtils.ensureSerializable(AvroUtils.getRowToGenericRecordFunction(NULL_SCHEMA)); @@ -634,4 +804,91 @@ public void testAvroBytesToRowAndRowToAvroBytesFunctions() { assertEquals(row, deserializedRow); } + + @Test + public void testNullSchemas() { + assertEquals( + AvroUtils.getFromRowFunction(GenericRecord.class), + AvroUtils.getFromRowFunction(GenericRecord.class)); + } + + /** Helper class that simulate JDBC Logical types. */ + private static class JdbcType implements Schema.LogicalType { + + private static final JdbcType DATE = + new JdbcType<>(JDBCType.DATE, FieldType.STRING, FieldType.DATETIME, ""); + private static final JdbcType TIME = + new JdbcType<>(JDBCType.TIME, FieldType.STRING, FieldType.DATETIME, ""); + + private final String identifier; + private final FieldType argumentType; + private final FieldType baseType; + private final Object argument; + + private static class StringType extends JdbcType { + + private static StringType fixedLengthChar(int size) { + return new StringType(JDBCType.CHAR, size); + } + + private static StringType varchar(int size) { + return new StringType(JDBCType.VARCHAR, size); + } + + private static StringType longvarchar(int size) { + return new StringType(JDBCType.LONGVARCHAR, size); + } + + private static StringType nvarchar(int size) { + return new StringType(JDBCType.NVARCHAR, size); + } + + private static StringType longnvarchar(int size) { + return new StringType(JDBCType.LONGNVARCHAR, size); + } + + private StringType(JDBCType type, int size) { + super(type, FieldType.INT32, FieldType.STRING, size); + } + } + + private JdbcType( + JDBCType jdbcType, FieldType argumentType, FieldType baseType, Object argument) { + this.identifier = jdbcType.getName(); + this.argumentType = argumentType; + this.baseType = baseType; + this.argument = argument; + } + + @Override + public String getIdentifier() { + return identifier; + } + + @Override + public @Nullable FieldType getArgumentType() { + return argumentType; + } + + @Override + public FieldType getBaseType() { + return baseType; + } + + @Override + @SuppressWarnings("TypeParameterUnusedInFormals") + public @Nullable T1 getArgument() { + return (T1) argument; + } + + @Override + public @NonNull T toBaseType(@NonNull T input) { + return input; + } + + @Override + public @NonNull T toInputType(@NonNull T base) { + return base; + } + } } diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/JavaBeanUtilsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/JavaBeanUtilsTest.java index 51b221c24a6b..0413f3650d30 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/JavaBeanUtilsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/JavaBeanUtilsTest.java @@ -33,7 +33,7 @@ import java.math.BigDecimal; import java.nio.ByteBuffer; -import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; import java.util.List; import org.apache.beam.sdk.schemas.FieldValueGetter; import org.apache.beam.sdk.schemas.FieldValueSetter; @@ -58,7 +58,6 @@ /** Tests for the {@link JavaBeanUtils} class. 
*/ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class JavaBeanUtilsTest { @Test @@ -131,8 +130,8 @@ public void testGeneratedSimpleGetters() { simpleBean.setaBoolean(true); simpleBean.setDateTime(DateTime.parse("1979-03-14")); simpleBean.setInstant(DateTime.parse("1979-03-15").toInstant()); - simpleBean.setBytes("bytes1".getBytes(Charset.defaultCharset())); - simpleBean.setByteBuffer(ByteBuffer.wrap("bytes2".getBytes(Charset.defaultCharset()))); + simpleBean.setBytes("bytes1".getBytes(StandardCharsets.UTF_8)); + simpleBean.setByteBuffer(ByteBuffer.wrap("bytes2".getBytes(StandardCharsets.UTF_8))); simpleBean.setBigDecimal(new BigDecimal(42)); simpleBean.setStringBuilder(new StringBuilder("stringBuilder")); @@ -155,11 +154,11 @@ public void testGeneratedSimpleGetters() { assertEquals(DateTime.parse("1979-03-15").toInstant(), getters.get(7).get(simpleBean)); assertArrayEquals( "Unexpected bytes", - "bytes1".getBytes(Charset.defaultCharset()), + "bytes1".getBytes(StandardCharsets.UTF_8), (byte[]) getters.get(8).get(simpleBean)); assertArrayEquals( "Unexpected bytes", - "bytes2".getBytes(Charset.defaultCharset()), + "bytes2".getBytes(StandardCharsets.UTF_8), (byte[]) getters.get(9).get(simpleBean)); assertEquals(new BigDecimal(42), getters.get(10).get(simpleBean)); assertEquals("stringBuilder", getters.get(11).get(simpleBean).toString()); @@ -184,8 +183,8 @@ public void testGeneratedSimpleSetters() { setters.get(5).set(simpleBean, true); setters.get(6).set(simpleBean, DateTime.parse("1979-03-14").toInstant()); setters.get(7).set(simpleBean, DateTime.parse("1979-03-15").toInstant()); - setters.get(8).set(simpleBean, "bytes1".getBytes(Charset.defaultCharset())); - setters.get(9).set(simpleBean, "bytes2".getBytes(Charset.defaultCharset())); + setters.get(8).set(simpleBean, "bytes1".getBytes(StandardCharsets.UTF_8)); + setters.get(9).set(simpleBean, "bytes2".getBytes(StandardCharsets.UTF_8)); setters.get(10).set(simpleBean, new BigDecimal(42)); setters.get(11).set(simpleBean, "stringBuilder"); @@ -198,9 +197,9 @@ public void testGeneratedSimpleSetters() { assertEquals(DateTime.parse("1979-03-14"), simpleBean.getDateTime()); assertEquals(DateTime.parse("1979-03-15").toInstant(), simpleBean.getInstant()); assertArrayEquals( - "Unexpected bytes", "bytes1".getBytes(Charset.defaultCharset()), simpleBean.getBytes()); + "Unexpected bytes", "bytes1".getBytes(StandardCharsets.UTF_8), simpleBean.getBytes()); assertEquals( - ByteBuffer.wrap("bytes2".getBytes(Charset.defaultCharset())), simpleBean.getByteBuffer()); + ByteBuffer.wrap("bytes2".getBytes(StandardCharsets.UTF_8)), simpleBean.getByteBuffer()); assertEquals(new BigDecimal(42), simpleBean.getBigDecimal()); assertEquals("stringBuilder", simpleBean.getStringBuilder().toString()); } @@ -259,10 +258,10 @@ public void testGeneratedByteBufferSetters() { BEAN_WITH_BYTE_ARRAY_SCHEMA, new SetterTypeSupplier(), new DefaultTypeConversionsFactory()); - setters.get(0).set(bean, "field1".getBytes(Charset.defaultCharset())); - setters.get(1).set(bean, "field2".getBytes(Charset.defaultCharset())); + setters.get(0).set(bean, "field1".getBytes(StandardCharsets.UTF_8)); + setters.get(1).set(bean, "field2".getBytes(StandardCharsets.UTF_8)); - assertArrayEquals("not equal", "field1".getBytes(Charset.defaultCharset()), bean.getBytes1()); - assertEquals(ByteBuffer.wrap("field2".getBytes(Charset.defaultCharset())), bean.getBytes2()); + 
assertArrayEquals("not equal", "field1".getBytes(StandardCharsets.UTF_8), bean.getBytes1()); + assertEquals(ByteBuffer.wrap("field2".getBytes(StandardCharsets.UTF_8)), bean.getBytes2()); } } diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/POJOUtilsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/POJOUtilsTest.java index ff05402f15b1..cff25bff1f27 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/POJOUtilsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/POJOUtilsTest.java @@ -33,7 +33,7 @@ import java.math.BigDecimal; import java.nio.ByteBuffer; -import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; import java.util.List; import org.apache.beam.sdk.schemas.FieldValueGetter; import org.apache.beam.sdk.schemas.FieldValueSetter; @@ -57,14 +57,13 @@ /** Tests for the {@link POJOUtils} class. */ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class POJOUtilsTest { static final DateTime DATE = DateTime.parse("1979-03-14"); static final Instant INSTANT = DateTime.parse("1979-03-15").toInstant(); - static final byte[] BYTE_ARRAY = "byteArray".getBytes(Charset.defaultCharset()); + static final byte[] BYTE_ARRAY = "byteArray".getBytes(StandardCharsets.UTF_8); static final ByteBuffer BYTE_BUFFER = - ByteBuffer.wrap("byteBuffer".getBytes(Charset.defaultCharset())); + ByteBuffer.wrap("byteBuffer".getBytes(StandardCharsets.UTF_8)); @Test public void testNullables() { diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/SchemaTestUtils.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/SchemaTestUtils.java index 42673f7ccbed..494c47ee259c 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/SchemaTestUtils.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/SchemaTestUtils.java @@ -39,9 +39,6 @@ import org.hamcrest.Matcher; /** Utilities for testing schemas. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SchemaTestUtils { // Assert that two schemas are equivalent, ignoring field order. This tests that both schemas // (recursively) contain the same fields with the same names, but possibly different orders. 
diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/SchemaZipFoldTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/SchemaZipFoldTest.java index 35bb57af4852..8228acd5d1b3 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/SchemaZipFoldTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/SchemaZipFoldTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.sdk.schemas.utils; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import java.util.Collections; import java.util.List; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/SelectHelpersTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/SelectHelpersTest.java index 630fd297ce77..baff5375d881 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/SelectHelpersTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/SelectHelpersTest.java @@ -37,9 +37,6 @@ /** Tests for {@link SelectHelpers}. */ @RunWith(Parameterized.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SelectHelpersTest { @Parameterized.Parameter public boolean useOptimizedSelect; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/TestJavaBeans.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/TestJavaBeans.java index 091187308134..e1d76f1a2841 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/TestJavaBeans.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/TestJavaBeans.java @@ -37,9 +37,6 @@ import org.joda.time.Instant; /** Various Java Beans and associated schemas used in tests. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TestJavaBeans { /** A Bean containing one nullable and one non-nullable type. */ @DefaultSchema(JavaBeanSchema.class) @@ -1330,4 +1327,21 @@ public int hashCode() { .addInt32Field("age_in_years") .addBooleanField("KnowsJavascript") .build(); + + @DefaultSchema(JavaBeanSchema.class) + public static class ParameterNullableBean { + + @org.apache.avro.reflect.Nullable private Float value; + + public @org.apache.avro.reflect.Nullable Float getValue() { + return value; + } + + public void setValue(@org.apache.avro.reflect.Nullable Float value) { + this.value = value; + } + } + + public static final Schema PARAMETER_NULLABLE_BEAN_SCHEMA = + Schema.builder().addNullableField("value", FieldType.INT64).build(); } diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/TestPOJOs.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/TestPOJOs.java index 0eb1384cd639..058bd7d82a91 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/TestPOJOs.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/TestPOJOs.java @@ -38,9 +38,6 @@ import org.joda.time.Instant; /** Various Java POJOs and associated schemas used in tests. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TestPOJOs { /** A POJO containing one nullable and one non-nullable type. 
*/ @DefaultSchema(JavaFieldSchema.class) diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/state/StateContextsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/state/StateContextsTest.java index 964f81a4a50d..1b7112c40da2 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/state/StateContextsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/state/StateContextsTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.state; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import org.apache.beam.sdk.Pipeline; import org.apache.beam.sdk.transforms.Create; @@ -37,9 +37,6 @@ /** Tests for {@link StateContexts}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class StateContextsTest { @Rule public ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/CoderPropertiesTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/CoderPropertiesTest.java index 98021b695bb8..a20482eea9b4 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/CoderPropertiesTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/CoderPropertiesTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.testing; +import static org.hamcrest.MatcherAssert.assertThat; import static org.junit.Assert.assertNotNull; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import java.io.IOException; @@ -40,9 +40,6 @@ /** Unit tests for {@link CoderProperties}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class CoderPropertiesTest { @Rule public ExpectedException expectedException = ExpectedException.none(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/CombineFnTesterTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/CombineFnTesterTest.java index 3c054f5b298c..f76b3e1e1890 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/CombineFnTesterTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/CombineFnTesterTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.testing; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.is; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import java.util.ArrayList; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/ExpectedLogs.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/ExpectedLogs.java index 28c9483012d3..ad976531620a 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/ExpectedLogs.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/ExpectedLogs.java @@ -40,9 +40,6 @@ * certain log levels. For logs generated via the SLF4J logging frontend, the JUL backend must be * used. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ExpectedLogs extends ExternalResource { /** * Returns a {@link TestRule} that captures logs for the given logger name. 
diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/ExpectedLogsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/ExpectedLogsTest.java index adc954ddd732..896ca6958ec0 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/ExpectedLogsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/ExpectedLogsTest.java @@ -41,9 +41,6 @@ /** Tests for {@link ExpectedLogs}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ExpectedLogsTest { private static final Logger LOG = LoggerFactory.getLogger(ExpectedLogsTest.class); private Random random = new Random(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/FileChecksumMatcherTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/FileChecksumMatcherTest.java index 59d3276e0b5f..b44036d9f4a5 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/FileChecksumMatcherTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/FileChecksumMatcherTest.java @@ -38,9 +38,6 @@ /** Tests for {@link FileChecksumMatcher}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FileChecksumMatcherTest { @Rule public TemporaryFolder tmpFolder = new TemporaryFolder(); @Rule public ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/InterceptingUrlClassLoader.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/InterceptingUrlClassLoader.java index 14e1c23136e3..39e11ef80163 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/InterceptingUrlClassLoader.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/InterceptingUrlClassLoader.java @@ -28,9 +28,6 @@ * original classes definition and is useful for testing code which needs to validate usage with * multiple classloaders.. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class InterceptingUrlClassLoader extends ClassLoader { private final Predicate<String> test; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/PAssertTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/PAssertTest.java index 31e3bfa55f04..f3c43a2a7723 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/PAssertTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/PAssertTest.java @@ -17,12 +17,12 @@ */ package org.apache.beam.sdk.testing; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsString; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.is; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.junit.Assert.fail; @@ -56,8 +56,10 @@ import org.apache.beam.sdk.util.common.ElementByteSizeObserver; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionList; import org.apache.beam.sdk.values.TimestampedValue; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Throwables; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.checkerframework.checker.nullness.qual.Nullable; import org.joda.time.Duration; @@ -73,7 +75,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class PAssertTest implements Serializable { @@ -597,4 +598,146 @@ public void countAssertsMultipleCallsIndependent() { assertThat(PAssert.countAsserts(pipeline), equalTo(3)); } + + @Test + @Category(ValidatesRunner.class) + public void testPAssertThatFlattened() { + PCollection<Integer> firstCollection = pipeline.apply("FirstCreate", Create.of(1, 2, 3)); + PCollection<Integer> secondCollection = pipeline.apply("SecondCreate", Create.of(4, 5, 6)); + + PCollectionList<Integer> collectionList = + PCollectionList.of(firstCollection).and(secondCollection); + + PAssert.thatFlattened(collectionList).containsInAnyOrder(1, 2, 3, 4, 5, 6); + + pipeline.run(); + } + + /** Test that we throw an error for false assertion on flattened.
*/ + @Test + @Category({ValidatesRunner.class, UsesFailureMessage.class}) + public void testPAssertThatFlattenedFalse() throws Exception { + PCollection<Integer> firstCollection = pipeline.apply("FirstCreate", Create.of(1, 2, 3)); + PCollection<Integer> secondCollection = pipeline.apply("SecondCreate", Create.of(4, 5, 6)); + + PCollectionList<Integer> collectionList = + PCollectionList.of(firstCollection).and(secondCollection); + + PAssert.thatFlattened(collectionList).containsInAnyOrder(7); + Throwable thrown = runExpectingAssertionFailure(pipeline); + + String message = thrown.getMessage(); + + assertThat(message, containsString("Expected: iterable with items [<7>] in any order")); + } + + @Test + @Category(ValidatesRunner.class) + public void testPAssertThatListSatisfiesOneMatcher() { + PCollection<Integer> firstCollection = pipeline.apply("FirstCreate", Create.of(1, 2, 3)); + PCollection<Integer> secondCollection = pipeline.apply("SecondCreate", Create.of(4, 5, 6)); + + PCollectionList<Integer> collectionList = + PCollectionList.of(firstCollection).and(secondCollection); + + PAssert.thatList(collectionList) + .satisfies( + input -> { + for (Integer element : input) { + assertTrue(element > 0); + } + return null; + }); + + pipeline.run(); + } + + /** Test that we throw an error for false assertion on list with one matcher. */ + @Test + @Category({ValidatesRunner.class, UsesFailureMessage.class}) + public void testPAssertThatListSatisfiesOneMatcherFalse() { + PCollection<Integer> firstCollection = pipeline.apply("FirstCreate", Create.of(1, 2, 3)); + PCollection<Integer> secondCollection = pipeline.apply("SecondCreate", Create.of(4, 5, 6)); + + PCollectionList<Integer> collectionList = + PCollectionList.of(firstCollection).and(secondCollection); + + String expectedAssertionFailMessage = "Elements should be less than 0"; + + PAssert.thatList(collectionList) + .satisfies( + input -> { + for (Integer element : input) { + assertTrue(expectedAssertionFailMessage, element < 0); + } + return null; + }); + + Throwable thrown = runExpectingAssertionFailure(pipeline); + String stackTrace = Throwables.getStackTraceAsString(thrown); + + assertThat(stackTrace, containsString(expectedAssertionFailMessage)); + } + + @Test + @Category(ValidatesRunner.class) + public void testPAssertThatListSatisfiesMultipleMatchers() { + PCollection<Integer> firstCollection = pipeline.apply("FirstCreate", Create.of(1, 2, 3)); + PCollection<Integer> secondCollection = pipeline.apply("SecondCreate", Create.of(4, 5, 6)); + + PCollectionList<Integer> collectionList = + PCollectionList.of(firstCollection).and(secondCollection); + + PAssert.thatList(collectionList) + .satisfies( + ImmutableList.of( + input -> { + for (Integer element : input) { + assertTrue(element < 4); + } + return null; + }, + input -> { + for (Integer element : input) { + assertTrue(element < 7); + } + return null; + })); + + pipeline.run(); + } + + /** Test that we throw an error for false assertion on list with multiple matchers.
*/ + @Test + @Category({ValidatesRunner.class, UsesFailureMessage.class}) + public void testPAssertThatListSatisfiesMultipleMatchersFalse() { + PCollection firstCollection = pipeline.apply("FirstCreate", Create.of(1, 2, 3)); + PCollection secondCollection = pipeline.apply("SecondCreate", Create.of(4, 5, 6)); + + PCollectionList collectionList = + PCollectionList.of(firstCollection).and(secondCollection); + + String expectedAssertionFailMessage = "Elements should be less than 0"; + + PAssert.thatList(collectionList) + .satisfies( + ImmutableList.of( + input -> { + for (Integer element : input) { + assertTrue(expectedAssertionFailMessage, element < 0); + } + return null; + }, + input -> { + for (Integer element : input) { + assertTrue(expectedAssertionFailMessage, element < 0); + } + return null; + })); + + Throwable thrown = runExpectingAssertionFailure(pipeline); + String stackTrace = Throwables.getStackTraceAsString(thrown); + + assertThat(stackTrace, containsString(expectedAssertionFailMessage)); + } } diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/PCollectionViewTesting.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/PCollectionViewTesting.java index b340d91f04e7..61b4bf8c8e9e 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/PCollectionViewTesting.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/PCollectionViewTesting.java @@ -17,8 +17,6 @@ */ package org.apache.beam.sdk.testing; -import static org.apache.beam.sdk.options.ExperimentalOptions.hasExperiment; - import java.util.ArrayList; import java.util.Collections; import java.util.List; @@ -42,82 +40,42 @@ public static List materializeValuesFor( // materializations will differ but test code should not worry about what these look like if // they are relying on the ViewFn to "undo" the conversion. - // TODO(BEAM-10097): Make this the default case once all portable runners can support - // the iterable access pattern. - if (hasExperiment(options, "beam_fn_api") - && (hasExperiment(options, "use_runner_v2") - || hasExperiment(options, "use_unified_worker"))) { - if (View.AsSingleton.class.equals(viewTransformClass.getClass())) { - for (Object value : values) { - rval.add(value); - } - } else if (View.AsIterable.class.equals(viewTransformClass.getClass())) { - for (Object value : values) { - rval.add(value); - } - } else if (View.AsList.class.equals(viewTransformClass.getClass())) { - if (values.length > 0) { - rval.add( - KV.of( - Long.MIN_VALUE, - ValueOrMetadata.createMetadata(new OffsetRange(0, values.length)))); - for (int i = 0; i < values.length; ++i) { - rval.add(KV.of((long) i, ValueOrMetadata.create(values[i]))); - } - } - } else if (View.AsMap.class.equals(viewTransformClass.getClass())) { - for (Object value : values) { - rval.add(value); - } - } else if (View.AsMultimap.class.equals(viewTransformClass.getClass())) { - for (Object value : values) { - rval.add(value); - } - } else { - throw new IllegalArgumentException( - String.format( - "Unknown type of view %s. 
Supported views are %s.", - viewTransformClass.getClass(), - ImmutableSet.of( - View.AsSingleton.class, - View.AsIterable.class, - View.AsList.class, - View.AsMap.class, - View.AsMultimap.class))); + if (View.AsSingleton.class.equals(viewTransformClass.getClass())) { + for (Object value : values) { + rval.add(value); } - } else { - if (View.AsSingleton.class.equals(viewTransformClass.getClass())) { - for (Object value : values) { - rval.add(KV.of(null, value)); - } - } else if (View.AsIterable.class.equals(viewTransformClass.getClass())) { - for (Object value : values) { - rval.add(KV.of(null, value)); - } - } else if (View.AsList.class.equals(viewTransformClass.getClass())) { - for (Object value : values) { - rval.add(KV.of(null, value)); - } - } else if (View.AsMap.class.equals(viewTransformClass.getClass())) { - for (Object value : values) { - rval.add(KV.of(null, value)); - } - } else if (View.AsMultimap.class.equals(viewTransformClass.getClass())) { - for (Object value : values) { - rval.add(KV.of(null, value)); + } else if (View.AsIterable.class.equals(viewTransformClass.getClass())) { + for (Object value : values) { + rval.add(value); + } + } else if (View.AsList.class.equals(viewTransformClass.getClass())) { + if (values.length > 0) { + rval.add( + KV.of( + Long.MIN_VALUE, ValueOrMetadata.createMetadata(new OffsetRange(0, values.length)))); + for (int i = 0; i < values.length; ++i) { + rval.add(KV.of((long) i, ValueOrMetadata.create(values[i]))); } - } else { - throw new IllegalArgumentException( - String.format( - "Unknown type of view %s. Supported views are %s.", - viewTransformClass.getClass(), - ImmutableSet.of( - View.AsSingleton.class, - View.AsIterable.class, - View.AsList.class, - View.AsMap.class, - View.AsMultimap.class))); } + } else if (View.AsMap.class.equals(viewTransformClass.getClass())) { + for (Object value : values) { + rval.add(value); + } + } else if (View.AsMultimap.class.equals(viewTransformClass.getClass())) { + for (Object value : values) { + rval.add(value); + } + } else { + throw new IllegalArgumentException( + String.format( + "Unknown type of view %s. 
Supported views are %s.", + viewTransformClass.getClass(), + ImmutableSet.of( + View.AsSingleton.class, + View.AsIterable.class, + View.AsList.class, + View.AsMap.class, + View.AsMultimap.class))); } return Collections.unmodifiableList(rval); } diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/PaneExtractorsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/PaneExtractorsTest.java index aa86f29aa554..d922710e3ae4 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/PaneExtractorsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/PaneExtractorsTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.sdk.testing; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.emptyIterable; -import static org.junit.Assert.assertThat; import org.apache.beam.sdk.transforms.SerializableFunction; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/RestoreSystemProperties.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/RestoreSystemProperties.java index 728084105de8..91903610b44a 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/RestoreSystemProperties.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/RestoreSystemProperties.java @@ -24,9 +24,6 @@ import org.junit.rules.TestRule; /** Saves and restores the current system properties for tests. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class RestoreSystemProperties extends ExternalResource implements TestRule { private byte[] originalProperties; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/RestoreSystemPropertiesTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/RestoreSystemPropertiesTest.java index 52ebf9a89579..1a365864af31 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/RestoreSystemPropertiesTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/RestoreSystemPropertiesTest.java @@ -28,9 +28,6 @@ /** Tests for {@link RestoreSystemProperties}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class RestoreSystemPropertiesTest { @Rule public TestRule restoreSystemProperties = new RestoreSystemProperties(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/SerializableMatchersTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/SerializableMatchersTest.java index 8aa5571f598e..7f720d63abd0 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/SerializableMatchersTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/SerializableMatchersTest.java @@ -22,7 +22,7 @@ import static org.apache.beam.sdk.testing.SerializableMatchers.containsInAnyOrder; import static org.apache.beam.sdk.testing.SerializableMatchers.kvWithKey; import static org.apache.beam.sdk.testing.SerializableMatchers.not; -import static org.junit.Assert.assertThat; +import static org.hamcrest.MatcherAssert.assertThat; import java.io.InputStream; import java.io.OutputStream; @@ -49,9 +49,6 @@ * boilerplate that is identical to each is considered thoroughly tested. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SerializableMatchersTest implements Serializable { @Rule public transient ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/SourceTestUtilsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/SourceTestUtilsTest.java index bb2170165878..81c002b5fc46 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/SourceTestUtilsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/SourceTestUtilsTest.java @@ -34,9 +34,6 @@ /** Tests for {@link SourceTestUtils}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SourceTestUtilsTest { @Test diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/StaticWindowsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/StaticWindowsTest.java index c5d47f6b24ce..0189270342ff 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/StaticWindowsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/StaticWindowsTest.java @@ -17,7 +17,7 @@ */ package org.apache.beam.sdk.testing; -import static org.junit.Assert.assertThat; +import static org.hamcrest.MatcherAssert.assertThat; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.transforms.windowing.GlobalWindow; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/TestPipelineTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/TestPipelineTest.java index c32a1dc1f62e..9f26441bcc42 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/TestPipelineTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/TestPipelineTest.java @@ -17,11 +17,11 @@ */ package org.apache.beam.sdk.testing; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.startsWith; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertNotNull; -import static org.junit.Assert.assertThat; import com.fasterxml.jackson.databind.ObjectMapper; import java.io.Serializable; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/TestStreamTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/TestStreamTest.java index 4c0e2b41022d..87f7eff0d0f6 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/TestStreamTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/TestStreamTest.java @@ -78,38 +78,35 @@ /** Tests for {@link TestStream}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TestStreamTest implements Serializable { @Rule public transient TestPipeline p = TestPipeline.create(); @Rule public transient ExpectedException thrown = ExpectedException.none(); @Test - @Category({NeedsRunner.class, UsesTestStream.class}) + @Category({ValidatesRunner.class, UsesTestStream.class}) public void testLateDataAccumulating() { Instant instant = new Instant(0); - TestStream source = - TestStream.create(VarIntCoder.of()) + TestStream source = + TestStream.create(VarLongCoder.of()) .addElements( - TimestampedValue.of(1, instant), - TimestampedValue.of(2, instant), - TimestampedValue.of(3, instant)) + TimestampedValue.of(1L, instant), + TimestampedValue.of(2L, instant), + TimestampedValue.of(3L, instant)) .advanceWatermarkTo(instant.plus(Duration.standardMinutes(6))) // These elements are late but within the allowed lateness - .addElements(TimestampedValue.of(4, instant), TimestampedValue.of(5, instant)) + .addElements(TimestampedValue.of(4L, instant), TimestampedValue.of(5L, instant)) .advanceWatermarkTo(instant.plus(Duration.standardMinutes(20))) // These elements are droppably late .addElements( - TimestampedValue.of(-1, instant), - TimestampedValue.of(-2, instant), - TimestampedValue.of(-3, instant)) + TimestampedValue.of(-1L, instant), + TimestampedValue.of(-2L, instant), + TimestampedValue.of(-3L, instant)) .advanceWatermarkToInfinity(); - PCollection windowed = + PCollection windowed = p.apply(source) .apply( - Window.into(FixedWindows.of(Duration.standardMinutes(5))) + Window.into(FixedWindows.of(Duration.standardMinutes(5))) .triggering( AfterWatermark.pastEndOfWindow() .withEarlyFirings( @@ -118,19 +115,19 @@ public void testLateDataAccumulating() { .withLateFirings(AfterPane.elementCountAtLeast(1))) .accumulatingFiredPanes() .withAllowedLateness(Duration.standardMinutes(5), ClosingBehavior.FIRE_ALWAYS)); - PCollection triggered = + PCollection triggered = windowed .apply(WithKeys.of(1)) .apply(GroupByKey.create()) .apply(Values.create()) .apply(Flatten.iterables()); PCollection count = - windowed.apply(Combine.globally(Count.combineFn()).withoutDefaults()); - PCollection sum = windowed.apply(Sum.integersGlobally().withoutDefaults()); + windowed.apply(Combine.globally(Count.combineFn()).withoutDefaults()); + PCollection sum = windowed.apply(Sum.longsGlobally().withoutDefaults()); IntervalWindow window = new IntervalWindow(instant, instant.plus(Duration.standardMinutes(5L))); - PAssert.that(triggered).inFinalPane(window).containsInAnyOrder(1, 2, 3, 4, 5); - PAssert.that(triggered).inOnTimePane(window).containsInAnyOrder(1, 2, 3); + PAssert.that(triggered).inFinalPane(window).containsInAnyOrder(1L, 2L, 3L, 4L, 5L); + PAssert.that(triggered).inOnTimePane(window).containsInAnyOrder(1L, 2L, 3L); PAssert.that(count) .inWindow(window) .satisfies( @@ -144,8 +141,8 @@ public void testLateDataAccumulating() { .inWindow(window) .satisfies( input -> { - for (Integer sum1 : input) { - assertThat(sum1, allOf(greaterThanOrEqualTo(6), lessThanOrEqualTo(15))); + for (Long sum1 : input) { + assertThat(sum1, allOf(greaterThanOrEqualTo(6L), lessThanOrEqualTo(15L))); } return null; }); @@ -154,7 +151,7 @@ public void testLateDataAccumulating() { } @Test - @Category({NeedsRunner.class, UsesTestStreamWithProcessingTime.class}) + @Category({ValidatesRunner.class, UsesTestStreamWithProcessingTime.class}) public void testProcessingTimeTrigger() { TestStream source 
= TestStream.create(VarLongCoder.of()) @@ -185,7 +182,7 @@ public void testProcessingTimeTrigger() { } @Test - @Category({NeedsRunner.class, UsesTestStream.class}) + @Category({ValidatesRunner.class, UsesTestStream.class}) public void testDiscardingMode() { TestStream stream = TestStream.create(StringUtf8Coder.of()) @@ -234,7 +231,7 @@ public void testDiscardingMode() { } @Test - @Category({NeedsRunner.class, UsesTestStream.class}) + @Category({ValidatesRunner.class, UsesTestStream.class}) public void testFirstElementLate() { Instant lateElementTimestamp = new Instant(-1_000_000); TestStream stream = @@ -267,7 +264,7 @@ public void testFirstElementLate() { } @Test - @Category({NeedsRunner.class, UsesTestStream.class}) + @Category({ValidatesRunner.class, UsesTestStream.class}) public void testElementsAtAlmostPositiveInfinity() { Instant endOfGlobalWindow = GlobalWindow.INSTANCE.maxTimestamp(); TestStream stream = @@ -293,15 +290,17 @@ public void testElementsAtAlmostPositiveInfinity() { } @Test - @Category({NeedsRunner.class, UsesTestStream.class}) + @Category({ValidatesRunner.class, UsesTestStream.class}) public void testMultipleStreams() { TestStream stream = TestStream.create(StringUtf8Coder.of()) .addElements("foo", "bar") .advanceWatermarkToInfinity(); - TestStream other = - TestStream.create(VarIntCoder.of()).addElements(1, 2, 3, 4).advanceWatermarkToInfinity(); + TestStream other = + TestStream.create(VarLongCoder.of()) + .addElements(1L, 2L, 3L, 4L) + .advanceWatermarkToInfinity(); PCollection createStrings = p.apply("CreateStrings", stream) @@ -312,15 +311,15 @@ public void testMultipleStreams() { .withAllowedLateness(Duration.ZERO) .accumulatingFiredPanes()); PAssert.that(createStrings).containsInAnyOrder("foo", "bar"); - PCollection createInts = + PCollection createInts = p.apply("CreateInts", other) .apply( "WindowInts", - Window.configure() + Window.configure() .triggering(AfterPane.elementCountAtLeast(4)) .withAllowedLateness(Duration.ZERO) .accumulatingFiredPanes()); - PAssert.that(createInts).containsInAnyOrder(1, 2, 3, 4); + PAssert.that(createInts).containsInAnyOrder(1L, 2L, 3L, 4L); p.run(); } @@ -352,7 +351,7 @@ public void testAdvanceWatermarkEqualToPositiveInfinityThrows() { } @Test - @Category({NeedsRunner.class, UsesTestStreamWithProcessingTime.class}) + @Category({ValidatesRunner.class, UsesTestStreamWithProcessingTime.class}) public void testEarlyPanesOfWindow() { TestStream source = TestStream.create(VarLongCoder.of()) diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/ThreadLeakTracker.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/ThreadLeakTracker.java index 8132ac7436fa..1ab11271cc45 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/ThreadLeakTracker.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/ThreadLeakTracker.java @@ -32,9 +32,6 @@ * Tracks the threads created during a test method execution (or class using @ClassRule) and fails * if some still exists after the test method execution. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ThreadLeakTracker implements TestRule { private final Field groupField; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/WindowSupplierTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/WindowSupplierTest.java index 676d82014769..8d49505b7785 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/WindowSupplierTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/WindowSupplierTest.java @@ -17,7 +17,7 @@ */ package org.apache.beam.sdk.testing; -import static org.junit.Assert.assertThat; +import static org.hamcrest.MatcherAssert.assertThat; import java.io.IOException; import java.io.InputStream; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ApproximateUniqueTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ApproximateUniqueTest.java index 2d96b32e7ac1..bfa4f8d75c6e 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ApproximateUniqueTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ApproximateUniqueTest.java @@ -57,9 +57,6 @@ import org.junit.runners.Parameterized; /** Tests for the ApproximateUnique transform. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ApproximateUniqueTest implements Serializable { // implements Serializable just to make it easy to use anonymous inner DoFn subclasses diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CombineFnsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CombineFnsTest.java index 27d3f7557c53..4f89d8b68620 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CombineFnsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CombineFnsTest.java @@ -19,7 +19,7 @@ import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasDisplayItem; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.includesDisplayDataFor; -import static org.junit.Assert.assertThat; +import static org.hamcrest.MatcherAssert.assertThat; import java.io.IOException; import java.io.InputStream; @@ -60,9 +60,6 @@ /** Unit tests for {@link CombineFns}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class CombineFnsTest { @Rule public final TestPipeline p = TestPipeline.create(); @Rule public ExpectedException expectedException = ExpectedException.none(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CombineTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CombineTest.java index ab70ada28256..9cc306e107a8 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CombineTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CombineTest.java @@ -23,12 +23,12 @@ import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.includesDisplayDataFor; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.empty; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.hasItem; import static org.hamcrest.Matchers.not; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import java.io.IOException; import java.io.InputStream; @@ -96,9 +96,6 @@ import org.junit.runners.JUnit4; /** Tests for {@link Combine} transforms. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class CombineTest implements Serializable { // This test is Serializable, just so that it's easy to have // anonymous inner classes inside the non-static test methods. diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CreateTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CreateTest.java index ce6538bfc0e2..f0dee49f0838 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CreateTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CreateTest.java @@ -20,11 +20,11 @@ import static org.apache.beam.sdk.TestUtils.LINES; import static org.apache.beam.sdk.TestUtils.LINES_ARRAY; import static org.apache.beam.sdk.TestUtils.NO_LINES_ARRAY; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.hasSize; import static org.hamcrest.Matchers.instanceOf; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import java.io.IOException; import java.io.InputStream; @@ -80,7 +80,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "unchecked", - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class CreateTest { @Rule public final ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/DeduplicateTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/DeduplicateTest.java index 00ee2d2ed783..0a8dc54ff903 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/DeduplicateTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/DeduplicateTest.java @@ -23,7 +23,7 @@ import org.apache.beam.sdk.coders.KvCoder; import org.apache.beam.sdk.coders.StringUtf8Coder; -import org.apache.beam.sdk.coders.VarIntCoder; +import org.apache.beam.sdk.coders.VarLongCoder; import org.apache.beam.sdk.state.TimeDomain; import 
org.apache.beam.sdk.testing.NeedsRunner; import org.apache.beam.sdk.testing.PAssert; @@ -177,27 +177,28 @@ public T apply(KV input) { @Category({NeedsRunner.class, UsesTestStreamWithProcessingTime.class}) public void testRepresentativeValuesWithCoder() { Instant base = new Instant(0); - TestStream> values = - TestStream.create(KvCoder.of(VarIntCoder.of(), StringUtf8Coder.of())) + TestStream> values = + TestStream.create(KvCoder.of(VarLongCoder.of(), StringUtf8Coder.of())) .advanceWatermarkTo(base) .addElements( - TimestampedValue.of(KV.of(1, "k1"), base), - TimestampedValue.of(KV.of(2, "k2"), base.plus(Duration.standardSeconds(10))), - TimestampedValue.of(KV.of(3, "k3"), base.plus(Duration.standardSeconds(20)))) + TimestampedValue.of(KV.of(1L, "k1"), base), + TimestampedValue.of(KV.of(2L, "k2"), base.plus(Duration.standardSeconds(10))), + TimestampedValue.of(KV.of(3L, "k3"), base.plus(Duration.standardSeconds(20)))) .advanceProcessingTime(Duration.standardMinutes(1)) .addElements( - TimestampedValue.of(KV.of(1, "k1"), base.plus(Duration.standardSeconds(30))), - TimestampedValue.of(KV.of(2, "k2"), base.plus(Duration.standardSeconds(40))), - TimestampedValue.of(KV.of(3, "k3"), base.plus(Duration.standardSeconds(50)))) + TimestampedValue.of(KV.of(1L, "k1"), base.plus(Duration.standardSeconds(30))), + TimestampedValue.of(KV.of(2L, "k2"), base.plus(Duration.standardSeconds(40))), + TimestampedValue.of(KV.of(3L, "k3"), base.plus(Duration.standardSeconds(50)))) .advanceWatermarkToInfinity(); - PCollection> distinctValues = + PCollection> distinctValues = p.apply(values) .apply( - Deduplicate.withRepresentativeValueFn(new Keys()) - .withRepresentativeCoder(VarIntCoder.of())); + Deduplicate.withRepresentativeValueFn(new Keys()) + .withRepresentativeCoder(VarLongCoder.of())); - PAssert.that(distinctValues).containsInAnyOrder(KV.of(1, "k1"), KV.of(2, "k2"), KV.of(3, "k3")); + PAssert.that(distinctValues) + .containsInAnyOrder(KV.of(1L, "k1"), KV.of(2L, "k2"), KV.of(3L, "k3")); p.run(); } @@ -205,27 +206,28 @@ public void testRepresentativeValuesWithCoder() { @Category({NeedsRunner.class, UsesTestStreamWithProcessingTime.class}) public void testTriggeredRepresentativeValuesWithType() { Instant base = new Instant(0); - TestStream> values = - TestStream.create(KvCoder.of(VarIntCoder.of(), StringUtf8Coder.of())) + TestStream> values = + TestStream.create(KvCoder.of(VarLongCoder.of(), StringUtf8Coder.of())) .advanceWatermarkTo(base) .addElements( - TimestampedValue.of(KV.of(1, "k1"), base), - TimestampedValue.of(KV.of(2, "k2"), base.plus(Duration.standardSeconds(10))), - TimestampedValue.of(KV.of(3, "k3"), base.plus(Duration.standardSeconds(20)))) + TimestampedValue.of(KV.of(1L, "k1"), base), + TimestampedValue.of(KV.of(2L, "k2"), base.plus(Duration.standardSeconds(10))), + TimestampedValue.of(KV.of(3L, "k3"), base.plus(Duration.standardSeconds(20)))) .advanceProcessingTime(Duration.standardMinutes(1)) .addElements( - TimestampedValue.of(KV.of(1, "k1"), base.plus(Duration.standardSeconds(30))), - TimestampedValue.of(KV.of(2, "k2"), base.plus(Duration.standardSeconds(40))), - TimestampedValue.of(KV.of(3, "k3"), base.plus(Duration.standardSeconds(50)))) + TimestampedValue.of(KV.of(1L, "k1"), base.plus(Duration.standardSeconds(30))), + TimestampedValue.of(KV.of(2L, "k2"), base.plus(Duration.standardSeconds(40))), + TimestampedValue.of(KV.of(3L, "k3"), base.plus(Duration.standardSeconds(50)))) .advanceWatermarkToInfinity(); - PCollection> distinctValues = + PCollection> distinctValues = p.apply(values) 
.apply( - Deduplicate.withRepresentativeValueFn(new Keys()) - .withRepresentativeCoder(VarIntCoder.of())); + Deduplicate.withRepresentativeValueFn(new Keys()) + .withRepresentativeCoder(VarLongCoder.of())); - PAssert.that(distinctValues).containsInAnyOrder(KV.of(1, "k1"), KV.of(2, "k2"), KV.of(3, "k3")); + PAssert.that(distinctValues) + .containsInAnyOrder(KV.of(1L, "k1"), KV.of(2L, "k2"), KV.of(3L, "k3")); p.run(); } diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/DistinctTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/DistinctTest.java index 7b44df9fcb6f..124395d00194 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/DistinctTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/DistinctTest.java @@ -17,11 +17,11 @@ */ package org.apache.beam.sdk.transforms; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.hasItem; import static org.hamcrest.Matchers.not; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.util.Arrays; @@ -32,7 +32,7 @@ import java.util.Set; import org.apache.beam.sdk.coders.KvCoder; import org.apache.beam.sdk.coders.StringUtf8Coder; -import org.apache.beam.sdk.coders.VarIntCoder; +import org.apache.beam.sdk.coders.VarLongCoder; import org.apache.beam.sdk.testing.NeedsRunner; import org.apache.beam.sdk.testing.PAssert; import org.apache.beam.sdk.testing.TestPipeline; @@ -62,9 +62,6 @@ /** Tests for {@link Distinct}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DistinctTest { @Rule public final TestPipeline p = TestPipeline.create(); @@ -220,25 +217,25 @@ public void testTriggeredDistinct() { @Category({NeedsRunner.class, UsesTestStreamWithProcessingTime.class}) public void testTriggeredDistinctRepresentativeValues() { Instant base = new Instant(0); - TestStream> values = - TestStream.create(KvCoder.of(VarIntCoder.of(), StringUtf8Coder.of())) + TestStream> values = + TestStream.create(KvCoder.of(VarLongCoder.of(), StringUtf8Coder.of())) .advanceWatermarkTo(base) .addElements( - TimestampedValue.of(KV.of(1, "k1"), base), - TimestampedValue.of(KV.of(2, "k2"), base.plus(Duration.standardSeconds(10))), - TimestampedValue.of(KV.of(3, "k3"), base.plus(Duration.standardSeconds(20)))) + TimestampedValue.of(KV.of(1L, "k1"), base), + TimestampedValue.of(KV.of(2L, "k2"), base.plus(Duration.standardSeconds(10))), + TimestampedValue.of(KV.of(3L, "k3"), base.plus(Duration.standardSeconds(20)))) .advanceProcessingTime(Duration.standardMinutes(1)) .addElements( - TimestampedValue.of(KV.of(1, "k1"), base.plus(Duration.standardSeconds(30))), - TimestampedValue.of(KV.of(2, "k2"), base.plus(Duration.standardSeconds(40))), - TimestampedValue.of(KV.of(3, "k3"), base.plus(Duration.standardSeconds(50)))) + TimestampedValue.of(KV.of(1L, "k1"), base.plus(Duration.standardSeconds(30))), + TimestampedValue.of(KV.of(2L, "k2"), base.plus(Duration.standardSeconds(40))), + TimestampedValue.of(KV.of(3L, "k3"), base.plus(Duration.standardSeconds(50)))) .advanceWatermarkToInfinity(); - PCollection> distinctValues = + PCollection> distinctValues = triggeredDistinctRepresentativePipeline .apply(values) .apply( - Window.>into(FixedWindows.of(Duration.standardMinutes(1))) + Window.>into(FixedWindows.of(Duration.standardMinutes(1))) .triggering( 
Repeatedly.forever( AfterProcessingTime.pastFirstElementInPane() @@ -246,10 +243,11 @@ public void testTriggeredDistinctRepresentativeValues() { .withAllowedLateness(Duration.ZERO) .accumulatingFiredPanes()) .apply( - Distinct.withRepresentativeValueFn(new Keys()) - .withRepresentativeType(TypeDescriptor.of(Integer.class))); + Distinct.withRepresentativeValueFn(new Keys()) + .withRepresentativeType(TypeDescriptor.of(Long.class))); - PAssert.that(distinctValues).containsInAnyOrder(KV.of(1, "k1"), KV.of(2, "k2"), KV.of(3, "k3")); + PAssert.that(distinctValues) + .containsInAnyOrder(KV.of(1L, "k1"), KV.of(2L, "k2"), KV.of(3L, "k3")); triggeredDistinctRepresentativePipeline.run(); } @@ -261,18 +259,18 @@ public void testTriggeredDistinctRepresentativeValues() { @Category({NeedsRunner.class, UsesTestStreamWithProcessingTime.class}) public void testTriggeredDistinctRepresentativeValuesEmpty() { Instant base = new Instant(0); - TestStream> values = - TestStream.create(KvCoder.of(VarIntCoder.of(), StringUtf8Coder.of())) + TestStream> values = + TestStream.create(KvCoder.of(VarLongCoder.of(), StringUtf8Coder.of())) .advanceWatermarkTo(base) - .addElements(TimestampedValue.of(KV.of(1, "k1"), base)) + .addElements(TimestampedValue.of(KV.of(1L, "k1"), base)) .advanceProcessingTime(Duration.standardMinutes(1)) .advanceWatermarkToInfinity(); - PCollection> distinctValues = + PCollection> distinctValues = triggeredDistinctRepresentativePipeline .apply(values) .apply( - Window.>into(FixedWindows.of(Duration.standardMinutes(1))) + Window.>into(FixedWindows.of(Duration.standardMinutes(1))) .triggering( AfterWatermark.pastEndOfWindow() .withEarlyFirings( @@ -281,10 +279,10 @@ public void testTriggeredDistinctRepresentativeValuesEmpty() { .withAllowedLateness(Duration.ZERO) .discardingFiredPanes()) .apply( - Distinct.withRepresentativeValueFn(new Keys()) - .withRepresentativeType(TypeDescriptor.of(Integer.class))); + Distinct.withRepresentativeValueFn(new Keys()) + .withRepresentativeType(TypeDescriptor.of(Long.class))); - PAssert.that(distinctValues).containsInAnyOrder(KV.of(1, "k1")); + PAssert.that(distinctValues).containsInAnyOrder(KV.of(1L, "k1")); triggeredDistinctRepresentativePipeline.run(); } diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/DoFnTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/DoFnTest.java index e97d6b3c7b00..86d82d5de385 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/DoFnTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/DoFnTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.transforms; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.empty; -import static org.junit.Assert.assertThat; import java.io.Serializable; import org.apache.beam.sdk.testing.TestPipeline; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/DoFnTesterTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/DoFnTesterTest.java index 0f12ee0c1116..2d888499f6ad 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/DoFnTesterTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/DoFnTesterTest.java @@ -18,11 +18,11 @@ package org.apache.beam.sdk.transforms; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static 
org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.hasItems; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.util.List; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FilterTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FilterTest.java index 22d3a002e24c..091c28fc240a 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FilterTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FilterTest.java @@ -36,9 +36,6 @@ /** Tests for {@link Filter}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FilterTest implements Serializable { static class TrivialFn implements SerializableFunction { diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlatMapElementsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlatMapElementsTest.java index 72c57fec933e..9e606b39c5a9 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlatMapElementsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlatMapElementsTest.java @@ -54,9 +54,6 @@ /** Tests for {@link FlatMapElements}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FlatMapElementsTest implements Serializable { @Rule public final transient TestPipeline pipeline = TestPipeline.create(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java index 109cf53100c1..93365bbe4051 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java @@ -22,9 +22,9 @@ import static org.apache.beam.sdk.TestUtils.LINES_ARRAY; import static org.apache.beam.sdk.TestUtils.NO_LINES; import static org.apache.beam.sdk.TestUtils.NO_LINES_ARRAY; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.not; -import static org.junit.Assert.assertThat; import java.io.Serializable; import java.util.ArrayList; @@ -77,9 +77,6 @@ /** Tests for Flatten. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FlattenTest implements Serializable { @Rule public final transient TestPipeline p = TestPipeline.create(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/GroupByKeyTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/GroupByKeyTest.java index 590d27c134a6..fd849a506b97 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/GroupByKeyTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/GroupByKeyTest.java @@ -21,9 +21,9 @@ import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasDisplayItem; import static org.hamcrest.CoreMatchers.equalTo; import static org.hamcrest.CoreMatchers.hasItem; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.empty; import static org.hamcrest.collection.IsIterableContainingInAnyOrder.containsInAnyOrder; -import static org.junit.Assert.assertThat; import java.io.DataInputStream; import java.io.DataOutputStream; @@ -61,7 +61,6 @@ import org.apache.beam.sdk.transforms.windowing.FixedWindows; import org.apache.beam.sdk.transforms.windowing.GlobalWindow; import org.apache.beam.sdk.transforms.windowing.IntervalWindow; -import org.apache.beam.sdk.transforms.windowing.InvalidWindows; import org.apache.beam.sdk.transforms.windowing.Repeatedly; import org.apache.beam.sdk.transforms.windowing.Sessions; import org.apache.beam.sdk.transforms.windowing.SlidingWindows; @@ -71,8 +70,10 @@ import org.apache.beam.sdk.values.PBegin; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.TimestampedValue; +import org.apache.beam.sdk.values.TypeDescriptors; import org.apache.beam.sdk.values.WindowingStrategy; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Streams; import org.checkerframework.checker.nullness.qual.Nullable; import org.hamcrest.Matcher; import org.joda.time.Duration; @@ -586,31 +587,46 @@ public void testGroupByKeyMergingWindows() { } @Test - @Category(NeedsRunner.class) - public void testIdentityWindowFnPropagation() { - - List> ungroupedPairs = Arrays.asList(); - + @Category(ValidatesRunner.class) + public void testRewindowWithTimestampCombiner() { PCollection> input = p.apply( - Create.of(ungroupedPairs) - .withCoder(KvCoder.of(StringUtf8Coder.of(), BigEndianIntegerCoder.of()))) - .apply(Window.into(FixedWindows.of(Duration.standardMinutes(1)))); + Create.timestamped( + TimestampedValue.of(KV.of("foo", 1), new Instant(1)), + TimestampedValue.of(KV.of("foo", 4), new Instant(4)), + TimestampedValue.of(KV.of("bar", 3), new Instant(3)), + TimestampedValue.of(KV.of("foo", 9), new Instant(9)))) + .apply( + "GlobalWindows", + Window.>configure() + .withTimestampCombiner(TimestampCombiner.LATEST)); - PCollection>> output = input.apply(GroupByKey.create()); + PCollection> result = + input + .apply(GroupByKey.create()) + .apply( + MapElements.into( + TypeDescriptors.kvs( + TypeDescriptors.strings(), TypeDescriptors.integers())) + .via(kv -> KV.of(kv.getKey(), sum(kv.getValue())))) + .apply("FixedWindows", Window.into(FixedWindows.of(Duration.millis(1)))); + + PAssert.that(result) + .inWindow(new IntervalWindow(new Instant(9), new Instant(10))) + .containsInAnyOrder(KV.of("foo", 14)) + .inWindow(new IntervalWindow(new Instant(3), new Instant(4))) + 
.containsInAnyOrder(KV.of("bar", 3)); p.run(); + } - Assert.assertTrue( - output - .getWindowingStrategy() - .getWindowFn() - .isCompatible(FixedWindows.of(Duration.standardMinutes(1)))); + private static int sum(Iterable parts) { + return Streams.stream(parts).mapToInt(e -> e).sum(); } @Test @Category(NeedsRunner.class) - public void testWindowFnInvalidation() { + public void testIdentityWindowFnPropagation() { List> ungroupedPairs = Arrays.asList(); @@ -618,7 +634,7 @@ public void testWindowFnInvalidation() { p.apply( Create.of(ungroupedPairs) .withCoder(KvCoder.of(StringUtf8Coder.of(), BigEndianIntegerCoder.of()))) - .apply(Window.into(Sessions.withGapDuration(Duration.standardMinutes(1)))); + .apply(Window.into(FixedWindows.of(Duration.standardMinutes(1)))); PCollection>> output = input.apply(GroupByKey.create()); @@ -628,25 +644,53 @@ public void testWindowFnInvalidation() { output .getWindowingStrategy() .getWindowFn() - .isCompatible( - new InvalidWindows( - "Invalid", Sessions.withGapDuration(Duration.standardMinutes(1))))); + .isCompatible(FixedWindows.of(Duration.standardMinutes(1)))); } @Test - public void testInvalidWindowsDirect() { - - List> ungroupedPairs = Arrays.asList(); + @Category(NeedsRunner.class) + public void testWindowFnPostMerging() throws Exception { - PCollection> input = + List> ungroupedPairs = ImmutableList.of(KV.of("a", 3)); + PCollection> windowedInput = p.apply( - Create.of(ungroupedPairs) - .withCoder(KvCoder.of(StringUtf8Coder.of(), BigEndianIntegerCoder.of()))) - .apply(Window.into(Sessions.withGapDuration(Duration.standardMinutes(1)))); + Create.timestamped( + TimestampedValue.of(KV.of("foo", 1), new Instant(1)), + TimestampedValue.of(KV.of("foo", 4), new Instant(4)), + TimestampedValue.of(KV.of("bar", 3), new Instant(3)), + TimestampedValue.of(KV.of("foo", 9), new Instant(9)))) + .apply(Window.into(Sessions.withGapDuration(Duration.millis(4L)))); - thrown.expect(IllegalStateException.class); - thrown.expectMessage("GroupByKey must have a valid Window merge function"); - input.apply("GroupByKey", GroupByKey.create()).apply("GroupByKeyAgain", GroupByKey.create()); + PCollection>> grouped = + windowedInput.apply("First grouping", GroupByKey.create()); + PAssert.that(grouped).satisfies(containsKvs(kv("foo", 1, 4), kv("foo", 9), kv("bar", 3))); + + // Check that the WindowFn is carried along as-is but alreadyMerged bit is set + assertThat( + grouped.getWindowingStrategy().getWindowFn(), + equalTo(windowedInput.getWindowingStrategy().getWindowFn())); + + assertThat( + "WindowingStrategy should be already merged", + grouped.getWindowingStrategy().isAlreadyMerged()); + + // Second grouping should sum existing groupings, even those exploded, since the windows match + // and are carried along. + PCollection sums = + grouped + .apply("Drop keys", Values.create()) + .apply("Explode iterables", Flatten.iterables()) + .apply("Map to same key", WithKeys.of("bizzle")) + .apply("Summed grouping", Sum.integersPerKey()) + .apply("Pull out sums", Values.create()); + + PAssert.that(sums) + .containsInAnyOrder( + 5 /* sum originating from (foo, 1) and (foo, 4) that merged */, + 9 /* sum of just (foo, 9) which doesn't merge */, + 3 /* sum of just (bar, 3) which doesn't merge */); + + p.run(); } } @@ -749,7 +793,7 @@ public void process(ProcessContext c) { * returns {@code false} for {@link #equals(Object)}. The results of the test are correct if the * runner correctly hashes and sorts on the encoded bytes. 
*/ - static class BadEqualityKey { + protected static class BadEqualityKey { long key; public BadEqualityKey() {} @@ -770,7 +814,7 @@ public int hashCode() { } /** Deterministic {@link Coder} for {@link BadEqualityKey}. */ - static class DeterministicKeyCoder extends AtomicCoder { + protected static class DeterministicKeyCoder extends AtomicCoder { public static DeterministicKeyCoder of() { return INSTANCE; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/GroupIntoBatchesTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/GroupIntoBatchesTest.java index 33348756ac88..316e258f95ee 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/GroupIntoBatchesTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/GroupIntoBatchesTest.java @@ -37,6 +37,7 @@ import org.apache.beam.sdk.testing.TestStream.WatermarkEvent; import org.apache.beam.sdk.testing.UsesStatefulParDo; import org.apache.beam.sdk.testing.UsesTestStream; +import org.apache.beam.sdk.testing.UsesTestStreamWithProcessingTime; import org.apache.beam.sdk.testing.UsesTimersInParDo; import org.apache.beam.sdk.transforms.windowing.AfterPane; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; @@ -48,6 +49,7 @@ import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.TimestampedValue; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.joda.time.Duration; import org.joda.time.Instant; @@ -64,6 +66,7 @@ public class GroupIntoBatchesTest implements Serializable { private static final int BATCH_SIZE = 5; + private static final long BATCH_SIZE_BYTES = 25; private static final long EVEN_NUM_ELEMENTS = 10; private static final long ODD_NUM_ELEMENTS = 11; private static final int ALLOWED_LATENESS = 0; @@ -95,7 +98,7 @@ private static ArrayList> createTestData(long numElements) { @Test @Category({NeedsRunner.class, UsesTimersInParDo.class, UsesStatefulParDo.class}) - public void testInGlobalWindow() { + public void testInGlobalWindowBatchSizeCount() { PCollection>> collection = pipeline .apply("Input data", Create.of(data)) @@ -126,6 +129,99 @@ public Void apply(Iterable>> input) { pipeline.run(); } + @Test + @Category({NeedsRunner.class, UsesTimersInParDo.class, UsesStatefulParDo.class}) + public void testInGlobalWindowBatchSizeByteSize() { + PCollection>> collection = + pipeline + .apply("Input data", Create.of(data)) + .apply(GroupIntoBatches.ofByteSize(BATCH_SIZE_BYTES)) + // set output coder + .setCoder(KvCoder.of(StringUtf8Coder.of(), IterableCoder.of(StringUtf8Coder.of()))); + PAssert.that("Incorrect batch size in one or more elements", collection) + .satisfies( + new SerializableFunction>>, Void>() { + + private boolean checkBatchSizes(Iterable>> listToCheck) { + for (KV> element : listToCheck) { + long byteSize = 0; + for (String str : element.getValue()) { + if (byteSize >= BATCH_SIZE_BYTES) { + // We already reached the batch size, so extra elements are not expected. 
+ return false; + } + try { + byteSize += StringUtf8Coder.of().getEncodedElementByteSize(str); + } catch (Exception e) { + throw new RuntimeException(e); + } + } + } + return true; + } + + @Override + public Void apply(Iterable>> input) { + assertTrue(checkBatchSizes(input)); + return null; + } + }); + PAssert.thatSingleton("Incorrect collection size", collection.apply("Count", Count.globally())) + .isEqualTo(3L); + pipeline.run(); + } + + @Test + @Category({NeedsRunner.class, UsesTimersInParDo.class, UsesStatefulParDo.class}) + public void testInGlobalWindowBatchSizeByteSizeFn() { + PCollection>> collection = + pipeline + .apply("Input data", Create.of(data)) + .apply( + GroupIntoBatches.ofByteSize( + BATCH_SIZE_BYTES, + s -> { + try { + return 2 * StringUtf8Coder.of().getEncodedElementByteSize(s); + } catch (Exception e) { + throw new RuntimeException(e); + } + })) + // set output coder + .setCoder(KvCoder.of(StringUtf8Coder.of(), IterableCoder.of(StringUtf8Coder.of()))); + PAssert.that("Incorrect batch size in one or more elements", collection) + .satisfies( + new SerializableFunction>>, Void>() { + + private boolean checkBatchSizes(Iterable>> listToCheck) { + for (KV> element : listToCheck) { + long byteSize = 0; + for (String str : element.getValue()) { + if (byteSize >= BATCH_SIZE_BYTES) { + // We already reached the batch size, so extra elements are not expected. + return false; + } + try { + byteSize += 2 * StringUtf8Coder.of().getEncodedElementByteSize(str); + } catch (Exception e) { + throw new RuntimeException(e); + } + } + } + return true; + } + + @Override + public Void apply(Iterable>> input) { + assertTrue(checkBatchSizes(input)); + return null; + } + }); + PAssert.thatSingleton("Incorrect collection size", collection.apply("Count", Count.globally())) + .isEqualTo(5L); + pipeline.run(); + } + @Test @Category({NeedsRunner.class, UsesTimersInParDo.class, UsesStatefulParDo.class}) public void testWithShardedKeyInGlobalWindow() { @@ -204,7 +300,9 @@ public KV apply( numFullBatches > totalNumBatches / 2); return null; }); - pipeline.run(); + pipeline + .runWithAdditionalOptionArgs(ImmutableList.of("--targetParallelism=1")) + .waitUntilFinish(); } /** test behavior when the number of input elements is not evenly divisible by batch size. */ @@ -354,6 +452,7 @@ public void processElement(ProcessContext c, BoundedWindow window) { NeedsRunner.class, UsesTimersInParDo.class, UsesTestStream.class, + UsesTestStreamWithProcessingTime.class, UsesStatefulParDo.class }) public void testBufferingTimerInFixedWindow() { @@ -476,6 +575,7 @@ public void processElement(ProcessContext c, BoundedWindow window) { NeedsRunner.class, UsesTimersInParDo.class, UsesTestStream.class, + UsesTestStreamWithProcessingTime.class, UsesStatefulParDo.class }) public void testBufferingTimerInGlobalWindow() { diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/JsonToRowTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/JsonToRowTest.java index 565b284eea61..80c2a5b87a20 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/JsonToRowTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/JsonToRowTest.java @@ -44,9 +44,6 @@ /** Unit tests for {@link JsonToRow}. 
*/ @RunWith(JUnit4.class) @Category(UsesSchema.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class JsonToRowTest implements Serializable { @Rule public transient TestPipeline pipeline = TestPipeline.create(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/KvSwapTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/KvSwapTest.java index e9d30fb30237..8be30267eabe 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/KvSwapTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/KvSwapTest.java @@ -39,7 +39,6 @@ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) "unchecked", - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class KvSwapTest { private static final KV[] TABLE = diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/LatestFnTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/LatestFnTest.java index 594e50acef94..7e90578c552e 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/LatestFnTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/LatestFnTest.java @@ -39,9 +39,6 @@ /** Unit tests for {@link Latest.LatestFn}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class LatestFnTest { private static final Instant INSTANT = new Instant(100); private static final long VALUE = 100 * INSTANT.getMillis(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/LatestTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/LatestTest.java index 3f2c1bfd1706..a8aed85e99f5 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/LatestTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/LatestTest.java @@ -46,9 +46,6 @@ /** Unit tests for {@link Latest} {@link PTransform} and {@link Combine.CombineFn}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class LatestTest implements Serializable { @Rule public final transient TestPipeline p = TestPipeline.create(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/MapElementsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/MapElementsTest.java index b11ba7344473..390a61358071 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/MapElementsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/MapElementsTest.java @@ -55,9 +55,6 @@ /** Tests for {@link MapElements}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class MapElementsTest implements Serializable { @Rule public final transient TestPipeline pipeline = TestPipeline.create(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/MapKeysTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/MapKeysTest.java new file mode 100644 index 000000000000..659388b133ea --- /dev/null +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/MapKeysTest.java @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.transforms; + +import java.util.ArrayList; +import java.util.List; +import org.apache.beam.sdk.coders.BigEndianIntegerCoder; +import org.apache.beam.sdk.coders.DoubleCoder; +import org.apache.beam.sdk.coders.KvCoder; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.testing.NeedsRunner; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.TypeDescriptors; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.junit.Rule; +import org.junit.Test; +import org.junit.experimental.categories.Category; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Tests for {@link MapKeys} transform. */ +@RunWith(JUnit4.class) +public class MapKeysTest { + + private static final List> TABLE = + ImmutableList.of(KV.of(1, "one"), KV.of(2, "two"), KV.of(3, "none")); + private static final List> WORDS_TABLE = + ImmutableList.of( + KV.of("one", "Length = 3"), KV.of("three", "Length = 4"), KV.of("", "Length = 0")); + + private static final List> EMPTY_TABLE = new ArrayList<>(); + public static final String EXPECTED_FAILURE_MESSAGE = "/ by zero"; + + @Rule public final TestPipeline p = TestPipeline.create(); + + @Test + @Category(NeedsRunner.class) + public void testMapKeysInto() { + + PCollection> input = + p.apply( + Create.of(TABLE) + .withCoder(KvCoder.of(BigEndianIntegerCoder.of(), StringUtf8Coder.of()))); + + PCollection> output = + input + .apply( + MapKeys.into(TypeDescriptors.doubles()) + .via((SerializableFunction) input1 -> input1 * 2d)) + .setCoder(KvCoder.of(DoubleCoder.of(), StringUtf8Coder.of())); + + PAssert.that(output) + .containsInAnyOrder( + ImmutableList.of(KV.of(2.0d, "one"), KV.of(4.0d, "two"), KV.of(6.0d, "none"))); + + p.run(); + } + + @Test + @Category(NeedsRunner.class) + public void testMapKeysWithFailures() { + + PCollection> input = + p.apply( + Create.of(WORDS_TABLE) + .withCoder(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of()))); + + WithFailures.Result>, String> result = + input.apply( + MapKeys.into(TypeDescriptors.integers()) + .via(word -> 1 / word.length()) + .exceptionsInto(TypeDescriptors.strings()) + .exceptionsVia(ee -> ee.exception().getMessage())); + result.output().setCoder(KvCoder.of(BigEndianIntegerCoder.of(), StringUtf8Coder.of())); + + PAssert.that(result.output()) + .containsInAnyOrder(ImmutableList.of(KV.of(0, "Length = 3"), KV.of(0, "Length = 4"))); + PAssert.that(result.failures()).containsInAnyOrder(EXPECTED_FAILURE_MESSAGE); + + p.run(); + } + + @Test + @Category(NeedsRunner.class) + public void testMapKeysEmpty() { + + PCollection> input = + p.apply( + Create.of(EMPTY_TABLE) + 
.withCoder(KvCoder.of(BigEndianIntegerCoder.of(), StringUtf8Coder.of()))); + + PCollection> output = + input + .apply(MapKeys.into(TypeDescriptors.doubles()).via(Integer::doubleValue)) + .setCoder(KvCoder.of(DoubleCoder.of(), StringUtf8Coder.of())); + + PAssert.that(output).empty(); + + p.run(); + } +} diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/MapValuesTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/MapValuesTest.java new file mode 100644 index 000000000000..e32d6e9ca600 --- /dev/null +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/MapValuesTest.java @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.transforms; + +import java.util.ArrayList; +import java.util.List; +import org.apache.beam.sdk.coders.BigEndianIntegerCoder; +import org.apache.beam.sdk.coders.DoubleCoder; +import org.apache.beam.sdk.coders.KvCoder; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.testing.NeedsRunner; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.TypeDescriptors; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.junit.Rule; +import org.junit.Test; +import org.junit.experimental.categories.Category; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Tests for {@link MapValues} transform. 
*/ +@RunWith(JUnit4.class) +public class MapValuesTest { + + private static final List> TABLE = + ImmutableList.of(KV.of("one", 1), KV.of("two", 2), KV.of("dup", 2)); + private static final List> WORDS_TABLE = + ImmutableList.of( + KV.of("Length = 3", "one"), KV.of("Length = 4", "three"), KV.of("Length = 0", "")); + + private static final List> EMPTY_TABLE = new ArrayList<>(); + public static final String EXPECTED_FAILURE_MESSAGE = "/ by zero"; + + @Rule public final TestPipeline p = TestPipeline.create(); + + @Test + @Category(NeedsRunner.class) + public void testMapValuesInto() { + + PCollection> input = + p.apply( + Create.of(TABLE) + .withCoder(KvCoder.of(StringUtf8Coder.of(), BigEndianIntegerCoder.of()))); + + PCollection> output = + input + .apply( + MapValues.into(TypeDescriptors.doubles()) + .via((SerializableFunction) input1 -> input1 * 2d)) + .setCoder(KvCoder.of(StringUtf8Coder.of(), DoubleCoder.of())); + + PAssert.that(output) + .containsInAnyOrder( + ImmutableList.of(KV.of("one", 2.0d), KV.of("two", 4.0d), KV.of("dup", 4.0d))); + + p.run(); + } + + @Test + @Category(NeedsRunner.class) + public void testMapValuesWithFailures() { + + PCollection> input = + p.apply( + Create.of(WORDS_TABLE) + .withCoder(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of()))); + + WithFailures.Result>, String> result = + input.apply( + MapValues.into(TypeDescriptors.integers()) + .via(word -> 1 / word.length()) + .exceptionsInto(TypeDescriptors.strings()) + .exceptionsVia(ee -> ee.exception().getMessage())); + result.output().setCoder(KvCoder.of(StringUtf8Coder.of(), BigEndianIntegerCoder.of())); + + PAssert.that(result.output()) + .containsInAnyOrder(ImmutableList.of(KV.of("Length = 3", 0), KV.of("Length = 4", 0))); + PAssert.that(result.failures()).containsInAnyOrder(EXPECTED_FAILURE_MESSAGE); + + p.run(); + } + + @Test + @Category(NeedsRunner.class) + public void testMapValuesEmpty() { + + PCollection> input = + p.apply( + Create.of(EMPTY_TABLE) + .withCoder(KvCoder.of(StringUtf8Coder.of(), BigEndianIntegerCoder.of()))); + + PCollection> output = + input + .apply(MapValues.into(TypeDescriptors.doubles()).via(Integer::doubleValue)) + .setCoder(KvCoder.of(StringUtf8Coder.of(), DoubleCoder.of())); + + PAssert.that(output).empty(); + + p.run(); + } +} diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/PTransformTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/PTransformTest.java index 715dd900cb4e..7196b2447082 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/PTransformTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/PTransformTest.java @@ -18,9 +18,9 @@ package org.apache.beam.sdk.transforms; import static org.apache.beam.sdk.values.TypeDescriptors.integers; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.empty; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import java.io.Serializable; import org.apache.beam.sdk.testing.NeedsRunner; @@ -37,9 +37,6 @@ /** Tests for {@link PTransform} base class. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PTransformTest implements Serializable { @Rule public final transient TestPipeline pipeline = TestPipeline.create(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoLifecycleTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoLifecycleTest.java index 799d89a7454e..44f56ca36d41 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoLifecycleTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoLifecycleTest.java @@ -64,7 +64,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class ParDoLifecycleTest implements Serializable { diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoSchemaTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoSchemaTest.java index 00e3221da03c..9eee4540bca9 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoSchemaTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoSchemaTest.java @@ -69,9 +69,6 @@ /** Test {@link Schema} support. */ @RunWith(JUnit4.class) @Category(UsesSchema.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ParDoSchemaTest implements Serializable { @Rule public final transient TestPipeline pipeline = TestPipeline.create(); @Rule public transient ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java index 6bc09acbb51f..e0ab582efece 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java @@ -28,6 +28,7 @@ import static org.apache.beam.sdk.util.StringUtils.byteArrayToJsonString; import static org.apache.beam.sdk.util.StringUtils.jsonStringToByteArray; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.allOf; import static org.hamcrest.Matchers.anyOf; import static org.hamcrest.Matchers.containsString; @@ -38,7 +39,6 @@ import static org.junit.Assert.assertArrayEquals; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.junit.Assert.fail; @@ -97,6 +97,7 @@ import org.apache.beam.sdk.testing.TestStream; import org.apache.beam.sdk.testing.UsesBundleFinalizer; import org.apache.beam.sdk.testing.UsesKeyInParDo; +import org.apache.beam.sdk.testing.UsesLoopingTimer; import org.apache.beam.sdk.testing.UsesMapState; import org.apache.beam.sdk.testing.UsesOnWindowExpiration; import org.apache.beam.sdk.testing.UsesOrderedListState; @@ -164,7 +165,6 @@ /** Tests for ParDo. 
*/ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class ParDoTest implements Serializable { // This test is Serializable, just so that it's easy to have @@ -624,6 +624,35 @@ public void process(OutputReceiver r, PipelineOptions options) { pipeline.run(); } + + @Test + @Category(ValidatesRunner.class) + public void testSetupParameter() { + PCollection results = + pipeline + .apply(Create.of(1)) + .apply( + ParDo.of( + new DoFn() { + transient String myOptionValue; + + @Setup + public void setup(PipelineOptions options) { + myOptionValue = options.as(MyOptions.class).getFakeOption(); + } + + @ProcessElement + public void process(OutputReceiver r) { + r.output(myOptionValue); + } + })); + + String testOptionValue = "my value"; + pipeline.getOptions().as(MyOptions.class).setFakeOption(testOptionValue); + PAssert.that(results).containsInAnyOrder("my value"); + + pipeline.run(); + } } /** Tests to validate behaviors around multiple inputs or outputs. */ @@ -818,6 +847,39 @@ public void testParDoWithSideInputs() { pipeline.run(); } + @Test + @Category({ValidatesRunner.class, UsesSideInputs.class}) + public void testSameSideInputReadTwice() { + + List inputs = ImmutableList.of(3, -42, 66); + + PCollection input = pipeline.apply(Create.of(inputs)); + + PCollectionView sideInput = + pipeline + .apply("CreateSideInput", Create.of(11)) + .apply("ViewSideInput", View.asSingleton()); + + PCollection output1 = + input.apply( + "First ParDo", + ParDo.of(new TestDoFn(ImmutableList.of(sideInput), Arrays.asList())) + .withSideInputs(sideInput)); + + PCollection output2 = + input.apply( + "Second ParDo", + ParDo.of(new TestDoFn(ImmutableList.of(sideInput), Arrays.asList())) + .withSideInputs(sideInput)); + + PAssert.that(output1) + .satisfies(ParDoTest.HasExpectedOutput.forInput(inputs).andSideInputs(11)); + PAssert.that(output2) + .satisfies(ParDoTest.HasExpectedOutput.forInput(inputs).andSideInputs(11)); + + pipeline.run(); + } + @Test @Category({NeedsRunner.class, UsesSideInputs.class}) public void testSideInputAnnotationFailedValidationMissing() { @@ -1362,7 +1424,7 @@ public void processElement(ProcessContext cxt, @Element Long element) { } @Test - @Category(NeedsRunner.class) + @Category(ValidatesRunner.class) public void testMultiOutputChaining() { PCollectionTuple filters = @@ -1653,7 +1715,7 @@ public void testBundleFinalizationWithSideInputs() { @RunWith(JUnit4.class) public static class LifecycleTests extends SharedTestBase implements Serializable { @Test - @Category(NeedsRunner.class) + @Category(ValidatesRunner.class) public void testParDoWithErrorInStartBatch() { List inputs = Arrays.asList(3, -42, 666); @@ -1665,7 +1727,7 @@ public void testParDoWithErrorInStartBatch() { } @Test - @Category(NeedsRunner.class) + @Category(ValidatesRunner.class) public void testParDoWithErrorInProcessElement() { List inputs = Arrays.asList(3, -42, 666); @@ -1678,7 +1740,7 @@ public void testParDoWithErrorInProcessElement() { } @Test - @Category(NeedsRunner.class) + @Category(ValidatesRunner.class) public void testParDoWithErrorInFinishBatch() { List inputs = Arrays.asList(3, -42, 666); @@ -1730,7 +1792,7 @@ public void finishBundle(FinishBundleContext c) { @RunWith(JUnit4.class) public static class TimestampTests extends SharedTestBase implements Serializable { @Test - @Category(NeedsRunner.class) + @Category(ValidatesRunner.class) public void testParDoOutputWithTimestamp() { PCollection 
input = pipeline.apply(Create.of(Arrays.asList(3, 42, 6))); @@ -1751,7 +1813,7 @@ public void testParDoOutputWithTimestamp() { } @Test - @Category(NeedsRunner.class) + @Category(ValidatesRunner.class) public void testParDoTaggedOutputWithTimestamp() { PCollection input = pipeline.apply(Create.of(Arrays.asList(3, 42, 6))); @@ -3394,6 +3456,7 @@ public void processElement( /** Tests to validate ParDo timers. */ @RunWith(JUnit4.class) public static class TimerTests extends SharedTestBase implements Serializable { + @Test public void testTimerNotKeyed() { final String timerId = "foo"; @@ -3644,11 +3707,13 @@ public void onTimer(OutputReceiver r) { pipeline.run(); } - @Ignore( - "https://issues.apache.org/jira/browse/BEAM-2791, " - + "https://issues.apache.org/jira/browse/BEAM-2535") @Test - @Category({ValidatesRunner.class, UsesStatefulParDo.class, UsesTimersInParDo.class}) + @Category({ + ValidatesRunner.class, + UsesStatefulParDo.class, + UsesTimersInParDo.class, + UsesLoopingTimer.class + }) public void testEventTimeTimerLoop() { final String stateId = "count"; final String timerId = "timer"; @@ -3756,43 +3821,6 @@ public void onTimer( pipeline.run(); } - @Test - @Category({ValidatesRunner.class, UsesTimersInParDo.class}) - public void testAbsoluteProcessingTimeTimerRejected() throws Exception { - final String timerId = "foo"; - - DoFn, Integer> fn = - new DoFn, Integer>() { - - @TimerId(timerId) - private final TimerSpec spec = TimerSpecs.timer(TimeDomain.PROCESSING_TIME); - - @ProcessElement - public void processElement(@TimerId(timerId) Timer timer) { - try { - timer.set(new Instant(0)); - fail("Should have failed due to processing time with absolute timer."); - } catch (RuntimeException e) { - String message = e.getMessage(); - List expectedSubstrings = - Arrays.asList("relative timers", "processing time"); - expectedSubstrings.forEach( - str -> - Preconditions.checkState( - message.contains(str), - "Pipeline didn't fail with the expected strings: %s", - expectedSubstrings)); - } - } - - @OnTimer(timerId) - public void onTimer() {} - }; - - pipeline.apply(Create.of(KV.of("hello", 37))).apply(ParDo.of(fn)); - pipeline.run(); - } - @Test @Category({ValidatesRunner.class, UsesTimersInParDo.class}) public void testOutOfBoundsEventTimeTimer() throws Exception { @@ -3963,86 +3991,91 @@ public void onTimer() {} } @Test - @Category({NeedsRunner.class, UsesTimersInParDo.class, UsesTestStreamWithProcessingTime.class}) + @Category({ + ValidatesRunner.class, + UsesTimersInParDo.class, + UsesTestStream.class, + UsesTestStreamWithProcessingTime.class + }) public void testSimpleProcessingTimerTimer() throws Exception { final String timerId = "foo"; - DoFn, Integer> fn = - new DoFn, Integer>() { + DoFn, Long> fn = + new DoFn, Long>() { @TimerId(timerId) private final TimerSpec spec = TimerSpecs.timer(TimeDomain.PROCESSING_TIME); @ProcessElement - public void processElement(@TimerId(timerId) Timer timer, OutputReceiver r) { + public void processElement(@TimerId(timerId) Timer timer, OutputReceiver r) { timer.offset(Duration.standardSeconds(1)).setRelative(); - r.output(3); + r.output(3L); } @OnTimer(timerId) - public void onTimer(TimeDomain timeDomain, OutputReceiver r) { + public void onTimer(TimeDomain timeDomain, OutputReceiver r) { if (timeDomain.equals(TimeDomain.PROCESSING_TIME)) { - r.output(42); + r.output(42L); } } }; - TestStream> stream = - TestStream.create(KvCoder.of(StringUtf8Coder.of(), VarIntCoder.of())) - .addElements(KV.of("hello", 37)) + TestStream> stream = + 
TestStream.create(KvCoder.of(StringUtf8Coder.of(), VarLongCoder.of())) + .addElements(KV.of("hello", 37L)) .advanceProcessingTime( Duration.millis( DateTimeUtils.currentTimeMillis() / 1000 * 1000) // round to seconds .plus(Duration.standardMinutes(2))) .advanceWatermarkToInfinity(); - PCollection output = pipeline.apply(stream).apply(ParDo.of(fn)); - PAssert.that(output).containsInAnyOrder(3, 42); + PCollection output = pipeline.apply(stream).apply(ParDo.of(fn)); + PAssert.that(output).containsInAnyOrder(3L, 42L); pipeline.run(); } @Test - @Category({NeedsRunner.class, UsesTimersInParDo.class, UsesTestStream.class}) + @Category({ValidatesRunner.class, UsesTimersInParDo.class, UsesTestStream.class}) public void testEventTimeTimerUnbounded() throws Exception { final String timerId = "foo"; - DoFn, Integer> fn = - new DoFn, Integer>() { + DoFn, Long> fn = + new DoFn, Long>() { @TimerId(timerId) private final TimerSpec spec = TimerSpecs.timer(TimeDomain.EVENT_TIME); @ProcessElement - public void processElement(@TimerId(timerId) Timer timer, OutputReceiver r) { + public void processElement(@TimerId(timerId) Timer timer, OutputReceiver r) { timer.offset(Duration.standardSeconds(1)).setRelative(); - r.output(3); + r.output(3L); } @OnTimer(timerId) - public void onTimer(OutputReceiver r) { - r.output(42); + public void onTimer(OutputReceiver r) { + r.output(42L); } }; - TestStream> stream = - TestStream.create(KvCoder.of(StringUtf8Coder.of(), VarIntCoder.of())) + TestStream> stream = + TestStream.create(KvCoder.of(StringUtf8Coder.of(), VarLongCoder.of())) .advanceWatermarkTo(new Instant(0)) - .addElements(KV.of("hello", 37)) + .addElements(KV.of("hello", 37L)) .advanceWatermarkTo(new Instant(0).plus(Duration.standardSeconds(1))) .advanceWatermarkToInfinity(); - PCollection output = pipeline.apply(stream).apply(ParDo.of(fn)); - PAssert.that(output).containsInAnyOrder(3, 42); + PCollection output = pipeline.apply(stream).apply(ParDo.of(fn)); + PAssert.that(output).containsInAnyOrder(3L, 42L); pipeline.run(); } @Test - @Category({NeedsRunner.class, UsesTimersInParDo.class, UsesTestStream.class}) + @Category({ValidatesRunner.class, UsesTimersInParDo.class, UsesTestStream.class}) public void testEventTimeTimerAlignUnbounded() throws Exception { final String timerId = "foo"; - DoFn, KV> fn = - new DoFn, KV>() { + DoFn, KV> fn = + new DoFn, KV>() { @TimerId(timerId) private final TimerSpec spec = TimerSpecs.timer(TimeDomain.EVENT_TIME); @@ -4051,46 +4084,45 @@ public void testEventTimeTimerAlignUnbounded() throws Exception { public void processElement( @TimerId(timerId) Timer timer, @Timestamp Instant timestamp, - OutputReceiver> r) { + OutputReceiver> r) { timer .align(Duration.standardMinutes(1)) .offset(Duration.standardSeconds(1)) .setRelative(); - r.output(KV.of(3, timestamp)); + r.output(KV.of(3L, timestamp)); } @OnTimer(timerId) - public void onTimer( - @Timestamp Instant timestamp, OutputReceiver> r) { - r.output(KV.of(42, timestamp)); + public void onTimer(@Timestamp Instant timestamp, OutputReceiver> r) { + r.output(KV.of(42L, timestamp)); } }; - TestStream> stream = - TestStream.create(KvCoder.of(StringUtf8Coder.of(), VarIntCoder.of())) + TestStream> stream = + TestStream.create(KvCoder.of(StringUtf8Coder.of(), VarLongCoder.of())) .advanceWatermarkTo(new Instant(0).plus(Duration.standardSeconds(5))) - .addElements(KV.of("hello", 37)) + .addElements(KV.of("hello", 37L)) .advanceWatermarkTo(new Instant(0).plus(Duration.standardMinutes(1))) .advanceWatermarkToInfinity(); - PCollection> output = 
pipeline.apply(stream).apply(ParDo.of(fn)); + PCollection> output = pipeline.apply(stream).apply(ParDo.of(fn)); PAssert.that(output) .containsInAnyOrder( - KV.of(3, new Instant(0).plus(Duration.standardSeconds(5))), + KV.of(3L, new Instant(0).plus(Duration.standardSeconds(5))), KV.of( - 42, + 42L, new Instant( Duration.standardMinutes(1).minus(Duration.standardSeconds(1)).getMillis()))); pipeline.run(); } @Test - @Category({NeedsRunner.class, UsesTimersInParDo.class, UsesTestStream.class}) + @Category({ValidatesRunner.class, UsesTimersInParDo.class, UsesTestStream.class}) public void testEventTimeTimerAlignAfterGcTimeUnbounded() throws Exception { final String timerId = "foo"; - DoFn, KV> fn = - new DoFn, KV>() { + DoFn, KV> fn = + new DoFn, KV>() { @TimerId(timerId) private final TimerSpec spec = TimerSpecs.timer(TimeDomain.EVENT_TIME); @@ -4099,29 +4131,28 @@ public void testEventTimeTimerAlignAfterGcTimeUnbounded() throws Exception { public void processElement(ProcessContext context, @TimerId(timerId) Timer timer) { // This aligned time will exceed the END_OF_GLOBAL_WINDOW timer.align(Duration.standardDays(1)).setRelative(); - context.output(KV.of(3, context.timestamp())); + context.output(KV.of(3L, context.timestamp())); } @OnTimer(timerId) - public void onTimer( - @Timestamp Instant timestamp, OutputReceiver> r) { - r.output(KV.of(42, timestamp)); + public void onTimer(@Timestamp Instant timestamp, OutputReceiver> r) { + r.output(KV.of(42L, timestamp)); } }; - TestStream> stream = - TestStream.create(KvCoder.of(StringUtf8Coder.of(), VarIntCoder.of())) + TestStream> stream = + TestStream.create(KvCoder.of(StringUtf8Coder.of(), VarLongCoder.of())) // See GlobalWindow, // END_OF_GLOBAL_WINDOW is TIMESTAMP_MAX_VALUE.minus(Duration.standardDays(1)) .advanceWatermarkTo(GlobalWindow.INSTANCE.maxTimestamp()) - .addElements(KV.of("hello", 37)) + .addElements(KV.of("hello", 37L)) .advanceWatermarkToInfinity(); - PCollection> output = pipeline.apply(stream).apply(ParDo.of(fn)); + PCollection> output = pipeline.apply(stream).apply(ParDo.of(fn)); PAssert.that(output) .containsInAnyOrder( - KV.of(3, GlobalWindow.INSTANCE.maxTimestamp()), - KV.of(42, GlobalWindow.INSTANCE.maxTimestamp())); + KV.of(3L, GlobalWindow.INSTANCE.maxTimestamp()), + KV.of(42L, GlobalWindow.INSTANCE.maxTimestamp())); pipeline.run(); } @@ -4133,6 +4164,7 @@ public void onTimer( @Category({ ValidatesRunner.class, UsesTimersInParDo.class, + UsesTestStream.class, UsesTestStreamWithProcessingTime.class }) public void testProcessingTimeTimerCanBeReset() throws Exception { @@ -4224,8 +4256,8 @@ public void onTimer(OutputReceiver r) { UsesStrictTimerOrdering.class }) public void testEventTimeTimerOrdering() throws Exception { - final int numTestElements = 100; - final Instant now = new Instant(1500000000000L); + final int numTestElements = 10; + final Instant now = new Instant(0); TestStream.Builder> builder = TestStream.create(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of())) .advanceWatermarkTo(new Instant(0)); @@ -4252,7 +4284,7 @@ public void testEventTimeTimerOrdering() throws Exception { }) public void testEventTimeTimerOrderingWithCreate() throws Exception { final int numTestElements = 100; - final Instant now = new Instant(1500000000000L); + final Instant now = new Instant(0L); List>> elements = new ArrayList<>(); for (int i = 0; i < numTestElements; i++) { @@ -4268,7 +4300,6 @@ private void testEventTimeTimerOrderingWithInputPTransform( int numTestElements, PTransform>> transform) throws Exception { - final String 
timerIdBagAppend = "append"; final String timerIdGc = "gc"; final String bag = "bag"; @@ -4316,7 +4347,6 @@ public void onTimer( OnTimerContext context, @TimerId(timerIdBagAppend) Timer timer, @StateId(bag) BagState> bagState) { - List> flush = new ArrayList<>(); Instant flushTime = context.timestamp(); for (TimestampedValue val : bagState.read()) { @@ -4336,7 +4366,6 @@ public void onTimer( @OnTimer(timerIdGc) public void onTimer( OnTimerContext context, @StateId(bag) BagState> bagState) { - String output = Joiner.on(":") .join( @@ -4423,23 +4452,53 @@ public void duplicateTimerSetting() { UsesTestStream.class, UsesStrictTimerOrdering.class }) - public void testTwoTimersSettingEachOther() { + public void testTwoTimersSettingEachOtherBounded() { + testTwoTimersSettingEachOther(IsBounded.BOUNDED); + } + + @Test + @Category({ + ValidatesRunner.class, + UsesTimersInParDo.class, + UsesTestStream.class, + UsesStrictTimerOrdering.class + }) + public void testTwoTimersSettingEachOtherUnbounded() { + testTwoTimersSettingEachOther(IsBounded.UNBOUNDED); + } + + private void testTwoTimersSettingEachOther(IsBounded isBounded) { Instant now = new Instant(1500000000000L); Instant end = now.plus(100); - TestStream> input = - TestStream.create(KvCoder.of(VoidCoder.of(), VoidCoder.of())) - .addElements(KV.of(null, null)) + TestStream> input = + TestStream.create(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of())) + .addElements(KV.of("", "")) .advanceWatermarkToInfinity(); - pipeline.apply(TwoTimerTest.of(now, end, input)); + pipeline.apply(TwoTimerTest.of(now, end, input, isBounded)); pipeline.run(); } @Test @Category({ValidatesRunner.class, UsesTimersInParDo.class, UsesStrictTimerOrdering.class}) - public void testTwoTimersSettingEachOtherWithCreateAsInput() { - Instant now = new Instant(1500000000000L); + public void testTwoTimersSettingEachOtherWithCreateAsInputBounded() { + testTwoTimersSettingEachOtherWithCreateAsInput(IsBounded.BOUNDED); + } + + @Test + @Category({ + ValidatesRunner.class, + UsesTimersInParDo.class, + UsesStrictTimerOrdering.class, + UsesUnboundedPCollections.class + }) + public void testTwoTimersSettingEachOtherWithCreateAsInputUnbounded() { + testTwoTimersSettingEachOtherWithCreateAsInput(IsBounded.UNBOUNDED); + } + + private void testTwoTimersSettingEachOtherWithCreateAsInput(IsBounded isBounded) { + Instant now = new Instant(0L); Instant end = now.plus(100); - pipeline.apply(TwoTimerTest.of(now, end, Create.of(KV.of(null, null)))); + pipeline.apply(TwoTimerTest.of(now, end, Create.of(KV.of("", "")), isBounded)); pipeline.run(); } @@ -4530,13 +4589,14 @@ public void onTimer( ValidatesRunner.class, UsesStatefulParDo.class, UsesTimersInParDo.class, + UsesTestStream.class, UsesTestStreamWithProcessingTime.class, UsesTestStreamWithOutputTimestamp.class }) public void testOutputTimestampWithProcessingTime() { final String timerId = "foo"; - DoFn, KV> fn1 = - new DoFn, KV>() { + DoFn, KV> fn1 = + new DoFn, KV>() { @TimerId(timerId) private final TimerSpec timer = TimerSpecs.timer(TimeDomain.PROCESSING_TIME); @@ -4545,21 +4605,21 @@ public void testOutputTimestampWithProcessingTime() { public void processElement( @TimerId(timerId) Timer timer, @Timestamp Instant timestamp, - OutputReceiver> o) { + OutputReceiver> o) { timer .withOutputTimestamp(timestamp.plus(Duration.standardSeconds(5))) .offset(Duration.standardSeconds(10)) .setRelative(); // Output a message. This will cause the next DoFn to set a timer as well. 
- o.output(KV.of("foo", 100)); + o.output(KV.of("foo", 100L)); } @OnTimer(timerId) public void onTimer(OnTimerContext c, BoundedWindow w) {} }; - DoFn, Integer> fn2 = - new DoFn, Integer>() { + DoFn, Long> fn2 = + new DoFn, Long>() { @TimerId(timerId) private final TimerSpec timer = TimerSpecs.timer(TimeDomain.EVENT_TIME); @@ -4584,48 +4644,132 @@ public void processElement( @OnTimer(timerId) public void onTimer( @StateId("timerFired") ValueState timerFiredState, - OutputReceiver o) { + OutputReceiver o) { timerFiredState.write(true); - o.output(100); + o.output(100L); } }; - TestStream> stream = - TestStream.create(KvCoder.of(StringUtf8Coder.of(), VarIntCoder.of())) + TestStream> stream = + TestStream.create(KvCoder.of(StringUtf8Coder.of(), VarLongCoder.of())) .advanceProcessingTime(Duration.standardSeconds(1)) // Cause fn2 to set a timer. - .addElements(KV.of("key", 1)) + .addElements(KV.of("key", 1L)) // Normally this would case fn2's timer to expire, but it shouldn't here because of // the output timestamp. .advanceProcessingTime(Duration.standardSeconds(9)) .advanceWatermarkTo(new Instant(11)) // If the timer fired, then this would case fn2 to fail with an assertion error. - .addElements(KV.of("key", 1)) + .addElements(KV.of("key", 1L)) .advanceProcessingTime(Duration.standardSeconds(100)) .advanceWatermarkToInfinity(); - PCollection output = + PCollection output = pipeline.apply(stream).apply("first", ParDo.of(fn1)).apply("second", ParDo.of(fn2)); - PAssert.that(output).containsInAnyOrder(100); // result output + PAssert.that(output).containsInAnyOrder(100L); // result output + pipeline.run(); + } + + @Test + @Category({NeedsRunner.class, UsesTimersInParDo.class, UsesTestStream.class}) + public void testRelativeTimerWithOutputTimestamp() { + DoFn, String> buffferFn = + new DoFn, String>() { + + @TimerId("timer") + private final TimerSpec timerSpec = TimerSpecs.timer(TimeDomain.EVENT_TIME); + + @StateId("buffer") + private final StateSpec>> bufferSpec = + StateSpecs.bag(TimestampedValue.TimestampedValueCoder.of(StringUtf8Coder.of())); + + @StateId("minStamp") + private final StateSpec> minStamp = StateSpecs.value(); + + private Instant minInstant(Instant a, Instant b) { + return a.isBefore(b) ? 
a : b; + } + + @ProcessElement + public void processElement( + @Element KV element, + @Timestamp Instant timestamp, + @StateId("buffer") BagState> buffer, + @StateId("minStamp") ValueState minStamp, + @TimerId("timer") Timer timer) { + + minStamp.readLater(); + buffer.add(TimestampedValue.of(element.getValue(), timestamp)); + Instant currentMinStamp = minStamp.read(); + if (currentMinStamp == null || currentMinStamp.isAfter(timestamp)) { + currentMinStamp = timestamp; + minStamp.write(currentMinStamp); + timer.withOutputTimestamp(currentMinStamp).offset(Duration.ZERO).setRelative(); + } + } + + @OnTimer("timer") + public void onTimer( + OnTimerContext context, + @StateId("buffer") BagState> buffer, + @StateId("minStamp") ValueState minStamp, + @TimerId("timer") Timer timer, + OutputReceiver output) { + + Instant fireTimestamp = context.fireTimestamp(); + Iterable> values = buffer.read(); + Instant currentMinStamp = BoundedWindow.TIMESTAMP_MAX_VALUE; + for (TimestampedValue val : values) { + if (fireTimestamp.isBefore(val.getTimestamp())) { + output.outputWithTimestamp(val.getValue(), val.getTimestamp()); + } else if (currentMinStamp.isAfter(val.getTimestamp())) { + currentMinStamp = val.getTimestamp(); + } + } + if (currentMinStamp.isBefore(BoundedWindow.TIMESTAMP_MAX_VALUE)) { + minStamp.write(currentMinStamp); + timer.withOutputTimestamp(currentMinStamp).offset(Duration.ZERO).setRelative(); + } else { + minStamp.clear(); + } + } + }; + + PCollection> input = + pipeline.apply( + TestStream.create(KvCoder.of(VoidCoder.of(), StringUtf8Coder.of())) + .addElements(TimestampedValue.of(KV.of(null, "foo"), new Instant(1))) + .addElements(TimestampedValue.of(KV.of(null, "bar"), new Instant(2))) + .advanceWatermarkToInfinity()); + PCollection result = input.apply(ParDo.of(buffferFn)); + PAssert.that(result).containsInAnyOrder("foo", "bar"); pipeline.run(); } private static class TwoTimerTest extends PTransform { private static PTransform of( - Instant start, Instant end, PTransform>> input) { - return new TwoTimerTest(start, end, input); + Instant start, + Instant end, + PTransform>> input, + IsBounded isBounded) { + return new TwoTimerTest(start, end, input, isBounded); } private final Instant start; private final Instant end; - private final transient PTransform>> inputPTransform; + private final transient PTransform>> inputPTransform; + private IsBounded isBounded; public TwoTimerTest( - Instant start, Instant end, PTransform>> input) { + Instant start, + Instant end, + PTransform>> input, + IsBounded isBounded) { this.start = start; this.end = end; this.inputPTransform = input; + this.isBounded = isBounded; } @Override @@ -4637,9 +4781,10 @@ public PDone expand(PBegin input) { PCollection result = input .apply(inputPTransform) + .setIsBoundedInternal(isBounded) .apply( ParDo.of( - new DoFn, String>() { + new DoFn, String>() { @TimerId(timerName1) final TimerSpec timerSpec1 = TimerSpecs.timer(TimeDomain.EVENT_TIME); @@ -4652,11 +4797,9 @@ public PDone expand(PBegin input) { @ProcessElement public void processElement( - ProcessContext context, @TimerId(timerName1) Timer t1, @TimerId(timerName2) Timer t2, @StateId(countStateName) ValueState state) { - state.write(0); t1.set(start); // set the t2 timer after end, so that we test that @@ -4666,37 +4809,37 @@ public void processElement( @OnTimer(timerName1) public void onTimer1( - OnTimerContext context, + @Timestamp Instant timestamp, @TimerId(timerName2) Timer t2, - @StateId(countStateName) ValueState state) { - + @StateId(countStateName) 
ValueState state, + OutputReceiver o) { Integer current = state.read(); - t2.set(context.timestamp()); - - context.output( + t2.set(timestamp); + o.output( "t1:" + current + ":" - + context.timestamp().minus(start.getMillis()).getMillis()); + + timestamp.minus(start.getMillis()).getMillis()); } @OnTimer(timerName2) public void onTimer2( - OnTimerContext context, + @Timestamp Instant timestamp, @TimerId(timerName1) Timer t1, - @StateId(countStateName) ValueState state) { + @StateId(countStateName) ValueState state, + OutputReceiver o) { Integer current = state.read(); - if (context.timestamp().isBefore(end)) { + if (timestamp.isBefore(end)) { state.write(current + 1); - t1.set(context.timestamp().plus(1)); + t1.set(timestamp.plus(1)); } else { state.write(-1); } - context.output( + o.output( "t2:" + current + ":" - + context.timestamp().minus(start.getMillis()).getMillis()); + + timestamp.minus(start.getMillis()).getMillis()); } })); @@ -4710,30 +4853,382 @@ public void onTimer2( return PDone.in(input.getPipeline()); } } - } - /** Tests validating Timer coder inference behaviors. */ - @RunWith(JUnit4.class) - public static class TimerCoderInferenceTests extends SharedTestBase implements Serializable { @Test - @Category({ValidatesRunner.class, UsesStatefulParDo.class}) - public void testValueStateCoderInference() { - final String stateId = "foo"; - MyIntegerCoder myIntegerCoder = MyIntegerCoder.of(); - pipeline.getCoderRegistry().registerCoderForClass(MyInteger.class, myIntegerCoder); + @Category({ + NeedsRunner.class, + UsesTimersInParDo.class, + UsesTestStream.class, + UsesTestStreamWithProcessingTime.class + }) + public void testSetAndClearProcessingTimeTimer() { - DoFn, MyInteger> fn = - new DoFn, MyInteger>() { + final String timerId = "processing-timer"; - @StateId(stateId) - private final StateSpec> intState = StateSpecs.value(); + DoFn, Long> fn = + new DoFn, Long>() { + + @TimerId(timerId) + private final TimerSpec spec = TimerSpecs.timer(TimeDomain.PROCESSING_TIME); @ProcessElement - public void processElement( - ProcessContext c, - @StateId(stateId) ValueState state, - OutputReceiver r) { - MyInteger currentValue = MoreObjects.firstNonNull(state.read(), new MyInteger(0)); + public void processElement(@TimerId(timerId) Timer timer, OutputReceiver r) { + timer.offset(Duration.standardSeconds(1)).setRelative(); + timer.clear(); + r.output(3L); + } + + @OnTimer(timerId) + public void onTimer(TimeDomain timeDomain, OutputReceiver r) { + r.output(42L); + } + }; + + TestStream> stream = + TestStream.create(KvCoder.of(StringUtf8Coder.of(), VarLongCoder.of())) + .addElements(KV.of("hello", 37L)) + .advanceProcessingTime( + Duration.millis( + DateTimeUtils.currentTimeMillis() / 1000 * 1000) // round to seconds + .plus(Duration.standardMinutes(2))) + .advanceWatermarkToInfinity(); + + PCollection output = pipeline.apply(stream).apply(ParDo.of(fn)); + PAssert.that(output).containsInAnyOrder(3L); + pipeline.run(); + } + + @Test + @Category({NeedsRunner.class, UsesTimersInParDo.class, UsesTestStream.class}) + public void testSetAndClearEventTimeTimer() { + final String timerId = "event-timer"; + + DoFn, Long> fn = + new DoFn, Long>() { + + @TimerId(timerId) + private final TimerSpec spec = TimerSpecs.timer(TimeDomain.EVENT_TIME); + + @ProcessElement + public void processElement(@TimerId(timerId) Timer timer, OutputReceiver r) { + timer.offset(Duration.standardSeconds(1)).setRelative(); + timer.clear(); + r.output(3L); + } + + @OnTimer(timerId) + public void onTimer(OutputReceiver r) { + 
r.output(42L); + } + }; + + TestStream> stream = + TestStream.create(KvCoder.of(StringUtf8Coder.of(), VarLongCoder.of())) + .advanceWatermarkTo(new Instant(0)) + .addElements(KV.of("hello", 37L)) + .advanceWatermarkTo(new Instant(0).plus(Duration.standardSeconds(1))) + .advanceWatermarkToInfinity(); + + PCollection output = pipeline.apply(stream).apply(ParDo.of(fn)); + PAssert.that(output).containsInAnyOrder(3L); + pipeline.run(); + } + + @Test + @Category({ + NeedsRunner.class, + UsesTimersInParDo.class, + UsesTestStream.class, + UsesTestStreamWithProcessingTime.class + }) + public void testClearUnsetProcessingTimeTimer() { + final String timerId = "processing-timer"; + + DoFn, Long> fn = + new DoFn, Long>() { + + @TimerId(timerId) + private final TimerSpec spec = TimerSpecs.timer(TimeDomain.PROCESSING_TIME); + + @ProcessElement + public void processElement(@TimerId(timerId) Timer timer, OutputReceiver r) { + timer.clear(); + r.output(3L); + } + + @OnTimer(timerId) + public void onTimer(TimeDomain timeDomain, OutputReceiver r) { + r.output(42L); + } + }; + + TestStream> stream = + TestStream.create(KvCoder.of(StringUtf8Coder.of(), VarLongCoder.of())) + .addElements(KV.of("hello", 37L)) + .advanceProcessingTime( + Duration.millis( + DateTimeUtils.currentTimeMillis() / 1000 * 1000) // round to seconds + .plus(Duration.standardMinutes(4))) + .advanceWatermarkToInfinity(); + + PCollection output = pipeline.apply(stream).apply(ParDo.of(fn)); + PAssert.that(output).containsInAnyOrder(3L); + pipeline.run(); + } + + @Test + @Category({NeedsRunner.class, UsesTimersInParDo.class, UsesTestStream.class}) + public void testClearUnsetEventTimeTimer() { + final String timerId = "event-timer"; + + DoFn, Long> fn = + new DoFn, Long>() { + + @TimerId(timerId) + private final TimerSpec spec = TimerSpecs.timer(TimeDomain.EVENT_TIME); + + @ProcessElement + public void processElement(@TimerId(timerId) Timer timer, OutputReceiver r) { + timer.clear(); + r.output(3L); + } + + @OnTimer(timerId) + public void onTimer(OutputReceiver r) { + r.output(42L); + } + }; + + TestStream> stream = + TestStream.create(KvCoder.of(StringUtf8Coder.of(), VarLongCoder.of())) + .advanceWatermarkTo(new Instant(0)) + .addElements(KV.of("hello", 37L)) + .advanceWatermarkTo(new Instant(0).plus(Duration.standardSeconds(1))) + .advanceWatermarkToInfinity(); + + PCollection output = pipeline.apply(stream).apply(ParDo.of(fn)); + PAssert.that(output).containsInAnyOrder(3L); + pipeline.run(); + } + + @Test + @Category({ + NeedsRunner.class, + UsesTimersInParDo.class, + UsesTestStream.class, + UsesTestStreamWithProcessingTime.class + }) + public void testClearProcessingTimeTimer() { + final String timerId = "processing-timer"; + final String clearTimerId = "clear-timer"; + + DoFn, Long> fn = + new DoFn, Long>() { + + @TimerId(timerId) + private final TimerSpec spec = TimerSpecs.timer(TimeDomain.PROCESSING_TIME); + + @TimerId(clearTimerId) + private final TimerSpec clearTimerSpec = TimerSpecs.timer(TimeDomain.PROCESSING_TIME); + + @ProcessElement + public void processElement( + @TimerId(timerId) Timer timer, + @TimerId(clearTimerId) Timer clearTimer, + OutputReceiver r) { + timer.offset(Duration.standardSeconds(1)).setRelative(); + clearTimer.offset(Duration.standardSeconds(2)).setRelative(); + + r.output(3L); + } + + @OnTimer(timerId) + public void onTimer(OutputReceiver r, @TimerId(clearTimerId) Timer clearTimer) { + r.output(42L); + clearTimer.clear(); + } + + // This should never fire since we clear the timer in the earlier timer. 
+ @OnTimer(clearTimerId) + public void clearTimer(OutputReceiver r) { + r.output(43L); + } + }; + + TestStream> stream = + TestStream.create(KvCoder.of(StringUtf8Coder.of(), VarLongCoder.of())) + .addElements(KV.of("hello", 37L)) + .advanceProcessingTime( + Duration.millis( + DateTimeUtils.currentTimeMillis() / 1000 * 1000) // round to seconds + .plus(Duration.standardMinutes(2))) + .advanceProcessingTime( + Duration.millis( + DateTimeUtils.currentTimeMillis() / 1000 * 1000) // round to seconds + .plus(Duration.standardMinutes(4))) + .advanceWatermarkToInfinity(); + + PCollection output = pipeline.apply(stream).apply(ParDo.of(fn)); + PAssert.that(output).containsInAnyOrder(3L, 42L); + pipeline.run(); + } + + @Test + @Category({NeedsRunner.class, UsesTimersInParDo.class, UsesTestStream.class}) + public void testClearEventTimeTimer() { + final String timerId = "event-timer"; + final String clearTimerId = "clear-timer"; + + DoFn, Long> fn = + new DoFn, Long>() { + + @TimerId(timerId) + private final TimerSpec spec = TimerSpecs.timer(TimeDomain.EVENT_TIME); + + @TimerId(clearTimerId) + private final TimerSpec clearSpec = TimerSpecs.timer(TimeDomain.EVENT_TIME); + + @ProcessElement + public void processElement( + @TimerId(timerId) Timer timer, + @TimerId(clearTimerId) Timer clearTimer, + OutputReceiver r) { + timer.offset(Duration.standardSeconds(1)).setRelative(); + clearTimer.offset(Duration.standardSeconds(2)).setRelative(); + + r.output(3L); + } + + @OnTimer(timerId) + public void onTimer(OutputReceiver r, @TimerId(clearTimerId) Timer clearTimer) { + r.output(42L); + clearTimer.clear(); + } + + // This should never fire since we clear the timer in the earlier timer. + @OnTimer(clearTimerId) + public void clearTimer(OutputReceiver r) { + r.output(43L); + } + }; + + TestStream> stream = + TestStream.create(KvCoder.of(StringUtf8Coder.of(), VarLongCoder.of())) + .advanceWatermarkTo(new Instant(0)) + .addElements(KV.of("hello", 37L)) + .advanceWatermarkTo(new Instant(0).plus(Duration.standardSeconds(1))) + .advanceWatermarkToInfinity(); + + PCollection output = pipeline.apply(stream).apply(ParDo.of(fn)); + PAssert.that(output).containsInAnyOrder(3L, 42L); + pipeline.run(); + } + + @Test + @Category({ + NeedsRunner.class, + UsesTimersInParDo.class, + UsesTestStream.class, + UsesTestStreamWithProcessingTime.class + }) + public void testSetProcessingTimerAfterClear() { + final String timerId = "processing-timer"; + + DoFn, Long> fn = + new DoFn, Long>() { + + @TimerId(timerId) + private final TimerSpec spec = TimerSpecs.timer(TimeDomain.PROCESSING_TIME); + + @ProcessElement + public void processElement( + @Element KV e, + @TimerId(timerId) Timer timer, + OutputReceiver r) { + timer.clear(); + timer.offset(Duration.standardSeconds(1)).setRelative(); + r.output(3L); + } + + @OnTimer(timerId) + public void onTimer(TimeDomain timeDomain, OutputReceiver r) { + r.output(42L); + } + }; + + TestStream> stream = + TestStream.create(KvCoder.of(StringUtf8Coder.of(), VarLongCoder.of())) + .addElements(KV.of("hello", 37L), KV.of("hello", 38L)) + .advanceProcessingTime( + Duration.millis( + DateTimeUtils.currentTimeMillis() / 1000 * 1000) // round to seconds + .plus(Duration.standardMinutes(2))) + .advanceWatermarkToInfinity(); + + PCollection output = pipeline.apply(stream).apply(ParDo.of(fn)); + PAssert.that(output).containsInAnyOrder(3L, 3L, 42L); + pipeline.run(); + } + + @Test + @Category({NeedsRunner.class, UsesTimersInParDo.class, UsesTestStream.class}) + public void testSetEventTimerAfterClear() { + 
final String timerId = "event-timer"; + + DoFn, Long> fn = + new DoFn, Long>() { + + @TimerId(timerId) + private final TimerSpec spec = TimerSpecs.timer(TimeDomain.EVENT_TIME); + + @ProcessElement + public void processElement(@TimerId(timerId) Timer timer, OutputReceiver r) { + timer.clear(); + timer.offset(Duration.standardSeconds(1)).setRelative(); + r.output(3L); + } + + @OnTimer(timerId) + public void onTimer(OutputReceiver r) { + r.output(42L); + } + }; + + TestStream> stream = + TestStream.create(KvCoder.of(StringUtf8Coder.of(), VarLongCoder.of())) + .advanceWatermarkTo(new Instant(0)) + .addElements(KV.of("hello", 37L), KV.of("hello", 38L)) + .advanceWatermarkTo(new Instant(0).plus(Duration.standardSeconds(1))) + .advanceWatermarkToInfinity(); + + PCollection output = pipeline.apply(stream).apply(ParDo.of(fn)); + PAssert.that(output).containsInAnyOrder(3L, 3L, 42L); + pipeline.run(); + } + } + + /** Tests validating Timer coder inference behaviors. */ + @RunWith(JUnit4.class) + public static class TimerCoderInferenceTests extends SharedTestBase implements Serializable { + @Test + @Category({ValidatesRunner.class, UsesStatefulParDo.class}) + public void testValueStateCoderInference() { + final String stateId = "foo"; + MyIntegerCoder myIntegerCoder = MyIntegerCoder.of(); + pipeline.getCoderRegistry().registerCoderForClass(MyInteger.class, myIntegerCoder); + + DoFn, MyInteger> fn = + new DoFn, MyInteger>() { + + @StateId(stateId) + private final StateSpec> intState = StateSpecs.value(); + + @ProcessElement + public void processElement( + ProcessContext c, + @StateId(stateId) ValueState state, + OutputReceiver r) { + MyInteger currentValue = MoreObjects.firstNonNull(state.read(), new MyInteger(0)); r.output(currentValue); state.write(new MyInteger(currentValue.getValue() + 1)); } @@ -5269,42 +5764,43 @@ public void onTimer2(@Timestamp Instant ts, OutputReceiver r) { @Category({ ValidatesRunner.class, UsesTimersInParDo.class, + UsesTestStream.class, UsesTestStreamWithProcessingTime.class, UsesTimerMap.class }) public void testTimerFamilyProcessingTime() throws Exception { final String timerId = "foo"; - DoFn, Integer> fn = - new DoFn, Integer>() { + DoFn, Long> fn = + new DoFn, Long>() { @TimerFamily(timerId) private final TimerSpec spec = TimerSpecs.timerMap(TimeDomain.PROCESSING_TIME); @ProcessElement public void processElement( - @TimerFamily(timerId) TimerMap timerMap, OutputReceiver r) { + @TimerFamily(timerId) TimerMap timerMap, OutputReceiver r) { Timer timer = timerMap.get("timerId1"); timer.offset(Duration.standardSeconds(1)).setRelative(); - r.output(3); + r.output(3L); } @OnTimerFamily(timerId) - public void onTimer(TimeDomain timeDomain, OutputReceiver r) { + public void onTimer(TimeDomain timeDomain, OutputReceiver r) { if (timeDomain.equals(TimeDomain.PROCESSING_TIME)) { - r.output(42); + r.output(42L); } } }; - TestStream> stream = - TestStream.create(KvCoder.of(StringUtf8Coder.of(), VarIntCoder.of())) - .addElements(KV.of("hello", 37)) + TestStream> stream = + TestStream.create(KvCoder.of(StringUtf8Coder.of(), VarLongCoder.of())) + .addElements(KV.of("hello", 37L)) .advanceProcessingTime(Duration.standardSeconds(2)) .advanceWatermarkToInfinity(); - PCollection output = pipeline.apply(stream).apply(ParDo.of(fn)); - PAssert.that(output).containsInAnyOrder(3, 42); + PCollection output = pipeline.apply(stream).apply(ParDo.of(fn)); + PAssert.that(output).containsInAnyOrder(3L, 42L); pipeline.run(); } } diff --git 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/PartitionTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/PartitionTest.java index 2c3bbf9918fb..7f16cc05b671 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/PartitionTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/PartitionTest.java @@ -44,9 +44,6 @@ /** Tests for {@link Partition}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PartitionTest implements Serializable { @Rule public final transient TestPipeline pipeline = TestPipeline.create(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/PeriodicImpulseTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/PeriodicImpulseTest.java index 2e01cd7b5e60..2ffff0e81746 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/PeriodicImpulseTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/PeriodicImpulseTest.java @@ -23,6 +23,8 @@ import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.testing.UsesImpulse; import org.apache.beam.sdk.testing.UsesStatefulParDo; +import org.apache.beam.sdk.testing.UsesUnboundedPCollections; +import org.apache.beam.sdk.testing.UsesUnboundedSplittableParDo; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollection; import org.joda.time.Duration; @@ -51,6 +53,8 @@ public void processElement(DoFn>.ProcessContext c) NeedsRunner.class, UsesImpulse.class, UsesStatefulParDo.class, + UsesUnboundedPCollections.class, + UsesUnboundedSplittableParDo.class }) public void testOutputsProperElements() { Instant instant = Instant.now(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/PeriodicSequenceTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/PeriodicSequenceTest.java index cd6bc01d95d9..a426635511bc 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/PeriodicSequenceTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/PeriodicSequenceTest.java @@ -23,6 +23,8 @@ import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.testing.UsesImpulse; import org.apache.beam.sdk.testing.UsesStatefulParDo; +import org.apache.beam.sdk.testing.UsesUnboundedPCollections; +import org.apache.beam.sdk.testing.UsesUnboundedSplittableParDo; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollection; import org.joda.time.Duration; @@ -51,6 +53,8 @@ public void processElement(DoFn>.ProcessContext c) NeedsRunner.class, UsesImpulse.class, UsesStatefulParDo.class, + UsesUnboundedPCollections.class, + UsesUnboundedSplittableParDo.class }) public void testOutputsProperElements() { Instant instant = Instant.now(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ReifyTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ReifyTest.java index f5629e2fc806..41c300a01a05 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ReifyTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ReifyTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.transforms; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.io.Serializable; import org.apache.beam.sdk.coders.StringUtf8Coder; diff --git 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ReifyTimestampsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ReifyTimestampsTest.java index 16c0acafb3e9..c76fb72325f7 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ReifyTimestampsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ReifyTimestampsTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.transforms; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.io.Serializable; import org.apache.beam.sdk.testing.PAssert; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ReshuffleTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ReshuffleTest.java index 55c62cbec63c..0a1dbff06f5f 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ReshuffleTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ReshuffleTest.java @@ -18,13 +18,15 @@ package org.apache.beam.sdk.transforms; import static org.apache.beam.sdk.TestUtils.KvMatcher.isKv; +import static org.apache.beam.sdk.values.TypeDescriptors.integers; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.collection.IsIterableContainingInAnyOrder.containsInAnyOrder; import static org.hamcrest.core.Is.is; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import java.io.Serializable; +import java.util.List; import org.apache.beam.sdk.coders.KvCoder; import org.apache.beam.sdk.coders.StringUtf8Coder; import org.apache.beam.sdk.coders.VarIntCoder; @@ -34,6 +36,7 @@ import org.apache.beam.sdk.testing.TestStream; import org.apache.beam.sdk.testing.UsesTestStream; import org.apache.beam.sdk.testing.ValidatesRunner; +import org.apache.beam.sdk.transforms.Reshuffle.AssignShardFn; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.transforms.windowing.FixedWindows; import org.apache.beam.sdk.transforms.windowing.GlobalWindow; @@ -45,6 +48,7 @@ import org.apache.beam.sdk.values.TypeDescriptors; import org.apache.beam.sdk.values.WindowingStrategy; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; import org.joda.time.Duration; import org.joda.time.Instant; import org.junit.Rule; @@ -274,4 +278,27 @@ public void testReshuffleWithTimestampsStreaming() { pipeline.run(); } + + @Test + @Category({ValidatesRunner.class}) + public void testAssignShardFn() { + List<KV<String, Integer>> inputKvs = Lists.newArrayList(); + for (int i = 0; i < 10; i++) { + inputKvs.addAll(ARBITRARY_KVS); + } + + PCollection<KV<String, Integer>> input = + pipeline.apply( + Create.of(inputKvs).withCoder(KvCoder.of(StringUtf8Coder.of(), VarIntCoder.of()))); + + PCollection<Integer> output = + input + .apply(ParDo.of(new AssignShardFn<>(2))) + .apply(GroupByKey.create()) + .apply(MapElements.into(integers()).via(KV::getKey)); + + PAssert.that(output).containsInAnyOrder(ImmutableList.of(0, 1)); + + pipeline.run(); + } } diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SampleTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SampleTest.java index 4ac39f9e637a..9f7e89f7ae44 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SampleTest.java +++ 
b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SampleTest.java @@ -62,9 +62,6 @@ import org.junit.runners.Parameterized; /** Tests for Sample transform. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SampleTest { private static final Integer[] EMPTY = new Integer[] {}; private static final Integer[] DATA = new Integer[] {1, 2, 3, 4, 5}; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SetsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SetsTest.java index b1fd4e60895e..ed07a65b0219 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SetsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SetsTest.java @@ -36,9 +36,6 @@ import org.junit.runners.JUnit4; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SetsTest { @Rule public final TestPipeline p = TestPipeline.create(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SimpleFunctionTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SimpleFunctionTest.java index bac169ea1ff5..ef2e155b03c9 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SimpleFunctionTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SimpleFunctionTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.transforms; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import org.apache.beam.sdk.values.TypeDescriptors; import org.junit.Rule; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SimpleStatsFnsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SimpleStatsFnsTest.java index 8da4c637c369..b04beb47b49e 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SimpleStatsFnsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SimpleStatsFnsTest.java @@ -29,9 +29,6 @@ /** Tests of Min, Max, Mean, and Sum. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SimpleStatsFnsTest { static final double DOUBLE_COMPARISON_ACCURACY = 1e-7; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java index aa2e83dbfbc3..f2e39ab64372 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java @@ -85,7 +85,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class SplittableDoFnTest implements Serializable { diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java index 5f210833a665..2dcce79ccf26 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java @@ -44,6 +44,7 @@ import org.apache.beam.sdk.coders.NullableCoder; import org.apache.beam.sdk.coders.StringUtf8Coder; import org.apache.beam.sdk.coders.VarIntCoder; +import org.apache.beam.sdk.coders.VarLongCoder; import org.apache.beam.sdk.coders.VoidCoder; import org.apache.beam.sdk.testing.NeedsRunner; import org.apache.beam.sdk.testing.PAssert; @@ -55,7 +56,6 @@ import org.apache.beam.sdk.transforms.windowing.FixedWindows; import org.apache.beam.sdk.transforms.windowing.GlobalWindows; import org.apache.beam.sdk.transforms.windowing.IntervalWindow; -import org.apache.beam.sdk.transforms.windowing.InvalidWindows; import org.apache.beam.sdk.transforms.windowing.Window; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PBegin; @@ -81,9 +81,6 @@ */ @RunWith(JUnit4.class) @Category(UsesSideInputs.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ViewTest implements Serializable { // This test is Serializable, just so that it's easy to have // anonymous inner classes inside the non-static test methods. 
@@ -160,30 +157,30 @@ public void processElement(ProcessContext c) { @Test @Category({ValidatesRunner.class, UsesTestStream.class}) public void testWindowedSideInputNotPresent() { - PCollection> input = + PCollection> input = pipeline.apply( - TestStream.create(KvCoder.of(VarIntCoder.of(), VarIntCoder.of())) + TestStream.create(KvCoder.of(VarLongCoder.of(), VarLongCoder.of())) .advanceWatermarkTo(new Instant(0)) - .addElements(TimestampedValue.of(KV.of(1000, 1000), new Instant(1000))) + .addElements(TimestampedValue.of(KV.of(1000L, 1000L), new Instant(1000L))) .advanceWatermarkTo(new Instant(20000)) .advanceWatermarkToInfinity()); - final PCollectionView view = + final PCollectionView view = input .apply(Values.create()) .apply("SideWindowInto", Window.into(FixedWindows.of(Duration.standardSeconds(100)))) - .apply("ViewCombine", Combine.globally(Sum.ofIntegers()).withoutDefaults()) + .apply("ViewCombine", Combine.globally(Sum.ofLongs()).withoutDefaults()) .apply("Rewindow", Window.into(FixedWindows.of(Duration.standardSeconds(10)))) - .apply(View.asSingleton().withDefaultValue(0)); + .apply(View.asSingleton().withDefaultValue(0L)); - PCollection output = + PCollection output = input .apply("MainWindowInto", Window.into(FixedWindows.of(Duration.standardSeconds(10)))) .apply(GroupByKey.create()) .apply( "OutputSideInputs", ParDo.of( - new DoFn>, Integer>() { + new DoFn>, Long>() { @ProcessElement public void processElement(ProcessContext c) { c.output(c.sideInput(view)); @@ -193,7 +190,7 @@ public void processElement(ProcessContext c) { PAssert.that(output) .inWindow(new IntervalWindow(new Instant(0), new Instant(10000))) - .containsInAnyOrder(0); + .containsInAnyOrder(0L); pipeline.run(); } @@ -1616,22 +1613,6 @@ public PCollection> expand(PBegin input) { .apply(view); } - private void testViewNonmerging( - Pipeline pipeline, - PTransform>, ? 
extends PCollectionView> view) { - thrown.expect(IllegalStateException.class); - thrown.expectMessage("Unable to create a side-input view from input"); - thrown.expectCause( - ThrowableMessageMatcher.hasMessage(Matchers.containsString("Consumed by GroupByKey"))); - pipeline - .apply(Create.of(KV.of("hello", 5))) - .apply( - Window.into( - new InvalidWindows<>( - "Consumed by GroupByKey", FixedWindows.of(Duration.standardHours(1))))) - .apply(view); - } - @Test public void testViewUnboundedAsSingletonDirect() { testViewUnbounded(pipeline, View.asSingleton()); @@ -1656,29 +1637,4 @@ public void testViewUnboundedAsMapDirect() { public void testViewUnboundedAsMultimapDirect() { testViewUnbounded(pipeline, View.asMultimap()); } - - @Test - public void testViewNonmergingAsSingletonDirect() { - testViewNonmerging(pipeline, View.asSingleton()); - } - - @Test - public void testViewNonmergingAsIterableDirect() { - testViewNonmerging(pipeline, View.asIterable()); - } - - @Test - public void testViewNonmergingAsListDirect() { - testViewNonmerging(pipeline, View.asList()); - } - - @Test - public void testViewNonmergingAsMapDirect() { - testViewNonmerging(pipeline, View.asMap()); - } - - @Test - public void testViewNonmergingAsMultimapDirect() { - testViewNonmerging(pipeline, View.asMultimap()); - } } diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/WaitTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/WaitTest.java index 1425e5acc7c8..914d503eefc2 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/WaitTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/WaitTest.java @@ -52,9 +52,6 @@ /** Tests for {@link Wait}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class WaitTest implements Serializable { @Rule public transient TestPipeline p = TestPipeline.create(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/WatchTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/WatchTest.java index 0e3f96a36f7a..81873bea3034 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/WatchTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/WatchTest.java @@ -31,6 +31,7 @@ import static org.junit.Assert.assertNull; import static org.junit.Assert.assertTrue; import static org.junit.Assert.fail; +import static org.mockito.Mockito.mock; import java.io.IOException; import java.io.Serializable; @@ -48,6 +49,7 @@ import org.apache.beam.sdk.testing.PAssert; import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.testing.UsesUnboundedSplittableParDo; +import org.apache.beam.sdk.transforms.DoFn.ProcessContinuation; import org.apache.beam.sdk.transforms.Watch.Growth; import org.apache.beam.sdk.transforms.Watch.Growth.PollFn; import org.apache.beam.sdk.transforms.Watch.Growth.PollResult; @@ -55,6 +57,10 @@ import org.apache.beam.sdk.transforms.Watch.GrowthTracker; import org.apache.beam.sdk.transforms.Watch.NonPollingGrowthState; import org.apache.beam.sdk.transforms.Watch.PollingGrowthState; +import org.apache.beam.sdk.transforms.Watch.WatchGrowthFn; +import org.apache.beam.sdk.transforms.splittabledofn.ManualWatermarkEstimator; +import org.apache.beam.sdk.transforms.splittabledofn.WatermarkEstimators; +import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollection; import 
org.apache.beam.sdk.values.PCollectionView; @@ -81,9 +87,6 @@ /** Tests for {@link Watch}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class WatchTest implements Serializable { @Rule public transient TestPipeline p = TestPipeline.create(); @@ -541,6 +544,38 @@ private static GrowthTracker newPollingGrowthTracker() { return newTracker(PollingGrowthState.of(never().forNewInput(Instant.now(), null))); } + @Test + public void testPollingGrowthTrackerUsesElementTimestampIfNoWatermarkProvided() throws Exception { + Instant now = Instant.now(); + Watch.Growth growth = + Watch.growthOf( + new Watch.Growth.PollFn() { + + @Override + public PollResult apply(String element, Context c) throws Exception { + // We specifically test an unsorted list. + return PollResult.incomplete( + Arrays.asList( + TimestampedValue.of("d", now.plus(standardSeconds(4))), + TimestampedValue.of("c", now.plus(standardSeconds(3))), + TimestampedValue.of("a", now.plus(standardSeconds(1))), + TimestampedValue.of("b", now.plus(standardSeconds(2))))); + } + }) + .withPollInterval(standardSeconds(10)); + WatchGrowthFn growthFn = + new WatchGrowthFn( + growth, StringUtf8Coder.of(), SerializableFunctions.identity(), StringUtf8Coder.of()); + GrowthTracker tracker = newPollingGrowthTracker(); + DoFn.ProcessContext context = mock(DoFn.ProcessContext.class); + ManualWatermarkEstimator watermarkEstimator = + new WatermarkEstimators.Manual(BoundedWindow.TIMESTAMP_MIN_VALUE); + ProcessContinuation processContinuation = + growthFn.process(context, tracker, watermarkEstimator); + assertEquals(now.plus(standardSeconds(1)), watermarkEstimator.currentWatermark()); + assertTrue(processContinuation.shouldResume()); + } + @Test public void testPollingGrowthTrackerCheckpointNonEmpty() { Instant now = Instant.now(); @@ -614,6 +649,42 @@ public void testPollingGrowthTrackerHashAlreadyClaimed() { assertFalse(newTracker(residual).tryClaim(KV.of(claim, 2))); } + @Test + public void testNonPollingGrowthTrackerIgnoresWatermark() throws Exception { + Instant now = Instant.now(); + PollResult claim = + PollResult.incomplete( + Arrays.asList( + TimestampedValue.of("d", now.plus(standardSeconds(4))), + TimestampedValue.of("c", now.plus(standardSeconds(3))), + TimestampedValue.of("a", now.plus(standardSeconds(1))), + TimestampedValue.of("b", now.plus(standardSeconds(2))))) + .withWatermark(now.plus(standardSeconds(7))); + + Watch.Growth growth = + Watch.growthOf( + new Watch.Growth.PollFn() { + + @Override + public PollResult apply(String element, Context c) throws Exception { + fail("Never expected to be invoked for NonPollingGrowthState."); + return null; + } + }) + .withPollInterval(standardSeconds(10)); + GrowthTracker tracker = newTracker(NonPollingGrowthState.of(claim)); + WatchGrowthFn growthFn = + new WatchGrowthFn( + growth, StringUtf8Coder.of(), SerializableFunctions.identity(), StringUtf8Coder.of()); + DoFn.ProcessContext context = mock(DoFn.ProcessContext.class); + ManualWatermarkEstimator watermarkEstimator = + new WatermarkEstimators.Manual(BoundedWindow.TIMESTAMP_MIN_VALUE); + ProcessContinuation processContinuation = + growthFn.process(context, tracker, watermarkEstimator); + assertEquals(BoundedWindow.TIMESTAMP_MIN_VALUE, watermarkEstimator.currentWatermark()); + assertFalse(processContinuation.shouldResume()); + } + @Test public void testNonPollingGrowthTrackerCheckpointNonEmpty() { Instant now = Instant.now(); diff --git 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/WithFailuresTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/WithFailuresTest.java index 095f72ae259c..68bef75e1a39 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/WithFailuresTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/WithFailuresTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.sdk.transforms; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.hasKey; import static org.hamcrest.Matchers.hasSize; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import java.io.Serializable; import java.util.ArrayList; @@ -30,6 +30,8 @@ import org.apache.beam.sdk.testing.PAssert; import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.transforms.WithFailures.ExceptionAsMapHandler; +import org.apache.beam.sdk.transforms.WithFailures.ThrowableHandler; +import org.apache.beam.sdk.values.EncodableThrowable; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionList; @@ -43,13 +45,39 @@ /** Tests for {@link WithFailures}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class WithFailuresTest implements Serializable { @Rule public final transient TestPipeline pipeline = TestPipeline.create(); + @Test + @Category(NeedsRunner.class) + public void testDirectException() { + List<PCollection<KV<Integer, EncodableThrowable>>> errorCollections = new ArrayList<>(); + PCollection<Integer> output = + pipeline + .apply(Create.of(0, 1)) + .apply( + MapElements.into(TypeDescriptors.integers()) + .via((Integer i) -> 1 / i) + .exceptionsVia(new ThrowableHandler<Integer>() {})) + .failuresTo(errorCollections); + + PAssert.that(output).containsInAnyOrder(1); + + PAssert.thatSingleton(PCollectionList.of(errorCollections).apply(Flatten.pCollections())) + .satisfies( + kv -> { + assertEquals(Integer.valueOf(0), kv.getKey()); + + Throwable throwable = kv.getValue().throwable(); + assertEquals("java.lang.ArithmeticException", throwable.getClass().getName()); + assertEquals("/ by zero", throwable.getMessage()); + return null; + }); + + pipeline.run(); + } + /** Test of {@link WithFailures.Result#failuresTo(List)}. */ @Test @Category(NeedsRunner.class) diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/WithTimestampsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/WithTimestampsTest.java index 5e3930e5490c..93b0c48b797b 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/WithTimestampsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/WithTimestampsTest.java @@ -38,9 +38,6 @@ /** Tests for {@link WithTimestamps}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class WithTimestampsTest implements Serializable { @Rule public final transient TestPipeline p = TestPipeline.create(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/display/DisplayDataEvaluator.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/display/DisplayDataEvaluator.java index 191d28e512ea..5cdfc41e5e06 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/display/DisplayDataEvaluator.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/display/DisplayDataEvaluator.java @@ -36,9 +36,6 @@ /** * Test utilities to evaluate the {@link DisplayData} in the context of a {@link PipelineRunner}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DisplayDataEvaluator { private final PipelineOptions options; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/display/DisplayDataEvaluatorTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/display/DisplayDataEvaluatorTest.java index d5faa29e4621..0f14b46f975b 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/display/DisplayDataEvaluatorTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/display/DisplayDataEvaluatorTest.java @@ -18,9 +18,9 @@ package org.apache.beam.sdk.transforms.display; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasDisplayItem; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.hasItem; import static org.hamcrest.Matchers.not; -import static org.junit.Assert.assertThat; import java.io.Serializable; import java.util.Set; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/display/DisplayDataMatchers.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/display/DisplayDataMatchers.java index 5ee2b692082c..de5d0ab1d338 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/display/DisplayDataMatchers.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/display/DisplayDataMatchers.java @@ -34,9 +34,6 @@ import org.joda.time.Instant; /** Hamcrest matcher for making assertions on {@link DisplayData} instances. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DisplayDataMatchers { /** Do not instantiate. 
*/ private DisplayDataMatchers() {} diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/display/DisplayDataMatchersTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/display/DisplayDataMatchersTest.java index de68857d6194..4aa6e6913ceb 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/display/DisplayDataMatchersTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/display/DisplayDataMatchersTest.java @@ -23,9 +23,9 @@ import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasType; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasValue; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.includesDisplayDataFor; +import static org.hamcrest.MatcherAssert.assertThat; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.transforms.display.DisplayData.Builder; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/display/DisplayDataTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/display/DisplayDataTest.java index 22374456716b..8a9c64e35db9 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/display/DisplayDataTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/display/DisplayDataTest.java @@ -25,6 +25,7 @@ import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasType; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasValue; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.includesDisplayDataFor; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.allOf; import static org.hamcrest.Matchers.empty; import static org.hamcrest.Matchers.everyItem; @@ -39,7 +40,6 @@ import static org.hamcrest.Matchers.nullValue; import static org.hamcrest.Matchers.startsWith; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import com.fasterxml.jackson.databind.JsonNode; @@ -79,9 +79,6 @@ /** Tests for {@link DisplayData} class. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DisplayDataTest implements Serializable { @Rule public transient ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/join/CoGbkResultTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/join/CoGbkResultTest.java index bb49dd3c5b11..059899936d49 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/join/CoGbkResultTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/join/CoGbkResultTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.sdk.transforms.join; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.emptyIterable; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.util.ArrayList; import java.util.List; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/join/CoGroupByKeyTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/join/CoGroupByKeyTest.java index 17270a53ed4b..31ee0e08c1b9 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/join/CoGroupByKeyTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/join/CoGroupByKeyTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.sdk.transforms.join; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.collection.IsIterableContainingInAnyOrder.containsInAnyOrder; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import java.io.Serializable; import java.util.ArrayList; @@ -56,9 +56,6 @@ /** Tests for CoGroupByKeyTest. Implements Serializable for anonymous DoFns. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class CoGroupByKeyTest implements Serializable { /** diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/join/UnionCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/join/UnionCoderTest.java index 7a764e8dda28..681f41899bc0 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/join/UnionCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/join/UnionCoderTest.java @@ -17,7 +17,7 @@ */ package org.apache.beam.sdk.transforms.join; -import static org.junit.Assert.assertThat; +import static org.hamcrest.MatcherAssert.assertThat; import org.apache.beam.sdk.coders.DoubleCoder; import org.apache.beam.sdk.coders.StringUtf8Coder; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/DoFnInvokersTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/DoFnInvokersTest.java index 972e33aabdef..7beb9ebb0927 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/DoFnInvokersTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/DoFnInvokersTest.java @@ -20,12 +20,13 @@ import static org.apache.beam.sdk.transforms.DoFn.ProcessContinuation.resume; import static org.apache.beam.sdk.transforms.DoFn.ProcessContinuation.stop; import static org.hamcrest.CoreMatchers.instanceOf; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertNull; import static org.junit.Assert.assertSame; -import static org.junit.Assert.assertThat; +import static org.junit.Assert.assertThrows; import static org.junit.Assert.fail; import static org.mockito.ArgumentMatchers.any; import static org.mockito.Matchers.eq; @@ -48,6 +49,7 @@ import org.apache.beam.sdk.coders.InstantCoder; import org.apache.beam.sdk.coders.VarIntCoder; import org.apache.beam.sdk.coders.VoidCoder; +import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.state.StateSpec; import org.apache.beam.sdk.state.StateSpecs; import org.apache.beam.sdk.state.TimeDomain; @@ -65,6 +67,8 @@ import org.apache.beam.sdk.transforms.splittabledofn.HasDefaultTracker; import org.apache.beam.sdk.transforms.splittabledofn.HasDefaultWatermarkEstimator; import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker; +import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker.HasProgress; +import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker.Progress; import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker.TruncateResult; import org.apache.beam.sdk.transforms.splittabledofn.SplitResult; import org.apache.beam.sdk.transforms.splittabledofn.WatermarkEstimator; @@ -88,7 +92,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class DoFnInvokersTest { @Rule public ExpectedException thrown = ExpectedException.none(); @@ -103,6 +106,7 @@ public class DoFnInvokersTest { @Mock private IntervalWindow mockWindow; // @Mock private PaneInfo mockPaneInfo; @Mock private DoFnInvoker.ArgumentProvider mockArgumentProvider; + @Mock private PipelineOptions mockOptions; @Before public void setUp() { 
@@ -117,6 +121,7 @@ public void setUp() { when(mockArgumentProvider.outputReceiver(Matchers.any())).thenReturn(mockOutputReceiver); when(mockArgumentProvider.taggedOutputReceiver(Matchers.any())) .thenReturn(mockMultiOutputReceiver); + when(mockArgumentProvider.pipelineOptions()).thenReturn(mockOptions); when(mockArgumentProvider.startBundleContext(Matchers.any())) .thenReturn(mockStartBundleContext); when(mockArgumentProvider.finishBundleContext(Matchers.any())) @@ -358,6 +363,7 @@ public SomeRestrictionTracker newTracker(@Restriction SomeRestriction restrictio @Test public void testDoFnWithStartBundleSetupTeardown() throws Exception { + when(mockArgumentProvider.pipelineOptions()).thenReturn(mockOptions); when(mockArgumentProvider.startBundleContext(any(DoFn.class))) .thenReturn(mockStartBundleContext); when(mockArgumentProvider.finishBundleContext(any(DoFn.class))) @@ -373,19 +379,20 @@ public void startBundle(StartBundleContext c) {} public void finishBundle(FinishBundleContext c) {} @Setup - public void before() {} + public void before(PipelineOptions options) {} @Teardown public void after() {} } MockFn fn = mock(MockFn.class); + DoFnInvoker invoker = DoFnInvokers.invokerFor(fn); - invoker.invokeSetup(); + invoker.invokeSetup(mockArgumentProvider); invoker.invokeStartBundle(mockArgumentProvider); invoker.invokeFinishBundle(mockArgumentProvider); invoker.invokeTeardown(); - verify(fn).before(); + verify(fn).before(mockOptions); verify(fn).startBundle(mockStartBundleContext); verify(fn).finishBundle(mockFinishBundleContext); verify(fn).after(); @@ -439,6 +446,11 @@ public SomeRestrictionTracker newTracker(@Restriction SomeRestriction restrictio return null; } + @GetSize + public double getSize() { + return 2.0; + } + @GetRestrictionCoder public SomeRestrictionCoder getRestrictionCoder() { return null; @@ -498,6 +510,7 @@ public void splitRestriction( when(fn.newTracker(restriction)).thenReturn(tracker); when(fn.newWatermarkEstimator(watermarkEstimatorState)).thenReturn(watermarkEstimator); when(fn.processElement(mockProcessContext, tracker, watermarkEstimator)).thenReturn(resume()); + when(fn.getSize()).thenReturn(2.0); assertEquals(coder, invoker.invokeGetRestrictionCoder(CoderRegistry.createDefault())); assertEquals( @@ -587,6 +600,7 @@ public WatermarkEstimator watermarkEstimator() { return watermarkEstimator; } })); + assertEquals(2.0, invoker.invokeGetSize(mockArgumentProvider), 0.0001); } private static class RestrictionWithBoundedDefaultTracker @@ -985,6 +999,77 @@ public RestrictionWithBoundedDefaultTracker getInitialRestriction(@Element Strin assertNull(invoker.invokeGetInitialWatermarkEstimatorState(new FakeArgumentProvider<>())); } + @Test + public void testDefaultGetSizeWithoutHasProgress() throws Exception { + class MockFn extends DoFn { + @ProcessElement + public void processElement( + ProcessContext c, + RestrictionTracker tracker) {} + + @GetInitialRestriction + public RestrictionWithBoundedDefaultTracker getInitialRestriction(@Element String element) { + return null; + } + } + + MockFn fn = mock(MockFn.class); + DoFnInvoker invoker = DoFnInvokers.invokerFor(fn); + assertEquals(1.0, invoker.invokeGetSize(mockArgumentProvider), 0.0001); + } + + @Test + public void testDefaultGetSizeWithHasProgress() throws Exception { + class MockFn extends DoFn { + @ProcessElement + public void processElement( + ProcessContext c, + RestrictionTracker tracker) {} + + @GetInitialRestriction + public RestrictionWithBoundedDefaultTracker getInitialRestriction(@Element String element) { + 
return null; + } + } + + abstract class HasProgressRestrictionTracker extends SomeRestrictionTracker + implements HasProgress {} + HasProgressRestrictionTracker tracker = mock(HasProgressRestrictionTracker.class); + when(tracker.getProgress()).thenReturn(Progress.from(3.0, 4.0)); + + when(mockArgumentProvider.restrictionTracker()).thenReturn((RestrictionTracker) tracker); + + MockFn fn = mock(MockFn.class); + DoFnInvoker invoker = DoFnInvokers.invokerFor(fn); + assertEquals(4.0, invoker.invokeGetSize(mockArgumentProvider), 0.0001); + } + + @Test + public void testGetSize() throws Exception { + abstract class MockFn extends DoFn { + @ProcessElement + public abstract void processElement( + ProcessContext c, RestrictionTracker tracker); + + @GetInitialRestriction + public abstract RestrictionWithBoundedDefaultTracker getInitialRestriction( + @Element String element); + + @GetSize + public abstract double getSize(); + } + + MockFn fn = mock(MockFn.class); + when(fn.getSize()).thenReturn(5.0, -3.0); + + DoFnInvoker invoker = DoFnInvokers.invokerFor(fn); + assertEquals(5.0, invoker.invokeGetSize(mockArgumentProvider), 0.0001); + assertThrows( + "Expected size >= 0 but received", + IllegalArgumentException.class, + () -> invoker.invokeGetSize(mockArgumentProvider)); + } + // --------------------------------------------------------------------------------------- // Tests for ability to invoke @OnTimer for private, inner and anonymous classes. // --------------------------------------------------------------------------------------- diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/DoFnSignaturesSplittableDoFnTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/DoFnSignaturesSplittableDoFnTest.java index 6f88cfed4684..3b3c6fa44f5d 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/DoFnSignaturesSplittableDoFnTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/DoFnSignaturesSplittableDoFnTest.java @@ -71,7 +71,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class DoFnSignaturesSplittableDoFnTest { @Rule public ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/DoFnSignaturesTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/DoFnSignaturesTest.java index cd0b44dcdf55..4bd18fae8b11 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/DoFnSignaturesTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/DoFnSignaturesTest.java @@ -18,6 +18,7 @@ package org.apache.beam.sdk.transforms.reflect; import static org.apache.beam.sdk.transforms.reflect.DoFnSignaturesTestUtils.errors; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.anyOf; import static org.hamcrest.Matchers.containsString; import static org.hamcrest.Matchers.equalTo; @@ -27,7 +28,6 @@ import static org.hamcrest.Matchers.notNullValue; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.junit.Assert.fail; @@ -94,9 +94,6 @@ /** Tests for {@link DoFnSignatures}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DoFnSignaturesTest { @Rule public ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/OnTimerInvokersTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/OnTimerInvokersTest.java index 7263cd2c31d4..e45e956b1814 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/OnTimerInvokersTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/OnTimerInvokersTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.sdk.transforms.reflect; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.theInstance; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.when; import org.apache.beam.sdk.state.TimeDomain; @@ -39,9 +39,6 @@ /** Tests for {@link DoFnInvokers}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class OnTimerInvokersTest { @Rule public ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/testhelper/DoFnInvokersTestHelper.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/testhelper/DoFnInvokersTestHelper.java index c13797a3f05c..05f066cfb743 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/testhelper/DoFnInvokersTestHelper.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/testhelper/DoFnInvokersTestHelper.java @@ -32,9 +32,6 @@ * Test helper for {@link DoFnInvokersTest}, which needs to test package-private access to DoFns in * other packages. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DoFnInvokersTestHelper { private static class StaticPrivateDoFn extends DoFn { diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/resourcehints/ResourceHintsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/resourcehints/ResourceHintsTest.java new file mode 100644 index 000000000000..3cc522176374 --- /dev/null +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/resourcehints/ResourceHintsTest.java @@ -0,0 +1,101 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.transforms.resourcehints; + +import static junit.framework.TestCase.assertEquals; + +import java.io.Serializable; +import java.nio.charset.StandardCharsets; +import org.apache.beam.sdk.options.PipelineOptionsFactory; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.ExpectedException; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Tests for {@link ResourceHints} class. */ +@RunWith(JUnit4.class) +public class ResourceHintsTest implements Serializable { + + @Rule public final ExpectedException thrown = ExpectedException.none(); + + private void verifyMinRamHintHelper(String hint, String expectedByteString) { + assertEquals( + expectedByteString, + new String( + ResourceHints.create() + .withMinRam(hint) + .hints() + .get("beam:resources:min_ram_bytes:v1") + .toBytes(), + StandardCharsets.US_ASCII)); + } + + @Test + public void testMinRamHintParsesCorrectly() { + verifyMinRamHintHelper("123B", "123"); + verifyMinRamHintHelper("123 B", "123"); + verifyMinRamHintHelper("10 GiB", String.valueOf(1024L * 1024 * 1024 * 10)); + verifyMinRamHintHelper("10.5 GB", "10500000000"); + } + + @Test + public void testMinRamStringHintDoesNotParseWhenNoUnitsSpecified() { + thrown.expect(IllegalArgumentException.class); + verifyMinRamHintHelper("10", null); + } + + @Test + public void testMinRamStringHintDoesNotParseWhenUnknownUnits() { + thrown.expect(IllegalArgumentException.class); + verifyMinRamHintHelper("10 BB", null); + } + + @Test + public void testAcceleratorHintParsesCorrectly() { + assertEquals( + "some_gpu", + new String( + ResourceHints.create() + .withAccelerator("some_gpu") + .hints() + .get("beam:resources:accelerator:v1") + .toBytes(), + StandardCharsets.US_ASCII)); + } + + @Test + public void testFromOptions() { + ResourceHintsOptions options = + PipelineOptionsFactory.fromArgs( + "--resourceHints=minRam=1KB", "--resourceHints=beam:resources:bar=foo") + .as(ResourceHintsOptions.class); + assertEquals( + ResourceHints.fromOptions(options), + ResourceHints.create() + .withMinRam(1000) + .withHint("beam:resources:bar", new ResourceHints.StringHint("foo"))); + options = + PipelineOptionsFactory.fromArgs( + "--resourceHints=min_ram=1KB", "--resourceHints=accelerator=foo") + .as(ResourceHintsOptions.class); + assertEquals( + ResourceHints.fromOptions(options), + ResourceHints.create().withMinRam(1000).withAccelerator("foo")); + } +} diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTrackerTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTrackerTest.java index 16933ba1b01f..1ebb4fe492a4 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTrackerTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTrackerTest.java @@ -35,9 +35,6 @@ /** Tests for {@link ByteKeyRangeTrackerTest}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ByteKeyRangeTrackerTest { @Rule public final ExpectedException expected = ExpectedException.none(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/splittabledofn/GrowableOffsetRangeTrackerTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/splittabledofn/GrowableOffsetRangeTrackerTest.java index 717db414b885..3816b13fe7f1 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/splittabledofn/GrowableOffsetRangeTrackerTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/splittabledofn/GrowableOffsetRangeTrackerTest.java @@ -37,7 +37,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class GrowableOffsetRangeTrackerTest { private static class SimpleEstimator implements GrowableOffsetRangeTracker.RangeEndEstimator { diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTrackerTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTrackerTest.java index e311eb42cf8f..1ccad433fe40 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTrackerTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTrackerTest.java @@ -36,7 +36,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class OffsetRangeTrackerTest { @Rule public final ExpectedException expected = ExpectedException.none(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/splittabledofn/SplitResultTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/splittabledofn/SplitResultTest.java index f98d861f7853..47c54bc62bd5 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/splittabledofn/SplitResultTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/splittabledofn/SplitResultTest.java @@ -24,9 +24,6 @@ import org.junit.runners.JUnit4; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SplitResultTest { @Test public void testPrimaryAndResidualAreSet() { diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/FixedWindowsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/FixedWindowsTest.java index b13a5564fd33..3dd7c28a3484 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/FixedWindowsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/FixedWindowsTest.java @@ -21,10 +21,10 @@ import static org.apache.beam.sdk.testing.WindowFnTestUtils.set; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasDisplayItem; import static org.hamcrest.CoreMatchers.containsString; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static 
org.junit.Assert.fail; @@ -32,7 +32,6 @@ import java.util.HashMap; import java.util.Map; import java.util.Set; -import org.apache.beam.sdk.testing.WindowFnTestUtils; import org.apache.beam.sdk.transforms.display.DisplayData; import org.joda.time.Duration; import org.joda.time.Instant; @@ -158,13 +157,6 @@ public void testVerifyCompatibility() throws IncompatibleWindowException { FixedWindows.of(new Duration(10)).verifyCompatibility(FixedWindows.of(new Duration(20))); } - @Test - public void testValidOutputTimes() throws Exception { - for (long timestamp : Arrays.asList(200, 800, 700)) { - WindowFnTestUtils.validateGetOutputTimestamp(FixedWindows.of(new Duration(500)), timestamp); - } - } - @Test public void testDisplayData() { Duration offset = Duration.standardSeconds(1234); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/IntervalWindowTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/IntervalWindowTest.java index 52674239cfe2..a79ff6fef6dc 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/IntervalWindowTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/IntervalWindowTest.java @@ -17,15 +17,18 @@ */ package org.apache.beam.sdk.transforms.windowing; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.util.List; import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.DurationCoder; import org.apache.beam.sdk.coders.InstantCoder; import org.apache.beam.sdk.testing.CoderProperties; +import org.apache.beam.sdk.testing.CoderProperties.TestElementByteSizeObserver; import org.apache.beam.sdk.util.CoderUtils; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; +import org.joda.time.Duration; import org.joda.time.Instant; import org.junit.Test; import org.junit.runner.RunWith; @@ -87,4 +90,20 @@ public void testLengthsOfEncodingChoices() throws Exception { assertThat(encodedHourWindow.length, equalTo(encodedStart.length + encodedHourEnd.length - 4)); assertThat(encodedDayWindow.length, equalTo(encodedStart.length + encodedDayEnd.length - 4)); } + + @Test + public void testCoderRegisterByteSizeObserver() throws Exception { + assertThat(TEST_CODER.isRegisterByteSizeObserverCheap(TEST_VALUES.get(0)), equalTo(true)); + TestElementByteSizeObserver observer = new TestElementByteSizeObserver(); + TestElementByteSizeObserver observer2 = new TestElementByteSizeObserver(); + for (IntervalWindow window : TEST_VALUES) { + TEST_CODER.registerByteSizeObserver(window, observer); + InstantCoder.of().registerByteSizeObserver(window.maxTimestamp(), observer2); + DurationCoder.of() + .registerByteSizeObserver(new Duration(window.start(), window.end()), observer2); + observer.advance(); + observer2.advance(); + assertThat(observer.getSumAndReset(), equalTo(observer2.getSumAndReset())); + } + } } diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/RepeatedlyTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/RepeatedlyTest.java index 036d223c1a37..7749c9bac5b3 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/RepeatedlyTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/RepeatedlyTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.sdk.transforms.windowing; +import static org.hamcrest.MatcherAssert.assertThat; 
import static org.hamcrest.Matchers.equalTo; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.when; import org.joda.time.Duration; @@ -33,9 +33,6 @@ /** Tests for {@link Repeatedly}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class RepeatedlyTest { @Mock private Trigger mockTrigger; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/SessionsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/SessionsTest.java index a7e922e97831..9d05cee7f670 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/SessionsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/SessionsTest.java @@ -20,10 +20,10 @@ import static org.apache.beam.sdk.testing.WindowFnTestUtils.runWindowFn; import static org.apache.beam.sdk.testing.WindowFnTestUtils.set; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasDisplayItem; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.allOf; import static org.hamcrest.Matchers.containsString; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.util.Arrays; @@ -114,15 +114,6 @@ public void testVerifyCompatibility() throws IncompatibleWindowException { .verifyCompatibility(FixedWindows.of(new Duration(10))); } - /** Validates that the output timestamp for aggregate data falls within the acceptable range. */ - @Test - public void testValidOutputTimes() throws Exception { - for (long timestamp : Arrays.asList(200, 800, 700)) { - WindowFnTestUtils.validateGetOutputTimestamp( - Sessions.withGapDuration(Duration.millis(500)), timestamp); - } - } - /** * Test to confirm that {@link Sessions} with the default {@link TimestampCombiner} holds up the * watermark potentially indefinitely. 
diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/SlidingWindowsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/SlidingWindowsTest.java index d32fccf610df..3418b19b8eb6 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/SlidingWindowsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/SlidingWindowsTest.java @@ -20,18 +20,17 @@ import static org.apache.beam.sdk.testing.WindowFnTestUtils.runWindowFn; import static org.apache.beam.sdk.testing.WindowFnTestUtils.set; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasDisplayItem; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.is; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.util.Arrays; import java.util.HashMap; import java.util.Map; import java.util.Set; -import org.apache.beam.sdk.testing.WindowFnTestUtils; import org.apache.beam.sdk.transforms.display.DisplayData; import org.joda.time.Duration; import org.joda.time.Instant; @@ -190,22 +189,6 @@ public void testDefaultWindowMappingFn() { mapping.getSideInputWindow(new IntervalWindow(new Instant(0), new Instant(1341)))); } - @Test - public void testValidOutputTimes() throws Exception { - for (long timestamp : Arrays.asList(200, 800, 499, 500, 501, 700, 1000)) { - WindowFnTestUtils.validateGetOutputTimestamp( - SlidingWindows.of(new Duration(1000)).every(new Duration(500)), timestamp); - } - } - - @Test - public void testOutputTimesNonInterference() throws Exception { - for (long timestamp : Arrays.asList(200, 800, 700)) { - WindowFnTestUtils.validateNonInterferingOutputTimes( - SlidingWindows.of(new Duration(1000)).every(new Duration(500)), timestamp); - } - } - @Test public void testDisplayData() { Duration windowSize = Duration.standardSeconds(1234); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/StubTrigger.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/StubTrigger.java index 725f57d671b6..d4ee23b92b2f 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/StubTrigger.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/StubTrigger.java @@ -22,9 +22,6 @@ import org.joda.time.Instant; /** No-op {@link OnceTrigger} implementation for testing. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) abstract class StubTrigger extends Trigger.OnceTrigger { /** * Create a stub {@link Trigger} instance which returns the specified name on {@link #toString()}. diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/TriggerTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/TriggerTest.java index eec0b6e224bc..335d9670625c 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/TriggerTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/TriggerTest.java @@ -30,9 +30,6 @@ /** Tests for {@link Trigger}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TriggerTest { @Test diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/WindowTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/WindowTest.java index a89742f6081b..7f976e10a175 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/WindowTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/WindowTest.java @@ -20,13 +20,13 @@ import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasDisplayItem; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasKey; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.includesDisplayDataFor; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.instanceOf; import static org.hamcrest.Matchers.is; import static org.hamcrest.Matchers.isOneOf; import static org.hamcrest.Matchers.not; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.mockito.Mockito.when; @@ -38,6 +38,7 @@ import java.util.Collections; import java.util.HashMap; import java.util.HashSet; +import java.util.List; import java.util.Map; import java.util.Objects; import java.util.Set; @@ -64,12 +65,14 @@ import org.apache.beam.sdk.transforms.SimpleFunction; import org.apache.beam.sdk.transforms.display.DisplayData; import org.apache.beam.sdk.transforms.display.DisplayDataEvaluator; +import org.apache.beam.sdk.transforms.windowing.IntervalWindow.IntervalWindowCoder; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.TimestampedValue; import org.apache.beam.sdk.values.WindowingStrategy; import org.apache.beam.sdk.values.WindowingStrategy.AccumulationMode; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; import org.checkerframework.checker.nullness.qual.Nullable; import org.hamcrest.Matchers; import org.joda.time.Duration; @@ -84,9 +87,6 @@ /** Tests for {@link Window}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class WindowTest implements Serializable { @Rule @@ -662,6 +662,39 @@ public void testMergingCustomWindowsKeyedCollection() { pipeline.run(); } + @Test + @Category({ValidatesRunner.class, UsesCustomWindowMerging.class}) + public void testMergingCustomWindowsWithoutCustomWindowTypes() { + Instant startInstant = new Instant(0L); + PCollection<KV<String, Integer>> inputCollection = + pipeline.apply( + Create.timestamped( + TimestampedValue.of(KV.of("a", 1), startInstant.plus(Duration.standardSeconds(1))), + TimestampedValue.of(KV.of("a", 2), startInstant.plus(Duration.standardSeconds(2))), + TimestampedValue.of(KV.of("a", 3), startInstant.plus(Duration.standardSeconds(3))), + TimestampedValue.of(KV.of("a", 4), startInstant.plus(Duration.standardSeconds(4))), + TimestampedValue.of( + KV.of("a", 5), startInstant.plus(Duration.standardSeconds(5))))); + PCollection<KV<String, Integer>> windowedCollection = + inputCollection.apply(Window.into(new WindowOddEvenMergingBuckets<>())); + PCollection<String> result = + windowedCollection + .apply(GroupByKey.create()) + .apply( + ParDo.of( + new DoFn<KV<String, Iterable<Integer>>, String>() { + @ProcessElement + public void processElement(ProcessContext c, BoundedWindow window) { + List<Integer> elements = Lists.newArrayList(); + c.element().getValue().forEach(elements::add); + Collections.sort(elements); + c.output(elements.toString()); + } + })); + PAssert.that("Wrong output collection", result).containsInAnyOrder("[2, 4]", "[1, 3, 5]"); + pipeline.run(); + } + private static class CustomWindow extends IntervalWindow { private boolean isBig; @@ -778,4 +811,59 @@ public WindowMappingFn getDefaultWindowMappingFn() { throw new UnsupportedOperationException("side inputs not supported"); } } + + private static class WindowOddEvenMergingBuckets<T> extends WindowFn<T, IntervalWindow> { + + @Override + public Collection<IntervalWindow> assignWindows(AssignContext c) throws Exception { + return Collections.singleton( + new IntervalWindow(c.timestamp(), c.timestamp().plus(Duration.standardSeconds(30)))); + } + + @Override + public void mergeWindows(MergeContext c) throws Exception { + Set<IntervalWindow> evenWindows = new HashSet<>(); + Set<IntervalWindow> oddWindows = new HashSet<>(); + for (IntervalWindow window : c.windows()) { + if ((window.start().getMillis() / 1000) % 2 == 0) { + evenWindows.add(window); + } else { + oddWindows.add(window); + } + } + if (evenWindows.size() > 1) { + IntervalWindow evenMerged = + new IntervalWindow( + Instant.ofEpochMilli( + evenWindows.stream().map(t -> t.start().getMillis()).min(Long::compare).get()), + Instant.ofEpochMilli( + evenWindows.stream().map(t -> t.end().getMillis()).max(Long::compare).get())); + c.merge(evenWindows, evenMerged); + } + if (oddWindows.size() > 1) { + IntervalWindow oddMerged = + new IntervalWindow( + Instant.ofEpochMilli( + oddWindows.stream().map(t -> t.start().getMillis()).min(Long::compare).get()), + Instant.ofEpochMilli( + oddWindows.stream().map(t -> t.end().getMillis()).max(Long::compare).get())); + c.merge(oddWindows, oddMerged); + } + } + + @Override + public boolean isCompatible(WindowFn<?, ?> other) { + return other instanceof WindowOddEvenMergingBuckets; + } + + @Override + public Coder<IntervalWindow> windowCoder() { + return IntervalWindowCoder.of(); + } + + @Override + public WindowMappingFn<IntervalWindow> getDefaultWindowMappingFn() { + throw new UnsupportedOperationException("side inputs not supported"); + } + } } diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/WindowingTest.java 
b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/WindowingTest.java index c9756153501c..3a447f0d72d0 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/WindowingTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/WindowingTest.java @@ -142,10 +142,10 @@ public void testNonPartitioningWindowing() { PAssert.that(output) .containsInAnyOrder( output("a", 1, 1, -5, 5), - output("a", 2, 5, 0, 10), - output("a", 1, 10, 5, 15), + output("a", 2, 1, 0, 10), + output("a", 1, 7, 5, 15), output("b", 1, 8, 0, 10), - output("b", 1, 10, 5, 15)); + output("b", 1, 8, 5, 15)); p.run(); } diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/ApiSurfaceTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/ApiSurfaceTest.java index 4f5b6f6d0c67..3559e04a0865 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/ApiSurfaceTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/ApiSurfaceTest.java @@ -18,9 +18,9 @@ package org.apache.beam.sdk.util; import static org.apache.beam.sdk.util.ApiSurface.containsOnlyClassesMatching; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.emptyIterable; -import static org.junit.Assert.assertThat; import java.util.List; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.FluentIterable; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/BufferedElementCountingOutputStreamTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/BufferedElementCountingOutputStreamTest.java index 0d1f7d89b719..9ad70a7abfa1 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/BufferedElementCountingOutputStreamTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/BufferedElementCountingOutputStreamTest.java @@ -18,9 +18,9 @@ package org.apache.beam.sdk.util; import static org.apache.beam.sdk.util.BufferedElementCountingOutputStream.BUFFER_POOL; +import static org.hamcrest.MatcherAssert.assertThat; import static org.junit.Assert.assertArrayEquals; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.io.ByteArrayInputStream; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/CombineFnUtilTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/CombineFnUtilTest.java index b000a602da4d..084b067d33f2 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/CombineFnUtilTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/CombineFnUtilTest.java @@ -40,9 +40,6 @@ /** Unit tests for {@link CombineFnUtil}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class CombineFnUtilTest { @Rule public ExpectedException expectedException = ExpectedException.none(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/FilePatternMatchingShardedFileTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/FilePatternMatchingShardedFileTest.java index 5d458113bd1b..87044cb9ef35 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/FilePatternMatchingShardedFileTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/FilePatternMatchingShardedFileTest.java @@ -43,9 +43,6 @@ /** Tests for {@link FilePatternMatchingShardedFile}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FilePatternMatchingShardedFileTest { @Rule public TemporaryFolder tmpFolder = new TemporaryFolder(); @Rule public ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/FluentBackoffTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/FluentBackoffTest.java index 1b04361ecd87..bca3f33a3bd1 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/FluentBackoffTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/FluentBackoffTest.java @@ -17,6 +17,7 @@ */ package org.apache.beam.sdk.util; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.allOf; import static org.hamcrest.Matchers.containsString; import static org.hamcrest.Matchers.equalTo; @@ -24,7 +25,6 @@ import static org.hamcrest.Matchers.lessThan; import static org.hamcrest.Matchers.lessThanOrEqualTo; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import java.io.IOException; import org.joda.time.Duration; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/HistogramDataTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/HistogramDataTest.java new file mode 100644 index 000000000000..b6e4d989a8f3 --- /dev/null +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/HistogramDataTest.java @@ -0,0 +1,203 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.util; + +import static org.hamcrest.MatcherAssert.assertThat; +import static org.hamcrest.Matchers.equalTo; + +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Tests for {@link HistogramData}. 
*/ +@RunWith(JUnit4.class) +public class HistogramDataTest { + + @Test + public void testOutOfRangeWarning() { + HistogramData histogramData = HistogramData.linear(0, 20, 5); + histogramData.record(100); + assertThat(histogramData.getTotalCount(), equalTo(1L)); + } + + @Test + public void testCheckBoundaryBuckets() { + HistogramData histogramData = HistogramData.linear(0, 20, 5); + histogramData.record(0); + histogramData.record(99.9); + assertThat(histogramData.getCount(0), equalTo(1L)); + assertThat(histogramData.getCount(4), equalTo(1L)); + } + + @Test + public void testFractionalBuckets() { + HistogramData histogramData1 = HistogramData.linear(0, 10.0 / 3, 3); + histogramData1.record(3.33); + histogramData1.record(6.66); + assertThat(histogramData1.getCount(0), equalTo(1L)); + assertThat(histogramData1.getCount(1), equalTo(1L)); + + HistogramData histogramData2 = HistogramData.linear(0, 10.0 / 3, 3); + histogramData2.record(3.34); + histogramData2.record(6.67); + assertThat(histogramData2.getCount(1), equalTo(1L)); + assertThat(histogramData2.getCount(2), equalTo(1L)); + } + + @Test + public void testP50() { + HistogramData histogramData1 = HistogramData.linear(0, 0.2, 50); + histogramData1.record(0, 1, 2, 3, 4, 5, 6, 7, 8, 9); + assertThat(String.format("%.3f", histogramData1.p50()), equalTo("4.200")); + + HistogramData histogramData2 = HistogramData.linear(0, 0.02, 50); + histogramData2.record(0, 0, 0); + assertThat(String.format("%.3f", histogramData2.p50()), equalTo("0.010")); + } + + @Test + public void testP90() { + HistogramData histogramData1 = HistogramData.linear(0, 0.2, 50); + histogramData1.record(0, 1, 2, 3, 4, 5, 6, 7, 8, 9); + assertThat(String.format("%.3f", histogramData1.p90()), equalTo("8.200")); + + HistogramData histogramData2 = HistogramData.linear(0, 0.02, 50); + histogramData2.record(0, 0, 0); + assertThat(String.format("%.3f", histogramData2.p90()), equalTo("0.018")); + } + + @Test + public void testP99() { + HistogramData histogramData1 = HistogramData.linear(0, 0.2, 50); + histogramData1.record(0, 1, 2, 3, 4, 5, 6, 7, 8, 9); + assertThat(String.format("%.3f", histogramData1.p99()), equalTo("9.180")); + + HistogramData histogramData2 = HistogramData.linear(0, 0.02, 50); + histogramData2.record(0, 0, 0); + assertThat(String.format("%.3f", histogramData2.p99()), equalTo("0.020")); + } + + @Test + public void testP90Negative() { + HistogramData histogramData1 = HistogramData.linear(-10, 0.2, 50); + histogramData1.record(-1, -2, -3, -4, -5, -6, -7, -8, -9, -10); + assertThat(String.format("%.3f", histogramData1.p90()), equalTo("-1.800")); + + HistogramData histogramData2 = HistogramData.linear(-1, 0.02, 50); + histogramData2.record(-1, -1, -1); + assertThat(String.format("%.3f", histogramData2.p90()), equalTo("-0.982")); + } + + @Test + public void testP90NegativeToPositive() { + HistogramData histogramData1 = HistogramData.linear(-5, 0.2, 50); + histogramData1.record(-1, -2, -3, -4, -5, 0, 1, 2, 3, 4); + assertThat(String.format("%.3f", histogramData1.p90()), equalTo("3.200")); + + HistogramData histogramData2 = HistogramData.linear(-0.5, 0.02, 50); + histogramData2.record(-0.5, -0.5, -0.5); + assertThat(String.format("%.3f", histogramData2.p90()), equalTo("-0.482")); + } + + @Test + public void testP50NegativeInfinity() { + HistogramData histogramData = HistogramData.linear(0, 0.2, 50); + histogramData.record(-1, -2, -3, -4, -5, 0, 1, 2, 3, 4); + assertThat(histogramData.p50(), equalTo(Double.NEGATIVE_INFINITY)); + assertThat( + 
histogramData.getPercentileString("meows", "cats"), + equalTo("Total number of meows: 10, P99: 4 cats, P90: 3 cats, P50: -Infinity cats")); + } + + @Test + public void testP50PositiveInfinity() { + HistogramData histogramData = HistogramData.linear(0, 0.2, 50); + histogramData.record(6, 7, 8, 9, 10, 11, 12, 13, 14, 15); + assertThat(histogramData.p50(), equalTo(Double.POSITIVE_INFINITY)); + assertThat( + histogramData.getPercentileString("meows", "cats"), + equalTo( + "Total number of meows: 10, P99: Infinity cats, P90: Infinity cats, P50: Infinity cats")); + } + + @Test + public void testEmptyP99() { + HistogramData histogramData = HistogramData.linear(0, 0.2, 50); + assertThat(histogramData.p99(), equalTo(Double.NaN)); + } + + @Test + public void testClear() { + HistogramData histogramData = HistogramData.linear(0, 0.2, 50); + histogramData.record(-1, 1, 2, 3); + assertThat(histogramData.getTotalCount(), equalTo(4L)); + assertThat(histogramData.getCount(5), equalTo(1L)); + histogramData.clear(); + assertThat(histogramData.getTotalCount(), equalTo(0L)); + assertThat(histogramData.getCount(5), equalTo(0L)); + } + + @Test + public void testUpdateUsingDoublesAndCumulative() { + HistogramData data = HistogramData.linear(0, 2, 2); + data.record(-1); // to -Inf bucket + data.record(0); // bucket 0 + data.record(1); + data.record(3); // bucket 1 + data.record(4); // to Inf bucket + assertThat(data.getCount(0), equalTo(2L)); + assertThat(data.getCount(1), equalTo(1L)); + assertThat(data.getTotalCount(), equalTo(5L)); + + // Now try updating it with another HistogramData + HistogramData data2 = HistogramData.linear(0, 2, 2); + data2.record(-1); // to -Inf bucket + data2.record(-1); // to -Inf bucket + data2.record(0); // bucket 0 + data2.record(0); + data2.record(1); + data2.record(1); + data2.record(3); // bucket 1 + data2.record(3); + data2.record(4); // to Inf bucket + data2.record(4); + assertThat(data2.getCount(0), equalTo(4L)); + assertThat(data2.getCount(1), equalTo(2L)); + assertThat(data2.getTotalCount(), equalTo(10L)); + + data.update(data2); + assertThat(data.getCount(0), equalTo(6L)); + assertThat(data.getCount(1), equalTo(3L)); + assertThat(data.getTotalCount(), equalTo(15L)); + } + + @Test + public void testIncrementBucketCountByIndex() { + HistogramData data = HistogramData.linear(0, 2, 2); + data.incBottomBucketCount(1); + data.incBucketCount(0, 2); + data.incBucketCount(1, 3); + data.incTopBucketCount(4); + + assertThat(data.getBottomBucketCount(), equalTo(1L)); + assertThat(data.getCount(0), equalTo(2L)); + assertThat(data.getCount(1), equalTo(3L)); + assertThat(data.getTopBucketCount(), equalTo(4L)); + assertThat(data.getTotalCount(), equalTo(10L)); + } +} diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/HistogramTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/HistogramTest.java deleted file mode 100644 index 199876fbdb45..000000000000 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/HistogramTest.java +++ /dev/null @@ -1,152 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. 
You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package org.apache.beam.sdk.util; - -import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; -import static org.junit.Assert.assertThrows; - -import org.apache.beam.sdk.testing.ExpectedLogs; -import org.junit.Rule; -import org.junit.Test; -import org.junit.runner.RunWith; -import org.junit.runners.JUnit4; - -/** Tests for {@link Histogram}. */ -@RunWith(JUnit4.class) -public class HistogramTest { - @Rule public ExpectedLogs expectedLogs = ExpectedLogs.none(Histogram.class); - - @Test - public void testOutOfRangeWarning() { - Histogram histogram = Histogram.linear(0, 20, 5); - histogram.record(100); - assertThat(histogram.getTotalCount(), equalTo(1L)); - expectedLogs.verifyWarn("out of upper bound"); - } - - @Test - public void testCheckBoundaryBuckets() { - Histogram histogram = Histogram.linear(0, 20, 5); - histogram.record(0); - histogram.record(99.9); - assertThat(histogram.getCount(0), equalTo(1L)); - assertThat(histogram.getCount(4), equalTo(1L)); - } - - @Test - public void testFractionalBuckets() { - Histogram histogram1 = Histogram.linear(0, 10.0 / 3, 3); - histogram1.record(3.33); - histogram1.record(6.66); - assertThat(histogram1.getCount(0), equalTo(1L)); - assertThat(histogram1.getCount(1), equalTo(1L)); - - Histogram histogram2 = Histogram.linear(0, 10.0 / 3, 3); - histogram2.record(3.34); - histogram2.record(6.67); - assertThat(histogram2.getCount(1), equalTo(1L)); - assertThat(histogram2.getCount(2), equalTo(1L)); - } - - @Test - public void testP50() { - Histogram histogram1 = Histogram.linear(0, 0.2, 50); - histogram1.record(0, 1, 2, 3, 4, 5, 6, 7, 8, 9); - assertThat(String.format("%.3f", histogram1.p50()), equalTo("4.200")); - - Histogram histogram2 = Histogram.linear(0, 0.02, 50); - histogram2.record(0, 0, 0); - assertThat(String.format("%.3f", histogram2.p50()), equalTo("0.010")); - } - - @Test - public void testP90() { - Histogram histogram1 = Histogram.linear(0, 0.2, 50); - histogram1.record(0, 1, 2, 3, 4, 5, 6, 7, 8, 9); - assertThat(String.format("%.3f", histogram1.p90()), equalTo("8.200")); - - Histogram histogram2 = Histogram.linear(0, 0.02, 50); - histogram2.record(0, 0, 0); - assertThat(String.format("%.3f", histogram2.p90()), equalTo("0.018")); - } - - @Test - public void testP99() { - Histogram histogram1 = Histogram.linear(0, 0.2, 50); - histogram1.record(0, 1, 2, 3, 4, 5, 6, 7, 8, 9); - assertThat(String.format("%.3f", histogram1.p99()), equalTo("9.180")); - - Histogram histogram2 = Histogram.linear(0, 0.02, 50); - histogram2.record(0, 0, 0); - assertThat(String.format("%.3f", histogram2.p99()), equalTo("0.020")); - } - - @Test - public void testP90Negative() { - Histogram histogram1 = Histogram.linear(-10, 0.2, 50); - histogram1.record(-1, -2, -3, -4, -5, -6, -7, -8, -9, -10); - assertThat(String.format("%.3f", histogram1.p90()), equalTo("-1.800")); - - Histogram histogram2 = Histogram.linear(-1, 0.02, 50); - histogram2.record(-1, -1, -1); - assertThat(String.format("%.3f", histogram2.p90()), equalTo("-0.982")); - } - - @Test - public void testP90NegativeToPositive() { - Histogram histogram1 = 
Histogram.linear(-5, 0.2, 50); - histogram1.record(-1, -2, -3, -4, -5, 0, 1, 2, 3, 4); - assertThat(String.format("%.3f", histogram1.p90()), equalTo("3.200")); - - Histogram histogram2 = Histogram.linear(-0.5, 0.02, 50); - histogram2.record(-0.5, -0.5, -0.5); - assertThat(String.format("%.3f", histogram2.p90()), equalTo("-0.482")); - } - - @Test - public void testP50NegativeInfinity() { - Histogram histogram = Histogram.linear(0, 0.2, 50); - histogram.record(-1, -2, -3, -4, -5, 0, 1, 2, 3, 4); - assertThat(histogram.p50(), equalTo(Double.NEGATIVE_INFINITY)); - } - - @Test - public void testP50PositiveInfinity() { - Histogram histogram = Histogram.linear(0, 0.2, 50); - histogram.record(6, 7, 8, 9, 10, 11, 12, 13, 14, 15); - assertThat(histogram.p50(), equalTo(Double.POSITIVE_INFINITY)); - } - - @Test - public void testEmptyP99() { - Histogram histogram = Histogram.linear(0, 0.2, 50); - assertThrows(RuntimeException.class, histogram::p99); - } - - @Test - public void testClear() { - Histogram histogram = Histogram.linear(0, 0.2, 50); - histogram.record(-1, 1, 2, 3); - assertThat(histogram.getTotalCount(), equalTo(4L)); - assertThat(histogram.getCount(5), equalTo(1L)); - histogram.clear(); - assertThat(histogram.getTotalCount(), equalTo(0L)); - assertThat(histogram.getCount(5), equalTo(0L)); - } -} diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/MoreFuturesTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/MoreFuturesTest.java index be53e042588a..4b6790d22c30 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/MoreFuturesTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/MoreFuturesTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.sdk.util; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.isA; -import static org.junit.Assert.assertThat; import java.util.concurrent.CompletionStage; import java.util.concurrent.ExecutionException; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/NameUtilsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/NameUtilsTest.java index 6288a1ce616a..f327c3c68db3 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/NameUtilsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/NameUtilsTest.java @@ -33,9 +33,6 @@ /** Tests for {@link NameUtils}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class NameUtilsTest { @Rule public ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/NumberedShardedFileTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/NumberedShardedFileTest.java index 577d3923ea01..5f0faeb3fa09 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/NumberedShardedFileTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/NumberedShardedFileTest.java @@ -44,9 +44,6 @@ /** Tests for {@link NumberedShardedFile}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class NumberedShardedFileTest { @Rule public TemporaryFolder tmpFolder = new TemporaryFolder(); @Rule public ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/ReleaseInfoTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/ReleaseInfoTest.java index e7b1531e088d..acf540ba09da 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/ReleaseInfoTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/ReleaseInfoTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.util; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsString; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import org.junit.Test; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/RowJsonTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/RowJsonTest.java index 74c3647af656..4ff907cff97d 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/RowJsonTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/RowJsonTest.java @@ -41,6 +41,7 @@ import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.hamcrest.Matcher; import org.hamcrest.Matchers; +import org.joda.time.DateTime; import org.junit.Rule; import org.junit.Test; import org.junit.experimental.runners.Enclosed; @@ -53,9 +54,6 @@ /** Unit tests for {@link RowJson.RowJsonDeserializer} and {@link RowJson.RowJsonSerializer}. */ @RunWith(Enclosed.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class RowJsonTest { @RunWith(Parameterized.class) public static class ValueTests { @@ -326,6 +324,8 @@ public static class DeserializerTests { private static final String FLOAT_STRING = "1.02e5"; private static final Double DOUBLE_VALUE = 1.02d; private static final String DOUBLE_STRING = "1.02"; + private static final String DATETIME_STRING = "2014-09-27T20:30:00.450Z"; + private static final DateTime DATETIME_VALUE = DateTime.parse(DATETIME_STRING); @Rule public ExpectedException thrown = ExpectedException.none(); @@ -463,25 +463,13 @@ public void testRequireMissingAcceptsMissingField() throws Exception { .build())); } - @Test - public void testDeserializerThrowsForUnsupportedType() throws Exception { - Schema schema = Schema.builder().addDateTimeField("f_dateTime").build(); - - thrown.expect(UnsupportedRowJsonException.class); - thrown.expectMessage("DATETIME"); - thrown.expectMessage("f_dateTime"); - thrown.expectMessage("not supported"); - - RowJson.RowJsonDeserializer.forSchema(schema); - } - @Test public void testDeserializerThrowsForUnsupportedArrayElementType() throws Exception { - Schema schema = Schema.builder().addArrayField("f_dateTimeArray", FieldType.DATETIME).build(); + Schema schema = Schema.builder().addArrayField("f_bytesArray", FieldType.BYTES).build(); thrown.expect(UnsupportedRowJsonException.class); - thrown.expectMessage("DATETIME"); - thrown.expectMessage("f_dateTimeArray[]"); + thrown.expectMessage("BYTES"); + thrown.expectMessage("f_bytesArray[]"); thrown.expectMessage("not supported"); RowJson.RowJsonDeserializer.forSchema(schema); @@ -489,14 +477,13 @@ public void testDeserializerThrowsForUnsupportedArrayElementType() throws Except @Test public void 
testDeserializerThrowsForUnsupportedNestedFieldType() throws Exception { - Schema nestedSchema = - Schema.builder().addArrayField("f_dateTimeArray", FieldType.DATETIME).build(); + Schema nestedSchema = Schema.builder().addArrayField("f_bytesArray", FieldType.BYTES).build(); Schema schema = Schema.builder().addRowField("f_nestedRow", nestedSchema).build(); thrown.expect(UnsupportedRowJsonException.class); - thrown.expectMessage("DATETIME"); - thrown.expectMessage("f_nestedRow.f_dateTimeArray[]"); + thrown.expectMessage("BYTES"); + thrown.expectMessage("f_nestedRow.f_bytesArray[]"); thrown.expectMessage("not supported"); RowJson.RowJsonDeserializer.forSchema(schema); @@ -507,13 +494,13 @@ public void testDeserializerThrowsForMultipleUnsupportedFieldTypes() throws Exce Schema schema = Schema.builder() .addInt32Field("f_int32") - .addDateTimeField("f_dateTime") - .addArrayField("f_dateTimeArray", FieldType.DATETIME) + .addByteArrayField("f_bytes") + .addArrayField("f_bytesArray", FieldType.BYTES) .build(); thrown.expect(UnsupportedRowJsonException.class); - thrown.expectMessage("f_dateTime=DATETIME"); - thrown.expectMessage("f_dateTimeArray[]=DATETIME"); + thrown.expectMessage("f_bytes=BYTES"); + thrown.expectMessage("f_bytesArray[]=BYTES"); thrown.expectMessage("not supported"); RowJson.RowJsonDeserializer.forSchema(schema); @@ -568,6 +555,11 @@ public void testSupportedDoubleConversions() throws Exception { testSupportedConversion(FieldType.DOUBLE, INT_STRING, (double) INT_VALUE); } + @Test + public void testSupportedDatetimeConversions() throws Exception { + testSupportedConversion(FieldType.DATETIME, quoted(DATETIME_STRING), DATETIME_VALUE); + } + private void testSupportedConversion( FieldType fieldType, String jsonFieldValue, Object expectedRowFieldValue) throws Exception { diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/SerializableUtilsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/SerializableUtilsTest.java index 8d6ad369f898..9f96bce0e4c3 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/SerializableUtilsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/SerializableUtilsTest.java @@ -39,9 +39,6 @@ /** Tests for {@link SerializableUtils}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SerializableUtilsTest { @Rule public ExpectedException expectedException = ExpectedException.none(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/StreamUtilsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/StreamUtilsTest.java index 7f31f67b8d8e..68f87d737631 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/StreamUtilsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/StreamUtilsTest.java @@ -33,9 +33,6 @@ /** Unit tests for {@link ExposedByteArrayInputStream}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class StreamUtilsTest { private byte[] testData = null; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/UnownedInputStreamTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/UnownedInputStreamTest.java index 5c4492a6320a..b265780ef03e 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/UnownedInputStreamTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/UnownedInputStreamTest.java @@ -30,9 +30,6 @@ /** Unit tests for {@link UnownedInputStream}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class UnownedInputStreamTest { @Rule public ExpectedException expectedException = ExpectedException.none(); private ByteArrayInputStream bais; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/UnownedOutputStreamTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/UnownedOutputStreamTest.java index b9469448d9df..2aa7b5236336 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/UnownedOutputStreamTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/UnownedOutputStreamTest.java @@ -33,9 +33,6 @@ /** Unit tests for {@link UnownedOutputStream}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class UnownedOutputStreamTest { @Rule public ExpectedException expectedException = ExpectedException.none(); private ByteArrayOutputStream baos; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/UserCodeExceptionTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/UserCodeExceptionTest.java index 90829e2e7d5d..c7f7844582d1 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/UserCodeExceptionTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/UserCodeExceptionTest.java @@ -17,12 +17,12 @@ */ package org.apache.beam.sdk.util; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.instanceOf; import static org.hamcrest.Matchers.is; import static org.hamcrest.Matchers.isA; import static org.hamcrest.Matchers.not; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import java.io.IOException; import org.hamcrest.Description; @@ -37,9 +37,6 @@ /** Tests for {@link UserCodeException} functionality. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class UserCodeExceptionTest { @Rule public ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/VarIntTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/VarIntTest.java index 144609da1ca2..cf15adebdf37 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/VarIntTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/VarIntTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.sdk.util; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import java.io.ByteArrayInputStream; import java.io.ByteArrayOutputStream; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/WindowedValueTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/WindowedValueTest.java index 821c216498fd..3d1cda7e570e 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/WindowedValueTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/WindowedValueTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.sdk.util; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.util.Arrays; import org.apache.beam.sdk.coders.Coder; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/ZipFilesTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/ZipFilesTest.java index e51c02b1c566..2cf38be6b8c1 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/ZipFilesTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/ZipFilesTest.java @@ -17,12 +17,12 @@ */ package org.apache.beam.sdk.util; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.arrayWithSize; import static org.hamcrest.Matchers.not; import static org.junit.Assert.assertArrayEquals; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.junit.Assert.fail; @@ -50,9 +50,6 @@ * fine. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ZipFilesTest { @Rule public TemporaryFolder tmpFolder = new TemporaryFolder(); private File tmpDir; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/common/ReflectHelpersTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/common/ReflectHelpersTest.java index cd8a564079a7..8cfd0b7512be 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/common/ReflectHelpersTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/common/ReflectHelpersTest.java @@ -17,11 +17,11 @@ */ package org.apache.beam.sdk.util.common; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.anyOf; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.equalTo; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import com.fasterxml.jackson.annotation.JsonIgnore; import java.util.ArrayList; @@ -36,9 +36,6 @@ /** Tests for {@link ReflectHelpers}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ReflectHelpersTest { @Test diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/values/EncodableThrowableTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/values/EncodableThrowableTest.java new file mode 100644 index 000000000000..36eb7eb585a8 --- /dev/null +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/values/EncodableThrowableTest.java @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.values; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertNotEquals; + +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Tests for {@link EncodableThrowable}. */ +@RunWith(JUnit4.class) +public final class EncodableThrowableTest { + @Test + public void testEquals() { + IllegalStateException exception = + new IllegalStateException( + "Some illegal state", + new RuntimeException( + "Some nested exception", new Exception("Deeply nested exception"))); + + EncodableThrowable comparable1 = EncodableThrowable.forThrowable(exception); + EncodableThrowable comparable2 = EncodableThrowable.forThrowable(exception); + + assertEquals(comparable1, comparable1); + assertEquals(comparable1, comparable2); + } + + @Test + public void testEqualsNonComparable() { + assertNotEquals(EncodableThrowable.forThrowable(new Exception()), new Throwable()); + } + + @Test + public void testEqualsDifferentUnderlyingTypes() { + String message = "some message"; + assertNotEquals( + EncodableThrowable.forThrowable(new RuntimeException(message)), + EncodableThrowable.forThrowable(new Exception(message))); + } +} diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/values/KVTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/values/KVTest.java index ede172985836..c5aaf26394d6 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/values/KVTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/values/KVTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.sdk.values; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.not; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import java.util.Comparator; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; @@ -30,9 +30,6 @@ /** Tests for KV. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class KVTest { private static final Integer[] TEST_VALUES = { null, Integer.MIN_VALUE, -1, 0, 1, Integer.MAX_VALUE diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/values/PCollectionListTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/values/PCollectionListTest.java index b13a4dbf6afb..8e9812b062f2 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/values/PCollectionListTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/values/PCollectionListTest.java @@ -18,10 +18,10 @@ package org.apache.beam.sdk.values; import static org.hamcrest.CoreMatchers.containsString; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import com.google.common.testing.EqualsTester; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/values/PCollectionTupleTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/values/PCollectionTupleTest.java index e302c5135147..b3f0d87aea70 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/values/PCollectionTupleTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/values/PCollectionTupleTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.sdk.values; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertNotNull; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import com.google.common.testing.EqualsTester; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/values/RowTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/values/RowTest.java index 9c1b5e3b6a3b..9c39fea57353 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/values/RowTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/values/RowTest.java @@ -48,7 +48,6 @@ /** Unit tests for {@link Row}. */ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class RowTest { diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/values/TimestampedValueTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/values/TimestampedValueTest.java index 4d837ab5965e..2a246fefbb7a 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/values/TimestampedValueTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/values/TimestampedValueTest.java @@ -33,9 +33,6 @@ /** Unit tests for {@link TimestampedValue}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TimestampedValueTest { @Rule public ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/values/TupleTagTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/values/TupleTagTest.java index 4daac49a0693..addd92123aa2 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/values/TupleTagTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/values/TupleTagTest.java @@ -17,11 +17,11 @@ */ package org.apache.beam.sdk.values; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.greaterThan; import static org.hamcrest.Matchers.startsWith; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNotEquals; -import static org.junit.Assert.assertThat; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Splitter; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/values/TypeDescriptorTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/values/TypeDescriptorTest.java index a6ad2716c5e0..dbad49c284a3 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/values/TypeDescriptorTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/values/TypeDescriptorTest.java @@ -32,9 +32,6 @@ /** Tests for TypeDescriptor. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TypeDescriptorTest { @Rule public transient ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/values/TypeDescriptorsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/values/TypeDescriptorsTest.java index 885ebbc081e6..e33186c2a3ff 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/values/TypeDescriptorsTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/values/TypeDescriptorsTest.java @@ -23,10 +23,10 @@ import static org.apache.beam.sdk.values.TypeDescriptors.lists; import static org.apache.beam.sdk.values.TypeDescriptors.sets; import static org.apache.beam.sdk.values.TypeDescriptors.strings; +import static org.hamcrest.MatcherAssert.assertThat; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNotEquals; import static org.junit.Assert.assertNotNull; -import static org.junit.Assert.assertThat; import java.util.List; import java.util.Set; diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/values/TypedPValueTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/values/TypedPValueTest.java index 740bc805b7fb..012676b8ef14 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/values/TypedPValueTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/values/TypedPValueTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.sdk.values; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsString; import static org.hamcrest.Matchers.instanceOf; import static org.hamcrest.Matchers.not; -import static org.junit.Assert.assertThat; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.VarIntCoder; diff --git a/sdks/java/expansion-service/build.gradle b/sdks/java/expansion-service/build.gradle index 4fbdf1683b05..2a0ffd0a7b27 100644 --- 
a/sdks/java/expansion-service/build.gradle +++ b/sdks/java/expansion-service/build.gradle @@ -34,19 +34,18 @@ test { dependencies { compile project(path: ":model:pipeline", configuration: "shadow") - compile project(path: ":model:fn-execution", configuration: "shadow") + compile project(path: ":model:job-management", configuration: "shadow") compile project(path: ":sdks:java:core", configuration: "shadow") compile project(path: ":runners:core-construction-java") compile project(path: ":runners:java-fn-execution") - compile library.java.vendored_grpc_1_26_0 + compile library.java.jackson_annotations + compile library.java.jackson_databind + compile library.java.jackson_dataformat_yaml + compile library.java.vendored_grpc_1_36_0 compile library.java.vendored_guava_26_0_jre compile library.java.slf4j_api runtimeOnly library.java.slf4j_jdk14 testCompile library.java.junit - // TODO(BEAM-10632): Remove this. Currently Schema inference (used in - // ExpansionServiceTest) hits an NPE when checker is enabled and - // checkerframework is not in the classpath. - testCompile "org.checkerframework:checker:3.5.0" } task runExpansionService (type: JavaExec) { diff --git a/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionServer.java b/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionServer.java index 373e219b3121..f7926dc03900 100644 --- a/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionServer.java +++ b/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionServer.java @@ -21,8 +21,8 @@ import java.net.InetSocketAddress; import java.util.concurrent.TimeUnit; import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Server; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.netty.NettyServerBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Server; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.netty.NettyServerBuilder; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; /** A {@link Server gRPC Server} for an ExpansionService. 
*/ diff --git a/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java b/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java index 71c99e472581..6e1f3d30d64d 100644 --- a/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java +++ b/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java @@ -17,6 +17,7 @@ */ package org.apache.beam.sdk.expansion.service; +import static org.apache.beam.runners.core.construction.BeamUrns.getUrn; import static org.apache.beam.runners.core.construction.resources.PipelineResources.detectClassPathResourcesToStage; import static org.apache.beam.sdk.util.Preconditions.checkArgumentNotNull; @@ -25,26 +26,36 @@ import java.lang.reflect.Constructor; import java.lang.reflect.InvocationTargetException; import java.lang.reflect.Method; +import java.util.Arrays; import java.util.Collections; import java.util.List; import java.util.Map; +import java.util.Optional; import java.util.ServiceLoader; import java.util.Set; import java.util.stream.Collectors; import org.apache.beam.model.expansion.v1.ExpansionApi; import org.apache.beam.model.expansion.v1.ExpansionServiceGrpc; +import org.apache.beam.model.pipeline.v1.ExternalTransforms.ExpansionMethods; import org.apache.beam.model.pipeline.v1.ExternalTransforms.ExternalConfigurationPayload; import org.apache.beam.model.pipeline.v1.RunnerApi; +import org.apache.beam.model.pipeline.v1.SchemaApi; import org.apache.beam.runners.core.construction.Environments; import org.apache.beam.runners.core.construction.PipelineTranslation; import org.apache.beam.runners.core.construction.RehydratedComponents; import org.apache.beam.runners.core.construction.SdkComponents; +import org.apache.beam.runners.core.construction.SplittableParDo; import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService; import org.apache.beam.sdk.Pipeline; +import org.apache.beam.sdk.PipelineResult; +import org.apache.beam.sdk.PipelineRunner; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.RowCoder; import org.apache.beam.sdk.expansion.ExternalTransformRegistrar; +import org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProvider.AllowList; import org.apache.beam.sdk.options.ExperimentalOptions; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.options.PortablePipelineOptions; import org.apache.beam.sdk.schemas.NoSuchSchemaException; import org.apache.beam.sdk.schemas.Schema; @@ -63,9 +74,10 @@ import org.apache.beam.sdk.values.POutput; import org.apache.beam.sdk.values.Row; import org.apache.beam.sdk.values.TupleTag; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Server; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ServerBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Server; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ServerBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.CaseFormat; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Converter; 
@@ -165,8 +177,8 @@ private static Class getConfigClass( return configurationClass; } - private static Row decodeRow(ExternalConfigurationPayload payload) { - Schema payloadSchema = SchemaTranslation.schemaFromProto(payload.getSchema()); + static Row decodeConfigObjectRow(SchemaApi.Schema schema, ByteString payload) { + Schema payloadSchema = SchemaTranslation.schemaFromProto(schema); if (payloadSchema.getFieldCount() == 0) { return Row.withSchema(Schema.of()).build(); @@ -193,7 +205,7 @@ private static Row decodeRow(ExternalConfigurationPayload payload) { Row configRow; try { - configRow = RowCoder.of(payloadSchema).decode(payload.getPayload().newInput()); + configRow = RowCoder.of(payloadSchema).decode(payload.newInput()); } catch (IOException e) { throw new RuntimeException("Error decoding payload", e); } @@ -240,7 +252,7 @@ private static ConfigT payloadToConfigSchema( SerializableFunction fromRowFunc = SCHEMA_REGISTRY.getFromRowFunction(configurationClass); - Row payloadRow = decodeRow(payload); + Row payloadRow = decodeConfigObjectRow(payload.getSchema(), payload.getPayload()); if (!payloadRow.getSchema().assignableTo(configSchema)) { throw new IllegalArgumentException( @@ -256,7 +268,7 @@ private static ConfigT payloadToConfigSchema( private static ConfigT payloadToConfigSetters( ExternalConfigurationPayload payload, Class configurationClass) throws ReflectiveOperationException { - Row configRow = decodeRow(payload); + Row configRow = decodeConfigObjectRow(payload.getSchema(), payload.getPayload()); Constructor constructor = configurationClass.getDeclaredConstructor(); constructor.setAccessible(true); @@ -362,6 +374,19 @@ default Map> apply( } private @MonotonicNonNull Map registeredTransforms; + private final PipelineOptions pipelineOptions; + + public ExpansionService() { + this(new String[] {}); + } + + public ExpansionService(String[] args) { + this(PipelineOptionsFactory.fromArgs(args).create()); + } + + public ExpansionService(PipelineOptions opts) { + this.pipelineOptions = opts; + } private Map getRegisteredTransforms() { if (registeredTransforms == null) { @@ -391,14 +416,28 @@ private Map loadRegisteredTransforms() { request.getTransform().getSpec().getUrn()); LOG.debug("Full transform: {}", request.getTransform()); Set existingTransformIds = request.getComponents().getTransformsMap().keySet(); - Pipeline pipeline = Pipeline.create(); - ExperimentalOptions.addExperiment( - pipeline.getOptions().as(ExperimentalOptions.class), "beam_fn_api"); + Pipeline pipeline = createPipeline(); + boolean isUseDeprecatedRead = + ExperimentalOptions.hasExperiment(pipelineOptions, "use_deprecated_read") + || ExperimentalOptions.hasExperiment( + pipelineOptions, "beam_fn_api_use_deprecated_read"); + if (!isUseDeprecatedRead) { + ExperimentalOptions.addExperiment( + pipeline.getOptions().as(ExperimentalOptions.class), "beam_fn_api"); + // TODO(BEAM-10670): Remove this when we address performance issue. + ExperimentalOptions.addExperiment( + pipeline.getOptions().as(ExperimentalOptions.class), "use_sdf_read"); + } else { + LOG.warn( + "Using use_deprecated_read in portable runners is runner-dependent. 
The " + + "ExpansionService will respect that, but if your runner does not have support for " + + "native Read transform, your Pipeline will fail during Pipeline submission."); + } ClassLoader classLoader = Environments.class.getClassLoader(); if (classLoader == null) { throw new RuntimeException( - "Cannot detect classpath: classload is null (is it the bootstrap classloader?)"); + "Cannot detect classpath: classloader is null (is it the bootstrap classloader?)"); } List classpathResources = @@ -425,13 +464,22 @@ private Map loadRegisteredTransforms() { } })); - @Nullable - TransformProvider transformProvider = - getRegisteredTransforms().get(request.getTransform().getSpec().getUrn()); - if (transformProvider == null) { - throw new UnsupportedOperationException( - "Unknown urn: " + request.getTransform().getSpec().getUrn()); + String urn = request.getTransform().getSpec().getUrn(); + + TransformProvider transformProvider = null; + if (getUrn(ExpansionMethods.Enum.JAVA_CLASS_LOOKUP).equals(urn)) { + AllowList allowList = + pipelineOptions.as(ExpansionServiceOptions.class).getJavaClassLookupAllowlist(); + assert allowList != null; + transformProvider = new JavaClassLookupTransformProvider(allowList); + } else { + transformProvider = getRegisteredTransforms().get(urn); + if (transformProvider == null) { + throw new UnsupportedOperationException( + "Unknown urn: " + request.getTransform().getSpec().getUrn()); + } } + Map> outputs = transformProvider.apply( pipeline, @@ -459,6 +507,11 @@ private Map loadRegisteredTransforms() { throw new RuntimeException(exn); } })); + + if (isUseDeprecatedRead) { + SplittableParDo.convertReadBasedSplittableDoFnsToPrimitiveReadsIfNecessary(pipeline); + } + RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(pipeline, sdkComponents); String expandedTransformId = Iterables.getOnlyElement( @@ -483,6 +536,22 @@ private Map loadRegisteredTransforms() { .build(); } + protected Pipeline createPipeline() { + // TODO: [BEAM-12599]: implement proper validation + PipelineOptions effectiveOpts = PipelineOptionsFactory.create(); + PortablePipelineOptions portableOptions = effectiveOpts.as(PortablePipelineOptions.class); + PortablePipelineOptions specifiedOptions = pipelineOptions.as(PortablePipelineOptions.class); + Optional.ofNullable(specifiedOptions.getDefaultEnvironmentType()) + .ifPresent(portableOptions::setDefaultEnvironmentType); + Optional.ofNullable(specifiedOptions.getDefaultEnvironmentConfig()) + .ifPresent(portableOptions::setDefaultEnvironmentConfig); + effectiveOpts + .as(ExperimentalOptions.class) + .setExperiments(pipelineOptions.as(ExperimentalOptions.class).getExperiments()); + effectiveOpts.setRunner(NotRunnableRunner.class); + return Pipeline.create(effectiveOpts); + } + @Override public void expand( ExpansionApi.ExpansionRequest request, @@ -507,7 +576,8 @@ public void close() throws Exception { public static void main(String[] args) throws Exception { int port = Integer.parseInt(args[0]); System.out.println("Starting expansion service at localhost:" + port); - ExpansionService service = new ExpansionService(); + @SuppressWarnings("nullness") + ExpansionService service = new ExpansionService(Arrays.copyOfRange(args, 1, args.length)); for (Map.Entry entry : service.getRegisteredTransforms().entrySet()) { System.out.println("\t" + entry.getKey() + ": " + entry.getValue()); @@ -521,4 +591,15 @@ public static void main(String[] args) throws Exception { server.start(); server.awaitTermination(); } + + private static class NotRunnableRunner 
extends PipelineRunner { + public static NotRunnableRunner fromOptions(PipelineOptions opts) { + return new NotRunnableRunner(); + } + + @Override + public PipelineResult run(Pipeline pipeline) { + throw new UnsupportedOperationException(); + } + } } diff --git a/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionServiceOptions.java b/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionServiceOptions.java new file mode 100644 index 000000000000..79e870cd07f2 --- /dev/null +++ b/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionServiceOptions.java @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.expansion.service; + +import com.fasterxml.jackson.databind.ObjectMapper; +import com.fasterxml.jackson.dataformat.yaml.YAMLFactory; +import java.io.File; +import java.io.IOException; +import java.util.ArrayList; +import org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProvider.AllowList; +import org.apache.beam.sdk.options.Default; +import org.apache.beam.sdk.options.DefaultValueFactory; +import org.apache.beam.sdk.options.Description; +import org.apache.beam.sdk.options.PipelineOptions; + +/** Options used to configure the {@link ExpansionService}. */ +public interface ExpansionServiceOptions extends PipelineOptions { + + @Description("Allow list for Java class based transform expansion") + @Default.InstanceFactory(JavaClassLookupAllowListFactory.class) + AllowList getJavaClassLookupAllowlist(); + + void setJavaClassLookupAllowlist(AllowList file); + + @Description("Allow list file for Java class based transform expansion") + String getJavaClassLookupAllowlistFile(); + + void setJavaClassLookupAllowlistFile(String file); + + /** + * Loads the allow list from {@link #getJavaClassLookupAllowlistFile}, defaulting to an empty + * {@link JavaClassLookupTransformProvider.AllowList}. + */ + class JavaClassLookupAllowListFactory implements DefaultValueFactory { + + @Override + public AllowList create(PipelineOptions options) { + String allowListFile = + options.as(ExpansionServiceOptions.class).getJavaClassLookupAllowlistFile(); + if (allowListFile != null) { + ObjectMapper mapper = new ObjectMapper(new YAMLFactory()); + File allowListFileObj = new File(allowListFile); + if (!allowListFileObj.exists()) { + throw new IllegalArgumentException( + "Allow list file " + allowListFile + " does not exist"); + } + try { + return mapper.readValue(allowListFileObj, AllowList.class); + } catch (IOException e) { + throw new IllegalArgumentException( + "Could not load the provided allowlist file " + allowListFile, e); + } + } + + // By default produces an empty allow-list. 
+ return new AutoValue_JavaClassLookupTransformProvider_AllowList( + JavaClassLookupTransformProvider.ALLOW_LIST_VERSION, new ArrayList<>()); + } + } +} diff --git a/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/JavaClassLookupTransformProvider.java b/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/JavaClassLookupTransformProvider.java new file mode 100644 index 000000000000..d32c7e4207a8 --- /dev/null +++ b/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/JavaClassLookupTransformProvider.java @@ -0,0 +1,526 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.expansion.service; + +import static org.apache.beam.runners.core.construction.BeamUrns.getUrn; + +import com.fasterxml.jackson.annotation.JsonCreator; +import com.fasterxml.jackson.annotation.JsonProperty; +import com.google.auto.value.AutoValue; +import edu.umd.cs.findbugs.annotations.SuppressFBWarnings; +import java.io.IOException; +import java.lang.annotation.Annotation; +import java.lang.reflect.Array; +import java.lang.reflect.Constructor; +import java.lang.reflect.InvocationTargetException; +import java.lang.reflect.Method; +import java.lang.reflect.ParameterizedType; +import java.lang.reflect.Type; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collection; +import java.util.List; +import java.util.stream.Collectors; +import org.apache.beam.model.pipeline.v1.ExternalTransforms.BuilderMethod; +import org.apache.beam.model.pipeline.v1.ExternalTransforms.ExpansionMethods; +import org.apache.beam.model.pipeline.v1.ExternalTransforms.JavaClassLookupPayload; +import org.apache.beam.model.pipeline.v1.RunnerApi.FunctionSpec; +import org.apache.beam.model.pipeline.v1.SchemaApi; +import org.apache.beam.repackaged.core.org.apache.commons.lang3.ClassUtils; +import org.apache.beam.sdk.coders.RowCoder; +import org.apache.beam.sdk.expansion.service.ExpansionService.TransformProvider; +import org.apache.beam.sdk.schemas.JavaFieldSchema; +import org.apache.beam.sdk.schemas.NoSuchSchemaException; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.Field; +import org.apache.beam.sdk.schemas.Schema.TypeName; +import org.apache.beam.sdk.schemas.SchemaRegistry; +import org.apache.beam.sdk.schemas.SchemaTranslation; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.util.common.ReflectHelpers; +import org.apache.beam.sdk.values.PInput; +import org.apache.beam.sdk.values.POutput; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import 
org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; +import org.checkerframework.checker.nullness.qual.Nullable; + +/** + * A transform provider that can be used to directly instantiate a transform using Java class name + * and builder methods. + * + * @param input {@link PInput} type of the transform + * @param output {@link POutput} type of the transform + */ +@SuppressWarnings({"argument.type.incompatible", "assignment.type.incompatible"}) +@SuppressFBWarnings("UWF_UNWRITTEN_PUBLIC_OR_PROTECTED_FIELD") +class JavaClassLookupTransformProvider + implements TransformProvider { + + public static final String ALLOW_LIST_VERSION = "v1"; + private static final SchemaRegistry SCHEMA_REGISTRY = SchemaRegistry.createDefault(); + private final AllowList allowList; + + public JavaClassLookupTransformProvider(AllowList allowList) { + if (!allowList.getVersion().equals(ALLOW_LIST_VERSION)) { + throw new IllegalArgumentException("Unknown allow-list version"); + } + this.allowList = allowList; + } + + @Override + public PTransform getTransform(FunctionSpec spec) { + JavaClassLookupPayload payload; + try { + payload = JavaClassLookupPayload.parseFrom(spec.getPayload()); + } catch (InvalidProtocolBufferException e) { + throw new IllegalArgumentException( + "Invalid payload type for URN " + getUrn(ExpansionMethods.Enum.JAVA_CLASS_LOOKUP), e); + } + + String className = payload.getClassName(); + try { + AllowedClass allowlistClass = null; + if (this.allowList != null) { + for (AllowedClass cls : this.allowList.getAllowedClasses()) { + if (cls.getClassName().equals(className)) { + if (allowlistClass != null) { + throw new IllegalArgumentException( + "Found two matching allowlist classes " + allowlistClass + " and " + cls); + } + allowlistClass = cls; + } + } + } + if (allowlistClass == null) { + throw new UnsupportedOperationException( + "The provided allow list does not enable expanding a transform class by the name " + + className + + "."); + } + Class> transformClass = + (Class>) + ReflectHelpers.findClassLoader().loadClass(className); + PTransform transform; + Row constructorRow = + decodeRow(payload.getConstructorSchema(), payload.getConstructorPayload()); + if (payload.getConstructorMethod().isEmpty()) { + Constructor[] constructors = transformClass.getConstructors(); + Constructor> constructor = + findMappingConstructor(constructors, payload); + Object[] parameterValues = + getParameterValues( + constructor.getParameters(), + constructorRow, + constructor.getGenericParameterTypes()); + transform = (PTransform) constructor.newInstance(parameterValues); + } else { + Method[] methods = transformClass.getMethods(); + Method method = findMappingConstructorMethod(methods, payload, allowlistClass); + Object[] parameterValues = + getParameterValues( + method.getParameters(), constructorRow, method.getGenericParameterTypes()); + transform = (PTransform) method.invoke(null /* static */, parameterValues); + } + return applyBuilderMethods(transform, payload, allowlistClass); + } catch (ClassNotFoundException e) { + throw new IllegalArgumentException("Could not find class " + className, e); + } catch (InstantiationException + | IllegalArgumentException + | IllegalAccessException + | InvocationTargetException e) { + throw new IllegalArgumentException("Could not instantiate class " + className, e); + } + } + + private PTransform applyBuilderMethods( + PTransform transform, + JavaClassLookupPayload payload, + AllowedClass allowListClass) { + for (BuilderMethod builderMethod : 
payload.getBuilderMethodsList()) { + Method method = getMethod(transform, builderMethod, allowListClass); + try { + Row builderMethodRow = decodeRow(builderMethod.getSchema(), builderMethod.getPayload()); + transform = + (PTransform) + method.invoke( + transform, + getParameterValues( + method.getParameters(), + builderMethodRow, + method.getGenericParameterTypes())); + } catch (IllegalAccessException | InvocationTargetException e) { + throw new IllegalArgumentException( + "Could not invoke the builder method " + + builderMethod + + " on transform " + + transform + + " with parameter schema " + + builderMethod.getSchema(), + e); + } + } + + return transform; + } + + private boolean isBuilderMethodForName( + Method method, String nameFromPayload, AllowedClass allowListClass) { + // Lookup based on method annotations + for (Annotation annotation : method.getAnnotations()) { + if (annotation instanceof MultiLanguageBuilderMethod) { + if (nameFromPayload.equals(((MultiLanguageBuilderMethod) annotation).name())) { + if (allowListClass.getAllowedBuilderMethods().contains(nameFromPayload)) { + return true; + } else { + throw new RuntimeException( + "Builder method " + nameFromPayload + " has to be explicitly allowed"); + } + } + } + } + + // Lookup based on the method name. + boolean match = method.getName().equals(nameFromPayload); + String consideredMethodName = method.getName(); + + // We provide a simplification for common Java builder pattern naming convention where builder + // methods start with "with". In this case, for a builder method name in the form "withXyz", + // users may just use "xyz". If additional updates to the method name are needed the transform + // has to be updated by adding annotations. + if (!match && consideredMethodName.length() > 4 && consideredMethodName.startsWith("with")) { + consideredMethodName = + consideredMethodName.substring(4, 5).toLowerCase() + consideredMethodName.substring(5); + match = consideredMethodName.equals(nameFromPayload); + } + if (match && !allowListClass.getAllowedBuilderMethods().contains(consideredMethodName)) { + throw new RuntimeException( + "Builder method name " + consideredMethodName + " has to be explicitly allowed"); + } + return match; + } + + private Method getMethod( + PTransform transform, + BuilderMethod builderMethod, + AllowedClass allowListClass) { + + Row builderMethodRow = decodeRow(builderMethod.getSchema(), builderMethod.getPayload()); + + List matchingMethods = + Arrays.stream(transform.getClass().getMethods()) + .filter(m -> isBuilderMethodForName(m, builderMethod.getName(), allowListClass)) + .filter(m -> parametersCompatible(m.getParameters(), builderMethodRow)) + .filter(m -> PTransform.class.isAssignableFrom(m.getReturnType())) + .collect(Collectors.toList()); + + if (matchingMethods.size() != 1) { + throw new RuntimeException( + "Expected to find exactly one matching method in transform " + + transform + + " for BuilderMethod" + + builderMethod + + " but found " + + matchingMethods.size()); + } + return matchingMethods.get(0); + } + + private static boolean isPrimitiveOrWrapperOrString(java.lang.Class type) { + return ClassUtils.isPrimitiveOrWrapper(type) || type == String.class; + } + + private Schema getParameterSchema(Class parameterClass) { + Schema parameterSchema; + try { + parameterSchema = SCHEMA_REGISTRY.getSchema(parameterClass); + } catch (NoSuchSchemaException e) { + + SCHEMA_REGISTRY.registerSchemaProvider(parameterClass, new JavaFieldSchema()); + try { + parameterSchema = 
SCHEMA_REGISTRY.getSchema(parameterClass); + } catch (NoSuchSchemaException e1) { + throw new RuntimeException(e1); + } + if (parameterSchema != null && parameterSchema.getFieldCount() == 0) { + throw new RuntimeException( + "Could not determine a valid schema for parameter class " + parameterClass); + } + } + return parameterSchema; + } + + private boolean parametersCompatible( + java.lang.reflect.Parameter[] methodParameters, Row constructorRow) { + Schema constructorSchema = constructorRow.getSchema(); + if (methodParameters.length != constructorSchema.getFieldCount()) { + return false; + } + + for (int i = 0; i < methodParameters.length; i++) { + java.lang.reflect.Parameter parameterFromReflection = methodParameters[i]; + Field parameterFromPayload = constructorSchema.getField(i); + + String paramNameFromReflection = parameterFromReflection.getName(); + if (!paramNameFromReflection.startsWith("arg") + && !paramNameFromReflection.equals(parameterFromPayload.getName())) { + // Parameter name through reflection is from the class file (not through synthesizing, + // hence we can validate names) + return false; + } + + Class parameterClass = parameterFromReflection.getType(); + if (isPrimitiveOrWrapperOrString(parameterClass)) { + continue; + } + + // We perform additional validation for arrays and non-primitive types. + if (parameterClass.isArray()) { + Class arrayFieldClass = parameterClass.getComponentType(); + if (parameterFromPayload.getType().getTypeName() != TypeName.ARRAY) { + throw new RuntimeException( + "Expected a schema with a single array field but received " + + parameterFromPayload.getType().getTypeName()); + } + + // Following is a best-effort validation that may not cover all cases. Idea is to resolve + // ambiguities as much as possible to determine an exact match for the given set of + // parameters. If there are ambiguities, the expansion will fail. 
+ if (!isPrimitiveOrWrapperOrString(arrayFieldClass)) { + @Nullable Collection values = constructorRow.getArray(i); + Schema arrayFieldSchema = getParameterSchema(arrayFieldClass); + if (arrayFieldSchema == null) { + throw new RuntimeException("Could not determine a schema for type " + arrayFieldClass); + } + if (values != null) { + @Nullable Row firstItem = values.iterator().next(); + if (firstItem != null && !(firstItem.getSchema().assignableTo(arrayFieldSchema))) { + return false; + } + } + } + } else if (constructorRow.getValue(i) instanceof Row) { + @Nullable Row parameterRow = constructorRow.getRow(i); + Schema schema = getParameterSchema(parameterClass); + if (schema == null) { + throw new RuntimeException("Could not determine a schema for type " + parameterClass); + } + if (parameterRow != null && !parameterRow.getSchema().assignableTo(schema)) { + return false; + } + } + } + return true; + } + + private @Nullable Object getDecodedValueFromRow( + Class type, Object valueFromRow, @Nullable Type genericType) { + if (isPrimitiveOrWrapperOrString(type)) { + if (!isPrimitiveOrWrapperOrString(valueFromRow.getClass())) { + throw new IllegalArgumentException( + "Expected a Java primitive value but received " + valueFromRow); + } + return valueFromRow; + } else if (type.isArray()) { + Class arrayComponentClass = type.getComponentType(); + return getDecodedArrayValueFromRow(arrayComponentClass, valueFromRow); + } else if (Collection.class.isAssignableFrom(type)) { + List originalList = (List) valueFromRow; + List decodedList = new ArrayList<>(); + for (Object obj : originalList) { + if (genericType instanceof ParameterizedType) { + Class elementType = + (Class) ((ParameterizedType) genericType).getActualTypeArguments()[0]; + decodedList.add(getDecodedValueFromRow(elementType, obj, null)); + } else { + throw new RuntimeException("Could not determine the generic type of the list"); + } + } + return decodedList; + } else if (valueFromRow instanceof Row) { + Row row = (Row) valueFromRow; + SerializableFunction fromRowFunc; + try { + fromRowFunc = SCHEMA_REGISTRY.getFromRowFunction(type); + } catch (NoSuchSchemaException e) { + throw new IllegalArgumentException( + "Could not determine the row function for class " + type, e); + } + return fromRowFunc.apply(row); + } + throw new RuntimeException("Could not decode the value from Row " + valueFromRow); + } + + private Object[] getParameterValues( + java.lang.reflect.Parameter[] parameters, Row constructorRow, Type[] genericTypes) { + ArrayList parameterValues = new ArrayList<>(); + for (int i = 0; i < parameters.length; ++i) { + java.lang.reflect.Parameter parameter = parameters[i]; + Class parameterClass = parameter.getType(); + Object parameterValue = + getDecodedValueFromRow(parameterClass, constructorRow.getValue(i), genericTypes[i]); + parameterValues.add(parameterValue); + } + + return parameterValues.toArray(); + } + + private Object[] getDecodedArrayValueFromRow(Class arrayComponentType, Object valueFromRow) { + List originalValues = (List) valueFromRow; + List decodedValues = new ArrayList<>(); + for (Object obj : originalValues) { + decodedValues.add(getDecodedValueFromRow(arrayComponentType, obj, null)); + } + + // We have to construct and return an array of the correct type. Otherwise Java reflection + // constructor/method invocations that use the returned value may consider the array as varargs + // (different parameters). 
+ Object valueTypeArray = Array.newInstance(arrayComponentType, decodedValues.size()); + for (int i = 0; i < decodedValues.size(); i++) { + Array.set(valueTypeArray, i, arrayComponentType.cast(decodedValues.get(i))); + } + return (Object[]) valueTypeArray; + } + + private Constructor> findMappingConstructor( + Constructor[] constructors, JavaClassLookupPayload payload) { + Row constructorRow = decodeRow(payload.getConstructorSchema(), payload.getConstructorPayload()); + + List> mappingConstructors = + Arrays.stream(constructors) + .filter(c -> c.getParameterCount() == payload.getConstructorSchema().getFieldsCount()) + .filter(c -> parametersCompatible(c.getParameters(), constructorRow)) + .collect(Collectors.toList()); + if (mappingConstructors.size() != 1) { + throw new RuntimeException( + "Expected to find a single mapping constructor but found " + mappingConstructors.size()); + } + return (Constructor>) mappingConstructors.get(0); + } + + private boolean isConstructorMethodForName( + Method method, String nameFromPayload, AllowedClass allowListClass) { + for (Annotation annotation : method.getAnnotations()) { + if (annotation instanceof MultiLanguageConstructorMethod) { + if (nameFromPayload.equals(((MultiLanguageConstructorMethod) annotation).name())) { + if (allowListClass.getAllowedConstructorMethods().contains(nameFromPayload)) { + return true; + } else { + throw new RuntimeException( + "Constructor method " + nameFromPayload + " needs to be explicitly allowed"); + } + } + } + } + if (method.getName().equals(nameFromPayload)) { + if (allowListClass.getAllowedConstructorMethods().contains(nameFromPayload)) { + return true; + } else { + throw new RuntimeException( + "Constructor method " + nameFromPayload + " needs to be explicitly allowed"); + } + } + return false; + } + + private Method findMappingConstructorMethod( + Method[] methods, JavaClassLookupPayload payload, AllowedClass allowListClass) { + + Row constructorRow = decodeRow(payload.getConstructorSchema(), payload.getConstructorPayload()); + + List mappingConstructorMethods = + Arrays.stream(methods) + .filter( + m -> isConstructorMethodForName(m, payload.getConstructorMethod(), allowListClass)) + .filter(m -> m.getParameterCount() == payload.getConstructorSchema().getFieldsCount()) + .filter(m -> parametersCompatible(m.getParameters(), constructorRow)) + .collect(Collectors.toList()); + + if (mappingConstructorMethods.size() != 1) { + throw new RuntimeException( + "Expected to find a single mapping constructor method but found " + + mappingConstructorMethods.size() + + " Payload was " + + payload); + } + return mappingConstructorMethods.get(0); + } + + @AutoValue + public abstract static class AllowList { + + public abstract String getVersion(); + + public abstract List getAllowedClasses(); + + @JsonCreator + static AllowList create( + @JsonProperty("version") String version, + @JsonProperty("allowedClasses") @javax.annotation.Nullable + List allowedClasses) { + if (allowedClasses == null) { + allowedClasses = new ArrayList<>(); + } + return new AutoValue_JavaClassLookupTransformProvider_AllowList(version, allowedClasses); + } + } + + @AutoValue + public abstract static class AllowedClass { + + public abstract String getClassName(); + + public abstract List getAllowedBuilderMethods(); + + public abstract List getAllowedConstructorMethods(); + + @JsonCreator + static AllowedClass create( + @JsonProperty("className") String className, + @JsonProperty("allowedBuilderMethods") @javax.annotation.Nullable + List 
allowedBuilderMethods, + @JsonProperty("allowedConstructorMethods") @javax.annotation.Nullable + List allowedConstructorMethods) { + if (allowedBuilderMethods == null) { + allowedBuilderMethods = new ArrayList<>(); + } + if (allowedConstructorMethods == null) { + allowedConstructorMethods = new ArrayList<>(); + } + return new AutoValue_JavaClassLookupTransformProvider_AllowedClass( + className, allowedBuilderMethods, allowedConstructorMethods); + } + } + + static Row decodeRow(SchemaApi.Schema schema, ByteString payload) { + Schema payloadSchema = SchemaTranslation.schemaFromProto(schema); + + if (payloadSchema.getFieldCount() == 0) { + return Row.withSchema(Schema.of()).build(); + } + + Row row; + try { + row = RowCoder.of(payloadSchema).decode(payload.newInput()); + } catch (IOException e) { + throw new RuntimeException("Error decoding payload", e); + } + return row; + } +} diff --git a/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/MultiLanguageBuilderMethod.java b/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/MultiLanguageBuilderMethod.java new file mode 100644 index 000000000000..3ee9ef5a7dac --- /dev/null +++ b/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/MultiLanguageBuilderMethod.java @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.expansion.service; + +import java.lang.annotation.Documented; +import java.lang.annotation.ElementType; +import java.lang.annotation.Retention; +import java.lang.annotation.RetentionPolicy; +import java.lang.annotation.Target; + +@Documented +@Target({ElementType.METHOD}) +@Retention(RetentionPolicy.RUNTIME) +public @interface MultiLanguageBuilderMethod { + String name(); +} diff --git a/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/MultiLanguageConstructorMethod.java b/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/MultiLanguageConstructorMethod.java new file mode 100644 index 000000000000..e89f460edd86 --- /dev/null +++ b/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/MultiLanguageConstructorMethod.java @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.expansion.service; + +import java.lang.annotation.Documented; +import java.lang.annotation.ElementType; +import java.lang.annotation.Retention; +import java.lang.annotation.RetentionPolicy; +import java.lang.annotation.Target; + +@Documented +@Target({ElementType.METHOD}) +@Retention(RetentionPolicy.RUNTIME) +public @interface MultiLanguageConstructorMethod { + String name(); +} diff --git a/sdks/java/expansion-service/src/test/java/org/apache/beam/sdk/expansion/service/ExpansionServerTest.java b/sdks/java/expansion-service/src/test/java/org/apache/beam/sdk/expansion/service/ExpansionServerTest.java index 51ed82682e7d..c5b21f9c28c1 100644 --- a/sdks/java/expansion-service/src/test/java/org/apache/beam/sdk/expansion/service/ExpansionServerTest.java +++ b/sdks/java/expansion-service/src/test/java/org/apache/beam/sdk/expansion/service/ExpansionServerTest.java @@ -18,9 +18,11 @@ package org.apache.beam.sdk.expansion.service; import static org.hamcrest.MatcherAssert.assertThat; +import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.greaterThan; import static org.hamcrest.core.Is.is; +import org.apache.beam.sdk.options.PortablePipelineOptions; import org.junit.Test; /** Tests for {@link ExpansionServer}. */ @@ -45,4 +47,20 @@ public void testHostPortAvailableAfterClose() throws Exception { assertThat(expansionServer.getHost(), is("localhost")); assertThat(expansionServer.getPort(), greaterThan(0)); } + + @Test + public void testPassingPipelineArguments() { + String[] args = { + "--defaultEnvironmentType=PROCESS", + "--defaultEnvironmentConfig={\"command\": \"/opt/apache/beam/boot\"}" + }; + ExpansionService service = new ExpansionService(args); + assertThat( + service + .createPipeline() + .getOptions() + .as(PortablePipelineOptions.class) + .getDefaultEnvironmentType(), + equalTo("PROCESS")); + } } diff --git a/sdks/java/expansion-service/src/test/java/org/apache/beam/sdk/expansion/service/ExpansionServiceTest.java b/sdks/java/expansion-service/src/test/java/org/apache/beam/sdk/expansion/service/ExpansionServiceTest.java index f37cdbdefb56..e8ecf469b784 100644 --- a/sdks/java/expansion-service/src/test/java/org/apache/beam/sdk/expansion/service/ExpansionServiceTest.java +++ b/sdks/java/expansion-service/src/test/java/org/apache/beam/sdk/expansion/service/ExpansionServiceTest.java @@ -31,7 +31,6 @@ import com.google.auto.service.AutoService; import com.google.auto.value.AutoValue; -import edu.umd.cs.findbugs.annotations.Nullable; import java.io.IOException; import java.util.Collections; import java.util.HashSet; @@ -55,11 +54,12 @@ import org.apache.beam.sdk.transforms.Count; import org.apache.beam.sdk.transforms.Impulse; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; +import org.checkerframework.checker.nullness.qual.Nullable; import org.hamcrest.Matchers; import org.junit.Test; @@ -90,7 +90,8 @@ public class ExpansionServiceTest { /** Registers a single test transformation. */ @AutoService(ExpansionService.ExpansionServiceRegistrar.class) - public static class TestTransforms implements ExpansionService.ExpansionServiceRegistrar { + public static class TestTransformRegistrar implements ExpansionService.ExpansionServiceRegistrar { + @Override public Map knownTransforms() { return ImmutableMap.of(TEST_URN, spec -> Count.perElement()); @@ -140,9 +141,9 @@ public void testConstruct() { } @Test - public void testConstructGenerateSequence() { + public void testConstructGenerateSequenceWithRegistration() { ExternalTransforms.ExternalConfigurationPayload payload = - encodeRow( + encodeRowIntoExternalConfigurationPayload( Row.withSchema( Schema.of( Field.of("start", FieldType.INT64), @@ -176,7 +177,7 @@ public void testConstructGenerateSequence() { @Test public void testCompoundCodersForExternalConfiguration_setters() throws Exception { ExternalTransforms.ExternalConfigurationPayload externalConfig = - encodeRow( + encodeRowIntoExternalConfigurationPayload( Row.withSchema( Schema.of( Field.nullable("config_key1", FieldType.INT64), @@ -253,7 +254,7 @@ public void setConfigKey4(@Nullable Map> configKey4) { @Test public void testCompoundCodersForExternalConfiguration_schemas() throws Exception { ExternalTransforms.ExternalConfigurationPayload externalConfig = - encodeRow( + encodeRowIntoExternalConfigurationPayload( Row.withSchema( Schema.of( Field.nullable("configKey1", FieldType.INT64), @@ -320,7 +321,7 @@ abstract static class TestConfigSchema { @Test public void testExternalConfiguration_simpleSchema() throws Exception { ExternalTransforms.ExternalConfigurationPayload externalConfig = - encodeRow( + encodeRowIntoExternalConfigurationPayload( Row.withSchema( Schema.of( Field.of("bar", FieldType.STRING), @@ -350,7 +351,8 @@ abstract static class TestConfigSimpleSchema { abstract List getList(); } - private static ExternalTransforms.ExternalConfigurationPayload encodeRow(Row row) { + private static ExternalTransforms.ExternalConfigurationPayload + encodeRowIntoExternalConfigurationPayload(Row row) { ByteString.Output outputStream = ByteString.newOutput(); try { SchemaCoder.of(row.getSchema()).encode(row, outputStream); diff --git a/sdks/java/expansion-service/src/test/java/org/apache/beam/sdk/expansion/service/ExternalTest.java b/sdks/java/expansion-service/src/test/java/org/apache/beam/sdk/expansion/service/ExternalTest.java index 564609b5b2e4..a18c51502224 100644 --- a/sdks/java/expansion-service/src/test/java/org/apache/beam/sdk/expansion/service/ExternalTest.java +++ b/sdks/java/expansion-service/src/test/java/org/apache/beam/sdk/expansion/service/ExternalTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.expansion.service; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import com.google.auto.service.AutoService; import java.io.IOException; @@ -42,8 +42,8 @@ import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.sdk.values.TupleTagList; import org.apache.beam.sdk.values.TypeDescriptors; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Server; -import 
org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ServerBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Server; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ServerBuilder; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.checkerframework.checker.nullness.qual.MonotonicNonNull; import org.checkerframework.checker.nullness.qual.RequiresNonNull; diff --git a/sdks/java/expansion-service/src/test/java/org/apache/beam/sdk/expansion/service/JavaClassLookupTransformProviderTest.java b/sdks/java/expansion-service/src/test/java/org/apache/beam/sdk/expansion/service/JavaClassLookupTransformProviderTest.java new file mode 100644 index 000000000000..e4e6de7c9ffc --- /dev/null +++ b/sdks/java/expansion-service/src/test/java/org/apache/beam/sdk/expansion/service/JavaClassLookupTransformProviderTest.java @@ -0,0 +1,1111 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.expansion.service; + +import static org.apache.beam.runners.core.construction.BeamUrns.getUrn; +import static org.hamcrest.MatcherAssert.assertThat; +import static org.hamcrest.Matchers.anyOf; +import static org.hamcrest.Matchers.containsString; +import static org.junit.Assert.assertArrayEquals; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertNotNull; +import static org.junit.Assert.assertThrows; +import static org.junit.Assert.assertTrue; + +import java.io.File; +import java.io.IOException; +import java.io.Serializable; +import java.net.URL; +import java.util.ArrayList; +import java.util.List; +import java.util.Map; +import org.apache.beam.model.expansion.v1.ExpansionApi; +import org.apache.beam.model.pipeline.v1.ExternalTransforms; +import org.apache.beam.model.pipeline.v1.ExternalTransforms.BuilderMethod; +import org.apache.beam.model.pipeline.v1.ExternalTransforms.ExpansionMethods; +import org.apache.beam.model.pipeline.v1.RunnerApi; +import org.apache.beam.model.pipeline.v1.RunnerApi.ParDoPayload; +import org.apache.beam.model.pipeline.v1.SchemaApi; +import org.apache.beam.runners.core.construction.ParDoTranslation; +import org.apache.beam.runners.core.construction.PipelineTranslation; +import org.apache.beam.sdk.Pipeline; +import org.apache.beam.sdk.options.PipelineOptionsFactory; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.Field; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.schemas.SchemaCoder; +import org.apache.beam.sdk.schemas.SchemaTranslation; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.PBegin; 
+import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.Resources; +import org.hamcrest.Matchers; +import org.junit.BeforeClass; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Tests for {@link JavaClassLookupTransformProvider}. */ +@RunWith(JUnit4.class) +public class JavaClassLookupTransformProviderTest { + + private static final String TEST_URN = "test:beam:transforms:count"; + + private static final String TEST_NAME = "TestName"; + + private static final String TEST_NAMESPACE = "namespace"; + + private static ExpansionService expansionService; + + @BeforeClass + public static void setupExpansionService() { + PipelineOptionsFactory.register(ExpansionServiceOptions.class); + URL allowListFile = Resources.getResource("./test_allowlist.yaml"); + System.out.println("Exists: " + new File(allowListFile.getPath()).exists()); + expansionService = + new ExpansionService( + new String[] {"--javaClassLookupAllowlistFile=" + allowListFile.getPath()}); + } + + static class DummyDoFn extends DoFn { + String strField1; + String strField2; + int intField1; + Double doubleWrapperField; + String[] strArrayField; + DummyComplexType complexTypeField; + DummyComplexType[] complexTypeArrayField; + List strListField; + List complexTypeListField; + + private DummyDoFn( + String strField1, + String strField2, + int intField1, + Double doubleWrapperField, + String[] strArrayField, + DummyComplexType complexTypeField, + DummyComplexType[] complexTypeArrayField, + List strListField, + List complexTypeListField) { + this.intField1 = intField1; + this.strField1 = strField1; + this.strField2 = strField2; + this.doubleWrapperField = doubleWrapperField; + this.strArrayField = strArrayField; + this.complexTypeField = complexTypeField; + this.complexTypeArrayField = complexTypeArrayField; + this.strListField = strListField; + this.complexTypeListField = complexTypeListField; + } + + @ProcessElement + public void processElement(ProcessContext c) { + c.output(c.element()); + } + } + + public static class DummyComplexType implements Serializable { + String complexTypeStrField; + int complexTypeIntField; + + public DummyComplexType() {} + + public DummyComplexType(String complexTypeStrField, int complexTypeIntField) { + this.complexTypeStrField = complexTypeStrField; + this.complexTypeIntField = complexTypeIntField; + } + + @Override + public int hashCode() { + return this.complexTypeStrField.hashCode() + this.complexTypeIntField * 31; + } + + @Override + public boolean equals(Object obj) { + if (!(obj instanceof DummyComplexType)) { + return false; + } + DummyComplexType toCompare = (DummyComplexType) obj; + return (this.complexTypeIntField == toCompare.complexTypeIntField) + && (this.complexTypeStrField.equals(toCompare.complexTypeStrField)); + } + } + + public static class DummyTransform extends PTransform> { + String strField1; + String strField2; + int intField1; + Double doubleWrapperField; + String[] strArrayField; + DummyComplexType complexTypeField; + DummyComplexType[] complexTypeArrayField; + List strListField; + List 
complexTypeListField; + + @Override + public PCollection expand(PBegin input) { + return input + .apply("MyCreateTransform", Create.of("aaa", "bbb", "ccc")) + .apply( + "MyParDoTransform", + ParDo.of( + new DummyDoFn( + this.strField1, + this.strField2, + this.intField1, + this.doubleWrapperField, + this.strArrayField, + this.complexTypeField, + this.complexTypeArrayField, + this.strListField, + this.complexTypeListField))); + } + } + + public static class DummyTransformWithConstructor extends DummyTransform { + + public DummyTransformWithConstructor(String strField1) { + this.strField1 = strField1; + } + } + + public static class DummyTransformWithConstructorAndBuilderMethods extends DummyTransform { + + public DummyTransformWithConstructorAndBuilderMethods(String strField1) { + this.strField1 = strField1; + } + + public DummyTransformWithConstructorAndBuilderMethods withStrField2(String strField2) { + this.strField2 = strField2; + return this; + } + + public DummyTransformWithConstructorAndBuilderMethods withIntField1(int intField1) { + this.intField1 = intField1; + return this; + } + } + + public static class DummyTransformWithMultiArgumentBuilderMethod extends DummyTransform { + + public DummyTransformWithMultiArgumentBuilderMethod(String strField1) { + this.strField1 = strField1; + } + + public DummyTransformWithMultiArgumentBuilderMethod withFields( + String strField2, int intField1) { + this.strField2 = strField2; + this.intField1 = intField1; + return this; + } + } + + public static class DummyTransformWithMultiArgumentConstructor extends DummyTransform { + + public DummyTransformWithMultiArgumentConstructor(String strField1, String strField2) { + this.strField1 = strField1; + this.strField2 = strField2; + } + } + + public static class DummyTransformWithConstructorMethod extends DummyTransform { + + public static DummyTransformWithConstructorMethod from(String strField1) { + DummyTransformWithConstructorMethod transform = new DummyTransformWithConstructorMethod(); + transform.strField1 = strField1; + return transform; + } + } + + public static class DummyTransformWithConstructorMethodAndBuilderMethods extends DummyTransform { + + public static DummyTransformWithConstructorMethodAndBuilderMethods from(String strField1) { + DummyTransformWithConstructorMethodAndBuilderMethods transform = + new DummyTransformWithConstructorMethodAndBuilderMethods(); + transform.strField1 = strField1; + return transform; + } + + public DummyTransformWithConstructorMethodAndBuilderMethods withStrField2(String strField2) { + this.strField2 = strField2; + return this; + } + + public DummyTransformWithConstructorMethodAndBuilderMethods withIntField1(int intField1) { + this.intField1 = intField1; + return this; + } + } + + public static class DummyTransformWithMultiLanguageAnnotations extends DummyTransform { + + @MultiLanguageConstructorMethod(name = "create_transform") + public static DummyTransformWithMultiLanguageAnnotations from(String strField1) { + DummyTransformWithMultiLanguageAnnotations transform = + new DummyTransformWithMultiLanguageAnnotations(); + transform.strField1 = strField1; + return transform; + } + + @MultiLanguageBuilderMethod(name = "abc") + public DummyTransformWithMultiLanguageAnnotations withStrField2(String strField2) { + this.strField2 = strField2; + return this; + } + + @MultiLanguageBuilderMethod(name = "xyz") + public DummyTransformWithMultiLanguageAnnotations withIntField1(int intField1) { + this.intField1 = intField1; + return this; + } + } + + public static class 
DummyTransformWithWrapperTypes extends DummyTransform { + public DummyTransformWithWrapperTypes(String strField1) { + this.strField1 = strField1; + } + + public DummyTransformWithWrapperTypes withDoubleWrapperField(Double doubleWrapperField) { + this.doubleWrapperField = doubleWrapperField; + return this; + } + } + + public static class DummyTransformWithComplexTypes extends DummyTransform { + public DummyTransformWithComplexTypes(String strField1) { + this.strField1 = strField1; + } + + public DummyTransformWithComplexTypes withComplexTypeField(DummyComplexType complexTypeField) { + this.complexTypeField = complexTypeField; + return this; + } + } + + public static class DummyTransformWithArray extends DummyTransform { + public DummyTransformWithArray(String strField1) { + this.strField1 = strField1; + } + + public DummyTransformWithArray withStrArrayField(String[] strArrayField) { + this.strArrayField = strArrayField; + return this; + } + } + + public static class DummyTransformWithList extends DummyTransform { + public DummyTransformWithList(String strField1) { + this.strField1 = strField1; + } + + public DummyTransformWithList withStrListField(List strListField) { + this.strListField = strListField; + return this; + } + } + + public static class DummyTransformWithComplexTypeArray extends DummyTransform { + public DummyTransformWithComplexTypeArray(String strField1) { + this.strField1 = strField1; + } + + public DummyTransformWithComplexTypeArray withComplexTypeArrayField( + DummyComplexType[] complexTypeArrayField) { + this.complexTypeArrayField = complexTypeArrayField; + return this; + } + } + + public static class DummyTransformWithComplexTypeList extends DummyTransform { + public DummyTransformWithComplexTypeList(String strField1) { + this.strField1 = strField1; + } + + public DummyTransformWithComplexTypeList withComplexTypeListField( + List complexTypeListField) { + this.complexTypeListField = complexTypeListField; + return this; + } + } + + void testClassLookupExpansionRequestConstruction( + ExternalTransforms.JavaClassLookupPayload payload, Map fieldsToVerify) { + Pipeline p = Pipeline.create(); + + RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(p); + + ExpansionApi.ExpansionRequest request = + ExpansionApi.ExpansionRequest.newBuilder() + .setComponents(pipelineProto.getComponents()) + .setTransform( + RunnerApi.PTransform.newBuilder() + .setUniqueName(TEST_NAME) + .setSpec( + RunnerApi.FunctionSpec.newBuilder() + .setUrn(getUrn(ExpansionMethods.Enum.JAVA_CLASS_LOOKUP)) + .setPayload(payload.toByteString()))) + .setNamespace(TEST_NAMESPACE) + .build(); + ExpansionApi.ExpansionResponse response = expansionService.expand(request); + RunnerApi.PTransform expandedTransform = response.getTransform(); + assertEquals(TEST_NAMESPACE + TEST_NAME, expandedTransform.getUniqueName()); + assertThat(expandedTransform.getInputsCount(), Matchers.is(0)); + assertThat(expandedTransform.getOutputsCount(), Matchers.is(1)); + assertEquals(2, expandedTransform.getSubtransformsCount()); + assertEquals(2, expandedTransform.getSubtransformsCount()); + assertThat( + expandedTransform.getSubtransforms(0), + anyOf(containsString("MyCreateTransform"), containsString("MyParDoTransform"))); + assertThat( + expandedTransform.getSubtransforms(1), + anyOf(containsString("MyCreateTransform"), containsString("MyParDoTransform"))); + + org.apache.beam.model.pipeline.v1.RunnerApi.PTransform userParDoTransform = null; + for (String transformId : response.getComponents().getTransformsMap().keySet()) { + 
if (transformId.contains("ParMultiDo-Dummy-")) { + userParDoTransform = response.getComponents().getTransformsMap().get(transformId); + } + } + assertNotNull(userParDoTransform); + ParDoPayload parDoPayload = null; + try { + parDoPayload = ParDoPayload.parseFrom(userParDoTransform.getSpec().getPayload()); + } catch (InvalidProtocolBufferException e) { + throw new RuntimeException(e); + } + assertNotNull(parDoPayload); + DummyDoFn doFn = + (DummyDoFn) + ParDoTranslation.doFnWithExecutionInformationFromProto(parDoPayload.getDoFn()) + .getDoFn(); + System.out.println("DoFn" + doFn); + + List verifiedFields = new ArrayList<>(); + if (fieldsToVerify.keySet().contains("strField1")) { + assertEquals(doFn.strField1, fieldsToVerify.get("strField1")); + verifiedFields.add("strField1"); + } + if (fieldsToVerify.keySet().contains("strField2")) { + assertEquals(doFn.strField2, fieldsToVerify.get("strField2")); + verifiedFields.add("strField2"); + } + if (fieldsToVerify.keySet().contains("intField1")) { + assertEquals(doFn.intField1, fieldsToVerify.get("intField1")); + verifiedFields.add("intField1"); + } + if (fieldsToVerify.keySet().contains("doubleWrapperField")) { + assertEquals(doFn.doubleWrapperField, fieldsToVerify.get("doubleWrapperField")); + verifiedFields.add("doubleWrapperField"); + } + if (fieldsToVerify.containsKey("complexTypeStrField")) { + assertEquals( + doFn.complexTypeField.complexTypeStrField, fieldsToVerify.get("complexTypeStrField")); + verifiedFields.add("complexTypeStrField"); + } + if (fieldsToVerify.containsKey("complexTypeIntField")) { + assertEquals( + doFn.complexTypeField.complexTypeIntField, fieldsToVerify.get("complexTypeIntField")); + verifiedFields.add("complexTypeIntField"); + } + + if (fieldsToVerify.keySet().contains("strArrayField")) { + assertArrayEquals(doFn.strArrayField, (String[]) fieldsToVerify.get("strArrayField")); + verifiedFields.add("strArrayField"); + } + + if (fieldsToVerify.keySet().contains("strListField")) { + assertEquals(doFn.strListField, (List) fieldsToVerify.get("strListField")); + verifiedFields.add("strListField"); + } + + if (fieldsToVerify.keySet().contains("complexTypeArrayField")) { + assertArrayEquals( + doFn.complexTypeArrayField, + (DummyComplexType[]) fieldsToVerify.get("complexTypeArrayField")); + verifiedFields.add("complexTypeArrayField"); + } + + if (fieldsToVerify.keySet().contains("complexTypeListField")) { + assertEquals(doFn.complexTypeListField, (List) fieldsToVerify.get("complexTypeListField")); + verifiedFields.add("complexTypeListField"); + } + + List unverifiedFields = new ArrayList<>(fieldsToVerify.keySet()); + unverifiedFields.removeAll(verifiedFields); + if (!unverifiedFields.isEmpty()) { + throw new RuntimeException("Failed to verify some fields: " + unverifiedFields); + } + } + + @Test + public void testJavaClassLookupWithConstructor() { + ExternalTransforms.JavaClassLookupPayload.Builder payloadBuilder = + ExternalTransforms.JavaClassLookupPayload.newBuilder(); + payloadBuilder.setClassName( + "org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$DummyTransformWithConstructor"); + + Row constructorRow = + Row.withSchema(Schema.of(Field.of("strField1", FieldType.STRING))) + .withFieldValue("strField1", "test_str_1") + .build(); + + payloadBuilder.setConstructorSchema(getProtoSchemaFromRow(constructorRow)); + payloadBuilder.setConstructorPayload(getProtoPayloadFromRow(constructorRow)); + + testClassLookupExpansionRequestConstruction( + payloadBuilder.build(), ImmutableMap.of("strField1", 
"test_str_1")); + } + + @Test + public void testJavaClassLookupWithConstructorMethod() { + ExternalTransforms.JavaClassLookupPayload.Builder payloadBuilder = + ExternalTransforms.JavaClassLookupPayload.newBuilder(); + payloadBuilder.setClassName( + "org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$DummyTransformWithConstructorMethod"); + + payloadBuilder.setConstructorMethod("from"); + Row constructorRow = + Row.withSchema(Schema.of(Field.of("strField1", FieldType.STRING))) + .withFieldValue("strField1", "test_str_1") + .build(); + + payloadBuilder.setConstructorSchema(getProtoSchemaFromRow(constructorRow)); + payloadBuilder.setConstructorPayload(getProtoPayloadFromRow(constructorRow)); + + testClassLookupExpansionRequestConstruction( + payloadBuilder.build(), ImmutableMap.of("strField1", "test_str_1")); + } + + @Test + public void testJavaClassLookupWithConstructorAndBuilderMethods() { + ExternalTransforms.JavaClassLookupPayload.Builder payloadBuilder = + ExternalTransforms.JavaClassLookupPayload.newBuilder(); + payloadBuilder.setClassName( + "org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$DummyTransformWithConstructorAndBuilderMethods"); + + Row constructorRow = + Row.withSchema(Schema.of(Field.of("strField1", FieldType.STRING))) + .withFieldValue("strField1", "test_str_1") + .build(); + + payloadBuilder.setConstructorSchema(getProtoSchemaFromRow(constructorRow)); + payloadBuilder.setConstructorPayload(getProtoPayloadFromRow(constructorRow)); + + BuilderMethod.Builder builderMethodBuilder = BuilderMethod.newBuilder(); + builderMethodBuilder.setName("withStrField2"); + Row builderMethodRow = + Row.withSchema(Schema.of(Field.of("strField2", FieldType.STRING))) + .withFieldValue("strField2", "test_str_2") + .build(); + builderMethodBuilder.setSchema(getProtoSchemaFromRow(builderMethodRow)); + builderMethodBuilder.setPayload(getProtoPayloadFromRow(builderMethodRow)); + + payloadBuilder.addBuilderMethods(builderMethodBuilder); + + builderMethodBuilder = BuilderMethod.newBuilder(); + builderMethodBuilder.setName("withIntField1"); + builderMethodRow = + Row.withSchema(Schema.of(Field.of("intField1", FieldType.INT32))) + .withFieldValue("intField1", 10) + .build(); + builderMethodBuilder.setSchema(getProtoSchemaFromRow(builderMethodRow)); + builderMethodBuilder.setPayload(getProtoPayloadFromRow(builderMethodRow)); + + payloadBuilder.addBuilderMethods(builderMethodBuilder); + + testClassLookupExpansionRequestConstruction( + payloadBuilder.build(), + ImmutableMap.of("strField1", "test_str_1", "strField2", "test_str_2", "intField1", 10)); + } + + @Test + public void testJavaClassLookupWithMultiArgumentConstructor() { + ExternalTransforms.JavaClassLookupPayload.Builder payloadBuilder = + ExternalTransforms.JavaClassLookupPayload.newBuilder(); + payloadBuilder.setClassName( + "org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$DummyTransformWithMultiArgumentConstructor"); + + Row constructorRow = + Row.withSchema( + Schema.of( + Field.of("strField1", FieldType.STRING), + Field.of("strField2", FieldType.STRING))) + .withFieldValue("strField1", "test_str_1") + .withFieldValue("strField2", "test_str_2") + .build(); + + payloadBuilder.setConstructorSchema(getProtoSchemaFromRow(constructorRow)); + payloadBuilder.setConstructorPayload(getProtoPayloadFromRow(constructorRow)); + + testClassLookupExpansionRequestConstruction( + payloadBuilder.build(), + ImmutableMap.of("strField1", "test_str_1", "strField2", "test_str_2")); + } + + 
@Test + public void testJavaClassLookupWithMultiArgumentBuilderMethod() { + ExternalTransforms.JavaClassLookupPayload.Builder payloadBuilder = + ExternalTransforms.JavaClassLookupPayload.newBuilder(); + payloadBuilder.setClassName( + "org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$DummyTransformWithMultiArgumentBuilderMethod"); + + Row constructorRow = + Row.withSchema(Schema.of(Field.of("strField1", FieldType.STRING))) + .withFieldValue("strField1", "test_str_1") + .build(); + + payloadBuilder.setConstructorSchema(getProtoSchemaFromRow(constructorRow)); + payloadBuilder.setConstructorPayload(getProtoPayloadFromRow(constructorRow)); + + BuilderMethod.Builder builderMethodBuilder = BuilderMethod.newBuilder(); + builderMethodBuilder.setName("withFields"); + Row builderMethodRow = + Row.withSchema( + Schema.of( + Field.of("strField2", FieldType.STRING), + Field.of("intField1", FieldType.INT32))) + .withFieldValue("strField2", "test_str_2") + .withFieldValue("intField1", 10) + .build(); + builderMethodBuilder.setSchema(getProtoSchemaFromRow(builderMethodRow)); + builderMethodBuilder.setPayload(getProtoPayloadFromRow(builderMethodRow)); + + payloadBuilder.addBuilderMethods(builderMethodBuilder); + + testClassLookupExpansionRequestConstruction( + payloadBuilder.build(), + ImmutableMap.of("strField1", "test_str_1", "strField2", "test_str_2", "intField1", 10)); + } + + @Test + public void testJavaClassLookupWithWrapperTypes() { + ExternalTransforms.JavaClassLookupPayload.Builder payloadBuilder = + ExternalTransforms.JavaClassLookupPayload.newBuilder(); + payloadBuilder.setClassName( + "org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$DummyTransformWithWrapperTypes"); + + Row constructorRow = + Row.withSchema(Schema.of(Field.of("strField1", FieldType.STRING))) + .withFieldValue("strField1", "test_str_1") + .build(); + + payloadBuilder.setConstructorSchema(getProtoSchemaFromRow(constructorRow)); + payloadBuilder.setConstructorPayload(getProtoPayloadFromRow(constructorRow)); + + BuilderMethod.Builder builderMethodBuilder = BuilderMethod.newBuilder(); + builderMethodBuilder.setName("withDoubleWrapperField"); + Row builderMethodRow = + Row.withSchema(Schema.of(Field.of("doubleWrapperField", FieldType.DOUBLE))) + .withFieldValue("doubleWrapperField", 123.56) + .build(); + builderMethodBuilder.setSchema(getProtoSchemaFromRow(builderMethodRow)); + builderMethodBuilder.setPayload(getProtoPayloadFromRow(builderMethodRow)); + + payloadBuilder.addBuilderMethods(builderMethodBuilder); + + testClassLookupExpansionRequestConstruction( + payloadBuilder.build(), ImmutableMap.of("doubleWrapperField", 123.56)); + } + + @Test + public void testJavaClassLookupWithComplexTypes() { + ExternalTransforms.JavaClassLookupPayload.Builder payloadBuilder = + ExternalTransforms.JavaClassLookupPayload.newBuilder(); + payloadBuilder.setClassName( + "org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$DummyTransformWithComplexTypes"); + + Row constructorRow = + Row.withSchema(Schema.of(Field.of("strField1", FieldType.STRING))) + .withFieldValue("strField1", "test_str_1") + .build(); + payloadBuilder.setConstructorSchema(getProtoSchemaFromRow(constructorRow)); + payloadBuilder.setConstructorPayload(getProtoPayloadFromRow(constructorRow)); + + Schema complexTypeSchema = + Schema.builder() + .addStringField("complexTypeStrField") + .addInt32Field("complexTypeIntField") + .build(); + + BuilderMethod.Builder builderMethodBuilder = BuilderMethod.newBuilder(); + 
builderMethodBuilder.setName("withComplexTypeField"); + + Row builderMethodParamRow = + Row.withSchema(complexTypeSchema) + .withFieldValue("complexTypeStrField", "complex_type_str_1") + .withFieldValue("complexTypeIntField", 123) + .build(); + + Schema builderMethodSchema = + Schema.builder().addRowField("complexTypeField", complexTypeSchema).build(); + Row builderMethodRow = + Row.withSchema(builderMethodSchema) + .withFieldValue("complexTypeField", builderMethodParamRow) + .build(); + builderMethodBuilder.setSchema(getProtoSchemaFromRow(builderMethodRow)); + builderMethodBuilder.setPayload(getProtoPayloadFromRow(builderMethodRow)); + + payloadBuilder.addBuilderMethods(builderMethodBuilder); + + testClassLookupExpansionRequestConstruction( + payloadBuilder.build(), + ImmutableMap.of("complexTypeStrField", "complex_type_str_1", "complexTypeIntField", 123)); + } + + @Test + public void testJavaClassLookupWithSimpleArrayType() { + ExternalTransforms.JavaClassLookupPayload.Builder payloadBuilder = + ExternalTransforms.JavaClassLookupPayload.newBuilder(); + payloadBuilder.setClassName( + "org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$DummyTransformWithArray"); + + Row constructorRow = + Row.withSchema(Schema.of(Field.of("strField1", FieldType.STRING))) + .withFieldValue("strField1", "test_str_1") + .build(); + + payloadBuilder.setConstructorSchema(getProtoSchemaFromRow(constructorRow)); + payloadBuilder.setConstructorPayload(getProtoPayloadFromRow(constructorRow)); + + BuilderMethod.Builder builderMethodBuilder = BuilderMethod.newBuilder(); + builderMethodBuilder.setName("withStrArrayField"); + + Schema builderMethodSchema = + Schema.builder().addArrayField("strArrayField", FieldType.STRING).build(); + + Row builderMethodRow = + Row.withSchema(builderMethodSchema) + .withFieldValue( + "strArrayField", ImmutableList.of("test_str_1", "test_str_2", "test_str_3")) + .build(); + + builderMethodBuilder.setSchema(getProtoSchemaFromRow(builderMethodRow)); + builderMethodBuilder.setPayload(getProtoPayloadFromRow(builderMethodRow)); + + payloadBuilder.addBuilderMethods(builderMethodBuilder); + + String[] resultArray = {"test_str_1", "test_str_2", "test_str_3"}; + testClassLookupExpansionRequestConstruction( + payloadBuilder.build(), ImmutableMap.of("strArrayField", resultArray)); + } + + @Test + public void testJavaClassLookupWithSimpleListType() { + ExternalTransforms.JavaClassLookupPayload.Builder payloadBuilder = + ExternalTransforms.JavaClassLookupPayload.newBuilder(); + payloadBuilder.setClassName( + "org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$DummyTransformWithList"); + + Row constructorRow = + Row.withSchema(Schema.of(Field.of("strField1", FieldType.STRING))) + .withFieldValue("strField1", "test_str_1") + .build(); + + payloadBuilder.setConstructorSchema(getProtoSchemaFromRow(constructorRow)); + payloadBuilder.setConstructorPayload(getProtoPayloadFromRow(constructorRow)); + + BuilderMethod.Builder builderMethodBuilder = BuilderMethod.newBuilder(); + builderMethodBuilder.setName("withStrListField"); + + Schema builderMethodSchema = + Schema.builder().addIterableField("strListField", FieldType.STRING).build(); + + Row builderMethodRow = + Row.withSchema(builderMethodSchema) + .withFieldValue( + "strListField", ImmutableList.of("test_str_1", "test_str_2", "test_str_3")) + .build(); + + builderMethodBuilder.setSchema(getProtoSchemaFromRow(builderMethodRow)); + builderMethodBuilder.setPayload(getProtoPayloadFromRow(builderMethodRow)); + + 
payloadBuilder.addBuilderMethods(builderMethodBuilder); + + List resultList = new ArrayList<>(); + resultList.add("test_str_1"); + resultList.add("test_str_2"); + resultList.add("test_str_3"); + testClassLookupExpansionRequestConstruction( + payloadBuilder.build(), ImmutableMap.of("strListField", resultList)); + } + + @Test + public void testJavaClassLookupWithComplexArrayType() { + ExternalTransforms.JavaClassLookupPayload.Builder payloadBuilder = + ExternalTransforms.JavaClassLookupPayload.newBuilder(); + payloadBuilder.setClassName( + "org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$DummyTransformWithComplexTypeArray"); + + Schema complexTypeSchema = + Schema.builder() + .addStringField("complexTypeStrField") + .addInt32Field("complexTypeIntField") + .build(); + + Schema builderMethodSchema = + Schema.builder() + .addArrayField("complexTypeArrayField", FieldType.row(complexTypeSchema)) + .build(); + + Row constructorRow = + Row.withSchema(Schema.of(Field.of("strField1", FieldType.STRING))) + .withFieldValue("strField1", "test_str_1") + .build(); + + payloadBuilder.setConstructorSchema(getProtoSchemaFromRow(constructorRow)); + payloadBuilder.setConstructorPayload(getProtoPayloadFromRow(constructorRow)); + + List complexTypeList = new ArrayList<>(); + complexTypeList.add( + Row.withSchema(complexTypeSchema) + .withFieldValue("complexTypeStrField", "complex_type_str_1") + .withFieldValue("complexTypeIntField", 123) + .build()); + complexTypeList.add( + Row.withSchema(complexTypeSchema) + .withFieldValue("complexTypeStrField", "complex_type_str_2") + .withFieldValue("complexTypeIntField", 456) + .build()); + complexTypeList.add( + Row.withSchema(complexTypeSchema) + .withFieldValue("complexTypeStrField", "complex_type_str_3") + .withFieldValue("complexTypeIntField", 789) + .build()); + + BuilderMethod.Builder builderMethodBuilder = BuilderMethod.newBuilder(); + builderMethodBuilder.setName("withComplexTypeArrayField"); + + Row builderMethodRow = + Row.withSchema(builderMethodSchema) + .withFieldValue("complexTypeArrayField", complexTypeList) + .build(); + + builderMethodBuilder.setSchema(getProtoSchemaFromRow(builderMethodRow)); + builderMethodBuilder.setPayload(getProtoPayloadFromRow(builderMethodRow)); + + payloadBuilder.addBuilderMethods(builderMethodBuilder); + + ArrayList resultList = new ArrayList<>(); + resultList.add(new DummyComplexType("complex_type_str_1", 123)); + resultList.add(new DummyComplexType("complex_type_str_2", 456)); + resultList.add(new DummyComplexType("complex_type_str_3", 789)); + + testClassLookupExpansionRequestConstruction( + payloadBuilder.build(), + ImmutableMap.of("complexTypeArrayField", resultList.toArray(new DummyComplexType[0]))); + } + + @Test + public void testJavaClassLookupWithComplexListType() { + ExternalTransforms.JavaClassLookupPayload.Builder payloadBuilder = + ExternalTransforms.JavaClassLookupPayload.newBuilder(); + payloadBuilder.setClassName( + "org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$DummyTransformWithComplexTypeList"); + + Schema complexTypeSchema = + Schema.builder() + .addStringField("complexTypeStrField") + .addInt32Field("complexTypeIntField") + .build(); + + Schema builderMethodSchema = + Schema.builder() + .addIterableField("complexTypeListField", FieldType.row(complexTypeSchema)) + .build(); + + Row constructorRow = + Row.withSchema(Schema.of(Field.of("strField1", FieldType.STRING))) + .withFieldValue("strField1", "test_str_1") + .build(); + + 
payloadBuilder.setConstructorSchema(getProtoSchemaFromRow(constructorRow)); + payloadBuilder.setConstructorPayload(getProtoPayloadFromRow(constructorRow)); + + List complexTypeList = new ArrayList<>(); + complexTypeList.add( + Row.withSchema(complexTypeSchema) + .withFieldValue("complexTypeStrField", "complex_type_str_1") + .withFieldValue("complexTypeIntField", 123) + .build()); + complexTypeList.add( + Row.withSchema(complexTypeSchema) + .withFieldValue("complexTypeStrField", "complex_type_str_2") + .withFieldValue("complexTypeIntField", 456) + .build()); + complexTypeList.add( + Row.withSchema(complexTypeSchema) + .withFieldValue("complexTypeStrField", "complex_type_str_3") + .withFieldValue("complexTypeIntField", 789) + .build()); + + BuilderMethod.Builder builderMethodBuilder = BuilderMethod.newBuilder(); + builderMethodBuilder.setName("withComplexTypeListField"); + + Row builderMethodRow = + Row.withSchema(builderMethodSchema) + .withFieldValue("complexTypeListField", complexTypeList) + .build(); + + builderMethodBuilder.setSchema(getProtoSchemaFromRow(builderMethodRow)); + builderMethodBuilder.setPayload(getProtoPayloadFromRow(builderMethodRow)); + + payloadBuilder.addBuilderMethods(builderMethodBuilder); + + ArrayList resultList = new ArrayList<>(); + resultList.add(new DummyComplexType("complex_type_str_1", 123)); + resultList.add(new DummyComplexType("complex_type_str_2", 456)); + resultList.add(new DummyComplexType("complex_type_str_3", 789)); + + testClassLookupExpansionRequestConstruction( + payloadBuilder.build(), ImmutableMap.of("complexTypeListField", resultList)); + } + + @Test + public void testJavaClassLookupWithConstructorMethodAndBuilderMethods() { + ExternalTransforms.JavaClassLookupPayload.Builder payloadBuilder = + ExternalTransforms.JavaClassLookupPayload.newBuilder(); + payloadBuilder.setClassName( + "org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$DummyTransformWithConstructorMethodAndBuilderMethods"); + payloadBuilder.setConstructorMethod("from"); + + Row constructorRow = + Row.withSchema(Schema.of(Field.of("strField1", FieldType.STRING))) + .withFieldValue("strField1", "test_str_1") + .build(); + + payloadBuilder.setConstructorSchema(getProtoSchemaFromRow(constructorRow)); + payloadBuilder.setConstructorPayload(getProtoPayloadFromRow(constructorRow)); + + BuilderMethod.Builder builderMethodBuilder = BuilderMethod.newBuilder(); + builderMethodBuilder.setName("withStrField2"); + + Row builderMethodRow = + Row.withSchema(Schema.of(Field.of("strField2", FieldType.STRING))) + .withFieldValue("strField2", "test_str_2") + .build(); + builderMethodBuilder.setSchema(getProtoSchemaFromRow(builderMethodRow)); + builderMethodBuilder.setPayload(getProtoPayloadFromRow(builderMethodRow)); + + payloadBuilder.addBuilderMethods(builderMethodBuilder); + + builderMethodBuilder = BuilderMethod.newBuilder(); + builderMethodBuilder.setName("withIntField1"); + + builderMethodRow = + Row.withSchema(Schema.of(Field.of("intField1", FieldType.INT32))) + .withFieldValue("intField1", 10) + .build(); + builderMethodBuilder.setSchema(getProtoSchemaFromRow(builderMethodRow)); + builderMethodBuilder.setPayload(getProtoPayloadFromRow(builderMethodRow)); + + payloadBuilder.addBuilderMethods(builderMethodBuilder); + + testClassLookupExpansionRequestConstruction( + payloadBuilder.build(), + ImmutableMap.of("strField1", "test_str_1", "strField2", "test_str_2", "intField1", 10)); + } + + @Test + public void testJavaClassLookupWithSimplifiedBuilderMethodNames() { + 
ExternalTransforms.JavaClassLookupPayload.Builder payloadBuilder = + ExternalTransforms.JavaClassLookupPayload.newBuilder(); + payloadBuilder.setClassName( + "org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$DummyTransformWithConstructorMethodAndBuilderMethods"); + payloadBuilder.setConstructorMethod("from"); + + Row constructorRow = + Row.withSchema(Schema.of(Field.of("strField1", FieldType.STRING))) + .withFieldValue("strField1", "test_str_1") + .build(); + + payloadBuilder.setConstructorSchema(getProtoSchemaFromRow(constructorRow)); + payloadBuilder.setConstructorPayload(getProtoPayloadFromRow(constructorRow)); + + BuilderMethod.Builder builderMethodBuilder = BuilderMethod.newBuilder(); + builderMethodBuilder.setName("strField2"); + Row builderMethodRow = + Row.withSchema(Schema.of(Field.of("strField2", FieldType.STRING))) + .withFieldValue("strField2", "test_str_2") + .build(); + builderMethodBuilder.setSchema(getProtoSchemaFromRow(builderMethodRow)); + builderMethodBuilder.setPayload(getProtoPayloadFromRow(builderMethodRow)); + + payloadBuilder.addBuilderMethods(builderMethodBuilder); + + builderMethodBuilder = BuilderMethod.newBuilder(); + builderMethodBuilder.setName("intField1"); + builderMethodRow = + Row.withSchema(Schema.of(Field.of("intField1", FieldType.INT32))) + .withFieldValue("intField1", 10) + .build(); + builderMethodBuilder.setSchema(getProtoSchemaFromRow(builderMethodRow)); + builderMethodBuilder.setPayload(getProtoPayloadFromRow(builderMethodRow)); + + payloadBuilder.addBuilderMethods(builderMethodBuilder); + + testClassLookupExpansionRequestConstruction( + payloadBuilder.build(), + ImmutableMap.of("strField1", "test_str_1", "strField2", "test_str_2", "intField1", 10)); + } + + @Test + public void testJavaClassLookupWithAnnotations() { + ExternalTransforms.JavaClassLookupPayload.Builder payloadBuilder = + ExternalTransforms.JavaClassLookupPayload.newBuilder(); + payloadBuilder.setClassName( + "org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$DummyTransformWithMultiLanguageAnnotations"); + payloadBuilder.setConstructorMethod("create_transform"); + Row constructorRow = + Row.withSchema(Schema.of(Field.of("strField1", FieldType.STRING))) + .withFieldValue("strField1", "test_str_1") + .build(); + + payloadBuilder.setConstructorSchema(getProtoSchemaFromRow(constructorRow)); + payloadBuilder.setConstructorPayload(getProtoPayloadFromRow(constructorRow)); + + BuilderMethod.Builder builderMethodBuilder = BuilderMethod.newBuilder(); + builderMethodBuilder.setName("abc"); + Row builderMethodRow = + Row.withSchema(Schema.of(Field.of("strField2", FieldType.STRING))) + .withFieldValue("strField2", "test_str_2") + .build(); + builderMethodBuilder.setSchema(getProtoSchemaFromRow(builderMethodRow)); + builderMethodBuilder.setPayload(getProtoPayloadFromRow(builderMethodRow)); + + payloadBuilder.addBuilderMethods(builderMethodBuilder); + + builderMethodBuilder = BuilderMethod.newBuilder(); + builderMethodBuilder.setName("xyz"); + builderMethodRow = + Row.withSchema(Schema.of(Field.of("intField1", FieldType.INT32))) + .withFieldValue("intField1", 10) + .build(); + builderMethodBuilder.setSchema(getProtoSchemaFromRow(builderMethodRow)); + builderMethodBuilder.setPayload(getProtoPayloadFromRow(builderMethodRow)); + + payloadBuilder.addBuilderMethods(builderMethodBuilder); + + testClassLookupExpansionRequestConstruction( + payloadBuilder.build(), + ImmutableMap.of("strField1", "test_str_1", "strField2", "test_str_2", "intField1", 10)); + } 
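The simplified-name test above relies on the provider resolving a payload name such as "intField1" to an actual builder method on the transform class. A plausible, purely hypothetical candidate-name expansion is sketched below only to make the test's intent concrete; the exact prefixes the provider accepts are defined by its implementation, not by this sketch.

import java.util.Arrays;
import java.util.List;

class BuilderNameSketch {
  // Expands a simplified field name into conventional builder-method candidates,
  // e.g. "intField1" -> ["intField1", "withIntField1", "setIntField1"].
  static List<String> candidateMethodNames(String simplifiedName) {
    String capitalized =
        Character.toUpperCase(simplifiedName.charAt(0)) + simplifiedName.substring(1);
    return Arrays.asList(simplifiedName, "with" + capitalized, "set" + capitalized);
  }
}
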
+ + @Test + public void testJavaClassLookupClassNotAvailable() { + ExternalTransforms.JavaClassLookupPayload.Builder payloadBuilder = + ExternalTransforms.JavaClassLookupPayload.newBuilder(); + payloadBuilder.setClassName( + "org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$UnavailableClass"); + Row constructorRow = + Row.withSchema(Schema.of(Field.of("strField1", FieldType.STRING))) + .withFieldValue("strField1", "test_str_1") + .build(); + + payloadBuilder.setConstructorSchema(getProtoSchemaFromRow(constructorRow)); + payloadBuilder.setConstructorPayload(getProtoPayloadFromRow(constructorRow)); + + RuntimeException thrown = + assertThrows( + RuntimeException.class, + () -> + testClassLookupExpansionRequestConstruction( + payloadBuilder.build(), ImmutableMap.of())); + assertTrue(thrown.getMessage().contains("does not enable")); + } + + @Test + public void testJavaClassLookupIncorrectConstructionParameter() { + ExternalTransforms.JavaClassLookupPayload.Builder payloadBuilder = + ExternalTransforms.JavaClassLookupPayload.newBuilder(); + payloadBuilder.setClassName( + "org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$DummyTransformWithConstructor"); + Row constructorRow = + Row.withSchema(Schema.of(Field.of("incorrectField", FieldType.STRING))) + .withFieldValue("incorrectField", "test_str_1") + .build(); + + payloadBuilder.setConstructorSchema(getProtoSchemaFromRow(constructorRow)); + payloadBuilder.setConstructorPayload(getProtoPayloadFromRow(constructorRow)); + + RuntimeException thrown = + assertThrows( + RuntimeException.class, + () -> + testClassLookupExpansionRequestConstruction( + payloadBuilder.build(), ImmutableMap.of())); + assertTrue(thrown.getMessage().contains("Expected to find a single mapping constructor")); + } + + @Test + public void testJavaClassLookupIncorrectBuilderMethodParameter() { + ExternalTransforms.JavaClassLookupPayload.Builder payloadBuilder = + ExternalTransforms.JavaClassLookupPayload.newBuilder(); + payloadBuilder.setClassName( + "org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$DummyTransformWithConstructorAndBuilderMethods"); + Row constructorRow = + Row.withSchema(Schema.of(Field.of("strField1", FieldType.STRING))) + .withFieldValue("strField1", "test_str_1") + .build(); + + payloadBuilder.setConstructorSchema(getProtoSchemaFromRow(constructorRow)); + payloadBuilder.setConstructorPayload(getProtoPayloadFromRow(constructorRow)); + + BuilderMethod.Builder builderMethodBuilder = BuilderMethod.newBuilder(); + builderMethodBuilder.setName("withStrField2"); + Row builderMethodRow = + Row.withSchema(Schema.of(Field.of("incorrectParam", FieldType.STRING))) + .withFieldValue("incorrectParam", "test_str_2") + .build(); + builderMethodBuilder.setSchema(getProtoSchemaFromRow(builderMethodRow)); + builderMethodBuilder.setPayload(getProtoPayloadFromRow(builderMethodRow)); + + payloadBuilder.addBuilderMethods(builderMethodBuilder); + + RuntimeException thrown = + assertThrows( + RuntimeException.class, + () -> + testClassLookupExpansionRequestConstruction( + payloadBuilder.build(), ImmutableMap.of())); + assertTrue(thrown.getMessage().contains("Expected to find exactly one matching method")); + } + + private SchemaApi.Schema getProtoSchemaFromRow(Row row) { + return SchemaTranslation.schemaToProto(row.getSchema(), true); + } + + private ByteString getProtoPayloadFromRow(Row row) { + ByteString.Output outputStream = ByteString.newOutput(); + try { + SchemaCoder.of(row.getSchema()).encode(row, 
outputStream); + } catch (IOException e) { + throw new RuntimeException(e); + } + + return outputStream.toByteString(); + } +} diff --git a/sdks/java/expansion-service/src/test/resources/test_allowlist.yaml b/sdks/java/expansion-service/src/test/resources/test_allowlist.yaml new file mode 100644 index 000000000000..dd76f47885e1 --- /dev/null +++ b/sdks/java/expansion-service/src/test/resources/test_allowlist.yaml @@ -0,0 +1,67 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# + +version: v1 +allowedClasses: +- className: org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$DummyTransformWithConstructor +- className: org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$DummyTransformWithConstructorMethod + allowedConstructorMethods: + - from +- className: org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$DummyTransformWithConstructorAndBuilderMethods + allowedBuilderMethods: + - withStrField2 + - withIntField1 +- className: org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$DummyTransformWithMultiArgumentConstructor +- className: org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$DummyTransformWithMultiArgumentBuilderMethod + allowedBuilderMethods: + - withFields +- className: org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$DummyTransformWithConstructorMethodAndBuilderMethods + allowedConstructorMethods: + - from + allowedBuilderMethods: + - withStrField2 + - withIntField1 + - strField2 + - intField1 +- className: org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$DummyTransformWithMultiLanguageAnnotations + allowedConstructorMethods: + - create_transform + allowedBuilderMethods: + - abc + - xyz +- className: org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$DummyTransformWithWrapperTypes + allowedBuilderMethods: + - withDoubleWrapperField +- className: org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$DummyTransformWithComplexTypes + allowedBuilderMethods: + - withComplexTypeField +- className: org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$DummyTransformWithArray + allowedBuilderMethods: + - withStrArrayField +- className: org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$DummyTransformWithList + allowedBuilderMethods: + - withStrListField +- className: org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$DummyTransformWithComplexTypeArray + allowedBuilderMethods: + - withComplexTypeArrayField +- className: org.apache.beam.sdk.expansion.service.JavaClassLookupTransformProviderTest$DummyTransformWithComplexTypeList + allowedBuilderMethods: + - 
withComplexTypeListField + + diff --git a/sdks/java/extensions/arrow/build.gradle b/sdks/java/extensions/arrow/build.gradle new file mode 100644 index 000000000000..9cf7a4865118 --- /dev/null +++ b/sdks/java/extensions/arrow/build.gradle @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +plugins { id 'org.apache.beam.module' } +applyJavaNature(automaticModuleName: 'org.apache.beam.sdk.extensions.arrow') + +description = "Apache Beam :: SDKs :: Java :: Extensions :: Arrow" + +dependencies { + compile library.java.vendored_guava_26_0_jre + compile project(path: ":sdks:java:core", configuration: "shadow") + compile library.java.arrow_vector + compile library.java.arrow_memory_core + compile library.java.joda_time + testCompile library.java.arrow_memory_netty + testCompile library.java.junit + testRuntimeOnly library.java.slf4j_simple +} \ No newline at end of file diff --git a/sdks/java/extensions/arrow/src/main/java/org/apache/beam/sdk/extensions/arrow/ArrowConversion.java b/sdks/java/extensions/arrow/src/main/java/org/apache/beam/sdk/extensions/arrow/ArrowConversion.java new file mode 100644 index 000000000000..09a6476ac7ae --- /dev/null +++ b/sdks/java/extensions/arrow/src/main/java/org/apache/beam/sdk/extensions/arrow/ArrowConversion.java @@ -0,0 +1,513 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.arrow; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import java.io.IOException; +import java.io.InputStream; +import java.nio.channels.Channels; +import java.util.Iterator; +import java.util.List; +import java.util.Optional; +import java.util.function.Function; +import java.util.stream.Collectors; +import javax.annotation.Nullable; +import org.apache.arrow.memory.RootAllocator; +import org.apache.arrow.vector.FieldVector; +import org.apache.arrow.vector.VectorLoader; +import org.apache.arrow.vector.VectorSchemaRoot; +import org.apache.arrow.vector.ipc.ReadChannel; +import org.apache.arrow.vector.ipc.message.ArrowRecordBatch; +import org.apache.arrow.vector.ipc.message.MessageSerializer; +import org.apache.arrow.vector.types.TimeUnit; +import org.apache.arrow.vector.types.pojo.ArrowType; +import org.apache.arrow.vector.util.Text; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.schemas.CachingFactory; +import org.apache.beam.sdk.schemas.Factory; +import org.apache.beam.sdk.schemas.FieldValueGetter; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.Field; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.schemas.logicaltypes.FixedBytes; +import org.apache.beam.sdk.values.Row; +import org.joda.time.DateTime; +import org.joda.time.DateTimeZone; + +/** + * Utilities to create {@link Iterable}s of Beam {@link Row} instances backed by Arrow record + * batches. + */ +@Experimental(Experimental.Kind.SCHEMAS) +public class ArrowConversion { + + /** Get Beam Field from Arrow Field. */ + private static Field toBeamField(org.apache.arrow.vector.types.pojo.Field field) { + FieldType beamFieldType = toFieldType(field.getFieldType(), field.getChildren()); + return Field.of(field.getName(), beamFieldType); + } + + /** Converts Arrow FieldType to Beam FieldType. 
*/ + private static FieldType toFieldType( + org.apache.arrow.vector.types.pojo.FieldType arrowFieldType, + List childrenFields) { + FieldType fieldType = + arrowFieldType + .getType() + .accept( + new ArrowType.ArrowTypeVisitor() { + @Override + public FieldType visit(ArrowType.Null type) { + throw new IllegalArgumentException( + "Type \'" + type.toString() + "\' not supported."); + } + + @Override + public FieldType visit(ArrowType.Struct type) { + return FieldType.row(ArrowSchemaTranslator.toBeamSchema(childrenFields)); + } + + @Override + public FieldType visit(ArrowType.List type) { + checkArgument( + childrenFields.size() == 1, + "Encountered " + + childrenFields.size() + + " child fields for list type, expected 1"); + return FieldType.array(toBeamField(childrenFields.get(0)).getType()); + } + + @Override + public FieldType visit(ArrowType.FixedSizeList type) { + throw new IllegalArgumentException( + "Type \'" + type.toString() + "\' not supported."); + } + + @Override + public FieldType visit(ArrowType.Union type) { + throw new IllegalArgumentException( + "Type \'" + type.toString() + "\' not supported."); + } + + @Override + public FieldType visit(ArrowType.Map type) { + checkArgument( + childrenFields.size() == 2, + "Encountered " + + childrenFields.size() + + " child fields for map type, expected 2"); + return FieldType.map( + toBeamField(childrenFields.get(0)).getType(), + toBeamField(childrenFields.get(1)).getType()); + } + + @Override + public FieldType visit(ArrowType.Int type) { + if (!type.getIsSigned()) { + throw new IllegalArgumentException("Unsigned integers are not supported."); + } + switch (type.getBitWidth()) { + case 8: + return FieldType.BYTE; + case 16: + return FieldType.INT16; + case 32: + return FieldType.INT32; + case 64: + return FieldType.INT64; + default: + throw new IllegalArgumentException( + "Unsupported integer bit width: " + type.getBitWidth()); + } + } + + @Override + public FieldType visit(ArrowType.FloatingPoint type) { + switch (type.getPrecision()) { + case SINGLE: + return FieldType.FLOAT; + case DOUBLE: + return FieldType.DOUBLE; + default: + throw new IllegalArgumentException( + "Unsupported floating-point precision: " + type.getPrecision().name()); + } + } + + @Override + public FieldType visit(ArrowType.Utf8 type) { + return FieldType.STRING; + } + + @Override + public FieldType visit(ArrowType.Binary type) { + return FieldType.BYTES; + } + + @Override + public FieldType visit(ArrowType.FixedSizeBinary type) { + return FieldType.logicalType(FixedBytes.of(type.getByteWidth())); + } + + @Override + public FieldType visit(ArrowType.Bool type) { + return FieldType.BOOLEAN; + } + + @Override + public FieldType visit(ArrowType.Decimal type) { + // FieldType.DECIMAL isn't perfect here since arrow decimal has a + // scale/precision fixed by the schema, but FieldType.DECIMAL uses a BigDecimal, + // whose precision/scale can change from row to row. 
+ throw new IllegalArgumentException( + "Type \'" + type.toString() + "\' not supported."); + } + + @Override + public FieldType visit(ArrowType.Date type) { + throw new IllegalArgumentException( + "Type \'" + type.toString() + "\' not supported."); + } + + @Override + public FieldType visit(ArrowType.Time type) { + throw new IllegalArgumentException( + "Type \'" + type.toString() + "\' not supported."); + } + + @Override + public FieldType visit(ArrowType.Timestamp type) { + if (type.getUnit() == TimeUnit.MILLISECOND + || type.getUnit() == TimeUnit.MICROSECOND) { + return FieldType.DATETIME; + } else { + throw new IllegalArgumentException( + "Unsupported timestamp unit: " + type.getUnit().name()); + } + } + + @Override + public FieldType visit(ArrowType.Interval type) { + throw new IllegalArgumentException( + "Type \'" + type.toString() + "\' not supported."); + } + + @Override + public FieldType visit(ArrowType.Duration type) { + throw new IllegalArgumentException( + "Type \'" + type.toString() + "\' not supported."); + } + + @Override + public FieldType visit(ArrowType.LargeBinary type) { + throw new IllegalArgumentException( + "Type \'" + type.toString() + "\' not supported."); + } + + @Override + public FieldType visit(ArrowType.LargeUtf8 type) { + throw new IllegalArgumentException( + "Type \'" + type.toString() + "\' not supported."); + } + + @Override + public FieldType visit(ArrowType.LargeList type) { + throw new IllegalArgumentException( + "Type \'" + type.toString() + "\' not supported."); + } + }); + return fieldType.withNullable(arrowFieldType.isNullable()); + } + + /** + * Returns a {@link RecordBatchRowIterator} backed by the Arrow record batch stored in {@code + * vectorSchemaRoot}. + * + *
<p>
    Note this is a lazy interface. The data in the underlying Arrow buffer is not read until a + * field of one of the returned {@link Row}s is accessed. + */ + public static RecordBatchRowIterator rowsFromRecordBatch( + Schema schema, VectorSchemaRoot vectorSchemaRoot) { + return new RecordBatchRowIterator(schema, vectorSchemaRoot); + } + + @SuppressWarnings("nullness") + public static RecordBatchRowIterator rowsFromSerializedRecordBatch( + org.apache.arrow.vector.types.pojo.Schema arrowSchema, + InputStream inputStream, + RootAllocator allocator) + throws IOException { + VectorSchemaRoot vectorRoot = VectorSchemaRoot.create(arrowSchema, allocator); + VectorLoader vectorLoader = new VectorLoader(vectorRoot); + vectorRoot.clear(); + try (ReadChannel read = new ReadChannel(Channels.newChannel(inputStream))) { + try (ArrowRecordBatch arrowMessage = + MessageSerializer.deserializeRecordBatch(read, allocator)) { + vectorLoader.load(arrowMessage); + } + } + return rowsFromRecordBatch(ArrowSchemaTranslator.toBeamSchema(arrowSchema), vectorRoot); + } + + public static org.apache.arrow.vector.types.pojo.Schema arrowSchemaFromInput(InputStream input) + throws IOException { + ReadChannel readChannel = new ReadChannel(Channels.newChannel(input)); + return MessageSerializer.deserializeSchema(readChannel); + } + + @SuppressWarnings("rawtypes") + public static class RecordBatchRowIterator implements Iterator, AutoCloseable { + private static final ArrowValueConverterVisitor valueConverterVisitor = + new ArrowValueConverterVisitor(); + private final Schema schema; + private final VectorSchemaRoot vectorSchemaRoot; + private final Factory> fieldValueGetters; + private Integer currRowIndex; + + private static class FieldVectorListValueGetterFactory + implements Factory> { + private final List fieldVectors; + + static FieldVectorListValueGetterFactory of(List fieldVectors) { + return new FieldVectorListValueGetterFactory(fieldVectors); + } + + private FieldVectorListValueGetterFactory(List fieldVectors) { + this.fieldVectors = fieldVectors; + } + + @Override + public List create(Class clazz, Schema schema) { + return this.fieldVectors.stream() + .map( + (fieldVector) -> { + Optional> optionalValue = + fieldVector.getField().getFieldType().getType().accept(valueConverterVisitor); + if (!optionalValue.isPresent()) { + return new FieldValueGetter() { + @Nullable + @Override + public Object get(Integer rowIndex) { + return fieldVector.getObject(rowIndex); + } + + @Override + public String name() { + return fieldVector.getField().getName(); + } + }; + } else { + Function conversionFunction = optionalValue.get(); + return new FieldValueGetter() { + @Nullable + @Override + public Object get(Integer rowIndex) { + Object value = fieldVector.getObject(rowIndex); + if (value == null) { + return null; + } + + return conversionFunction.apply(value); + } + + @Override + public String name() { + return fieldVector.getField().getName(); + } + }; + } + }) + .collect(Collectors.toList()); + } + } + + // TODO: Consider using ByteBuddyUtils.TypeConversion for this + private static class ArrowValueConverterVisitor + implements ArrowType.ArrowTypeVisitor>> { + @Override + public Optional> visit(ArrowType.Null type) { + throw new IllegalArgumentException("Type \'" + type.toString() + "\' not supported."); + } + + @Override + public Optional> visit(ArrowType.Struct type) { + // TODO: code to create a row. 
+ return Optional.empty(); + } + + @Override + public Optional> visit(ArrowType.List type) { + return Optional.empty(); + } + + @Override + public Optional> visit(ArrowType.FixedSizeList type) { + throw new IllegalArgumentException("Type \'" + type.toString() + "\' not supported."); + } + + @Override + public Optional> visit(ArrowType.Union type) { + throw new IllegalArgumentException("Type \'" + type.toString() + "\' not supported."); + } + + @Override + public Optional> visit(ArrowType.Map type) { + throw new IllegalArgumentException("Type \'" + type.toString() + "\' not supported."); + } + + @Override + public Optional> visit(ArrowType.Duration type) { + throw new IllegalArgumentException("Type \'" + type.toString() + "\' not supported."); + } + + @Override + public Optional> visit(ArrowType.Int type) { + return Optional.empty(); + } + + @Override + public Optional> visit(ArrowType.FloatingPoint type) { + return Optional.empty(); + } + + @Override + public Optional> visit(ArrowType.Utf8 type) { + return Optional.of((Object text) -> ((Text) text).toString()); + } + + @Override + public Optional> visit(ArrowType.Binary type) { + return Optional.empty(); + } + + @Override + public Optional> visit(ArrowType.FixedSizeBinary type) { + return Optional.empty(); + } + + @Override + public Optional> visit(ArrowType.Bool type) { + return Optional.empty(); + } + + @Override + public Optional> visit(ArrowType.Decimal type) { + throw new IllegalArgumentException("Type \'" + type.toString() + "\' not supported."); + } + + @Override + public Optional> visit(ArrowType.Date type) { + throw new IllegalArgumentException("Type \'" + type.toString() + "\' not supported."); + } + + @Override + public Optional> visit(ArrowType.Time type) { + throw new IllegalArgumentException("Type \'" + type.toString() + "\' not supported."); + } + + @Override + public Optional> visit(ArrowType.Timestamp type) { + DateTimeZone tz; + try { + tz = DateTimeZone.forID(type.getTimezone()); + } catch (Exception e) { + throw new IllegalArgumentException( + "Encountered unrecognized Timezone: " + type.getTimezone()); + } + switch (type.getUnit()) { + case MICROSECOND: + return Optional.of((epochMicros) -> new DateTime((long) epochMicros / 1000, tz)); + case MILLISECOND: + return Optional.of((epochMills) -> new DateTime((long) epochMills, tz)); + default: + throw new AssertionError("Encountered unrecognized TimeUnit: " + type.getUnit()); + } + } + + @Override + public Optional> visit(ArrowType.Interval type) { + throw new IllegalArgumentException("Type \'" + type.toString() + "\' not supported."); + } + + @Override + public Optional> visit(ArrowType.LargeBinary type) { + throw new IllegalArgumentException("Type \'" + type.toString() + "\' not supported."); + } + + @Override + public Optional> visit(ArrowType.LargeUtf8 type) { + throw new IllegalArgumentException("Type \'" + type.toString() + "\' not supported."); + } + + @Override + public Optional> visit(ArrowType.LargeList type) { + throw new IllegalArgumentException("Type \'" + type.toString() + "\' not supported."); + } + } + + private RecordBatchRowIterator(Schema schema, VectorSchemaRoot vectorSchemaRoot) { + this.schema = schema; + this.vectorSchemaRoot = vectorSchemaRoot; + this.fieldValueGetters = + new CachingFactory<>( + FieldVectorListValueGetterFactory.of(vectorSchemaRoot.getFieldVectors())); + this.currRowIndex = 0; + } + + @Override + public void close() { + this.vectorSchemaRoot.close(); + } + + @Override + public boolean hasNext() { + return currRowIndex < 
vectorSchemaRoot.getRowCount(); + } + + @Override + public Row next() { + if (!hasNext()) { + throw new IllegalStateException("There are no more Rows."); + } + Row result = + Row.withSchema(schema).withFieldValueGetters(this.fieldValueGetters, this.currRowIndex); + this.currRowIndex += 1; + return result; + } + } + + private ArrowConversion() {} + + /** Converts Arrow schema to Beam row schema. */ + public static class ArrowSchemaTranslator { + + public static Schema toBeamSchema(org.apache.arrow.vector.types.pojo.Schema schema) { + return toBeamSchema(schema.getFields()); + } + + public static Schema toBeamSchema(List fields) { + Schema.Builder builder = Schema.builder(); + for (org.apache.arrow.vector.types.pojo.Field field : fields) { + Field beamField = toBeamField(field); + builder.addField(beamField); + } + return builder.build(); + } + } +} diff --git a/sdks/java/extensions/arrow/src/main/java/org/apache/beam/sdk/extensions/arrow/package-info.java b/sdks/java/extensions/arrow/src/main/java/org/apache/beam/sdk/extensions/arrow/package-info.java new file mode 100644 index 000000000000..0fc60b6aa665 --- /dev/null +++ b/sdks/java/extensions/arrow/src/main/java/org/apache/beam/sdk/extensions/arrow/package-info.java @@ -0,0 +1,20 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** Extensions for using Apache Arrow with Beam. */ +package org.apache.beam.sdk.extensions.arrow; diff --git a/sdks/java/extensions/arrow/src/test/java/org/apache/beam/sdk/extensions/arrow/ArrowConversionTest.java b/sdks/java/extensions/arrow/src/test/java/org/apache/beam/sdk/extensions/arrow/ArrowConversionTest.java new file mode 100644 index 000000000000..c5cc71f69cfa --- /dev/null +++ b/sdks/java/extensions/arrow/src/test/java/org/apache/beam/sdk/extensions/arrow/ArrowConversionTest.java @@ -0,0 +1,176 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.arrow; + +import static java.util.Arrays.asList; +import static org.hamcrest.MatcherAssert.assertThat; +import static org.hamcrest.Matchers.equalTo; + +import java.util.ArrayList; +import org.apache.arrow.memory.BufferAllocator; +import org.apache.arrow.memory.RootAllocator; +import org.apache.arrow.vector.BitVector; +import org.apache.arrow.vector.FixedSizeBinaryVector; +import org.apache.arrow.vector.Float8Vector; +import org.apache.arrow.vector.IntVector; +import org.apache.arrow.vector.TimeStampMicroTZVector; +import org.apache.arrow.vector.TimeStampMilliTZVector; +import org.apache.arrow.vector.VarCharVector; +import org.apache.arrow.vector.VectorSchemaRoot; +import org.apache.arrow.vector.complex.ListVector; +import org.apache.arrow.vector.types.FloatingPointPrecision; +import org.apache.arrow.vector.types.TimeUnit; +import org.apache.arrow.vector.types.pojo.ArrowType; +import org.apache.arrow.vector.util.Text; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.Field; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.hamcrest.collection.IsIterableContainingInOrder; +import org.joda.time.DateTime; +import org.joda.time.DateTimeZone; +import org.junit.After; +import org.junit.Before; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +@RunWith(JUnit4.class) +public class ArrowConversionTest { + + private BufferAllocator allocator; + + @Before + public void init() { + allocator = new RootAllocator(Long.MAX_VALUE); + } + + @After + public void teardown() { + allocator.close(); + } + + @Test + public void toBeamSchema_convertsSimpleArrowSchema() { + Schema expected = + Schema.of(Field.of("int8", FieldType.BYTE), Field.of("int16", FieldType.INT16)); + + org.apache.arrow.vector.types.pojo.Schema arrowSchema = + new org.apache.arrow.vector.types.pojo.Schema( + ImmutableList.of( + field("int8", new ArrowType.Int(8, true)), + field("int16", new ArrowType.Int(16, true)))); + + assertThat(ArrowConversion.ArrowSchemaTranslator.toBeamSchema(arrowSchema), equalTo(expected)); + } + + @Test + public void rowIterator() { + org.apache.arrow.vector.types.pojo.Schema schema = + new org.apache.arrow.vector.types.pojo.Schema( + asList( + field("int32", new ArrowType.Int(32, true)), + field("float64", new ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE)), + field("string", new ArrowType.Utf8()), + field("timestampMicroUTC", new ArrowType.Timestamp(TimeUnit.MICROSECOND, "UTC")), + field("timestampMilliUTC", new ArrowType.Timestamp(TimeUnit.MILLISECOND, "UTC")), + field( + "int32_list", + new ArrowType.List(), + field("int32s", new ArrowType.Int(32, true))), + field("boolean", new ArrowType.Bool()), + field("fixed_size_binary", new ArrowType.FixedSizeBinary(3)))); + + Schema beamSchema = ArrowConversion.ArrowSchemaTranslator.toBeamSchema(schema); + + VectorSchemaRoot expectedSchemaRoot = VectorSchemaRoot.create(schema, allocator); + expectedSchemaRoot.allocateNew(); + expectedSchemaRoot.setRowCount(16); + IntVector intVector = (IntVector) expectedSchemaRoot.getFieldVectors().get(0); + Float8Vector floatVector = (Float8Vector) expectedSchemaRoot.getFieldVectors().get(1); + VarCharVector strVector = (VarCharVector) expectedSchemaRoot.getFieldVectors().get(2); + TimeStampMicroTZVector timestampMicroUtcVector = + (TimeStampMicroTZVector) 
expectedSchemaRoot.getFieldVectors().get(3); + TimeStampMilliTZVector timeStampMilliTZVector = + (TimeStampMilliTZVector) expectedSchemaRoot.getFieldVectors().get(4); + ListVector int32ListVector = (ListVector) expectedSchemaRoot.getFieldVectors().get(5); + IntVector int32ListElementVector = + int32ListVector + .addOrGetVector( + new org.apache.arrow.vector.types.pojo.FieldType( + false, new ArrowType.Int(32, true), null)) + .getVector(); + BitVector boolVector = (BitVector) expectedSchemaRoot.getFieldVectors().get(6); + FixedSizeBinaryVector fixedSizeBinaryVector = + (FixedSizeBinaryVector) expectedSchemaRoot.getFieldVectors().get(7); + + ArrayList expectedRows = new ArrayList<>(); + for (int i = 0; i < 16; i++) { + DateTime dt = new DateTime(2019, 1, i + 1, i, i, i, DateTimeZone.UTC); + expectedRows.add( + Row.withSchema(beamSchema) + .addValues( + i, + i + .1 * i, + "" + i, + dt, + dt, + ImmutableList.of(i), + (i % 2) != 0, + new byte[] {(byte) i, (byte) (i + 1), (byte) (i + 2)}) + .build()); + + intVector.set(i, i); + floatVector.set(i, i + .1 * i); + strVector.set(i, new Text("" + i)); + timestampMicroUtcVector.set(i, dt.getMillis() * 1000); + timeStampMilliTZVector.set(i, dt.getMillis()); + int32ListVector.startNewValue(i); + int32ListElementVector.set(i, i); + int32ListVector.endValue(i, 1); + boolVector.set(i, i % 2); + fixedSizeBinaryVector.set(i, new byte[] {(byte) i, (byte) (i + 1), (byte) (i + 2)}); + } + + assertThat( + ImmutableList.copyOf(ArrowConversion.rowsFromRecordBatch(beamSchema, expectedSchemaRoot)), + IsIterableContainingInOrder.contains( + expectedRows.stream() + .map((row) -> equalTo(row)) + .collect(ImmutableList.toImmutableList()))); + + expectedSchemaRoot.close(); + } + + private static org.apache.arrow.vector.types.pojo.Field field( + String name, + boolean nullable, + ArrowType type, + org.apache.arrow.vector.types.pojo.Field... children) { + return new org.apache.arrow.vector.types.pojo.Field( + name, + new org.apache.arrow.vector.types.pojo.FieldType(nullable, type, null, null), + asList(children)); + } + + private static org.apache.arrow.vector.types.pojo.Field field( + String name, ArrowType type, org.apache.arrow.vector.types.pojo.Field... 
children) { + return field(name, false, type, children); + } +} diff --git a/sdks/java/extensions/euphoria/build.gradle b/sdks/java/extensions/euphoria/build.gradle index 92cbb6766173..639c018f839c 100644 --- a/sdks/java/extensions/euphoria/build.gradle +++ b/sdks/java/extensions/euphoria/build.gradle @@ -26,7 +26,10 @@ description = "Apache Beam :: SDKs :: Java :: Extensions :: Euphoria Java 8 DSL" dependencies { compile project(path: ":sdks:java:core", configuration: "shadow") - testCompile library.java.mockito_core + compile library.java.jackson_annotations + compile library.java.joda_time + compile library.java.slf4j_api + compile library.java.vendored_guava_26_0_jre testCompile project(":sdks:java:extensions:kryo") testCompile library.java.slf4j_api testCompile library.java.hamcrest_core diff --git a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/client/operator/FlatMapTest.java b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/client/operator/FlatMapTest.java index 727c024d0365..8007a0cae906 100644 --- a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/client/operator/FlatMapTest.java +++ b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/client/operator/FlatMapTest.java @@ -35,7 +35,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class FlatMapTest { diff --git a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/client/operator/JoinTest.java b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/client/operator/JoinTest.java index e88b7f617173..2c74f47202b6 100644 --- a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/client/operator/JoinTest.java +++ b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/client/operator/JoinTest.java @@ -46,7 +46,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class JoinTest { diff --git a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/client/operator/ReduceByKeyTest.java b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/client/operator/ReduceByKeyTest.java index c5c6a773f8b4..e1b09fa0dfe6 100644 --- a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/client/operator/ReduceByKeyTest.java +++ b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/client/operator/ReduceByKeyTest.java @@ -49,7 +49,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class ReduceByKeyTest { diff --git a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/client/operator/SumByKeyTest.java b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/client/operator/SumByKeyTest.java index 58b42a7c0d6e..03d9ff253b71 100644 --- 
a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/client/operator/SumByKeyTest.java +++ b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/client/operator/SumByKeyTest.java @@ -39,7 +39,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class SumByKeyTest { diff --git a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/client/operator/TestUtils.java b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/client/operator/TestUtils.java index 42f31a2bc41b..7263228c37c5 100644 --- a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/client/operator/TestUtils.java +++ b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/client/operator/TestUtils.java @@ -39,9 +39,6 @@ import org.apache.beam.sdk.values.TypeDescriptor; /** Utility class for easier creating input data sets for operator testing. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TestUtils { private static class PrimitiveOutputTranslatorProvider implements TranslatorProvider { diff --git a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/client/type/TypePropagationAssert.java b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/client/type/TypePropagationAssert.java index cc22f04569b6..0034e15368f7 100644 --- a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/client/type/TypePropagationAssert.java +++ b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/client/type/TypePropagationAssert.java @@ -30,7 +30,6 @@ /** Bunch of methods to assert type descriptors in operators. */ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class TypePropagationAssert { diff --git a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/client/util/IOUtilsTest.java b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/client/util/IOUtilsTest.java index 43cfb9fd4ad7..d881c2baf903 100644 --- a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/client/util/IOUtilsTest.java +++ b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/client/util/IOUtilsTest.java @@ -30,9 +30,6 @@ /** Test behavior of IOUtils. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class IOUtilsTest { @Test(expected = IOException.class) diff --git a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/testkit/FlatMapTest.java b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/testkit/FlatMapTest.java index 8569c314f77e..15fe99877d82 100644 --- a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/testkit/FlatMapTest.java +++ b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/testkit/FlatMapTest.java @@ -35,9 +35,6 @@ /** Test operator {@code FlatMap}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FlatMapTest extends AbstractOperatorTest { @Test diff --git a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/testkit/JoinTest.java b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/testkit/JoinTest.java index 2c9028984efb..9a11687d4507 100644 --- a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/testkit/JoinTest.java +++ b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/testkit/JoinTest.java @@ -53,9 +53,6 @@ /** Test operator {@code Join}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class JoinTest extends AbstractOperatorTest { @Test diff --git a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/testkit/MapElementsTest.java b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/testkit/MapElementsTest.java index d7aa40957881..3003824c04e2 100644 --- a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/testkit/MapElementsTest.java +++ b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/testkit/MapElementsTest.java @@ -35,9 +35,6 @@ /** Tests for operator {@code MapElements}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class MapElementsTest extends AbstractOperatorTest { @Test diff --git a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/testkit/ReduceByKeyTest.java b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/testkit/ReduceByKeyTest.java index 5622556cfdf3..f7010ba7b1da 100644 --- a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/testkit/ReduceByKeyTest.java +++ b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/testkit/ReduceByKeyTest.java @@ -63,9 +63,6 @@ /** Test operator {@code ReduceByKey}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ReduceByKeyTest extends AbstractOperatorTest { /** Validates the output type upon a `.reduceBy` operation on global window. 
*/ diff --git a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/testkit/SumByKeyTest.java b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/testkit/SumByKeyTest.java index 67f2103e67c6..89dabdfefbcb 100644 --- a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/testkit/SumByKeyTest.java +++ b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/testkit/SumByKeyTest.java @@ -32,9 +32,6 @@ /** Test operator {@code SumByKey}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SumByKeyTest extends AbstractOperatorTest { @Test diff --git a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/translate/SingleJvmAccumulatorProviderTest.java b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/translate/SingleJvmAccumulatorProviderTest.java index 9937209c7a5e..4872b8e3b782 100644 --- a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/translate/SingleJvmAccumulatorProviderTest.java +++ b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/translate/SingleJvmAccumulatorProviderTest.java @@ -35,9 +35,6 @@ * JUnit directly. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SingleJvmAccumulatorProviderTest { private static final String TEST_COUNTER_NAME = "test-counter"; diff --git a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/translate/collector/SingleValueCollectorTest.java b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/translate/collector/SingleValueCollectorTest.java index 20b486e7af30..127269d21bdf 100644 --- a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/translate/collector/SingleValueCollectorTest.java +++ b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/translate/collector/SingleValueCollectorTest.java @@ -33,7 +33,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class SingleValueCollectorTest { diff --git a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/transforms/windowing/WindowDesc.java b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/transforms/windowing/WindowDesc.java index 0ef1d8c2c928..14a4ad7a60a2 100644 --- a/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/transforms/windowing/WindowDesc.java +++ b/sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/transforms/windowing/WindowDesc.java @@ -25,9 +25,6 @@ * * @param type of input element */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class WindowDesc { public static WindowDesc of(Window window) { diff --git a/sdks/java/extensions/google-cloud-platform-core/build.gradle b/sdks/java/extensions/google-cloud-platform-core/build.gradle index 4f290dd48758..504cc7066226 100644 --- a/sdks/java/extensions/google-cloud-platform-core/build.gradle +++ b/sdks/java/extensions/google-cloud-platform-core/build.gradle @@ -20,7 +20,6 @@ import 
groovy.json.JsonOutput plugins { id 'org.apache.beam.module' } applyJavaNature( - automaticModuleName: 'org.apache.beam.sdk.extensions.gcp') description = "Apache Beam :: SDKs :: Java :: Extensions :: Google Cloud Platform Core" @@ -39,6 +38,7 @@ dependencies { compile enforcedPlatform(library.java.google_cloud_platform_libraries_bom) compile library.java.vendored_guava_26_0_jre compile project(path: ":sdks:java:core", configuration: "shadow") + compile project(path: ":runners:core-java") compile library.java.google_http_client_jackson2 compile library.java.google_auth_library_oauth2_http compile library.java.google_api_client @@ -50,12 +50,14 @@ dependencies { compile library.java.google_http_client compile library.java.slf4j_api compile library.java.joda_time + compile library.java.http_core + compile library.java.http_client compile library.java.jackson_annotations compile library.java.jackson_databind + permitUnusedDeclared library.java.jackson_databind // BEAM-11761 provided library.java.hamcrest_core provided library.java.junit testCompile project(path: ":sdks:java:core", configuration: "shadowTest") - testCompile library.java.hamcrest_library testCompile library.java.mockito_core testRuntimeOnly library.java.slf4j_jdk14 } diff --git a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/auth/GcpCredentialFactory.java b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/auth/GcpCredentialFactory.java index 0931dd4485d8..e7193da1c6b1 100644 --- a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/auth/GcpCredentialFactory.java +++ b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/auth/GcpCredentialFactory.java @@ -46,6 +46,8 @@ public class GcpCredentialFactory implements CredentialFactory { "https://www.googleapis.com/auth/devstorage.full_control", "https://www.googleapis.com/auth/userinfo.email", "https://www.googleapis.com/auth/datastore", + "https://www.googleapis.com/auth/bigquery", + "https://www.googleapis.com/auth/bigquery.insertdata", "https://www.googleapis.com/auth/pubsub"); private static final GcpCredentialFactory INSTANCE = new GcpCredentialFactory(); diff --git a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java index 82fc3ae1e5df..f593e2f5c9d8 100644 --- a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java +++ b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java @@ -406,7 +406,7 @@ private static long getProjectNumber( try { Project project = ResilientOperation.retry( - ResilientOperation.getGoogleRequestCallable(getProject), + getProject::execute, backoff, RetryDeterminer.SOCKET_ERRORS, IOException.class, diff --git a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcsOptions.java b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcsOptions.java index 9d63d2f8eb9f..d183d1647e11 100644 --- a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcsOptions.java +++ 
b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcsOptions.java @@ -18,7 +18,7 @@ package org.apache.beam.sdk.extensions.gcp.options; import com.fasterxml.jackson.annotation.JsonIgnore; -import com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel; +import com.google.cloud.hadoop.util.AsyncWriteChannelOptions; import java.util.concurrent.ExecutorService; import java.util.concurrent.SynchronousQueue; import java.util.concurrent.ThreadPoolExecutor; @@ -78,15 +78,15 @@ public interface GcsOptions extends ApplicationNameOptions, GcpOptions, Pipeline /** * The buffer size (in bytes) to use when uploading files to GCS. Please see the documentation for - * {@link AbstractGoogleAsyncWriteChannel#setUploadBufferSize} for more information on the - * restrictions and performance implications of this value. + * {@link AsyncWriteChannelOptions#getUploadChunkSize} for more information on the restrictions + * and performance implications of this value. */ @Description( "The buffer size (in bytes) to use when uploading files to GCS. Please see the " - + "documentation for AbstractGoogleAsyncWriteChannel.setUploadBufferSize for more " + + "documentation for AsyncWriteChannelOptions.getUploadChunkSize for more " + "information on the restrictions and performance implications of this value.\n\n" + "https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/" - + "com/google/cloud/hadoop/util/AbstractGoogleAsyncWriteChannel.java") + + "com/google/cloud/hadoop/util/AsyncWriteChannelOptions.java") @Nullable Integer getGcsUploadBufferSizeBytes(); diff --git a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/storage/GcsCreateOptions.java b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/storage/GcsCreateOptions.java index c3a64629516c..1fdb871605c8 100644 --- a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/storage/GcsCreateOptions.java +++ b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/storage/GcsCreateOptions.java @@ -18,7 +18,7 @@ package org.apache.beam.sdk.extensions.gcp.storage; import com.google.auto.value.AutoValue; -import com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel; +import com.google.cloud.hadoop.util.AsyncWriteChannelOptions; import org.apache.beam.sdk.io.fs.CreateOptions; import org.checkerframework.checker.nullness.qual.Nullable; @@ -28,8 +28,8 @@ public abstract class GcsCreateOptions extends CreateOptions { /** * The buffer size (in bytes) to use when uploading files to GCS. Please see the documentation for - * {@link AbstractGoogleAsyncWriteChannel#setUploadBufferSize} for more information on the - * restrictions and performance implications of this value. + * {@link AsyncWriteChannelOptions#getUploadChunkSize} for more information on the restrictions + * and performance implications of this value. 
*/ public abstract @Nullable Integer gcsUploadBufferSizeBytes(); diff --git a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystem.java b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystem.java index b02854b5eeec..6c3942890451 100644 --- a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystem.java +++ b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystem.java @@ -46,6 +46,7 @@ import org.apache.beam.sdk.io.fs.MatchResult; import org.apache.beam.sdk.io.fs.MatchResult.Metadata; import org.apache.beam.sdk.io.fs.MatchResult.Status; +import org.apache.beam.sdk.io.fs.MoveOptions; import org.apache.beam.sdk.metrics.Counter; import org.apache.beam.sdk.metrics.Metrics; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; @@ -69,14 +70,22 @@ class GcsFileSystem extends FileSystem { /** Number of copy operations performed. */ private Counter numCopies; + /** Number of rename operations performed. */ + private Counter numRenames; + /** Time spent performing copies. */ private Counter copyTimeMsec; + /** Time spent performing renames. */ + private Counter renameTimeMsec; + GcsFileSystem(GcsOptions options) { this.options = checkNotNull(options, "options"); if (options.getGcsPerformanceMetrics()) { numCopies = Metrics.counter(GcsFileSystem.class, "num_copies"); copyTimeMsec = Metrics.counter(GcsFileSystem.class, "copy_time_msec"); + numRenames = Metrics.counter(GcsFileSystem.class, "num_renames"); + renameTimeMsec = Metrics.counter(GcsFileSystem.class, "rename_time_msec"); } } @@ -142,10 +151,20 @@ protected ReadableByteChannel open(GcsResourceId resourceId) throws IOException } @Override - protected void rename(List srcResourceIds, List destResourceIds) + protected void rename( + List srcResourceIds, + List destResourceIds, + MoveOptions... 
moveOptions) throws IOException { - copy(srcResourceIds, destResourceIds); - delete(srcResourceIds); + Stopwatch stopwatch = Stopwatch.createStarted(); + options + .getGcsUtil() + .rename(toFilenames(srcResourceIds), toFilenames(destResourceIds), moveOptions); + stopwatch.stop(); + if (options.getGcsPerformanceMetrics()) { + numRenames.inc(srcResourceIds.size()); + renameTimeMsec.inc(stopwatch.elapsed(TimeUnit.MILLISECONDS)); + } } @Override @@ -275,6 +294,9 @@ private Metadata toMetadata(StorageObject storageObject) { Metadata.builder() .setIsReadSeekEfficient(true) .setResourceId(GcsResourceId.fromGcsPath(GcsPath.fromObject(storageObject))); + if (storageObject.getMd5Hash() != null) { + ret.setChecksum(storageObject.getMd5Hash()); + } BigInteger size = firstNonNull(storageObject.getSize(), BigInteger.ZERO); ret.setSizeBytes(size.longValue()); DateTime lastModified = firstNonNull(storageObject.getUpdated(), new DateTime(0L)); diff --git a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GceMetadataUtil.java b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GceMetadataUtil.java index 6736844870ae..711310217244 100644 --- a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GceMetadataUtil.java +++ b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GceMetadataUtil.java @@ -21,7 +21,7 @@ import java.io.InputStream; import java.io.InputStreamReader; import java.io.Reader; -import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.CharStreams; import org.apache.http.HttpResponse; import org.apache.http.client.HttpClient; @@ -50,7 +50,7 @@ static String fetchMetadata(String key) { return ""; } InputStream in = response.getEntity().getContent(); - try (final Reader reader = new InputStreamReader(in, Charset.defaultCharset())) { + try (final Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) { return CharStreams.toString(reader); } } catch (IOException e) { diff --git a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java index 103c23414954..f6ad78ffa75d 100644 --- a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java +++ b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java @@ -26,8 +26,10 @@ import com.google.api.client.googleapis.batch.json.JsonBatchCallback; import com.google.api.client.googleapis.json.GoogleJsonError; import com.google.api.client.googleapis.json.GoogleJsonResponseException; +import com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest; import com.google.api.client.http.HttpHeaders; import com.google.api.client.http.HttpRequestInitializer; +import com.google.api.client.http.HttpStatusCodes; import com.google.api.client.util.BackOff; import com.google.api.client.util.Sleeper; import com.google.api.services.storage.Storage; @@ -35,11 +37,13 @@ import com.google.api.services.storage.model.Objects; import com.google.api.services.storage.model.RewriteResponse; import com.google.api.services.storage.model.StorageObject; +import 
com.google.auth.Credentials; import com.google.auto.value.AutoValue; import com.google.cloud.hadoop.gcsio.CreateObjectOptions; import com.google.cloud.hadoop.gcsio.GoogleCloudStorage; import com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl; import com.google.cloud.hadoop.gcsio.GoogleCloudStorageOptions; +import com.google.cloud.hadoop.gcsio.GoogleCloudStorageReadOptions; import com.google.cloud.hadoop.gcsio.StorageResourceId; import com.google.cloud.hadoop.util.ApiErrorExtractor; import com.google.cloud.hadoop.util.AsyncWriteChannelOptions; @@ -54,9 +58,11 @@ import java.nio.file.FileAlreadyExistsException; import java.util.ArrayList; import java.util.Collection; +import java.util.HashMap; import java.util.Iterator; import java.util.LinkedList; import java.util.List; +import java.util.Set; import java.util.concurrent.CompletionStage; import java.util.concurrent.ExecutionException; import java.util.concurrent.ExecutorService; @@ -64,10 +70,16 @@ import java.util.concurrent.ThreadPoolExecutor; import java.util.concurrent.TimeUnit; import java.util.concurrent.atomic.AtomicInteger; +import java.util.function.Supplier; import java.util.regex.Matcher; import java.util.regex.Pattern; +import org.apache.beam.runners.core.metrics.GcpResourceIdentifiers; +import org.apache.beam.runners.core.metrics.MonitoringInfoConstants; +import org.apache.beam.runners.core.metrics.ServiceCallMetric; import org.apache.beam.sdk.extensions.gcp.options.GcsOptions; import org.apache.beam.sdk.extensions.gcp.util.gcsfs.GcsPath; +import org.apache.beam.sdk.io.fs.MoveOptions; +import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions; import org.apache.beam.sdk.options.DefaultValueFactory; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.util.FluentBackoff; @@ -75,6 +87,7 @@ import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Sets; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.MoreExecutors; import org.checkerframework.checker.nullness.qual.Nullable; import org.joda.time.Duration; @@ -86,6 +99,7 @@ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class GcsUtil { + /** * This is a {@link DefaultValueFactory} able to create a {@link GcsUtil} using any transport * flags specified on the {@link PipelineOptions}. @@ -107,6 +121,7 @@ public GcsUtil create(PipelineOptions options) { storageBuilder.getHttpRequestInitializer(), gcsOptions.getExecutorService(), hasExperiment(options, "use_grpc_for_gcs"), + gcsOptions.getGcpCredential(), gcsOptions.getGcsUploadBufferSizeBytes()); } @@ -116,12 +131,14 @@ public static GcsUtil create( Storage storageClient, HttpRequestInitializer httpRequestInitializer, ExecutorService executorService, + Credentials credentials, @Nullable Integer uploadBufferSizeBytes) { return new GcsUtil( storageClient, httpRequestInitializer, executorService, hasExperiment(options, "use_grpc_for_gcs"), + credentials, uploadBufferSizeBytes); } } @@ -147,6 +164,8 @@ public static GcsUtil create( /** Client for the GCS API. */ private Storage storageClient; + private Supplier batchRequestSupplier; + private final HttpRequestInitializer httpRequestInitializer; /** Buffer size for GCS uploads (in bytes). 
*/ private final @Nullable Integer uploadBufferSizeBytes; @@ -159,9 +178,10 @@ public static GcsUtil create( // Exposed for testing. final ExecutorService executorService; + private Credentials credentials; + private GoogleCloudStorage googleCloudStorage; private GoogleCloudStorageOptions googleCloudStorageOptions; - private final boolean shouldUseGrpc; /** Rewrite operation setting. For testing purposes only. */ @VisibleForTesting @Nullable Long maxBytesRewrittenPerCall; @@ -185,20 +205,48 @@ private GcsUtil( HttpRequestInitializer httpRequestInitializer, ExecutorService executorService, Boolean shouldUseGrpc, + Credentials credentials, @Nullable Integer uploadBufferSizeBytes) { this.storageClient = storageClient; this.httpRequestInitializer = httpRequestInitializer; this.uploadBufferSizeBytes = uploadBufferSizeBytes; this.executorService = executorService; + this.credentials = credentials; this.maxBytesRewrittenPerCall = null; this.numRewriteTokensUsed = null; - this.shouldUseGrpc = shouldUseGrpc; googleCloudStorageOptions = - GoogleCloudStorageOptions.newBuilder() + GoogleCloudStorageOptions.builder() .setAppName("Beam") .setGrpcEnabled(shouldUseGrpc) .build(); - googleCloudStorage = new GoogleCloudStorageImpl(googleCloudStorageOptions, storageClient); + googleCloudStorage = + new GoogleCloudStorageImpl(googleCloudStorageOptions, storageClient, credentials); + this.batchRequestSupplier = + () -> { + // Capture reference to this so that the most recent storageClient and initializer + // are used. + GcsUtil util = this; + return new BatchInterface() { + final BatchRequest batch = util.storageClient.batch(util.httpRequestInitializer); + + @Override + public void queue( + AbstractGoogleJsonClientRequest request, JsonBatchCallback cb) + throws IOException { + request.queue(batch, cb); + } + + @Override + public void execute() throws IOException { + batch.execute(); + } + + @Override + public int size() { + return batch.size(); + } + }; + }; } // Use this only for testing purposes. @@ -206,6 +254,11 @@ protected void setStorageClient(Storage storageClient) { this.storageClient = storageClient; } + // Use this only for testing purposes. + protected void setBatchRequestSupplier(Supplier supplier) { + this.batchRequestSupplier = supplier; + } + /** * Expands a pattern into matched paths. The pattern path may contain globs, which are expanded in * the result. For patterns that only match a single object, we ensure that the object exists. 
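The batchRequestSupplier added in the hunk above hides Storage's BatchRequest behind the small BatchInterface, and setBatchRequestSupplier lets tests swap in a fake batch. A minimal, self-contained sketch of that supplier-injection pattern follows; the names (Batch, BatchRunner, FakeBatch) are hypothetical and not Beam classes.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Supplier;

    public class BatchSupplierSketch {

      /** The narrow surface production code needs from a batch of requests. */
      interface Batch {
        void queue(Runnable request);

        void execute();

        int size();
      }

      /** Obtains every batch through the injected supplier, as GcsUtil does above. */
      static class BatchRunner {
        private Supplier<Batch> batchSupplier;

        BatchRunner(Supplier<Batch> batchSupplier) {
          this.batchSupplier = batchSupplier;
        }

        /** Mirrors setBatchRequestSupplier: tests can swap in a fake before running. */
        void setBatchSupplier(Supplier<Batch> supplier) {
          this.batchSupplier = supplier;
        }

        void runAll(List<Runnable> requests, int maxPerBatch) {
          Batch batch = batchSupplier.get();
          for (Runnable request : requests) {
            batch.queue(request);
            // Flush and start a fresh batch once the per-batch limit is reached.
            if (batch.size() >= maxPerBatch) {
              batch.execute();
              batch = batchSupplier.get();
            }
          }
          if (batch.size() > 0) {
            batch.execute();
          }
        }
      }

      /** A fake a test could supply to observe queueing without issuing real requests. */
      static class FakeBatch implements Batch {
        final List<Runnable> queued = new ArrayList<>();

        @Override
        public void queue(Runnable request) {
          queued.add(request);
        }

        @Override
        public void execute() {
          queued.forEach(Runnable::run);
          queued.clear();
        }

        @Override
        public int size() {
          return queued.size();
        }
      }
    }

A test would construct BatchRunner with () -> new FakeBatch() and assert on what was queued; production wiring supplies the real storage batch, much as the BatchInterface lambda above wraps storageClient.batch(httpRequestInitializer). Keeping the supplier as the only way to obtain a batch means test and production paths exercise the same queueing and flushing logic.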
@@ -288,11 +341,7 @@ StorageObject getObject(GcsPath gcsPath, BackOff backoff, Sleeper sleeper) throw storageClient.objects().get(gcsPath.getBucket(), gcsPath.getObject()); try { return ResilientOperation.retry( - ResilientOperation.getGoogleRequestCallable(getObject), - backoff, - RetryDeterminer.SOCKET_ERRORS, - IOException.class, - sleeper); + getObject::execute, backoff, RetryDeterminer.SOCKET_ERRORS, IOException.class, sleeper); } catch (IOException | InterruptedException e) { if (e instanceof InterruptedException) { Thread.currentThread().interrupt(); @@ -344,10 +393,7 @@ public Objects listObjects( try { return ResilientOperation.retry( - ResilientOperation.getGoogleRequestCallable(listObject), - createBackOff(), - RetryDeterminer.SOCKET_ERRORS, - IOException.class); + listObject::execute, createBackOff(), RetryDeterminer.SOCKET_ERRORS, IOException.class); } catch (Exception e) { throw new IOException( String.format("Unable to match files in bucket %s, prefix %s.", bucket, prefix), e); @@ -400,6 +446,45 @@ public SeekableByteChannel open(GcsPath path) throws IOException { return googleCloudStorage.open(new StorageResourceId(path.getBucket(), path.getObject())); } + /** + * Opens an object in GCS. + * + *

    Returns a SeekableByteChannel that provides access to data in the bucket. + * + * @param path the GCS filename to read from + * @param readOptions Fine-grained options for behaviors of retries, buffering, etc. + * @return a SeekableByteChannel that can read the object data + */ + @VisibleForTesting + SeekableByteChannel open(GcsPath path, GoogleCloudStorageReadOptions readOptions) + throws IOException { + HashMap baseLabels = new HashMap<>(); + baseLabels.put(MonitoringInfoConstants.Labels.PTRANSFORM, ""); + baseLabels.put(MonitoringInfoConstants.Labels.SERVICE, "Storage"); + baseLabels.put(MonitoringInfoConstants.Labels.METHOD, "GcsGet"); + baseLabels.put( + MonitoringInfoConstants.Labels.RESOURCE, + GcpResourceIdentifiers.cloudStorageBucket(path.getBucket())); + baseLabels.put( + MonitoringInfoConstants.Labels.GCS_PROJECT_ID, googleCloudStorageOptions.getProjectId()); + baseLabels.put(MonitoringInfoConstants.Labels.GCS_BUCKET, path.getBucket()); + + ServiceCallMetric serviceCallMetric = + new ServiceCallMetric(MonitoringInfoConstants.Urns.API_REQUEST_COUNT, baseLabels); + try { + SeekableByteChannel channel = + googleCloudStorage.open( + new StorageResourceId(path.getBucket(), path.getObject()), readOptions); + serviceCallMetric.call("ok"); + return channel; + } catch (IOException e) { + if (e.getCause() instanceof GoogleJsonResponseException) { + serviceCallMetric.call(((GoogleJsonResponseException) e.getCause()).getDetails().getCode()); + } + throw e; + } + } + /** * Creates an object in GCS. * @@ -419,30 +504,45 @@ public WritableByteChannel create(GcsPath path, String type) throws IOException */ public WritableByteChannel create(GcsPath path, String type, Integer uploadBufferSizeBytes) throws IOException { - // When AsyncWriteChannelOptions has toBuilder() method, the following can be changed to: - // AsyncWriteChannelOptions newOptions = - // wcOptions.toBuilder().setUploadChunkSize(uploadBufferSizeBytes).build(); AsyncWriteChannelOptions wcOptions = googleCloudStorageOptions.getWriteChannelOptions(); int uploadChunkSize = (uploadBufferSizeBytes == null) ? 
wcOptions.getUploadChunkSize() : uploadBufferSizeBytes; AsyncWriteChannelOptions newOptions = - AsyncWriteChannelOptions.builder() - .setBufferSize(wcOptions.getBufferSize()) - .setPipeBufferSize(wcOptions.getPipeBufferSize()) - .setUploadChunkSize(uploadChunkSize) - .setDirectUploadEnabled(wcOptions.isDirectUploadEnabled()) - .build(); + wcOptions.toBuilder().setUploadChunkSize(uploadChunkSize).build(); GoogleCloudStorageOptions newGoogleCloudStorageOptions = - googleCloudStorageOptions - .toBuilder() - .setWriteChannelOptions(newOptions) - .setGrpcEnabled(this.shouldUseGrpc) - .build(); + googleCloudStorageOptions.toBuilder().setWriteChannelOptions(newOptions).build(); GoogleCloudStorage gcpStorage = - new GoogleCloudStorageImpl(newGoogleCloudStorageOptions, this.storageClient); - return gcpStorage.create( - new StorageResourceId(path.getBucket(), path.getObject()), - new CreateObjectOptions(true, type, CreateObjectOptions.EMPTY_METADATA)); + new GoogleCloudStorageImpl( + newGoogleCloudStorageOptions, this.storageClient, this.credentials); + HashMap baseLabels = new HashMap<>(); + baseLabels.put(MonitoringInfoConstants.Labels.PTRANSFORM, ""); + baseLabels.put(MonitoringInfoConstants.Labels.SERVICE, "Storage"); + baseLabels.put(MonitoringInfoConstants.Labels.METHOD, "GcsInsert"); + baseLabels.put( + MonitoringInfoConstants.Labels.RESOURCE, + GcpResourceIdentifiers.cloudStorageBucket(path.getBucket())); + baseLabels.put( + MonitoringInfoConstants.Labels.GCS_PROJECT_ID, googleCloudStorageOptions.getProjectId()); + baseLabels.put(MonitoringInfoConstants.Labels.GCS_BUCKET, path.getBucket()); + + ServiceCallMetric serviceCallMetric = + new ServiceCallMetric(MonitoringInfoConstants.Urns.API_REQUEST_COUNT, baseLabels); + try { + WritableByteChannel channel = + gcpStorage.create( + new StorageResourceId(path.getBucket(), path.getObject()), + CreateObjectOptions.builder() + .setOverwriteExisting(true) + .setContentType(type) + .build()); + serviceCallMetric.call("ok"); + return channel; + } catch (IOException e) { + if (e.getCause() instanceof GoogleJsonResponseException) { + serviceCallMetric.call(((GoogleJsonResponseException) e.getCause()).getDetails().getCode()); + } + throw e; + } } /** Returns whether the GCS bucket exists and is accessible. 
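The open() and create() changes above share one instrumentation shape: build the base monitoring labels, run the storage call, then record "ok" on success or the HTTP status code when the cause is a GoogleJsonResponseException. Below is a hedged sketch of that shape factored into a helper; the helper and its names (IoCall, callAndRecord) are illustrative only and not part of this change.

    import com.google.api.client.googleapis.json.GoogleJsonResponseException;
    import java.io.IOException;
    import org.apache.beam.runners.core.metrics.ServiceCallMetric;

    public class ServiceCallMetricSketch {

      /** A storage call that may fail with an IOException. */
      interface IoCall<T> {
        T run() throws IOException;
      }

      /**
       * Runs the call and records its outcome on the metric: "ok" on success, or the wrapped HTTP
       * status code when the failure carries a GoogleJsonResponseException, as open() and create()
       * do above.
       */
      static <T> T callAndRecord(ServiceCallMetric metric, IoCall<T> call) throws IOException {
        try {
          T result = call.run();
          metric.call("ok");
          return result;
        } catch (IOException e) {
          if (e.getCause() instanceof GoogleJsonResponseException) {
            metric.call(((GoogleJsonResponseException) e.getCause()).getDetails().getCode());
          }
          throw e;
        }
      }
    }

Centralizing the catch keeps the status-code extraction in one place if more GCS entry points are instrumented the same way later.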
*/ @@ -487,7 +587,7 @@ Bucket getBucket(GcsPath path, BackOff backoff, Sleeper sleeper) throws IOExcept try { return ResilientOperation.retry( - ResilientOperation.getGoogleRequestCallable(getBucket), + getBucket::execute, backoff, new RetryDeterminer() { @Override @@ -526,7 +626,7 @@ void createBucket(String projectId, Bucket bucket, BackOff backoff, Sleeper slee try { ResilientOperation.retry( - ResilientOperation.getGoogleRequestCallable(insertBucket), + insertBucket::execute, backoff, new RetryDeterminer() { @Override @@ -552,13 +652,13 @@ public boolean shouldRetry(IOException e) { Thread.currentThread().interrupt(); throw new IOException( String.format( - "Error while attempting to create bucket gs://%s for rproject %s", + "Error while attempting to create bucket gs://%s for project %s", bucket.getName(), projectId), e); } } - private static void executeBatches(List batches) throws IOException { + private static void executeBatches(List batches) throws IOException { ExecutorService executor = MoreExecutors.listeningDecorator( new ThreadPoolExecutor( @@ -569,7 +669,7 @@ private static void executeBatches(List batches) throws IOExceptio new LinkedBlockingQueue<>())); List> futures = new ArrayList<>(); - for (final BatchRequest batch : batches) { + for (final BatchInterface batch : batches) { futures.add(MoreFutures.runAsync(() -> batch.execute(), executor)); } @@ -589,20 +689,20 @@ private static void executeBatches(List batches) throws IOExceptio } /** - * Makes get {@link BatchRequest BatchRequests}. + * Makes get {@link BatchInterface BatchInterfaces}. * * @param paths {@link GcsPath GcsPaths}. * @param results mutable {@link List} for return values. - * @return {@link BatchRequest BatchRequests} to execute. + * @return {@link BatchInterface BatchInterfaces} to execute. * @throws IOException */ @VisibleForTesting - List makeGetBatches( + List makeGetBatches( Collection paths, List results) throws IOException { - List batches = new ArrayList<>(); + List batches = new ArrayList<>(); for (List filesToGet : Lists.partition(Lists.newArrayList(paths), MAX_REQUESTS_PER_BATCH)) { - BatchRequest batch = createBatchRequest(); + BatchInterface batch = batchRequestSupplier.get(); for (GcsPath path : filesToGet) { results.add(enqueueGetFileSize(path, batch)); } @@ -612,35 +712,85 @@ List makeGetBatches( } /** - * Wrapper for RewriteRequest that supports multiple calls. + * Wrapper for rewriting that supports multiple calls as well as possibly deleting the source + * file. * *

    Usage: create, enqueue(), and execute batch. Then, check getReadyToEnqueue() if another * round of enqueue() and execute is required. Repeat until getReadyToEnqueue() returns false. */ class RewriteOp extends JsonBatchCallback { - private GcsPath from; - private GcsPath to; + private final GcsPath from; + private final GcsPath to; + private final boolean deleteSource; + private final boolean ignoreMissingSource; private boolean readyToEnqueue; + private boolean performDelete; + private GoogleJsonError lastError; @VisibleForTesting Storage.Objects.Rewrite rewriteRequest; public boolean getReadyToEnqueue() { return readyToEnqueue; } - public void enqueue(BatchRequest batch) throws IOException { + public GoogleJsonError getLastError() { + return lastError; + } + + public GcsPath getFrom() { + return from; + } + + public GcsPath getTo() { + return to; + } + + public void enqueue(BatchInterface batch) throws IOException { if (!readyToEnqueue) { throw new IOException( String.format( "Invalid state for Rewrite, from=%s, to=%s, readyToEnqueue=%s", from, to, readyToEnqueue)); } - rewriteRequest.queue(batch, this); - readyToEnqueue = false; + if (performDelete) { + Storage.Objects.Delete deleteRequest = + storageClient.objects().delete(from.getBucket(), from.getObject()); + batch.queue( + deleteRequest, + new JsonBatchCallback() { + @Override + public void onSuccess(Void obj, HttpHeaders responseHeaders) { + LOG.debug("Successfully deleted {} after moving to {}", from, to); + readyToEnqueue = false; + lastError = null; + } + + @Override + public void onFailure(GoogleJsonError e, HttpHeaders responseHeaders) + throws IOException { + if (e.getCode() == 404) { + LOG.info( + "Ignoring failed deletion of moved file {} which already does not exist: {}", + from, + e); + readyToEnqueue = false; + lastError = null; + } else { + readyToEnqueue = true; + lastError = e; + } + } + }); + } else { + batch.queue(rewriteRequest, this); + } } - public RewriteOp(GcsPath from, GcsPath to) throws IOException { + public RewriteOp(GcsPath from, GcsPath to, boolean deleteSource, boolean ignoreMissingSource) + throws IOException { this.from = from; this.to = to; + this.deleteSource = deleteSource; + this.ignoreMissingSource = ignoreMissingSource; rewriteRequest = storageClient .objects() @@ -654,9 +804,14 @@ public RewriteOp(GcsPath from, GcsPath to) throws IOException { @Override public void onSuccess(RewriteResponse rewriteResponse, HttpHeaders responseHeaders) throws IOException { + lastError = null; if (rewriteResponse.getDone()) { - LOG.debug("Rewrite done: {} to {}", from, to); - readyToEnqueue = false; + if (deleteSource) { + readyToEnqueue = true; + performDelete = true; + } else { + readyToEnqueue = false; + } } else { LOG.debug( "Rewrite progress: {} of {} bytes, {} to {}", @@ -674,21 +829,101 @@ public void onSuccess(RewriteResponse rewriteResponse, HttpHeaders responseHeade @Override public void onFailure(GoogleJsonError e, HttpHeaders responseHeaders) throws IOException { - readyToEnqueue = false; - throw new IOException(String.format("Error trying to rewrite %s to %s: %s", from, to, e)); + if (e.getCode() == HttpStatusCodes.STATUS_CODE_NOT_FOUND) { + if (ignoreMissingSource) { + // Treat a missing source as a successful rewrite. 
+ readyToEnqueue = false; + lastError = null; + } else { + throw new FileNotFoundException(from.toString()); + } + } else { + lastError = e; + readyToEnqueue = true; + } } } public void copy(Iterable srcFilenames, Iterable destFilenames) throws IOException { - LinkedList rewrites = makeRewriteOps(srcFilenames, destFilenames); - while (rewrites.size() > 0) { - executeBatches(makeCopyBatches(rewrites)); + rewriteHelper( + srcFilenames, + destFilenames, + /*deleteSource=*/ false, + /*ignoreMissingSource=*/ false, + /*ignoreExistingDest=*/ false); + } + + public void rename( + Iterable srcFilenames, Iterable destFilenames, MoveOptions... moveOptions) + throws IOException { + // Rename is implemented as a rewrite followed by deleting the source. If the new object is in + // the same location, the copy is a metadata-only operation. + Set moveOptionSet = Sets.newHashSet(moveOptions); + final boolean ignoreMissingSrc = + moveOptionSet.contains(StandardMoveOptions.IGNORE_MISSING_FILES); + final boolean ignoreExistingDest = + moveOptionSet.contains(StandardMoveOptions.SKIP_IF_DESTINATION_EXISTS); + rewriteHelper( + srcFilenames, destFilenames, /*deleteSource=*/ true, ignoreMissingSrc, ignoreExistingDest); + } + + private void rewriteHelper( + Iterable srcFilenames, + Iterable destFilenames, + boolean deleteSource, + boolean ignoreMissingSource, + boolean ignoreExistingDest) + throws IOException { + LinkedList rewrites = + makeRewriteOps( + srcFilenames, destFilenames, deleteSource, ignoreMissingSource, ignoreExistingDest); + org.apache.beam.sdk.util.BackOff backoff = BACKOFF_FACTORY.backoff(); + while (true) { + List batches = makeRewriteBatches(rewrites); // Removes completed rewrite ops. + if (batches.isEmpty()) { + break; + } + RewriteOp sampleErrorOp = + rewrites.stream().filter(op -> op.getLastError() != null).findFirst().orElse(null); + if (sampleErrorOp != null) { + long backOffMillis = backoff.nextBackOffMillis(); + if (backOffMillis == org.apache.beam.sdk.util.BackOff.STOP) { + throw new IOException( + String.format( + "Error completing file copies with retries, sample: from %s to %s due to %s", + sampleErrorOp.getFrom().toString(), + sampleErrorOp.getTo().toString(), + sampleErrorOp.getLastError())); + } + LOG.warn( + "Retrying with backoff unsuccessful copy requests, sample request: from {} to {} due to {}", + sampleErrorOp.getFrom(), + sampleErrorOp.getTo(), + sampleErrorOp.getLastError()); + try { + Thread.sleep(backOffMillis); + } catch (InterruptedException e) { + Thread.currentThread().interrupt(); + throw new IOException( + String.format( + "Interrupted backoff of file copies with retries, sample: from %s to %s due to %s", + sampleErrorOp.getFrom().toString(), + sampleErrorOp.getTo().toString(), + sampleErrorOp.getLastError())); + } + } + executeBatches(batches); } } LinkedList makeRewriteOps( - Iterable srcFilenames, Iterable destFilenames) throws IOException { + Iterable srcFilenames, + Iterable destFilenames, + boolean deleteSource, + boolean ignoreMissingSource, + boolean ignoreExistingDest) + throws IOException { List srcList = Lists.newArrayList(srcFilenames); List destList = Lists.newArrayList(destFilenames); checkArgument( @@ -700,14 +935,18 @@ LinkedList makeRewriteOps( for (int i = 0; i < srcList.size(); i++) { final GcsPath sourcePath = GcsPath.fromUri(srcList.get(i)); final GcsPath destPath = GcsPath.fromUri(destList.get(i)); - rewrites.addLast(new RewriteOp(sourcePath, destPath)); + if (ignoreExistingDest && !sourcePath.getBucket().equals(destPath.getBucket())) { + 
throw new UnsupportedOperationException( + "Skipping dest existence is only supported within a bucket."); + } + rewrites.addLast(new RewriteOp(sourcePath, destPath, deleteSource, ignoreMissingSource)); } return rewrites; } - List makeCopyBatches(LinkedList rewrites) throws IOException { - List batches = new ArrayList<>(); - BatchRequest batch = createBatchRequest(); + List makeRewriteBatches(LinkedList rewrites) throws IOException { + List batches = new ArrayList<>(); + BatchInterface batch = batchRequestSupplier.get(); Iterator it = rewrites.iterator(); while (it.hasNext()) { RewriteOp rewrite = it.next(); @@ -719,7 +958,7 @@ List makeCopyBatches(LinkedList rewrites) throws IOExce if (batch.size() >= MAX_REQUESTS_PER_BATCH) { batches.add(batch); - batch = createBatchRequest(); + batch = batchRequestSupplier.get(); } } if (batch.size() > 0) { @@ -728,11 +967,11 @@ List makeCopyBatches(LinkedList rewrites) throws IOExce return batches; } - List makeRemoveBatches(Collection filenames) throws IOException { - List batches = new ArrayList<>(); + List makeRemoveBatches(Collection filenames) throws IOException { + List batches = new ArrayList<>(); for (List filesToDelete : Lists.partition(Lists.newArrayList(filenames), MAX_REQUESTS_PER_BATCH)) { - BatchRequest batch = createBatchRequest(); + BatchInterface batch = batchRequestSupplier.get(); for (String file : filesToDelete) { enqueueDelete(GcsPath.fromUri(file), batch); } @@ -742,17 +981,19 @@ List makeRemoveBatches(Collection filenames) throws IOExce } public void remove(Collection filenames) throws IOException { + // TODO(BEAM-8268): It would be better to add per-file retries and backoff + // instead of failing everything if a single operation fails. executeBatches(makeRemoveBatches(filenames)); } - private StorageObjectOrIOException[] enqueueGetFileSize(final GcsPath path, BatchRequest batch) + private StorageObjectOrIOException[] enqueueGetFileSize(final GcsPath path, BatchInterface batch) throws IOException { final StorageObjectOrIOException[] ret = new StorageObjectOrIOException[1]; Storage.Objects.Get getRequest = storageClient.objects().get(path.getBucket(), path.getObject()); - getRequest.queue( - batch, + batch.queue( + getRequest, new JsonBatchCallback() { @Override public void onSuccess(StorageObject response, HttpHeaders httpHeaders) @@ -763,7 +1004,7 @@ public void onSuccess(StorageObject response, HttpHeaders httpHeaders) @Override public void onFailure(GoogleJsonError e, HttpHeaders httpHeaders) throws IOException { IOException ioException; - if (errorExtractor.itemNotFound(e)) { + if (e.getCode() == HttpStatusCodes.STATUS_CODE_NOT_FOUND) { ioException = new FileNotFoundException(path.toString()); } else { ioException = new IOException(String.format("Error trying to get %s: %s", path, e)); @@ -799,11 +1040,11 @@ public static StorageObjectOrIOException create(IOException ioException) { } } - private void enqueueDelete(final GcsPath file, BatchRequest batch) throws IOException { + private void enqueueDelete(final GcsPath file, BatchInterface batch) throws IOException { Storage.Objects.Delete deleteRequest = storageClient.objects().delete(file.getBucket(), file.getObject()); - deleteRequest.queue( - batch, + batch.queue( + deleteRequest, new JsonBatchCallback() { @Override public void onSuccess(Void obj, HttpHeaders responseHeaders) { @@ -822,7 +1063,13 @@ public void onFailure(GoogleJsonError e, HttpHeaders responseHeaders) throws IOE }); } - private BatchRequest createBatchRequest() { - return 
storageClient.batch(httpRequestInitializer); + @VisibleForTesting + interface BatchInterface { + void queue(AbstractGoogleJsonClientRequest request, JsonBatchCallback cb) + throws IOException; + + void execute() throws IOException; + + int size(); } } diff --git a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/LatencyRecordingHttpRequestInitializer.java b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/LatencyRecordingHttpRequestInitializer.java index 655863683583..5b9cc614e9f2 100644 --- a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/LatencyRecordingHttpRequestInitializer.java +++ b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/LatencyRecordingHttpRequestInitializer.java @@ -23,14 +23,32 @@ import com.google.api.client.http.HttpResponse; import com.google.api.client.http.HttpResponseInterceptor; import java.io.IOException; -import org.apache.beam.sdk.util.Histogram; +import java.util.Map; +import org.apache.beam.runners.core.metrics.LabeledMetrics; +import org.apache.beam.runners.core.metrics.MonitoringInfoConstants; +import org.apache.beam.runners.core.metrics.MonitoringInfoMetricName; +import org.apache.beam.sdk.metrics.Histogram; +import org.apache.beam.sdk.util.HistogramData; /** HttpRequestInitializer for recording request to response latency of Http-based API calls. */ public class LatencyRecordingHttpRequestInitializer implements HttpRequestInitializer { + public static final HistogramData.BucketType HISTOGRAM_BUCKET_TYPE = + // record latency upto 60 seconds in the resolution of 20ms + HistogramData.LinearBuckets.of(0, 20, 3000); private final Histogram requestLatencies; - public LatencyRecordingHttpRequestInitializer(Histogram requestLatencies) { - this.requestLatencies = requestLatencies; + public LatencyRecordingHttpRequestInitializer(Histogram histogram) { + requestLatencies = histogram; + } + + public LatencyRecordingHttpRequestInitializer(Map labels) { + // record latency upto 60 seconds in the resolution of 20ms + this.requestLatencies = + LabeledMetrics.histogram( + MonitoringInfoMetricName.named( + MonitoringInfoConstants.Urns.API_REQUEST_LATENCIES, labels), + HISTOGRAM_BUCKET_TYPE, + true); } private static class LoggingInterceptor @@ -45,7 +63,7 @@ public LoggingInterceptor(Histogram requestLatencies) { @Override public void interceptResponse(HttpResponse response) throws IOException { long timeToResponse = System.currentTimeMillis() - startTime; - requestLatencies.record(timeToResponse); + requestLatencies.update(timeToResponse); } @Override diff --git a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/GcpCoreApiSurfaceTest.java b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/GcpCoreApiSurfaceTest.java index bb2d4e07b7a6..5014bd7370c1 100644 --- a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/GcpCoreApiSurfaceTest.java +++ b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/GcpCoreApiSurfaceTest.java @@ -31,9 +31,6 @@ /** API surface verification for Google Cloud Platform core components. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class GcpCoreApiSurfaceTest { @Test diff --git a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptionsTest.java b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptionsTest.java index 6e268364d223..0f0b5b312aa4 100644 --- a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptionsTest.java +++ b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptionsTest.java @@ -59,9 +59,6 @@ import org.mockito.MockitoAnnotations; /** Tests for {@link GcpOptions}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class GcpOptionsTest { /** Tests for the majority of methods. */ diff --git a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystemRegistrarTest.java b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystemRegistrarTest.java index 10752dd42d3f..69c70a80b33e 100644 --- a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystemRegistrarTest.java +++ b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystemRegistrarTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.sdk.extensions.gcp.storage; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.instanceOf; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import java.util.ServiceLoader; diff --git a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystemTest.java b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystemTest.java index 405cda2d5903..2b56b81d8a6c 100644 --- a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystemTest.java +++ b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystemTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.sdk.extensions.gcp.storage; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.mockito.ArgumentMatchers.anyString; import static org.mockito.Matchers.eq; import static org.mockito.Matchers.isNull; @@ -53,9 +53,6 @@ /** Tests for {@link GcsFileSystem}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class GcsFileSystemTest { @Rule public transient ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/storage/GcsPathValidatorTest.java b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/storage/GcsPathValidatorTest.java index 9cfd4099b272..924afe747c3b 100644 --- a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/storage/GcsPathValidatorTest.java +++ b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/storage/GcsPathValidatorTest.java @@ -36,9 +36,6 @@ /** Tests for {@link GcsPathValidator}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class GcsPathValidatorTest { @Rule public ExpectedException expectedException = ExpectedException.none(); diff --git a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/storage/GcsResourceIdTest.java b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/storage/GcsResourceIdTest.java index 1346a5aee5c8..9df76a883b53 100644 --- a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/storage/GcsResourceIdTest.java +++ b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/storage/GcsResourceIdTest.java @@ -37,9 +37,6 @@ /** Tests for {@link GcsResourceId}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class GcsResourceIdTest { @Rule public ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtilIT.java b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtilIT.java index 316f36f1df12..ff96566d12e4 100644 --- a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtilIT.java +++ b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtilIT.java @@ -44,9 +44,6 @@ */ @RunWith(JUnit4.class) @Category(UsesKms.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class GcsUtilIT { /** Tests a rewrite operation that requires multiple API calls (using a continuation token). 
*/ @Test diff --git a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtilTest.java b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtilTest.java index eac676dae3e5..7856ddd175ba 100644 --- a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtilTest.java +++ b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtilTest.java @@ -17,6 +17,7 @@ */ package org.apache.beam.sdk.extensions.gcp.util; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.greaterThan; @@ -25,14 +26,18 @@ import static org.junit.Assert.assertNotNull; import static org.junit.Assert.assertNull; import static org.junit.Assert.assertSame; -import static org.junit.Assert.assertThat; +import static org.junit.Assert.assertThrows; import static org.junit.Assert.assertTrue; import static org.mockito.Matchers.any; +import static org.mockito.Mockito.times; +import static org.mockito.Mockito.verify; import static org.mockito.Mockito.when; -import com.google.api.client.googleapis.batch.BatchRequest; +import com.google.api.client.googleapis.batch.json.JsonBatchCallback; +import com.google.api.client.googleapis.json.GoogleJsonError; import com.google.api.client.googleapis.json.GoogleJsonError.ErrorInfo; import com.google.api.client.googleapis.json.GoogleJsonResponseException; +import com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest; import com.google.api.client.http.HttpRequest; import com.google.api.client.http.HttpResponse; import com.google.api.client.http.HttpStatusCodes; @@ -51,10 +56,13 @@ import com.google.api.services.storage.Storage; import com.google.api.services.storage.model.Bucket; import com.google.api.services.storage.model.Objects; +import com.google.api.services.storage.model.RewriteResponse; import com.google.api.services.storage.model.StorageObject; -import com.google.cloud.hadoop.gcsio.GoogleCloudStorageReadChannel; +import com.google.cloud.hadoop.gcsio.CreateObjectOptions; +import com.google.cloud.hadoop.gcsio.GoogleCloudStorage; +import com.google.cloud.hadoop.gcsio.GoogleCloudStorageOptions; import com.google.cloud.hadoop.gcsio.GoogleCloudStorageReadOptions; -import com.google.cloud.hadoop.util.ClientRequestHelper; +import com.google.cloud.hadoop.gcsio.StorageResourceId; import java.io.ByteArrayInputStream; import java.io.FileNotFoundException; import java.io.IOException; @@ -67,21 +75,31 @@ import java.util.ArrayList; import java.util.Arrays; import java.util.Collections; +import java.util.HashMap; import java.util.LinkedList; import java.util.List; import java.util.concurrent.CountDownLatch; import java.util.concurrent.ExecutorService; import java.util.concurrent.Executors; import java.util.concurrent.TimeUnit; +import java.util.function.Supplier; +import org.apache.beam.runners.core.metrics.GcpResourceIdentifiers; +import org.apache.beam.runners.core.metrics.MetricsContainerImpl; +import org.apache.beam.runners.core.metrics.MonitoringInfoConstants; +import org.apache.beam.runners.core.metrics.MonitoringInfoMetricName; import org.apache.beam.sdk.extensions.gcp.auth.TestCredential; import org.apache.beam.sdk.extensions.gcp.options.GcsOptions; +import org.apache.beam.sdk.extensions.gcp.util.GcsUtil.BatchInterface; import 
org.apache.beam.sdk.extensions.gcp.util.GcsUtil.RewriteOp; import org.apache.beam.sdk.extensions.gcp.util.GcsUtil.StorageObjectOrIOException; import org.apache.beam.sdk.extensions.gcp.util.gcsfs.GcsPath; +import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions; +import org.apache.beam.sdk.metrics.MetricsEnvironment; import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.util.FluentBackoff; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; +import org.junit.Before; import org.junit.Rule; import org.junit.Test; import org.junit.rules.ExpectedException; @@ -91,12 +109,16 @@ /** Test case for {@link GcsUtil}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class GcsUtilTest { @Rule public ExpectedException thrown = ExpectedException.none(); + @Before + public void setUp() { + // Setup the ProcessWideContainer for testing metrics are set. + MetricsContainerImpl container = new MetricsContainerImpl(null); + MetricsEnvironment.setProcessWideContainer(container); + } + private static GcsOptions gcsOptionsWithTestCredential() { GcsOptions pipelineOptions = PipelineOptionsFactory.as(GcsOptions.class); pipelineOptions.setGcpCredential(new TestCredential()); @@ -500,21 +522,30 @@ public void testGetSizeBytesWhenFileNotFoundBatchRetry() throws Exception { + "\n"; thrown.expect(FileNotFoundException.class); - final LowLevelHttpResponse mockResponse = Mockito.mock(LowLevelHttpResponse.class); - when(mockResponse.getContentType()).thenReturn("multipart/mixed; boundary=" + contentBoundary); + final LowLevelHttpResponse[] mockResponses = + new LowLevelHttpResponse[] { + Mockito.mock(LowLevelHttpResponse.class), Mockito.mock(LowLevelHttpResponse.class), + }; + when(mockResponses[0].getContentType()).thenReturn("text/plain"); + when(mockResponses[1].getContentType()) + .thenReturn("multipart/mixed; boundary=" + contentBoundary); // 429: Too many requests, then 200: OK. - when(mockResponse.getStatusCode()).thenReturn(429, 200); - when(mockResponse.getContent()).thenReturn(toStream("error"), toStream(content)); + when(mockResponses[0].getStatusCode()).thenReturn(429); + when(mockResponses[1].getStatusCode()).thenReturn(200); + when(mockResponses[0].getContent()).thenReturn(toStream("error")); + when(mockResponses[1].getContent()).thenReturn(toStream(content)); // A mock transport that lets us mock the API responses. 
MockHttpTransport mockTransport = new MockHttpTransport.Builder() .setLowLevelHttpRequest( new MockLowLevelHttpRequest() { + int index = 0; + @Override public LowLevelHttpResponse execute() throws IOException { - return mockResponse; + return mockResponses[index++]; } }) .build(); @@ -762,15 +793,78 @@ public void testGetBucketNotExists() throws IOException { @Test public void testGCSChannelCloseIdempotent() throws IOException { + GcsOptions pipelineOptions = gcsOptionsWithTestCredential(); + GcsUtil gcsUtil = pipelineOptions.getGcsUtil(); GoogleCloudStorageReadOptions readOptions = GoogleCloudStorageReadOptions.builder().setFastFailOnNotFound(false).build(); SeekableByteChannel channel = - new GoogleCloudStorageReadChannel( - null, "dummybucket", "dummyobject", null, new ClientRequestHelper<>(), readOptions); + gcsUtil.open(GcsPath.fromComponents("testbucket", "testobject"), readOptions); channel.close(); channel.close(); } + @Test + public void testGCSReadMetricsIsSet() { + GcsOptions pipelineOptions = gcsOptionsWithTestCredential(); + GcsUtil gcsUtil = pipelineOptions.getGcsUtil(); + gcsUtil.setCloudStorageImpl( + GoogleCloudStorageOptions.builder() + .setAppName("Beam") + .setGrpcEnabled(true) + .setProjectId("my_project") + .build()); + GoogleCloudStorageReadOptions readOptions = + GoogleCloudStorageReadOptions.builder().setFastFailOnNotFound(true).build(); + assertThrows( + IOException.class, + () -> gcsUtil.open(GcsPath.fromComponents("testbucket", "testbucket"), readOptions)); + verifyMetricWasSet("my_project", "testbucket", "GcsGet", "permission_denied", 1); + } + + @Test + public void testGCSWriteMetricsIsSet() throws IOException { + GcsOptions pipelineOptions = gcsOptionsWithTestCredential(); + GcsUtil gcsUtil = pipelineOptions.getGcsUtil(); + GoogleCloudStorage mockStorage = Mockito.mock(GoogleCloudStorage.class); + gcsUtil.setCloudStorageImpl( + GoogleCloudStorageOptions.builder() + .setAppName("Beam") + .setGrpcEnabled(true) + .setProjectId("my_project") + .build()); + when(mockStorage.create( + new StorageResourceId("testbucket", "testobject"), + CreateObjectOptions.builder() + .setOverwriteExisting(true) + .setContentType("type") + .build())) + .thenThrow(IOException.class); + GcsPath gcsPath = GcsPath.fromComponents("testbucket", "testobject"); + assertThrows(IOException.class, () -> gcsUtil.create(gcsPath, "")); + verifyMetricWasSet("my_project", "testbucket", "GcsInsert", "permission_denied", 1); + } + + private void verifyMetricWasSet( + String projectId, String bucketId, String method, String status, long count) { + // Verify the metric as reported. + HashMap labels = new HashMap<>(); + labels.put(MonitoringInfoConstants.Labels.PTRANSFORM, ""); + labels.put(MonitoringInfoConstants.Labels.SERVICE, "Storage"); + labels.put(MonitoringInfoConstants.Labels.METHOD, method); + labels.put(MonitoringInfoConstants.Labels.GCS_PROJECT_ID, projectId); + labels.put(MonitoringInfoConstants.Labels.GCS_BUCKET, bucketId); + labels.put( + MonitoringInfoConstants.Labels.RESOURCE, + GcpResourceIdentifiers.cloudStorageBucket(bucketId)); + labels.put(MonitoringInfoConstants.Labels.STATUS, status); + + MonitoringInfoMetricName name = + MonitoringInfoMetricName.named(MonitoringInfoConstants.Urns.API_REQUEST_COUNT, labels); + MetricsContainerImpl container = + (MetricsContainerImpl) MetricsEnvironment.getProcessWideContainer(); + assertEquals(count, (long) container.getCounter(name).getCumulative()); + } + /** Builds a fake GoogleJsonResponseException for testing API error handling. 
*/ private static GoogleJsonResponseException googleJsonResponseException( final int status, final String reason, final String message) throws IOException { @@ -821,9 +915,9 @@ private static List makeGcsPaths(String s, int n) { return ret.build(); } - private static int sumBatchSizes(List batches) { + private static int sumBatchSizes(List batches) { int ret = 0; - for (BatchRequest b : batches) { + for (BatchInterface b : batches) { ret += b.size(); assertThat(b.size(), greaterThan(0)); } @@ -836,7 +930,7 @@ public void testMakeRewriteOps() throws IOException { GcsUtil gcsUtil = gcsOptions.getGcsUtil(); LinkedList rewrites = - gcsUtil.makeRewriteOps(makeStrings("s", 1), makeStrings("d", 1)); + gcsUtil.makeRewriteOps(makeStrings("s", 1), makeStrings("d", 1), false, false, false); assertEquals(1, rewrites.size()); RewriteOp rewrite = rewrites.pop(); @@ -856,7 +950,7 @@ public void testMakeRewriteOpsWithOptions() throws IOException { gcsUtil.maxBytesRewrittenPerCall = 1337L; LinkedList rewrites = - gcsUtil.makeRewriteOps(makeStrings("s", 1), makeStrings("d", 1)); + gcsUtil.makeRewriteOps(makeStrings("s", 1), makeStrings("d", 1), false, false, false); assertEquals(1, rewrites.size()); RewriteOp rewrite = rewrites.pop(); @@ -866,26 +960,29 @@ public void testMakeRewriteOpsWithOptions() throws IOException { } @Test - public void testMakeCopyBatches() throws IOException { + public void testMakeRewriteBatches() throws IOException { GcsUtil gcsUtil = gcsOptionsWithTestCredential().getGcsUtil(); // Small number of files fits in 1 batch - List batches = - gcsUtil.makeCopyBatches(gcsUtil.makeRewriteOps(makeStrings("s", 3), makeStrings("d", 3))); + List batches = + gcsUtil.makeRewriteBatches( + gcsUtil.makeRewriteOps(makeStrings("s", 3), makeStrings("d", 3), false, false, false)); assertThat(batches.size(), equalTo(1)); assertThat(sumBatchSizes(batches), equalTo(3)); // 1 batch of files fits in 1 batch batches = - gcsUtil.makeCopyBatches( - gcsUtil.makeRewriteOps(makeStrings("s", 100), makeStrings("d", 100))); + gcsUtil.makeRewriteBatches( + gcsUtil.makeRewriteOps( + makeStrings("s", 100), makeStrings("d", 100), false, false, false)); assertThat(batches.size(), equalTo(1)); assertThat(sumBatchSizes(batches), equalTo(100)); // A little more than 5 batches of files fits in 6 batches batches = - gcsUtil.makeCopyBatches( - gcsUtil.makeRewriteOps(makeStrings("s", 501), makeStrings("d", 501))); + gcsUtil.makeRewriteBatches( + gcsUtil.makeRewriteOps( + makeStrings("s", 501), makeStrings("d", 501), false, false, false)); assertThat(batches.size(), equalTo(6)); assertThat(sumBatchSizes(batches), equalTo(501)); } @@ -896,7 +993,184 @@ public void testMakeRewriteOpsInvalid() throws IOException { thrown.expect(IllegalArgumentException.class); thrown.expectMessage("Number of source files 3"); - gcsUtil.makeRewriteOps(makeStrings("s", 3), makeStrings("d", 1)); + gcsUtil.makeRewriteOps(makeStrings("s", 3), makeStrings("d", 1), false, false, false); + } + + private class FakeBatcher implements BatchInterface { + ArrayList> requests = new ArrayList<>(); + + @Override + public void queue(AbstractGoogleJsonClientRequest request, JsonBatchCallback cb) { + assertNotNull(request); + assertNotNull(cb); + requests.add( + () -> { + try { + try { + T result = request.execute(); + cb.onSuccess(result, null); + } catch (FileNotFoundException e) { + GoogleJsonError error = new GoogleJsonError(); + error.setCode(HttpStatusCodes.STATUS_CODE_NOT_FOUND); + cb.onFailure(error, null); + } catch (Exception e) { + 
System.out.println("Propagating exception as server error " + e); + e.printStackTrace(); + GoogleJsonError error = new GoogleJsonError(); + error.setCode(HttpStatusCodes.STATUS_CODE_SERVER_ERROR); + cb.onFailure(error, null); + } + } catch (IOException e) { + throw new RuntimeException(e); + } + return null; + }); + } + + @Override + public void execute() throws IOException { + RuntimeException lastException = null; + for (Supplier request : requests) { + try { + request.get(); + } catch (RuntimeException e) { + lastException = e; + } + } + if (lastException != null) { + throw lastException; + } + } + + @Override + public int size() { + return requests.size(); + } + } + + @Test + public void testRename() throws IOException { + GcsOptions pipelineOptions = gcsOptionsWithTestCredential(); + GcsUtil gcsUtil = pipelineOptions.getGcsUtil(); + + Storage mockStorage = Mockito.mock(Storage.class); + gcsUtil.setStorageClient(mockStorage); + gcsUtil.setBatchRequestSupplier(() -> new FakeBatcher()); + + Storage.Objects mockStorageObjects = Mockito.mock(Storage.Objects.class); + Storage.Objects.Rewrite mockStorageRewrite = Mockito.mock(Storage.Objects.Rewrite.class); + Storage.Objects.Delete mockStorageDelete1 = Mockito.mock(Storage.Objects.Delete.class); + Storage.Objects.Delete mockStorageDelete2 = Mockito.mock(Storage.Objects.Delete.class); + + when(mockStorage.objects()).thenReturn(mockStorageObjects); + when(mockStorageObjects.rewrite("bucket", "s0", "bucket", "d0", null)) + .thenReturn(mockStorageRewrite); + when(mockStorageRewrite.execute()) + .thenThrow(new SocketTimeoutException("SocketException")) + .thenReturn(new RewriteResponse().setDone(true)); + when(mockStorageObjects.delete("bucket", "s0")) + .thenReturn(mockStorageDelete1) + .thenReturn(mockStorageDelete2); + + when(mockStorageDelete1.execute()).thenThrow(new SocketTimeoutException("SocketException")); + + gcsUtil.rename(makeStrings("s", 1), makeStrings("d", 1)); + verify(mockStorageRewrite, times(2)).execute(); + verify(mockStorageDelete1, times(1)).execute(); + verify(mockStorageDelete2, times(1)).execute(); + } + + @Test + public void testRenameIgnoringMissing() throws IOException { + GcsOptions pipelineOptions = gcsOptionsWithTestCredential(); + GcsUtil gcsUtil = pipelineOptions.getGcsUtil(); + + Storage mockStorage = Mockito.mock(Storage.class); + gcsUtil.setStorageClient(mockStorage); + gcsUtil.setBatchRequestSupplier(() -> new FakeBatcher()); + + Storage.Objects mockStorageObjects = Mockito.mock(Storage.Objects.class); + Storage.Objects.Rewrite mockStorageRewrite1 = Mockito.mock(Storage.Objects.Rewrite.class); + Storage.Objects.Rewrite mockStorageRewrite2 = Mockito.mock(Storage.Objects.Rewrite.class); + Storage.Objects.Delete mockStorageDelete = Mockito.mock(Storage.Objects.Delete.class); + + when(mockStorage.objects()).thenReturn(mockStorageObjects); + when(mockStorageObjects.rewrite("bucket", "s0", "bucket", "d0", null)) + .thenReturn(mockStorageRewrite1); + when(mockStorageRewrite1.execute()).thenThrow(new FileNotFoundException()); + when(mockStorageObjects.rewrite("bucket", "s1", "bucket", "d1", null)) + .thenReturn(mockStorageRewrite2); + when(mockStorageRewrite2.execute()).thenReturn(new RewriteResponse().setDone(true)); + when(mockStorageObjects.delete("bucket", "s1")).thenReturn(mockStorageDelete); + + gcsUtil.rename( + makeStrings("s", 2), makeStrings("d", 2), StandardMoveOptions.IGNORE_MISSING_FILES); + verify(mockStorageRewrite1, times(1)).execute(); + verify(mockStorageRewrite2, times(1)).execute(); + 
verify(mockStorageDelete, times(1)).execute(); + } + + @Test + public void testRenamePropagateMissingException() throws IOException { + GcsOptions pipelineOptions = gcsOptionsWithTestCredential(); + GcsUtil gcsUtil = pipelineOptions.getGcsUtil(); + + Storage mockStorage = Mockito.mock(Storage.class); + gcsUtil.setStorageClient(mockStorage); + gcsUtil.setBatchRequestSupplier(() -> new FakeBatcher()); + + Storage.Objects mockStorageObjects = Mockito.mock(Storage.Objects.class); + Storage.Objects.Rewrite mockStorageRewrite = Mockito.mock(Storage.Objects.Rewrite.class); + + when(mockStorage.objects()).thenReturn(mockStorageObjects); + when(mockStorageObjects.rewrite("bucket", "s0", "bucket", "d0", null)) + .thenReturn(mockStorageRewrite); + when(mockStorageRewrite.execute()).thenThrow(new FileNotFoundException()); + + assertThrows(IOException.class, () -> gcsUtil.rename(makeStrings("s", 1), makeStrings("d", 1))); + verify(mockStorageRewrite, times(1)).execute(); + } + + @Test + public void testRenameSkipDestinationExistsSameBucket() throws IOException { + GcsOptions pipelineOptions = gcsOptionsWithTestCredential(); + GcsUtil gcsUtil = pipelineOptions.getGcsUtil(); + + Storage mockStorage = Mockito.mock(Storage.class); + gcsUtil.setStorageClient(mockStorage); + gcsUtil.setBatchRequestSupplier(() -> new FakeBatcher()); + + Storage.Objects mockStorageObjects = Mockito.mock(Storage.Objects.class); + Storage.Objects.Rewrite mockStorageRewrite = Mockito.mock(Storage.Objects.Rewrite.class); + Storage.Objects.Delete mockStorageDelete = Mockito.mock(Storage.Objects.Delete.class); + + when(mockStorage.objects()).thenReturn(mockStorageObjects); + when(mockStorageObjects.rewrite("bucket", "s0", "bucket", "d0", null)) + .thenReturn(mockStorageRewrite); + when(mockStorageRewrite.execute()).thenReturn(new RewriteResponse().setDone(true)); + when(mockStorageObjects.delete("bucket", "s0")).thenReturn(mockStorageDelete); + + gcsUtil.rename( + makeStrings("s", 1), makeStrings("d", 1), StandardMoveOptions.SKIP_IF_DESTINATION_EXISTS); + verify(mockStorageRewrite, times(1)).execute(); + verify(mockStorageDelete, times(1)).execute(); + } + + @Test + public void testRenameSkipDestinationExistsDifferentBucket() throws IOException { + GcsOptions pipelineOptions = gcsOptionsWithTestCredential(); + GcsUtil gcsUtil = pipelineOptions.getGcsUtil(); + + Storage mockStorage = Mockito.mock(Storage.class); + gcsUtil.setStorageClient(mockStorage); + + assertThrows( + UnsupportedOperationException.class, + () -> + gcsUtil.rename( + Collections.singletonList("gs://bucket/source"), + Collections.singletonList("gs://different_bucket/dest"), + StandardMoveOptions.SKIP_IF_DESTINATION_EXISTS)); } @Test @@ -904,7 +1178,7 @@ public void testMakeRemoveBatches() throws IOException { GcsUtil gcsUtil = gcsOptionsWithTestCredential().getGcsUtil(); // Small number of files fits in 1 batch - List batches = gcsUtil.makeRemoveBatches(makeStrings("s", 3)); + List batches = gcsUtil.makeRemoveBatches(makeStrings("s", 3)); assertThat(batches.size(), equalTo(1)); assertThat(sumBatchSizes(batches), equalTo(3)); @@ -925,7 +1199,7 @@ public void testMakeGetBatches() throws IOException { // Small number of files fits in 1 batch List results = Lists.newArrayList(); - List batches = gcsUtil.makeGetBatches(makeGcsPaths("s", 3), results); + List batches = gcsUtil.makeGetBatches(makeGcsPaths("s", 3), results); assertThat(batches.size(), equalTo(1)); assertThat(sumBatchSizes(batches), equalTo(3)); assertEquals(3, results.size()); diff --git 
a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/LatencyRecordingHttpRequestInitializerTest.java b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/LatencyRecordingHttpRequestInitializerTest.java index 9ec8503f53a0..6e0b45df141d 100644 --- a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/LatencyRecordingHttpRequestInitializerTest.java +++ b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/LatencyRecordingHttpRequestInitializerTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.extensions.gcp.util; +import static org.hamcrest.MatcherAssert.assertThat; import static org.junit.Assert.assertNotNull; -import static org.junit.Assert.assertThat; import static org.mockito.ArgumentMatchers.anyDouble; import static org.mockito.ArgumentMatchers.anyString; import static org.mockito.Matchers.anyInt; @@ -37,8 +37,8 @@ import com.google.api.client.json.jackson2.JacksonFactory; import com.google.api.services.storage.Storage; import java.io.IOException; +import org.apache.beam.sdk.metrics.Histogram; import org.apache.beam.sdk.testing.ExpectedLogs; -import org.apache.beam.sdk.util.Histogram; import org.hamcrest.Matchers; import org.junit.After; import org.junit.Before; @@ -51,9 +51,6 @@ /** Tests for LatencyRecordingHttpRequestInitializer. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class LatencyRecordingHttpRequestInitializerTest { @Rule @@ -101,12 +98,12 @@ public void testOkResponse() throws IOException { HttpResponse response = result.executeUnparsed(); assertNotNull(response); - verify(mockHistogram, only()).record(anyDouble()); + verify(mockHistogram, only()).update(anyDouble()); verify(mockLowLevelRequest, atLeastOnce()).addHeader(anyString(), anyString()); verify(mockLowLevelRequest).setTimeout(anyInt(), anyInt()); verify(mockLowLevelRequest).setWriteTimeout(anyInt()); verify(mockLowLevelRequest).execute(); - verify(mockLowLevelResponse).getStatusCode(); + verify(mockLowLevelResponse, atLeastOnce()).getStatusCode(); } @Test @@ -122,11 +119,11 @@ public void testErrorResponse() throws IOException { assertThat(e.getMessage(), Matchers.containsString("403")); } - verify(mockHistogram, only()).record(anyDouble()); + verify(mockHistogram, only()).update(anyDouble()); verify(mockLowLevelRequest, atLeastOnce()).addHeader(anyString(), anyString()); verify(mockLowLevelRequest).setTimeout(anyInt(), anyInt()); verify(mockLowLevelRequest).setWriteTimeout(anyInt()); verify(mockLowLevelRequest).execute(); - verify(mockLowLevelResponse).getStatusCode(); + verify(mockLowLevelResponse, atLeastOnce()).getStatusCode(); } } diff --git a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/RetryHttpRequestInitializerTest.java b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/RetryHttpRequestInitializerTest.java index 673e809058c6..8bcca0491ea0 100644 --- a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/RetryHttpRequestInitializerTest.java +++ b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/RetryHttpRequestInitializerTest.java @@ -17,14 +17,16 @@ */ package org.apache.beam.sdk.extensions.gcp.util; +import 
static org.hamcrest.MatcherAssert.assertThat; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNotNull; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import static org.mockito.ArgumentMatchers.anyString; import static org.mockito.Matchers.any; import static org.mockito.Matchers.anyInt; import static org.mockito.Mockito.atLeastOnce; +import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.never; import static org.mockito.Mockito.times; import static org.mockito.Mockito.verify; import static org.mockito.Mockito.verifyNoMoreInteractions; @@ -40,14 +42,17 @@ import com.google.api.client.json.jackson2.JacksonFactory; import com.google.api.client.testing.http.MockHttpTransport; import com.google.api.client.testing.http.MockLowLevelHttpRequest; +import com.google.api.client.testing.http.MockLowLevelHttpResponse; import com.google.api.client.util.NanoClock; import com.google.api.services.storage.Storage; import com.google.api.services.storage.Storage.Objects.Get; import java.io.IOException; import java.net.SocketTimeoutException; import java.security.PrivateKey; +import java.util.ArrayList; import java.util.Arrays; import java.util.Collections; +import java.util.List; import java.util.concurrent.atomic.AtomicLong; import org.apache.beam.sdk.testing.ExpectedLogs; import org.hamcrest.Matchers; @@ -64,9 +69,6 @@ /** Tests for RetryHttpRequestInitializer. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class RetryHttpRequestInitializerTest { @Rule public ExpectedLogs expectedLogs = ExpectedLogs.none(RetryHttpRequestInitializer.class); @@ -93,6 +95,17 @@ public long nanoTime() { } } + MockLowLevelHttpResponse[] createMockResponseWithStatusCode(int... statusCodes) { + MockLowLevelHttpResponse[] responses = new MockLowLevelHttpResponse[statusCodes.length]; + + for (int i = 0; i < statusCodes.length; ++i) { + MockLowLevelHttpResponse response = mock(MockLowLevelHttpResponse.class); + when(response.getStatusCode()).thenReturn(statusCodes[i]); + responses[i] = response; + } + return responses; + } + @Before public void setUp() { MockitoAnnotations.initMocks(this); @@ -141,17 +154,19 @@ public void testBasicOperation() throws IOException { verify(mockLowLevelRequest).setTimeout(anyInt(), anyInt()); verify(mockLowLevelRequest).setWriteTimeout(anyInt()); verify(mockLowLevelRequest).execute(); - verify(mockLowLevelResponse).getStatusCode(); + verify(mockLowLevelResponse, atLeastOnce()).getStatusCode(); expectedLogs.verifyNotLogged("Request failed"); } /** Tests that a non-retriable error is not retried. */ @Test public void testErrorCodeForbidden() throws IOException { - when(mockLowLevelRequest.execute()).thenReturn(mockLowLevelResponse); - when(mockLowLevelResponse.getStatusCode()) - .thenReturn(403) // Non-retryable error. - .thenReturn(200); // Shouldn't happen. + MockLowLevelHttpResponse[] responses = + createMockResponseWithStatusCode( + 403, // Non-retryable error. + 200); // Shouldn't happen. 
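+ // Each status code gets its own mocked response here (rather than chaining thenReturn values
+ // on one shared mock) because the underlying HTTP client may read getStatusCode() from the
+ // same response more than once, which would otherwise consume the stubbed codes too early.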
+ + when(mockLowLevelRequest.execute()).thenReturn(responses[0], responses[1]); try { Storage.Buckets.Get result = storage.buckets().get("test"); @@ -166,21 +181,21 @@ public void testErrorCodeForbidden() throws IOException { verify(mockLowLevelRequest).setTimeout(anyInt(), anyInt()); verify(mockLowLevelRequest).setWriteTimeout(anyInt()); verify(mockLowLevelRequest).execute(); - verify(mockLowLevelResponse).getStatusCode(); + verify(responses[0], atLeastOnce()).getStatusCode(); + verify(responses[1], never()).getStatusCode(); expectedLogs.verifyWarn("Request failed with code 403"); } /** Tests that a retriable error is retried. */ @Test public void testRetryableError() throws IOException { + MockLowLevelHttpResponse[] mockResponses = + createMockResponseWithStatusCode( + 503, // Retryable + 429, // We also retry on 429 Too Many Requests. + 200); when(mockLowLevelRequest.execute()) - .thenReturn(mockLowLevelResponse) - .thenReturn(mockLowLevelResponse) - .thenReturn(mockLowLevelResponse); - when(mockLowLevelResponse.getStatusCode()) - .thenReturn(503) // Retryable - .thenReturn(429) // We also retry on 429 Too Many Requests. - .thenReturn(200); + .thenReturn(mockResponses[0], mockResponses[1], mockResponses[2]); Storage.Buckets.Get result = storage.buckets().get("test"); HttpResponse response = result.executeUnparsed(); @@ -191,7 +206,11 @@ public void testRetryableError() throws IOException { verify(mockLowLevelRequest, times(3)).setTimeout(anyInt(), anyInt()); verify(mockLowLevelRequest, times(3)).setWriteTimeout(anyInt()); verify(mockLowLevelRequest, times(3)).execute(); - verify(mockLowLevelResponse, times(3)).getStatusCode(); + + // It reads the status code of all responses + for (MockLowLevelHttpResponse mockResponse : mockResponses) { + verify(mockResponse, atLeastOnce()).getStatusCode(); + } expectedLogs.verifyDebug("Request failed with code 503"); } @@ -212,23 +231,30 @@ public void testThrowIOException() throws IOException { verify(mockLowLevelRequest, times(2)).setTimeout(anyInt(), anyInt()); verify(mockLowLevelRequest, times(2)).setWriteTimeout(anyInt()); verify(mockLowLevelRequest, times(2)).execute(); - verify(mockLowLevelResponse).getStatusCode(); + verify(mockLowLevelResponse, atLeastOnce()).getStatusCode(); expectedLogs.verifyDebug("Request failed with IOException"); } /** Tests that a retryable error is retried enough times. */ @Test public void testRetryableErrorRetryEnoughTimes() throws IOException { - when(mockLowLevelRequest.execute()).thenReturn(mockLowLevelResponse); + List responses = new ArrayList<>(); final int retries = 10; - when(mockLowLevelResponse.getStatusCode()) + + // The underlying http library calls getStatusCode method of a response multiple times. For a + // response, the method should return the same value. Therefore this test cannot rely on + // `mockLowLevelResponse` variable that are reused across responses. + when(mockLowLevelRequest.execute()) .thenAnswer( - new Answer() { + new Answer() { int n = 0; @Override - public Integer answer(InvocationOnMock invocation) { - return n++ < retries ? 503 : 9999; + public MockLowLevelHttpResponse answer(InvocationOnMock invocation) throws Throwable { + MockLowLevelHttpResponse response = mock(MockLowLevelHttpResponse.class); + responses.add(response); + when(response.getStatusCode()).thenReturn(n++ < retries ? 
503 : 9999); + return response; } }); @@ -244,7 +270,10 @@ public Integer answer(InvocationOnMock invocation) { verify(mockLowLevelRequest, times(retries + 1)).setTimeout(anyInt(), anyInt()); verify(mockLowLevelRequest, times(retries + 1)).setWriteTimeout(anyInt()); verify(mockLowLevelRequest, times(retries + 1)).execute(); - verify(mockLowLevelResponse, times(retries + 1)).getStatusCode(); + assertThat(responses, Matchers.hasSize(retries + 1)); + for (MockLowLevelHttpResponse response : responses) { + verify(response, atLeastOnce()).getStatusCode(); + } expectedLogs.verifyWarn("performed 10 retries due to unsuccessful status codes"); } diff --git a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/UploadIdResponseInterceptorTest.java b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/UploadIdResponseInterceptorTest.java index ff263bf58be5..630a08491480 100644 --- a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/UploadIdResponseInterceptorTest.java +++ b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/UploadIdResponseInterceptorTest.java @@ -32,9 +32,6 @@ /** A test for {@link org.apache.beam.sdk.extensions.gcp.util.UploadIdResponseInterceptor}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class UploadIdResponseInterceptorTest { @Rule public ExpectedException expectedException = ExpectedException.none(); diff --git a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/gcsfs/GcsPathTest.java b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/gcsfs/GcsPathTest.java index e7e77e3a983c..ea047b0c6ea5 100644 --- a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/gcsfs/GcsPathTest.java +++ b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/gcsfs/GcsPathTest.java @@ -17,11 +17,11 @@ */ package org.apache.beam.sdk.extensions.gcp.util.gcsfs; +import static org.hamcrest.MatcherAssert.assertThat; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertNotNull; import static org.junit.Assert.assertNull; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.net.URI; diff --git a/sdks/java/extensions/jackson/build.gradle b/sdks/java/extensions/jackson/build.gradle index b36343c37dbe..b9d218792cd9 100644 --- a/sdks/java/extensions/jackson/build.gradle +++ b/sdks/java/extensions/jackson/build.gradle @@ -29,8 +29,7 @@ dependencies { compile library.java.vendored_guava_26_0_jre compile project(path: ":sdks:java:core", configuration: "shadow") compile library.java.jackson_databind - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library + compile library.java.jackson_core testCompile library.java.junit testRuntimeOnly project(path: ":runners:direct-java", configuration: "shadow") } diff --git a/sdks/java/extensions/jackson/src/test/java/org/apache/beam/sdk/extensions/jackson/JacksonTransformsTest.java b/sdks/java/extensions/jackson/src/test/java/org/apache/beam/sdk/extensions/jackson/JacksonTransformsTest.java index f4e9c56576bc..2e1855c412a9 100644 --- 
a/sdks/java/extensions/jackson/src/test/java/org/apache/beam/sdk/extensions/jackson/JacksonTransformsTest.java +++ b/sdks/java/extensions/jackson/src/test/java/org/apache/beam/sdk/extensions/jackson/JacksonTransformsTest.java @@ -49,9 +49,6 @@ import org.junit.Test; /** Test Jackson transforms {@link ParseJsons} and {@link AsJsons}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class JacksonTransformsTest implements Serializable { private static final List VALID_JSONS = Arrays.asList("{\"myString\":\"abc\",\"myInt\":3}", "{\"myString\":\"def\",\"myInt\":4}"); diff --git a/sdks/java/extensions/join-library/build.gradle b/sdks/java/extensions/join-library/build.gradle index 1257f8f690c0..b8eca21f698b 100644 --- a/sdks/java/extensions/join-library/build.gradle +++ b/sdks/java/extensions/join-library/build.gradle @@ -17,15 +17,15 @@ */ plugins { id 'org.apache.beam.module' } -applyJavaNature(automaticModuleName: 'org.apache.beam.sdk.extensions.joinlibrary') +applyJavaNature( + automaticModuleName: 'org.apache.beam.sdk.extensions.joinlibrary' +) description = "Apache Beam :: SDKs :: Java :: Extensions :: Join library" dependencies { compile library.java.vendored_guava_26_0_jre compile project(path: ":sdks:java:core", configuration: "shadow") - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library testCompile library.java.junit testRuntimeOnly project(path: ":runners:direct-java", configuration: "shadow") } diff --git a/sdks/java/extensions/kryo/build.gradle b/sdks/java/extensions/kryo/build.gradle index 50a2ae54d685..cf1e297caebb 100644 --- a/sdks/java/extensions/kryo/build.gradle +++ b/sdks/java/extensions/kryo/build.gradle @@ -39,7 +39,10 @@ applyJavaNature(automaticModuleName: 'org.apache.beam.sdk.extensions.kryo', description = 'Apache Beam :: SDKs :: Java :: Extensions :: Kryo' dependencies { + compile library.java.jackson_annotations + compile library.java.vendored_guava_26_0_jre compile "com.esotericsoftware:kryo:${kryoVersion}" + compile "org.objenesis:objenesis:3.2" shadow project(path: ':sdks:java:core', configuration: 'shadow') testCompile project(path: ':sdks:java:core', configuration: 'shadowTest') testRuntimeOnly project(path: ':runners:direct-java', configuration: 'shadow') diff --git a/sdks/java/extensions/ml/build.gradle b/sdks/java/extensions/ml/build.gradle index d7ab4caabb5e..a8acfe750e48 100644 --- a/sdks/java/extensions/ml/build.gradle +++ b/sdks/java/extensions/ml/build.gradle @@ -21,25 +21,44 @@ import groovy.json.JsonOutput */ plugins { id 'org.apache.beam.module' } -applyJavaNature(automaticModuleName: 'org.apache.beam.sdk.extensions.ml') +applyJavaNature( + automaticModuleName: 'org.apache.beam.sdk.extensions.ml' +) description = 'Apache Beam :: SDKs :: Java :: Extensions :: ML' dependencies { compile project(path: ":sdks:java:core", configuration: "shadow") compile project(":sdks:java:expansion-service") + permitUnusedDeclared project(":sdks:java:expansion-service") // BEAM-11761 + compile library.java.google_http_client + compile 'com.google.cloud:google-cloud-recommendations-ai:0.3.7' compile 'com.google.cloud:google-cloud-video-intelligence:1.2.0' compile 'com.google.cloud:google-cloud-dlp:1.1.4' compile 'com.google.cloud:google-cloud-language:1.99.4' + compile library.java.protobuf_java_util + compile group: 'org.json', name: 'json', version: '20201115' + compile 'com.google.api.grpc:proto-google-cloud-dlp-v2:1.1.4' + compile 
'com.google.api.grpc:proto-google-cloud-language-v1:1.81.4' + compile 'com.google.api.grpc:proto-google-cloud-video-intelligence-v1:1.2.0' + compile 'com.google.api.grpc:proto-google-cloud-vision-v1:1.81.3' + compile 'com.google.api.grpc:proto-google-cloud-recommendations-ai-v1beta1:0.3.7' + compile library.java.joda_time + compile library.java.auto_value_annotations + compile library.java.gax + compile library.java.protobuf_java + compile library.java.slf4j_api provided library.java.junit testCompile project(path: ':sdks:java:core', configuration: 'shadowTest') compile 'com.google.cloud:google-cloud-vision:1.99.3' testCompile library.java.mockito_core + testCompile library.java.google_http_client + testCompile library.java.protobuf_java_util + testCompile group: 'org.json', name: 'json', version: '20201115' + testCompile 'com.google.cloud:google-cloud-recommendations-ai:0.3.7' testCompile 'com.google.cloud:google-cloud-video-intelligence:1.2.0' testCompile 'com.google.cloud:google-cloud-dlp:1.1.4' testCompile project(path: ":sdks:java:extensions:google-cloud-platform-core", configuration: "testRuntime") - testCompile 'com.google.cloud:google-cloud-language:1.99.4' - testCompile 'com.google.cloud:google-cloud-vision:1.99.3' testRuntimeOnly project(path: ":runners:direct-java", configuration: "shadow") testRuntimeOnly project(":runners:google-cloud-dataflow-java") } diff --git a/sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/RecommendationAICreateCatalogItem.java b/sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/RecommendationAICreateCatalogItem.java new file mode 100644 index 000000000000..55c5abadf092 --- /dev/null +++ b/sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/RecommendationAICreateCatalogItem.java @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.ml; + +import com.google.api.client.json.GenericJson; +import com.google.api.gax.rpc.ApiException; +import com.google.auto.value.AutoValue; +import com.google.cloud.recommendationengine.v1beta1.CatalogItem; +import com.google.cloud.recommendationengine.v1beta1.CatalogName; +import com.google.cloud.recommendationengine.v1beta1.CatalogServiceClient; +import com.google.protobuf.util.JsonFormat; +import java.io.IOException; +import javax.annotation.Nullable; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.TupleTag; +import org.apache.beam.sdk.values.TupleTagList; +import org.json.JSONObject; + +/** + * A {@link PTransform} using the Recommendations AI API (https://cloud.google.com/recommendations). + * Takes an input {@link PCollection} of {@link GenericJson}s and converts them to and creates + * {@link CatalogItem}s. It outputs a PCollectionTuple which will contain the successfully created + * and failed catalog items. + * + *
<p>
    It is possible to provide a catalog name to which you want to add the catalog item (defaults + * to "default_catalog"). + */ +@AutoValue +@SuppressWarnings({"nullness"}) +public abstract class RecommendationAICreateCatalogItem + extends PTransform, PCollectionTuple> { + + /** @return ID of Google Cloud project to be used for creating catalog items. */ + public abstract @Nullable String projectId(); + + /** + * @return Name of the catalog where the catalog items will be created (defaults to + * "default_catalog"). + */ + public abstract @Nullable String catalogName(); + + public static final TupleTag SUCCESS_TAG = new TupleTag() {}; + + public static final TupleTag FAILURE_TAG = new TupleTag() {}; + + abstract Builder toBuilder(); + + @AutoValue.Builder + abstract static class Builder { + /** @param projectId ID of Google Cloud project to be used for creating catalog items. */ + public abstract Builder setProjectId(@Nullable String projectId); + + /** @param catalogName Name of the catalog where the catalog items will be created. */ + public abstract Builder setCatalogName(@Nullable String catalogName); + + public abstract RecommendationAICreateCatalogItem build(); + } + + static Builder newBuilder() { + return new AutoValue_RecommendationAICreateCatalogItem.Builder() + .setCatalogName("default_catalog"); + } + + public RecommendationAICreateCatalogItem withProjectId(String projectId) { + return this.toBuilder().setProjectId(projectId).build(); + } + + public RecommendationAICreateCatalogItem withCatalogName(String catalogName) { + return this.toBuilder().setCatalogName(catalogName).build(); + } + + /** + * The transform converts the contents of input PCollection into {@link CatalogItem}s and then + * calls the Recommendation AI service to create the catalog item. + * + * @param input input PCollection + * @return PCollectionTuple with successful and failed {@link CatalogItem}s + */ + @Override + public PCollectionTuple expand(PCollection input) { + return input.apply( + ParDo.of(new CreateCatalogItem(projectId(), catalogName())) + .withOutputTags(SUCCESS_TAG, TupleTagList.of(FAILURE_TAG))); + } + + private static class CreateCatalogItem extends DoFn { + private final String projectId; + private final String catalogName; + + /** + * @param projectId ID of GCP project to be used for creating catalog items. + * @param catalogName Catalog name for CatalogItem creation. 
+ */ + private CreateCatalogItem(String projectId, String catalogName) { + this.projectId = projectId; + this.catalogName = catalogName; + } + + @ProcessElement + public void processElement(ProcessContext context) throws IOException { + CatalogName parent = CatalogName.of(projectId, "global", catalogName); + CatalogItem.Builder catalogItemBuilder = CatalogItem.newBuilder(); + JsonFormat.parser().merge((new JSONObject(context.element())).toString(), catalogItemBuilder); + CatalogItem catalogItem = catalogItemBuilder.build(); + + try (CatalogServiceClient catalogServiceClient = CatalogServiceClient.create()) { + CatalogItem response = catalogServiceClient.createCatalogItem(parent, catalogItem); + + context.output(SUCCESS_TAG, response); + } catch (ApiException e) { + context.output(FAILURE_TAG, catalogItem); + } + } + } +} diff --git a/sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/RecommendationAIIO.java b/sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/RecommendationAIIO.java new file mode 100644 index 000000000000..48ba7d6e9ffc --- /dev/null +++ b/sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/RecommendationAIIO.java @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.ml; + +/** + * The RecommendationAIIO class acts as a wrapper around the {@link PTransform}s that interact with + * the Recommendation AI API (https://cloud.google.com/recommendations). + * + *
<p>
    More information can be found on: - Writing catalog items using {@link + * RecommendationAICreateCatalogItem} - Importing catalog items using {@link + * RecommendationAIImportCatalogItems} - Writing user events using {@link + * RecommendationAIWriteUserEvent} - Importing user events using {@link + * RecommendationAIImportUserEvents} - Making predictions using {@link RecommendationAIPredict} + */ +public class RecommendationAIIO { + + public static RecommendationAICreateCatalogItem createCatalogItems() { + return RecommendationAICreateCatalogItem.newBuilder().build(); + } + + public static RecommendationAIImportCatalogItems importCatalogItems() { + return RecommendationAIImportCatalogItems.newBuilder().build(); + } + + public static RecommendationAIWriteUserEvent writeUserEvent() { + return RecommendationAIWriteUserEvent.newBuilder().build(); + } + + public static RecommendationAIImportUserEvents importUserEvents() { + return RecommendationAIImportUserEvents.newBuilder().build(); + } + + public static RecommendationAIPredict predictAll() { + return RecommendationAIPredict.newBuilder().build(); + } +} diff --git a/sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/RecommendationAIImportCatalogItems.java b/sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/RecommendationAIImportCatalogItems.java new file mode 100644 index 000000000000..1bea407a5fef --- /dev/null +++ b/sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/RecommendationAIImportCatalogItems.java @@ -0,0 +1,200 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.ml; + +import com.google.api.client.json.GenericJson; +import com.google.api.gax.rpc.ApiException; +import com.google.auto.value.AutoValue; +import com.google.cloud.recommendationengine.v1beta1.CatalogInlineSource; +import com.google.cloud.recommendationengine.v1beta1.CatalogItem; +import com.google.cloud.recommendationengine.v1beta1.CatalogName; +import com.google.cloud.recommendationengine.v1beta1.CatalogServiceClient; +import com.google.cloud.recommendationengine.v1beta1.ImportCatalogItemsRequest; +import com.google.cloud.recommendationengine.v1beta1.ImportCatalogItemsResponse; +import com.google.cloud.recommendationengine.v1beta1.InputConfig; +import com.google.protobuf.util.JsonFormat; +import java.io.IOException; +import java.util.ArrayList; +import java.util.concurrent.ExecutionException; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.GroupIntoBatches; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.util.ShardedKey; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.TupleTag; +import org.apache.beam.sdk.values.TupleTagList; +import org.checkerframework.checker.nullness.qual.Nullable; +import org.joda.time.Duration; +import org.json.JSONObject; + +/** + * A {@link PTransform} connecting to the Recommendations AI API + * (https://cloud.google.com/recommendations) and creating {@link CatalogItem}s. * + * + *
<p>
    Batch size defines how many items are created at once per batch (max: 5000). + * + *
<p>
    The transform consumes {@link KV} of {@link String} and {@link GenericJson}s (assumed to be + * the catalog item id as key and contents as value) and outputs a PCollectionTuple which will + * contain the successfully created and failed catalog items. + * + *
<p>
    It is possible to provide a catalog name to which you want to add the catalog item (defaults + * to "default_catalog"). + */ +@AutoValue +@SuppressWarnings({"nullness"}) +public abstract class RecommendationAIImportCatalogItems + extends PTransform>, PCollectionTuple> { + + public static final TupleTag SUCCESS_TAG = new TupleTag() {}; + public static final TupleTag FAILURE_TAG = new TupleTag() {}; + + static Builder newBuilder() { + return new AutoValue_RecommendationAIImportCatalogItems.Builder(); + } + + abstract Builder toBuilder(); + + /** @return ID of Google Cloud project to be used for creating catalog items. */ + public abstract @Nullable String projectId(); + + /** @return Name of the catalog where the catalog items will be created. */ + public abstract @Nullable String catalogName(); + + /** @return Size of input elements batch to be sent in one request. */ + public abstract Integer batchSize(); + + /** + * @return Time limit (in processing time) on how long an incomplete batch of elements is allowed + * to be buffered. + */ + public abstract Duration maxBufferingDuration(); + + public RecommendationAIImportCatalogItems withProjectId(String projectId) { + return this.toBuilder().setProjectId(projectId).build(); + } + + public RecommendationAIImportCatalogItems withCatalogName(String catalogName) { + return this.toBuilder().setCatalogName(catalogName).build(); + } + + public RecommendationAIImportCatalogItems withBatchSize(Integer batchSize) { + return this.toBuilder().setBatchSize(batchSize).build(); + } + + /** + * The transform converts the contents of input PCollection into {@link CatalogItem}s and then + * calls the Recommendation AI service to create the catalog item. + * + * @param input input PCollection + * @return PCollection after transformations + */ + @Override + public PCollectionTuple expand(PCollection> input) { + return input + .apply( + "Batch Contents", + GroupIntoBatches.ofSize(batchSize()) + .withMaxBufferingDuration(maxBufferingDuration()) + .withShardedKey()) + .apply( + "Import CatalogItems", + ParDo.of(new ImportCatalogItems(projectId(), catalogName())) + .withOutputTags(SUCCESS_TAG, TupleTagList.of(FAILURE_TAG))); + } + + @AutoValue.Builder + abstract static class Builder { + /** @param projectId ID of Google Cloud project to be used for creating catalog items. */ + public abstract Builder setProjectId(@Nullable String projectId); + + /** @param catalogName Name of the catalog where the catalog items will be created. */ + public abstract Builder setCatalogName(@Nullable String catalogName); + + /** + * @param batchSize Amount of input elements to be sent to Recommendation AI service in one + * request. + */ + public abstract Builder setBatchSize(Integer batchSize); + + /** + * @param maxBufferingDuration Time limit (in processing time) on how long an incomplete batch + * of elements is allowed to be buffered. + */ + public abstract Builder setMaxBufferingDuration(Duration maxBufferingDuration); + + public abstract RecommendationAIImportCatalogItems build(); + } + + private static class ImportCatalogItems + extends DoFn, Iterable>, CatalogItem> { + private final String projectId; + private final String catalogName; + + /** + * @param projectId ID of GCP project to be used for creating catalog items. + * @param catalogName Catalog name for CatalogItem creation. 
+ */ + private ImportCatalogItems(String projectId, String catalogName) { + this.projectId = projectId; + this.catalogName = catalogName; + } + + @ProcessElement + public void processElement(ProcessContext c) + throws IOException, ExecutionException, InterruptedException { + CatalogName parent = CatalogName.of(projectId, "global", catalogName); + + ArrayList catalogItems = new ArrayList<>(); + for (GenericJson element : c.element().getValue()) { + CatalogItem.Builder catalogItemBuilder = CatalogItem.newBuilder(); + JsonFormat.parser().merge((new JSONObject(element)).toString(), catalogItemBuilder); + catalogItems.add(catalogItemBuilder.build()); + } + CatalogInlineSource catalogInlineSource = + CatalogInlineSource.newBuilder().addAllCatalogItems(catalogItems).build(); + + InputConfig inputConfig = + InputConfig.newBuilder().mergeCatalogInlineSource(catalogInlineSource).build(); + ImportCatalogItemsRequest request = + ImportCatalogItemsRequest.newBuilder() + .setParent(parent.toString()) + .setInputConfig(inputConfig) + .build(); + try (CatalogServiceClient catalogServiceClient = CatalogServiceClient.create()) { + ImportCatalogItemsResponse response = + catalogServiceClient.importCatalogItemsAsync(request).get(); + if (response.getErrorSamplesCount() > 0) { + for (CatalogItem ci : catalogItems) { + c.output(FAILURE_TAG, ci); + } + } else { + for (CatalogItem ci : catalogItems) { + c.output(SUCCESS_TAG, ci); + } + } + } catch (ApiException e) { + for (CatalogItem ci : catalogItems) { + c.output(SUCCESS_TAG, ci); + } + } + } + } +} diff --git a/sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/RecommendationAIImportUserEvents.java b/sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/RecommendationAIImportUserEvents.java new file mode 100644 index 000000000000..762c379f610c --- /dev/null +++ b/sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/RecommendationAIImportUserEvents.java @@ -0,0 +1,216 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.ml; + +import com.google.api.client.json.GenericJson; +import com.google.api.gax.rpc.ApiException; +import com.google.auto.value.AutoValue; +import com.google.cloud.recommendationengine.v1beta1.EventStoreName; +import com.google.cloud.recommendationengine.v1beta1.ImportUserEventsRequest; +import com.google.cloud.recommendationengine.v1beta1.ImportUserEventsResponse; +import com.google.cloud.recommendationengine.v1beta1.InputConfig; +import com.google.cloud.recommendationengine.v1beta1.UserEvent; +import com.google.cloud.recommendationengine.v1beta1.UserEventInlineSource; +import com.google.cloud.recommendationengine.v1beta1.UserEventServiceClient; +import com.google.protobuf.util.JsonFormat; +import java.io.IOException; +import java.util.ArrayList; +import java.util.concurrent.ExecutionException; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.GroupIntoBatches; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.util.ShardedKey; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.TupleTag; +import org.apache.beam.sdk.values.TupleTagList; +import org.checkerframework.checker.nullness.qual.Nullable; +import org.joda.time.Duration; +import org.json.JSONObject; + +/** + * A {@link PTransform} connecting to the Recommendations AI API + * (https://cloud.google.com/recommendations) and creating {@link UserEvent}s. * + * + *

    Batch size defines how many items are sent to the API in a single import request (max: 5000). + * + *

    The transform consumes {@link KV} of {@link String} and {@link GenericJson}s (assumed to be + * the user event id as key and contents as value) and outputs a PCollectionTuple which will contain + * the successfully created and failed user events. + * + *
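+ * <p>A minimal usage sketch (illustrative only; the project id, batch size, and buffering
+ * duration below are example values, and the entry point mirrors the integration test added in
+ * this change):
+ *
+ * <pre>{@code
+ * PCollection<KV<String, GenericJson>> userEvents = ...;
+ * PCollectionTuple result =
+ *     userEvents.apply(
+ *         RecommendationAIImportUserEvents.newBuilder()
+ *             .setProjectId("my-project")
+ *             .setBatchSize(5000)
+ *             .setMaxBufferingDuration(Duration.standardSeconds(10))
+ *             .build());
+ * result.get(RecommendationAIImportUserEvents.SUCCESS_TAG); // imported user events
+ * result.get(RecommendationAIImportUserEvents.FAILURE_TAG); // user events that failed to import
+ * }</pre>
+ *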

    It is possible to provide a catalog name to which you want to add the catalog item (defaults + * to "default_catalog"). It is possible to provide a event store to which you want to add the user + * event (defaults to "default_event_store"). + */ +@AutoValue +@SuppressWarnings({"nullness"}) +public abstract class RecommendationAIImportUserEvents + extends PTransform>, PCollectionTuple> { + + public static final TupleTag SUCCESS_TAG = new TupleTag() {}; + public static final TupleTag FAILURE_TAG = new TupleTag() {}; + + static Builder newBuilder() { + return new AutoValue_RecommendationAIImportUserEvents.Builder() + .setCatalogName("default_catalog") + .setEventStore("default_event_store"); + } + + abstract Builder toBuilder(); + + /** @return ID of Google Cloud project to be used for creating user events. */ + public abstract @Nullable String projectId(); + + /** @return Name of the catalog where the user events will be created. */ + public abstract @Nullable String catalogName(); + + /** @return Name of the event store where the user events will be created. */ + public abstract @Nullable String eventStore(); + + /** @return Size of input elements batch to be sent in one request. */ + public abstract Integer batchSize(); + + /** + * @return Time limit (in processing time) on how long an incomplete batch of elements is allowed + * to be buffered. + */ + public abstract Duration maxBufferingDuration(); + + public RecommendationAIImportUserEvents withProjectId(String projectId) { + return this.toBuilder().setProjectId(projectId).build(); + } + + public RecommendationAIImportUserEvents withCatalogName(String catalogName) { + return this.toBuilder().setCatalogName(catalogName).build(); + } + + public RecommendationAIImportUserEvents withEventStore(String eventStore) { + return this.toBuilder().setEventStore(eventStore).build(); + } + + public RecommendationAIImportUserEvents withBatchSize(Integer batchSize) { + return this.toBuilder().setBatchSize(batchSize).build(); + } + + /** + * The transform converts the contents of input PCollection into {@link UserEvent}s and then calls + * the Recommendation AI service to create the user event. + * + * @param input input PCollection + * @return PCollection after transformations + */ + @Override + public PCollectionTuple expand(PCollection> input) { + return input + .apply( + "Batch Contents", + GroupIntoBatches.ofSize(batchSize()) + .withMaxBufferingDuration(maxBufferingDuration()) + .withShardedKey()) + .apply( + "Import CatalogItems", + ParDo.of(new ImportUserEvents(projectId(), catalogName(), eventStore())) + .withOutputTags(SUCCESS_TAG, TupleTagList.of(FAILURE_TAG))); + } + + @AutoValue.Builder + abstract static class Builder { + /** @param projectId ID of Google Cloud project to be used for creating user events. */ + public abstract Builder setProjectId(@Nullable String projectId); + + /** @param catalogName Name of the catalog where the user events will be created. */ + public abstract Builder setCatalogName(@Nullable String catalogName); + + /** @param eventStore Name of the event store where the user events will be created. */ + public abstract Builder setEventStore(@Nullable String eventStore); + + /** + * @param batchSize Amount of input elements to be sent to Recommendation AI service in one + * request. + */ + public abstract Builder setBatchSize(Integer batchSize); + + /** + * @param maxBufferingDuration Time limit (in processing time) on how long an incomplete batch + * of elements is allowed to be buffered. 
+ */ + public abstract Builder setMaxBufferingDuration(Duration maxBufferingDuration); + + public abstract RecommendationAIImportUserEvents build(); + } + + private static class ImportUserEvents + extends DoFn, Iterable>, UserEvent> { + private final String projectId; + private final String catalogName; + private final String eventStore; + + /** + * @param projectId ID of GCP project to be used for creating user events. + * @param catalogName Catalog name for UserEvent creation. + * @param eventStore Event store name for UserEvent creation. + */ + private ImportUserEvents(String projectId, String catalogName, String eventStore) { + this.projectId = projectId; + this.catalogName = catalogName; + this.eventStore = eventStore; + } + + @ProcessElement + public void processElement(ProcessContext c) + throws IOException, ExecutionException, InterruptedException { + EventStoreName parent = EventStoreName.of(projectId, "global", catalogName, eventStore); + + ArrayList userEvents = new ArrayList<>(); + for (GenericJson element : c.element().getValue()) { + UserEvent.Builder userEventBuilder = UserEvent.newBuilder(); + JsonFormat.parser().merge((new JSONObject(element)).toString(), userEventBuilder); + userEvents.add(userEventBuilder.build()); + } + UserEventInlineSource userEventInlineSource = + UserEventInlineSource.newBuilder().addAllUserEvents(userEvents).build(); + + InputConfig inputConfig = + InputConfig.newBuilder().mergeUserEventInlineSource(userEventInlineSource).build(); + ImportUserEventsRequest request = + ImportUserEventsRequest.newBuilder() + .setParent(parent.toString()) + .setInputConfig(inputConfig) + .build(); + try (UserEventServiceClient userEventServiceClient = UserEventServiceClient.create()) { + ImportUserEventsResponse response = + userEventServiceClient.importUserEventsAsync(request).get(); + if (response.getErrorSamplesCount() > 0) { + for (UserEvent ci : userEvents) { + c.output(FAILURE_TAG, ci); + } + } else { + for (UserEvent ci : userEvents) { + c.output(SUCCESS_TAG, ci); + } + } + } catch (ApiException e) { + for (UserEvent ci : userEvents) { + c.output(FAILURE_TAG, ci); + } + } + } + } +} diff --git a/sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/RecommendationAIPredict.java b/sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/RecommendationAIPredict.java new file mode 100644 index 000000000000..64aea84f8a3b --- /dev/null +++ b/sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/RecommendationAIPredict.java @@ -0,0 +1,156 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.ml; + +import com.google.api.client.json.GenericJson; +import com.google.api.gax.rpc.ApiException; +import com.google.auto.value.AutoValue; +import com.google.cloud.recommendationengine.v1beta1.PlacementName; +import com.google.cloud.recommendationengine.v1beta1.PredictResponse; +import com.google.cloud.recommendationengine.v1beta1.PredictionServiceClient; +import com.google.cloud.recommendationengine.v1beta1.UserEvent; +import com.google.protobuf.util.JsonFormat; +import java.io.IOException; +import javax.annotation.Nullable; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.TupleTag; +import org.apache.beam.sdk.values.TupleTagList; +import org.json.JSONObject; + +/** + * A {@link PTransform} using the Recommendations AI API (https://cloud.google.com/recommendations). + * Takes an input {@link PCollection} of {@link GenericJson}s and creates {@link + * PredictResponse.PredictionResult}s. + * + *
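+ * <p>A minimal usage sketch (illustrative only; it follows the integration test added in this
+ * change and uses example project and placement ids):
+ *
+ * <pre>{@code
+ * PCollection<GenericJson> userEvents = ...;
+ * PCollectionTuple result =
+ *     userEvents.apply(
+ *         RecommendationAIIO.predictAll()
+ *             .withProjectId("my-project")
+ *             .withPlacementId("recently_viewed_default"));
+ * result.get(RecommendationAIPredict.SUCCESS_TAG); // PredictionResults returned by the service
+ * result.get(RecommendationAIPredict.FAILURE_TAG); // UserEvents for which prediction failed
+ * }</pre>
+ *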

    It is possible to provide a catalog name to which you want to add the user event (defaults to + * "default_catalog"). It is possible to provide a event store to which you want to add the user + * event (defaults to "default_event_store"). A placement id for the recommendation engine placement + * to be used. + */ +@AutoValue +@SuppressWarnings({"nullness"}) +public abstract class RecommendationAIPredict + extends PTransform, PCollectionTuple> { + + /** @return ID of Google Cloud project to be used for creating catalog items. */ + public abstract @Nullable String projectId(); + + /** @return Name of the catalog where the catalog items will be created. */ + public abstract @Nullable String catalogName(); + + /** @return Name of the event store where the user events will be created. */ + public abstract @Nullable String eventStore(); + + /** @return ID of the recommendation engine placement. */ + public abstract String placementId(); + + public static final TupleTag SUCCESS_TAG = + new TupleTag() {}; + + public static final TupleTag FAILURE_TAG = new TupleTag() {}; + + @AutoValue.Builder + abstract static class Builder { + /** @param projectId ID of Google Cloud project to be used for the predictions. */ + public abstract Builder setProjectId(@Nullable String projectId); + + /** @param catalogName Name of the catalog to be used for predictions. */ + public abstract Builder setCatalogName(@Nullable String catalogName); + + /** @param eventStore Name of the event store to be used for predictions. */ + public abstract Builder setEventStore(@Nullable String eventStore); + + /** @param placementId of the recommendation engine placement. */ + public abstract Builder setPlacementId(String placementId); + + public abstract RecommendationAIPredict build(); + } + + static Builder newBuilder() { + return new AutoValue_RecommendationAIPredict.Builder() + .setCatalogName("default_catalog") + .setEventStore("default_event_store") + .setPlacementId("recently_viewed_default"); + } + + abstract Builder toBuilder(); + + public RecommendationAIPredict withProjectId(String projectId) { + return this.toBuilder().setProjectId(projectId).build(); + } + + public RecommendationAIPredict withCatalogName(String catalogName) { + return this.toBuilder().setCatalogName(catalogName).build(); + } + + public RecommendationAIPredict withEventStore(String eventStore) { + return this.toBuilder().setEventStore(eventStore).build(); + } + + public RecommendationAIPredict withPlacementId(String placementId) { + return this.toBuilder().setPlacementId(placementId).build(); + } + + @Override + public PCollectionTuple expand(PCollection input) { + return input.apply( + ParDo.of(new Predict(projectId(), catalogName(), eventStore(), placementId())) + .withOutputTags(SUCCESS_TAG, TupleTagList.of(FAILURE_TAG))); + } + + private static class Predict extends DoFn { + private final String projectId; + private final String catalogName; + private final String eventStore; + private final String placementId; + + /** + * @param projectId ID of GCP project to be used for creating catalog items. + * @param catalogName Catalog name for UserEvent creation. + * @param eventStore Event store for UserEvent creation. + * @param placementId ID of the recommendation engine placement. 
+ */ + private Predict(String projectId, String catalogName, String eventStore, String placementId) { + this.projectId = projectId; + this.catalogName = catalogName; + this.eventStore = eventStore; + this.placementId = placementId; + } + + @ProcessElement + public void processElement(ProcessContext context) throws IOException { + PlacementName name = + PlacementName.of(projectId, "global", catalogName, eventStore, placementId); + UserEvent.Builder userEventBuilder = UserEvent.newBuilder(); + JsonFormat.parser().merge((new JSONObject(context.element())).toString(), userEventBuilder); + UserEvent userEvent = userEventBuilder.build(); + try (PredictionServiceClient predictionServiceClient = PredictionServiceClient.create()) { + for (PredictResponse.PredictionResult res : + predictionServiceClient.predict(name, userEvent).iterateAll()) { + context.output(SUCCESS_TAG, res); + } + } catch (ApiException e) { + context.output(FAILURE_TAG, userEvent); + } + } + } +} diff --git a/sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/RecommendationAIWriteUserEvent.java b/sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/RecommendationAIWriteUserEvent.java new file mode 100644 index 000000000000..ba1f2d8c8fd1 --- /dev/null +++ b/sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/RecommendationAIWriteUserEvent.java @@ -0,0 +1,146 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.ml; + +import com.google.api.client.json.GenericJson; +import com.google.api.gax.rpc.ApiException; +import com.google.auto.value.AutoValue; +import com.google.cloud.recommendationengine.v1beta1.EventStoreName; +import com.google.cloud.recommendationengine.v1beta1.UserEvent; +import com.google.cloud.recommendationengine.v1beta1.UserEventServiceClient; +import com.google.protobuf.util.JsonFormat; +import java.io.IOException; +import java.util.concurrent.ExecutionException; +import javax.annotation.Nullable; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.TupleTag; +import org.apache.beam.sdk.values.TupleTagList; +import org.json.JSONObject; + +/** + * A {@link PTransform} using the Recommendations AI API (https://cloud.google.com/recommendations). + * Takes an input {@link PCollection} of {@link GenericJson}s and converts them to and creates + * {@link UserEvent}s. + * + *
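+ * <p>A minimal usage sketch (illustrative only; it follows the integration test added in this
+ * change and uses an example project id):
+ *
+ * <pre>{@code
+ * PCollection<GenericJson> userEvents = ...;
+ * PCollectionTuple result =
+ *     userEvents.apply(RecommendationAIIO.writeUserEvent().withProjectId("my-project"));
+ * result.get(RecommendationAIWriteUserEvent.SUCCESS_TAG); // UserEvents written to the event store
+ * result.get(RecommendationAIWriteUserEvent.FAILURE_TAG); // UserEvents that failed to write
+ * }</pre>
+ *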

    It is possible to provide a catalog name to which you want to add the user event (defaults to + * "default_catalog"). It is possible to provide a event store to which you want to add the user + * event (defaults to "default_event_store"). + */ +@AutoValue +@SuppressWarnings({"nullness"}) +public abstract class RecommendationAIWriteUserEvent + extends PTransform, PCollectionTuple> { + + public static final TupleTag SUCCESS_TAG = new TupleTag() {}; + public static final TupleTag FAILURE_TAG = new TupleTag() {}; + + static Builder newBuilder() { + return new AutoValue_RecommendationAIWriteUserEvent.Builder() + .setCatalogName("default_catalog") + .setEventStore("default_event_store"); + } + + /** @return ID of Google Cloud project to be used for creating user events. */ + public abstract @Nullable String projectId(); + + /** @return Name of the catalog where the user events will be created. */ + public abstract @Nullable String catalogName(); + + /** @return Name of the event store where the user events will be created. */ + public abstract @Nullable String eventStore(); + + /** + * The transform converts the contents of input PCollection into {@link UserEvent}s and then calls + * the Recommendation AI service to create the user event. + * + * @param input input PCollection + * @return PCollectionTuple with successful and failed {@link UserEvent}s + */ + @Override + public PCollectionTuple expand(PCollection input) { + return input.apply( + ParDo.of(new WriteUserEvent(projectId(), catalogName(), eventStore())) + .withOutputTags(SUCCESS_TAG, TupleTagList.of(FAILURE_TAG))); + } + + @AutoValue.Builder + abstract static class Builder { + /** @param projectId ID of Google Cloud project to be used for creating user events. */ + public abstract Builder setProjectId(@Nullable String projectId); + + /** @param catalogName Name of the catalog where the user events will be created. */ + public abstract Builder setCatalogName(@Nullable String catalogName); + + /** @param eventStore Name of the event store where the user events will be created. */ + public abstract Builder setEventStore(@Nullable String eventStore); + + public abstract RecommendationAIWriteUserEvent build(); + } + + abstract Builder toBuilder(); + + public RecommendationAIWriteUserEvent withProjectId(String projectId) { + return this.toBuilder().setProjectId(projectId).build(); + } + + public RecommendationAIWriteUserEvent withCatalogName(String catalogName) { + return this.toBuilder().setCatalogName(catalogName).build(); + } + + public RecommendationAIWriteUserEvent withEventStore(String eventStore) { + return this.toBuilder().setEventStore(eventStore).build(); + } + + private static class WriteUserEvent extends DoFn { + private final String projectId; + private final String catalogName; + private final String eventStore; + + /** + * @param projectId ID of GCP project to be used for creating user events. + * @param catalogName Catalog name for UserEvent creation. + * @param eventStore Event store for UserEvent creation. 
+ */ + private WriteUserEvent(String projectId, String catalogName, String eventStore) { + this.projectId = projectId; + this.catalogName = catalogName; + this.eventStore = eventStore; + } + + @ProcessElement + public void processElement(ProcessContext context) + throws IOException, ExecutionException, InterruptedException { + EventStoreName parent = EventStoreName.of(projectId, "global", catalogName, eventStore); + UserEvent.Builder userEventBuilder = UserEvent.newBuilder(); + JsonFormat.parser().merge((new JSONObject(context.element())).toString(), userEventBuilder); + UserEvent userEvent = userEventBuilder.build(); + + try (UserEventServiceClient userEventServiceClient = UserEventServiceClient.create()) { + UserEvent response = userEventServiceClient.writeUserEvent(parent, userEvent); + + context.output(SUCCESS_TAG, response); + } catch (ApiException e) { + context.output(FAILURE_TAG, userEvent); + } + } + } +} diff --git a/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/AnnotateImagesTest.java b/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/AnnotateImagesTest.java index cc7721095c94..341c97e7d586 100644 --- a/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/AnnotateImagesTest.java +++ b/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/AnnotateImagesTest.java @@ -32,9 +32,6 @@ import org.mockito.junit.MockitoJUnitRunner; @RunWith(MockitoJUnitRunner.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class AnnotateImagesTest { @Mock private ImageAnnotatorClient imageAnnotatorClient; @Mock private BatchAnnotateImagesResponse response; diff --git a/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/AnnotateVideoTest.java b/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/AnnotateVideoTest.java index 0439cc75550d..56473aae1ae2 100644 --- a/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/AnnotateVideoTest.java +++ b/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/AnnotateVideoTest.java @@ -37,9 +37,6 @@ import org.mockito.junit.MockitoJUnitRunner; @RunWith(MockitoJUnitRunner.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class AnnotateVideoTest { private static final String TEST_URI = "fake_uri"; diff --git a/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/BatchRequestForDlpTest.java b/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/BatchRequestForDlpTest.java index a1dea4e0ee7c..3e9a74178929 100644 --- a/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/BatchRequestForDlpTest.java +++ b/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/BatchRequestForDlpTest.java @@ -35,9 +35,6 @@ import org.junit.runners.JUnit4; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BatchRequestForDlpTest { @Rule public TestPipeline testPipeline = TestPipeline.create(); diff --git a/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/CloudVisionIT.java b/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/CloudVisionIT.java index 964d0d1912d5..5ee72981409c 100644 --- a/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/CloudVisionIT.java +++ 
b/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/CloudVisionIT.java @@ -33,9 +33,6 @@ import org.junit.runners.JUnit4; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class CloudVisionIT { @Rule public TestPipeline testPipeline = TestPipeline.create(); private static final String TEST_IMAGE_URI = "gs://cloud-samples-data/vision/label/setagaya.jpeg"; diff --git a/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/CloudVisionTest.java b/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/CloudVisionTest.java index 402c8dbd96da..27f68b46e583 100644 --- a/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/CloudVisionTest.java +++ b/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/CloudVisionTest.java @@ -30,9 +30,6 @@ import org.junit.runners.JUnit4; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class CloudVisionTest { private static final String TEST_URI = "test_uri"; diff --git a/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/DelegatingAtomicCoder.java b/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/DelegatingAtomicCoder.java new file mode 100644 index 000000000000..2efdc043f309 --- /dev/null +++ b/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/DelegatingAtomicCoder.java @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.ml; + +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import org.apache.beam.sdk.coders.AtomicCoder; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.CoderException; + +public abstract class DelegatingAtomicCoder extends AtomicCoder { + + private final Coder delegate; + + protected DelegatingAtomicCoder(Coder delegate) { + this.delegate = delegate; + } + + @Override + public final X decode(InputStream inStream) throws CoderException, IOException { + return from(delegate.decode(inStream)); + } + + @Override + public final void encode(X value, OutputStream outStream) throws CoderException, IOException { + delegate.encode(to(value), outStream); + } + + protected abstract X from(W object) throws CoderException, IOException; + + @Override + public void verifyDeterministic() throws NonDeterministicException { + delegate.verifyDeterministic(); + } + + protected abstract W to(X object) throws CoderException, IOException; +} diff --git a/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/GenericJsonCoder.java b/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/GenericJsonCoder.java new file mode 100644 index 000000000000..a37f6b6e1c00 --- /dev/null +++ b/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/GenericJsonCoder.java @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.ml; + +import com.fasterxml.jackson.annotation.JsonCreator; +import com.fasterxml.jackson.annotation.JsonProperty; +import com.google.api.client.googleapis.util.Utils; +import com.google.api.client.json.GenericJson; +import com.google.api.client.json.JsonFactory; +import java.io.IOException; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.StringUtf8Coder; + +/** + * Can be used as a coder for any object that extends GenericJson. This includes all objects in the + * Google Genomics Java client library. 
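+ *
+ * <p>A usage sketch as applied in the Recommendation AI integration tests added in this change
+ * (the pipeline variable and input list are illustrative):
+ *
+ * <pre>{@code
+ * PCollection<GenericJson> items =
+ *     pipeline.apply(
+ *         Create.of(jsonElements).withCoder(GenericJsonCoder.of(GenericJson.class)));
+ * }</pre>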
+ */ +public class GenericJsonCoder extends DelegatingAtomicCoder { + + private static final JsonFactory JSON_FACTORY = Utils.getDefaultJsonFactory(); + private static final Coder STRING_CODER = StringUtf8Coder.of(); + private final Class type; + + private GenericJsonCoder(Class type) { + super(STRING_CODER); + this.type = type; + } + + public static GenericJsonCoder of(Class type) { + return new GenericJsonCoder<>(type); + } + + @JsonCreator + @SuppressWarnings("unchecked") + public static GenericJsonCoder of(@JsonProperty("type") String type) + throws ClassNotFoundException { + return of((Class) Class.forName(type)); + } + + @Override + protected T from(String object) throws IOException { + return JSON_FACTORY.fromString(object, type); + } + + @Override + protected String to(T object) throws IOException { + return JSON_FACTORY.toString(object); + } +} diff --git a/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/MapStringToDlpRowTest.java b/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/MapStringToDlpRowTest.java index 588d2f03a99f..577a5dc7ed34 100644 --- a/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/MapStringToDlpRowTest.java +++ b/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/MapStringToDlpRowTest.java @@ -31,9 +31,6 @@ import org.junit.runners.JUnit4; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class MapStringToDlpRowTest { @Rule public TestPipeline testPipeline = TestPipeline.create(); diff --git a/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/RecommendationAICatalogItemIT.java b/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/RecommendationAICatalogItemIT.java new file mode 100644 index 000000000000..069e7d245526 --- /dev/null +++ b/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/RecommendationAICatalogItemIT.java @@ -0,0 +1,120 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.ml; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertTrue; + +import com.google.api.client.json.GenericJson; +import com.google.cloud.recommendationengine.v1beta1.CatalogItem; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.List; +import java.util.Random; +import org.apache.beam.sdk.extensions.gcp.options.GcpOptions; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.junit.Ignore; +import org.junit.Rule; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +@RunWith(JUnit4.class) +public class RecommendationAICatalogItemIT { + @Rule public TestPipeline testPipeline = TestPipeline.create(); + + private static GenericJson getCatalogItem() { + List categories = new ArrayList(); + categories.add(new GenericJson().set("categories", Arrays.asList("Electronics", "Computers"))); + categories.add(new GenericJson().set("categories", Arrays.asList("Laptops"))); + return new GenericJson() + .set("id", Integer.toString(new Random().nextInt())) + .set("title", "Sample Laptop") + .set("description", "Indisputably the most fantastic laptop ever created.") + .set("categoryHierarchies", categories) + .set("languageCode", "en"); + } + + @Ignore("https://issues.apache.org/jira/browse/BEAM-12733") + @Test + public void createCatalogItem() { + String projectId = testPipeline.getOptions().as(GcpOptions.class).getProject(); + GenericJson catalogItem = getCatalogItem(); + + PCollectionTuple createCatalogItemResult = + testPipeline + .apply( + Create.of(Arrays.asList(catalogItem)) + .withCoder(GenericJsonCoder.of(GenericJson.class))) + .apply(RecommendationAIIO.createCatalogItems().withProjectId(projectId)); + PAssert.that(createCatalogItemResult.get(RecommendationAICreateCatalogItem.SUCCESS_TAG)) + .satisfies(new VerifyCatalogItemResult(1, (String) catalogItem.get("id"))); + testPipeline.run().waitUntilFinish(); + } + + @Ignore("Import method causing issues") + @Test + public void importCatalogItems() { + String projectId = testPipeline.getOptions().as(GcpOptions.class).getProject(); + ArrayList> catalogItems = new ArrayList<>(); + + GenericJson catalogItem1 = getCatalogItem(); + GenericJson catalogItem2 = getCatalogItem(); + + catalogItems.add(KV.of(Integer.toString(new Random().nextInt()), catalogItem1)); + catalogItems.add(KV.of(Integer.toString(new Random().nextInt()), catalogItem2)); + + PCollectionTuple importCatalogItemResult = + testPipeline + .apply(Create.of(catalogItems)) + .apply(RecommendationAIImportCatalogItems.newBuilder().setProjectId(projectId).build()); + PAssert.that(importCatalogItemResult.get(RecommendationAIImportCatalogItems.SUCCESS_TAG)) + .satisfies(new VerifyCatalogItemResult(2, (String) catalogItem1.get("id"))); + testPipeline.run().waitUntilFinish(); + } + + private static class VerifyCatalogItemResult + implements SerializableFunction, Void> { + + String catalogItemId; + int size; + + private VerifyCatalogItemResult(int size, String catalogItemId) { + this.size = size; + this.catalogItemId = catalogItemId; + } + + @Override + public Void apply(Iterable input) { + List matches = new ArrayList<>(); + input.forEach( + item -> { + CatalogItem result = item; + matches.add(result.getId()); + 
}); + assertTrue(matches.contains(this.catalogItemId)); + assertEquals(size, matches.size()); + return null; + } + } +} diff --git a/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/RecommendationAIPredictIT.java b/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/RecommendationAIPredictIT.java new file mode 100644 index 000000000000..a8b8d752474d --- /dev/null +++ b/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/RecommendationAIPredictIT.java @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.ml; + +import static org.junit.Assert.assertTrue; + +import com.google.api.client.json.GenericJson; +import com.google.cloud.recommendationengine.v1beta1.PredictResponse; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.List; +import org.apache.beam.sdk.extensions.gcp.options.GcpOptions; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.junit.Ignore; +import org.junit.Rule; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +@RunWith(JUnit4.class) +public class RecommendationAIPredictIT { + @Rule public TestPipeline testPipeline = TestPipeline.create(); + + public static GenericJson getUserEvent() { + GenericJson userInfo = new GenericJson().set("visitorId", "1"); + return new GenericJson().set("eventType", "home-page-view").set("userInfo", userInfo); + } + + @Ignore("https://issues.apache.org/jira/browse/BEAM-12733") + @Test + public void predict() { + String projectId = testPipeline.getOptions().as(GcpOptions.class).getProject(); + + PCollectionTuple predictResult = + testPipeline + .apply( + Create.of(Arrays.asList(getUserEvent())) + .withCoder(GenericJsonCoder.of(GenericJson.class))) + .apply( + RecommendationAIIO.predictAll() + .withProjectId(projectId) + .withPlacementId("recently_viewed_default")); + PAssert.that(predictResult.get(RecommendationAIPredict.SUCCESS_TAG)) + .satisfies(new VerifyPredictResult()); + testPipeline.run().waitUntilFinish(); + } + + private static class VerifyPredictResult + implements SerializableFunction, Void> { + + @Override + public Void apply(Iterable input) { + List matches = new ArrayList<>(); + input.forEach( + item -> { + PredictResponse.PredictionResult result = item; + matches.add(result); + }); + assertTrue(!matches.isEmpty()); + return null; + } + } +} diff --git a/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/RecommendationAIUserEventIT.java 
b/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/RecommendationAIUserEventIT.java new file mode 100644 index 000000000000..3be60a89176e --- /dev/null +++ b/sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/RecommendationAIUserEventIT.java @@ -0,0 +1,111 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.ml; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertTrue; + +import com.google.api.client.json.GenericJson; +import com.google.cloud.recommendationengine.v1beta1.UserEvent; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.List; +import org.apache.beam.sdk.extensions.gcp.options.GcpOptions; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.junit.Ignore; +import org.junit.Rule; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +@RunWith(JUnit4.class) +public class RecommendationAIUserEventIT { + @Rule public TestPipeline testPipeline = TestPipeline.create(); + + public static GenericJson getUserEvent() { + GenericJson userInfo = new GenericJson().set("visitorId", "1"); + GenericJson productDetail = new GenericJson().set("id", "1").set("quantity", 1); + ArrayList productDetails = new ArrayList<>(); + productDetails.add(productDetail); + GenericJson productEventDetail = new GenericJson().set("productDetails", productDetails); + return new GenericJson() + .set("eventType", "detail-page-view") + .set("userInfo", userInfo) + .set("productEventDetail", productEventDetail); + } + + @Test + public void createUserEvent() { + String projectId = testPipeline.getOptions().as(GcpOptions.class).getProject(); + + PCollectionTuple createUserEventResult = + testPipeline + .apply( + Create.of(Arrays.asList(getUserEvent())) + .withCoder(GenericJsonCoder.of(GenericJson.class))) + .apply(RecommendationAIIO.writeUserEvent().withProjectId(projectId)); + PAssert.that(createUserEventResult.get(RecommendationAIWriteUserEvent.SUCCESS_TAG)) + .satisfies(new VerifyUserEventResult(1)); + testPipeline.run().waitUntilFinish(); + } + + @Ignore("Import method causing issues") + @Test + public void importUserEvents() { + String projectId = testPipeline.getOptions().as(GcpOptions.class).getProject(); + ArrayList> userEvents = new ArrayList<>(); + userEvents.add(KV.of("123", getUserEvent())); + userEvents.add(KV.of("123", getUserEvent())); + + PCollectionTuple importUserEventResult = + testPipeline + .apply(Create.of(userEvents)) + 
.apply(RecommendationAIImportUserEvents.newBuilder().setProjectId(projectId).build()); + PAssert.that(importUserEventResult.get(RecommendationAIWriteUserEvent.SUCCESS_TAG)) + .satisfies(new VerifyUserEventResult(2)); + testPipeline.run().waitUntilFinish(); + } + + private static class VerifyUserEventResult + implements SerializableFunction, Void> { + + int size; + + private VerifyUserEventResult(int size) { + this.size = size; + } + + @Override + public Void apply(Iterable input) { + List matches = new ArrayList<>(); + input.forEach( + item -> { + UserEvent result = item; + matches.add(result.getUserInfo().getVisitorId()); + }); + assertTrue(matches.contains(((GenericJson) getUserEvent().get("userInfo")).get("visitorId"))); + assertEquals(size, matches.size()); + return null; + } + } +} diff --git a/sdks/java/extensions/protobuf/build.gradle b/sdks/java/extensions/protobuf/build.gradle index 74882d3452ce..873ce45bb8cb 100644 --- a/sdks/java/extensions/protobuf/build.gradle +++ b/sdks/java/extensions/protobuf/build.gradle @@ -19,6 +19,7 @@ plugins { id 'org.apache.beam.module' } applyJavaNature( generatedClassPatterns: [ + /^org\.apache\.beam\.sdk\.extensions\.protobuf\.PayloadMessages/, /^org\.apache\.beam\.sdk\.extensions\.protobuf\.Proto2CoderTestMessages/, /^org\.apache\.beam\.sdk\.extensions\.protobuf\.Proto2SchemaMessages/, /^org\.apache\.beam\.sdk\.extensions\.protobuf\.Proto3SchemaMessages/, @@ -32,13 +33,11 @@ description = "Apache Beam :: SDKs :: Java :: Extensions :: Protobuf" ext.summary = "Add support to Apache Beam for Google Protobuf." dependencies { + compile library.java.vendored_bytebuddy_1_11_0 compile library.java.vendored_guava_26_0_jre compile project(path: ":sdks:java:core", configuration: "shadow") compile library.java.protobuf_java testCompile project(path: ":sdks:java:core", configuration: "shadowTest") - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library - testCompile library.java.mockito_core testCompile library.java.junit testRuntimeOnly library.java.slf4j_jdk14 } diff --git a/sdks/java/extensions/protobuf/src/main/java/org/apache/beam/sdk/extensions/protobuf/ProtoByteBuddyUtils.java b/sdks/java/extensions/protobuf/src/main/java/org/apache/beam/sdk/extensions/protobuf/ProtoByteBuddyUtils.java index c58414e4385c..7b2ebf7a5ad4 100644 --- a/sdks/java/extensions/protobuf/src/main/java/org/apache/beam/sdk/extensions/protobuf/ProtoByteBuddyUtils.java +++ b/sdks/java/extensions/protobuf/src/main/java/org/apache/beam/sdk/extensions/protobuf/ProtoByteBuddyUtils.java @@ -72,37 +72,37 @@ import org.apache.beam.sdk.util.common.ReflectHelpers; import org.apache.beam.sdk.values.Row; import org.apache.beam.sdk.values.TypeDescriptor; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.ByteBuddy; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.asm.AsmVisitorWrapper; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.method.MethodDescription; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.method.MethodDescription.ForLoadedMethod; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.modifier.FieldManifestation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.modifier.Visibility; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.type.TypeDescription; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.description.type.TypeDescription.ForLoadedType; -import 
org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.DynamicType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.loading.ClassLoadingStrategy; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.dynamic.scaffold.InstrumentedType; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.FixedValue; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.Implementation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.ByteCodeAppender; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.ByteCodeAppender.Size; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.Duplication; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.StackManipulation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.StackManipulation.Compound; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.assign.Assigner; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.assign.Assigner.Typing; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.assign.TypeCasting; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.collection.ArrayAccess; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.constant.IntegerConstant; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.constant.NullConstant; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.FieldAccess; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodInvocation; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodReturn; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.implementation.bytecode.member.MethodVariableAccess; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.jar.asm.ClassWriter; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.jar.asm.Label; -import org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy.matcher.ElementMatchers; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.ByteBuddy; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.asm.AsmVisitorWrapper; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.method.MethodDescription; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.method.MethodDescription.ForLoadedMethod; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.modifier.FieldManifestation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.modifier.Visibility; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.type.TypeDescription; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.description.type.TypeDescription.ForLoadedType; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.DynamicType; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.loading.ClassLoadingStrategy; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.dynamic.scaffold.InstrumentedType; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.FixedValue; +import 
org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.Implementation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.ByteCodeAppender; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.ByteCodeAppender.Size; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.Duplication; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.StackManipulation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.StackManipulation.Compound; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.assign.Assigner; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.assign.Assigner.Typing; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.assign.TypeCasting; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.collection.ArrayAccess; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.constant.IntegerConstant; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.constant.NullConstant; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.FieldAccess; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodInvocation; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodReturn; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.implementation.bytecode.member.MethodVariableAccess; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.jar.asm.ClassWriter; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.jar.asm.Label; +import org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy.matcher.ElementMatchers; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.CaseFormat; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; diff --git a/sdks/java/extensions/protobuf/src/main/java/org/apache/beam/sdk/extensions/protobuf/ProtoMessageSchema.java b/sdks/java/extensions/protobuf/src/main/java/org/apache/beam/sdk/extensions/protobuf/ProtoMessageSchema.java index 2fbe98a56102..c4b77382a51c 100644 --- a/sdks/java/extensions/protobuf/src/main/java/org/apache/beam/sdk/extensions/protobuf/ProtoMessageSchema.java +++ b/sdks/java/extensions/protobuf/src/main/java/org/apache/beam/sdk/extensions/protobuf/ProtoMessageSchema.java @@ -22,7 +22,6 @@ import com.google.protobuf.DynamicMessage; import com.google.protobuf.Message; -import java.io.IOException; import java.lang.reflect.Method; import java.util.List; import java.util.Map; @@ -32,6 +31,7 @@ import org.apache.beam.sdk.schemas.FieldValueGetter; import org.apache.beam.sdk.schemas.FieldValueTypeInformation; import org.apache.beam.sdk.schemas.GetterBasedSchemaProvider; +import org.apache.beam.sdk.schemas.RowMessages; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.schemas.Schema.Field; import org.apache.beam.sdk.schemas.SchemaUserTypeCreator; @@ -39,7 +39,6 @@ import org.apache.beam.sdk.schemas.utils.FieldValueTypeSupplier; import org.apache.beam.sdk.schemas.utils.JavaBeanUtils; import org.apache.beam.sdk.schemas.utils.ReflectUtils; -import 
org.apache.beam.sdk.transforms.SerializableFunction; import org.apache.beam.sdk.transforms.SimpleFunction; import org.apache.beam.sdk.values.Row; import org.apache.beam.sdk.values.TypeDescriptor; @@ -130,28 +129,12 @@ public SchemaUserTypeCreator schemaTypeCreator(Class targetClass, Schema sche "unchecked" }) public static SimpleFunction getProtoBytesToRowFn(Class clazz) { - checkForMessageType(clazz); - return new ProtoBytesToRowFn(clazz); - } - - private static class ProtoBytesToRowFn extends SimpleFunction { - private final ProtoCoder protoCoder; - private final SerializableFunction toRowFunction; - - public ProtoBytesToRowFn(Class clazz) { - this.protoCoder = ProtoCoder.of(clazz); - this.toRowFunction = new ProtoMessageSchema().toRowFunction(TypeDescriptor.of(clazz)); - } - - @Override - public Row apply(byte[] bytes) { - try { - T message = protoCoder.getParser().parseFrom(bytes); - return toRowFunction.apply(message); - } catch (IOException e) { - throw new IllegalArgumentException("Could not decode row from proto payload.", e); - } - } + Class protoClass = ensureMessageType(clazz); + ProtoCoder protoCoder = ProtoCoder.of(protoClass); + return RowMessages.bytesToRowFn( + new ProtoMessageSchema(), + TypeDescriptor.of(protoClass), + bytes -> protoCoder.getParser().parseFrom(bytes)); } // Other modules are not allowed to use non-vendored Message class @@ -160,37 +143,9 @@ public Row apply(byte[] bytes) { "unchecked" }) public static SimpleFunction getRowToProtoBytesFn(Class clazz) { - checkForMessageType(clazz); - return new RowToProtoBytesFn(clazz); - } - - private static class RowToProtoBytesFn extends SimpleFunction { - private final SerializableFunction toMessageFunction; - private final Schema protoSchema; - - public RowToProtoBytesFn(Class clazz) { - ProtoMessageSchema messageSchema = new ProtoMessageSchema(); - TypeDescriptor typeDescriptor = TypeDescriptor.of(clazz); - this.toMessageFunction = messageSchema.fromRowFunction(typeDescriptor); - this.protoSchema = messageSchema.schemaFor(typeDescriptor); - } - - @Override - public byte[] apply(Row row) { - if (!protoSchema.equivalent(row.getSchema())) { - row = switchFieldsOrder(row); - } - Message message = toMessageFunction.apply(row); - return message.toByteArray(); - } - - private Row switchFieldsOrder(Row row) { - Row.Builder convertedRow = Row.withSchema(protoSchema); - protoSchema - .getFields() - .forEach(field -> convertedRow.addValue(row.getValue(field.getName()))); - return convertedRow.build(); - } + Class protoClass = ensureMessageType(clazz); + return RowMessages.rowToBytesFn( + new ProtoMessageSchema(), TypeDescriptor.of(protoClass), Message::toByteArray); } private void checkForDynamicType(TypeDescriptor typeDescriptor) { @@ -200,11 +155,12 @@ private void checkForDynamicType(TypeDescriptor typeDescriptor) { } } - private static void checkForMessageType(Class clazz) { + private static Class ensureMessageType(Class clazz) { checkArgument( Message.class.isAssignableFrom(clazz), "%s is not a subtype of %s", clazz.getName(), Message.class.getSimpleName()); + return (Class) clazz; } } diff --git a/sdks/java/extensions/protobuf/src/main/java/org/apache/beam/sdk/extensions/protobuf/ProtoPayloadSerializerProvider.java b/sdks/java/extensions/protobuf/src/main/java/org/apache/beam/sdk/extensions/protobuf/ProtoPayloadSerializerProvider.java new file mode 100644 index 000000000000..0b18c6e99b11 --- /dev/null +++ 
b/sdks/java/extensions/protobuf/src/main/java/org/apache/beam/sdk/extensions/protobuf/ProtoPayloadSerializerProvider.java @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.protobuf; + +import static org.apache.beam.sdk.schemas.transforms.Cast.castRow; +import static org.apache.beam.sdk.util.Preconditions.checkArgumentNotNull; + +import com.google.auto.service.AutoService; +import com.google.protobuf.Message; +import java.util.Map; +import javax.annotation.Nonnull; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; +import org.apache.beam.sdk.annotations.Internal; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.io.payloads.PayloadSerializer; +import org.apache.beam.sdk.schemas.io.payloads.PayloadSerializerProvider; +import org.apache.beam.sdk.transforms.SimpleFunction; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.sdk.values.TypeDescriptor; + +@Internal +@Experimental(Kind.SCHEMAS) +@AutoService(PayloadSerializerProvider.class) +public class ProtoPayloadSerializerProvider implements PayloadSerializerProvider { + @Override + public String identifier() { + return "proto"; + } + + private static Class getClass(Map tableParams) { + String protoClassName = checkArgumentNotNull(tableParams.get("protoClass")).toString(); + try { + Class protoClass = Class.forName(protoClassName); + return protoClass.asSubclass(Message.class); + } catch (ClassNotFoundException e) { + throw new IllegalArgumentException("Incorrect proto class provided: " + protoClassName, e); + } + } + + private static void inferAndVerifySchema( + Class protoClass, Schema requiredSchema) { + @Nonnull + Schema inferredSchema = + checkArgumentNotNull(new ProtoMessageSchema().schemaFor(TypeDescriptor.of(protoClass))); + if (!inferredSchema.assignableTo(requiredSchema)) { + throw new IllegalArgumentException( + String.format( + "Given message schema: '%s'%n" + + "does not match schema inferred from protobuf class.%n" + + "Protobuf class: '%s'%n" + + "Inferred schema: '%s'", + requiredSchema, protoClass.getName(), inferredSchema)); + } + } + + @Override + public PayloadSerializer getSerializer(Schema schema, Map tableParams) { + Class protoClass = getClass(tableParams); + inferAndVerifySchema(protoClass, schema); + SimpleFunction toRowFn = ProtoMessageSchema.getProtoBytesToRowFn(protoClass); + return PayloadSerializer.of( + ProtoMessageSchema.getRowToProtoBytesFn(protoClass), + bytes -> { + Row rawRow = toRowFn.apply(bytes); + return castRow(rawRow, rawRow.getSchema(), schema); + }); + } +} diff --git a/sdks/java/extensions/protobuf/src/test/java/org/apache/beam/sdk/extensions/protobuf/ByteStringCoderTest.java 
b/sdks/java/extensions/protobuf/src/test/java/org/apache/beam/sdk/extensions/protobuf/ByteStringCoderTest.java index 05d838039778..776b380b7094 100644 --- a/sdks/java/extensions/protobuf/src/test/java/org/apache/beam/sdk/extensions/protobuf/ByteStringCoderTest.java +++ b/sdks/java/extensions/protobuf/src/test/java/org/apache/beam/sdk/extensions/protobuf/ByteStringCoderTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.sdk.extensions.protobuf; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import com.google.protobuf.ByteString; @@ -41,9 +41,6 @@ /** Test case for {@link ByteStringCoder}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ByteStringCoderTest { private static final ByteStringCoder TEST_CODER = ByteStringCoder.of(); diff --git a/sdks/java/extensions/protobuf/src/test/java/org/apache/beam/sdk/extensions/protobuf/DynamicProtoCoderTest.java b/sdks/java/extensions/protobuf/src/test/java/org/apache/beam/sdk/extensions/protobuf/DynamicProtoCoderTest.java index f8a819773829..10395832b36a 100644 --- a/sdks/java/extensions/protobuf/src/test/java/org/apache/beam/sdk/extensions/protobuf/DynamicProtoCoderTest.java +++ b/sdks/java/extensions/protobuf/src/test/java/org/apache/beam/sdk/extensions/protobuf/DynamicProtoCoderTest.java @@ -34,9 +34,6 @@ /** Tests for {@link ProtoCoder}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DynamicProtoCoderTest { @Rule public ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/extensions/protobuf/src/test/java/org/apache/beam/sdk/extensions/protobuf/ProtoCoderTest.java b/sdks/java/extensions/protobuf/src/test/java/org/apache/beam/sdk/extensions/protobuf/ProtoCoderTest.java index 4580af304382..38aa92bfc223 100644 --- a/sdks/java/extensions/protobuf/src/test/java/org/apache/beam/sdk/extensions/protobuf/ProtoCoderTest.java +++ b/sdks/java/extensions/protobuf/src/test/java/org/apache/beam/sdk/extensions/protobuf/ProtoCoderTest.java @@ -43,9 +43,6 @@ /** Tests for {@link ProtoCoder}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ProtoCoderTest { @Rule public ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/extensions/protobuf/src/test/java/org/apache/beam/sdk/extensions/protobuf/ProtoMessageSchemaTest.java b/sdks/java/extensions/protobuf/src/test/java/org/apache/beam/sdk/extensions/protobuf/ProtoMessageSchemaTest.java index 2b8674f3e1ca..480ea1d1f858 100644 --- a/sdks/java/extensions/protobuf/src/test/java/org/apache/beam/sdk/extensions/protobuf/ProtoMessageSchemaTest.java +++ b/sdks/java/extensions/protobuf/src/test/java/org/apache/beam/sdk/extensions/protobuf/ProtoMessageSchemaTest.java @@ -82,9 +82,6 @@ import org.junit.runners.JUnit4; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ProtoMessageSchemaTest { @Test diff --git a/sdks/java/extensions/protobuf/src/test/java/org/apache/beam/sdk/extensions/protobuf/ProtoPayloadSerializerProviderTest.java b/sdks/java/extensions/protobuf/src/test/java/org/apache/beam/sdk/extensions/protobuf/ProtoPayloadSerializerProviderTest.java new file mode 100644 index 000000000000..69a31f338653 --- /dev/null +++ b/sdks/java/extensions/protobuf/src/test/java/org/apache/beam/sdk/extensions/protobuf/ProtoPayloadSerializerProviderTest.java @@ -0,0 +1,110 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.protobuf; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertThrows; + +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +@RunWith(JUnit4.class) +public class ProtoPayloadSerializerProviderTest { + private static final Schema SHUFFLED_SCHEMA = + Schema.builder() + .addStringField("f_string") + .addInt32Field("f_int") + .addArrayField("f_float_array", Schema.FieldType.FLOAT) + .addDoubleField("f_double") + .addInt64Field("f_long") + .build(); + private static final Row ROW = + Row.withSchema(SHUFFLED_SCHEMA) + .withFieldValue("f_string", "string") + .withFieldValue("f_int", 123) + .withFieldValue("f_float_array", ImmutableList.of(8.0f)) + .withFieldValue("f_double", 9.0) + .withFieldValue("f_long", 456L) + .build(); + private static final PayloadMessages.TestMessage MESSAGE = + PayloadMessages.TestMessage.newBuilder() + .setFLong(456) + .setFInt(123) + .setFDouble(9.0) + .setFString("string") + .addFFloatArray(8.0f) + .build(); + + private final ProtoPayloadSerializerProvider provider = new ProtoPayloadSerializerProvider(); + + @Test + public void invalidArgs() { + assertThrows( + IllegalArgumentException.class, + () -> provider.getSerializer(SHUFFLED_SCHEMA, ImmutableMap.of())); + assertThrows( + IllegalArgumentException.class, + () -> provider.getSerializer(SHUFFLED_SCHEMA, ImmutableMap.of("protoClass", ""))); + assertThrows( + ClassCastException.class, + () -> + provider.getSerializer( + SHUFFLED_SCHEMA, ImmutableMap.of("protoClass", ImmutableList.class.getName()))); + assertThrows( + IllegalArgumentException.class, + () -> + provider.getSerializer( + Schema.builder() + .addStringField("f_NOTACTUALLYINMESSAGE") + .addInt32Field("f_int") + .addArrayField("f_float_array", FieldType.FLOAT) + .addDoubleField("f_double") + .addInt64Field("f_long") + .build(), + ImmutableMap.of("protoClass", PayloadMessages.TestMessage.class.getName()))); + } + + @Test + public void serialize() throws Exception { + byte[] bytes = + provider + .getSerializer( + SHUFFLED_SCHEMA, + ImmutableMap.of("protoClass", PayloadMessages.TestMessage.class.getName())) + .serialize(ROW); + PayloadMessages.TestMessage result = PayloadMessages.TestMessage.parseFrom(bytes); + assertEquals(MESSAGE, result); + } + + @Test + public void deserialize() { + Row row = + provider + .getSerializer( + SHUFFLED_SCHEMA, + ImmutableMap.of("protoClass", PayloadMessages.TestMessage.class.getName())) + .deserialize(MESSAGE.toByteArray()); + assertEquals(ROW, row); + } +} diff --git a/sdks/java/extensions/protobuf/src/test/java/org/apache/beam/sdk/extensions/protobuf/ProtoSchemaTranslatorTest.java b/sdks/java/extensions/protobuf/src/test/java/org/apache/beam/sdk/extensions/protobuf/ProtoSchemaTranslatorTest.java index e9ae3b1a1885..9d473bf1f35b 100644 --- a/sdks/java/extensions/protobuf/src/test/java/org/apache/beam/sdk/extensions/protobuf/ProtoSchemaTranslatorTest.java +++ b/sdks/java/extensions/protobuf/src/test/java/org/apache/beam/sdk/extensions/protobuf/ProtoSchemaTranslatorTest.java @@ -32,7 +32,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) 
- "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class ProtoSchemaTranslatorTest { @Test diff --git a/sdks/java/extensions/protobuf/src/test/java/org/apache/beam/sdk/extensions/protobuf/ProtobufUtilTest.java b/sdks/java/extensions/protobuf/src/test/java/org/apache/beam/sdk/extensions/protobuf/ProtobufUtilTest.java index ac2971bde6d7..f4680ba71cb9 100644 --- a/sdks/java/extensions/protobuf/src/test/java/org/apache/beam/sdk/extensions/protobuf/ProtobufUtilTest.java +++ b/sdks/java/extensions/protobuf/src/test/java/org/apache/beam/sdk/extensions/protobuf/ProtobufUtilTest.java @@ -20,8 +20,8 @@ import static org.apache.beam.sdk.extensions.protobuf.ProtobufUtil.checkProto2Syntax; import static org.apache.beam.sdk.extensions.protobuf.ProtobufUtil.getRecursiveDescriptorsForClass; import static org.apache.beam.sdk.extensions.protobuf.ProtobufUtil.verifyDeterministic; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import com.google.protobuf.Any; import com.google.protobuf.Descriptors.GenericDescriptor; diff --git a/sdks/java/extensions/sql/src/test/proto/kafka/kafka_messages.proto b/sdks/java/extensions/protobuf/src/test/proto/payload_messages.proto similarity index 74% rename from sdks/java/extensions/sql/src/test/proto/kafka/kafka_messages.proto rename to sdks/java/extensions/protobuf/src/test/proto/payload_messages.proto index 3cafec12f617..cecd9d34b73f 100644 --- a/sdks/java/extensions/sql/src/test/proto/kafka/kafka_messages.proto +++ b/sdks/java/extensions/protobuf/src/test/proto/payload_messages.proto @@ -22,7 +22,7 @@ syntax = "proto3"; -option java_package = "org.apache.beam.sdk.extensions.sql.meta.provider.kafka"; +option java_package = "org.apache.beam.sdk.extensions.protobuf"; message TestMessage { int64 f_long = 1; @@ -42,3 +42,26 @@ message SimpleMessage { int32 id = 1; string name = 2; } + +message NameMessage { + string name = 1; + + enum NameType { + FIRST = 0; + MIDDLE = 1; + LAST = 2; + SECOND_LAST = 3; + } + repeated NameType name_array = 2; +} + +message NameHeightMessage { + string name = 1; + int32 height = 2; +} + +message NameHeightKnowsJSMessage { + string name = 1; + int32 height = 2; + bool knows_javascript = 3; +} diff --git a/sdks/java/extensions/schemaio-expansion-service/build.gradle b/sdks/java/extensions/schemaio-expansion-service/build.gradle index 4129db687b42..7691d479c201 100644 --- a/sdks/java/extensions/schemaio-expansion-service/build.gradle +++ b/sdks/java/extensions/schemaio-expansion-service/build.gradle @@ -31,11 +31,15 @@ applyJavaNature( dependencies { compile project(path: ":sdks:java:expansion-service") + permitUnusedDeclared project(path: ":sdks:java:expansion-service") // BEAM-11761 compile project(":sdks:java:io:jdbc") + permitUnusedDeclared project(":sdks:java:io:jdbc") // BEAM-11761 compile library.java.postgres + permitUnusedDeclared library.java.postgres // BEAM-11761 + compile project(path: ":model:pipeline", configuration: "shadow") + compile project(path: ":sdks:java:core", configuration: "shadow") + compile library.java.vendored_grpc_1_36_0 + compile library.java.vendored_guava_26_0_jre testCompile library.java.junit - testCompile library.java.powermock_mockito testCompile library.java.mockito_core - // TODO(BEAM-10632): remove this dependency - testCompile "org.checkerframework:checker-qual:3.5.0" -} \ No newline at end of file +} diff --git 
a/sdks/java/extensions/schemaio-expansion-service/src/main/java/org/apache/beam/sdk/extensions/schemaio/expansion/ExternalSchemaIOTransformRegistrar.java b/sdks/java/extensions/schemaio-expansion-service/src/main/java/org/apache/beam/sdk/extensions/schemaio/expansion/ExternalSchemaIOTransformRegistrar.java index c120164891a1..4f3ab302d38a 100644 --- a/sdks/java/extensions/schemaio-expansion-service/src/main/java/org/apache/beam/sdk/extensions/schemaio/expansion/ExternalSchemaIOTransformRegistrar.java +++ b/sdks/java/extensions/schemaio-expansion-service/src/main/java/org/apache/beam/sdk/extensions/schemaio/expansion/ExternalSchemaIOTransformRegistrar.java @@ -37,7 +37,7 @@ import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PDone; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; diff --git a/sdks/java/extensions/schemaio-expansion-service/src/test/java/org/apache/beam/sdk/extensions/schemaio/expansion/ExternalSchemaIOTransformRegistrarTest.java b/sdks/java/extensions/schemaio-expansion-service/src/test/java/org/apache/beam/sdk/extensions/schemaio/expansion/ExternalSchemaIOTransformRegistrarTest.java index 9d9bdcff3dd2..31b8dac47b7b 100644 --- a/sdks/java/extensions/schemaio-expansion-service/src/test/java/org/apache/beam/sdk/extensions/schemaio/expansion/ExternalSchemaIOTransformRegistrarTest.java +++ b/sdks/java/extensions/schemaio-expansion-service/src/test/java/org/apache/beam/sdk/extensions/schemaio/expansion/ExternalSchemaIOTransformRegistrarTest.java @@ -23,7 +23,7 @@ import java.io.ByteArrayOutputStream; import java.io.IOException; -import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; import javax.annotation.Nullable; import org.apache.beam.sdk.coders.RowCoder; import org.apache.beam.sdk.extensions.schemaio.expansion.ExternalSchemaIOTransformRegistrar.Configuration; @@ -49,7 +49,7 @@ public class ExternalSchemaIOTransformRegistrarTest { Row validConfigRow = Row.withSchema(validConfigSchema).addValue("value").build(); byte[] validSchemaBytes = SchemaTranslation.schemaToProto(validDataSchema, true).toByteArray(); - byte[] invalidBytes = "Nice try".getBytes(Charset.defaultCharset()); + byte[] invalidBytes = "Nice try".getBytes(StandardCharsets.UTF_8); SchemaIO schemaIO = Mockito.mock(SchemaIO.class); SchemaIOProvider schemaIOProvider = Mockito.mock(SchemaIOProvider.class); diff --git a/sdks/java/extensions/sketching/build.gradle b/sdks/java/extensions/sketching/build.gradle index 84e6f29231ce..9a5891f1d1b3 100644 --- a/sdks/java/extensions/sketching/build.gradle +++ b/sdks/java/extensions/sketching/build.gradle @@ -30,11 +30,8 @@ dependencies { compile project(path: ":sdks:java:core", configuration: "shadow") compile "com.clearspring.analytics:stream:$streamlib_version" compile "com.tdunning:t-digest:$tdigest_version" - compile library.java.slf4j_api testCompile library.java.avro testCompile project(path: ":sdks:java:core", configuration: "shadowTest") - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library testCompile library.java.junit testRuntimeOnly project(path: ":runners:direct-java", configuration: "shadow") } diff --git 
a/sdks/java/extensions/sketching/src/test/java/org/apache/beam/sdk/extensions/sketching/TDigestQuantilesTest.java b/sdks/java/extensions/sketching/src/test/java/org/apache/beam/sdk/extensions/sketching/TDigestQuantilesTest.java index 5efc34b2d19a..f2cbe875bb8a 100644 --- a/sdks/java/extensions/sketching/src/test/java/org/apache/beam/sdk/extensions/sketching/TDigestQuantilesTest.java +++ b/sdks/java/extensions/sketching/src/test/java/org/apache/beam/sdk/extensions/sketching/TDigestQuantilesTest.java @@ -18,7 +18,7 @@ package org.apache.beam.sdk.extensions.sketching; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasDisplayItem; -import static org.junit.Assert.assertThat; +import static org.hamcrest.MatcherAssert.assertThat; import com.tdunning.math.stats.Centroid; import com.tdunning.math.stats.MergingDigest; diff --git a/sdks/java/extensions/sorter/build.gradle b/sdks/java/extensions/sorter/build.gradle index f6f0dc6cd0b7..9072fec3a3fc 100644 --- a/sdks/java/extensions/sorter/build.gradle +++ b/sdks/java/extensions/sorter/build.gradle @@ -36,10 +36,9 @@ hadoopVersions.each {kv -> configurations.create("hadoopVersion$kv.key")} dependencies { compile project(path: ":sdks:java:core", configuration: "shadow") compile library.java.vendored_guava_26_0_jre + compile library.java.slf4j_api provided library.java.hadoop_mapreduce_client_core provided library.java.hadoop_common - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library testCompile library.java.mockito_core testCompile library.java.junit testRuntimeOnly project(path: ":runners:direct-java", configuration: "shadow") diff --git a/sdks/java/extensions/sorter/src/test/java/org/apache/beam/sdk/extensions/sorter/ExternalSorterBenchmark.java b/sdks/java/extensions/sorter/src/test/java/org/apache/beam/sdk/extensions/sorter/ExternalSorterBenchmark.java index 620711f9ac8f..0daff56962da 100644 --- a/sdks/java/extensions/sorter/src/test/java/org/apache/beam/sdk/extensions/sorter/ExternalSorterBenchmark.java +++ b/sdks/java/extensions/sorter/src/test/java/org/apache/beam/sdk/extensions/sorter/ExternalSorterBenchmark.java @@ -19,7 +19,7 @@ import java.io.File; import java.io.IOException; -import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; import java.nio.file.Files; import java.util.UUID; import org.apache.beam.sdk.extensions.sorter.ExternalSorter.Options.SorterType; @@ -48,8 +48,8 @@ private static void benchmark(Sorter sorter) throws IOException { for (int i = 0; i < N; i++) { sorter.add( KV.of( - UUID.randomUUID().toString().getBytes(Charset.defaultCharset()), - UUID.randomUUID().toString().getBytes(Charset.defaultCharset()))); + UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8), + UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8))); } int i = 0; for (KV ignored : sorter.sort()) { diff --git a/sdks/java/extensions/sorter/src/test/java/org/apache/beam/sdk/extensions/sorter/SortValuesTest.java b/sdks/java/extensions/sorter/src/test/java/org/apache/beam/sdk/extensions/sorter/SortValuesTest.java index 532fe955d7f7..fa3fbe6ae169 100644 --- a/sdks/java/extensions/sorter/src/test/java/org/apache/beam/sdk/extensions/sorter/SortValuesTest.java +++ b/sdks/java/extensions/sorter/src/test/java/org/apache/beam/sdk/extensions/sorter/SortValuesTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.sdk.extensions.sorter; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static 
org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.is; -import static org.junit.Assert.assertThat; import java.util.Arrays; import org.apache.beam.sdk.testing.PAssert; diff --git a/sdks/java/extensions/sql/build.gradle b/sdks/java/extensions/sql/build.gradle index 5db332297087..c7a73db7ab32 100644 --- a/sdks/java/extensions/sql/build.gradle +++ b/sdks/java/extensions/sql/build.gradle @@ -26,15 +26,12 @@ plugins { applyJavaNature( generatedClassPatterns: [ /^org\.apache\.beam\.sdk\.extensions\.sql\.impl\.parser\.impl.*/, - /^org\.apache\.beam\.sdk\.extensions\.sql\.meta\.provider\.kafka\.KafkaMessages/, ], automaticModuleName: 'org.apache.beam.sdk.extensions.sql', // javacc generated code produces lint warnings disableLintWarnings: ['dep-ann', 'rawtypes'], ) -applyGrpcNature() - description = "Apache Beam :: SDKs :: Java :: Extensions :: SQL" ext.summary = "Beam SQL provides a new interface to generate a Beam pipeline from SQL statement" @@ -58,37 +55,44 @@ hadoopVersions.each {kv -> configurations.create("hadoopVersion$kv.key")} dependencies { javacc "net.java.dev.javacc:javacc:4.0" fmppTask "com.googlecode.fmpp-maven-plugin:fmpp-maven-plugin:1.0" - fmppTask "org.freemarker:freemarker:2.3.28" - fmppTemplates library.java.vendored_calcite_1_20_0 - compile project(":sdks:java:core") + fmppTask "org.freemarker:freemarker:2.3.31" + fmppTemplates library.java.vendored_calcite_1_26_0 + compile project(path: ":sdks:java:core", configuration: "shadow") compile project(":sdks:java:extensions:join-library") + permitUnusedDeclared project(":sdks:java:extensions:join-library") // BEAM-11761 + compile project(":sdks:java:extensions:sql:udf") compile project(path: ":runners:direct-java", configuration: "shadow") compile library.java.commons_codec compile library.java.commons_csv compile library.java.jackson_databind compile library.java.joda_time - compile library.java.vendored_calcite_1_20_0 + compile library.java.vendored_calcite_1_26_0 compile "com.alibaba:fastjson:1.2.69" compile "org.codehaus.janino:janino:3.0.11" compile "org.codehaus.janino:commons-compiler:3.0.11" - provided "org.checkerframework:checker-qual:3.4.1" + compile library.java.jackson_core + compile library.java.mongo_java_driver + compile library.java.slf4j_api + compile library.java.joda_time + compile library.java.vendored_guava_26_0_jre provided project(":sdks:java:io:kafka") provided project(":sdks:java:io:google-cloud-platform") compile project(":sdks:java:io:mongodb") + compile library.java.avro provided project(":sdks:java:io:parquet") provided library.java.jackson_dataformat_xml provided library.java.hadoop_client provided library.java.kafka_clients - testCompile library.java.vendored_calcite_1_20_0 + testCompile library.java.vendored_calcite_1_26_0 testCompile library.java.vendored_guava_26_0_jre testCompile library.java.junit - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library - testCompile library.java.mockito_core testCompile library.java.quickcheck_core testCompile library.java.testcontainers_kafka testCompile library.java.google_cloud_bigtable_emulator testCompile project(path: ":sdks:java:io:mongodb", configuration: "testRuntime") + testCompile project(path: ":sdks:java:io:thrift", configuration: "testRuntime") + testCompile project(path: ":sdks:java:extensions:protobuf", configuration: "testRuntime") + testCompileOnly project(":sdks:java:extensions:sql:udf-test-provider") testRuntimeClasspath library.java.slf4j_jdk14 hadoopVersions.each {kv -> 
"hadoopVersion$kv.key" "org.apache.hadoop:hadoop-client:$kv.value" @@ -119,11 +123,11 @@ task copyFmppTemplatesFromCalciteCore(type: Copy) { into "${project.buildDir}/templates-fmpp" filter{ line -> - line.replace('import org.apache.calcite.', 'import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.') + line.replace('import org.apache.calcite.', 'import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.') } filter{ line -> - line.replace('import static org.apache.calcite.', 'import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.') + line.replace('import static org.apache.calcite.', 'import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.') } } @@ -177,16 +181,6 @@ task runPojoExample(type: JavaExec) { args = ["--runner=DirectRunner"] } -// These tests become flaky when run in parallel at more than 3 threads so run them in a separate task -task runKafkaTableProviderIT(type: Test) { - outputs.upToDateWhen { false } - include '**/KafkaTableProvider*IT.class' - maxParallelForks 2 - classpath = project(":sdks:java:extensions:sql").sourceSets.test.runtimeClasspath - testClassesDirs = files(project(":sdks:java:extensions:sql").sourceSets.test.output.classesDirs) - useJUnit { } -} - task integrationTest(type: Test) { def gcpProject = project.findProperty('gcpProject') ?: 'apache-beam-testing' def gcsTempRoot = project.findProperty('gcsTempRoot') ?: 'gs://temp-storage-for-end-to-end-tests/' @@ -202,7 +196,6 @@ task integrationTest(type: Test) { systemProperty "beamTestPipelineOptions", JsonOutput.toJson(pipelineOptions) include '**/*IT.class' - exclude '**/KafkaTableProvider*IT.class' maxParallelForks 4 classpath = project(":sdks:java:extensions:sql") @@ -216,10 +209,14 @@ task integrationTest(type: Test) { task postCommit { group = "Verification" description = "Various integration tests" - dependsOn runKafkaTableProviderIT dependsOn integrationTest } +task emptyJar(type: Jar) { + archiveBaseName = "${project.archivesBaseName}-empty-jar" + from fileTree(dir: getTemporaryDir().createNewFile().toString()) +} + task hadoopVersionsTest(group: "Verification") { description = "Runs SQL tests with different Hadoop versions" def taskNames = hadoopVersions.keySet().stream() @@ -233,5 +230,18 @@ hadoopVersions.each { kv -> description = "Runs SQL tests with Hadoop version $kv.value" classpath = configurations."hadoopVersion$kv.key" + sourceSets.test.runtimeClasspath include '**/*Test.class' + dependsOn emptyJar + // Pass jars used by Java UDF tests via system properties. + evaluationDependsOn(":sdks:java:extensions:sql:udf-test-provider") // Needed to resolve jarPath. + systemProperty "beam.sql.udf.test.jar_path", project(":sdks:java:extensions:sql:udf-test-provider").jarPath + systemProperty "beam.sql.udf.test.empty_jar_path", emptyJar.archivePath } } + +test { + dependsOn emptyJar + // Pass jars used by Java UDF tests via system properties. + evaluationDependsOn(":sdks:java:extensions:sql:udf-test-provider") // Needed to resolve jarPath. 
+ systemProperty "beam.sql.udf.test.jar_path", project(":sdks:java:extensions:sql:udf-test-provider").jarPath + systemProperty "beam.sql.udf.test.empty_jar_path", emptyJar.archivePath +} diff --git a/sdks/java/extensions/sql/datacatalog/build.gradle b/sdks/java/extensions/sql/datacatalog/build.gradle index ad1fab0bceb9..8c0f4904f2a7 100644 --- a/sdks/java/extensions/sql/datacatalog/build.gradle +++ b/sdks/java/extensions/sql/datacatalog/build.gradle @@ -20,13 +20,22 @@ import groovy.json.JsonOutput plugins { id 'org.apache.beam.module' } -applyJavaNature(automaticModuleName: 'org.apache.beam.sdk.extensions.sql.datacatalog') +applyJavaNature( + automaticModuleName: 'org.apache.beam.sdk.extensions.sql.datacatalog') dependencies { compile enforcedPlatform(library.java.google_cloud_platform_libraries_bom) compile(library.java.google_cloud_datacatalog_v1beta1) { exclude group: 'io.grpc', module: 'grpc-core' // Use Beam's version } + compile library.java.gax + compile library.java.google_auth_library_credentials + compile library.java.proto_google_cloud_datacatalog_v1beta1 + compile library.java.protobuf_java + compile library.java.slf4j_api + compile library.java.vendored_guava_26_0_jre + compile project(path: ":sdks:java:core", configuration: "shadow") + compile "org.threeten:threetenbp:1.4.5" provided project(":sdks:java:extensions:sql") // Dependencies for the example diff --git a/sdks/java/extensions/sql/datacatalog/src/main/java/org/apache/beam/sdk/extensions/sql/example/BeamSqlDataCatalogExample.java b/sdks/java/extensions/sql/datacatalog/src/main/java/org/apache/beam/sdk/extensions/sql/example/BeamSqlDataCatalogExample.java index fd56ff488454..9e73ab27baa1 100644 --- a/sdks/java/extensions/sql/datacatalog/src/main/java/org/apache/beam/sdk/extensions/sql/example/BeamSqlDataCatalogExample.java +++ b/sdks/java/extensions/sql/datacatalog/src/main/java/org/apache/beam/sdk/extensions/sql/example/BeamSqlDataCatalogExample.java @@ -31,7 +31,7 @@ import org.apache.beam.sdk.transforms.MapElements; import org.apache.beam.sdk.values.Row; import org.apache.beam.sdk.values.TypeDescriptor; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Strings; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Strings; import org.slf4j.Logger; import org.slf4j.LoggerFactory; diff --git a/sdks/java/extensions/sql/datacatalog/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/datacatalog/DataCatalogTableProvider.java b/sdks/java/extensions/sql/datacatalog/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/datacatalog/DataCatalogTableProvider.java index a2eb4e23f5fa..cd81ce0b6d52 100644 --- a/sdks/java/extensions/sql/datacatalog/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/datacatalog/DataCatalogTableProvider.java +++ b/sdks/java/extensions/sql/datacatalog/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/datacatalog/DataCatalogTableProvider.java @@ -22,15 +22,19 @@ import com.google.api.gax.rpc.InvalidArgumentException; import com.google.api.gax.rpc.NotFoundException; import com.google.api.gax.rpc.PermissionDeniedException; +import com.google.api.gax.rpc.StatusCode.Code; import com.google.cloud.datacatalog.v1beta1.DataCatalogClient; import com.google.cloud.datacatalog.v1beta1.DataCatalogSettings; import com.google.cloud.datacatalog.v1beta1.Entry; import com.google.cloud.datacatalog.v1beta1.LookupEntryRequest; +import com.google.cloud.datacatalog.v1beta1.UpdateEntryRequest; +import com.google.protobuf.FieldMask; import 
java.io.IOException; import java.util.HashMap; import java.util.Map; import java.util.Optional; import java.util.stream.Stream; +import org.apache.beam.sdk.annotations.Internal; import org.apache.beam.sdk.extensions.gcp.options.GcpOptions; import org.apache.beam.sdk.extensions.sql.impl.TableName; import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable; @@ -42,13 +46,19 @@ import org.apache.beam.sdk.extensions.sql.meta.provider.pubsub.PubsubTableProvider; import org.apache.beam.sdk.extensions.sql.meta.provider.text.TextTableProvider; import org.apache.beam.sdk.schemas.Schema; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; import org.checkerframework.checker.nullness.qual.Nullable; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import org.threeten.bp.Duration; /** Uses DataCatalog to get the source type and schema for a table. */ public class DataCatalogTableProvider extends FullNameTableProvider implements AutoCloseable { + private static final Logger LOG = LoggerFactory.getLogger(DataCatalogTableProvider.class); + private static final TableFactory PUBSUB_TABLE_FACTORY = new PubsubTableFactory(); private static final TableFactory GCS_TABLE_FACTORY = new GcsTableFactory(); @@ -142,13 +152,41 @@ private Table loadTableFromDC(String tableName) { } } - private static DataCatalogClient createDataCatalogClient(DataCatalogPipelineOptions options) { + @Internal + public static DataCatalogClient createDataCatalogClient(DataCatalogPipelineOptions options) { try { - return DataCatalogClient.create( + DataCatalogSettings.Builder builder = DataCatalogSettings.newBuilder() .setCredentialsProvider(() -> options.as(GcpOptions.class).getGcpCredential()) - .setEndpoint(options.getDataCatalogEndpoint()) - .build()); + .setEndpoint(options.getDataCatalogEndpoint()); + + // Retry permission denied errors, they are likely due to sync delay. + // Limit max retry delay to 1 minute, at that point its probably a legitimate permission error + // and we should get back to the user. 
+ builder + .lookupEntrySettings() + .setRetryableCodes( + ImmutableSet.of(Code.PERMISSION_DENIED, Code.DEADLINE_EXCEEDED, Code.UNAVAILABLE)) + .setRetrySettings( + builder + .lookupEntrySettings() + .getRetrySettings() + .toBuilder() + .setMaxRetryDelay(Duration.ofMinutes(1L)) + .build()); + builder + .updateEntrySettings() + .setRetryableCodes( + ImmutableSet.of(Code.PERMISSION_DENIED, Code.DEADLINE_EXCEEDED, Code.UNAVAILABLE)) + .setRetrySettings( + builder + .updateEntrySettings() + .getRetrySettings() + .toBuilder() + .setMaxRetryDelay(Duration.ofMinutes(1L)) + .build()); + + return DataCatalogClient.create(builder.build()); } catch (IOException e) { throw new RuntimeException("Error creating Data Catalog client", e); } @@ -178,6 +216,24 @@ private Table toCalciteTable(String tableName, Entry entry) { return tableBuilder.get().schema(schema).name(tableName).build(); } + @Internal + public boolean setSchemaIfNotPresent(String resource, Schema schema) { + com.google.cloud.datacatalog.v1beta1.Schema dcSchema = SchemaUtils.toDataCatalog(schema); + Entry entry = + dataCatalog.lookupEntry(LookupEntryRequest.newBuilder().setSqlResource(resource).build()); + if (entry.getSchema().getColumnsCount() == 0) { + dataCatalog.updateEntry( + UpdateEntryRequest.newBuilder() + .setEntry(entry.toBuilder().setSchema(dcSchema).build()) + .setUpdateMask(FieldMask.newBuilder().addPaths("schema").build()) + .build()); + return true; + } else { + LOG.info(String.format("Not updating schema for '%s' since it already has one.", resource)); + return false; + } + } + @Override public void close() { dataCatalog.close(); diff --git a/sdks/java/extensions/sql/datacatalog/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/datacatalog/SchemaUtils.java b/sdks/java/extensions/sql/datacatalog/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/datacatalog/SchemaUtils.java index 2b3bcc36bf73..7c729c4af2ce 100644 --- a/sdks/java/extensions/sql/datacatalog/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/datacatalog/SchemaUtils.java +++ b/sdks/java/extensions/sql/datacatalog/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/datacatalog/SchemaUtils.java @@ -28,10 +28,14 @@ import org.apache.beam.sdk.schemas.Schema.Field; import org.apache.beam.sdk.schemas.Schema.FieldType; import org.apache.beam.sdk.schemas.logicaltypes.SqlTypes; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Strings; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Strings; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableMap; @Experimental(Kind.SCHEMAS) +@SuppressWarnings({ + "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) class SchemaUtils { private static final Map FIELD_TYPES = @@ -98,4 +102,93 @@ private static FieldType getBeamFieldType(ColumnSchema column) { throw new UnsupportedOperationException( "Field type '" + dcFieldType + "' is not supported (field '" + column.getColumn() + "')"); } + + /** Convert Beam schema to DataCatalog schema. 
*/ + static com.google.cloud.datacatalog.v1beta1.Schema toDataCatalog(Schema schema) { + com.google.cloud.datacatalog.v1beta1.Schema.Builder schemaBuilder = + com.google.cloud.datacatalog.v1beta1.Schema.newBuilder(); + for (Schema.Field field : schema.getFields()) { + schemaBuilder.addColumns(fromBeamField(field)); + } + return schemaBuilder.build(); + } + + private static ColumnSchema fromBeamField(Schema.Field field) { + Schema.FieldType fieldType = field.getType(); + if (fieldType.getTypeName().equals(Schema.TypeName.ARRAY)) { + if (fieldType.getNullable()) { + throw new UnsupportedOperationException( + "Nullable array type is not supported in DataCatalog schemas: " + fieldType); + } else if (fieldType.getCollectionElementType().getNullable()) { + throw new UnsupportedOperationException( + "Nullable array element type is not supported in DataCatalog schemas: " + fieldType); + } else if (fieldType.getCollectionElementType().getTypeName().equals(Schema.TypeName.ARRAY)) { + throw new UnsupportedOperationException( + "Array of arrays not supported in DataCatalog schemas: " + fieldType); + } + ColumnSchema column = + fromBeamField(Field.of(field.getName(), fieldType.getCollectionElementType())); + if (!column.getMode().equals("REQUIRED")) { + // We should have bailed out earlier for any cases that would result in mode being set. + throw new AssertionError( + "ColumnSchema for collection element type has non-empty mode: " + fieldType); + } + return column.toBuilder().setMode("REPEATED").build(); + } else { // struct or primitive type + ColumnSchema.Builder colBuilder = + ColumnSchema.newBuilder().setType(getDataCatalogType(fieldType)); + + if (fieldType.getNullable()) { + colBuilder.setMode("NULLABLE"); + } else { + colBuilder.setMode("REQUIRED"); + } + + // if this is a struct, add the child columns + if (fieldType.getTypeName().equals(Schema.TypeName.ROW)) { + for (Schema.Field subField : fieldType.getRowSchema().getFields()) { + colBuilder.addSubcolumns(fromBeamField(subField)); + } + } + + return colBuilder.setColumn(field.getName()).build(); + } + } + + private static String getDataCatalogType(FieldType fieldType) { + switch (fieldType.getTypeName()) { + case INT32: + case INT64: + case BYTES: + case DOUBLE: + case STRING: + return fieldType.getTypeName().name(); + case BOOLEAN: + return "BOOL"; + case DATETIME: + return "TIMESTAMP"; + case DECIMAL: + return "NUMERIC"; + case LOGICAL_TYPE: + Schema.LogicalType logical = fieldType.getLogicalType(); + if (SqlTypes.TIME.getIdentifier().equals(logical.getIdentifier())) { + return "TIME"; + } else if (SqlTypes.DATE.getIdentifier().equals(logical.getIdentifier())) { + return "DATE"; + } else if (SqlTypes.DATETIME.getIdentifier().equals(logical.getIdentifier())) { + return "DATETIME"; + } else { + throw new UnsupportedOperationException("Unsupported logical type: " + logical); + } + case ROW: + return "STRUCT"; + case MAP: + return String.format( + "MAP<%s,%s>", + getDataCatalogType(fieldType.getMapKeyType()), + getDataCatalogType(fieldType.getMapValueType())); + default: + throw new UnsupportedOperationException("Unsupported type: " + fieldType); + } + } } diff --git a/sdks/java/extensions/sql/datacatalog/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/datacatalog/SchemaUtilsTest.java b/sdks/java/extensions/sql/datacatalog/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/datacatalog/SchemaUtilsTest.java new file mode 100644 index 000000000000..6afc9466f60e --- /dev/null +++ 
b/sdks/java/extensions/sql/datacatalog/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/datacatalog/SchemaUtilsTest.java @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.meta.provider.datacatalog; + +import static org.junit.Assert.assertEquals; + +import com.google.cloud.datacatalog.v1beta1.ColumnSchema; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.schemas.logicaltypes.SqlTypes; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Unit tests for {@link SchemaUtils}. */ +@RunWith(JUnit4.class) +public class SchemaUtilsTest { + + private static final Schema TEST_INNER_SCHEMA = + Schema.builder().addField("i1", FieldType.INT64).addField("i2", FieldType.STRING).build(); + + private static final Schema TEST_SCHEMA = + Schema.builder() + .addNullableField("f_int32", FieldType.INT32) + .addNullableField("f_int64", FieldType.INT64) + .addNullableField("f_bytes", FieldType.BYTES) + .addNullableField("f_double", FieldType.DOUBLE) + .addNullableField("f_string", FieldType.STRING) + .addNullableField("f_bool", FieldType.BOOLEAN) + .addNullableField("f_ts", FieldType.DATETIME) + .addNullableField("f_numeric", FieldType.DECIMAL) + .addLogicalTypeField("f_time", SqlTypes.TIME) + .addLogicalTypeField("f_date", SqlTypes.DATE) + .addLogicalTypeField("f_datetime", SqlTypes.DATETIME) + .addArrayField("f_array", FieldType.INT64) + .addRowField("f_struct", TEST_INNER_SCHEMA) + .build(); + + private static final com.google.cloud.datacatalog.v1beta1.Schema TEST_DC_SCHEMA = + com.google.cloud.datacatalog.v1beta1.Schema.newBuilder() + .addColumns( + ColumnSchema.newBuilder() + .setColumn("f_int32") + .setType("INT32") + .setMode("NULLABLE") + .build()) + .addColumns( + ColumnSchema.newBuilder() + .setColumn("f_int64") + .setType("INT64") + .setMode("NULLABLE") + .build()) + .addColumns( + ColumnSchema.newBuilder() + .setColumn("f_bytes") + .setType("BYTES") + .setMode("NULLABLE") + .build()) + .addColumns( + ColumnSchema.newBuilder() + .setColumn("f_double") + .setType("DOUBLE") + .setMode("NULLABLE") + .build()) + .addColumns( + ColumnSchema.newBuilder() + .setColumn("f_string") + .setType("STRING") + .setMode("NULLABLE") + .build()) + .addColumns( + ColumnSchema.newBuilder() + .setColumn("f_bool") + .setType("BOOL") + .setMode("NULLABLE") + .build()) + .addColumns( + ColumnSchema.newBuilder() + .setColumn("f_ts") + .setType("TIMESTAMP") + .setMode("NULLABLE") + .build()) + .addColumns( + ColumnSchema.newBuilder() + .setColumn("f_numeric") + .setType("NUMERIC") + .setMode("NULLABLE") + .build()) + .addColumns( + ColumnSchema.newBuilder() + .setColumn("f_time") + 
.setType("TIME") + .setMode("REQUIRED") + .build()) + .addColumns( + ColumnSchema.newBuilder() + .setColumn("f_date") + .setType("DATE") + .setMode("REQUIRED") + .build()) + .addColumns( + ColumnSchema.newBuilder() + .setColumn("f_datetime") + .setType("DATETIME") + .setMode("REQUIRED") + .build()) + .addColumns( + ColumnSchema.newBuilder() + .setColumn("f_array") + .setType("INT64") + .setMode("REPEATED") + .build()) + .addColumns( + ColumnSchema.newBuilder() + .setColumn("f_struct") + .setType("STRUCT") + .addSubcolumns( + ColumnSchema.newBuilder() + .setColumn("i1") + .setType("INT64") + .setMode("REQUIRED") + .build()) + .addSubcolumns( + ColumnSchema.newBuilder() + .setColumn("i2") + .setType("STRING") + .setMode("REQUIRED") + .build()) + .setMode("REQUIRED") + .build()) + .build(); + + @Test + public void testFromDataCatalog() { + assertEquals(TEST_SCHEMA, SchemaUtils.fromDataCatalog(TEST_DC_SCHEMA)); + } + + @Test + public void testToDataCatalog() { + assertEquals(TEST_DC_SCHEMA, SchemaUtils.toDataCatalog(TEST_SCHEMA)); + } +} diff --git a/sdks/java/extensions/sql/expansion-service/build.gradle b/sdks/java/extensions/sql/expansion-service/build.gradle index aabd3b1ed0c2..37f162995ef9 100644 --- a/sdks/java/extensions/sql/expansion-service/build.gradle +++ b/sdks/java/extensions/sql/expansion-service/build.gradle @@ -32,9 +32,12 @@ ext.summary = """Contains code to run a SQL Expansion Service.""" dependencies { + compile project(path: ":sdks:java:core", configuration: "shadow") + compile project(path: ":sdks:java:expansion-service") + permitUnusedDeclared project(path: ":sdks:java:expansion-service") // BEAM-11761 compile project(path: ":sdks:java:extensions:sql") compile project(path: ":sdks:java:extensions:sql:zetasql") - compile project(path: ":sdks:java:expansion-service") + compile library.java.vendored_guava_26_0_jre } task runExpansionService (type: JavaExec) { diff --git a/sdks/java/extensions/sql/hcatalog/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/hcatalog/BeamSqlHiveSchemaTest.java b/sdks/java/extensions/sql/hcatalog/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/hcatalog/BeamSqlHiveSchemaTest.java index 3e87efcaeb7e..706cf69762b9 100644 --- a/sdks/java/extensions/sql/hcatalog/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/hcatalog/BeamSqlHiveSchemaTest.java +++ b/sdks/java/extensions/sql/hcatalog/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/hcatalog/BeamSqlHiveSchemaTest.java @@ -51,9 +51,6 @@ import org.junit.rules.TemporaryFolder; /** Test for HCatalogTableProvider. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamSqlHiveSchemaTest implements Serializable { private static final Schema ROW_SCHEMA = diff --git a/sdks/java/extensions/sql/jdbc/build.gradle b/sdks/java/extensions/sql/jdbc/build.gradle index f1919053a9e7..a1c99cf53c1b 100644 --- a/sdks/java/extensions/sql/jdbc/build.gradle +++ b/sdks/java/extensions/sql/jdbc/build.gradle @@ -33,17 +33,17 @@ configurations { dependencies { compile project(":sdks:java:extensions:sql") compile "jline:jline:2.14.6" + permitUnusedDeclared "jline:jline:2.14.6" // BEAM-11761 compile "sqlline:sqlline:1.4.0" - compile library.java.slf4j_jdk14 + compile library.java.vendored_guava_26_0_jre testCompile project(path: ":sdks:java:io:google-cloud-platform", configuration: "testRuntime") testCompile library.java.junit - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library // Depending on outputs so integrationTest can run with only test dependencies. // This enables us to test the JDBC jar being loaded on a custom classloader. integrationTest sourceSets.test.output integrationTest sourceSets.main.output integrationTest library.java.junit + compile project(path: ":sdks:java:core", configuration: "shadow") } processResources { diff --git a/sdks/java/extensions/sql/jdbc/src/test/java/org/apache/beam/sdk/extensions/sql/jdbc/BeamSqlLineIT.java b/sdks/java/extensions/sql/jdbc/src/test/java/org/apache/beam/sdk/extensions/sql/jdbc/BeamSqlLineIT.java index ae9573966915..1f5bb46c71a6 100644 --- a/sdks/java/extensions/sql/jdbc/src/test/java/org/apache/beam/sdk/extensions/sql/jdbc/BeamSqlLineIT.java +++ b/sdks/java/extensions/sql/jdbc/src/test/java/org/apache/beam/sdk/extensions/sql/jdbc/BeamSqlLineIT.java @@ -21,7 +21,7 @@ import static org.apache.beam.sdk.extensions.sql.jdbc.BeamSqlLineTestingUtils.buildArgs; import static org.apache.beam.sdk.extensions.sql.jdbc.BeamSqlLineTestingUtils.toLines; import static org.hamcrest.CoreMatchers.everyItem; -import static org.junit.Assert.assertThat; +import static org.hamcrest.MatcherAssert.assertThat; import com.fasterxml.jackson.databind.ObjectMapper; import com.fasterxml.jackson.databind.node.ObjectNode; diff --git a/sdks/java/extensions/sql/jdbc/src/test/java/org/apache/beam/sdk/extensions/sql/jdbc/BeamSqlLineTest.java b/sdks/java/extensions/sql/jdbc/src/test/java/org/apache/beam/sdk/extensions/sql/jdbc/BeamSqlLineTest.java index 34bae8dc0f06..e308fa52d6e4 100644 --- a/sdks/java/extensions/sql/jdbc/src/test/java/org/apache/beam/sdk/extensions/sql/jdbc/BeamSqlLineTest.java +++ b/sdks/java/extensions/sql/jdbc/src/test/java/org/apache/beam/sdk/extensions/sql/jdbc/BeamSqlLineTest.java @@ -20,13 +20,14 @@ import static org.apache.beam.sdk.extensions.sql.jdbc.BeamSqlLineTestingUtils.buildArgs; import static org.apache.beam.sdk.extensions.sql.jdbc.BeamSqlLineTestingUtils.toLines; import static org.hamcrest.CoreMatchers.everyItem; -import static org.junit.Assert.assertThat; +import static org.hamcrest.MatcherAssert.assertThat; +import static org.hamcrest.Matchers.is; +import static org.hamcrest.Matchers.oneOf; import java.io.ByteArrayOutputStream; import java.io.File; import java.util.Arrays; import java.util.List; -import org.hamcrest.collection.IsIn; import org.junit.Rule; import org.junit.Test; import org.junit.rules.TemporaryFolder; @@ -94,7 +95,7 @@ public void testSqlLine_select() throws Exception { List> lines = toLines(byteArrayOutputStream); assertThat( Arrays.asList(Arrays.asList("3", 
"hello", "2018-05-28")), - everyItem(IsIn.isOneOf(lines.toArray()))); + everyItem(is(oneOf(lines.toArray())))); } @Test @@ -104,7 +105,7 @@ public void testSqlLine_selectFromTable() throws Exception { buildArgs( "CREATE EXTERNAL TABLE table_test (col_a VARCHAR, col_b VARCHAR, " + "col_c VARCHAR, col_x TINYINT, col_y INT, col_z BIGINT) TYPE 'test';", - "INSERT INTO table_test VALUES ('a', 'b', 'c', 1, 2, 3);", + "INSERT INTO table_test VALUES ('a', 'b', 'c', 1, 2, CAST(3 AS BIGINT));", "SELECT * FROM table_test;"); BeamSqlLine.runSqlLine(args, null, byteArrayOutputStream, null); @@ -114,7 +115,7 @@ public void testSqlLine_selectFromTable() throws Exception { Arrays.asList( Arrays.asList("col_a", "col_b", "col_c", "col_x", "col_y", "col_z"), Arrays.asList("a", "b", "c", "1", "2", "3")), - everyItem(IsIn.isOneOf(lines.toArray()))); + everyItem(is(oneOf(lines.toArray())))); } @Test @@ -129,8 +130,7 @@ public void testSqlLine_insertSelect() throws Exception { BeamSqlLine.runSqlLine(args, null, byteArrayOutputStream, null); List> lines = toLines(byteArrayOutputStream); - assertThat( - Arrays.asList(Arrays.asList("3", "hello")), everyItem(IsIn.isOneOf(lines.toArray()))); + assertThat(Arrays.asList(Arrays.asList("3", "hello")), everyItem(is(oneOf(lines.toArray())))); } @Test @@ -149,7 +149,7 @@ public void testSqlLine_GroupBy() throws Exception { List> lines = toLines(byteArrayOutputStream); assertThat( Arrays.asList(Arrays.asList("3", "2"), Arrays.asList("4", "1")), - everyItem(IsIn.isOneOf(lines.toArray()))); + everyItem(is(oneOf(lines.toArray())))); } @Test @@ -169,7 +169,7 @@ public void testSqlLine_fixedWindow() throws Exception { assertThat( Arrays.asList( Arrays.asList("2018-07-01 21:26:06", "1"), Arrays.asList("2018-07-01 21:26:07", "1")), - everyItem(IsIn.isOneOf(lines.toArray()))); + everyItem(is(oneOf(lines.toArray())))); } @Test @@ -195,6 +195,6 @@ public void testSqlLine_slidingWindow() throws Exception { Arrays.asList("2018-07-01 21:26:09", "2"), Arrays.asList("2018-07-01 21:26:10", "2"), Arrays.asList("2018-07-01 21:26:11", "1")), - everyItem(IsIn.isOneOf(lines.toArray()))); + everyItem(is(oneOf(lines.toArray())))); } } diff --git a/sdks/java/extensions/sql/perf-tests/build.gradle b/sdks/java/extensions/sql/perf-tests/build.gradle index 7875a6b846fd..8af316e64fa4 100644 --- a/sdks/java/extensions/sql/perf-tests/build.gradle +++ b/sdks/java/extensions/sql/perf-tests/build.gradle @@ -17,7 +17,8 @@ */ plugins { id 'org.apache.beam.module' } -applyJavaNature(automaticModuleName: 'org.apache.beam.sdk.extensions.sql.meta.provider') +applyJavaNature( + automaticModuleName: 'org.apache.beam.sdk.extensions.sql.meta.provider') provideIntegrationTestingDependencies() enableJavaPerformanceTesting() diff --git a/sdks/java/extensions/sql/perf-tests/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryIOPushDownIT.java b/sdks/java/extensions/sql/perf-tests/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryIOPushDownIT.java index 1194baac4414..d2808cdd1c22 100644 --- a/sdks/java/extensions/sql/perf-tests/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryIOPushDownIT.java +++ b/sdks/java/extensions/sql/perf-tests/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryIOPushDownIT.java @@ -46,9 +46,9 @@ import org.apache.beam.sdk.transforms.ParDo; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; -import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RuleSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RuleSets; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RuleSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RuleSets; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.junit.Before; import org.junit.BeforeClass; diff --git a/sdks/java/extensions/sql/src/main/codegen/config.fmpp b/sdks/java/extensions/sql/src/main/codegen/config.fmpp index 8dcb04b9404f..ca590152a9e4 100644 --- a/sdks/java/extensions/sql/src/main/codegen/config.fmpp +++ b/sdks/java/extensions/sql/src/main/codegen/config.fmpp @@ -21,11 +21,12 @@ data: { # List of import statements. imports: [ - "org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.ColumnStrategy" - "org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlCreate" - "org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlDrop" - "org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeName" + "org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.ColumnStrategy" + "org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlCreate" + "org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlDrop" + "org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlTypeName" "org.apache.beam.sdk.extensions.sql.impl.parser.SqlCreateExternalTable" + "org.apache.beam.sdk.extensions.sql.impl.parser.SqlCreateFunction" "org.apache.beam.sdk.extensions.sql.impl.parser.SqlDdlNodes" "org.apache.beam.sdk.extensions.sql.impl.parser.SqlSetOptionBeam" "org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils" @@ -34,8 +35,10 @@ data: { # List of keywords. keywords: [ + "AGGREGATE" "COMMENT" "IF" + "JAR" "LOCATION" "TBLPROPERTIES" ] @@ -43,6 +46,7 @@ data: { # List of keywords from "keywords" section that are not reserved. 
nonReservedKeywords: [ "A" + "ABSENT" "ABSOLUTE" "ACTION" "ADA" @@ -65,11 +69,11 @@ data: { "CATALOG_NAME" "CENTURY" "CHAIN" + "CHARACTERISTICS" + "CHARACTERS" "CHARACTER_SET_CATALOG" "CHARACTER_SET_NAME" "CHARACTER_SET_SCHEMA" - "CHARACTERISTICS" - "CHARACTERS" "CLASS_ORIGIN" "COBOL" "COLLATION" @@ -80,13 +84,14 @@ data: { "COMMAND_FUNCTION" "COMMAND_FUNCTION_CODE" "COMMITTED" + "CONDITIONAL" "CONDITION_NUMBER" "CONNECTION" "CONNECTION_NAME" "CONSTRAINT_CATALOG" "CONSTRAINT_NAME" - "CONSTRAINT_SCHEMA" "CONSTRAINTS" + "CONSTRAINT_SCHEMA" "CONSTRUCTOR" "CONTINUE" "CURSOR_NAME" @@ -94,6 +99,7 @@ data: { "DATABASE" "DATETIME_INTERVAL_CODE" "DATETIME_INTERVAL_PRECISION" + "DAYS" "DECADE" "DEFAULTS" "DEFERRABLE" @@ -113,13 +119,16 @@ data: { "DOY" "DYNAMIC_FUNCTION" "DYNAMIC_FUNCTION_CODE" + "ENCODING" "EPOCH" + "ERROR" "EXCEPTION" "EXCLUDE" "EXCLUDING" "FINAL" "FIRST" "FOLLOWING" + "FORMAT" "FORTRAN" "FOUND" "FRAC_SECOND" @@ -131,6 +140,9 @@ data: { "GOTO" "GRANTED" "HIERARCHY" + "HOP" + "HOURS" + "IGNORE" "IMMEDIATE" "IMMEDIATELY" "IMPLEMENTATION" @@ -142,8 +154,8 @@ data: { "INSTANTIABLE" "INVOKER" "ISODOW" - "ISOYEAR" "ISOLATION" + "ISOYEAR" "JAVA" "JSON" "K" @@ -160,13 +172,15 @@ data: { "MAP" "MATCHED" "MAXVALUE" - "MICROSECOND" "MESSAGE_LENGTH" "MESSAGE_OCTET_LENGTH" "MESSAGE_TEXT" - "MILLISECOND" + "MICROSECOND" "MILLENNIUM" + "MILLISECOND" + "MINUTES" "MINVALUE" + "MONTHS" "MORE_" "MUMPS" "NAME" @@ -195,6 +209,7 @@ data: { "PARAMETER_SPECIFIC_SCHEMA" "PARTIAL" "PASCAL" + "PASSING" "PASSTHROUGH" "PAST" "PATH" @@ -211,24 +226,28 @@ data: { "RELATIVE" "REPEATABLE" "REPLACE" + "RESPECT" "RESTART" "RESTRICT" "RETURNED_CARDINALITY" "RETURNED_LENGTH" "RETURNED_OCTET_LENGTH" "RETURNED_SQLSTATE" + "RETURNING" "ROLE" "ROUTINE" "ROUTINE_CATALOG" "ROUTINE_NAME" "ROUTINE_SCHEMA" "ROW_COUNT" + "SCALAR" "SCALE" "SCHEMA" "SCHEMA_NAME" "SCOPE_CATALOGS" "SCOPE_NAME" "SCOPE_SCHEMA" + "SECONDS" "SECTION" "SECURITY" "SELF" @@ -269,8 +288,8 @@ data: { "SQL_INTERVAL_YEAR" "SQL_INTERVAL_YEAR_TO_MONTH" "SQL_LONGVARBINARY" - "SQL_LONGVARNCHAR" "SQL_LONGVARCHAR" + "SQL_LONGVARNCHAR" "SQL_NCHAR" "SQL_NCLOB" "SQL_NUMERIC" @@ -313,9 +332,11 @@ data: { "TRIGGER_CATALOG" "TRIGGER_NAME" "TRIGGER_SCHEMA" + "TUMBLE" "TYPE" "UNBOUNDED" "UNCOMMITTED" + "UNCONDITIONAL" "UNDER" "UNNAMED" "USAGE" @@ -323,44 +344,71 @@ data: { "USER_DEFINED_TYPE_CODE" "USER_DEFINED_TYPE_NAME" "USER_DEFINED_TYPE_SCHEMA" + "UTF16" + "UTF32" + "UTF8" "VERSION" "VIEW" "WEEK" - "WRAPPER" "WORK" + "WRAPPER" "WRITE" "XML" + "YEARS" "ZONE" # added in Beam + "AGGREGATE" "COMMENT" "IF" + "JAR" "LOCATION" "TBLPROPERTIES" ] + # List of non-reserved keywords to add; + # items in this list become non-reserved + nonReservedKeywordsToAdd: [ + ] + + # List of non-reserved keywords to remove; + # items in this list become reserved + nonReservedKeywordsToRemove: [ + ] + # List of additional join types. Each is a method with no arguments. # Example: LeftSemiJoin() joinTypes: [ ] # List of methods for parsing custom SQL statements. + # Return type of method implementation should be 'SqlNode'. + # Example: SqlShowDatabases(), SqlShowTables(). statementParserMethods: [ "SqlSetOptionBeam(Span.of(), null)" - "SqlCreateExternalTable()" ] # List of methods for parsing custom literals. + # Return type of method implementation should be "SqlNode". # Example: ParseJsonLiteral(). literalParserMethods: [ ] # List of methods for parsing custom data types. + # Return type of method implementation should be "SqlTypeNameSpec". + # Example: SqlParseTimeStampZ(). 
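The AGGREGATE and JAR keywords plus the SqlCreateFunction entry registered in this config (grammar shown below in parserImpls.ftl) add Java UDF/UDAF DDL to the Beam SQL parser. A sketch of what the new statements look like, following the documented Beam SQL CREATE FUNCTION syntax (the exact grammar tokens are partially elided in this extract); the function names are made up, and the jar path reuses the system property that the sql build.gradle changes above pass to tests:

```java
// Jar published by :sdks:java:extensions:sql:udf-test-provider, handed to tests
// as -Dbeam.sql.udf.test.jar_path=... (see the sql build.gradle changes above).
String jarPath = System.getProperty("beam.sql.udf.test.jar_path");

// Scalar and aggregate variants of the new DDL; the function names are illustrative.
String createScalarFn = "CREATE FUNCTION increment USING JAR '" + jarPath + "'";
String createAggregateFn = "CREATE AGGREGATE FUNCTION my_sum USING JAR '" + jarPath + "'";
```

Per the grammar below, parsing such a statement returns a SqlCreateFunction node carrying the function name, the jar path literal, and the isAggregate flag.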
dataTypeParserMethods: [ ] + # List of methods for parsing builtin function calls. + # Return type of method implementation should be "SqlNode". + # Example: DateFunctionCall(). + builtinFunctionCallMethods: [ + ] + # List of methods for parsing extensions to "ALTER " calls. # Each must accept arguments "(SqlParserPos pos, String scope)". + # Example: "SqlUploadJarNode" alterStatementParserMethods: [ "SqlSetOptionBeam" ] @@ -368,7 +416,9 @@ data: { # List of methods for parsing extensions to "CREATE [OR REPLACE]" calls. # Each must accept arguments "(SqlParserPos pos, boolean replace)". createStatementParserMethods: [ - SqlCreateTableNotSupportedMessage + "SqlCreateExternalTable" + "SqlCreateFunction" + "SqlCreateTableNotSupportedMessage" ] # List of methods for parsing extensions to "DROP" calls. @@ -377,6 +427,14 @@ data: { "SqlDropTable" ] + # Binary operators tokens + binaryOperatorsTokens: [ + ] + + # Binary operators initialization + extraBinaryExpressions: [ + ] + # List of files in @includes directory that have parser method # implementations for parsing custom SQL statements, literals or types # given as part of "statementParserMethods", "literalParserMethods" or @@ -385,16 +443,14 @@ data: { "parserImpls.ftl" ] - # List of methods for parsing builtin function calls. - builtinFunctionCallMethods: [ - ] - + includePosixOperators: false includeCompoundIdentifier: true includeBraces: true includeAdditionalDeclarations: false } } + freemarkerLinks: { includes: includes/ } diff --git a/sdks/java/extensions/sql/src/main/codegen/includes/parserImpls.ftl b/sdks/java/extensions/sql/src/main/codegen/includes/parserImpls.ftl index 1fd6efda54fe..8983cc9c6ce4 100644 --- a/sdks/java/extensions/sql/src/main/codegen/includes/parserImpls.ftl +++ b/sdks/java/extensions/sql/src/main/codegen/includes/parserImpls.ftl @@ -146,10 +146,8 @@ Schema.Field Field() : * ( LOCATION location_string )? * ( TBLPROPERTIES tbl_properties )? */ -SqlCreate SqlCreateExternalTable() : +SqlCreate SqlCreateExternalTable(Span s, boolean replace) : { - final Span s = Span.of(); - final boolean replace = false; final boolean ifNotExists; final SqlIdentifier id; List fieldList = null; @@ -160,7 +158,7 @@ SqlCreate SqlCreateExternalTable() : } { - { +
    { s.add(this); } @@ -191,6 +189,35 @@ SqlCreate SqlCreateExternalTable() : } } +SqlCreate SqlCreateFunction(Span s, boolean replace) : +{ + boolean isAggregate = false; + final SqlIdentifier name; + final SqlNode jarName; +} +{ + ( + { + isAggregate = true; + } + )? + { + s.add(this); + } + name = CompoundIdentifier() + + jarName = StringLiteral() + { + return + new SqlCreateFunction( + s.end(this), + replace, + name, + jarName, + isAggregate); + } +} + SqlCreate SqlCreateTableNotSupportedMessage(Span s, boolean replace) : { } @@ -294,7 +321,7 @@ List RowFields() : Schema.FieldType SimpleType() : { final Span s = Span.of(); - final SqlTypeName simpleTypeName; + final SqlTypeNameSpec simpleTypeName; } { simpleTypeName = SqlTypeName(s) diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/SqlTransform.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/SqlTransform.java index 7b8ff5c9bf87..5ddaca5baaf1 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/SqlTransform.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/SqlTransform.java @@ -22,6 +22,7 @@ import java.util.HashMap; import java.util.List; import java.util.Map; +import java.util.ServiceLoader; import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv; import org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.BeamSqlEnvBuilder; @@ -33,6 +34,7 @@ import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable; import org.apache.beam.sdk.extensions.sql.meta.provider.ReadOnlyTableProvider; import org.apache.beam.sdk.extensions.sql.meta.provider.TableProvider; +import org.apache.beam.sdk.extensions.sql.meta.store.InMemoryMetaStore; import org.apache.beam.sdk.transforms.Combine; import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.transforms.SerializableFunction; @@ -42,8 +44,8 @@ import org.apache.beam.sdk.values.PValue; import org.apache.beam.sdk.values.Row; import org.apache.beam.sdk.values.TupleTag; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; import org.checkerframework.checker.nullness.qual.Nullable; @@ -82,26 +84,44 @@ * * p.run().waitUntilFinish(); * } + * + *

+ * <p>A typical pipeline with Beam SQL DDL and DSL is:
+ *
+ * <pre>{@code
    + * PipelineOptions options = PipelineOptionsFactory.create();
    + * Pipeline p = Pipeline.create(options);
    + *
    + * String sql1 = "INSERT INTO pubsub_sink SELECT * FROM pubsub_source";
    + *
    + * String ddlSource = "CREATE EXTERNAL TABLE pubsub_source(" +
+ *     "attributes MAP<VARCHAR, VARCHAR>, payload ROW<name VARCHAR, size INTEGER>)" +
    + *     "TYPE pubsub LOCATION 'projects/myproject/topics/topic1'";
    + *
    + * String ddlSink = "CREATE EXTERNAL TABLE pubsub_sink(" +
+ *     "attributes MAP<VARCHAR, VARCHAR>, payload ROW<name VARCHAR, size INTEGER>)" +
    + *     "TYPE pubsub LOCATION 'projects/myproject/topics/mytopic'";
    + *
+ * p.apply(SqlTransform.query(sql1).withDdlString(ddlSource).withDdlString(ddlSink));
    + *
    + * p.run().waitUntilFinish();
+ * }</pre>
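+ *
+ * <p>UDFs and UDAFs can be registered on the same transform. In the sketch below, {@code rows}
+ * (a {@code PCollection<Row>}), {@code MyUpperFn} (an implementation of {@link BeamSqlUdf}) and
+ * the {@code f_string} column are illustrative placeholders, not part of this change:
+ *
+ * <pre>{@code
+ * PCollection<Row> result =
+ *     rows.apply(
+ *         SqlTransform.query("SELECT my_upper(f_string) FROM PCOLLECTION")
+ *             .registerUdf("my_upper", MyUpperFn.class));
+ * }</pre>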
    */ @AutoValue -@Experimental @AutoValue.CopyAnnotations -@SuppressWarnings({ - "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public abstract class SqlTransform extends PTransform> { static final String PCOLLECTION_NAME = "PCOLLECTION"; abstract String queryString(); + abstract List ddlStrings(); + abstract QueryParameters queryParameters(); - abstract List udfDefinitions(); + abstract @Experimental List udfDefinitions(); - abstract List udafDefinitions(); + abstract @Experimental List udafDefinitions(); - abstract boolean autoUdfUdafLoad(); + abstract boolean autoLoading(); abstract Map tableProviderMap(); @@ -111,21 +131,27 @@ public abstract class SqlTransform extends PTransform> @Override public PCollection expand(PInput input) { - BeamSqlEnvBuilder sqlEnvBuilder = - BeamSqlEnv.builder(new ReadOnlyTableProvider(PCOLLECTION_NAME, toTableMap(input))); - - tableProviderMap().forEach(sqlEnvBuilder::addSchema); - - if (defaultTableProvider() != null) { - sqlEnvBuilder.setCurrentSchema(defaultTableProvider()); - } + TableProvider inputTableProvider = + new ReadOnlyTableProvider(PCOLLECTION_NAME, toTableMap(input)); + InMemoryMetaStore metaTableProvider = new InMemoryMetaStore(); + metaTableProvider.registerProvider(inputTableProvider); + BeamSqlEnvBuilder sqlEnvBuilder = BeamSqlEnv.builder(metaTableProvider); // TODO: validate duplicate functions. - sqlEnvBuilder.autoLoadBuiltinFunctions(); registerFunctions(sqlEnvBuilder); - if (autoUdfUdafLoad()) { + // Load automatic table providers before user ones so the user ones will cause a conflict if + // the same names are reused. + if (autoLoading()) { sqlEnvBuilder.autoLoadUserDefinedFunctions(); + ServiceLoader.load(TableProvider.class).forEach(metaTableProvider::registerProvider); + } + + tableProviderMap().forEach(sqlEnvBuilder::addSchema); + + final @Nullable String defaultTableProvider = defaultTableProvider(); + if (defaultTableProvider != null) { + sqlEnvBuilder.setCurrentSchema(defaultTableProvider); } sqlEnvBuilder.setQueryPlannerClassName( @@ -136,6 +162,7 @@ public PCollection expand(PInput input) { sqlEnvBuilder.setPipelineOptions(input.getPipeline().getOptions()); BeamSqlEnv sqlEnv = sqlEnvBuilder.build(); + ddlStrings().forEach(sqlEnv::executeDdl); return BeamSqlRelUtils.toPCollection( input.getPipeline(), sqlEnv.parseQuery(queryString(), queryParameters())); } @@ -192,17 +219,10 @@ private void registerFunctions(BeamSqlEnvBuilder sqlEnvBuilder) { *

    Any available implementation of {@link QueryPlanner} can be used as the query planner in * {@link SqlTransform}. An implementation can be specified globally for the entire pipeline with * {@link BeamSqlPipelineOptions#getPlannerName()}. The global planner can be overridden - * per-transform with {@link #withQueryPlannerClass(Class)}. + * per-transform with {@link #withQueryPlannerClass(Class)}. */ public static SqlTransform query(String queryString) { - return builder() - .setQueryString(queryString) - .setQueryParameters(QueryParameters.ofNone()) - .setUdafDefinitions(Collections.emptyList()) - .setUdfDefinitions(Collections.emptyList()) - .setTableProviderMap(Collections.emptyMap()) - .setAutoUdfUdafLoad(false) - .build(); + return builder().setQueryString(queryString).build(); } public SqlTransform withTableProvider(String name, TableProvider tableProvider) { @@ -215,6 +235,7 @@ public SqlTransform withDefaultTableProvider(String name, TableProvider tablePro return withTableProvider(name, tableProvider).toBuilder().setDefaultTableProvider(name).build(); } + @Experimental public SqlTransform withQueryPlannerClass(Class clazz) { return toBuilder().setQueryPlannerClassName(clazz.getName()).build(); } @@ -227,14 +248,21 @@ public SqlTransform withPositionalParameters(List parameters) { return toBuilder().setQueryParameters(QueryParameters.ofPositional(parameters)).build(); } - public SqlTransform withAutoUdfUdafLoad(boolean autoUdfUdafLoad) { - return toBuilder().setAutoUdfUdafLoad(autoUdfUdafLoad).build(); + public SqlTransform withDdlString(String ddlString) { + return toBuilder() + .setDdlStrings(ImmutableList.builder().addAll(ddlStrings()).add(ddlString).build()) + .build(); + } + + public SqlTransform withAutoLoading(boolean autoLoading) { + return toBuilder().setAutoLoading(autoLoading).build(); } /** * register a UDF function used in this query. * *

    Refer to {@link BeamSqlUdf} for more about how to implement a UDF in BeamSql. */ + @Experimental public SqlTransform registerUdf(String functionName, Class clazz) { return registerUdf(functionName, clazz, BeamSqlUdf.UDF_METHOD); } @@ -243,6 +271,7 @@ public SqlTransform registerUdf(String functionName, Class * Register {@link SerializableFunction} as a UDF function used in this query. Note, {@link * SerializableFunction} must have a constructor without arguments. */ + @Experimental public SqlTransform registerUdf(String functionName, SerializableFunction sfn) { return registerUdf(functionName, sfn.getClass(), "apply"); } @@ -258,6 +287,7 @@ private SqlTransform registerUdf(String functionName, Class clazz, String met } /** register a {@link Combine.CombineFn} as UDAF function used in this query. */ + @Experimental public SqlTransform registerUdaf(String functionName, Combine.CombineFn combineFn) { ImmutableList newUdafs = ImmutableList.builder() @@ -271,20 +301,29 @@ public SqlTransform registerUdaf(String functionName, Combine.CombineFn combineF abstract Builder toBuilder(); static Builder builder() { - return new AutoValue_SqlTransform.Builder(); + return new AutoValue_SqlTransform.Builder() + .setQueryParameters(QueryParameters.ofNone()) + .setDdlStrings(Collections.emptyList()) + .setUdafDefinitions(Collections.emptyList()) + .setUdfDefinitions(Collections.emptyList()) + .setTableProviderMap(Collections.emptyMap()) + .setAutoLoading(true); } @AutoValue.Builder + @AutoValue.CopyAnnotations abstract static class Builder { abstract Builder setQueryString(String queryString); abstract Builder setQueryParameters(QueryParameters queryParameters); - abstract Builder setUdfDefinitions(List udfDefinitions); + abstract Builder setDdlStrings(List ddlStrings); + + abstract @Experimental Builder setUdfDefinitions(List udfDefinitions); - abstract Builder setUdafDefinitions(List udafDefinitions); + abstract @Experimental Builder setUdafDefinitions(List udafDefinitions); - abstract Builder setAutoUdfUdafLoad(boolean autoUdfUdafLoad); + abstract Builder setAutoLoading(boolean autoLoading); abstract Builder setTableProviderMap(Map tableProviderMap); @@ -297,7 +336,7 @@ abstract static class Builder { @AutoValue @AutoValue.CopyAnnotations - @SuppressWarnings({"rawtypes"}) + @Experimental abstract static class UdfDefinition { abstract String udfName(); @@ -312,7 +351,7 @@ static UdfDefinition of(String udfName, Class clazz, String methodName) { @AutoValue @AutoValue.CopyAnnotations - @SuppressWarnings({"rawtypes"}) + @Experimental abstract static class UdafDefinition { abstract String udafName(); diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/TableNameExtractionUtils.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/TableNameExtractionUtils.java index c6b17740e771..5b1c416ba682 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/TableNameExtractionUtils.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/TableNameExtractionUtils.java @@ -23,13 +23,13 @@ import java.util.Collections; import java.util.List; import org.apache.beam.sdk.extensions.sql.impl.TableName; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlAsOperator; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlIdentifier; -import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlJoin; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlSelect; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlSetOperator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlAsOperator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlIdentifier; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlJoin; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlSelect; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlSetOperator; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; /** diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/TypedCombineFnDelegate.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/TypedCombineFnDelegate.java new file mode 100644 index 000000000000..e77e6a382aaa --- /dev/null +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/TypedCombineFnDelegate.java @@ -0,0 +1,164 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql; + +import java.lang.reflect.ParameterizedType; +import java.lang.reflect.TypeVariable; +import java.util.Optional; +import javax.annotation.Nonnull; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.coders.CannotProvideCoderException; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.CoderRegistry; +import org.apache.beam.sdk.transforms.Combine; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.util.Preconditions; +import org.apache.beam.sdk.values.TypeDescriptor; + +/** + * A {@link Combine.CombineFn} delegating all relevant calls to given delegate. This is used to + * create a type anonymous class for cases where the CombineFn is a generic class. The anonymous + * class can then be used in a UDAF as + * + *

+ * <pre>
+ *   .registerUdaf("UDAF", new TypedCombineFnDelegate<>(genericCombineFn) {})
+ * </pre>
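+ *
+ * <p>A slightly fuller sketch; {@code genericCombineFn} stands in for any generic
+ * {@code Combine.CombineFn} instance, and the function and column names are illustrative:
+ *
+ * <pre>
+ *   Combine.CombineFn<Long, List<Long>, Long> genericCombineFn = ...;
+ *   // The empty anonymous subclass pins the type parameters so they can be read reflectively.
+ *   SqlTransform.query("SELECT my_agg(f_long) FROM PCOLLECTION")
+ *       .registerUdaf("my_agg", new TypedCombineFnDelegate<Long, List<Long>, Long>(genericCombineFn) {});
+ * </pre>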
    + * + * @param the type of input + * @param the type of accumulator + * @param the type of output + */ +@Experimental +public class TypedCombineFnDelegate + extends Combine.CombineFn { + + private final Combine.CombineFn delegate; + + protected TypedCombineFnDelegate(Combine.CombineFn delegate) { + this.delegate = delegate; + } + + @Override + public TypeDescriptor getOutputType() { + return Optional.>ofNullable(getGenericSuperTypeAtIndex(2)) + .orElse(delegate.getOutputType()); + } + + @Override + public TypeDescriptor getInputType() { + return Optional.>ofNullable(getGenericSuperTypeAtIndex(0)) + .orElse(delegate.getInputType()); + } + + @Override + public AccumT createAccumulator() { + return delegate.createAccumulator(); + } + + @Override + public AccumT addInput(AccumT mutableAccumulator, InputT input) { + return delegate.addInput(mutableAccumulator, input); + } + + @Override + public AccumT mergeAccumulators(Iterable accumulators) { + return delegate.mergeAccumulators(accumulators); + } + + @Override + public OutputT extractOutput(AccumT accumulator) { + return delegate.extractOutput(accumulator); + } + + @Override + public AccumT compact(AccumT accumulator) { + return delegate.compact(accumulator); + } + + @Override + public OutputT apply(Iterable inputs) { + return delegate.apply(inputs); + } + + @Override + public OutputT defaultValue() { + return delegate.defaultValue(); + } + + @Override + public Coder getAccumulatorCoder(CoderRegistry registry, Coder inputCoder) + throws CannotProvideCoderException { + return delegate.getAccumulatorCoder(registry, inputCoder); + } + + @Override + public Coder getDefaultOutputCoder(CoderRegistry registry, Coder inputCoder) + throws CannotProvideCoderException { + return delegate.getDefaultOutputCoder(registry, inputCoder); + } + + @Override + public String getIncompatibleGlobalWindowErrorMessage() { + return delegate.getIncompatibleGlobalWindowErrorMessage(); + } + + @Override + public TypeVariable getInputTVariable() { + return delegate.getInputTVariable(); + } + + @Override + public TypeVariable getAccumTVariable() { + return delegate.getAccumTVariable(); + } + + @Override + public TypeVariable getOutputTVariable() { + return delegate.getOutputTVariable(); + } + + @Override + public void populateDisplayData(DisplayData.Builder builder) { + delegate.populateDisplayData(builder); + } + + @SuppressWarnings("unchecked") + @Nullable + private TypeDescriptor getGenericSuperTypeAtIndex(int index) { + Class cls = Preconditions.checkArgumentNotNull(getClass()); + do { + Class superClass = cls.getSuperclass(); + if (superClass == null) { + break; + } + if (superClass.equals(TypedCombineFnDelegate.class)) { + @Nonnull + ParameterizedType superType = + (ParameterizedType) Preconditions.checkArgumentNotNull(cls.getGenericSuperclass()); + TypeDescriptor candidate = + (TypeDescriptor) TypeDescriptor.of(superType.getActualTypeArguments()[index]); + if (!(candidate instanceof TypeVariable)) { + return candidate; + } + } + cls = superClass; + } while (true); + return null; + } +} diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamCalciteSchema.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamCalciteSchema.java index d3d376d0030e..accbf63c5352 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamCalciteSchema.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamCalciteSchema.java @@ -24,13 
+24,13 @@ import java.util.Set; import org.apache.beam.sdk.extensions.sql.meta.Table; import org.apache.beam.sdk.extensions.sql.meta.provider.TableProvider; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.tree.Expression; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelProtoDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.Function; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.Schema; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.SchemaPlus; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.SchemaVersion; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.Schemas; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.tree.Expression; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelProtoDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Function; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Schema; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.SchemaPlus; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.SchemaVersion; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Schemas; import org.checkerframework.checker.nullness.qual.Nullable; /** Adapter from {@link TableProvider} to {@link Schema}. */ @@ -101,7 +101,7 @@ public Set getTypeNames() { } @Override - public org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.Table getTable( + public org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Table getTable( String name) { Table table = tableProvider.getTable(name); if (table == null) { diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamCalciteSchemaFactory.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamCalciteSchemaFactory.java index b7b2a177a173..08ba342fe860 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamCalciteSchemaFactory.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamCalciteSchemaFactory.java @@ -27,16 +27,16 @@ import org.apache.beam.sdk.extensions.sql.meta.provider.TableProvider; import org.apache.beam.sdk.extensions.sql.meta.store.InMemoryMetaStore; import org.apache.beam.sdk.extensions.sql.meta.store.MetaStore; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableMap; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.CalciteConnection; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.tree.Expression; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelProtoDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.Function; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.Schema; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.SchemaFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.SchemaPlus; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.SchemaVersion; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.Table; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableMap; +import 
org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalciteConnection; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.tree.Expression; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelProtoDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Function; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Schema; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.SchemaFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.SchemaPlus; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.SchemaVersion; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Table; /** * Factory classes that Calcite uses to create initial schema for JDBC connection. diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamCalciteTable.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamCalciteTable.java index deade435b27d..d74e791670a7 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamCalciteTable.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamCalciteTable.java @@ -27,21 +27,21 @@ import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils; import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable; import org.apache.beam.sdk.options.PipelineOptions; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableMap; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.java.AbstractQueryableTable; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.QueryProvider; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.Queryable; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptTable; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.prepare.Prepare; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.TableModify; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.ModifiableTable; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.SchemaPlus; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.TranslatableTable; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.java.AbstractQueryableTable; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.QueryProvider; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.Queryable; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptTable; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.prepare.Prepare; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import 
org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.TableModify; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.ModifiableTable; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.SchemaPlus; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.TranslatableTable; /** Adapter from {@link BeamSqlTable} to a calcite Table. */ @SuppressWarnings({ diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamSqlEnv.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamSqlEnv.java index cbe32246d497..655a3fce8734 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamSqlEnv.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamSqlEnv.java @@ -17,15 +17,13 @@ */ package org.apache.beam.sdk.extensions.sql.impl; -import static org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkNotNull; +import static org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Preconditions.checkNotNull; -import java.lang.reflect.Method; import java.sql.SQLException; import java.util.AbstractMap.SimpleEntry; import java.util.Collection; import java.util.HashMap; import java.util.HashSet; -import java.util.List; import java.util.Map; import java.util.ServiceLoader; import java.util.Set; @@ -33,9 +31,9 @@ import org.apache.beam.sdk.annotations.Internal; import org.apache.beam.sdk.extensions.sql.BeamSqlUdf; import org.apache.beam.sdk.extensions.sql.impl.QueryPlanner.QueryParameters; +import org.apache.beam.sdk.extensions.sql.impl.parser.BeamSqlParser; import org.apache.beam.sdk.extensions.sql.impl.planner.BeamRuleSets; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamRelNode; -import org.apache.beam.sdk.extensions.sql.impl.udf.BeamBuiltinFunctionProvider; import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable; import org.apache.beam.sdk.extensions.sql.meta.provider.ReadOnlyTableProvider; import org.apache.beam.sdk.extensions.sql.meta.provider.TableProvider; @@ -45,12 +43,12 @@ import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.transforms.Combine.CombineFn; import org.apache.beam.sdk.transforms.SerializableFunction; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Strings; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.CalcitePrepare; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptUtil; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.Function; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlExecutableStatement; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RuleSet; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Strings; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalcitePrepare; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptUtil; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Function; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind; +import 
org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RuleSet; /** * Contains the metadata of tables/UDF functions, and exposes APIs to @@ -115,12 +113,11 @@ public BeamRelNode parseQuery(String query, QueryParameters queryParameters) } public boolean isDdl(String sqlStatement) throws ParseException { - return planner.parse(sqlStatement) instanceof SqlExecutableStatement; + return planner.parse(sqlStatement).getKind().belongsTo(SqlKind.DDL); } public void executeDdl(String sqlStatement) throws ParseException { - SqlExecutableStatement ddl = (SqlExecutableStatement) planner.parse(sqlStatement); - ddl.execute(getContext()); + BeamSqlParser.DDL_EXECUTOR.executeDdl(getContext(), planner.parse(sqlStatement)); } public CalcitePrepare.Context getContext() { @@ -148,7 +145,6 @@ public static class BeamSqlEnvBuilder { private String currentSchemaName; private Map schemaMap; private Set> functionSet; - private boolean autoLoadBuiltinFunctions; private boolean autoLoadUdfs; private PipelineOptions pipelineOptions; private Collection ruleSets; @@ -161,7 +157,6 @@ private BeamSqlEnvBuilder(TableProvider tableProvider) { schemaMap = new HashMap<>(); functionSet = new HashSet<>(); autoLoadUdfs = false; - autoLoadBuiltinFunctions = false; pipelineOptions = null; ruleSets = BeamRuleSets.getRuleSets(); } @@ -219,12 +214,6 @@ public BeamSqlEnvBuilder autoLoadUserDefinedFunctions() { return this; } - /** Load Beam SQL built-in functions defined in {@link BeamBuiltinFunctionProvider}. */ - public BeamSqlEnvBuilder autoLoadBuiltinFunctions() { - autoLoadBuiltinFunctions = true; - return this; - } - public BeamSqlEnvBuilder setQueryPlannerClassName(String name) { queryPlannerClassName = name; return this; @@ -247,14 +236,13 @@ public BeamSqlEnv build() { configureSchemas(jdbcConnection); - loadBeamBuiltinFunctions(); + QueryPlanner planner = instantiatePlanner(jdbcConnection, ruleSets); + // The planner may choose to add its own builtin functions to the schema, so load user-defined + // functions second, in case there's a conflict. 
loadUdfs(); - addUdfsUdafs(jdbcConnection); - QueryPlanner planner = instantiatePlanner(jdbcConnection, ruleSets); - return new BeamSqlEnv(jdbcConnection, planner); } @@ -275,25 +263,6 @@ private void configureSchemas(JdbcConnection jdbcConnection) { } } - private void loadBeamBuiltinFunctions() { - if (!autoLoadBuiltinFunctions) { - return; - } - - for (BeamBuiltinFunctionProvider provider : - ServiceLoader.load(BeamBuiltinFunctionProvider.class)) { - loadBuiltinUdf(provider.getBuiltinMethods()); - } - } - - private void loadBuiltinUdf(Map> methods) { - for (Map.Entry> entry : methods.entrySet()) { - for (Method method : entry.getValue()) { - functionSet.add(new SimpleEntry<>(entry.getKey(), UdfImpl.create(method))); - } - } - } - private void loadUdfs() { if (!autoLoadUdfs) { return; diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamSqlPipelineOptionsRegistrar.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamSqlPipelineOptionsRegistrar.java index 5a1d31353bea..ae88d401a33a 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamSqlPipelineOptionsRegistrar.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamSqlPipelineOptionsRegistrar.java @@ -20,7 +20,7 @@ import com.google.auto.service.AutoService; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.PipelineOptionsRegistrar; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; /** {@link AutoService} registrar for {@link BeamSqlPipelineOptions}. */ @AutoService(PipelineOptionsRegistrar.class) diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamTableStatistics.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamTableStatistics.java index b5d6a2ebdfd7..d66275dcaf75 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamTableStatistics.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamTableStatistics.java @@ -18,16 +18,17 @@ package org.apache.beam.sdk.extensions.sql.impl; import java.io.Serializable; +import java.util.Collections; import java.util.List; import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.annotations.Internal; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelCollation; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelDistribution; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelDistributionTraitDef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelReferentialConstraint; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.Statistic; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.ImmutableBitSet; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelCollation; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelDistribution; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelDistributionTraitDef; +import 
org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelReferentialConstraint; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Statistic; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.ImmutableBitSet; /** This class stores row count statistics. */ @Experimental @@ -76,6 +77,11 @@ public boolean isKey(ImmutableBitSet columns) { return false; } + @Override + public List getKeys() { + return Collections.emptyList(); + } + @Override public List getReferentialConstraints() { return ImmutableList.of(); diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/CalciteConnectionWrapper.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/CalciteConnectionWrapper.java index 0bdab106cb29..c99076d657b7 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/CalciteConnectionWrapper.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/CalciteConnectionWrapper.java @@ -35,14 +35,14 @@ import java.util.Map; import java.util.Properties; import java.util.concurrent.Executor; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.java.JavaTypeFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.config.CalciteConnectionConfig; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.CalciteConnection; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.CalcitePrepare; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.Enumerator; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.Queryable; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.tree.Expression; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.SchemaPlus; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.java.JavaTypeFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.config.CalciteConnectionConfig; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalciteConnection; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalcitePrepare; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.Enumerator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.Queryable; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.tree.Expression; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.SchemaPlus; /** * Abstract wrapper for {@link CalciteConnection} to simplify extension. 
diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/CalciteFactoryWrapper.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/CalciteFactoryWrapper.java index 6bd714ff6ba0..8fb51b7e26a3 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/CalciteFactoryWrapper.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/CalciteFactoryWrapper.java @@ -21,18 +21,18 @@ import java.sql.SQLException; import java.util.Properties; import java.util.TimeZone; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.java.JavaTypeFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.AvaticaConnection; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.AvaticaFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.AvaticaPreparedStatement; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.AvaticaResultSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.AvaticaSpecificDatabaseMetaData; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.AvaticaStatement; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.Meta; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.QueryState; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.UnregisteredDriver; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.CalciteFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.CalciteSchema; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.java.JavaTypeFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.AvaticaConnection; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.AvaticaFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.AvaticaPreparedStatement; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.AvaticaResultSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.AvaticaSpecificDatabaseMetaData; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.AvaticaStatement; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.Meta; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.QueryState; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.UnregisteredDriver; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalciteFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalciteSchema; /** * Wrapper for {@link CalciteFactory}. 
diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/CalciteQueryPlanner.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/CalciteQueryPlanner.java index b24b973fd618..8c1a05b7d4a1 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/CalciteQueryPlanner.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/CalciteQueryPlanner.java @@ -17,52 +17,55 @@ */ package org.apache.beam.sdk.extensions.sql.impl; +import java.lang.reflect.Method; import java.util.Collection; import java.util.List; import java.util.Map; +import java.util.ServiceLoader; import java.util.stream.Collectors; -import org.apache.beam.sdk.extensions.sql.impl.QueryPlanner.Factory; import org.apache.beam.sdk.extensions.sql.impl.QueryPlanner.QueryParameters.Kind; import org.apache.beam.sdk.extensions.sql.impl.planner.BeamCostModel; import org.apache.beam.sdk.extensions.sql.impl.planner.RelMdNodeStats; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamLogicalConvention; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamRelNode; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.config.CalciteConnectionConfig; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.CalciteSchema; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.Contexts; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.ConventionTraitDef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCost; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner.CannotPlanException; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptUtil; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitDef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.prepare.CalciteCatalogReader; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelRoot; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.BuiltInMetadata; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.ChainedRelMetadataProvider; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.JaninoRelMetadataProvider; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.MetadataDef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.MetadataHandler; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataProvider; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.SchemaPlus; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlOperatorTable; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.fun.SqlStdOperatorTable; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.parser.SqlParseException; 
-import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.parser.SqlParser; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.parser.SqlParserImplFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.util.ChainedSqlOperatorTable; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.FrameworkConfig; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.Frameworks; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.Planner; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelConversionException; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RuleSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.ValidationException; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.BuiltInMethod; +import org.apache.beam.sdk.extensions.sql.impl.udf.BeamBuiltinFunctionProvider; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.Table; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.config.CalciteConnectionConfig; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalciteSchema; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.Contexts; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.ConventionTraitDef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCost; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptPlanner.CannotPlanException; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptUtil; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitDef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.prepare.CalciteCatalogReader; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelRoot; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.BuiltInMetadata; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.ChainedRelMetadataProvider; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.JaninoRelMetadataProvider; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.MetadataDef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.MetadataHandler; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataProvider; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.SchemaPlus; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlOperatorTable; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.fun.SqlStdOperatorTable; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.parser.SqlParseException; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.parser.SqlParser; +import 
org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.parser.SqlParserImplFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.util.SqlOperatorTables; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.FrameworkConfig; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.Frameworks; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.Planner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RelConversionException; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RuleSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.ValidationException; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.BuiltInMethod; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -92,8 +95,20 @@ public CalciteQueryPlanner(JdbcConnection connection, Collection ruleSe @Override public QueryPlanner createPlanner( JdbcConnection jdbcConnection, Collection ruleSets) { + loadBuiltinFunctions(jdbcConnection); return new CalciteQueryPlanner(jdbcConnection, ruleSets); } + + private void loadBuiltinFunctions(JdbcConnection jdbcConnection) { + for (BeamBuiltinFunctionProvider provider : + ServiceLoader.load(BeamBuiltinFunctionProvider.class)) { + for (Map.Entry> entry : provider.getBuiltinMethods().entrySet()) { + for (Method method : entry.getValue()) { + jdbcConnection.getCurrentSchemaPlus().add(entry.getKey(), UdfImpl.create(method)); + } + } + } + } }; public FrameworkConfig defaultConfig(JdbcConnection connection, Collection ruleSets) { @@ -133,7 +148,7 @@ public FrameworkConfig defaultConfig(JdbcConnection connection, Collection costKeys = - mq.map.entrySet().stream() + List> costKeys = + mq.map.cellSet().stream() .filter(entry -> entry.getValue() instanceof BeamCostModel) .filter(entry -> ((BeamCostModel) entry.getValue()).isInfinite()) - .map(Map.Entry::getKey) .collect(Collectors.toList()); - costKeys.forEach(mq.map::remove); + costKeys.forEach(cell -> mq.map.remove(cell.getRowKey(), cell.getColumnKey())); return ((BeamRelNode) rel).beamComputeSelfCost(rel.getCluster().getPlanner(), mq); } diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/JavaUdfLoader.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/JavaUdfLoader.java new file mode 100644 index 000000000000..75a20e262a7f --- /dev/null +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/JavaUdfLoader.java @@ -0,0 +1,279 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.sql.impl; + +import com.google.auto.value.AutoValue; +import java.io.File; +import java.io.FileInputStream; +import java.io.IOException; +import java.io.InputStream; +import java.net.URL; +import java.net.URLClassLoader; +import java.nio.channels.ReadableByteChannel; +import java.nio.channels.WritableByteChannel; +import java.nio.file.ProviderNotFoundException; +import java.security.AccessController; +import java.security.PrivilegedAction; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.Iterator; +import java.util.List; +import java.util.Map; +import java.util.ServiceLoader; +import org.apache.beam.sdk.extensions.sql.udf.AggregateFn; +import org.apache.beam.sdk.extensions.sql.udf.ScalarFn; +import org.apache.beam.sdk.extensions.sql.udf.UdfProvider; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.io.fs.ResourceId; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.commons.codec.digest.DigestUtils; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteStreams; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * Loads {@link UdfProvider} implementations from user-provided jars. + * + *

    All UDFs are loaded and cached for each jar to mitigate IO costs. + */ +public class JavaUdfLoader { + private static final Logger LOG = LoggerFactory.getLogger(JavaUdfLoader.class); + + /** + * Maps the external jar location to the functions the jar defines. Static so it can persist + * across multiple SQL transforms. + */ + private static final Map functionCache = new HashMap<>(); + + /** Maps potentially remote jar paths to their local file copies. */ + private static final Map jarCache = new HashMap<>(); + + /** Load a user-defined scalar function from the specified jar. */ + public ScalarFn loadScalarFunction(List functionPath, String jarPath) { + String functionFullName = String.join(".", functionPath); + try { + FunctionDefinitions functionDefinitions = loadJar(jarPath); + if (!functionDefinitions.scalarFunctions().containsKey(functionPath)) { + throw new IllegalArgumentException( + String.format( + "No implementation of scalar function %s found in %s.%n" + + " 1. Create a class implementing %s and annotate it with @AutoService(%s.class).%n" + + " 2. Add function %s to the class's userDefinedScalarFunctions implementation.", + functionFullName, + jarPath, + UdfProvider.class.getSimpleName(), + UdfProvider.class.getSimpleName(), + functionFullName)); + } + return functionDefinitions.scalarFunctions().get(functionPath); + } catch (IOException e) { + throw new RuntimeException( + String.format( + "Failed to load user-defined scalar function %s from %s", functionFullName, jarPath), + e); + } + } + + /** Load a user-defined aggregate function from the specified jar. */ + public AggregateFn loadAggregateFunction(List functionPath, String jarPath) { + String functionFullName = String.join(".", functionPath); + try { + FunctionDefinitions functionDefinitions = loadJar(jarPath); + if (!functionDefinitions.aggregateFunctions().containsKey(functionPath)) { + throw new IllegalArgumentException( + String.format( + "No implementation of aggregate function %s found in %s.%n" + + " 1. Create a class implementing %s and annotate it with @AutoService(%s.class).%n" + + " 2. Add function %s to the class's userDefinedAggregateFunctions implementation.", + functionFullName, + jarPath, + UdfProvider.class.getSimpleName(), + UdfProvider.class.getSimpleName(), + functionFullName)); + } + return functionDefinitions.aggregateFunctions().get(functionPath); + } catch (IOException e) { + throw new RuntimeException( + String.format( + "Failed to load user-defined aggregate function %s from %s", + functionFullName, jarPath), + e); + } + } + + /** + * Creates a temporary local copy of the file at {@code inputPath}, and returns a handle to the + * local copy. + */ + private File downloadFile(String inputPath, String mimeType) throws IOException { + Preconditions.checkArgument(!inputPath.isEmpty(), "Path cannot be empty."); + ResourceId inputResource = FileSystems.matchNewResource(inputPath, false /* is directory */); + try (ReadableByteChannel inputChannel = FileSystems.open(inputResource)) { + File outputFile = File.createTempFile("sql-udf-", inputResource.getFilename()); + ResourceId outputResource = + FileSystems.matchNewResource(outputFile.getAbsolutePath(), false /* is directory */); + try (WritableByteChannel outputChannel = FileSystems.create(outputResource, mimeType)) { + ByteStreams.copy(inputChannel, outputChannel); + } + // Compute and log checksum. 
+ try (InputStream inputStream = new FileInputStream(outputFile)) { + LOG.info( + "Copied {} to {} with md5 hash {}.", + inputPath, + outputFile.getAbsolutePath(), + DigestUtils.md5Hex(inputStream)); + } + return outputFile; + } + } + + private File getLocalJar(String inputJarPath) throws IOException { + if (!jarCache.containsKey(inputJarPath)) { + jarCache.put(inputJarPath, downloadFile(inputJarPath, "application/java-archive")); + } + return jarCache.get(inputJarPath); + } + + /** + * This code creates a classloader, which needs permission if a security manager is installed. If + * this code might be invoked by code that does not have security permissions, then the + * classloader creation needs to occur inside a doPrivileged block. + */ + private URLClassLoader createUrlClassLoader(URL[] urls) { + return (URLClassLoader) + AccessController.doPrivileged( + new PrivilegedAction() { + @Override + public URLClassLoader run() { + return new URLClassLoader(urls); + } + }); + } + + private ClassLoader createClassLoader(String inputJarPath) throws IOException { + File tmpJar = getLocalJar(inputJarPath); + URL url = tmpJar.toURI().toURL(); + return createUrlClassLoader(new URL[] {url}); + } + + public ClassLoader createClassLoader(List inputJarPaths) throws IOException { + List urls = new ArrayList<>(); + for (String inputJar : inputJarPaths) { + urls.add(getLocalJar(inputJar).toURI().toURL()); + } + return createUrlClassLoader(urls.toArray(new URL[0])); + } + + @VisibleForTesting + Iterator getUdfProviders(ClassLoader classLoader) throws IOException { + return ServiceLoader.load(UdfProvider.class, classLoader).iterator(); + } + + private FunctionDefinitions loadJar(String jarPath) throws IOException { + if (functionCache.containsKey(jarPath)) { + LOG.debug("Using cached function definitions from {}", jarPath); + return functionCache.get(jarPath); + } + + ClassLoader classLoader = createClassLoader(jarPath); + Map, ScalarFn> scalarFunctions = new HashMap<>(); + Map, AggregateFn> aggregateFunctions = new HashMap<>(); + Iterator providers = getUdfProviders(classLoader); + int providersCount = 0; + while (providers.hasNext()) { + providersCount++; + UdfProvider provider = providers.next(); + provider + .userDefinedScalarFunctions() + .forEach( + (functionName, implementation) -> { + List functionPath = ImmutableList.copyOf(functionName.split("\\.")); + if (scalarFunctions.containsKey(functionPath)) { + throw new IllegalArgumentException( + String.format( + "Found multiple definitions of scalar function %s in %s.", + functionName, jarPath)); + } + scalarFunctions.put(functionPath, implementation); + }); + provider + .userDefinedAggregateFunctions() + .forEach( + (functionName, implementation) -> { + List functionPath = ImmutableList.copyOf(functionName.split("\\.")); + if (aggregateFunctions.containsKey(functionPath)) { + throw new IllegalArgumentException( + String.format( + "Found multiple definitions of aggregate function %s in %s.", + functionName, jarPath)); + } + aggregateFunctions.put(functionPath, implementation); + }); + } + if (providersCount == 0) { + throw new ProviderNotFoundException( + String.format( + "No %s implementation found in %s. 
+              "No %s implementation found in %s. Create a class implementing %s and annotate it with @AutoService(%s.class).",
+              UdfProvider.class.getSimpleName(),
+              jarPath,
+              UdfProvider.class.getSimpleName(),
+              UdfProvider.class.getSimpleName()));
+    }
+    LOG.info(
+        "Loaded {} implementations of {} from {} with {} scalar function(s).",
+        providersCount,
+        UdfProvider.class.getSimpleName(),
+        jarPath,
+        scalarFunctions.size());
+    FunctionDefinitions userFunctionDefinitions =
+        FunctionDefinitions.newBuilder()
+            .setScalarFunctions(ImmutableMap.copyOf(scalarFunctions))
+            .setAggregateFunctions(ImmutableMap.copyOf(aggregateFunctions))
+            .build();
+
+    functionCache.put(jarPath, userFunctionDefinitions);
+
+    return userFunctionDefinitions;
+  }
+
+  /** Holds user-defined function definitions. */
+  @AutoValue
+  abstract static class FunctionDefinitions {
+    abstract ImmutableMap<List<String>, ScalarFn> scalarFunctions();
+
+    abstract ImmutableMap<List<String>, AggregateFn> aggregateFunctions();
+
+    @AutoValue.Builder
+    abstract static class Builder {
+      abstract Builder setScalarFunctions(ImmutableMap<List<String>, ScalarFn> value);
+
+      abstract Builder setAggregateFunctions(ImmutableMap<List<String>, AggregateFn> value);
+
+      abstract FunctionDefinitions build();
+    }
+
+    static Builder newBuilder() {
+      return new AutoValue_JavaUdfLoader_FunctionDefinitions.Builder()
+          .setScalarFunctions(ImmutableMap.of())
+          .setAggregateFunctions(ImmutableMap.of());
+    }
+  }
+}
diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/JdbcConnection.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/JdbcConnection.java
index 3ae7177e92c5..2ff8d6f2e23d 100644
--- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/JdbcConnection.java
+++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/JdbcConnection.java
@@ -24,10 +24,10 @@
 import org.apache.beam.sdk.extensions.sql.meta.provider.TableProvider;
 import org.apache.beam.sdk.options.PipelineOptions;
 import org.apache.beam.sdk.values.KV;
-import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableMap;
-import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.CalciteConnection;
-import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.CalciteSchema;
-import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.SchemaPlus;
+import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableMap;
+import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalciteConnection;
+import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalciteSchema;
+import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.SchemaPlus;
 import org.checkerframework.checker.nullness.qual.Nullable;
 
 /**
diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/JdbcDriver.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/JdbcDriver.java
index 931d7b5ba475..1c08ea0f6fc0 100644
--- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/JdbcDriver.java
+++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/JdbcDriver.java
@@ -17,8 +17,8 @@
  */
 package org.apache.beam.sdk.extensions.sql.impl;
 
-import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.config.CalciteConnectionProperty.SCHEMA_FACTORY;
-import static org.apache.beam.vendor.calcite.v1_20_0.org.codehaus.commons.compiler.CompilerFactoryFactory.getDefaultCompilerFactory;
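For illustration only (not part of this change): a minimal sketch of what a UDF jar consumed by the JavaUdfLoader above might contain. The provider method names come from the loader's error messages and the ServiceLoader lookup; the exact signatures, class names, and function names below are assumptions.

// Hypothetical UDF jar entry, discovered by JavaUdfLoader via ServiceLoader.
// Assumed imports: com.google.auto.service.AutoService, com.google.common.collect.ImmutableMap,
// and the ScalarFn/AggregateFn/UdfProvider types from org.apache.beam.sdk.extensions.sql.udf.
@AutoService(UdfProvider.class)
public class MyUdfProvider implements UdfProvider {

  /** A scalar UDF; ScalarFnReflector (later in this change) requires exactly one public @ApplyMethod. */
  public static class IncrementFn extends ScalarFn {
    @ScalarFn.ApplyMethod
    public Long increment(Long i) {
      return i + 1;
    }
  }

  @Override
  public Map<String, ScalarFn> userDefinedScalarFunctions() {
    // The key "increment" becomes the function path passed to loadScalarFunction.
    return ImmutableMap.of("increment", new IncrementFn());
  }

  @Override
  public Map<String, AggregateFn<?, ?, ?>> userDefinedAggregateFunctions() {
    // Return type sketched; see UdfProvider for the exact signature.
    return ImmutableMap.of();
  }
}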
+import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.config.CalciteConnectionProperty.SCHEMA_FACTORY; +import static org.apache.beam.vendor.calcite.v1_26_0.org.codehaus.commons.compiler.CompilerFactoryFactory.getDefaultCompilerFactory; import com.fasterxml.jackson.databind.ObjectMapper; import com.google.auto.service.AutoService; @@ -32,19 +32,18 @@ import org.apache.beam.sdk.extensions.sql.impl.planner.BeamRuleSets; import org.apache.beam.sdk.extensions.sql.meta.provider.TableProvider; import org.apache.beam.sdk.options.PipelineOptions; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.AvaticaFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.CalciteConnection; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.CalciteFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.Driver; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitDef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.prepare.CalcitePrepareImpl; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelCollationTraitDef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.CalcRemoveRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.SortRemoveRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.runtime.Hook; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RuleSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.enumerable.EnumerableRules; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.AvaticaFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalciteConnection; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalciteFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.Driver; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitDef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelCollationTraitDef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.rules.CoreRules; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.runtime.Hook; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RuleSet; /** * Calcite JDBC driver with Beam defaults. 
@@ -86,10 +85,10 @@ public class JdbcDriver extends Driver { planner.addRule(rule); } } - planner.removeRule(CalcRemoveRule.INSTANCE); - planner.removeRule(SortRemoveRule.INSTANCE); + planner.removeRule(CoreRules.CALC_REMOVE); + planner.removeRule(CoreRules.SORT_REMOVE); - for (RelOptRule rule : CalcitePrepareImpl.ENUMERABLE_RULES) { + for (RelOptRule rule : EnumerableRules.ENUMERABLE_RULES) { planner.removeRule(rule); } diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/JdbcFactory.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/JdbcFactory.java index 22a6a524b8d7..19f1e4ce1e68 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/JdbcFactory.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/JdbcFactory.java @@ -18,26 +18,26 @@ package org.apache.beam.sdk.extensions.sql.impl; import static org.apache.beam.sdk.extensions.sql.impl.JdbcDriver.TOP_LEVEL_BEAM_SCHEMA; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.BuiltInConnectionProperty.TIME_ZONE; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.config.CalciteConnectionProperty.LEX; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.config.CalciteConnectionProperty.PARSER_FACTORY; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.config.CalciteConnectionProperty.SCHEMA; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.config.CalciteConnectionProperty.SCHEMA_FACTORY; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.config.CalciteConnectionProperty.TYPE_SYSTEM; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.BuiltInConnectionProperty.TIME_ZONE; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.config.CalciteConnectionProperty.LEX; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.config.CalciteConnectionProperty.PARSER_FACTORY; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.config.CalciteConnectionProperty.SCHEMA; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.config.CalciteConnectionProperty.SCHEMA_FACTORY; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.config.CalciteConnectionProperty.TYPE_SYSTEM; import java.util.Properties; -import org.apache.beam.sdk.extensions.sql.impl.parser.impl.BeamSqlParserImpl; +import org.apache.beam.sdk.extensions.sql.impl.parser.BeamSqlParser; import org.apache.beam.sdk.extensions.sql.impl.planner.BeamRelDataTypeSystem; import org.apache.beam.sdk.extensions.sql.meta.provider.TableProvider; import org.apache.beam.sdk.util.ReleaseInfo; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.java.JavaTypeFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.AvaticaConnection; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.AvaticaFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.ConnectionProperty; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.UnregisteredDriver; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.config.Lex; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.CalciteFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.CalciteSchema; +import 
org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.java.JavaTypeFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.AvaticaConnection; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.AvaticaFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.ConnectionProperty; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.UnregisteredDriver; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.config.Lex; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalciteFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalciteSchema; /** * Implements {@link CalciteFactory} that is used by Calcite JDBC driver to instantiate different * *

    The purpose of this class is to intercept the connection creation and force a cache-less root * schema ({@link - * org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.SimpleCalciteSchema}). Otherwise + * org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.SimpleCalciteSchema}). Otherwise * Calcite uses {@link - * org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.CachingCalciteSchema} that eagerly + * org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CachingCalciteSchema} that eagerly * caches table information. This behavior does not work well for dynamic table providers. */ class JdbcFactory extends CalciteFactoryWrapper { @@ -95,7 +95,7 @@ private Properties ensureDefaultProperties(Properties originalInfo) { setIfNull(info, TIME_ZONE, "UTC"); setIfNull(info, LEX, Lex.JAVA.name()); - setIfNull(info, PARSER_FACTORY, BeamSqlParserImpl.class.getName() + "#FACTORY"); + setIfNull(info, PARSER_FACTORY, BeamSqlParser.class.getName() + "#FACTORY"); setIfNull(info, TYPE_SYSTEM, BeamRelDataTypeSystem.class.getName()); setIfNull(info, SCHEMA, TOP_LEVEL_BEAM_SCHEMA); setIfNull(info, SCHEMA_FACTORY, BeamCalciteSchemaFactory.AllProviders.class.getName()); diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/LazyAggregateCombineFn.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/LazyAggregateCombineFn.java new file mode 100644 index 000000000000..81e7d2c9b8af --- /dev/null +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/LazyAggregateCombineFn.java @@ -0,0 +1,168 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.impl; + +import edu.umd.cs.findbugs.annotations.Nullable; +import java.lang.reflect.ParameterizedType; +import java.lang.reflect.Type; +import java.lang.reflect.TypeVariable; +import java.util.Iterator; +import java.util.List; +import org.apache.beam.sdk.coders.CannotProvideCoderException; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.CoderRegistry; +import org.apache.beam.sdk.extensions.sql.udf.AggregateFn; +import org.apache.beam.sdk.transforms.Combine; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; + +/** + * {@link org.apache.beam.sdk.transforms.Combine.CombineFn} that wraps an {@link AggregateFn}. The + * {@link AggregateFn} is lazily instantiated so it doesn't have to be serialized/deserialized. 
+ */ +public class LazyAggregateCombineFn + extends Combine.CombineFn { + private final List functionPath; + private final String jarPath; + private transient @Nullable AggregateFn aggregateFn = null; + + public LazyAggregateCombineFn(List functionPath, String jarPath) { + this.functionPath = functionPath; + this.jarPath = jarPath; + } + + @VisibleForTesting + LazyAggregateCombineFn(AggregateFn aggregateFn) { + this.functionPath = ImmutableList.of(); + this.jarPath = ""; + this.aggregateFn = aggregateFn; + } + + private AggregateFn getAggregateFn() { + if (aggregateFn == null) { + JavaUdfLoader loader = new JavaUdfLoader(); + aggregateFn = loader.loadAggregateFunction(functionPath, jarPath); + } + return aggregateFn; + } + + @Override + public AccumT createAccumulator() { + return getAggregateFn().createAccumulator(); + } + + @Override + public AccumT addInput(AccumT mutableAccumulator, InputT input) { + return getAggregateFn().addInput(mutableAccumulator, input); + } + + @Override + public AccumT mergeAccumulators(Iterable accumulators) { + AccumT first = accumulators.iterator().next(); + Iterable rest = new SkipFirstElementIterable<>(accumulators); + return getAggregateFn().mergeAccumulators(first, rest); + } + + @Override + public OutputT extractOutput(AccumT accumulator) { + return getAggregateFn().extractOutput(accumulator); + } + + @Override + public Coder getAccumulatorCoder(CoderRegistry registry, Coder inputCoder) + throws CannotProvideCoderException { + // Infer coder based on underlying AggregateFn instance. + return registry.getCoder( + getAggregateFn().getClass(), + AggregateFn.class, + ImmutableMap.>of(getInputTVariable(), inputCoder), + getAccumTVariable()); + } + + @Override + public TypeVariable getAccumTVariable() { + return AggregateFn.class.getTypeParameters()[1]; + } + + public UdafImpl getUdafImpl() { + return new LazyUdafImpl<>(this); + } + + @Override + public String toString() { + return String.format( + "%s %s from jar %s", + LazyAggregateCombineFn.class.getSimpleName(), String.join(".", functionPath), jarPath); + } + + /** Wrapper {@link Iterable} which always skips its first element. */ + private static class SkipFirstElementIterable implements Iterable { + private final Iterable all; + + SkipFirstElementIterable(Iterable all) { + this.all = all; + } + + @Override + public Iterator iterator() { + Iterator it = all.iterator(); + it.next(); + return it; + } + } + + /** {@link UdafImpl} that defers type inference to the underlying {@link AggregateFn}. 
*/ + private static class LazyUdafImpl extends UdafImpl { + private final LazyAggregateCombineFn lazyFn; + + public LazyUdafImpl(LazyAggregateCombineFn lazyFn) { + super(lazyFn); + this.lazyFn = lazyFn; + } + + private Type[] getTypeArguments() { + Class clazz = lazyFn.getAggregateFn().getClass(); + while (clazz != null) { + for (Type genericInterface : clazz.getGenericInterfaces()) { + if (genericInterface instanceof ParameterizedType) { + ParameterizedType parameterizedType = ((ParameterizedType) genericInterface); + if (parameterizedType.getRawType().equals(AggregateFn.class)) { + return parameterizedType.getActualTypeArguments(); + } + } + } + clazz = clazz.getSuperclass(); + } + throw new IllegalStateException( + String.format( + "Cannot get type arguments for %s: must implement parameterized %s", + lazyFn, AggregateFn.class.getSimpleName())); + } + + @Override + protected Type getInputType() { + return getTypeArguments()[0]; + } + + @Override + protected Type getOutputType() { + return getTypeArguments()[2]; + } + } +} diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/QueryPlanner.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/QueryPlanner.java index 9ae01b6e86bb..48374a599d67 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/QueryPlanner.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/QueryPlanner.java @@ -22,10 +22,10 @@ import java.util.List; import java.util.Map; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamRelNode; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableMap; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RuleSet; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RuleSet; /** * An interface that planners should implement to convert sql statement to {@link BeamRelNode} or diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/ScalarFnReflector.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/ScalarFnReflector.java new file mode 100644 index 000000000000..57a14bcc4558 --- /dev/null +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/ScalarFnReflector.java @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
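Also for illustration only (hypothetical names; the signature is assumed from the delegation calls in LazyAggregateCombineFn above): an AggregateFn of the kind the wrapper loads lazily and delegates to.

// Hypothetical AggregateFn matching the createAccumulator/addInput/mergeAccumulators/extractOutput
// calls that LazyAggregateCombineFn forwards to the loaded implementation.
public static class SumLongFn implements AggregateFn<Long, Long, Long> {
  @Override
  public Long createAccumulator() {
    return 0L;
  }

  @Override
  public Long addInput(Long accumulator, Long input) {
    return accumulator + input;
  }

  @Override
  public Long mergeAccumulators(Long accumulator, Iterable<Long> others) {
    long sum = accumulator;
    for (Long other : others) {
      sum += other;
    }
    return sum;
  }

  @Override
  public Long extractOutput(Long accumulator) {
    return accumulator;
  }
}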
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.impl; + +import java.lang.reflect.Method; +import java.lang.reflect.Modifier; +import java.util.Arrays; +import java.util.Collection; +import org.apache.beam.sdk.extensions.sql.udf.ScalarFn; +import org.apache.beam.sdk.util.common.ReflectHelpers; + +/** Reflection-based implementation logic for {@link ScalarFn}. */ +public class ScalarFnReflector { + /** + * Gets the method annotated with {@link + * org.apache.beam.sdk.extensions.sql.udf.ScalarFn.ApplyMethod} from {@code scalarFn}. + * + *

    There must be exactly one method annotated with {@link + * org.apache.beam.sdk.extensions.sql.udf.ScalarFn.ApplyMethod}, and it must be public. + */ + public static Method getApplyMethod(ScalarFn scalarFn) { + Class clazz = scalarFn.getClass(); + Collection matches = + ReflectHelpers.declaredMethodsWithAnnotation( + ScalarFn.ApplyMethod.class, clazz, ScalarFn.class); + + if (matches.isEmpty()) { + throw new IllegalArgumentException( + String.format( + "No method annotated with @%s found in class %s.", + ScalarFn.ApplyMethod.class.getSimpleName(), clazz.getName())); + } + + // If we have at least one match, then either it should be the only match + // or it should be an extension of the other matches (which came from parent + // classes). + Method first = matches.iterator().next(); + for (Method other : matches) { + if (!first.getName().equals(other.getName()) + || !Arrays.equals(first.getParameterTypes(), other.getParameterTypes())) { + throw new IllegalArgumentException( + String.format( + "Found multiple methods annotated with @%s. [%s] and [%s]", + ScalarFn.ApplyMethod.class.getSimpleName(), + ReflectHelpers.formatMethod(first), + ReflectHelpers.formatMethod(other))); + } + } + + // Method must be public. + if ((first.getModifiers() & Modifier.PUBLIC) == 0) { + throw new IllegalArgumentException( + String.format("Method %s is not public.", ReflectHelpers.formatMethod(first))); + } + + return first; + } +} diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/ScalarFunctionImpl.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/ScalarFunctionImpl.java index da8cb269748d..7649f5d7adb9 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/ScalarFunctionImpl.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/ScalarFunctionImpl.java @@ -17,7 +17,7 @@ */ package org.apache.beam.sdk.extensions.sql.impl; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Static.RESOURCE; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.Static.RESOURCE; import java.lang.reflect.Constructor; import java.lang.reflect.Method; @@ -27,29 +27,29 @@ import java.util.Arrays; import java.util.List; import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableMultimap; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.enumerable.CallImplementor; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.enumerable.NullPolicy; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.enumerable.ReflectiveCallNotNullImplementor; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.enumerable.RexImpTable; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.enumerable.RexToLixTranslator; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.util.ByteString; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.function.SemiStrict; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.function.Strict; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.tree.Expression; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.tree.Expressions; -import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.Function; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.ImplementableFunction; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.ScalarFunction; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlOperatorBinding; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Preconditions; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableMultimap; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.enumerable.CallImplementor; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.enumerable.NullPolicy; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.enumerable.ReflectiveCallNotNullImplementor; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.enumerable.RexImpTable; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.enumerable.RexToLixTranslator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.util.ByteString; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.function.SemiStrict; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.function.Strict; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.tree.Expression; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.tree.Expressions; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Function; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.ImplementableFunction; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.ScalarFunction; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlOperatorBinding; /** * Beam-customized version from {@link - * org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.impl.ScalarFunctionImpl} , to + * org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.impl.ScalarFunctionImpl} , to * address BEAM-5921. */ @SuppressWarnings({ @@ -60,15 +60,28 @@ public class ScalarFunctionImpl extends UdfImplReflectiveFunctionBase implements ScalarFunction, ImplementableFunction { private final CallImplementor implementor; + private final String jarPath; - /** Private constructor. */ - private ScalarFunctionImpl(Method method, CallImplementor implementor) { + protected ScalarFunctionImpl(Method method, CallImplementor implementor, String jarPath) { super(method); this.implementor = implementor; + this.jarPath = jarPath; + } + + protected ScalarFunctionImpl(Method method, CallImplementor implementor) { + this(method, implementor, ""); } /** - * Creates {@link org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.Function} for + * Optional Beam filesystem path to the jar containing the bytecode for this function. Empty if + * the function is assumed to already be on the classpath. 
+ */ + public String getJarPath() { + return jarPath; + } + + /** + * Creates {@link org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Function} for * each method in a given class. */ public static ImmutableMultimap createAll(Class clazz) { @@ -87,21 +100,14 @@ public static ImmutableMultimap createAll(Class clazz) { } /** - * Creates {@link org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.Function} from - * given class. - * - *

    If a method of the given name is not found or it does not suit, returns {@code null}. + * Creates {@link org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Function} from + * given method. When {@code eval} method does not suit, {@code null} is returned. * - * @param clazz class that is used to implement the function - * @param methodName Method name (typically "eval") - * @return created {@link ScalarFunction} or null + * @param method method that is used to implement the function + * @return created {@link Function} or null */ - public static Function create(Class clazz, String methodName) { - final Method method = findMethod(clazz, methodName); - if (method == null) { - return null; - } - return create(method); + public static Function create(Method method) { + return create(method, ""); } /** @@ -109,9 +115,16 @@ public static Function create(Class clazz, String methodName) { * given method. When {@code eval} method does not suit, {@code null} is returned. * * @param method method that is used to implement the function + * @param jarPath Path to jar that contains the method. * @return created {@link Function} or null */ - public static Function create(Method method) { + public static Function create(Method method, String jarPath) { + validateMethod(method); + CallImplementor implementor = createImplementor(method); + return new ScalarFunctionImpl(method, implementor, jarPath); + } + + protected static void validateMethod(Method method) { if (!Modifier.isStatic(method.getModifiers())) { Class clazz = method.getDeclaringClass(); if (!classHasPublicZeroArgsConstructor(clazz)) { @@ -121,9 +134,6 @@ public static Function create(Method method) { if (method.getExceptionTypes().length != 0) { throw new RuntimeException(method.getName() + " must not throw checked exception"); } - - CallImplementor implementor = createImplementor(method); - return new ScalarFunctionImpl(method, implementor); } @Override @@ -191,7 +201,7 @@ public Expression implement( } } - private static CallImplementor createImplementor(Method method) { + protected static CallImplementor createImplementor(Method method) { final NullPolicy nullPolicy = getNullPolicy(method); return RexImpTable.createImplementor( new ScalarReflectiveCallNotNullImplementor(method), nullPolicy, false); @@ -247,21 +257,6 @@ static boolean classHasPublicZeroArgsConstructor(Class clazz) { } return false; } - - /* - * Finds a method in a given class by name. 
- * @param clazz class to search method in - * @param name name of the method to find - * @return the first method with matching name or null when no method found - */ - static Method findMethod(Class clazz, String name) { - for (Method method : clazz.getMethods()) { - if (method.getName().equals(name) && !method.isBridge()) { - return method; - } - } - return null; - } } // End ScalarFunctionImpl.java diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/TableResolutionUtils.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/TableResolutionUtils.java index 842a7e0ddd8e..00ac8b5d9a15 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/TableResolutionUtils.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/TableResolutionUtils.java @@ -28,8 +28,8 @@ import org.apache.beam.sdk.extensions.sql.TableNameExtractionUtils; import org.apache.beam.sdk.extensions.sql.meta.CustomTableResolver; import org.apache.beam.sdk.extensions.sql.meta.provider.TableProvider; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.CalciteSchema; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalciteSchema; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlNode; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -172,7 +172,7 @@ private static List tablesForSchema( */ private static class SchemaWithName { String name; - org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.Schema schema; + org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Schema schema; static SchemaWithName create(JdbcConnection connection, String name) { SchemaWithName schemaWithName = new SchemaWithName(); diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/UdafImpl.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/UdafImpl.java index e93edc16fdce..29ca04137f4e 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/UdafImpl.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/UdafImpl.java @@ -19,18 +19,21 @@ import java.io.Serializable; import java.lang.reflect.ParameterizedType; +import java.lang.reflect.Type; +import java.lang.reflect.TypeVariable; import java.util.ArrayList; import java.util.List; +import javax.annotation.Nullable; import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.annotations.Internal; import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils; import org.apache.beam.sdk.transforms.Combine.CombineFn; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.enumerable.AggImplementor; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.AggregateFunction; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.FunctionParameter; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.ImplementableAggFunction; +import 
org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.enumerable.AggImplementor; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.AggregateFunction; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.FunctionParameter; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.ImplementableAggFunction; /** Implement {@link AggregateFunction} to take a {@link CombineFn} as UDAF. */ @Experimental @@ -39,7 +42,7 @@ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) -public final class UdafImpl +public class UdafImpl implements AggregateFunction, ImplementableAggFunction, Serializable { private CombineFn combineFn; @@ -69,23 +72,16 @@ public String getName() { @Override public RelDataType getType(RelDataTypeFactory typeFactory) { - ParameterizedType parameterizedType = findCombineFnSuperClass(); - return CalciteUtils.sqlTypeWithAutoCast( - typeFactory, parameterizedType.getActualTypeArguments()[0]); - } - - private ParameterizedType findCombineFnSuperClass() { - Class clazz = combineFn.getClass(); - while (!clazz.getSuperclass().equals(CombineFn.class)) { - clazz = clazz.getSuperclass(); - } - - if (!(clazz.getGenericSuperclass() instanceof ParameterizedType)) { - throw new IllegalStateException( - "Subclass of " + CombineFn.class + " must be parameterized to be used as a UDAF"); - } else { - return (ParameterizedType) clazz.getGenericSuperclass(); + Type inputType = getInputType(); + if (inputType instanceof TypeVariable) { + throw new IllegalArgumentException( + "Unable to infer SQL type from type variable " + + inputType + + ". This usually means you are trying to use a generic type whose type information " + + "is not known at runtime. 
You can wrap your CombineFn into typed subclass" + + " by 'new TypedCombineFnDelegate<...>(combineFn) {}'"); } + return CalciteUtils.sqlTypeWithAutoCast(typeFactory, inputType); } @Override @@ -105,6 +101,34 @@ public AggImplementor getImplementor(boolean windowContext) { @Override public RelDataType getReturnType(RelDataTypeFactory typeFactory) { - return CalciteUtils.sqlTypeWithAutoCast(typeFactory, combineFn.getOutputType().getType()); + return CalciteUtils.sqlTypeWithAutoCast(typeFactory, getOutputType()); + } + + protected Type getInputType() { + @Nullable Type inputType = combineFn.getInputType().getType(); + if (inputType != null && !(inputType instanceof TypeVariable)) { + return inputType; + } + ParameterizedType parameterizedType = findCombineFnSuperClass(); + return parameterizedType.getActualTypeArguments()[0]; + } + + protected Type getOutputType() { + return combineFn.getOutputType().getType(); + } + + private ParameterizedType findCombineFnSuperClass() { + + Class clazz = combineFn.getClass(); + + while (!clazz.getSuperclass().equals(CombineFn.class)) { + clazz = clazz.getSuperclass(); + } + + if (!(clazz.getGenericSuperclass() instanceof ParameterizedType)) { + throw new IllegalStateException( + "Subclass of " + CombineFn.class + " must be parameterized to be used as a UDAF"); + } + return (ParameterizedType) clazz.getGenericSuperclass(); } } diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/UdfImpl.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/UdfImpl.java index a84b3ff5ac39..44e98d579a1f 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/UdfImpl.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/UdfImpl.java @@ -18,9 +18,9 @@ package org.apache.beam.sdk.extensions.sql.impl; import java.lang.reflect.Method; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.Function; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.TranslatableTable; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.impl.TableMacroImpl; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Function; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.TranslatableTable; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.impl.TableMacroImpl; /** Beam-customized facade behind {@link Function} to address BEAM-5921. */ @SuppressWarnings({ @@ -31,7 +31,7 @@ class UdfImpl { private UdfImpl() {} /** - * Creates {@link org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.Function} from + * Creates {@link org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Function} from * given class. * *
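Illustrative sketch only: with the getType change above, a CombineFn registered as a UDAF must carry concrete type arguments somewhere in its class hierarchy, otherwise the new TypeVariable check fires. A CombineFn like the following (hypothetical class name) passes that check because UdafImpl#getInputType can read Long from the parameterized superclass.

// Usable as a UDAF: the generic superclass is parameterized with concrete types, so
// UdafImpl resolves the input type to Long instead of an unresolved type variable.
public static class SumOfSquaresFn extends Combine.CombineFn<Long, Long, Long> {
  @Override
  public Long createAccumulator() {
    return 0L;
  }

  @Override
  public Long addInput(Long accumulator, Long input) {
    return accumulator + input * input;
  }

  @Override
  public Long mergeAccumulators(Iterable<Long> accumulators) {
    long sum = 0L;
    for (Long accumulator : accumulators) {
      sum += accumulator;
    }
    return sum;
  }

  @Override
  public Long extractOutput(Long accumulator) {
    return accumulator;
  }
}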

    If a method of the given name is not found or it does not suit, returns {@code null}. @@ -49,7 +49,7 @@ public static Function create(Class clazz, String methodName) { } /** - * Creates {@link org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.Function} from + * Creates {@link org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Function} from * given method. * * @param method method that is used to implement the function diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/UdfImplReflectiveFunctionBase.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/UdfImplReflectiveFunctionBase.java index 9146e0579fae..0faa084a034b 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/UdfImplReflectiveFunctionBase.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/UdfImplReflectiveFunctionBase.java @@ -20,16 +20,17 @@ import java.lang.reflect.Constructor; import java.lang.reflect.Method; import java.lang.reflect.Modifier; +import java.lang.reflect.Type; import java.util.ArrayList; import java.util.List; import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.Function; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.FunctionParameter; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.impl.ReflectiveFunctionBase; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.ReflectUtil; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Function; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.FunctionParameter; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.impl.ReflectiveFunctionBase; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.ReflectUtil; /** Beam-customized version from {@link ReflectiveFunctionBase}, to address BEAM-5921. */ @SuppressWarnings({ @@ -100,7 +101,7 @@ public static ParameterListBuilder builder() { /** * Helps build lists of {@link - * org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.FunctionParameter}. + * org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.FunctionParameter}. 
*/ public static class ParameterListBuilder { final List builder = new ArrayList<>(); @@ -113,8 +114,7 @@ public ParameterListBuilder add(final Class type, final String name) { return add(type, name, false); } - public ParameterListBuilder add( - final Class type, final String name, final boolean optional) { + public ParameterListBuilder add(final Type type, final String name, final boolean optional) { final int ordinal = builder.size(); builder.add( new FunctionParameter() { @@ -142,7 +142,7 @@ public boolean isOptional() { } public ParameterListBuilder addMethodParameters(Method method) { - final Class[] types = method.getParameterTypes(); + final Type[] types = method.getGenericParameterTypes(); for (int i = 0; i < types.length; i++) { add( types[i], diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/ZetaSqlUserDefinedSQLNativeTableValuedFunction.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/ZetaSqlUserDefinedSQLNativeTableValuedFunction.java index 07c9cbdde8a6..00d37827229b 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/ZetaSqlUserDefinedSQLNativeTableValuedFunction.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/ZetaSqlUserDefinedSQLNativeTableValuedFunction.java @@ -19,13 +19,13 @@ import java.util.List; import org.apache.beam.sdk.annotations.Internal; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.Function; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlIdentifier; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlOperandTypeChecker; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlOperandTypeInference; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlReturnTypeInference; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.validate.SqlUserDefinedFunction; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Function; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlIdentifier; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlOperandTypeChecker; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlOperandTypeInference; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlReturnTypeInference; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.validate.SqlUserDefinedFunction; /** This is a class to indicate that a TVF is a ZetaSQL SQL native UDTVF. 
*/ @Internal diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/CEPCall.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/CEPCall.java index fb1d6da04471..6ee91e81040d 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/CEPCall.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/CEPCall.java @@ -19,11 +19,11 @@ import java.util.ArrayList; import java.util.List; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLiteral; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexPatternFieldRef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlOperator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexPatternFieldRef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlOperator; /** * A {@code CEPCall} instance represents an operation (node) that contains an operator and a list of diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/CEPFieldRef.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/CEPFieldRef.java index 68aaf8d45e12..664bb5bb7da0 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/CEPFieldRef.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/CEPFieldRef.java @@ -17,7 +17,7 @@ */ package org.apache.beam.sdk.extensions.sql.impl.cep; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexPatternFieldRef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexPatternFieldRef; /** * A {@code CEPFieldRef} instance represents a node that points to a specified field in a {@code diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/CEPLiteral.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/CEPLiteral.java index d67576719eb4..5f83cf2d01f1 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/CEPLiteral.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/CEPLiteral.java @@ -20,7 +20,7 @@ import java.math.BigDecimal; import org.apache.beam.sdk.extensions.sql.impl.SqlConversionException; import org.apache.beam.sdk.schemas.Schema; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLiteral; import org.joda.time.ReadableDateTime; /** diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/CEPOperation.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/CEPOperation.java index 50f735ffcd47..13f6321d8cca 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/CEPOperation.java +++ 
b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/CEPOperation.java @@ -19,10 +19,10 @@ import java.io.Serializable; import org.apache.beam.sdk.extensions.sql.impl.SqlConversionException; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLiteral; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexPatternFieldRef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexPatternFieldRef; /** * {@code CEPOperation} is the base class for the evaluation operations defined in the {@code diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/CEPOperator.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/CEPOperator.java index 9dc3abb227e4..84fb72e978ad 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/CEPOperator.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/CEPOperator.java @@ -19,8 +19,8 @@ import java.io.Serializable; import java.util.Map; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlOperator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlOperator; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; /** diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/CEPPattern.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/CEPPattern.java index dc7a4072d780..ed6668840c5b 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/CEPPattern.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/CEPPattern.java @@ -23,7 +23,7 @@ import javax.annotation.Nullable; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexCall; /** Core pattern class that stores the definition of a single pattern. 
*/ @SuppressWarnings({ diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/CEPUtils.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/CEPUtils.java index 1c45f60875f4..00698281c4b4 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/CEPUtils.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/CEPUtils.java @@ -21,14 +21,14 @@ import java.util.List; import java.util.Map; import org.apache.beam.sdk.schemas.Schema; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelCollation; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelFieldCollation; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLiteral; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlOperator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelCollation; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelFieldCollation; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlOperator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.ImmutableBitSet; /** * Some utility methods for transforming Calcite's constructs into our own Beam constructs (for @@ -156,11 +156,10 @@ public static ArrayList makeOrderKeysFromCollation(RelCollation orderK } /** Transform the partition columns into serializable CEPFieldRef. */ - public static List getCEPFieldRefFromParKeys(List parKeys) { + public static List getCEPFieldRefFromParKeys(ImmutableBitSet partitionKeys) { ArrayList fieldList = new ArrayList<>(); - for (RexNode i : parKeys) { - RexInputRef parKey = (RexInputRef) i; - fieldList.add(new CEPFieldRef(parKey.getName(), parKey.getIndex())); + for (int index : partitionKeys.asList()) { + fieldList.add(new CEPFieldRef("Partition Key " + index, index)); } return fieldList; } diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/OrderKey.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/OrderKey.java index 85825e65f2da..07d6654d4ece 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/OrderKey.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/cep/OrderKey.java @@ -18,7 +18,7 @@ package org.apache.beam.sdk.extensions.sql.impl.cep; import java.io.Serializable; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelFieldCollation; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelFieldCollation; /** * The {@code OrderKey} class stores the information to sort a column. 
diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/BeamSqlParser.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/BeamSqlParser.java new file mode 100644 index 000000000000..c76a87b47c12 --- /dev/null +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/BeamSqlParser.java @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.impl.parser; + +import java.io.Reader; +import org.apache.beam.sdk.extensions.sql.impl.parser.impl.BeamSqlParserImpl; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalcitePrepare; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.server.DdlExecutor; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.parser.SqlAbstractParserImpl; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.parser.SqlParserImplFactory; + +public class BeamSqlParser { + + private BeamSqlParser() {} + + /** Parser factory. */ + public static final SqlParserImplFactory FACTORY = + new SqlParserImplFactory() { + @Override + public SqlAbstractParserImpl getParser(Reader stream) { + return BeamSqlParserImpl.FACTORY.getParser(stream); + } + + @Override + public DdlExecutor getDdlExecutor() { + return BeamSqlParser.DDL_EXECUTOR; + } + }; + + /** Ddl Executor. 
*/ + public static final DdlExecutor DDL_EXECUTOR = + (context, node) -> { + ((ExecutableStatement) node).execute(context); + }; + + interface ExecutableStatement { + void execute(CalcitePrepare.Context context); + } +} diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlCheckConstraint.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlCheckConstraint.java index a6d145dc3fd2..95c96de6387e 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlCheckConstraint.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlCheckConstraint.java @@ -18,15 +18,15 @@ package org.apache.beam.sdk.extensions.sql.impl.parser; import java.util.List; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlIdentifier; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlOperator; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlSpecialOperator; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlWriter; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.parser.SqlParserPos; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.ImmutableNullableList; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlIdentifier; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlOperator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlSpecialOperator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlWriter; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.parser.SqlParserPos; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.ImmutableNullableList; /** * Parse tree for {@code UNIQUE}, {@code PRIMARY KEY} constraints. 
diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlColumnDeclaration.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlColumnDeclaration.java index 1ffe80f0502d..ba3e9fed6a3f 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlColumnDeclaration.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlColumnDeclaration.java @@ -18,16 +18,16 @@ package org.apache.beam.sdk.extensions.sql.impl.parser; import java.util.List; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlDataTypeSpec; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlIdentifier; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlOperator; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlSpecialOperator; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlWriter; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.parser.SqlParserPos; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlDataTypeSpec; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlIdentifier; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlOperator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlSpecialOperator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlWriter; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.parser.SqlParserPos; /** Parse tree for column. 
*/ public class SqlColumnDeclaration extends SqlCall { diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlCreateExternalTable.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlCreateExternalTable.java index 900856109afd..e65f53235597 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlCreateExternalTable.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlCreateExternalTable.java @@ -19,8 +19,8 @@ import static com.alibaba.fastjson.JSON.parseObject; import static org.apache.beam.sdk.schemas.Schema.toSchema; -import static org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkNotNull; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Static.RESOURCE; +import static org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Preconditions.checkNotNull; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.Static.RESOURCE; import com.alibaba.fastjson.JSONObject; import java.util.List; @@ -28,25 +28,24 @@ import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils; import org.apache.beam.sdk.extensions.sql.meta.Table; import org.apache.beam.sdk.schemas.Schema; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.CalcitePrepare; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.CalciteSchema; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlCreate; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlExecutableStatement; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlIdentifier; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlOperator; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlSpecialOperator; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlUtil; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlWriter; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.parser.SqlParserPos; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Pair; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalcitePrepare; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalciteSchema; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlCreate; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlIdentifier; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlOperator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlSpecialOperator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlUtil; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlWriter; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.parser.SqlParserPos; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.Pair; /** Parse tree for {@code CREATE EXTERNAL TABLE} statement. 
*/ @SuppressWarnings({ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) -public class SqlCreateExternalTable extends SqlCreate implements SqlExecutableStatement { +public class SqlCreateExternalTable extends SqlCreate implements BeamSqlParser.ExecutableStatement { private final SqlIdentifier name; private final List columnList; private final SqlNode type; diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlCreateFunction.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlCreateFunction.java new file mode 100644 index 000000000000..1fd06438c0af --- /dev/null +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlCreateFunction.java @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.impl.parser; + +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.Static.RESOURCE; + +import java.lang.reflect.Method; +import java.util.Arrays; +import java.util.List; +import java.util.Objects; +import org.apache.beam.sdk.extensions.sql.impl.JavaUdfLoader; +import org.apache.beam.sdk.extensions.sql.impl.LazyAggregateCombineFn; +import org.apache.beam.sdk.extensions.sql.impl.ScalarFnReflector; +import org.apache.beam.sdk.extensions.sql.impl.ScalarFunctionImpl; +import org.apache.beam.sdk.extensions.sql.udf.ScalarFn; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalcitePrepare; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalciteSchema; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Function; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.SchemaPlus; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlCharStringLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlCreate; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlIdentifier; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlOperator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlSpecialOperator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlUtil; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlWriter; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.parser.SqlParserPos; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.Pair; +import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; + +/** Parse tree for {@code CREATE FUNCTION} statement. */ +public class SqlCreateFunction extends SqlCreate implements BeamSqlParser.ExecutableStatement { + private final boolean isAggregate; + private final SqlIdentifier functionName; + private final SqlNode jarPath; + + private static final SqlSpecialOperator OPERATOR = + new SqlSpecialOperator("CREATE FUNCTION", SqlKind.CREATE_FUNCTION); + + /** Creates a SqlCreateFunction. */ + public SqlCreateFunction( + SqlParserPos pos, + boolean replace, + SqlIdentifier functionName, + SqlNode jarPath, + boolean isAggregate) { + super(OPERATOR, pos, replace, false); + this.functionName = Objects.requireNonNull(functionName, "functionName"); + this.jarPath = Objects.requireNonNull(jarPath, "jarPath"); + this.isAggregate = isAggregate; + } + + @Override + public void unparse(SqlWriter writer, int leftPrec, int rightPrec) { + writer.keyword("CREATE"); + if (isAggregate) { + writer.keyword("AGGREGATE"); + } + writer.keyword("FUNCTION"); + functionName.unparse(writer, 0, 0); + writer.keyword("USING JAR"); + jarPath.unparse(writer, 0, 0); + } + + @Override + public SqlOperator getOperator() { + return OPERATOR; + } + + @Override + public List getOperandList() { + return Arrays.asList(functionName, jarPath); + } + + @Override + public void execute(CalcitePrepare.Context context) { + final Pair pair = SqlDdlNodes.schema(context, true, functionName); + SchemaPlus schema = pair.left.plus(); + String lastName = pair.right; + if (!schema.getFunctions(lastName).isEmpty()) { + throw SqlUtil.newContextException( + functionName.getParserPosition(), + RESOURCE.internal(String.format("Function %s is already defined.", lastName))); + } + JavaUdfLoader udfLoader = new JavaUdfLoader(); + // TODO(BEAM-12355) Support qualified function names. + List functionPath = ImmutableList.of(lastName); + if (!(jarPath instanceof SqlCharStringLiteral)) { + throw SqlUtil.newContextException( + jarPath.getParserPosition(), + RESOURCE.internal("Jar path is not instanceof SqlCharStringLiteral.")); + } + String unquotedJarPath = ((SqlCharStringLiteral) jarPath).getNlsString().getValue(); + if (isAggregate) { + // Try loading the aggregate function just to make sure it exists. LazyAggregateCombineFn will + // need to fetch it again at runtime. 
+ udfLoader.loadAggregateFunction(functionPath, unquotedJarPath); + LazyAggregateCombineFn combineFn = + new LazyAggregateCombineFn<>(functionPath, unquotedJarPath); + schema.add(lastName, combineFn.getUdafImpl()); + } else { + ScalarFn scalarFn = udfLoader.loadScalarFunction(functionPath, unquotedJarPath); + Method method = ScalarFnReflector.getApplyMethod(scalarFn); + Function function = ScalarFunctionImpl.create(method, unquotedJarPath); + schema.add(lastName, function); + } + } +} diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlDdlNodes.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlDdlNodes.java index 55c4fa4dcdfe..256b9b712863 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlDdlNodes.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlDdlNodes.java @@ -18,16 +18,16 @@ package org.apache.beam.sdk.extensions.sql.impl.parser; import java.util.List; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.CalcitePrepare; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.CalciteSchema; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlDataTypeSpec; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlIdentifier; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlLiteral; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.parser.SqlParserPos; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.NlsString; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Pair; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Util; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalcitePrepare; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalciteSchema; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlDataTypeSpec; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlIdentifier; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.parser.SqlParserPos; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.NlsString; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.Pair; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.Util; import org.checkerframework.checker.nullness.qual.Nullable; /** Utilities concerning {@link SqlNode} for DDL. 
*/ diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlDropObject.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlDropObject.java index 2801dcd4b34a..a6ea9f84774e 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlDropObject.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlDropObject.java @@ -17,27 +17,26 @@ */ package org.apache.beam.sdk.extensions.sql.impl.parser; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Static.RESOURCE; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.Static.RESOURCE; import java.util.List; import org.apache.beam.sdk.extensions.sql.impl.BeamCalciteSchema; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.CalcitePrepare; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.CalciteSchema; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlDrop; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlExecutableStatement; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlIdentifier; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlOperator; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlUtil; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlWriter; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.parser.SqlParserPos; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalcitePrepare; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalciteSchema; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlDrop; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlIdentifier; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlOperator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlUtil; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlWriter; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.parser.SqlParserPos; /** * Base class for parse trees of {@code DROP TABLE}, {@code DROP VIEW} and {@code DROP MATERIALIZED * VIEW} statements. */ -abstract class SqlDropObject extends SqlDrop implements SqlExecutableStatement { +abstract class SqlDropObject extends SqlDrop implements BeamSqlParser.ExecutableStatement { protected final SqlIdentifier name; /** Creates a SqlDropObject. 
*/ diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlDropTable.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlDropTable.java index 9541242a8604..3cc83a2884e6 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlDropTable.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlDropTable.java @@ -17,11 +17,11 @@ */ package org.apache.beam.sdk.extensions.sql.impl.parser; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlIdentifier; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlOperator; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlSpecialOperator; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.parser.SqlParserPos; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlIdentifier; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlOperator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlSpecialOperator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.parser.SqlParserPos; /** Parse tree for {@code DROP TABLE} statement. */ public class SqlDropTable extends SqlDropObject { diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlSetOptionBeam.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlSetOptionBeam.java index 98d64663aba9..cfb1c715bdf3 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlSetOptionBeam.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlSetOptionBeam.java @@ -17,24 +17,23 @@ */ package org.apache.beam.sdk.extensions.sql.impl.parser; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Static.RESOURCE; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.Static.RESOURCE; import org.apache.beam.sdk.extensions.sql.impl.BeamCalciteSchema; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.CalcitePrepare; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.CalciteSchema; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlExecutableStatement; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlIdentifier; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlSetOption; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlUtil; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.parser.SqlParserPos; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Pair; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalcitePrepare; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalciteSchema; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlIdentifier; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlSetOption; +import 
org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlUtil; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.parser.SqlParserPos; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.Pair; /** SQL parse tree node to represent {@code SET} and {@code RESET} statements. */ @SuppressWarnings({ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) -public class SqlSetOptionBeam extends SqlSetOption implements SqlExecutableStatement { +public class SqlSetOptionBeam extends SqlSetOption implements BeamSqlParser.ExecutableStatement { public SqlSetOptionBeam(SqlParserPos pos, String scope, SqlIdentifier name, SqlNode value) { super(pos, scope, name, value); diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamCostModel.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamCostModel.java index c3989cd62c7a..aa3a640709f1 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamCostModel.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamCostModel.java @@ -18,9 +18,9 @@ package org.apache.beam.sdk.extensions.sql.impl.planner; import java.util.Objects; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCost; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCostFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptUtil; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCost; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCostFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptUtil; import org.checkerframework.checker.nullness.qual.Nullable; /** @@ -218,7 +218,7 @@ public static BeamCostModel convertRelOptCost(RelOptCost ic) { /** * Implementation of {@link - * org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCostFactory} that creates + * org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCostFactory} that creates * {@link BeamCostModel}s. 
*/ public static class Factory implements RelOptCostFactory { diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamJavaTypeFactory.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamJavaTypeFactory.java index bc67b93a47e9..43b2c6b88add 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamJavaTypeFactory.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamJavaTypeFactory.java @@ -18,12 +18,12 @@ package org.apache.beam.sdk.extensions.sql.impl.planner; import java.lang.reflect.Type; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.java.JavaTypeFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.JavaTypeFactoryImpl; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.BasicSqlType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.IntervalSqlType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeName; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.java.JavaTypeFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.JavaTypeFactoryImpl; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.BasicSqlType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.IntervalSqlType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlTypeName; /** customized data type in Beam. */ public class BeamJavaTypeFactory extends JavaTypeFactoryImpl { @@ -40,6 +40,12 @@ public Type getJavaClass(RelDataType type) { return type.isNullable() ? Float.class : float.class; } } + // Map BINARY and VARBINARY to byte[] instead of ByteString so UDFs over these types don't + // require vendored Calcite. + if (type.getSqlTypeName() == SqlTypeName.BINARY + || type.getSqlTypeName() == SqlTypeName.VARBINARY) { + return byte[].class; + } return super.getJavaClass(type); } } diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamRelDataTypeSystem.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamRelDataTypeSystem.java index 445242200af9..838922bc7cf6 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamRelDataTypeSystem.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamRelDataTypeSystem.java @@ -17,9 +17,9 @@ */ package org.apache.beam.sdk.extensions.sql.impl.planner; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeSystem; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeSystemImpl; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeName; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeSystem; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeSystemImpl; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlTypeName; /** customized data type in Beam. 
*/ public class BeamRelDataTypeSystem extends RelDataTypeSystemImpl { @@ -43,13 +43,25 @@ public boolean shouldConvertRaggedUnionTypesToVarying() { return true; } + @Override + public int getDefaultPrecision(SqlTypeName typeName) { + switch (typeName) { + case TIME: + case TIMESTAMP: + case TIMESTAMP_WITH_LOCAL_TIME_ZONE: + return 6; // support microsecond precision + default: + return super.getDefaultPrecision(typeName); + } + } + @Override public int getMaxPrecision(SqlTypeName typeName) { switch (typeName) { case TIME: - return 6; // support microsecond time precision + case TIMESTAMP: case TIMESTAMP_WITH_LOCAL_TIME_ZONE: - return 6; // support microsecond datetime precision + return 6; // support microsecond precision default: return super.getMaxPrecision(typeName); } diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamRuleSets.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamRuleSets.java index d9bc23663b04..0b8210b9a88d 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamRuleSets.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamRuleSets.java @@ -24,6 +24,7 @@ import org.apache.beam.sdk.extensions.sql.impl.rule.BeamAggregateProjectMergeRule; import org.apache.beam.sdk.extensions.sql.impl.rule.BeamAggregationRule; import org.apache.beam.sdk.extensions.sql.impl.rule.BeamBasicAggregationRule; +import org.apache.beam.sdk.extensions.sql.impl.rule.BeamCalcMergeRule; import org.apache.beam.sdk.extensions.sql.impl.rule.BeamCalcRule; import org.apache.beam.sdk.extensions.sql.impl.rule.BeamCoGBKJoinRule; import org.apache.beam.sdk.extensions.sql.impl.rule.BeamEnumerableConverterRule; @@ -42,34 +43,14 @@ import org.apache.beam.sdk.extensions.sql.impl.rule.BeamUnnestRule; import org.apache.beam.sdk.extensions.sql.impl.rule.BeamValuesRule; import org.apache.beam.sdk.extensions.sql.impl.rule.BeamWindowRule; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.AggregateJoinTransposeRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.AggregateRemoveRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.AggregateUnionAggregateRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.CalcMergeRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.FilterAggregateTransposeRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.FilterCalcMergeRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.FilterJoinRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.FilterProjectTransposeRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.FilterSetOpTransposeRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.FilterToCalcRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.JoinCommuteRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.JoinPushExpressionsRule; -import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.ProjectCalcMergeRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.ProjectFilterTransposeRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.ProjectMergeRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.ProjectSetOpTransposeRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.ProjectSortTransposeRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.ProjectToCalcRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.ProjectToWindowRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.PruneEmptyRules; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.SortProjectTransposeRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.UnionEliminatorRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.UnionToDistinctRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RuleSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RuleSets; +import org.apache.beam.sdk.extensions.sql.impl.rule.LogicalCalcMergeRule; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.rules.CoreRules; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.rules.PruneEmptyRules; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RuleSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RuleSets; /** * {@link RuleSet} used in {@code BeamQueryPlanner}. It translates a standard Calcite {@link @@ -80,66 +61,68 @@ public class BeamRuleSets { private static final List LOGICAL_OPTIMIZATIONS = ImmutableList.of( // Rules for window functions - ProjectToWindowRule.PROJECT, + CoreRules.PROJECT_TO_LOGICAL_PROJECT_AND_WINDOW, // Rules so we only have to implement Calc - FilterCalcMergeRule.INSTANCE, - ProjectCalcMergeRule.INSTANCE, - FilterToCalcRule.INSTANCE, - ProjectToCalcRule.INSTANCE, + CoreRules.FILTER_CALC_MERGE, + CoreRules.PROJECT_CALC_MERGE, + CoreRules.FILTER_TO_CALC, + CoreRules.PROJECT_TO_CALC, BeamIOPushDownRule.INSTANCE, // disabled due to https://issues.apache.org/jira/browse/BEAM-6810 - // CalcRemoveRule.INSTANCE, - CalcMergeRule.INSTANCE, + // CoreRules.CALC_REMOVE, + + // Rules to merge matching Calcs together. 
+ LogicalCalcMergeRule.INSTANCE, + BeamCalcMergeRule.INSTANCE, // push a filter into a join - FilterJoinRule.FILTER_ON_JOIN, + CoreRules.FILTER_INTO_JOIN, // push filter into the children of a join - FilterJoinRule.JOIN, + CoreRules.JOIN_CONDITION_PUSH, // push filter through an aggregation - FilterAggregateTransposeRule.INSTANCE, + CoreRules.FILTER_AGGREGATE_TRANSPOSE, // push filter through set operation - FilterSetOpTransposeRule.INSTANCE, + CoreRules.FILTER_SET_OP_TRANSPOSE, // push project through set operation - ProjectSetOpTransposeRule.INSTANCE, + CoreRules.PROJECT_SET_OP_TRANSPOSE, // aggregation and projection rules BeamAggregateProjectMergeRule.INSTANCE, // push a projection past a filter or vice versa - ProjectFilterTransposeRule.INSTANCE, - FilterProjectTransposeRule.INSTANCE, + CoreRules.PROJECT_FILTER_TRANSPOSE, + CoreRules.FILTER_PROJECT_TRANSPOSE, // push a projection to the children of a join // merge projections - ProjectMergeRule.INSTANCE, - // ProjectRemoveRule.INSTANCE, + CoreRules.PROJECT_MERGE, + // CoreRules.PROJECT_REMOVE, // reorder sort and projection - SortProjectTransposeRule.INSTANCE, - ProjectSortTransposeRule.INSTANCE, + CoreRules.SORT_PROJECT_TRANSPOSE, // join rules - JoinPushExpressionsRule.INSTANCE, - JoinCommuteRule.INSTANCE, + CoreRules.JOIN_PUSH_EXPRESSIONS, + CoreRules.JOIN_COMMUTE, BeamJoinAssociateRule.INSTANCE, BeamJoinPushThroughJoinRule.RIGHT, BeamJoinPushThroughJoinRule.LEFT, // remove union with only a single child - UnionEliminatorRule.INSTANCE, + CoreRules.UNION_REMOVE, // convert non-all union into all-union + distinct - UnionToDistinctRule.INSTANCE, + CoreRules.UNION_TO_DISTINCT, // remove aggregation if it does not aggregate and input is already distinct - AggregateRemoveRule.INSTANCE, + CoreRules.AGGREGATE_REMOVE, // push aggregate through join - AggregateJoinTransposeRule.EXTENDED, + CoreRules.AGGREGATE_JOIN_TRANSPOSE_EXTENDED, // aggregate union rule - AggregateUnionAggregateRule.INSTANCE, + CoreRules.AGGREGATE_UNION_AGGREGATE, // reduce aggregate functions like AVG, STDDEV_POP etc. 
- // AggregateReduceFunctionsRule.INSTANCE, + // CoreRules.AGGREGATE_REDUCE_FUNCTIONS, // remove unnecessary sort rule // https://issues.apache.org/jira/browse/BEAM-5073 - // SortRemoveRule.INSTANCE, + // CoreRules.SORT_REMOVE,, BeamTableFunctionScanRule.INSTANCE, // prune empty results rules PruneEmptyRules.AGGREGATE_INSTANCE, diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/NodeStatsMetadata.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/NodeStatsMetadata.java index f0991afc5bdb..daa749bb0ef4 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/NodeStatsMetadata.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/NodeStatsMetadata.java @@ -18,12 +18,12 @@ package org.apache.beam.sdk.extensions.sql.impl.planner; import java.lang.reflect.Method; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.tree.Types; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.Metadata; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.MetadataDef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.MetadataHandler; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.tree.Types; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.Metadata; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.MetadataDef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.MetadataHandler; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataQuery; /** * This is a metadata used for row count and rate estimation. 
It extends Calcite's Metadata diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/RelMdNodeStats.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/RelMdNodeStats.java index df325bcb1e06..3eca22866436 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/RelMdNodeStats.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/RelMdNodeStats.java @@ -17,16 +17,18 @@ */ package org.apache.beam.sdk.extensions.sql.impl.planner; +import static org.apache.beam.sdk.util.Preconditions.checkArgumentNotNull; + import java.util.List; -import java.util.Map; import java.util.stream.Collectors; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamRelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.MetadataDef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.MetadataHandler; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataProvider; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.Table; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.MetadataDef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.MetadataHandler; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataProvider; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataQuery; /** * This is the implementation of NodeStatsMetadata. Methods to estimate rate and row count for @@ -71,16 +73,15 @@ private NodeStats getBeamNodeStats(BeamRelNode rel, RelMetadataQuery mq) { // wraps the metadata provider with CachingRelMetadataProvider. However, // CachingRelMetadataProvider checks timestamp before returning previous results. Therefore, // there wouldn't be a problem in that case. 
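    // In Calcite 1.26 the metadata cache (mq.map) is a Guava Table (see the
    // com.google.common.collect.Table import above) rather than a Map keyed by a composite List,
    // so stale NodeStats results are located through cellSet() and removed by (rowKey, columnKey).
    // Matching cells are collected into a list first so the cell-set view is not modified while
    // it is being iterated.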
- List keys = - mq.map.entrySet().stream() + List> keys = + mq.map.cellSet().stream() + .filter(entry -> entry != null) + .filter(entry -> entry.getValue() != null) .filter(entry -> entry.getValue() instanceof NodeStats) - .filter(entry -> ((NodeStats) entry.getValue()).isUnknown()) - .map(Map.Entry::getKey) + .filter(entry -> (checkArgumentNotNull((NodeStats) entry.getValue()).isUnknown())) .collect(Collectors.toList()); - for (List key : keys) { - mq.map.remove(key); - } + keys.forEach(cell -> mq.map.remove(cell.getRowKey(), cell.getColumnKey())); return rel.estimateNodeStats(mq); } diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/AbstractBeamCalcRel.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/AbstractBeamCalcRel.java index 7ef101e36613..0bcaae23a8ec 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/AbstractBeamCalcRel.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/AbstractBeamCalcRel.java @@ -20,15 +20,15 @@ import org.apache.beam.sdk.annotations.Internal; import org.apache.beam.sdk.extensions.sql.impl.planner.BeamCostModel; import org.apache.beam.sdk.extensions.sql.impl.planner.NodeStats; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLocalRef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Calc; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLocalRef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexProgram; /** BeamRelNode to replace {@code Project} and {@code Filter} node. 
*/ @Internal diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamAggregationRel.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamAggregationRel.java index a77d04d171bf..f3e14da9b96e 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamAggregationRel.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamAggregationRel.java @@ -19,7 +19,7 @@ import static java.util.stream.Collectors.toList; import static org.apache.beam.sdk.values.PCollection.IsBounded.BOUNDED; -import static org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Preconditions.checkArgument; import java.io.Serializable; import java.util.List; @@ -50,15 +50,15 @@ import org.apache.beam.sdk.values.PCollectionList; import org.apache.beam.sdk.values.Row; import org.apache.beam.sdk.values.WindowingStrategy; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelWriter; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Aggregate; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.AggregateCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.ImmutableBitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelWriter; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Aggregate; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.AggregateCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.ImmutableBitSet; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; import org.checkerframework.checker.nullness.qual.Nullable; import org.joda.time.Duration; @@ -83,6 +83,7 @@ public BeamAggregationRel( int windowFieldIndex) { super(cluster, traits, child, groupSet, groupSets, aggCalls); + assert getGroupType() == Group.SIMPLE; this.windowFn = windowFn; this.windowFieldIndex = windowFieldIndex; diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRel.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRel.java index 6dd0b72887ff..c4356959e727 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRel.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRel.java @@ -18,32 +18,37 @@ package 
org.apache.beam.sdk.extensions.sql.impl.rel; import static org.apache.beam.sdk.schemas.Schema.FieldType; -import static org.apache.beam.sdk.schemas.Schema.TypeName; -import static org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Preconditions.checkArgument; +import java.io.IOException; import java.lang.reflect.InvocationTargetException; -import java.lang.reflect.Method; import java.lang.reflect.Modifier; import java.lang.reflect.Type; import java.math.BigDecimal; +import java.sql.Date; +import java.sql.Time; +import java.sql.Timestamp; import java.time.LocalDate; import java.time.LocalDateTime; import java.time.LocalTime; import java.util.AbstractList; import java.util.AbstractMap; +import java.util.ArrayList; import java.util.Arrays; -import java.util.Collection; +import java.util.HashMap; import java.util.List; import java.util.Map; import java.util.Set; import java.util.TimeZone; +import java.util.stream.Collectors; import org.apache.beam.sdk.extensions.sql.impl.BeamSqlPipelineOptions; +import org.apache.beam.sdk.extensions.sql.impl.JavaUdfLoader; +import org.apache.beam.sdk.extensions.sql.impl.ScalarFunctionImpl; import org.apache.beam.sdk.extensions.sql.impl.planner.BeamJavaTypeFactory; import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils; import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils.CharType; import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils.TimeWithLocalTzType; import org.apache.beam.sdk.schemas.Schema; -import org.apache.beam.sdk.schemas.logicaltypes.DateTime; import org.apache.beam.sdk.schemas.logicaltypes.SqlTypes; import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.PTransform; @@ -51,44 +56,46 @@ import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionList; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.DataContext; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.enumerable.JavaRowFormat; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.enumerable.PhysType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.enumerable.PhysTypeImpl; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.enumerable.RexToLixTranslator; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.java.JavaTypeFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.util.ByteString; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.QueryProvider; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.tree.BlockBuilder; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.tree.Expression; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.tree.Expressions; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.tree.GotoExpressionKind; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.tree.MemberDeclaration; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.tree.ParameterExpression; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.tree.Types; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPredicateList; 
-import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexBuilder; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexSimplify; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexUtil; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.SchemaPlus; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.validate.SqlConformance; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.validate.SqlConformanceEnum; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.BuiltInMethod; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.DataContext; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.enumerable.JavaRowFormat; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.enumerable.PhysType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.enumerable.PhysTypeImpl; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.enumerable.RexToLixTranslator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.java.JavaTypeFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.util.ByteString; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.QueryProvider; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.tree.BlockBuilder; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.tree.Expression; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.tree.Expressions; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.tree.MemberDeclaration; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.tree.ParameterExpression; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.tree.Types; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptPredicateList; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Calc; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexBuilder; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexProgram; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexSimplify; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexUtil; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.runtime.SqlFunctions; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Function; +import 
org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.SchemaPlus; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlOperator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.validate.SqlConformance; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.validate.SqlConformanceEnum; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.validate.SqlUserDefinedFunction; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; import org.checkerframework.checker.nullness.qual.Nullable; import org.codehaus.commons.compiler.CompileException; import org.codehaus.janino.ScriptEvaluator; +import org.joda.time.DateTime; import org.joda.time.Instant; -import org.joda.time.ReadableInstant; /** BeamRelNode to replace {@code Project} and {@code Filter} node. */ @SuppressWarnings({ @@ -101,10 +108,7 @@ public class BeamCalcRel extends AbstractBeamCalcRel { private static final long NANOS_PER_MILLISECOND = 1000000L; private static final long MILLIS_PER_DAY = 86400000L; - private static final ParameterExpression outputSchemaParam = - Expressions.parameter(Schema.class, "outputSchema"); - private static final ParameterExpression processContextParam = - Expressions.parameter(DoFn.ProcessContext.class, "c"); + private static final ParameterExpression rowParam = Expressions.parameter(Row.class, "row"); public BeamCalcRel(RelOptCluster cluster, RelTraitSet traits, RelNode input, RexProgram program) { super(cluster, traits, input, program); @@ -128,8 +132,8 @@ private class Transform extends PTransform, PCollection expand(PCollectionList pinput) { @@ -148,9 +152,6 @@ public PCollection expand(PCollectionList pinput) { final PhysType physType = PhysTypeImpl.of(typeFactory, getRowType(), JavaRowFormat.ARRAY, false); - Expression input = - Expressions.convert_(Expressions.call(processContextParam, "element"), Row.class); - final RexBuilder rexBuilder = getCluster().getRexBuilder(); final RelMetadataQuery mq = RelMetadataQuery.instance(); final RelOptPredicateList predicates = mq.getPulledUpPredicates(getInput()); @@ -162,7 +163,7 @@ public PCollection expand(PCollectionList pinput) { program, typeFactory, builder, - new InputGetterImpl(input, upstream.getSchema()), + new InputGetterImpl(rowParam, upstream.getSchema()), null, conformance); @@ -174,57 +175,29 @@ public PCollection expand(PCollectionList pinput) { builder, physType, DataContext.ROOT, - new InputGetterImpl(input, upstream.getSchema()), + new InputGetterImpl(rowParam, upstream.getSchema()), null); - boolean verifyRowValues = - pinput.getPipeline().getOptions().as(BeamSqlPipelineOptions.class).getVerifyRowValues(); - - List listValues = Lists.newArrayListWithCapacity(expressions.size()); - for (int index = 0; index < expressions.size(); index++) { - Expression value = expressions.get(index); - FieldType toType = outputSchema.getField(index).getType(); - listValues.add(castOutput(value, toType)); - } - Method newArrayList = Types.lookupMethod(Arrays.class, "asList"); - Expression valueList = Expressions.call(newArrayList, listValues); - - // Expressions.call is equivalent to: output = - // Row.withSchema(outputSchema).attachValue(values); - Expression output = Expressions.call(Row.class, 
"withSchema", outputSchemaParam); - - if (verifyRowValues) { - Method attachValues = Types.lookupMethod(Row.Builder.class, "addValues", List.class); - output = Expressions.call(output, attachValues, valueList); - output = Expressions.call(output, "build"); - } else { - Method attachValues = Types.lookupMethod(Row.Builder.class, "attachValues", List.class); - output = Expressions.call(output, attachValues, valueList); - } - builder.add( - // Expressions.ifThen is equivalent to: - // if (condition) { - // c.output(output); - // } - Expressions.ifThen( + Expressions.ifThenElse( condition, - Expressions.makeGoto( - GotoExpressionKind.Sequence, - null, - Expressions.call( - processContextParam, - Types.lookupMethod(DoFn.ProcessContext.class, "output", Object.class), - output)))); + Expressions.return_(null, physType.record(expressions)), + Expressions.return_(null, Expressions.constant(null)))); - CalcFn calcFn = new CalcFn(builder.toBlock().toString(), outputSchema); + BeamSqlPipelineOptions options = + pinput.getPipeline().getOptions().as(BeamSqlPipelineOptions.class); + + CalcFn calcFn = + new CalcFn( + builder.toBlock().toString(), + outputSchema, + options.getVerifyRowValues(), + getJarPaths(program)); // validate generated code calcFn.compile(); - PCollection projectStream = upstream.apply(ParDo.of(calcFn)).setRowSchema(outputSchema); - - return projectStream; + return upstream.apply(ParDo.of(calcFn)).setRowSchema(outputSchema); } } @@ -232,22 +205,36 @@ public PCollection expand(PCollectionList pinput) { private static class CalcFn extends DoFn { private final String processElementBlock; private final Schema outputSchema; + private final boolean verifyRowValues; + private final List jarPaths; private transient @Nullable ScriptEvaluator se = null; - public CalcFn(String processElementBlock, Schema outputSchema) { + public CalcFn( + String processElementBlock, + Schema outputSchema, + boolean verifyRowValues, + List jarPaths) { this.processElementBlock = processElementBlock; this.outputSchema = outputSchema; + this.verifyRowValues = verifyRowValues; + this.jarPaths = jarPaths; } ScriptEvaluator compile() { ScriptEvaluator se = new ScriptEvaluator(); + if (!jarPaths.isEmpty()) { + try { + JavaUdfLoader udfLoader = new JavaUdfLoader(); + ClassLoader classLoader = udfLoader.createClassLoader(jarPaths); + se.setParentClassLoader(classLoader); + } catch (IOException e) { + throw new RuntimeException("Failed to load user-provided jar(s).", e); + } + } se.setParameters( - new String[] {outputSchemaParam.name, processContextParam.name, DataContext.ROOT.name}, - new Class[] { - (Class) outputSchemaParam.getType(), - (Class) processContextParam.getType(), - (Class) DataContext.ROOT.getType() - }); + new String[] {rowParam.name, DataContext.ROOT.name}, + new Class[] {(Class) rowParam.getType(), (Class) DataContext.ROOT.getType()}); + se.setReturnType(Object[].class); try { se.cook(processElementBlock); } catch (CompileException e) { @@ -265,149 +252,162 @@ public void setup() { @ProcessElement public void processElement(ProcessContext c) { assert se != null; + final Object[] v; try { - se.evaluate(new Object[] {outputSchema, c, CONTEXT_INSTANCE}); + v = (Object[]) se.evaluate(new Object[] {c.element(), CONTEXT_INSTANCE}); } catch (InvocationTargetException e) { throw new RuntimeException( "CalcFn failed to evaluate: " + processElementBlock, e.getCause()); } + if (v != null) { + Row row = toBeamRow(Arrays.asList(v), outputSchema, verifyRowValues); + c.output(row); + } } } - private static final Map 
rawTypeMap = - ImmutableMap.builder() - .put(TypeName.BYTE, Byte.class) - .put(TypeName.INT16, Short.class) - .put(TypeName.INT32, Integer.class) - .put(TypeName.INT64, Long.class) - .put(TypeName.FLOAT, Float.class) - .put(TypeName.DOUBLE, Double.class) - .build(); - - private static Expression castOutput(Expression value, FieldType toType) { - Expression returnValue = value; - if (value.getType() == Object.class || !(value.getType() instanceof Class)) { - // fast copy path, just pass object through - returnValue = value; - } else if (CalciteUtils.isDateTimeType(toType) - && !Types.isAssignableFrom(ReadableInstant.class, (Class) value.getType())) { - returnValue = castOutputTime(value, toType); - } else if (toType.getTypeName() == TypeName.DECIMAL - && !Types.isAssignableFrom(BigDecimal.class, (Class) value.getType())) { - returnValue = Expressions.new_(BigDecimal.class, value); - } else if (toType.getTypeName() == TypeName.BYTES - && Types.isAssignableFrom(ByteString.class, (Class) value.getType())) { - returnValue = - Expressions.condition( - Expressions.equal(value, Expressions.constant(null)), - Expressions.constant(null), - Expressions.call(value, "getBytes")); - } else if (((Class) value.getType()).isPrimitive() - || Types.isAssignableFrom(Number.class, (Class) value.getType())) { - Type rawType = rawTypeMap.get(toType.getTypeName()); - if (rawType != null) { - returnValue = Types.castIfNecessary(rawType, value); + private static List getJarPaths(RexProgram program) { + ImmutableList.Builder jarPaths = new ImmutableList.Builder<>(); + for (RexNode node : program.getExprList()) { + if (node instanceof RexCall) { + SqlOperator op = ((RexCall) node).op; + if (op instanceof SqlUserDefinedFunction) { + Function function = ((SqlUserDefinedFunction) op).function; + if (function instanceof ScalarFunctionImpl) { + String jarPath = ((ScalarFunctionImpl) function).getJarPath(); + if (!jarPath.isEmpty()) { + jarPaths.add(jarPath); + } + } + } } - } else if (Types.isAssignableFrom(Iterable.class, value.getType())) { - // Passing an Iterable into newArrayList gets interpreted to mean copying each individual - // element. We want the - // entire Iterable to be treated as a single element, so we cast to Object. 
- returnValue = Expressions.convert_(value, Object.class); } - returnValue = - Expressions.condition( - Expressions.equal(value, Expressions.constant(null)), - Expressions.constant(null), - returnValue); - return returnValue; + return jarPaths.build(); } - private static Expression castOutputTime(Expression value, FieldType toType) { - Expression valueDateTime = value; - - if (CalciteUtils.TIMESTAMP.typesEqual(toType) - || CalciteUtils.NULLABLE_TIMESTAMP.typesEqual(toType)) { - // Convert TIMESTAMP to joda Instant - if (value.getType() == java.sql.Timestamp.class) { - valueDateTime = Expressions.call(BuiltInMethod.TIMESTAMP_TO_LONG.method, valueDateTime); - } - valueDateTime = Expressions.new_(Instant.class, valueDateTime); - } else if (CalciteUtils.TIME.typesEqual(toType) - || CalciteUtils.NULLABLE_TIME.typesEqual(toType)) { - // Convert TIME to LocalTime - if (value.getType() == java.sql.Time.class) { - valueDateTime = Expressions.call(BuiltInMethod.TIME_TO_INT.method, valueDateTime); - } else if (value.getType() == Long.class) { - valueDateTime = Expressions.unbox(valueDateTime); - } - valueDateTime = - Expressions.multiply(valueDateTime, Expressions.constant(NANOS_PER_MILLISECOND)); - valueDateTime = Expressions.call(LocalTime.class, "ofNanoOfDay", valueDateTime); - } else if (CalciteUtils.DATE.typesEqual(toType) - || CalciteUtils.NULLABLE_DATE.typesEqual(toType)) { - // Convert DATE to LocalDate - if (value.getType() == java.sql.Date.class) { - valueDateTime = Expressions.call(BuiltInMethod.DATE_TO_INT.method, valueDateTime); - } else if (value.getType() == Long.class) { - valueDateTime = Expressions.unbox(valueDateTime); - } - valueDateTime = Expressions.call(LocalDate.class, "ofEpochDay", valueDateTime); - } else if (CalciteUtils.TIMESTAMP_WITH_LOCAL_TZ.typesEqual(toType) - || CalciteUtils.NULLABLE_TIMESTAMP_WITH_LOCAL_TZ.typesEqual(toType)) { - // Convert TimeStamp_With_Local_TimeZone to LocalDateTime - Expression dateValue = - Expressions.divide(valueDateTime, Expressions.constant(MILLIS_PER_DAY)); - Expression date = Expressions.call(LocalDate.class, "ofEpochDay", dateValue); - Expression timeValue = - Expressions.multiply( - Expressions.modulo(valueDateTime, Expressions.constant(MILLIS_PER_DAY)), - Expressions.constant(NANOS_PER_MILLISECOND)); - Expression time = Expressions.call(LocalTime.class, "ofNanoOfDay", timeValue); - valueDateTime = Expressions.call(LocalDateTime.class, "of", date, time); - } else { - throw new UnsupportedOperationException("Unknown DateTime type " + toType); + static Object toBeamObject(Object value, FieldType fieldType, boolean verifyValues) { + if (value == null) { + return null; } + switch (fieldType.getTypeName()) { + // BEAM-12176: Numbers aren't always the type we expect. 
+ case BYTE: + return ((Number) value).byteValue(); + case INT16: + return ((Number) value).shortValue(); + case INT32: + return ((Number) value).intValue(); + case INT64: + return ((Number) value).longValue(); + case FLOAT: + return ((Number) value).floatValue(); + case DOUBLE: + return ((Number) value).doubleValue(); + case DECIMAL: + if (value instanceof BigDecimal) { + return (BigDecimal) value; + } else if (value instanceof Long) { + return BigDecimal.valueOf((Long) value); + } else if (value instanceof Integer) { + return BigDecimal.valueOf((Integer) value); + } + return new BigDecimal(((Number) value).toString()); + case STRING: + return (String) value; + case BOOLEAN: + return (Boolean) value; + case DATETIME: + if (value instanceof Timestamp) { + value = SqlFunctions.toLong((Timestamp) value); + } + return Instant.ofEpochMilli(((Number) value).longValue()); + case BYTES: + if (value instanceof byte[]) { + return value; + } + return ((ByteString) value).getBytes(); + case ARRAY: + return toBeamList((List) value, fieldType.getCollectionElementType(), verifyValues); + case MAP: + return toBeamMap( + (Map) value, + fieldType.getMapKeyType(), + fieldType.getMapValueType(), + verifyValues); + case ROW: + if (value instanceof Object[]) { + value = Arrays.asList((Object[]) value); + } + return toBeamRow((List) value, fieldType.getRowSchema(), verifyValues); + case LOGICAL_TYPE: + String identifier = fieldType.getLogicalType().getIdentifier(); + if (CharType.IDENTIFIER.equals(identifier)) { + return (String) value; + } else if (TimeWithLocalTzType.IDENTIFIER.equals(identifier)) { + return Instant.ofEpochMilli(((Number) value).longValue()); + } else if (SqlTypes.DATE.getIdentifier().equals(identifier)) { + if (value instanceof Date) { + value = SqlFunctions.toInt((Date) value); + } + // BEAM-12175: value should always be Integer here, but it isn't. + return LocalDate.ofEpochDay(((Number) value).longValue()); + } else if (SqlTypes.TIME.getIdentifier().equals(identifier)) { + if (value instanceof Time) { + value = SqlFunctions.toInt((Time) value); + } + // BEAM-12175: value should always be Integer here, but it isn't. + return LocalTime.ofNanoOfDay(((Number) value).longValue() * NANOS_PER_MILLISECOND); + } else if (SqlTypes.DATETIME.getIdentifier().equals(identifier)) { + if (value instanceof Timestamp) { + value = SqlFunctions.toLong((Timestamp) value); + } + return LocalDateTime.of( + LocalDate.ofEpochDay(((Number) value).longValue() / MILLIS_PER_DAY), + LocalTime.ofNanoOfDay( + (((Number) value).longValue() % MILLIS_PER_DAY) * NANOS_PER_MILLISECOND)); + } else { + throw new UnsupportedOperationException("Unable to convert logical type " + identifier); + } + default: + throw new UnsupportedOperationException("Unable to convert " + fieldType.getTypeName()); + } + } + + private static List toBeamList( + List arrayValue, FieldType elementType, boolean verifyValues) { + return arrayValue.stream() + .map(e -> toBeamObject(e, elementType, verifyValues)) + .collect(Collectors.toList()); + } - // make conversion conditional on non-null input. 
- if (!((Class) value.getType()).isPrimitive()) { - valueDateTime = - Expressions.condition( - Expressions.equal(value, Expressions.constant(null)), - Expressions.constant(null), - valueDateTime); + private static Map toBeamMap( + Map mapValue, + FieldType keyType, + FieldType elementType, + boolean verifyValues) { + Map output = new HashMap<>(mapValue.size()); + for (Map.Entry entry : mapValue.entrySet()) { + output.put( + toBeamObject(entry.getKey(), keyType, verifyValues), + toBeamObject(entry.getValue(), elementType, verifyValues)); } + return output; + } - return valueDateTime; + private static Row toBeamRow(List structValue, Schema schema, boolean verifyValues) { + List objects = new ArrayList<>(schema.getFieldCount()); + assert structValue.size() == schema.getFieldCount(); + for (int i = 0; i < structValue.size(); i++) { + objects.add(toBeamObject(structValue.get(i), schema.getField(i).getType(), verifyValues)); + } + Row row = + verifyValues + ? Row.withSchema(schema).addValues(objects).build() + : Row.withSchema(schema).attachValues(objects); + return row; } private static class InputGetterImpl implements RexToLixTranslator.InputGetter { - private static final Map TYPE_CONVERSION_MAP = - ImmutableMap.builder() - .put(TypeName.BYTE, Byte.class) - .put(TypeName.BYTES, byte[].class) - .put(TypeName.INT16, Short.class) - .put(TypeName.INT32, Integer.class) - .put(TypeName.INT64, Long.class) - .put(TypeName.DECIMAL, BigDecimal.class) - .put(TypeName.FLOAT, Float.class) - .put(TypeName.DOUBLE, Double.class) - .put(TypeName.STRING, String.class) - .put(TypeName.DATETIME, ReadableInstant.class) - .put(TypeName.BOOLEAN, Boolean.class) - .put(TypeName.MAP, Map.class) - .put(TypeName.ARRAY, Collection.class) - .put(TypeName.ITERABLE, Iterable.class) - .put(TypeName.ROW, Row.class) - .build(); - - private static final Map LOGICAL_TYPE_TO_BASE_TYPE_MAP = - ImmutableMap.builder() - .put(SqlTypes.DATE.getIdentifier(), Long.class) - .put(SqlTypes.TIME.getIdentifier(), Long.class) - .put(TimeWithLocalTzType.IDENTIFIER, ReadableInstant.class) - .put(SqlTypes.DATETIME.getIdentifier(), Row.class) - .put(CharType.IDENTIFIER, String.class) - .build(); private final Expression input; private final Schema inputSchema; @@ -419,84 +419,189 @@ private InputGetterImpl(Expression input, Schema inputSchema) { @Override public Expression field(BlockBuilder list, int index, Type storageType) { - return value(list, index, storageType, input, inputSchema); + return getBeamField(list, index, input, inputSchema); } - private static Expression value( - BlockBuilder list, int index, Type storageType, Expression input, Schema schema) { + // Read field from Beam Row + private static Expression getBeamField( + BlockBuilder list, int index, Expression input, Schema schema) { if (index >= schema.getFieldCount() || index < 0) { throw new IllegalArgumentException("Unable to find value #" + index); } final Expression expression = list.append(list.newName("current"), input); - FieldType fromType = schema.getField(index).getType(); - Class convertTo = null; - if (storageType == Object.class) { - convertTo = Object.class; - } else if (fromType.getTypeName().isLogicalType()) { - convertTo = LOGICAL_TYPE_TO_BASE_TYPE_MAP.get(fromType.getLogicalType().getIdentifier()); - } else { - convertTo = TYPE_CONVERSION_MAP.get(fromType.getTypeName()); - } - if (convertTo == null) { - throw new UnsupportedOperationException("Unable to get " + fromType.getTypeName()); + FieldType fieldType = schema.getField(index).getType(); + Expression 
value; + switch (fieldType.getTypeName()) { + case BYTE: + value = Expressions.call(expression, "getByte", Expressions.constant(index)); + break; + case INT16: + value = Expressions.call(expression, "getInt16", Expressions.constant(index)); + break; + case INT32: + value = Expressions.call(expression, "getInt32", Expressions.constant(index)); + break; + case INT64: + value = Expressions.call(expression, "getInt64", Expressions.constant(index)); + break; + case DECIMAL: + value = Expressions.call(expression, "getDecimal", Expressions.constant(index)); + break; + case FLOAT: + value = Expressions.call(expression, "getFloat", Expressions.constant(index)); + break; + case DOUBLE: + value = Expressions.call(expression, "getDouble", Expressions.constant(index)); + break; + case STRING: + value = Expressions.call(expression, "getString", Expressions.constant(index)); + break; + case DATETIME: + value = Expressions.call(expression, "getDateTime", Expressions.constant(index)); + break; + case BOOLEAN: + value = Expressions.call(expression, "getBoolean", Expressions.constant(index)); + break; + case BYTES: + value = Expressions.call(expression, "getBytes", Expressions.constant(index)); + break; + case ARRAY: + value = Expressions.call(expression, "getArray", Expressions.constant(index)); + break; + case MAP: + value = Expressions.call(expression, "getMap", Expressions.constant(index)); + break; + case ROW: + value = Expressions.call(expression, "getRow", Expressions.constant(index)); + break; + case LOGICAL_TYPE: + String identifier = fieldType.getLogicalType().getIdentifier(); + if (CharType.IDENTIFIER.equals(identifier)) { + value = Expressions.call(expression, "getString", Expressions.constant(index)); + } else if (TimeWithLocalTzType.IDENTIFIER.equals(identifier)) { + value = Expressions.call(expression, "getDateTime", Expressions.constant(index)); + } else if (SqlTypes.DATE.getIdentifier().equals(identifier)) { + value = + Expressions.convert_( + Expressions.call( + expression, + "getLogicalTypeValue", + Expressions.constant(index), + Expressions.constant(LocalDate.class)), + LocalDate.class); + } else if (SqlTypes.TIME.getIdentifier().equals(identifier)) { + value = + Expressions.convert_( + Expressions.call( + expression, + "getLogicalTypeValue", + Expressions.constant(index), + Expressions.constant(LocalTime.class)), + LocalTime.class); + } else if (SqlTypes.DATETIME.getIdentifier().equals(identifier)) { + value = + Expressions.convert_( + Expressions.call( + expression, + "getLogicalTypeValue", + Expressions.constant(index), + Expressions.constant(LocalDateTime.class)), + LocalDateTime.class); + } else { + throw new UnsupportedOperationException("Unable to get logical type " + identifier); + } + break; + default: + throw new UnsupportedOperationException("Unable to get " + fieldType.getTypeName()); } - Expression value = - Expressions.convert_( - Expressions.call( - expression, - "getBaseValue", - Expressions.constant(index), - Expressions.constant(convertTo)), - convertTo); - return (storageType != Object.class) ? 
value(value, fromType) : value; + return toCalciteValue(value, fieldType); } - private static Expression value(Expression value, Schema.FieldType type) { - if (type.getTypeName().isLogicalType()) { - String logicalId = type.getLogicalType().getIdentifier(); - if (SqlTypes.TIME.getIdentifier().equals(logicalId)) { + // Value conversion: Beam => Calcite + private static Expression toCalciteValue(Expression value, FieldType fieldType) { + switch (fieldType.getTypeName()) { + case BYTE: + return Expressions.convert_(value, Byte.class); + case INT16: + return Expressions.convert_(value, Short.class); + case INT32: + return Expressions.convert_(value, Integer.class); + case INT64: + return Expressions.convert_(value, Long.class); + case DECIMAL: + return Expressions.convert_(value, BigDecimal.class); + case FLOAT: + return Expressions.convert_(value, Float.class); + case DOUBLE: + return Expressions.convert_(value, Double.class); + case STRING: + return Expressions.convert_(value, String.class); + case BOOLEAN: + return Expressions.convert_(value, Boolean.class); + case DATETIME: return nullOr( - value, Expressions.divide(value, Expressions.constant(NANOS_PER_MILLISECOND))); - } else if (SqlTypes.DATE.getIdentifier().equals(logicalId)) { - return value; - } else if (SqlTypes.DATETIME.getIdentifier().equals(logicalId)) { - Expression dateValue = - Expressions.call(value, "getInt64", Expressions.constant(DateTime.DATE_FIELD_NAME)); - Expression timeValue = - Expressions.call(value, "getInt64", Expressions.constant(DateTime.TIME_FIELD_NAME)); - Expression returnValue = - Expressions.add( - Expressions.multiply(dateValue, Expressions.constant(MILLIS_PER_DAY)), - Expressions.divide(timeValue, Expressions.constant(NANOS_PER_MILLISECOND))); - return nullOr(value, returnValue); - } else if (!CharType.IDENTIFIER.equals(logicalId)) { - throw new UnsupportedOperationException( - "Unknown LogicalType " + type.getLogicalType().getIdentifier()); - } - } else if (type.getTypeName().isMapType()) { - return nullOr(value, map(value, type.getMapValueType())); - } else if (CalciteUtils.isDateTimeType(type)) { - return nullOr(value, Expressions.call(value, "getMillis")); - } else if (type.getTypeName().isCompositeType()) { - return nullOr(value, row(value, type.getRowSchema())); - } else if (type.getTypeName().isCollectionType()) { - return nullOr(value, list(value, type.getCollectionElementType())); - } else if (type.getTypeName() == TypeName.BYTES) { - return nullOr( - value, Expressions.new_(ByteString.class, Types.castIfNecessary(byte[].class, value))); + value, Expressions.call(Expressions.convert_(value, DateTime.class), "getMillis")); + case BYTES: + return nullOr( + value, Expressions.new_(ByteString.class, Expressions.convert_(value, byte[].class))); + case ARRAY: + return nullOr(value, toCalciteList(value, fieldType.getCollectionElementType())); + case MAP: + return nullOr(value, toCalciteMap(value, fieldType.getMapValueType())); + case ROW: + return nullOr(value, toCalciteRow(value, fieldType.getRowSchema())); + case LOGICAL_TYPE: + String identifier = fieldType.getLogicalType().getIdentifier(); + if (CharType.IDENTIFIER.equals(identifier)) { + return Expressions.convert_(value, String.class); + } else if (TimeWithLocalTzType.IDENTIFIER.equals(identifier)) { + return nullOr( + value, Expressions.call(Expressions.convert_(value, DateTime.class), "getMillis")); + } else if (SqlTypes.DATE.getIdentifier().equals(identifier)) { + return nullOr( + value, + Expressions.call( + Expressions.box( + 
Expressions.call( + Expressions.convert_(value, LocalDate.class), "toEpochDay")), + "intValue")); + } else if (SqlTypes.TIME.getIdentifier().equals(identifier)) { + return nullOr( + value, + Expressions.call( + Expressions.box( + Expressions.divide( + Expressions.call( + Expressions.convert_(value, LocalTime.class), "toNanoOfDay"), + Expressions.constant(NANOS_PER_MILLISECOND))), + "intValue")); + } else if (SqlTypes.DATETIME.getIdentifier().equals(identifier)) { + value = Expressions.convert_(value, LocalDateTime.class); + Expression dateValue = + Expressions.call(Expressions.call(value, "toLocalDate"), "toEpochDay"); + Expression timeValue = + Expressions.call(Expressions.call(value, "toLocalTime"), "toNanoOfDay"); + Expression returnValue = + Expressions.add( + Expressions.multiply(dateValue, Expressions.constant(MILLIS_PER_DAY)), + Expressions.divide(timeValue, Expressions.constant(NANOS_PER_MILLISECOND))); + return nullOr(value, returnValue); + } else { + throw new UnsupportedOperationException("Unable to convert logical type " + identifier); + } + default: + throw new UnsupportedOperationException("Unable to convert " + fieldType.getTypeName()); } - - return value; } - private static Expression list(Expression input, FieldType elementType) { + private static Expression toCalciteList(Expression input, FieldType elementType) { ParameterExpression value = Expressions.parameter(Object.class); BlockBuilder block = new BlockBuilder(); - block.add(value(value, elementType)); + block.add(toCalciteValue(value, elementType)); return Expressions.new_( WrappedList.class, @@ -510,11 +615,11 @@ private static Expression list(Expression input, FieldType elementType) { block.toBlock()))); } - private static Expression map(Expression input, FieldType mapValueType) { + private static Expression toCalciteMap(Expression input, FieldType mapValueType) { ParameterExpression value = Expressions.parameter(Object.class); BlockBuilder block = new BlockBuilder(); - block.add(value(value, mapValueType)); + block.add(toCalciteValue(value, mapValueType)); return Expressions.new_( WrappedMap.class, @@ -528,14 +633,14 @@ private static Expression map(Expression input, FieldType mapValueType) { block.toBlock()))); } - private static Expression row(Expression input, Schema schema) { + private static Expression toCalciteRow(Expression input, Schema schema) { ParameterExpression row = Expressions.parameter(Row.class); ParameterExpression index = Expressions.parameter(int.class); BlockBuilder body = new BlockBuilder(/* optimizing= */ false); for (int i = 0; i < schema.getFieldCount(); i++) { BlockBuilder list = new BlockBuilder(/* optimizing= */ false, body); - Expression returnValue = value(list, i, /* storageType= */ null, row, schema); + Expression returnValue = getBeamField(list, i, row, schema); list.append(returnValue); diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCoGBKJoinRel.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCoGBKJoinRel.java index 9debc62f51e5..2bd3f429e8dc 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCoGBKJoinRel.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCoGBKJoinRel.java @@ -40,14 +40,14 @@ import org.apache.beam.sdk.values.PCollectionList; import org.apache.beam.sdk.values.Row; import org.apache.beam.sdk.values.WindowingStrategy; -import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.CorrelationId; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Join; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.JoinRelType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Pair; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.CorrelationId; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Join; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.JoinRelType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.Pair; /** * A {@code BeamJoinRel} which does CoGBK Join diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamEnumerableConverter.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamEnumerableConverter.java index 9b91285e034c..4b900d82e336 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamEnumerableConverter.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamEnumerableConverter.java @@ -17,7 +17,7 @@ */ package org.apache.beam.sdk.extensions.sql.impl.rel; -import static org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Preconditions.checkArgument; import java.io.IOException; import java.time.LocalDate; @@ -57,24 +57,24 @@ import org.apache.beam.sdk.values.PCollection.IsBounded; import org.apache.beam.sdk.values.PValue; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.enumerable.EnumerableRel; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.enumerable.EnumerableRelImplementor; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.enumerable.PhysType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.enumerable.PhysTypeImpl; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.Enumerable; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.Linq4j; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.tree.BlockBuilder; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.tree.Expression; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.tree.Expressions; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.ConventionTraitDef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCost; -import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.convert.ConverterImpl; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.enumerable.EnumerableRel; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.enumerable.EnumerableRelImplementor; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.enumerable.PhysType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.enumerable.PhysTypeImpl; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.Enumerable; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.Linq4j; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.tree.BlockBuilder; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.tree.Expression; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.tree.Expressions; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.ConventionTraitDef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCost; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.convert.ConverterImpl; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; import org.checkerframework.checker.nullness.qual.Nullable; import org.joda.time.Duration; import org.joda.time.ReadableInstant; diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIOSinkRel.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIOSinkRel.java index 3af4509cba33..ffe707fa6819 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIOSinkRel.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIOSinkRel.java @@ -17,28 +17,31 @@ */ package org.apache.beam.sdk.extensions.sql.impl.rel; -import static org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Preconditions.checkArgument; import java.util.List; import java.util.Map; import org.apache.beam.sdk.extensions.sql.impl.planner.BeamCostModel; import org.apache.beam.sdk.extensions.sql.impl.planner.NodeStats; import org.apache.beam.sdk.extensions.sql.impl.rule.BeamIOSinkRule; +import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils; import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.transforms.RenameFields; import 
org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionList; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptTable; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.prepare.Prepare; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.TableModify; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql2rel.RelStructuredTypeFlattener; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptTable; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.prepare.Prepare; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.TableModify; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql2rel.RelStructuredTypeFlattener; /** BeamRelNode to replace a {@code TableModify} node. 
*/ public class BeamIOSinkRel extends TableModify @@ -132,7 +135,8 @@ public PCollection expand(PCollectionList pinput) { "Wrong number of inputs for %s: %s", BeamIOSinkRel.class.getSimpleName(), pinput); - PCollection input = pinput.get(0); + Schema schema = CalciteUtils.toSchema(getExpectedInputRowType(0)); + PCollection input = pinput.get(0).apply(RenameFields.create()).setRowSchema(schema); sqlTable.buildIOWriter(input); diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIOSourceRel.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIOSourceRel.java index f67238457168..3917c8eef232 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIOSourceRel.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIOSourceRel.java @@ -17,7 +17,7 @@ */ package org.apache.beam.sdk.extensions.sql.impl.rel; -import static org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Preconditions.checkArgument; import java.util.List; import java.util.Map; @@ -31,15 +31,15 @@ import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionList; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCost; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptTable; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.prepare.RelOptTableImpl; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.TableScan; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCost; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptTable; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.prepare.RelOptTableImpl; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.TableScan; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; /** BeamRelNode to replace a {@code TableScan} node. 
*/ public class BeamIOSourceRel extends TableScan implements BeamRelNode { diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIntersectRel.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIntersectRel.java index 80db5038394a..e69e14aecf33 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIntersectRel.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIntersectRel.java @@ -24,13 +24,13 @@ import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionList; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Intersect; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.SetOp; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Intersect; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.SetOp; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataQuery; /** * {@code BeamRelNode} to replace a {@code Intersect} node. 
diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamJoinRel.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamJoinRel.java index d0a42cba3b1b..d81e8a750850 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamJoinRel.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamJoinRel.java @@ -25,23 +25,23 @@ import org.apache.beam.sdk.extensions.sql.impl.planner.NodeStats; import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable; import org.apache.beam.sdk.values.PCollection; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Optional; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.volcano.RelSubset; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.CorrelationId; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Join; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.JoinRelType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexFieldAccess; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLiteral; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Pair; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Optional; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.volcano.RelSubset; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.CorrelationId; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Join; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.JoinRelType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexFieldAccess; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexInputRef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.Pair; /** * An abstract 
{@code BeamRelNode} to implement Join Rels. diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamLogicalConvention.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamLogicalConvention.java index 0cbe757c2573..f744973ff328 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamLogicalConvention.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamLogicalConvention.java @@ -17,12 +17,12 @@ */ package org.apache.beam.sdk.extensions.sql.impl.rel; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.Convention; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.ConventionTraitDef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTrait; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitDef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.Convention; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.ConventionTraitDef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTrait; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitDef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; /** Convertion for Beam SQL. */ @SuppressWarnings({ diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamMatchRel.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamMatchRel.java index d970c6912213..37615c8e5806 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamMatchRel.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamMatchRel.java @@ -18,7 +18,7 @@ package org.apache.beam.sdk.extensions.sql.impl.rel; import static org.apache.beam.sdk.extensions.sql.impl.cep.CEPUtils.makeOrderKeysFromCollation; -import static org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Preconditions.checkArgument; import java.util.ArrayList; import java.util.List; @@ -48,18 +48,18 @@ import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionList; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelCollation; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Match; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall; -import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelCollation; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Match; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.ImmutableBitSet; /** * {@code BeamRelNode} to replace a {@code Match} node. @@ -87,7 +87,7 @@ public BeamMatchRel( RexNode after, Map> subsets, boolean allRows, - List partitionKeys, + ImmutableBitSet partitionKeys, RelCollation orderKeys, RexNode interval) { @@ -134,7 +134,7 @@ public PTransform, PCollection> buildPTransform() { private class MatchTransform extends PTransform, PCollection> { - private final List parKeys; + private final ImmutableBitSet partitionKeys; private final RelCollation orderKeys; private final Map measures; private final boolean allRows; @@ -142,13 +142,13 @@ private class MatchTransform extends PTransform, PCollectio private final Map patternDefs; public MatchTransform( - List parKeys, + ImmutableBitSet partitionKeys, RelCollation orderKeys, Map measures, boolean allRows, RexNode pattern, Map patternDefs) { - this.parKeys = parKeys; + this.partitionKeys = partitionKeys; this.orderKeys = orderKeys; this.measures = measures; this.allRows = allRows; @@ -168,9 +168,7 @@ public PCollection expand(PCollectionList pinput) { Schema outSchema = CalciteUtils.toSchema(getRowType()); Schema.Builder schemaBuilder = new Schema.Builder(); - for (RexNode i : parKeys) { - RexInputRef varNode = (RexInputRef) i; - int index = varNode.getIndex(); + for (int index : partitionKeys.asList()) { schemaBuilder.addField(upstreamSchema.getField(index)); } Schema partitionKeySchema = schemaBuilder.build(); @@ -217,7 +215,7 @@ public PCollection expand(PCollectionList pinput) { // apply the ParDo for the match process and measures clause // for now, support FINAL only // TODO: add support for FINAL/RUNNING - List cepParKeys = CEPUtils.getCEPFieldRefFromParKeys(parKeys); + List cepParKeys = CEPUtils.getCEPFieldRefFromParKeys(partitionKeys); PCollection outStream = orderedUpstream .apply( @@ -236,20 +234,20 @@ private static class MatchPattern extends DoFn>, Row> { private final Schema upstreamSchema; private final Schema outSchema; - private final List parKeys; + private final List partitionKeys; private final ArrayList pattern; private final List measures; private final boolean allRows; MatchPattern( Schema upstreamSchema, - List parKeys, + List partitionKeys, ArrayList pattern, List measures, boolean allRows, Schema outSchema) { this.upstreamSchema = upstreamSchema; - this.parKeys = 
parKeys; + this.partitionKeys = partitionKeys; this.pattern = pattern; this.measures = measures; this.allRows = allRows; @@ -283,18 +281,18 @@ public void processElement(@Element KV> keyRows, OutputReceiv Row.FieldValueBuilder newFieldBuilder = null; // add partition key columns - for (CEPFieldRef i : parKeys) { + for (CEPFieldRef i : partitionKeys) { int colIndex = i.getIndex(); Schema.Field parSchema = upstreamSchema.getField(colIndex); String fieldName = parSchema.getName(); if (!result.isEmpty()) { - Row parKeyRow = keyRows.getKey(); + Row partitionKeyRow = keyRows.getKey(); if (newFieldBuilder == null) { newFieldBuilder = - newRowBuilder.withFieldValue(fieldName, parKeyRow.getValue(fieldName)); + newRowBuilder.withFieldValue(fieldName, partitionKeyRow.getValue(fieldName)); } else { newFieldBuilder = - newFieldBuilder.withFieldValue(fieldName, parKeyRow.getValue(fieldName)); + newFieldBuilder.withFieldValue(fieldName, partitionKeyRow.getValue(fieldName)); } } else { break; @@ -432,7 +430,6 @@ public void processElement(@Element Row eleRow, OutputReceiver> out } } - @Override public Match copy( RelNode input, RelDataType rowType, @@ -444,7 +441,7 @@ public Match copy( RexNode after, Map> subsets, boolean allRows, - List partitionKeys, + ImmutableBitSet partitionKeys, RelCollation orderKeys, RexNode interval) { @@ -465,4 +462,24 @@ public Match copy( orderKeys, interval); } + + @Override + public RelNode copy(RelTraitSet traitSet, List inputs) { + return new BeamMatchRel( + getCluster(), + traitSet, + inputs.get(0), + rowType, + pattern, + strictStart, + strictEnd, + patternDefinitions, + measures, + after, + subsets, + allRows, + partitionKeys, + orderKeys, + interval); + } } diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamMinusRel.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamMinusRel.java index 5e9e075652de..53357b41710d 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamMinusRel.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamMinusRel.java @@ -24,13 +24,13 @@ import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionList; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Minus; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.SetOp; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Minus; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.SetOp; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataQuery; /** * {@code BeamRelNode} 
to replace a {@code Minus} node. diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamPushDownIOSourceRel.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamPushDownIOSourceRel.java index 1e9b551b8662..edc324001ce3 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamPushDownIOSourceRel.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamPushDownIOSourceRel.java @@ -17,7 +17,7 @@ */ package org.apache.beam.sdk.extensions.sql.impl.rel; -import static org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Preconditions.checkArgument; import java.util.List; import java.util.Map; @@ -33,13 +33,13 @@ import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionList; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptTable; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelWriter; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Preconditions; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptTable; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelWriter; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataQuery; public class BeamPushDownIOSourceRel extends BeamIOSourceRel { private final List usedFields; diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamRelNode.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamRelNode.java index 935ec6cae628..fc7bdf1f42e1 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamRelNode.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamRelNode.java @@ -25,9 +25,9 @@ import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionList; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataQuery; /** A {@link RelNode} that can also give a {@link PTransform} that 
implements the expression. */ @SuppressWarnings({ @@ -76,8 +76,8 @@ default Map getPipelineOptions() { * estimate its NodeStats, it may need NodeStat of its inputs. However, it should not call this * directly (because maybe its inputs are not physical yet). It should call {@link * org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils#getNodeStats( - * org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode, - * org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery)} + * org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode, + * org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataQuery)} * instead. */ NodeStats estimateNodeStats(RelMetadataQuery mq); diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSetOperatorRelBase.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSetOperatorRelBase.java index 9f0873426614..ca7bd212d24d 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSetOperatorRelBase.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSetOperatorRelBase.java @@ -17,7 +17,7 @@ */ package org.apache.beam.sdk.extensions.sql.impl.rel; -import static org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Preconditions.checkArgument; import java.io.Serializable; import org.apache.beam.sdk.extensions.sql.impl.transform.BeamSetOperatorsTransforms; diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSideInputJoinRel.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSideInputJoinRel.java index ebe38aeefbd3..7fedd9bb06b6 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSideInputJoinRel.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSideInputJoinRel.java @@ -32,14 +32,14 @@ import org.apache.beam.sdk.values.PCollection.IsBounded; import org.apache.beam.sdk.values.PCollectionList; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.CorrelationId; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Join; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.JoinRelType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Pair; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.CorrelationId; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Join; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.JoinRelType; +import 
org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.Pair; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; /** diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSideInputLookupJoinRel.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSideInputLookupJoinRel.java index b4dbd565bd2c..248fd98074a4 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSideInputLookupJoinRel.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSideInputLookupJoinRel.java @@ -26,13 +26,13 @@ import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionList; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.CorrelationId; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Join; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.JoinRelType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.CorrelationId; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Join; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.JoinRelType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; /** * A {@code BeamJoinRel} which does Lookup Join diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSortRel.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSortRel.java index 3c056f6280b0..a636ec2f1ef7 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSortRel.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSortRel.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.extensions.sql.impl.rel; -import static org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.MoreObjects.firstNonNull; -import static org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.MoreObjects.firstNonNull; +import static org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Preconditions.checkArgument; import java.io.Serializable; import java.math.BigDecimal; @@ -50,19 +50,19 @@ import org.apache.beam.sdk.values.PCollectionList; import org.apache.beam.sdk.values.Row; import org.apache.beam.sdk.values.WindowingStrategy; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner; -import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelCollation; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelCollationImpl; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelFieldCollation; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Sort; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLiteral; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeName; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelCollation; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelCollationImpl; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelFieldCollation; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Sort; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexInputRef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlTypeName; /** * {@code BeamRelNode} to replace a {@code Sort} node. 
@@ -112,7 +112,8 @@ public BeamSortRel( RexNode fetch) { super(cluster, traits, child, collation, offset, fetch); - List fieldExps = getChildExps(); + // https://issues.apache.org/jira/browse/CALCITE-4079?focusedCommentId=17165904&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17165904 + List fieldExps = getSortExps(); RelCollationImpl collationImpl = (RelCollationImpl) collation; List collations = collationImpl.getFieldCollations(); for (int i = 0; i < fieldExps.size(); i++) { diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSqlRelUtils.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSqlRelUtils.java index fbe6dd463d1e..9dd731f53673 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSqlRelUtils.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSqlRelUtils.java @@ -28,9 +28,9 @@ import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionList; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.volcano.RelSubset; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.volcano.RelSubset; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataQuery; /** Utilities for {@code BeamRelNode}. */ @SuppressWarnings({ @@ -50,7 +50,16 @@ private static PCollectionList buildPCollectionList( } else { return PCollectionList.of( inputRels.stream() - .map(input -> BeamSqlRelUtils.toPCollection(pipeline, (BeamRelNode) input, cache)) + .map( + input -> { + final BeamRelNode beamRel; + if (input instanceof RelSubset) { + beamRel = (BeamRelNode) ((RelSubset) input).getBest(); + } else { + beamRel = (BeamRelNode) input; + } + return BeamSqlRelUtils.toPCollection(pipeline, beamRel, cache); + }) .collect(Collectors.toList())); } } diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamTableFunctionScanRel.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamTableFunctionScanRel.java index 8f45a3fa4aef..7bf28e397d4c 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamTableFunctionScanRel.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamTableFunctionScanRel.java @@ -17,7 +17,7 @@ */ package org.apache.beam.sdk.extensions.sql.impl.rel; -import static org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Preconditions.checkArgument; import java.lang.reflect.Type; import java.util.ArrayList; @@ -51,18 +51,18 @@ import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionList; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.TableFunctionScan; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelColumnMapping; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLiteral; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.TableFunctionScan; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelColumnMapping; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexInputRef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.joda.time.Duration; diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUncollectRel.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUncollectRel.java index b569a22477db..40d3b5f27e26 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUncollectRel.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUncollectRel.java @@ -17,8 +17,9 @@ */ package org.apache.beam.sdk.extensions.sql.impl.rel; -import static org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Preconditions.checkArgument; +import java.util.Collections; import org.apache.beam.sdk.extensions.sql.impl.planner.BeamCostModel; import org.apache.beam.sdk.extensions.sql.impl.planner.NodeStats; import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils; @@ -29,12 +30,12 @@ import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionList; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Uncollect; -import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Uncollect; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataQuery; /** {@link BeamRelNode} to implement an uncorrelated {@link Uncollect}, aka UNNEST. */ @SuppressWarnings({ @@ -44,7 +45,7 @@ public class BeamUncollectRel extends Uncollect implements BeamRelNode { public BeamUncollectRel( RelOptCluster cluster, RelTraitSet traitSet, RelNode input, boolean withOrdinality) { - super(cluster, traitSet, input, withOrdinality); + super(cluster, traitSet, input, withOrdinality, Collections.emptyList()); } @Override diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnionRel.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnionRel.java index 5fc3d07f18be..2e410107a9ef 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnionRel.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnionRel.java @@ -25,13 +25,13 @@ import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionList; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.SetOp; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Union; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.SetOp; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Union; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataQuery; /** * {@link BeamRelNode} to replace a {@link Union}. 
diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnnestRel.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnnestRel.java index 811d642150f8..4fb1ec629057 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnnestRel.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnnestRel.java @@ -29,18 +29,18 @@ import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionList; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelWriter; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Correlate; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.JoinRelType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Uncollect; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.validate.SqlValidatorUtil; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelWriter; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Correlate; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.JoinRelType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Uncollect; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.validate.SqlValidatorUtil; import org.checkerframework.checker.nullness.qual.Nullable; /** diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamValuesRel.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamValuesRel.java index 9fa50370189b..c09077172478 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamValuesRel.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamValuesRel.java @@ -20,7 +20,7 @@ import static java.util.stream.Collectors.toList; import static org.apache.beam.sdk.extensions.sql.impl.schema.BeamTableUtils.autoCastField; import static org.apache.beam.sdk.values.Row.toRow; -import static 
org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Preconditions.checkArgument; import java.util.List; import java.util.Map; @@ -34,15 +34,15 @@ import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionList; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableMap; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Values; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Values; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLiteral; /** * {@code BeamRelNode} to replace a {@code Values} node. 
diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamWindowRel.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamWindowRel.java index d9a15a15be84..a1ed5cddb96a 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamWindowRel.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamWindowRel.java @@ -40,18 +40,18 @@ import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionList; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelFieldCollation; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.AggregateCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Window; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLiteral; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelFieldCollation; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.AggregateCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Window; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexInputRef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; /** diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/CalcRelSplitter.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/CalcRelSplitter.java new file mode 100644 index 000000000000..c388f62d2b7a --- /dev/null +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/CalcRelSplitter.java @@ -0,0 +1,898 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.impl.rel; + +import java.io.PrintWriter; +import java.io.StringWriter; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.Set; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Calc; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalCalc; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexDynamicParam; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexFieldAccess; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexInputRef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLocalRef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexProgram; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexShuttle; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexUtil; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexVisitorImpl; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RelBuilder; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.Litmus; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.Util; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.graph.DefaultDirectedGraph; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.graph.DefaultEdge; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.graph.DirectedGraph; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.graph.TopologicalOrderIterator; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.primitives.Ints; +import org.checkerframework.checker.nullness.qual.Nullable; +import org.slf4j.Logger; + +/** + * CalcRelSplitter operates on a {@link Calc} with multiple {@link RexCall} sub-expressions that + * cannot all be implemented by a single concrete {@link RelNode}. + * + *

    This is a copy of {@link + * org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.rules.CalcRelSplitter} modified to + * work with Beam. TODO(CALCITE-4538) consider contributing these changes back upstream. + * + *

    For example, the Java and Fennel calculators do not implement an identical set of operators. + * The splitter takes a single Calc with mixed Java- and Fennel-only operators and splits it into a + * tree of Calc instances, each of which can be implemented entirely by either Java or Fennel. + * + *

    Currently the splitter is only capable of handling two "rel types". That is, it can deal with + * Java vs. Fennel Calcs, but not Java vs. Fennel vs. some other type of Calc. + * + *

    See {@link ProjectToWindowRule} for an example of how this class is used. + */ +@SuppressWarnings({"all", "OperatorPrecedence"}) +public class CalcRelSplitter { + // ~ Static fields/initializers --------------------------------------------- + + private static final Logger RULE_LOGGER = RelOptPlanner.LOGGER; + + // ~ Instance fields -------------------------------------------------------- + + protected final RexProgram program; + private final RelDataTypeFactory typeFactory; + + private final RelType[] relTypes; + private final RelOptCluster cluster; + private final RelTraitSet traits; + private final RelNode child; + protected final RelBuilder relBuilder; + + // ~ Constructors ----------------------------------------------------------- + + /** + * Constructs a CalcRelSplitter. + * + * @param calc Calc to split + * @param relTypes Array of rel types, e.g. {Java, Fennel}. Must be distinct. + */ + public CalcRelSplitter(Calc calc, RelBuilder relBuilder, RelType[] relTypes) { + this.relBuilder = relBuilder; + for (int i = 0; i < relTypes.length; i++) { + assert relTypes[i] != null; + for (int j = 0; j < i; j++) { + assert relTypes[i] != relTypes[j] : "Rel types must be distinct"; + } + } + this.program = calc.getProgram(); + this.cluster = calc.getCluster(); + this.traits = calc.getTraitSet(); + this.typeFactory = calc.getCluster().getTypeFactory(); + this.child = calc.getInput(); + this.relTypes = relTypes; + } + + // ~ Methods ---------------------------------------------------------------- + + public RelNode execute() { + // Check that program is valid. In particular, this means that every + // expression is trivial (either an atom, or a function applied to + // references to atoms) and every expression depends only on + // expressions to the left. + assert program.isValid(Litmus.THROW, null); + final List exprList = program.getExprList(); + final RexNode[] exprs = exprList.toArray(new RexNode[0]); + assert !RexUtil.containComplexExprs(exprList); + + // Figure out what level each expression belongs to. + int[] exprLevels = new int[exprs.length]; + + // The type of a level is given by + // relTypes[levelTypeOrdinals[level]]. + int[] levelTypeOrdinals = new int[exprs.length]; + + int levelCount = chooseLevels(exprs, -1, exprLevels, levelTypeOrdinals); + + // For each expression, figure out which is the highest level where it + // is used. + int[] exprMaxUsingLevelOrdinals = + new HighestUsageFinder(exprs, exprLevels).getMaxUsingLevelOrdinals(); + + // If expressions are used as outputs, mark them as higher than that. + final List projectRefList = program.getProjectList(); + final RexLocalRef conditionRef = program.getCondition(); + for (RexLocalRef projectRef : projectRefList) { + exprMaxUsingLevelOrdinals[projectRef.getIndex()] = levelCount; + } + if (conditionRef != null) { + exprMaxUsingLevelOrdinals[conditionRef.getIndex()] = levelCount; + } + + // Print out what we've got. + if (RULE_LOGGER.isTraceEnabled()) { + traceLevelExpressions(exprs, exprLevels, levelTypeOrdinals, levelCount); + } + + // Now build the calcs. 
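+ // Each level becomes one Calc node: it reads the previous level's projected
+ // outputs, evaluates the expressions assigned to this level, applies the
+ // filter condition once the level that computes it (and supports conditions)
+ // is reached, and projects whatever later levels or the final output still need.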
+ RelNode rel = child; + final int inputFieldCount = program.getInputRowType().getFieldCount(); + int[] inputExprOrdinals = identityArray(inputFieldCount); + boolean doneCondition = false; + for (int level = 0; level < levelCount; level++) { + final int[] projectExprOrdinals; + final RelDataType outputRowType; + if (level == (levelCount - 1)) { + outputRowType = program.getOutputRowType(); + projectExprOrdinals = new int[projectRefList.size()]; + for (int i = 0; i < projectExprOrdinals.length; i++) { + projectExprOrdinals[i] = projectRefList.get(i).getIndex(); + } + } else { + outputRowType = null; + + // Project the expressions which are computed at this level or + // before, and will be used at later levels. + List projectExprOrdinalList = new ArrayList<>(); + for (int i = 0; i < exprs.length; i++) { + RexNode expr = exprs[i]; + if (expr instanceof RexLiteral) { + // Don't project literals. They are always created in + // the level where they are used. + exprLevels[i] = -1; + continue; + } + if ((exprLevels[i] <= level) && (exprMaxUsingLevelOrdinals[i] > level)) { + projectExprOrdinalList.add(i); + } + } + projectExprOrdinals = Ints.toArray(projectExprOrdinalList); + } + + final RelType relType = relTypes[levelTypeOrdinals[level]]; + + // Can we do the condition this level? + int conditionExprOrdinal = -1; + if ((conditionRef != null) && !doneCondition) { + conditionExprOrdinal = conditionRef.getIndex(); + if ((exprLevels[conditionExprOrdinal] > level) || !relType.supportsCondition()) { + // stand down -- we're not ready to do the condition yet + conditionExprOrdinal = -1; + } else { + doneCondition = true; + } + } + + RexProgram program1 = + createProgramForLevel( + level, + levelCount, + rel.getRowType(), + exprs, + exprLevels, + inputExprOrdinals, + projectExprOrdinals, + conditionExprOrdinal, + outputRowType); + rel = relType.makeRel(cluster, traits, relBuilder, rel, program1); + + rel = handle(rel); + + // The outputs of this level will be the inputs to the next level. + inputExprOrdinals = projectExprOrdinals; + } + + Preconditions.checkArgument(doneCondition || (conditionRef == null), "unhandled condition"); + return rel; + } + + /** + * Opportunity to further refine the relational expression created for a given level. The default + * implementation returns the relational expression unchanged. + */ + protected RelNode handle(RelNode rel) { + return rel; + } + + /** + * Figures out which expressions to calculate at which level. + * + * @param exprs Array of expressions + * @param conditionOrdinal Ordinal of the condition expression, or -1 if no condition + * @param exprLevels Level ordinal for each expression (output) + * @param levelTypeOrdinals The type of each level (output) + * @return Number of levels required + */ + private int chooseLevels( + final RexNode[] exprs, int conditionOrdinal, int[] exprLevels, int[] levelTypeOrdinals) { + final int inputFieldCount = program.getInputRowType().getFieldCount(); + + int levelCount = 0; + final MaxInputFinder maxInputFinder = new MaxInputFinder(exprLevels); + boolean[] relTypesPossibleForTopLevel = new boolean[relTypes.length]; + Arrays.fill(relTypesPossibleForTopLevel, true); + + // Compute the order in which to visit expressions. 
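+ // Visiting in topological order guarantees that by the time an expression's
+ // level is chosen, the levels of every expression it reads (and of the inputs
+ // of its cohort members) have already been fixed.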
+ final List> cohorts = getCohorts(); + final List permutation = computeTopologicalOrdering(exprs, cohorts); + + for (int i : permutation) { + RexNode expr = exprs[i]; + final boolean condition = i == conditionOrdinal; + + if (i < inputFieldCount) { + assert expr instanceof RexInputRef; + exprLevels[i] = -1; + continue; + } + + // Deduce the minimum level of the expression. An expression must + // be at a level greater than or equal to all of its inputs. + int level = maxInputFinder.maxInputFor(expr); + + // If the expression is in a cohort, it can occur no lower than the + // levels of other expressions in the same cohort. + Set cohort = findCohort(cohorts, i); + if (cohort != null) { + for (Integer exprOrdinal : cohort) { + if (exprOrdinal == i) { + // Already did this member of the cohort. It's a waste + // of effort to repeat. + continue; + } + final RexNode cohortExpr = exprs[exprOrdinal]; + int cohortLevel = maxInputFinder.maxInputFor(cohortExpr); + if (cohortLevel > level) { + level = cohortLevel; + } + } + } + + // Try to implement this expression at this level. + // If that is not possible, try to implement it at higher levels. + levelLoop: + for (; ; ++level) { + if (level >= levelCount) { + // This is a new level. We can use any type we like. + for (int relTypeOrdinal = 0; relTypeOrdinal < relTypes.length; relTypeOrdinal++) { + if (!relTypesPossibleForTopLevel[relTypeOrdinal]) { + continue; + } + if (relTypes[relTypeOrdinal].canImplement(expr, condition)) { + // Success. We have found a type where we can + // implement this expression. + exprLevels[i] = level; + levelTypeOrdinals[level] = relTypeOrdinal; + assert (level == 0) || (levelTypeOrdinals[level - 1] != levelTypeOrdinals[level]) + : "successive levels of same type"; + + // Figure out which of the other reltypes are + // still possible for this level. + // Previous reltypes are not possible. + for (int j = 0; j < relTypeOrdinal; ++j) { + relTypesPossibleForTopLevel[j] = false; + } + + // Successive reltypes may be possible. + for (int j = relTypeOrdinal + 1; j < relTypes.length; ++j) { + if (relTypesPossibleForTopLevel[j]) { + relTypesPossibleForTopLevel[j] = relTypes[j].canImplement(expr, condition); + } + } + + // Move to next level. + levelTypeOrdinals[levelCount] = firstSet(relTypesPossibleForTopLevel); + ++levelCount; + Arrays.fill(relTypesPossibleForTopLevel, true); + break levelLoop; + } + } + + // None of the reltypes still active for this level could + // implement expr. But maybe we could succeed with a new + // level, with all options open? + if (count(relTypesPossibleForTopLevel) >= relTypes.length) { + // Cannot implement for any type. + throw new AssertionError("cannot implement " + expr); + } + levelTypeOrdinals[levelCount] = firstSet(relTypesPossibleForTopLevel); + ++levelCount; + Arrays.fill(relTypesPossibleForTopLevel, true); + } else { + final int levelTypeOrdinal = levelTypeOrdinals[level]; + if (!relTypes[levelTypeOrdinal].canImplement(expr, condition)) { + // Cannot implement this expression in this type; + // continue to next level. + continue; + } + exprLevels[i] = level; + break; + } + } + } + if (levelCount == 0) { + // At least one level is always required. + levelCount = 1; + } + return levelCount; + } + + /** + * Computes the order in which to visit expressions, so that we decide the level of an expression + * only after the levels of lower expressions have been decided. + * + *

    First, we need to ensure that an expression is visited after all of its inputs. + * + *

    Further, if the expression is a member of a cohort, we need to visit it after the inputs of + * all other expressions in that cohort. With this condition, expressions in the same cohort will + * very likely end up in the same level. + * + *

    Note that if there are no cohorts, the expressions from the {@link RexProgram} are already + * in a suitable order. We perform the topological sort just to ensure that the code path is + * well-trodden. + * + * @param exprs Expressions + * @param cohorts List of cohorts, each of which is a set of expr ordinals + * @return Expression ordinals in topological order + */ + private static List computeTopologicalOrdering( + RexNode[] exprs, List> cohorts) { + final DirectedGraph graph = DefaultDirectedGraph.create(); + for (int i = 0; i < exprs.length; i++) { + graph.addVertex(i); + } + for (int i = 0; i < exprs.length; i++) { + final RexNode expr = exprs[i]; + final Set cohort = findCohort(cohorts, i); + final Set targets; + if (cohort == null) { + targets = Collections.singleton(i); + } else { + targets = cohort; + } + expr.accept( + new RexVisitorImpl(true) { + @Override + public Void visitLocalRef(RexLocalRef localRef) { + for (Integer target : targets) { + graph.addEdge(localRef.getIndex(), target); + } + return null; + } + }); + } + TopologicalOrderIterator iter = new TopologicalOrderIterator<>(graph); + final List permutation = new ArrayList<>(); + while (iter.hasNext()) { + permutation.add(iter.next()); + } + return permutation; + } + + /** + * Finds the cohort that contains the given integer, or returns null. + * + * @param cohorts List of cohorts, each a set of integers + * @param ordinal Integer to search for + * @return Cohort that contains the integer, or null if not found + */ + private static @Nullable Set findCohort(List> cohorts, int ordinal) { + for (Set cohort : cohorts) { + if (cohort.contains(ordinal)) { + return cohort; + } + } + return null; + } + + private static int[] identityArray(int length) { + final int[] ints = new int[length]; + for (int i = 0; i < ints.length; i++) { + ints[i] = i; + } + return ints; + } + + /** + * Creates a program containing the expressions for a given level. + * + *

    The expression list of the program will consist of all entries in the expression list + * allExprs[i] for which the corresponding level ordinal exprLevels[i] is + * equal to level. Expressions are mapped according to inputExprOrdinals + * . + * + * @param level Level ordinal + * @param levelCount Number of levels + * @param inputRowType Input row type + * @param allExprs Array of all expressions + * @param exprLevels Array of the level ordinal of each expression + * @param inputExprOrdinals Ordinals in the expression list of input expressions. Input expression + * i will be found at position inputExprOrdinals[i]. + * @param projectExprOrdinals Ordinals of the expressions to be output this level. + * @param conditionExprOrdinal Ordinal of the expression to form the condition for this level, or + * -1 if there is no condition. + * @param outputRowType Output row type + * @return Relational expression + */ + private RexProgram createProgramForLevel( + int level, + int levelCount, + RelDataType inputRowType, + RexNode[] allExprs, + int[] exprLevels, + int[] inputExprOrdinals, + final int[] projectExprOrdinals, + int conditionExprOrdinal, + @Nullable RelDataType outputRowType) { + // Build a list of expressions to form the calc. + List exprs = new ArrayList<>(); + + // exprInverseOrdinals describes where an expression in allExprs comes + // from -- from an input, from a calculated expression, or -1 if not + // available at this level. + int[] exprInverseOrdinals = new int[allExprs.length]; + Arrays.fill(exprInverseOrdinals, -1); + int j = 0; + + // First populate the inputs. They were computed at some previous level + // and are used here. + for (int i = 0; i < inputExprOrdinals.length; i++) { + final int inputExprOrdinal = inputExprOrdinals[i]; + exprs.add(new RexInputRef(i, allExprs[inputExprOrdinal].getType())); + exprInverseOrdinals[inputExprOrdinal] = j; + ++j; + } + + // Next populate the computed expressions. + final RexShuttle shuttle = + new InputToCommonExprConverter( + exprInverseOrdinals, exprLevels, level, inputExprOrdinals, allExprs); + for (int i = 0; i < allExprs.length; i++) { + if (exprLevels[i] == level + || exprLevels[i] == -1 + && level == (levelCount - 1) + && allExprs[i] instanceof RexLiteral) { + RexNode expr = allExprs[i]; + final RexNode translatedExpr = expr.accept(shuttle); + exprs.add(translatedExpr); + assert exprInverseOrdinals[i] == -1; + exprInverseOrdinals[i] = j; + ++j; + } + } + + // Form the projection and condition list. Project and condition + // ordinals are offsets into allExprs, so we need to map them into + // exprs. + final List projectRefs = new ArrayList<>(projectExprOrdinals.length); + final List fieldNames = new ArrayList<>(projectExprOrdinals.length); + for (int i = 0; i < projectExprOrdinals.length; i++) { + final int projectExprOrdinal = projectExprOrdinals[i]; + final int index = exprInverseOrdinals[projectExprOrdinal]; + assert index >= 0; + RexNode expr = allExprs[projectExprOrdinal]; + projectRefs.add(new RexLocalRef(index, expr.getType())); + + // Inherit meaningful field name if possible. 
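+ // deriveFieldName keeps the child's column name for plain input references,
+ // unless it is a generated "$n" name, and otherwise falls back to "$" + ordinal.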
+ fieldNames.add(deriveFieldName(expr, i)); + } + RexLocalRef conditionRef; + if (conditionExprOrdinal >= 0) { + final int index = exprInverseOrdinals[conditionExprOrdinal]; + conditionRef = new RexLocalRef(index, allExprs[conditionExprOrdinal].getType()); + } else { + conditionRef = null; + } + if (outputRowType == null) { + outputRowType = RexUtil.createStructType(typeFactory, projectRefs, fieldNames, null); + } + final RexProgram program = + new RexProgram(inputRowType, exprs, projectRefs, conditionRef, outputRowType); + // Program is NOT normalized here (e.g. can contain literals in + // call operands), since literals should be inlined. + return program; + } + + private String deriveFieldName(RexNode expr, int ordinal) { + if (expr instanceof RexInputRef) { + int inputIndex = ((RexInputRef) expr).getIndex(); + String fieldName = child.getRowType().getFieldList().get(inputIndex).getName(); + // Don't inherit field names like '$3' from child: that's + // confusing. + if (!fieldName.startsWith("$") || fieldName.startsWith("$EXPR")) { + return fieldName; + } + } + return "$" + ordinal; + } + + /** + * Traces the given array of level expression lists at the finer level. + * + * @param exprs Array expressions + * @param exprLevels For each expression, the ordinal of its level + * @param levelTypeOrdinals For each level, the ordinal of its type in the {@link #relTypes} array + * @param levelCount The number of levels + */ + private void traceLevelExpressions( + RexNode[] exprs, int[] exprLevels, int[] levelTypeOrdinals, int levelCount) { + StringWriter traceMsg = new StringWriter(); + PrintWriter traceWriter = new PrintWriter(traceMsg); + traceWriter.println("FarragoAutoCalcRule result expressions for: "); + traceWriter.println(program.toString()); + + for (int level = 0; level < levelCount; level++) { + traceWriter.println("Rel Level " + level + ", type " + relTypes[levelTypeOrdinals[level]]); + + for (int i = 0; i < exprs.length; i++) { + RexNode expr = exprs[i]; + assert (exprLevels[i] >= -1) && (exprLevels[i] < levelCount) + : "expression's level is out of range"; + if (exprLevels[i] == level) { + traceWriter.println("\t" + i + ": " + expr); + } + } + traceWriter.println(); + } + String msg = traceMsg.toString(); + RULE_LOGGER.trace(msg); + } + + /** Returns the number of bits set in an array. */ + private static int count(boolean[] booleans) { + int count = 0; + for (boolean b : booleans) { + if (b) { + ++count; + } + } + return count; + } + + /** Returns the index of the first set bit in an array. */ + private static int firstSet(boolean[] booleans) { + for (int i = 0; i < booleans.length; i++) { + if (booleans[i]) { + return i; + } + } + return -1; + } + + /** + * Searches for a value in a map, and returns the position where it was found, or -1. + * + * @param value Value to search for + * @param map Map to search in + * @return Ordinal of value in map, or -1 if not found + */ + private static int indexOf(int value, int[] map) { + for (int i = 0; i < map.length; i++) { + if (value == map[i]) { + return i; + } + } + return -1; + } + + /** + * Returns whether a relational expression can be implemented solely in a given {@link RelType}. 
+ * + * @param rel Calculation relational expression + * @param relTypeName Name of a {@link RelType} + * @return Whether relational expression can be implemented + */ + protected boolean canImplement(LogicalCalc rel, String relTypeName) { + for (RelType relType : relTypes) { + if (relType.name.equals(relTypeName)) { + return relType.canImplement(rel.getProgram()); + } + } + throw new AssertionError("unknown type " + relTypeName); + } + + /** + * Returns a list of sets of expressions that should be on the same level. + * + *

    For example, if this method returns { {3, 5}, {4, 7} }, it means that expressions 3 and 5 + * should be on the same level, and expressions 4 and 7 should be on the same level. The two + * cohorts do not need to be on the same level.

    The list is best effort. If it is not possible to arrange that the expressions in a cohort + * are on the same level, the {@link #execute()} method will still succeed. + * + *

    The default implementation of this method returns the empty list; expressions will be put on + * the most suitable level. This is generally the lowest possible level, except for literals, + * which are placed at the level where they are used. + * + * @return List of cohorts, that is sets of expressions, that the splitting algorithm should + * attempt to place on the same level + */ + protected List> getCohorts() { + return Collections.emptyList(); + } + + // ~ Inner Classes ---------------------------------------------------------- + + /** Type of relational expression. Determines which kinds of expressions it can handle. */ + public abstract static class RelType { + private final String name; + + protected RelType(String name) { + this.name = name; + } + + @Override + public String toString() { + return name; + } + + protected abstract boolean canImplement(RexFieldAccess field); + + protected abstract boolean canImplement(RexDynamicParam param); + + protected abstract boolean canImplement(RexLiteral literal); + + protected abstract boolean canImplement(RexCall call); + + protected boolean supportsCondition() { + return true; + } + + protected RelNode makeRel( + RelOptCluster cluster, + RelTraitSet traitSet, + RelBuilder relBuilder, + RelNode input, + RexProgram program) { + return LogicalCalc.create(input, program); + } + + /** + * Returns whether this RelType can implement a given expression. + * + * @param expr Expression + * @param condition Whether expression is a condition + * @return Whether this RelType can implement a given expression. + */ + public boolean canImplement(RexNode expr, boolean condition) { + if (condition && !supportsCondition()) { + return false; + } + try { + expr.accept(new ImplementTester(this)); + return true; + } catch (CannotImplement e) { + Util.swallow(e, null); + return false; + } + } + + /** + * Returns whether this tester's RelType can implement a given program. + * + * @param program Program + * @return Whether this tester's RelType can implement a given program. + */ + public boolean canImplement(RexProgram program) { + if ((program.getCondition() != null) && !canImplement(program.getCondition(), true)) { + return false; + } + for (RexNode expr : program.getExprList()) { + if (!canImplement(expr, false)) { + return false; + } + } + return true; + } + } + + /** + * Visitor which returns whether an expression can be implemented in a given type of relational + * expression. + */ + private static class ImplementTester extends RexVisitorImpl { + private final RelType relType; + + ImplementTester(RelType relType) { + super(false); + this.relType = relType; + } + + @Override + public Void visitCall(RexCall call) { + if (!relType.canImplement(call)) { + throw CannotImplement.INSTANCE; + } + return null; + } + + @Override + public Void visitDynamicParam(RexDynamicParam dynamicParam) { + if (!relType.canImplement(dynamicParam)) { + throw CannotImplement.INSTANCE; + } + return null; + } + + @Override + public Void visitFieldAccess(RexFieldAccess fieldAccess) { + if (!relType.canImplement(fieldAccess)) { + throw CannotImplement.INSTANCE; + } + return null; + } + + @Override + public Void visitLiteral(RexLiteral literal) { + if (!relType.canImplement(literal)) { + throw CannotImplement.INSTANCE; + } + return null; + } + } + + /** Control exception for {@link ImplementTester}. 
*/ + private static class CannotImplement extends RuntimeException { + @SuppressWarnings("ThrowableInstanceNeverThrown") + static final CannotImplement INSTANCE = new CannotImplement(); + } + + /** + * Shuttle which converts every reference to an input field in an expression to a reference to a + * common sub-expression. + */ + private static class InputToCommonExprConverter extends RexShuttle { + private final int[] exprInverseOrdinals; + private final int[] exprLevels; + private final int level; + private final int[] inputExprOrdinals; + private final RexNode[] allExprs; + + InputToCommonExprConverter( + int[] exprInverseOrdinals, + int[] exprLevels, + int level, + int[] inputExprOrdinals, + RexNode[] allExprs) { + this.exprInverseOrdinals = exprInverseOrdinals; + this.exprLevels = exprLevels; + this.level = level; + this.inputExprOrdinals = inputExprOrdinals; + this.allExprs = allExprs; + } + + @Override + public RexNode visitInputRef(RexInputRef input) { + final int index = exprInverseOrdinals[input.getIndex()]; + assert index >= 0; + return new RexLocalRef(index, input.getType()); + } + + @Override + public RexNode visitLocalRef(RexLocalRef local) { + // A reference to a local variable becomes a reference to an input + // if the local was computed at a previous level. + final int localIndex = local.getIndex(); + final int exprLevel = exprLevels[localIndex]; + if (exprLevel < level) { + if (allExprs[localIndex] instanceof RexLiteral) { + // Expression is to be inlined. Use the original expression. + return allExprs[localIndex]; + } + int inputIndex = indexOf(localIndex, inputExprOrdinals); + assert inputIndex >= 0; + return new RexLocalRef(inputIndex, local.getType()); + } else { + // It's a reference to what was a local expression at the + // previous level, and was then projected. + final int exprIndex = exprInverseOrdinals[localIndex]; + return new RexLocalRef(exprIndex, local.getType()); + } + } + } + + /** Finds the highest level used by any of the inputs of a given expression. */ + private static class MaxInputFinder extends RexVisitorImpl { + int level; + private final int[] exprLevels; + + MaxInputFinder(int[] exprLevels) { + super(true); + this.exprLevels = exprLevels; + } + + @Override + public Void visitLocalRef(RexLocalRef localRef) { + int inputLevel = exprLevels[localRef.getIndex()]; + level = Math.max(level, inputLevel); + return null; + } + + /** Returns the highest level of any of the inputs of an expression. */ + public int maxInputFor(RexNode expr) { + level = 0; + expr.accept(this); + return level; + } + } + + /** + * Builds an array of the highest level which contains an expression which uses each expression as + * an input. + */ + private static class HighestUsageFinder extends RexVisitorImpl { + private final int[] maxUsingLevelOrdinals; + private int currentLevel; + + HighestUsageFinder(RexNode[] exprs, int[] exprLevels) { + super(true); + this.maxUsingLevelOrdinals = new int[exprs.length]; + Arrays.fill(maxUsingLevelOrdinals, -1); + for (int i = 0; i < exprs.length; i++) { + if (exprs[i] instanceof RexLiteral) { + // Literals are always used directly. It never makes sense + // to compute them at a lower level and project them to + // where they are used. 
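+ // The -1 sentinel keeps literals out of the inter-level projections;
+ // InputToCommonExprConverter inlines them wherever they are referenced.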
+ maxUsingLevelOrdinals[i] = -1; + continue; + } + currentLevel = exprLevels[i]; + @SuppressWarnings("argument.type.incompatible") + final Void unused = exprs[i].accept(this); + } + } + + public int[] getMaxUsingLevelOrdinals() { + return maxUsingLevelOrdinals; + } + + @Override + public Void visitLocalRef(RexLocalRef ref) { + final int index = ref.getIndex(); + maxUsingLevelOrdinals[index] = Math.max(maxUsingLevelOrdinals[index], currentLevel); + return null; + } + } +} diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/package-info.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/package-info.java index 0eb883fb6900..8b81b6df6d8f 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/package-info.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/package-info.java @@ -18,7 +18,7 @@ /** * BeamSQL specified nodes, to replace {@link - * org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode}. + * org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode}. */ @DefaultAnnotation(NonNull.class) package org.apache.beam.sdk.extensions.sql.impl.rel; diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRule.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRule.java index 9499e1bca475..fd7ba2112165 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRule.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRule.java @@ -21,16 +21,16 @@ import java.util.List; import java.util.Set; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.volcano.RelSubset; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.SingleRel; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Aggregate; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Filter; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Project; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.RelFactories; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.AggregateProjectMergeRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilderFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRuleCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.volcano.RelSubset; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.SingleRel; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Aggregate; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Filter; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Project; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.RelFactories; +import 
org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.rules.AggregateProjectMergeRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RelBuilderFactory; /** * This rule is essentially a wrapper around Calcite's {@code AggregateProjectMergeRule}. In the diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregationRule.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregationRule.java index ae0183cd595f..96c5d91043cf 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregationRule.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregationRule.java @@ -25,18 +25,18 @@ import org.apache.beam.sdk.transforms.windowing.Sessions; import org.apache.beam.sdk.transforms.windowing.SlidingWindows; import org.apache.beam.sdk.transforms.windowing.WindowFn; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Aggregate; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Project; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.RelFactories; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLiteral; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilderFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.ImmutableBitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRuleCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Aggregate; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Project; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.RelFactories; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RelBuilderFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.ImmutableBitSet; import org.checkerframework.checker.nullness.qual.Nullable; import org.joda.time.Duration; @@ -60,6 +60,11 @@ public BeamAggregationRule( public void onMatch(RelOptRuleCall call) { final Aggregate aggregate = call.rel(0); final Project project = call.rel(1); + + if (aggregate.getGroupType() != Aggregate.Group.SIMPLE) { + return; + } + RelNode x = updateWindow(call, aggregate, project); if (x == null) { // Non-windowed case should be handled by the BeamBasicAggregationRule diff --git 
a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamBasicAggregationRule.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamBasicAggregationRule.java index f12ddc4f2b2b..15028a7be768 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamBasicAggregationRule.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamBasicAggregationRule.java @@ -22,19 +22,19 @@ import java.util.stream.Collectors; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamAggregationRel; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamLogicalConvention; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.volcano.RelSubset; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Aggregate; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Filter; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Project; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.RelFactories; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilderFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRuleCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.volcano.RelSubset; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Aggregate; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Calc; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Filter; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Project; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.RelFactories; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RelBuilderFactory; /** * Aggregation rule that doesn't include projection. 
@@ -60,6 +60,10 @@ public void onMatch(RelOptRuleCall call) { Aggregate aggregate = call.rel(0); RelNode relNode = call.rel(1); + if (aggregate.getGroupType() != Aggregate.Group.SIMPLE) { + return; + } + if (relNode instanceof Project || relNode instanceof Calc || relNode instanceof Filter) { if (isWindowed(relNode) || hasWindowedParents(relNode)) { // This case is expected to get handled by the 'BeamAggregationRule' diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamCalcMergeRule.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamCalcMergeRule.java new file mode 100644 index 000000000000..210aebfe7be8 --- /dev/null +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamCalcMergeRule.java @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.impl.rule; + +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRuleCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRuleOperand; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.rules.CalcMergeRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.rules.CoreRules; + +/** + * Planner rule to merge a {@link BeamCalcRel} with a {@link BeamCalcRel}. Subset of {@link + * CalcMergeRule}. 
+ */ +public class BeamCalcMergeRule extends RelOptRule { + public static final BeamCalcMergeRule INSTANCE = new BeamCalcMergeRule(); + + public BeamCalcMergeRule() { + super(operand(BeamCalcRel.class, operand(BeamCalcRel.class, any()), new RelOptRuleOperand[0])); + } + + @Override + public void onMatch(RelOptRuleCall call) { + CoreRules.CALC_MERGE.onMatch(call); + } +} diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamCalcRule.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamCalcRule.java index 7a820424c8d4..01532b455f26 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamCalcRule.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamCalcRule.java @@ -20,15 +20,15 @@ import java.util.List; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamLogicalConvention; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.Convention; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.convert.ConverterRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalCalc; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexOver; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.Convention; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRuleCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.convert.ConverterRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Calc; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalCalc; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexOver; /** A {@code ConverterRule} to replace {@link Calc} with {@link BeamCalcRel}. */ public class BeamCalcRule extends ConverterRule { diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamCalcSplittingRule.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamCalcSplittingRule.java new file mode 100644 index 000000000000..432df0f2cf37 --- /dev/null +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamCalcSplittingRule.java @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.impl.rule; + +import org.apache.beam.sdk.extensions.sql.impl.rel.CalcRelSplitter; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRuleCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Calc; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.RelFactories; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalCalc; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * A {@link RelOptRule} that converts a {@link LogicalCalc} into a chain of {@link + * org.apache.beam.sdk.extensions.sql.impl.rel.AbstractBeamCalcRel} nodes via {@link + * CalcRelSplitter}. + */ +public abstract class BeamCalcSplittingRule extends RelOptRule { + private static final Logger LOG = LoggerFactory.getLogger(BeamCalcSplittingRule.class); + + protected BeamCalcSplittingRule(String description) { + super(operand(LogicalCalc.class, any()), RelFactories.LOGICAL_BUILDER, description); + } + + @Override + public boolean matches(RelOptRuleCall x) { + CalcRelSplitter.RelType[] relTypes = getRelTypes(); + for (RelNode relNode : x.getRelList()) { + if (relNode instanceof LogicalCalc) { + LogicalCalc logicalCalc = (LogicalCalc) relNode; + for (RexNode rexNode : logicalCalc.getProgram().getExprList()) { + if (!relTypes[0].canImplement(rexNode, false) + && !relTypes[1].canImplement(rexNode, false)) { + LOG.error("Cannot implement expression {} with rule {}.", rexNode, this.description); + return false; + } + } + } + } + return true; + } + + @Override + public void onMatch(RelOptRuleCall relOptRuleCall) { + final Calc calc = (Calc) relOptRuleCall.rel(0); + final CalcRelSplitter transform = + new CalcRelSplitter(calc, relOptRuleCall.builder(), getRelTypes()); + RelNode newRel = transform.execute(); + relOptRuleCall.transformTo(newRel); + } + + protected abstract CalcRelSplitter.RelType[] getRelTypes(); +} diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamCoGBKJoinRule.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamCoGBKJoinRule.java index 516bc094ee4a..3d47c8429484 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamCoGBKJoinRule.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamCoGBKJoinRule.java @@ -21,12 +21,12 @@ import org.apache.beam.sdk.extensions.sql.impl.rel.BeamJoinRel; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamLogicalConvention; import org.apache.beam.sdk.values.PCollection; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleCall; -import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Join; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.RelFactories; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalJoin; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRuleCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Join; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.RelFactories; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalJoin; /** * Rule to convert {@code LogicalJoin} node to {@code BeamCoGBKJoinRel} node. diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamEnumerableConverterRule.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamEnumerableConverterRule.java index 773fef11875e..15de4ff8092d 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamEnumerableConverterRule.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamEnumerableConverterRule.java @@ -20,10 +20,10 @@ import org.apache.beam.sdk.extensions.sql.impl.rel.BeamEnumerableConverter; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamLogicalConvention; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamRelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.enumerable.EnumerableConvention; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.convert.ConverterRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.enumerable.EnumerableConvention; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.convert.ConverterRule; /** A {@code ConverterRule} to Convert {@link BeamRelNode} to {@link EnumerableConvention}. 
*/ public class BeamEnumerableConverterRule extends ConverterRule { diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java index 9767fbf452da..2e2e2fb1bdb4 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java @@ -38,24 +38,25 @@ import org.apache.beam.sdk.schemas.FieldAccessDescriptor.FieldDescriptor; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.schemas.utils.SelectHelpers; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.RelFactories; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeField; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelRecordType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLiteral; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLocalRef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilder; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilderFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Pair; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRuleCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Calc; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.RelFactories; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeField; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelRecordType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexInputRef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLocalRef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexProgram; +import 
org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexUtil; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RelBuilder; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RelBuilderFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.Pair; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; @SuppressWarnings({ @@ -140,8 +141,6 @@ public void onMatch(RelOptRuleCall call) { resolved = resolved.resolve(beamSqlTable.getSchema()); if (canDropCalc(program, beamSqlTable.supportsProjects(), tableFilter)) { - // Tell the optimizer to not use old IO, since the new one is better. - call.getPlanner().setImportance(ioSourceRel, 0.0); call.transformTo( ioSourceRel.createPushDownRel( calc.getRowType(), @@ -173,8 +172,6 @@ public void onMatch(RelOptRuleCall call) { || usedFields.size() < calcInputRowType.getFieldCount()) { // Smaller Calc programs are indisputably better, as well as IOs with less projected fields. // We can consider something with the same number of filters. - // Tell the optimizer not to use old Calc and IO. - call.getPlanner().setImportance(ioSourceRel, 0); call.transformTo(result); } } @@ -369,9 +366,13 @@ private RelNode constructNodesWithPushDown( newProjects.add(reMapRexNodeToNewInputs(project, mapping)); } + if (RexUtil.isIdentity(newProjects, newIoSourceRel.getRowType())) { + // Force a rename prior to filter for identity function. + relBuilder.project(newProjects, calcDataType.getFieldNames(), true); + } + relBuilder.filter(newFilter); - // Force to preserve named projects. - relBuilder.project(newProjects, calcDataType.getFieldNames(), true); + relBuilder.project(newProjects, calcDataType.getFieldNames()); return relBuilder.build(); } diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOSinkRule.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOSinkRule.java index d67e106b93af..0e7a2231f902 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOSinkRule.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOSinkRule.java @@ -20,9 +20,9 @@ import java.util.Arrays; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSinkRel; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamLogicalConvention; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.convert.ConverterRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.TableModify; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.convert.ConverterRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.TableModify; /** A {@code ConverterRule} to replace {@link TableModify} with {@link BeamIOSinkRel}. 
*/ public class BeamIOSinkRule extends ConverterRule { diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIntersectRule.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIntersectRule.java index 1a91e4c68a75..7c51deacd5b7 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIntersectRule.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIntersectRule.java @@ -20,11 +20,11 @@ import java.util.List; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamIntersectRel; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamLogicalConvention; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.Convention; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.convert.ConverterRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Intersect; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalIntersect; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.Convention; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.convert.ConverterRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Intersect; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalIntersect; /** {@code ConverterRule} to replace {@code Intersect} with {@code BeamIntersectRel}. */ public class BeamIntersectRule extends ConverterRule { diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamJoinAssociateRule.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamJoinAssociateRule.java index 3eb7ab5f18fd..d9445da50f94 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamJoinAssociateRule.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamJoinAssociateRule.java @@ -18,15 +18,15 @@ package org.apache.beam.sdk.extensions.sql.impl.rule; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamJoinRel; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Join; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.RelFactories; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.JoinAssociateRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilderFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRuleCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Join; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.RelFactories; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.rules.JoinAssociateRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RelBuilderFactory; /** * This is very similar to {@link - * org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.JoinAssociateRule}. It only + * org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.rules.JoinAssociateRule}. 
It only * checks if the resulting condition is supported before transforming. */ public class BeamJoinAssociateRule extends JoinAssociateRule { diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamJoinPushThroughJoinRule.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamJoinPushThroughJoinRule.java index f2a10b9e9374..45df21c15181 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamJoinPushThroughJoinRule.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamJoinPushThroughJoinRule.java @@ -18,48 +18,46 @@ package org.apache.beam.sdk.extensions.sql.impl.rule; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamJoinRel; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Join; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.RelFactories; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalJoin; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.JoinPushThroughJoinRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilderFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRuleCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Join; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.RelFactories; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalJoin; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.rules.JoinPushThroughJoinRule; /** * This is exactly similar to {@link - * org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.JoinPushThroughJoinRule}. It + * org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.rules.JoinPushThroughJoinRule}. It * only checks if the condition of the new bottom join is supported. */ -public class BeamJoinPushThroughJoinRule extends JoinPushThroughJoinRule { +public class BeamJoinPushThroughJoinRule extends RelOptRule { /** Instance of the rule that works on logical joins only, and pushes to the right. */ public static final RelOptRule RIGHT = new BeamJoinPushThroughJoinRule( - "BeamJoinPushThroughJoinRule:right", - true, - LogicalJoin.class, - RelFactories.LOGICAL_BUILDER); + "BeamJoinPushThroughJoinRule:right", JoinPushThroughJoinRule.RIGHT); /** Instance of the rule that works on logical joins only, and pushes to the left. */ public static final RelOptRule LEFT = new BeamJoinPushThroughJoinRule( - "BeamJoinPushThroughJoinRule:left", - false, - LogicalJoin.class, - RelFactories.LOGICAL_BUILDER); + "BeamJoinPushThroughJoinRule:left", JoinPushThroughJoinRule.LEFT); + + private final RelOptRule base; /** Creates a JoinPushThroughJoinRule. 
*/ - private BeamJoinPushThroughJoinRule( - String description, - boolean right, - Class<? extends Join> clazz, - RelBuilderFactory relBuilderFactory) { - super(description, right, clazz, relBuilderFactory); + private BeamJoinPushThroughJoinRule(String description, RelOptRule base) { + super( + operand( + LogicalJoin.class, operand(LogicalJoin.class, any()), operand(RelNode.class, any())), + RelFactories.LOGICAL_BUILDER, + description); + + this.base = base; } @Override public void onMatch(RelOptRuleCall call) { - super.onMatch( + base.onMatch( new JoinRelOptRuleCall( call, rel -> { diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamMatchRule.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamMatchRule.java index 6441c79840f7..c70162d5f639 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamMatchRule.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamMatchRule.java @@ -19,11 +19,11 @@ import org.apache.beam.sdk.extensions.sql.impl.rel.BeamLogicalConvention; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamMatchRel; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.Convention; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.convert.ConverterRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Match; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalMatch; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.Convention; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.convert.ConverterRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Match; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalMatch; /** {@code ConverterRule} to replace {@code Match} with {@code BeamMatchRel}.
*/ public class BeamMatchRule extends ConverterRule { diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamMinusRule.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamMinusRule.java index 29d4a974e9be..24b691a1f63d 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamMinusRule.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamMinusRule.java @@ -20,11 +20,11 @@ import java.util.List; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamLogicalConvention; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamMinusRel; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.Convention; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.convert.ConverterRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Minus; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalMinus; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.Convention; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.convert.ConverterRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Minus; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalMinus; /** {@code ConverterRule} to replace {@code Minus} with {@code BeamMinusRel}. */ public class BeamMinusRule extends ConverterRule { diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamSideInputJoinRule.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamSideInputJoinRule.java index 98227bbe966b..3aa2fa4965f7 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamSideInputJoinRule.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamSideInputJoinRule.java @@ -21,12 +21,12 @@ import org.apache.beam.sdk.extensions.sql.impl.rel.BeamLogicalConvention; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSideInputJoinRel; import org.apache.beam.sdk.values.PCollection; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Join; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.RelFactories; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalJoin; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRuleCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Join; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.RelFactories; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalJoin; /** * Rule to convert {@code LogicalJoin} node to {@code BeamSideInputJoinRel} node. 
diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamSideInputLookupJoinRule.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamSideInputLookupJoinRule.java index 2c96bd9563b7..c94e72e9af3a 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamSideInputLookupJoinRule.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamSideInputLookupJoinRule.java @@ -20,12 +20,12 @@ import org.apache.beam.sdk.extensions.sql.impl.rel.BeamJoinRel; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamLogicalConvention; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSideInputLookupJoinRel; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.Convention; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.convert.ConverterRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Join; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalJoin; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.Convention; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRuleCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.convert.ConverterRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Join; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalJoin; /** * Rule to convert {@code LogicalJoin} node to {@code BeamSideInputLookupJoinRel} node. diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamSortRule.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamSortRule.java index 1647bf7290c8..06c17ecb0f4c 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamSortRule.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamSortRule.java @@ -19,11 +19,11 @@ import org.apache.beam.sdk.extensions.sql.impl.rel.BeamLogicalConvention; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSortRel; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.Convention; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.convert.ConverterRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Sort; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalSort; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.Convention; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.convert.ConverterRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Sort; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalSort; /** {@code ConverterRule} to replace {@code Sort} with {@code BeamSortRel}. 
*/ public class BeamSortRule extends ConverterRule { diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamTableFunctionScanRule.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamTableFunctionScanRule.java index 20959b51261b..a78bd2277a70 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamTableFunctionScanRule.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamTableFunctionScanRule.java @@ -17,18 +17,18 @@ */ package org.apache.beam.sdk.extensions.sql.impl.rule; -import static org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Preconditions.checkArgument; import java.util.ArrayList; import java.util.Collections; import java.util.List; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamLogicalConvention; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamTableFunctionScanRel; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.Convention; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.convert.ConverterRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.TableFunctionScan; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalTableFunctionScan; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.Convention; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.convert.ConverterRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.TableFunctionScan; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalTableFunctionScan; /** * This is the conveter rule that converts a Calcite {@code TableFunctionScan} to Beam {@code diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamUncollectRule.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamUncollectRule.java index 393882b26877..fdbdbbcb617a 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamUncollectRule.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamUncollectRule.java @@ -19,10 +19,10 @@ import org.apache.beam.sdk.extensions.sql.impl.rel.BeamLogicalConvention; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamUncollectRel; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.Convention; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.convert.ConverterRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Uncollect; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.Convention; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.convert.ConverterRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Uncollect; /** A {@code ConverterRule} to replace {@link Uncollect} with {@link BeamUncollectRule}. 
*/ public class BeamUncollectRule extends ConverterRule { diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamUnionRule.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamUnionRule.java index 7b84e25eab76..704a937560de 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamUnionRule.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamUnionRule.java @@ -19,15 +19,15 @@ import org.apache.beam.sdk.extensions.sql.impl.rel.BeamLogicalConvention; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamUnionRel; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.Convention; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.convert.ConverterRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Union; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalUnion; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.Convention; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.convert.ConverterRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Union; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalUnion; /** * A {@code ConverterRule} to replace {@link - * org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Union} with {@link + * org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Union} with {@link * BeamUnionRule}. 
*/ public class BeamUnionRule extends ConverterRule { diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamUnnestRule.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamUnnestRule.java index 502a2346cea6..07812e1855f3 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamUnnestRule.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamUnnestRule.java @@ -19,19 +19,19 @@ import org.apache.beam.sdk.extensions.sql.impl.rel.BeamLogicalConvention; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamUnnestRel; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.volcano.RelSubset; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.SingleRel; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Correlate; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.JoinRelType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Uncollect; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalCorrelate; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalProject; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexFieldAccess; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRuleCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.volcano.RelSubset; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.SingleRel; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Correlate; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.JoinRelType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Uncollect; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalCorrelate; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalProject; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexFieldAccess; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; /** * A {@code ConverterRule} to replace {@link Correlate} {@link Uncollect} with {@link @@ -109,7 +109,7 @@ public void onMatch(RelOptRuleCall call) { new BeamUnnestRel( correlate.getCluster(), correlate.getTraitSet().replace(BeamLogicalConvention.INSTANCE), - outer, + convert(outer, outer.getTraitSet().replace(BeamLogicalConvention.INSTANCE)), call.rel(2).getRowType(), fieldAccessIndices.build())); } diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamValuesRule.java 
b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamValuesRule.java index 6fbe1e0910bc..98b9254fb789 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamValuesRule.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamValuesRule.java @@ -19,11 +19,11 @@ import org.apache.beam.sdk.extensions.sql.impl.rel.BeamLogicalConvention; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamValuesRel; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.Convention; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.convert.ConverterRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Values; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalValues; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.Convention; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.convert.ConverterRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Values; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalValues; /** {@code ConverterRule} to replace {@code Values} with {@code BeamValuesRel}. */ public class BeamValuesRule extends ConverterRule { diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamWindowRule.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamWindowRule.java index 73c10ccc6f62..4995a3e8d5a3 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamWindowRule.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamWindowRule.java @@ -19,11 +19,11 @@ import org.apache.beam.sdk.extensions.sql.impl.rel.BeamLogicalConvention; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamWindowRel; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.Convention; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.convert.ConverterRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Window; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalWindow; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.Convention; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.convert.ConverterRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Window; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalWindow; /** A {@code ConverterRule} to replace {@link Window} with {@link BeamWindowRel}. 
*/ public class BeamWindowRule extends ConverterRule { diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/JoinRelOptRuleCall.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/JoinRelOptRuleCall.java index 62ebf3c64b06..5f60a8dc2db8 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/JoinRelOptRuleCall.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/JoinRelOptRuleCall.java @@ -19,13 +19,14 @@ import java.util.List; import java.util.Map; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleOperand; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilder; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelHintsPropagator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRuleCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRuleOperand; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RelBuilder; /** * This is a class to catch the built join and check if it is a legal join before passing it to the @@ -53,6 +54,14 @@ public void transformTo(RelNode rel, Map<RelNode, RelNode> equiv) { } } + @Override + public void transformTo( + RelNode relNode, Map<RelNode, RelNode> map, RelHintsPropagator relHintsPropagator) { + if (checker.check(relNode)) { + originalCall.transformTo(relNode, map, relHintsPropagator); + } + } + /** This is a function gets the output relation and checks if it is a legal relational node. */ public interface JoinChecker { boolean check(RelNode rel); diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/LogicalCalcMergeRule.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/LogicalCalcMergeRule.java new file mode 100644 index 000000000000..8d89133fc145 --- /dev/null +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/LogicalCalcMergeRule.java @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License.
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.impl.rule; + +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRuleCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRuleOperand; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalCalc; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.rules.CalcMergeRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.rules.CoreRules; + +/** + * Planner rule to merge a {@link LogicalCalc} with a {@link LogicalCalc}. Subset of {@link + * CalcMergeRule}. + */ +public class LogicalCalcMergeRule extends RelOptRule { + public static final LogicalCalcMergeRule INSTANCE = new LogicalCalcMergeRule(); + + public LogicalCalcMergeRule() { + super(operand(LogicalCalc.class, operand(LogicalCalc.class, any()), new RelOptRuleOperand[0])); + } + + @Override + public void onMatch(RelOptRuleCall call) { + CoreRules.CALC_MERGE.onMatch(call); + } +} diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/package-info.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/package-info.java index 43cf0b909fc1..90e53618c990 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/package-info.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/package-info.java @@ -17,7 +17,7 @@ */ /** - * {@link org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule} to generate + * {@link org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRule} to generate * {@link org.apache.beam.sdk.extensions.sql.impl.rel.BeamRelNode}. 
*/ @DefaultAnnotation(NonNull.class) diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/schema/BeamTableUtils.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/schema/BeamTableUtils.java index 205196911188..00890fc78461 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/schema/BeamTableUtils.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/schema/BeamTableUtils.java @@ -35,8 +35,8 @@ import org.apache.beam.sdk.schemas.Schema.FieldType; import org.apache.beam.sdk.schemas.Schema.TypeName; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.util.ByteString; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.NlsString; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.util.ByteString; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.NlsString; import org.apache.commons.csv.CSVFormat; import org.apache.commons.csv.CSVParser; import org.apache.commons.csv.CSVPrinter; @@ -146,6 +146,9 @@ public static Object autoCastField(Schema.Field field, Object rawObj) { && ((rawObj instanceof String) || (rawObj instanceof BigDecimal && type.getTypeName() != TypeName.DECIMAL))) { String raw = rawObj.toString(); + if (raw.trim().isEmpty()) { + return null; + } switch (type.getTypeName()) { case BYTE: return Byte.valueOf(raw); @@ -154,16 +157,10 @@ public static Object autoCastField(Schema.Field field, Object rawObj) { case INT32: return Integer.valueOf(raw); case INT64: - if (raw.equals("")) { - return null; - } return Long.valueOf(raw); case FLOAT: return Float.valueOf(raw); case DOUBLE: - if (raw.equals("")) { - return null; - } return Double.valueOf(raw); default: throw new UnsupportedOperationException( diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamBuiltinAggregations.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamBuiltinAggregations.java index 9f668505ab23..6dd76a30cc82 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamBuiltinAggregations.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamBuiltinAggregations.java @@ -27,6 +27,7 @@ import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.CoderRegistry; import org.apache.beam.sdk.coders.KvCoder; +import org.apache.beam.sdk.extensions.sql.impl.transform.agg.CountIf; import org.apache.beam.sdk.extensions.sql.impl.transform.agg.CovarianceFn; import org.apache.beam.sdk.extensions.sql.impl.transform.agg.VarianceFn; import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils; @@ -41,7 +42,7 @@ import org.apache.beam.sdk.transforms.Sample; import org.apache.beam.sdk.transforms.Sum; import org.apache.beam.sdk.values.KV; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableMap; import org.checkerframework.checker.nullness.qual.Nullable; /** Built-in aggregations functions for COUNT/MAX/MIN/SUM/AVG/VAR_POP/VAR_SAMP. 
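// ----------------------------------------------------------------------------------------------
// Illustrative sketch (not part of the patch): the autoCastField change above hoists the
// empty-string check ahead of the numeric switch, so a blank CSV cell now becomes null for
// every numeric column type instead of only INT64/DOUBLE. The single-column "price" schema
// below is hypothetical.
Schema schema = Schema.builder().addNullableField("price", Schema.FieldType.FLOAT).build();
Object empty = BeamTableUtils.autoCastField(schema.getField("price"), "");      // now null
Object parsed = BeamTableUtils.autoCastField(schema.getField("price"), "1.5");  // Float 1.5
// ----------------------------------------------------------------------------------------------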
*/ @@ -62,12 +63,14 @@ public class BeamBuiltinAggregations { .put("$SUM0", BeamBuiltinAggregations::createSum) .put("AVG", BeamBuiltinAggregations::createAvg) .put("BIT_OR", BeamBuiltinAggregations::createBitOr) + .put("BIT_XOR", BeamBuiltinAggregations::createBitXOr) // JIRA link:https://issues.apache.org/jira/browse/BEAM-10379 .put("BIT_AND", BeamBuiltinAggregations::createBitAnd) .put("VAR_POP", t -> VarianceFn.newPopulation(t.getTypeName())) .put("VAR_SAMP", t -> VarianceFn.newSample(t.getTypeName())) .put("COVAR_POP", t -> CovarianceFn.newPopulation(t.getTypeName())) .put("COVAR_SAMP", t -> CovarianceFn.newSample(t.getTypeName())) + .put("COUNTIF", typeName -> CountIf.combineFn()) .build(); private static MathContext mc = new MathContext(10, RoundingMode.HALF_UP); @@ -199,6 +202,14 @@ static CombineFn createBitAnd(Schema.FieldType fieldType) { String.format("[%s] is not supported in BIT_AND", fieldType)); } + public static CombineFn createBitXOr(Schema.FieldType fieldType) { + if (fieldType.getTypeName() == TypeName.INT64) { + return new BitXOr(); + } + throw new UnsupportedOperationException( + String.format("[%s] is not supported in BIT_XOR", fieldType)); + } + static class CustMax> extends Combine.BinaryCombineFn { @Override public T apply(T left, T right) { @@ -397,40 +408,91 @@ public Long extractOutput(Long accum) { *

    Note: null values are ignored when mixed with non-null values. * (https://issues.apache.org/jira/browse/BEAM-10379) */ - static class BitAnd extends CombineFn { - // Indicate if input only contains null value. - private boolean isEmpty = true; + static class BitAnd extends CombineFn { + static class Accum { + /** True if no inputs have been seen yet. */ + boolean isEmpty = true; + /** + * True if any null inputs have been seen. If we see a single null value, the end result is + * null, so if isNull is true, isEmpty and bitAnd are ignored. + */ + boolean isNull = false; + /** The bitwise-and of the inputs seen so far. */ + long bitAnd = -1L; + } + + @Override + public Accum createAccumulator() { + return new Accum(); + } + + @Override + public Accum addInput(Accum accum, T input) { + if (accum.isNull) { + return accum; + } + if (input == null) { + accum.isNull = true; + return accum; + } + accum.isEmpty = false; + accum.bitAnd &= input.longValue(); + return accum; + } + + @Override + public Accum mergeAccumulators(Iterable accums) { + Accum merged = createAccumulator(); + for (Accum accum : accums) { + if (accum.isNull) { + return accum; + } + if (accum.isEmpty) { + continue; + } + merged.isEmpty = false; + merged.bitAnd &= accum.bitAnd; + } + return merged; + } + + @Override + public Long extractOutput(Accum accum) { + if (accum.isEmpty || accum.isNull) { + return null; + } + return accum.bitAnd; + } + } + + public static class BitXOr extends CombineFn { @Override public Long createAccumulator() { - return -1L; + return 0L; } @Override - public Long addInput(Long accum, T input) { + public Long addInput(Long mutableAccumulator, T input) { if (input != null) { - this.isEmpty = false; - return accum & input.longValue(); + return mutableAccumulator ^ input.longValue(); } else { - return null; + return 0L; } } @Override - public Long mergeAccumulators(Iterable accums) { + public Long mergeAccumulators(Iterable accumulators) { Long merged = createAccumulator(); - for (Long accum : accums) { - merged = merged & accum; + for (Long accum : accumulators) { + merged = merged ^ accum; } return merged; } @Override - public Long extractOutput(Long accum) { - if (this.isEmpty) { - return null; - } - return accum; + public Long extractOutput(Long accumulator) { + return accumulator; } } } diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamBuiltinAnalyticFunctions.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamBuiltinAnalyticFunctions.java index e9709784755f..26da1aa46267 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamBuiltinAnalyticFunctions.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamBuiltinAnalyticFunctions.java @@ -24,7 +24,7 @@ import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.transforms.Combine; import org.apache.beam.sdk.values.KV; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableMap; /** Built-in Analytic Functions for the aggregation analytics functionality. 
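// ----------------------------------------------------------------------------------------------
// Illustrative sketch (not part of the patch): exercising the BitXOr CombineFn added above
// through the plain CombineFn contract, assuming it is parameterized over the input Number
// type (the generic parameters are not visible in this rendering of the diff).
BeamBuiltinAggregations.BitXOr<Long> xor = new BeamBuiltinAggregations.BitXOr<>();
Long acc = xor.createAccumulator();   // 0L
acc = xor.addInput(acc, 0b0101L);     // 5
acc = xor.addInput(acc, 0b0011L);     // 5 ^ 3 = 6
Long out = xor.extractOutput(acc);    // 6L, i.e. BIT_XOR over the inputs {5, 3}
// ----------------------------------------------------------------------------------------------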
*/ @SuppressWarnings({ diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamJoinTransforms.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamJoinTransforms.java index 009ae43a7b8f..6cac136263af 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamJoinTransforms.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamJoinTransforms.java @@ -33,10 +33,10 @@ import org.apache.beam.sdk.transforms.ParDo; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Pair; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexInputRef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.Pair; /** Collections of {@code PTransform} and {@code DoFn} used to perform JOIN operation. */ @SuppressWarnings({ diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSqlOutputToConsoleFn.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSqlOutputToConsoleFn.java index 7b1b2afadb86..d3e44dbf9075 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSqlOutputToConsoleFn.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSqlOutputToConsoleFn.java @@ -32,6 +32,6 @@ public BeamSqlOutputToConsoleFn(String stepName) { @ProcessElement public void processElement(ProcessContext c) { - System.out.println("Output: " + c.element().getValues()); + System.out.println("Output: " + c.element().toString()); } } diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/agg/AggregationCombineFnAdapter.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/agg/AggregationCombineFnAdapter.java index 53175424a55a..2178a2d47b28 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/agg/AggregationCombineFnAdapter.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/agg/AggregationCombineFnAdapter.java @@ -27,8 +27,8 @@ import org.apache.beam.sdk.schemas.SchemaCoder; import org.apache.beam.sdk.transforms.Combine.CombineFn; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.AggregateCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.validate.SqlUserDefinedAggFunction; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.AggregateCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.validate.SqlUserDefinedAggFunction; import org.checkerframework.checker.nullness.qual.Nullable; /** Wrapper {@link CombineFn}s for aggregation function calls. 
*/ diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/agg/CountIf.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/agg/CountIf.java new file mode 100644 index 000000000000..92b05b21e1a0 --- /dev/null +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/agg/CountIf.java @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.impl.transform.agg; + +import org.apache.beam.sdk.transforms.Combine; + +/** + * Returns the count of TRUE values for expression. Returns 0 if there are zero input rows, or if + * expression evaluates to FALSE or NULL for all rows. + */ +public class CountIf { + private CountIf() {} + + public static CountIfFn combineFn() { + return new CountIf.CountIfFn(); + } + + public static class CountIfFn extends Combine.CombineFn { + + public static class Accum { + boolean isExpressionFalse = true; + long countIfResult = 0L; + } + + @Override + public Accum createAccumulator() { + return new Accum(); + } + + @Override + public Accum addInput(Accum accum, Boolean input) { + if (input) { + accum.isExpressionFalse = false; + accum.countIfResult += 1; + } + return accum; + } + + @Override + public Accum mergeAccumulators(Iterable accums) { + CountIfFn.Accum merged = createAccumulator(); + for (CountIfFn.Accum accum : accums) { + if (!accum.isExpressionFalse) { + merged.countIfResult += accum.countIfResult; + } + } + return merged; + } + + @Override + public Long extractOutput(Accum accum) { + if (!accum.isExpressionFalse) { + return accum.countIfResult; + } + return 0L; + } + } +} diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/agg/CovarianceFn.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/agg/CovarianceFn.java index 579955fef6a0..f8112e29834e 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/agg/CovarianceFn.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/agg/CovarianceFn.java @@ -17,7 +17,7 @@ */ package org.apache.beam.sdk.extensions.sql.impl.transform.agg; -import static org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Preconditions.checkArgument; import java.math.BigDecimal; import java.math.MathContext; @@ -32,7 +32,7 @@ import org.apache.beam.sdk.transforms.Combine; import org.apache.beam.sdk.transforms.SerializableFunction; import org.apache.beam.sdk.values.Row; -import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.runtime.SqlFunctions; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.runtime.SqlFunctions; /** * {@link Combine.CombineFn} for Covariance on {@link Number} types. diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/agg/VarianceFn.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/agg/VarianceFn.java index f0ffbd495ec3..ae1d9c8e5641 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/agg/VarianceFn.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/agg/VarianceFn.java @@ -29,7 +29,7 @@ import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.transforms.Combine; import org.apache.beam.sdk.transforms.SerializableFunction; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.runtime.SqlFunctions; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.runtime.SqlFunctions; /** * {@link Combine.CombineFn} for Variance on {@link Number} types. diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/udaf/ArrayAgg.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/udaf/ArrayAgg.java new file mode 100644 index 000000000000..721b4fa44922 --- /dev/null +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/udaf/ArrayAgg.java @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.sql.impl.udaf; + +import java.util.ArrayList; +import java.util.List; +import org.apache.beam.sdk.transforms.Combine; + +public class ArrayAgg { + + public static class ArrayAggArray extends Combine.CombineFn, List> { + @Override + public List createAccumulator() { + return new ArrayList<>(); + } + + @Override + public List addInput(List accum, Object input) { + accum.add(input); + return accum; + } + + @Override + public List mergeAccumulators(Iterable> accums) { + List merged = new ArrayList<>(); + for (List accum : accums) { + for (Object o : accum) { + merged.add(o); + } + } + return merged; + } + + @Override + public List extractOutput(List accumulator) { + return accumulator; + } + } +} diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/udf/BuiltinHashFunctions.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/udf/BuiltinHashFunctions.java index c3fc82b1fdb5..339b6e471c19 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/udf/BuiltinHashFunctions.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/udf/BuiltinHashFunctions.java @@ -19,7 +19,7 @@ import com.google.auto.service.AutoService; import org.apache.beam.sdk.schemas.Schema; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.function.Strict; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.function.Strict; /** Hash Functions. */ @AutoService(BeamBuiltinFunctionProvider.class) diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/udf/BuiltinStringFunctions.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/udf/BuiltinStringFunctions.java index b7f931821846..490b6f8ec248 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/udf/BuiltinStringFunctions.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/udf/BuiltinStringFunctions.java @@ -24,7 +24,7 @@ import org.apache.beam.repackaged.core.org.apache.commons.lang3.ArrayUtils; import org.apache.beam.repackaged.core.org.apache.commons.lang3.StringUtils; import org.apache.beam.sdk.schemas.Schema.TypeName; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.function.Strict; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.function.Strict; import org.apache.commons.codec.DecoderException; import org.apache.commons.codec.binary.Hex; diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/utils/BigDecimalConverter.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/utils/BigDecimalConverter.java index d00e6d6d2d6b..d987c38c775a 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/utils/BigDecimalConverter.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/utils/BigDecimalConverter.java @@ -21,7 +21,7 @@ import java.util.Map; import org.apache.beam.sdk.schemas.Schema.TypeName; import org.apache.beam.sdk.transforms.SerializableFunction; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableMap; /** * Provides converters from {@link BigDecimal} to other numeric types based on the input {@link diff 
--git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/utils/CalciteUtils.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/utils/CalciteUtils.java index acd6e6c6b046..75ad2ebc5c90 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/utils/CalciteUtils.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/utils/CalciteUtils.java @@ -17,6 +17,7 @@ */ package org.apache.beam.sdk.extensions.sql.impl.utils; +import java.lang.reflect.GenericArrayType; import java.lang.reflect.ParameterizedType; import java.lang.reflect.Type; import java.util.Date; @@ -27,21 +28,20 @@ import org.apache.beam.sdk.schemas.Schema.TypeName; import org.apache.beam.sdk.schemas.logicaltypes.PassThroughLogicalType; import org.apache.beam.sdk.schemas.logicaltypes.SqlTypes; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.BiMap; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableBiMap; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableMap; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.util.ByteString; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeField; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeName; +import org.apache.beam.sdk.util.Preconditions; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.BiMap; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableBiMap; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.util.ByteString; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeField; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlTypeNameSpec; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlTypeName; import org.joda.time.Instant; import org.joda.time.base.AbstractInstant; /** Utility methods for Calcite related operations. 
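// ----------------------------------------------------------------------------------------------
// Illustrative sketch (not part of the patch): the ArrayAgg.ArrayAggArray CombineFn added a
// little earlier in this patch simply collects its inputs into a list, which is what backs
// ARRAY_AGG. Minimal sketch of that contract:
ArrayAgg.ArrayAggArray arrayAgg = new ArrayAgg.ArrayAggArray();
List<Object> acc = arrayAgg.createAccumulator();
acc = arrayAgg.addInput(acc, "a");
acc = arrayAgg.addInput(acc, "b");
List<Object> out = arrayAgg.extractOutput(acc);  // ["a", "b"]
// ----------------------------------------------------------------------------------------------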
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class CalciteUtils { private static final long UNLIMITED_ARRAY_SIZE = -1L; @@ -73,7 +73,9 @@ public static boolean isDateTimeType(FieldType fieldType) { } if (fieldType.getTypeName().isLogicalType()) { - String logicalId = fieldType.getLogicalType().getIdentifier(); + Schema.LogicalType logicalType = fieldType.getLogicalType(); + Preconditions.checkArgumentNotNull(logicalType); + String logicalId = logicalType.getIdentifier(); return logicalId.equals(SqlTypes.DATE.getIdentifier()) || logicalId.equals(SqlTypes.TIME.getIdentifier()) || logicalId.equals(TimeWithLocalTzType.IDENTIFIER) @@ -88,7 +90,9 @@ public static boolean isStringType(FieldType fieldType) { } if (fieldType.getTypeName().isLogicalType()) { - String logicalId = fieldType.getLogicalType().getIdentifier(); + Schema.LogicalType logicalType = fieldType.getLogicalType(); + Preconditions.checkArgumentNotNull(logicalType); + String logicalId = logicalType.getIdentifier(); return logicalId.equals(CharType.IDENTIFIER); } return false; @@ -198,6 +202,10 @@ public static SqlTypeName toSqlTypeName(FieldType type) { } } + public static FieldType toFieldType(SqlTypeNameSpec sqlTypeName) { + return toFieldType(SqlTypeName.get(sqlTypeName.getTypeName().getSimple())); + } + public static FieldType toFieldType(SqlTypeName sqlTypeName) { switch (sqlTypeName) { case MAP: @@ -210,7 +218,12 @@ public static FieldType toFieldType(SqlTypeName sqlTypeName) { + "so it cannot be converted to a %s", sqlTypeName, Schema.FieldType.class.getSimpleName())); default: - return CALCITE_TO_BEAM_TYPE_MAPPING.get(sqlTypeName); + FieldType fieldType = CALCITE_TO_BEAM_TYPE_MAPPING.get(sqlTypeName); + if (fieldType == null) { + throw new IllegalArgumentException( + "Cannot find a matching Beam FieldType for Calcite type: " + sqlTypeName); + } + return fieldType; } } @@ -234,7 +247,12 @@ public static FieldType toFieldType(RelDataType calciteType) { return FieldType.row(toSchema(calciteType)); default: - return toFieldType(calciteType.getSqlTypeName()); + try { + return toFieldType(calciteType.getSqlTypeName()).withNullable(calciteType.isNullable()); + } catch (IllegalArgumentException e) { + throw new IllegalArgumentException( + "Cannot find a matching Beam FieldType for Calcite type: " + calciteType, e); + } } } @@ -254,16 +272,22 @@ public static RelDataType toRelDataType(RelDataTypeFactory dataTypeFactory, Fiel switch (fieldType.getTypeName()) { case ARRAY: case ITERABLE: + FieldType collectionElementType = fieldType.getCollectionElementType(); + Preconditions.checkArgumentNotNull(collectionElementType); return dataTypeFactory.createArrayType( - toRelDataType(dataTypeFactory, fieldType.getCollectionElementType()), - UNLIMITED_ARRAY_SIZE); + toRelDataType(dataTypeFactory, collectionElementType), UNLIMITED_ARRAY_SIZE); case MAP: - RelDataType componentKeyType = toRelDataType(dataTypeFactory, fieldType.getMapKeyType()); - RelDataType componentValueType = - toRelDataType(dataTypeFactory, fieldType.getMapValueType()); + FieldType mapKeyType = fieldType.getMapKeyType(); + FieldType mapValueType = fieldType.getMapValueType(); + Preconditions.checkArgumentNotNull(mapKeyType); + Preconditions.checkArgumentNotNull(mapValueType); + RelDataType componentKeyType = toRelDataType(dataTypeFactory, mapKeyType); + RelDataType componentValueType = toRelDataType(dataTypeFactory, mapValueType); return dataTypeFactory.createMapType(componentKeyType, componentValueType); case ROW: 
- return toCalciteRowType(fieldType.getRowSchema(), dataTypeFactory); + Schema schema = fieldType.getRowSchema(); + Preconditions.checkArgumentNotNull(schema); + return toCalciteRowType(schema, dataTypeFactory); default: return dataTypeFactory.createSqlType(toSqlTypeName(fieldType)); } @@ -280,7 +304,8 @@ private static RelDataType toRelDataType( /** * SQL-Java type mapping, with specified Beam rules:
    * 1. redirect {@link AbstractInstant} to {@link Date} so Calcite can recognize it.
    - * 2. For a list, the component type is needed to create a Sql array type. + * 2. For a list, the component type is needed to create a Sql array type. <br>
    + * 3. For a Map, the component type is needed to create a Sql map type. * * @param type * @return Calcite RelDataType @@ -291,13 +316,25 @@ public static RelDataType sqlTypeWithAutoCast(RelDataTypeFactory typeFactory, Ty return typeFactory.createJavaType(Date.class); } else if (type instanceof Class && ByteString.class.isAssignableFrom((Class) type)) { return typeFactory.createJavaType(byte[].class); - } else if (type instanceof ParameterizedType - && java.util.List.class.isAssignableFrom( - (Class) ((ParameterizedType) type).getRawType())) { + } else if (type instanceof ParameterizedType) { ParameterizedType parameterizedType = (ParameterizedType) type; - Class genericType = (Class) parameterizedType.getActualTypeArguments()[0]; - RelDataType collectionElementType = typeFactory.createJavaType(genericType); - return typeFactory.createArrayType(collectionElementType, UNLIMITED_ARRAY_SIZE); + if (java.util.List.class.isAssignableFrom((Class) parameterizedType.getRawType())) { + RelDataType elementType = + sqlTypeWithAutoCast(typeFactory, parameterizedType.getActualTypeArguments()[0]); + return typeFactory.createArrayType(elementType, UNLIMITED_ARRAY_SIZE); + } else if (java.util.Map.class.isAssignableFrom((Class) parameterizedType.getRawType())) { + RelDataType mapElementKeyType = + sqlTypeWithAutoCast(typeFactory, parameterizedType.getActualTypeArguments()[0]); + RelDataType mapElementValueType = + sqlTypeWithAutoCast(typeFactory, parameterizedType.getActualTypeArguments()[1]); + return typeFactory.createMapType(mapElementKeyType, mapElementValueType); + } + } else if (type instanceof GenericArrayType) { + throw new IllegalArgumentException( + "Cannot infer types from " + + type + + ". This is currently unsupported, use List instead " + + "of Array."); } return typeFactory.createJavaType((Class) type); } diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/utils/SerializableRexFieldAccess.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/utils/SerializableRexFieldAccess.java index ce75b92f8f50..62a22ae534db 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/utils/SerializableRexFieldAccess.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/utils/SerializableRexFieldAccess.java @@ -20,8 +20,8 @@ import java.util.ArrayList; import java.util.Collections; import java.util.List; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexFieldAccess; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexFieldAccess; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexInputRef; /** SerializableRexFieldAccess. 
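// ----------------------------------------------------------------------------------------------
// Illustrative sketch (not part of the patch): behaviour of the reworked sqlTypeWithAutoCast
// above. Lists still become Calcite ARRAY types (now recursing on the element type),
// parameterized Maps newly become Calcite MAP types, and reflective array types are rejected
// with a pointer to use List instead. The use of Guava's TypeToken and Calcite's
// JavaTypeFactoryImpl here is an assumption for illustration only.
RelDataTypeFactory typeFactory = new JavaTypeFactoryImpl();
Type listOfString = new TypeToken<List<String>>() {}.getType();
Type mapStringToLong = new TypeToken<Map<String, Long>>() {}.getType();
RelDataType arrayType = CalciteUtils.sqlTypeWithAutoCast(typeFactory, listOfString);   // ARRAY
RelDataType mapType = CalciteUtils.sqlTypeWithAutoCast(typeFactory, mapStringToLong);  // MAP
// ----------------------------------------------------------------------------------------------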
*/ public class SerializableRexFieldAccess extends SerializableRexNode { diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/utils/SerializableRexInputRef.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/utils/SerializableRexInputRef.java index 4d4d364c8d00..d7b4ec7154c5 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/utils/SerializableRexInputRef.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/utils/SerializableRexInputRef.java @@ -17,7 +17,7 @@ */ package org.apache.beam.sdk.extensions.sql.impl.utils; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexInputRef; /** SerializableRexInputRef. */ public class SerializableRexInputRef extends SerializableRexNode { diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/utils/SerializableRexNode.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/utils/SerializableRexNode.java index 9796bf31b944..b4ef1a031041 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/utils/SerializableRexNode.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/utils/SerializableRexNode.java @@ -18,9 +18,9 @@ package org.apache.beam.sdk.extensions.sql.impl.utils; import java.io.Serializable; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexFieldAccess; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexFieldAccess; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexInputRef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; /** SerializableRexNode. */ public abstract class SerializableRexNode implements Serializable { diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BaseBeamTable.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BaseBeamTable.java index d9149551904d..c8ad21a260e9 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BaseBeamTable.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BaseBeamTable.java @@ -21,7 +21,7 @@ import org.apache.beam.sdk.values.PBegin; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; /** Basic implementation of {@link BeamSqlTable} methods used by predicate and filter push-down. 
*/ public abstract class BaseBeamTable implements BeamSqlTable { diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BeamSqlTable.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BeamSqlTable.java index be2c205389e6..de13042a276f 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BeamSqlTable.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BeamSqlTable.java @@ -25,7 +25,7 @@ import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.POutput; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; /** This interface defines a Beam Sql Table. */ public interface BeamSqlTable { diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BeamSqlTableFilter.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BeamSqlTableFilter.java index 623c27eae4c0..0828eccbbb96 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BeamSqlTableFilter.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BeamSqlTableFilter.java @@ -18,8 +18,8 @@ package org.apache.beam.sdk.extensions.sql.meta; import java.util.List; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; /** This interface defines Beam SQL Table Filter. */ public interface BeamSqlTableFilter { diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/DefaultTableFilter.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/DefaultTableFilter.java index 16a6906da316..eb8d7225d677 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/DefaultTableFilter.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/DefaultTableFilter.java @@ -18,7 +18,7 @@ package org.apache.beam.sdk.extensions.sql.meta; import java.util.List; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; /** * This default implementation of {@link BeamSqlTableFilter} interface. 
Assumes that predicate @@ -27,7 +27,7 @@ public final class DefaultTableFilter implements BeamSqlTableFilter { private final List filters; - DefaultTableFilter(List filters) { + public DefaultTableFilter(List filters) { this.filters = filters; } diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/ReadOnlyTableProvider.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/ReadOnlyTableProvider.java index ce6a00e15f60..d00166216cce 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/ReadOnlyTableProvider.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/ReadOnlyTableProvider.java @@ -21,7 +21,7 @@ import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable; import org.apache.beam.sdk.extensions.sql.meta.Table; import org.apache.beam.sdk.schemas.Schema; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableMap; /** * A {@code ReadOnlyTableProvider} provides in-memory read only set of {@code BeamSqlTable diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/SchemaIOTableProviderWrapper.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/SchemaIOTableProviderWrapper.java index fb3dca3457b1..88f070c68751 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/SchemaIOTableProviderWrapper.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/SchemaIOTableProviderWrapper.java @@ -22,16 +22,22 @@ import com.alibaba.fastjson.JSONObject; import com.fasterxml.jackson.core.JsonProcessingException; import java.io.Serializable; +import java.util.List; import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.annotations.Internal; import org.apache.beam.sdk.extensions.sql.impl.BeamTableStatistics; import org.apache.beam.sdk.extensions.sql.meta.BaseBeamTable; import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable; +import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTableFilter; +import org.apache.beam.sdk.extensions.sql.meta.DefaultTableFilter; +import org.apache.beam.sdk.extensions.sql.meta.ProjectSupport; import org.apache.beam.sdk.extensions.sql.meta.Table; import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.schemas.FieldAccessDescriptor; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.schemas.io.InvalidConfigurationException; import org.apache.beam.sdk.schemas.io.InvalidSchemaException; +import org.apache.beam.sdk.schemas.io.PushdownProjector; import org.apache.beam.sdk.schemas.io.SchemaIO; import org.apache.beam.sdk.schemas.io.SchemaIOProvider; import org.apache.beam.sdk.transforms.PTransform; @@ -118,6 +124,43 @@ public PCollection buildIOReader(PBegin begin) { return begin.apply(readerTransform); } + @Override + public PCollection buildIOReader( + PBegin begin, BeamSqlTableFilter filters, List fieldNames) { + PTransform> readerTransform = schemaIO.buildReader(); + if (!(filters instanceof DefaultTableFilter)) { + throw new UnsupportedOperationException( + String.format( + "Filter pushdown is not yet supported in %s. 
BEAM-12663", + SchemaIOTableWrapper.class)); + } + if (!fieldNames.isEmpty()) { + if (readerTransform instanceof PushdownProjector) { + // The pushdown must return a PTransform that can be applied to a PBegin, or this cast + // will fail. + PushdownProjector pushdownProjector = (PushdownProjector) readerTransform; + FieldAccessDescriptor fieldAccessDescriptor = + FieldAccessDescriptor.withFieldNames(fieldNames); + readerTransform = pushdownProjector.withProjectionPushdown(fieldAccessDescriptor); + } else { + throw new UnsupportedOperationException( + String.format("%s does not support projection pushdown.", this.getClass())); + } + } + return begin.apply(readerTransform); + } + + @Override + public ProjectSupport supportsProjects() { + PTransform> readerTransform = schemaIO.buildReader(); + if (readerTransform instanceof PushdownProjector) { + return ((PushdownProjector) readerTransform).supportsFieldReordering() + ? ProjectSupport.WITH_FIELD_REORDERING + : ProjectSupport.WITHOUT_FIELD_REORDERING; + } + return ProjectSupport.NONE; + } + @Override public POutput buildIOWriter(PCollection input) { PTransform, ? extends POutput> writerTransform = schemaIO.buildWriter(); diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/UdfUdafProvider.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/UdfUdafProvider.java index 8b3d95d26b36..85c917d30232 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/UdfUdafProvider.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/UdfUdafProvider.java @@ -19,6 +19,7 @@ import java.util.Collections; import java.util.Map; +import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.extensions.sql.BeamSqlUdf; import org.apache.beam.sdk.transforms.Combine; import org.apache.beam.sdk.transforms.SerializableFunction; @@ -27,6 +28,7 @@ @SuppressWarnings({ "rawtypes" // TODO(https://issues.apache.org/jira/browse/BEAM-10556) }) +@Experimental public interface UdfUdafProvider { /** For UDFs implement {@link BeamSqlUdf}. 
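// ----------------------------------------------------------------------------------------------
// Illustrative sketch (not part of the patch): intended behaviour of the projection pushdown
// added to SchemaIOTableProviderWrapper above. `table` is a hypothetical BeamSqlTable produced
// by such a wrapper for a SchemaIO whose reader transform implements PushdownProjector.
List<String> projected = ImmutableList.of("id", "name");
if (table.supportsProjects() != ProjectSupport.NONE) {
  // Filters cannot be pushed down yet (BEAM-12663), so only a DefaultTableFilter is accepted;
  // with field names supplied, only the projected columns are read from the source.
  PCollection<Row> rows =
      table.buildIOReader(
          pipeline.begin(), new DefaultTableFilter(ImmutableList.of()), projected);
}
// ----------------------------------------------------------------------------------------------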
*/ default Map> getBeamSqlUdfs() { diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamBigQuerySqlDialect.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamBigQuerySqlDialect.java index 7fa53f3bcda9..60e249dc21fd 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamBigQuerySqlDialect.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamBigQuerySqlDialect.java @@ -19,30 +19,17 @@ import java.util.List; import java.util.Map; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.util.Casing; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.util.TimeUnit; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.config.NullCollation; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeSystem; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlAbstractDateTimeLiteral; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlDataTypeSpec; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlDialect; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlIdentifier; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlIntervalLiteral; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlIntervalQualifier; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlLiteral; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlOperator; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlSetOperator; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlSyntax; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlTimestampLiteral; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlWriter; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.dialect.BigQuerySqlDialect; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.fun.SqlTrimFunction; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.parser.SqlParserPos; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.BasicSqlType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.util.Casing; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.config.NullCollation; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeSystem; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlAbstractDateTimeLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlDialect; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlIntervalLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlTimestampLiteral; +import 
org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlWriter; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.dialect.BigQuerySqlDialect; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; @@ -64,14 +51,6 @@ public class BeamBigQuerySqlDialect extends BigQuerySqlDialect { public static final SqlDialect DEFAULT = new BeamBigQuerySqlDialect(DEFAULT_CONTEXT); - // List of BigQuery Specific Operators needed to form Syntactically Correct SQL - private static final SqlOperator UNION_DISTINCT = - new SqlSetOperator("UNION DISTINCT", SqlKind.UNION, 14, false); - private static final SqlSetOperator EXCEPT_DISTINCT = - new SqlSetOperator("EXCEPT DISTINCT", SqlKind.EXCEPT, 14, false); - private static final SqlSetOperator INTERSECT_DISTINCT = - new SqlSetOperator("INTERSECT DISTINCT", SqlKind.INTERSECT, 18, false); - // ZetaSQL defined functions that need special unparsing private static final List FUNCTIONS_USING_INTERVAL = ImmutableList.of( @@ -112,37 +91,10 @@ public String quoteIdentifier(String val) { return quoteIdentifier(new StringBuilder(), val).toString(); } - @Override - public SqlNode emulateNullDirection(SqlNode node, boolean nullsFirst, boolean desc) { - return emulateNullDirectionWithIsNull(node, nullsFirst, desc); - } - - @Override - public boolean supportsNestedAggregations() { - return false; - } - - @Override - public void unparseOffsetFetch(SqlWriter writer, SqlNode offset, SqlNode fetch) { - unparseFetchUsingLimit(writer, offset, fetch); - } - @Override public void unparseCall( final SqlWriter writer, final SqlCall call, final int leftPrec, final int rightPrec) { switch (call.getKind()) { - case POSITION: - final SqlWriter.Frame frame = writer.startFunCall("STRPOS"); - writer.sep(","); - call.operand(1).unparse(writer, leftPrec, rightPrec); - writer.sep(","); - call.operand(0).unparse(writer, leftPrec, rightPrec); - if (3 == call.operandCount()) { - throw new UnsupportedOperationException( - "3rd operand Not Supported for Function STRPOS in Big Query"); - } - writer.endFunCall(frame); - break; case ROW: final SqlWriter.Frame structFrame = writer.startFunCall("STRUCT"); for (SqlNode operand : call.getOperandList()) { @@ -151,24 +103,6 @@ public void unparseCall( } writer.endFunCall(structFrame); break; - case UNION: - if (!((SqlSetOperator) call.getOperator()).isAll()) { - SqlSyntax.BINARY.unparse(writer, UNION_DISTINCT, call, leftPrec, rightPrec); - } - break; - case EXCEPT: - if (!((SqlSetOperator) call.getOperator()).isAll()) { - SqlSyntax.BINARY.unparse(writer, EXCEPT_DISTINCT, call, leftPrec, rightPrec); - } - break; - case INTERSECT: - if (!((SqlSetOperator) call.getOperator()).isAll()) { - SqlSyntax.BINARY.unparse(writer, INTERSECT_DISTINCT, call, leftPrec, rightPrec); - } - break; - case TRIM: - unparseTrim(writer, call, leftPrec, rightPrec); - break; case OTHER_FUNCTION: String funName = call.getOperator().getName(); if (DOUBLE_LITERAL_WRAPPERS.containsKey(funName)) { @@ -216,51 +150,6 @@ public void unparseSqlIntervalLiteral( unparseSqlIntervalQualifier(writer, interval.getIntervalQualifier(), RelDataTypeSystem.DEFAULT); } - @Override - public void unparseSqlIntervalQualifier( - SqlWriter writer, SqlIntervalQualifier qualifier, RelDataTypeSystem typeSystem) { - final String start = validate(qualifier.timeUnitRange.startUnit).name(); - if (qualifier.timeUnitRange.endUnit == null) { - writer.keyword(start); - } else { - 
throw new UnsupportedOperationException("Range time unit is not supported for BigQuery."); - } - } - - /** - * For usage of TRIM, LTRIM and RTRIM in BQ see - * BQ Trim Function. - */ - private void unparseTrim(SqlWriter writer, SqlCall call, int leftPrec, int rightPrec) { - final String operatorName; - SqlLiteral trimFlag = call.operand(0); - SqlLiteral valueToTrim = call.operand(1); - switch (trimFlag.getValueAs(SqlTrimFunction.Flag.class)) { - case LEADING: - operatorName = "LTRIM"; - break; - case TRAILING: - operatorName = "RTRIM"; - break; - default: - operatorName = call.getOperator().getName(); - break; - } - final SqlWriter.Frame trimFrame = writer.startFunCall(operatorName); - call.operand(2).unparse(writer, leftPrec, rightPrec); - - /** - * If the trimmed character is non space character then add it to the target sql. eg: TRIM(BOTH - * 'A' from 'ABCD' Output Query: TRIM('ABC', 'A') - */ - if (!valueToTrim.toValue().matches("\\s+")) { - writer.literal(","); - call.operand(1).unparse(writer, leftPrec, rightPrec); - } - writer.endFunCall(trimFrame); - } - private void unparseDoubleLiteralWrapperFunction(SqlWriter writer, String funName) { writer.literal(DOUBLE_LITERAL_WRAPPERS.get(funName)); } @@ -349,78 +238,6 @@ private void unparseInArrayOperator(SqlWriter writer, SqlCall call, int leftPrec writer.literal(")"); } - private TimeUnit validate(TimeUnit timeUnit) { - switch (timeUnit) { - case MICROSECOND: - case MILLISECOND: - case SECOND: - case MINUTE: - case HOUR: - case DAY: - case WEEK: - case MONTH: - case QUARTER: - case YEAR: - case ISOYEAR: - return timeUnit; - default: - throw new UnsupportedOperationException( - "Time unit " + timeUnit + " is not supported for BigQuery."); - } - } - - /** - * BigQuery data type reference: Bigquery - * Standard SQL Data Types. - */ - @Override - public SqlNode getCastSpec(final RelDataType type) { - if (type instanceof BasicSqlType) { - switch (type.getSqlTypeName()) { - // BigQuery only supports INT64 for integer types. - case BIGINT: - case INTEGER: - case TINYINT: - case SMALLINT: - return typeFromName(type, "INT64"); - // BigQuery only supports FLOAT64(aka. Double) for floating point types. 
- case FLOAT: - case DOUBLE: - return typeFromName(type, "FLOAT64"); - case DECIMAL: - return typeFromName(type, "NUMERIC"); - case BOOLEAN: - return typeFromName(type, "BOOL"); - case CHAR: - case VARCHAR: - return typeFromName(type, "STRING"); - case VARBINARY: - case BINARY: - return typeFromName(type, "BYTES"); - case DATE: - return typeFromName(type, "DATE"); - case TIME: - return typeFromName(type, "TIME"); - case TIMESTAMP: - return typeFromName(type, "TIMESTAMP"); - default: - break; - } - } - return super.getCastSpec(type); - } - - private static SqlNode typeFromName(RelDataType type, String name) { - return new SqlDataTypeSpec( - new SqlIdentifier(name, SqlParserPos.ZERO), - type.getPrecision(), - -1, - null, - null, - SqlParserPos.ZERO); - } - @Override public void unparseDateTimeLiteral( SqlWriter writer, SqlAbstractDateTimeLiteral literal, int leftPrec, int rightPrec) { diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java index e55263d499b9..68ad64e415a8 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java @@ -17,7 +17,7 @@ */ package org.apache.beam.sdk.extensions.sql.meta.provider.bigquery; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rel2sql.SqlImplementor.POS; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.rel2sql.SqlImplementor.POS; import java.util.HashMap; import java.util.Map; @@ -27,24 +27,28 @@ import org.apache.beam.repackaged.core.org.apache.commons.lang3.text.translate.EntityArrays; import org.apache.beam.repackaged.core.org.apache.commons.lang3.text.translate.JavaUnicodeEscaper; import org.apache.beam.repackaged.core.org.apache.commons.lang3.text.translate.LookupTranslator; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.util.ByteString; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.util.TimeUnitRange; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rel2sql.SqlImplementor; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexDynamicParam; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLiteral; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlDynamicParam; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlLiteral; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlWriter; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.parser.SqlParserPos; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeFamily; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeName; -import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.BitString; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.TimestampString; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.util.ByteString; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.util.TimeUnitRange; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.rel2sql.RelToSqlConverter; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.rel2sql.SqlImplementor; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexDynamicParam; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLocalRef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexProgram; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlDynamicParam; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlWriter; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.parser.SqlParserPos; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlTypeFamily; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlTypeName; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.BitString; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.TimestampString; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.checkerframework.checker.nullness.qual.Nullable; @SuppressWarnings({ @@ -53,6 +57,13 @@ }) public class BeamSqlUnparseContext extends SqlImplementor.SimpleContext { + private final SqlImplementor imp = new RelToSqlConverter(BeamBigQuerySqlDialect.DEFAULT); + + @Override + public SqlImplementor implementor() { + return imp; + } + // More about escape sequences here: // https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical // No need to escape: \`, \?, \v, \a, \ooo, \xhh (since this in not a thing in Java) @@ -114,6 +125,12 @@ public SqlNode toSql(RexProgram program, RexNode rex) { final String name = "null_param_" + index; nullParams.put(name, param.getType()); return new NamedDynamicParam(index, POS, name); + } else if (SqlKind.SEARCH.equals(rex.getKind())) { + // Workaround CALCITE-4716 + RexCall search = (RexCall) rex; + RexLocalRef ref = (RexLocalRef) search.operands.get(1); + RexLiteral literal = (RexLiteral) program.getExprList().get(ref.getIndex()); + rex = search.clone(search.getType(), ImmutableList.of(search.operands.get(0), literal)); } return super.toSql(program, rex); diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryFilter.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryFilter.java index c028a7ea6287..526ae6c209a1 100644 --- 
a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryFilter.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryFilter.java @@ -17,29 +17,29 @@ */ package org.apache.beam.sdk.extensions.sql.meta.provider.bigquery; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind.AND; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind.BETWEEN; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind.CAST; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind.COMPARISON; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind.DIVIDE; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind.LIKE; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind.MINUS; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind.MOD; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind.OR; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind.PLUS; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind.TIMES; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind.AND; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind.BETWEEN; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind.CAST; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind.COMPARISON; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind.DIVIDE; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind.LIKE; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind.MINUS; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind.MOD; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind.OR; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind.PLUS; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind.TIMES; import java.util.ArrayList; import java.util.List; import java.util.stream.Collectors; import org.apache.beam.repackaged.core.org.apache.commons.lang3.tuple.Pair; import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTableFilter; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLiteral; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeName; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexInputRef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind; +import 
org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlTypeName; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; @SuppressWarnings({ diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTable.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTable.java index 6e6203d1e1e3..68b9e00e5b16 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTable.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTable.java @@ -48,13 +48,13 @@ import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.POutput; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.annotations.VisibleForTesting; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rel2sql.SqlImplementor; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlIdentifier; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.fun.SqlStdOperatorTable; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.parser.SqlParserPos; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.rel2sql.SqlImplementor; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlIdentifier; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.fun.SqlStdOperatorTable; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.parser.SqlParserPos; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.slf4j.Logger; import org.slf4j.LoggerFactory; diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTableProvider.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTableProvider.java index b1646aa69347..717cd239e7f5 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTableProvider.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTableProvider.java @@ -17,7 +17,7 @@ */ package org.apache.beam.sdk.extensions.sql.meta.provider.bigquery; -import static org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.MoreObjects.firstNonNull; +import static org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.MoreObjects.firstNonNull; import com.alibaba.fastjson.JSONObject; import com.google.auto.service.AutoService; diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableFilter.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableFilter.java new file mode 100644 index 000000000000..7c192212fda8 --- /dev/null +++ 
b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableFilter.java @@ -0,0 +1,136 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.meta.provider.bigtable; + +import static java.util.stream.Collectors.toList; +import static org.apache.beam.sdk.io.gcp.bigtable.RowUtils.KEY; +import static org.apache.beam.sdk.io.gcp.bigtable.RowUtils.byteStringUtf8; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind.LIKE; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import com.google.bigtable.v2.RowFilter; +import java.util.List; +import java.util.stream.Collectors; +import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTableFilter; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexInputRef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; + +/** + * BigtableFilter for queries with WHERE clause. + * + *
    Currently only queries with a single LIKE statement by key field with RE2 Syntax regex type are supported, e.g. + * `SELECT * FROM table WHERE key LIKE '^key\d'` + */ +class BigtableFilter implements BeamSqlTableFilter { + private final List supported; + private final List unsupported; + private final Schema schema; + + BigtableFilter(List predicateCNF, Schema schema) { + supported = predicateCNF.stream().filter(BigtableFilter::isSupported).collect(toList()); + unsupported = + predicateCNF.stream().filter(predicate -> !isSupported(predicate)).collect(toList()); + this.schema = schema; + } + + @Override + public List getNotSupported() { + return unsupported; + } + + @Override + public int numSupported() { + return BeamSqlTableFilter.expressionsInFilter(supported); + } + + public List getSupported() { + return supported; + } + + @Override + public String toString() { + String supStr = supported.stream().map(RexNode::toString).collect(Collectors.joining()); + String unsupStr = unsupported.stream().map(RexNode::toString).collect(Collectors.joining()); + return String.format("[supported{%s}, unsupported{%s}]", supStr, unsupStr); + } + + RowFilter getFilters() { + checkArgument( + supported.size() == 1, + String.format("Only one LIKE operation is allowed. Got %s operations", supported.size())); + return translateRexNodeToRowFilter(supported.get(0)); + } + + private RowFilter translateRexNodeToRowFilter(RexNode node) { + checkNodeIsCoposite(node); + checkArgument(LIKE.equals(node.getKind()), "Only LIKE operation is supported."); + + List literals = filterOperands((RexCall) node, RexLiteral.class); + List inputRefs = filterOperands((RexCall) node, RexInputRef.class); + + checkArgument(literals.size() == 1); + checkArgument(inputRefs.size() == 1); + + checkFieldIsKey(inputRefs.get(0)); + String literal = literals.get(0).getValueAs(String.class); + + return RowFilter.newBuilder().setRowKeyRegexFilter(byteStringUtf8(literal)).build(); + } + + private void checkFieldIsKey(RexInputRef inputRef) { + String inputFieldName = schema.getField(inputRef.getIndex()).getName(); + checkArgument( + KEY.equals(inputFieldName), + "Only 'key' queries are supported. Got field " + inputFieldName); + } + + private static boolean isSupported(RexNode node) { + checkNodeIsCoposite(node); + if (!LIKE.equals(node.getKind())) { + return false; + } + + long literalsCount = countOperands((RexCall) node, RexLiteral.class); + long fieldsCount = countOperands((RexCall) node, RexInputRef.class); + + return literalsCount == 1 && fieldsCount == 1; + } + + private List filterOperands(RexCall compositeNode, Class clazz) { + return compositeNode.getOperands().stream() + .filter(clazz::isInstance) + .map(clazz::cast) + .collect(toList()); + } + + private static long countOperands(RexCall compositeNode, Class clazz) { + return compositeNode.getOperands().stream().filter(clazz::isInstance).count(); + } + + private static void checkNodeIsCoposite(RexNode node) { + checkArgument( + node instanceof RexCall, + String.format( + "Encountered an unexpected node type: %s. 
Should be %s", + node.getClass().getSimpleName(), RexCall.class.getSimpleName())); + } +} diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableTable.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableTable.java index da27d348e40b..a7d1012718d2 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableTable.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableTable.java @@ -34,6 +34,7 @@ import java.util.regex.Pattern; import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.extensions.sql.impl.BeamTableStatistics; +import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTableFilter; import org.apache.beam.sdk.extensions.sql.meta.SchemaBaseBeamTable; import org.apache.beam.sdk.extensions.sql.meta.Table; import org.apache.beam.sdk.extensions.sql.meta.provider.InvalidTableException; @@ -43,10 +44,12 @@ import org.apache.beam.sdk.io.gcp.bigtable.BigtableRowToBeamRowFlat; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.values.PBegin; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.POutput; import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Splitter; @Experimental @@ -95,21 +98,23 @@ public class BigtableTable extends SchemaBaseBeamTable implements Serializable { @Override public PCollection buildIOReader(PBegin begin) { - BigtableIO.Read readTransform = - BigtableIO.read().withProjectId(projectId).withInstanceId(instanceId).withTableId(tableId); - if (!emulatorHost.isEmpty()) { - readTransform = readTransform.withEmulator(emulatorHost); - } - return readTransform + return readTransform() .expand(begin) - .apply( - "BigtableRowToBeamRow", - useFlatSchema - ? 
new BigtableRowToBeamRowFlat(schema, columnsMapping) - : new BigtableRowToBeamRow(schema)) + .apply("BigtableRowToBeamRow", bigtableRowToRow()) .setRowSchema(schema); } + @Override + public PCollection buildIOReader( + PBegin begin, BeamSqlTableFilter filters, List fieldNames) { + BigtableIO.Read readTransform = readTransform(); + if (filters instanceof BigtableFilter) { + BigtableFilter bigtableFilter = (BigtableFilter) filters; + readTransform = readTransform.withRowFilter(bigtableFilter.getFilters()); + } + return readTransform.expand(begin).apply(bigtableRowToRow()); + } + @Override public POutput buildIOWriter(PCollection input) { if (!useFlatSchema) { @@ -134,6 +139,11 @@ public BeamTableStatistics getTableStatistics(PipelineOptions options) { return BeamTableStatistics.BOUNDED_UNKNOWN; } + @Override + public BeamSqlTableFilter constructFilter(List filter) { + return new BigtableFilter(filter, schema); + } + private static Map> parseColumnsMapping(String commaSeparatedMapping) { Map> columnsMapping = new HashMap<>(); Splitter.on(",") @@ -208,4 +218,19 @@ private static void validateColumnsMappingFields( allMappingQualifiers, schemaFieldNames)); } } + + private BigtableIO.Read readTransform() { + BigtableIO.Read readTransform = + BigtableIO.read().withProjectId(projectId).withInstanceId(instanceId).withTableId(tableId); + if (!emulatorHost.isEmpty()) { + readTransform = readTransform.withEmulator(emulatorHost); + } + return readTransform; + } + + private PTransform, PCollection> bigtableRowToRow() { + return useFlatSchema + ? new BigtableRowToBeamRowFlat(schema, columnsMapping) + : new BigtableRowToBeamRow(schema); + } } diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaAvroTable.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaAvroTable.java deleted file mode 100644 index 4045106584b5..000000000000 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaAvroTable.java +++ /dev/null @@ -1,89 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ -package org.apache.beam.sdk.extensions.sql.meta.provider.kafka; - -import java.util.List; -import org.apache.beam.sdk.schemas.Schema; -import org.apache.beam.sdk.schemas.utils.AvroUtils; -import org.apache.beam.sdk.transforms.MapElements; -import org.apache.beam.sdk.transforms.PTransform; -import org.apache.beam.sdk.transforms.SimpleFunction; -import org.apache.beam.sdk.values.KV; -import org.apache.beam.sdk.values.PCollection; -import org.apache.beam.sdk.values.Row; -import org.apache.beam.sdk.values.TypeDescriptor; - -public class BeamKafkaAvroTable extends BeamKafkaTable { - - public BeamKafkaAvroTable(Schema beamSchema, String bootstrapServers, List topics) { - super(beamSchema, bootstrapServers, topics); - } - - @Override - protected PTransform>, PCollection> getPTransformForInput() { - return new AvroRecorderDecoder(schema); - } - - @Override - protected PTransform, PCollection>> getPTransformForOutput() { - return new AvroRecorderEncoder(schema); - } - - /** A PTransform to convert {@code KV} to {@link Row}. */ - private static class AvroRecorderDecoder - extends PTransform>, PCollection> { - private final Schema schema; - - AvroRecorderDecoder(Schema schema) { - this.schema = schema; - } - - @Override - public PCollection expand(PCollection> input) { - return input - .apply( - "extractValue", MapElements.into(TypeDescriptor.of(byte[].class)).via(KV::getValue)) - .apply("decodeAvroRecord", MapElements.via(AvroUtils.getAvroBytesToRowFunction(schema))) - .setRowSchema(schema); - } - } - - /** A PTransform to convert {@link Row} to {@code KV}. */ - private static class AvroRecorderEncoder - extends PTransform, PCollection>> { - private final Schema schema; - - AvroRecorderEncoder(Schema schema) { - this.schema = schema; - } - - @Override - public PCollection> expand(PCollection input) { - return input - .apply("encodeAvroRecord", MapElements.via(AvroUtils.getRowToAvroBytesFunction(schema))) - .apply("mapToKV", MapElements.via(new MakeBytesKVFn())); - } - - private static class MakeBytesKVFn extends SimpleFunction> { - @Override - public KV apply(byte[] bytes) { - return KV.of(new byte[] {}, bytes); - } - } - } -} diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaCSVTable.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaCSVTable.java index d03086967678..51f8719e89e6 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaCSVTable.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaCSVTable.java @@ -20,8 +20,10 @@ import static java.nio.charset.StandardCharsets.UTF_8; import static org.apache.beam.sdk.extensions.sql.impl.schema.BeamTableUtils.beamRow2CsvLine; import static org.apache.beam.sdk.extensions.sql.impl.schema.BeamTableUtils.csvLines2BeamRows; +import static org.apache.beam.sdk.util.Preconditions.checkArgumentNotNull; import java.util.List; +import org.apache.beam.sdk.io.kafka.KafkaRecord; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.PTransform; @@ -29,7 +31,9 @@ import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.apache.commons.csv.CSVFormat; +import 
org.apache.kafka.clients.producer.ProducerRecord; /** A Kafka topic that saves records as CSV format. */ public class BeamKafkaCSVTable extends BeamKafkaTable { @@ -46,18 +50,20 @@ public BeamKafkaCSVTable( } @Override - protected PTransform>, PCollection> getPTransformForInput() { + protected PTransform>, PCollection> + getPTransformForInput() { return new CsvRecorderDecoder(schema, csvFormat); } @Override - protected PTransform, PCollection>> getPTransformForOutput() { - return new CsvRecorderEncoder(csvFormat); + protected PTransform, PCollection>> + getPTransformForOutput() { + return new CsvRecorderEncoder(csvFormat, Iterables.getOnlyElement(getTopics())); } /** A PTransform to convert {@code KV} to {@link Row}. */ private static class CsvRecorderDecoder - extends PTransform>, PCollection> { + extends PTransform>, PCollection> { private final Schema schema; private final CSVFormat format; @@ -67,15 +73,16 @@ private static class CsvRecorderDecoder } @Override - public PCollection expand(PCollection> input) { + public PCollection expand(PCollection> input) { return input .apply( "decodeCsvRecord", ParDo.of( - new DoFn, Row>() { + new DoFn, Row>() { @ProcessElement public void processElement(ProcessContext c) { - String rowInString = new String(c.element().getValue(), UTF_8); + KV kv = checkArgumentNotNull(c.element()).getKV(); + String rowInString = new String(checkArgumentNotNull(kv.getValue()), UTF_8); for (Row row : csvLines2BeamRows(format, rowInString, schema)) { c.output(row); } @@ -87,23 +94,27 @@ public void processElement(ProcessContext c) { /** A PTransform to convert {@link Row} to {@code KV}. */ private static class CsvRecorderEncoder - extends PTransform, PCollection>> { + extends PTransform, PCollection>> { private final CSVFormat format; + private final String topic; - CsvRecorderEncoder(CSVFormat format) { + CsvRecorderEncoder(CSVFormat format, String topic) { this.format = format; + this.topic = topic; } @Override - public PCollection> expand(PCollection input) { + public PCollection> expand(PCollection input) { return input.apply( "encodeCsvRecord", ParDo.of( - new DoFn>() { + new DoFn>() { @ProcessElement public void processElement(ProcessContext c) { - Row in = c.element(); - c.output(KV.of(new byte[] {}, beamRow2CsvLine(in, format).getBytes(UTF_8))); + Row in = checkArgumentNotNull(c.element()); + c.output( + new ProducerRecord<>( + topic, new byte[] {}, beamRow2CsvLine(in, format).getBytes(UTF_8))); } })); } diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaJsonTable.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaJsonTable.java deleted file mode 100644 index 16a5d8e09630..000000000000 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaJsonTable.java +++ /dev/null @@ -1,107 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. 
You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package org.apache.beam.sdk.extensions.sql.meta.provider.kafka; - -import static java.nio.charset.StandardCharsets.UTF_8; - -import com.fasterxml.jackson.databind.ObjectMapper; -import java.util.List; -import org.apache.beam.sdk.schemas.Schema; -import org.apache.beam.sdk.transforms.DoFn; -import org.apache.beam.sdk.transforms.PTransform; -import org.apache.beam.sdk.transforms.ParDo; -import org.apache.beam.sdk.util.RowJson; -import org.apache.beam.sdk.util.RowJsonUtils; -import org.apache.beam.sdk.values.KV; -import org.apache.beam.sdk.values.PCollection; -import org.apache.beam.sdk.values.Row; - -public class BeamKafkaJsonTable extends BeamKafkaTable { - public BeamKafkaJsonTable(Schema beamSchema, String bootstrapServers, List topics) { - super(beamSchema, bootstrapServers, topics); - } - - @Override - public PTransform>, PCollection> getPTransformForInput() { - ObjectMapper objectMapper = - RowJsonUtils.newObjectMapperWith(RowJson.RowJsonDeserializer.forSchema(schema)); - return new BeamKafkaJsonTable.JsonRecorderDecoder(schema, objectMapper); - } - - @Override - public PTransform, PCollection>> getPTransformForOutput() { - ObjectMapper objectMapper = - RowJsonUtils.newObjectMapperWith(RowJson.RowJsonSerializer.forSchema(schema)); - return new BeamKafkaJsonTable.JsonRecorderEncoder(objectMapper); - } - - /** A PTransform to convert {@code KV} to {@link Row}. */ - private static class JsonRecorderDecoder - extends PTransform>, PCollection> { - private final Schema schema; - private final ObjectMapper objectMapper; - - public JsonRecorderDecoder(Schema schema, ObjectMapper objectMapper) { - this.schema = schema; - this.objectMapper = objectMapper; - } - - @Override - public PCollection expand(PCollection> input) { - return input - .apply( - "decodeJsonRecord", - ParDo.of( - new DoFn, Row>() { - @ProcessElement - public void processElement(ProcessContext c) { - String rowInString = new String(c.element().getValue(), UTF_8); - Row row = RowJsonUtils.jsonToRow(objectMapper, rowInString); - c.output(row); - } - })) - .setRowSchema(schema); - } - } - - /** A PTransform to convert {@link Row} to {@code KV}. 
*/ - private static class JsonRecorderEncoder - extends PTransform, PCollection>> { - private final ObjectMapper objectMapper; - - public JsonRecorderEncoder(ObjectMapper objectMapper) { - this.objectMapper = objectMapper; - } - - @Override - public PCollection> expand(PCollection input) { - return input.apply( - "encodeJsonRecord", - ParDo.of( - new DoFn>() { - @ProcessElement - public void processElement(ProcessContext c) { - c.output( - KV.of( - new byte[] {}, - RowJsonUtils.rowToJson(objectMapper, c.element()).getBytes(UTF_8))); - } - })); - } - } -} diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaProtoTable.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaProtoTable.java deleted file mode 100644 index 7dc6134ffdef..000000000000 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaProtoTable.java +++ /dev/null @@ -1,119 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ -package org.apache.beam.sdk.extensions.sql.meta.provider.kafka; - -import java.util.List; -import org.apache.beam.sdk.extensions.protobuf.ProtoMessageSchema; -import org.apache.beam.sdk.schemas.Schema; -import org.apache.beam.sdk.transforms.MapElements; -import org.apache.beam.sdk.transforms.PTransform; -import org.apache.beam.sdk.transforms.SimpleFunction; -import org.apache.beam.sdk.values.KV; -import org.apache.beam.sdk.values.PCollection; -import org.apache.beam.sdk.values.Row; -import org.apache.beam.sdk.values.TypeDescriptor; - -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) -public class BeamKafkaProtoTable extends BeamKafkaTable { - private final Class protoClass; - - public BeamKafkaProtoTable( - Schema messageSchema, String bootstrapServers, List topics, Class protoClass) { - super(inferAndVerifySchema(protoClass, messageSchema), bootstrapServers, topics); - this.protoClass = protoClass; - } - - @Override - public PTransform>, PCollection> getPTransformForInput() { - return new ProtoRecorderDecoder(schema, protoClass); - } - - @Override - public PTransform, PCollection>> getPTransformForOutput() { - return new ProtoRecorderEncoder(protoClass); - } - - private static Schema inferAndVerifySchema(Class protoClass, Schema messageSchema) { - Schema inferredSchema = new ProtoMessageSchema().schemaFor(TypeDescriptor.of(protoClass)); - if (!messageSchema.equivalent(inferredSchema)) { - throw new IllegalArgumentException( - String.format( - "Given message schema: '%s'%n" - + "does not match schema inferred from protobuf class.%n" - + "Protobuf class: '%s'%n" - + "Inferred schema: '%s'", - messageSchema, protoClass.getName(), inferredSchema)); - } - return inferredSchema; - } - - /** A PTransform to convert {@code KV} to {@link Row}. */ - private static class ProtoRecorderDecoder - extends PTransform>, PCollection> { - private final Schema schema; - private final Class clazz; - - ProtoRecorderDecoder(Schema schema, Class clazz) { - this.schema = schema; - this.clazz = clazz; - } - - @Override - public PCollection expand(PCollection> input) { - SimpleFunction toRowFn = ProtoMessageSchema.getProtoBytesToRowFn(clazz); - return input - .apply("decodeProtoRecord", MapElements.via(new KvToBytes())) - .apply("Map bytes to rows", MapElements.via(toRowFn)) - .setRowSchema(schema); - } - - private static class KvToBytes extends SimpleFunction, byte[]> { - @Override - public byte[] apply(KV kv) { - return kv.getValue(); - } - } - } - - /** A PTransform to convert {@link Row} to {@code KV}. 
*/ - private static class ProtoRecorderEncoder - extends PTransform, PCollection>> { - private final Class clazz; - - public ProtoRecorderEncoder(Class clazz) { - this.clazz = clazz; - } - - @Override - public PCollection> expand(PCollection input) { - SimpleFunction toBytesFn = ProtoMessageSchema.getRowToProtoBytesFn(clazz); - return input - .apply("Encode proto bytes to row", MapElements.via(toBytesFn)) - .apply("Bytes to KV", MapElements.via(new BytesToKV())); - } - - private static class BytesToKV extends SimpleFunction> { - @Override - public KV apply(byte[] bytes) { - return KV.of(new byte[] {}, bytes); - } - } - } -} diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTable.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTable.java index e52bd550184d..57830ca3e0d5 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTable.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTable.java @@ -17,7 +17,7 @@ */ package org.apache.beam.sdk.extensions.sql.meta.provider.kafka; -import static org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Preconditions.checkArgument; import java.util.Collection; import java.util.HashMap; @@ -31,10 +31,11 @@ import org.apache.beam.sdk.extensions.sql.meta.SchemaBaseBeamTable; import org.apache.beam.sdk.extensions.sql.meta.provider.InvalidTableException; import org.apache.beam.sdk.io.kafka.KafkaIO; +import org.apache.beam.sdk.io.kafka.KafkaRecord; +import org.apache.beam.sdk.io.kafka.ProducerRecordCoder; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.transforms.PTransform; -import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PBegin; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.POutput; @@ -43,6 +44,7 @@ import org.apache.kafka.clients.consumer.ConsumerRecord; import org.apache.kafka.clients.consumer.ConsumerRecords; import org.apache.kafka.clients.consumer.KafkaConsumer; +import org.apache.kafka.clients.producer.ProducerRecord; import org.apache.kafka.common.TopicPartition; import org.apache.kafka.common.serialization.ByteArrayDeserializer; import org.apache.kafka.common.serialization.ByteArraySerializer; @@ -94,16 +96,16 @@ public PCollection.IsBounded isBounded() { return PCollection.IsBounded.UNBOUNDED; } - protected abstract PTransform>, PCollection> + protected abstract PTransform>, PCollection> getPTransformForInput(); - protected abstract PTransform, PCollection>> + protected abstract PTransform, PCollection>> getPTransformForOutput(); @Override public PCollection buildIOReader(PBegin begin) { return begin - .apply("read", createKafkaRead().withoutMetadata()) + .apply("read", createKafkaRead()) .apply("in_format", getPTransformForInput()) .setRowSchema(getSchema()); } @@ -139,11 +141,12 @@ public POutput buildIOWriter(PCollection input) { return input .apply("out_reformat", getPTransformForOutput()) + .setCoder(ProducerRecordCoder.of(ByteArrayCoder.of(), ByteArrayCoder.of())) .apply("persistent", createKafkaWrite()); } - private KafkaIO.Write createKafkaWrite() { - return KafkaIO.write() + private KafkaIO.WriteRecords createKafkaWrite() { + return 
KafkaIO.writeRecords() .withBootstrapServers(bootstrapServers) .withTopic(topics.get(0)) .withKeySerializer(ByteArraySerializer.class) diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTableProvider.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTableProvider.java index d408bc565bae..74f4dbad5a7c 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTableProvider.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTableProvider.java @@ -17,16 +17,26 @@ */ package org.apache.beam.sdk.extensions.sql.meta.provider.kafka; -import com.alibaba.fastjson.JSONArray; +import static org.apache.beam.sdk.extensions.sql.meta.provider.kafka.Schemas.PAYLOAD_FIELD; +import static org.apache.beam.sdk.util.Preconditions.checkArgumentNotNull; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + import com.alibaba.fastjson.JSONObject; import com.google.auto.service.AutoService; -import java.util.ArrayList; import java.util.List; +import java.util.Optional; import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable; import org.apache.beam.sdk.extensions.sql.meta.Table; import org.apache.beam.sdk.extensions.sql.meta.provider.InMemoryMetaTableProvider; import org.apache.beam.sdk.extensions.sql.meta.provider.TableProvider; import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.io.payloads.PayloadSerializer; +import org.apache.beam.sdk.schemas.io.payloads.PayloadSerializers; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Splitter; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Strings; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; +import org.checkerframework.checker.nullness.qual.Nullable; /** * Kafka table provider. @@ -39,54 +49,89 @@ * NAME VARCHAR(127) COMMENT 'this is the name' * ) * COMMENT 'this is the table orders' - * LOCATION 'kafka://localhost:2181/brokers?topic=test' - * TBLPROPERTIES '{"bootstrap.servers":"localhost:9092", "topics": ["topic1", "topic2"]}' + * TYPE kafka + * // Optional. One broker host:port pair to bootstrap with and a topic. + * // Only one topic overall may be provided for writing. + * LOCATION 'my.company.url.com:2181/topic1' + * // Extra bootstrap_servers and topics can be provided explicitly. These will be merged + * // with the server and topic in LOCATION. 
+ * TBLPROPERTIES '{ + * "bootstrap_servers": ["104.126.7.88:7743", "104.111.9.22:7743"], + * "topics": ["topic2", "topic3"] + * }' * } */ @AutoService(TableProvider.class) public class KafkaTableProvider extends InMemoryMetaTableProvider { + private static class ParsedLocation { + String brokerLocation = ""; + String topic = ""; + } + + private static ParsedLocation parseLocation(String location) { + ParsedLocation parsed = new ParsedLocation(); + List split = Splitter.on('/').splitToList(location); + checkArgument( + split.size() >= 2, + "Location string `%s` invalid: must be /.", + location); + parsed.topic = Iterables.getLast(split); + parsed.brokerLocation = String.join("/", split.subList(0, split.size() - 1)); + return parsed; + } - private enum PayloadFormat { - CSV, - AVRO, - JSON, - PROTO + private static List mergeParam(Optional initial, @Nullable List toMerge) { + ImmutableList.Builder merged = ImmutableList.builder(); + initial.ifPresent(merged::add); + if (toMerge != null) { + toMerge.forEach(o -> merged.add(o.toString())); + } + return merged.build(); } @Override public BeamSqlTable buildBeamSqlTable(Table table) { Schema schema = table.getSchema(); - JSONObject properties = table.getProperties(); - String bootstrapServers = properties.getString("bootstrap.servers"); - JSONArray topicsArr = properties.getJSONArray("topics"); - List topics = new ArrayList<>(topicsArr.size()); - for (Object topic : topicsArr) { - topics.add(topic.toString()); + + Optional parsedLocation = Optional.empty(); + if (!Strings.isNullOrEmpty(table.getLocation())) { + parsedLocation = Optional.of(parseLocation(checkArgumentNotNull(table.getLocation()))); } + List topics = + mergeParam(parsedLocation.map(loc -> loc.topic), properties.getJSONArray("topics")); + List allBootstrapServers = + mergeParam( + parsedLocation.map(loc -> loc.brokerLocation), + properties.getJSONArray("bootstrap_servers")); + String bootstrapServers = String.join(",", allBootstrapServers); - PayloadFormat payloadFormat = + Optional payloadFormat = properties.containsKey("format") - ? PayloadFormat.valueOf(properties.getString("format").toUpperCase()) - : PayloadFormat.CSV; - - switch (payloadFormat) { - case CSV: + ? Optional.of(properties.getString("format")) + : Optional.empty(); + if (Schemas.isNestedSchema(schema)) { + Optional serializer = + payloadFormat.map( + format -> + PayloadSerializers.getSerializer( + format, + checkArgumentNotNull(schema.getField(PAYLOAD_FIELD).getType().getRowSchema()), + properties.getInnerMap())); + return new NestedPayloadKafkaTable(schema, bootstrapServers, topics, serializer); + } else { + /* + * CSV is handled separately because multiple rows can be produced from a single message, which + * adds complexity to payload extraction. It remains here and as the default because it is the + * historical default, but it will not be extended to support attaching extended attributes to + * rows. 
+ */ + if (payloadFormat.orElse("csv").equals("csv")) { return new BeamKafkaCSVTable(schema, bootstrapServers, topics); - case AVRO: - return new BeamKafkaAvroTable(schema, bootstrapServers, topics); - case JSON: - return new BeamKafkaJsonTable(schema, bootstrapServers, topics); - case PROTO: - String protoClassName = properties.getString("protoClass"); - try { - Class protoClass = Class.forName(protoClassName); - return new BeamKafkaProtoTable(schema, bootstrapServers, topics, protoClass); - } catch (ClassNotFoundException e) { - throw new IllegalArgumentException("Incorrect proto class provided: " + protoClassName); - } - default: - throw new IllegalArgumentException("Unsupported payload format: " + payloadFormat); + } + PayloadSerializer serializer = + PayloadSerializers.getSerializer(payloadFormat.get(), schema, properties.getInnerMap()); + return new PayloadSerializerKafkaTable(schema, bootstrapServers, topics, serializer); } } diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/NestedPayloadKafkaTable.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/NestedPayloadKafkaTable.java new file mode 100644 index 000000000000..ce00c8f855e0 --- /dev/null +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/NestedPayloadKafkaTable.java @@ -0,0 +1,181 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.sql.meta.provider.kafka; + +import static org.apache.beam.sdk.schemas.transforms.Cast.castRow; +import static org.apache.beam.sdk.util.Preconditions.checkArgumentNotNull; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import java.util.Collection; +import java.util.List; +import java.util.Optional; +import javax.annotation.Nullable; +import org.apache.beam.sdk.io.kafka.KafkaRecord; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.schemas.Schema.TypeName; +import org.apache.beam.sdk.schemas.io.payloads.PayloadSerializer; +import org.apache.beam.sdk.transforms.MapElements; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.sdk.values.TypeDescriptor; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableListMultimap; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; +import org.apache.kafka.clients.producer.ProducerRecord; +import org.apache.kafka.common.header.Header; +import org.apache.kafka.common.header.Headers; +import org.apache.kafka.common.header.internals.RecordHeader; +import org.joda.time.Instant; +import org.joda.time.ReadableDateTime; + +/** A class which transforms kafka records with attributes to a nested table. */ +class NestedPayloadKafkaTable extends BeamKafkaTable { + private final @Nullable PayloadSerializer payloadSerializer; + + public NestedPayloadKafkaTable( + Schema beamSchema, + String bootstrapServers, + List topics, + Optional payloadSerializer) { + super(beamSchema, bootstrapServers, topics); + + checkArgument(Schemas.isNestedSchema(schema)); + Schemas.validateNestedSchema(schema); + if (payloadSerializer.isPresent()) { + checkArgument( + schema.getField(Schemas.PAYLOAD_FIELD).getType().getTypeName().equals(TypeName.ROW)); + this.payloadSerializer = payloadSerializer.get(); + } else { + checkArgument(schema.getField(Schemas.PAYLOAD_FIELD).getType().equals(FieldType.BYTES)); + this.payloadSerializer = null; + } + } + + @Override + protected PTransform>, PCollection> + getPTransformForInput() { + return new PTransform>, PCollection>() { + @Override + public PCollection expand(PCollection> input) { + return input.apply( + MapElements.into(new TypeDescriptor() {}).via(record -> transformInput(record))); + } + }; + } + + @VisibleForTesting + Row transformInput(KafkaRecord record) { + Row.FieldValueBuilder builder = Row.withSchema(getSchema()).withFieldValues(ImmutableMap.of()); + if (schema.hasField(Schemas.MESSAGE_KEY_FIELD)) { + builder.withFieldValue(Schemas.MESSAGE_KEY_FIELD, record.getKV().getKey()); + } + if (schema.hasField(Schemas.EVENT_TIMESTAMP_FIELD)) { + builder.withFieldValue( + Schemas.EVENT_TIMESTAMP_FIELD, Instant.ofEpochMilli(record.getTimestamp())); + } + if (schema.hasField(Schemas.HEADERS_FIELD)) { + @Nullable Headers recordHeaders = record.getHeaders(); + if (recordHeaders != null) { + ImmutableListMultimap.Builder headersBuilder = + ImmutableListMultimap.builder(); + recordHeaders.forEach(header -> headersBuilder.put(header.key(), 
header.value())); + ImmutableList.Builder listBuilder = ImmutableList.builder(); + headersBuilder + .build() + .asMap() + .forEach( + (key, values) -> { + Row entry = + Row.withSchema(Schemas.HEADERS_ENTRY_SCHEMA) + .withFieldValue(Schemas.HEADERS_KEY_FIELD, key) + .withFieldValue(Schemas.HEADERS_VALUES_FIELD, values) + .build(); + listBuilder.add(entry); + }); + builder.withFieldValue(Schemas.HEADERS_FIELD, listBuilder.build()); + } + } + if (payloadSerializer == null) { + builder.withFieldValue(Schemas.PAYLOAD_FIELD, record.getKV().getValue()); + } else { + byte[] payload = record.getKV().getValue(); + if (payload != null) { + builder.withFieldValue( + Schemas.PAYLOAD_FIELD, payloadSerializer.deserialize(record.getKV().getValue())); + } + } + return builder.build(); + } + + @Override + protected PTransform, PCollection>> + getPTransformForOutput() { + return new PTransform, PCollection>>() { + @Override + public PCollection> expand(PCollection input) { + return input.apply( + MapElements.into(new TypeDescriptor>() {}) + .via(row -> transformOutput(row))); + } + }; + } + + // Suppress nullability warnings: ProducerRecord is supposed to accept null arguments. + @SuppressWarnings("argument.type.incompatible") + @VisibleForTesting + ProducerRecord transformOutput(Row row) { + row = castRow(row, row.getSchema(), schema); + String topic = Iterables.getOnlyElement(getTopics()); + byte[] key = null; + byte[] payload; + List
    headers = ImmutableList.of(); + Long timestampMillis = null; + if (schema.hasField(Schemas.MESSAGE_KEY_FIELD)) { + key = row.getBytes(Schemas.MESSAGE_KEY_FIELD); + } + if (schema.hasField(Schemas.EVENT_TIMESTAMP_FIELD)) { + ReadableDateTime time = row.getDateTime(Schemas.EVENT_TIMESTAMP_FIELD); + if (time != null) { + timestampMillis = time.getMillis(); + } + } + if (schema.hasField(Schemas.HEADERS_FIELD)) { + Collection headerRows = checkArgumentNotNull(row.getArray(Schemas.HEADERS_FIELD)); + ImmutableList.Builder
    headersBuilder = ImmutableList.builder(); + headerRows.forEach( + entry -> { + String headerKey = checkArgumentNotNull(entry.getString(Schemas.HEADERS_KEY_FIELD)); + Collection values = + checkArgumentNotNull(entry.getArray(Schemas.HEADERS_VALUES_FIELD)); + values.forEach(value -> headersBuilder.add(new RecordHeader(headerKey, value))); + }); + headers = headersBuilder.build(); + } + if (payloadSerializer == null) { + payload = row.getBytes(Schemas.PAYLOAD_FIELD); + } else { + payload = + payloadSerializer.serialize(checkArgumentNotNull(row.getRow(Schemas.PAYLOAD_FIELD))); + } + return new ProducerRecord<>(topic, null, timestampMillis, key, payload, headers); + } +} diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/PayloadSerializerKafkaTable.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/PayloadSerializerKafkaTable.java new file mode 100644 index 000000000000..c7a0ae71f40e --- /dev/null +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/PayloadSerializerKafkaTable.java @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.sql.meta.provider.kafka; + +import java.util.List; +import org.apache.beam.sdk.io.kafka.KafkaRecord; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.io.payloads.PayloadSerializer; +import org.apache.beam.sdk.transforms.MapElements; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.sdk.values.TypeDescriptor; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; +import org.apache.kafka.clients.producer.ProducerRecord; + +public class PayloadSerializerKafkaTable extends BeamKafkaTable { + private final PayloadSerializer serializer; + + PayloadSerializerKafkaTable( + Schema requiredSchema, + String bootstrapServers, + List topics, + PayloadSerializer serializer) { + super(requiredSchema, bootstrapServers, topics); + this.serializer = serializer; + } + + @Override + protected PTransform>, PCollection> + getPTransformForInput() { + return new PTransform>, PCollection>( + "deserialize-kafka-rows") { + @Override + public PCollection expand(PCollection> input) { + return input + .apply( + MapElements.into(TypeDescriptor.of(Row.class)) + .via(record -> serializer.deserialize(record.getKV().getValue()))) + .setRowSchema(getSchema()); + } + }; + } + + @Override + protected PTransform, PCollection>> + getPTransformForOutput() { + String topic = Iterables.getOnlyElement(getTopics()); + return new PTransform, PCollection>>( + "serialize-kafka-rows") { + @Override + public PCollection> expand(PCollection input) { + return input.apply( + MapElements.into(new TypeDescriptor>() {}) + .via(row -> new ProducerRecord<>(topic, new byte[] {}, serializer.serialize(row)))); + } + }; + } +} diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/Schemas.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/Schemas.java new file mode 100644 index 000000000000..f3a35e74ae60 --- /dev/null +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/Schemas.java @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.sql.meta.provider.kafka; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.EquivalenceNullablePolicy; +import org.apache.beam.sdk.schemas.Schema.Field; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.schemas.Schema.TypeName; + +final class Schemas { + private Schemas() {} + + static final String MESSAGE_KEY_FIELD = "message_key"; + static final String EVENT_TIMESTAMP_FIELD = "event_timestamp"; + static final String HEADERS_FIELD = "headers"; + static final String PAYLOAD_FIELD = "payload"; + + static final String HEADERS_KEY_FIELD = "key"; + static final String HEADERS_VALUES_FIELD = "values"; + static final Schema HEADERS_ENTRY_SCHEMA = + Schema.builder() + .addStringField(HEADERS_KEY_FIELD) + .addArrayField(HEADERS_VALUES_FIELD, FieldType.BYTES) + .build(); + static final Schema.FieldType HEADERS_FIELD_TYPE = + Schema.FieldType.array(FieldType.row(HEADERS_ENTRY_SCHEMA)); + + private static boolean hasNestedPayloadField(Schema schema) { + if (!schema.hasField(PAYLOAD_FIELD)) { + return false; + } + Field field = schema.getField(PAYLOAD_FIELD); + if (fieldHasType(field, FieldType.BYTES)) { + return true; + } + return field.getType().getTypeName().equals(TypeName.ROW); + } + + private static boolean hasNestedHeadersField(Schema schema) { + if (!schema.hasField(HEADERS_FIELD)) { + return false; + } + return fieldHasType(schema.getField(HEADERS_FIELD), HEADERS_FIELD_TYPE); + } + + static boolean isNestedSchema(Schema schema) { + return hasNestedPayloadField(schema) && hasNestedHeadersField(schema); + } + + private static boolean fieldHasType(Field field, FieldType type) { + return type.equivalent(field.getType(), EquivalenceNullablePolicy.WEAKEN); + } + + private static void checkFieldHasType(Field field, FieldType type) { + checkArgument( + fieldHasType(field, type), + String.format("'%s' field must have schema matching '%s'.", field.getName(), type)); + } + + static void validateNestedSchema(Schema schema) { + checkArgument(schema.hasField(PAYLOAD_FIELD), "Must provide a 'payload' field for Kafka."); + for (Field field : schema.getFields()) { + switch (field.getName()) { + case HEADERS_FIELD: + checkFieldHasType(field, HEADERS_FIELD_TYPE); + break; + case EVENT_TIMESTAMP_FIELD: + checkFieldHasType(field, FieldType.DATETIME); + break; + case MESSAGE_KEY_FIELD: + checkFieldHasType(field, FieldType.BYTES); + break; + case PAYLOAD_FIELD: + checkArgument( + fieldHasType(field, FieldType.BYTES) + || field.getType().getTypeName().equals(TypeName.ROW), + String.format( + "'%s' field must either have a 'BYTES NOT NULL' or 'ROW' schema.", + field.getName())); + break; + default: + throw new IllegalArgumentException( + String.format( + "'%s' field is invalid at the top level for Kafka in the nested schema.", + field.getName())); + } + } + } +} diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbTable.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbTable.java index 8955d294df83..5354553e85e5 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbTable.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbTable.java @@ -17,9 +17,9 @@ */ 
package org.apache.beam.sdk.extensions.sql.meta.provider.mongodb; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind.AND; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind.COMPARISON; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind.OR; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind.AND; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind.COMPARISON; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind.OR; import com.mongodb.client.model.Filters; import java.io.Serializable; @@ -57,13 +57,13 @@ import org.apache.beam.sdk.values.PCollection.IsBounded; import org.apache.beam.sdk.values.POutput; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.annotations.VisibleForTesting; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLiteral; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeName; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexInputRef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlTypeName; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.bson.Document; import org.bson.conversions.Bson; diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/parquet/ParquetTable.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/parquet/ParquetTable.java new file mode 100644 index 000000000000..b2282ffee5fc --- /dev/null +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/parquet/ParquetTable.java @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.sql.meta.provider.parquet; + +import java.io.Serializable; +import java.util.ArrayList; +import java.util.List; +import java.util.Map; +import org.apache.avro.Schema; +import org.apache.avro.Schema.Field; +import org.apache.avro.generic.GenericRecord; +import org.apache.beam.sdk.annotations.Internal; +import org.apache.beam.sdk.extensions.sql.impl.BeamTableStatistics; +import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTableFilter; +import org.apache.beam.sdk.extensions.sql.meta.ProjectSupport; +import org.apache.beam.sdk.extensions.sql.meta.SchemaBaseBeamTable; +import org.apache.beam.sdk.extensions.sql.meta.Table; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.io.parquet.ParquetIO; +import org.apache.beam.sdk.io.parquet.ParquetIO.Read; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.schemas.transforms.Convert; +import org.apache.beam.sdk.schemas.utils.AvroUtils; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollection.IsBounded; +import org.apache.beam.sdk.values.POutput; +import org.apache.beam.sdk.values.Row; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +@Internal +@SuppressWarnings({"nullness"}) +class ParquetTable extends SchemaBaseBeamTable implements Serializable { + private static final Logger LOG = LoggerFactory.getLogger(ParquetTable.class); + + private final Table table; + + ParquetTable(Table table) { + super(table.getSchema()); + this.table = table; + } + + @Override + public PCollection buildIOReader(PBegin begin) { + final Schema schema = AvroUtils.toAvroSchema(table.getSchema()); + Read read = ParquetIO.read(schema).withBeamSchemas(true).from(table.getLocation() + "/*"); + return begin.apply("ParquetIORead", read).apply("ToRows", Convert.toRows()); + } + + @Override + public PCollection buildIOReader( + PBegin begin, BeamSqlTableFilter filters, List fieldNames) { + final Schema schema = AvroUtils.toAvroSchema(table.getSchema()); + Read read = ParquetIO.read(schema).withBeamSchemas(true).from(table.getLocation() + "/*"); + if (!fieldNames.isEmpty()) { + Schema projectionSchema = projectSchema(schema, fieldNames); + LOG.info("Projecting fields schema : " + projectionSchema.toString()); + read = read.withProjection(projectionSchema, projectionSchema); + } + return begin.apply("ParquetIORead", read).apply("ToRows", Convert.toRows()); + } + + /** Returns a copy of the {@link Schema} with only the fieldNames fields. 
*/ + private static Schema projectSchema(Schema schema, List fieldNames) { + List selectedFields = new ArrayList<>(); + for (String fieldName : fieldNames) { + selectedFields.add(deepCopyField(schema.getField(fieldName))); + } + return Schema.createRecord( + schema.getName() + "_projected", + schema.getDoc(), + schema.getNamespace(), + schema.isError(), + selectedFields); + } + + private static Field deepCopyField(Field field) { + Schema.Field newField = + new Schema.Field( + field.name(), field.schema(), field.doc(), field.defaultVal(), field.order()); + for (Map.Entry kv : field.getObjectProps().entrySet()) { + newField.addProp(kv.getKey(), kv.getValue()); + } + if (field.aliases() != null) { + for (String alias : field.aliases()) { + newField.addAlias(alias); + } + } + return newField; + } + + @Override + public POutput buildIOWriter(PCollection input) { + final org.apache.avro.Schema schema = AvroUtils.toAvroSchema(input.getSchema()); + return input + .apply("ToGenericRecords", Convert.to(GenericRecord.class)) + .apply( + "ParquetIOWrite", + FileIO.write().via(ParquetIO.sink(schema)).to(table.getLocation())); + } + + @Override + public IsBounded isBounded() { + return PCollection.IsBounded.BOUNDED; + } + + @Override + public BeamTableStatistics getTableStatistics(PipelineOptions options) { + return BeamTableStatistics.BOUNDED_UNKNOWN; + } + + @Override + public ProjectSupport supportsProjects() { + return ProjectSupport.WITH_FIELD_REORDERING; + } +} diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/parquet/ParquetTableProvider.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/parquet/ParquetTableProvider.java index b8a55f5db664..f24e226d6104 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/parquet/ParquetTableProvider.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/parquet/ParquetTableProvider.java @@ -18,18 +18,15 @@ package org.apache.beam.sdk.extensions.sql.meta.provider.parquet; import com.google.auto.service.AutoService; -import org.apache.beam.sdk.extensions.sql.meta.provider.SchemaIOTableProviderWrapper; +import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable; +import org.apache.beam.sdk.extensions.sql.meta.Table; +import org.apache.beam.sdk.extensions.sql.meta.provider.InMemoryMetaTableProvider; import org.apache.beam.sdk.extensions.sql.meta.provider.TableProvider; import org.apache.beam.sdk.io.parquet.ParquetIO; -import org.apache.beam.sdk.io.parquet.ParquetSchemaIOProvider; -import org.apache.beam.sdk.schemas.io.SchemaIOProvider; /** * {@link TableProvider} for {@link ParquetIO} for consumption by Beam SQL. * - *

- * Passes the {@link ParquetSchemaIOProvider} to the generalized table provider wrapper, {@link
- * SchemaIOTableProviderWrapper}, for Parquet specific behavior.
- *

* A sample of a Parquet table is shown below (a usage sketch follows the example):
*

    {@code
    @@ -39,19 +36,18 @@
      *   favorite_numbers ARRAY
      * )
      * TYPE 'parquet'
    - * LOCATION '/home/admin/users.parquet'
    + * LOCATION '/home/admin/orders/'
      * }
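For context, a rough sketch of how a definition like the one above could be resolved programmatically through the new ParquetTableProvider. The table name, the column types (including the assumed INT32 array elements), and the explicit empty properties object are illustrative assumptions, not part of the patch:

```java
import com.alibaba.fastjson.JSONObject;
import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable;
import org.apache.beam.sdk.extensions.sql.meta.Table;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.schemas.Schema.FieldType;

final class ParquetTableSketch {
  static BeamSqlTable usersTable() {
    // Row schema matching the DDL example above (array element type assumed to be INT32).
    Schema schema =
        Schema.builder()
            .addStringField("name")
            .addStringField("favorite_color")
            .addArrayField("favorite_numbers", FieldType.INT32)
            .build();

    // Metadata equivalent of the CREATE EXTERNAL TABLE statement.
    Table table =
        Table.builder()
            .name("users")
            .type("parquet")
            .schema(schema)
            .location("/home/admin/orders/")
            .properties(new JSONObject())
            .build();

    return new ParquetTableProvider().buildBeamSqlTable(table);
  }
}
```

The resulting table reads `LOCATION` plus a `/*` glob through ParquetIO with Beam schemas enabled, as implemented in ParquetTable above.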
    */ @AutoService(TableProvider.class) -public class ParquetTableProvider extends SchemaIOTableProviderWrapper { +public class ParquetTableProvider extends InMemoryMetaTableProvider { @Override - public SchemaIOProvider getSchemaIOProvider() { - return new ParquetSchemaIOProvider(); + public String getTableType() { + return "parquet"; } - // TODO[BEAM-10516]: remove this override after TableProvider problem is fixed @Override - public String getTableType() { - return "parquet"; + public BeamSqlTable buildBeamSqlTable(Table table) { + return new ParquetTable(table); } } diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsublite/PubsubLiteSubscriptionTable.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsublite/PubsubLiteSubscriptionTable.java new file mode 100644 index 000000000000..5f46472d2b58 --- /dev/null +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsublite/PubsubLiteSubscriptionTable.java @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.sql.meta.provider.pubsublite; + +import com.google.cloud.pubsublite.SubscriptionPath; +import com.google.cloud.pubsublite.proto.SequencedMessage; +import org.apache.beam.sdk.extensions.sql.impl.BeamTableStatistics; +import org.apache.beam.sdk.extensions.sql.meta.SchemaBaseBeamTable; +import org.apache.beam.sdk.io.gcp.pubsublite.PubsubLiteIO; +import org.apache.beam.sdk.io.gcp.pubsublite.SubscriberOptions; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollection.IsBounded; +import org.apache.beam.sdk.values.POutput; +import org.apache.beam.sdk.values.Row; + +class PubsubLiteSubscriptionTable extends SchemaBaseBeamTable { + private final SubscriptionPath subscription; + private final PTransform, PCollection> transform; + + PubsubLiteSubscriptionTable( + Schema schema, + SubscriptionPath subscription, + PTransform, PCollection> transform) { + super(schema); + this.subscription = subscription; + this.transform = transform; + } + + @Override + public PCollection buildIOReader(PBegin begin) { + return begin + .apply( + "Read Pub/Sub Lite", + PubsubLiteIO.read( + SubscriberOptions.newBuilder().setSubscriptionPath(subscription).build())) + .apply("Transform to Row", transform); + } + + @Override + public POutput buildIOWriter(PCollection input) { + throw new UnsupportedOperationException( + "You cannot write to a Pub/Sub Lite subscription: you must write to a topic."); + } + + @Override + public IsBounded isBounded() { + return IsBounded.UNBOUNDED; + } + + @Override + public BeamTableStatistics getTableStatistics(PipelineOptions options) { + return BeamTableStatistics.UNBOUNDED_UNKNOWN; + } +} diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsublite/PubsubLiteTableProvider.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsublite/PubsubLiteTableProvider.java new file mode 100644 index 000000000000..c6c133ac245a --- /dev/null +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsublite/PubsubLiteTableProvider.java @@ -0,0 +1,219 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.sql.meta.provider.pubsublite; + +import static org.apache.beam.sdk.util.Preconditions.checkArgumentNotNull; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import com.alibaba.fastjson.JSONObject; +import com.google.auto.service.AutoService; +import com.google.auto.value.AutoOneOf; +import com.google.cloud.pubsublite.SubscriptionPath; +import com.google.cloud.pubsublite.TopicPath; +import com.google.cloud.pubsublite.proto.PubSubMessage; +import java.util.Optional; +import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable; +import org.apache.beam.sdk.extensions.sql.meta.Table; +import org.apache.beam.sdk.extensions.sql.meta.provider.InMemoryMetaTableProvider; +import org.apache.beam.sdk.extensions.sql.meta.provider.TableProvider; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.EquivalenceNullablePolicy; +import org.apache.beam.sdk.schemas.Schema.Field; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.schemas.Schema.TypeName; +import org.apache.beam.sdk.schemas.io.DeadLetteredTransform; +import org.apache.beam.sdk.schemas.io.Failure; +import org.apache.beam.sdk.schemas.io.GenericDlq; +import org.apache.beam.sdk.schemas.io.payloads.PayloadSerializer; +import org.apache.beam.sdk.schemas.io.payloads.PayloadSerializers; +import org.apache.beam.sdk.transforms.MapElements; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.SimpleFunction; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PDone; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.sdk.values.TypeDescriptor; + +/** + * Pub/Sub Lite table provider. + * + *

+ * Pub/Sub Lite tables may be constructed with the DDL below (a schema sketch in Java follows the example):
+ *

    {@code
    + * CREATE EXTERNAL TABLE tableName(
    + *     message_key BYTES [NOT NULL],  // optional, always present on read
    + *     publish_timestamp TIMESTAMP [NOT NULL],  // optional, readable tables only, always present on read
    + *     event_timestamp TIMESTAMP [NOT NULL],  // optional, null if not present in readable table, unset in message if null in writable table. NOT NULL enforces field presence on read
+ *     attributes ARRAY<ROW<key VARCHAR, values ARRAY<VARBINARY>>>,  // optional, null values never present on reads or handled on writes
    + *     payload BYTES | ROW<[INSERT SCHEMA HERE]>,
    + * )
    + * TYPE pubsublite
    + * // For writable tables
    + * LOCATION 'projects/[PROJECT]/locations/[CLOUD ZONE]/topics/[TOPIC]'
    + * // For readable tables
    + * LOCATION 'projects/[PROJECT]/locations/[CLOUD ZONE]/subscriptions/[SUBSCRIPTION]'
    + * TBLPROPERTIES '{
    + *     "deadLetterQueue": "[DLQ_KIND]:[DLQ_ID]",  // optional
    + *     "format": "[FORMAT]",  // optional
    + *     // format params
    + * }'
    + * }
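As the provider's getSerializer logic below enforces, a BYTES payload forbids the "format" property and passes the message bytes through unchanged, while a ROW payload requires a serializer, with "format" defaulting to "json" when omitted. A small sketch of both table-schema shapes using the Beam Schema API; the payload columns in the second schema are invented for illustration:

```java
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.schemas.Schema.FieldType;

final class PubsubLiteSchemaSketch {
  // Attribute entries carry a string key plus a list of byte[] values, matching the
  // ARRAY<ROW<key VARCHAR, values ARRAY<VARBINARY>>> column in the DDL above.
  static final Schema ATTRIBUTES_ENTRY =
      Schema.builder()
          .addStringField("key")
          .addArrayField("values", FieldType.BYTES)
          .build();

  // Readable table with a raw BYTES payload: no "format" property may be set, and the
  // message bytes are delivered as-is.
  static final Schema RAW_PAYLOAD_TABLE =
      Schema.builder()
          .addByteArrayField("message_key")
          .addDateTimeField("publish_timestamp")
          .addNullableField("event_timestamp", FieldType.DATETIME)
          .addArrayField("attributes", FieldType.row(ATTRIBUTES_ENTRY))
          .addByteArrayField("payload")
          .build();

  // Readable table with a ROW payload: a serializer is created, defaulting to JSON,
  // and the message bytes are deserialized into this nested row schema.
  static final Schema JSON_PAYLOAD_TABLE =
      Schema.builder()
          .addByteArrayField("message_key")
          .addRowField(
              "payload",
              Schema.builder().addStringField("user").addInt64Field("score").build())
          .build();
}
```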
    + */ +@AutoService(TableProvider.class) +public class PubsubLiteTableProvider extends InMemoryMetaTableProvider { + @Override + public String getTableType() { + return "pubsublite"; + } + + private static Optional getSerializer(Schema schema, JSONObject properties) { + if (schema.getField("payload").getType().equals(FieldType.BYTES)) { + checkArgument( + !properties.containsKey("format"), + "Must not set the 'format' property if not unpacking payload."); + return Optional.empty(); + } + String format = properties.containsKey("format") ? properties.getString("format") : "json"; + return Optional.of(PayloadSerializers.getSerializer(format, schema, properties.getInnerMap())); + } + + private static void checkFieldHasType(Field field, FieldType type) { + checkArgument( + type.equivalent(field.getType(), EquivalenceNullablePolicy.WEAKEN), + String.format("'%s' field must have schema matching '%s'.", field.getName(), type)); + } + + private static void validateSchema(Schema schema) { + checkArgument( + schema.hasField(RowHandler.PAYLOAD_FIELD), + "Must provide a 'payload' field for Pub/Sub Lite."); + for (Field field : schema.getFields()) { + switch (field.getName()) { + case RowHandler.ATTRIBUTES_FIELD: + checkFieldHasType(field, RowHandler.ATTRIBUTES_FIELD_TYPE); + break; + case RowHandler.EVENT_TIMESTAMP_FIELD: + case RowHandler.PUBLISH_TIMESTAMP_FIELD: + checkFieldHasType(field, FieldType.DATETIME); + break; + case RowHandler.MESSAGE_KEY_FIELD: + checkFieldHasType(field, FieldType.BYTES); + break; + case RowHandler.PAYLOAD_FIELD: + checkArgument( + FieldType.BYTES.equivalent(field.getType(), EquivalenceNullablePolicy.WEAKEN) + || field.getType().getTypeName().equals(TypeName.ROW), + String.format( + "'%s' field must either have a 'BYTES NOT NULL' or 'ROW' schema.", + field.getName())); + break; + default: + throw new IllegalArgumentException( + String.format( + "'%s' field is invalid at the top level for Pub/Sub Lite.", field.getName())); + } + } + } + + @AutoOneOf(Location.Kind.class) + abstract static class Location { + enum Kind { + TOPIC, + SUBSCRIPTION + } + + abstract Kind getKind(); + + abstract TopicPath topic(); + + abstract SubscriptionPath subscription(); + + static Location parse(String location) { + if (location.contains("/topics/")) { + return AutoOneOf_PubsubLiteTableProvider_Location.topic(TopicPath.parse(location)); + } + if (location.contains("/subscriptions/")) { + return AutoOneOf_PubsubLiteTableProvider_Location.subscription( + SubscriptionPath.parse(location)); + } + throw new IllegalArgumentException( + String.format( + "Location '%s' does not correspond to either a Pub/Sub Lite topic or subscription.", + location)); + } + } + + private static RowHandler getRowHandler( + Schema schema, Optional optionalSerializer) { + if (optionalSerializer.isPresent()) { + return new RowHandler(schema, optionalSerializer.get()); + } + return new RowHandler(schema); + } + + private static Optional, PDone>> getDlqTransform( + JSONObject properties) { + if (!properties.containsKey("deadLetterQueue")) { + return Optional.empty(); + } + return Optional.of(GenericDlq.getDlqTransform(properties.getString("deadLetterQueue"))); + } + + private static + PTransform, PCollection> addDlqIfPresent( + SimpleFunction transform, JSONObject properties) { + if (properties.containsKey("deadLetterQueue")) { + return new DeadLetteredTransform<>(transform, properties.getString("deadLetterQueue")); + } + return MapElements.via(transform); + } + + @Override + public BeamSqlTable buildBeamSqlTable(Table 
table) { + checkArgument(table.getType().equals(getTableType())); + validateSchema(table.getSchema()); + Optional serializer = + getSerializer(table.getSchema(), table.getProperties()); + Location location = Location.parse(checkArgumentNotNull(table.getLocation())); + RowHandler rowHandler = getRowHandler(table.getSchema(), serializer); + + switch (location.getKind()) { + case TOPIC: + checkArgument( + !table.getSchema().hasField(RowHandler.PUBLISH_TIMESTAMP_FIELD), + "May not write to publish timestamp, this field is read-only."); + return new PubsubLiteTopicTable( + table.getSchema(), + location.topic(), + addDlqIfPresent( + SimpleFunction.fromSerializableFunctionWithOutputType( + rowHandler::rowToMessage, TypeDescriptor.of(PubSubMessage.class)), + table.getProperties())); + case SUBSCRIPTION: + return new PubsubLiteSubscriptionTable( + table.getSchema(), + location.subscription(), + addDlqIfPresent( + SimpleFunction.fromSerializableFunctionWithOutputType( + rowHandler::messageToRow, TypeDescriptor.of(Row.class)), + table.getProperties())); + default: + throw new IllegalArgumentException("Invalid kind for location: " + location.getKind()); + } + } +} diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsublite/PubsubLiteTopicTable.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsublite/PubsubLiteTopicTable.java new file mode 100644 index 000000000000..3cb9f51b4097 --- /dev/null +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsublite/PubsubLiteTopicTable.java @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.sql.meta.provider.pubsublite; + +import com.google.cloud.pubsublite.TopicPath; +import com.google.cloud.pubsublite.proto.PubSubMessage; +import org.apache.beam.sdk.extensions.sql.impl.BeamTableStatistics; +import org.apache.beam.sdk.extensions.sql.meta.SchemaBaseBeamTable; +import org.apache.beam.sdk.io.gcp.pubsublite.PublisherOptions; +import org.apache.beam.sdk.io.gcp.pubsublite.PubsubLiteIO; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollection.IsBounded; +import org.apache.beam.sdk.values.POutput; +import org.apache.beam.sdk.values.Row; + +class PubsubLiteTopicTable extends SchemaBaseBeamTable { + private final TopicPath topic; + private final PTransform, PCollection> transform; + + PubsubLiteTopicTable( + Schema schema, + TopicPath topic, + PTransform, PCollection> transform) { + super(schema); + this.topic = topic; + this.transform = transform; + } + + @Override + public PCollection buildIOReader(PBegin begin) { + throw new UnsupportedOperationException( + "You cannot read from a Pub/Sub Lite topic: you must create a subscription first."); + } + + @Override + public POutput buildIOWriter(PCollection input) { + return input + .apply("Transform to PubSubMessage", transform) + .apply( + "Write Pub/Sub Lite", + PubsubLiteIO.write(PublisherOptions.newBuilder().setTopicPath(topic).build())); + } + + @Override + public IsBounded isBounded() { + return IsBounded.UNBOUNDED; + } + + @Override + public BeamTableStatistics getTableStatistics(PipelineOptions options) { + return BeamTableStatistics.UNBOUNDED_UNKNOWN; + } +} diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsublite/RowHandler.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsublite/RowHandler.java new file mode 100644 index 000000000000..f82d394a0dfc --- /dev/null +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsublite/RowHandler.java @@ -0,0 +1,171 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.sql.meta.provider.pubsublite; + +import static org.apache.beam.sdk.schemas.transforms.Cast.castRow; +import static org.apache.beam.sdk.util.Preconditions.checkArgumentNotNull; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import com.google.cloud.pubsublite.proto.AttributeValues; +import com.google.cloud.pubsublite.proto.PubSubMessage; +import com.google.cloud.pubsublite.proto.SequencedMessage; +import com.google.protobuf.ByteString; +import com.google.protobuf.util.Timestamps; +import java.io.Serializable; +import java.util.Collection; +import java.util.stream.Collectors; +import javax.annotation.Nonnull; +import javax.annotation.Nullable; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.schemas.Schema.TypeName; +import org.apache.beam.sdk.schemas.io.payloads.PayloadSerializer; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.joda.time.Instant; +import org.joda.time.ReadableDateTime; + +class RowHandler implements Serializable { + private static final long serialVersionUID = 6827681678454156L; + + static final String PUBLISH_TIMESTAMP_FIELD = "publish_timestamp"; + static final String MESSAGE_KEY_FIELD = "message_key"; + static final String EVENT_TIMESTAMP_FIELD = "event_timestamp"; + static final String ATTRIBUTES_FIELD = "attributes"; + static final String PAYLOAD_FIELD = "payload"; + + static final String ATTRIBUTES_KEY_FIELD = "key"; + static final String ATTRIBUTES_VALUES_FIELD = "values"; + + static final Schema ATTRIBUTES_ENTRY_SCHEMA = + Schema.builder() + .addStringField(ATTRIBUTES_KEY_FIELD) + .addArrayField(ATTRIBUTES_VALUES_FIELD, FieldType.BYTES) + .build(); + static final Schema.FieldType ATTRIBUTES_FIELD_TYPE = + Schema.FieldType.array(FieldType.row(ATTRIBUTES_ENTRY_SCHEMA)); + + private final Schema schema; + private final @Nullable PayloadSerializer payloadSerializer; + + RowHandler(Schema schema) { + checkArgument(schema.getField(PAYLOAD_FIELD).getType().equals(FieldType.BYTES)); + this.schema = schema; + this.payloadSerializer = null; + } + + RowHandler(Schema schema, @Nonnull PayloadSerializer payloadSerializer) { + this.schema = schema; + this.payloadSerializer = payloadSerializer; + checkArgument((schema.getField(PAYLOAD_FIELD).getType().getTypeName().equals(TypeName.ROW))); + } + + /* Convert a message to a row. If Schema payload field is a Row type, payloadSerializer is required. */ + Row messageToRow(SequencedMessage message) { + // Transform this to a FieldValueBuilder, because otherwise individual withFieldValue calls will + // not mutate the original object. 
+ Row.FieldValueBuilder builder = Row.withSchema(schema).withFieldValues(ImmutableMap.of()); + if (schema.hasField(PUBLISH_TIMESTAMP_FIELD)) { + builder.withFieldValue( + PUBLISH_TIMESTAMP_FIELD, + Instant.ofEpochMilli(Timestamps.toMillis(message.getPublishTime()))); + } + if (schema.hasField(MESSAGE_KEY_FIELD)) { + builder.withFieldValue(MESSAGE_KEY_FIELD, message.getMessage().getKey().toByteArray()); + } + if (schema.hasField(EVENT_TIMESTAMP_FIELD) && message.getMessage().hasEventTime()) { + builder.withFieldValue( + EVENT_TIMESTAMP_FIELD, + Instant.ofEpochMilli(Timestamps.toMillis(message.getMessage().getEventTime()))); + } + if (schema.hasField(ATTRIBUTES_FIELD)) { + ImmutableList.Builder listBuilder = ImmutableList.builder(); + message + .getMessage() + .getAttributesMap() + .forEach( + (key, values) -> { + Row entry = + Row.withSchema(ATTRIBUTES_ENTRY_SCHEMA) + .withFieldValue(ATTRIBUTES_KEY_FIELD, key) + .withFieldValue( + ATTRIBUTES_VALUES_FIELD, + values.getValuesList().stream() + .map(ByteString::toByteArray) + .collect(Collectors.toList())) + .build(); + listBuilder.add(entry); + }); + builder.withFieldValue(ATTRIBUTES_FIELD, listBuilder.build()); + } + if (payloadSerializer == null) { + builder.withFieldValue(PAYLOAD_FIELD, message.getMessage().getData().toByteArray()); + } else { + builder.withFieldValue( + PAYLOAD_FIELD, + payloadSerializer.deserialize(message.getMessage().getData().toByteArray())); + } + return builder.build(); + } + + /* Convert a row to a message. If Schema payload field is a Row type, payloadSerializer is required. */ + PubSubMessage rowToMessage(Row row) { + row = castRow(row, row.getSchema(), schema); + PubSubMessage.Builder builder = PubSubMessage.newBuilder(); + if (schema.hasField(MESSAGE_KEY_FIELD)) { + byte[] bytes = row.getBytes(MESSAGE_KEY_FIELD); + if (bytes != null) { + builder.setKey(ByteString.copyFrom(bytes)); + } + } + if (schema.hasField(EVENT_TIMESTAMP_FIELD)) { + ReadableDateTime time = row.getDateTime(EVENT_TIMESTAMP_FIELD); + if (time != null) { + builder.setEventTime(Timestamps.fromMillis(time.getMillis())); + } + } + if (schema.hasField(ATTRIBUTES_FIELD)) { + Collection attributes = row.getArray(ATTRIBUTES_FIELD); + if (attributes != null) { + attributes.forEach( + entry -> { + AttributeValues.Builder valuesBuilder = AttributeValues.newBuilder(); + Collection values = + checkArgumentNotNull(entry.getArray(ATTRIBUTES_VALUES_FIELD)); + values.forEach(bytes -> valuesBuilder.addValues(ByteString.copyFrom(bytes))); + builder.putAttributes( + checkArgumentNotNull(entry.getString(ATTRIBUTES_KEY_FIELD)), + valuesBuilder.build()); + }); + } + } + if (payloadSerializer == null) { + byte[] payload = row.getBytes(PAYLOAD_FIELD); + if (payload != null) { + builder.setData(ByteString.copyFrom(payload)); + } + } else { + Row payload = row.getRow(PAYLOAD_FIELD); + if (payload != null) { + builder.setData(ByteString.copyFrom(payloadSerializer.serialize(payload))); + } + } + return builder.build(); + } +} diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsublite/package-info.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsublite/package-info.java new file mode 100644 index 000000000000..8f6741957b1a --- /dev/null +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsublite/package-info.java @@ -0,0 +1,24 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license 
agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** Provides abstractions for schema-aware IOs. */ +@DefaultAnnotation(NonNull.class) +package org.apache.beam.sdk.extensions.sql.meta.provider.pubsublite; + +import edu.umd.cs.findbugs.annotations.DefaultAnnotation; +import org.checkerframework.checker.nullness.qual.NonNull; diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableFilter.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableFilter.java index 1f05d3c529e3..ae99c61320f3 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableFilter.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableFilter.java @@ -17,18 +17,18 @@ */ package org.apache.beam.sdk.extensions.sql.meta.provider.test; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind.COMPARISON; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind.IN; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind.COMPARISON; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind.IN; import java.util.ArrayList; import java.util.List; import java.util.stream.Collectors; import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTableFilter; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLiteral; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeName; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexInputRef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlTypeName; @SuppressWarnings({ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProvider.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProvider.java index c865ede20976..7c25993353a5 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProvider.java +++ 
b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProvider.java @@ -17,7 +17,7 @@ */ package org.apache.beam.sdk.extensions.sql.meta.provider.test; -import static org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Preconditions.checkArgument; import com.google.auto.service.AutoService; import java.io.Serializable; @@ -55,11 +55,11 @@ import org.apache.beam.sdk.values.PDone; import org.apache.beam.sdk.values.POutput; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLiteral; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeName; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexInputRef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlTypeName; /** * Test in-memory table provider for use in tests. diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableUtils.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableUtils.java index a4d41f274f47..9ee6c298f9c9 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableUtils.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableUtils.java @@ -27,7 +27,7 @@ import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.schemas.Schema.FieldType; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.Lists; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.Lists; /** Utility functions for mock classes. 
*/ @Experimental diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestUnboundedTable.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestUnboundedTable.java index 05cbc55ee5d8..80f177283682 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestUnboundedTable.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestUnboundedTable.java @@ -29,7 +29,7 @@ import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; import org.apache.beam.sdk.values.TimestampedValue; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Pair; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.Pair; import org.joda.time.Duration; import org.joda.time.Instant; diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/text/TextTableProvider.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/text/TextTableProvider.java index 28e7f0c4c994..d50a14cbac0f 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/text/TextTableProvider.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/text/TextTableProvider.java @@ -51,9 +51,9 @@ import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.sdk.values.TupleTagList; import org.apache.beam.sdk.values.TypeDescriptors; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.annotations.VisibleForTesting; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.MoreObjects; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableSet; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.MoreObjects; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableSet; import org.apache.commons.csv.CSVFormat; import org.checkerframework.checker.nullness.qual.Nullable; diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/store/InMemoryMetaStore.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/store/InMemoryMetaStore.java index 5030be66505a..68901c9a8ad7 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/store/InMemoryMetaStore.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/store/InMemoryMetaStore.java @@ -22,7 +22,7 @@ import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable; import org.apache.beam.sdk.extensions.sql.meta.Table; import org.apache.beam.sdk.extensions.sql.meta.provider.TableProvider; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableMap; /** * A {@link MetaStore} which stores the meta info in memory. 
diff --git a/sdks/java/extensions/sql/src/main/resources/org.apache.beam.vendor.calcite.v1_20_0.org.codehaus.commons.compiler.properties b/sdks/java/extensions/sql/src/main/resources/org.apache.beam.vendor.calcite.v1_26_0.org.codehaus.commons.compiler.properties similarity index 93% rename from sdks/java/extensions/sql/src/main/resources/org.apache.beam.vendor.calcite.v1_20_0.org.codehaus.commons.compiler.properties rename to sdks/java/extensions/sql/src/main/resources/org.apache.beam.vendor.calcite.v1_26_0.org.codehaus.commons.compiler.properties index ab9a23481740..9bcf38628f06 100644 --- a/sdks/java/extensions/sql/src/main/resources/org.apache.beam.vendor.calcite.v1_20_0.org.codehaus.commons.compiler.properties +++ b/sdks/java/extensions/sql/src/main/resources/org.apache.beam.vendor.calcite.v1_26_0.org.codehaus.commons.compiler.properties @@ -15,4 +15,4 @@ # See the License for the specific language governing permissions and # limitations under the License. ################################################################################ -compilerFactory=org.apache.beam.vendor.calcite.v1_20_0.org.codehaus.janino.CompilerFactory +compilerFactory=org.apache.beam.vendor.calcite.v1_26_0.org.codehaus.janino.CompilerFactory diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java index 5e778764520e..b4fa11a7c05e 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java @@ -35,8 +35,8 @@ import org.apache.beam.sdk.transforms.Create; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableMap; import org.joda.time.Duration; import org.joda.time.Instant; import org.junit.Ignore; @@ -44,9 +44,6 @@ import org.junit.Test; /** Unit Tests for ComplexTypes, including nested ROW etc. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamComplexTypeTest { private static final Schema innerRowSchema = Schema.builder().addStringField("string_field").addInt64Field("long_field").build(); @@ -85,16 +82,6 @@ public class BeamComplexTypeTest { .addArrayField("field3", FieldType.INT64) .build(); - private static final Schema flattenedRowSchema = - Schema.builder() - .addStringField("field1") - .addStringField("field2") - .addInt64Field("field3") - .addInt64Field("field4") - .addStringField("field5") - .addInt64Field("field6") - .build(); - private static final ReadOnlyTableProvider readOnlyTableProvider = new ReadOnlyTableProvider( "test_provider", @@ -134,10 +121,18 @@ public void testNestedRow() { PCollection stream = BeamSqlRelUtils.toPCollection( pipeline, sqlEnv.parseQuery("SELECT nestedRowTestTable.col FROM nestedRowTestTable")); + Schema outputSchema = Schema.builder().addRowField("col", nestedRowSchema).build(); PAssert.that(stream) .containsInAnyOrder( - Row.withSchema(flattenedRowSchema) - .addValues("str", "inner_str_one", 1L, 2L, "inner_str_two", 3L) + Row.withSchema(outputSchema) + .addValues( + Row.withSchema(nestedRowSchema) + .addValues( + "str", + Row.withSchema(innerRowSchema).addValues("inner_str_one", 1L).build(), + 2L, + Row.withSchema(innerRowSchema).addValues("inner_str_two", 3L).build()) + .build()) .build()); pipeline.run().waitUntilFinish(Duration.standardMinutes(2)); } @@ -149,8 +144,12 @@ public void testArrayWithRow() { BeamSqlRelUtils.toPCollection( pipeline, sqlEnv.parseQuery("SELECT arrayWithRowTestTable.col[1] FROM arrayWithRowTestTable")); + Schema outputSchema = Schema.builder().addRowField("col", innerRowSchema).build(); PAssert.that(stream) - .containsInAnyOrder(Row.withSchema(innerRowSchema).addValues("str", 1L).build()); + .containsInAnyOrder( + Row.withSchema(outputSchema) + .addValues(Row.withSchema(innerRowSchema).addValues("str", 1L).build()) + .build()); pipeline.run().waitUntilFinish(Duration.standardMinutes(2)); } @@ -176,8 +175,12 @@ public void testBasicRow() { PCollection stream = BeamSqlRelUtils.toPCollection( pipeline, sqlEnv.parseQuery("SELECT col FROM basicRowTestTable")); + Schema outputSchema = Schema.builder().addRowField("col", innerRowSchema).build(); PAssert.that(stream) - .containsInAnyOrder(Row.withSchema(innerRowSchema).addValues("innerStr", 1L).build()); + .containsInAnyOrder( + Row.withSchema(outputSchema) + .addValues(Row.withSchema(innerRowSchema).addValues("innerStr", 1L).build()) + .build()); pipeline.run().waitUntilFinish(Duration.standardMinutes(2)); } @@ -242,6 +245,7 @@ public void testSelectInnerRowOfNestedRow() { pipeline.run().waitUntilFinish(Duration.standardMinutes(2)); } + @Ignore("https://issues.apache.org/jira/browse/BEAM-12782") @Test public void testNestedBytes() { byte[] bytes = new byte[] {-70, -83, -54, -2}; @@ -266,6 +270,7 @@ public void testNestedBytes() { pipeline.run(); } + @Ignore("https://issues.apache.org/jira/browse/BEAM-12782") @Test public void testNestedArrayOfBytes() { byte[] bytes = new byte[] {-70, -83, -54, -2}; @@ -297,18 +302,26 @@ public void testRowConstructor() { PCollection stream = BeamSqlRelUtils.toPCollection( pipeline, sqlEnv.parseQuery("SELECT ROW(1, ROW(2, 3), 'str', ROW('str2', 'str3'))")); + Schema intRow = Schema.builder().addInt32Field("field2").addInt32Field("field3").build(); + Schema strRow = Schema.builder().addStringField("field5").addStringField("field6").build(); + Schema innerRow = + 
Schema.builder() + .addInt32Field("field1") + .addRowField("intRow", intRow) + .addStringField("field4") + .addRowField("strRow", strRow) + .build(); PAssert.that(stream) .containsInAnyOrder( - Row.withSchema( - Schema.builder() - .addInt32Field("field1") - .addInt32Field("field2") - .addInt32Field("field3") - .addStringField("field4") - .addStringField("field5") - .addStringField("field6") + Row.withSchema(Schema.builder().addRowField("row", innerRow).build()) + .addValues( + Row.withSchema(innerRow) + .addValues( + 1, + Row.withSchema(intRow).addValues(2, 3).build(), + "str", + Row.withSchema(strRow).addValues("str2", "str3").build()) .build()) - .addValues(1, 2, 3, "str", "str2", "str3") .build()); pipeline.run().waitUntilFinish(Duration.standardMinutes(2)); } @@ -389,7 +402,10 @@ public void testDatetimeFields() { .setRowSchema(dateTimeFieldSchema) .apply( SqlTransform.query( - "select EXTRACT(YEAR from dateTimeField) as yyyy, " + "select " + + " dateTimeField, " + + " nullableDateTimeField, " + + " EXTRACT(YEAR from dateTimeField) as yyyy, " + " EXTRACT(YEAR from nullableDateTimeField) as year_with_null, " + " EXTRACT(MONTH from dateTimeField) as mm, " + " EXTRACT(MONTH from nullableDateTimeField) as month_with_null " @@ -397,6 +413,8 @@ public void testDatetimeFields() { Schema outputRowSchema = Schema.builder() + .addField("dateTimeField", FieldType.DATETIME) + .addNullableField("nullableDateTimeField", FieldType.DATETIME) .addField("yyyy", FieldType.INT64) .addNullableField("year_with_null", FieldType.INT64) .addField("mm", FieldType.INT64) @@ -405,7 +423,9 @@ public void testDatetimeFields() { PAssert.that(outputRow) .containsInAnyOrder( - Row.withSchema(outputRowSchema).addValues(2019L, null, 06L, null).build()); + Row.withSchema(outputRowSchema) + .addValues(current, null, 2019L, null, 06L, null) + .build()); pipeline.run().waitUntilFinish(Duration.standardMinutes(2)); } @@ -427,7 +447,10 @@ public void testSqlLogicalTypeDateFields() { .setRowSchema(dateTimeFieldSchema) .apply( SqlTransform.query( - "select EXTRACT(DAY from dateTypeField) as dd, " + "select " + + " dateTypeField, " + + " nullableDateTypeField, " + + " EXTRACT(DAY from dateTypeField) as dd, " + " EXTRACT(DAY from nullableDateTypeField) as day_with_null, " + " dateTypeField + interval '1' day as date_with_day_added, " + " nullableDateTypeField + interval '1' day as day_added_with_null " @@ -435,6 +458,8 @@ public void testSqlLogicalTypeDateFields() { Schema outputRowSchema = Schema.builder() + .addField("dateTypeField", FieldType.logicalType(SqlTypes.DATE)) + .addNullableField("nullableDateTypeField", FieldType.logicalType(SqlTypes.DATE)) .addField("dd", FieldType.INT64) .addNullableField("day_with_null", FieldType.INT64) .addField("date_with_day_added", FieldType.logicalType(SqlTypes.DATE)) @@ -444,7 +469,8 @@ public void testSqlLogicalTypeDateFields() { PAssert.that(outputRow) .containsInAnyOrder( Row.withSchema(outputRowSchema) - .addValues(27L, null, LocalDate.of(2019, 6, 28), null) + .addValues( + LocalDate.of(2019, 6, 27), null, 27L, null, LocalDate.of(2019, 6, 28), null) .build()); pipeline.run().waitUntilFinish(Duration.standardMinutes(2)); @@ -467,7 +493,10 @@ public void testSqlLogicalTypeTimeFields() { .setRowSchema(dateTimeFieldSchema) .apply( SqlTransform.query( - "select timeTypeField + interval '1' hour as time_with_hour_added, " + "select " + + " timeTypeField, " + + " nullableTimeTypeField, " + + " timeTypeField + interval '1' hour as time_with_hour_added, " + " nullableTimeTypeField + interval 
'1' hour as hour_added_with_null, " + " timeTypeField - INTERVAL '60' SECOND as time_with_seconds_added, " + " nullableTimeTypeField - INTERVAL '60' SECOND as seconds_added_with_null " @@ -475,6 +504,8 @@ public void testSqlLogicalTypeTimeFields() { Schema outputRowSchema = Schema.builder() + .addField("timeTypeField", FieldType.logicalType(SqlTypes.TIME)) + .addNullableField("nullableTimeTypeField", FieldType.logicalType(SqlTypes.TIME)) .addField("time_with_hour_added", FieldType.logicalType(SqlTypes.TIME)) .addNullableField("hour_added_with_null", FieldType.logicalType(SqlTypes.TIME)) .addField("time_with_seconds_added", FieldType.logicalType(SqlTypes.TIME)) @@ -484,7 +515,13 @@ public void testSqlLogicalTypeTimeFields() { PAssert.that(outputRow) .containsInAnyOrder( Row.withSchema(outputRowSchema) - .addValues(LocalTime.of(2, 0, 0), null, LocalTime.of(0, 59, 0), null) + .addValues( + LocalTime.of(1, 0, 0), + null, + LocalTime.of(2, 0, 0), + null, + LocalTime.of(0, 59, 0), + null) .build()); pipeline.run().waitUntilFinish(Duration.standardMinutes(2)); @@ -509,7 +546,10 @@ public void testSqlLogicalTypeDatetimeFields() { .setRowSchema(dateTimeFieldSchema) .apply( SqlTransform.query( - "select EXTRACT(YEAR from dateTimeField) as yyyy, " + "select " + + " dateTimeField, " + + " nullableDateTimeField, " + + " EXTRACT(YEAR from dateTimeField) as yyyy, " + " EXTRACT(YEAR from nullableDateTimeField) as year_with_null, " + " EXTRACT(MONTH from dateTimeField) as mm, " + " EXTRACT(MONTH from nullableDateTimeField) as month_with_null, " @@ -525,6 +565,8 @@ public void testSqlLogicalTypeDatetimeFields() { Schema outputRowSchema = Schema.builder() + .addField("dateTimeField", FieldType.logicalType(SqlTypes.DATETIME)) + .addNullableField("nullableDateTimeField", FieldType.logicalType(SqlTypes.DATETIME)) .addField("yyyy", FieldType.INT64) .addNullableField("year_with_null", FieldType.INT64) .addField("mm", FieldType.INT64) @@ -543,6 +585,8 @@ public void testSqlLogicalTypeDatetimeFields() { .containsInAnyOrder( Row.withSchema(outputRowSchema) .addValues( + LocalDateTime.of(2008, 12, 25, 15, 30, 0), + null, 2008L, null, 12L, diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlCliTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlCliTest.java index df95b81a0b91..9584972b3c18 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlCliTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlCliTest.java @@ -42,9 +42,6 @@ import org.junit.Test; /** UnitTest for {@link BeamSqlCli}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamSqlCliTest { @Test public void testExecute_createTextTable() throws Exception { diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslAggregationCovarianceTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslAggregationCovarianceTest.java index 04085e70f835..edb765a254db 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslAggregationCovarianceTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslAggregationCovarianceTest.java @@ -31,9 +31,6 @@ import org.junit.Test; /** Integration tests for {@code COVAR_POP} and {@code COVAR_SAMP}. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamSqlDslAggregationCovarianceTest { private static final double PRECISION = 1e-7; diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslAggregationNullableTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslAggregationNullableTest.java index 6ee3d2474f29..91925f08d80e 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslAggregationNullableTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslAggregationNullableTest.java @@ -33,9 +33,6 @@ import org.junit.Test; /** Integration tests for aggregation nullable columns. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamSqlDslAggregationNullableTest { @Rule public TestPipeline pipeline = TestPipeline.create(); diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslAggregationTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslAggregationTest.java index 4c83b7cef194..206dde09a5f6 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslAggregationTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslAggregationTest.java @@ -32,6 +32,8 @@ import java.util.List; import java.util.Map; import org.apache.beam.sdk.extensions.sql.impl.ParseException; +import org.apache.beam.sdk.extensions.sql.impl.SqlConversionException; +import org.apache.beam.sdk.extensions.sql.impl.transform.agg.CountIf; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.testing.PAssert; import org.apache.beam.sdk.testing.TestStream; @@ -59,9 +61,6 @@ * Tests for GROUP-BY/aggregation, with global_window/fix_time_window/sliding_window/session_window * with BOUNDED PCollection. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamSqlDslAggregationTest extends BeamSqlDslBase { public PCollection boundedInput3; @@ -138,6 +137,26 @@ private void runAggregationWithoutWindow(PCollection input) throws Exceptio pipeline.run().waitUntilFinish(); } + /** GROUP-BY ROLLUP with bounded PCollection. */ + @Test + public void testAggregationRollupWithBounded() throws Exception { + runAggregationRollup(boundedInput1); + } + + /** GROUP-BY with single aggregation function with unbounded PCollection. */ + @Test + public void testAggregationRollupWithUnbounded() throws Exception { + runAggregationRollup(unboundedInput1); + } + + private void runAggregationRollup(PCollection input) throws Exception { + String sql = "SELECT f_int2 FROM PCOLLECTION GROUP BY ROLLUP(f_int2)"; + + exceptions.expect(SqlConversionException.class); + pipeline.enableAbandonedNodeEnforcement(false); + input.apply(SqlTransform.query(sql)); + } + /** GROUP-BY with multiple aggregation functions with bounded PCollection. 
*/ @Test public void testAggregationFunctionsWithBounded() throws Exception { @@ -767,7 +786,7 @@ public void testWindowOnNonTimestampField() throws Exception { exceptions.expectCause( hasMessage( containsString( - "Cannot apply 'TUMBLE' to arguments of type 'TUMBLE(, )'"))); + "Cannot apply '$TUMBLE' to arguments of type '$TUMBLE(, )'"))); pipeline.enableAbandonedNodeEnforcement(false); String sql = @@ -982,4 +1001,31 @@ private PCollection createTestPCollection( .withTimestampField(timestampField) .buildUnbounded(); } + + @Test + public void testCountIfFunction() throws Exception { + pipeline.enableAbandonedNodeEnforcement(false); + + Schema schemaInTableA = + Schema.builder().addInt64Field("f_int64").addInt64Field("f_int64_2").build(); + Schema resultType = Schema.builder().addInt64Field("finalAnswer").build(); + List rowsInTableA = + TestUtils.RowsBuilder.of(schemaInTableA) + .addRows( + 1L, 0L, + 3L, 0L, + 4L, 0L) + .getRows(); + + String sql = + "SELECT COUNTIF(f_int64 >" + 0 + ") AS countif_no " + "FROM PCOLLECTION GROUP BY f_int64_2"; + Row rowResult = Row.withSchema(resultType).addValues(3L).build(); + PCollection inputRows = + pipeline.apply("longVals", Create.of(rowsInTableA).withRowSchema(schemaInTableA)); + PCollection result = + inputRows.apply( + "sql", SqlTransform.query(sql).registerUdaf("COUNTIF", new CountIf.CountIfFn())); + PAssert.that(result).containsInAnyOrder(rowResult); + pipeline.run().waitUntilFinish(); + } } diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslAggregationVarianceTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslAggregationVarianceTest.java index be2ac66b725d..808b27aaac4c 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslAggregationVarianceTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslAggregationVarianceTest.java @@ -31,9 +31,6 @@ import org.junit.Test; /** Integration tests for {@code VAR_POP} and {@code VAR_SAMP}. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamSqlDslAggregationVarianceTest { private static final double PRECISION = 1e-7; diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslArrayTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslArrayTest.java index a9da87acbaca..f07246751320 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslArrayTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslArrayTest.java @@ -296,7 +296,7 @@ public void testSelectSingleRowFromArrayOfRows() { Schema elementSchema = Schema.builder().addStringField("f_rowString").addInt32Field("f_rowInt").build(); - Schema resultSchema = elementSchema; + Schema resultSchema = Schema.builder().addRowField("row", elementSchema).build(); Schema inputType = Schema.builder() @@ -330,8 +330,12 @@ public void testSelectSingleRowFromArrayOfRows() { PAssert.that(result) .containsInAnyOrder( - Row.withSchema(elementSchema).addValues("BB", 22).build(), - Row.withSchema(elementSchema).addValues("DD", 44).build()); + Row.withSchema(resultSchema) + .addValues(Row.withSchema(elementSchema).addValues("BB", 22).build()) + .build(), + Row.withSchema(resultSchema) + .addValues(Row.withSchema(elementSchema).addValues("DD", 44).build()) + .build()); pipeline.run(); } diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslBase.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslBase.java index b3406a93b686..426b95ae6df6 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslBase.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslBase.java @@ -23,9 +23,13 @@ import java.math.BigDecimal; import java.text.ParseException; +import java.time.LocalDate; +import java.time.LocalDateTime; +import java.time.LocalTime; import java.util.List; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.schemas.logicaltypes.SqlTypes; import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.testing.TestStream; import org.apache.beam.sdk.transforms.Create; @@ -46,9 +50,6 @@ * *

    Note that, any change in these records would impact tests in this package. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamSqlDslBase { @Rule public final TestPipeline pipeline = TestPipeline.create(); @Rule public ExpectedException exceptions = ExpectedException.none(); @@ -87,6 +88,9 @@ public static void prepareClass() throws ParseException { .addFloatField("f_float") .addDoubleField("f_double") .addStringField("f_string") + .addField("f_date", FieldType.logicalType(SqlTypes.DATE)) + .addField("f_time", FieldType.logicalType(SqlTypes.TIME)) + .addField("f_datetime", FieldType.logicalType(SqlTypes.DATETIME)) .addDateTimeField("f_timestamp") .addInt32Field("f_int2") .addDecimalField("f_decimal") @@ -102,6 +106,9 @@ public static void prepareClass() throws ParseException { 1.0f, 1.0d, "string_row1", + LocalDate.of(2017, 1, 1), + LocalTime.of(1, 1, 3), + LocalDateTime.of(2017, 1, 1, 1, 1, 3), parseTimestampWithoutTimeZone("2017-01-01 01:01:03"), 0, new BigDecimal(1)) @@ -113,6 +120,9 @@ public static void prepareClass() throws ParseException { 2.0f, 2.0d, "string_row2", + LocalDate.of(2017, 1, 1), + LocalTime.of(1, 2, 3), + LocalDateTime.of(2017, 1, 1, 1, 2, 3), parseTimestampWithoutTimeZone("2017-01-01 01:02:03"), 0, new BigDecimal(2)) @@ -124,6 +134,9 @@ public static void prepareClass() throws ParseException { 3.0f, 3.0d, "string_row3", + LocalDate.of(2017, 1, 1), + LocalTime.of(1, 6, 3), + LocalDateTime.of(2017, 1, 1, 1, 6, 3), parseTimestampWithoutTimeZone("2017-01-01 01:06:03"), 0, new BigDecimal(3)) @@ -135,6 +148,9 @@ public static void prepareClass() throws ParseException { 4.0f, 4.0d, "第四行", + LocalDate.of(2017, 1, 1), + LocalTime.of(2, 4, 3), + LocalDateTime.of(2017, 1, 1, 2, 4, 3), parseTimestampWithoutTimeZone("2017-01-01 02:04:03"), 0, new BigDecimal(4)) @@ -150,6 +166,9 @@ public static void prepareClass() throws ParseException { 1.0f, 1.0d, "string_row1", + LocalDate.of(2017, 1, 1), + LocalTime.of(1, 1, 3), + LocalDateTime.of(2017, 1, 1, 1, 1, 3), parseTimestampWithUTCTimeZone("2017-01-01 01:01:03"), 0, new BigDecimal(1)) @@ -161,6 +180,9 @@ public static void prepareClass() throws ParseException { 2.0f, 2.0d, "string_row2", + LocalDate.of(2017, 1, 1), + LocalTime.of(1, 2, 3), + LocalDateTime.of(2017, 1, 1, 1, 2, 3), parseTimestampWithUTCTimeZone("2017-02-01 01:02:03"), 0, new BigDecimal(2)) @@ -172,6 +194,9 @@ public static void prepareClass() throws ParseException { 3.0f, 3.0d, "string_row3", + LocalDate.of(2017, 1, 1), + LocalTime.of(1, 6, 3), + LocalDateTime.of(2017, 1, 1, 1, 6, 3), parseTimestampWithUTCTimeZone("2017-03-01 01:06:03"), 0, new BigDecimal(3)) diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslJoinTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslJoinTest.java index da7ca1ce579f..b210bb0258f5 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslJoinTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslJoinTest.java @@ -42,9 +42,6 @@ import org.junit.rules.ExpectedException; /** Tests for joins in queries. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamSqlDslJoinTest { @Rule public final ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslNestedRowsTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslNestedRowsTest.java index db6cc1421b21..0efa0e30874c 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslNestedRowsTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslNestedRowsTest.java @@ -34,6 +34,10 @@ public class BeamSqlDslNestedRowsTest { @Rule public final TestPipeline pipeline = TestPipeline.create(); @Rule public ExpectedException exceptions = ExpectedException.none(); + /** + * TODO([BEAM-9378]): This is a test of the incorrect behavior that should not work but does + * because calcite flattens the row. + */ @Test public void testRowConstructorKeyword() { Schema nestedSchema = @@ -43,39 +47,108 @@ public void testRowConstructorKeyword() { .addInt32Field("f_nestedIntPlusOne") .build(); - Schema resultSchema = + Schema schema = + Schema.builder().addInt32Field("f_int").addRowField("f_row", nestedSchema).build(); + + PCollection input = + pipeline.apply( + Create.of( + Row.withSchema(schema) + .addValues( + 1, Row.withSchema(nestedSchema).addValues(312, "CC", 313).build()) + .build()) + .withRowSchema(schema)); + + PCollection result = + input.apply( + SqlTransform.query( + "SELECT 1 as `f_int`, ROW(3, 'BB', f_int + 1) as `f_row1` FROM PCOLLECTION")); + + PAssert.that(result) + .containsInAnyOrder( + Row.withSchema(schema) + .addValues(1, Row.withSchema(nestedSchema).addValues(3, "BB", 2).build()) + .build()); + + pipeline.run(); + } + + @Test + public void testRowAliasAsRow() { + Schema nestedSchema = Schema.builder() - .addInt32Field("f_int") - .addInt32Field("f_int2") - .addStringField("f_varchar") - .addInt32Field("f_int3") + .addStringField("f_nestedString") + .addInt32Field("f_nestedInt") + .addInt32Field("f_nestedIntPlusOne") .build(); Schema inputType = Schema.builder().addInt32Field("f_int").addRowField("f_row", nestedSchema).build(); + Schema outputType = + Schema.builder().addInt32Field("f_int").addRowField("f_row1", nestedSchema).build(); PCollection input = pipeline.apply( Create.of( Row.withSchema(inputType) - .addValues( - 1, Row.withSchema(nestedSchema).addValues(312, "CC", 313).build()) - .build()) + .attachValues(1, Row.withSchema(nestedSchema).attachValues("CC", 312, 313))) + .withRowSchema(inputType)); + + PCollection result = + input + .apply(SqlTransform.query("SELECT 1 as `f_int`, f_row as `f_row1` FROM PCOLLECTION")) + .setRowSchema(outputType); + + PAssert.that(result) + .containsInAnyOrder( + Row.withSchema(outputType) + .attachValues(1, Row.withSchema(nestedSchema).attachValues("CC", 312, 313))); + + pipeline.run(); + } + + @Test + public void testRowConstructorKeywordKeepAsRow() { + Schema nestedSchema = + Schema.builder() + .addStringField("f_nestedString") + .addInt32Field("f_nestedInt") + .addInt32Field("f_nestedIntPlusOne") + .build(); + + Schema inputType = + Schema.builder().addInt32Field("f_int").addRowField("f_row", nestedSchema).build(); + Schema nestedOutput = + Schema.builder().addInt32Field("int_field").addStringField("str_field").build(); + Schema outputType = + Schema.builder().addInt32Field("f_int1").addRowField("f_row1", nestedOutput).build(); + + 
PCollection input = + pipeline.apply( + Create.of( + Row.withSchema(inputType) + .attachValues(2, Row.withSchema(nestedSchema).attachValues("CC", 312, 313))) .withRowSchema(inputType)); PCollection result = input .apply( SqlTransform.query( - "SELECT 1 as `f_int`, ROW(3, 'BB', f_int + 1) as `f_row1` FROM PCOLLECTION")) - .setRowSchema(resultSchema); + "SELECT f_int as `f_int1`, (`PCOLLECTION`.`f_row`.`f_nestedInt`, `PCOLLECTION`.`f_row`.`f_nestedString`) as `f_row1` FROM PCOLLECTION")) + .setRowSchema(outputType); PAssert.that(result) - .containsInAnyOrder(Row.withSchema(resultSchema).addValues(1, 3, "BB", 2).build()); + .containsInAnyOrder( + Row.withSchema(outputType) + .attachValues(2, Row.withSchema(nestedOutput).attachValues(312, "CC"))); pipeline.run(); } + /** + * TODO([BEAM-9378] This is a test of the incorrect behavior that should not work but does because + * calcite flattens the row. + */ @Test public void testRowConstructorBraces() { @@ -86,35 +159,28 @@ public void testRowConstructorBraces() { .addInt32Field("f_nestedIntPlusOne") .build(); - Schema resultSchema = - Schema.builder() - .addInt32Field("f_int") - .addInt32Field("f_int2") - .addStringField("f_varchar") - .addInt32Field("f_int3") - .build(); - - Schema inputType = + Schema schema = Schema.builder().addInt32Field("f_int").addRowField("f_row", nestedSchema).build(); PCollection input = pipeline.apply( Create.of( - Row.withSchema(inputType) + Row.withSchema(schema) .addValues( 1, Row.withSchema(nestedSchema).addValues(312, "CC", 313).build()) .build()) - .withRowSchema(inputType)); + .withRowSchema(schema)); PCollection result = - input - .apply( - SqlTransform.query( - "SELECT 1 as `f_int`, (3, 'BB', f_int + 1) as `f_row1` FROM PCOLLECTION")) - .setRowSchema(resultSchema); + input.apply( + SqlTransform.query( + "SELECT 1 as `f_int`, (3, 'BB', f_int + 1) as `f_row1` FROM PCOLLECTION")); PAssert.that(result) - .containsInAnyOrder(Row.withSchema(resultSchema).addValues(1, 3, "BB", 2).build()); + .containsInAnyOrder( + Row.withSchema(schema) + .addValues(1, Row.withSchema(nestedSchema).addValues(3, "BB", 2).build()) + .build()); pipeline.run(); } diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslProjectTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslProjectTest.java index 024db76f9d3a..998013fe99af 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslProjectTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslProjectTest.java @@ -35,9 +35,6 @@ import org.junit.Test; /** Tests for field-project in queries with BOUNDED PCollection. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamSqlDslProjectTest extends BeamSqlDslBase { /** select all fields with bounded PCollection. 
*/ @Test diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslSqlStdOperatorsTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslSqlStdOperatorsTest.java index 9c0fd681802f..f7dbb367454b 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslSqlStdOperatorsTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslSqlStdOperatorsTest.java @@ -17,9 +17,10 @@ */ package org.apache.beam.sdk.extensions.sql; +import static java.math.RoundingMode.UNNECESSARY; import static org.apache.beam.sdk.extensions.sql.utils.DateTimeUtils.parseTimestampWithUTCTimeZone; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import com.google.auto.value.AutoValue; @@ -30,7 +31,6 @@ import java.lang.annotation.Target; import java.lang.reflect.Method; import java.math.BigDecimal; -import java.math.RoundingMode; import java.time.LocalDate; import java.time.LocalTime; import java.util.Arrays; @@ -44,14 +44,14 @@ import org.apache.beam.sdk.extensions.sql.integrationtest.BeamSqlBuiltinFunctionsIntegrationTestBase; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.schemas.Schema.FieldType; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Joiner; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.Lists; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.Ordering; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.runtime.SqlFunctions; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlOperator; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.fun.SqlStdOperatorTable; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Joiner; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.Lists; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.Ordering; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.runtime.SqlFunctions; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlOperator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.fun.SqlStdOperatorTable; import org.junit.Ignore; import org.junit.Rule; import org.junit.Test; @@ -59,19 +59,16 @@ /** * DSL compliance tests for the row-level operators of {@link - * org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.fun.SqlStdOperatorTable}. + * org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.fun.SqlStdOperatorTable}. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamSqlDslSqlStdOperatorsTest extends BeamSqlBuiltinFunctionsIntegrationTestBase { - private static final BigDecimal ZERO = BigDecimal.valueOf(0.0); - private static final BigDecimal ONE = BigDecimal.valueOf(1.0); - private static final BigDecimal ONE2 = BigDecimal.valueOf(1.0).multiply(BigDecimal.valueOf(1.0)); - private static final BigDecimal ONE10 = - BigDecimal.ONE.divide(BigDecimal.ONE, 10, RoundingMode.HALF_EVEN); - private static final BigDecimal TWO = BigDecimal.valueOf(2.0); - private static final BigDecimal TWO0 = BigDecimal.ONE.add(BigDecimal.ONE); + private static final BigDecimal ZERO_0 = BigDecimal.valueOf(0).setScale(0, UNNECESSARY); + private static final BigDecimal ZERO_1 = BigDecimal.valueOf(0).setScale(1, UNNECESSARY); + private static final BigDecimal ONE_0 = BigDecimal.valueOf(1).setScale(0, UNNECESSARY); + private static final BigDecimal ONE_1 = BigDecimal.valueOf(1).setScale(1, UNNECESSARY); + private static final BigDecimal ONE_2 = BigDecimal.valueOf(1).setScale(2, UNNECESSARY); + private static final BigDecimal TWO_0 = BigDecimal.valueOf(2).setScale(0, UNNECESSARY); + private static final BigDecimal TWO_1 = BigDecimal.valueOf(2).setScale(1, UNNECESSARY); private static final int INTEGER_VALUE = 1; private static final long LONG_VALUE = 1L; @@ -79,6 +76,7 @@ public class BeamSqlDslSqlStdOperatorsTest extends BeamSqlBuiltinFunctionsIntegr private static final byte BYTE_VALUE = 1; private static final double DOUBLE_VALUE = 1.0; private static final float FLOAT_VALUE = 1.0f; + private static final BigDecimal DECIMAL_VALUE = BigDecimal.ONE; @Rule public ExpectedException thrown = ExpectedException.none(); @@ -236,9 +234,11 @@ public void testThatOperatorsExist() { } @Test - @SqlOperatorTest(name = "OR", kind = "OR") - @SqlOperatorTest(name = "NOT", kind = "NOT") - @SqlOperatorTest(name = "AND", kind = "AND") + @SqlOperatorTests({ + @SqlOperatorTest(name = "OR", kind = "OR"), + @SqlOperatorTest(name = "NOT", kind = "NOT"), + @SqlOperatorTest(name = "AND", kind = "AND"), + }) public void testLogicOperators() { ExpressionChecker checker = new ExpressionChecker() @@ -283,23 +283,25 @@ public void testLogicOperators() { } @Test - @SqlOperatorTest(name = "+", kind = "PLUS") - @SqlOperatorTest(name = "-", kind = "MINUS") - @SqlOperatorTest(name = "*", kind = "TIMES") - @SqlOperatorTest(name = "/", kind = "DIVIDE") - @SqlOperatorTest(name = "MOD", kind = "MOD") + @SqlOperatorTests({ + @SqlOperatorTest(name = "+", kind = "PLUS"), + @SqlOperatorTest(name = "-", kind = "MINUS"), + @SqlOperatorTest(name = "*", kind = "TIMES"), + @SqlOperatorTest(name = "/", kind = "DIVIDE"), + @SqlOperatorTest(name = "MOD", kind = "MOD"), + }) public void testArithmeticOperator() { ExpressionChecker checker = new ExpressionChecker() .addExpr("1 + 1", 2) - .addExpr("1.0 + 1", TWO) - .addExpr("1 + 1.0", TWO) - .addExpr("1.0 + 1.0", TWO) + .addExpr("1.0 + 1", TWO_1) + .addExpr("1 + 1.0", TWO_1) + .addExpr("1.0 + 1.0", TWO_1) .addExpr("c_tinyint + c_tinyint", (byte) 2) .addExpr("c_smallint + c_smallint", (short) 2) .addExpr("c_bigint + c_bigint", 2L) - .addExpr("c_decimal + c_decimal", TWO0) - .addExpr("c_tinyint + c_decimal", TWO0) + .addExpr("c_decimal + c_decimal", TWO_0) + .addExpr("c_tinyint + c_decimal", TWO_0) .addExpr("c_float + c_decimal", 2.0) .addExpr("c_double + c_decimal", 2.0) .addExpr("c_float + c_float", 2.0f) @@ -308,9 +310,9 @@ public void testArithmeticOperator() { 
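The expected decimal constants above (ZERO_0, ZERO_1, ONE_0, ONE_1, ONE_2, TWO_0, TWO_1) carry an explicit scale because BigDecimal equality is scale-sensitive: two values that are numerically identical but have different scales do not compare equal, so an expected decimal only matches the computed result when its scale agrees. A minimal plain-JDK illustration, using the same java.math.RoundingMode.UNNECESSARY static import added above (local variable names are illustrative only):

    BigDecimal oneScale0 = BigDecimal.valueOf(1).setScale(0, UNNECESSARY); // "1"
    BigDecimal oneScale1 = BigDecimal.valueOf(1).setScale(1, UNNECESSARY); // "1.0"
    // Numerically the same value...
    System.out.println(oneScale0.compareTo(oneScale1)); // 0
    // ...but not equal, because scale participates in BigDecimal.equals().
    System.out.println(oneScale0.equals(oneScale1)); // false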
.addExpr("c_float + c_bigint", 2.0f) .addExpr("c_double + c_bigint", 2.0) .addExpr("1 - 1", 0) - .addExpr("1.0 - 1", ZERO) - .addExpr("1 - 0.0", ONE) - .addExpr("1.0 - 1.0", ZERO) + .addExpr("1.0 - 1", ZERO_1) + .addExpr("1 - 0.0", ONE_1) + .addExpr("1.0 - 1.0", ZERO_1) .addExpr("c_tinyint - c_tinyint", (byte) 0) .addExpr("c_smallint - c_smallint", (short) 0) .addExpr("c_bigint - c_bigint", 0L) @@ -324,14 +326,14 @@ public void testArithmeticOperator() { .addExpr("c_float - c_bigint", 0.0f) .addExpr("c_double - c_bigint", 0.0) .addExpr("1 * 1", 1) - .addExpr("1.0 * 1", ONE) - .addExpr("1 * 1.0", ONE) - .addExpr("1.0 * 1.0", ONE2) + .addExpr("1.0 * 1", ONE_1) + .addExpr("1 * 1.0", ONE_1) + .addExpr("1.0 * 1.0", ONE_2) .addExpr("c_tinyint * c_tinyint", (byte) 1) .addExpr("c_smallint * c_smallint", (short) 1) .addExpr("c_bigint * c_bigint", 1L) - .addExpr("c_decimal * c_decimal", BigDecimal.ONE) - .addExpr("c_tinyint * c_decimal", BigDecimal.ONE) + .addExpr("c_decimal * c_decimal", ONE_0) + .addExpr("c_tinyint * c_decimal", ONE_0) .addExpr("c_float * c_decimal", 1.0) .addExpr("c_double * c_decimal", 1.0) .addExpr("c_float * c_float", 1.0f) @@ -340,14 +342,14 @@ public void testArithmeticOperator() { .addExpr("c_float * c_bigint", 1.0f) .addExpr("c_double * c_bigint", 1.0) .addExpr("1 / 1", 1) - .addExpr("1.0 / 1", ONE) - .addExpr("1 / 1.0", BigDecimal.ONE) - .addExpr("1.0 / 1.0", BigDecimal.ONE) + .addExpr("1.0 / 1", ONE_1) + .addExpr("1 / 1.0", ONE_0) + .addExpr("1.0 / 1.0", ONE_0) .addExpr("c_tinyint / c_tinyint", (byte) 1) .addExpr("c_smallint / c_smallint", (short) 1) .addExpr("c_bigint / c_bigint", 1L) - .addExpr("c_decimal / c_decimal", BigDecimal.ONE) - .addExpr("c_tinyint / c_decimal", BigDecimal.ONE) + .addExpr("c_decimal / c_decimal", ONE_0) + .addExpr("c_tinyint / c_decimal", ONE_0) .addExpr("c_float / c_decimal", 1.0) .addExpr("c_double / c_decimal", 1.0) .addExpr("c_float / c_float", 1.0f) @@ -356,14 +358,14 @@ public void testArithmeticOperator() { .addExpr("c_float / c_bigint", 1.0f) .addExpr("c_double / c_bigint", 1.0) .addExpr("mod(1, 1)", 0) - .addExpr("mod(1.0, 1)", 0) - .addExpr("mod(1, 1.0)", BigDecimal.ZERO) - .addExpr("mod(1.0, 1.0)", ZERO) + .addExpr("mod(1.0, 1)", ZERO_1) + .addExpr("mod(1, 1.0)", ZERO_1) + .addExpr("mod(1.0, 1.0)", ZERO_1) .addExpr("mod(c_tinyint, c_tinyint)", (byte) 0) .addExpr("mod(c_smallint, c_smallint)", (short) 0) .addExpr("mod(c_bigint, c_bigint)", 0L) - .addExpr("mod(c_decimal, c_decimal)", BigDecimal.ZERO) - .addExpr("mod(c_tinyint, c_decimal)", BigDecimal.ZERO) + .addExpr("mod(c_decimal, c_decimal)", ZERO_0) + .addExpr("mod(c_tinyint, c_decimal)", ZERO_0) // Test overflow .addExpr("c_tinyint_max + c_tinyint_max", (byte) -2) .addExpr("c_smallint_max + c_smallint_max", (short) -2) @@ -374,8 +376,10 @@ public void testArithmeticOperator() { } @Test - @SqlOperatorTest(name = "LIKE", kind = "LIKE") - @SqlOperatorTest(name = "NOT LIKE", kind = "LIKE") + @SqlOperatorTests({ + @SqlOperatorTest(name = "LIKE", kind = "LIKE"), + @SqlOperatorTest(name = "NOT LIKE", kind = "LIKE"), + }) public void testLikeAndNotLike() { ExpressionChecker checker = new ExpressionChecker() @@ -454,22 +458,24 @@ public void testLikeAndNotLike() { } @Test - @SqlOperatorTest(name = "<", kind = "LESS_THAN") - @SqlOperatorTest(name = ">", kind = "GREATER_THAN") - @SqlOperatorTest(name = "<=", kind = "LESS_THAN_OR_EQUAL") - @SqlOperatorTest(name = "<>", kind = "NOT_EQUALS") - @SqlOperatorTest(name = "=", kind = "EQUALS") - @SqlOperatorTest(name = ">=", kind = 
"GREATER_THAN_OR_EQUAL") - @SqlOperatorTest(name = "IS NOT NULL", kind = "IS_NOT_NULL") - @SqlOperatorTest(name = "IS NULL", kind = "IS_NULL") - @SqlOperatorTest(name = "IS TRUE", kind = "IS_TRUE") - @SqlOperatorTest(name = "IS NOT TRUE", kind = "IS_NOT_TRUE") - @SqlOperatorTest(name = "IS FALSE", kind = "IS_FALSE") - @SqlOperatorTest(name = "IS NOT FALSE", kind = "IS_NOT_FALSE") - @SqlOperatorTest(name = "IS UNKNOWN", kind = "IS_NULL") - @SqlOperatorTest(name = "IS NOT UNKNOWN", kind = "IS_NOT_NULL") - @SqlOperatorTest(name = "IS DISTINCT FROM", kind = "IS_DISTINCT_FROM") - @SqlOperatorTest(name = "IS NOT DISTINCT FROM", kind = "IS_NOT_DISTINCT_FROM") + @SqlOperatorTests({ + @SqlOperatorTest(name = "<", kind = "LESS_THAN"), + @SqlOperatorTest(name = ">", kind = "GREATER_THAN"), + @SqlOperatorTest(name = "<=", kind = "LESS_THAN_OR_EQUAL"), + @SqlOperatorTest(name = "<>", kind = "NOT_EQUALS"), + @SqlOperatorTest(name = "=", kind = "EQUALS"), + @SqlOperatorTest(name = ">=", kind = "GREATER_THAN_OR_EQUAL"), + @SqlOperatorTest(name = "IS NOT NULL", kind = "IS_NOT_NULL"), + @SqlOperatorTest(name = "IS NULL", kind = "IS_NULL"), + @SqlOperatorTest(name = "IS TRUE", kind = "IS_TRUE"), + @SqlOperatorTest(name = "IS NOT TRUE", kind = "IS_NOT_TRUE"), + @SqlOperatorTest(name = "IS FALSE", kind = "IS_FALSE"), + @SqlOperatorTest(name = "IS NOT FALSE", kind = "IS_NOT_FALSE"), + @SqlOperatorTest(name = "IS UNKNOWN", kind = "IS_NULL"), + @SqlOperatorTest(name = "IS NOT UNKNOWN", kind = "IS_NOT_NULL"), + @SqlOperatorTest(name = "IS DISTINCT FROM", kind = "IS_DISTINCT_FROM"), + @SqlOperatorTest(name = "IS NOT DISTINCT FROM", kind = "IS_NOT_DISTINCT_FROM"), + }) public void testComparisonOperatorFunction() { ExpressionChecker checker = new ExpressionChecker() @@ -701,16 +707,18 @@ public void testAggrationFunctions() { } @Test - @SqlOperatorTest(name = "CHARACTER_LENGTH", kind = "OTHER_FUNCTION") - @SqlOperatorTest(name = "CHAR_LENGTH", kind = "OTHER_FUNCTION") - @SqlOperatorTest(name = "INITCAP", kind = "OTHER_FUNCTION") - @SqlOperatorTest(name = "LOWER", kind = "OTHER_FUNCTION") - @SqlOperatorTest(name = "POSITION", kind = "POSITION") - @SqlOperatorTest(name = "OVERLAY", kind = "OTHER_FUNCTION") - @SqlOperatorTest(name = "SUBSTRING", kind = "OTHER_FUNCTION") - @SqlOperatorTest(name = "TRIM", kind = "TRIM") - @SqlOperatorTest(name = "UPPER", kind = "OTHER_FUNCTION") - @SqlOperatorTest(name = "||", kind = "OTHER") + @SqlOperatorTests({ + @SqlOperatorTest(name = "CHARACTER_LENGTH", kind = "OTHER_FUNCTION"), + @SqlOperatorTest(name = "CHAR_LENGTH", kind = "OTHER_FUNCTION"), + @SqlOperatorTest(name = "INITCAP", kind = "OTHER_FUNCTION"), + @SqlOperatorTest(name = "LOWER", kind = "OTHER_FUNCTION"), + @SqlOperatorTest(name = "POSITION", kind = "POSITION"), + @SqlOperatorTest(name = "OVERLAY", kind = "OTHER_FUNCTION"), + @SqlOperatorTest(name = "SUBSTRING", kind = "OTHER_FUNCTION"), + @SqlOperatorTest(name = "TRIM", kind = "TRIM"), + @SqlOperatorTest(name = "UPPER", kind = "OTHER_FUNCTION"), + @SqlOperatorTest(name = "||", kind = "OTHER"), + }) public void testStringFunctions() throws Exception { SqlExpressionChecker checker = new SqlExpressionChecker() @@ -756,7 +764,7 @@ public void testAbs() { .addExpr("ABS(c_tinyint)", (byte) Math.abs(BYTE_VALUE)) .addExpr("ABS(c_double)", Math.abs(DOUBLE_VALUE)) .addExpr("ABS(c_float)", Math.abs(FLOAT_VALUE)) - .addExpr("ABS(c_decimal)", new BigDecimal(Math.abs(ONE.doubleValue()))); + .addExpr("ABS(c_decimal)", ONE_0.abs()); checker.buildRunAndCheck(); } @@ -771,7 +779,7 @@ 
public void testLn() { .addExpr("LN(c_tinyint)", Math.log(BYTE_VALUE)) .addExpr("LN(c_double)", Math.log(DOUBLE_VALUE)) .addExpr("LN(c_float)", Math.log(FLOAT_VALUE)) - .addExpr("LN(c_decimal)", Math.log(ONE.doubleValue())); + .addExpr("LN(c_decimal)", Math.log(DECIMAL_VALUE.doubleValue())); checker.buildRunAndCheck(); } @@ -786,7 +794,7 @@ public void testSqrt() { .addExpr("SQRT(c_tinyint)", Math.sqrt(BYTE_VALUE)) .addExpr("SQRT(c_double)", Math.sqrt(DOUBLE_VALUE)) .addExpr("SQRT(c_float)", Math.sqrt(FLOAT_VALUE)) - .addExpr("SQRT(c_decimal)", Math.sqrt(ONE.doubleValue())); + .addExpr("SQRT(c_decimal)", Math.sqrt(DECIMAL_VALUE.doubleValue())); checker.buildRunAndCheck(); } @@ -801,8 +809,7 @@ public void testRound() { .addExpr("ROUND(c_tinyint, 0)", (byte) SqlFunctions.sround(BYTE_VALUE, 0)) .addExpr("ROUND(c_double, 0)", SqlFunctions.sround(DOUBLE_VALUE, 0)) .addExpr("ROUND(c_float, 0)", (float) SqlFunctions.sround(FLOAT_VALUE, 0)) - .addExpr( - "ROUND(c_decimal, 0)", new BigDecimal(SqlFunctions.sround(ONE.doubleValue(), 0))); + .addExpr("ROUND(c_decimal, 0)", SqlFunctions.sround(ONE_0, 0)); checker.buildRunAndCheck(); } @@ -817,7 +824,7 @@ public void testLog10() { .addExpr("LOG10(c_tinyint)", Math.log10(BYTE_VALUE)) .addExpr("LOG10(c_double)", Math.log10(DOUBLE_VALUE)) .addExpr("LOG10(c_float)", Math.log10(FLOAT_VALUE)) - .addExpr("LOG10(c_decimal)", Math.log10(ONE.doubleValue())); + .addExpr("LOG10(c_decimal)", Math.log10(DECIMAL_VALUE.doubleValue())); checker.buildRunAndCheck(); } @@ -832,7 +839,7 @@ public void testExp() { .addExpr("EXP(c_tinyint)", Math.exp(BYTE_VALUE)) .addExpr("EXP(c_double)", Math.exp(DOUBLE_VALUE)) .addExpr("EXP(c_float)", Math.exp(FLOAT_VALUE)) - .addExpr("EXP(c_decimal)", Math.exp(ONE.doubleValue())); + .addExpr("EXP(c_decimal)", Math.exp(DECIMAL_VALUE.doubleValue())); checker.buildRunAndCheck(); } @@ -847,7 +854,7 @@ public void testAcos() { .addExpr("ACOS(c_tinyint)", Math.acos(BYTE_VALUE)) .addExpr("ACOS(c_double)", Math.acos(DOUBLE_VALUE)) .addExpr("ACOS(c_float)", Math.acos(FLOAT_VALUE)) - .addExpr("ACOS(c_decimal)", Math.acos(ONE.doubleValue())); + .addExpr("ACOS(c_decimal)", Math.acos(DECIMAL_VALUE.doubleValue())); checker.buildRunAndCheck(); } @@ -862,7 +869,7 @@ public void testAsin() { .addExpr("ASIN(c_tinyint)", Math.asin(BYTE_VALUE)) .addExpr("ASIN(c_double)", Math.asin(DOUBLE_VALUE)) .addExpr("ASIN(c_float)", Math.asin(FLOAT_VALUE)) - .addExpr("ASIN(c_decimal)", Math.asin(ONE.doubleValue())); + .addExpr("ASIN(c_decimal)", Math.asin(DECIMAL_VALUE.doubleValue())); checker.buildRunAndCheck(); } @@ -877,7 +884,7 @@ public void testAtan() { .addExpr("ATAN(c_tinyint)", Math.atan(BYTE_VALUE)) .addExpr("ATAN(c_double)", Math.atan(DOUBLE_VALUE)) .addExpr("ATAN(c_float)", Math.atan(FLOAT_VALUE)) - .addExpr("ATAN(c_decimal)", Math.atan(ONE.doubleValue())); + .addExpr("ATAN(c_decimal)", Math.atan(DECIMAL_VALUE.doubleValue())); checker.buildRunAndCheck(); } @@ -892,7 +899,7 @@ public void testCot() { .addExpr("COT(c_tinyint)", 1.0d / Math.tan(BYTE_VALUE)) .addExpr("COT(c_double)", 1.0d / Math.tan(DOUBLE_VALUE)) .addExpr("COT(c_float)", 1.0d / Math.tan(FLOAT_VALUE)) - .addExpr("COT(c_decimal)", 1.0d / Math.tan(ONE.doubleValue())); + .addExpr("COT(c_decimal)", 1.0d / Math.tan(DECIMAL_VALUE.doubleValue())); checker.buildRunAndCheck(); } @@ -907,7 +914,7 @@ public void testDegrees() { .addExpr("DEGREES(c_tinyint)", Math.toDegrees(BYTE_VALUE)) .addExpr("DEGREES(c_double)", Math.toDegrees(DOUBLE_VALUE)) .addExpr("DEGREES(c_float)", Math.toDegrees(FLOAT_VALUE)) - 
.addExpr("DEGREES(c_decimal)", Math.toDegrees(ONE.doubleValue())); + .addExpr("DEGREES(c_decimal)", Math.toDegrees(DECIMAL_VALUE.doubleValue())); checker.buildRunAndCheck(); } @@ -922,7 +929,7 @@ public void testRadians() { .addExpr("RADIANS(c_tinyint)", Math.toRadians(BYTE_VALUE)) .addExpr("RADIANS(c_double)", Math.toRadians(DOUBLE_VALUE)) .addExpr("RADIANS(c_float)", Math.toRadians(FLOAT_VALUE)) - .addExpr("RADIANS(c_decimal)", Math.toRadians(ONE.doubleValue())); + .addExpr("RADIANS(c_decimal)", Math.toRadians(DECIMAL_VALUE.doubleValue())); checker.buildRunAndCheck(); } @@ -937,7 +944,7 @@ public void testCos() { .addExpr("COS(c_tinyint)", Math.cos(BYTE_VALUE)) .addExpr("COS(c_double)", Math.cos(DOUBLE_VALUE)) .addExpr("COS(c_float)", Math.cos(FLOAT_VALUE)) - .addExpr("COS(c_decimal)", Math.cos(ONE.doubleValue())); + .addExpr("COS(c_decimal)", Math.cos(DECIMAL_VALUE.doubleValue())); checker.buildRunAndCheck(); } @@ -952,7 +959,7 @@ public void testSin() { .addExpr("SIN(c_tinyint)", Math.sin(BYTE_VALUE)) .addExpr("SIN(c_double)", Math.sin(DOUBLE_VALUE)) .addExpr("SIN(c_float)", Math.sin(FLOAT_VALUE)) - .addExpr("SIN(c_decimal)", Math.sin(ONE.doubleValue())); + .addExpr("SIN(c_decimal)", Math.sin(DECIMAL_VALUE.doubleValue())); checker.buildRunAndCheck(); } @@ -967,7 +974,7 @@ public void testTan() { .addExpr("TAN(c_tinyint)", Math.tan(BYTE_VALUE)) .addExpr("TAN(c_double)", Math.tan(DOUBLE_VALUE)) .addExpr("TAN(c_float)", Math.tan(FLOAT_VALUE)) - .addExpr("TAN(c_decimal)", Math.tan(ONE.doubleValue())); + .addExpr("TAN(c_decimal)", Math.tan(DECIMAL_VALUE.doubleValue())); checker.buildRunAndCheck(); } @@ -983,7 +990,7 @@ public void testSign() { .addExpr("SIGN(c_tinyint)", (byte) Integer.signum(BYTE_VALUE)) .addExpr("SIGN(c_double)", Math.signum(DOUBLE_VALUE)) .addExpr("SIGN(c_float)", Math.signum(FLOAT_VALUE)) - .addExpr("SIGN(c_decimal)", BigDecimal.valueOf(ONE.signum())); + .addExpr("SIGN(c_decimal)", BigDecimal.valueOf(DECIMAL_VALUE.signum())); checker.buildRunAndCheck(); } @@ -999,14 +1006,14 @@ public void testPower() { .addExpr("POWER(c_tinyint, 2)", Math.pow(BYTE_VALUE, 2)) .addExpr("POWER(c_double, 2)", Math.pow(DOUBLE_VALUE, 2)) .addExpr("POWER(c_float, 2)", Math.pow(FLOAT_VALUE, 2)) - .addExpr("POWER(c_decimal, 2)", Math.pow(ONE.doubleValue(), 2)); + .addExpr("POWER(c_decimal, 2)", Math.pow(DECIMAL_VALUE.doubleValue(), 2)); checker.buildRunAndCheck(); } @Test @SqlOperatorTest(name = "PI", kind = "OTHER_FUNCTION") - public void testPi() throws Exception { + public void testPi() { ExpressionChecker checker = new ExpressionChecker().addExpr("PI", Math.PI); checker.buildRunAndCheck(); @@ -1023,7 +1030,7 @@ public void testAtan2() { .addExpr("ATAN2(c_tinyint, 2)", Math.atan2(BYTE_VALUE, 2)) .addExpr("ATAN2(c_double, 2)", Math.atan2(DOUBLE_VALUE, 2)) .addExpr("ATAN2(c_float, 2)", Math.atan2(FLOAT_VALUE, 2)) - .addExpr("ATAN2(c_decimal, 2)", Math.atan2(ONE.doubleValue(), 2)); + .addExpr("ATAN2(c_decimal, 2)", Math.atan2(DECIMAL_VALUE.doubleValue(), 2)); checker.buildRunAndCheck(); } @@ -1039,7 +1046,7 @@ public void testTruncate() { .addExpr("TRUNCATE(c_tinyint, 2)", (byte) SqlFunctions.struncate(BYTE_VALUE, 2)) .addExpr("TRUNCATE(c_double, 2)", SqlFunctions.struncate(DOUBLE_VALUE, 2)) .addExpr("TRUNCATE(c_float, 2)", (float) SqlFunctions.struncate(FLOAT_VALUE, 2)) - .addExpr("TRUNCATE(c_decimal, 2)", SqlFunctions.struncate(ONE, 2)); + .addExpr("TRUNCATE(c_decimal, 2)", SqlFunctions.struncate(DECIMAL_VALUE, 2)); checker.buildRunAndCheck(); } @@ -1068,9 +1075,11 @@ public void 
testRandInteger() { } @Test - @SqlOperatorTest(name = "ARRAY", kind = "ARRAY_VALUE_CONSTRUCTOR") - @SqlOperatorTest(name = "CARDINALITY", kind = "OTHER_FUNCTION") - @SqlOperatorTest(name = "ELEMENT", kind = "OTHER_FUNCTION") + @SqlOperatorTests({ + @SqlOperatorTest(name = "ARRAY", kind = "ARRAY_VALUE_CONSTRUCTOR"), + @SqlOperatorTest(name = "CARDINALITY", kind = "OTHER_FUNCTION"), + @SqlOperatorTest(name = "ELEMENT", kind = "OTHER_FUNCTION"), + }) public void testArrayFunctions() { ExpressionChecker checker = new ExpressionChecker() @@ -1089,17 +1098,19 @@ public void testArrayFunctions() { } @Test - @SqlOperatorTest(name = "DAYOFMONTH", kind = "OTHER") - @SqlOperatorTest(name = "DAYOFWEEK", kind = "OTHER") - @SqlOperatorTest(name = "DAYOFYEAR", kind = "OTHER") - @SqlOperatorTest(name = "EXTRACT", kind = "EXTRACT") - @SqlOperatorTest(name = "YEAR", kind = "OTHER") - @SqlOperatorTest(name = "QUARTER", kind = "OTHER") - @SqlOperatorTest(name = "MONTH", kind = "OTHER") - @SqlOperatorTest(name = "WEEK", kind = "OTHER") - @SqlOperatorTest(name = "HOUR", kind = "OTHER") - @SqlOperatorTest(name = "MINUTE", kind = "OTHER") - @SqlOperatorTest(name = "SECOND", kind = "OTHER") + @SqlOperatorTests({ + @SqlOperatorTest(name = "DAYOFMONTH", kind = "OTHER"), + @SqlOperatorTest(name = "DAYOFWEEK", kind = "OTHER"), + @SqlOperatorTest(name = "DAYOFYEAR", kind = "OTHER"), + @SqlOperatorTest(name = "EXTRACT", kind = "EXTRACT"), + @SqlOperatorTest(name = "YEAR", kind = "OTHER"), + @SqlOperatorTest(name = "QUARTER", kind = "OTHER"), + @SqlOperatorTest(name = "MONTH", kind = "OTHER"), + @SqlOperatorTest(name = "WEEK", kind = "OTHER"), + @SqlOperatorTest(name = "HOUR", kind = "OTHER"), + @SqlOperatorTest(name = "MINUTE", kind = "OTHER"), + @SqlOperatorTest(name = "SECOND", kind = "OTHER"), + }) public void testBasicDateTimeFunctions() { ExpressionChecker checker = new ExpressionChecker() @@ -1368,9 +1379,11 @@ public void testTimestampMinusInterval() { } @Test - @SqlOperatorTest(name = "CASE", kind = "CASE") - @SqlOperatorTest(name = "NULLIF", kind = "NULLIF") - @SqlOperatorTest(name = "COALESCE", kind = "COALESCE") + @SqlOperatorTests({ + @SqlOperatorTest(name = "CASE", kind = "CASE"), + @SqlOperatorTest(name = "NULLIF", kind = "NULLIF"), + @SqlOperatorTest(name = "COALESCE", kind = "COALESCE"), + }) public void testConditionalOperatorsAndFunctions() { ExpressionChecker checker = new ExpressionChecker() diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslUdfUdafTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslUdfUdafTest.java index 7bc1aafaa6aa..9d0b6345748a 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslUdfUdafTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslUdfUdafTest.java @@ -22,8 +22,16 @@ import static org.junit.internal.matchers.ThrowableMessageMatcher.hasMessage; import com.google.auto.service.AutoService; +import java.sql.Date; +import java.sql.Time; import java.sql.Timestamp; +import java.time.LocalDate; +import java.time.LocalTime; +import java.util.ArrayList; import java.util.Arrays; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; import java.util.Map; import java.util.stream.IntStream; import org.apache.beam.sdk.extensions.sql.impl.BeamCalciteTable; @@ -32,6 +40,7 @@ import org.apache.beam.sdk.extensions.sql.meta.provider.test.TestBoundedTable; import 
org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.schemas.logicaltypes.SqlTypes; import org.apache.beam.sdk.testing.PAssert; import org.apache.beam.sdk.transforms.Combine.CombineFn; import org.apache.beam.sdk.transforms.SerializableFunction; @@ -39,16 +48,13 @@ import org.apache.beam.sdk.values.PCollectionTuple; import org.apache.beam.sdk.values.Row; import org.apache.beam.sdk.values.TupleTag; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableMap; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.function.Parameter; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.TranslatableTable; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.function.Parameter; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.TranslatableTable; import org.joda.time.Instant; import org.junit.Test; /** Tests for UDF/UDAF. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamSqlDslUdfUdafTest extends BeamSqlDslBase { /** GROUP-BY with UDAF. */ @@ -67,9 +73,8 @@ public void testUdaf() throws Exception { pipeline.run().waitUntilFinish(); } - /** Test Joda time UDF. */ @Test - public void testJodaTimeUdf() throws Exception { + public void testTimestampUdaf() throws Exception { Schema resultType = Schema.builder().addDateTimeField("jodatime").build(); Row row = @@ -86,17 +91,48 @@ public void testJodaTimeUdf() throws Exception { pipeline.run().waitUntilFinish(); } - /** Test Joda time UDAF. */ @Test - public void testJodaTimeUdaf() throws Exception { - Schema resultType = Schema.builder().addDateTimeField("jodatime").build(); + public void testDateUdf() throws Exception { + Schema resultType = + Schema.builder().addField("result_date", FieldType.logicalType(SqlTypes.DATE)).build(); + + Row row = Row.withSchema(resultType).addValues(LocalDate.of(2016, 12, 31)).build(); + + String sql = "SELECT PRE_DATE(f_date) as result_date FROM PCOLLECTION WHERE f_int=1"; + PCollection result = + boundedInput1.apply( + "testTimeUdf", SqlTransform.query(sql).registerUdf("PRE_DATE", PreviousDate.class)); + PAssert.that(result).containsInAnyOrder(row); + + pipeline.run().waitUntilFinish(); + } + + @Test + public void testTimeUdf() throws Exception { + Schema resultType = + Schema.builder().addField("result_time", FieldType.logicalType(SqlTypes.TIME)).build(); + + Row row = Row.withSchema(resultType).addValues(LocalTime.of(0, 1, 3)).build(); + + String sql = "SELECT PRE_HOUR(f_time) as result_time FROM PCOLLECTION WHERE f_int=1"; + PCollection result = + boundedInput1.apply( + "testTimeUdf", SqlTransform.query(sql).registerUdf("PRE_HOUR", PreviousHour.class)); + PAssert.that(result).containsInAnyOrder(row); + + pipeline.run().waitUntilFinish(); + } + + @Test + public void testTimestampUdf() throws Exception { + Schema resultType = Schema.builder().addDateTimeField("result_time").build(); Row row = Row.withSchema(resultType) .addValues(parseTimestampWithoutTimeZone("2016-12-31 01:01:03")) .build(); - String sql = "SELECT PRE_DAY(f_timestamp) as jodatime FROM PCOLLECTION WHERE f_int=1"; + String sql = "SELECT PRE_DAY(f_timestamp) as result_time FROM PCOLLECTION WHERE f_int=1"; PCollection result = boundedInput1.apply( "testTimeUdf", SqlTransform.query(sql).registerUdf("PRE_DAY", PreviousDay.class)); @@ 
-105,6 +141,56 @@ public void testJodaTimeUdaf() throws Exception { pipeline.run().waitUntilFinish(); } + + /** GROUP-BY with UDAF that returns Map. */ + @Test + public void testUdafWithMapOutput() throws Exception { + Schema resultType = + Schema.builder() + .addInt32Field("f_int2") + .addMapField("squareAndAccumulateInMap", FieldType.STRING, FieldType.INT32) + .build(); + + Map<String, Integer> resultMap = new HashMap<>(); + resultMap.put("squareOf-1", 1); + resultMap.put("squareOf-2", 4); + resultMap.put("squareOf-3", 9); + resultMap.put("squareOf-4", 16); + Row row = Row.withSchema(resultType).addValues(0, resultMap).build(); + + String sql = + "SELECT f_int2,squareAndAccumulateInMap(f_int) AS `squareAndAccumulateInMap` FROM PCOLLECTION GROUP BY f_int2"; + PCollection<Row> result = + boundedInput1.apply( + "testUdafWithMapOutput", + SqlTransform.query(sql) + .registerUdaf("squareAndAccumulateInMap", new SquareAndAccumulateInMap())); + PAssert.that(result).containsInAnyOrder(row); + + pipeline.run().waitUntilFinish(); + } + + /** GROUP-BY with UDAF that returns List. */ + @Test + public void testUdafWithListOutput() throws Exception { + Schema resultType = + Schema.builder() + .addInt32Field("f_int2") + .addArrayField("squareAndAccumulateInList", FieldType.INT32) + .build(); + Row row = Row.withSchema(resultType).addValue(0).addArray(Arrays.asList(1, 4, 9, 16)).build(); + + String sql = + "SELECT f_int2,squareAndAccumulateInList(f_int) AS `squareAndAccumulateInList` FROM PCOLLECTION GROUP BY f_int2"; + PCollection<Row> result = + boundedInput1.apply( + "testUdafWithListOutput", + SqlTransform.query(sql) + .registerUdaf("squareAndAccumulateInList", new SquareAndAccumulateInList())); + PAssert.that(result).containsInAnyOrder(row); + + pipeline.run().waitUntilFinish(); + } + @Test public void testUdfWithListOutput() throws Exception { Schema resultType = Schema.builder().addArrayField("array_field", FieldType.INT64).build(); @@ -216,7 +302,7 @@ public void testBeamSqlUdfWithDefaultParameters() throws Exception { } /** - * test {@link org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.TableMacro} UDF. + * test {@link org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.TableMacro} UDF. */ @Test public void testTableMacroUdf() throws Exception { @@ -238,7 +324,7 @@ public void testTableMacroUdf() throws Exception { /** test auto-provider UDF/UDAF. */ @Test - public void testAutoUdfUdaf() throws Exception { + public void testAutoLoadedUdfUdaf() throws Exception { Schema resultType = Schema.builder().addInt32Field("f_int2").addInt32Field("autoload_squarecubicsum").build(); @@ -247,8 +333,7 @@ public void testAutoUdfUdaf() throws Exception { String sql = "SELECT f_int2, autoload_squaresum(autoload_cubic(f_int)) AS `autoload_squarecubicsum`" + " FROM PCOLLECTION GROUP BY f_int2"; - PCollection<Row> result = - boundedInput1.apply("testUdaf", SqlTransform.query(sql).withAutoUdfUdafLoad(true)); + PCollection<Row> result = boundedInput1.apply("testUdaf", SqlTransform.query(sql)); PAssert.that(result).containsInAnyOrder(row); pipeline.run().waitUntilFinish(); @@ -381,7 +466,21 @@ public static String eval( } } + /** A UDF to test support of date. */ + public static final class PreviousDate implements BeamSqlUdf { + public static Date eval(Date date) { + return new Date(date.getTime() - 24 * 3600 * 1000L); + } + } + /** A UDF to test support of time.
*/ + public static final class PreviousHour implements BeamSqlUdf { + public static Time eval(Time time) { + return new Time(time.getTime() - 3600 * 1000L); + } + } + + /** A UDF to test support of timestamp. */ public static final class PreviousDay implements BeamSqlUdf { public static Timestamp eval(Timestamp time) { return new Timestamp(time.getTime() - 24 * 3600 * 1000L); @@ -404,7 +503,7 @@ public static Integer eval(java.util.List i) { /** * UDF to test support for {@link - * org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.TableMacro}. + * org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.TableMacro}. */ public static final class RangeUdf implements BeamSqlUdf { public static TranslatableTable eval(int startInclusive, int endExclusive) { @@ -413,4 +512,64 @@ public static TranslatableTable eval(int startInclusive, int endExclusive) { return BeamCalciteTable.of(new TestBoundedTable(schema).addRows(values)); } } + + /** UDAF(CombineFn) for test, which squares each input, tags it and returns them all in a Map. */ + public static class SquareAndAccumulateInMap + extends CombineFn<Integer, Map<String, Integer>, Map<String, Integer>> { + @Override + public Map<String, Integer> createAccumulator() { + return new HashMap<>(); + } + + @Override + public Map<String, Integer> addInput(Map<String, Integer> accumulator, Integer input) { + accumulator.put("squareOf-" + input, input * input); + return accumulator; + } + + @Override + public Map<String, Integer> mergeAccumulators(Iterable<Map<String, Integer>> accumulators) { + Map<String, Integer> merged = createAccumulator(); + for (Map<String, Integer> accumulator : accumulators) { + merged.putAll(accumulator); + } + return merged; + } + + @Override + public Map<String, Integer> extractOutput(Map<String, Integer> accumulator) { + return accumulator; + } + } + + /** UDAF(CombineFn) for test, which squares each input and returns them all in a List. */ + public static class SquareAndAccumulateInList + extends CombineFn<Integer, List<Integer>, List<Integer>> { + + @Override + public List<Integer> createAccumulator() { + return new ArrayList<>(); + } + + @Override + public List<Integer> addInput(List<Integer> accumulator, Integer input) { + accumulator.add(input * input); + return accumulator; + } + + @Override + public List<Integer> mergeAccumulators(Iterable<List<Integer>> accumulators) { + List<Integer> merged = createAccumulator(); + for (List<Integer> accumulator : accumulators) { + merged.addAll(accumulator); + } + return merged; + } + + @Override + public List<Integer> extractOutput(List<Integer> accumulator) { + Collections.sort(accumulator); + return accumulator; + } + } } diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlExplainTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlExplainTest.java index 602af4dd18f7..4884e02ee6de 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlExplainTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlExplainTest.java @@ -21,16 +21,13 @@ import org.apache.beam.sdk.extensions.sql.meta.provider.text.TextTableProvider; import org.apache.beam.sdk.extensions.sql.meta.store.InMemoryMetaStore; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.parser.SqlParseException; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelConversionException; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.ValidationException; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.parser.SqlParseException; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RelConversionException; +import 
org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.ValidationException; import org.junit.Before; import org.junit.Ignore; /** UnitTest for Explain Plan. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamSqlExplainTest { private InMemoryMetaStore metaStore; private BeamSqlCli cli; diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlMapTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlMapTest.java index e1755302fa08..ba7c9e48efd9 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlMapTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlMapTest.java @@ -23,7 +23,7 @@ import org.apache.beam.sdk.transforms.Create; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableMap; import org.junit.Rule; import org.junit.Test; import org.junit.rules.ExpectedException; diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlMultipleSchemasTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlMultipleSchemasTest.java index 41f916bef236..b891bd3d383c 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlMultipleSchemasTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlMultipleSchemasTest.java @@ -29,7 +29,7 @@ import org.apache.beam.sdk.values.PBegin; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableMap; import org.junit.Rule; import org.junit.Test; import org.junit.rules.ExpectedException; diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/InferredJavaBeanSqlTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/InferredJavaBeanSqlTest.java index 77b71a625640..45b9601484db 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/InferredJavaBeanSqlTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/InferredJavaBeanSqlTest.java @@ -35,9 +35,6 @@ import org.junit.Test; /** Tests for automatic inferring schema from the input {@link PCollection} of JavaBeans. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class InferredJavaBeanSqlTest { @Rule public final TestPipeline pipeline = TestPipeline.create(); diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/PubsubToBigqueryIT.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/PubsubToBigqueryIT.java index 68f301114a5a..16d6c11a33b6 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/PubsubToBigqueryIT.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/PubsubToBigqueryIT.java @@ -25,18 +25,14 @@ import java.io.Serializable; import java.util.List; import org.apache.beam.sdk.extensions.gcp.options.GcpOptions; -import org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv; -import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils; -import org.apache.beam.sdk.extensions.sql.meta.provider.bigquery.BigQueryTableProvider; -import org.apache.beam.sdk.extensions.sql.meta.provider.pubsub.PubsubTableProvider; import org.apache.beam.sdk.io.gcp.bigquery.TestBigQuery; import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage; import org.apache.beam.sdk.io.gcp.pubsub.TestPubsub; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableMap; import org.joda.time.Duration; import org.joda.time.Instant; import org.junit.Rule; @@ -56,14 +52,12 @@ public class PubsubToBigqueryIT implements Serializable { @Test public void testSimpleInsert() throws Exception { - BeamSqlEnv sqlEnv = BeamSqlEnv.inMemory(new PubsubTableProvider(), new BigQueryTableProvider()); - - String createTableString = + String pubsubTableString = "CREATE EXTERNAL TABLE pubsub_topic (\n" + "event_timestamp TIMESTAMP, \n" + "attributes MAP, \n" + "payload ROW< \n" - + " id INTEGER, \n" + + " id BIGINT, \n" + " name VARCHAR \n" + " > \n" + ") \n" @@ -72,9 +66,7 @@ public void testSimpleInsert() throws Exception { + pubsub.topicPath() + "' \n" + "TBLPROPERTIES '{ \"timestampAttributeKey\" : \"ts\" }'"; - sqlEnv.executeDdl(createTableString); - - String createTableStatement = + String bqTableString = "CREATE EXTERNAL TABLE bq_table( \n" + " id BIGINT, \n" + " name VARCHAR \n " @@ -83,17 +75,16 @@ public void testSimpleInsert() throws Exception { + "LOCATION '" + bigQuery.tableSpec() + "'"; - sqlEnv.executeDdl(createTableStatement); - String insertStatement = "INSERT INTO bq_table \n" + "SELECT \n" + " pubsub_topic.payload.id, \n" + " pubsub_topic.payload.name \n" + "FROM pubsub_topic"; - - BeamSqlRelUtils.toPCollection(pipeline, sqlEnv.parseQuery(insertStatement)); - + pipeline.apply( + SqlTransform.query(insertStatement) + .withDdlString(pubsubTableString) + .withDdlString(bqTableString)); pipeline.run(); // Block until a subscription for this topic exists @@ -117,12 +108,10 @@ public void testSimpleInsert() throws Exception { @Test public void testSimpleInsertFlat() throws Exception { - BeamSqlEnv sqlEnv = BeamSqlEnv.inMemory(new PubsubTableProvider(), new BigQueryTableProvider()); - - String createTableString = + String pubsubTableString = "CREATE 
EXTERNAL TABLE pubsub_topic (\n" + "event_timestamp TIMESTAMP, \n" - + "id INTEGER, \n" + + "id BIGINT, \n" + "name VARCHAR \n" + ") \n" + "TYPE 'pubsub' \n" @@ -130,9 +119,7 @@ public void testSimpleInsertFlat() throws Exception { + pubsub.topicPath() + "' \n" + "TBLPROPERTIES '{ \"timestampAttributeKey\" : \"ts\" }'"; - sqlEnv.executeDdl(createTableString); - - String createTableStatement = + String bqTableString = "CREATE EXTERNAL TABLE bq_table( \n" + " id BIGINT, \n" + " name VARCHAR \n " @@ -141,13 +128,13 @@ public void testSimpleInsertFlat() throws Exception { + "LOCATION '" + bigQuery.tableSpec() + "'"; - sqlEnv.executeDdl(createTableStatement); - String insertStatement = "INSERT INTO bq_table \n" + "SELECT \n" + " id, \n" + " name \n" + "FROM pubsub_topic"; - BeamSqlRelUtils.toPCollection(pipeline, sqlEnv.parseQuery(insertStatement)); - + pipeline.apply( + SqlTransform.query(insertStatement) + .withDdlString(pubsubTableString) + .withDdlString(bqTableString)); pipeline.run(); // Block until a subscription for this topic exists diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/TestUtils.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/TestUtils.java index 33ae73959adc..6ab09c198ed2 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/TestUtils.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/TestUtils.java @@ -17,7 +17,7 @@ */ package org.apache.beam.sdk.extensions.sql; -import static org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Preconditions.checkArgument; import java.util.ArrayList; import java.util.Arrays; @@ -35,15 +35,12 @@ import org.joda.time.Instant; /** Test utilities. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TestUtils { /** A {@code DoFn} to convert a {@code BeamSqlRow} to a comparable {@code String}. */ public static class BeamSqlRow2StringDoFn extends DoFn { @ProcessElement - public void processElement(ProcessContext ctx) { - ctx.output(ctx.element().toString()); + public void processElement(@Element Row row, OutputReceiver o) { + o.output(row.toString(false)); } } @@ -51,7 +48,7 @@ public void processElement(ProcessContext ctx) { public static List beamSqlRows2Strings(List rows) { List strs = new ArrayList<>(); for (Row row : rows) { - strs.add(row.toString()); + strs.add(row.toString(false)); } return strs; diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/TypedCombineFnDelegateTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/TypedCombineFnDelegateTest.java new file mode 100644 index 000000000000..9652e810b2ab --- /dev/null +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/TypedCombineFnDelegateTest.java @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql; + +import static org.junit.Assert.assertEquals; + +import java.io.Serializable; +import java.util.Comparator; +import java.util.List; +import org.apache.beam.sdk.extensions.sql.impl.UdafImpl; +import org.apache.beam.sdk.transforms.Combine; +import org.apache.beam.sdk.transforms.Max; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.JavaTypeFactoryImpl; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeSystem; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.FunctionParameter; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlTypeName; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.ExpectedException; + +public class TypedCombineFnDelegateTest { + + @Rule public ExpectedException exceptions = ExpectedException.none(); + + @Test + public void testParameterExtractionFromCombineFn_CombineFnDelegate() { + Combine.BinaryCombineFn<String> max = + Max.of( + (Comparator<String> & Serializable) (a, b) -> Integer.compare(a.length(), b.length())); + UdafImpl<String, Combine.Holder<String>, String> udaf = + new UdafImpl<>(new TypedCombineFnDelegate<String, Combine.Holder<String>, String>(max) {}); + RelDataTypeFactory typeFactory = new JavaTypeFactoryImpl(RelDataTypeSystem.DEFAULT); + List<FunctionParameter> parameters = udaf.getParameters(); + assertEquals(1, parameters.size()); + assertEquals(SqlTypeName.VARCHAR, parameters.get(0).getType(typeFactory).getSqlTypeName()); + } + + @Test + public void testParameterExtractionFromCombineFn_CombineFnDelegate_WithGenericArray() { + Combine.BinaryCombineFn<List<String>[]> max = + Max.of( + (Comparator<List<String>[]> & Serializable) + (a, b) -> Integer.compare(a[0].get(0).length(), b[0].get(0).length())); + UdafImpl<List<String>[], Combine.Holder<List<String>[]>, List<String>[]> udaf = + new UdafImpl<>( + new TypedCombineFnDelegate< + List<String>[], Combine.Holder<List<String>[]>, List<String>[]>(max) {}); + exceptions.expect(IllegalArgumentException.class); + RelDataTypeFactory typeFactory = new JavaTypeFactoryImpl(RelDataTypeSystem.DEFAULT); + udaf.getParameters().get(0).getType(typeFactory); + } + + @Test + public void testParameterExtractionFromCombineFn_CombineFnDelegate_WithListInsteadOfArray() { + Combine.BinaryCombineFn<List<List<String>>> max = + Max.of( + (Comparator<List<List<String>>> & Serializable) + (a, b) -> Integer.compare(a.get(0).get(0).length(), b.get(0).get(0).length())); + UdafImpl<List<List<String>>, Combine.Holder<List<List<String>>>, List<List<String>>> udaf = + new UdafImpl<>( + new TypedCombineFnDelegate< + List<List<String>>, Combine.Holder<List<List<String>>>, List<List<String>>>( + max) {}); + RelDataTypeFactory typeFactory = new JavaTypeFactoryImpl(RelDataTypeSystem.DEFAULT); + List<FunctionParameter> parameters = udaf.getParameters(); + assertEquals(1, parameters.size()); + assertEquals(SqlTypeName.ARRAY, parameters.get(0).getType(typeFactory).getSqlTypeName()); + } +} diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/JavaUdfLoaderTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/JavaUdfLoaderTest.java new file mode 100644 index 000000000000..92038d0a8d5c --- /dev/null +++ 
b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/JavaUdfLoaderTest.java @@ -0,0 +1,117 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.impl; + +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.fail; + +import java.io.IOException; +import java.nio.file.ProviderNotFoundException; +import java.util.Collections; +import java.util.Iterator; +import org.apache.beam.sdk.extensions.sql.impl.parser.SqlCreateFunctionTest; +import org.apache.beam.sdk.extensions.sql.udf.UdfProvider; +import org.apache.beam.sdk.util.common.ReflectHelpers; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.ExpectedException; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Unit tests for {@link JavaUdfLoader}. */ +@RunWith(JUnit4.class) +public class JavaUdfLoaderTest { + @Rule public ExpectedException thrown = ExpectedException.none(); + + private final String jarPathProperty = "beam.sql.udf.test.jar_path"; + private final String emptyJarPathProperty = "beam.sql.udf.test.empty_jar_path"; + + private final String jarPath = System.getProperty(jarPathProperty, ""); + private final String emptyJarPath = System.getProperty(emptyJarPathProperty, ""); + + @Before + public void setUp() { + if (jarPath == null) { + fail( + String.format( + "System property %s must be set to run %s.", + jarPathProperty, SqlCreateFunctionTest.class.getSimpleName())); + } + if (emptyJarPath == null) { + fail( + String.format( + "System property %s must be set to run %s.", + emptyJarPathProperty, SqlCreateFunctionTest.class.getSimpleName())); + } + } + + /** + * Test that the parent classloader does not load any implementations of {@link UdfProvider}. This + * is important because we do not want to pollute the user's namespace. 
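The functions loaded in these tests ("helloWorld", "my_sum") live in a separate test jar that is built elsewhere and handed in through the jar_path system property; that jar advertises them through a UdfProvider implementation, which is not part of this patch. For reference, a scalar function of that shape is simply a ScalarFn with a public method annotated @ApplyMethod; a minimal sketch, with the class name and return value assumed rather than taken from the test jar:

    // Sketch of the kind of ScalarFn the test jar could register under the name "helloWorld".
    // Only ScalarFn and @ApplyMethod come from Beam; the class name and body are assumptions.
    import org.apache.beam.sdk.extensions.sql.udf.ScalarFn;

    public class HelloWorldFn extends ScalarFn {
      @ApplyMethod
      public String helloWorld() {
        return "Hello world!";
      }
    }
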
+ */ + @Test + public void testClassLoaderHasNoUdfProviders() throws IOException { + JavaUdfLoader udfLoader = new JavaUdfLoader(); + Iterator udfProviders = + udfLoader.getUdfProviders(ReflectHelpers.findClassLoader()); + assertFalse(udfProviders.hasNext()); + } + + @Test + public void testLoadScalarFunction() { + JavaUdfLoader udfLoader = new JavaUdfLoader(); + udfLoader.loadScalarFunction(Collections.singletonList("helloWorld"), jarPath); + } + + @Test + public void testLoadAggregateFunction() { + JavaUdfLoader udfLoader = new JavaUdfLoader(); + udfLoader.loadAggregateFunction(Collections.singletonList("my_sum"), jarPath); + } + + @Test + public void testLoadUnregisteredScalarFunctionThrowsRuntimeException() { + JavaUdfLoader udfLoader = new JavaUdfLoader(); + thrown.expect(RuntimeException.class); + thrown.expectMessage( + String.format("No implementation of scalar function notRegistered found in %s.", jarPath)); + udfLoader.loadScalarFunction(Collections.singletonList("notRegistered"), jarPath); + } + + @Test + public void testLoadUnregisteredAggregateFunctionThrowsRuntimeException() { + JavaUdfLoader udfLoader = new JavaUdfLoader(); + thrown.expect(RuntimeException.class); + thrown.expectMessage( + String.format( + "No implementation of aggregate function notRegistered found in %s.", jarPath)); + udfLoader.loadAggregateFunction(Collections.singletonList("notRegistered"), jarPath); + } + + @Test + public void testJarMissingUdfProviderThrowsProviderNotFoundException() { + JavaUdfLoader udfLoader = new JavaUdfLoader(); + thrown.expect(ProviderNotFoundException.class); + thrown.expectMessage(String.format("No UdfProvider implementation found in %s.", emptyJarPath)); + // Load from an inhabited jar first so we can make sure we load UdfProviders in isolation + // from other jars. 
+ udfLoader.loadScalarFunction(Collections.singletonList("helloWorld"), jarPath); + udfLoader.loadScalarFunction(Collections.singletonList("helloWorld"), emptyJarPath); + } +} diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/JdbcDriverTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/JdbcDriverTest.java index d50a4aa8a629..c5308cc1e537 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/JdbcDriverTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/JdbcDriverTest.java @@ -18,13 +18,13 @@ package org.apache.beam.sdk.extensions.sql.impl; import static org.apache.beam.sdk.values.Row.toRow; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.containsString; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.nullValue; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.sql.Connection; @@ -50,10 +50,10 @@ import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.util.ReleaseInfo; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableMap; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.CalciteConnection; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.CalciteSchema; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.SchemaPlus; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalciteConnection; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalciteSchema; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.SchemaPlus; import org.joda.time.DateTime; import org.joda.time.Duration; import org.joda.time.ReadableInstant; @@ -65,9 +65,6 @@ import org.junit.rules.ExpectedException; /** Test for {@link JdbcDriver}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class JdbcDriverTest { public static final DateTime FIRST_DATE = new DateTime(1); diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/LazyAggregateCombineFnTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/LazyAggregateCombineFnTest.java new file mode 100644 index 000000000000..d3cdea3c7e36 --- /dev/null +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/LazyAggregateCombineFnTest.java @@ -0,0 +1,156 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
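On the aggregate side, a function registered as "my_sum" has to implement the AggregateFn lifecycle (createAccumulator, addInput, mergeAccumulators, extractOutput), with input, accumulator and output as its three type parameters. A minimal long-summing sketch, mirroring the Sum fixture defined in LazyAggregateCombineFnTest below (the class name is illustrative):

    // Sketch of an AggregateFn the test jar could register as "my_sum".
    import org.apache.beam.sdk.extensions.sql.udf.AggregateFn;

    public class MySum implements AggregateFn<Long, Long, Long> {
      @Override
      public Long createAccumulator() {
        return 0L;
      }

      @Override
      public Long addInput(Long accumulator, Long input) {
        return accumulator + input;
      }

      @Override
      public Long mergeAccumulators(Long accumulator, Iterable<Long> others) {
        for (Long other : others) {
          accumulator += other;
        }
        return accumulator;
      }

      @Override
      public Long extractOutput(Long accumulator) {
        return accumulator;
      }
    }
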
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.impl; + +import static org.hamcrest.MatcherAssert.assertThat; +import static org.hamcrest.Matchers.hasSize; +import static org.hamcrest.Matchers.instanceOf; +import static org.junit.Assert.assertEquals; + +import java.util.List; +import org.apache.beam.sdk.coders.CannotProvideCoderException; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.CoderRegistry; +import org.apache.beam.sdk.coders.VarLongCoder; +import org.apache.beam.sdk.extensions.sql.udf.AggregateFn; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.JavaTypeFactoryImpl; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeSystem; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.AggregateFunction; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.FunctionParameter; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.checkerframework.checker.nullness.qual.Nullable; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.ExpectedException; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; +import org.junit.runners.Parameterized; + +/** Tests for {@link LazyAggregateCombineFn}. 
*/ +@RunWith(JUnit4.class) +public class LazyAggregateCombineFnTest { + @Rule public ExpectedException exceptions = ExpectedException.none(); + + @Test + public void getAccumulatorCoderInfersCoderForWildcardTypeParameter() + throws CannotProvideCoderException { + LazyAggregateCombineFn combiner = new LazyAggregateCombineFn<>(new Sum()); + Coder coder = combiner.getAccumulatorCoder(CoderRegistry.createDefault(), VarLongCoder.of()); + assertThat(coder, instanceOf(VarLongCoder.class)); + } + + @Test + public void mergeAccumulators() { + LazyAggregateCombineFn combiner = new LazyAggregateCombineFn<>(new Sum()); + long merged = combiner.mergeAccumulators(ImmutableList.of(1L, 1L)); + assertEquals(2L, merged); + } + + @RunWith(Parameterized.class) + public static class UdafImplTest { + @Parameterized.Parameters(name = "aggregateFn: {0}") + public static Object[] data() { + return new Object[] {new Sum(), new SumChild()}; + } + + @Parameterized.Parameter public AggregateFn aggregateFn; + + @Test + public void subclassGetUdafImpl() { + LazyAggregateCombineFn combiner = new LazyAggregateCombineFn<>(aggregateFn); + AggregateFunction aggregateFunction = combiner.getUdafImpl(); + RelDataTypeFactory typeFactory = new JavaTypeFactoryImpl(RelDataTypeSystem.DEFAULT); + RelDataType expectedType = typeFactory.createJavaType(Long.class); + + List params = aggregateFunction.getParameters(); + assertThat(params, hasSize(1)); + RelDataType paramType = params.get(0).getType(typeFactory); + assertEquals(expectedType, paramType); + + RelDataType returnType = aggregateFunction.getReturnType(typeFactory); + assertEquals(expectedType, returnType); + } + } + + @Test + public void nonparameterizedGetUdafImpl_throwsIllegalStateException() { + LazyAggregateCombineFn combiner = + new LazyAggregateCombineFn<>(new NonParameterizedAggregateFn()); + AggregateFunction aggregateFunction = combiner.getUdafImpl(); + RelDataTypeFactory typeFactory = new JavaTypeFactoryImpl(RelDataTypeSystem.DEFAULT); + + exceptions.expect(IllegalStateException.class); + + List params = aggregateFunction.getParameters(); + params.get(0).getType(typeFactory); + } + + public static class Sum implements AggregateFn { + + @Override + public Long createAccumulator() { + return 0L; + } + + @Override + public Long addInput(Long mutableAccumulator, Long input) { + return mutableAccumulator + input; + } + + @Override + public Long mergeAccumulators(Long mutableAccumulator, Iterable immutableAccumulators) { + for (Long x : immutableAccumulators) { + mutableAccumulator += x; + } + return mutableAccumulator; + } + + @Override + public Long extractOutput(Long mutableAccumulator) { + return mutableAccumulator; + } + } + + public static class SumChild extends Sum {} + + public static class NonParameterizedAggregateFn implements AggregateFn { + + @Override + public @Nullable Object createAccumulator() { + return null; + } + + @Override + public @Nullable Object addInput(Object mutableAccumulator, Object input) { + return null; + } + + @Override + public @Nullable Object mergeAccumulators( + Object mutableAccumulator, Iterable immutableAccumulators) { + return null; + } + + @Override + public @Nullable Object extractOutput(Object mutableAccumulator) { + return null; + } + } +} diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/ScalarFnReflectorTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/ScalarFnReflectorTest.java new file mode 100644 index 000000000000..a5f38918389e --- /dev/null 
+++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/ScalarFnReflectorTest.java @@ -0,0 +1,141 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.impl; + +import static org.hamcrest.Matchers.instanceOf; +import static org.junit.Assert.assertEquals; + +import java.lang.reflect.InvocationTargetException; +import java.lang.reflect.Method; +import org.apache.beam.sdk.extensions.sql.udf.ScalarFn; +import org.checkerframework.checker.nullness.qual.Nullable; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.ExpectedException; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Tests for {@link ScalarFnReflector}. */ +@RunWith(JUnit4.class) +public class ScalarFnReflectorTest { + @Rule public ExpectedException thrown = ExpectedException.none(); + + @Test + @SuppressWarnings("nullness") // If result is null, test will fail as expected. + public void testGetApplyMethod() throws InvocationTargetException, IllegalAccessException { + IncrementFn incrementFn = new IncrementFn(); + Method method = ScalarFnReflector.getApplyMethod(incrementFn); + @Nullable Object result = method.invoke(incrementFn, Long.valueOf(24L)); + assertEquals(Long.valueOf(25L), result); + } + + @Test + @SuppressWarnings("nullness") // If result is null, test will fail as expected. + public void testGetApplyMethodOverride() + throws InvocationTargetException, IllegalAccessException { + IncrementFnChild incrementFn = new IncrementFnChild(); + Method method = ScalarFnReflector.getApplyMethod(incrementFn); + @Nullable Object result = method.invoke(incrementFn, Long.valueOf(24L)); + assertEquals(Long.valueOf(26L), result); + } + + @Test + @SuppressWarnings("nullness") // If result is null, test will fail as expected. 
+ public void testGetApplyMethodStatic() throws InvocationTargetException, IllegalAccessException { + Method method = ScalarFnReflector.getApplyMethod(new IncrementFnWithStaticMethod()); + @Nullable Object result = method.invoke(null, Long.valueOf(24L)); + assertEquals(Long.valueOf(25L), result); + } + + @Test + public void testDifferentMethodNameThrowsIllegalArgumentException() { + thrown.expect(instanceOf(IllegalArgumentException.class)); + thrown.expectMessage("Found multiple methods annotated with @ApplyMethod."); + ScalarFnReflector.getApplyMethod(new IncrementFnDifferentMethodName()); + } + + @Test + public void testDifferentMethodSignatureThrowsIllegalArgumentException() { + thrown.expect(instanceOf(IllegalArgumentException.class)); + thrown.expectMessage("Found multiple methods annotated with @ApplyMethod."); + ScalarFnReflector.getApplyMethod(new IncrementFnDifferentSignature()); + } + + @Test + public void testMissingAnnotationThrowsIllegalArgumentException() { + thrown.expect(instanceOf(IllegalArgumentException.class)); + thrown.expectMessage("No method annotated with @ApplyMethod found in class"); + ScalarFnReflector.getApplyMethod(new IncrementFnMissingAnnotation()); + } + + @Test + public void testNonPublicMethodThrowsIllegalArgumentException() { + thrown.expect(instanceOf(IllegalArgumentException.class)); + thrown.expectMessage("not public"); + ScalarFnReflector.getApplyMethod(new IncrementFnWithProtectedMethod()); + } + + static class IncrementFn extends ScalarFn { + @ApplyMethod + public Long increment(Long i) { + return i + 1; + } + } + + static class IncrementFnChild extends IncrementFn { + @ApplyMethod + @Override + public Long increment(Long i) { + return i + 2; + } + } + + static class IncrementFnWithStaticMethod extends ScalarFn { + @ApplyMethod + public static Long increment(Long i) { + return i + 1; + } + } + + static class IncrementFnDifferentMethodName extends IncrementFn { + @ApplyMethod + public Long differentMethod(Long i) { + return i + 2; + } + } + + static class IncrementFnDifferentSignature extends IncrementFn { + @ApplyMethod + public Long increment(String s) { + return 0L; + } + } + + static class IncrementFnMissingAnnotation extends ScalarFn { + public Long increment(Long i) { + return i + 1; + } + } + + static class IncrementFnWithProtectedMethod extends ScalarFn { + @ApplyMethod + protected Long increment(Long i) { + return i + 1; + } + } +} diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/parser/BeamDDLNestedTypesTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/parser/BeamDDLNestedTypesTest.java index 32b6ff40d68a..1e78f2fe84a5 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/parser/BeamDDLNestedTypesTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/parser/BeamDDLNestedTypesTest.java @@ -32,7 +32,7 @@ import org.apache.beam.sdk.extensions.sql.utils.QuickCheckGenerators.PrimitiveTypes; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.schemas.Schema.FieldType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.parser.SqlParseException; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.parser.SqlParseException; import org.junit.runner.RunWith; /** @@ -43,9 +43,6 @@ *

    By default quick check runs this test 100 times. */ @RunWith(JUnitQuickcheck.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamDDLNestedTypesTest { @Property diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/parser/BeamDDLTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/parser/BeamDDLTest.java index e8cbcb4e90a4..316f66733f1e 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/parser/BeamDDLTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/parser/BeamDDLTest.java @@ -30,15 +30,19 @@ import org.apache.beam.sdk.extensions.sql.impl.parser.impl.BeamSqlParserImpl; import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils; import org.apache.beam.sdk.extensions.sql.meta.Table; +import org.apache.beam.sdk.extensions.sql.meta.provider.bigquery.BeamBigQuerySqlDialect; import org.apache.beam.sdk.extensions.sql.meta.provider.test.TestTableProvider; import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlIdentifier; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlWriter; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.parser.SqlParserPos; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.pretty.SqlPrettyWriter; import org.junit.Test; /** UnitTest for {@link BeamSqlParserImpl}. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamDDLTest { @Test @@ -216,6 +220,35 @@ public void testParseDropTable() throws Exception { assertNull(tableProvider.getTables().get("person")); } + @Test + public void unparseScalarFunction() { + SqlIdentifier name = new SqlIdentifier("foo", SqlParserPos.ZERO); + SqlNode jarPath = SqlLiteral.createCharString("path/to/udf.jar", SqlParserPos.ZERO); + SqlCreateFunction createFunction = + new SqlCreateFunction(SqlParserPos.ZERO, false, name, jarPath, false); + SqlWriter sqlWriter = new SqlPrettyWriter(BeamBigQuerySqlDialect.DEFAULT); + + createFunction.unparse(sqlWriter, 0, 0); + + assertEquals( + "CREATE FUNCTION foo USING JAR 'path/to/udf.jar'", sqlWriter.toSqlString().getSql()); + } + + @Test + public void unparseAggregateFunction() { + SqlIdentifier name = new SqlIdentifier("foo", SqlParserPos.ZERO); + SqlNode jarPath = SqlLiteral.createCharString("path/to/udf.jar", SqlParserPos.ZERO); + SqlCreateFunction createFunction = + new SqlCreateFunction(SqlParserPos.ZERO, false, name, jarPath, true); + SqlWriter sqlWriter = new SqlPrettyWriter(BeamBigQuerySqlDialect.DEFAULT); + + createFunction.unparse(sqlWriter, 0, 0); + + assertEquals( + "CREATE AGGREGATE FUNCTION foo USING JAR 'path/to/udf.jar'", + sqlWriter.toSqlString().getSql()); + } + private static Table mockTable(String name, String type, String comment, JSONObject properties) { return mockTable(name, type, comment, properties, "/home/admin/" + name); } diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlCreateFunctionTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlCreateFunctionTest.java new file mode 100644 index 000000000000..9970a78fe9f0 --- /dev/null +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlCreateFunctionTest.java @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.impl.parser; + +import static org.junit.Assert.fail; + +import org.apache.beam.sdk.extensions.sql.BeamSqlDslBase; +import org.apache.beam.sdk.extensions.sql.SqlTransform; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; +import org.checkerframework.checker.nullness.qual.Nullable; +import org.junit.Before; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Tests for {@link SqlCreateFunction}. 
*/ +@RunWith(JUnit4.class) +public class SqlCreateFunctionTest extends BeamSqlDslBase { + private final String jarPathProperty = "beam.sql.udf.test.jar_path"; + private final @Nullable String jarPath = System.getProperty(jarPathProperty); + + @Before + public void setUp() { + if (jarPath == null) { + fail( + String.format( + "System property %s must be set to run %s.", + jarPathProperty, SqlCreateFunctionTest.class.getSimpleName())); + } + } + + @Test + public void createScalarFunction() throws Exception { + String ddl = String.format("CREATE FUNCTION increment USING JAR '%s'", jarPath); + String query = "SELECT increment(0)"; + PCollection stream = pipeline.apply(SqlTransform.query(query).withDdlString(ddl)); + + final Schema schema = Schema.builder().addInt64Field("field1").build(); + PAssert.that(stream).containsInAnyOrder(Row.withSchema(schema).addValues(1L).build()); + + pipeline.run().waitUntilFinish(); + } + + @Test + public void createAggregateFunction() throws Exception { + String ddl = String.format("CREATE AGGREGATE FUNCTION my_sum USING JAR '%s'", jarPath); + String query = "SELECT my_sum(f_long) FROM PCOLLECTION"; + PCollection stream = boundedInput1.apply(SqlTransform.query(query).withDdlString(ddl)); + + final Schema schema = Schema.builder().addInt64Field("field1").build(); + PAssert.that(stream).containsInAnyOrder(Row.withSchema(schema).addValues(10000L).build()); + + pipeline.run().waitUntilFinish(); + } +} diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/planner/NodeStatsTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/planner/NodeStatsTest.java index a68d96286bd3..1eb2fad97fa8 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/planner/NodeStatsTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/planner/NodeStatsTest.java @@ -21,19 +21,16 @@ import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils; import org.apache.beam.sdk.extensions.sql.meta.provider.test.TestBoundedTable; import org.apache.beam.sdk.schemas.Schema; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.volcano.RelSubset; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.SingleRel; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.volcano.RelSubset; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.SingleRel; import org.junit.Assert; import org.junit.BeforeClass; import org.junit.Test; /** This tests the NodeStats Metadata handler and the estimations. 
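SqlCreateFunctionTest, like the reworked testSimpleInsertFlat above, relies on the SqlTransform.withDdlString(...) plumbing: DDL for tables and functions is attached to the transform itself instead of going through a separate BeamSqlEnv.executeDdl call. A condensed usage sketch, in which the jar path is a placeholder for a jar containing a UdfProvider:

    // Sketch: register a UDF via DDL on the transform and use it in the query.
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.extensions.sql.SqlTransform;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.Row;

    public class CreateFunctionExample {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create();
        PCollection<Row> result =
            p.apply(
                SqlTransform.query("SELECT increment(0)")
                    // The jar path below is a placeholder.
                    .withDdlString("CREATE FUNCTION increment USING JAR '/path/to/udf.jar'"));
        p.run().waitUntilFinish();
      }
    }
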
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class NodeStatsTest extends BaseRelTest { static class UnknownRel extends SingleRel { protected UnknownRel(RelOptCluster cluster, RelTraitSet traits, RelNode input) { diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BaseRelTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BaseRelTest.java index 5f5f8d09dc75..5ba74e88acc3 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BaseRelTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BaseRelTest.java @@ -26,9 +26,6 @@ import org.apache.beam.sdk.values.Row; /** Base class for rel test. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public abstract class BaseRelTest { private static Map tables = new HashMap<>(); protected static BeamSqlEnv env = BeamSqlEnv.readOnly("test", tables); diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamAggregationRelTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamAggregationRelTest.java index 3ebe01e42024..0b97ff66713e 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamAggregationRelTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamAggregationRelTest.java @@ -23,7 +23,7 @@ import org.apache.beam.sdk.extensions.sql.meta.provider.test.TestBoundedTable; import org.apache.beam.sdk.extensions.sql.meta.provider.test.TestUnboundedTable; import org.apache.beam.sdk.schemas.Schema; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; import org.joda.time.DateTime; import org.joda.time.Duration; import org.junit.Assert; diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRelTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRelTest.java index 8b1c2dcde281..27baad3b138d 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRelTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRelTest.java @@ -23,7 +23,7 @@ import org.apache.beam.sdk.extensions.sql.meta.provider.test.TestBoundedTable; import org.apache.beam.sdk.extensions.sql.meta.provider.test.TestUnboundedTable; import org.apache.beam.sdk.schemas.Schema; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; import org.joda.time.DateTime; import org.joda.time.Duration; import org.junit.Assert; diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCoGBKJoinRelBoundedVsBoundedTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCoGBKJoinRelBoundedVsBoundedTest.java index f6826a57aac4..f9ade2fe63ac 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCoGBKJoinRelBoundedVsBoundedTest.java +++ 
b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCoGBKJoinRelBoundedVsBoundedTest.java @@ -25,7 +25,7 @@ import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; import org.hamcrest.core.StringContains; import org.junit.Assert; import org.junit.BeforeClass; @@ -34,9 +34,6 @@ import org.junit.rules.ExpectedException; /** Bounded + Bounded Test for {@code BeamCoGBKJoinRel}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamCoGBKJoinRelBoundedVsBoundedTest extends BaseRelTest { @Rule public final TestPipeline pipeline = TestPipeline.create(); @Rule public ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCoGBKJoinRelUnboundedVsUnboundedTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCoGBKJoinRelUnboundedVsUnboundedTest.java index f4fe3dfd55ce..27cda2b561b8 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCoGBKJoinRelUnboundedVsUnboundedTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCoGBKJoinRelUnboundedVsUnboundedTest.java @@ -28,7 +28,7 @@ import org.apache.beam.sdk.transforms.ParDo; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; import org.joda.time.DateTime; import org.joda.time.Duration; import org.junit.Assert; @@ -37,9 +37,6 @@ import org.junit.Test; /** Unbounded + Unbounded Test for {@code BeamCoGBKJoinRel}. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamCoGBKJoinRelUnboundedVsUnboundedTest extends BaseRelTest { @Rule public final TestPipeline pipeline = TestPipeline.create(); private static final DateTime FIRST_DATE = new DateTime(1); diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamEnumerableConverterTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamEnumerableConverterTest.java index 797d423994df..c8db3ba68aff 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamEnumerableConverterTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamEnumerableConverterTest.java @@ -39,18 +39,19 @@ import org.apache.beam.sdk.values.PDone; import org.apache.beam.sdk.values.POutput; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.java.JavaTypeFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.JavaTypeFactoryImpl; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.Enumerable; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.Enumerator; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.volcano.VolcanoPlanner; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.prepare.RelOptTableImpl; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeSystem; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexBuilder; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.java.JavaTypeFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.JavaTypeFactoryImpl; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.Enumerable; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.Enumerator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.volcano.VolcanoPlanner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.prepare.RelOptTableImpl; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.TableModify.Operation; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeSystem; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexBuilder; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLiteral; import org.junit.Test; import org.junit.experimental.categories.Category; import org.junit.runner.RunWith; @@ -58,9 +59,6 @@ import org.junit.runners.Parameterized; /** Test for {@code BeamEnumerableConverter}. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamEnumerableConverterTest { static final JavaTypeFactory TYPE_FACTORY = new JavaTypeFactoryImpl(RelDataTypeSystem.DEFAULT); static RexBuilder rexBuilder = new RexBuilder(TYPE_FACTORY); @@ -173,7 +171,7 @@ public void testToEnumerable_count() { RelOptTableImpl.create(null, type, ImmutableList.of(), null), null, new BeamValuesRel(cluster, type, tuples, null), - null, + Operation.INSERT, null, null, false, diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIOSourceRelTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIOSourceRelTest.java index ff0d70fa5df6..ddec055cc5e8 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIOSourceRelTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIOSourceRelTest.java @@ -24,8 +24,8 @@ import org.apache.beam.sdk.extensions.sql.meta.provider.test.TestUnboundedTable; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.testing.TestPipeline; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataQuery; import org.joda.time.DateTime; import org.joda.time.Duration; import org.junit.Assert; diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIntersectRelTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIntersectRelTest.java index d5acfabd0ae1..6c7995ba28b7 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIntersectRelTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIntersectRelTest.java @@ -26,7 +26,7 @@ import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; import org.junit.Assert; import org.junit.BeforeClass; import org.junit.Rule; diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamMinusRelTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamMinusRelTest.java index 074c4476926e..b1916c42e2fe 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamMinusRelTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamMinusRelTest.java @@ -28,7 +28,7 @@ import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; import org.joda.time.DateTime; import org.joda.time.Duration; import org.junit.Assert; diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSideInputJoinRelTest.java 
b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSideInputJoinRelTest.java index b217425d5816..9bb424853569 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSideInputJoinRelTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSideInputJoinRelTest.java @@ -28,7 +28,7 @@ import org.apache.beam.sdk.transforms.ParDo; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; import org.joda.time.DateTime; import org.joda.time.Duration; import org.junit.Assert; @@ -37,9 +37,6 @@ import org.junit.Test; /** Unbounded + Bounded Test for {@code BeamSideInputJoinRel}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamSideInputJoinRelTest extends BaseRelTest { @Rule public final TestPipeline pipeline = TestPipeline.create(); public static final DateTime FIRST_DATE = new DateTime(1); diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSideInputLookupJoinRelTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSideInputLookupJoinRelTest.java index 091ecafaca84..4b8dc381a527 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSideInputLookupJoinRelTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSideInputLookupJoinRelTest.java @@ -41,9 +41,6 @@ import org.junit.Test; import org.junit.rules.ExpectedException; -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamSideInputLookupJoinRelTest extends BaseRelTest { @Rule public final TestPipeline pipeline = TestPipeline.create(); diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSortRelTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSortRelTest.java index 99c7860b8fa8..a85878846387 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSortRelTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSortRelTest.java @@ -25,7 +25,7 @@ import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; import org.joda.time.DateTime; import org.junit.Assert; import org.junit.Before; @@ -34,9 +34,6 @@ import org.junit.rules.ExpectedException; /** Test for {@code BeamSortRel}. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamSortRelTest extends BaseRelTest { @Rule public final TestPipeline pipeline = TestPipeline.create(); diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUncollectRelTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUncollectRelTest.java index 640a1df33ef1..4b99d6a2a138 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUncollectRelTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUncollectRelTest.java @@ -27,7 +27,7 @@ import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; import org.junit.Assert; import org.junit.Rule; import org.junit.Test; diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnionRelTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnionRelTest.java index 6f777517057f..b729ae9f4394 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnionRelTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnionRelTest.java @@ -26,7 +26,7 @@ import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; import org.junit.Assert; import org.junit.BeforeClass; import org.junit.Rule; diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamValuesRelTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamValuesRelTest.java index 07877519762b..6adfc98a6865 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamValuesRelTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamValuesRelTest.java @@ -25,7 +25,7 @@ import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; import org.junit.Assert; import org.junit.BeforeClass; import org.junit.Rule; diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRuleTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRuleTest.java index 42c4639e6789..5d13af947777 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRuleTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRuleTest.java @@ -34,6 +34,7 @@ import org.apache.beam.sdk.extensions.sql.meta.provider.test.TestTableProvider.PushDownOptions; import 
org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.values.Row; import org.junit.Before; import org.junit.Rule; import org.junit.Test; @@ -41,9 +42,6 @@ import org.junit.runners.JUnit4; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamAggregateProjectMergeRuleTest { private static final Schema BASIC_SCHEMA = Schema.builder() @@ -62,9 +60,19 @@ public void buildUp() { Table projectTable = getTable("TEST_PROJECT", PushDownOptions.PROJECT); Table filterTable = getTable("TEST_FILTER", PushDownOptions.FILTER); Table noneTable = getTable("TEST_NONE", PushDownOptions.NONE); + tableProvider.createTable(projectTable); tableProvider.createTable(filterTable); tableProvider.createTable(noneTable); + + // Rules are cost based, need rows to optimize! + tableProvider.addRows( + "TEST_PROJECT", Row.withSchema(BASIC_SCHEMA).addValues(1, 2, "3", 4).build()); + tableProvider.addRows( + "TEST_FILTER", Row.withSchema(BASIC_SCHEMA).addValues(1, 2, "3", 4).build()); + tableProvider.addRows( + "TEST_NONE", Row.withSchema(BASIC_SCHEMA).addValues(1, 2, "3", 4).build()); + sqlEnv = BeamSqlEnv.inMemory(tableProvider); } diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rule/IOPushDownRuleTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rule/IOPushDownRuleTest.java index 164e226416da..7795bfbecfd2 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rule/IOPushDownRuleTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rule/IOPushDownRuleTest.java @@ -36,17 +36,13 @@ import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.CalcMergeRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.FilterCalcMergeRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.FilterToCalcRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.ProjectCalcMergeRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.ProjectToCalcRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RuleSets; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Pair; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Calc; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.rules.CoreRules; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RuleSets; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.Pair; import org.junit.Before; import org.junit.Rule; import org.junit.Test; @@ -54,9 +50,6 @@ import 
org.junit.runners.JUnit4; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class IOPushDownRuleTest { private static final Schema BASIC_SCHEMA = Schema.builder() @@ -68,11 +61,11 @@ public class IOPushDownRuleTest { private static final List defaultRules = ImmutableList.of( BeamCalcRule.INSTANCE, - FilterCalcMergeRule.INSTANCE, - ProjectCalcMergeRule.INSTANCE, - FilterToCalcRule.INSTANCE, - ProjectToCalcRule.INSTANCE, - CalcMergeRule.INSTANCE); + CoreRules.FILTER_CALC_MERGE, + CoreRules.PROJECT_CALC_MERGE, + CoreRules.FILTER_TO_CALC, + CoreRules.PROJECT_TO_CALC, + CoreRules.CALC_MERGE); private BeamSqlEnv sqlEnv; @Rule public TestPipeline pipeline = TestPipeline.create(); diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rule/JoinReorderingTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rule/JoinReorderingTest.java index bd392cde756c..d79005cf422f 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rule/JoinReorderingTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rule/JoinReorderingTest.java @@ -31,43 +31,43 @@ import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableMap; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.DataContext; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.enumerable.EnumerableConvention; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.enumerable.EnumerableRules; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.Enumerable; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.Linq4j; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.ConventionTraitDef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelCollationTraitDef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelCollations; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelFieldCollation; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelRoot; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Join; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.TableScan; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.JoinCommuteRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.SortProjectTransposeRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.ScannableTable; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.SchemaPlus; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.Statistic; -import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.Statistics; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.Table; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.impl.AbstractSchema; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.impl.AbstractTable; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.parser.SqlParser; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.FrameworkConfig; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.Frameworks; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.Planner; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.Programs; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RuleSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RuleSets; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.ImmutableBitSet; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.DataContext; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.enumerable.EnumerableConvention; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.enumerable.EnumerableRules; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.Enumerable; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.Linq4j; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.ConventionTraitDef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelCollationTraitDef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelCollations; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelFieldCollation; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelRoot; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Join; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.TableScan; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.rules.CoreRules; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.rules.JoinCommuteRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.ScannableTable; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.SchemaPlus; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Statistic; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Statistics; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Table; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.impl.AbstractSchema; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.impl.AbstractTable; 
+import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.parser.SqlParser; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.FrameworkConfig; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.Frameworks; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.Planner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.Programs; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RuleSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RuleSets; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.ImmutableBitSet; import org.junit.Assert; import org.junit.Test; @@ -75,9 +75,6 @@ * This test ensures that we are reordering joins and get a plan similar to Join(large,Join(small, * medium)) instead of Join(small, Join(medium,large). */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class JoinReorderingTest { private final PipelineOptions defaultPipelineOptions = PipelineOptionsFactory.create(); @@ -115,7 +112,7 @@ public void testTableSizes() { public void testBeamJoinAssociationRule() throws Exception { RuleSet prepareRules = RuleSets.ofList( - SortProjectTransposeRule.INSTANCE, + CoreRules.SORT_PROJECT_TRANSPOSE, EnumerableRules.ENUMERABLE_JOIN_RULE, EnumerableRules.ENUMERABLE_PROJECT_RULE, EnumerableRules.ENUMERABLE_SORT_RULE, @@ -144,7 +141,7 @@ public void testBeamJoinAssociationRule() throws Exception { public void testBeamJoinPushThroughJoinRuleLeft() throws Exception { RuleSet prepareRules = RuleSets.ofList( - SortProjectTransposeRule.INSTANCE, + CoreRules.SORT_PROJECT_TRANSPOSE, EnumerableRules.ENUMERABLE_JOIN_RULE, EnumerableRules.ENUMERABLE_PROJECT_RULE, EnumerableRules.ENUMERABLE_SORT_RULE, @@ -173,7 +170,7 @@ public void testBeamJoinPushThroughJoinRuleLeft() throws Exception { public void testBeamJoinPushThroughJoinRuleRight() throws Exception { RuleSet prepareRules = RuleSets.ofList( - SortProjectTransposeRule.INSTANCE, + CoreRules.SORT_PROJECT_TRANSPOSE, EnumerableRules.ENUMERABLE_JOIN_RULE, EnumerableRules.ENUMERABLE_PROJECT_RULE, EnumerableRules.ENUMERABLE_SORT_RULE, @@ -367,9 +364,6 @@ private void createThreeTables(TestTableProvider tableProvider) { } } -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) final class ThreeTablesSchema extends AbstractSchema { private final ImmutableMap tables; @@ -423,7 +417,7 @@ public ThreeTablesSchema() { } @Override - protected Map + protected Map getTableMap() { return tables; } diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/schema/BeamSqlRowCoderTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/schema/BeamSqlRowCoderTest.java index d4819fc966a6..6bf95097f642 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/schema/BeamSqlRowCoderTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/schema/BeamSqlRowCoderTest.java @@ -27,10 +27,10 @@ import org.apache.beam.sdk.schemas.SchemaCoder; import org.apache.beam.sdk.testing.CoderProperties; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.JavaTypeFactoryImpl; -import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeSystem; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeName; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.JavaTypeFactoryImpl; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeSystem; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlTypeName; import org.joda.time.DateTime; import org.junit.Test; diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/schema/transform/BeamTransformBaseTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/schema/transform/BeamTransformBaseTest.java index cb0e7e515703..e37edd8c5530 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/schema/transform/BeamTransformBaseTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/schema/transform/BeamTransformBaseTest.java @@ -27,9 +27,6 @@ import org.junit.BeforeClass; /** shared methods to test PTransforms which execute Beam SQL steps. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamTransformBaseTest { static final DateTimeFormatter FORMAT = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss"); diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/utils/CalciteUtilsTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/utils/CalciteUtilsTest.java index 8a86348efffa..707818b8a0d7 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/utils/CalciteUtilsTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/utils/CalciteUtilsTest.java @@ -24,22 +24,23 @@ import java.util.Map; import java.util.stream.Collectors; import org.apache.beam.sdk.schemas.Schema; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeSystem; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeFactoryImpl; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeName; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeSystem; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlTypeFactoryImpl; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlTypeName; import org.junit.Before; +import org.junit.Rule; import org.junit.Test; +import org.junit.rules.ExpectedException; /** Tests for conversion from Beam schema to Calcite data type. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class CalciteUtilsTest { RelDataTypeFactory dataTypeFactory; + @Rule public ExpectedException thrown = ExpectedException.none(); + @Before public void setUp() { dataTypeFactory = new SqlTypeFactoryImpl(RelDataTypeSystem.DEFAULT); @@ -169,4 +170,12 @@ public void testRoundTripBeamNullableSchema() { assertEquals(schema, out); } + + @Test + public void testFieldTypeNotFound() { + RelDataType relDataType = dataTypeFactory.createUnknownType(); + thrown.expect(IllegalArgumentException.class); + thrown.expectMessage("Cannot find a matching Beam FieldType for Calcite type: UNKNOWN"); + CalciteUtils.toFieldType(relDataType); + } } diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/integrationtest/BeamSqlBuiltinFunctionsIntegrationTestBase.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/integrationtest/BeamSqlBuiltinFunctionsIntegrationTestBase.java index af7ede35856a..25e6ef31cd54 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/integrationtest/BeamSqlBuiltinFunctionsIntegrationTestBase.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/integrationtest/BeamSqlBuiltinFunctionsIntegrationTestBase.java @@ -19,7 +19,7 @@ import static org.apache.beam.sdk.extensions.sql.utils.DateTimeUtils.parseTimestampWithUTCTimeZone; import static org.apache.beam.sdk.extensions.sql.utils.RowAsserts.matchesScalar; -import static org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Preconditions.checkArgument; import static org.junit.Assert.assertTrue; import com.google.auto.value.AutoValue; @@ -50,16 +50,13 @@ import org.apache.beam.sdk.values.PDone; import org.apache.beam.sdk.values.Row; import org.apache.beam.sdk.values.TypeDescriptors; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableMap; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.Iterables; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.Iterables; import org.checkerframework.checker.nullness.qual.Nullable; import org.joda.time.DateTime; import org.junit.Rule; /** Base class for all built-in functions integration tests. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamSqlBuiltinFunctionsIntegrationTestBase { private static final double PRECISION_DOUBLE = 1e-7; private static final float PRECISION_FLOAT = 1e-7f; diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/integrationtest/BeamSqlDateFunctionsIntegrationTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/integrationtest/BeamSqlDateFunctionsIntegrationTest.java index a824c011af40..0d5db922d220 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/integrationtest/BeamSqlDateFunctionsIntegrationTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/integrationtest/BeamSqlDateFunctionsIntegrationTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.extensions.sql.integrationtest; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.util.DateTimeUtils.MILLIS_PER_DAY; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.util.DateTimeUtils.MILLIS_PER_SECOND; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.util.DateTimeUtils.MILLIS_PER_DAY; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.util.DateTimeUtils.MILLIS_PER_SECOND; import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertTrue; @@ -35,9 +35,6 @@ import org.junit.Test; /** Integration test for date functions. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamSqlDateFunctionsIntegrationTest extends BeamSqlBuiltinFunctionsIntegrationTestBase { diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/CustomTableResolverTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/CustomTableResolverTest.java index f4d589e1070a..d20274d18cbb 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/CustomTableResolverTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/CustomTableResolverTest.java @@ -32,16 +32,13 @@ import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; import org.joda.time.Duration; import org.junit.Rule; import org.junit.Test; /** Test for custom table resolver and full name table provider. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class CustomTableResolverTest implements Serializable { @Rule public final transient TestPipeline pipeline = TestPipeline.create(); diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/SchemaIOTableProviderWrapperTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/SchemaIOTableProviderWrapperTest.java new file mode 100644 index 000000000000..4b45e5a1cb1d --- /dev/null +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/SchemaIOTableProviderWrapperTest.java @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.meta.provider; + +import com.alibaba.fastjson.JSON; +import java.util.List; +import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable; +import org.apache.beam.sdk.extensions.sql.meta.DefaultTableFilter; +import org.apache.beam.sdk.extensions.sql.meta.Table; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** + * Tests {@link org.apache.beam.sdk.extensions.sql.meta.provider.SchemaIOTableProviderWrapper} using + * {@link org.apache.beam.sdk.extensions.sql.meta.provider.TestSchemaIOTableProviderWrapper}. 
+ */ +@RunWith(JUnit4.class) +public class SchemaIOTableProviderWrapperTest { + @Rule public TestPipeline pipeline = TestPipeline.create(); + + private static final Schema inputSchema = + Schema.builder() + .addStringField("f_string") + .addInt64Field("f_long") + .addBooleanField("f_bool") + .build(); + private static final List rows = + ImmutableList.of( + Row.withSchema(inputSchema).addValues("zero", 0L, false).build(), + Row.withSchema(inputSchema).addValues("one", 1L, true).build()); + private final Table testTable = + Table.builder() + .name("table") + .comment("table") + .schema(inputSchema) + .properties(JSON.parseObject("{}")) + .type("test") + .build(); + + @BeforeClass + public static void setUp() { + TestSchemaIOTableProviderWrapper.addRows(rows.stream().toArray(Row[]::new)); + } + + @Test + public void testBuildIOReader() { + TestSchemaIOTableProviderWrapper provider = new TestSchemaIOTableProviderWrapper(); + BeamSqlTable beamSqlTable = provider.buildBeamSqlTable(testTable); + + PCollection result = beamSqlTable.buildIOReader(pipeline.begin()); + PAssert.that(result).containsInAnyOrder(rows); + + pipeline.run(); + } + + @Test + public void testBuildIOReader_withProjectionPushdown() { + TestSchemaIOTableProviderWrapper provider = new TestSchemaIOTableProviderWrapper(); + BeamSqlTable beamSqlTable = provider.buildBeamSqlTable(testTable); + + PCollection result = + beamSqlTable.buildIOReader( + pipeline.begin(), + new DefaultTableFilter(ImmutableList.of()), + ImmutableList.of("f_long")); + Schema outputSchema = Schema.builder().addInt64Field("f_long").build(); + PAssert.that(result) + .containsInAnyOrder( + Row.withSchema(outputSchema).addValues(0L).build(), + Row.withSchema(outputSchema).addValues(1L).build()); + + pipeline.run(); + } +} diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/TestSchemaIOTableProviderWrapper.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/TestSchemaIOTableProviderWrapper.java new file mode 100644 index 000000000000..a198460e81bc --- /dev/null +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/TestSchemaIOTableProviderWrapper.java @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.sql.meta.provider; + +import java.util.ArrayList; +import java.util.Arrays; +import java.util.List; +import org.apache.beam.sdk.schemas.FieldAccessDescriptor; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.io.PushdownProjector; +import org.apache.beam.sdk.schemas.io.SchemaIO; +import org.apache.beam.sdk.schemas.io.SchemaIOProvider; +import org.apache.beam.sdk.schemas.transforms.Select; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.POutput; +import org.apache.beam.sdk.values.Row; +import org.checkerframework.checker.nullness.qual.Nullable; + +/** + * A mock {@link org.apache.beam.sdk.extensions.sql.meta.provider.SchemaIOTableProviderWrapper} that + * reads in-memory data for testing. + */ +public class TestSchemaIOTableProviderWrapper extends SchemaIOTableProviderWrapper { + private static final List rows = new ArrayList<>(); + + @Override + public SchemaIOProvider getSchemaIOProvider() { + return new TestSchemaIOProvider(); + } + + public static void addRows(Row... newRows) { + rows.addAll(Arrays.asList(newRows)); + } + + private class TestSchemaIOProvider implements SchemaIOProvider { + @Override + public String identifier() { + return "TestSchemaIOProvider"; + } + + @Override + public Schema configurationSchema() { + return Schema.of(); + } + + @Override + public SchemaIO from(String location, Row configuration, @Nullable Schema dataSchema) { + return new TestSchemaIO(dataSchema); + } + + @Override + public boolean requiresDataSchema() { + return true; + } + + @Override + public PCollection.IsBounded isBounded() { + return PCollection.IsBounded.BOUNDED; + } + } + + private class TestSchemaIO implements SchemaIO { + private final Schema schema; + + TestSchemaIO(Schema schema) { + this.schema = schema; + } + + @Override + public Schema schema() { + return schema; + } + + @Override + public PTransform> buildReader() { + // Read all fields by default. + return new TestPushdownProjector(schema, FieldAccessDescriptor.withAllFields()); + } + + @Override + public PTransform, ? extends POutput> buildWriter() { + throw new UnsupportedOperationException(); + } + } + + /** + * {@link PTransform} that reads in-memory data for testing. Simulates projection pushdown using + * {@link Select}. + */ + private class TestPushdownProjector extends PTransform> + implements PushdownProjector { + /** The schema of the input data. */ + private final Schema schema; + /** The fields to be projected. */ + private final FieldAccessDescriptor fieldAccessDescriptor; + + TestPushdownProjector(Schema schema, FieldAccessDescriptor fieldAccessDescriptor) { + this.schema = schema; + this.fieldAccessDescriptor = fieldAccessDescriptor; + } + + @Override + public PTransform> withProjectionPushdown( + FieldAccessDescriptor fieldAccessDescriptor) { + return new TestPushdownProjector(schema, fieldAccessDescriptor); + } + + @Override + public boolean supportsFieldReordering() { + return true; + } + + @Override + public PCollection expand(PBegin input) { + // Simulate projection pushdown using Select. In a real IO, projection would be pushed down to + // the source. 
+ return input + .apply(Create.of(rows).withRowSchema(schema)) + .apply(Select.fieldAccess(fieldAccessDescriptor)); + } + } +} diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/avro/AvroTableProviderTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/avro/AvroTableProviderTest.java index 60bd6f7c9baa..2e2ff91d1b57 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/avro/AvroTableProviderTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/avro/AvroTableProviderTest.java @@ -22,8 +22,7 @@ import java.io.File; import org.apache.beam.sdk.PipelineResult; import org.apache.beam.sdk.PipelineResult.State; -import org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv; -import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils; +import org.apache.beam.sdk.extensions.sql.SqlTransform; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.testing.PAssert; import org.apache.beam.sdk.testing.TestPipeline; @@ -42,36 +41,32 @@ public class AvroTableProviderTest { @Rule public TestPipeline readPipeline = TestPipeline.create(); @Rule public TemporaryFolder tempFolder = new TemporaryFolder(); - private static final String AVRO_FIELD_NAMES = "(name VARCHAR, age BIGINT, country VARCHAR)"; + private static final String FIELD_NAMES = "(name VARCHAR, age BIGINT, country VARCHAR)"; private static final Schema OUTPUT_ROW_SCHEMA = Schema.builder().addInt64Field("age").addStringField("country").build(); @Test - public void testReadAndWriteAvroTable() { + public void testWriteAndReadTable() { File destinationFile = new File(tempFolder.getRoot(), "person-info.avro"); - BeamSqlEnv env = BeamSqlEnv.inMemory(new AvroTableProvider()); - env.executeDdl( + String query = "INSERT INTO PersonInfo VALUES ('Alan', 22, 'England'), ('John', 42, 'USA')"; + String ddl = String.format( "CREATE EXTERNAL TABLE PersonInfo %s TYPE avro LOCATION '%s'", - AVRO_FIELD_NAMES, destinationFile.getAbsolutePath())); + FIELD_NAMES, destinationFile.getAbsolutePath()); - BeamSqlRelUtils.toPCollection( - writePipeline, - env.parseQuery( - "INSERT INTO PersonInfo VALUES ('Alan', 22, 'England'), ('John', 42, 'USA')")); + writePipeline.apply(SqlTransform.query(query).withDdlString(ddl)); writePipeline.run().waitUntilFinish(); - PCollection rows = - BeamSqlRelUtils.toPCollection( - readPipeline, env.parseQuery("SELECT age, country FROM PersonInfo where age > 25")); + String readQuery = "SELECT age, country FROM PersonInfo WHERE age > 25"; + PCollection rows = readPipeline.apply(SqlTransform.query(readQuery).withDdlString(ddl)); PAssert.that(rows) .containsInAnyOrder(Row.withSchema(OUTPUT_ROW_SCHEMA).addValues(42L, "USA").build()); PipelineResult.State state = readPipeline.run().waitUntilFinish(); - assertEquals(state, State.DONE); + assertEquals(State.DONE, state); } } diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryFilterTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryFilterTest.java index 463b3cf8d3ea..fa84883448cc 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryFilterTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryFilterTest.java @@ -41,9 +41,6 @@ import 
org.junit.runners.JUnit4; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigQueryFilterTest { // TODO: add date, time, and datetime fields. private static final Schema BASIC_SCHEMA = diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryReadWriteIT.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryReadWriteIT.java index acc2d068c0d4..14d742294c5b 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryReadWriteIT.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryReadWriteIT.java @@ -17,7 +17,6 @@ */ package org.apache.beam.sdk.extensions.sql.meta.provider.bigquery; -import static junit.framework.TestCase.assertNull; import static org.apache.beam.sdk.extensions.sql.meta.provider.bigquery.BigQueryTable.METHOD_PROPERTY; import static org.apache.beam.sdk.extensions.sql.meta.provider.bigquery.BigQueryTable.WRITE_DISPOSITION_PROPERTY; import static org.apache.beam.sdk.extensions.sql.utils.DateTimeUtils.parseTimestampWithUTCTimeZone; @@ -42,7 +41,6 @@ import org.apache.beam.sdk.PipelineResult; import org.apache.beam.sdk.PipelineResult.State; import org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv; -import org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamPushDownIOSourceRel; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamRelNode; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils; @@ -60,7 +58,7 @@ import org.apache.beam.sdk.transforms.Create; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableMap; import org.joda.time.Duration; import org.junit.Rule; import org.junit.Test; @@ -740,12 +738,10 @@ public void testSQLRead_withDirectRead_withProjectPushDown() { BeamRelNode relNode = sqlEnv.parseQuery(selectTableStatement); PCollection output = BeamSqlRelUtils.toPCollection(readPipeline, relNode); - // Calc is not dropped because BigQuery does not support field reordering yet. - assertThat(relNode, instanceOf(BeamCalcRel.class)); - assertThat(relNode.getInput(0), instanceOf(BeamPushDownIOSourceRel.class)); + assertThat(relNode, instanceOf(BeamPushDownIOSourceRel.class)); // IO projects fields in the same order they are defined in the schema. 
assertThat( - relNode.getInput(0).getRowType().getFieldNames(), + relNode.getRowType().getFieldNames(), containsInAnyOrder("c_tinyint", "c_integer", "c_varchar")); // Field reordering is done in a Calc assertThat( @@ -816,15 +812,9 @@ public void testSQLRead_withDirectRead_withProjectAndFilterPushDown() { BeamRelNode relNode = sqlEnv.parseQuery(selectTableStatement); PCollection output = BeamSqlRelUtils.toPCollection(readPipeline, relNode); - assertThat(relNode, instanceOf(BeamCalcRel.class)); - // Predicate should be pushed-down to IO level - assertNull(((BeamCalcRel) relNode).getProgram().getCondition()); - - assertThat(relNode.getInput(0), instanceOf(BeamPushDownIOSourceRel.class)); + assertThat(relNode, instanceOf(BeamPushDownIOSourceRel.class)); // Unused fields should not be projected by an IO - assertThat( - relNode.getInput(0).getRowType().getFieldNames(), - containsInAnyOrder("c_varchar", "c_integer")); + assertThat(relNode.getRowType().getFieldNames(), containsInAnyOrder("c_varchar", "c_integer")); assertThat( output.getSchema(), diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryRowCountIT.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryRowCountIT.java index bd6d9ae18e83..a2c5f835363a 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryRowCountIT.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryRowCountIT.java @@ -38,7 +38,7 @@ import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.transforms.Create; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; import org.junit.Rule; import org.junit.Test; import org.junit.runner.RunWith; diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTestTable.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTestTable.java index bdf740bc1c2e..a674e40f0d24 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTestTable.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTestTable.java @@ -26,9 +26,6 @@ * A BigQueryTable that keeps jobName from the pipeline options whenever row count is called. 
It is * made for {@link BigQueryRowCountIT#testPipelineOptionInjection()} */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigQueryTestTable extends BigQueryTable { private String jobName = null; diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTestTableProvider.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTestTableProvider.java index 1a682c8dec7e..2e0fac649ce2 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTestTableProvider.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTestTableProvider.java @@ -17,7 +17,7 @@ */ package org.apache.beam.sdk.extensions.sql.meta.provider.bigquery; -import static org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.MoreObjects.firstNonNull; +import static org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.MoreObjects.firstNonNull; import java.util.HashMap; import java.util.Map; @@ -27,9 +27,6 @@ import org.checkerframework.checker.nullness.qual.Nullable; /** A test table provider for BigQueryRowCountIT. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigQueryTestTableProvider extends BigQueryTableProvider { private Map tableSpecMap; diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableClientWrapper.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableClientWrapper.java new file mode 100644 index 000000000000..6a8b343b18f6 --- /dev/null +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableClientWrapper.java @@ -0,0 +1,115 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.sql.meta.provider.bigtable; + +import static org.apache.beam.sdk.io.gcp.bigtable.RowUtils.byteString; +import static org.apache.beam.sdk.io.gcp.bigtable.RowUtils.byteStringUtf8; + +import com.google.auth.Credentials; +import com.google.bigtable.admin.v2.ColumnFamily; +import com.google.bigtable.admin.v2.DeleteTableRequest; +import com.google.bigtable.admin.v2.Table; +import com.google.bigtable.v2.MutateRowRequest; +import com.google.bigtable.v2.Mutation; +import com.google.cloud.bigtable.config.BigtableOptions; +import com.google.cloud.bigtable.config.CredentialOptions; +import com.google.cloud.bigtable.grpc.BigtableDataClient; +import com.google.cloud.bigtable.grpc.BigtableSession; +import com.google.cloud.bigtable.grpc.BigtableTableAdminClient; +import java.io.IOException; +import java.io.Serializable; +import org.checkerframework.checker.nullness.qual.Nullable; + +class BigtableClientWrapper implements Serializable { + private final BigtableTableAdminClient tableAdminClient; + private final BigtableDataClient dataClient; + private final BigtableSession session; + private final BigtableOptions bigtableOptions; + + BigtableClientWrapper( + String project, + String instanceId, + @Nullable Integer emulatorPort, + @Nullable Credentials gcpCredentials) + throws IOException { + BigtableOptions.Builder optionsBuilder = + BigtableOptions.builder() + .setProjectId(project) + .setInstanceId(instanceId) + .setUserAgent("apache-beam-test"); + if (emulatorPort != null) { + optionsBuilder.enableEmulator("localhost", emulatorPort); + } + if (gcpCredentials != null) { + optionsBuilder.setCredentialOptions(CredentialOptions.credential(gcpCredentials)); + } + bigtableOptions = optionsBuilder.build(); + + session = new BigtableSession(bigtableOptions); + tableAdminClient = session.getTableAdminClient(); + dataClient = session.getDataClient(); + } + + void writeRow( + String key, + String table, + String familyColumn, + String columnQualifier, + byte[] value, + long timestampMicros) { + Mutation.SetCell setCell = + Mutation.SetCell.newBuilder() + .setFamilyName(familyColumn) + .setColumnQualifier(byteStringUtf8(columnQualifier)) + .setValue(byteString(value)) + .setTimestampMicros(timestampMicros) + .build(); + Mutation mutation = Mutation.newBuilder().setSetCell(setCell).build(); + MutateRowRequest mutateRowRequest = + MutateRowRequest.newBuilder() + .setRowKey(byteStringUtf8(key)) + .setTableName(bigtableOptions.getInstanceName().toTableNameStr(table)) + .addMutations(mutation) + .build(); + dataClient.mutateRow(mutateRowRequest); + } + + void createTable(String tableName, String familyName) { + Table.Builder tableBuilder = Table.newBuilder(); + tableBuilder.putColumnFamilies(familyName, ColumnFamily.newBuilder().build()); + + String instanceName = bigtableOptions.getInstanceName().toString(); + com.google.bigtable.admin.v2.CreateTableRequest.Builder createTableRequestBuilder = + com.google.bigtable.admin.v2.CreateTableRequest.newBuilder() + .setParent(instanceName) + .setTableId(tableName) + .setTable(tableBuilder.build()); + tableAdminClient.createTable(createTableRequestBuilder.build()); + } + + void deleteTable(String tableId) { + final String tableName = bigtableOptions.getInstanceName().toTableNameStr(tableId); + DeleteTableRequest.Builder deleteTableRequestBuilder = + DeleteTableRequest.newBuilder().setName(tableName); + tableAdminClient.deleteTable(deleteTableRequestBuilder.build()); + } + + void closeSession() throws IOException { + session.close(); + 
} +} diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableFilterTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableFilterTest.java new file mode 100644 index 000000000000..dade31390f6a --- /dev/null +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableFilterTest.java @@ -0,0 +1,111 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.meta.provider.bigtable; + +import static org.apache.beam.sdk.extensions.sql.meta.provider.test.TestTableProvider.PUSH_DOWN_OPTION; +import static org.apache.beam.sdk.io.gcp.bigtable.RowUtils.KEY; +import static org.hamcrest.MatcherAssert.assertThat; +import static org.hamcrest.Matchers.instanceOf; + +import com.alibaba.fastjson.JSON; +import java.util.Arrays; +import java.util.Collection; +import org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv; +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel; +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamRelNode; +import org.apache.beam.sdk.extensions.sql.meta.Table; +import org.apache.beam.sdk.extensions.sql.meta.provider.test.TestTableProvider; +import org.apache.beam.sdk.extensions.sql.meta.provider.test.TestTableProvider.PushDownOptions; +import org.apache.beam.sdk.options.PipelineOptionsFactory; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.values.Row; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.Parameterized; +import org.junit.runners.Parameterized.Parameter; +import org.junit.runners.Parameterized.Parameters; + +@RunWith(Parameterized.class) +public class BigtableFilterTest { + + private static final Schema BASIC_SCHEMA = + Schema.builder().addStringField(KEY).addStringField("name").build(); + + private BeamSqlEnv sqlEnv; + + @Parameters + public static Collection data() { + return Arrays.asList( + new Object[][] { + {"select * from TEST where key = '100'", false}, + {"select * from TEST where key >= 'key2'", false}, + {"select * from TEST where key LIKE '^key[123]'", true}, + {"select * from TEST where key LIKE '^key[abc]' OR key LIKE '^key[bcd]'", false}, + }); + } + + @Parameter public String query; + + @Parameter(1) + public boolean isSupported; + + @Rule public TestPipeline pipeline = TestPipeline.create(); + + @Before + public void buildUp() { + TestTableProvider tableProvider = new TestTableProvider(); + Table table = getTable("TEST", PushDownOptions.NONE); + tableProvider.createTable(table); + tableProvider.addRows(table.getName(), 
row("key1", "firstName"), row("key2", "secondName")); + + sqlEnv = + BeamSqlEnv.builder(tableProvider) + .setPipelineOptions(PipelineOptionsFactory.create()) + .build(); + } + + @Test + public void testIsSupported() { + BeamRelNode beamRelNode = sqlEnv.parseQuery(query); + assertThat(beamRelNode, instanceOf(BeamCalcRel.class)); + BigtableFilter filter = + new BigtableFilter(((BeamCalcRel) beamRelNode).getProgram().split().right, BASIC_SCHEMA); + + assertThat( + "Query: '" + query + "' is expected to be " + (isSupported ? "supported." : "unsupported."), + filter.getNotSupported().isEmpty() == isSupported); + } + + private static Table getTable(String name, PushDownOptions options) { + return Table.builder() + .name(name) + .comment(name + " table") + .schema(BASIC_SCHEMA) + .properties( + JSON.parseObject("{ " + PUSH_DOWN_OPTION + ": " + "\"" + options.toString() + "\" }")) + .type("test") + .build(); + } + + private static Row row(String key, String name) { + return Row.withSchema(BASIC_SCHEMA).addValues(key, name).build(); + } +} diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableTableCreationFailuresTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableTableCreationFailuresTest.java index 3f6f0a24895c..17195903aaec 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableTableCreationFailuresTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableTableCreationFailuresTest.java @@ -17,7 +17,7 @@ */ package org.apache.beam.sdk.extensions.sql.meta.provider.bigtable; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.checkMessage; +import static org.apache.beam.sdk.extensions.sql.meta.provider.bigtable.BigtableTableTestUtils.checkMessage; import static org.junit.Assert.assertThrows; import org.apache.beam.sdk.extensions.sql.BeamSqlCli; @@ -29,9 +29,6 @@ import org.junit.Before; import org.junit.Test; -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigtableTableCreationFailuresTest { private final InMemoryMetaStore metaStore = new InMemoryMetaStore(); diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableTableFlatTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableTableFlatTest.java index bbfc62f4f1fd..817570ab2cd7 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableTableFlatTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableTableFlatTest.java @@ -17,12 +17,18 @@ */ package org.apache.beam.sdk.extensions.sql.meta.provider.bigtable; +import static org.apache.beam.sdk.extensions.sql.meta.provider.bigtable.BigtableTableTestUtils.FAMILY_TEST; +import static org.apache.beam.sdk.extensions.sql.meta.provider.bigtable.BigtableTableTestUtils.KEY1; +import static org.apache.beam.sdk.extensions.sql.meta.provider.bigtable.BigtableTableTestUtils.KEY2; +import static org.apache.beam.sdk.extensions.sql.meta.provider.bigtable.BigtableTableTestUtils.TEST_FLAT_SCHEMA; +import static org.apache.beam.sdk.extensions.sql.meta.provider.bigtable.BigtableTableTestUtils.bigTableRow; +import static 
org.apache.beam.sdk.extensions.sql.meta.provider.bigtable.BigtableTableTestUtils.columnsMappingString; +import static org.apache.beam.sdk.extensions.sql.meta.provider.bigtable.BigtableTableTestUtils.createFlatTableString; +import static org.apache.beam.sdk.extensions.sql.meta.provider.bigtable.BigtableTableTestUtils.createReadTable; +import static org.apache.beam.sdk.extensions.sql.meta.provider.bigtable.BigtableTableTestUtils.flatRow; +import static org.apache.beam.sdk.extensions.sql.meta.provider.bigtable.BigtableTableTestUtils.setFixedTimestamp; import static org.apache.beam.sdk.io.gcp.bigtable.RowUtils.COLUMNS_MAPPING; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.KEY1; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.KEY2; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.TEST_FLAT_SCHEMA; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.bigTableRow; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.setFixedTimestamp; +import static org.apache.beam.sdk.io.gcp.bigtable.RowUtils.KEY; import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.junit.Assert.assertEquals; @@ -30,57 +36,61 @@ import static org.junit.Assert.assertTrue; import com.alibaba.fastjson.JSONObject; +import com.google.cloud.bigtable.emulator.v2.BigtableEmulatorRule; +import java.io.IOException; import org.apache.beam.sdk.extensions.sql.BeamSqlCli; import org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils; import org.apache.beam.sdk.extensions.sql.meta.Table; import org.apache.beam.sdk.extensions.sql.meta.store.InMemoryMetaStore; import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO; +import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.testing.PAssert; import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.transforms.MapElements; import org.apache.beam.sdk.transforms.SimpleFunction; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; +import org.junit.AfterClass; +import org.junit.BeforeClass; +import org.junit.ClassRule; import org.junit.Rule; import org.junit.Test; -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) -public class BigtableTableFlatTest extends BigtableTableTest { +public class BigtableTableFlatTest { + @ClassRule + public static final BigtableEmulatorRule BIGTABLE_EMULATOR = BigtableEmulatorRule.create(); + + @Rule public TestPipeline readPipeline = TestPipeline.create(); @Rule public TestPipeline writePipeline = TestPipeline.create(); - private String createFlatTableString(String table) { - return "CREATE EXTERNAL TABLE " - + table - + "( \n" - + " key VARCHAR NOT NULL, \n" - + " boolColumn BOOLEAN NOT NULL, \n" - + " longColumn BIGINT NOT NULL, \n" - + " stringColumn VARCHAR NOT NULL, \n" - + " doubleColumn DOUBLE NOT NULL \n" - + ") \n" - + "TYPE bigtable \n" - + "LOCATION '" - + getLocation(table) - + "' \n" - + "TBLPROPERTIES '{ \n" - + " \"columnsMapping\": \"" - + columnsMappingString() - + "\"}'"; + private static BigtableClientWrapper emulatorWrapper; + + private static final String PROJECT = "fakeProject"; + private static final String INSTANCE = "fakeInstance"; + + @BeforeClass + public static void setUp() throws Exception { + emulatorWrapper = + new BigtableClientWrapper(PROJECT, INSTANCE, BIGTABLE_EMULATOR.getPort(), null); + } + + @AfterClass + public 
static void tearDown() throws IOException { + emulatorWrapper.closeSession(); } @Test public void testCreatesFlatSchemaCorrectly() { + final String tableId = "flatTableSchema"; InMemoryMetaStore metaStore = new InMemoryMetaStore(); metaStore.registerProvider(new BigtableTableProvider()); BeamSqlCli cli = new BeamSqlCli().metaStore(metaStore); - cli.execute(createFlatTableString("flatTable")); + cli.execute(createFlatTableString(tableId, location(tableId))); - Table table = metaStore.getTables().get("flatTable"); + Table table = metaStore.getTables().get(tableId); assertNotNull(table); assertEquals(TEST_FLAT_SCHEMA, table.getSchema()); @@ -90,19 +100,13 @@ public void testCreatesFlatSchemaCorrectly() { } @Test - public void testSimpleSelectFlat() throws Exception { - createReadTable("flatTable"); + public void testSimpleSelectFlat() { + final String tableId = "flatTable"; + createReadTable(tableId, emulatorWrapper); BeamSqlEnv sqlEnv = BeamSqlEnv.inMemory(new BigtableTableProvider()); - sqlEnv.executeDdl(createFlatTableString("flatTable")); + sqlEnv.executeDdl(createFlatTableString(tableId, location(tableId))); - String query = - "SELECT \n" - + " ft.key, \n" - + " ft.boolColumn, \n" - + " ft.longColumn, \n" - + " ft.stringColumn, \n" - + " ft.doubleColumn \n" - + "FROM flatTable ft"; + String query = "SELECT key, boolColumn, longColumn, stringColumn, doubleColumn FROM flatTable"; sqlEnv.parseQuery(query); PCollection queryOutput = @@ -110,32 +114,64 @@ public void testSimpleSelectFlat() throws Exception { assertThat(queryOutput.getSchema(), equalTo(TEST_FLAT_SCHEMA)); - PAssert.that(queryOutput).containsInAnyOrder(row(KEY1), row(KEY2)); + PAssert.that(queryOutput).containsInAnyOrder(flatRow(KEY1), flatRow(KEY2)); + readPipeline.run().waitUntilFinish(); + } + + @Test + public void testSelectFlatKeyRegexQuery() { + final String tableId = "regexTable"; + createReadTable(tableId, emulatorWrapper); + BeamSqlEnv sqlEnv = BeamSqlEnv.inMemory(new BigtableTableProvider()); + sqlEnv.executeDdl(createFlatTableString(tableId, location(tableId))); + + String query = "SELECT key FROM regexTable WHERE key LIKE '^key[0134]{1}'"; + + sqlEnv.parseQuery(query); + PCollection queryOutput = + BeamSqlRelUtils.toPCollection(readPipeline, sqlEnv.parseQuery(query)); + + assertThat(queryOutput.getSchema(), equalTo(filterSchema())); + + PAssert.that(queryOutput).containsInAnyOrder(filterRow(KEY1)); readPipeline.run().waitUntilFinish(); } @Test public void testSimpleInsert() { - createTable("beamWriteTable"); + final String tableId = "beamWriteTable"; + emulatorWrapper.createTable(tableId, FAMILY_TEST); BeamSqlEnv sqlEnv = BeamSqlEnv.inMemory(new BigtableTableProvider()); - sqlEnv.executeDdl(createFlatTableString("beamWriteTable")); + sqlEnv.executeDdl(createFlatTableString(tableId, location(tableId))); String query = "INSERT INTO beamWriteTable(key, boolColumn, longColumn, stringColumn, doubleColumn) " - + "VALUES ('key', TRUE, 10, 'stringValue', 5.5)"; + + "VALUES ('key', TRUE, CAST(10 AS bigint), 'stringValue', 5.5)"; BeamSqlRelUtils.toPCollection(writePipeline, sqlEnv.parseQuery(query)); writePipeline.run().waitUntilFinish(); PCollection bigTableRows = readPipeline - .apply(readTransform("beamWriteTable")) + .apply(readTransform(tableId)) .apply(MapElements.via(new ReplaceCellTimestamp())); PAssert.that(bigTableRows).containsInAnyOrder(bigTableRow()); readPipeline.run().waitUntilFinish(); } + private String location(String tableId) { + return BigtableTableTestUtils.location(PROJECT, INSTANCE, tableId, 
BIGTABLE_EMULATOR.getPort()); + } + + private Schema filterSchema() { + return Schema.builder().addStringField(KEY).build(); + } + + private Row filterRow(String key) { + return Row.withSchema(filterSchema()).attachValues(key); + } + private static class ReplaceCellTimestamp extends SimpleFunction { @Override @@ -144,16 +180,7 @@ public com.google.bigtable.v2.Row apply(com.google.bigtable.v2.Row input) { } } - private String columnsMappingString() { - return "familyTest:boolColumn,familyTest:longColumn,familyTest:doubleColumn," - + "familyTest:stringColumn"; - } - - private static Row row(String key) { - return Row.withSchema(TEST_FLAT_SCHEMA).attachValues(key, false, 2L, "string2", 2.20); - } - - private static BigtableIO.Read readTransform(String table) { + private BigtableIO.Read readTransform(String table) { return BigtableIO.read() .withProjectId("fakeProject") .withInstanceId("fakeInstance") diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableTableIT.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableTableIT.java new file mode 100644 index 000000000000..0a9ab2ffc94b --- /dev/null +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableTableIT.java @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.sql.meta.provider.bigtable; + +import static org.apache.beam.sdk.extensions.sql.meta.provider.bigtable.BigtableTableTestUtils.FAMILY_TEST; +import static org.apache.beam.sdk.extensions.sql.meta.provider.bigtable.BigtableTableTestUtils.NOW; +import static org.apache.beam.sdk.extensions.sql.meta.provider.bigtable.BigtableTableTestUtils.TEST_FLAT_SCHEMA; +import static org.apache.beam.sdk.extensions.sql.meta.provider.bigtable.BigtableTableTestUtils.createFlatTableString; +import static org.apache.beam.sdk.extensions.sql.meta.provider.bigtable.BigtableTableTestUtils.createFullTableString; +import static org.apache.beam.sdk.extensions.sql.meta.provider.bigtable.BigtableTableTestUtils.expectedFullSchema; +import static org.apache.beam.sdk.io.gcp.bigtable.RowUtils.TIMESTAMP_MICROS; + +import com.google.auth.Credentials; +import com.google.cloud.bigtable.emulator.v2.Emulator; +import java.util.UUID; +import org.apache.beam.sdk.Pipeline; +import org.apache.beam.sdk.extensions.gcp.options.GcpOptions; +import org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv; +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils; +import org.apache.beam.sdk.options.Default; +import org.apache.beam.sdk.options.Description; +import org.apache.beam.sdk.options.PipelineOptionsFactory; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.testing.TestPipelineOptions; +import org.apache.beam.sdk.transforms.MapElements; +import org.apache.beam.sdk.transforms.SimpleFunction; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; +import org.junit.AfterClass; +import org.junit.BeforeClass; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; +import org.testcontainers.shaded.com.google.common.collect.ImmutableList; + +@RunWith(JUnit4.class) +public class BigtableTableIT { + private static BigtableTestOptions options; + private static BigtableClientWrapper clientWrapper; + private static final String TABLE_ID = "Beam" + UUID.randomUUID(); + private static Emulator emulator; + + @BeforeClass + public static void setup() throws Exception { + PipelineOptionsFactory.register(BigtableTestOptions.class); + options = TestPipeline.testingPipelineOptions().as(BigtableTestOptions.class); + + if (options.isWithEmulator()) { + emulator = Emulator.createBundled(); + emulator.start(); + } + Credentials credentials = + options.isWithEmulator() ? null : options.as(GcpOptions.class).getGcpCredential(); + Integer emulatorPort = options.isWithEmulator() ? 
emulator.getPort() : null; + + clientWrapper = + new BigtableClientWrapper( + options.getBigtableProject(), options.getInstanceId(), emulatorPort, credentials); + + clientWrapper.createTable(TABLE_ID, FAMILY_TEST); + } + + @AfterClass + public static void tearDown() throws Exception { + clientWrapper.deleteTable(TABLE_ID); + clientWrapper.closeSession(); + if (emulator != null) { + emulator.stop(); + } + } + + @Test + public void testWriteThenRead() { + writeData(); + readFlatData(); + readData(); + } + + private void writeData() { + Pipeline p = Pipeline.create(options); + BeamSqlEnv sqlEnv = BeamSqlEnv.inMemory(new BigtableTableProvider()); + sqlEnv.executeDdl(createFlatTableString(TABLE_ID, location())); + + String query = + String.format( + "INSERT INTO `%s`(key, boolColumn, longColumn, stringColumn, doubleColumn) " + + "VALUES ('key1', FALSE, CAST(1 as bigint), 'string1', 1.0)", + TABLE_ID); + + BeamSqlRelUtils.toPCollection(p, sqlEnv.parseQuery(query)); + p.run().waitUntilFinish(); + } + + private void readFlatData() { + Pipeline p = Pipeline.create(options); + BeamSqlEnv sqlEnv = BeamSqlEnv.inMemory(new BigtableTableProvider()); + sqlEnv.executeDdl(createFlatTableString(TABLE_ID, location())); + String query = "SELECT * FROM `" + TABLE_ID + "`"; + + PCollection flatRows = BeamSqlRelUtils.toPCollection(p, sqlEnv.parseQuery(query)); + + PAssert.that(flatRows).containsInAnyOrder(expectedFlatRow(1)); + p.run().waitUntilFinish(); + } + + private void readData() { + Pipeline p = Pipeline.create(options); + BeamSqlEnv sqlEnv = BeamSqlEnv.inMemory(new BigtableTableProvider()); + sqlEnv.executeDdl(createFullTableString(TABLE_ID, location())); + String query = + String.format( + "SELECT key, " + + " t.familyTest.boolColumn, " + + " t.familyTest.longColumn.val AS longValue, " + + " t.familyTest.longColumn.timestampMicros, " + + " t.familyTest.longColumn.labels, " + + " t.familyTest.stringColumn, " + + " t.familyTest.doubleColumn " + + "FROM `%s` t", + TABLE_ID); + + PCollection rows = + BeamSqlRelUtils.toPCollection(p, sqlEnv.parseQuery(query)) + .apply(MapElements.via(new ReplaceTimestamp())) + .setRowSchema(expectedFullSchema()); + + PAssert.that(rows).containsInAnyOrder(expectedFullRow(1)); + p.run().waitUntilFinish(); + } + + private Row expectedFullRow(int i) { + return Row.withSchema(expectedFullSchema()) + .attachValues( + "key" + i, + i % 2 == 0, + (long) i, + NOW, + ImmutableList.of(), + ImmutableList.of("string" + i), + (double) i); + } + + private Row expectedFlatRow(int i) { + return Row.withSchema(TEST_FLAT_SCHEMA) + .attachValues("key" + i, i % 2 == 0, (long) i, "string" + i, (double) i); + } + + private static class ReplaceTimestamp extends SimpleFunction { + @Override + public Row apply(Row input) { + return Row.fromRow(input).withFieldValue(TIMESTAMP_MICROS, NOW).build(); + } + } + + private String location() { + Integer emulatorPort = options.isWithEmulator() ? emulator.getPort() : null; + return BigtableTableTestUtils.location( + options.getBigtableProject(), options.getInstanceId(), TABLE_ID, emulatorPort); + } + + /** Properties needed when using Bigtable with the Beam SDK. 
*/ + public interface BigtableTestOptions extends TestPipelineOptions { + @Description("Instance ID for Bigtable") + @Default.String("fakeInstance") + String getInstanceId(); + + void setInstanceId(String value); + + @Description("Project for Bigtable") + @Default.String("fakeProject") + String getBigtableProject(); + + void setBigtableProject(String value); + + @Description("Whether to use emulator") + @Default.Boolean(true) + Boolean isWithEmulator(); + + void setWithEmulator(Boolean value); + } +} diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableTableTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableTableTest.java deleted file mode 100644 index 94d1df219529..000000000000 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableTableTest.java +++ /dev/null @@ -1,94 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ -package org.apache.beam.sdk.extensions.sql.meta.provider.bigtable; - -import static java.nio.charset.StandardCharsets.UTF_8; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.BINARY_COLUMN; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.BOOL_COLUMN; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.DOUBLE_COLUMN; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.FAMILY_TEST; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.KEY1; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.KEY2; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.LATER; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.LONG_COLUMN; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.NOW; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.STRING_COLUMN; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.booleanToByteArray; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.doubleToByteArray; - -import com.google.cloud.bigtable.emulator.v2.BigtableEmulatorRule; -import org.apache.beam.sdk.io.gcp.testing.BigtableEmulatorWrapper; -import org.apache.beam.sdk.testing.TestPipeline; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.primitives.Longs; -import org.junit.BeforeClass; -import org.junit.ClassRule; -import org.junit.Rule; - -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) -public abstract class BigtableTableTest { - - @ClassRule - public static final BigtableEmulatorRule BIGTABLE_EMULATOR = BigtableEmulatorRule.create(); - - @Rule public transient TestPipeline readPipeline = TestPipeline.create(); - - private static BigtableEmulatorWrapper emulatorWrapper; - - @BeforeClass - public static void setUp() throws Exception { - emulatorWrapper = - new BigtableEmulatorWrapper(BIGTABLE_EMULATOR.getPort(), "fakeProject", "fakeInstance"); - } - - protected static void createTable(String table) { - emulatorWrapper.createTable(table, FAMILY_TEST); - } - - protected static void createReadTable(String table) throws Exception { - createTable(table); - writeRow(KEY1, table); - writeRow(KEY2, table); - } - - protected static String getLocation(String table) { - return String.format( - "localhost:%s/bigtable/projects/fakeProject/instances/fakeInstance/tables/%s", - BIGTABLE_EMULATOR.getPort(), table); - } - - private static void writeRow(String key, String table) throws Exception { - emulatorWrapper.writeRow(key, table, FAMILY_TEST, BOOL_COLUMN, booleanToByteArray(true), NOW); - emulatorWrapper.writeRow( - key, table, FAMILY_TEST, BOOL_COLUMN, booleanToByteArray(false), LATER); - emulatorWrapper.writeRow( - key, table, FAMILY_TEST, STRING_COLUMN, "string1".getBytes(UTF_8), NOW); - emulatorWrapper.writeRow( - key, table, FAMILY_TEST, STRING_COLUMN, "string2".getBytes(UTF_8), LATER); - emulatorWrapper.writeRow(key, table, FAMILY_TEST, LONG_COLUMN, Longs.toByteArray(1L), NOW); - emulatorWrapper.writeRow(key, table, FAMILY_TEST, LONG_COLUMN, Longs.toByteArray(2L), LATER); - emulatorWrapper.writeRow(key, table, FAMILY_TEST, DOUBLE_COLUMN, doubleToByteArray(1.10), NOW); - emulatorWrapper.writeRow( - key, table, FAMILY_TEST, DOUBLE_COLUMN, doubleToByteArray(2.20), LATER); - emulatorWrapper.writeRow( - key, table, FAMILY_TEST, BINARY_COLUMN, "blob1".getBytes(UTF_8), LATER); - emulatorWrapper.writeRow( - key, table, FAMILY_TEST, BINARY_COLUMN, 
"blob2".getBytes(UTF_8), LATER); - } -} diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableTableTestUtils.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableTableTestUtils.java new file mode 100644 index 000000000000..f318676f6ba1 --- /dev/null +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableTableTestUtils.java @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.meta.provider.bigtable; + +import static java.nio.charset.StandardCharsets.UTF_8; +import static java.util.stream.Collectors.toList; +import static org.apache.beam.sdk.io.gcp.bigtable.RowUtils.KEY; +import static org.apache.beam.sdk.io.gcp.bigtable.RowUtils.LABELS; +import static org.apache.beam.sdk.io.gcp.bigtable.RowUtils.TIMESTAMP_MICROS; +import static org.apache.beam.sdk.io.gcp.bigtable.RowUtils.VALUE; +import static org.apache.beam.sdk.io.gcp.testing.BigtableUtils.booleanToByteArray; +import static org.apache.beam.sdk.io.gcp.testing.BigtableUtils.byteString; +import static org.apache.beam.sdk.io.gcp.testing.BigtableUtils.byteStringUtf8; +import static org.apache.beam.sdk.io.gcp.testing.BigtableUtils.doubleToByteArray; +import static org.apache.beam.sdk.io.gcp.testing.BigtableUtils.longToByteArray; +import static org.hamcrest.MatcherAssert.assertThat; +import static org.hamcrest.Matchers.containsString; +import static org.junit.Assert.fail; + +import com.google.bigtable.v2.Cell; +import com.google.bigtable.v2.Column; +import com.google.bigtable.v2.Family; +import java.util.List; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.primitives.Longs; +import org.checkerframework.checker.nullness.qual.Nullable; + +class BigtableTableTestUtils { + + static final String KEY1 = "key1"; + static final String KEY2 = "key2"; + + static final String BOOL_COLUMN = "boolColumn"; + static final String LONG_COLUMN = "longColumn"; + static final String STRING_COLUMN = "stringColumn"; + static final String DOUBLE_COLUMN = "doubleColumn"; + static final String FAMILY_TEST = "familyTest"; + + static final Schema LONG_COLUMN_SCHEMA = + Schema.builder() + .addInt64Field(VALUE) + .addInt64Field(TIMESTAMP_MICROS) + .addArrayField(LABELS, Schema.FieldType.STRING) + .build(); + + static final Schema TEST_FAMILY_SCHEMA = + Schema.builder() + .addBooleanField(BOOL_COLUMN) + .addRowField(LONG_COLUMN, LONG_COLUMN_SCHEMA) + .addArrayField(STRING_COLUMN, Schema.FieldType.STRING) 
+          .addDoubleField(DOUBLE_COLUMN)
+          .build();
+
+  static final Schema TEST_SCHEMA =
+      Schema.builder().addStringField(KEY).addRowField(FAMILY_TEST, TEST_FAMILY_SCHEMA).build();
+
+  static final Schema TEST_FLAT_SCHEMA =
+      Schema.builder()
+          .addStringField(KEY)
+          .addBooleanField(BOOL_COLUMN)
+          .addInt64Field(LONG_COLUMN)
+          .addStringField(STRING_COLUMN)
+          .addDoubleField(DOUBLE_COLUMN)
+          .build();
+
+  static final long NOW = 5_000_000_000L;
+  static final long LATER = NOW + 1_000L;
+
+  static String createFlatTableString(String table, String location) {
+    return String.format(
+        "CREATE EXTERNAL TABLE `%s`( \n"
+            + " key VARCHAR NOT NULL, \n"
+            + " boolColumn BOOLEAN NOT NULL, \n"
+            + " longColumn BIGINT NOT NULL, \n"
+            + " stringColumn VARCHAR NOT NULL, \n"
+            + " doubleColumn DOUBLE NOT NULL \n"
+            + ") \n"
+            + "TYPE bigtable \n"
+            + "LOCATION '%s' \n"
+            + "TBLPROPERTIES '{ \n"
+            + " \"columnsMapping\": \"%s\"}'",
+        table, location, columnsMappingString());
+  }
+
+  static String createFullTableString(String tableId, String location) {
+    return String.format(
+        "CREATE EXTERNAL TABLE `%s`( \n"
+            + " key VARCHAR NOT NULL, \n"
+            + " familyTest ROW< \n"
+            + " boolColumn BOOLEAN NOT NULL, \n"
+            + " longColumn ROW< \n"
+            + " val BIGINT NOT NULL, \n"
+            + " timestampMicros BIGINT NOT NULL, \n"
+            + " labels ARRAY<VARCHAR> NOT NULL \n"
+            + " > NOT NULL, \n"
+            + " stringColumn ARRAY<VARCHAR> NOT NULL, \n"
+            + " doubleColumn DOUBLE NOT NULL \n"
+            + " > NOT NULL \n"
+            + ") \n"
+            + "TYPE bigtable \n"
+            + "LOCATION '%s'",
+        tableId, location);
+  }
+
+  static Schema expectedFullSchema() {
+    return Schema.builder()
+        .addStringField(KEY)
+        .addBooleanField(BOOL_COLUMN)
+        .addInt64Field("longValue")
+        .addInt64Field(TIMESTAMP_MICROS)
+        .addArrayField(LABELS, Schema.FieldType.STRING)
+        .addArrayField(STRING_COLUMN, Schema.FieldType.STRING)
+        .addDoubleField(DOUBLE_COLUMN)
+        .build();
+  }
+
+  static Row expectedFullRow(String key) {
+    return Row.withSchema(expectedFullSchema())
+        .attachValues(
+            key,
+            false,
+            2L,
+            LATER,
+            ImmutableList.of(),
+            ImmutableList.of("string1", "string2"),
+            2.20);
+  }
+
+  static Row flatRow(String key) {
+    return Row.withSchema(TEST_FLAT_SCHEMA).attachValues(key, false, 2L, "string2", 2.20);
+  }
+
+  static String location(
+      String project, String instanceId, String tableId, @Nullable Integer emulatorPort) {
+    String host = emulatorPort == null ? 
"googleapis.com" : "localhost:" + emulatorPort; + return String.format( + "%s/bigtable/projects/%s/instances/%s/tables/%s", host, project, instanceId, tableId); + } + + static String columnsMappingString() { + return "familyTest:boolColumn,familyTest:longColumn,familyTest:doubleColumn," + + "familyTest:stringColumn"; + } + + static void createReadTable(String table, BigtableClientWrapper clientWrapper) { + clientWrapper.createTable(table, FAMILY_TEST); + writeRow(KEY1, table, clientWrapper); + writeRow(KEY2, table, clientWrapper); + } + + static com.google.bigtable.v2.Row bigTableRow() { + List columns = + ImmutableList.of( + column("boolColumn", booleanToByteArray(true)), + column("doubleColumn", doubleToByteArray(5.5)), + column("longColumn", Longs.toByteArray(10L)), + column("stringColumn", "stringValue".getBytes(UTF_8))); + Family family = Family.newBuilder().setName("familyTest").addAllColumns(columns).build(); + return com.google.bigtable.v2.Row.newBuilder() + .setKey(byteStringUtf8("key")) + .addFamilies(family) + .build(); + } + + // There is no possibility to insert a value with fixed timestamp so we have to replace it + // for the testing purpose. + static com.google.bigtable.v2.Row setFixedTimestamp(com.google.bigtable.v2.Row row) { + Family family = row.getFamilies(0); + + List columnsReplaced = + family.getColumnsList().stream() + .map( + column -> { + Cell cell = column.getCells(0); + return column( + column.getQualifier().toStringUtf8(), cell.getValue().toByteArray()); + }) + .collect(toList()); + Family familyReplaced = + Family.newBuilder().setName(family.getName()).addAllColumns(columnsReplaced).build(); + return com.google.bigtable.v2.Row.newBuilder() + .setKey(row.getKey()) + .addFamilies(familyReplaced) + .build(); + } + + static void checkMessage(@Nullable String message, String substring) { + if (message != null) { + assertThat(message, containsString(substring)); + } else { + fail(); + } + } + + private static Column column(String qualifier, byte[] value) { + return Column.newBuilder() + .setQualifier(byteStringUtf8(qualifier)) + .addCells(cell(value)) + .build(); + } + + private static Cell cell(byte[] value) { + return Cell.newBuilder().setValue(byteString(value)).setTimestampMicros(NOW).build(); + } + + private static void writeRow(String key, String table, BigtableClientWrapper clientWrapper) { + clientWrapper.writeRow(key, table, FAMILY_TEST, BOOL_COLUMN, booleanToByteArray(true), NOW); + clientWrapper.writeRow(key, table, FAMILY_TEST, BOOL_COLUMN, booleanToByteArray(false), LATER); + clientWrapper.writeRow(key, table, FAMILY_TEST, STRING_COLUMN, "string1".getBytes(UTF_8), NOW); + clientWrapper.writeRow( + key, table, FAMILY_TEST, STRING_COLUMN, "string2".getBytes(UTF_8), LATER); + clientWrapper.writeRow(key, table, FAMILY_TEST, LONG_COLUMN, longToByteArray(1L), NOW); + clientWrapper.writeRow(key, table, FAMILY_TEST, LONG_COLUMN, longToByteArray(2L), LATER); + clientWrapper.writeRow(key, table, FAMILY_TEST, DOUBLE_COLUMN, doubleToByteArray(1.10), NOW); + clientWrapper.writeRow(key, table, FAMILY_TEST, DOUBLE_COLUMN, doubleToByteArray(2.20), LATER); + } +} diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableTableWithRowsTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableTableWithRowsTest.java index 7b0f2b54bff6..4b60eb13c894 100644 --- 
a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableTableWithRowsTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigtable/BigtableTableWithRowsTest.java @@ -17,62 +17,62 @@ */ package org.apache.beam.sdk.extensions.sql.meta.provider.bigtable; -import static java.nio.charset.StandardCharsets.UTF_8; import static java.util.stream.Collectors.toList; -import static org.apache.beam.sdk.io.gcp.bigtable.RowUtils.KEY; -import static org.apache.beam.sdk.io.gcp.bigtable.RowUtils.LABELS; -import static org.apache.beam.sdk.io.gcp.bigtable.RowUtils.TIMESTAMP_MICROS; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.BINARY_COLUMN; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.BOOL_COLUMN; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.DOUBLE_COLUMN; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.KEY1; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.KEY2; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.LATER; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.STRING_COLUMN; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.TEST_SCHEMA; +import static org.apache.beam.sdk.extensions.sql.meta.provider.bigtable.BigtableTableTestUtils.KEY1; +import static org.apache.beam.sdk.extensions.sql.meta.provider.bigtable.BigtableTableTestUtils.KEY2; +import static org.apache.beam.sdk.extensions.sql.meta.provider.bigtable.BigtableTableTestUtils.STRING_COLUMN; +import static org.apache.beam.sdk.extensions.sql.meta.provider.bigtable.BigtableTableTestUtils.TEST_SCHEMA; +import static org.apache.beam.sdk.extensions.sql.meta.provider.bigtable.BigtableTableTestUtils.createFullTableString; +import static org.apache.beam.sdk.extensions.sql.meta.provider.bigtable.BigtableTableTestUtils.createReadTable; +import static org.apache.beam.sdk.extensions.sql.meta.provider.bigtable.BigtableTableTestUtils.expectedFullRow; +import static org.apache.beam.sdk.extensions.sql.meta.provider.bigtable.BigtableTableTestUtils.expectedFullSchema; import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNotNull; +import com.google.cloud.bigtable.emulator.v2.BigtableEmulatorRule; +import java.io.IOException; import org.apache.beam.sdk.extensions.sql.BeamSqlCli; import org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils; import org.apache.beam.sdk.extensions.sql.meta.Table; import org.apache.beam.sdk.extensions.sql.meta.store.InMemoryMetaStore; -import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.transforms.MapElements; import org.apache.beam.sdk.transforms.SimpleFunction; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.junit.AfterClass; +import org.junit.BeforeClass; +import org.junit.ClassRule; +import org.junit.Rule; import org.junit.Test; -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) -public class BigtableTableWithRowsTest extends BigtableTableTest { - - private String createTableString() { - return 
"CREATE EXTERNAL TABLE beamTable( \n" - + " key VARCHAR NOT NULL, \n" - + " familyTest ROW< \n" - + " boolColumn BOOLEAN NOT NULL, \n" - + " longColumn ROW< \n" - + " val BIGINT NOT NULL, \n" - + " timestampMicros BIGINT NOT NULL, \n" - + " labels ARRAY NOT NULL \n" - + " > NOT NULL, \n" - + " stringColumn ARRAY NOT NULL, \n" - + " doubleColumn DOUBLE NOT NULL, \n" - + " binaryColumn BINARY NOT NULL \n" - + " > NOT NULL \n" - + ") \n" - + "TYPE bigtable \n" - + "LOCATION '" - + getLocation("beamTable") - + "'"; +public class BigtableTableWithRowsTest { + + @ClassRule + public static final BigtableEmulatorRule BIGTABLE_EMULATOR = BigtableEmulatorRule.create(); + + @Rule public TestPipeline readPipeline = TestPipeline.create(); + + private static BigtableClientWrapper emulatorWrapper; + + private static final String PROJECT = "fakeProject"; + private static final String INSTANCE = "fakeInstance"; + private static final String TABLE = "beamTable"; + + @BeforeClass + public static void setUp() throws Exception { + emulatorWrapper = + new BigtableClientWrapper("fakeProject", "fakeInstance", BIGTABLE_EMULATOR.getPort(), null); + } + + @AfterClass + public static void tearDown() throws IOException { + emulatorWrapper.closeSession(); } @Test @@ -81,7 +81,7 @@ public void testCreatesSchemaCorrectly() { metaStore.registerProvider(new BigtableTableProvider()); BeamSqlCli cli = new BeamSqlCli().metaStore(metaStore); - cli.execute(createTableString()); + cli.execute(createFullTableString(TABLE, location())); Table table = metaStore.getTables().get("beamTable"); assertNotNull(table); @@ -89,59 +89,36 @@ public void testCreatesSchemaCorrectly() { } @Test - public void testSimpleSelect() throws Exception { - createReadTable("beamTable"); + public void testSimpleSelect() { + createReadTable(TABLE, emulatorWrapper); BeamSqlEnv sqlEnv = BeamSqlEnv.inMemory(new BigtableTableProvider()); - sqlEnv.executeDdl(createTableString()); + sqlEnv.executeDdl(createFullTableString(TABLE, location())); String query = - "" - + "SELECT key, \n" + "SELECT key, \n" + " bt.familyTest.boolColumn, \n" + " bt.familyTest.longColumn.val AS longValue, \n" + " bt.familyTest.longColumn.timestampMicros, \n" + " bt.familyTest.longColumn.labels, \n" + " bt.familyTest.stringColumn, \n" - + " bt.familyTest.doubleColumn, \n" - + " bt.familyTest.binaryColumn \n" + + " bt.familyTest.doubleColumn \n" + "FROM beamTable bt"; sqlEnv.parseQuery(query); PCollection queryOutput = BeamSqlRelUtils.toPCollection(readPipeline, sqlEnv.parseQuery(query)); - assertThat(queryOutput.getSchema(), equalTo(expectedSchema())); + assertThat(queryOutput.getSchema(), equalTo(expectedFullSchema())); PCollection sorted = - queryOutput.apply(MapElements.via(new SortByTimestamp())).setRowSchema(expectedSchema()); + queryOutput + .apply(MapElements.via(new SortByTimestamp())) + .setRowSchema(expectedFullSchema()); - PAssert.that(sorted) - .containsInAnyOrder(row(expectedSchema(), KEY1), row(expectedSchema(), KEY2)); + PAssert.that(sorted).containsInAnyOrder(expectedFullRow(KEY1), expectedFullRow(KEY2)); readPipeline.run().waitUntilFinish(); } - private static Schema expectedSchema() { - return Schema.builder() - .addStringField(KEY) - .addBooleanField(BOOL_COLUMN) - .addInt64Field("longValue") - .addInt64Field(TIMESTAMP_MICROS) - .addArrayField(LABELS, Schema.FieldType.STRING) - .addArrayField(STRING_COLUMN, Schema.FieldType.STRING) - .addDoubleField(DOUBLE_COLUMN) - .addByteArrayField(BINARY_COLUMN) - .build(); - } - - private static Row row(Schema schema, String key) 
{ - return Row.withSchema(schema) - .attachValues( - key, - false, - 2L, - LATER, - ImmutableList.of(), - ImmutableList.of("string1", "string2"), - 2.20, - "blob2".getBytes(UTF_8)); + private String location() { + return BigtableTableTestUtils.location(PROJECT, INSTANCE, TABLE, BIGTABLE_EMULATOR.getPort()); } private static class SortByTimestamp extends SimpleFunction { diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/datastore/DataStoreReadWriteIT.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/datastore/DataStoreReadWriteIT.java index c2c499a5cce6..107da0d29b23 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/datastore/DataStoreReadWriteIT.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/datastore/DataStoreReadWriteIT.java @@ -50,7 +50,7 @@ import org.apache.beam.sdk.transforms.Create; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.util.ByteString; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.util.ByteString; import org.joda.time.Duration; import org.junit.Rule; import org.junit.Test; diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTableAvroTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTableAvroTest.java index 987c64c77090..fee04e49a8c3 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTableAvroTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTableAvroTest.java @@ -17,12 +17,14 @@ */ package org.apache.beam.sdk.extensions.sql.meta.provider.kafka; +import com.alibaba.fastjson.JSON; import java.io.ByteArrayOutputStream; import java.io.IOException; import java.util.List; import org.apache.avro.generic.GenericRecord; import org.apache.avro.generic.GenericRecordBuilder; import org.apache.beam.sdk.coders.AvroCoder; +import org.apache.beam.sdk.extensions.sql.meta.Table; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.schemas.utils.AvroUtils; import org.apache.beam.sdk.values.Row; @@ -84,6 +86,15 @@ protected byte[] generateEncodedPayload(int i) { @Override protected BeamKafkaTable getBeamKafkaTable() { - return new BeamKafkaAvroTable(TEST_SCHEMA, "", ImmutableList.of()); + return (BeamKafkaTable) + (new KafkaTableProvider() + .buildBeamSqlTable( + Table.builder() + .name("kafka") + .type("kafka") + .schema(TEST_SCHEMA) + .location("localhost/mytopic") + .properties(JSON.parseObject("{ \"format\": \"avro\" }")) + .build())); } } diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTableCSVTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTableCSVTest.java index 085c33112dbf..5624f0051c02 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTableCSVTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTableCSVTest.java @@ -49,7 +49,7 @@ protected Row generateRow(int i) { @Override protected BeamKafkaTable 
getBeamKafkaTable() { - return new BeamKafkaCSVTable(TEST_SCHEMA, "", ImmutableList.of()); + return new BeamKafkaCSVTable(TEST_SCHEMA, "", ImmutableList.of("mytopic")); } private String createCsv(int i) { diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTableJsonTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTableJsonTest.java index 842d55116486..d33665cf2131 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTableJsonTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTableJsonTest.java @@ -19,7 +19,9 @@ import static java.nio.charset.StandardCharsets.UTF_8; +import com.alibaba.fastjson.JSON; import java.util.List; +import org.apache.beam.sdk.extensions.sql.meta.Table; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.values.Row; import org.testcontainers.shaded.com.google.common.collect.ImmutableList; @@ -57,7 +59,16 @@ protected Row generateRow(int i) { @Override protected BeamKafkaTable getBeamKafkaTable() { - return new BeamKafkaJsonTable(TEST_SCHEMA, "", ImmutableList.of()); + return (BeamKafkaTable) + (new KafkaTableProvider() + .buildBeamSqlTable( + Table.builder() + .name("kafka") + .type("kafka") + .schema(TEST_SCHEMA) + .location("localhost/mytopic") + .properties(JSON.parseObject("{ \"format\": \"json\" }")) + .build())); } private String createJson(int i) { diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTableProtoTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTableProtoTest.java index 47484e753f43..a75dded1aa44 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTableProtoTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTableProtoTest.java @@ -17,24 +17,27 @@ */ package org.apache.beam.sdk.extensions.sql.meta.provider.kafka; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsString; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertThrows; +import com.alibaba.fastjson.JSON; import java.util.List; +import org.apache.beam.sdk.coders.ByteArrayCoder; +import org.apache.beam.sdk.extensions.protobuf.PayloadMessages; +import org.apache.beam.sdk.extensions.sql.meta.Table; +import org.apache.beam.sdk.io.kafka.KafkaRecordCoder; +import org.apache.beam.sdk.io.kafka.ProducerRecordCoder; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.testing.PAssert; import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.MapElements; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.junit.Test; -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamKafkaTableProtoTest extends BeamKafkaTableTest { - private static final Schema TEST_SCHEMA = Schema.builder() .addInt64Field("f_long") @@ -54,17 +57,18 @@ public class BeamKafkaTableProtoTest extends BeamKafkaTableTest { .build(); @Test - public void testWithShuffledSchema() { - BeamKafkaTable 
kafkaTable = - new BeamKafkaProtoTable( - SHUFFLED_SCHEMA, "", ImmutableList.of(), KafkaMessages.TestMessage.class); + public void testWithShuffledSchema() throws Exception { + BeamKafkaTable kafkaTable = getBeamKafkaTable(SHUFFLED_SCHEMA); PCollection result = pipeline .apply(Create.of(shuffledRow(1), shuffledRow(2))) .apply(kafkaTable.getPTransformForOutput()) + .setCoder(ProducerRecordCoder.of(ByteArrayCoder.of(), ByteArrayCoder.of())) + .apply(MapElements.via(new ProducerToRecord())) + .setCoder(KafkaRecordCoder.of(ByteArrayCoder.of(), ByteArrayCoder.of())) .apply(kafkaTable.getPTransformForInput()); - PAssert.that(result).containsInAnyOrder(generateRow(1), generateRow(2)); + PAssert.that(result).containsInAnyOrder(shuffledRow(1), shuffledRow(2)); pipeline.run(); } @@ -73,21 +77,33 @@ public void testSchemasDoNotMatch() { Schema schema = Schema.builder().addStringField("non_existing_field").build(); IllegalArgumentException e = - assertThrows( - IllegalArgumentException.class, - () -> - new BeamKafkaProtoTable( - schema, "", ImmutableList.of(), KafkaMessages.TestMessage.class)); + assertThrows(IllegalArgumentException.class, () -> getBeamKafkaTable(schema)); assertThat( e.getMessage(), containsString("does not match schema inferred from protobuf class.\nProtobuf class: ")); } + private static BeamKafkaTable getBeamKafkaTable(Schema schema) { + return (BeamKafkaTable) + (new KafkaTableProvider() + .buildBeamSqlTable( + Table.builder() + .name("kafka") + .type("kafka") + .schema(schema) + .location("localhost/mytopic") + .properties( + JSON.parseObject( + "{ \"format\": \"proto\", \"protoClass\": \"" + + PayloadMessages.TestMessage.class.getName() + + "\" }")) + .build())); + } + @Override protected BeamKafkaTable getBeamKafkaTable() { - return new BeamKafkaProtoTable( - TEST_SCHEMA, "", ImmutableList.of(), KafkaMessages.TestMessage.class); + return getBeamKafkaTable(TEST_SCHEMA); } @Override @@ -99,8 +115,8 @@ protected Row generateRow(int i) { @Override protected byte[] generateEncodedPayload(int i) { - KafkaMessages.TestMessage message = - KafkaMessages.TestMessage.newBuilder() + PayloadMessages.TestMessage message = + PayloadMessages.TestMessage.newBuilder() .setFLong(i) .setFInt(i) .setFDouble(i) diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTableStatisticsTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTableStatisticsTest.java index a372fb458476..eb291dddc102 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTableStatisticsTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTableStatisticsTest.java @@ -26,9 +26,6 @@ import org.junit.Assert; import org.junit.Test; -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamKafkaTableStatisticsTest { @Test public void testOrderedArrivalSinglePartitionRate() { diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTableTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTableTest.java index 28d4079e6632..12cb7e5f42a1 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTableTest.java +++ 
b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTableTest.java
@@ -17,6 +17,11 @@
 */
package org.apache.beam.sdk.extensions.sql.meta.provider.kafka;

+import org.apache.beam.sdk.coders.ByteArrayCoder;
+import org.apache.beam.sdk.io.kafka.KafkaRecord;
+import org.apache.beam.sdk.io.kafka.KafkaRecordCoder;
+import org.apache.beam.sdk.io.kafka.KafkaTimestampType;
+import org.apache.beam.sdk.io.kafka.ProducerRecordCoder;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.transforms.Create;
@@ -25,6 +30,8 @@
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;
+import org.apache.kafka.clients.producer.ProducerRecord;
+import org.apache.kafka.common.header.internals.RecordHeaders;
import org.junit.Rule;
import org.junit.Test;
@@ -48,7 +55,8 @@ public void testRecorderDecoder() throws Exception {
    PCollection<Row> result =
        pipeline
            .apply(Create.of(generateEncodedPayload(1), generateEncodedPayload(2)))
-            .apply(MapElements.via(new ToKV()))
+            .apply(MapElements.via(new BytesToRecord()))
+            .setCoder(KafkaRecordCoder.of(ByteArrayCoder.of(), ByteArrayCoder.of()))
            .apply(kafkaTable.getPTransformForInput());

    PAssert.that(result).containsInAnyOrder(generateRow(1), generateRow(2));
@@ -62,15 +70,41 @@ public void testRecorderEncoder() {
        pipeline
            .apply(Create.of(generateRow(1), generateRow(2)))
            .apply(kafkaTable.getPTransformForOutput())
+            .setCoder(ProducerRecordCoder.of(ByteArrayCoder.of(), ByteArrayCoder.of()))
+            .apply(MapElements.via(new ProducerToRecord()))
+            .setCoder(KafkaRecordCoder.of(ByteArrayCoder.of(), ByteArrayCoder.of()))
            .apply(kafkaTable.getPTransformForInput());

    PAssert.that(result).containsInAnyOrder(generateRow(1), generateRow(2));
    pipeline.run();
  }

-  private static class ToKV extends SimpleFunction<byte[], KV<byte[], byte[]>> {
+  private static class BytesToRecord extends SimpleFunction<byte[], KafkaRecord<byte[], byte[]>> {
    @Override
-    public KV<byte[], byte[]> apply(byte[] bytes) {
-      return KV.of(new byte[] {}, bytes);
+    public KafkaRecord<byte[], byte[]> apply(byte[] bytes) {
+      return new KafkaRecord<>(
+          "abc",
+          0,
+          0,
+          0,
+          KafkaTimestampType.LOG_APPEND_TIME,
+          new RecordHeaders(),
+          KV.of(new byte[] {}, bytes));
+    }
+  }
+
+  static class ProducerToRecord
+      extends SimpleFunction<ProducerRecord<byte[], byte[]>, KafkaRecord<byte[], byte[]>> {
+    @Override
+    public KafkaRecord<byte[], byte[]> apply(ProducerRecord<byte[], byte[]> record) {
+      return new KafkaRecord<>(
+          record.topic(),
+          record.partition() != null ? record.partition() : 0,
+          0,
+          0,
+          KafkaTimestampType.LOG_APPEND_TIME,
+          record.headers(),
+          record.key(),
+          record.value());
    }
  }
}
diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTableThriftTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTableThriftTest.java
new file mode 100644
index 000000000000..958ca632b32a
--- /dev/null
+++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTableThriftTest.java
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.meta.provider.kafka; + +import static org.hamcrest.MatcherAssert.assertThat; +import static org.hamcrest.Matchers.containsString; +import static org.junit.Assert.assertThrows; + +import com.alibaba.fastjson.JSON; +import java.util.List; +import org.apache.beam.sdk.coders.ByteArrayCoder; +import org.apache.beam.sdk.extensions.sql.meta.Table; +import org.apache.beam.sdk.io.kafka.KafkaRecordCoder; +import org.apache.beam.sdk.io.kafka.ProducerRecordCoder; +import org.apache.beam.sdk.io.thrift.payloads.TestThriftMessage; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.MapElements; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.thrift.TException; +import org.apache.thrift.TSerializer; +import org.apache.thrift.protocol.TCompactProtocol; +import org.apache.thrift.protocol.TProtocolFactory; +import org.junit.Test; + +public class BeamKafkaTableThriftTest extends BeamKafkaTableTest { + private final TProtocolFactory protocolFactory = new TCompactProtocol.Factory(); + + private static final Schema TEST_SCHEMA = + Schema.builder() + .addInt64Field("f_long") + .addInt32Field("f_int") + .addDoubleField("f_double") + .addStringField("f_string") + .addArrayField("f_double_array", Schema.FieldType.DOUBLE) + .build(); + + private static final Schema SHUFFLED_SCHEMA = + Schema.builder() + .addStringField("f_string") + .addInt32Field("f_int") + .addArrayField("f_double_array", Schema.FieldType.DOUBLE) + .addDoubleField("f_double") + .addInt64Field("f_long") + .build(); + + @Test + public void testWithShuffledSchema() { + BeamKafkaTable kafkaTable = getBeamKafkaTable(SHUFFLED_SCHEMA); + + PCollection result = + pipeline + .apply(Create.of(shuffledRow(1), shuffledRow(2))) + .apply(kafkaTable.getPTransformForOutput()) + .setCoder(ProducerRecordCoder.of(ByteArrayCoder.of(), ByteArrayCoder.of())) + .apply(MapElements.via(new ProducerToRecord())) + .setCoder(KafkaRecordCoder.of(ByteArrayCoder.of(), ByteArrayCoder.of())) + .apply(kafkaTable.getPTransformForInput()); + PAssert.that(result).containsInAnyOrder(shuffledRow(1), shuffledRow(2)); + pipeline.run(); + } + + @Test + public void testSchemasDoNotMatch() { + Schema schema = Schema.builder().addStringField("non_existing_field").build(); + + IllegalArgumentException e = + assertThrows(IllegalArgumentException.class, () -> getBeamKafkaTable(schema)); + + assertThat( + e.getMessage(), + containsString("does not match schema inferred from thrift class.\nThrift class: ")); + } + + private static BeamKafkaTable getBeamKafkaTable(Schema schema) { + return (BeamKafkaTable) + (new KafkaTableProvider() + .buildBeamSqlTable( + Table.builder() + .name("kafka") + .type("kafka") + .schema(schema) + .location("localhost/mytopic") + .properties( + JSON.parseObject( + "{ \"format\": \"thrift\", \"thriftClass\": \"" + + 
TestThriftMessage.class.getName() + + "\", \"thriftProtocolFactoryClass\": \"" + + TCompactProtocol.Factory.class.getName() + + "\" }")) + .build())); + } + + @Override + protected BeamKafkaTable getBeamKafkaTable() { + return getBeamKafkaTable(TEST_SCHEMA); + } + + @Override + protected Row generateRow(int i) { + return Row.withSchema(TEST_SCHEMA) + .addValues((long) i, i, (double) i, "thrift_value" + i, ImmutableList.of((double) i)) + .build(); + } + + @Override + protected byte[] generateEncodedPayload(int i) { + final TestThriftMessage message = + new TestThriftMessage().setFLong(i).setFInt(i).setFDouble(i).setFString("thrift_value" + i); + message.addToFDoubleArray(i); + + try { + return new TSerializer(protocolFactory).serialize(message); + } catch (TException e) { + throw new RuntimeException(e); + } + } + + private Row shuffledRow(int i) { + List values = + ImmutableList.of("thrift_value" + i, i, ImmutableList.of((double) i), (double) i, (long) i); + return Row.withSchema(SHUFFLED_SCHEMA).addValues(values).build(); + } +} diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTableProviderIT.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTableProviderIT.java index 1a38911e5170..ec94b656ad41 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTableProviderIT.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTableProviderIT.java @@ -17,7 +17,14 @@ */ package org.apache.beam.sdk.extensions.sql.meta.provider.kafka; +import static java.nio.charset.StandardCharsets.UTF_8; +import static org.apache.beam.sdk.extensions.sql.impl.schema.BeamTableUtils.beamRow2CsvLine; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + import com.alibaba.fastjson.JSON; +import java.io.Serializable; +import java.util.Arrays; +import java.util.Collection; import java.util.Map; import java.util.Properties; import java.util.Set; @@ -30,15 +37,22 @@ import org.apache.beam.sdk.coders.KvCoder; import org.apache.beam.sdk.coders.RowCoder; import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.extensions.protobuf.PayloadMessages; +import org.apache.beam.sdk.extensions.protobuf.ProtoMessageSchema; import org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils; import org.apache.beam.sdk.extensions.sql.meta.Table; import org.apache.beam.sdk.extensions.sql.meta.provider.TableProvider; +import org.apache.beam.sdk.io.thrift.ThriftCoder; +import org.apache.beam.sdk.io.thrift.ThriftSchema; +import org.apache.beam.sdk.io.thrift.payloads.ItThriftMessage; import org.apache.beam.sdk.options.Default; import org.apache.beam.sdk.options.Description; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.Validation; +import org.apache.beam.sdk.schemas.RowMessages; import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.utils.AvroUtils; import org.apache.beam.sdk.state.BagState; import org.apache.beam.sdk.state.StateSpec; import org.apache.beam.sdk.state.StateSpecs; @@ -51,51 +65,79 @@ import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; +import org.apache.beam.sdk.values.TypeDescriptor; import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; +import org.apache.commons.csv.CSVFormat; import org.apache.kafka.clients.producer.KafkaProducer; import org.apache.kafka.clients.producer.Producer; import org.apache.kafka.clients.producer.ProducerRecord; +import org.apache.thrift.protocol.TBinaryProtocol; +import org.apache.thrift.protocol.TProtocolFactory; import org.checkerframework.checker.nullness.qual.Nullable; import org.junit.Assert; +import org.junit.Assume; import org.junit.Before; +import org.junit.ClassRule; import org.junit.Rule; import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.Parameterized; +import org.junit.runners.Parameterized.Parameter; +import org.junit.runners.Parameterized.Parameters; import org.testcontainers.containers.KafkaContainer; import org.testcontainers.utility.DockerImageName; /** Integration Test utility for KafkaTableProvider implementations. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) -public abstract class KafkaTableProviderIT { +@RunWith(Parameterized.class) +public class KafkaTableProviderIT { private static final String KAFKA_CONTAINER_VERSION = "5.5.2"; @Rule public transient TestPipeline pipeline = TestPipeline.create(); - @Rule - public transient KafkaContainer kafka = + @ClassRule + public static final KafkaContainer KAFKA_CONTAINER = new KafkaContainer( DockerImageName.parse("confluentinc/cp-kafka").withTag(KAFKA_CONTAINER_VERSION)); - protected KafkaOptions kafkaOptions; + private static KafkaOptions kafkaOptions; - protected static final Schema TEST_TABLE_SCHEMA = + private static final Schema TEST_TABLE_SCHEMA = Schema.builder() .addInt64Field("f_long") .addInt32Field("f_int") .addStringField("f_string") .build(); - protected abstract ProducerRecord generateProducerRecord(int i); + @Parameters + public static Collection data() { + return Arrays.asList( + new Object[][] { + {new KafkaJsonObjectProvider(), "json_topic"}, + {new KafkaAvroObjectProvider(), "avro_topic"}, + {new KafkaProtoObjectProvider(), "proto_topic"}, + {new KafkaCsvObjectProvider(), "csv_topic"}, + {new KafkaThriftObjectProvider(), "thrift_topic"} + }); + } + + @Parameter public KafkaObjectProvider objectsProvider; - protected abstract String getPayloadFormat(); + @Parameter(1) + public String topic; @Before public void setUp() { kafkaOptions = pipeline.getOptions().as(KafkaOptions.class); - kafkaOptions.setKafkaTopic("topic"); - kafkaOptions.setKafkaBootstrapServerAddress(kafka.getBootstrapServers()); + kafkaOptions.setKafkaTopic(topic); + kafkaOptions.setKafkaBootstrapServerAddress(KAFKA_CONTAINER.getBootstrapServers()); + checkArgument( + !KAFKA_CONTAINER.getBootstrapServers().contains(","), + "This integration test expects exactly one bootstrap server."); + } + + private static String buildLocation() { + return kafkaOptions.getKafkaBootstrapServerAddress() + "/" + kafkaOptions.getKafkaTopic(); } @Test @@ -104,11 +146,11 @@ public void testFake2() throws BeamKafkaTable.NoEstimationException { Table table = Table.builder() .name("kafka_table") - .comment("kafka" + " table") - .location("") + .comment("kafka table") + .location(buildLocation()) .schema(TEST_TABLE_SCHEMA) .type("kafka") - .properties(JSON.parseObject(getKafkaPropertiesString())) + .properties(JSON.parseObject(objectsProvider.getKafkaPropertiesString())) .build(); BeamKafkaTable kafkaTable = (BeamKafkaTable) new 
KafkaTableProvider().buildBeamSqlTable(table); produceSomeRecordsWithDelay(100, 20); @@ -118,16 +160,6 @@ public void testFake2() throws BeamKafkaTable.NoEstimationException { Assert.assertTrue(rate2 > rate1); } - protected String getKafkaPropertiesString() { - return "{ " - + (getPayloadFormat() == null ? "" : "\"format\" : \"" + getPayloadFormat() + "\",") - + "\"bootstrap.servers\" : \"" - + kafkaOptions.getKafkaBootstrapServerAddress() - + "\",\"topics\":[\"" - + kafkaOptions.getKafkaTopic() - + "\"] }"; - } - static final transient Map FLAG = new ConcurrentHashMap<>(); @Test @@ -141,9 +173,9 @@ public void testFake() throws InterruptedException { + "f_string VARCHAR NOT NULL \n" + ") \n" + "TYPE 'kafka' \n" - + "LOCATION ''\n" + + "LOCATION '%s'\n" + "TBLPROPERTIES '%s'", - getKafkaPropertiesString()); + buildLocation(), objectsProvider.getKafkaPropertiesString()); TableProvider tb = new KafkaTableProvider(); BeamSqlEnv env = BeamSqlEnv.inMemory(tb); @@ -162,14 +194,65 @@ public void testFake() throws InterruptedException { ImmutableSet.of(generateRow(0), generateRow(1), generateRow(2))))); queryOutput.apply(logRecords("")); pipeline.run(); - TimeUnit.MILLISECONDS.sleep(3000); + TimeUnit.SECONDS.sleep(4); produceSomeRecords(3); for (int i = 0; i < 200; i++) { if (FLAG.getOrDefault(pipeline.getOptions().getOptionsId(), false)) { return; } - TimeUnit.MILLISECONDS.sleep(60); + TimeUnit.MILLISECONDS.sleep(90); + } + Assert.fail(); + } + + @Test + public void testFakeNested() throws InterruptedException { + Assume.assumeFalse(topic.equals("csv_topic")); + pipeline.getOptions().as(DirectOptions.class).setBlockOnRun(false); + String createTableString = + String.format( + "CREATE EXTERNAL TABLE kafka_table(\n" + + "headers ARRAY>>," + + "payload ROW<" + + "f_long BIGINT NOT NULL, \n" + + "f_int INTEGER NOT NULL, \n" + + "f_string VARCHAR NOT NULL \n" + + ">" + + ") \n" + + "TYPE 'kafka' \n" + + "LOCATION '%s'\n" + + "TBLPROPERTIES '%s'", + buildLocation(), objectsProvider.getKafkaPropertiesString()); + TableProvider tb = new KafkaTableProvider(); + BeamSqlEnv env = BeamSqlEnv.inMemory(tb); + + env.executeDdl(createTableString); + + PCollection queryOutput = + BeamSqlRelUtils.toPCollection( + pipeline, + env.parseQuery( + "SELECT kafka_table.payload.f_long, kafka_table.payload.f_int, kafka_table.payload.f_string FROM kafka_table")); + + queryOutput + .apply(ParDo.of(new FakeKvPair())) + .setCoder(KvCoder.of(StringUtf8Coder.of(), RowCoder.of(TEST_TABLE_SCHEMA))) + .apply( + "waitForSuccess", + ParDo.of( + new StreamAssertEqual( + ImmutableSet.of(generateRow(0), generateRow(1), generateRow(2))))); + queryOutput.apply(logRecords("")); + pipeline.run(); + TimeUnit.SECONDS.sleep(4); + produceSomeRecords(3); + + for (int i = 0; i < 200; i++) { + if (FLAG.getOrDefault(pipeline.getOptions().getOptionsId(), false)) { + return; + } + TimeUnit.MILLISECONDS.sleep(90); } Assert.fail(); } @@ -231,7 +314,7 @@ public void process( } } - protected Row generateRow(int i) { + private static Row generateRow(int i) { return Row.withSchema(TEST_TABLE_SCHEMA).addValues((long) i, i % 3 + 1, "value" + i).build(); } @@ -242,7 +325,7 @@ private void produceSomeRecords(int num) { .limit(num) .forEach( i -> { - ProducerRecord record = generateProducerRecord(i); + ProducerRecord record = objectsProvider.generateProducerRecord(i); producer.send(record); }); producer.flush(); @@ -256,7 +339,7 @@ private void produceSomeRecordsWithDelay(int num, int delayMilis) { .limit(num) .forEach( i -> { - ProducerRecord record = 
generateProducerRecord(i); + ProducerRecord record = objectsProvider.generateProducerRecord(i); producer.send(record); try { TimeUnit.MILLISECONDS.sleep(delayMilis); @@ -281,6 +364,127 @@ private Properties producerProps() { return props; } + private abstract static class KafkaObjectProvider implements Serializable { + + protected abstract ProducerRecord generateProducerRecord(int i); + + protected abstract String getPayloadFormat(); + + protected String getKafkaPropertiesString() { + return "{ " + + (getPayloadFormat() == null ? "" : "\"format\" : \"" + getPayloadFormat() + "\",") + + "}"; + } + } + + private static class KafkaJsonObjectProvider extends KafkaObjectProvider { + @Override + protected ProducerRecord generateProducerRecord(int i) { + return new ProducerRecord<>( + kafkaOptions.getKafkaTopic(), "k" + i, createJson(i).getBytes(UTF_8)); + } + + @Override + protected String getPayloadFormat() { + return "json"; + } + + private String createJson(int i) { + return String.format( + "{\"f_long\": %s, \"f_int\": %s, \"f_string\": \"%s\"}", i, i % 3 + 1, "value" + i); + } + } + + private static class KafkaProtoObjectProvider extends KafkaObjectProvider { + private final SimpleFunction toBytesFn = + ProtoMessageSchema.getRowToProtoBytesFn(PayloadMessages.ItMessage.class); + + @Override + protected ProducerRecord generateProducerRecord(int i) { + return new ProducerRecord<>( + kafkaOptions.getKafkaTopic(), "k" + i, toBytesFn.apply(generateRow(i))); + } + + @Override + protected String getPayloadFormat() { + return "proto"; + } + + @Override + protected String getKafkaPropertiesString() { + return "{ " + + "\"format\" : \"proto\"," + + "\"protoClass\": \"" + + PayloadMessages.ItMessage.class.getName() + + "\"}"; + } + } + + private static class KafkaCsvObjectProvider extends KafkaObjectProvider { + + @Override + protected ProducerRecord generateProducerRecord(int i) { + return new ProducerRecord<>( + kafkaOptions.getKafkaTopic(), + "k" + i, + beamRow2CsvLine(generateRow(i), CSVFormat.DEFAULT).getBytes(UTF_8)); + } + + @Override + protected String getPayloadFormat() { + return null; + } + } + + private static class KafkaAvroObjectProvider extends KafkaObjectProvider { + + private final SimpleFunction toBytesFn = + AvroUtils.getRowToAvroBytesFunction(TEST_TABLE_SCHEMA); + + @Override + protected ProducerRecord generateProducerRecord(int i) { + return new ProducerRecord<>( + kafkaOptions.getKafkaTopic(), "k" + i, toBytesFn.apply(generateRow(i))); + } + + @Override + protected String getPayloadFormat() { + return "avro"; + } + } + + private static class KafkaThriftObjectProvider extends KafkaObjectProvider { + private final Class thriftClass = ItThriftMessage.class; + private final TProtocolFactory protocolFactory = new TBinaryProtocol.Factory(); + private final SimpleFunction toBytesFn = + RowMessages.rowToBytesFn( + ThriftSchema.provider(), + TypeDescriptor.of(thriftClass), + ThriftCoder.of(thriftClass, protocolFactory)); + + @Override + protected ProducerRecord generateProducerRecord(int i) { + return new ProducerRecord<>( + kafkaOptions.getKafkaTopic(), "k" + i, toBytesFn.apply(generateRow(i))); + } + + @Override + protected String getKafkaPropertiesString() { + return "{ " + + "\"format\" : \"thrift\"," + + "\"thriftClass\": \"" + + thriftClass.getName() + + "\", \"thriftProtocolFactoryClass\": \"" + + protocolFactory.getClass().getName() + + "\"}"; + } + + @Override + protected String getPayloadFormat() { + return "thrift"; + } + } + /** Pipeline options specific for this test. 
*/ public interface KafkaOptions extends PipelineOptions { diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTableProviderProtoIT.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTableProviderProtoIT.java deleted file mode 100644 index a6622e5923d8..000000000000 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTableProviderProtoIT.java +++ /dev/null @@ -1,53 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package org.apache.beam.sdk.extensions.sql.meta.provider.kafka; - -import org.apache.beam.sdk.extensions.protobuf.ProtoMessageSchema; -import org.apache.beam.sdk.transforms.SimpleFunction; -import org.apache.beam.sdk.values.Row; -import org.apache.kafka.clients.producer.ProducerRecord; - -public class KafkaTableProviderProtoIT extends KafkaTableProviderIT { - private final SimpleFunction toBytesFn = - ProtoMessageSchema.getRowToProtoBytesFn(KafkaMessages.ItMessage.class); - - @Override - protected ProducerRecord generateProducerRecord(int i) { - return new ProducerRecord<>( - kafkaOptions.getKafkaTopic(), "k" + i, toBytesFn.apply(generateRow(i))); - } - - @Override - protected String getPayloadFormat() { - return "proto"; - } - - @Override - protected String getKafkaPropertiesString() { - return "{ " - + "\"format\" : \"proto\"," - + "\"bootstrap.servers\" : \"" - + kafkaOptions.getKafkaBootstrapServerAddress() - + "\",\"topics\":[\"" - + kafkaOptions.getKafkaTopic() - + "\"]," - + "\"protoClass\": \"" - + KafkaMessages.ItMessage.class.getName() - + "\"}"; - } -} diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTableProviderTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTableProviderTest.java index 565561d6779f..a2a663ad6ab1 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTableProviderTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTableProviderTest.java @@ -17,24 +17,30 @@ */ package org.apache.beam.sdk.extensions.sql.meta.provider.kafka; -import static org.apache.beam.sdk.schemas.Schema.toSchema; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNotNull; import static org.junit.Assert.assertTrue; import com.alibaba.fastjson.JSONArray; import com.alibaba.fastjson.JSONObject; -import java.util.stream.Stream; +import java.util.List; +import org.apache.beam.sdk.extensions.protobuf.PayloadMessages; import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable; import 
org.apache.beam.sdk.extensions.sql.meta.Table; +import org.apache.beam.sdk.io.thrift.payloads.SimpleThriftMessage; import org.apache.beam.sdk.schemas.Schema; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.thrift.TBase; +import org.apache.thrift.protocol.TCompactProtocol; +import org.apache.thrift.protocol.TProtocolFactory; import org.checkerframework.checker.nullness.qual.Nullable; import org.junit.Test; /** UnitTest for {@link KafkaTableProvider}. */ public class KafkaTableProviderTest { private final KafkaTableProvider provider = new KafkaTableProvider(); + private static final String LOCATION_BROKER = "104.126.7.88:7743"; + private static final String LOCATION_TOPIC = "topic1"; @Test public void testBuildBeamSqlCSVTable() { @@ -44,9 +50,37 @@ public void testBuildBeamSqlCSVTable() { assertNotNull(sqlTable); assertTrue(sqlTable instanceof BeamKafkaCSVTable); - BeamKafkaCSVTable csvTable = (BeamKafkaCSVTable) sqlTable; - assertEquals("localhost:9092", csvTable.getBootstrapServers()); - assertEquals(ImmutableList.of("topic1", "topic2"), csvTable.getTopics()); + BeamKafkaCSVTable kafkaTable = (BeamKafkaCSVTable) sqlTable; + assertEquals(LOCATION_BROKER, kafkaTable.getBootstrapServers()); + assertEquals(ImmutableList.of(LOCATION_TOPIC), kafkaTable.getTopics()); + } + + @Test + public void testBuildWithExtraServers() { + Table table = + mockTableWithExtraServers("hello", ImmutableList.of("localhost:1111", "localhost:2222")); + BeamSqlTable sqlTable = provider.buildBeamSqlTable(table); + + assertNotNull(sqlTable); + assertTrue(sqlTable instanceof BeamKafkaCSVTable); + + BeamKafkaCSVTable kafkaTable = (BeamKafkaCSVTable) sqlTable; + assertEquals( + LOCATION_BROKER + ",localhost:1111,localhost:2222", kafkaTable.getBootstrapServers()); + assertEquals(ImmutableList.of(LOCATION_TOPIC), kafkaTable.getTopics()); + } + + @Test + public void testBuildWithExtraTopics() { + Table table = mockTableWithExtraTopics("hello", ImmutableList.of("topic2", "topic3")); + BeamSqlTable sqlTable = provider.buildBeamSqlTable(table); + + assertNotNull(sqlTable); + assertTrue(sqlTable instanceof BeamKafkaCSVTable); + + BeamKafkaCSVTable kafkaTable = (BeamKafkaCSVTable) sqlTable; + assertEquals(LOCATION_BROKER, kafkaTable.getBootstrapServers()); + assertEquals(ImmutableList.of(LOCATION_TOPIC, "topic2", "topic3"), kafkaTable.getTopics()); } @Test @@ -55,24 +89,65 @@ public void testBuildBeamSqlAvroTable() { BeamSqlTable sqlTable = provider.buildBeamSqlTable(table); assertNotNull(sqlTable); - assertTrue(sqlTable instanceof BeamKafkaAvroTable); + assertTrue(sqlTable instanceof BeamKafkaTable); - BeamKafkaAvroTable csvTable = (BeamKafkaAvroTable) sqlTable; - assertEquals("localhost:9092", csvTable.getBootstrapServers()); - assertEquals(ImmutableList.of("topic1", "topic2"), csvTable.getTopics()); + BeamKafkaTable kafkaTable = (BeamKafkaTable) sqlTable; + assertEquals(LOCATION_BROKER, kafkaTable.getBootstrapServers()); + assertEquals(ImmutableList.of(LOCATION_TOPIC), kafkaTable.getTopics()); } @Test public void testBuildBeamSqlProtoTable() { - Table table = mockTable("hello", "proto", KafkaMessages.SimpleMessage.class.getName()); + Table table = mockProtoTable("hello", PayloadMessages.SimpleMessage.class); + BeamSqlTable sqlTable = provider.buildBeamSqlTable(table); + + assertNotNull(sqlTable); + assertTrue(sqlTable instanceof BeamKafkaTable); + + BeamKafkaTable kafkaTable = 
(BeamKafkaTable) sqlTable; + assertEquals(LOCATION_BROKER, kafkaTable.getBootstrapServers()); + assertEquals(ImmutableList.of(LOCATION_TOPIC), kafkaTable.getTopics()); + } + + @Test + public void testBuildBeamSqlThriftTable() { + Table table = + mockThriftTable("hello", SimpleThriftMessage.class, TCompactProtocol.Factory.class); + BeamSqlTable sqlTable = provider.buildBeamSqlTable(table); + + assertNotNull(sqlTable); + assertTrue(sqlTable instanceof BeamKafkaTable); + + BeamKafkaTable kafkaTable = (BeamKafkaTable) sqlTable; + assertEquals(LOCATION_BROKER, kafkaTable.getBootstrapServers()); + assertEquals(ImmutableList.of(LOCATION_TOPIC), kafkaTable.getTopics()); + } + + @Test + public void testBuildBeamSqlNestedBytesTable() { + Table table = mockNestedBytesTable("hello"); + BeamSqlTable sqlTable = provider.buildBeamSqlTable(table); + + assertNotNull(sqlTable); + assertTrue(sqlTable instanceof NestedPayloadKafkaTable); + + BeamKafkaTable kafkaTable = (BeamKafkaTable) sqlTable; + assertEquals(LOCATION_BROKER, kafkaTable.getBootstrapServers()); + assertEquals(ImmutableList.of(LOCATION_TOPIC), kafkaTable.getTopics()); + } + + @Test + public void testBuildBeamSqlNestedThriftTable() { + Table table = + mockNestedThriftTable("hello", SimpleThriftMessage.class, TCompactProtocol.Factory.class); BeamSqlTable sqlTable = provider.buildBeamSqlTable(table); assertNotNull(sqlTable); - assertTrue(sqlTable instanceof BeamKafkaProtoTable); + assertTrue(sqlTable instanceof NestedPayloadKafkaTable); - BeamKafkaProtoTable csvTable = (BeamKafkaProtoTable) sqlTable; - assertEquals("localhost:9092", csvTable.getBootstrapServers()); - assertEquals(ImmutableList.of("topic1", "topic2"), csvTable.getTopics()); + BeamKafkaTable kafkaTable = (BeamKafkaTable) sqlTable; + assertEquals(LOCATION_BROKER, kafkaTable.getBootstrapServers()); + assertEquals(ImmutableList.of(LOCATION_TOPIC), kafkaTable.getTopics()); } @Test @@ -81,37 +156,99 @@ public void testGetTableType() { } private static Table mockTable(String name) { - return mockTable(name, null, null); + return mockTable(name, false, null, null, null, null, null, null); + } + + private static Table mockTableWithExtraServers(String name, List extraBootstrapServers) { + return mockTable(name, false, extraBootstrapServers, null, null, null, null, null); + } + + private static Table mockTableWithExtraTopics(String name, List extraTopics) { + return mockTable(name, false, null, extraTopics, null, null, null, null); } private static Table mockTable(String name, String payloadFormat) { - return mockTable(name, payloadFormat, null); + return mockTable(name, false, null, null, payloadFormat, null, null, null); + } + + private static Table mockProtoTable(String name, Class protoClass) { + return mockTable(name, false, null, null, "proto", protoClass, null, null); + } + + private static Table mockThriftTable( + String name, + Class> thriftClass, + Class thriftProtocolFactoryClass) { + return mockTable( + name, false, null, null, "thrift", null, thriftClass, thriftProtocolFactoryClass); + } + + private static Table mockNestedBytesTable(String name) { + return mockTable(name, true, null, null, null, null, null, null); + } + + private static Table mockNestedThriftTable( + String name, + Class> thriftClass, + Class thriftProtocolFactoryClass) { + return mockTable( + name, true, null, null, "thrift", null, thriftClass, thriftProtocolFactoryClass); } private static Table mockTable( - String name, @Nullable String payloadFormat, @Nullable String protoClass) { + String name, + boolean 
isNested, + @Nullable List extraBootstrapServers, + @Nullable List extraTopics, + @Nullable String payloadFormat, + @Nullable Class protoClass, + @Nullable Class> thriftClass, + @Nullable Class thriftProtocolFactoryClass) { JSONObject properties = new JSONObject(); - properties.put("bootstrap.servers", "localhost:9092"); - JSONArray topics = new JSONArray(); - topics.add("topic1"); - topics.add("topic2"); - properties.put("topics", topics); + + if (extraBootstrapServers != null) { + JSONArray bootstrapServers = new JSONArray(); + bootstrapServers.addAll(extraBootstrapServers); + properties.put("bootstrap_servers", bootstrapServers); + } + if (extraTopics != null) { + JSONArray topics = new JSONArray(); + topics.addAll(extraTopics); + properties.put("topics", topics); + } + if (payloadFormat != null) { properties.put("format", payloadFormat); } if (protoClass != null) { - properties.put("protoClass", protoClass); + properties.put("protoClass", protoClass.getName()); + } + if (thriftClass != null) { + properties.put("thriftClass", thriftClass.getName()); + } + if (thriftProtocolFactoryClass != null) { + properties.put("thriftProtocolFactoryClass", thriftProtocolFactoryClass.getName()); + } + Schema payloadSchema = Schema.builder().addInt32Field("id").addStringField("name").build(); + Schema schema; + if (isNested) { + Schema.Builder schemaBuilder = Schema.builder(); + schemaBuilder.addField(Schemas.HEADERS_FIELD, Schemas.HEADERS_FIELD_TYPE); + if (payloadFormat == null) { + schemaBuilder.addByteArrayField(Schemas.PAYLOAD_FIELD); + } else { + schemaBuilder.addRowField(Schemas.PAYLOAD_FIELD, payloadSchema); + } + schema = schemaBuilder.build(); + } else { + schema = payloadSchema; } return Table.builder() .name(name) .comment(name + " table") - .location("kafka://localhost:2181/brokers?topic=test") - .schema( - Stream.of( - Schema.Field.of("id", Schema.FieldType.INT32), - Schema.Field.of("name", Schema.FieldType.STRING)) - .collect(toSchema())) + .location(LOCATION_BROKER + "/" + LOCATION_TOPIC) + .schema(schema) .type("kafka") .properties(properties) .build(); diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTestTable.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTestTable.java index 393d1ff4ef51..f9cdf0ba9b40 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTestTable.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTestTable.java @@ -30,9 +30,9 @@ import java.util.concurrent.atomic.AtomicReference; import java.util.stream.Collectors; import org.apache.beam.sdk.io.kafka.KafkaIO; +import org.apache.beam.sdk.io.kafka.KafkaRecord; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.transforms.PTransform; -import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; @@ -42,15 +42,13 @@ import org.apache.kafka.clients.consumer.MockConsumer; import org.apache.kafka.clients.consumer.OffsetAndTimestamp; import org.apache.kafka.clients.consumer.OffsetResetStrategy; +import org.apache.kafka.clients.producer.ProducerRecord; import org.apache.kafka.common.KafkaException; import org.apache.kafka.common.PartitionInfo; import org.apache.kafka.common.TopicPartition; import 
org.apache.kafka.common.record.TimestampType; /** This is a mock BeamKafkaTable. It will use a Mock Consumer. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class KafkaTestTable extends BeamKafkaTable { private final int partitionsPerTopic; private final List records; @@ -203,12 +201,14 @@ public void run() { } @Override - public PTransform>, PCollection> getPTransformForInput() { + public PTransform>, PCollection> + getPTransformForInput() { throw new RuntimeException("KafkaTestTable does not implement getPTransformForInput method."); } @Override - public PTransform, PCollection>> getPTransformForOutput() { + public PTransform, PCollection>> + getPTransformForOutput() { throw new RuntimeException("KafkaTestTable does not implement getPTransformForOutput method."); } } diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/NestedPayloadKafkaTableTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/NestedPayloadKafkaTableTest.java new file mode 100644 index 000000000000..beb973cd7dba --- /dev/null +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/NestedPayloadKafkaTableTest.java @@ -0,0 +1,290 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.sql.meta.provider.kafka; + +import static java.nio.charset.StandardCharsets.UTF_8; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertThrows; +import static org.mockito.ArgumentMatchers.any; +import static org.mockito.Mockito.doThrow; +import static org.mockito.MockitoAnnotations.openMocks; + +import java.util.List; +import java.util.Optional; +import org.apache.beam.sdk.io.kafka.KafkaRecord; +import org.apache.beam.sdk.io.kafka.KafkaTimestampType; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.schemas.io.payloads.PayloadSerializer; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableListMultimap; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ListMultimap; +import org.apache.kafka.clients.producer.ProducerRecord; +import org.apache.kafka.common.header.Header; +import org.apache.kafka.common.header.Headers; +import org.apache.kafka.common.header.internals.RecordHeaders; +import org.joda.time.Instant; +import org.junit.Before; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; +import org.mockito.Mock; + +@RunWith(JUnit4.class) +@SuppressWarnings("initialization.fields.uninitialized") +public class NestedPayloadKafkaTableTest { + private static final String TOPIC = "mytopic"; + private static final Schema FULL_WRITE_SCHEMA = + Schema.builder() + .addByteArrayField(Schemas.MESSAGE_KEY_FIELD) + .addField(Schemas.EVENT_TIMESTAMP_FIELD, FieldType.DATETIME.withNullable(true)) + .addArrayField(Schemas.HEADERS_FIELD, FieldType.row(Schemas.HEADERS_ENTRY_SCHEMA)) + .addByteArrayField(Schemas.PAYLOAD_FIELD) + .build(); + private static final Schema FULL_READ_SCHEMA = + Schema.builder() + .addByteArrayField(Schemas.MESSAGE_KEY_FIELD) + .addDateTimeField(Schemas.EVENT_TIMESTAMP_FIELD) + .addArrayField(Schemas.HEADERS_FIELD, FieldType.row(Schemas.HEADERS_ENTRY_SCHEMA)) + .addByteArrayField(Schemas.PAYLOAD_FIELD) + .build(); + + @Mock public PayloadSerializer serializer; + + @Before + public void setUp() { + openMocks(this); + } + + private NestedPayloadKafkaTable newTable(Schema schema, Optional serializer) { + return new NestedPayloadKafkaTable( + schema, "abc.bootstrap", ImmutableList.of(TOPIC), serializer); + } + + @Test + public void constructionFailures() { + // Not nested schema (no headers) + assertThrows( + IllegalArgumentException.class, + () -> + newTable( + Schema.builder().addByteArrayField(Schemas.PAYLOAD_FIELD).build(), + Optional.empty())); + // Row payload without serializer + assertThrows( + IllegalArgumentException.class, + () -> + newTable( + Schema.builder() + .addRowField( + Schemas.PAYLOAD_FIELD, Schema.builder().addStringField("abc").build()) + .addField(Schemas.HEADERS_FIELD, Schemas.HEADERS_FIELD_TYPE) + .build(), + Optional.empty())); + // Bytes payload with serializer + assertThrows( + IllegalArgumentException.class, + () -> + newTable( + Schema.builder() + .addByteArrayField(Schemas.PAYLOAD_FIELD) + .addField(Schemas.HEADERS_FIELD, Schemas.HEADERS_FIELD_TYPE) + .build(), + Optional.of(serializer))); + // Bad field in schema + assertThrows( + IllegalArgumentException.class, + () -> + newTable( + Schema.builder() + .addByteArrayField(Schemas.PAYLOAD_FIELD) + .addField(Schemas.HEADERS_FIELD, 
Schemas.HEADERS_FIELD_TYPE) + .addBooleanField("bad") + .build(), + Optional.empty())); + // Bad field type in schema + assertThrows( + IllegalArgumentException.class, + () -> + newTable( + Schema.builder() + .addByteArrayField(Schemas.PAYLOAD_FIELD) + .addField(Schemas.HEADERS_FIELD, Schemas.HEADERS_FIELD_TYPE) + .addBooleanField(Schemas.EVENT_TIMESTAMP_FIELD) + .build(), + Optional.empty())); + } + + private static KafkaRecord readRecord( + byte[] key, byte[] value, long timestamp, ListMultimap attributes) { + Headers headers = new RecordHeaders(); + attributes.forEach(headers::add); + return new KafkaRecord<>( + TOPIC, 0, 0, timestamp, KafkaTimestampType.LOG_APPEND_TIME, headers, key, value); + } + + @Test + public void recordToRowFailures() { + { + Schema payloadSchema = Schema.builder().addStringField("def").build(); + NestedPayloadKafkaTable table = + newTable( + Schema.builder() + .addRowField(Schemas.PAYLOAD_FIELD, payloadSchema) + .addField(Schemas.HEADERS_FIELD, Schemas.HEADERS_FIELD_TYPE) + .build(), + Optional.of(serializer)); + doThrow(new IllegalArgumentException("")).when(serializer).deserialize(any()); + assertThrows( + IllegalArgumentException.class, + () -> + table.transformInput( + readRecord( + new byte[] {}, "abc".getBytes(UTF_8), 123, ImmutableListMultimap.of()))); + } + // Schema requires headers, missing in message + { + NestedPayloadKafkaTable table = + newTable( + Schema.builder() + .addByteArrayField(Schemas.PAYLOAD_FIELD) + .addField(Schemas.HEADERS_FIELD, Schemas.HEADERS_FIELD_TYPE) + .build(), + Optional.empty()); + assertThrows( + IllegalArgumentException.class, + () -> + table.transformInput( + new KafkaRecord<>( + TOPIC, + 0, + 0, + 0, + KafkaTimestampType.LOG_APPEND_TIME, + null, + new byte[] {}, + new byte[] {}))); + } + } + + @Test + public void rowToRecordFailures() { + Schema payloadSchema = Schema.builder().addStringField("def").build(); + Schema schema = + Schema.builder() + .addRowField(Schemas.PAYLOAD_FIELD, payloadSchema) + .addField(Schemas.HEADERS_FIELD, Schemas.HEADERS_FIELD_TYPE.withNullable(true)) + .build(); + NestedPayloadKafkaTable table = newTable(schema, Optional.of(serializer)); + // badRow cannot be cast to schema + Schema badRowSchema = Schema.builder().addStringField("xxx").build(); + Row badRow = + Row.withSchema(badRowSchema).attachValues(Row.withSchema(badRowSchema).attachValues("abc")); + assertThrows(IllegalArgumentException.class, () -> table.transformOutput(badRow)); + + Row goodRow = + Row.withSchema(schema) + .withFieldValue( + Schemas.PAYLOAD_FIELD, + Row.withSchema(payloadSchema).withFieldValue("def", "abc").build()) + .build(); + doThrow(new IllegalArgumentException("")).when(serializer).serialize(any()); + assertThrows(IllegalArgumentException.class, () -> table.transformOutput(goodRow)); + } + + @Test + public void reorderRowToRecord() { + Schema schema = + Schema.builder() + .addField(Schemas.HEADERS_FIELD, Schemas.HEADERS_FIELD_TYPE) + .addByteArrayField(Schemas.PAYLOAD_FIELD) + .build(); + Schema rowSchema = + Schema.builder() + .addByteArrayField(Schemas.PAYLOAD_FIELD) + .addField(Schemas.HEADERS_FIELD, Schemas.HEADERS_FIELD_TYPE) + .build(); + NestedPayloadKafkaTable table = newTable(schema, Optional.empty()); + Row row = Row.withSchema(rowSchema).attachValues("abc".getBytes(UTF_8), ImmutableList.of()); + ProducerRecord output = table.transformOutput(row); + assertEquals("abc", new String(output.value(), UTF_8)); + assertEquals(0, output.headers().toArray().length); + } + + @Test + public void fullRowToRecord() { + 
NestedPayloadKafkaTable table = newTable(FULL_WRITE_SCHEMA, Optional.empty()); + Instant now = Instant.now(); + Row row = + Row.withSchema(FULL_WRITE_SCHEMA) + .withFieldValue(Schemas.MESSAGE_KEY_FIELD, "val1".getBytes(UTF_8)) + .withFieldValue(Schemas.PAYLOAD_FIELD, "val2".getBytes(UTF_8)) + .withFieldValue(Schemas.EVENT_TIMESTAMP_FIELD, now) + .withFieldValue( + Schemas.HEADERS_FIELD, + ImmutableList.of( + Row.withSchema(Schemas.HEADERS_ENTRY_SCHEMA) + .attachValues( + "key1", + ImmutableList.of("attr1".getBytes(UTF_8), "attr2".getBytes(UTF_8))), + Row.withSchema(Schemas.HEADERS_ENTRY_SCHEMA) + .attachValues("key2", ImmutableList.of("attr3".getBytes(UTF_8))))) + .build(); + ProducerRecord result = table.transformOutput(row); + assertEquals("val1", new String(result.key(), UTF_8)); + assertEquals("val2", new String(result.value(), UTF_8)); + assertEquals(now.getMillis(), result.timestamp().longValue()); + List
    key1Headers = ImmutableList.copyOf(result.headers().headers("key1")); + List
    key2Headers = ImmutableList.copyOf(result.headers().headers("key2")); + assertEquals(2, key1Headers.size()); + assertEquals(1, key2Headers.size()); + assertEquals("attr3", new String(key2Headers.get(0).value(), UTF_8)); + } + + @Test + public void fullRecordToRow() { + NestedPayloadKafkaTable table = newTable(FULL_READ_SCHEMA, Optional.empty()); + Instant event = Instant.now(); + KafkaRecord record = + readRecord( + "key".getBytes(UTF_8), + "value".getBytes(UTF_8), + event.getMillis(), + ImmutableListMultimap.of( + "key1", "attr1".getBytes(UTF_8), + "key1", "attr2".getBytes(UTF_8), + "key2", "attr3".getBytes(UTF_8))); + Row expected = + Row.withSchema(FULL_READ_SCHEMA) + .withFieldValue(Schemas.MESSAGE_KEY_FIELD, "key".getBytes(UTF_8)) + .withFieldValue(Schemas.PAYLOAD_FIELD, "value".getBytes(UTF_8)) + .withFieldValue(Schemas.EVENT_TIMESTAMP_FIELD, event) + .withFieldValue( + Schemas.HEADERS_FIELD, + ImmutableList.of( + Row.withSchema(Schemas.HEADERS_ENTRY_SCHEMA) + .attachValues( + "key1", + ImmutableList.of("attr1".getBytes(UTF_8), "attr2".getBytes(UTF_8))), + Row.withSchema(Schemas.HEADERS_ENTRY_SCHEMA) + .attachValues("key2", ImmutableList.of("attr3".getBytes(UTF_8))))) + .build(); + assertEquals(expected, table.transformInput(record)); + } +} diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbFilterTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbFilterTest.java index dce8471fc63f..aafaf9b9396d 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbFilterTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbFilterTest.java @@ -44,9 +44,6 @@ import org.junit.runners.Parameterized.Parameters; @RunWith(Parameterized.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class MongoDbFilterTest { private static final Schema BASIC_SCHEMA = Schema.builder() diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbReadWriteIT.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbReadWriteIT.java index 945c324ba35d..47ad96a23afc 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbReadWriteIT.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbReadWriteIT.java @@ -77,9 +77,6 @@ * independent Mongo instance. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class MongoDbReadWriteIT { private static final Logger LOG = LoggerFactory.getLogger(MongoDbReadWriteIT.class); private static final Schema SOURCE_SCHEMA = diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbTableProviderTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbTableProviderTest.java index 459af5679620..5d75bba583fb 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbTableProviderTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbTableProviderTest.java @@ -27,7 +27,7 @@ import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable; import org.apache.beam.sdk.extensions.sql.meta.Table; import org.apache.beam.sdk.schemas.Schema; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; import org.junit.Test; import org.junit.runner.RunWith; import org.junit.runners.JUnit4; diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/parquet/ParquetTableProviderTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/parquet/ParquetTableProviderTest.java new file mode 100644 index 000000000000..71680f706fb5 --- /dev/null +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/parquet/ParquetTableProviderTest.java @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.meta.provider.parquet; + +import static org.junit.Assert.assertEquals; + +import java.io.File; +import org.apache.beam.sdk.PipelineResult; +import org.apache.beam.sdk.PipelineResult.State; +import org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv; +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.TemporaryFolder; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Test for ParquetTable. 
*/ +@RunWith(JUnit4.class) +public class ParquetTableProviderTest { + @Rule public TestPipeline writePipeline = TestPipeline.create(); + @Rule public TestPipeline readPipeline = TestPipeline.create(); + @Rule public TemporaryFolder tempFolder = new TemporaryFolder(); + + private static final String FIELD_NAMES = "(name VARCHAR, age BIGINT, country VARCHAR)"; + + private static final Schema TABLE_SCHEMA = + Schema.builder() + .addStringField("name") + .addInt64Field("age") + .addStringField("country") + .build(); + private static final Schema PROJECTED_SCHEMA = + Schema.builder().addInt64Field("age").addStringField("country").build(); + + @Test + public void testWriteAndReadTable() { + File destinationFile = new File(tempFolder.getRoot(), "person-info/"); + + BeamSqlEnv env = BeamSqlEnv.inMemory(new ParquetTableProvider()); + env.executeDdl( + String.format( + "CREATE EXTERNAL TABLE PersonInfo %s TYPE parquet LOCATION '%s'", + FIELD_NAMES, destinationFile.getAbsolutePath())); + + BeamSqlRelUtils.toPCollection( + writePipeline, + env.parseQuery( + "INSERT INTO PersonInfo VALUES ('Alan', 22, 'England'), ('John', 42, 'USA')")); + writePipeline.run().waitUntilFinish(); + + PCollection rows = + BeamSqlRelUtils.toPCollection(readPipeline, env.parseQuery("SELECT * FROM PersonInfo")); + PAssert.that(rows) + .containsInAnyOrder( + Row.withSchema(TABLE_SCHEMA).addValues("Alan", 22L, "England").build(), + Row.withSchema(TABLE_SCHEMA).addValues("John", 42L, "USA").build()); + + PCollection filtered = + BeamSqlRelUtils.toPCollection( + readPipeline, env.parseQuery("SELECT * FROM PersonInfo WHERE age > 25")); + PAssert.that(filtered) + .containsInAnyOrder(Row.withSchema(TABLE_SCHEMA).addValues("John", 42L, "USA").build()); + + PCollection projected = + BeamSqlRelUtils.toPCollection( + readPipeline, env.parseQuery("SELECT age, country FROM PersonInfo")); + PAssert.that(projected) + .containsInAnyOrder( + Row.withSchema(PROJECTED_SCHEMA).addValues(22L, "England").build(), + Row.withSchema(PROJECTED_SCHEMA).addValues(42L, "USA").build()); + + PCollection filteredAndProjected = + BeamSqlRelUtils.toPCollection( + readPipeline, env.parseQuery("SELECT age, country FROM PersonInfo WHERE age > 25")); + PAssert.that(filteredAndProjected) + .containsInAnyOrder(Row.withSchema(PROJECTED_SCHEMA).addValues(42L, "USA").build()); + + PipelineResult.State state = readPipeline.run().waitUntilFinish(); + assertEquals(State.DONE, state); + } +} diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/parquet/ParquetTableReadTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/parquet/ParquetTableReadTest.java deleted file mode 100644 index afddb52fcd69..000000000000 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/parquet/ParquetTableReadTest.java +++ /dev/null @@ -1,93 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. 
You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package org.apache.beam.sdk.extensions.sql.meta.provider.parquet; - -import java.io.File; -import java.io.IOException; -import java.io.InputStream; -import java.nio.file.Files; -import java.nio.file.Path; -import java.util.Arrays; -import org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv; -import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils; -import org.apache.beam.sdk.schemas.Schema; -import org.apache.beam.sdk.testing.PAssert; -import org.apache.beam.sdk.testing.TestPipeline; -import org.apache.beam.sdk.values.PCollection; -import org.apache.beam.sdk.values.Row; -import org.junit.Rule; -import org.junit.Test; -import org.junit.rules.TemporaryFolder; -import org.junit.runner.RunWith; -import org.junit.runners.JUnit4; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -/** Test for ParquetTable. */ -@RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) -public class ParquetTableReadTest { - private static final Logger LOG = LoggerFactory.getLogger(ParquetTableReadTest.class); - - @Rule public TestPipeline pipeline = TestPipeline.create(); - @Rule public TemporaryFolder temporaryFolder = new TemporaryFolder(); - - private static final String SQL_PARQUET_FIELD = - "(name VARCHAR, favorite_color VARCHAR, favorite_numbers ARRAY)"; - - private static final Schema PARQUET_SCHEMA = - Schema.builder() - .addField("name", Schema.FieldType.STRING) - .addNullableField("favorite_color", Schema.FieldType.STRING) - .addArrayField("favorite_numbers", Schema.FieldType.INT32) - .build(); - - private String extractParquetFile(String fileName) throws IOException { - InputStream inputStream = getClass().getResourceAsStream("/" + fileName); - File root = temporaryFolder.getRoot(); - Path tempFilePath = new File(root, fileName).toPath(); - Files.copy(inputStream, tempFilePath); - return tempFilePath.toString(); - } - - @Test - public void testReadParquet() throws IOException { - String parquetPath = extractParquetFile("users.parquet"); - - BeamSqlEnv env = BeamSqlEnv.inMemory(new ParquetTableProvider()); - env.executeDdl( - String.format( - "CREATE EXTERNAL TABLE users %s TYPE parquet LOCATION '%s'", - SQL_PARQUET_FIELD, parquetPath)); - - PCollection rows = - BeamSqlRelUtils.toPCollection( - pipeline, env.parseQuery("SELECT name, favorite_color, favorite_numbers FROM users")); - - PAssert.that(rows) - .containsInAnyOrder( - Row.withSchema(PARQUET_SCHEMA) - .addValues("Alyssa", null, Arrays.asList(3, 9, 15, 20)) - .build(), - Row.withSchema(PARQUET_SCHEMA).addValues("Ben", "red", Arrays.asList()).build()); - - pipeline.run(); - } -} diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/PubsubAvroIT.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/PubsubAvroIT.java deleted file mode 100644 index 9b9ed68c6129..000000000000 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/PubsubAvroIT.java +++ /dev/null @@ -1,102 +0,0 @@ -/* - * Licensed to the 
Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package org.apache.beam.sdk.extensions.sql.meta.provider.pubsub; - -import static org.hamcrest.Matchers.equalTo; -import static org.hamcrest.Matchers.hasProperty; - -import java.io.ByteArrayOutputStream; -import java.io.IOException; -import java.util.List; -import org.apache.avro.generic.GenericRecord; -import org.apache.avro.generic.GenericRecordBuilder; -import org.apache.beam.sdk.coders.AvroCoder; -import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage; -import org.apache.beam.sdk.schemas.Schema; -import org.apache.beam.sdk.schemas.utils.AvroUtils; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; -import org.hamcrest.Matcher; -import org.joda.time.Instant; -import org.junit.runner.RunWith; -import org.junit.runners.JUnit4; - -/** Integration tests for querying Pubsub AVRO messages with SQL. */ -@RunWith(JUnit4.class) -public class PubsubAvroIT extends PubsubTableProviderIT { - private static final Schema NAME_HEIGHT_KNOWS_JS_SCHEMA = - Schema.builder() - .addNullableField("name", Schema.FieldType.STRING) - .addNullableField("height", Schema.FieldType.INT32) - .addNullableField("knowsJavascript", Schema.FieldType.BOOLEAN) - .build(); - - private static final Schema NAME_HEIGHT_SCHEMA = - Schema.builder() - .addNullableField("name", Schema.FieldType.STRING) - .addNullableField("height", Schema.FieldType.INT32) - .build(); - - @Override - protected String getPayloadFormat() { - return "avro"; - } - - @Override - protected PubsubMessage messageIdName(Instant timestamp, int id, String name) throws IOException { - byte[] encodedRecord = createEncodedGenericRecord(PAYLOAD_SCHEMA, ImmutableList.of(id, name)); - return message(timestamp, encodedRecord); - } - - @Override - protected Matcher matcherNames(String name) throws IOException { - Schema schema = Schema.builder().addStringField("name").build(); - byte[] encodedRecord = createEncodedGenericRecord(schema, ImmutableList.of(name)); - return hasProperty("payload", equalTo(encodedRecord)); - } - - @Override - protected Matcher matcherNameHeight(String name, int height) throws IOException { - byte[] encodedRecord = - createEncodedGenericRecord(NAME_HEIGHT_SCHEMA, ImmutableList.of(name, height)); - return hasProperty("payload", equalTo(encodedRecord)); - } - - @Override - protected Matcher matcherNameHeightKnowsJS( - String name, int height, boolean knowsJS) throws IOException { - byte[] encodedRecord = - createEncodedGenericRecord( - NAME_HEIGHT_KNOWS_JS_SCHEMA, ImmutableList.of(name, height, knowsJS)); - return hasProperty("payload", equalTo(encodedRecord)); - } - - private byte[] createEncodedGenericRecord(Schema beamSchema, List values) - throws IOException { - org.apache.avro.Schema avroSchema = 
AvroUtils.toAvroSchema(beamSchema); - GenericRecordBuilder builder = new GenericRecordBuilder(avroSchema); - List fields = avroSchema.getFields(); - for (int i = 0; i < fields.size(); ++i) { - builder.set(fields.get(i), values.get(i)); - } - AvroCoder coder = AvroCoder.of(avroSchema); - ByteArrayOutputStream out = new ByteArrayOutputStream(); - - coder.encode(builder.build(), out); - return out.toByteArray(); - } -} diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/PubsubJsonIT.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/PubsubJsonIT.java deleted file mode 100644 index 8db903642cec..000000000000 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/PubsubJsonIT.java +++ /dev/null @@ -1,76 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package org.apache.beam.sdk.extensions.sql.meta.provider.pubsub; - -import static java.nio.charset.StandardCharsets.UTF_8; -import static org.apache.beam.sdk.testing.JsonMatcher.jsonBytesLike; -import static org.hamcrest.Matchers.hasProperty; - -import java.io.IOException; -import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage; -import org.hamcrest.Matcher; -import org.joda.time.Instant; -import org.junit.runner.RunWith; -import org.junit.runners.JUnit4; - -/** Integration tests for querying Pubsub JSON messages with SQL. 
*/ -@RunWith(JUnit4.class) -@SuppressWarnings({"nullness", "keyfor"}) // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -public class PubsubJsonIT extends PubsubTableProviderIT { - - // Pubsub table provider should default to json - @Override - protected String getPayloadFormat() { - return null; - } - - @Override - protected PubsubMessage messageIdName(Instant timestamp, int id, String name) { - String jsonString = "{ \"id\" : " + id + ", \"name\" : \"" + name + "\" }"; - return message(timestamp, jsonString); - } - - @Override - protected Matcher matcherNames(String name) throws IOException { - return hasProperty("payload", toJsonByteLike(String.format("{\"name\":\"%s\"}", name))); - } - - @Override - protected Matcher matcherNameHeightKnowsJS( - String name, int height, boolean knowsJS) throws IOException { - String jsonString = - String.format( - "{\"name\":\"%s\", \"height\": %s, \"knowsJavascript\": %s}", name, height, knowsJS); - - return hasProperty("payload", toJsonByteLike(jsonString)); - } - - @Override - protected Matcher matcherNameHeight(String name, int height) throws IOException { - String jsonString = String.format("{\"name\":\"%s\", \"height\": %s}", name, height); - return hasProperty("payload", toJsonByteLike(jsonString)); - } - - private PubsubMessage message(Instant timestamp, String jsonPayload) { - return message(timestamp, jsonPayload.getBytes(UTF_8)); - } - - private Matcher toJsonByteLike(String jsonString) throws IOException { - return jsonBytesLike(jsonString); - } -} diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/PubsubTableProviderIT.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/PubsubTableProviderIT.java index ab7880980422..11d4feea030d 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/PubsubTableProviderIT.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/PubsubTableProviderIT.java @@ -17,6 +17,8 @@ */ package org.apache.beam.sdk.extensions.sql.meta.provider.pubsub; +import static java.nio.charset.StandardCharsets.UTF_8; +import static org.apache.beam.sdk.testing.JsonMatcher.jsonBytesLike; import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.allOf; import static org.hamcrest.Matchers.equalTo; @@ -25,20 +27,29 @@ import com.fasterxml.jackson.core.JsonProcessingException; import com.fasterxml.jackson.databind.ObjectMapper; +import java.io.ByteArrayOutputStream; +import java.io.IOException; import java.io.Serializable; import java.nio.charset.StandardCharsets; import java.sql.ResultSet; -import java.sql.SQLException; import java.sql.Statement; +import java.util.Arrays; +import java.util.Collection; +import java.util.HashMap; import java.util.List; import java.util.Map; +import java.util.Set; import java.util.concurrent.Callable; import java.util.concurrent.ExecutorService; import java.util.concurrent.Executors; import java.util.concurrent.Future; import java.util.concurrent.TimeUnit; import java.util.stream.Collectors; +import org.apache.avro.generic.GenericRecord; +import org.apache.avro.generic.GenericRecordBuilder; +import org.apache.beam.sdk.coders.AvroCoder; import org.apache.beam.sdk.extensions.gcp.options.GcpOptions; +import org.apache.beam.sdk.extensions.protobuf.PayloadMessages; import org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv; import 
org.apache.beam.sdk.extensions.sql.impl.JdbcConnection; import org.apache.beam.sdk.extensions.sql.impl.JdbcDriver; @@ -51,31 +62,39 @@ import org.apache.beam.sdk.io.gcp.pubsub.TestPubsubSignal; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.FieldType; import org.apache.beam.sdk.schemas.SchemaCoder; +import org.apache.beam.sdk.schemas.utils.AvroUtils; import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.util.common.ReflectHelpers; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableMap; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.CalciteConnection; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalciteConnection; import org.hamcrest.Matcher; import org.joda.time.Duration; import org.joda.time.Instant; +import org.junit.Ignore; import org.junit.Rule; import org.junit.Test; import org.junit.runner.RunWith; -import org.junit.runners.JUnit4; +import org.junit.runners.Parameterized; +import org.junit.runners.Parameterized.Parameter; +import org.junit.runners.Parameterized.Parameters; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; -@RunWith(JUnit4.class) +@RunWith(Parameterized.class) @SuppressWarnings({ "keyfor", - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) -public abstract class PubsubTableProviderIT implements Serializable { +public class PubsubTableProviderIT implements Serializable { - protected static final Schema PAYLOAD_SCHEMA = + private static final Logger LOG = LoggerFactory.getLogger(PubsubTableProviderIT.class); + + private static final Schema PAYLOAD_SCHEMA = Schema.builder() .addNullableField("id", Schema.FieldType.INT32) .addNullableField("name", Schema.FieldType.STRING) @@ -88,19 +107,33 @@ public abstract class PubsubTableProviderIT implements Serializable { @Rule public transient TestPipeline pipeline = TestPipeline.create(); @Rule public transient TestPipeline filterPipeline = TestPipeline.create(); private final SchemaIOTableProviderWrapper tableProvider = new PubsubTableProvider(); - private final String payloadFormatParam = - getPayloadFormat() == null ? "" : String.format("\"format\" : \"%s\", ", getPayloadFormat()); + + /** How long to wait on the result signal. */ + private final Duration timeout = Duration.standardMinutes(10); + + @Parameters + public static Collection data() { + return Arrays.asList( + new Object[][] { + {new PubsubJsonObjectProvider()}, + {new PubsubAvroObjectProvider()}, + {new PubsubProtoObjectProvider()} + }); + } + + @Parameter public PubsubObjectProvider objectsProvider; /** * HACK: we need an objectmapper to turn pipelineoptions back into a map. We need to use * ReflectHelpers to get the extra PipelineOptions. 
*/ - protected static final ObjectMapper MAPPER = + private static final ObjectMapper MAPPER = new ObjectMapper() .registerModules(ObjectMapper.findModules(ReflectHelpers.findClassLoader())); @Test public void testSQLSelectsPayloadContent() throws Exception { + String createTableString = String.format( "CREATE EXTERNAL TABLE message (\n" @@ -115,18 +148,15 @@ public void testSQLSelectsPayloadContent() throws Exception { + "LOCATION '%s' \n" + "TBLPROPERTIES '{ " + "%s" + + "\"protoClass\" : \"%s\", " + "\"timestampAttributeKey\" : \"ts\" }'", - tableProvider.getTableType(), eventsTopic.topicPath(), payloadFormatParam); + tableProvider.getTableType(), + eventsTopic.topicPath(), + payloadFormatParam(), + PayloadMessages.SimpleMessage.class.getName()); String queryString = "SELECT message.payload.id, message.payload.name from message"; - // Prepare messages to send later - List messages = - ImmutableList.of( - messageIdName(ts(1), 3, "foo"), - messageIdName(ts(2), 5, "bar"), - messageIdName(ts(3), 7, "baz")); - // Initialize SQL environment and create the pubsub table BeamSqlEnv sqlEnv = BeamSqlEnv.inMemory(new PubsubTableProvider()); sqlEnv.executeDdl(createTableString); @@ -149,6 +179,144 @@ public void testSQLSelectsPayloadContent() throws Exception { // Start the pipeline pipeline.run(); + // Block until a subscription for this topic exists + eventsTopic.assertSubscriptionEventuallyCreated( + pipeline.getOptions().as(GcpOptions.class).getProject(), Duration.standardMinutes(5)); + + // Start publishing the messages when main pipeline is started and signaling topic is ready + eventsTopic.publish( + ImmutableList.of( + objectsProvider.messageIdName(ts(1), 3, "foo"), + objectsProvider.messageIdName(ts(2), 5, "bar"), + objectsProvider.messageIdName(ts(3), 7, "baz"))); + + // Poll the signaling topic for success message + resultSignal.waitForSuccess(timeout); + } + + @Ignore("https://issues.apache.org/jira/browse/BEAM-12320") + @Test + public void testSQLSelectsArrayAttributes() throws Exception { + + String createTableString = + String.format( + "CREATE EXTERNAL TABLE message (\n" + + "event_timestamp TIMESTAMP, \n" + + "attributes ARRAY>, \n" + + "payload ROW< \n" + + " id INTEGER, \n" + + " name VARCHAR \n" + + " > \n" + + ") \n" + + "TYPE '%s' \n" + + "LOCATION '%s' \n" + + "TBLPROPERTIES '{ " + + "%s" + + "\"protoClass\" : \"%s\", " + + "\"timestampAttributeKey\" : \"ts\" }'", + tableProvider.getTableType(), + eventsTopic.topicPath(), + payloadFormatParam(), + PayloadMessages.SimpleMessage.class.getName()); + + String queryString = + "SELECT message.payload.id, attributes[1].key AS a1, attributes[2].key AS a2 FROM message"; + + // Initialize SQL environment and create the pubsub table + BeamSqlEnv sqlEnv = BeamSqlEnv.inMemory(new PubsubTableProvider()); + sqlEnv.executeDdl(createTableString); + + // Apply the PTransform to query the pubsub topic + PCollection queryOutput = query(sqlEnv, pipeline, queryString); + + // Observe the query results and send success signal after seeing the expected messages + queryOutput.apply( + "waitForSuccess", + resultSignal.signalSuccessWhen( + SchemaCoder.of(PAYLOAD_SCHEMA), + observedRows -> { + Map entries = new HashMap<>(); + for (Row row : observedRows) { + if ("ts".equals(row.getString("a1"))) { + entries.put(row.getInt32("id"), row.getString("a2")); + } else { + entries.put(row.getInt32("id"), row.getString("a1")); + } + } + + LOG.info("Entries: {}", entries); + + return entries.equals(ImmutableMap.of(3, "foo", 5, "bar", 7, "baz")); + })); + + // 
Start the pipeline + pipeline.run(); + + // Block until a subscription for this topic exists + eventsTopic.assertSubscriptionEventuallyCreated( + pipeline.getOptions().as(GcpOptions.class).getProject(), Duration.standardMinutes(5)); + + // Start publishing the messages when main pipeline is started and signaling topic is ready + eventsTopic.publish( + ImmutableList.of( + objectsProvider.messageIdName(ts(1), 3, "foo"), + objectsProvider.messageIdName(ts(2), 5, "bar"), + objectsProvider.messageIdName(ts(3), 7, "baz"))); + + // Poll the signaling topic for success message + resultSignal.waitForSuccess(timeout); + } + + @Test + public void testSQLWithBytePayload() throws Exception { + + // Prepare messages to send later + List messages = + ImmutableList.of( + objectsProvider.messageIdName(ts(1), 3, "foo"), + objectsProvider.messageIdName(ts(2), 5, "bar"), + objectsProvider.messageIdName(ts(3), 7, "baz")); + + String createTableString = + String.format( + "CREATE EXTERNAL TABLE message (\n" + + "event_timestamp TIMESTAMP, \n" + + "attributes MAP, \n" + + "payload VARBINARY \n" + + ") \n" + + "TYPE '%s' \n" + + "LOCATION '%s' \n" + + "TBLPROPERTIES '{ " + + "\"protoClass\" : \"%s\", " + + "\"timestampAttributeKey\" : \"ts\" }'", + tableProvider.getTableType(), + eventsTopic.topicPath(), + PayloadMessages.SimpleMessage.class.getName()); + + String queryString = "SELECT message.payload AS some_bytes FROM message"; + + // Initialize SQL environment and create the pubsub table + BeamSqlEnv sqlEnv = BeamSqlEnv.inMemory(new PubsubTableProvider()); + sqlEnv.executeDdl(createTableString); + + // Apply the PTransform to query the pubsub topic + PCollection queryOutput = query(sqlEnv, pipeline, queryString); + + // Observe the query results and send success signal after seeing the expected messages + Schema justBytesSchema = + Schema.builder().addField("some_bytes", FieldType.BYTES.withNullable(true)).build(); + Row expectedRow0 = row(justBytesSchema, (Object) messages.get(0).getPayload()); + Row expectedRow1 = row(justBytesSchema, (Object) messages.get(1).getPayload()); + Row expectedRow2 = row(justBytesSchema, (Object) messages.get(2).getPayload()); + Set expected = ImmutableSet.of(expectedRow0, expectedRow1, expectedRow2); + queryOutput.apply( + "waitForSuccess", + resultSignal.signalSuccessWhen( + SchemaCoder.of(justBytesSchema), observedRows -> observedRows.equals(expected))); + + // Start the pipeline + pipeline.run(); + // Block until a subscription for this topic exists eventsTopic.assertSubscriptionEventuallyCreated( pipeline.getOptions().as(GcpOptions.class).getProject(), Duration.standardMinutes(5)); @@ -157,11 +325,13 @@ public void testSQLSelectsPayloadContent() throws Exception { eventsTopic.publish(messages); // Poll the signaling topic for success message - resultSignal.waitForSuccess(Duration.standardMinutes(5)); + resultSignal.waitForSuccess(timeout); } @Test + @SuppressWarnings("unchecked") public void testUsesDlq() throws Exception { + String createTableString = String.format( "CREATE EXTERNAL TABLE message (\n" @@ -178,24 +348,17 @@ public void testUsesDlq() throws Exception { + " '{ " + " %s" + " \"timestampAttributeKey\" : \"ts\", " - + " \"deadLetterQueue\" : \"%s\"" + + " \"deadLetterQueue\" : \"%s\", " + + " \"protoClass\" : \"%s\" " + " }'", tableProvider.getTableType(), eventsTopic.topicPath(), - payloadFormatParam, - dlqTopic.topicPath()); + payloadFormatParam(), + dlqTopic.topicPath(), + PayloadMessages.SimpleMessage.class.getName()); String queryString = "SELECT 
message.payload.id, message.payload.name from message"; - // Prepare messages to send later - List messages = - ImmutableList.of( - messageIdName(ts(1), 3, "foo"), - messageIdName(ts(2), 5, "bar"), - messageIdName(ts(3), 7, "baz"), - messagePayload(ts(4), "{ - }"), // invalid message, will go to DLQ - messagePayload(ts(5), "{ + }")); // invalid message, will go to DLQ - // Initialize SQL environment and create the pubsub table BeamSqlEnv sqlEnv = BeamSqlEnv.inMemory(new PubsubTableProvider()); sqlEnv.executeDdl(createTableString); @@ -223,18 +386,26 @@ public void testUsesDlq() throws Exception { pipeline.getOptions().as(GcpOptions.class).getProject(), Duration.standardMinutes(5)); // Start publishing the messages when main pipeline is started and signaling topics are ready - eventsTopic.publish(messages); + eventsTopic.publish( + ImmutableList.of( + objectsProvider.messageIdName(ts(1), 3, "foo"), + objectsProvider.messageIdName(ts(2), 5, "bar"), + objectsProvider.messageIdName(ts(3), 7, "baz"), + messagePayload(ts(4), "{ - }", ImmutableMap.of()), // invalid message, will go to DLQ + messagePayload(ts(5), "{ + }", ImmutableMap.of()))); // invalid message, will go to DLQ // Poll the signaling topic for success message - resultSignal.waitForSuccess(Duration.standardMinutes(4)); + resultSignal.waitForSuccess(timeout); dlqTopic .assertThatTopicEventuallyReceives( matcherPayload(ts(4), "{ - }"), matcherPayload(ts(5), "{ + }")) - .waitForUpTo(Duration.standardSeconds(20)); + .waitForUpTo(Duration.standardSeconds(40)); } @Test + @SuppressWarnings({"unchecked", "rawtypes"}) public void testSQLLimit() throws Exception { + String createTableString = String.format( "CREATE EXTERNAL TABLE message (\n" @@ -251,22 +422,24 @@ public void testSQLLimit() throws Exception { + " '{ " + " %s" + " \"timestampAttributeKey\" : \"ts\", " - + " \"deadLetterQueue\" : \"%s\"" + + " \"deadLetterQueue\" : \"%s\", " + + " \"protoClass\" : \"%s\" " + " }'", tableProvider.getTableType(), eventsTopic.topicPath(), - payloadFormatParam, - dlqTopic.topicPath()); + payloadFormatParam(), + dlqTopic.topicPath(), + PayloadMessages.SimpleMessage.class.getName()); List messages = ImmutableList.of( - messageIdName(ts(1), 3, "foo"), - messageIdName(ts(2), 5, "bar"), - messageIdName(ts(3), 7, "baz"), - messageIdName(ts(4), 9, "ba2"), - messageIdName(ts(5), 10, "ba3"), - messageIdName(ts(6), 13, "ba4"), - messageIdName(ts(7), 15, "ba5")); + objectsProvider.messageIdName(ts(1), 3, "foo"), + objectsProvider.messageIdName(ts(2), 5, "bar"), + objectsProvider.messageIdName(ts(3), 7, "baz"), + objectsProvider.messageIdName(ts(4), 9, "ba2"), + objectsProvider.messageIdName(ts(5), 10, "ba3"), + objectsProvider.messageIdName(ts(6), 13, "ba4"), + objectsProvider.messageIdName(ts(7), 15, "ba5")); // We need the default options on the schema to include the project passed in for the // integration test @@ -295,13 +468,16 @@ public void testSQLLimit() throws Exception { eventsTopic.assertSubscriptionEventuallyCreated( pipeline.getOptions().as(GcpOptions.class).getProject(), Duration.standardMinutes(5)); + eventsTopic.publish(messages); + assertThat(queryResult.get(2, TimeUnit.MINUTES).size(), equalTo(3)); pool.shutdown(); } @Test public void testSQLSelectsPayloadContentFlat() throws Exception { + String createTableString = String.format( "CREATE EXTERNAL TABLE message (\n" @@ -314,19 +490,16 @@ public void testSQLSelectsPayloadContentFlat() throws Exception { + "TBLPROPERTIES " + " '{ " + " %s" + + " \"protoClass\" : \"%s\", " + " 
\"timestampAttributeKey\" : \"ts\" " + " }'", - tableProvider.getTableType(), eventsTopic.topicPath(), payloadFormatParam); + tableProvider.getTableType(), + eventsTopic.topicPath(), + payloadFormatParam(), + PayloadMessages.SimpleMessage.class.getName()); String queryString = "SELECT message.id, message.name from message"; - // Prepare messages to send later - List messages = - ImmutableList.of( - messageIdName(ts(1), 3, "foo"), - messageIdName(ts(2), 5, "bar"), - messageIdName(ts(3), 7, "baz")); - // Initialize SQL environment and create the pubsub table BeamSqlEnv sqlEnv = BeamSqlEnv.inMemory(new PubsubTableProvider()); sqlEnv.executeDdl(createTableString); @@ -354,32 +527,40 @@ public void testSQLSelectsPayloadContentFlat() throws Exception { pipeline.getOptions().as(GcpOptions.class).getProject(), Duration.standardMinutes(5)); // Start publishing the messages when main pipeline is started and signaling topic is ready - eventsTopic.publish(messages); + eventsTopic.publish( + ImmutableList.of( + objectsProvider.messageIdName(ts(1), 3, "foo"), + objectsProvider.messageIdName(ts(2), 5, "bar"), + objectsProvider.messageIdName(ts(3), 7, "baz"))); // Poll the signaling topic for success message - resultSignal.waitForSuccess(Duration.standardMinutes(5)); + resultSignal.waitForSuccess(timeout); } @Test + @SuppressWarnings("unchecked") public void testSQLInsertRowsToPubsubFlat() throws Exception { + String createTableString = String.format( "CREATE EXTERNAL TABLE message (\n" + "event_timestamp TIMESTAMP, \n" + "name VARCHAR, \n" + "height INTEGER, \n" - + "knowsJavascript BOOLEAN \n" + + "knows_javascript BOOLEAN \n" + ") \n" + "TYPE '%s' \n" + "LOCATION '%s' \n" + "TBLPROPERTIES " + " '{ " + " %s" + + " \"protoClass\" : \"%s\", " + " \"deadLetterQueue\" : \"%s\"" + " }'", tableProvider.getTableType(), eventsTopic.topicPath(), - payloadFormatParam, + payloadFormatParam(), + PayloadMessages.NameHeightKnowsJSMessage.class.getName(), dlqTopic.topicPath()); // Initialize SQL environment and create the pubsub table @@ -389,44 +570,48 @@ public void testSQLInsertRowsToPubsubFlat() throws Exception { // TODO(BEAM-8741): Ideally we could write this query without specifying a column list, because // it shouldn't be possible to write to event_timestamp when it's mapped to publish time. 
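    // Illustration only (a sketch, not what this test runs): once BEAM-8741 stops exposing
    // event_timestamp as a writable column when it is mapped to publish time, the explicit
    // column list could be dropped, e.g.
    //   INSERT INTO message VALUES ('person1', 80, TRUE), ('person2', 70, FALSE)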
String queryString = - "INSERT INTO message (name, height, knowsJavascript) \n" + "INSERT INTO message (name, height, knows_javascript) \n" + "VALUES \n" + "('person1', 80, TRUE), \n" + "('person2', 70, FALSE)"; // Apply the PTransform to insert the rows - PCollection queryOutput = query(sqlEnv, pipeline, queryString); + query(sqlEnv, pipeline, queryString); pipeline.run().waitUntilFinish(Duration.standardMinutes(5)); eventsTopic .assertThatTopicEventuallyReceives( - matcherNameHeightKnowsJS("person1", 80, true), - matcherNameHeightKnowsJS("person2", 70, false)) - .waitForUpTo(Duration.standardSeconds(20)); + objectsProvider.matcherNameHeightKnowsJS("person1", 80, true), + objectsProvider.matcherNameHeightKnowsJS("person2", 70, false)) + .waitForUpTo(Duration.standardSeconds(40)); } @Test + @SuppressWarnings("unchecked") public void testSQLInsertRowsToPubsubWithTimestampAttributeFlat() throws Exception { + String createTableString = String.format( "CREATE EXTERNAL TABLE message (\n" + " event_timestamp TIMESTAMP, \n" + " name VARCHAR, \n" + " height INTEGER, \n" - + " knowsJavascript BOOLEAN \n" + + " knows_javascript BOOLEAN \n" + ") \n" + "TYPE '%s' \n" + "LOCATION '%s' \n" + "TBLPROPERTIES " + " '{ " + " %s " + + " \"protoClass\" : \"%s\", " + " \"deadLetterQueue\" : \"%s\"," + " \"timestampAttributeKey\" : \"ts\"" + " }'", tableProvider.getTableType(), eventsTopic.topicPath(), - payloadFormatParam, + payloadFormatParam(), + PayloadMessages.NameHeightKnowsJSMessage.class.getName(), dlqTopic.topicPath()); // Initialize SQL environment and create the pubsub table @@ -436,20 +621,18 @@ public void testSQLInsertRowsToPubsubWithTimestampAttributeFlat() throws Excepti String queryString = "INSERT INTO message " + "VALUES " - + "(TIMESTAMP '1970-01-01 00:00:00.001', 'person1', 80, TRUE), " + "(TIMESTAMP '1970-01-01 00:00:00.002', 'person2', 70, FALSE)"; - PCollection queryOutput = query(sqlEnv, pipeline, queryString); + query(sqlEnv, pipeline, queryString); pipeline.run().waitUntilFinish(Duration.standardMinutes(5)); eventsTopic - .assertThatTopicEventuallyReceives( - matcherTsNameHeightKnowsJS(ts(1), "person1", 80, true), - matcherTsNameHeightKnowsJS(ts(2), "person2", 70, false)) - .waitForUpTo(Duration.standardSeconds(20)); + .assertThatTopicEventuallyReceives(matcherTsNameHeightKnowsJS(ts(2), "person2", 70, false)) + .waitForUpTo(Duration.standardSeconds(40)); } @Test + @SuppressWarnings("unchecked") public void testSQLReadAndWriteWithSameFlatTableDefinition() throws Exception { // This test verifies that the same pubsub table definition can be used for both reading and // writing @@ -458,22 +641,34 @@ public void testSQLReadAndWriteWithSameFlatTableDefinition() throws Exception { // `javascript_people` String tblProperties = - getPayloadFormat() == null + objectsProvider.getPayloadFormat() == null ? "" - : String.format("TBLPROPERTIES '{\"format\": \"%s\"}'", getPayloadFormat()); + : String.format( + "TBLPROPERTIES '{ \"protoClass\" : \"%s\", \"format\": \"%s\" }'", + PayloadMessages.NameHeightKnowsJSMessage.class.getName(), + objectsProvider.getPayloadFormat()); + String createTableString = String.format( "CREATE EXTERNAL TABLE people (\n" + "event_timestamp TIMESTAMP, \n" + "name VARCHAR, \n" + "height INTEGER, \n" - + "knowsJavascript BOOLEAN \n" + + "knows_javascript BOOLEAN \n" + ") \n" + "TYPE '%s' \n" + "LOCATION '%s' \n" + "%s", tableProvider.getTableType(), eventsTopic.topicPath(), tblProperties); + String filteredTblProperties = + objectsProvider.getPayloadFormat() == null + ? 
"" + : String.format( + "TBLPROPERTIES '{ \"protoClass\" : \"%s\", \"format\": \"%s\" }'", + PayloadMessages.NameHeightMessage.class.getName(), + objectsProvider.getPayloadFormat()); + String createFilteredTableString = String.format( "CREATE EXTERNAL TABLE javascript_people (\n" @@ -484,7 +679,7 @@ public void testSQLReadAndWriteWithSameFlatTableDefinition() throws Exception { + "TYPE '%s' \n" + "LOCATION '%s' \n" + "%s", - tableProvider.getTableType(), filteredEventsTopic.topicPath(), tblProperties); + tableProvider.getTableType(), filteredEventsTopic.topicPath(), filteredTblProperties); // Initialize SQL environment and create the pubsub table BeamSqlEnv sqlEnv = BeamSqlEnv.inMemory(new PubsubTableProvider()); @@ -500,11 +695,11 @@ public void testSQLReadAndWriteWithSameFlatTableDefinition() throws Exception { + " name, \n" + " height \n" + " FROM people \n" - + " WHERE knowsJavascript \n" + + " WHERE knows_javascript \n" + ")"; String injectQueryString = - "INSERT INTO people (name, height, knowsJavascript) VALUES \n" + "INSERT INTO people (name, height, knows_javascript) VALUES \n" + "('person1', 80, TRUE), \n" + "('person2', 70, FALSE), \n" + "('person3', 60, TRUE), \n" @@ -529,19 +724,30 @@ public void testSQLReadAndWriteWithSameFlatTableDefinition() throws Exception { filteredEventsTopic .assertThatTopicEventuallyReceives( - matcherNameHeight("person1", 80), - matcherNameHeight("person3", 60), - matcherNameHeight("person5", 40)) + objectsProvider.matcherNameHeight("person1", 80), + objectsProvider.matcherNameHeight("person3", 60), + objectsProvider.matcherNameHeight("person5", 40)) .waitForUpTo(Duration.standardMinutes(5)); } - private CalciteConnection connect(PipelineOptions options, TableProvider... tableProviders) - throws SQLException { + @SuppressWarnings("unchecked") + private CalciteConnection connect(PipelineOptions options, TableProvider... tableProviders) { // HACK: PipelineOptions should expose a prominent method to do this reliably // The actual options are in the "options" field of the converted map Map argsMap = ((Map) MAPPER.convertValue(pipeline.getOptions(), Map.class).get("options")) .entrySet().stream() + .filter( + (entry) -> { + if (entry.getValue() instanceof List) { + if (!((List) entry.getValue()).isEmpty()) { + throw new IllegalArgumentException("Cannot encode list arguments"); + } + // We can encode empty lists, just omit them. + return false; + } + return true; + }) .collect(Collectors.toMap(Map.Entry::getKey, entry -> toArg(entry.getValue()))); InMemoryMetaStore inMemoryMetaStore = new InMemoryMetaStore(); @@ -569,50 +775,239 @@ private static String toArg(Object o) { } } + private String payloadFormatParam() { + return objectsProvider.getPayloadFormat() == null + ? "" + : String.format("\"format\" : \"%s\", ", objectsProvider.getPayloadFormat()); + } + private PCollection query(BeamSqlEnv sqlEnv, TestPipeline pipeline, String queryString) { return BeamSqlRelUtils.toPCollection(pipeline, sqlEnv.parseQuery(queryString)); } - protected Row row(Schema schema, Object... values) { + private Row row(Schema schema, Object... 
values) { return Row.withSchema(schema).addValues(values).build(); } - protected PubsubMessage message(Instant timestamp, byte[] payload) { - return new PubsubMessage(payload, ImmutableMap.of("ts", String.valueOf(timestamp.getMillis()))); + private static PubsubMessage message( + Instant timestamp, byte[] payload, Map attributes) { + return new PubsubMessage( + payload, + ImmutableMap.builder() + .putAll(attributes) + .put("ts", String.valueOf(timestamp.getMillis())) + .build()); } - protected Matcher matcherTsNameHeightKnowsJS( + private Matcher matcherTsNameHeightKnowsJS( Instant ts, String name, int height, boolean knowsJS) throws Exception { return allOf( - matcherNameHeightKnowsJS(name, height, knowsJS), + objectsProvider.matcherNameHeightKnowsJS(name, height, knowsJS), hasProperty("attributeMap", hasEntry("ts", String.valueOf(ts.getMillis())))); } - protected Matcher matcherPayload(Instant timestamp, String payload) { + private Matcher matcherPayload(Instant timestamp, String payload) { return allOf( hasProperty("payload", equalTo(payload.getBytes(StandardCharsets.US_ASCII))), hasProperty("attributeMap", hasEntry("ts", String.valueOf(timestamp.getMillis())))); } - protected Instant ts(long millis) { + private Instant ts(long millis) { return Instant.ofEpochMilli(millis); } - protected abstract String getPayloadFormat(); + private PubsubMessage messagePayload( + Instant timestamp, String payload, Map attributes) { + return message(timestamp, payload.getBytes(StandardCharsets.US_ASCII), attributes); + } + + private abstract static class PubsubObjectProvider implements Serializable { + protected abstract String getPayloadFormat(); + + protected abstract PubsubMessage messageIdName(Instant timestamp, int id, String name) + throws Exception; - protected abstract PubsubMessage messageIdName(Instant timestamp, int id, String name) - throws Exception; + protected abstract Matcher matcherNames(String name) throws Exception; - protected abstract Matcher matcherNames(String name) throws Exception; + protected abstract Matcher matcherNameHeightKnowsJS( + String name, int height, boolean knowsJS) throws Exception; - protected abstract Matcher matcherNameHeightKnowsJS( - String name, int height, boolean knowsJS) throws Exception; + protected abstract Matcher matcherNameHeight(String name, int height) + throws Exception; + } + + private static class PubsubProtoObjectProvider extends PubsubObjectProvider { - protected abstract Matcher matcherNameHeight(String name, int height) - throws Exception; + @Override + protected String getPayloadFormat() { + return "proto"; + } - private PubsubMessage messagePayload(Instant timestamp, String payload) { - return message(timestamp, payload.getBytes(StandardCharsets.US_ASCII)); + @Override + protected PubsubMessage messageIdName(Instant timestamp, int id, String name) { + + PayloadMessages.SimpleMessage.Builder simpleMessage = + PayloadMessages.SimpleMessage.newBuilder().setId(id).setName(name); + + return PubsubTableProviderIT.message( + timestamp, + simpleMessage.build().toByteArray(), + ImmutableMap.of(name, Integer.toString(id))); + } + + @Override + protected Matcher matcherNames(String name) throws IOException { + + PayloadMessages.NameMessage.Builder nameMessage = + PayloadMessages.NameMessage.newBuilder().setName(name); + + return hasProperty("payload", equalTo(nameMessage.build().toByteArray())); + } + + @Override + protected Matcher matcherNameHeightKnowsJS( + String name, int height, boolean knowsJS) throws IOException { + + 
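+    // Expects the Pub/Sub message payload to equal the serialized NameHeightKnowsJSMessage
+    // proto built below, i.e. the proto encoding is compared byte-for-byte.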
PayloadMessages.NameHeightKnowsJSMessage.Builder nameHeightKnowsJSMessage = + PayloadMessages.NameHeightKnowsJSMessage.newBuilder() + .setHeight(height) + .setName(name) + .setKnowsJavascript(knowsJS); + + return hasProperty("payload", equalTo(nameHeightKnowsJSMessage.build().toByteArray())); + } + + @Override + protected Matcher matcherNameHeight(String name, int height) throws IOException { + + PayloadMessages.NameHeightMessage.Builder nameHeightMessage = + PayloadMessages.NameHeightMessage.newBuilder().setName(name).setHeight(height); + + return hasProperty("payload", equalTo(nameHeightMessage.build().toByteArray())); + } + } + + private static class PubsubJsonObjectProvider extends PubsubObjectProvider { + + // Pubsub table provider should default to json + @Override + protected String getPayloadFormat() { + return null; + } + + @Override + protected PubsubMessage messageIdName(Instant timestamp, int id, String name) { + String jsonString = "{ \"id\" : " + id + ", \"name\" : \"" + name + "\" }"; + return message(timestamp, jsonString, ImmutableMap.of(name, Integer.toString(id))); + } + + @Override + protected Matcher matcherNames(String name) throws IOException { + return hasProperty("payload", toJsonByteLike(String.format("{\"name\":\"%s\"}", name))); + } + + @Override + protected Matcher matcherNameHeightKnowsJS( + String name, int height, boolean knowsJS) throws IOException { + String jsonString = + String.format( + "{\"name\":\"%s\", \"height\": %s, \"knows_javascript\": %s}", name, height, knowsJS); + + return hasProperty("payload", toJsonByteLike(jsonString)); + } + + @Override + protected Matcher matcherNameHeight(String name, int height) throws IOException { + String jsonString = String.format("{\"name\":\"%s\", \"height\": %s}", name, height); + return hasProperty("payload", toJsonByteLike(jsonString)); + } + + private PubsubMessage message( + Instant timestamp, String jsonPayload, Map attributes) { + return PubsubTableProviderIT.message(timestamp, jsonPayload.getBytes(UTF_8), attributes); + } + + private Matcher toJsonByteLike(String jsonString) throws IOException { + return jsonBytesLike(jsonString); + } + } + + private static class PubsubAvroObjectProvider extends PubsubObjectProvider { + private static final Schema NAME_HEIGHT_KNOWS_JS_SCHEMA = + Schema.builder() + .addNullableField("name", Schema.FieldType.STRING) + .addNullableField("height", Schema.FieldType.INT32) + .addNullableField("knows_javascript", Schema.FieldType.BOOLEAN) + .build(); + + private static final Schema NAME_HEIGHT_SCHEMA = + Schema.builder() + .addNullableField("name", Schema.FieldType.STRING) + .addNullableField("height", Schema.FieldType.INT32) + .build(); + + @Override + protected String getPayloadFormat() { + return "avro"; + } + + @Override + protected PubsubMessage messageIdName(Instant timestamp, int id, String name) + throws IOException { + byte[] encodedRecord = + createEncodedGenericRecord( + PAYLOAD_SCHEMA, + org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList.of( + id, name)); + return message(timestamp, encodedRecord, ImmutableMap.of(name, Integer.toString(id))); + } + + @Override + protected Matcher matcherNames(String name) throws IOException { + Schema schema = Schema.builder().addStringField("name").build(); + byte[] encodedRecord = + createEncodedGenericRecord( + schema, + org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList.of( + name)); + return hasProperty("payload", equalTo(encodedRecord)); + } + + @Override + protected 
Matcher matcherNameHeight(String name, int height) throws IOException { + byte[] encodedRecord = + createEncodedGenericRecord( + NAME_HEIGHT_SCHEMA, + org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList.of( + name, height)); + return hasProperty("payload", equalTo(encodedRecord)); + } + + @Override + protected Matcher matcherNameHeightKnowsJS( + String name, int height, boolean knowsJS) throws IOException { + byte[] encodedRecord = + createEncodedGenericRecord( + NAME_HEIGHT_KNOWS_JS_SCHEMA, + org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList.of( + name, height, knowsJS)); + return hasProperty("payload", equalTo(encodedRecord)); + } + + private byte[] createEncodedGenericRecord(Schema beamSchema, List values) + throws IOException { + org.apache.avro.Schema avroSchema = AvroUtils.toAvroSchema(beamSchema); + GenericRecordBuilder builder = new GenericRecordBuilder(avroSchema); + List fields = avroSchema.getFields(); + for (int i = 0; i < fields.size(); ++i) { + builder.set(fields.get(i), values.get(i)); + } + AvroCoder coder = AvroCoder.of(avroSchema); + ByteArrayOutputStream out = new ByteArrayOutputStream(); + + coder.encode(builder.build(), out); + return out.toByteArray(); + } } } diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsublite/PubsubLiteTableProviderTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsublite/PubsubLiteTableProviderTest.java new file mode 100644 index 000000000000..e3c172f6c0ea --- /dev/null +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsublite/PubsubLiteTableProviderTest.java @@ -0,0 +1,239 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.sql.meta.provider.pubsublite; + +import static com.google.cloud.pubsublite.internal.testing.UnitTestExamples.example; +import static org.junit.Assert.assertThrows; +import static org.junit.Assert.assertTrue; + +import com.alibaba.fastjson.JSONObject; +import com.google.api.gax.rpc.ApiException; +import com.google.cloud.pubsublite.SubscriptionPath; +import com.google.cloud.pubsublite.TopicPath; +import java.util.Map; +import java.util.function.Function; +import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable; +import org.apache.beam.sdk.extensions.sql.meta.Table; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +@RunWith(JUnit4.class) +public class PubsubLiteTableProviderTest { + private static final PubsubLiteTableProvider PROVIDER = new PubsubLiteTableProvider(); + private static final Schema FULL_WRITE_SCHEMA = + Schema.builder() + .addByteArrayField(RowHandler.MESSAGE_KEY_FIELD) + .addDateTimeField(RowHandler.EVENT_TIMESTAMP_FIELD) + .addArrayField( + RowHandler.ATTRIBUTES_FIELD, FieldType.row(RowHandler.ATTRIBUTES_ENTRY_SCHEMA)) + .addByteArrayField(RowHandler.PAYLOAD_FIELD) + .build(); + private static final Schema FULL_READ_SCHEMA = + Schema.builder() + .addByteArrayField(RowHandler.MESSAGE_KEY_FIELD) + .addDateTimeField(RowHandler.EVENT_TIMESTAMP_FIELD) + .addArrayField( + RowHandler.ATTRIBUTES_FIELD, FieldType.row(RowHandler.ATTRIBUTES_ENTRY_SCHEMA)) + .addByteArrayField(RowHandler.PAYLOAD_FIELD) + .addDateTimeField(RowHandler.PUBLISH_TIMESTAMP_FIELD) + .build(); + + private static BeamSqlTable makeTable( + Schema schema, String location, Map properties) { + Table table = + Table.builder() + .type(PROVIDER.getTableType()) + .name("testTable") + .schema(schema) + .location(location) + .properties(new JSONObject().fluentPutAll(properties)) + .build(); + return PROVIDER.buildBeamSqlTable(table); + } + + @Test + public void invalidSchemas() { + Function tableMaker = + schema -> makeTable(schema, example(SubscriptionPath.class).toString(), ImmutableMap.of()); + // No payload + assertThrows( + IllegalArgumentException.class, + () -> + tableMaker.apply( + Schema.builder().addDateTimeField(RowHandler.EVENT_TIMESTAMP_FIELD).build())); + // Bad payload type + assertThrows( + IllegalArgumentException.class, + () -> + tableMaker.apply(Schema.builder().addDateTimeField(RowHandler.PAYLOAD_FIELD).build())); + // Bad field name + assertThrows( + IllegalArgumentException.class, + () -> + tableMaker.apply( + Schema.builder() + .addByteArrayField(RowHandler.PAYLOAD_FIELD) + .addByteArrayField("my-random-field") + .build())); + // Bad attributes type + assertThrows( + IllegalArgumentException.class, + () -> + tableMaker.apply( + Schema.builder() + .addByteArrayField(RowHandler.PAYLOAD_FIELD) + .addByteArrayField(RowHandler.ATTRIBUTES_FIELD) + .build())); + // Bad attributes field names + assertThrows( + IllegalArgumentException.class, + () -> + tableMaker.apply( + Schema.builder() + .addByteArrayField(RowHandler.PAYLOAD_FIELD) + .addRowField( + RowHandler.ATTRIBUTES_FIELD, + Schema.builder() + .addStringField(RowHandler.ATTRIBUTES_KEY_FIELD) + .addArrayField("badValues", FieldType.BYTES) + .build()) + .build())); + // Bad event timestamp type + assertThrows( + IllegalArgumentException.class, + () -> + tableMaker.apply( + 
Schema.builder() + .addByteArrayField(RowHandler.PAYLOAD_FIELD) + .addByteArrayField(RowHandler.EVENT_TIMESTAMP_FIELD) + .build())); + // Bad publish timestamp type + assertThrows( + IllegalArgumentException.class, + () -> + tableMaker.apply( + Schema.builder() + .addByteArrayField(RowHandler.PAYLOAD_FIELD) + .addByteArrayField(RowHandler.PUBLISH_TIMESTAMP_FIELD) + .build())); + // Bad message key type + assertThrows( + IllegalArgumentException.class, + () -> + tableMaker.apply( + Schema.builder() + .addByteArrayField(RowHandler.PAYLOAD_FIELD) + .addStringField(RowHandler.MESSAGE_KEY_FIELD) + .build())); + } + + @Test + public void bytesWithFormatInvalid() { + assertThrows( + IllegalArgumentException.class, + () -> + makeTable( + Schema.builder().addByteArrayField(RowHandler.PAYLOAD_FIELD).build(), + example(SubscriptionPath.class).toString(), + ImmutableMap.of("format", "json"))); + } + + @Test + public void invalidLocations() { + Function tableMaker = + location -> makeTable(FULL_WRITE_SCHEMA, location, ImmutableMap.of()); + + // Just nonsense + assertThrows(IllegalArgumentException.class, () -> tableMaker.apply("my-location")); + // CPS topic + assertThrows(ApiException.class, () -> tableMaker.apply("projects/abc/topics/def")); + // CPS subscription + assertThrows(ApiException.class, () -> tableMaker.apply("projects/abc/subscriptions/def")); + } + + @Test + public void validTopicTables() { + BeamSqlTable basic = + makeTable(FULL_WRITE_SCHEMA, example(TopicPath.class).toString(), ImmutableMap.of()); + assertTrue(basic instanceof PubsubLiteTopicTable); + BeamSqlTable row = + makeTable( + Schema.builder() + .addRowField( + RowHandler.PAYLOAD_FIELD, Schema.builder().addStringField("abc").build()) + .build(), + example(TopicPath.class).toString(), + ImmutableMap.of("format", "json")); // Defaults to json + assertTrue(row instanceof PubsubLiteTopicTable); + BeamSqlTable dlq = + makeTable( + Schema.builder() + .addRowField( + RowHandler.PAYLOAD_FIELD, Schema.builder().addStringField("abc").build()) + .build(), + example(TopicPath.class).toString(), + ImmutableMap.of( + "deadLetterQueue", "pubsub:projects/abc/topics/def")); // Defaults to json + assertTrue(dlq instanceof PubsubLiteTopicTable); + } + + @Test + @SuppressWarnings("argument.type.incompatible") + public void topicTableCannotRead() { + BeamSqlTable basic = + makeTable(FULL_WRITE_SCHEMA, example(TopicPath.class).toString(), ImmutableMap.of()); + assertThrows(UnsupportedOperationException.class, () -> basic.buildIOReader(null)); + } + + @Test + public void validSubscriptionTables() { + BeamSqlTable basic = + makeTable(FULL_READ_SCHEMA, example(SubscriptionPath.class).toString(), ImmutableMap.of()); + assertTrue(basic instanceof PubsubLiteSubscriptionTable); + BeamSqlTable row = + makeTable( + Schema.builder() + .addRowField( + RowHandler.PAYLOAD_FIELD, Schema.builder().addStringField("abc").build()) + .build(), + example(SubscriptionPath.class).toString(), + ImmutableMap.of("format", "json")); + assertTrue(row instanceof PubsubLiteSubscriptionTable); + BeamSqlTable dlq = + makeTable( + Schema.builder() + .addRowField( + RowHandler.PAYLOAD_FIELD, Schema.builder().addStringField("abc").build()) + .build(), + example(SubscriptionPath.class).toString(), + ImmutableMap.of("format", "json", "deadLetterQueue", "pubsub:projects/abc/topics/def")); + assertTrue(dlq instanceof PubsubLiteSubscriptionTable); + } + + @Test + @SuppressWarnings("argument.type.incompatible") + public void subscriptionTableCannotWrite() { + BeamSqlTable basic = + 
makeTable(FULL_READ_SCHEMA, example(SubscriptionPath.class).toString(), ImmutableMap.of()); + assertThrows(UnsupportedOperationException.class, () -> basic.buildIOWriter(null)); + } +} diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsublite/RowHandlerTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsublite/RowHandlerTest.java new file mode 100644 index 000000000000..91bba72808cb --- /dev/null +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsublite/RowHandlerTest.java @@ -0,0 +1,237 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.meta.provider.pubsublite; + +import static java.nio.charset.StandardCharsets.UTF_8; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertThrows; +import static org.mockito.ArgumentMatchers.any; +import static org.mockito.Mockito.doThrow; +import static org.mockito.MockitoAnnotations.openMocks; + +import com.google.cloud.pubsublite.proto.AttributeValues; +import com.google.cloud.pubsublite.proto.PubSubMessage; +import com.google.cloud.pubsublite.proto.SequencedMessage; +import com.google.protobuf.ByteString; +import com.google.protobuf.util.Timestamps; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.schemas.io.payloads.PayloadSerializer; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.joda.time.Instant; +import org.junit.Before; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; +import org.mockito.Mock; + +@RunWith(JUnit4.class) +@SuppressWarnings("initialization.fields.uninitialized") +public class RowHandlerTest { + private static final Schema FULL_WRITE_SCHEMA = + Schema.builder() + .addByteArrayField(RowHandler.MESSAGE_KEY_FIELD) + .addField(RowHandler.EVENT_TIMESTAMP_FIELD, FieldType.DATETIME.withNullable(true)) + .addArrayField( + RowHandler.ATTRIBUTES_FIELD, FieldType.row(RowHandler.ATTRIBUTES_ENTRY_SCHEMA)) + .addByteArrayField(RowHandler.PAYLOAD_FIELD) + .build(); + private static final Schema FULL_READ_SCHEMA = + Schema.builder() + .addByteArrayField(RowHandler.MESSAGE_KEY_FIELD) + .addField(RowHandler.EVENT_TIMESTAMP_FIELD, FieldType.DATETIME.withNullable(true)) + .addArrayField( + RowHandler.ATTRIBUTES_FIELD, FieldType.row(RowHandler.ATTRIBUTES_ENTRY_SCHEMA)) + .addByteArrayField(RowHandler.PAYLOAD_FIELD) + .addDateTimeField(RowHandler.PUBLISH_TIMESTAMP_FIELD) + .build(); + + @Mock public PayloadSerializer serializer; + + @Before + public void setUp() { + 
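+    // Initializes the @Mock-annotated PayloadSerializer field before each test.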
openMocks(this); + } + + @Test + public void constructionFailures() { + // Row payload without serializer + assertThrows( + IllegalArgumentException.class, + () -> + new RowHandler( + Schema.builder() + .addRowField( + RowHandler.PAYLOAD_FIELD, Schema.builder().addStringField("abc").build()) + .build())); + // Bytes payload with serializer + assertThrows( + IllegalArgumentException.class, + () -> + new RowHandler( + Schema.builder().addByteArrayField(RowHandler.PAYLOAD_FIELD).build(), serializer)); + } + + @Test + public void messageToRowFailures() { + { + Schema payloadSchema = Schema.builder().addStringField("def").build(); + RowHandler rowHandler = + new RowHandler( + Schema.builder().addRowField(RowHandler.PAYLOAD_FIELD, payloadSchema).build(), + serializer); + doThrow(new IllegalArgumentException("")).when(serializer).deserialize(any()); + assertThrows( + IllegalArgumentException.class, + () -> rowHandler.messageToRow(SequencedMessage.getDefaultInstance())); + } + // Schema requires event time, missing in message + { + RowHandler rowHandler = + new RowHandler( + Schema.builder() + .addByteArrayField(RowHandler.PAYLOAD_FIELD) + .addDateTimeField(RowHandler.EVENT_TIMESTAMP_FIELD) + .build()); + assertThrows( + IllegalArgumentException.class, + () -> rowHandler.messageToRow(SequencedMessage.getDefaultInstance())); + } + } + + @Test + public void rowToMessageFailures() { + Schema payloadSchema = Schema.builder().addStringField("def").build(); + Schema schema = Schema.builder().addRowField(RowHandler.PAYLOAD_FIELD, payloadSchema).build(); + RowHandler rowHandler = new RowHandler(schema, serializer); + // badRow cannot be cast to schema + Schema badRowSchema = Schema.builder().addStringField("xxx").build(); + Row badRow = + Row.withSchema(badRowSchema).attachValues(Row.withSchema(badRowSchema).attachValues("abc")); + assertThrows(IllegalArgumentException.class, () -> rowHandler.rowToMessage(badRow)); + + Row goodRow = + Row.withSchema(schema).addValue(Row.withSchema(payloadSchema).attachValues("abc")).build(); + doThrow(new IllegalArgumentException("")).when(serializer).serialize(any()); + assertThrows(IllegalArgumentException.class, () -> rowHandler.rowToMessage(goodRow)); + } + + @Test + public void reorderRowToMessage() { + Schema schema = + Schema.builder() + .addByteArrayField(RowHandler.MESSAGE_KEY_FIELD) + .addByteArrayField(RowHandler.PAYLOAD_FIELD) + .build(); + Schema rowSchema = + Schema.builder() + .addByteArrayField(RowHandler.PAYLOAD_FIELD) + .addByteArrayField(RowHandler.MESSAGE_KEY_FIELD) + .build(); + RowHandler rowHandler = new RowHandler(schema); + Row row = Row.withSchema(rowSchema).attachValues("abc".getBytes(UTF_8), "def".getBytes(UTF_8)); + PubSubMessage expected = + PubSubMessage.newBuilder() + .setData(ByteString.copyFromUtf8("abc")) + .setKey(ByteString.copyFromUtf8("def")) + .build(); + assertEquals(expected, rowHandler.rowToMessage(row)); + } + + @Test + public void fullRowToMessage() { + RowHandler rowHandler = new RowHandler(FULL_WRITE_SCHEMA); + Instant now = Instant.now(); + Row row = + Row.withSchema(FULL_WRITE_SCHEMA) + .withFieldValue(RowHandler.MESSAGE_KEY_FIELD, "val1".getBytes(UTF_8)) + .withFieldValue(RowHandler.PAYLOAD_FIELD, "val2".getBytes(UTF_8)) + .withFieldValue(RowHandler.EVENT_TIMESTAMP_FIELD, now) + .withFieldValue( + RowHandler.ATTRIBUTES_FIELD, + ImmutableList.of( + Row.withSchema(RowHandler.ATTRIBUTES_ENTRY_SCHEMA) + .attachValues( + "key1", + ImmutableList.of("attr1".getBytes(UTF_8), "attr2".getBytes(UTF_8))), + 
Row.withSchema(RowHandler.ATTRIBUTES_ENTRY_SCHEMA) + .attachValues("key2", ImmutableList.of("attr3".getBytes(UTF_8))))) + .build(); + PubSubMessage expected = + PubSubMessage.newBuilder() + .setKey(ByteString.copyFromUtf8("val1")) + .setData(ByteString.copyFromUtf8("val2")) + .setEventTime(Timestamps.fromMillis(now.getMillis())) + .putAttributes( + "key1", + AttributeValues.newBuilder() + .addValues(ByteString.copyFromUtf8("attr1")) + .addValues(ByteString.copyFromUtf8("attr2")) + .build()) + .putAttributes( + "key2", + AttributeValues.newBuilder().addValues(ByteString.copyFromUtf8("attr3")).build()) + .build(); + assertEquals(expected, rowHandler.rowToMessage(row)); + } + + @Test + public void fullMessageToRow() { + RowHandler rowHandler = new RowHandler(FULL_READ_SCHEMA); + Instant event = Instant.now(); + Instant publish = Instant.now(); + PubSubMessage userMessage = + PubSubMessage.newBuilder() + .setKey(ByteString.copyFromUtf8("val1")) + .setData(ByteString.copyFromUtf8("val2")) + .setEventTime(Timestamps.fromMillis(event.getMillis())) + .putAttributes( + "key1", + AttributeValues.newBuilder() + .addValues(ByteString.copyFromUtf8("attr1")) + .addValues(ByteString.copyFromUtf8("attr2")) + .build()) + .putAttributes( + "key2", + AttributeValues.newBuilder().addValues(ByteString.copyFromUtf8("attr3")).build()) + .build(); + SequencedMessage sequencedMessage = + SequencedMessage.newBuilder() + .setMessage(userMessage) + .setPublishTime(Timestamps.fromMillis(publish.getMillis())) + .build(); + Row expected = + Row.withSchema(FULL_READ_SCHEMA) + .withFieldValue(RowHandler.MESSAGE_KEY_FIELD, "val1".getBytes(UTF_8)) + .withFieldValue(RowHandler.PAYLOAD_FIELD, "val2".getBytes(UTF_8)) + .withFieldValue(RowHandler.EVENT_TIMESTAMP_FIELD, event) + .withFieldValue(RowHandler.PUBLISH_TIMESTAMP_FIELD, publish) + .withFieldValue( + RowHandler.ATTRIBUTES_FIELD, + ImmutableList.of( + Row.withSchema(RowHandler.ATTRIBUTES_ENTRY_SCHEMA) + .attachValues( + "key1", + ImmutableList.of("attr1".getBytes(UTF_8), "attr2".getBytes(UTF_8))), + Row.withSchema(RowHandler.ATTRIBUTES_ENTRY_SCHEMA) + .attachValues("key2", ImmutableList.of("attr3".getBytes(UTF_8))))) + .build(); + assertEquals(expected, rowHandler.messageToRow(sequencedMessage)); + } +} diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProviderTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProviderTest.java index 01ac8e35e5b7..fa0a22db22f1 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProviderTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProviderTest.java @@ -28,7 +28,7 @@ import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; import org.joda.time.Duration; import org.junit.Before; import org.junit.Rule; @@ -37,9 +37,6 @@ import org.junit.runners.JUnit4; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TestTableProviderTest { private static final Schema BASIC_SCHEMA = Schema.builder().addInt32Field("id").addStringField("name").build(); diff --git 
a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProviderWithFilterAndProjectPushDown.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProviderWithFilterAndProjectPushDown.java index 950c3118373a..eade9c8e90a7 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProviderWithFilterAndProjectPushDown.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProviderWithFilterAndProjectPushDown.java @@ -41,14 +41,10 @@ import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.CalcMergeRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.FilterCalcMergeRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.FilterToCalcRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.ProjectCalcMergeRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.ProjectToCalcRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RuleSets; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.rules.CoreRules; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RuleSets; import org.joda.time.Duration; import org.junit.Before; import org.junit.Rule; @@ -57,9 +53,6 @@ import org.junit.runners.JUnit4; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TestTableProviderWithFilterAndProjectPushDown { private static final Schema BASIC_SCHEMA = Schema.builder() @@ -72,12 +65,12 @@ public class TestTableProviderWithFilterAndProjectPushDown { private static final List rulesWithPushDown = ImmutableList.of( BeamCalcRule.INSTANCE, - FilterCalcMergeRule.INSTANCE, - ProjectCalcMergeRule.INSTANCE, + CoreRules.FILTER_CALC_MERGE, + CoreRules.PROJECT_CALC_MERGE, BeamIOPushDownRule.INSTANCE, - FilterToCalcRule.INSTANCE, - ProjectToCalcRule.INSTANCE, - CalcMergeRule.INSTANCE); + CoreRules.FILTER_TO_CALC, + CoreRules.PROJECT_TO_CALC, + CoreRules.CALC_MERGE); private BeamSqlEnv sqlEnv; @Rule public TestPipeline pipeline = TestPipeline.create(); diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProviderWithFilterPushDown.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProviderWithFilterPushDown.java index 738c9dfe33a5..029c4fad3183 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProviderWithFilterPushDown.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProviderWithFilterPushDown.java @@ -42,15 +42,11 @@ import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.values.PCollection; import 
org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.CalcMergeRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.FilterCalcMergeRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.FilterToCalcRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.ProjectCalcMergeRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.ProjectToCalcRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RuleSets; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Calc; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.rules.CoreRules; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RuleSets; import org.hamcrest.collection.IsIterableContainingInAnyOrder; import org.joda.time.Duration; import org.junit.Before; @@ -60,9 +56,6 @@ import org.junit.runners.JUnit4; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TestTableProviderWithFilterPushDown { private static final Schema BASIC_SCHEMA = Schema.builder() @@ -75,12 +68,12 @@ public class TestTableProviderWithFilterPushDown { private static final List rulesWithPushDown = ImmutableList.of( BeamCalcRule.INSTANCE, - FilterCalcMergeRule.INSTANCE, - ProjectCalcMergeRule.INSTANCE, + CoreRules.FILTER_CALC_MERGE, + CoreRules.PROJECT_CALC_MERGE, BeamIOPushDownRule.INSTANCE, - FilterToCalcRule.INSTANCE, - ProjectToCalcRule.INSTANCE, - CalcMergeRule.INSTANCE); + CoreRules.FILTER_TO_CALC, + CoreRules.PROJECT_TO_CALC, + CoreRules.CALC_MERGE); private BeamSqlEnv sqlEnv; @Rule public TestPipeline pipeline = TestPipeline.create(); diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProviderWithProjectPushDown.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProviderWithProjectPushDown.java index d37c4b2259c0..a7f3bc9f733c 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProviderWithProjectPushDown.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProviderWithProjectPushDown.java @@ -41,14 +41,10 @@ import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.CalcMergeRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.FilterCalcMergeRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.FilterToCalcRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.ProjectCalcMergeRule; -import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.ProjectToCalcRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RuleSets; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.rules.CoreRules; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RuleSets; import org.joda.time.Duration; import org.junit.Before; import org.junit.Rule; @@ -57,9 +53,6 @@ import org.junit.runners.JUnit4; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TestTableProviderWithProjectPushDown { private static final Schema BASIC_SCHEMA = Schema.builder() @@ -71,12 +64,12 @@ public class TestTableProviderWithProjectPushDown { private static final List rulesWithPushDown = ImmutableList.of( BeamCalcRule.INSTANCE, - FilterCalcMergeRule.INSTANCE, - ProjectCalcMergeRule.INSTANCE, + CoreRules.FILTER_CALC_MERGE, + CoreRules.PROJECT_CALC_MERGE, BeamIOPushDownRule.INSTANCE, - FilterToCalcRule.INSTANCE, - ProjectToCalcRule.INSTANCE, - CalcMergeRule.INSTANCE); + CoreRules.FILTER_TO_CALC, + CoreRules.PROJECT_TO_CALC, + CoreRules.CALC_MERGE); private BeamSqlEnv sqlEnv; @Rule public TestPipeline pipeline = TestPipeline.create(); diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/text/TextTableProviderTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/text/TextTableProviderTest.java index 1c1b6bab2420..4b7f77c2563b 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/text/TextTableProviderTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/text/TextTableProviderTest.java @@ -17,13 +17,12 @@ */ package org.apache.beam.sdk.extensions.sql.meta.provider.text; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; -import static org.junit.Assert.assertThat; import java.io.File; import java.nio.file.Files; -import org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv; -import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils; +import org.apache.beam.sdk.extensions.sql.SqlTransform; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.testing.PAssert; import org.apache.beam.sdk.testing.TestPipeline; @@ -34,7 +33,7 @@ import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; import org.apache.beam.sdk.values.TypeDescriptors; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Charsets; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Charsets; import org.junit.Rule; import org.junit.Test; import org.junit.rules.TemporaryFolder; @@ -83,15 +82,13 @@ public void testLegacyDefaultCsv() throws Exception { Files.write( tempFolder.newFile("test.csv").toPath(), "hello,13\n\ngoodbye,42\n".getBytes(Charsets.UTF_8)); - - BeamSqlEnv env = BeamSqlEnv.inMemory(new TextTableProvider()); - env.executeDdl( + String query = "SELECT * FROM test"; + String ddl = String.format( "CREATE EXTERNAL TABLE test %s TYPE text LOCATION '%s/*'", - SQL_CSV_SCHEMA, tempFolder.getRoot())); + SQL_CSV_SCHEMA, tempFolder.getRoot()); - PCollection rows = - BeamSqlRelUtils.toPCollection(pipeline, 
env.parseQuery("SELECT * FROM test")); + PCollection rows = pipeline.apply(SqlTransform.query(query).withDdlString(ddl)); PAssert.that(rows) .containsInAnyOrder( @@ -110,14 +107,13 @@ public void testLegacyTdfCsv() throws Exception { tempFolder.newFile("test.csv").toPath(), "hello\t13\n\ngoodbye\t42\n".getBytes(Charsets.UTF_8)); - BeamSqlEnv env = BeamSqlEnv.inMemory(new TextTableProvider()); - env.executeDdl( + String query = "SELECT * FROM test"; + String ddl = String.format( "CREATE EXTERNAL TABLE test %s TYPE text LOCATION '%s/*' TBLPROPERTIES '{\"format\":\"TDF\"}'", - SQL_CSV_SCHEMA, tempFolder.getRoot())); + SQL_CSV_SCHEMA, tempFolder.getRoot()); - PCollection rows = - BeamSqlRelUtils.toPCollection(pipeline, env.parseQuery("SELECT * FROM test")); + PCollection rows = pipeline.apply(SqlTransform.query(query).withDdlString(ddl)); rows.apply( MapElements.into(TypeDescriptors.voids()) @@ -144,14 +140,13 @@ public void testExplicitCsv() throws Exception { tempFolder.newFile("test.csv").toPath(), "hello,13\n\ngoodbye,42\n".getBytes(Charsets.UTF_8)); - BeamSqlEnv env = BeamSqlEnv.inMemory(new TextTableProvider()); - env.executeDdl( + String query = "SELECT * FROM test"; + String ddl = String.format( "CREATE EXTERNAL TABLE test %s TYPE text LOCATION '%s/*' TBLPROPERTIES '{\"format\":\"csv\"}'", - SQL_CSV_SCHEMA, tempFolder.getRoot())); + SQL_CSV_SCHEMA, tempFolder.getRoot()); - PCollection rows = - BeamSqlRelUtils.toPCollection(pipeline, env.parseQuery("SELECT * FROM test")); + PCollection rows = pipeline.apply(SqlTransform.query(query).withDdlString(ddl)); PAssert.that(rows) .containsInAnyOrder( @@ -172,15 +167,14 @@ public void testExplicitCsvExcel() throws Exception { Files.write( tempFolder.newFile("test.csv").toPath(), "hello\n\ngoodbye\n".getBytes(Charsets.UTF_8)); - BeamSqlEnv env = BeamSqlEnv.inMemory(new TextTableProvider()); - env.executeDdl( + String query = "SELECT * FROM test"; + String ddl = String.format( "CREATE EXTERNAL TABLE test %s TYPE text LOCATION '%s/*' " + "TBLPROPERTIES '{\"format\":\"csv\", \"csvFormat\":\"Excel\"}'", - SINGLE_STRING_SQL_SCHEMA, tempFolder.getRoot())); + SINGLE_STRING_SQL_SCHEMA, tempFolder.getRoot()); - PCollection rows = - BeamSqlRelUtils.toPCollection(pipeline, env.parseQuery("SELECT * FROM test")); + PCollection rows = pipeline.apply(SqlTransform.query(query).withDdlString(ddl)); PAssert.that(rows) .containsInAnyOrder( @@ -199,14 +193,13 @@ public void testLines() throws Exception { Files.write( tempFolder.newFile("test.csv").toPath(), "hello,13\ngoodbye,42\n".getBytes(Charsets.UTF_8)); - BeamSqlEnv env = BeamSqlEnv.inMemory(new TextTableProvider()); - env.executeDdl( + String query = "SELECT * FROM test"; + String ddl = String.format( "CREATE EXTERNAL TABLE test %s TYPE text LOCATION '%s/*' TBLPROPERTIES '{\"format\":\"lines\"}'", - SQL_LINES_SCHEMA, tempFolder.getRoot())); + SQL_LINES_SCHEMA, tempFolder.getRoot()); - PCollection rows = - BeamSqlRelUtils.toPCollection(pipeline, env.parseQuery("SELECT * FROM test")); + PCollection rows = pipeline.apply(SqlTransform.query(query).withDdlString(ddl)); PAssert.that(rows) .containsInAnyOrder( @@ -219,14 +212,13 @@ public void testLines() throws Exception { public void testJson() throws Exception { Files.write(tempFolder.newFile("test.json").toPath(), JSON_TEXT.getBytes(Charsets.UTF_8)); - BeamSqlEnv env = BeamSqlEnv.inMemory(new TextTableProvider()); - env.executeDdl( + String query = "SELECT * FROM test"; + String ddl = String.format( "CREATE EXTERNAL TABLE test %s TYPE text LOCATION '%s/*' 
TBLPROPERTIES '{\"format\":\"json\"}'", - SQL_JSON_SCHEMA, tempFolder.getRoot())); + SQL_JSON_SCHEMA, tempFolder.getRoot()); - PCollection rows = - BeamSqlRelUtils.toPCollection(pipeline, env.parseQuery("SELECT * FROM test")); + PCollection rows = pipeline.apply(SqlTransform.query(query).withDdlString(ddl)); PAssert.that(rows) .containsInAnyOrder(Row.withSchema(JSON_SCHEMA).addValues("Jack", 13).build()); @@ -239,15 +231,14 @@ public void testInvalidJson() throws Exception { Files.write( tempFolder.newFile("test.json").toPath(), INVALID_JSON_TEXT.getBytes(Charsets.UTF_8)); - BeamSqlEnv env = BeamSqlEnv.inMemory(new TextTableProvider()); - env.executeDdl( + String query = "SELECT * FROM test"; + String ddl = String.format( "CREATE EXTERNAL TABLE test %s TYPE text LOCATION '%s/*' " + "TBLPROPERTIES '{\"format\":\"json\", \"deadLetterFile\": \"%s\"}'", - SQL_JSON_SCHEMA, tempFolder.getRoot(), deadLetterFile.getAbsoluteFile())); + SQL_JSON_SCHEMA, tempFolder.getRoot(), deadLetterFile.getAbsoluteFile()); - PCollection rows = - BeamSqlRelUtils.toPCollection(pipeline, env.parseQuery("SELECT * FROM test")); + PCollection rows = pipeline.apply(SqlTransform.query(query).withDdlString(ddl)); PAssert.that(rows).empty(); @@ -261,14 +252,14 @@ public void testInvalidJson() throws Exception { @Test public void testWriteLines() throws Exception { File destinationFile = new File(tempFolder.getRoot(), "lines-outputs"); - BeamSqlEnv env = BeamSqlEnv.inMemory(new TextTableProvider()); - env.executeDdl( + + String query = "INSERT INTO test VALUES ('hello'), ('goodbye')"; + String ddl = String.format( "CREATE EXTERNAL TABLE test %s TYPE text LOCATION '%s' TBLPROPERTIES '{\"format\":\"lines\"}'", - SQL_LINES_SCHEMA, destinationFile.getAbsolutePath())); + SQL_LINES_SCHEMA, destinationFile.getAbsolutePath()); - BeamSqlRelUtils.toPCollection( - pipeline, env.parseQuery("INSERT INTO test VALUES ('hello'), ('goodbye')")); + pipeline.apply(SqlTransform.query(query).withDdlString(ddl)); pipeline.run(); assertThat( @@ -280,16 +271,15 @@ public void testWriteLines() throws Exception { @Test public void testWriteCsv() throws Exception { File destinationFile = new File(tempFolder.getRoot(), "csv-outputs"); - BeamSqlEnv env = BeamSqlEnv.inMemory(new TextTableProvider()); // NumberedShardedFile - env.executeDdl( + String query = "INSERT INTO test VALUES ('hello', 42), ('goodbye', 13)"; + String ddl = String.format( "CREATE EXTERNAL TABLE test %s TYPE text LOCATION '%s' TBLPROPERTIES '{\"format\":\"csv\"}'", - SQL_CSV_SCHEMA, destinationFile.getAbsolutePath())); + SQL_CSV_SCHEMA, destinationFile.getAbsolutePath()); - BeamSqlRelUtils.toPCollection( - pipeline, env.parseQuery("INSERT INTO test VALUES ('hello', 42), ('goodbye', 13)")); + pipeline.apply(SqlTransform.query(query).withDdlString(ddl)); pipeline.run(); assertThat( @@ -301,14 +291,14 @@ public void testWriteCsv() throws Exception { @Test public void testWriteJson() throws Exception { File destinationFile = new File(tempFolder.getRoot(), "json-outputs"); - BeamSqlEnv env = BeamSqlEnv.inMemory(new TextTableProvider()); - env.executeDdl( + + String query = "INSERT INTO test(name, age) VALUES ('Jack', 13)"; + String ddl = String.format( "CREATE EXTERNAL TABLE test %s TYPE text LOCATION '%s' TBLPROPERTIES '{\"format\":\"json\"}'", - SQL_JSON_SCHEMA, destinationFile.getAbsolutePath())); + SQL_JSON_SCHEMA, destinationFile.getAbsolutePath()); - BeamSqlRelUtils.toPCollection( - pipeline, env.parseQuery("INSERT INTO test(name, age) VALUES ('Jack', 13)")); + 
pipeline.apply(SqlTransform.query(query).withDdlString(ddl)); pipeline.run(); assertThat( diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/store/InMemoryMetaStoreTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/store/InMemoryMetaStoreTest.java index 5929ac9118db..2cf835c38f64 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/store/InMemoryMetaStoreTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/store/InMemoryMetaStoreTest.java @@ -18,9 +18,9 @@ package org.apache.beam.sdk.extensions.sql.meta.store; import static org.apache.beam.sdk.schemas.Schema.toSchema; +import static org.hamcrest.MatcherAssert.assertThat; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNotNull; -import static org.junit.Assert.assertThat; import com.alibaba.fastjson.JSONObject; import java.util.HashMap; @@ -36,9 +36,6 @@ import org.junit.Test; /** UnitTest for {@link InMemoryMetaStore}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class InMemoryMetaStoreTest { private InMemoryMetaStore store; diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/utils/RowAsserts.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/utils/RowAsserts.java index b94578199993..af9581a4d388 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/utils/RowAsserts.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/utils/RowAsserts.java @@ -22,12 +22,9 @@ import org.apache.beam.sdk.transforms.SerializableFunction; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.Iterables; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.Iterables; /** Contain helpers to assert {@link Row}s. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class RowAsserts { /** Asserts result contains single row with an int field. */ diff --git a/sdks/java/extensions/sql/udf-test-provider/build.gradle b/sdks/java/extensions/sql/udf-test-provider/build.gradle new file mode 100644 index 000000000000..461a4a83fcef --- /dev/null +++ b/sdks/java/extensions/sql/udf-test-provider/build.gradle @@ -0,0 +1,37 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +plugins { + id 'org.apache.beam.module' +} + +applyJavaNature( + automaticModuleName: 'org.apache.beam.sdk.extensions.sql.provider', + publish: false, +) + +description = "Apache Beam :: SDKs :: Java :: Extensions :: SQL :: UDF test provider" +ext.summary = "Java UDFs for testing. This project must be built separately from its parent so the UDF provider is not included in the context classloader for tests." + +project.ext.jarPath = jar.archivePath + +dependencies { + // No dependency (direct or transitive) on :sdks:java:core. + compile project(":sdks:java:extensions:sql:udf") + compile library.java.vendored_guava_26_0_jre +} diff --git a/sdks/java/extensions/sql/udf-test-provider/src/main/java/org/apache/beam/sdk/extensions/sql/provider/UdfTestProvider.java b/sdks/java/extensions/sql/udf-test-provider/src/main/java/org/apache/beam/sdk/extensions/sql/provider/UdfTestProvider.java new file mode 100644 index 000000000000..c0da4aa1613b --- /dev/null +++ b/sdks/java/extensions/sql/udf-test-provider/src/main/java/org/apache/beam/sdk/extensions/sql/provider/UdfTestProvider.java @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.provider; + +import com.google.auto.service.AutoService; +import java.sql.Date; +import java.util.Map; +import org.apache.beam.sdk.extensions.sql.udf.AggregateFn; +import org.apache.beam.sdk.extensions.sql.udf.ScalarFn; +import org.apache.beam.sdk.extensions.sql.udf.UdfProvider; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; + +/** Defines Java UDFs for use in tests. 
*/ +@AutoService(UdfProvider.class) +public class UdfTestProvider implements UdfProvider { + @Override + public Map userDefinedScalarFunctions() { + return ImmutableMap.of( + "helloWorld", + new HelloWorldFn(), + "matches", + new MatchFn(), + "increment", + new IncrementFn(), + "isNull", + new IsNullFn(), + "dateIncrementAll", + new DateIncrementAllFn()); + } + + @Override + public Map> userDefinedAggregateFunctions() { + return ImmutableMap.of("my_sum", new Sum()); + } + + public static class HelloWorldFn extends ScalarFn { + @ApplyMethod + public String helloWorld() { + return "Hello world!"; + } + } + + public static class MatchFn extends ScalarFn { + @ApplyMethod + public boolean matches(String s, String regex) { + return s.matches(regex); + } + } + + public static class IncrementFn extends ScalarFn { + @ApplyMethod + public Long increment(Long i) { + return i + 1; + } + } + + public static class IsNullFn extends ScalarFn { + @ApplyMethod + public boolean isNull(String s) { + return s == null; + } + } + + public static class UnusedFn extends ScalarFn { + @ApplyMethod + public String notRegistered() { + return "This method is not registered as a UDF."; + } + } + + public static class Sum implements AggregateFn { + + @Override + public Long createAccumulator() { + return 0L; + } + + @Override + public Long addInput(Long mutableAccumulator, Long input) { + return mutableAccumulator + input; + } + + @Override + public Long mergeAccumulators(Long mutableAccumulator, Iterable immutableAccumulators) { + for (Long x : immutableAccumulators) { + mutableAccumulator += x; + } + return mutableAccumulator; + } + + @Override + public Long extractOutput(Long mutableAccumulator) { + return mutableAccumulator; + } + } + + public static class DateIncrementAllFn extends ScalarFn { + @ApplyMethod + public Date incrementAll(Date date) { + return new Date(date.getYear() + 1, date.getMonth() + 1, date.getDate() + 1); + } + } +} diff --git a/sdks/java/extensions/sql/udf-test-provider/src/main/java/org/apache/beam/sdk/extensions/sql/provider/package-info.java b/sdks/java/extensions/sql/udf-test-provider/src/main/java/org/apache/beam/sdk/extensions/sql/provider/package-info.java new file mode 100644 index 000000000000..0ca46e07eccd --- /dev/null +++ b/sdks/java/extensions/sql/udf-test-provider/src/main/java/org/apache/beam/sdk/extensions/sql/provider/package-info.java @@ -0,0 +1,20 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** Package containing UDF providers for testing. 
*/ +package org.apache.beam.sdk.extensions.sql.provider; diff --git a/sdks/java/extensions/sql/udf/build.gradle b/sdks/java/extensions/sql/udf/build.gradle new file mode 100644 index 000000000000..24077fc93cd5 --- /dev/null +++ b/sdks/java/extensions/sql/udf/build.gradle @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +plugins { id 'org.apache.beam.module' } + +applyJavaNature( + automaticModuleName: 'org.apache.beam.sdk.extensions.sql.udf') + +description = "Apache Beam :: SDKs :: Java :: Extensions :: SQL :: UDF" +ext.summary = "Provides interfaces for defining user-defined functions in Beam SQL." + +dependencies { + // None, except for those declared by applyJavaNature. + // Please think twice before adding dependencies here. The purpose of this module is to provide a way to define SQL + // functions in Java without depending on the rest of Beam. +} diff --git a/sdks/java/extensions/sql/udf/src/main/java/org/apache/beam/sdk/extensions/sql/udf/AggregateFn.java b/sdks/java/extensions/sql/udf/src/main/java/org/apache/beam/sdk/extensions/sql/udf/AggregateFn.java new file mode 100644 index 000000000000..1c1cecedcdce --- /dev/null +++ b/sdks/java/extensions/sql/udf/src/main/java/org/apache/beam/sdk/extensions/sql/udf/AggregateFn.java @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.udf; + +import org.checkerframework.checker.nullness.qual.Nullable; + +/** + * An aggregate function that can be executed as part of a SQL query. + * + *

    AggregateFn contains a subset of the functionality of {@code + * org.apache.beam.sdk.transforms.Combine.CombineFn}. + * + *
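The interface mirrors the `Combine.CombineFn` lifecycle (create, add, merge, extract). As a minimal sketch of how that contract is exercised, using the `Sum` function from `UdfTestProvider` above and assuming it is declared as `AggregateFn<Long, Long, Long>` (the harness class and by-hand call sequence are illustrative, not a Beam API):

```java
import java.util.Collections;
import org.apache.beam.sdk.extensions.sql.provider.UdfTestProvider;
import org.apache.beam.sdk.extensions.sql.udf.AggregateFn;

public class AggregateFnContractSketch {
  public static void main(String[] args) {
    // Drive the AggregateFn lifecycle by hand, the way a runner conceptually would.
    AggregateFn<Long, Long, Long> sum = new UdfTestProvider.Sum();
    Long acc1 = sum.addInput(sum.createAccumulator(), 3L); // partial aggregate holding 3
    Long acc2 = sum.addInput(sum.createAccumulator(), 4L); // partial aggregate holding 4
    Long merged = sum.mergeAccumulators(acc1, Collections.singletonList(acc2));
    System.out.println(sum.extractOutput(merged)); // prints 7
  }
}
```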

    AggregateFn is experimental. Compatibility is not guaranteed across Beam + * versions. + * + * @param type of input values + * @param type of mutable accumulator values + * @param type of output values + */ +public interface AggregateFn< + InputT extends @Nullable Object, + AccumT extends @Nullable Object, + OutputT extends @Nullable Object> { + + /** + * Returns a new, mutable accumulator value, representing the accumulation of zero input values. + */ + AccumT createAccumulator(); + + /** + * Adds the given input value to the given accumulator, returning the new accumulator value. + * + * @param mutableAccumulator may be modified and returned for efficiency + * @param input should not be mutated + */ + AccumT addInput(AccumT mutableAccumulator, InputT input); + + /** + * Returns an accumulator representing the accumulation of all the input values accumulated in the + * merging accumulators. + * + * @param mutableAccumulator This accumulator may be modified and returned for efficiency. + * @param immutableAccumulators These other accumulators should not be mutated, because they may + * be shared with other code and mutating them could lead to incorrect results or data + * corruption. + */ + AccumT mergeAccumulators(AccumT mutableAccumulator, Iterable immutableAccumulators); + + /** + * Returns the output value that is the result of combining all the input values represented by + * the given accumulator. + * + * @param mutableAccumulator can be modified for efficiency + */ + OutputT extractOutput(AccumT mutableAccumulator); +} diff --git a/sdks/java/extensions/sql/udf/src/main/java/org/apache/beam/sdk/extensions/sql/udf/ScalarFn.java b/sdks/java/extensions/sql/udf/src/main/java/org/apache/beam/sdk/extensions/sql/udf/ScalarFn.java new file mode 100644 index 000000000000..1c6482c80007 --- /dev/null +++ b/sdks/java/extensions/sql/udf/src/main/java/org/apache/beam/sdk/extensions/sql/udf/ScalarFn.java @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.udf; + +import java.io.Serializable; +import java.lang.annotation.ElementType; +import java.lang.annotation.Retention; +import java.lang.annotation.RetentionPolicy; +import java.lang.annotation.Target; + +/** + * A scalar function that can be executed as part of a SQL query. Subclasses must contain exactly + * one method annotated with {@link ApplyMethod}, which will be applied to the SQL function + * arguments at runtime. + * + *

    For example: + * + *

    
+ * <pre><code>
+ * public class IncrementFn extends ScalarFn {
+ *  {@literal @ApplyMethod}
+ *   public Long increment(Long i) {
+ *     return i + 1;
+ *   }
+ * }
+ * </code></pre>
    + * + *

    ScalarFn is experimental. Compatibility is not guaranteed across Beam + * versions. + */ +public abstract class ScalarFn implements Serializable { + /** + * Annotates the single method in a {@link ScalarFn} implementation that is to be applied to SQL + * function arguments. + */ + @Retention(RetentionPolicy.RUNTIME) + @Target(ElementType.METHOD) + public static @interface ApplyMethod {} +} diff --git a/sdks/java/extensions/sql/udf/src/main/java/org/apache/beam/sdk/extensions/sql/udf/UdfProvider.java b/sdks/java/extensions/sql/udf/src/main/java/org/apache/beam/sdk/extensions/sql/udf/UdfProvider.java new file mode 100644 index 000000000000..fe8b5b5a6bcb --- /dev/null +++ b/sdks/java/extensions/sql/udf/src/main/java/org/apache/beam/sdk/extensions/sql/udf/UdfProvider.java @@ -0,0 +1,37 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.udf; + +import java.util.Collections; +import java.util.Map; + +/** + * Provider for user-defined functions written in Java. Implementations should be annotated with + * {@link com.google.auto.service.AutoService}. + */ +public interface UdfProvider { + /** Maps function names to scalar function implementations. */ + default Map userDefinedScalarFunctions() { + return Collections.emptyMap(); + } + + /** Maps function names to aggregate function implementations. */ + default Map> userDefinedAggregateFunctions() { + return Collections.emptyMap(); + } +} diff --git a/sdks/java/extensions/sql/udf/src/main/java/org/apache/beam/sdk/extensions/sql/udf/package-info.java b/sdks/java/extensions/sql/udf/src/main/java/org/apache/beam/sdk/extensions/sql/udf/package-info.java new file mode 100644 index 000000000000..a530cc322111 --- /dev/null +++ b/sdks/java/extensions/sql/udf/src/main/java/org/apache/beam/sdk/extensions/sql/udf/package-info.java @@ -0,0 +1,26 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Provides interfaces for defining user-defined functions in Beam SQL. + * + *
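Because `UdfProvider` implementations are registered with `@AutoService`, they can be discovered through the standard `java.util.ServiceLoader` mechanism. A minimal sketch of that discovery (the printing harness is illustrative; in this change the real lookup is done by `JavaUdfLoader` against the classloader of a user-supplied jar):

```java
import java.util.ServiceLoader;
import org.apache.beam.sdk.extensions.sql.udf.UdfProvider;

public class UdfDiscoverySketch {
  public static void main(String[] args) {
    // @AutoService generates the META-INF/services entry, so ServiceLoader can enumerate
    // every UdfProvider on the classpath and ask it for its registered function names.
    for (UdfProvider provider : ServiceLoader.load(UdfProvider.class)) {
      provider.userDefinedScalarFunctions().keySet()
          .forEach(name -> System.out.println("scalar UDF: " + name));
      provider.userDefinedAggregateFunctions().keySet()
          .forEach(name -> System.out.println("aggregate UDF: " + name));
    }
  }
}
```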

    The contents of this package are experimental. (They are not marked + * {@code @Experimental} as usual because that would require a dependency on the {@code + * :sdks:java:core} module, which we want to avoid.) + */ +package org.apache.beam.sdk.extensions.sql.udf; diff --git a/sdks/java/extensions/sql/zetasql/build.gradle b/sdks/java/extensions/sql/zetasql/build.gradle index 9589301e90ca..975b3d76369a 100644 --- a/sdks/java/extensions/sql/zetasql/build.gradle +++ b/sdks/java/extensions/sql/zetasql/build.gradle @@ -31,25 +31,39 @@ def zetasql_version = "2020.10.1" dependencies { compile enforcedPlatform(library.java.google_cloud_platform_libraries_bom) - compile project(":sdks:java:core") + permitUnusedDeclared enforcedPlatform(library.java.google_cloud_platform_libraries_bom) + compile project(path: ":sdks:java:core", configuration: "shadow") compile project(":sdks:java:extensions:sql") - compile library.java.vendored_calcite_1_20_0 + compile project(":sdks:java:extensions:sql:udf") + compile library.java.vendored_calcite_1_26_0 compile library.java.guava compile library.java.grpc_api + compile library.java.joda_time compile library.java.protobuf_java compile library.java.protobuf_java_util - compile "com.google.api.grpc:proto-google-common-protos:1.12.0" // Interfaces with ZetaSQL use this - compile "com.google.api.grpc:grpc-google-common-protos:1.12.0" // Interfaces with ZetaSQL use this - compile "com.google.zetasql:zetasql-jni-channel:$zetasql_version" + permitUnusedDeclared library.java.protobuf_java_util // BEAM-11761 + compile library.java.slf4j_api + compile library.java.vendored_guava_26_0_jre + compile library.java.proto_google_common_protos // Interfaces with ZetaSQL use this + permitUnusedDeclared library.java.proto_google_common_protos // BEAM-11761 + compile library.java.grpc_google_common_protos // Interfaces with ZetaSQL use this + permitUnusedDeclared library.java.grpc_google_common_protos // BEAM-11761 compile "com.google.zetasql:zetasql-client:$zetasql_version" compile "com.google.zetasql:zetasql-types:$zetasql_version" - testCompile library.java.vendored_calcite_1_20_0 + compile "com.google.zetasql:zetasql-jni-channel:$zetasql_version" + permitUnusedDeclared "com.google.zetasql:zetasql-jni-channel:$zetasql_version" // BEAM-11761 + testCompile library.java.vendored_calcite_1_26_0 testCompile library.java.vendored_guava_26_0_jre testCompile library.java.junit - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library testCompile library.java.mockito_core testCompile library.java.quickcheck_core + testCompileOnly project(":sdks:java:extensions:sql:udf-test-provider") testRuntimeClasspath library.java.slf4j_jdk14 } +test { + dependsOn ":sdks:java:extensions:sql:emptyJar" + // Pass jars used by Java UDF tests via system properties. 
+ systemProperty "beam.sql.udf.test.jar_path", project(":sdks:java:extensions:sql:udf-test-provider").jarPath + systemProperty "beam.sql.udf.test.empty_jar_path", project(":sdks:java:extensions:sql").emptyJar.archivePath +} diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamCalcRelType.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamCalcRelType.java new file mode 100644 index 000000000000..5b2885562ec9 --- /dev/null +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamCalcRelType.java @@ -0,0 +1,156 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.zetasql; + +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel; +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamLogicalConvention; +import org.apache.beam.sdk.extensions.sql.impl.rel.CalcRelSplitter; +import org.apache.beam.sdk.extensions.sql.zetasql.translation.ZetaSqlScalarFunctionImpl; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.enumerable.RexImpTable; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexDynamicParam; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexFieldAccess; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLocalRef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexProgram; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlOperator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.validate.SqlUserDefinedFunction; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RelBuilder; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** {@link CalcRelSplitter.RelType} for {@link BeamCalcRel}. 
*/ +class BeamCalcRelType extends CalcRelSplitter.RelType { + private static final Logger LOG = LoggerFactory.getLogger(BeamCalcRelType.class); + + BeamCalcRelType(String name) { + super(name); + } + + @Override + protected boolean canImplement(RexFieldAccess field) { + return supportsType(field.getType()); + } + + @Override + protected boolean canImplement(RexLiteral literal) { + return supportsType(literal.getType()); + } + + @Override + protected boolean canImplement(RexDynamicParam param) { + return supportsType(param.getType()); + } + + @Override + protected boolean canImplement(RexCall call) { + final SqlOperator operator = call.getOperator(); + + RexImpTable.RexCallImplementor implementor = RexImpTable.INSTANCE.get(operator); + if (implementor == null) { + // Reject methods with no implementation + return false; + } + + if (operator instanceof SqlUserDefinedFunction) { + SqlUserDefinedFunction udf = (SqlUserDefinedFunction) call.op; + if (udf.function instanceof ZetaSqlScalarFunctionImpl) { + ZetaSqlScalarFunctionImpl scalarFunction = (ZetaSqlScalarFunctionImpl) udf.function; + if (!scalarFunction.functionGroup.equals( + BeamZetaSqlCatalog.USER_DEFINED_JAVA_SCALAR_FUNCTIONS)) { + // Reject ZetaSQL Builtin Scalar Functions + return false; + } + for (RexNode operand : call.getOperands()) { + if (operand instanceof RexLocalRef) { + if (!supportsType(operand.getType())) { + LOG.error( + "User-defined function {} received unsupported operand type {}.", + call.op.getName(), + ((RexLocalRef) operand).getType()); + return false; + } + } else { + LOG.error( + "User-defined function {} received unrecognized operand kind {}.", + call.op.getName(), + operand.getKind()); + return false; + } + } + } else { + // Reject other UDFs + return false; + } + } else { + // Reject Calcite implementations + return false; + } + return true; + } + + @Override + protected RelNode makeRel( + RelOptCluster cluster, + RelTraitSet traitSet, + RelBuilder relBuilder, + RelNode input, + RexProgram program) { + RexProgram normalizedProgram = program.normalize(cluster.getRexBuilder(), false); + return new BeamCalcRel( + cluster, + traitSet.replace(BeamLogicalConvention.INSTANCE), + RelOptRule.convert(input, input.getTraitSet().replace(BeamLogicalConvention.INSTANCE)), + normalizedProgram); + } + + /** + * Returns true only if the data type can be correctly implemented by {@link + * org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel} in ZetaSQL. 
+ */ + private boolean supportsType(RelDataType type) { + switch (type.getSqlTypeName()) { + case BIGINT: + case BINARY: + case BOOLEAN: + case CHAR: + case DATE: + case DECIMAL: + case DOUBLE: + case NULL: + case TIMESTAMP: + case VARBINARY: + case VARCHAR: + return true; + case ARRAY: + return supportsType(type.getComponentType()); + case ROW: + return type.getFieldList().stream().allMatch((field) -> supportsType(field.getType())); + case TIME: // BEAM-12086 + case TIMESTAMP_WITH_LOCAL_TIME_ZONE: // BEAM-12087 + default: + return false; + } + } +} diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamJavaUdfCalcRule.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamJavaUdfCalcRule.java new file mode 100644 index 000000000000..ace37c597b3d --- /dev/null +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamJavaUdfCalcRule.java @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.zetasql; + +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel; +import org.apache.beam.sdk.extensions.sql.impl.rel.CalcRelSplitter; +import org.apache.beam.sdk.extensions.sql.impl.rule.BeamCalcRule; +import org.apache.beam.sdk.extensions.sql.impl.rule.BeamCalcSplittingRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Calc; + +/** + * A {@link BeamCalcSplittingRule} to replace {@link Calc} with {@link BeamCalcRel}. + * + *

    Equivalent to {@link BeamCalcRule}, but with limits on the supported types and operators. + + *

    This class is intended only for testing purposes. See {@link BeamZetaSqlCalcSplittingRule}. + */ +public class BeamJavaUdfCalcRule extends BeamCalcSplittingRule { + public static final BeamJavaUdfCalcRule INSTANCE = new BeamJavaUdfCalcRule(); + + private BeamJavaUdfCalcRule() { + super("BeamJavaUdfCalcRule"); + } + + @Override + protected CalcRelSplitter.RelType[] getRelTypes() { + // "Split" the Calc between two identical RelTypes. The second one is just a placeholder; if the + // first isn't usable, the second one won't be usable either, and the planner will fail. + return new CalcRelSplitter.RelType[] { + new BeamCalcRelType("BeamCalcRelType"), new BeamCalcRelType("BeamCalcRelType2") + }; + } +} diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCalcMergeRule.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCalcMergeRule.java new file mode 100644 index 000000000000..d1e8fff918df --- /dev/null +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCalcMergeRule.java @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.zetasql; + +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRuleCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRuleOperand; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.rules.CalcMergeRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.rules.CoreRules; + +/** + * Planner rule to merge a {@link BeamZetaSqlCalcRel} with a {@link BeamZetaSqlCalcRel}. Subset of + * {@link CalcMergeRule}. 
+ */ +public class BeamZetaSqlCalcMergeRule extends RelOptRule { + public static final BeamZetaSqlCalcMergeRule INSTANCE = new BeamZetaSqlCalcMergeRule(); + + public BeamZetaSqlCalcMergeRule() { + super( + operand( + BeamZetaSqlCalcRel.class, + operand(BeamZetaSqlCalcRel.class, any()), + new RelOptRuleOperand[0])); + } + + @Override + public void onMatch(RelOptRuleCall call) { + CoreRules.CALC_MERGE.onMatch(call); + } +} diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCalcRel.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCalcRel.java index 0aa16e58c1ee..38c604e19e27 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCalcRel.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCalcRel.java @@ -17,12 +17,18 @@ */ package org.apache.beam.sdk.extensions.sql.zetasql; +import static org.apache.beam.sdk.util.Preconditions.checkArgumentNotNull; + +import com.google.auto.value.AutoValue; import com.google.zetasql.AnalyzerOptions; import com.google.zetasql.PreparedExpression; import com.google.zetasql.Value; +import edu.umd.cs.findbugs.annotations.SuppressFBWarnings; +import java.util.ArrayDeque; import java.util.HashMap; import java.util.List; import java.util.Map; +import java.util.Queue; import java.util.concurrent.ExecutionException; import java.util.concurrent.Future; import java.util.function.IntFunction; @@ -37,36 +43,39 @@ import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionList; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexBuilder; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlDialect; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlIdentifier; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.fun.SqlStdOperatorTable; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.parser.SqlParserPos; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Calc; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexBuilder; +import 
org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexProgram; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlDialect; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlIdentifier; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.fun.SqlStdOperatorTable; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.parser.SqlParserPos; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.checkerframework.checker.nullness.qual.NonNull; +import org.checkerframework.checker.nullness.qual.Nullable; +import org.joda.time.Duration; +import org.joda.time.Instant; /** * BeamRelNode to replace {@code Project} and {@code Filter} node based on the {@code ZetaSQL} * expression evaluator. */ @Internal -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamZetaSqlCalcRel extends AbstractBeamCalcRel { private static final SqlDialect DIALECT = BeamBigQuerySqlDialect.DEFAULT; + private static final int MAX_PENDING_WINDOW = 32; private final BeamSqlUnparseContext context; private static String columnName(int i) { @@ -90,6 +99,17 @@ public PTransform, PCollection> buildPTransform() { return new Transform(); } + @AutoValue + abstract static class TimestampedFuture { + private static TimestampedFuture create(Instant t, Future f) { + return new AutoValue_BeamZetaSqlCalcRel_TimestampedFuture(t, f); + } + + abstract Instant timestamp(); + + abstract Future future(); + } + private class Transform extends PTransform, PCollection> { @Override public PCollection expand(PCollectionList pinput) { @@ -143,6 +163,7 @@ private static Map createNullParams(Map inpu * {@code CalcFn} is the executor for a {@link BeamZetaSqlCalcRel} step. The implementation is * based on the {@code ZetaSQL} expression evaluator. */ + @SuppressFBWarnings("SE_TRANSIENT_FIELD_NOT_RESTORED") private static class CalcFn extends DoFn { private final String sql; private final Map nullParams; @@ -150,9 +171,10 @@ private static class CalcFn extends DoFn { private final Schema outputSchema; private final String defaultTimezone; private final boolean verifyRowValues; + private transient List referencedColumns = ImmutableList.of(); + private transient Map> pending = new HashMap<>(); private transient PreparedExpression exp; - private transient List referencedColumns; - private transient PreparedExpression.Stream stream; + private transient PreparedExpression.@Nullable Stream stream; CalcFn( String sql, @@ -162,6 +184,7 @@ private static class CalcFn extends DoFn { String defaultTimezone, boolean verifyRowValues) { this.sql = sql; + this.exp = new PreparedExpression(sql); this.nullParams = nullParams; this.inputSchema = inputSchema; this.outputSchema = outputSchema; @@ -169,8 +192,8 @@ private static class CalcFn extends DoFn { this.verifyRowValues = verifyRowValues; } - @Setup - public void setup() { + /** exp cannot be reused and is transient so needs to be reinitialized. 
*/ + private void prepareExpression() { AnalyzerOptions options = SqlAnalyzer.getAnalyzerOptions(QueryParameters.ofNamed(nullParams), defaultTimezone); for (int i = 0; i < inputSchema.getFieldCount(); i++) { @@ -181,6 +204,11 @@ public void setup() { exp = new PreparedExpression(sql); exp.prepare(options); + } + + @Setup + public void setup() { + prepareExpression(); ImmutableList.Builder columns = new ImmutableList.Builder<>(); for (String c : exp.getReferencedColumns()) { @@ -191,10 +219,21 @@ public void setup() { stream = exp.stream(); } + @StartBundle + public void startBundle() { + pending = new HashMap<>(); + } + + @Override + public Duration getAllowedTimestampSkew() { + return Duration.millis(Long.MAX_VALUE); + } + @ProcessElement - public void processElement(ProcessContext c) throws InterruptedException { + public void processElement( + @Element Row row, @Timestamp Instant t, BoundedWindow w, OutputReceiver r) + throws InterruptedException { Map columns = new HashMap<>(); - Row row = c.element(); for (int i : referencedColumns) { columns.put( columnName(i), @@ -202,22 +241,83 @@ public void processElement(ProcessContext c) throws InterruptedException { row.getBaseValue(i, Object.class), inputSchema.getField(i).getType())); } - final Future vf = stream.execute(columns, nullParams); + @NonNull + Future valueFuture = checkArgumentNotNull(stream).execute(columns, nullParams); + + @Nullable Queue pendingWindow = pending.get(w); + if (pendingWindow == null) { + pendingWindow = new ArrayDeque<>(); + pending.put(w, pendingWindow); + } + pendingWindow.add(TimestampedFuture.create(t, valueFuture)); + + while ((!pendingWindow.isEmpty() && pendingWindow.element().future().isDone()) + || pendingWindow.size() > MAX_PENDING_WINDOW) { + outputRow(pendingWindow.remove(), r); + } + } + + @FinishBundle + public void finishBundle(FinishBundleContext c) throws InterruptedException { + checkArgumentNotNull(stream).flush(); + for (Map.Entry> pendingWindow : pending.entrySet()) { + OutputReceiver rowOutputReciever = + new OutputReceiverForFinishBundle(c, pendingWindow.getKey()); + for (TimestampedFuture timestampedFuture : pendingWindow.getValue()) { + outputRow(timestampedFuture, rowOutputReciever); + } + } + } + + // TODO(BEAM-1287): Remove this when FinishBundle has added support for an {@link + // OutputReceiver} + private static class OutputReceiverForFinishBundle implements OutputReceiver { + + private final FinishBundleContext c; + private final BoundedWindow w; + + private OutputReceiverForFinishBundle(FinishBundleContext c, BoundedWindow w) { + this.c = c; + this.w = w; + } + + @Override + public void output(Row output) { + throw new RuntimeException("Unsupported"); + } + + @Override + public void outputWithTimestamp(Row output, Instant timestamp) { + c.output(output, timestamp, w); + } + } + + private static RuntimeException extractException(ExecutionException e) { + try { + throw checkArgumentNotNull(e.getCause()); + } catch (RuntimeException r) { + return r; + } catch (Throwable t) { + return new RuntimeException(t); + } + } + + private void outputRow(TimestampedFuture c, OutputReceiver r) throws InterruptedException { final Value v; try { - v = vf.get(); + v = c.future().get(); } catch (ExecutionException e) { - throw (RuntimeException) e.getCause(); + throw extractException(e); } if (!v.isNull()) { - Row outputRow = ZetaSqlBeamTranslationUtils.toBeamRow(v, outputSchema, verifyRowValues); - c.output(outputRow); + Row row = ZetaSqlBeamTranslationUtils.toBeamRow(v, outputSchema, 
verifyRowValues); + r.outputWithTimestamp(row, c.timestamp()); } } @Teardown public void teardown() { - stream.close(); + checkArgumentNotNull(stream).close(); exp.close(); } } diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCalcRule.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCalcRule.java index 2e7ea0f7a2cc..74dbafca9c87 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCalcRule.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCalcRule.java @@ -17,38 +17,24 @@ */ package org.apache.beam.sdk.extensions.sql.zetasql; -import org.apache.beam.sdk.extensions.sql.impl.rel.BeamLogicalConvention; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.Convention; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.convert.ConverterRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalCalc; +import org.apache.beam.sdk.extensions.sql.impl.rel.CalcRelSplitter; +import org.apache.beam.sdk.extensions.sql.impl.rule.BeamCalcSplittingRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Calc; -/** A {@code ConverterRule} to replace {@link Calc} with {@link BeamZetaSqlCalcRel}. */ -public class BeamZetaSqlCalcRule extends ConverterRule { +/** A {@link BeamCalcSplittingRule} to replace {@link Calc} with {@link BeamZetaSqlCalcRel}. */ +public class BeamZetaSqlCalcRule extends BeamCalcSplittingRule { public static final BeamZetaSqlCalcRule INSTANCE = new BeamZetaSqlCalcRule(); private BeamZetaSqlCalcRule() { - super( - LogicalCalc.class, Convention.NONE, BeamLogicalConvention.INSTANCE, "BeamZetaSqlCalcRule"); + super("BeamZetaSqlCalcRule"); } @Override - public boolean matches(RelOptRuleCall x) { - return true; - } - - @Override - public RelNode convert(RelNode rel) { - final Calc calc = (Calc) rel; - final RelNode input = calc.getInput(); - - return new BeamZetaSqlCalcRel( - calc.getCluster(), - calc.getTraitSet().replace(BeamLogicalConvention.INSTANCE), - RelOptRule.convert(input, input.getTraitSet().replace(BeamLogicalConvention.INSTANCE)), - calc.getProgram()); + protected CalcRelSplitter.RelType[] getRelTypes() { + // "Split" the Calc between two identical RelTypes. The second one is just a placeholder; if the + // first isn't usable, the second one won't be usable either, and the planner will fail. 
+ return new CalcRelSplitter.RelType[] { + new BeamZetaSqlRelType("BeamZetaSqlRelType"), new BeamZetaSqlRelType("BeamZetaSqlRelType2") + }; } } diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCalcSplittingRule.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCalcSplittingRule.java new file mode 100644 index 000000000000..3b9bb385fbe7 --- /dev/null +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCalcSplittingRule.java @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.zetasql; + +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel; +import org.apache.beam.sdk.extensions.sql.impl.rel.CalcRelSplitter; +import org.apache.beam.sdk.extensions.sql.impl.rule.BeamCalcSplittingRule; + +/** + * A {@link BeamCalcSplittingRule} that converts a {@link LogicalCalc} to a chain of {@link + * BeamZetaSqlCalcRel} and/or {@link BeamCalcRel} via {@link CalcRelSplitter}. + * + *
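The effect of the split is easiest to see on a projection that mixes a Java UDF with builtin calls; the query below is illustrative only (the function and table names are assumptions echoing the test provider, not taken from this change):

```java
// 'increment' stands for a registered Java UDF, while UPPER is a ZetaSQL builtin. The splitter
// is expected to route the increment(age) call to a BeamCalcRel and keep the rest of the
// projection, including UPPER(name), in a BeamZetaSqlCalcRel, chaining the two rels together.
String mixedProjection = "SELECT increment(age), UPPER(name) FROM person";
```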

    Only Java UDFs are implemented using {@link BeamCalcRel}. All other expressions are + * implemented using {@link BeamZetaSqlCalcRel}. + */ +public class BeamZetaSqlCalcSplittingRule extends BeamCalcSplittingRule { + public static final BeamZetaSqlCalcSplittingRule INSTANCE = new BeamZetaSqlCalcSplittingRule(); + + private BeamZetaSqlCalcSplittingRule() { + super("BeamZetaSqlCalcRule"); + } + + @Override + protected CalcRelSplitter.RelType[] getRelTypes() { + return new CalcRelSplitter.RelType[] { + new BeamZetaSqlRelType("BeamZetaSqlRelType"), new BeamCalcRelType("BeamCalcRelType") + }; + } +} diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCatalog.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCatalog.java new file mode 100644 index 000000000000..ff78a602cd4e --- /dev/null +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCatalog.java @@ -0,0 +1,589 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.sql.zetasql; + +import com.google.common.collect.ImmutableList; +import com.google.zetasql.Analyzer; +import com.google.zetasql.AnalyzerOptions; +import com.google.zetasql.Function; +import com.google.zetasql.FunctionArgumentType; +import com.google.zetasql.FunctionSignature; +import com.google.zetasql.SimpleCatalog; +import com.google.zetasql.TVFRelation; +import com.google.zetasql.TableValuedFunction; +import com.google.zetasql.Type; +import com.google.zetasql.TypeFactory; +import com.google.zetasql.ZetaSQLBuiltinFunctionOptions; +import com.google.zetasql.ZetaSQLFunctions; +import com.google.zetasql.ZetaSQLType; +import com.google.zetasql.resolvedast.ResolvedNode; +import com.google.zetasql.resolvedast.ResolvedNodes; +import java.lang.reflect.Method; +import java.util.Arrays; +import java.util.Collection; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Optional; +import java.util.stream.Collectors; +import org.apache.beam.sdk.extensions.sql.impl.JavaUdfLoader; +import org.apache.beam.sdk.extensions.sql.impl.LazyAggregateCombineFn; +import org.apache.beam.sdk.extensions.sql.impl.ScalarFnReflector; +import org.apache.beam.sdk.extensions.sql.impl.ScalarFunctionImpl; +import org.apache.beam.sdk.extensions.sql.impl.UdafImpl; +import org.apache.beam.sdk.extensions.sql.impl.utils.TVFStreamingUtils; +import org.apache.beam.sdk.extensions.sql.udf.ScalarFn; +import org.apache.beam.sdk.extensions.sql.zetasql.translation.UserFunctionDefinitions; +import org.apache.beam.sdk.transforms.Combine; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.java.JavaTypeFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeField; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.FunctionParameter; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.SchemaPlus; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; + +/** + * Catalog for registering tables and functions. Populates a {@link SimpleCatalog} based on a {@link + * SchemaPlus}. + */ +public class BeamZetaSqlCatalog { + // ZetaSQL function group identifiers. Different function groups may have divergent translation + // paths. + public static final String PRE_DEFINED_WINDOW_FUNCTIONS = "pre_defined_window_functions"; + public static final String USER_DEFINED_SQL_FUNCTIONS = "user_defined_functions"; + public static final String USER_DEFINED_JAVA_SCALAR_FUNCTIONS = + "user_defined_java_scalar_functions"; + public static final String USER_DEFINED_JAVA_AGGREGATE_FUNCTIONS = + "user_defined_java_aggregate_functions"; + /** + * Same as {@link Function}.ZETASQL_FUNCTION_GROUP_NAME. Identifies built-in ZetaSQL functions. + */ + public static final String ZETASQL_FUNCTION_GROUP_NAME = "ZetaSQL"; + + private static final ImmutableList PRE_DEFINED_WINDOW_FUNCTION_DECLARATIONS = + ImmutableList.of( + // TODO: support optional function argument (for window_offset). 
+ "CREATE FUNCTION TUMBLE(ts TIMESTAMP, window_size STRING) AS (1);", + "CREATE FUNCTION TUMBLE_START(window_size STRING) RETURNS TIMESTAMP AS (null);", + "CREATE FUNCTION TUMBLE_END(window_size STRING) RETURNS TIMESTAMP AS (null);", + "CREATE FUNCTION HOP(ts TIMESTAMP, emit_frequency STRING, window_size STRING) AS (1);", + "CREATE FUNCTION HOP_START(emit_frequency STRING, window_size STRING) " + + "RETURNS TIMESTAMP AS (null);", + "CREATE FUNCTION HOP_END(emit_frequency STRING, window_size STRING) " + + "RETURNS TIMESTAMP AS (null);", + "CREATE FUNCTION SESSION(ts TIMESTAMP, session_gap STRING) AS (1);", + "CREATE FUNCTION SESSION_START(session_gap STRING) RETURNS TIMESTAMP AS (null);", + "CREATE FUNCTION SESSION_END(session_gap STRING) RETURNS TIMESTAMP AS (null);"); + + /** The top-level Calcite schema, which may contain sub-schemas. */ + private final SchemaPlus calciteSchema; + /** + * The top-level ZetaSQL catalog, which may contain nested catalogs for qualified table and + * function references. + */ + private final SimpleCatalog zetaSqlCatalog; + + private final JavaTypeFactory typeFactory; + + private final JavaUdfLoader javaUdfLoader = new JavaUdfLoader(); + private final Map, ResolvedNodes.ResolvedCreateFunctionStmt> sqlScalarUdfs = + new HashMap<>(); + /** User-defined table valued functions. */ + private final Map, ResolvedNode> sqlUdtvfs = new HashMap<>(); + + private final Map, UserFunctionDefinitions.JavaScalarFunction> javaScalarUdfs = + new HashMap<>(); + private final Map, Combine.CombineFn> javaUdafs = new HashMap<>(); + + private BeamZetaSqlCatalog( + SchemaPlus calciteSchema, SimpleCatalog zetaSqlCatalog, JavaTypeFactory typeFactory) { + this.calciteSchema = calciteSchema; + this.zetaSqlCatalog = zetaSqlCatalog; + this.typeFactory = typeFactory; + } + + /** Return catalog pre-populated with builtin functions. 
*/ + static BeamZetaSqlCatalog create( + SchemaPlus calciteSchema, JavaTypeFactory typeFactory, AnalyzerOptions options) { + BeamZetaSqlCatalog catalog = + new BeamZetaSqlCatalog( + calciteSchema, new SimpleCatalog(calciteSchema.getName()), typeFactory); + catalog.addFunctionsToCatalog(options); + return catalog; + } + + SimpleCatalog getZetaSqlCatalog() { + return zetaSqlCatalog; + } + + void addTables(List> tables, QueryTrait queryTrait) { + tables.forEach(table -> addTableToLeafCatalog(table, queryTrait)); + } + + void addFunction(ResolvedNodes.ResolvedCreateFunctionStmt createFunctionStmt) { + String functionGroup = getFunctionGroup(createFunctionStmt); + switch (functionGroup) { + case USER_DEFINED_SQL_FUNCTIONS: + sqlScalarUdfs.put(createFunctionStmt.getNamePath(), createFunctionStmt); + break; + case USER_DEFINED_JAVA_SCALAR_FUNCTIONS: + String functionName = String.join(".", createFunctionStmt.getNamePath()); + for (FunctionArgumentType argumentType : + createFunctionStmt.getSignature().getFunctionArgumentList()) { + Type type = argumentType.getType(); + if (type == null) { + throw new UnsupportedOperationException( + "UDF templated argument types are not supported."); + } + validateJavaUdfZetaSqlType(type, functionName); + } + if (createFunctionStmt.getReturnType() == null) { + throw new IllegalArgumentException("UDF return type must not be null."); + } + validateJavaUdfZetaSqlType(createFunctionStmt.getReturnType(), functionName); + String jarPath = getJarPath(createFunctionStmt); + ScalarFn scalarFn = + javaUdfLoader.loadScalarFunction(createFunctionStmt.getNamePath(), jarPath); + Method method = ScalarFnReflector.getApplyMethod(scalarFn); + javaScalarUdfs.put( + createFunctionStmt.getNamePath(), + UserFunctionDefinitions.JavaScalarFunction.create(method, jarPath)); + break; + case USER_DEFINED_JAVA_AGGREGATE_FUNCTIONS: + jarPath = getJarPath(createFunctionStmt); + // Try loading the aggregate function just to make sure it exists. LazyAggregateCombineFn + // will need to fetch it again at runtime. + javaUdfLoader.loadAggregateFunction(createFunctionStmt.getNamePath(), jarPath); + Combine.CombineFn combineFn = + new LazyAggregateCombineFn<>(createFunctionStmt.getNamePath(), jarPath); + javaUdafs.put(createFunctionStmt.getNamePath(), combineFn); + break; + default: + throw new IllegalArgumentException( + String.format("Encountered unrecognized function group %s.", functionGroup)); + } + zetaSqlCatalog.addFunction( + new Function( + createFunctionStmt.getNamePath(), + functionGroup, + createFunctionStmt.getIsAggregate() + ? ZetaSQLFunctions.FunctionEnums.Mode.AGGREGATE + : ZetaSQLFunctions.FunctionEnums.Mode.SCALAR, + ImmutableList.of(createFunctionStmt.getSignature()))); + } + + /** + * Throws {@link UnsupportedOperationException} if ZetaSQL type is not supported in Java UDF. + * Supported types are a subset of the types supported by {@link BeamJavaUdfCalcRule}. + * + *
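For orientation, the addFunction() path above is driven by CREATE FUNCTION statements that name a jar for Java UDFs and UDAFs. The DDL strings below are a hedged sketch; the exact grammar and option names are assumptions, and the jar path is a placeholder:

```java
// Illustrative DDL for a Java scalar UDF and a Java aggregate UDF loaded from a jar.
String createJavaUdf =
    "CREATE FUNCTION increment(i INT64) RETURNS INT64"
        + " LANGUAGE java OPTIONS (path='/path/to/udf-test-provider.jar')";
String createJavaUdaf =
    "CREATE AGGREGATE FUNCTION my_sum(i INT64) RETURNS INT64"
        + " LANGUAGE java OPTIONS (path='/path/to/udf-test-provider.jar')";
```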

    Supported types should be kept in sync with {@link #validateJavaUdfCalciteType(RelDataType, + * String)}. + */ + void validateJavaUdfZetaSqlType(Type type, String functionName) { + switch (type.getKind()) { + case TYPE_BOOL: + case TYPE_BYTES: + case TYPE_DATE: + case TYPE_DOUBLE: + case TYPE_INT64: + case TYPE_NUMERIC: + case TYPE_STRING: + case TYPE_TIMESTAMP: + // These types are supported. + break; + case TYPE_ARRAY: + validateJavaUdfZetaSqlType(type.asArray().getElementType(), functionName); + break; + case TYPE_TIME: + case TYPE_DATETIME: + case TYPE_STRUCT: + default: + throw new UnsupportedOperationException( + String.format( + "ZetaSQL type %s not allowed in function %s", type.getKind().name(), functionName)); + } + } + + void addTableValuedFunction( + ResolvedNodes.ResolvedCreateTableFunctionStmt createTableFunctionStmt) { + zetaSqlCatalog.addTableValuedFunction( + new TableValuedFunction.FixedOutputSchemaTVF( + createTableFunctionStmt.getNamePath(), + createTableFunctionStmt.getSignature(), + TVFRelation.createColumnBased( + createTableFunctionStmt.getQuery().getColumnList().stream() + .map(c -> TVFRelation.Column.create(c.getName(), c.getType())) + .collect(Collectors.toList())))); + sqlUdtvfs.put(createTableFunctionStmt.getNamePath(), createTableFunctionStmt.getQuery()); + } + + UserFunctionDefinitions getUserFunctionDefinitions() { + return UserFunctionDefinitions.newBuilder() + .setSqlScalarFunctions(ImmutableMap.copyOf(sqlScalarUdfs)) + .setSqlTableValuedFunctions(ImmutableMap.copyOf(sqlUdtvfs)) + .setJavaScalarFunctions(ImmutableMap.copyOf(javaScalarUdfs)) + .setJavaAggregateFunctions(ImmutableMap.copyOf(javaUdafs)) + .build(); + } + + private void addFunctionsToCatalog(AnalyzerOptions options) { + // Enable ZetaSQL builtin functions. + ZetaSQLBuiltinFunctionOptions zetasqlBuiltinFunctionOptions = + new ZetaSQLBuiltinFunctionOptions(options.getLanguageOptions()); + SupportedZetaSqlBuiltinFunctions.ALLOWLIST.forEach( + zetasqlBuiltinFunctionOptions::includeFunctionSignatureId); + zetaSqlCatalog.addZetaSQLFunctions(zetasqlBuiltinFunctionOptions); + + // Enable Beam SQL's builtin windowing functions. + addWindowScalarFunctions(options); + addWindowTvfs(); + + // Add user-defined functions already defined in the schema, if any. + addUdfsFromSchema(); + } + + private void addWindowScalarFunctions(AnalyzerOptions options) { + PRE_DEFINED_WINDOW_FUNCTION_DECLARATIONS.stream() + .map( + func -> + (ResolvedNodes.ResolvedCreateFunctionStmt) + Analyzer.analyzeStatement(func, options, zetaSqlCatalog)) + .map( + resolvedFunc -> + new Function( + String.join(".", resolvedFunc.getNamePath()), + PRE_DEFINED_WINDOW_FUNCTIONS, + ZetaSQLFunctions.FunctionEnums.Mode.SCALAR, + ImmutableList.of(resolvedFunc.getSignature()))) + .forEach(zetaSqlCatalog::addFunction); + } + + @SuppressWarnings({ + "nullness" // customContext and volatility are in fact nullable, but they are missing the + // annotation upstream. TODO Unsuppress when this is fixed in ZetaSQL. 
+ }) + private void addWindowTvfs() { + FunctionArgumentType retType = + new FunctionArgumentType(ZetaSQLFunctions.SignatureArgumentKind.ARG_TYPE_RELATION); + + FunctionArgumentType inputTableType = + new FunctionArgumentType(ZetaSQLFunctions.SignatureArgumentKind.ARG_TYPE_RELATION); + + FunctionArgumentType descriptorType = + new FunctionArgumentType( + ZetaSQLFunctions.SignatureArgumentKind.ARG_TYPE_DESCRIPTOR, + FunctionArgumentType.FunctionArgumentTypeOptions.builder() + .setDescriptorResolutionTableOffset(0) + .build(), + 1); + + FunctionArgumentType stringType = + new FunctionArgumentType(TypeFactory.createSimpleType(ZetaSQLType.TypeKind.TYPE_STRING)); + + // TUMBLE + zetaSqlCatalog.addTableValuedFunction( + new TableValuedFunction.ForwardInputSchemaToOutputSchemaWithAppendedColumnTVF( + ImmutableList.of(TVFStreamingUtils.FIXED_WINDOW_TVF), + new FunctionSignature( + retType, ImmutableList.of(inputTableType, descriptorType, stringType), -1), + ImmutableList.of( + TVFRelation.Column.create( + TVFStreamingUtils.WINDOW_START, + TypeFactory.createSimpleType(ZetaSQLType.TypeKind.TYPE_TIMESTAMP)), + TVFRelation.Column.create( + TVFStreamingUtils.WINDOW_END, + TypeFactory.createSimpleType(ZetaSQLType.TypeKind.TYPE_TIMESTAMP))), + null, + null)); + + // HOP + zetaSqlCatalog.addTableValuedFunction( + new TableValuedFunction.ForwardInputSchemaToOutputSchemaWithAppendedColumnTVF( + ImmutableList.of(TVFStreamingUtils.SLIDING_WINDOW_TVF), + new FunctionSignature( + retType, + ImmutableList.of(inputTableType, descriptorType, stringType, stringType), + -1), + ImmutableList.of( + TVFRelation.Column.create( + TVFStreamingUtils.WINDOW_START, + TypeFactory.createSimpleType(ZetaSQLType.TypeKind.TYPE_TIMESTAMP)), + TVFRelation.Column.create( + TVFStreamingUtils.WINDOW_END, + TypeFactory.createSimpleType(ZetaSQLType.TypeKind.TYPE_TIMESTAMP))), + null, + null)); + + // SESSION + zetaSqlCatalog.addTableValuedFunction( + new TableValuedFunction.ForwardInputSchemaToOutputSchemaWithAppendedColumnTVF( + ImmutableList.of(TVFStreamingUtils.SESSION_WINDOW_TVF), + new FunctionSignature( + retType, + ImmutableList.of(inputTableType, descriptorType, descriptorType, stringType), + -1), + ImmutableList.of( + TVFRelation.Column.create( + TVFStreamingUtils.WINDOW_START, + TypeFactory.createSimpleType(ZetaSQLType.TypeKind.TYPE_TIMESTAMP)), + TVFRelation.Column.create( + TVFStreamingUtils.WINDOW_END, + TypeFactory.createSimpleType(ZetaSQLType.TypeKind.TYPE_TIMESTAMP))), + null, + null)); + } + + private void addUdfsFromSchema() { + for (String functionName : calciteSchema.getFunctionNames()) { + Collection + functions = calciteSchema.getFunctions(functionName); + if (functions.size() != 1) { + throw new IllegalArgumentException( + String.format( + "Expected exactly 1 definition for function '%s', but found %d." + + " Beam ZetaSQL supports only a single function definition per function name (BEAM-12073).", + functionName, functions.size())); + } + for (org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Function function : + functions) { + List path = Arrays.asList(functionName.split("\\.")); + if (function instanceof ScalarFunctionImpl) { + ScalarFunctionImpl scalarFunction = (ScalarFunctionImpl) function; + // Validate types before converting from Calcite to ZetaSQL, since the conversion may fail + // for unsupported types. 
+ for (FunctionParameter parameter : scalarFunction.getParameters()) { + validateJavaUdfCalciteType(parameter.getType(typeFactory), functionName); + } + validateJavaUdfCalciteType(scalarFunction.getReturnType(typeFactory), functionName); + Method method = scalarFunction.method; + javaScalarUdfs.put(path, UserFunctionDefinitions.JavaScalarFunction.create(method, "")); + FunctionArgumentType resultType = + new FunctionArgumentType( + ZetaSqlCalciteTranslationUtils.toZetaSqlType( + scalarFunction.getReturnType(typeFactory))); + FunctionSignature functionSignature = + new FunctionSignature(resultType, getArgumentTypes(scalarFunction), 0L); + zetaSqlCatalog.addFunction( + new Function( + path, + USER_DEFINED_JAVA_SCALAR_FUNCTIONS, + ZetaSQLFunctions.FunctionEnums.Mode.SCALAR, + ImmutableList.of(functionSignature))); + } else if (function instanceof UdafImpl) { + UdafImpl udaf = (UdafImpl) function; + javaUdafs.put(path, udaf.getCombineFn()); + FunctionArgumentType resultType = + new FunctionArgumentType( + ZetaSqlCalciteTranslationUtils.toZetaSqlType(udaf.getReturnType(typeFactory))); + FunctionSignature functionSignature = + new FunctionSignature(resultType, getArgumentTypes(udaf), 0L); + zetaSqlCatalog.addFunction( + new Function( + path, + USER_DEFINED_JAVA_AGGREGATE_FUNCTIONS, + ZetaSQLFunctions.FunctionEnums.Mode.AGGREGATE, + ImmutableList.of(functionSignature))); + } else { + throw new IllegalArgumentException( + String.format( + "Function %s has unrecognized implementation type %s.", + functionName, function.getClass().getName())); + } + } + } + } + + private List getArgumentTypes( + org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Function function) { + return function.getParameters().stream() + .map( + (arg) -> + new FunctionArgumentType( + ZetaSqlCalciteTranslationUtils.toZetaSqlType(arg.getType(typeFactory)))) + .collect(Collectors.toList()); + } + + /** + * Throws {@link UnsupportedOperationException} if Calcite type is not supported in Java UDF. + * Supported types are a subset of the corresponding Calcite types supported by {@link + * BeamJavaUdfCalcRule}. + * + *

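+   * <p>For example (illustrative): {@code VARCHAR} parameters and {@code ARRAY} parameters with a
+   * {@code BIGINT} component type are accepted, while {@code TIME} and {@code ROW} parameters
+   * raise {@link UnsupportedOperationException}.
+   *
+   * <p>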
    Supported types should be kept in sync with {@link #validateJavaUdfZetaSqlType(Type, + * String)}. + */ + private void validateJavaUdfCalciteType(RelDataType type, String functionName) { + switch (type.getSqlTypeName()) { + case BIGINT: + case BOOLEAN: + case DATE: + case DECIMAL: + case DOUBLE: + case TIMESTAMP: + case VARCHAR: + case VARBINARY: + // These types are supported. + break; + case ARRAY: + validateJavaUdfCalciteType(type.getComponentType(), functionName); + break; + case TIME: + case TIMESTAMP_WITH_LOCAL_TIME_ZONE: + case ROW: + default: + throw new UnsupportedOperationException( + String.format( + "Calcite type %s not allowed in function %s", + type.getSqlTypeName().getName(), functionName)); + } + } + + private String getFunctionGroup(ResolvedNodes.ResolvedCreateFunctionStmt createFunctionStmt) { + switch (createFunctionStmt.getLanguage().toUpperCase()) { + case "JAVA": + return createFunctionStmt.getIsAggregate() + ? USER_DEFINED_JAVA_AGGREGATE_FUNCTIONS + : USER_DEFINED_JAVA_SCALAR_FUNCTIONS; + case "SQL": + if (createFunctionStmt.getIsAggregate()) { + throw new UnsupportedOperationException( + "Native SQL aggregate functions are not supported (BEAM-9954)."); + } + return USER_DEFINED_SQL_FUNCTIONS; + case "PY": + case "PYTHON": + case "JS": + case "JAVASCRIPT": + throw new UnsupportedOperationException( + String.format( + "Function %s uses unsupported language %s.", + String.join(".", createFunctionStmt.getNamePath()), + createFunctionStmt.getLanguage())); + default: + throw new IllegalArgumentException( + String.format( + "Function %s uses unrecognized language %s.", + String.join(".", createFunctionStmt.getNamePath()), + createFunctionStmt.getLanguage())); + } + } + + /** + * Assume last element in tablePath is a table name, and everything before is catalogs. So the + * logic is to create nested catalogs until the last level, then add a table at the last level. + * + *

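+   * <p>For example (names purely illustrative), a table path {@code ["dataset", "v1", "orders"]}
+   * yields nested catalogs {@code dataset} and {@code v1}, with a simple table {@code orders}
+   * added to the innermost catalog.
+   *
+   * <p>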
    Table schema is extracted from Calcite schema based on the table name resolution strategy, + * e.g. either by drilling down the schema.getSubschema() path or joining the table name with dots + * to construct a single compound identifier (e.g. Data Catalog use case). + */ + private void addTableToLeafCatalog(List tablePath, QueryTrait queryTrait) { + + SimpleCatalog leafCatalog = createNestedCatalogs(zetaSqlCatalog, tablePath); + + org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Table calciteTable = + TableResolution.resolveCalciteTable(calciteSchema, tablePath); + + if (calciteTable == null) { + throw new ZetaSqlException( + "Wasn't able to resolve the path " + + tablePath + + " in schema: " + + calciteSchema.getName()); + } + + RelDataType rowType = calciteTable.getRowType(typeFactory); + + TableResolution.SimpleTableWithPath tableWithPath = + TableResolution.SimpleTableWithPath.of(tablePath); + queryTrait.addResolvedTable(tableWithPath); + + addFieldsToTable(tableWithPath, rowType); + leafCatalog.addSimpleTable(tableWithPath.getTable()); + } + + private static void addFieldsToTable( + TableResolution.SimpleTableWithPath tableWithPath, RelDataType rowType) { + for (RelDataTypeField field : rowType.getFieldList()) { + tableWithPath + .getTable() + .addSimpleColumn( + field.getName(), ZetaSqlCalciteTranslationUtils.toZetaSqlType(field.getType())); + } + } + + /** For table path like a.b.c we assume c is the table and a.b are the nested catalogs/schemas. */ + private static SimpleCatalog createNestedCatalogs(SimpleCatalog catalog, List tablePath) { + SimpleCatalog currentCatalog = catalog; + for (int i = 0; i < tablePath.size() - 1; i++) { + String nextCatalogName = tablePath.get(i); + + Optional existing = tryGetExisting(currentCatalog, nextCatalogName); + + currentCatalog = + existing.isPresent() ? existing.get() : addNewCatalog(currentCatalog, nextCatalogName); + } + return currentCatalog; + } + + private static Optional tryGetExisting( + SimpleCatalog currentCatalog, String nextCatalogName) { + return currentCatalog.getCatalogList().stream() + .filter(c -> nextCatalogName.equals(c.getFullName())) + .findFirst(); + } + + private static SimpleCatalog addNewCatalog(SimpleCatalog currentCatalog, String nextCatalogName) { + SimpleCatalog nextCatalog = new SimpleCatalog(nextCatalogName); + currentCatalog.addSimpleCatalog(nextCatalog); + return nextCatalog; + } + + private static String getJarPath(ResolvedNodes.ResolvedCreateFunctionStmt createFunctionStmt) { + String jarPath = getOptionStringValue(createFunctionStmt, "path"); + if (jarPath.isEmpty()) { + throw new IllegalArgumentException( + String.format( + "No jar was provided to define function %s. 
Add 'OPTIONS (path=)' to the CREATE FUNCTION statement.", + String.join(".", createFunctionStmt.getNamePath()))); + } + return jarPath; + } + + private static String getOptionStringValue( + ResolvedNodes.ResolvedCreateFunctionStmt createFunctionStmt, String optionName) { + for (ResolvedNodes.ResolvedOption option : createFunctionStmt.getOptionList()) { + if (optionName.equals(option.getName())) { + if (option.getValue() == null) { + throw new IllegalArgumentException( + String.format( + "Option '%s' has null value (expected %s).", + optionName, ZetaSQLType.TypeKind.TYPE_STRING)); + } + if (option.getValue().getType().getKind() != ZetaSQLType.TypeKind.TYPE_STRING) { + throw new IllegalArgumentException( + String.format( + "Option '%s' has type %s (expected %s).", + optionName, + option.getValue().getType().getKind(), + ZetaSQLType.TypeKind.TYPE_STRING)); + } + return ((ResolvedNodes.ResolvedLiteral) option.getValue()).getValue().getStringValue(); + } + } + return ""; + } +} diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlRelType.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlRelType.java new file mode 100644 index 000000000000..53cb446e4a60 --- /dev/null +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlRelType.java @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.zetasql; + +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamLogicalConvention; +import org.apache.beam.sdk.extensions.sql.impl.rel.CalcRelSplitter; +import org.apache.beam.sdk.extensions.sql.zetasql.translation.ZetaSqlScalarFunctionImpl; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexDynamicParam; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexFieldAccess; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexProgram; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.validate.SqlUserDefinedFunction; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RelBuilder; + +/** {@link CalcRelSplitter.RelType} for {@link BeamZetaSqlCalcRel}. 
*/ +class BeamZetaSqlRelType extends CalcRelSplitter.RelType { + BeamZetaSqlRelType(String name) { + super(name); + } + + @Override + protected boolean canImplement(RexFieldAccess field) { + return true; + } + + @Override + protected boolean canImplement(RexDynamicParam param) { + return true; + } + + @Override + protected boolean canImplement(RexLiteral literal) { + return true; + } + + @Override + protected boolean canImplement(RexCall call) { + if (call.getOperator() instanceof SqlUserDefinedFunction) { + SqlUserDefinedFunction udf = (SqlUserDefinedFunction) call.op; + if (udf.function instanceof ZetaSqlScalarFunctionImpl) { + ZetaSqlScalarFunctionImpl scalarFunction = (ZetaSqlScalarFunctionImpl) udf.function; + if (scalarFunction.functionGroup.equals( + BeamZetaSqlCatalog.USER_DEFINED_JAVA_SCALAR_FUNCTIONS)) { + return false; + } + } + } + return true; + } + + @Override + protected RelNode makeRel( + RelOptCluster cluster, + RelTraitSet traitSet, + RelBuilder relBuilder, + RelNode input, + RexProgram program) { + RexProgram normalizedProgram = program.normalize(cluster.getRexBuilder(), false); + return new BeamZetaSqlCalcRel( + cluster, + traitSet.replace(BeamLogicalConvention.INSTANCE), + RelOptRule.convert(input, input.getTraitSet().replace(BeamLogicalConvention.INSTANCE)), + normalizedProgram); + } +} diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/DateTimeUtils.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/DateTimeUtils.java index d1f092470c93..b258597e4472 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/DateTimeUtils.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/DateTimeUtils.java @@ -21,7 +21,7 @@ import io.grpc.Status; import java.time.LocalTime; import java.util.List; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.util.TimeUnit; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.util.TimeUnit; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Splitter; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SqlAnalyzer.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SqlAnalyzer.java index b4666cd11405..d6ac756db647 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SqlAnalyzer.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SqlAnalyzer.java @@ -22,28 +22,16 @@ import static com.google.zetasql.ZetaSQLResolvedNodeKind.ResolvedNodeKind.RESOLVED_QUERY_STMT; import static java.nio.charset.StandardCharsets.UTF_8; -import com.google.common.collect.ImmutableList; -import com.google.common.collect.ImmutableSet; import com.google.zetasql.Analyzer; import com.google.zetasql.AnalyzerOptions; -import com.google.zetasql.Function; -import com.google.zetasql.FunctionArgumentType; -import com.google.zetasql.FunctionSignature; import com.google.zetasql.ParseResumeLocation; -import com.google.zetasql.SimpleCatalog; -import com.google.zetasql.TVFRelation; -import com.google.zetasql.TableValuedFunction; -import com.google.zetasql.TypeFactory; import 
com.google.zetasql.Value; -import com.google.zetasql.ZetaSQLBuiltinFunctionOptions; -import com.google.zetasql.ZetaSQLFunctions.FunctionEnums.Mode; -import com.google.zetasql.ZetaSQLFunctions.SignatureArgumentKind; import com.google.zetasql.ZetaSQLOptions.ErrorMessageMode; import com.google.zetasql.ZetaSQLOptions.LanguageFeature; import com.google.zetasql.ZetaSQLOptions.ParameterMode; import com.google.zetasql.ZetaSQLOptions.ProductMode; import com.google.zetasql.ZetaSQLResolvedNodeKind.ResolvedNodeKind; -import com.google.zetasql.ZetaSQLType.TypeKind; +import com.google.zetasql.resolvedast.ResolvedNodes; import com.google.zetasql.resolvedast.ResolvedNodes.ResolvedCreateFunctionStmt; import com.google.zetasql.resolvedast.ResolvedNodes.ResolvedCreateTableFunctionStmt; import com.google.zetasql.resolvedast.ResolvedNodes.ResolvedStatement; @@ -51,60 +39,21 @@ import java.util.HashSet; import java.util.List; import java.util.Map; -import java.util.Optional; -import java.util.stream.Collectors; -import org.apache.beam.sdk.extensions.sql.impl.ParseException; import org.apache.beam.sdk.extensions.sql.impl.QueryPlanner.QueryParameters; import org.apache.beam.sdk.extensions.sql.impl.QueryPlanner.QueryParameters.Kind; -import org.apache.beam.sdk.extensions.sql.impl.SqlConversionException; -import org.apache.beam.sdk.extensions.sql.impl.utils.TVFStreamingUtils; -import org.apache.beam.sdk.extensions.sql.zetasql.TableResolution.SimpleTableWithPath; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.java.JavaTypeFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeField; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.SchemaPlus; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; /** Adapter for {@link Analyzer} to simplify the API for parsing the query and resolving the AST. */ @SuppressWarnings({ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class SqlAnalyzer { - public static final String PRE_DEFINED_WINDOW_FUNCTIONS = "pre_defined_window_functions"; - public static final String USER_DEFINED_FUNCTIONS = "user_defined_functions"; - private static final ImmutableSet SUPPORTED_STATEMENT_KINDS = ImmutableSet.of( RESOLVED_QUERY_STMT, RESOLVED_CREATE_FUNCTION_STMT, RESOLVED_CREATE_TABLE_FUNCTION_STMT); - private static final ImmutableList FUNCTION_LIST = - ImmutableList.of( - // TODO: support optional function argument (for window_offset). 
- "CREATE FUNCTION TUMBLE(ts TIMESTAMP, window_size STRING) AS (1);", - "CREATE FUNCTION TUMBLE_START(window_size STRING) RETURNS TIMESTAMP AS (null);", - "CREATE FUNCTION TUMBLE_END(window_size STRING) RETURNS TIMESTAMP AS (null);", - "CREATE FUNCTION HOP(ts TIMESTAMP, emit_frequency STRING, window_size STRING) AS (1);", - "CREATE FUNCTION HOP_START(emit_frequency STRING, window_size STRING) " - + "RETURNS TIMESTAMP AS (null);", - "CREATE FUNCTION HOP_END(emit_frequency STRING, window_size STRING) " - + "RETURNS TIMESTAMP AS (null);", - "CREATE FUNCTION SESSION(ts TIMESTAMP, session_gap STRING) AS (1);", - "CREATE FUNCTION SESSION_START(session_gap STRING) RETURNS TIMESTAMP AS (null);", - "CREATE FUNCTION SESSION_END(session_gap STRING) RETURNS TIMESTAMP AS (null);"); - - private final QueryTrait queryTrait; - private final SchemaPlus topLevelSchema; - private final JavaTypeFactory typeFactory; - - SqlAnalyzer(QueryTrait queryTrait, SchemaPlus topLevelSchema, JavaTypeFactory typeFactory) { - this.queryTrait = queryTrait; - this.topLevelSchema = topLevelSchema; - this.typeFactory = typeFactory; - } - - static boolean isEndOfInput(ParseResumeLocation parseResumeLocation) { - return parseResumeLocation.getBytePosition() - >= parseResumeLocation.getInput().getBytes(UTF_8).length; - } + SqlAnalyzer() {} /** Returns table names from all statements in the SQL string. */ List> extractTableNames(String sql, AnalyzerOptions options) { @@ -118,45 +67,66 @@ List> extractTableNames(String sql, AnalyzerOptions options) { return tables.build(); } + /** + * Analyzes the entire SQL code block (which may consist of multiple statements) and returns the + * resolved query. + * + *

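+   * <p>For example (illustrative), the block {@code CREATE FUNCTION inc(i INT64) AS (i + 1);
+   * SELECT inc(1);} is analyzed one statement at a time: the CREATE FUNCTION statement is added to
+   * the catalog, and the trailing SELECT is returned as the resolved query.
+   *
+   * <p>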
    Assumes there is exactly one SELECT statement in the input, and it must be the last + * statement in the input. + */ + ResolvedNodes.ResolvedQueryStmt analyzeQuery( + String sql, AnalyzerOptions options, BeamZetaSqlCatalog catalog) { + ParseResumeLocation parseResumeLocation = new ParseResumeLocation(sql); + ResolvedStatement statement; + do { + statement = analyzeNextStatement(parseResumeLocation, options, catalog); + if (statement.nodeKind() == RESOLVED_QUERY_STMT) { + if (!SqlAnalyzer.isEndOfInput(parseResumeLocation)) { + throw new UnsupportedOperationException( + "No additional statements are allowed after a SELECT statement."); + } + } + } while (!SqlAnalyzer.isEndOfInput(parseResumeLocation)); + + if (!(statement instanceof ResolvedNodes.ResolvedQueryStmt)) { + throw new UnsupportedOperationException( + "Statement list must end in a SELECT statement, not " + statement.nodeKindString()); + } + return (ResolvedNodes.ResolvedQueryStmt) statement; + } + + private static boolean isEndOfInput(ParseResumeLocation parseResumeLocation) { + return parseResumeLocation.getBytePosition() + >= parseResumeLocation.getInput().getBytes(UTF_8).length; + } + /** * Accepts the ParseResumeLocation for the current position in the SQL string. Advances the * ParseResumeLocation to the start of the next statement. Adds user-defined functions to the * catalog for use in following statements. Returns the resolved AST. */ - ResolvedStatement analyzeNextStatement( - ParseResumeLocation parseResumeLocation, AnalyzerOptions options, SimpleCatalog catalog) { + private ResolvedStatement analyzeNextStatement( + ParseResumeLocation parseResumeLocation, + AnalyzerOptions options, + BeamZetaSqlCatalog catalog) { ResolvedStatement resolvedStatement = - Analyzer.analyzeNextStatement(parseResumeLocation, options, catalog); + Analyzer.analyzeNextStatement(parseResumeLocation, options, catalog.getZetaSqlCatalog()); if (resolvedStatement.nodeKind() == RESOLVED_CREATE_FUNCTION_STMT) { ResolvedCreateFunctionStmt createFunctionStmt = (ResolvedCreateFunctionStmt) resolvedStatement; - Function userFunction = - new Function( - createFunctionStmt.getNamePath(), - USER_DEFINED_FUNCTIONS, - // TODO(BEAM-9954) handle aggregate functions - // TODO(BEAM-9969) handle table functions - Mode.SCALAR, - ImmutableList.of(createFunctionStmt.getSignature())); try { - catalog.addFunction(userFunction); + catalog.addFunction(createFunctionStmt); } catch (IllegalArgumentException e) { - throw new ParseException( + throw new RuntimeException( String.format( - "Failed to define function %s", String.join(".", createFunctionStmt.getNamePath())), + "Failed to define function '%s'", + String.join(".", createFunctionStmt.getNamePath())), e); } } else if (resolvedStatement.nodeKind() == RESOLVED_CREATE_TABLE_FUNCTION_STMT) { ResolvedCreateTableFunctionStmt createTableFunctionStmt = (ResolvedCreateTableFunctionStmt) resolvedStatement; - catalog.addTableValuedFunction( - new TableValuedFunction.FixedOutputSchemaTVF( - createTableFunctionStmt.getNamePath(), - createTableFunctionStmt.getSignature(), - TVFRelation.createColumnBased( - createTableFunctionStmt.getQuery().getColumnList().stream() - .map(c -> TVFRelation.Column.create(c.getName(), c.getType())) - .collect(Collectors.toList())))); + catalog.addTableValuedFunction(createTableFunctionStmt); } else if (!SUPPORTED_STATEMENT_KINDS.contains(resolvedStatement.nodeKind())) { throw new UnsupportedOperationException( "Unrecognized statement type " + resolvedStatement.nodeKindString()); @@ -174,13 +144,15 
@@ static AnalyzerOptions baseAnalyzerOptions() { .setEnabledLanguageFeatures( new HashSet<>( Arrays.asList( - LanguageFeature.FEATURE_NUMERIC_TYPE, + LanguageFeature.FEATURE_CREATE_AGGREGATE_FUNCTION, + LanguageFeature.FEATURE_CREATE_TABLE_FUNCTION, LanguageFeature.FEATURE_DISALLOW_GROUP_BY_FLOAT, - LanguageFeature.FEATURE_V_1_2_CIVIL_TIME, - LanguageFeature.FEATURE_V_1_1_SELECT_STAR_EXCEPT_REPLACE, + LanguageFeature.FEATURE_NUMERIC_TYPE, LanguageFeature.FEATURE_TABLE_VALUED_FUNCTIONS, - LanguageFeature.FEATURE_CREATE_TABLE_FUNCTION, - LanguageFeature.FEATURE_TEMPLATE_FUNCTIONS))); + LanguageFeature.FEATURE_TEMPLATE_FUNCTIONS, + LanguageFeature.FEATURE_V_1_1_SELECT_STAR_EXCEPT_REPLACE, + LanguageFeature.FEATURE_V_1_2_CIVIL_TIME, + LanguageFeature.FEATURE_V_1_3_ADDITIONAL_STRING_FUNCTIONS))); options.getLanguageOptions().setSupportedStatementKinds(SUPPORTED_STATEMENT_KINDS); return options; @@ -205,179 +177,4 @@ static AnalyzerOptions getAnalyzerOptions(QueryParameters queryParams, String de return options; } - - /** - * Creates a SimpleCatalog which represents the top-level schema, populates it with tables, - * built-in functions. - */ - SimpleCatalog createPopulatedCatalog( - String catalogName, AnalyzerOptions options, List> tables) { - - SimpleCatalog catalog = new SimpleCatalog(catalogName); - addBuiltinFunctionsToCatalog(catalog, options); - - tables.forEach(table -> addTableToLeafCatalog(queryTrait, catalog, table)); - - return catalog; - } - - private void addBuiltinFunctionsToCatalog(SimpleCatalog catalog, AnalyzerOptions options) { - // Enable ZetaSQL builtin functions. - ZetaSQLBuiltinFunctionOptions zetasqlBuiltinFunctionOptions = - new ZetaSQLBuiltinFunctionOptions(options.getLanguageOptions()); - - SupportedZetaSqlBuiltinFunctions.ALLOWLIST.forEach( - zetasqlBuiltinFunctionOptions::includeFunctionSignatureId); - - catalog.addZetaSQLFunctions(zetasqlBuiltinFunctionOptions); - - FUNCTION_LIST.stream() - .map(func -> (ResolvedCreateFunctionStmt) Analyzer.analyzeStatement(func, options, catalog)) - .map( - resolvedFunc -> - new Function( - String.join(".", resolvedFunc.getNamePath()), - PRE_DEFINED_WINDOW_FUNCTIONS, - Mode.SCALAR, - ImmutableList.of(resolvedFunc.getSignature()))) - .forEach(catalog::addFunction); - - FunctionArgumentType retType = - new FunctionArgumentType(SignatureArgumentKind.ARG_TYPE_RELATION); - - FunctionArgumentType inputTableType = - new FunctionArgumentType(SignatureArgumentKind.ARG_TYPE_RELATION); - - FunctionArgumentType descriptorType = - new FunctionArgumentType( - SignatureArgumentKind.ARG_TYPE_DESCRIPTOR, - FunctionArgumentType.FunctionArgumentTypeOptions.builder() - .setDescriptorResolutionTableOffset(0) - .build(), - 1); - - FunctionArgumentType stringType = - new FunctionArgumentType(TypeFactory.createSimpleType(TypeKind.TYPE_STRING)); - - // TUMBLE - catalog.addTableValuedFunction( - new TableValuedFunction.ForwardInputSchemaToOutputSchemaWithAppendedColumnTVF( - ImmutableList.of(TVFStreamingUtils.FIXED_WINDOW_TVF), - new FunctionSignature( - retType, ImmutableList.of(inputTableType, descriptorType, stringType), -1), - ImmutableList.of( - TVFRelation.Column.create( - TVFStreamingUtils.WINDOW_START, - TypeFactory.createSimpleType(TypeKind.TYPE_TIMESTAMP)), - TVFRelation.Column.create( - TVFStreamingUtils.WINDOW_END, - TypeFactory.createSimpleType(TypeKind.TYPE_TIMESTAMP))), - null, - null)); - - // HOP - catalog.addTableValuedFunction( - new TableValuedFunction.ForwardInputSchemaToOutputSchemaWithAppendedColumnTVF( - 
ImmutableList.of(TVFStreamingUtils.SLIDING_WINDOW_TVF), - new FunctionSignature( - retType, - ImmutableList.of(inputTableType, descriptorType, stringType, stringType), - -1), - ImmutableList.of( - TVFRelation.Column.create( - TVFStreamingUtils.WINDOW_START, - TypeFactory.createSimpleType(TypeKind.TYPE_TIMESTAMP)), - TVFRelation.Column.create( - TVFStreamingUtils.WINDOW_END, - TypeFactory.createSimpleType(TypeKind.TYPE_TIMESTAMP))), - null, - null)); - - // SESSION - catalog.addTableValuedFunction( - new TableValuedFunction.ForwardInputSchemaToOutputSchemaWithAppendedColumnTVF( - ImmutableList.of(TVFStreamingUtils.SESSION_WINDOW_TVF), - new FunctionSignature( - retType, - ImmutableList.of(inputTableType, descriptorType, descriptorType, stringType), - -1), - ImmutableList.of( - TVFRelation.Column.create( - TVFStreamingUtils.WINDOW_START, - TypeFactory.createSimpleType(TypeKind.TYPE_TIMESTAMP)), - TVFRelation.Column.create( - TVFStreamingUtils.WINDOW_END, - TypeFactory.createSimpleType(TypeKind.TYPE_TIMESTAMP))), - null, - null)); - } - - /** - * Assume last element in tablePath is a table name, and everything before is catalogs. So the - * logic is to create nested catalogs until the last level, then add a table at the last level. - * - *
<p>
    Table schema is extracted from Calcite schema based on the table name resultion strategy, - * e.g. either by drilling down the schema.getSubschema() path or joining the table name with dots - * to construct a single compound identifier (e.g. Data Catalog use case). - */ - private void addTableToLeafCatalog( - QueryTrait trait, SimpleCatalog topLevelCatalog, List tablePath) { - - SimpleCatalog leafCatalog = createNestedCatalogs(topLevelCatalog, tablePath); - - org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.Table calciteTable = - TableResolution.resolveCalciteTable(topLevelSchema, tablePath); - - if (calciteTable == null) { - throw new SqlConversionException( - "Wasn't able to resolve the path " - + tablePath - + " in schema: " - + topLevelSchema.getName()); - } - - RelDataType rowType = calciteTable.getRowType(typeFactory); - - SimpleTableWithPath tableWithPath = SimpleTableWithPath.of(tablePath); - trait.addResolvedTable(tableWithPath); - - addFieldsToTable(tableWithPath, rowType); - leafCatalog.addSimpleTable(tableWithPath.getTable()); - } - - private void addFieldsToTable(SimpleTableWithPath tableWithPath, RelDataType rowType) { - for (RelDataTypeField field : rowType.getFieldList()) { - tableWithPath - .getTable() - .addSimpleColumn( - field.getName(), ZetaSqlCalciteTranslationUtils.toZetaSqlType(field.getType())); - } - } - - /** For table path like a.b.c we assume c is the table and a.b are the nested catalogs/schemas. */ - private SimpleCatalog createNestedCatalogs(SimpleCatalog catalog, List tablePath) { - SimpleCatalog currentCatalog = catalog; - for (int i = 0; i < tablePath.size() - 1; i++) { - String nextCatalogName = tablePath.get(i); - - Optional existing = tryGetExisting(currentCatalog, nextCatalogName); - - currentCatalog = - existing.isPresent() ? existing.get() : addNewCatalog(currentCatalog, nextCatalogName); - } - return currentCatalog; - } - - private Optional tryGetExisting( - SimpleCatalog currentCatalog, String nextCatalogName) { - return currentCatalog.getCatalogList().stream() - .filter(c -> nextCatalogName.equals(c.getFullName())) - .findFirst(); - } - - private SimpleCatalog addNewCatalog(SimpleCatalog currentCatalog, String nextCatalogName) { - SimpleCatalog nextCatalog = new SimpleCatalog(nextCatalogName); - currentCatalog.addSimpleCatalog(nextCatalog); - return nextCatalog; - } } diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SupportedZetaSqlBuiltinFunctions.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SupportedZetaSqlBuiltinFunctions.java index 16471fecf58a..6d9d11487e49 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SupportedZetaSqlBuiltinFunctions.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SupportedZetaSqlBuiltinFunctions.java @@ -395,7 +395,7 @@ class SupportedZetaSqlBuiltinFunctions { // Aggregate functions. 
FunctionSignatureId.FN_ANY_VALUE, // any_value - // FunctionSignatureId.FN_ARRAY_AGG, // array_agg + FunctionSignatureId.FN_ARRAY_AGG, // array_agg // FunctionSignatureId.FN_ARRAY_CONCAT_AGG, // array_concat_agg FunctionSignatureId.FN_AVG_INT64, // avg FunctionSignatureId.FN_AVG_DOUBLE, // avg @@ -415,6 +415,7 @@ class SupportedZetaSqlBuiltinFunctions { // JIRA link: https://issues.apache.org/jira/browse/BEAM-10379 // FunctionSignatureId.FN_BIT_AND_INT64, // bit_and FunctionSignatureId.FN_BIT_OR_INT64, // bit_or + // TODO(BEAM-10379) Re-enable when nulls are handled properly. // FunctionSignatureId.FN_BIT_XOR_INT64, // bit_xor // FunctionSignatureId.FN_LOGICAL_AND, // logical_and // FunctionSignatureId.FN_LOGICAL_OR, // logical_or @@ -461,7 +462,7 @@ class SupportedZetaSqlBuiltinFunctions { // FunctionSignatureId.FN_VAR_SAMP_NUMERIC, // var_samp // FunctionSignatureId.FN_VAR_SAMP_BIGNUMERIC, // var_samp - // FunctionSignatureId.FN_COUNTIF, // countif + FunctionSignatureId.FN_COUNTIF, // countif // Approximate quantiles functions that produce or consume intermediate // sketches. All found in the "kll_quantiles.*" namespace. @@ -525,7 +526,7 @@ class SupportedZetaSqlBuiltinFunctions { FunctionSignatureId.FN_TO_JSON_STRING, // to_json_string(any[, bool]) -> string FunctionSignatureId.FN_JSON_QUERY, // json_query(string, string) -> string // FunctionSignatureId.FN_JSON_QUERY_JSON, // json_query(json, string) -> json - FunctionSignatureId.FN_JSON_VALUE // json_value(string, string) -> string + FunctionSignatureId.FN_JSON_VALUE, // json_value(string, string) -> string // FunctionSignatureId.FN_JSON_VALUE_JSON, // json_value(json, string) -> json // Net functions. These are all found in the "net.*" namespace. @@ -547,14 +548,14 @@ class SupportedZetaSqlBuiltinFunctions { // FunctionSignatureId.FN_NET_IPV4_TO_INT64, // net.ipv4_to_int64(bytes) // Hashing functions. 
- // FunctionSignatureId.FN_MD5_BYTES, // md5(bytes) - // FunctionSignatureId.FN_MD5_STRING, // md5(string) - // FunctionSignatureId.FN_SHA1_BYTES, // sha1(bytes) - // FunctionSignatureId.FN_SHA1_STRING, // sha1(string) - // FunctionSignatureId.FN_SHA256_BYTES, // sha256(bytes) - // FunctionSignatureId.FN_SHA256_STRING, // sha256(string) - // FunctionSignatureId.FN_SHA512_BYTES, // sha512(bytes) - // FunctionSignatureId.FN_SHA512_STRING, // sha512(string) + FunctionSignatureId.FN_MD5_BYTES, // md5(bytes) + FunctionSignatureId.FN_MD5_STRING, // md5(string) + FunctionSignatureId.FN_SHA1_BYTES, // sha1(bytes) + FunctionSignatureId.FN_SHA1_STRING, // sha1(string) + FunctionSignatureId.FN_SHA256_BYTES, // sha256(bytes) + FunctionSignatureId.FN_SHA256_STRING, // sha256(string) + FunctionSignatureId.FN_SHA512_BYTES, // sha512(bytes) + FunctionSignatureId.FN_SHA512_STRING // sha512(string) // Fingerprinting functions // FunctionSignatureId.FN_FARM_FINGERPRINT_BYTES, // farm_fingerprint(bytes) -> int64 diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/TableResolution.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/TableResolution.java index b76884d55607..907a39ae5f16 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/TableResolution.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/TableResolution.java @@ -23,10 +23,10 @@ import org.apache.beam.sdk.extensions.sql.impl.BeamCalciteSchema; import org.apache.beam.sdk.extensions.sql.impl.TableName; import org.apache.beam.sdk.extensions.sql.meta.CustomTableResolver; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.CalciteSchema; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.Schema; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.SchemaPlus; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.Table; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalciteSchema; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Schema; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.SchemaPlus; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Table; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; /** Utility methods to resolve a table, given a top-level Calcite schema and a table path. 
*/ diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLPlannerImpl.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLPlannerImpl.java index 6259d12485c3..2de752f37fc1 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLPlannerImpl.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLPlannerImpl.java @@ -17,41 +17,29 @@ */ package org.apache.beam.sdk.extensions.sql.zetasql; -import static com.google.zetasql.ZetaSQLResolvedNodeKind.ResolvedNodeKind.RESOLVED_CREATE_FUNCTION_STMT; -import static com.google.zetasql.ZetaSQLResolvedNodeKind.ResolvedNodeKind.RESOLVED_CREATE_TABLE_FUNCTION_STMT; -import static com.google.zetasql.ZetaSQLResolvedNodeKind.ResolvedNodeKind.RESOLVED_QUERY_STMT; - import com.google.zetasql.AnalyzerOptions; import com.google.zetasql.LanguageOptions; -import com.google.zetasql.ParseResumeLocation; -import com.google.zetasql.SimpleCatalog; -import com.google.zetasql.resolvedast.ResolvedNode; -import com.google.zetasql.resolvedast.ResolvedNodes.ResolvedCreateFunctionStmt; -import com.google.zetasql.resolvedast.ResolvedNodes.ResolvedCreateTableFunctionStmt; import com.google.zetasql.resolvedast.ResolvedNodes.ResolvedQueryStmt; -import com.google.zetasql.resolvedast.ResolvedNodes.ResolvedStatement; import java.util.List; import org.apache.beam.sdk.extensions.sql.impl.QueryPlanner.QueryParameters; import org.apache.beam.sdk.extensions.sql.zetasql.translation.ConversionContext; import org.apache.beam.sdk.extensions.sql.zetasql.translation.ExpressionConverter; import org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter; -import org.apache.beam.sdk.extensions.sql.zetasql.translation.UserFunctionDefinitions; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.java.JavaTypeFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelRoot; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexBuilder; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexExecutor; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.SchemaPlus; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.FrameworkConfig; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.Frameworks; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.Program; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Util; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.java.JavaTypeFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import 
org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelRoot; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexBuilder; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexExecutor; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.SchemaPlus; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.FrameworkConfig; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.Frameworks; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.Program; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.Util; /** ZetaSQLPlannerImpl. */ @SuppressWarnings({ @@ -89,56 +77,25 @@ class ZetaSQLPlannerImpl { public RelRoot rel(String sql, QueryParameters params) { RelOptCluster cluster = RelOptCluster.create(planner, new RexBuilder(typeFactory)); - QueryTrait trait = new QueryTrait(); - SqlAnalyzer analyzer = - new SqlAnalyzer(trait, defaultSchemaPlus, (JavaTypeFactory) cluster.getTypeFactory()); - AnalyzerOptions options = SqlAnalyzer.getAnalyzerOptions(params, defaultTimezone); + BeamZetaSqlCatalog catalog = + BeamZetaSqlCatalog.create( + defaultSchemaPlus, (JavaTypeFactory) cluster.getTypeFactory(), options); // Set up table providers that need to be pre-registered + SqlAnalyzer analyzer = new SqlAnalyzer(); List> tables = analyzer.extractTableNames(sql, options); TableResolution.registerTables(this.defaultSchemaPlus, tables); - SimpleCatalog catalog = - analyzer.createPopulatedCatalog(defaultSchemaPlus.getName(), options, tables); - - ImmutableMap.Builder, ResolvedCreateFunctionStmt> udfBuilder = - ImmutableMap.builder(); - ImmutableMap.Builder, ResolvedNode> udtvfBuilder = ImmutableMap.builder(); - - ResolvedStatement statement; - ParseResumeLocation parseResumeLocation = new ParseResumeLocation(sql); - do { - statement = analyzer.analyzeNextStatement(parseResumeLocation, options, catalog); - if (statement.nodeKind() == RESOLVED_CREATE_FUNCTION_STMT) { - ResolvedCreateFunctionStmt createFunctionStmt = (ResolvedCreateFunctionStmt) statement; - udfBuilder.put(createFunctionStmt.getNamePath(), createFunctionStmt); - } else if (statement.nodeKind() == RESOLVED_CREATE_TABLE_FUNCTION_STMT) { - ResolvedCreateTableFunctionStmt createTableFunctionStmt = - (ResolvedCreateTableFunctionStmt) statement; - udtvfBuilder.put(createTableFunctionStmt.getNamePath(), createTableFunctionStmt.getQuery()); - } else if (statement.nodeKind() == RESOLVED_QUERY_STMT) { - if (!SqlAnalyzer.isEndOfInput(parseResumeLocation)) { - throw new UnsupportedOperationException( - "No additional statements are allowed after a SELECT statement."); - } - break; - } - } while (!SqlAnalyzer.isEndOfInput(parseResumeLocation)); - - if (!(statement instanceof ResolvedQueryStmt)) { - throw new UnsupportedOperationException( - "Statement list must end in a SELECT statement, not " + statement.nodeKindString()); - } - - UserFunctionDefinitions userFunctionDefinitions = - new UserFunctionDefinitions(udfBuilder.build(), udtvfBuilder.build()); + QueryTrait trait = new QueryTrait(); + catalog.addTables(tables, trait); + + ResolvedQueryStmt statement = analyzer.analyzeQuery(sql, options, 
catalog); ExpressionConverter expressionConverter = - new ExpressionConverter(cluster, params, userFunctionDefinitions); + new ExpressionConverter(cluster, params, catalog.getUserFunctionDefinitions()); ConversionContext context = ConversionContext.of(config, expressionConverter, cluster, trait); - RelNode convertedNode = - QueryStatementConverter.convertRootQuery(context, (ResolvedQueryStmt) statement); + RelNode convertedNode = QueryStatementConverter.convertRootQuery(context, statement); return RelRoot.of(convertedNode, SqlKind.ALL); } diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLQueryPlanner.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLQueryPlanner.java index b943ab3d2b74..2bc25edfc5bc 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLQueryPlanner.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLQueryPlanner.java @@ -38,32 +38,36 @@ import org.apache.beam.sdk.extensions.sql.impl.rule.BeamUnnestRule; import org.apache.beam.sdk.extensions.sql.zetasql.unnest.BeamZetaSqlUncollectRule; import org.apache.beam.sdk.extensions.sql.zetasql.unnest.BeamZetaSqlUnnestRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.config.CalciteConnectionConfig; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.CalciteSchema; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.ConventionTraitDef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitDef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.prepare.CalciteCatalogReader; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelRoot; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.ChainedRelMetadataProvider; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.JaninoRelMetadataProvider; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.FilterCalcMergeRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.JoinCommuteRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.ProjectCalcMergeRule; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.SchemaPlus; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlOperatorTable; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.fun.SqlStdOperatorTable; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.parser.SqlParser; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.parser.SqlParserImplFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.util.ChainedSqlOperatorTable; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.FrameworkConfig; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.Frameworks; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RuleSet; -import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RuleSets; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.config.CalciteConnectionConfig; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalciteSchema; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.ConventionTraitDef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptUtil; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitDef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.prepare.CalciteCatalogReader; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelRoot; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.ChainedRelMetadataProvider; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.JaninoRelMetadataProvider; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.rules.FilterCalcMergeRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.rules.JoinCommuteRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.rules.ProjectCalcMergeRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.SchemaPlus; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlOperatorTable; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.fun.SqlStdOperatorTable; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.parser.SqlParser; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.parser.SqlParserImplFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.util.SqlOperatorTables; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.FrameworkConfig; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.Frameworks; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RuleSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RuleSets; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; /** ZetaSQLQueryPlanner. 
*/ @SuppressWarnings({ @@ -71,6 +75,13 @@ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class ZetaSQLQueryPlanner implements QueryPlanner { + public static final Collection DEFAULT_CALC = + ImmutableList.builder() + .add(BeamZetaSqlCalcSplittingRule.INSTANCE, BeamZetaSqlCalcMergeRule.INSTANCE) + .build(); + + private static final Logger LOG = LoggerFactory.getLogger(ZetaSQLQueryPlanner.class); + private final ZetaSQLPlannerImpl plannerImpl; public ZetaSQLQueryPlanner(FrameworkConfig config) { @@ -83,7 +94,8 @@ public ZetaSQLQueryPlanner(FrameworkConfig config) { */ public ZetaSQLQueryPlanner(JdbcConnection jdbcConnection, Collection ruleSets) { plannerImpl = - new ZetaSQLPlannerImpl(defaultConfig(jdbcConnection, modifyRuleSetsForZetaSql(ruleSets))); + new ZetaSQLPlannerImpl( + defaultConfig(jdbcConnection, modifyRuleSetsForZetaSql(ruleSets, DEFAULT_CALC))); setDefaultTimezone( jdbcConnection .getPipelineOptions() @@ -101,10 +113,15 @@ public QueryPlanner createPlanner( }; public static Collection getZetaSqlRuleSets() { - return modifyRuleSetsForZetaSql(BeamRuleSets.getRuleSets()); + return modifyRuleSetsForZetaSql(BeamRuleSets.getRuleSets(), DEFAULT_CALC); } - private static Collection modifyRuleSetsForZetaSql(Collection ruleSets) { + public static Collection getZetaSqlRuleSets(Collection calc) { + return modifyRuleSetsForZetaSql(BeamRuleSets.getRuleSets(), calc); + } + + private static Collection modifyRuleSetsForZetaSql( + Collection ruleSets, Collection calc) { ImmutableList.Builder ret = ImmutableList.builder(); for (RuleSet ruleSet : ruleSets) { ImmutableList.Builder bd = ImmutableList.builder(); @@ -122,7 +139,7 @@ private static Collection modifyRuleSetsForZetaSql(Collection // planning result eventually. continue; } else if (rule instanceof BeamCalcRule) { - bd.add(BeamZetaSqlCalcRule.INSTANCE); + bd.addAll(calc); } else if (rule instanceof BeamUnnestRule) { bd.add(BeamZetaSqlUnnestRule.INSTANCE); } else if (rule instanceof BeamUncollectRule) { @@ -189,14 +206,20 @@ private BeamRelNode convertToBeamRelInternal(String sql, QueryParameters queryPa .getCluster() .setMetadataProvider( ChainedRelMetadataProvider.of( - org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList.of( + org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList.of( NonCumulativeCostImpl.SOURCE, RelMdNodeStats.SOURCE, root.rel.getCluster().getMetadataProvider()))); RelMetadataQuery.THREAD_PROVIDERS.set( JaninoRelMetadataProvider.of(root.rel.getCluster().getMetadataProvider())); root.rel.getCluster().invalidateMetadataQuery(); - return (BeamRelNode) plannerImpl.transform(0, desiredTraits, root.rel); + try { + BeamRelNode beamRelNode = (BeamRelNode) plannerImpl.transform(0, desiredTraits, root.rel); + LOG.info("BEAMPlan>\n" + RelOptUtil.toString(beamRelNode)); + return beamRelNode; + } catch (RelOptPlanner.CannotPlanException e) { + throw new SqlConversionException("Failed to produce plan for query " + sql, e); + } } private static FrameworkConfig defaultConfig( @@ -236,7 +259,7 @@ private static FrameworkConfig defaultConfig( .ruleSets(ruleSets.toArray(new RuleSet[0])) .costFactory(BeamCostModel.FACTORY) .typeSystem(connection.getTypeFactory().getTypeSystem()) - .operatorTable(ChainedSqlOperatorTable.of(opTab0, catalogReader)) + .operatorTable(SqlOperatorTables.chain(opTab0, catalogReader)) .build(); } } diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlBeamTranslationUtils.java 
b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlBeamTranslationUtils.java index ca6bb776f48a..f29c9307cfdf 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlBeamTranslationUtils.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlBeamTranslationUtils.java @@ -40,6 +40,7 @@ import org.apache.beam.sdk.schemas.logicaltypes.SqlTypes; import org.apache.beam.sdk.values.Row; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.math.LongMath; +import org.checkerframework.checker.nullness.qual.Nullable; import org.joda.time.Instant; /** @@ -108,7 +109,7 @@ public static StructType toZetaSqlStructType(Schema schema) { } // Value conversion: Beam => ZetaSQL - public static Value toZetaSqlValue(Object object, FieldType fieldType) { + public static Value toZetaSqlValue(@Nullable Object object, FieldType fieldType) { if (object == null) { return Value.createNullValue(toZetaSqlType(fieldType)); } diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlCalciteTranslationUtils.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlCalciteTranslationUtils.java index 203337c2ec9c..717c71f0e09b 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlCalciteTranslationUtils.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlCalciteTranslationUtils.java @@ -34,17 +34,17 @@ import org.apache.beam.sdk.annotations.Internal; import org.apache.beam.sdk.extensions.sql.meta.provider.bigquery.BeamBigQuerySqlDialect; import org.apache.beam.sdk.extensions.sql.zetasql.translation.SqlOperators; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.util.ByteString; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.util.TimeUnit; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.util.TimeUnitRange; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexBuilder; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.fun.SqlStdOperatorTable; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeName; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.DateString; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.TimeString; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.TimestampString; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.util.ByteString; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.util.TimeUnit; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.util.TimeUnitRange; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexBuilder; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.fun.SqlStdOperatorTable; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlTypeName; +import 
org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.DateString; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.TimeString; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.TimestampString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; @@ -58,6 +58,14 @@ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public final class ZetaSqlCalciteTranslationUtils { + // Maximum and minimum allowed values for the NUMERIC/DECIMAL data type. + // https://github.com/google/zetasql/blob/master/docs/data-types.md#decimal-type + public static final BigDecimal ZETASQL_NUMERIC_MAX_VALUE = + new BigDecimal("99999999999999999999999999999.999999999"); + public static final BigDecimal ZETASQL_NUMERIC_MIN_VALUE = + new BigDecimal("-99999999999999999999999999999.999999999"); + // Number of digits after the decimal point supported by the NUMERIC data type. + public static final int ZETASQL_NUMERIC_SCALE = 9; private ZetaSqlCalciteTranslationUtils() {} @@ -290,7 +298,10 @@ public static RexNode toRexNode(Value value, RexBuilder rexBuilder) { private static RexNode arrayValueToRexNode(Value value, RexBuilder rexBuilder) { return rexBuilder.makeCall( - toCalciteArrayType(value.getType().asArray().getElementType(), false, rexBuilder), + toCalciteArrayType( + value.getType().asArray().getElementType(), + value.getElementList().stream().anyMatch(v -> v.isNull()), + rexBuilder), SqlStdOperatorTable.ARRAY_VALUE_CONSTRUCTOR, value.getElementList().stream() .map(v -> toRexNode(v, rexBuilder)) diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlException.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlException.java new file mode 100644 index 000000000000..7a948efb4b74 --- /dev/null +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlException.java @@ -0,0 +1,37 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.zetasql; + +import com.google.zetasql.io.grpc.Status; +import com.google.zetasql.io.grpc.StatusRuntimeException; + +/** + * Exception to be thrown by the Beam ZetaSQL planner. + * + *
+ * <p>
    Wraps a {@link StatusRuntimeException} containing a GRPC status code. + */ +public class ZetaSqlException extends RuntimeException { + + public ZetaSqlException(StatusRuntimeException cause) { + super(cause); + } + + public ZetaSqlException(String message) { + this(Status.UNIMPLEMENTED.withDescription(message).asRuntimeException()); + } +} diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/AggregateScanConverter.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/AggregateScanConverter.java index 0a76bbde8274..bd73241ea06d 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/AggregateScanConverter.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/AggregateScanConverter.java @@ -33,16 +33,19 @@ import java.util.List; import java.util.stream.Collectors; import java.util.stream.IntStream; +import org.apache.beam.sdk.extensions.sql.impl.UdafImpl; +import org.apache.beam.sdk.extensions.sql.zetasql.BeamZetaSqlCatalog; import org.apache.beam.sdk.extensions.sql.zetasql.ZetaSqlCalciteTranslationUtils; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelCollations; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.AggregateCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalAggregate; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalProject; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlAggFunction; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.ImmutableBitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelCollations; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.AggregateCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalAggregate; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalProject; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlAggFunction; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlReturnTypeInference; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.ImmutableBitSet; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; @@ -170,7 +173,7 @@ private LogicalProject convertAggregateScanInputScanToLogicalProject( } } - return LogicalProject.create(input, projects, fieldNames); + return LogicalProject.create(input, ImmutableList.of(), projects, fieldNames); } private AggregateCall convertAggCall( @@ -200,14 +203,44 @@ private AggregateCall convertAggCall( + " aggregation."); } - SqlAggFunction sqlAggFunction = - (SqlAggFunction) - 
SqlOperatorMappingTable.ZETASQL_FUNCTION_TO_CALCITE_SQL_OPERATOR.get( - aggregateFunctionCall.getFunction().getName()); - if (sqlAggFunction == null) { - throw new UnsupportedOperationException( - "Does not support ZetaSQL aggregate function: " - + aggregateFunctionCall.getFunction().getName()); + final SqlAggFunction sqlAggFunction; + if (aggregateFunctionCall + .getFunction() + .getGroup() + .equals(BeamZetaSqlCatalog.USER_DEFINED_JAVA_AGGREGATE_FUNCTIONS)) { + // Create a new operator for user-defined functions. + SqlReturnTypeInference typeInference = + x -> + ZetaSqlCalciteTranslationUtils.toCalciteType( + aggregateFunctionCall + .getFunction() + .getSignatureList() + .get(0) + .getResultType() + .getType(), + // TODO(BEAM-9514) set nullable=true + false, + getCluster().getRexBuilder()); + UdafImpl impl = + new UdafImpl<>( + getExpressionConverter() + .userFunctionDefinitions + .javaAggregateFunctions() + .get(aggregateFunctionCall.getFunction().getNamePath())); + sqlAggFunction = + SqlOperators.createUdafOperator( + aggregateFunctionCall.getFunction().getName(), typeInference, impl); + } else { + // Look up builtin functions in SqlOperatorMappingTable. + sqlAggFunction = + (SqlAggFunction) + SqlOperatorMappingTable.ZETASQL_FUNCTION_TO_CALCITE_SQL_OPERATOR.get( + aggregateFunctionCall.getFunction().getName()); + if (sqlAggFunction == null) { + throw new UnsupportedOperationException( + "Does not support ZetaSQL aggregate function: " + + aggregateFunctionCall.getFunction().getName()); + } } List argList = new ArrayList<>(); diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ArrayScanColumnRefToUncollect.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ArrayScanColumnRefToUncollect.java index 08a0fdae54a0..655140d412c2 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ArrayScanColumnRefToUncollect.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ArrayScanColumnRefToUncollect.java @@ -22,17 +22,17 @@ import java.util.Collections; import java.util.List; import org.apache.beam.sdk.extensions.sql.zetasql.unnest.ZetaSqlUnnest; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.CorrelationId; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.JoinRelType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalCorrelate; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalProject; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexBuilder; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.ImmutableBitSet; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import 
org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.CorrelationId; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.JoinRelType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalCorrelate; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalProject; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexBuilder; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexInputRef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.ImmutableBitSet; /** * Converts array scan that represents a reference to an array column, or an (possibly nested) array @@ -87,6 +87,7 @@ public RelNode convert(ResolvedNodes.ResolvedArrayScan zetaNode, List i RelNode projectNode = LogicalProject.create( createOneRow(getCluster()), + ImmutableList.of(), Collections.singletonList( convertArrayExpr( zetaNode.getArrayExpr(), getCluster().getRexBuilder(), convertedColumnRef)), diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ArrayScanLiteralToUncollectConverter.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ArrayScanLiteralToUncollectConverter.java index 402bb7b4b2c8..f194840b7bd9 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ArrayScanLiteralToUncollectConverter.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ArrayScanLiteralToUncollectConverter.java @@ -21,9 +21,9 @@ import java.util.Collections; import java.util.List; import org.apache.beam.sdk.extensions.sql.zetasql.unnest.ZetaSqlUnnest; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalProject; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalProject; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; /** Converts array scan that represents an array literal to uncollect. 
*/ @@ -51,6 +51,7 @@ public RelNode convert(ResolvedArrayScan zetaNode, List inputs) { RelNode projectNode = LogicalProject.create( createOneRow(getCluster()), + ImmutableList.of(), Collections.singletonList(arrayLiteralExpression), ImmutableList.of(fieldName)); diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ArrayScanToJoinConverter.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ArrayScanToJoinConverter.java index 629d0367c433..ccc43ab1410b 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ArrayScanToJoinConverter.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ArrayScanToJoinConverter.java @@ -24,15 +24,15 @@ import java.util.Collections; import java.util.List; import org.apache.beam.sdk.extensions.sql.zetasql.unnest.ZetaSqlUnnest; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.CorrelationId; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.JoinRelType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalJoin; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalProject; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.CorrelationId; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.JoinRelType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalJoin; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalProject; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexInputRef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; /** Converts array scan that represents join of an uncollect(array_field) to uncollect. 
*/ class ArrayScanToJoinConverter extends RelConverter { @@ -80,7 +80,8 @@ public RelNode convert(ResolvedArrayScan zetaNode, List inputs) { columnRef.getColumn().getId(), zetaNode.getInputScan().getColumnList()))); RelNode projectNode = - LogicalProject.create(createOneRow(getCluster()), projects, ImmutableList.of(columnName)); + LogicalProject.create( + createOneRow(getCluster()), ImmutableList.of(), projects, ImmutableList.of(columnName)); // Create an UnCollect boolean ordinality = (zetaNode.getArrayOffsetColumn() != null); @@ -104,13 +105,22 @@ public RelNode convert(ResolvedArrayScan zetaNode, List inputs) { zetaNode.getArrayOffsetColumn().getColumn().getName())); } - RelNode rightInput = LogicalProject.create(uncollectNode, rightProjects, rightNames); + RelNode rightInput = + LogicalProject.create(uncollectNode, ImmutableList.of(), rightProjects, rightNames); // Join condition should be a RexNode converted from join_expr. RexNode condition = getExpressionConverter().convertRexNodeFromResolvedExpr(zetaNode.getJoinExpr()); JoinRelType joinRelType = zetaNode.getIsOuter() ? JoinRelType.LEFT : JoinRelType.INNER; - return LogicalJoin.create(leftInput, rightInput, condition, ImmutableSet.of(), joinRelType); + return LogicalJoin.create( + leftInput, + rightInput, + ImmutableList.of(), + condition, + ImmutableSet.of(), + joinRelType, + false, + ImmutableList.of()); } } diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ConversionContext.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ConversionContext.java index 9523b7669f15..0e7b4b816b60 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ConversionContext.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ConversionContext.java @@ -23,9 +23,9 @@ import java.util.Map; import org.apache.beam.sdk.annotations.Internal; import org.apache.beam.sdk.extensions.sql.zetasql.QueryTrait; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.FrameworkConfig; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.FrameworkConfig; /** Conversion context, some rules need this data to convert the nodes. 
*/ @Internal @@ -79,7 +79,7 @@ QueryTrait getTrait() { } Map, ResolvedNode> getUserDefinedTableValuedFunctions() { - return getExpressionConverter().userFunctionDefinitions.sqlTableValuedFunctions; + return getExpressionConverter().userFunctionDefinitions.sqlTableValuedFunctions(); } Map getFunctionArgumentRefMapping() { diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ExpressionConverter.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ExpressionConverter.java index 24a5e1c74c09..cae2ac38a010 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ExpressionConverter.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ExpressionConverter.java @@ -22,10 +22,13 @@ import static com.google.zetasql.ZetaSQLType.TypeKind.TYPE_BYTES; import static com.google.zetasql.ZetaSQLType.TypeKind.TYPE_DOUBLE; import static com.google.zetasql.ZetaSQLType.TypeKind.TYPE_INT64; +import static com.google.zetasql.ZetaSQLType.TypeKind.TYPE_NUMERIC; import static com.google.zetasql.ZetaSQLType.TypeKind.TYPE_STRING; import static com.google.zetasql.ZetaSQLType.TypeKind.TYPE_TIMESTAMP; -import static org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.PRE_DEFINED_WINDOW_FUNCTIONS; -import static org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.USER_DEFINED_FUNCTIONS; +import static org.apache.beam.sdk.extensions.sql.zetasql.BeamZetaSqlCatalog.PRE_DEFINED_WINDOW_FUNCTIONS; +import static org.apache.beam.sdk.extensions.sql.zetasql.BeamZetaSqlCatalog.USER_DEFINED_JAVA_SCALAR_FUNCTIONS; +import static org.apache.beam.sdk.extensions.sql.zetasql.BeamZetaSqlCatalog.USER_DEFINED_SQL_FUNCTIONS; +import static org.apache.beam.sdk.extensions.sql.zetasql.BeamZetaSqlCatalog.ZETASQL_FUNCTION_GROUP_NAME; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; import com.google.common.base.Ascii; @@ -39,6 +42,7 @@ import com.google.zetasql.Type; import com.google.zetasql.Value; import com.google.zetasql.ZetaSQLType.TypeKind; +import com.google.zetasql.io.grpc.Status; import com.google.zetasql.resolvedast.ResolvedColumn; import com.google.zetasql.resolvedast.ResolvedNodes; import com.google.zetasql.resolvedast.ResolvedNodes.ResolvedAggregateScan; @@ -63,28 +67,28 @@ import java.util.stream.IntStream; import org.apache.beam.sdk.annotations.Internal; import org.apache.beam.sdk.extensions.sql.impl.QueryPlanner.QueryParameters; -import org.apache.beam.sdk.extensions.sql.impl.SqlConversionException; import org.apache.beam.sdk.extensions.sql.impl.ZetaSqlUserDefinedSQLNativeTableValuedFunction; import org.apache.beam.sdk.extensions.sql.impl.utils.TVFStreamingUtils; import org.apache.beam.sdk.extensions.sql.zetasql.ZetaSqlCalciteTranslationUtils; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.util.TimeUnit; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeField; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeFieldImpl; -import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelRecordType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexBuilder; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLiteral; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlIdentifier; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlIntervalQualifier; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlOperator; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.fun.SqlRowOperator; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.parser.SqlParserPos; +import org.apache.beam.sdk.extensions.sql.zetasql.ZetaSqlException; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.avatica.util.TimeUnit; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeField; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeFieldImpl; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelRecordType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexBuilder; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexInputRef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlIdentifier; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlIntervalQualifier; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlOperator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.fun.SqlRowOperator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.parser.SqlParserPos; import org.checkerframework.checker.nullness.qual.Nullable; /** @@ -378,7 +382,7 @@ private RexNode convertRexNodeFromComputedColumnWithFieldList( if (functionCall.getFunction().getName().equals(FIXED_WINDOW) || functionCall.getFunction().getName().equals(SLIDING_WINDOW) || functionCall.getFunction().getName().equals(SESSION_WINDOW)) { - throw new SqlConversionException( + throw new ZetaSqlException( functionCall.getFunction().getName() + " shouldn't appear in SELECT exprlist."); } @@ -608,7 +612,7 @@ private RexNode convertResolvedFunctionCall( SqlOperator op = SqlOperatorMappingTable.ZETASQL_FUNCTION_TO_CALCITE_SQL_OPERATOR.get(funName); List operands = new ArrayList<>(); - if (funGroup.equals(PRE_DEFINED_WINDOW_FUNCTIONS)) { + if (PRE_DEFINED_WINDOW_FUNCTIONS.equals(funGroup)) { switch (funName) { case FIXED_WINDOW: case SESSION_WINDOW: @@ -646,7 +650,7 @@ private RexNode convertResolvedFunctionCall( throw new 
UnsupportedOperationException( "Unsupported function: " + funName + ". Only support TUMBLE, HOP, and SESSION now."); } - } else if (funGroup.equals("ZetaSQL")) { + } else if (ZETASQL_FUNCTION_GROUP_NAME.equals(funGroup)) { if (op == null) { Type returnType = functionCall.getSignature().getResultType().getType(); if (returnType != null) { @@ -664,9 +668,31 @@ private RexNode convertResolvedFunctionCall( operands.add( convertRexNodeFromResolvedExpr(expr, columnList, fieldList, outerFunctionArguments)); } - } else if (funGroup.equals(USER_DEFINED_FUNCTIONS)) { + } else if (USER_DEFINED_JAVA_SCALAR_FUNCTIONS.equals(funGroup)) { + UserFunctionDefinitions.JavaScalarFunction javaScalarFunction = + userFunctionDefinitions + .javaScalarFunctions() + .get(functionCall.getFunction().getNamePath()); + ArrayList innerFunctionArguments = new ArrayList<>(); + for (int i = 0; i < functionCall.getArgumentList().size(); i++) { + ResolvedExpr argExpr = functionCall.getArgumentList().get(i); + RexNode argNode = + convertRexNodeFromResolvedExpr(argExpr, columnList, fieldList, outerFunctionArguments); + innerFunctionArguments.add(argNode); + } + return rexBuilder() + .makeCall( + SqlOperators.createUdfOperator( + functionCall.getFunction().getName(), + javaScalarFunction.method(), + USER_DEFINED_JAVA_SCALAR_FUNCTIONS, + javaScalarFunction.jarPath()), + innerFunctionArguments); + } else if (USER_DEFINED_SQL_FUNCTIONS.equals(funGroup)) { ResolvedCreateFunctionStmt createFunctionStmt = - userFunctionDefinitions.sqlScalarFunctions.get(functionCall.getFunction().getNamePath()); + userFunctionDefinitions + .sqlScalarFunctions() + .get(functionCall.getFunction().getNamePath()); ResolvedExpr functionExpression = createFunctionStmt.getFunctionExpression(); ImmutableMap.Builder innerFunctionArguments = ImmutableMap.builder(); for (int i = 0; i < functionCall.getArgumentList().size(); i++) { @@ -694,7 +720,7 @@ private RexNode convertResolvedFunctionCall( private RexNode convertIntervalToRexIntervalLiteral(ResolvedLiteral resolvedLiteral) { if (resolvedLiteral.getType().getKind() != TYPE_STRING) { - throw new SqlConversionException(INTERVAL_FORMAT_MSG); + throw new ZetaSqlException(INTERVAL_FORMAT_MSG); } String valStr = resolvedLiteral.getValue().getStringValue(); @@ -702,18 +728,22 @@ private RexNode convertIntervalToRexIntervalLiteral(ResolvedLiteral resolvedLite Arrays.stream(valStr.split(" ")).filter(s -> !s.isEmpty()).collect(Collectors.toList()); if (stringList.size() != 3) { - throw new SqlConversionException(INTERVAL_FORMAT_MSG); + throw new ZetaSqlException(INTERVAL_FORMAT_MSG); } if (!Ascii.toUpperCase(stringList.get(0)).equals("INTERVAL")) { - throw new SqlConversionException(INTERVAL_FORMAT_MSG); + throw new ZetaSqlException(INTERVAL_FORMAT_MSG); } long intervalValue; try { intervalValue = Long.parseLong(stringList.get(1)); } catch (NumberFormatException e) { - throw new SqlConversionException(INTERVAL_FORMAT_MSG, e); + throw new ZetaSqlException( + Status.UNIMPLEMENTED + .withDescription(INTERVAL_FORMAT_MSG) + .withCause(e) + .asRuntimeException()); } String intervalDatepart = Ascii.toUpperCase(stringList.get(2)); @@ -746,7 +776,7 @@ private static BigDecimal convertIntervalValueToMillis( case INTERVAL_SECOND: return new BigDecimal(value * ONE_SECOND_IN_MILLIS); default: - throw new SqlConversionException(qualifier.typeName().toString()); + throw new ZetaSqlException(qualifier.typeName().toString()); } } @@ -772,7 +802,7 @@ private static SqlIntervalQualifier convertIntervalDatepartToSqlIntervalQualifie case 
"MILLISECOND": return new SqlIntervalQualifier(TimeUnit.MILLISECOND, null, SqlParserPos.ZERO); default: - throw new SqlConversionException( + throw new ZetaSqlException( String.format( "Received an undefined INTERVAL unit: %s. Please specify unit from the following" + " list: %s.", @@ -794,7 +824,7 @@ private RexNode convertResolvedCast( private RexNode convertResolvedCast(ResolvedCast resolvedCast, RexNode input) { TypeKind fromType = resolvedCast.getExpr().getType().getKind(); TypeKind toType = resolvedCast.getType().getKind(); - isCastingSupported(fromType, toType); + isCastingSupported(fromType, toType, input); // nullability of the output type should match that of the input node's type RelDataType outputType = @@ -808,12 +838,48 @@ private RexNode convertResolvedCast(ResolvedCast resolvedCast, RexNode input) { } } - private static void isCastingSupported(TypeKind fromType, TypeKind toType) { + private static void isCastingSupported(TypeKind fromType, TypeKind toType, RexNode input) { if (UNSUPPORTED_CASTING.containsKey(toType) && UNSUPPORTED_CASTING.get(toType).contains(fromType)) { throw new UnsupportedOperationException( "Does not support CAST(" + fromType + " AS " + toType + ")"); } + if (fromType.equals(TYPE_DOUBLE) + && toType.equals(TYPE_NUMERIC) + && input instanceof RexLiteral) { + BigDecimal value = (BigDecimal) ((RexLiteral) input).getValue(); + if (value.compareTo(ZetaSqlCalciteTranslationUtils.ZETASQL_NUMERIC_MAX_VALUE) > 0) { + throw new ZetaSqlException( + Status.OUT_OF_RANGE + .withDescription( + String.format( + "Casting %s as %s would cause overflow of literal %s.", + fromType, toType, value)) + .asRuntimeException()); + } + if (value.compareTo(ZetaSqlCalciteTranslationUtils.ZETASQL_NUMERIC_MIN_VALUE) < 0) { + throw new ZetaSqlException( + Status.OUT_OF_RANGE + .withDescription( + String.format( + "Casting %s as %s would cause underflow of literal %s.", + fromType, toType, value)) + .asRuntimeException()); + } + if (value.scale() > ZetaSqlCalciteTranslationUtils.ZETASQL_NUMERIC_SCALE) { + throw new ZetaSqlException( + Status.OUT_OF_RANGE + .withDescription( + String.format( + "Cannot cast %s as %s: scale %d exceeds %d for literal %s.", + fromType, + toType, + value.scale(), + ZetaSqlCalciteTranslationUtils.ZETASQL_NUMERIC_SCALE, + value)) + .asRuntimeException()); + } + } } private static boolean isZetaSQLCast(TypeKind fromType, TypeKind toType) { diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/FilterScanConverter.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/FilterScanConverter.java index bf72c2c91aab..b1cd3d1de04e 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/FilterScanConverter.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/FilterScanConverter.java @@ -21,9 +21,9 @@ import com.google.zetasql.resolvedast.ResolvedNodes.ResolvedFilterScan; import java.util.Collections; import java.util.List; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalFilter; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalFilter; 
+import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; /** Converts filter. */ class FilterScanConverter extends RelConverter { diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/JoinScanConverter.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/JoinScanConverter.java index 0f942067d05f..7c3570944e98 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/JoinScanConverter.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/JoinScanConverter.java @@ -22,11 +22,11 @@ import com.google.zetasql.resolvedast.ResolvedNode; import com.google.zetasql.resolvedast.ResolvedNodes.ResolvedJoinScan; import java.util.List; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.JoinRelType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalJoin; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeField; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.JoinRelType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalJoin; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeField; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; @@ -92,6 +92,7 @@ public RelNode convert(ResolvedJoinScan zetaNode, List inputs) { return LogicalJoin.create( convertedLeftInput, convertedRightInput, + ImmutableList.of(), condition, ImmutableSet.of(), convertResolvedJoinType(zetaNode.getJoinType())); diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/LimitOffsetScanToLimitConverter.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/LimitOffsetScanToLimitConverter.java index 679d6789df27..50cc96e8cdd0 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/LimitOffsetScanToLimitConverter.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/LimitOffsetScanToLimitConverter.java @@ -22,13 +22,13 @@ import com.google.zetasql.resolvedast.ResolvedNodes.ResolvedOrderByScan; import java.util.Collections; import java.util.List; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelCollation; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelCollations; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalSort; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexDynamicParam; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLiteral; -import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelCollation; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelCollations; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalSort; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexDynamicParam; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/LimitOffsetScanToOrderByLimitConverter.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/LimitOffsetScanToOrderByLimitConverter.java index ad40241985b1..cdd70792885c 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/LimitOffsetScanToOrderByLimitConverter.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/LimitOffsetScanToOrderByLimitConverter.java @@ -18,8 +18,8 @@ package org.apache.beam.sdk.extensions.sql.zetasql.translation; import static java.util.stream.Collectors.toList; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelFieldCollation.Direction.ASCENDING; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelFieldCollation.Direction.DESCENDING; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelFieldCollation.Direction.ASCENDING; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelFieldCollation.Direction.DESCENDING; import com.google.zetasql.resolvedast.ResolvedNode; import com.google.zetasql.resolvedast.ResolvedNodes.ResolvedLimitOffsetScan; @@ -27,15 +27,16 @@ import com.google.zetasql.resolvedast.ResolvedNodes.ResolvedOrderByScan; import java.util.Collections; import java.util.List; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelCollation; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelCollationImpl; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelFieldCollation; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelFieldCollation.Direction; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalProject; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalSort; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLiteral; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelCollation; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelCollationImpl; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelFieldCollation; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelFieldCollation.Direction; +import 
org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalProject; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalSort; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; /** Converts ORDER BY LIMIT. */ @@ -64,7 +65,7 @@ public List getInputs(ResolvedLimitOffsetScan zetaNode) { @Override public RelNode convert(ResolvedLimitOffsetScan zetaNode, List inputs) { ResolvedOrderByScan inputOrderByScan = (ResolvedOrderByScan) zetaNode.getInputScan(); - RelNode input = convertOrderByScanToLogicalScan(inputOrderByScan, inputs.get(0)); + RelNode input = inputs.get(0); RelCollation relCollation = getRelCollation(inputOrderByScan); RexNode offset = @@ -83,12 +84,13 @@ public RelNode convert(ResolvedLimitOffsetScan zetaNode, List inputs) { throw new UnsupportedOperationException("Limit requires non-null count and offset"); } - return LogicalSort.create(input, relCollation, offset, fetch); + RelNode sorted = LogicalSort.create(input, relCollation, offset, fetch); + return convertOrderByScanToLogicalScan(inputOrderByScan, sorted); } /** Collation is a sort order, as in ORDER BY DESCENDING/ASCENDING. */ private static RelCollation getRelCollation(ResolvedOrderByScan node) { - final long inputOffset = node.getColumnList().get(0).getId(); + final long inputOffset = node.getInputScan().getColumnList().get(0).getId(); List fieldCollations = node.getOrderByItemList().stream() .map(item -> orderByItemToFieldCollation(item, inputOffset)) @@ -109,6 +111,6 @@ private RelNode convertOrderByScanToLogicalScan(ResolvedOrderByScan node, RelNod .retrieveRexNodeFromOrderByScan(getCluster(), node, input.getRowType().getFieldList()); List fieldNames = getTrait().retrieveFieldNames(node.getColumnList()); - return LogicalProject.create(input, projects, fieldNames); + return LogicalProject.create(input, ImmutableList.of(), projects, fieldNames); } } diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/OrderByScanUnsupportedConverter.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/OrderByScanUnsupportedConverter.java index 878b2b2c6ddd..d564a1a4aa49 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/OrderByScanUnsupportedConverter.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/OrderByScanUnsupportedConverter.java @@ -19,7 +19,7 @@ import com.google.zetasql.resolvedast.ResolvedNodes.ResolvedOrderByScan; import java.util.List; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; /** * Always throws exception, represents the case when order by is used without limit. 
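For readers following the NUMERIC changes above: the range validation this patch adds to `ExpressionConverter#isCastingSupported` reuses the bounds declared in `ZetaSqlCalciteTranslationUtils` (`ZETASQL_NUMERIC_MAX_VALUE`, `ZETASQL_NUMERIC_MIN_VALUE`, `ZETASQL_NUMERIC_SCALE`). The sketch below is not part of the patch; it is a standalone illustration, with hypothetical class and method names, of the same three checks (overflow, underflow, and excess scale) applied to a `BigDecimal` literal:

```java
// Minimal, self-contained sketch (NOT part of the patch) of the NUMERIC literal
// checks added to ExpressionConverter#isCastingSupported. Bounds mirror
// ZETASQL_NUMERIC_MAX_VALUE / ZETASQL_NUMERIC_MIN_VALUE / ZETASQL_NUMERIC_SCALE.
import java.math.BigDecimal;

public class NumericLiteralCheckSketch {
  // 29 integer digits and 9 fractional digits, per the ZetaSQL NUMERIC/DECIMAL docs.
  static final BigDecimal MAX = new BigDecimal("99999999999999999999999999999.999999999");
  static final BigDecimal MIN = MAX.negate();
  static final int SCALE = 9;

  static void checkNumericLiteral(BigDecimal value) {
    if (value.compareTo(MAX) > 0) {
      throw new ArithmeticException("Casting would cause overflow of literal " + value);
    }
    if (value.compareTo(MIN) < 0) {
      throw new ArithmeticException("Casting would cause underflow of literal " + value);
    }
    if (value.scale() > SCALE) {
      throw new ArithmeticException(
          "Scale " + value.scale() + " exceeds " + SCALE + " for literal " + value);
    }
  }

  public static void main(String[] args) {
    checkNumericLiteral(new BigDecimal("123.456")); // within bounds, passes
    try {
      checkNumericLiteral(new BigDecimal("1.0000000001")); // scale 10 > 9, rejected
    } catch (ArithmeticException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The patch itself raises `ZetaSqlException` wrapping `Status.OUT_OF_RANGE` rather than a plain runtime exception, so the gRPC status code is preserved for callers.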
diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ProjectScanConverter.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ProjectScanConverter.java index d19b765d6f49..81fa76904744 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ProjectScanConverter.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ProjectScanConverter.java @@ -21,9 +21,10 @@ import com.google.zetasql.resolvedast.ResolvedNodes.ResolvedProjectScan; import java.util.Collections; import java.util.List; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalProject; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalProject; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; /** Converts projection. */ class ProjectScanConverter extends RelConverter { @@ -44,6 +45,6 @@ public RelNode convert(ResolvedProjectScan zetaNode, List inputs) { List projects = getExpressionConverter().retrieveRexNode(zetaNode, input.getRowType().getFieldList()); List fieldNames = getTrait().retrieveFieldNames(zetaNode.getColumnList()); - return LogicalProject.create(input, projects, fieldNames); + return LogicalProject.create(input, ImmutableList.of(), projects, fieldNames); } } diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/QueryStatementConverter.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/QueryStatementConverter.java index 45ece98ce0b5..57bf88926311 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/QueryStatementConverter.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/QueryStatementConverter.java @@ -37,7 +37,7 @@ import com.google.zetasql.resolvedast.ResolvedNodes.ResolvedQueryStmt; import java.util.Collections; import java.util.List; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMultimap; /** diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/RelConverter.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/RelConverter.java index 55e3f39b8a96..503c69905ee4 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/RelConverter.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/RelConverter.java @@ -22,14 +22,14 @@ import java.util.Collections; import java.util.List; import org.apache.beam.sdk.extensions.sql.zetasql.QueryTrait; -import 
org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalValues; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLiteral; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeName; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.FrameworkConfig; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalValues; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlTypeName; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.FrameworkConfig; /** A rule that converts Zeta SQL resolved relational node to corresponding Calcite rel node. */ abstract class RelConverter { diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SetOperationScanConverter.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SetOperationScanConverter.java index 375021b3f201..b09f3dd50913 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SetOperationScanConverter.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SetOperationScanConverter.java @@ -32,10 +32,10 @@ import java.util.List; import java.util.function.BiFunction; import java.util.function.Function; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalIntersect; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalMinus; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalUnion; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalIntersect; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalMinus; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalUnion; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SingleRowScanConverter.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SingleRowScanConverter.java index e05edc8f13e1..04e882505fb5 100644 --- 
a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SingleRowScanConverter.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SingleRowScanConverter.java @@ -19,7 +19,7 @@ import com.google.zetasql.resolvedast.ResolvedNodes.ResolvedSingleRowScan; import java.util.List; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; /** Converts a single row value. */ class SingleRowScanConverter extends RelConverter { diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlCaseWithValueOperatorRewriter.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlCaseWithValueOperatorRewriter.java index c48175038b4a..a8afbe1b94d7 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlCaseWithValueOperatorRewriter.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlCaseWithValueOperatorRewriter.java @@ -19,10 +19,10 @@ import java.util.ArrayList; import java.util.List; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexBuilder; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlOperator; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.fun.SqlStdOperatorTable; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexBuilder; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlOperator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.fun.SqlStdOperatorTable; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlCoalesceOperatorRewriter.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlCoalesceOperatorRewriter.java index e8044337f6be..0301bfc799b6 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlCoalesceOperatorRewriter.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlCoalesceOperatorRewriter.java @@ -19,11 +19,11 @@ import java.util.ArrayList; import java.util.List; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexBuilder; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlOperator; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.fun.SqlStdOperatorTable; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Util; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexBuilder; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import 
org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlOperator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.fun.SqlStdOperatorTable; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.Util; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlIfNullOperatorRewriter.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlIfNullOperatorRewriter.java index 4c7db12d6995..63292d4dddd4 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlIfNullOperatorRewriter.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlIfNullOperatorRewriter.java @@ -18,10 +18,10 @@ package org.apache.beam.sdk.extensions.sql.zetasql.translation; import java.util.List; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexBuilder; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlOperator; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.fun.SqlStdOperatorTable; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexBuilder; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlOperator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.fun.SqlStdOperatorTable; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlInOperatorRewriter.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlInOperatorRewriter.java new file mode 100644 index 000000000000..4db7f43e355b --- /dev/null +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlInOperatorRewriter.java @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.sql.zetasql.translation; + +import java.util.List; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexBuilder; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexLiteral; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; + +/** Rewrites $in calls as SEARCH calls. */ +class SqlInOperatorRewriter implements SqlOperatorRewriter { + @Override + public RexNode apply(RexBuilder rexBuilder, List operands) { + Preconditions.checkArgument( + operands.size() >= 2, "IN should have at least two arguments in function call."); + final RexNode arg = operands.get(0); + final List ranges = ImmutableList.copyOf(operands.subList(1, operands.size())); + + // ZetaSQL has weird behavior for NULL... + for (RexNode node : ranges) { + if (node instanceof RexLiteral && ((RexLiteral) node).isNull()) { + throw new UnsupportedOperationException("IN NULL unsupported"); + } + } + + return rexBuilder.makeIn(arg, ranges); + } +} diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlNullIfOperatorRewriter.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlNullIfOperatorRewriter.java index d07bfcbb7480..8cf62aaafc38 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlNullIfOperatorRewriter.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlNullIfOperatorRewriter.java @@ -18,10 +18,10 @@ package org.apache.beam.sdk.extensions.sql.zetasql.translation; import java.util.List; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexBuilder; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlOperator; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.fun.SqlStdOperatorTable; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexBuilder; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlOperator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.fun.SqlStdOperatorTable; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlOperatorMappingTable.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlOperatorMappingTable.java index 58e6f812bf3b..c59df966624a 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlOperatorMappingTable.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlOperatorMappingTable.java @@ -18,8 +18,8 @@ package org.apache.beam.sdk.extensions.sql.zetasql.translation; import java.util.Map; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlOperator; -import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.fun.SqlStdOperatorTable; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlOperator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.fun.SqlStdOperatorTable; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; /** SqlOperatorMappingTable. */ @@ -31,9 +31,9 @@ class SqlOperatorMappingTable { static final Map ZETASQL_FUNCTION_TO_CALCITE_SQL_OPERATOR = ImmutableMap.builder() // grouped window function - .put("TUMBLE", SqlStdOperatorTable.TUMBLE) - .put("HOP", SqlStdOperatorTable.HOP) - .put("SESSION", SqlStdOperatorTable.SESSION) + .put("TUMBLE", SqlStdOperatorTable.TUMBLE_OLD) + .put("HOP", SqlStdOperatorTable.HOP_OLD) + .put("SESSION", SqlStdOperatorTable.SESSION_OLD) // ZetaSQL functions .put("$and", SqlStdOperatorTable.AND) @@ -46,7 +46,6 @@ class SqlOperatorMappingTable { .put("$less", SqlStdOperatorTable.LESS_THAN) .put("$less_or_equal", SqlStdOperatorTable.LESS_THAN_OR_EQUAL) .put("$like", SqlOperators.LIKE) - .put("$in", SqlStdOperatorTable.IN) .put("$is_null", SqlStdOperatorTable.IS_NULL) .put("$is_true", SqlStdOperatorTable.IS_TRUE) .put("$is_false", SqlStdOperatorTable.IS_FALSE) @@ -57,6 +56,7 @@ class SqlOperatorMappingTable { .put("$divide", SqlStdOperatorTable.DIVIDE) .put("concat", SqlOperators.CONCAT) .put("substr", SqlOperators.SUBSTR) + .put("substring", SqlOperators.SUBSTR) .put("trim", SqlOperators.TRIM) .put("replace", SqlOperators.REPLACE) .put("char_length", SqlOperators.CHAR_LENGTH) @@ -75,7 +75,9 @@ class SqlOperatorMappingTable { // .put("bit_and", SqlStdOperatorTable.BIT_AND) //JIRA link: // https://issues.apache.org/jira/browse/BEAM-10379 .put("string_agg", SqlOperators.STRING_AGG_STRING_FN) // NULL values not supported + .put("array_agg", SqlOperators.ARRAY_AGG_FN) .put("bit_or", SqlStdOperatorTable.BIT_OR) + .put("bit_xor", SqlOperators.BIT_XOR) .put("ceil", SqlStdOperatorTable.CEIL) .put("floor", SqlStdOperatorTable.FLOOR) .put("mod", SqlStdOperatorTable.MOD) @@ -93,6 +95,7 @@ class SqlOperatorMappingTable { .put("coalesce", SqlStdOperatorTable.CASE) .put("ifnull", SqlStdOperatorTable.CASE) .put("nullif", SqlStdOperatorTable.CASE) + .put("countif", SqlOperators.COUNTIF) .build(); static final Map ZETASQL_FUNCTION_TO_CALCITE_SQL_OPERATOR_REWRITER = @@ -101,5 +104,6 @@ class SqlOperatorMappingTable { .put("coalesce", new SqlCoalesceOperatorRewriter()) .put("ifnull", new SqlIfNullOperatorRewriter()) .put("nullif", new SqlNullIfOperatorRewriter()) + .put("$in", new SqlInOperatorRewriter()) .build(); } diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlOperatorRewriter.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlOperatorRewriter.java index ae31ffa854d6..f64a334e8a9b 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlOperatorRewriter.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlOperatorRewriter.java @@ -18,8 +18,8 @@ package org.apache.beam.sdk.extensions.sql.zetasql.translation; import java.util.List; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexBuilder; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexBuilder; +import 
org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; /** Interface for rewriting calls a specific ZetaSQL operator. */ interface SqlOperatorRewriter { diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlOperators.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlOperators.java index 1c0835c4d96d..b44f63ebb1bf 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlOperators.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlOperators.java @@ -17,6 +17,8 @@ */ package org.apache.beam.sdk.extensions.sql.zetasql.translation; +import static org.apache.beam.sdk.extensions.sql.zetasql.BeamZetaSqlCatalog.ZETASQL_FUNCTION_GROUP_NAME; + import java.lang.reflect.Method; import java.util.ArrayList; import java.util.List; @@ -24,36 +26,39 @@ import org.apache.beam.sdk.extensions.sql.impl.ScalarFunctionImpl; import org.apache.beam.sdk.extensions.sql.impl.UdafImpl; import org.apache.beam.sdk.extensions.sql.impl.planner.BeamRelDataTypeSystem; +import org.apache.beam.sdk.extensions.sql.impl.transform.BeamBuiltinAggregations; +import org.apache.beam.sdk.extensions.sql.impl.transform.agg.CountIf; +import org.apache.beam.sdk.extensions.sql.impl.udaf.ArrayAgg; import org.apache.beam.sdk.extensions.sql.impl.udaf.StringAgg; import org.apache.beam.sdk.extensions.sql.zetasql.DateTimeUtils; import org.apache.beam.sdk.extensions.sql.zetasql.translation.impl.BeamBuiltinMethods; import org.apache.beam.sdk.extensions.sql.zetasql.translation.impl.CastFunctionImpl; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.JavaTypeFactoryImpl; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeFactoryImpl; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.AggregateFunction; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.Function; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.FunctionParameter; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.ScalarFunction; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlFunction; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlFunctionCategory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlIdentifier; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlOperator; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlSyntax; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.parser.SqlParserPos; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.FamilyOperandTypeChecker; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.InferTypes; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.OperandTypes; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlReturnTypeInference; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeFactoryImpl; -import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeFamily; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeName; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.validate.SqlUserDefinedAggFunction; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.validate.SqlUserDefinedFunction; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Optionality; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Util; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.JavaTypeFactoryImpl; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeFactoryImpl; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.AggregateFunction; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Function; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.FunctionParameter; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.ScalarFunction; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlFunction; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlFunctionCategory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlIdentifier; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlOperator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlSyntax; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.parser.SqlParserPos; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.FamilyOperandTypeChecker; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.InferTypes; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.OperandTypes; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlReturnTypeInference; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlTypeFactoryImpl; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlTypeFamily; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlTypeName; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.validate.SqlUserDefinedAggFunction; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.validate.SqlUserDefinedFunction; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.Optionality; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.util.Util; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; @@ -81,37 +86,52 @@ public class SqlOperators { x -> createTypeFactory().createSqlType(SqlTypeName.VARCHAR), new UdafImpl<>(new StringAgg.StringAggString())); + public static final SqlOperator ARRAY_AGG_FN = + createUdafOperator( + "array_agg", + x -> createTypeFactory().createArrayType(x.getOperandType(0), -1), + new UdafImpl<>(new ArrayAgg.ArrayAggArray())); + public static final SqlOperator START_WITHS = - createUdfOperator("STARTS_WITH", BeamBuiltinMethods.STARTS_WITH_METHOD); + createUdfOperator( 
+ "STARTS_WITH", BeamBuiltinMethods.STARTS_WITH_METHOD, ZETASQL_FUNCTION_GROUP_NAME); public static final SqlOperator CONCAT = - createUdfOperator("CONCAT", BeamBuiltinMethods.CONCAT_METHOD); + createUdfOperator("CONCAT", BeamBuiltinMethods.CONCAT_METHOD, ZETASQL_FUNCTION_GROUP_NAME); public static final SqlOperator REPLACE = - createUdfOperator("REPLACE", BeamBuiltinMethods.REPLACE_METHOD); + createUdfOperator("REPLACE", BeamBuiltinMethods.REPLACE_METHOD, ZETASQL_FUNCTION_GROUP_NAME); - public static final SqlOperator TRIM = createUdfOperator("TRIM", BeamBuiltinMethods.TRIM_METHOD); + public static final SqlOperator TRIM = + createUdfOperator("TRIM", BeamBuiltinMethods.TRIM_METHOD, ZETASQL_FUNCTION_GROUP_NAME); public static final SqlOperator LTRIM = - createUdfOperator("LTRIM", BeamBuiltinMethods.LTRIM_METHOD); + createUdfOperator("LTRIM", BeamBuiltinMethods.LTRIM_METHOD, ZETASQL_FUNCTION_GROUP_NAME); public static final SqlOperator RTRIM = - createUdfOperator("RTRIM", BeamBuiltinMethods.RTRIM_METHOD); + createUdfOperator("RTRIM", BeamBuiltinMethods.RTRIM_METHOD, ZETASQL_FUNCTION_GROUP_NAME); public static final SqlOperator SUBSTR = - createUdfOperator("SUBSTR", BeamBuiltinMethods.SUBSTR_METHOD); + createUdfOperator("SUBSTR", BeamBuiltinMethods.SUBSTR_METHOD, ZETASQL_FUNCTION_GROUP_NAME); public static final SqlOperator REVERSE = - createUdfOperator("REVERSE", BeamBuiltinMethods.REVERSE_METHOD); + createUdfOperator("REVERSE", BeamBuiltinMethods.REVERSE_METHOD, ZETASQL_FUNCTION_GROUP_NAME); public static final SqlOperator CHAR_LENGTH = - createUdfOperator("CHAR_LENGTH", BeamBuiltinMethods.CHAR_LENGTH_METHOD); + createUdfOperator( + "CHAR_LENGTH", BeamBuiltinMethods.CHAR_LENGTH_METHOD, ZETASQL_FUNCTION_GROUP_NAME); public static final SqlOperator ENDS_WITH = - createUdfOperator("ENDS_WITH", BeamBuiltinMethods.ENDS_WITH_METHOD); + createUdfOperator( + "ENDS_WITH", BeamBuiltinMethods.ENDS_WITH_METHOD, ZETASQL_FUNCTION_GROUP_NAME); public static final SqlOperator LIKE = - createUdfOperator("LIKE", BeamBuiltinMethods.LIKE_METHOD, SqlSyntax.BINARY); + createUdfOperator( + "LIKE", + BeamBuiltinMethods.LIKE_METHOD, + SqlSyntax.BINARY, + ZETASQL_FUNCTION_GROUP_NAME, + ""); public static final SqlOperator VALIDATE_TIMESTAMP = createUdfOperator( @@ -119,7 +139,8 @@ public class SqlOperators { DateTimeUtils.class, "validateTimestamp", x -> NULLABLE_TIMESTAMP, - ImmutableList.of(TIMESTAMP)); + ImmutableList.of(TIMESTAMP), + ZETASQL_FUNCTION_GROUP_NAME); public static final SqlOperator VALIDATE_TIME_INTERVAL = createUdfOperator( @@ -127,18 +148,32 @@ public class SqlOperators { DateTimeUtils.class, "validateTimeInterval", x -> NULLABLE_BIGINT, - ImmutableList.of(BIGINT, OTHER)); + ImmutableList.of(BIGINT, OTHER), + ZETASQL_FUNCTION_GROUP_NAME); public static final SqlOperator TIMESTAMP_OP = - createUdfOperator("TIMESTAMP", BeamBuiltinMethods.TIMESTAMP_METHOD); + createUdfOperator( + "TIMESTAMP", BeamBuiltinMethods.TIMESTAMP_METHOD, ZETASQL_FUNCTION_GROUP_NAME); public static final SqlOperator DATE_OP = - createUdfOperator("DATE", BeamBuiltinMethods.DATE_METHOD); + createUdfOperator("DATE", BeamBuiltinMethods.DATE_METHOD, ZETASQL_FUNCTION_GROUP_NAME); + + public static final SqlOperator BIT_XOR = + createUdafOperator( + "BIT_XOR", + x -> createTypeFactory().createSqlType(SqlTypeName.BIGINT), + new UdafImpl<>(new BeamBuiltinAggregations.BitXOr())); + + public static final SqlOperator COUNTIF = + createUdafOperator( + "countif", + x -> createTypeFactory().createSqlType(SqlTypeName.BIGINT), + new 
UdafImpl<>(new CountIf.CountIfFn())); public static final SqlUserDefinedFunction CAST_OP = new SqlUserDefinedFunction( new SqlIdentifier("CAST", SqlParserPos.ZERO), - null, + SqlKind.OTHER_FUNCTION, null, null, null, @@ -146,7 +181,9 @@ public class SqlOperators { /** * Create a dummy SqlFunction of type OTHER_FUNCTION from given function name and return type. - * These functions will be unparsed in BeamZetaSqlCalcRel and then executed by ZetaSQL evaluator. + * These functions will be unparsed in either {@link + * org.apache.beam.sdk.extensions.sql.zetasql.BeamZetaSqlCalcRel} (for built-in functions) or + * {@link org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel} (for user-defined functions). */ public static SqlFunction createZetaSqlFunction(String name, SqlTypeName returnType) { return new SqlFunction( @@ -158,7 +195,7 @@ public static SqlFunction createZetaSqlFunction(String name, SqlTypeName returnT SqlFunctionCategory.USER_DEFINED_FUNCTION); } - private static SqlUserDefinedAggFunction createUdafOperator( + static SqlUserDefinedAggFunction createUdafOperator( String name, SqlReturnTypeInference returnTypeInference, AggregateFunction function) { return new SqlUserDefinedAggFunction( new SqlIdentifier(name, SqlParserPos.ZERO), @@ -177,26 +214,29 @@ private static SqlUserDefinedFunction createUdfOperator( Class methodClass, String methodName, SqlReturnTypeInference returnTypeInference, - List paramTypes) { + List paramTypes, + String funGroup) { return new SqlUserDefinedFunction( new SqlIdentifier(name, SqlParserPos.ZERO), returnTypeInference, null, null, paramTypes, - ScalarFunctionImpl.create(methodClass, methodName)); + ZetaSqlScalarFunctionImpl.create(methodClass, methodName, funGroup, "")); + } + + static SqlUserDefinedFunction createUdfOperator( + String name, Method method, String funGroup, String jarPath) { + return createUdfOperator(name, method, SqlSyntax.FUNCTION, funGroup, jarPath); } - // Helper function to create SqlUserDefinedFunction based on a function name and a method. - // SqlUserDefinedFunction will be able to pass through Calcite codegen and get proper function - // called. 
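// ------------------------------------------------------------------------------------------
// Editor's illustration, not part of the patch: rough usage of the SqlOperators factory
// methods changed above. The function name and return type below are arbitrary stand-ins,
// and the snippet assumes the vendored Calcite imports (SqlFunction, SqlTypeName) already
// used in this file. Per the expanded javadoc above, a "dummy" function created this way is
// not code-generated by Calcite; it is unparsed back to SQL and evaluated in
// BeamZetaSqlCalcRel (built-in functions) or BeamCalcRel (user-defined functions).
static SqlFunction exampleDummyFunction() {
  return SqlOperators.createZetaSqlFunction("example_fn", SqlTypeName.VARCHAR);
}
// ------------------------------------------------------------------------------------------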
- private static SqlUserDefinedFunction createUdfOperator(String name, Method method) { - return createUdfOperator(name, method, SqlSyntax.FUNCTION); + static SqlUserDefinedFunction createUdfOperator(String name, Method method, String funGroup) { + return createUdfOperator(name, method, SqlSyntax.FUNCTION, funGroup, ""); } private static SqlUserDefinedFunction createUdfOperator( - String name, Method method, final SqlSyntax syntax) { - Function function = ScalarFunctionImpl.create(method); + String name, Method method, final SqlSyntax syntax, String funGroup, String jarPath) { + Function function = ZetaSqlScalarFunctionImpl.create(method, funGroup, jarPath); final RelDataTypeFactory typeFactory = createTypeFactory(); List argTypes = new ArrayList<>(); diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlWindowTableFunction.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlWindowTableFunction.java index 423c41283d83..d693080b8be7 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlWindowTableFunction.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlWindowTableFunction.java @@ -20,20 +20,20 @@ import java.util.ArrayList; import java.util.List; import org.apache.beam.sdk.extensions.sql.impl.utils.TVFStreamingUtils; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeField; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeFieldImpl; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelRecordType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlCallBinding; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlFunction; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlFunctionCategory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlOperandCountRange; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlOperandCountRanges; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlReturnTypeInference; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeName; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.validate.SqlValidator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeField; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeFieldImpl; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelRecordType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlCallBinding; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlFunction; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlFunctionCategory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlKind; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlNode; +import 
org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlOperandCountRange; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlOperandCountRanges; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlReturnTypeInference; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlTypeName; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.validate.SqlValidator; /** Base class for table-valued function windowing operator (TUMBLE, HOP and SESSION). */ @SuppressWarnings({ diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/TVFScanConverter.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/TVFScanConverter.java index 01a1df2455a3..29f9603f6032 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/TVFScanConverter.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/TVFScanConverter.java @@ -28,9 +28,9 @@ import java.util.ArrayList; import java.util.Collections; import java.util.List; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalTableFunctionScan; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalTableFunctionScan; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexCall; /** Converts TVFScan. */ @SuppressWarnings({ @@ -52,7 +52,8 @@ public RelNode convert(ResolvedTVFScan zetaNode, List inputs) { input, zetaNode.getTvf(), zetaNode.getArgumentList(), - zetaNode.getArgumentList().get(0).getScan() != null + zetaNode.getArgumentList().size() > 0 + && zetaNode.getArgumentList().get(0).getScan() != null ? 
zetaNode.getArgumentList().get(0).getScan().getColumnList() : Collections.emptyList()); RelNode tableFunctionScan = diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/TableScanConverter.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/TableScanConverter.java index 9137b94681a5..dc0568cbcbe8 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/TableScanConverter.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/TableScanConverter.java @@ -17,25 +17,26 @@ */ package org.apache.beam.sdk.extensions.sql.zetasql.translation; -import static org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkNotNull; +import static org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Preconditions.checkNotNull; import com.google.zetasql.resolvedast.ResolvedNodes.ResolvedTableScan; import java.util.List; import java.util.Properties; import org.apache.beam.sdk.extensions.sql.zetasql.TableResolution; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.config.CalciteConnectionConfigImpl; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.jdbc.CalciteSchema; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptTable; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.prepare.CalciteCatalogReader; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.prepare.RelOptTableImpl; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelRoot; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.SchemaPlus; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.Table; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.TranslatableTable; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.config.CalciteConnectionConfigImpl; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.jdbc.CalciteSchema; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptTable; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.prepare.CalciteCatalogReader; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.prepare.RelOptTableImpl; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelRoot; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.hint.RelHint; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.SchemaPlus; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Table; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.TranslatableTable; /** Converts table scan. 
*/ class TableScanConverter extends RelConverter { @@ -105,6 +106,11 @@ public RelRoot expandView( public RelOptCluster getCluster() { return TableScanConverter.this.getCluster(); } + + @Override + public List getTableHints() { + return ImmutableList.of(); + } }; } } diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/UserFunctionDefinitions.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/UserFunctionDefinitions.java index a4544e70fdad..b66ace777eed 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/UserFunctionDefinitions.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/UserFunctionDefinitions.java @@ -17,27 +17,65 @@ */ package org.apache.beam.sdk.extensions.sql.zetasql.translation; +import com.google.auto.value.AutoValue; import com.google.zetasql.resolvedast.ResolvedNode; import com.google.zetasql.resolvedast.ResolvedNodes; +import java.lang.reflect.Method; import java.util.List; +import org.apache.beam.sdk.transforms.Combine; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; /** Holds user defined function definitions. */ -public class UserFunctionDefinitions { - public final ImmutableMap, ResolvedNodes.ResolvedCreateFunctionStmt> - sqlScalarFunctions; +@AutoValue +public abstract class UserFunctionDefinitions { + public abstract ImmutableMap, ResolvedNodes.ResolvedCreateFunctionStmt> + sqlScalarFunctions(); /** * SQL native user-defined table-valued function can be resolved by Analyzer. Keeping the function * name to its ResolvedNode mapping so during Plan conversion, UDTVF implementation can replace * inputs of TVFScanConverter. */ - public final ImmutableMap, ResolvedNode> sqlTableValuedFunctions; + public abstract ImmutableMap, ResolvedNode> sqlTableValuedFunctions(); - public UserFunctionDefinitions( - ImmutableMap, ResolvedNodes.ResolvedCreateFunctionStmt> sqlScalarFunctions, - ImmutableMap, ResolvedNode> sqlTableValuedFunctions) { - this.sqlScalarFunctions = sqlScalarFunctions; - this.sqlTableValuedFunctions = sqlTableValuedFunctions; + public abstract ImmutableMap, JavaScalarFunction> javaScalarFunctions(); + + public abstract ImmutableMap, Combine.CombineFn> javaAggregateFunctions(); + + @AutoValue + public abstract static class JavaScalarFunction { + public static JavaScalarFunction create(Method method, String jarPath) { + return new AutoValue_UserFunctionDefinitions_JavaScalarFunction(method, jarPath); + } + + public abstract Method method(); + + /** The Beam filesystem path to the jar where the method was defined. 
*/ + public abstract String jarPath(); + } + + @AutoValue.Builder + public abstract static class Builder { + public abstract Builder setSqlScalarFunctions( + ImmutableMap, ResolvedNodes.ResolvedCreateFunctionStmt> sqlScalarFunctions); + + public abstract Builder setSqlTableValuedFunctions( + ImmutableMap, ResolvedNode> sqlTableValuedFunctions); + + public abstract Builder setJavaScalarFunctions( + ImmutableMap, JavaScalarFunction> javaScalarFunctions); + + public abstract Builder setJavaAggregateFunctions( + ImmutableMap, Combine.CombineFn> javaAggregateFunctions); + + public abstract UserFunctionDefinitions build(); + } + + public static Builder newBuilder() { + return new AutoValue_UserFunctionDefinitions.Builder() + .setSqlScalarFunctions(ImmutableMap.of()) + .setSqlTableValuedFunctions(ImmutableMap.of()) + .setJavaScalarFunctions(ImmutableMap.of()) + .setJavaAggregateFunctions(ImmutableMap.of()); } } diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/WithRefScanConverter.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/WithRefScanConverter.java index d1ed3cf64d05..66677323cfed 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/WithRefScanConverter.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/WithRefScanConverter.java @@ -21,7 +21,7 @@ import com.google.zetasql.resolvedast.ResolvedNodes.ResolvedWithRefScan; import java.util.Collections; import java.util.List; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; /** Converts a call-site reference to a named WITH subquery. */ @SuppressWarnings({ diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/WithScanConverter.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/WithScanConverter.java index 7159356e3b12..b88674e57237 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/WithScanConverter.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/WithScanConverter.java @@ -21,7 +21,7 @@ import com.google.zetasql.resolvedast.ResolvedNodes.ResolvedWithScan; import java.util.Collections; import java.util.List; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; /** Converts a named WITH. */ class WithScanConverter extends RelConverter { diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ZetaSqlScalarFunctionImpl.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ZetaSqlScalarFunctionImpl.java new file mode 100644 index 000000000000..9b3f8eff5a96 --- /dev/null +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ZetaSqlScalarFunctionImpl.java @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.zetasql.translation; + +import java.lang.reflect.Method; +import org.apache.beam.sdk.extensions.sql.impl.ScalarFunctionImpl; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.enumerable.CallImplementor; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Function; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.ScalarFunction; + +/** ZetaSQL-specific extension to {@link ScalarFunctionImpl}. */ +public class ZetaSqlScalarFunctionImpl extends ScalarFunctionImpl { + /** + * ZetaSQL function group identifier. Different function groups may have divergent translation + * paths. + */ + public final String functionGroup; + + private ZetaSqlScalarFunctionImpl( + Method method, CallImplementor implementor, String functionGroup, String jarPath) { + super(method, implementor, jarPath); + this.functionGroup = functionGroup; + } + + /** + * Creates {@link org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Function} from + * given class. + * + *

    If a method of the given name is not found or it does not suit, returns {@code null}. + * + * @param clazz class that is used to implement the function + * @param methodName Method name (typically "eval") + * @param functionGroup ZetaSQL function group identifier. Different function groups may have + * divergent translation paths. + * @return created {@link ScalarFunction} or null + */ + public static Function create( + Class clazz, String methodName, String functionGroup, String jarPath) { + return create(findMethod(clazz, methodName), functionGroup, jarPath); + } + + /** + * Creates {@link org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.Function} from + * given method. When {@code eval} method does not suit, {@code null} is returned. + * + * @param method method that is used to implement the function + * @param functionGroup ZetaSQL function group identifier. Different function groups may have + * divergent translation paths. + * @return created {@link Function} or null + */ + public static Function create(Method method, String functionGroup, String jarPath) { + validateMethod(method); + CallImplementor implementor = createImplementor(method); + return new ZetaSqlScalarFunctionImpl(method, implementor, functionGroup, jarPath); + } + + /* + * Finds a method in a given class by name. + * @param clazz class to search method in + * @param name name of the method to find + * @return the first method with matching name or null when no method found + */ + private static Method findMethod(Class clazz, String name) { + for (Method method : clazz.getMethods()) { + if (method.getName().equals(name) && !method.isBridge()) { + return method; + } + } + throw new NoSuchMethodError( + String.format("Method %s not found in class %s.", name, clazz.getName())); + } +} diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/impl/BeamBuiltinMethods.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/impl/BeamBuiltinMethods.java index 223469b9bd4e..cbb2e2c54a90 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/impl/BeamBuiltinMethods.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/impl/BeamBuiltinMethods.java @@ -19,7 +19,7 @@ import java.lang.reflect.Method; import org.apache.beam.sdk.annotations.Internal; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.tree.Types; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.tree.Types; /** BeamBuiltinMethods. 
*/ @Internal diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/impl/CastFunctionImpl.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/impl/CastFunctionImpl.java index c15bdab5779f..c68b7716cbb6 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/impl/CastFunctionImpl.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/impl/CastFunctionImpl.java @@ -17,22 +17,22 @@ */ package org.apache.beam.sdk.extensions.sql.zetasql.translation.impl; -import static org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.enumerable.RexImpTable.createImplementor; +import static org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.enumerable.RexImpTable.createImplementor; import java.util.Collections; import java.util.List; import org.apache.beam.sdk.annotations.Internal; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.enumerable.CallImplementor; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.enumerable.NotNullImplementor; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.enumerable.NullPolicy; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.enumerable.RexImpTable; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.adapter.enumerable.RexToLixTranslator; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.tree.Expression; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.tree.Expressions; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.FunctionParameter; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.ImplementableFunction; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeName; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.enumerable.CallImplementor; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.enumerable.NotNullImplementor; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.enumerable.NullPolicy; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.enumerable.RexImpTable; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.adapter.enumerable.RexToLixTranslator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.tree.Expression; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.tree.Expressions; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.FunctionParameter; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.ImplementableFunction; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlTypeName; /** ZetaSQLCastFunctionImpl. 
*/ @Internal diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/impl/StringFunctions.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/impl/StringFunctions.java index 213b5770f91d..451d8a684376 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/impl/StringFunctions.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/impl/StringFunctions.java @@ -19,8 +19,8 @@ import java.util.regex.Pattern; import org.apache.beam.sdk.annotations.Internal; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.function.Strict; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.runtime.SqlFunctions; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.function.Strict; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.runtime.SqlFunctions; /** StringFunctions. */ @Internal diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/impl/TimestampFunctions.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/impl/TimestampFunctions.java index 721e22662379..1718976d43fe 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/impl/TimestampFunctions.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/impl/TimestampFunctions.java @@ -20,7 +20,7 @@ import java.util.TimeZone; import org.apache.beam.sdk.annotations.Internal; import org.apache.beam.sdk.extensions.sql.zetasql.DateTimeUtils; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.linq4j.function.Strict; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.linq4j.function.Strict; import org.joda.time.DateTime; import org.joda.time.DateTimeZone; diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/unnest/BeamZetaSqlUncollectRel.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/unnest/BeamZetaSqlUncollectRel.java index fce4c3d86575..f252e1acc34c 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/unnest/BeamZetaSqlUncollectRel.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/unnest/BeamZetaSqlUncollectRel.java @@ -17,7 +17,7 @@ */ package org.apache.beam.sdk.extensions.sql.zetasql.unnest; -import static org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.calcite.v1_26_0.com.google.common.base.Preconditions.checkArgument; import org.apache.beam.sdk.extensions.sql.impl.planner.BeamCostModel; import org.apache.beam.sdk.extensions.sql.impl.planner.NodeStats; @@ -31,11 +31,11 @@ import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionList; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataQuery; /** * {@link BeamRelNode} to implement an uncorrelated {@link ZetaSqlUnnest}, aka UNNEST. diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/unnest/BeamZetaSqlUncollectRule.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/unnest/BeamZetaSqlUncollectRule.java index 76449521168b..e985f74ebfa6 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/unnest/BeamZetaSqlUncollectRule.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/unnest/BeamZetaSqlUncollectRule.java @@ -18,9 +18,9 @@ package org.apache.beam.sdk.extensions.sql.zetasql.unnest; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamLogicalConvention; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.Convention; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.convert.ConverterRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.Convention; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.convert.ConverterRule; /** * A {@code ConverterRule} to replace {@link ZetaSqlUnnest} with {@link BeamZetaSqlUncollectRel}. 
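// ------------------------------------------------------------------------------------------
// Editor's illustration, not part of the patch: the two UNNEST query shapes the relational
// nodes above are meant to cover. A hedged sketch only; table and column names are made up.
// An uncorrelated UNNEST of a literal array is planned as ZetaSqlUnnest and converted by
// BeamZetaSqlUncollectRule into BeamZetaSqlUncollectRel:
String uncorrelatedUnnest = "SELECT v FROM UNNEST([1, 2, 3]) AS v";
// UNNEST of an array-typed column produces a LogicalCorrelate over the uncollect, which is
// handled by BeamZetaSqlUnnestRule / BeamZetaSqlUnnestRel in the hunks that follow:
String correlatedUnnest = "SELECT v FROM my_table, UNNEST(my_table.arr_col) AS v";
// ------------------------------------------------------------------------------------------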
diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/unnest/BeamZetaSqlUnnestRel.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/unnest/BeamZetaSqlUnnestRel.java index 4fc60883c6d4..ae44c447a019 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/unnest/BeamZetaSqlUnnestRel.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/unnest/BeamZetaSqlUnnestRel.java @@ -31,17 +31,17 @@ import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionList; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelWriter; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Correlate; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.JoinRelType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.validate.SqlValidatorUtil; +import org.apache.beam.vendor.calcite.v1_26_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelWriter; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Correlate; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.JoinRelType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.validate.SqlValidatorUtil; import org.checkerframework.checker.nullness.qual.Nullable; /** diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/unnest/BeamZetaSqlUnnestRule.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/unnest/BeamZetaSqlUnnestRule.java index a4e80d9c436a..1a7d80ebcb81 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/unnest/BeamZetaSqlUnnestRule.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/unnest/BeamZetaSqlUnnestRule.java @@ -18,17 +18,17 @@ package org.apache.beam.sdk.extensions.sql.zetasql.unnest; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamLogicalConvention; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule; -import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleCall; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.volcano.RelSubset; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.SingleRel; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Correlate; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.JoinRelType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalCorrelate; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalProject; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexFieldAccess; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRule; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptRuleCall; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.volcano.RelSubset; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.SingleRel; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.Correlate; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.core.JoinRelType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalCorrelate; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.logical.LogicalProject; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexFieldAccess; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rex.RexNode; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; /** @@ -58,10 +58,6 @@ public void onMatch(RelOptRuleCall call) { RelNode outer = call.rel(1); RelNode uncollect = call.rel(2); - if (correlate.getCorrelationId().getId() != 0) { - // Only one level of correlation nesting is supported - return; - } if (correlate.getRequiredColumns().cardinality() != 1) { // can only unnest a single column return; @@ -113,7 +109,7 @@ public void onMatch(RelOptRuleCall call) { new BeamZetaSqlUnnestRel( correlate.getCluster(), correlate.getTraitSet().replace(BeamLogicalConvention.INSTANCE), - outer, + convert(outer, outer.getTraitSet().replace(BeamLogicalConvention.INSTANCE)), call.rel(2).getRowType(), fieldAccessIndices.build())); } diff --git a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/unnest/ZetaSqlUnnest.java b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/unnest/ZetaSqlUnnest.java index e2d6fad5b18b..871a9f8c1dd1 100644 --- a/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/unnest/ZetaSqlUnnest.java +++ b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/unnest/ZetaSqlUnnest.java @@ -18,20 +18,20 @@ package org.apache.beam.sdk.extensions.sql.zetasql.unnest; import java.util.List; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.Convention; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelInput; -import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelWriter; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.SingleRel; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeField; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlUnnestOperator; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlUtil; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.MapSqlType; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeName; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.Convention; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelInput; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.RelWriter; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.SingleRel; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.rel.type.RelDataTypeField; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlUnnestOperator; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlUtil; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.MapSqlType; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.type.SqlTypeName; /** * This class is a copy of Uncollect.java in Calcite: diff --git a/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamJavaUdfCalcRuleTest.java b/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamJavaUdfCalcRuleTest.java new file mode 100644 index 000000000000..21436b5d436e --- /dev/null +++ b/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamJavaUdfCalcRuleTest.java @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
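// ------------------------------------------------------------------------------------------
// Editor's note, not part of the patch: on the BeamZetaSqlUnnestRule.onMatch change above,
// replacing the raw `outer` input with
//
//   convert(outer, outer.getTraitSet().replace(BeamLogicalConvention.INSTANCE))
//
// follows the usual Calcite planner-rule pattern: RelOptRule.convert(...) asks the planner to
// bring the child into the target convention as well, rather than assuming it already is.
// ------------------------------------------------------------------------------------------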
+ */ +package org.apache.beam.sdk.extensions.sql.zetasql; + +import static org.hamcrest.Matchers.isA; + +import org.apache.beam.sdk.extensions.sql.impl.SqlConversionException; +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamRelNode; +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.Frameworks; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RuleSet; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.joda.time.Duration; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.ExpectedException; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Tests for {@link BeamJavaUdfCalcRule}. */ +@RunWith(JUnit4.class) +public class BeamJavaUdfCalcRuleTest extends ZetaSqlTestBase { + @Rule public transient TestPipeline pipeline = TestPipeline.create(); + @Rule public ExpectedException thrown = ExpectedException.none(); + + @Before + public void setUp() { + initialize(); + + this.config = + Frameworks.newConfigBuilder(config) + .ruleSets( + ZetaSQLQueryPlanner.getZetaSqlRuleSets( + ImmutableList.of(BeamJavaUdfCalcRule.INSTANCE)) + .toArray(new RuleSet[0])) + .build(); + } + + @Test + public void testSelectLiteral() { + String sql = "SELECT 1;"; + ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); + BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); + PCollection stream = BeamSqlRelUtils.toPCollection(pipeline, beamRelNode); + + Schema singleField = Schema.builder().addInt64Field("field1").build(); + + PAssert.that(stream).containsInAnyOrder(Row.withSchema(singleField).addValues(1L).build()); + pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); + } + + @Test + public void testBuiltinFunctionThrowsSqlConversionException() { + String sql = "SELECT abs(1);"; + ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); + + thrown.expect(SqlConversionException.class); + thrown.expectCause(isA(RelOptPlanner.CannotPlanException.class)); + + BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); + } +} diff --git a/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCatalogTest.java b/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCatalogTest.java new file mode 100644 index 000000000000..6789848ba48b --- /dev/null +++ b/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCatalogTest.java @@ -0,0 +1,167 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.zetasql; + +import static org.apache.beam.sdk.extensions.sql.zetasql.BeamZetaSqlCatalog.USER_DEFINED_JAVA_SCALAR_FUNCTIONS; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertNotNull; + +import com.google.zetasql.Analyzer; +import com.google.zetasql.AnalyzerOptions; +import com.google.zetasql.resolvedast.ResolvedNodes; +import java.lang.reflect.Method; +import java.sql.Time; +import java.util.List; +import org.apache.beam.sdk.extensions.sql.BeamSqlUdf; +import org.apache.beam.sdk.extensions.sql.impl.JdbcConnection; +import org.apache.beam.sdk.extensions.sql.impl.JdbcDriver; +import org.apache.beam.sdk.extensions.sql.impl.ScalarFunctionImpl; +import org.apache.beam.sdk.extensions.sql.meta.provider.ReadOnlyTableProvider; +import org.apache.beam.sdk.extensions.sql.zetasql.translation.UserFunctionDefinitions; +import org.apache.beam.sdk.options.PipelineOptionsFactory; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.SchemaPlus; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.ExpectedException; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Tests for {@link BeamZetaSqlCatalog}. */ +@RunWith(JUnit4.class) +public class BeamZetaSqlCatalogTest { + @Rule public ExpectedException thrown = ExpectedException.none(); + + public static class IncrementFn implements BeamSqlUdf { + public Long eval(Long i) { + return i + 1; + } + } + + public static class ReturnsArrayTimeFn implements BeamSqlUdf { + public List

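The ZetaSQL Java UDF tests below create functions with `CREATE FUNCTION ... LANGUAGE java OPTIONS (path='<jar>')` and load their implementations from a jar pointed at by system properties. As orientation only, the following minimal sketch shows the general shape of the provider class such a test jar might contain; it assumes Beam's UdfProvider/ScalarFn API from org.apache.beam.sdk.extensions.sql.udf and @AutoService registration, and the class and package names here are hypothetical, not part of this patch.

    // Hypothetical example of a UDF provider bundled into the test jar.
    package org.example.udf;

    import com.google.auto.service.AutoService;
    import java.util.Collections;
    import java.util.Map;
    import org.apache.beam.sdk.extensions.sql.udf.ScalarFn;
    import org.apache.beam.sdk.extensions.sql.udf.UdfProvider;

    @AutoService(UdfProvider.class)
    public class ExampleUdfProvider implements UdfProvider {

      /** Usable from SQL as: CREATE FUNCTION increment(i INT64) RETURNS INT64 LANGUAGE java OPTIONS (path='...'). */
      public static class IncrementFn extends ScalarFn {
        @ApplyMethod
        public Long increment(Long i) {
          return i + 1;
        }
      }

      @Override
      public Map<String, ScalarFn> userDefinedScalarFunctions() {
        // Maps the SQL-visible function name to its implementation.
        return Collections.singletonMap("increment", new IncrementFn());
      }
    }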
    System properties beam.sql.udf.test.jarpath and + * beam.sql.udf.test.empty_jar_path must be set. + */ +@RunWith(JUnit4.class) +public class ZetaSqlJavaUdfTest extends ZetaSqlTestBase { + @Rule public transient TestPipeline pipeline = TestPipeline.create(); + @Rule public ExpectedException thrown = ExpectedException.none(); + + private final String jarPathProperty = "beam.sql.udf.test.jar_path"; + private final String emptyJarPathProperty = "beam.sql.udf.test.empty_jar_path"; + + private final @Nullable String jarPath = System.getProperty(jarPathProperty); + private final @Nullable String emptyJarPath = System.getProperty(emptyJarPathProperty); + + @Before + public void setUp() { + if (jarPath == null) { + fail( + String.format( + "System property %s must be set to run %s.", + jarPathProperty, ZetaSqlJavaUdfTest.class.getSimpleName())); + } + if (emptyJarPath == null) { + fail( + String.format( + "System property %s must be set to run %s.", + emptyJarPathProperty, ZetaSqlJavaUdfTest.class.getSimpleName())); + } + initialize(); + } + + @Test + public void testNullaryJavaUdf() { + String sql = + String.format( + "CREATE FUNCTION helloWorld() RETURNS STRING LANGUAGE java " + + "OPTIONS (path='%s'); " + + "SELECT helloWorld();", + jarPath); + ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); + BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); + PCollection stream = BeamSqlRelUtils.toPCollection(pipeline, beamRelNode); + + Schema singleField = Schema.builder().addStringField("field1").build(); + + PAssert.that(stream) + .containsInAnyOrder(Row.withSchema(singleField).addValues("Hello world!").build()); + pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); + } + + @Test + public void testUnaryJavaUdf() { + String sql = + String.format( + "CREATE FUNCTION increment(i INT64) RETURNS INT64 LANGUAGE java " + + "OPTIONS (path='%s'); " + + "SELECT increment(1);", + jarPath); + ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); + BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); + PCollection stream = BeamSqlRelUtils.toPCollection(pipeline, beamRelNode); + + Schema singleField = Schema.builder().addInt64Field("field1").build(); + + PAssert.that(stream).containsInAnyOrder(Row.withSchema(singleField).addValues(2L).build()); + pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); + } + + @Test + public void testJavaUdfColumnReference() { + String sql = + String.format( + "CREATE FUNCTION increment(i INT64) RETURNS INT64 LANGUAGE java " + + "OPTIONS (path='%s'); " + + "SELECT increment(int64_col) FROM table_all_types;", + jarPath); + ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); + BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); + PCollection stream = BeamSqlRelUtils.toPCollection(pipeline, beamRelNode); + + Schema singleField = Schema.builder().addInt64Field("field1").build(); + + PAssert.that(stream) + .containsInAnyOrder( + Row.withSchema(singleField).addValues(0L).build(), + Row.withSchema(singleField).addValues(-1L).build(), + Row.withSchema(singleField).addValues(-2L).build(), + Row.withSchema(singleField).addValues(-3L).build(), + Row.withSchema(singleField).addValues(-4L).build()); + pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); + } + + @Test + public void testNestedJavaUdf() { + String sql = + String.format( + "CREATE FUNCTION 
increment(i INT64) RETURNS INT64 LANGUAGE java " + + "OPTIONS (path='%s'); " + + "SELECT increment(increment(1));", + jarPath); + ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); + BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); + PCollection stream = BeamSqlRelUtils.toPCollection(pipeline, beamRelNode); + + Schema singleField = Schema.builder().addInt64Field("field1").build(); + + PAssert.that(stream).containsInAnyOrder(Row.withSchema(singleField).addValues(3L).build()); + pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); + } + + @Test + public void testUnexpectedNullArgumentThrowsRuntimeException() { + String sql = + String.format( + "CREATE FUNCTION increment(i INT64) RETURNS INT64 LANGUAGE java " + + "OPTIONS (path='%s'); " + + "SELECT increment(NULL);", + jarPath); + ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); + BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); + BeamSqlRelUtils.toPCollection(pipeline, beamRelNode); + thrown.expect(Pipeline.PipelineExecutionException.class); + thrown.expectMessage("CalcFn failed to evaluate"); + thrown.expectCause( + allOf(isA(RuntimeException.class), hasProperty("cause", isA(NullPointerException.class)))); + pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); + } + + @Test + public void testExpectedNullArgument() { + String sql = + String.format( + "CREATE FUNCTION isNull(s STRING) RETURNS BOOLEAN LANGUAGE java " + + "OPTIONS (path='%s'); " + + "SELECT isNull(NULL);", + jarPath); + ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); + BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); + PCollection stream = BeamSqlRelUtils.toPCollection(pipeline, beamRelNode); + + Schema singleField = Schema.builder().addBooleanField("field1").build(); + + PAssert.that(stream).containsInAnyOrder(Row.withSchema(singleField).addValues(true).build()); + pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); + } + + public static class IncrementFn implements BeamSqlUdf { + public Long eval(Long i) { + return i + 1; + } + } + + @Test + public void testSqlTransformRegisterUdf() { + String sql = "SELECT increment(0);"; + PCollection stream = + pipeline.apply( + SqlTransform.query(sql) + .withQueryPlannerClass(ZetaSQLQueryPlanner.class) + .registerUdf("increment", IncrementFn.class)); + final Schema schema = Schema.builder().addInt64Field("field1").build(); + PAssert.that(stream).containsInAnyOrder(Row.withSchema(schema).addValues(1L).build()); + pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); + } + + /** This tests a subset of the code path used by {@link #testSqlTransformRegisterUdf()}. */ + @Test + public void testUdfFromCatalog() throws NoSuchMethodException { + // Add IncrementFn to Calcite schema. 
+ JdbcConnection jdbcConnection = + JdbcDriver.connect( + new ReadOnlyTableProvider("empty_table_provider", ImmutableMap.of()), + PipelineOptionsFactory.create()); + Method method = IncrementFn.class.getMethod("eval", Long.class); + jdbcConnection.getCurrentSchemaPlus().add("increment", ScalarFunctionImpl.create(method)); + this.config = + Frameworks.newConfigBuilder(config) + .defaultSchema(jdbcConnection.getCurrentSchemaPlus()) + .build(); + + String sql = "SELECT increment(0);"; + ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); + BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); + PCollection stream = BeamSqlRelUtils.toPCollection(pipeline, beamRelNode); + + final Schema schema = Schema.builder().addInt64Field("field1").build(); + PAssert.that(stream).containsInAnyOrder(Row.withSchema(schema).addValues(1L).build()); + pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); + } + + @Test + public void testNullArgumentIsTypeChecked() { + // The Java definition for isNull takes a String, but here we declare it in SQL with INT64. + String sql = + String.format( + "CREATE FUNCTION isNull(i INT64) RETURNS INT64 LANGUAGE java " + + "OPTIONS (path='%s'); " + + "SELECT isNull(NULL);", + jarPath); + ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); + BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); + // TODO(BEAM-11171) This should fail earlier, before compiling the CalcFn. + thrown.expect(UnsupportedOperationException.class); + thrown.expectMessage("Could not compile CalcFn"); + thrown.expectCause( + allOf( + isA(CompileException.class), + hasProperty( + "message", + containsString( + "No applicable constructor/method found for actual parameters \"java.lang.Long\"")))); + BeamSqlRelUtils.toPCollection(pipeline, beamRelNode); + } + + @Test + public void testFunctionSignatureTypeMismatchFailsPipelineConstruction() { + // The Java definition for isNull takes a String, but here we pass it a Long. + String sql = + String.format( + "CREATE FUNCTION isNull(i INT64) RETURNS BOOLEAN LANGUAGE java " + + "OPTIONS (path='%s'); " + + "SELECT isNull(0);", + jarPath); + ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); + BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); + // TODO(BEAM-11171) This should fail earlier, before compiling the CalcFn. 
+ thrown.expect(UnsupportedOperationException.class); + thrown.expectMessage("Could not compile CalcFn"); + thrown.expectCause( + allOf( + isA(CompileException.class), + hasProperty( + "message", + containsString( + "No applicable constructor/method found for actual parameters \"long\"")))); + BeamSqlRelUtils.toPCollection(pipeline, beamRelNode); + } + + @Test + public void testJavaUdfWithNoReturnTypeIsRejected() { + String sql = + String.format( + "CREATE FUNCTION helloWorld() LANGUAGE java " + + "OPTIONS (path='%s'); " + + "SELECT helloWorld();", + jarPath); + ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); + thrown.expect(SqlException.class); + thrown.expectMessage("Non-SQL functions must specify a return type"); + BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); + } + + @Test + public void testProjectUdfAndBuiltin() { + String sql = + String.format( + "CREATE FUNCTION matches(str STRING, regStr STRING) RETURNS BOOLEAN LANGUAGE java OPTIONS (path='%s'); " + + "SELECT matches(\"a\", \"a\"), 'apple'='beta'", + jarPath); + ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); + BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); + PCollection stream = BeamSqlRelUtils.toPCollection(pipeline, beamRelNode); + + Schema schema = Schema.builder().addBooleanField("field1").addBooleanField("field2").build(); + + PAssert.that(stream).containsInAnyOrder(Row.withSchema(schema).addValues(true, false).build()); + pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); + } + + @Test + public void testProjectNestedUdfAndBuiltin() { + String sql = + String.format( + "CREATE FUNCTION increment(i INT64) RETURNS INT64 LANGUAGE java OPTIONS (path='%s'); " + + "SELECT increment(increment(0) + 1);", + jarPath); + ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); + BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); + PCollection stream = BeamSqlRelUtils.toPCollection(pipeline, beamRelNode); + + Schema schema = Schema.builder().addInt64Field("field1").build(); + + PAssert.that(stream).containsInAnyOrder(Row.withSchema(schema).addValues(3L).build()); + pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); + } + + @Test + public void testJavaUdfEmptyPath() { + String sql = + "CREATE FUNCTION foo() RETURNS STRING LANGUAGE java OPTIONS (path=''); SELECT foo();"; + ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); + thrown.expect(RuntimeException.class); + thrown.expectMessage("Failed to define function 'foo'"); + thrown.expectCause( + allOf( + isA(IllegalArgumentException.class), + hasProperty("message", containsString("No jar was provided to define function foo.")))); + zetaSQLQueryPlanner.convertToBeamRel(sql); + } + + @Test + public void testJavaUdfNoJarProvided() { + String sql = "CREATE FUNCTION foo() RETURNS STRING LANGUAGE java; SELECT foo();"; + ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); + thrown.expect(RuntimeException.class); + thrown.expectMessage("Failed to define function 'foo'"); + thrown.expectCause( + allOf( + isA(IllegalArgumentException.class), + hasProperty("message", containsString("No jar was provided to define function foo.")))); + zetaSQLQueryPlanner.convertToBeamRel(sql); + } + + @Test + public void testPathOptionNotString() { + String sql = + "CREATE FUNCTION foo() RETURNS STRING LANGUAGE java OPTIONS (path=23); SELECT foo();"; + 
ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); + thrown.expect(RuntimeException.class); + thrown.expectMessage("Failed to define function 'foo'"); + thrown.expectCause( + allOf( + isA(IllegalArgumentException.class), + hasProperty( + "message", + containsString("Option 'path' has type TYPE_INT64 (expected TYPE_STRING).")))); + zetaSQLQueryPlanner.convertToBeamRel(sql); + } + + @Test + public void testUdaf() { + String sql = + String.format( + "CREATE AGGREGATE FUNCTION my_sum(f INT64) RETURNS INT64 LANGUAGE java OPTIONS (path='%s'); " + + "SELECT my_sum(f_int_1) from aggregate_test_table", + jarPath); + ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); + BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); + PCollection stream = BeamSqlRelUtils.toPCollection(pipeline, beamRelNode); + + Schema singleField = Schema.builder().addInt64Field("field1").build(); + + PAssert.that(stream).containsInAnyOrder(Row.withSchema(singleField).addValues(28L).build()); + pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); + } + + @Test + public void testUdafNotFoundFailsToParse() { + String sql = + String.format( + "CREATE AGGREGATE FUNCTION nonexistent(f INT64) RETURNS INT64 LANGUAGE java OPTIONS (path='%s'); " + + "SELECT nonexistent(f_int_1) from aggregate_test_table", + jarPath); + ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); + + thrown.expect(RuntimeException.class); + thrown.expectMessage("Failed to define function 'nonexistent'"); + thrown.expectCause( + allOf( + isA(IllegalArgumentException.class), + hasProperty( + "message", + containsString("No implementation of aggregate function nonexistent found")))); + + BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); + } + + @Test + public void testRegisterUdaf() { + String sql = "SELECT my_sum(k) FROM UNNEST([1, 2, 3]) k;"; + PCollection stream = + pipeline.apply( + SqlTransform.query(sql) + .withQueryPlannerClass(ZetaSQLQueryPlanner.class) + .registerUdaf("my_sum", Sum.ofLongs())); + Schema singleField = Schema.builder().addInt64Field("field1").build(); + PAssert.that(stream).containsInAnyOrder(Row.withSchema(singleField).addValues(6L).build()); + pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); + } + + @Test + public void testDateUdf() { + String sql = + String.format( + "CREATE FUNCTION dateIncrementAll(d DATE) RETURNS DATE LANGUAGE java " + + "OPTIONS (path='%s'); " + + "SELECT dateIncrementAll('2020-04-04');", + jarPath); + ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); + BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); + PCollection stream = BeamSqlRelUtils.toPCollection(pipeline, beamRelNode); + Schema singleField = Schema.builder().addLogicalTypeField("field1", SqlTypes.DATE).build(); + PAssert.that(stream) + .containsInAnyOrder( + Row.withSchema(singleField).addValues(LocalDate.of(2021, 5, 5)).build()); + pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); + } +} diff --git a/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlJavaUdfTypeTest.java b/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlJavaUdfTypeTest.java new file mode 100644 index 000000000000..fa0d40a7ba00 --- /dev/null +++ 
b/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlJavaUdfTypeTest.java @@ -0,0 +1,586 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.zetasql; + +import java.math.BigDecimal; +import java.sql.Date; +import java.sql.Timestamp; +import java.time.LocalDate; +import java.util.List; +import org.apache.beam.sdk.extensions.sql.BeamSqlUdf; +import org.apache.beam.sdk.extensions.sql.impl.JdbcConnection; +import org.apache.beam.sdk.extensions.sql.impl.JdbcDriver; +import org.apache.beam.sdk.extensions.sql.impl.ScalarFunctionImpl; +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamRelNode; +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils; +import org.apache.beam.sdk.extensions.sql.meta.provider.ReadOnlyTableProvider; +import org.apache.beam.sdk.extensions.sql.meta.provider.test.TestBoundedTable; +import org.apache.beam.sdk.options.PipelineOptionsFactory; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.logicaltypes.SqlTypes; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.schema.SchemaPlus; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.Frameworks; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.joda.time.DateTime; +import org.joda.time.DateTimeZone; +import org.joda.time.Duration; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.ExpectedException; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Tests verifying that various data types can be passed through Java UDFs without data loss. 
*/ +@RunWith(JUnit4.class) +public class ZetaSqlJavaUdfTypeTest extends ZetaSqlTestBase { + @Rule public transient TestPipeline pipeline = TestPipeline.create(); + @Rule public ExpectedException thrown = ExpectedException.none(); + + private static final TestBoundedTable table = + TestBoundedTable.of( + Schema.builder() + .addBooleanField("boolean_true") + .addBooleanField("boolean_false") + .addInt64Field("int64_0") + .addInt64Field("int64_pos") + .addInt64Field("int64_neg") + .addInt64Field("int64_max") + .addInt64Field("int64_min") + .addStringField("string_empty") + .addStringField("string_ascii") + .addStringField("string_unicode") + .addByteArrayField("bytes_empty") + .addByteArrayField("bytes_ascii") + .addByteArrayField("bytes_unicode") + .addDoubleField("float64_0") + .addDoubleField("float64_noninteger") + .addDoubleField("float64_pos") + .addDoubleField("float64_neg") + .addDoubleField("float64_max") + .addDoubleField("float64_min_pos") + .addDoubleField("float64_inf") + .addDoubleField("float64_neg_inf") + .addDoubleField("float64_nan") + .addLogicalTypeField("f_date", SqlTypes.DATE) + .addDateTimeField("f_timestamp") + .addArrayField("array_int64", Schema.FieldType.INT64) + .addDecimalField("numeric_one") + .addDecimalField("numeric_max") + .addDecimalField("numeric_min") + .build()) + .addRows( + true /* boolean_true */, + false /* boolean_false */, + 0L /* int64_0 */, + 123L /* int64_pos */, + -123L /* int64_neg */, + 9223372036854775807L /* int64_max */, + -9223372036854775808L /* int64_min */, + "" /* string_empty */, + "abc" /* string_ascii */, + "スタリング" /* string_unicode */, + new byte[] {} /* bytes_empty */, + new byte[] {'a', 'b', 'c'} /* bytes_ascii */, + new byte[] {-29, -126, -71} /* bytes_unicode */, + 0.0 /* float64_0 */, + 0.123 /* float64_noninteger */, + 123.0 /* float64_pos */, + -123.0 /* float64_neg */, + 1.7976931348623157e+308 /* float64_max */, + 2.2250738585072014e-308 /* float64_min_pos */, + Double.POSITIVE_INFINITY /* float64_inf */, + Double.NEGATIVE_INFINITY /* float64_neg_inf */, + Double.NaN /* float64_nan */, + LocalDate.of(2021, 4, 26) /* f_date */, + new DateTime(2021, 5, 6, 3, 48, 32, DateTimeZone.UTC) /* f_timestamp */, + ImmutableList.of(1L, 2L, 3L) /* array_int64 */, + new BigDecimal("1.000000000" /* numeric_one */), + new BigDecimal("99999999999999999999999999999.999999999" /* numeric_max */), + new BigDecimal("-99999999999999999999999999999.999999999" /* numeric_min */)); + + @Before + public void setUp() throws NoSuchMethodException { + initialize(); + + // Register test table. + JdbcConnection jdbcConnection = + JdbcDriver.connect( + new ReadOnlyTableProvider("table_provider", ImmutableMap.of("table", table)), + PipelineOptionsFactory.create()); + + // Register UDFs. 
+ SchemaPlus schema = jdbcConnection.getCurrentSchemaPlus(); + schema.add( + "test_boolean", + ScalarFunctionImpl.create(BooleanIdentityFn.class.getMethod("eval", Boolean.class))); + schema.add( + "test_int64", + ScalarFunctionImpl.create(Int64IdentityFn.class.getMethod("eval", Long.class))); + schema.add( + "test_string", + ScalarFunctionImpl.create(StringIdentityFn.class.getMethod("eval", String.class))); + schema.add( + "test_bytes", + ScalarFunctionImpl.create(BytesIdentityFn.class.getMethod("eval", byte[].class))); + schema.add( + "test_float64", + ScalarFunctionImpl.create(DoubleIdentityFn.class.getMethod("eval", Double.class))); + schema.add( + "test_date", ScalarFunctionImpl.create(DateIdentityFn.class.getMethod("eval", Date.class))); + schema.add( + "test_timestamp", + ScalarFunctionImpl.create(TimestampIdentityFn.class.getMethod("eval", Timestamp.class))); + schema.add( + "test_array", + ScalarFunctionImpl.create(ListIdentityFn.class.getMethod("eval", List.class))); + schema.add( + "test_numeric", + ScalarFunctionImpl.create(BigDecimalIdentityFn.class.getMethod("eval", BigDecimal.class))); + + this.config = Frameworks.newConfigBuilder(config).defaultSchema(schema).build(); + } + + public static class BooleanIdentityFn implements BeamSqlUdf { + public Boolean eval(Boolean input) { + return input; + } + } + + public static class Int64IdentityFn implements BeamSqlUdf { + public Long eval(Long input) { + return input; + } + } + + public static class StringIdentityFn implements BeamSqlUdf { + public String eval(String input) { + return input; + } + } + + public static class BytesIdentityFn implements BeamSqlUdf { + public byte[] eval(byte[] input) { + return input; + } + } + + public static class DoubleIdentityFn implements BeamSqlUdf { + public Double eval(Double input) { + return input; + } + } + + public static class DateIdentityFn implements BeamSqlUdf { + public Date eval(Date input) { + return input; + } + } + + public static class TimestampIdentityFn implements BeamSqlUdf { + public Timestamp eval(Timestamp input) { + return input; + } + } + + public static class ListIdentityFn implements BeamSqlUdf { + public List eval(List input) { + return input; + } + } + + public static class BigDecimalIdentityFn implements BeamSqlUdf { + public BigDecimal eval(BigDecimal input) { + return input; + } + } + + private void runUdfTypeTest(String query, Object result, Schema.TypeName typeName) { + runUdfTypeTest(query, result, Schema.FieldType.of(typeName)); + } + + private void runUdfTypeTest(String query, Object result, Schema.LogicalType logicalType) { + runUdfTypeTest(query, result, Schema.FieldType.logicalType(logicalType)); + } + + private void runUdfTypeTest(String query, Object result, Schema.FieldType fieldType) { + ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); + BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(query); + PCollection stream = BeamSqlRelUtils.toPCollection(pipeline, beamRelNode); + + Schema outputSchema = Schema.builder().addField("res", fieldType).build(); + PAssert.that(stream).containsInAnyOrder(Row.withSchema(outputSchema).addValues(result).build()); + pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); + } + + @Test + public void testTrueLiteral() { + runUdfTypeTest("SELECT test_boolean(true);", true, Schema.TypeName.BOOLEAN); + } + + @Test + public void testTrueInput() { + runUdfTypeTest("SELECT test_boolean(boolean_true) FROM table;", true, Schema.TypeName.BOOLEAN); + } + + 
@Test + public void testFalseLiteral() { + runUdfTypeTest("SELECT test_boolean(false);", false, Schema.TypeName.BOOLEAN); + } + + @Test + public void testFalseInput() { + runUdfTypeTest( + "SELECT test_boolean(boolean_false) FROM table;", false, Schema.TypeName.BOOLEAN); + } + + @Test + public void testZeroInt64Literal() { + runUdfTypeTest("SELECT test_int64(0);", 0L, Schema.TypeName.INT64); + } + + @Test + public void testZeroInt64Input() { + runUdfTypeTest("SELECT test_int64(int64_0) FROM table;", 0L, Schema.TypeName.INT64); + } + + @Test + public void testPosInt64Literal() { + runUdfTypeTest("SELECT test_int64(123);", 123L, Schema.TypeName.INT64); + } + + @Test + public void testPosInt64Input() { + runUdfTypeTest("SELECT test_int64(int64_pos) FROM table;", 123L, Schema.TypeName.INT64); + } + + @Test + public void testNegInt64Literal() { + runUdfTypeTest("SELECT test_int64(-123);", -123L, Schema.TypeName.INT64); + } + + @Test + public void testNegInt64Input() { + runUdfTypeTest("SELECT test_int64(int64_neg) FROM table;", -123L, Schema.TypeName.INT64); + } + + @Test + public void testMaxInt64Literal() { + runUdfTypeTest( + "SELECT test_int64(9223372036854775807);", 9223372036854775807L, Schema.TypeName.INT64); + } + + @Test + public void testMaxInt64Input() { + runUdfTypeTest( + "SELECT test_int64(int64_max) FROM table;", 9223372036854775807L, Schema.TypeName.INT64); + } + + @Test + public void testMinInt64Literal() { + runUdfTypeTest( + "SELECT test_int64(-9223372036854775808);", -9223372036854775808L, Schema.TypeName.INT64); + } + + @Test + public void testMinInt64Input() { + runUdfTypeTest( + "SELECT test_int64(int64_min) FROM table;", -9223372036854775808L, Schema.TypeName.INT64); + } + + @Test + public void testEmptyStringLiteral() { + runUdfTypeTest("SELECT test_string('');", "", Schema.TypeName.STRING); + } + + @Test + public void testEmptyStringInput() { + runUdfTypeTest("SELECT test_string(string_empty) FROM table;", "", Schema.TypeName.STRING); + } + + @Test + public void testAsciiStringLiteral() { + runUdfTypeTest("SELECT test_string('abc');", "abc", Schema.TypeName.STRING); + } + + @Test + public void testAsciiStringInput() { + runUdfTypeTest("SELECT test_string(string_ascii) FROM table;", "abc", Schema.TypeName.STRING); + } + + @Test + public void testUnicodeStringLiteral() { + runUdfTypeTest("SELECT test_string('スタリング');", "スタリング", Schema.TypeName.STRING); + } + + @Test + public void testUnicodeStringInput() { + runUdfTypeTest( + "SELECT test_string(string_unicode) FROM table;", "スタリング", Schema.TypeName.STRING); + } + + @Test + public void testEmptyBytesLiteral() { + runUdfTypeTest("SELECT test_bytes(b'');", new byte[] {}, Schema.TypeName.BYTES); + } + + @Test + public void testEmptyBytesInput() { + runUdfTypeTest( + "SELECT test_bytes(bytes_empty) FROM table;", new byte[] {}, Schema.TypeName.BYTES); + } + + @Test + public void testAsciiBytesLiteral() { + runUdfTypeTest("SELECT test_bytes(b'abc');", new byte[] {'a', 'b', 'c'}, Schema.TypeName.BYTES); + } + + @Test + public void testAsciiBytesInput() { + runUdfTypeTest( + "SELECT test_bytes(bytes_ascii) FROM table;", + new byte[] {'a', 'b', 'c'}, + Schema.TypeName.BYTES); + } + + @Test + public void testUnicodeBytesLiteral() { + runUdfTypeTest("SELECT test_bytes(b'ス');", new byte[] {-29, -126, -71}, Schema.TypeName.BYTES); + } + + @Test + public void testUnicodeBytesInput() { + runUdfTypeTest( + "SELECT test_bytes(bytes_unicode) FROM table;", + new byte[] {-29, -126, -71}, + Schema.TypeName.BYTES); + } + + @Test + public void 
testZeroFloat64Literal() { + runUdfTypeTest("SELECT test_float64(0.0);", 0.0, Schema.TypeName.DOUBLE); + } + + @Test + public void testZeroFloat64Input() { + runUdfTypeTest("SELECT test_float64(float64_0) FROM table;", 0.0, Schema.TypeName.DOUBLE); + } + + @Test + public void testNonIntegerFloat64Literal() { + runUdfTypeTest("SELECT test_float64(0.123);", 0.123, Schema.TypeName.DOUBLE); + } + + @Test + public void testNonIntegerFloat64Input() { + runUdfTypeTest( + "SELECT test_float64(float64_noninteger) FROM table;", 0.123, Schema.TypeName.DOUBLE); + } + + @Test + public void testPosFloat64Literal() { + runUdfTypeTest("SELECT test_float64(123.0);", 123.0, Schema.TypeName.DOUBLE); + } + + @Test + public void testPosFloat64Input() { + runUdfTypeTest("SELECT test_float64(float64_pos) FROM table;", 123.0, Schema.TypeName.DOUBLE); + } + + @Test + public void testNegFloat64Literal() { + runUdfTypeTest("SELECT test_float64(-123.0);", -123.0, Schema.TypeName.DOUBLE); + } + + @Test + public void testNegFloat64Input() { + runUdfTypeTest("SELECT test_float64(float64_neg) FROM table;", -123.0, Schema.TypeName.DOUBLE); + } + + @Test + public void testMaxFloat64Literal() { + runUdfTypeTest( + "SELECT test_float64(1.7976931348623157e+308);", + 1.7976931348623157e+308, + Schema.TypeName.DOUBLE); + } + + @Test + public void testMaxFloat64Input() { + runUdfTypeTest( + "SELECT test_float64(float64_max) FROM table;", + 1.7976931348623157e+308, + Schema.TypeName.DOUBLE); + } + + @Test + public void testMinPosFloat64Literal() { + runUdfTypeTest( + "SELECT test_float64(2.2250738585072014e-308);", + 2.2250738585072014e-308, + Schema.TypeName.DOUBLE); + } + + @Test + public void testMinPosFloat64Input() { + runUdfTypeTest( + "SELECT test_float64(float64_min_pos) FROM table;", + 2.2250738585072014e-308, + Schema.TypeName.DOUBLE); + } + + @Test + public void testPosInfFloat64Literal() { + runUdfTypeTest( + "SELECT test_float64(CAST('+inf' AS FLOAT64));", + Double.POSITIVE_INFINITY, + Schema.TypeName.DOUBLE); + } + + @Test + public void testPosInfFloat64Input() { + runUdfTypeTest( + "SELECT test_float64(float64_inf) FROM table;", + Double.POSITIVE_INFINITY, + Schema.TypeName.DOUBLE); + } + + @Test + public void testNegInfFloat64Literal() { + runUdfTypeTest( + "SELECT test_float64(CAST('-inf' AS FLOAT64));", + Double.NEGATIVE_INFINITY, + Schema.TypeName.DOUBLE); + } + + @Test + public void testNegInfFloat64Input() { + runUdfTypeTest( + "SELECT test_float64(float64_neg_inf) FROM table;", + Double.NEGATIVE_INFINITY, + Schema.TypeName.DOUBLE); + } + + @Test + public void testNaNFloat64Literal() { + runUdfTypeTest( + "SELECT test_float64(CAST('NaN' AS FLOAT64));", Double.NaN, Schema.TypeName.DOUBLE); + } + + @Test + public void testNaNFloat64Input() { + runUdfTypeTest( + "SELECT test_float64(float64_nan) FROM table;", Double.NaN, Schema.TypeName.DOUBLE); + } + + @Test + public void testDateLiteral() { + runUdfTypeTest("SELECT test_date('2021-04-26');", LocalDate.of(2021, 4, 26), SqlTypes.DATE); + } + + @Test + public void testDateInput() { + runUdfTypeTest( + "SELECT test_date(f_date) FROM table;", LocalDate.of(2021, 4, 26), SqlTypes.DATE); + } + + @Test + public void testTimestampLiteral() { + runUdfTypeTest( + "SELECT test_timestamp('2021-05-06 03:48:32Z');", + new DateTime(2021, 5, 6, 3, 48, 32, DateTimeZone.UTC), + Schema.TypeName.DATETIME); + } + + @Test + public void testTimestampInput() { + runUdfTypeTest( + "SELECT test_timestamp(f_timestamp) FROM table;", + new DateTime(2021, 5, 6, 3, 48, 32, DateTimeZone.UTC), 
+ Schema.TypeName.DATETIME); + } + + @Test + public void testArrayLiteral() { + runUdfTypeTest( + "SELECT test_array(ARRAY[1, 2, 3]);", + ImmutableList.of(1L, 2L, 3L), + Schema.FieldType.array(Schema.FieldType.INT64)); + } + + @Test + public void testArrayInput() { + runUdfTypeTest( + "SELECT test_array(array_int64) FROM table;", + ImmutableList.of(1L, 2L, 3L), + Schema.FieldType.array(Schema.FieldType.INT64)); + } + + @Test + public void testNumericOneLiteral() { + runUdfTypeTest( + "SELECT test_numeric(1.000000000);", + new BigDecimal("1.000000000"), + Schema.FieldType.DECIMAL); + } + + @Test + public void testNumericMaxLiteral() { + runUdfTypeTest( + "SELECT test_numeric(99999999999999999999999999999.999999999);", + new BigDecimal("99999999999999999999999999999.999999999"), + Schema.FieldType.DECIMAL); + } + + @Test + public void testNumericMinLiteral() { + runUdfTypeTest( + "SELECT test_numeric(-99999999999999999999999999999.999999999);", + new BigDecimal("-99999999999999999999999999999.999999999"), + Schema.FieldType.DECIMAL); + } + + @Test + public void testNumericOneInput() { + runUdfTypeTest( + "SELECT test_numeric(numeric_one) FROM table;", + new BigDecimal("1.000000000"), + Schema.FieldType.DECIMAL); + } + + @Test + public void testNumericMaxInput() { + runUdfTypeTest( + "SELECT test_numeric(numeric_max) FROM table;", + new BigDecimal("99999999999999999999999999999.999999999"), + Schema.FieldType.DECIMAL); + } + + @Test + public void testNumericMinInput() { + runUdfTypeTest( + "SELECT test_numeric(numeric_min) FROM table;", + new BigDecimal("-99999999999999999999999999999.999999999"), + Schema.FieldType.DECIMAL); + } +} diff --git a/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlMathFunctionsTest.java b/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlMathFunctionsTest.java index 0d1e493a1e61..830bb84419ff 100644 --- a/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlMathFunctionsTest.java +++ b/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlMathFunctionsTest.java @@ -35,9 +35,6 @@ /** Tests for ZetaSQL Math functions (on INT64, DOUBLE, NUMERIC types). 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ZetaSqlMathFunctionsTest extends ZetaSqlTestBase { @Rule public transient TestPipeline pipeline = TestPipeline.create(); @@ -640,8 +637,8 @@ public void testNumericLiteral() { ZetaSqlTypesUtils.bigDecimalAsNumeric("-0.54321"), ZetaSqlTypesUtils.bigDecimalAsNumeric("123456"), ZetaSqlTypesUtils.bigDecimalAsNumeric("-0.009876"), - ZetaSqlTypesUtils.NUMERIC_MIN_VALUE, - ZetaSqlTypesUtils.NUMERIC_MAX_VALUE) + ZetaSqlCalciteTranslationUtils.ZETASQL_NUMERIC_MIN_VALUE, + ZetaSqlCalciteTranslationUtils.ZETASQL_NUMERIC_MAX_VALUE) .build()); pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); } @@ -892,7 +889,12 @@ public void testSafeArithmeticFunctionsNumeric() { .addNullableField("f_numeric4", Schema.FieldType.DECIMAL) .addNullableField("f_numeric5", Schema.FieldType.DECIMAL) .build()) - .addValues(null, null, null, null, ZetaSqlTypesUtils.NUMERIC_MIN_VALUE) + .addValues( + null, + null, + null, + null, + ZetaSqlCalciteTranslationUtils.ZETASQL_NUMERIC_MIN_VALUE) .build()); pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); } diff --git a/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlUdfTest.java b/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlNativeUdfTest.java similarity index 78% rename from sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlUdfTest.java rename to sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlNativeUdfTest.java index bdcd9413e26e..de7e8561c018 100644 --- a/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlUdfTest.java +++ b/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlNativeUdfTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.sdk.extensions.sql.zetasql; +import static org.hamcrest.Matchers.isA; + import com.google.zetasql.SqlException; -import org.apache.beam.sdk.extensions.sql.impl.ParseException; -import org.apache.beam.sdk.extensions.sql.impl.SqlConversionException; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamRelNode; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils; import org.apache.beam.sdk.schemas.Schema; @@ -36,9 +36,9 @@ import org.junit.runner.RunWith; import org.junit.runners.JUnit4; -/** Tests for user defined functions in the ZetaSQL dialect. */ +/** Tests for SQL-native user defined functions in the ZetaSQL dialect. 
*/ @RunWith(JUnit4.class) -public class ZetaSqlUdfTest extends ZetaSqlTestBase { +public class ZetaSqlNativeUdfTest extends ZetaSqlTestBase { @Rule public transient TestPipeline pipeline = TestPipeline.create(); @Rule public ExpectedException thrown = ExpectedException.none(); @@ -51,8 +51,9 @@ public void setUp() { public void testAlreadyDefinedUDFThrowsException() { String sql = "CREATE FUNCTION foo() AS (0); CREATE FUNCTION foo() AS (1); SELECT foo();"; ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); - thrown.expect(ParseException.class); - thrown.expectMessage("Failed to define function foo"); + thrown.expect(RuntimeException.class); + thrown.expectMessage("Failed to define function 'foo'"); + thrown.expectCause(isA(IllegalArgumentException.class)); zetaSQLQueryPlanner.convertToBeamRel(sql); } @@ -181,6 +182,27 @@ public void testUDTVF() { pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); } + @Test + public void testNullaryUdtvf() { + String sql = + "CREATE TABLE FUNCTION CustomerRange()\n" + + " AS\n" + + " SELECT *\n" + + " FROM KeyValue;\n" + + " SELECT key FROM CustomerRange()"; + + ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); + BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); + PCollection stream = BeamSqlRelUtils.toPCollection(pipeline, beamRelNode); + + Schema singleField = Schema.builder().addInt64Field("field1").build(); + PAssert.that(stream) + .containsInAnyOrder( + Row.withSchema(singleField).addValues(14L).build(), + Row.withSchema(singleField).addValues(15L).build()); + pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); + } + @Test public void testUDTVFTableNotFound() { String sql = @@ -192,7 +214,7 @@ public void testUDTVFTableNotFound() { + " SELECT key FROM CustomerRange(10, 14)"; ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); - thrown.expect(SqlConversionException.class); + thrown.expect(ZetaSqlException.class); thrown.expectMessage("Wasn't able to resolve the path [TableNotExist] in schema: beam"); zetaSQLQueryPlanner.convertToBeamRel(sql); } @@ -212,4 +234,30 @@ public void testUDTVFFunctionNotFound() { thrown.expectMessage("Table-valued function not found: FunctionNotFound"); zetaSQLQueryPlanner.convertToBeamRel(sql); } + + @Test + public void testJavascriptUdfUnsupported() { + String sql = "CREATE FUNCTION foo() RETURNS STRING LANGUAGE js; SELECT foo();"; + ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); + thrown.expect(UnsupportedOperationException.class); + thrown.expectMessage("Function foo uses unsupported language js."); + zetaSQLQueryPlanner.convertToBeamRel(sql); + } + + @Test + public void testSqlNativeAggregateFunctionNotSupported() { + String sql = + "CREATE AGGREGATE FUNCTION double_sum(col FLOAT64)\n" + + "AS (2 * SUM(col));\n" + + "SELECT double_sum(col1) AS doubled_sum\n" + + "FROM (SELECT 1 AS col1 UNION ALL\n" + + " SELECT 3 AS col1 UNION ALL\n" + + " SELECT 5 AS col1\n" + + ");"; + + ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); + thrown.expect(UnsupportedOperationException.class); + thrown.expectMessage("Native SQL aggregate functions are not supported (BEAM-9954)."); + zetaSQLQueryPlanner.convertToBeamRel(sql); + } } diff --git a/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlNumberTypesTest.java 
b/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlNumberTypesTest.java new file mode 100644 index 000000000000..314a4fab4c93 --- /dev/null +++ b/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlNumberTypesTest.java @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.zetasql; + +import com.google.zetasql.Value; +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamRelNode; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.ExpectedException; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Tests for ZetaSQL number type handling (on INT64, DOUBLE, NUMERIC types). */ +@RunWith(JUnit4.class) +public class ZetaSqlNumberTypesTest extends ZetaSqlTestBase { + @Rule public transient TestPipeline pipeline = TestPipeline.create(); + @Rule public ExpectedException thrown = ExpectedException.none(); + + @Before + public void setUp() { + initialize(); + } + + @Test + public void testCastDoubleAsNumericOverflow() { + double val = 1.7976931348623157e+308; + String sql = "SELECT CAST(@p0 AS NUMERIC) AS ColA"; + + thrown.expect(ZetaSqlException.class); + thrown.expectMessage("Casting TYPE_DOUBLE as TYPE_NUMERIC would cause overflow of literal"); + + ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); + BeamRelNode beamRelNode = + zetaSQLQueryPlanner.convertToBeamRel( + sql, ImmutableMap.of("p0", Value.createDoubleValue(val))); + } + + @Test + public void testCastDoubleAsNumericUnderflow() { + double val = -1.7976931348623157e+308; + String sql = "SELECT CAST(@p0 AS NUMERIC) AS ColA"; + + thrown.expect(ZetaSqlException.class); + thrown.expectMessage("Casting TYPE_DOUBLE as TYPE_NUMERIC would cause underflow of literal"); + + ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); + BeamRelNode beamRelNode = + zetaSQLQueryPlanner.convertToBeamRel( + sql, ImmutableMap.of("p0", Value.createDoubleValue(val))); + } + + @Test + public void testCastDoubleAsNumericScaleTooLarge() { + double val = 2.2250738585072014e-308; + String sql = "SELECT CAST(@p0 AS NUMERIC) AS ColA"; + + thrown.expect(ZetaSqlException.class); + thrown.expectMessage("Cannot cast TYPE_DOUBLE as TYPE_NUMERIC: scale 1022 exceeds 9"); + + ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); + BeamRelNode beamRelNode = + zetaSQLQueryPlanner.convertToBeamRel( + sql, ImmutableMap.of("p0", Value.createDoubleValue(val))); + } +} diff --git 
a/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlTestBase.java b/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlTestBase.java index 741aa0128805..23e8e8874d74 100644 --- a/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlTestBase.java +++ b/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlTestBase.java @@ -26,17 +26,14 @@ import org.apache.beam.sdk.extensions.sql.meta.provider.ReadOnlyTableProvider; import org.apache.beam.sdk.extensions.sql.meta.provider.TableProvider; import org.apache.beam.sdk.options.PipelineOptionsFactory; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.Contexts; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.ConventionTraitDef; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.FrameworkConfig; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.Frameworks; -import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RuleSet; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.Contexts; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.plan.ConventionTraitDef; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.FrameworkConfig; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.Frameworks; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.tools.RuleSet; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; /** Common setup for ZetaSQL tests. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public abstract class ZetaSqlTestBase { protected static final long PIPELINE_EXECUTION_WAITTIME_MINUTES = 2L; @@ -56,12 +53,15 @@ private TableProvider createBeamTableProvider() { testBoundedTableMap.put("table_with_array", TestInput.TABLE_WITH_ARRAY); testBoundedTableMap.put("table_with_array_for_unnest", TestInput.TABLE_WITH_ARRAY_FOR_UNNEST); testBoundedTableMap.put("table_with_array_of_struct", TestInput.TABLE_WITH_ARRAY_OF_STRUCT); + testBoundedTableMap.put("table_with_struct_of_struct", TestInput.TABLE_WITH_STRUCT_OF_STRUCT); testBoundedTableMap.put( "table_with_struct_of_struct_of_array", TestInput.TABLE_WITH_STRUCT_OF_STRUCT_OF_ARRAY); testBoundedTableMap.put( "table_with_array_of_struct_of_struct", TestInput.TABLE_WITH_ARRAY_OF_STRUCT_OF_STRUCT); testBoundedTableMap.put( "table_with_struct_of_array_of_struct", TestInput.TABLE_WITH_STRUCT_OF_ARRAY_OF_STRUCT); + testBoundedTableMap.put( + "table_with_array_of_struct_of_array", TestInput.TABLE_WITH_ARRAY_OF_STRUCT_OF_ARRAY); testBoundedTableMap.put("table_for_case_when", TestInput.TABLE_FOR_CASE_WHEN); testBoundedTableMap.put("aggregate_test_table_two", TestInput.AGGREGATE_TABLE_TWO); testBoundedTableMap.put("table_empty", TestInput.TABLE_EMPTY); diff --git a/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlTimeFunctionsTest.java b/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlTimeFunctionsTest.java index 260f12ef5979..3e3ec8d5d71c 100644 --- a/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlTimeFunctionsTest.java +++ 
b/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlTimeFunctionsTest.java @@ -49,9 +49,6 @@ /** Tests for ZetaSQL time functions (DATE, TIME, DATETIME, and TIMESTAMP functions). */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ZetaSqlTimeFunctionsTest extends ZetaSqlTestBase { @Rule public transient TestPipeline pipeline = TestPipeline.create(); @@ -84,7 +81,9 @@ public void testDateLiteral() { @Test public void testDateColumn() { - String sql = "SELECT FORMAT_DATE('%b-%d-%Y', date_field) FROM table_with_date"; + // NOTE: Do not use textual format parameters (%b or %h: The abbreviated month name) as these + // are locale dependent. + String sql = "SELECT FORMAT_DATE('%m-%d-%Y', date_field) FROM table_with_date"; ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); @@ -93,10 +92,10 @@ public void testDateColumn() { PAssert.that(stream) .containsInAnyOrder( Row.withSchema(Schema.builder().addStringField("f_date_str").build()) - .addValues("Dec-25-2008") + .addValues("12-25-2008") .build(), Row.withSchema(Schema.builder().addStringField("f_date_str").build()) - .addValues("Apr-07-2020") + .addValues("04-07-2020") .build()); pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); } @@ -393,7 +392,9 @@ public void testDateTrunc() { @Test public void testFormatDate() { - String sql = "SELECT FORMAT_DATE('%b-%d-%Y', DATE '2008-12-25')"; + // NOTE: Do not use textual format parameters (%b or %h: The abbreviated month name) as these + // are locale dependent. + String sql = "SELECT FORMAT_DATE('%m-%d-%Y', DATE '2008-12-25')"; ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); @@ -402,7 +403,7 @@ public void testFormatDate() { PAssert.that(stream) .containsInAnyOrder( Row.withSchema(Schema.builder().addStringField("f_date_str").build()) - .addValues("Dec-25-2008") + .addValues("12-25-2008") .build()); pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); } diff --git a/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlTypesUtils.java b/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlTypesUtils.java index 02893b70055b..3d8d47839586 100644 --- a/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlTypesUtils.java +++ b/sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlTypesUtils.java @@ -24,11 +24,6 @@ @Internal public class ZetaSqlTypesUtils { - public static final BigDecimal NUMERIC_MAX_VALUE = - bigDecimalAsNumeric("99999999999999999999999999999.999999999"); - public static final BigDecimal NUMERIC_MIN_VALUE = - bigDecimalAsNumeric("-99999999999999999999999999999.999999999"); - private ZetaSqlTypesUtils() {} /** diff --git a/sdks/java/extensions/zetasketch/build.gradle b/sdks/java/extensions/zetasketch/build.gradle index ab1f7bf7a444..61432853005d 100644 --- a/sdks/java/extensions/zetasketch/build.gradle +++ b/sdks/java/extensions/zetasketch/build.gradle @@ -29,6 +29,7 @@ evaluationDependsOn(":runners:google-cloud-dataflow-java:worker:legacy-worker") def zetasketch_version = "0.1.0" dependencies { + compile 
library.java.auto_value_annotations compile library.java.slf4j_api compile library.java.vendored_guava_26_0_jre compile project(path: ":sdks:java:core", configuration: "shadow") diff --git a/sdks/java/extensions/zetasketch/src/main/java/org/apache/beam/sdk/extensions/zetasketch/ApproximateCountDistinct.java b/sdks/java/extensions/zetasketch/src/main/java/org/apache/beam/sdk/extensions/zetasketch/ApproximateCountDistinct.java new file mode 100644 index 000000000000..9b9daf5e0a48 --- /dev/null +++ b/sdks/java/extensions/zetasketch/src/main/java/org/apache/beam/sdk/extensions/zetasketch/ApproximateCountDistinct.java @@ -0,0 +1,288 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.zetasketch; + +import com.google.auto.value.AutoValue; +import java.util.List; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.KvCoder; +import org.apache.beam.sdk.extensions.zetasketch.HllCount.Init.Builder; +import org.apache.beam.sdk.transforms.Contextful; +import org.apache.beam.sdk.transforms.Contextful.Fn; +import org.apache.beam.sdk.transforms.MapElements; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ProcessFunction; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.TypeDescriptor; +import org.apache.beam.sdk.values.TypeDescriptors; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@code PTransform}s for estimating the number of distinct elements in a {@code PCollection}, or + * the number of distinct values associated with each key in a {@code PCollection} of {@code KV}s. + * + *

<p>We make use of the {@link HllCount} implementation for this transform. Please use {@link
+ * HllCount} directly if you need access to the sketches.
+ *
+ * <p>If the input elements are not one of {@code byte[]}, {@link Integer}, {@link Long} or
+ * {@link String}, make use of {@link Globally#via} or {@link PerKey#via}.
+ *
+ * <h3>Examples</h3>
+ *
+ * <h4>Example 1: Approximate Count of Ints {@code PCollection<Integer>} and specify precision</h4>
+ *
+ * <pre>{@code
+ * p.apply("Int", Create.of(ints)).apply("IntHLL", ApproximateCountDistinct.globally()
+ *   .withPercision(PRECISION));
+ * }</pre>
+ *
+ * <h4>Example 2: Approximate Count of Key Value {@code PCollection<KV<Long, Long>>}</h4>
+ *
+ * <pre>{@code
+ * PCollection<KV<Long, Long>> result =
+ *   p.apply("Long", Create.of(longs)).apply("LongHLL", ApproximateCountDistinct.perKey());
+ * }</pre>
+ *
+ * <h4>Example 3: Approximate Count of Key Value {@code PCollection<KV<Integer, Foo>>} using
+ * {@link PerKey#via}</h4>
+ *
+ * <pre>{@code
+ * PCollection<KV<Integer, Long>> approxResultInteger =
+ *   p.apply("Int", Create.of(Foo))
+ *     .apply("IntHLL", ApproximateCountDistinct.<Integer, KV<Integer, Foo>>perKey()
+ *       .via(kv -> KV.of(kv.getKey(), (long) kv.getValue().hashCode())));
+ * }</pre>
    + */ +@Experimental +public class ApproximateCountDistinct { + + private static final Logger LOG = LoggerFactory.getLogger(ApproximateCountDistinct.class); + + private static final List> HLL_IMPLEMENTED_TYPES = + ImmutableList.of( + TypeDescriptors.strings(), + TypeDescriptors.longs(), + TypeDescriptors.integers(), + new TypeDescriptor() {}); + + public static Globally globally() { + return new AutoValue_ApproximateCountDistinct_Globally.Builder() + .setPrecision(HllCount.DEFAULT_PRECISION) + .build(); + } + + public static PerKey perKey() { + return new AutoValue_ApproximateCountDistinct_PerKey.Builder() + .setPrecision(HllCount.DEFAULT_PRECISION) + .build(); + } + + ///////////////////////////////////////////////////////////////////////////// + + /** + * {@code PTransform} for estimating the number of distinct elements in a {@code PCollection}. + * + * @param the type of the elements in the input {@code PCollection} + */ + @AutoValue + public abstract static class Globally extends PTransform, PCollection> { + + public abstract int getPrecision(); + + public abstract Builder toBuilder(); + + @Nullable + public abstract Contextful> getMapping(); + + @AutoValue.Builder + public abstract static class Builder { + + public abstract Builder setPrecision(int precision); + + public abstract Builder setMapping(Contextful> value); + + public abstract Globally build(); + } + + public Globally via(ProcessFunction fn) { + + return toBuilder().setMapping(Contextful.fn(fn)).build(); + } + + public Globally withPercision(Integer withPercision) { + @SuppressWarnings("unchecked") + Globally globally = (Globally) toBuilder().setPrecision(withPercision).build(); + return globally; + } + + @Override + public PCollection expand(PCollection input) { + + TypeDescriptor type = input.getCoder().getEncodedTypeDescriptor(); + + if (HLL_IMPLEMENTED_TYPES.contains(type)) { + + HllCount.Init.Builder builder = builderForType(type); + + return input.apply(builder.globally()).apply(HllCount.Extract.globally()); + } + + // Boiler plate to avoid [argument.type.incompatible] NonNull vs Nullable + Contextful> mapping = getMapping(); + + if (mapping != null) { + return input + .apply(MapElements.into(TypeDescriptors.longs()).via(mapping)) + .apply(HllCount.Init.forLongs().globally()) + .apply(HllCount.Extract.globally()); + } + + throw new IllegalArgumentException( + String.format( + "%s supports Integer," + + " Long, String and byte[] objects directly. For other types you must provide a Mapping function.", + this.getClass().getCanonicalName())); + } + + @Override + public void populateDisplayData(DisplayData.Builder builder) { + super.populateDisplayData(builder); + ApproximateCountDistinct.populateDisplayData(builder, getPrecision()); + } + } + + @AutoValue + public abstract static class PerKey + extends PTransform>, PCollection>> { + + public abstract Integer getPrecision(); + + @Nullable + public abstract Contextful, KV>> getMapping(); + + public abstract Builder toBuilder(); + + @AutoValue.Builder + public abstract static class Builder { + + public abstract Builder setPrecision(Integer precision); + + public abstract Builder setMapping(Contextful, KV>> value); + + public abstract PerKey build(); + } + + public PerKey withPercision(Integer withPercision) { + // Work around for loss of type inference when using API. 
+ @SuppressWarnings("unchecked") + PerKey perKey = (PerKey) this.toBuilder().setPrecision(withPercision).build(); + return perKey; + } + + public PerKey via(ProcessFunction, KV> fn) { + + return this.toBuilder().setMapping(Contextful., KV>fn(fn)).build(); + } + + @Override + public PCollection> expand(PCollection> input) { + + Coder coder = ((KvCoder) input.getCoder()).getValueCoder(); + + TypeDescriptor type = coder.getEncodedTypeDescriptor(); + + if (HLL_IMPLEMENTED_TYPES.contains(type)) { + + HllCount.Init.Builder builder = builderForType(type); + + return input.apply(builder.perKey()).apply(HllCount.Extract.perKey()); + } + + // Boiler plate to avoid [argument.type.incompatible] NonNull vs Nullable + Contextful, KV>> mapping = getMapping(); + + if (mapping != null) { + Coder keyCoder = ((KvCoder) input.getCoder()).getKeyCoder(); + return input + .apply( + MapElements.into( + TypeDescriptors.kvs( + keyCoder.getEncodedTypeDescriptor(), TypeDescriptors.longs())) + .via(mapping)) + .apply(HllCount.Init.forLongs().perKey()) + .apply(HllCount.Extract.perKey()); + } + + throw new IllegalArgumentException( + String.format( + "%s supports Integer," + + " Long, String and byte[] objects directly not for %s type, you must provide a Mapping use via.", + this.getClass().getCanonicalName(), type.toString())); + } + + @Override + public void populateDisplayData(DisplayData.Builder builder) { + super.populateDisplayData(builder); + ApproximateCountDistinct.populateDisplayData(builder, getPrecision()); + } + } + + ///////////////////////////////////////////////////////////////////////////// + + private static void populateDisplayData(DisplayData.Builder builder, Integer precision) { + builder.add(DisplayData.item("precision", precision).withLabel("Precision")); + } + + // HLLCount supports, Long, Integers, String and Byte primitives. + // We will return an appropriate builder + protected static Builder builderForType(TypeDescriptor input) { + + @SuppressWarnings("rawtypes") + HllCount.Init.Builder builder = null; + + if (input.equals(TypeDescriptors.strings())) { + builder = HllCount.Init.forStrings(); + } + if (input.equals(TypeDescriptors.longs())) { + builder = HllCount.Init.forLongs(); + } + if (input.equals(TypeDescriptors.integers())) { + builder = HllCount.Init.forIntegers(); + } + if (input.equals(new TypeDescriptor() {})) { + builder = HllCount.Init.forBytes(); + } + + if (builder == null) { + throw new IllegalArgumentException(String.format("Type not supported %s", input)); + } + + // Safe to ignore warning, as we know the type based on the check we do above. + @SuppressWarnings("unchecked") + Builder output = (Builder) builder; + + return output; + } +} diff --git a/sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/ApproximateCountDistinctTest.java b/sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/ApproximateCountDistinctTest.java new file mode 100644 index 000000000000..8796d83810d0 --- /dev/null +++ b/sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/ApproximateCountDistinctTest.java @@ -0,0 +1,342 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.zetasketch; + +import com.google.zetasketch.HyperLogLogPlusPlus; +import java.nio.ByteBuffer; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.List; +import java.util.stream.Collectors; +import org.apache.beam.sdk.testing.NeedsRunner; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.TypeDescriptor; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.junit.Rule; +import org.junit.Test; +import org.junit.experimental.categories.Category; + +/** Tests for {@link ApproximateCountDistinct}. */ +public class ApproximateCountDistinctTest { + + @Rule public final transient TestPipeline p = TestPipeline.create(); + + // Integer + private static final List INTS1 = Arrays.asList(1, 2, 3, 3, 1, 4); + private static final Long INTS1_ESTIMATE; + + private static final int TEST_PRECISION = 20; + + static { + HyperLogLogPlusPlus hll = new HyperLogLogPlusPlus.Builder().buildForIntegers(); + INTS1.forEach(hll::add); + INTS1_ESTIMATE = hll.longResult(); + } + + /** Test correct Builder is returned from Generic type. * */ + @Test + public void testIntegerBuilder() { + + PCollection ints = p.apply(Create.of(1)); + HllCount.Init.Builder builder = + ApproximateCountDistinct.builderForType( + ints.getCoder().getEncodedTypeDescriptor()); + PCollection result = ints.apply(builder.globally()).apply(HllCount.Extract.globally()); + PAssert.that(result).containsInAnyOrder(1L); + p.run(); + } + /** Test correct Builder is returned from Generic type. * */ + @Test + public void testStringBuilder() { + + PCollection strings = p.apply(Create.of("43")); + HllCount.Init.Builder builder = + ApproximateCountDistinct.builderForType( + strings.getCoder().getEncodedTypeDescriptor()); + PCollection result = strings.apply(builder.globally()).apply(HllCount.Extract.globally()); + PAssert.that(result).containsInAnyOrder(1L); + p.run(); + } + /** Test correct Builder is returned from Generic type. * */ + @Test + public void testLongBuilder() { + + PCollection longs = p.apply(Create.of(1L)); + HllCount.Init.Builder builder = + ApproximateCountDistinct.builderForType(longs.getCoder().getEncodedTypeDescriptor()); + PCollection result = longs.apply(builder.globally()).apply(HllCount.Extract.globally()); + PAssert.that(result).containsInAnyOrder(1L); + p.run(); + } + /** Test correct Builder is returned from Generic type. 
* */ + @Test + public void testBytesBuilder() { + + byte[] byteArray = new byte[] {'A'}; + PCollection bytes = p.apply(Create.of(byteArray)); + TypeDescriptor a = bytes.getCoder().getEncodedTypeDescriptor(); + HllCount.Init.Builder builder = + ApproximateCountDistinct.builderForType( + bytes.getCoder().getEncodedTypeDescriptor()); + PCollection result = bytes.apply(builder.globally()).apply(HllCount.Extract.globally()); + PAssert.that(result).containsInAnyOrder(1L); + p.run(); + } + + /** Test Integer Globally. */ + @Test + @Category(NeedsRunner.class) + public void testStandardTypesGlobalForInteger() { + PCollection approxResultInteger = + p.apply("Int", Create.of(INTS1)).apply("IntHLL", ApproximateCountDistinct.globally()); + PAssert.thatSingleton(approxResultInteger).isEqualTo(INTS1_ESTIMATE); + p.run(); + } + + /** Test Long Globally. */ + @Test + @Category(NeedsRunner.class) + public void testStandardTypesGlobalForLong() { + + PCollection approxResultLong = + p.apply("Long", Create.of(INTS1.stream().map(Long::valueOf).collect(Collectors.toList()))) + .apply("LongHLL", ApproximateCountDistinct.globally()); + + PAssert.thatSingleton(approxResultLong).isEqualTo(INTS1_ESTIMATE); + + p.run(); + } + + /** Test String Globally. */ + @Test + @Category(NeedsRunner.class) + public void testStandardTypesGlobalForStrings() { + PCollection approxResultString = + p.apply("Str", Create.of(INTS1.stream().map(String::valueOf).collect(Collectors.toList()))) + .apply("StrHLL", ApproximateCountDistinct.globally()); + + PAssert.thatSingleton(approxResultString).isEqualTo(INTS1_ESTIMATE); + + p.run(); + } + + /** Test Byte Globally. */ + @Test + @Category(NeedsRunner.class) + public void testStandardTypesGlobalForBytes() { + PCollection approxResultByte = + p.apply( + "BytesHLL", + Create.of( + INTS1.stream() + .map(x -> ByteBuffer.allocate(4).putInt(x).array()) + .collect(Collectors.toList()))) + .apply(ApproximateCountDistinct.globally()); + + PAssert.thatSingleton(approxResultByte).isEqualTo(INTS1_ESTIMATE); + + p.run(); + } + + /** Test Integer Globally. */ + @Test + @Category(NeedsRunner.class) + public void testStandardTypesPerKeyForInteger() { + + List> ints = new ArrayList<>(); + + for (int i = 0; i < 3; i++) { + for (int k : INTS1) { + ints.add(KV.of(i, k)); + } + } + + PCollection> result = + p.apply("Int", Create.of(ints)).apply("IntHLL", ApproximateCountDistinct.perKey()); + + PAssert.that(result) + .containsInAnyOrder( + ImmutableList.of( + KV.of(0, INTS1_ESTIMATE), KV.of(1, INTS1_ESTIMATE), KV.of(2, INTS1_ESTIMATE))); + + p.run(); + } + + /** Test Long Globally. */ + @Test + @Category(NeedsRunner.class) + public void testStandardTypesPerKeyForLong() { + + List> longs = new ArrayList<>(); + + for (int i = 0; i < 3; i++) { + for (int k : INTS1) { + longs.add(KV.of(i, (long) k)); + } + } + + PCollection> result = + p.apply("Long", Create.of(longs)).apply("LongHLL", ApproximateCountDistinct.perKey()); + + PAssert.that(result) + .containsInAnyOrder( + ImmutableList.of( + KV.of(0, INTS1_ESTIMATE), KV.of(1, INTS1_ESTIMATE), KV.of(2, INTS1_ESTIMATE))); + + p.run(); + } + + /** Test String Globally. 
*/ + @Test + @Category(NeedsRunner.class) + public void testStandardTypesPerKeyForStrings() { + List> strings = new ArrayList<>(); + + for (int i = 0; i < 3; i++) { + for (int k : INTS1) { + strings.add(KV.of(i, String.valueOf(k))); + } + } + + PCollection> result = + p.apply("Str", Create.of(strings)).apply("StrHLL", ApproximateCountDistinct.perKey()); + + PAssert.that(result) + .containsInAnyOrder( + ImmutableList.of( + KV.of(0, INTS1_ESTIMATE), KV.of(1, INTS1_ESTIMATE), KV.of(2, INTS1_ESTIMATE))); + + p.run(); + } + + /** Test Byte Globally. */ + @Test + @Category(NeedsRunner.class) + public void testStandardTypesPerKeyForBytes() { + + List> bytes = new ArrayList<>(); + + for (int i = 0; i < 3; i++) { + for (int k : INTS1) { + bytes.add(KV.of(i, ByteBuffer.allocate(4).putInt(k).array())); + } + } + + PCollection> result = + p.apply("BytesHLL", Create.of(bytes)).apply(ApproximateCountDistinct.perKey()); + + PAssert.that(result) + .containsInAnyOrder( + ImmutableList.of( + KV.of(0, INTS1_ESTIMATE), KV.of(1, INTS1_ESTIMATE), KV.of(2, INTS1_ESTIMATE))); + + p.run(); + } + + /** Test a general object, we will make use of a KV as the object as it already has a coder. */ + @Test + @Category(NeedsRunner.class) + public void testObjectTypesGlobal() { + + PCollection approxResultInteger = + p.apply( + "Int", + Create.of( + INTS1.stream().map(x -> KV.of(x, KV.of(x, x))).collect(Collectors.toList()))) + .apply( + "IntHLL", + ApproximateCountDistinct.>>globally() + .via((KV> x) -> (long) x.getValue().hashCode())); + + PAssert.thatSingleton(approxResultInteger).isEqualTo(INTS1_ESTIMATE); + + p.run(); + } + + /** Test a general object, we will make use of a KV as the object as it already has a coder. */ + @Test + @Category(NeedsRunner.class) + public void testObjectTypesPerKey() { + + List>> ints = new ArrayList<>(); + + for (int i = 0; i < 3; i++) { + for (int k : INTS1) { + ints.add(KV.of(i, KV.of(i, k))); + } + } + + PCollection> approxResultInteger = + p.apply("Int", Create.of(ints)) + .apply( + "IntHLL", + ApproximateCountDistinct.>perKey() + .via(x -> KV.of(x.getKey(), (long) x.hashCode())) + .withPercision(TEST_PRECISION)); + + PAssert.that(approxResultInteger) + .containsInAnyOrder( + ImmutableList.of( + KV.of(0, INTS1_ESTIMATE), KV.of(1, INTS1_ESTIMATE), KV.of(2, INTS1_ESTIMATE))); + + p.run(); + } + + /** Test a general object, we will make use of a KV as the object as it already has a coder. */ + @Test + @Category(NeedsRunner.class) + public void testGlobalPercision() { + + PCollection approxResultInteger = + p.apply("Int", Create.of(INTS1)) + .apply("IntHLL", ApproximateCountDistinct.globally().withPercision(TEST_PRECISION)); + + PAssert.thatSingleton(approxResultInteger).isEqualTo(INTS1_ESTIMATE); + + p.run(); + } + + /** Test a general object, we will make use of a KV as the object as it already has a coder. 
*/ + @Test + @Category(NeedsRunner.class) + public void testPerKeyPercision() { + + List> ints = new ArrayList<>(); + + for (int i = 0; i < 3; i++) { + for (int k : INTS1) { + ints.add(KV.of(i, k)); + } + } + + PCollection> approxResultInteger = + p.apply("Int", Create.of(ints)) + .apply("IntHLL", ApproximateCountDistinct.perKey().withPercision(TEST_PRECISION)); + + PAssert.that(approxResultInteger) + .containsInAnyOrder( + ImmutableList.of( + KV.of(0, INTS1_ESTIMATE), KV.of(1, INTS1_ESTIMATE), KV.of(2, INTS1_ESTIMATE))); + + p.run(); + } +} diff --git a/sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java b/sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java index 855348a7fddf..2bd18ce32244 100644 --- a/sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java +++ b/sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java @@ -26,6 +26,7 @@ import com.google.api.services.bigquery.model.TableReference; import com.google.api.services.bigquery.model.TableRow; import com.google.api.services.bigquery.model.TableSchema; +import com.google.cloud.bigquery.storage.v1.DataFormat; import java.nio.ByteBuffer; import java.util.Arrays; import java.util.Collections; @@ -191,6 +192,7 @@ private void readSketchFromBigQuery(String tableId, Long expectedCount) { PCollection result = p.apply( BigQueryIO.read(parseQueryResultToByteArray) + .withFormat(DataFormat.AVRO) .fromQuery(query) .usingStandardSql() .withMethod(Method.DIRECT_READ) diff --git a/sdks/java/fn-execution/build.gradle b/sdks/java/fn-execution/build.gradle index ffb92cbc7ea2..8cb5f0a3d399 100644 --- a/sdks/java/fn-execution/build.gradle +++ b/sdks/java/fn-execution/build.gradle @@ -29,14 +29,13 @@ dependencies { compile project(path: ":model:pipeline", configuration: "shadow") compile project(path: ":model:fn-execution", configuration: "shadow") compile project(path: ":sdks:java:core", configuration: "shadow") - compile library.java.vendored_grpc_1_26_0 + compile library.java.vendored_grpc_1_36_0 compile library.java.vendored_guava_26_0_jre compile library.java.slf4j_api compile library.java.joda_time provided library.java.junit testCompile library.java.junit - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library testCompile library.java.mockito_core + testCompile library.java.commons_lang3 testRuntimeOnly library.java.slf4j_jdk14 } diff --git a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/channel/ManagedChannelFactory.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/channel/ManagedChannelFactory.java index c6180a3da30f..5fc604f6ba66 100644 --- a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/channel/ManagedChannelFactory.java +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/channel/ManagedChannelFactory.java @@ -20,14 +20,14 @@ import java.net.SocketAddress; import java.util.List; import org.apache.beam.model.pipeline.v1.Endpoints.ApiServiceDescriptor; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ClientInterceptor; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ManagedChannel; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ManagedChannelBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.netty.NettyChannelBuilder; -import 
org.apache.beam.vendor.grpc.v1p26p0.io.netty.channel.epoll.EpollDomainSocketChannel; -import org.apache.beam.vendor.grpc.v1p26p0.io.netty.channel.epoll.EpollEventLoopGroup; -import org.apache.beam.vendor.grpc.v1p26p0.io.netty.channel.epoll.EpollSocketChannel; -import org.apache.beam.vendor.grpc.v1p26p0.io.netty.channel.unix.DomainSocketAddress; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ClientInterceptor; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ManagedChannel; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ManagedChannelBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.netty.NettyChannelBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.netty.channel.epoll.EpollDomainSocketChannel; +import org.apache.beam.vendor.grpc.v1p36p0.io.netty.channel.epoll.EpollEventLoopGroup; +import org.apache.beam.vendor.grpc.v1p36p0.io.netty.channel.epoll.EpollSocketChannel; +import org.apache.beam.vendor.grpc.v1p36p0.io.netty.channel.unix.DomainSocketAddress; /** A Factory which creates an underlying {@link ManagedChannel} implementation. */ public abstract class ManagedChannelFactory { @@ -36,7 +36,7 @@ public static ManagedChannelFactory createDefault() { } public static ManagedChannelFactory createEpoll() { - org.apache.beam.vendor.grpc.v1p26p0.io.netty.channel.epoll.Epoll.ensureAvailability(); + org.apache.beam.vendor.grpc.v1p36p0.io.netty.channel.epoll.Epoll.ensureAvailability(); return new Epoll(); } diff --git a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/channel/SocketAddressFactory.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/channel/SocketAddressFactory.java index b7d9d7674dde..06b2a37dc603 100644 --- a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/channel/SocketAddressFactory.java +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/channel/SocketAddressFactory.java @@ -23,7 +23,7 @@ import java.io.IOException; import java.net.InetSocketAddress; import java.net.SocketAddress; -import org.apache.beam.vendor.grpc.v1p26p0.io.netty.channel.unix.DomainSocketAddress; +import org.apache.beam.vendor.grpc.v1p36p0.io.netty.channel.unix.DomainSocketAddress; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.net.HostAndPort; /** Creates a {@link SocketAddress} based upon a supplied string. 
*/ diff --git a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataBufferingOutboundObserver.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataBufferingOutboundObserver.java index e741c7e9c793..7e3b615a2be4 100644 --- a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataBufferingOutboundObserver.java +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataBufferingOutboundObserver.java @@ -23,7 +23,7 @@ import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.options.ExperimentalOptions; import org.apache.beam.sdk.options.PipelineOptions; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; /** diff --git a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataGrpcMultiplexer.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataGrpcMultiplexer.java index 29e1b3b775f0..e6b3facce60f 100644 --- a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataGrpcMultiplexer.java +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataGrpcMultiplexer.java @@ -27,9 +27,9 @@ import org.apache.beam.model.fnexecution.v1.BeamFnApi.Elements; import org.apache.beam.model.pipeline.v1.Endpoints; import org.apache.beam.sdk.fn.stream.OutboundObserverFactory; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Status; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Status; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; diff --git a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataInboundObserver.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataInboundObserver.java index ed1f178d37b5..8484a3a35d44 100644 --- a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataInboundObserver.java +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataInboundObserver.java @@ -18,7 +18,7 @@ package org.apache.beam.sdk.fn.data; import java.util.function.BiConsumer; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -73,6 +73,11 @@ public void awaitCompletion() throws Exception { readFuture.awaitCompletion(); } + @Override + public void runWhenComplete(Runnable completeRunnable) { + readFuture.runWhenComplete(completeRunnable); + } + @Override public boolean isDone() { return readFuture.isDone(); diff --git a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataSizeBasedBufferingOutboundObserver.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataSizeBasedBufferingOutboundObserver.java index 420772789b88..d3af45290df4 
100644 --- a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataSizeBasedBufferingOutboundObserver.java +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataSizeBasedBufferingOutboundObserver.java @@ -20,8 +20,8 @@ import java.io.IOException; import org.apache.beam.model.fnexecution.v1.BeamFnApi; import org.apache.beam.sdk.coders.Coder; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.slf4j.Logger; import org.slf4j.LoggerFactory; diff --git a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataTimeBasedBufferingOutboundObserver.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataTimeBasedBufferingOutboundObserver.java index 7da539b85fcf..e45e5d6da3bf 100644 --- a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataTimeBasedBufferingOutboundObserver.java +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataTimeBasedBufferingOutboundObserver.java @@ -25,7 +25,7 @@ import java.util.concurrent.TimeUnit; import org.apache.beam.model.fnexecution.v1.BeamFnApi; import org.apache.beam.sdk.coders.Coder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.ThreadFactoryBuilder; diff --git a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/CompletableFutureInboundDataClient.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/CompletableFutureInboundDataClient.java index 5f02e6e14f55..73b68222cfa2 100644 --- a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/CompletableFutureInboundDataClient.java +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/CompletableFutureInboundDataClient.java @@ -48,6 +48,12 @@ public void awaitCompletion() throws Exception { future.get(); } + @Override + @SuppressWarnings("FutureReturnValueIgnored") + public void runWhenComplete(Runnable completeRunnable) { + future.whenComplete((result, throwable) -> completeRunnable.run()); + } + @Override public boolean isDone() { return future.isDone(); diff --git a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/DecodingFnDataReceiver.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/DecodingFnDataReceiver.java index 6723d8dd6aa6..fd2f4e41e127 100644 --- a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/DecodingFnDataReceiver.java +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/DecodingFnDataReceiver.java @@ -19,7 +19,7 @@ import java.io.InputStream; import org.apache.beam.sdk.coders.Coder; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; /** A receiver of encoded data, decoding it and passing it onto a downstream consumer. 
*/ public class DecodingFnDataReceiver implements FnDataReceiver { diff --git a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/InboundDataClient.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/InboundDataClient.java index db46290256d0..a8b2c673545e 100644 --- a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/InboundDataClient.java +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/InboundDataClient.java @@ -33,6 +33,9 @@ public interface InboundDataClient { */ void awaitCompletion() throws InterruptedException, Exception; + /** Runs the runnable once the client has completed reading from the inbound stream. */ + void runWhenComplete(Runnable completeRunnable); + /** * Returns true if the client is done, either via completing successfully or by being cancelled. */ diff --git a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/RemoteGrpcPortRead.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/RemoteGrpcPortRead.java index 1e205b19cdef..79c48a76f90a 100644 --- a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/RemoteGrpcPortRead.java +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/RemoteGrpcPortRead.java @@ -23,7 +23,7 @@ import org.apache.beam.model.fnexecution.v1.BeamFnApi.RemoteGrpcPort; import org.apache.beam.model.pipeline.v1.RunnerApi.FunctionSpec; import org.apache.beam.model.pipeline.v1.RunnerApi.PTransform; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; /** diff --git a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/RemoteGrpcPortWrite.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/RemoteGrpcPortWrite.java index 2b736599ec0f..79db97f48ecf 100644 --- a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/RemoteGrpcPortWrite.java +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/RemoteGrpcPortWrite.java @@ -24,7 +24,7 @@ import org.apache.beam.model.pipeline.v1.RunnerApi.FunctionSpec; import org.apache.beam.model.pipeline.v1.RunnerApi.PCollection; import org.apache.beam.model.pipeline.v1.RunnerApi.PTransform; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; /** diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/FnService.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/server/FnService.java similarity index 86% rename from runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/FnService.java rename to sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/server/FnService.java index 634657a42aca..f9ea1608428b 100644 --- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/FnService.java +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/server/FnService.java @@ -15,9 +15,9 @@ * See the License for the specific language governing permissions and * limitations under the License. 
*/ -package org.apache.beam.runners.fnexecution; +package org.apache.beam.sdk.fn.server; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.BindableService; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.BindableService; /** An interface sharing common behavior with services used during execution of user Fns. */ public interface FnService extends AutoCloseable, BindableService { @@ -26,8 +26,8 @@ public interface FnService extends AutoCloseable, BindableService { * *

    There should be no more calls to any service method by the time a call to {@link #close()} * begins. Specifically, this means that a {@link - * org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Server} that this service is bound to should have - * completed a call to the {@link org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Server#shutdown()} + * org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Server} that this service is bound to should have + * completed a call to the {@link org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Server#shutdown()} * method, and all future incoming calls will be rejected. */ @Override diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/GrpcContextHeaderAccessorProvider.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/server/GrpcContextHeaderAccessorProvider.java similarity index 82% rename from runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/GrpcContextHeaderAccessorProvider.java rename to sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/server/GrpcContextHeaderAccessorProvider.java index fdfcc477b676..738f7b0eebe7 100644 --- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/GrpcContextHeaderAccessorProvider.java +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/server/GrpcContextHeaderAccessorProvider.java @@ -15,16 +15,16 @@ * See the License for the specific language governing permissions and * limitations under the License. */ -package org.apache.beam.runners.fnexecution; +package org.apache.beam.sdk.fn.server; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Context; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Contexts; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Metadata; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Metadata.Key; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ServerCall; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ServerCall.Listener; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ServerCallHandler; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ServerInterceptor; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Context; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Contexts; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Metadata; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Metadata.Key; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ServerCall; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ServerCall.Listener; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ServerCallHandler; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ServerInterceptor; /** * A HeaderAccessorProvider which intercept the header in a GRPC request and expose the relevant diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/GrpcFnServer.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/server/GrpcFnServer.java similarity index 98% rename from runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/GrpcFnServer.java rename to sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/server/GrpcFnServer.java index 7e63fc934b69..3eab710d6071 100644 --- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/GrpcFnServer.java +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/server/GrpcFnServer.java @@ -15,7 +15,7 @@ * See the License for the specific language governing permissions and * limitations under the License. 
*/ -package org.apache.beam.runners.fnexecution; +package org.apache.beam.sdk.fn.server; import java.io.IOException; import java.util.Collections; @@ -23,7 +23,7 @@ import java.util.concurrent.TimeUnit; import java.util.concurrent.atomic.AtomicInteger; import org.apache.beam.model.pipeline.v1.Endpoints.ApiServiceDescriptor; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Server; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Server; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/HeaderAccessor.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/server/HeaderAccessor.java similarity index 95% rename from runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/HeaderAccessor.java rename to sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/server/HeaderAccessor.java index 89b5ffea7866..037ab0fb32ae 100644 --- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/HeaderAccessor.java +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/server/HeaderAccessor.java @@ -15,7 +15,7 @@ * See the License for the specific language governing permissions and * limitations under the License. */ -package org.apache.beam.runners.fnexecution; +package org.apache.beam.sdk.fn.server; /** Interface to access headers in the client request. */ public interface HeaderAccessor { diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/InProcessServerFactory.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/server/InProcessServerFactory.java similarity index 89% rename from runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/InProcessServerFactory.java rename to sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/server/InProcessServerFactory.java index e72b0bcf77cb..9bec6ae5eb1a 100644 --- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/InProcessServerFactory.java +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/server/InProcessServerFactory.java @@ -15,16 +15,16 @@ * See the License for the specific language governing permissions and * limitations under the License. 
*/ -package org.apache.beam.runners.fnexecution; +package org.apache.beam.sdk.fn.server; import java.io.IOException; import java.util.List; import java.util.concurrent.atomic.AtomicInteger; import org.apache.beam.model.pipeline.v1.Endpoints.ApiServiceDescriptor; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.BindableService; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Server; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ServerInterceptors; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.inprocess.InProcessServerBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.BindableService; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Server; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ServerInterceptors; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessServerBuilder; /** * A {@link ServerFactory} which creates {@link Server servers} with the {@link diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/ServerFactory.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/server/ServerFactory.java similarity index 94% rename from runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/ServerFactory.java rename to sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/server/ServerFactory.java index c1b842d46a4a..fb348481d362 100644 --- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/ServerFactory.java +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/server/ServerFactory.java @@ -15,7 +15,7 @@ * See the License for the specific language governing permissions and * limitations under the License. */ -package org.apache.beam.runners.fnexecution; +package org.apache.beam.sdk.fn.server; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; @@ -29,16 +29,16 @@ import java.util.function.Supplier; import org.apache.beam.model.pipeline.v1.Endpoints; import org.apache.beam.sdk.fn.channel.SocketAddressFactory; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.BindableService; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Server; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ServerBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ServerInterceptors; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.netty.NettyServerBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.netty.channel.epoll.EpollEventLoopGroup; -import org.apache.beam.vendor.grpc.v1p26p0.io.netty.channel.epoll.EpollServerDomainSocketChannel; -import org.apache.beam.vendor.grpc.v1p26p0.io.netty.channel.epoll.EpollServerSocketChannel; -import org.apache.beam.vendor.grpc.v1p26p0.io.netty.channel.unix.DomainSocketAddress; -import org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.internal.ThreadLocalRandom; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.BindableService; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Server; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ServerBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ServerInterceptors; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.netty.NettyServerBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.netty.channel.epoll.EpollEventLoopGroup; +import org.apache.beam.vendor.grpc.v1p36p0.io.netty.channel.epoll.EpollServerDomainSocketChannel; +import org.apache.beam.vendor.grpc.v1p36p0.io.netty.channel.epoll.EpollServerSocketChannel; +import 
org.apache.beam.vendor.grpc.v1p36p0.io.netty.channel.unix.DomainSocketAddress; +import org.apache.beam.vendor.grpc.v1p36p0.io.netty.util.internal.ThreadLocalRandom; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.net.HostAndPort; /** A {@link Server gRPC server} factory. */ diff --git a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/server/package-info.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/server/package-info.java new file mode 100644 index 000000000000..a303544d207b --- /dev/null +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/server/package-info.java @@ -0,0 +1,19 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +/** gPRC server factory. */ +package org.apache.beam.sdk.fn.server; diff --git a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/BufferingStreamObserver.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/BufferingStreamObserver.java index 02abd388f7fb..4ec5f9e71921 100644 --- a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/BufferingStreamObserver.java +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/BufferingStreamObserver.java @@ -27,8 +27,8 @@ import java.util.concurrent.Phaser; import java.util.concurrent.TimeUnit; import javax.annotation.concurrent.ThreadSafe; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.CallStreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.CallStreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; /** diff --git a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/DataStreams.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/DataStreams.java index f18921174fe0..f4ab8bb73e64 100644 --- a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/DataStreams.java +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/DataStreams.java @@ -22,19 +22,16 @@ import java.io.IOException; import java.io.InputStream; import java.io.OutputStream; -import java.io.PushbackInputStream; import java.util.Iterator; import java.util.NoSuchElementException; import java.util.concurrent.BlockingQueue; import org.apache.beam.sdk.coders.Coder; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteStreams; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.CountingInputStream; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; /** - * 
{@link #inbound(Iterator)} treats multiple {@link ByteString}s as a single input stream and - * {@link #outbound(OutputChunkConsumer)} treats a single {@link OutputStream} as multiple {@link - * ByteString}s. + * {@link DataStreamDecoder} treats multiple {@link ByteString}s as a single input stream decoding + * values with the supplied iterator. {@link #outbound(OutputChunkConsumer)} treats a single {@link + * OutputStream} as multiple {@link ByteString}s. */ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) @@ -43,19 +40,6 @@ public class DataStreams { public static final int DEFAULT_OUTBOUND_BUFFER_LIMIT_BYTES = 1_000_000; - /** - * Converts multiple {@link ByteString}s into a single {@link InputStream}. - * - *

    The iterator is accessed lazily. The supplied {@link Iterator} should block until either it - * knows that no more values will be provided or it has the next {@link ByteString}. - * - *

    Note that this {@link InputStream} follows the Beam Fn API specification for forcing values - * that decode consuming zero bytes to consuming exactly one byte. - */ - public static InputStream inbound(Iterator bytes) { - return new Inbound(bytes); - } - /** * Converts a single element delimited {@link OutputStream} into multiple {@link ByteString * ByteStrings}. @@ -172,66 +156,20 @@ public interface OutputChunkConsumer { } /** - * An input stream which concatenates multiple {@link ByteString}s. Lazily accesses the first - * {@link Iterator} on first access of this input stream. - * - *

    Closing this input stream has no effect. - */ - private static class Inbound extends InputStream { - private static final InputStream EMPTY_STREAM = - new InputStream() { - @Override - public int read() throws IOException { - return -1; - } - }; - - private final Iterator bytes; - private InputStream currentStream; - - public Inbound(Iterator bytes) { - this.currentStream = EMPTY_STREAM; - this.bytes = bytes; - } - - @Override - public int read() throws IOException { - int rval = -1; - // Move on to the next stream if we have read nothing - while ((rval = currentStream.read()) == -1 && bytes.hasNext()) { - currentStream = bytes.next().newInput(); - } - return rval; - } - - @Override - public int read(byte[] b, int off, int len) throws IOException { - int remainingLen = len; - while ((remainingLen -= - ByteStreams.read(currentStream, b, off + len - remainingLen, remainingLen)) - > 0) { - if (bytes.hasNext()) { - currentStream = bytes.next().newInput(); - } else { - int bytesRead = len - remainingLen; - return bytesRead > 0 ? bytesRead : -1; - } - } - return len - remainingLen; - } - } - - /** - * An adapter which converts an {@link InputStream} to an {@link Iterator} of {@code T} values - * using the specified {@link Coder}. + * An adapter which converts an {@link InputStream} to a {@link PrefetchableIterator} of {@code T} + * values using the specified {@link Coder}. * *

    Note that this adapter follows the Beam Fn API specification for forcing values that decode * consuming zero bytes to consuming exactly one byte. * *

    Note that access to the underlying {@link InputStream} is lazy and will only be invoked on - * first access to {@link #next()} or {@link #hasNext()}. + * first access to {@link #next}, {@link #hasNext}, {@link #isReady}, and {@link #prefetch}. + * + *

    Note that {@link #isReady} and {@link #prefetch} rely on non-empty {@link ByteString}s being + * returned via the underlying {@link PrefetchableIterator} otherwise the {@link #prefetch} will + * seemingly make zero progress yet will actually advance through the empty pages. */ - public static class DataStreamDecoder implements Iterator { + public static class DataStreamDecoder implements PrefetchableIterator { private enum State { READ_REQUIRED, @@ -239,17 +177,42 @@ private enum State { EOF } - private final CountingInputStream countingInputStream; - private final PushbackInputStream pushbackInputStream; + private final PrefetchableIterator inputByteStrings; + private final Inbound inbound; private final Coder coder; private State currentState; private T next; - public DataStreamDecoder(Coder coder, InputStream inputStream) { + public DataStreamDecoder(Coder coder, PrefetchableIterator inputStream) { this.currentState = State.READ_REQUIRED; this.coder = coder; - this.pushbackInputStream = new PushbackInputStream(inputStream, 1); - this.countingInputStream = new CountingInputStream(pushbackInputStream); + this.inputByteStrings = inputStream; + this.inbound = new Inbound(); + } + + @Override + public boolean isReady() { + switch (currentState) { + case EOF: + return true; + case READ_REQUIRED: + try { + return inbound.isReady(); + } catch (IOException e) { + throw new RuntimeException(e); + } + case HAS_NEXT: + return true; + default: + throw new IllegalStateException(String.format("Unknown state %s", currentState)); + } + } + + @Override + public void prefetch() { + if (!isReady()) { + inputByteStrings.prefetch(); + } } @Override @@ -259,18 +222,16 @@ public boolean hasNext() { return false; case READ_REQUIRED: try { - int nextByte = pushbackInputStream.read(); - if (nextByte == -1) { + if (inbound.isEof()) { currentState = State.EOF; return false; } - pushbackInputStream.unread(nextByte); - long count = countingInputStream.getCount(); - next = coder.decode(countingInputStream); + long previousPosition = inbound.position; + next = coder.decode(inbound); // Skip one byte if decoding the value consumed 0 bytes. - if (countingInputStream.getCount() - count == 0) { - checkState(countingInputStream.read() != -1, "Unexpected EOF reached"); + if (inbound.position - previousPosition == 0) { + checkState(inbound.read() != -1, "Unexpected EOF reached"); } currentState = State.HAS_NEXT; } catch (IOException e) { @@ -296,6 +257,88 @@ public T next() { public void remove() { throw new UnsupportedOperationException(); } + + private static final InputStream EMPTY_STREAM = ByteString.EMPTY.newInput(); + + /** + * An input stream which concatenates multiple {@link ByteString}s. Lazily accesses the {@link + * Iterator} on first access of this input stream. + * + *

    Closing this input stream has no effect. + */ + private class Inbound extends InputStream { + private long position; + private InputStream currentStream; + + public Inbound() { + this.currentStream = EMPTY_STREAM; + } + + public boolean isReady() throws IOException { + // Note that ByteString#newInput is guaranteed to return the length of the entire ByteString + // minus the number of bytes that have been read so far and can be reliably used to tell + // us whether we are at the end of the stream. + while (currentStream.available() == 0) { + if (!inputByteStrings.isReady()) { + return false; + } + if (!inputByteStrings.hasNext()) { + return true; + } + currentStream = inputByteStrings.next().newInput(); + } + return true; + } + + public boolean isEof() throws IOException { + // Note that ByteString#newInput is guaranteed to return the length of the entire ByteString + // minus the number of bytes that have been read so far and can be reliably used to tell + // us whether we are at the end of the stream. + while (currentStream.available() == 0) { + if (!inputByteStrings.hasNext()) { + return true; + } + currentStream = inputByteStrings.next().newInput(); + } + return false; + } + + @Override + public int read() throws IOException { + int read; + // Move on to the next stream if this stream is done + while ((read = currentStream.read()) == -1) { + if (!inputByteStrings.hasNext()) { + return -1; + } + currentStream = inputByteStrings.next().newInput(); + } + position += 1; + return read; + } + + @Override + public int read(byte[] b, int off, int len) throws IOException { + int remainingLen = len; + while (remainingLen > 0) { + int read; + // Move on to the next stream if this stream is done. Note that ByteString.newInput + // guarantees that read will consume the entire ByteString if the passed in length is + // greater than or equal to the remaining amount. + while ((read = currentStream.read(b, off + len - remainingLen, remainingLen)) == -1) { + if (!inputByteStrings.hasNext()) { + int bytesRead = len - remainingLen; + position += bytesRead; + return bytesRead > 0 ? 
bytesRead : -1; + } + currentStream = inputByteStrings.next().newInput(); + } + remainingLen -= read; + } + position += len; + return len; + } + } } /** diff --git a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/DirectStreamObserver.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/DirectStreamObserver.java index f60ea4b1ff31..d5c10685927e 100644 --- a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/DirectStreamObserver.java +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/DirectStreamObserver.java @@ -20,9 +20,10 @@ import java.util.concurrent.Phaser; import java.util.concurrent.TimeUnit; import java.util.concurrent.TimeoutException; +import java.util.concurrent.atomic.AtomicInteger; import javax.annotation.concurrent.ThreadSafe; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.CallStreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.CallStreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -44,7 +45,7 @@ public final class DirectStreamObserver implements StreamObserver { private final CallStreamObserver outboundObserver; private final int maxMessagesBeforeCheck; - private int numberOfMessagesBeforeReadyCheck; + private AtomicInteger numMessages = new AtomicInteger(); public DirectStreamObserver(Phaser phaser, CallStreamObserver outboundObserver) { this(phaser, outboundObserver, DEFAULT_MAX_MESSAGES_BEFORE_CHECK); @@ -59,9 +60,8 @@ public DirectStreamObserver(Phaser phaser, CallStreamObserver outboundObserve @Override public void onNext(T value) { - numberOfMessagesBeforeReadyCheck += 1; - if (numberOfMessagesBeforeReadyCheck >= maxMessagesBeforeCheck) { - numberOfMessagesBeforeReadyCheck = 0; + if (maxMessagesBeforeCheck <= 1 + || numMessages.incrementAndGet() % maxMessagesBeforeCheck == 0) { int waitTime = 1; int totalTimeWaited = 0; int phase = phaser.getPhase(); diff --git a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/ForwardingClientResponseObserver.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/ForwardingClientResponseObserver.java index 016cb1109eec..43570188d054 100644 --- a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/ForwardingClientResponseObserver.java +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/ForwardingClientResponseObserver.java @@ -17,9 +17,9 @@ */ package org.apache.beam.sdk.fn.stream; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.ClientCallStreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.ClientResponseObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.ClientCallStreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.ClientResponseObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; /** * A {@link ClientResponseObserver} which delegates all {@link StreamObserver} calls. 
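A minimal, self-contained sketch of the flow-control pattern the `DirectStreamObserver` hunk above switches to (the class and method names below are illustrative only, not part of this patch): every `onNext` call increments a shared `AtomicInteger`, and the comparatively expensive `isReady()` poll on the outbound observer is only taken every `maxMessagesBeforeCheck` messages, or on every message when the threshold is 1 or less.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper mirroring the gating condition used in the patched onNext():
//   maxMessagesBeforeCheck <= 1 || numMessages.incrementAndGet() % maxMessagesBeforeCheck == 0
// Using an AtomicInteger lets concurrent senders share one counter safely.
class PeriodicReadinessCheck {
  private final int maxMessagesBeforeCheck;
  private final AtomicInteger numMessages = new AtomicInteger();

  PeriodicReadinessCheck(int maxMessagesBeforeCheck) {
    this.maxMessagesBeforeCheck = maxMessagesBeforeCheck;
  }

  /** Returns true when the caller should poll (and possibly wait for) outbound readiness. */
  boolean shouldCheckReadiness() {
    return maxMessagesBeforeCheck <= 1
        || numMessages.incrementAndGet() % maxMessagesBeforeCheck == 0;
  }
}
```

The modulo check keeps the amortized cost of readiness polling constant regardless of message volume, which is why the patch replaces the reset-to-zero counter with a monotonically increasing one.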
diff --git a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/OutboundObserverFactory.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/OutboundObserverFactory.java index 6693fee943ec..0467cfd8b7ee 100644 --- a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/OutboundObserverFactory.java +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/OutboundObserverFactory.java @@ -18,8 +18,8 @@ package org.apache.beam.sdk.fn.stream; import java.util.concurrent.ExecutorService; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.CallStreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.CallStreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; /** * Creates factories which determine an underlying {@link StreamObserver} implementation to use in diff --git a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/PrefetchableIterable.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/PrefetchableIterable.java new file mode 100644 index 000000000000..5700f6c41a48 --- /dev/null +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/PrefetchableIterable.java @@ -0,0 +1,25 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.fn.stream; + +/** An {@link Iterable} that returns {@link PrefetchableIterator}s. */ +public interface PrefetchableIterable extends Iterable { + + @Override + PrefetchableIterator iterator(); +} diff --git a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/PrefetchableIterables.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/PrefetchableIterables.java new file mode 100644 index 000000000000..489c727fb05a --- /dev/null +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/PrefetchableIterables.java @@ -0,0 +1,152 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.fn.stream; + +import static org.apache.beam.sdk.fn.stream.PrefetchableIterators.emptyIterator; + +import java.util.NoSuchElementException; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.FluentIterable; + +/** + * This class contains static utility functions that operate on or return objects of type {@link + * PrefetchableIterable}. + */ +public class PrefetchableIterables { + + private static final PrefetchableIterable EMPTY_ITERABLE = + new PrefetchableIterable() { + @Override + public PrefetchableIterator iterator() { + return emptyIterator(); + } + }; + + /** Returns an empty {@link PrefetchableIterable}. */ + public static PrefetchableIterable emptyIterable() { + return (PrefetchableIterable) EMPTY_ITERABLE; + } + + /** + * Returns a {@link PrefetchableIterable} over the specified values. + * + *
<p>
    {@link PrefetchableIterator#prefetch()} is a no-op and {@link + * PrefetchableIterator#isReady()} always returns true. + */ + public static PrefetchableIterable fromArray(T... values) { + if (values.length == 0) { + return emptyIterable(); + } + return new PrefetchableIterable() { + @Override + public PrefetchableIterator iterator() { + return PrefetchableIterators.fromArray(values); + } + }; + } + + /** + * Converts the {@link Iterable} into a {@link PrefetchableIterable}. + * + *
<p>
    If the {@link Iterable#iterator} does not return {@link PrefetchableIterator}s then one is + * constructed that ensures that {@link PrefetchableIterator#prefetch()} is a no-op and {@link + * PrefetchableIterator#isReady()} always returns true. + */ + private static PrefetchableIterable maybePrefetchable(Iterable iterable) { + if (iterable instanceof PrefetchableIterable) { + return (PrefetchableIterable) iterable; + } + return new PrefetchableIterable() { + @Override + public PrefetchableIterator iterator() { + return PrefetchableIterators.maybePrefetchable(iterable.iterator()); + } + }; + } + + /** + * Concatentates the {@link Iterable}s. + * + *
<p>
    See {@link PrefetchableIterators#concat} for additional details. + */ + public static PrefetchableIterable concat(Iterable... iterables) { + for (int i = 0; i < iterables.length; ++i) { + if (iterables[i] == null) { + throw new IllegalArgumentException("Iterable at position " + i + " was null."); + } + } + if (iterables.length == 0) { + return emptyIterable(); + } else if (iterables.length == 1) { + return maybePrefetchable(iterables[0]); + } + return new PrefetchableIterable() { + @SuppressWarnings("methodref.receiver.invalid") + @Override + public PrefetchableIterator iterator() { + return PrefetchableIterators.concatIterators( + FluentIterable.from(iterables).transform(Iterable::iterator).iterator()); + } + }; + } + + /** Limits the {@link PrefetchableIterable} to the specified number of elements. */ + public static PrefetchableIterable limit(Iterable iterable, int limit) { + PrefetchableIterable prefetchableIterable = maybePrefetchable(iterable); + return new PrefetchableIterable() { + @Override + public PrefetchableIterator iterator() { + return new PrefetchableIterator() { + PrefetchableIterator delegate = prefetchableIterable.iterator(); + int currentPosition; + + @Override + public boolean isReady() { + if (currentPosition < limit) { + return delegate.isReady(); + } + return true; + } + + @Override + public void prefetch() { + if (!isReady()) { + delegate.prefetch(); + } + } + + @Override + public boolean hasNext() { + if (currentPosition != limit) { + return delegate.hasNext(); + } + return false; + } + + @Override + public T next() { + if (!hasNext()) { + throw new NoSuchElementException(); + } + currentPosition += 1; + return delegate.next(); + } + }; + } + }; + } +} diff --git a/runners/flink/1.10/src/main/java/org/apache/beam/runners/flink/FlinkCapabilities.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/PrefetchableIterator.java similarity index 60% rename from runners/flink/1.10/src/main/java/org/apache/beam/runners/flink/FlinkCapabilities.java rename to sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/PrefetchableIterator.java index 1b56c72946a9..0a3226091066 100644 --- a/runners/flink/1.10/src/main/java/org/apache/beam/runners/flink/FlinkCapabilities.java +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/PrefetchableIterator.java @@ -15,20 +15,22 @@ * See the License for the specific language governing permissions and * limitations under the License. */ -package org.apache.beam.runners.flink; +package org.apache.beam.sdk.fn.stream; -/** Handle different capabilities between flink versions. */ -public class FlinkCapabilities { +import java.util.Iterator; + +/** {@link Iterator} that supports prefetching the next set of records. */ +public interface PrefetchableIterator extends Iterator { + + /** + * Returns {@code true} if and only if {@link #hasNext} and {@link #next} will not require an + * expensive operation. + */ + boolean isReady(); /** - * Support for outputting elements in close method of chained drivers. - * - *
<p>
    {@see FLINK-14709} for more - * details. - * - * @return True if feature is supported. + * If not {@link #isReady}, schedules the next expensive operation such that at some point in time + * in the future {@link #isReady} will return true. */ - public static boolean supportsOutputDuringClosing() { - return true; - } + void prefetch(); } diff --git a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/PrefetchableIterators.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/PrefetchableIterators.java new file mode 100644 index 000000000000..b8ea509477ca --- /dev/null +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/PrefetchableIterators.java @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.fn.stream; + +import java.util.Arrays; +import java.util.Iterator; +import java.util.NoSuchElementException; + +public class PrefetchableIterators { + + private static final PrefetchableIterator EMPTY_ITERATOR = + new PrefetchableIterator() { + @Override + public boolean isReady() { + return true; + } + + @Override + public void prefetch() {} + + @Override + public boolean hasNext() { + return false; + } + + @Override + public Object next() { + throw new NoSuchElementException(); + } + }; + + /** Returns an empty {@link PrefetchableIterator}. */ + public static PrefetchableIterator emptyIterator() { + return (PrefetchableIterator) EMPTY_ITERATOR; + } + + /** + * Returns a {@link PrefetchableIterator} over the specified values. + * + *
<p>
    {@link PrefetchableIterator#prefetch()} is a no-op and {@link + * PrefetchableIterator#isReady()} always returns true. + */ + public static PrefetchableIterator fromArray(T... values) { + if (values.length == 0) { + return emptyIterator(); + } + return new PrefetchableIterator() { + int currentIndex; + + @Override + public boolean isReady() { + return true; + } + + @Override + public void prefetch() {} + + @Override + public boolean hasNext() { + return currentIndex < values.length; + } + + @Override + public T next() { + if (!hasNext()) { + throw new NoSuchElementException(); + } + return values[currentIndex++]; + } + }; + } + + /** + * If the {@link Iterator} is not a {@link PrefetchableIterator} then one is constructed that + * ensures {@link PrefetchableIterator#prefetch} is a no-op and {@link + * PrefetchableIterator#isReady} always returns true. + */ + // package private for PrefetchableIterables. + static PrefetchableIterator maybePrefetchable(Iterator iterator) { + if (iterator == null) { + throw new IllegalArgumentException("Expected non-null iterator."); + } + if (iterator instanceof PrefetchableIterator) { + return (PrefetchableIterator) iterator; + } + return new PrefetchableIterator() { + @Override + public boolean isReady() { + return true; + } + + @Override + public void prefetch() {} + + @Override + public boolean hasNext() { + return iterator.hasNext(); + } + + @Override + public T next() { + return iterator.next(); + } + }; + } + + public static PrefetchableIterator concatIterators(Iterator> iterators) { + if (!iterators.hasNext()) { + return emptyIterator(); + } + return new PrefetchableIterator() { + PrefetchableIterator delegate = maybePrefetchable(iterators.next()); + + @Override + public boolean isReady() { + // Ensure that we advance from iterators that don't have the next + // element to an iterator that supports prefetch or does have an element + for (; ; ) { + // If the delegate isn't ready then we aren't ready. + // We assume that non prefetchable iterators are always ready. + if (!delegate.isReady()) { + return false; + } + + // If the delegate has a next and is ready then we are ready + if (delegate.hasNext()) { + return true; + } + + // Otherwise we should advance to the next index since we know this iterator is empty + // and re-evaluate whether we are ready + if (!iterators.hasNext()) { + return true; + } + delegate = maybePrefetchable(iterators.next()); + } + } + + @Override + public void prefetch() { + if (!isReady()) { + delegate.prefetch(); + } + } + + @Override + public boolean hasNext() { + for (; ; ) { + if (delegate.hasNext()) { + return true; + } + if (!iterators.hasNext()) { + return false; + } + delegate = maybePrefetchable(iterators.next()); + } + } + + @Override + public T next() { + if (!hasNext()) { + throw new NoSuchElementException(); + } + return delegate.next(); + } + }; + } + + /** + * Concatentates the {@link Iterator}s. + * + *
<p>
    {@link Iterable}s are first converted into a {@link PrefetchableIterable} via {@link + * #maybePrefetchable}. + * + *
<p>
    The returned {@link PrefetchableIterable} ensures that iterators which are returned + * guarantee that {@link PrefetchableIterator#isReady} always advances till it finds an {@link + * Iterable} that is not {@link PrefetchableIterator#isReady}. {@link + * PrefetchableIterator#prefetch} is also guaranteed to advance past empty iterators till it finds + * one that is not ready. + */ + public static PrefetchableIterator concat(Iterator... iterators) { + for (int i = 0; i < iterators.length; ++i) { + if (iterators[i] == null) { + throw new IllegalArgumentException("Iterator at position " + i + " was null."); + } + } + if (iterators.length == 0) { + return emptyIterator(); + } else if (iterators.length == 1) { + return maybePrefetchable(iterators[0]); + } + return concatIterators(Arrays.asList(iterators).iterator()); + } +} diff --git a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/SynchronizedStreamObserver.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/SynchronizedStreamObserver.java index 31e9af2663bc..d6a7a1e62e9e 100644 --- a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/SynchronizedStreamObserver.java +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/SynchronizedStreamObserver.java @@ -17,7 +17,7 @@ */ package org.apache.beam.sdk.fn.stream; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; /** * A {@link StreamObserver} which provides synchronous access access to an underlying {@link diff --git a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/test/InProcessManagedChannelFactory.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/test/InProcessManagedChannelFactory.java index aad1fd1aa2e5..406561f361f2 100644 --- a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/test/InProcessManagedChannelFactory.java +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/test/InProcessManagedChannelFactory.java @@ -19,8 +19,8 @@ import org.apache.beam.model.pipeline.v1.Endpoints.ApiServiceDescriptor; import org.apache.beam.sdk.fn.channel.ManagedChannelFactory; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ManagedChannelBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.inprocess.InProcessChannelBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ManagedChannelBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessChannelBuilder; /** * A {@link ManagedChannelFactory} that uses in-process channels. diff --git a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/test/TestStreams.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/test/TestStreams.java index cd8b9773d3a0..9279a12f99e7 100644 --- a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/test/TestStreams.java +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/test/TestStreams.java @@ -19,8 +19,8 @@ import java.util.function.Consumer; import java.util.function.Supplier; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.CallStreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.CallStreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; /** Utility methods which enable testing of {@link StreamObserver}s. 
*/ public class TestStreams { diff --git a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/windowing/EncodedBoundedWindow.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/windowing/EncodedBoundedWindow.java index 3e9b21c55341..5a7da5ffd4e2 100644 --- a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/windowing/EncodedBoundedWindow.java +++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/windowing/EncodedBoundedWindow.java @@ -25,7 +25,7 @@ import org.apache.beam.sdk.coders.CoderException; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.util.VarInt; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteStreams; import org.joda.time.Instant; diff --git a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/IdGeneratorsTest.java b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/IdGeneratorsTest.java index 8dfbf17956be..c119a5f85ecd 100644 --- a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/IdGeneratorsTest.java +++ b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/IdGeneratorsTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.fn; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import org.junit.Test; import org.junit.runner.RunWith; diff --git a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/JvmInitializersTest.java b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/JvmInitializersTest.java index 454357f5ee5a..a20b3a981c77 100644 --- a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/JvmInitializersTest.java +++ b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/JvmInitializersTest.java @@ -31,9 +31,6 @@ /** Tests for {@link JvmInitializers}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public final class JvmInitializersTest { private static Boolean onStartupRan; diff --git a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/channel/ManagedChannelFactoryTest.java b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/channel/ManagedChannelFactoryTest.java index fc0c8138f8b0..47d8a53c346b 100644 --- a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/channel/ManagedChannelFactoryTest.java +++ b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/channel/ManagedChannelFactoryTest.java @@ -18,10 +18,12 @@ package org.apache.beam.sdk.fn.channel; import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertTrue; import static org.junit.Assume.assumeTrue; import org.apache.beam.model.pipeline.v1.Endpoints; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ManagedChannel; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ManagedChannel; +import org.apache.commons.lang3.SystemUtils; import org.junit.Rule; import org.junit.Test; import org.junit.rules.TemporaryFolder; @@ -45,7 +47,8 @@ public void testDefaultChannel() { @Test public void testEpollHostPortChannel() { - assumeTrue(org.apache.beam.vendor.grpc.v1p26p0.io.netty.channel.epoll.Epoll.isAvailable()); + assumeTrue(SystemUtils.IS_OS_LINUX); + assertTrue(org.apache.beam.vendor.grpc.v1p36p0.io.netty.channel.epoll.Epoll.isAvailable()); Endpoints.ApiServiceDescriptor apiServiceDescriptor = Endpoints.ApiServiceDescriptor.newBuilder().setUrl("localhost:123").build(); ManagedChannel channel = @@ -56,7 +59,8 @@ public void testEpollHostPortChannel() { @Test public void testEpollDomainSocketChannel() throws Exception { - assumeTrue(org.apache.beam.vendor.grpc.v1p26p0.io.netty.channel.epoll.Epoll.isAvailable()); + assumeTrue(SystemUtils.IS_OS_LINUX); + assertTrue(org.apache.beam.vendor.grpc.v1p36p0.io.netty.channel.epoll.Epoll.isAvailable()); Endpoints.ApiServiceDescriptor apiServiceDescriptor = Endpoints.ApiServiceDescriptor.newBuilder() .setUrl("unix://" + tmpFolder.newFile().getAbsolutePath()) diff --git a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/channel/SocketAddressFactoryTest.java b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/channel/SocketAddressFactoryTest.java index 91c1e170de96..b464dcfeb41d 100644 --- a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/channel/SocketAddressFactoryTest.java +++ b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/channel/SocketAddressFactoryTest.java @@ -17,13 +17,13 @@ */ package org.apache.beam.sdk.fn.channel; +import static org.hamcrest.MatcherAssert.assertThat; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import java.io.File; import java.net.InetSocketAddress; import java.net.SocketAddress; -import org.apache.beam.vendor.grpc.v1p26p0.io.netty.channel.unix.DomainSocketAddress; +import org.apache.beam.vendor.grpc.v1p36p0.io.netty.channel.unix.DomainSocketAddress; import org.hamcrest.Matchers; import org.junit.Rule; import org.junit.Test; diff --git a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/data/BeamFnDataGrpcMultiplexerTest.java b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/data/BeamFnDataGrpcMultiplexerTest.java index f538158a05c3..e14d44858a30 100644 --- a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/data/BeamFnDataGrpcMultiplexerTest.java +++ 
b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/data/BeamFnDataGrpcMultiplexerTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.sdk.fn.data; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.util.ArrayList; @@ -32,14 +32,11 @@ import org.apache.beam.sdk.fn.stream.OutboundObserverFactory; import org.apache.beam.sdk.fn.test.TestStreams; import org.apache.beam.sdk.values.KV; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.Uninterruptibles; import org.junit.Test; /** Tests for {@link BeamFnDataGrpcMultiplexer}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamFnDataGrpcMultiplexerTest { private static final Endpoints.ApiServiceDescriptor DESCRIPTOR = Endpoints.ApiServiceDescriptor.newBuilder().setUrl("test").build(); diff --git a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/data/BeamFnDataSizeBasedBufferingOutboundObserverTest.java b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/data/BeamFnDataSizeBasedBufferingOutboundObserverTest.java index f4f4641555e4..e4b9ba978ace 100644 --- a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/data/BeamFnDataSizeBasedBufferingOutboundObserverTest.java +++ b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/data/BeamFnDataSizeBasedBufferingOutboundObserverTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.sdk.fn.data; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.empty; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import java.io.IOException; @@ -38,7 +38,7 @@ import org.apache.beam.sdk.options.ExperimentalOptions; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.PipelineOptionsFactory; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.junit.Test; import org.junit.runner.RunWith; import org.junit.runners.Parameterized; @@ -46,9 +46,6 @@ /** Tests for {@link BeamFnDataSizeBasedBufferingOutboundObserver}. 
*/ @RunWith(Parameterized.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamFnDataSizeBasedBufferingOutboundObserverTest { private static final LogicalEndpoint DATA_OUTPUT_LOCATION = LogicalEndpoint.data("777L", "555L"); private static final LogicalEndpoint TIMER_OUTPUT_LOCATION = diff --git a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/data/BeamFnDataTimeBasedBufferingOutboundObserverTest.java b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/data/BeamFnDataTimeBasedBufferingOutboundObserverTest.java index e985b75b68a4..89a1e86c1818 100644 --- a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/data/BeamFnDataTimeBasedBufferingOutboundObserverTest.java +++ b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/data/BeamFnDataTimeBasedBufferingOutboundObserverTest.java @@ -36,7 +36,7 @@ import org.apache.beam.sdk.options.ExperimentalOptions; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.PipelineOptionsFactory; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.junit.Test; import org.junit.runner.RunWith; import org.junit.runners.Parameterized; @@ -44,9 +44,6 @@ /** Tests for {@link BeamFnDataTimeBasedBufferingOutboundObserver}. */ @RunWith(Parameterized.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamFnDataTimeBasedBufferingOutboundObserverTest { private static final LogicalEndpoint DATA_OUTPUT_LOCATION = LogicalEndpoint.data("777L", "555L"); private static final LogicalEndpoint TIMER_OUTPUT_LOCATION = diff --git a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/data/CompletableFutureInboundDataClientTest.java b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/data/CompletableFutureInboundDataClientTest.java index d9fa3fb27b80..b11dc5d78db5 100644 --- a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/data/CompletableFutureInboundDataClientTest.java +++ b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/data/CompletableFutureInboundDataClientTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.sdk.fn.data; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.is; import static org.hamcrest.Matchers.isA; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import java.util.concurrent.CancellationException; diff --git a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/data/RemoteGrpcPortReadTest.java b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/data/RemoteGrpcPortReadTest.java index 5ee09a2816f6..9cf32e29bd9b 100644 --- a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/data/RemoteGrpcPortReadTest.java +++ b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/data/RemoteGrpcPortReadTest.java @@ -17,14 +17,14 @@ */ package org.apache.beam.sdk.fn.data; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import org.apache.beam.model.fnexecution.v1.BeamFnApi.RemoteGrpcPort; import org.apache.beam.model.pipeline.v1.Endpoints.ApiServiceDescriptor; import org.apache.beam.model.pipeline.v1.Endpoints.AuthenticationSpec; import org.apache.beam.model.pipeline.v1.RunnerApi.PTransform; -import 
org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; import org.junit.Test; import org.junit.runner.RunWith; import org.junit.runners.JUnit4; diff --git a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/data/RemoteGrpcPortWriteTest.java b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/data/RemoteGrpcPortWriteTest.java index 200b893a2c12..e3168f080635 100644 --- a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/data/RemoteGrpcPortWriteTest.java +++ b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/data/RemoteGrpcPortWriteTest.java @@ -17,14 +17,14 @@ */ package org.apache.beam.sdk.fn.data; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import org.apache.beam.model.fnexecution.v1.BeamFnApi.RemoteGrpcPort; import org.apache.beam.model.pipeline.v1.Endpoints.ApiServiceDescriptor; import org.apache.beam.model.pipeline.v1.Endpoints.AuthenticationSpec; import org.apache.beam.model.pipeline.v1.RunnerApi.PTransform; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; import org.junit.Test; import org.junit.runner.RunWith; import org.junit.runners.JUnit4; diff --git a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/splittabledofn/RestrictionTrackersTest.java b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/splittabledofn/RestrictionTrackersTest.java index f1be94268626..9a62c0ebf6b9 100644 --- a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/splittabledofn/RestrictionTrackersTest.java +++ b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/splittabledofn/RestrictionTrackersTest.java @@ -17,10 +17,10 @@ */ package org.apache.beam.sdk.fn.splittabledofn; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.instanceOf; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import java.util.ArrayList; import java.util.List; @@ -36,7 +36,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class RestrictionTrackersTest { @Test diff --git a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/stream/AdvancingPhaserTest.java b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/stream/AdvancingPhaserTest.java index f00dc002237c..9221a51de308 100644 --- a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/stream/AdvancingPhaserTest.java +++ b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/stream/AdvancingPhaserTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.sdk.fn.stream; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.collection.IsEmptyCollection.empty; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import java.util.concurrent.ExecutorService; import java.util.concurrent.Executors; diff --git a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/stream/DataStreamsTest.java b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/stream/DataStreamsTest.java index 
83657d2474af..a8b48e844b62 100644 --- a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/stream/DataStreamsTest.java +++ b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/stream/DataStreamsTest.java @@ -17,16 +17,15 @@ */ package org.apache.beam.sdk.fn.stream; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.collection.IsCollectionWithSize.hasSize; import static org.junit.Assert.assertArrayEquals; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; +import static org.junit.Assert.assertTrue; import static org.junit.Assume.assumeTrue; -import java.io.ByteArrayInputStream; -import java.io.ByteArrayOutputStream; import java.io.IOException; import java.util.ArrayList; import java.util.Arrays; @@ -41,48 +40,22 @@ import org.apache.beam.sdk.fn.stream.DataStreams.DataStreamDecoder; import org.apache.beam.sdk.fn.stream.DataStreams.ElementDelimitedOutputStream; import org.apache.beam.sdk.transforms.windowing.GlobalWindow; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterators; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteStreams; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.CountingOutputStream; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.SettableFuture; import org.junit.Rule; import org.junit.Test; +import org.junit.experimental.runners.Enclosed; import org.junit.rules.ExpectedException; import org.junit.runner.RunWith; import org.junit.runners.JUnit4; /** Tests for {@link DataStreams}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) +@RunWith(Enclosed.class) public class DataStreamsTest { - /** Tests for {@link DataStreams.Inbound}. */ - @RunWith(JUnit4.class) - public static class InboundTest { - private static final ByteString BYTES_A = ByteString.copyFromUtf8("TestData"); - private static final ByteString BYTES_B = ByteString.copyFromUtf8("SomeOtherTestData"); - - @Test - public void testEmptyRead() throws Exception { - assertEquals(ByteString.EMPTY, read()); - assertEquals(ByteString.EMPTY, read(ByteString.EMPTY)); - assertEquals(ByteString.EMPTY, read(ByteString.EMPTY, ByteString.EMPTY)); - } - - @Test - public void testRead() throws Exception { - assertEquals(BYTES_A.concat(BYTES_B), read(BYTES_A, BYTES_B)); - assertEquals(BYTES_A.concat(BYTES_B), read(BYTES_A, ByteString.EMPTY, BYTES_B)); - assertEquals(BYTES_A.concat(BYTES_B), read(BYTES_A, BYTES_B, ByteString.EMPTY)); - } - - private static ByteString read(ByteString... bytes) throws IOException { - return ByteString.readFrom(DataStreams.inbound(Arrays.asList(bytes).iterator())); - } - } - /** Tests for {@link DataStreams.BlockingQueueIterator}. 
*/ @RunWith(JUnit4.class) public static class BlockingQueueIteratorTest { @@ -134,7 +107,7 @@ public void testNonEmptyInputStream() throws Exception { } @Test - public void testNonEmptyInputStreamWithZeroLengthCoder() throws Exception { + public void testNonEmptyInputStreamWithZeroLengthEncoding() throws Exception { CountingOutputStream countingOutputStream = new CountingOutputStream(ByteStreams.nullOutputStream()); GlobalWindow.Coder.INSTANCE.encode(GlobalWindow.INSTANCE, countingOutputStream); @@ -143,23 +116,79 @@ public void testNonEmptyInputStreamWithZeroLengthCoder() throws Exception { testDecoderWith(GlobalWindow.Coder.INSTANCE, GlobalWindow.INSTANCE, GlobalWindow.INSTANCE); } + @Test + public void testPrefetch() throws Exception { + List encodings = new ArrayList<>(); + { + ByteString.Output encoding = ByteString.newOutput(); + StringUtf8Coder.of().encode("A", encoding); + StringUtf8Coder.of().encode("BC", encoding); + encodings.add(encoding.toByteString()); + } + encodings.add(ByteString.EMPTY); + { + ByteString.Output encoding = ByteString.newOutput(); + StringUtf8Coder.of().encode("DEF", encoding); + StringUtf8Coder.of().encode("GHIJ", encoding); + encodings.add(encoding.toByteString()); + } + + PrefetchableIteratorsTest.ReadyAfterPrefetchUntilNext iterator = + new PrefetchableIteratorsTest.ReadyAfterPrefetchUntilNext<>(encodings.iterator()); + PrefetchableIterator decoder = + new DataStreamDecoder<>(StringUtf8Coder.of(), iterator); + assertFalse(decoder.isReady()); + decoder.prefetch(); + assertTrue(decoder.isReady()); + assertEquals(1, iterator.getNumPrefetchCalls()); + + decoder.next(); + // Now we will have moved off of the empty byte array that we start with so prefetch will + // do nothing since we are ready + assertTrue(decoder.isReady()); + decoder.prefetch(); + assertEquals(1, iterator.getNumPrefetchCalls()); + + decoder.next(); + // Now we are at the end of the first ByteString so we expect a prefetch to pass through + assertFalse(decoder.isReady()); + decoder.prefetch(); + assertEquals(2, iterator.getNumPrefetchCalls()); + // We also expect the decoder to not be ready since the next byte string is empty which + // would require us to move to the next page. This typically wouldn't happen in practice + // though because we expect non empty pages. + assertFalse(decoder.isReady()); + + // Prefetching will allow us to move to the third ByteString + decoder.prefetch(); + assertEquals(3, iterator.getNumPrefetchCalls()); + assertTrue(decoder.isReady()); + } + private void testDecoderWith(Coder coder, T... 
expected) throws IOException { - ByteArrayOutputStream baos = new ByteArrayOutputStream(); + ByteString.Output output = ByteString.newOutput(); for (T value : expected) { - int size = baos.size(); - coder.encode(value, baos); + int size = output.size(); + coder.encode(value, output); // Pad an arbitrary byte when values encode to zero bytes - if (baos.size() - size == 0) { - baos.write(0); + if (output.size() - size == 0) { + output.write(0); } } + testDecoderWith(coder, expected, Arrays.asList(output.toByteString())); + testDecoderWith(coder, expected, Arrays.asList(ByteString.EMPTY, output.toByteString())); + testDecoderWith(coder, expected, Arrays.asList(output.toByteString(), ByteString.EMPTY)); + } + private void testDecoderWith(Coder coder, T[] expected, List encoded) { Iterator decoder = - new DataStreamDecoder<>(coder, new ByteArrayInputStream(baos.toByteArray())); + new DataStreamDecoder<>( + coder, PrefetchableIterators.maybePrefetchable(encoded.iterator())); Object[] actual = Iterators.toArray(decoder, Object.class); assertArrayEquals(expected, actual); + // Ensure that we are consistent on hasNext at end of stream assertFalse(decoder.hasNext()); assertFalse(decoder.hasNext()); diff --git a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/stream/ForwardingClientResponseObserverTest.java b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/stream/ForwardingClientResponseObserverTest.java index 5e0e4b5afbb0..5ce6e553084a 100644 --- a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/stream/ForwardingClientResponseObserverTest.java +++ b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/stream/ForwardingClientResponseObserverTest.java @@ -21,9 +21,9 @@ import static org.mockito.Mockito.verify; import static org.mockito.Mockito.verifyNoMoreInteractions; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.ClientCallStreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.ClientResponseObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.ClientCallStreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.ClientResponseObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.junit.Test; import org.junit.runner.RunWith; import org.junit.runners.JUnit4; diff --git a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/stream/OutboundObserverFactoryTest.java b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/stream/OutboundObserverFactoryTest.java index fe3a21d97562..0bcb624efd47 100644 --- a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/stream/OutboundObserverFactoryTest.java +++ b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/stream/OutboundObserverFactoryTest.java @@ -17,13 +17,13 @@ */ package org.apache.beam.sdk.fn.stream; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.instanceOf; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import java.util.concurrent.Executors; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.CallStreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.CallStreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.junit.Before; import org.junit.Test; import org.junit.runner.RunWith; @@ -33,9 +33,6 @@ 
/** Tests for {@link OutboundObserverFactory}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class OutboundObserverFactoryTest { @Mock private StreamObserver mockRequestObserver; @Mock private CallStreamObserver mockResponseObserver; diff --git a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/stream/PrefetchableIterablesTest.java b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/stream/PrefetchableIterablesTest.java new file mode 100644 index 000000000000..e4d1b0d52ff1 --- /dev/null +++ b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/stream/PrefetchableIterablesTest.java @@ -0,0 +1,138 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.fn.stream; + +import static org.junit.Assert.assertSame; + +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Tests for {@link PrefetchableIterators}. */ +@RunWith(JUnit4.class) +public class PrefetchableIterablesTest { + @Test + public void testEmptyIterable() { + verifyIterable(PrefetchableIterables.emptyIterable()); + } + + @Test + public void testFromArray() { + verifyIterable(PrefetchableIterables.fromArray("A", "B", "C"), "A", "B", "C"); + verifyIterable(PrefetchableIterables.fromArray()); + } + + @Test + public void testLimit() { + verifyIterable(PrefetchableIterables.limit(PrefetchableIterables.fromArray(), 0)); + verifyIterable(PrefetchableIterables.limit(PrefetchableIterables.fromArray(), 1)); + verifyIterable(PrefetchableIterables.limit(PrefetchableIterables.fromArray("A", "B", "C"), 0)); + verifyIterable( + PrefetchableIterables.limit(PrefetchableIterables.fromArray("A", "B", "C"), 2), "A", "B"); + verifyIterable( + PrefetchableIterables.limit(PrefetchableIterables.fromArray("A", "B", "C"), 3), + "A", + "B", + "C"); + verifyIterable( + PrefetchableIterables.limit(PrefetchableIterables.fromArray("A", "B", "C"), 4), + "A", + "B", + "C"); + } + + @Test + public void testConcat() { + verifyIterable(PrefetchableIterables.concat()); + + PrefetchableIterable instance = PrefetchableIterables.fromArray("A", "B"); + assertSame(PrefetchableIterables.concat(instance), instance); + + verifyIterable( + PrefetchableIterables.concat( + PrefetchableIterables.fromArray(), + PrefetchableIterables.fromArray(), + PrefetchableIterables.fromArray())); + verifyIterable( + PrefetchableIterables.concat( + PrefetchableIterables.fromArray("A", "B"), + PrefetchableIterables.fromArray(), + PrefetchableIterables.fromArray()), + "A", + "B"); + verifyIterable( + PrefetchableIterables.concat( + PrefetchableIterables.fromArray(), + PrefetchableIterables.fromArray("C", "D"), + PrefetchableIterables.fromArray()), + "C", + "D"); + 
verifyIterable( + PrefetchableIterables.concat( + PrefetchableIterables.fromArray(), + PrefetchableIterables.fromArray(), + PrefetchableIterables.fromArray("E", "F")), + "E", + "F"); + verifyIterable( + PrefetchableIterables.concat( + PrefetchableIterables.fromArray(), + PrefetchableIterables.fromArray("C", "D"), + PrefetchableIterables.fromArray("E", "F")), + "C", + "D", + "E", + "F"); + verifyIterable( + PrefetchableIterables.concat( + PrefetchableIterables.fromArray("A", "B"), + PrefetchableIterables.fromArray(), + PrefetchableIterables.fromArray("E", "F")), + "A", + "B", + "E", + "F"); + verifyIterable( + PrefetchableIterables.concat( + PrefetchableIterables.fromArray("A", "B"), + PrefetchableIterables.fromArray("C", "D"), + PrefetchableIterables.fromArray()), + "A", + "B", + "C", + "D"); + verifyIterable( + PrefetchableIterables.concat( + PrefetchableIterables.fromArray("A", "B"), + PrefetchableIterables.fromArray("C", "D"), + PrefetchableIterables.fromArray("E", "F")), + "A", + "B", + "C", + "D", + "E", + "F"); + } + + public static void verifyIterable(Iterable iterable, T... expected) { + // Ensure we can access the iterator multiple times + for (int i = 0; i < 3; i++) { + PrefetchableIteratorsTest.verifyIterator(iterable.iterator(), expected); + } + } +} diff --git a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/stream/PrefetchableIteratorsTest.java b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/stream/PrefetchableIteratorsTest.java new file mode 100644 index 000000000000..9ada1759bad1 --- /dev/null +++ b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/stream/PrefetchableIteratorsTest.java @@ -0,0 +1,280 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.fn.stream; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertSame; +import static org.junit.Assert.assertThrows; +import static org.junit.Assert.assertTrue; + +import java.util.Iterator; +import java.util.NoSuchElementException; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Tests for {@link PrefetchableIterators}. 
*/ +@RunWith(JUnit4.class) +public class PrefetchableIteratorsTest { + + @Test + public void testEmpty() { + verifyIterator(PrefetchableIterators.emptyIterator()); + verifyIsAlwaysReady(PrefetchableIterators.emptyIterator()); + } + + @Test + public void testFromArray() { + verifyIterator(PrefetchableIterators.fromArray("A", "B", "C"), "A", "B", "C"); + verifyIsAlwaysReady(PrefetchableIterators.fromArray("A", "B", "C")); + verifyIterator(PrefetchableIterators.fromArray()); + verifyIsAlwaysReady(PrefetchableIterators.fromArray()); + } + + @Test + public void testConcat() { + verifyIterator(PrefetchableIterators.concat()); + + PrefetchableIterator instance = PrefetchableIterators.fromArray("A", "B"); + assertSame(PrefetchableIterators.concat(instance), instance); + + verifyIterator( + PrefetchableIterators.concat( + PrefetchableIterators.fromArray(), + PrefetchableIterators.fromArray(), + PrefetchableIterators.fromArray())); + verifyIterator( + PrefetchableIterators.concat( + PrefetchableIterators.fromArray("A", "B"), + PrefetchableIterators.fromArray(), + PrefetchableIterators.fromArray()), + "A", + "B"); + verifyIterator( + PrefetchableIterators.concat( + PrefetchableIterators.fromArray(), + PrefetchableIterators.fromArray("C", "D"), + PrefetchableIterators.fromArray()), + "C", + "D"); + verifyIterator( + PrefetchableIterators.concat( + PrefetchableIterators.fromArray(), + PrefetchableIterators.fromArray(), + PrefetchableIterators.fromArray("E", "F")), + "E", + "F"); + verifyIterator( + PrefetchableIterators.concat( + PrefetchableIterators.fromArray(), + PrefetchableIterators.fromArray("C", "D"), + PrefetchableIterators.fromArray("E", "F")), + "C", + "D", + "E", + "F"); + verifyIterator( + PrefetchableIterators.concat( + PrefetchableIterators.fromArray("A", "B"), + PrefetchableIterators.fromArray(), + PrefetchableIterators.fromArray("E", "F")), + "A", + "B", + "E", + "F"); + verifyIterator( + PrefetchableIterators.concat( + PrefetchableIterators.fromArray("A", "B"), + PrefetchableIterators.fromArray("C", "D"), + PrefetchableIterators.fromArray()), + "A", + "B", + "C", + "D"); + verifyIterator( + PrefetchableIterators.concat( + PrefetchableIterators.fromArray("A", "B"), + PrefetchableIterators.fromArray("C", "D"), + PrefetchableIterators.fromArray("E", "F")), + "A", + "B", + "C", + "D", + "E", + "F"); + } + + public static class NeverReady implements PrefetchableIterator { + private final Iterator delegate; + int prefetchCalled; + + public NeverReady(Iterator delegate) { + this.delegate = delegate; + } + + @Override + public boolean isReady() { + return false; + } + + @Override + public void prefetch() { + prefetchCalled += 1; + } + + @Override + public boolean hasNext() { + return delegate.hasNext(); + } + + @Override + public T next() { + return delegate.next(); + } + + public int getNumPrefetchCalls() { + return prefetchCalled; + } + } + + public static class ReadyAfterPrefetch extends NeverReady { + + public ReadyAfterPrefetch(Iterator delegate) { + super(delegate); + } + + @Override + public boolean isReady() { + return prefetchCalled > 0; + } + } + + public static class ReadyAfterPrefetchUntilNext extends ReadyAfterPrefetch { + boolean advancedSincePrefetch; + + public ReadyAfterPrefetchUntilNext(Iterator delegate) { + super(delegate); + } + + @Override + public boolean isReady() { + return !advancedSincePrefetch && super.isReady(); + } + + @Override + public void prefetch() { + advancedSincePrefetch = false; + super.prefetch(); + } + + @Override + public T next() { + 
advancedSincePrefetch = true; + return super.next(); + } + + @Override + public boolean hasNext() { + advancedSincePrefetch = true; + return super.hasNext(); + } + } + + @Test + public void testConcatIsReadyAdvancesToNextIteratorWhenAble() { + NeverReady readyAfterPrefetch1 = + new NeverReady<>(PrefetchableIterators.fromArray("A", "B")); + ReadyAfterPrefetch readyAfterPrefetch2 = + new ReadyAfterPrefetch<>(PrefetchableIterators.fromArray("A", "B")); + ReadyAfterPrefetch readyAfterPrefetch3 = + new ReadyAfterPrefetch<>(PrefetchableIterators.fromArray("A", "B")); + + PrefetchableIterator iterator = + PrefetchableIterators.concat(readyAfterPrefetch1, readyAfterPrefetch2, readyAfterPrefetch3); + + // Expect no prefetches yet + assertEquals(0, readyAfterPrefetch1.getNumPrefetchCalls()); + assertEquals(0, readyAfterPrefetch2.getNumPrefetchCalls()); + assertEquals(0, readyAfterPrefetch3.getNumPrefetchCalls()); + + // We expect to attempt to prefetch for the first time. + iterator.prefetch(); + assertEquals(1, readyAfterPrefetch1.getNumPrefetchCalls()); + assertEquals(0, readyAfterPrefetch2.getNumPrefetchCalls()); + assertEquals(0, readyAfterPrefetch3.getNumPrefetchCalls()); + iterator.next(); + + // We expect to attempt to prefetch again since we aren't ready. + iterator.prefetch(); + assertEquals(2, readyAfterPrefetch1.getNumPrefetchCalls()); + assertEquals(0, readyAfterPrefetch2.getNumPrefetchCalls()); + assertEquals(0, readyAfterPrefetch3.getNumPrefetchCalls()); + iterator.next(); + + // The current iterator is done but is never ready so we can't advance to the next one and + // expect another prefetch to go to the current iterator. + iterator.prefetch(); + assertEquals(3, readyAfterPrefetch1.getNumPrefetchCalls()); + assertEquals(0, readyAfterPrefetch2.getNumPrefetchCalls()); + assertEquals(0, readyAfterPrefetch3.getNumPrefetchCalls()); + iterator.next(); + + // Now that we know the last iterator is done and have advanced to the next one we expect + // prefetch to go through + iterator.prefetch(); + assertEquals(3, readyAfterPrefetch1.getNumPrefetchCalls()); + assertEquals(1, readyAfterPrefetch2.getNumPrefetchCalls()); + assertEquals(0, readyAfterPrefetch3.getNumPrefetchCalls()); + iterator.next(); + + // The last iterator is done so we should be able to prefetch the next one before advancing + iterator.prefetch(); + assertEquals(3, readyAfterPrefetch1.getNumPrefetchCalls()); + assertEquals(1, readyAfterPrefetch2.getNumPrefetchCalls()); + assertEquals(1, readyAfterPrefetch3.getNumPrefetchCalls()); + iterator.next(); + + // The current iterator is ready so no additional prefetch is necessary + iterator.prefetch(); + assertEquals(3, readyAfterPrefetch1.getNumPrefetchCalls()); + assertEquals(1, readyAfterPrefetch2.getNumPrefetchCalls()); + assertEquals(1, readyAfterPrefetch3.getNumPrefetchCalls()); + iterator.next(); + } + + public static void verifyIsAlwaysReady(PrefetchableIterator iterator) { + while (iterator.hasNext()) { + assertTrue(iterator.isReady()); + iterator.next(); + } + assertTrue(iterator.isReady()); + } + + public static void verifyIterator(Iterator iterator, T... 
expected) { + for (int i = 0; i < expected.length; ++i) { + assertTrue(iterator.hasNext()); + assertEquals(expected[i], iterator.next()); + } + assertFalse(iterator.hasNext()); + assertThrows(NoSuchElementException.class, () -> iterator.next()); + // Ensure that multiple hasNext/next after a failure are repeatable + assertFalse(iterator.hasNext()); + assertThrows(NoSuchElementException.class, () -> iterator.next()); + } +} diff --git a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/test/TestExecutorsTest.java b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/test/TestExecutorsTest.java index d14ea4821651..11c44256a9fb 100644 --- a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/test/TestExecutorsTest.java +++ b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/test/TestExecutorsTest.java @@ -33,9 +33,6 @@ /** Tests for {@link TestExecutors}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TestExecutorsTest { @Test public void testSuccessfulTermination() throws Throwable { diff --git a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/test/TestStreamsTest.java b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/test/TestStreamsTest.java index 5aef195929c1..5e0b22e480e2 100644 --- a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/test/TestStreamsTest.java +++ b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/test/TestStreamsTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.fn.test; +import static org.hamcrest.MatcherAssert.assertThat; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.util.ArrayList; @@ -31,9 +31,6 @@ /** Tests for {@link TestStreams}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TestStreamsTest { @Test public void testOnNextIsCalled() { diff --git a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/windowing/EncodedBoundedWindowTest.java b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/windowing/EncodedBoundedWindowTest.java index ecf5bc114324..1f3b5d5adae1 100644 --- a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/windowing/EncodedBoundedWindowTest.java +++ b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/windowing/EncodedBoundedWindowTest.java @@ -19,7 +19,7 @@ import org.apache.beam.sdk.fn.windowing.EncodedBoundedWindow.Coder; import org.apache.beam.sdk.testing.CoderProperties; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.junit.Test; import org.junit.runner.RunWith; import org.junit.runners.JUnit4; diff --git a/sdks/java/harness/build.gradle b/sdks/java/harness/build.gradle index 45891b0ac86f..6337cd47f308 100644 --- a/sdks/java/harness/build.gradle +++ b/sdks/java/harness/build.gradle @@ -33,6 +33,7 @@ applyJavaNature( ], automaticModuleName: 'org.apache.beam.fn.harness', validateShadowJar: false, + enableJmh: true, testShadowJar: true, shadowClosure: // Create an uber jar without repackaging for the SDK harness @@ -61,19 +62,18 @@ dependencies { dependOnProjects.each { compile project(it) } - compile library.java.jackson_databind shadow library.java.vendored_guava_26_0_jre shadowTest library.java.powermock shadowTest library.java.powermock_mockito compile library.java.joda_time compile library.java.slf4j_api - compile library.java.vendored_grpc_1_26_0 - provided library.java.error_prone_annotations - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library + compile library.java.vendored_grpc_1_36_0 testCompile library.java.junit testCompile library.java.mockito_core testCompile project(path: ":sdks:java:core", configuration: "shadowTest") testCompile project(":runners:core-construction-java") + testCompile project(path: ":sdks:java:fn-execution", configuration: "testRuntime") shadowTestRuntimeClasspath library.java.slf4j_jdk14 + jmhCompile project(path: ":sdks:java:harness", configuration: "shadowTest") + jmhRuntime library.java.slf4j_jdk14 } diff --git a/sdks/java/harness/src/jmh/java/org/apache/beam/fn/harness/logging/BeamFnLoggingClientBenchmark.java b/sdks/java/harness/src/jmh/java/org/apache/beam/fn/harness/logging/BeamFnLoggingClientBenchmark.java new file mode 100644 index 000000000000..9e009c33c499 --- /dev/null +++ b/sdks/java/harness/src/jmh/java/org/apache/beam/fn/harness/logging/BeamFnLoggingClientBenchmark.java @@ -0,0 +1,199 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.fn.harness.logging; + +import java.io.Closeable; +import java.util.HashMap; +import java.util.UUID; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicInteger; +import org.apache.beam.model.fnexecution.v1.BeamFnApi; +import org.apache.beam.model.fnexecution.v1.BeamFnLoggingGrpc; +import org.apache.beam.model.pipeline.v1.Endpoints.ApiServiceDescriptor; +import org.apache.beam.runners.core.metrics.ExecutionStateTracker; +import org.apache.beam.runners.core.metrics.MonitoringInfoConstants; +import org.apache.beam.runners.core.metrics.SimpleExecutionState; +import org.apache.beam.sdk.fn.channel.ManagedChannelFactory; +import org.apache.beam.sdk.fn.test.InProcessManagedChannelFactory; +import org.apache.beam.sdk.options.PipelineOptionsFactory; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Server; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessServerBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; +import org.openjdk.jmh.annotations.Benchmark; +import org.openjdk.jmh.annotations.Level; +import org.openjdk.jmh.annotations.Scope; +import org.openjdk.jmh.annotations.State; +import org.openjdk.jmh.annotations.TearDown; +import org.openjdk.jmh.annotations.Threads; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** Benchmarks for {@link BeamFnLoggingClient}. */ +public class BeamFnLoggingClientBenchmark { + private static final Logger LOG = LoggerFactory.getLogger(BeamFnLoggingClientBenchmark.class); + + /** A logging service which counts the number of calls it received. */ + public static class CallCountLoggingService extends BeamFnLoggingGrpc.BeamFnLoggingImplBase { + private AtomicInteger callCount = new AtomicInteger(); + + @Override + public StreamObserver logging( + StreamObserver outboundObserver) { + return new StreamObserver() { + + @Override + public void onNext(BeamFnApi.LogEntry.List list) { + callCount.incrementAndGet(); + } + + @Override + public void onError(Throwable throwable) { + outboundObserver.onError(throwable); + } + + @Override + public void onCompleted() { + outboundObserver.onCompleted(); + } + }; + } + } + + /** Setup a simple logging service and configure the {@link BeamFnLoggingClient}. 
*/ + @State(Scope.Benchmark) + public static class ManageLoggingClientAndService { + public final BeamFnLoggingClient loggingClient; + public final CallCountLoggingService loggingService; + public final Server server; + + public ManageLoggingClientAndService() { + try { + ApiServiceDescriptor apiServiceDescriptor = + ApiServiceDescriptor.newBuilder() + .setUrl(BeamFnLoggingClientBenchmark.class.getName() + "#" + UUID.randomUUID()) + .build(); + ManagedChannelFactory managedChannelFactory = InProcessManagedChannelFactory.create(); + loggingService = new CallCountLoggingService(); + server = + InProcessServerBuilder.forName(apiServiceDescriptor.getUrl()) + .addService(loggingService) + .build(); + server.start(); + loggingClient = + new BeamFnLoggingClient( + PipelineOptionsFactory.create(), + apiServiceDescriptor, + managedChannelFactory::forDescriptor); + } catch (Exception e) { + throw new RuntimeException(e); + } + } + + @TearDown(Level.Trial) + public void tearDown() throws Exception { + loggingClient.close(); + server.shutdown(); + if (server.awaitTermination(30, TimeUnit.SECONDS)) { + server.shutdownNow(); + } + } + } + + /** + * A {@link ManageLoggingClientAndService} which validates that more than zero calls made it to + * the service. + */ + @State(Scope.Benchmark) + public static class ManyExpectedCallsLoggingClientAndService + extends ManageLoggingClientAndService { + @Override + @TearDown + public void tearDown() throws Exception { + super.tearDown(); + if (loggingService.callCount.get() <= 0) { + throw new IllegalStateException( + "Server expected greater then zero calls. Benchmark misconfigured?"); + } + } + } + + /** + * A {@link ManageLoggingClientAndService} which validates that exactly zero calls made it to the + * service. + */ + @State(Scope.Benchmark) + public static class ZeroExpectedCallsLoggingClientAndService + extends ManageLoggingClientAndService { + @Override + @TearDown + public void tearDown() throws Exception { + super.tearDown(); + if (loggingService.callCount.get() != 0) { + throw new IllegalStateException("Server expected zero calls. Benchmark misconfigured?"); + } + } + } + + /** Sets up the {@link ExecutionStateTracker} and an execution state. 
*/ + @State(Scope.Benchmark) + public static class ManageExecutionState { + private final ExecutionStateTracker executionStateTracker; + private final SimpleExecutionState simpleExecutionState; + + public ManageExecutionState() { + executionStateTracker = ExecutionStateTracker.newForTest(); + HashMap labelsMetadata = new HashMap<>(); + labelsMetadata.put(MonitoringInfoConstants.Labels.PTRANSFORM, "ptransformId"); + simpleExecutionState = + new SimpleExecutionState( + ExecutionStateTracker.PROCESS_STATE_NAME, + MonitoringInfoConstants.Urns.PROCESS_BUNDLE_MSECS, + labelsMetadata); + } + + @TearDown + public void tearDown() throws Exception { + executionStateTracker.reset(); + } + } + + @Benchmark + @Threads(16) // Use several threads since we expect contention during logging + public void testLogging(ManyExpectedCallsLoggingClientAndService client) { + LOG.warn("log me"); + } + + @Benchmark + @Threads(16) // Use several threads since we expect contention during logging + public void testLoggingWithAllOptionalParameters( + ManyExpectedCallsLoggingClientAndService client, ManageExecutionState executionState) + throws Exception { + BeamFnLoggingMDC.setInstructionId("instruction id"); + try (Closeable state = + executionState.executionStateTracker.enterState(executionState.simpleExecutionState)) { + LOG.warn("log me"); + } + BeamFnLoggingMDC.setInstructionId(null); + } + + @Benchmark + @Threads(16) // Use several threads since we expect contention during logging + public void testSkippedLogging(ZeroExpectedCallsLoggingClientAndService client) { + LOG.trace("no log"); + } +} diff --git a/sdks/java/harness/src/jmh/java/org/apache/beam/fn/harness/logging/package-info.java b/sdks/java/harness/src/jmh/java/org/apache/beam/fn/harness/logging/package-info.java new file mode 100644 index 000000000000..304238ed26ad --- /dev/null +++ b/sdks/java/harness/src/jmh/java/org/apache/beam/fn/harness/logging/package-info.java @@ -0,0 +1,20 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** Benchmarks for logging. 
*/ +package org.apache.beam.fn.harness.logging; diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/BeamFnDataReadRunner.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/BeamFnDataReadRunner.java index b83db4855e6b..ac7267ac632b 100644 --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/BeamFnDataReadRunner.java +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/BeamFnDataReadRunner.java @@ -17,6 +17,7 @@ */ package org.apache.beam.fn.harness; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables.getOnlyElement; import com.google.auto.service.AutoService; @@ -198,7 +199,7 @@ public Supplier getCurrentInstructionId() { .build()); } }); - reset(); + clearSplitIndices(); } public void registerInputLocation() { @@ -246,6 +247,14 @@ public void trySplit( // have yet to see the element or has completed processing the element by the time // we ask it to split (even after we have asked for its progress). + // If the split request we received was delayed we it may be for a previous bundle. + // Ensure we're processing a split for *this* bundle. This check is done under the lock + // to make sure reset() is not called concurrently in case the bundle processor is + // being released. + if (!request.getInstructionId().equals(processBundleInstructionIdSupplier.get())) { + return; + } + // If the split request we received was delayed and is less then the known number of elements // then use "index + 1" as the total size. Similarly, if we have already split and the // split request is bounded incorrectly, use the stop index as the upper bound. @@ -348,8 +357,17 @@ public void blockTillReadFinishes() throws Exception { } public void reset() { - index = -1; - stopIndex = Long.MAX_VALUE; + checkArgument( + processBundleInstructionIdSupplier.get() == null, + "Cannot reset an active bundle processor."); + clearSplitIndices(); + } + + private void clearSplitIndices() { + synchronized (splittingLock) { + index = -1; + stopIndex = Long.MAX_VALUE; + } } private boolean isValidSplitPoint(List allowedSplitPoints, long index) { diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/BeamFnDataWriteRunner.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/BeamFnDataWriteRunner.java index 8d1646014b65..b15499b70137 100644 --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/BeamFnDataWriteRunner.java +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/BeamFnDataWriteRunner.java @@ -49,6 +49,7 @@ import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.transforms.DoFn.BundleFinalizer; import org.apache.beam.sdk.util.WindowedValue; +import org.apache.beam.sdk.util.WindowedValue.WindowedValueCoder; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -115,7 +116,8 @@ public BeamFnDataWriteRunner createRunnerForPTransform( pCollectionConsumerRegistry.register( getOnlyElement(pTransform.getInputsMap().values()), pTransformId, - (FnDataReceiver) (FnDataReceiver>) runner::consume); + (FnDataReceiver) (FnDataReceiver>) runner::consume, + ((WindowedValueCoder) runner.coder).getValueCoder()); finishFunctionRegistry.register(pTransformId, runner::close); return runner; diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/CombineRunners.java 
b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/CombineRunners.java index 00f00bbaf95f..74b3a0b4bcae 100644 --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/CombineRunners.java +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/CombineRunners.java @@ -186,8 +186,8 @@ public PrecombineRunner createRunnerForPTransform( pCollectionConsumerRegistry.register( Iterables.getOnlyElement(pTransform.getInputsMap().values()), pTransformId, - (FnDataReceiver) - (FnDataReceiver>>) runner::processElement); + (FnDataReceiver) (FnDataReceiver>>) runner::processElement, + inputCoder); finishFunctionRegistry.register(pTransformId, runner::finishBundle); return runner; diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/ExternalWorkerService.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/ExternalWorkerService.java new file mode 100644 index 000000000000..4725485dd820 --- /dev/null +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/ExternalWorkerService.java @@ -0,0 +1,154 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.fn.harness; + +import static org.apache.beam.sdk.util.Preconditions.checkArgumentNotNull; + +import java.util.Collections; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StartWorkerRequest; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StartWorkerResponse; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StopWorkerRequest; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StopWorkerResponse; +import org.apache.beam.model.fnexecution.v1.BeamFnExternalWorkerPoolGrpc.BeamFnExternalWorkerPoolImplBase; +import org.apache.beam.model.pipeline.v1.Endpoints; +import org.apache.beam.runners.core.construction.Environments; +import org.apache.beam.runners.core.construction.PipelineOptionsTranslation; +import org.apache.beam.sdk.fn.server.FnService; +import org.apache.beam.sdk.fn.server.GrpcFnServer; +import org.apache.beam.sdk.fn.server.ServerFactory; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.options.PortablePipelineOptions; +import org.apache.beam.sdk.util.Sleeper; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * Implements the BeamFnExternalWorkerPool service by starting a fresh SDK harness for each request. 
+ */ +public class ExternalWorkerService extends BeamFnExternalWorkerPoolImplBase implements FnService { + + private static final Logger LOG = LoggerFactory.getLogger(ExternalWorkerService.class); + private static final String PIPELINE_OPTIONS_ENV_VAR = "PIPELINE_OPTIONS"; + + private final PipelineOptions options; + private final ServerFactory serverFactory = ServerFactory.createDefault(); + + public ExternalWorkerService(PipelineOptions options) { + this.options = options; + } + + @Override + public void startWorker( + StartWorkerRequest request, StreamObserver responseObserver) { + LOG.info( + "Starting worker {} pointing at {}.", + request.getWorkerId(), + request.getControlEndpoint().getUrl()); + LOG.debug("Worker request {}.", request); + Thread th = + new Thread( + () -> { + try { + FnHarness.main( + request.getWorkerId(), + options, + Collections.emptySet(), + request.getLoggingEndpoint(), + request.getControlEndpoint(), + null); + LOG.info("Successfully started worker {}.", request.getWorkerId()); + } catch (Exception exn) { + LOG.error(String.format("Failed to start worker %s.", request.getWorkerId()), exn); + } + }); + th.setName("SDK-worker-" + request.getWorkerId()); + th.setDaemon(true); + th.start(); + + responseObserver.onNext(StartWorkerResponse.newBuilder().build()); + responseObserver.onCompleted(); + } + + @Override + public void stopWorker( + StopWorkerRequest request, StreamObserver responseObserver) { + // Thread based workers terminate automatically + responseObserver.onNext(StopWorkerResponse.newBuilder().build()); + responseObserver.onCompleted(); + } + + @Override + public void close() {} + + public GrpcFnServer start() throws Exception { + final String externalServiceAddress = + Environments.getExternalServiceAddress(options.as(PortablePipelineOptions.class)); + GrpcFnServer server; + if (externalServiceAddress.isEmpty()) { + server = GrpcFnServer.allocatePortAndCreateFor(this, serverFactory); + } else { + server = + GrpcFnServer.create( + this, + Endpoints.ApiServiceDescriptor.newBuilder().setUrl(externalServiceAddress).build(), + serverFactory); + } + LOG.debug( + "Listening for worker start requests at {}.", server.getApiServiceDescriptor().getUrl()); + return server; + } + + /** + * Worker pool entry point. + * + *

<p>The worker pool exposes an RPC service that is used with EXTERNAL environment to start and + * stop the SDK workers. + * + *
<p>The worker pool uses threads for parallelism; + * + *
<p>This entry point is used by the Java SDK container in worker pool mode and expects the + * following environment variables: + * + *
<ul> + *
<li>PIPELINE_OPTIONS: A serialized form of {@link PipelineOptions}. It needs to be known + * up-front and matches the running job. See {@link PipelineOptions} for further details. + * </ul>
    + */ + public static void main(String[] args) { + LOG.info("Starting external worker service"); + final String optionsEnv = + checkArgumentNotNull( + System.getenv(PIPELINE_OPTIONS_ENV_VAR), + "No pipeline options provided in environment variables " + PIPELINE_OPTIONS_ENV_VAR); + LOG.info("Pipeline options {}", optionsEnv); + PipelineOptions options = PipelineOptionsTranslation.fromJson(optionsEnv); + + try (GrpcFnServer server = new ExternalWorkerService(options).start()) { + LOG.info( + "External worker service started at address: {}", + server.getApiServiceDescriptor().getUrl()); + // Wait to keep ExternalWorkerService running + Sleeper.DEFAULT.sleep(Long.MAX_VALUE); + } catch (Exception e) { + LOG.error("Error running worker service", e); + } finally { + LOG.info("External worker service stopped."); + } + } +} diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FlattenRunner.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FlattenRunner.java index b56085e27670..6493778ade3f 100644 --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FlattenRunner.java +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FlattenRunner.java @@ -32,13 +32,16 @@ import org.apache.beam.fn.harness.state.BeamFnStateClient; import org.apache.beam.model.pipeline.v1.RunnerApi; import org.apache.beam.model.pipeline.v1.RunnerApi.Coder; +import org.apache.beam.model.pipeline.v1.RunnerApi.Components; import org.apache.beam.model.pipeline.v1.RunnerApi.PCollection; import org.apache.beam.runners.core.construction.PTransformTranslation; +import org.apache.beam.runners.core.construction.RehydratedComponents; import org.apache.beam.sdk.fn.data.FnDataReceiver; import org.apache.beam.sdk.function.ThrowingRunnable; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.transforms.DoFn.BundleFinalizer; import org.apache.beam.sdk.util.WindowedValue; +import org.apache.beam.sdk.util.WindowedValue.WindowedValueCoder; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; /** Executes flatten PTransforms. 
*/ @@ -85,14 +88,36 @@ public FlattenRunner createRunnerForPTransform( FnDataReceiver> receiver = pCollectionConsumerRegistry.getMultiplexingConsumer(output); - FlattenRunner runner = new FlattenRunner<>(); + RehydratedComponents components = + RehydratedComponents.forComponents(Components.newBuilder().putAllCoders(coders).build()); + FlattenRunner runner = new FlattenRunner<>(); for (String pCollectionId : pTransform.getInputsMap().values()) { pCollectionConsumerRegistry.register( - pCollectionId, pTransformId, (FnDataReceiver) receiver); + pCollectionId, + pTransformId, + (FnDataReceiver) receiver, + getValueCoder(components, pCollections, pCollectionId)); } return runner; } + + private org.apache.beam.sdk.coders.Coder getValueCoder( + RehydratedComponents components, + Map pCollections, + String pCollectionId) + throws IOException { + if (!pCollections.containsKey(pCollectionId)) { + throw new IllegalArgumentException( + String.format("Missing PCollection for id: %s", pCollectionId)); + } + org.apache.beam.sdk.coders.Coder coder = + components.getCoder(pCollections.get(pCollectionId).getCoderId()); + if (coder instanceof WindowedValueCoder) { + coder = ((WindowedValueCoder) coder).getValueCoder(); + } + return coder; + } } } diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnApiDoFnRunner.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnApiDoFnRunner.java index 95ca03e82b34..eda31826844c 100644 --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnApiDoFnRunner.java +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnApiDoFnRunner.java @@ -32,6 +32,7 @@ import java.util.Iterator; import java.util.List; import java.util.Map; +import java.util.NavigableSet; import java.util.function.Consumer; import java.util.function.Supplier; import org.apache.beam.fn.harness.PTransformRunnerFactory.ProgressRequestCallback; @@ -43,6 +44,9 @@ import org.apache.beam.fn.harness.data.PTransformFunctionRegistry; import org.apache.beam.fn.harness.state.BeamFnStateClient; import org.apache.beam.fn.harness.state.FnApiStateAccessor; +import org.apache.beam.fn.harness.state.FnApiTimerBundleTracker; +import org.apache.beam.fn.harness.state.FnApiTimerBundleTracker.Modifications; +import org.apache.beam.fn.harness.state.FnApiTimerBundleTracker.TimerInfo; import org.apache.beam.fn.harness.state.SideInputSpec; import org.apache.beam.model.fnexecution.v1.BeamFnApi.BundleApplication; import org.apache.beam.model.fnexecution.v1.BeamFnApi.DelayedBundleApplication; @@ -111,8 +115,8 @@ import org.apache.beam.sdk.values.Row; import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.sdk.values.WindowingStrategy; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.util.Durations; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.util.Durations; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableListMultimap; @@ -121,6 +125,7 @@ import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ListMultimap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Sets; 
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Table; import org.checkerframework.checker.nullness.qual.Nullable; import org.joda.time.DateTimeUtils; import org.joda.time.Duration; @@ -233,6 +238,7 @@ static class Factory stateAccessor; private Map> timerHandlers; + private FnApiTimerBundleTracker timerBundleTracker; private final DoFnInvoker doFnInvoker; private final StartBundleArgumentProvider startBundleArgumentProvider; private final ProcessBundleContextBase processContext; @@ -253,6 +259,8 @@ static class Factory currentElement; + private Object currentKey; + /** * Only valud during {@link * #processElementForWindowObservingSizedElementAndRestriction(WindowedValue)} and {@link @@ -469,8 +477,7 @@ static class Factory { - if (currentElement != null) { - checkState( - currentElement.getValue() instanceof KV, - "Accessing state in unkeyed context. Current element is not a KV: %s.", - currentElement.getValue()); - return ((KV) currentElement.getValue()).getKey(); - } else if (currentTimer != null) { - return currentTimer.getUserKey(); - } - return null; - }, + this::getCurrentKey, () -> currentWindow); } + private Object getCurrentKey() { + if (currentKey != null) { + return currentKey; + } + // TODO: Maybe memoize the key? + if (currentElement != null) { + checkState( + currentElement.getValue() instanceof KV, + "Accessing state in unkeyed context. Current element is not a KV: %s.", + currentElement.getValue()); + return ((KV) currentElement.getValue()).getKey(); + } else if (currentTimer != null) { + return currentTimer.getUserKey(); + } + return null; + } + private void startBundle() { // Register as a consumer for each timer. timerHandlers = new HashMap<>(); + timerBundleTracker = + new FnApiTimerBundleTracker( + keyCoder, windowCoder, this::getCurrentKey, () -> currentWindow); for (Map.Entry>>> timerFamilyInfo : timerFamilyInfos.entrySet()) { String localName = timerFamilyInfo.getKey(); @@ -1500,11 +1519,11 @@ static HandlesSplits.SplitResult constructSplitResult .setElement(bytesOut.toByteString()); // We don't want to change the output watermarks or set the checkpoint resume time since // that applies to the current window. 
- Map + Map outputWatermarkMapForUnprocessedWindows = new HashMap<>(); if (!initialWatermark.equals(GlobalWindow.TIMESTAMP_MIN_VALUE)) { - org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Timestamp outputWatermark = - org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Timestamp.newBuilder() + org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Timestamp outputWatermark = + org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Timestamp.newBuilder() .setSeconds(initialWatermark.getMillis() / 1000) .setNanos((int) (initialWatermark.getMillis() % 1000) * 1000000) .build(); @@ -1544,11 +1563,11 @@ static HandlesSplits.SplitResult constructSplitResult .setTransformId(pTransformId) .setInputId(mainInputId) .setElement(residualBytes.toByteString()); - Map + Map outputWatermarkMap = new HashMap<>(); if (!watermarkAndState.getKey().equals(GlobalWindow.TIMESTAMP_MIN_VALUE)) { - org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Timestamp outputWatermark = - org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Timestamp.newBuilder() + org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Timestamp outputWatermark = + org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Timestamp.newBuilder() .setSeconds(watermarkAndState.getKey().getMillis() / 1000) .setNanos((int) (watermarkAndState.getKey().getMillis() % 1000) * 1000000) .build(); @@ -1622,33 +1641,93 @@ private HandlesSplits.SplitResult trySplitForElementAndRestriction( private void processTimer( String timerIdOrTimerFamilyId, TimeDomain timeDomain, Timer timer) { - currentTimer = timer; - currentTimeDomain = timeDomain; - // The timerIdOrTimerFamilyId contains either a timerId from timer declaration or timerFamilyId - // from timer family declaration. - String timerId = - timerIdOrTimerFamilyId.startsWith(TimerFamilyDeclaration.PREFIX) - ? "" - : timerIdOrTimerFamilyId; - String timerFamilyId = - timerIdOrTimerFamilyId.startsWith(TimerFamilyDeclaration.PREFIX) - ? timerIdOrTimerFamilyId - : ""; try { + currentKey = timer.getUserKey(); Iterator windowIterator = (Iterator) timer.getWindows().iterator(); while (windowIterator.hasNext()) { currentWindow = windowIterator.next(); - doFnInvoker.invokeOnTimer(timerId, timerFamilyId, onTimerContext); + Modifications bundleModifications = timerBundleTracker.getBundleModifications(); + Table> modifiedTimerIds = + bundleModifications.getModifiedTimerIds(); + NavigableSet> earlierTimers = + bundleModifications + .getModifiedTimersOrdered(timeDomain) + .headSet(TimerInfo.of(timer, "", timeDomain), true); + while (!earlierTimers.isEmpty()) { + TimerInfo insertedTimer = earlierTimers.pollFirst(); + if (timerModified( + modifiedTimerIds, insertedTimer.getTimerFamilyOrId(), insertedTimer.getTimer())) { + continue; + } + + String timerId = + insertedTimer.getTimer().getDynamicTimerTag().isEmpty() + ? insertedTimer.getTimerFamilyOrId() + : insertedTimer.getTimer().getDynamicTimerTag(); + String timerFamily = + insertedTimer.getTimer().getDynamicTimerTag().isEmpty() + ? "" + : insertedTimer.getTimerFamilyOrId(); + + // If this timer was created previously in the bundle as an overwrite of a previous timer, + // we must make sure + // to clear the old timer. Since we are firing the timer inline, the runner doesn't know + // that the old timer + // was overwritten, and will otherwise fire it - causing a spurious timer fire. 
+ modifiedTimerIds.put( + insertedTimer.getTimerFamilyOrId(), + insertedTimer.getTimer().getDynamicTimerTag(), + Timer.cleared( + insertedTimer.getTimer().getUserKey(), + insertedTimer.getTimer().getDynamicTimerTag(), + insertedTimer.getTimer().getWindows())); + // It's important to call processTimer after inserting the above deletion, otherwise the + // above line + // would overwrite any looping timer with a deletion. + processTimerDirect( + timerFamily, timerId, insertedTimer.getTimeDomain(), insertedTimer.getTimer()); + } + + if (!timerModified(modifiedTimerIds, timerIdOrTimerFamilyId, timer)) { + // The timerIdOrTimerFamilyId contains either a timerId from timer declaration or + // timerFamilyId + // from timer family declaration. + String timerId = + timerIdOrTimerFamilyId.startsWith(TimerFamilyDeclaration.PREFIX) + ? "" + : timerIdOrTimerFamilyId; + String timerFamilyId = + timerIdOrTimerFamilyId.startsWith(TimerFamilyDeclaration.PREFIX) + ? timerIdOrTimerFamilyId + : ""; + processTimerDirect(timerFamilyId, timerId, timeDomain, timer); + } } } finally { + currentKey = null; currentTimer = null; currentTimeDomain = null; currentWindow = null; } } + private boolean timerModified( + Table> modifiedTimerIds, String timerFamilyOrId, Timer timer) { + @Nullable + Timer modifiedTimer = modifiedTimerIds.get(timerFamilyOrId, timer.getDynamicTimerTag()); + return modifiedTimer != null && !modifiedTimer.equals(timer); + } + + private void processTimerDirect( + String timerFamilyId, String timerId, TimeDomain timeDomain, Timer timer) { + currentTimer = timer; + currentTimeDomain = timeDomain; + doFnInvoker.invokeOnTimer(timerId, timerFamilyId, onTimerContext); + } + private void finishBundle() throws Exception { + timerBundleTracker.outputTimers(timerFamilyOrId -> timerHandlers.get(timerFamilyOrId)); for (TimerHandler timerHandler : timerHandlers.values()) { timerHandler.awaitCompletion(); } @@ -1683,7 +1762,7 @@ private void outputTo( } private class FnApiTimer implements org.apache.beam.sdk.state.Timer { - private final String timerId; + private final String timerIdOrFamily; private final K userKey; private final String dynamicTimerTag; private final TimeDomain timeDomain; @@ -1698,7 +1777,7 @@ private class FnApiTimer implements org.apache.beam.sdk.state.Timer { private Duration offset = Duration.ZERO; FnApiTimer( - String timerId, + String timerIdOrFamily, K userKey, String dynamicTimerTag, BoundedWindow boundedWindow, @@ -1706,7 +1785,7 @@ private class FnApiTimer implements org.apache.beam.sdk.state.Timer { Instant elementTimestampOrTimerFireTimestamp, PaneInfo paneInfo, TimeDomain timeDomain) { - this.timerId = timerId; + this.timerIdOrFamily = timerIdOrFamily; this.userKey = userKey; this.dynamicTimerTag = dynamicTimerTag; this.elementTimestampOrTimerHoldTimestamp = elementTimestampOrTimerHoldTimestamp; @@ -1719,13 +1798,12 @@ private class FnApiTimer implements org.apache.beam.sdk.state.Timer { fireTimestamp = elementTimestampOrTimerFireTimestamp; break; case PROCESSING_TIME: - fireTimestamp = new Instant(DateTimeUtils.currentTimeMillis()); - break; - case SYNCHRONIZED_PROCESSING_TIME: + // TODO: This should use an injected clock when using TestStream. 
fireTimestamp = new Instant(DateTimeUtils.currentTimeMillis()); break; default: - throw new IllegalArgumentException(String.format("Unknown time domain %s", timeDomain)); + throw new IllegalArgumentException( + String.format("Unknown or unsupported time domain %s", timeDomain)); } try { @@ -1737,18 +1815,12 @@ private class FnApiTimer implements org.apache.beam.sdk.state.Timer { .getAllowedLateness(); } catch (IOException e) { throw new IllegalArgumentException( - String.format("Unable to get allowed lateness for timer %s", timerId)); + String.format("Unable to get allowed lateness for timer %s", timerIdOrFamily)); } } @Override public void set(Instant absoluteTime) { - // Verifies that the time domain of this timer is acceptable for absolute timers. - if (!TimeDomain.EVENT_TIME.equals(timeDomain)) { - throw new IllegalArgumentException( - "Can only set relative timers in processing time domain. Use #setRelative()"); - } - // Ensures that the target time is reasonable. For event time timers this means that the time // should be prior to window GC time. if (TimeDomain.EVENT_TIME.equals(timeDomain)) { @@ -1760,8 +1832,7 @@ public void set(Instant absoluteTime) { absoluteTime, windowExpiry); } - - output(absoluteTime); + timerBundleTracker.timerModified(timerIdOrFamily, timeDomain, getTimerForTime(absoluteTime)); } @Override @@ -1777,7 +1848,12 @@ public void setRelative() { : fireTimestamp.plus(period).minus(millisSinceStart); } target = minTargetAndGcTime(target); - output(target); + timerBundleTracker.timerModified(timerIdOrFamily, timeDomain, getTimerForTime(target)); + } + + @Override + public void clear() { + timerBundleTracker.timerModified(timerIdOrFamily, timeDomain, getClearedTimer()); } @Override @@ -1797,6 +1873,12 @@ public org.apache.beam.sdk.state.Timer withOutputTimestamp(Instant outputTime) { this.outputTimestamp = outputTime; return this; } + + @Override + public Instant getCurrentRelativeTime() { + return fireTimestamp; + } + /** * For event time timers the target time should be prior to window GC time. So it returns * min(time to set, GC Time of window). 
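The javadoc above describes how an event-time timer target is clamped to the window's garbage-collection time, i.e. min(requested fire time, window max timestamp + allowed lateness), which the `minTargetAndGcTime` context in the next hunk implements. Below is a minimal, standalone sketch of that clamping rule only; it is not the harness's actual method (which reads the window and lateness from the running DoFn), and the window bound and lateness values are made-up for illustration:

```java
import org.joda.time.Duration;
import org.joda.time.Instant;

public class TimerGcClampSketch {
  // min(time to set, GC time of window), where GC time = window max timestamp + allowed lateness.
  static Instant clampToWindowGcTime(Instant target, Instant windowMaxTimestamp, Duration allowedLateness) {
    Instant windowExpiry = windowMaxTimestamp.plus(allowedLateness);
    return target.isAfter(windowExpiry) ? windowExpiry : target;
  }

  public static void main(String[] args) {
    // Hypothetical window bound and allowed lateness, only for demonstration.
    Instant windowMax = Instant.parse("2021-01-01T00:00:00.000Z");
    Duration lateness = Duration.standardMinutes(10);
    // A target past the window expiry is pulled back to the expiry time.
    System.out.println(clampToWindowGcTime(windowMax.plus(Duration.standardHours(1)), windowMax, lateness));
    // A target before the expiry is left as requested.
    System.out.println(clampToWindowGcTime(windowMax.minus(Duration.standardMinutes(5)), windowMax, lateness));
  }
}
```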
@@ -1811,7 +1893,11 @@ private Instant minTargetAndGcTime(Instant target) { return target; } - private void output(Instant scheduledTime) { + private Timer getClearedTimer() { + return Timer.cleared(userKey, dynamicTimerTag, Collections.singletonList(boundedWindow)); + } + + private Timer getTimerForTime(Instant scheduledTime) { if (outputTimestamp != null) { checkArgument( !outputTimestamp.isBefore(elementTimestampOrTimerHoldTimestamp), @@ -1857,20 +1943,13 @@ private void output(Instant scheduledTime) { outputTimestamp, windowExpiry); } - - TimerHandler consumer = (TimerHandler) timerHandlers.get(timerId); - try { - consumer.accept( - Timer.of( - userKey, - dynamicTimerTag, - Collections.singletonList(boundedWindow), - scheduledTime, - outputTimestamp, - paneInfo)); - } catch (Throwable t) { - throw UserCodeException.wrap(t); - } + return Timer.of( + userKey, + dynamicTimerTag, + Collections.singletonList(boundedWindow), + scheduledTime, + outputTimestamp, + paneInfo); } } @@ -2587,8 +2666,6 @@ private TimeDomain translateTimeDomain( return TimeDomain.EVENT_TIME; case PROCESSING_TIME: return TimeDomain.PROCESSING_TIME; - case SYNCHRONIZED_PROCESSING_TIME: - return TimeDomain.SYNCHRONIZED_PROCESSING_TIME; default: throw new IllegalArgumentException("Unknown time domain"); } diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java index 89b0a7c52ffe..dee0b3ddc67a 100644 --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java @@ -17,18 +17,25 @@ */ package org.apache.beam.fn.harness; +import java.util.Collections; import java.util.EnumMap; import java.util.List; +import java.util.Set; import java.util.concurrent.ExecutorService; import java.util.concurrent.TimeUnit; import java.util.function.Function; +import java.util.stream.Collectors; +import java.util.stream.StreamSupport; +import javax.annotation.Nullable; import org.apache.beam.fn.harness.control.AddHarnessIdInterceptor; import org.apache.beam.fn.harness.control.BeamFnControlClient; import org.apache.beam.fn.harness.control.FinalizeBundleHandler; +import org.apache.beam.fn.harness.control.HarnessMonitoringInfosInstructionHandler; import org.apache.beam.fn.harness.control.ProcessBundleHandler; import org.apache.beam.fn.harness.data.BeamFnDataGrpcClient; import org.apache.beam.fn.harness.logging.BeamFnLoggingClient; import org.apache.beam.fn.harness.state.BeamFnStateGrpcClientCache; +import org.apache.beam.fn.harness.status.BeamFnStatusClient; import org.apache.beam.fn.harness.stream.HarnessStreamObserverFactories; import org.apache.beam.model.fnexecution.v1.BeamFnApi; import org.apache.beam.model.fnexecution.v1.BeamFnApi.InstructionRequest; @@ -36,6 +43,9 @@ import org.apache.beam.model.fnexecution.v1.BeamFnControlGrpc; import org.apache.beam.model.pipeline.v1.Endpoints; import org.apache.beam.runners.core.construction.PipelineOptionsTranslation; +import org.apache.beam.runners.core.metrics.ExecutionStateSampler; +import org.apache.beam.runners.core.metrics.MetricsContainerImpl; +import org.apache.beam.runners.core.metrics.ShortIdMap; import org.apache.beam.sdk.extensions.gcp.options.GcsOptions; import org.apache.beam.sdk.fn.IdGenerator; import org.apache.beam.sdk.fn.IdGenerators; @@ -44,15 +54,18 @@ import org.apache.beam.sdk.fn.stream.OutboundObserverFactory; import org.apache.beam.sdk.function.ThrowingFunction; import 
org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.metrics.MetricsEnvironment; import org.apache.beam.sdk.options.ExperimentalOptions; import org.apache.beam.sdk.options.PipelineOptions; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.TextFormat; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ManagedChannel; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ManagedChannel; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.CacheBuilder; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.CacheLoader; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LoadingCache; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.MoreExecutors; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -80,7 +93,9 @@ public class FnHarness { private static final String HARNESS_ID = "HARNESS_ID"; private static final String CONTROL_API_SERVICE_DESCRIPTOR = "CONTROL_API_SERVICE_DESCRIPTOR"; private static final String LOGGING_API_SERVICE_DESCRIPTOR = "LOGGING_API_SERVICE_DESCRIPTOR"; + private static final String STATUS_API_SERVICE_DESCRIPTOR = "STATUS_API_SERVICE_DESCRIPTOR"; private static final String PIPELINE_OPTIONS = "PIPELINE_OPTIONS"; + private static final String RUNNER_CAPABILITIES = "RUNNER_CAPABILITIES"; private static final Logger LOG = LoggerFactory.getLogger(FnHarness.class); private static Endpoints.ApiServiceDescriptor getApiServiceDescriptor(String descriptor) @@ -104,6 +119,8 @@ public static void main(Function environmentVarGetter) throws Ex "Logging location %s%n", environmentVarGetter.apply(LOGGING_API_SERVICE_DESCRIPTOR)); System.out.format( "Control location %s%n", environmentVarGetter.apply(CONTROL_API_SERVICE_DESCRIPTOR)); + System.out.format( + "Status location %s%n", environmentVarGetter.apply(STATUS_API_SERVICE_DESCRIPTOR)); System.out.format("Pipeline options %s%n", environmentVarGetter.apply(PIPELINE_OPTIONS)); String id = environmentVarGetter.apply(HARNESS_ID); @@ -116,7 +133,23 @@ public static void main(Function environmentVarGetter) throws Ex Endpoints.ApiServiceDescriptor controlApiServiceDescriptor = getApiServiceDescriptor(environmentVarGetter.apply(CONTROL_API_SERVICE_DESCRIPTOR)); - main(id, options, loggingApiServiceDescriptor, controlApiServiceDescriptor); + Endpoints.ApiServiceDescriptor statusApiServiceDescriptor = + environmentVarGetter.apply(STATUS_API_SERVICE_DESCRIPTOR) == null + ? null + : getApiServiceDescriptor(environmentVarGetter.apply(STATUS_API_SERVICE_DESCRIPTOR)); + String runnerCapabilitesOrNull = environmentVarGetter.apply(RUNNER_CAPABILITIES); + Set runnerCapabilites = + runnerCapabilitesOrNull == null + ? 
Collections.emptySet() + : ImmutableSet.copyOf(runnerCapabilitesOrNull.split("\\s+")); + + main( + id, + options, + runnerCapabilites, + loggingApiServiceDescriptor, + controlApiServiceDescriptor, + statusApiServiceDescriptor); } /** @@ -125,15 +158,19 @@ public static void main(Function environmentVarGetter) throws Ex * * @param id Harness ID * @param options The options for this pipeline + * @param runnerCapabilites * @param loggingApiServiceDescriptor * @param controlApiServiceDescriptor + * @param statusApiServiceDescriptor * @throws Exception */ public static void main( String id, PipelineOptions options, + Set runnerCapabilites, Endpoints.ApiServiceDescriptor loggingApiServiceDescriptor, - Endpoints.ApiServiceDescriptor controlApiServiceDescriptor) + Endpoints.ApiServiceDescriptor controlApiServiceDescriptor, + @Nullable Endpoints.ApiServiceDescriptor statusApiServiceDescriptor) throws Exception { ManagedChannelFactory channelFactory; List experiments = options.as(ExperimentalOptions.class).getExperiments(); @@ -149,8 +186,10 @@ public static void main( main( id, options, + runnerCapabilites, loggingApiServiceDescriptor, controlApiServiceDescriptor, + statusApiServiceDescriptor, channelFactory, outboundObserverFactory); } @@ -161,8 +200,10 @@ public static void main( * * @param id Harness ID * @param options The options for this pipeline + * @param runnerCapabilites * @param loggingApiServiceDescriptor * @param controlApiServiceDescriptor + * @param statusApiServiceDescriptor * @param channelFactory * @param outboundObserverFactory * @throws Exception @@ -170,12 +211,15 @@ public static void main( public static void main( String id, PipelineOptions options, + Set runnerCapabilites, Endpoints.ApiServiceDescriptor loggingApiServiceDescriptor, Endpoints.ApiServiceDescriptor controlApiServiceDescriptor, + Endpoints.ApiServiceDescriptor statusApiServiceDescriptor, ManagedChannelFactory channelFactory, OutboundObserverFactory outboundObserverFactory) throws Exception { IdGenerator idGenerator = IdGenerators.decrementingLongs(); + ShortIdMap metricsShortIds = new ShortIdMap(); ExecutorService executorService = options.as(GcsOptions.class).getExecutorService(); // The logging client variable is not used per se, but during its lifetime (until close()) it // intercepts logging and sends it to the logging service. @@ -221,13 +265,28 @@ public BeamFnApi.ProcessBundleDescriptor load(String id) { } }); + MetricsEnvironment.setProcessWideContainer(MetricsContainerImpl.createProcessWideContainer()); + ProcessBundleHandler processBundleHandler = new ProcessBundleHandler( options, + runnerCapabilites, processBundleDescriptors::getUnchecked, beamFnDataMultiplexer, beamFnStateGrpcClientCache, - finalizeBundleHandler); + finalizeBundleHandler, + metricsShortIds); + + BeamFnStatusClient beamFnStatusClient = null; + if (statusApiServiceDescriptor != null) { + beamFnStatusClient = + new BeamFnStatusClient( + statusApiServiceDescriptor, + channelFactory::forDescriptor, + processBundleHandler.getBundleProcessorCache(), + options); + } + // TODO(BEAM-9729): Remove once runners no longer send this instruction. 
handlers.put( BeamFnApi.InstructionRequest.RequestCase.REGISTER, @@ -246,16 +305,58 @@ public BeamFnApi.ProcessBundleDescriptor load(String id) { handlers.put( BeamFnApi.InstructionRequest.RequestCase.PROCESS_BUNDLE_SPLIT, processBundleHandler::trySplit); - BeamFnControlClient control = - new BeamFnControlClient(id, controlStub, outboundObserverFactory, handlers); + handlers.put( + InstructionRequest.RequestCase.MONITORING_INFOS, + request -> + BeamFnApi.InstructionResponse.newBuilder() + .setMonitoringInfos( + BeamFnApi.MonitoringInfosMetadataResponse.newBuilder() + .putAllMonitoringInfo( + StreamSupport.stream( + request + .getMonitoringInfos() + .getMonitoringInfoIdList() + .spliterator(), + false) + .collect( + Collectors.toMap( + Function.identity(), metricsShortIds::get))))); + + HarnessMonitoringInfosInstructionHandler processWideHandler = + new HarnessMonitoringInfosInstructionHandler(metricsShortIds); + handlers.put( + InstructionRequest.RequestCase.HARNESS_MONITORING_INFOS, + processWideHandler::harnessMonitoringInfos); JvmInitializers.runBeforeProcessing(options); + String samplingPeriodMills = + ExperimentalOptions.getExperimentValue( + options, ExperimentalOptions.STATE_SAMPLING_PERIOD_MILLIS); + if (samplingPeriodMills != null) { + ExecutionStateSampler.setSamplingPeriod(Integer.parseInt(samplingPeriodMills)); + } + ExecutionStateSampler.instance().start(); + LOG.info("Entering instruction processing loop"); - control.processInstructionRequests(executorService); + + // The control client immediately dispatches requests to an executor so we execute on the + // direct executor. If we created separate channels for different stubs we could use + // directExecutor() when building the channel. + BeamFnControlClient control = + new BeamFnControlClient( + controlStub.withExecutor(MoreExecutors.directExecutor()), + outboundObserverFactory, + executorService, + handlers); + control.waitForTermination(); + if (beamFnStatusClient != null) { + beamFnStatusClient.close(); + } processBundleHandler.shutdown(); } finally { System.out.println("Shutting SDK harness down."); + ExecutionStateSampler.instance().stop(); executorService.shutdown(); } } diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/MapFnRunners.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/MapFnRunners.java index 417e6d6dd3fa..a5b5b6b00eb6 100644 --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/MapFnRunners.java +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/MapFnRunners.java @@ -30,14 +30,17 @@ import org.apache.beam.fn.harness.data.PTransformFunctionRegistry; import org.apache.beam.fn.harness.state.BeamFnStateClient; import org.apache.beam.model.pipeline.v1.RunnerApi; +import org.apache.beam.model.pipeline.v1.RunnerApi.Components; import org.apache.beam.model.pipeline.v1.RunnerApi.PCollection; import org.apache.beam.model.pipeline.v1.RunnerApi.PTransform; +import org.apache.beam.runners.core.construction.RehydratedComponents; import org.apache.beam.sdk.fn.data.FnDataReceiver; import org.apache.beam.sdk.function.ThrowingFunction; import org.apache.beam.sdk.function.ThrowingRunnable; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.transforms.DoFn.BundleFinalizer; import org.apache.beam.sdk.util.WindowedValue; +import org.apache.beam.sdk.util.WindowedValue.WindowedValueCoder; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; /** @@ -129,12 +132,34 @@ public Mapper createRunnerForPTransform( 
Mapper mapper = mapperFactory.create(pTransformId, pTransform, consumer); + RehydratedComponents components = + RehydratedComponents.forComponents(Components.newBuilder().putAllCoders(coders).build()); + String pCollectionId = Iterables.getOnlyElement(pTransform.getInputsMap().values()); pCollectionConsumerRegistry.register( - Iterables.getOnlyElement(pTransform.getInputsMap().values()), + pCollectionId, pTransformId, - (FnDataReceiver) (FnDataReceiver>) mapper::map); + (FnDataReceiver) (FnDataReceiver>) mapper::map, + getValueCoder(components, pCollections, pCollectionId)); return mapper; } + + private org.apache.beam.sdk.coders.Coder getValueCoder( + RehydratedComponents components, + Map pCollections, + String pCollectionId) + throws IOException { + if (!pCollections.containsKey(pCollectionId)) { + throw new IllegalArgumentException( + String.format("Missing PCollection for id: %s", pCollectionId)); + } + + org.apache.beam.sdk.coders.Coder coder = + components.getCoder(pCollections.get(pCollectionId).getCoderId()); + if (coder instanceof WindowedValueCoder) { + coder = ((WindowedValueCoder) coder).getValueCoder(); + } + return coder; + } } @FunctionalInterface diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/WindowMergingFnRunner.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/WindowMergingFnRunner.java index edf0e0aaedcd..e7b169e06e81 100644 --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/WindowMergingFnRunner.java +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/WindowMergingFnRunner.java @@ -34,6 +34,7 @@ import org.apache.beam.sdk.transforms.windowing.WindowFn.MergeContext; import org.apache.beam.sdk.values.KV; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Sets; /** @@ -154,7 +155,13 @@ KV, Iterable>>>> mergeWindows( for (KV> mergedWindow : mergedWindows) { currentWindows.removeAll(mergedWindow.getValue()); } - return KV.of(windowsToMerge.getKey(), KV.of(currentWindows, (Iterable) mergedWindows)); + KV, Iterable>>>> result = + KV.of( + windowsToMerge.getKey(), + KV.of(Sets.newHashSet(currentWindows), (Iterable) Lists.newArrayList(mergedWindows))); + currentWindows.clear(); + mergedWindows.clear(); + return result; } } } diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/AddHarnessIdInterceptor.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/AddHarnessIdInterceptor.java index f94971648d2c..3d1117c6369a 100644 --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/AddHarnessIdInterceptor.java +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/AddHarnessIdInterceptor.java @@ -19,10 +19,10 @@ import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ClientInterceptor; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Metadata; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Metadata.Key; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.MetadataUtils; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ClientInterceptor; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Metadata; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Metadata.Key; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.MetadataUtils; /** A 
{@link ClientInterceptor} that attaches a provided SDK Harness ID to outgoing messages. */ public class AddHarnessIdInterceptor { diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/BeamFnControlClient.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/BeamFnControlClient.java index bb5eeee9458b..90b5fed4ae85 100644 --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/BeamFnControlClient.java +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/BeamFnControlClient.java @@ -20,21 +20,17 @@ import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Throwables.getStackTraceAsString; import java.util.EnumMap; -import java.util.Objects; -import java.util.concurrent.BlockingDeque; import java.util.concurrent.CompletableFuture; import java.util.concurrent.ExecutionException; import java.util.concurrent.Executor; -import java.util.concurrent.LinkedBlockingDeque; import org.apache.beam.model.fnexecution.v1.BeamFnApi; import org.apache.beam.model.fnexecution.v1.BeamFnControlGrpc; import org.apache.beam.model.pipeline.v1.Endpoints.ApiServiceDescriptor; import org.apache.beam.sdk.fn.channel.ManagedChannelFactory; import org.apache.beam.sdk.fn.stream.OutboundObserverFactory; import org.apache.beam.sdk.function.ThrowingFunction; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Status; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.Uninterruptibles; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Status; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -52,13 +48,9 @@ * client will not produce any more {@link BeamFnApi.InstructionRequest}s. 
*/ public class BeamFnControlClient { - private static final String FAKE_INSTRUCTION_ID = "FAKE_INSTRUCTION_ID"; private static final Logger LOG = LoggerFactory.getLogger(BeamFnControlClient.class); - private static final BeamFnApi.InstructionRequest POISON_PILL = - BeamFnApi.InstructionRequest.newBuilder().setInstructionId(FAKE_INSTRUCTION_ID).build(); private final StreamObserver outboundObserver; - private final BlockingDeque bufferedInstructions; private final EnumMap< BeamFnApi.InstructionRequest.RequestCase, ThrowingFunction> @@ -66,32 +58,32 @@ public class BeamFnControlClient { private final CompletableFuture onFinish; public BeamFnControlClient( - String id, ApiServiceDescriptor apiServiceDescriptor, ManagedChannelFactory channelFactory, OutboundObserverFactory outboundObserverFactory, + Executor executor, EnumMap< BeamFnApi.InstructionRequest.RequestCase, ThrowingFunction> handlers) { this( - id, BeamFnControlGrpc.newStub(channelFactory.forDescriptor(apiServiceDescriptor)), outboundObserverFactory, + executor, handlers); } public BeamFnControlClient( - String id, BeamFnControlGrpc.BeamFnControlStub controlStub, OutboundObserverFactory outboundObserverFactory, + Executor executor, EnumMap< BeamFnApi.InstructionRequest.RequestCase, ThrowingFunction> handlers) { - this.bufferedInstructions = new LinkedBlockingDeque<>(); this.outboundObserver = - outboundObserverFactory.outboundObserverFor(controlStub::control, new InboundObserver()); + outboundObserverFactory.outboundObserverFor( + controlStub::control, new InboundObserver(executor)); this.handlers = handlers; this.onFinish = new CompletableFuture<>(); } @@ -103,65 +95,40 @@ public BeamFnControlClient( * termination. */ private class InboundObserver implements StreamObserver { + private final Executor executor; + + InboundObserver(Executor executorService) { + this.executor = executorService; + } @Override public void onNext(BeamFnApi.InstructionRequest value) { LOG.debug("Received InstructionRequest {}", value); - Uninterruptibles.putUninterruptibly(bufferedInstructions, value); + executor.execute( + () -> { + try { + BeamFnApi.InstructionResponse response = delegateOnInstructionRequestType(value); + sendInstructionResponse(response); + } catch (Error e) { + sendErrorResponse(e); + throw e; + } + }); } @Override public void onError(Throwable t) { - placePoisonPillIntoQueue(); onFinish.completeExceptionally(t); } @Override public void onCompleted() { - placePoisonPillIntoQueue(); onFinish.complete(COMPLETED); } - - /** - * This method emulates {@link Uninterruptibles#putUninterruptibly} but placing the element at - * the front of the queue. - * - *

    We place the poison pill at the front of the queue because if the server shutdown, any - * remaining instructions can be discarded. - */ - private void placePoisonPillIntoQueue() { - while (true) { - try { - bufferedInstructions.putFirst(POISON_PILL); - return; - } catch (InterruptedException e) { - // Ignored until we place the poison pill into the queue - } - } - } } - /** - * Note that this method continuously submits work to the supplied executor until the Beam Fn - * Control server hangs up or fails exceptionally. - */ - public void processInstructionRequests(Executor executor) - throws InterruptedException, ExecutionException { - BeamFnApi.InstructionRequest request; - while (!Objects.equals((request = bufferedInstructions.take()), POISON_PILL)) { - BeamFnApi.InstructionRequest currentRequest = request; - executor.execute( - () -> { - try { - BeamFnApi.InstructionResponse response = - delegateOnInstructionRequestType(currentRequest); - sendInstructionResponse(response); - } catch (Error e) { - sendErrorResponse(e); - throw e; - } - }); - } + /** This method blocks until the control stream has completed. */ + public void waitForTermination() throws InterruptedException, ExecutionException { onFinish.get(); } @@ -204,7 +171,6 @@ private void sendErrorResponse(Error e) { Status.INTERNAL .withDescription(String.format("%s: %s", e.getClass().getName(), e.getMessage())) .asException()); - // TODO: Should this clear out the instruction request queue? } private BeamFnApi.InstructionResponse.Builder missingHandler( diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/HarnessMonitoringInfosInstructionHandler.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/HarnessMonitoringInfosInstructionHandler.java new file mode 100644 index 000000000000..127dc48fcb47 --- /dev/null +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/HarnessMonitoringInfosInstructionHandler.java @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.fn.harness.control; + +import org.apache.beam.model.fnexecution.v1.BeamFnApi; +import org.apache.beam.runners.core.metrics.MetricsContainerImpl; +import org.apache.beam.runners.core.metrics.ShortIdMap; +import org.apache.beam.sdk.metrics.MetricsContainer; +import org.apache.beam.sdk.metrics.MetricsEnvironment; + +/** + * Processes {@link BeamFnApi.InstructionRequest}'s {@link BeamFnApi.HarnessMonitoringInfosResponse} + * + *

    These instructions are not associated with the currently processed bundles. They return + * MonitoringInfos payloads for "process-wide" metrics, which return metric values calculated over + * the life of the process. + */ +public class HarnessMonitoringInfosInstructionHandler { + + private final ShortIdMap metricsShortIds; + + public HarnessMonitoringInfosInstructionHandler(ShortIdMap metricsShortIds) { + this.metricsShortIds = metricsShortIds; + } + + public BeamFnApi.InstructionResponse.Builder harnessMonitoringInfos( + BeamFnApi.InstructionRequest request) { + BeamFnApi.HarnessMonitoringInfosResponse.Builder response = + BeamFnApi.HarnessMonitoringInfosResponse.newBuilder(); + MetricsContainer container = MetricsEnvironment.getProcessWideContainer(); + if (container != null && container instanceof MetricsContainerImpl) { + response.putAllMonitoringData( + ((MetricsContainerImpl) container).getMonitoringData(this.metricsShortIds)); + } + return BeamFnApi.InstructionResponse.newBuilder().setHarnessMonitoringInfos(response); + } +} diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/ProcessBundleHandler.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/ProcessBundleHandler.java index e93c67b9e128..9cceb721542c 100644 --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/ProcessBundleHandler.java +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/ProcessBundleHandler.java @@ -20,9 +20,11 @@ import com.google.auto.value.AutoValue; import java.io.Closeable; import java.io.IOException; +import java.time.Duration; import java.util.ArrayList; import java.util.Collection; import java.util.Collections; +import java.util.HashMap; import java.util.HashSet; import java.util.List; import java.util.Map; @@ -46,8 +48,10 @@ import org.apache.beam.fn.harness.data.PCollectionConsumerRegistry; import org.apache.beam.fn.harness.data.PTransformFunctionRegistry; import org.apache.beam.fn.harness.data.QueueingBeamFnDataClient; +import org.apache.beam.fn.harness.logging.BeamFnLoggingMDC; import org.apache.beam.fn.harness.state.BeamFnStateClient; import org.apache.beam.fn.harness.state.BeamFnStateGrpcClientCache; +import org.apache.beam.fn.harness.state.CachingBeamFnStateClient; import org.apache.beam.model.fnexecution.v1.BeamFnApi; import org.apache.beam.model.fnexecution.v1.BeamFnApi.ProcessBundleDescriptor; import org.apache.beam.model.fnexecution.v1.BeamFnApi.ProcessBundleRequest; @@ -55,30 +59,38 @@ import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateRequest.Builder; import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateResponse; import org.apache.beam.model.pipeline.v1.Endpoints.ApiServiceDescriptor; +import org.apache.beam.model.pipeline.v1.MetricsApi; import org.apache.beam.model.pipeline.v1.RunnerApi; import org.apache.beam.model.pipeline.v1.RunnerApi.Coder; import org.apache.beam.model.pipeline.v1.RunnerApi.PCollection; import org.apache.beam.model.pipeline.v1.RunnerApi.PTransform; import org.apache.beam.model.pipeline.v1.RunnerApi.WindowingStrategy; +import org.apache.beam.runners.core.construction.BeamUrns; import org.apache.beam.runners.core.construction.PTransformTranslation; import org.apache.beam.runners.core.construction.Timer; import org.apache.beam.runners.core.metrics.ExecutionStateSampler; import org.apache.beam.runners.core.metrics.ExecutionStateTracker; import org.apache.beam.runners.core.metrics.MetricsContainerStepMap; +import org.apache.beam.runners.core.metrics.ShortIdMap; import 
org.apache.beam.sdk.fn.data.FnDataReceiver; import org.apache.beam.sdk.fn.data.LogicalEndpoint; import org.apache.beam.sdk.function.ThrowingRunnable; +import org.apache.beam.sdk.options.ExperimentalOptions; import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.options.StreamingOptions; import org.apache.beam.sdk.transforms.DoFn.BundleFinalizer; import org.apache.beam.sdk.util.common.ReflectHelpers; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Message; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.TextFormat; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Message; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.CacheBuilder; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.CacheLoader; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LoadingCache; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.HashMultimap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.SetMultimap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Sets; import org.joda.time.Instant; @@ -113,6 +125,8 @@ public class ProcessBundleHandler { private static final String DATA_OUTPUT_URN = "beam:runner:sink:v1"; public static final String JAVA_SOURCE_URN = "beam:source:java:0.1"; + private static final int DATA_QUEUE_SIZE = 1000; + private static final Logger LOG = LoggerFactory.getLogger(ProcessBundleHandler.class); @VisibleForTesting static final Map REGISTERED_RUNNER_FACTORIES; @@ -130,27 +144,52 @@ public class ProcessBundleHandler { REGISTERED_RUNNER_FACTORIES = builder.build(); } + // Creates a new map of state data for newly encountered state keys + private CacheLoader< + BeamFnApi.StateKey, + Map> + stateKeyMapCacheLoader = + new CacheLoader< + BeamFnApi.StateKey, + Map>() { + @Override + public Map load( + BeamFnApi.StateKey key) { + return new HashMap<>(); + } + }; + private final PipelineOptions options; private final Function fnApiRegistry; private final BeamFnDataClient beamFnDataClient; private final BeamFnStateGrpcClientCache beamFnStateGrpcClientCache; + private final LoadingCache< + BeamFnApi.StateKey, + Map> + stateCache; private final FinalizeBundleHandler finalizeBundleHandler; + private final ShortIdMap shortIds; + private final boolean runnerAcceptsShortIds; private final Map urnToPTransformRunnerFactoryMap; private final PTransformRunnerFactory defaultPTransformRunnerFactory; @VisibleForTesting final BundleProcessorCache bundleProcessorCache; public ProcessBundleHandler( PipelineOptions options, + Set runnerCapabilities, Function fnApiRegistry, BeamFnDataClient beamFnDataClient, BeamFnStateGrpcClientCache beamFnStateGrpcClientCache, - FinalizeBundleHandler finalizeBundleHandler) { + FinalizeBundleHandler finalizeBundleHandler, + ShortIdMap shortIds) { this( options, + runnerCapabilities, fnApiRegistry, beamFnDataClient, 
beamFnStateGrpcClientCache, finalizeBundleHandler, + shortIds, REGISTERED_RUNNER_FACTORIES, new BundleProcessorCache()); } @@ -158,17 +197,24 @@ public ProcessBundleHandler( @VisibleForTesting ProcessBundleHandler( PipelineOptions options, + Set runnerCapabilities, Function fnApiRegistry, BeamFnDataClient beamFnDataClient, BeamFnStateGrpcClientCache beamFnStateGrpcClientCache, FinalizeBundleHandler finalizeBundleHandler, + ShortIdMap shortIds, Map urnToPTransformRunnerFactoryMap, BundleProcessorCache bundleProcessorCache) { this.options = options; this.fnApiRegistry = fnApiRegistry; this.beamFnDataClient = beamFnDataClient; this.beamFnStateGrpcClientCache = beamFnStateGrpcClientCache; + this.stateCache = CacheBuilder.newBuilder().build(stateKeyMapCacheLoader); this.finalizeBundleHandler = finalizeBundleHandler; + this.shortIds = shortIds; + this.runnerAcceptsShortIds = + runnerCapabilities.contains( + BeamUrns.getUrn(RunnerApi.StandardRunnerProtocols.Enum.MONITORING_INFO_SHORT_IDS)); this.urnToPTransformRunnerFactoryMap = urnToPTransformRunnerFactoryMap; this.defaultPTransformRunnerFactory = new UnknownPTransformRunnerFactory(urnToPTransformRunnerFactoryMap.keySet()); @@ -288,62 +334,65 @@ public BeamFnApi.InstructionResponse.Builder processBundle(BeamFnApi.Instruction throw new RuntimeException(e); } }); - PTransformFunctionRegistry startFunctionRegistry = bundleProcessor.getStartFunctionRegistry(); - PTransformFunctionRegistry finishFunctionRegistry = bundleProcessor.getFinishFunctionRegistry(); - ExecutionStateTracker stateTracker = bundleProcessor.getStateTracker(); - QueueingBeamFnDataClient queueingClient = bundleProcessor.getQueueingClient(); - - try (HandleStateCallsForBundle beamFnStateClient = bundleProcessor.getBeamFnStateClient()) { - try (Closeable closeTracker = stateTracker.activate()) { - // Already in reverse topological order so we don't need to do anything. - for (ThrowingRunnable startFunction : startFunctionRegistry.getFunctions()) { - LOG.debug("Starting function {}", startFunction); - startFunction.run(); - } + try { + BeamFnLoggingMDC.setInstructionId(request.getInstructionId()); + PTransformFunctionRegistry startFunctionRegistry = bundleProcessor.getStartFunctionRegistry(); + PTransformFunctionRegistry finishFunctionRegistry = + bundleProcessor.getFinishFunctionRegistry(); + ExecutionStateTracker stateTracker = bundleProcessor.getStateTracker(); + QueueingBeamFnDataClient queueingClient = bundleProcessor.getQueueingClient(); + + try (HandleStateCallsForBundle beamFnStateClient = bundleProcessor.getBeamFnStateClient()) { + try (Closeable closeTracker = stateTracker.activate()) { + // Already in reverse topological order so we don't need to do anything. + for (ThrowingRunnable startFunction : startFunctionRegistry.getFunctions()) { + LOG.debug("Starting function {}", startFunction); + startFunction.run(); + } - queueingClient.drainAndBlock(); + queueingClient.drainAndBlock(); - // Need to reverse this since we want to call finish in topological order. - for (ThrowingRunnable finishFunction : - Lists.reverse(finishFunctionRegistry.getFunctions())) { - LOG.debug("Finishing function {}", finishFunction); - finishFunction.run(); + // Need to reverse this since we want to call finish in topological order. + for (ThrowingRunnable finishFunction : + Lists.reverse(finishFunctionRegistry.getFunctions())) { + LOG.debug("Finishing function {}", finishFunction); + finishFunction.run(); + } } - } - // Add all checkpointed residuals to the response. 
- response.addAllResidualRoots(bundleProcessor.getSplitListener().getResidualRoots()); - - // TODO(BEAM-6597): This should be reporting monitoring infos using the short id system. - // Get start bundle Execution Time Metrics. - response.addAllMonitoringInfos( - bundleProcessor.getStartFunctionRegistry().getExecutionTimeMonitoringInfos()); - // Get process bundle Execution Time Metrics. - response.addAllMonitoringInfos( - bundleProcessor.getpCollectionConsumerRegistry().getExecutionTimeMonitoringInfos()); - // Get finish bundle Execution Time Metrics. - response.addAllMonitoringInfos( - bundleProcessor.getFinishFunctionRegistry().getExecutionTimeMonitoringInfos()); - // Extract MonitoringInfos that come from the metrics container registry. - response.addAllMonitoringInfos( - bundleProcessor.getMetricsContainerRegistry().getMonitoringInfos()); - // Add any additional monitoring infos that the "runners" report explicitly. - for (ProgressRequestCallback progressRequestCallback : - bundleProcessor.getProgressRequestCallbacks()) { - response.addAllMonitoringInfos(progressRequestCallback.getMonitoringInfos()); - } + // Add all checkpointed residuals to the response. + response.addAllResidualRoots(bundleProcessor.getSplitListener().getResidualRoots()); + + // Add all metrics to the response. + Map monitoringData = monitoringData(bundleProcessor); + if (runnerAcceptsShortIds) { + response.putAllMonitoringData(monitoringData); + } else { + for (Map.Entry metric : monitoringData.entrySet()) { + response.addMonitoringInfos( + shortIds.get(metric.getKey()).toBuilder().setPayload(metric.getValue())); + } + } - if (!bundleProcessor.getBundleFinalizationCallbackRegistrations().isEmpty()) { - finalizeBundleHandler.registerCallbacks( - bundleProcessor.getInstructionId(), - ImmutableList.copyOf(bundleProcessor.getBundleFinalizationCallbackRegistrations())); - response.setRequiresFinalization(true); + if (!bundleProcessor.getBundleFinalizationCallbackRegistrations().isEmpty()) { + finalizeBundleHandler.registerCallbacks( + bundleProcessor.getInstructionId(), + ImmutableList.copyOf(bundleProcessor.getBundleFinalizationCallbackRegistrations())); + response.setRequiresFinalization(true); + } } + // Mark the bundle processor as re-usable. bundleProcessorCache.release( request.getProcessBundle().getProcessBundleDescriptorId(), bundleProcessor); + return BeamFnApi.InstructionResponse.newBuilder().setProcessBundle(response); + } catch (Exception e) { + // Make sure we clean-up from the active set of bundle processors. + bundleProcessorCache.discard(bundleProcessor); + throw e; + } finally { + BeamFnLoggingMDC.setInstructionId(null); } - return BeamFnApi.InstructionResponse.newBuilder().setProcessBundle(response); } public BeamFnApi.InstructionResponse.Builder progress(BeamFnApi.InstructionRequest request) @@ -361,29 +410,46 @@ public BeamFnApi.InstructionResponse.Builder progress(BeamFnApi.InstructionReque .setProcessBundleProgress(BeamFnApi.ProcessBundleProgressResponse.getDefaultInstance()); } - // TODO(BEAM-6597): This should really only be reporting monitoring infos where the data - // changed - // and we should be using the short id system. 
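// Illustrative sketch, not part of the change: both processBundle() and progress() now report
// metrics as a map of monitoring-info short id to encoded payload when the runner declared the
// MONITORING_INFO_SHORT_IDS capability, and fall back to full MonitoringInfo protos otherwise.
// The helper name attachMetrics is hypothetical; ByteString is the vendored protobuf class
// imported above, and the calls mirror the surrounding code.
void attachMetrics(
    BeamFnApi.ProcessBundleResponse.Builder response,
    Map<String, ByteString> monitoringData,
    ShortIdMap shortIds,
    boolean runnerAcceptsShortIds) {
  if (runnerAcceptsShortIds) {
    // Compact form: short id -> payload; the runner resolves the short ids later.
    response.putAllMonitoringData(monitoringData);
  } else {
    // Legacy form: re-inflate each entry into a full MonitoringInfo carrying its payload.
    for (Map.Entry<String, ByteString> metric : monitoringData.entrySet()) {
      response.addMonitoringInfos(
          shortIds.get(metric.getKey()).toBuilder().setPayload(metric.getValue()));
    }
  }
}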
+ Map monitoringData = monitoringData(bundleProcessor); + if (runnerAcceptsShortIds) { + response.putAllMonitoringData(monitoringData); + } else { + for (Map.Entry metric : monitoringData.entrySet()) { + response.addMonitoringInfos( + shortIds.get(metric.getKey()).toBuilder().setPayload(metric.getValue())); + } + } + return BeamFnApi.InstructionResponse.newBuilder().setProcessBundleProgress(response); + } + + private ImmutableMap monitoringData(BundleProcessor bundleProcessor) + throws Exception { + ImmutableMap.Builder result = ImmutableMap.builder(); // Get start bundle Execution Time Metrics. - response.addAllMonitoringInfos( - bundleProcessor.getStartFunctionRegistry().getExecutionTimeMonitoringInfos()); + result.putAll( + bundleProcessor.getStartFunctionRegistry().getExecutionTimeMonitoringData(shortIds)); // Get process bundle Execution Time Metrics. - response.addAllMonitoringInfos( - bundleProcessor.getpCollectionConsumerRegistry().getExecutionTimeMonitoringInfos()); + result.putAll( + bundleProcessor.getpCollectionConsumerRegistry().getExecutionTimeMonitoringData(shortIds)); // Get finish bundle Execution Time Metrics. - response.addAllMonitoringInfos( - bundleProcessor.getFinishFunctionRegistry().getExecutionTimeMonitoringInfos()); - // Extract all other MonitoringInfos other than the execution time monitoring infos. - response.addAllMonitoringInfos( - bundleProcessor.getMetricsContainerRegistry().getMonitoringInfos()); + result.putAll( + bundleProcessor.getFinishFunctionRegistry().getExecutionTimeMonitoringData(shortIds)); + // Extract MonitoringInfos that come from the metrics container registry. + result.putAll(bundleProcessor.getMetricsContainerRegistry().getMonitoringData(shortIds)); // Add any additional monitoring infos that the "runners" report explicitly. for (ProgressRequestCallback progressRequestCallback : bundleProcessor.getProgressRequestCallbacks()) { - response.addAllMonitoringInfos(progressRequestCallback.getMonitoringInfos()); + // TODO(BEAM-6597): Plumb reporting monitoring infos using the short id system upstream. + for (MetricsApi.MonitoringInfo monitoringInfo : + progressRequestCallback.getMonitoringInfos()) { + ByteString payload = monitoringInfo.getPayload(); + String shortId = + shortIds.getOrCreateShortId(monitoringInfo.toBuilder().clearPayload().build()); + result.put(shortId, payload); + } } - - return BeamFnApi.InstructionResponse.newBuilder().setProcessBundleProgress(response); + return result.build(); } /** Splits an active bundle. */ @@ -417,7 +483,8 @@ private BundleProcessor createBundleProcessor( // Note: We must create one instance of the QueueingBeamFnDataClient as it is designed to // handle the life of a bundle. It will insert elements onto a queue and drain them off so all // process() calls will execute on this thread when queueingClient.drainAndBlock() is called. - QueueingBeamFnDataClient queueingClient = new QueueingBeamFnDataClient(this.beamFnDataClient); + QueueingBeamFnDataClient queueingClient = + new QueueingBeamFnDataClient(this.beamFnDataClient, DATA_QUEUE_SIZE); BeamFnApi.ProcessBundleDescriptor bundleDescriptor = (BeamFnApi.ProcessBundleDescriptor) fnApiRegistry.apply(bundleId); @@ -448,14 +515,30 @@ private BundleProcessor createBundleProcessor( } } - // Instantiate a State API call handler depending on whether a State ApiServiceDescriptor - // was specified. - HandleStateCallsForBundle beamFnStateClient = - bundleDescriptor.hasStateApiServiceDescriptor() - ? 
new BlockTillStateCallsFinish( - beamFnStateGrpcClientCache.forApiServiceDescriptor( - bundleDescriptor.getStateApiServiceDescriptor())) - : new FailAllStateCallsForBundle(processBundleRequest); + // Instantiate a State API call handler depending on whether a State ApiServiceDescriptor was + // specified. + HandleStateCallsForBundle beamFnStateClient; + if (bundleDescriptor.hasStateApiServiceDescriptor()) { + BeamFnStateClient underlyingClient = + beamFnStateGrpcClientCache.forApiServiceDescriptor( + bundleDescriptor.getStateApiServiceDescriptor()); + + // If pipeline is batch, use a CachingBeamFnStateClient to store state responses. + // Once streaming is supported use CachingBeamFnStateClient for both. + // TODO(BEAM-10212): Remove experiment once cross bundle caching is used by default + if (ExperimentalOptions.hasExperiment(options, "cross_bundle_caching")) { + beamFnStateClient = + new BlockTillStateCallsFinish( + options.as(StreamingOptions.class).isStreaming() + ? underlyingClient + : new CachingBeamFnStateClient( + underlyingClient, stateCache, processBundleRequest.getCacheTokensList())); + } else { + beamFnStateClient = new BlockTillStateCallsFinish(underlyingClient); + } + } else { + beamFnStateClient = new FailAllStateCallsForBundle(processBundleRequest); + } // Instantiate a Timer client registration handler depending on whether a Timer // ApiServiceDescriptor was specified. @@ -530,10 +613,15 @@ public void afterBundleCommit(Instant callbackExpiry, Callback callback) { return bundleProcessor; } + public BundleProcessorCache getBundleProcessorCache() { + return bundleProcessorCache; + } + /** A cache for {@link BundleProcessor}s. */ public static class BundleProcessorCache { - private final Map> cachedBundleProcessors; + private final LoadingCache> + cachedBundleProcessors; private final Map activeBundleProcessors; @Override @@ -542,14 +630,36 @@ public int hashCode() { } BundleProcessorCache() { - this.cachedBundleProcessors = Maps.newConcurrentMap(); + this.cachedBundleProcessors = + CacheBuilder.newBuilder() + .expireAfterAccess(Duration.ofMinutes(1L)) + .removalListener( + removalNotification -> { + ((ConcurrentLinkedQueue) removalNotification.getValue()) + .forEach( + bundleProcessor -> { + bundleProcessor.shutdown(); + }); + }) + .build( + new CacheLoader>() { + @Override + public ConcurrentLinkedQueue load(String s) throws Exception { + return new ConcurrentLinkedQueue<>(); + } + }); // We specifically use a weak hash map so that references will automatically go out of scope // and not need to be freed explicitly from the cache. this.activeBundleProcessors = Collections.synchronizedMap(new WeakHashMap<>()); } + @VisibleForTesting Map> getCachedBundleProcessors() { - return cachedBundleProcessors; + return ImmutableMap.copyOf(cachedBundleProcessors.asMap()); + } + + public Map getActiveBundleProcessors() { + return ImmutableMap.copyOf(activeBundleProcessors); } /** @@ -565,8 +675,7 @@ BundleProcessor get( String instructionId, Supplier bundleProcessorSupplier) { ConcurrentLinkedQueue bundleProcessors = - cachedBundleProcessors.computeIfAbsent( - bundleDescriptorId, descriptorId -> new ConcurrentLinkedQueue<>()); + cachedBundleProcessors.getUnchecked(bundleDescriptorId); BundleProcessor bundleProcessor = bundleProcessors.poll(); if (bundleProcessor == null) { bundleProcessor = bundleProcessorSupplier.get(); @@ -581,13 +690,13 @@ BundleProcessor get( * Finds an active bundle processor for the specified {@code instructionId} or null if one could * not be found. 
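// Illustrative sketch, not part of the change: the CachingBeamFnStateClient path above is only
// taken for non-streaming pipelines when the "cross_bundle_caching" experiment is set. In a
// driver program that would normally be enabled through the standard experiments flag, e.g.
// (hypothetical snippet, using org.apache.beam.sdk.options):
PipelineOptions options =
    PipelineOptionsFactory.fromArgs("--experiments=cross_bundle_caching").create();
// ExperimentalOptions.hasExperiment(options, "cross_bundle_caching") then returns true, so the
// harness wraps the state client in a CachingBeamFnStateClient for batch bundles.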
*/ - BundleProcessor find(String instructionId) { + public BundleProcessor find(String instructionId) { return activeBundleProcessors.get(instructionId); } /** - * Add a {@link BundleProcessor} to cache. The {@link BundleProcessor} will be reset before - * being added to the cache and will be marked as inactive. + * Add a {@link BundleProcessor} to cache. The {@link BundleProcessor} will be marked as + * inactive and reset before being added to the cache. */ void release(String bundleDescriptorId, BundleProcessor bundleProcessor) { activeBundleProcessors.remove(bundleProcessor.getInstructionId()); @@ -602,18 +711,14 @@ void release(String bundleDescriptorId, BundleProcessor bundleProcessor) { } } + /** Discard an active {@link BundleProcessor} instead of being re-used. */ + void discard(BundleProcessor bundleProcessor) { + activeBundleProcessors.remove(bundleProcessor.getInstructionId()); + } + /** Shutdown all the cached {@link BundleProcessor}s, running the tearDown() functions. */ void shutdown() throws Exception { - for (ConcurrentLinkedQueue bundleProcessors : - cachedBundleProcessors.values()) { - for (BundleProcessor bundleProcessor : bundleProcessors) { - for (ThrowingRunnable tearDownFunction : bundleProcessor.getTearDownFunctions()) { - LOG.debug("Tearing down function {}", tearDownFunction); - tearDownFunction.run(); - } - } - } - cachedBundleProcessors.clear(); + cachedBundleProcessors.invalidateAll(); } } @@ -669,7 +774,7 @@ public static BundleProcessor create( abstract MetricsContainerStepMap getMetricsContainerRegistry(); - abstract ExecutionStateTracker getStateTracker(); + public abstract ExecutionStateTracker getStateTracker(); abstract HandleStateCallsForBundle getBeamFnStateClient(); @@ -679,27 +784,42 @@ public static BundleProcessor create( abstract Collection getChannelRoots(); - String getInstructionId() { + synchronized String getInstructionId() { return this.instructionId; } - void setInstructionId(String instructionId) { + synchronized void setInstructionId(String instructionId) { this.instructionId = instructionId; } void reset() throws Exception { + setInstructionId(null); getStartFunctionRegistry().reset(); getFinishFunctionRegistry().reset(); getSplitListener().clear(); getpCollectionConsumerRegistry().reset(); getMetricsContainerRegistry().reset(); getStateTracker().reset(); - ExecutionStateSampler.instance().reset(); + getQueueingClient().reset(); getBundleFinalizationCallbackRegistrations().clear(); for (ThrowingRunnable resetFunction : getResetFunctions()) { resetFunction.run(); } } + + void shutdown() { + for (ThrowingRunnable tearDownFunction : getTearDownFunctions()) { + LOG.debug("Tearing down function {}", tearDownFunction); + try { + tearDownFunction.run(); + } catch (Exception e) { + LOG.error( + "Exceptions are thrown from DoFn.teardown method. 
Note that it will not fail the" + + " pipeline execution,", + e); + } + } + } } /** diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/BeamFnDataClient.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/BeamFnDataClient.java index 1d0ad35eed29..af5eb61935ee 100644 --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/BeamFnDataClient.java +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/BeamFnDataClient.java @@ -25,7 +25,7 @@ import org.apache.beam.sdk.fn.data.FnDataReceiver; import org.apache.beam.sdk.fn.data.InboundDataClient; import org.apache.beam.sdk.fn.data.LogicalEndpoint; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; /** * The {@link BeamFnDataClient} is able to forward inbound elements to a {@link FnDataReceiver} and diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/BeamFnDataGrpcClient.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/BeamFnDataGrpcClient.java index f9242613c689..01812288247d 100644 --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/BeamFnDataGrpcClient.java +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/BeamFnDataGrpcClient.java @@ -33,8 +33,8 @@ import org.apache.beam.sdk.fn.data.LogicalEndpoint; import org.apache.beam.sdk.fn.stream.OutboundObserverFactory; import org.apache.beam.sdk.options.PipelineOptions; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ManagedChannel; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ManagedChannel; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -67,7 +67,8 @@ public BeamFnDataGrpcClient( *

    The provided coder is used to decode elements on the inbound stream. The decoded elements * are passed to the provided consumer. Any failure during decoding or processing of the element * will complete the returned future exceptionally. On successful termination of the stream - * (signaled by an empty data block), the returned future is completed successfully. + * (signaled by an empty data block with isLast set to true), the returned future is completed + * successfully. */ @Override public InboundDataClient receive( diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/BeamFnTimerGrpcClient.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/BeamFnTimerGrpcClient.java index 0b1777b7289e..6bfaa6e5d92c 100644 --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/BeamFnTimerGrpcClient.java +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/BeamFnTimerGrpcClient.java @@ -75,6 +75,11 @@ public void awaitCompletion() throws InterruptedException, Exception { inbound.awaitCompletion(); } + @Override + public void runWhenComplete(Runnable completeRunnable) { + inbound.runWhenComplete(completeRunnable); + } + @Override public boolean isDone() { return inbound.isDone(); diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java index 375af8974644..c88194881b94 100644 --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java @@ -22,25 +22,31 @@ import java.util.HashMap; import java.util.List; import java.util.Map; +import java.util.Random; import java.util.Set; import org.apache.beam.fn.harness.HandlesSplits; import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo; import org.apache.beam.runners.core.metrics.ExecutionStateTracker; -import org.apache.beam.runners.core.metrics.LabeledMetrics; -import org.apache.beam.runners.core.metrics.MetricsContainerImpl; import org.apache.beam.runners.core.metrics.MetricsContainerStepMap; import org.apache.beam.runners.core.metrics.MonitoringInfoConstants; import org.apache.beam.runners.core.metrics.MonitoringInfoConstants.Labels; +import org.apache.beam.runners.core.metrics.MonitoringInfoConstants.Urns; import org.apache.beam.runners.core.metrics.MonitoringInfoMetricName; +import org.apache.beam.runners.core.metrics.ShortIdMap; import org.apache.beam.runners.core.metrics.SimpleExecutionState; import org.apache.beam.runners.core.metrics.SimpleStateRegistry; +import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.fn.data.FnDataReceiver; import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.Distribution; import org.apache.beam.sdk.metrics.MetricsContainer; import org.apache.beam.sdk.metrics.MetricsEnvironment; import org.apache.beam.sdk.util.WindowedValue; +import org.apache.beam.sdk.util.common.ElementByteSizeObserver; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ArrayListMultimap; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ListMultimap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; @@ -62,9 +68,13 @@ public class PCollectionConsumerRegistry { @SuppressWarnings({"rawtypes"}) abstract static class ConsumerAndMetadata { public static ConsumerAndMetadata forConsumer( - FnDataReceiver consumer, String pTransformId, SimpleExecutionState state) { + FnDataReceiver consumer, + String pTransformId, + SimpleExecutionState state, + Coder valueCoder, + MetricsContainer metricsContainer) { return new AutoValue_PCollectionConsumerRegistry_ConsumerAndMetadata( - consumer, pTransformId, state); + consumer, pTransformId, state, valueCoder, metricsContainer); } public abstract FnDataReceiver getConsumer(); @@ -72,6 +82,10 @@ public static ConsumerAndMetadata forConsumer( public abstract String getPTransformId(); public abstract SimpleExecutionState getExecutionState(); + + public abstract Coder getValueCoder(); + + public abstract MetricsContainer getMetricsContainer(); } private ListMultimap pCollectionIdsToConsumers; @@ -104,7 +118,10 @@ public PCollectionConsumerRegistry( * getMultiplexingConsumer()} is called. */ public void register( - String pCollectionId, String pTransformId, FnDataReceiver> consumer) { + String pCollectionId, + String pTransformId, + FnDataReceiver> consumer, + Coder valueCoder) { // Just save these consumers for now, but package them up later with an // ElementCountFnDataReceiver and possibly a MultiplexingFnDataReceiver // if there are multiple consumers. @@ -124,7 +141,13 @@ public void register( executionStates.register(state); pCollectionIdsToConsumers.put( - pCollectionId, ConsumerAndMetadata.forConsumer(consumer, pTransformId, state)); + pCollectionId, + ConsumerAndMetadata.forConsumer( + consumer, + pTransformId, + state, + valueCoder, + metricsContainerRegistry.getContainer(pTransformId))); } /** Reset the execution states of the registered functions. */ @@ -152,13 +175,18 @@ public FnDataReceiver> getMultiplexingConsumer(String pCollecti throw new IllegalArgumentException( String.format("Unknown PCollectionId %s", pCollectionId)); } else if (consumerAndMetadatas.size() == 1) { - if (consumerAndMetadatas.get(0).getConsumer() instanceof HandlesSplits) { - return new SplittingMetricTrackingFnDataReceiver(pcId, consumerAndMetadatas.get(0)); + ConsumerAndMetadata consumerAndMetadata = consumerAndMetadatas.get(0); + if (consumerAndMetadata.getConsumer() instanceof HandlesSplits) { + return new SplittingMetricTrackingFnDataReceiver(pcId, consumerAndMetadata); } - return new MetricTrackingFnDataReceiver(pcId, consumerAndMetadatas.get(0)); + return new MetricTrackingFnDataReceiver(pcId, consumerAndMetadata); } else { - /* TODO(SDF), Consider supporting splitting each consumer individually. This would never come up in the existing SDF expansion, but might be useful to support fused SDF nodes. This would require dedicated delivery of the split results to each of the consumers separately. */ - return new MultiplexingMetricTrackingFnDataReceiver(pcId, consumerAndMetadatas); + /* TODO(SDF), Consider supporting splitting each consumer individually. This would never + come up in the existing SDF expansion, but might be useful to support fused SDF nodes. + This would require dedicated delivery of the split results to each of the consumers + separately. 
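// Illustrative sketch, not part of the change: register() now also takes the PCollection's value
// coder so element byte sizes can be sampled. Hypothetical wiring for one PCollection with two
// downstream consumers; the multiplexing receiver then fans each element out to both while
// updating the per-PCollection element count and sampled byte size.
void wireConsumers(
    PCollectionConsumerRegistry registry,
    FnDataReceiver<WindowedValue<?>> consumerA,
    FnDataReceiver<WindowedValue<?>> consumerB) throws Exception {
  registry.register("pcollection-1", "transformA", consumerA, StringUtf8Coder.of());
  registry.register("pcollection-1", "transformB", consumerB, StringUtf8Coder.of());
  FnDataReceiver<WindowedValue<?>> receiver = registry.getMultiplexingConsumer("pcollection-1");
  receiver.accept(WindowedValue.valueInGlobalWindow("element"));
}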
*/ + return new MultiplexingMetricTrackingFnDataReceiver( + pcId, ImmutableList.copyOf(consumerAndMetadatas)); } }); } @@ -168,6 +196,11 @@ public List getExecutionTimeMonitoringInfos() { return executionStates.getExecutionTimeMonitoringInfos(); } + /** @return Execution Time Monitoring data based on the tracked start or finish function. */ + public Map getExecutionTimeMonitoringData(ShortIdMap shortIds) { + return executionStates.getExecutionTimeMonitoringData(shortIds); + } + /** @return the underlying consumers for a pCollectionId, some tests may wish to check this. */ @VisibleForTesting public List getUnderlyingConsumers(String pCollectionId) { @@ -186,8 +219,10 @@ private class MetricTrackingFnDataReceiver implements FnDataReceiver> delegate; private final String pTransformId; private final SimpleExecutionState state; - private final Counter counter; - private final MetricsContainer unboundMetricContainer; + private final Counter unboundedElementCountCounter; + private final SampleByteSizeDistribution unboundedSampledByteSizeDistribution; + private final Coder coder; + private final MetricsContainer metricsContainer; public MetricTrackingFnDataReceiver( String pCollectionId, ConsumerAndMetadata consumerAndMetadata) { @@ -196,33 +231,42 @@ public MetricTrackingFnDataReceiver( this.pTransformId = consumerAndMetadata.getPTransformId(); HashMap labels = new HashMap(); labels.put(Labels.PCOLLECTION, pCollectionId); - MonitoringInfoMetricName metricName = - MonitoringInfoMetricName.named(MonitoringInfoConstants.Urns.ELEMENT_COUNT, labels); - this.counter = LabeledMetrics.counter(metricName); + // Collect the metric in a metric container which is not bound to the step name. // This is required to count elements from impulse steps, which will produce elements outside // of a pTransform context. - this.unboundMetricContainer = metricsContainerRegistry.getUnboundContainer(); + MetricsContainer unboundMetricContainer = metricsContainerRegistry.getUnboundContainer(); + + MonitoringInfoMetricName elementCountMetricName = + MonitoringInfoMetricName.named(MonitoringInfoConstants.Urns.ELEMENT_COUNT, labels); + this.unboundedElementCountCounter = unboundMetricContainer.getCounter(elementCountMetricName); + + MonitoringInfoMetricName sampledByteSizeMetricName = + MonitoringInfoMetricName.named(Urns.SAMPLED_BYTE_SIZE, labels); + this.unboundedSampledByteSizeDistribution = + new SampleByteSizeDistribution<>( + unboundMetricContainer.getDistribution(sampledByteSizeMetricName)); + + this.coder = consumerAndMetadata.getValueCoder(); + this.metricsContainer = consumerAndMetadata.getMetricsContainer(); } @Override public void accept(WindowedValue input) throws Exception { - try (Closeable close = - MetricsEnvironment.scopedMetricsContainer(this.unboundMetricContainer)) { - // Increment the counter for each window the element occurs in. - this.counter.inc(input.getWindows().size()); - - // Wrap the consumer with extra logic to set the metric container with the appropriate - // PTransform context. This ensures that user metrics obtain the pTransform ID when they are - // created. Also use the ExecutionStateTracker and enter an appropriate state to track the - // Process Bundle Execution time metric. 
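// Illustrative sketch, not part of the change: how the per-PCollection element count above is
// wired. The container comes from MetricsContainerStepMap#getUnboundContainer() so that elements
// produced outside a PTransform context (e.g. by Impulse) are still counted; names here are
// hypothetical.
void wireElementCount(MetricsContainer unboundContainer) {
  HashMap<String, String> labels = new HashMap<>();
  labels.put(MonitoringInfoConstants.Labels.PCOLLECTION, "pcollection-1");
  Counter elementCount =
      unboundContainer.getCounter(
          MonitoringInfoMetricName.named(MonitoringInfoConstants.Urns.ELEMENT_COUNT, labels));
  elementCount.inc(); // the receiver above actually calls inc(input.getWindows().size())
}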
- MetricsContainerImpl container = metricsContainerRegistry.getContainer(pTransformId); - try (Closeable closeable = MetricsEnvironment.scopedMetricsContainer(container)) { - try (Closeable trackerCloseable = stateTracker.enterState(state)) { - this.delegate.accept(input); - } + // Increment the counter for each window the element occurs in. + this.unboundedElementCountCounter.inc(input.getWindows().size()); + // TODO(BEAM-11879): Consider updating size per window when we have window optimization. + this.unboundedSampledByteSizeDistribution.tryUpdate(input.getValue(), this.coder); + // Wrap the consumer with extra logic to set the metric container with the appropriate + // PTransform context. This ensures that user metrics obtain the pTransform ID when they are + // created. Also use the ExecutionStateTracker and enter an appropriate state to track the + // Process Bundle Execution time metric. + try (Closeable closeable = MetricsEnvironment.scopedMetricsContainer(metricsContainer)) { + try (Closeable trackerCloseable = stateTracker.enterState(state)) { + this.delegate.accept(input); } } + this.unboundedSampledByteSizeDistribution.finishLazyUpdate(); } } @@ -236,44 +280,53 @@ public void accept(WindowedValue input) throws Exception { private class MultiplexingMetricTrackingFnDataReceiver implements FnDataReceiver> { private final List consumerAndMetadatas; - private final Counter counter; - private final MetricsContainer unboundMetricContainer; + private final Counter unboundedElementCountCounter; + private final SampleByteSizeDistribution unboundedSampledByteSizeDistribution; public MultiplexingMetricTrackingFnDataReceiver( String pCollectionId, List consumerAndMetadatas) { this.consumerAndMetadatas = consumerAndMetadatas; HashMap labels = new HashMap(); labels.put(Labels.PCOLLECTION, pCollectionId); - MonitoringInfoMetricName metricName = - MonitoringInfoMetricName.named(MonitoringInfoConstants.Urns.ELEMENT_COUNT, labels); - this.counter = LabeledMetrics.counter(metricName); + // Collect the metric in a metric container which is not bound to the step name. // This is required to count elements from impulse steps, which will produce elements outside // of a pTransform context. - this.unboundMetricContainer = metricsContainerRegistry.getUnboundContainer(); + MetricsContainer unboundMetricContainer = metricsContainerRegistry.getUnboundContainer(); + MonitoringInfoMetricName elementCountMetricName = + MonitoringInfoMetricName.named(MonitoringInfoConstants.Urns.ELEMENT_COUNT, labels); + this.unboundedElementCountCounter = unboundMetricContainer.getCounter(elementCountMetricName); + + MonitoringInfoMetricName sampledByteSizeMetricName = + MonitoringInfoMetricName.named(Urns.SAMPLED_BYTE_SIZE, labels); + this.unboundedSampledByteSizeDistribution = + new SampleByteSizeDistribution<>( + unboundMetricContainer.getDistribution(sampledByteSizeMetricName)); } @Override public void accept(WindowedValue input) throws Exception { - try (Closeable close = - MetricsEnvironment.scopedMetricsContainer(this.unboundMetricContainer)) { - // Increment the counter for each window the element occurs in. - this.counter.inc(input.getWindows().size()); - - // Wrap the consumer with extra logic to set the metric container with the appropriate - // PTransform context. This ensures that user metrics obtain the pTransform ID when they are - // created. Also use the ExecutionStateTracker and enter an appropriate state to track the - // Process Bundle Execution time metric. 
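// Illustrative sketch, not part of the change: the SampleByteSizeDistribution introduced further
// below relies on Coder#registerByteSizeObserver, which lets a coder report an element's encoded
// size, often without fully encoding it. Minimal standalone use; the anonymous observer is
// hypothetical:
void observeEncodedSize() throws Exception {
  ElementByteSizeObserver observer =
      new ElementByteSizeObserver() {
        @Override
        protected void reportElementSize(long elementSize) {
          System.out.println("observed " + elementSize + " bytes");
        }
      };
  StringUtf8Coder.of().registerByteSizeObserver("hello", observer);
  observer.advance(); // advance() passes the accumulated size to reportElementSize
}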
- for (ConsumerAndMetadata consumerAndMetadata : consumerAndMetadatas) { - MetricsContainerImpl container = - metricsContainerRegistry.getContainer(consumerAndMetadata.getPTransformId()); - try (Closeable closeable = MetricsEnvironment.scopedMetricsContainer(container)) { - try (Closeable trackerCloseable = - stateTracker.enterState(consumerAndMetadata.getExecutionState())) { - consumerAndMetadata.getConsumer().accept(input); - } + // Increment the counter for each window the element occurs in. + this.unboundedElementCountCounter.inc(input.getWindows().size()); + // Wrap the consumer with extra logic to set the metric container with the appropriate + // PTransform context. This ensures that user metrics obtain the pTransform ID when they are + // created. Also use the ExecutionStateTracker and enter an appropriate state to track the + // Process Bundle Execution time metric. + for (ConsumerAndMetadata consumerAndMetadata : consumerAndMetadatas) { + + if (consumerAndMetadata.getValueCoder() != null) { + // TODO(BEAM-11879): Consider updating size per window when we have window optimization. + this.unboundedSampledByteSizeDistribution.tryUpdate( + input.getValue(), consumerAndMetadata.getValueCoder()); + } + try (Closeable closeable = + MetricsEnvironment.scopedMetricsContainer(consumerAndMetadata.getMetricsContainer())) { + try (Closeable trackerCloseable = + stateTracker.enterState(consumerAndMetadata.getExecutionState())) { + consumerAndMetadata.getConsumer().accept(input); } } + this.unboundedSampledByteSizeDistribution.finishLazyUpdate(); } } } @@ -305,4 +358,90 @@ public double getProgress() { return delegate.getProgress(); } } + + private static class SampleByteSizeDistribution { + /** Basic implementation of {@link ElementByteSizeObserver} for use in size estimation. */ + private static class ByteSizeObserver extends ElementByteSizeObserver { + private long observedSize = 0; + + @Override + protected void reportElementSize(long elementSize) { + observedSize += elementSize; + } + } + + final Distribution distribution; + ByteSizeObserver byteCountObserver; + + public SampleByteSizeDistribution(Distribution distribution) { + this.distribution = distribution; + this.byteCountObserver = null; + } + + public void tryUpdate(T value, Coder coder) throws Exception { + if (shouldSampleElement()) { + // First try using byte size observer + byteCountObserver = new ByteSizeObserver(); + coder.registerByteSizeObserver(value, byteCountObserver); + + if (!byteCountObserver.getIsLazy()) { + byteCountObserver.advance(); + this.distribution.update(byteCountObserver.observedSize); + } + } else { + byteCountObserver = null; + } + } + + public void finishLazyUpdate() { + // Advance lazy ElementByteSizeObservers, if any. + if (byteCountObserver != null && byteCountObserver.getIsLazy()) { + byteCountObserver.advance(); + this.distribution.update(byteCountObserver.observedSize); + } + } + + private static final int RESERVOIR_SIZE = 10; + private static final int SAMPLING_THRESHOLD = 30; + private long samplingToken = 0; + private long nextSamplingToken = 0; + private Random randomGenerator = new Random(); + + private boolean shouldSampleElement() { + // Sampling probability decreases as the element count is increasing. + // We unconditionally sample the first samplingCutoff elements. Calculating + // nextInt(samplingToken) for each element is expensive, so after a threshold, calculate the + // gap to next sample. 
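// Worked example of the gap formula below (illustrative, not part of the change): once more than
// SAMPLING_THRESHOLD (30) elements have been seen and a sample is taken, the next sample index is
// chosen via
//   gap = log(1 - U) / log(1 - RESERVOIR_SIZE / samplingToken)  with U uniform in [0, 1),
// so at samplingToken = 1000 a typical draw of U = 0.5 skips about log(0.5) / log(0.99), i.e.
// roughly 69 elements, before sampling again.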
+ // https://erikerlandson.github.io/blog/2015/11/20/very-fast-reservoir-sampling/ + + // Reset samplingToken if it's going to exceed the max value. + if (samplingToken + 1 == Long.MAX_VALUE) { + samplingToken = 0; + nextSamplingToken = getNextSamplingToken(samplingToken); + } + + samplingToken++; + // Use traditional sampling until the threshold of 30 + if (nextSamplingToken == 0) { + if (samplingToken <= RESERVOIR_SIZE + || randomGenerator.nextInt((int) samplingToken) < RESERVOIR_SIZE) { + if (samplingToken > SAMPLING_THRESHOLD) { + nextSamplingToken = getNextSamplingToken(samplingToken); + } + return true; + } + } else if (samplingToken >= nextSamplingToken) { + nextSamplingToken = getNextSamplingToken(samplingToken); + return true; + } + return false; + } + + private long getNextSamplingToken(long samplingToken) { + double gap = + Math.log(1.0 - randomGenerator.nextDouble()) + / Math.log(1.0 - RESERVOIR_SIZE / (double) samplingToken); + return samplingToken + (int) gap; + } + } } diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PTransformFunctionRegistry.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PTransformFunctionRegistry.java index 10d421cabd2d..4bc6b5f56e5d 100644 --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PTransformFunctionRegistry.java +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PTransformFunctionRegistry.java @@ -21,15 +21,18 @@ import java.util.ArrayList; import java.util.HashMap; import java.util.List; +import java.util.Map; import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo; import org.apache.beam.runners.core.metrics.ExecutionStateTracker; import org.apache.beam.runners.core.metrics.MetricsContainerImpl; import org.apache.beam.runners.core.metrics.MetricsContainerStepMap; import org.apache.beam.runners.core.metrics.MonitoringInfoConstants; +import org.apache.beam.runners.core.metrics.ShortIdMap; import org.apache.beam.runners.core.metrics.SimpleExecutionState; import org.apache.beam.runners.core.metrics.SimpleStateRegistry; import org.apache.beam.sdk.function.ThrowingRunnable; import org.apache.beam.sdk.metrics.MetricsEnvironment; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; /** * A class to to register and retrieve functions for bundle processing (i.e. the start, or finish @@ -122,6 +125,11 @@ public List getExecutionTimeMonitoringInfos() { return executionStates.getExecutionTimeMonitoringInfos(); } + /** @return Execution Time MonitoringInfos based on the tracked start or finish function. */ + public Map getExecutionTimeMonitoringData(ShortIdMap shortIds) { + return executionStates.getExecutionTimeMonitoringData(shortIds); + } + /** * @return A list of wrapper functions which will invoke the registered functions indirectly. The * order of registry is maintained. 
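// Illustrative sketch, not part of the change: PTransformFunctionRegistry wraps every registered
// start/finish ThrowingRunnable so its run time is attributed to the owning PTransform, and the
// accumulated timings are now exported keyed by monitoring-info short id. Roughly how the bundle
// processor drives it (helper name hypothetical; ByteString is the vendored protobuf class):
Map<String, ByteString> runStartFunctions(
    PTransformFunctionRegistry startFunctionRegistry, ShortIdMap shortIds) throws Exception {
  for (ThrowingRunnable startFunction : startFunctionRegistry.getFunctions()) {
    startFunction.run(); // runs with the registering transform's metrics container and state active
  }
  // Short id -> encoded payload map, ready to merge into a ProcessBundleResponse.
  return startFunctionRegistry.getExecutionTimeMonitoringData(shortIds);
}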
diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/QueueingBeamFnDataClient.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/QueueingBeamFnDataClient.java index 634ea2a6033b..e5691d443718 100644 --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/QueueingBeamFnDataClient.java +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/QueueingBeamFnDataClient.java @@ -17,10 +17,13 @@ */ package org.apache.beam.fn.harness.data; -import java.util.concurrent.ConcurrentHashMap; -import java.util.concurrent.LinkedBlockingQueue; +import java.util.ArrayList; +import java.util.HashSet; +import java.util.concurrent.ArrayBlockingQueue; +import java.util.concurrent.BlockingQueue; import java.util.concurrent.TimeUnit; -import org.apache.beam.fn.harness.control.ProcessBundleHandler; +import java.util.concurrent.atomic.AtomicBoolean; +import javax.annotation.concurrent.GuardedBy; import org.apache.beam.model.pipeline.v1.Endpoints; import org.apache.beam.model.pipeline.v1.Endpoints.ApiServiceDescriptor; import org.apache.beam.sdk.coders.Coder; @@ -28,32 +31,109 @@ import org.apache.beam.sdk.fn.data.FnDataReceiver; import org.apache.beam.sdk.fn.data.InboundDataClient; import org.apache.beam.sdk.fn.data.LogicalEndpoint; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; +import org.checkerframework.checker.nullness.qual.NonNull; +import org.checkerframework.checker.nullness.qual.Nullable; import org.slf4j.Logger; import org.slf4j.LoggerFactory; /** * A {@link BeamFnDataClient} that queues elements so that they can be consumed and processed in the * thread which calls @{link #drainAndBlock}. + * + *

    This class is ready for use after creation or after a call to {@link #reset}. During usage it + * is expected that possibly multiple threads will register clients with {@link #receive}. After all + * clients have been registered a single thread should call {@link #drainAndBlock}. */ -@SuppressWarnings({ - "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class QueueingBeamFnDataClient implements BeamFnDataClient { + private static class ClosableQueue { + // Used to indicate that the queue is closed. + private static final ConsumerAndData POISON = + new ConsumerAndData<>( + input -> { + throw new RuntimeException("Unable to accept poison."); + }, + new Object()); + + private final BlockingQueue> queue; + private final AtomicBoolean closed = new AtomicBoolean(); + + ClosableQueue(int queueSize) { + this.queue = new ArrayBlockingQueue<>(queueSize); + } + + // Closes the queue indicating that no additional elements will be added. May be called + // at most once. Non-blocking. + void close() { + Preconditions.checkArgument(!closed.getAndSet(true)); + if (!queue.offer(POISON)) { + // If this returns false, the queue was full. Since there were elements in + // the queue, the poison is unnecessary since we check closed when taking from + // the queue. + LOG.debug("Queue was full, not adding poison"); + } + } + + // See BlockingQueue.offer. Must not be called after close(). + boolean offer(ConsumerAndData e, long l, TimeUnit t) throws InterruptedException { + return queue.offer(e, l, t); + } + + // Blocks until there is either an element available or the queue has been closed and is + // empty in which case it returns null. + @Nullable + ConsumerAndData take() throws InterruptedException { + // We first poll without blocking to optimize for the case there is data. + // If there is no data we end up blocking on take() and thus the extra + // poll doesn't matter. + @Nullable ConsumerAndData result = queue.poll(); + if (result == null) { + if (closed.get()) { + // Poll again to ensure that there is nothing in the queue. Once we observe closed as true + // we are guaranteed no additional elements other than the POISON will be added. However + // we can't rely on the previous poll result as it could race with additional offers and + // close. + result = queue.poll(); + } else { + // We are not closed so we perform a blocking take. We are guaranteed that additional + // elements will be offered or the POISON will be added by close to unblock this thread. 
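// Illustrative sketch, not part of the change: intended lifecycle of the reworked client, per the
// class comment above. "mainClient" stands for the delegate BeamFnDataClient and is hypothetical.
QueueingBeamFnDataClient queueingClient = new QueueingBeamFnDataClient(mainClient, /* queueSize= */ 1000);
// 1. Register every inbound endpoint through receive(...), possibly from several threads.
// 2. Exactly one thread then drains and processes all queued elements:
queueingClient.drainAndBlock();
// 3. reset() returns the instance to a reusable state for the next bundle:
queueingClient.reset();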
+ result = queue.take(); + } + } + if (result == POISON) { + return null; + } + return result; + } - private static final int QUEUE_SIZE = 1000; + boolean isEmpty() { + return queue.isEmpty() || queue.peek() == POISON; + } + } private static final Logger LOG = LoggerFactory.getLogger(QueueingBeamFnDataClient.class); private final BeamFnDataClient mainClient; - private final LinkedBlockingQueue queue; - private final ConcurrentHashMap inboundDataClients; - public QueueingBeamFnDataClient(BeamFnDataClient mainClient) { + @GuardedBy("inboundDataClients") + private final HashSet inboundDataClients; + + @GuardedBy("inboundDataClients") + private final ArrayList finishedClients; + + @GuardedBy("inboundDataClients") + private boolean isDraining = false; + + private final int queueSize; + private ClosableQueue queue; + + public QueueingBeamFnDataClient(BeamFnDataClient mainClient, int queueSize) { this.mainClient = mainClient; - this.queue = new LinkedBlockingQueue(QUEUE_SIZE); - this.inboundDataClients = new ConcurrentHashMap<>(); + this.inboundDataClients = new HashSet<>(); + this.finishedClients = new ArrayList<>(); + this.queueSize = queueSize; + this.queue = new ClosableQueue(queueSize); } @Override @@ -67,27 +147,40 @@ public InboundDataClient receive( inputLocation.getTransformId()); QueueingFnDataReceiver queueingConsumer = - new QueueingFnDataReceiver(consumer); + new QueueingFnDataReceiver<>(consumer, this.queue); InboundDataClient inboundDataClient = this.mainClient.receive(apiServiceDescriptor, inputLocation, queueingConsumer); queueingConsumer.inboundDataClient = inboundDataClient; - this.inboundDataClients.computeIfAbsent( - inboundDataClient, (InboundDataClient idcToStore) -> idcToStore); + synchronized (inboundDataClients) { + Preconditions.checkState(!isDraining); + if (this.inboundDataClients.add(inboundDataClient)) { + inboundDataClient.runWhenComplete(() -> completeInbound(inboundDataClient)); + } + } return inboundDataClient; } - // Returns true if all the InboundDataClients have finished or cancelled and no values - // remain on the queue. - private boolean allDone() { - for (InboundDataClient inboundDataClient : inboundDataClients.keySet()) { - if (!inboundDataClient.isDone()) { - return false; + private void completeInbound(InboundDataClient client) { + Preconditions.checkState(client.isDone()); + // This client will no longer be adding elements to the queue. + // + // There are several cases we consider here: + // - this is not the last active client -> do nothing since the last client will handle things + // - last client and we are draining -> we know that no additional elements will be added to the + // queue because there will be no more clients. We close the queue to trigger exiting + // drainAndBlock. + // - last client and we are not draining -> it is possible that additional clients will be added + // with receive. drainAndBlock itself detects this case and close the queue. + synchronized (inboundDataClients) { + if (!inboundDataClients.remove(client)) { + // Possible if this client was leftover from before reset() was called. + return; + } + finishedClients.add(client); + if (inboundDataClients.isEmpty() && isDraining) { + queue.close(); } } - if (!this.queue.isEmpty()) { - return false; - } - return true; } /** @@ -98,35 +191,50 @@ private boolean allDone() { * *

    All {@link InboundDataClient}s will be failed if processing throws an exception. * - *

    This method is NOT thread safe. This should only be invoked by a single thread, and is - * intended for use with a newly constructed QueueingBeamFnDataClient in {@link - * ProcessBundleHandler#processBundle}. + *

    This method is NOT thread safe. This should only be invoked once by a single thread. See
+   * class comment.
   */
  public void drainAndBlock() throws Exception {
+    // There are several ways drainAndBlock completes:
+    // - processing elements fails -> all inbound clients are failed and exception thrown
+    // - draining starts while inbound clients are active -> the last client will poison the queue
+    //   to notify that no more elements will arrive
+    // - draining starts without any remaining clients -> we just need to drain the queue and then
+    //   are done as no further elements will arrive.
+    synchronized (inboundDataClients) {
+      Preconditions.checkState(!isDraining);
+      isDraining = true;
+      if (inboundDataClients.isEmpty()) {
+        queue.close();
+      }
+    }
    while (true) {
      try {
-        ConsumerAndData tuple = queue.poll(200, TimeUnit.MILLISECONDS);
-        if (tuple != null) {
-          // Forward to the consumers who cares about this data.
-          tuple.consumer.accept(tuple.data);
-        } else {
-          // Note: We do not expect to ever hit this point without receiving all values
-          // as (1) The InboundObserver will not be set to Done until the
-          // QueuingFnDataReceiver.accept() call returns and will not be invoked again.
-          // (2) The QueueingFnDataReceiver will not return until the value is received in
-          // drainAndBlock, because of the use of the SynchronousQueue.
-          if (allDone()) {
-            break;
-          }
+        @Nullable ConsumerAndData tuple = queue.take();
+        if (tuple == null) {
+          break; // queue has been drained and is closed.
        }
+
+        // Forward to the consumers who care about this data.
+        tuple.accept();
      } catch (Exception e) {
-        LOG.error("Client failed to dequeue and process the value", e);
-        for (InboundDataClient inboundDataClient : inboundDataClients.keySet()) {
+        LOG.error("Client failed to dequeue and process the value", e);
+        HashSet clients = new HashSet<>();
+        synchronized (inboundDataClients) {
+          clients.addAll(inboundDataClients);
+          clients.addAll(finishedClients);
+        }
+        for (InboundDataClient inboundDataClient : clients) {
          inboundDataClient.fail(e);
        }
        throw e;
      }
    }
+    synchronized (inboundDataClients) {
+      Preconditions.checkState(inboundDataClients.isEmpty());
+      Preconditions.checkState(isDraining);
+    }
+    Preconditions.checkState(queue.isEmpty());
  }

  @Override
@@ -141,19 +249,33 @@ public CloseableFnDataReceiver send(
    return this.mainClient.send(apiServiceDescriptor, outputLocation, coder);
  }

+  /** Resets this object so that it may be reused. */
+  public void reset() {
+    synchronized (inboundDataClients) {
+      inboundDataClients.clear();
+      isDraining = false;
+      finishedClients.clear();
+    }
+    // It is possible that previous inboundClients were failed but could still be adding
+    // additional elements to their bound queue. For this reason we create a new queue.
+    this.queue = new ClosableQueue(queueSize);
+  }
+
  /**
   * The QueueingFnDataReceiver is a a FnDataReceiver used by the QueueingBeamFnDataClient.
   *
   *

    All {@link #accept accept()ed} values will be put onto a synchronous queue which will cause - * the calling thread to block until {@link QueueingBeamFnDataClient#drainAndBlock} is called. - * {@link QueueingBeamFnDataClient#drainAndBlock} is responsible for processing values from the - * queue. + * the calling thread to block until {@link QueueingBeamFnDataClient#drainAndBlock} is called or + * the inboundClient is failed. {@link QueueingBeamFnDataClient#drainAndBlock} is responsible for + * processing values from the queue. */ - public class QueueingFnDataReceiver implements FnDataReceiver { + private static class QueueingFnDataReceiver implements FnDataReceiver { private final FnDataReceiver consumer; - public InboundDataClient inboundDataClient; + private final ClosableQueue queue; + public @Nullable InboundDataClient inboundDataClient; // Null only during initialization. - public QueueingFnDataReceiver(FnDataReceiver consumer) { + public QueueingFnDataReceiver(FnDataReceiver consumer, ClosableQueue queue) { + this.queue = queue; this.consumer = consumer; } @@ -163,29 +285,35 @@ public QueueingFnDataReceiver(FnDataReceiver consumer) { */ @Override public void accept(T value) throws Exception { + @SuppressWarnings("nullness") + final @NonNull InboundDataClient client = this.inboundDataClient; try { - ConsumerAndData offering = new ConsumerAndData(this.consumer, value); - while (!queue.offer(offering, 200, TimeUnit.MILLISECONDS)) { - if (inboundDataClient.isDone()) { + ConsumerAndData offering = new ConsumerAndData<>(this.consumer, value); + while (!this.queue.offer(offering, 200, TimeUnit.MILLISECONDS)) { + if (client.isDone()) { // If it was cancelled by the consuming side of the queue. break; } } } catch (Exception e) { LOG.error("Failed to insert the value into the queue", e); - inboundDataClient.fail(e); + client.fail(e); throw e; } } } - static class ConsumerAndData { - public FnDataReceiver consumer; - public T data; + private static class ConsumerAndData { + private final FnDataReceiver consumer; + private final T data; public ConsumerAndData(FnDataReceiver receiver, T data) { this.consumer = receiver; this.data = data; } + + void accept() throws Exception { + consumer.accept(data); + } } } diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/logging/BeamFnLoggingClient.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/logging/BeamFnLoggingClient.java index 0fdf4045066d..0cbfdf079d30 100644 --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/logging/BeamFnLoggingClient.java +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/logging/BeamFnLoggingClient.java @@ -19,17 +19,18 @@ import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Throwables.getStackTraceAsString; +import edu.umd.cs.findbugs.annotations.SuppressFBWarnings; import java.util.ArrayList; import java.util.Collection; import java.util.List; import java.util.Map; -import java.util.concurrent.BlockingDeque; +import java.util.concurrent.ArrayBlockingQueue; +import java.util.concurrent.BlockingQueue; import java.util.concurrent.CancellationException; import java.util.concurrent.CompletableFuture; import java.util.concurrent.ExecutionException; import java.util.concurrent.ExecutorService; import java.util.concurrent.Future; -import java.util.concurrent.LinkedBlockingDeque; import java.util.concurrent.Phaser; import java.util.concurrent.TimeUnit; import java.util.function.Function; @@ -41,17 +42,22 @@ import java.util.logging.Logger; import 
java.util.logging.SimpleFormatter; import org.apache.beam.model.fnexecution.v1.BeamFnApi; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.LogEntry; import org.apache.beam.model.fnexecution.v1.BeamFnLoggingGrpc; import org.apache.beam.model.pipeline.v1.Endpoints; +import org.apache.beam.runners.core.metrics.ExecutionStateTracker; +import org.apache.beam.runners.core.metrics.ExecutionStateTracker.ExecutionState; +import org.apache.beam.runners.core.metrics.MonitoringInfoConstants; +import org.apache.beam.runners.core.metrics.SimpleExecutionState; import org.apache.beam.sdk.extensions.gcp.options.GcsOptions; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.SdkHarnessOptions; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Timestamp; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ManagedChannel; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Status; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.CallStreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.ClientCallStreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.ClientResponseObserver; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Timestamp; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ManagedChannel; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Status; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.CallStreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.ClientCallStreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.ClientResponseObserver; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.checkerframework.checker.nullness.qual.Nullable; @@ -176,8 +182,8 @@ public String toString() { } private class LogRecordHandler extends Handler implements Runnable { - private final BlockingDeque bufferedLogEntries = - new LinkedBlockingDeque<>(MAX_BUFFERED_LOG_ENTRY_COUNT); + private final BlockingQueue bufferedLogEntries = + new ArrayBlockingQueue<>(MAX_BUFFERED_LOG_ENTRY_COUNT); private @Nullable Future bufferedLogWriter = null; /** * Safe object publishing is not required since we only care if the thread that set this field @@ -206,6 +212,11 @@ public void publish(LogRecord record) { .setSeconds(record.getMillis() / 1000) .setNanos((int) (record.getMillis() % 1000) * 1_000_000)); + String instructionId = BeamFnLoggingMDC.getInstructionId(); + if (instructionId != null) { + builder.setInstructionId(instructionId); + } + Throwable thrown = record.getThrown(); if (thrown != null) { builder.setTrace(getStackTraceAsString(thrown)); @@ -215,6 +226,16 @@ public void publish(LogRecord record) { if (loggerName != null) { builder.setLogLocation(loggerName); } + ExecutionState state = ExecutionStateTracker.getCurrentExecutionState(record.getThreadID()); + if (state instanceof SimpleExecutionState) { + String transformId = + ((SimpleExecutionState) state) + .getLabels() + .get(MonitoringInfoConstants.Labels.PTRANSFORM); + if (transformId != null) { + builder.setTransformId(transformId); + } + } // The thread that sends log records should never perform a blocking publish and // only insert log records best effort. @@ -228,10 +249,15 @@ public void publish(LogRecord record) { } } else { // Never blocks caller, will drop log message if buffer is full. 
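+        // (The ArrayBlockingQueue declared above makes offer return false, rather than block,
+        // once MAX_BUFFERED_LOG_ENTRY_COUNT entries are buffered, so excess entries are dropped.)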
- bufferedLogEntries.offer(builder.build()); + dropIfBufferFull(builder.build()); } } + @SuppressFBWarnings("RV_RETURN_VALUE_IGNORED_BAD_PRACTICE") + private void dropIfBufferFull(BeamFnApi.LogEntry logEntry) { + bufferedLogEntries.offer(logEntry); + } + @Override public void run() { // Logging which occurs in this thread will attempt to publish log entries into the diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTableProviderJsonIT.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/logging/BeamFnLoggingMDC.java similarity index 54% rename from sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTableProviderJsonIT.java rename to sdks/java/harness/src/main/java/org/apache/beam/fn/harness/logging/BeamFnLoggingMDC.java index 735fabf80fce..bcfcd4b34ea5 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTableProviderJsonIT.java +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/logging/BeamFnLoggingMDC.java @@ -15,26 +15,21 @@ * See the License for the specific language governing permissions and * limitations under the License. */ -package org.apache.beam.sdk.extensions.sql.meta.provider.kafka; +package org.apache.beam.fn.harness.logging; -import static java.nio.charset.StandardCharsets.UTF_8; +import org.checkerframework.checker.nullness.qual.Nullable; -import org.apache.kafka.clients.producer.ProducerRecord; +/** Mapped diagnostic context to be consumed and set on LogEntry protos in BeamFnLoggingClient. */ +public class BeamFnLoggingMDC { + private static final ThreadLocal<@Nullable String> instructionId = new ThreadLocal<>(); -public class KafkaTableProviderJsonIT extends KafkaTableProviderIT { - @Override - protected ProducerRecord generateProducerRecord(int i) { - return new ProducerRecord<>( - kafkaOptions.getKafkaTopic(), "k" + i, createJson(i).getBytes(UTF_8)); + /** Sets the Instruction ID of the current thread, which will be inherited by child threads. */ + public static void setInstructionId(@Nullable String newInstructionId) { + instructionId.set(newInstructionId); } - @Override - protected String getPayloadFormat() { - return "json"; - } - - private String createJson(int i) { - return String.format( - "{\"f_long\": %s, \"f_int\": %s, \"f_string\": \"%s\"}", i, i % 3 + 1, "value" + i); + /** Gets the Instruction ID of the current thread. 
*/ + public static @Nullable String getInstructionId() { + return instructionId.get(); } } diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/BagUserState.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/BagUserState.java index 3c96e42a5178..777036ab4c82 100644 --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/BagUserState.java +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/BagUserState.java @@ -26,8 +26,9 @@ import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateClearRequest; import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateRequest; import org.apache.beam.sdk.coders.Coder; -import org.apache.beam.sdk.fn.stream.DataStreams; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.sdk.fn.stream.PrefetchableIterable; +import org.apache.beam.sdk.fn.stream.PrefetchableIterables; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; /** @@ -50,7 +51,7 @@ public class BagUserState { private final BeamFnStateClient beamFnStateClient; private final StateRequest request; private final Coder valueCoder; - private Iterable oldValues; + private PrefetchableIterable oldValues; private ArrayList newValues; private boolean isClosed; @@ -77,27 +78,23 @@ public BagUserState( request = requestBuilder.build(); this.oldValues = - new LazyCachingIteratorToIterable<>( - new DataStreams.DataStreamDecoder( - valueCoder, - DataStreams.inbound( - StateFetchingIterators.readAllStartingFrom(beamFnStateClient, request)))); + StateFetchingIterators.readAllAndDecodeStartingFrom(beamFnStateClient, request, valueCoder); this.newValues = new ArrayList<>(); } - public Iterable get() { + public PrefetchableIterable get() { checkState( !isClosed, "Bag user state is no longer usable because it is closed for %s", request.getStateKey()); if (oldValues == null) { // If we were cleared we should disregard old values. - return Iterables.limit(Collections.unmodifiableList(newValues), newValues.size()); + return PrefetchableIterables.limit(Collections.unmodifiableList(newValues), newValues.size()); } else if (newValues.isEmpty()) { // If we have no new values then just return the old values. 
return oldValues; } - return Iterables.concat( + return PrefetchableIterables.concat( oldValues, Iterables.limit(Collections.unmodifiableList(newValues), newValues.size())); } diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/BeamFnStateGrpcClientCache.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/BeamFnStateGrpcClientCache.java index 5ae859f7b2c8..633db1d99b6c 100644 --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/BeamFnStateGrpcClientCache.java +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/BeamFnStateGrpcClientCache.java @@ -30,8 +30,8 @@ import org.apache.beam.model.pipeline.v1.Endpoints.ApiServiceDescriptor; import org.apache.beam.sdk.fn.IdGenerator; import org.apache.beam.sdk.fn.stream.OutboundObserverFactory; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ManagedChannel; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ManagedChannel; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.slf4j.Logger; import org.slf4j.LoggerFactory; diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/CachingBeamFnStateClient.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/CachingBeamFnStateClient.java new file mode 100644 index 000000000000..888d23139028 --- /dev/null +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/CachingBeamFnStateClient.java @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.fn.harness.state; + +import com.google.auto.value.AutoValue; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.concurrent.CompletableFuture; +import org.apache.beam.model.fnexecution.v1.BeamFnApi; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.ProcessBundleRequest.CacheToken; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateGetResponse; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateKey; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateKey.IterableSideInput; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateKey.MultimapKeysSideInput; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateKey.MultimapSideInput; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateRequest; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateResponse; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LoadingCache; + +/** + * Wraps a delegate BeamFnStateClient and stores the result of state requests in cross bundle cache + * according to the available cache tokens. If there are no cache tokens for the state key requested + * the request is forwarded to the client and executed normally. + */ +public class CachingBeamFnStateClient implements BeamFnStateClient { + + private final BeamFnStateClient beamFnStateClient; + private final LoadingCache> stateCache; + private final Map sideInputCacheTokens; + private final ByteString userStateToken; + + /** + * Creates a CachingBeamFnStateClient that wraps a BeamFnStateClient with a LoadingCache. Cache + * tokens are sent by the runner to indicate which state is able to be cached. + */ + public CachingBeamFnStateClient( + BeamFnStateClient beamFnStateClient, + LoadingCache> stateCache, + List cacheTokenList) { + this.beamFnStateClient = beamFnStateClient; + this.stateCache = stateCache; + this.sideInputCacheTokens = new HashMap<>(); + + // Set up cache tokens. + ByteString tempUserStateToken = ByteString.EMPTY; + for (BeamFnApi.ProcessBundleRequest.CacheToken token : cacheTokenList) { + if (token.hasUserState()) { + tempUserStateToken = token.getToken(); + } else if (token.hasSideInput()) { + sideInputCacheTokens.put(token.getSideInput(), token.getToken()); + } + } + this.userStateToken = tempUserStateToken; + } + + /** + * Completes the response with a cached value if possible, if not forwards the response to the + * BeamFnStateClient and tries caching the result. All Append and Clear requests are forwarded. + */ + @Override + @SuppressWarnings("FutureReturnValueIgnored") + public void handle( + StateRequest.Builder requestBuilder, CompletableFuture response) { + + StateKey stateKey = requestBuilder.getStateKey(); + ByteString cacheToken = getCacheToken(stateKey); + + // If state is not cacheable proceed as normal. + if (ByteString.EMPTY.equals(cacheToken)) { + beamFnStateClient.handle(requestBuilder, response); + return; + } + + switch (requestBuilder.getRequestCase()) { + case GET: + // Check if data is in the cache. + StateCacheKey cacheKey = + StateCacheKey.create(cacheToken, requestBuilder.getGet().getContinuationToken()); + Map stateKeyMap = stateCache.getUnchecked(stateKey); + StateGetResponse cachedPage = stateKeyMap.get(cacheKey); + + // If data is not cached, add callback to add response to cache on completion. + // Otherwise, complete the response with the cached data. 
+ if (cachedPage == null) { + response.thenAccept( + stateResponse -> + stateCache.getUnchecked(stateKey).put(cacheKey, stateResponse.getGet())); + beamFnStateClient.handle(requestBuilder, response); + + } else { + response.complete( + StateResponse.newBuilder().setId(requestBuilder.getId()).setGet(cachedPage).build()); + } + + return; + + case APPEND: + // TODO(BEAM-12637): Support APPEND in CachingBeamFnStateClient. + beamFnStateClient.handle(requestBuilder, response); + + // Invalidate last page of cached values (entry with a blank continuation token response) + Map map = stateCache.getUnchecked(stateKey); + map.entrySet() + .removeIf(entry -> (entry.getValue().getContinuationToken().equals(ByteString.EMPTY))); + return; + + case CLEAR: + // Remove all state key data and replace with an empty response. + beamFnStateClient.handle(requestBuilder, response); + Map clearedData = new HashMap<>(); + StateCacheKey newKey = StateCacheKey.create(cacheToken, ByteString.EMPTY); + clearedData.put(newKey, StateGetResponse.getDefaultInstance()); + stateCache.put(stateKey, clearedData); + return; + + default: + throw new IllegalStateException( + String.format("Unknown request type %s", requestBuilder.getRequestCase())); + } + } + + private ByteString getCacheToken(BeamFnApi.StateKey stateKey) { + if (stateKey.hasBagUserState()) { + return userStateToken; + } else if (stateKey.hasRunner()) { + // TODO(BEAM-12638): Support caching of remote references. + return ByteString.EMPTY; + } else { + CacheToken.SideInput.Builder sideInputBuilder = CacheToken.SideInput.newBuilder(); + if (stateKey.hasIterableSideInput()) { + IterableSideInput iterableSideInput = stateKey.getIterableSideInput(); + sideInputBuilder + .setTransformId(iterableSideInput.getTransformId()) + .setSideInputId(iterableSideInput.getSideInputId()); + } else if (stateKey.hasMultimapSideInput()) { + MultimapSideInput multimapSideInput = stateKey.getMultimapSideInput(); + sideInputBuilder + .setTransformId(multimapSideInput.getTransformId()) + .setSideInputId(multimapSideInput.getSideInputId()); + } else if (stateKey.hasMultimapKeysSideInput()) { + MultimapKeysSideInput multimapKeysSideInput = stateKey.getMultimapKeysSideInput(); + sideInputBuilder + .setTransformId(multimapKeysSideInput.getTransformId()) + .setSideInputId(multimapKeysSideInput.getSideInputId()); + } + return sideInputCacheTokens.getOrDefault(sideInputBuilder.build(), ByteString.EMPTY); + } + } + + /** A key for caching the result of a StateGetRequest by cache and continuation tokens. 
*/ + @AutoValue + public abstract static class StateCacheKey { + public abstract ByteString getCacheToken(); + + public abstract ByteString getContinuationToken(); + + static StateCacheKey create(ByteString cacheToken, ByteString continuationToken) { + return new AutoValue_CachingBeamFnStateClient_StateCacheKey(cacheToken, continuationToken); + } + } +} diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/FnApiStateAccessor.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/FnApiStateAccessor.java index cff5bf2b9267..5a931c5910a2 100644 --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/FnApiStateAccessor.java +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/FnApiStateAccessor.java @@ -38,7 +38,6 @@ import org.apache.beam.sdk.state.MapState; import org.apache.beam.sdk.state.OrderedListState; import org.apache.beam.sdk.state.ReadableState; -import org.apache.beam.sdk.state.ReadableStates; import org.apache.beam.sdk.state.SetState; import org.apache.beam.sdk.state.StateBinder; import org.apache.beam.sdk.state.StateContext; @@ -53,7 +52,7 @@ import org.apache.beam.sdk.util.CombineFnUtil; import org.apache.beam.sdk.values.PCollectionView; import org.apache.beam.sdk.values.TupleTag; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; import org.checkerframework.checker.nullness.qual.Nullable; @@ -264,7 +263,7 @@ public T read() { @Override public ValueState readLater() { - // TODO: Support prefetching. + impl.get().iterator().prefetch(); return this; } }; @@ -310,7 +309,7 @@ public Iterable read() { @Override public BagState readLater() { - // TODO: Support prefetching. + impl.get().iterator().prefetch(); return this; } @@ -391,6 +390,7 @@ public AccumT mergeAccumulators(Iterable accumulators) { @Override public CombiningState readLater() { + impl.get().iterator().prefetch(); return this; } @@ -412,7 +412,18 @@ public void add(ElementT value) { @Override public ReadableState isEmpty() { - return ReadableStates.immediate(!impl.get().iterator().hasNext()); + return new ReadableState() { + @Override + public @Nullable Boolean read() { + return !impl.get().iterator().hasNext(); + } + + @Override + public ReadableState readLater() { + impl.get().iterator().prefetch(); + return this; + } + }; } @Override diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/FnApiTimerBundleTracker.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/FnApiTimerBundleTracker.java new file mode 100644 index 000000000000..ea4723e4d748 --- /dev/null +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/FnApiTimerBundleTracker.java @@ -0,0 +1,224 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.fn.harness.state; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; + +import com.google.auto.value.AutoValue; +import java.io.IOException; +import java.util.Comparator; +import java.util.NavigableSet; +import java.util.function.Function; +import java.util.function.Supplier; +import javax.annotation.Nullable; +import org.apache.beam.fn.harness.data.BeamFnTimerClient.TimerHandler; +import org.apache.beam.runners.core.construction.Timer; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.state.TimeDomain; +import org.apache.beam.sdk.transforms.windowing.BoundedWindow; +import org.apache.beam.sdk.util.UserCodeException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ComparisonChain; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.HashBasedTable; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Sets; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Table; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Table.Cell; + +public class FnApiTimerBundleTracker { + private final Supplier encodedCurrentKeySupplier; + private final Supplier encodedCurrentWindowSupplier; + private Table> timerModifications; + + @AutoValue + public abstract static class TimerInfo { + public abstract Timer getTimer(); + + public abstract String getTimerFamilyOrId(); + + public abstract TimeDomain getTimeDomain(); + + public static TimerInfo of( + Timer timer, String timerFamilyOrId, TimeDomain timeDomain) { + return new AutoValue_FnApiTimerBundleTracker_TimerInfo<>(timer, timerFamilyOrId, timeDomain); + } + } + + @AutoValue + public abstract static class Modifications { + public abstract NavigableSet> getModifiedEventTimersOrdered(); + + public abstract NavigableSet> getModifiedProcessingTimersOrdered(); + + public abstract NavigableSet> getModifiedSynchronizedProcessingTimersOrdered(); + + public NavigableSet> getModifiedTimersOrdered(TimeDomain timeDomain) { + switch (timeDomain) { + case EVENT_TIME: + return getModifiedEventTimersOrdered(); + case PROCESSING_TIME: + return getModifiedProcessingTimersOrdered(); + case SYNCHRONIZED_PROCESSING_TIME: + return getModifiedSynchronizedProcessingTimersOrdered(); + default: + throw new RuntimeException("Unexpected time domain " + timeDomain); + } + } + + public abstract Table> getModifiedTimerIds(); + + @SuppressWarnings({"nullness"}) + static Modifications create() { + Comparator timeDomainComparator = + (td1, td2) -> { + // We prioritize processing-time timers,as those tend to be more latency sensitive. + if (td1 == TimeDomain.PROCESSING_TIME && td2 == TimeDomain.EVENT_TIME) { + return -1; + } else if (td1 == TimeDomain.EVENT_TIME && td2 == TimeDomain.PROCESSING_TIME) { + return 1; + } else { + return td1.compareTo(td2); + } + }; + + // We don't compare userKey or window, as all timers in the TreeSet already have the same + // key/window. 
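+        // The resulting order is therefore: processing-time timers before event-time timers,
+        // cleared timers before set ones, then by fire timestamp, hold timestamp, and finally
+        // the dynamic timer tag.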
+ Comparator> comparator = + (o1, o2) -> { + ComparisonChain chain = + ComparisonChain.start() + .compare(o1.getTimeDomain(), o2.getTimeDomain(), timeDomainComparator) + .compareTrueFirst(o1.getTimer().getClearBit(), o2.getTimer().getClearBit()) + .compare(o1.getTimer().getFireTimestamp(), o2.getTimer().getFireTimestamp()) + .compare(o1.getTimer().getHoldTimestamp(), o2.getTimer().getHoldTimestamp()) + .compare( + o1.getTimer().getDynamicTimerTag(), o2.getTimer().getDynamicTimerTag()); + return chain.result(); + }; + + return new AutoValue_FnApiTimerBundleTracker_Modifications<>( + Sets.newTreeSet(comparator), + Sets.newTreeSet(comparator), + Sets.newTreeSet(comparator), + HashBasedTable.create()); + } + }; + + public FnApiTimerBundleTracker( + Coder keyCoder, + Coder windowCoder, + Supplier currentKeySupplier, + Supplier currentWindowSupplier) { + timerModifications = HashBasedTable.create(); + this.encodedCurrentKeySupplier = + memoizeFunction( + currentKeySupplier, + key -> { + checkState( + keyCoder != null, "Accessing state in unkeyed context, no key coder available"); + + ByteString.Output encodedKeyOut = ByteString.newOutput(); + try { + ((Coder) keyCoder).encode(key, encodedKeyOut, Coder.Context.NESTED); + } catch (IOException e) { + throw new IllegalStateException(e); + } + return encodedKeyOut.toByteString(); + }); + this.encodedCurrentWindowSupplier = + memoizeFunction( + currentWindowSupplier, + window -> { + ByteString.Output encodedWindowOut = ByteString.newOutput(); + try { + windowCoder.encode(window, encodedWindowOut); + } catch (IOException e) { + throw new IllegalStateException(e); + } + return encodedWindowOut.toByteString(); + }); + } + + public void timerModified(String timerFamilyOrId, TimeDomain timeDomain, Timer timer) { + ByteString keyString = encodedCurrentKeySupplier.get(); + ByteString windowString = encodedCurrentWindowSupplier.get(); + @Nullable Modifications modifications = timerModifications.get(keyString, windowString); + if (modifications == null) { + modifications = Modifications.create(); + timerModifications.put(keyString, windowString, modifications); + } + if (!timer.getClearBit()) { + modifications + .getModifiedTimersOrdered(timeDomain) + .add(TimerInfo.of(timer, timerFamilyOrId, timeDomain)); + } + modifications.getModifiedTimerIds().put(timerFamilyOrId, timer.getDynamicTimerTag(), timer); + } + + public Modifications getBundleModifications() { + ByteString keyString = encodedCurrentKeySupplier.get(); + ByteString windowString = encodedCurrentWindowSupplier.get(); + @Nullable Modifications modifications = timerModifications.get(keyString, windowString); + if (modifications == null) { + modifications = Modifications.create(); + timerModifications.put(keyString, windowString, modifications); + } + return modifications; + } + + public void outputTimers(Function> getHandler) { + for (Cell> cell : timerModifications.cellSet()) { + Modifications modifications = cell.getValue(); + if (modifications != null) { + for (Cell> timerCell : + modifications.getModifiedTimerIds().cellSet()) { + String timerFamilyOrId = timerCell.getRowKey(); + Timer timer = timerCell.getValue(); + try { + if (timerFamilyOrId != null && timer != null) { + getHandler.apply(timerFamilyOrId).accept(timer); + } + } catch (Throwable t) { + throw UserCodeException.wrap(t); + } + } + } + } + } + + private static Supplier memoizeFunction( + Supplier arg, Function f) { + return new Supplier() { + private @Nullable ArgT memoizedArg = null; + private @Nullable ResultT memoizedResult = null; 
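+      // get() recomputes the result only when the supplier yields a different instance (reference
+      // comparison), so repeated encodings of the same key or window object reuse the previously
+      // computed ByteString.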
+ + @Override + public ResultT get() { + ArgT currentArg = arg.get(); + if (memoizedArg == null || currentArg != memoizedArg) { + this.memoizedArg = currentArg; + memoizedResult = f.apply(currentArg); + } + if (memoizedResult != null) { + return memoizedResult; + } else { + throw new RuntimeException("Unexpected null result."); + } + } + }; + } +} diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/IterableSideInput.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/IterableSideInput.java index c4ee0b47973f..48c35de8d396 100644 --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/IterableSideInput.java +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/IterableSideInput.java @@ -19,9 +19,8 @@ import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateRequest; import org.apache.beam.sdk.coders.Coder; -import org.apache.beam.sdk.fn.stream.DataStreams; import org.apache.beam.sdk.transforms.Materializations.IterableView; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; /** * An implementation of a iterable side input that utilizes the Beam Fn State API to fetch values. @@ -66,11 +65,7 @@ public Iterable get() { .setSideInputId(sideInputId) .setWindow(encodedWindow); - return new LazyCachingIteratorToIterable<>( - new DataStreams.DataStreamDecoder( - valueCoder, - DataStreams.inbound( - StateFetchingIterators.readAllStartingFrom( - beamFnStateClient, requestBuilder.build())))); + return StateFetchingIterators.readAllAndDecodeStartingFrom( + beamFnStateClient, requestBuilder.build(), valueCoder); } } diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/LazyCachingIteratorToIterable.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/LazyCachingIteratorToIterable.java index cfc76cf1a726..7828f93ba027 100644 --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/LazyCachingIteratorToIterable.java +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/LazyCachingIteratorToIterable.java @@ -21,6 +21,9 @@ import java.util.Iterator; import java.util.List; import java.util.NoSuchElementException; +import java.util.Objects; +import org.apache.beam.sdk.fn.stream.PrefetchableIterable; +import org.apache.beam.sdk.fn.stream.PrefetchableIterator; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.checkerframework.checker.nullness.qual.Nullable; @@ -28,29 +31,41 @@ * Converts an iterator to an iterable lazily loading values from the underlying iterator and * caching them to support reiteration. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) -class LazyCachingIteratorToIterable implements Iterable { +class LazyCachingIteratorToIterable implements PrefetchableIterable { private final List cachedElements; - private final Iterator iterator; + private final PrefetchableIterator iterator; - public LazyCachingIteratorToIterable(Iterator iterator) { + public LazyCachingIteratorToIterable(PrefetchableIterator iterator) { this.cachedElements = new ArrayList<>(); this.iterator = iterator; } @Override - public Iterator iterator() { + public PrefetchableIterator iterator() { return new CachingIterator(); } /** An {@link Iterator} which adds and fetched values into the cached elements list. 
*/ - private class CachingIterator implements Iterator { + private class CachingIterator implements PrefetchableIterator { private int position = 0; private CachingIterator() {} + @Override + public boolean isReady() { + if (position < cachedElements.size()) { + return true; + } + return iterator.isReady(); + } + + @Override + public void prefetch() { + if (!isReady()) { + iterator.prefetch(); + } + } + @Override public boolean hasNext() { // The order of the short circuit is important below. @@ -76,7 +91,7 @@ public T next() { @Override public int hashCode() { - return iterator.hasNext() ? iterator.next().hashCode() : -1789023489; + return iterator.hasNext() ? Objects.hashCode(iterator.next()) : -1789023489; } @Override diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/MultimapSideInput.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/MultimapSideInput.java index 0f5bd4f584a7..2fc72fe9cfff 100644 --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/MultimapSideInput.java +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/MultimapSideInput.java @@ -20,9 +20,8 @@ import java.io.IOException; import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateRequest; import org.apache.beam.sdk.coders.Coder; -import org.apache.beam.sdk.fn.stream.DataStreams; import org.apache.beam.sdk.transforms.Materializations.MultimapView; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; /** * An implementation of a multimap side input that utilizes the Beam Fn State API to fetch values. @@ -71,12 +70,8 @@ public Iterable get() { .setSideInputId(sideInputId) .setWindow(encodedWindow); - return new LazyCachingIteratorToIterable<>( - new DataStreams.DataStreamDecoder( - keyCoder, - DataStreams.inbound( - StateFetchingIterators.readAllStartingFrom( - beamFnStateClient, requestBuilder.build())))); + return StateFetchingIterators.readAllAndDecodeStartingFrom( + beamFnStateClient, requestBuilder.build(), keyCoder); } @Override @@ -98,11 +93,7 @@ public Iterable get(K k) { .setWindow(encodedWindow) .setKey(output.toByteString()); - return new LazyCachingIteratorToIterable<>( - new DataStreams.DataStreamDecoder( - valueCoder, - DataStreams.inbound( - StateFetchingIterators.readAllStartingFrom( - beamFnStateClient, requestBuilder.build())))); + return StateFetchingIterators.readAllAndDecodeStartingFrom( + beamFnStateClient, requestBuilder.build(), valueCoder); } } diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/StateBackedIterable.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/StateBackedIterable.java index 8df1cfa6592c..3bca90dc4907 100644 --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/StateBackedIterable.java +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/StateBackedIterable.java @@ -23,7 +23,9 @@ import java.io.DataOutputStream; import java.io.IOException; import java.io.InputStream; +import java.io.ObjectStreamException; import java.io.OutputStream; +import java.io.Serializable; import java.util.Collections; import java.util.Iterator; import java.util.List; @@ -37,8 +39,9 @@ import org.apache.beam.sdk.fn.stream.DataStreams; import org.apache.beam.sdk.util.BufferedElementCountingOutputStream; import org.apache.beam.sdk.util.VarInt; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import 
org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterators; @@ -56,9 +59,9 @@ @SuppressWarnings({ "rawtypes" // TODO(https://issues.apache.org/jira/browse/BEAM-10556) }) -public class StateBackedIterable implements Iterable { +public class StateBackedIterable implements Iterable, Serializable { - private final BeamFnStateClient beamFnStateClient; + private final transient BeamFnStateClient beamFnStateClient; private final org.apache.beam.sdk.coders.Coder elemCoder; @VisibleForTesting final StateRequest request; @VisibleForTesting final List prefix; @@ -87,9 +90,11 @@ public Iterator iterator() { return Iterators.concat( prefix.iterator(), new DataStreams.DataStreamDecoder( - elemCoder, - DataStreams.inbound( - StateFetchingIterators.readAllStartingFrom(beamFnStateClient, request)))); + elemCoder, StateFetchingIterators.readAllStartingFrom(beamFnStateClient, request))); + } + + protected Object writeReplace() throws ObjectStreamException { + return ImmutableList.copyOf(this); } /** diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/StateFetchingIterators.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/StateFetchingIterators.java index dda5cd6b249b..1026ba590466 100644 --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/StateFetchingIterators.java +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/StateFetchingIterators.java @@ -24,7 +24,13 @@ import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateGetRequest; import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateRequest; import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateResponse; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.fn.stream.DataStreams.DataStreamDecoder; +import org.apache.beam.sdk.fn.stream.PrefetchableIterable; +import org.apache.beam.sdk.fn.stream.PrefetchableIterator; +import org.apache.beam.sdk.fn.stream.PrefetchableIterators; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Throwables; /** @@ -49,18 +55,174 @@ private StateFetchingIterators() {} * only) chunk of a state stream. This state request will be populated with a continuation * token to request further chunks of the stream if required. */ - public static Iterator readAllStartingFrom( + public static PrefetchableIterator readAllStartingFrom( BeamFnStateClient beamFnStateClient, StateRequest stateRequestForFirstChunk) { return new LazyBlockingStateFetchingIterator(beamFnStateClient, stateRequestForFirstChunk); } + /** + * This adapter handles using the continuation token to provide iteration over all the elements + * returned by the Beam Fn State API using the supplied state client, state request for the first + * chunk of the state stream, and a value decoder. + * + *
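+   * <p>A minimal usage sketch, assuming a {@code BeamFnStateClient}, a fully populated {@code
+   * StateRequest}, and a {@code Coder<T>} are in scope ({@code process} stands in for any consumer
+   * of the decoded values):
+   *
+   * <pre>{@code
+   * PrefetchableIterable<T> values =
+   *     StateFetchingIterators.readAllAndDecodeStartingFrom(stateClient, request, valueCoder);
+   * values.iterator().prefetch(); // optionally start fetching the first page early
+   * for (T value : values) {
+   *   process(value);
+   * }
+   * }</pre>
+   *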

    The first page, and only the first page, of the state request results is cached for + * efficient re-iteration for small state requests while still allowing unboundedly large state + * requests without unboundedly large memory consumption. + * + * @param beamFnStateClient A client for handling state requests. + * @param stateRequestForFirstChunk A fully populated state request for the first (and possibly + * only) chunk of a state stream. This state request will be populated with a continuation + * token to request further chunks of the stream if required. + * @param valueCoder A coder for decoding the state stream. + */ + public static PrefetchableIterable readAllAndDecodeStartingFrom( + BeamFnStateClient beamFnStateClient, + StateRequest stateRequestForFirstChunk, + Coder valueCoder) { + return new FirstPageAndRemainder<>(beamFnStateClient, stateRequestForFirstChunk, valueCoder); + } + + /** + * A helper class that (lazily) gives the first page of a paginated state request separately from + * all the remaining pages. + */ + @VisibleForTesting + static class FirstPageAndRemainder implements PrefetchableIterable { + private final BeamFnStateClient beamFnStateClient; + private final StateRequest stateRequestForFirstChunk; + private final Coder valueCoder; + private LazyCachingIteratorToIterable firstPage; + private CompletableFuture firstPageResponseFuture; + private ByteString continuationToken; + + FirstPageAndRemainder( + BeamFnStateClient beamFnStateClient, + StateRequest stateRequestForFirstChunk, + Coder valueCoder) { + this.beamFnStateClient = beamFnStateClient; + this.stateRequestForFirstChunk = stateRequestForFirstChunk; + this.valueCoder = valueCoder; + } + + @Override + public PrefetchableIterator iterator() { + return new PrefetchableIterator() { + PrefetchableIterator delegate; + + private void ensureDelegateExists() { + if (delegate == null) { + // Fetch the first page if necessary + prefetchFirstPage(); + if (firstPage == null) { + StateResponse stateResponse; + try { + stateResponse = firstPageResponseFuture.get(); + } catch (InterruptedException e) { + Thread.currentThread().interrupt(); + throw new IllegalStateException(e); + } catch (ExecutionException e) { + if (e.getCause() == null) { + throw new IllegalStateException(e); + } + Throwables.throwIfUnchecked(e.getCause()); + throw new IllegalStateException(e.getCause()); + } + continuationToken = stateResponse.getGet().getContinuationToken(); + firstPage = + new LazyCachingIteratorToIterable<>( + new DataStreamDecoder<>( + valueCoder, + PrefetchableIterators.fromArray(stateResponse.getGet().getData()))); + } + + if (ByteString.EMPTY.equals((continuationToken))) { + delegate = firstPage.iterator(); + } else { + delegate = + PrefetchableIterators.concat( + firstPage.iterator(), + new DataStreamDecoder<>( + valueCoder, + new LazyBlockingStateFetchingIterator( + beamFnStateClient, + stateRequestForFirstChunk + .toBuilder() + .setGet( + StateGetRequest.newBuilder() + .setContinuationToken(continuationToken)) + .build()))); + } + } + } + + @Override + public boolean isReady() { + if (delegate == null) { + if (firstPageResponseFuture != null) { + return firstPageResponseFuture.isDone(); + } + return false; + } + return delegate.isReady(); + } + + @Override + public void prefetch() { + if (firstPageResponseFuture == null) { + prefetchFirstPage(); + } else if (delegate != null && !delegate.isReady()) { + delegate.prefetch(); + } + } + + @Override + public boolean hasNext() { + if (delegate == null) { + // Ensure that we 
prefetch the second page after the first has been accessed. + // Prefetching subsequent pages after the first will be handled by the + // LazyBlockingStateFetchingIterator + ensureDelegateExists(); + boolean rval = delegate.hasNext(); + delegate.prefetch(); + return rval; + } + return delegate.hasNext(); + } + + @Override + public T next() { + if (delegate == null) { + // Ensure that we prefetch the second page after the first has been accessed. + // Prefetching subsequent pages after the first will be handled by the + // LazyBlockingStateFetchingIterator + ensureDelegateExists(); + T rval = delegate.next(); + delegate.prefetch(); + return rval; + } + return delegate.next(); + } + }; + } + + private void prefetchFirstPage() { + if (firstPageResponseFuture == null) { + firstPageResponseFuture = new CompletableFuture<>(); + beamFnStateClient.handle( + stateRequestForFirstChunk.toBuilder().setGet(stateRequestForFirstChunk.getGet()), + firstPageResponseFuture); + } + } + } + /** * An {@link Iterator} which fetches {@link ByteString} chunks using the State API. * - *

    This iterator will only request a chunk on first access. Also it does not eagerly pre-fetch - * any future chunks and blocks whenever required to fetch the next block. + *
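+   * <p>A rough sketch of the intended interaction, using only members defined in this class:
+   *
+   * <pre>{@code
+   * LazyBlockingStateFetchingIterator chunks =
+   *     new LazyBlockingStateFetchingIterator(beamFnStateClient, stateRequestForFirstChunk);
+   * chunks.prefetch();                  // issues the request for the current chunk without blocking
+   * while (chunks.hasNext()) {          // blocks only if the prefetched response has not arrived
+   *   ByteString chunk = chunks.next(); // next() also prefetches the following chunk, if any
+   * }
+   * }</pre>
+   *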

    This iterator will only request a chunk on first access. Subsequently it eagerly pre-fetches + * one future chunk at a time. */ - static class LazyBlockingStateFetchingIterator implements Iterator { + @VisibleForTesting + static class LazyBlockingStateFetchingIterator implements PrefetchableIterator { private enum State { READ_REQUIRED, @@ -73,13 +235,34 @@ private enum State { private State currentState; private ByteString continuationToken; private ByteString next; + private CompletableFuture prefetchedResponse; LazyBlockingStateFetchingIterator( BeamFnStateClient beamFnStateClient, StateRequest stateRequestForFirstChunk) { this.currentState = State.READ_REQUIRED; this.beamFnStateClient = beamFnStateClient; this.stateRequestForFirstChunk = stateRequestForFirstChunk; - this.continuationToken = ByteString.EMPTY; + this.continuationToken = stateRequestForFirstChunk.getGet().getContinuationToken(); + } + + @Override + public boolean isReady() { + if (prefetchedResponse == null) { + return currentState != State.READ_REQUIRED; + } + return prefetchedResponse.isDone(); + } + + @Override + public void prefetch() { + if (currentState == State.READ_REQUIRED && prefetchedResponse == null) { + prefetchedResponse = new CompletableFuture<>(); + beamFnStateClient.handle( + stateRequestForFirstChunk + .toBuilder() + .setGet(StateGetRequest.newBuilder().setContinuationToken(continuationToken)), + prefetchedResponse); + } } @Override @@ -88,15 +271,10 @@ public boolean hasNext() { case EOF: return false; case READ_REQUIRED: - CompletableFuture stateResponseFuture = new CompletableFuture<>(); - beamFnStateClient.handle( - stateRequestForFirstChunk - .toBuilder() - .setGet(StateGetRequest.newBuilder().setContinuationToken(continuationToken)), - stateResponseFuture); + prefetch(); StateResponse stateResponse; try { - stateResponse = stateResponseFuture.get(); + stateResponse = prefetchedResponse.get(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); throw new IllegalStateException(e); @@ -107,6 +285,7 @@ public boolean hasNext() { Throwables.throwIfUnchecked(e.getCause()); throw new IllegalStateException(e.getCause()); } + prefetchedResponse = null; continuationToken = stateResponse.getGet().getContinuationToken(); next = stateResponse.getGet().getData(); currentState = State.HAS_NEXT; @@ -123,7 +302,12 @@ public ByteString next() { throw new NoSuchElementException(); } // If the continuation token is empty, that means we have reached EOF. - currentState = ByteString.EMPTY.equals(continuationToken) ? State.EOF : State.READ_REQUIRED; + if (ByteString.EMPTY.equals(continuationToken)) { + currentState = State.EOF; + } else { + currentState = State.READ_REQUIRED; + prefetch(); + } return next; } } diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/status/BeamFnStatusClient.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/status/BeamFnStatusClient.java new file mode 100644 index 000000000000..ccdfacf1fc75 --- /dev/null +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/status/BeamFnStatusClient.java @@ -0,0 +1,255 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.fn.harness.status; + +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Comparator; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Objects; +import java.util.StringJoiner; +import java.util.concurrent.CompletableFuture; +import java.util.concurrent.TimeUnit; +import java.util.function.Function; +import org.apache.beam.fn.harness.control.ProcessBundleHandler.BundleProcessor; +import org.apache.beam.fn.harness.control.ProcessBundleHandler.BundleProcessorCache; +import org.apache.beam.model.fnexecution.v1.BeamFnApi; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.WorkerStatusRequest; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.WorkerStatusResponse; +import org.apache.beam.model.fnexecution.v1.BeamFnWorkerStatusGrpc; +import org.apache.beam.model.pipeline.v1.Endpoints.ApiServiceDescriptor; +import org.apache.beam.runners.core.metrics.ExecutionStateTracker; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ManagedChannel; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; +import org.checkerframework.checker.nullness.qual.Nullable; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class BeamFnStatusClient implements AutoCloseable { + private static final Object COMPLETED = new Object(); + private final StreamObserver outboundObserver; + private final BundleProcessorCache processBundleCache; + private final ManagedChannel channel; + private final CompletableFuture inboundObserverCompletion; + private static final Logger LOG = LoggerFactory.getLogger(BeamFnStatusClient.class); + private final MemoryMonitor memoryMonitor; + + public BeamFnStatusClient( + ApiServiceDescriptor apiServiceDescriptor, + Function channelFactory, + BundleProcessorCache processBundleCache, + PipelineOptions options) { + this.channel = channelFactory.apply(apiServiceDescriptor); + this.outboundObserver = + BeamFnWorkerStatusGrpc.newStub(channel).workerStatus(new InboundObserver()); + this.processBundleCache = processBundleCache; + this.memoryMonitor = MemoryMonitor.fromOptions(options); + this.inboundObserverCompletion = new CompletableFuture<>(); + Thread thread = new Thread(memoryMonitor); + thread.setDaemon(true); + thread.setPriority(Thread.MIN_PRIORITY); + thread.setName("MemoryMonitor"); + thread.start(); + } + + @Override + public void close() throws Exception { + try { + Object completion = inboundObserverCompletion.get(1, TimeUnit.MINUTES); + if (completion != COMPLETED) { + LOG.warn("InboundObserver for BeamFnStatusClient completed with exception."); + } + } finally { + // Shut the channel down + channel.shutdown(); + if (!channel.awaitTermination(10, TimeUnit.SECONDS)) { + channel.shutdownNow(); + } + } + } + + /** + * Class representing the execution state of a thread. + * + *
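+   * <p>{@code getThreadDump} below collapses threads with identical traces by keying a map on
+   * this class; a sketch, where {@code thread} is any live {@link Thread}:
+   *
+   * <pre>{@code
+   * Map<Stack, List<String>> stacks = new HashMap<>();
+   * Stack stack = new Stack(thread.getStackTrace(), thread.getState());
+   * stacks.computeIfAbsent(stack, s -> new ArrayList<>()).add(thread.toString());
+   * }</pre>
+   *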

    Can be used in hash maps. + */ + static class Stack { + final StackTraceElement[] elements; + final Thread.State state; + + Stack(StackTraceElement[] elements, Thread.State state) { + this.elements = elements; + this.state = state; + } + + @Override + public int hashCode() { + return Objects.hash(Arrays.deepHashCode(elements), state); + } + + @Override + public boolean equals(@Nullable Object other) { + if (other == this) { + return true; + } else if (!(other instanceof Stack)) { + return false; + } else { + Stack that = (Stack) other; + return state == that.state && Arrays.deepEquals(elements, that.elements); + } + } + } + + String getThreadDump() { + StringJoiner trace = new StringJoiner("\n"); + trace.add("========== THREAD DUMP =========="); + // filter duplicates. + Map> stacks = new HashMap<>(); + Thread.getAllStackTraces() + .forEach( + (thread, elements) -> { + if (thread != Thread.currentThread()) { + Stack stack = new Stack(elements, thread.getState()); + stacks.putIfAbsent(stack, new ArrayList<>()); + stacks.get(stack).add(thread.toString()); + } + }); + + // Stacks with more threads are printed first. + stacks.entrySet().stream() + .sorted(Comparator.comparingInt(entry -> -entry.getValue().size())) + .forEachOrdered( + entry -> { + Stack stack = entry.getKey(); + List threads = entry.getValue(); + trace.add( + String.format( + "---- Threads (%d): %s State: %s Stack: ----", + threads.size(), threads, stack.state)); + Arrays.stream(stack.elements).map(StackTraceElement::toString).forEach(trace::add); + trace.add("\n"); + }); + return trace.toString(); + } + + String getMemoryUsage() { + StringJoiner memory = new StringJoiner("\n"); + memory.add("========== MEMORY USAGE =========="); + memory.add(memoryMonitor.describeMemory()); + return memory.toString(); + } + /** Class representing the execution state of a bundle. */ + static class BundleState { + final String instruction; + final String trackedThreadName; + final long timeSinceTransition; + + public String getInstruction() { + return instruction; + } + + public String getTrackedThreadName() { + return trackedThreadName; + } + + public long getTimeSinceTransition() { + return timeSinceTransition; + } + + public BundleState(String instruction, String trackedThreadName, long timeSinceTransition) { + this.instruction = instruction; + this.trackedThreadName = trackedThreadName; + this.timeSinceTransition = timeSinceTransition; + } + } + + @VisibleForTesting + String getActiveProcessBundleState() { + StringJoiner activeBundlesState = new StringJoiner("\n"); + activeBundlesState.add("========== ACTIVE PROCESSING BUNDLES =========="); + if (processBundleCache.getActiveBundleProcessors().isEmpty()) { + activeBundlesState.add("No active processing bundles."); + } else { + List bundleStates = new ArrayList<>(); + processBundleCache.getActiveBundleProcessors().keySet().stream() + .forEach( + instruction -> { + BundleProcessor bundleProcessor = processBundleCache.find(instruction); + if (bundleProcessor != null) { + ExecutionStateTracker executionStateTracker = bundleProcessor.getStateTracker(); + Thread trackedTread = executionStateTracker.getTrackedThread(); + if (trackedTread != null) { + bundleStates.add( + new BundleState( + instruction, + trackedTread.getName(), + executionStateTracker.getMillisSinceLastTransition())); + } + } + }); + bundleStates.stream() + // reverse sort active bundle by time since last transition. 
+ .sorted(Comparator.comparing(BundleState::getTimeSinceTransition).reversed()) + .limit(10) // only keep top 10 + .forEachOrdered( + bundleState -> { + activeBundlesState.add( + String.format("---- Instruction %s ----", bundleState.getInstruction())); + activeBundlesState.add( + String.format("Tracked thread: %s", bundleState.getTrackedThreadName())); + activeBundlesState.add( + String.format( + "Time since transition: %.2f seconds%n", + bundleState.getTimeSinceTransition() / 1000.0)); + }); + } + return activeBundlesState.toString(); + } + + private class InboundObserver implements StreamObserver { + @Override + public void onNext(WorkerStatusRequest workerStatusRequest) { + StringJoiner status = new StringJoiner("\n"); + status.add(getMemoryUsage()); + status.add("\n"); + status.add(getActiveProcessBundleState()); + status.add("\n"); + status.add(getThreadDump()); + outboundObserver.onNext( + WorkerStatusResponse.newBuilder() + .setId(workerStatusRequest.getId()) + .setStatusInfo(status.toString()) + .build()); + } + + @Override + public void onError(Throwable t) { + LOG.error("Error getting SDK harness status", t); + inboundObserverCompletion.completeExceptionally(t); + } + + @Override + public void onCompleted() { + inboundObserverCompletion.complete(COMPLETED); + } + } +} diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/status/MemoryMonitor.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/status/MemoryMonitor.java new file mode 100644 index 000000000000..2a444d676af3 --- /dev/null +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/status/MemoryMonitor.java @@ -0,0 +1,635 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.fn.harness.status; + +import edu.umd.cs.findbugs.annotations.SuppressFBWarnings; +import java.io.File; +import java.io.FileInputStream; +import java.io.IOException; +import java.lang.management.GarbageCollectorMXBean; +import java.lang.management.ManagementFactory; +import java.nio.channels.Channels; +import java.nio.channels.ReadableByteChannel; +import java.nio.channels.WritableByteChannel; +import java.nio.file.Files; +import java.nio.file.attribute.PosixFilePermission; +import java.util.ArrayDeque; +import java.util.Queue; +import java.util.UUID; +import java.util.concurrent.atomic.AtomicBoolean; +import java.util.concurrent.atomic.AtomicInteger; +import javax.management.InstanceNotFoundException; +import javax.management.MBeanException; +import javax.management.MBeanServer; +import javax.management.MalformedObjectNameException; +import javax.management.ObjectName; +import javax.management.ReflectionException; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.io.fs.CreateOptions.StandardCreateOptions; +import org.apache.beam.sdk.io.fs.ResourceId; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteStreams; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AtomicDouble; +import org.checkerframework.checker.nullness.qual.Nullable; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * A runnable which monitors a server for GC thrashing. + * + *

    Note: Only one instance of this should be initialized per server, and it should be started when + * the server starts running. + * + *
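    A minimal start-up sketch (illustrative; it mirrors how BeamFnStatusClient in this patch wires + * the monitor onto a low-priority daemon thread): + *
 + *   MemoryMonitor memoryMonitor = MemoryMonitor.fromOptions(options);
 + *   Thread thread = new Thread(memoryMonitor);
 + *   thread.setDaemon(true);
 + *   thread.setPriority(Thread.MIN_PRIORITY);
 + *   thread.setName("MemoryMonitor");
 + *   thread.start();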

    This runnable works as follows: + * + *

      + *
    • It wakes up periodically and determines how much time was spent on garbage collection since + * the last time it woke up. + *
    • If the time spent in garbage collection in the last period of time exceeds a certain + * threshold, that period is marked as "being in GC thrashing" + *
    • It keeps track of the GC thrashing status of the last few periods. + *
    • Every time the runnable's thread wakes up, it computes the ratio {@code (# monitored + * periods in GC thrashing) / (# monitored periods)}. + *
    • If this ratio exceeds a certain threshold, it is assumed that the server is in GC + * thrashing. + *
    • It can also shut down the current JVM runtime when a threshold of consecutive GC-thrashing + * periods is reached. A heap dump is made before shutdown. + *
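    For example (illustrative, using the defaults defined below): with a 15-second sleep period and + * four monitored periods, if three of the last four periods each spent more than 50% of their + * wall time in GC, the ratio is 75%, which exceeds the 60% server-level threshold, so the server + * is considered to be in GC thrashing. + *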
    + */ +@SuppressWarnings({ + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) +public class MemoryMonitor implements Runnable { + private static final Logger LOG = LoggerFactory.getLogger(MemoryMonitor.class); + + /** Amount of time (in ms) this thread must sleep between two consecutive iterations. */ + public static final long DEFAULT_SLEEP_TIME_MILLIS = 15 * 1000; // 15 sec. + + /** + * The number of periods to take into account when determining if the server is in GC thrashing. + */ + private static final int NUM_MONITORED_PERIODS = 4; // ie 1 min's worth. + + /** + * The (# monitored periods in GC thrashing) / (# monitored + * periods) threshold after which the server is considered to be in GC thrashing, expressed + * as a percentage. + */ + private static final double GC_THRASHING_PERCENTAGE_PER_SERVER = 60.0; + + /** + * The GC thrashing threshold percentage. A given period of time is considered "thrashing" if this + * percentage of CPU time is spent in garbage collection. + * + *

    If {@literal 100} is given as the value, MemoryMonitor will be disabled. + */ + private static final double GC_THRASHING_PERCENTAGE_PER_PERIOD = 50.0; + + /** + * The amount of memory (in bytes) we should pre-allocate, in order to be able to dump the heap. + * + *

    Since the server is in GC thrashing when we try to dump the heap, we might not be able to + * successfully do it. However, if we pre-allocate a big enough block of memory and "release" it + * right before trying to dump the heap, the pre-allocated block of memory will get GCed, and the + * heap dump might succeed. + */ + private static final int HEAP_DUMP_RESERVED_BYTES = 10 << 20; // 10MB + + /** + * Shutdown the current JVM instance after given consecutive gc thrashing periods are detected. + * This offers an opportunity to fast kill a JVM server if it is about to enter a long lasting gc + * thrashing state, which is almost never a desired behavior for a healthy server. 0 to disable. + */ + private static final int DEFAULT_SHUT_DOWN_AFTER_NUM_GCTHRASHING = 8; // ie 2 min's worth. + + /** Delay between logging the current memory state. */ + private static final int NORMAL_LOGGING_PERIOD_MILLIS = 5 * 60 * 1000; // 5 min. + + /** Abstract interface for providing GC stats (for testing). */ + public interface GCStatsProvider { + /** Return the total milliseconds spent in GC since JVM was started. */ + long totalGCTimeMilliseconds(); + } + + /** True system GC stats. */ + private static class SystemGCStatsProvider implements GCStatsProvider { + @Override + public long totalGCTimeMilliseconds() { + long inGC = 0; + for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) { + inGC += gc.getCollectionTime(); + } + return inGC; + } + } + + /** Where to get GC stats. */ + private final GCStatsProvider gcStatsProvider; + + /** Actual sleep time, in milliseconds, for main monitor. */ + private final long sleepTimeMillis; + + /** Actual number of cycles before shutting down VM. */ + private final int shutDownAfterNumGCThrashing; + + /** + * The state of the periods that are taken into account when deciding if the server is in GC + * thrashing. + */ + private final Queue periodIsThrashing = new ArrayDeque<>(); + + /** Keeps track of the time the server spent in GC since it started running. */ + private long timeInGC = 0; + + /** + * A reserved block of memory, needed to dump the heap. Dumping the heap requires memory. However, + * since we try to do it when the server is in GC thrashing, no memory is available and dumpHeap() + * brings the server down. If we pre-allocate a block of memory though, and "release" it right + * before dumping the heap, this block of memory will be garbage collected, thus giving dumpHeap() + * enough space to dump the heap. + */ + @SuppressFBWarnings("unused") + private byte[] reservedForDumpingHeap = new byte[HEAP_DUMP_RESERVED_BYTES]; + + /** If true, dump the heap when thrashing or requested. */ + private final boolean canDumpHeap; + + /** + * The GC thrashing threshold for every period. If the time spent on garbage collection in one + * period exceeds this threshold, that period is considered to be in GC thrashing. + */ + private final double gcThrashingPercentagePerPeriod; + + private final AtomicBoolean isThrashing = new AtomicBoolean(false); + + private final AtomicBoolean isRunning = new AtomicBoolean(false); + + private final AtomicDouble lastMeasuredGCPercentage = new AtomicDouble(0.0); + private final AtomicDouble maxGCPercentage = new AtomicDouble(0.0); + private final AtomicInteger numPushbacks = new AtomicInteger(0); + + /** Wait point for threads in pushback waiting for gc thrashing to pass. 
*/ + private final Object waitingForResources = new Object(); + + /** Wait point for threads wanting to wait for change to isRunning or isThrashing state. */ + private final Object waitingForStateChange = new Object(); + + /** + * If non null, if a heap dump is detected during initialization upload it to the given GCS path. + */ + private final @Nullable String uploadFilePath; + + private final File localDumpFolder; + + public static MemoryMonitor fromOptions(PipelineOptions options) { + String uploadFilePath = options.getTempLocation(); + boolean canDumpHeap = uploadFilePath != null; + + return new MemoryMonitor( + new SystemGCStatsProvider(), + DEFAULT_SLEEP_TIME_MILLIS, + DEFAULT_SHUT_DOWN_AFTER_NUM_GCTHRASHING, + canDumpHeap, + GC_THRASHING_PERCENTAGE_PER_PERIOD, + uploadFilePath, + getLoggingDir()); + } + + @VisibleForTesting + static MemoryMonitor forTest( + GCStatsProvider gcStatsProvider, + long sleepTimeMillis, + int shutDownAfterNumGCThrashing, + boolean canDumpHeap, + double gcThrashingPercentagePerPeriod, + @Nullable String uploadFilePath, + File localDumpFolder) { + return new MemoryMonitor( + gcStatsProvider, + sleepTimeMillis, + shutDownAfterNumGCThrashing, + canDumpHeap, + gcThrashingPercentagePerPeriod, + uploadFilePath, + localDumpFolder); + } + + private MemoryMonitor( + GCStatsProvider gcStatsProvider, + long sleepTimeMillis, + int shutDownAfterNumGCThrashing, + boolean canDumpHeap, + double gcThrashingPercentagePerPeriod, + @Nullable String uploadFilePath, + File localDumpFolder) { + this.gcStatsProvider = gcStatsProvider; + this.sleepTimeMillis = sleepTimeMillis; + this.shutDownAfterNumGCThrashing = shutDownAfterNumGCThrashing; + this.canDumpHeap = canDumpHeap; + this.gcThrashingPercentagePerPeriod = gcThrashingPercentagePerPeriod; + this.uploadFilePath = uploadFilePath; + this.localDumpFolder = localDumpFolder; + } + + /** For testing only: Wait for the monitor to be running. */ + @VisibleForTesting + void waitForRunning() { + synchronized (waitingForStateChange) { + boolean interrupted = false; + try { + while (!isRunning.get()) { + try { + waitingForStateChange.wait(); + } catch (InterruptedException e) { + interrupted = true; + // Retry test. + } + } + } finally { + if (interrupted) { + Thread.currentThread().interrupt(); + } + } + } + } + + /** For testing only: Wait for thrashing status to be updated to given value. */ + @VisibleForTesting + public void waitForThrashingState(boolean desiredThrashingState) { + synchronized (waitingForStateChange) { + boolean interrupted = false; + try { + while (isThrashing.get() != desiredThrashingState) { + try { + waitingForStateChange.wait(); + } catch (InterruptedException e) { + interrupted = true; + // Retry test. 
+ } + } + } finally { + if (interrupted) { + Thread.currentThread().interrupt(); + } + } + } + } + + private File getDefaultHeapDumpPath() { + return new File(localDumpFolder, "heap_dump.hprof"); + } + + @VisibleForTesting + boolean tryUploadHeapDumpIfItExists() { + if (uploadFilePath == null) { + return false; + } + + boolean uploadedHeapDump = false; + File localSource = getDefaultHeapDumpPath(); + LOG.info("Looking for heap dump at {}", localSource); + if (localSource.exists()) { + LOG.warn("Heap dump {} detected, attempting to upload file to ", localSource); + String remoteDest = + String.format("%s/heap_dump%s.hprof", uploadFilePath, UUID.randomUUID().toString()); + ResourceId resource = FileSystems.matchNewResource(remoteDest, false); + try { + uploadFile(localSource, resource); + uploadedHeapDump = true; + LOG.warn("Heap dump {} uploaded to {}", localSource, remoteDest); + } catch (IOException e) { + LOG.error("Error uploading heap dump to {}", remoteDest, e); + } + + try { + Files.delete(localSource.toPath()); + LOG.info("Deleted local heap dump {}", localSource); + } catch (IOException e) { + LOG.warn("Unable to delete local heap dump {}", localSource, e); + } + } + + return uploadedHeapDump; + } + + private void uploadFile(File srcPath, ResourceId destination) throws IOException { + StandardCreateOptions createOptions = + StandardCreateOptions.builder().setMimeType("application/octet-stream").build(); + try (WritableByteChannel dst = FileSystems.create(destination, createOptions)) { + try (ReadableByteChannel src = Channels.newChannel(new FileInputStream(srcPath))) { + ByteStreams.copy(src, dst); + } + } + } + + /** Request the memory monitor stops. */ + public void stop() { + synchronized (waitingForStateChange) { + isRunning.set(false); + waitingForStateChange.notifyAll(); + } + } + + public boolean isThrashing() { + return isThrashing.get(); + } + + /** + * Check if we've observed high gc workload in sufficient sample periods to justify classifying + * the server as in gc thrashing. + */ + private void updateIsThrashing() { + // have we monitored enough periods? + if (periodIsThrashing.size() < NUM_MONITORED_PERIODS) { + setIsThrashing(false); + return; + } + + // count the number of periods in GC thrashing + int numPeriodsInGCThrashing = 0; + for (Boolean state : periodIsThrashing) { + numPeriodsInGCThrashing += (state ? 1 : 0); + } + + // Did we have too many periods in GC thrashing? + boolean serverInGcThrashing = + (numPeriodsInGCThrashing * 100 + >= periodIsThrashing.size() * GC_THRASHING_PERCENTAGE_PER_SERVER); + setIsThrashing(serverInGcThrashing); + } + + /** Set the thrashing state. */ + private void setIsThrashing(boolean serverInGcThrashing) { + synchronized (waitingForResources) { + synchronized (waitingForStateChange) { + boolean prev = isThrashing.getAndSet(serverInGcThrashing); + if (prev && !serverInGcThrashing) { + waitingForResources.notifyAll(); + } + if (prev != serverInGcThrashing) { + waitingForStateChange.notifyAll(); + } + } + } + } + + /** + * Determines if too much time was spent on garbage collection in the last period of time. + * + * @param now The current time. + * @param lastTimeWokeUp The last time this thread woke up. + * @return The state of the last period of time. + */ + private boolean wasLastPeriodInGCThrashing(long now, long lastTimeWokeUp) { + // Find out how much time was spent on garbage collection + // since the start of the server. This queries the set of garbage collectors for + // how long each one has spent doing GC. 
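 + // Illustrative arithmetic: if cumulative GC time grew by 9,000 ms while 15,000 ms of wall time + // elapsed since the last wake-up, gcPercentage below evaluates to 60.0, which exceeds the + // default 50% per-period threshold, so the period counts as thrashing.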
+ long inGC = gcStatsProvider.totalGCTimeMilliseconds(); + + // Compare the amount of time spent in GC thrashing to the given threshold; + // if config.getSleepTimeMillis() is equal to 0 (should happen in tests only), + // then we compare percentage-per-period to 100% + double gcPercentage = (inGC - timeInGC) * 100.0 / (now - lastTimeWokeUp); + + lastMeasuredGCPercentage.set(gcPercentage); + maxGCPercentage.set(Math.max(maxGCPercentage.get(), gcPercentage)); + timeInGC = inGC; + + return gcPercentage > this.gcThrashingPercentagePerPeriod; + } + + /** + * Updates the data we monitor. + * + * @param now The current time. + * @param lastTimeWokeUp The last time this thread woke up. + */ + private void updateData(long now, long lastTimeWokeUp) { + // remove data that's no longer relevant + int numIntervals = NUM_MONITORED_PERIODS; + while (periodIsThrashing.size() >= numIntervals) { + periodIsThrashing.poll(); + } + // store the state of the last period + boolean wasThrashing = wasLastPeriodInGCThrashing(now, lastTimeWokeUp); + periodIsThrashing.offer(wasThrashing); + } + + /** + * Dumps the heap to a file and return the name of the file, or {@literal null} if the heap should + * not or could not be dumped. + * + * @return The name of the file the heap was dumped to, otherwise {@literal null}. + */ + public @Nullable File tryToDumpHeap() { + if (!canDumpHeap) { + return null; + } + + // Clearing this list should "release" some memory that will be needed to dump the heap. + // We could try to reallocate it again if we later notice memory pressure has subsided, + // but that is risk. Further, leaving this released may help with the memory pressure. + reservedForDumpingHeap = null; + + try { + return dumpHeap(); + } catch (Exception e) { + LOG.warn("Unable to dump heap: ", e); + return null; + } + } + + @SuppressFBWarnings("DM_EXIT") // we deliberately System.exit under memory + private void shutDownDueToGcThrashing(int thrashingCount) { + File heapDumpFile = tryToDumpHeap(); + LOG.error( + "Shutting down JVM after {} consecutive periods of measured GC thrashing. " + + "Memory is {}. Heap dump {}.", + thrashingCount, + describeMemory(), + heapDumpFile == null ? "not written" : ("written to '" + heapDumpFile + "'")); + + System.exit(1); + } + + /** Runs this thread. */ + @Override + public void run() { + synchronized (waitingForStateChange) { + Preconditions.checkState(!isRunning.getAndSet(true), "already running"); + + if (this.gcThrashingPercentagePerPeriod <= 0 || this.gcThrashingPercentagePerPeriod >= 100) { + LOG.warn( + "gcThrashingPercentagePerPeriod: {} is not valid value. Not starting MemoryMonitor.", + this.gcThrashingPercentagePerPeriod); + isRunning.set(false); + } + + waitingForStateChange.notifyAll(); + } + + // Within the memory monitor thread check to see if there is a pre-existing heap dump, and + // attempt to upload it. Note that this will delay the first memory monitor check. 
+ tryUploadHeapDumpIfItExists(); + + try { + long lastTimeWokeUp = System.currentTimeMillis(); + long lastLog = -1; + int currentThrashingCount = 0; + while (true) { + synchronized (waitingForStateChange) { + waitingForStateChange.wait(sleepTimeMillis); + } + if (!isRunning.get()) { + break; + } + long now = System.currentTimeMillis(); + + updateData(now, lastTimeWokeUp); + updateIsThrashing(); + + if (lastLog < 0 || lastLog + NORMAL_LOGGING_PERIOD_MILLIS < now) { + LOG.info("Memory is {}", describeMemory()); + lastLog = now; + } + + if (isThrashing.get()) { + currentThrashingCount++; + + if (shutDownAfterNumGCThrashing > 0 + && (currentThrashingCount >= shutDownAfterNumGCThrashing)) { + shutDownDueToGcThrashing(currentThrashingCount); + } + } else { + // Reset the counter whenever the server is evaluated not under gc thrashing. + currentThrashingCount = 0; + } + + lastTimeWokeUp = now; + } + } catch (InterruptedException e) { + Thread.currentThread().interrupt(); + // most probably means that the server is shutting down + // in any case, there's not much we can do here + LOG.info("The GCThrashingMonitor was interrupted."); + } + } + + /** Return only when the server is not in the GC thrashing state. */ + public void waitForResources(String context) { + if (!isThrashing.get()) { + return; + } + numPushbacks.incrementAndGet(); + LOG.info("Waiting for resources for {}. Memory is {}", context, describeMemory()); + synchronized (waitingForResources) { + boolean interrupted = false; + try { + // No big deal if isThrashing became false in the meantime. + while (isThrashing.get()) { + try { + waitingForResources.wait(); + } catch (InterruptedException e1) { + interrupted = true; + LOG.debug("waitForResources was interrupted."); + } + } + } finally { + if (interrupted) { + Thread.currentThread().interrupt(); + } + } + } + LOG.info("Resources granted for {}. Memory is {}", context, describeMemory()); + } + + /** Return the path for logging heap dumps. */ + private static File getLoggingDir() { + return new File(System.getProperty("java.io.tmpdir")); + } + + /** + * Dump the current heap profile to a file in the given directory and return its name. + * + *

    NOTE: We deliberately don't salt the heap dump filename so as to minimize disk impact of + * repeated dumps. These files can be of comparable size to the local disk. + */ + public File dumpHeap() + throws MalformedObjectNameException, InstanceNotFoundException, ReflectionException, + MBeanException, IOException { + return dumpHeap(localDumpFolder); + } + + /** + * Dump the current heap profile to a file in the given directory and return its name. + * + *

    NOTE: We deliberately don't salt the heap dump filename so as to minimize disk impact of + * repeated dumps. These files can be of comparable size to the local disk. + */ + @VisibleForTesting + static synchronized File dumpHeap(File directory) + throws MalformedObjectNameException, InstanceNotFoundException, ReflectionException, + MBeanException, IOException { + boolean liveObjectsOnly = false; + File fileName = new File(directory, "heap_dump.hprof"); + if (fileName.exists() && !fileName.delete()) { + throw new IOException("heap_dump.hprof already existed and couldn't be deleted!"); + } + + MBeanServer mbs = ManagementFactory.getPlatformMBeanServer(); + ObjectName oname = new ObjectName("com.sun.management:type=HotSpotDiagnostic"); + Object[] parameters = {fileName.getPath(), liveObjectsOnly}; + String[] signatures = {String.class.getName(), boolean.class.getName()}; + mbs.invoke(oname, "dumpHeap", parameters, signatures); + + if (java.nio.file.FileSystems.getDefault().supportedFileAttributeViews().contains("posix")) { + Files.setPosixFilePermissions( + fileName.toPath(), + ImmutableSet.of( + PosixFilePermission.OWNER_READ, + PosixFilePermission.GROUP_READ, + PosixFilePermission.OTHERS_READ)); + } else { + fileName.setReadable(true, true); + } + + LOG.warn("Heap dumped to {}", fileName); + + return fileName; + } + + /** Return a string describing the current memory state of the server. */ + public String describeMemory() { + Runtime runtime = Runtime.getRuntime(); + long maxMemory = runtime.maxMemory(); + long totalMemory = runtime.totalMemory(); + long usedMemory = totalMemory - runtime.freeMemory(); + return String.format( + "used/total/max = %d/%d/%d MB, GC last/max = %.2f/%.2f %%, #pushbacks=%d, gc thrashing=%s", + usedMemory >> 20, + totalMemory >> 20, + maxMemory >> 20, + lastMeasuredGCPercentage.get(), + maxGCPercentage.get(), + numPushbacks.get(), + isThrashing.get()); + } +} diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/status/package-info.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/status/package-info.java new file mode 100644 index 000000000000..c3ec1799d2d2 --- /dev/null +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/status/package-info.java @@ -0,0 +1,20 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** Worker status client. 
*/ +package org.apache.beam.fn.harness.status; diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/stream/HarnessStreamObserverFactories.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/stream/HarnessStreamObserverFactories.java index 8707c3be9e67..23346753dad1 100644 --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/stream/HarnessStreamObserverFactories.java +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/stream/HarnessStreamObserverFactories.java @@ -22,7 +22,7 @@ import org.apache.beam.sdk.fn.stream.OutboundObserverFactory; import org.apache.beam.sdk.options.ExperimentalOptions; import org.apache.beam.sdk.options.PipelineOptions; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; /** * Uses {@link PipelineOptions} to configure which underlying {@link StreamObserver} implementation diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/AssignWindowsRunnerTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/AssignWindowsRunnerTest.java index ae6c4321a07d..51884cb89c0b 100644 --- a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/AssignWindowsRunnerTest.java +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/AssignWindowsRunnerTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.fn.harness; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.mock; import java.io.Serializable; @@ -28,9 +28,11 @@ import java.util.Collections; import org.apache.beam.fn.harness.AssignWindowsRunner.AssignWindowsMapFnFactory; import org.apache.beam.fn.harness.data.PCollectionConsumerRegistry; +import org.apache.beam.model.pipeline.v1.RunnerApi; import org.apache.beam.model.pipeline.v1.RunnerApi.FunctionSpec; import org.apache.beam.model.pipeline.v1.RunnerApi.PTransform; import org.apache.beam.model.pipeline.v1.RunnerApi.WindowIntoPayload; +import org.apache.beam.runners.core.construction.CoderTranslation; import org.apache.beam.runners.core.construction.Environments; import org.apache.beam.runners.core.construction.PTransformTranslation; import org.apache.beam.runners.core.construction.SdkComponents; @@ -38,6 +40,7 @@ import org.apache.beam.runners.core.metrics.ExecutionStateTracker; import org.apache.beam.runners.core.metrics.MetricsContainerStepMap; import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.VarIntCoder; import org.apache.beam.sdk.function.ThrowingFunction; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.transforms.windowing.FixedWindows; @@ -176,9 +179,13 @@ public Coder windowCoder() { PCollectionConsumerRegistry pCollectionConsumerRegistry = new PCollectionConsumerRegistry( metricsContainerRegistry, mock(ExecutionStateTracker.class)); - pCollectionConsumerRegistry.register("output", "ptransform", outputs::add); + pCollectionConsumerRegistry.register("output", "ptransform", outputs::add, VarIntCoder.of()); SdkComponents components = SdkComponents.create(); components.registerEnvironment(Environments.createDockerEnvironment("java")); + RunnerApi.PCollection pCollection = + RunnerApi.PCollection.newBuilder().setUniqueName("input").setCoderId("coder-id").build(); + RunnerApi.Coder coder = 
CoderTranslation.toProto(VarIntCoder.of()).getCoder(); + MapFnRunners.forWindowedValueMapFnFactory(new AssignWindowsMapFnFactory<>()) .createRunnerForPTransform( null /* pipelineOptions */, @@ -200,9 +207,9 @@ public Coder windowCoder() { .toByteString())) .build(), null /* processBundleInstructionId */, - null /* pCollections */, - null /* coders */, - null /* windowingStrategies */, + Collections.singletonMap("input", pCollection) /* pCollections */, + Collections.singletonMap("coder-id", coder) /* coders */, + Collections.emptyMap() /* windowingStrategies */, pCollectionConsumerRegistry, null /* startFunctionRegistry */, null /* finishFunctionRegistry */, diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/BeamFnDataReadRunnerTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/BeamFnDataReadRunnerTest.java index e032e8f02c23..449e50cd3e92 100644 --- a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/BeamFnDataReadRunnerTest.java +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/BeamFnDataReadRunnerTest.java @@ -18,11 +18,11 @@ package org.apache.beam.fn.harness; import static org.apache.beam.sdk.util.WindowedValue.valueInGlobalWindow; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.empty; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import static org.mockito.Matchers.any; import static org.mockito.Matchers.anyDouble; @@ -95,9 +95,6 @@ /** Tests for {@link BeamFnDataReadRunner}. */ @RunWith(Enclosed.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamFnDataReadRunnerTest { private static final Coder ELEMENT_CODER = StringUtf8Coder.of(); private static final String ELEMENT_CODER_SPEC_ID = "string-coder-id"; @@ -111,6 +108,7 @@ public class BeamFnDataReadRunnerTest { .setApiServiceDescriptor(Endpoints.ApiServiceDescriptor.getDefaultInstance()) .setCoderId(CODER_SPEC_ID) .build(); + private static final String DEFAULT_BUNDLE_ID = "57"; static { try { @@ -146,8 +144,6 @@ public void setUp() { @Test public void testCreatingAndProcessingBeamFnDataReadRunner() throws Exception { - String bundleId = "57"; - List> outputValues = new ArrayList<>(); MetricsContainerStepMap metricsContainerRegistry = new MetricsContainerStepMap(); @@ -159,7 +155,8 @@ public void testCreatingAndProcessingBeamFnDataReadRunner() throws Exception { consumers.register( localOutputId, pTransformId, - (FnDataReceiver) (FnDataReceiver>) outputValues::add); + (FnDataReceiver) (FnDataReceiver>) outputValues::add, + StringUtf8Coder.of()); PTransformFunctionRegistry startFunctionRegistry = new PTransformFunctionRegistry( mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "start"); @@ -180,7 +177,7 @@ public void testCreatingAndProcessingBeamFnDataReadRunner() throws Exception { null /* beamFnTimerClient */, pTransformId, pTransform, - Suppliers.ofInstance(bundleId)::get, + Suppliers.ofInstance(DEFAULT_BUNDLE_ID)::get, ImmutableMap.of( localOutputId, RunnerApi.PCollection.newBuilder().setCoderId(ELEMENT_CODER_SPEC_ID).build()), @@ -205,7 +202,7 @@ public void testCreatingAndProcessingBeamFnDataReadRunner() throws Exception { verify(mockBeamFnDataClient) .receive( eq(PORT_SPEC.getApiServiceDescriptor()), - eq(LogicalEndpoint.data(bundleId, pTransformId)), + 
eq(LogicalEndpoint.data(DEFAULT_BUNDLE_ID, pTransformId)), eq(CODER), consumerCaptor.capture()); @@ -291,6 +288,8 @@ public void testReuseForMultipleBundles() throws Exception { Iterables.getOnlyElement(progressCallbacks).getMonitoringInfos())); assertThat(values, contains(valueInGlobalWindow("ABC"), valueInGlobalWindow("DEF"))); + // The BundleProcessor should be released first before calling reset. + bundleId.set(null); readRunner.reset(); values.clear(); // Ensure that when we reuse the BeamFnDataReadRunner the read index is reset to -1 @@ -365,7 +364,8 @@ public void testSplittingBeforeStartBundle() throws Exception { // The split should happen at 5 since the allowedSplitPoints is empty. assertEquals( channelSplitResult(5), - executeSplit(readRunner, PTRANSFORM_ID, -1L, 0.5, 10, Collections.EMPTY_LIST)); + executeSplit( + readRunner, PTRANSFORM_ID, DEFAULT_BUNDLE_ID, -1L, 0.5, 10, Collections.EMPTY_LIST)); readRunner.registerInputLocation(); // Ensure that we process the correct number of elements after splitting. @@ -395,7 +395,8 @@ public void testSplittingWhenNoElementsProcessed() throws Exception { // The split should happen at 5 since the allowedSplitPoints is empty. assertEquals( channelSplitResult(5), - executeSplit(readRunner, PTRANSFORM_ID, -1L, 0.5, 10, Collections.EMPTY_LIST)); + executeSplit( + readRunner, PTRANSFORM_ID, DEFAULT_BUNDLE_ID, -1L, 0.5, 10, Collections.EMPTY_LIST)); // Ensure that we process the correct number of elements after splitting. readRunner.forwardElementToConsumer(valueInGlobalWindow("A")); @@ -423,7 +424,8 @@ public void testSplittingWhenSomeElementsProcessed() throws Exception { readRunner.registerInputLocation(); assertEquals( channelSplitResult(6), - executeSplit(readRunner, PTRANSFORM_ID, 1L, 0.5, 10, Collections.EMPTY_LIST)); + executeSplit( + readRunner, PTRANSFORM_ID, DEFAULT_BUNDLE_ID, 1L, 0.5, 10, Collections.EMPTY_LIST)); // Ensure that we process the correct number of elements after splitting. readRunner.forwardElementToConsumer(valueInGlobalWindow("1")); @@ -441,6 +443,36 @@ public void testSplittingWhenSomeElementsProcessed() throws Exception { valueInGlobalWindow("3"), valueInGlobalWindow("4"))); } + + @Test + public void testSplittingAfterReuse() throws Exception { + List> outputValues = new ArrayList<>(); + BeamFnDataReadRunner readRunner = + createReadRunner(outputValues::add, PTRANSFORM_ID, mockBeamFnDataClient); + readRunner.registerInputLocation(); + // This split should not be executed. + assertEquals( + BeamFnApi.ProcessBundleSplitResponse.getDefaultInstance(), + executeSplit( + readRunner, PTRANSFORM_ID, "previousBundleId", 1L, 0.25, 10, Collections.EMPTY_LIST)); + + // Ensure that we process the correct number of elements after *not* splitting. + readRunner.forwardElementToConsumer(valueInGlobalWindow("1")); + readRunner.forwardElementToConsumer(valueInGlobalWindow("2")); + readRunner.forwardElementToConsumer(valueInGlobalWindow("3")); + readRunner.forwardElementToConsumer(valueInGlobalWindow("4")); + readRunner.forwardElementToConsumer(valueInGlobalWindow("5")); + assertThat( + outputValues, + contains( + valueInGlobalWindow("-1"), + valueInGlobalWindow("0"), + valueInGlobalWindow("1"), + valueInGlobalWindow("2"), + valueInGlobalWindow("3"), + valueInGlobalWindow("4"), + valueInGlobalWindow("5"))); + } } // Test different cases of chan nel split with empty allowed split points. 
@@ -496,6 +528,7 @@ public void testChannelSplit() throws Exception { executeSplit( readRunner, PTRANSFORM_ID, + DEFAULT_BUNDLE_ID, index, fractionOfRemainder, bufferSize, @@ -560,6 +593,7 @@ public void testChannelSplittingWithAllowedSplitPoints() throws Exception { executeSplit( readRunner, PTRANSFORM_ID, + DEFAULT_BUNDLE_ID, index, fractionOfRemainder, bufferSize, @@ -622,6 +656,7 @@ public void testElementSplit() throws Exception { executeSplit( readRunner, PTRANSFORM_ID, + DEFAULT_BUNDLE_ID, index, fractionOfRemainder, bufferSize, @@ -687,6 +722,7 @@ public void testElementSplittingWithAllowedSplitPoints() throws Exception { executeSplit( readRunner, PTRANSFORM_ID, + DEFAULT_BUNDLE_ID, index, fractionOfRemainder, bufferSize, @@ -718,14 +754,13 @@ private static BeamFnDataReadRunner createReadRunner( String pTransformId, BeamFnDataClient dataClient) throws Exception { - String bundleId = "57"; MetricsContainerStepMap metricsContainerRegistry = new MetricsContainerStepMap(); PCollectionConsumerRegistry consumers = new PCollectionConsumerRegistry( metricsContainerRegistry, mock(ExecutionStateTracker.class)); String localOutputId = "outputPC"; - consumers.register(localOutputId, pTransformId, consumer); + consumers.register(localOutputId, pTransformId, consumer, StringUtf8Coder.of()); PTransformFunctionRegistry startFunctionRegistry = new PTransformFunctionRegistry( mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "start"); @@ -746,7 +781,7 @@ private static BeamFnDataReadRunner createReadRunner( null /* beamFnTimerClient */, pTransformId, pTransform, - Suppliers.ofInstance(bundleId)::get, + Suppliers.ofInstance(DEFAULT_BUNDLE_ID)::get, ImmutableMap.of( localOutputId, RunnerApi.PCollection.newBuilder().setCoderId(ELEMENT_CODER_SPEC_ID).build()), @@ -773,6 +808,7 @@ private static MonitoringInfo createReadIndexMonitoringInfoAt(int index) { private static ProcessBundleSplitResponse executeSplit( BeamFnDataReadRunner readRunner, String pTransformId, + String bundleId, long index, double fractionOfRemainder, long inputElements, @@ -783,6 +819,7 @@ private static ProcessBundleSplitResponse executeSplit( } ProcessBundleSplitRequest request = ProcessBundleSplitRequest.newBuilder() + .setInstructionId(bundleId) .putDesiredSplits( pTransformId, DesiredSplit.newBuilder() diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/BeamFnDataWriteRunnerTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/BeamFnDataWriteRunnerTest.java index 2c842470832c..c4d30b37ae9c 100644 --- a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/BeamFnDataWriteRunnerTest.java +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/BeamFnDataWriteRunnerTest.java @@ -18,11 +18,11 @@ package org.apache.beam.fn.harness; import static org.apache.beam.sdk.util.WindowedValue.valueInGlobalWindow; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.empty; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.junit.Assert.fail; import static org.mockito.Matchers.any; @@ -73,9 +73,6 @@ /** Tests for {@link BeamFnDataWriteRunner}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamFnDataWriteRunnerTest { private static final String ELEM_CODER_ID = "string-coder-id"; @@ -121,15 +118,13 @@ public void setUp() { public void testCreatingAndProcessingBeamFnDataWriteRunner() throws Exception { String bundleId = "57L"; - PCollectionConsumerRegistry consumers = - new PCollectionConsumerRegistry( - mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class)); + MetricsContainerStepMap metricMap = new MetricsContainerStepMap(); + ExecutionStateTracker tracker = mock(ExecutionStateTracker.class); + PCollectionConsumerRegistry consumers = new PCollectionConsumerRegistry(metricMap, tracker); PTransformFunctionRegistry startFunctionRegistry = - new PTransformFunctionRegistry( - mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "start"); + new PTransformFunctionRegistry(metricMap, tracker, "start"); PTransformFunctionRegistry finishFunctionRegistry = - new PTransformFunctionRegistry( - mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "finish"); + new PTransformFunctionRegistry(metricMap, tracker, "finish"); List teardownFunctions = new ArrayList<>(); String localInputId = "inputPC"; diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/CombineRunnersTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/CombineRunnersTest.java index 60cae35eaf1a..cce56606cda6 100644 --- a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/CombineRunnersTest.java +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/CombineRunnersTest.java @@ -18,20 +18,23 @@ package org.apache.beam.fn.harness; import static org.apache.beam.sdk.util.WindowedValue.valueInGlobalWindow; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.empty; import static org.hamcrest.core.IsEqual.equalTo; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.mock; import java.util.ArrayDeque; import java.util.Arrays; import java.util.Collections; import java.util.Deque; +import java.util.HashMap; +import java.util.Map; import org.apache.beam.fn.harness.data.PCollectionConsumerRegistry; import org.apache.beam.fn.harness.data.PTransformFunctionRegistry; import org.apache.beam.model.pipeline.v1.RunnerApi; +import org.apache.beam.runners.core.construction.ModelCoders; import org.apache.beam.runners.core.construction.PipelineTranslation; import org.apache.beam.runners.core.construction.SdkComponents; import org.apache.beam.runners.core.metrics.ExecutionStateTracker; @@ -56,9 +59,6 @@ /** Tests for {@link CombineRunners}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class CombineRunnersTest { // CombineFn that converts strings to ints and sums them up to an accumulator, and negates the // value of the accumulator when extracting outputs. 
These operations are chosen to avoid @@ -134,8 +134,8 @@ public void testPrecombine() throws Exception { consumers.register( Iterables.getOnlyElement(pTransform.getOutputsMap().values()), TEST_COMBINE_ID, - (FnDataReceiver) - (FnDataReceiver>>) mainOutputValues::add); + (FnDataReceiver) (FnDataReceiver>>) mainOutputValues::add, + KvCoder.of(StringUtf8Coder.of(), BigEndianIntegerCoder.of())); PTransformFunctionRegistry startFunctionRegistry = new PTransformFunctionRegistry( @@ -213,9 +213,8 @@ public void testMergeAccumulators() throws Exception { consumers.register( Iterables.getOnlyElement(pTransform.getOutputsMap().values()), TEST_COMBINE_ID, - (FnDataReceiver) - (FnDataReceiver>>) mainOutputValues::add); - + (FnDataReceiver) (FnDataReceiver>>) mainOutputValues::add, + KvCoder.of(StringUtf8Coder.of(), BigEndianIntegerCoder.of())); PTransformFunctionRegistry startFunctionRegistry = new PTransformFunctionRegistry( mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "start"); @@ -223,6 +222,31 @@ public void testMergeAccumulators() throws Exception { new PTransformFunctionRegistry( mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "finish"); + // Create coder map for size estimation + RunnerApi.PCollection pCollection = + RunnerApi.PCollection.newBuilder() + .setUniqueName(inputPCollectionId) + .setCoderId("coder-id") + .build(); + Map pCollectionMap = + new HashMap<>(pProto.getComponents().getPcollectionsMap()); + pCollectionMap.put(inputPCollectionId, pCollection); + Map coderMap = new HashMap<>(pProto.getComponents().getCodersMap()); + coderMap.put( + "coder-id", + RunnerApi.Coder.newBuilder() + .setSpec(RunnerApi.FunctionSpec.newBuilder().setUrn(ModelCoders.KV_CODER_URN).build()) + .addComponentCoderIds("StringUtf8Coder") + .addComponentCoderIds("coder-id-iterable") + .build()); + coderMap.put( + "coder-id-iterable", + RunnerApi.Coder.newBuilder() + .setSpec( + RunnerApi.FunctionSpec.newBuilder().setUrn(ModelCoders.ITERABLE_CODER_URN).build()) + .addComponentCoderIds("BigEndianIntegerCoder") + .build()); + // Create runner. 
MapFnRunners.forValueMapFnFactory(CombineRunners::createMergeAccumulatorsMapFunction) .createRunnerForPTransform( @@ -233,8 +257,8 @@ public void testMergeAccumulators() throws Exception { TEST_COMBINE_ID, pTransform, null, - Collections.emptyMap(), - Collections.emptyMap(), + pCollectionMap, + coderMap, Collections.emptyMap(), consumers, startFunctionRegistry, @@ -280,8 +304,8 @@ public void testExtractOutputs() throws Exception { consumers.register( Iterables.getOnlyElement(pTransform.getOutputsMap().values()), TEST_COMBINE_ID, - (FnDataReceiver) - (FnDataReceiver>>) mainOutputValues::add); + (FnDataReceiver) (FnDataReceiver>>) mainOutputValues::add, + KvCoder.of(StringUtf8Coder.of(), BigEndianIntegerCoder.of())); PTransformFunctionRegistry startFunctionRegistry = new PTransformFunctionRegistry( @@ -290,6 +314,24 @@ public void testExtractOutputs() throws Exception { new PTransformFunctionRegistry( mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "finish"); + // Create coder map for size estimation + RunnerApi.PCollection pCollection = + RunnerApi.PCollection.newBuilder() + .setUniqueName(inputPCollectionId) + .setCoderId("coder-id") + .build(); + Map pCollectionMap = + new HashMap<>(pProto.getComponents().getPcollectionsMap()); + pCollectionMap.put(inputPCollectionId, pCollection); + Map coderMap = new HashMap<>(pProto.getComponents().getCodersMap()); + coderMap.put( + "coder-id", + RunnerApi.Coder.newBuilder() + .setSpec(RunnerApi.FunctionSpec.newBuilder().setUrn(ModelCoders.KV_CODER_URN).build()) + .addComponentCoderIds("StringUtf8Coder") + .addComponentCoderIds("BigEndianIntegerCoder") + .build()); + // Create runner. MapFnRunners.forValueMapFnFactory(CombineRunners::createExtractOutputsMapFunction) .createRunnerForPTransform( @@ -300,8 +342,8 @@ public void testExtractOutputs() throws Exception { TEST_COMBINE_ID, pTransform, null, - Collections.emptyMap(), - Collections.emptyMap(), + pCollectionMap, + coderMap, Collections.emptyMap(), consumers, startFunctionRegistry, @@ -347,8 +389,8 @@ public void testConvertToAccumulators() throws Exception { consumers.register( Iterables.getOnlyElement(pTransform.getOutputsMap().values()), TEST_COMBINE_ID, - (FnDataReceiver) - (FnDataReceiver>>) mainOutputValues::add); + (FnDataReceiver) (FnDataReceiver>>) mainOutputValues::add, + KvCoder.of(StringUtf8Coder.of(), BigEndianIntegerCoder.of())); PTransformFunctionRegistry startFunctionRegistry = new PTransformFunctionRegistry( @@ -367,8 +409,8 @@ public void testConvertToAccumulators() throws Exception { TEST_COMBINE_ID, pTransform, null, - Collections.emptyMap(), - Collections.emptyMap(), + pProto.getComponents().getPcollectionsMap(), + pProto.getComponents().getCodersMap(), Collections.emptyMap(), consumers, startFunctionRegistry, @@ -413,8 +455,8 @@ public void testCombineGroupedValues() throws Exception { consumers.register( Iterables.getOnlyElement(pTransform.getOutputsMap().values()), TEST_COMBINE_ID, - (FnDataReceiver) - (FnDataReceiver>>) mainOutputValues::add); + (FnDataReceiver) (FnDataReceiver>>) mainOutputValues::add, + KvCoder.of(StringUtf8Coder.of(), BigEndianIntegerCoder.of())); PTransformFunctionRegistry startFunctionRegistry = new PTransformFunctionRegistry( @@ -423,6 +465,31 @@ public void testCombineGroupedValues() throws Exception { new PTransformFunctionRegistry( mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "finish"); + // Create coder map for size estimation + RunnerApi.PCollection pCollection = + 
RunnerApi.PCollection.newBuilder() + .setUniqueName(inputPCollectionId) + .setCoderId("coder-id") + .build(); + Map pCollectionMap = + new HashMap<>(pProto.getComponents().getPcollectionsMap()); + pCollectionMap.put(inputPCollectionId, pCollection); + Map coderMap = new HashMap<>(pProto.getComponents().getCodersMap()); + coderMap.put( + "coder-id", + RunnerApi.Coder.newBuilder() + .setSpec(RunnerApi.FunctionSpec.newBuilder().setUrn(ModelCoders.KV_CODER_URN).build()) + .addComponentCoderIds("StringUtf8Coder") + .addComponentCoderIds("coder-id-iterable") + .build()); + coderMap.put( + "coder-id-iterable", + RunnerApi.Coder.newBuilder() + .setSpec( + RunnerApi.FunctionSpec.newBuilder().setUrn(ModelCoders.ITERABLE_CODER_URN).build()) + .addComponentCoderIds("StringUtf8Coder") + .build()); + // Create runner. MapFnRunners.forValueMapFnFactory(CombineRunners::createCombineGroupedValuesMapFunction) .createRunnerForPTransform( @@ -433,8 +500,8 @@ public void testCombineGroupedValues() throws Exception { TEST_COMBINE_ID, pTransform, null, - Collections.emptyMap(), - Collections.emptyMap(), + pCollectionMap, + coderMap, Collections.emptyMap(), consumers, startFunctionRegistry, diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/ExternalWorkerServiceTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/ExternalWorkerServiceTest.java new file mode 100644 index 000000000000..49a56a1fbe34 --- /dev/null +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/ExternalWorkerServiceTest.java @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.fn.harness; + +import static org.mockito.ArgumentMatchers.any; +import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.verify; + +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StartWorkerRequest; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StartWorkerResponse; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StopWorkerRequest; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StopWorkerResponse; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.options.PipelineOptionsFactory; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +@RunWith(JUnit4.class) +public class ExternalWorkerServiceTest { + + @Test + public void startWorker() { + PipelineOptions options = PipelineOptionsFactory.create(); + StartWorkerRequest request = StartWorkerRequest.getDefaultInstance(); + StreamObserver responseObserver = mock(StreamObserver.class); + ExternalWorkerService service = new ExternalWorkerService(options); + service.startWorker(request, responseObserver); + + verify(responseObserver).onNext(any(StartWorkerResponse.class)); + verify(responseObserver).onCompleted(); + } + + @Test + public void stopWorker() { + PipelineOptions options = PipelineOptionsFactory.create(); + StopWorkerRequest request = StopWorkerRequest.getDefaultInstance(); + StreamObserver responseObserver = mock(StreamObserver.class); + ExternalWorkerService service = new ExternalWorkerService(options); + service.stopWorker(request, responseObserver); + + verify(responseObserver).onNext(any(StopWorkerResponse.class)); + verify(responseObserver).onCompleted(); + } +} diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FlattenRunnerTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FlattenRunnerTest.java index 285c5f532228..a012515e043a 100644 --- a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FlattenRunnerTest.java +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FlattenRunnerTest.java @@ -18,20 +18,25 @@ package org.apache.beam.fn.harness; import static org.apache.beam.sdk.util.WindowedValue.valueInGlobalWindow; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.hasSize; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.mock; import java.util.ArrayList; import java.util.Collections; +import java.util.HashMap; import java.util.List; +import java.util.Map; import org.apache.beam.fn.harness.data.PCollectionConsumerRegistry; import org.apache.beam.model.pipeline.v1.RunnerApi; +import org.apache.beam.model.pipeline.v1.RunnerApi.PCollection; +import org.apache.beam.runners.core.construction.CoderTranslation; import org.apache.beam.runners.core.construction.PTransformTranslation; import org.apache.beam.runners.core.metrics.ExecutionStateTracker; import org.apache.beam.runners.core.metrics.MetricsContainerStepMap; +import org.apache.beam.sdk.coders.StringUtf8Coder; import org.apache.beam.sdk.fn.data.FnDataReceiver; import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.util.WindowedValue; @@ -42,9 +47,6 @@ /** Tests for {@link FlattenRunner}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FlattenRunnerTest { /** @@ -78,8 +80,28 @@ public void testCreatingAndProcessingDoFlatten() throws Exception { consumers.register( "mainOutputTarget", pTransformId, - (FnDataReceiver) (FnDataReceiver>) mainOutputValues::add); - + (FnDataReceiver) (FnDataReceiver>) mainOutputValues::add, + StringUtf8Coder.of()); + Map pCollectionMap = new HashMap<>(); + pCollectionMap.put( + "inputATarget", + RunnerApi.PCollection.newBuilder() + .setUniqueName("inputATarget") + .setCoderId("coder-id") + .build()); + pCollectionMap.put( + "inputBTarget", + RunnerApi.PCollection.newBuilder() + .setUniqueName("inputBTarget") + .setCoderId("coder-id") + .build()); + pCollectionMap.put( + "inputCTarget", + RunnerApi.PCollection.newBuilder() + .setUniqueName("inputCTarget") + .setCoderId("coder-id") + .build()); + RunnerApi.Coder coder = CoderTranslation.toProto(StringUtf8Coder.of()).getCoder(); new FlattenRunner.Factory<>() .createRunnerForPTransform( PipelineOptionsFactory.create(), @@ -89,8 +111,8 @@ public void testCreatingAndProcessingDoFlatten() throws Exception { pTransformId, pTransform, Suppliers.ofInstance("57L")::get, - Collections.emptyMap(), - Collections.emptyMap(), + pCollectionMap, + Collections.singletonMap("coder-id", coder), Collections.emptyMap(), consumers, null /* startFunctionRegistry */, @@ -129,7 +151,6 @@ public void testCreatingAndProcessingDoFlatten() throws Exception { public void testFlattenWithDuplicateInputCollectionProducesMultipleOutputs() throws Exception { String pTransformId = "pTransformId"; String mainOutputId = "101"; - RunnerApi.FunctionSpec functionSpec = RunnerApi.FunctionSpec.newBuilder() .setUrn(PTransformTranslation.FLATTEN_TRANSFORM_URN) @@ -150,8 +171,15 @@ public void testFlattenWithDuplicateInputCollectionProducesMultipleOutputs() thr consumers.register( "mainOutputTarget", pTransformId, - (FnDataReceiver) (FnDataReceiver>) mainOutputValues::add); + (FnDataReceiver) (FnDataReceiver>) mainOutputValues::add, + StringUtf8Coder.of()); + RunnerApi.PCollection pCollection = + RunnerApi.PCollection.newBuilder() + .setUniqueName("inputATarget") + .setCoderId("coder-id") + .build(); + RunnerApi.Coder coder = CoderTranslation.toProto(StringUtf8Coder.of()).getCoder(); new FlattenRunner.Factory<>() .createRunnerForPTransform( PipelineOptionsFactory.create(), @@ -161,8 +189,8 @@ public void testFlattenWithDuplicateInputCollectionProducesMultipleOutputs() thr pTransformId, pTransform, Suppliers.ofInstance("57L")::get, - Collections.emptyMap(), - Collections.emptyMap(), + Collections.singletonMap("inputATarget", pCollection), + Collections.singletonMap("coder-id", coder), Collections.emptyMap(), consumers, null /* startFunctionRegistry */, diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnApiDoFnRunnerTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnApiDoFnRunnerTest.java index 055f830454ae..0523008829f7 100644 --- a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnApiDoFnRunnerTest.java +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnApiDoFnRunnerTest.java @@ -21,6 +21,7 @@ import static org.apache.beam.sdk.util.WindowedValue.timestampedValueInGlobalWindow; import static org.apache.beam.sdk.util.WindowedValue.valueInGlobalWindow; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static 
org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.empty; @@ -29,7 +30,6 @@ import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertNull; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.junit.Assert.fail; import static org.mockito.Mockito.mock; @@ -74,11 +74,13 @@ import org.apache.beam.runners.core.construction.SdkComponents; import org.apache.beam.runners.core.construction.graph.ProtoOverrides; import org.apache.beam.runners.core.construction.graph.SplittableParDoExpander; +import org.apache.beam.runners.core.metrics.DistributionData; import org.apache.beam.runners.core.metrics.ExecutionStateTracker; import org.apache.beam.runners.core.metrics.MetricUpdates.MetricUpdate; import org.apache.beam.runners.core.metrics.MetricsContainerImpl; import org.apache.beam.runners.core.metrics.MetricsContainerStepMap; import org.apache.beam.runners.core.metrics.MonitoringInfoConstants; +import org.apache.beam.runners.core.metrics.MonitoringInfoConstants.Urns; import org.apache.beam.runners.core.metrics.SimpleMonitoringInfoBuilder; import org.apache.beam.sdk.Pipeline; import org.apache.beam.sdk.coders.Coder; @@ -137,8 +139,8 @@ import org.apache.beam.sdk.values.PCollectionView; import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.sdk.values.TupleTagList; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.util.Durations; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.util.Durations; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Suppliers; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; @@ -147,6 +149,7 @@ import org.joda.time.Duration; import org.joda.time.Instant; import org.junit.Before; +import org.junit.Ignore; import org.junit.Rule; import org.junit.Test; import org.junit.experimental.runners.Enclosed; @@ -160,7 +163,6 @@ @RunWith(Enclosed.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class FnApiDoFnRunnerTest implements Serializable { @@ -263,10 +265,12 @@ public void testUsingUserState() throws Exception { PCollectionConsumerRegistry consumers = new PCollectionConsumerRegistry( metricsContainerRegistry, mock(ExecutionStateTracker.class)); + consumers.register( outputPCollectionId, TEST_TRANSFORM_ID, - (FnDataReceiver) (FnDataReceiver>) mainOutputValues::add); + (FnDataReceiver) (FnDataReceiver>) mainOutputValues::add, + StringUtf8Coder.of()); PTransformFunctionRegistry startFunctionRegistry = new PTransformFunctionRegistry( mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "start"); @@ -433,7 +437,7 @@ public void testProcessElementWithSideInputsAndOutputs() throws Exception { iterableSideInputKey(iterableSideInputView.getTagInternal().getId()), encode("iterableValue1", "iterableValue2", "iterableValue3")); - FakeBeamFnStateClient fakeClient = new FakeBeamFnStateClient(stateData); + FakeBeamFnStateClient fakeClient = new FakeBeamFnStateClient(stateData, 1000); 
List> mainOutputValues = new ArrayList<>(); List> additionalOutputValues = new ArrayList<>(); @@ -444,11 +448,13 @@ public void testProcessElementWithSideInputsAndOutputs() throws Exception { consumers.register( outputPCollectionId, TEST_TRANSFORM_ID, - (FnDataReceiver) (FnDataReceiver>) mainOutputValues::add); + (FnDataReceiver) (FnDataReceiver>) mainOutputValues::add, + StringUtf8Coder.of()); consumers.register( additionalPCollectionId, TEST_TRANSFORM_ID, - (FnDataReceiver) (FnDataReceiver>) additionalOutputValues::add); + (FnDataReceiver) (FnDataReceiver>) additionalOutputValues::add, + StringUtf8Coder.of()); PTransformFunctionRegistry startFunctionRegistry = new PTransformFunctionRegistry( mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "start"); @@ -570,11 +576,13 @@ public void testProcessElementWithNonWindowObservingOptimization() throws Except consumers.register( outputPCollectionId, TEST_TRANSFORM_ID, - (FnDataReceiver) (FnDataReceiver>) mainOutputValues::add); + (FnDataReceiver) (FnDataReceiver>) mainOutputValues::add, + StringUtf8Coder.of()); consumers.register( additionalPCollectionId, TEST_TRANSFORM_ID, - (FnDataReceiver) (FnDataReceiver>) additionalOutputValues::add); + (FnDataReceiver) (FnDataReceiver>) additionalOutputValues::add, + StringUtf8Coder.of()); PTransformFunctionRegistry startFunctionRegistry = new PTransformFunctionRegistry( mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "start"); @@ -721,7 +729,7 @@ public void testSideInputIsAccessibleForDownstreamCallers() throws Exception { iterableSideInputKey(iterableSideInputView.getTagInternal().getId(), encodedWindowB), encode("iterableValue1B", "iterableValue2B", "iterableValue3B")); - FakeBeamFnStateClient fakeClient = new FakeBeamFnStateClient(stateData); + FakeBeamFnStateClient fakeClient = new FakeBeamFnStateClient(stateData, 1000); List>> mainOutputValues = new ArrayList<>(); MetricsContainerStepMap metricsContainerRegistry = new MetricsContainerStepMap(); @@ -731,7 +739,8 @@ public void testSideInputIsAccessibleForDownstreamCallers() throws Exception { consumers.register( Iterables.getOnlyElement(pTransform.getOutputsMap().values()), TEST_TRANSFORM_ID, - (FnDataReceiver) (FnDataReceiver>>) mainOutputValues::add); + (FnDataReceiver) (FnDataReceiver>>) mainOutputValues::add, + IterableCoder.of(StringUtf8Coder.of())); PTransformFunctionRegistry startFunctionRegistry = new PTransformFunctionRegistry( mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "start"); @@ -797,6 +806,7 @@ public MetricUpdate create(String stepName, MetricName name, long value) { } @Test + @Ignore("https://issues.apache.org/jira/browse/BEAM-12230") public void testUsingMetrics() throws Exception { MetricsContainerStepMap metricsContainerRegistry = new MetricsContainerStepMap(); MetricsContainerImpl metricsContainer = metricsContainerRegistry.getUnboundContainer(); @@ -851,7 +861,8 @@ public void testUsingMetrics() throws Exception { consumers.register( Iterables.getOnlyElement(pTransform.getOutputsMap().values()), TEST_TRANSFORM_ID, - (FnDataReceiver) (FnDataReceiver>>) mainOutputValues::add); + (FnDataReceiver) (FnDataReceiver>>) mainOutputValues::add, + IterableCoder.of(StringUtf8Coder.of())); PTransformFunctionRegistry startFunctionRegistry = new PTransformFunctionRegistry( mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "start"); @@ -931,6 +942,21 @@ public void testUsingMetrics() throws Exception { builder.setInt64SumValue(2); 
expected.add(builder.build()); + builder = new SimpleMonitoringInfoBuilder(); + builder.setUrn(MonitoringInfoConstants.Urns.SAMPLED_BYTE_SIZE); + builder.setLabel( + MonitoringInfoConstants.Labels.PCOLLECTION, "Window.Into()/Window.Assign.out"); + builder.setInt64DistributionValue(DistributionData.create(4, 2, 2, 2)); + expected.add(builder.build()); + + builder = new SimpleMonitoringInfoBuilder(); + builder.setUrn(Urns.SAMPLED_BYTE_SIZE); + builder.setLabel( + MonitoringInfoConstants.Labels.PCOLLECTION, + "pTransformId/ParMultiDo(TestSideInputIsAccessibleForDownstreamCallers).output"); + builder.setInt64DistributionValue(DistributionData.create(10, 2, 5, 5)); + expected.add(builder.build()); + closeable.close(); List result = new ArrayList(); for (MonitoringInfo mi : metricsContainerRegistry.getMonitoringInfos()) { @@ -982,7 +1008,8 @@ public void testTimers() throws Exception { consumers.register( outputPCollectionId, TEST_TRANSFORM_ID, - (FnDataReceiver) (FnDataReceiver>) mainOutputValues::add); + (FnDataReceiver) (FnDataReceiver>) mainOutputValues::add, + StringUtf8Coder.of()); PTransformFunctionRegistry startFunctionRegistry = new PTransformFunctionRegistry( @@ -1037,132 +1064,119 @@ public void testTimers() throws Exception { eventTimer, timerInGlobalWindow("A", new Instant(1400L), new Instant(2400L))); fakeTimerClient.sendTimer( eventTimer, timerInGlobalWindow("B", new Instant(1500L), new Instant(2500L))); + // This will be ignored since there are earlier timers, and the earlier timer will eventually + // push + // the timer past 1600L. fakeTimerClient.sendTimer( eventTimer, timerInGlobalWindow("A", new Instant(1600L), new Instant(2600L))); + // This will be ignored since the timer was already cleared in this bundle. fakeTimerClient.sendTimer( processingTimer, timerInGlobalWindow("X", new Instant(1700L), new Instant(2700L))); fakeTimerClient.sendTimer( processingTimer, timerInGlobalWindow("C", new Instant(1800L), new Instant(2800L))); fakeTimerClient.sendTimer( - processingTimer, timerInGlobalWindow("B", new Instant(1900L), new Instant(2900L))); + processingTimer, timerInGlobalWindow("B", new Instant(1500), new Instant(10032))); fakeTimerClient.sendTimer( eventFamilyTimer, - dynamicTimerInGlobalWindow("B", "event-timer2", new Instant(2000L), new Instant(3000L))); + dynamicTimerInGlobalWindow("B", "event-timer2", new Instant(2000L), new Instant(1650L))); fakeTimerClient.sendTimer( processingFamilyTimer, dynamicTimerInGlobalWindow( "Y", "processing-timer2", new Instant(2100L), new Instant(3100L))); + assertThat( mainOutputValues, contains( - timestampedValueInGlobalWindow("mainX[X0]", new Instant(1000L)), - timestampedValueInGlobalWindow("mainY[]", new Instant(1100L)), - timestampedValueInGlobalWindow("mainX[X0, X1]", new Instant(1200L)), - timestampedValueInGlobalWindow("mainY[Y1]", new Instant(1300L)), - timestampedValueInGlobalWindow("event[A0]", new Instant(1400L)), - timestampedValueInGlobalWindow("event[]", new Instant(1500L)), - timestampedValueInGlobalWindow("event[A0, event]", new Instant(1600L)), - timestampedValueInGlobalWindow("processing[X0, X1, X2]", new Instant(1700L)), - timestampedValueInGlobalWindow("processing[C0]", new Instant(1800L)), - timestampedValueInGlobalWindow("processing[event]", new Instant(1900L)), - timestampedValueInGlobalWindow("event-family[event, processing]", new Instant(2000L)), - timestampedValueInGlobalWindow("processing-family[Y1, Y2]", new Instant(2100L)))); + timestampedValueInGlobalWindow("key:X mainX[X0]", new Instant(1000L)), +
timestampedValueInGlobalWindow("key:Y mainY[]", new Instant(1100L)), + timestampedValueInGlobalWindow("key:X mainX[X0, X1]", new Instant(1200L)), + timestampedValueInGlobalWindow("key:Y mainY[Y1]", new Instant(1300L)), + timestampedValueInGlobalWindow("key:A event[A0]", new Instant(1400L)), + timestampedValueInGlobalWindow("key:B event[]", new Instant(1500L)), + timestampedValueInGlobalWindow("key:A event[A0, event]", new Instant(1400L)), + timestampedValueInGlobalWindow("key:A event[A0, event, event]", new Instant(1400L)), + timestampedValueInGlobalWindow( + "key:A event[A0, event, event, event]", new Instant(1400L)), + timestampedValueInGlobalWindow( + "key:A event[A0, event, event, event, event]", new Instant(1400L)), + timestampedValueInGlobalWindow( + "key:A event[A0, event, event, event, event, event]", new Instant(1400L)), + timestampedValueInGlobalWindow( + "key:A event[A0, event, event, event, event, event, event]", new Instant(1400L)), + timestampedValueInGlobalWindow("key:C processing[C0]", new Instant(1800L)), + timestampedValueInGlobalWindow("key:B processing[event]", new Instant(1500L)), + timestampedValueInGlobalWindow("key:B event[event, processing]", new Instant(1500)), + timestampedValueInGlobalWindow( + "key:B event[event, processing, event]", new Instant(1500)), + timestampedValueInGlobalWindow( + "key:B event[event, processing, event, event]", new Instant(1500)), + timestampedValueInGlobalWindow( + "key:B event-family[event, processing, event, event, event]", new Instant(2000L)), + timestampedValueInGlobalWindow( + "key:Y processing-family[Y1, Y2]", new Instant(2100L)))); + + mainOutputValues.clear(); + + assertFalse(fakeTimerClient.isOutboundClosed(eventTimer)); + assertFalse(fakeTimerClient.isOutboundClosed(processingTimer)); + assertFalse(fakeTimerClient.isOutboundClosed(eventFamilyTimer)); + assertFalse(fakeTimerClient.isOutboundClosed(processingFamilyTimer)); + fakeTimerClient.closeInbound(eventTimer); + fakeTimerClient.closeInbound(processingTimer); + fakeTimerClient.closeInbound(eventFamilyTimer); + fakeTimerClient.closeInbound(processingFamilyTimer); + + // Timers will get delivered to the client when finishBundle is called. 
+ Iterables.getOnlyElement(finishFunctionRegistry.getFunctions()).run(); + assertThat( fakeTimerClient.getTimers(eventTimer), contains( - timerInGlobalWindow("X", new Instant(1000L), new Instant(1001L)), - timerInGlobalWindow("Y", new Instant(1100L), new Instant(1101L)), - timerInGlobalWindow("X", new Instant(1200L), new Instant(1201L)), - timerInGlobalWindow("Y", new Instant(1300L), new Instant(1301L)), - timerInGlobalWindow("A", new Instant(1400L), new Instant(2411L)), - timerInGlobalWindow("B", new Instant(1500L), new Instant(2511L)), - timerInGlobalWindow("A", new Instant(1600L), new Instant(2611L)), - timerInGlobalWindow("X", new Instant(1700L), new Instant(1721L)), - timerInGlobalWindow("C", new Instant(1800L), new Instant(1821L)), - timerInGlobalWindow("B", new Instant(1900L), new Instant(1921L)), - timerInGlobalWindow("B", new Instant(2000L), new Instant(2031L)), - timerInGlobalWindow("Y", new Instant(2100L), new Instant(2141L)))); + clearedTimerInGlobalWindow("X"), + timerInGlobalWindow("Y", new Instant(2100L), new Instant(2181L)), + timerInGlobalWindow("A", new Instant(1400L), new Instant(2617L)), + timerInGlobalWindow("B", new Instant(2000L), new Instant(2071L)), + timerInGlobalWindow("C", new Instant(1800L), new Instant(1861L)))); assertThat( fakeTimerClient.getTimers(processingTimer), contains( - timerInGlobalWindow("X", new Instant(1000L), new Instant(10002L)), - timerInGlobalWindow("Y", new Instant(1100L), new Instant(10002L)), - timerInGlobalWindow("X", new Instant(1200L), new Instant(10002L)), - timerInGlobalWindow("Y", new Instant(1300L), new Instant(10002L)), - timerInGlobalWindow("A", new Instant(1400L), new Instant(10012L)), - timerInGlobalWindow("B", new Instant(1500L), new Instant(10012L)), - timerInGlobalWindow("A", new Instant(1600L), new Instant(10012L)), - timerInGlobalWindow("X", new Instant(1700L), new Instant(10022L)), - timerInGlobalWindow("C", new Instant(1800L), new Instant(10022L)), - timerInGlobalWindow("B", new Instant(1900L), new Instant(10022L)), - timerInGlobalWindow("B", new Instant(2000L), new Instant(10032L)), - timerInGlobalWindow("Y", new Instant(2100L), new Instant(10042L)))); + clearedTimerInGlobalWindow("X"), + timerInGlobalWindow("Y", new Instant(2100L), new Instant(10082L)), + timerInGlobalWindow("A", new Instant(1400L), new Instant(10032L)), + timerInGlobalWindow("B", new Instant(2000L), new Instant(10072L)), + timerInGlobalWindow("C", new Instant(1800L), new Instant(10062L)))); + assertThat( fakeTimerClient.getTimers(eventFamilyTimer), - contains( - dynamicTimerInGlobalWindow( - "X", "event-timer1", new Instant(1000L), new Instant(1003L)), - dynamicTimerInGlobalWindow( - "Y", "event-timer1", new Instant(1100L), new Instant(1103L)), + containsInAnyOrder( dynamicTimerInGlobalWindow( "X", "event-timer1", new Instant(1200L), new Instant(1203L)), + clearedTimerInGlobalWindow("X", "to-delete-event"), + clearedTimerInGlobalWindow("Y", "to-delete-event"), dynamicTimerInGlobalWindow( - "Y", "event-timer1", new Instant(1300L), new Instant(1303L)), - dynamicTimerInGlobalWindow( - "A", "event-timer1", new Instant(1400L), new Instant(2413L)), - dynamicTimerInGlobalWindow( - "B", "event-timer1", new Instant(1500L), new Instant(2513L)), - dynamicTimerInGlobalWindow( - "A", "event-timer1", new Instant(1600L), new Instant(2613L)), + "Y", "event-timer1", new Instant(2100L), new Instant(2183L)), dynamicTimerInGlobalWindow( - "X", "event-timer1", new Instant(1700L), new Instant(1723L)), + "A", "event-timer1", new Instant(1400L), new Instant(2619L)), 
dynamicTimerInGlobalWindow( - "C", "event-timer1", new Instant(1800L), new Instant(1823L)), + "B", "event-timer1", new Instant(2000L), new Instant(2073L)), dynamicTimerInGlobalWindow( - "B", "event-timer1", new Instant(1900L), new Instant(1923L)), - dynamicTimerInGlobalWindow( - "B", "event-timer1", new Instant(2000L), new Instant(2033L)), - dynamicTimerInGlobalWindow( - "Y", "event-timer1", new Instant(2100L), new Instant(2143L)))); + "C", "event-timer1", new Instant(1800L), new Instant(1863L)))); assertThat( fakeTimerClient.getTimers(processingFamilyTimer), - contains( - dynamicTimerInGlobalWindow( - "X", "processing-timer1", new Instant(1000L), new Instant(10004L)), - dynamicTimerInGlobalWindow( - "Y", "processing-timer1", new Instant(1100L), new Instant(10004L)), + containsInAnyOrder( dynamicTimerInGlobalWindow( "X", "processing-timer1", new Instant(1200L), new Instant(10004L)), + clearedTimerInGlobalWindow("X", "to-delete-processing"), dynamicTimerInGlobalWindow( - "Y", "processing-timer1", new Instant(1300L), new Instant(10004L)), - dynamicTimerInGlobalWindow( - "A", "processing-timer1", new Instant(1400L), new Instant(10014L)), + "Y", "processing-timer1", new Instant(2100L), new Instant(10084L)), + clearedTimerInGlobalWindow("Y", "to-delete-processing"), dynamicTimerInGlobalWindow( - "B", "processing-timer1", new Instant(1500L), new Instant(10014L)), + "A", "processing-timer1", new Instant(1400L), new Instant(10034L)), dynamicTimerInGlobalWindow( - "A", "processing-timer1", new Instant(1600L), new Instant(10014L)), + "B", "processing-timer1", new Instant(2000L), new Instant(10074L)), dynamicTimerInGlobalWindow( - "X", "processing-timer1", new Instant(1700L), new Instant(10024L)), - dynamicTimerInGlobalWindow( - "C", "processing-timer1", new Instant(1800L), new Instant(10024L)), - dynamicTimerInGlobalWindow( - "B", "processing-timer1", new Instant(1900L), new Instant(10024L)), - dynamicTimerInGlobalWindow( - "B", "processing-timer1", new Instant(2000L), new Instant(10034L)), - dynamicTimerInGlobalWindow( - "Y", "processing-timer1", new Instant(2100L), new Instant(10044L)))); - mainOutputValues.clear(); + "C", "processing-timer1", new Instant(1800L), new Instant(10064L)))); - assertFalse(fakeTimerClient.isOutboundClosed(eventTimer)); - assertFalse(fakeTimerClient.isOutboundClosed(processingTimer)); - assertFalse(fakeTimerClient.isOutboundClosed(eventFamilyTimer)); - assertFalse(fakeTimerClient.isOutboundClosed(processingFamilyTimer)); - fakeTimerClient.closeInbound(eventTimer); - fakeTimerClient.closeInbound(processingTimer); - fakeTimerClient.closeInbound(eventFamilyTimer); - fakeTimerClient.closeInbound(processingFamilyTimer); - - Iterables.getOnlyElement(finishFunctionRegistry.getFunctions()).run(); assertThat(mainOutputValues, empty()); assertTrue(fakeTimerClient.isOutboundClosed(eventTimer)); @@ -1175,10 +1189,14 @@ public void testTimers() throws Exception { assertEquals( ImmutableMap.builder() - .put(bagUserStateKey("bag", "X"), encode("X0", "X1", "X2", "processing")) + .put(bagUserStateKey("bag", "X"), encode("X0", "X1", "X2")) .put(bagUserStateKey("bag", "Y"), encode("Y1", "Y2", "processing-family")) - .put(bagUserStateKey("bag", "A"), encode("A0", "event", "event")) - .put(bagUserStateKey("bag", "B"), encode("event", "processing", "event-family")) + .put( + bagUserStateKey("bag", "A"), + encode("A0", "event", "event", "event", "event", "event", "event", "event")) + .put( + bagUserStateKey("bag", "B"), + encode("event", "processing", "event", "event", "event", 
"event-family")) .put(bagUserStateKey("bag", "C"), encode("C0", "processing")) .build(), fakeStateClient.getData()); @@ -1189,6 +1207,17 @@ private org.apache.beam.runners.core.construction.Timer timerInGlobalWind return dynamicTimerInGlobalWindow(userKey, "", holdTimestamp, fireTimestamp); } + private org.apache.beam.runners.core.construction.Timer clearedTimerInGlobalWindow( + K userKey) { + return clearedTimerInGlobalWindow(userKey, ""); + } + + private org.apache.beam.runners.core.construction.Timer clearedTimerInGlobalWindow( + K userKey, String dynamicTimerTag) { + return org.apache.beam.runners.core.construction.Timer.cleared( + userKey, dynamicTimerTag, Collections.singletonList(GlobalWindow.INSTANCE)); + } + private org.apache.beam.runners.core.construction.Timer dynamicTimerInGlobalWindow( K userKey, String dynamicTimerTag, Instant holdTimestamp, Instant fireTimestamp) { return org.apache.beam.runners.core.construction.Timer.of( @@ -1234,106 +1263,122 @@ public void processElement( @TimerId("processing") Timer processingTimeTimer, @TimerFamily("event-family") TimerMap eventTimerFamily, @TimerFamily("processing-family") TimerMap processingTimerFamily) { - context.output("main" + context.element().getKey() + Iterables.toString(bagState.read())); + context.output( + "key:" + + context.element().getKey() + + " main" + + context.element().getKey() + + Iterables.toString(bagState.read())); bagState.add(context.element().getValue()); eventTimeTimer.withOutputTimestamp(context.timestamp()).set(context.timestamp().plus(1L)); + eventTimeTimer.clear(); processingTimeTimer.offset(Duration.millis(2L)); processingTimeTimer.setRelative(); + processingTimeTimer.clear(); eventTimerFamily .get("event-timer1") .withOutputTimestamp(context.timestamp()) .set(context.timestamp().plus(3L)); + eventTimerFamily.get("to-delete-event").set(context.timestamp().plus(5L)); + eventTimerFamily.get("to-delete-event").clear(); processingTimerFamily.get("processing-timer1").offset(Duration.millis(4L)).setRelative(); + processingTimerFamily.get("to-delete-processing").offset(Duration.millis(4L)).setRelative(); + processingTimerFamily.get("to-delete-processing").clear(); } @OnTimer("event") public void eventTimer( OnTimerContext context, + @Key String key, @StateId("bag") BagState bagState, @TimerId("event") Timer eventTimeTimer, @TimerId("processing") Timer processingTimeTimer, @TimerFamily("event-family") TimerMap eventTimerFamily, @TimerFamily("processing-family") TimerMap processingTimerFamily) { - context.output("event" + Iterables.toString(bagState.read())); + context.output("key:" + key + " event" + Iterables.toString(bagState.read())); bagState.add("event"); eventTimeTimer .withOutputTimestamp(context.timestamp()) - .set(context.fireTimestamp().plus(11L)); - processingTimeTimer.offset(Duration.millis(12L)); + .set(context.fireTimestamp().plus(31L)); + processingTimeTimer.offset(Duration.millis(32L)); processingTimeTimer.setRelative(); eventTimerFamily .get("event-timer1") .withOutputTimestamp(context.timestamp()) - .set(context.fireTimestamp().plus(13L)); + .set(context.fireTimestamp().plus(33L)); - processingTimerFamily.get("processing-timer1").offset(Duration.millis(14L)).setRelative(); + processingTimerFamily.get("processing-timer1").offset(Duration.millis(34L)).setRelative(); } @OnTimer("processing") public void processingTimer( OnTimerContext context, + @Key String key, @StateId("bag") BagState bagState, @TimerId("event") Timer eventTimeTimer, @TimerId("processing") Timer processingTimeTimer, 
@TimerFamily("event-family") TimerMap eventTimerFamily, @TimerFamily("processing-family") TimerMap processingTimerFamily) { - context.output("processing" + Iterables.toString(bagState.read())); + context.output("key:" + key + " processing" + Iterables.toString(bagState.read())); bagState.add("processing"); - eventTimeTimer.withOutputTimestamp(context.timestamp()).set(context.timestamp().plus(21L)); - processingTimeTimer.offset(Duration.millis(22L)); + eventTimeTimer.withOutputTimestamp(context.timestamp()).set(context.timestamp().plus(61L)); + processingTimeTimer.offset(Duration.millis(62L)); processingTimeTimer.setRelative(); eventTimerFamily .get("event-timer1") .withOutputTimestamp(context.timestamp()) - .set(context.timestamp().plus(23L)); + .set(context.timestamp().plus(63L)); - processingTimerFamily.get("processing-timer1").offset(Duration.millis(24L)).setRelative(); + processingTimerFamily.get("processing-timer1").offset(Duration.millis(64L)).setRelative(); } @OnTimerFamily("event-family") public void eventFamilyOnTimer( OnTimerContext context, + @Key String key, + @Timestamp Instant ts, @StateId("bag") BagState bagState, @TimerId("event") Timer eventTimeTimer, @TimerId("processing") Timer processingTimeTimer, @TimerFamily("event-family") TimerMap eventTimerFamily, @TimerFamily("processing-family") TimerMap processingTimerFamily) { - context.output("event-family" + Iterables.toString(bagState.read())); + context.output("key:" + key + " event-family" + Iterables.toString(bagState.read())); bagState.add("event-family"); - eventTimeTimer.withOutputTimestamp(context.timestamp()).set(context.timestamp().plus(31L)); - processingTimeTimer.offset(Duration.millis(32L)); + eventTimeTimer.withOutputTimestamp(context.timestamp()).set(context.timestamp().plus(71L)); + processingTimeTimer.offset(Duration.millis(72L)); processingTimeTimer.setRelative(); eventTimerFamily .get("event-timer1") .withOutputTimestamp(context.timestamp()) - .set(context.timestamp().plus(33L)); + .set(context.timestamp().plus(73L)); - processingTimerFamily.get("processing-timer1").offset(Duration.millis(34L)).setRelative(); + processingTimerFamily.get("processing-timer1").offset(Duration.millis(74L)).setRelative(); } @OnTimerFamily("processing-family") public void processingFamilyOnTimer( OnTimerContext context, + @Key String key, @StateId("bag") BagState bagState, @TimerId("event") Timer eventTimeTimer, @TimerId("processing") Timer processingTimeTimer, @TimerFamily("event-family") TimerMap eventTimerFamily, @TimerFamily("processing-family") TimerMap processingTimerFamily) { - context.output("processing-family" + Iterables.toString(bagState.read())); + context.output("key:" + key + " processing-family" + Iterables.toString(bagState.read())); bagState.add("processing-family"); - eventTimeTimer.withOutputTimestamp(context.timestamp()).set(context.timestamp().plus(41L)); - processingTimeTimer.offset(Duration.millis(42L)); + eventTimeTimer.withOutputTimestamp(context.timestamp()).set(context.timestamp().plus(81L)); + processingTimeTimer.offset(Duration.millis(82L)); processingTimeTimer.setRelative(); eventTimerFamily .get("event-timer1") .withOutputTimestamp(context.timestamp()) - .set(context.timestamp().plus(43L)); + .set(context.timestamp().plus(83L)); - processingTimerFamily.get("processing-timer1").offset(Duration.millis(44L)).setRelative(); + processingTimerFamily.get("processing-timer1").offset(Duration.millis(84L)).setRelative(); } } @@ -1647,7 +1692,8 @@ public void testProcessElementForSizedElementAndRestriction() 
throws Exception { consumers.register( outputPCollectionId, TEST_TRANSFORM_ID, - (FnDataReceiver) (FnDataReceiver>) mainOutputValues::add); + (FnDataReceiver) (FnDataReceiver>) mainOutputValues::add, + StringUtf8Coder.of()); PTransformFunctionRegistry startFunctionRegistry = new PTransformFunctionRegistry( mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "start"); @@ -1733,13 +1779,13 @@ public void testProcessElementForSizedElementAndRestriction() throws Exception { assertEquals( ImmutableMap.of( "output", - org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Timestamp.newBuilder() + org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Timestamp.newBuilder() .setSeconds(expectedOutputWatermark.getMillis() / 1000) .setNanos((int) (expectedOutputWatermark.getMillis() % 1000) * 1000000) .build()), residualRoot.getApplication().getOutputWatermarksMap()); assertEquals( - org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Duration.newBuilder() + org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Duration.newBuilder() .setSeconds(54) .setNanos(321000000) .build(), @@ -1869,7 +1915,7 @@ public void testProcessElementForSizedElementAndRestriction() throws Exception { assertEquals( ImmutableMap.of( "output", - org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Timestamp.newBuilder() + org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Timestamp.newBuilder() .setSeconds(expectedOutputWatermark.getMillis() / 1000) .setNanos((int) (expectedOutputWatermark.getMillis() % 1000) * 1000000) .build()), @@ -1972,7 +2018,8 @@ public void testProcessElementForWindowedSizedElementAndRestriction() throws Exc consumers.register( outputPCollectionId, TEST_TRANSFORM_ID, - (FnDataReceiver) (FnDataReceiver>) mainOutputValues::add); + (FnDataReceiver) (FnDataReceiver>) mainOutputValues::add, + StringUtf8Coder.of()); PTransformFunctionRegistry startFunctionRegistry = new PTransformFunctionRegistry( mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "start"); @@ -2052,27 +2099,27 @@ public void testProcessElementForWindowedSizedElementAndRestriction() throws Exc residualRoot.getApplication().getInputId()); assertEquals(TEST_TRANSFORM_ID, residualRoot.getApplication().getTransformId()); Instant expectedOutputWatermark = GlobalWindow.TIMESTAMP_MIN_VALUE.plus(7); - Map + Map expectedOutputWatmermarkMap = ImmutableMap.of( "output", - org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Timestamp.newBuilder() + org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Timestamp.newBuilder() .setSeconds(expectedOutputWatermark.getMillis() / 1000) .setNanos((int) (expectedOutputWatermark.getMillis() % 1000) * 1000000) .build()); Instant initialWatermark = GlobalWindow.TIMESTAMP_MIN_VALUE.plus(1); - Map + Map expectedOutputWatmermarkMapForUnprocessedWindows = ImmutableMap.of( "output", - org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Timestamp.newBuilder() + org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Timestamp.newBuilder() .setSeconds(initialWatermark.getMillis() / 1000) .setNanos((int) (initialWatermark.getMillis() % 1000) * 1000000) .build()); assertEquals( expectedOutputWatmermarkMap, residualRoot.getApplication().getOutputWatermarksMap()); assertEquals( - org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Duration.newBuilder() + org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Duration.newBuilder() .setSeconds(54) .setNanos(321000000) .build(), @@ -2270,19 +2317,19 @@ public void 
testProcessElementForWindowedSizedElementAndRestriction() throws Exc residualRootInUnprocessedWindows.getRequestedTimeDelay()); Instant initialWatermark = GlobalWindow.TIMESTAMP_MIN_VALUE.plus(1); Instant expectedOutputWatermark = GlobalWindow.TIMESTAMP_MIN_VALUE.plus(2); - Map + Map expectedOutputWatermarkMapInUnprocessedResiduals = ImmutableMap.of( "output", - org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Timestamp.newBuilder() + org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Timestamp.newBuilder() .setSeconds(initialWatermark.getMillis() / 1000) .setNanos((int) (initialWatermark.getMillis() % 1000) * 1000000) .build()); - Map + Map expectedOutputWatermarkMap = ImmutableMap.of( "output", - org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Timestamp.newBuilder() + org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Timestamp.newBuilder() .setSeconds(expectedOutputWatermark.getMillis() / 1000) .setNanos((int) (expectedOutputWatermark.getMillis() % 1000) * 1000000) .build()); @@ -2360,7 +2407,6 @@ public void testProcessElementForPairWithRestriction() throws Exception { TEST_TRANSFORM_ID, ParDo.of(new WindowObservingTestSplittableDoFn(singletonSideInputView)) .withSideInputs(singletonSideInputView)); - RunnerApi.Pipeline pProto = ProtoOverrides.updateTransform( PTransformTranslation.PAR_DO_TRANSFORM_URN, @@ -2390,7 +2436,11 @@ public void testProcessElementForPairWithRestriction() throws Exception { PCollectionConsumerRegistry consumers = new PCollectionConsumerRegistry( metricsContainerRegistry, mock(ExecutionStateTracker.class)); - consumers.register(outputPCollectionId, TEST_TRANSFORM_ID, ((List) mainOutputValues)::add); + consumers.register( + outputPCollectionId, + TEST_TRANSFORM_ID, + ((List) mainOutputValues)::add, + KvCoder.of(StringUtf8Coder.of(), KvCoder.of(OffsetRange.Coder.of(), InstantCoder.of()))); PTransformFunctionRegistry startFunctionRegistry = new PTransformFunctionRegistry( mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "start"); @@ -2398,7 +2448,6 @@ public void testProcessElementForPairWithRestriction() throws Exception { new PTransformFunctionRegistry( mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "finish"); List teardownFunctions = new ArrayList<>(); - new FnApiDoFnRunner.Factory<>() .createRunnerForPTransform( PipelineOptionsFactory.create(), @@ -2489,7 +2538,11 @@ public void testProcessElementForWindowedPairWithRestriction() throws Exception PCollectionConsumerRegistry consumers = new PCollectionConsumerRegistry( metricsContainerRegistry, mock(ExecutionStateTracker.class)); - consumers.register(outputPCollectionId, TEST_TRANSFORM_ID, ((List) mainOutputValues)::add); + consumers.register( + outputPCollectionId, + TEST_TRANSFORM_ID, + ((List) mainOutputValues)::add, + KvCoder.of(StringUtf8Coder.of(), KvCoder.of(OffsetRange.Coder.of(), InstantCoder.of()))); PTransformFunctionRegistry startFunctionRegistry = new PTransformFunctionRegistry( mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "start"); @@ -2607,7 +2660,11 @@ public void testProcessElementForWindowedPairWithRestrictionWithNonWindowObservi PCollectionConsumerRegistry consumers = new PCollectionConsumerRegistry( metricsContainerRegistry, mock(ExecutionStateTracker.class)); - consumers.register(outputPCollectionId, TEST_TRANSFORM_ID, ((List) mainOutputValues)::add); + consumers.register( + outputPCollectionId, + TEST_TRANSFORM_ID, + ((List) mainOutputValues)::add, + KvCoder.of(StringUtf8Coder.of(), 
KvCoder.of(OffsetRange.Coder.of(), InstantCoder.of()))); PTransformFunctionRegistry startFunctionRegistry = new PTransformFunctionRegistry( mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "start"); @@ -2715,7 +2772,13 @@ public void testProcessElementForSplitAndSizeRestriction() throws Exception { PCollectionConsumerRegistry consumers = new PCollectionConsumerRegistry( metricsContainerRegistry, mock(ExecutionStateTracker.class)); - consumers.register(outputPCollectionId, TEST_TRANSFORM_ID, ((List) mainOutputValues)::add); + Coder coder = + KvCoder.of( + KvCoder.of( + StringUtf8Coder.of(), KvCoder.of(OffsetRange.Coder.of(), InstantCoder.of())), + DoubleCoder.of()); + consumers.register( + outputPCollectionId, TEST_TRANSFORM_ID, ((List) mainOutputValues)::add, coder); PTransformFunctionRegistry startFunctionRegistry = new PTransformFunctionRegistry( mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "start"); @@ -2828,7 +2891,13 @@ public void testProcessElementForWindowedSplitAndSizeRestriction() throws Except PCollectionConsumerRegistry consumers = new PCollectionConsumerRegistry( metricsContainerRegistry, mock(ExecutionStateTracker.class)); - consumers.register(outputPCollectionId, TEST_TRANSFORM_ID, ((List) mainOutputValues)::add); + Coder coder = + KvCoder.of( + KvCoder.of( + StringUtf8Coder.of(), KvCoder.of(OffsetRange.Coder.of(), InstantCoder.of())), + DoubleCoder.of()); + consumers.register( + outputPCollectionId, TEST_TRANSFORM_ID, ((List) mainOutputValues)::add, coder); PTransformFunctionRegistry startFunctionRegistry = new PTransformFunctionRegistry( mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "start"); @@ -2985,7 +3054,13 @@ public void testProcessElementForWindowedSplitAndSizeRestriction() throws Except PCollectionConsumerRegistry consumers = new PCollectionConsumerRegistry( metricsContainerRegistry, mock(ExecutionStateTracker.class)); - consumers.register(outputPCollectionId, TEST_TRANSFORM_ID, ((List) mainOutputValues)::add); + Coder coder = + KvCoder.of( + KvCoder.of( + StringUtf8Coder.of(), KvCoder.of(OffsetRange.Coder.of(), InstantCoder.of())), + DoubleCoder.of()); + consumers.register( + outputPCollectionId, TEST_TRANSFORM_ID, ((List) mainOutputValues)::add, coder); PTransformFunctionRegistry startFunctionRegistry = new PTransformFunctionRegistry( mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "start"); @@ -3194,10 +3269,16 @@ public void testProcessElementForTruncateAndSizeRestrictionForwardSplitWhenObser PCollectionConsumerRegistry consumers = new PCollectionConsumerRegistry( metricsContainerRegistry, mock(ExecutionStateTracker.class)); + Coder coder = + KvCoder.of( + KvCoder.of( + StringUtf8Coder.of(), KvCoder.of(OffsetRange.Coder.of(), InstantCoder.of())), + DoubleCoder.of()); consumers.register( outputPCollectionId, TEST_TRANSFORM_ID, - (FnDataReceiver) new SplittableFnDataReceiver(mainOutputValues)); + (FnDataReceiver) new SplittableFnDataReceiver(mainOutputValues), + coder); PTransformFunctionRegistry startFunctionRegistry = new PTransformFunctionRegistry( mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "start"); @@ -3363,10 +3444,13 @@ public void testProcessElementForTruncateAndSizeRestrictionForwardSplitWithoutOb PCollectionConsumerRegistry consumers = new PCollectionConsumerRegistry( metricsContainerRegistry, mock(ExecutionStateTracker.class)); + Coder coder = + KvCoder.of(KvCoder.of(StringUtf8Coder.of(), OffsetRange.Coder.of()), 
DoubleCoder.of()); consumers.register( outputPCollectionId, TEST_TRANSFORM_ID, - (FnDataReceiver) new SplittableFnDataReceiver(mainOutputValues)); + (FnDataReceiver) new SplittableFnDataReceiver(mainOutputValues), + coder); PTransformFunctionRegistry startFunctionRegistry = new PTransformFunctionRegistry( mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "start"); @@ -3440,10 +3524,16 @@ public void testProcessElementForTruncateAndSizeRestriction() throws Exception { PCollectionConsumerRegistry consumers = new PCollectionConsumerRegistry( metricsContainerRegistry, mock(ExecutionStateTracker.class)); + Coder coder = + KvCoder.of( + KvCoder.of( + StringUtf8Coder.of(), KvCoder.of(OffsetRange.Coder.of(), InstantCoder.of())), + DoubleCoder.of()); consumers.register( outputPCollectionId, TEST_TRANSFORM_ID, - (FnDataReceiver) new SplittableFnDataReceiver(mainOutputValues)); + (FnDataReceiver) new SplittableFnDataReceiver(mainOutputValues), + coder); PTransformFunctionRegistry startFunctionRegistry = new PTransformFunctionRegistry( mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "start"); @@ -3554,10 +3644,16 @@ public void testProcessElementForWindowedTruncateAndSizeRestriction() throws Exc PCollectionConsumerRegistry consumers = new PCollectionConsumerRegistry( metricsContainerRegistry, mock(ExecutionStateTracker.class)); + Coder coder = + KvCoder.of( + KvCoder.of( + StringUtf8Coder.of(), KvCoder.of(OffsetRange.Coder.of(), InstantCoder.of())), + DoubleCoder.of()); consumers.register( outputPCollectionId, TEST_TRANSFORM_ID, - (FnDataReceiver) new SplittableFnDataReceiver(mainOutputValues)); + (FnDataReceiver) new SplittableFnDataReceiver(mainOutputValues), + coder); PTransformFunctionRegistry startFunctionRegistry = new PTransformFunctionRegistry( mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "start"); @@ -3690,10 +3786,16 @@ public void testProcessElementForWindowedTruncateAndSizeRestriction() throws Exc PCollectionConsumerRegistry consumers = new PCollectionConsumerRegistry( metricsContainerRegistry, mock(ExecutionStateTracker.class)); + Coder coder = + KvCoder.of( + KvCoder.of( + StringUtf8Coder.of(), KvCoder.of(OffsetRange.Coder.of(), InstantCoder.of())), + DoubleCoder.of()); consumers.register( outputPCollectionId, TEST_TRANSFORM_ID, - (FnDataReceiver) new SplittableFnDataReceiver(mainOutputValues)); + (FnDataReceiver) new SplittableFnDataReceiver(mainOutputValues), + coder); PTransformFunctionRegistry startFunctionRegistry = new PTransformFunctionRegistry( mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "start"); @@ -4695,9 +4797,9 @@ private HandlesSplits.SplitResult getProcessElementSplit(String transformId, Str .build())); } - private org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Timestamp toTimestamp( + private org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Timestamp toTimestamp( Instant time) { - return org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Timestamp.newBuilder() + return org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Timestamp.newBuilder() .setSeconds(time.getMillis() / 1000) .setNanos((int) (time.getMillis() % 1000) * 1000000) .build(); @@ -4803,7 +4905,7 @@ public void testConstructSplitResultWithOnlyWindowSplits() throws Exception { assertEquals(1, result.getResidualRoots().size()); DelayedBundleApplication residualRoot = result.getResidualRoots().get(0); assertEquals( - 
org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Duration.getDefaultInstance(), + org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Duration.getDefaultInstance(), residualRoot.getRequestedTimeDelay()); assertEquals(PROCESS_TRANSFORM_ID, residualRoot.getApplication().getTransformId()); assertEquals(PROCESS_INPUT_ID, residualRoot.getApplication().getInputId()); @@ -4858,7 +4960,7 @@ public void testConstructSplitResultWithElementAndWindowSplitFromProcess() throw DelayedBundleApplication windowResidual = result.getResidualRoots().get(0); DelayedBundleApplication elementResidual = result.getResidualRoots().get(1); assertEquals( - org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Duration.getDefaultInstance(), + org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Duration.getDefaultInstance(), windowResidual.getRequestedTimeDelay()); assertEquals(PROCESS_TRANSFORM_ID, windowResidual.getApplication().getTransformId()); assertEquals(PROCESS_INPUT_ID, windowResidual.getApplication().getInputId()); @@ -4916,7 +5018,7 @@ public void testConstructSplitResultWithElementAndWindowSplitFromTruncate() thro DelayedBundleApplication windowResidual = result.getResidualRoots().get(0); DelayedBundleApplication elementResidual = result.getResidualRoots().get(1); assertEquals( - org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Duration.getDefaultInstance(), + org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Duration.getDefaultInstance(), windowResidual.getRequestedTimeDelay()); assertEquals(TRUNCATE_TRANSFORM_ID, windowResidual.getApplication().getTransformId()); assertEquals(TRUNCATE_INPUT_ID, windowResidual.getApplication().getInputId()); diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnHarnessTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnHarnessTest.java index bf442710fbe7..0b70e908315c 100644 --- a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnHarnessTest.java +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnHarnessTest.java @@ -43,10 +43,10 @@ import org.apache.beam.sdk.harness.JvmInitializer; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.PipelineOptionsFactory; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.TextFormat; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Server; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ServerBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Server; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ServerBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.Uninterruptibles; import org.junit.Rule; import org.junit.Test; diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/MapFnRunnersTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/MapFnRunnersTest.java index 339657c13d1b..03ddd7fc79c3 100644 --- a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/MapFnRunnersTest.java +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/MapFnRunnersTest.java @@ -18,13 +18,14 @@ package org.apache.beam.fn.harness; import static org.apache.beam.sdk.util.WindowedValue.valueInGlobalWindow; +import static org.hamcrest.MatcherAssert.assertThat; import static 
org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.empty; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.mock; +import java.io.IOException; import java.util.ArrayList; import java.util.Collections; import java.util.List; @@ -33,8 +34,10 @@ import org.apache.beam.fn.harness.data.PTransformFunctionRegistry; import org.apache.beam.model.pipeline.v1.RunnerApi; import org.apache.beam.model.pipeline.v1.RunnerApi.PTransform; +import org.apache.beam.runners.core.construction.CoderTranslation; import org.apache.beam.runners.core.metrics.ExecutionStateTracker; import org.apache.beam.runners.core.metrics.MetricsContainerStepMap; +import org.apache.beam.sdk.coders.StringUtf8Coder; import org.apache.beam.sdk.function.ThrowingFunction; import org.apache.beam.sdk.function.ThrowingRunnable; import org.apache.beam.sdk.options.PipelineOptionsFactory; @@ -52,9 +55,6 @@ /** Tests for {@link MapFnRunners}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class MapFnRunnersTest { private static final String EXPECTED_ID = "pTransformId"; private static final RunnerApi.PTransform EXPECTED_PTRANSFORM = @@ -62,6 +62,17 @@ public class MapFnRunnersTest { .putInputs("input", "inputPC") .putOutputs("output", "outputPC") .build(); + private static final RunnerApi.PCollection INPUT_PCOLLECTION = + RunnerApi.PCollection.newBuilder().setUniqueName("inputPC").setCoderId("coder-id").build(); + private static RunnerApi.Coder valueCoder; + + static { + try { + valueCoder = CoderTranslation.toProto(StringUtf8Coder.of()).getCoder(); + } catch (IOException e) { + throw new ExceptionInInitializerError(e); + } + } @Test public void testValueOnlyMapping() throws Exception { @@ -70,7 +81,7 @@ public void testValueOnlyMapping() throws Exception { PCollectionConsumerRegistry consumers = new PCollectionConsumerRegistry( metricsContainerRegistry, mock(ExecutionStateTracker.class)); - consumers.register("outputPC", EXPECTED_ID, outputConsumer::add); + consumers.register("outputPC", EXPECTED_ID, outputConsumer::add, StringUtf8Coder.of()); PTransformFunctionRegistry startFunctionRegistry = new PTransformFunctionRegistry( @@ -90,8 +101,8 @@ public void testValueOnlyMapping() throws Exception { EXPECTED_ID, EXPECTED_PTRANSFORM, Suppliers.ofInstance("57L")::get, - Collections.emptyMap(), - Collections.emptyMap(), + Collections.singletonMap("inputPC", INPUT_PCOLLECTION), + Collections.singletonMap("coder-id", valueCoder), Collections.emptyMap(), consumers, startFunctionRegistry, @@ -120,7 +131,7 @@ public void testFullWindowedValueMapping() throws Exception { PCollectionConsumerRegistry consumers = new PCollectionConsumerRegistry( metricsContainerRegistry, mock(ExecutionStateTracker.class)); - consumers.register("outputPC", EXPECTED_ID, outputConsumer::add); + consumers.register("outputPC", EXPECTED_ID, outputConsumer::add, StringUtf8Coder.of()); PTransformFunctionRegistry startFunctionRegistry = new PTransformFunctionRegistry( @@ -139,8 +150,8 @@ public void testFullWindowedValueMapping() throws Exception { EXPECTED_ID, EXPECTED_PTRANSFORM, Suppliers.ofInstance("57L")::get, - Collections.emptyMap(), - Collections.emptyMap(), + Collections.singletonMap("inputPC", INPUT_PCOLLECTION), + Collections.singletonMap("coder-id", valueCoder), Collections.emptyMap(), consumers, startFunctionRegistry, @@ -165,17 +176,18 @@ public 
void testFullWindowedValueMapping() throws Exception { @Test public void testFullWindowedValueMappingWithCompressedWindow() throws Exception { List> outputConsumer = new ArrayList<>(); + MetricsContainerStepMap metricsContainerRegistry = new MetricsContainerStepMap(); PCollectionConsumerRegistry consumers = new PCollectionConsumerRegistry( - mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class)); - consumers.register("outputPC", "pTransformId", outputConsumer::add); + metricsContainerRegistry, mock(ExecutionStateTracker.class)); + consumers.register("outputPC", "pTransformId", outputConsumer::add, StringUtf8Coder.of()); PTransformFunctionRegistry startFunctionRegistry = new PTransformFunctionRegistry( - mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "start"); + metricsContainerRegistry, mock(ExecutionStateTracker.class), "start"); PTransformFunctionRegistry finishFunctionRegistry = new PTransformFunctionRegistry( - mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "finish"); + metricsContainerRegistry, mock(ExecutionStateTracker.class), "finish"); List teardownFunctions = new ArrayList<>(); MapFnRunners.forWindowedValueMapFnFactory(this::createMapFunctionForPTransform) @@ -187,8 +199,8 @@ public void testFullWindowedValueMappingWithCompressedWindow() throws Exception EXPECTED_ID, EXPECTED_PTRANSFORM, Suppliers.ofInstance("57L")::get, - Collections.emptyMap(), - Collections.emptyMap(), + Collections.singletonMap("inputPC", INPUT_PCOLLECTION), + Collections.singletonMap("coder-id", valueCoder), Collections.emptyMap(), consumers, startFunctionRegistry, diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/PrecombineGroupingTableTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/PrecombineGroupingTableTest.java index 4454eb37de76..3d3e2d31cd0d 100644 --- a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/PrecombineGroupingTableTest.java +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/PrecombineGroupingTableTest.java @@ -17,13 +17,13 @@ */ package org.apache.beam.fn.harness; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.anyOf; import static org.hamcrest.Matchers.empty; import static org.hamcrest.Matchers.hasItem; import static org.hamcrest.Matchers.in; import static org.hamcrest.core.Is.is; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import java.util.ArrayList; import java.util.Arrays; @@ -44,9 +44,6 @@ /** Unit tests for {@link PrecombineGroupingTable}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PrecombineGroupingTableTest { private static class TestOutputReceiver implements Receiver { diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/WindowMergingFnRunnerTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/WindowMergingFnRunnerTest.java index b936cfb7934f..359ea98546b9 100644 --- a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/WindowMergingFnRunnerTest.java +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/WindowMergingFnRunnerTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.fn.harness; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import java.util.Collections; import org.apache.beam.model.pipeline.v1.RunnerApi; @@ -105,6 +105,28 @@ public void testWindowMergingWithMergingWindowFn() throws Exception { Iterables.getOnlyElement(output.getValue().getValue()); assertEquals(new IntervalWindow(new Instant(7L), new Instant(11L)), mergedOutput.getKey()); assertThat(mergedOutput.getValue(), containsInAnyOrder(expectedToBeMerged)); + + // Process a new group of windows, make sure that previous result has been cleaned up. + BoundedWindow[] expectedToBeMergedGroup2 = + new BoundedWindow[] { + new IntervalWindow(new Instant(15L), new Instant(17L)), + new IntervalWindow(new Instant(16L), new Instant(18L)) + }; + + input = + KV.of( + "abc", + ImmutableList.builder() + .add(expectedToBeMergedGroup2) + .addAll(expectedToBeUnmerged) + .build()); + + output = mapFunction.apply(input); + assertEquals(input.getKey(), output.getKey()); + assertEquals(expectedToBeUnmerged, output.getValue().getKey()); + mergedOutput = Iterables.getOnlyElement(output.getValue().getValue()); + assertEquals(new IntervalWindow(new Instant(15L), new Instant(18L)), mergedOutput.getKey()); + assertThat(mergedOutput.getValue(), containsInAnyOrder(expectedToBeMergedGroup2)); } private static RunnerApi.PTransform createMergeTransformForWindowFn( diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/control/BeamFnControlClientTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/control/BeamFnControlClientTest.java index be7d2e3e4aa3..404c79c73264 100644 --- a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/control/BeamFnControlClientTest.java +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/control/BeamFnControlClientTest.java @@ -18,11 +18,11 @@ package org.apache.beam.fn.harness.control; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Throwables.getStackTraceAsString; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsString; import static org.hamcrest.Matchers.not; import static org.hamcrest.Matchers.nullValue; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import java.util.EnumMap; @@ -31,7 +31,6 @@ import java.util.concurrent.ExecutionException; import java.util.concurrent.ExecutorService; import java.util.concurrent.Executors; -import java.util.concurrent.Future; import java.util.concurrent.LinkedBlockingQueue; import java.util.concurrent.atomic.AtomicBoolean; import org.apache.beam.model.fnexecution.v1.BeamFnApi; @@ -43,10 +42,10 @@ import 
org.apache.beam.sdk.fn.test.InProcessManagedChannelFactory; import org.apache.beam.sdk.fn.test.TestStreams; import org.apache.beam.sdk.function.ThrowingFunction; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Server; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.inprocess.InProcessServerBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.CallStreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Server; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessServerBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.CallStreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.Uninterruptibles; import org.junit.Test; import org.junit.runner.RunWith; @@ -54,9 +53,6 @@ /** Tests for {@link BeamFnControlClient}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamFnControlClientTest { private static final BeamFnApi.InstructionRequest SUCCESSFUL_REQUEST = BeamFnApi.InstructionRequest.newBuilder() @@ -133,26 +129,19 @@ public StreamObserver control( throw FAILURE; }); + ExecutorService executor = Executors.newCachedThreadPool(); BeamFnControlClient client = new BeamFnControlClient( - "", apiServiceDescriptor, InProcessManagedChannelFactory.create(), OutboundObserverFactory.trivial(), + executor, handlers); // Get the connected client and attempt to send and receive an instruction StreamObserver outboundServerObserver = outboundServerObservers.take(); - ExecutorService executor = Executors.newCachedThreadPool(); - Future future = - executor.submit( - () -> { - client.processInstructionRequests(executor); - return null; - }); - outboundServerObserver.onNext(SUCCESSFUL_REQUEST); assertEquals(SUCCESSFUL_RESPONSE, values.take()); @@ -168,7 +157,7 @@ public StreamObserver control( // Ensure that the server completing the stream translates to the completable future // being completed allowing for a successful shutdown of the client. 
outboundServerObserver.onCompleted(); - future.get(); + client.waitForTermination(); } finally { server.shutdownNow(); } @@ -213,26 +202,19 @@ public StreamObserver control( throw new Error("Test Error"); }); + ExecutorService executor = Executors.newCachedThreadPool(); BeamFnControlClient client = new BeamFnControlClient( - "", apiServiceDescriptor, InProcessManagedChannelFactory.create(), OutboundObserverFactory.trivial(), + executor, handlers); // Get the connected client and attempt to send and receive an instruction StreamObserver outboundServerObserver = outboundServerObservers.take(); - ExecutorService executor = Executors.newCachedThreadPool(); - Future future = - executor.submit( - () -> { - client.processInstructionRequests(executor); - return null; - }); - // Ensure that all exceptions are caught and translated to failures outboundServerObserver.onNext( InstructionRequest.newBuilder() @@ -244,7 +226,7 @@ public StreamObserver control( // Ensure that the client shuts down when an Error is thrown from the harness try { - future.get(); + client.waitForTermination(); throw new IllegalStateException("The future should have terminated with an error"); } catch (ExecutionException errorWrapper) { assertThat(errorWrapper.getCause().getMessage(), containsString("Test Error")); diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/control/BundleSplitListenerTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/control/BundleSplitListenerTest.java index 5a48efeccf33..fc392eca5665 100644 --- a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/control/BundleSplitListenerTest.java +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/control/BundleSplitListenerTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.fn.harness.control; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.empty; -import static org.junit.Assert.assertThat; import java.util.Collections; import org.apache.beam.model.fnexecution.v1.BeamFnApi.BundleApplication; diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/control/FinalizeBundleHandlerTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/control/FinalizeBundleHandlerTest.java index da26d9b94518..a760d22b78af 100644 --- a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/control/FinalizeBundleHandlerTest.java +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/control/FinalizeBundleHandlerTest.java @@ -40,9 +40,6 @@ /** Tests for {@link FinalizeBundleHandler}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FinalizeBundleHandlerTest { private static final String INSTRUCTION_ID = "instructionId"; private static final InstructionResponse SUCCESSFUL_RESPONSE = diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/control/HarnessMonitoringInfosInstructionHandlerTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/control/HarnessMonitoringInfosInstructionHandlerTest.java new file mode 100644 index 000000000000..ac69ed29a565 --- /dev/null +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/control/HarnessMonitoringInfosInstructionHandlerTest.java @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.fn.harness.control; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertTrue; + +import java.util.HashMap; +import org.apache.beam.model.fnexecution.v1.BeamFnApi; +import org.apache.beam.runners.core.metrics.LabeledMetrics; +import org.apache.beam.runners.core.metrics.MetricsContainerImpl; +import org.apache.beam.runners.core.metrics.MonitoringInfoConstants; +import org.apache.beam.runners.core.metrics.MonitoringInfoMetricName; +import org.apache.beam.runners.core.metrics.ShortIdMap; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.MetricsEnvironment; +import org.junit.Test; + +public class HarnessMonitoringInfosInstructionHandlerTest { + + @Test + public void testReturnsProcessWideMonitoringInfos() { + MetricsEnvironment.setProcessWideContainer(MetricsContainerImpl.createProcessWideContainer()); + + HashMap labels = new HashMap(); + labels.put(MonitoringInfoConstants.Labels.SERVICE, "service"); + labels.put(MonitoringInfoConstants.Labels.METHOD, "method"); + labels.put(MonitoringInfoConstants.Labels.RESOURCE, "resource"); + labels.put(MonitoringInfoConstants.Labels.PTRANSFORM, "transform"); + labels.put(MonitoringInfoConstants.Labels.STATUS, "ok"); + MonitoringInfoMetricName name = + MonitoringInfoMetricName.named(MonitoringInfoConstants.Urns.API_REQUEST_COUNT, labels); + Counter counter = LabeledMetrics.counter(name, true); + counter.inc(7); + + ShortIdMap metricsShortIds = new ShortIdMap(); + HarnessMonitoringInfosInstructionHandler testObject = + new HarnessMonitoringInfosInstructionHandler(metricsShortIds); + + BeamFnApi.InstructionRequest.Builder builder = BeamFnApi.InstructionRequest.newBuilder(); + BeamFnApi.InstructionResponse.Builder responseBuilder = + testObject.harnessMonitoringInfos(builder.build()); + + BeamFnApi.InstructionResponse response = responseBuilder.build(); + assertEquals(1, response.getHarnessMonitoringInfos().getMonitoringDataMap().size()); + + // Expect a payload to be set for "metric0". 
+ assertTrue( + !response.getHarnessMonitoringInfos().getMonitoringDataMap().get("metric0").isEmpty()); + } +} diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/control/ProcessBundleHandlerTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/control/ProcessBundleHandlerTest.java index a8c939c86a6c..f947cd68df69 100644 --- a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/control/ProcessBundleHandlerTest.java +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/control/ProcessBundleHandlerTest.java @@ -19,13 +19,15 @@ import static org.apache.beam.fn.harness.control.ProcessBundleHandler.REGISTERED_RUNNER_FACTORIES; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.equalTo; +import static org.hamcrest.collection.IsEmptyCollection.empty; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNotNull; import static org.junit.Assert.assertNull; import static org.junit.Assert.assertSame; -import static org.junit.Assert.assertThat; +import static org.junit.Assert.assertThrows; import static org.junit.Assert.assertTrue; import static org.mockito.Matchers.any; import static org.mockito.Mockito.argThat; @@ -75,6 +77,7 @@ import org.apache.beam.runners.core.construction.Timer; import org.apache.beam.runners.core.metrics.ExecutionStateTracker; import org.apache.beam.runners.core.metrics.MetricsContainerStepMap; +import org.apache.beam.runners.core.metrics.ShortIdMap; import org.apache.beam.sdk.coders.StringUtf8Coder; import org.apache.beam.sdk.fn.data.LogicalEndpoint; import org.apache.beam.sdk.function.ThrowingConsumer; @@ -90,17 +93,15 @@ import org.apache.beam.sdk.util.SerializableUtils; import org.apache.beam.sdk.util.WindowedValue; import org.apache.beam.sdk.values.TupleTag; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Message; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Message; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.Uninterruptibles; import org.joda.time.Instant; import org.junit.Before; -import org.junit.Rule; import org.junit.Test; -import org.junit.rules.ExpectedException; import org.junit.runner.RunWith; import org.junit.runners.JUnit4; import org.mockito.ArgumentCaptor; @@ -113,14 +114,11 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class ProcessBundleHandlerTest { private static final String DATA_INPUT_URN = "beam:runner:source:v1"; private static final String DATA_OUTPUT_URN = "beam:runner:sink:v1"; - @Rule public ExpectedException thrown = ExpectedException.none(); - @Mock private BeamFnDataClient beamFnDataClient; @Captor private ArgumentCaptor>> consumerCaptor; @@ -228,7 +226,7 @@ MetricsContainerStepMap getMetricsContainerRegistry() { } @Override - ExecutionStateTracker getStateTracker() { + 
public ExecutionStateTracker getStateTracker() { return wrappedBundleProcessor.getStateTracker(); } @@ -276,10 +274,12 @@ public void testTrySplitBeforeBundleDoesNotFail() { ProcessBundleHandler handler = new ProcessBundleHandler( PipelineOptionsFactory.create(), + Collections.emptySet(), null, beamFnDataClient, null /* beamFnStateClient */, null /* finalizeBundleHandler */, + new ShortIdMap(), ImmutableMap.of(), new BundleProcessorCache()); @@ -302,10 +302,12 @@ public void testProgressBeforeBundleDoesNotFail() throws Exception { ProcessBundleHandler handler = new ProcessBundleHandler( PipelineOptionsFactory.create(), + Collections.emptySet(), null, beamFnDataClient, null /* beamFnStateClient */, null /* finalizeBundleHandler */, + new ShortIdMap(), ImmutableMap.of(), new BundleProcessorCache()); @@ -390,10 +392,12 @@ public void testOrderOfStartAndFinishCalls() throws Exception { ProcessBundleHandler handler = new ProcessBundleHandler( PipelineOptionsFactory.create(), + Collections.emptySet(), fnApiRegistry::get, beamFnDataClient, null /* beamFnStateClient */, null /* finalizeBundleHandler */, + new ShortIdMap(), ImmutableMap.of( DATA_INPUT_URN, startFinishRecorder, DATA_OUTPUT_URN, startFinishRecorder), @@ -509,10 +513,12 @@ public void testOrderOfSetupTeardownCalls() throws Exception { ProcessBundleHandler handler = new ProcessBundleHandler( PipelineOptionsFactory.create(), + Collections.emptySet(), fnApiRegistry::get, beamFnDataClient, null /* beamFnStateClient */, null /* finalizeBundleHandler */, + new ShortIdMap(), urnToPTransformRunnerFactoryMap, new BundleProcessorCache()); @@ -555,10 +561,12 @@ public void testBundleProcessorIsResetWhenAddedBackToCache() throws Exception { ProcessBundleHandler handler = new ProcessBundleHandler( PipelineOptionsFactory.create(), + Collections.emptySet(), fnApiRegistry::get, beamFnDataClient, null /* beamFnStateGrpcClientCache */, null /* finalizeBundleHandler */, + new ShortIdMap(), ImmutableMap.of( DATA_INPUT_URN, (pipelineOptions, @@ -634,6 +642,14 @@ public void testBundleProcessorIsFoundWhenActive() { // After it is released, ensure the bundle processor is no longer found cache.release("descriptorId", bundleProcessor); assertNull(cache.find("known")); + + // Once it is active, ensure the bundle processor is found + cache.get("descriptorId", "known", () -> bundleProcessor); + assertSame(bundleProcessor, cache.find("known")); + + // After it is discarded, ensure the bundle processor is no longer found + cache.discard(bundleProcessor); + assertNull(cache.find("known")); } @Test @@ -666,6 +682,7 @@ public void testBundleProcessorReset() throws Exception { bundleFinalizationCallbacks); bundleProcessor.reset(); + assertNull(bundleProcessor.getInstructionId()); verify(startFunctionRegistry, times(1)).reset(); verify(finishFunctionRegistry, times(1)).reset(); verify(splitListener, times(1)).clear(); @@ -691,10 +708,12 @@ public void testCreatingPTransformExceptionsArePropagated() throws Exception { ProcessBundleHandler handler = new ProcessBundleHandler( PipelineOptionsFactory.create(), + Collections.emptySet(), fnApiRegistry::get, beamFnDataClient, null /* beamFnStateGrpcClientCache */, null /* finalizeBundleHandler */, + new ShortIdMap(), ImmutableMap.of( DATA_INPUT_URN, (pipelineOptions, @@ -715,16 +734,19 @@ public void testCreatingPTransformExceptionsArePropagated() throws Exception { addProgressRequestCallback, splitListener, bundleFinalizer) -> { - thrown.expect(IllegalStateException.class); - thrown.expectMessage("TestException"); throw new 
IllegalStateException("TestException"); }), new BundleProcessorCache()); - handler.processBundle( - BeamFnApi.InstructionRequest.newBuilder() - .setProcessBundle( - BeamFnApi.ProcessBundleRequest.newBuilder().setProcessBundleDescriptorId("1L")) - .build()); + assertThrows( + "TestException", + IllegalStateException.class, + () -> + handler.processBundle( + BeamFnApi.InstructionRequest.newBuilder() + .setProcessBundle( + BeamFnApi.ProcessBundleRequest.newBuilder() + .setProcessBundleDescriptorId("1L")) + .build())); } @Test @@ -744,10 +766,12 @@ public void testBundleFinalizationIsPropagated() throws Exception { ProcessBundleHandler handler = new ProcessBundleHandler( PipelineOptionsFactory.create(), + Collections.emptySet(), fnApiRegistry::get, beamFnDataClient, null /* beamFnStateGrpcClientCache */, mockFinalizeBundleHandler, + new ShortIdMap(), ImmutableMap.of( DATA_INPUT_URN, (PTransformRunnerFactory) @@ -799,7 +823,7 @@ public void testBundleFinalizationIsPropagated() throws Exception { } @Test - public void testPTransformStartExceptionsArePropagated() throws Exception { + public void testPTransformStartExceptionsArePropagated() { BeamFnApi.ProcessBundleDescriptor processBundleDescriptor = BeamFnApi.ProcessBundleDescriptor.newBuilder() .putTransforms( @@ -813,10 +837,12 @@ public void testPTransformStartExceptionsArePropagated() throws Exception { ProcessBundleHandler handler = new ProcessBundleHandler( PipelineOptionsFactory.create(), + Collections.emptySet(), fnApiRegistry::get, beamFnDataClient, null /* beamFnStateGrpcClientCache */, null /* finalizeBundleHandler */, + new ShortIdMap(), ImmutableMap.of( DATA_INPUT_URN, (PTransformRunnerFactory) @@ -838,23 +864,24 @@ public void testPTransformStartExceptionsArePropagated() throws Exception { addProgressRequestCallback, splitListener, bundleFinalizer) -> { - thrown.expect(IllegalStateException.class); - thrown.expectMessage("TestException"); startFunctionRegistry.register( pTransformId, ProcessBundleHandlerTest::throwException); return null; }), new BundleProcessorCache()); - handler.processBundle( - BeamFnApi.InstructionRequest.newBuilder() - .setProcessBundle( - BeamFnApi.ProcessBundleRequest.newBuilder().setProcessBundleDescriptorId("1L")) - .build()); - + assertThrows( + "TestException", + IllegalStateException.class, + () -> + handler.processBundle( + BeamFnApi.InstructionRequest.newBuilder() + .setProcessBundle( + BeamFnApi.ProcessBundleRequest.newBuilder() + .setProcessBundleDescriptorId("1L")) + .build())); // BundleProcessor is not re-added back to the BundleProcessorCache in case of an exception // during bundle processing - assertThat( - handler.bundleProcessorCache.getCachedBundleProcessors(), equalTo(Collections.EMPTY_MAP)); + assertThat(handler.bundleProcessorCache.getCachedBundleProcessors().get("1L"), empty()); } @Test @@ -872,10 +899,12 @@ public void testPTransformFinishExceptionsArePropagated() throws Exception { ProcessBundleHandler handler = new ProcessBundleHandler( PipelineOptionsFactory.create(), + Collections.emptySet(), fnApiRegistry::get, beamFnDataClient, null /* beamFnStateGrpcClientCache */, null /* finalizeBundleHandler */, + new ShortIdMap(), ImmutableMap.of( DATA_INPUT_URN, (PTransformRunnerFactory) @@ -897,23 +926,25 @@ public void testPTransformFinishExceptionsArePropagated() throws Exception { addProgressRequestCallback, splitListener, bundleFinalizer) -> { - thrown.expect(IllegalStateException.class); - thrown.expectMessage("TestException"); finishFunctionRegistry.register( pTransformId, 
ProcessBundleHandlerTest::throwException); return null; }), new BundleProcessorCache()); - handler.processBundle( - BeamFnApi.InstructionRequest.newBuilder() - .setProcessBundle( - BeamFnApi.ProcessBundleRequest.newBuilder().setProcessBundleDescriptorId("1L")) - .build()); + assertThrows( + "TestException", + IllegalStateException.class, + () -> + handler.processBundle( + BeamFnApi.InstructionRequest.newBuilder() + .setProcessBundle( + BeamFnApi.ProcessBundleRequest.newBuilder() + .setProcessBundleDescriptorId("1L")) + .build())); // BundleProcessor is not re-added back to the BundleProcessorCache in case of an exception // during bundle processing - assertThat( - handler.bundleProcessorCache.getCachedBundleProcessors(), equalTo(Collections.EMPTY_MAP)); + assertThat(handler.bundleProcessorCache.getCachedBundleProcessors().get("1L"), empty()); } @Test @@ -967,10 +998,12 @@ public void testPendingStateCallsBlockTillCompletion() throws Exception { ProcessBundleHandler handler = new ProcessBundleHandler( PipelineOptionsFactory.create(), + Collections.emptySet(), fnApiRegistry::get, beamFnDataClient, mockBeamFnStateGrpcClient, null /* finalizeBundleHandler */, + new ShortIdMap(), ImmutableMap.of( DATA_INPUT_URN, new PTransformRunnerFactory() { @@ -1033,10 +1066,12 @@ public void testStateCallsFailIfNoStateApiServiceDescriptorSpecified() throws Ex ProcessBundleHandler handler = new ProcessBundleHandler( PipelineOptionsFactory.create(), + Collections.emptySet(), fnApiRegistry::get, beamFnDataClient, null /* beamFnStateGrpcClientCache */, null /* finalizeBundleHandler */, + new ShortIdMap(), ImmutableMap.of( DATA_INPUT_URN, new PTransformRunnerFactory() { @@ -1067,19 +1102,22 @@ public Object createRunnerForPTransform( } private void doStateCalls(BeamFnStateClient beamFnStateClient) { - thrown.expect(IllegalStateException.class); - thrown.expectMessage("State API calls are unsupported"); beamFnStateClient.handle( StateRequest.newBuilder().setInstructionId("SUCCESS"), new CompletableFuture<>()); } }), new BundleProcessorCache()); - handler.processBundle( - BeamFnApi.InstructionRequest.newBuilder() - .setProcessBundle( - BeamFnApi.ProcessBundleRequest.newBuilder().setProcessBundleDescriptorId("1L")) - .build()); + assertThrows( + "State API calls are unsupported", + IllegalStateException.class, + () -> + handler.processBundle( + BeamFnApi.InstructionRequest.newBuilder() + .setProcessBundle( + BeamFnApi.ProcessBundleRequest.newBuilder() + .setProcessBundleDescriptorId("1L")) + .build())); } @Test @@ -1097,10 +1135,12 @@ public void testTimerRegistrationsFailIfNoTimerApiServiceDescriptorSpecified() t ProcessBundleHandler handler = new ProcessBundleHandler( PipelineOptionsFactory.create(), + Collections.emptySet(), fnApiRegistry::get, beamFnDataClient, null /* beamFnStateGrpcClientCache */, null /* finalizeBundleHandler */, + new ShortIdMap(), ImmutableMap.of( DATA_INPUT_URN, new PTransformRunnerFactory() { @@ -1131,8 +1171,6 @@ public Object createRunnerForPTransform( } private void doTimerRegistrations(BeamFnTimerClient beamFnTimerClient) { - thrown.expect(IllegalStateException.class); - thrown.expectMessage("Timers are unsupported"); beamFnTimerClient.register( LogicalEndpoint.timer("1L", "2L", "Timer"), Timer.Coder.of(StringUtf8Coder.of(), GlobalWindow.Coder.INSTANCE), @@ -1140,11 +1178,16 @@ private void doTimerRegistrations(BeamFnTimerClient beamFnTimerClient) { } }), new BundleProcessorCache()); - handler.processBundle( - BeamFnApi.InstructionRequest.newBuilder() - .setProcessBundle( - 
BeamFnApi.ProcessBundleRequest.newBuilder().setProcessBundleDescriptorId("1L")) - .build()); + assertThrows( + "Timers are unsupported", + IllegalStateException.class, + () -> + handler.processBundle( + BeamFnApi.InstructionRequest.newBuilder() + .setProcessBundle( + BeamFnApi.ProcessBundleRequest.newBuilder() + .setProcessBundleDescriptorId("1L")) + .build())); } private static void throwException() { diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/data/BeamFnDataGrpcClientTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/data/BeamFnDataGrpcClientTest.java index 290b8e83cf18..8ce60b314265 100644 --- a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/data/BeamFnDataGrpcClientTest.java +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/data/BeamFnDataGrpcClientTest.java @@ -19,10 +19,10 @@ import static org.apache.beam.sdk.util.CoderUtils.encodeToByteArray; import static org.apache.beam.sdk.util.WindowedValue.valueInGlobalWindow; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.collection.IsEmptyCollection.empty; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import java.util.Collection; @@ -46,22 +46,19 @@ import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.transforms.windowing.GlobalWindow; import org.apache.beam.sdk.util.WindowedValue; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ManagedChannel; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Server; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.inprocess.InProcessChannelBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.inprocess.InProcessServerBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.CallStreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ManagedChannel; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Server; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessChannelBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessServerBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.CallStreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.junit.Test; import org.junit.runner.RunWith; import org.junit.runners.JUnit4; /** Tests for {@link BeamFnDataGrpcClient}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamFnDataGrpcClientTest { private static final Coder> CODER = LengthPrefixCoder.of( diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/data/BeamFnDataInboundObserverTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/data/BeamFnDataInboundObserverTest.java index 6ddd639a02f5..aae1b119b407 100644 --- a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/data/BeamFnDataInboundObserverTest.java +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/data/BeamFnDataInboundObserverTest.java @@ -18,11 +18,11 @@ package org.apache.beam.fn.harness.data; import static org.apache.beam.sdk.util.WindowedValue.valueInGlobalWindow; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.empty; import static org.hamcrest.Matchers.instanceOf; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.util.ArrayList; @@ -37,7 +37,7 @@ import org.apache.beam.sdk.fn.data.LogicalEndpoint; import org.apache.beam.sdk.transforms.windowing.GlobalWindow; import org.apache.beam.sdk.util.WindowedValue; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.junit.Rule; import org.junit.Test; import org.junit.rules.ExpectedException; diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/data/FakeBeamFnTimerClient.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/data/FakeBeamFnTimerClient.java index 0fe56f785995..80165a944f6e 100644 --- a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/data/FakeBeamFnTimerClient.java +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/data/FakeBeamFnTimerClient.java @@ -30,7 +30,6 @@ /** An implementation of a {@link BeamFnTimerClient} that can be used for testing. 
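 *
 * <p>Each registered timer endpoint is backed by a {@code CompletableFuture}; the {@code
 * runWhenComplete(Runnable)} hook added below simply chains the callback onto that future.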
*/ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class FakeBeamFnTimerClient implements BeamFnTimerClient { private final ConcurrentMap> timerHandlers; @@ -65,6 +64,12 @@ public void awaitCompletion() throws InterruptedException, Exception { timerInputFutures.get(endpoint).get(); } + @Override + @SuppressWarnings("FutureReturnValueIgnored") + public void runWhenComplete(Runnable completeRunnable) { + timerInputFutures.get(endpoint).whenComplete((f, t) -> completeRunnable.run()); + } + @Override public boolean isDone() { return timerInputFutures.get(endpoint).isDone(); diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistryTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistryTest.java index 7a88942a753a..708ca8b34bba 100644 --- a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistryTest.java +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistryTest.java @@ -18,11 +18,13 @@ package org.apache.beam.fn.harness.data; import static org.apache.beam.sdk.util.WindowedValue.valueInGlobalWindow; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; +import static org.hamcrest.Matchers.containsInAnyOrder; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.mockito.Mockito.any; +import static org.mockito.Mockito.doAnswer; import static org.mockito.Mockito.doThrow; import static org.mockito.Mockito.mock; import static org.mockito.Mockito.times; @@ -30,21 +32,34 @@ import static org.mockito.Mockito.withSettings; import static org.powermock.api.mockito.PowerMockito.mockStatic; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.HashMap; +import java.util.Iterator; +import java.util.List; import org.apache.beam.fn.harness.HandlesSplits; import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo; +import org.apache.beam.runners.core.metrics.DistributionData; import org.apache.beam.runners.core.metrics.ExecutionStateTracker; import org.apache.beam.runners.core.metrics.MetricsContainerStepMap; import org.apache.beam.runners.core.metrics.MonitoringInfoConstants; import org.apache.beam.runners.core.metrics.MonitoringInfoConstants.Labels; +import org.apache.beam.runners.core.metrics.MonitoringInfoConstants.Urns; +import org.apache.beam.runners.core.metrics.MonitoringInfoMetricName; import org.apache.beam.runners.core.metrics.SimpleMonitoringInfoBuilder; +import org.apache.beam.sdk.coders.IterableCoder; +import org.apache.beam.sdk.coders.StringUtf8Coder; import org.apache.beam.sdk.fn.data.FnDataReceiver; import org.apache.beam.sdk.metrics.MetricsEnvironment; import org.apache.beam.sdk.util.WindowedValue; +import org.apache.beam.sdk.util.common.ElementByteSizeObservableIterable; +import org.apache.beam.sdk.util.common.ElementByteSizeObservableIterator; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.junit.Rule; import org.junit.Test; import org.junit.rules.ExpectedException; import org.junit.runner.RunWith; +import org.mockito.stubbing.Answer; import org.powermock.api.mockito.PowerMockito; import org.powermock.core.classloader.annotations.PrepareForTest; import 
org.powermock.modules.junit4.PowerMockRunner; @@ -54,7 +69,6 @@ @PrepareForTest(MetricsEnvironment.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class PCollectionConsumerRegistryTest { @@ -71,14 +85,14 @@ public void singleConsumer() throws Exception { metricsContainerRegistry, mock(ExecutionStateTracker.class)); FnDataReceiver> consumerA1 = mock(FnDataReceiver.class); - consumers.register(pCollectionA, pTransformIdA, consumerA1); + consumers.register(pCollectionA, pTransformIdA, consumerA1, StringUtf8Coder.of()); FnDataReceiver> wrapperConsumer = (FnDataReceiver>) (FnDataReceiver) consumers.getMultiplexingConsumer(pCollectionA); - - WindowedValue element = valueInGlobalWindow("elem"); - int numElements = 20; + String elementValue = "elem"; + WindowedValue element = valueInGlobalWindow(elementValue); + int numElements = 10; for (int i = 0; i < numElements; i++) { wrapperConsumer.accept(element); } @@ -87,18 +101,29 @@ public void singleConsumer() throws Exception { verify(consumerA1, times(numElements)).accept(element); assertThat(consumers.keySet(), contains(pCollectionA)); + List expected = new ArrayList<>(); + SimpleMonitoringInfoBuilder builder = new SimpleMonitoringInfoBuilder(); builder.setUrn(MonitoringInfoConstants.Urns.ELEMENT_COUNT); builder.setLabel(MonitoringInfoConstants.Labels.PCOLLECTION, pCollectionA); builder.setInt64SumValue(numElements); - MonitoringInfo expected = builder.build(); + expected.add(builder.build()); + long elementByteSize = StringUtf8Coder.of().getEncodedElementByteSize(elementValue); + builder = new SimpleMonitoringInfoBuilder(); + builder.setUrn(Urns.SAMPLED_BYTE_SIZE); + builder.setLabel(MonitoringInfoConstants.Labels.PCOLLECTION, pCollectionA); + builder.setInt64DistributionValue( + DistributionData.create( + numElements * elementByteSize, numElements, elementByteSize, elementByteSize)); + expected.add(builder.build()); // Clear the timestamp before comparison. 
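+    // The registry now reports two PCOLLECTION-labelled monitoring infos per collection: an
+    // ELEMENT_COUNT sum and a SAMPLED_BYTE_SIZE distribution, so the assertion filters the whole
+    // set instead of finding a single MonitoringInfo. For the distribution (the math used above):
+    //   sum = numElements * elementByteSize, count = numElements, min = max = elementByteSize.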
- MonitoringInfo result = - Iterables.find( + Iterable result = + Iterables.filter( metricsContainerRegistry.getMonitoringInfos(), monitoringInfo -> monitoringInfo.containsLabels(Labels.PCOLLECTION)); - assertEquals(expected, result); + + assertThat(result, containsInAnyOrder(expected.toArray())); } @Test @@ -113,7 +138,7 @@ public void singleConsumerException() throws Exception { metricsContainerRegistry, mock(ExecutionStateTracker.class)); FnDataReceiver> consumer = mock(FnDataReceiver.class); - consumers.register(pCollectionA, pTransformId, consumer); + consumers.register(pCollectionA, pTransformId, consumer, StringUtf8Coder.of()); FnDataReceiver> wrapperConsumer = (FnDataReceiver>) @@ -142,8 +167,8 @@ public void multipleConsumersSamePCollection() throws Exception { FnDataReceiver> consumerA1 = mock(FnDataReceiver.class); FnDataReceiver> consumerA2 = mock(FnDataReceiver.class); - consumers.register(pCollectionA, pTransformIdA, consumerA1); - consumers.register(pCollectionA, pTransformIdB, consumerA2); + consumers.register(pCollectionA, pTransformIdA, consumerA1, StringUtf8Coder.of()); + consumers.register(pCollectionA, pTransformIdB, consumerA2, StringUtf8Coder.of()); FnDataReceiver> wrapperConsumer = (FnDataReceiver>) @@ -187,8 +212,8 @@ public void multipleConsumersSamePCollectionException() throws Exception { FnDataReceiver> consumerA1 = mock(FnDataReceiver.class); FnDataReceiver> consumerA2 = mock(FnDataReceiver.class); - consumers.register(pCollectionA, pTransformId, consumerA1); - consumers.register(pCollectionA, pTransformId, consumerA2); + consumers.register(pCollectionA, pTransformId, consumerA1, StringUtf8Coder.of()); + consumers.register(pCollectionA, pTransformId, consumerA2, StringUtf8Coder.of()); FnDataReceiver> wrapperConsumer = (FnDataReceiver>) @@ -212,12 +237,12 @@ public void throwsOnRegisteringAfterMultiplexingConsumerWasInitialized() throws FnDataReceiver> consumerA1 = mock(FnDataReceiver.class); FnDataReceiver> consumerA2 = mock(FnDataReceiver.class); - consumers.register(pCollectionA, pTransformId, consumerA1); + consumers.register(pCollectionA, pTransformId, consumerA1, StringUtf8Coder.of()); consumers.getMultiplexingConsumer(pCollectionA); expectedException.expect(RuntimeException.class); expectedException.expectMessage("cannot be register()-d after"); - consumers.register(pCollectionA, pTransformId, consumerA2); + consumers.register(pCollectionA, pTransformId, consumerA2, StringUtf8Coder.of()); } @Test @@ -232,8 +257,8 @@ public void testScopedMetricContainerInvokedUponAcceptingElement() throws Except FnDataReceiver> consumerA1 = mock(FnDataReceiver.class); FnDataReceiver> consumerA2 = mock(FnDataReceiver.class); - consumers.register("pCollectionA", "pTransformA", consumerA1); - consumers.register("pCollectionA", "pTransformB", consumerA2); + consumers.register("pCollectionA", "pTransformA", consumerA1, StringUtf8Coder.of()); + consumers.register("pCollectionA", "pTransformB", consumerA2, StringUtf8Coder.of()); FnDataReceiver> wrapperConsumer = (FnDataReceiver>) @@ -252,7 +277,7 @@ public void testScopedMetricContainerInvokedUponAcceptingElement() throws Except } @Test - public void testScopedMetricContainerInvokedUponAccept() throws Exception { + public void testUnboundedCountersUponAccept() throws Exception { mockStatic(MetricsEnvironment.class, withSettings().verboseLogging()); final String pCollectionA = "pCollectionA"; final String pTransformIdA = "pTransformIdA"; @@ -264,7 +289,7 @@ public void testScopedMetricContainerInvokedUponAccept() throws 
Exception { FnDataReceiver> consumer = mock(FnDataReceiver.class, withSettings().verboseLogging()); - consumers.register(pCollectionA, pTransformIdA, consumer); + consumers.register(pCollectionA, pTransformIdA, consumer, StringUtf8Coder.of()); FnDataReceiver> wrapperConsumer = (FnDataReceiver>) @@ -275,9 +300,14 @@ public void testScopedMetricContainerInvokedUponAccept() throws Exception { verify(consumer, times(1)).accept(element); - // Verify that static scopedMetricsContainer is called with unbound container. - PowerMockito.verifyStatic(MetricsEnvironment.class, times(1)); - MetricsEnvironment.scopedMetricsContainer(metricsContainerRegistry.getUnboundContainer()); + HashMap labels = new HashMap(); + labels.put(Labels.PCOLLECTION, "pCollectionA"); + MonitoringInfoMetricName counterName = + MonitoringInfoMetricName.named(MonitoringInfoConstants.Urns.ELEMENT_COUNT, labels); + assertEquals( + 1L, + (long) + metricsContainerRegistry.getUnboundContainer().getCounter(counterName).getCumulative()); } @Test @@ -291,7 +321,7 @@ public void testHandlesSplitsPassedToOriginalConsumer() throws Exception { metricsContainerRegistry, mock(ExecutionStateTracker.class)); SplittingReceiver consumerA1 = mock(SplittingReceiver.class); - consumers.register(pCollectionA, pTransformIdA, consumerA1); + consumers.register(pCollectionA, pTransformIdA, consumerA1, StringUtf8Coder.of()); FnDataReceiver> wrapperConsumer = (FnDataReceiver>) @@ -306,5 +336,108 @@ public void testHandlesSplitsPassedToOriginalConsumer() throws Exception { verify(consumerA1).trySplit(0.3); } + @Test + public void testLazyByteSizeEstimation() throws Exception { + final String pCollectionA = "pCollectionA"; + final String pTransformIdA = "pTransformIdA"; + + MetricsContainerStepMap metricsContainerRegistry = new MetricsContainerStepMap(); + PCollectionConsumerRegistry consumers = + new PCollectionConsumerRegistry( + metricsContainerRegistry, mock(ExecutionStateTracker.class)); + FnDataReceiver>> consumerA1 = mock(FnDataReceiver.class); + + consumers.register( + pCollectionA, pTransformIdA, consumerA1, IterableCoder.of(StringUtf8Coder.of())); + + FnDataReceiver>> wrapperConsumer = + (FnDataReceiver>>) + (FnDataReceiver) consumers.getMultiplexingConsumer(pCollectionA); + String elementValue = "elem"; + long elementByteSize = StringUtf8Coder.of().getEncodedElementByteSize(elementValue); + WindowedValue> element = + valueInGlobalWindow( + new TestElementByteSizeObservableIterable<>( + Arrays.asList(elementValue, elementValue), elementByteSize)); + int numElements = 10; + // Mock doing work on the iterable items + doAnswer( + (Answer) + invocation -> { + Object[] args = invocation.getArguments(); + WindowedValue> arg = (WindowedValue>) args[0]; + Iterator it = arg.getValue().iterator(); + while (it.hasNext()) { + it.next(); + } + return null; + }) + .when(consumerA1) + .accept(element); + + for (int i = 0; i < numElements; i++) { + wrapperConsumer.accept(element); + } + + // Check that the underlying consumers are each invoked per element. 
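+    // Byte-size sampling is lazy for iterables: nothing is observed until the consumer actually
+    // walks the iterator, at which point each next() call reports its size via
+    // notifyValueReturned(elementByteSize). The doAnswer above forces that iteration so the
+    // SAMPLED_BYTE_SIZE distribution checked below gets populated.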
+ verify(consumerA1, times(numElements)).accept(element); + assertThat(consumers.keySet(), contains(pCollectionA)); + + List expected = new ArrayList<>(); + + SimpleMonitoringInfoBuilder builder = new SimpleMonitoringInfoBuilder(); + builder.setUrn(MonitoringInfoConstants.Urns.ELEMENT_COUNT); + builder.setLabel(MonitoringInfoConstants.Labels.PCOLLECTION, pCollectionA); + builder.setInt64SumValue(numElements); + expected.add(builder.build()); + + builder = new SimpleMonitoringInfoBuilder(); + builder.setUrn(Urns.SAMPLED_BYTE_SIZE); + builder.setLabel(MonitoringInfoConstants.Labels.PCOLLECTION, pCollectionA); + long expectedBytes = + (elementByteSize + 1) * 2 + + 5; // Additional 5 bytes are due to size and hasNext = false (1 byte). + builder.setInt64DistributionValue( + DistributionData.create( + numElements * expectedBytes, numElements, expectedBytes, expectedBytes)); + expected.add(builder.build()); + // Clear the timestamp before comparison. + Iterable result = + Iterables.filter( + metricsContainerRegistry.getMonitoringInfos(), + monitoringInfo -> monitoringInfo.containsLabels(Labels.PCOLLECTION)); + + assertThat(result, containsInAnyOrder(expected.toArray())); + } + + private class TestElementByteSizeObservableIterable + extends ElementByteSizeObservableIterable> { + private List elements; + private long elementByteSize; + + public TestElementByteSizeObservableIterable(List elements, long elementByteSize) { + this.elements = elements; + this.elementByteSize = elementByteSize; + } + + @Override + protected ElementByteSizeObservableIterator createIterator() { + return new ElementByteSizeObservableIterator() { + private int index = 0; + + @Override + public boolean hasNext() { + return index < elements.size(); + } + + @Override + public Object next() { + notifyValueReturned(elementByteSize); + return elements.get(index++); + } + }; + } + } + private abstract static class SplittingReceiver implements FnDataReceiver, HandlesSplits {} } diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/data/QueueingBeamFnDataClientTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/data/QueueingBeamFnDataClientTest.java index c5f60b60e8b6..ee9c4b85cb08 100644 --- a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/data/QueueingBeamFnDataClientTest.java +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/data/QueueingBeamFnDataClientTest.java @@ -19,8 +19,10 @@ import static org.apache.beam.sdk.util.CoderUtils.encodeToByteArray; import static org.apache.beam.sdk.util.WindowedValue.valueInGlobalWindow; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; -import static org.junit.Assert.assertThat; +import static org.hamcrest.Matchers.equalTo; +import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertTrue; import static org.junit.Assert.fail; @@ -31,6 +33,7 @@ import java.util.concurrent.ExecutionException; import java.util.concurrent.Executors; import java.util.concurrent.Future; +import java.util.concurrent.atomic.AtomicInteger; import java.util.concurrent.atomic.AtomicReference; import org.apache.beam.model.fnexecution.v1.BeamFnApi; import org.apache.beam.model.fnexecution.v1.BeamFnDataGrpc; @@ -47,13 +50,13 @@ import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.transforms.windowing.GlobalWindow; import org.apache.beam.sdk.util.WindowedValue; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import 
org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ManagedChannel; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Server; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.inprocess.InProcessChannelBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.inprocess.InProcessServerBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.CallStreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ManagedChannel; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Server; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessChannelBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessServerBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.CallStreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.junit.Rule; import org.junit.Test; import org.junit.runner.RunWith; @@ -171,7 +174,7 @@ public StreamObserver data( PipelineOptionsFactory.create(), (Endpoints.ApiServiceDescriptor descriptor) -> channel, OutboundObserverFactory.trivial()); - QueueingBeamFnDataClient queueingClient = new QueueingBeamFnDataClient(clientFactory); + QueueingBeamFnDataClient queueingClient = new QueueingBeamFnDataClient(clientFactory, 1000); InboundDataClient readFutureA = queueingClient.receive( @@ -235,6 +238,118 @@ public StreamObserver data( } } + @Test(timeout = 10000) + public void testClosingWithFullInboundQueue() throws Exception { + CountDownLatch waitForClientToConnect = new CountDownLatch(1); + CountDownLatch allowValueProcessing = new CountDownLatch(1); + final int numValues = 100; + CountDownLatch receiveAllValues = new CountDownLatch(numValues); + Collection> inboundValues = new ConcurrentLinkedQueue<>(); + Collection inboundServerValues = new ConcurrentLinkedQueue<>(); + AtomicReference> outboundServerObserver = + new AtomicReference<>(); + CallStreamObserver inboundServerObserver = + TestStreams.withOnNext(inboundServerValues::add).build(); + + Endpoints.ApiServiceDescriptor apiServiceDescriptor = + Endpoints.ApiServiceDescriptor.newBuilder() + .setUrl(this.getClass().getName() + "-" + UUID.randomUUID().toString()) + .build(); + Server server = + InProcessServerBuilder.forName(apiServiceDescriptor.getUrl()) + .addService( + new BeamFnDataGrpc.BeamFnDataImplBase() { + @Override + public StreamObserver data( + StreamObserver outboundObserver) { + outboundServerObserver.set(outboundObserver); + waitForClientToConnect.countDown(); + return inboundServerObserver; + } + }) + .build(); + server.start(); + try { + ManagedChannel channel = + InProcessChannelBuilder.forName(apiServiceDescriptor.getUrl()).build(); + + BeamFnDataGrpcClient clientFactory = + new BeamFnDataGrpcClient( + PipelineOptionsFactory.create(), + (Endpoints.ApiServiceDescriptor descriptor) -> channel, + OutboundObserverFactory.trivial()); + // We want the queue to have no room when we try to close. The queue size + // is therefore set to numValues -1 since one of the values has been removed + // from the queue to accept it. 
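+      // Rough shape of the behaviour exercised here (the bounded queue is an internal detail and
+      // an assumption of this sketch, not part of the public API):
+      //   BlockingQueue<ConsumerAndData> inbound = new ArrayBlockingQueue<>(numValues - 1);
+      //   inbound.put(element);            // the gRPC inbound observer blocks once the queue fills
+      //   queueingClient.drainAndBlock();  // must keep draining, or completing the client stalls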
+ QueueingBeamFnDataClient queueingClient = + new QueueingBeamFnDataClient(clientFactory, numValues - 1); + + final AtomicInteger currentCount = new AtomicInteger(); + InboundDataClient inboundDataClient = + queueingClient.receive( + apiServiceDescriptor, + ENDPOINT_A, + CODER, + (WindowedValue wv) -> { + if (allowValueProcessing.getCount() != 0) { + LOG.info("Inbound processing blocking"); + } + allowValueProcessing.await(); + LOG.info("Received " + wv.getValue()); + assertEquals("ABC" + currentCount.getAndIncrement(), wv.getValue()); + }); + + waitForClientToConnect.await(); + + // Start draining elements, the drain will be blocked by allowValueProcessing. + Future drainElementsFuture = + executor.submit( + () -> { + try { + queueingClient.drainAndBlock(); + } catch (Exception e) { + LOG.error("Failed ", e); + fail(); + } + }); + + // We should be able to send all the elements and complete without blocking. + for (int i = 0; i < numValues; ++i) { + BeamFnApi.Elements element = + BeamFnApi.Elements.newBuilder() + .addData( + BeamFnApi.Elements.Data.newBuilder() + .setInstructionId(ENDPOINT_A.getInstructionId()) + .setTransformId(ENDPOINT_A.getTransformId()) + .setData( + ByteString.copyFrom( + encodeToByteArray(CODER, valueInGlobalWindow("ABC" + i))))) + .build(); + outboundServerObserver.get().onNext(element); + } + outboundServerObserver + .get() + .onNext( + BeamFnApi.Elements.newBuilder() + .addData( + BeamFnApi.Elements.Data.newBuilder() + .setInstructionId(ENDPOINT_A.getInstructionId()) + .setTransformId(ENDPOINT_A.getTransformId()) + .setIsLast(true)) + .build()); + inboundDataClient.awaitCompletion(); + + // Allow processing to complete and verify that draining finishes. + LOG.info("Completed client, allowing inbound processing."); + allowValueProcessing.countDown(); + drainElementsFuture.get(); + + assertThat(currentCount.get(), equalTo(numValues)); + } finally { + server.shutdownNow(); + } + } + @Test(timeout = 100000) public void testBundleProcessorThrowsExecutionExceptionWhenUserCodeThrows() throws Exception { CountDownLatch waitForClientToConnect = new CountDownLatch(1); @@ -273,7 +388,7 @@ public StreamObserver data( PipelineOptionsFactory.create(), (Endpoints.ApiServiceDescriptor descriptor) -> channel, OutboundObserverFactory.trivial()); - QueueingBeamFnDataClient queueingClient = new QueueingBeamFnDataClient(clientFactory); + QueueingBeamFnDataClient queueingClient = new QueueingBeamFnDataClient(clientFactory, 1000); InboundDataClient readFutureA = queueingClient.receive( @@ -342,6 +457,8 @@ public StreamObserver data( } catch (ExecutionException e) { if (e.getCause() instanceof RuntimeException) { intentionallyFailedB = true; + } else { + LOG.error("Unintentional failure ", e); } } assertTrue(intentionallyFailedB); diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/logging/BeamFnLoggingClientTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/logging/BeamFnLoggingClientTest.java index 93c1b3d0ef1d..dd9e4124c916 100644 --- a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/logging/BeamFnLoggingClientTest.java +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/logging/BeamFnLoggingClientTest.java @@ -18,11 +18,11 @@ package org.apache.beam.fn.harness.logging; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Throwables.getStackTraceAsString; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.junit.Assert.assertEquals; 
import static org.junit.Assert.assertNotNull; import static org.junit.Assert.assertNull; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.util.Collection; @@ -39,27 +39,25 @@ import org.apache.beam.model.pipeline.v1.Endpoints; import org.apache.beam.sdk.fn.test.TestStreams; import org.apache.beam.sdk.options.PipelineOptionsFactory; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Timestamp; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ManagedChannel; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Server; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Status; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.inprocess.InProcessChannelBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.inprocess.InProcessServerBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.CallStreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.Timestamp; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ManagedChannel; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Server; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Status; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessChannelBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessServerBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.CallStreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.junit.Rule; import org.junit.Test; import org.junit.rules.ExpectedException; +import org.junit.rules.TestRule; import org.junit.runner.RunWith; import org.junit.runners.JUnit4; /** Tests for {@link BeamFnLoggingClient}. 
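 *
 * <p>Each test first seeds the per-thread logging MDC via {@code
 * BeamFnLoggingMDC.setInstructionId("instruction-1")}, so the emitted log entries are expected
 * to carry that instruction id.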
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamFnLoggingClientTest { - + @Rule public TestRule restoreLogging = new RestoreBeamFnLoggingMDC(); private static final LogRecord FILTERED_RECORD; private static final LogRecord TEST_RECORD; private static final LogRecord TEST_RECORD_WITH_EXCEPTION; @@ -81,6 +79,7 @@ public class BeamFnLoggingClientTest { private static final BeamFnApi.LogEntry TEST_ENTRY = BeamFnApi.LogEntry.newBuilder() + .setInstructionId("instruction-1") .setSeverity(BeamFnApi.LogEntry.Severity.Enum.DEBUG) .setMessage("Message") .setThread("12345") @@ -89,6 +88,7 @@ public class BeamFnLoggingClientTest { .build(); private static final BeamFnApi.LogEntry TEST_ENTRY_WITH_EXCEPTION = BeamFnApi.LogEntry.newBuilder() + .setInstructionId("instruction-1") .setSeverity(BeamFnApi.LogEntry.Severity.Enum.WARN) .setMessage("MessageWithException") .setTrace(getStackTraceAsString(TEST_RECORD_WITH_EXCEPTION.getThrown())) @@ -100,6 +100,7 @@ public class BeamFnLoggingClientTest { @Test public void testLogging() throws Exception { + BeamFnLoggingMDC.setInstructionId("instruction-1"); AtomicBoolean clientClosedStream = new AtomicBoolean(); Collection values = new ConcurrentLinkedQueue<>(); AtomicReference> outboundServerObserver = @@ -178,6 +179,7 @@ public StreamObserver logging( @Test public void testWhenServerFailsThatClientIsAbleToCleanup() throws Exception { + BeamFnLoggingMDC.setInstructionId("instruction-1"); Collection values = new ConcurrentLinkedQueue<>(); AtomicReference> outboundServerObserver = new AtomicReference<>(); @@ -247,6 +249,7 @@ public StreamObserver logging( @Test public void testWhenServerHangsUpEarlyThatClientIsAbleCleanup() throws Exception { + BeamFnLoggingMDC.setInstructionId("instruction-1"); Collection values = new ConcurrentLinkedQueue<>(); AtomicReference> outboundServerObserver = new AtomicReference<>(); diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/logging/RestoreBeamFnLoggingMDC.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/logging/RestoreBeamFnLoggingMDC.java new file mode 100644 index 000000000000..003686233814 --- /dev/null +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/logging/RestoreBeamFnLoggingMDC.java @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.fn.harness.logging; + +import org.junit.rules.ExternalResource; + +/** Saves, clears and restores the current thread-local logging parameters for tests. 
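+ *
+ * <p>Intended to be installed as a JUnit {@code @Rule} so each test starts from, and leaves
+ * behind, a clean MDC, as {@code BeamFnLoggingClientTest} does:
+ *
+ * <pre>
+ * public TestRule restoreLogging = new RestoreBeamFnLoggingMDC(); // annotated with @Rule
+ * </pre>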
*/ +public class RestoreBeamFnLoggingMDC extends ExternalResource { + private String previousInstruction; + + public RestoreBeamFnLoggingMDC() {} + + @Override + protected void before() throws Throwable { + previousInstruction = BeamFnLoggingMDC.getInstructionId(); + BeamFnLoggingMDC.setInstructionId(null); + } + + @Override + protected void after() { + BeamFnLoggingMDC.setInstructionId(previousInstruction); + } +} diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/logging/RestoreBeamFnLoggingMDCTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/logging/RestoreBeamFnLoggingMDCTest.java new file mode 100644 index 000000000000..47166b0436d4 --- /dev/null +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/logging/RestoreBeamFnLoggingMDCTest.java @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.fn.harness.logging; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertNull; +import static org.junit.Assert.assertTrue; + +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.TestRule; +import org.junit.runner.Description; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; +import org.junit.runners.model.Statement; + +/** Tests for {@link RestoreBeamFnLoggingMDC}. */ +@RunWith(JUnit4.class) +public class RestoreBeamFnLoggingMDCTest { + @Rule public TestRule restoreMDCAfterTest = new RestoreBeamFnLoggingMDC(); + + @Test + public void testOldValuesAreRestored() throws Throwable { + // We need our own instance here so that we don't squash any saved values. 
+ TestRule restoreMDC = new RestoreBeamFnLoggingMDC(); + + final boolean[] evaluateRan = new boolean[1]; + BeamFnLoggingMDC.setInstructionId("oldInstruction"); + + restoreMDC + .apply( + new Statement() { + @Override + public void evaluate() { + evaluateRan[0] = true; + // Ensure parameters are cleared before the test runs + assertNull("null Instruction", BeamFnLoggingMDC.getInstructionId()); + + // Simulate updating parameters for the test + BeamFnLoggingMDC.setInstructionId("newInstruction"); + + // Ensure that the values changed + assertEquals("newInstruction", BeamFnLoggingMDC.getInstructionId()); + } + }, + Description.EMPTY) + .evaluate(); + + // Validate that the statement ran and that the values were reverted + assertTrue(evaluateRan[0]); + assertEquals("oldInstruction", BeamFnLoggingMDC.getInstructionId()); + } +} diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/BagUserStateTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/BagUserStateTest.java index 267319f73221..0edc9ea7f3a3 100644 --- a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/BagUserStateTest.java +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/BagUserStateTest.java @@ -25,7 +25,7 @@ import java.io.IOException; import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateKey; import org.apache.beam.sdk.coders.StringUtf8Coder; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.junit.Rule; @@ -36,9 +36,6 @@ /** Tests for {@link BagUserState}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BagUserStateTest { @Rule public ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/BeamFnStateGrpcClientCacheTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/BeamFnStateGrpcClientCacheTest.java index f274358b0ff3..75a871ab6eb6 100644 --- a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/BeamFnStateGrpcClientCacheTest.java +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/BeamFnStateGrpcClientCacheTest.java @@ -17,11 +17,11 @@ */ package org.apache.beam.fn.harness.state; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsString; import static org.junit.Assert.assertNotNull; import static org.junit.Assert.assertNotSame; import static org.junit.Assert.assertSame; -import static org.junit.Assert.assertThat; import static org.junit.Assert.fail; import java.util.UUID; @@ -36,14 +36,14 @@ import org.apache.beam.sdk.fn.IdGenerators; import org.apache.beam.sdk.fn.stream.OutboundObserverFactory; import org.apache.beam.sdk.fn.test.TestStreams; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ManagedChannel; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Server; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Status; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.StatusRuntimeException; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.inprocess.InProcessChannelBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.inprocess.InProcessServerBuilder; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.CallStreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ManagedChannel; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Server; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Status; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.StatusRuntimeException; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessChannelBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessServerBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.CallStreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.Uninterruptibles; import org.junit.After; import org.junit.Before; @@ -53,9 +53,6 @@ /** Tests for {@link BeamFnStateGrpcClientCache}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BeamFnStateGrpcClientCacheTest { private static final String SUCCESS = "SUCCESS"; private static final String FAIL = "FAIL"; diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/CachingBeamFnStateClientTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/CachingBeamFnStateClientTest.java new file mode 100644 index 000000000000..8604934224d8 --- /dev/null +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/CachingBeamFnStateClientTest.java @@ -0,0 +1,335 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.fn.harness.state; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertNull; +import static org.junit.Assert.assertTrue; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.concurrent.CompletableFuture; +import org.apache.beam.fn.harness.state.CachingBeamFnStateClient.StateCacheKey; +import org.apache.beam.model.fnexecution.v1.BeamFnApi; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.ProcessBundleRequest.CacheToken; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateAppendRequest; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateClearRequest; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateGetRequest; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateGetResponse; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateKey; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateRequest; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateResponse; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.CacheBuilder; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.CacheLoader; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LoadingCache; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.junit.Before; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Tests for {@link CachingBeamFnStateClient}. 
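+ *
+ * <p>The client under test wraps a delegate {@code BeamFnStateClient} with a Guava {@code
+ * LoadingCache} keyed by {@code StateKey}, plus the bundle's {@code CacheToken}s. A minimal
+ * sketch mirroring the setup below:
+ *
+ * <pre>
+ * LoadingCache&lt;StateKey, Map&lt;StateCacheKey, StateGetResponse&gt;&gt; cache =
+ *     CacheBuilder.newBuilder().build(loader);
+ * CachingBeamFnStateClient client = new CachingBeamFnStateClient(delegate, cache, cacheTokens);
+ * </pre>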
*/ +@RunWith(JUnit4.class) +public class CachingBeamFnStateClientTest { + + private LoadingCache> stateCache; + private List cacheTokenList; + private CacheToken userStateToken = + CacheToken.newBuilder() + .setUserState(CacheToken.UserState.getDefaultInstance()) + .setToken(ByteString.copyFromUtf8("1")) + .build(); + private StateCacheKey defaultCacheKey = + StateCacheKey.create(ByteString.copyFromUtf8("1"), ByteString.EMPTY); + private CacheLoader> loader = + new CacheLoader>() { + @Override + public Map load(StateKey key) { + return new HashMap<>(); + } + }; + + @Before + public void setup() { + stateCache = CacheBuilder.newBuilder().build(loader); + cacheTokenList = new ArrayList<>(); + } + + @Test + public void testNoCacheWithoutToken() throws Exception { + FakeBeamFnStateClient fakeClient = + new FakeBeamFnStateClient(ImmutableMap.of(key("A"), encode("A1", "A2", "A3"))); + + CachingBeamFnStateClient cachingClient = + new CachingBeamFnStateClient(fakeClient, stateCache, cacheTokenList); + + CompletableFuture response1 = new CompletableFuture<>(); + CompletableFuture response2 = new CompletableFuture<>(); + + StateRequest.Builder request = + StateRequest.newBuilder() + .setStateKey(key("A")) + .setGet(BeamFnApi.StateGetRequest.newBuilder().build()); + + cachingClient.handle(request, response1); + assertEquals(1, fakeClient.getCallCount()); + request.clearId(); + + cachingClient.handle(request, response2); + assertEquals(2, fakeClient.getCallCount()); + } + + @Test + public void testCachingUserState() throws Exception { + FakeBeamFnStateClient fakeClient = + new FakeBeamFnStateClient(ImmutableMap.of(key("A"), encode("A1", "A2", "A3")), 3); + + cacheTokenList.add(userStateToken); + + CachingBeamFnStateClient cachingClient = + new CachingBeamFnStateClient(fakeClient, stateCache, cacheTokenList); + + assertEquals(fakeClient.getData().get(key("A")), getALlDataForKey(key("A"), cachingClient)); + assertEquals(3, fakeClient.getCallCount()); + + assertEquals(fakeClient.getData().get(key("A")), getALlDataForKey(key("A"), cachingClient)); + assertEquals(3, fakeClient.getCallCount()); + } + + @Test + public void testCachingIterableSideInput() throws Exception { + StateKey iterableSideInput = + StateKey.newBuilder() + .setIterableSideInput( + StateKey.IterableSideInput.newBuilder() + .setTransformId("GBK") + .setSideInputId("Iterable") + .build()) + .build(); + + FakeBeamFnStateClient fakeClient = + new FakeBeamFnStateClient(ImmutableMap.of(iterableSideInput, encode("S1", "S2", "S3")), 3); + + CacheToken iterableToken = sideInputCacheToken("GBK", "Iterable"); + cacheTokenList.add(iterableToken); + + CachingBeamFnStateClient cachingClient = + new CachingBeamFnStateClient(fakeClient, stateCache, cacheTokenList); + + assertEquals( + fakeClient.getData().get(iterableSideInput), + getALlDataForKey(iterableSideInput, cachingClient)); + assertEquals(3, fakeClient.getCallCount()); + + assertEquals( + fakeClient.getData().get(iterableSideInput), + getALlDataForKey(iterableSideInput, cachingClient)); + assertEquals(3, fakeClient.getCallCount()); + } + + @Test + public void testCachingMultimapSideInput() throws Exception { + StateKey multimapKeys = + StateKey.newBuilder() + .setMultimapKeysSideInput( + StateKey.MultimapKeysSideInput.newBuilder() + .setTransformId("GBK") + .setSideInputId("Multimap") + .build()) + .build(); + + StateKey multimapValues = + StateKey.newBuilder() + .setMultimapSideInput( + StateKey.MultimapSideInput.newBuilder() + .setTransformId("GBK") + .setSideInputId("Multimap") + 
.setKey(encode("K1")) + .build()) + .build(); + + Map clientData = new HashMap<>(); + clientData.put(multimapKeys, encode("K1", "K2")); + clientData.put(multimapValues, encode("V1", "V2", "V3")); + + FakeBeamFnStateClient fakeClient = new FakeBeamFnStateClient(clientData, 3); + + CacheToken multimapToken = sideInputCacheToken("GBK", "Multimap"); + cacheTokenList.add(multimapToken); + + CachingBeamFnStateClient cachingClient = + new CachingBeamFnStateClient(fakeClient, stateCache, cacheTokenList); + + assertEquals( + fakeClient.getData().get(multimapKeys), getALlDataForKey(multimapKeys, cachingClient)); + assertEquals(2, fakeClient.getCallCount()); + + assertEquals( + fakeClient.getData().get(multimapKeys), getALlDataForKey(multimapKeys, cachingClient)); + assertEquals(2, fakeClient.getCallCount()); + + assertEquals( + fakeClient.getData().get(multimapValues), getALlDataForKey(multimapValues, cachingClient)); + assertEquals(5, fakeClient.getCallCount()); + + assertEquals( + fakeClient.getData().get(multimapValues), getALlDataForKey(multimapValues, cachingClient)); + assertEquals(5, fakeClient.getCallCount()); + } + + @Test + public void testAppendInvalidatesLastPage() throws Exception { + FakeBeamFnStateClient fakeClient = + new FakeBeamFnStateClient( + ImmutableMap.of(key("A"), encode("A1"), key("B"), encode("B1")), 3); + + cacheTokenList.add(userStateToken); + + CachingBeamFnStateClient cachingClient = + new CachingBeamFnStateClient(fakeClient, stateCache, cacheTokenList); + + // Append works with no pages in cache + appendToKey(key("A"), encode("A2"), cachingClient); + assertTrue(stateCache.getUnchecked(key("A")).isEmpty()); + assertEquals(fakeClient.getData().get(key("A")), getALlDataForKey(key("A"), cachingClient)); + assertEquals(3, fakeClient.getCallCount()); + + // Append works with multiple pages in cache + appendToKey(key("A"), encode("A3"), cachingClient); + assertFalse( + stateCache + .getUnchecked(key("A")) + .containsValue(StateGetResponse.newBuilder().setData(encode("A2")).build())); + assertEquals(fakeClient.getData().get(key("A")), getALlDataForKey(key("A"), cachingClient)); + assertEquals(6, fakeClient.getCallCount()); + + // Append works with one page in the cache + assertEquals(fakeClient.getData().get(key("B")), getALlDataForKey(key("B"), cachingClient)); + appendToKey(key("B"), encode("B2"), cachingClient); + assertTrue(stateCache.getUnchecked(key("B")).isEmpty()); + assertEquals(fakeClient.getData().get(key("B")), getALlDataForKey(key("B"), cachingClient)); + assertEquals(10, fakeClient.getCallCount()); + + // Append works with no prior data + appendToKey(key("C"), encode("C1"), cachingClient); + assertTrue(stateCache.getUnchecked(key("C")).isEmpty()); + assertEquals(fakeClient.getData().get(key("C")), getALlDataForKey(key("C"), cachingClient)); + } + + @Test + public void testCacheClear() throws Exception { + FakeBeamFnStateClient fakeClient = + new FakeBeamFnStateClient( + ImmutableMap.of(key("A"), encode("A1"), key("B"), encode("B1", "B2")), 3); + + cacheTokenList.add(userStateToken); + + CachingBeamFnStateClient cachingClient = + new CachingBeamFnStateClient(fakeClient, stateCache, cacheTokenList); + + // Test clear cacheable value + clearKey(key("A"), cachingClient); + assertEquals(1, fakeClient.getCallCount()); + assertNull(fakeClient.getData().get(key("A"))); + assertEquals(ByteString.EMPTY, getALlDataForKey(key("A"), cachingClient)); + assertEquals(1, fakeClient.getCallCount()); + + // Test clear cached value + getALlDataForKey(key("B"), cachingClient); + 
clearKey(key("B"), cachingClient); + assertEquals(4, fakeClient.getCallCount()); + assertNull(fakeClient.getData().get(key("B"))); + assertEquals(ByteString.EMPTY, getALlDataForKey(key("B"), cachingClient)); + assertEquals(4, fakeClient.getCallCount()); + } + + private StateKey key(String id) throws IOException { + return StateKey.newBuilder() + .setBagUserState( + StateKey.BagUserState.newBuilder() + .setTransformId("ptransformId") + .setUserStateId("stateId") + .setWindow(ByteString.copyFromUtf8("encodedWindow")) + .setKey(encode(id))) + .build(); + } + + private CacheToken sideInputCacheToken(String transformID, String sideInputID) { + return CacheToken.newBuilder() + .setSideInput( + CacheToken.SideInput.newBuilder() + .setTransformId(transformID) + .setSideInputId(sideInputID) + .build()) + .setToken(ByteString.copyFromUtf8("1")) + .build(); + } + + private ByteString encode(String... values) throws IOException { + ByteString.Output out = ByteString.newOutput(); + for (String value : values) { + StringUtf8Coder.of().encode(value, out); + } + return out.toByteString(); + } + + private void appendToKey(StateKey key, ByteString data, CachingBeamFnStateClient cachingClient) + throws Exception { + StateRequest.Builder appendRequestBuilder = + StateRequest.newBuilder() + .setStateKey(key) + .setAppend(StateAppendRequest.newBuilder().setData(data)); + + CompletableFuture appendResponse = new CompletableFuture<>(); + cachingClient.handle(appendRequestBuilder, appendResponse); + appendResponse.get(); + } + + private void clearKey(StateKey key, CachingBeamFnStateClient cachingClient) throws Exception { + StateRequest.Builder clearRequestBuilder = + StateRequest.newBuilder().setStateKey(key).setClear(StateClearRequest.getDefaultInstance()); + + CompletableFuture clearResponse = new CompletableFuture<>(); + cachingClient.handle(clearRequestBuilder, clearResponse); + clearResponse.get(); + } + + private ByteString getALlDataForKey(StateKey key, CachingBeamFnStateClient cachingClient) + throws Exception { + ByteString continuationToken = ByteString.EMPTY; + ByteString allData = ByteString.EMPTY; + StateRequest.Builder requestBuilder = StateRequest.newBuilder().setStateKey(key); + do { + requestBuilder + .clearId() + .setGet(StateGetRequest.newBuilder().setContinuationToken(continuationToken)); + CompletableFuture getResponse = new CompletableFuture<>(); + cachingClient.handle(requestBuilder, getResponse); + continuationToken = getResponse.get().getGet().getContinuationToken(); + allData = allData.concat(getResponse.get().getGet().getData()); + } while (!continuationToken.equals(ByteString.EMPTY)); + + return allData; + } +} diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/FakeBeamFnStateClient.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/FakeBeamFnStateClient.java index e6346523ea91..64c429280c4d 100644 --- a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/FakeBeamFnStateClient.java +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/FakeBeamFnStateClient.java @@ -20,7 +20,9 @@ import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNotEquals; +import java.util.ArrayList; import java.util.Collections; +import java.util.List; import java.util.Map; import java.util.concurrent.CompletableFuture; import java.util.concurrent.ConcurrentHashMap; @@ -32,19 +34,42 @@ import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateRequest; import 
org.apache.beam.model.fnexecution.v1.BeamFnApi.StateRequest.RequestCase; import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateResponse; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; /** A fake implementation of a {@link BeamFnStateClient} to aid with testing. */ public class FakeBeamFnStateClient implements BeamFnStateClient { - private final Map data; + private final Map> data; private int currentId; public FakeBeamFnStateClient(Map initialData) { - this.data = new ConcurrentHashMap<>(initialData); + this(initialData, 6); + } + + public FakeBeamFnStateClient(Map initialData, int chunkSize) { + this.data = + new ConcurrentHashMap<>( + Maps.transformValues( + initialData, + (ByteString all) -> { + List chunks = new ArrayList<>(); + for (int i = 0; i < Math.max(1, all.size()); i += chunkSize) { + chunks.add(all.substring(i, Math.min(all.size(), i + chunkSize))); + } + return chunks; + })); } public Map getData() { - return Collections.unmodifiableMap(data); + return Maps.transformValues( + data, + bs -> { + ByteString all = ByteString.EMPTY; + for (ByteString b : bs) { + all = all.concat(b); + } + return all; + }); } @Override @@ -67,16 +92,16 @@ public void handle( switch (request.getRequestCase()) { case GET: - // Chunk gets into 5 byte return blocks - ByteString byteString = data.getOrDefault(request.getStateKey(), ByteString.EMPTY); + // Chunk gets into 6 byte return blocks + List byteStrings = + data.getOrDefault(request.getStateKey(), Collections.singletonList(ByteString.EMPTY)); int block = 0; if (request.getGet().getContinuationToken().size() > 0) { block = Integer.parseInt(request.getGet().getContinuationToken().toStringUtf8()); } - ByteString returnBlock = - byteString.substring(block * 5, Math.min(byteString.size(), (block + 1) * 5)); + ByteString returnBlock = byteStrings.get(block); ByteString continuationToken = ByteString.EMPTY; - if (byteString.size() > (block + 1) * 5) { + if (byteStrings.size() > block + 1) { continuationToken = ByteString.copyFromUtf8(Integer.toString(block + 1)); } response = @@ -93,8 +118,18 @@ public void handle( break; case APPEND: - ByteString previousValue = data.getOrDefault(request.getStateKey(), ByteString.EMPTY); - data.put(request.getStateKey(), previousValue.concat(request.getAppend().getData())); + List previousValue = + data.getOrDefault(request.getStateKey(), Collections.singletonList(ByteString.EMPTY)); + List newValue = new ArrayList<>(); + newValue.addAll(previousValue); + ByteString newData = request.getAppend().getData(); + if (newData.size() % 2 == 0) { + newValue.remove(newValue.size() - 1); + newValue.add(previousValue.get(previousValue.size() - 1).concat(newData)); + } else { + newValue.add(newData); + } + data.put(request.getStateKey(), newValue); response = StateResponse.newBuilder().setAppend(StateAppendResponse.getDefaultInstance()); break; @@ -109,4 +144,8 @@ public void handle( private String generateId() { return Integer.toString(++currentId); } + + public int getCallCount() { + return currentId; + } } diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/LazyCachingIteratorToIterableTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/LazyCachingIteratorToIterableTest.java index 7597128dcfad..0914b017e703 100644 --- 
a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/LazyCachingIteratorToIterableTest.java +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/LazyCachingIteratorToIterableTest.java @@ -25,8 +25,11 @@ import java.util.Iterator; import java.util.NoSuchElementException; +import org.apache.beam.sdk.fn.stream.PrefetchableIterable; +import org.apache.beam.sdk.fn.stream.PrefetchableIterator; +import org.apache.beam.sdk.fn.stream.PrefetchableIterators; +import org.apache.beam.sdk.fn.stream.PrefetchableIteratorsTest.ReadyAfterPrefetch; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterators; import org.junit.Rule; import org.junit.Test; import org.junit.rules.ExpectedException; @@ -40,7 +43,8 @@ public class LazyCachingIteratorToIterableTest { @Test public void testEmptyIterator() { - Iterable iterable = new LazyCachingIteratorToIterable<>(Iterators.forArray()); + Iterable iterable = + new LazyCachingIteratorToIterable<>(PrefetchableIterators.emptyIterator()); assertArrayEquals(new Object[0], Iterables.toArray(iterable, Object.class)); // iterate multiple times assertArrayEquals(new Object[0], Iterables.toArray(iterable, Object.class)); @@ -52,7 +56,7 @@ public void testEmptyIterator() { @Test public void testInterleavedIteration() { Iterable iterable = - new LazyCachingIteratorToIterable<>(Iterators.forArray("A", "B", "C")); + new LazyCachingIteratorToIterable<>(PrefetchableIterators.fromArray("A", "B", "C")); Iterator iterator1 = iterable.iterator(); assertTrue(iterator1.hasNext()); @@ -77,14 +81,45 @@ public void testInterleavedIteration() { @Test public void testEqualsAndHashCode() { - Iterable iterA = new LazyCachingIteratorToIterable<>(Iterators.forArray("A", "B", "C")); - Iterable iterB = new LazyCachingIteratorToIterable<>(Iterators.forArray("A", "B", "C")); - Iterable iterC = new LazyCachingIteratorToIterable<>(Iterators.forArray()); - Iterable iterD = new LazyCachingIteratorToIterable<>(Iterators.forArray()); + Iterable iterA = + new LazyCachingIteratorToIterable<>(PrefetchableIterators.fromArray("A", "B", "C")); + Iterable iterB = + new LazyCachingIteratorToIterable<>(PrefetchableIterators.fromArray("A", "B", "C")); + Iterable iterC = new LazyCachingIteratorToIterable<>(PrefetchableIterators.fromArray()); + Iterable iterD = new LazyCachingIteratorToIterable<>(PrefetchableIterators.fromArray()); assertEquals(iterA, iterB); assertEquals(iterC, iterD); assertNotEquals(iterA, iterC); assertEquals(iterA.hashCode(), iterB.hashCode()); assertEquals(iterC.hashCode(), iterD.hashCode()); } + + @Test + public void testPrefetch() { + ReadyAfterPrefetch underlying = + new ReadyAfterPrefetch<>(PrefetchableIterators.fromArray("A", "B", "C")); + PrefetchableIterable iterable = new LazyCachingIteratorToIterable<>(underlying); + PrefetchableIterator iterator1 = iterable.iterator(); + PrefetchableIterator iterator2 = iterable.iterator(); + + // Check that the lazy iterable doesn't do any prefetch/access on instantiation + assertFalse(underlying.isReady()); + assertFalse(iterator1.isReady()); + assertFalse(iterator2.isReady()); + + // Check that if both iterators prefetch there is only one prefetch for the underlying iterator + // iterator. 
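+    // Both iterators read from the shared cache, so once the first prefetch has been forwarded to
+    // the underlying iterator the second prefetch is a no-op.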
+ iterator1.prefetch(); + assertEquals(1, underlying.getNumPrefetchCalls()); + iterator2.prefetch(); + assertEquals(1, underlying.getNumPrefetchCalls()); + + // Check that if that one iterator has advanced, the second doesn't perform any prefetch since + // the element is now cached. + iterator1.next(); + iterator1.next(); + iterator2.next(); + iterator2.prefetch(); + assertEquals(1, underlying.getNumPrefetchCalls()); + } } diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/MultimapSideInputTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/MultimapSideInputTest.java index 635c111f4d96..2bb7f156072e 100644 --- a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/MultimapSideInputTest.java +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/MultimapSideInputTest.java @@ -22,7 +22,7 @@ import java.io.IOException; import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateKey; import org.apache.beam.sdk.coders.StringUtf8Coder; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.junit.Test; diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/StateBackedIterableTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/StateBackedIterableTest.java index 8717c671a27e..4221a4ef85e3 100644 --- a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/StateBackedIterableTest.java +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/StateBackedIterableTest.java @@ -22,6 +22,8 @@ import java.io.ByteArrayInputStream; import java.io.ByteArrayOutputStream; import java.io.IOException; +import java.io.ObjectInputStream; +import java.io.ObjectOutputStream; import java.util.ArrayList; import java.util.Arrays; import java.util.Collections; @@ -30,7 +32,7 @@ import java.util.Random; import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateKey; import org.apache.beam.sdk.coders.StringUtf8Coder; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.FluentIterable; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; @@ -45,7 +47,6 @@ @RunWith(Enclosed.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class StateBackedIterableTest { @@ -172,6 +173,38 @@ public void testEncodeDecodeStateBackedIterable() throws Exception { assertEquals(iterable.prefix, result.prefix); assertEquals(iterable.request, result.request); } + + @Test + public void testSerializability() throws Exception { + FakeBeamFnStateClient fakeBeamFnStateClient = + new FakeBeamFnStateClient( + ImmutableMap.of( + key("suffix"), encode("C", "D", "E"), + key("emptySuffix"), encode())); + + StateBackedIterable iterable = + new StateBackedIterable<>( + fakeBeamFnStateClient, + "instruction", + encode("suffix"), + StringUtf8Coder.of(), + ImmutableList.of("A", "B")); + + List expected = ImmutableList.of("A", "B", 
"C", "D", "E"); + + ByteArrayOutputStream bout = new ByteArrayOutputStream(); + ObjectOutputStream out = new ObjectOutputStream(bout); + out.writeObject(iterable); + out.flush(); + ByteArrayInputStream bin = new ByteArrayInputStream(bout.toByteArray()); + ObjectInputStream in = new ObjectInputStream(bin); + Iterable deserialized = (Iterable) in.readObject(); + + // Check that the contents are the same. + assertEquals(expected, Lists.newArrayList(deserialized)); + // Check that we can still iterate over it as before. + assertEquals(expected, Lists.newArrayList(iterable)); + } } private static StateKey key(String id) throws IOException { diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/StateFetchingIteratorsTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/StateFetchingIteratorsTest.java index d184ca2c4daa..384d2df69219 100644 --- a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/StateFetchingIteratorsTest.java +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/StateFetchingIteratorsTest.java @@ -18,20 +18,74 @@ package org.apache.beam.fn.harness.state; import static org.junit.Assert.assertArrayEquals; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertTrue; -import java.util.Iterator; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.List; +import java.util.concurrent.CompletableFuture; +import java.util.concurrent.atomic.AtomicInteger; +import java.util.stream.Collectors; +import org.apache.beam.fn.harness.state.StateFetchingIterators.FirstPageAndRemainder; import org.apache.beam.fn.harness.state.StateFetchingIterators.LazyBlockingStateFetchingIterator; import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateGetResponse; import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateRequest; import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateResponse; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterators; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.CoderException; +import org.apache.beam.sdk.coders.VarIntCoder; +import org.apache.beam.sdk.fn.stream.PrefetchableIterable; +import org.apache.beam.sdk.fn.stream.PrefetchableIterator; +import org.apache.beam.sdk.util.CoderUtils; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.junit.Test; +import org.junit.experimental.runners.Enclosed; import org.junit.runner.RunWith; import org.junit.runners.JUnit4; /** Tests for {@link StateFetchingIterators}. */ +@RunWith(Enclosed.class) public class StateFetchingIteratorsTest { + + private static BeamFnStateClient fakeStateClient( + AtomicInteger callCount, ByteString... 
expected) { + return (requestBuilder, response) -> { + callCount.incrementAndGet(); + if (expected.length == 0) { + response.complete( + StateResponse.newBuilder() + .setId(requestBuilder.getId()) + .setGet(StateGetResponse.newBuilder()) + .build()); + return; + } + + ByteString continuationToken = requestBuilder.getGet().getContinuationToken(); + + int requestedPosition = 0; // Default position is 0 + if (!ByteString.EMPTY.equals(continuationToken)) { + requestedPosition = Integer.parseInt(continuationToken.toStringUtf8()); + } + + // Compute the new continuation token + ByteString newContinuationToken = ByteString.EMPTY; + if (requestedPosition != expected.length - 1) { + newContinuationToken = ByteString.copyFromUtf8(Integer.toString(requestedPosition + 1)); + } + response.complete( + StateResponse.newBuilder() + .setId(requestBuilder.getId()) + .setGet( + StateGetResponse.newBuilder() + .setData(expected[requestedPosition]) + .setContinuationToken(newContinuationToken)) + .build()); + }; + } + /** Tests for {@link StateFetchingIterators.LazyBlockingStateFetchingIterator}. */ @RunWith(JUnit4.class) public static class LazyBlockingStateFetchingIteratorTest { @@ -67,34 +121,127 @@ public void testMultiWithEmptyByteStrings() throws Exception { ByteString.EMPTY); } - private void testFetch(ByteString... expected) { + @Test + public void testPrefetchIgnoredWhenExistingPrefetchOngoing() throws Exception { + AtomicInteger callCount = new AtomicInteger(); BeamFnStateClient fakeStateClient = - (requestBuilder, response) -> { - ByteString continuationToken = requestBuilder.getGet().getContinuationToken(); - - int requestedPosition = 0; // Default position is 0 - if (!ByteString.EMPTY.equals(continuationToken)) { - requestedPosition = Integer.parseInt(continuationToken.toStringUtf8()); + new BeamFnStateClient() { + @Override + public void handle( + StateRequest.Builder requestBuilder, CompletableFuture response) { + callCount.incrementAndGet(); } - - // Compute the new continuation token - ByteString newContinuationToken = ByteString.EMPTY; - if (requestedPosition != expected.length - 1) { - newContinuationToken = - ByteString.copyFromUtf8(Integer.toString(requestedPosition + 1)); - } - response.complete( - StateResponse.newBuilder() - .setId(requestBuilder.getId()) - .setGet( - StateGetResponse.newBuilder() - .setData(expected[requestedPosition]) - .setContinuationToken(newContinuationToken)) - .build()); }; - Iterator byteStrings = + PrefetchableIterator byteStrings = + new LazyBlockingStateFetchingIterator(fakeStateClient, StateRequest.getDefaultInstance()); + assertEquals(0, callCount.get()); + byteStrings.prefetch(); + assertEquals(1, callCount.get()); // first prefetch + byteStrings.prefetch(); + assertEquals(1, callCount.get()); // subsequent is ignored + } + + private void testFetch(ByteString... expected) { + AtomicInteger callCount = new AtomicInteger(); + BeamFnStateClient fakeStateClient = fakeStateClient(callCount, expected); + PrefetchableIterator byteStrings = new LazyBlockingStateFetchingIterator(fakeStateClient, StateRequest.getDefaultInstance()); - assertArrayEquals(expected, Iterators.toArray(byteStrings, Object.class)); + assertEquals(0, callCount.get()); // Ensure it's fully lazy. 
+ assertFalse(byteStrings.isReady()); + + // Prefetch every second element in the iterator capturing the results + List results = new ArrayList<>(); + for (int i = 0; i < expected.length; ++i) { + if (i % 2 == 0) { + // Ensure that prefetch performs the call + byteStrings.prefetch(); + assertEquals(i + 1, callCount.get()); + assertTrue(byteStrings.isReady()); + } + assertTrue(byteStrings.hasNext()); + results.add(byteStrings.next()); + } + assertFalse(byteStrings.hasNext()); + assertTrue(byteStrings.isReady()); + + assertEquals(Arrays.asList(expected), results); + } + } + + @RunWith(JUnit4.class) + public static class FirstPageAndRemainderTest { + + @Test + public void testEmptyValues() throws Exception { + testFetchValues(VarIntCoder.of()); + } + + @Test + public void testOneValue() throws Exception { + testFetchValues(VarIntCoder.of(), 4); + } + + @Test + public void testManyValues() throws Exception { + testFetchValues(VarIntCoder.of(), 1, 22, 333, 4444, 55555, 666666); + } + + private void testFetchValues(Coder coder, T... expected) { + List byteStrings = + Arrays.stream(expected) + .map( + value -> { + try { + return CoderUtils.encodeToByteArray(coder, value); + } catch (CoderException exn) { + throw new RuntimeException(exn); + } + }) + .map(ByteString::copyFrom) + .collect(Collectors.toList()); + + AtomicInteger callCount = new AtomicInteger(); + BeamFnStateClient fakeStateClient = + fakeStateClient(callCount, Iterables.toArray(byteStrings, ByteString.class)); + PrefetchableIterable values = + new FirstPageAndRemainder<>(fakeStateClient, StateRequest.getDefaultInstance(), coder); + + // Ensure it's fully lazy. + assertEquals(0, callCount.get()); + PrefetchableIterator valuesIter = values.iterator(); + assertFalse(valuesIter.isReady()); + assertEquals(0, callCount.get()); + + // Ensure that the first page result is cached across multiple iterators and subsequent + // iterators are ready and prefetch does nothing + valuesIter.prefetch(); + assertTrue(valuesIter.isReady()); + assertEquals(1, callCount.get()); + + PrefetchableIterator valuesIter2 = values.iterator(); + assertTrue(valuesIter2.isReady()); + valuesIter2.prefetch(); + assertEquals(1, callCount.get()); + + // Prefetch every second element in the iterator capturing the results + List results = new ArrayList<>(); + for (int i = 0; i < expected.length; ++i) { + if (i % 2 == 1) { + // Ensure that prefetch performs the call + valuesIter2.prefetch(); + assertTrue(valuesIter2.isReady()); + // Note that this is i+2 because we expect to prefetch the page after the current one + // We also have to bound it to the max number of pages + assertEquals(Math.min(i + 2, expected.length), callCount.get()); + } + assertTrue(valuesIter2.hasNext()); + results.add(valuesIter2.next()); + } + assertFalse(valuesIter2.hasNext()); + assertTrue(valuesIter2.isReady()); + + // The contents agree. + assertArrayEquals(expected, Iterables.toArray(values, Object.class)); } } } diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/status/BeamFnStatusClientTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/status/BeamFnStatusClientTest.java new file mode 100644 index 000000000000..196252fc8803 --- /dev/null +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/status/BeamFnStatusClientTest.java @@ -0,0 +1,142 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.fn.harness.status; + +import static org.hamcrest.MatcherAssert.assertThat; +import static org.hamcrest.Matchers.containsString; +import static org.hamcrest.Matchers.is; +import static org.hamcrest.Matchers.not; +import static org.hamcrest.Matchers.stringContainsInOrder; +import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.when; + +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.StringJoiner; +import java.util.UUID; +import java.util.concurrent.BlockingQueue; +import java.util.concurrent.LinkedBlockingQueue; +import org.apache.beam.fn.harness.control.ProcessBundleHandler; +import org.apache.beam.fn.harness.control.ProcessBundleHandler.BundleProcessor; +import org.apache.beam.fn.harness.control.ProcessBundleHandler.BundleProcessorCache; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.WorkerStatusRequest; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.WorkerStatusResponse; +import org.apache.beam.model.fnexecution.v1.BeamFnWorkerStatusGrpc.BeamFnWorkerStatusImplBase; +import org.apache.beam.model.pipeline.v1.Endpoints; +import org.apache.beam.runners.core.metrics.ExecutionStateTracker; +import org.apache.beam.sdk.fn.channel.ManagedChannelFactory; +import org.apache.beam.sdk.fn.test.InProcessManagedChannelFactory; +import org.apache.beam.sdk.fn.test.TestStreams; +import org.apache.beam.sdk.options.PipelineOptionsFactory; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Server; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.inprocess.InProcessServerBuilder; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.Uninterruptibles; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +@RunWith(JUnit4.class) +public class BeamFnStatusClientTest { + private final Endpoints.ApiServiceDescriptor apiServiceDescriptor = + Endpoints.ApiServiceDescriptor.newBuilder() + .setUrl(this.getClass().getName() + "-" + UUID.randomUUID().toString()) + .build(); + + @Test + public void testActiveBundleState() { + ProcessBundleHandler handler = mock(ProcessBundleHandler.class); + BundleProcessorCache processorCache = mock(BundleProcessorCache.class); + Map bundleProcessorMap = new HashMap<>(); + for (int i = 0; i < 11; i++) { + BundleProcessor processor = mock(BundleProcessor.class); + ExecutionStateTracker executionStateTracker = mock(ExecutionStateTracker.class); + when(processor.getStateTracker()).thenReturn(executionStateTracker); + when(executionStateTracker.getMillisSinceLastTransition()) + .thenReturn(Integer.toUnsignedLong((10 - i) * 1000)); + 
when(executionStateTracker.getTrackedThread()).thenReturn(Thread.currentThread()); + String instruction = Integer.toString(i); + when(processorCache.find(instruction)).thenReturn(processor); + bundleProcessorMap.put(instruction, processor); + } + when(handler.getBundleProcessorCache()).thenReturn(processorCache); + when(processorCache.getActiveBundleProcessors()).thenReturn(bundleProcessorMap); + + ManagedChannelFactory channelFactory = InProcessManagedChannelFactory.create(); + BeamFnStatusClient client = + new BeamFnStatusClient( + apiServiceDescriptor, + channelFactory::forDescriptor, + handler.getBundleProcessorCache(), + PipelineOptionsFactory.create()); + StringJoiner joiner = new StringJoiner("\n"); + joiner.add(client.getActiveProcessBundleState()); + String actualState = joiner.toString(); + + List expectedInstructions = new ArrayList<>(); + for (int i = 0; i < 10; i++) { + expectedInstructions.add(String.format("Instruction %d", i)); + } + assertThat(actualState, stringContainsInOrder(expectedInstructions)); + assertThat(actualState, not(containsString("Instruction 10"))); + } + + @Test + public void testWorkerStatusResponse() throws Exception { + BlockingQueue values = new LinkedBlockingQueue<>(); + BlockingQueue> requestObservers = + new LinkedBlockingQueue<>(); + StreamObserver inboundServerObserver = + TestStreams.withOnNext(values::add).build(); + Server server = + InProcessServerBuilder.forName(apiServiceDescriptor.getUrl()) + .addService( + new BeamFnWorkerStatusImplBase() { + @Override + public StreamObserver workerStatus( + StreamObserver responseObserver) { + Uninterruptibles.putUninterruptibly(requestObservers, responseObserver); + return inboundServerObserver; + } + }) + .build(); + server.start(); + + try { + BundleProcessorCache processorCache = mock(BundleProcessorCache.class); + when(processorCache.getActiveBundleProcessors()).thenReturn(Collections.emptyMap()); + ManagedChannelFactory channelFactory = InProcessManagedChannelFactory.create(); + BeamFnStatusClient client = + new BeamFnStatusClient( + apiServiceDescriptor, + channelFactory::forDescriptor, + processorCache, + PipelineOptionsFactory.create()); + StreamObserver requestObserver = requestObservers.take(); + requestObserver.onNext(WorkerStatusRequest.newBuilder().setId("id").build()); + WorkerStatusResponse response = values.take(); + assertThat(response.getStatusInfo(), containsString("No active processing bundles.")); + assertThat(response.getId(), is("id")); + } finally { + server.shutdownNow(); + } + } +} diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/status/MemoryMonitorTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/status/MemoryMonitorTest.java new file mode 100644 index 000000000000..2fa407cf9bf6 --- /dev/null +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/status/MemoryMonitorTest.java @@ -0,0 +1,166 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.fn.harness.status; + +import static org.hamcrest.MatcherAssert.assertThat; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertNotNull; +import static org.junit.Assert.assertTrue; + +import java.io.File; +import java.io.IOException; +import java.util.concurrent.Semaphore; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicBoolean; +import org.hamcrest.Matchers; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.TemporaryFolder; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** + * Test the memory monitor will block threads when the server is in a (faked) GC thrashing state. + */ +@RunWith(JUnit4.class) +public class MemoryMonitorTest { + + @Rule public TemporaryFolder tempFolder = new TemporaryFolder(); + + static class FakeGCStatsProvider implements MemoryMonitor.GCStatsProvider { + AtomicBoolean inGCThrashingState = new AtomicBoolean(false); + long lastCallTimestamp = System.currentTimeMillis(); + long lastGCResult = 0; + + @Override + public long totalGCTimeMilliseconds() { + if (inGCThrashingState.get()) { + long now = System.currentTimeMillis(); + lastGCResult += now - lastCallTimestamp; + lastCallTimestamp = now; + } + return lastGCResult; + } + } + + private FakeGCStatsProvider provider; + private File localDumpFolder; + private MemoryMonitor monitor; + private Thread thread; + + @Before + public void setup() throws IOException { + provider = new FakeGCStatsProvider(); + localDumpFolder = tempFolder.newFolder(); + // Update every 10ms, never shutdown VM. 
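+    // The 50.0 argument is the GC-thrashing threshold percentage (100.0 effectively disables the
+    // monitor, see disableMemoryMonitor) and the null argument leaves heap dump upload disabled
+    // (see uploadFileDisabled).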
+ monitor = MemoryMonitor.forTest(provider, 10, 0, false, 50.0, null, localDumpFolder); + thread = new Thread(monitor); + thread.start(); + } + + @Test(timeout = 1000) + public void detectGCThrashing() throws InterruptedException { + monitor.waitForRunning(); + monitor.waitForResources("Test1"); + provider.inGCThrashingState.set(true); + monitor.waitForThrashingState(true); + final Semaphore s = new Semaphore(0); + new Thread( + () -> { + monitor.waitForResources("Test2"); + s.release(); + }) + .start(); + assertFalse(s.tryAcquire(100, TimeUnit.MILLISECONDS)); + provider.inGCThrashingState.set(false); + monitor.waitForThrashingState(false); + assertTrue(s.tryAcquire(100, TimeUnit.MILLISECONDS)); + monitor.waitForResources("Test3"); + } + + @Test + public void heapDumpOnce() throws Exception { + File folder = tempFolder.newFolder(); + + File dump1 = MemoryMonitor.dumpHeap(folder); + assertNotNull(dump1); + assertTrue(dump1.exists()); + assertThat(dump1.getParentFile(), Matchers.equalTo(folder)); + } + + @Test + public void heapDumpTwice() throws Exception { + File folder = tempFolder.newFolder(); + + File dump1 = MemoryMonitor.dumpHeap(folder); + assertNotNull(dump1); + assertTrue(dump1.exists()); + assertThat(dump1.getParentFile(), Matchers.equalTo(folder)); + + File dump2 = MemoryMonitor.dumpHeap(folder); + assertNotNull(dump2); + assertTrue(dump2.exists()); + assertThat(dump2.getParentFile(), Matchers.equalTo(folder)); + } + + @Test + public void uploadFile() throws Exception { + File remoteFolder = tempFolder.newFolder(); + monitor = + MemoryMonitor.forTest(provider, 10, 0, true, 50.0, remoteFolder.getPath(), localDumpFolder); + + // Force the monitor to generate a local heap dump + monitor.dumpHeap(); + + // Try to upload the heap dump + assertTrue(monitor.tryUploadHeapDumpIfItExists()); + + File[] files = remoteFolder.listFiles(); + assertThat(files, Matchers.arrayWithSize(1)); + assertThat(files[0].getAbsolutePath(), Matchers.containsString("heap_dump")); + assertThat(files[0].getAbsolutePath(), Matchers.containsString("hprof")); + } + + @Test + public void uploadFileDisabled() throws Exception { + monitor = MemoryMonitor.forTest(provider, 10, 0, true, 50.0, null, localDumpFolder); + + // Force the monitor to generate a local heap dump + monitor.dumpHeap(); + + // Try to upload the heap dump + assertFalse(monitor.tryUploadHeapDumpIfItExists()); + } + + @Test + public void disableMemoryMonitor() throws Exception { + MemoryMonitor disabledMonitor = + MemoryMonitor.forTest(provider, 10, 0, true, 100.0, null, localDumpFolder); + Thread disabledMonitorThread = new Thread(disabledMonitor); + disabledMonitorThread.start(); + + // Monitor thread should stop quickly after starting. Wait 10 seconds, and check that monitor + // thread is not alive. + disabledMonitorThread.join(10000); + assertFalse(disabledMonitorThread.isAlive()); + + // Enabled monitor thread should still be running. 
+ assertTrue(thread.isAlive()); + } +} diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/stream/HarnessStreamObserverFactoriesTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/stream/HarnessStreamObserverFactoriesTest.java index e5e7ec60345f..b828b36f9265 100644 --- a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/stream/HarnessStreamObserverFactoriesTest.java +++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/stream/HarnessStreamObserverFactoriesTest.java @@ -17,16 +17,16 @@ */ package org.apache.beam.fn.harness.stream; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.instanceOf; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import org.apache.beam.sdk.fn.stream.BufferingStreamObserver; import org.apache.beam.sdk.fn.stream.DirectStreamObserver; import org.apache.beam.sdk.fn.stream.ForwardingClientResponseObserver; import org.apache.beam.sdk.options.PipelineOptionsFactory; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.CallStreamObserver; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.CallStreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.junit.Before; import org.junit.Test; import org.junit.runner.RunWith; @@ -36,9 +36,6 @@ /** Tests for {@link HarnessStreamObserverFactoriesTest}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class HarnessStreamObserverFactoriesTest { @Mock private StreamObserver mockRequestObserver; @Mock private CallStreamObserver mockResponseObserver; diff --git a/sdks/java/io/amazon-web-services/OWNERS b/sdks/java/io/amazon-web-services/OWNERS new file mode 100644 index 000000000000..ffbc8b6c2475 --- /dev/null +++ b/sdks/java/io/amazon-web-services/OWNERS @@ -0,0 +1,4 @@ +# See the OWNERS docs at https://s.apache.org/beam-owners + +reviewers: + - aromanenko-dev diff --git a/sdks/java/io/amazon-web-services/build.gradle b/sdks/java/io/amazon-web-services/build.gradle index c7c3874b2af4..27a0346df1c9 100644 --- a/sdks/java/io/amazon-web-services/build.gradle +++ b/sdks/java/io/amazon-web-services/build.gradle @@ -39,11 +39,12 @@ dependencies { compile library.java.aws_java_sdk_sns compile library.java.aws_java_sdk_sqs compile library.java.aws_java_sdk_sts - compile "commons-lang:commons-lang:2.6" compile library.java.jackson_core compile library.java.jackson_annotations compile library.java.jackson_databind compile library.java.slf4j_api + compile library.java.joda_time + compile library.java.http_core runtime library.java.commons_codec runtime "org.apache.httpcomponents:httpclient:4.5.12" testCompile project(path: ":sdks:java:core", configuration: "shadowTest") @@ -58,6 +59,7 @@ dependencies { testCompile 'org.elasticmq:elasticmq-rest-sqs_2.12:0.15.6' testRuntimeOnly library.java.slf4j_jdk14 testRuntimeOnly project(path: ":runners:direct-java", configuration: "shadow") + testCompile project(path: ":runners:core-construction-java", configuration: "testRuntime") } test { diff --git a/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/dynamodb/DynamoDBIO.java b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/dynamodb/DynamoDBIO.java index 319f2bea5b4e..9f78a87057fb 100644 --- 
a/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/dynamodb/DynamoDBIO.java +++ b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/dynamodb/DynamoDBIO.java @@ -32,6 +32,7 @@ import java.io.Serializable; import java.util.ArrayList; import java.util.Collections; +import java.util.HashMap; import java.util.List; import java.util.Map; import java.util.function.Predicate; @@ -97,6 +98,11 @@ * writeRequest> * * + * If primary keys could repeat in your stream (i.e. an upsert stream), you could encounter a + * ValidationError, as AWS does not allow writing duplicate keys within a single batch operation. + * For such use cases, you can explicitly set the key names corresponding to the primary key to be + * deduplicated using the withDeduplicateKeys method + * *
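+ * <p>For example (the key names and the mapper function below are placeholders; use your table's
+ * actual primary key attribute names):
+ *
+ * <pre>{@code
+ * DynamoDBIO.<T>write()
+ *     .withWriteRequestMapperFn(writeRequestMapperFn)
+ *     .withDeduplicateKeys(Arrays.asList("hashKey", "rangeKey"))
+ * }</pre>
+ *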

 * <h3>Reading from DynamoDB</h3>
 *
 * <p>
    Example usage: @@ -131,7 +137,7 @@ public static Read read() { } public static Write write() { - return new AutoValue_DynamoDBIO_Write.Builder().build(); + return new AutoValue_DynamoDBIO_Write.Builder().setDeduplicateKeys(new ArrayList<>()).build(); } /** Read data from DynamoDB and return ScanResult. */ @@ -249,10 +255,19 @@ private static class ReadFn extends DoFn, T> { @ProcessElement public void processElement(@Element Read spec, OutputReceiver out) { AmazonDynamoDB client = spec.getAwsClientsProvider().createDynamoDB(); - ScanRequest scanRequest = spec.getScanRequestFn().apply(null); - scanRequest.setSegment(spec.getSegmentId()); - ScanResult scanResult = client.scan(scanRequest); - out.output(spec.getScanResultMapperFn().apply(scanResult)); + Map lastEvaluatedKey = null; + + do { + ScanRequest scanRequest = spec.getScanRequestFn().apply(null); + scanRequest.setSegment(spec.getSegmentId()); + if (lastEvaluatedKey != null) { + scanRequest.withExclusiveStartKey(lastEvaluatedKey); + } + + ScanResult scanResult = client.scan(scanRequest); + out.output(spec.getScanResultMapperFn().apply(scanResult)); + lastEvaluatedKey = scanResult.getLastEvaluatedKey(); + } while (lastEvaluatedKey != null); // iterate until all records are fetched } } @@ -346,6 +361,8 @@ public abstract static class Write extends PTransform, PCollec abstract @Nullable SerializableFunction> getWriteItemMapperFn(); + abstract List getDeduplicateKeys(); + abstract Builder builder(); @AutoValue.Builder @@ -358,6 +375,8 @@ abstract static class Builder { abstract Builder setWriteItemMapperFn( SerializableFunction> writeItemMapperFn); + abstract Builder setDeduplicateKeys(List deduplicateKeys); + abstract Write build(); } @@ -406,6 +425,10 @@ public Write withWriteRequestMapperFn( return builder().setWriteItemMapperFn(writeItemMapperFn).build(); } + public Write withDeduplicateKeys(List deduplicateKeys) { + return builder().setDeduplicateKeys(deduplicateKeys).build(); + } + @Override public PCollection expand(PCollection input) { return input.apply(ParDo.of(new WriteFn<>(this))); @@ -424,7 +447,7 @@ static class WriteFn extends DoFn { private static final int BATCH_SIZE = 25; private transient AmazonDynamoDB client; private final DynamoDBIO.Write spec; - private List> batch; + private Map>, KV> batch; WriteFn(DynamoDBIO.Write spec) { this.spec = spec; @@ -447,19 +470,35 @@ public void setup() { @StartBundle public void startBundle(StartBundleContext context) { - batch = new ArrayList<>(); + batch = new HashMap<>(); } @ProcessElement public void processElement(ProcessContext context) throws Exception { final KV writeRequest = (KV) spec.getWriteItemMapperFn().apply(context.element()); - batch.add(writeRequest); + batch.put( + KV.of(writeRequest.getKey(), extractDeduplicateKeyValues(writeRequest.getValue())), + writeRequest); if (batch.size() >= BATCH_SIZE) { flushBatch(); } } + private Map extractDeduplicateKeyValues(WriteRequest request) { + if (request.getPutRequest() != null) { + return request.getPutRequest().getItem().entrySet().stream() + .filter(entry -> spec.getDeduplicateKeys().contains(entry.getKey())) + .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue)); + } else if (request.getDeleteRequest() != null) { + return request.getDeleteRequest().getKey().entrySet().stream() + .filter(entry -> spec.getDeduplicateKeys().contains(entry.getKey())) + .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue)); + } else { + return Collections.emptyMap(); + } + } + @FinishBundle public void 
finishBundle(FinishBundleContext context) throws Exception { flushBatch(); @@ -474,7 +513,7 @@ private void flushBatch() throws IOException, InterruptedException { // Since each element is a KV in the batch, we need to group them // by tableName Map> mapTableRequest = - batch.stream() + batch.values().stream() .collect( Collectors.groupingBy( KV::getKey, Collectors.mapping(KV::getValue, Collectors.toList()))); diff --git a/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/options/AwsModule.java b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/options/AwsModule.java index f873876cd2e0..69d5d19e8496 100644 --- a/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/options/AwsModule.java +++ b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/options/AwsModule.java @@ -18,9 +18,11 @@ package org.apache.beam.sdk.io.aws.options; import com.amazonaws.ClientConfiguration; +import com.amazonaws.auth.AWSCredentials; import com.amazonaws.auth.AWSCredentialsProvider; import com.amazonaws.auth.AWSStaticCredentialsProvider; import com.amazonaws.auth.BasicAWSCredentials; +import com.amazonaws.auth.BasicSessionCredentials; import com.amazonaws.auth.ClasspathPropertiesFileCredentialsProvider; import com.amazonaws.auth.DefaultAWSCredentialsProviderChain; import com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper; @@ -34,9 +36,7 @@ import com.fasterxml.jackson.annotation.JsonTypeInfo; import com.fasterxml.jackson.core.JsonGenerator; import com.fasterxml.jackson.core.JsonParser; -import com.fasterxml.jackson.core.JsonToken; import com.fasterxml.jackson.core.type.TypeReference; -import com.fasterxml.jackson.core.type.WritableTypeId; import com.fasterxml.jackson.databind.DeserializationContext; import com.fasterxml.jackson.databind.JsonDeserializer; import com.fasterxml.jackson.databind.JsonSerializer; @@ -68,6 +68,7 @@ public class AwsModule extends SimpleModule { private static final String AWS_ACCESS_KEY_ID = "awsAccessKeyId"; private static final String AWS_SECRET_KEY = "awsSecretKey"; + private static final String SESSION_TOKEN = "sessionToken"; private static final String CREDENTIALS_FILE_PATH = "credentialsFilePath"; public static final String CLIENT_EXECUTION_TIMEOUT = "clientExecutionTimeout"; public static final String CONNECTION_MAX_IDLE_TIME = "connectionMaxIdleTime"; @@ -121,8 +122,17 @@ public AWSCredentialsProvider deserializeWithType( } if (typeName.equals(AWSStaticCredentialsProvider.class.getSimpleName())) { - return new AWSStaticCredentialsProvider( - new BasicAWSCredentials(asMap.get(AWS_ACCESS_KEY_ID), asMap.get(AWS_SECRET_KEY))); + boolean isSession = asMap.containsKey(SESSION_TOKEN); + if (isSession) { + return new AWSStaticCredentialsProvider( + new BasicSessionCredentials( + asMap.get(AWS_ACCESS_KEY_ID), + asMap.get(AWS_SECRET_KEY), + asMap.get(SESSION_TOKEN))); + } else { + return new AWSStaticCredentialsProvider( + new BasicAWSCredentials(asMap.get(AWS_ACCESS_KEY_ID), asMap.get(AWS_SECRET_KEY))); + } } else if (typeName.equals(PropertiesFileCredentialsProvider.class.getSimpleName())) { return new PropertiesFileCredentialsProvider(asMap.get(CREDENTIALS_FILE_PATH)); } else if (typeName.equals( @@ -177,15 +187,20 @@ public void serializeWithType( SerializerProvider serializers, TypeSerializer typeSerializer) throws IOException { - WritableTypeId typeId = - typeSerializer.writeTypePrefix( - jsonGenerator, typeSerializer.typeId(credentialsProvider, JsonToken.START_OBJECT)); - if 
(credentialsProvider.getClass().equals(AWSStaticCredentialsProvider.class)) { - jsonGenerator.writeStringField( - AWS_ACCESS_KEY_ID, credentialsProvider.getCredentials().getAWSAccessKeyId()); - jsonGenerator.writeStringField( - AWS_SECRET_KEY, credentialsProvider.getCredentials().getAWSSecretKey()); + // BEAM-11958 Use deprecated Jackson APIs to be compatible with older versions of jackson + typeSerializer.writeTypePrefixForObject(credentialsProvider, jsonGenerator); + if (credentialsProvider.getClass().equals(AWSStaticCredentialsProvider.class)) { + AWSCredentials credentials = credentialsProvider.getCredentials(); + if (credentials.getClass().equals(BasicSessionCredentials.class)) { + BasicSessionCredentials sessionCredentials = (BasicSessionCredentials) credentials; + jsonGenerator.writeStringField(AWS_ACCESS_KEY_ID, sessionCredentials.getAWSAccessKeyId()); + jsonGenerator.writeStringField(AWS_SECRET_KEY, sessionCredentials.getAWSSecretKey()); + jsonGenerator.writeStringField(SESSION_TOKEN, sessionCredentials.getSessionToken()); + } else { + jsonGenerator.writeStringField(AWS_ACCESS_KEY_ID, credentials.getAWSAccessKeyId()); + jsonGenerator.writeStringField(AWS_SECRET_KEY, credentials.getAWSSecretKey()); + } } else if (credentialsProvider.getClass().equals(PropertiesFileCredentialsProvider.class)) { try { PropertiesFileCredentialsProvider specificProvider = @@ -239,7 +254,8 @@ public void serializeWithType( throw new IllegalArgumentException( "Unsupported AWS credentials provider type " + credentialsProvider.getClass()); } - typeSerializer.writeTypeSuffix(jsonGenerator, typeId); + // BEAM-11958 Use deprecated Jackson APIs to be compatible with older versions of jackson + typeSerializer.writeTypeSuffixForObject(credentialsProvider, jsonGenerator); } } @@ -300,7 +316,7 @@ public ClientConfiguration deserialize(JsonParser jsonParser, DeserializationCon clientConfiguration.setProxyHost((String) map.get(PROXY_HOST)); } if (map.containsKey(PROXY_PORT)) { - clientConfiguration.setProxyPort((Integer) map.get(PROXY_PORT)); + clientConfiguration.setProxyPort(((Number) map.get(PROXY_PORT)).intValue()); } if (map.containsKey(PROXY_USERNAME)) { clientConfiguration.setProxyUsername((String) map.get(PROXY_USERNAME)); @@ -309,27 +325,28 @@ public ClientConfiguration deserialize(JsonParser jsonParser, DeserializationCon clientConfiguration.setProxyPassword((String) map.get(PROXY_PASSWORD)); } if (map.containsKey(CLIENT_EXECUTION_TIMEOUT)) { - clientConfiguration.setClientExecutionTimeout((Integer) map.get(CLIENT_EXECUTION_TIMEOUT)); + clientConfiguration.setClientExecutionTimeout( + ((Number) map.get(CLIENT_EXECUTION_TIMEOUT)).intValue()); } if (map.containsKey(CONNECTION_MAX_IDLE_TIME)) { clientConfiguration.setConnectionMaxIdleMillis( ((Number) map.get(CONNECTION_MAX_IDLE_TIME)).longValue()); } if (map.containsKey(CONNECTION_TIMEOUT)) { - clientConfiguration.setConnectionTimeout((Integer) map.get(CONNECTION_TIMEOUT)); + clientConfiguration.setConnectionTimeout(((Number) map.get(CONNECTION_TIMEOUT)).intValue()); } if (map.containsKey(CONNECTION_TIME_TO_LIVE)) { clientConfiguration.setConnectionTTL( ((Number) map.get(CONNECTION_TIME_TO_LIVE)).longValue()); } if (map.containsKey(MAX_CONNECTIONS)) { - clientConfiguration.setMaxConnections((Integer) map.get(MAX_CONNECTIONS)); + clientConfiguration.setMaxConnections(((Number) map.get(MAX_CONNECTIONS)).intValue()); } if (map.containsKey(REQUEST_TIMEOUT)) { - clientConfiguration.setRequestTimeout((Integer) map.get(REQUEST_TIMEOUT)); + 
clientConfiguration.setRequestTimeout(((Number) map.get(REQUEST_TIMEOUT)).intValue()); } if (map.containsKey(SOCKET_TIMEOUT)) { - clientConfiguration.setSocketTimeout((Integer) map.get(SOCKET_TIMEOUT)); + clientConfiguration.setSocketTimeout(((Number) map.get(SOCKET_TIMEOUT)).intValue()); } return clientConfiguration; } diff --git a/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/options/S3Options.java b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/options/S3Options.java index 41b07b128e90..e9979b5c99ea 100644 --- a/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/options/S3Options.java +++ b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/options/S3Options.java @@ -73,6 +73,14 @@ public interface S3Options extends AwsOptions { void setSSEAwsKeyManagementParams(SSEAwsKeyManagementParams value); + @Description( + "Set to true to use an S3 Bucket Key for object encryption with server-side " + + "encryption using AWS KMS (SSE-KMS)") + @Default.Boolean(false) + boolean getBucketKeyEnabled(); + + void setBucketKeyEnabled(boolean value); + @Description( "Factory class that should be created and used to create a builder of AmazonS3 client." + "Override the default value if you need a S3 client with custom properties, like path style access, etc.") diff --git a/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/DefaultS3FileSystemSchemeRegistrar.java b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/DefaultS3FileSystemSchemeRegistrar.java new file mode 100644 index 000000000000..119df3c7f75f --- /dev/null +++ b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/DefaultS3FileSystemSchemeRegistrar.java @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.aws.s3; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.service.AutoService; +import javax.annotation.Nonnull; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; +import org.apache.beam.sdk.io.aws.options.S3Options; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; + +/** Registers the "s3" uri schema to be handled by {@link S3FileSystem}. 
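+ * The registered {@link S3FileSystemConfiguration} is derived from the pipeline's {@link S3Options}.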
*/ +@AutoService(S3FileSystemSchemeRegistrar.class) +@Experimental(Kind.FILESYSTEM) +public class DefaultS3FileSystemSchemeRegistrar implements S3FileSystemSchemeRegistrar { + + @Override + public Iterable fromOptions(@Nonnull PipelineOptions options) { + checkNotNull(options, "Expect the runner have called FileSystems.setDefaultPipelineOptions()."); + return ImmutableList.of( + S3FileSystemConfiguration.fromS3Options(options.as(S3Options.class)).build()); + } +} diff --git a/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3FileSystem.java b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3FileSystem.java index fda3cfd3d58e..e5232df899ff 100644 --- a/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3FileSystem.java +++ b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3FileSystem.java @@ -24,7 +24,6 @@ import com.amazonaws.AmazonClientException; import com.amazonaws.services.s3.AmazonS3; -import com.amazonaws.services.s3.AmazonS3ClientBuilder; import com.amazonaws.services.s3.model.AmazonS3Exception; import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest; import com.amazonaws.services.s3.model.CompleteMultipartUploadResult; @@ -62,11 +61,10 @@ import java.util.regex.Pattern; import java.util.stream.Collectors; import org.apache.beam.sdk.io.FileSystem; -import org.apache.beam.sdk.io.aws.options.S3ClientBuilderFactory; import org.apache.beam.sdk.io.aws.options.S3Options; import org.apache.beam.sdk.io.fs.CreateOptions; import org.apache.beam.sdk.io.fs.MatchResult; -import org.apache.beam.sdk.util.InstanceBuilder; +import org.apache.beam.sdk.io.fs.MoveOptions; import org.apache.beam.sdk.util.MoreFutures; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Strings; @@ -84,7 +82,11 @@ import org.slf4j.Logger; import org.slf4j.LoggerFactory; -/** {@link FileSystem} implementation for Amazon S3. */ +/** + * {@link FileSystem} implementation for storage systems that use the S3 protocol. + * + * @see S3FileSystemSchemeRegistrar + */ @SuppressWarnings({ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) @@ -104,30 +106,29 @@ class S3FileSystem extends FileSystem { // Non-final for testing. private Supplier amazonS3; - private final S3Options options; + private final S3FileSystemConfiguration config; private final ListeningExecutorService executorService; - S3FileSystem(S3Options options) { - this.options = checkNotNull(options, "options"); - AmazonS3ClientBuilder builder = - InstanceBuilder.ofType(S3ClientBuilderFactory.class) - .fromClass(options.getS3ClientFactoryClass()) - .build() - .createBuilder(options); + S3FileSystem(S3FileSystemConfiguration config) { + this.config = checkNotNull(config, "config"); // The Supplier is to make sure we don't call .build() unless we are actually using S3. 
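The `DefaultS3FileSystemSchemeRegistrar` above wires the standard "s3" scheme into this new mechanism. For orientation, a third-party store that speaks the S3 protocol could plug in an additional scheme the same way, using the `fromS3Options`/`setScheme` builder methods introduced later in this change. This is a hedged sketch only; the `s3x` scheme, the `com.example.io` package, and the class name are illustrative and not part of this change:

```java
package com.example.io; // hypothetical user package

import com.google.auto.service.AutoService;
import java.util.Collections;
import javax.annotation.Nonnull;
import org.apache.beam.sdk.io.aws.options.S3Options;
import org.apache.beam.sdk.io.aws.s3.S3FileSystemConfiguration;
import org.apache.beam.sdk.io.aws.s3.S3FileSystemSchemeRegistrar;
import org.apache.beam.sdk.options.PipelineOptions;

/** Illustrative registrar mapping a hypothetical "s3x" scheme onto an S3-compatible store. */
@AutoService(S3FileSystemSchemeRegistrar.class)
public class MyS3CompatibleSchemeRegistrar implements S3FileSystemSchemeRegistrar {

  @Override
  public Iterable<S3FileSystemConfiguration> fromOptions(@Nonnull PipelineOptions options) {
    // Start from the standard S3Options and override only the scheme; a real implementation
    // could also supply its own AmazonS3ClientBuilder (custom endpoint, path-style access, ...).
    return Collections.singletonList(
        S3FileSystemConfiguration.fromS3Options(options.as(S3Options.class))
            .setScheme("s3x")
            .build());
  }
}
```

The updated `S3FileSystemRegistrar` (later in this diff) discovers such registrars via `ServiceLoader` and creates one `S3FileSystem` per scheme; duplicate schemes fail registration with an `IllegalStateException`.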
- amazonS3 = Suppliers.memoize(builder::build); + amazonS3 = Suppliers.memoize(config.getS3ClientBuilder()::build); - checkNotNull(options.getS3StorageClass(), "storageClass"); - checkArgument(options.getS3ThreadPoolSize() > 0, "threadPoolSize"); + checkNotNull(config.getS3StorageClass(), "storageClass"); + checkArgument(config.getS3ThreadPoolSize() > 0, "threadPoolSize"); executorService = MoreExecutors.listeningDecorator( Executors.newFixedThreadPool( - options.getS3ThreadPoolSize(), new ThreadFactoryBuilder().setDaemon(true).build())); + config.getS3ThreadPoolSize(), new ThreadFactoryBuilder().setDaemon(true).build())); + } + + S3FileSystem(S3Options options) { + this(S3FileSystemConfiguration.fromS3Options(options).build()); } @Override protected String getScheme() { - return S3ResourceId.SCHEME; + return config.getScheme(); } @VisibleForTesting @@ -226,9 +227,10 @@ List matchGlobPaths(Collection globPaths) throws IOEx exception = pathWithEncoding.getException(); break; } else { + // TODO(BEAM-11821): Support file checksum in this method. metadatas.add( createBeamMetadata( - pathWithEncoding.getPath(), pathWithEncoding.getContentEncoding())); + pathWithEncoding.getPath(), pathWithEncoding.getContentEncoding(), null)); } } @@ -326,7 +328,8 @@ private ExpandedGlob expandGlob(S3ResourceId glob) { // Filter against regex. if (wildcardRegexp.matcher(objectSummary.getKey()).matches()) { S3ResourceId expandedPath = - S3ResourceId.fromComponents(objectSummary.getBucketName(), objectSummary.getKey()) + S3ResourceId.fromComponents( + glob.getScheme(), objectSummary.getBucketName(), objectSummary.getKey()) .withSize(objectSummary.getSize()) .withLastModified(objectSummary.getLastModified()); LOG.debug("Expanded S3 object path {}", expandedPath); @@ -363,7 +366,7 @@ private List matchNonGlobPaths(Collection paths) thro private ObjectMetadata getObjectMetadata(S3ResourceId s3ResourceId) throws AmazonClientException { GetObjectMetadataRequest request = new GetObjectMetadataRequest(s3ResourceId.getBucket(), s3ResourceId.getKey()); - request.setSSECustomerKey(options.getSSECustomerKey()); + request.setSSECustomerKey(config.getSSECustomerKey()); return amazonS3.get().getObjectMetadata(request); } @@ -385,32 +388,37 @@ MatchResult matchNonGlobPath(S3ResourceId path) { createBeamMetadata( path.withSize(s3Metadata.getContentLength()) .withLastModified(s3Metadata.getLastModified()), - Strings.nullToEmpty(s3Metadata.getContentEncoding())))); + Strings.nullToEmpty(s3Metadata.getContentEncoding()), + s3Metadata.getETag()))); } private static MatchResult.Metadata createBeamMetadata( - S3ResourceId path, String contentEncoding) { + S3ResourceId path, String contentEncoding, String eTag) { checkArgument(path.getSize().isPresent(), "The resource id should have a size."); checkNotNull(contentEncoding, "contentEncoding"); boolean isReadSeekEfficient = !NON_READ_SEEK_EFFICIENT_ENCODINGS.contains(contentEncoding); - return MatchResult.Metadata.builder() - .setIsReadSeekEfficient(isReadSeekEfficient) - .setResourceId(path) - .setSizeBytes(path.getSize().get()) - .setLastModifiedMillis(path.getLastModified().transform(Date::getTime).or(0L)) - .build(); + MatchResult.Metadata.Builder ret = + MatchResult.Metadata.builder() + .setIsReadSeekEfficient(isReadSeekEfficient) + .setResourceId(path) + .setSizeBytes(path.getSize().get()) + .setLastModifiedMillis(path.getLastModified().transform(Date::getTime).or(0L)); + if (eTag != null) { + ret.setChecksum(eTag); + } + return ret.build(); } @Override protected 
WritableByteChannel create(S3ResourceId resourceId, CreateOptions createOptions) throws IOException { - return new S3WritableByteChannel(amazonS3.get(), resourceId, createOptions.mimeType(), options); + return new S3WritableByteChannel(amazonS3.get(), resourceId, createOptions.mimeType(), config); } @Override protected ReadableByteChannel open(S3ResourceId resourceId) throws IOException { - return new S3ReadableSeekableByteChannel(amazonS3.get(), resourceId, options); + return new S3ReadableSeekableByteChannel(amazonS3.get(), resourceId, config); } @Override @@ -463,9 +471,9 @@ CopyObjectResult atomicCopy( destinationPath.getBucket(), destinationPath.getKey()); copyObjectRequest.setNewObjectMetadata(sourceObjectMetadata); - copyObjectRequest.setStorageClass(options.getS3StorageClass()); - copyObjectRequest.setSourceSSECustomerKey(options.getSSECustomerKey()); - copyObjectRequest.setDestinationSSECustomerKey(options.getSSECustomerKey()); + copyObjectRequest.setStorageClass(config.getS3StorageClass()); + copyObjectRequest.setSourceSSECustomerKey(config.getSSECustomerKey()); + copyObjectRequest.setDestinationSSECustomerKey(config.getSSECustomerKey()); return amazonS3.get().copyObject(copyObjectRequest); } @@ -475,9 +483,9 @@ CompleteMultipartUploadResult multipartCopy( throws AmazonClientException { InitiateMultipartUploadRequest initiateUploadRequest = new InitiateMultipartUploadRequest(destinationPath.getBucket(), destinationPath.getKey()) - .withStorageClass(options.getS3StorageClass()) + .withStorageClass(config.getS3StorageClass()) .withObjectMetadata(sourceObjectMetadata); - initiateUploadRequest.setSSECustomerKey(options.getSSECustomerKey()); + initiateUploadRequest.setSSECustomerKey(config.getSSECustomerKey()); InitiateMultipartUploadResult initiateUploadResult = amazonS3.get().initiateMultipartUpload(initiateUploadRequest); @@ -497,14 +505,14 @@ CompleteMultipartUploadResult multipartCopy( .withDestinationKey(destinationPath.getKey()) .withUploadId(uploadId) .withPartNumber(1); - copyPartRequest.setSourceSSECustomerKey(options.getSSECustomerKey()); - copyPartRequest.setDestinationSSECustomerKey(options.getSSECustomerKey()); + copyPartRequest.setSourceSSECustomerKey(config.getSSECustomerKey()); + copyPartRequest.setDestinationSSECustomerKey(config.getSSECustomerKey()); CopyPartResult copyPartResult = amazonS3.get().copyPart(copyPartRequest); eTags.add(copyPartResult.getPartETag()); } else { long bytePosition = 0; - Integer uploadBufferSizeBytes = options.getS3UploadBufferSizeBytes(); + Integer uploadBufferSizeBytes = config.getS3UploadBufferSizeBytes(); // Amazon parts are 1-indexed, not zero-indexed. 
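As a side note on the multipart copy loop that follows: parts are 1-indexed, each part spans `uploadBufferSizeBytes` bytes except possibly the last, and the last byte of each part is clamped to `objectSize - 1`. A small, self-contained sketch of that byte-range arithmetic (the 12 MiB object size is just an example; 5 MiB is the documented minimum part size):

```java
public class PartRangeSketch {
  public static void main(String[] args) {
    long objectSize = 12L * 1024 * 1024;          // example: a 12 MiB source object
    int uploadBufferSizeBytes = 5 * 1024 * 1024;  // documented minimum part size (5 MiB)
    // Mirrors the loop below: parts are 1-indexed, the last byte is clamped to objectSize - 1.
    int partNumber = 1;
    for (long bytePosition = 0; bytePosition < objectSize; bytePosition += uploadBufferSizeBytes) {
      long lastByte = Math.min(objectSize - 1, bytePosition + uploadBufferSizeBytes - 1);
      System.out.printf("part %d: bytes %d-%d%n", partNumber++, bytePosition, lastByte);
    }
    // Prints: part 1: bytes 0-5242879, part 2: bytes 5242880-10485759,
    //         part 3: bytes 10485760-12582911
  }
}
```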
for (int partNumber = 1; bytePosition < objectSize; partNumber++) { final CopyPartRequest copyPartRequest = @@ -517,8 +525,8 @@ CompleteMultipartUploadResult multipartCopy( .withPartNumber(partNumber) .withFirstByte(bytePosition) .withLastByte(Math.min(objectSize - 1, bytePosition + uploadBufferSizeBytes - 1)); - copyPartRequest.setSourceSSECustomerKey(options.getSSECustomerKey()); - copyPartRequest.setDestinationSSECustomerKey(options.getSSECustomerKey()); + copyPartRequest.setSourceSSECustomerKey(config.getSSECustomerKey()); + copyPartRequest.setDestinationSSECustomerKey(config.getSSECustomerKey()); CopyPartResult copyPartResult = amazonS3.get().copyPart(copyPartRequest); eTags.add(copyPartResult.getPartETag()); @@ -538,8 +546,13 @@ CompleteMultipartUploadResult multipartCopy( @Override protected void rename( - List sourceResourceIds, List destinationResourceIds) + List sourceResourceIds, + List destinationResourceIds, + MoveOptions... moveOptions) throws IOException { + if (moveOptions.length > 0) { + throw new UnsupportedOperationException("Support for move options is not yet implemented."); + } copy(sourceResourceIds, destinationResourceIds); delete(sourceResourceIds); } diff --git a/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3FileSystemConfiguration.java b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3FileSystemConfiguration.java new file mode 100644 index 000000000000..d5f2327b5ff9 --- /dev/null +++ b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3FileSystemConfiguration.java @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.aws.s3; + +import com.amazonaws.services.s3.AmazonS3ClientBuilder; +import com.amazonaws.services.s3.model.SSEAwsKeyManagementParams; +import com.amazonaws.services.s3.model.SSECustomerKey; +import com.google.auto.value.AutoValue; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; +import org.apache.beam.sdk.io.aws.options.S3ClientBuilderFactory; +import org.apache.beam.sdk.io.aws.options.S3Options; +import org.apache.beam.sdk.util.InstanceBuilder; + +/** + * Object used to configure {@link S3FileSystem}. + * + * @see S3Options + * @see S3FileSystemSchemeRegistrar + */ +@AutoValue +@Experimental(Kind.FILESYSTEM) +public abstract class S3FileSystemConfiguration { + public static final int MINIMUM_UPLOAD_BUFFER_SIZE_BYTES = + S3Options.S3UploadBufferSizeBytesFactory.MINIMUM_UPLOAD_BUFFER_SIZE_BYTES; + + /** The uri scheme used by resources on this filesystem. */ + public abstract String getScheme(); + + /** The AWS S3 storage class used for creating S3 objects. 
*/ + public abstract String getS3StorageClass(); + + /** Size of S3 upload chunks. */ + public abstract int getS3UploadBufferSizeBytes(); + + /** Thread pool size, limiting the max concurrent S3 operations. */ + public abstract int getS3ThreadPoolSize(); + + /** Algorithm for SSE-S3 encryption, e.g. AES256. */ + public abstract @Nullable String getSSEAlgorithm(); + + /** SSE key for SSE-C encryption, e.g. a base64 encoded key and the algorithm. */ + public abstract @Nullable SSECustomerKey getSSECustomerKey(); + + /** KMS key id for SSE-KMS encryption, e.g. "arn:aws:kms:...". */ + public abstract @Nullable SSEAwsKeyManagementParams getSSEAwsKeyManagementParams(); + + /** + * Whether to use an S3 Bucket Key for object encryption with server-side encryption using AWS KMS + * (SSE-KMS) or not. + */ + public abstract boolean getBucketKeyEnabled(); + + /** Builder used to create the {@code AmazonS3Client}. */ + public abstract AmazonS3ClientBuilder getS3ClientBuilder(); + + /** Creates a new uninitialized {@link Builder}. */ + public static Builder builder() { + return new AutoValue_S3FileSystemConfiguration.Builder(); + } + + /** Creates a new {@link Builder} with values initialized by this instance's properties. */ + public abstract Builder toBuilder(); + + /** + * Creates a new {@link Builder} with values initialized by the properties of {@code s3Options}. + */ + public static Builder fromS3Options(S3Options s3Options) { + return builder() + .setScheme("s3") + .setS3StorageClass(s3Options.getS3StorageClass()) + .setS3UploadBufferSizeBytes(s3Options.getS3UploadBufferSizeBytes()) + .setS3ThreadPoolSize(s3Options.getS3ThreadPoolSize()) + .setSSEAlgorithm(s3Options.getSSEAlgorithm()) + .setSSECustomerKey(s3Options.getSSECustomerKey()) + .setSSEAwsKeyManagementParams(s3Options.getSSEAwsKeyManagementParams()) + .setBucketKeyEnabled(s3Options.getBucketKeyEnabled()) + .setS3ClientBuilder(getBuilder(s3Options)); + } + + /** Creates a new {@link AmazonS3ClientBuilder} as specified by {@code s3Options}. 
*/ + public static AmazonS3ClientBuilder getBuilder(S3Options s3Options) { + return InstanceBuilder.ofType(S3ClientBuilderFactory.class) + .fromClass(s3Options.getS3ClientFactoryClass()) + .build() + .createBuilder(s3Options); + } + + @AutoValue.Builder + public abstract static class Builder { + public abstract Builder setScheme(String value); + + public abstract Builder setS3StorageClass(String value); + + public abstract Builder setS3UploadBufferSizeBytes(int value); + + public abstract Builder setS3ThreadPoolSize(int value); + + public abstract Builder setSSEAlgorithm(@Nullable String value); + + public abstract Builder setSSECustomerKey(@Nullable SSECustomerKey value); + + public abstract Builder setSSEAwsKeyManagementParams(@Nullable SSEAwsKeyManagementParams value); + + public abstract Builder setBucketKeyEnabled(boolean value); + + public abstract Builder setS3ClientBuilder(AmazonS3ClientBuilder value); + + public abstract S3FileSystemConfiguration build(); + } +} diff --git a/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3FileSystemRegistrar.java b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3FileSystemRegistrar.java index e9e5ace64d04..7446370b5d7c 100644 --- a/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3FileSystemRegistrar.java +++ b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3FileSystemRegistrar.java @@ -20,16 +20,24 @@ import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; import com.google.auto.service.AutoService; +import java.util.Map; +import java.util.ServiceLoader; +import java.util.stream.Collectors; import javax.annotation.Nonnull; import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.annotations.Experimental.Kind; import org.apache.beam.sdk.io.FileSystem; import org.apache.beam.sdk.io.FileSystemRegistrar; -import org.apache.beam.sdk.io.aws.options.S3Options; import org.apache.beam.sdk.options.PipelineOptions; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.sdk.util.common.ReflectHelpers; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Streams; -/** {@link AutoService} registrar for the {@link S3FileSystem}. */ +/** + * {@link AutoService} registrar for the {@link S3FileSystem}. + * + *
<p>
    Creates instances of {@link S3FileSystem} for each scheme registered with a {@link + * S3FileSystemSchemeRegistrar}. + */ @AutoService(FileSystemRegistrar.class) @Experimental(Kind.FILESYSTEM) public class S3FileSystemRegistrar implements FileSystemRegistrar { @@ -37,6 +45,14 @@ public class S3FileSystemRegistrar implements FileSystemRegistrar { @Override public Iterable> fromOptions(@Nonnull PipelineOptions options) { checkNotNull(options, "Expect the runner have called FileSystems.setDefaultPipelineOptions()."); - return ImmutableList.of(new S3FileSystem(options.as(S3Options.class))); + Map> fileSystems = + Streams.stream( + ServiceLoader.load( + S3FileSystemSchemeRegistrar.class, ReflectHelpers.findClassLoader())) + .flatMap(r -> Streams.stream(r.fromOptions(options))) + .map(S3FileSystem::new) + // Throws IllegalStateException if any duplicate schemes exist. + .collect(Collectors.toMap(S3FileSystem::getScheme, f -> (FileSystem) f)); + return fileSystems.values(); } } diff --git a/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3FileSystemSchemeRegistrar.java b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3FileSystemSchemeRegistrar.java new file mode 100644 index 000000000000..0f51c3501db3 --- /dev/null +++ b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3FileSystemSchemeRegistrar.java @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.aws.s3; + +import com.google.auto.service.AutoService; +import java.util.ServiceLoader; +import javax.annotation.Nonnull; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; +import org.apache.beam.sdk.io.FileSystem; +import org.apache.beam.sdk.io.FileSystemRegistrar; +import org.apache.beam.sdk.options.PipelineOptions; + +/** + * A registrar that creates {@link S3FileSystemConfiguration} instances from {@link + * PipelineOptions}. + * + *
<p>
    Users of storage systems that use the S3 protocol have the ability to register a URI scheme by + * creating a {@link ServiceLoader} entry and a concrete implementation of this interface. + * + *
<p>
    It is optional but recommended to use one of the many build time tools such as {@link + * AutoService} to generate the necessary META-INF files automatically. + */ +@Experimental(Kind.FILESYSTEM) +public interface S3FileSystemSchemeRegistrar { + /** + * Create zero or more {@link S3FileSystemConfiguration} instances from the given {@link + * PipelineOptions}. + * + *
<p>
    Each {@link S3FileSystemConfiguration#getScheme() scheme} is required to be unique among all + * schemes registered by all {@link S3FileSystemSchemeRegistrar}s, as well as among all {@link + * FileSystem}s registered by all {@link FileSystemRegistrar}s. + */ + Iterable fromOptions(@Nonnull PipelineOptions options); +} diff --git a/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3ReadableSeekableByteChannel.java b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3ReadableSeekableByteChannel.java index 9dde3275f681..3f11f3ac67e9 100644 --- a/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3ReadableSeekableByteChannel.java +++ b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3ReadableSeekableByteChannel.java @@ -17,6 +17,7 @@ */ package org.apache.beam.sdk.io.aws.s3; +import static com.amazonaws.util.IOUtils.drainInputStream; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; @@ -24,6 +25,7 @@ import com.amazonaws.services.s3.AmazonS3; import com.amazonaws.services.s3.model.GetObjectRequest; import com.amazonaws.services.s3.model.S3Object; +import com.amazonaws.services.s3.model.S3ObjectInputStream; import java.io.BufferedInputStream; import java.io.IOException; import java.nio.ByteBuffer; @@ -32,7 +34,6 @@ import java.nio.channels.NonWritableChannelException; import java.nio.channels.ReadableByteChannel; import java.nio.channels.SeekableByteChannel; -import org.apache.beam.sdk.io.aws.options.S3Options; /** A readable S3 object, as a {@link SeekableByteChannel}. */ @SuppressWarnings({ @@ -46,14 +47,14 @@ class S3ReadableSeekableByteChannel implements SeekableByteChannel { private long position = 0; private boolean open = true; private S3Object s3Object; - private final S3Options options; + private final S3FileSystemConfiguration config; private ReadableByteChannel s3ObjectContentChannel; - S3ReadableSeekableByteChannel(AmazonS3 amazonS3, S3ResourceId path, S3Options options) - throws IOException { + S3ReadableSeekableByteChannel( + AmazonS3 amazonS3, S3ResourceId path, S3FileSystemConfiguration config) throws IOException { this.amazonS3 = checkNotNull(amazonS3, "amazonS3"); checkNotNull(path, "path"); - this.options = checkNotNull(options, "options"); + this.config = checkNotNull(config, "config"); if (path.getSize().isPresent()) { contentLength = path.getSize().get(); @@ -84,7 +85,7 @@ public int read(ByteBuffer destinationBuffer) throws IOException { if (s3Object == null) { GetObjectRequest request = new GetObjectRequest(path.getBucket(), path.getKey()); - request.setSSECustomerKey(options.getSSECustomerKey()); + request.setSSECustomerKey(config.getSSECustomerKey()); if (position > 0) { request.setRange(position, contentLength); } @@ -155,6 +156,8 @@ public long size() throws ClosedChannelException { @Override public void close() throws IOException { if (s3Object != null) { + S3ObjectInputStream s3ObjectInputStream = s3Object.getObjectContent(); + drainInputStream(s3ObjectInputStream); s3Object.close(); } open = false; diff --git a/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3ResourceId.java b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3ResourceId.java index 745bbd3ad253..84c1d5ad19f6 100644 --- 
a/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3ResourceId.java +++ b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3ResourceId.java @@ -21,6 +21,7 @@ import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; +import java.io.ObjectStreamException; import java.util.Date; import java.util.Objects; import java.util.regex.Matcher; @@ -37,7 +38,9 @@ }) class S3ResourceId implements ResourceId { - static final String SCHEME = "s3"; + private static final long serialVersionUID = -8218379666994031337L; + + static final String DEFAULT_SCHEME = "s3"; private static final Pattern S3_URI = Pattern.compile("(?[^:]+)://(?[^/]+)(/(?.*))?"); @@ -49,34 +52,44 @@ class S3ResourceId implements ResourceId { private final String key; private final Long size; private final Date lastModified; + private final String scheme; private S3ResourceId( - String bucket, String key, @Nullable Long size, @Nullable Date lastModified) { + String scheme, String bucket, String key, @Nullable Long size, @Nullable Date lastModified) { + checkArgument(!Strings.isNullOrEmpty(scheme), "scheme"); checkArgument(!Strings.isNullOrEmpty(bucket), "bucket"); checkArgument(!bucket.contains("/"), "bucket must not contain '/': [%s]", bucket); + this.scheme = scheme; this.bucket = bucket; this.key = checkNotNull(key, "key"); this.size = size; this.lastModified = lastModified; } - static S3ResourceId fromComponents(String bucket, String key) { + private Object readResolve() throws ObjectStreamException { + if (scheme == null) { + return new S3ResourceId(DEFAULT_SCHEME, bucket, key, size, lastModified); + } + return this; + } + + static S3ResourceId fromComponents(String scheme, String bucket, String key) { if (!key.startsWith("/")) { key = "/" + key; } - return new S3ResourceId(bucket, key, null, null); + return new S3ResourceId(scheme, bucket, key, null, null); } static S3ResourceId fromUri(String uri) { Matcher m = S3_URI.matcher(uri); checkArgument(m.matches(), "Invalid S3 URI: [%s]", uri); - checkArgument(m.group("SCHEME").equalsIgnoreCase(SCHEME), "Invalid S3 URI scheme: [%s]", uri); + String scheme = m.group("SCHEME"); String bucket = m.group("BUCKET"); String key = Strings.nullToEmpty(m.group("KEY")); if (!key.startsWith("/")) { key = "/" + key; } - return fromComponents(bucket, key); + return fromComponents(scheme, bucket, key); } String getBucket() { @@ -93,7 +106,7 @@ Optional getSize() { } S3ResourceId withSize(long size) { - return new S3ResourceId(bucket, key, size, lastModified); + return new S3ResourceId(scheme, bucket, key, size, lastModified); } Optional getLastModified() { @@ -101,7 +114,7 @@ Optional getLastModified() { } S3ResourceId withLastModified(Date lastModified) { - return new S3ResourceId(bucket, key, size, lastModified); + return new S3ResourceId(scheme, bucket, key, size, lastModified); } @Override @@ -114,7 +127,7 @@ public ResourceId resolve(String other, ResolveOptions resolveOptions) { return this; } int parentStopsAt = key.substring(0, key.length() - 1).lastIndexOf('/'); - return fromComponents(bucket, key.substring(0, parentStopsAt + 1)); + return fromComponents(scheme, bucket, key.substring(0, parentStopsAt + 1)); } if ("".equals(other)) { @@ -125,9 +138,9 @@ public ResourceId resolve(String other, ResolveOptions resolveOptions) { other += "/"; } if (S3_URI.matcher(other).matches()) { - return 
fromUri(other); + return resolveFromUri(other); } - return fromComponents(bucket, key + other); + return fromComponents(scheme, bucket, key + other); } if (resolveOptions == StandardResolveOptions.RESOLVE_FILE) { @@ -135,26 +148,36 @@ public ResourceId resolve(String other, ResolveOptions resolveOptions) { !other.endsWith("/"), "Cannot resolve a file with a directory path: [%s]", other); checkArgument(!"..".equals(other), "Cannot resolve parent as file: [%s]", other); if (S3_URI.matcher(other).matches()) { - return fromUri(other); + return resolveFromUri(other); } - return fromComponents(bucket, key + other); + return fromComponents(scheme, bucket, key + other); } throw new UnsupportedOperationException( String.format("Unexpected StandardResolveOptions [%s]", resolveOptions)); } + private S3ResourceId resolveFromUri(String uri) { + S3ResourceId id = fromUri(uri); + checkArgument( + id.getScheme().equals(scheme), + "Cannot resolve a URI as a child resource unless its scheme is [%s]; instead it was [%s]", + scheme, + id.getScheme()); + return id; + } + @Override public ResourceId getCurrentDirectory() { if (isDirectory()) { return this; } - return fromComponents(getBucket(), key.substring(0, key.lastIndexOf('/') + 1)); + return fromComponents(scheme, getBucket(), key.substring(0, key.lastIndexOf('/') + 1)); } @Override public String getScheme() { - return SCHEME; + return scheme; } @Override @@ -186,7 +209,7 @@ String getKeyNonWildcardPrefix() { @Override public String toString() { - return String.format("%s://%s%s", SCHEME, bucket, key); + return String.format("%s://%s%s", scheme, bucket, key); } @Override @@ -194,12 +217,13 @@ public boolean equals(@Nullable Object obj) { if (!(obj instanceof S3ResourceId)) { return false; } + S3ResourceId o = (S3ResourceId) obj; - return bucket.equals(((S3ResourceId) obj).bucket) && key.equals(((S3ResourceId) obj).key); + return scheme.equals(o.scheme) && bucket.equals(o.bucket) && key.equals(o.key); } @Override public int hashCode() { - return Objects.hash(bucket, key); + return Objects.hash(scheme, bucket, key); } } diff --git a/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3WritableByteChannel.java b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3WritableByteChannel.java index 8ae662f20dca..9e00e93c0be2 100644 --- a/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3WritableByteChannel.java +++ b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3WritableByteChannel.java @@ -39,8 +39,6 @@ import java.security.NoSuchAlgorithmException; import java.util.ArrayList; import java.util.List; -import org.apache.beam.sdk.io.aws.options.S3Options; -import org.apache.beam.sdk.io.aws.options.S3Options.S3UploadBufferSizeBytesFactory; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; /** A writable S3 object, as a {@link WritableByteChannel}. 
*/ @@ -49,7 +47,7 @@ }) class S3WritableByteChannel implements WritableByteChannel { private final AmazonS3 amazonS3; - private final S3Options options; + private final S3FileSystemConfiguration config; private final S3ResourceId path; private final String uploadId; @@ -61,38 +59,40 @@ class S3WritableByteChannel implements WritableByteChannel { private boolean open = true; private final MessageDigest md5 = md5(); - S3WritableByteChannel(AmazonS3 amazonS3, S3ResourceId path, String contentType, S3Options options) + S3WritableByteChannel( + AmazonS3 amazonS3, S3ResourceId path, String contentType, S3FileSystemConfiguration config) throws IOException { this.amazonS3 = checkNotNull(amazonS3, "amazonS3"); - this.options = checkNotNull(options); + this.config = checkNotNull(config); this.path = checkNotNull(path, "path"); checkArgument( atMostOne( - options.getSSECustomerKey() != null, - options.getSSEAlgorithm() != null, - options.getSSEAwsKeyManagementParams() != null), + config.getSSECustomerKey() != null, + config.getSSEAlgorithm() != null, + config.getSSEAwsKeyManagementParams() != null), "Either SSECustomerKey (SSE-C) or SSEAlgorithm (SSE-S3)" + " or SSEAwsKeyManagementParams (SSE-KMS) must not be set at the same time."); // Amazon S3 API docs: Each part must be at least 5 MB in size, except the last part. checkArgument( - options.getS3UploadBufferSizeBytes() - >= S3UploadBufferSizeBytesFactory.MINIMUM_UPLOAD_BUFFER_SIZE_BYTES, + config.getS3UploadBufferSizeBytes() + >= S3FileSystemConfiguration.MINIMUM_UPLOAD_BUFFER_SIZE_BYTES, "S3UploadBufferSizeBytes must be at least %s bytes", - S3UploadBufferSizeBytesFactory.MINIMUM_UPLOAD_BUFFER_SIZE_BYTES); - this.uploadBuffer = ByteBuffer.allocate(options.getS3UploadBufferSizeBytes()); + S3FileSystemConfiguration.MINIMUM_UPLOAD_BUFFER_SIZE_BYTES); + this.uploadBuffer = ByteBuffer.allocate(config.getS3UploadBufferSizeBytes()); eTags = new ArrayList<>(); ObjectMetadata objectMetadata = new ObjectMetadata(); objectMetadata.setContentType(contentType); - if (options.getSSEAlgorithm() != null) { - objectMetadata.setSSEAlgorithm(options.getSSEAlgorithm()); + if (config.getSSEAlgorithm() != null) { + objectMetadata.setSSEAlgorithm(config.getSSEAlgorithm()); } InitiateMultipartUploadRequest request = new InitiateMultipartUploadRequest(path.getBucket(), path.getKey()) - .withStorageClass(options.getS3StorageClass()) + .withStorageClass(config.getS3StorageClass()) .withObjectMetadata(objectMetadata); - request.setSSECustomerKey(options.getSSECustomerKey()); - request.setSSEAwsKeyManagementParams(options.getSSEAwsKeyManagementParams()); + request.setSSECustomerKey(config.getSSECustomerKey()); + request.setSSEAwsKeyManagementParams(config.getSSEAwsKeyManagementParams()); + request.setBucketKeyEnabled(config.getBucketKeyEnabled()); InitiateMultipartUploadResult result; try { result = amazonS3.initiateMultipartUpload(request); @@ -147,7 +147,7 @@ private void flush() throws IOException { .withPartSize(uploadBuffer.remaining()) .withMD5Digest(Base64.encodeAsString(md5.digest())) .withInputStream(inputStream); - request.setSSECustomerKey(options.getSSECustomerKey()); + request.setSSECustomerKey(config.getSSECustomerKey()); UploadPartResult result; try { diff --git a/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/sqs/SqsCheckpointMark.java b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/sqs/SqsCheckpointMark.java index 8d28b59b7354..a7746a158b44 100644 --- 
a/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/sqs/SqsCheckpointMark.java +++ b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/sqs/SqsCheckpointMark.java @@ -17,34 +17,69 @@ */ package org.apache.beam.sdk.io.aws.sqs; -import com.amazonaws.services.sqs.model.Message; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; + +import java.io.IOException; import java.io.Serializable; -import java.util.Collection; import java.util.List; -import java.util.Optional; import org.apache.beam.sdk.io.UnboundedSource; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Objects; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.checkerframework.checker.nullness.qual.Nullable; +@SuppressWarnings({ + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) class SqsCheckpointMark implements UnboundedSource.CheckpointMark, Serializable { - private final List messagesToDelete; - private final transient Optional reader; + /** + * If the checkpoint is for persisting: the reader who's snapshotted state we are persisting. If + * the checkpoint is for restoring: {@literal null}. Not persisted in durable checkpoint. CAUTION: + * Between a checkpoint being taken and {@link #finalizeCheckpoint()} being called the 'true' + * active reader may have changed. + */ + private transient @Nullable SqsUnboundedReader reader; - public SqsCheckpointMark(SqsUnboundedReader reader, Collection messagesToDelete) { - this.reader = Optional.of(reader); - this.messagesToDelete = ImmutableList.copyOf(messagesToDelete); - ; - } + /** + * If the checkpoint is for persisting: The ids of messages which have been passed downstream + * since the last checkpoint. If the checkpoint is for restoring: {@literal null}. Not persisted + * in durable checkpoint. + */ + private @Nullable List safeToDeleteIds; - @Override - public void finalizeCheckpoint() { - reader.ifPresent(r -> r.delete(messagesToDelete)); + /** + * If the checkpoint is for persisting: The receipt handles of messages which have been received + * from SQS but not yet passed downstream at the time of the snapshot. If the checkpoint is for + * restoring: Same, but recovered from durable storage. + */ + @VisibleForTesting final List notYetReadReceipts; + + public SqsCheckpointMark( + SqsUnboundedReader reader, List messagesToDelete, List notYetReadReceipts) { + this.reader = reader; + this.safeToDeleteIds = ImmutableList.copyOf(messagesToDelete); + this.notYetReadReceipts = ImmutableList.copyOf(notYetReadReceipts); } - List getMessagesToDelete() { - return messagesToDelete; + @Override + public void finalizeCheckpoint() throws IOException { + checkState(reader != null && safeToDeleteIds != null, "Cannot finalize a restored checkpoint"); + // Even if the 'true' active reader has changed since the checkpoint was taken we are + // fine: + // - The underlying SQS topic will not have changed, so the following deletes will still + // go to the right place. + // - We'll delete the ACK ids from the readers in-flight state, but that only affect + // flow control and stats, neither of which are relevant anymore. 
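For orientation, `finalizeCheckpoint()` here is invoked by the runner only after the checkpoint mark has been durably persisted. A hedged, simplified sketch of that driving sequence from a runner's point of view (error handling, windowing, and real runner integration are omitted; printing the message id stands in for emitting downstream):

```java
import com.amazonaws.services.sqs.model.Message;
import java.io.IOException;
import org.apache.beam.sdk.io.UnboundedSource;
import org.apache.beam.sdk.options.PipelineOptions;

class CheckpointLifecycleSketch {
  /** Reads up to {@code limit} messages, then checkpoints and acknowledges them. */
  static void readAndAck(UnboundedSource<Message, ?> source, PipelineOptions options, int limit)
      throws IOException {
    UnboundedSource.UnboundedReader<Message> reader = source.createReader(options, null);
    int n = 0;
    for (boolean more = reader.start(); more && n < limit; more = reader.advance(), n++) {
      System.out.println(reader.getCurrent().getMessageId()); // stand-in for emitting downstream
    }
    UnboundedSource.CheckpointMark mark = reader.getCheckpointMark();
    // A real runner persists 'mark' durably first; finalizing it is what deletes the consumed
    // messages (safeToDeleteIds) from SQS and releases the reader's in-flight accounting.
    mark.finalizeCheckpoint();
    reader.close();
  }
}
```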
+ try { + reader.delete(safeToDeleteIds); + } finally { + int remainingInFlight = reader.numInFlightCheckpoints.decrementAndGet(); + checkState(remainingInFlight >= 0, "Miscounted in-flight checkpoints"); + reader.maybeCloseClient(); + reader = null; + safeToDeleteIds = null; + } } @Override @@ -56,11 +91,11 @@ public boolean equals(@Nullable Object o) { return false; } SqsCheckpointMark that = (SqsCheckpointMark) o; - return Objects.equal(messagesToDelete, that.messagesToDelete); + return Objects.equal(safeToDeleteIds, that.safeToDeleteIds); } @Override public int hashCode() { - return Objects.hashCode(messagesToDelete); + return Objects.hashCode(safeToDeleteIds); } } diff --git a/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/sqs/SqsUnboundedReader.java b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/sqs/SqsUnboundedReader.java index b066e25d92d9..393d7a0e4347 100644 --- a/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/sqs/SqsUnboundedReader.java +++ b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/sqs/SqsUnboundedReader.java @@ -17,51 +17,385 @@ */ package org.apache.beam.sdk.io.aws.sqs; +import static java.nio.charset.StandardCharsets.UTF_8; +import static java.util.stream.Collectors.groupingBy; +import static java.util.stream.Collectors.toMap; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; + +import com.amazonaws.services.sqs.AmazonSQS; +import com.amazonaws.services.sqs.model.BatchResultErrorEntry; +import com.amazonaws.services.sqs.model.ChangeMessageVisibilityBatchRequestEntry; +import com.amazonaws.services.sqs.model.ChangeMessageVisibilityBatchResult; +import com.amazonaws.services.sqs.model.DeleteMessageBatchRequestEntry; +import com.amazonaws.services.sqs.model.DeleteMessageBatchResult; +import com.amazonaws.services.sqs.model.GetQueueAttributesRequest; import com.amazonaws.services.sqs.model.Message; +import com.amazonaws.services.sqs.model.MessageAttributeValue; import com.amazonaws.services.sqs.model.MessageSystemAttributeName; +import com.amazonaws.services.sqs.model.QueueAttributeName; import com.amazonaws.services.sqs.model.ReceiveMessageRequest; import com.amazonaws.services.sqs.model.ReceiveMessageResult; -import java.io.Serializable; -import java.nio.charset.StandardCharsets; +import java.io.IOException; import java.util.ArrayDeque; import java.util.ArrayList; import java.util.Arrays; -import java.util.Collection; +import java.util.Collections; +import java.util.HashMap; +import java.util.HashSet; +import java.util.LinkedHashMap; import java.util.List; +import java.util.Map; import java.util.NoSuchElementException; +import java.util.Objects; import java.util.Queue; +import java.util.Set; +import java.util.concurrent.ConcurrentLinkedQueue; +import java.util.concurrent.atomic.AtomicBoolean; +import java.util.concurrent.atomic.AtomicInteger; +import java.util.stream.Collectors; +import java.util.stream.IntStream; import org.apache.beam.sdk.io.UnboundedSource; import org.apache.beam.sdk.io.UnboundedSource.CheckpointMark; +import org.apache.beam.sdk.transforms.Combine; +import org.apache.beam.sdk.transforms.Sum; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; +import org.apache.beam.sdk.util.BucketingFunction; +import org.apache.beam.sdk.util.MovingFunction; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.EvictingQueue; +import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; +import org.joda.time.Duration; import org.joda.time.Instant; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; @SuppressWarnings({ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) -class SqsUnboundedReader extends UnboundedSource.UnboundedReader implements Serializable { +class SqsUnboundedReader extends UnboundedSource.UnboundedReader { + private static final Logger LOG = LoggerFactory.getLogger(SqsUnboundedReader.class); + /** Maximum number of messages to pull from SQS per request. */ public static final int MAX_NUMBER_OF_MESSAGES = 10; + + /** Maximum times to retry batch SQS operations upon partial success. */ + private static final int BATCH_OPERATION_MAX_RETIRES = 5; + + /** Timeout for round trip from receiving a message to finally deleting it from SQS. */ + private static final Duration PROCESSING_TIMEOUT = Duration.standardMinutes(2); + + /** + * Percentage of visibility timeout by which to extend visibility timeout when they are near + * timeout. + */ + private static final int VISIBILITY_EXTENSION_PCT = 50; + + /** + * Percentage of ack timeout we should use as a safety margin. We'll try to extend visibility + * timeout by this margin before the visibility timeout actually expires. + */ + private static final int VISIBILITY_SAFETY_PCT = 20; + + /** + * For stats only: How close we can get to an visibility deadline before we risk it being already + * considered passed by SQS. + */ + private static final Duration VISIBILITY_TOO_LATE = Duration.standardSeconds(2); + + /** Maximum number of message ids per delete or visibility extension call. */ + private static final int DELETE_BATCH_SIZE = 10; + + /** Maximum number of messages in flight. */ + private static final int MAX_IN_FLIGHT = 20000; + + /** Maximum number of recent messages for calculating average message size. */ + private static final int MAX_AVG_BYTE_MESSAGES = 20; + + /** Period of samples to determine watermark and other stats. */ + private static final Duration SAMPLE_PERIOD = Duration.standardMinutes(1); + + /** Period of updates to determine watermark and other stats. */ + private static final Duration SAMPLE_UPDATE = Duration.standardSeconds(5); + + /** Period for logging stats. */ + private static final Duration LOG_PERIOD = Duration.standardSeconds(30); + + /** Minimum number of unread messages required before considering updating watermark. */ + private static final int MIN_WATERMARK_MESSAGES = 10; + + /** + * Minimum number of SAMPLE_UPDATE periods over which unread messages should be spread before + * considering updating watermark. + */ + private static final int MIN_WATERMARK_SPREAD = 2; + + // TODO: Would prefer to use MinLongFn but it is a BinaryCombineFn rather + // than a BinaryCombineLongFn. 
[BEAM-285] + private static final Combine.BinaryCombineLongFn MIN = + new Combine.BinaryCombineLongFn() { + @Override + public long apply(long left, long right) { + return Math.min(left, right); + } + + @Override + public long identity() { + return Long.MAX_VALUE; + } + }; + + private static final Combine.BinaryCombineLongFn MAX = + new Combine.BinaryCombineLongFn() { + @Override + public long apply(long left, long right) { + return Math.max(left, right); + } + + @Override + public long identity() { + return Long.MIN_VALUE; + } + }; + + private static final Combine.BinaryCombineLongFn SUM = Sum.ofLongs(); + + /** For access to topic and SQS client. */ private final SqsUnboundedSource source; + + /** + * The closed state of this {@link SqsUnboundedReader}. If true, the reader has not yet been + * closed, and it will have a non-null value within {@link #SqsUnboundedReader}. + */ + private AtomicBoolean active = new AtomicBoolean(true); + + /** The current message, or {@literal null} if none. */ private Message current; - private final Queue messagesNotYetRead; - private List messagesToDelete; - private Instant oldestPendingTimestamp = BoundedWindow.TIMESTAMP_MIN_VALUE; - public SqsUnboundedReader(SqsUnboundedSource source, SqsCheckpointMark sqsCheckpointMark) { + /** + * Messages we have received from SQS and not yet delivered downstream. We preserve their order. + */ + final Queue messagesNotYetRead; + + /** Message ids of messages we have delivered downstream but not yet deleted. */ + private Set safeToDeleteIds; + + /** + * Visibility timeout, in ms, as set on subscription when we first start reading. Not updated + * thereafter. -1 if not yet determined. + */ + private long visibilityTimeoutMs; + + /** Byte size of undecoded elements in {@link #messagesNotYetRead}. */ + private long notYetReadBytes; + + /** Byte size of recent messages. */ + private EvictingQueue recentMessageBytes; + + /** + * Bucketed map from received time (as system time, ms since epoch) to message timestamps (mssince + * epoch) of all received but not-yet read messages. Used to estimate watermark. + */ + private BucketingFunction minUnreadTimestampMsSinceEpoch; + + /** + * Minimum of timestamps (ms since epoch) of all recently read messages. Used to estimate + * watermark. + */ + private MovingFunction minReadTimestampMsSinceEpoch; + + /** Number of recent empty receives. */ + private MovingFunction numEmptyReceives; + + private static class InFlightState { + /** Receipt handle of message. */ + String receiptHandle; + + /** When request which yielded message was issued. */ + long requestTimeMsSinceEpoch; + + /** + * When SQS will consider this message's visibility timeout to timeout and thus it needs to be + * extended. + */ + long visibilityDeadlineMsSinceEpoch; + + public InFlightState( + String receiptHandle, long requestTimeMsSinceEpoch, long visibilityDeadlineMsSinceEpoch) { + this.receiptHandle = receiptHandle; + this.requestTimeMsSinceEpoch = requestTimeMsSinceEpoch; + this.visibilityDeadlineMsSinceEpoch = visibilityDeadlineMsSinceEpoch; + } + } + + /** + * Map from message ids of messages we have received from SQS but not yet deleted to their in + * flight state. Ordered from earliest to latest visibility deadline. + */ + private final LinkedHashMap inFlight; + + /** + * Batches of successfully deleted message ids which need to be pruned from the above. CAUTION: + * Accessed by both reader and checkpointing threads. 
+ */ + private final Queue> deletedIds; + + /** + * System time (ms since epoch) we last received a message from SQS, or -1 if not yet received any + * messages. + */ + private long lastReceivedMsSinceEpoch; + + /** The last reported watermark (ms since epoch), or beginning of time if none yet reported. */ + private long lastWatermarkMsSinceEpoch; + + /** Stats only: System time (ms since epoch) we last logs stats, or -1 if never. */ + private long lastLogTimestampMsSinceEpoch; + + /** Stats only: Total number of messages received. */ + private long numReceived; + + /** Stats only: Number of messages which have recently been received. */ + private MovingFunction numReceivedRecently; + + /** Stats only: Number of messages which have recently had their deadline extended. */ + private MovingFunction numExtendedDeadlines; + + /** + * Stats only: Number of messages which have recently had their deadline extended even though it + * may be too late to do so. + */ + private MovingFunction numLateDeadlines; + + /** Stats only: Number of messages which have recently been deleted. */ + private MovingFunction numDeleted; + + /** + * Stats only: Number of messages which have recently expired (visibility timeout were extended + * for too long). + */ + private MovingFunction numExpired; + + /** Stats only: Number of messages which have recently been returned to visible on SQS. */ + private MovingFunction numReleased; + + /** Stats only: Number of message bytes which have recently been read by downstream consumer. */ + private MovingFunction numReadBytes; + + /** + * Stats only: Minimum of timestamp (ms since epoch) of all recently received messages. Used to + * estimate timestamp skew. Does not contribute to watermark estimator. + */ + private MovingFunction minReceivedTimestampMsSinceEpoch; + + /** + * Stats only: Maximum of timestamp (ms since epoch) of all recently received messages. Used to + * estimate timestamp skew. + */ + private MovingFunction maxReceivedTimestampMsSinceEpoch; + + /** Stats only: Minimum of recent estimated watermarks (ms since epoch). */ + private MovingFunction minWatermarkMsSinceEpoch; + + /** Stats ony: Maximum of recent estimated watermarks (ms since epoch). */ + private MovingFunction maxWatermarkMsSinceEpoch; + + /** + * Stats only: Number of messages with timestamps strictly behind the estimated watermark at the + * time they are received. These may be considered 'late' by downstream computations. + */ + private MovingFunction numLateMessages; + + /** + * Stats only: Current number of checkpoints in flight. CAUTION: Accessed by both checkpointing + * and reader threads. + */ + AtomicInteger numInFlightCheckpoints; + + /** Stats only: Maximum number of checkpoints in flight at any time. 
*/ + private int maxInFlightCheckpoints; + + private static MovingFunction newFun(Combine.BinaryCombineLongFn function) { + return new MovingFunction( + SAMPLE_PERIOD.getMillis(), + SAMPLE_UPDATE.getMillis(), + MIN_WATERMARK_SPREAD, + MIN_WATERMARK_MESSAGES, + function); + } + + public SqsUnboundedReader(SqsUnboundedSource source, SqsCheckpointMark sqsCheckpointMark) + throws IOException { this.source = source; - this.current = null; - this.messagesNotYetRead = new ArrayDeque<>(); - this.messagesToDelete = new ArrayList<>(); + messagesNotYetRead = new ArrayDeque<>(); + safeToDeleteIds = new HashSet<>(); + inFlight = new LinkedHashMap<>(); + deletedIds = new ConcurrentLinkedQueue<>(); + visibilityTimeoutMs = -1; + notYetReadBytes = 0; + recentMessageBytes = EvictingQueue.create(MAX_AVG_BYTE_MESSAGES); + minUnreadTimestampMsSinceEpoch = + new BucketingFunction( + SAMPLE_UPDATE.getMillis(), MIN_WATERMARK_SPREAD, MIN_WATERMARK_MESSAGES, MIN); + minReadTimestampMsSinceEpoch = newFun(MIN); + numEmptyReceives = newFun(SUM); + lastReceivedMsSinceEpoch = -1; + lastWatermarkMsSinceEpoch = BoundedWindow.TIMESTAMP_MIN_VALUE.getMillis(); + current = null; + lastLogTimestampMsSinceEpoch = -1; + numReceived = 0L; + numReceivedRecently = newFun(SUM); + numExtendedDeadlines = newFun(SUM); + numLateDeadlines = newFun(SUM); + numDeleted = newFun(SUM); + numExpired = newFun(SUM); + numReleased = newFun(SUM); + numReadBytes = newFun(SUM); + minReceivedTimestampMsSinceEpoch = newFun(MIN); + maxReceivedTimestampMsSinceEpoch = newFun(MAX); + minWatermarkMsSinceEpoch = newFun(MIN); + maxWatermarkMsSinceEpoch = newFun(MAX); + numLateMessages = newFun(SUM); + numInFlightCheckpoints = new AtomicInteger(); + maxInFlightCheckpoints = 0; if (sqsCheckpointMark != null) { - this.messagesToDelete.addAll(sqsCheckpointMark.getMessagesToDelete()); + long nowMsSinceEpoch = now(); + extendBatch(nowMsSinceEpoch, sqsCheckpointMark.notYetReadReceipts, 0); + numReleased.add(nowMsSinceEpoch, sqsCheckpointMark.notYetReadReceipts.size()); } } @Override public Instant getWatermark() { - return oldestPendingTimestamp; + + // NOTE: We'll allow the watermark to go backwards. The underlying runner is responsible + // for aggregating all reported watermarks and ensuring the aggregate is latched. + // If we attempt to latch locally then it is possible a temporary starvation of one reader + // could cause its estimated watermark to fast forward to current system time. Then when + // the reader resumes its watermark would be unable to resume tracking. + // By letting the underlying runner latch we avoid any problems due to localized starvation. + long nowMsSinceEpoch = now(); + long readMin = minReadTimestampMsSinceEpoch.get(nowMsSinceEpoch); + long unreadMin = minUnreadTimestampMsSinceEpoch.get(); + if (readMin == Long.MAX_VALUE + && unreadMin == Long.MAX_VALUE + && numEmptyReceives.get(nowMsSinceEpoch) > 0 + && nowMsSinceEpoch > lastReceivedMsSinceEpoch + SAMPLE_PERIOD.getMillis()) { + // We don't currently have any unread messages pending, we have not had any messages + // read for a while, and we have not received any new messages from SQS for a while. + // Advance watermark to current time. + // TODO: Estimate a timestamp lag. + lastWatermarkMsSinceEpoch = nowMsSinceEpoch; + } else if (minReadTimestampMsSinceEpoch.isSignificant() + || minUnreadTimestampMsSinceEpoch.isSignificant()) { + // Take minimum of the timestamps in all unread messages and recently read messages. 
+ lastWatermarkMsSinceEpoch = Math.min(readMin, unreadMin); + } + // else: We're not confident enough to estimate a new watermark. Stick with the old one. + minWatermarkMsSinceEpoch.add(nowMsSinceEpoch, lastWatermarkMsSinceEpoch); + maxWatermarkMsSinceEpoch.add(nowMsSinceEpoch, lastWatermarkMsSinceEpoch); + return new Instant(lastWatermarkMsSinceEpoch); } @Override @@ -86,12 +420,19 @@ public byte[] getCurrentRecordId() throws NoSuchElementException { if (current == null) { throw new NoSuchElementException(); } - return current.getMessageId().getBytes(StandardCharsets.UTF_8); + return current.getMessageId().getBytes(UTF_8); } @Override public CheckpointMark getCheckpointMark() { - return new SqsCheckpointMark(this, messagesToDelete); + int cur = numInFlightCheckpoints.incrementAndGet(); + maxInFlightCheckpoints = Math.max(maxInFlightCheckpoints, cur); + List snapshotSafeToDeleteIds = Lists.newArrayList(safeToDeleteIds); + List snapshotNotYetReadReceipts = new ArrayList<>(messagesNotYetRead.size()); + for (Message message : messagesNotYetRead) { + snapshotNotYetReadReceipts.add(message.getReceiptHandle()); + } + return new SqsCheckpointMark(this, snapshotSafeToDeleteIds, snapshotNotYetReadReceipts); } @Override @@ -100,47 +441,220 @@ public SqsUnboundedSource getCurrentSource() { } @Override - public boolean start() { + public long getTotalBacklogBytes() { + long avgBytes = avgMessageBytes(); + List requestAttributes = + Collections.singletonList(QueueAttributeName.ApproximateNumberOfMessages.toString()); + Map queueAttributes = + source + .getSqs() + .getQueueAttributes(source.getRead().queueUrl(), requestAttributes) + .getAttributes(); + long numMessages = + Long.parseLong( + queueAttributes.get(QueueAttributeName.ApproximateNumberOfMessages.toString())); + + // No messages consumed for estimating average message size + if (avgBytes == -1 && numMessages > 0) { + return BACKLOG_UNKNOWN; + } else { + return numMessages * avgBytes; + } + } + + @Override + public boolean start() throws IOException { + visibilityTimeoutMs = + Integer.parseInt( + source + .getSqs() + .getQueueAttributes( + new GetQueueAttributesRequest(source.getRead().queueUrl()) + .withAttributeNames("VisibilityTimeout")) + .getAttributes() + .get("VisibilityTimeout")) + * 1000L; return advance(); } @Override - public boolean advance() { + public boolean advance() throws IOException { + // Emit stats. + stats(); + + if (current != null) { + // Current is consumed. It can no longer contribute to holding back the watermark. + minUnreadTimestampMsSinceEpoch.remove(getRequestTimeMsSinceEpoch(current)); + current = null; + } + + // Retire state associated with deleted messages. + retire(); + + // Extend all pressing deadlines. + // Will BLOCK until done. + // If the system is pulling messages only to let them sit in a downstream queue then + // this will have the effect of slowing down the pull rate. + // However, if the system is genuinely taking longer to process each message then + // the work to extend visibility timeout would be better done in the background. + extend(); + if (messagesNotYetRead.isEmpty()) { + // Pull another batch. + // Will BLOCK until fetch returns, but will not block until a message is available. pull(); } + // Take one message from queue. current = messagesNotYetRead.poll(); if (current == null) { + // Try again later. 
return false; } - - messagesToDelete.add(current); - - Instant currentMessageTimestamp = getCurrentTimestamp(); - if (getCurrentTimestamp().isBefore(oldestPendingTimestamp)) { - oldestPendingTimestamp = currentMessageTimestamp; + notYetReadBytes -= current.getBody().getBytes(UTF_8).length; + checkState(notYetReadBytes >= 0); + long nowMsSinceEpoch = now(); + numReadBytes.add(nowMsSinceEpoch, current.getBody().getBytes(UTF_8).length); + recentMessageBytes.add(current.getBody().getBytes(UTF_8).length); + minReadTimestampMsSinceEpoch.add(nowMsSinceEpoch, getCurrentTimestamp().getMillis()); + if (getCurrentTimestamp().getMillis() < lastWatermarkMsSinceEpoch) { + numLateMessages.add(nowMsSinceEpoch, 1L); } + // Current message can be considered 'read' and will be persisted by the next + // checkpoint. So it is now safe to delete from SQS. + safeToDeleteIds.add(current.getMessageId()); + return true; } + /** + * {@inheritDoc}. + * + *

    Marks this {@link SqsUnboundedReader} as no longer active. The {@link AmazonSQS} continue to + * exist and be active beyond the life of this call if there are any in-flight checkpoints. When + * no in-flight checkpoints remain, the reader will be closed. + */ @Override - public void close() {} + public void close() throws IOException { + active.set(false); + maybeCloseClient(); + } - void delete(final Collection messages) { - for (Message message : messages) { - if (messagesToDelete.contains(message)) { - source.getSqs().deleteMessage(source.getRead().queueUrl(), message.getReceiptHandle()); - Instant currentMessageTimestamp = getTimestamp(message); - if (currentMessageTimestamp.isAfter(oldestPendingTimestamp)) { - oldestPendingTimestamp = currentMessageTimestamp; - } + /** + * Close this reader's underlying {@link AmazonSQS} if the reader has been closed and there are no + * outstanding checkpoints. + */ + void maybeCloseClient() throws IOException { + if (!active.get() && numInFlightCheckpoints.get() == 0) { + // The reader has been closed and it has no more outstanding checkpoints. The client + // must be closed so it doesn't leak + AmazonSQS client = source.getSqs(); + if (client != null) { + client.shutdown(); } } } + /** delete the provided {@code messageIds} from SQS. */ + void delete(List messageIds) throws IOException { + AtomicInteger counter = new AtomicInteger(); + for (List messageList : + messageIds.stream() + .collect(groupingBy(x -> counter.getAndIncrement() / DELETE_BATCH_SIZE)) + .values()) { + deleteBatch(messageList); + } + } + + /** + * delete the provided {@code messageIds} from SQS, blocking until all of the messages are + * deleted. + * + *

    CAUTION: May be invoked from a separate thread. + * + *

    CAUTION: Retains {@code messageIds}. + */ + private void deleteBatch(List messageIds) throws IOException { + int retries = 0; + List errorMessages = new ArrayList<>(); + Map pendingReceipts = + IntStream.range(0, messageIds.size()) + .boxed() + .filter(i -> inFlight.containsKey(messageIds.get(i))) + .collect(toMap(Object::toString, i -> inFlight.get(messageIds.get(i)).receiptHandle)); + + while (!pendingReceipts.isEmpty()) { + + if (retries >= BATCH_OPERATION_MAX_RETIRES) { + throw new IOException( + "Failed to delete " + + pendingReceipts.size() + + " messages after " + + retries + + " retries: " + + String.join(", ", errorMessages)); + } + + List entries = + pendingReceipts.entrySet().stream() + .map(r -> new DeleteMessageBatchRequestEntry(r.getKey(), r.getValue())) + .collect(Collectors.toList()); + + DeleteMessageBatchResult result = + source.getSqs().deleteMessageBatch(source.getRead().queueUrl(), entries); + + // Retry errors except invalid handles + Set retryErrors = + result.getFailed().stream() + .filter(e -> !e.getCode().equals("ReceiptHandleIsInvalid")) + .collect(Collectors.toSet()); + + pendingReceipts + .keySet() + .retainAll( + retryErrors.stream().map(BatchResultErrorEntry::getId).collect(Collectors.toSet())); + + errorMessages = + retryErrors.stream().map(BatchResultErrorEntry::getMessage).collect(Collectors.toList()); + + retries += 1; + } + deletedIds.add(messageIds); + } + + /** + * Messages which have been deleted (via the checkpoint finalize) are no longer in flight. This is + * only used for flow control and stats. + */ + private void retire() { + long nowMsSinceEpoch = now(); + while (true) { + List ackIds = deletedIds.poll(); + if (ackIds == null) { + return; + } + numDeleted.add(nowMsSinceEpoch, ackIds.size()); + for (String ackId : ackIds) { + inFlight.remove(ackId); + safeToDeleteIds.remove(ackId); + } + } + } + + /** BLOCKING Fetch another batch of messages from SQS. */ private void pull() { + if (inFlight.size() >= MAX_IN_FLIGHT) { + // Wait for checkpoint to be finalized before pulling anymore. + // There may be lag while checkpoints are persisted and the finalizeCheckpoint method + // is invoked. By limiting the in-flight messages we can ensure we don't end up consuming + // messages faster than we can checkpoint them. + return; + } + + long requestTimeMsSinceEpoch = now(); + long deadlineMsSinceEpoch = requestTimeMsSinceEpoch + visibilityTimeoutMs; + final ReceiveMessageRequest receiveMessageRequest = new ReceiveMessageRequest(source.getRead().queueUrl()); @@ -153,17 +667,296 @@ private void pull() { final List messages = receiveMessageResult.getMessages(); if (messages == null || messages.isEmpty()) { + numEmptyReceives.add(requestTimeMsSinceEpoch, 1L); return; } + lastReceivedMsSinceEpoch = requestTimeMsSinceEpoch; + + // Capture the received messages. for (Message message : messages) { + // The Message class does not contain request time. 
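Returning briefly to the delete() helper above: grouping by counter.getAndIncrement() / DELETE_BATCH_SIZE is a compact way to split a list into fixed-size chunks, which matters because the SQS batch APIs accept at most 10 entries per call. A standalone sketch of the same idiom, with a hypothetical batch size constant, is:

```java
// Sketch of the fixed-size partitioning idiom used by delete() above.
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

final class BatchingSketch {
  private static final int BATCH_SIZE = 10; // matches the SQS batch-API limit

  static Collection<List<String>> partition(List<String> ids) {
    AtomicInteger counter = new AtomicInteger();
    // Grouping key is elementIndex / BATCH_SIZE, so keys 0, 1, 2, ... each hold up to 10 ids.
    return ids.stream()
        .collect(Collectors.groupingBy(id -> counter.getAndIncrement() / BATCH_SIZE))
        .values();
  }

  public static void main(String[] args) {
    System.out.println(partition(Arrays.asList("a", "b", "c"))); // [[a, b, c]]
  }
}
```

Within each batch the encounter order of ids is preserved; the order of the batches themselves depends on the map implementation returned by groupingBy.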
+ setRequestTimeMsSinceEpoch(message, requestTimeMsSinceEpoch); messagesNotYetRead.add(message); + notYetReadBytes += message.getBody().getBytes(UTF_8).length; + inFlight.put( + message.getMessageId(), + new InFlightState( + message.getReceiptHandle(), requestTimeMsSinceEpoch, deadlineMsSinceEpoch)); + numReceived++; + numReceivedRecently.add(requestTimeMsSinceEpoch, 1L); + minReceivedTimestampMsSinceEpoch.add( + requestTimeMsSinceEpoch, getTimestamp(message).getMillis()); + maxReceivedTimestampMsSinceEpoch.add( + requestTimeMsSinceEpoch, getTimestamp(message).getMillis()); + minUnreadTimestampMsSinceEpoch.add( + requestTimeMsSinceEpoch, getTimestamp(message).getMillis()); + } + } + + /** Return the current time, in ms since epoch. */ + long now() { + return System.currentTimeMillis(); + } + + /** + * BLOCKING Extend deadline for all messages which need it. CAUTION: If extensions can't keep up + * with wallclock then we'll never return. + */ + private void extend() throws IOException { + while (true) { + long nowMsSinceEpoch = now(); + List assumeExpired = new ArrayList<>(); + List toBeExtended = new ArrayList<>(); + List toBeExpired = new ArrayList<>(); + // Messages will be in increasing deadline order. + for (Map.Entry entry : inFlight.entrySet()) { + if (entry.getValue().visibilityDeadlineMsSinceEpoch + - (visibilityTimeoutMs * VISIBILITY_SAFETY_PCT) / 100 + > nowMsSinceEpoch) { + // All remaining messages don't need their visibility timeouts to be extended. + break; + } + + if (entry.getValue().visibilityDeadlineMsSinceEpoch - VISIBILITY_TOO_LATE.getMillis() + < nowMsSinceEpoch) { + // SQS may have already considered this message to have expired. + // If so it will (eventually) be made available on a future pull request. + // If this message ends up being committed then it will be considered a duplicate + // when re-pulled. + assumeExpired.add(entry.getKey()); + continue; + } + + if (entry.getValue().requestTimeMsSinceEpoch + PROCESSING_TIMEOUT.getMillis() + < nowMsSinceEpoch) { + // This message has been in-flight for too long. + // Give up on it, otherwise we risk extending its visibility timeout indefinitely. + toBeExpired.add(entry.getKey()); + continue; + } + + // Extend the visibility timeout for this message. + toBeExtended.add(entry.getKey()); + if (toBeExtended.size() >= DELETE_BATCH_SIZE) { + // Enough for one batch. + break; + } + } + + if (assumeExpired.isEmpty() && toBeExtended.isEmpty() && toBeExpired.isEmpty()) { + // Nothing to be done. + return; + } + + if (!assumeExpired.isEmpty()) { + // If we didn't make the visibility deadline assume expired and no longer in flight. + numLateDeadlines.add(nowMsSinceEpoch, assumeExpired.size()); + for (String messageId : assumeExpired) { + inFlight.remove(messageId); + } + } + + if (!toBeExpired.isEmpty()) { + // Expired messages are no longer considered in flight. + numExpired.add(nowMsSinceEpoch, toBeExpired.size()); + for (String messageId : toBeExpired) { + inFlight.remove(messageId); + } + } + + if (!toBeExtended.isEmpty()) { + // SQS extends visibility timeout from it's notion of current time. + // We'll try to track that on our side, but note the deadlines won't necessarily agree. + long extensionMs = (int) ((visibilityTimeoutMs * VISIBILITY_EXTENSION_PCT) / 100L); + long newDeadlineMsSinceEpoch = nowMsSinceEpoch + extensionMs; + for (String messageId : toBeExtended) { + // Maintain increasing ack deadline order. 
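A short note on why the remove-then-put just below preserves that ordering: inFlight is a LinkedHashMap, so re-inserting an entry moves it to the tail, and every message extended in this pass receives the same, later deadline. A tiny, self-contained demonstration of that LinkedHashMap behaviour:

```java
// Demonstrates that remove-then-put moves an entry to the end of a LinkedHashMap,
// which is what keeps the in-flight map iterable in increasing deadline order.
import java.util.LinkedHashMap;
import java.util.Map;

final class LinkedHashMapOrderDemo {
  public static void main(String[] args) {
    Map<String, Long> deadlines = new LinkedHashMap<>();
    deadlines.put("msg-1", 100L);
    deadlines.put("msg-2", 200L);

    // Extend msg-1: remove and re-insert with a later deadline.
    deadlines.remove("msg-1");
    deadlines.put("msg-1", 300L);

    // Iteration order is now msg-2 (200), msg-1 (300): still increasing by deadline.
    System.out.println(deadlines); // {msg-2=200, msg-1=300}
  }
}
```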
+ String receiptHandle = inFlight.get(messageId).receiptHandle; + InFlightState state = inFlight.remove(messageId); + + inFlight.put( + messageId, + new InFlightState( + receiptHandle, state.requestTimeMsSinceEpoch, newDeadlineMsSinceEpoch)); + } + List receiptHandles = + toBeExtended.stream() + .map(inFlight::get) + .filter(Objects::nonNull) // get rid of null values + .map(m -> m.receiptHandle) + .collect(Collectors.toList()); + // BLOCKs until extended. + extendBatch(nowMsSinceEpoch, receiptHandles, (int) (extensionMs / 1000)); + } } } + /** + * BLOCKING Extend the visibility timeout for messages from SQS with the given {@code + * receiptHandles}. + */ + void extendBatch(long nowMsSinceEpoch, List receiptHandles, int extensionSec) + throws IOException { + int retries = 0; + int numMessages = receiptHandles.size(); + Map pendingReceipts = + IntStream.range(0, receiptHandles.size()) + .boxed() + .collect(toMap(Object::toString, receiptHandles::get)); + List errorMessages = new ArrayList<>(); + + while (!pendingReceipts.isEmpty()) { + + if (retries >= BATCH_OPERATION_MAX_RETIRES) { + throw new IOException( + "Failed to extend visibility timeout for " + + pendingReceipts.size() + + " messages after " + + retries + + " retries: " + + String.join(", ", errorMessages)); + } + + List entries = + pendingReceipts.entrySet().stream() + .map( + r -> + new ChangeMessageVisibilityBatchRequestEntry(r.getKey(), r.getValue()) + .withVisibilityTimeout(extensionSec)) + .collect(Collectors.toList()); + + ChangeMessageVisibilityBatchResult result = + source.getSqs().changeMessageVisibilityBatch(source.getRead().queueUrl(), entries); + + // Retry errors except invalid handles + Set retryErrors = + result.getFailed().stream() + .filter(e -> !e.getCode().equals("ReceiptHandleIsInvalid")) + .collect(Collectors.toSet()); + + pendingReceipts + .keySet() + .retainAll( + retryErrors.stream().map(BatchResultErrorEntry::getId).collect(Collectors.toSet())); + + errorMessages = + retryErrors.stream().map(BatchResultErrorEntry::getMessage).collect(Collectors.toList()); + + retries += 1; + } + numExtendedDeadlines.add(nowMsSinceEpoch, numMessages); + } + + /** Log stats if time to do so. 
*/ + private void stats() { + long nowMsSinceEpoch = now(); + if (lastLogTimestampMsSinceEpoch < 0) { + lastLogTimestampMsSinceEpoch = nowMsSinceEpoch; + return; + } + long deltaMs = nowMsSinceEpoch - lastLogTimestampMsSinceEpoch; + if (deltaMs < LOG_PERIOD.getMillis()) { + return; + } + + String messageSkew = "unknown"; + long minTimestamp = minReceivedTimestampMsSinceEpoch.get(nowMsSinceEpoch); + long maxTimestamp = maxReceivedTimestampMsSinceEpoch.get(nowMsSinceEpoch); + if (minTimestamp < Long.MAX_VALUE && maxTimestamp > Long.MIN_VALUE) { + messageSkew = (maxTimestamp - minTimestamp) + "ms"; + } + + String watermarkSkew = "unknown"; + long minWatermark = minWatermarkMsSinceEpoch.get(nowMsSinceEpoch); + long maxWatermark = maxWatermarkMsSinceEpoch.get(nowMsSinceEpoch); + if (minWatermark < Long.MAX_VALUE && maxWatermark > Long.MIN_VALUE) { + watermarkSkew = (maxWatermark - minWatermark) + "ms"; + } + + String oldestInFlight = "no"; + String oldestAckId = Iterables.getFirst(inFlight.keySet(), null); + if (oldestAckId != null) { + oldestInFlight = (nowMsSinceEpoch - inFlight.get(oldestAckId).requestTimeMsSinceEpoch) + "ms"; + } + + LOG.debug( + "SQS {} has " + + "{} received messages, " + + "{} current unread messages, " + + "{} current unread bytes, " + + "{} current in-flight msgs, " + + "{} oldest in-flight, " + + "{} current in-flight checkpoints, " + + "{} max in-flight checkpoints, " + + "{} bytes in backlog, " + + "{}B/s recent read, " + + "{} recent received, " + + "{} recent extended, " + + "{} recent late extended, " + + "{} recent deleted, " + + "{} recent released, " + + "{} recent expired, " + + "{} recent message timestamp skew, " + + "{} recent watermark skew, " + + "{} recent late messages, " + + "{} last reported watermark", + source.getRead().queueUrl(), + numReceived, + messagesNotYetRead.size(), + notYetReadBytes, + inFlight.size(), + oldestInFlight, + numInFlightCheckpoints.get(), + maxInFlightCheckpoints, + getTotalBacklogBytes(), + numReadBytes.get(nowMsSinceEpoch) / (SAMPLE_PERIOD.getMillis() / 1000L), + numReceivedRecently.get(nowMsSinceEpoch), + numExtendedDeadlines.get(nowMsSinceEpoch), + numLateDeadlines.get(nowMsSinceEpoch), + numDeleted.get(nowMsSinceEpoch), + numReleased.get(nowMsSinceEpoch), + numExpired.get(nowMsSinceEpoch), + messageSkew, + watermarkSkew, + numLateMessages.get(nowMsSinceEpoch), + new Instant(lastWatermarkMsSinceEpoch)); + + lastLogTimestampMsSinceEpoch = nowMsSinceEpoch; + } + + /** Return the average byte size of all message read. -1 if no message read yet */ + private long avgMessageBytes() { + if (!recentMessageBytes.isEmpty()) { + return (long) recentMessageBytes.stream().mapToDouble(s -> s).average().getAsDouble(); + } else { + return -1L; + } + } + + /** Extract the timestamp from the given {@code message}. */ private Instant getTimestamp(final Message message) { return new Instant( Long.parseLong( message.getAttributes().get(MessageSystemAttributeName.SentTimestamp.toString()))); } + + /** + * Since SQS Message instances does not hold the request timestamp, store a new message attribute + * as the given {@code requestTimeMsSinceEpoch}. + */ + private void setRequestTimeMsSinceEpoch( + final Message message, final long requestTimeMsSinceEpoch) { + Map attributes = new HashMap<>(); + attributes.put( + "requestTimeMsSinceEpoch", + new MessageAttributeValue().withStringValue(Long.toString(requestTimeMsSinceEpoch))); + message.setMessageAttributes(attributes); + } + + /** Extract the request timestamp from the given {@code message}. 
*/ + private Long getRequestTimeMsSinceEpoch(final Message message) { + return Long.parseLong( + message.getMessageAttributes().get("requestTimeMsSinceEpoch").getStringValue()); + } } diff --git a/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/sqs/SqsUnboundedSource.java b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/sqs/SqsUnboundedSource.java index 5dcfce058267..1161bff61b16 100644 --- a/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/sqs/SqsUnboundedSource.java +++ b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/sqs/SqsUnboundedSource.java @@ -20,6 +20,7 @@ import com.amazonaws.services.sqs.AmazonSQS; import com.amazonaws.services.sqs.AmazonSQSClientBuilder; import com.amazonaws.services.sqs.model.Message; +import java.io.IOException; import java.io.Serializable; import java.util.ArrayList; import java.util.List; @@ -68,7 +69,11 @@ public List split(int desiredNumSplits, PipelineOptions opti @Override public UnboundedReader createReader( PipelineOptions options, @Nullable SqsCheckpointMark checkpointMark) { - return new SqsUnboundedReader(this, checkpointMark); + try { + return new SqsUnboundedReader(this, checkpointMark); + } catch (IOException e) { + throw new RuntimeException("Unable to subscribe to " + read.queueUrl() + ": ", e); + } } @Override diff --git a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/coders/AwsCodersTest.java b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/coders/AwsCodersTest.java index 92b86ddf4eba..46ce02de7993 100644 --- a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/coders/AwsCodersTest.java +++ b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/coders/AwsCodersTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.io.aws.coders; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import com.amazonaws.ResponseMetadata; import com.amazonaws.http.HttpResponse; @@ -29,9 +29,6 @@ import org.junit.Test; /** Tests for AWS coders. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class AwsCodersTest { @Test diff --git a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/dynamodb/AwsClientsProviderMock.java b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/dynamodb/AwsClientsProviderMock.java index 8cef61954875..dfcf302e0062 100644 --- a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/dynamodb/AwsClientsProviderMock.java +++ b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/dynamodb/AwsClientsProviderMock.java @@ -22,9 +22,6 @@ import org.mockito.Mockito; /** Mocking AwsClientProvider. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class AwsClientsProviderMock implements AwsClientsProvider { private static AwsClientsProviderMock instance = new AwsClientsProviderMock(); diff --git a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/dynamodb/DynamoDBIOTest.java b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/dynamodb/DynamoDBIOTest.java index 9e737283bb2d..85f1d01e52fd 100644 --- a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/dynamodb/DynamoDBIOTest.java +++ b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/dynamodb/DynamoDBIOTest.java @@ -27,31 +27,32 @@ import com.amazonaws.services.dynamodbv2.model.ScanRequest; import com.amazonaws.services.dynamodbv2.model.WriteRequest; import java.io.Serializable; +import java.util.Arrays; +import java.util.HashSet; import java.util.List; import java.util.Map; +import java.util.stream.Collectors; import org.apache.beam.sdk.Pipeline; import org.apache.beam.sdk.testing.ExpectedLogs; import org.apache.beam.sdk.testing.PAssert; import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.transforms.Count; import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.ParDo; import org.apache.beam.sdk.transforms.SerializableFunction; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollection; import org.joda.time.Duration; import org.junit.AfterClass; import org.junit.BeforeClass; -import org.junit.Ignore; import org.junit.Rule; import org.junit.Test; import org.junit.rules.ExpectedException; +import org.mockito.ArgumentCaptor; import org.mockito.Mockito; /** Test Coverage for the IO. */ -@Ignore("[BEAM-7794] DynamoDBIOTest is blocking forever") -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DynamoDBIOTest implements Serializable { @Rule public final transient TestPipeline pipeline = TestPipeline.create(); @Rule public final transient ExpectedLogs expectedLogs = ExpectedLogs.none(DynamoDBIO.class); @@ -89,6 +90,27 @@ public void testReadScanResult() { pipeline.run().waitUntilFinish(); } + @Test + public void testReadScanResultWithLimit() { + // Maximum number of records in scan result + final int limit = 5; + + PCollection> actual = + pipeline + .apply( + DynamoDBIO.>>read() + .withAwsClientsProvider( + AwsClientsProviderMock.of(DynamoDBIOTestHelper.getDynamoDBClient())) + .withScanRequestFn( + (SerializableFunction) + input -> + new ScanRequest(tableName).withTotalSegments(1).withLimit(limit)) + .items()) + .apply(ParDo.of(new IterateListDoFn())); + PAssert.that(actual).containsInAnyOrder(expected); + pipeline.run().waitUntilFinish(); + } + // Test cases for Reader's arguments. @Test public void testMissingScanRequestFn() { @@ -215,4 +237,77 @@ public void testRetries() throws Throwable { } fail("Pipeline is expected to fail because we were unable to write to DynamoDB."); } + + /** + * A DoFn used to generate outputs duplicated N times, where N is the input. Used to generate + * bundles with duplicate elements. 
+ */ + private static class WriteDuplicateGeneratorDoFn extends DoFn { + @ProcessElement + public void processElement(ProcessContext ctx) { + for (int i = 0; i < ctx.element(); i++) { + DynamoDBIOTestHelper.generateWriteRequests(numOfItems).forEach(ctx::output); + } + } + } + + @Test + public void testWriteDeduplication() { + // designate duplication factor for each bundle + final List duplications = Arrays.asList(1, 2, 3); + + final List deduplicateKeys = + Arrays.asList(DynamoDBIOTestHelper.ATTR_NAME_1, DynamoDBIOTestHelper.ATTR_NAME_2); + + AmazonDynamoDB amazonDynamoDBMock = Mockito.mock(AmazonDynamoDB.class); + + pipeline + .apply(Create.of(duplications)) + .apply("duplicate", ParDo.of(new WriteDuplicateGeneratorDoFn())) + .apply( + DynamoDBIO.write() + .withWriteRequestMapperFn( + (SerializableFunction>) + writeRequest -> KV.of(tableName, writeRequest)) + .withRetryConfiguration( + DynamoDBIO.RetryConfiguration.create(5, Duration.standardMinutes(1))) + .withAwsClientsProvider(AwsClientsProviderMock.of(amazonDynamoDBMock)) + .withDeduplicateKeys(deduplicateKeys)); + + pipeline.run().waitUntilFinish(); + + ArgumentCaptor argumentCaptor = + ArgumentCaptor.forClass(BatchWriteItemRequest.class); + Mockito.verify(amazonDynamoDBMock, Mockito.times(3)).batchWriteItem(argumentCaptor.capture()); + List batchRequests = argumentCaptor.getAllValues(); + batchRequests.forEach( + batchRequest -> { + List requests = batchRequest.getRequestItems().get(tableName); + // assert that each bundle contains expected number of items + assertEquals(numOfItems, requests.size()); + List> requestKeys = + requests.stream() + .map( + request -> + request.getPutRequest() != null + ? request.getPutRequest().getItem() + : request.getDeleteRequest().getKey()) + .collect(Collectors.toList()); + // assert no duplicate keys in each bundle + assertEquals(new HashSet<>(requestKeys).size(), requestKeys.size()); + }); + } + + private static class IterateListDoFn + extends DoFn>, Map> { + + @ProcessElement + public void processElement( + @Element List> items, + OutputReceiver> out) { + for (Map item : items) { + out.output(item); + } + } + } } diff --git a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/dynamodb/DynamoDBIOTestHelper.java b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/dynamodb/DynamoDBIOTestHelper.java index 0eaabee7382d..3ba22cdf28ed 100644 --- a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/dynamodb/DynamoDBIOTestHelper.java +++ b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/dynamodb/DynamoDBIOTestHelper.java @@ -45,11 +45,8 @@ import org.testcontainers.utility.DockerImageName; /** A utility to generate test table and data for {@link DynamoDBIOTest}. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) class DynamoDBIOTestHelper implements Serializable { - private static final String LOCALSTACK_VERSION = "0.11.3"; + private static final String LOCALSTACK_VERSION = "0.11.4"; @Rule private static final LocalStackContainer localStackContainer = diff --git a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/options/AwsModuleTest.java b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/options/AwsModuleTest.java index d49d60ed6678..0e318c9989aa 100644 --- a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/options/AwsModuleTest.java +++ b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/options/AwsModuleTest.java @@ -17,14 +17,15 @@ */ package org.apache.beam.sdk.io.aws.options; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.hasItem; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import com.amazonaws.ClientConfiguration; import com.amazonaws.auth.AWSCredentialsProvider; import com.amazonaws.auth.AWSStaticCredentialsProvider; import com.amazonaws.auth.BasicAWSCredentials; +import com.amazonaws.auth.BasicSessionCredentials; import com.amazonaws.auth.ClasspathPropertiesFileCredentialsProvider; import com.amazonaws.auth.DefaultAWSCredentialsProviderChain; import com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper; @@ -39,6 +40,8 @@ import com.fasterxml.jackson.databind.ObjectMapper; import java.lang.reflect.Field; import java.util.List; +import org.apache.beam.runners.core.construction.PipelineOptionsTranslation; +import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.util.common.ReflectHelpers; import org.hamcrest.Matchers; import org.junit.Test; @@ -47,9 +50,6 @@ /** Tests {@link AwsModule}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class AwsModuleTest { private final ObjectMapper objectMapper = new ObjectMapper().registerModule(new AwsModule()); @@ -79,6 +79,20 @@ public void testAWSStaticCredentialsProviderSerializationDeserialization() throw assertEquals( credentialsProvider.getCredentials().getAWSSecretKey(), deserializedCredentialsProvider.getCredentials().getAWSSecretKey()); + + String sessionToken = "session-token"; + BasicSessionCredentials sessionCredentials = + new BasicSessionCredentials(awsKeyId, awsSecretKey, sessionToken); + credentialsProvider = new AWSStaticCredentialsProvider(sessionCredentials); + serializedCredentialsProvider = objectMapper.writeValueAsString(credentialsProvider); + deserializedCredentialsProvider = + objectMapper.readValue(serializedCredentialsProvider, AWSCredentialsProvider.class); + BasicSessionCredentials deserializedCredentials = + (BasicSessionCredentials) deserializedCredentialsProvider.getCredentials(); + assertEquals(credentialsProvider.getClass(), deserializedCredentialsProvider.getClass()); + assertEquals(deserializedCredentials.getAWSAccessKeyId(), awsKeyId); + assertEquals(deserializedCredentials.getAWSSecretKey(), awsSecretKey); + assertEquals(deserializedCredentials.getSessionToken(), sessionToken); } @Test @@ -242,4 +256,22 @@ public void testAwsHttpClientConfigurationSerializationDeserialization() throws assertEquals(1000, clientConfigurationDeserialized.getConnectionMaxIdleMillis()); assertEquals(300, clientConfigurationDeserialized.getSocketTimeout()); } + + @Test + public void testAwsHttpClientConfigurationSerializationDeserializationProto() throws Exception { + AwsOptions awsOptions = + PipelineOptionsTranslation.fromProto( + PipelineOptionsTranslation.toProto( + PipelineOptionsFactory.fromArgs( + "--clientConfiguration={ \"connectionTimeout\": 100, \"connectionMaxIdleTime\": 1000, \"socketTimeout\": 300, \"proxyPort\": -1, \"requestTimeout\": 1500 }") + .create())) + .as(AwsOptions.class); + ClientConfiguration clientConfigurationDeserialized = awsOptions.getClientConfiguration(); + + assertEquals(100, clientConfigurationDeserialized.getConnectionTimeout()); + assertEquals(1000, clientConfigurationDeserialized.getConnectionMaxIdleMillis()); + assertEquals(300, clientConfigurationDeserialized.getSocketTimeout()); + assertEquals(-1, clientConfigurationDeserialized.getProxyPort()); + assertEquals(1500, clientConfigurationDeserialized.getRequestTimeout()); + } } diff --git a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/s3/MatchResultMatcher.java b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/s3/MatchResultMatcher.java index b65613192b8e..2766a64b8be0 100644 --- a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/s3/MatchResultMatcher.java +++ b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/s3/MatchResultMatcher.java @@ -33,9 +33,6 @@ * Hamcrest {@link Matcher} to match {@link MatchResult}. Necessary because {@link * MatchResult#metadata()} throws an exception under normal circumstances. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) class MatchResultMatcher extends BaseMatcher { private final MatchResult.Status expectedStatus; diff --git a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/s3/S3FileSystemTest.java b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/s3/S3FileSystemTest.java index 6b8e5086f13f..e3ff63a207bc 100644 --- a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/s3/S3FileSystemTest.java +++ b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/s3/S3FileSystemTest.java @@ -18,16 +18,19 @@ package org.apache.beam.sdk.io.aws.s3; import static org.apache.beam.sdk.io.aws.s3.S3TestUtils.buildMockedS3FileSystem; -import static org.apache.beam.sdk.io.aws.s3.S3TestUtils.getSSECustomerKeyMd5; +import static org.apache.beam.sdk.io.aws.s3.S3TestUtils.s3Config; +import static org.apache.beam.sdk.io.aws.s3.S3TestUtils.s3ConfigWithCustomEndpointAndPathStyleAccessEnabled; +import static org.apache.beam.sdk.io.aws.s3.S3TestUtils.s3ConfigWithSSECustomerKey; import static org.apache.beam.sdk.io.aws.s3.S3TestUtils.s3Options; import static org.apache.beam.sdk.io.aws.s3.S3TestUtils.s3OptionsWithCustomEndpointAndPathStyleAccessEnabled; import static org.apache.beam.sdk.io.aws.s3.S3TestUtils.s3OptionsWithSSECustomerKey; +import static org.apache.beam.sdk.io.aws.s3.S3TestUtils.toMd5; import static org.apache.beam.sdk.io.fs.CreateOptions.StandardCreateOptions.builder; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.junit.Assert.assertArrayEquals; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.mockito.ArgumentMatchers.any; import static org.mockito.ArgumentMatchers.argThat; import static org.mockito.Matchers.anyObject; @@ -57,6 +60,7 @@ import com.amazonaws.services.s3.model.ListObjectsV2Result; import com.amazonaws.services.s3.model.ObjectMetadata; import com.amazonaws.services.s3.model.S3ObjectSummary; +import com.amazonaws.services.s3.model.SSECustomerKey; import io.findify.s3mock.S3Mock; import java.io.FileNotFoundException; import java.io.IOException; @@ -80,9 +84,6 @@ /** Test case for {@link S3FileSystem}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class S3FileSystemTest { private static S3Mock api; private static AmazonS3 client; @@ -110,12 +111,29 @@ public static void afterClass() { @Test public void testGetScheme() { + S3FileSystem s3FileSystem = new S3FileSystem(s3Config("s3")); + assertEquals("s3", s3FileSystem.getScheme()); + + s3FileSystem = new S3FileSystem(s3Config("other")); + assertEquals("other", s3FileSystem.getScheme()); + } + + @Test + public void testGetSchemeWithS3Options() { S3FileSystem s3FileSystem = new S3FileSystem(s3Options()); assertEquals("s3", s3FileSystem.getScheme()); } @Test public void testGetPathStyleAccessEnabled() throws URISyntaxException { + S3FileSystem s3FileSystem = + new S3FileSystem(s3ConfigWithCustomEndpointAndPathStyleAccessEnabled("s3")); + URL s3Url = s3FileSystem.getAmazonS3Client().getUrl("bucket", "file"); + assertEquals("https://s3.custom.dns/bucket/file", s3Url.toURI().toString()); + } + + @Test + public void testGetPathStyleAccessEnabledWithS3Options() throws URISyntaxException { S3FileSystem s3FileSystem = new S3FileSystem(s3OptionsWithCustomEndpointAndPathStyleAccessEnabled()); URL s3Url = s3FileSystem.getAmazonS3Client().getUrl("bucket", "file"); @@ -124,45 +142,64 @@ public void testGetPathStyleAccessEnabled() throws URISyntaxException { @Test public void testCopy() throws IOException { + testCopy(s3Config("s3")); + testCopy(s3Config("other")); + testCopy(s3ConfigWithSSECustomerKey("s3")); + testCopy(s3ConfigWithSSECustomerKey("other")); + } + + @Test + public void testCopyWithS3Options() throws IOException { testCopy(s3Options()); testCopy(s3OptionsWithSSECustomerKey()); } private GetObjectMetadataRequest createObjectMetadataRequest( - S3ResourceId path, S3Options options) { + S3ResourceId path, SSECustomerKey sseCustomerKey) { GetObjectMetadataRequest getObjectMetadataRequest = new GetObjectMetadataRequest(path.getBucket(), path.getKey()); - getObjectMetadataRequest.setSSECustomerKey(options.getSSECustomerKey()); + getObjectMetadataRequest.setSSECustomerKey(sseCustomerKey); return getObjectMetadataRequest; } private void assertGetObjectMetadata( S3FileSystem s3FileSystem, GetObjectMetadataRequest request, - S3Options options, + String sseCustomerKeyMd5, ObjectMetadata objectMetadata) { when(s3FileSystem .getAmazonS3Client() .getObjectMetadata(argThat(new GetObjectMetadataRequestMatcher(request)))) .thenReturn(objectMetadata); assertEquals( - getSSECustomerKeyMd5(options), + sseCustomerKeyMd5, s3FileSystem.getAmazonS3Client().getObjectMetadata(request).getSSECustomerKeyMd5()); } + private void testCopy(S3FileSystemConfiguration config) throws IOException { + testCopy(buildMockedS3FileSystem(config), config.getSSECustomerKey()); + } + private void testCopy(S3Options options) throws IOException { - S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Options()); + testCopy(buildMockedS3FileSystem(options), options.getSSECustomerKey()); + } - S3ResourceId sourcePath = S3ResourceId.fromUri("s3://bucket/from"); - S3ResourceId destinationPath = S3ResourceId.fromUri("s3://bucket/to"); + private void testCopy(S3FileSystem s3FileSystem, SSECustomerKey sseCustomerKey) + throws IOException { + S3ResourceId sourcePath = S3ResourceId.fromUri(s3FileSystem.getScheme() + "://bucket/from"); + S3ResourceId destinationPath = S3ResourceId.fromUri(s3FileSystem.getScheme() + "://bucket/to"); ObjectMetadata objectMetadata = new ObjectMetadata(); 
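The stubbing throughout this test class pairs Mockito's argThat with a request matcher so that getObjectMetadata only answers for the expected bucket and key. A minimal sketch of such a matcher is shown below; it is illustrative, assumes Mockito 2+ (where ArgumentMatcher is an interface), and is not necessarily the GetObjectMetadataRequestMatcher defined elsewhere in this file.

```java
// Minimal sketch of an ArgumentMatcher comparing GetObjectMetadataRequests by bucket and key.
import com.amazonaws.services.s3.model.GetObjectMetadataRequest;
import java.util.Objects;
import org.mockito.ArgumentMatcher;

final class MetadataRequestByBucketAndKey implements ArgumentMatcher<GetObjectMetadataRequest> {
  private final GetObjectMetadataRequest expected;

  MetadataRequestByBucketAndKey(GetObjectMetadataRequest expected) {
    this.expected = expected;
  }

  @Override
  public boolean matches(GetObjectMetadataRequest actual) {
    return actual != null
        && Objects.equals(expected.getBucketName(), actual.getBucketName())
        && Objects.equals(expected.getKey(), actual.getKey());
  }
}
```

Typical usage would be when(client.getObjectMetadata(argThat(new MetadataRequestByBucketAndKey(expected)))).thenReturn(metadata);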
objectMetadata.setContentLength(0); - if (getSSECustomerKeyMd5(options) != null) { - objectMetadata.setSSECustomerKeyMd5(getSSECustomerKeyMd5(options)); + String sseCustomerKeyMd5 = toMd5(sseCustomerKey); + if (sseCustomerKeyMd5 != null) { + objectMetadata.setSSECustomerKeyMd5(sseCustomerKeyMd5); } assertGetObjectMetadata( - s3FileSystem, createObjectMetadataRequest(sourcePath, options), options, objectMetadata); + s3FileSystem, + createObjectMetadataRequest(sourcePath, sseCustomerKey), + sseCustomerKeyMd5, + objectMetadata); s3FileSystem.copy(sourcePath, destinationPath); @@ -171,7 +208,10 @@ private void testCopy(S3Options options) throws IOException { // we simulate a big object >= 5GB so it takes the multiPart path objectMetadata.setContentLength(5_368_709_120L); assertGetObjectMetadata( - s3FileSystem, createObjectMetadataRequest(sourcePath, options), options, objectMetadata); + s3FileSystem, + createObjectMetadataRequest(sourcePath, sseCustomerKey), + sseCustomerKeyMd5, + objectMetadata); try { s3FileSystem.copy(sourcePath, destinationPath); @@ -184,19 +224,34 @@ private void testCopy(S3Options options) throws IOException { @Test public void testAtomicCopy() { + testAtomicCopy(s3Config("s3")); + testAtomicCopy(s3Config("other")); + testAtomicCopy(s3ConfigWithSSECustomerKey("s3")); + testAtomicCopy(s3ConfigWithSSECustomerKey("other")); + } + + @Test + public void testAtomicCopyWithS3Options() { testAtomicCopy(s3Options()); testAtomicCopy(s3OptionsWithSSECustomerKey()); } + private void testAtomicCopy(S3FileSystemConfiguration config) { + testAtomicCopy(buildMockedS3FileSystem(config), config.getSSECustomerKey()); + } + private void testAtomicCopy(S3Options options) { - S3FileSystem s3FileSystem = buildMockedS3FileSystem(options); + testAtomicCopy(buildMockedS3FileSystem(options), options.getSSECustomerKey()); + } - S3ResourceId sourcePath = S3ResourceId.fromUri("s3://bucket/from"); - S3ResourceId destinationPath = S3ResourceId.fromUri("s3://bucket/to"); + private void testAtomicCopy(S3FileSystem s3FileSystem, SSECustomerKey sseCustomerKey) { + S3ResourceId sourcePath = S3ResourceId.fromUri(s3FileSystem.getScheme() + "://bucket/from"); + S3ResourceId destinationPath = S3ResourceId.fromUri(s3FileSystem.getScheme() + "://bucket/to"); CopyObjectResult copyObjectResult = new CopyObjectResult(); - if (getSSECustomerKeyMd5(options) != null) { - copyObjectResult.setSSECustomerKeyMd5(getSSECustomerKeyMd5(options)); + String sseCustomerKeyMd5 = toMd5(sseCustomerKey); + if (sseCustomerKeyMd5 != null) { + copyObjectResult.setSSECustomerKeyMd5(sseCustomerKeyMd5); } CopyObjectRequest copyObjectRequest = new CopyObjectRequest( @@ -204,12 +259,12 @@ private void testAtomicCopy(S3Options options) { sourcePath.getKey(), destinationPath.getBucket(), destinationPath.getKey()); - copyObjectRequest.setSourceSSECustomerKey(options.getSSECustomerKey()); - copyObjectRequest.setDestinationSSECustomerKey(options.getSSECustomerKey()); + copyObjectRequest.setSourceSSECustomerKey(sseCustomerKey); + copyObjectRequest.setDestinationSSECustomerKey(sseCustomerKey); when(s3FileSystem.getAmazonS3Client().copyObject(any(CopyObjectRequest.class))) .thenReturn(copyObjectResult); assertEquals( - getSSECustomerKeyMd5(options), + sseCustomerKeyMd5, s3FileSystem.getAmazonS3Client().copyObject(copyObjectRequest).getSSECustomerKeyMd5()); ObjectMetadata sourceS3ObjectMetadata = new ObjectMetadata(); @@ -220,28 +275,50 @@ private void testAtomicCopy(S3Options options) { @Test public void testMultipartCopy() { + 
testMultipartCopy(s3Config("s3")); + testMultipartCopy(s3Config("other")); + testMultipartCopy(s3ConfigWithSSECustomerKey("s3")); + testMultipartCopy(s3ConfigWithSSECustomerKey("other")); + } + + @Test + public void testMultipartCopyWithS3Options() { testMultipartCopy(s3Options()); testMultipartCopy(s3OptionsWithSSECustomerKey()); } + private void testMultipartCopy(S3FileSystemConfiguration config) { + testMultipartCopy( + buildMockedS3FileSystem(config), + config.getSSECustomerKey(), + config.getS3UploadBufferSizeBytes()); + } + private void testMultipartCopy(S3Options options) { - S3FileSystem s3FileSystem = buildMockedS3FileSystem(options); + testMultipartCopy( + buildMockedS3FileSystem(options), + options.getSSECustomerKey(), + options.getS3UploadBufferSizeBytes()); + } - S3ResourceId sourcePath = S3ResourceId.fromUri("s3://bucket/from"); - S3ResourceId destinationPath = S3ResourceId.fromUri("s3://bucket/to"); + private void testMultipartCopy( + S3FileSystem s3FileSystem, SSECustomerKey sseCustomerKey, long s3UploadBufferSizeBytes) { + S3ResourceId sourcePath = S3ResourceId.fromUri(s3FileSystem.getScheme() + "://bucket/from"); + S3ResourceId destinationPath = S3ResourceId.fromUri(s3FileSystem.getScheme() + "://bucket/to"); InitiateMultipartUploadResult initiateMultipartUploadResult = new InitiateMultipartUploadResult(); initiateMultipartUploadResult.setUploadId("upload-id"); - if (getSSECustomerKeyMd5(options) != null) { - initiateMultipartUploadResult.setSSECustomerKeyMd5(getSSECustomerKeyMd5(options)); + String sseCustomerKeyMd5 = toMd5(sseCustomerKey); + if (sseCustomerKeyMd5 != null) { + initiateMultipartUploadResult.setSSECustomerKeyMd5(sseCustomerKeyMd5); } when(s3FileSystem .getAmazonS3Client() .initiateMultipartUpload(any(InitiateMultipartUploadRequest.class))) .thenReturn(initiateMultipartUploadResult); assertEquals( - getSSECustomerKeyMd5(options), + sseCustomerKeyMd5, s3FileSystem .getAmazonS3Client() .initiateMultipartUpload( @@ -250,32 +327,32 @@ private void testMultipartCopy(S3Options options) { .getSSECustomerKeyMd5()); ObjectMetadata sourceObjectMetadata = new ObjectMetadata(); - sourceObjectMetadata.setContentLength((long) (options.getS3UploadBufferSizeBytes() * 1.5)); + sourceObjectMetadata.setContentLength((long) (s3UploadBufferSizeBytes * 1.5)); sourceObjectMetadata.setContentEncoding("read-seek-efficient"); - if (getSSECustomerKeyMd5(options) != null) { - sourceObjectMetadata.setSSECustomerKeyMd5(getSSECustomerKeyMd5(options)); + if (sseCustomerKeyMd5 != null) { + sourceObjectMetadata.setSSECustomerKeyMd5(sseCustomerKeyMd5); } assertGetObjectMetadata( s3FileSystem, - createObjectMetadataRequest(sourcePath, options), - options, + createObjectMetadataRequest(sourcePath, sseCustomerKey), + sseCustomerKeyMd5, sourceObjectMetadata); CopyPartResult copyPartResult1 = new CopyPartResult(); copyPartResult1.setETag("etag-1"); CopyPartResult copyPartResult2 = new CopyPartResult(); copyPartResult1.setETag("etag-2"); - if (getSSECustomerKeyMd5(options) != null) { - copyPartResult1.setSSECustomerKeyMd5(getSSECustomerKeyMd5(options)); - copyPartResult2.setSSECustomerKeyMd5(getSSECustomerKeyMd5(options)); + if (sseCustomerKeyMd5 != null) { + copyPartResult1.setSSECustomerKeyMd5(sseCustomerKeyMd5); + copyPartResult2.setSSECustomerKeyMd5(sseCustomerKeyMd5); } CopyPartRequest copyPartRequest = new CopyPartRequest(); - copyPartRequest.setSourceSSECustomerKey(options.getSSECustomerKey()); + copyPartRequest.setSourceSSECustomerKey(sseCustomerKey); 
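For orientation, the calls stubbed below (initiateMultipartUpload, copyPart) follow the standard S3 multipart-copy sequence: initiate an upload, copy byte ranges of the source object as parts, then complete the upload with the collected part ETags. A rough sketch of that sequence against the v1 SDK follows; the 5 MiB part size is an assumption for illustration, and SSE headers and error handling are omitted.

```java
// Rough sketch of an S3 multipart copy with the v1 SDK (illustrative, not Beam's implementation).
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest;
import com.amazonaws.services.s3.model.CopyPartRequest;
import com.amazonaws.services.s3.model.CopyPartResult;
import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
import com.amazonaws.services.s3.model.PartETag;
import java.util.ArrayList;
import java.util.List;

final class MultipartCopySketch {
  static void copy(AmazonS3 s3, String srcBucket, String srcKey,
                   String dstBucket, String dstKey, long objectSize) {
    String uploadId =
        s3.initiateMultipartUpload(new InitiateMultipartUploadRequest(dstBucket, dstKey))
            .getUploadId();

    long partSize = 5L * 1024 * 1024; // assumed minimum part size for illustration
    List<PartETag> partETags = new ArrayList<>();
    int partNumber = 1;
    for (long firstByte = 0; firstByte < objectSize; firstByte += partSize, partNumber++) {
      long lastByte = Math.min(firstByte + partSize - 1, objectSize - 1);
      CopyPartResult result =
          s3.copyPart(
              new CopyPartRequest()
                  .withSourceBucketName(srcBucket)
                  .withSourceKey(srcKey)
                  .withDestinationBucketName(dstBucket)
                  .withDestinationKey(dstKey)
                  .withUploadId(uploadId)
                  .withPartNumber(partNumber)
                  .withFirstByte(firstByte)
                  .withLastByte(lastByte));
      partETags.add(result.getPartETag());
    }

    s3.completeMultipartUpload(
        new CompleteMultipartUploadRequest(dstBucket, dstKey, uploadId, partETags));
  }
}
```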
when(s3FileSystem.getAmazonS3Client().copyPart(any(CopyPartRequest.class))) .thenReturn(copyPartResult1) .thenReturn(copyPartResult2); assertEquals( - getSSECustomerKeyMd5(options), + sseCustomerKeyMd5, s3FileSystem.getAmazonS3Client().copyPart(copyPartRequest).getSSECustomerKeyMd5()); s3FileSystem.multipartCopy(sourcePath, destinationPath, sourceObjectMetadata); @@ -286,6 +363,29 @@ private void testMultipartCopy(S3Options options) { @Test public void deleteThousandsOfObjectsInMultipleBuckets() throws IOException { + S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Config("s3")); + + List buckets = ImmutableList.of("bucket1", "bucket2"); + List keys = new ArrayList<>(); + for (int i = 0; i < 2500; i++) { + keys.add(String.format("key-%d", i)); + } + List paths = new ArrayList<>(); + for (String bucket : buckets) { + for (String key : keys) { + paths.add(S3ResourceId.fromComponents("s3", bucket, key)); + } + } + + s3FileSystem.delete(paths); + + // Should require 6 calls to delete 2500 objects in each of 2 buckets. + verify(s3FileSystem.getAmazonS3Client(), times(6)) + .deleteObjects(any(DeleteObjectsRequest.class)); + } + + @Test + public void deleteThousandsOfObjectsInMultipleBucketsWithS3Options() throws IOException { S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Options()); List buckets = ImmutableList.of("bucket1", "bucket2"); @@ -296,7 +396,7 @@ public void deleteThousandsOfObjectsInMultipleBuckets() throws IOException { List paths = new ArrayList<>(); for (String bucket : buckets) { for (String key : keys) { - paths.add(S3ResourceId.fromComponents(bucket, key)); + paths.add(S3ResourceId.fromComponents("s3", bucket, key)); } } @@ -309,6 +409,37 @@ public void deleteThousandsOfObjectsInMultipleBuckets() throws IOException { @Test public void matchNonGlob() { + S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Config("mys3")); + + S3ResourceId path = S3ResourceId.fromUri("mys3://testbucket/testdirectory/filethatexists"); + long lastModifiedMillis = 1540000000000L; + ObjectMetadata s3ObjectMetadata = new ObjectMetadata(); + s3ObjectMetadata.setContentLength(100); + s3ObjectMetadata.setContentEncoding("read-seek-efficient"); + s3ObjectMetadata.setLastModified(new Date(lastModifiedMillis)); + when(s3FileSystem + .getAmazonS3Client() + .getObjectMetadata( + argThat( + new GetObjectMetadataRequestMatcher( + new GetObjectMetadataRequest(path.getBucket(), path.getKey()))))) + .thenReturn(s3ObjectMetadata); + + MatchResult result = s3FileSystem.matchNonGlobPath(path); + assertThat( + result, + MatchResultMatcher.create( + ImmutableList.of( + MatchResult.Metadata.builder() + .setSizeBytes(100) + .setLastModifiedMillis(lastModifiedMillis) + .setResourceId(path) + .setIsReadSeekEfficient(true) + .build()))); + } + + @Test + public void matchNonGlobWithS3Options() { S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Options()); S3ResourceId path = S3ResourceId.fromUri("s3://testbucket/testdirectory/filethatexists"); @@ -340,6 +471,37 @@ public void matchNonGlob() { @Test public void matchNonGlobNotReadSeekEfficient() { + S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Config("s3")); + + S3ResourceId path = S3ResourceId.fromUri("s3://testbucket/testdirectory/filethatexists"); + long lastModifiedMillis = 1540000000000L; + ObjectMetadata s3ObjectMetadata = new ObjectMetadata(); + s3ObjectMetadata.setContentLength(100); + s3ObjectMetadata.setLastModified(new Date(lastModifiedMillis)); + s3ObjectMetadata.setContentEncoding("gzip"); + when(s3FileSystem + .getAmazonS3Client() + 
.getObjectMetadata( + argThat( + new GetObjectMetadataRequestMatcher( + new GetObjectMetadataRequest(path.getBucket(), path.getKey()))))) + .thenReturn(s3ObjectMetadata); + + MatchResult result = s3FileSystem.matchNonGlobPath(path); + assertThat( + result, + MatchResultMatcher.create( + ImmutableList.of( + MatchResult.Metadata.builder() + .setSizeBytes(100) + .setLastModifiedMillis(lastModifiedMillis) + .setResourceId(path) + .setIsReadSeekEfficient(false) + .build()))); + } + + @Test + public void matchNonGlobNotReadSeekEfficientWithS3Options() { S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Options()); S3ResourceId path = S3ResourceId.fromUri("s3://testbucket/testdirectory/filethatexists"); @@ -371,6 +533,37 @@ public void matchNonGlobNotReadSeekEfficient() { @Test public void matchNonGlobNullContentEncoding() { + S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Config("s3")); + + S3ResourceId path = S3ResourceId.fromUri("s3://testbucket/testdirectory/filethatexists"); + long lastModifiedMillis = 1540000000000L; + ObjectMetadata s3ObjectMetadata = new ObjectMetadata(); + s3ObjectMetadata.setContentLength(100); + s3ObjectMetadata.setLastModified(new Date(lastModifiedMillis)); + s3ObjectMetadata.setContentEncoding(null); + when(s3FileSystem + .getAmazonS3Client() + .getObjectMetadata( + argThat( + new GetObjectMetadataRequestMatcher( + new GetObjectMetadataRequest(path.getBucket(), path.getKey()))))) + .thenReturn(s3ObjectMetadata); + + MatchResult result = s3FileSystem.matchNonGlobPath(path); + assertThat( + result, + MatchResultMatcher.create( + ImmutableList.of( + MatchResult.Metadata.builder() + .setSizeBytes(100) + .setLastModifiedMillis(lastModifiedMillis) + .setResourceId(path) + .setIsReadSeekEfficient(true) + .build()))); + } + + @Test + public void matchNonGlobNullContentEncodingWithS3Options() { S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Options()); S3ResourceId path = S3ResourceId.fromUri("s3://testbucket/testdirectory/filethatexists"); @@ -402,6 +595,27 @@ public void matchNonGlobNullContentEncoding() { @Test public void matchNonGlobNotFound() { + S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Config("mys3")); + + S3ResourceId path = S3ResourceId.fromUri("mys3://testbucket/testdirectory/nonexistentfile"); + AmazonS3Exception exception = new AmazonS3Exception("mock exception"); + exception.setStatusCode(404); + when(s3FileSystem + .getAmazonS3Client() + .getObjectMetadata( + argThat( + new GetObjectMetadataRequestMatcher( + new GetObjectMetadataRequest(path.getBucket(), path.getKey()))))) + .thenThrow(exception); + + MatchResult result = s3FileSystem.matchNonGlobPath(path); + assertThat( + result, + MatchResultMatcher.create(MatchResult.Status.NOT_FOUND, new FileNotFoundException())); + } + + @Test + public void matchNonGlobNotFoundWithS3Options() { S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Options()); S3ResourceId path = S3ResourceId.fromUri("s3://testbucket/testdirectory/nonexistentfile"); @@ -423,6 +637,26 @@ public void matchNonGlobNotFound() { @Test public void matchNonGlobForbidden() { + S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Config("s3")); + + AmazonS3Exception exception = new AmazonS3Exception("mock exception"); + exception.setStatusCode(403); + S3ResourceId path = S3ResourceId.fromUri("s3://testbucket/testdirectory/keyname"); + when(s3FileSystem + .getAmazonS3Client() + .getObjectMetadata( + argThat( + new GetObjectMetadataRequestMatcher( + new GetObjectMetadataRequest(path.getBucket(), path.getKey()))))) + 
.thenThrow(exception); + + assertThat( + s3FileSystem.matchNonGlobPath(path), + MatchResultMatcher.create(MatchResult.Status.ERROR, new IOException(exception))); + } + + @Test + public void matchNonGlobForbiddenWithS3Options() { S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Options()); AmazonS3Exception exception = new AmazonS3Exception("mock exception"); @@ -465,6 +699,92 @@ public boolean matches(ListObjectsV2Request argument) { @Test public void matchGlob() throws IOException { + S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Config("mys3")); + + S3ResourceId path = S3ResourceId.fromUri("mys3://testbucket/foo/bar*baz"); + + ListObjectsV2Request firstRequest = + new ListObjectsV2Request() + .withBucketName(path.getBucket()) + .withPrefix(path.getKeyNonWildcardPrefix()) + .withContinuationToken(null); + + // Expected to be returned; prefix and wildcard/regex match + S3ObjectSummary firstMatch = new S3ObjectSummary(); + firstMatch.setBucketName(path.getBucket()); + firstMatch.setKey("foo/bar0baz"); + firstMatch.setSize(100); + firstMatch.setLastModified(new Date(1540000000001L)); + + // Expected to not be returned; prefix matches, but substring after wildcard does not + S3ObjectSummary secondMatch = new S3ObjectSummary(); + secondMatch.setBucketName(path.getBucket()); + secondMatch.setKey("foo/bar1qux"); + secondMatch.setSize(200); + secondMatch.setLastModified(new Date(1540000000002L)); + + // Expected first request returns continuation token + ListObjectsV2Result firstResult = new ListObjectsV2Result(); + firstResult.setNextContinuationToken("token"); + firstResult.getObjectSummaries().add(firstMatch); + firstResult.getObjectSummaries().add(secondMatch); + when(s3FileSystem + .getAmazonS3Client() + .listObjectsV2(argThat(new ListObjectsV2RequestArgumentMatches(firstRequest)))) + .thenReturn(firstResult); + + // Expect second request with continuation token + ListObjectsV2Request secondRequest = + new ListObjectsV2Request() + .withBucketName(path.getBucket()) + .withPrefix(path.getKeyNonWildcardPrefix()) + .withContinuationToken("token"); + + // Expected to be returned; prefix and wildcard/regex match + S3ObjectSummary thirdMatch = new S3ObjectSummary(); + thirdMatch.setBucketName(path.getBucket()); + thirdMatch.setKey("foo/bar2baz"); + thirdMatch.setSize(300); + thirdMatch.setLastModified(new Date(1540000000003L)); + + // Expected second request returns third prefix match and no continuation token + ListObjectsV2Result secondResult = new ListObjectsV2Result(); + secondResult.setNextContinuationToken(null); + secondResult.getObjectSummaries().add(thirdMatch); + when(s3FileSystem + .getAmazonS3Client() + .listObjectsV2(argThat(new ListObjectsV2RequestArgumentMatches(secondRequest)))) + .thenReturn(secondResult); + + // Expect object metadata queries for content encoding + ObjectMetadata metadata = new ObjectMetadata(); + metadata.setContentEncoding(""); + when(s3FileSystem.getAmazonS3Client().getObjectMetadata(anyObject())).thenReturn(metadata); + + assertThat( + s3FileSystem.matchGlobPaths(ImmutableList.of(path)).get(0), + MatchResultMatcher.create( + ImmutableList.of( + MatchResult.Metadata.builder() + .setIsReadSeekEfficient(true) + .setResourceId( + S3ResourceId.fromComponents( + "mys3", firstMatch.getBucketName(), firstMatch.getKey())) + .setSizeBytes(firstMatch.getSize()) + .setLastModifiedMillis(firstMatch.getLastModified().getTime()) + .build(), + MatchResult.Metadata.builder() + .setIsReadSeekEfficient(true) + .setResourceId( + S3ResourceId.fromComponents( + "mys3", 
thirdMatch.getBucketName(), thirdMatch.getKey())) + .setSizeBytes(thirdMatch.getSize()) + .setLastModifiedMillis(thirdMatch.getLastModified().getTime()) + .build()))); + } + + @Test + public void matchGlobWithS3Options() throws IOException { S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Options()); S3ResourceId path = S3ResourceId.fromUri("s3://testbucket/foo/bar*baz"); @@ -535,7 +855,7 @@ public void matchGlob() throws IOException { .setIsReadSeekEfficient(true) .setResourceId( S3ResourceId.fromComponents( - firstMatch.getBucketName(), firstMatch.getKey())) + "s3", firstMatch.getBucketName(), firstMatch.getKey())) .setSizeBytes(firstMatch.getSize()) .setLastModifiedMillis(firstMatch.getLastModified().getTime()) .build(), @@ -543,7 +863,7 @@ public void matchGlob() throws IOException { .setIsReadSeekEfficient(true) .setResourceId( S3ResourceId.fromComponents( - thirdMatch.getBucketName(), thirdMatch.getKey())) + "s3", thirdMatch.getBucketName(), thirdMatch.getKey())) .setSizeBytes(thirdMatch.getSize()) .setLastModifiedMillis(thirdMatch.getLastModified().getTime()) .build()))); @@ -551,6 +871,60 @@ public void matchGlob() throws IOException { @Test public void matchGlobWithSlashes() throws IOException { + S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Config("s3")); + + S3ResourceId path = S3ResourceId.fromUri("s3://testbucket/foo/bar\\baz*"); + + ListObjectsV2Request request = + new ListObjectsV2Request() + .withBucketName(path.getBucket()) + .withPrefix(path.getKeyNonWildcardPrefix()) + .withContinuationToken(null); + + // Expected to be returned; prefix and wildcard/regex match + S3ObjectSummary firstMatch = new S3ObjectSummary(); + firstMatch.setBucketName(path.getBucket()); + firstMatch.setKey("foo/bar\\baz0"); + firstMatch.setSize(100); + firstMatch.setLastModified(new Date(1540000000001L)); + + // Expected to not be returned; prefix matches, but substring after wildcard does not + S3ObjectSummary secondMatch = new S3ObjectSummary(); + secondMatch.setBucketName(path.getBucket()); + secondMatch.setKey("foo/bar/baz1"); + secondMatch.setSize(200); + secondMatch.setLastModified(new Date(1540000000002L)); + + // Expected first request returns continuation token + ListObjectsV2Result result = new ListObjectsV2Result(); + result.getObjectSummaries().add(firstMatch); + result.getObjectSummaries().add(secondMatch); + when(s3FileSystem + .getAmazonS3Client() + .listObjectsV2(argThat(new ListObjectsV2RequestArgumentMatches(request)))) + .thenReturn(result); + + // Expect object metadata queries for content encoding + ObjectMetadata metadata = new ObjectMetadata(); + metadata.setContentEncoding(""); + when(s3FileSystem.getAmazonS3Client().getObjectMetadata(anyObject())).thenReturn(metadata); + + assertThat( + s3FileSystem.matchGlobPaths(ImmutableList.of(path)).get(0), + MatchResultMatcher.create( + ImmutableList.of( + MatchResult.Metadata.builder() + .setIsReadSeekEfficient(true) + .setResourceId( + S3ResourceId.fromComponents( + "s3", firstMatch.getBucketName(), firstMatch.getKey())) + .setSizeBytes(firstMatch.getSize()) + .setLastModifiedMillis(firstMatch.getLastModified().getTime()) + .build()))); + } + + @Test + public void matchGlobWithSlashesWithS3Options() throws IOException { S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Options()); S3ResourceId path = S3ResourceId.fromUri("s3://testbucket/foo/bar\\baz*"); @@ -597,7 +971,7 @@ public void matchGlobWithSlashes() throws IOException { .setIsReadSeekEfficient(true) .setResourceId( S3ResourceId.fromComponents( - 
firstMatch.getBucketName(), firstMatch.getKey())) + "s3", firstMatch.getBucketName(), firstMatch.getKey())) .setSizeBytes(firstMatch.getSize()) .setLastModifiedMillis(firstMatch.getLastModified().getTime()) .build()))); @@ -605,6 +979,92 @@ public void matchGlobWithSlashes() throws IOException { @Test public void matchVariousInvokeThreadPool() throws IOException { + S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Config("s3")); + + AmazonS3Exception notFoundException = new AmazonS3Exception("mock exception"); + notFoundException.setStatusCode(404); + S3ResourceId pathNotExist = + S3ResourceId.fromUri("s3://testbucket/testdirectory/nonexistentfile"); + when(s3FileSystem + .getAmazonS3Client() + .getObjectMetadata( + argThat( + new GetObjectMetadataRequestMatcher( + new GetObjectMetadataRequest( + pathNotExist.getBucket(), pathNotExist.getKey()))))) + .thenThrow(notFoundException); + + AmazonS3Exception forbiddenException = new AmazonS3Exception("mock exception"); + forbiddenException.setStatusCode(403); + S3ResourceId pathForbidden = + S3ResourceId.fromUri("s3://testbucket/testdirectory/forbiddenfile"); + when(s3FileSystem + .getAmazonS3Client() + .getObjectMetadata( + argThat( + new GetObjectMetadataRequestMatcher( + new GetObjectMetadataRequest( + pathForbidden.getBucket(), pathForbidden.getKey()))))) + .thenThrow(forbiddenException); + + S3ResourceId pathExist = S3ResourceId.fromUri("s3://testbucket/testdirectory/filethatexists"); + ObjectMetadata s3ObjectMetadata = new ObjectMetadata(); + s3ObjectMetadata.setContentLength(100); + s3ObjectMetadata.setLastModified(new Date(1540000000000L)); + s3ObjectMetadata.setContentEncoding("not-gzip"); + when(s3FileSystem + .getAmazonS3Client() + .getObjectMetadata( + argThat( + new GetObjectMetadataRequestMatcher( + new GetObjectMetadataRequest(pathExist.getBucket(), pathExist.getKey()))))) + .thenReturn(s3ObjectMetadata); + + S3ResourceId pathGlob = S3ResourceId.fromUri("s3://testbucket/path/part*"); + + S3ObjectSummary foundListObject = new S3ObjectSummary(); + foundListObject.setBucketName(pathGlob.getBucket()); + foundListObject.setKey("path/part-0"); + foundListObject.setSize(200); + foundListObject.setLastModified(new Date(1541000000000L)); + + ListObjectsV2Result listObjectsResult = new ListObjectsV2Result(); + listObjectsResult.setNextContinuationToken(null); + listObjectsResult.getObjectSummaries().add(foundListObject); + when(s3FileSystem.getAmazonS3Client().listObjectsV2(notNull(ListObjectsV2Request.class))) + .thenReturn(listObjectsResult); + + ObjectMetadata metadata = new ObjectMetadata(); + metadata.setContentEncoding(""); + when(s3FileSystem + .getAmazonS3Client() + .getObjectMetadata( + argThat( + new GetObjectMetadataRequestMatcher( + new GetObjectMetadataRequest(pathGlob.getBucket(), "path/part-0"))))) + .thenReturn(metadata); + + assertThat( + s3FileSystem.match( + ImmutableList.of( + pathNotExist.toString(), + pathForbidden.toString(), + pathExist.toString(), + pathGlob.toString())), + contains( + MatchResultMatcher.create(MatchResult.Status.NOT_FOUND, new FileNotFoundException()), + MatchResultMatcher.create( + MatchResult.Status.ERROR, new IOException(forbiddenException)), + MatchResultMatcher.create(100, 1540000000000L, pathExist, true), + MatchResultMatcher.create( + 200, + 1541000000000L, + S3ResourceId.fromComponents("s3", pathGlob.getBucket(), foundListObject.getKey()), + true))); + } + + @Test + public void matchVariousInvokeThreadPoolWithS3Options() throws IOException { S3FileSystem s3FileSystem = 
buildMockedS3FileSystem(s3Options()); AmazonS3Exception notFoundException = new AmazonS3Exception("mock exception"); @@ -685,12 +1145,40 @@ MatchResult.Status.ERROR, new IOException(forbiddenException)), MatchResultMatcher.create( 200, 1541000000000L, - S3ResourceId.fromComponents(pathGlob.getBucket(), foundListObject.getKey()), + S3ResourceId.fromComponents("s3", pathGlob.getBucket(), foundListObject.getKey()), true))); } @Test public void testWriteAndRead() throws IOException { + S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Config("s3"), client); + + client.createBucket("testbucket"); + + byte[] writtenArray = new byte[] {0}; + ByteBuffer bb = ByteBuffer.allocate(writtenArray.length); + bb.put(writtenArray); + + // First create an object and write data to it + S3ResourceId path = S3ResourceId.fromUri("s3://testbucket/foo/bar.txt"); + WritableByteChannel writableByteChannel = + s3FileSystem.create(path, builder().setMimeType("application/text").build()); + writableByteChannel.write(bb); + writableByteChannel.close(); + + // Now read the same object + ByteBuffer bb2 = ByteBuffer.allocate(writtenArray.length); + ReadableByteChannel open = s3FileSystem.open(path); + open.read(bb2); + + // And compare the content with the one that was written + byte[] readArray = bb2.array(); + assertArrayEquals(readArray, writtenArray); + open.close(); + } + + @Test + public void testWriteAndReadWithS3Options() throws IOException { S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Options(), client); client.createBucket("testbucket"); diff --git a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/s3/S3ResourceIdTest.java b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/s3/S3ResourceIdTest.java index 2e596f45d41b..152e096526cc 100644 --- a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/s3/S3ResourceIdTest.java +++ b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/s3/S3ResourceIdTest.java @@ -25,7 +25,13 @@ import static org.junit.Assert.assertNull; import static org.junit.Assert.assertTrue; +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.ObjectInputStream; +import java.io.ObjectOutputStream; import java.util.Arrays; +import java.util.Base64; +import java.util.Date; import java.util.List; import org.apache.beam.sdk.io.FileSystems; import org.apache.beam.sdk.io.aws.options.S3Options; @@ -41,9 +47,6 @@ /** Tests {@link S3ResourceId}. 
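Covers path resolution plus Java-serialization compatibility with resource ids recorded before the scheme component was introduced.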
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class S3ResourceIdTest { @Rule public ExpectedException thrown = ExpectedException.none(); @@ -78,6 +81,62 @@ static final class TestCase { new TestCase( "s3://bucket/path/to/dir/", "..", RESOLVE_DIRECTORY, "s3://bucket/path/to/")); + private S3ResourceId deserializeFromB64(String base64) throws Exception { + ByteArrayInputStream b = new ByteArrayInputStream(Base64.getDecoder().decode(base64)); + try (ObjectInputStream s = new ObjectInputStream(b)) { + return (S3ResourceId) s.readObject(); + } + } + + private String serializeToB64(S3ResourceId r) throws Exception { + ByteArrayOutputStream b = new ByteArrayOutputStream(); + try (ObjectOutputStream s = new ObjectOutputStream(b)) { + s.writeObject(r); + } + return Base64.getEncoder().encodeToString(b.toByteArray()); + } + + @Test + public void testSerialization() throws Exception { + String r1Serialized = + "rO0ABXNyACpvcmcuYXBhY2hlLmJlYW0uc2RrLmlvLmF3cy5zMy5TM1Jlc291cmNlSWSN8nM8V4cVFwIABEwABmJ1Y2tldHQAEkxqYXZhL2xhbmcvU3RyaW5nO0wAA2tleXEAfgABTAAMbGFzdE1vZGlmaWVkdAAQTGphdmEvdXRpbC9EYXRlO0wABHNpemV0ABBMamF2YS9sYW5nL0xvbmc7eHB0AAZidWNrZXR0AAYvYS9iL2NwcA=="; + String r2Serialized = + "rO0ABXNyACpvcmcuYXBhY2hlLmJlYW0uc2RrLmlvLmF3cy5zMy5TM1Jlc291cmNlSWSN8nM8V4cVFwIABEwABmJ1Y2tldHQAEkxqYXZhL2xhbmcvU3RyaW5nO0wAA2tleXEAfgABTAAMbGFzdE1vZGlmaWVkdAAQTGphdmEvdXRpbC9EYXRlO0wABHNpemV0ABBMamF2YS9sYW5nL0xvbmc7eHB0AAxvdGhlci1idWNrZXR0AAYveC95L3pwc3IADmphdmEubGFuZy5Mb25nO4vkkMyPI98CAAFKAAV2YWx1ZXhyABBqYXZhLmxhbmcuTnVtYmVyhqyVHQuU4IsCAAB4cAAAAAAAAAB7"; + String r3Serialized = + "rO0ABXNyACpvcmcuYXBhY2hlLmJlYW0uc2RrLmlvLmF3cy5zMy5TM1Jlc291cmNlSWSN8nM8V4cVFwIABEwABmJ1Y2tldHQAEkxqYXZhL2xhbmcvU3RyaW5nO0wAA2tleXEAfgABTAAMbGFzdE1vZGlmaWVkdAAQTGphdmEvdXRpbC9EYXRlO0wABHNpemV0ABBMamF2YS9sYW5nL0xvbmc7eHB0AAx0aGlyZC1idWNrZXR0AAkvZm9vL2Jhci9zcgAOamF2YS51dGlsLkRhdGVoaoEBS1l0GQMAAHhwdwgAADgCgmXOAHhw"; + String r4Serialized = + "rO0ABXNyACpvcmcuYXBhY2hlLmJlYW0uc2RrLmlvLmF3cy5zMy5TM1Jlc291cmNlSWSN8nM8V4cVFwIABEwABmJ1Y2tldHQAEkxqYXZhL2xhbmcvU3RyaW5nO0wAA2tleXEAfgABTAAMbGFzdE1vZGlmaWVkdAAQTGphdmEvdXRpbC9EYXRlO0wABHNpemV0ABBMamF2YS9sYW5nL0xvbmc7eHB0AApiYXotYnVja2V0dAAGL2EvYi9jc3IADmphdmEudXRpbC5EYXRlaGqBAUtZdBkDAAB4cHcIAAA33gSV5gB4c3IADmphdmEubGFuZy5Mb25nO4vkkMyPI98CAAFKAAV2YWx1ZXhyABBqYXZhLmxhbmcuTnVtYmVyhqyVHQuU4IsCAAB4cAAAAAAAAAAq"; + + S3ResourceId r1 = S3ResourceId.fromComponents("s3", "bucket", "a/b/c"); + S3ResourceId r2 = S3ResourceId.fromComponents("s3", "other-bucket", "x/y/z").withSize(123); + S3ResourceId r3 = + S3ResourceId.fromComponents("s3", "third-bucket", "foo/bar/") + .withLastModified(new Date(2021, 6, 3)); + S3ResourceId r4 = + S3ResourceId.fromComponents("s3", "baz-bucket", "a/b/c") + .withSize(42) + .withLastModified(new Date(2016, 6, 15)); + S3ResourceId r5 = S3ResourceId.fromComponents("other-scheme", "bucket", "a/b/c"); + S3ResourceId r6 = + S3ResourceId.fromComponents("other-scheme", "baz-bucket", "foo/bar/") + .withSize(42) + .withLastModified(new Date(2016, 6, 5)); + + // S3ResourceIds serialized by previous versions should still deserialize. + assertEquals(r1, deserializeFromB64(r1Serialized)); + assertEquals(r2, deserializeFromB64(r2Serialized)); + assertEquals(r3, deserializeFromB64(r3Serialized)); + assertEquals(r4, deserializeFromB64(r4Serialized)); + + // Current resource IDs should round-trip properly through serialization. 
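+ // r5 and r6 use a custom scheme and have no recorded blob above, so the round trip
+ // below is what exercises serialization for non-"s3" schemes.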
+ assertEquals(r1, deserializeFromB64(serializeToB64(r1))); + assertEquals(r2, deserializeFromB64(serializeToB64(r2))); + assertEquals(r3, deserializeFromB64(serializeToB64(r3))); + assertEquals(r4, deserializeFromB64(serializeToB64(r4))); + assertEquals(r5, deserializeFromB64(serializeToB64(r5))); + assertEquals(r6, deserializeFromB64(serializeToB64(r6))); + } + @Test public void testResolve() { for (TestCase testCase : PATH_TEST_CASES) { @@ -253,30 +312,29 @@ public void testS3ResourceIdToString() { @Test public void testEquals() { - S3ResourceId a = S3ResourceId.fromComponents("bucket", "a/b/c"); - S3ResourceId b = S3ResourceId.fromComponents("bucket", "a/b/c"); + S3ResourceId a = S3ResourceId.fromComponents("s3", "bucket", "a/b/c"); + S3ResourceId b = S3ResourceId.fromComponents("s3", "bucket", "a/b/c"); assertEquals(a, b); - b = S3ResourceId.fromComponents(a.getBucket(), "a/b/c/"); + b = S3ResourceId.fromComponents("s3", a.getBucket(), "a/b/c/"); assertNotEquals(a, b); - b = S3ResourceId.fromComponents(a.getBucket(), "x/y/z"); + b = S3ResourceId.fromComponents("s3", a.getBucket(), "x/y/z"); assertNotEquals(a, b); - b = S3ResourceId.fromComponents("other-bucket", a.getKey()); + b = S3ResourceId.fromComponents("s3", "other-bucket", a.getKey()); assertNotEquals(a, b); - } + assertNotEquals(b, a); - @Test - public void testInvalidS3ResourceId() { - thrown.expect(IllegalArgumentException.class); - S3ResourceId.fromUri("file://invalid/s3/path"); + b = S3ResourceId.fromComponents("other", "bucket", "a/b/c"); + assertNotEquals(a, b); + assertNotEquals(b, a); } @Test public void testInvalidBucket() { thrown.expect(IllegalArgumentException.class); - S3ResourceId.fromComponents("invalid/", ""); + S3ResourceId.fromComponents("s3", "invalid/", ""); } @Test diff --git a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/s3/S3TestUtils.java b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/s3/S3TestUtils.java index 96ab37baa584..3df2f10f9c82 100644 --- a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/s3/S3TestUtils.java +++ b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/s3/S3TestUtils.java @@ -31,6 +31,17 @@ /** Utils to test S3 filesystem. 
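Most helpers now come in two flavors: the original {@code S3Options}-based ones and {@code S3FileSystemConfiguration}-based variants that additionally take the URI scheme under test.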
*/ class S3TestUtils { + private static S3FileSystemConfiguration.Builder configBuilder(String scheme) { + S3Options options = PipelineOptionsFactory.as(S3Options.class); + options.setAwsRegion("us-west-1"); + options.setS3UploadBufferSizeBytes(5_242_880); + return S3FileSystemConfiguration.fromS3Options(options).setScheme(scheme); + } + + static S3FileSystemConfiguration s3Config(String scheme) { + return configBuilder(scheme).build(); + } + static S3Options s3Options() { S3Options options = PipelineOptionsFactory.as(S3Options.class); options.setAwsRegion("us-west-1"); @@ -47,18 +58,49 @@ static S3Options s3OptionsWithCustomEndpointAndPathStyleAccessEnabled() { return options; } + static S3FileSystemConfiguration s3ConfigWithCustomEndpointAndPathStyleAccessEnabled( + String scheme) { + return S3FileSystemConfiguration.fromS3Options( + s3OptionsWithCustomEndpointAndPathStyleAccessEnabled()) + .setScheme(scheme) + .build(); + } + + static S3FileSystemConfiguration s3ConfigWithSSEAlgorithm(String scheme) { + return configBuilder(scheme) + .setSSEAlgorithm(ObjectMetadata.AES_256_SERVER_SIDE_ENCRYPTION) + .build(); + } + static S3Options s3OptionsWithSSEAlgorithm() { S3Options options = s3Options(); options.setSSEAlgorithm(ObjectMetadata.AES_256_SERVER_SIDE_ENCRYPTION); return options; } + static S3FileSystemConfiguration s3ConfigWithSSECustomerKey(String scheme) { + return configBuilder(scheme) + .setSSECustomerKey(new SSECustomerKey("86glyTlCNZgccSxW8JxMa6ZdjdK3N141glAysPUZ3AA=")) + .build(); + } + static S3Options s3OptionsWithSSECustomerKey() { S3Options options = s3Options(); options.setSSECustomerKey(new SSECustomerKey("86glyTlCNZgccSxW8JxMa6ZdjdK3N141glAysPUZ3AA=")); return options; } + static S3FileSystemConfiguration s3ConfigWithSSEAwsKeyManagementParams(String scheme) { + String awsKmsKeyId = + "arn:aws:kms:eu-west-1:123456789012:key/dc123456-7890-ABCD-EF01-234567890ABC"; + SSEAwsKeyManagementParams sseAwsKeyManagementParams = + new SSEAwsKeyManagementParams(awsKmsKeyId); + return configBuilder(scheme) + .setSSEAwsKeyManagementParams(sseAwsKeyManagementParams) + .setBucketKeyEnabled(true) + .build(); + } + static S3Options s3OptionsWithSSEAwsKeyManagementParams() { S3Options options = s3Options(); String awsKmsKeyId = @@ -66,33 +108,58 @@ static S3Options s3OptionsWithSSEAwsKeyManagementParams() { SSEAwsKeyManagementParams sseAwsKeyManagementParams = new SSEAwsKeyManagementParams(awsKmsKeyId); options.setSSEAwsKeyManagementParams(sseAwsKeyManagementParams); + options.setBucketKeyEnabled(true); return options; } + static S3FileSystemConfiguration s3ConfigWithMultipleSSEOptions(String scheme) { + return s3ConfigWithSSEAwsKeyManagementParams(scheme) + .toBuilder() + .setSSECustomerKey(new SSECustomerKey("86glyTlCNZgccSxW8JxMa6ZdjdK3N141glAysPUZ3AA=")) + .build(); + } + static S3Options s3OptionsWithMultipleSSEOptions() { S3Options options = s3OptionsWithSSEAwsKeyManagementParams(); options.setSSECustomerKey(new SSECustomerKey("86glyTlCNZgccSxW8JxMa6ZdjdK3N141glAysPUZ3AA=")); return options; } + static S3FileSystem buildMockedS3FileSystem(S3FileSystemConfiguration config) { + return buildMockedS3FileSystem(config, Mockito.mock(AmazonS3.class)); + } + static S3FileSystem buildMockedS3FileSystem(S3Options options) { return buildMockedS3FileSystem(options, Mockito.mock(AmazonS3.class)); } + static S3FileSystem buildMockedS3FileSystem(S3FileSystemConfiguration config, AmazonS3 client) { + S3FileSystem s3FileSystem = new S3FileSystem(config); + s3FileSystem.setAmazonS3Client(client); + return 
s3FileSystem; + } + static S3FileSystem buildMockedS3FileSystem(S3Options options, AmazonS3 client) { S3FileSystem s3FileSystem = new S3FileSystem(options); s3FileSystem.setAmazonS3Client(client); return s3FileSystem; } - static @Nullable String getSSECustomerKeyMd5(S3Options options) { - SSECustomerKey sseCostumerKey = options.getSSECustomerKey(); - if (sseCostumerKey != null) { - return Base64.encodeAsString(DigestUtils.md5(Base64.decode(sseCostumerKey.getKey()))); + static @Nullable String toMd5(SSECustomerKey key) { + if (key != null && key.getKey() != null) { + return Base64.encodeAsString(DigestUtils.md5(Base64.decode(key.getKey()))); } return null; } + static @Nullable String getSSECustomerKeyMd5(S3FileSystemConfiguration config) { + return toMd5(config.getSSECustomerKey()); + } + + static @Nullable String getSSECustomerKeyMd5(S3Options options) { + return toMd5(options.getSSECustomerKey()); + } + private static class PathStyleAccessS3ClientBuilderFactory extends DefaultS3ClientBuilderFactory { @Override public AmazonS3ClientBuilder createBuilder(S3Options s3Options) { diff --git a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/s3/S3WritableByteChannelTest.java b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/s3/S3WritableByteChannelTest.java index be5f97193893..00f9cffda5ff 100644 --- a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/s3/S3WritableByteChannelTest.java +++ b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/s3/S3WritableByteChannelTest.java @@ -17,12 +17,17 @@ */ package org.apache.beam.sdk.io.aws.s3; -import static org.apache.beam.sdk.io.aws.s3.S3TestUtils.getSSECustomerKeyMd5; +import static org.apache.beam.sdk.io.aws.s3.S3TestUtils.s3Config; +import static org.apache.beam.sdk.io.aws.s3.S3TestUtils.s3ConfigWithMultipleSSEOptions; +import static org.apache.beam.sdk.io.aws.s3.S3TestUtils.s3ConfigWithSSEAlgorithm; +import static org.apache.beam.sdk.io.aws.s3.S3TestUtils.s3ConfigWithSSEAwsKeyManagementParams; +import static org.apache.beam.sdk.io.aws.s3.S3TestUtils.s3ConfigWithSSECustomerKey; import static org.apache.beam.sdk.io.aws.s3.S3TestUtils.s3Options; import static org.apache.beam.sdk.io.aws.s3.S3TestUtils.s3OptionsWithMultipleSSEOptions; import static org.apache.beam.sdk.io.aws.s3.S3TestUtils.s3OptionsWithSSEAlgorithm; import static org.apache.beam.sdk.io.aws.s3.S3TestUtils.s3OptionsWithSSEAwsKeyManagementParams; import static org.apache.beam.sdk.io.aws.s3.S3TestUtils.s3OptionsWithSSECustomerKey; +import static org.apache.beam.sdk.io.aws.s3.S3TestUtils.toMd5; import static org.apache.beam.sdk.io.aws.s3.S3WritableByteChannel.atMostOne; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; @@ -42,6 +47,7 @@ import com.amazonaws.services.s3.model.CompleteMultipartUploadResult; import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest; import com.amazonaws.services.s3.model.InitiateMultipartUploadResult; +import com.amazonaws.services.s3.model.SSEAwsKeyManagementParams; import com.amazonaws.services.s3.model.UploadPartRequest; import com.amazonaws.services.s3.model.UploadPartResult; import java.io.IOException; @@ -55,14 +61,21 @@ /** Tests {@link S3WritableByteChannel}. 
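Each SSE scenario is exercised twice, once through {@code S3FileSystemConfiguration} and once through {@code S3Options}; both paths delegate to the same private write(...) verification helper.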
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class S3WritableByteChannelTest { @Rule public ExpectedException expected = ExpectedException.none(); @Test public void write() throws IOException { + writeFromConfig(s3Config("s3")); + writeFromConfig(s3ConfigWithSSEAlgorithm("s3")); + writeFromConfig(s3ConfigWithSSECustomerKey("s3")); + writeFromConfig(s3ConfigWithSSEAwsKeyManagementParams("s3")); + expected.expect(IllegalArgumentException.class); + writeFromConfig(s3ConfigWithMultipleSSEOptions("s3")); + } + + @Test + public void writeWithS3Options() throws IOException { writeFromOptions(s3Options()); writeFromOptions(s3OptionsWithSSEAlgorithm()); writeFromOptions(s3OptionsWithSSECustomerKey()); @@ -71,24 +84,71 @@ public void write() throws IOException { writeFromOptions(s3OptionsWithMultipleSSEOptions()); } + @FunctionalInterface + public interface Supplier { + S3WritableByteChannel get() throws IOException; + } + private void writeFromOptions(S3Options options) throws IOException { AmazonS3 mockAmazonS3 = mock(AmazonS3.class, withSettings().defaultAnswer(RETURNS_SMART_NULLS)); S3ResourceId path = S3ResourceId.fromUri("s3://bucket/dir/file"); + Supplier channel = + () -> + new S3WritableByteChannel( + mockAmazonS3, + path, + "text/plain", + S3FileSystemConfiguration.fromS3Options(options).build()); + write( + mockAmazonS3, + channel, + path, + options.getSSEAlgorithm(), + toMd5(options.getSSECustomerKey()), + options.getSSEAwsKeyManagementParams(), + options.getS3UploadBufferSizeBytes(), + options.getBucketKeyEnabled()); + } + + private void writeFromConfig(S3FileSystemConfiguration config) throws IOException { + AmazonS3 mockAmazonS3 = mock(AmazonS3.class, withSettings().defaultAnswer(RETURNS_SMART_NULLS)); + S3ResourceId path = S3ResourceId.fromUri("s3://bucket/dir/file"); + Supplier channel = () -> new S3WritableByteChannel(mockAmazonS3, path, "text/plain", config); + write( + mockAmazonS3, + channel, + path, + config.getSSEAlgorithm(), + toMd5(config.getSSECustomerKey()), + config.getSSEAwsKeyManagementParams(), + config.getS3UploadBufferSizeBytes(), + config.getBucketKeyEnabled()); + } + private void write( + AmazonS3 mockAmazonS3, + Supplier channelSupplier, + S3ResourceId path, + String sseAlgorithm, + String sseCustomerKeyMd5, + SSEAwsKeyManagementParams sseAwsKeyManagementParams, + long s3UploadBufferSizeBytes, + boolean bucketKeyEnabled) + throws IOException { InitiateMultipartUploadResult initiateMultipartUploadResult = new InitiateMultipartUploadResult(); initiateMultipartUploadResult.setUploadId("upload-id"); - String sseAlgorithm = options.getSSEAlgorithm(); - if (options.getSSEAlgorithm() != null) { + if (sseAlgorithm != null) { initiateMultipartUploadResult.setSSEAlgorithm(sseAlgorithm); } - if (getSSECustomerKeyMd5(options) != null) { - initiateMultipartUploadResult.setSSECustomerKeyMd5(getSSECustomerKeyMd5(options)); + if (sseCustomerKeyMd5 != null) { + initiateMultipartUploadResult.setSSECustomerKeyMd5(sseCustomerKeyMd5); } - if (options.getSSEAwsKeyManagementParams() != null) { + if (sseAwsKeyManagementParams != null) { sseAlgorithm = "aws:kms"; initiateMultipartUploadResult.setSSEAlgorithm(sseAlgorithm); } + initiateMultipartUploadResult.setBucketKeyEnabled(bucketKeyEnabled); doReturn(initiateMultipartUploadResult) .when(mockAmazonS3) .initiateMultipartUpload(any(InitiateMultipartUploadRequest.class)); @@ -97,21 +157,19 @@ private void writeFromOptions(S3Options options) throws 
IOException { mockAmazonS3.initiateMultipartUpload( new InitiateMultipartUploadRequest(path.getBucket(), path.getKey())); assertEquals(sseAlgorithm, mockInitiateMultipartUploadResult.getSSEAlgorithm()); - assertEquals( - getSSECustomerKeyMd5(options), mockInitiateMultipartUploadResult.getSSECustomerKeyMd5()); + assertEquals(bucketKeyEnabled, mockInitiateMultipartUploadResult.getBucketKeyEnabled()); + assertEquals(sseCustomerKeyMd5, mockInitiateMultipartUploadResult.getSSECustomerKeyMd5()); UploadPartResult result = new UploadPartResult(); result.setETag("etag"); - if (getSSECustomerKeyMd5(options) != null) { - result.setSSECustomerKeyMd5(getSSECustomerKeyMd5(options)); + if (sseCustomerKeyMd5 != null) { + result.setSSECustomerKeyMd5(sseCustomerKeyMd5); } doReturn(result).when(mockAmazonS3).uploadPart(any(UploadPartRequest.class)); UploadPartResult mockUploadPartResult = mockAmazonS3.uploadPart(new UploadPartRequest()); - assertEquals(getSSECustomerKeyMd5(options), mockUploadPartResult.getSSECustomerKeyMd5()); + assertEquals(sseCustomerKeyMd5, mockUploadPartResult.getSSECustomerKeyMd5()); - S3WritableByteChannel channel = - new S3WritableByteChannel(mockAmazonS3, path, "text/plain", options); int contentSize = 34_078_720; ByteBuffer uploadContent = ByteBuffer.allocate((int) (contentSize * 2.5)); for (int i = 0; i < contentSize; i++) { @@ -119,6 +177,7 @@ private void writeFromOptions(S3Options options) throws IOException { } uploadContent.flip(); + S3WritableByteChannel channel = channelSupplier.get(); int uploadedSize = channel.write(uploadContent); assertEquals(contentSize, uploadedSize); @@ -132,8 +191,7 @@ private void writeFromOptions(S3Options options) throws IOException { verify(mockAmazonS3, times(2)) .initiateMultipartUpload(notNull(InitiateMultipartUploadRequest.class)); - int partQuantity = - (int) Math.ceil((double) contentSize / options.getS3UploadBufferSizeBytes()) + 1; + int partQuantity = (int) Math.ceil((double) contentSize / s3UploadBufferSizeBytes) + 1; verify(mockAmazonS3, times(partQuantity)).uploadPart(notNull(UploadPartRequest.class)); verify(mockAmazonS3, times(1)) .completeMultipartUpload(notNull(CompleteMultipartUploadRequest.class)); diff --git a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/sns/PublishResultCodersTest.java b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/sns/PublishResultCodersTest.java index 013535d6ca72..8e172a61857b 100644 --- a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/sns/PublishResultCodersTest.java +++ b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/sns/PublishResultCodersTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.io.aws.sns; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import com.amazonaws.ResponseMetadata; import com.amazonaws.http.HttpResponse; @@ -31,9 +31,6 @@ import org.junit.Test; /** Tests for PublishResult coders. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PublishResultCodersTest { @Test diff --git a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/sns/SnsIOTest.java b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/sns/SnsIOTest.java index 6fc6c18637a2..db28f3df4845 100644 --- a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/sns/SnsIOTest.java +++ b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/sns/SnsIOTest.java @@ -56,9 +56,6 @@ /** Tests to verify writes to Sns. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SnsIOTest implements Serializable { private static final String topicName = "arn:aws:sns:us-west-2:5880:topic-FMFEHJ47NRFO"; diff --git a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/sqs/EmbeddedSqsServer.java b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/sqs/EmbeddedSqsServer.java new file mode 100644 index 000000000000..72dc9ddc9c25 --- /dev/null +++ b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/sqs/EmbeddedSqsServer.java @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.aws.sqs; + +import com.amazonaws.auth.AWSStaticCredentialsProvider; +import com.amazonaws.auth.BasicAWSCredentials; +import com.amazonaws.client.builder.AwsClientBuilder; +import com.amazonaws.services.sqs.AmazonSQS; +import com.amazonaws.services.sqs.AmazonSQSClientBuilder; +import com.amazonaws.services.sqs.model.CreateQueueResult; +import org.elasticmq.rest.sqs.SQSRestServer; +import org.elasticmq.rest.sqs.SQSRestServerBuilder; +import org.junit.rules.ExternalResource; + +class EmbeddedSqsServer extends ExternalResource { + + private SQSRestServer sqsRestServer; + private AmazonSQS client; + private String queueUrl; + + @Override + protected void before() { + sqsRestServer = SQSRestServerBuilder.start(); + + String endpoint = "http://localhost:9324"; + String region = "elasticmq"; + String accessKey = "x"; + String secretKey = "x"; + + client = + AmazonSQSClientBuilder.standard() + .withCredentials( + new AWSStaticCredentialsProvider(new BasicAWSCredentials(accessKey, secretKey))) + .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(endpoint, region)) + .build(); + final CreateQueueResult queue = client.createQueue("test"); + queueUrl = queue.getQueueUrl(); + } + + @Override + protected void after() { + sqsRestServer.stopAndWait(); + } + + public AmazonSQS getClient() { + return client; + } + + public String getQueueUrl() { + return queueUrl; + } +} diff --git a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/sqs/SqsIOTest.java b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/sqs/SqsIOTest.java index 23da3d301fb8..2838b6c4d07b 100644 --- a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/sqs/SqsIOTest.java +++ b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/sqs/SqsIOTest.java @@ -19,57 +19,27 @@ import static org.junit.Assert.assertEquals; -import com.amazonaws.auth.AWSStaticCredentialsProvider; -import com.amazonaws.auth.BasicAWSCredentials; -import com.amazonaws.client.builder.AwsClientBuilder; import com.amazonaws.services.sqs.AmazonSQS; -import com.amazonaws.services.sqs.AmazonSQSClientBuilder; -import com.amazonaws.services.sqs.model.CreateQueueResult; import com.amazonaws.services.sqs.model.Message; import com.amazonaws.services.sqs.model.ReceiveMessageResult; import com.amazonaws.services.sqs.model.SendMessageRequest; import java.util.ArrayList; import java.util.List; -import org.apache.beam.sdk.testing.PAssert; import org.apache.beam.sdk.testing.TestPipeline; -import org.apache.beam.sdk.transforms.Count; import org.apache.beam.sdk.transforms.Create; -import org.apache.beam.sdk.values.PCollection; -import org.elasticmq.rest.sqs.SQSRestServer; -import org.elasticmq.rest.sqs.SQSRestServerBuilder; import org.junit.Rule; import org.junit.Test; -import org.junit.rules.ExternalResource; import org.junit.runner.RunWith; import org.junit.runners.JUnit4; /** Tests on {@link SqsIO}. 
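The embedded queue comes from the shared {@link EmbeddedSqsServer} rule; read-path coverage now lives in {@code SqsUnboundedReaderTest}. As a rough sketch (using this test's names), a write pipeline looks like {@code pipeline.apply(Create.of(new SendMessageRequest(queueUrl, "body"))).apply(SqsIO.write())}.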
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SqsIOTest { @Rule public TestPipeline pipeline = TestPipeline.create(); @Rule public EmbeddedSqsServer embeddedSqsRestServer = new EmbeddedSqsServer(); - @Test - public void testRead() { - final AmazonSQS client = embeddedSqsRestServer.getClient(); - final String queueUrl = embeddedSqsRestServer.getQueueUrl(); - - final PCollection output = - pipeline.apply(SqsIO.read().withQueueUrl(queueUrl).withMaxNumRecords(100)); - - PAssert.thatSingleton(output.apply(Count.globally())).isEqualTo(100L); - - for (int i = 0; i < 100; i++) { - client.sendMessage(queueUrl, "This is a test"); - } - pipeline.run(); - } - @Test public void testWrite() { final AmazonSQS client = embeddedSqsRestServer.getClient(); @@ -98,44 +68,4 @@ public void testWrite() { received.contains("This is a test " + i); } } - - private static class EmbeddedSqsServer extends ExternalResource { - - private SQSRestServer sqsRestServer; - private AmazonSQS client; - private String queueUrl; - - @Override - protected void before() { - sqsRestServer = SQSRestServerBuilder.start(); - - String endpoint = "http://localhost:9324"; - String region = "elasticmq"; - String accessKey = "x"; - String secretKey = "x"; - - client = - AmazonSQSClientBuilder.standard() - .withCredentials( - new AWSStaticCredentialsProvider(new BasicAWSCredentials(accessKey, secretKey))) - .withEndpointConfiguration( - new AwsClientBuilder.EndpointConfiguration(endpoint, region)) - .build(); - final CreateQueueResult queue = client.createQueue("test"); - queueUrl = queue.getQueueUrl(); - } - - @Override - protected void after() { - sqsRestServer.stopAndWait(); - } - - public AmazonSQS getClient() { - return client; - } - - public String getQueueUrl() { - return queueUrl; - } - } } diff --git a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/sqs/SqsUnboundedReaderTest.java b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/sqs/SqsUnboundedReaderTest.java new file mode 100644 index 000000000000..86de889ec06d --- /dev/null +++ b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/sqs/SqsUnboundedReaderTest.java @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.aws.sqs; + +import static junit.framework.TestCase.assertFalse; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertTrue; + +import com.amazonaws.services.sqs.AmazonSQS; +import com.amazonaws.services.sqs.model.Message; +import java.io.IOException; +import java.util.ArrayList; +import java.util.HashSet; +import java.util.List; +import org.apache.beam.sdk.io.UnboundedSource; +import org.apache.beam.sdk.io.aws.options.AwsOptions; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.util.CoderUtils; +import org.junit.Rule; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Tests on {@link SqsUnboundedReader}. */ +@RunWith(JUnit4.class) +public class SqsUnboundedReaderTest { + private static final String DATA = "testData"; + + @Rule public TestPipeline pipeline = TestPipeline.create(); + + @Rule public EmbeddedSqsServer embeddedSqsRestServer = new EmbeddedSqsServer(); + + private SqsUnboundedSource source; + + private void setupOneMessage() { + final AmazonSQS client = embeddedSqsRestServer.getClient(); + final String queueUrl = embeddedSqsRestServer.getQueueUrl(); + client.sendMessage(queueUrl, DATA); + source = + new SqsUnboundedSource( + SqsIO.read().withQueueUrl(queueUrl).withMaxNumRecords(1), + new SqsConfiguration(pipeline.getOptions().as(AwsOptions.class))); + } + + private void setupMessages(List messages) { + final AmazonSQS client = embeddedSqsRestServer.getClient(); + final String queueUrl = embeddedSqsRestServer.getQueueUrl(); + for (String message : messages) { + client.sendMessage(queueUrl, message); + } + source = + new SqsUnboundedSource( + SqsIO.read().withQueueUrl(queueUrl).withMaxNumRecords(1), + new SqsConfiguration(pipeline.getOptions().as(AwsOptions.class))); + } + + @Test + public void testReadOneMessage() throws IOException { + setupOneMessage(); + UnboundedSource.UnboundedReader reader = + source.createReader(pipeline.getOptions(), null); + // Read one message. + assertTrue(reader.start()); + assertEquals(DATA, reader.getCurrent().getBody()); + assertFalse(reader.advance()); + // ACK the message. + UnboundedSource.CheckpointMark checkpoint = reader.getCheckpointMark(); + checkpoint.finalizeCheckpoint(); + reader.close(); + } + + @Test + public void testTimeoutAckAndRereadOneMessage() throws IOException { + setupOneMessage(); + UnboundedSource.UnboundedReader reader = + source.createReader(pipeline.getOptions(), null); + AmazonSQS sqsClient = source.getSqs(); + assertTrue(reader.start()); + assertEquals(DATA, reader.getCurrent().getBody()); + String receiptHandle = reader.getCurrent().getReceiptHandle(); + // Set the message to timeout. + sqsClient.changeMessageVisibility(source.getRead().queueUrl(), receiptHandle, 0); + // We'll now receive the same message again. + assertTrue(reader.advance()); + assertEquals(DATA, reader.getCurrent().getBody()); + assertFalse(reader.advance()); + // Now ACK the message. + UnboundedSource.CheckpointMark checkpoint = reader.getCheckpointMark(); + checkpoint.finalizeCheckpoint(); + reader.close(); + } + + @Test + public void testMultipleReaders() throws IOException { + List incoming = new ArrayList<>(); + for (int i = 0; i < 2; i++) { + incoming.add(String.format("data_%d", i)); + } + setupMessages(incoming); + UnboundedSource.UnboundedReader reader = + source.createReader(pipeline.getOptions(), null); + // Consume two messages, only read one. 
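+ // Only the first message is read before the checkpoint is taken, so the second
+ // receipt stays in the checkpoint's not-yet-read set and a reader restored from
+ // that checkpoint re-reads it.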
+ assertTrue(reader.start()); + assertEquals("data_0", reader.getCurrent().getBody()); + + // Grab checkpoint. + SqsCheckpointMark checkpoint = (SqsCheckpointMark) reader.getCheckpointMark(); + checkpoint.finalizeCheckpoint(); + assertEquals(1, checkpoint.notYetReadReceipts.size()); + + // Read second message. + assertTrue(reader.advance()); + assertEquals("data_1", reader.getCurrent().getBody()); + + // Restore from checkpoint. + byte[] checkpointBytes = + CoderUtils.encodeToByteArray(source.getCheckpointMarkCoder(), checkpoint); + checkpoint = CoderUtils.decodeFromByteArray(source.getCheckpointMarkCoder(), checkpointBytes); + assertEquals(1, checkpoint.notYetReadReceipts.size()); + + // Re-read second message. + reader = source.createReader(pipeline.getOptions(), checkpoint); + assertTrue(reader.start()); + assertEquals("data_1", reader.getCurrent().getBody()); + + // We are done. + assertFalse(reader.advance()); + + // ACK final message. + checkpoint = (SqsCheckpointMark) reader.getCheckpointMark(); + checkpoint.finalizeCheckpoint(); + reader.close(); + } + + @Test + public void testReadMany() throws IOException { + + HashSet messages = new HashSet<>(); + List incoming = new ArrayList<>(); + for (int i = 0; i < 100; i++) { + String content = String.format("data_%d", i); + messages.add(content); + incoming.add(String.format("data_%d", i)); + } + setupMessages(incoming); + + SqsUnboundedReader reader = + (SqsUnboundedReader) source.createReader(pipeline.getOptions(), null); + + for (int i = 0; i < 100; i++) { + if (i == 0) { + assertTrue(reader.start()); + } else { + assertTrue(reader.advance()); + } + String data = reader.getCurrent().getBody(); + boolean messageNum = messages.remove(data); + // No duplicate messages. + assertTrue(messageNum); + } + // We are done. + assertFalse(reader.advance()); + // We saw each message exactly once. + assertTrue(messages.isEmpty()); + reader.close(); + } + + /** Tests that checkpoints finalized after the reader is closed succeed. */ + @Test + public void testCloseWithActiveCheckpoints() throws Exception { + setupOneMessage(); + UnboundedSource.UnboundedReader reader = + source.createReader(pipeline.getOptions(), null); + reader.start(); + UnboundedSource.CheckpointMark checkpoint = reader.getCheckpointMark(); + reader.close(); + checkpoint.finalizeCheckpoint(); + } +} diff --git a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/sqs/SqsUnboundedSourceTest.java b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/sqs/SqsUnboundedSourceTest.java new file mode 100644 index 000000000000..a5c7e6837a27 --- /dev/null +++ b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/sqs/SqsUnboundedSourceTest.java @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.aws.sqs; + +import com.amazonaws.services.sqs.AmazonSQS; +import org.apache.beam.sdk.io.aws.options.AwsOptions; +import org.apache.beam.sdk.testing.CoderProperties; +import org.apache.beam.sdk.testing.TestPipeline; +import org.junit.Rule; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Tests on {@link SqsUnboundedSource}. */ +@RunWith(JUnit4.class) +public class SqsUnboundedSourceTest { + + private static final String DATA = "testData"; + + @Rule public TestPipeline pipeline = TestPipeline.create(); + + @Rule public EmbeddedSqsServer embeddedSqsRestServer = new EmbeddedSqsServer(); + + @Test + public void testCheckpointCoderIsSane() { + final AmazonSQS client = embeddedSqsRestServer.getClient(); + final String queueUrl = embeddedSqsRestServer.getQueueUrl(); + client.sendMessage(queueUrl, DATA); + SqsUnboundedSource source = + new SqsUnboundedSource( + SqsIO.read().withQueueUrl(queueUrl).withMaxNumRecords(1), + new SqsConfiguration(pipeline.getOptions().as(AwsOptions.class))); + CoderProperties.coderSerializable(source.getCheckpointMarkCoder()); + } +} diff --git a/sdks/java/io/amazon-web-services2/OWNERS b/sdks/java/io/amazon-web-services2/OWNERS new file mode 100644 index 000000000000..ffbc8b6c2475 --- /dev/null +++ b/sdks/java/io/amazon-web-services2/OWNERS @@ -0,0 +1,4 @@ +# See the OWNERS docs at https://s.apache.org/beam-owners + +reviewers: + - aromanenko-dev diff --git a/sdks/java/io/amazon-web-services2/build.gradle b/sdks/java/io/amazon-web-services2/build.gradle index 854263ec973d..116a12cb5fd1 100644 --- a/sdks/java/io/amazon-web-services2/build.gradle +++ b/sdks/java/io/amazon-web-services2/build.gradle @@ -39,26 +39,31 @@ dependencies { compile library.java.aws_java_sdk2_sdk_core compile library.java.aws_java_sdk2_sns compile library.java.aws_java_sdk2_sqs + compile library.java.aws_java_sdk2_s3 + compile library.java.aws_java_sdk2_http_client_spi + compile library.java.aws_java_sdk2_regions + compile library.java.aws_java_sdk2_utils compile library.java.jackson_core compile library.java.jackson_annotations compile library.java.jackson_databind - compile library.java.jackson_dataformat_cbor compile library.java.joda_time compile library.java.slf4j_api - compile "software.amazon.kinesis:amazon-kinesis-client:2.2.5" + compile "software.amazon.kinesis:amazon-kinesis-client:2.3.4" compile "commons-lang:commons-lang:2.6" + compile library.java.commons_lang3 + compile library.java.http_core + compile library.java.commons_codec testCompile project(path: ":sdks:java:core", configuration: "shadowTest") testCompile project(path: ":sdks:java:io:common", configuration: "testRuntime") testCompile project(path: ":sdks:java:io:kinesis", configuration: "testRuntime") + testCompile "io.findify:s3mock_2.12:0.2.6" testCompile library.java.mockito_core testCompile library.java.guava_testlib - testCompile library.java.hamcrest_core testCompile library.java.junit testCompile 'org.elasticmq:elasticmq-rest-sqs_2.12:0.15.6' testCompile library.java.hamcrest_library testCompile library.java.powermock testCompile library.java.powermock_mockito - testCompile library.java.testcontainers_localstack testCompile "org.assertj:assertj-core:3.11.1" testRuntimeOnly library.java.slf4j_jdk14 testRuntimeOnly project(path: ":runners:direct-java", configuration: "shadow") diff --git 
a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/dynamodb/BasicDynamoDbClientProvider.java b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/dynamodb/BasicDynamoDbClientProvider.java index 80b19e6830d4..80b9243d2e4b 100644 --- a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/dynamodb/BasicDynamoDbClientProvider.java +++ b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/dynamodb/BasicDynamoDbClientProvider.java @@ -18,8 +18,11 @@ package org.apache.beam.sdk.io.aws2.dynamodb; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; import java.net.URI; +import java.util.Objects; +import org.apache.beam.sdk.io.aws2.options.AwsSerializableUtils; import org.checkerframework.checker.nullness.qual.Nullable; import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider; import software.amazon.awssdk.regions.Region; @@ -28,7 +31,7 @@ /** Basic implementation of {@link DynamoDbClientProvider} used by default in {@link DynamoDBIO}. */ public class BasicDynamoDbClientProvider implements DynamoDbClientProvider { - private final AwsCredentialsProvider awsCredentialsProvider; + private final String awsCredentialsProviderSerialized; private final String region; private final @Nullable URI serviceEndpoint; @@ -36,7 +39,9 @@ public class BasicDynamoDbClientProvider implements DynamoDbClientProvider { AwsCredentialsProvider awsCredentialsProvider, String region, @Nullable URI serviceEndpoint) { checkArgument(awsCredentialsProvider != null, "awsCredentialsProvider can not be null"); checkArgument(region != null, "region can not be null"); - this.awsCredentialsProvider = awsCredentialsProvider; + this.awsCredentialsProviderSerialized = + AwsSerializableUtils.serializeAwsCredentialsProvider(awsCredentialsProvider); + checkNotNull(awsCredentialsProviderSerialized, "awsCredentialsProviderString can not be null"); this.region = region; this.serviceEndpoint = serviceEndpoint; } @@ -45,7 +50,9 @@ public class BasicDynamoDbClientProvider implements DynamoDbClientProvider { public DynamoDbClient getDynamoDbClient() { DynamoDbClientBuilder builder = DynamoDbClient.builder() - .credentialsProvider(awsCredentialsProvider) + .credentialsProvider( + AwsSerializableUtils.deserializeAwsCredentialsProvider( + awsCredentialsProviderSerialized)) .region(Region.of(region)); if (serviceEndpoint != null) { @@ -54,4 +61,23 @@ public DynamoDbClient getDynamoDbClient() { return builder.build(); } + + @Override + public boolean equals(@Nullable Object o) { + if (this == o) { + return true; + } + if (o == null || getClass() != o.getClass()) { + return false; + } + BasicDynamoDbClientProvider that = (BasicDynamoDbClientProvider) o; + return Objects.equals(awsCredentialsProviderSerialized, that.awsCredentialsProviderSerialized) + && Objects.equals(region, that.region) + && Objects.equals(serviceEndpoint, that.serviceEndpoint); + } + + @Override + public int hashCode() { + return Objects.hash(awsCredentialsProviderSerialized, region, serviceEndpoint); + } } diff --git a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/dynamodb/DynamoDBIO.java b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/dynamodb/DynamoDBIO.java index 16f5dae47d87..e483f2ce340b 100644 --- 
a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/dynamodb/DynamoDBIO.java +++ b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/dynamodb/DynamoDBIO.java @@ -25,6 +25,7 @@ import java.net.URI; import java.util.ArrayList; import java.util.Collections; +import java.util.HashMap; import java.util.List; import java.util.Map; import java.util.function.Predicate; @@ -126,6 +127,11 @@ *

  • Mapper function with a table name to map or transform your object into KV * + * + * If primary keys could repeat in your stream (i.e. an upsert stream), you could encounter a + * ValidationError, as AWS does not allow writing duplicate keys within a single batch operation. + * For such use cases, you can explicitly set the key names corresponding to the primary key to be + * deduplicated using the withDeduplicateKeys method */ @Experimental(Kind.SOURCE_SINK) @SuppressWarnings({ @@ -138,7 +144,7 @@ public static Read read() { } public static Write write() { - return new AutoValue_DynamoDBIO_Write.Builder().build(); + return new AutoValue_DynamoDBIO_Write.Builder().setDeduplicateKeys(new ArrayList<>()).build(); } /** Read data from DynamoDB and return ScanResult. */ @@ -258,13 +264,22 @@ private static class ReadFn extends DoFn, T> { @ProcessElement public void processElement(@Element Read spec, OutputReceiver out) { DynamoDbClient client = spec.getDynamoDbClientProvider().getDynamoDbClient(); - - ScanRequest scanRequest = spec.getScanRequestFn().apply(null); - ScanRequest scanRequestWithSegment = - scanRequest.toBuilder().segment(spec.getSegmentId()).build(); - - ScanResponse scanResponse = client.scan(scanRequestWithSegment); - out.output(spec.getScanResponseMapperFn().apply(scanResponse)); + Map lastEvaluatedKey = null; + + do { + ScanRequest scanRequest = spec.getScanRequestFn().apply(null); + ScanRequest scanRequestWithSegment = + scanRequest + .toBuilder() + .segment(spec.getSegmentId()) + .exclusiveStartKey(lastEvaluatedKey) + .build(); + + ScanResponse scanResponse = client.scan(scanRequestWithSegment); + out.output(spec.getScanResponseMapperFn().apply(scanResponse)); + lastEvaluatedKey = scanResponse.lastEvaluatedKey(); + } while (lastEvaluatedKey != null + && !lastEvaluatedKey.isEmpty()); // iterate until all records are fetched } } @@ -360,6 +375,8 @@ public abstract static class Write extends PTransform, PCollec abstract @Nullable SerializableFunction> getWriteItemMapperFn(); + abstract List getDeduplicateKeys(); + abstract Builder toBuilder(); @AutoValue.Builder @@ -372,6 +389,8 @@ abstract static class Builder { abstract Builder setWriteItemMapperFn( SerializableFunction> writeItemMapperFn); + abstract Builder setDeduplicateKeys(List deduplicateKeys); + abstract Write build(); } @@ -424,6 +443,10 @@ public Write withWriteRequestMapperFn( return toBuilder().setWriteItemMapperFn(writeItemMapperFn).build(); } + public Write withDeduplicateKeys(List deduplicateKeys) { + return toBuilder().setDeduplicateKeys(deduplicateKeys).build(); + } + @Override public PCollection expand(PCollection input) { return input.apply(ParDo.of(new WriteFn<>(this))); @@ -442,7 +465,7 @@ static class WriteFn extends DoFn { private static final int BATCH_SIZE = 25; private transient DynamoDbClient client; private final Write spec; - private List> batch; + private Map>, KV> batch; WriteFn(Write spec) { this.spec = spec; @@ -465,19 +488,35 @@ public void setup() { @StartBundle public void startBundle(StartBundleContext context) { - batch = new ArrayList<>(); + batch = new HashMap<>(); } @ProcessElement public void processElement(ProcessContext context) throws Exception { final KV writeRequest = (KV) spec.getWriteItemMapperFn().apply(context.element()); - batch.add(writeRequest); + batch.put( + KV.of(writeRequest.getKey(), extractDeduplicateKeyValues(writeRequest.getValue())), + writeRequest); if (batch.size() >= BATCH_SIZE) { flushBatch(); } } + private Map extractDeduplicateKeyValues(WriteRequest 
request) { + if (request.putRequest() != null) { + return request.putRequest().item().entrySet().stream() + .filter(entry -> spec.getDeduplicateKeys().contains(entry.getKey())) + .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue)); + } else if (request.deleteRequest() != null) { + return request.deleteRequest().key().entrySet().stream() + .filter(entry -> spec.getDeduplicateKeys().contains(entry.getKey())) + .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue)); + } else { + return Collections.emptyMap(); + } + } + @FinishBundle public void finishBundle(FinishBundleContext context) throws Exception { flushBatch(); @@ -492,7 +531,7 @@ private void flushBatch() throws IOException, InterruptedException { // Since each element is a KV in the batch, we need to group them // by tableName Map> mapTableRequest = - batch.stream() + batch.values().stream() .collect( Collectors.groupingBy( KV::getKey, Collectors.mapping(KV::getValue, Collectors.toList()))); diff --git a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/options/AwsModule.java b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/options/AwsModule.java index 853681123975..7453d769d6b7 100644 --- a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/options/AwsModule.java +++ b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/options/AwsModule.java @@ -20,9 +20,7 @@ import com.fasterxml.jackson.annotation.JsonTypeInfo; import com.fasterxml.jackson.core.JsonGenerator; import com.fasterxml.jackson.core.JsonParser; -import com.fasterxml.jackson.core.JsonToken; import com.fasterxml.jackson.core.type.TypeReference; -import com.fasterxml.jackson.core.type.WritableTypeId; import com.fasterxml.jackson.databind.DeserializationContext; import com.fasterxml.jackson.databind.JsonDeserializer; import com.fasterxml.jackson.databind.JsonSerializer; @@ -40,9 +38,12 @@ import java.util.Map; import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.annotations.Experimental.Kind; +import org.apache.beam.sdk.io.aws2.s3.SSECustomerKey; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; import software.amazon.awssdk.auth.credentials.AwsBasicCredentials; +import software.amazon.awssdk.auth.credentials.AwsCredentials; import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider; +import software.amazon.awssdk.auth.credentials.AwsSessionCredentials; import software.amazon.awssdk.auth.credentials.ContainerCredentialsProvider; import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider; import software.amazon.awssdk.auth.credentials.EnvironmentVariableCredentialsProvider; @@ -65,6 +66,7 @@ public class AwsModule extends SimpleModule { private static final String ACCESS_KEY_ID = "accessKeyId"; private static final String SECRET_ACCESS_KEY = "secretAccessKey"; + private static final String SESSION_TOKEN = "sessionToken"; public static final String CONNECTION_ACQUIRE_TIMEOUT = "connectionAcquisitionTimeout"; public static final String CONNECTION_MAX_IDLE_TIMEOUT = "connectionMaxIdleTime"; public static final String CONNECTION_TIMEOUT = "connectionTimeout"; @@ -77,6 +79,7 @@ public AwsModule() { setMixInAnnotation(AwsCredentialsProvider.class, AwsCredentialsProviderMixin.class); setMixInAnnotation(ProxyConfiguration.class, ProxyConfigurationMixin.class); setMixInAnnotation(AttributeMap.class, AttributeMapMixin.class); + setMixInAnnotation(SSECustomerKey.class, 
SSECustomerKeyMixin.class); } /** A mixin to add Jackson annotations to {@link AwsCredentialsProvider}. */ @@ -107,10 +110,18 @@ public AwsCredentialsProvider deserializeWithType( throw new IOException( String.format("AWS credentials provider type name key '%s' not found", typeNameKey)); } - if (typeName.equals(StaticCredentialsProvider.class.getSimpleName())) { - return StaticCredentialsProvider.create( - AwsBasicCredentials.create(asMap.get(ACCESS_KEY_ID), asMap.get(SECRET_ACCESS_KEY))); + boolean isSession = asMap.containsKey(SESSION_TOKEN); + if (isSession) { + return StaticCredentialsProvider.create( + AwsSessionCredentials.create( + asMap.get(ACCESS_KEY_ID), + asMap.get(SECRET_ACCESS_KEY), + asMap.get(SESSION_TOKEN))); + } else { + return StaticCredentialsProvider.create( + AwsBasicCredentials.create(asMap.get(ACCESS_KEY_ID), asMap.get(SECRET_ACCESS_KEY))); + } } else if (typeName.equals(DefaultCredentialsProvider.class.getSimpleName())) { return DefaultCredentialsProvider.create(); } else if (typeName.equals(EnvironmentVariableCredentialsProvider.class.getSimpleName())) { @@ -155,19 +166,25 @@ public void serializeWithType( SerializerProvider serializer, TypeSerializer typeSerializer) throws IOException { - WritableTypeId typeId = - typeSerializer.writeTypePrefix( - jsonGenerator, typeSerializer.typeId(credentialsProvider, JsonToken.START_OBJECT)); + // BEAM-11958 Use deprecated Jackson APIs to be compatible with older versions of jackson + typeSerializer.writeTypePrefixForObject(credentialsProvider, jsonGenerator); if (credentialsProvider.getClass().equals(StaticCredentialsProvider.class)) { - jsonGenerator.writeStringField( - ACCESS_KEY_ID, credentialsProvider.resolveCredentials().accessKeyId()); - jsonGenerator.writeStringField( - SECRET_ACCESS_KEY, credentialsProvider.resolveCredentials().secretAccessKey()); + AwsCredentials credentials = credentialsProvider.resolveCredentials(); + if (credentials.getClass().equals(AwsSessionCredentials.class)) { + AwsSessionCredentials sessionCredentials = (AwsSessionCredentials) credentials; + jsonGenerator.writeStringField(ACCESS_KEY_ID, sessionCredentials.accessKeyId()); + jsonGenerator.writeStringField(SECRET_ACCESS_KEY, sessionCredentials.secretAccessKey()); + jsonGenerator.writeStringField(SESSION_TOKEN, sessionCredentials.sessionToken()); + } else { + jsonGenerator.writeStringField(ACCESS_KEY_ID, credentials.accessKeyId()); + jsonGenerator.writeStringField(SECRET_ACCESS_KEY, credentials.secretAccessKey()); + } } else if (!SINGLETON_CREDENTIAL_PROVIDERS.contains(credentialsProvider.getClass())) { throw new IllegalArgumentException( "Unsupported AWS credentials provider type " + credentialsProvider.getClass()); } - typeSerializer.writeTypeSuffix(jsonGenerator, typeId); + // BEAM-11958 Use deprecated Jackson APIs to be compatible with older versions of jackson + typeSerializer.writeTypeSuffixForObject(credentialsProvider, jsonGenerator); } } @@ -301,4 +318,21 @@ public void serialize( jsonGenerator.writeEndObject(); } } + + @JsonDeserialize(using = SSECustomerKeyDeserializer.class) + private static class SSECustomerKeyMixin {} + + private static class SSECustomerKeyDeserializer extends JsonDeserializer { + + @Override + public SSECustomerKey deserialize(JsonParser parser, DeserializationContext context) + throws IOException { + Map asMap = parser.readValueAs(new TypeReference>() {}); + + final String key = asMap.getOrDefault("key", null); + final String algorithm = asMap.getOrDefault("algorithm", null); + final String md5 = 
asMap.getOrDefault("md5", null); + return SSECustomerKey.builder().key(key).algorithm(algorithm).md5(md5).build(); + } + } } diff --git a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/options/AwsSerializableUtils.java b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/options/AwsSerializableUtils.java new file mode 100644 index 000000000000..7f24fb7a6049 --- /dev/null +++ b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/options/AwsSerializableUtils.java @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.aws2.options; + +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.ObjectMapper; +import java.io.IOException; +import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider; + +/** Utilities for working with AWS Serializables. */ +public class AwsSerializableUtils { + + public static String serializeAwsCredentialsProvider( + AwsCredentialsProvider awsCredentialsProvider) { + ObjectMapper om = new ObjectMapper(); + om.registerModule(new AwsModule()); + try { + return om.writeValueAsString(awsCredentialsProvider); + } catch (JsonProcessingException e) { + throw new IllegalArgumentException("AwsCredentialsProvider can not be serialized to Json", e); + } + } + + public static AwsCredentialsProvider deserializeAwsCredentialsProvider( + String awsCredentialsProviderSerialized) { + ObjectMapper om = new ObjectMapper(); + om.registerModule(new AwsModule()); + try { + return om.readValue(awsCredentialsProviderSerialized, AwsCredentialsProvider.class); + } catch (IOException e) { + throw new IllegalArgumentException( + "AwsCredentialsProvider can not be deserialized from Json", e); + } + } +} diff --git a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/options/S3ClientBuilderFactory.java b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/options/S3ClientBuilderFactory.java new file mode 100644 index 000000000000..ae37534807bf --- /dev/null +++ b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/options/S3ClientBuilderFactory.java @@ -0,0 +1,26 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.aws2.options; + +import software.amazon.awssdk.services.s3.S3ClientBuilder; + +/** Construct S3ClientBuilder from S3 pipeline options. */ +public interface S3ClientBuilderFactory { + + S3ClientBuilder createBuilder(S3Options s3Options); +} diff --git a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/options/S3Options.java b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/options/S3Options.java new file mode 100644 index 000000000000..dbed7fcbd539 --- /dev/null +++ b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/options/S3Options.java @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.aws2.options; + +import org.apache.beam.sdk.io.aws2.s3.DefaultS3ClientBuilderFactory; +import org.apache.beam.sdk.io.aws2.s3.SSECustomerKey; +import org.apache.beam.sdk.options.Default; +import org.apache.beam.sdk.options.DefaultValueFactory; +import org.apache.beam.sdk.options.Description; +import org.apache.beam.sdk.options.PipelineOptions; +import org.checkerframework.checker.nullness.qual.Nullable; + +/** Options used to configure Amazon Web Services S3. */ +public interface S3Options extends AwsOptions { + + @Description("AWS S3 storage class used for creating S3 objects") + @Default.String("STANDARD") + String getS3StorageClass(); + + void setS3StorageClass(String value); + + @Description( + "Size of S3 upload chunks; max upload object size is this value multiplied by 10000;" + + "default is 64MB, or 5MB in memory-constrained environments. Must be at least 5MB.") + @Default.InstanceFactory(S3UploadBufferSizeBytesFactory.class) + Integer getS3UploadBufferSizeBytes(); + + void setS3UploadBufferSizeBytes(Integer value); + + @Description("Thread pool size, limiting max concurrent S3 operations") + @Default.Integer(50) + int getS3ThreadPoolSize(); + + void setS3ThreadPoolSize(int value); + + @Description("Algorithm for SSE-S3 encryption, e.g. AES256.") + @Nullable + String getSSEAlgorithm(); + + void setSSEAlgorithm(String value); + + @Description( + "SSE key for SSE-C encryption, e.g. a base64 encoded key and the algorithm." + + "To specify on the command-line, represent the value as a JSON object. 
For example:" + + " --SSECustomerKey={\"key\": \"86glyTlCN...\", \"algorithm\": \"AES256\"}") + @Default.InstanceFactory(SSECustomerKeyFactory.class) + SSECustomerKey getSSECustomerKey(); + + void setSSECustomerKey(SSECustomerKey sseCustomerKey); + + @Description("KMS key id for SSE-KMS encryption, e.g. arn:aws:kms:....") + @Nullable + String getSSEKMSKeyId(); + + void setSSEKMSKeyId(String value); + + @Description( + "Factory class that should be created and used to create a builder of S3client." + + "Override the default value if you need a S3 client with custom properties, like path style access, etc.") + @Default.Class(DefaultS3ClientBuilderFactory.class) + Class getS3ClientFactoryClass(); + + void setS3ClientFactoryClass(Class s3ClientFactoryClass); + + /** + * Provide the default s3 upload buffer size in bytes: 64MB if more than 512MB in RAM are + * available and 5MB otherwise. + */ + class S3UploadBufferSizeBytesFactory implements DefaultValueFactory { + + public static final int REQUIRED_MEMORY_FOR_DEFAULT_BUFFER_BYTES = 536_870_912; + public static final int MINIMUM_UPLOAD_BUFFER_SIZE_BYTES = 5_242_880; + public static final int DEFAULT_UPLOAD_BUFFER_SIZE_BYTES = 67_108_864; + + @Override + public Integer create(PipelineOptions options) { + return Runtime.getRuntime().maxMemory() < REQUIRED_MEMORY_FOR_DEFAULT_BUFFER_BYTES + ? MINIMUM_UPLOAD_BUFFER_SIZE_BYTES + : DEFAULT_UPLOAD_BUFFER_SIZE_BYTES; + } + } + + class SSECustomerKeyFactory implements DefaultValueFactory { + + @Override + public SSECustomerKey create(PipelineOptions options) { + return SSECustomerKey.builder().build(); + } + } +} diff --git a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/DefaultS3ClientBuilderFactory.java b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/DefaultS3ClientBuilderFactory.java new file mode 100644 index 000000000000..8aad7fed92cb --- /dev/null +++ b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/DefaultS3ClientBuilderFactory.java @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.aws2.s3; + +import java.net.URI; +import org.apache.beam.sdk.io.aws2.options.S3ClientBuilderFactory; +import org.apache.beam.sdk.io.aws2.options.S3Options; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Strings; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import software.amazon.awssdk.http.SdkHttpClient; +import software.amazon.awssdk.http.apache.ApacheHttpClient; +import software.amazon.awssdk.regions.Region; +import software.amazon.awssdk.services.s3.S3Client; +import software.amazon.awssdk.services.s3.S3ClientBuilder; + +/** + * Construct S3ClientBuilder with default values of S3 client properties like path style access, + * accelerated mode, etc. + */ +public class DefaultS3ClientBuilderFactory implements S3ClientBuilderFactory { + + private static final Logger LOG = LoggerFactory.getLogger(DefaultS3ClientBuilderFactory.class); + + @Override + public S3ClientBuilder createBuilder(S3Options s3Options) { + S3ClientBuilder builder = + S3Client.builder().credentialsProvider(s3Options.getAwsCredentialsProvider()); + + if (s3Options.getProxyConfiguration() != null) { + SdkHttpClient httpClient = + ApacheHttpClient.builder().proxyConfiguration(s3Options.getProxyConfiguration()).build(); + builder = builder.httpClient(httpClient); + } + + if (!Strings.isNullOrEmpty(s3Options.getEndpoint())) { + URI endpoint = URI.create(s3Options.getEndpoint()); + Region region = Region.of(s3Options.getAwsRegion()); + builder.endpointOverride(endpoint).region(region); + } else if (!Strings.isNullOrEmpty(s3Options.getAwsRegion())) { + builder = builder.region(Region.of(s3Options.getAwsRegion())); + } else { + LOG.info( + "The AWS S3 Beam extension was included in this build, but the awsRegion flag " + + "was not specified. If you don't plan to use S3, then ignore this message."); + } + return builder; + } +} diff --git a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3FileSystem.java b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3FileSystem.java new file mode 100644 index 000000000000..11a5215a8d9d --- /dev/null +++ b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3FileSystem.java @@ -0,0 +1,671 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
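/*
 * Illustrative sketch by the editor, not part of this patch: a custom S3ClientBuilderFactory, as
 * suggested by the s3ClientFactoryClass option above, here enabling path-style access for an
 * S3-compatible endpoint. It reuses DefaultS3ClientBuilderFactory for credentials, proxy,
 * endpoint and region handling; S3Configuration/pathStyleAccessEnabled are standard AWS SDK v2
 * APIs and assumed available.
 */
import org.apache.beam.sdk.io.aws2.options.S3ClientBuilderFactory;
import org.apache.beam.sdk.io.aws2.options.S3Options;
import org.apache.beam.sdk.io.aws2.s3.DefaultS3ClientBuilderFactory;
import software.amazon.awssdk.services.s3.S3ClientBuilder;
import software.amazon.awssdk.services.s3.S3Configuration;

class PathStyleS3ClientBuilderFactory implements S3ClientBuilderFactory {

  @Override
  public S3ClientBuilder createBuilder(S3Options s3Options) {
    // Start from the default builder (credentials, proxy, endpoint, region),
    // then switch the client to path-style addressing.
    return new DefaultS3ClientBuilderFactory()
        .createBuilder(s3Options)
        .serviceConfiguration(S3Configuration.builder().pathStyleAccessEnabled(true).build());
  }
}
// Wired in via: s3Options.setS3ClientFactoryClass(PathStyleS3ClientBuilderFactory.class);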
+ */ +package org.apache.beam.sdk.io.aws2.s3; + +import static org.apache.beam.sdk.io.FileSystemUtils.wildcardToRegexp; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; + +import com.google.auto.value.AutoValue; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.nio.channels.ReadableByteChannel; +import java.nio.channels.WritableByteChannel; +import java.util.ArrayList; +import java.util.Collection; +import java.util.Date; +import java.util.Iterator; +import java.util.List; +import java.util.Map; +import java.util.Objects; +import java.util.concurrent.Callable; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.Executors; +import java.util.concurrent.Future; +import java.util.regex.Pattern; +import java.util.stream.Collectors; +import java.util.stream.Stream; +import org.apache.beam.sdk.io.FileSystem; +import org.apache.beam.sdk.io.aws2.options.S3ClientBuilderFactory; +import org.apache.beam.sdk.io.aws2.options.S3Options; +import org.apache.beam.sdk.io.fs.CreateOptions; +import org.apache.beam.sdk.io.fs.MatchResult; +import org.apache.beam.sdk.io.fs.MoveOptions; +import org.apache.beam.sdk.util.InstanceBuilder; +import org.apache.beam.sdk.util.MoreFutures; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Strings; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Supplier; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Suppliers; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ArrayListMultimap; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Multimap; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.ListeningExecutorService; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.MoreExecutors; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.ThreadFactoryBuilder; +import org.checkerframework.checker.nullness.qual.Nullable; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import software.amazon.awssdk.core.exception.SdkServiceException; +import software.amazon.awssdk.services.s3.S3Client; +import software.amazon.awssdk.services.s3.S3ClientBuilder; +import software.amazon.awssdk.services.s3.model.CompleteMultipartUploadRequest; +import software.amazon.awssdk.services.s3.model.CompleteMultipartUploadResponse; +import software.amazon.awssdk.services.s3.model.CompletedMultipartUpload; +import software.amazon.awssdk.services.s3.model.CompletedPart; +import software.amazon.awssdk.services.s3.model.CopyObjectRequest; +import software.amazon.awssdk.services.s3.model.CopyObjectResponse; +import software.amazon.awssdk.services.s3.model.CopyPartResult; +import software.amazon.awssdk.services.s3.model.CreateMultipartUploadRequest; +import software.amazon.awssdk.services.s3.model.CreateMultipartUploadResponse; +import 
software.amazon.awssdk.services.s3.model.Delete; +import software.amazon.awssdk.services.s3.model.DeleteObjectsRequest; +import software.amazon.awssdk.services.s3.model.HeadObjectRequest; +import software.amazon.awssdk.services.s3.model.HeadObjectResponse; +import software.amazon.awssdk.services.s3.model.ListObjectsV2Request; +import software.amazon.awssdk.services.s3.model.ListObjectsV2Response; +import software.amazon.awssdk.services.s3.model.ObjectIdentifier; +import software.amazon.awssdk.services.s3.model.S3Exception; +import software.amazon.awssdk.services.s3.model.S3Object; +import software.amazon.awssdk.services.s3.model.UploadPartCopyRequest; + +/** {@link FileSystem} implementation for Amazon S3. */ +@SuppressWarnings({ + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) +class S3FileSystem extends FileSystem { + + private static final Logger LOG = LoggerFactory.getLogger(S3FileSystem.class); + + // Amazon S3 API: You can create a copy of your object up to 5 GB in a single atomic operation + // Ref. https://docs.aws.amazon.com/AmazonS3/latest/dev/CopyingObjectsExamples.html + private static final long MAX_COPY_OBJECT_SIZE_BYTES = 5_368_709_120L; + + // S3 API, delete-objects: "You may specify up to 1000 keys." + private static final int MAX_DELETE_OBJECTS_PER_REQUEST = 1000; + + private static final ImmutableSet NON_READ_SEEK_EFFICIENT_ENCODINGS = + ImmutableSet.of("gzip"); + + // Non-final for testing. + private Supplier s3Client; + private final S3Options options; + private final ListeningExecutorService executorService; + + S3FileSystem(S3Options options) { + this.options = checkNotNull(options, "options"); + S3ClientBuilder builder = + InstanceBuilder.ofType(S3ClientBuilderFactory.class) + .fromClass(options.getS3ClientFactoryClass()) + .build() + .createBuilder(options); + // The Supplier is to make sure we don't call .build() unless we are actually using S3. 
+ s3Client = Suppliers.memoize(builder::build); + + checkNotNull(options.getS3StorageClass(), "storageClass"); + checkArgument(options.getS3ThreadPoolSize() > 0, "threadPoolSize"); + executorService = + MoreExecutors.listeningDecorator( + Executors.newFixedThreadPool( + options.getS3ThreadPoolSize(), new ThreadFactoryBuilder().setDaemon(true).build())); + } + + @Override + protected String getScheme() { + return S3ResourceId.SCHEME; + } + + @VisibleForTesting + void setS3Client(S3Client s3) { + this.s3Client = Suppliers.ofInstance(s3); + } + + @VisibleForTesting + S3Client getS3Client() { + return this.s3Client.get(); + } + + @Override + protected List match(List specs) throws IOException { + List paths = + specs.stream().map(S3ResourceId::fromUri).collect(Collectors.toList()); + List globs = new ArrayList<>(); + List nonGlobs = new ArrayList<>(); + List isGlobBooleans = new ArrayList<>(); + + paths.forEach( + path -> { + if (path.isWildcard()) { + globs.add(path); + isGlobBooleans.add(true); + } else { + nonGlobs.add(path); + isGlobBooleans.add(false); + } + }); + + Iterator globMatches = matchGlobPaths(globs).iterator(); + Iterator nonGlobMatches = matchNonGlobPaths(nonGlobs).iterator(); + + ImmutableList.Builder matchResults = ImmutableList.builder(); + isGlobBooleans.forEach( + isGlob -> { + if (isGlob) { + checkState( + globMatches.hasNext(), + "Internal error encountered in S3Filesystem: expected more elements in globMatches."); + matchResults.add(globMatches.next()); + } else { + checkState( + nonGlobMatches.hasNext(), + "Internal error encountered in S3Filesystem: expected more elements in nonGlobMatches."); + matchResults.add(nonGlobMatches.next()); + } + }); + checkState( + !globMatches.hasNext(), + "Internal error encountered in S3Filesystem: expected no more elements in globMatches."); + checkState( + !nonGlobMatches.hasNext(), + "Internal error encountered in S3Filesystem: expected no more elements in nonGlobMatches."); + + return matchResults.build(); + } + + /** Gets {@link MatchResult} representing all objects that match wildcard-containing paths. 
*/ + @VisibleForTesting + List matchGlobPaths(Collection globPaths) throws IOException { + Stream> expandTasks = + globPaths.stream().map(path -> () -> expandGlob(path)); + + Map expandedGlobByGlobPath = + callTasks(expandTasks).stream() + .collect(Collectors.toMap(ExpandedGlob::getGlobPath, expandedGlob -> expandedGlob)); + + Stream> contentTypeTasks = + expandedGlobByGlobPath.values().stream() + .map(ExpandedGlob::getExpandedPaths) + .filter(Objects::nonNull) + .flatMap(List::stream) + .map(path -> () -> getPathContentEncoding(path)); + + Map exceptionByPath = + callTasks(contentTypeTasks).stream() + .collect( + Collectors.toMap(PathWithEncoding::getPath, pathWithEncoding -> pathWithEncoding)); + + List results = new ArrayList<>(globPaths.size()); + globPaths.forEach( + globPath -> { + ExpandedGlob expandedGlob = expandedGlobByGlobPath.get(globPath); + + if (expandedGlob.getException() != null) { + results.add(MatchResult.create(MatchResult.Status.ERROR, expandedGlob.getException())); + } else { + List metadatas = new ArrayList<>(); + IOException exception = null; + for (S3ResourceId expandedPath : expandedGlob.getExpandedPaths()) { + PathWithEncoding pathWithEncoding = exceptionByPath.get(expandedPath); + + if (pathWithEncoding.getException() != null) { + exception = pathWithEncoding.getException(); + break; + } else { + metadatas.add( + createBeamMetadata( + pathWithEncoding.getPath(), pathWithEncoding.getContentEncoding())); + } + } + + if (exception != null) { + if (exception instanceof FileNotFoundException) { + results.add(MatchResult.create(MatchResult.Status.NOT_FOUND, exception)); + } else { + results.add(MatchResult.create(MatchResult.Status.ERROR, exception)); + } + } else { + results.add(MatchResult.create(MatchResult.Status.OK, metadatas)); + } + } + }); + + return ImmutableList.copyOf(results); + } + + @AutoValue + abstract static class ExpandedGlob { + + abstract S3ResourceId getGlobPath(); + + abstract @Nullable List getExpandedPaths(); + + abstract @Nullable IOException getException(); + + static ExpandedGlob create(S3ResourceId globPath, List expandedPaths) { + checkNotNull(globPath, "globPath"); + checkNotNull(expandedPaths, "expandedPaths"); + return new AutoValue_S3FileSystem_ExpandedGlob(globPath, expandedPaths, null); + } + + static ExpandedGlob create(S3ResourceId globPath, IOException exception) { + checkNotNull(globPath, "globPath"); + checkNotNull(exception, "exception"); + return new AutoValue_S3FileSystem_ExpandedGlob(globPath, null, exception); + } + } + + @AutoValue + abstract static class PathWithEncoding { + + abstract S3ResourceId getPath(); + + abstract @Nullable String getContentEncoding(); + + abstract @Nullable IOException getException(); + + static PathWithEncoding create(S3ResourceId path, String contentEncoding) { + checkNotNull(path, "path"); + checkNotNull(contentEncoding, "contentEncoding"); + return new AutoValue_S3FileSystem_PathWithEncoding(path, contentEncoding, null); + } + + static PathWithEncoding create(S3ResourceId path, IOException exception) { + checkNotNull(path, "path"); + checkNotNull(exception, "exception"); + return new AutoValue_S3FileSystem_PathWithEncoding(path, null, exception); + } + } + + private ExpandedGlob expandGlob(S3ResourceId glob) { + // The S3 API can list objects, filtered by prefix, but not by wildcard. + // Here, we find the longest prefix without wildcard "*", + // then filter the results with a regex. 
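    // Editorial note, illustrative only (not part of this patch): for a glob like
    // s3://bucket/logs/2021-*/part-*, getKeyNonWildcardPrefix() yields "logs/2021-", so the
    // listing below is a prefix-filtered ListObjectsV2 call (paged via continuation tokens),
    // and each returned key is then matched against the regexp produced by
    // wildcardToRegexp("logs/2021-*/part-*").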
+ checkArgument(glob.isWildcard(), "isWildcard"); + String keyPrefix = glob.getKeyNonWildcardPrefix(); + Pattern wildcardRegexp = Pattern.compile(wildcardToRegexp(glob.getKey())); + + LOG.debug( + "expanding bucket {}, prefix {}, against pattern {}", + glob.getBucket(), + keyPrefix, + wildcardRegexp); + + ImmutableList.Builder expandedPaths = ImmutableList.builder(); + String continuationToken = null; + + do { + ListObjectsV2Request request = + ListObjectsV2Request.builder() + .bucket(glob.getBucket()) + .prefix(keyPrefix) + .continuationToken(continuationToken) + .build(); + ListObjectsV2Response response; + try { + response = s3Client.get().listObjectsV2(request); + } catch (SdkServiceException e) { + return ExpandedGlob.create(glob, new IOException(e)); + } + continuationToken = response.nextContinuationToken(); + List contents = response.contents(); + + contents.stream() + .filter(s3Object -> wildcardRegexp.matcher(s3Object.key()).matches()) + .forEach( + s3Object -> { + S3ResourceId expandedPath = + S3ResourceId.fromComponents(glob.getBucket(), s3Object.key()) + .withSize(s3Object.size()) + .withLastModified(Date.from(s3Object.lastModified())); + LOG.debug("Expanded S3 object path {}", expandedPath); + expandedPaths.add(expandedPath); + }); + } while (continuationToken != null); + + return ExpandedGlob.create(glob, expandedPaths.build()); + } + + private PathWithEncoding getPathContentEncoding(S3ResourceId path) { + HeadObjectResponse s3ObjectHead; + try { + s3ObjectHead = getObjectHead(path); + } catch (SdkServiceException e) { + if (e instanceof S3Exception && e.statusCode() == 404) { + return PathWithEncoding.create(path, new FileNotFoundException()); + } + return PathWithEncoding.create(path, new IOException(e)); + } + return PathWithEncoding.create(path, Strings.nullToEmpty(s3ObjectHead.contentEncoding())); + } + + private List matchNonGlobPaths(Collection paths) throws IOException { + return callTasks(paths.stream().map(path -> () -> matchNonGlobPath(path))); + } + + private HeadObjectResponse getObjectHead(S3ResourceId s3ResourceId) throws SdkServiceException { + HeadObjectRequest request = + HeadObjectRequest.builder() + .bucket(s3ResourceId.getBucket()) + .key(s3ResourceId.getKey()) + .sseCustomerKey(options.getSSECustomerKey().getKey()) + .sseCustomerAlgorithm(options.getSSECustomerKey().getAlgorithm()) + .build(); + return s3Client.get().headObject(request); + } + + @VisibleForTesting + MatchResult matchNonGlobPath(S3ResourceId path) { + HeadObjectResponse s3ObjectHead; + try { + s3ObjectHead = getObjectHead(path); + } catch (SdkServiceException e) { + if (e instanceof S3Exception && e.statusCode() == 404) { + return MatchResult.create(MatchResult.Status.NOT_FOUND, new FileNotFoundException()); + } + return MatchResult.create(MatchResult.Status.ERROR, new IOException(e)); + } + + return MatchResult.create( + MatchResult.Status.OK, + ImmutableList.of( + createBeamMetadata( + path.withSize(s3ObjectHead.contentLength()) + .withLastModified(Date.from(s3ObjectHead.lastModified())), + Strings.nullToEmpty(s3ObjectHead.contentEncoding())))); + } + + private static MatchResult.Metadata createBeamMetadata( + S3ResourceId path, String contentEncoding) { + checkArgument(path.getSize().isPresent(), "The resource id should have a size."); + checkNotNull(contentEncoding, "contentEncoding"); + boolean isReadSeekEfficient = !NON_READ_SEEK_EFFICIENT_ENCODINGS.contains(contentEncoding); + + return MatchResult.Metadata.builder() + .setIsReadSeekEfficient(isReadSeekEfficient) + 
.setResourceId(path) + .setSizeBytes(path.getSize().get()) + .setLastModifiedMillis(path.getLastModified().transform(Date::getTime).or(0L)) + .build(); + } + + @Override + protected WritableByteChannel create(S3ResourceId resourceId, CreateOptions createOptions) + throws IOException { + return new S3WritableByteChannel(s3Client.get(), resourceId, createOptions.mimeType(), options); + } + + @Override + protected ReadableByteChannel open(S3ResourceId resourceId) throws IOException { + return new S3ReadableSeekableByteChannel(s3Client.get(), resourceId, options); + } + + @Override + protected void copy(List sourcePaths, List destinationPaths) + throws IOException { + checkArgument( + sourcePaths.size() == destinationPaths.size(), + "sizes of sourcePaths and destinationPaths do not match"); + + Stream.Builder> tasks = Stream.builder(); + + Iterator sourcePathsIterator = sourcePaths.iterator(); + Iterator destinationPathsIterator = destinationPaths.iterator(); + while (sourcePathsIterator.hasNext()) { + final S3ResourceId sourcePath = sourcePathsIterator.next(); + final S3ResourceId destinationPath = destinationPathsIterator.next(); + + tasks.add( + () -> { + copy(sourcePath, destinationPath); + return null; + }); + } + + callTasks(tasks.build()); + } + + @VisibleForTesting + void copy(S3ResourceId sourcePath, S3ResourceId destinationPath) throws IOException { + try { + HeadObjectResponse sourceObjectHead = getObjectHead(sourcePath); + if (sourceObjectHead.contentLength() < MAX_COPY_OBJECT_SIZE_BYTES) { + atomicCopy(sourcePath, destinationPath, sourceObjectHead); + } else { + multipartCopy(sourcePath, destinationPath, sourceObjectHead); + } + } catch (SdkServiceException e) { + throw new IOException(e); + } + } + + @VisibleForTesting + CopyObjectResponse atomicCopy( + S3ResourceId sourcePath, S3ResourceId destinationPath, HeadObjectResponse objectHead) + throws SdkServiceException { + CopyObjectRequest copyObjectRequest = + CopyObjectRequest.builder() + .copySource(sourcePath.getBucket() + "/" + sourcePath.getKey()) + .destinationBucket(destinationPath.getBucket()) + .destinationKey(destinationPath.getKey()) + .metadata(objectHead.metadata()) + .storageClass(options.getS3StorageClass()) + .serverSideEncryption(options.getSSEAlgorithm()) + .ssekmsKeyId(options.getSSEKMSKeyId()) + .sseCustomerKey(options.getSSECustomerKey().getKey()) + .sseCustomerAlgorithm(options.getSSECustomerKey().getAlgorithm()) + .copySourceSSECustomerKey(options.getSSECustomerKey().getKey()) + .copySourceSSECustomerAlgorithm(options.getSSECustomerKey().getAlgorithm()) + .sseCustomerKeyMD5(options.getSSECustomerKey().getMD5()) + .copySourceSSECustomerKeyMD5(options.getSSECustomerKey().getMD5()) + .build(); + return s3Client.get().copyObject(copyObjectRequest); + } + + @VisibleForTesting + CompleteMultipartUploadResponse multipartCopy( + S3ResourceId sourcePath, S3ResourceId destinationPath, HeadObjectResponse sourceObjectHead) + throws SdkServiceException { + CreateMultipartUploadRequest initiateUploadRequest = + CreateMultipartUploadRequest.builder() + .bucket(destinationPath.getBucket()) + .key(destinationPath.getKey()) + .storageClass(options.getS3StorageClass()) + .metadata(sourceObjectHead.metadata()) + .serverSideEncryption(options.getSSEAlgorithm()) + .ssekmsKeyId(options.getSSEKMSKeyId()) + .sseCustomerKey(options.getSSECustomerKey().getKey()) + .sseCustomerAlgorithm(options.getSSECustomerKey().getAlgorithm()) + .build(); + + CreateMultipartUploadResponse createMultipartUploadResponse = + 
s3Client.get().createMultipartUpload(initiateUploadRequest); + final String uploadId = createMultipartUploadResponse.uploadId(); + + List completedParts = new ArrayList<>(); + + final long objectSize = sourceObjectHead.contentLength(); + CopyPartResult copyPartResult; + CompletedPart completedPart; + // extra validation in case a caller calls directly S3FileSystem.multipartCopy + // without using S3FileSystem.copy in the future + if (objectSize == 0) { + final UploadPartCopyRequest uploadPartCopyRequest = + UploadPartCopyRequest.builder() + .bucket(sourcePath.getBucket()) + .key(sourcePath.getKey()) + .copySource(sourcePath.getBucket() + "/" + sourcePath.getKey()) + .uploadId(uploadId) + .partNumber(1) + .sseCustomerKey(options.getSSECustomerKey().getKey()) + .sseCustomerAlgorithm(options.getSSECustomerKey().getAlgorithm()) + .copySourceSSECustomerKey(options.getSSECustomerKey().getKey()) + .copySourceSSECustomerAlgorithm(options.getSSECustomerKey().getAlgorithm()) + .build(); + + copyPartResult = s3Client.get().uploadPartCopy(uploadPartCopyRequest).copyPartResult(); + completedPart = CompletedPart.builder().partNumber(1).eTag(copyPartResult.eTag()).build(); + completedParts.add(completedPart); + } else { + long bytePosition = 0; + Integer uploadBufferSizeBytes = options.getS3UploadBufferSizeBytes(); + // Amazon parts are 1-indexed, not zero-indexed. + for (int partNumber = 1; bytePosition < objectSize; partNumber++) { + final UploadPartCopyRequest uploadPartCopyRequest = + UploadPartCopyRequest.builder() + .bucket(sourcePath.getBucket()) + .key(sourcePath.getKey()) + .copySource(destinationPath.getBucket() + "/" + sourcePath.getKey()) + .uploadId(uploadId) + .partNumber(partNumber) + .copySourceRange( + String.format( + "bytes=%s-%s", + bytePosition, + Math.min(objectSize - 1, bytePosition + uploadBufferSizeBytes - 1))) + .sseCustomerKey(options.getSSECustomerKey().getKey()) + .sseCustomerAlgorithm(options.getSSECustomerKey().getAlgorithm()) + .copySourceSSECustomerKey(options.getSSECustomerKey().getKey()) + .copySourceSSECustomerAlgorithm(options.getSSECustomerKey().getAlgorithm()) + .build(); + + copyPartResult = s3Client.get().uploadPartCopy(uploadPartCopyRequest).copyPartResult(); + completedPart = CompletedPart.builder().partNumber(1).eTag(copyPartResult.eTag()).build(); + completedParts.add(completedPart); + + bytePosition += uploadBufferSizeBytes; + } + } + CompletedMultipartUpload completedMultipartUpload = + CompletedMultipartUpload.builder().parts(completedParts).build(); + + CompleteMultipartUploadRequest completeUploadRequest = + CompleteMultipartUploadRequest.builder() + .bucket(destinationPath.getBucket()) + .key(destinationPath.getKey()) + .uploadId(uploadId) + .multipartUpload(completedMultipartUpload) + .build(); + return s3Client.get().completeMultipartUpload(completeUploadRequest); + } + + @Override + protected void rename( + List sourceResourceIds, + List destinationResourceIds, + MoveOptions... 
moveOptions) + throws IOException { + if (moveOptions.length > 0) { + throw new UnsupportedOperationException("Support for move options is not yet implemented."); + } + copy(sourceResourceIds, destinationResourceIds); + delete(sourceResourceIds); + } + + @Override + protected void delete(Collection resourceIds) throws IOException { + List nonDirectoryPaths = + resourceIds.stream() + .filter(s3ResourceId -> !s3ResourceId.isDirectory()) + .collect(Collectors.toList()); + Multimap keysByBucket = ArrayListMultimap.create(); + nonDirectoryPaths.forEach(path -> keysByBucket.put(path.getBucket(), path.getKey())); + + Stream.Builder> tasks = Stream.builder(); + keysByBucket + .keySet() + .forEach( + bucket -> + Iterables.partition(keysByBucket.get(bucket), MAX_DELETE_OBJECTS_PER_REQUEST) + .forEach( + keysPartition -> + tasks.add( + () -> { + delete(bucket, keysPartition); + return null; + }))); + callTasks(tasks.build()); + } + + private void delete(String bucket, Collection keys) throws IOException { + checkArgument( + keys.size() <= MAX_DELETE_OBJECTS_PER_REQUEST, + "only %s keys can be deleted per request, but got %s", + MAX_DELETE_OBJECTS_PER_REQUEST, + keys.size()); + + List deleteKeyVersions = + keys.stream() + .map((key) -> ObjectIdentifier.builder().key(key).build()) + .collect(Collectors.toList()); + Delete delete = Delete.builder().objects(deleteKeyVersions).build(); + DeleteObjectsRequest deleteObjectsRequest = + DeleteObjectsRequest.builder().bucket(bucket).delete(delete).build(); + try { + s3Client.get().deleteObjects(deleteObjectsRequest); + } catch (SdkServiceException e) { + throw new IOException(e); + } + } + + @Override + protected S3ResourceId matchNewResource(String singleResourceSpec, boolean isDirectory) { + if (isDirectory) { + if (!singleResourceSpec.endsWith("/")) { + singleResourceSpec += "/"; + } + } else { + checkArgument( + !singleResourceSpec.endsWith("/"), + "Expected a file path, but [%s] ends with '/'. This is unsupported in S3FileSystem.", + singleResourceSpec); + } + return S3ResourceId.fromUri(singleResourceSpec); + } + + /** + * Invokes tasks in a thread pool, then unwraps the resulting {@link Future Futures}. + * + *

    Any task exception is wrapped in {@link IOException}. + */ + private List callTasks(Stream> tasks) throws IOException { + + try { + return MoreFutures.get( + MoreFutures.allAsList( + tasks + .map(task -> MoreFutures.supplyAsync(task::call, executorService)) + .collect(Collectors.toList()))); + + } catch (ExecutionException e) { + if (e.getCause() != null) { + if (e.getCause() instanceof IOException) { + throw (IOException) e.getCause(); + } + throw new IOException(e.getCause()); + } + throw new IOException(e); + + } catch (InterruptedException e) { + Thread.currentThread().interrupt(); + throw new IOException("executor service was interrupted"); + } + } +} diff --git a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3FileSystemRegistrar.java b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3FileSystemRegistrar.java new file mode 100644 index 000000000000..162c74b6e6ce --- /dev/null +++ b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3FileSystemRegistrar.java @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.aws2.s3; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.service.AutoService; +import javax.annotation.Nonnull; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; +import org.apache.beam.sdk.io.FileSystem; +import org.apache.beam.sdk.io.FileSystemRegistrar; +import org.apache.beam.sdk.io.aws2.options.S3Options; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; + +/** {@link AutoService} registrar for the {@link S3FileSystem}. 
*/ +@AutoService(FileSystemRegistrar.class) +@Experimental(Kind.FILESYSTEM) +public class S3FileSystemRegistrar implements FileSystemRegistrar { + + @Override + public Iterable> fromOptions(@Nonnull PipelineOptions options) { + checkNotNull(options, "Expect the runner have called FileSystems.setDefaultPipelineOptions()."); + return ImmutableList.of(new S3FileSystem(options.as(S3Options.class))); + } +} diff --git a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3ReadableSeekableByteChannel.java b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3ReadableSeekableByteChannel.java new file mode 100644 index 000000000000..7b4c33fb9548 --- /dev/null +++ b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3ReadableSeekableByteChannel.java @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.aws2.s3; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; +import static software.amazon.awssdk.utils.IoUtils.drainInputStream; + +import java.io.BufferedInputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.Channels; +import java.nio.channels.ClosedChannelException; +import java.nio.channels.NonWritableChannelException; +import java.nio.channels.ReadableByteChannel; +import java.nio.channels.SeekableByteChannel; +import org.apache.beam.sdk.io.aws2.options.S3Options; +import org.checkerframework.checker.nullness.qual.Nullable; +import software.amazon.awssdk.core.ResponseInputStream; +import software.amazon.awssdk.core.exception.SdkClientException; +import software.amazon.awssdk.core.exception.SdkServiceException; +import software.amazon.awssdk.services.s3.S3Client; +import software.amazon.awssdk.services.s3.model.GetObjectRequest; +import software.amazon.awssdk.services.s3.model.GetObjectResponse; +import software.amazon.awssdk.services.s3.model.HeadObjectRequest; + +/** A readable S3 object, as a {@link SeekableByteChannel}. 
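/*
 * Illustrative sketch by the editor, not part of this patch: with the registrar above on the
 * classpath, s3:// paths work like any other Beam filesystem path. The bucket, key pattern and
 * use of TextIO are placeholders for whatever the pipeline actually reads.
 */
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.aws2.options.S3Options;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

class S3ReadExample {
  public static void main(String[] args) {
    // e.g. --awsRegion=us-west-1; credentials come from the configured AwsCredentialsProvider.
    S3Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(S3Options.class);
    Pipeline pipeline = Pipeline.create(options);
    pipeline.apply("ReadFromS3", TextIO.read().from("s3://my-bucket/input/part-*.txt"));
    pipeline.run().waitUntilFinish();
  }
}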
*/ +@SuppressWarnings({ + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) +class S3ReadableSeekableByteChannel implements SeekableByteChannel { + + private final S3Client s3Client; + private final S3ResourceId path; + private final S3Options options; + private final long contentLength; + private long position = 0; + private boolean open = true; + private @Nullable ResponseInputStream s3ResponseInputStream; + private @Nullable ReadableByteChannel s3ObjectContentChannel; + + S3ReadableSeekableByteChannel(S3Client s3Client, S3ResourceId path, S3Options options) + throws IOException { + this.s3Client = checkNotNull(s3Client, "s3Client"); + checkNotNull(path, "path"); + this.options = checkNotNull(options, "options"); + + if (path.getSize().isPresent()) { + contentLength = path.getSize().get(); + this.path = path; + + } else { + HeadObjectRequest request = + HeadObjectRequest.builder().bucket(path.getBucket()).key(path.getKey()).build(); + try { + contentLength = s3Client.headObject(request).contentLength(); + } catch (SdkClientException e) { + throw new IOException(e); + } + this.path = path.withSize(contentLength); + } + } + + @Override + public int read(ByteBuffer destinationBuffer) throws IOException { + if (!isOpen()) { + throw new ClosedChannelException(); + } + if (!destinationBuffer.hasRemaining()) { + return 0; + } + if (position == contentLength) { + return -1; + } + + if (s3ResponseInputStream == null) { + + GetObjectRequest.Builder builder = + GetObjectRequest.builder() + .bucket(path.getBucket()) + .key(path.getKey()) + .sseCustomerKey(options.getSSECustomerKey().getKey()) + .sseCustomerAlgorithm(options.getSSECustomerKey().getAlgorithm()); + if (position > 0) { + builder.range(String.format("bytes=%s-%s", position, contentLength)); + } + GetObjectRequest request = builder.build(); + try { + s3ResponseInputStream = s3Client.getObject(request); + } catch (SdkClientException e) { + throw new IOException(e); + } + s3ObjectContentChannel = + Channels.newChannel(new BufferedInputStream(s3ResponseInputStream, 1024 * 1024)); + } + + int totalBytesRead = 0; + int bytesRead = 0; + + do { + totalBytesRead += bytesRead; + try { + bytesRead = s3ObjectContentChannel.read(destinationBuffer); + } catch (SdkServiceException e) { + throw new IOException(e); + } + } while (bytesRead > 0); + + position += totalBytesRead; + return totalBytesRead; + } + + @Override + public long position() throws ClosedChannelException { + if (!isOpen()) { + throw new ClosedChannelException(); + } + return position; + } + + @Override + public SeekableByteChannel position(long newPosition) throws IOException { + if (!isOpen()) { + throw new ClosedChannelException(); + } + checkArgument(newPosition >= 0, "newPosition too low"); + checkArgument(newPosition < contentLength, "new position too high"); + + if (newPosition == position) { + return this; + } + + // The position has changed, so close and destroy the object to induce a re-creation on the next + // call to read() + if (s3ResponseInputStream != null) { + s3ResponseInputStream.close(); + s3ResponseInputStream = null; + } + position = newPosition; + return this; + } + + @Override + public long size() throws ClosedChannelException { + if (!isOpen()) { + throw new ClosedChannelException(); + } + return contentLength; + } + + @Override + public void close() throws IOException { + if (s3ResponseInputStream != null) { + drainInputStream(s3ResponseInputStream); + s3ResponseInputStream.close(); + } + open = false; + } + + @Override + public boolean 
isOpen() { + return open; + } + + @Override + public int write(ByteBuffer src) { + throw new NonWritableChannelException(); + } + + @Override + public SeekableByteChannel truncate(long size) { + throw new NonWritableChannelException(); + } +} diff --git a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3ResourceId.java b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3ResourceId.java new file mode 100644 index 000000000000..ec99ff8f32fe --- /dev/null +++ b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3ResourceId.java @@ -0,0 +1,205 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.aws2.s3; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; + +import java.util.Date; +import java.util.Objects; +import java.util.regex.Matcher; +import java.util.regex.Pattern; +import org.apache.beam.sdk.io.fs.ResolveOptions; +import org.apache.beam.sdk.io.fs.ResourceId; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Optional; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Strings; +import org.checkerframework.checker.nullness.qual.Nullable; + +/** An identifier which represents a S3Object resource. */ +@SuppressWarnings({ + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) +class S3ResourceId implements ResourceId { + + static final String SCHEME = "s3"; + + private static final Pattern S3_URI = + Pattern.compile("(?[^:]+)://(?[^/]+)(/(?.*))?"); + + /** Matches a glob containing a wildcard, capturing the portion before the first wildcard. 
*/ + private static final Pattern GLOB_PREFIX = Pattern.compile("(?[^\\[*?]*)[\\[*?].*"); + + private final String bucket; + private final String key; + private final @Nullable Long size; + private final @Nullable Date lastModified; + + private S3ResourceId( + String bucket, String key, @Nullable Long size, @Nullable Date lastModified) { + checkArgument(!Strings.isNullOrEmpty(bucket), "bucket"); + checkArgument(!bucket.contains("/"), "bucket must not contain '/': [%s]", bucket); + this.bucket = bucket; + this.key = checkNotNull(key, "key"); + this.size = size; + this.lastModified = lastModified; + } + + static S3ResourceId fromComponents(String bucket, String key) { + if (!key.startsWith("/")) { + key = "/" + key; + } + return new S3ResourceId(bucket, key, null, null); + } + + static S3ResourceId fromUri(String uri) { + Matcher m = S3_URI.matcher(uri); + checkArgument(m.matches(), "Invalid S3 URI: [%s]", uri); + checkArgument(m.group("SCHEME").equalsIgnoreCase(SCHEME), "Invalid S3 URI scheme: [%s]", uri); + String bucket = m.group("BUCKET"); + String key = Strings.nullToEmpty(m.group("KEY")); + if (!key.startsWith("/")) { + key = "/" + key; + } + return fromComponents(bucket, key); + } + + String getBucket() { + return bucket; + } + + String getKey() { + // Skip leading slash + return key.substring(1); + } + + Optional getSize() { + return Optional.fromNullable(size); + } + + S3ResourceId withSize(long size) { + return new S3ResourceId(bucket, key, size, lastModified); + } + + Optional getLastModified() { + return Optional.fromNullable(lastModified); + } + + S3ResourceId withLastModified(Date lastModified) { + return new S3ResourceId(bucket, key, size, lastModified); + } + + @Override + public ResourceId resolve(String other, ResolveOptions resolveOptions) { + checkState(isDirectory(), "Expected this resource to be a directory, but was [%s]", toString()); + + if (resolveOptions == ResolveOptions.StandardResolveOptions.RESOLVE_DIRECTORY) { + if ("..".equals(other)) { + if ("/".equals(key)) { + return this; + } + int parentStopsAt = key.substring(0, key.length() - 1).lastIndexOf('/'); + return fromComponents(bucket, key.substring(0, parentStopsAt + 1)); + } + + if ("".equals(other)) { + return this; + } + + if (!other.endsWith("/")) { + other += "/"; + } + if (S3_URI.matcher(other).matches()) { + return fromUri(other); + } + return fromComponents(bucket, key + other); + } + + if (resolveOptions == ResolveOptions.StandardResolveOptions.RESOLVE_FILE) { + checkArgument( + !other.endsWith("/"), "Cannot resolve a file with a directory path: [%s]", other); + checkArgument(!"..".equals(other), "Cannot resolve parent as file: [%s]", other); + if (S3_URI.matcher(other).matches()) { + return fromUri(other); + } + return fromComponents(bucket, key + other); + } + + throw new UnsupportedOperationException( + String.format("Unexpected StandardResolveOptions [%s]", resolveOptions)); + } + + @Override + public ResourceId getCurrentDirectory() { + if (isDirectory()) { + return this; + } + return fromComponents(getBucket(), key.substring(0, key.lastIndexOf('/') + 1)); + } + + @Override + public String getScheme() { + return SCHEME; + } + + @Override + public @Nullable String getFilename() { + if (!isDirectory()) { + return key.substring(key.lastIndexOf('/') + 1); + } + if ("/".equals(key)) { + return null; + } + String keyWithoutTrailingSlash = key.substring(0, key.length() - 1); + return keyWithoutTrailingSlash.substring(keyWithoutTrailingSlash.lastIndexOf('/') + 1); + } + + @Override + public boolean 
isDirectory() { + return key.endsWith("/"); + } + + boolean isWildcard() { + return GLOB_PREFIX.matcher(getKey()).matches(); + } + + String getKeyNonWildcardPrefix() { + Matcher m = GLOB_PREFIX.matcher(getKey()); + checkArgument(m.matches(), String.format("Glob expression: [%s] is not expandable.", getKey())); + return m.group("PREFIX"); + } + + @Override + public String toString() { + return String.format("%s://%s%s", SCHEME, bucket, key); + } + + @Override + public boolean equals(@Nullable Object obj) { + if (!(obj instanceof S3ResourceId)) { + return false; + } + + return bucket.equals(((S3ResourceId) obj).bucket) && key.equals(((S3ResourceId) obj).key); + } + + @Override + public int hashCode() { + return Objects.hash(bucket, key); + } +} diff --git a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3WritableByteChannel.java b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3WritableByteChannel.java new file mode 100644 index 000000000000..8aaf84bd3f9c --- /dev/null +++ b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3WritableByteChannel.java @@ -0,0 +1,216 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
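/*
 * Illustrative sketch by the editor, not part of this patch: the S3ResourceId semantics above,
 * exercised through the public FileSystems/ResourceId API. Bucket and key names are placeholders.
 */
import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.aws2.options.S3Options;
import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions;
import org.apache.beam.sdk.io.fs.ResourceId;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

class S3ResourceIdExample {
  public static void main(String[] args) {
    // Registers the S3 filesystem for the "s3" scheme.
    FileSystems.setDefaultPipelineOptions(PipelineOptionsFactory.as(S3Options.class));

    ResourceId dir = FileSystems.matchNewResource("s3://my-bucket/logs/", true /* isDirectory */);
    ResourceId file = dir.resolve("2021/part-0.txt", StandardResolveOptions.RESOLVE_FILE);
    System.out.println(file); // s3://my-bucket/logs/2021/part-0.txt
  }
}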
+ */ +package org.apache.beam.sdk.io.aws2.s3; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import java.io.ByteArrayInputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.ClosedChannelException; +import java.nio.channels.WritableByteChannel; +import java.security.MessageDigest; +import java.security.NoSuchAlgorithmException; +import java.util.ArrayList; +import java.util.Base64; +import java.util.Objects; +import org.apache.beam.sdk.io.aws2.options.S3Options; +import org.apache.beam.sdk.io.aws2.options.S3Options.S3UploadBufferSizeBytesFactory; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; +import software.amazon.awssdk.core.exception.SdkClientException; +import software.amazon.awssdk.core.sync.RequestBody; +import software.amazon.awssdk.services.s3.S3Client; +import software.amazon.awssdk.services.s3.model.CompleteMultipartUploadRequest; +import software.amazon.awssdk.services.s3.model.CompletedMultipartUpload; +import software.amazon.awssdk.services.s3.model.CompletedPart; +import software.amazon.awssdk.services.s3.model.CreateMultipartUploadRequest; +import software.amazon.awssdk.services.s3.model.CreateMultipartUploadResponse; +import software.amazon.awssdk.services.s3.model.ServerSideEncryption; +import software.amazon.awssdk.services.s3.model.UploadPartRequest; +import software.amazon.awssdk.services.s3.model.UploadPartResponse; + +/** A writable S3 object, as a {@link WritableByteChannel}. */ +@SuppressWarnings({ + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) +class S3WritableByteChannel implements WritableByteChannel { + + private final S3Client s3Client; + private final S3Options options; + private final S3ResourceId path; + + private final String uploadId; + private final ByteBuffer uploadBuffer; + + // AWS S3 parts are 1-indexed, not zero-indexed. + private int partNumber = 1; + private boolean open = true; + private final MessageDigest md5 = md5(); + private final ArrayList completedParts; + + S3WritableByteChannel(S3Client s3, S3ResourceId path, String contentType, S3Options options) + throws IOException { + this.s3Client = checkNotNull(s3, "s3Client"); + this.options = checkNotNull(options); + this.path = checkNotNull(path, "path"); + + String awsKms = ServerSideEncryption.AWS_KMS.toString(); + checkArgument( + atMostOne( + options.getSSECustomerKey().getKey() != null, + (Objects.equals(options.getSSEAlgorithm(), awsKms) || options.getSSEKMSKeyId() != null), + (options.getSSEAlgorithm() != null && !options.getSSEAlgorithm().equals(awsKms))), + "Either SSECustomerKey (SSE-C) or SSEAlgorithm (SSE-S3)" + + " or SSEAwsKeyManagementParams (SSE-KMS) must not be set at the same time."); + // Amazon S3 API docs: Each part must be at least 5 MB in size, except the last part. 
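    // Editorial note, illustrative only (not part of this patch): S3 multipart uploads allow at
    // most 10,000 parts, so the buffer size checked below caps the largest writable object at
    // roughly bufferSize * 10,000 -- about 48.8 GiB at the 5 MiB minimum and about 640 GiB at the
    // 64 MiB default from S3UploadBufferSizeBytesFactory.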
+ checkArgument( + options.getS3UploadBufferSizeBytes() + >= S3UploadBufferSizeBytesFactory.MINIMUM_UPLOAD_BUFFER_SIZE_BYTES, + "S3UploadBufferSizeBytes must be at least %s bytes", + S3UploadBufferSizeBytesFactory.MINIMUM_UPLOAD_BUFFER_SIZE_BYTES); + this.uploadBuffer = ByteBuffer.allocate(options.getS3UploadBufferSizeBytes()); + + completedParts = new ArrayList<>(); + CreateMultipartUploadRequest createMultipartUploadRequest = + CreateMultipartUploadRequest.builder() + .bucket(path.getBucket()) + .key(path.getKey()) + .storageClass(options.getS3StorageClass()) + .contentType(contentType) + .serverSideEncryption(options.getSSEAlgorithm()) + .sseCustomerKey(options.getSSECustomerKey().getKey()) + .sseCustomerAlgorithm(options.getSSECustomerKey().getAlgorithm()) + .ssekmsKeyId(options.getSSEKMSKeyId()) + .sseCustomerKeyMD5(options.getSSECustomerKey().getMD5()) + .build(); + CreateMultipartUploadResponse response; + try { + response = this.s3Client.createMultipartUpload(createMultipartUploadRequest); + } catch (SdkClientException e) { + throw new IOException(e); + } + uploadId = response.uploadId(); + } + + private static MessageDigest md5() { + try { + return MessageDigest.getInstance("MD5"); + } catch (NoSuchAlgorithmException e) { + throw new IllegalStateException(e); + } + } + + @Override + public int write(ByteBuffer sourceBuffer) throws IOException { + if (!isOpen()) { + throw new ClosedChannelException(); + } + + int totalBytesWritten = 0; + while (sourceBuffer.hasRemaining()) { + int bytesWritten = Math.min(sourceBuffer.remaining(), uploadBuffer.remaining()); + totalBytesWritten += bytesWritten; + + byte[] copyBuffer = new byte[bytesWritten]; + sourceBuffer.get(copyBuffer); + uploadBuffer.put(copyBuffer); + md5.update(copyBuffer); + + if (!uploadBuffer.hasRemaining() || sourceBuffer.hasRemaining()) { + flush(); + } + } + + return totalBytesWritten; + } + + private void flush() throws IOException { + uploadBuffer.flip(); + ByteArrayInputStream inputStream = new ByteArrayInputStream(uploadBuffer.array()); + + UploadPartRequest request = + UploadPartRequest.builder() + .bucket(path.getBucket()) + .key(path.getKey()) + .uploadId(uploadId) + .partNumber(partNumber++) + .contentLength((long) uploadBuffer.remaining()) + .sseCustomerKey(options.getSSECustomerKey().getKey()) + .sseCustomerAlgorithm(options.getSSECustomerKey().getAlgorithm()) + .sseCustomerKeyMD5(options.getSSECustomerKey().getMD5()) + .contentMD5(Base64.getEncoder().encodeToString(md5.digest())) + .build(); + + UploadPartResponse response; + try { + response = + s3Client.uploadPart( + request, RequestBody.fromInputStream(inputStream, request.contentLength())); + } catch (SdkClientException e) { + throw new IOException(e); + } + CompletedPart part = + CompletedPart.builder().partNumber(request.partNumber()).eTag(response.eTag()).build(); + uploadBuffer.clear(); + md5.reset(); + completedParts.add(part); + } + + @Override + public boolean isOpen() { + return open; + } + + @Override + public void close() throws IOException { + open = false; + if (uploadBuffer.remaining() > 0) { + flush(); + } + + CompletedMultipartUpload completedMultipartUpload = + CompletedMultipartUpload.builder().parts(completedParts).build(); + + CompleteMultipartUploadRequest request = + CompleteMultipartUploadRequest.builder() + .bucket(path.getBucket()) + .key(path.getKey()) + .uploadId(uploadId) + .multipartUpload(completedMultipartUpload) + .build(); + try { + s3Client.completeMultipartUpload(request); + } catch (SdkClientException e) { + throw new 
IOException(e); + } + } + + @VisibleForTesting + static boolean atMostOne(boolean... values) { + boolean one = false; + for (boolean value : values) { + if (!one && value) { + one = true; + } else if (value) { + return false; + } + } + return true; + } +} diff --git a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/SSECustomerKey.java b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/SSECustomerKey.java new file mode 100644 index 000000000000..20000ca8c9cd --- /dev/null +++ b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/SSECustomerKey.java @@ -0,0 +1,93 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.aws2.s3; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import java.nio.charset.StandardCharsets; +import java.util.Base64; +import org.apache.commons.codec.digest.DigestUtils; +import org.checkerframework.checker.nullness.qual.Nullable; + +/** Customer provided key for use with Amazon S3 server-side encryption. */ +public class SSECustomerKey { + + private final @Nullable String key; + private final @Nullable String algorithm; + private final @Nullable String md5; + + private SSECustomerKey(Builder builder) { + checkArgument( + (builder.key == null && builder.algorithm == null) + || (builder.key != null && builder.algorithm != null), + "Encryption key and algorithm for SSE-C encryption must be specified in pairs"); + key = builder.key; + algorithm = builder.algorithm; + md5 = + builder.md5 == null && key != null + ? 
Base64.getEncoder() + .encodeToString( + DigestUtils.md5( + Base64.getDecoder().decode(key.getBytes(StandardCharsets.UTF_8)))) + : builder.md5; + } + + public @Nullable String getKey() { + return key; + } + + public @Nullable String getAlgorithm() { + return algorithm; + } + + public @Nullable String getMD5() { + return md5; + } + + public static Builder builder() { + return new Builder(); + } + + public static class Builder { + + private @Nullable String key; + private @Nullable String algorithm; + private @Nullable String md5; + + private Builder() {} + + public Builder key(String key) { + this.key = key; + return this; + } + + public Builder algorithm(String algorithm) { + this.algorithm = algorithm; + return this; + } + + public Builder md5(String md5) { + this.md5 = md5; + return this; + } + + public SSECustomerKey build() { + return new SSECustomerKey(this); + } + } +} diff --git a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/package-info.java b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/package-info.java new file mode 100644 index 000000000000..87a18c6d6d0c --- /dev/null +++ b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/package-info.java @@ -0,0 +1,23 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +/** Defines IO connectors for Amazon Web Services S3. */ +@Experimental(Kind.FILESYSTEM) +package org.apache.beam.sdk.io.aws2.s3; + +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; diff --git a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sns/BasicSnsAsyncClientProvider.java b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sns/BasicSnsAsyncClientProvider.java index b436d0f8b04f..691124c83c68 100644 --- a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sns/BasicSnsAsyncClientProvider.java +++ b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sns/BasicSnsAsyncClientProvider.java @@ -18,8 +18,11 @@ package org.apache.beam.sdk.io.aws2.sns; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; import java.net.URI; +import java.util.Objects; +import org.apache.beam.sdk.io.aws2.options.AwsSerializableUtils; import org.checkerframework.checker.nullness.qual.Nullable; import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider; import software.amazon.awssdk.regions.Region; @@ -28,7 +31,7 @@ /** Basic implementation of {@link SnsAsyncClientProvider} used by default in {@link SnsIO}. 
*/ class BasicSnsAsyncClientProvider implements SnsAsyncClientProvider { - private final AwsCredentialsProvider awsCredentialsProvider; + private final String awsCredentialsProviderSerialized; private final String region; private final @Nullable URI serviceEndpoint; @@ -36,7 +39,9 @@ class BasicSnsAsyncClientProvider implements SnsAsyncClientProvider { AwsCredentialsProvider awsCredentialsProvider, String region, @Nullable URI serviceEndpoint) { checkArgument(awsCredentialsProvider != null, "awsCredentialsProvider can not be null"); checkArgument(region != null, "region can not be null"); - this.awsCredentialsProvider = awsCredentialsProvider; + this.awsCredentialsProviderSerialized = + AwsSerializableUtils.serializeAwsCredentialsProvider(awsCredentialsProvider); + checkNotNull(awsCredentialsProviderSerialized, "awsCredentialsProviderString can not be null"); this.region = region; this.serviceEndpoint = serviceEndpoint; } @@ -45,7 +50,9 @@ class BasicSnsAsyncClientProvider implements SnsAsyncClientProvider { public SnsAsyncClient getSnsAsyncClient() { SnsAsyncClientBuilder builder = SnsAsyncClient.builder() - .credentialsProvider(awsCredentialsProvider) + .credentialsProvider( + AwsSerializableUtils.deserializeAwsCredentialsProvider( + awsCredentialsProviderSerialized)) .region(Region.of(region)); if (serviceEndpoint != null) { @@ -54,4 +61,23 @@ public SnsAsyncClient getSnsAsyncClient() { return builder.build(); } + + @Override + public boolean equals(@Nullable Object o) { + if (this == o) { + return true; + } + if (o == null || getClass() != o.getClass()) { + return false; + } + BasicSnsAsyncClientProvider that = (BasicSnsAsyncClientProvider) o; + return Objects.equals(awsCredentialsProviderSerialized, that.awsCredentialsProviderSerialized) + && Objects.equals(region, that.region) + && Objects.equals(serviceEndpoint, that.serviceEndpoint); + } + + @Override + public int hashCode() { + return Objects.hash(awsCredentialsProviderSerialized, region, serviceEndpoint); + } } diff --git a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sns/BasicSnsClientProvider.java b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sns/BasicSnsClientProvider.java index 24144eccdf31..68306cc17f7d 100644 --- a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sns/BasicSnsClientProvider.java +++ b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sns/BasicSnsClientProvider.java @@ -18,8 +18,11 @@ package org.apache.beam.sdk.io.aws2.sns; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; import java.net.URI; +import java.util.Objects; +import org.apache.beam.sdk.io.aws2.options.AwsSerializableUtils; import org.checkerframework.checker.nullness.qual.Nullable; import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider; import software.amazon.awssdk.regions.Region; @@ -28,7 +31,7 @@ /** Basic implementation of {@link SnsClientProvider} used by default in {@link SnsIO}. 
*/ class BasicSnsClientProvider implements SnsClientProvider { - private final AwsCredentialsProvider awsCredentialsProvider; + private final String awsCredentialsProviderSerialized; private final String region; private final @Nullable URI serviceEndpoint; @@ -36,7 +39,9 @@ class BasicSnsClientProvider implements SnsClientProvider { AwsCredentialsProvider awsCredentialsProvider, String region, @Nullable URI serviceEndpoint) { checkArgument(awsCredentialsProvider != null, "awsCredentialsProvider can not be null"); checkArgument(region != null, "region can not be null"); - this.awsCredentialsProvider = awsCredentialsProvider; + this.awsCredentialsProviderSerialized = + AwsSerializableUtils.serializeAwsCredentialsProvider(awsCredentialsProvider); + checkNotNull(awsCredentialsProviderSerialized, "awsCredentialsProviderString can not be null"); this.region = region; this.serviceEndpoint = serviceEndpoint; } @@ -44,7 +49,11 @@ class BasicSnsClientProvider implements SnsClientProvider { @Override public SnsClient getSnsClient() { SnsClientBuilder builder = - SnsClient.builder().credentialsProvider(awsCredentialsProvider).region(Region.of(region)); + SnsClient.builder() + .credentialsProvider( + AwsSerializableUtils.deserializeAwsCredentialsProvider( + awsCredentialsProviderSerialized)) + .region(Region.of(region)); if (serviceEndpoint != null) { builder.endpointOverride(serviceEndpoint); @@ -52,4 +61,23 @@ public SnsClient getSnsClient() { return builder.build(); } + + @Override + public boolean equals(@Nullable Object o) { + if (this == o) { + return true; + } + if (o == null || getClass() != o.getClass()) { + return false; + } + BasicSnsClientProvider that = (BasicSnsClientProvider) o; + return Objects.equals(awsCredentialsProviderSerialized, that.awsCredentialsProviderSerialized) + && Objects.equals(region, that.region) + && Objects.equals(serviceEndpoint, that.serviceEndpoint); + } + + @Override + public int hashCode() { + return Objects.hash(awsCredentialsProviderSerialized, region, serviceEndpoint); + } } diff --git a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sqs/BasicSqsClientProvider.java b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sqs/BasicSqsClientProvider.java index a4c508c13dd4..f5861664b59e 100644 --- a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sqs/BasicSqsClientProvider.java +++ b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sqs/BasicSqsClientProvider.java @@ -18,8 +18,11 @@ package org.apache.beam.sdk.io.aws2.sqs; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; import java.net.URI; +import java.util.Objects; +import org.apache.beam.sdk.io.aws2.options.AwsSerializableUtils; import org.checkerframework.checker.nullness.qual.Nullable; import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider; import software.amazon.awssdk.regions.Region; @@ -28,7 +31,7 @@ /** Basic implementation of {@link SqsClientProvider} used by default in {@link SqsIO}. 
*/ class BasicSqsClientProvider implements SqsClientProvider { - private final AwsCredentialsProvider awsCredentialsProvider; + private final String awsCredentialsProviderSerialized; private final String region; private final @Nullable URI serviceEndpoint; @@ -36,7 +39,9 @@ class BasicSqsClientProvider implements SqsClientProvider { AwsCredentialsProvider awsCredentialsProvider, String region, @Nullable URI serviceEndpoint) { checkArgument(awsCredentialsProvider != null, "awsCredentialsProvider can not be null"); checkArgument(region != null, "region can not be null"); - this.awsCredentialsProvider = awsCredentialsProvider; + this.awsCredentialsProviderSerialized = + AwsSerializableUtils.serializeAwsCredentialsProvider(awsCredentialsProvider); + checkNotNull(awsCredentialsProviderSerialized, "awsCredentialsProviderString can not be null"); this.region = region; this.serviceEndpoint = serviceEndpoint; } @@ -44,7 +49,11 @@ class BasicSqsClientProvider implements SqsClientProvider { @Override public SqsClient getSqsClient() { SqsClientBuilder builder = - SqsClient.builder().credentialsProvider(awsCredentialsProvider).region(Region.of(region)); + SqsClient.builder() + .credentialsProvider( + AwsSerializableUtils.deserializeAwsCredentialsProvider( + awsCredentialsProviderSerialized)) + .region(Region.of(region)); if (serviceEndpoint != null) { builder.endpointOverride(serviceEndpoint); @@ -52,4 +61,23 @@ public SqsClient getSqsClient() { return builder.build(); } + + @Override + public boolean equals(@Nullable Object o) { + if (this == o) { + return true; + } + if (o == null || getClass() != o.getClass()) { + return false; + } + BasicSqsClientProvider that = (BasicSqsClientProvider) o; + return Objects.equals(awsCredentialsProviderSerialized, that.awsCredentialsProviderSerialized) + && Objects.equals(region, that.region) + && Objects.equals(serviceEndpoint, that.serviceEndpoint); + } + + @Override + public int hashCode() { + return Objects.hash(awsCredentialsProviderSerialized, region, serviceEndpoint); + } } diff --git a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sqs/SqsCheckpointMark.java b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sqs/SqsCheckpointMark.java index 92ed2354de5c..2a9a630c4571 100644 --- a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sqs/SqsCheckpointMark.java +++ b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sqs/SqsCheckpointMark.java @@ -17,33 +17,70 @@ */ package org.apache.beam.sdk.io.aws2.sqs; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; + +import java.io.IOException; import java.io.Serializable; -import java.util.Collection; import java.util.List; import java.util.Optional; import org.apache.beam.sdk.io.UnboundedSource; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Objects; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.checkerframework.checker.nullness.qual.Nullable; -import software.amazon.awssdk.services.sqs.model.Message; +@SuppressWarnings({ + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) class SqsCheckpointMark implements UnboundedSource.CheckpointMark, Serializable { - private final List messagesToDelete; - private final transient Optional reader; + /** + * If the checkpoint is for 
persisting: the reader who's snapshotted state we are persisting. If + * the checkpoint is for restoring: {@literal null}. Not persisted in durable checkpoint. CAUTION: + * Between a checkpoint being taken and {@link #finalizeCheckpoint()} being called the 'true' + * active reader may have changed. + */ + private transient Optional reader; + /** + * If the checkpoint is for persisting: The ids of messages which have been passed downstream + * since the last checkpoint. If the checkpoint is for restoring: {@literal null}. Not persisted + * in durable checkpoint. + */ + private @Nullable List safeToDeleteIds; + + /** + * If the checkpoint is for persisting: The receipt handles of messages which have been received + * from SQS but not yet passed downstream at the time of the snapshot. If the checkpoint is for + * restoring: Same, but recovered from durable storage. + */ + @VisibleForTesting final List notYetReadReceipts; - SqsCheckpointMark(SqsUnboundedReader reader, Collection messagesToDelete) { + SqsCheckpointMark( + SqsUnboundedReader reader, List messagesToDelete, List notYetReadReceipts) { this.reader = Optional.of(reader); - this.messagesToDelete = ImmutableList.copyOf(messagesToDelete); + this.safeToDeleteIds = ImmutableList.copyOf(messagesToDelete); + this.notYetReadReceipts = ImmutableList.copyOf(notYetReadReceipts); } @Override - public void finalizeCheckpoint() { - reader.ifPresent(r -> r.delete(messagesToDelete)); - } - - List getMessagesToDelete() { - return messagesToDelete; + public void finalizeCheckpoint() throws IOException { + checkState( + reader.isPresent() && safeToDeleteIds != null, "Cannot finalize a restored checkpoint"); + // Even if the 'true' active reader has changed since the checkpoint was taken we are + // fine: + // - The underlying SQS topic will not have changed, so the following deletes will still + // go to the right place. + // - We'll delete the ACK ids from the readers in-flight state, but that only affect + // flow control and stats, neither of which are relevant anymore. 
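+ // Delete the acked message ids first; the finally block below then releases this
+ // checkpoint's hold on the reader, so that once the last in-flight checkpoint of a
+ // closed reader is finalized the shared SqsClient can be closed (see maybeCloseClient()).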
+ try { + reader.get().delete(safeToDeleteIds); + } finally { + int remainingInFlight = reader.get().numInFlightCheckpoints.decrementAndGet(); + checkState(remainingInFlight >= 0, "Miscounted in-flight checkpoints"); + reader.get().maybeCloseClient(); + reader = Optional.empty(); + safeToDeleteIds = null; + } } @Override @@ -55,11 +92,11 @@ public boolean equals(@Nullable Object o) { return false; } SqsCheckpointMark that = (SqsCheckpointMark) o; - return Objects.equal(messagesToDelete, that.messagesToDelete); + return Objects.equal(safeToDeleteIds, that.safeToDeleteIds); } @Override public int hashCode() { - return Objects.hashCode(messagesToDelete); + return Objects.hashCode(safeToDeleteIds); } } diff --git a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sqs/SqsMessage.java b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sqs/SqsMessage.java index e0e8daaa0861..ba895d466a90 100644 --- a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sqs/SqsMessage.java +++ b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sqs/SqsMessage.java @@ -30,8 +30,12 @@ public abstract class SqsMessage implements Serializable { abstract @Nullable String getMessageId(); + abstract @Nullable String getReceiptHandle(); + abstract @Nullable String getTimeStamp(); + abstract @Nullable String getRequestTimeStamp(); + abstract Builder toBuilder(); @AutoValue.Builder @@ -40,20 +44,33 @@ abstract static class Builder { abstract Builder setMessageId(String messageId); + abstract Builder setReceiptHandle(String receiptHandle); + abstract Builder setTimeStamp(String timeStamp); + abstract Builder setRequestTimeStamp(String timeStamp); + abstract SqsMessage build(); } - static SqsMessage create(String body, String messageId, String timeStamp) { + static SqsMessage create( + String body, + String messageId, + String receiptHandle, + String timeStamp, + String requestTimeStamp) { checkArgument(body != null, "body can not be null"); checkArgument(messageId != null, "messageId can not be null"); + checkArgument(receiptHandle != null, "receiptHandle can not be null"); checkArgument(timeStamp != null, "timeStamp can not be null"); + checkArgument(requestTimeStamp != null, "requestTimeStamp can not be null"); return new AutoValue_SqsMessage.Builder() .setBody(body) .setMessageId(messageId) + .setReceiptHandle(receiptHandle) .setTimeStamp(timeStamp) + .setRequestTimeStamp(requestTimeStamp) .build(); } } diff --git a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sqs/SqsUnboundedReader.java b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sqs/SqsUnboundedReader.java index 3d65a430786d..a6c7755ec640 100644 --- a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sqs/SqsUnboundedReader.java +++ b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sqs/SqsUnboundedReader.java @@ -17,19 +17,50 @@ */ package org.apache.beam.sdk.io.aws2.sqs; -import java.io.Serializable; -import java.nio.charset.StandardCharsets; +import static java.nio.charset.StandardCharsets.UTF_8; +import static java.util.stream.Collectors.groupingBy; +import static java.util.stream.Collectors.toMap; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; +import static software.amazon.awssdk.services.sqs.model.QueueAttributeName.VISIBILITY_TIMEOUT; + +import java.io.IOException; import 
java.util.ArrayDeque; import java.util.ArrayList; -import java.util.Collection; +import java.util.HashSet; +import java.util.LinkedHashMap; import java.util.List; +import java.util.Map; import java.util.NoSuchElementException; +import java.util.Objects; import java.util.Queue; +import java.util.Set; +import java.util.concurrent.ConcurrentLinkedQueue; +import java.util.concurrent.atomic.AtomicBoolean; +import java.util.concurrent.atomic.AtomicInteger; +import java.util.stream.Collectors; +import java.util.stream.IntStream; import org.apache.beam.sdk.io.UnboundedSource; import org.apache.beam.sdk.io.UnboundedSource.CheckpointMark; +import org.apache.beam.sdk.transforms.Combine; +import org.apache.beam.sdk.transforms.Sum; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; +import org.apache.beam.sdk.util.BucketingFunction; +import org.apache.beam.sdk.util.MovingFunction; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; +import org.joda.time.Duration; import org.joda.time.Instant; -import software.amazon.awssdk.services.sqs.model.DeleteMessageRequest; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import software.amazon.awssdk.services.sqs.SqsClient; +import software.amazon.awssdk.services.sqs.model.BatchResultErrorEntry; +import software.amazon.awssdk.services.sqs.model.ChangeMessageVisibilityBatchRequest; +import software.amazon.awssdk.services.sqs.model.ChangeMessageVisibilityBatchRequestEntry; +import software.amazon.awssdk.services.sqs.model.ChangeMessageVisibilityBatchResponse; +import software.amazon.awssdk.services.sqs.model.DeleteMessageBatchRequest; +import software.amazon.awssdk.services.sqs.model.DeleteMessageBatchRequestEntry; +import software.amazon.awssdk.services.sqs.model.DeleteMessageBatchResponse; +import software.amazon.awssdk.services.sqs.model.GetQueueAttributesRequest; import software.amazon.awssdk.services.sqs.model.Message; import software.amazon.awssdk.services.sqs.model.MessageSystemAttributeName; import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest; @@ -38,31 +69,319 @@ @SuppressWarnings({ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) -class SqsUnboundedReader extends UnboundedSource.UnboundedReader - implements Serializable { +class SqsUnboundedReader extends UnboundedSource.UnboundedReader { + private static final Logger LOG = LoggerFactory.getLogger(SqsUnboundedReader.class); + /** Maximum number of messages to pull from SQS per request. */ public static final int MAX_NUMBER_OF_MESSAGES = 10; + + /** Maximum times to retry batch SQS operations upon partial success. */ + private static final int BATCH_OPERATION_MAX_RETIRES = 5; + + /** Timeout for round trip from receiving a message to finally deleting it from SQS. */ + private static final Duration PROCESSING_TIMEOUT = Duration.standardMinutes(2); + + /** + * Percentage of visibility timeout by which to extend visibility timeout when they are near + * timeout. + */ + private static final int VISIBILITY_EXTENSION_PCT = 50; + + /** + * Percentage of ack timeout we should use as a safety margin. We'll try to extend visibility + * timeout by this margin before the visibility timeout actually expires. + */ + private static final int VISIBILITY_SAFETY_PCT = 20; + + /** + * For stats only: How close we can get to an visibility deadline before we risk it being already + * considered passed by SQS. 
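+ * For example, with the 2 second value below, a message whose tracked deadline is less than
+ * 2 seconds away is assumed to have already expired and is dropped from the in-flight map in
+ * {@code extend()} rather than having its visibility timeout extended.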
+ */ + private static final Duration VISIBILITY_TOO_LATE = Duration.standardSeconds(2); + + /** Maximum number of message ids per delete or visibility extension call. */ + private static final int DELETE_BATCH_SIZE = 10; + + /** Maximum number of messages in flight. */ + private static final int MAX_IN_FLIGHT = 20000; + + /** Period of samples to determine watermark and other stats. */ + private static final Duration SAMPLE_PERIOD = Duration.standardMinutes(1); + + /** Period of updates to determine watermark and other stats. */ + private static final Duration SAMPLE_UPDATE = Duration.standardSeconds(5); + + /** Period for logging stats. */ + private static final Duration LOG_PERIOD = Duration.standardSeconds(30); + + /** Minimum number of unread messages required before considering updating watermark. */ + private static final int MIN_WATERMARK_MESSAGES = 10; + + /** + * Minimum number of SAMPLE_UPDATE periods over which unread messages should be spread before + * considering updating watermark. + */ + private static final int MIN_WATERMARK_SPREAD = 2; + + // TODO: Would prefer to use MinLongFn but it is a BinaryCombineFn rather + // than a BinaryCombineLongFn. [BEAM-285] + private static final Combine.BinaryCombineLongFn MIN = + new Combine.BinaryCombineLongFn() { + @Override + public long apply(long left, long right) { + return Math.min(left, right); + } + + @Override + public long identity() { + return Long.MAX_VALUE; + } + }; + + private static final Combine.BinaryCombineLongFn MAX = + new Combine.BinaryCombineLongFn() { + @Override + public long apply(long left, long right) { + return Math.max(left, right); + } + + @Override + public long identity() { + return Long.MIN_VALUE; + } + }; + + private static final Combine.BinaryCombineLongFn SUM = Sum.ofLongs(); + + /** For access to topic and SQS client. */ private final SqsUnboundedSource source; + + /** + * The closed state of this {@link SqsUnboundedReader}. If true, the reader has not yet been + * closed, and it will have a non-null value within {@link #SqsUnboundedReader}. + */ + private AtomicBoolean active = new AtomicBoolean(true); + + /** The current message, or {@literal null} if none. */ private SqsMessage current; - private final Queue messagesNotYetRead; - private List messagesToDelete; - private Instant oldestPendingTimestamp = BoundedWindow.TIMESTAMP_MIN_VALUE; - public SqsUnboundedReader(SqsUnboundedSource source, SqsCheckpointMark sqsCheckpointMark) { + /** + * Messages we have received from SQS and not yet delivered downstream. We preserve their order. + */ + private final Queue messagesNotYetRead; + + /** Message ids of messages we have delivered downstream but not yet deleted. */ + private Set safeToDeleteIds; + + /** + * Visibility timeout, in ms, as set on subscription when we first start reading. Not updated + * thereafter. -1 if not yet determined. + */ + private long visibilityTimeoutMs; + + /** Byte size of undecoded elements in {@link #messagesNotYetRead}. */ + private long notYetReadBytes; + + /** + * Bucketed map from received time (as system time, ms since epoch) to message timestamps (mssince + * epoch) of all received but not-yet read messages. Used to estimate watermark. + */ + private BucketingFunction minUnreadTimestampMsSinceEpoch; + + /** + * Minimum of timestamps (ms since epoch) of all recently read messages. Used to estimate + * watermark. + */ + private MovingFunction minReadTimestampMsSinceEpoch; + + private static class InFlightState { + /** Receipt handle of message. 
*/ + String receiptHandle; + + /** When request which yielded message was issued. */ + long requestTimeMsSinceEpoch; + + /** + * When SQS will consider this message's visibility timeout to timeout and thus it needs to be + * extended. + */ + long visibilityDeadlineMsSinceEpoch; + + public InFlightState( + String receiptHandle, long requestTimeMsSinceEpoch, long visibilityDeadlineMsSinceEpoch) { + this.receiptHandle = receiptHandle; + this.requestTimeMsSinceEpoch = requestTimeMsSinceEpoch; + this.visibilityDeadlineMsSinceEpoch = visibilityDeadlineMsSinceEpoch; + } + } + + /** + * Map from message ids of messages we have received from SQS but not yet deleted to their in + * flight state. Ordered from earliest to latest visibility deadline. + */ + private final LinkedHashMap inFlight; + + /** + * Batches of successfully deleted message ids which need to be pruned from the above. CAUTION: + * Accessed by both reader and checkpointing threads. + */ + private final Queue> deletedIds; + + /** + * System time (ms since epoch) we last received a message from SQS, or -1 if not yet received any + * messages. + */ + private long lastReceivedMsSinceEpoch; + + /** The last reported watermark (ms since epoch), or beginning of time if none yet reported. */ + private long lastWatermarkMsSinceEpoch; + + /** Stats only: System time (ms since epoch) we last logs stats, or -1 if never. */ + private long lastLogTimestampMsSinceEpoch; + + /** Stats only: Total number of messages received. */ + private long numReceived; + + /** Stats only: Number of messages which have recently been received. */ + private MovingFunction numReceivedRecently; + + /** Stats only: Number of messages which have recently had their deadline extended. */ + private MovingFunction numExtendedDeadlines; + + /** + * Stats only: Number of messages which have recently had their deadline extended even though it + * may be too late to do so. + */ + private MovingFunction numLateDeadlines; + + /** Stats only: Number of messages which have recently been deleted. */ + private MovingFunction numDeleted; + + /** + * Stats only: Number of messages which have recently expired (visibility timeout were extended + * for too long). + */ + private MovingFunction numExpired; + + /** Stats only: Number of messages which have recently been returned to visible on SQS. */ + private MovingFunction numReleased; + + /** Stats only: Number of message bytes which have recently been read by downstream consumer. */ + private MovingFunction numReadBytes; + + /** + * Stats only: Minimum of timestamp (ms since epoch) of all recently received messages. Used to + * estimate timestamp skew. Does not contribute to watermark estimator. + */ + private MovingFunction minReceivedTimestampMsSinceEpoch; + + /** + * Stats only: Maximum of timestamp (ms since epoch) of all recently received messages. Used to + * estimate timestamp skew. + */ + private MovingFunction maxReceivedTimestampMsSinceEpoch; + + /** Stats only: Minimum of recent estimated watermarks (ms since epoch). */ + private MovingFunction minWatermarkMsSinceEpoch; + + /** Stats ony: Maximum of recent estimated watermarks (ms since epoch). */ + private MovingFunction maxWatermarkMsSinceEpoch; + + /** + * Stats only: Number of messages with timestamps strictly behind the estimated watermark at the + * time they are received. These may be considered 'late' by downstream computations. + */ + private MovingFunction numLateMessages; + + /** + * Stats only: Current number of checkpoints in flight. 
CAUTION: Accessed by both checkpointing + * and reader threads. + */ + AtomicInteger numInFlightCheckpoints; + + /** Stats only: Maximum number of checkpoints in flight at any time. */ + private int maxInFlightCheckpoints; + + private static MovingFunction newFun(Combine.BinaryCombineLongFn function) { + return new MovingFunction( + SAMPLE_PERIOD.getMillis(), + SAMPLE_UPDATE.getMillis(), + MIN_WATERMARK_SPREAD, + MIN_WATERMARK_MESSAGES, + function); + } + + public SqsUnboundedReader(SqsUnboundedSource source, SqsCheckpointMark sqsCheckpointMark) + throws IOException { this.source = source; - this.current = null; - this.messagesNotYetRead = new ArrayDeque<>(); - this.messagesToDelete = new ArrayList<>(); + messagesNotYetRead = new ArrayDeque<>(); + safeToDeleteIds = new HashSet<>(); + inFlight = new LinkedHashMap<>(); + deletedIds = new ConcurrentLinkedQueue<>(); + visibilityTimeoutMs = -1; + notYetReadBytes = 0; + minUnreadTimestampMsSinceEpoch = + new BucketingFunction( + SAMPLE_UPDATE.getMillis(), MIN_WATERMARK_SPREAD, MIN_WATERMARK_MESSAGES, MIN); + minReadTimestampMsSinceEpoch = newFun(MIN); + lastReceivedMsSinceEpoch = -1; + lastWatermarkMsSinceEpoch = BoundedWindow.TIMESTAMP_MIN_VALUE.getMillis(); + current = null; + lastLogTimestampMsSinceEpoch = -1; + numReceived = 0L; + numReceivedRecently = newFun(SUM); + numExtendedDeadlines = newFun(SUM); + numLateDeadlines = newFun(SUM); + numDeleted = newFun(SUM); + numExpired = newFun(SUM); + numReleased = newFun(SUM); + numReadBytes = newFun(SUM); + minReceivedTimestampMsSinceEpoch = newFun(MIN); + maxReceivedTimestampMsSinceEpoch = newFun(MAX); + minWatermarkMsSinceEpoch = newFun(MIN); + maxWatermarkMsSinceEpoch = newFun(MAX); + numLateMessages = newFun(SUM); + numInFlightCheckpoints = new AtomicInteger(); + maxInFlightCheckpoints = 0; if (sqsCheckpointMark != null) { - this.messagesToDelete.addAll(sqsCheckpointMark.getMessagesToDelete()); + long nowMsSinceEpoch = now(); + extendBatch(nowMsSinceEpoch, sqsCheckpointMark.notYetReadReceipts, 0); + numReleased.add(nowMsSinceEpoch, sqsCheckpointMark.notYetReadReceipts.size()); } } @Override public Instant getWatermark() { - return oldestPendingTimestamp; + + // NOTE: We'll allow the watermark to go backwards. The underlying runner is responsible + // for aggregating all reported watermarks and ensuring the aggregate is latched. + // If we attempt to latch locally then it is possible a temporary starvation of one reader + // could cause its estimated watermark to fast forward to current system time. Then when + // the reader resumes its watermark would be unable to resume tracking. + // By letting the underlying runner latch we avoid any problems due to localized starvation. + long nowMsSinceEpoch = now(); + long readMin = minReadTimestampMsSinceEpoch.get(nowMsSinceEpoch); + long unreadMin = minUnreadTimestampMsSinceEpoch.get(); + if (readMin == Long.MAX_VALUE + && unreadMin == Long.MAX_VALUE + && lastReceivedMsSinceEpoch >= 0 + && nowMsSinceEpoch > lastReceivedMsSinceEpoch + SAMPLE_PERIOD.getMillis()) { + // We don't currently have any unread messages pending, we have not had any messages + // read for a while, and we have not received any new messages from SQS for a while. + // Advance watermark to current time. + // TODO: Estimate a timestamp lag. + lastWatermarkMsSinceEpoch = nowMsSinceEpoch; + } else if (minReadTimestampMsSinceEpoch.isSignificant() + || minUnreadTimestampMsSinceEpoch.isSignificant()) { + // Take minimum of the timestamps in all unread messages and recently read messages. 
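+ // For example, if the oldest recently read message carried timestamp 10:00:05 but the
+ // oldest message still waiting in messagesNotYetRead carries 10:00:02, the estimate
+ // below becomes 10:00:02, holding the watermark back until that older message is read.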
+ lastWatermarkMsSinceEpoch = Math.min(readMin, unreadMin); + } + // else: We're not confident enough to estimate a new watermark. Stick with the old one. + minWatermarkMsSinceEpoch.add(nowMsSinceEpoch, lastWatermarkMsSinceEpoch); + maxWatermarkMsSinceEpoch.add(nowMsSinceEpoch, lastWatermarkMsSinceEpoch); + return new Instant(lastWatermarkMsSinceEpoch); } @Override @@ -87,12 +406,19 @@ public byte[] getCurrentRecordId() throws NoSuchElementException { if (current == null) { throw new NoSuchElementException(); } - return current.getMessageId().getBytes(StandardCharsets.UTF_8); + return current.getMessageId().getBytes(UTF_8); } @Override public CheckpointMark getCheckpointMark() { - return new SqsCheckpointMark(this, messagesToDelete); + int cur = numInFlightCheckpoints.incrementAndGet(); + maxInFlightCheckpoints = Math.max(maxInFlightCheckpoints, cur); + List snapshotSafeToDeleteIds = Lists.newArrayList(safeToDeleteIds); + List snapshotNotYetReadReceipts = new ArrayList<>(messagesNotYetRead.size()); + for (SqsMessage message : messagesNotYetRead) { + snapshotNotYetReadReceipts.add(message.getReceiptHandle()); + } + return new SqsCheckpointMark(this, snapshotSafeToDeleteIds, snapshotNotYetReadReceipts); } @Override @@ -101,66 +427,205 @@ public SqsUnboundedSource getCurrentSource() { } @Override - public boolean start() { + public boolean start() throws IOException { + visibilityTimeoutMs = + Integer.parseInt( + source + .getSqs() + .getQueueAttributes( + GetQueueAttributesRequest.builder() + .queueUrl(source.getRead().queueUrl()) + .attributeNames(VISIBILITY_TIMEOUT) + .build()) + .attributes() + .get(VISIBILITY_TIMEOUT)) + * 1000L; return advance(); } @Override - public boolean advance() { + public boolean advance() throws IOException { + // Emit stats. + stats(); + + if (current != null) { + // Current is consumed. It can no longer contribute to holding back the watermark. + minUnreadTimestampMsSinceEpoch.remove(Long.parseLong(current.getRequestTimeStamp())); + current = null; + } + + // Retire state associated with deleted messages. + retire(); + + // Extend all pressing deadlines. + // Will BLOCK until done. + // If the system is pulling messages only to let them sit in a downstream queue then + // this will have the effect of slowing down the pull rate. + // However, if the system is genuinely taking longer to process each message then + // the work to extend visibility timeout would be better done in the background. + extend(); + if (messagesNotYetRead.isEmpty()) { + // Pull another batch. + // Will BLOCK until fetch returns, but will not block until a message is available. pull(); } - Message orgMsg = messagesNotYetRead.poll(); - if (orgMsg != null) { - String timeStamp = - orgMsg.attributes().get(MessageSystemAttributeName.APPROXIMATE_FIRST_RECEIVE_TIMESTAMP); - current = SqsMessage.create(orgMsg.body(), orgMsg.messageId(), timeStamp); - } else { + // Take one message from queue. + current = messagesNotYetRead.poll(); + if (current == null) { + // Try again later. 
return false; } - - messagesToDelete.add(orgMsg); - - Instant currentMessageTimestamp = getCurrentTimestamp(); - if (getCurrentTimestamp().isBefore(oldestPendingTimestamp)) { - oldestPendingTimestamp = currentMessageTimestamp; + notYetReadBytes -= current.getBody().getBytes(UTF_8).length; + checkState(notYetReadBytes >= 0); + long nowMsSinceEpoch = now(); + numReadBytes.add(nowMsSinceEpoch, current.getBody().getBytes(UTF_8).length); + minReadTimestampMsSinceEpoch.add(nowMsSinceEpoch, getCurrentTimestamp().getMillis()); + ; + if (getCurrentTimestamp().getMillis() < lastWatermarkMsSinceEpoch) { + numLateMessages.add(nowMsSinceEpoch, 1L); } + // Current message can be considered 'read' and will be persisted by the next + // checkpoint. So it is now safe to delete from SQS. + safeToDeleteIds.add(current.getMessageId()); + return true; } + /** + * {@inheritDoc}. + * + *
<p>
    Marks this {@link SqsUnboundedReader} as no longer active. The {@link SqsClient} continue to + * exist and be active beyond the life of this call if there are any in-flight checkpoints. When + * no in-flight checkpoints remain, the reader will be closed. + */ @Override - public void close() {} - - void delete(final Collection messages) { - for (Message message : messages) { - if (messagesToDelete.contains(message)) { - DeleteMessageRequest deleteMessageRequest = - DeleteMessageRequest.builder() - .queueUrl(source.getRead().queueUrl()) - .receiptHandle(message.receiptHandle()) - .build(); - - source.getSqs().deleteMessage(deleteMessageRequest); - Instant currentMessageTimestamp = - getTimestamp( - message - .attributes() - .get(MessageSystemAttributeName.APPROXIMATE_FIRST_RECEIVE_TIMESTAMP)); - if (currentMessageTimestamp.isAfter(oldestPendingTimestamp)) { - oldestPendingTimestamp = currentMessageTimestamp; - } + public void close() throws IOException { + active.set(false); + maybeCloseClient(); + } + + /** + * Close this reader's underlying {@link SqsClient} if the reader has been closed and there are no + * outstanding checkpoints. + */ + void maybeCloseClient() throws IOException { + if (!active.get() && numInFlightCheckpoints.get() == 0) { + // The reader has been closed and it has no more outstanding checkpoints. The client + // must be closed so it doesn't leak + SqsClient client = source.getSqs(); + if (client != null) { + client.close(); } } } + /** delete the provided {@code messageIds} from SQS. */ + void delete(List messageIds) throws IOException { + AtomicInteger counter = new AtomicInteger(); + for (List messageList : + messageIds.stream() + .collect(groupingBy(x -> counter.getAndIncrement() / DELETE_BATCH_SIZE)) + .values()) { + deleteBatch(messageList); + } + } + + /** + * delete the provided {@code messageIds} from SQS, blocking until all of the messages are + * deleted. + * + *
<p>
    CAUTION: May be invoked from a separate thread. + * + *
<p>
    CAUTION: Retains {@code messageIds}. + */ + private void deleteBatch(List messageIds) throws IOException { + int retries = 0; + Map pendingReceipts = + IntStream.range(0, messageIds.size()) + .boxed() + .filter(i -> inFlight.containsKey(messageIds.get(i))) + .collect(toMap(Object::toString, i -> inFlight.get(messageIds.get(i)).receiptHandle)); + + while (!pendingReceipts.isEmpty()) { + + if (retries >= BATCH_OPERATION_MAX_RETIRES) { + throw new IOException( + "Failed to extend visibility timeout for " + + pendingReceipts.size() + + " messages after " + + retries + + " retries"); + } + + List entries = + pendingReceipts.entrySet().stream() + .map( + r -> + DeleteMessageBatchRequestEntry.builder() + .id(r.getKey()) + .receiptHandle(r.getValue()) + .build()) + .collect(Collectors.toList()); + + DeleteMessageBatchResponse result = + source + .getSqs() + .deleteMessageBatch( + DeleteMessageBatchRequest.builder() + .queueUrl(source.getRead().queueUrl()) + .entries(entries) + .build()); + + // Reflect failed message IDs to map + pendingReceipts + .keySet() + .retainAll( + result.failed().stream().map(BatchResultErrorEntry::id).collect(Collectors.toSet())); + + retries += 1; + } + deletedIds.add(messageIds); + } + + /** + * Messages which have been deleted (via the checkpoint finalize) are no longer in flight. This is + * only used for flow control and stats. + */ + private void retire() { + long nowMsSinceEpoch = now(); + while (true) { + List ackIds = deletedIds.poll(); + if (ackIds == null) { + return; + } + numDeleted.add(nowMsSinceEpoch, ackIds.size()); + for (String ackId : ackIds) { + inFlight.remove(ackId); + safeToDeleteIds.remove(ackId); + } + } + } + + /** BLOCKING Fetch another batch of messages from SQS. */ private void pull() { + if (inFlight.size() >= MAX_IN_FLIGHT) { + // Wait for checkpoint to be finalized before pulling anymore. + // There may be lag while checkpoints are persisted and the finalizeCheckpoint method + // is invoked. By limiting the in-flight messages we can ensure we don't end up consuming + // messages faster than we can checkpoint them. + return; + } + + long requestTimeMsSinceEpoch = now(); + long deadlineMsSinceEpoch = requestTimeMsSinceEpoch + visibilityTimeoutMs; + final ReceiveMessageRequest receiveMessageRequest = ReceiveMessageRequest.builder() .maxNumberOfMessages(MAX_NUMBER_OF_MESSAGES) - .attributeNamesWithStrings( - MessageSystemAttributeName.APPROXIMATE_FIRST_RECEIVE_TIMESTAMP.toString()) + .attributeNamesWithStrings(MessageSystemAttributeName.SENT_TIMESTAMP.toString()) .queueUrl(source.getRead().queueUrl()) .build(); @@ -173,7 +638,270 @@ private void pull() { return; } - messagesNotYetRead.addAll(messages); + lastReceivedMsSinceEpoch = requestTimeMsSinceEpoch; + + // Capture the received messages. 
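+ // Each message is queued for the reader, tracked in the in-flight map keyed by message id
+ // (so it can later be deleted or have its visibility timeout extended), and folded into
+ // the byte-count and timestamp statistics that feed the watermark estimate.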
+ for (Message orgMsg : messages) { + String timeStamp = orgMsg.attributes().get(MessageSystemAttributeName.SENT_TIMESTAMP); + String requestTimeStamp = Long.toString(requestTimeMsSinceEpoch); + SqsMessage message = + SqsMessage.create( + orgMsg.body(), + orgMsg.messageId(), + orgMsg.receiptHandle(), + timeStamp, + requestTimeStamp); + messagesNotYetRead.add(message); + notYetReadBytes += message.getBody().getBytes(UTF_8).length; + inFlight.put( + message.getMessageId(), + new InFlightState( + message.getReceiptHandle(), requestTimeMsSinceEpoch, deadlineMsSinceEpoch)); + numReceived++; + numReceivedRecently.add(requestTimeMsSinceEpoch, 1L); + minReceivedTimestampMsSinceEpoch.add( + requestTimeMsSinceEpoch, getTimestamp(message.getTimeStamp()).getMillis()); + maxReceivedTimestampMsSinceEpoch.add( + requestTimeMsSinceEpoch, getTimestamp(message.getTimeStamp()).getMillis()); + minUnreadTimestampMsSinceEpoch.add( + requestTimeMsSinceEpoch, getTimestamp(message.getTimeStamp()).getMillis()); + } + } + + /** Return the current time, in ms since epoch. */ + long now() { + return System.currentTimeMillis(); + } + + /** + * BLOCKING Extend deadline for all messages which need it. CAUTION: If extensions can't keep up + * with wallclock then we'll never return. + */ + private void extend() throws IOException { + while (true) { + long nowMsSinceEpoch = now(); + List assumeExpired = new ArrayList<>(); + List toBeExtended = new ArrayList<>(); + List toBeExpired = new ArrayList<>(); + // Messages will be in increasing deadline order. + for (Map.Entry entry : inFlight.entrySet()) { + if (entry.getValue().visibilityDeadlineMsSinceEpoch + - (visibilityTimeoutMs * VISIBILITY_SAFETY_PCT) / 100 + > nowMsSinceEpoch) { + // All remaining messages don't need their visibility timeouts to be extended. + break; + } + + if (entry.getValue().visibilityDeadlineMsSinceEpoch - VISIBILITY_TOO_LATE.getMillis() + < nowMsSinceEpoch) { + // SQS may have already considered this message to have expired. + // If so it will (eventually) be made available on a future pull request. + // If this message ends up being committed then it will be considered a duplicate + // when re-pulled. + assumeExpired.add(entry.getKey()); + continue; + } + + if (entry.getValue().requestTimeMsSinceEpoch + PROCESSING_TIMEOUT.getMillis() + < nowMsSinceEpoch) { + // This message has been in-flight for too long. + // Give up on it, otherwise we risk extending its visibility timeout indefinitely. + toBeExpired.add(entry.getKey()); + continue; + } + + // Extend the visibility timeout for this message. + toBeExtended.add(entry.getKey()); + if (toBeExtended.size() >= DELETE_BATCH_SIZE) { + // Enough for one batch. + break; + } + } + + if (assumeExpired.isEmpty() && toBeExtended.isEmpty() && toBeExpired.isEmpty()) { + // Nothing to be done. + return; + } + + if (!assumeExpired.isEmpty()) { + // If we didn't make the visibility deadline assume expired and no longer in flight. + numLateDeadlines.add(nowMsSinceEpoch, assumeExpired.size()); + for (String messageId : assumeExpired) { + inFlight.remove(messageId); + } + } + + if (!toBeExpired.isEmpty()) { + // Expired messages are no longer considered in flight. + numExpired.add(nowMsSinceEpoch, toBeExpired.size()); + for (String messageId : toBeExpired) { + inFlight.remove(messageId); + } + } + + if (!toBeExtended.isEmpty()) { + // SQS extends visibility timeout from it's notion of current time. + // We'll try to track that on our side, but note the deadlines won't necessarily agree. 
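+ // For example, with a 30 second visibility timeout (visibilityTimeoutMs = 30000) and
+ // VISIBILITY_EXTENSION_PCT = 50, extensionMs works out to 15000, so each extension pushes
+ // the locally tracked deadline roughly 15 seconds further into the future.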
+ long extensionMs = (int) ((visibilityTimeoutMs * VISIBILITY_EXTENSION_PCT) / 100L); + long newDeadlineMsSinceEpoch = nowMsSinceEpoch + extensionMs; + for (String messageId : toBeExtended) { + // Maintain increasing ack deadline order. + String receiptHandle = inFlight.get(messageId).receiptHandle; + InFlightState state = inFlight.remove(messageId); + inFlight.put( + messageId, + new InFlightState( + receiptHandle, state.requestTimeMsSinceEpoch, newDeadlineMsSinceEpoch)); + } + List receiptHandles = + toBeExtended.stream() + .map(inFlight::get) + .filter(Objects::nonNull) // get rid of null values + .map(m -> m.receiptHandle) + .collect(Collectors.toList()); + // BLOCKs until extended. + extendBatch(nowMsSinceEpoch, receiptHandles, (int) (extensionMs / 1000)); + } + } + } + + /** + * BLOCKING Extend the visibility timeout for messages from SQS with the given {@code + * receiptHandles}. + */ + void extendBatch(long nowMsSinceEpoch, List receiptHandles, int extensionSec) + throws IOException { + int retries = 0; + int numMessages = receiptHandles.size(); + Map pendingReceipts = + IntStream.range(0, receiptHandles.size()) + .boxed() + .collect(toMap(Object::toString, receiptHandles::get)); + + while (!pendingReceipts.isEmpty()) { + + if (retries >= BATCH_OPERATION_MAX_RETIRES) { + throw new IOException( + "Failed to extend visibility timeout for " + + receiptHandles.size() + + " messages after " + + retries + + " retries"); + } + + List entries = + pendingReceipts.entrySet().stream() + .map( + r -> + ChangeMessageVisibilityBatchRequestEntry.builder() + .id(r.getKey()) + .receiptHandle(r.getValue()) + .visibilityTimeout(extensionSec) + .build()) + .collect(Collectors.toList()); + + ChangeMessageVisibilityBatchResponse response = + source + .getSqs() + .changeMessageVisibilityBatch( + ChangeMessageVisibilityBatchRequest.builder() + .queueUrl(source.getRead().queueUrl()) + .entries(entries) + .build()); + + pendingReceipts + .keySet() + .retainAll( + response.failed().stream() + .map(BatchResultErrorEntry::id) + .collect(Collectors.toSet())); + + retries += 1; + } + numExtendedDeadlines.add(nowMsSinceEpoch, numMessages); + } + + /** Log stats if time to do so. 
*/ + private void stats() { + long nowMsSinceEpoch = now(); + if (lastLogTimestampMsSinceEpoch < 0) { + lastLogTimestampMsSinceEpoch = nowMsSinceEpoch; + return; + } + long deltaMs = nowMsSinceEpoch - lastLogTimestampMsSinceEpoch; + if (deltaMs < LOG_PERIOD.getMillis()) { + return; + } + + String messageSkew = "unknown"; + long minTimestamp = minReceivedTimestampMsSinceEpoch.get(nowMsSinceEpoch); + long maxTimestamp = maxReceivedTimestampMsSinceEpoch.get(nowMsSinceEpoch); + if (minTimestamp < Long.MAX_VALUE && maxTimestamp > Long.MIN_VALUE) { + messageSkew = (maxTimestamp - minTimestamp) + "ms"; + } + + String watermarkSkew = "unknown"; + long minWatermark = minWatermarkMsSinceEpoch.get(nowMsSinceEpoch); + long maxWatermark = maxWatermarkMsSinceEpoch.get(nowMsSinceEpoch); + if (minWatermark < Long.MAX_VALUE && maxWatermark > Long.MIN_VALUE) { + watermarkSkew = (maxWatermark - minWatermark) + "ms"; + } + + String oldestInFlight = "no"; + String oldestAckId = Iterables.getFirst(inFlight.keySet(), null); + if (oldestAckId != null) { + oldestInFlight = (nowMsSinceEpoch - inFlight.get(oldestAckId).requestTimeMsSinceEpoch) + "ms"; + } + + LOG.info( + "SQS {} has " + + "{} received messages, " + + "{} current unread messages, " + + "{} current unread bytes, " + + "{} current in-flight msgs, " + + "{} oldest in-flight, " + + "{} current in-flight checkpoints, " + + "{} max in-flight checkpoints, " + + "{}B/s recent read, " + + "{} recent received, " + + "{} recent extended, " + + "{} recent late extended, " + + "{} recent deleted, " + + "{} recent released, " + + "{} recent expired, " + + "{} recent message timestamp skew, " + + "{} recent watermark skew, " + + "{} recent late messages, " + + "{} last reported watermark, " + + "{} min recent read timestamp (significance = {}), " + + "{} min recent unread timestamp (significance = {}), " + + "{} last receive timestamp", + source.getRead().queueUrl(), + numReceived, + messagesNotYetRead.size(), + notYetReadBytes, + inFlight.size(), + oldestInFlight, + numInFlightCheckpoints.get(), + maxInFlightCheckpoints, + numReadBytes.get(nowMsSinceEpoch) / (SAMPLE_PERIOD.getMillis() / 1000L), + numReceivedRecently.get(nowMsSinceEpoch), + numExtendedDeadlines.get(nowMsSinceEpoch), + numLateDeadlines.get(nowMsSinceEpoch), + numDeleted.get(nowMsSinceEpoch), + numReleased.get(nowMsSinceEpoch), + numExpired.get(nowMsSinceEpoch), + messageSkew, + watermarkSkew, + numLateMessages.get(nowMsSinceEpoch), + new Instant(lastWatermarkMsSinceEpoch), + new Instant(minReadTimestampMsSinceEpoch.get(nowMsSinceEpoch)), + minReadTimestampMsSinceEpoch.isSignificant(), + new Instant(minUnreadTimestampMsSinceEpoch.get()), + minUnreadTimestampMsSinceEpoch.isSignificant(), + new Instant(lastReceivedMsSinceEpoch)); + + lastLogTimestampMsSinceEpoch = nowMsSinceEpoch; } private Instant getTimestamp(String timeStamp) { diff --git a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sqs/SqsUnboundedSource.java b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sqs/SqsUnboundedSource.java index b794b33b838f..96a616504ce3 100644 --- a/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sqs/SqsUnboundedSource.java +++ b/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sqs/SqsUnboundedSource.java @@ -17,6 +17,7 @@ */ package org.apache.beam.sdk.io.aws2.sqs; +import java.io.IOException; import java.io.Serializable; import java.util.ArrayList; import java.util.List; @@ -56,7 +57,11 @@ public List 
split(int desiredNumSplits, PipelineOptions opti @Override public UnboundedReader createReader( PipelineOptions options, @Nullable SqsCheckpointMark checkpointMark) { - return new SqsUnboundedReader(this, checkpointMark); + try { + return new SqsUnboundedReader(this, checkpointMark); + } catch (IOException e) { + throw new RuntimeException("Unable to subscribe to " + read.queueUrl() + ": ", e); + } } @Override diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/dynamodb/BasicDynamoDbClientProviderTest.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/dynamodb/BasicDynamoDbClientProviderTest.java new file mode 100644 index 000000000000..29c41ad6a517 --- /dev/null +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/dynamodb/BasicDynamoDbClientProviderTest.java @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.aws2.dynamodb; + +import static org.junit.Assert.assertEquals; + +import org.apache.beam.sdk.util.SerializableUtils; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; +import software.amazon.awssdk.auth.credentials.AwsBasicCredentials; +import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider; +import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider; + +/** Tests on {@link BasicDynamoDbClientProvider}. 
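+ * Verifies the provider round-trips through Java serialization with its configuration intact.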
*/ +@RunWith(JUnit4.class) +public class BasicDynamoDbClientProviderTest { + + @Test + public void testSerialization() { + AwsCredentialsProvider awsCredentialsProvider = + StaticCredentialsProvider.create( + AwsBasicCredentials.create("ACCESS_KEY_ID", "SECRET_ACCESS_KEY")); + + BasicDynamoDbClientProvider dynamoDbClientProvider = + new BasicDynamoDbClientProvider(awsCredentialsProvider, "us-east-1", null); + + byte[] serializedBytes = SerializableUtils.serializeToByteArray(dynamoDbClientProvider); + + BasicDynamoDbClientProvider dynamoDbClientProviderDeserialized = + (BasicDynamoDbClientProvider) + SerializableUtils.deserializeFromByteArray(serializedBytes, "Aws Credentials Provider"); + + assertEquals(dynamoDbClientProvider, dynamoDbClientProviderDeserialized); + } +} diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/dynamodb/DynamoDBIOTest.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/dynamodb/DynamoDBIOTest.java index 785a5bd7eeeb..6567d54544aa 100644 --- a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/dynamodb/DynamoDBIOTest.java +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/dynamodb/DynamoDBIOTest.java @@ -22,8 +22,11 @@ import static org.junit.Assert.fail; import java.io.Serializable; +import java.util.Arrays; +import java.util.HashSet; import java.util.List; import java.util.Map; +import java.util.stream.Collectors; import org.apache.beam.sdk.Pipeline; import org.apache.beam.sdk.testing.ExpectedLogs; import org.apache.beam.sdk.testing.PAssert; @@ -45,10 +48,10 @@ import org.junit.AfterClass; import org.junit.Before; import org.junit.BeforeClass; -import org.junit.Ignore; import org.junit.Rule; import org.junit.Test; import org.junit.rules.ExpectedException; +import org.mockito.ArgumentCaptor; import org.mockito.Mockito; import software.amazon.awssdk.services.dynamodb.DynamoDbClient; import software.amazon.awssdk.services.dynamodb.model.AttributeValue; @@ -59,10 +62,6 @@ import software.amazon.awssdk.services.dynamodb.model.WriteRequest; /** Test Coverage for the IO. */ -@Ignore("[BEAM-7794] DynamoDBIOTest is blocking forever") -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DynamoDBIOTest implements Serializable { @Rule public final transient TestPipeline pipeline = TestPipeline.create(); @Rule public final transient ExpectedLogs expectedLogs = ExpectedLogs.none(DynamoDBIO.class); @@ -144,6 +143,34 @@ public void processElement( pipeline.run().waitUntilFinish(); } + @Test + public void testReaderWithLimit() { + List> expected = + DynamoDBIOTestHelper.generateTestData(tableName, numOfItems); + + // Maximum number of records in scan result + final int limit = 5; + + PCollection> actual = + pipeline + .apply( + DynamoDBIO.>>read() + .withDynamoDbClientProvider( + DynamoDbClientProviderMock.of(DynamoDBIOTestHelper.getDynamoDBClient())) + .withScanRequestFn( + (SerializableFunction) + input -> + ScanRequest.builder() + .tableName(tableName) + .totalSegments(1) + .limit(limit) + .build()) + .items()) + .apply(ParDo.of(new IterateListDoFn())); + PAssert.that(actual).containsInAnyOrder(expected); + pipeline.run().waitUntilFinish(); + } + // Test cases for Reader's arguments. 
@Test public void testMissingScanRequestFn() { @@ -313,4 +340,96 @@ public void testRetries() throws Throwable { } fail("Pipeline is expected to fail because we were unable to write to DynamoDb."); } + + /** + * A DoFn used to generate outputs duplicated N times, where N is the input. Used to generate + * bundles with duplicate elements. + */ + private static class WriteDuplicateGeneratorDoFn extends DoFn> { + @ProcessElement + public void processElement(ProcessContext ctx) { + for (int i = 0; i < ctx.element(); i++) { + for (int j = 1; j <= numOfItems; j++) { + KV item = KV.of("test" + j, 1000 + j); + ctx.output(item); + } + } + } + } + + @Test + public void testWriteDeduplication() { + // designate duplication factor for each bundle + final List duplications = Arrays.asList(1, 2, 3); + + final List deduplicateKeys = Arrays.asList("hashKey1", "rangeKey2"); + + DynamoDbClient amazonDynamoDBMock = Mockito.mock(DynamoDbClient.class); + + pipeline + .apply(Create.of(duplications)) + .apply("duplicate", ParDo.of(new WriteDuplicateGeneratorDoFn())) + .apply( + DynamoDBIO.>write() + .withWriteRequestMapperFn( + (SerializableFunction, KV>) + entry -> { + Map putRequest = + ImmutableMap.of( + "hashKey1", + AttributeValue.builder().s(entry.getKey()).build(), + "rangeKey2", + AttributeValue.builder().n(entry.getValue().toString()).build()); + + WriteRequest writeRequest = + WriteRequest.builder() + .putRequest(PutRequest.builder().item(putRequest).build()) + .build(); + return KV.of(tableName, writeRequest); + }) + .withRetryConfiguration( + DynamoDBIO.RetryConfiguration.builder() + .setMaxAttempts(5) + .setMaxDuration(Duration.standardMinutes(1)) + .setRetryPredicate(DEFAULT_RETRY_PREDICATE) + .build()) + .withDynamoDbClientProvider(DynamoDbClientProviderMock.of(amazonDynamoDBMock)) + .withDeduplicateKeys(deduplicateKeys)); + + pipeline.run().waitUntilFinish(); + + ArgumentCaptor argumentCaptor = + ArgumentCaptor.forClass(BatchWriteItemRequest.class); + Mockito.verify(amazonDynamoDBMock, Mockito.times(3)).batchWriteItem(argumentCaptor.capture()); + List batchRequests = argumentCaptor.getAllValues(); + batchRequests.forEach( + batchRequest -> { + List requests = batchRequest.requestItems().get(tableName); + // assert that each bundle contains expected number of items + assertEquals(numOfItems, requests.size()); + List> requestKeys = + requests.stream() + .map( + request -> + request.putRequest() != null + ? request.putRequest().item() + : request.deleteRequest().key()) + .collect(Collectors.toList()); + // assert no duplicate keys in each bundle + assertEquals(new HashSet<>(requestKeys).size(), requestKeys.size()); + }); + } + + private static class IterateListDoFn + extends DoFn>, Map> { + + @ProcessElement + public void processElement( + @Element List> items, + OutputReceiver> out) { + for (Map item : items) { + out.output(item); + } + } + } } diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/dynamodb/DynamoDBIOTestHelper.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/dynamodb/DynamoDBIOTestHelper.java index d7be4497ed7a..f28ba01a13f5 100644 --- a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/dynamodb/DynamoDBIOTestHelper.java +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/dynamodb/DynamoDBIOTestHelper.java @@ -55,7 +55,6 @@ /** A utility to generate test table and data for {@link DynamoDBIOTest}. 
*/ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) class DynamoDBIOTestHelper implements Serializable { private static final String DYNAMODB_LOCAL_VERSION = "1.13.3"; diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/dynamodb/DynamoDbClientProviderMock.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/dynamodb/DynamoDbClientProviderMock.java index 0f21a7de7a25..a01796607baa 100644 --- a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/dynamodb/DynamoDbClientProviderMock.java +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/dynamodb/DynamoDbClientProviderMock.java @@ -20,9 +20,6 @@ import software.amazon.awssdk.services.dynamodb.DynamoDbClient; /** Mocking AwsClientProvider. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DynamoDbClientProviderMock implements DynamoDbClientProvider { private static DynamoDbClientProviderMock instance = new DynamoDbClientProviderMock(); diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/AmazonKinesisMock.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/AmazonKinesisMock.java index 60ea08dd63ff..6d7c1344b2c8 100644 --- a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/AmazonKinesisMock.java +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/AmazonKinesisMock.java @@ -94,9 +94,6 @@ import software.amazon.awssdk.services.kinesis.model.UpdateShardCountResponse; /** Mock implementation of {@link KinesisClient} for testing. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) class AmazonKinesisMock implements KinesisClient { static class TestData implements Serializable { diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/DynamicCheckpointGeneratorTest.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/DynamicCheckpointGeneratorTest.java index cad1fa1b6ded..3bbf39a206d8 100644 --- a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/DynamicCheckpointGeneratorTest.java +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/DynamicCheckpointGeneratorTest.java @@ -33,9 +33,6 @@ /** * */ @RunWith(PowerMockRunner.class) @PrepareForTest(Shard.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DynamicCheckpointGeneratorTest { @Mock private SimplifiedKinesisClient kinesisClient; diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/KinesisIOIT.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/KinesisIOIT.java index e6c346917c54..565e03cc0140 100644 --- a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/KinesisIOIT.java +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/KinesisIOIT.java @@ -48,9 +48,6 @@ * KinesisTestOptions} in order to run this. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class KinesisIOIT implements Serializable { private static int numberOfShards; private static int numberOfRows; diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/KinesisReaderCheckpointTest.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/KinesisReaderCheckpointTest.java index 3e6440a49f03..aab5c1d677bb 100644 --- a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/KinesisReaderCheckpointTest.java +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/KinesisReaderCheckpointTest.java @@ -31,9 +31,6 @@ /** * */ @RunWith(MockitoJUnitRunner.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class KinesisReaderCheckpointTest { @Mock private ShardCheckpoint a, b, c; diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/KinesisReaderTest.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/KinesisReaderTest.java index d1120a8eb478..ab15be0d5faa 100644 --- a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/KinesisReaderTest.java +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/KinesisReaderTest.java @@ -38,9 +38,6 @@ /** Tests {@link KinesisReader}. */ @RunWith(MockitoJUnitRunner.Silent.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class KinesisReaderTest { @Mock private SimplifiedKinesisClient kinesis; diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/KinesisServiceMock.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/KinesisServiceMock.java index 22864ba61804..ede988aa76c1 100644 --- a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/KinesisServiceMock.java +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/KinesisServiceMock.java @@ -26,9 +26,6 @@ import org.joda.time.DateTime; /** Simple mock implementation of Kinesis service for testing, singletone. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class KinesisServiceMock { private static KinesisServiceMock instance; diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/RecordFilterTest.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/RecordFilterTest.java index 9880d7fd128d..0058e2ea3166 100644 --- a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/RecordFilterTest.java +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/RecordFilterTest.java @@ -30,9 +30,6 @@ /** * */ @RunWith(MockitoJUnitRunner.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class RecordFilterTest { @Mock private ShardCheckpoint checkpoint; diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/ShardCheckpointTest.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/ShardCheckpointTest.java index a13ac51cdd2c..aa4b21bb3185 100644 --- a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/ShardCheckpointTest.java +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/ShardCheckpointTest.java @@ -42,9 +42,6 @@ /** */ @RunWith(MockitoJUnitRunner.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ShardCheckpointTest { private static final String AT_SEQUENCE_SHARD_IT = "AT_SEQUENCE_SHARD_IT"; diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/ShardReadersPoolTest.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/ShardReadersPoolTest.java index e41b59e0b4e2..462ae654aece 100644 --- a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/ShardReadersPoolTest.java +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/ShardReadersPoolTest.java @@ -47,9 +47,6 @@ /** Tests {@link ShardReadersPool}. */ @RunWith(MockitoJUnitRunner.Silent.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ShardReadersPoolTest { private static final int TIMEOUT_IN_MILLIS = (int) TimeUnit.SECONDS.toMillis(10); diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/ShardRecordsIteratorTest.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/ShardRecordsIteratorTest.java index 67422200a191..a1f01cdac79c 100644 --- a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/ShardRecordsIteratorTest.java +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/ShardRecordsIteratorTest.java @@ -39,9 +39,6 @@ /** Tests {@link ShardRecordsIterator}. 
*/ @RunWith(MockitoJUnitRunner.Silent.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ShardRecordsIteratorTest { private static final String INITIAL_ITERATOR = "INITIAL_ITERATOR"; diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/SimplifiedKinesisClientTest.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/SimplifiedKinesisClientTest.java index 32254411277c..83b3e544e83f 100644 --- a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/SimplifiedKinesisClientTest.java +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/SimplifiedKinesisClientTest.java @@ -60,9 +60,6 @@ /** * */ @RunWith(MockitoJUnitRunner.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SimplifiedKinesisClientTest { private static final String STREAM = "stream"; diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/StartingPointShardsFinderTest.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/StartingPointShardsFinderTest.java index 21c1a589fc6e..51d97bb3760b 100644 --- a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/StartingPointShardsFinderTest.java +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/StartingPointShardsFinderTest.java @@ -35,9 +35,6 @@ /** Tests StartingPointShardsFinder. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class StartingPointShardsFinderTest { private static final String STREAM_NAME = "streamName"; diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/TimeUtilTest.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/TimeUtilTest.java index 5def8d2ec238..02fa7eda8870 100644 --- a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/TimeUtilTest.java +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/TimeUtilTest.java @@ -24,9 +24,6 @@ import org.junit.runners.JUnit4; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TimeUtilTest { @Test diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/options/AwsModuleTest.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/options/AwsModuleTest.java index dda5ad9568c4..9414e241e265 100644 --- a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/options/AwsModuleTest.java +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/options/AwsModuleTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.sdk.io.aws2.options; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.hasItem; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import com.fasterxml.jackson.databind.Module; import com.fasterxml.jackson.databind.ObjectMapper; @@ -33,6 +33,7 @@ import org.junit.runners.JUnit4; import software.amazon.awssdk.auth.credentials.AwsBasicCredentials; import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider; 
+import software.amazon.awssdk.auth.credentials.AwsSessionCredentials; import software.amazon.awssdk.auth.credentials.ContainerCredentialsProvider; import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider; import software.amazon.awssdk.auth.credentials.EnvironmentVariableCredentialsProvider; @@ -68,6 +69,20 @@ public void testStaticCredentialsProviderSerializationDeserialization() throws E assertEquals( credentialsProvider.resolveCredentials().secretAccessKey(), deserializedCredentialsProvider.resolveCredentials().secretAccessKey()); + + AwsSessionCredentials sessionCredentials = + AwsSessionCredentials.create("key-id", "secret-key", "session-token"); + credentialsProvider = StaticCredentialsProvider.create(sessionCredentials); + serializedCredentialsProvider = objectMapper.writeValueAsString(credentialsProvider); + deserializedCredentialsProvider = + objectMapper.readValue(serializedCredentialsProvider, AwsCredentialsProvider.class); + + assertEquals(credentialsProvider.getClass(), deserializedCredentialsProvider.getClass()); + AwsSessionCredentials deserializedCredentials = + (AwsSessionCredentials) deserializedCredentialsProvider.resolveCredentials(); + assertEquals(sessionCredentials.accessKeyId(), deserializedCredentials.accessKeyId()); + assertEquals(sessionCredentials.secretAccessKey(), deserializedCredentials.secretAccessKey()); + assertEquals(sessionCredentials.sessionToken(), deserializedCredentials.sessionToken()); } @Test diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/options/AwsSerializableUtilsTest.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/options/AwsSerializableUtilsTest.java new file mode 100644 index 000000000000..90c97b2fe329 --- /dev/null +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/options/AwsSerializableUtilsTest.java @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.aws2.options; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertTrue; + +import org.junit.Test; +import software.amazon.awssdk.auth.credentials.AwsBasicCredentials; +import software.amazon.awssdk.auth.credentials.AwsCredentials; +import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider; +import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider; + +/** Tests on {@link AwsSerializableUtils}. 
*/ +public class AwsSerializableUtilsTest { + + private static final String ACCESS_KEY_ID = "ACCESS_KEY_ID"; + private static final String SECRET_ACCESS_KEY = "SECRET_ACCESS_KEY"; + + @Test + public void testAwsCredentialsProviderSerialization() { + AwsCredentialsProvider awsCredentialsProvider = + StaticCredentialsProvider.create( + AwsBasicCredentials.create(ACCESS_KEY_ID, SECRET_ACCESS_KEY)); + + String awsCredentialsProviderSerialized = + AwsSerializableUtils.serializeAwsCredentialsProvider(awsCredentialsProvider); + + AwsCredentialsProvider awsCredentialsProviderDeserialized = + AwsSerializableUtils.deserializeAwsCredentialsProvider(awsCredentialsProviderSerialized); + + assertTrue(awsCredentialsProviderDeserialized instanceof StaticCredentialsProvider); + AwsCredentials awsCredentials = awsCredentialsProviderDeserialized.resolveCredentials(); + assertEquals(ACCESS_KEY_ID, awsCredentials.accessKeyId()); + assertEquals(SECRET_ACCESS_KEY, awsCredentials.secretAccessKey()); + } + + static class UnknownAwsCredentialsProvider implements AwsCredentialsProvider { + @Override + public AwsCredentials resolveCredentials() { + return AwsBasicCredentials.create(ACCESS_KEY_ID, SECRET_ACCESS_KEY); + } + } + + @Test(expected = IllegalArgumentException.class) + public void testFailOnAwsCredentialsProviderSerialization() { + AwsCredentialsProvider awsCredentialsProvider = new UnknownAwsCredentialsProvider(); + AwsSerializableUtils.serializeAwsCredentialsProvider(awsCredentialsProvider); + } + + @Test(expected = IllegalArgumentException.class) + public void testFailOnAwsCredentialsProviderDeserialization() { + AwsSerializableUtils.deserializeAwsCredentialsProvider("invalid string"); + } +} diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/s3/MatchResultMatcher.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/s3/MatchResultMatcher.java new file mode 100644 index 000000000000..f065bb9576f3 --- /dev/null +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/s3/MatchResultMatcher.java @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+package org.apache.beam.sdk.io.aws2.s3;
+
+import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import java.io.IOException;
+import java.util.List;
+import org.apache.beam.sdk.io.fs.MatchResult;
+import org.apache.beam.sdk.io.fs.ResourceId;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+import org.hamcrest.BaseMatcher;
+import org.hamcrest.Description;
+import org.hamcrest.Matcher;
+
+/**
+ * Hamcrest {@link Matcher} to match {@link MatchResult}. Necessary because {@link
+ * MatchResult#metadata()} throws an exception under normal circumstances.
+ */
+@SuppressWarnings({
+  "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402)
+})
+class MatchResultMatcher extends BaseMatcher<MatchResult> {
+
+  private final MatchResult.Status expectedStatus;
+  private final List<MatchResult.Metadata> expectedMetadata;
+  private final IOException expectedException;
+
+  private MatchResultMatcher(
+      MatchResult.Status expectedStatus,
+      List<MatchResult.Metadata> expectedMetadata,
+      IOException expectedException) {
+    this.expectedStatus = checkNotNull(expectedStatus);
+    checkArgument((expectedMetadata == null) ^ (expectedException == null));
+    this.expectedMetadata = expectedMetadata;
+    this.expectedException = expectedException;
+  }
+
+  static MatchResultMatcher create(List<MatchResult.Metadata> expectedMetadata) {
+    return new MatchResultMatcher(MatchResult.Status.OK, expectedMetadata, null);
+  }
+
+  private static MatchResultMatcher create(MatchResult.Metadata expectedMetadata) {
+    return create(ImmutableList.of(expectedMetadata));
+  }
+
+  static MatchResultMatcher create(
+      long sizeBytes, long lastModifiedMillis, ResourceId resourceId, boolean isReadSeekEfficient) {
+    return create(
+        MatchResult.Metadata.builder()
+            .setSizeBytes(sizeBytes)
+            .setLastModifiedMillis(lastModifiedMillis)
+            .setResourceId(resourceId)
+            .setIsReadSeekEfficient(isReadSeekEfficient)
+            .build());
+  }
+
+  static MatchResultMatcher create(
+      MatchResult.Status expectedStatus, IOException expectedException) {
+    return new MatchResultMatcher(expectedStatus, null, expectedException);
+  }
+
+  static MatchResultMatcher create(MatchResult expected) {
+    MatchResult.Status expectedStatus = expected.status();
+    List<MatchResult.Metadata> expectedMetadata = null;
+    IOException expectedException = null;
+    try {
+      expectedMetadata = expected.metadata();
+    } catch (IOException e) {
+      expectedException = e;
+    }
+    return new MatchResultMatcher(expectedStatus, expectedMetadata, expectedException);
+  }
+
+  @Override
+  public boolean matches(Object actual) {
+    if (actual == null) {
+      return false;
+    }
+    if (!(actual instanceof MatchResult)) {
+      return false;
+    }
+    MatchResult actualResult = (MatchResult) actual;
+    if (!expectedStatus.equals(actualResult.status())) {
+      return false;
+    }
+
+    List<MatchResult.Metadata> actualMetadata;
+    try {
+      actualMetadata = actualResult.metadata();
+    } catch (IOException e) {
+      return expectedException != null && expectedException.toString().equals(e.toString());
+    }
+    return expectedMetadata != null && expectedMetadata.equals(actualMetadata);
+  }
+
+  @Override
+  public void describeTo(Description description) {
+    if (expectedMetadata != null) {
+      description.appendText(MatchResult.create(expectedStatus, expectedMetadata).toString());
+    } else {
+      description.appendText(MatchResult.create(expectedStatus, expectedException).toString());
+    }
+  }
+}
diff --git
a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/s3/S3FileSystemTest.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/s3/S3FileSystemTest.java new file mode 100644 index 000000000000..5613d538d17e --- /dev/null +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/s3/S3FileSystemTest.java @@ -0,0 +1,780 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.aws2.s3; + +import static org.apache.beam.sdk.io.aws2.s3.S3TestUtils.buildMockedS3FileSystem; +import static org.apache.beam.sdk.io.aws2.s3.S3TestUtils.getSSECustomerKeyMd5; +import static org.apache.beam.sdk.io.aws2.s3.S3TestUtils.s3Options; +import static org.apache.beam.sdk.io.aws2.s3.S3TestUtils.s3OptionsWithPathStyleAccessEnabled; +import static org.apache.beam.sdk.io.aws2.s3.S3TestUtils.s3OptionsWithSSECustomerKey; +import static org.apache.beam.sdk.io.fs.CreateOptions.StandardCreateOptions.builder; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; +import static org.hamcrest.MatcherAssert.assertThat; +import static org.hamcrest.Matchers.contains; +import static org.junit.Assert.assertArrayEquals; +import static org.junit.Assert.assertEquals; +import static org.mockito.ArgumentMatchers.any; +import static org.mockito.ArgumentMatchers.argThat; +import static org.mockito.ArgumentMatchers.notNull; +import static org.mockito.Mockito.never; +import static org.mockito.Mockito.times; +import static org.mockito.Mockito.verify; +import static org.mockito.Mockito.when; + +import akka.http.scaladsl.Http; +import io.findify.s3mock.S3Mock; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.net.URI; +import java.net.URISyntaxException; +import java.net.URL; +import java.nio.ByteBuffer; +import java.nio.channels.ReadableByteChannel; +import java.nio.channels.WritableByteChannel; +import java.time.Instant; +import java.util.ArrayList; +import java.util.List; +import org.apache.beam.sdk.io.aws2.options.S3Options; +import org.apache.beam.sdk.io.fs.MatchResult; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.junit.AfterClass; +import org.junit.BeforeClass; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; +import org.mockito.ArgumentMatcher; +import software.amazon.awssdk.auth.credentials.AnonymousCredentialsProvider; +import software.amazon.awssdk.core.exception.SdkServiceException; +import software.amazon.awssdk.regions.Region; +import software.amazon.awssdk.services.s3.S3Client; +import software.amazon.awssdk.services.s3.S3Configuration; +import 
software.amazon.awssdk.services.s3.model.CompleteMultipartUploadRequest; +import software.amazon.awssdk.services.s3.model.CopyObjectRequest; +import software.amazon.awssdk.services.s3.model.CopyObjectResponse; +import software.amazon.awssdk.services.s3.model.CopyPartResult; +import software.amazon.awssdk.services.s3.model.CreateBucketRequest; +import software.amazon.awssdk.services.s3.model.CreateMultipartUploadRequest; +import software.amazon.awssdk.services.s3.model.CreateMultipartUploadResponse; +import software.amazon.awssdk.services.s3.model.DeleteObjectsRequest; +import software.amazon.awssdk.services.s3.model.GetUrlRequest; +import software.amazon.awssdk.services.s3.model.HeadObjectRequest; +import software.amazon.awssdk.services.s3.model.HeadObjectResponse; +import software.amazon.awssdk.services.s3.model.ListObjectsV2Request; +import software.amazon.awssdk.services.s3.model.ListObjectsV2Response; +import software.amazon.awssdk.services.s3.model.S3Exception; +import software.amazon.awssdk.services.s3.model.S3Object; +import software.amazon.awssdk.services.s3.model.UploadPartCopyRequest; +import software.amazon.awssdk.services.s3.model.UploadPartCopyResponse; + +/** Test case for {@link S3FileSystem}. */ +@RunWith(JUnit4.class) +@SuppressWarnings({ + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) +public class S3FileSystemTest { + + private static S3Mock api; + private static S3Client client; + + @BeforeClass + public static void beforeClass() { + api = new S3Mock.Builder().withInMemoryBackend().withPort(8002).build(); + Http.ServerBinding binding = api.start(); + + URI endpoint = URI.create("http://localhost:" + binding.localAddress().getPort()); + S3Configuration s3Configuration = + S3Configuration.builder().pathStyleAccessEnabled(true).build(); + client = + S3Client.builder() + .region(Region.US_WEST_1) + .serviceConfiguration(s3Configuration) + .endpointOverride(endpoint) + .credentialsProvider(AnonymousCredentialsProvider.create()) + .build(); + } + + @AfterClass + public static void afterClass() { + api.stop(); + } + + @Test + public void testGetScheme() { + S3FileSystem s3FileSystem = new S3FileSystem(s3Options()); + assertEquals("s3", s3FileSystem.getScheme()); + } + + @Test + public void testGetPathStyleAccessEnabled() throws URISyntaxException { + S3FileSystem s3FileSystem = new S3FileSystem(s3OptionsWithPathStyleAccessEnabled()); + URL s3Url = + s3FileSystem + .getS3Client() + .utilities() + .getUrl(GetUrlRequest.builder().bucket("bucket").key("file").build()); + assertEquals("https://s3.us-west-1.amazonaws.com/bucket/file", s3Url.toURI().toString()); + } + + @Test + public void testCopy() throws IOException { + testCopy(s3Options()); + testCopy(s3OptionsWithSSECustomerKey()); + } + + private HeadObjectRequest createObjectHeadRequest(S3ResourceId path, S3Options options) { + return HeadObjectRequest.builder() + .bucket(path.getBucket()) + .key(path.getKey()) + .sseCustomerKey(options.getSSECustomerKey().getKey()) + .sseCustomerAlgorithm(options.getSSECustomerKey().getAlgorithm()) + .build(); + } + + private void assertGetObjectHead( + S3FileSystem s3FileSystem, + HeadObjectRequest request, + S3Options options, + HeadObjectResponse objectMetadata) { + when(s3FileSystem.getS3Client().headObject(argThat(new GetHeadObjectRequestMatcher(request)))) + .thenReturn(objectMetadata); + assertEquals( + getSSECustomerKeyMd5(options), + s3FileSystem.getS3Client().headObject(request).sseCustomerKeyMD5()); + } + + private void testCopy(S3Options options) throws 
IOException {
+    S3FileSystem s3FileSystem = buildMockedS3FileSystem(options);
+
+    S3ResourceId sourcePath = S3ResourceId.fromUri("s3://bucket/from");
+    S3ResourceId destinationPath = S3ResourceId.fromUri("s3://bucket/to");
+
+    HeadObjectResponse.Builder builder = HeadObjectResponse.builder().contentLength(0L);
+
+    if (getSSECustomerKeyMd5(options) != null) {
+      builder.sseCustomerKeyMD5(getSSECustomerKeyMd5(options));
+    }
+    HeadObjectResponse headObjectResponse = builder.build();
+    assertGetObjectHead(
+        s3FileSystem, createObjectHeadRequest(sourcePath, options), options, headObjectResponse);
+
+    s3FileSystem.copy(sourcePath, destinationPath);
+
+    verify(s3FileSystem.getS3Client(), times(1)).copyObject(any(CopyObjectRequest.class));
+
+    // we simulate a big object >= 5GB so it takes the multiPart path
+    HeadObjectResponse bigHeadObjectResponse =
+        headObjectResponse.toBuilder().contentLength(5_368_709_120L).build();
+    assertGetObjectHead(
+        s3FileSystem, createObjectHeadRequest(sourcePath, options), options, bigHeadObjectResponse);
+
+    try {
+      s3FileSystem.copy(sourcePath, destinationPath);
+    } catch (NullPointerException e) {
+      // ignore failing unmocked path, this is covered by testMultipartCopy test
+    }
+
+    verify(s3FileSystem.getS3Client(), never()).copyObject((CopyObjectRequest) null);
+  }
+
+  @Test
+  public void testAtomicCopy() {
+    testAtomicCopy(s3Options());
+    testAtomicCopy(s3OptionsWithSSECustomerKey());
+  }
+
+  private void testAtomicCopy(S3Options options) {
+    S3FileSystem s3FileSystem = buildMockedS3FileSystem(options);
+
+    S3ResourceId sourcePath = S3ResourceId.fromUri("s3://bucket/from");
+    S3ResourceId destinationPath = S3ResourceId.fromUri("s3://bucket/to");
+
+    CopyObjectResponse.Builder builder = CopyObjectResponse.builder();
+    if (getSSECustomerKeyMd5(options) != null) {
+      builder.sseCustomerKeyMD5(getSSECustomerKeyMd5(options));
+    }
+    CopyObjectResponse copyObjectResponse = builder.build();
+    CopyObjectRequest copyObjectRequest =
+        CopyObjectRequest.builder()
+            .copySource(sourcePath.getBucket() + "/" + sourcePath.getKey())
+            .destinationBucket(destinationPath.getBucket())
+            .destinationKey(destinationPath.getKey())
+            .sseCustomerKey(options.getSSECustomerKey().getKey())
+            .copySourceSSECustomerAlgorithm(options.getSSECustomerKey().getAlgorithm())
+            .build();
+    when(s3FileSystem.getS3Client().copyObject(any(CopyObjectRequest.class)))
+        .thenReturn(copyObjectResponse);
+    assertEquals(
+        getSSECustomerKeyMd5(options),
+        s3FileSystem.getS3Client().copyObject(copyObjectRequest).sseCustomerKeyMD5());
+
+    HeadObjectResponse headObjectResponse = HeadObjectResponse.builder().build();
+    s3FileSystem.atomicCopy(sourcePath, destinationPath, headObjectResponse);
+
+    verify(s3FileSystem.getS3Client(), times(2)).copyObject(any(CopyObjectRequest.class));
+  }
+
+  @Test
+  public void testMultipartCopy() {
+    testMultipartCopy(s3Options());
+    testMultipartCopy(s3OptionsWithSSECustomerKey());
+  }
+
+  private void testMultipartCopy(S3Options options) {
+    S3FileSystem s3FileSystem = buildMockedS3FileSystem(options);
+
+    S3ResourceId sourcePath = S3ResourceId.fromUri("s3://bucket/from");
+    S3ResourceId destinationPath = S3ResourceId.fromUri("s3://bucket/to");
+
+    CreateMultipartUploadResponse.Builder builder =
+        CreateMultipartUploadResponse.builder().uploadId("upload-id");
+    if (getSSECustomerKeyMd5(options) != null) {
+      builder.sseCustomerKeyMD5(getSSECustomerKeyMd5(options));
+    }
+    CreateMultipartUploadResponse createMultipartUploadResponse = builder.build();
+    when(s3FileSystem.getS3Client().createMultipartUpload(any(CreateMultipartUploadRequest.class)))
+        .thenReturn(createMultipartUploadResponse);
+    assertEquals(
+        getSSECustomerKeyMd5(options),
+        s3FileSystem
+            .getS3Client()
+            .createMultipartUpload(
+                CreateMultipartUploadRequest.builder()
+                    .bucket(destinationPath.getBucket())
+                    .key(destinationPath.getKey())
+                    .build())
+            .sseCustomerKeyMD5());
+
+    HeadObjectResponse.Builder headObjectResponseBuilder =
+        HeadObjectResponse.builder()
+            .contentLength((long) (options.getS3UploadBufferSizeBytes() * 1.5))
+            .contentEncoding("read-seek-efficient");
+    if (getSSECustomerKeyMd5(options) != null) {
+      headObjectResponseBuilder.sseCustomerKeyMD5(getSSECustomerKeyMd5(options));
+    }
+    HeadObjectResponse headObjectResponse = headObjectResponseBuilder.build();
+    assertGetObjectHead(
+        s3FileSystem, createObjectHeadRequest(sourcePath, options), options, headObjectResponse);
+
+    CopyPartResult copyPartResult1 = CopyPartResult.builder().eTag("etag-1").build();
+    CopyPartResult copyPartResult2 = CopyPartResult.builder().eTag("etag-2").build();
+    UploadPartCopyResponse.Builder uploadPartCopyResponseBuilder1 =
+        UploadPartCopyResponse.builder().copyPartResult(copyPartResult1);
+    UploadPartCopyResponse.Builder uploadPartCopyResponseBuilder2 =
+        UploadPartCopyResponse.builder().copyPartResult(copyPartResult2);
+    if (getSSECustomerKeyMd5(options) != null) {
+      uploadPartCopyResponseBuilder1.sseCustomerKeyMD5(getSSECustomerKeyMd5(options));
+      uploadPartCopyResponseBuilder2.sseCustomerKeyMD5(getSSECustomerKeyMd5(options));
+    }
+    UploadPartCopyResponse uploadPartCopyResponse1 = uploadPartCopyResponseBuilder1.build();
+    UploadPartCopyResponse uploadPartCopyResponse2 = uploadPartCopyResponseBuilder2.build();
+    UploadPartCopyRequest uploadPartCopyRequest =
+        UploadPartCopyRequest.builder()
+            .sseCustomerKey(options.getSSECustomerKey().getKey())
+            .build();
+    when(s3FileSystem.getS3Client().uploadPartCopy(any(UploadPartCopyRequest.class)))
+        .thenReturn(uploadPartCopyResponse1)
+        .thenReturn(uploadPartCopyResponse2);
+    assertEquals(
+        getSSECustomerKeyMd5(options),
+        s3FileSystem.getS3Client().uploadPartCopy(uploadPartCopyRequest).sseCustomerKeyMD5());
+
+    s3FileSystem.multipartCopy(sourcePath, destinationPath, headObjectResponse);
+
+    verify(s3FileSystem.getS3Client(), times(1))
+        .completeMultipartUpload(any(CompleteMultipartUploadRequest.class));
+  }
+
+  @Test
+  public void deleteThousandsOfObjectsInMultipleBuckets() throws IOException {
+    S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Options());
+
+    List<String> buckets = ImmutableList.of("bucket1", "bucket2");
+    List<String> keys = new ArrayList<>();
+    for (int i = 0; i < 2500; i++) {
+      keys.add(String.format("key-%d", i));
+    }
+    List<S3ResourceId> paths = new ArrayList<>();
+    for (String bucket : buckets) {
+      for (String key : keys) {
+        paths.add(S3ResourceId.fromComponents(bucket, key));
+      }
+    }
+
+    s3FileSystem.delete(paths);
+
+    // Should require 6 calls to delete 2500 objects in each of 2 buckets.
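+    // (assuming deletes are batched at the S3 maximum of 1000 keys per DeleteObjects request,
+    // 2500 keys per bucket need ceil(2500 / 1000) = 3 requests each, i.e. 6 in total)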
+ verify(s3FileSystem.getS3Client(), times(6)).deleteObjects(any(DeleteObjectsRequest.class)); + } + + @Test + public void matchNonGlob() { + S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Options()); + + S3ResourceId path = S3ResourceId.fromUri("s3://testbucket/testdirectory/filethatexists"); + long lastModifiedMillis = 1540000000000L; + HeadObjectResponse headObjectResponse = + HeadObjectResponse.builder() + .contentLength(100L) + .contentEncoding("read-seek-efficient") + .lastModified(Instant.ofEpochMilli(lastModifiedMillis)) + .build(); + when(s3FileSystem + .getS3Client() + .headObject( + argThat( + new GetHeadObjectRequestMatcher( + HeadObjectRequest.builder() + .bucket(path.getBucket()) + .key(path.getKey()) + .build())))) + .thenReturn(headObjectResponse); + + MatchResult result = s3FileSystem.matchNonGlobPath(path); + assertThat( + result, + MatchResultMatcher.create( + ImmutableList.of( + MatchResult.Metadata.builder() + .setSizeBytes(100) + .setLastModifiedMillis(lastModifiedMillis) + .setResourceId(path) + .setIsReadSeekEfficient(true) + .build()))); + } + + @Test + public void matchNonGlobNotReadSeekEfficient() { + S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Options()); + + S3ResourceId path = S3ResourceId.fromUri("s3://testbucket/testdirectory/filethatexists"); + long lastModifiedMillis = 1540000000000L; + HeadObjectResponse headObjectResponse = + HeadObjectResponse.builder() + .contentLength(100L) + .lastModified(Instant.ofEpochMilli(lastModifiedMillis)) + .contentEncoding("gzip") + .build(); + when(s3FileSystem + .getS3Client() + .headObject( + argThat( + new GetHeadObjectRequestMatcher( + HeadObjectRequest.builder() + .bucket(path.getBucket()) + .key(path.getKey()) + .build())))) + .thenReturn(headObjectResponse); + + MatchResult result = s3FileSystem.matchNonGlobPath(path); + assertThat( + result, + MatchResultMatcher.create( + ImmutableList.of( + MatchResult.Metadata.builder() + .setSizeBytes(100) + .setLastModifiedMillis(lastModifiedMillis) + .setResourceId(path) + .setIsReadSeekEfficient(false) + .build()))); + } + + @Test + public void matchNonGlobNullContentEncoding() { + S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Options()); + + S3ResourceId path = S3ResourceId.fromUri("s3://testbucket/testdirectory/filethatexists"); + long lastModifiedMillis = 1540000000000L; + HeadObjectResponse headObjectResponse = + HeadObjectResponse.builder() + .contentLength(100L) + .lastModified(Instant.ofEpochMilli(lastModifiedMillis)) + .contentEncoding(null) + .build(); + when(s3FileSystem + .getS3Client() + .headObject( + argThat( + new GetHeadObjectRequestMatcher( + HeadObjectRequest.builder() + .bucket(path.getBucket()) + .key(path.getKey()) + .build())))) + .thenReturn(headObjectResponse); + + MatchResult result = s3FileSystem.matchNonGlobPath(path); + assertThat( + result, + MatchResultMatcher.create( + ImmutableList.of( + MatchResult.Metadata.builder() + .setSizeBytes(100) + .setLastModifiedMillis(lastModifiedMillis) + .setResourceId(path) + .setIsReadSeekEfficient(true) + .build()))); + } + + @Test + public void matchNonGlobNotFound() { + S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Options()); + + S3ResourceId path = S3ResourceId.fromUri("s3://testbucket/testdirectory/nonexistentfile"); + SdkServiceException exception = + S3Exception.builder().message("mock exception").statusCode(404).build(); + when(s3FileSystem + .getS3Client() + .headObject( + argThat( + new GetHeadObjectRequestMatcher( + HeadObjectRequest.builder() + 
.bucket(path.getBucket())
+                            .key(path.getKey())
+                            .build()))))
+        .thenThrow(exception);
+
+    MatchResult result = s3FileSystem.matchNonGlobPath(path);
+    assertThat(
+        result,
+        MatchResultMatcher.create(MatchResult.Status.NOT_FOUND, new FileNotFoundException()));
+  }
+
+  @Test
+  public void matchNonGlobForbidden() {
+    S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Options());
+
+    SdkServiceException exception =
+        S3Exception.builder().message("mock exception").statusCode(403).build();
+    S3ResourceId path = S3ResourceId.fromUri("s3://testbucket/testdirectory/keyname");
+    when(s3FileSystem
+            .getS3Client()
+            .headObject(
+                argThat(
+                    new GetHeadObjectRequestMatcher(
+                        HeadObjectRequest.builder()
+                            .bucket(path.getBucket())
+                            .key(path.getKey())
+                            .build()))))
+        .thenThrow(exception);
+
+    assertThat(
+        s3FileSystem.matchNonGlobPath(path),
+        MatchResultMatcher.create(MatchResult.Status.ERROR, new IOException(exception)));
+  }
+
+  static class ListObjectsV2RequestArgumentMatches
+      implements ArgumentMatcher<ListObjectsV2Request> {
+
+    private final ListObjectsV2Request expected;
+
+    ListObjectsV2RequestArgumentMatches(ListObjectsV2Request expected) {
+      this.expected = checkNotNull(expected);
+    }
+
+    @Override
+    public boolean matches(ListObjectsV2Request argument) {
+      if (argument != null) {
+        return expected.bucket().equals(argument.bucket())
+            && expected.prefix().equals(argument.prefix())
+            && (expected.continuationToken() == null
+                ? argument.continuationToken() == null
+                : expected.continuationToken().equals(argument.continuationToken()));
+      }
+      return false;
+    }
+  }
+
+  @Test
+  public void matchGlob() throws IOException {
+    S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Options());
+
+    S3ResourceId path = S3ResourceId.fromUri("s3://testbucket/foo/bar*baz");
+
+    ListObjectsV2Request firstRequest =
+        ListObjectsV2Request.builder()
+            .bucket(path.getBucket())
+            .prefix(path.getKeyNonWildcardPrefix())
+            .continuationToken(null)
+            .build();
+
+    // Expected to be returned; prefix and wildcard/regex match
+    S3Object firstMatch =
+        S3Object.builder()
+            .key("foo/bar0baz")
+            .size(100L)
+            .lastModified(Instant.ofEpochMilli(1540000000001L))
+            .build();
+
+    // Expected to not be returned; prefix matches, but substring after wildcard does not
+    S3Object secondMatch =
+        S3Object.builder()
+            .key("foo/bar1qux")
+            .size(200L)
+            .lastModified(Instant.ofEpochMilli(1540000000002L))
+            .build();
+
+    // Expected first request returns continuation token
+    ListObjectsV2Response firstResponse =
+        ListObjectsV2Response.builder()
+            .nextContinuationToken("token")
+            .contents(firstMatch, secondMatch)
+            .build();
+    when(s3FileSystem
+            .getS3Client()
+            .listObjectsV2(argThat(new ListObjectsV2RequestArgumentMatches(firstRequest))))
+        .thenReturn(firstResponse);
+
+    // Expect second request with continuation token
+    ListObjectsV2Request secondRequest =
+        ListObjectsV2Request.builder()
+            .bucket(path.getBucket())
+            .prefix(path.getKeyNonWildcardPrefix())
+            .continuationToken("token")
+            .build();
+
+    // Expected to be returned; prefix and wildcard/regex match
+    S3Object thirdMatch =
+        S3Object.builder()
+            .key("foo/bar2baz")
+            .size(300L)
+            .lastModified(Instant.ofEpochMilli(1540000000003L))
+            .build();
+
+    // Expected second request returns third prefix match and no continuation token
+    ListObjectsV2Response secondResponse =
+        ListObjectsV2Response.builder().nextContinuationToken(null).contents(thirdMatch).build();
+    when(s3FileSystem
+            .getS3Client()
+            .listObjectsV2(argThat(new
ListObjectsV2RequestArgumentMatches(secondRequest)))) + .thenReturn(secondResponse); + + // Expect object metadata queries for content encoding + HeadObjectResponse headObjectResponse = + HeadObjectResponse.builder().contentEncoding("").build(); + when(s3FileSystem.getS3Client().headObject(any(HeadObjectRequest.class))) + .thenReturn(headObjectResponse); + + assertThat( + s3FileSystem.matchGlobPaths(ImmutableList.of(path)).get(0), + MatchResultMatcher.create( + ImmutableList.of( + MatchResult.Metadata.builder() + .setIsReadSeekEfficient(true) + .setResourceId(S3ResourceId.fromComponents(path.getBucket(), firstMatch.key())) + .setSizeBytes(firstMatch.size()) + .setLastModifiedMillis(firstMatch.lastModified().toEpochMilli()) + .build(), + MatchResult.Metadata.builder() + .setIsReadSeekEfficient(true) + .setResourceId(S3ResourceId.fromComponents(path.getBucket(), thirdMatch.key())) + .setSizeBytes(thirdMatch.size()) + .setLastModifiedMillis(thirdMatch.lastModified().toEpochMilli()) + .build()))); + } + + @Test + public void matchGlobWithSlashes() throws IOException { + S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Options()); + + S3ResourceId path = S3ResourceId.fromUri("s3://testbucket/foo/bar\\baz*"); + + ListObjectsV2Request request = + ListObjectsV2Request.builder() + .bucket(path.getBucket()) + .prefix(path.getKeyNonWildcardPrefix()) + .continuationToken(null) + .build(); + + // Expected to be returned; prefix and wildcard/regex match + S3Object firstMatch = + S3Object.builder() + .key("foo/bar\\baz0") + .size(100L) + .lastModified(Instant.ofEpochMilli(1540000000001L)) + .build(); + + // Expected to not be returned; prefix matches, but substring after wildcard does not + S3Object secondMatch = + S3Object.builder() + .key("foo/bar/baz1") + .size(200L) + .lastModified(Instant.ofEpochMilli(1540000000002L)) + .build(); + + // Expected first request returns continuation token + ListObjectsV2Response response = + ListObjectsV2Response.builder().contents(firstMatch, secondMatch).build(); + when(s3FileSystem + .getS3Client() + .listObjectsV2(argThat(new ListObjectsV2RequestArgumentMatches(request)))) + .thenReturn(response); + + // Expect object metadata queries for content encoding + HeadObjectResponse headObjectResponse = + HeadObjectResponse.builder().contentEncoding("").build(); + when(s3FileSystem.getS3Client().headObject(any(HeadObjectRequest.class))) + .thenReturn(headObjectResponse); + + assertThat( + s3FileSystem.matchGlobPaths(ImmutableList.of(path)).get(0), + MatchResultMatcher.create( + ImmutableList.of( + MatchResult.Metadata.builder() + .setIsReadSeekEfficient(true) + .setResourceId(S3ResourceId.fromComponents(path.getBucket(), firstMatch.key())) + .setSizeBytes(firstMatch.size()) + .setLastModifiedMillis(firstMatch.lastModified().toEpochMilli()) + .build()))); + } + + @Test + public void matchVariousInvokeThreadPool() throws IOException { + S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Options()); + SdkServiceException notFoundException = + S3Exception.builder().message("mock exception").statusCode(404).build(); + S3ResourceId pathNotExist = + S3ResourceId.fromUri("s3://testbucket/testdirectory/nonexistentfile"); + HeadObjectRequest headObjectRequestNotExist = + HeadObjectRequest.builder() + .bucket(pathNotExist.getBucket()) + .key(pathNotExist.getKey()) + .build(); + when(s3FileSystem + .getS3Client() + .headObject(argThat(new GetHeadObjectRequestMatcher(headObjectRequestNotExist)))) + .thenThrow(notFoundException); + + SdkServiceException forbiddenException = + 
SdkServiceException.builder().message("mock exception").statusCode(403).build(); + S3ResourceId pathForbidden = + S3ResourceId.fromUri("s3://testbucket/testdirectory/forbiddenfile"); + HeadObjectRequest headObjectRequestForbidden = + HeadObjectRequest.builder() + .bucket(pathForbidden.getBucket()) + .key(pathForbidden.getKey()) + .build(); + when(s3FileSystem + .getS3Client() + .headObject(argThat(new GetHeadObjectRequestMatcher(headObjectRequestForbidden)))) + .thenThrow(forbiddenException); + + S3ResourceId pathExist = S3ResourceId.fromUri("s3://testbucket/testdirectory/filethatexists"); + HeadObjectRequest headObjectRequestExist = + HeadObjectRequest.builder().bucket(pathExist.getBucket()).key(pathExist.getKey()).build(); + HeadObjectResponse s3ObjectMetadata = + HeadObjectResponse.builder() + .contentLength(100L) + .contentEncoding("not-gzip") + .lastModified(Instant.ofEpochMilli(1540000000000L)) + .build(); + when(s3FileSystem + .getS3Client() + .headObject(argThat(new GetHeadObjectRequestMatcher(headObjectRequestExist)))) + .thenReturn(s3ObjectMetadata); + + S3ResourceId pathGlob = S3ResourceId.fromUri("s3://testbucket/path/part*"); + + S3Object foundListObject = + S3Object.builder() + .key("path/part-0") + .size(200L) + .lastModified(Instant.ofEpochMilli(1541000000000L)) + .build(); + + ListObjectsV2Response listObjectsResponse = + ListObjectsV2Response.builder().continuationToken(null).contents(foundListObject).build(); + when(s3FileSystem.getS3Client().listObjectsV2((ListObjectsV2Request) notNull())) + .thenReturn(listObjectsResponse); + + HeadObjectResponse headObjectResponse = + HeadObjectResponse.builder().contentEncoding("").build(); + when(s3FileSystem + .getS3Client() + .headObject( + argThat( + new GetHeadObjectRequestMatcher( + HeadObjectRequest.builder() + .bucket(pathGlob.getBucket()) + .key("path/part-0") + .build())))) + .thenReturn(headObjectResponse); + + assertThat( + s3FileSystem.match( + ImmutableList.of( + pathNotExist.toString(), + pathForbidden.toString(), + pathExist.toString(), + pathGlob.toString())), + contains( + MatchResultMatcher.create(MatchResult.Status.NOT_FOUND, new FileNotFoundException()), + MatchResultMatcher.create( + MatchResult.Status.ERROR, new IOException(forbiddenException)), + MatchResultMatcher.create(100, 1540000000000L, pathExist, true), + MatchResultMatcher.create( + 200, + 1541000000000L, + S3ResourceId.fromComponents(pathGlob.getBucket(), foundListObject.key()), + true))); + } + + @Test + public void testWriteAndRead() throws IOException { + S3FileSystem s3FileSystem = buildMockedS3FileSystem(s3Options(), client); + + client.createBucket(CreateBucketRequest.builder().bucket("testbucket").build()); + + byte[] writtenArray = new byte[] {0}; + ByteBuffer bb = ByteBuffer.allocate(writtenArray.length); + bb.put(writtenArray); + + // First create an object and write data to it + S3ResourceId path = S3ResourceId.fromUri("s3://testbucket/foo/bar.txt"); + WritableByteChannel writableByteChannel = + s3FileSystem.create(path, builder().setMimeType("application/text").build()); + writableByteChannel.write(bb); + writableByteChannel.close(); + + // Now read the same object + ByteBuffer bb2 = ByteBuffer.allocate(writtenArray.length); + ReadableByteChannel open = s3FileSystem.open(path); + open.read(bb2); + + // And compare the content with the one that was written + byte[] readArray = bb2.array(); + assertArrayEquals(readArray, writtenArray); + open.close(); + } + + /** A mockito argument matcher to implement equality on GetHeadObjectRequest. 
*/
+  private static class GetHeadObjectRequestMatcher implements ArgumentMatcher<HeadObjectRequest> {
+
+    private final HeadObjectRequest expected;
+
+    GetHeadObjectRequestMatcher(HeadObjectRequest expected) {
+      this.expected = expected;
+    }
+
+    @Override
+    public boolean matches(HeadObjectRequest obj) {
+      if (obj == null) {
+        return false;
+      }
+      return obj.bucket().equals(expected.bucket()) && obj.key().equals(expected.key());
+    }
+  }
+}
diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/s3/S3ResourceIdTest.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/s3/S3ResourceIdTest.java
new file mode 100644
index 000000000000..fe5a3efee18b
--- /dev/null
+++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/s3/S3ResourceIdTest.java
@@ -0,0 +1,290 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.aws2.s3;
+
+import static org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions.RESOLVE_DIRECTORY;
+import static org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions.RESOLVE_FILE;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertNotEquals;
+import static org.junit.Assert.assertNull;
+import static org.junit.Assert.assertThrows;
+import static org.junit.Assert.assertTrue;
+
+import java.util.Arrays;
+import java.util.List;
+import org.apache.beam.sdk.io.FileSystems;
+import org.apache.beam.sdk.io.aws2.options.S3Options;
+import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions;
+import org.apache.beam.sdk.io.fs.ResourceId;
+import org.apache.beam.sdk.io.fs.ResourceIdTester;
+import org.apache.beam.sdk.options.PipelineOptionsFactory;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests {@link S3ResourceId}. */
+@RunWith(JUnit4.class)
+@SuppressWarnings({
+  "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402)
+})
+public class S3ResourceIdTest {
+
+  static final class TestCase {
+
+    final String baseUri;
+    final String relativePath;
+    final StandardResolveOptions resolveOptions;
+    final String expectedResult;
+
+    TestCase(
+        String baseUri,
+        String relativePath,
+        StandardResolveOptions resolveOptions,
+        String expectedResult) {
+      this.baseUri = baseUri;
+      this.relativePath = relativePath;
+      this.resolveOptions = resolveOptions;
+      this.expectedResult = expectedResult;
+    }
+  }
+
+  // Each test case gives a base URI and the relative component to resolve against it, then the
+  // expected resulting URL; empty components resolve to the bucket root directory.
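+  // (RESOLVE_DIRECTORY is expected to append a trailing '/', RESOLVE_FILE to leave the path
+  // as-is, and '..' to resolve to the parent directory)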
+  private static final List<TestCase> PATH_TEST_CASES =
+      Arrays.asList(
+          new TestCase("s3://bucket/", "", RESOLVE_DIRECTORY, "s3://bucket/"),
+          new TestCase("s3://bucket", "", RESOLVE_DIRECTORY, "s3://bucket/"),
+          new TestCase("s3://bucket", "path/to/dir", RESOLVE_DIRECTORY, "s3://bucket/path/to/dir/"),
+          new TestCase("s3://bucket", "path/to/object", RESOLVE_FILE, "s3://bucket/path/to/object"),
+          new TestCase(
+              "s3://bucket/path/to/dir/", "..", RESOLVE_DIRECTORY, "s3://bucket/path/to/"));
+
+  @Test
+  public void testResolve() {
+    for (TestCase testCase : PATH_TEST_CASES) {
+      ResourceId resourceId = S3ResourceId.fromUri(testCase.baseUri);
+      ResourceId resolved = resourceId.resolve(testCase.relativePath, testCase.resolveOptions);
+      assertEquals(testCase.expectedResult, resolved.toString());
+    }
+
+    // Tests for common s3 paths.
+    assertEquals(
+        S3ResourceId.fromUri("s3://bucket/tmp/aa"),
+        S3ResourceId.fromUri("s3://bucket/tmp/").resolve("aa", RESOLVE_FILE));
+    assertEquals(
+        S3ResourceId.fromUri("s3://bucket/tmp/aa/bb/cc/"),
+        S3ResourceId.fromUri("s3://bucket/tmp/")
+            .resolve("aa", RESOLVE_DIRECTORY)
+            .resolve("bb", RESOLVE_DIRECTORY)
+            .resolve("cc", RESOLVE_DIRECTORY));
+
+    // Tests absolute path.
+    assertEquals(
+        S3ResourceId.fromUri("s3://bucket/tmp/aa"),
+        S3ResourceId.fromUri("s3://bucket/tmp/bb/").resolve("s3://bucket/tmp/aa", RESOLVE_FILE));
+
+    // Tests bucket with no ending '/'.
+    assertEquals(
+        S3ResourceId.fromUri("s3://my-bucket/tmp"),
+        S3ResourceId.fromUri("s3://my-bucket").resolve("tmp", RESOLVE_FILE));
+
+    // Tests path with unicode
+    assertEquals(
+        S3ResourceId.fromUri("s3://bucket/输出 目录/输出 文件01.txt"),
+        S3ResourceId.fromUri("s3://bucket/输出 目录/").resolve("输出 文件01.txt", RESOLVE_FILE));
+  }
+
+  @Test
+  public void testResolveInvalidInputs() {
+    assertThrows(
+        "Cannot resolve a file with a directory path: [tmp/]",
+        IllegalArgumentException.class,
+        () -> S3ResourceId.fromUri("s3://my_bucket/").resolve("tmp/", RESOLVE_FILE));
+  }
+
+  @Test
+  public void testResolveInvalidNotDirectory() {
+    ResourceId tmpDir = S3ResourceId.fromUri("s3://my_bucket/").resolve("tmp dir", RESOLVE_FILE);
+
+    assertThrows(
+        "Expected this resource to be a directory, but was [s3://my_bucket/tmp dir]",
+        IllegalStateException.class,
+        () -> tmpDir.resolve("aa", RESOLVE_FILE));
+  }
+
+  @Test
+  public void testS3ResolveWithFileBase() {
+    ResourceId resourceId = S3ResourceId.fromUri("s3://bucket/path/to/file");
+    assertThrows(
+        IllegalStateException.class,
+        () -> resourceId.resolve("child-path", RESOLVE_DIRECTORY)); // resource is not a directory
+  }
+
+  @Test
+  public void testResolveParentToFile() {
+    ResourceId resourceId = S3ResourceId.fromUri("s3://bucket/path/to/dir/");
+    assertThrows(
+        IllegalArgumentException.class,
+        () -> resourceId.resolve("..", RESOLVE_FILE)); // '..' only resolves as dir, not as file
+  }
+
+  @Test
+  public void testGetCurrentDirectory() {
+    // Tests s3 paths.
+    assertEquals(
+        S3ResourceId.fromUri("s3://my_bucket/tmp dir/"),
+        S3ResourceId.fromUri("s3://my_bucket/tmp dir/").getCurrentDirectory());
+
+    // Tests path with unicode.
+    assertEquals(
+        S3ResourceId.fromUri("s3://my_bucket/输出 目录/"),
+        S3ResourceId.fromUri("s3://my_bucket/输出 目录/文件01.txt").getCurrentDirectory());
+
+    // Tests bucket with no ending '/'.
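+    // (getCurrentDirectory() of a bare bucket, or of an object directly under it, is expected to
+    // be the bucket root "s3://my_bucket/")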
+ assertEquals( + S3ResourceId.fromUri("s3://my_bucket/"), + S3ResourceId.fromUri("s3://my_bucket").getCurrentDirectory()); + assertEquals( + S3ResourceId.fromUri("s3://my_bucket/"), + S3ResourceId.fromUri("s3://my_bucket/not-directory").getCurrentDirectory()); + } + + @Test + public void testIsDirectory() { + assertTrue(S3ResourceId.fromUri("s3://my_bucket/tmp dir/").isDirectory()); + assertTrue(S3ResourceId.fromUri("s3://my_bucket/").isDirectory()); + assertTrue(S3ResourceId.fromUri("s3://my_bucket").isDirectory()); + assertFalse(S3ResourceId.fromUri("s3://my_bucket/file").isDirectory()); + } + + @Test + public void testInvalidPathNoBucket() { + assertThrows( + "Invalid S3 URI: [s3://]", + IllegalArgumentException.class, + () -> S3ResourceId.fromUri("s3://")); + } + + @Test + public void testInvalidPathNoBucketAndSlash() { + assertThrows( + "Invalid S3 URI: [s3:///]", + IllegalArgumentException.class, + () -> S3ResourceId.fromUri("s3:///")); + } + + @Test + public void testGetScheme() { + // Tests s3 paths. + assertEquals("s3", S3ResourceId.fromUri("s3://my_bucket/tmp dir/").getScheme()); + + // Tests bucket with no ending '/'. + assertEquals("s3", S3ResourceId.fromUri("s3://my_bucket").getScheme()); + } + + @Test + public void testGetFilename() { + assertNull(S3ResourceId.fromUri("s3://my_bucket/").getFilename()); + assertEquals("abc", S3ResourceId.fromUri("s3://my_bucket/abc").getFilename()); + assertEquals("abc", S3ResourceId.fromUri("s3://my_bucket/abc/").getFilename()); + assertEquals("def", S3ResourceId.fromUri("s3://my_bucket/abc/def").getFilename()); + assertEquals("def", S3ResourceId.fromUri("s3://my_bucket/abc/def/").getFilename()); + assertEquals("xyz.txt", S3ResourceId.fromUri("s3://my_bucket/abc/xyz.txt").getFilename()); + } + + @Test + public void testParentRelationship() { + S3ResourceId path = S3ResourceId.fromUri("s3://bucket/dir/subdir/object"); + assertEquals("bucket", path.getBucket()); + assertEquals("dir/subdir/object", path.getKey()); + + // s3://bucket/dir/ + path = S3ResourceId.fromUri("s3://bucket/dir/subdir/"); + S3ResourceId parent = (S3ResourceId) path.resolve("..", RESOLVE_DIRECTORY); + assertEquals("bucket", parent.getBucket()); + assertEquals("dir/", parent.getKey()); + assertNotEquals(path, parent); + assertTrue(path.getKey().startsWith(parent.getKey())); + assertFalse(parent.getKey().startsWith(path.getKey())); + + // s3://bucket/ + S3ResourceId grandParent = (S3ResourceId) parent.resolve("..", RESOLVE_DIRECTORY); + assertEquals("bucket", grandParent.getBucket()); + assertEquals("", grandParent.getKey()); + } + + @Test + public void testBucketParsing() { + S3ResourceId path = S3ResourceId.fromUri("s3://bucket"); + S3ResourceId path2 = S3ResourceId.fromUri("s3://bucket/"); + + assertEquals(path, path2); + assertEquals(path.toString(), path2.toString()); + } + + @Test + public void testS3ResourceIdToString() { + String filename = "s3://some-bucket/some/file.txt"; + S3ResourceId path = S3ResourceId.fromUri(filename); + assertEquals(filename, path.toString()); + + filename = "s3://some-bucket/some/"; + path = S3ResourceId.fromUri(filename); + assertEquals(filename, path.toString()); + + filename = "s3://some-bucket/"; + path = S3ResourceId.fromUri(filename); + assertEquals(filename, path.toString()); + } + + @Test + public void testEquals() { + S3ResourceId a = S3ResourceId.fromComponents("bucket", "a/b/c"); + S3ResourceId b = S3ResourceId.fromComponents("bucket", "a/b/c"); + assertEquals(a, b); + + b = S3ResourceId.fromComponents(a.getBucket(), "a/b/c/"); + 
assertNotEquals(a, b); + + b = S3ResourceId.fromComponents(a.getBucket(), "x/y/z"); + assertNotEquals(a, b); + + b = S3ResourceId.fromComponents("other-bucket", a.getKey()); + assertNotEquals(a, b); + } + + @Test + public void testInvalidS3ResourceId() { + assertThrows( + IllegalArgumentException.class, () -> S3ResourceId.fromUri("file://invalid/s3/path")); + } + + @Test + public void testInvalidBucket() { + assertThrows(IllegalArgumentException.class, () -> S3ResourceId.fromComponents("invalid/", "")); + } + + @Test + public void testResourceIdTester() { + S3Options options = PipelineOptionsFactory.create().as(S3Options.class); + options.setAwsRegion("us-west-1"); + FileSystems.setDefaultPipelineOptions(options); + ResourceIdTester.runResourceIdBattery(S3ResourceId.fromUri("s3://bucket/foo/")); + } +} diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/s3/S3TestUtils.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/s3/S3TestUtils.java new file mode 100644 index 000000000000..c6334e7099d8 --- /dev/null +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/s3/S3TestUtils.java @@ -0,0 +1,117 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.aws2.s3; + +import static org.apache.beam.sdk.io.aws2.options.S3Options.S3UploadBufferSizeBytesFactory.MINIMUM_UPLOAD_BUFFER_SIZE_BYTES; + +import java.nio.charset.StandardCharsets; +import java.util.Base64; +import org.apache.beam.sdk.io.aws2.options.S3Options; +import org.apache.beam.sdk.options.PipelineOptionsFactory; +import org.apache.commons.codec.digest.DigestUtils; +import org.checkerframework.checker.nullness.qual.Nullable; +import org.mockito.Mockito; +import software.amazon.awssdk.services.s3.S3Client; +import software.amazon.awssdk.services.s3.S3ClientBuilder; +import software.amazon.awssdk.services.s3.S3Configuration; +import software.amazon.awssdk.services.s3.model.ServerSideEncryption; + +/** Utils to test S3 filesystem. 
*/ +class S3TestUtils { + + static S3Options s3Options() { + S3Options options = PipelineOptionsFactory.as(S3Options.class); + options.setAwsRegion("us-west-1"); + options.setS3UploadBufferSizeBytes(MINIMUM_UPLOAD_BUFFER_SIZE_BYTES); + return options; + } + + static S3Options s3OptionsWithPathStyleAccessEnabled() { + S3Options options = PipelineOptionsFactory.as(S3Options.class); + options.setAwsRegion("us-west-1"); + options.setS3UploadBufferSizeBytes(MINIMUM_UPLOAD_BUFFER_SIZE_BYTES); + options.setS3ClientFactoryClass(PathStyleAccessS3ClientBuilderFactory.class); + return options; + } + + static S3Options s3OptionsWithSSEAlgorithm() { + S3Options options = s3Options(); + options.setSSEAlgorithm(ServerSideEncryption.AES256.name()); + return options; + } + + static S3Options s3OptionsWithSSECustomerKey() { + S3Options options = s3Options(); + options.setSSECustomerKey( + SSECustomerKey.builder() + .key("86glyTlCNZgccSxW8JxMa6ZdjdK3N141glAysPUZ3AA=") + .algorithm("AES256") + .build()); + return options; + } + + static S3Options s3OptionsWithSSEKMSKeyId() { + S3Options options = s3Options(); + String ssekmsKeyId = + "arn:aws:kms:eu-west-1:123456789012:key/dc123456-7890-ABCD-EF01-234567890ABC"; + options.setSSEKMSKeyId(ssekmsKeyId); + options.setSSEAlgorithm("aws:kms"); + return options; + } + + static S3Options s3OptionsWithMultipleSSEOptions() { + S3Options options = s3OptionsWithSSEKMSKeyId(); + options.setSSECustomerKey( + SSECustomerKey.builder() + .key("86glyTlCNZgccSxW8JxMa6ZdjdK3N141glAysPUZ3AA=") + .algorithm("AES256") + .build()); + return options; + } + + static S3FileSystem buildMockedS3FileSystem(S3Options options) { + return buildMockedS3FileSystem(options, Mockito.mock(S3Client.class)); + } + + static S3FileSystem buildMockedS3FileSystem(S3Options options, S3Client client) { + S3FileSystem s3FileSystem = new S3FileSystem(options); + s3FileSystem.setS3Client(client); + return s3FileSystem; + } + + static @Nullable String getSSECustomerKeyMd5(S3Options options) { + String sseCostumerKey = options.getSSECustomerKey().getKey(); + if (sseCostumerKey != null) { + Base64.Decoder decoder = Base64.getDecoder(); + Base64.Encoder encoder = Base64.getEncoder(); + return encoder.encodeToString( + DigestUtils.md5(decoder.decode(sseCostumerKey.getBytes(StandardCharsets.UTF_8)))); + } + return null; + } + + private static class PathStyleAccessS3ClientBuilderFactory extends DefaultS3ClientBuilderFactory { + + @Override + public S3ClientBuilder createBuilder(S3Options s3Options) { + S3ClientBuilder s3ClientBuilder = super.createBuilder(s3Options); + return s3ClientBuilder.serviceConfiguration( + S3Configuration.builder().pathStyleAccessEnabled(true).build()); + } + } +} diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/s3/S3WritableByteChannelTest.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/s3/S3WritableByteChannelTest.java new file mode 100644 index 000000000000..9a8deee7c741 --- /dev/null +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/s3/S3WritableByteChannelTest.java @@ -0,0 +1,169 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.aws2.s3; + +import static org.apache.beam.sdk.io.aws2.s3.S3TestUtils.getSSECustomerKeyMd5; +import static org.apache.beam.sdk.io.aws2.s3.S3TestUtils.s3Options; +import static org.apache.beam.sdk.io.aws2.s3.S3TestUtils.s3OptionsWithMultipleSSEOptions; +import static org.apache.beam.sdk.io.aws2.s3.S3TestUtils.s3OptionsWithSSEAlgorithm; +import static org.apache.beam.sdk.io.aws2.s3.S3TestUtils.s3OptionsWithSSECustomerKey; +import static org.apache.beam.sdk.io.aws2.s3.S3TestUtils.s3OptionsWithSSEKMSKeyId; +import static org.apache.beam.sdk.io.aws2.s3.S3WritableByteChannel.atMostOne; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertThrows; +import static org.junit.Assert.assertTrue; +import static org.mockito.ArgumentMatchers.any; +import static org.mockito.ArgumentMatchers.isNotNull; +import static org.mockito.ArgumentMatchers.notNull; +import static org.mockito.Mockito.RETURNS_SMART_NULLS; +import static org.mockito.Mockito.doReturn; +import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.times; +import static org.mockito.Mockito.verify; +import static org.mockito.Mockito.verifyNoMoreInteractions; +import static org.mockito.Mockito.withSettings; + +import java.io.IOException; +import java.nio.ByteBuffer; +import org.apache.beam.sdk.io.aws2.options.S3Options; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; +import software.amazon.awssdk.core.sync.RequestBody; +import software.amazon.awssdk.services.s3.S3Client; +import software.amazon.awssdk.services.s3.model.CompleteMultipartUploadRequest; +import software.amazon.awssdk.services.s3.model.CompleteMultipartUploadResponse; +import software.amazon.awssdk.services.s3.model.CreateMultipartUploadRequest; +import software.amazon.awssdk.services.s3.model.CreateMultipartUploadResponse; +import software.amazon.awssdk.services.s3.model.ServerSideEncryption; +import software.amazon.awssdk.services.s3.model.UploadPartRequest; +import software.amazon.awssdk.services.s3.model.UploadPartResponse; + +/** Tests {@link S3WritableByteChannel}. 
*/ +@RunWith(JUnit4.class) +@SuppressWarnings({ + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) +public class S3WritableByteChannelTest { + + @Test + public void write() throws IOException { + writeFromOptions(s3Options()); + writeFromOptions(s3OptionsWithSSEAlgorithm()); + writeFromOptions(s3OptionsWithSSECustomerKey()); + writeFromOptions(s3OptionsWithSSEKMSKeyId()); + assertThrows( + IllegalArgumentException.class, () -> writeFromOptions(s3OptionsWithMultipleSSEOptions())); + } + + private void writeFromOptions(S3Options options) throws IOException { + S3Client mockS3Client = mock(S3Client.class, withSettings().defaultAnswer(RETURNS_SMART_NULLS)); + S3ResourceId path = S3ResourceId.fromUri("s3://bucket/dir/file"); + + CreateMultipartUploadResponse.Builder builder = + CreateMultipartUploadResponse.builder().uploadId("upload-id"); + + ServerSideEncryption sseAlgorithm = ServerSideEncryption.fromValue(options.getSSEAlgorithm()); + if (options.getSSEAlgorithm() != null) { + builder.serverSideEncryption(sseAlgorithm); + } + if (getSSECustomerKeyMd5(options) != null) { + builder.sseCustomerKeyMD5(getSSECustomerKeyMd5(options)); + } + if (options.getSSEKMSKeyId() != null) { + sseAlgorithm = ServerSideEncryption.AWS_KMS; + builder.serverSideEncryption(sseAlgorithm); + } + CreateMultipartUploadResponse createMultipartUploadResponse = builder.build(); + doReturn(createMultipartUploadResponse) + .when(mockS3Client) + .createMultipartUpload(any(CreateMultipartUploadRequest.class)); + + CreateMultipartUploadRequest createMultipartUploadRequest = + CreateMultipartUploadRequest.builder().bucket(path.getBucket()).key(path.getKey()).build(); + CreateMultipartUploadResponse mockCreateMultipartUploadResponse1 = + mockS3Client.createMultipartUpload(createMultipartUploadRequest); + assertEquals(sseAlgorithm, mockCreateMultipartUploadResponse1.serverSideEncryption()); + assertEquals( + getSSECustomerKeyMd5(options), mockCreateMultipartUploadResponse1.sseCustomerKeyMD5()); + + UploadPartResponse.Builder uploadPartResponseBuilder = + UploadPartResponse.builder().eTag("etag"); + if (getSSECustomerKeyMd5(options) != null) { + uploadPartResponseBuilder.sseCustomerKeyMD5(getSSECustomerKeyMd5(options)); + } + UploadPartResponse response = uploadPartResponseBuilder.build(); + doReturn(response) + .when(mockS3Client) + .uploadPart(any(UploadPartRequest.class), any(RequestBody.class)); + + UploadPartResponse mockUploadPartResult = + mockS3Client.uploadPart(UploadPartRequest.builder().build(), RequestBody.empty()); + assertEquals(getSSECustomerKeyMd5(options), mockUploadPartResult.sseCustomerKeyMD5()); + + S3WritableByteChannel channel = + new S3WritableByteChannel(mockS3Client, path, "text/plain", options); + int contentSize = 34_078_720; + ByteBuffer uploadContent = ByteBuffer.allocate((int) (contentSize * 2.5)); + for (int i = 0; i < contentSize; i++) { + uploadContent.put((byte) 0xff); + } + uploadContent.flip(); + + int uploadedSize = channel.write(uploadContent); + assertEquals(contentSize, uploadedSize); + + CompleteMultipartUploadResponse completeMultipartUploadResponse = + CompleteMultipartUploadResponse.builder().build(); + doReturn(completeMultipartUploadResponse) + .when(mockS3Client) + .completeMultipartUpload(any(CompleteMultipartUploadRequest.class)); + + channel.close(); + + int partQuantity = + (int) Math.ceil((double) contentSize / options.getS3UploadBufferSizeBytes()) + 1; + + verify(mockS3Client, times(2)) + .createMultipartUpload((CreateMultipartUploadRequest) 
isNotNull()); + verify(mockS3Client, times(partQuantity)) + .uploadPart((UploadPartRequest) isNotNull(), any(RequestBody.class)); + verify(mockS3Client, times(1)) + .completeMultipartUpload((CompleteMultipartUploadRequest) notNull()); + verifyNoMoreInteractions(mockS3Client); + } + + @Test + public void testAtMostOne() { + assertTrue(atMostOne(true)); + assertTrue(atMostOne(false)); + assertFalse(atMostOne(true, true)); + assertTrue(atMostOne(true, false)); + assertTrue(atMostOne(false, true)); + assertTrue(atMostOne(false, false)); + assertFalse(atMostOne(true, true, true)); + assertFalse(atMostOne(true, true, false)); + assertFalse(atMostOne(true, false, true)); + assertTrue(atMostOne(true, false, false)); + assertFalse(atMostOne(false, true, true)); + assertTrue(atMostOne(false, true, false)); + assertTrue(atMostOne(false, false, true)); + assertTrue(atMostOne(false, false, false)); + } +} diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/s3/SSECustomerKeyTest.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/s3/SSECustomerKeyTest.java new file mode 100644 index 000000000000..ac395520ac95 --- /dev/null +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/s3/SSECustomerKeyTest.java @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.aws2.s3; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertThrows; + +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Test case for {@link SSECustomerKey}. 
*/ +@RunWith(JUnit4.class) +@SuppressWarnings({ + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) +public class SSECustomerKeyTest { + + @Test + public void testBuild() { + assertThrows( + IllegalArgumentException.class, + () -> buildWithArgs("86glyTlCNZgccSxW8JxMa6ZdjdK3N141glAysPUZ3AA=", null, null, null)); + assertThrows(IllegalArgumentException.class, () -> buildWithArgs(null, "AES256", null, null)); + buildWithArgs(null, null, null, null); + buildWithArgs( + "86glyTlCNZgccSxW8JxMa6ZdjdK3N141glAysPUZ3AA=", "AES256", null, "SDW4BE3CQz7VqDHYKpNC5A=="); + buildWithArgs( + "86glyTlCNZgccSxW8JxMa6ZdjdK3N141glAysPUZ3AA=", + "AES256", + "SDW4BE3CQz7VqDHYKpNC5A==", + "SDW4BE3CQz7VqDHYKpNC5A=="); + } + + public void buildWithArgs(String key, String algorithm, String md5, String encodedMD5) { + SSECustomerKey sseCustomerKey = + SSECustomerKey.builder().key(key).algorithm(algorithm).md5(md5).build(); + assertEquals(key, sseCustomerKey.getKey()); + assertEquals(algorithm, sseCustomerKey.getAlgorithm()); + assertEquals(encodedMD5, sseCustomerKey.getMD5()); + } +} diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/BasicSnsAsyncClientProviderTest.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/BasicSnsAsyncClientProviderTest.java new file mode 100644 index 000000000000..4b94750a8bb2 --- /dev/null +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/BasicSnsAsyncClientProviderTest.java @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.aws2.sns; + +import static org.junit.Assert.assertEquals; + +import org.apache.beam.sdk.util.SerializableUtils; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; +import software.amazon.awssdk.auth.credentials.AwsBasicCredentials; +import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider; +import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider; + +/** Tests on {@link BasicSnsAsyncClientProvider}. 
*/ +@RunWith(JUnit4.class) +public class BasicSnsAsyncClientProviderTest { + + @Test + public void testSerialization() { + AwsCredentialsProvider awsCredentialsProvider = + StaticCredentialsProvider.create( + AwsBasicCredentials.create("ACCESS_KEY_ID", "SECRET_ACCESS_KEY")); + + BasicSnsAsyncClientProvider snsAsyncClientProvider = + new BasicSnsAsyncClientProvider(awsCredentialsProvider, "us-east-1", null); + + byte[] serializedBytes = SerializableUtils.serializeToByteArray(snsAsyncClientProvider); + + BasicSnsAsyncClientProvider snsAsyncClientProviderDeserialized = + (BasicSnsAsyncClientProvider) + SerializableUtils.deserializeFromByteArray(serializedBytes, "Aws Credentials Provider"); + + assertEquals(snsAsyncClientProvider, snsAsyncClientProviderDeserialized); + } +} diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/BasicSnsClientProviderTest.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/BasicSnsClientProviderTest.java new file mode 100644 index 000000000000..76665a4af7cf --- /dev/null +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/BasicSnsClientProviderTest.java @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.aws2.sns; + +import static org.junit.Assert.assertEquals; + +import org.apache.beam.sdk.util.SerializableUtils; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; +import software.amazon.awssdk.auth.credentials.AwsBasicCredentials; +import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider; +import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider; + +/** Tests on {@link BasicSnsClientProvider}. 
*/ +@RunWith(JUnit4.class) +public class BasicSnsClientProviderTest { + + @Test + public void testSerialization() { + AwsCredentialsProvider awsCredentialsProvider = + StaticCredentialsProvider.create( + AwsBasicCredentials.create("ACCESS_KEY_ID", "SECRET_ACCESS_KEY")); + + BasicSnsClientProvider snsClientProvider = + new BasicSnsClientProvider(awsCredentialsProvider, "us-east-1", null); + + byte[] serializedBytes = SerializableUtils.serializeToByteArray(snsClientProvider); + + BasicSnsClientProvider snsClientProviderDeserialized = + (BasicSnsClientProvider) + SerializableUtils.deserializeFromByteArray(serializedBytes, "Aws Credentials Provider"); + + assertEquals(snsClientProvider, snsClientProviderDeserialized); + } +} diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/MockSnsAsyncBaseClient.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/MockSnsAsyncBaseClient.java index fb6180a60b16..697ae8ba7b76 100644 --- a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/MockSnsAsyncBaseClient.java +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/MockSnsAsyncBaseClient.java @@ -20,9 +20,6 @@ import java.io.Serializable; import software.amazon.awssdk.services.sns.SnsAsyncClient; -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) class MockSnsAsyncBaseClient implements SnsAsyncClient, Serializable { @Override public String serviceName() { diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/SnsClientMockErrors.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/SnsClientMockErrors.java index bf212a035ce0..0557b645202d 100644 --- a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/SnsClientMockErrors.java +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/SnsClientMockErrors.java @@ -27,9 +27,6 @@ import software.amazon.awssdk.services.sns.model.PublishResponse; /** Mock class to test a failed publish of a msg. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SnsClientMockErrors implements SnsClient { @Override diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/SnsClientMockSuccess.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/SnsClientMockSuccess.java index a3cf66f6f450..ca49b1ff1129 100644 --- a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/SnsClientMockSuccess.java +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/SnsClientMockSuccess.java @@ -30,9 +30,6 @@ // import static org.mockito.BDDMockito.given; /** Mock class to test a successful publish of a msg. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SnsClientMockSuccess implements SnsClient { @Override diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/SnsIOTest.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/SnsIOTest.java index 8a72d9e8c4be..9aae09ef8eef 100644 --- a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/SnsIOTest.java +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/SnsIOTest.java @@ -41,9 +41,6 @@ /** Tests to verify writes to Sns. */ @RunWith(PowerMockRunner.class) @PrepareForTest({PublishResponse.class, GetTopicAttributesResponse.class}) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SnsIOTest implements Serializable { private static final String topicArn = "arn:aws:sns:us-west-2:5880:topic-FMFEHJ47NRFO"; diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/SnsIOWriteTest.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/SnsIOWriteTest.java index b571c66689d3..8b9f7959f097 100644 --- a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/SnsIOWriteTest.java +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/SnsIOWriteTest.java @@ -40,9 +40,6 @@ import software.amazon.awssdk.services.sns.model.PublishRequest; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SnsIOWriteTest implements Serializable { private static final String TOPIC = "test"; private static final int FAILURE_STATUS_CODE = 400; diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sqs/BasicSqsClientProviderTest.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sqs/BasicSqsClientProviderTest.java new file mode 100644 index 000000000000..0970f9de0d5c --- /dev/null +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sqs/BasicSqsClientProviderTest.java @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.aws2.sqs; + +import static org.junit.Assert.assertEquals; + +import org.apache.beam.sdk.util.SerializableUtils; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; +import software.amazon.awssdk.auth.credentials.AwsBasicCredentials; +import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider; +import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider; + +/** Tests on {@link BasicSqsClientProvider}. */ +@RunWith(JUnit4.class) +public class BasicSqsClientProviderTest { + + @Test + public void testSerialization() { + AwsCredentialsProvider awsCredentialsProvider = + StaticCredentialsProvider.create( + AwsBasicCredentials.create("ACCESS_KEY_ID", "SECRET_ACCESS_KEY")); + + BasicSqsClientProvider sqsClientProvider = + new BasicSqsClientProvider(awsCredentialsProvider, "us-east-1", null); + + byte[] serializedBytes = SerializableUtils.serializeToByteArray(sqsClientProvider); + + BasicSqsClientProvider sqsClientProviderDeserialized = + (BasicSqsClientProvider) + SerializableUtils.deserializeFromByteArray(serializedBytes, "Aws Credentials Provider"); + + assertEquals(sqsClientProvider, sqsClientProviderDeserialized); + } +} diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sqs/EmbeddedSqsServer.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sqs/EmbeddedSqsServer.java index dfef6017607b..8f532e5a9554 100644 --- a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sqs/EmbeddedSqsServer.java +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sqs/EmbeddedSqsServer.java @@ -20,6 +20,7 @@ import java.net.URI; import org.elasticmq.rest.sqs.SQSRestServer; import org.elasticmq.rest.sqs.SQSRestServerBuilder; +import org.junit.rules.ExternalResource; import software.amazon.awssdk.auth.credentials.AwsBasicCredentials; import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider; import software.amazon.awssdk.regions.Region; @@ -27,10 +28,7 @@ import software.amazon.awssdk.services.sqs.model.CreateQueueRequest; import software.amazon.awssdk.services.sqs.model.CreateQueueResponse; -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) -class EmbeddedSqsServer { +class EmbeddedSqsServer extends ExternalResource { private static SQSRestServer sqsRestServer; private static SqsClient client; private static String queueUrl; @@ -38,7 +36,8 @@ class EmbeddedSqsServer { private static String endPoint = String.format("http://localhost:%d", port); private static String queueName = "test"; - static void start() { + @Override + protected void before() { sqsRestServer = SQSRestServerBuilder.withPort(port).start(); client = @@ -55,15 +54,16 @@ static void start() { queueUrl = queue.queueUrl(); } - static SqsClient getClient() { + public SqsClient getClient() { return client; } - static String getQueueUrl() { + public String getQueueUrl() { return queueUrl; } - static void stop() { + @Override + protected void after() { sqsRestServer.stopAndWait(); } } diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sqs/SqsClientProviderMock.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sqs/SqsClientProviderMock.java index 38642cb96373..bd341526bd3c 100644 --- a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sqs/SqsClientProviderMock.java +++ 
b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sqs/SqsClientProviderMock.java @@ -20,9 +20,6 @@ import software.amazon.awssdk.services.sqs.SqsClient; /** Mocking AwsClientProvider. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SqsClientProviderMock implements SqsClientProvider { private static SqsClientProviderMock instance = new SqsClientProviderMock(); diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sqs/SqsIOTest.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sqs/SqsIOTest.java index a3416dc28103..930085105a0f 100644 --- a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sqs/SqsIOTest.java +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sqs/SqsIOTest.java @@ -21,13 +21,8 @@ import java.util.ArrayList; import java.util.List; -import org.apache.beam.sdk.testing.PAssert; import org.apache.beam.sdk.testing.TestPipeline; -import org.apache.beam.sdk.transforms.Count; import org.apache.beam.sdk.transforms.Create; -import org.apache.beam.sdk.values.PCollection; -import org.junit.AfterClass; -import org.junit.BeforeClass; import org.junit.Rule; import org.junit.Test; import org.junit.runner.RunWith; @@ -40,36 +35,19 @@ /** Tests on {@link SqsIO}. */ @RunWith(JUnit4.class) +@SuppressWarnings({ + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) public class SqsIOTest { @Rule public TestPipeline pipeline = TestPipeline.create(); - @Test - public void testRead() { - final SqsClient client = EmbeddedSqsServer.getClient(); - final String queueUrl = EmbeddedSqsServer.getQueueUrl(); - - final PCollection output = - pipeline.apply( - SqsIO.read() - .withSqsClientProvider(SqsClientProviderMock.of(client)) - .withQueueUrl(queueUrl) - .withMaxNumRecords(100)); - - PAssert.thatSingleton(output.apply(Count.globally())).isEqualTo(100L); - - for (int i = 0; i < 100; i++) { - SendMessageRequest sendMessageRequest = - SendMessageRequest.builder().queueUrl(queueUrl).messageBody("This is a test").build(); - client.sendMessage(sendMessageRequest); - } - pipeline.run(); - } + @Rule public EmbeddedSqsServer embeddedSqsRestServer = new EmbeddedSqsServer(); @Test public void testWrite() { - final SqsClient client = EmbeddedSqsServer.getClient(); - final String queueUrl = EmbeddedSqsServer.getQueueUrl(); + final SqsClient client = embeddedSqsRestServer.getClient(); + final String queueUrl = embeddedSqsRestServer.getQueueUrl(); List messages = new ArrayList<>(); for (int i = 0; i < 100; i++) { @@ -105,14 +83,4 @@ public void testWrite() { received.contains("This is a test " + i); } } - - @BeforeClass - public static void before() { - EmbeddedSqsServer.start(); - } - - @AfterClass - public static void after() { - EmbeddedSqsServer.stop(); - } } diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sqs/SqsUnboundedReaderTest.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sqs/SqsUnboundedReaderTest.java new file mode 100644 index 000000000000..e364ecde5d35 --- /dev/null +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sqs/SqsUnboundedReaderTest.java @@ -0,0 +1,204 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.aws2.sqs; + +import static junit.framework.TestCase.assertFalse; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertTrue; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.HashSet; +import java.util.List; +import org.apache.beam.sdk.io.UnboundedSource; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.util.CoderUtils; +import org.junit.Rule; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; +import software.amazon.awssdk.services.sqs.SqsClient; +import software.amazon.awssdk.services.sqs.model.ChangeMessageVisibilityRequest; +import software.amazon.awssdk.services.sqs.model.SendMessageRequest; + +/** Tests on {@link SqsUnboundedReader}. */ +@RunWith(JUnit4.class) +public class SqsUnboundedReaderTest { + private static final String DATA = "testData"; + + @Rule public TestPipeline pipeline = TestPipeline.create(); + + @Rule public EmbeddedSqsServer embeddedSqsRestServer = new EmbeddedSqsServer(); + + private SqsUnboundedSource source; + + private void setupOneMessage() { + final SqsClient client = embeddedSqsRestServer.getClient(); + final String queueUrl = embeddedSqsRestServer.getQueueUrl(); + client.sendMessage(SendMessageRequest.builder().queueUrl(queueUrl).messageBody(DATA).build()); + source = + new SqsUnboundedSource( + SqsIO.read() + .withQueueUrl(queueUrl) + .withSqsClientProvider(SqsClientProviderMock.of(client)) + .withMaxNumRecords(1)); + } + + private void setupMessages(List messages) { + final SqsClient client = embeddedSqsRestServer.getClient(); + final String queueUrl = embeddedSqsRestServer.getQueueUrl(); + for (String message : messages) { + client.sendMessage( + SendMessageRequest.builder().queueUrl(queueUrl).messageBody(message).build()); + } + source = + new SqsUnboundedSource( + SqsIO.read() + .withQueueUrl(queueUrl) + .withSqsClientProvider(SqsClientProviderMock.of(client)) + .withMaxNumRecords(messages.size())); + } + + @Test + public void testReadOneMessage() throws IOException { + setupOneMessage(); + UnboundedSource.UnboundedReader reader = + source.createReader(pipeline.getOptions(), null); + // Read one message. + assertTrue(reader.start()); + assertEquals(DATA, reader.getCurrent().getBody()); + assertFalse(reader.advance()); + // ACK the message. 
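+    // Finalizing the checkpoint is what ACKs (deletes) the message from the queue; a message that
+    // is never finalized becomes visible again after its visibility timeout, which the next test
+    // (testTimeoutAckAndRereadOneMessage) exercises explicitly.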
+ UnboundedSource.CheckpointMark checkpoint = reader.getCheckpointMark(); + checkpoint.finalizeCheckpoint(); + reader.close(); + } + + @Test + public void testTimeoutAckAndRereadOneMessage() throws IOException { + setupOneMessage(); + UnboundedSource.UnboundedReader reader = + source.createReader(pipeline.getOptions(), null); + SqsClient sqsClient = source.getSqs(); + assertTrue(reader.start()); + assertEquals(DATA, reader.getCurrent().getBody()); + String receiptHandle = reader.getCurrent().getReceiptHandle(); + // Set the message to timeout. + sqsClient.changeMessageVisibility( + ChangeMessageVisibilityRequest.builder() + .queueUrl(source.getRead().queueUrl()) + .receiptHandle(receiptHandle) + .visibilityTimeout(0) + .build()); + // We'll now receive the same message again. + assertTrue(reader.advance()); + assertEquals(DATA, reader.getCurrent().getBody()); + assertFalse(reader.advance()); + // Now ACK the message. + UnboundedSource.CheckpointMark checkpoint = reader.getCheckpointMark(); + checkpoint.finalizeCheckpoint(); + reader.close(); + } + + @Test + public void testMultipleReaders() throws IOException { + List incoming = new ArrayList<>(); + for (int i = 0; i < 2; i++) { + incoming.add(String.format("data_%d", i)); + } + setupMessages(incoming); + UnboundedSource.UnboundedReader reader = + source.createReader(pipeline.getOptions(), null); + // Consume two messages, only read one. + assertTrue(reader.start()); + assertEquals("data_0", reader.getCurrent().getBody()); + + // Grab checkpoint. + SqsCheckpointMark checkpoint = (SqsCheckpointMark) reader.getCheckpointMark(); + checkpoint.finalizeCheckpoint(); + assertEquals(1, checkpoint.notYetReadReceipts.size()); + + // Read second message. + assertTrue(reader.advance()); + assertEquals("data_1", reader.getCurrent().getBody()); + + // Restore from checkpoint. + byte[] checkpointBytes = + CoderUtils.encodeToByteArray(source.getCheckpointMarkCoder(), checkpoint); + checkpoint = CoderUtils.decodeFromByteArray(source.getCheckpointMarkCoder(), checkpointBytes); + assertEquals(1, checkpoint.notYetReadReceipts.size()); + + // Re-read second message. + reader = source.createReader(pipeline.getOptions(), checkpoint); + assertTrue(reader.start()); + assertEquals("data_1", reader.getCurrent().getBody()); + + // We are done. + assertFalse(reader.advance()); + + // ACK final message. + checkpoint = (SqsCheckpointMark) reader.getCheckpointMark(); + checkpoint.finalizeCheckpoint(); + reader.close(); + } + + @Test + public void testReadMany() throws IOException { + + HashSet messages = new HashSet<>(); + List incoming = new ArrayList<>(); + for (int i = 0; i < 100; i++) { + String content = String.format("data_%d", i); + messages.add(content); + incoming.add(String.format("data_%d", i)); + } + setupMessages(incoming); + + SqsUnboundedReader reader = + (SqsUnboundedReader) source.createReader(pipeline.getOptions(), null); + + for (int i = 0; i < 100; i++) { + if (i == 0) { + assertTrue(reader.start()); + } else { + assertTrue(reader.advance()); + } + String data = reader.getCurrent().getBody(); + boolean messageNum = messages.remove(data); + // No duplicate messages. + assertTrue(messageNum); + } + // We are done. + assertFalse(reader.advance()); + // We saw each message exactly once. + assertTrue(messages.isEmpty()); + reader.close(); + } + + /** Tests that checkpoints finalized after the reader is closed succeed. 
*/ + @Test + public void testCloseWithActiveCheckpoints() throws Exception { + setupOneMessage(); + UnboundedSource.UnboundedReader reader = + source.createReader(pipeline.getOptions(), null); + reader.start(); + UnboundedSource.CheckpointMark checkpoint = reader.getCheckpointMark(); + reader.close(); + checkpoint.finalizeCheckpoint(); + } +} diff --git a/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sqs/SqsUnboundedSourceTest.java b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sqs/SqsUnboundedSourceTest.java new file mode 100644 index 000000000000..8ebc5ae966b4 --- /dev/null +++ b/sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sqs/SqsUnboundedSourceTest.java @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.aws2.sqs; + +import org.apache.beam.sdk.testing.CoderProperties; +import org.apache.beam.sdk.testing.TestPipeline; +import org.junit.Rule; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; +import software.amazon.awssdk.services.sqs.SqsClient; +import software.amazon.awssdk.services.sqs.model.SendMessageRequest; + +/** Tests on {@link SqsUnboundedSource}. 
*/ +@RunWith(JUnit4.class) +public class SqsUnboundedSourceTest { + + private static final String DATA = "testData"; + + @Rule public TestPipeline pipeline = TestPipeline.create(); + + @Rule public EmbeddedSqsServer embeddedSqsRestServer = new EmbeddedSqsServer(); + + @Test + public void testCheckpointCoderIsSane() { + final SqsClient client = embeddedSqsRestServer.getClient(); + final String queueUrl = embeddedSqsRestServer.getQueueUrl(); + client.sendMessage(SendMessageRequest.builder().queueUrl(queueUrl).messageBody(DATA).build()); + SqsUnboundedSource source = + new SqsUnboundedSource( + SqsIO.read() + .withQueueUrl(queueUrl) + .withSqsClientProvider(SqsClientProviderMock.of(client)) + .withMaxNumRecords(1)); + CoderProperties.coderSerializable(source.getCheckpointMarkCoder()); + } +} diff --git a/sdks/java/io/amqp/build.gradle b/sdks/java/io/amqp/build.gradle index 9c9f60c327e7..db3317c4a6a1 100644 --- a/sdks/java/io/amqp/build.gradle +++ b/sdks/java/io/amqp/build.gradle @@ -29,8 +29,6 @@ dependencies { compile "org.apache.qpid:proton-j:0.16.0" testCompile library.java.slf4j_api testCompile library.java.junit - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library testCompile library.java.activemq_broker testCompile library.java.activemq_amqp testCompile library.java.activemq_junit diff --git a/sdks/java/io/azure/build.gradle b/sdks/java/io/azure/build.gradle index 97d31f9d9640..8f30a48a67bd 100644 --- a/sdks/java/io/azure/build.gradle +++ b/sdks/java/io/azure/build.gradle @@ -29,13 +29,19 @@ ext.summary = "IO library to read and write Azure services from Beam." repositories { jcenter() } dependencies { + compile library.java.commons_io + permitUnusedDeclared library.java.commons_io // BEAM-11761 + compile library.java.slf4j_api compile library.java.vendored_guava_26_0_jre compile project(path: ":sdks:java:core", configuration: "shadow") - compile "com.azure:azure-storage-blob:12.8.0" + compile "com.azure:azure-storage-blob:12.10.0" compile "com.azure:azure-identity:1.0.8" compile "com.microsoft.azure:azure-storage:8.6.5" - compile "commons-io:commons-io:2.6" - compile library.java.slf4j_api + compile "com.azure:azure-core:1.9.0" + compile "com.azure:azure-storage-common:12.10.0" + compile library.java.jackson_annotations + compile library.java.jackson_core + compile library.java.jackson_databind testCompile project(path: ":sdks:java:core", configuration: "shadowTest") testCompile library.java.mockito_core testCompile library.java.junit diff --git a/sdks/java/io/azure/src/main/java/org/apache/beam/sdk/io/azure/blobstore/AzureBlobStoreFileSystem.java b/sdks/java/io/azure/src/main/java/org/apache/beam/sdk/io/azure/blobstore/AzureBlobStoreFileSystem.java index c466392a71b3..7bc6d2679db7 100644 --- a/sdks/java/io/azure/src/main/java/org/apache/beam/sdk/io/azure/blobstore/AzureBlobStoreFileSystem.java +++ b/sdks/java/io/azure/src/main/java/org/apache/beam/sdk/io/azure/blobstore/AzureBlobStoreFileSystem.java @@ -52,6 +52,7 @@ import org.apache.beam.sdk.io.azure.options.BlobstoreOptions; import org.apache.beam.sdk.io.fs.CreateOptions; import org.apache.beam.sdk.io.fs.MatchResult; +import org.apache.beam.sdk.io.fs.MoveOptions; import org.apache.beam.sdk.util.InstanceBuilder; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Strings; @@ -198,24 +199,30 @@ MatchResult expand(AzfsResourceId azfsPattern) { .withSize(properties.getBlobSize()) 
.withLastModified(Date.from(properties.getLastModified().toInstant())); - results.add(toMetadata(rid, properties.getContentEncoding())); + results.add(toMetadata(rid, properties.getContentEncoding(), properties.getETag())); } }); return MatchResult.create(MatchResult.Status.OK, results); } - private MatchResult.Metadata toMetadata(AzfsResourceId path, String contentEncoding) { + private MatchResult.Metadata toMetadata( + AzfsResourceId path, String contentEncoding, String eTag) { checkArgument(path.getSize() != null, "The resource id should have a size."); boolean isReadSeekEfficient = !NON_READ_SEEK_EFFICIENT_ENCODINGS.contains(contentEncoding); - return MatchResult.Metadata.builder() - .setIsReadSeekEfficient(isReadSeekEfficient) - .setResourceId(path) - .setSizeBytes(path.getSize()) - .setLastModifiedMillis(path.getLastModified().transform(Date::getTime).or(0L)) - .build(); + MatchResult.Metadata.Builder ret = + MatchResult.Metadata.builder() + .setIsReadSeekEfficient(isReadSeekEfficient) + .setResourceId(path) + .setSizeBytes(path.getSize()) + .setLastModifiedMillis(path.getLastModified().transform(Date::getTime).or(0L)); + + if (eTag != null) { + ret.setChecksum(eTag); + } + return ret.build(); } /** @@ -253,7 +260,8 @@ private MatchResult toMatchResult(AzfsResourceId path) { toMetadata( path.withSize(blobProperties.getBlobSize()) .withLastModified(Date.from(blobProperties.getLastModified().toInstant())), - blobProperties.getContentEncoding()))); + blobProperties.getContentEncoding(), + blobProperties.getETag()))); } @Override @@ -382,8 +390,14 @@ String generateSasToken() throws IOException { } @Override - protected void rename(List srcResourceIds, List destResourceIds) + protected void rename( + List srcResourceIds, + List destResourceIds, + MoveOptions... 
moveOptions) throws IOException { + if (moveOptions.length > 0) { + throw new UnsupportedOperationException("Support for move options is not yet implemented."); + } copy(srcResourceIds, destResourceIds); delete(srcResourceIds); } diff --git a/sdks/java/io/azure/src/main/java/org/apache/beam/sdk/io/azure/blobstore/DefaultBlobstoreClientBuilderFactory.java b/sdks/java/io/azure/src/main/java/org/apache/beam/sdk/io/azure/blobstore/DefaultBlobstoreClientBuilderFactory.java index 707e73bb81c8..b68fcc2826f6 100644 --- a/sdks/java/io/azure/src/main/java/org/apache/beam/sdk/io/azure/blobstore/DefaultBlobstoreClientBuilderFactory.java +++ b/sdks/java/io/azure/src/main/java/org/apache/beam/sdk/io/azure/blobstore/DefaultBlobstoreClientBuilderFactory.java @@ -37,12 +37,8 @@ public BlobServiceClientBuilder createBuilder(BlobstoreOptions blobstoreOptions) builder = builder.connectionString(blobstoreOptions.getAzureConnectionString()); } - if (blobstoreOptions.getSharedKeyCredential() != null) { - builder = builder.credential(blobstoreOptions.getSharedKeyCredential()); - } - - if (blobstoreOptions.getTokenCredential() != null) { - builder = builder.credential(blobstoreOptions.getTokenCredential()); + if (blobstoreOptions.getAzureCredentialsProvider() != null) { + builder = builder.credential(blobstoreOptions.getAzureCredentialsProvider()); } if (!Strings.isNullOrEmpty(blobstoreOptions.getSasToken())) { diff --git a/sdks/java/io/azure/src/main/java/org/apache/beam/sdk/io/azure/options/AzureModule.java b/sdks/java/io/azure/src/main/java/org/apache/beam/sdk/io/azure/options/AzureModule.java new file mode 100644 index 000000000000..ea538cfbec94 --- /dev/null +++ b/sdks/java/io/azure/src/main/java/org/apache/beam/sdk/io/azure/options/AzureModule.java @@ -0,0 +1,241 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.azure.options; + +import com.azure.core.credential.TokenCredential; +import com.azure.identity.ClientCertificateCredential; +import com.azure.identity.ClientCertificateCredentialBuilder; +import com.azure.identity.ClientSecretCredential; +import com.azure.identity.ClientSecretCredentialBuilder; +import com.azure.identity.DefaultAzureCredential; +import com.azure.identity.DefaultAzureCredentialBuilder; +import com.azure.identity.EnvironmentCredential; +import com.azure.identity.EnvironmentCredentialBuilder; +import com.azure.identity.ManagedIdentityCredential; +import com.azure.identity.ManagedIdentityCredentialBuilder; +import com.azure.identity.UsernamePasswordCredential; +import com.azure.identity.UsernamePasswordCredentialBuilder; +import com.azure.identity.implementation.IdentityClient; +import com.fasterxml.jackson.annotation.JsonTypeInfo; +import com.fasterxml.jackson.core.JsonGenerator; +import com.fasterxml.jackson.core.JsonParser; +import com.fasterxml.jackson.core.JsonToken; +import com.fasterxml.jackson.core.type.TypeReference; +import com.fasterxml.jackson.core.type.WritableTypeId; +import com.fasterxml.jackson.databind.DeserializationContext; +import com.fasterxml.jackson.databind.JsonDeserializer; +import com.fasterxml.jackson.databind.JsonSerializer; +import com.fasterxml.jackson.databind.Module; +import com.fasterxml.jackson.databind.SerializerProvider; +import com.fasterxml.jackson.databind.annotation.JsonDeserialize; +import com.fasterxml.jackson.databind.annotation.JsonSerialize; +import com.fasterxml.jackson.databind.jsontype.TypeDeserializer; +import com.fasterxml.jackson.databind.jsontype.TypeSerializer; +import com.fasterxml.jackson.databind.module.SimpleModule; +import com.google.auto.service.AutoService; +import java.io.IOException; +import java.lang.reflect.Field; +import java.util.Map; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; + +/** + * A Jackson {@link Module} that registers a {@link JsonSerializer} and {@link JsonDeserializer} for + * Azure credential providers. The serialized form is a JSON map. + */ +@AutoService(Module.class) +public class AzureModule extends SimpleModule { + + private static final String TYPE_PROPERTY = "@type"; + private static final String AZURE_CLIENT_ID = "azureClientId"; + private static final String AZURE_TENANT_ID = "azureTenantId"; + private static final String AZURE_CLIENT_SECRET = "azureClientSecret"; + private static final String AZURE_CLIENT_CERTIFICATE_PATH = "azureClientCertificatePath"; + private static final String AZURE_USERNAME = "azureUsername"; + private static final String AZURE_PASSWORD = "azurePassword"; + private static final String AZURE_PFX_CERTIFICATE_PATH = "azurePfxCertificatePath"; + private static final String AZURE_PFX_CERTIFICATE_PASSWORD = "azurePfxCertificatePassword"; + + @SuppressWarnings({ + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) + }) + public AzureModule() { + super("AzureModule"); + setMixInAnnotation(TokenCredential.class, TokenCredentialMixin.class); + } + + /** A mixin to add Jackson annotations to {@link TokenCredential}. 
*/ + @JsonDeserialize(using = TokenCredentialDeserializer.class) + @JsonSerialize(using = TokenCredentialSerializer.class) + @JsonTypeInfo(use = JsonTypeInfo.Id.NAME, include = JsonTypeInfo.As.PROPERTY) + private static class TokenCredentialMixin {} + + private static class TokenCredentialDeserializer extends JsonDeserializer { + + @Override + public TokenCredential deserialize(JsonParser jsonParser, DeserializationContext context) + throws IOException { + return context.readValue(jsonParser, TokenCredential.class); + } + + @Override + public TokenCredential deserializeWithType( + JsonParser jsonParser, DeserializationContext context, TypeDeserializer typeDeserializer) + throws IOException { + Map asMap = + jsonParser.readValueAs(new TypeReference>() {}); + if (asMap == null) { + throw new IOException("Azure credentials provider could not be read."); + } + + String typeNameKey = typeDeserializer.getPropertyName(); + String typeName = asMap.get(typeNameKey); + if (typeName == null) { + throw new IOException( + String.format("Azure credentials provider type name key '%s' not found", typeNameKey)); + } + + if (typeName.equals(DefaultAzureCredential.class.getSimpleName())) { + return new DefaultAzureCredentialBuilder().build(); + } else if (typeName.equals(ClientSecretCredential.class.getSimpleName())) { + return new ClientSecretCredentialBuilder() + .clientId(asMap.getOrDefault(AZURE_CLIENT_ID, "")) + .clientSecret(asMap.getOrDefault(AZURE_CLIENT_SECRET, "")) + .tenantId(asMap.getOrDefault(AZURE_TENANT_ID, "")) + .build(); + } else if (typeName.equals(ManagedIdentityCredential.class.getSimpleName())) { + return new ManagedIdentityCredentialBuilder() + .clientId(asMap.getOrDefault(AZURE_CLIENT_ID, "")) + .build(); + } else if (typeName.equals(EnvironmentCredential.class.getSimpleName())) { + return new EnvironmentCredentialBuilder().build(); + } else if (typeName.equals(ClientCertificateCredential.class.getSimpleName())) { + if (asMap.containsKey(AZURE_CLIENT_CERTIFICATE_PATH)) { + return new ClientCertificateCredentialBuilder() + .clientId(asMap.getOrDefault(AZURE_CLIENT_ID, "")) + .pemCertificate(asMap.getOrDefault(AZURE_CLIENT_CERTIFICATE_PATH, "")) + .tenantId(asMap.getOrDefault(AZURE_TENANT_ID, "")) + .build(); + } else { + return new ClientCertificateCredentialBuilder() + .clientId(asMap.getOrDefault(AZURE_CLIENT_ID, "")) + .pfxCertificate( + asMap.getOrDefault(AZURE_PFX_CERTIFICATE_PATH, ""), + asMap.getOrDefault(AZURE_PFX_CERTIFICATE_PASSWORD, "")) + .tenantId(asMap.getOrDefault(AZURE_TENANT_ID, "")) + .build(); + } + } else if (typeName.equals(UsernamePasswordCredential.class.getSimpleName())) { + return new UsernamePasswordCredentialBuilder() + .clientId(asMap.getOrDefault(AZURE_CLIENT_ID, "")) + .username(asMap.getOrDefault(AZURE_USERNAME, "")) + .password(asMap.getOrDefault(AZURE_PASSWORD, "")) + .tenantId(asMap.getOrDefault(AZURE_TENANT_ID, "")) + .build(); + } else { + throw new IOException( + String.format("Azure credential provider type '%s' is not supported", typeName)); + } + } + } + + private static class TokenCredentialSerializer extends JsonSerializer { + // These providers are singletons, so don't require any serialization, other than type. + // add any singleton credentials... 
+ private static final ImmutableSet SINGLETON_CREDENTIAL_PROVIDERS = ImmutableSet.of(); + + @Override + public void serialize( + TokenCredential tokenCredential, + JsonGenerator jsonGenerator, + SerializerProvider serializers) + throws IOException { + serializers.defaultSerializeValue(tokenCredential, jsonGenerator); + } + + @SuppressWarnings("nullness") + private static Object getMember(Object obj, String member) + throws IllegalAccessException, NoSuchFieldException { + Class cls = obj.getClass(); + Field field = cls.getDeclaredField(member); + field.setAccessible(true); + Object fieldObj = field.get(obj); + assert fieldObj != null; + return fieldObj; + } + + @Override + public void serializeWithType( + TokenCredential tokenCredential, + JsonGenerator jsonGenerator, + SerializerProvider serializers, + TypeSerializer typeSerializer) + throws IOException { + + WritableTypeId typeIdDef = + typeSerializer.writeTypePrefix( + jsonGenerator, typeSerializer.typeId(tokenCredential, JsonToken.START_OBJECT)); + + try { + if (tokenCredential instanceof DefaultAzureCredential) { + // Do nothing + } else if (tokenCredential instanceof ClientSecretCredential) { + ClientSecretCredential credential = (ClientSecretCredential) tokenCredential; + IdentityClient identityClient = (IdentityClient) getMember(credential, "identityClient"); + jsonGenerator.writeStringField( + AZURE_CLIENT_ID, (String) getMember(identityClient, "clientId")); + jsonGenerator.writeStringField( + AZURE_TENANT_ID, (String) getMember(identityClient, "tenantId")); + jsonGenerator.writeStringField( + AZURE_CLIENT_SECRET, (String) getMember(credential, "clientSecret")); + } else if (tokenCredential instanceof ManagedIdentityCredential) { + ManagedIdentityCredential credential = (ManagedIdentityCredential) tokenCredential; + Object appServiceMsiCredential = getMember(credential, "appServiceMSICredential"); + IdentityClient identityClient = + (IdentityClient) getMember(appServiceMsiCredential, "identityClient"); + jsonGenerator.writeStringField( + AZURE_CLIENT_ID, (String) getMember(identityClient, "clientId")); + } else if (tokenCredential instanceof EnvironmentCredential) { + // Do nothing + } else if (tokenCredential instanceof ClientCertificateCredential) { + throw new IOException("Client certificates not yet implemented"); // TODO + } else if (tokenCredential instanceof UsernamePasswordCredential) { + UsernamePasswordCredential credential = (UsernamePasswordCredential) tokenCredential; + IdentityClient identityClient = (IdentityClient) getMember(credential, "identityClient"); + jsonGenerator.writeStringField( + AZURE_CLIENT_ID, (String) getMember(identityClient, "clientId")); + jsonGenerator.writeStringField( + AZURE_USERNAME, (String) getMember(credential, "username")); + jsonGenerator.writeStringField( + AZURE_PASSWORD, (String) getMember(credential, "password")); + } else { + throw new IOException( + String.format( + "Azure credential provider type '%s' is not supported", + tokenCredential.getClass().getSimpleName())); + } + } catch (IllegalAccessException | NoSuchFieldException e) { + throw new IOException( + String.format( + "Failed to serialize object of type '%s': %s", + tokenCredential.getClass().getSimpleName(), e.toString())); + } + + typeSerializer.writeTypeSuffix(jsonGenerator, typeIdDef); + } + } +} diff --git a/sdks/java/io/azure/src/main/java/org/apache/beam/sdk/io/azure/options/BlobstoreOptions.java b/sdks/java/io/azure/src/main/java/org/apache/beam/sdk/io/azure/options/BlobstoreOptions.java index 
108cb0c31d37..0e704abbe5b7 100644 --- a/sdks/java/io/azure/src/main/java/org/apache/beam/sdk/io/azure/options/BlobstoreOptions.java +++ b/sdks/java/io/azure/src/main/java/org/apache/beam/sdk/io/azure/options/BlobstoreOptions.java @@ -23,7 +23,6 @@ import com.azure.core.http.policy.HttpPipelinePolicy; import com.azure.identity.DefaultAzureCredentialBuilder; import com.azure.storage.blob.models.CustomerProvidedKey; -import com.azure.storage.common.StorageSharedKeyCredential; import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.annotations.Experimental.Kind; import org.apache.beam.sdk.io.azure.blobstore.DefaultBlobstoreClientBuilderFactory; @@ -59,16 +58,6 @@ void setBlobstoreClientFactoryClass( void setAzureConnectionString(String connectionString); - @Description("Sets a StorageSharedKeyCredential used to authorize requests sent to the service.") - StorageSharedKeyCredential getSharedKeyCredential(); - - void setSharedKeyCredential(StorageSharedKeyCredential sharedKeyCredential); - - @Description("Sets a TokenCredential used to authorize requests sent to the service.") - TokenCredential getTokenCredential(); - - void setTokenCredential(TokenCredential tokenCredential); - @Description("Sets the SAS token used to authorize requests sent to the service.") String getSasToken(); @@ -110,14 +99,25 @@ void setBlobstoreClientFactoryClass( void setHttpPipeline(HttpPipeline httpPipeline); + /* Refer to {@link DefaultAWSCredentialsProviderChain} Javadoc for usage help. */ + /** * The credential instance that should be used to authenticate against Azure services. The option * value must contain a "@type" field and an Azure credentials provider class as the field value. + * + *
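A hedged sketch of how this option is typically consumed: the wiring below assumes the standard Beam PipelineOptions mechanics (flag name derived from the getter) together with the Jackson module registration added in this change; the class name and values are illustrative assumptions, not part of the patch.

    import com.azure.core.credential.TokenCredential;
    import org.apache.beam.sdk.io.azure.options.BlobstoreOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class AzureCredentialOptionSketch {
      public static void main(String[] args) {
        // Hypothetical JSON payload; the field names mirror the Javadoc example that follows.
        String credentialJson =
            "{\"@type\": \"ClientSecretCredential\", \"azureClientId\": \"client_id_value\", "
                + "\"azureTenantId\": \"tenant_id_value\", \"azureClientSecret\": \"client_secret_value\"}";
        BlobstoreOptions options =
            PipelineOptionsFactory.fromArgs("--azureCredentialsProvider=" + credentialJson)
                .as(BlobstoreOptions.class);
        TokenCredential credential = options.getAzureCredentialsProvider();
        System.out.println(credential.getClass().getSimpleName());
      }
    }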

    For example, to specify the Azure client id, tenant id, and client secret, specify the + * following: + * {"@type" : "ClientSecretCredential", "azureClientId": "client_id_value", + * "azureTenantId": "tenant_id_value", "azureClientSecret": "client_secret_value"} + * */ @Description( "The credential instance that should be used to authenticate " + "against Azure services. The option value must contain \"@type\" field " - + "and an Azure credentials provider class name as the field value.") + + "and an Azure credentials provider class name as the field value. " + + " For example, to specify the Azure client id, tenant id, and client secret, specify the following: " + + "{\"@type\" : \"ClientSecretCredential\", \"azureClientId\": \"client_id_value\", " + + "\"azureTenantId\": \"tenant_id_value\", \"azureClientSecret\": \"client_secret_value\"}") @Default.InstanceFactory(AzureUserCredentialsFactory.class) TokenCredential getAzureCredentialsProvider(); diff --git a/sdks/java/io/azure/src/test/java/org/apache/beam/sdk/io/azure/blobstore/AzfsResourceIdTest.java b/sdks/java/io/azure/src/test/java/org/apache/beam/sdk/io/azure/blobstore/AzfsResourceIdTest.java index 936e976a4ac3..6a1379ba05eb 100644 --- a/sdks/java/io/azure/src/test/java/org/apache/beam/sdk/io/azure/blobstore/AzfsResourceIdTest.java +++ b/sdks/java/io/azure/src/test/java/org/apache/beam/sdk/io/azure/blobstore/AzfsResourceIdTest.java @@ -42,9 +42,6 @@ import org.junit.runners.Parameterized; @RunWith(Enclosed.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class AzfsResourceIdTest { @RunWith(Parameterized.class) public static class ResolveTest { diff --git a/sdks/java/io/azure/src/test/java/org/apache/beam/sdk/io/azure/blobstore/AzureBlobStoreFileSystemTest.java b/sdks/java/io/azure/src/test/java/org/apache/beam/sdk/io/azure/blobstore/AzureBlobStoreFileSystemTest.java index 7f89119a713d..92d8ad07f04b 100644 --- a/sdks/java/io/azure/src/test/java/org/apache/beam/sdk/io/azure/blobstore/AzureBlobStoreFileSystemTest.java +++ b/sdks/java/io/azure/src/test/java/org/apache/beam/sdk/io/azure/blobstore/AzureBlobStoreFileSystemTest.java @@ -19,10 +19,10 @@ import static java.util.UUID.randomUUID; import static org.apache.beam.sdk.io.fs.CreateOptions.StandardCreateOptions.builder; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.junit.Assert.assertArrayEquals; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.mockito.ArgumentMatchers.any; import static org.mockito.ArgumentMatchers.anyString; import static org.mockito.Mockito.doAnswer; diff --git a/sdks/java/io/bigquery-io-perf-tests/build.gradle b/sdks/java/io/bigquery-io-perf-tests/build.gradle index 8d5cf89195d3..9c02f4297207 100644 --- a/sdks/java/io/bigquery-io-perf-tests/build.gradle +++ b/sdks/java/io/bigquery-io-perf-tests/build.gradle @@ -25,7 +25,7 @@ description = "Apache Beam :: SDKs :: Java :: Google BigQuery IO Performance tes ext.summary = "Performance tests for Google BigQuery IO sources and sinks" dependencies { - compile library.java.google_api_services_bigquery + testCompile library.java.google_api_services_bigquery testCompile project(path: ":sdks:java:core", configuration: "shadowTest") testCompile project(path: ":sdks:java:testing:test-utils", configuration: "testRuntime") testCompile project(path: ":sdks:java:extensions:google-cloud-platform-core", configuration: "testRuntime") @@ 
-34,7 +34,4 @@ dependencies { testCompile project(path: ":sdks:java:io:google-cloud-platform", configuration: "testRuntime") testCompile project(":sdks:java:io:synthetic") testCompile library.java.junit - testCompile library.java.hamcrest_core - testCompile library.java.jaxb_api - testCompile library.java.jaxb_impl } diff --git a/sdks/java/io/bigquery-io-perf-tests/src/test/java/org/apache/beam/sdk/bigqueryioperftests/BigQueryIOIT.java b/sdks/java/io/bigquery-io-perf-tests/src/test/java/org/apache/beam/sdk/bigqueryioperftests/BigQueryIOIT.java index 9db230c2cdf6..41cad8d7bb78 100644 --- a/sdks/java/io/bigquery-io-perf-tests/src/test/java/org/apache/beam/sdk/bigqueryioperftests/BigQueryIOIT.java +++ b/sdks/java/io/bigquery-io-perf-tests/src/test/java/org/apache/beam/sdk/bigqueryioperftests/BigQueryIOIT.java @@ -78,9 +78,6 @@ * */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigQueryIOIT { private static final String NAMESPACE = BigQueryIOIT.class.getName(); private static final String TEST_ID = UUID.randomUUID().toString(); diff --git a/sdks/java/io/cassandra/build.gradle b/sdks/java/io/cassandra/build.gradle index 6fc6e92b91af..94b584b186c0 100644 --- a/sdks/java/io/cassandra/build.gradle +++ b/sdks/java/io/cassandra/build.gradle @@ -28,8 +28,8 @@ enableJavaPerformanceTesting() description = "Apache Beam :: SDKs :: Java :: IO :: Cassandra" ext.summary = "IO to read and write with Apache Cassandra database" -// compatible with all Cassandra versions up to 3.11.3 -def achilles_version = "6.0.2" +// compatible with all Cassandra versions up to 3.11.10 +def achilles_version = "6.1.0" dependencies { compile library.java.vendored_guava_26_0_jre @@ -50,3 +50,10 @@ dependencies { testRuntimeOnly library.java.slf4j_jdk14 testRuntimeOnly project(path: ":runners:direct-java", configuration: "shadow") } + +// Cassandra dependencies require old version of Guava (BEAM-11626) +configurations.all { + resolutionStrategy { + force 'com.google.guava:guava:25.1-jre' + } +} diff --git a/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraIO.java b/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraIO.java index cd67dc6a581b..bd919ddbdade 100644 --- a/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraIO.java +++ b/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraIO.java @@ -21,13 +21,9 @@ import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; import com.datastax.driver.core.Cluster; -import com.datastax.driver.core.ColumnMetadata; import com.datastax.driver.core.ConsistencyLevel; import com.datastax.driver.core.PlainTextAuthProvider; import com.datastax.driver.core.QueryOptions; -import com.datastax.driver.core.ResultSet; -import com.datastax.driver.core.ResultSetFuture; -import com.datastax.driver.core.Row; import com.datastax.driver.core.Session; import com.datastax.driver.core.SocketOptions; import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy; @@ -36,32 +32,32 @@ import java.math.BigInteger; import java.util.ArrayList; import java.util.Collections; -import java.util.Iterator; import java.util.List; -import java.util.NoSuchElementException; import java.util.Optional; +import java.util.Set; import java.util.concurrent.ExecutionException; import java.util.concurrent.Future; import java.util.function.BiFunction; import java.util.stream.Collectors; +import 
javax.annotation.Nullable; import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.annotations.Experimental.Kind; import org.apache.beam.sdk.coders.Coder; -import org.apache.beam.sdk.io.BoundedSource; +import org.apache.beam.sdk.coders.SerializableCoder; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.transforms.Create; import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.Reshuffle; import org.apache.beam.sdk.transforms.SerializableFunction; -import org.apache.beam.sdk.transforms.display.DisplayData; import org.apache.beam.sdk.values.PBegin; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PDone; +import org.apache.beam.sdk.values.TypeDescriptor; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Joiner; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterators; -import org.checkerframework.checker.nullness.qual.Nullable; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -89,6 +85,11 @@ * * } * + *

    Alternatively, one may use {@code CassandraIO.readAll() + * .withCoder(SerializableCoder.of(Person.class))} to query a subset of the Cassandra database by + * creating a PCollection of {@code CassandraIO.Read}, each with its own query or + * RingRange. + *
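A minimal usage sketch of the new readAll() path (hedged: {@code Person} is an assumed mapped entity, and the hosts, keyspace, table, and token-range bounds are placeholders; the pattern mirrors the tests added further down in this diff):

    import java.math.BigInteger;
    import java.util.Collections;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.coders.SerializableCoder;
    import org.apache.beam.sdk.io.cassandra.CassandraIO;
    import org.apache.beam.sdk.io.cassandra.RingRange;
    import org.apache.beam.sdk.transforms.Create;
    import org.apache.beam.sdk.values.PCollection;

    public class CassandraReadAllSketch {
      public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create();

        CassandraIO.Read<Person> base =
            CassandraIO.<Person>read()
                .withHosts(Collections.singletonList("127.0.0.1"))
                .withPort(9042)
                .withKeyspace("beam_ks")
                .withTable("person")
                .withEntity(Person.class)
                .withCoder(SerializableCoder.of(Person.class));

        // Each Read spec carries its own RingRange; readAll() reshuffles the specs and reads them in parallel.
        PCollection<Person> people =
            pipeline
                .apply(
                    Create.of(
                        base.withRingRanges(
                            Collections.singleton(RingRange.of(BigInteger.valueOf(-100), BigInteger.ZERO))),
                        base.withRingRanges(
                            Collections.singleton(RingRange.of(BigInteger.ZERO, BigInteger.valueOf(100))))))
                .apply(CassandraIO.<Person>readAll().withCoder(SerializableCoder.of(Person.class)));

        pipeline.run().waitUntilFinish();
      }
    }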

    Writing to Apache Cassandra

    * *

    {@code CassandraIO} provides a sink to write a collection of entities to Apache Cassandra. @@ -137,6 +138,11 @@ public static Read read() { return new AutoValue_CassandraIO_Read.Builder().build(); } + /** Provide a {@link ReadAll} {@link PTransform} to read data from a Cassandra database. */ + public static ReadAll readAll() { + return new AutoValue_CassandraIO_ReadAll.Builder().build(); + } + /** Provide a {@link Write} {@link PTransform} to write data to a Cassandra database. */ public static Write write() { return Write.builder(MutationType.WRITE).build(); @@ -186,6 +192,9 @@ public abstract static class Read extends PTransform> abstract @Nullable SerializableFunction mapperFactoryFn(); + @Nullable + abstract ValueProvider> ringRanges(); + abstract Builder builder(); /** Specify the hosts of the Apache Cassandra instances. */ @@ -371,6 +380,14 @@ public Read withMapperFactoryFn(SerializableFunction mapperF return builder().setMapperFactoryFn(mapperFactory).build(); } + public Read withRingRanges(Set ringRange) { + return withRingRanges(ValueProvider.StaticValueProvider.of(ringRange)); + } + + public Read withRingRanges(ValueProvider> ringRange) { + return builder().setRingRanges(ringRange).build(); + } + @Override public PCollection expand(PBegin input) { checkArgument((hosts() != null && port() != null), "WithHosts() and withPort() are required"); @@ -379,7 +396,69 @@ public PCollection expand(PBegin input) { checkArgument(entity() != null, "withEntity() is required"); checkArgument(coder() != null, "withCoder() is required"); - return input.apply(org.apache.beam.sdk.io.Read.from(new CassandraSource<>(this, null))); + PCollection> splits = + input + .apply(Create.of(this)) + .apply("Create Splits", ParDo.of(new SplitFn())) + .setCoder(SerializableCoder.of(new TypeDescriptor>() {})); + + return splits.apply("ReadAll", CassandraIO.readAll().withCoder(coder())); + } + + private static class SplitFn extends DoFn, Read> { + @ProcessElement + public void process( + @Element CassandraIO.Read read, OutputReceiver> outputReceiver) { + Set ringRanges = getRingRanges(read); + for (RingRange rr : ringRanges) { + Set subset = ImmutableSet.of(rr); + outputReceiver.output(read.withRingRanges(ImmutableSet.of(rr))); + } + } + + private static Set getRingRanges(Read read) { + try (Cluster cluster = + getCluster( + read.hosts(), + read.port(), + read.username(), + read.password(), + read.localDc(), + read.consistencyLevel(), + read.connectTimeout(), + read.readTimeout())) { + if (isMurmur3Partitioner(cluster)) { + LOG.info("Murmur3Partitioner detected, splitting"); + Integer splitCount; + if (read.minNumberOfSplits() != null && read.minNumberOfSplits().get() != null) { + splitCount = read.minNumberOfSplits().get(); + } else { + splitCount = cluster.getMetadata().getAllHosts().size(); + } + List tokens = + cluster.getMetadata().getTokenRanges().stream() + .map(tokenRange -> new BigInteger(tokenRange.getEnd().getValue().toString())) + .collect(Collectors.toList()); + SplitGenerator splitGenerator = + new SplitGenerator(cluster.getMetadata().getPartitioner()); + + return splitGenerator.generateSplits(splitCount, tokens).stream() + .flatMap(List::stream) + .collect(Collectors.toSet()); + + } else { + LOG.warn( + "Only Murmur3Partitioner is supported for splitting, using an unique source for " + + "the read"); + String partitioner = cluster.getMetadata().getPartitioner(); + RingRange totalRingRange = + RingRange.of( + SplitGenerator.getRangeMin(partitioner), + SplitGenerator.getRangeMax(partitioner)); + 
return Collections.singleton(totalRingRange); + } + } + } } @AutoValue.Builder @@ -418,6 +497,8 @@ abstract static class Builder { abstract Optional> mapperFactoryFn(); + abstract Builder setRingRanges(ValueProvider> ringRange); + abstract Read autoBuild(); public Read build() { @@ -429,390 +510,6 @@ public Read build() { } } - @VisibleForTesting - static class CassandraSource extends BoundedSource { - final Read spec; - final List splitQueries; - // split source ached size - can't be calculated when already split - Long estimatedSize; - private static final String MURMUR3PARTITIONER = "org.apache.cassandra.dht.Murmur3Partitioner"; - - CassandraSource(Read spec, List splitQueries) { - this(spec, splitQueries, null); - } - - private CassandraSource(Read spec, List splitQueries, Long estimatedSize) { - this.estimatedSize = estimatedSize; - this.spec = spec; - this.splitQueries = splitQueries; - } - - @Override - public Coder getOutputCoder() { - return spec.coder(); - } - - @Override - public BoundedReader createReader(PipelineOptions pipelineOptions) { - return new CassandraReader(this); - } - - @Override - public List> split( - long desiredBundleSizeBytes, PipelineOptions pipelineOptions) { - try (Cluster cluster = - getCluster( - spec.hosts(), - spec.port(), - spec.username(), - spec.password(), - spec.localDc(), - spec.consistencyLevel(), - spec.connectTimeout(), - spec.readTimeout())) { - if (isMurmur3Partitioner(cluster)) { - LOG.info("Murmur3Partitioner detected, splitting"); - return splitWithTokenRanges( - spec, desiredBundleSizeBytes, getEstimatedSizeBytes(pipelineOptions), cluster); - } else { - LOG.warn( - "Only Murmur3Partitioner is supported for splitting, using a unique source for " - + "the read"); - return Collections.singletonList( - new CassandraIO.CassandraSource<>(spec, Collections.singletonList(buildQuery(spec)))); - } - } - } - - private static String buildQuery(Read spec) { - return (spec.query() == null) - ? String.format("SELECT * FROM %s.%s", spec.keyspace().get(), spec.table().get()) - : spec.query().get().toString(); - } - - /** - * Compute the number of splits based on the estimated size and the desired bundle size, and - * create several sources. 
- */ - private List> splitWithTokenRanges( - CassandraIO.Read spec, - long desiredBundleSizeBytes, - long estimatedSizeBytes, - Cluster cluster) { - long numSplits = - getNumSplits(desiredBundleSizeBytes, estimatedSizeBytes, spec.minNumberOfSplits()); - LOG.info("Number of desired splits is {}", numSplits); - - SplitGenerator splitGenerator = new SplitGenerator(cluster.getMetadata().getPartitioner()); - List tokens = - cluster.getMetadata().getTokenRanges().stream() - .map(tokenRange -> new BigInteger(tokenRange.getEnd().getValue().toString())) - .collect(Collectors.toList()); - List> splits = splitGenerator.generateSplits(numSplits, tokens); - LOG.info("{} splits were actually generated", splits.size()); - - final String partitionKey = - cluster.getMetadata().getKeyspace(spec.keyspace().get()).getTable(spec.table().get()) - .getPartitionKey().stream() - .map(ColumnMetadata::getName) - .collect(Collectors.joining(",")); - - List tokenRanges = - getTokenRanges(cluster, spec.keyspace().get(), spec.table().get()); - final long estimatedSize = getEstimatedSizeBytesFromTokenRanges(tokenRanges) / splits.size(); - - List> sources = new ArrayList<>(); - for (List split : splits) { - List queries = new ArrayList<>(); - for (RingRange range : split) { - if (range.isWrapping()) { - // A wrapping range is one that overlaps from the end of the partitioner range and its - // start (ie : when the start token of the split is greater than the end token) - // We need to generate two queries here : one that goes from the start token to the end - // of - // the partitioner range, and the other from the start of the partitioner range to the - // end token of the split. - queries.add(generateRangeQuery(spec, partitionKey, range.getStart(), null)); - // Generation of the second query of the wrapping range - queries.add(generateRangeQuery(spec, partitionKey, null, range.getEnd())); - } else { - queries.add(generateRangeQuery(spec, partitionKey, range.getStart(), range.getEnd())); - } - } - sources.add(new CassandraIO.CassandraSource<>(spec, queries, estimatedSize)); - } - return sources; - } - - private static String generateRangeQuery( - Read spec, String partitionKey, BigInteger rangeStart, BigInteger rangeEnd) { - final String rangeFilter = - Joiner.on(" AND ") - .skipNulls() - .join( - rangeStart == null - ? null - : String.format("(token(%s) >= %d)", partitionKey, rangeStart), - rangeEnd == null - ? null - : String.format("(token(%s) < %d)", partitionKey, rangeEnd)); - final String query = - (spec.query() == null) - ? buildQuery(spec) + " WHERE " + rangeFilter - : buildQuery(spec) + " AND " + rangeFilter; - LOG.debug("CassandraIO generated query : {}", query); - return query; - } - - private static long getNumSplits( - long desiredBundleSizeBytes, - long estimatedSizeBytes, - @Nullable ValueProvider minNumberOfSplits) { - long numSplits = - desiredBundleSizeBytes > 0 ? (estimatedSizeBytes / desiredBundleSizeBytes) : 1; - if (numSplits <= 0) { - LOG.warn("Number of splits is less than 0 ({}), fallback to 1", numSplits); - numSplits = 1; - } - return minNumberOfSplits != null ? Math.max(numSplits, minNumberOfSplits.get()) : numSplits; - } - - /** - * Returns cached estimate for split or if missing calculate size for whole table. Highly - * innacurate if query is specified. 
- * - * @param pipelineOptions - * @return - */ - @Override - public long getEstimatedSizeBytes(PipelineOptions pipelineOptions) { - if (estimatedSize != null) { - return estimatedSize; - } else { - try (Cluster cluster = - getCluster( - spec.hosts(), - spec.port(), - spec.username(), - spec.password(), - spec.localDc(), - spec.consistencyLevel(), - spec.connectTimeout(), - spec.readTimeout())) { - if (isMurmur3Partitioner(cluster)) { - try { - List tokenRanges = - getTokenRanges(cluster, spec.keyspace().get(), spec.table().get()); - this.estimatedSize = getEstimatedSizeBytesFromTokenRanges(tokenRanges); - return this.estimatedSize; - } catch (Exception e) { - LOG.warn("Can't estimate the size", e); - return 0L; - } - } else { - LOG.warn("Only Murmur3 partitioner is supported, can't estimate the size"); - return 0L; - } - } - } - } - - @VisibleForTesting - static long getEstimatedSizeBytesFromTokenRanges(List tokenRanges) { - long size = 0L; - for (TokenRange tokenRange : tokenRanges) { - size = size + tokenRange.meanPartitionSize * tokenRange.partitionCount; - } - return Math.round(size / getRingFraction(tokenRanges)); - } - - @Override - public void populateDisplayData(DisplayData.Builder builder) { - super.populateDisplayData(builder); - if (spec.hosts() != null) { - builder.add(DisplayData.item("hosts", spec.hosts().toString())); - } - if (spec.port() != null) { - builder.add(DisplayData.item("port", spec.port())); - } - builder.addIfNotNull(DisplayData.item("keyspace", spec.keyspace())); - builder.addIfNotNull(DisplayData.item("table", spec.table())); - builder.addIfNotNull(DisplayData.item("username", spec.username())); - builder.addIfNotNull(DisplayData.item("localDc", spec.localDc())); - builder.addIfNotNull(DisplayData.item("consistencyLevel", spec.consistencyLevel())); - } - // ------------- CASSANDRA SOURCE UTIL METHODS ---------------// - - /** - * Gets the list of token ranges that a table occupies on a give Cassandra node. - * - *

    NB: This method is compatible with Cassandra 2.1.5 and greater. - */ - private static List getTokenRanges(Cluster cluster, String keyspace, String table) { - try (Session session = cluster.newSession()) { - ResultSet resultSet = - session.execute( - "SELECT range_start, range_end, partitions_count, mean_partition_size FROM " - + "system.size_estimates WHERE keyspace_name = ? AND table_name = ?", - keyspace, - table); - - ArrayList tokenRanges = new ArrayList<>(); - for (Row row : resultSet) { - TokenRange tokenRange = - new TokenRange( - row.getLong("partitions_count"), - row.getLong("mean_partition_size"), - new BigInteger(row.getString("range_start")), - new BigInteger(row.getString("range_end"))); - tokenRanges.add(tokenRange); - } - // The table may not contain the estimates yet - // or have partitions_count and mean_partition_size fields = 0 - // if the data was just inserted and the amount of data in the table was small. - // This is very common situation during tests, - // when we insert a few rows and immediately query them. - // However, for tiny data sets the lack of size estimates is not a problem at all, - // because we don't want to split tiny data anyways. - // Therefore, we're not issuing a warning if the result set was empty - // or mean_partition_size and partitions_count = 0. - return tokenRanges; - } - } - - /** Compute the percentage of token addressed compared with the whole tokens in the cluster. */ - @VisibleForTesting - static double getRingFraction(List tokenRanges) { - double ringFraction = 0; - for (TokenRange tokenRange : tokenRanges) { - ringFraction = - ringFraction - + (distance(tokenRange.rangeStart, tokenRange.rangeEnd).doubleValue() - / SplitGenerator.getRangeSize(MURMUR3PARTITIONER).doubleValue()); - } - return ringFraction; - } - - /** - * Check if the current partitioner is the Murmur3 (default in Cassandra version newer than 2). - */ - @VisibleForTesting - static boolean isMurmur3Partitioner(Cluster cluster) { - return MURMUR3PARTITIONER.equals(cluster.getMetadata().getPartitioner()); - } - - /** Measure distance between two tokens. */ - @VisibleForTesting - static BigInteger distance(BigInteger left, BigInteger right) { - return (right.compareTo(left) > 0) - ? right.subtract(left) - : right.subtract(left).add(SplitGenerator.getRangeSize(MURMUR3PARTITIONER)); - } - - /** - * Represent a token range in Cassandra instance, wrapping the partition count, size and token - * range. 
- */ - @VisibleForTesting - static class TokenRange { - private final long partitionCount; - private final long meanPartitionSize; - private final BigInteger rangeStart; - private final BigInteger rangeEnd; - - TokenRange( - long partitionCount, long meanPartitionSize, BigInteger rangeStart, BigInteger rangeEnd) { - this.partitionCount = partitionCount; - this.meanPartitionSize = meanPartitionSize; - this.rangeStart = rangeStart; - this.rangeEnd = rangeEnd; - } - } - - private class CassandraReader extends BoundedSource.BoundedReader { - private final CassandraIO.CassandraSource source; - private Cluster cluster; - private Session session; - private Iterator iterator; - private T current; - - CassandraReader(CassandraSource source) { - this.source = source; - } - - @Override - public boolean start() { - LOG.debug("Starting Cassandra reader"); - cluster = - getCluster( - source.spec.hosts(), - source.spec.port(), - source.spec.username(), - source.spec.password(), - source.spec.localDc(), - source.spec.consistencyLevel(), - source.spec.connectTimeout(), - source.spec.readTimeout()); - session = cluster.connect(source.spec.keyspace().get()); - LOG.debug("Queries: " + source.splitQueries); - List futures = new ArrayList<>(); - for (String query : source.splitQueries) { - futures.add(session.executeAsync(query)); - } - - final Mapper mapper = getMapper(session, source.spec.entity()); - - for (ResultSetFuture result : futures) { - if (iterator == null) { - iterator = mapper.map(result.getUninterruptibly()); - } else { - iterator = Iterators.concat(iterator, mapper.map(result.getUninterruptibly())); - } - } - - return advance(); - } - - @Override - public boolean advance() { - if (iterator.hasNext()) { - current = iterator.next(); - return true; - } - current = null; - return false; - } - - @Override - public void close() { - LOG.debug("Closing Cassandra reader"); - if (session != null) { - session.close(); - } - if (cluster != null) { - cluster.close(); - } - } - - @Override - public T getCurrent() throws NoSuchElementException { - if (current == null) { - throw new NoSuchElementException(); - } - return current; - } - - @Override - public CassandraIO.CassandraSource getCurrentSource() { - return source; - } - - private Mapper getMapper(Session session, Class enitity) { - return source.spec.mapperFactoryFn().apply(session); - } - } - } - /** Specify the mutation type: either write or delete. */ public enum MutationType { WRITE, @@ -1136,6 +833,11 @@ public void processElement(ProcessContext c) throws ExecutionException, Interrup writer.mutate(c.element()); } + @FinishBundle + public void finishBundle() throws Exception { + writer.flush(); + } + @Teardown public void teardown() throws Exception { writer.close(); @@ -1161,6 +863,11 @@ public void processElement(ProcessContext c) throws ExecutionException, Interrup deleter.mutate(c.element()); } + @FinishBundle + public void finishBundle() throws Exception { + deleter.flush(); + } + @Teardown public void teardown() throws Exception { deleter.close(); @@ -1169,7 +876,7 @@ public void teardown() throws Exception { } /** Get a Cassandra cluster using hosts and port. 
*/ - private static Cluster getCluster( + static Cluster getCluster( ValueProvider> hosts, ValueProvider port, ValueProvider username, @@ -1269,12 +976,14 @@ void mutate(T entity) throws ExecutionException, InterruptedException { } } - void close() throws ExecutionException, InterruptedException { + void flush() throws ExecutionException, InterruptedException { if (this.mutateFutures.size() > 0) { // Waiting for the last in flight async queries to return before finishing the bundle. waitForFuturesToFinish(); } + } + void close() { if (session != null) { session.close(); } @@ -1289,4 +998,53 @@ private void waitForFuturesToFinish() throws ExecutionException, InterruptedExce } } } + + /** + * A {@link PTransform} to read data from Apache Cassandra. See {@link CassandraIO} for more + * information on usage and configuration. + */ + @AutoValue + public abstract static class ReadAll extends PTransform>, PCollection> { + @AutoValue.Builder + abstract static class Builder { + + abstract Builder setCoder(Coder coder); + + abstract ReadAll autoBuild(); + + public ReadAll build() { + return autoBuild(); + } + } + + @Nullable + abstract Coder coder(); + + abstract Builder builder(); + + /** Specify the {@link Coder} used to serialize the entity in the {@link PCollection}. */ + public ReadAll withCoder(Coder coder) { + checkArgument(coder != null, "coder can not be null"); + return builder().setCoder(coder).build(); + } + + @Override + public PCollection expand(PCollection> input) { + checkArgument(coder() != null, "withCoder() is required"); + return input + .apply("Reshuffle", Reshuffle.viaRandomKey()) + .apply("Read", ParDo.of(new ReadFn<>())) + .setCoder(this.coder()); + } + } + + /** + * Check if the current partitioner is the Murmur3 (default in Cassandra version newer than 2). + */ + @VisibleForTesting + private static boolean isMurmur3Partitioner(Cluster cluster) { + return MURMUR3PARTITIONER.equals(cluster.getMetadata().getPartitioner()); + } + + private static final String MURMUR3PARTITIONER = "org.apache.cassandra.dht.Murmur3Partitioner"; } diff --git a/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/ConnectionManager.java b/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/ConnectionManager.java new file mode 100644 index 000000000000..5091ac40b936 --- /dev/null +++ b/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/ConnectionManager.java @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.cassandra; + +import com.datastax.driver.core.Cluster; +import com.datastax.driver.core.Session; +import java.util.Objects; +import java.util.concurrent.ConcurrentHashMap; +import org.apache.beam.sdk.io.cassandra.CassandraIO.Read; +import org.apache.beam.sdk.options.ValueProvider; + +@SuppressWarnings({ + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) +public class ConnectionManager { + + private static final ConcurrentHashMap clusterMap = + new ConcurrentHashMap(); + private static final ConcurrentHashMap sessionMap = + new ConcurrentHashMap(); + + static { + Runtime.getRuntime() + .addShutdownHook( + new Thread( + () -> { + for (Session session : sessionMap.values()) { + if (!session.isClosed()) { + session.close(); + } + } + })); + } + + private static String readToClusterHash(Read read) { + return Objects.requireNonNull(read.hosts()).get().stream().reduce(",", (a, b) -> a + b) + + Objects.requireNonNull(read.port()).get() + + safeVPGet(read.localDc()) + + safeVPGet(read.consistencyLevel()); + } + + private static String readToSessionHash(Read read) { + return readToClusterHash(read) + read.keyspace().get(); + } + + static Session getSession(Read read) { + Cluster cluster = + clusterMap.computeIfAbsent( + readToClusterHash(read), + k -> + CassandraIO.getCluster( + Objects.requireNonNull(read.hosts()), + Objects.requireNonNull(read.port()), + read.username(), + read.password(), + read.localDc(), + read.consistencyLevel(), + read.connectTimeout(), + read.readTimeout())); + return sessionMap.computeIfAbsent( + readToSessionHash(read), + k -> cluster.connect(Objects.requireNonNull(read.keyspace()).get())); + } + + private static String safeVPGet(ValueProvider s) { + return s != null ? s.get() : ""; + } +} diff --git a/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/DefaultObjectMapper.java b/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/DefaultObjectMapper.java index 8f6d5781eac6..92ec2c58d8b2 100644 --- a/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/DefaultObjectMapper.java +++ b/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/DefaultObjectMapper.java @@ -34,7 +34,7 @@ }) class DefaultObjectMapper implements Mapper, Serializable { - private transient com.datastax.driver.mapping.Mapper mapper; + private final transient com.datastax.driver.mapping.Mapper mapper; DefaultObjectMapper(com.datastax.driver.mapping.Mapper mapper) { this.mapper = mapper; diff --git a/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/DefaultObjectMapperFactory.java b/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/DefaultObjectMapperFactory.java index 7976665905b7..ef75ff312aca 100644 --- a/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/DefaultObjectMapperFactory.java +++ b/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/DefaultObjectMapperFactory.java @@ -34,7 +34,7 @@ class DefaultObjectMapperFactory implements SerializableFunction { private transient MappingManager mappingManager; - Class entity; + final Class entity; DefaultObjectMapperFactory(Class entity) { this.entity = entity; diff --git a/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/ReadFn.java b/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/ReadFn.java new file mode 100644 index 000000000000..193cdf0a3d8c --- /dev/null +++ 
b/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/ReadFn.java @@ -0,0 +1,120 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.cassandra; + +import com.datastax.driver.core.Cluster; +import com.datastax.driver.core.ColumnMetadata; +import com.datastax.driver.core.PreparedStatement; +import com.datastax.driver.core.ResultSet; +import com.datastax.driver.core.Session; +import com.datastax.driver.core.Token; +import java.util.Collections; +import java.util.Iterator; +import java.util.Set; +import java.util.stream.Collectors; +import org.apache.beam.sdk.io.cassandra.CassandraIO.Read; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Joiner; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +@SuppressWarnings({ + "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) +class ReadFn extends DoFn, T> { + + private static final Logger LOG = LoggerFactory.getLogger(ReadFn.class); + + @ProcessElement + public void processElement(@Element Read read, OutputReceiver receiver) { + try { + Session session = ConnectionManager.getSession(read); + Mapper mapper = read.mapperFactoryFn().apply(session); + String partitionKey = + session.getCluster().getMetadata().getKeyspace(read.keyspace().get()) + .getTable(read.table().get()).getPartitionKey().stream() + .map(ColumnMetadata::getName) + .collect(Collectors.joining(",")); + + String query = generateRangeQuery(read, partitionKey, read.ringRanges() != null); + PreparedStatement preparedStatement = session.prepare(query); + Set ringRanges = + read.ringRanges() == null ? 
Collections.emptySet() : read.ringRanges().get(); + + for (RingRange rr : ringRanges) { + Token startToken = session.getCluster().getMetadata().newToken(rr.getStart().toString()); + Token endToken = session.getCluster().getMetadata().newToken(rr.getEnd().toString()); + ResultSet rs = + session.execute(preparedStatement.bind().setToken(0, startToken).setToken(1, endToken)); + Iterator iter = mapper.map(rs); + while (iter.hasNext()) { + T n = iter.next(); + receiver.output(n); + } + } + + if (read.ringRanges() == null) { + ResultSet rs = session.execute(preparedStatement.bind()); + Iterator iter = mapper.map(rs); + while (iter.hasNext()) { + receiver.output(iter.next()); + } + } + } catch (Exception ex) { + LOG.error("error", ex); + } + } + + private Session getSession(Read read) { + Cluster cluster = + CassandraIO.getCluster( + read.hosts(), + read.port(), + read.username(), + read.password(), + read.localDc(), + read.consistencyLevel(), + read.connectTimeout(), + read.readTimeout()); + + return cluster.connect(read.keyspace().get()); + } + + private static String generateRangeQuery( + Read spec, String partitionKey, Boolean hasRingRange) { + final String rangeFilter = + (hasRingRange) + ? Joiner.on(" AND ") + .skipNulls() + .join( + String.format("(token(%s) >= ?)", partitionKey), + String.format("(token(%s) < ?)", partitionKey)) + : ""; + final String combinedQuery = buildInitialQuery(spec, hasRingRange) + rangeFilter; + LOG.debug("CassandraIO generated query : {}", combinedQuery); + return combinedQuery; + } + + private static String buildInitialQuery(Read spec, Boolean hasRingRange) { + return (spec.query() == null) + ? String.format("SELECT * FROM %s.%s", spec.keyspace().get(), spec.table().get()) + + " WHERE " + : spec.query().get() + (hasRingRange ? " AND " : ""); + } +} diff --git a/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/RingRange.java b/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/RingRange.java index b5f94d7c0c9b..c83e47fcac9a 100644 --- a/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/RingRange.java +++ b/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/RingRange.java @@ -17,23 +17,31 @@ */ package org.apache.beam.sdk.io.cassandra; +import java.io.Serializable; import java.math.BigInteger; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; /** Models a Cassandra token range. 
*/ -final class RingRange { +@Experimental(Kind.SOURCE_SINK) +@SuppressWarnings({ + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) +public final class RingRange implements Serializable { private final BigInteger start; private final BigInteger end; - RingRange(BigInteger start, BigInteger end) { + private RingRange(BigInteger start, BigInteger end) { this.start = start; this.end = end; } - BigInteger getStart() { + public BigInteger getStart() { return start; } - BigInteger getEnd() { + public BigInteger getEnd() { return end; } @@ -55,4 +63,34 @@ public boolean isWrapping() { public String toString() { return String.format("(%s,%s]", start.toString(), end.toString()); } + + public static RingRange of(BigInteger start, BigInteger end) { + return new RingRange(start, end); + } + + @Override + public boolean equals(@Nullable Object o) { + if (this == o) { + return true; + } + if (o == null || getClass() != o.getClass()) { + return false; + } + + RingRange ringRange = (RingRange) o; + + if (getStart() != null + ? !getStart().equals(ringRange.getStart()) + : ringRange.getStart() != null) { + return false; + } + return getEnd() != null ? getEnd().equals(ringRange.getEnd()) : ringRange.getEnd() == null; + } + + @Override + public int hashCode() { + int result = getStart() != null ? getStart().hashCode() : 0; + result = 31 * result + (getEnd() != null ? getEnd().hashCode() : 0); + return result; + } } diff --git a/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/SplitGenerator.java b/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/SplitGenerator.java index de494212d560..bc1205a28797 100644 --- a/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/SplitGenerator.java +++ b/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/SplitGenerator.java @@ -39,22 +39,22 @@ final class SplitGenerator { this.partitioner = partitioner; } - private static BigInteger getRangeMin(String partitioner) { + static BigInteger getRangeMin(String partitioner) { if (partitioner.endsWith("RandomPartitioner")) { return BigInteger.ZERO; } else if (partitioner.endsWith("Murmur3Partitioner")) { - return new BigInteger("2").pow(63).negate(); + return BigInteger.valueOf(2).pow(63).negate(); } else { throw new UnsupportedOperationException( "Unsupported partitioner. " + "Only Random and Murmur3 are supported"); } } - private static BigInteger getRangeMax(String partitioner) { + static BigInteger getRangeMax(String partitioner) { if (partitioner.endsWith("RandomPartitioner")) { - return new BigInteger("2").pow(127).subtract(BigInteger.ONE); + return BigInteger.valueOf(2).pow(127).subtract(BigInteger.ONE); } else if (partitioner.endsWith("Murmur3Partitioner")) { - return new BigInteger("2").pow(63).subtract(BigInteger.ONE); + return BigInteger.valueOf(2).pow(63).subtract(BigInteger.ONE); } else { throw new UnsupportedOperationException( "Unsupported partitioner. 
" + "Only Random and Murmur3 are supported"); @@ -84,7 +84,7 @@ List> generateSplits(long totalSplitCount, List ring BigInteger start = ringTokens.get(i); BigInteger stop = ringTokens.get((i + 1) % tokenRangeCount); - if (!inRange(start) || !inRange(stop)) { + if (!isInRange(start) || !isInRange(stop)) { throw new RuntimeException( String.format("Tokens (%s,%s) not in range of %s", start, stop, partitioner)); } @@ -127,7 +127,7 @@ List> generateSplits(long totalSplitCount, List ring // Append the splits between the endpoints for (int j = 0; j < splitCount; j++) { - splits.add(new RingRange(endpointTokens.get(j), endpointTokens.get(j + 1))); + splits.add(RingRange.of(endpointTokens.get(j), endpointTokens.get(j + 1))); LOG.debug("Split #{}: [{},{})", j + 1, endpointTokens.get(j), endpointTokens.get(j + 1)); } } @@ -144,7 +144,7 @@ List> generateSplits(long totalSplitCount, List ring return coalesceSplits(getTargetSplitSize(totalSplitCount), splits); } - private boolean inRange(BigInteger token) { + private boolean isInRange(BigInteger token) { return !(token.compareTo(rangeMin) < 0 || token.compareTo(rangeMax) > 0); } diff --git a/sdks/java/io/cassandra/src/test/java/org/apache/beam/sdk/io/cassandra/CassandraIOIT.java b/sdks/java/io/cassandra/src/test/java/org/apache/beam/sdk/io/cassandra/CassandraIOIT.java index d9ed2ab998ec..1f936761f5ed 100644 --- a/sdks/java/io/cassandra/src/test/java/org/apache/beam/sdk/io/cassandra/CassandraIOIT.java +++ b/sdks/java/io/cassandra/src/test/java/org/apache/beam/sdk/io/cassandra/CassandraIOIT.java @@ -60,9 +60,6 @@ * */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class CassandraIOIT implements Serializable { /** CassandraIOIT options. 
*/ diff --git a/sdks/java/io/cassandra/src/test/java/org/apache/beam/sdk/io/cassandra/CassandraIOTest.java b/sdks/java/io/cassandra/src/test/java/org/apache/beam/sdk/io/cassandra/CassandraIOTest.java index bd85cb158d4f..131ce83b48dd 100644 --- a/sdks/java/io/cassandra/src/test/java/org/apache/beam/sdk/io/cassandra/CassandraIOTest.java +++ b/sdks/java/io/cassandra/src/test/java/org/apache/beam/sdk/io/cassandra/CassandraIOTest.java @@ -18,23 +18,18 @@ package org.apache.beam.sdk.io.cassandra; import static junit.framework.TestCase.assertTrue; -import static org.apache.beam.sdk.io.cassandra.CassandraIO.CassandraSource.distance; -import static org.apache.beam.sdk.io.cassandra.CassandraIO.CassandraSource.getEstimatedSizeBytesFromTokenRanges; -import static org.apache.beam.sdk.io.cassandra.CassandraIO.CassandraSource.getRingFraction; -import static org.apache.beam.sdk.io.cassandra.CassandraIO.CassandraSource.isMurmur3Partitioner; -import static org.apache.beam.sdk.testing.SourceTestUtils.readFromSource; -import static org.hamcrest.Matchers.greaterThan; -import static org.hamcrest.Matchers.lessThan; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNull; -import static org.junit.Assert.assertThat; import com.datastax.driver.core.Cluster; +import com.datastax.driver.core.Metadata; +import com.datastax.driver.core.ProtocolVersion; import com.datastax.driver.core.ResultSet; import com.datastax.driver.core.Row; import com.datastax.driver.core.Session; +import com.datastax.driver.core.TypeCodec; import com.datastax.driver.core.exceptions.NoHostAvailableException; -import com.datastax.driver.core.querybuilder.QueryBuilder; +import com.datastax.driver.mapping.annotations.ClusteringColumn; import com.datastax.driver.mapping.annotations.Column; import com.datastax.driver.mapping.annotations.Computed; import com.datastax.driver.mapping.annotations.PartitionKey; @@ -44,10 +39,14 @@ import java.io.IOException; import java.io.Serializable; import java.math.BigInteger; +import java.nio.ByteBuffer; import java.nio.file.Files; import java.nio.file.Paths; import java.util.ArrayList; +import java.util.Arrays; import java.util.Collections; +import java.util.HashMap; +import java.util.HashSet; import java.util.Iterator; import java.util.List; import java.util.concurrent.Callable; @@ -61,13 +60,8 @@ import javax.management.remote.JMXConnectorFactory; import javax.management.remote.JMXServiceURL; import org.apache.beam.sdk.coders.SerializableCoder; -import org.apache.beam.sdk.io.BoundedSource; -import org.apache.beam.sdk.io.cassandra.CassandraIO.CassandraSource.TokenRange; import org.apache.beam.sdk.io.common.NetworkTestHelper; -import org.apache.beam.sdk.options.PipelineOptions; -import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.testing.PAssert; -import org.apache.beam.sdk.testing.SourceTestUtils; import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.transforms.Count; import org.apache.beam.sdk.transforms.Create; @@ -82,7 +76,6 @@ import org.apache.cassandra.service.StorageServiceMBean; import org.checkerframework.checker.nullness.qual.Nullable; import org.junit.AfterClass; -import org.junit.Assert; import org.junit.BeforeClass; import org.junit.ClassRule; import org.junit.Rule; @@ -97,10 +90,9 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class CassandraIOTest implements 
Serializable { - private static final long NUM_ROWS = 20L; + private static final long NUM_ROWS = 22L; private static final String CASSANDRA_KEYSPACE = "beam_ks"; private static final String CASSANDRA_HOST = "127.0.0.1"; private static final String CASSANDRA_TABLE = "scientist"; @@ -191,39 +183,44 @@ private static void insertData() throws Exception { LOG.info("Create Cassandra tables"); session.execute( String.format( - "CREATE TABLE IF NOT EXISTS %s.%s(person_id int, person_name text, PRIMARY KEY" - + "(person_id));", + "CREATE TABLE IF NOT EXISTS %s.%s(person_department text, person_id int, person_name text, PRIMARY KEY" + + "((person_department), person_id));", CASSANDRA_KEYSPACE, CASSANDRA_TABLE)); session.execute( String.format( - "CREATE TABLE IF NOT EXISTS %s.%s(person_id int, person_name text, PRIMARY KEY" - + "(person_id));", + "CREATE TABLE IF NOT EXISTS %s.%s(person_department text, person_id int, person_name text, PRIMARY KEY" + + "((person_department), person_id));", CASSANDRA_KEYSPACE, CASSANDRA_TABLE_WRITE)); LOG.info("Insert records"); - String[] scientists = { - "Einstein", - "Darwin", - "Copernicus", - "Pasteur", - "Curie", - "Faraday", - "Newton", - "Bohr", - "Galilei", - "Maxwell" + String[][] scientists = { + new String[] {"phys", "Einstein"}, + new String[] {"bio", "Darwin"}, + new String[] {"phys", "Copernicus"}, + new String[] {"bio", "Pasteur"}, + new String[] {"bio", "Curie"}, + new String[] {"phys", "Faraday"}, + new String[] {"math", "Newton"}, + new String[] {"phys", "Bohr"}, + new String[] {"phys", "Galileo"}, + new String[] {"math", "Maxwell"}, + new String[] {"logic", "Russel"}, }; for (int i = 0; i < NUM_ROWS; i++) { int index = i % scientists.length; - session.execute( + String insertStr = String.format( - "INSERT INTO %s.%s(person_id, person_name) values(" + "INSERT INTO %s.%s(person_department, person_id, person_name) values(" + + "'" + + scientists[index][0] + + "', " + i + ", '" - + scientists[index] + + scientists[index][1] + "');", CASSANDRA_KEYSPACE, - CASSANDRA_TABLE)); + CASSANDRA_TABLE); + session.execute(insertStr); } flushMemTablesAndRefreshSizeEstimates(); } @@ -277,25 +274,6 @@ private static void disableAutoCompaction() throws Exception { Thread.sleep(JMX_CONF_TIMEOUT); } - @Test - public void testEstimatedSizeBytes() throws Exception { - PipelineOptions pipelineOptions = PipelineOptionsFactory.create(); - CassandraIO.Read read = - CassandraIO.read() - .withHosts(Collections.singletonList(CASSANDRA_HOST)) - .withPort(cassandraPort) - .withKeyspace(CASSANDRA_KEYSPACE) - .withTable(CASSANDRA_TABLE); - CassandraIO.CassandraSource source = new CassandraIO.CassandraSource<>(read, null); - long estimatedSizeBytes = source.getEstimatedSizeBytes(pipelineOptions); - // the size is non determanistic in Cassandra backend: checks that estimatedSizeBytes >= 12960L - // -20% && estimatedSizeBytes <= 12960L +20% - assertThat( - "wrong estimated size in " + CASSANDRA_KEYSPACE + "/" + CASSANDRA_TABLE, - estimatedSizeBytes, - greaterThan(0L)); - } - @Test public void testRead() throws Exception { PCollection output = @@ -305,6 +283,7 @@ public void testRead() throws Exception { .withPort(cassandraPort) .withKeyspace(CASSANDRA_KEYSPACE) .withTable(CASSANDRA_TABLE) + .withMinNumberOfSplits(50) .withCoder(SerializableCoder.of(Scientist.class)) .withEntity(Scientist.class)); @@ -322,9 +301,113 @@ public KV apply(Scientist scientist) { PAssert.that(mapped.apply("Count occurrences per scientist", Count.perKey())) .satisfies( input -> { + int count = 0; for (KV 
element : input) { + count++; assertEquals(element.getKey(), NUM_ROWS / 10, element.getValue().longValue()); } + assertEquals(11, count); + return null; + }); + + pipeline.run(); + } + + private CassandraIO.Read getReadWithRingRange(RingRange... rr) { + return CassandraIO.read() + .withHosts(Collections.singletonList(CASSANDRA_HOST)) + .withPort(cassandraPort) + .withRingRanges(new HashSet<>(Arrays.asList(rr))) + .withKeyspace(CASSANDRA_KEYSPACE) + .withTable(CASSANDRA_TABLE) + .withCoder(SerializableCoder.of(Scientist.class)) + .withEntity(Scientist.class); + } + + private CassandraIO.Read getReadWithQuery(String query) { + return CassandraIO.read() + .withHosts(Collections.singletonList(CASSANDRA_HOST)) + .withPort(cassandraPort) + .withQuery(query) + .withKeyspace(CASSANDRA_KEYSPACE) + .withTable(CASSANDRA_TABLE) + .withCoder(SerializableCoder.of(Scientist.class)) + .withEntity(Scientist.class); + } + + @Test + public void testReadAllQuery() { + String physQuery = + String.format( + "SELECT * From %s.%s WHERE person_department='phys' AND person_id=0;", + CASSANDRA_KEYSPACE, CASSANDRA_TABLE); + + String mathQuery = + String.format( + "SELECT * From %s.%s WHERE person_department='math' AND person_id=6;", + CASSANDRA_KEYSPACE, CASSANDRA_TABLE); + + PCollection output = + pipeline + .apply(Create.of(getReadWithQuery(physQuery), getReadWithQuery(mathQuery))) + .apply( + CassandraIO.readAll().withCoder(SerializableCoder.of(Scientist.class))); + + PCollection mapped = + output.apply( + MapElements.via( + new SimpleFunction() { + @Override + public String apply(Scientist scientist) { + return scientist.name; + } + })); + PAssert.that(mapped).containsInAnyOrder("Einstein", "Newton"); + PAssert.thatSingleton(output.apply("count", Count.globally())).isEqualTo(2L); + pipeline.run(); + } + + @Test + public void testReadAllRingRange() { + RingRange physRR = + fromEncodedKey( + cluster.getMetadata(), TypeCodec.varchar().serialize("phys", ProtocolVersion.V3)); + + RingRange mathRR = + fromEncodedKey( + cluster.getMetadata(), TypeCodec.varchar().serialize("math", ProtocolVersion.V3)); + + RingRange logicRR = + fromEncodedKey( + cluster.getMetadata(), TypeCodec.varchar().serialize("logic", ProtocolVersion.V3)); + + PCollection output = + pipeline + .apply(Create.of(getReadWithRingRange(physRR), getReadWithRingRange(mathRR, logicRR))) + .apply( + CassandraIO.readAll().withCoder(SerializableCoder.of(Scientist.class))); + + PCollection> mapped = + output.apply( + MapElements.via( + new SimpleFunction>() { + @Override + public KV apply(Scientist scientist) { + return KV.of(scientist.department, scientist.id); + } + })); + + PAssert.that(mapped.apply("Count occurrences per department", Count.perKey())) + .satisfies( + input -> { + HashMap map = new HashMap<>(); + for (KV element : input) { + map.put(element.getKey(), element.getValue()); + } + assertEquals(3, map.size()); // do we have all three departments + assertEquals(map.get("phys"), 10L, 0L); + assertEquals(map.get("math"), 4L, 0L); + assertEquals(map.get("logic"), 2L, 0L); return null; }); @@ -340,8 +423,9 @@ public void testReadWithQuery() throws Exception { .withPort(cassandraPort) .withKeyspace(CASSANDRA_KEYSPACE) .withTable(CASSANDRA_TABLE) + .withMinNumberOfSplits(20) .withQuery( - "select person_id, writetime(person_name) from beam_ks.scientist where person_id=10") + "select person_id, writetime(person_name) from beam_ks.scientist where person_id=10 AND person_department='logic'") .withCoder(SerializableCoder.of(Scientist.class)) 
.withEntity(Scientist.class)); @@ -366,6 +450,7 @@ public void testWrite() { ScientistWrite scientist = new ScientistWrite(); scientist.id = i; scientist.name = "Name " + i; + scientist.department = "bio"; data.add(scientist); } @@ -486,52 +571,6 @@ public void testCustomMapperImplDelete() { assertEquals(1, counter.intValue()); } - @Test - public void testSplit() throws Exception { - PipelineOptions options = PipelineOptionsFactory.create(); - CassandraIO.Read read = - CassandraIO.read() - .withHosts(Collections.singletonList(CASSANDRA_HOST)) - .withPort(cassandraPort) - .withKeyspace(CASSANDRA_KEYSPACE) - .withTable(CASSANDRA_TABLE) - .withEntity(Scientist.class) - .withCoder(SerializableCoder.of(Scientist.class)); - - // initialSource will be read without splitting (which does not happen in production) - // so we need to provide splitQueries to avoid NPE in source.reader.start() - String splitQuery = QueryBuilder.select().from(CASSANDRA_KEYSPACE, CASSANDRA_TABLE).toString(); - CassandraIO.CassandraSource initialSource = - new CassandraIO.CassandraSource<>(read, Collections.singletonList(splitQuery)); - int desiredBundleSizeBytes = 2048; - long estimatedSize = initialSource.getEstimatedSizeBytes(options); - List> splits = initialSource.split(desiredBundleSizeBytes, options); - SourceTestUtils.assertSourcesEqualReferenceSource(initialSource, splits, options); - float expectedNumSplitsloat = - (float) initialSource.getEstimatedSizeBytes(options) / desiredBundleSizeBytes; - long sum = 0; - - for (BoundedSource subSource : splits) { - sum += subSource.getEstimatedSizeBytes(options); - } - - // due to division and cast estimateSize != sum but will be close. Exact equals checked below - assertEquals((long) (estimatedSize / splits.size()) * splits.size(), sum); - - int expectedNumSplits = (int) Math.ceil(expectedNumSplitsloat); - assertEquals("Wrong number of splits", expectedNumSplits, splits.size()); - int emptySplits = 0; - for (BoundedSource subSource : splits) { - if (readFromSource(subSource, options).isEmpty()) { - emptySplits += 1; - } - } - assertThat( - "There are too many empty splits, parallelism is sub-optimal", - emptySplits, - lessThan((int) (ACCEPTABLE_EMPTY_SPLITS_PERCENTAGE * splits.size()))); - } - private List getRows(String table) { ResultSet result = session.execute( @@ -546,6 +585,7 @@ public void testDelete() throws Exception { Scientist einstein = new Scientist(); einstein.id = 0; + einstein.department = "phys"; einstein.name = "Einstein"; pipeline .apply(Create.of(einstein)) @@ -562,7 +602,8 @@ public void testDelete() throws Exception { // re-insert suppressed doc to make the test autonomous session.execute( String.format( - "INSERT INTO %s.%s(person_id, person_name) values(" + "INSERT INTO %s.%s(person_department, person_id, person_name) values(" + + "'phys', " + einstein.id + ", '" + einstein.name @@ -571,58 +612,6 @@ public void testDelete() throws Exception { CASSANDRA_TABLE)); } - @Test - public void testValidPartitioner() { - Assert.assertTrue(isMurmur3Partitioner(cluster)); - } - - @Test - public void testDistance() { - BigInteger distance = distance(new BigInteger("10"), new BigInteger("100")); - assertEquals(BigInteger.valueOf(90), distance); - - distance = distance(new BigInteger("100"), new BigInteger("10")); - assertEquals(new BigInteger("18446744073709551526"), distance); - } - - @Test - public void testRingFraction() { - // simulate a first range taking "half" of the available tokens - List tokenRanges = new ArrayList<>(); - tokenRanges.add(new 
TokenRange(1, 1, BigInteger.valueOf(Long.MIN_VALUE), new BigInteger("0"))); - assertEquals(0.5, getRingFraction(tokenRanges), 0); - - // add a second range to cover all tokens available - tokenRanges.add(new TokenRange(1, 1, new BigInteger("0"), BigInteger.valueOf(Long.MAX_VALUE))); - assertEquals(1.0, getRingFraction(tokenRanges), 0); - } - - @Test - public void testEstimatedSizeBytesFromTokenRanges() { - List tokenRanges = new ArrayList<>(); - // one partition containing all tokens, the size is actually the size of the partition - tokenRanges.add( - new TokenRange( - 1, 1000, BigInteger.valueOf(Long.MIN_VALUE), BigInteger.valueOf(Long.MAX_VALUE))); - assertEquals(1000, getEstimatedSizeBytesFromTokenRanges(tokenRanges)); - - // one partition with half of the tokens, we estimate the size to the double of this partition - tokenRanges = new ArrayList<>(); - tokenRanges.add( - new TokenRange(1, 1000, BigInteger.valueOf(Long.MIN_VALUE), new BigInteger("0"))); - assertEquals(2000, getEstimatedSizeBytesFromTokenRanges(tokenRanges)); - - // we have three partitions covering all tokens, the size is the sum of partition size * - // partition count - tokenRanges = new ArrayList<>(); - tokenRanges.add( - new TokenRange(1, 1000, BigInteger.valueOf(Long.MIN_VALUE), new BigInteger("-3"))); - tokenRanges.add(new TokenRange(1, 1000, new BigInteger("-2"), new BigInteger("10000"))); - tokenRanges.add( - new TokenRange(2, 3000, new BigInteger("10001"), BigInteger.valueOf(Long.MAX_VALUE))); - assertEquals(8000, getEstimatedSizeBytesFromTokenRanges(tokenRanges)); - } - /** Simple Cassandra entity used in read tests. */ @Table(name = CASSANDRA_TABLE, keyspace = CASSANDRA_KEYSPACE) static class Scientist implements Serializable { @@ -633,10 +622,14 @@ static class Scientist implements Serializable { @Computed("writetime(person_name)") Long nameTs; - @PartitionKey() + @ClusteringColumn() @Column(name = "person_id") int id; + @PartitionKey + @Column(name = "person_department") + String department; + @Override public String toString() { return id + ":" + name; @@ -651,7 +644,9 @@ public boolean equals(@Nullable Object o) { return false; } Scientist scientist = (Scientist) o; - return id == scientist.id && Objects.equal(name, scientist.name); + return id == scientist.id + && Objects.equal(name, scientist.name) + && Objects.equal(department, scientist.department); } @Override @@ -660,6 +655,11 @@ public int hashCode() { } } + private static RingRange fromEncodedKey(Metadata metadata, ByteBuffer... bb) { + BigInteger bi = BigInteger.valueOf((long) metadata.newToken(bb).getValue()); + return RingRange.of(bi, bi.add(BigInteger.valueOf(1L))); + } + private static final String CASSANDRA_TABLE_WRITE = "scientist_write"; /** Simple Cassandra entity used in write tests. 
*/ @Table(name = CASSANDRA_TABLE_WRITE, keyspace = CASSANDRA_KEYSPACE) diff --git a/sdks/java/io/clickhouse/build.gradle b/sdks/java/io/clickhouse/build.gradle index 42deda7c14ab..3fd2abfcfdd3 100644 --- a/sdks/java/io/clickhouse/build.gradle +++ b/sdks/java/io/clickhouse/build.gradle @@ -49,12 +49,15 @@ idea { } } -def clickhouse_jdbc_version = "0.2.4" +def clickhouse_jdbc_version = "0.2.6" dependencies { javacc "net.java.dev.javacc:javacc:7.0.9" compile project(path: ":sdks:java:core", configuration: "shadow") + compile library.java.guava compile library.java.joda_time + compile library.java.slf4j_api + compile library.java.vendored_guava_26_0_jre compile "ru.yandex.clickhouse:clickhouse-jdbc:$clickhouse_jdbc_version" testCompile library.java.slf4j_api testCompile library.java.junit @@ -64,3 +67,8 @@ dependencies { testRuntimeOnly library.java.slf4j_jdk14 testRuntimeOnly project(path: ":runners:direct-java", configuration: "shadow") } + +processTestResources { + // BEAM-12390: config.d/zookeeper_default.xml should have xx4 permission. + fileMode 0644 +} diff --git a/sdks/java/io/clickhouse/src/test/java/org/apache/beam/sdk/io/clickhouse/BaseClickHouseTest.java b/sdks/java/io/clickhouse/src/test/java/org/apache/beam/sdk/io/clickhouse/BaseClickHouseTest.java index 57249e9a2d5f..8fc37c8ec94b 100644 --- a/sdks/java/io/clickhouse/src/test/java/org/apache/beam/sdk/io/clickhouse/BaseClickHouseTest.java +++ b/sdks/java/io/clickhouse/src/test/java/org/apache/beam/sdk/io/clickhouse/BaseClickHouseTest.java @@ -47,7 +47,6 @@ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) "unchecked", - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class BaseClickHouseTest { diff --git a/sdks/java/io/clickhouse/src/test/java/org/apache/beam/sdk/io/clickhouse/ClickHouseIOTest.java b/sdks/java/io/clickhouse/src/test/java/org/apache/beam/sdk/io/clickhouse/ClickHouseIOTest.java index 3f0a34b6df46..d557e33f5b84 100644 --- a/sdks/java/io/clickhouse/src/test/java/org/apache/beam/sdk/io/clickhouse/ClickHouseIOTest.java +++ b/sdks/java/io/clickhouse/src/test/java/org/apache/beam/sdk/io/clickhouse/ClickHouseIOTest.java @@ -43,9 +43,6 @@ /** Tests for {@link ClickHouseIO}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ClickHouseIOTest extends BaseClickHouseTest { @Rule public TestPipeline pipeline = TestPipeline.create(); diff --git a/sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/DatabaseTestHelper.java b/sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/DatabaseTestHelper.java index 56b72303fc6b..46d9aa97017b 100644 --- a/sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/DatabaseTestHelper.java +++ b/sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/DatabaseTestHelper.java @@ -17,14 +17,18 @@ */ package org.apache.beam.sdk.io.common; +import static org.junit.Assert.assertEquals; + import java.sql.Connection; import java.sql.ResultSet; import java.sql.SQLException; import java.sql.Statement; import java.text.SimpleDateFormat; +import java.util.ArrayList; import java.util.Date; import java.util.Optional; import javax.sql.DataSource; +import org.apache.beam.sdk.values.KV; import org.postgresql.ds.PGSimpleDataSource; /** This class contains helper methods to ease database usage in tests. 
*/ @@ -104,4 +108,26 @@ public static void createTableWithStatement(DataSource dataSource, String stmt) } } } + + public static ArrayList> getTestDataToWrite(long rowsToAdd) { + ArrayList> data = new ArrayList<>(); + for (int i = 0; i < rowsToAdd; i++) { + KV kv = KV.of(i, "Test"); + data.add(kv); + } + return data; + } + + public static void assertRowCount(DataSource dataSource, String tableName, int expectedRowCount) + throws SQLException { + try (Connection connection = dataSource.getConnection()) { + try (Statement statement = connection.createStatement()) { + try (ResultSet resultSet = statement.executeQuery("select count(*) from " + tableName)) { + resultSet.next(); + int count = resultSet.getInt(1); + assertEquals(expectedRowCount, count); + } + } + } + } } diff --git a/sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/HashingFn.java b/sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/HashingFn.java index 033c1650b193..5107f264345e 100644 --- a/sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/HashingFn.java +++ b/sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/HashingFn.java @@ -34,9 +34,6 @@ import org.checkerframework.checker.nullness.qual.Nullable; /** Custom Function for Hashing. The combiner is combineUnordered, and accumulator is a HashCode. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class HashingFn extends CombineFn { /** Serializable Class to store the HashCode of input String. */ diff --git a/sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/IOITHelper.java b/sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/IOITHelper.java index ccf6a9c35d11..d14eacb8230b 100644 --- a/sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/IOITHelper.java +++ b/sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/IOITHelper.java @@ -26,9 +26,6 @@ import org.slf4j.LoggerFactory; /** Methods common to all types of IOITs. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class IOITHelper { private static final Logger LOG = LoggerFactory.getLogger(IOITHelper.class); private static final int maxAttempts = 3; diff --git a/sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/IOITHelperTest.java b/sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/IOITHelperTest.java index bd117b33dc4f..7901209ac72a 100644 --- a/sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/IOITHelperTest.java +++ b/sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/IOITHelperTest.java @@ -31,9 +31,6 @@ /** Tests for functions in {@link IOITHelper}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class IOITHelperTest { private static long startTimeMeasure; private static String message = ""; diff --git a/sdks/java/io/contextualtextio/build.gradle b/sdks/java/io/contextualtextio/build.gradle index 0ebd983eeb60..340e994a37f3 100644 --- a/sdks/java/io/contextualtextio/build.gradle +++ b/sdks/java/io/contextualtextio/build.gradle @@ -25,11 +25,14 @@ description = "Apache Beam :: SDKs :: Java :: Contextual-Text-IO" ext.summary = "Context-aware Text IO." 
dependencies { - compile library.java.vendored_guava_26_0_jre - compile library.java.protobuf_java compile project(path: ":sdks:java:core", configuration: "shadow") - testCompile project(path: ":sdks:java:core", configuration: "shadowTest") + compile library.java.protobuf_java + permitUnusedDeclared library.java.protobuf_java // BEAM-11761 + compile library.java.slf4j_api + compile library.java.vendored_guava_26_0_jre + compile library.java.vendored_grpc_1_36_0 + testCompile project(path: ":sdks:java:core", configuration: "shadowTest") testCompile library.java.guava_testlib testCompile library.java.junit testCompile library.java.hamcrest_core diff --git a/sdks/java/io/contextualtextio/src/main/java/org/apache/beam/sdk/io/contextualtextio/ContextualTextIOSource.java b/sdks/java/io/contextualtextio/src/main/java/org/apache/beam/sdk/io/contextualtextio/ContextualTextIOSource.java index de70291f8b21..2a95120513fe 100644 --- a/sdks/java/io/contextualtextio/src/main/java/org/apache/beam/sdk/io/contextualtextio/ContextualTextIOSource.java +++ b/sdks/java/io/contextualtextio/src/main/java/org/apache/beam/sdk/io/contextualtextio/ContextualTextIOSource.java @@ -30,7 +30,7 @@ import org.apache.beam.sdk.options.ValueProvider; import org.apache.beam.sdk.schemas.SchemaCoder; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; import org.checkerframework.checker.nullness.qual.Nullable; diff --git a/sdks/java/io/contextualtextio/src/test/java/org/apache/beam/sdk/io/contextualtextio/ContextualTextIOTest.java b/sdks/java/io/contextualtextio/src/test/java/org/apache/beam/sdk/io/contextualtextio/ContextualTextIOTest.java index 62b53fbdde79..2c3ab3b1231e 100644 --- a/sdks/java/io/contextualtextio/src/test/java/org/apache/beam/sdk/io/contextualtextio/ContextualTextIOTest.java +++ b/sdks/java/io/contextualtextio/src/test/java/org/apache/beam/sdk/io/contextualtextio/ContextualTextIOTest.java @@ -32,11 +32,11 @@ import static org.apache.beam.sdk.io.Compression.ZIP; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasDisplayItem; import static org.apache.beam.sdk.values.TypeDescriptors.strings; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.greaterThan; import static org.hamcrest.Matchers.hasSize; -import static org.junit.Assert.assertThat; import java.io.File; import java.io.FileOutputStream; @@ -103,9 +103,6 @@ import org.junit.runners.Parameterized; /** Tests for {@link ContextualTextIO.Read}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ContextualTextIOTest { private static final int NUM_LINES_FOR_LARGE = 1024; diff --git a/sdks/java/io/debezium/build.gradle b/sdks/java/io/debezium/build.gradle new file mode 100644 index 000000000000..722528f42c70 --- /dev/null +++ b/sdks/java/io/debezium/build.gradle @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +import groovy.json.JsonOutput + +plugins { id 'org.apache.beam.module' } +applyJavaNature( + automaticModuleName: 'org.apache.beam.sdk.io.debezium', + mavenRepositories: [ + [id: 'io.confluent', url: 'https://packages.confluent.io/maven/'] + ], + enableSpotbugs: false, +) +provideIntegrationTestingDependencies() + +description = "Apache Beam :: SDKs :: Java :: IO :: Debezium" +ext.summary = "Library to work with Debezium data." + +dependencies { + compile library.java.vendored_guava_26_0_jre + compile library.java.vendored_grpc_1_36_0 + compile project(path: ":sdks:java:core", configuration: "shadow") + compile library.java.slf4j_api + compile library.java.joda_time + provided library.java.jackson_dataformat_csv + testCompile project(path: ":sdks:java:core", configuration: "shadowTest") + testCompile project(path: ":sdks:java:io:common", configuration: "testRuntime") + + // Test dependencies + testCompile library.java.junit + testRuntimeOnly library.java.slf4j_jdk14 + testRuntimeOnly project(path: ":runners:direct-java", configuration: "shadow") + testCompile project(":runners:google-cloud-dataflow-java") + testCompile "org.testcontainers:testcontainers:1.15.1" + testCompile "org.testcontainers:mysql:1.15.1" + + // Kafka connect dependencies + compile "org.apache.kafka:connect-api:2.5.0" + compile "org.apache.kafka:connect-json:2.5.0" + permitUnusedDeclared "org.apache.kafka:connect-json:2.5.0" // BEAM-11761 + + // Debezium dependencies + compile group: 'io.debezium', name: 'debezium-core', version: '1.3.1.Final' + testCompile group: 'io.debezium', name: 'debezium-connector-mysql', version: '1.3.1.Final' +} + +test { + testLogging { + outputs.upToDateWhen {false} + showStandardStreams = true + } +} + + +task integrationTest(type: Test, dependsOn: processTestResources) { + group = "Verification" + systemProperty "beamTestPipelineOptions", JsonOutput.toJson([ + "--runner=DirectRunner", + ]) + + // Disable Gradle cache: these ITs interact with live service that should always be considered "out of date" + outputs.upToDateWhen { false } + + include '**/*IT.class' + classpath = sourceSets.test.runtimeClasspath + testClassesDirs = sourceSets.test.output.classesDirs + + useJUnit { + } +} diff --git a/sdks/java/io/debezium/expansion-service/build.gradle b/sdks/java/io/debezium/expansion-service/build.gradle new file mode 100644 index 000000000000..a183c9128b29 --- /dev/null +++ b/sdks/java/io/debezium/expansion-service/build.gradle @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +apply plugin: 'org.apache.beam.module' +apply plugin: 'application' +mainClassName = "org.apache.beam.sdk.expansion.service.ExpansionService" + +applyJavaNature( + automaticModuleName: 'org.apache.beam.sdk.io.debezium.expansion.service', + exportJavadoc: false, + validateShadowJar: false, + shadowClosure: {}, +) + +description = "Apache Beam :: SDKs :: Java :: IO :: Debezium :: Expansion Service" +ext.summary = "Expansion service serving DebeziumIO" + +dependencies { + compile project(":sdks:java:expansion-service") + permitUnusedDeclared project(":sdks:java:expansion-service") // BEAM-11761 + compile project(":sdks:java:io:debezium") + permitUnusedDeclared project(":sdks:java:io:debezium") // BEAM-11761 + runtime library.java.slf4j_jdk14 + + // Debezium runtime dependencies + def debezium_version = '1.3.1.Final' + runtimeOnly group: 'io.debezium', name: 'debezium-connector-mysql', version: debezium_version + runtimeOnly group: 'io.debezium', name: 'debezium-connector-postgres', version: debezium_version + runtimeOnly group: 'io.debezium', name: 'debezium-connector-sqlserver', version: debezium_version + runtimeOnly group: 'io.debezium', name: 'debezium-connector-oracle', version: debezium_version + runtimeOnly group: 'io.debezium', name: 'debezium-connector-db2', version: debezium_version +} \ No newline at end of file diff --git a/sdks/java/io/debezium/src/README.md b/sdks/java/io/debezium/src/README.md new file mode 100644 index 000000000000..4cf9be81c618 --- /dev/null +++ b/sdks/java/io/debezium/src/README.md @@ -0,0 +1,178 @@ + + +# DebeziumIO +## Connect your Debezium Databases to Apache Beam easily. + +### What is DebeziumIO? +DebeziumIO is an Apache Beam connector that lets users connect their Events-Driven Databases on [Debezium](https://debezium.io) to [Apache Beam](https://beam.apache.org/) without the need to set up a [Kafka](https://kafka.apache.org/) instance. + +### Getting Started + +DebeziumIO uses [Debezium Connectors v1.3](https://debezium.io/documentation/reference/1.3/connectors/) to connect to Apache Beam. All you need to do is choose the Debezium Connector that suits your Debezium setup and pick a [Serializable Function](https://beam.apache.org/releases/javadoc/2.23.0/org/apache/beam/sdk/transforms/SerializableFunction.html), then you will be able to connect to Apache Beam and start building your own Pipelines. + +These connectors have been successfully tested and are known to work fine: +* MySQL Connector +* PostgreSQL Connector +* SQLServer Connector +* DB2 Connector + +Other connectors might also work. + + +Setting up a connector and running a Pipeline should be as simple as: +``` +Pipeline p = Pipeline.create(); // Create a Pipeline + p.apply(DebeziumIO.read() + .withConnectorConfiguration(...) // Debezium Connector setup + .withFormatFunction(...) // Serializable Function to use + ).setCoder(StringUtf8Coder.of()); +p.run().waitUntilFinish(); // Run your pipeline! 
+``` + +### Setting up a Debezium Connector + +DebeziumIO comes with a handy ConnectorConfiguration builder, which lets you provide all the configuration needed to access your Debezium Database. + +A basic configuration such as **username**, **password**, **port number**, and **host name** must be specified along with the **Debezium Connector class** you will use by using these methods: + +|Method|Param|Description| +|-|-|-| +|`.withConnectorClass(connectorClass)`|_Class_|Debezium Connector| +|`.withUsername(username)`|_String_|Database Username| +|`.withPassword(password)`|_String_|Database Password| +|`.withHostName(hostname)`|_String_|Database Hostname| +|`.withPort(portNumber)`|_String_|Database Port number| + +You can also add more configuration, such as Connector-specific Properties with the `_withConnectionProperty_` method: + +|Method|Params|Description| +|-|-|-| +|`.withConnectionProperty(propName, propValue)`|_String_, _String_|Adds a custom property to the connector.| +> **Note:** For more information on custom properties, see your [Debezium Connector](https://debezium.io/documentation/reference/1.3/connectors/) specific documentation. + +Example of a MySQL Debezium Connector setup: +``` +DebeziumIO.ConnectorConfiguration.create() + .withUsername("dbUsername") + .withPassword("dbPassword") + .withConnectorClass(MySqlConnector.class) + .withHostName("127.0.0.1") + .withPort("3306") + .withConnectionProperty("database.server.id", "serverId") + .withConnectionProperty("database.server.name", "serverName") + .withConnectionProperty("database.include.list", "dbName") + .withConnectionProperty("include.schema.changes", "false") +``` + +### Setting a Serializable Function + +A serializable function is required to depict each `SourceRecord` fetched from the Database. + +DebeziumIO comes with a built-in JSON Mapper that you can optionally use to map every `SourceRecord` fetched from the Database to a JSON object. This helps users visualize and access their data in a simple way. + +If you want to use this built-in JSON Mapper, you can do it by setting an instance of **SourceRecordJsonMapper** as a Serializable Function to the DebeziumIO: +``` +.withFormatFunction(new SourceRecordJson.SourceRecordJsonMapper()) +``` +> **Note:** `SourceRecordJsonMapper`comes out of the box, but you may use any Format Function you prefer. + +## Quick Example + +The following example is how an actual setup would look like using a **MySQL Debezium Connector** and **SourceRecordJsonMapper** as the Serializable Function. +``` +PipelineOptions options = PipelineOptionsFactory.create(); +Pipeline p = Pipeline.create(options); +p.apply(DebeziumIO.read(). + withConnectorConfiguration( // Debezium Connector setup + DebeziumIO.ConnectorConfiguration.create() + .withUsername("debezium") + .withPassword("dbz") + .withConnectorClass(MySqlConnector.class) + .withHostName("127.0.0.1") + .withPort("3306") + .withConnectionProperty("database.server.id", "184054") + .withConnectionProperty("database.server.name", "dbserver1") + .withConnectionProperty("database.include.list", "inventory") + .withConnectionProperty("include.schema.changes", "false") + ).withFormatFunction( + new SourceRecordJson.SourceRecordJsonMapper() // Serializable Function + ) +).setCoder(StringUtf8Coder.of()); + +p.run().waitUntilFinish(); +``` + +## Shortcut! + +If you will be using the built-in **SourceRecordJsonMapper** as your Serializable Function for all your pipelines, you should use **readAsJson()**. 
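If, on the other hand, you need a custom mapping, keep using `read().withFormatFunction(...)` with your own mapper. The sketch below is only illustrative (the `TopicNameMapper` class is hypothetical and not part of this module); it assumes `SourceRecordMapper<T>` is serializable and declares a single `mapSourceRecord(SourceRecord)` method, as used by `KafkaSourceConsumerFn`:
```
// Hypothetical custom mapper: emits only the Kafka topic name of each change record.
// Assumed imports: org.apache.beam.io.debezium.SourceRecordMapper,
//                  org.apache.kafka.connect.source.SourceRecord
// Assumed interface shape: T mapSourceRecord(SourceRecord record) throws Exception;
static class TopicNameMapper implements SourceRecordMapper<String> {
  @Override
  public String mapSourceRecord(SourceRecord record) throws Exception {
    return record.topic();
  }
}

// Usage (mirrors the examples above):
//   .withFormatFunction(new TopicNameMapper())
// and set a StringUtf8Coder on the resulting PCollection<String>.
```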
+ +DebeziumIO comes with a method called `readAsJson`, which automatically sets the `SourceRecordJsonMapper` as the Serializable Function for your pipeline. This way, you would need to setup your connector before running your pipeline, without explicitly setting a Serializable Function. + +Example of using **readAsJson**: +``` +PipelineOptions options = PipelineOptionsFactory.create(); +Pipeline p = Pipeline.create(options); +p.apply(DebeziumIO.read(). + withConnectorConfiguration( // Debezium Connector setup + DebeziumIO.ConnectorConfiguration.create() + .withUsername("debezium") + .withPassword("dbz") + .withConnectorClass(MySqlConnector.class) + .withHostName("127.0.0.1") + .withPort("3306") + .withConnectionProperty("database.server.id", "184054") + .withConnectionProperty("database.server.name", "dbserver1") + .withConnectionProperty("database.include.list", "inventory") + .withConnectionProperty("include.schema.changes", "false")); + +p.run().waitUntilFinish(); +``` + +## Under the hood + +### KafkaSourceConsumerFn and Restrictions + +KafkaSourceConsumerFn (KSC onwards) is a [DoFn](https://beam.apache.org/releases/javadoc/2.3.0/org/apache/beam/sdk/transforms/DoFn.html) in charge of the Database replication and CDC. + +There are two ways of initializing KSC: +* Restricted by number of records +* Restricted by amount of time (minutes) + +By default, DebeziumIO initializes it with the former, though user may choose the latter by setting the amount of minutes as a parameter: + +|Function|Param|Description| +|-|-|-| +|`KafkaSourceConsumerFn(connectorClass, recordMapper, maxRecords)`|_Class, SourceRecordMapper, Int_|Restrict run by number of records (Default).| +|`KafkaSourceConsumerFn(connectorClass, recordMapper, timeToRun)`|_Class, SourceRecordMapper, Long_|Restrict run by amount of time (in minutes).| + +### Requirements and Supported versions + +- JDK v8 +- Debezium Connectors v1.3 +- Apache Beam 2.25 + +## Running Unit Tests + +You can run Integration Tests using **gradlew**. + +Example of running the MySQL Connector Integration Test: +``` +./gradlew integrationTest -p sdks/java/io/debezium/ --tests org.apache.beam.io.debezium.DebeziumIOMySqlConnectorIT -DintegrationTestRunner=direct +``` \ No newline at end of file diff --git a/sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/Connectors.java b/sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/Connectors.java new file mode 100644 index 000000000000..7aa8b6d0dc7f --- /dev/null +++ b/sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/Connectors.java @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.io.debezium; + +import org.apache.kafka.connect.source.SourceConnector; +import org.checkerframework.checker.nullness.qual.Nullable; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** Enumeration of debezium connectors. */ +public enum Connectors { + MYSQL("MySQL", "io.debezium.connector.mysql.MySqlConnector"), + POSTGRES("PostgreSQL", "io.debezium.connector.postgresql.PostgresConnector"), + SQLSERVER("SQLServer", "io.debezium.connector.sqlserver.SqlServerConnector"), + ORACLE("Oracle", "io.debezium.connector.oracle.OracleConnector"), + DB2("DB2", "io.debezium.connector.db2.Db2Connector"), + ; + private static final Logger LOG = LoggerFactory.getLogger(Connectors.class); + private final String name; + private final String connector; + + Connectors(String name, String connector) { + this.name = name; + this.connector = connector; + } + + /** The name of this connector class. */ + public String getName() { + return name; + } + + /** Class connector to debezium. */ + public @Nullable Class getConnector() { + Class connectorClass = null; + try { + connectorClass = (Class) Class.forName(this.connector); + } catch (ClassCastException | ClassNotFoundException e) { + LOG.error("Connector class is not found", e); + } + return connectorClass; + } + + /** + * Returns a connector class corresponding to the given connector name. + * + * @param connectorName The name of the connector. Ex.: MySQL + * @return Connector enum representing the given connector name. + */ + public static Connectors fromName(String connectorName) { + for (Connectors connector : Connectors.values()) { + if (connector.getName().equals(connectorName)) { + return connector; + } + } + throw new IllegalArgumentException("Cannot create enum from " + connectorName + " value!"); + } +} diff --git a/sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/DebeziumIO.java b/sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/DebeziumIO.java new file mode 100644 index 000000000000..b7c084fcba78 --- /dev/null +++ b/sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/DebeziumIO.java @@ -0,0 +1,515 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.io.debezium; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import com.google.auto.value.AutoValue; +import java.io.Serializable; +import java.util.HashMap; +import java.util.Map; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.MapCoder; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Joiner; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; +import org.apache.kafka.connect.source.SourceConnector; +import org.checkerframework.checker.nullness.qual.Nullable; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * Utility class which exposes an implementation {@link #read} and a Debezium configuration. + * + *

+ * <h3>Quick Overview</h3>
+ *
+ * <p>This class lets Beam users connect to their existing Debezium implementations in an easy way.
+ *
+ * <p>Any Kafka connector supported by Debezium should work fine with this IO.
+ *
+ * <p>The following connectors were tested and worked well in some simple scenarios:
+ *
+ * <ul>
+ *   <li>MySQL
+ *   <li>PostgreSQL
+ *   <li>SQLServer
+ *   <li>DB2
+ * </ul>
+ *
+ * <h3>Usage example</h3>
+ *
+ * <p>Connect to a Debezium - MySQL database and run a Pipeline
+ *
+ * <pre>
+ *     private static final ConnectorConfiguration mySqlConnectorConfig = ConnectorConfiguration
+ *             .create()
+ *             .withUsername("uname")
+ *             .withPassword("pwd123")
+ *             .withHostName("127.0.0.1")
+ *             .withPort("3306")
+ *             .withConnectorClass(MySqlConnector.class)
+ *             .withConnectionProperty("database.server.id", "184054")
+ *             .withConnectionProperty("database.server.name", "serverid")
+ *             .withConnectionProperty("database.include.list", "dbname")
+ *             .withConnectionProperty("database.history", DebeziumSDFDatabaseHistory.class.getName())
+ *             .withConnectionProperty("include.schema.changes", "false");
+ *
+ *      PipelineOptions options = PipelineOptionsFactory.create();
+ *      Pipeline p = Pipeline.create(options);
+ *      p.apply(DebeziumIO.read()
+ *               .withConnectorConfiguration(mySqlConnectorConfig)
+ *               .withFormatFunction(new SourceRecordJson.SourceRecordJsonMapper())
+ *       ).setCoder(StringUtf8Coder.of());
+ *       p.run().waitUntilFinish();
+ * </pre>
+ *
+ * <p>In this example we are using {@link KafkaSourceConsumerFn.DebeziumSDFDatabaseHistory} to
+ * handle the Database history.
+ *
+ * <h3>Dependencies</h3>
+ *
+ * <p>Users may work with any of the supported Debezium Connectors mentioned above.
+ *
+ * <p>
    See Debezium + * Connectors for more info. + */ +@Experimental(Kind.SOURCE_SINK) +@SuppressWarnings({"nullness"}) +public class DebeziumIO { + private static final Logger LOG = LoggerFactory.getLogger(DebeziumIO.class); + + /** + * Read data from a Debezium source. + * + * @param Type of the data to be read. + */ + public static Read read() { + return new AutoValue_DebeziumIO_Read.Builder().build(); + } + + /** + * Read data from Debezium source and convert a Kafka {@link + * org.apache.kafka.connect.source.SourceRecord} into a JSON string using {@link + * org.apache.beam.io.debezium.SourceRecordJson.SourceRecordJsonMapper} as default function + * mapper. + * + * @return Reader object of String. + */ + public static Read readAsJson() { + return new AutoValue_DebeziumIO_Read.Builder() + .setFormatFunction(new SourceRecordJson.SourceRecordJsonMapper()) + .setCoder(StringUtf8Coder.of()) + .build(); + } + + /** Disallow construction of utility class. */ + private DebeziumIO() {} + + /** Implementation of {@link #read}. */ + @AutoValue + public abstract static class Read extends PTransform> { + + private static final long serialVersionUID = 1L; + + abstract @Nullable ConnectorConfiguration getConnectorConfiguration(); + + abstract @Nullable SourceRecordMapper getFormatFunction(); + + abstract @Nullable Integer getMaxNumberOfRecords(); + + abstract @Nullable Coder getCoder(); + + abstract Builder toBuilder(); + + @AutoValue.Builder + abstract static class Builder { + abstract Builder setConnectorConfiguration(ConnectorConfiguration config); + + abstract Builder setCoder(Coder coder); + + abstract Builder setFormatFunction(SourceRecordMapper mapperFn); + + abstract Builder setMaxNumberOfRecords(Integer maxNumberOfRecords); + + abstract Read build(); + } + + /** + * Applies the given configuration to the connector. It cannot be null. + * + * @param config Configuration to be used within the connector. + * @return PTransform {@link #read} + */ + public Read withConnectorConfiguration(final ConnectorConfiguration config) { + checkArgument(config != null, "config can not be null"); + return toBuilder().setConnectorConfiguration(config).build(); + } + + /** + * Applies a {@link SourceRecordMapper} to the connector. It cannot be null. + * + * @param mapperFn the mapper function to be used on each {@link + * org.apache.kafka.connect.source.SourceRecord}. + * @return PTransform {@link #read} + */ + public Read withFormatFunction(SourceRecordMapper mapperFn) { + checkArgument(mapperFn != null, "mapperFn can not be null"); + return toBuilder().setFormatFunction(mapperFn).build(); + } + + /** + * Applies a {@link Coder} to the connector. It cannot be null + * + * @param coder The Coder to be used over the data. + * @return PTransform {@link #read} + */ + public Read withCoder(Coder coder) { + checkArgument(coder != null, "coder can not be null"); + return toBuilder().setCoder(coder).build(); + } + + /** + * Once the specified number of records has been reached, it will stop fetching them. The value + * can be null (default) which means it will not stop. + * + * @param maxNumberOfRecords The maximum number of records to be fetched before stop. 
+ * @return PTransform {@link #read} + */ + public Read withMaxNumberOfRecords(Integer maxNumberOfRecords) { + return toBuilder().setMaxNumberOfRecords(maxNumberOfRecords).build(); + } + + @Override + public PCollection expand(PBegin input) { + return input + .apply( + Create.of(Lists.newArrayList(getConnectorConfiguration().getConfigurationMap())) + .withCoder(MapCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of()))) + .apply( + ParDo.of( + new KafkaSourceConsumerFn<>( + getConnectorConfiguration().getConnectorClass().get(), + getFormatFunction(), + getMaxNumberOfRecords()))) + .setCoder(getCoder()); + } + } + + /** A POJO describing a Debezium configuration. */ + @AutoValue + public abstract static class ConnectorConfiguration implements Serializable { + private static final long serialVersionUID = 1L; + + abstract @Nullable ValueProvider> getConnectorClass(); + + abstract @Nullable ValueProvider getHostName(); + + abstract @Nullable ValueProvider getPort(); + + abstract @Nullable ValueProvider getUsername(); + + abstract @Nullable ValueProvider getPassword(); + + abstract @Nullable ValueProvider getSourceConnector(); + + abstract @Nullable ValueProvider> getConnectionProperties(); + + abstract Builder builder(); + + @AutoValue.Builder + abstract static class Builder { + abstract Builder setConnectorClass(ValueProvider> connectorClass); + + abstract Builder setHostName(ValueProvider hostname); + + abstract Builder setPort(ValueProvider port); + + abstract Builder setUsername(ValueProvider username); + + abstract Builder setPassword(ValueProvider password); + + abstract Builder setConnectionProperties( + ValueProvider> connectionProperties); + + abstract Builder setSourceConnector(ValueProvider sourceConnector); + + abstract ConnectorConfiguration build(); + } + + /** + * Creates a ConnectorConfiguration. + * + * @return {@link ConnectorConfiguration} + */ + public static ConnectorConfiguration create() { + return new AutoValue_DebeziumIO_ConnectorConfiguration.Builder() + .setConnectionProperties(ValueProvider.StaticValueProvider.of(new HashMap<>())) + .build(); + } + + /** + * Applies the connectorClass to be used to connect to your database. + * + *

+ * <p>Currently supported connectors are:
+ *
+ * <ul>
+ *   <li>{@link io.debezium.connector.mysql.MySqlConnector}
+ *   <li>{@link io.debezium.connector.postgresql.PostgresConnector}
+ *   <li>{@link io.debezium.connector.sqlserver.SqlServerConnector}
+ * </ul>
    + * + * @param connectorClass Any of the supported connectors. + * @return {@link ConnectorConfiguration} + */ + public ConnectorConfiguration withConnectorClass(Class connectorClass) { + checkArgument(connectorClass != null, "connectorClass can not be null"); + return withConnectorClass(ValueProvider.StaticValueProvider.of(connectorClass)); + } + + /** + * Sets the connectorClass to be used to connect to your database. It cannot be null. + * + *

+ * <p>Currently supported connectors are:
+ *
+ * <ul>
+ *   <li>{@link io.debezium.connector.mysql.MySqlConnector}
+ *   <li>{@link io.debezium.connector.postgresql.PostgresConnector}
+ *   <li>{@link io.debezium.connector.sqlserver.SqlServerConnector}
+ * </ul>
    + * + * @param connectorClass (as ValueProvider) + * @return {@link ConnectorConfiguration} + */ + public ConnectorConfiguration withConnectorClass(ValueProvider> connectorClass) { + checkArgument(connectorClass != null, "connectorClass can not be null"); + return builder().setConnectorClass(connectorClass).build(); + } + + /** + * Sets the host name to be used on the database. It cannot be null. + * + * @param hostName The hostname of your database. + * @return {@link ConnectorConfiguration} + */ + public ConnectorConfiguration withHostName(String hostName) { + checkArgument(hostName != null, "hostName can not be null"); + return withHostName(ValueProvider.StaticValueProvider.of(hostName)); + } + + /** + * Sets the host name to be used on the database. It cannot be null. + * + * @param hostName The hostname of your database (as ValueProvider). + * @return {@link ConnectorConfiguration} + */ + public ConnectorConfiguration withHostName(ValueProvider hostName) { + checkArgument(hostName != null, "hostName can not be null"); + return builder().setHostName(hostName).build(); + } + + /** + * Sets the port on which your database is listening. It cannot be null. + * + * @param port The port to be used to connect to your database (as ValueProvider). + * @return {@link ConnectorConfiguration} + */ + public ConnectorConfiguration withPort(String port) { + checkArgument(port != null, "port can not be null"); + return withPort(ValueProvider.StaticValueProvider.of(port)); + } + + /** + * Sets the port on which your database is listening. It cannot be null. + * + * @param port The port to be used to connect to your database. + * @return {@link ConnectorConfiguration} + */ + public ConnectorConfiguration withPort(ValueProvider port) { + checkArgument(port != null, "port can not be null"); + return builder().setPort(port).build(); + } + + /** + * Sets the username to connect to your database. It cannot be null. + * + * @param username Database username + * @return {@link ConnectorConfiguration} + */ + public ConnectorConfiguration withUsername(String username) { + checkArgument(username != null, "username can not be null"); + return withUsername(ValueProvider.StaticValueProvider.of(username)); + } + + /** + * Sets the username to connect to your database. It cannot be null. + * + * @param username (as ValueProvider). + * @return {@link ConnectorConfiguration} + */ + public ConnectorConfiguration withUsername(ValueProvider username) { + checkArgument(username != null, "username can not be null"); + return builder().setUsername(username).build(); + } + + /** + * Sets the password to connect to your database. It cannot be null. + * + * @param password Database password + * @return {@link ConnectorConfiguration} + */ + public ConnectorConfiguration withPassword(String password) { + checkArgument(password != null, "password can not be null"); + return withPassword(ValueProvider.StaticValueProvider.of(password)); + } + + /** + * Sets the password to connect to your database. It cannot be null. + * + * @param password (as ValueProvider). + * @return {@link ConnectorConfiguration} + */ + public ConnectorConfiguration withPassword(ValueProvider password) { + checkArgument(password != null, "password can not be null"); + return builder().setPassword(password).build(); + } + + /** + * Sets a custom property to be used within the connection to your database. + * + *

+ * <p>You may use this to set special configurations such as:
+ *
+ * <ul>
+ *   <li>slot.name
+ *   <li>database.dbname
+ *   <li>database.server.id
+ *   <li>...
+ * </ul>
    + * + * @param connectionProperties Properties (Key, Value) Map + * @return {@link ConnectorConfiguration} + */ + public ConnectorConfiguration withConnectionProperties( + Map connectionProperties) { + checkArgument(connectionProperties != null, "connectionProperties can not be null"); + return withConnectionProperties(ValueProvider.StaticValueProvider.of(connectionProperties)); + } + + /** + * Sets a custom property to be used within the connection to your database. + * + *

+ * <p>You may use this to set special configurations such as:
+ *
+ * <ul>
+ *   <li>slot.name
+ *   <li>database.dbname
+ *   <li>database.server.id
+ *   <li>...
+ * </ul>
    + * + * @param connectionProperties (as ValueProvider). + * @return {@link ConnectorConfiguration} + */ + public ConnectorConfiguration withConnectionProperties( + ValueProvider> connectionProperties) { + checkArgument(connectionProperties != null, "connectionProperties can not be null"); + return builder().setConnectionProperties(connectionProperties).build(); + } + + /** + * Sets a custom property to be used within the connection to your database. + * + *

+ * <p>You may use this to set special configurations such as:
+ *
+ * <ul>
+ *   <li>slot.name
+ *   <li>database.dbname
+ *   <li>database.server.id
+ *   <li>...
+ * </ul>
    + * + * @param key Property name + * @param value Property value + * @return {@link ConnectorConfiguration} + */ + public ConnectorConfiguration withConnectionProperty(String key, String value) { + checkArgument(key != null, "key can not be null"); + checkArgument(value != null, "value can not be null"); + checkArgument( + getConnectionProperties().get() != null, "connectionProperties can not be null"); + + ConnectorConfiguration config = builder().build(); + config.getConnectionProperties().get().putIfAbsent(key, value); + return config; + } + + /** + * Sets the {@link SourceConnector} to be used. It cannot be null. + * + * @param sourceConnector Any supported connector + * @return {@link ConnectorConfiguration} + */ + public ConnectorConfiguration withSourceConnector(SourceConnector sourceConnector) { + checkArgument(sourceConnector != null, "sourceConnector can not be null"); + return withSourceConnector(ValueProvider.StaticValueProvider.of(sourceConnector)); + } + + public ConnectorConfiguration withSourceConnector( + ValueProvider sourceConnector) { + checkArgument(sourceConnector != null, "sourceConnector can not be null"); + return builder().setSourceConnector(sourceConnector).build(); + } + + /** + * Configuration Map Getter. + * + * @return Configuration Map. + */ + public Map getConfigurationMap() { + HashMap configuration = new HashMap<>(); + + configuration.computeIfAbsent( + "connector.class", k -> getConnectorClass().get().getCanonicalName()); + configuration.computeIfAbsent("database.hostname", k -> getHostName().get()); + configuration.computeIfAbsent("database.port", k -> getPort().get()); + configuration.computeIfAbsent("database.user", k -> getUsername().get()); + configuration.computeIfAbsent("database.password", k -> getPassword().get()); + + for (Map.Entry entry : getConnectionProperties().get().entrySet()) { + configuration.computeIfAbsent(entry.getKey(), k -> entry.getValue()); + } + + // Set default Database History impl. if not provided + configuration.computeIfAbsent( + "database.history", + k -> KafkaSourceConsumerFn.DebeziumSDFDatabaseHistory.class.getName()); + + String stringProperties = Joiner.on('\n').withKeyValueSeparator(" -> ").join(configuration); + LOG.debug("---------------- Connector configuration: {}", stringProperties); + + return configuration; + } + } +} diff --git a/sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/DebeziumTransformRegistrar.java b/sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/DebeziumTransformRegistrar.java new file mode 100644 index 000000000000..a00f706abfb2 --- /dev/null +++ b/sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/DebeziumTransformRegistrar.java @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.io.debezium; + +import com.google.auto.service.AutoService; +import java.util.List; +import java.util.Map; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.expansion.ExternalTransformRegistrar; +import org.apache.beam.sdk.transforms.ExternalTransformBuilder; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.checkerframework.checker.nullness.qual.Nullable; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** Exposes {@link DebeziumIO.Read} as an external transform for cross-language usage. */ +@Experimental(Experimental.Kind.PORTABILITY) +@AutoService(ExternalTransformRegistrar.class) +@SuppressWarnings({ + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) +public class DebeziumTransformRegistrar implements ExternalTransformRegistrar { + private static final Logger LOG = LoggerFactory.getLogger(DebeziumTransformRegistrar.class); + public static final String READ_JSON_URN = "beam:external:java:debezium:read:v1"; + + @Override + public Map>> knownBuilders() { + return ImmutableMap.of( + READ_JSON_URN, + (Class>) (Class) ReadBuilder.class); + } + + private abstract static class CrossLanguageConfiguration { + String username; + String password; + String host; + String port; + Connectors connectorClass; + + public void setUsername(String username) { + this.username = username; + } + + public void setPassword(String password) { + this.password = password; + } + + public void setHost(String host) { + this.host = host; + } + + public void setPort(String port) { + this.port = port; + } + + public void setConnectorClass(String connectorClass) { + this.connectorClass = Connectors.fromName(connectorClass); + } + } + + public static class ReadBuilder + implements ExternalTransformBuilder> { + + public static class Configuration extends CrossLanguageConfiguration { + private @Nullable List connectionProperties; + private @Nullable Long maxNumberOfRecords; + + public void setConnectionProperties(@Nullable List connectionProperties) { + this.connectionProperties = connectionProperties; + } + + public void setMaxNumberOfRecords(@Nullable Long maxNumberOfRecords) { + this.maxNumberOfRecords = maxNumberOfRecords; + } + } + + @Override + public PTransform> buildExternal(Configuration configuration) { + DebeziumIO.ConnectorConfiguration connectorConfiguration = + DebeziumIO.ConnectorConfiguration.create() + .withUsername(configuration.username) + .withPassword(configuration.password) + .withHostName(configuration.host) + .withPort(configuration.port) + .withConnectorClass(configuration.connectorClass.getConnector()); + + if (configuration.connectionProperties != null) { + for (String connectionProperty : configuration.connectionProperties) { + String[] parts = connectionProperty.split("=", -1); + String key = parts[0]; + String value = parts[1]; + connectorConfiguration.withConnectionProperty(key, value); + } + } + + DebeziumIO.Read readTransform = + DebeziumIO.readAsJson().withConnectorConfiguration(connectorConfiguration); + + if (configuration.maxNumberOfRecords != null) { + readTransform = + readTransform.withMaxNumberOfRecords(configuration.maxNumberOfRecords.intValue()); + } + + return readTransform; + } + } +} diff --git a/sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/KafkaSourceConsumerFn.java 
b/sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/KafkaSourceConsumerFn.java new file mode 100644 index 000000000000..6d5658f72b06 --- /dev/null +++ b/sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/KafkaSourceConsumerFn.java @@ -0,0 +1,416 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.io.debezium; + +import io.debezium.document.Document; +import io.debezium.document.DocumentReader; +import io.debezium.document.DocumentWriter; +import io.debezium.relational.history.AbstractDatabaseHistory; +import io.debezium.relational.history.DatabaseHistoryException; +import io.debezium.relational.history.HistoryRecord; +import java.io.IOException; +import java.io.Serializable; +import java.util.ArrayList; +import java.util.Collection; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ConcurrentHashMap; +import java.util.function.Consumer; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.SerializableCoder; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker; +import org.apache.beam.sdk.transforms.splittabledofn.SplitResult; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.apache.kafka.connect.source.SourceConnector; +import org.apache.kafka.connect.source.SourceRecord; +import org.apache.kafka.connect.source.SourceTask; +import org.apache.kafka.connect.source.SourceTaskContext; +import org.apache.kafka.connect.storage.OffsetStorageReader; +import org.checkerframework.checker.nullness.qual.Nullable; +import org.joda.time.DateTime; +import org.joda.time.Duration; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * + * + *

+ * <h3>Quick Overview</h3>
+ *
+ * SDF used to process records fetched from supported Debezium Connectors.
+ *
+ * <p>Currently it has a time limiter (see {@link OffsetTracker}) which, if set, will stop it
+ * automatically after the specified number of elapsed minutes. Otherwise, it will keep running
+ * until the user explicitly interrupts it.
+ *
+ * <p>It might be initialized either as:
+ *
+ * <pre>KafkaSourceConsumerFn(connectorClass, SourceRecordMapper)</pre>
+ *
+ * Or with a time limiter:
+ *
+ * <pre>KafkaSourceConsumerFn(connectorClass, SourceRecordMapper, minutesToRun)</pre>
    + */ +@SuppressWarnings({"nullness"}) +public class KafkaSourceConsumerFn extends DoFn, T> { + private static final Logger LOG = LoggerFactory.getLogger(KafkaSourceConsumerFn.class); + public static final String BEAM_INSTANCE_PROPERTY = "beam.parent.instance"; + + private final Class connectorClass; + private final SourceRecordMapper fn; + + private long minutesToRun = -1; + private Integer maxRecords; + + private static DateTime startTime; + private static final Map>> + restrictionTrackers = new ConcurrentHashMap<>(); + + /** + * Initializes the SDF with a time limit. + * + * @param connectorClass Supported Debezium connector class + * @param fn a SourceRecordMapper + * @param minutesToRun Maximum time to run (in minutes) + */ + KafkaSourceConsumerFn(Class connectorClass, SourceRecordMapper fn, long minutesToRun) { + this.connectorClass = (Class) connectorClass; + this.fn = fn; + this.minutesToRun = minutesToRun; + } + + /** + * Initializes the SDF to be run indefinitely. + * + * @param connectorClass Supported Debezium connector class + * @param fn a SourceRecordMapper + */ + KafkaSourceConsumerFn(Class connectorClass, SourceRecordMapper fn, Integer maxRecords) { + this.connectorClass = (Class) connectorClass; + this.fn = fn; + this.maxRecords = maxRecords; + } + + @GetInitialRestriction + public OffsetHolder getInitialRestriction(@Element Map unused) + throws IOException { + KafkaSourceConsumerFn.startTime = new DateTime(); + return new OffsetHolder(null, null, null, this.maxRecords, this.minutesToRun); + } + + @NewTracker + public RestrictionTracker> newTracker( + @Restriction OffsetHolder restriction) { + return new OffsetTracker(restriction); + } + + @GetRestrictionCoder + public Coder getRestrictionCoder() { + return SerializableCoder.of(OffsetHolder.class); + } + + /** + * Process the retrieved element. Currently it just logs the retrieved record as JSON. 
+ * + * @param element Record retrieved + * @param tracker Restriction Tracker + * @param receiver Output Receiver + * @return + * @throws Exception + */ + @DoFn.ProcessElement + public ProcessContinuation process( + @Element Map element, + RestrictionTracker> tracker, + OutputReceiver receiver) + throws Exception { + Map configuration = new HashMap<>(element); + + // Adding the current restriction to the class object to be found by the database history + restrictionTrackers.put(this.getHashCode(), tracker); + configuration.put(BEAM_INSTANCE_PROPERTY, this.getHashCode()); + + SourceConnector connector = connectorClass.getDeclaredConstructor().newInstance(); + connector.start(configuration); + + SourceTask task = (SourceTask) connector.taskClass().getDeclaredConstructor().newInstance(); + + try { + Map consumerOffset = tracker.currentRestriction().offset; + LOG.debug("--------- Consumer offset from Debezium Tracker: {}", consumerOffset); + + task.initialize(new BeamSourceTaskContext(tracker.currentRestriction().offset)); + task.start(connector.taskConfigs(1).get(0)); + + List records = task.poll(); + + if (records == null) { + LOG.debug("-------- Pulled records null"); + return ProcessContinuation.stop(); + } + + LOG.debug("-------- {} records found", records.size()); + if (!records.isEmpty()) { + for (SourceRecord record : records) { + LOG.debug("-------- Record found: {}", record); + + Map offset = (Map) record.sourceOffset(); + + if (offset == null || !tracker.tryClaim(offset)) { + LOG.debug("-------- Offset null or could not be claimed"); + return ProcessContinuation.stop(); + } + + T json = this.fn.mapSourceRecord(record); + LOG.debug("****************** RECEIVED SOURCE AS JSON: {}", json); + + receiver.output(json); + } + + task.commit(); + } + } catch (Exception ex) { + LOG.error( + "-------- Error on consumer: {}. with stacktrace: {}", + ex.getMessage(), + ex.getStackTrace()); + } finally { + restrictionTrackers.remove(this.getHashCode()); + + LOG.debug("------- Stopping SourceTask"); + task.stop(); + } + + return ProcessContinuation.resume().withResumeDelay(Duration.standardSeconds(1)); + } + + public String getHashCode() { + return Integer.toString(System.identityHashCode(this)); + } + + private static class BeamSourceTaskContext implements SourceTaskContext { + private final @Nullable Map initialOffset; + + BeamSourceTaskContext(@Nullable Map initialOffset) { + this.initialOffset = initialOffset; + } + + @Override + public Map configs() { + // TODO(pabloem): Do we need to implement this? 
+ throw new UnsupportedOperationException("unimplemented"); + } + + @Override + public OffsetStorageReader offsetStorageReader() { + LOG.debug("------------- Creating an offset storage reader"); + return new DebeziumSourceOffsetStorageReader(initialOffset); + } + } + + private static class DebeziumSourceOffsetStorageReader implements OffsetStorageReader { + private final Map offset; + + DebeziumSourceOffsetStorageReader(Map initialOffset) { + this.offset = initialOffset; + } + + @Override + public Map offset(Map partition) { + return offsets(Collections.singletonList(partition)) + .getOrDefault(partition, ImmutableMap.of()); + } + + @Override + public Map, Map> offsets( + Collection> partitions) { + LOG.debug("-------------- GETTING OFFSETS!"); + + Map, Map> map = new HashMap<>(); + for (Map partition : partitions) { + map.put(partition, (Map) offset); + } + + LOG.debug("-------------- OFFSETS: {}", map); + return map; + } + } + + static class OffsetHolder implements Serializable { + public final @Nullable Map offset; + public final @Nullable List history; + public final @Nullable Integer fetchedRecords; + public final @Nullable Integer maxRecords; + public final long minutesToRun; + + OffsetHolder( + @Nullable Map offset, + @Nullable List history, + @Nullable Integer fetchedRecords, + @Nullable Integer maxRecords, + long minutesToRun) { + this.offset = offset; + this.history = history == null ? new ArrayList<>() : history; + this.fetchedRecords = fetchedRecords; + this.maxRecords = maxRecords; + this.minutesToRun = minutesToRun; + } + + OffsetHolder( + @Nullable Map offset, + @Nullable List history, + @Nullable Integer fetchedRecords) { + this(offset, history, fetchedRecords, null, -1); + } + } + + /** {@link RestrictionTracker} for Debezium connectors. */ + static class OffsetTracker extends RestrictionTracker> { + private OffsetHolder restriction; + private static final long MILLIS = 60 * 1000; + + OffsetTracker(OffsetHolder holder) { + this.restriction = holder; + } + + /** + * Overriding {@link #tryClaim} in order to stop fetching records from the database. + * + *

+ * <p>This works in two different ways:
+ *
+ * <h3>Number of records</h3>
+ *
+ * <p>This is the default behavior. Once the specified number of records has been reached, it
+ * will stop fetching them.
+ *
+ * <h3>Time based</h3>

    + * + * User may specify the amount of time the connector to be kept alive. Please see {@link + * KafkaSourceConsumerFn} for more details on this. + * + * @param position Currently not used + * @return boolean + */ + @Override + public boolean tryClaim(Map position) { + LOG.debug("-------------- Claiming {} used to have: {}", position, restriction.offset); + long elapsedTime = System.currentTimeMillis() - startTime.getMillis(); + int fetchedRecords = + this.restriction.fetchedRecords == null ? 0 : this.restriction.fetchedRecords + 1; + LOG.debug("------------Fetched records {} / {}", fetchedRecords, this.restriction.maxRecords); + LOG.debug( + "-------------- Time running: {} / {}", + elapsedTime, + (this.restriction.minutesToRun * MILLIS)); + this.restriction = + new OffsetHolder( + position, + this.restriction.history, + fetchedRecords, + this.restriction.maxRecords, + this.restriction.minutesToRun); + LOG.debug("-------------- History: {}", this.restriction.history); + + if (this.restriction.maxRecords == null && this.restriction.minutesToRun == -1) { + return true; + } + + if (this.restriction.maxRecords != null) { + return fetchedRecords < this.restriction.maxRecords; + } + + return elapsedTime < this.restriction.minutesToRun * MILLIS; + } + + @Override + public OffsetHolder currentRestriction() { + return restriction; + } + + @Override + public SplitResult trySplit(double fractionOfRemainder) { + LOG.debug("-------------- Trying to split: fractionOfRemainder={}", fractionOfRemainder); + return SplitResult.of(new OffsetHolder(null, null, null), restriction); + } + + @Override + public void checkDone() throws IllegalStateException {} + + @Override + public IsBounded isBounded() { + return IsBounded.BOUNDED; + } + } + + public static class DebeziumSDFDatabaseHistory extends AbstractDatabaseHistory { + private List history; + + public DebeziumSDFDatabaseHistory() { + this.history = new ArrayList<>(); + } + + @Override + public void start() { + super.start(); + LOG.debug( + "------------ STARTING THE DATABASE HISTORY! - trackers: {} - config: {}", + restrictionTrackers, + config.asMap()); + + // We fetch the first key to get the first restriction tracker. + // TODO(BEAM-11737): This may cause issues with multiple trackers in the future. + RestrictionTracker tracker = + restrictionTrackers.get(restrictionTrackers.keySet().iterator().next()); + this.history = (List) tracker.currentRestriction().history; + } + + @Override + protected void storeRecord(HistoryRecord record) throws DatabaseHistoryException { + LOG.debug("------------- Adding history! 
{}", record); + + history.add(DocumentWriter.defaultWriter().writeAsBytes(record.document())); + } + + @Override + protected void recoverRecords(Consumer consumer) { + LOG.debug("------------- Trying to recover!"); + + try { + for (byte[] record : history) { + Document doc = DocumentReader.defaultReader().read(record); + consumer.accept(new HistoryRecord(doc)); + } + } catch (IOException e) { + throw new RuntimeException(e); + } + } + + @Override + public boolean exists() { + return history != null && !history.isEmpty(); + } + + @Override + public boolean storageExists() { + return history != null && !history.isEmpty(); + } + } +} diff --git a/sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/SourceRecordJson.java b/sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/SourceRecordJson.java new file mode 100644 index 000000000000..e61884fa49ba --- /dev/null +++ b/sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/SourceRecordJson.java @@ -0,0 +1,287 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.io.debezium; + +import java.io.Serializable; +import java.util.HashMap; +import java.util.Map; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.gson.Gson; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.gson.GsonBuilder; +import org.apache.kafka.connect.data.Field; +import org.apache.kafka.connect.data.Struct; +import org.apache.kafka.connect.errors.DataException; +import org.apache.kafka.connect.source.SourceRecord; +import org.checkerframework.checker.nullness.qual.Nullable; + +/** + * This class can be used as a mapper for each {@link SourceRecord} retrieved. + * + *

+ * <h3>What it does</h3>
+ *
+ * <p>It maps any SourceRecord retrieved from any supported {@link io.debezium.connector} to JSON
+ *
+ * <h3>How it works</h3>
+ *
+ * <p>It will extract valuable fields from any given SourceRecord:
+ *
+ * <ul>
+ *   <li>before - {@link #loadBefore}
+ *   <li>after - {@link #loadAfter}
+ *   <li>metadata - {@link #loadMetadata}
+ *       <ul>
+ *         <li>schema - Database Schema
+ *         <li>connector - Connector used
+ *         <li>version - Connector version
+ *       </ul>
+ * </ul>
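+ *
+ * <p>For illustration, a single row change maps to JSON with the shape below. This mirrors the
+ * expected output asserted in SourceRecordJsonTest later in this change; the field names and
+ * values are only sample data:
+ *
+ * <pre>
+ *     {"metadata":{"connector":"test-connector","version":"version-connector",
+ *                  "name":"test-connector-sql","database":"test-db",
+ *                  "schema":"test-schema","table":"test-table"},
+ *      "before":{"fields":{"column1":"before-name"}},
+ *      "after":{"fields":{"column1":"after-name"}}}
+ * </pre>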

+ * <h3>Usage Example</h3>
+ *
+ * <p>Map each SourceRecord to JSON:
+ *
+ * <pre>
+ *     DebeziumIO.read()
+ *         .withFormatFunction(new SourceRecordJson.SourceRecordJsonMapper())
+ * </pre>
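+ *
+ * <p>A fuller wiring, sketched after the integration test added in this change
+ * (DebeziumIOMySqlConnectorIT); the host, port and credentials are placeholders:
+ *
+ * <pre>
+ *     pipeline.apply(
+ *         DebeziumIO.read()
+ *             .withConnectorConfiguration(
+ *                 DebeziumIO.ConnectorConfiguration.create()
+ *                     .withUsername("debezium")
+ *                     .withPassword("dbz")
+ *                     .withConnectorClass(MySqlConnector.class)
+ *                     .withHostName(host)
+ *                     .withPort(port))
+ *             .withFormatFunction(new SourceRecordJson.SourceRecordJsonMapper())
+ *             .withCoder(StringUtf8Coder.of()));
+ * </pre>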
    + */ +@SuppressWarnings({"nullness"}) +public class SourceRecordJson { + private final @Nullable SourceRecord sourceRecord; + private final @Nullable Struct value; + private final @Nullable Event event; + + /** + * Initializer. + * + * @param sourceRecord retrieved SourceRecord using a supported SourceConnector + */ + public SourceRecordJson(@Nullable SourceRecord sourceRecord) { + if (sourceRecord == null) { + throw new IllegalArgumentException(); + } + + this.sourceRecord = sourceRecord; + this.value = (Struct) sourceRecord.value(); + + if (this.value == null) { + this.event = new Event(null, null, null); + } else { + Event.Metadata metadata = this.loadMetadata(); + Event.Before before = this.loadBefore(); + Event.After after = this.loadAfter(); + + this.event = new Event(metadata, before, after); + } + } + + /** + * Extracts metadata from the SourceRecord. + * + * @return Metadata + */ + private Event.Metadata loadMetadata() { + @Nullable Struct source; + try { + source = (Struct) this.value.get("source"); + } catch (RuntimeException e) { + throw new IllegalArgumentException(); + } + @Nullable String schema; + + if (source == null) { + return null; + } + + try { + // PostgreSQL and SQL server use Schema + schema = source.getString("schema"); + } catch (DataException e) { + // MySQL uses file instead + schema = source.getString("file"); + } + + return new Event.Metadata( + source.getString("connector"), + source.getString("version"), + source.getString("name"), + source.getString("db"), + schema, + source.getString("table")); + } + + /** + * Extracts the before field within SourceRecord. + * + * @return Before + */ + private Event.Before loadBefore() { + @Nullable Struct before; + try { + before = (Struct) this.value.get("before"); + } catch (DataException e) { + return null; + } + if (before == null) { + return null; + } + + Map fields = new HashMap<>(); + for (Field field : before.schema().fields()) { + fields.put(field.name(), before.get(field)); + } + + return new Event.Before(fields); + } + + /** + * Extracts the after field within SourceRecord. + * + * @return After + */ + private Event.After loadAfter() { + @Nullable Struct after; + try { + after = (Struct) this.value.get("after"); + } catch (DataException e) { + return null; + } + if (after == null) { + return null; + } + + Map fields = new HashMap<>(); + for (Field field : after.schema().fields()) { + fields.put(field.name(), after.get(field)); + } + + return new Event.After(fields); + } + + /** + * Transforms the extracted data to a JSON string. + * + * @return JSON String + */ + public String toJson() { + return this.event.toJson(); + } + + /** {@link SourceRecordJson} implementation. */ + public static class SourceRecordJsonMapper implements SourceRecordMapper { + @Override + public String mapSourceRecord(SourceRecord sourceRecord) throws Exception { + return new SourceRecordJson(sourceRecord).toJson(); + } + } + + /** Depicts a SourceRecord as an Event in order for it to be mapped as JSON. */ + static class Event implements Serializable { + private final SourceRecordJson.Event.Metadata metadata; + private final SourceRecordJson.Event.Before before; + private final SourceRecordJson.Event.After after; + + /** + * Event Initializer. 
+ * + * @param metadata Metadata retrieved from SourceRecord + * @param before Before data retrieved from SourceRecord + * @param after After data retrieved from SourceRecord + */ + public Event( + SourceRecordJson.Event.Metadata metadata, + SourceRecordJson.Event.Before before, + SourceRecordJson.Event.After after) { + this.metadata = metadata; + this.before = before; + this.after = after; + } + + /** + * Transforms the Event to a JSON string. + * + * @return JSON String + */ + public String toJson() { + Gson gson = new GsonBuilder().serializeNulls().create(); + return gson.toJson(this); + } + + /** Depicts the metadata within a SourceRecord. It has valuable fields. */ + static class Metadata implements Serializable { + private final @Nullable String connector; + private final @Nullable String version; + private final @Nullable String name; + private final @Nullable String database; + private final @Nullable String schema; + private final @Nullable String table; + + /** + * Metadata Initializer. + * + * @param connector Connector used + * @param version Connector version + * @param name Connector name + * @param database DB name + * @param schema Schema name + * @param table Table name + */ + public Metadata( + @Nullable String connector, + @Nullable String version, + @Nullable String name, + @Nullable String database, + @Nullable String schema, + @Nullable String table) { + this.connector = connector; + this.version = version; + this.name = name; + this.database = database; + this.schema = schema; + this.table = table; + } + } + + /** Depicts the before field within SourceRecord. */ + static class Before implements Serializable { + private final @Nullable Map fields; + + /** + * Before Initializer. + * + * @param fields Key - Value map with information within Before + */ + public Before(@Nullable Map fields) { + this.fields = fields; + } + } + + /** Depicts the after field within SourceRecord. */ + static class After implements Serializable { + private final @Nullable Map fields; + + /** + * After Initializer. + * + * @param fields Key - Value map with information within After + */ + public After(@Nullable Map fields) { + this.fields = fields; + } + } + } +} diff --git a/runners/flink/1.8/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/BeamStoppableFunction.java b/sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/SourceRecordMapper.java similarity index 66% rename from runners/flink/1.8/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/BeamStoppableFunction.java rename to sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/SourceRecordMapper.java index 25eafd7ac114..65e42e915f45 100644 --- a/runners/flink/1.8/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/BeamStoppableFunction.java +++ b/sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/SourceRecordMapper.java @@ -15,15 +15,17 @@ * See the License for the specific language governing permissions and * limitations under the License. */ -package org.apache.beam.runners.flink.translation.wrappers.streaming.io; +package org.apache.beam.io.debezium; -import org.apache.flink.api.common.functions.StoppableFunction; +import java.io.Serializable; +import org.apache.kafka.connect.source.SourceRecord; /** - * Custom StoppableFunction for backward compatibility. + * Interface used to map a Kafka source record. * - * @see Flink - * interface removal commit. 
+ * @param The desired type you want to map the Kafka source record */ -public interface BeamStoppableFunction extends StoppableFunction {} +@FunctionalInterface +public interface SourceRecordMapper extends Serializable { + T mapSourceRecord(SourceRecord sourceRecord) throws Exception; +} diff --git a/sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/package-info.java b/sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/package-info.java new file mode 100644 index 000000000000..86ba1f593c9e --- /dev/null +++ b/sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/package-info.java @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Transforms for reading from DebeziumIO. + * + * @see org.apache.beam.io.debezium.DebeziumIO + */ +@Experimental(Kind.SOURCE_SINK) +package org.apache.beam.io.debezium; + +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; diff --git a/sdks/java/io/debezium/src/test/java/org/apache/beam/io/debezium/DebeziumIOMySqlConnectorIT.java b/sdks/java/io/debezium/src/test/java/org/apache/beam/io/debezium/DebeziumIOMySqlConnectorIT.java new file mode 100644 index 000000000000..6056ca0ebf6c --- /dev/null +++ b/sdks/java/io/debezium/src/test/java/org/apache/beam/io/debezium/DebeziumIOMySqlConnectorIT.java @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.io.debezium; + +import static org.apache.beam.sdk.testing.SerializableMatchers.hasItem; +import static org.hamcrest.MatcherAssert.assertThat; + +import io.debezium.connector.mysql.MySqlConnector; +import java.time.Duration; +import org.apache.beam.sdk.Pipeline; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.options.PipelineOptionsFactory; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.values.PCollection; +import org.junit.ClassRule; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; +import org.testcontainers.containers.MySQLContainer; +import org.testcontainers.containers.wait.strategy.HttpWaitStrategy; +import org.testcontainers.utility.DockerImageName; + +@RunWith(JUnit4.class) +public class DebeziumIOMySqlConnectorIT { + /** + * Debezium - MySqlContainer + * + *

    Creates a docker container using the image used by the debezium tutorial. + */ + @ClassRule + public static final MySQLContainer MY_SQL_CONTAINER = + new MySQLContainer<>( + DockerImageName.parse("debezium/example-mysql:1.4") + .asCompatibleSubstituteFor("mysql")) + .withPassword("debezium") + .withUsername("mysqluser") + .withExposedPorts(3306) + .waitingFor( + new HttpWaitStrategy() + .forPort(3306) + .forStatusCodeMatching(response -> response == 200) + .withStartupTimeout(Duration.ofMinutes(2))); + + /** + * Debezium - MySQL connector Test. + * + *

    Tests that connector can actually connect to the database + */ + @Test + public void testDebeziumIOMySql() { + MY_SQL_CONTAINER.start(); + + String host = MY_SQL_CONTAINER.getContainerIpAddress(); + String port = MY_SQL_CONTAINER.getMappedPort(3306).toString(); + + PipelineOptions options = PipelineOptionsFactory.create(); + Pipeline p = Pipeline.create(options); + PCollection results = + p.apply( + DebeziumIO.read() + .withConnectorConfiguration( + DebeziumIO.ConnectorConfiguration.create() + .withUsername("debezium") + .withPassword("dbz") + .withConnectorClass(MySqlConnector.class) + .withHostName(host) + .withPort(port) + .withConnectionProperty("database.server.id", "184054") + .withConnectionProperty("database.server.name", "dbserver1") + .withConnectionProperty("database.include.list", "inventory") + .withConnectionProperty("include.schema.changes", "false")) + .withFormatFunction(new SourceRecordJson.SourceRecordJsonMapper()) + .withMaxNumberOfRecords(30) + .withCoder(StringUtf8Coder.of())); + String expected = + "{\"metadata\":{\"connector\":\"mysql\",\"version\":\"1.3.1.Final\",\"name\":\"dbserver1\"," + + "\"database\":\"inventory\",\"schema\":\"mysql-bin.000003\",\"table\":\"addresses\"},\"before\":null," + + "\"after\":{\"fields\":{\"zip\":\"76036\",\"city\":\"Euless\"," + + "\"street\":\"3183 Moore Avenue\",\"id\":10,\"state\":\"Texas\",\"customer_id\":1001," + + "\"type\":\"SHIPPING\"}}}"; + + PAssert.that(results) + .satisfies( + (Iterable res) -> { + assertThat(res, hasItem(expected)); + return null; + }); + + p.run().waitUntilFinish(); + MY_SQL_CONTAINER.stop(); + } +} diff --git a/sdks/java/io/debezium/src/test/java/org/apache/beam/io/debezium/DebeziumIOTest.java b/sdks/java/io/debezium/src/test/java/org/apache/beam/io/debezium/DebeziumIOTest.java new file mode 100644 index 000000000000..ccf57b6cdec9 --- /dev/null +++ b/sdks/java/io/debezium/src/test/java/org/apache/beam/io/debezium/DebeziumIOTest.java @@ -0,0 +1,101 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.io.debezium; + +import static org.junit.Assert.assertThrows; +import static org.junit.Assert.assertTrue; + +import io.debezium.config.Configuration; +import io.debezium.connector.mysql.MySqlConnector; +import io.debezium.connector.mysql.MySqlConnectorConfig; +import java.io.Serializable; +import java.util.Map; +import org.apache.beam.io.debezium.DebeziumIO.ConnectorConfiguration; +import org.apache.kafka.common.config.ConfigValue; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** Test on the DebeziumIO. 
*/ +@RunWith(JUnit4.class) +public class DebeziumIOTest implements Serializable { + private static final Logger LOG = LoggerFactory.getLogger(DebeziumIOTest.class); + private static final ConnectorConfiguration MYSQL_CONNECTOR_CONFIGURATION = + ConnectorConfiguration.create() + .withUsername("debezium") + .withPassword("dbz") + .withHostName("127.0.0.1") + .withPort("3306") + .withConnectorClass(MySqlConnector.class) + .withConnectionProperty("database.server.id", "184054") + .withConnectionProperty("database.server.name", "dbserver1") + .withConnectionProperty("database.include.list", "inventory") + .withConnectionProperty( + "database.history", KafkaSourceConsumerFn.DebeziumSDFDatabaseHistory.class.getName()) + .withConnectionProperty("include.schema.changes", "false"); + + @Test + public void testSourceMySqlConnectorValidConfiguration() { + Map configurationMap = MYSQL_CONNECTOR_CONFIGURATION.getConfigurationMap(); + + Configuration debeziumConf = Configuration.from(configurationMap); + Map validConfig = debeziumConf.validate(MySqlConnectorConfig.ALL_FIELDS); + + for (ConfigValue configValue : validConfig.values()) { + assertTrue(configValue.errorMessages().isEmpty()); + } + } + + @Test + public void testSourceConnectorUsernamePassword() { + String username = "debezium"; + String password = "dbz"; + ConnectorConfiguration configuration = + MYSQL_CONNECTOR_CONFIGURATION.withUsername(username).withPassword(password); + Map configurationMap = configuration.getConfigurationMap(); + + Configuration debeziumConf = Configuration.from(configurationMap); + Map validConfig = debeziumConf.validate(MySqlConnectorConfig.ALL_FIELDS); + + for (ConfigValue configValue : validConfig.values()) { + assertTrue(configValue.errorMessages().isEmpty()); + } + } + + @Test + public void testSourceConnectorNullPassword() { + String username = "debezium"; + String password = null; + + assertThrows( + IllegalArgumentException.class, + () -> MYSQL_CONNECTOR_CONFIGURATION.withUsername(username).withPassword(password)); + } + + @Test + public void testSourceConnectorNullUsernameAndPassword() { + String username = null; + String password = null; + + assertThrows( + IllegalArgumentException.class, + () -> MYSQL_CONNECTOR_CONFIGURATION.withUsername(username).withPassword(password)); + } +} diff --git a/sdks/java/io/debezium/src/test/java/org/apache/beam/io/debezium/KafkaSourceConsumerFnTest.java b/sdks/java/io/debezium/src/test/java/org/apache/beam/io/debezium/KafkaSourceConsumerFnTest.java new file mode 100644 index 000000000000..c22f8a33dd2b --- /dev/null +++ b/sdks/java/io/debezium/src/test/java/org/apache/beam/io/debezium/KafkaSourceConsumerFnTest.java @@ -0,0 +1,264 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.io.debezium; + +import java.io.Serializable; +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import org.apache.beam.sdk.Pipeline; +import org.apache.beam.sdk.coders.MapCoder; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.coders.VarIntCoder; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; +import org.apache.kafka.common.config.AbstractConfig; +import org.apache.kafka.common.config.ConfigDef; +import org.apache.kafka.connect.connector.Task; +import org.apache.kafka.connect.data.Schema; +import org.apache.kafka.connect.source.SourceConnector; +import org.apache.kafka.connect.source.SourceRecord; +import org.apache.kafka.connect.source.SourceTask; +import org.apache.kafka.connect.source.SourceTaskContext; +import org.checkerframework.checker.nullness.qual.Nullable; +import org.junit.Assert; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +@RunWith(JUnit4.class) +public class KafkaSourceConsumerFnTest implements Serializable { + @Test + public void testKafkaSourceConsumerFn() { + Map config = + ImmutableMap.of( + "from", "1", + "to", "10", + "delay", "0.4", + "topic", "any"); + + Pipeline pipeline = Pipeline.create(); + + PCollection counts = + pipeline + .apply( + Create.of(Lists.newArrayList(config)) + .withCoder(MapCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of()))) + .apply( + ParDo.of( + new KafkaSourceConsumerFn<>( + CounterSourceConnector.class, + sourceRecord -> (Integer) sourceRecord.value(), + 10))) + .setCoder(VarIntCoder.of()); + + PAssert.that(counts).containsInAnyOrder(1, 2, 3, 4, 5, 6, 7, 8, 9, 10); + pipeline.run().waitUntilFinish(); + } + + @Test + public void testStoppableKafkaSourceConsumerFn() { + Map config = + ImmutableMap.of( + "from", "1", + "to", "3", + "delay", "0.2", + "topic", "any"); + + Pipeline pipeline = Pipeline.create(); + + PCollection counts = + pipeline + .apply( + Create.of(Lists.newArrayList(config)) + .withCoder(MapCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of()))) + .apply( + ParDo.of( + new KafkaSourceConsumerFn<>( + CounterSourceConnector.class, + sourceRecord -> (Integer) sourceRecord.value(), + 1))) + .setCoder(VarIntCoder.of()); + + pipeline.run().waitUntilFinish(); + Assert.assertEquals(3, CounterTask.getCountTasks()); + } +} + +class CounterSourceConnector extends SourceConnector { + public static class CounterSourceConnectorConfig extends AbstractConfig { + final Map props; + + CounterSourceConnectorConfig(Map props) { + super(configDef(), props); + this.props = props; + } + + protected static ConfigDef configDef() { + return new ConfigDef() + .define("from", ConfigDef.Type.INT, ConfigDef.Importance.HIGH, "Number to start from") + .define("to", ConfigDef.Type.INT, ConfigDef.Importance.HIGH, "Number to go to") + .define( + "delay", ConfigDef.Type.DOUBLE, ConfigDef.Importance.HIGH, "Time between each event") + .define( + "topic", + ConfigDef.Type.STRING, + ConfigDef.Importance.HIGH, + "Name of Kafka topic to produce to"); + } + } + + @Nullable private CounterSourceConnectorConfig connectorConfig; + + @Override + public void start(Map props) { + this.connectorConfig = new 
CounterSourceConnectorConfig(props); + } + + @Override + public Class taskClass() { + return CounterTask.class; + } + + @Override + public List> taskConfigs(int maxTasks) { + if (this.connectorConfig == null || this.connectorConfig.props == null) { + return Collections.emptyList(); + } + + return Collections.singletonList( + ImmutableMap.of( + "from", this.connectorConfig.props.get("from"), + "to", this.connectorConfig.props.get("to"), + "delay", this.connectorConfig.props.get("delay"), + "topic", this.connectorConfig.props.get("topic"))); + } + + @Override + public void stop() {} + + @Override + public ConfigDef config() { + return CounterSourceConnectorConfig.configDef(); + } + + @Override + public String version() { + return "ONE"; + } +} + +class CounterTask extends SourceTask { + private static int countStopTasks = 0; + private String topic = ""; + private Integer from = 0; + private Integer to = 0; + private Double delay = 0.0; + + private Long start = System.currentTimeMillis(); + private Integer last = 0; + private Object lastOffset = null; + + private static final String PARTITION_FIELD = "mod"; + private static final Integer PARTITION_NAME = 1; + + @Override + public String version() { + return "ONE"; + } + + @Override + public void initialize(SourceTaskContext context) { + super.initialize(context); + + Map offset = + context + .offsetStorageReader() + .offset(Collections.singletonMap(PARTITION_FIELD, PARTITION_NAME)); + + if (offset == null) { + this.start = System.currentTimeMillis(); + this.last = 0; + } else { + this.start = (Long) offset.get("start"); + this.last = ((Long) offset.getOrDefault("last", 0)).intValue(); + } + this.lastOffset = offset; + } + + @Override + public void start(Map props) { + this.topic = props.getOrDefault("topic", ""); + this.from = Integer.parseInt(props.getOrDefault("from", "0")); + this.to = Integer.parseInt(props.getOrDefault("to", "0")); + this.delay = Double.parseDouble(props.getOrDefault("delay", "0")); + + if (this.lastOffset != null) { + return; + } + + this.start = + props.containsKey("start") + ? 
Long.parseLong(props.get("start")) + : System.currentTimeMillis(); + this.last = this.from - 1; + } + + @Override + public List poll() throws InterruptedException { + if (this.last.equals(to)) { + return null; + } + + List records = new ArrayList<>(); + Long callTime = System.currentTimeMillis(); + Long secondsSinceStart = (callTime - this.start) / 1000; + Long recordsToOutput = Math.round(Math.floor(secondsSinceStart / this.delay)); + + while (this.last < this.to) { + this.last = this.last + 1; + Map sourcePartition = Collections.singletonMap(PARTITION_FIELD, 1); + Map sourceOffset = + ImmutableMap.of("last", this.last.longValue(), "start", this.start); + + records.add( + new SourceRecord( + sourcePartition, sourceOffset, this.topic, Schema.INT64_SCHEMA, this.last)); + + if (records.size() >= recordsToOutput) { + break; + } + } + + return records; + } + + @Override + public void stop() { + CounterTask.countStopTasks++; + } + + public static int getCountTasks() { + return CounterTask.countStopTasks; + } +} diff --git a/sdks/java/io/debezium/src/test/java/org/apache/beam/io/debezium/OffsetTrackerTest.java b/sdks/java/io/debezium/src/test/java/org/apache/beam/io/debezium/OffsetTrackerTest.java new file mode 100644 index 000000000000..b8a6c9b031e3 --- /dev/null +++ b/sdks/java/io/debezium/src/test/java/org/apache/beam/io/debezium/OffsetTrackerTest.java @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.io.debezium; + +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertTrue; + +import io.debezium.connector.mysql.MySqlConnector; +import java.io.IOException; +import java.io.Serializable; +import java.util.HashMap; +import java.util.Map; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +@RunWith(JUnit4.class) +public class OffsetTrackerTest implements Serializable { + @Test + public void testRestrictByNumberOfRecords() throws IOException { + Integer maxNumRecords = 10; + Map position = new HashMap<>(); + KafkaSourceConsumerFn kafkaSourceConsumerFn = + new KafkaSourceConsumerFn( + MySqlConnector.class, new SourceRecordJson.SourceRecordJsonMapper(), maxNumRecords); + KafkaSourceConsumerFn.OffsetHolder restriction = + kafkaSourceConsumerFn.getInitialRestriction(new HashMap<>()); + KafkaSourceConsumerFn.OffsetTracker tracker = + new KafkaSourceConsumerFn.OffsetTracker(restriction); + + for (int records = 0; records < maxNumRecords; records++) { + assertTrue("OffsetTracker should continue", tracker.tryClaim(position)); + } + assertFalse("OffsetTracker should stop", tracker.tryClaim(position)); + } + + @Test + public void testRestrictByAmountOfTime() throws IOException, InterruptedException { + long millis = 60 * 1000; + long minutesToRun = 1; + Map position = new HashMap<>(); + KafkaSourceConsumerFn kafkaSourceConsumerFn = + new KafkaSourceConsumerFn( + MySqlConnector.class, new SourceRecordJson.SourceRecordJsonMapper(), minutesToRun); + KafkaSourceConsumerFn.OffsetHolder restriction = + kafkaSourceConsumerFn.getInitialRestriction(new HashMap<>()); + KafkaSourceConsumerFn.OffsetTracker tracker = + new KafkaSourceConsumerFn.OffsetTracker(restriction); + + assertTrue(tracker.tryClaim(position)); + + Thread.sleep(minutesToRun * millis + 100); + + assertFalse(tracker.tryClaim(position)); + } +} diff --git a/sdks/java/io/debezium/src/test/java/org/apache/beam/io/debezium/SourceRecordJsonTest.java b/sdks/java/io/debezium/src/test/java/org/apache/beam/io/debezium/SourceRecordJsonTest.java new file mode 100644 index 000000000000..badd01eee292 --- /dev/null +++ b/sdks/java/io/debezium/src/test/java/org/apache/beam/io/debezium/SourceRecordJsonTest.java @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.io.debezium; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertThrows; + +import java.io.Serializable; +import org.apache.kafka.connect.data.Schema; +import org.apache.kafka.connect.data.SchemaBuilder; +import org.apache.kafka.connect.data.Struct; +import org.apache.kafka.connect.source.SourceRecord; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +@RunWith(JUnit4.class) +public class SourceRecordJsonTest implements Serializable { + @Test + public void testSourceRecordJson() { + SourceRecord record = this.buildSourceRecord(); + SourceRecordJson json = new SourceRecordJson(record); + + String jsonString = json.toJson(); + + String expectedJson = + "{\"metadata\":" + + "{\"connector\":\"test-connector\"," + + "\"version\":\"version-connector\"," + + "\"name\":\"test-connector-sql\"," + + "\"database\":\"test-db\"," + + "\"schema\":\"test-schema\"," + + "\"table\":\"test-table\"}," + + "\"before\":{\"fields\":{\"column1\":\"before-name\"}}," + + "\"after\":{\"fields\":{\"column1\":\"after-name\"}}}"; + + assertEquals(expectedJson, jsonString); + } + + @Test + public void testSourceRecordJsonWhenSourceRecordIsNull() { + assertThrows(IllegalArgumentException.class, () -> new SourceRecordJson(null)); + } + + private Schema buildSourceSchema() { + return SchemaBuilder.struct() + .field("connector", Schema.STRING_SCHEMA) + .field("version", Schema.STRING_SCHEMA) + .field("name", Schema.STRING_SCHEMA) + .field("db", Schema.STRING_SCHEMA) + .field("schema", Schema.STRING_SCHEMA) + .field("table", Schema.STRING_SCHEMA) + .build(); + } + + private Schema buildBeforeSchema() { + return SchemaBuilder.struct().field("column1", Schema.STRING_SCHEMA).build(); + } + + private Schema buildAfterSchema() { + return SchemaBuilder.struct().field("column1", Schema.STRING_SCHEMA).build(); + } + + private SourceRecord buildSourceRecord() { + final Schema sourceSchema = this.buildSourceSchema(); + final Schema beforeSchema = this.buildBeforeSchema(); + final Schema afterSchema = this.buildAfterSchema(); + + final Schema schema = + SchemaBuilder.struct() + .name("test") + .field("source", sourceSchema) + .field("before", beforeSchema) + .field("after", afterSchema) + .build(); + + final Struct source = new Struct(sourceSchema); + final Struct before = new Struct(beforeSchema); + final Struct after = new Struct(afterSchema); + final Struct value = new Struct(schema); + + source.put("connector", "test-connector"); + source.put("version", "version-connector"); + source.put("name", "test-connector-sql"); + source.put("db", "test-db"); + source.put("schema", "test-schema"); + source.put("table", "test-table"); + + before.put("column1", "before-name"); + after.put("column1", "after-name"); + + value.put("source", source); + value.put("before", before); + value.put("after", after); + + return new SourceRecord(null, null, null, schema, value); + } +} diff --git a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-2/build.gradle b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-2/build.gradle deleted file mode 100644 index a627e6236bd7..000000000000 --- a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-2/build.gradle +++ /dev/null @@ -1,48 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. 
The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * License); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an AS IS BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -plugins { id 'org.apache.beam.module' } -applyJavaNature( - publish: false, - archivesBaseName: 'beam-sdks-java-io-elasticsearch-tests-2' -) -provideIntegrationTestingDependencies() -enableJavaPerformanceTesting() - -description = "Apache Beam :: SDKs :: Java :: IO :: Elasticsearch-Tests :: 2.x" -ext.summary = "Tests of ElasticsearchIO on Elasticsearch 2.x" - -def log4j_version = "2.6.2" -def elastic_search_version = "2.4.1" - -dependencies { - testCompile project(path: ":sdks:java:io:elasticsearch-tests:elasticsearch-tests-common", configuration: "testRuntime") - testCompile project(path: ":sdks:java:core", configuration: "shadow") - testCompile project(":sdks:java:io:elasticsearch") - testCompile project(path: ":sdks:java:io:common", configuration: "testRuntime") - testCompile library.java.slf4j_api - testCompile "org.apache.logging.log4j:log4j-api:$log4j_version" - testCompile "org.apache.logging.log4j:log4j-core:$log4j_version" - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library - testCompile library.java.junit - testCompile "org.elasticsearch.client:elasticsearch-rest-client:7.9.2" - testCompile "org.elasticsearch:elasticsearch:$elastic_search_version" - testRuntimeOnly library.java.slf4j_jdk14 - testRuntimeOnly project(path: ":runners:direct-java", configuration: "shadow") -} diff --git a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-2/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOIT.java b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-2/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOIT.java deleted file mode 100644 index 2f31c7f795a1..000000000000 --- a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-2/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOIT.java +++ /dev/null @@ -1,180 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ -package org.apache.beam.sdk.io.elasticsearch; - -import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.ConnectionConfiguration; - -import org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOITCommon.ElasticsearchPipelineOptions; -import org.apache.beam.sdk.options.PipelineOptionsFactory; -import org.apache.beam.sdk.testing.TestPipeline; -import org.elasticsearch.client.RestClient; -import org.junit.AfterClass; -import org.junit.BeforeClass; -import org.junit.Rule; -import org.junit.Test; - -/** - * A test of {@link ElasticsearchIO} on an independent Elasticsearch v2.x instance. - * - *

    This test requires a running instance of Elasticsearch, and the test dataset must exist in the - * database. See {@link ElasticsearchIOITCommon} for instructions to achieve this. - * - *

    You can run this test by doing the following from the beam parent module directory with the - * correct server IP: - * - *

    - *  ./gradlew integrationTest -p sdks/java/io/elasticsearch-tests/elasticsearch-tests-2
    - *  -DintegrationTestPipelineOptions='[
    - *  "--elasticsearchServer=1.2.3.4",
    - *  "--elasticsearchHttpPort=9200"]'
    - *  --tests org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOIT
    - *  -DintegrationTestRunner=direct
    - * 
    - * - *

    It is likely that you will need to configure thread_pool.bulk.queue_size: 250 (or - * higher) in the backend Elasticsearch server for this test to run. - */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) -public class ElasticsearchIOIT { - private static RestClient restClient; - private static ElasticsearchPipelineOptions options; - private static ConnectionConfiguration readConnectionConfiguration; - private static ConnectionConfiguration writeConnectionConfiguration; - private static ConnectionConfiguration updateConnectionConfiguration; - private static ElasticsearchIOTestCommon elasticsearchIOTestCommon; - - @Rule public TestPipeline pipeline = TestPipeline.create(); - - @BeforeClass - public static void beforeClass() throws Exception { - PipelineOptionsFactory.register(ElasticsearchPipelineOptions.class); - options = TestPipeline.testingPipelineOptions().as(ElasticsearchPipelineOptions.class); - readConnectionConfiguration = - ElasticsearchIOITCommon.getConnectionConfiguration( - options, ElasticsearchIOITCommon.IndexMode.READ); - writeConnectionConfiguration = - ElasticsearchIOITCommon.getConnectionConfiguration( - options, ElasticsearchIOITCommon.IndexMode.WRITE); - updateConnectionConfiguration = - ElasticsearchIOITCommon.getConnectionConfiguration( - options, ElasticsearchIOITCommon.IndexMode.WRITE_PARTIAL); - restClient = readConnectionConfiguration.createClient(); - elasticsearchIOTestCommon = - new ElasticsearchIOTestCommon(readConnectionConfiguration, restClient, true); - } - - @AfterClass - public static void afterClass() throws Exception { - ElasticsearchIOTestUtils.deleteIndex(writeConnectionConfiguration, restClient); - ElasticsearchIOTestUtils.deleteIndex(updateConnectionConfiguration, restClient); - restClient.close(); - } - - @Test - public void testSplitsVolume() throws Exception { - elasticsearchIOTestCommon.testSplit(0); - } - - @Test - public void testReadVolume() throws Exception { - elasticsearchIOTestCommon.setPipeline(pipeline); - elasticsearchIOTestCommon.testRead(); - } - - @Test - public void testWriteVolume() throws Exception { - ElasticsearchIOTestCommon elasticsearchIOTestCommonWrite = - new ElasticsearchIOTestCommon(writeConnectionConfiguration, restClient, true); - elasticsearchIOTestCommonWrite.setPipeline(pipeline); - elasticsearchIOTestCommonWrite.testWrite(); - } - - @Test - public void testSizesVolume() throws Exception { - elasticsearchIOTestCommon.testSizes(); - } - - /** - * This test verifies volume loading of Elasticsearch using explicit document IDs and routed to an - * index named the same as the scientist, and type which is based on the modulo 2 of the scientist - * name. The goal of this IT is to help observe and verify that the overhead of adding the - * functions to parse the document and extract the ID is acceptable. - */ - @Test - public void testWriteVolumeWithFullAddressing() throws Exception { - // cannot share elasticsearchIOTestCommon because tests run in parallel. - ElasticsearchIOTestCommon elasticsearchIOTestCommonWrite = - new ElasticsearchIOTestCommon(writeConnectionConfiguration, restClient, true); - elasticsearchIOTestCommonWrite.setPipeline(pipeline); - elasticsearchIOTestCommonWrite.testWriteWithFullAddressing(); - } - - /** - * This test verifies volume partial updates of Elasticsearch. The test dataset index is cloned - * and then a new field is added to each document using a partial update. The test then asserts - * the updates were applied. 
- */ - @Test - public void testWritePartialUpdate() throws Exception { - ElasticsearchIOTestUtils.copyIndex( - restClient, - readConnectionConfiguration.getIndex(), - updateConnectionConfiguration.getIndex()); - // cannot share elasticsearchIOTestCommon because tests run in parallel. - ElasticsearchIOTestCommon elasticsearchIOTestCommonUpdate = - new ElasticsearchIOTestCommon(updateConnectionConfiguration, restClient, true); - elasticsearchIOTestCommonUpdate.setPipeline(pipeline); - elasticsearchIOTestCommonUpdate.testWritePartialUpdate(); - } - - /** - * This test verifies volume deletes of Elasticsearch. The test dataset index is cloned and then - * around half of the documents are deleted and the other half is partially updated using bulk - * delete request. The test then asserts the documents were deleted successfully. - */ - @Test - public void testWriteWithIsDeletedFnWithPartialUpdates() throws Exception { - ElasticsearchIOTestUtils.copyIndex( - restClient, - readConnectionConfiguration.getIndex(), - updateConnectionConfiguration.getIndex()); - ElasticsearchIOTestCommon elasticsearchIOTestCommonDeleteFn = - new ElasticsearchIOTestCommon(updateConnectionConfiguration, restClient, true); - elasticsearchIOTestCommonDeleteFn.setPipeline(pipeline); - elasticsearchIOTestCommonDeleteFn.testWriteWithIsDeletedFnWithPartialUpdates(); - } - - /** - * This test verifies volume deletes of Elasticsearch. The test dataset index is cloned and then - * around half of the documents are deleted using bulk delete request. The test then asserts the - * documents were deleted successfully. - */ - @Test - public void testWriteWithIsDeletedFnWithoutPartialUpdate() throws Exception { - ElasticsearchIOTestUtils.copyIndex( - restClient, - readConnectionConfiguration.getIndex(), - updateConnectionConfiguration.getIndex()); - ElasticsearchIOTestCommon elasticsearchIOTestCommonDeleteFn = - new ElasticsearchIOTestCommon(updateConnectionConfiguration, restClient, true); - elasticsearchIOTestCommonDeleteFn.setPipeline(pipeline); - elasticsearchIOTestCommonDeleteFn.testWriteWithIsDeletedFnWithoutPartialUpdate(); - } -} diff --git a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-2/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTest.java b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-2/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTest.java deleted file mode 100644 index 587b8fbdedc9..000000000000 --- a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-2/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTest.java +++ /dev/null @@ -1,237 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ -package org.apache.beam.sdk.io.elasticsearch; - -import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.ConnectionConfiguration; -import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestCommon.ES_TYPE; -import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestCommon.getEsIndex; - -import java.io.IOException; -import java.io.Serializable; -import org.apache.beam.sdk.io.common.NetworkTestHelper; -import org.apache.beam.sdk.testing.TestPipeline; -import org.elasticsearch.client.Request; -import org.elasticsearch.client.RestClient; -import org.elasticsearch.common.settings.Settings; -import org.elasticsearch.node.Node; -import org.junit.AfterClass; -import org.junit.Before; -import org.junit.BeforeClass; -import org.junit.ClassRule; -import org.junit.Rule; -import org.junit.Test; -import org.junit.rules.ExpectedException; -import org.junit.rules.TemporaryFolder; -import org.junit.runner.RunWith; -import org.junit.runners.JUnit4; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -/** Tests for {@link ElasticsearchIO} version 2.x. */ -@RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) -public class ElasticsearchIOTest implements Serializable { - - private static final Logger LOG = LoggerFactory.getLogger(ElasticsearchIOTest.class); - - private static final String ES_IP = "127.0.0.1"; - private static final int MAX_STARTUP_WAITING_TIME_MSEC = 5000; - private static int esHttpPort; - private static Node node; - private static RestClient restClient; - private static ConnectionConfiguration connectionConfiguration; - // cannot use inheritance because ES5 test already extends ESIntegTestCase. - private static ElasticsearchIOTestCommon elasticsearchIOTestCommon; - - @ClassRule public static final TemporaryFolder TEMPORARY_FOLDER = new TemporaryFolder(); - - @Rule public TestPipeline pipeline = TestPipeline.create(); - - @BeforeClass - public static void beforeClass() throws IOException { - esHttpPort = NetworkTestHelper.getAvailableLocalPort(); - LOG.info("Starting embedded Elasticsearch instance ({})", esHttpPort); - Settings.Builder settingsBuilder = - Settings.settingsBuilder() - .put("cluster.name", "beam") - .put("http.enabled", "true") - .put("node.data", "true") - .put("path.data", TEMPORARY_FOLDER.getRoot().getPath()) - .put("path.home", TEMPORARY_FOLDER.getRoot().getPath()) - .put("node.name", "beam") - .put("network.host", ES_IP) - .put("http.port", esHttpPort) - .put("index.store.stats_refresh_interval", 0) - // had problems with some jdk, embedded ES was too slow for bulk insertion, - // and queue of 50 was full. 
No pb with real ES instance (cf testWrite integration test) - .put("threadpool.bulk.queue_size", 400); - node = new Node(settingsBuilder.build()); - LOG.info("Elasticsearch node created"); - node.start(); - connectionConfiguration = - ConnectionConfiguration.create( - new String[] {"http://" + ES_IP + ":" + esHttpPort}, getEsIndex(), ES_TYPE) - .withSocketTimeout(120000) - .withConnectTimeout(5000); - restClient = connectionConfiguration.createClient(); - elasticsearchIOTestCommon = - new ElasticsearchIOTestCommon(connectionConfiguration, restClient, false); - int waitingTime = 0; - int healthCheckFrequency = 500; - Request request = new Request("HEAD", "/"); - while ((waitingTime < MAX_STARTUP_WAITING_TIME_MSEC) - && restClient.performRequest(request).getStatusLine().getStatusCode() != 200) { - try { - Thread.sleep(healthCheckFrequency); - waitingTime += healthCheckFrequency; - } catch (InterruptedException e) { - LOG.warn( - "Waiting thread was interrupted while waiting for connection to Elasticsearch to be available"); - } - } - if (waitingTime >= MAX_STARTUP_WAITING_TIME_MSEC) { - throw new IOException("Max startup waiting for embedded Elasticsearch to start was exceeded"); - } - } - - @AfterClass - public static void afterClass() throws IOException { - restClient.close(); - node.close(); - } - - @Before - public void before() throws Exception { - ElasticsearchIOTestUtils.deleteIndex(connectionConfiguration, restClient); - } - - @Test - public void testSizes() throws Exception { - elasticsearchIOTestCommon.testSizes(); - } - - @Test - public void testRead() throws Exception { - elasticsearchIOTestCommon.setPipeline(pipeline); - elasticsearchIOTestCommon.testRead(); - } - - @Test - public void testReadWithQueryString() throws Exception { - elasticsearchIOTestCommon.setPipeline(pipeline); - elasticsearchIOTestCommon.testReadWithQueryString(); - } - - @Test - public void testReadWithQueryValueProvider() throws Exception { - elasticsearchIOTestCommon.setPipeline(pipeline); - elasticsearchIOTestCommon.testReadWithQueryValueProvider(); - } - - @Test - public void testWrite() throws Exception { - elasticsearchIOTestCommon.setPipeline(pipeline); - elasticsearchIOTestCommon.testWrite(); - } - - @Rule public ExpectedException expectedException = ExpectedException.none(); - - @Test - public void testWriteWithErrors() throws Exception { - elasticsearchIOTestCommon.setExpectedException(expectedException); - elasticsearchIOTestCommon.testWriteWithErrors(); - } - - @Test - public void testWriteWithMaxBatchSize() throws Exception { - elasticsearchIOTestCommon.testWriteWithMaxBatchSize(); - } - - @Test - public void testWriteWithMaxBatchSizeBytes() throws Exception { - elasticsearchIOTestCommon.testWriteWithMaxBatchSizeBytes(); - } - - @Test - public void testSplit() throws Exception { - elasticsearchIOTestCommon.testSplit(0); - } - - @Test - public void testWriteWithIdFn() throws Exception { - elasticsearchIOTestCommon.setPipeline(pipeline); - elasticsearchIOTestCommon.testWriteWithIdFn(); - } - - @Test - public void testWriteWithIndexFn() throws Exception { - elasticsearchIOTestCommon.setPipeline(pipeline); - elasticsearchIOTestCommon.testWriteWithIndexFn(); - } - - @Test - public void testWriteWithTypeFn() throws Exception { - elasticsearchIOTestCommon.setPipeline(pipeline); - elasticsearchIOTestCommon.testWriteWithTypeFn2x5x(); - } - - @Test - public void testWriteFullAddressing() throws Exception { - elasticsearchIOTestCommon.setPipeline(pipeline); - 
elasticsearchIOTestCommon.testWriteWithFullAddressing(); - } - - @Test - public void testWritePartialUpdate() throws Exception { - elasticsearchIOTestCommon.setPipeline(pipeline); - elasticsearchIOTestCommon.testWritePartialUpdate(); - } - - @Test - public void testReadWithMetadata() throws Exception { - elasticsearchIOTestCommon.setPipeline(pipeline); - elasticsearchIOTestCommon.testReadWithMetadata(); - } - - @Test - public void testDefaultRetryPredicate() throws IOException { - elasticsearchIOTestCommon.testDefaultRetryPredicate(restClient); - } - - @Test - public void testWriteRetry() throws Throwable { - elasticsearchIOTestCommon.setExpectedException(expectedException); - elasticsearchIOTestCommon.setPipeline(pipeline); - elasticsearchIOTestCommon.testWriteRetry(); - } - - @Test - public void testWriteRetryValidRequest() throws Exception { - elasticsearchIOTestCommon.setPipeline(pipeline); - elasticsearchIOTestCommon.testWriteRetryValidRequest(); - } - - @Test - public void testWriteWithIsDeleteFn() throws Exception { - elasticsearchIOTestCommon.setPipeline(pipeline); - elasticsearchIOTestCommon.testWriteWithIsDeletedFnWithPartialUpdates(); - elasticsearchIOTestCommon.testWriteWithIsDeletedFnWithoutPartialUpdate(); - } -} diff --git a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-5/build.gradle b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-5/build.gradle index b017cac99a5a..f534f591c10f 100644 --- a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-5/build.gradle +++ b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-5/build.gradle @@ -27,20 +27,15 @@ enableJavaPerformanceTesting() description = "Apache Beam :: SDKs :: Java :: IO :: Elasticsearch-Tests :: 5.x" ext.summary = "Tests of ElasticsearchIO on Elasticsearch 5.x" -test { - // needed for ESIntegTestCase - systemProperty "tests.security.manager", "false" -} - -def log4j_version = "2.6.2" +def log4j_version = "2.14.1" def elastic_search_version = "5.6.3" configurations.all { resolutionStrategy { // Make sure the log4j versions for api and core match instead of taking the default // Gradle rule of using the latest. 
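+    // After this change log4j is only needed at test runtime (see the testRuntimeOnly entries below); both artifacts stay pinned to the same 2.14.1 release.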
- force "org.apache.logging.log4j:log4j-api:$log4j_version" force "org.apache.logging.log4j:log4j-core:$log4j_version" + force "org.apache.logging.log4j:log4j-api:$log4j_version" } } @@ -50,17 +45,18 @@ dependencies { testCompile "org.elasticsearch.plugin:transport-netty4-client:$elastic_search_version" testCompile "com.carrotsearch.randomizedtesting:randomizedtesting-runner:2.5.0" testCompile "org.elasticsearch:elasticsearch:$elastic_search_version" + testImplementation "org.testcontainers:elasticsearch:1.15.3" testCompile project(path: ":sdks:java:core", configuration: "shadow") testCompile project(":sdks:java:io:elasticsearch") testCompile project(path: ":sdks:java:io:common", configuration: "testRuntime") - testCompile "org.apache.logging.log4j:log4j-core:$log4j_version" - testCompile "org.apache.logging.log4j:log4j-api:$log4j_version" testCompile library.java.slf4j_api testCompile library.java.hamcrest_core testCompile library.java.hamcrest_library testCompile library.java.junit testCompile "org.elasticsearch.client:elasticsearch-rest-client:$elastic_search_version" + testRuntimeOnly "org.apache.logging.log4j:log4j-api:$log4j_version" + testRuntimeOnly "org.apache.logging.log4j:log4j-core:$log4j_version" testRuntimeOnly library.java.slf4j_jdk14 testRuntimeOnly project(path: ":runners:direct-java", configuration: "shadow") } diff --git a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-5/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOIT.java b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-5/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOIT.java index be3b88113d93..9503853071fb 100644 --- a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-5/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOIT.java +++ b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-5/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOIT.java @@ -52,9 +52,6 @@ * higher) in the backend Elasticsearch server for this test to run. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ElasticsearchIOIT { private static RestClient restClient; private static ElasticsearchPipelineOptions options; @@ -110,6 +107,15 @@ public void testWriteVolume() throws Exception { elasticsearchIOTestCommonWrite.testWrite(); } + @Test + public void testWriteVolumeStateful() throws Exception { + // cannot share elasticsearchIOTestCommon because tests run in parallel. 
+ ElasticsearchIOTestCommon elasticsearchIOTestCommonWrite = + new ElasticsearchIOTestCommon(writeConnectionConfiguration, restClient, true); + elasticsearchIOTestCommonWrite.setPipeline(pipeline); + elasticsearchIOTestCommonWrite.testWriteStateful(); + } + @Test public void testSizesVolume() throws Exception { elasticsearchIOTestCommon.testSizes(); @@ -130,6 +136,29 @@ public void testWriteWithFullAddressingVolume() throws Exception { elasticsearchIOTestCommonWrite.testWriteWithFullAddressing(); } + @Test + public void testWriteWithAllowableErrors() throws Exception { + elasticsearchIOTestCommon.testWriteWithAllowedErrors(); + } + + @Test + public void testWriteWithRouting() throws Exception { + elasticsearchIOTestCommon.setPipeline(pipeline); + elasticsearchIOTestCommon.testWriteWithRouting(); + } + + @Test + public void testWriteScriptedUpsert() throws Exception { + elasticsearchIOTestCommon.setPipeline(pipeline); + elasticsearchIOTestCommon.testWriteScriptedUpsert(); + } + + @Test + public void testWriteWithDocVersion() throws Exception { + elasticsearchIOTestCommon.setPipeline(pipeline); + elasticsearchIOTestCommon.testWriteWithDocVersion(); + } + /** * This test verifies volume partial updates of Elasticsearch. The test dataset index is cloned * and then a new field is added to each document using a partial update. The test then asserts diff --git a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-5/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTest.java b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-5/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTest.java index 9228507450b9..e6527aaeab3a 100644 --- a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-5/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTest.java +++ b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-5/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTest.java @@ -18,25 +18,26 @@ package org.apache.beam.sdk.io.elasticsearch; import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.ConnectionConfiguration; -import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestCommon.ES_TYPE; import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestCommon.getEsIndex; -import static org.elasticsearch.test.ESIntegTestCase.Scope.SUITE; +import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestUtils.createConnectionConfig; +import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestUtils.createIndex; +import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestUtils.createTestContainer; +import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestUtils.deleteIndex; +import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestUtils.setDefaultTemplate; import com.carrotsearch.randomizedtesting.annotations.ThreadLeakScope; import java.io.IOException; import java.io.Serializable; -import java.net.InetSocketAddress; -import java.util.ArrayList; -import java.util.Collection; import org.apache.beam.sdk.testing.TestPipeline; -import org.elasticsearch.common.settings.Settings; -import org.elasticsearch.plugins.Plugin; -import org.elasticsearch.test.ESIntegTestCase; -import org.elasticsearch.transport.Netty4Plugin; +import org.elasticsearch.client.RestClient; +import org.junit.AfterClass; import org.junit.Before; +import org.junit.BeforeClass; +import org.junit.Ignore; import org.junit.Rule; import org.junit.Test; import org.junit.rules.ExpectedException; +import 
org.testcontainers.elasticsearch.ElasticsearchContainer; /* Cannot use @RunWith(JUnit4.class) with ESIntegTestCase @@ -45,71 +46,50 @@ /** Tests for {@link ElasticsearchIO} version 5. */ @ThreadLeakScope(ThreadLeakScope.Scope.NONE) -// use cluster of 1 node that has data + master roles -@ESIntegTestCase.ClusterScope(scope = SUITE, numDataNodes = 1, supportsDedicatedMasters = false) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) -public class ElasticsearchIOTest extends ESIntegTestCase implements Serializable { +public class ElasticsearchIOTest implements Serializable { private ElasticsearchIOTestCommon elasticsearchIOTestCommon; private ConnectionConfiguration connectionConfiguration; + private static ElasticsearchContainer container; + private static RestClient client; + static final String IMAGE_TAG = "5.6.3"; - private String[] fillAddresses() { - ArrayList result = new ArrayList<>(); - for (InetSocketAddress address : cluster().httpAddresses()) { - result.add(String.format("http://%s:%s", address.getHostString(), address.getPort())); - } - return result.toArray(new String[result.size()]); - } - - @Override - protected Settings nodeSettings(int nodeOrdinal) { - System.setProperty("es.set.netty.runtime.available.processors", "false"); - return Settings.builder() - .put(super.nodeSettings(nodeOrdinal)) - .put("http.enabled", "true") - // had problems with some jdk, embedded ES was too slow for bulk insertion, - // and queue of 50 was full. No pb with real ES instance (cf testWrite integration test) - .put("thread_pool.bulk.queue_size", 400) - .build(); - } + @BeforeClass + public static void beforeClass() throws IOException { + // Create the elasticsearch container. + container = createTestContainer(IMAGE_TAG); - @Override - public Settings indexSettings() { - return Settings.builder() - .put(super.indexSettings()) - // useful to have updated sizes for getEstimatedSize - .put("index.store.stats_refresh_interval", 0) - .build(); + // Start the container. This step might take some time... 
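+    // container.start() should block until the container's HTTP endpoint is reachable, replacing the manual health-check loop used by the embedded-node setup that this change removes.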
+ container.start(); + client = ElasticsearchIOTestUtils.clientFromContainer(container); + setDefaultTemplate(client); } - @Override - protected Collection> nodePlugins() { - ArrayList> plugins = new ArrayList<>(); - plugins.add(Netty4Plugin.class); - return plugins; + @AfterClass + public static void afterClass() throws IOException { + client.close(); + container.stop(); } @Before public void setup() throws IOException { if (connectionConfiguration == null) { - connectionConfiguration = - ConnectionConfiguration.create(fillAddresses(), getEsIndex(), ES_TYPE) - .withSocketTimeout(120000) - .withConnectTimeout(5000); + connectionConfiguration = createConnectionConfig(client); elasticsearchIOTestCommon = - new ElasticsearchIOTestCommon(connectionConfiguration, getRestClient(), false); + new ElasticsearchIOTestCommon(connectionConfiguration, client, false); + + deleteIndex(client, getEsIndex()); } } @Rule public TestPipeline pipeline = TestPipeline.create(); + @Ignore("https://issues.apache.org/jira/browse/BEAM-5172") @Test public void testSizes() throws Exception { // need to create the index using the helper method (not create it at first insertion) // for the indexSettings() to be run - createIndex(getEsIndex()); + createIndex(elasticsearchIOTestCommon.restClient, getEsIndex()); elasticsearchIOTestCommon.testSizes(); } @@ -117,7 +97,7 @@ public void testSizes() throws Exception { public void testRead() throws Exception { // need to create the index using the helper method (not create it at first insertion) // for the indexSettings() to be run - createIndex(getEsIndex()); + createIndex(elasticsearchIOTestCommon.restClient, getEsIndex()); elasticsearchIOTestCommon.setPipeline(pipeline); elasticsearchIOTestCommon.testRead(); } @@ -126,7 +106,7 @@ public void testRead() throws Exception { public void testReadWithQueryString() throws Exception { // need to create the index using the helper method (not create it at first insertion) // for the indexSettings() to be run - createIndex(getEsIndex()); + createIndex(elasticsearchIOTestCommon.restClient, getEsIndex()); elasticsearchIOTestCommon.setPipeline(pipeline); elasticsearchIOTestCommon.testReadWithQueryString(); } @@ -135,7 +115,7 @@ public void testReadWithQueryString() throws Exception { public void testReadWithQueryValueProvider() throws Exception { // need to create the index using the helper method (not create it at first insertion) // for the indexSettings() to be run - createIndex(getEsIndex()); + createIndex(elasticsearchIOTestCommon.restClient, getEsIndex()); elasticsearchIOTestCommon.setPipeline(pipeline); elasticsearchIOTestCommon.testReadWithQueryValueProvider(); } @@ -164,11 +144,12 @@ public void testWriteWithMaxBatchSizeBytes() throws Exception { elasticsearchIOTestCommon.testWriteWithMaxBatchSizeBytes(); } + @Ignore("https://issues.apache.org/jira/browse/BEAM-5172") @Test public void testSplit() throws Exception { // need to create the index using the helper method (not create it at first insertion) // for the indexSettings() to be run - createIndex(getEsIndex()); + createIndex(elasticsearchIOTestCommon.restClient, getEsIndex()); elasticsearchIOTestCommon.testSplit(2_000); } @@ -202,6 +183,47 @@ public void testWritePartialUpdate() throws Exception { elasticsearchIOTestCommon.testWritePartialUpdate(); } + @Test + public void testWriteAppendOnly() throws Exception { + elasticsearchIOTestCommon.setPipeline(pipeline); + elasticsearchIOTestCommon.testWriteAppendOnly(); + } + + @Test(expected = Exception.class) + public void 
testWriteAppendOnlyDeleteNotAllowed() throws Exception { + elasticsearchIOTestCommon.setPipeline(pipeline); + elasticsearchIOTestCommon.testWriteAppendOnlyDeleteNotAllowed(); + } + + @Test + public void testWriteWithAllowableErrors() throws Exception { + elasticsearchIOTestCommon.testWriteWithAllowedErrors(); + } + + @Test + public void testWriteWithRouting() throws Exception { + elasticsearchIOTestCommon.setPipeline(pipeline); + elasticsearchIOTestCommon.testWriteWithRouting(); + } + + @Test + public void testWriteScriptedUpsert() throws Exception { + elasticsearchIOTestCommon.setPipeline(pipeline); + elasticsearchIOTestCommon.testWriteScriptedUpsert(); + } + + @Test + public void testWriteWithDocVersion() throws Exception { + elasticsearchIOTestCommon.setPipeline(pipeline); + elasticsearchIOTestCommon.testWriteWithDocVersion(); + } + + @Test + public void testMaxParallelRequestsPerWindow() throws Exception { + elasticsearchIOTestCommon.setPipeline(pipeline); + elasticsearchIOTestCommon.testMaxParallelRequestsPerWindow(); + } + @Test public void testReadWithMetadata() throws Exception { elasticsearchIOTestCommon.setPipeline(pipeline); @@ -210,7 +232,7 @@ public void testReadWithMetadata() throws Exception { @Test public void testDefaultRetryPredicate() throws IOException { - elasticsearchIOTestCommon.testDefaultRetryPredicate(getRestClient()); + elasticsearchIOTestCommon.testDefaultRetryPredicate(client); } @Test diff --git a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-6/build.gradle b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-6/build.gradle index 15f3bbff86f5..b41269112b51 100644 --- a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-6/build.gradle +++ b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-6/build.gradle @@ -27,20 +27,15 @@ enableJavaPerformanceTesting() description = "Apache Beam :: SDKs :: Java :: IO :: Elasticsearch-Tests :: 6.x" ext.summary = "Tests of ElasticsearchIO on Elasticsearch 6.x" -test { - // needed for ESIntegTestCase - systemProperty "tests.security.manager", "false" -} - -def log4j_version = "2.6.2" +def log4j_version = "2.14.1" def elastic_search_version = "6.4.0" configurations.all { resolutionStrategy { // Make sure the log4j versions for api and core match instead of taking the default // Gradle rule of using the latest. 
- force "org.apache.logging.log4j:log4j-api:$log4j_version" force "org.apache.logging.log4j:log4j-core:$log4j_version" + force "org.apache.logging.log4j:log4j-api:$log4j_version" } } @@ -50,17 +45,18 @@ dependencies { testCompile "org.elasticsearch.plugin:transport-netty4-client:$elastic_search_version" testCompile "com.carrotsearch.randomizedtesting:randomizedtesting-runner:2.5.2" testCompile "org.elasticsearch:elasticsearch:$elastic_search_version" + testImplementation "org.testcontainers:elasticsearch:1.15.3" testCompile project(path: ":sdks:java:core", configuration: "shadow") testCompile project(":sdks:java:io:elasticsearch") testCompile project(path: ":sdks:java:io:common", configuration: "testRuntime") - testCompile "org.apache.logging.log4j:log4j-core:$log4j_version" - testCompile "org.apache.logging.log4j:log4j-api:$log4j_version" testCompile library.java.slf4j_api testCompile library.java.hamcrest_core testCompile library.java.hamcrest_library testCompile library.java.junit testCompile "org.elasticsearch.client:elasticsearch-rest-client:$elastic_search_version" + testRuntimeOnly "org.apache.logging.log4j:log4j-api:$log4j_version" + testRuntimeOnly "org.apache.logging.log4j:log4j-core:$log4j_version" testRuntimeOnly library.java.slf4j_jdk14 testRuntimeOnly project(path: ":runners:direct-java", configuration: "shadow") } diff --git a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-6/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOIT.java b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-6/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOIT.java index 4e4f6c24770c..18bca065b641 100644 --- a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-6/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOIT.java +++ b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-6/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOIT.java @@ -52,9 +52,6 @@ * higher) in the backend Elasticsearch server for this test to run. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ElasticsearchIOIT { private static RestClient restClient; private static ElasticsearchPipelineOptions options; @@ -110,6 +107,15 @@ public void testWriteVolume() throws Exception { elasticsearchIOTestCommonWrite.testWrite(); } + @Test + public void testWriteVolumeStateful() throws Exception { + // cannot share elasticsearchIOTestCommon because tests run in parallel. 
+ ElasticsearchIOTestCommon elasticsearchIOTestCommonWrite = + new ElasticsearchIOTestCommon(writeConnectionConfiguration, restClient, true); + elasticsearchIOTestCommonWrite.setPipeline(pipeline); + elasticsearchIOTestCommonWrite.testWriteStateful(); + } + @Test public void testSizesVolume() throws Exception { elasticsearchIOTestCommon.testSizes(); @@ -130,6 +136,29 @@ public void testWriteWithFullAddressingVolume() throws Exception { elasticsearchIOTestCommonWrite.testWriteWithFullAddressing(); } + @Test + public void testWriteWithAllowableErrors() throws Exception { + elasticsearchIOTestCommon.testWriteWithAllowedErrors(); + } + + @Test + public void testWriteWithRouting() throws Exception { + elasticsearchIOTestCommon.setPipeline(pipeline); + elasticsearchIOTestCommon.testWriteWithRouting(); + } + + @Test + public void testWriteScriptedUpsert() throws Exception { + elasticsearchIOTestCommon.setPipeline(pipeline); + elasticsearchIOTestCommon.testWriteScriptedUpsert(); + } + + @Test + public void testWriteWithDocVersion() throws Exception { + elasticsearchIOTestCommon.setPipeline(pipeline); + elasticsearchIOTestCommon.testWriteWithDocVersion(); + } + /** * This test verifies volume partial updates of Elasticsearch. The test dataset index is cloned * and then a new field is added to each document using a partial update. The test then asserts diff --git a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-6/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTest.java b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-6/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTest.java index 60e6e7d95b12..520b5907cd1b 100644 --- a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-6/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTest.java +++ b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-6/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTest.java @@ -18,25 +18,26 @@ package org.apache.beam.sdk.io.elasticsearch; import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.ConnectionConfiguration; -import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestCommon.ES_TYPE; import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestCommon.getEsIndex; -import static org.elasticsearch.test.ESIntegTestCase.Scope.SUITE; +import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestUtils.createConnectionConfig; +import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestUtils.createIndex; +import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestUtils.createTestContainer; +import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestUtils.deleteIndex; +import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestUtils.setDefaultTemplate; import com.carrotsearch.randomizedtesting.annotations.ThreadLeakScope; import java.io.IOException; import java.io.Serializable; -import java.net.InetSocketAddress; -import java.util.ArrayList; -import java.util.Collection; import org.apache.beam.sdk.testing.TestPipeline; -import org.elasticsearch.common.settings.Settings; -import org.elasticsearch.plugins.Plugin; -import org.elasticsearch.test.ESIntegTestCase; -import org.elasticsearch.transport.Netty4Plugin; +import org.elasticsearch.client.RestClient; +import org.junit.AfterClass; import org.junit.Before; +import org.junit.BeforeClass; +import org.junit.Ignore; import org.junit.Rule; import org.junit.Test; import org.junit.rules.ExpectedException; +import 
org.testcontainers.elasticsearch.ElasticsearchContainer; /* Cannot use @RunWith(JUnit4.class) with ESIntegTestCase @@ -45,71 +46,49 @@ /** Tests for {@link ElasticsearchIO} version 6. */ @ThreadLeakScope(ThreadLeakScope.Scope.NONE) -// use cluster of 1 node that has data + master roles -@ESIntegTestCase.ClusterScope(scope = SUITE, numDataNodes = 1, supportsDedicatedMasters = false) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) -public class ElasticsearchIOTest extends ESIntegTestCase implements Serializable { +public class ElasticsearchIOTest implements Serializable { private ElasticsearchIOTestCommon elasticsearchIOTestCommon; private ConnectionConfiguration connectionConfiguration; + private static ElasticsearchContainer container; + private static RestClient client; + static final String IMAGE_TAG = "6.4.0"; - private String[] fillAddresses() { - ArrayList result = new ArrayList<>(); - for (InetSocketAddress address : cluster().httpAddresses()) { - result.add(String.format("http://%s:%s", address.getHostString(), address.getPort())); - } - return result.toArray(new String[result.size()]); - } - - @Override - protected Settings nodeSettings(int nodeOrdinal) { - System.setProperty("es.set.netty.runtime.available.processors", "false"); - return Settings.builder() - .put(super.nodeSettings(nodeOrdinal)) - .put("http.enabled", "true") - // had problems with some jdk, embedded ES was too slow for bulk insertion, - // and queue of 50 was full. No pb with real ES instance (cf testWrite integration test) - .put("thread_pool.bulk.queue_size", 400) - .build(); - } + @BeforeClass + public static void beforeClass() throws IOException { + // Create the elasticsearch container. + container = createTestContainer(IMAGE_TAG); - @Override - public Settings indexSettings() { - return Settings.builder() - .put(super.indexSettings()) - // useful to have updated sizes for getEstimatedSize - .put("index.store.stats_refresh_interval", 0) - .build(); + // Start the container. This step might take some time... 
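+    // clientFromContainer presumably builds a low-level RestClient against container.getHttpHostAddress(), and setDefaultTemplate installs a default index template so test indices are created with consistent settings.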
+ container.start(); + client = ElasticsearchIOTestUtils.clientFromContainer(container); + setDefaultTemplate(client); } - @Override - protected Collection> nodePlugins() { - ArrayList> plugins = new ArrayList<>(); - plugins.add(Netty4Plugin.class); - return plugins; + @AfterClass + public static void afterClass() throws IOException { + client.close(); + container.stop(); } @Before public void setup() throws IOException { if (connectionConfiguration == null) { - connectionConfiguration = - ConnectionConfiguration.create(fillAddresses(), getEsIndex(), ES_TYPE) - .withSocketTimeout(120000) - .withConnectTimeout(5000); + connectionConfiguration = createConnectionConfig(client); elasticsearchIOTestCommon = - new ElasticsearchIOTestCommon(connectionConfiguration, getRestClient(), false); + new ElasticsearchIOTestCommon(connectionConfiguration, client, false); + deleteIndex(client, getEsIndex()); } } @Rule public TestPipeline pipeline = TestPipeline.create(); + @Ignore("https://issues.apache.org/jira/browse/BEAM-5172") @Test public void testSizes() throws Exception { // need to create the index using the helper method (not create it at first insertion) // for the indexSettings() to be run - createIndex(getEsIndex()); + createIndex(elasticsearchIOTestCommon.restClient, getEsIndex()); elasticsearchIOTestCommon.testSizes(); } @@ -117,7 +96,7 @@ public void testSizes() throws Exception { public void testRead() throws Exception { // need to create the index using the helper method (not create it at first insertion) // for the indexSettings() to be run - createIndex(getEsIndex()); + createIndex(elasticsearchIOTestCommon.restClient, getEsIndex()); elasticsearchIOTestCommon.setPipeline(pipeline); elasticsearchIOTestCommon.testRead(); } @@ -126,7 +105,7 @@ public void testRead() throws Exception { public void testReadWithQueryString() throws Exception { // need to create the index using the helper method (not create it at first insertion) // for the indexSettings() to be run - createIndex(getEsIndex()); + createIndex(elasticsearchIOTestCommon.restClient, getEsIndex()); elasticsearchIOTestCommon.setPipeline(pipeline); elasticsearchIOTestCommon.testReadWithQueryString(); } @@ -135,7 +114,7 @@ public void testReadWithQueryString() throws Exception { public void testReadWithQueryValueProvider() throws Exception { // need to create the index using the helper method (not create it at first insertion) // for the indexSettings() to be run - createIndex(getEsIndex()); + createIndex(elasticsearchIOTestCommon.restClient, getEsIndex()); elasticsearchIOTestCommon.setPipeline(pipeline); elasticsearchIOTestCommon.testReadWithQueryValueProvider(); } @@ -164,11 +143,12 @@ public void testWriteWithMaxBatchSizeBytes() throws Exception { elasticsearchIOTestCommon.testWriteWithMaxBatchSizeBytes(); } + @Ignore("https://issues.apache.org/jira/browse/BEAM-5172") @Test public void testSplit() throws Exception { // need to create the index using the helper method (not create it at first insertion) // for the indexSettings() to be run - createIndex(getEsIndex()); + createIndex(elasticsearchIOTestCommon.restClient, getEsIndex()); elasticsearchIOTestCommon.testSplit(2_000); } @@ -196,6 +176,47 @@ public void testWritePartialUpdate() throws Exception { elasticsearchIOTestCommon.testWritePartialUpdate(); } + @Test + public void testWriteAppendOnly() throws Exception { + elasticsearchIOTestCommon.setPipeline(pipeline); + elasticsearchIOTestCommon.testWriteAppendOnly(); + } + + @Test(expected = Exception.class) + public void 
testWriteAppendOnlyDeleteNotAllowed() throws Exception { + elasticsearchIOTestCommon.setPipeline(pipeline); + elasticsearchIOTestCommon.testWriteAppendOnlyDeleteNotAllowed(); + } + + @Test + public void testWriteWithAllowableErrors() throws Exception { + elasticsearchIOTestCommon.testWriteWithAllowedErrors(); + } + + @Test + public void testWriteWithRouting() throws Exception { + elasticsearchIOTestCommon.setPipeline(pipeline); + elasticsearchIOTestCommon.testWriteWithRouting(); + } + + @Test + public void testWriteScriptedUpsert() throws Exception { + elasticsearchIOTestCommon.setPipeline(pipeline); + elasticsearchIOTestCommon.testWriteScriptedUpsert(); + } + + @Test + public void testWriteWithDocVersion() throws Exception { + elasticsearchIOTestCommon.setPipeline(pipeline); + elasticsearchIOTestCommon.testWriteWithDocVersion(); + } + + @Test + public void testMaxParallelRequestsPerWindow() throws Exception { + elasticsearchIOTestCommon.setPipeline(pipeline); + elasticsearchIOTestCommon.testMaxParallelRequestsPerWindow(); + } + @Test public void testReadWithMetadata() throws Exception { elasticsearchIOTestCommon.setPipeline(pipeline); @@ -204,7 +225,7 @@ public void testReadWithMetadata() throws Exception { @Test public void testDefaultRetryPredicate() throws IOException { - elasticsearchIOTestCommon.testDefaultRetryPredicate(getRestClient()); + elasticsearchIOTestCommon.testDefaultRetryPredicate(client); } @Test diff --git a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-7/build.gradle b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-7/build.gradle index 5b04c7b770b8..d69fa42be9b0 100644 --- a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-7/build.gradle +++ b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-7/build.gradle @@ -27,20 +27,15 @@ enableJavaPerformanceTesting() description = "Apache Beam :: SDKs :: Java :: IO :: Elasticsearch-Tests :: 7.x" ext.summary = "Tests of ElasticsearchIO on Elasticsearch 7.x" -test { - // needed for ESIntegTestCase - systemProperty "tests.security.manager", "false" -} - -def log4j_version = "2.11.1" +def log4j_version = "2.14.1" def elastic_search_version = "7.9.2" configurations.all { resolutionStrategy { // Make sure the log4j versions for api and core match instead of taking the default // Gradle rule of using the latest. 
- force "org.apache.logging.log4j:log4j-api:$log4j_version" force "org.apache.logging.log4j:log4j-core:$log4j_version" + force "org.apache.logging.log4j:log4j-api:$log4j_version" } } @@ -49,18 +44,22 @@ dependencies { testCompile "org.elasticsearch.test:framework:$elastic_search_version" testCompile "org.elasticsearch.plugin:transport-netty4-client:$elastic_search_version" testCompile "com.carrotsearch.randomizedtesting:randomizedtesting-runner:2.7.5" - testCompile "org.elasticsearch:elasticsearch:$elastic_search_version" + testImplementation "org.testcontainers:elasticsearch:1.15.3" testCompile project(path: ":sdks:java:core", configuration: "shadow") testCompile project(":sdks:java:io:elasticsearch") testCompile project(path: ":sdks:java:io:common", configuration: "testRuntime") - testCompile "org.apache.logging.log4j:log4j-core:$log4j_version" - testCompile "org.apache.logging.log4j:log4j-api:$log4j_version" testCompile library.java.slf4j_api testCompile library.java.hamcrest_core testCompile library.java.hamcrest_library testCompile library.java.junit testCompile "org.elasticsearch.client:elasticsearch-rest-client:$elastic_search_version" + testRuntimeOnly "org.apache.logging.log4j:log4j-api:$log4j_version" + testRuntimeOnly "org.apache.logging.log4j:log4j-core:$log4j_version" testRuntimeOnly library.java.slf4j_jdk14 testRuntimeOnly project(path: ":runners:direct-java", configuration: "shadow") } + +//test { +// systemProperty 'elasticsearchVersion', elasticsearchVersion +//} \ No newline at end of file diff --git a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-7/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOIT.java b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-7/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOIT.java index 98e937ca8a49..8e4e76ab1824 100644 --- a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-7/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOIT.java +++ b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-7/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOIT.java @@ -52,9 +52,6 @@ * (or higher) in the backend Elasticsearch server for this test to run. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ElasticsearchIOIT { private static RestClient restClient; private static ElasticsearchPipelineOptions options; @@ -110,6 +107,15 @@ public void testWriteVolume() throws Exception { elasticsearchIOTestCommonWrite.testWrite(); } + @Test + public void testWriteVolumeStateful() throws Exception { + // cannot share elasticsearchIOTestCommon because tests run in parallel. 
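+    // testWriteStateful drives ElasticsearchIO.write().withUseStatefulBatches(true), i.e. the stateful BulkIO batching path introduced by this change.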
+ ElasticsearchIOTestCommon elasticsearchIOTestCommonWrite = + new ElasticsearchIOTestCommon(writeConnectionConfiguration, restClient, true); + elasticsearchIOTestCommonWrite.setPipeline(pipeline); + elasticsearchIOTestCommonWrite.testWriteStateful(); + } + @Test public void testSizesVolume() throws Exception { elasticsearchIOTestCommon.testSizes(); @@ -130,6 +136,29 @@ public void testWriteWithFullAddressingVolume() throws Exception { elasticsearchIOTestCommonWrite.testWriteWithFullAddressing(); } + @Test + public void testWriteWithAllowableErrors() throws Exception { + elasticsearchIOTestCommon.testWriteWithAllowedErrors(); + } + + @Test + public void testWriteWithRouting() throws Exception { + elasticsearchIOTestCommon.setPipeline(pipeline); + elasticsearchIOTestCommon.testWriteWithRouting(); + } + + @Test + public void testWriteScriptedUpsert() throws Exception { + elasticsearchIOTestCommon.setPipeline(pipeline); + elasticsearchIOTestCommon.testWriteScriptedUpsert(); + } + + @Test + public void testWriteWithDocVersion() throws Exception { + elasticsearchIOTestCommon.setPipeline(pipeline); + elasticsearchIOTestCommon.testWriteWithDocVersion(); + } + /** * This test verifies volume partial updates of Elasticsearch. The test dataset index is cloned * and then a new field is added to each document using a partial update. The test then asserts diff --git a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-7/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTest.java b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-7/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTest.java index 9e731ce50665..05f4bd97bd1b 100644 --- a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-7/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTest.java +++ b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-7/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTest.java @@ -18,25 +18,26 @@ package org.apache.beam.sdk.io.elasticsearch; import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.ConnectionConfiguration; -import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestCommon.ES_TYPE; import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestCommon.getEsIndex; -import static org.elasticsearch.test.ESIntegTestCase.Scope.SUITE; +import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestUtils.createConnectionConfig; +import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestUtils.createIndex; +import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestUtils.createTestContainer; +import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestUtils.deleteIndex; +import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestUtils.setDefaultTemplate; import com.carrotsearch.randomizedtesting.annotations.ThreadLeakScope; import java.io.IOException; import java.io.Serializable; -import java.net.InetSocketAddress; -import java.util.ArrayList; -import java.util.Collection; import org.apache.beam.sdk.testing.TestPipeline; -import org.elasticsearch.common.settings.Settings; -import org.elasticsearch.plugins.Plugin; -import org.elasticsearch.test.ESIntegTestCase; -import org.elasticsearch.transport.Netty4Plugin; +import org.elasticsearch.client.RestClient; +import org.junit.AfterClass; import org.junit.Before; +import org.junit.BeforeClass; +import org.junit.Ignore; import org.junit.Rule; import org.junit.Test; import org.junit.rules.ExpectedException; +import 
org.testcontainers.elasticsearch.ElasticsearchContainer; /* Cannot use @RunWith(JUnit4.class) with ESIntegTestCase @@ -45,75 +46,50 @@ /** Tests for {@link ElasticsearchIO} version 7. */ @ThreadLeakScope(ThreadLeakScope.Scope.NONE) -// use cluster of 1 node that has data + master roles -@ESIntegTestCase.ClusterScope(scope = SUITE, numDataNodes = 1, supportsDedicatedMasters = false) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) -public class ElasticsearchIOTest extends ESIntegTestCase implements Serializable { +public class ElasticsearchIOTest implements Serializable { private ElasticsearchIOTestCommon elasticsearchIOTestCommon; private ConnectionConfiguration connectionConfiguration; + private static ElasticsearchContainer container; + private static RestClient client; + static final String IMAGE_TAG = "7.9.2"; - private String[] fillAddresses() { - ArrayList result = new ArrayList<>(); - for (InetSocketAddress address : cluster().httpAddresses()) { - result.add(String.format("http://%s:%s", address.getHostString(), address.getPort())); - } - return result.toArray(new String[result.size()]); - } + @BeforeClass + public static void beforeClass() throws IOException { + // Create the elasticsearch container. + container = createTestContainer(IMAGE_TAG); - @Override - protected boolean addMockHttpTransport() { - return false; + // Start the container. This step might take some time... + container.start(); + client = ElasticsearchIOTestUtils.clientFromContainer(container); + setDefaultTemplate(client); } - @Override - protected Settings nodeSettings(int nodeOrdinal) { - System.setProperty("es.set.netty.runtime.available.processors", "false"); - return Settings.builder() - .put(super.nodeSettings(nodeOrdinal)) - // had problems with some jdk, embedded ES was too slow for bulk insertion, - // and queue of 50 was full. 
No pb with real ES instance (cf testWrite integration test) - .put("thread_pool.write.queue_size", 400) - .build(); - } - - @Override - public Settings indexSettings() { - return Settings.builder() - .put(super.indexSettings()) - // useful to have updated sizes for getEstimatedSize - .put("index.store.stats_refresh_interval", 0) - .build(); - } - - @Override - protected Collection> nodePlugins() { - ArrayList> plugins = new ArrayList<>(); - plugins.add(Netty4Plugin.class); - return plugins; + @AfterClass + public static void afterClass() throws IOException { + client.close(); + container.stop(); } @Before public void setup() throws IOException { if (connectionConfiguration == null) { - connectionConfiguration = - ConnectionConfiguration.create(fillAddresses(), getEsIndex(), ES_TYPE) - .withSocketTimeout(120000) - .withConnectTimeout(5000); + connectionConfiguration = createConnectionConfig(client); elasticsearchIOTestCommon = - new ElasticsearchIOTestCommon(connectionConfiguration, getRestClient(), false); + new ElasticsearchIOTestCommon(connectionConfiguration, client, false); + + deleteIndex(client, getEsIndex()); } } @Rule public TestPipeline pipeline = TestPipeline.create(); + @Ignore("https://issues.apache.org/jira/browse/BEAM-5172") @Test public void testSizes() throws Exception { // need to create the index using the helper method (not create it at first insertion) // for the indexSettings() to be run - createIndex(getEsIndex()); + createIndex(elasticsearchIOTestCommon.restClient, getEsIndex()); elasticsearchIOTestCommon.testSizes(); } @@ -121,7 +97,7 @@ public void testSizes() throws Exception { public void testRead() throws Exception { // need to create the index using the helper method (not create it at first insertion) // for the indexSettings() to be run - createIndex(getEsIndex()); + createIndex(elasticsearchIOTestCommon.restClient, getEsIndex()); elasticsearchIOTestCommon.setPipeline(pipeline); elasticsearchIOTestCommon.testRead(); } @@ -130,16 +106,16 @@ public void testRead() throws Exception { public void testReadWithQueryString() throws Exception { // need to create the index using the helper method (not create it at first insertion) // for the indexSettings() to be run - createIndex(getEsIndex()); + createIndex(elasticsearchIOTestCommon.restClient, getEsIndex()); elasticsearchIOTestCommon.setPipeline(pipeline); - elasticsearchIOTestCommon.testReadWithQueryString(); + elasticsearchIOTestCommon.testRead(); } @Test public void testReadWithQueryValueProvider() throws Exception { // need to create the index using the helper method (not create it at first insertion) // for the indexSettings() to be run - createIndex(getEsIndex()); + createIndex(elasticsearchIOTestCommon.restClient, getEsIndex()); elasticsearchIOTestCommon.setPipeline(pipeline); elasticsearchIOTestCommon.testReadWithQueryValueProvider(); } @@ -168,11 +144,12 @@ public void testWriteWithMaxBatchSizeBytes() throws Exception { elasticsearchIOTestCommon.testWriteWithMaxBatchSizeBytes(); } + @Ignore("https://issues.apache.org/jira/browse/BEAM-5172") @Test public void testSplit() throws Exception { // need to create the index using the helper method (not create it at first insertion) // for the indexSettings() to be run - createIndex(getEsIndex()); + createIndex(elasticsearchIOTestCommon.restClient, getEsIndex()); elasticsearchIOTestCommon.testSplit(2_000); } @@ -200,6 +177,47 @@ public void testWritePartialUpdate() throws Exception { elasticsearchIOTestCommon.testWritePartialUpdate(); } + @Test + public void 
testWriteAppendOnly() throws Exception { + elasticsearchIOTestCommon.setPipeline(pipeline); + elasticsearchIOTestCommon.testWriteAppendOnly(); + } + + @Test(expected = Exception.class) + public void testWriteAppendOnlyDeleteNotAllowed() throws Exception { + elasticsearchIOTestCommon.setPipeline(pipeline); + elasticsearchIOTestCommon.testWriteAppendOnlyDeleteNotAllowed(); + } + + @Test + public void testWriteWithAllowableErrors() throws Exception { + elasticsearchIOTestCommon.testWriteWithAllowedErrors(); + } + + @Test + public void testWriteWithRouting() throws Exception { + elasticsearchIOTestCommon.setPipeline(pipeline); + elasticsearchIOTestCommon.testWriteWithRouting(); + } + + @Test + public void testWriteScriptedUpsert() throws Exception { + elasticsearchIOTestCommon.setPipeline(pipeline); + elasticsearchIOTestCommon.testWriteScriptedUpsert(); + } + + @Test + public void testWriteWithDocVersion() throws Exception { + elasticsearchIOTestCommon.setPipeline(pipeline); + elasticsearchIOTestCommon.testWriteWithDocVersion(); + } + + @Test + public void testMaxParallelRequestsPerWindow() throws Exception { + elasticsearchIOTestCommon.setPipeline(pipeline); + elasticsearchIOTestCommon.testMaxParallelRequestsPerWindow(); + } + @Test public void testReadWithMetadata() throws Exception { elasticsearchIOTestCommon.setPipeline(pipeline); @@ -208,7 +226,7 @@ public void testReadWithMetadata() throws Exception { @Test public void testDefaultRetryPredicate() throws IOException { - elasticsearchIOTestCommon.testDefaultRetryPredicate(getRestClient()); + elasticsearchIOTestCommon.testDefaultRetryPredicate(client); } @Test diff --git a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-common/build.gradle b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-common/build.gradle index 1b46a13be738..6f470f277073 100644 --- a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-common/build.gradle +++ b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-common/build.gradle @@ -25,15 +25,15 @@ applyJavaNature( description = "Apache Beam :: SDKs :: Java :: IO :: Elasticsearch-Tests :: Common" ext.summary = "Common test classes for ElasticsearchIO" -def log4j_version = "2.11.1" +def log4j_version = "2.14.1" def elastic_search_version = "7.9.2" configurations.all { resolutionStrategy { // Make sure the log4j versions for api and core match instead of taking the default // Gradle rule of using the latest. 
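+    // Note: the dependency block below now also pulls in the high-level REST client and Testcontainers, which the new ElasticsearchIOTestUtils helpers (createTestContainer, clientFromContainer) presumably build on.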
- force "org.apache.logging.log4j:log4j-api:$log4j_version" force "org.apache.logging.log4j:log4j-core:$log4j_version" + force "org.apache.logging.log4j:log4j-api:$log4j_version" } } @@ -43,12 +43,15 @@ dependencies { testCompile project(":sdks:java:io:elasticsearch") testCompile project(path: ":sdks:java:io:common", configuration: "testRuntime") testCompile library.java.slf4j_api - testCompile "org.apache.logging.log4j:log4j-api:$log4j_version" - testCompile "org.apache.logging.log4j:log4j-core:$log4j_version" testCompile library.java.hamcrest_core testCompile library.java.hamcrest_library testCompile library.java.junit testCompile "org.elasticsearch.client:elasticsearch-rest-client:$elastic_search_version" + testImplementation "org.elasticsearch.client:elasticsearch-rest-high-level-client:${elastic_search_version}" + testImplementation "org.testcontainers:elasticsearch:1.15.3" + + testRuntimeOnly "org.apache.logging.log4j:log4j-api:$log4j_version" + testRuntimeOnly "org.apache.logging.log4j:log4j-core:$log4j_version" testRuntimeOnly library.java.slf4j_jdk14 testRuntimeOnly project(path: ":runners:direct-java", configuration: "shadow") } diff --git a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-common/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTestCommon.java b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-common/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTestCommon.java index 68931fd8451b..c59e08b10551 100644 --- a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-common/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTestCommon.java +++ b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-common/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTestCommon.java @@ -18,27 +18,35 @@ package org.apache.beam.sdk.io.elasticsearch; import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.BoundedElasticsearchSource; +import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.BulkIO; import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.ConnectionConfiguration; +import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.DocToBulk; import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.Read; import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.RetryConfiguration.DEFAULT_RETRY_PREDICATE; import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.Write; import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.getBackendVersion; import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestUtils.FAMOUS_SCIENTISTS; import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestUtils.NUM_SCIENTISTS; +import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestUtils.SCRIPT_SOURCE; import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestUtils.countByMatch; import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestUtils.countByScientistName; +import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestUtils.insertTestDocuments; +import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestUtils.refreshAllIndices; import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestUtils.refreshIndexAndGetCurrentNumDocs; import static org.apache.beam.sdk.testing.SourceTestUtils.readFromSource; +import static org.apache.beam.sdk.values.TypeDescriptors.integers; +import static org.hamcrest.MatcherAssert.assertThat; import static 
org.hamcrest.Matchers.greaterThan; import static org.hamcrest.Matchers.lessThan; +import static org.hamcrest.core.Is.is; import static org.hamcrest.core.Is.isA; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.junit.Assert.fail; import com.fasterxml.jackson.databind.JsonNode; +import com.fasterxml.jackson.databind.node.ObjectNode; import java.io.IOException; import java.io.Serializable; import java.nio.charset.StandardCharsets; @@ -46,8 +54,10 @@ import java.util.Arrays; import java.util.Collections; import java.util.List; +import java.util.Map; import java.util.function.BiFunction; import org.apache.beam.sdk.io.BoundedSource; +import org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.BulkIO.StatefulBatching; import org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.RetryConfiguration.DefaultRetryPredicate; import org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.RetryConfiguration.RetryPredicate; import org.apache.beam.sdk.options.PipelineOptions; @@ -59,7 +69,9 @@ import org.apache.beam.sdk.transforms.Count; import org.apache.beam.sdk.transforms.Create; import org.apache.beam.sdk.transforms.DoFnTester; +import org.apache.beam.sdk.transforms.MapElements; import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollection; import org.apache.http.HttpEntity; import org.apache.http.entity.ContentType; @@ -68,15 +80,16 @@ import org.elasticsearch.client.Response; import org.elasticsearch.client.RestClient; import org.hamcrest.CustomMatcher; +import org.hamcrest.Description; +import org.hamcrest.Matcher; +import org.hamcrest.TypeSafeMatcher; +import org.hamcrest.collection.IsIterableContainingInAnyOrder; import org.joda.time.Duration; import org.junit.rules.ExpectedException; import org.slf4j.Logger; import org.slf4j.LoggerFactory; /** Common test class for {@link ElasticsearchIO}. 
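+ * Shared by the version-specific elasticsearch-tests modules; after this change it exercises the DocToBulk/BulkIO split via write.getDocToBulk() and write.getBulkIO().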
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) class ElasticsearchIOTestCommon implements Serializable { private static final Logger LOG = LoggerFactory.getLogger(ElasticsearchIOTestCommon.class); @@ -98,7 +111,7 @@ static String getEsIndex() { } static final String ES_TYPE = "test"; - static final long NUM_DOCS_UTESTS = 400L; + static final long NUM_DOCS_UTESTS = 100L; static final long NUM_DOCS_ITESTS = 50000L; static final float ACCEPTABLE_EMPTY_SPLITS_PERCENTAGE = 0.5f; private static final long AVERAGE_DOC_SIZE = 25L; @@ -111,7 +124,7 @@ static String getEsIndex() { private final long numDocs; private final ConnectionConfiguration connectionConfiguration; - private final RestClient restClient; + final RestClient restClient; private final boolean useAsITests; private TestPipeline pipeline; @@ -260,6 +273,25 @@ void testWrite() throws Exception { executeWriteTest(write); } + void testWriteStateful() throws Exception { + Write write = + ElasticsearchIO.write() + .withConnectionConfiguration(connectionConfiguration) + .withUseStatefulBatches(true); + executeWriteTest(write); + } + + List serializeDocs(ElasticsearchIO.Write write, List jsonDocs) + throws IOException { + List serializedInput = new ArrayList<>(); + for (String doc : jsonDocs) { + serializedInput.add( + DocToBulk.createBulkApiEntity( + write.getDocToBulk(), doc, getBackendVersion(connectionConfiguration))); + } + return serializedInput; + } + void testWriteWithErrors() throws Exception { Write write = ElasticsearchIO.write() @@ -268,6 +300,7 @@ void testWriteWithErrors() throws Exception { List input = ElasticsearchIOTestUtils.createDocuments( numDocs, ElasticsearchIOTestUtils.InjectionMode.INJECT_SOME_INVALID_DOCS); + expectedException.expect(isA(IOException.class)); expectedException.expectMessage( new CustomMatcher("RegExp matcher") { @@ -287,11 +320,32 @@ public boolean matches(Object o) { + "Document id .+: failed to parse \\(.+\\).*Caused by: .+ \\(.+\\).*"); } }); + // write bundles size is the runner decision, we cannot force a bundle size, // so we test the Writer as a DoFn outside of a runner. - try (DoFnTester fnTester = DoFnTester.of(new Write.WriteFn(write))) { + try (DoFnTester fnTester = + DoFnTester.of(new BulkIO.BulkIOBundleFn(write.getBulkIO()))) { // inserts into Elasticsearch - fnTester.processBundle(input); + fnTester.processBundle(serializeDocs(write, input)); + } + } + + void testWriteWithAllowedErrors() throws Exception { + Write write = + ElasticsearchIO.write() + .withConnectionConfiguration(connectionConfiguration) + .withMaxBatchSize(BATCH_SIZE) + .withAllowableResponseErrors(Collections.singleton("json_parse_exception")); + List input = + ElasticsearchIOTestUtils.createDocuments( + numDocs, ElasticsearchIOTestUtils.InjectionMode.INJECT_SOME_INVALID_DOCS); + + // write bundles size is the runner decision, we cannot force a bundle size, + // so we test the Writer as a DoFn outside of a runner. + try (DoFnTester fnTester = + DoFnTester.of(new BulkIO.BulkIOBundleFn(write.getBulkIO()))) { + // inserts into Elasticsearch + fnTester.processBundle(serializeDocs(write, input)); } } @@ -300,15 +354,24 @@ void testWriteWithMaxBatchSize() throws Exception { ElasticsearchIO.write() .withConnectionConfiguration(connectionConfiguration) .withMaxBatchSize(BATCH_SIZE); + // write bundles size is the runner decision, we cannot force a bundle size, // so we test the Writer as a DoFn outside of a runner. 
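+    // BulkIOBundleFn consumes pre-serialized bulk API entities, so the raw JSON documents are first converted with DocToBulk.createBulkApiEntity below.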
- try (DoFnTester fnTester = DoFnTester.of(new Write.WriteFn(write))) { + try (DoFnTester fnTester = + DoFnTester.of(new BulkIO.BulkIOBundleFn(write.getBulkIO()))) { List input = ElasticsearchIOTestUtils.createDocuments( numDocs, ElasticsearchIOTestUtils.InjectionMode.DO_NOT_INJECT_INVALID_DOCS); + + List serializedInput = new ArrayList<>(); + for (String doc : input) { + serializedInput.add( + DocToBulk.createBulkApiEntity( + write.getDocToBulk(), doc, getBackendVersion(connectionConfiguration))); + } long numDocsProcessed = 0; long numDocsInserted = 0; - for (String document : input) { + for (String document : serializedInput) { fnTester.processElement(document); numDocsProcessed++; // test every 100 docs to avoid overloading ES @@ -343,15 +406,22 @@ void testWriteWithMaxBatchSizeBytes() throws Exception { .withMaxBatchSizeBytes(BATCH_SIZE_BYTES); // write bundles size is the runner decision, we cannot force a bundle size, // so we test the Writer as a DoFn outside of a runner. - try (DoFnTester fnTester = DoFnTester.of(new Write.WriteFn(write))) { + try (DoFnTester fnTester = + DoFnTester.of(new BulkIO.BulkIOBundleFn(write.getBulkIO()))) { List input = ElasticsearchIOTestUtils.createDocuments( numDocs, ElasticsearchIOTestUtils.InjectionMode.DO_NOT_INJECT_INVALID_DOCS); + List serializedInput = new ArrayList<>(); + for (String doc : input) { + serializedInput.add( + DocToBulk.createBulkApiEntity( + write.getDocToBulk(), doc, getBackendVersion(connectionConfiguration))); + } long numDocsProcessed = 0; long sizeProcessed = 0; long numDocsInserted = 0; long batchInserted = 0; - for (String document : input) { + for (String document : serializedInput) { fnTester.processElement(document); numDocsProcessed++; sizeProcessed += document.getBytes(StandardCharsets.UTF_8).length; @@ -384,14 +454,25 @@ void testWriteWithMaxBatchSizeBytes() throws Exception { /** Extracts the name field from the JSON document. */ private static class ExtractValueFn implements Write.FieldValueExtractFn { private final String fieldName; + private final String disambiguationName; + + private ExtractValueFn(String fieldName, String disambiguationName) { + this.fieldName = fieldName; + this.disambiguationName = disambiguationName; + } private ExtractValueFn(String fieldName) { this.fieldName = fieldName; + this.disambiguationName = null; } @Override public String apply(JsonNode input) { - return input.path(fieldName).asText(); + String output = input.path(fieldName).asText(); + if (disambiguationName != null) { + output += disambiguationName; + } + return output; } } @@ -414,7 +495,7 @@ void testWriteWithIdFn() throws Exception { long currentNumDocs = refreshIndexAndGetCurrentNumDocs(connectionConfiguration, restClient); assertEquals(NUM_SCIENTISTS, currentNumDocs); - int count = countByScientistName(connectionConfiguration, restClient, "Einstein"); + int count = countByScientistName(connectionConfiguration, restClient, "Einstein", null); assertEquals(1, count); } @@ -427,6 +508,7 @@ void testWriteWithIdFn() throws Exception { * Therefore limit to a small number of docs to test routing behavior only. 
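+ * With this change the scientist-derived index name is suffixed with a per-test disambiguation string, because the version-specific suites run in parallel against shared infrastructure.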
*/ void testWriteWithIndexFn() throws Exception { + String disambiguation = "testWriteWithIndexFn".toLowerCase(); long docsPerScientist = 10; // very conservative long adjustedNumDocs = docsPerScientist * FAMOUS_SCIENTISTS.length; @@ -438,12 +520,14 @@ void testWriteWithIndexFn() throws Exception { .apply( ElasticsearchIO.write() .withConnectionConfiguration(connectionConfiguration) - .withIndexFn(new ExtractValueFn("scientist"))); + .withIndexFn(new ExtractValueFn("scientist", disambiguation))); pipeline.run(); // verify counts on each index for (String scientist : FAMOUS_SCIENTISTS) { - String index = scientist.toLowerCase(); + // Note that tests run in parallel, so without disambiguation multiple tests might be + // interacting with the index named after a particular scientist + String index = scientist.toLowerCase() + disambiguation; long count = refreshIndexAndGetCurrentNumDocs( restClient, @@ -510,6 +594,10 @@ void testWriteWithTypeFn2x5x() throws Exception { * As a result there should be only a single document in each index/type. */ void testWriteWithFullAddressing() throws Exception { + // Note that tests run in parallel, so without disambiguation multiple tests might be + // interacting with the index named after a particular scientist + String disambiguation = "testWriteWithFullAddressing".toLowerCase(); + List data = ElasticsearchIOTestUtils.createDocuments( numDocs, ElasticsearchIOTestUtils.InjectionMode.DO_NOT_INJECT_INVALID_DOCS); @@ -519,12 +607,12 @@ void testWriteWithFullAddressing() throws Exception { ElasticsearchIO.write() .withConnectionConfiguration(connectionConfiguration) .withIdFn(new ExtractValueFn("id")) - .withIndexFn(new ExtractValueFn("scientist")) + .withIndexFn(new ExtractValueFn("scientist", disambiguation)) .withTypeFn(new Modulo2ValueFn("scientist"))); pipeline.run(); for (String scientist : FAMOUS_SCIENTISTS) { - String index = scientist.toLowerCase(); + String index = scientist.toLowerCase() + disambiguation; for (int i = 0; i < 2; i++) { String type = "TYPE_" + scientist.hashCode() % 2; long count = @@ -535,6 +623,34 @@ void testWriteWithFullAddressing() throws Exception { } } + /** + * Tests that documents are correctly routed when routingFn function is provided to overwrite the + * defaults of using the configuration and auto-generation of the document IDs by Elasticsearch. + * The scientist name is used for routing. As a result there should be numDocs/NUM_SCIENTISTS in + * each index. + */ + void testWriteWithRouting() throws Exception { + List data = + ElasticsearchIOTestUtils.createDocuments( + numDocs, ElasticsearchIOTestUtils.InjectionMode.DO_NOT_INJECT_INVALID_DOCS); + pipeline + .apply(Create.of(data)) + .apply( + ElasticsearchIO.write() + .withConnectionConfiguration(connectionConfiguration) + .withRoutingFn(new ExtractValueFn("scientist"))); + pipeline.run(); + + refreshAllIndices(restClient); + for (String scientist : FAMOUS_SCIENTISTS) { + Map urlParams = Collections.singletonMap("routing", scientist); + + assertEquals( + numDocs / NUM_SCIENTISTS, + countByScientistName(connectionConfiguration, restClient, scientist, urlParams)); + } + } + /** * Tests partial updates by adding a group field to each document in the standard test set. 
The * group field is populated as the modulo 2 of the document id allowing for a test to ensure the @@ -571,11 +687,198 @@ void testWritePartialUpdate() throws Exception { assertEquals(numDocs, currentNumDocs); assertEquals( numDocs / NUM_SCIENTISTS, - countByScientistName(connectionConfiguration, restClient, "Einstein")); + countByScientistName(connectionConfiguration, restClient, "Einstein", null)); // Partial update assertions - assertEquals(numDocs / 2, countByMatch(connectionConfiguration, restClient, "group", "0")); - assertEquals(numDocs / 2, countByMatch(connectionConfiguration, restClient, "group", "1")); + assertEquals( + numDocs / 2, countByMatch(connectionConfiguration, restClient, "group", "0", null, null)); + assertEquals( + numDocs / 2, countByMatch(connectionConfiguration, restClient, "group", "1", null, null)); + } + + void testWriteWithDocVersion() throws Exception { + List jsonData = + ElasticsearchIOTestUtils.createJsonDocuments( + numDocs, ElasticsearchIOTestUtils.InjectionMode.DO_NOT_INJECT_INVALID_DOCS); + + List data = new ArrayList<>(); + for (ObjectNode doc : jsonData) { + doc.put("my_version", "1"); + data.add(doc.toString()); + } + + insertTestDocuments(connectionConfiguration, data, restClient); + long currentNumDocs = refreshIndexAndGetCurrentNumDocs(connectionConfiguration, restClient); + assertEquals(numDocs, currentNumDocs); + // Check that all docs have the same "my_version" + assertEquals( + numDocs, + countByMatch( + connectionConfiguration, restClient, "my_version", "1", null, KV.of(1, numDocs))); + + Write write = + ElasticsearchIO.write() + .withConnectionConfiguration(connectionConfiguration) + .withIdFn(new ExtractValueFn("id")) + .withDocVersionFn(new ExtractValueFn("my_version")) + .withDocVersionType("external"); + + data = new ArrayList<>(); + for (ObjectNode doc : jsonData) { + // Set version to larger number than originally set, and larger than next logical version + // number set by default by ES. + doc.put("my_version", "3"); + data.add(doc.toString()); + } + + // Test that documents with lower version are rejected, but rejections ignored when specified + pipeline.apply(Create.of(data)).apply(write); + pipeline.run(); + + currentNumDocs = refreshIndexAndGetCurrentNumDocs(connectionConfiguration, restClient); + assertEquals(numDocs, currentNumDocs); + + // my_version and doc version should have changed + assertEquals( + 0, + countByMatch( + connectionConfiguration, restClient, "my_version", "1", null, KV.of(1, numDocs))); + assertEquals( + numDocs, + countByMatch( + connectionConfiguration, restClient, "my_version", "3", null, KV.of(3, numDocs))); + } + + /** + * Tests upsert script by adding a group field to each document in the standard test set. The + * group field is populated as the modulo 2 of the document id allowing for a test to ensure the + * documents are split into 2 groups. 
+ */ + void testWriteScriptedUpsert() throws Exception { + List data = + ElasticsearchIOTestUtils.createDocuments( + numDocs, ElasticsearchIOTestUtils.InjectionMode.DO_NOT_INJECT_INVALID_DOCS); + + Write write = + ElasticsearchIO.write() + .withConnectionConfiguration(connectionConfiguration) + .withIdFn(new ExtractValueFn("id")) + .withUpsertScript(SCRIPT_SOURCE); + + // Test that documents can be inserted/created by using withUpsertScript + pipeline.apply(Create.of(data)).apply(write); + pipeline.run(); + + // defensive coding to ensure our initial state is as expected + long currentNumDocs = refreshIndexAndGetCurrentNumDocs(connectionConfiguration, restClient); + // check we have not unwittingly modified existing behaviour + assertEquals(numDocs, currentNumDocs); + assertEquals( + numDocs / NUM_SCIENTISTS, + countByScientistName(connectionConfiguration, restClient, "Einstein", null)); + + // All docs should have have group = 0 added by the script upon creation + assertEquals( + numDocs, countByMatch(connectionConfiguration, restClient, "group", "0", null, null)); + + // Run the same data again. This time, because all docs exist in the index already, scripted + // updates should happen rather than scripted inserts. + pipeline.apply(Create.of(data)).apply(write); + pipeline.run(); + + currentNumDocs = refreshIndexAndGetCurrentNumDocs(connectionConfiguration, restClient); + + // check we have not unwittingly modified existing behaviour + assertEquals(numDocs, currentNumDocs); + assertEquals( + numDocs / NUM_SCIENTISTS, + countByScientistName(connectionConfiguration, restClient, "Einstein", null)); + + // The script will set either 0 or 1 for the group value on update operations + assertEquals( + numDocs / 2, countByMatch(connectionConfiguration, restClient, "group", "0", null, null)); + assertEquals( + numDocs / 2, countByMatch(connectionConfiguration, restClient, "group", "1", null, null)); + } + + void testMaxParallelRequestsPerWindow() throws Exception { + List data = + ElasticsearchIOTestUtils.createDocuments( + numDocs, ElasticsearchIOTestUtils.InjectionMode.DO_NOT_INJECT_INVALID_DOCS); + + Write write = + ElasticsearchIO.write() + .withConnectionConfiguration(connectionConfiguration) + .withMaxParallelRequestsPerWindow(1); + + PCollection>> batches = + pipeline.apply(Create.of(data)).apply(StatefulBatching.fromSpec(write.getBulkIO())); + + PCollection keyValues = + batches.apply( + MapElements.into(integers()) + .via((SerializableFunction>, Integer>) KV::getKey)); + + // Number of unique keys produced should be number of maxParallelRequestsPerWindow * numWindows + // There is only 1 request (key) per window, and 1 (global) window ie. 
one key total where + // key value is 0 + PAssert.that(keyValues).containsInAnyOrder(0); + + PAssert.that(batches).satisfies(new AssertThatHasExpectedContents(0, data)); + + pipeline.run(); + } + + private static class AssertThatHasExpectedContents + implements SerializableFunction>>, Void> { + + private final int key; + private final List expectedContents; + + AssertThatHasExpectedContents(int key, List expected) { + this.key = key; + this.expectedContents = expected; + } + + @Override + public Void apply(Iterable>> actual) { + assertThat( + actual, + IsIterableContainingInAnyOrder.containsInAnyOrder( + KvMatcher.isKv( + is(key), + IsIterableContainingInAnyOrder.containsInAnyOrder(expectedContents.toArray())))); + return null; + } + } + + public static class KvMatcher extends TypeSafeMatcher> { + final Matcher keyMatcher; + final Matcher valueMatcher; + + public static KvMatcher isKv(Matcher keyMatcher, Matcher valueMatcher) { + return new KvMatcher<>(keyMatcher, valueMatcher); + } + + public KvMatcher(Matcher keyMatcher, Matcher valueMatcher) { + this.keyMatcher = keyMatcher; + this.valueMatcher = valueMatcher; + } + + @Override + public boolean matchesSafely(KV kv) { + return keyMatcher.matches(kv.getKey()) && valueMatcher.matches(kv.getValue()); + } + + @Override + public void describeTo(Description description) { + description + .appendText("a KV(") + .appendValue(keyMatcher) + .appendText(", ") + .appendValue(valueMatcher) + .appendText(")"); + } } /** @@ -630,7 +933,7 @@ void testWriteRetry() throws Throwable { // max attempt is 3, but retry is 2 which excludes 1st attempt when error was identified and // retry started. expectedException.expectMessage( - String.format(ElasticsearchIO.Write.WriteFn.RETRY_FAILED_LOG, EXPECTED_RETRIES)); + String.format(ElasticsearchIO.BulkIO.RETRY_FAILED_LOG, EXPECTED_RETRIES)); ElasticsearchIO.Write write = ElasticsearchIO.write() @@ -653,6 +956,25 @@ void testWriteRetryValidRequest() throws Exception { executeWriteTest(write); } + void testWriteAppendOnly() throws Exception { + Write write = + ElasticsearchIO.write() + .withConnectionConfiguration(connectionConfiguration) + .withIdFn(new ExtractValueFn("id")) + .withAppendOnly(true); + executeWriteTest(write); + } + + void testWriteAppendOnlyDeleteNotAllowed() throws Exception { + Write write = + ElasticsearchIO.write() + .withConnectionConfiguration(connectionConfiguration) + .withIdFn(new ExtractValueFn("id")) + .withAppendOnly(true) + .withIsDeleteFn(doc -> true); + executeWriteTest(write); + } + private void executeWriteTest(ElasticsearchIO.Write write) throws Exception { List data = ElasticsearchIOTestUtils.createDocuments( @@ -663,7 +985,7 @@ private void executeWriteTest(ElasticsearchIO.Write write) throws Exception { long currentNumDocs = refreshIndexAndGetCurrentNumDocs(connectionConfiguration, restClient); assertEquals(numDocs, currentNumDocs); - int count = countByScientistName(connectionConfiguration, restClient, "Einstein"); + int count = countByScientistName(connectionConfiguration, restClient, "Einstein", null); assertEquals(numDocs / NUM_SCIENTISTS, count); } @@ -702,11 +1024,11 @@ void testWriteWithIsDeletedFnWithPartialUpdates() throws Exception { // check we have not unwittingly modified existing behaviour assertEquals( numDocs / NUM_SCIENTISTS, - countByScientistName(connectionConfiguration, restClient, "Einstein")); + countByScientistName(connectionConfiguration, restClient, "Einstein", null)); // Check if documents are deleted as expected assertEquals(numDocs / 2, 
currentNumDocs); - assertEquals(0, countByScientistName(connectionConfiguration, restClient, "Darwin")); + assertEquals(0, countByScientistName(connectionConfiguration, restClient, "Darwin", null)); } /** @@ -744,10 +1066,10 @@ void testWriteWithIsDeletedFnWithoutPartialUpdate() throws Exception { // check we have not unwittingly modified existing behaviour assertEquals( numDocs / NUM_SCIENTISTS, - countByScientistName(connectionConfiguration, restClient, "Einstein")); + countByScientistName(connectionConfiguration, restClient, "Einstein", null)); // Check if documents are deleted as expected assertEquals(numDocs / 2, currentNumDocs); - assertEquals(0, countByScientistName(connectionConfiguration, restClient, "Darwin")); + assertEquals(0, countByScientistName(connectionConfiguration, restClient, "Darwin", null)); } } diff --git a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-common/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTestUtils.java b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-common/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTestUtils.java index 16c0248363c5..5498f4488039 100644 --- a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-common/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTestUtils.java +++ b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-common/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTestUtils.java @@ -20,37 +20,54 @@ import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.ConnectionConfiguration; import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.getBackendVersion; import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.parseResponse; +import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestCommon.ES_TYPE; +import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestCommon.getEsIndex; +import com.fasterxml.jackson.core.JsonProcessingException; import com.fasterxml.jackson.databind.JsonNode; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.fasterxml.jackson.databind.node.ObjectNode; import java.io.IOException; +import java.time.Duration; import java.util.ArrayList; import java.util.Collections; import java.util.List; +import java.util.Map; +import org.apache.beam.sdk.values.KV; import org.apache.http.HttpEntity; +import org.apache.http.HttpHost; import org.apache.http.entity.ContentType; import org.apache.http.nio.entity.NStringEntity; +import org.checkerframework.checker.nullness.qual.Nullable; import org.elasticsearch.client.Request; import org.elasticsearch.client.Response; import org.elasticsearch.client.RestClient; +import org.testcontainers.elasticsearch.ElasticsearchContainer; +import org.testcontainers.utility.DockerImageName; /** Test utilities to use with {@link ElasticsearchIO}. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) class ElasticsearchIOTestUtils { + static final int ELASTICSEARCH_DEFAULT_PORT = 9200; + static final String ELASTICSEARCH_PASSWORD = "superSecure"; + static final String ELASTIC_UNAME = "elastic"; + static final String[] FAMOUS_SCIENTISTS = { - "Einstein", - "Darwin", - "Copernicus", - "Pasteur", - "Curie", - "Faraday", - "Newton", - "Bohr", - "Galilei", - "Maxwell" + "einstein", + "darwin", + "copernicus", + "pasteur", + "curie", + "faraday", + "newton", + "bohr", + "galilei", + "maxwell" }; + static final ObjectMapper MAPPER = new ObjectMapper(); static final int NUM_SCIENTISTS = FAMOUS_SCIENTISTS.length; + static final String SCRIPT_SOURCE = + "if(ctx._source.group != null) { ctx._source.group = params.id % 2 } else { ctx._source" + + ".group = 0 }"; /** Enumeration that specifies whether to insert malformed documents. */ public enum InjectionMode { @@ -69,11 +86,10 @@ private static void closeIndex(RestClient restClient, String index) throws IOExc restClient.performRequest(request); } - private static void deleteIndex(RestClient restClient, String index) throws IOException { + static void deleteIndex(RestClient restClient, String index) throws IOException { try { closeIndex(restClient, index); Request request = new Request("DELETE", String.format("/%s", index)); - request.addParameters(Collections.singletonMap("refresh", "wait_for")); restClient.performRequest(request); } catch (IOException e) { // it is fine to ignore this expression as deleteIndex occurs in @before, @@ -84,6 +100,27 @@ private static void deleteIndex(RestClient restClient, String index) throws IOEx } } + public static void createIndex(RestClient restClient, String indexName) throws IOException { + deleteIndex(restClient, indexName); + Request request = new Request("PUT", String.format("/%s", indexName)); + restClient.performRequest(request); + } + + public static void setDefaultTemplate(RestClient restClient) throws IOException { + Request request = new Request("PUT", "/_template/default"); + NStringEntity body = + new NStringEntity( + "{" + + "\"order\": 0," + + "\"index_patterns\": [\"*\"]," + + "\"template\": \"*\"," + + "\"settings\": {\"index.number_of_shards\": 1, \"index.number_of_replicas\": 0}}", + ContentType.APPLICATION_JSON); + + request.setEntity(body); + restClient.performRequest(request); + } + /** * Synchronously deletes the target if it exists and then (re)creates it as a copy of source * synchronously. @@ -104,11 +141,8 @@ static void copyIndex(RestClient restClient, String source, String target) throw /** Inserts the given number of test documents into Elasticsearch. 
*/ static void insertTestDocuments( - ConnectionConfiguration connectionConfiguration, long numDocs, RestClient restClient) + ConnectionConfiguration connectionConfiguration, List data, RestClient restClient) throws IOException { - List data = - ElasticsearchIOTestUtils.createDocuments( - numDocs, ElasticsearchIOTestUtils.InjectionMode.DO_NOT_INJECT_INVALID_DOCS); StringBuilder bulkRequest = new StringBuilder(); int i = 0; for (String document : data) { @@ -129,9 +163,24 @@ static void insertTestDocuments( request.addParameters(Collections.singletonMap("refresh", "wait_for")); request.setEntity(requestBody); Response response = restClient.performRequest(request); - ElasticsearchIO.checkForErrors( - response.getEntity(), ElasticsearchIO.getBackendVersion(connectionConfiguration), false); + ElasticsearchIO.checkForErrors(response.getEntity(), Collections.emptySet()); + } + + /** Inserts the given number of test documents into Elasticsearch. */ + static void insertTestDocuments( + ConnectionConfiguration connectionConfiguration, long numDocs, RestClient restClient) + throws IOException { + List data = + ElasticsearchIOTestUtils.createDocuments( + numDocs, ElasticsearchIOTestUtils.InjectionMode.DO_NOT_INJECT_INVALID_DOCS); + insertTestDocuments(connectionConfiguration, data, restClient); } + + static void refreshAllIndices(RestClient restClient) throws IOException { + Request request = new Request("POST", "/_refresh"); + restClient.performRequest(request); + } + /** * Forces a refresh of the given index to make recently inserted documents available for search * using the index and type named in the connectionConfiguration. @@ -147,31 +196,66 @@ static long refreshIndexAndGetCurrentNumDocs( restClient, connectionConfiguration.getIndex(), connectionConfiguration.getType(), - getBackendVersion(connectionConfiguration)); + getBackendVersion(connectionConfiguration), + null); + } + + /** + * Forces a refresh of the given index to make recently inserted documents available for search + * using the index and type named in the connectionConfiguration. + * + * @param connectionConfiguration providing the index and type + * @param restClient To use for issuing queries + * @param urlParams Optional key/value pairs describing URL params for ES APIs + * @return The number of docs in the index + * @throws IOException On error communicating with Elasticsearch + */ + static long refreshIndexAndGetCurrentNumDocs( + ConnectionConfiguration connectionConfiguration, + RestClient restClient, + @Nullable Map urlParams) + throws IOException { + return refreshIndexAndGetCurrentNumDocs( + restClient, + connectionConfiguration.getIndex(), + connectionConfiguration.getType(), + getBackendVersion(connectionConfiguration), + urlParams); } + static long refreshIndexAndGetCurrentNumDocs( + RestClient restClient, String index, String type, int backendVersion) throws IOException { + return refreshIndexAndGetCurrentNumDocs(restClient, index, type, backendVersion, null); + } /** * Forces a refresh of the given index to make recently inserted documents available for search. 
* * @param restClient To use for issuing queries * @param index The Elasticsearch index * @param type The Elasticsearch type + * @param urlParams Optional key/value pairs describing URL params for ES APIs * @return The number of docs in the index * @throws IOException On error communicating with Elasticsearch */ static long refreshIndexAndGetCurrentNumDocs( - RestClient restClient, String index, String type, int backenVersion) throws IOException { + RestClient restClient, + String index, + String type, + int backendVersion, + @Nullable Map urlParams) + throws IOException { long result = 0; try { - String endPoint = String.format("/%s/_refresh", index); - Request request = new Request("POST", endPoint); - restClient.performRequest(request); + refreshAllIndices(restClient); - endPoint = String.format("/%s/%s/_search", index, type); - request = new Request("GET", endPoint); + String endPoint = generateSearchPath(index, type); + Request request = new Request("GET", endPoint); + if (urlParams != null) { + request.addParameters(urlParams); + } Response response = restClient.performRequest(request); JsonNode searchResult = ElasticsearchIO.parseResponse(response.getEntity()); - if (backenVersion >= 7) { + if (backendVersion >= 7) { result = searchResult.path("hits").path("total").path("value").asLong(); } else { result = searchResult.path("hits").path("total").asLong(); @@ -210,19 +294,67 @@ static List createDocuments(long numDocs, InjectionMode injectionMode) { return data; } + static List createJsonDocuments(long numDocs, InjectionMode injectionMode) + throws JsonProcessingException { + List stringData = createDocuments(numDocs, injectionMode); + List data = new ArrayList<>(); + + for (String doc : stringData) { + data.add((ObjectNode) MAPPER.readTree(doc)); + } + return data; + } + /** * Executes a query for the named scientist and returns the count from the result. * * @param connectionConfiguration Specifies the index and type * @param restClient To use to execute the call * @param scientistName The scientist to query for + * @param urlParams Optional key/value pairs describing URL params for ES APIs * @return The count of documents found * @throws IOException On error talking to Elasticsearch */ static int countByScientistName( - ConnectionConfiguration connectionConfiguration, RestClient restClient, String scientistName) + ConnectionConfiguration connectionConfiguration, + RestClient restClient, + String scientistName, + @Nullable Map urlParams) throws IOException { - return countByMatch(connectionConfiguration, restClient, "scientist", scientistName); + return countByMatch( + connectionConfiguration, restClient, "scientist", scientistName, urlParams, null); + } + + /** + * Creates a _search API path depending on ConnectionConfiguration and url params. + * + * @param index Optional Elasticsearch index + * @param type Optional Elasticsearch type + * @return The _search endpoint for the provided settings. + */ + static String generateSearchPath(@Nullable String index, @Nullable String type) { + StringBuilder sb = new StringBuilder(); + if (index != null) { + sb.append("/").append(index); + } + if (type != null) { + sb.append("/").append(type); + } + + sb.append("/_search"); + + return sb.toString(); + } + + /** + * Creates a _search API path depending on ConnectionConfiguration and url params. + * + * @param connectionConfiguration Specifies the index and type + * @return The _search endpoint for the provided settings. 
+ */ + static String generateSearchPath(ConnectionConfiguration connectionConfiguration) { + return generateSearchPath( + connectionConfiguration.getIndex(), connectionConfiguration.getType()); } /** @@ -232,6 +364,8 @@ static int countByScientistName( * @param restClient To use to execute the call * @param field The field to query * @param value The value to match + * @param urlParams Optional key/value pairs describing URL params for ES APIs + * @param versionNumberCountPair Optional pair of [version_number, expected_num_doc_with_version] * @return The count of documents in the search result * @throws IOException On error communicating with Elasticsearch */ @@ -239,10 +373,18 @@ static int countByMatch( ConnectionConfiguration connectionConfiguration, RestClient restClient, String field, - String value) + String value, + @Nullable Map urlParams, + @Nullable KV versionNumberCountPair) throws IOException { + String size = + versionNumberCountPair == null ? "10" : versionNumberCountPair.getValue().toString(); String requestBody = "{\n" + + "\"size\": " + + size + + ",\n" + + "\"version\" : true,\n" + " \"query\" : {\"match\": {\n" + " \"" + field @@ -251,21 +393,63 @@ static int countByMatch( + "\"\n" + " }}\n" + "}\n"; - String endPoint = - String.format( - "/%s/%s/_search", - connectionConfiguration.getIndex(), connectionConfiguration.getType()); + + String endPoint = generateSearchPath(connectionConfiguration); HttpEntity httpEntity = new NStringEntity(requestBody, ContentType.APPLICATION_JSON); Request request = new Request("GET", endPoint); - request.addParameters(Collections.emptyMap()); request.setEntity(httpEntity); + if (urlParams != null) { + request.addParameters(urlParams); + } + Response response = restClient.performRequest(request); JsonNode searchResult = parseResponse(response.getEntity()); + + if (versionNumberCountPair != null) { + int numHits = 0; + for (JsonNode hit : searchResult.path("hits").path("hits")) { + if (hit.path("_version").asInt() == versionNumberCountPair.getKey()) { + numHits++; + } + } + return numHits; + } + if (getBackendVersion(connectionConfiguration) >= 7) { return searchResult.path("hits").path("total").path("value").asInt(); } else { return searchResult.path("hits").path("total").asInt(); } } + + static RestClient clientFromContainer(ElasticsearchContainer container) { + return RestClient.builder( + new HttpHost( + container.getContainerIpAddress(), + container.getMappedPort(ELASTICSEARCH_DEFAULT_PORT), + "http")) + .build(); + } + + static ConnectionConfiguration createConnectionConfig(RestClient restClient) { + String[] hostStrings = + restClient.getNodes().stream().map(node -> node.getHost().toURI()).toArray(String[]::new); + + return ConnectionConfiguration.create(hostStrings, getEsIndex(), ES_TYPE) + .withSocketTimeout(120000) + .withConnectTimeout(5000); + } + + static ElasticsearchContainer createTestContainer(String imageTag) { + ElasticsearchContainer container = + new ElasticsearchContainer( + DockerImageName.parse("docker.elastic.co/elasticsearch/elasticsearch") + .withTag(imageTag)) + .withEnv("xpack.security.enabled", "false"); + + container.withStartupTimeout(Duration.ofMinutes(3)); + + return container; + } } diff --git a/sdks/java/io/elasticsearch/OWNERS b/sdks/java/io/elasticsearch/OWNERS index c2bb79459d8b..c4a265bcd848 100644 --- a/sdks/java/io/elasticsearch/OWNERS +++ b/sdks/java/io/elasticsearch/OWNERS @@ -1,6 +1,7 @@ # See the OWNERS docs at https://s.apache.org/beam-owners reviewers: + - egalpin - echauchot - jbonofre - 
timrobertson100 diff --git a/sdks/java/io/elasticsearch/build.gradle b/sdks/java/io/elasticsearch/build.gradle index ac9e5e96cf02..ebe1372533a8 100644 --- a/sdks/java/io/elasticsearch/build.gradle +++ b/sdks/java/io/elasticsearch/build.gradle @@ -25,8 +25,16 @@ ext.summary = "IO to read and write on Elasticsearch" dependencies { compile library.java.vendored_guava_26_0_jre compile project(path: ":sdks:java:core", configuration: "shadow") - compile library.java.jackson_databind + compile library.java.http_client + compile library.java.http_core compile library.java.jackson_annotations + permitUnusedDeclared library.java.jackson_annotations // BEAM-11761 + compile library.java.jackson_core + compile library.java.jackson_databind + compile library.java.joda_time + compile library.java.slf4j_api + compile "org.apache.httpcomponents:httpasyncclient:4.1.4" + compile "org.apache.httpcomponents:httpcore-nio:4.4.12" compile "org.elasticsearch.client:elasticsearch-rest-client:7.9.2" testCompile project(path: ":sdks:java:io:common", configuration: "testRuntime") } diff --git a/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java b/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java index 2375d3ea4afe..97257e4ee4b8 100644 --- a/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java +++ b/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java @@ -32,6 +32,8 @@ import java.io.IOException; import java.io.InputStream; import java.io.Serializable; +import java.net.ConnectException; +import java.net.SocketTimeoutException; import java.net.URL; import java.nio.charset.StandardCharsets; import java.security.KeyStore; @@ -39,11 +41,14 @@ import java.util.Arrays; import java.util.Collections; import java.util.HashMap; +import java.util.HashSet; import java.util.Iterator; import java.util.List; import java.util.ListIterator; import java.util.Map; import java.util.NoSuchElementException; +import java.util.Objects; +import java.util.Set; import java.util.function.Predicate; import javax.annotation.Nonnull; import javax.net.ssl.SSLContext; @@ -55,29 +60,37 @@ import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.ValueProvider; import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.GroupIntoBatches; import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.Reshuffle; import org.apache.beam.sdk.transforms.SerializableFunction; import org.apache.beam.sdk.transforms.display.DisplayData; import org.apache.beam.sdk.util.BackOff; import org.apache.beam.sdk.util.BackOffUtils; import org.apache.beam.sdk.util.FluentBackoff; import org.apache.beam.sdk.util.Sleeper; +import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PBegin; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PDone; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Strings; +import org.apache.http.ConnectionClosedException; +import org.apache.http.Header; import org.apache.http.HttpEntity; import org.apache.http.HttpHost; import org.apache.http.auth.AuthScope; import org.apache.http.auth.UsernamePasswordCredentials; import org.apache.http.client.CredentialsProvider; import 
org.apache.http.client.config.RequestConfig; +import org.apache.http.conn.ConnectTimeoutException; import org.apache.http.conn.ssl.TrustSelfSignedStrategy; import org.apache.http.conn.ssl.TrustStrategy; import org.apache.http.entity.BufferedHttpEntity; import org.apache.http.entity.ContentType; import org.apache.http.impl.client.BasicCredentialsProvider; +import org.apache.http.message.BasicHeader; import org.apache.http.nio.conn.ssl.SSLIOSessionStrategy; import org.apache.http.nio.entity.NStringEntity; import org.apache.http.ssl.SSLContexts; @@ -109,17 +122,53 @@ * * } * - *

    The connection configuration also accepts optional configuration: {@code withUsername()} and - * {@code withPassword()}. + *

    The connection configuration also accepts optional configuration: {@code withUsername()}, + * {@code withPassword()}, {@code withApiKey()} and {@code withBearerToken()}. * *
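For instance, a connection configuration using either basic credentials or the API key option added here might be built as in the following sketch; the host, index, type, and credential values are placeholders, not part of this change:

```java
import org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.ConnectionConfiguration;

public class ConnectionConfigurationExample {
  public static void main(String[] args) {
    // Basic authentication with a username and password (placeholder values).
    ConnectionConfiguration basicAuth =
        ConnectionConfiguration.create(
                new String[] {"https://es-host:9200"}, "my-index", "_doc")
            .withUsername("elastic")
            .withPassword("changeme");

    // Alternatively, authenticate with a base64-encoded API key.
    ConnectionConfiguration apiKeyAuth =
        ConnectionConfiguration.create(
                new String[] {"https://es-host:9200"}, "my-index", "_doc")
            .withApiKey("base64EncodedApiKey");
  }
}
```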

    You can also specify a query on the {@code read()} using {@code withQuery()}. * + *

There are many more configuration options, which can be found by looking at the with* methods + * of {@link ElasticsearchIO.Read}. + *
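As an illustrative sketch of a read that combines a query with a couple of those with* options (the host, index, and query values are placeholders only):

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class ReadWithOptionsExample {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    PCollection<String> docs =
        pipeline.apply(
            ElasticsearchIO.read()
                .withConnectionConfiguration(
                    ElasticsearchIO.ConnectionConfiguration.create(
                        new String[] {"http://localhost:9200"}, "scientists", "_doc"))
                // Restrict the read to documents matching this (illustrative) query.
                .withQuery("{\"query\": {\"match\": {\"scientist\": \"einstein\"}}}")
                // Two of the additional with* options mentioned above.
                .withScrollKeepalive("5m")
                .withBatchSize(100L));

    pipeline.run().waitUntilFinish();
  }
}
```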

    Writing to Elasticsearch

    * *

    To write documents to Elasticsearch, use {@link ElasticsearchIO#write * ElasticsearchIO.write()}, which writes JSON documents from a {@link PCollection * PCollection<String>} (which can be bounded or unbounded). * + *
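A minimal bounded write might look like the following sketch; host, index, and document contents are placeholders, and an unbounded input PCollection would be handled the same way:

```java
import java.util.Arrays;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;

public class MinimalWriteExample {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    pipeline
        // A bounded collection of JSON documents (placeholder data).
        .apply(
            Create.of(
                Arrays.asList(
                    "{\"id\": \"1\", \"scientist\": \"einstein\"}",
                    "{\"id\": \"2\", \"scientist\": \"darwin\"}")))
        .apply(
            ElasticsearchIO.write()
                .withConnectionConfiguration(
                    ElasticsearchIO.ConnectionConfiguration.create(
                        new String[] {"http://localhost:9200"}, "scientists", "_doc"))
                // Optional: cap the number of documents per Bulk API request.
                .withMaxBatchSize(100L));

    pipeline.run().waitUntilFinish();
  }
}
```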

    {@link ElasticsearchIO.Write} involves 2 discrete steps: + * + *

      + *
• Converting the input PCollection of valid ES documents into Bulk API directives, i.e. deciding whether + * each input document should result in an update, insert, or delete, with version, with routing, etc. (See + * {@link ElasticsearchIO.DocToBulk}) +
    • Batching Bulk API directives together and interfacing with an Elasticsearch cluster. (See + * {@link ElasticsearchIO.BulkIO}) + *
    + * + *

    In most cases, using {@link ElasticsearchIO#write} will be desirable. In some cases, one may + * want to use {@link ElasticsearchIO.DocToBulk} and {@link ElasticsearchIO.BulkIO} directly. Such + * cases might include: + * + *

      + *
    • Unit testing. Ensure that output Bulk API entities for a given set of inputs will produce + * an expected result, without the need for an available Elasticsearch cluster. See {@link + * ElasticsearchIO.Write#docToBulk} + *
    • Flexible options for data backup. Serialized Bulk API entities can be forked and sent to + * both Elasticsearch and a data lake. + *
    • Mirroring data to multiple clusters. Presently, mirroring data to multiple clusters would + * require duplicate computation. + *
    • Better batching with input streams in one job. A job may produce multiple "shapes" of Bulk + * API directives based on multiple input types, and then "fan-in" all serialized Bulk + * directives into a single BulkIO transform to improve batching semantics. + *
    • Decoupled jobs. Job(s) could be made to produce Bulk directives and then publish them to a + * message bus. A distinct job could consume from that message bus and solely be responsible + * for IO with the target cluster(s). + *
    + * + *

Note that the configuration options for {@link ElasticsearchIO.Write} are a union of the + * configuration options for {@link ElasticsearchIO.DocToBulk} and {@link ElasticsearchIO.BulkIO}; a sketch of the + * decoupled DocToBulk/BulkIO usage is shown below. + * *
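As a rough sketch of that decoupled usage (connection details, file paths, and document contents are illustrative only, and error handling is omitted), serialized Bulk API directives can be produced once and then both archived and written to Elasticsearch:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;

public class DocToBulkThenBulkIOExample {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    ElasticsearchIO.ConnectionConfiguration connection =
        ElasticsearchIO.ConnectionConfiguration.create(
            new String[] {"http://localhost:9200"}, "scientists", "_doc");

    // Step 1: serialize each JSON document into its Bulk API directive.
    PCollection<String> bulkDirectives =
        pipeline
            .apply(Create.of("{\"id\": \"1\", \"scientist\": \"einstein\"}"))
            .apply(ElasticsearchIO.docToBulk().withConnectionConfiguration(connection));

    // The directives can be forked, e.g. archived to files or a data lake...
    bulkDirectives.apply(TextIO.write().to("/tmp/bulk-directives"));

    // ...and also sent to the Elasticsearch cluster via the Bulk API.
    bulkDirectives.apply(
        ElasticsearchIO.bulkIO()
            .withConnectionConfiguration(connection)
            .withMaxBatchSize(1000L));

    pipeline.run().waitUntilFinish();
  }
}
```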

    To configure {@link ElasticsearchIO#write ElasticsearchIO.write()}, similar to the read, you * have to provide a connection configuration. For instance: * @@ -132,25 +181,8 @@ * * } * - *

    Optionally, you can provide {@code withBatchSize()} and {@code withBatchSizeBytes()} to - * specify the size of the write batch in number of documents or in bytes. - * - *

    Optionally, you can provide an {@link ElasticsearchIO.Write.FieldValueExtractFn} using {@code - * withIdFn()} that will be run to extract the id value out of the provided document rather than - * using the document id auto-generated by Elasticsearch. - * - *

    Optionally, you can provide {@link ElasticsearchIO.Write.FieldValueExtractFn} using {@code - * withIndexFn()} or {@code withTypeFn()} to enable per-document routing to the target Elasticsearch - * index (all versions) and type (version > 6). Support for type routing was removed in - * Elasticsearch 6 (see - * https://www.elastic.co/blog/index-type-parent-child-join-now-future-in-elasticsearch) - * - *

    When {withUsePartialUpdate()} is enabled, the input document must contain an id field and - * {@code withIdFn()} must be used to allow its extraction by the ElasticsearchIO. - * - *

    Optionally, {@code withSocketTimeout()} can be used to override the default retry timeout and - * socket timeout of 30000ms. {@code withConnectTimeout()} can be used to override the default - * connect timeout of 1000ms. + *

    There are many more configuration options which can be found by looking at the with* methods + * of {@link ElasticsearchIO.Write} */ @Experimental(Kind.SOURCE_SINK) @SuppressWarnings({ @@ -158,10 +190,16 @@ }) public class ElasticsearchIO { + private static final List VALID_CLUSTER_VERSIONS = Arrays.asList(2, 5, 6, 7); + private static final List VERSION_TYPES = + Arrays.asList("internal", "external", "external_gt", "external_gte"); + private static final String VERSION_CONFLICT_ERROR = "version_conflict_engine_exception"; + private static final Logger LOG = LoggerFactory.getLogger(ElasticsearchIO.class); public static Read read() { - // default scrollKeepalive = 5m as a majorant for un-predictable time between 2 start/read calls + // default scrollKeepalive = 5m as a majorant for un-predictable time between 2 start/read + // calls // default batchSize to 100 as recommended by ES dev team as a safe value when dealing // with big documents and still a good compromise for performances return new AutoValue_ElasticsearchIO_Read.Builder() @@ -171,16 +209,25 @@ public static Read read() { .build(); } - public static Write write() { - return new AutoValue_ElasticsearchIO_Write.Builder() + public static DocToBulk docToBulk() { + return new AutoValue_ElasticsearchIO_DocToBulk.Builder().build(); + } + + public static BulkIO bulkIO() { + return new AutoValue_ElasticsearchIO_BulkIO.Builder() // advised default starting batch size in ES docs .setMaxBatchSize(1000L) // advised default starting batch size in ES docs .setMaxBatchSizeBytes(5L * 1024L * 1024L) - .setUsePartialUpdate(false) // default is document upsert + .setUseStatefulBatches(false) + .setMaxParallelRequestsPerWindow(1) .build(); } + public static Write write() { + return new Write(); + } + private ElasticsearchIO() {} private static final ObjectMapper mapper = new ObjectMapper(); @@ -190,44 +237,48 @@ static JsonNode parseResponse(HttpEntity responseEntity) throws IOException { return mapper.readValue(responseEntity.getContent(), JsonNode.class); } - static void checkForErrors(HttpEntity responseEntity, int backendVersion, boolean partialUpdate) + static void checkForErrors(HttpEntity responseEntity, @Nullable Set allowedErrorTypes) throws IOException { + JsonNode searchResult = parseResponse(responseEntity); boolean errors = searchResult.path("errors").asBoolean(); if (errors) { + int numErrors = 0; + StringBuilder errorMessages = new StringBuilder("Error writing to Elasticsearch, some elements could not be inserted:"); JsonNode items = searchResult.path("items"); + if (items.isMissingNode() || items.size() == 0) { + errorMessages.append(searchResult.toString()); + } // some items present in bulk might have errors, concatenate error messages for (JsonNode item : items) { - - String errorRootName = ""; - // when use partial update, the response items includes all the update. - if (partialUpdate) { - errorRootName = "update"; - } else { - if (backendVersion == 2) { - errorRootName = "create"; - } else if (backendVersion >= 5) { - errorRootName = "index"; - } - } - JsonNode errorRoot = item.path(errorRootName); - JsonNode error = errorRoot.get("error"); + JsonNode error = item.findValue("error"); if (error != null) { + // N.B. An empty-string within the allowedErrorTypes Set implies all errors are allowed. 
String type = error.path("type").asText(); String reason = error.path("reason").asText(); - String docId = errorRoot.path("_id").asText(); - errorMessages.append(String.format("%nDocument id %s: %s (%s)", docId, reason, type)); - JsonNode causedBy = error.get("caused_by"); - if (causedBy != null) { - String cbReason = causedBy.path("reason").asText(); - String cbType = causedBy.path("type").asText(); - errorMessages.append(String.format("%nCaused by: %s (%s)", cbReason, cbType)); + String docId = item.findValue("_id").asText(); + JsonNode causedBy = error.path("caused_by"); // May not be present + String cbReason = causedBy.path("reason").asText(); + String cbType = causedBy.path("type").asText(); + + if (allowedErrorTypes == null + || (!allowedErrorTypes.contains(type) && !allowedErrorTypes.contains(cbType))) { + // 'error' and 'causedBy` fields are not null, and the error is not being ignored. + numErrors++; + + errorMessages.append(String.format("%nDocument id %s: %s (%s)", docId, reason, type)); + + if (!causedBy.isMissingNode()) { + errorMessages.append(String.format("%nCaused by: %s (%s)", cbReason, cbType)); + } } } } - throw new IOException(errorMessages.toString()); + if (numErrors > 0) { + throw new IOException(errorMessages.toString()); + } } } @@ -241,6 +292,10 @@ public abstract static class ConnectionConfiguration implements Serializable { public abstract @Nullable String getPassword(); + public abstract @Nullable String getApiKey(); + + public abstract @Nullable String getBearerToken(); + public abstract @Nullable String getKeystorePath(); public abstract @Nullable String getKeystorePassword(); @@ -265,6 +320,10 @@ abstract static class Builder { abstract Builder setPassword(String password); + abstract Builder setApiKey(String apiKey); + + abstract Builder setBearerToken(String bearerToken); + abstract Builder setKeystorePath(String keystorePath); abstract Builder setKeystorePassword(String password); @@ -303,6 +362,73 @@ public static ConnectionConfiguration create(String[] addresses, String index, S .build(); } + /** + * Creates a new Elasticsearch connection configuration with no default type. + * + * @param addresses list of addresses of Elasticsearch nodes + * @param index the index toward which the requests will be issued + * @return the connection configuration object + */ + public static ConnectionConfiguration create(String[] addresses, String index) { + checkArgument(addresses != null, "addresses can not be null"); + checkArgument(addresses.length > 0, "addresses can not be empty"); + checkArgument(index != null, "index can not be null"); + return new AutoValue_ElasticsearchIO_ConnectionConfiguration.Builder() + .setAddresses(Arrays.asList(addresses)) + .setIndex(index) + .setType("") + .setTrustSelfSignedCerts(false) + .build(); + } + + /** + * Creates a new Elasticsearch connection configuration with no default index nor type. + * + * @param addresses list of addresses of Elasticsearch nodes + * @return the connection configuration object + */ + public static ConnectionConfiguration create(String[] addresses) { + checkArgument(addresses != null, "addresses can not be null"); + checkArgument(addresses.length > 0, "addresses can not be empty"); + return new AutoValue_ElasticsearchIO_ConnectionConfiguration.Builder() + .setAddresses(Arrays.asList(addresses)) + .setIndex("") + .setType("") + .setTrustSelfSignedCerts(false) + .build(); + } + + /** + * Generates the bulk API endpoint based on the set values. + * + *

    Based on ConnectionConfiguration constructors, we know that one of the following is true: + * + *

      + *
    • index and type are non-empty strings + *
    • index is non-empty string, type is empty string + *
    • index and type are empty string + *
    + * + *

    Valid endpoints therefore include: + * + *

      + *
    • /_bulk + *
    • /index_name/_bulk + *
    • /index_name/type_name/_bulk + *
    + */ + public String getBulkEndPoint() { + StringBuilder sb = new StringBuilder(); + if (!Strings.isNullOrEmpty(getIndex())) { + sb.append("/").append(getIndex()); + } + if (!Strings.isNullOrEmpty(getType())) { + sb.append("/").append(getType()); + } + sb.append("/").append("_bulk"); + return sb.toString(); + } + /** * If Elasticsearch authentication is enabled, provide the username. * @@ -329,6 +455,30 @@ public ConnectionConfiguration withPassword(String password) { return builder().setPassword(password).build(); } + /** + * If Elasticsearch authentication is enabled, provide an API key. + * + * @param apiKey the API key used to authenticate to Elasticsearch + * @return a {@link ConnectionConfiguration} describes a connection configuration to + * Elasticsearch. + */ + public ConnectionConfiguration withApiKey(String apiKey) { + checkArgument(!Strings.isNullOrEmpty(apiKey), "apiKey can not be null or empty"); + return builder().setApiKey(apiKey).build(); + } + + /** + * If Elasticsearch authentication is enabled, provide a bearer token. + * + * @param bearerToken the bearer token used to authenticate to Elasticsearch + * @return a {@link ConnectionConfiguration} describes a connection configuration to + * Elasticsearch. + */ + public ConnectionConfiguration withBearerToken(String bearerToken) { + checkArgument(!Strings.isNullOrEmpty(bearerToken), "bearerToken can not be null or empty"); + return builder().setBearerToken(bearerToken).build(); + } + /** * If Elasticsearch uses SSL/TLS with mutual authentication (via shield), provide the keystore * containing the client key. @@ -424,6 +574,14 @@ RestClient createClient() throws IOException { httpAsyncClientBuilder -> httpAsyncClientBuilder.setDefaultCredentialsProvider(credentialsProvider)); } + if (getApiKey() != null) { + restClientBuilder.setDefaultHeaders( + new Header[] {new BasicHeader("Authorization", "ApiKey " + getApiKey())}); + } + if (getBearerToken() != null) { + restClientBuilder.setDefaultHeaders( + new Header[] {new BasicHeader("Authorization", "Bearer " + getBearerToken())}); + } if (getKeystorePath() != null && !getKeystorePath().isEmpty()) { try { KeyStore keyStore = KeyStore.getInstance("jks"); @@ -1030,37 +1188,43 @@ public boolean test(HttpEntity responseEntity) { } } - /** A {@link PTransform} writing data to Elasticsearch. */ + /** A {@link PTransform} converting docs to their Bulk API counterparts. */ @AutoValue - public abstract static class Write extends PTransform, PDone> { + public abstract static class DocToBulk + extends PTransform, PCollection> { - /** - * Interface allowing a specific field value to be returned from a parsed JSON document. This is - * used for using explicit document ids, and for dynamic routing (index/Type) on a document - * basis. A null response will result in default behaviour and an exception will be propagated - * as a failure. 
- */ - public interface FieldValueExtractFn extends SerializableFunction {} + private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper(); + private static final int DEFAULT_RETRY_ON_CONFLICT = 5; // race conditions on updates - public interface BooleanFieldValueExtractFn extends SerializableFunction {} + static { + SimpleModule module = new SimpleModule(); + module.addSerializer(DocumentMetadata.class, new DocumentMetadataSerializer()); + OBJECT_MAPPER.registerModule(module); + } abstract @Nullable ConnectionConfiguration getConnectionConfiguration(); - abstract long getMaxBatchSize(); + abstract Write.@Nullable FieldValueExtractFn getIdFn(); - abstract long getMaxBatchSizeBytes(); + abstract Write.@Nullable FieldValueExtractFn getIndexFn(); - abstract @Nullable FieldValueExtractFn getIdFn(); + abstract Write.@Nullable FieldValueExtractFn getRoutingFn(); - abstract @Nullable FieldValueExtractFn getIndexFn(); + abstract Write.@Nullable FieldValueExtractFn getTypeFn(); - abstract @Nullable FieldValueExtractFn getTypeFn(); + abstract Write.@Nullable FieldValueExtractFn getDocVersionFn(); - abstract @Nullable RetryConfiguration getRetryConfiguration(); + abstract @Nullable String getDocVersionType(); - abstract boolean getUsePartialUpdate(); + abstract @Nullable String getUpsertScript(); - abstract @Nullable BooleanFieldValueExtractFn getIsDeleteFn(); + abstract @Nullable Boolean getUsePartialUpdate(); + + abstract @Nullable Boolean getAppendOnly(); + + abstract Write.@Nullable BooleanFieldValueExtractFn getIsDeleteFn(); + + abstract @Nullable Integer getBackendVersion(); abstract Builder builder(); @@ -1068,77 +1232,52 @@ public interface BooleanFieldValueExtractFn extends SerializableFunction 0, "batchSize must be > 0, but was %s", batchSize); - return builder().setMaxBatchSize(batchSize).build(); - } - - /** - * Provide a maximum size in bytes for the batch see bulk API - * (https://www.elastic.co/guide/en/elasticsearch/reference/2.4/docs-bulk.html). Default is 5MB - * (like Elasticsearch bulk size advice). See - * https://www.elastic.co/guide/en/elasticsearch/guide/current/bulk.html Depending on the - * execution engine, size of bundles may vary, this sets the maximum size. Change this if you - * need to have smaller Elasticsearch bulks. - * - * @param batchSizeBytes maximum batch size in bytes - * @return the {@link Write} with connection batch size in bytes set - */ - public Write withMaxBatchSizeBytes(long batchSizeBytes) { - checkArgument(batchSizeBytes > 0, "batchSizeBytes must be > 0, but was %s", batchSizeBytes); - return builder().setMaxBatchSizeBytes(batchSizeBytes).build(); - } - /** * Provide a function to extract the id from the document. This id will be used as the document * id in Elasticsearch. Should the function throw an Exception then the batch will fail and the * exception propagated. * * @param idFn to extract the document ID - * @return the {@link Write} with the function set + * @return the {@link DocToBulk} with the function set */ - public Write withIdFn(FieldValueExtractFn idFn) { + public DocToBulk withIdFn(Write.FieldValueExtractFn idFn) { checkArgument(idFn != null, "idFn must not be null"); return builder().setIdFn(idFn).build(); } @@ -1149,13 +1288,26 @@ public Write withIdFn(FieldValueExtractFn idFn) { * exception propagated. 
* * @param indexFn to extract the destination index from - * @return the {@link Write} with the function set + * @return the {@link DocToBulk} with the function set */ - public Write withIndexFn(FieldValueExtractFn indexFn) { + public DocToBulk withIndexFn(Write.FieldValueExtractFn indexFn) { checkArgument(indexFn != null, "indexFn must not be null"); return builder().setIndexFn(indexFn).build(); } + /** + * Provide a function to extract the target routing from the document allowing for dynamic + * document routing. Should the function throw an Exception then the batch will fail and the + * exception propagated. + * + * @param routingFn to extract the destination index from + * @return the {@link DocToBulk} with the function set + */ + public DocToBulk withRoutingFn(Write.FieldValueExtractFn routingFn) { + checkArgument(routingFn != null, "routingFn must not be null"); + return builder().setRoutingFn(routingFn).build(); + } + /** * Provide a function to extract the target type from the document allowing for dynamic document * routing. Should the function throw an Exception then the batch will fail and the exception @@ -1165,9 +1317,9 @@ public Write withIndexFn(FieldValueExtractFn indexFn) { * discussed in this blog. * * @param typeFn to extract the destination index from - * @return the {@link Write} with the function set + * @return the {@link DocToBulk} with the function set */ - public Write withTypeFn(FieldValueExtractFn typeFn) { + public DocToBulk withTypeFn(Write.FieldValueExtractFn typeFn) { checkArgument(typeFn != null, "typeFn must not be null"); return builder().setTypeFn(typeFn).build(); } @@ -1177,37 +1329,62 @@ public Write withTypeFn(FieldValueExtractFn typeFn) { * Elasticsearch. * * @param usePartialUpdate set to true to issue partial updates - * @return the {@link Write} with the partial update control set + * @return the {@link DocToBulk} with the partial update control set */ - public Write withUsePartialUpdate(boolean usePartialUpdate) { + public DocToBulk withUsePartialUpdate(boolean usePartialUpdate) { return builder().setUsePartialUpdate(usePartialUpdate).build(); } /** - * Provides configuration to retry a failed batch call to Elasticsearch. A batch is considered - * as failed if the underlying {@link RestClient} surfaces 429 HTTP status code as error for one - * or more of the items in the {@link Response}. Users should consider that retrying might - * compound the underlying problem which caused the initial failure. Users should also be aware - * that once retrying is exhausted the error is surfaced to the runner which may then - * opt to retry the current bundle in entirety or abort if the max number of retries of the - * runner is completed. Retrying uses an exponential backoff algorithm, with minimum backoff of - * 5 seconds and then surfacing the error once the maximum number of retries or maximum - * configuration duration is exceeded. + * Provide an instruction to control whether the target index should be considered append-only. + * For append-only indexes and/or data streams, only {@code create} operations will be issued, + * instead of {@code index}, which is the default. * - *

    Example use: + *

    {@code create} fails if a document with the same ID already exists in the target, {@code + * index} adds or replaces a document as necessary. If no ID is provided, both operations are + * equivalent, unless you are writing to a data + * stream. Data streams only support the {@code create} operation. For more information see + * the Note: if the value of @param backendVersion differs from the version the destination + * cluster is running, behavior is undefined and likely to yield errors. + * + * @param backendVersion the major version number of the version of Elasticsearch being run in + * the cluster where documents will be indexed. + * @return the {@link DocToBulk} with the Elasticsearch major version number set + */ + public DocToBulk withBackendVersion(int backendVersion) { + checkArgument( + VALID_CLUSTER_VERSIONS.contains(backendVersion), + "Backend version may only be one of " + "%s", + String.join(", ", VERSION_TYPES)); + return builder().setBackendVersion(backendVersion).build(); + } + @Override - public PDone expand(PCollection input) { + public PCollection expand(PCollection docs) { ConnectionConfiguration connectionConfiguration = getConnectionConfiguration(); - FieldValueExtractFn idFn = getIdFn(); - BooleanFieldValueExtractFn isDeleteFn = getIsDeleteFn(); - checkState(connectionConfiguration != null, "withConnectionConfiguration() is required"); + Integer backendVersion = getBackendVersion(); + Write.FieldValueExtractFn idFn = getIdFn(); + Write.BooleanFieldValueExtractFn isDeleteFn = getIsDeleteFn(); + checkState( + (backendVersion != null || connectionConfiguration != null), + "withBackendVersion() or withConnectionConfiguration() is required"); checkArgument( isDeleteFn == null || idFn != null, "Id needs to be specified by withIdFn for delete operation"); - input.apply(ParDo.of(new WriteFn(this))); - return PDone.in(input.getPipeline()); + + return docs.apply(ParDo.of(new DocToBulkFn(this))); + } + + // Encapsulates the elements which form the metadata for an Elasticsearch bulk operation + private static class DocumentMetadata implements Serializable { + final String index; + final String type; + final String id; + final Integer retryOnConflict; + final String routing; + final Integer backendVersion; + final String version; + final String versionType; + + DocumentMetadata( + String index, + String type, + String id, + Integer retryOnConflict, + String routing, + Integer backendVersion, + String version, + String versionType) { + this.index = index; + this.id = id; + this.type = type; + this.retryOnConflict = retryOnConflict; + this.routing = routing; + this.backendVersion = backendVersion; + this.version = version; + this.versionType = versionType; + } + } + + private static class DocumentMetadataSerializer extends StdSerializer { + private DocumentMetadataSerializer() { + super(DocumentMetadata.class); + } + + @Override + public void serialize(DocumentMetadata value, JsonGenerator gen, SerializerProvider provider) + throws IOException { + gen.writeStartObject(); + if (value.index != null) { + gen.writeStringField("_index", value.index); + } + if (value.type != null) { + gen.writeStringField("_type", value.type); + } + if (value.id != null) { + gen.writeStringField("_id", value.id); + } + if (value.routing != null) { + gen.writeStringField("routing", value.routing); + } + if (value.retryOnConflict != null && value.backendVersion <= 6) { + gen.writeNumberField("_retry_on_conflict", value.retryOnConflict); + } + if (value.retryOnConflict != null && value.backendVersion 
>= 7) { + gen.writeNumberField("retry_on_conflict", value.retryOnConflict); + } + if (value.version != null) { + gen.writeStringField("version", value.version); + } + if (value.versionType != null) { + gen.writeStringField("version_type", value.versionType); + } + gen.writeEndObject(); + } } - /** {@link DoFn} to for the {@link Write} transform. */ @VisibleForTesting - static class WriteFn extends DoFn { - private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper(); - private static final int DEFAULT_RETRY_ON_CONFLICT = 5; // race conditions on updates + static String createBulkApiEntity(DocToBulk spec, String document, int backendVersion) + throws IOException { + String documentMetadata = "{}"; + boolean isDelete = false; + if (spec.getIndexFn() != null + || spec.getTypeFn() != null + || spec.getIdFn() != null + || spec.getRoutingFn() != null) { + // parse once and reused for efficiency + JsonNode parsedDocument = OBJECT_MAPPER.readTree(document); + documentMetadata = getDocumentMetadata(spec, parsedDocument, backendVersion); + if (spec.getIsDeleteFn() != null) { + isDelete = spec.getIsDeleteFn().apply(parsedDocument); + } + } + final boolean isAppendOnly = Boolean.TRUE.equals(spec.getAppendOnly()); - private static final Duration RETRY_INITIAL_BACKOFF = Duration.standardSeconds(5); + if (isDelete) { + checkState(!isAppendOnly, "No deletions allowed for append-only indices"); + // delete request used for deleting a document + return String.format("{ \"delete\" : %s }%n", documentMetadata); + } - @VisibleForTesting - static final String RETRY_ATTEMPT_LOG = "Error writing to Elasticsearch. Retry attempt[%d]"; + if (isAppendOnly) { + return String.format("{ \"create\" : %s }%n%s%n", documentMetadata, document); + } - @VisibleForTesting - static final String RETRY_FAILED_LOG = - "Error writing to ES after %d attempt(s). No more attempts allowed"; + // index is an insert/upsert and update is a partial update (or insert if not + // existing) + if (Boolean.TRUE.equals(spec.getUsePartialUpdate())) { + return String.format( + "{ \"update\" : %s }%n{ \"doc\" : %s, " + "\"doc_as_upsert\" : true }%n", + documentMetadata, document); + } else if (spec.getUpsertScript() != null) { + return String.format( + "{ \"update\" : %s }%n{ \"script\" : {\"source\": \"%s\", " + + "\"params\": %s}, \"upsert\" : %s, \"scripted_upsert\": true}%n", + documentMetadata, spec.getUpsertScript(), document, document); + } else { + return String.format("{ \"index\" : %s }%n%s%n", documentMetadata, document); + } + } - private transient FluentBackoff retryBackoff; + private static String lowerCaseOrNull(String input) { + return input == null ? null : input.toLowerCase(); + } + + /** + * Extracts the components that comprise the document address from the document using the {@link + * Write.FieldValueExtractFn} configured. This allows any or all of the index, type and document + * id to be controlled on a per document basis. If none are provided then an empty default of + * {@code {}} is returned. Sanitization of the index is performed, automatically lower-casing + * the value as required by Elasticsearch. 
+ * + * @param parsedDocument the json from which the index, type and id may be extracted + * @return the document address as JSON or the default + * @throws IOException if the document cannot be parsed as JSON + */ + private static String getDocumentMetadata( + DocToBulk spec, JsonNode parsedDocument, int backendVersion) throws IOException { + DocumentMetadata metadata = + new DocumentMetadata( + spec.getIndexFn() != null + ? lowerCaseOrNull(spec.getIndexFn().apply(parsedDocument)) + : null, + spec.getTypeFn() != null ? spec.getTypeFn().apply(parsedDocument) : null, + spec.getIdFn() != null ? spec.getIdFn().apply(parsedDocument) : null, + (Boolean.TRUE.equals(spec.getUsePartialUpdate()) + || (spec.getUpsertScript() != null && !spec.getUpsertScript().isEmpty())) + ? DEFAULT_RETRY_ON_CONFLICT + : null, + spec.getRoutingFn() != null ? spec.getRoutingFn().apply(parsedDocument) : null, + backendVersion, + spec.getDocVersionFn() != null ? spec.getDocVersionFn().apply(parsedDocument) : null, + spec.getDocVersionType()); + return OBJECT_MAPPER.writeValueAsString(metadata); + } + /** {@link DoFn} to for the {@link DocToBulk} transform. */ + @VisibleForTesting + static class DocToBulkFn extends DoFn { + private final DocToBulk spec; private int backendVersion; - private final Write spec; - private transient RestClient restClient; - private ArrayList batch; - private long currentBatchSizeBytes; - - // Encapsulates the elements which form the metadata for an Elasticsearch bulk operation - private static class DocumentMetadata implements Serializable { - final String index; - final String type; - final String id; - final Integer retryOnConflict; - - DocumentMetadata(String index, String type, String id, Integer retryOnConflict) { - this.index = index; - this.type = type; - this.id = id; - this.retryOnConflict = retryOnConflict; - } - } - @VisibleForTesting - WriteFn(Write spec) { + public DocToBulkFn(DocToBulk spec) { this.spec = spec; } @Setup public void setup() throws IOException { - ConnectionConfiguration connectionConfiguration = spec.getConnectionConfiguration(); - backendVersion = getBackendVersion(connectionConfiguration); - restClient = connectionConfiguration.createClient(); - - retryBackoff = - FluentBackoff.DEFAULT.withMaxRetries(0).withInitialBackoff(RETRY_INITIAL_BACKOFF); - - if (spec.getRetryConfiguration() != null) { - retryBackoff = - FluentBackoff.DEFAULT - .withInitialBackoff(RETRY_INITIAL_BACKOFF) - .withMaxRetries(spec.getRetryConfiguration().getMaxAttempts() - 1) - .withMaxCumulativeBackoff(spec.getRetryConfiguration().getMaxDuration()); + if (spec.getBackendVersion() != null) { + backendVersion = spec.getBackendVersion(); + } else { + backendVersion = ElasticsearchIO.getBackendVersion(spec.getConnectionConfiguration()); } - // configure a custom serializer for metadata to be able to change serialization based - // on ES version - SimpleModule module = new SimpleModule(); - module.addSerializer(DocumentMetadata.class, new DocumentMetadataSerializer()); - OBJECT_MAPPER.registerModule(module); } - @StartBundle - public void startBundle(StartBundleContext context) { - batch = new ArrayList<>(); - currentBatchSizeBytes = 0; + @ProcessElement + public void processElement(ProcessContext c) throws IOException { + c.output(createBulkApiEntity(spec, c.element(), backendVersion)); } + } + } - private class DocumentMetadataSerializer extends StdSerializer { + /** + * A {@link PTransform} writing data to Elasticsearch. + * + *

    This {@link PTransform} acts as a convenience wrapper for doing both document to bulk API + * serialization as well as batching those Bulk API entities and writing them to an Elasticsearch + * cluster. This class is effectively a thin proxy for DocToBulk->BulkIO all-in-one for + * convenience and backward compatibility. + */ + public static class Write extends PTransform, PDone> { + public interface FieldValueExtractFn extends SerializableFunction {} - private DocumentMetadataSerializer() { - super(DocumentMetadata.class); - } + public interface BooleanFieldValueExtractFn extends SerializableFunction {} - @Override - public void serialize( - DocumentMetadata value, JsonGenerator gen, SerializerProvider provider) - throws IOException { - gen.writeStartObject(); - if (value.index != null) { - gen.writeStringField("_index", value.index); - } - if (value.type != null) { - gen.writeStringField("_type", value.type); - } - if (value.id != null) { - gen.writeStringField("_id", value.id); - } - if (value.retryOnConflict != null && (backendVersion <= 6)) { - gen.writeNumberField("_retry_on_conflict", value.retryOnConflict); - } - if (value.retryOnConflict != null && backendVersion >= 7) { - gen.writeNumberField("retry_on_conflict", value.retryOnConflict); - } - gen.writeEndObject(); + private DocToBulk docToBulk = new AutoValue_ElasticsearchIO_DocToBulk.Builder().build(); + + private BulkIO bulkIO = + new AutoValue_ElasticsearchIO_BulkIO.Builder() + // advised default starting batch size in ES docs + .setMaxBatchSize(1000L) + // advised default starting batch size in ES docs + .setMaxBatchSizeBytes(5L * 1024L * 1024L) + .setUseStatefulBatches(false) + .setMaxParallelRequestsPerWindow(1) + .build(); + + public DocToBulk getDocToBulk() { + return docToBulk; + } + + public BulkIO getBulkIO() { + return bulkIO; + } + + // For building Doc2Bulk + /** Refer to {@link DocToBulk#withIdFn}. */ + public Write withIdFn(FieldValueExtractFn idFn) { + docToBulk = docToBulk.withIdFn(idFn); + return this; + } + + /** Refer to {@link DocToBulk#withIndexFn}. */ + public Write withIndexFn(FieldValueExtractFn indexFn) { + docToBulk = docToBulk.withIndexFn(indexFn); + return this; + } + + /** Refer to {@link DocToBulk#withRoutingFn}. */ + public Write withRoutingFn(FieldValueExtractFn routingFn) { + docToBulk = docToBulk.withRoutingFn(routingFn); + return this; + } + + /** Refer to {@link DocToBulk#withTypeFn}. */ + public Write withTypeFn(FieldValueExtractFn typeFn) { + docToBulk = docToBulk.withTypeFn(typeFn); + return this; + } + + /** Refer to {@link DocToBulk#withDocVersionFn}. */ + public Write withDocVersionFn(FieldValueExtractFn docVersionFn) { + docToBulk = docToBulk.withDocVersionFn(docVersionFn); + return this; + } + + /** Refer to {@link DocToBulk#withDocVersionType}. */ + public Write withDocVersionType(String docVersionType) { + docToBulk = docToBulk.withDocVersionType(docVersionType); + return this; + } + + /** Refer to {@link DocToBulk#withUsePartialUpdate}. */ + public Write withUsePartialUpdate(boolean usePartialUpdate) { + docToBulk = docToBulk.withUsePartialUpdate(usePartialUpdate); + return this; + } + + /** Refer to {@link DocToBulk#withAppendOnly}. */ + public Write withAppendOnly(boolean appendOnly) { + docToBulk = docToBulk.withAppendOnly(appendOnly); + return this; + } + + /** Refer to {@link DocToBulk#withUpsertScript}. 
*/ + public Write withUpsertScript(String source) { + docToBulk = docToBulk.withUpsertScript(source); + return this; + } + + /** Refer to {@link DocToBulk#withBackendVersion}. */ + public Write withBackendVersion(int backendVersion) { + docToBulk = docToBulk.withBackendVersion(backendVersion); + return this; + } + + /** Refer to {@link DocToBulk#withIsDeleteFn}. */ + public Write withIsDeleteFn(Write.BooleanFieldValueExtractFn isDeleteFn) { + docToBulk = docToBulk.withIsDeleteFn(isDeleteFn); + return this; + } + // End building Doc2Bulk + + /** Refer to {@link BulkIO#withConnectionConfiguration}. */ + public Write withConnectionConfiguration(ConnectionConfiguration connectionConfiguration) { + checkArgument(connectionConfiguration != null, "connectionConfiguration can not be null"); + docToBulk = docToBulk.withConnectionConfiguration(connectionConfiguration); + bulkIO = bulkIO.withConnectionConfiguration(connectionConfiguration); + return this; + } + + /** Refer to {@link BulkIO#withMaxBatchSize}. */ + public Write withMaxBatchSize(long batchSize) { + bulkIO = bulkIO.withMaxBatchSize(batchSize); + return this; + } + + /** Refer to {@link BulkIO#withMaxBatchSizeBytes}. */ + public Write withMaxBatchSizeBytes(long batchSizeBytes) { + bulkIO = bulkIO.withMaxBatchSizeBytes(batchSizeBytes); + return this; + } + + /** Refer to {@link BulkIO#withRetryConfiguration}. */ + public Write withRetryConfiguration(RetryConfiguration retryConfiguration) { + bulkIO = bulkIO.withRetryConfiguration(retryConfiguration); + return this; + } + + /** Refer to {@link BulkIO#withIgnoreVersionConflicts}. */ + public Write withIgnoreVersionConflicts(boolean ignoreVersionConflicts) { + bulkIO = bulkIO.withIgnoreVersionConflicts(ignoreVersionConflicts); + return this; + } + + /** Refer to {@link BulkIO#withUseStatefulBatches}. */ + public Write withUseStatefulBatches(boolean useStatefulBatches) { + bulkIO = bulkIO.withUseStatefulBatches(useStatefulBatches); + return this; + } + + /** Refer to {@link BulkIO#withMaxBufferingDuration}. */ + public Write withMaxBufferingDuration(Duration maxBufferingDuration) { + bulkIO = bulkIO.withMaxBufferingDuration(maxBufferingDuration); + return this; + } + + /** Refer to {@link BulkIO#withMaxParallelRequestsPerWindow}. */ + public Write withMaxParallelRequestsPerWindow(int maxParallelRequestsPerWindow) { + bulkIO = bulkIO.withMaxParallelRequestsPerWindow(maxParallelRequestsPerWindow); + return this; + } + + /** Refer to {@link BulkIO#withAllowableResponseErrors}. */ + public Write withAllowableResponseErrors(@Nullable Set allowableResponseErrors) { + if (allowableResponseErrors == null) { + allowableResponseErrors = new HashSet<>(); + } + + bulkIO = bulkIO.withAllowableResponseErrors(allowableResponseErrors); + return this; + } + + @Override + public PDone expand(PCollection input) { + return input.apply(docToBulk).apply(bulkIO); + } + } + + /** + * A {@link PTransform} writing Bulk API entities created by {@link ElasticsearchIO.DocToBulk} to + * an Elasticsearch cluster. Typically, using {@link ElasticsearchIO.Write} is preferred, whereas + * using {@link ElasticsearchIO.DocToBulk} and BulkIO separately is for advanced use cases such as + * mirroring data to multiple clusters or data lakes without recomputation. + */ + @AutoValue + public abstract static class BulkIO extends PTransform, PDone> { + @VisibleForTesting + static final String RETRY_ATTEMPT_LOG = "Error writing to Elasticsearch. 
Retry attempt[%d]"; + + @VisibleForTesting + static final String RETRY_FAILED_LOG = + "Error writing to ES after %d attempt(s). No more attempts allowed"; + + abstract @Nullable ConnectionConfiguration getConnectionConfiguration(); + + abstract long getMaxBatchSize(); + + abstract long getMaxBatchSizeBytes(); + + abstract @Nullable Duration getMaxBufferingDuration(); + + abstract boolean getUseStatefulBatches(); + + abstract int getMaxParallelRequestsPerWindow(); + + abstract @Nullable RetryConfiguration getRetryConfiguration(); + + abstract @Nullable Set getAllowedResponseErrors(); + + abstract Builder builder(); + + @AutoValue.Builder + abstract static class Builder { + abstract Builder setConnectionConfiguration(ConnectionConfiguration connectionConfiguration); + + abstract Builder setMaxBatchSize(long maxBatchSize); + + abstract Builder setMaxBatchSizeBytes(long maxBatchSizeBytes); + + abstract Builder setRetryConfiguration(RetryConfiguration retryConfiguration); + + abstract Builder setAllowedResponseErrors(Set allowedResponseErrors); + + abstract Builder setMaxBufferingDuration(Duration maxBufferingDuration); + + abstract Builder setUseStatefulBatches(boolean useStatefulBatches); + + abstract Builder setMaxParallelRequestsPerWindow(int maxParallelRequestsPerWindow); + + abstract BulkIO build(); + } + + /** + * Provide the Elasticsearch connection configuration object. + * + * @param connectionConfiguration the Elasticsearch {@link ConnectionConfiguration} object + * @return the {@link BulkIO} with connection configuration set + */ + public BulkIO withConnectionConfiguration(ConnectionConfiguration connectionConfiguration) { + checkArgument(connectionConfiguration != null, "connectionConfiguration can not be null"); + + return builder().setConnectionConfiguration(connectionConfiguration).build(); + } + + /** + * Provide a maximum size in number of documents for the batch see bulk API + * (https://www.elastic.co/guide/en/elasticsearch/reference/2.4/docs-bulk.html). Default is 1000 + * docs (like Elasticsearch bulk size advice). See + * https://www.elastic.co/guide/en/elasticsearch/guide/current/bulk.html Depending on the + * execution engine, size of bundles may vary, this sets the maximum size. Change this if you + * need to have smaller ElasticSearch bulks. + * + * @param batchSize maximum batch size in number of documents + * @return the {@link BulkIO} with connection batch size set + */ + public BulkIO withMaxBatchSize(long batchSize) { + checkArgument(batchSize > 0, "batchSize must be > 0, but was %s", batchSize); + return builder().setMaxBatchSize(batchSize).build(); + } + + /** + * Provide a maximum size in bytes for the batch see bulk API + * (https://www.elastic.co/guide/en/elasticsearch/reference/2.4/docs-bulk.html). Default is 5MB + * (like Elasticsearch bulk size advice). See + * https://www.elastic.co/guide/en/elasticsearch/guide/current/bulk.html Depending on the + * execution engine, size of bundles may vary, this sets the maximum size. Change this if you + * need to have smaller ElasticSearch bulks. + * + * @param batchSizeBytes maximum batch size in bytes + * @return the {@link BulkIO} with connection batch size in bytes set + */ + public BulkIO withMaxBatchSizeBytes(long batchSizeBytes) { + checkArgument(batchSizeBytes > 0, "batchSizeBytes must be > 0, but was %s", batchSizeBytes); + return builder().setMaxBatchSizeBytes(batchSizeBytes).build(); + } + + /** + * Provides configuration to retry a failed batch call to Elasticsearch. 
A batch is considered + * as failed if the underlying {@link RestClient} surfaces 429 HTTP status code as error for one + * or more of the items in the {@link Response}. Users should consider that retrying might + * compound the underlying problem which caused the initial failure. Users should also be aware + * that once retrying is exhausted the error is surfaced to the runner which may then + * opt to retry the current bundle in entirety or abort if the max number of retries of the + * runner is completed. Retrying uses an exponential backoff algorithm, with minimum backoff of + * 5 seconds and then surfacing the error once the maximum number of retries or maximum + * configuration duration is exceeded. + * + *

+     * <p>Example use:
+     *
+     * <pre>{@code
+     * ElasticsearchIO.write()
+     *   .withRetryConfiguration(ElasticsearchIO.RetryConfiguration.create(10, Duration.standardMinutes(3)))
+     *   ...
+     * }</pre>
    + * + * @param retryConfiguration the rules which govern the retry behavior + * @return the {@link BulkIO} with retrying configured + */ + public BulkIO withRetryConfiguration(RetryConfiguration retryConfiguration) { + checkArgument(retryConfiguration != null, "retryConfiguration is required"); + return builder().setRetryConfiguration(retryConfiguration).build(); + } + + /** + * Whether or not to suppress version conflict errors in a Bulk API response. This can be useful + * if your use case involves using external version types. + * + * @param ignoreVersionConflicts true to suppress version conflicts, false to surface version + * conflict errors. + * @return the {@link BulkIO} with version conflict handling configured + */ + public BulkIO withIgnoreVersionConflicts(boolean ignoreVersionConflicts) { + Set allowedResponseErrors = getAllowedResponseErrors(); + if (allowedResponseErrors == null) { + allowedResponseErrors = new HashSet<>(); + } + if (ignoreVersionConflicts) { + allowedResponseErrors.add(VERSION_CONFLICT_ERROR); + } + + return builder().setAllowedResponseErrors(allowedResponseErrors).build(); + } + + /** + * Provide a set of textual error types which can be contained in Bulk API response + * items[].error.type field. Any element in @param allowableResponseErrorTypes will suppress + * errors of the same type in Bulk responses. + * + *

    See also + * https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html#bulk-failures-ex + * + * @param allowableResponseErrorTypes + * @return the {@link BulkIO} with allowable response errors set + */ + public BulkIO withAllowableResponseErrors(@Nullable Set allowableResponseErrorTypes) { + if (allowableResponseErrorTypes == null) { + allowableResponseErrorTypes = new HashSet<>(); + } + + return builder().setAllowedResponseErrors(allowableResponseErrorTypes).build(); + } + + /** + * If using {@link BulkIO#withUseStatefulBatches}, this can be used to set a maximum elapsed + * time before buffered elements are emitted to Elasticsearch as a Bulk API request. If this + * config is not set, Bulk requests will not be issued until {@link BulkIO#getMaxBatchSize} + * number of documents have been buffered. This may result in higher latency in particular if + * your max batch size is set to a large value and your pipeline input is low volume. + * + * @param maxBufferingDuration the maximum duration to wait before sending any buffered + * documents to Elasticsearch, regardless of maxBatchSize. + * @return the {@link BulkIO} with maximum buffering duration set + */ + public BulkIO withMaxBufferingDuration(Duration maxBufferingDuration) { + LOG.warn( + "Use of withMaxBufferingDuration requires withUseStatefulBatches(true). " + + "Setting that automatically."); + return builder() + .setUseStatefulBatches(true) + .setMaxBufferingDuration(maxBufferingDuration) + .build(); + } + + /** + * Whether or not to use Stateful Processing to ensure bulk requests have the desired number of + * entities i.e. as close to the maxBatchSize as possible. By default without this feature + * enabled, Bulk requests will not contain more than maxBatchSize entities, but the lower bound + * of batch size is determined by Beam Runner bundle sizes, which may be as few as 1. + * + * @param useStatefulBatches true enables the use of Stateful Processing to ensure that batches + * are as close to the maxBatchSize as possible. + * @return the {@link BulkIO} with Stateful Processing enabled or disabled + */ + public BulkIO withUseStatefulBatches(boolean useStatefulBatches) { + return builder().setUseStatefulBatches(useStatefulBatches).build(); + } + + /** + * When using {@link BulkIO#withUseStatefulBatches} Stateful Processing, states and therefore + * batches are maintained per-key-per-window. BE AWARE that low values for @param + * maxParallelRequestsPerWindow, in particular if the input data has a finite number of windows, + * can reduce parallelism greatly. If data is globally windowed and @param + * maxParallelRequestsPerWindow is set to 1,there will only ever be 1 request in flight. Having + * only a single request in flight can be beneficial for ensuring an Elasticsearch cluster is + * not overwhelmed by parallel requests,but may not work for all use cases. If this number is + * less than the number of maximum workers in your pipeline, the IO work will result in a + * sub-distribution of the last write step with most of the runners. 
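+     * <p>A rough sketch of combining stateful batching with a bounded number of parallel requests
+     * (the values below are illustrative, not recommendations):
+     *
+     * <pre>{@code
+     * ElasticsearchIO.write()
+     *     .withConnectionConfiguration(connectionConfiguration)
+     *     .withUseStatefulBatches(true)
+     *     .withMaxBufferingDuration(Duration.standardSeconds(30))
+     *     .withMaxParallelRequestsPerWindow(2);
+     * }</pre>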
+     *
+     * @param maxParallelRequestsPerWindow the maximum number of parallel bulk requests for a
+     *     window of data
+     * @return the {@link BulkIO} with maximum parallel bulk requests per window set
+     */
+    public BulkIO withMaxParallelRequestsPerWindow(int maxParallelRequestsPerWindow) {
+      checkArgument(
+          maxParallelRequestsPerWindow > 0, "parameter value must be a positive integer");
+      return builder().setMaxParallelRequestsPerWindow(maxParallelRequestsPerWindow).build();
+    }
+
+    /**
+     * Creates batches of documents using Stateful Processing based on user configurable settings
+     * of withMaxBufferingDuration and withMaxParallelRequestsPerWindow.
+     *
+     *

    Mostly exists for testability of withMaxParallelRequestsPerWindow. + */ + @VisibleForTesting + static class StatefulBatching + extends PTransform, PCollection>>> { + final BulkIO spec; + + private StatefulBatching(BulkIO bulkSpec) { + spec = bulkSpec; + } + + public static StatefulBatching fromSpec(BulkIO spec) { + return new StatefulBatching(spec); + } + + @Override + public PCollection>> expand(PCollection input) { + GroupIntoBatches groupIntoBatches = + GroupIntoBatches.ofSize(spec.getMaxBatchSize()); + + if (spec.getMaxBufferingDuration() != null) { + groupIntoBatches = + groupIntoBatches.withMaxBufferingDuration(spec.getMaxBufferingDuration()); } + + return input + .apply(ParDo.of(new Reshuffle.AssignShardFn<>(spec.getMaxParallelRequestsPerWindow()))) + .apply(groupIntoBatches); } - /** - * Extracts the components that comprise the document address from the document using the - * {@link FieldValueExtractFn} configured. This allows any or all of the index, type and - * document id to be controlled on a per document basis. Sanitization of the index is - * performed, automatically lower-casing the value as required by Elasticsearch. - * - * @param parsedDocument the json from which the index, type and id may be extracted - * @return the document address as JSON or the default - * @throws IOException if the document cannot be parsed as JSON - */ - private String getDocumentMetadata(JsonNode parsedDocument) throws IOException { - DocumentMetadata metadata = - new DocumentMetadata( - spec.getIndexFn() != null - ? lowerCaseOrNull(spec.getIndexFn().apply(parsedDocument)) - : null, - spec.getTypeFn() != null ? spec.getTypeFn().apply(parsedDocument) : null, - spec.getIdFn() != null ? spec.getIdFn().apply(parsedDocument) : null, - spec.getUsePartialUpdate() ? DEFAULT_RETRY_ON_CONFLICT : null); - return OBJECT_MAPPER.writeValueAsString(metadata); + } + + @Override + public PDone expand(PCollection input) { + ConnectionConfiguration connectionConfiguration = getConnectionConfiguration(); + + checkState(connectionConfiguration != null, "withConnectionConfiguration() is required"); + + if (getUseStatefulBatches()) { + input.apply(StatefulBatching.fromSpec(this)).apply(ParDo.of(new BulkIOStatefulFn(this))); + } else { + input.apply(ParDo.of(new BulkIOBundleFn(this))); } + return PDone.in(input.getPipeline()); + } - private static String lowerCaseOrNull(String input) { - return input == null ? null : input.toLowerCase(); + static class BulkIOBundleFn extends BulkIOBaseFn { + @VisibleForTesting + BulkIOBundleFn(BulkIO bulkSpec) { + super(bulkSpec); } @ProcessElement public void processElement(ProcessContext context) throws Exception { - String document = context.element(); // use configuration and auto-generated document IDs - String documentMetadata = "{}"; - boolean isDelete = false; - if (spec.getIndexFn() != null || spec.getTypeFn() != null || spec.getIdFn() != null) { - // parse once and reused for efficiency - JsonNode parsedDocument = OBJECT_MAPPER.readTree(document); - documentMetadata = getDocumentMetadata(parsedDocument); - if (spec.getIsDeleteFn() != null) { - isDelete = spec.getIsDeleteFn().apply(parsedDocument); - } - } + String bulkApiEntity = context.element(); + addAndMaybeFlush(bulkApiEntity); + } + } - if (isDelete) { - // delete request used for deleting a document. 
- batch.add(String.format("{ \"delete\" : %s }%n", documentMetadata)); - } else { - // index is an insert/upsert and update is a partial update (or insert if not existing) - if (spec.getUsePartialUpdate()) { - batch.add( - String.format( - "{ \"update\" : %s }%n{ \"doc\" : %s, \"doc_as_upsert\" : true }%n", - documentMetadata, document)); - } else { - batch.add(String.format("{ \"index\" : %s }%n%s%n", documentMetadata, document)); - } + /* + Intended for use in conjunction with {@link GroupIntoBatches} + */ + static class BulkIOStatefulFn extends BulkIOBaseFn>> { + @VisibleForTesting + BulkIOStatefulFn(BulkIO bulkSpec) { + super(bulkSpec); + } + + @ProcessElement + public void processElement(ProcessContext context) throws Exception { + Iterable bulkApiEntities = context.element().getValue(); + for (String bulkApiEntity : bulkApiEntities) { + addAndMaybeFlush(bulkApiEntity); } + } + } - currentBatchSizeBytes += document.getBytes(StandardCharsets.UTF_8).length; - if (batch.size() >= spec.getMaxBatchSize() - || currentBatchSizeBytes >= spec.getMaxBatchSizeBytes()) { - flushBatch(); + /** {@link DoFn} to for the {@link BulkIO} transform. */ + @VisibleForTesting + private abstract static class BulkIOBaseFn extends DoFn { + private static final Duration RETRY_INITIAL_BACKOFF = Duration.standardSeconds(5); + + private transient FluentBackoff retryBackoff; + + private BulkIO spec; + private transient RestClient restClient; + private ArrayList batch; + long currentBatchSizeBytes; + + protected BulkIOBaseFn(BulkIO bulkSpec) { + this.spec = bulkSpec; + } + + @Setup + public void setup() throws IOException { + ConnectionConfiguration connectionConfiguration = spec.getConnectionConfiguration(); + + restClient = connectionConfiguration.createClient(); + + retryBackoff = + FluentBackoff.DEFAULT.withMaxRetries(0).withInitialBackoff(RETRY_INITIAL_BACKOFF); + + if (spec.getRetryConfiguration() != null) { + retryBackoff = + FluentBackoff.DEFAULT + .withInitialBackoff(RETRY_INITIAL_BACKOFF) + .withMaxRetries(spec.getRetryConfiguration().getMaxAttempts() - 1) + .withMaxCumulativeBackoff(spec.getRetryConfiguration().getMaxDuration()); } } + @StartBundle + public void startBundle(StartBundleContext context) { + batch = new ArrayList<>(); + currentBatchSizeBytes = 0; + } + @FinishBundle public void finishBundle(FinishBundleContext context) throws IOException, InterruptedException { flushBatch(); } + protected void addAndMaybeFlush(String bulkApiEntity) + throws IOException, InterruptedException { + batch.add(bulkApiEntity); + currentBatchSizeBytes += bulkApiEntity.getBytes(StandardCharsets.UTF_8).length; + + if (batch.size() >= spec.getMaxBatchSize() + || currentBatchSizeBytes >= spec.getMaxBatchSizeBytes()) { + flushBatch(); + } + } + + private boolean isRetryableClientException(Throwable t) { + // RestClient#performRequest only throws wrapped IOException so we must inspect the + // exception cause to determine if the exception is likely transient i.e. retryable or + // not. 
+ return t.getCause() instanceof ConnectTimeoutException + || t.getCause() instanceof SocketTimeoutException + || t.getCause() instanceof ConnectionClosedException + || t.getCause() instanceof ConnectException; + } + private void flushBatch() throws IOException, InterruptedException { if (batch.isEmpty()) { return; } + + LOG.info( + "ElasticsearchIO batch size: {}, batch size bytes: {}", + batch.size(), + currentBatchSizeBytes); + StringBuilder bulkRequest = new StringBuilder(); for (String json : batch) { bulkRequest.append(json); } + batch.clear(); - currentBatchSizeBytes = 0; - Response response; - HttpEntity responseEntity; - // Elasticsearch will default to the index/type provided here if none are set in the - // document meta (i.e. using ElasticsearchIO$Write#withIndexFn and - // ElasticsearchIO$Write#withTypeFn options) - String endPoint = - String.format( - "/%s/%s/_bulk", - spec.getConnectionConfiguration().getIndex(), - spec.getConnectionConfiguration().getType()); + currentBatchSizeBytes = 0L; + + Response response = null; + HttpEntity responseEntity = null; + + // Elasticsearch will default to the index/type provided the {@link + // ConnectionConfiguration} if none are set in the document meta (i.e. + // using ElasticsearchIO$DocToBulk#withIndexFn and + // ElasticsearchIO$DocToBulk#withTypeFn options) + String endPoint = spec.getConnectionConfiguration().getBulkEndPoint(); + HttpEntity requestBody = new NStringEntity(bulkRequest.toString(), ContentType.APPLICATION_JSON); - Request request = new Request("POST", endPoint); - request.addParameters(Collections.emptyMap()); - request.setEntity(requestBody); - response = restClient.performRequest(request); - responseEntity = new BufferedHttpEntity(response.getEntity()); + try { + Request request = new Request("POST", endPoint); + request.addParameters(Collections.emptyMap()); + request.setEntity(requestBody); + response = restClient.performRequest(request); + responseEntity = new BufferedHttpEntity(response.getEntity()); + } catch (java.io.IOException ex) { + if (spec.getRetryConfiguration() == null || !isRetryableClientException(ex)) { + throw ex; + } + LOG.error("Caught ES timeout, retrying", ex); + } + if (spec.getRetryConfiguration() != null - && spec.getRetryConfiguration().getRetryPredicate().test(responseEntity)) { + && (response == null + || responseEntity == null + || spec.getRetryConfiguration().getRetryPredicate().test(responseEntity))) { + if (responseEntity != null + && spec.getRetryConfiguration().getRetryPredicate().test(responseEntity)) { + LOG.warn("ES Cluster is responding with HTP 429 - TOO_MANY_REQUESTS."); + } responseEntity = handleRetry("POST", endPoint, Collections.emptyMap(), requestBody); } - checkForErrors(responseEntity, backendVersion, spec.getUsePartialUpdate()); + checkForErrors(responseEntity, spec.getAllowedResponseErrors()); } /** retry request based on retry configuration policy. 
*/ @@ -1446,21 +2222,32 @@ private HttpEntity handleRetry( String method, String endpoint, Map params, HttpEntity requestBody) throws IOException, InterruptedException { Response response; - HttpEntity responseEntity; + HttpEntity responseEntity = null; Sleeper sleeper = Sleeper.DEFAULT; BackOff backoff = retryBackoff.backoff(); int attempt = 0; // while retry policy exists while (BackOffUtils.next(sleeper, backoff)) { LOG.warn(String.format(RETRY_ATTEMPT_LOG, ++attempt)); - Request request = new Request(method, endpoint); - request.addParameters(params); - request.setEntity(requestBody); - response = restClient.performRequest(request); - responseEntity = new BufferedHttpEntity(response.getEntity()); + try { + Request request = new Request(method, endpoint); + request.addParameters(params); + request.setEntity(requestBody); + response = restClient.performRequest(request); + responseEntity = new BufferedHttpEntity(response.getEntity()); + } catch (java.io.IOException ex) { + if (isRetryableClientException(ex)) { + LOG.error("Caught ES timeout, retrying", ex); + continue; + } + } // if response has no 429 errors - if (!spec.getRetryConfiguration().getRetryPredicate().test(responseEntity)) { + if (!Objects.requireNonNull(spec.getRetryConfiguration()) + .getRetryPredicate() + .test(responseEntity)) { return responseEntity; + } else { + LOG.warn("ES Cluster is responding with HTP 429 - TOO_MANY_REQUESTS."); } } throw new IOException(String.format(RETRY_FAILED_LOG, attempt)); @@ -1483,10 +2270,7 @@ static int getBackendVersion(ConnectionConfiguration connectionConfiguration) { int backendVersion = Integer.parseInt(jsonNode.path("version").path("number").asText().substring(0, 1)); checkArgument( - (backendVersion == 2 - || backendVersion == 5 - || backendVersion == 6 - || backendVersion == 7), + (VALID_CLUSTER_VERSIONS.contains(backendVersion)), "The Elasticsearch version to connect to is %s.x. 
" + "This version of the ElasticsearchIO is only compatible with " + "Elasticsearch v7.x, v6.x, v5.x and v2.x", diff --git a/sdks/java/io/expansion-service/build.gradle b/sdks/java/io/expansion-service/build.gradle index 75ff5ad05544..c9aab1c1c868 100644 --- a/sdks/java/io/expansion-service/build.gradle +++ b/sdks/java/io/expansion-service/build.gradle @@ -32,10 +32,20 @@ ext.summary = "Expansion service serving several Java IOs" dependencies { compile project(":sdks:java:expansion-service") + permitUnusedDeclared project(":sdks:java:expansion-service") // BEAM-11761 compile project(":sdks:java:io:kafka") + permitUnusedDeclared project(":sdks:java:io:kafka") // BEAM-11761 compile project(":sdks:java:io:jdbc") + permitUnusedDeclared project(":sdks:java:io:jdbc") // BEAM-11761 // Include postgres so it can be used with external JDBC compile library.java.postgres + permitUnusedDeclared library.java.postgres // BEAM-11761 runtime library.java.kafka_clients runtime library.java.slf4j_jdk14 } + +task runExpansionService (type: JavaExec) { + main = "org.apache.beam.sdk.expansion.service.ExpansionService" + classpath = sourceSets.test.runtimeClasspath + args = [project.findProperty("constructionService.port") ?: "8097"] +} diff --git a/sdks/java/io/file-based-io-tests/build.gradle b/sdks/java/io/file-based-io-tests/build.gradle index 6c53e90ff17f..b47a51d7b633 100644 --- a/sdks/java/io/file-based-io-tests/build.gradle +++ b/sdks/java/io/file-based-io-tests/build.gradle @@ -31,8 +31,6 @@ dependencies { testCompile project(path: ":sdks:java:io:parquet", configuration: "testRuntime") testCompile project(path: ":sdks:java:testing:test-utils", configuration: "testRuntime") testCompile library.java.jaxb_api - testCompile library.java.jaxb_impl testCompile library.java.junit - testCompile library.java.hamcrest_core testCompile library.java.hadoop_client } diff --git a/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/avro/AvroIOIT.java b/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/avro/AvroIOIT.java index 8243ed4c97c6..0d48480e1745 100644 --- a/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/avro/AvroIOIT.java +++ b/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/avro/AvroIOIT.java @@ -78,9 +78,6 @@ * performance testing framework. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class AvroIOIT { private static final Schema AVRO_SCHEMA = diff --git a/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/parquet/ParquetIOIT.java b/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/parquet/ParquetIOIT.java index 95d88eaaef91..7a36ac80c9f2 100644 --- a/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/parquet/ParquetIOIT.java +++ b/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/parquet/ParquetIOIT.java @@ -78,9 +78,6 @@ * performance testing framework. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ParquetIOIT { private static final Schema SCHEMA = diff --git a/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java b/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java index fe01ea8b0d76..c28bd939188f 100644 --- a/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java +++ b/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java @@ -77,9 +77,6 @@ * performance testing framework. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TextIOIT { private static final Logger LOG = LoggerFactory.getLogger(TextIOIT.class); diff --git a/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/tfrecord/TFRecordIOIT.java b/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/tfrecord/TFRecordIOIT.java index a7bfe84331a1..395000bdd5a6 100644 --- a/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/tfrecord/TFRecordIOIT.java +++ b/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/tfrecord/TFRecordIOIT.java @@ -78,9 +78,6 @@ * performance testing framework. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TFRecordIOIT { private static final String TFRECORD_NAMESPACE = TFRecordIOIT.class.getName(); diff --git a/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/xml/XmlIOIT.java b/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/xml/XmlIOIT.java index a52cd178aa0f..2b44274baff4 100644 --- a/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/xml/XmlIOIT.java +++ b/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/xml/XmlIOIT.java @@ -81,9 +81,6 @@ * performance testing framework. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class XmlIOIT { /** XmlIOIT options. 
*/ diff --git a/sdks/java/io/google-cloud-platform/build.gradle b/sdks/java/io/google-cloud-platform/build.gradle index 7ac6878eae80..731cf3353548 100644 --- a/sdks/java/io/google-cloud-platform/build.gradle +++ b/sdks/java/io/google-cloud-platform/build.gradle @@ -20,9 +20,11 @@ import groovy.json.JsonOutput plugins { id 'org.apache.beam.module' } applyJavaNature( - automaticModuleName: 'org.apache.beam.sdk.io.gcp', enableSpotbugs: false, + classesTriggerCheckerBugs: [ + 'PubSubPayloadTranslation': 'https://github.com/typetools/checker-framework/issues/3791', + ], ) description = "Apache Beam :: SDKs :: Java :: IO :: Google Cloud Platform" @@ -30,18 +32,31 @@ ext.summary = "IO library to read and write Google Cloud Platform systems from B dependencies { compile enforcedPlatform(library.java.google_cloud_platform_libraries_bom) + permitUnusedDeclared enforcedPlatform(library.java.google_cloud_platform_libraries_bom) + compile project(path: ":model:pipeline", configuration: "shadow") + compile project(":runners:core-java") compile project(path: ":sdks:java:core", configuration: "shadow") compile project(":sdks:java:expansion-service") + permitUnusedDeclared project(":sdks:java:expansion-service") // BEAM-11761 compile project(":sdks:java:extensions:google-cloud-platform-core") compile project(":sdks:java:extensions:protobuf") + compile project(":runners:core-construction-java") + compile project(":sdks:java:extensions:arrow") compile library.java.avro compile library.java.bigdataoss_util + compile library.java.error_prone_annotations + compile library.java.flogger_system_backend // Avoids conflicts with bigdataoss_util (BEAM-11010) + permitUnusedDeclared library.java.flogger_system_backend // BEAM-11010 compile library.java.gax compile library.java.gax_grpc + compile library.java.gax_httpjson + permitUnusedDeclared library.java.gax_httpjson // BEAM-8755 compile library.java.google_api_client + compile library.java.google_api_common compile library.java.google_api_services_bigquery compile library.java.google_api_services_healthcare compile library.java.google_api_services_pubsub + compile library.java.google_api_services_storage compile library.java.google_auth_library_credentials compile library.java.google_auth_library_oauth2_http compile library.java.google_cloud_bigquery_storage @@ -53,50 +68,81 @@ dependencies { compile(library.java.google_cloud_core_grpc) { exclude group: 'io.grpc', module: 'grpc-core' // Use Beam's version } + permitUnusedDeclared library.java.google_cloud_core_grpc // BEAM-11761 compile library.java.google_cloud_datastore_v1_proto_client + compile library.java.google_cloud_firestore compile library.java.google_cloud_pubsublite // GCP PubSub client is used in TestPubSub compile library.java.google_cloud_pubsub compile library.java.google_cloud_spanner + compile library.java.google_code_gson compile library.java.google_http_client compile library.java.google_http_client_jackson2 compile library.java.grpc_alts + permitUnusedDeclared library.java.grpc_alts // BEAM-11761 + compile library.java.grpc_api compile library.java.grpc_auth compile library.java.grpc_core + permitUnusedDeclared library.java.grpc_core // BEAM-11761 compile library.java.grpc_context + permitUnusedDeclared library.java.grpc_context // BEAM-11761 compile library.java.grpc_grpclb + permitUnusedDeclared library.java.grpc_grpclb // BEAM-11761 compile library.java.grpc_netty compile library.java.grpc_netty_shaded + permitUnusedDeclared library.java.grpc_netty_shaded // BEAM-11761 + compile 
library.java.grpc_protobuf compile library.java.grpc_stub + permitUnusedDeclared library.java.grpc_stub // BEAM-11761 compile library.java.grpc_google_cloud_pubsub_v1 compile library.java.grpc_google_cloud_pubsublite_v1 + permitUnusedDeclared library.java.grpc_google_cloud_pubsublite_v1 // BEAM-11761 compile library.java.guava + compile library.java.hamcrest + compile library.java.http_client compile library.java.hamcrest_core + permitUnusedDeclared library.java.hamcrest_core // BEAM-11761 + compile library.java.http_core + compile library.java.jackson_core compile library.java.jackson_databind compile library.java.joda_time compile library.java.junit compile library.java.netty_handler compile library.java.netty_tcnative_boringssl_static - compile library.java.proto_google_cloud_bigquery_storage_v1beta1 + permitUnusedDeclared library.java.netty_tcnative_boringssl_static // BEAM-11761 + compile library.java.proto_google_cloud_bigquery_storage_v1 + compile library.java.proto_google_cloud_bigquery_storage_v1beta2 + compile library.java.proto_google_cloud_bigtable_admin_v2 compile library.java.proto_google_cloud_bigtable_v2 compile library.java.proto_google_cloud_datastore_v1 + compile library.java.proto_google_cloud_firestore_v1 compile library.java.proto_google_cloud_pubsub_v1 compile library.java.proto_google_cloud_pubsublite_v1 compile library.java.proto_google_cloud_spanner_admin_database_v1 + permitUnusedDeclared library.java.proto_google_cloud_spanner_admin_database_v1 // BEAM-11761 + compile library.java.proto_google_cloud_spanner_v1 compile library.java.proto_google_common_protos compile library.java.protobuf_java + compile library.java.protobuf_java_util compile library.java.slf4j_api + compile library.java.vendored_grpc_1_36_0 + compile library.java.vendored_guava_26_0_jre + compile library.java.arrow_memory_core + compile library.java.arrow_vector + + compile "org.threeten:threetenbp:1.4.4" + + testCompile library.java.arrow_memory_netty testCompile project(path: ":sdks:java:core", configuration: "shadowTest") testCompile project(path: ":sdks:java:extensions:google-cloud-platform-core", configuration: "testRuntime") + testCompile project(path: ":sdks:java:extensions:protobuf", configuration: "testRuntime") testCompile project(path: ":runners:direct-java", configuration: "shadow") testCompile project(path: ":sdks:java:io:common", configuration: "testRuntime") testCompile project(path: ":sdks:java:testing:test-utils", configuration: "testRuntime") - // For testing Cross-language transforms - testCompile project(":runners:core-construction-java") - testCompile library.java.hamcrest_library testCompile library.java.mockito_core testCompile library.java.powermock testCompile library.java.powermock_mockito + testCompile library.java.joda_time testRuntimeOnly library.java.slf4j_jdk14 } diff --git a/sdks/java/io/google-cloud-platform/expansion-service/build.gradle b/sdks/java/io/google-cloud-platform/expansion-service/build.gradle index 2d1799762761..dea464bb3cdf 100644 --- a/sdks/java/io/google-cloud-platform/expansion-service/build.gradle +++ b/sdks/java/io/google-cloud-platform/expansion-service/build.gradle @@ -32,6 +32,8 @@ ext.summary = "Expansion service serving GCP Java IOs" dependencies { compile project(":sdks:java:expansion-service") + permitUnusedDeclared project(":sdks:java:expansion-service") // BEAM-11761 compile project(":sdks:java:io:google-cloud-platform") + permitUnusedDeclared project(":sdks:java:io:google-cloud-platform") // BEAM-11761 runtime library.java.slf4j_jdk14 
} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java index f8041316bc7e..16b96bf870c7 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java @@ -30,10 +30,8 @@ import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.IterableCoder; import org.apache.beam.sdk.coders.KvCoder; -import org.apache.beam.sdk.coders.ListCoder; import org.apache.beam.sdk.coders.NullableCoder; import org.apache.beam.sdk.coders.ShardedKeyCoder; -import org.apache.beam.sdk.coders.StringUtf8Coder; import org.apache.beam.sdk.coders.VoidCoder; import org.apache.beam.sdk.extensions.gcp.util.gcsfs.GcsPath; import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition; @@ -47,9 +45,12 @@ import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.Flatten; import org.apache.beam.sdk.transforms.GroupByKey; +import org.apache.beam.sdk.transforms.GroupIntoBatches; +import org.apache.beam.sdk.transforms.MapElements; import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.transforms.ParDo; import org.apache.beam.sdk.transforms.Reshuffle; +import org.apache.beam.sdk.transforms.SimpleFunction; import org.apache.beam.sdk.transforms.Values; import org.apache.beam.sdk.transforms.View; import org.apache.beam.sdk.transforms.WithKeys; @@ -113,6 +114,10 @@ class BatchLoads // written. static final int FILE_TRIGGERING_RECORD_COUNT = 500000; + // If using auto-sharding for unbounded data, we batch the records before triggering file write + // to avoid generating too many small files. + static final Duration FILE_TRIGGERING_BATCHING_DURATION = Duration.standardSeconds(1); + // The maximum number of retries to poll the status of a job. // It sets to {@code Integer.MAX_VALUE} to block until the BigQuery job finishes. static final int LOAD_JOB_POLL_MAX_RETRIES = Integer.MAX_VALUE; @@ -140,8 +145,8 @@ class BatchLoads private ValueProvider loadJobProjectId; private final Coder elementCoder; private final RowWriterFactory rowWriterFactory; - private String kmsKey; - private boolean clusteringEnabled; + private final String kmsKey; + private final boolean clusteringEnabled; // The maximum number of times to retry failed load or copy jobs. private int maxRetryJobs = DEFAULT_MAX_RETRY_JOBS; @@ -265,31 +270,46 @@ public void validate(PipelineOptions options) { // Expand the pipeline when the user has requested periodically-triggered file writes. private WriteResult expandTriggered(PCollection> input) { - checkArgument(numFileShards > 0); Pipeline p = input.getPipeline(); final PCollectionView loadJobIdPrefixView = createJobIdPrefixView(p, JobType.LOAD); + final PCollectionView tempLoadJobIdPrefixView = + createJobIdPrefixView(p, JobType.TEMP_TABLE_LOAD); final PCollectionView copyJobIdPrefixView = createJobIdPrefixView(p, JobType.COPY); final PCollectionView tempFilePrefixView = createTempFilePrefixView(p, loadJobIdPrefixView); - // The user-supplied triggeringDuration is often chosen to control how many BigQuery load - // jobs are generated, to prevent going over BigQuery's daily quota for load jobs. If this - // is set to a large value, currently we have to buffer all the data until the trigger fires. 
- // Instead we ensure that the files are written if a threshold number of records are ready. - // We use only the user-supplied trigger on the actual BigQuery load. This allows us to - // offload the data to the filesystem. - PCollection> inputInGlobalWindow = - input.apply( - "rewindowIntoGlobal", - Window.>into(new GlobalWindows()) - .triggering( - Repeatedly.forever( - AfterFirst.of( - AfterProcessingTime.pastFirstElementInPane() - .plusDelayOf(triggeringFrequency), - AfterPane.elementCountAtLeast(FILE_TRIGGERING_RECORD_COUNT)))) - .discardingFiredPanes()); - PCollection> results = - writeShardedFiles(inputInGlobalWindow, tempFilePrefixView); + PCollection> results; + if (numFileShards > 0) { + // The user-supplied triggeringFrequency is often chosen to control how many BigQuery load + // jobs are generated, to prevent going over BigQuery's daily quota for load jobs. If this + // is set to a large value, currently we have to buffer all the data until the trigger fires. + // Instead we ensure that the files are written if a threshold number of records are ready. + // We use only the user-supplied trigger on the actual BigQuery load. This allows us to + // offload the data to the filesystem. + PCollection> inputInGlobalWindow = + input.apply( + "rewindowIntoGlobal", + Window.>into(new GlobalWindows()) + .triggering( + Repeatedly.forever( + AfterFirst.of( + AfterProcessingTime.pastFirstElementInPane() + .plusDelayOf(triggeringFrequency), + AfterPane.elementCountAtLeast(FILE_TRIGGERING_RECORD_COUNT)))) + .discardingFiredPanes()); + results = writeStaticallyShardedFiles(inputInGlobalWindow, tempFilePrefixView); + } else { + // In the case of dynamic sharding, however, we use a default trigger since the transform + // performs sharding also batches elements to avoid generating too many tiny files. User + // trigger is applied right after writes to limit the number of load jobs. + PCollection> inputInGlobalWindow = + input.apply( + "rewindowIntoGlobal", + Window.>into(new GlobalWindows()) + .triggering(DefaultTrigger.of()) + .discardingFiredPanes()); + results = writeDynamicallyShardedFilesTriggered(inputInGlobalWindow, tempFilePrefixView); + } + // Apply the user's trigger before we start generating BigQuery load jobs. results = results.apply( @@ -301,20 +321,20 @@ private WriteResult expandTriggered(PCollection> inpu .plusDelayOf(triggeringFrequency))) .discardingFiredPanes()); - TupleTag, List>> multiPartitionsTag = + TupleTag, WritePartition.Result>> multiPartitionsTag = new TupleTag<>("multiPartitionsTag"); - TupleTag, List>> singlePartitionTag = + TupleTag, WritePartition.Result>> singlePartitionTag = new TupleTag<>("singlePartitionTag"); // If we have non-default triggered output, we can't use the side-input technique used in - // expandUntriggered . Instead make the result list a main input. Apply a GroupByKey first for + // expandUntriggered. Instead make the result list a main input. Apply a GroupByKey first for // determinism. 
PCollectionTuple partitions = results - .apply("AttachSingletonKey", WithKeys.of((Void) null)) + .apply("AttachDestinationKey", WithKeys.of(result -> result.destination)) .setCoder( - KvCoder.of(VoidCoder.of(), WriteBundlesToFiles.ResultCoder.of(destinationCoder))) - .apply("GroupOntoSingleton", GroupByKey.create()) + KvCoder.of(destinationCoder, WriteBundlesToFiles.ResultCoder.of(destinationCoder))) + .apply("GroupFilesByDestination", GroupByKey.create()) .apply("ExtractResultValues", Values.create()) .apply( "WritePartitionTriggered", @@ -330,18 +350,19 @@ private WriteResult expandTriggered(PCollection> inpu rowWriterFactory)) .withSideInputs(tempFilePrefixView) .withOutputTags(multiPartitionsTag, TupleTagList.of(singlePartitionTag))); - PCollection> tempTables = - writeTempTables(partitions.get(multiPartitionsTag), loadJobIdPrefixView); + PCollection> tempTables = + writeTempTables(partitions.get(multiPartitionsTag), tempLoadJobIdPrefixView); tempTables // Now that the load job has happened, we want the rename to happen immediately. .apply( - Window.>into(new GlobalWindows()) + "Window Into Global Windows", + Window.>into(new GlobalWindows()) .triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(1)))) - .apply(WithKeys.of((Void) null)) + .apply("Add Void Key", WithKeys.of((Void) null)) .setCoder(KvCoder.of(VoidCoder.of(), tempTables.getCoder())) - .apply(GroupByKey.create()) - .apply(Values.create()) + .apply("GroupByKey", GroupByKey.create()) + .apply("Extract Values", Values.create()) .apply( "WriteRenameTriggered", ParDo.of( @@ -361,6 +382,9 @@ private WriteResult expandTriggered(PCollection> inpu public WriteResult expandUntriggered(PCollection> input) { Pipeline p = input.getPipeline(); final PCollectionView loadJobIdPrefixView = createJobIdPrefixView(p, JobType.LOAD); + final PCollectionView tempLoadJobIdPrefixView = + createJobIdPrefixView(p, JobType.TEMP_TABLE_LOAD); + final PCollectionView copyJobIdPrefixView = createJobIdPrefixView(p, JobType.COPY); final PCollectionView tempFilePrefixView = createTempFilePrefixView(p, loadJobIdPrefixView); PCollection> inputInGlobalWindow = @@ -371,13 +395,13 @@ public WriteResult expandUntriggered(PCollection> inp .discardingFiredPanes()); PCollection> results = (numFileShards == 0) - ? writeDynamicallyShardedFiles(inputInGlobalWindow, tempFilePrefixView) - : writeShardedFiles(inputInGlobalWindow, tempFilePrefixView); + ? 
writeDynamicallyShardedFilesUntriggered(inputInGlobalWindow, tempFilePrefixView) + : writeStaticallyShardedFiles(inputInGlobalWindow, tempFilePrefixView); - TupleTag, List>> multiPartitionsTag = - new TupleTag, List>>("multiPartitionsTag") {}; - TupleTag, List>> singlePartitionTag = - new TupleTag, List>>("singlePartitionTag") {}; + TupleTag, WritePartition.Result>> multiPartitionsTag = + new TupleTag, WritePartition.Result>>("multiPartitionsTag") {}; + TupleTag, WritePartition.Result>> singlePartitionTag = + new TupleTag, WritePartition.Result>>("singlePartitionTag") {}; // This transform will look at the set of files written for each table, and if any table has // too many files or bytes, will partition that table's files into multiple partitions for @@ -400,8 +424,8 @@ public WriteResult expandUntriggered(PCollection> inp rowWriterFactory)) .withSideInputs(tempFilePrefixView) .withOutputTags(multiPartitionsTag, TupleTagList.of(singlePartitionTag))); - PCollection> tempTables = - writeTempTables(partitions.get(multiPartitionsTag), loadJobIdPrefixView); + PCollection> tempTables = + writeTempTables(partitions.get(multiPartitionsTag), tempLoadJobIdPrefixView); tempTables .apply("ReifyRenameInput", new ReifyAsIterable<>()) @@ -410,7 +434,7 @@ public WriteResult expandUntriggered(PCollection> inp ParDo.of( new WriteRename( bigQueryServices, - loadJobIdPrefixView, + copyJobIdPrefixView, writeDisposition, createDisposition, maxRetryJobs, @@ -444,7 +468,7 @@ public void process(ProcessContext c) { // Generate the temporary-file prefix. private PCollectionView createTempFilePrefixView( Pipeline p, final PCollectionView jobIdView) { - return p.apply(Create.of("")) + return p.apply("Create dummy value", Create.of("")) .apply( "GetTempFilePrefix", ParDo.of( @@ -470,9 +494,10 @@ public void getTempFilePrefix(ProcessContext c) { .apply("TempFilePrefixView", View.asSingleton()); } - // Writes input data to dynamically-sharded, per-bundle files. Returns a PCollection of filename, - // file byte size, and table destination. - PCollection> writeDynamicallyShardedFiles( + // Writes input data to dynamically-sharded per-bundle files without triggering. Input records are + // spilt to new files if memory is constrained. Returns a PCollection of filename, file byte size, + // and table destination. + PCollection> writeDynamicallyShardedFilesUntriggered( PCollection> input, PCollectionView tempFilePrefix) { TupleTag> writtenFilesTag = new TupleTag>("writtenFiles") {}; @@ -513,9 +538,9 @@ PCollection> writeDynamicallyShardedFil .setCoder(WriteBundlesToFiles.ResultCoder.of(destinationCoder)); } - // Writes input data to statically-sharded files. Returns a PCollection of filename, - // file byte size, and table destination. - PCollection> writeShardedFiles( + // Writes input data to statically-sharded files. Returns a PCollection of filename, file byte + // size, and table destination. + PCollection> writeStaticallyShardedFiles( PCollection> input, PCollectionView tempFilePrefix) { checkState(numFileShards > 0); PCollection, ElementT>> shardedRecords = @@ -547,31 +572,90 @@ public void processElement( return writeShardedRecords(shardedRecords, tempFilePrefix); } + // Writes input data to dynamically-sharded files with triggering. The input data is sharded by + // table destinations and each destination may be sub-sharded dynamically. Returns a PCollection + // of filename, file byte size, and table destination. 
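+  // For orientation, the dynamic sharding below boils down to the following simplified sketch
+  // (coders and shard-id stripping omitted):
+  //
+  //   input.apply(
+  //       GroupIntoBatches.<DestinationT, ElementT>ofSize(FILE_TRIGGERING_RECORD_COUNT)
+  //           .withMaxBufferingDuration(maxBufferingDuration)
+  //           .withShardedKey())
+  //
+  // followed by writing each grouped Iterable<ElementT> to a single file per batch.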
+ PCollection> writeDynamicallyShardedFilesTriggered( + PCollection> input, PCollectionView tempFilePrefix) { + BigQueryOptions options = input.getPipeline().getOptions().as(BigQueryOptions.class); + Duration maxBufferingDuration = + options.getMaxBufferingDurationMilliSec() > 0 + ? Duration.millis(options.getMaxBufferingDurationMilliSec()) + : FILE_TRIGGERING_BATCHING_DURATION; + // In contrast to fixed sharding with user trigger, here we use a global window with default + // trigger and rely on GroupIntoBatches transform to group, batch and at the same time + // parallelize properly. We also ensure that the files are written if a threshold number of + // records are ready. Dynamic sharding is achieved via the withShardedKey() option provided by + // GroupIntoBatches. + return input + .apply( + GroupIntoBatches.ofSize(FILE_TRIGGERING_RECORD_COUNT) + .withMaxBufferingDuration(maxBufferingDuration) + .withShardedKey()) + .setCoder( + KvCoder.of( + org.apache.beam.sdk.util.ShardedKey.Coder.of(destinationCoder), + IterableCoder.of(elementCoder))) + .apply( + "StripShardId", + MapElements.via( + new SimpleFunction< + KV, Iterable>, + KV>>() { + @Override + public KV> apply( + KV, Iterable> + input) { + return KV.of(input.getKey().getKey(), input.getValue()); + } + })) + .setCoder(KvCoder.of(destinationCoder, IterableCoder.of(elementCoder))) + .apply( + "WriteGroupedRecords", + ParDo.of( + new WriteGroupedRecordsToFiles( + tempFilePrefix, maxFileSize, rowWriterFactory)) + .withSideInputs(tempFilePrefix)) + .setCoder(WriteBundlesToFiles.ResultCoder.of(destinationCoder)); + } + private PCollection> writeShardedRecords( PCollection, ElementT>> shardedRecords, PCollectionView tempFilePrefix) { return shardedRecords .apply("GroupByDestination", GroupByKey.create()) + .apply( + "StripShardId", + MapElements.via( + new SimpleFunction< + KV, Iterable>, + KV>>() { + @Override + public KV> apply( + KV, Iterable> input) { + return KV.of(input.getKey().getKey(), input.getValue()); + } + })) + .setCoder(KvCoder.of(destinationCoder, IterableCoder.of(elementCoder))) .apply( "WriteGroupedRecords", ParDo.of( - new WriteGroupedRecordsToFiles( - tempFilePrefix, maxFileSize, rowWriterFactory)) + new WriteGroupedRecordsToFiles<>(tempFilePrefix, maxFileSize, rowWriterFactory)) .withSideInputs(tempFilePrefix)) .setCoder(WriteBundlesToFiles.ResultCoder.of(destinationCoder)); } // Take in a list of files and write them to temporary tables. - private PCollection> writeTempTables( - PCollection, List>> input, + private PCollection> writeTempTables( + PCollection, WritePartition.Result>> input, PCollectionView jobIdTokenView) { List> sideInputs = Lists.newArrayList(jobIdTokenView); sideInputs.addAll(dynamicDestinations.getSideInputs()); - Coder, List>> partitionsCoder = + Coder, WritePartition.Result>> partitionsCoder = KvCoder.of( ShardedKeyCoder.of(NullableCoder.of(destinationCoder)), - ListCoder.of(StringUtf8Coder.of())); + WritePartition.ResultCoder.INSTANCE); // If the final destination table exists already (and we're appending to it), then the temp // tables must exactly match schema, partitioning, etc. 
Wrap the DynamicDestinations object @@ -613,20 +697,24 @@ private PCollection> writeTempTables( rowWriterFactory.getSourceFormat(), useAvroLogicalTypes, schemaUpdateOptions)) - .setCoder(KvCoder.of(tableDestinationCoder, StringUtf8Coder.of())); + .setCoder(KvCoder.of(tableDestinationCoder, WriteTables.ResultCoder.INSTANCE)); } // In the case where the files fit into a single load job, there's no need to write temporary // tables and rename. We can load these files directly into the target BigQuery table. void writeSinglePartition( - PCollection, List>> input, + PCollection, WritePartition.Result>> input, PCollectionView loadJobIdPrefixView) { List> sideInputs = Lists.newArrayList(loadJobIdPrefixView); sideInputs.addAll(dynamicDestinations.getSideInputs()); - Coder, List>> partitionsCoder = + + Coder tableDestinationCoder = + clusteringEnabled ? TableDestinationCoderV3.of() : TableDestinationCoderV2.of(); + + Coder, WritePartition.Result>> partitionsCoder = KvCoder.of( ShardedKeyCoder.of(NullableCoder.of(destinationCoder)), - ListCoder.of(StringUtf8Coder.of())); + WritePartition.ResultCoder.INSTANCE); // Write single partition to final table input .setCoder(partitionsCoder) @@ -649,13 +737,14 @@ void writeSinglePartition( kmsKey, rowWriterFactory.getSourceFormat(), useAvroLogicalTypes, - schemaUpdateOptions)); + schemaUpdateOptions)) + .setCoder(KvCoder.of(tableDestinationCoder, WriteTables.ResultCoder.INSTANCE)); } private WriteResult writeResult(Pipeline p) { PCollection empty = p.apply("CreateEmptyFailedInserts", Create.empty(TypeDescriptor.of(TableRow.class))); - return WriteResult.in(p, new TupleTag<>("failedInserts"), empty); + return WriteResult.in(p, new TupleTag<>("failedInserts"), empty, null); } @Override diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchedStreamingWrite.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchedStreamingWrite.java new file mode 100644 index 000000000000..dfe797e28bf5 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchedStreamingWrite.java @@ -0,0 +1,449 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.bigquery; + +import com.google.api.services.bigquery.model.TableReference; +import com.google.api.services.bigquery.model.TableRow; +import java.io.IOException; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Set; +import javax.annotation.Nullable; +import org.apache.beam.runners.core.metrics.MetricsLogger; +import org.apache.beam.runners.core.metrics.MonitoringInfoConstants; +import org.apache.beam.sdk.coders.AtomicCoder; +import org.apache.beam.sdk.coders.IterableCoder; +import org.apache.beam.sdk.coders.KvCoder; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.io.gcp.bigquery.BigQueryServices.DatasetService; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.MetricsContainer; +import org.apache.beam.sdk.metrics.MetricsEnvironment; +import org.apache.beam.sdk.metrics.SinkMetrics; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.DoFn.Teardown; +import org.apache.beam.sdk.transforms.GroupIntoBatches; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.transforms.windowing.BoundedWindow; +import org.apache.beam.sdk.transforms.windowing.DefaultTrigger; +import org.apache.beam.sdk.transforms.windowing.GlobalWindows; +import org.apache.beam.sdk.transforms.windowing.PaneInfo; +import org.apache.beam.sdk.transforms.windowing.Window; +import org.apache.beam.sdk.util.ShardedKey; +import org.apache.beam.sdk.values.FailsafeValueInSingleWindow; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.TupleTag; +import org.apache.beam.sdk.values.TupleTagList; +import org.apache.beam.sdk.values.ValueInSingleWindow; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; +import org.joda.time.Duration; +import org.joda.time.Instant; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** PTransform to perform batched streaming BigQuery write. */ +@SuppressWarnings({ + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) +class BatchedStreamingWrite + extends PTransform>>, PCollectionTuple> { + private static final TupleTag mainOutputTag = new TupleTag<>("mainOutput"); + static final TupleTag SUCCESSFUL_ROWS_TAG = new TupleTag<>("successfulRows"); + private static final Logger LOG = LoggerFactory.getLogger(BatchedStreamingWrite.class); + + private final BigQueryServices bqServices; + private final InsertRetryPolicy retryPolicy; + private final TupleTag failedOutputTag; + private final AtomicCoder failedOutputCoder; + private final ErrorContainer errorContainer; + private final boolean skipInvalidRows; + private final boolean ignoreUnknownValues; + private final boolean ignoreInsertIds; + private final SerializableFunction toTableRow; + private final SerializableFunction toFailsafeTableRow; + private final Set allowedMetricUrns; + + /** Tracks bytes written, exposed as "ByteCount" Counter. 
*/ + private Counter byteCounter = SinkMetrics.bytesWritten(); + + /** Switches the method of batching. */ + private final boolean batchViaStateful; + + public BatchedStreamingWrite( + BigQueryServices bqServices, + InsertRetryPolicy retryPolicy, + TupleTag failedOutputTag, + AtomicCoder failedOutputCoder, + ErrorContainer errorContainer, + boolean skipInvalidRows, + boolean ignoreUnknownValues, + boolean ignoreInsertIds, + SerializableFunction toTableRow, + SerializableFunction toFailsafeTableRow) { + this.bqServices = bqServices; + this.retryPolicy = retryPolicy; + this.failedOutputTag = failedOutputTag; + this.failedOutputCoder = failedOutputCoder; + this.errorContainer = errorContainer; + this.skipInvalidRows = skipInvalidRows; + this.ignoreUnknownValues = ignoreUnknownValues; + this.ignoreInsertIds = ignoreInsertIds; + this.toTableRow = toTableRow; + this.toFailsafeTableRow = toFailsafeTableRow; + this.allowedMetricUrns = getAllowedMetricUrns(); + this.batchViaStateful = false; + } + + private BatchedStreamingWrite( + BigQueryServices bqServices, + InsertRetryPolicy retryPolicy, + TupleTag failedOutputTag, + AtomicCoder failedOutputCoder, + ErrorContainer errorContainer, + boolean skipInvalidRows, + boolean ignoreUnknownValues, + boolean ignoreInsertIds, + SerializableFunction toTableRow, + SerializableFunction toFailsafeTableRow, + boolean batchViaStateful) { + this.bqServices = bqServices; + this.retryPolicy = retryPolicy; + this.failedOutputTag = failedOutputTag; + this.failedOutputCoder = failedOutputCoder; + this.errorContainer = errorContainer; + this.skipInvalidRows = skipInvalidRows; + this.ignoreUnknownValues = ignoreUnknownValues; + this.ignoreInsertIds = ignoreInsertIds; + this.toTableRow = toTableRow; + this.toFailsafeTableRow = toFailsafeTableRow; + this.allowedMetricUrns = getAllowedMetricUrns(); + this.batchViaStateful = batchViaStateful; + } + + private static Set getAllowedMetricUrns() { + ImmutableSet.Builder setBuilder = ImmutableSet.builder(); + setBuilder.add(MonitoringInfoConstants.Urns.API_REQUEST_COUNT); + setBuilder.add(MonitoringInfoConstants.Urns.API_REQUEST_LATENCIES); + return setBuilder.build(); + } + + /** + * A transform that performs batched streaming BigQuery write; input elements are batched and + * flushed upon bundle finalization. + */ + public BatchedStreamingWrite viaDoFnFinalization() { + return new BatchedStreamingWrite<>( + bqServices, + retryPolicy, + failedOutputTag, + failedOutputCoder, + errorContainer, + skipInvalidRows, + ignoreUnknownValues, + ignoreInsertIds, + toTableRow, + toFailsafeTableRow, + false); + } + + /** + * A transform that performs batched streaming BigQuery write; input elements are grouped on table + * destinations and batched via a stateful DoFn. This also enables dynamic sharding during + * grouping to parallelize writes. + */ + public BatchedStreamingWrite viaStateful() { + return new BatchedStreamingWrite<>( + bqServices, + retryPolicy, + failedOutputTag, + failedOutputCoder, + errorContainer, + skipInvalidRows, + ignoreUnknownValues, + ignoreInsertIds, + toTableRow, + toFailsafeTableRow, + true); + } + + @Override + public PCollectionTuple expand(PCollection>> input) { + return batchViaStateful + ? 
input.apply(new ViaStateful()) + : input.apply(new ViaBundleFinalization()); + } + + private class ViaBundleFinalization + extends PTransform>>, PCollectionTuple> { + @Override + public PCollectionTuple expand(PCollection>> input) { + PCollectionTuple result = + input.apply( + ParDo.of(new BatchAndInsertElements()) + .withOutputTags( + mainOutputTag, TupleTagList.of(failedOutputTag).and(SUCCESSFUL_ROWS_TAG))); + result.get(failedOutputTag).setCoder(failedOutputCoder); + result.get(SUCCESSFUL_ROWS_TAG).setCoder(TableRowJsonCoder.of()); + return result; + } + } + + @VisibleForTesting + private class BatchAndInsertElements extends DoFn>, Void> { + + /** JsonTableRows to accumulate BigQuery rows in order to batch writes. */ + private transient Map>> tableRows; + + /** The list of unique ids for each BigQuery table row. */ + private transient Map> uniqueIdsForTableRows; + + private transient @Nullable DatasetService datasetService; + + private DatasetService getDatasetService(PipelineOptions pipelineOptions) throws IOException { + if (datasetService == null) { + datasetService = bqServices.getDatasetService(pipelineOptions.as(BigQueryOptions.class)); + } + return datasetService; + } + + /** Prepares a target BigQuery table. */ + @StartBundle + public void startBundle() { + tableRows = new HashMap<>(); + uniqueIdsForTableRows = new HashMap<>(); + } + + /** Accumulates the input into JsonTableRows and uniqueIdsForTableRows. */ + @ProcessElement + public void processElement( + @Element KV> element, + @Timestamp Instant timestamp, + BoundedWindow window, + PaneInfo pane) { + String tableSpec = element.getKey(); + TableRow tableRow = toTableRow.apply(element.getValue().tableRow); + TableRow failsafeTableRow = toFailsafeTableRow.apply(element.getValue().tableRow); + tableRows + .computeIfAbsent(tableSpec, k -> new ArrayList<>()) + .add(FailsafeValueInSingleWindow.of(tableRow, timestamp, window, pane, failsafeTableRow)); + uniqueIdsForTableRows + .computeIfAbsent(tableSpec, k -> new ArrayList<>()) + .add(element.getValue().uniqueId); + } + + /** Writes the accumulated rows into BigQuery with streaming API. */ + @FinishBundle + public void finishBundle(FinishBundleContext context) throws Exception { + List> failedInserts = Lists.newArrayList(); + List> successfulInserts = Lists.newArrayList(); + BigQueryOptions options = context.getPipelineOptions().as(BigQueryOptions.class); + for (Map.Entry>> entry : + tableRows.entrySet()) { + TableReference tableReference = BigQueryHelpers.parseTableSpec(entry.getKey()); + flushRows( + getDatasetService(options), + tableReference, + entry.getValue(), + uniqueIdsForTableRows.get(entry.getKey()), + failedInserts, + successfulInserts); + } + tableRows.clear(); + uniqueIdsForTableRows.clear(); + + for (ValueInSingleWindow row : failedInserts) { + context.output(failedOutputTag, row.getValue(), row.getTimestamp(), row.getWindow()); + } + reportStreamingApiLogging(options); + } + + @Teardown + public void onTeardown() { + try { + if (datasetService != null) { + datasetService.close(); + datasetService = null; + } + } catch (Exception e) { + throw new RuntimeException(e); + } + } + } + + // The max duration input records are allowed to be buffered in the state, if using ViaStateful. 
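Before the stateful variant below, a stripped-down sketch (not part of this change set; class and method names are invented) of the bundle-scoped batching pattern that BatchAndInsertElements above follows: accumulate in @StartBundle/@ProcessElement, flush everything in @FinishBundle:

import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.transforms.DoFn;

class BufferPerBundleFn extends DoFn<String, Void> {
  private transient List<String> buffer;

  @StartBundle
  public void startBundle() {
    buffer = new ArrayList<>();
  }

  @ProcessElement
  public void processElement(@Element String element) {
    buffer.add(element);
  }

  @FinishBundle
  public void finishBundle() {
    flush(buffer); // one batched side effect per bundle, e.g. a single insertAll call
    buffer.clear();
  }

  private void flush(List<String> batch) {
    // placeholder for the batched write
  }
}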
+ private static final Duration BATCH_MAX_BUFFERING_DURATION = Duration.millis(200); + + private class ViaStateful + extends PTransform>>, PCollectionTuple> { + @Override + public PCollectionTuple expand(PCollection>> input) { + BigQueryOptions options = input.getPipeline().getOptions().as(BigQueryOptions.class); + Duration maxBufferingDuration = + options.getMaxBufferingDurationMilliSec() > 0 + ? Duration.millis(options.getMaxBufferingDurationMilliSec()) + : BATCH_MAX_BUFFERING_DURATION; + KvCoder> inputCoder = (KvCoder) input.getCoder(); + TableRowInfoCoder valueCoder = + (TableRowInfoCoder) inputCoder.getCoderArguments().get(1); + PCollectionTuple result = + input + // Apply a global window to avoid GroupIntoBatches below performs tiny grouping + // partitioned by windows. + .apply( + Window.>>into(new GlobalWindows()) + .triggering(DefaultTrigger.of()) + .discardingFiredPanes()) + // Group and batch table rows such that each batch has no more than + // getMaxStreamingRowsToBatch rows. Also set a buffering time limit to avoid being + // stuck at a partial batch forever, especially in a global window. + .apply( + GroupIntoBatches.>ofSize( + options.getMaxStreamingRowsToBatch()) + .withMaxBufferingDuration(maxBufferingDuration) + .withShardedKey()) + .setCoder( + KvCoder.of( + ShardedKey.Coder.of(StringUtf8Coder.of()), IterableCoder.of(valueCoder))) + .apply( + ParDo.of(new InsertBatchedElements()) + .withOutputTags( + mainOutputTag, + TupleTagList.of(failedOutputTag).and(SUCCESSFUL_ROWS_TAG))); + result.get(failedOutputTag).setCoder(failedOutputCoder); + result.get(SUCCESSFUL_ROWS_TAG).setCoder(TableRowJsonCoder.of()); + return result; + } + } + + // TODO(BEAM-11408): This transform requires stable inputs. Currently it relies on the fact that + // the upstream transform GroupIntoBatches produces stable outputs as opposed to using the + // annotation @RequiresStableInputs, to avoid potential performance penalty due to extra data + // shuffling. + private class InsertBatchedElements + extends DoFn, Iterable>>, Void> { + private transient @Nullable DatasetService datasetService; + + private DatasetService getDatasetService(PipelineOptions pipelineOptions) throws IOException { + if (datasetService == null) { + datasetService = bqServices.getDatasetService(pipelineOptions.as(BigQueryOptions.class)); + } + return datasetService; + } + + @ProcessElement + public void processElement( + @Element KV, Iterable>> input, + BoundedWindow window, + ProcessContext context, + MultiOutputReceiver out) + throws InterruptedException, IOException { + List> tableRows = new ArrayList<>(); + List uniqueIds = new ArrayList<>(); + for (TableRowInfo row : input.getValue()) { + TableRow tableRow = toTableRow.apply(row.tableRow); + TableRow failsafeTableRow = toFailsafeTableRow.apply(row.tableRow); + tableRows.add( + FailsafeValueInSingleWindow.of( + tableRow, context.timestamp(), window, context.pane(), failsafeTableRow)); + uniqueIds.add(row.uniqueId); + } + LOG.info("Writing to BigQuery using Auto-sharding. 
Flushing {} rows.", tableRows.size()); + BigQueryOptions options = context.getPipelineOptions().as(BigQueryOptions.class); + TableReference tableReference = BigQueryHelpers.parseTableSpec(input.getKey().getKey()); + List> failedInserts = Lists.newArrayList(); + List> successfulInserts = Lists.newArrayList(); + flushRows( + getDatasetService(options), + tableReference, + tableRows, + uniqueIds, + failedInserts, + successfulInserts); + + for (ValueInSingleWindow row : failedInserts) { + out.get(failedOutputTag).output(row.getValue()); + } + for (ValueInSingleWindow row : successfulInserts) { + out.get(SUCCESSFUL_ROWS_TAG).output(row.getValue()); + } + reportStreamingApiLogging(options); + } + + @Teardown + public void onTeardown() { + try { + if (datasetService != null) { + datasetService.close(); + datasetService = null; + } + } catch (Exception e) { + throw new RuntimeException(e); + } + } + } + + /** Writes the accumulated rows into BigQuery with streaming API. */ + private void flushRows( + DatasetService datasetService, + TableReference tableReference, + List> tableRows, + List uniqueIds, + List> failedInserts, + List> successfulInserts) + throws InterruptedException { + if (!tableRows.isEmpty()) { + try { + long totalBytes = + datasetService.insertAll( + tableReference, + tableRows, + uniqueIds, + retryPolicy, + failedInserts, + errorContainer, + skipInvalidRows, + ignoreUnknownValues, + ignoreInsertIds, + successfulInserts); + byteCounter.inc(totalBytes); + } catch (IOException e) { + throw new RuntimeException(e); + } + } + } + + private void reportStreamingApiLogging(BigQueryOptions options) { + MetricsContainer processWideContainer = MetricsEnvironment.getProcessWideContainer(); + if (processWideContainer instanceof MetricsLogger) { + MetricsLogger processWideMetricsLogger = (MetricsLogger) processWideContainer; + processWideMetricsLogger.tryLoggingMetrics( + "API call Metrics: \n", + this.allowedMetricUrns, + options.getBqStreamingApiLoggingFrequencySec() * 1000L); + } + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BeamRowToStorageApiProto.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BeamRowToStorageApiProto.java new file mode 100644 index 000000000000..816cbe9d6caf --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BeamRowToStorageApiProto.java @@ -0,0 +1,336 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.bigquery; + +import com.google.protobuf.ByteString; +import com.google.protobuf.DescriptorProtos.DescriptorProto; +import com.google.protobuf.DescriptorProtos.FieldDescriptorProto; +import com.google.protobuf.DescriptorProtos.FieldDescriptorProto.Label; +import com.google.protobuf.DescriptorProtos.FieldDescriptorProto.Type; +import com.google.protobuf.DescriptorProtos.FileDescriptorProto; +import com.google.protobuf.Descriptors.Descriptor; +import com.google.protobuf.Descriptors.DescriptorValidationException; +import com.google.protobuf.Descriptors.FieldDescriptor; +import com.google.protobuf.Descriptors.FileDescriptor; +import com.google.protobuf.DynamicMessage; +import java.math.BigDecimal; +import java.time.LocalDate; +import java.time.LocalDateTime; +import java.time.LocalTime; +import java.util.List; +import java.util.Map; +import java.util.UUID; +import java.util.function.BiFunction; +import java.util.function.Function; +import java.util.stream.Collectors; +import java.util.stream.StreamSupport; +import javax.annotation.Nullable; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.Field; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.schemas.Schema.LogicalType; +import org.apache.beam.sdk.schemas.Schema.TypeName; +import org.apache.beam.sdk.schemas.logicaltypes.EnumerationType; +import org.apache.beam.sdk.schemas.logicaltypes.SqlTypes; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Functions; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.primitives.Bytes; +import org.joda.time.ReadableInstant; + +/** + * Utility methods for converting Beam {@link Row} objects to dynamic protocol message, for use with + * the Storage write API. + */ +public class BeamRowToStorageApiProto { + // Number of digits after the decimal point supported by the NUMERIC data type. + private static final int NUMERIC_SCALE = 9; + // Maximum and minimum allowed values for the NUMERIC data type. + private static final BigDecimal MAX_NUMERIC_VALUE = + new BigDecimal("99999999999999999999999999999.999999999"); + private static final BigDecimal MIN_NUMERIC_VALUE = + new BigDecimal("-99999999999999999999999999999.999999999"); + + // TODO(reuvenlax): Support BIGNUMERIC and GEOGRAPHY types. + static final Map PRIMITIVE_TYPES = + ImmutableMap.builder() + .put(TypeName.INT16, Type.TYPE_INT32) + .put(TypeName.BYTE, Type.TYPE_INT32) + .put(TypeName.INT32, Type.TYPE_INT32) + .put(TypeName.INT64, Type.TYPE_INT64) + .put(TypeName.FLOAT, Type.TYPE_FLOAT) + .put(TypeName.DOUBLE, Type.TYPE_DOUBLE) + .put(TypeName.STRING, Type.TYPE_STRING) + .put(TypeName.BOOLEAN, Type.TYPE_BOOL) + .put(TypeName.DATETIME, Type.TYPE_INT64) + .put(TypeName.BYTES, Type.TYPE_BYTES) + .put(TypeName.DECIMAL, Type.TYPE_BYTES) + .build(); + + // A map of supported logical types to the protobuf field type. 
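For orientation, a hedged end-to-end sketch (not part of this change set; schema, field names, and values are invented) of how the two public entry points of this class fit together: build the protobuf descriptor once per schema, then convert each Row to a DynamicMessage:

import com.google.protobuf.Descriptors.Descriptor;
import com.google.protobuf.Descriptors.DescriptorValidationException;
import com.google.protobuf.DynamicMessage;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.values.Row;

static DynamicMessage exampleRowToStorageProto() throws DescriptorValidationException {
  Schema schema = Schema.builder().addStringField("name").addInt64Field("clicks").build();
  Row row =
      Row.withSchema(schema)
          .withFieldValue("name", "example")
          .withFieldValue("clicks", 3L)
          .build();
  // Descriptor generation is schema-level work; reuse the descriptor across rows.
  Descriptor descriptor = BeamRowToStorageApiProto.getDescriptorFromSchema(schema);
  return BeamRowToStorageApiProto.messageFromBeamRow(descriptor, row);
}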
+ static final Map LOGICAL_TYPES = + ImmutableMap.builder() + .put(SqlTypes.DATE.getIdentifier(), Type.TYPE_INT32) + .put(SqlTypes.TIME.getIdentifier(), Type.TYPE_INT64) + .put(SqlTypes.DATETIME.getIdentifier(), Type.TYPE_INT64) + .put(SqlTypes.TIMESTAMP.getIdentifier(), Type.TYPE_INT64) + .put(EnumerationType.IDENTIFIER, Type.TYPE_STRING) + .build(); + + static final Map> PRIMITIVE_ENCODERS = + ImmutableMap.>builder() + .put(TypeName.INT16, o -> Integer.valueOf((Short) o)) + .put(TypeName.BYTE, o -> Integer.valueOf((Byte) o)) + .put(TypeName.INT32, Functions.identity()) + .put(TypeName.INT64, Functions.identity()) + .put(TypeName.FLOAT, Function.identity()) + .put(TypeName.DOUBLE, Function.identity()) + .put(TypeName.STRING, Function.identity()) + .put(TypeName.BOOLEAN, Function.identity()) + // A Beam DATETIME is actually a timestamp, not a DateTime. + .put(TypeName.DATETIME, o -> ((ReadableInstant) o).getMillis() * 1000) + .put(TypeName.BYTES, o -> ByteString.copyFrom((byte[]) o)) + .put(TypeName.DECIMAL, o -> serializeBigDecimalToNumeric((BigDecimal) o)) + .build(); + + // A map of supported logical types to their encoding functions. + static final Map, Object, Object>> LOGICAL_TYPE_ENCODERS = + ImmutableMap., Object, Object>>builder() + .put( + SqlTypes.DATE.getIdentifier(), + (logicalType, value) -> (int) ((LocalDate) value).toEpochDay()) + .put( + SqlTypes.TIME.getIdentifier(), + (logicalType, value) -> CivilTimeEncoder.encodePacked64TimeMicros((LocalTime) value)) + .put( + SqlTypes.DATETIME.getIdentifier(), + (logicalType, value) -> + CivilTimeEncoder.encodePacked64DatetimeSeconds((LocalDateTime) value)) + .put( + SqlTypes.TIMESTAMP.getIdentifier(), + (logicalType, value) -> ((java.time.Instant) value).toEpochMilli() * 1000) + .put( + EnumerationType.IDENTIFIER, + (logicalType, value) -> + ((EnumerationType) logicalType).toString((EnumerationType.Value) value)) + .build(); + + /** + * Given a Beam Schema, returns a protocol-buffer Descriptor that can be used to write data using + * the BigQuery Storage API. + */ + public static Descriptor getDescriptorFromSchema(Schema schema) + throws DescriptorValidationException { + DescriptorProto descriptorProto = descriptorSchemaFromBeamSchema(schema); + FileDescriptorProto fileDescriptorProto = + FileDescriptorProto.newBuilder().addMessageType(descriptorProto).build(); + FileDescriptor fileDescriptor = + FileDescriptor.buildFrom(fileDescriptorProto, new FileDescriptor[0]); + + return Iterables.getOnlyElement(fileDescriptor.getMessageTypes()); + } + + /** + * Given a Beam {@link Row} object, returns a protocol-buffer message that can be used to write + * data using the BigQuery Storage streaming API. 
+ */ + public static DynamicMessage messageFromBeamRow(Descriptor descriptor, Row row) { + Schema beamSchema = row.getSchema(); + DynamicMessage.Builder builder = DynamicMessage.newBuilder(descriptor); + for (int i = 0; i < row.getFieldCount(); ++i) { + Field beamField = beamSchema.getField(i); + FieldDescriptor fieldDescriptor = + Preconditions.checkNotNull(descriptor.findFieldByName(beamField.getName().toLowerCase())); + @Nullable Object value = messageValueFromRowValue(fieldDescriptor, beamField, i, row); + if (value != null) { + builder.setField(fieldDescriptor, value); + } + } + return builder.build(); + } + + @VisibleForTesting + static DescriptorProto descriptorSchemaFromBeamSchema(Schema schema) { + Preconditions.checkState(schema.getFieldCount() > 0); + DescriptorProto.Builder descriptorBuilder = DescriptorProto.newBuilder(); + // Create a unique name for the descriptor ('-' characters cannot be used). + descriptorBuilder.setName("D" + UUID.randomUUID().toString().replace("-", "_")); + int i = 1; + List nestedTypes = Lists.newArrayList(); + for (Field field : schema.getFields()) { + FieldDescriptorProto.Builder fieldDescriptorProtoBuilder = + fieldDescriptorFromBeamField(field, i++, nestedTypes); + descriptorBuilder.addField(fieldDescriptorProtoBuilder); + } + nestedTypes.forEach(descriptorBuilder::addNestedType); + return descriptorBuilder.build(); + } + + private static FieldDescriptorProto.Builder fieldDescriptorFromBeamField( + Field field, int fieldNumber, List nestedTypes) { + FieldDescriptorProto.Builder fieldDescriptorBuilder = FieldDescriptorProto.newBuilder(); + fieldDescriptorBuilder = fieldDescriptorBuilder.setName(field.getName().toLowerCase()); + fieldDescriptorBuilder = fieldDescriptorBuilder.setNumber(fieldNumber); + + switch (field.getType().getTypeName()) { + case ROW: + @Nullable Schema rowSchema = field.getType().getRowSchema(); + if (rowSchema == null) { + throw new RuntimeException("Unexpected null schema!"); + } + DescriptorProto nested = descriptorSchemaFromBeamSchema(rowSchema); + nestedTypes.add(nested); + fieldDescriptorBuilder = + fieldDescriptorBuilder.setType(Type.TYPE_MESSAGE).setTypeName(nested.getName()); + break; + case ARRAY: + case ITERABLE: + @Nullable FieldType elementType = field.getType().getCollectionElementType(); + if (elementType == null) { + throw new RuntimeException("Unexpected null element type!"); + } + Preconditions.checkState( + !Preconditions.checkNotNull(elementType.getTypeName()).isCollectionType(), + "Nested arrays not supported by BigQuery."); + return fieldDescriptorFromBeamField( + Field.of(field.getName(), elementType), fieldNumber, nestedTypes) + .setLabel(Label.LABEL_REPEATED); + case LOGICAL_TYPE: + @Nullable LogicalType logicalType = field.getType().getLogicalType(); + if (logicalType == null) { + throw new RuntimeException("Unexpected null logical type " + field.getType()); + } + @Nullable Type type = LOGICAL_TYPES.get(logicalType.getIdentifier()); + if (type == null) { + throw new RuntimeException("Unsupported logical type " + field.getType()); + } + fieldDescriptorBuilder = fieldDescriptorBuilder.setType(type); + break; + case MAP: + throw new RuntimeException("Map types not supported by BigQuery."); + default: + @Nullable Type primitiveType = PRIMITIVE_TYPES.get(field.getType().getTypeName()); + if (primitiveType == null) { + throw new RuntimeException("Unsupported type " + field.getType()); + } + fieldDescriptorBuilder = fieldDescriptorBuilder.setType(primitiveType); + } + if (field.getType().getNullable()) { + 
fieldDescriptorBuilder = fieldDescriptorBuilder.setLabel(Label.LABEL_OPTIONAL); + } else { + fieldDescriptorBuilder = fieldDescriptorBuilder.setLabel(Label.LABEL_REQUIRED); + } + return fieldDescriptorBuilder; + } + + @Nullable + private static Object messageValueFromRowValue( + FieldDescriptor fieldDescriptor, Field beamField, int index, Row row) { + @Nullable Object value = row.getValue(index); + if (value == null) { + if (fieldDescriptor.isOptional()) { + return null; + } else { + throw new IllegalArgumentException( + "Received null value for non-nullable field " + fieldDescriptor.getName()); + } + } + return toProtoValue(fieldDescriptor, beamField.getType(), value); + } + + private static Object toProtoValue( + FieldDescriptor fieldDescriptor, FieldType beamFieldType, Object value) { + switch (beamFieldType.getTypeName()) { + case ROW: + return messageFromBeamRow(fieldDescriptor.getMessageType(), (Row) value); + case ARRAY: + List list = (List) value; + @Nullable FieldType arrayElementType = beamFieldType.getCollectionElementType(); + if (arrayElementType == null) { + throw new RuntimeException("Unexpected null element type!"); + } + return list.stream() + .map(v -> toProtoValue(fieldDescriptor, arrayElementType, v)) + .collect(Collectors.toList()); + case ITERABLE: + Iterable iterable = (Iterable) value; + @Nullable FieldType iterableElementType = beamFieldType.getCollectionElementType(); + if (iterableElementType == null) { + throw new RuntimeException("Unexpected null element type!"); + } + return StreamSupport.stream(iterable.spliterator(), false) + .map(v -> toProtoValue(fieldDescriptor, iterableElementType, v)) + .collect(Collectors.toList()); + case MAP: + throw new RuntimeException("Map types not supported by BigQuery."); + default: + return scalarToProtoValue(beamFieldType, value); + } + } + + @VisibleForTesting + static Object scalarToProtoValue(FieldType beamFieldType, Object value) { + if (beamFieldType.getTypeName() == TypeName.LOGICAL_TYPE) { + @Nullable LogicalType logicalType = beamFieldType.getLogicalType(); + if (logicalType == null) { + throw new RuntimeException("Unexpectedly null logical type " + beamFieldType); + } + @Nullable + BiFunction, Object, Object> logicalTypeEncoder = + LOGICAL_TYPE_ENCODERS.get(logicalType.getIdentifier()); + if (logicalTypeEncoder == null) { + throw new RuntimeException("Unsupported logical type " + logicalType.getIdentifier()); + } + return logicalTypeEncoder.apply(logicalType, value); + } else { + @Nullable + Function encoder = PRIMITIVE_ENCODERS.get(beamFieldType.getTypeName()); + if (encoder == null) { + throw new RuntimeException("Unexpected beam type " + beamFieldType); + } + return encoder.apply(value); + } + } + + static ByteString serializeBigDecimalToNumeric(BigDecimal o) { + return serializeBigDecimal(o, NUMERIC_SCALE, MAX_NUMERIC_VALUE, MIN_NUMERIC_VALUE, "Numeric"); + } + + private static ByteString serializeBigDecimal( + BigDecimal v, int scale, BigDecimal maxValue, BigDecimal minValue, String typeName) { + if (v.scale() > scale) { + throw new IllegalArgumentException( + typeName + " scale cannot exceed " + scale + ": " + v.toPlainString()); + } + if (v.compareTo(maxValue) > 0 || v.compareTo(minValue) < 0) { + throw new IllegalArgumentException(typeName + " overflow: " + v.toPlainString()); + } + + byte[] bytes = v.setScale(scale).unscaledValue().toByteArray(); + // NUMERIC/BIGNUMERIC values are serialized as scaled integers in two's complement form in + // little endian + // order. 
BigInteger requires the same encoding but in big endian order, therefore we must + // reverse the bytes that come from the proto. + Bytes.reverse(bytes); + return ByteString.copyFrom(bytes); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryAvroUtils.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryAvroUtils.java index 7eeab1036a71..f465fd9823b9 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryAvroUtils.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryAvroUtils.java @@ -76,11 +76,16 @@ class BigQueryAvroUtils { .put("GEOGRAPHY", Type.STRING) .put("BYTES", Type.BYTES) .put("INTEGER", Type.LONG) + .put("INT64", Type.LONG) .put("FLOAT", Type.DOUBLE) + .put("FLOAT64", Type.DOUBLE) .put("NUMERIC", Type.BYTES) + .put("BIGNUMERIC", Type.BYTES) .put("BOOLEAN", Type.BOOLEAN) + .put("BOOL", Type.BOOLEAN) .put("TIMESTAMP", Type.LONG) .put("RECORD", Type.RECORD) + .put("STRUCT", Type.RECORD) .put("DATE", Type.STRING) .put("DATE", Type.INT) .put("DATETIME", Type.STRING) @@ -325,12 +330,15 @@ private static Object convertRequiredField( return v.toString(); } case "INTEGER": + case "INT64": verify(v instanceof Long, "Expected Long, got %s", v.getClass()); return ((Long) v).toString(); case "FLOAT": + case "FLOAT64": verify(v instanceof Double, "Expected Double, got %s", v.getClass()); return v; case "NUMERIC": + case "BIGNUMERIC": // NUMERIC data types are represented as BYTES with the DECIMAL logical type. They are // converted back to Strings with precision and scale determined by the logical type. verify(v instanceof ByteBuffer, "Expected ByteBuffer, got %s", v.getClass()); @@ -350,6 +358,7 @@ private static Object convertRequiredField( verify(v instanceof Long, "Expected Long, got %s", v.getClass()); return formatTimestamp((Long) v); case "RECORD": + case "STRUCT": verify(v instanceof GenericRecord, "Expected GenericRecord, got %s", v.getClass()); return convertGenericRecordToTableRow((GenericRecord) v, fieldSchema.getFields()); case "BYTES": diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryDlqProvider.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryDlqProvider.java new file mode 100644 index 000000000000..ae2ca3ac26e6 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryDlqProvider.java @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.bigquery; + +import com.google.api.services.bigquery.model.TableRow; +import com.google.auto.service.AutoService; +import org.apache.beam.sdk.annotations.Internal; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.io.Failure; +import org.apache.beam.sdk.schemas.io.GenericDlqProvider; +import org.apache.beam.sdk.transforms.MapElements; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PDone; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.sdk.values.TypeDescriptor; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +@Internal +@AutoService(GenericDlqProvider.class) +public class BigQueryDlqProvider implements GenericDlqProvider { + @Override + public String identifier() { + return "bigquery"; + } + + @Override + public PTransform, PDone> newDlqTransform(String config) { + return new DlqTransform(config); + } + + private static class DlqTransform extends PTransform, PDone> { + private static final Logger LOG = LoggerFactory.getLogger(BigQueryDlqProvider.class); + private final String tableSpec; + + DlqTransform(String tableSpec) { + this.tableSpec = tableSpec; + } + + @Override + public PDone expand(PCollection input) { + input + .apply( + "Failure to Row", + MapElements.into(TypeDescriptor.of(TableRow.class)).via(DlqTransform::getTableRow)) + .apply("Write Failures to BigQuery", BigQueryIO.writeTableRows().to(tableSpec)) + .getFailedInsertsWithErr() + .apply( + "Log insert failures", + MapElements.into(TypeDescriptor.of(Void.class)) + .via( + x -> { + LOG.error("Failed to insert error into BigQuery table. {}", x); + return null; + })); + return PDone.in(input.getPipeline()); + } + + private static TableRow getTableRow(Failure failure) { + Row row = + Row.withSchema( + Schema.builder().addByteArrayField("payload").addStringField("error").build()) + .withFieldValue("payload", failure.getPayload()) + .withFieldValue("error", failure.getError()) + .build(); + return BigQueryUtils.toTableRow(row); + } + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryHelpers.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryHelpers.java index bba618a3c5ee..7524b57d7d9f 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryHelpers.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryHelpers.java @@ -251,7 +251,7 @@ boolean pollJob() throws IOException { case FAILED: String oldJobId = currentJobId.getJobId(); currentJobId = BigQueryHelpers.getRetryJobId(currentJobId, lookupJob).jobId; - LOG.info( + LOG.warn( "Load job {} failed, {}: {}. Next job id {}", oldJobId, shouldRetry() ? 
"will retry" : "will not retry", @@ -411,6 +411,21 @@ public static TableReference parseTableSpec(String tableSpec) { return ref.setDatasetId(match.group("DATASET")).setTableId(match.group("TABLE")); } + public static TableReference parseTableUrn(String tableUrn) { + Matcher match = BigQueryIO.TABLE_URN_SPEC.matcher(tableUrn); + if (!match.matches()) { + throw new IllegalArgumentException( + "Table reference is not in projects/[project_id]/datasets/[dataset_id]/tables/[table_id] " + + "format: " + + tableUrn); + } + + TableReference ref = new TableReference(); + ref.setProjectId(match.group("PROJECT")); + + return ref.setDatasetId(match.group("DATASET")).setTableId(match.group("TABLE")); + } + /** Strip off any partition decorator information from a tablespec. */ public static String stripPartitionDecorator(String tableSpec) { int index = tableSpec.lastIndexOf('$'); @@ -525,12 +540,17 @@ static void verifyDatasetPresence(DatasetService datasetService, TableReference public static @Nullable BigInteger getNumRows(BigQueryOptions options, TableReference tableRef) throws InterruptedException, IOException { - DatasetService datasetService = new BigQueryServicesImpl().getDatasetService(options); - Table table = datasetService.getTable(tableRef); - if (table == null) { - return null; + try (DatasetService datasetService = new BigQueryServicesImpl().getDatasetService(options)) { + Table table = datasetService.getTable(tableRef); + if (table == null) { + return null; + } + return table.getNumRows(); + } catch (IOException | InterruptedException e) { + throw e; + } catch (Exception e) { + throw new RuntimeException(e); } - return table.getNumRows(); } static String getDatasetLocation( diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java index d840836acd6e..8b9b705bdb04 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java @@ -504,6 +504,16 @@ public class BigQueryIO { static final Pattern TABLE_SPEC = Pattern.compile(DATASET_TABLE_REGEXP); + /** + * Matches table specifications in the form {@code "projects/[project_id]/datasets/[dataset_id]/tables[table_id]". + */ + private static final String TABLE_URN_REGEXP = + String.format( + "projects/(?%s)/datasets/(?%s)/tables/(?

  • %s)", + PROJECT_ID_REGEXP, DATASET_REGEXP, TABLE_REGEXP); + + static final Pattern TABLE_URN_SPEC = Pattern.compile(TABLE_URN_REGEXP); + /** * A formatting function that maps a TableRow to itself. This allows sending a {@code * PCollection} directly to BigQueryIO.Write. @@ -581,6 +591,7 @@ public static TypedRead read(SerializableFunction par .setParseFn(parseFn) .setMethod(Method.DEFAULT) .setUseAvroLogicalTypes(false) + .setFormat(DataFormat.AVRO) .build(); } @@ -770,6 +781,9 @@ abstract static class Builder { abstract Builder setMethod(Method method); + @Experimental(Experimental.Kind.SOURCE_SINK) + abstract Builder setFormat(DataFormat method); + abstract Builder setSelectedFields(ValueProvider> selectedFields); abstract Builder setRowRestriction(ValueProvider rowRestriction); @@ -818,6 +832,9 @@ abstract static class Builder { abstract Method getMethod(); + @Experimental(Experimental.Kind.SOURCE_SINK) + abstract DataFormat getFormat(); + abstract @Nullable ValueProvider> getSelectedFields(); abstract @Nullable ValueProvider getRowRestriction(); @@ -907,6 +924,7 @@ private BigQueryStorageQuerySource createStorageQuerySource( getQueryLocation(), getQueryTempDataset(), getKmsKey(), + getFormat(), getParseFn(), outputCoder, getBigQueryServices()); @@ -947,44 +965,53 @@ public void validate(PipelineOptions options) { // earlier stages of the pipeline or if a query depends on earlier stages of a pipeline. // For these cases the withoutValidation method can be used to disable the check. if (getValidate()) { - if (table != null) { - checkArgument(table.isAccessible(), "Cannot call validate if table is dynamically set."); - } - if (table != null && table.get().getProjectId() != null) { - // Check for source table presence for early failure notification. - DatasetService datasetService = getBigQueryServices().getDatasetService(bqOptions); - BigQueryHelpers.verifyDatasetPresence(datasetService, table.get()); - BigQueryHelpers.verifyTablePresence(datasetService, table.get()); - } else if (getQuery() != null) { - checkArgument( - getQuery().isAccessible(), "Cannot call validate if query is dynamically set."); - JobService jobService = getBigQueryServices().getJobService(bqOptions); - try { - jobService.dryRunQuery( - bqOptions.getProject(), - new JobConfigurationQuery() - .setQuery(getQuery().get()) - .setFlattenResults(getFlattenResults()) - .setUseLegacySql(getUseLegacySql()), - getQueryLocation()); - } catch (Exception e) { - throw new IllegalArgumentException( - String.format(QUERY_VALIDATION_FAILURE_ERROR, getQuery().get()), e); + try (DatasetService datasetService = getBigQueryServices().getDatasetService(bqOptions)) { + if (table != null) { + checkArgument( + table.isAccessible(), "Cannot call validate if table is dynamically set."); } + if (table != null && table.get().getProjectId() != null) { + // Check for source table presence for early failure notification. + BigQueryHelpers.verifyDatasetPresence(datasetService, table.get()); + BigQueryHelpers.verifyTablePresence(datasetService, table.get()); + } else if (getQuery() != null) { + checkArgument( + getQuery().isAccessible(), "Cannot call validate if query is dynamically set."); + JobService jobService = getBigQueryServices().getJobService(bqOptions); + try { + jobService.dryRunQuery( + bqOptions.getBigQueryProject() == null + ? 
bqOptions.getProject() + : bqOptions.getBigQueryProject(), + new JobConfigurationQuery() + .setQuery(getQuery().get()) + .setFlattenResults(getFlattenResults()) + .setUseLegacySql(getUseLegacySql()), + getQueryLocation()); + } catch (Exception e) { + throw new IllegalArgumentException( + String.format(QUERY_VALIDATION_FAILURE_ERROR, getQuery().get()), e); + } - DatasetService datasetService = getBigQueryServices().getDatasetService(bqOptions); - // If the user provided a temp dataset, check if the dataset exists before launching the - // query - if (getQueryTempDataset() != null) { - // The temp table is only used for dataset and project id validation, not for table name - // validation - TableReference tempTable = - new TableReference() - .setProjectId(bqOptions.getProject()) - .setDatasetId(getQueryTempDataset()) - .setTableId("dummy table"); - BigQueryHelpers.verifyDatasetPresence(datasetService, tempTable); + // If the user provided a temp dataset, check if the dataset exists before launching the + // query + if (getQueryTempDataset() != null) { + // The temp table is only used for dataset and project id validation, not for table + // name + // validation + TableReference tempTable = + new TableReference() + .setProjectId( + bqOptions.getBigQueryProject() == null + ? bqOptions.getProject() + : bqOptions.getBigQueryProject()) + .setDatasetId(getQueryTempDataset()) + .setTableId("dummy table"); + BigQueryHelpers.verifyDatasetPresence(datasetService, tempTable); + } } + } catch (Exception e) { + throw new RuntimeException(e); } } } @@ -1152,7 +1179,10 @@ void cleanup(PassThroughThenCleanup.ContextContainer c) throws Exception { String jobUuid = c.getJobId(); final String extractDestinationDir = resolveTempLocation(bqOptions.getTempLocation(), "BigQueryExtractTemp", jobUuid); - final String executingProject = bqOptions.getProject(); + final String executingProject = + bqOptions.getBigQueryProject() == null + ? bqOptions.getProject() + : bqOptions.getBigQueryProject(); JobReference jobRef = new JobReference() .setProjectId(executingProject) @@ -1195,6 +1225,7 @@ private PCollection expandForDirectRead(PBegin input, Coder outputCoder) { org.apache.beam.sdk.io.Read.from( BigQueryStorageTableSource.create( tableProvider, + getFormat(), getSelectedFields(), getRowRestriction(), getParseFn(), @@ -1277,7 +1308,10 @@ public void processElement(ProcessContext c) throws Exception { CreateReadSessionRequest request = CreateReadSessionRequest.newBuilder() .setParent( - BigQueryHelpers.toProjectResourceName(options.getProject())) + BigQueryHelpers.toProjectResourceName( + options.getBigQueryProject() == null + ? options.getProject() + : options.getBigQueryProject())) .setReadSession( ReadSession.newBuilder() .setTable( @@ -1364,20 +1398,24 @@ void cleanup(ContextContainer c) throws Exception { TableReference tempTable = createTempTableReference( - options.getProject(), + options.getBigQueryProject() == null + ? 
options.getProject() + : options.getBigQueryProject(), BigQueryResourceNaming.createJobIdPrefix( options.getJobName(), jobUuid, JobType.QUERY), queryTempDataset); - DatasetService datasetService = getBigQueryServices().getDatasetService(options); - LOG.info("Deleting temporary table with query results {}", tempTable); - datasetService.deleteTable(tempTable); - // Delete dataset only if it was created by Beam - boolean datasetCreatedByBeam = !queryTempDataset.isPresent(); - if (datasetCreatedByBeam) { - LOG.info( - "Deleting temporary dataset with query results {}", tempTable.getDatasetId()); - datasetService.deleteDataset(tempTable.getProjectId(), tempTable.getDatasetId()); + try (DatasetService datasetService = + getBigQueryServices().getDatasetService(options)) { + LOG.info("Deleting temporary table with query results {}", tempTable); + datasetService.deleteTable(tempTable); + // Delete dataset only if it was created by Beam + boolean datasetCreatedByBeam = !queryTempDataset.isPresent(); + if (datasetCreatedByBeam) { + LOG.info( + "Deleting temporary dataset with query results {}", tempTable.getDatasetId()); + datasetService.deleteDataset(tempTable.getProjectId(), tempTable.getDatasetId()); + } } } }; @@ -1536,6 +1574,12 @@ public TypedRead withMethod(Method method) { return toBuilder().setMethod(method).build(); } + /** See {@link DataFormat}. */ + @Experimental(Experimental.Kind.SOURCE_SINK) + public TypedRead withFormat(DataFormat format) { + return toBuilder().setFormat(format).build(); + } + /** See {@link #withSelectedFields(ValueProvider)}. */ public TypedRead withSelectedFields(List selectedFields) { return withSelectedFields(StaticValueProvider.of(selectedFields)); @@ -1654,6 +1698,7 @@ public static Write write() { .setWriteDisposition(Write.WriteDisposition.WRITE_EMPTY) .setSchemaUpdateOptions(Collections.emptySet()) .setNumFileShards(0) + .setNumStorageWriteApiStreams(0) .setMethod(Write.Method.DEFAULT) .setExtendedErrorInfo(false) .setSkipInvalidRows(false) @@ -1664,6 +1709,7 @@ public static Write write() { .setMaxBytesPerPartition(BatchLoads.DEFAULT_MAX_BYTES_PER_PARTITION) .setOptimizeWrites(false) .setUseBeamSchema(false) + .setAutoSharding(false) .build(); } @@ -1713,7 +1759,9 @@ public enum Method { * href="https://cloud.google.com/bigquery/streaming-data-into-bigquery">Streaming Data into * BigQuery. */ - STREAMING_INSERTS + STREAMING_INSERTS, + /** Use the new, experimental Storage Write API. 
*/ + STORAGE_WRITE_API } abstract @Nullable ValueProvider getJsonTableRef(); @@ -1760,6 +1808,8 @@ public enum Method { abstract int getNumFileShards(); + abstract int getNumStorageWriteApiStreams(); + abstract int getMaxFilesPerPartition(); abstract long getMaxBytesPerPartition(); @@ -1789,6 +1839,9 @@ public enum Method { @Experimental(Kind.SCHEMAS) abstract Boolean getUseBeamSchema(); + @Experimental + abstract Boolean getAutoSharding(); + abstract Builder toBuilder(); @AutoValue.Builder @@ -1839,6 +1892,8 @@ abstract Builder setAvroSchemaFactory( abstract Builder setNumFileShards(int numFileShards); + abstract Builder setNumStorageWriteApiStreams(int numStorageApiStreams); + abstract Builder setMaxFilesPerPartition(int maxFilesPerPartition); abstract Builder setMaxBytesPerPartition(long maxBytesPerPartition); @@ -1868,6 +1923,9 @@ abstract Builder setAvroSchemaFactory( @Experimental(Kind.SCHEMAS) abstract Builder setUseBeamSchema(Boolean useBeamSchema); + @Experimental + abstract Builder setAutoSharding(Boolean autoSharding); + abstract Write build(); } @@ -2182,7 +2240,12 @@ public Write withWriteDisposition(WriteDisposition writeDisposition) { return toBuilder().setWriteDisposition(writeDisposition).build(); } - /** Allows the schema of the destination table to be updated as a side effect of the write. */ + /** + * Allows the schema of the destination table to be updated as a side effect of the write. + * + *

    This configuration applies only when writing to BigQuery with {@link Method#FILE_LOADS} as + * method. + */ public Write withSchemaUpdateOptions(Set schemaUpdateOptions) { checkArgument(schemaUpdateOptions != null, "schemaUpdateOptions can not be null"); return toBuilder().setSchemaUpdateOptions(schemaUpdateOptions).build(); @@ -2255,13 +2318,27 @@ public Write withTriggeringFrequency(Duration triggeringFrequency) { /** * Control how many file shards are written when using BigQuery load jobs. Applicable only when - * also setting {@link #withTriggeringFrequency}. + * also setting {@link #withTriggeringFrequency}. To let runner determine the sharding at + * runtime, set {@link #withAutoSharding()} instead. */ public Write withNumFileShards(int numFileShards) { checkArgument(numFileShards > 0, "numFileShards must be > 0, but was: %s", numFileShards); return toBuilder().setNumFileShards(numFileShards).build(); } + /** + * Control how many parallel streams are used when using Storage API writes. Applicable only + * when also setting {@link #withTriggeringFrequency}. To let runner determine the sharding at + * runtime, set {@link #withAutoSharding()} instead. + */ + public Write withNumStorageWriteApiStreams(int numStorageWriteApiStreams) { + checkArgument( + numStorageWriteApiStreams > 0, + "numStorageWriteApiStreams must be > 0, but was: %s", + numStorageWriteApiStreams); + return toBuilder().setNumStorageWriteApiStreams(numStorageWriteApiStreams).build(); + } + /** * Provides a custom location on GCS for storing temporary files to be loaded via BigQuery batch * load jobs. See "Usage with templates" in {@link BigQueryIO} documentation for discussion. @@ -2337,6 +2414,17 @@ public Write useBeamSchema() { return toBuilder().setUseBeamSchema(true).build(); } + /** + * If true, enables using a dynamically determined number of shards to write to BigQuery. This + * can be used for both {@link Method#FILE_LOADS} and {@link Method#STREAMING_INSERTS}. Only + * applicable to unbounded data. If using {@link Method#FILE_LOADS}, numFileShards set via + * {@link #withNumFileShards} will be ignored. + */ + @Experimental + public Write withAutoSharding() { + return toBuilder().setAutoSharding(true).build(); + } + @VisibleForTesting /** This method is for test usage only */ public Write withTestServices(BigQueryServices testServices) { @@ -2402,17 +2490,20 @@ public void validate(PipelineOptions pipelineOptions) { // The user specified a table. if (getJsonTableRef() != null && getJsonTableRef().isAccessible() && getValidate()) { TableReference table = getTableWithDefaultProject(options).get(); - DatasetService datasetService = getBigQueryServices().getDatasetService(options); - // Check for destination table presence and emptiness for early failure notification. - // Note that a presence check can fail when the table or dataset is created by an earlier - // stage of the pipeline. For these cases the #withoutValidation method can be used to - // disable the check. - BigQueryHelpers.verifyDatasetPresence(datasetService, table); - if (getCreateDisposition() == BigQueryIO.Write.CreateDisposition.CREATE_NEVER) { - BigQueryHelpers.verifyTablePresence(datasetService, table); - } - if (getWriteDisposition() == BigQueryIO.Write.WriteDisposition.WRITE_EMPTY) { - BigQueryHelpers.verifyTableNotExistOrEmpty(datasetService, table); + try (DatasetService datasetService = getBigQueryServices().getDatasetService(options)) { + // Check for destination table presence and emptiness for early failure notification. 
+ // Note that a presence check can fail when the table or dataset is created by an earlier + // stage of the pipeline. For these cases the #withoutValidation method can be used to + // disable the check. + BigQueryHelpers.verifyDatasetPresence(datasetService, table); + if (getCreateDisposition() == BigQueryIO.Write.CreateDisposition.CREATE_NEVER) { + BigQueryHelpers.verifyTablePresence(datasetService, table); + } + if (getWriteDisposition() == BigQueryIO.Write.WriteDisposition.WRITE_EMPTY) { + BigQueryHelpers.verifyTableNotExistOrEmpty(datasetService, table); + } + } catch (Exception e) { + throw new RuntimeException(e); } } } @@ -2421,6 +2512,9 @@ private Method resolveMethod(PCollection input) { if (getMethod() != Method.DEFAULT) { return getMethod(); } + if (input.getPipeline().getOptions().as(BigQueryOptions.class).getUseStorageWriteApi()) { + return Method.STORAGE_WRITE_API; + } // By default, when writing an Unbounded PCollection, we use StreamingInserts and // BigQuery's streaming import API. return (input.isBounded() == IsBounded.UNBOUNDED) @@ -2428,6 +2522,23 @@ private Method resolveMethod(PCollection input) { : Method.FILE_LOADS; } + private Duration getStorageApiTriggeringFrequency(BigQueryOptions options) { + if (getTriggeringFrequency() != null) { + return getTriggeringFrequency(); + } + if (options.getStorageWriteApiTriggeringFrequencySec() != null) { + return Duration.standardSeconds(options.getStorageWriteApiTriggeringFrequencySec()); + } + return null; + } + + private int getStorageApiNumStreams(BigQueryOptions options) { + if (getNumStorageWriteApiStreams() != 0) { + return getNumStorageWriteApiStreams(); + } + return options.getNumStorageWriteApiStreams(); + } + @Override public WriteResult expand(PCollection input) { // We must have a destination to write to! @@ -2445,7 +2556,7 @@ public WriteResult expand(PCollection input) { allToArgs.stream() .filter(Predicates.notNull()::apply) .collect(Collectors.toList())), - "Exactly one of jsonTableRef, tableFunction, or " + "dynamicDestinations must be set"); + "Exactly one of jsonTableRef, tableFunction, or dynamicDestinations must be set"); List allSchemaArgs = Lists.newArrayList(getJsonSchema(), getSchemaFromView(), getDynamicDestinations()); @@ -2455,19 +2566,25 @@ public WriteResult expand(PCollection input) { allSchemaArgs.stream() .filter(Predicates.notNull()::apply) .collect(Collectors.toList())), - "No more than one of jsonSchema, schemaFromView, or dynamicDestinations may " + "be set"); + "No more than one of jsonSchema, schemaFromView, or dynamicDestinations may be set"); Method method = resolveMethod(input); - if (input.isBounded() == IsBounded.UNBOUNDED && method == Method.FILE_LOADS) { + if (input.isBounded() == IsBounded.UNBOUNDED + && (method == Method.FILE_LOADS || method == Method.STORAGE_WRITE_API)) { + BigQueryOptions bqOptions = input.getPipeline().getOptions().as(BigQueryOptions.class); + Duration triggeringFrequency = + (method == Method.STORAGE_WRITE_API) + ? 
getStorageApiTriggeringFrequency(bqOptions) + : getTriggeringFrequency(); checkArgument( - getTriggeringFrequency() != null, - "When writing an unbounded PCollection via FILE_LOADS, " + triggeringFrequency != null, + "When writing an unbounded PCollection via FILE_LOADS or STORAGE_API_WRITES, " + "triggering frequency must be specified"); } else { checkArgument( getTriggeringFrequency() == null && getNumFileShards() == 0, "Triggering frequency or number of file shards can be specified only when writing " - + "an unbounded PCollection via FILE_LOADS, but: the collection was %s " + + "an unbounded PCollection via FILE_LOADS or STORAGE_API_WRITES, but: the collection was %s " + "and the method was %s", input.isBounded(), method); @@ -2482,6 +2599,13 @@ public WriteResult expand(PCollection input) { method); } + if (input.isBounded() == IsBounded.BOUNDED) { + checkArgument(!getAutoSharding(), "Auto-sharding is only applicable to unbounded input."); + } + if (method == Method.STORAGE_WRITE_API) { + checkArgument(!getAutoSharding(), "Auto sharding not yet available for Storage API writes"); + } + if (getJsonTimePartitioning() != null) { checkArgument( getDynamicDestinations() == null, @@ -2549,7 +2673,7 @@ private WriteResult expandTyped( || getSchemaFromView() != null; if (getUseBeamSchema()) { - checkArgument(input.hasSchema()); + checkArgument(input.hasSchema(), "The input doesn't has a schema"); optimizeWrites = true; checkArgument( @@ -2575,7 +2699,7 @@ private WriteResult expandTyped( "CreateDisposition is CREATE_IF_NEEDED, however no schema was provided."); } - Coder destinationCoder = null; + Coder destinationCoder; try { destinationCoder = dynamicDestinations.getDestinationCoderWithDefault( @@ -2613,7 +2737,9 @@ private WriteResult expandTyped( + "A format function is not required if Beam schemas are used."); } } else { - checkArgument(avroRowWriterFactory == null); + checkArgument( + avroRowWriterFactory == null, + "When using a formatFunction, the AvroRowWriterFactory should be null"); checkArgument( formatFunction != null, "A function must be provided to convert the input type into a TableRow or " @@ -2624,32 +2750,38 @@ private WriteResult expandTyped( rowWriterFactory = RowWriterFactory.tableRows(formatFunction, formatRecordOnFailureFunction); } + PCollection> rowsWithDestination = input .apply( "PrepareWrite", new PrepareWrite<>(dynamicDestinations, SerializableFunctions.identity())) .setCoder(KvCoder.of(destinationCoder, input.getCoder())); + return continueExpandTyped( rowsWithDestination, input.getCoder(), + getUseBeamSchema() ? input.getSchema() : null, + getUseBeamSchema() ? 
input.getToRowFunction() : null, destinationCoder, dynamicDestinations, rowWriterFactory, method); } - private WriteResult continueExpandTyped( - PCollection> input, - Coder elementCoder, + private WriteResult continueExpandTyped( + PCollection> input, + Coder elementCoder, + @Nullable Schema elementSchema, + @Nullable SerializableFunction elementToRowFunction, Coder destinationCoder, DynamicDestinations dynamicDestinations, - RowWriterFactory rowWriterFactory, + RowWriterFactory rowWriterFactory, Method method) { if (method == Method.STREAMING_INSERTS) { checkArgument( getWriteDisposition() != WriteDisposition.WRITE_TRUNCATE, - "WriteDisposition.WRITE_TRUNCATE is not supported for an unbounded" + " PCollection."); + "WriteDisposition.WRITE_TRUNCATE is not supported for an unbounded PCollection."); InsertRetryPolicy retryPolicy = MoreObjects.firstNonNull(getFailedInsertRetryPolicy(), InsertRetryPolicy.alwaysRetry()); @@ -2660,10 +2792,9 @@ private WriteResult continueExpandTyped( getSchemaUpdateOptions() == null || getSchemaUpdateOptions().isEmpty(), "SchemaUpdateOptions are not supported when method == STREAMING_INSERTS"); - RowWriterFactory.TableRowWriterFactory tableRowWriterFactory = - (RowWriterFactory.TableRowWriterFactory) rowWriterFactory; - - StreamingInserts streamingInserts = + RowWriterFactory.TableRowWriterFactory tableRowWriterFactory = + (RowWriterFactory.TableRowWriterFactory) rowWriterFactory; + StreamingInserts streamingInserts = new StreamingInserts<>( getCreateDisposition(), dynamicDestinations, @@ -2676,9 +2807,10 @@ private WriteResult continueExpandTyped( .withSkipInvalidRows(getSkipInvalidRows()) .withIgnoreUnknownValues(getIgnoreUnknownValues()) .withIgnoreInsertIds(getIgnoreInsertIds()) + .withAutoSharding(getAutoSharding()) .withKmsKey(getKmsKey()); return input.apply(streamingInserts); - } else { + } else if (method == Method.FILE_LOADS) { checkArgument( getFailedInsertRetryPolicy() == null, "Record-insert retry policies are not supported when using BigQuery load jobs."); @@ -2689,7 +2821,7 @@ private WriteResult continueExpandTyped( "useAvroLogicalTypes can only be set with Avro output."); } - BatchLoads batchLoads = + BatchLoads batchLoads = new BatchLoads<>( getWriteDisposition(), getCreateDisposition(), @@ -2723,8 +2855,43 @@ private WriteResult continueExpandTyped( batchLoads.setMaxRetryJobs(1000); } batchLoads.setTriggeringFrequency(getTriggeringFrequency()); - batchLoads.setNumFileShards(getNumFileShards()); + if (getAutoSharding()) { + batchLoads.setNumFileShards(0); + } else { + batchLoads.setNumFileShards(getNumFileShards()); + } return input.apply(batchLoads); + } else if (method == Method.STORAGE_WRITE_API) { + StorageApiDynamicDestinations storageApiDynamicDestinations; + if (getUseBeamSchema()) { + // This ensures that the Beam rows are directly translated into protos for Storage API writes, + // with no need to round trip through JSON TableRow objects. + storageApiDynamicDestinations = + new StorageApiDynamicDestinationsBeamRow( + dynamicDestinations, elementSchema, elementToRowFunction); + } else { + RowWriterFactory.TableRowWriterFactory tableRowWriterFactory = + (RowWriterFactory.TableRowWriterFactory) rowWriterFactory; + // Fallback behavior: convert to JSON TableRows and convert those into Beam TableRows. 
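+            // This path is less efficient than the Beam-row path above, since each element is first
+            // materialized as a JSON-style TableRow before it reaches the Storage API sink.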
+ storageApiDynamicDestinations = + new StorageApiDynamicDestinationsTableRow<>( + dynamicDestinations, tableRowWriterFactory.getToRowFn()); + } + + BigQueryOptions bqOptions = input.getPipeline().getOptions().as(BigQueryOptions.class); + StorageApiLoads storageApiLoads = + new StorageApiLoads( + destinationCoder, + storageApiDynamicDestinations, + getCreateDisposition(), + getKmsKey(), + getStorageApiTriggeringFrequency(bqOptions), + getBigQueryServices(), + getStorageApiNumStreams(bqOptions)); + return input.apply("StorageApiLoads", storageApiLoads); + } else { + throw new RuntimeException("Unexpected write method " + method); } } @@ -2786,7 +2953,10 @@ ValueProvider getTableWithDefaultProject(BigQueryOptions bqOptio // If user does not specify a project we assume the table to be located in // the default project. TableReference tableRef = table.get(); - tableRef.setProjectId(bqOptions.getProject()); + tableRef.setProjectId( + bqOptions.getBigQueryProject() == null + ? bqOptions.getProject() + : bqOptions.getBigQueryProject()); return NestedValueProvider.of( StaticValueProvider.of(BigQueryHelpers.toJsonString(tableRef)), new JsonTableRefToTableRef()); diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java index 8944f9c5ce03..165898599c8f 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java @@ -72,10 +72,42 @@ public interface BigQueryOptions void setMaxStreamingBatchSize(Long value); @Description( - "The minimum duration in seconds between percentile latencies logging. The interval " - + "might be longer than the specified value due to each bundle processing time.") + "The minimum duration in seconds between streaming API statistics logging. 
" + + "The interval might be longer than the specified value due to each bundle processing time.") @Default.Integer(180) - Integer getLatencyLoggingFrequency(); + Integer getBqStreamingApiLoggingFrequencySec(); - void setLatencyLoggingFrequency(Integer value); + void setBqStreamingApiLoggingFrequencySec(Integer value); + + @Description("If set, then BigQueryIO.Write will default to using the Storage Write API.") + @Default.Boolean(false) + Boolean getUseStorageWriteApi(); + + void setUseStorageWriteApi(Boolean value); + + @Description( + "If set, then BigQueryIO.Write will default to using this number of Storage Write API streams.") + @Default.Integer(0) + Integer getNumStorageWriteApiStreams(); + + void setNumStorageWriteApiStreams(Integer value); + + @Description( + "If set, then BigQueryIO.Write will default to triggering the Storage Write API writes this often.") + Integer getStorageWriteApiTriggeringFrequencySec(); + + void setStorageWriteApiTriggeringFrequencySec(Integer value); + + @Description( + "When auto-sharding is used, the maximum duration in milliseconds the input records are" + + " allowed to be buffered before being written to BigQuery.") + @Default.Integer(0) + Integer getMaxBufferingDurationMilliSec(); + + void setMaxBufferingDurationMilliSec(Integer value); + + @Description("If specified, it will override the default (GcpOptions#getProject()) project id.") + String getBigQueryProject(); + + void setBigQueryProject(String value); } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryQueryHelper.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryQueryHelper.java index 4193ba682cc5..b78afc514aee 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryQueryHelper.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryQueryHelper.java @@ -70,7 +70,9 @@ public static JobStatistics dryRunQueryIfNeeded( bqServices .getJobService(options) .dryRunQuery( - options.getProject(), + options.getBigQueryProject() == null + ? options.getProject() + : options.getBigQueryProject(), createBasicQueryConfig(query, flattenResults, useLegacySql), location); dryRunJobStats.compareAndSet(null, jobStatistics); @@ -109,7 +111,10 @@ public static TableReference executeQuery( .getReferencedTables(); if (referencedTables != null && !referencedTables.isEmpty()) { TableReference referencedTable = referencedTables.get(0); - effectiveLocation = tableService.getTable(referencedTable).getLocation(); + effectiveLocation = + tableService + .getDataset(referencedTable.getProjectId(), referencedTable.getDatasetId()) + .getLocation(); } } @@ -119,7 +124,12 @@ public static TableReference executeQuery( BigQueryResourceNaming.createJobIdPrefix(options.getJobName(), stepUuid, JobType.QUERY); Optional queryTempDatasetOpt = Optional.ofNullable(queryTempDatasetId); TableReference queryResultTable = - createTempTableReference(options.getProject(), queryJobId, queryTempDatasetOpt); + createTempTableReference( + options.getBigQueryProject() == null + ? 
options.getProject() + : options.getBigQueryProject(), + queryJobId, + queryTempDatasetOpt); boolean beamToCreateTempDataset = !queryTempDatasetOpt.isPresent(); // Create dataset only if it has not been set by the user @@ -153,7 +163,10 @@ public static TableReference executeQuery( JobReference jobReference = new JobReference() - .setProjectId(options.getProject()) + .setProjectId( + options.getBigQueryProject() == null + ? options.getProject() + : options.getBigQueryProject()) .setLocation(effectiveLocation) .setJobId(queryJobId); diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryQuerySourceDef.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryQuerySourceDef.java index 090f5f19536e..2da260eabadf 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryQuerySourceDef.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryQuerySourceDef.java @@ -128,7 +128,9 @@ void cleanupTempResource(BigQueryOptions bqOptions, String stepUuid) throws Exce Optional queryTempDatasetOpt = Optional.ofNullable(tempDatasetId); TableReference tableToRemove = createTempTableReference( - bqOptions.getProject(), + bqOptions.getBigQueryProject() == null + ? bqOptions.getProject() + : bqOptions.getBigQueryProject(), BigQueryResourceNaming.createJobIdPrefix( bqOptions.getJobName(), stepUuid, JobType.QUERY), queryTempDatasetOpt); diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryResourceNaming.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryResourceNaming.java index 7e800fd0e898..7eae6fef33b5 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryResourceNaming.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryResourceNaming.java @@ -69,6 +69,7 @@ static String createJobIdWithDestination( public enum JobType { LOAD, + TEMP_TABLE_LOAD, COPY, EXPORT, QUERY, diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServices.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServices.java index f6894ca875e4..29d5c00a0a3c 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServices.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServices.java @@ -17,6 +17,7 @@ */ package org.apache.beam.sdk.io.gcp.bigquery; +import com.google.api.core.ApiFuture; import com.google.api.services.bigquery.model.Dataset; import com.google.api.services.bigquery.model.Job; import com.google.api.services.bigquery.model.JobConfigurationExtract; @@ -34,10 +35,16 @@ import com.google.cloud.bigquery.storage.v1.ReadSession; import com.google.cloud.bigquery.storage.v1.SplitReadStreamRequest; import com.google.cloud.bigquery.storage.v1.SplitReadStreamResponse; +import com.google.cloud.bigquery.storage.v1beta2.AppendRowsResponse; +import com.google.cloud.bigquery.storage.v1beta2.BatchCommitWriteStreamsResponse; +import com.google.cloud.bigquery.storage.v1beta2.FinalizeWriteStreamResponse; +import com.google.cloud.bigquery.storage.v1beta2.FlushRowsResponse; +import com.google.cloud.bigquery.storage.v1beta2.ProtoRows; +import 
com.google.cloud.bigquery.storage.v1beta2.WriteStream; +import com.google.protobuf.Descriptors.Descriptor; import java.io.IOException; import java.io.Serializable; import java.util.List; -import org.apache.beam.sdk.util.Histogram; import org.apache.beam.sdk.values.FailsafeValueInSingleWindow; import org.apache.beam.sdk.values.ValueInSingleWindow; import org.checkerframework.checker.nullness.qual.Nullable; @@ -51,14 +58,11 @@ public interface BigQueryServices extends Serializable { /** Returns a real, mock, or fake {@link DatasetService}. */ DatasetService getDatasetService(BigQueryOptions bqOptions); - /** Returns a real, mock, or fake {@link DatasetService}. */ - DatasetService getDatasetService(BigQueryOptions bqOptions, Histogram requestLatencies); - /** Returns a real, mock, or fake {@link StorageClient}. */ StorageClient getStorageClient(BigQueryOptions bqOptions) throws IOException; /** An interface for the Cloud BigQuery load service. */ - public interface JobService { + public interface JobService extends AutoCloseable { /** Start a BigQuery load job. */ void startLoadJob(JobReference jobRef, JobConfigurationLoad loadConfig) throws InterruptedException, IOException; @@ -94,7 +98,7 @@ JobStatistics dryRunQuery(String projectId, JobConfigurationQuery queryConfig, S } /** An interface to get, create and delete Cloud BigQuery datasets and tables. */ - public interface DatasetService { + public interface DatasetService extends AutoCloseable { /** * Gets the specified {@link Table} resource by table ID. * @@ -162,12 +166,54 @@ long insertAll( ErrorContainer errorContainer, boolean skipInvalidRows, boolean ignoreUnknownValues, - boolean ignoreInsertIds) + boolean ignoreInsertIds, + List> successfulRows) throws IOException, InterruptedException; /** Patch BigQuery {@link Table} description. */ Table patchTableDescription(TableReference tableReference, @Nullable String tableDescription) throws IOException, InterruptedException; + + /** Create a Write Stream for use with the Storage Write API. */ + WriteStream createWriteStream(String tableUrn, WriteStream.Type type) + throws IOException, InterruptedException; + + /** + * Create an append client for a given Storage API write stream. The stream must be created + * first. + */ + StreamAppendClient getStreamAppendClient(String streamName, Descriptor descriptor) + throws Exception; + + /** Flush a given stream up to the given offset. The stream must have type BUFFERED. */ + ApiFuture flush(String streamName, long flushOffset) + throws IOException, InterruptedException; + + /** + * Finalize a write stream. After finalization, no more records can be appended to the stream. + */ + ApiFuture finalizeWriteStream(String streamName); + + /** Commit write streams of type PENDING. The streams must be finalized before committing. */ + ApiFuture commitWriteStreams( + String tableUrn, Iterable writeStreamNames); + } + + /** An interface for appending records to a Storage API write stream. */ + interface StreamAppendClient extends AutoCloseable { + /** Append rows to a Storage API write stream at the given offset. */ + ApiFuture appendRows(long offset, ProtoRows rows) throws Exception; + + /** + * Pin this object. If close() is called before all pins are removed, the underlying resources + * will not be freed until all pins are removed. + */ + void pin(); + + /** + * Unpin this object. If the object has been closed, this will release any underlying resources. 
+ */ + void unpin() throws Exception; } /** @@ -185,14 +231,23 @@ interface BigQueryServerStream extends Iterable, Serializable { /** An interface representing a client object for making calls to the BigQuery Storage API. */ interface StorageClient extends AutoCloseable { - /** Create a new read session against an existing table. */ + /** + * Create a new read session against an existing table. This method variant collects request + * count metric, table id in the request. + */ ReadSession createReadSession(CreateReadSessionRequest request); /** Read rows in the context of a specific read stream. */ BigQueryServerStream readRows(ReadRowsRequest request); + /* This method variant collects request count metric, using the fullTableID metadata. */ + BigQueryServerStream readRows(ReadRowsRequest request, String fullTableId); + SplitReadStreamResponse splitReadStream(SplitReadStreamRequest request); + /* This method variant collects request count metric, using the fullTableID metadata. */ + SplitReadStreamResponse splitReadStream(SplitReadStreamRequest request, String fullTableId); + /** * Close the client object. * diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java index 3d97b4234652..a636b0449ea4 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java @@ -27,8 +27,9 @@ import com.google.api.client.util.BackOffUtils; import com.google.api.client.util.ExponentialBackOff; import com.google.api.client.util.Sleeper; +import com.google.api.core.ApiFuture; import com.google.api.gax.core.FixedCredentialsProvider; -import com.google.api.gax.retrying.RetrySettings; +import com.google.api.gax.rpc.ApiException; import com.google.api.gax.rpc.FixedHeaderProvider; import com.google.api.gax.rpc.HeaderProvider; import com.google.api.gax.rpc.ServerStream; @@ -37,6 +38,7 @@ import com.google.api.services.bigquery.Bigquery.Tables; import com.google.api.services.bigquery.model.Dataset; import com.google.api.services.bigquery.model.DatasetReference; +import com.google.api.services.bigquery.model.ErrorProto; import com.google.api.services.bigquery.model.Job; import com.google.api.services.bigquery.model.JobConfiguration; import com.google.api.services.bigquery.model.JobConfigurationExtract; @@ -63,15 +65,38 @@ import com.google.cloud.bigquery.storage.v1.ReadSession; import com.google.cloud.bigquery.storage.v1.SplitReadStreamRequest; import com.google.cloud.bigquery.storage.v1.SplitReadStreamResponse; +import com.google.cloud.bigquery.storage.v1beta2.AppendRowsResponse; +import com.google.cloud.bigquery.storage.v1beta2.BatchCommitWriteStreamsRequest; +import com.google.cloud.bigquery.storage.v1beta2.BatchCommitWriteStreamsResponse; +import com.google.cloud.bigquery.storage.v1beta2.BigQueryWriteClient; +import com.google.cloud.bigquery.storage.v1beta2.BigQueryWriteSettings; +import com.google.cloud.bigquery.storage.v1beta2.CreateWriteStreamRequest; +import com.google.cloud.bigquery.storage.v1beta2.FinalizeWriteStreamRequest; +import com.google.cloud.bigquery.storage.v1beta2.FinalizeWriteStreamResponse; +import com.google.cloud.bigquery.storage.v1beta2.FlushRowsRequest; +import com.google.cloud.bigquery.storage.v1beta2.FlushRowsResponse; +import 
com.google.cloud.bigquery.storage.v1beta2.ProtoRows; +import com.google.cloud.bigquery.storage.v1beta2.ProtoSchema; +import com.google.cloud.bigquery.storage.v1beta2.StreamWriterV2; +import com.google.cloud.bigquery.storage.v1beta2.WriteStream; import com.google.cloud.hadoop.util.ApiErrorExtractor; import com.google.cloud.hadoop.util.ChainingHttpRequestInitializer; +import com.google.protobuf.Descriptors.Descriptor; +import com.google.protobuf.Int64Value; +import com.google.rpc.RetryInfo; +import io.grpc.Metadata; +import io.grpc.Status; +import io.grpc.Status.Code; +import io.grpc.protobuf.ProtoUtils; import java.io.IOException; import java.util.ArrayList; import java.util.Collection; import java.util.HashMap; +import java.util.HashSet; import java.util.Iterator; import java.util.List; import java.util.Map; +import java.util.Set; import java.util.concurrent.Callable; import java.util.concurrent.ExecutionException; import java.util.concurrent.ExecutorService; @@ -81,7 +106,10 @@ import java.util.concurrent.TimeoutException; import java.util.concurrent.atomic.AtomicLong; import java.util.stream.Collectors; +import org.apache.beam.runners.core.metrics.MonitoringInfoConstants; +import org.apache.beam.runners.core.metrics.ServiceCallMetric; import org.apache.beam.sdk.extensions.gcp.auth.NullCredentialInitializer; +import org.apache.beam.sdk.extensions.gcp.options.GcpOptions; import org.apache.beam.sdk.extensions.gcp.options.GcsOptions; import org.apache.beam.sdk.extensions.gcp.util.BackOffAdapter; import org.apache.beam.sdk.extensions.gcp.util.CustomHttpErrors; @@ -93,12 +121,13 @@ import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.transforms.SerializableFunction; import org.apache.beam.sdk.util.FluentBackoff; -import org.apache.beam.sdk.util.Histogram; import org.apache.beam.sdk.util.ReleaseInfo; import org.apache.beam.sdk.values.FailsafeValueInSingleWindow; import org.apache.beam.sdk.values.ValueInSingleWindow; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.checkerframework.checker.nullness.qual.Nullable; import org.joda.time.Duration; @@ -110,8 +139,9 @@ * service. */ @SuppressWarnings({"keyfor", "nullness"}) // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -class BigQueryServicesImpl implements BigQueryServices { +// TODO(https://issues.apache.org/jira/browse/BEAM-10402) +class BigQueryServicesImpl implements BigQueryServices { private static final Logger LOG = LoggerFactory.getLogger(BigQueryServicesImpl.class); // How frequently to log while polling. 
@@ -132,6 +162,14 @@ class BigQueryServicesImpl implements BigQueryServices { // The error code for quota exceeded error (https://cloud.google.com/bigquery/docs/error-messages) private static final String QUOTA_EXCEEDED = "quotaExceeded"; + protected static final Map API_METRIC_LABEL = + ImmutableMap.of( + MonitoringInfoConstants.Labels.SERVICE, "BigQuery", + MonitoringInfoConstants.Labels.METHOD, "BigQueryBatchWrite"); + + private static final Metadata.Key KEY_RETRY_INFO = + ProtoUtils.keyForProto(RetryInfo.getDefaultInstance()); + @Override public JobService getJobService(BigQueryOptions options) { return new JobServiceImpl(options); @@ -139,12 +177,7 @@ public JobService getJobService(BigQueryOptions options) { @Override public DatasetService getDatasetService(BigQueryOptions options) { - return new DatasetServiceImpl(options, null); - } - - @Override - public DatasetService getDatasetService(BigQueryOptions options, Histogram requestLatencies) { - return new DatasetServiceImpl(options, requestLatencies); + return new DatasetServiceImpl(options); } @Override @@ -171,7 +204,7 @@ static class JobServiceImpl implements BigQueryServices.JobService { private JobServiceImpl(BigQueryOptions options) { this.errorExtractor = new ApiErrorExtractor(); - this.client = newBigQueryClient(options, null).build(); + this.client = newBigQueryClient(options).build(); this.bqIOMetadata = BigQueryIOMetadata.create(); } @@ -421,10 +454,14 @@ public Job getJob(JobReference jobRef, Sleeper sleeper, BackOff backoff) jobRef, MAX_RPC_RETRIES), lastException); } + + @Override + public void close() throws Exception {} } @VisibleForTesting static class DatasetServiceImpl implements DatasetService { + // Backoff: 200ms * 1.5 ^ n, n=[1,5] private static final FluentBackoff INSERT_BACKOFF_FACTORY = FluentBackoff.DEFAULT.withInitialBackoff(Duration.millis(200)).withMaxRetries(5); @@ -436,6 +473,7 @@ static class DatasetServiceImpl implements DatasetService { private final ApiErrorExtractor errorExtractor; private final Bigquery client; + @Nullable private final BigQueryWriteClient newWriteClient; private final PipelineOptions options; private final long maxRowsPerBatch; private final long maxRowBatchSize; @@ -446,10 +484,12 @@ static class DatasetServiceImpl implements DatasetService { private ExecutorService executor; @VisibleForTesting - DatasetServiceImpl(Bigquery client, PipelineOptions options) { + DatasetServiceImpl( + Bigquery client, @Nullable BigQueryWriteClient newWriteClient, PipelineOptions options) { BigQueryOptions bqOptions = options.as(BigQueryOptions.class); this.errorExtractor = new ApiErrorExtractor(); this.client = client; + this.newWriteClient = newWriteClient; this.options = options; this.maxRowsPerBatch = bqOptions.getMaxStreamingRowsToBatch(); this.maxRowBatchSize = bqOptions.getMaxStreamingBatchSize(); @@ -457,19 +497,25 @@ static class DatasetServiceImpl implements DatasetService { } @VisibleForTesting - DatasetServiceImpl(Bigquery client, PipelineOptions options, long maxRowsPerBatch) { + DatasetServiceImpl( + Bigquery client, + BigQueryWriteClient newWriteClient, + PipelineOptions options, + long maxRowsPerBatch) { BigQueryOptions bqOptions = options.as(BigQueryOptions.class); this.errorExtractor = new ApiErrorExtractor(); this.client = client; + this.newWriteClient = newWriteClient; this.options = options; this.maxRowsPerBatch = maxRowsPerBatch; this.maxRowBatchSize = bqOptions.getMaxStreamingBatchSize(); this.executor = null; } - private DatasetServiceImpl(BigQueryOptions bqOptions, 
@Nullable Histogram requestLatencies) { + private DatasetServiceImpl(BigQueryOptions bqOptions) { this.errorExtractor = new ApiErrorExtractor(); - this.client = newBigQueryClient(bqOptions, requestLatencies).build(); + this.client = newBigQueryClient(bqOptions).build(); + this.newWriteClient = newBigQueryWriteClient(bqOptions); this.options = bqOptions; this.maxRowsPerBatch = bqOptions.getMaxStreamingRowsToBatch(); this.maxRowBatchSize = bqOptions.getMaxStreamingBatchSize(); @@ -584,7 +630,7 @@ Table tryCreateTable(Table table, BackOff backoff, Sleeper sleeper) throws IOExc table.getTableReference().getProjectId(), table.getTableReference().getDatasetId(), table.getTableReference().getTableId(), - TimeUnit.MILLISECONDS.toSeconds(RETRY_CREATE_TABLE_DURATION_MILLIS) / 60.0); + TimeUnit.MILLISECONDS.toMinutes(RETRY_CREATE_TABLE_DURATION_MILLIS)); retry = true; } continue; @@ -778,7 +824,8 @@ long insertAll( ErrorContainer errorContainer, boolean skipInvalidRows, boolean ignoreUnkownValues, - boolean ignoreInsertIds) + boolean ignoreInsertIds, + List> successfulRows) throws IOException, InterruptedException { checkNotNull(ref, "ref"); if (executor == null) { @@ -793,6 +840,7 @@ long insertAll( + "as many elements as rowList"); } + final Set failedIndices = new HashSet<>(); long retTotalDataSize = 0; List allErrors = new ArrayList<>(); // These lists contain the rows to publish. Initially the contain the entire list. @@ -802,6 +850,7 @@ long insertAll( if (!ignoreInsertIds) { idsToPublish = insertIdList; } + while (true) { List> retryRows = new ArrayList<>(); List retryIds = (idsToPublish != null) ? new ArrayList<>() : null; @@ -845,6 +894,8 @@ long insertAll( .insertAll(ref.getProjectId(), ref.getDatasetId(), ref.getTableId(), content) .setPrettyPrint(false); + // Create final reference (which cannot change). + // So the lamba expression can refer to rowsInsertedForRequest to use on error. 
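+          // Each submitted task executes one insert request, records the outcome on a
+          // ServiceCallMetric, and backs off and retries when BigQuery reports rate-limit/quota errors.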
futures.add( executor.submit( () -> { @@ -853,13 +904,27 @@ long insertAll( BackOffAdapter.toGcpBackOff(rateLimitBackoffFactory.backoff()); long totalBackoffMillis = 0L; while (true) { + ServiceCallMetric serviceCallMetric = BigQueryUtils.writeCallMetric(ref); try { - return insert.execute().getInsertErrors(); + List response = + insert.execute().getInsertErrors(); + if (response == null || response.isEmpty()) { + serviceCallMetric.call("ok"); + } else { + for (TableDataInsertAllResponse.InsertErrors insertErrors : response) { + for (ErrorProto insertError : insertErrors.getErrors()) { + serviceCallMetric.call(insertError.getReason()); + } + } + } + return response; } catch (IOException e) { GoogleJsonError.ErrorInfo errorInfo = getErrorInfo(e); if (errorInfo == null) { + serviceCallMetric.call(ServiceCallMetric.CANONICAL_STATUS_UNKNOWN); throw e; } + serviceCallMetric.call(errorInfo.getReason()); /** * TODO(BEAM-10584): Check for QUOTA_EXCEEDED error will be replaced by * ApiErrorExtractor.INSTANCE.quotaExceeded(e) after the next release of @@ -906,15 +971,19 @@ long insertAll( if (errors == null) { continue; } + for (TableDataInsertAllResponse.InsertErrors error : errors) { if (error.getIndex() == null) { throw new IOException("Insert failed: " + error + ", other errors: " + allErrors); } - int errorIndex = error.getIndex().intValue() + strideIndices.get(i); + failedIndices.add(errorIndex); if (retryPolicy.shouldRetry(new InsertRetryPolicy.Context(error))) { allErrors.add(error); retryRows.add(rowsToPublish.get(errorIndex)); + // TODO (BEAM-12139): Select the retry rows(using errorIndex) from the batch of rows + // which attempted insertion in this call. Not the entire set of rows in + // rowsToPublish. if (retryIds != null) { retryIds.add(idsToPublish.get(errorIndex)); } @@ -950,6 +1019,18 @@ long insertAll( allErrors.clear(); LOG.info("Retrying {} failed inserts to BigQuery", rowsToPublish.size()); } + if (successfulRows != null) { + for (int i = 0; i < rowsToPublish.size(); i++) { + if (!failedIndices.contains(i)) { + successfulRows.add( + ValueInSingleWindow.of( + rowsToPublish.get(i).getValue(), + rowsToPublish.get(i).getTimestamp(), + rowsToPublish.get(i).getWindow(), + rowsToPublish.get(i).getPane())); + } + } + } if (!allErrors.isEmpty()) { throw new IOException("Insert failed: " + allErrors); } else { @@ -967,7 +1048,8 @@ public long insertAll( ErrorContainer errorContainer, boolean skipInvalidRows, boolean ignoreUnknownValues, - boolean ignoreInsertIds) + boolean ignoreInsertIds, + List> successfulRows) throws IOException, InterruptedException { return insertAll( ref, @@ -981,7 +1063,8 @@ public long insertAll( errorContainer, skipInvalidRows, ignoreUnknownValues, - ignoreInsertIds); + ignoreInsertIds, + successfulRows); } protected GoogleJsonError.ErrorInfo getErrorInfo(IOException e) { @@ -1016,6 +1099,104 @@ public Table patchTableDescription( createDefaultBackoff(), ALWAYS_RETRY); } + + @Override + public WriteStream createWriteStream(String tableUrn, WriteStream.Type type) + throws IOException { + return newWriteClient.createWriteStream( + CreateWriteStreamRequest.newBuilder() + .setParent(tableUrn) + .setWriteStream(WriteStream.newBuilder().setType(type).build()) + .build()); + } + + @Override + public StreamAppendClient getStreamAppendClient(String streamName, Descriptor descriptor) + throws Exception { + ProtoSchema protoSchema = + ProtoSchema.newBuilder().setProtoDescriptor(descriptor.toProto()).build(); + StreamWriterV2 streamWriter = + 
StreamWriterV2.newBuilder(streamName).setWriterSchema(protoSchema).build(); + return new StreamAppendClient() { + private int pins = 0; + private boolean closed = false; + + @Override + public void close() throws Exception { + boolean closeWriter; + synchronized (this) { + Preconditions.checkState(!closed); + closed = true; + closeWriter = (pins == 0); + } + if (closeWriter) { + streamWriter.close(); + } + } + + @Override + public void pin() { + synchronized (this) { + Preconditions.checkState(!closed); + ++pins; + } + } + + @Override + public void unpin() throws Exception { + boolean closeWriter; + synchronized (this) { + Preconditions.checkState(pins > 0); + --pins; + closeWriter = (pins == 0) && closed; + } + if (closeWriter) { + streamWriter.close(); + } + } + + @Override + public ApiFuture appendRows(long offset, ProtoRows rows) + throws Exception { + return streamWriter.append(rows, offset); + } + }; + } + + @Override + public ApiFuture flush(String streamName, long flushOffset) + throws IOException, InterruptedException { + Int64Value offset = Int64Value.newBuilder().setValue(flushOffset).build(); + FlushRowsRequest request = + FlushRowsRequest.newBuilder().setWriteStream(streamName).setOffset(offset).build(); + return newWriteClient.flushRowsCallable().futureCall(request); + } + + @Override + public ApiFuture finalizeWriteStream(String streamName) { + return newWriteClient + .finalizeWriteStreamCallable() + .futureCall(FinalizeWriteStreamRequest.newBuilder().setName(streamName).build()); + } + + @Override + public ApiFuture commitWriteStreams( + String tableUrn, Iterable writeStreamNames) { + return newWriteClient + .batchCommitWriteStreamsCallable() + .futureCall( + BatchCommitWriteStreamsRequest.newBuilder() + .setParent(tableUrn) + .addAllWriteStreams(writeStreamNames) + .build()); + } + + @Override + public void close() throws Exception { + this.newWriteClient.shutdownNow(); + this.newWriteClient.awaitTermination(60, TimeUnit.SECONDS); + this.newWriteClient.close(); + } } static final SerializableFunction DONT_RETRY_NOT_FOUND = @@ -1059,8 +1240,9 @@ private static boolean nextBackOff(Sleeper sleeper, BackOff backoff) throws Inte } /** Returns a BigQuery client builder using the specified {@link BigQueryOptions}. */ - private static Bigquery.Builder newBigQueryClient( - BigQueryOptions options, @Nullable Histogram requestLatencies) { + private static Bigquery.Builder newBigQueryClient(BigQueryOptions options) { + // Do not log 404. It clutters the output and is possibly even required by the + // caller. RetryHttpRequestInitializer httpRequestInitializer = new RetryHttpRequestInitializer(ImmutableList.of(404)); httpRequestInitializer.setCustomErrors(createBigQueryClientCustomErrors()); @@ -1071,12 +1253,10 @@ private static Bigquery.Builder newBigQueryClient( credential == null ? new NullCredentialInitializer() : new HttpCredentialsAdapter(credential)); - // Do not log 404. It clutters the output and is possibly even required by the - // caller. 
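+    // Request latencies are now always recorded, tagged with the BigQuery batch-write
+    // service/method labels defined in API_METRIC_LABEL.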
+ + initBuilder.add(new LatencyRecordingHttpRequestInitializer(API_METRIC_LABEL)); + initBuilder.add(httpRequestInitializer); - if (requestLatencies != null) { - initBuilder.add(new LatencyRecordingHttpRequestInitializer(requestLatencies)); - } HttpRequestInitializer chainInitializer = new ChainingHttpRequestInitializer( Iterables.toArray(initBuilder.build(), HttpRequestInitializer.class)); @@ -1086,6 +1266,17 @@ private static Bigquery.Builder newBigQueryClient( .setGoogleClientRequestInitializer(options.getGoogleApiTrace()); } + private static BigQueryWriteClient newBigQueryWriteClient(BigQueryOptions options) { + try { + return BigQueryWriteClient.create( + BigQueryWriteSettings.newBuilder() + .setCredentialsProvider(() -> options.as(GcpOptions.class).getGcpCredential()) + .build()); + } catch (Exception e) { + throw new RuntimeException(e); + } + } + public static CustomHttpErrors createBigQueryClientCustomErrors() { CustomHttpErrors.Builder builder = new CustomHttpErrors.Builder(); // 403 errors, to list tables, matching this URL: @@ -1121,6 +1312,32 @@ public void cancel() { static class StorageClientImpl implements StorageClient { + // If client retries ReadRows requests due to RESOURCE_EXHAUSTED error, bump + // throttlingMsecs according to delay. Runtime can use this information for + // autoscaling decisions. + @VisibleForTesting + public static class RetryAttemptCounter implements BigQueryReadSettings.RetryAttemptListener { + public final Counter throttlingMsecs = + Metrics.counter(StorageClientImpl.class, "throttling-msecs"); + + @SuppressWarnings("ProtoDurationGetSecondsGetNano") + @Override + public void onRetryAttempt(Status status, Metadata metadata) { + if (status != null + && status.getCode() == Code.RESOURCE_EXHAUSTED + && metadata != null + && metadata.containsKey(KEY_RETRY_INFO)) { + RetryInfo retryInfo = metadata.get(KEY_RETRY_INFO); + if (retryInfo.hasRetryDelay()) { + long delay = + retryInfo.getRetryDelay().getSeconds() * 1000 + + retryInfo.getRetryDelay().getNanos() / 1000000; + throttlingMsecs.inc(delay); + } + } + } + } + private static final HeaderProvider USER_AGENT_HEADER_PROVIDER = FixedHeaderProvider.create( "user-agent", "Apache_Beam_Java/" + ReleaseInfo.getReleaseInfo().getVersion()); @@ -1128,32 +1345,68 @@ static class StorageClientImpl implements StorageClient { private final BigQueryReadClient client; private StorageClientImpl(BigQueryOptions options) throws IOException { - BigQueryReadSettings.Builder builder = + BigQueryReadSettings.Builder settingsBuilder = BigQueryReadSettings.newBuilder() .setCredentialsProvider(FixedCredentialsProvider.create(options.getGcpCredential())) .setTransportChannelProvider( BigQueryReadSettings.defaultGrpcTransportProviderBuilder() .setHeaderProvider(USER_AGENT_HEADER_PROVIDER) - .build()); + .build()) + .setReadRowsRetryAttemptListener(new RetryAttemptCounter()); UnaryCallSettings.Builder createReadSessionSettings = - builder.getStubSettingsBuilder().createReadSessionSettings(); + settingsBuilder.getStubSettingsBuilder().createReadSessionSettings(); - RetrySettings.Builder retrySettings = + createReadSessionSettings.setRetrySettings( createReadSessionSettings .getRetrySettings() .toBuilder() .setInitialRpcTimeout(org.threeten.bp.Duration.ofHours(2)) .setMaxRpcTimeout(org.threeten.bp.Duration.ofHours(2)) - .setTotalTimeout(org.threeten.bp.Duration.ofHours(2)); + .setTotalTimeout(org.threeten.bp.Duration.ofHours(2)) + .build()); + + UnaryCallSettings.Builder + splitReadStreamSettings = + 
settingsBuilder.getStubSettingsBuilder().splitReadStreamSettings(); + + splitReadStreamSettings.setRetrySettings( + splitReadStreamSettings + .getRetrySettings() + .toBuilder() + .setInitialRpcTimeout(org.threeten.bp.Duration.ofSeconds(30)) + .setMaxRpcTimeout(org.threeten.bp.Duration.ofSeconds(30)) + .setTotalTimeout(org.threeten.bp.Duration.ofSeconds(30)) + .build()); + + this.client = BigQueryReadClient.create(settingsBuilder.build()); + } - createReadSessionSettings.setRetrySettings(retrySettings.build()); - this.client = BigQueryReadClient.create(builder.build()); + // Since BigQueryReadClient client's methods are final they cannot be mocked with Mockito for + // testing + // So this wrapper method can be mocked in tests, instead. + ReadSession callCreateReadSession(CreateReadSessionRequest request) { + return client.createReadSession(request); } @Override public ReadSession createReadSession(CreateReadSessionRequest request) { - return client.createReadSession(request); + TableReference tableReference = + BigQueryUtils.toTableReference(request.getReadSession().getTable()); + ServiceCallMetric serviceCallMetric = BigQueryUtils.readCallMetric(tableReference); + try { + ReadSession session = callCreateReadSession(request); + if (serviceCallMetric != null) { + serviceCallMetric.call("ok"); + } + return session; + + } catch (ApiException e) { + if (serviceCallMetric != null) { + serviceCallMetric.call(e.getStatusCode().getCode().name()); + } + throw e; + } } @Override @@ -1161,11 +1414,48 @@ public BigQueryServerStream readRows(ReadRowsRequest request) return new BigQueryServerStreamImpl<>(client.readRowsCallable().call(request)); } + @Override + public BigQueryServerStream readRows( + ReadRowsRequest request, String fullTableId) { + TableReference tableReference = BigQueryUtils.toTableReference(fullTableId); + ServiceCallMetric serviceCallMetric = BigQueryUtils.readCallMetric(tableReference); + try { + BigQueryServerStream response = readRows(request); + serviceCallMetric.call("ok"); + return response; + } catch (ApiException e) { + if (serviceCallMetric != null) { + serviceCallMetric.call(e.getStatusCode().getCode().name()); + } + throw e; + } + } + @Override public SplitReadStreamResponse splitReadStream(SplitReadStreamRequest request) { return client.splitReadStream(request); } + @Override + public SplitReadStreamResponse splitReadStream( + SplitReadStreamRequest request, String fullTableId) { + TableReference tableReference = BigQueryUtils.toTableReference(fullTableId); + ServiceCallMetric serviceCallMetric = BigQueryUtils.readCallMetric(tableReference); + try { + SplitReadStreamResponse response = splitReadStream(request); + + if (serviceCallMetric != null) { + serviceCallMetric.call("ok"); + } + return response; + } catch (ApiException e) { + if (serviceCallMetric != null) { + serviceCallMetric.call(e.getStatusCode().getCode().name()); + } + throw e; + } + } + @Override public void close() { client.close(); diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageArrowReader.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageArrowReader.java new file mode 100644 index 000000000000..323b69c5b5fe --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageArrowReader.java @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.bigquery; + +import com.google.cloud.bigquery.storage.v1.ArrowSchema; +import com.google.cloud.bigquery.storage.v1.ReadRowsResponse; +import com.google.cloud.bigquery.storage.v1.ReadSession; +import java.io.IOException; +import java.io.InputStream; +import javax.annotation.Nullable; +import org.apache.arrow.memory.RootAllocator; +import org.apache.arrow.vector.types.pojo.Schema; +import org.apache.avro.generic.GenericRecord; +import org.apache.beam.sdk.extensions.arrow.ArrowConversion; +import org.apache.beam.sdk.extensions.arrow.ArrowConversion.RecordBatchRowIterator; +import org.apache.beam.sdk.schemas.utils.AvroUtils; +import org.apache.beam.sdk.values.Row; + +@SuppressWarnings("nullness") +class BigQueryStorageArrowReader implements BigQueryStorageReader { + + private org.apache.beam.sdk.schemas.Schema arrowBeamSchema; + private @Nullable RecordBatchRowIterator recordBatchIterator; + private long rowCount; + private ArrowSchema protoSchema; + private @Nullable RootAllocator alloc; + + BigQueryStorageArrowReader(ReadSession readSession) throws IOException { + protoSchema = readSession.getArrowSchema(); + InputStream input = protoSchema.getSerializedSchema().newInput(); + this.arrowBeamSchema = + ArrowConversion.ArrowSchemaTranslator.toBeamSchema( + ArrowConversion.arrowSchemaFromInput(input)); + this.rowCount = 0; + this.alloc = null; + } + + @Override + public void processReadRowsResponse(ReadRowsResponse readRowsResponse) throws IOException { + com.google.cloud.bigquery.storage.v1.ArrowRecordBatch recordBatch = + readRowsResponse.getArrowRecordBatch(); + rowCount = recordBatch.getRowCount(); + this.alloc = new RootAllocator(Long.MAX_VALUE); + InputStream input = protoSchema.getSerializedSchema().newInput(); + Schema arrowSchema = ArrowConversion.arrowSchemaFromInput(input); + this.recordBatchIterator = + ArrowConversion.rowsFromSerializedRecordBatch( + arrowSchema, recordBatch.getSerializedRecordBatch().newInput(), this.alloc); + } + + @Override + public long getRowCount() { + return rowCount; + } + + @Override + public GenericRecord readSingleRecord() throws IOException { + if (recordBatchIterator == null) { + throw new IOException("Not Initialized"); + } + Row row = recordBatchIterator.next(); + // TODO(BEAM-12551): Update this interface to expect a Row, and avoid converting Arrow data to + // GenericRecord. 
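+    // The null Avro schema argument here is expected to let AvroUtils derive the schema from the
+    // Row's own Beam schema (assumption about AvroUtils behavior, not stated in this change).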
+ return AvroUtils.toGenericRecord(row, null); + } + + @Override + public boolean readyForNextReadResponse() throws IOException { + return recordBatchIterator == null || !recordBatchIterator.hasNext(); + } + + @Override + public void resetBuffer() { + cleanUp(); + } + + private void cleanUp() { + if (recordBatchIterator != null) { + recordBatchIterator.close(); + recordBatchIterator = null; + } + if (alloc != null) { + alloc.close(); + alloc = null; + } + } + + @Override + public void close() { + this.cleanUp(); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageAvroReader.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageAvroReader.java new file mode 100644 index 000000000000..1670c399f4e4 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageAvroReader.java @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.bigquery; + +import com.google.cloud.bigquery.storage.v1.AvroRows; +import com.google.cloud.bigquery.storage.v1.ReadRowsResponse; +import com.google.cloud.bigquery.storage.v1.ReadSession; +import java.io.IOException; +import org.apache.avro.Schema; +import org.apache.avro.generic.GenericDatumReader; +import org.apache.avro.generic.GenericRecord; +import org.apache.avro.io.BinaryDecoder; +import org.apache.avro.io.DatumReader; +import org.apache.avro.io.DecoderFactory; + +@SuppressWarnings({"nullness"}) +class BigQueryStorageAvroReader implements BigQueryStorageReader { + + private final Schema avroSchema; + private final DatumReader datumReader; + private BinaryDecoder decoder; + private GenericRecord record; + private long rowCount; + + BigQueryStorageAvroReader(ReadSession readSession) { + this.avroSchema = new Schema.Parser().parse(readSession.getAvroSchema().getSchema()); + this.datumReader = new GenericDatumReader<>(avroSchema); + this.rowCount = 0; + decoder = null; + record = null; + } + + @Override + public void processReadRowsResponse(ReadRowsResponse readRowsResponse) { + AvroRows avroRows = readRowsResponse.getAvroRows(); + rowCount = avroRows.getRowCount(); + decoder = + DecoderFactory.get() + .binaryDecoder(avroRows.getSerializedBinaryRows().toByteArray(), decoder); + } + + @Override + public long getRowCount() { + return rowCount; + } + + @Override + public GenericRecord readSingleRecord() throws IOException { + record = datumReader.read(record, decoder); + return record; + } + + @Override + public boolean readyForNextReadResponse() throws IOException { + return decoder == null || decoder.isEnd(); + } + + @Override + public void resetBuffer() { + decoder = null; + } + + @Override + public void close() {} +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageQuerySource.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageQuerySource.java index 464663fe57dd..120adc107fa2 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageQuerySource.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageQuerySource.java @@ -22,6 +22,7 @@ import com.google.api.services.bigquery.model.JobStatistics; import com.google.api.services.bigquery.model.Table; import com.google.api.services.bigquery.model.TableReference; +import com.google.cloud.bigquery.storage.v1.DataFormat; import java.io.IOException; import java.io.ObjectInputStream; import java.util.concurrent.atomic.AtomicReference; @@ -37,7 +38,7 @@ @SuppressWarnings({ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) -public class BigQueryStorageQuerySource extends BigQueryStorageSourceBase { +class BigQueryStorageQuerySource extends BigQueryStorageSourceBase { public static BigQueryStorageQuerySource create( String stepUuid, @@ -48,6 +49,7 @@ public static BigQueryStorageQuerySource create( @Nullable String location, @Nullable String queryTempDataset, @Nullable String kmsKey, + @Nullable DataFormat format, SerializableFunction parseFn, Coder outputCoder, BigQueryServices bqServices) { @@ -60,6 +62,33 @@ public static BigQueryStorageQuerySource create( location, queryTempDataset, kmsKey, + format, + parseFn, + outputCoder, + bqServices); + } + + public static BigQueryStorageQuerySource create( + String stepUuid, + ValueProvider queryProvider, + Boolean 
flattenResults, + Boolean useLegacySql, + QueryPriority priority, + @Nullable String location, + @Nullable String kmsKey, + SerializableFunction parseFn, + Coder outputCoder, + BigQueryServices bqServices) { + return new BigQueryStorageQuerySource<>( + stepUuid, + queryProvider, + flattenResults, + useLegacySql, + priority, + location, + null, + kmsKey, + null, parseFn, outputCoder, bqServices); @@ -85,10 +114,11 @@ private BigQueryStorageQuerySource( @Nullable String location, @Nullable String queryTempDataset, @Nullable String kmsKey, + DataFormat format, SerializableFunction parseFn, Coder outputCoder, BigQueryServices bqServices) { - super(null, null, parseFn, outputCoder, bqServices); + super(format, null, null, parseFn, outputCoder, bqServices); this.stepUuid = checkNotNull(stepUuid, "stepUuid"); this.queryProvider = checkNotNull(queryProvider, "queryProvider"); this.flattenResults = checkNotNull(flattenResults, "flattenResults"); @@ -145,4 +175,9 @@ protected Table getTargetTable(BigQueryOptions options) throws Exception { kmsKey); return bqServices.getDatasetService(options).getTable(queryResultTable); } + + @Override + protected @Nullable String getTargetTableId(BigQueryOptions options) throws Exception { + return null; + } } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageReader.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageReader.java new file mode 100644 index 000000000000..b93d64aa0f63 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageReader.java @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.bigquery; + +import com.google.cloud.bigquery.storage.v1.ReadRowsResponse; +import java.io.IOException; +import org.apache.avro.generic.GenericRecord; + +interface BigQueryStorageReader extends AutoCloseable { + + void processReadRowsResponse(ReadRowsResponse readRowsResponse) throws IOException; + + long getRowCount(); + + // TODO(BEAM-12551): BigQueryStorageReader should produce Rows, rather than GenericRecords + GenericRecord readSingleRecord() throws IOException; + + boolean readyForNextReadResponse() throws IOException; + + void resetBuffer(); + + @Override + void close(); +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageReaderFactory.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageReaderFactory.java new file mode 100644 index 000000000000..fba06d020699 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageReaderFactory.java @@ -0,0 +1,35 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.bigquery; + +import com.google.cloud.bigquery.storage.v1.ReadSession; +import java.io.IOException; + +class BigQueryStorageReaderFactory { + + private BigQueryStorageReaderFactory() {} + + public static BigQueryStorageReader getReader(ReadSession readSession) throws IOException { + if (readSession.hasAvroSchema()) { + return new BigQueryStorageAvroReader(readSession); + } else if (readSession.hasArrowSchema()) { + return new BigQueryStorageArrowReader(readSession); + } + throw new IllegalStateException("Read session does not have Avro/Arrow schema set."); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageSourceBase.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageSourceBase.java index 64f2df94d7f2..504ec9e0e90f 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageSourceBase.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageSourceBase.java @@ -29,10 +29,12 @@ import java.util.List; import org.apache.avro.Schema; import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.extensions.arrow.ArrowConversion; import org.apache.beam.sdk.io.BoundedSource; import org.apache.beam.sdk.io.gcp.bigquery.BigQueryServices.StorageClient; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.schemas.utils.AvroUtils; import org.apache.beam.sdk.transforms.SerializableFunction; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; @@ -64,6 +66,7 @@ abstract class BigQueryStorageSourceBase extends BoundedSource { */ private static final int MIN_SPLIT_COUNT = 10; + protected final DataFormat format; protected final ValueProvider> selectedFieldsProvider; protected final ValueProvider rowRestrictionProvider; protected final SerializableFunction parseFn; @@ -71,11 +74,13 @@ abstract class BigQueryStorageSourceBase extends BoundedSource { protected final BigQueryServices bqServices; BigQueryStorageSourceBase( + DataFormat format, @Nullable ValueProvider> selectedFieldsProvider, @Nullable ValueProvider rowRestrictionProvider, SerializableFunction parseFn, Coder outputCoder, BigQueryServices bqServices) { + this.format = format; this.selectedFieldsProvider = selectedFieldsProvider; this.rowRestrictionProvider = rowRestrictionProvider; this.parseFn = checkNotNull(parseFn, "parseFn"); @@ -89,6 +94,8 @@ abstract class BigQueryStorageSourceBase extends BoundedSource { */ protected abstract Table getTargetTable(BigQueryOptions options) throws Exception; + protected abstract @Nullable String getTargetTableId(BigQueryOptions options) throws Exception; + @Override public Coder getOutputCoder() { return outputCoder; @@ -100,10 +107,16 @@ public List> split( BigQueryOptions bqOptions = options.as(BigQueryOptions.class); Table targetTable = getTargetTable(bqOptions); - ReadSession.Builder readSessionBuilder = - ReadSession.newBuilder() - .setTable(BigQueryHelpers.toTableResourceName(targetTable.getTableReference())) - .setDataFormat(DataFormat.AVRO); + String tableReferenceId = ""; + if (targetTable != null) { + tableReferenceId = BigQueryHelpers.toTableResourceName(targetTable.getTableReference()); + } else { + // If the table does not exist 
targetTable will be null. + // Construct the table id if we can generate it. For error recording/logging. + tableReferenceId = getTargetTableId(bqOptions); + } + + ReadSession.Builder readSessionBuilder = ReadSession.newBuilder().setTable(tableReferenceId); if (selectedFieldsProvider != null || rowRestrictionProvider != null) { ReadSession.TableReadOptions.Builder tableReadOptionsBuilder = @@ -116,6 +129,9 @@ public List> split( } readSessionBuilder.setReadOptions(tableReadOptionsBuilder); } + if (format != null) { + readSessionBuilder.setDataFormat(format); + } int streamCount = 0; if (desiredBundleSizeBytes > 0) { @@ -127,7 +143,11 @@ public List> split( CreateReadSessionRequest createReadSessionRequest = CreateReadSessionRequest.newBuilder() - .setParent(BigQueryHelpers.toProjectResourceName(bqOptions.getProject())) + .setParent( + BigQueryHelpers.toProjectResourceName( + bqOptions.getBigQueryProject() == null + ? bqOptions.getProject() + : bqOptions.getBigQueryProject())) .setReadSession(readSessionBuilder) .setMaxStreamCount(streamCount) .build(); @@ -146,7 +166,21 @@ public List> split( return ImmutableList.of(); } - Schema sessionSchema = new Schema.Parser().parse(readSession.getAvroSchema().getSchema()); + Schema sessionSchema; + if (readSession.getDataFormat() == DataFormat.ARROW) { + org.apache.arrow.vector.types.pojo.Schema schema = + ArrowConversion.arrowSchemaFromInput( + readSession.getArrowSchema().getSerializedSchema().newInput()); + org.apache.beam.sdk.schemas.Schema beamSchema = + ArrowConversion.ArrowSchemaTranslator.toBeamSchema(schema); + sessionSchema = AvroUtils.toAvroSchema(beamSchema); + } else if (readSession.getDataFormat() == DataFormat.AVRO) { + sessionSchema = new Schema.Parser().parse(readSession.getAvroSchema().getSchema()); + } else { + throw new IllegalArgumentException( + "data is not in a supported dataFormat: " + readSession.getDataFormat()); + } + TableSchema trimmedSchema = BigQueryAvroUtils.trimBigQueryTableSchema(targetTable.getSchema(), sessionSchema); List> sources = Lists.newArrayList(); diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageStreamSource.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageStreamSource.java index d220bcba8bd2..462c720a7da4 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageStreamSource.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageStreamSource.java @@ -21,7 +21,9 @@ import static org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.toJsonString; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; +import com.google.api.gax.rpc.ApiException; import com.google.api.gax.rpc.FailedPreconditionException; +import com.google.api.services.bigquery.model.TableReference; import com.google.api.services.bigquery.model.TableSchema; import com.google.cloud.bigquery.storage.v1.ReadRowsRequest; import com.google.cloud.bigquery.storage.v1.ReadRowsResponse; @@ -33,12 +35,7 @@ import java.util.Iterator; import java.util.List; import java.util.NoSuchElementException; -import org.apache.avro.Schema; -import org.apache.avro.generic.GenericDatumReader; -import org.apache.avro.generic.GenericRecord; -import org.apache.avro.io.BinaryDecoder; -import org.apache.avro.io.DatumReader; -import org.apache.avro.io.DecoderFactory; +import 
org.apache.beam.runners.core.metrics.ServiceCallMetric; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.io.BoundedSource; import org.apache.beam.sdk.io.gcp.bigquery.BigQueryServices.BigQueryServerStream; @@ -56,7 +53,7 @@ @SuppressWarnings({ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) -public class BigQueryStorageStreamSource extends BoundedSource { +class BigQueryStorageStreamSource extends BoundedSource { private static final Logger LOG = LoggerFactory.getLogger(BigQueryStorageStreamSource.class); @@ -149,7 +146,7 @@ public String toString() { /** A {@link org.apache.beam.sdk.io.Source.Reader} which reads records from a stream. */ public static class BigQueryStorageStreamReader extends BoundedSource.BoundedReader { - private final DatumReader datumReader; + private final BigQueryStorageReader reader; private final SerializableFunction parseFn; private final StorageClient storageClient; private final TableSchema tableSchema; @@ -157,24 +154,24 @@ public static class BigQueryStorageStreamReader extends BoundedSource.Bounded private BigQueryStorageStreamSource source; private BigQueryServerStream responseStream; private Iterator responseIterator; - private BinaryDecoder decoder; - private GenericRecord record; private T current; private long currentOffset; // Values used for progress reporting. + private boolean splitPossible = true; private double fractionConsumed; private double progressAtResponseStart; private double progressAtResponseEnd; private long rowsConsumedFromCurrentResponse; private long totalRowsInCurrentResponse; + private TableReference tableReference; + private ServiceCallMetric serviceCallMetric; + private BigQueryStorageStreamReader( BigQueryStorageStreamSource source, BigQueryOptions options) throws IOException { this.source = source; - this.datumReader = - new GenericDatumReader<>( - new Schema.Parser().parse(source.readSession.getAvroSchema().getSchema())); + this.reader = BigQueryStorageReaderFactory.getReader(source.readSession); this.parseFn = source.parseFn; this.storageClient = source.bqServices.getStorageClient(options); this.tableSchema = fromJsonString(source.jsonTableSchema, TableSchema.class); @@ -195,7 +192,9 @@ public synchronized boolean start() throws IOException { .setOffset(currentOffset) .build(); - responseStream = storageClient.readRows(request); + tableReference = BigQueryUtils.toTableReference(source.readSession.getTable()); + serviceCallMetric = BigQueryUtils.readCallMetric(tableReference); + responseStream = storageClient.readRows(request, source.readSession.getTable()); responseIterator = responseStream.iterator(); LOG.info("Started BigQuery Storage API read from stream {}.", source.readStream.getName()); return readNextRecord(); @@ -208,13 +207,29 @@ public synchronized boolean advance() throws IOException { } private synchronized boolean readNextRecord() throws IOException { - while (decoder == null || decoder.isEnd()) { + while (reader.readyForNextReadResponse()) { if (!responseIterator.hasNext()) { fractionConsumed = 1d; return false; } - ReadRowsResponse response = responseIterator.next(); + ReadRowsResponse response; + try { + response = responseIterator.next(); + // Since we don't have a direct hook to the underlying + // API call, record success ever time we read a record successfully. + if (serviceCallMetric != null) { + serviceCallMetric.call("ok"); + } + } catch (ApiException e) { + // Occasionally the iterator will fail and raise an exception. 
+ // Capture it here and record the error in the metric. + if (serviceCallMetric != null) { + serviceCallMetric.call(e.getStatusCode().getCode().name()); + } + throw e; + } + progressAtResponseStart = response.getStats().getProgress().getAtResponseStart(); progressAtResponseEnd = response.getStats().getProgress().getAtResponseEnd(); totalRowsInCurrentResponse = response.getRowCount(); @@ -235,14 +250,12 @@ private synchronized boolean readNextRecord() throws IOException { "Progress at response end (%s) is not in the range [0.0, 1.0].", progressAtResponseEnd); - decoder = - DecoderFactory.get() - .binaryDecoder( - response.getAvroRows().getSerializedBinaryRows().toByteArray(), decoder); + reader.processReadRowsResponse(response); } - record = datumReader.read(record, decoder); - current = parseFn.apply(new SchemaAndRecord(record, tableSchema)); + SchemaAndRecord schemaAndRecord = new SchemaAndRecord(reader.readSingleRecord(), tableSchema); + + current = parseFn.apply(schemaAndRecord); // Updates the fraction consumed value. This value is calculated by interpolating between // the fraction consumed value from the previous server response (or zero if we're consuming @@ -268,6 +281,7 @@ public T getCurrent() throws NoSuchElementException { @Override public synchronized void close() { storageClient.close(); + reader.close(); } @Override @@ -288,6 +302,10 @@ public BoundedSource splitAtFraction(double fraction) { return null; } + if (!splitPossible) { + return null; + } + SplitReadStreamRequest splitRequest = SplitReadStreamRequest.newBuilder() .setName(source.readStream.getName()) @@ -305,6 +323,7 @@ public BoundedSource splitAtFraction(double fraction) { "BigQuery Storage API stream {} cannot be split at {}.", source.readStream.getName(), fraction); + splitPossible = false; return null; } @@ -320,7 +339,8 @@ public BoundedSource splitAtFraction(double fraction) { ReadRowsRequest.newBuilder() .setReadStream(splitResponse.getPrimaryStream().getName()) .setOffset(currentOffset + 1) - .build()); + .build(), + source.readSession.getTable()); newResponseIterator = newResponseStream.iterator(); newResponseIterator.hasNext(); } catch (FailedPreconditionException e) { @@ -350,7 +370,7 @@ public BoundedSource splitAtFraction(double fraction) { source = source.fromExisting(splitResponse.getPrimaryStream()); responseStream = newResponseStream; responseIterator = newResponseIterator; - decoder = null; + reader.resetBuffer(); } Metrics.counter(BigQueryStorageStreamReader.class, "split-at-fraction-calls-successful") diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageTableSource.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageTableSource.java index 9556bab19e7f..2a14cd50984f 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageTableSource.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageTableSource.java @@ -22,6 +22,7 @@ import com.google.api.services.bigquery.model.Table; import com.google.api.services.bigquery.model.TableReference; +import com.google.cloud.bigquery.storage.v1.DataFormat; import java.io.IOException; import java.io.ObjectInputStream; import java.util.List; @@ -46,13 +47,25 @@ public class BigQueryStorageTableSource extends BigQueryStorageSourceBase public static BigQueryStorageTableSource create( ValueProvider tableRefProvider, + DataFormat format, 
@Nullable ValueProvider> selectedFields, @Nullable ValueProvider rowRestriction, SerializableFunction parseFn, Coder outputCoder, BigQueryServices bqServices) { return new BigQueryStorageTableSource<>( - tableRefProvider, selectedFields, rowRestriction, parseFn, outputCoder, bqServices); + tableRefProvider, format, selectedFields, rowRestriction, parseFn, outputCoder, bqServices); + } + + public static BigQueryStorageTableSource create( + ValueProvider tableRefProvider, + @Nullable ValueProvider> selectedFields, + @Nullable ValueProvider rowRestriction, + SerializableFunction parseFn, + Coder outputCoder, + BigQueryServices bqServices) { + return new BigQueryStorageTableSource<>( + tableRefProvider, null, selectedFields, rowRestriction, parseFn, outputCoder, bqServices); } private final ValueProvider tableReferenceProvider; @@ -61,12 +74,13 @@ public static BigQueryStorageTableSource create( private BigQueryStorageTableSource( ValueProvider tableRefProvider, + DataFormat format, @Nullable ValueProvider> selectedFields, @Nullable ValueProvider rowRestriction, SerializableFunction parseFn, Coder outputCoder, BigQueryServices bqServices) { - super(selectedFields, rowRestriction, parseFn, outputCoder, bqServices); + super(format, selectedFields, rowRestriction, parseFn, outputCoder, bqServices); this.tableReferenceProvider = checkNotNull(tableRefProvider, "tableRefProvider"); cachedTable = new AtomicReference<>(); } @@ -88,11 +102,39 @@ public void populateDisplayData(DisplayData.Builder builder) { @Override public long getEstimatedSizeBytes(PipelineOptions options) throws Exception { - return getTargetTable(options.as(BigQueryOptions.class)).getNumBytes(); + Table table = getTargetTable(options.as(BigQueryOptions.class)); + if (table != null) { + return table.getNumBytes(); + } + // If the table does not exist, then it will be null. + // Avoid the NullPointerException here, allow a more meaningful table "not_found" + // error to be shown to the user, upon table read. + return 0; + } + + @Override + protected String getTargetTableId(BigQueryOptions options) throws Exception { + TableReference tableReference = tableReferenceProvider.get(); + if (Strings.isNullOrEmpty(tableReference.getProjectId())) { + checkState( + !Strings.isNullOrEmpty(options.getProject()), + "No project ID set in %s or %s, cannot construct a complete %s", + TableReference.class.getSimpleName(), + BigQueryOptions.class.getSimpleName(), + TableReference.class.getSimpleName()); + LOG.info( + "Project ID not set in {}. Using default project from {}.", + TableReference.class.getSimpleName(), + BigQueryOptions.class.getSimpleName()); + tableReference.setProjectId(options.getProject()); + } + return String.format( + "projects/%s/datasets/%s/tables/%s", + tableReference.getProjectId(), tableReference.getDatasetId(), tableReference.getTableId()); } @Override - protected Table getTargetTable(BigQueryOptions options) throws Exception { + protected @Nullable Table getTargetTable(BigQueryOptions options) throws Exception { if (cachedTable.get() == null) { TableReference tableReference = tableReferenceProvider.get(); if (Strings.isNullOrEmpty(tableReference.getProjectId())) { @@ -106,7 +148,10 @@ protected Table getTargetTable(BigQueryOptions options) throws Exception { "Project ID not set in {}. Using default project from {}.", TableReference.class.getSimpleName(), BigQueryOptions.class.getSimpleName()); - tableReference.setProjectId(options.getProject()); + tableReference.setProjectId( + options.getBigQueryProject() == null + ? 
options.getProject() + : options.getBigQueryProject()); } Table table = bqServices.getDatasetService(options).getTable(tableReference); cachedTable.compareAndSet(null, table);
diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryTableSourceDef.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryTableSourceDef.java index 8ce5ffc466d3..435876f2597f 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryTableSourceDef.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryTableSourceDef.java @@ -78,7 +78,10 @@ private TableReference setDefaultProjectIfAbsent( "Project ID not set in {}. Using default project from {}.", TableReference.class.getSimpleName(), BigQueryOptions.class.getSimpleName()); - tableReference.setProjectId(bqOptions.getProject()); + tableReference.setProjectId( + bqOptions.getBigQueryProject() == null + ? bqOptions.getProject() + : bqOptions.getBigQueryProject()); } return tableReference; }
diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtils.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtils.java index 1eddacb0f659..5d35c70a6384 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtils.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtils.java @@ -17,11 +17,14 @@ */ package org.apache.beam.sdk.io.gcp.bigquery; +import static java.time.format.DateTimeFormatter.ISO_LOCAL_DATE_TIME; +import static java.time.format.DateTimeFormatter.ISO_LOCAL_TIME; import static java.util.stream.Collectors.toList; import static java.util.stream.Collectors.toMap; import static org.apache.beam.sdk.values.Row.toRow; import com.google.api.services.bigquery.model.TableFieldSchema; +import com.google.api.services.bigquery.model.TableReference; import com.google.api.services.bigquery.model.TableRow; import com.google.api.services.bigquery.model.TableSchema; import com.google.auto.value.AutoValue; @@ -32,17 +35,23 @@ import java.time.LocalDateTime; import java.time.LocalTime; import java.util.ArrayList; +import java.util.HashMap; import java.util.List; import java.util.Map; import java.util.Optional; import java.util.Set; import java.util.function.Function; +import java.util.regex.Matcher; +import java.util.regex.Pattern; import java.util.stream.IntStream; import org.apache.avro.Conversions; import org.apache.avro.LogicalTypes; import org.apache.avro.generic.GenericData; import org.apache.avro.generic.GenericRecord; import org.apache.avro.util.Utf8; +import org.apache.beam.runners.core.metrics.GcpResourceIdentifiers; +import org.apache.beam.runners.core.metrics.MonitoringInfoConstants; +import org.apache.beam.runners.core.metrics.ServiceCallMetric; import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.annotations.Experimental.Kind; import org.apache.beam.sdk.schemas.Schema; @@ -74,6 +83,19 @@ }) public class BigQueryUtils {
+  // For parsing the format returned on the API proto:
+  //   google.cloud.bigquery.storage.v1.ReadSession.getTable()
+  //   "projects/{project_id}/datasets/{dataset_id}/tables/{table_id}"
+  private static final Pattern TABLE_RESOURCE_PATTERN =
+      Pattern.compile(
+          "^projects/(?<PROJECT>[^/]+)/datasets/(?<DATASET>[^/]+)/tables/(?<TABLE>[^/]+)$");
+
+  // For parsing the format used to refer to tables parameters in BigQueryIO.
+  //   "{project_id}:{dataset_id}.{table_id}" or
+  //   "{project_id}.{dataset_id}.{table_id}"
+  private static final Pattern SIMPLE_TABLE_PATTERN =
+      Pattern.compile("^(?<PROJECT>[^\\.:]+)[\\.:](?<DATASET>[^\\.:]+)[\\.](?<TABLE>
    [^\\.:]+)$"); + /** Options for how to convert BigQuery data to Beam data. */ @AutoValue public abstract static class ConversionOptions implements Serializable { @@ -111,7 +133,7 @@ public abstract static class Builder { public abstract static class SchemaConversionOptions implements Serializable { /** - * Controls whether to use the map or row FieldType for a TableSchema field that appears to + * /** Controls whether to use the map or row FieldType for a TableSchema field that appears to * represent a map (it is an array of structs containing only {@code key} and {@code value} * fields). */ @@ -130,8 +152,11 @@ public abstract static class Builder { } } + private static final String BIGQUERY_TIME_PATTERN = "HH:mm:ss[.SSSSSS]"; + private static final java.time.format.DateTimeFormatter BIGQUERY_TIME_FORMATTER = + java.time.format.DateTimeFormatter.ofPattern(BIGQUERY_TIME_PATTERN); private static final java.time.format.DateTimeFormatter BIGQUERY_DATETIME_FORMATTER = - java.time.format.DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss[.SSSSSS]"); + java.time.format.DateTimeFormatter.ofPattern("uuuu-MM-dd'T'" + BIGQUERY_TIME_PATTERN); private static final DateTimeFormatter BIGQUERY_TIMESTAMP_PRINTER; @@ -225,7 +250,7 @@ public abstract static class Builder { // TODO: BigQuery code should not be relying on Calcite metadata fields. If so, this belongs // in the SQL package. - private static final Map BEAM_TO_BIGQUERY_LOGICAL_MAPPING = + static final Map BEAM_TO_BIGQUERY_LOGICAL_MAPPING = ImmutableMap.builder() .put(SqlTypes.DATE.getIdentifier(), StandardSQLTypeName.DATE) .put(SqlTypes.TIME.getIdentifier(), StandardSQLTypeName.TIME) @@ -242,7 +267,7 @@ public abstract static class Builder { * Get the corresponding BigQuery {@link StandardSQLTypeName} for supported Beam {@link * FieldType}. */ - private static StandardSQLTypeName toStandardSQLTypeName(FieldType fieldType) { + static StandardSQLTypeName toStandardSQLTypeName(FieldType fieldType) { StandardSQLTypeName ret; if (fieldType.getTypeName().isLogicalType()) { ret = BEAM_TO_BIGQUERY_LOGICAL_MAPPING.get(fieldType.getLogicalType().getIdentifier()); @@ -547,11 +572,28 @@ public static TableRow toTableRow(Row row) { // For the JSON formats of DATE/DATETIME/TIME/TIMESTAMP types that BigQuery accepts, see // https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-json#details_of_loading_json_data String identifier = fieldType.getLogicalType().getIdentifier(); - if (SqlTypes.DATE.getIdentifier().equals(identifier) - || SqlTypes.TIME.getIdentifier().equals(identifier)) { + if (SqlTypes.DATE.getIdentifier().equals(identifier)) { return fieldValue.toString(); + } else if (SqlTypes.TIME.getIdentifier().equals(identifier)) { + // LocalTime.toString() drops seconds if it is zero (see + // https://docs.oracle.com/javase/8/docs/api/java/time/LocalTime.html#toString--). + // but BigQuery TIME requires seconds + // (https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#time_type). + // Fractional seconds are optional so drop them to conserve number of bytes transferred. + LocalTime localTime = (LocalTime) fieldValue; + @SuppressWarnings( + "JavaLocalTimeGetNano") // Suppression is justified because seconds are always + // outputted. + java.time.format.DateTimeFormatter localTimeFormatter = + (0 == localTime.getNano()) ? 
ISO_LOCAL_TIME : BIGQUERY_TIME_FORMATTER; + return localTimeFormatter.format(localTime); } else if (SqlTypes.DATETIME.getIdentifier().equals(identifier)) { - return BIGQUERY_DATETIME_FORMATTER.format((LocalDateTime) fieldValue); + // Same rationale as SqlTypes.TIME + LocalDateTime localDateTime = (LocalDateTime) fieldValue; + @SuppressWarnings("JavaLocalDateTimeGetNano") + java.time.format.DateTimeFormatter localDateTimeFormatter = + (0 == localDateTime.getNano()) ? ISO_LOCAL_DATE_TIME : BIGQUERY_DATETIME_FORMATTER; + return localDateTimeFormatter.format(localDateTime); } else if ("Enum".equals(identifier)) { return fieldType .getLogicalType(EnumerationType.class) @@ -559,7 +601,7 @@ public static TableRow toTableRow(Row row) { } // fall through default: - return fieldValue; + return fieldValue.toString(); } } @@ -621,8 +663,10 @@ public static Row toBeamRow(Schema rowSchema, TableSchema bqSchema, TableRow jso } private static Object toBeamValue(FieldType fieldType, Object jsonBQValue) { - if (jsonBQValue instanceof String) { - String jsonBQString = (String) jsonBQValue; + if (jsonBQValue instanceof String + || jsonBQValue instanceof Number + || jsonBQValue instanceof Boolean) { + String jsonBQString = jsonBQValue.toString(); if (JSON_VALUE_PARSERS.containsKey(fieldType.getTypeName())) { return JSON_VALUE_PARSERS.get(fieldType.getTypeName()).apply(jsonBQString); } else if (fieldType.isLogicalType(SqlTypes.DATETIME.getIdentifier())) { @@ -888,4 +932,85 @@ private static Object convertAvroNumeric(Object value) { "Does not support converting avro format: " + value.getClass().getName()); } } + + /** + * @param fullTableId - Is one of the two forms commonly used to refer to bigquery tables in the + * beam codebase: + *
      + *
    • projects/{project_id}/datasets/{dataset_id}/tables/{table_id} + *
    • myproject:mydataset.mytable + *
    • myproject.mydataset.mytable + *
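 + *     For example (illustrative values), "projects/foo/datasets/bar/tables/baz", "foo:bar.baz"
 + *     and "foo.bar.baz" all parse to project "foo", dataset "bar", table "baz".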
    + * + * @return a BigQueryTableIdentifier by parsing the fullTableId. If it cannot be parsed properly + * null is returned. + */ + public static @Nullable TableReference toTableReference(String fullTableId) { + // Try parsing the format: + // "projects/{project_id}/datasets/{dataset_id}/tables/{table_id}" + Matcher m = TABLE_RESOURCE_PATTERN.matcher(fullTableId); + if (m.matches()) { + return new TableReference() + .setProjectId(m.group("PROJECT")) + .setDatasetId(m.group("DATASET")) + .setTableId(m.group("TABLE")); + } + + // If that failed, try the format: + // "{project_id}:{dataset_id}.{table_id}" or + // "{project_id}.{dataset_id}.{table_id}" + m = SIMPLE_TABLE_PATTERN.matcher(fullTableId); + if (m.matches()) { + return new TableReference() + .setProjectId(m.group("PROJECT")) + .setDatasetId(m.group("DATASET")) + .setTableId(m.group("TABLE")); + } + return null; + } + + private static ServiceCallMetric callMetricForMethod( + TableReference tableReference, String method) { + if (tableReference != null) { + // TODO(ajamato): Add Ptransform label. Populate it as empty for now to prevent the + // SpecMonitoringInfoValidator from dropping the MonitoringInfo. + HashMap baseLabels = new HashMap(); + baseLabels.put(MonitoringInfoConstants.Labels.PTRANSFORM, ""); + baseLabels.put(MonitoringInfoConstants.Labels.SERVICE, "BigQuery"); + baseLabels.put(MonitoringInfoConstants.Labels.METHOD, method); + baseLabels.put( + MonitoringInfoConstants.Labels.RESOURCE, + GcpResourceIdentifiers.bigQueryTable( + tableReference.getProjectId(), + tableReference.getDatasetId(), + tableReference.getTableId())); + baseLabels.put( + MonitoringInfoConstants.Labels.BIGQUERY_PROJECT_ID, tableReference.getProjectId()); + baseLabels.put( + MonitoringInfoConstants.Labels.BIGQUERY_DATASET, tableReference.getDatasetId()); + baseLabels.put(MonitoringInfoConstants.Labels.BIGQUERY_TABLE, tableReference.getTableId()); + return new ServiceCallMetric(MonitoringInfoConstants.Urns.API_REQUEST_COUNT, baseLabels); + } + return null; + } + + /** + * @param tableReference - The table being read from. Can be a temporary BQ table used to read + * from a SQL query. + * @return a ServiceCallMetric for recording statuses for all BQ API responses related to reading + * elements directly from BigQuery in a process-wide metric. Such as: calls to readRows, + * splitReadStream, createReadSession. + */ + public static ServiceCallMetric readCallMetric(TableReference tableReference) { + return callMetricForMethod(tableReference, "BigQueryBatchRead"); + } + + /** + * @param tableReference - The table being written to. + * @return a ServiceCallMetric for recording statuses for all BQ responses related to writing + * elements directly to BigQuery in a process-wide metric. Such as: insertAll. + */ + public static ServiceCallMetric writeCallMetric(TableReference tableReference) { + return callMetricForMethod(tableReference, "BigQueryBatchWrite"); + } } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/CivilTimeEncoder.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/CivilTimeEncoder.java new file mode 100644 index 000000000000..bed767b86af4 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/CivilTimeEncoder.java @@ -0,0 +1,648 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.bigquery; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import java.time.temporal.ChronoUnit; +import org.joda.time.LocalDateTime; +import org.joda.time.LocalTime; + +/** + * Encoder for TIME and DATETIME values, according to civil_time encoding. Copied out of the zetasql + * package. + * + *

The valid range and number of bits required by each date/time field is as the following:
 + *
 + *   Field  | Range          | #Bits
 + *   -------|----------------|------
 + *   Year   | [1, 9999]      | 14
 + *   Month  | [1, 12]        | 4
 + *   Day    | [1, 31]        | 5
 + *   Hour   | [0, 23]        | 5
 + *   Minute | [0, 59]        | 6
 + *   Second | [0, 59]*       | 6
 + *   Micros | [0, 999999]    | 20
 + *   Nanos  | [0, 999999999] | 30
 + *

    * Leap second is not supported. + * + *
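 + * As a worked example using the shift constants defined in this class (illustrative values),
 + * the TIME value 12:34:56 packs to (12 << 12) | (34 << 6) | 56 = 0xC8B8 at seconds precision;
 + * a DATETIME additionally places day, month and year in the bits above the hour field.
 + *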

    When encoding the TIME or DATETIME into a bit field, larger date/time field is on the more + * significant side. + */ +public final class CivilTimeEncoder { + private static final int NANO_LENGTH = 30; + private static final int MICRO_LENGTH = 20; + + private static final int NANO_SHIFT = 0; + private static final int MICRO_SHIFT = 0; + private static final int SECOND_SHIFT = 0; + private static final int MINUTE_SHIFT = 6; + private static final int HOUR_SHIFT = 12; + private static final int DAY_SHIFT = 17; + private static final int MONTH_SHIFT = 22; + private static final int YEAR_SHIFT = 26; + + private static final long NANO_MASK = 0x3FFFFFFFL; + private static final long MICRO_MASK = 0xFFFFFL; + private static final long SECOND_MASK = 0x3FL; + private static final long MINUTE_MASK = 0xFC0L; + private static final long HOUR_MASK = 0x1F000L; + private static final long DAY_MASK = 0x3E0000L; + private static final long MONTH_MASK = 0x3C00000L; + private static final long YEAR_MASK = 0xFFFC000000L; + + private static final long TIME_SECONDS_MASK = 0x1FFFFL; + private static final long TIME_MICROS_MASK = 0x1FFFFFFFFFL; + private static final long TIME_NANOS_MASK = 0x7FFFFFFFFFFFL; + private static final long DATETIME_SECONDS_MASK = 0xFFFFFFFFFFL; + private static final long DATETIME_MICROS_MASK = 0xFFFFFFFFFFFFFFFL; + + /** + * Encodes {@code time} as a 4-byte integer with seconds precision. + * + *

    Encoding is as the following: + * + *

    +   *      3         2         1
    +   * MSB 10987654321098765432109876543210 LSB
    +   *                    | H ||  M ||  S |
    +   * 
    + * + * @see #decodePacked32TimeSeconds(int) + * @see #encodePacked32TimeSeconds(java.time.LocalTime) + */ + @SuppressWarnings("GoodTime") // should accept a java.time.LocalTime + public static int encodePacked32TimeSeconds(LocalTime time) { + checkValidTimeSeconds(time); + int bitFieldTimeSeconds = 0x0; + bitFieldTimeSeconds |= time.getHourOfDay() << HOUR_SHIFT; + bitFieldTimeSeconds |= time.getMinuteOfHour() << MINUTE_SHIFT; + bitFieldTimeSeconds |= time.getSecondOfMinute() << SECOND_SHIFT; + return bitFieldTimeSeconds; + } + + /** + * Encodes {@code time} as a 4-byte integer with seconds precision. + * + *

    Encoding is as the following: + * + *

    +   *      3         2         1
    +   * MSB 10987654321098765432109876543210 LSB
    +   *                    | H ||  M ||  S |
    +   * 
    + * + * @see #decodePacked32TimeSecondsAsJavaTime(int) + */ + @SuppressWarnings("GoodTime-ApiWithNumericTimeUnit") + public static int encodePacked32TimeSeconds(java.time.LocalTime time) { + checkValidTimeSeconds(time); + int bitFieldTimeSeconds = 0x0; + bitFieldTimeSeconds |= time.getHour() << HOUR_SHIFT; + bitFieldTimeSeconds |= time.getMinute() << MINUTE_SHIFT; + bitFieldTimeSeconds |= time.getSecond() << SECOND_SHIFT; + return bitFieldTimeSeconds; + } + + /** + * Decodes {@code bitFieldTimeSeconds} as a {@link LocalTime} with seconds precision. + * + *

    Encoding is as the following: + * + *

    +   *      3         2         1
    +   * MSB 10987654321098765432109876543210 LSB
    +   *                    | H ||  M ||  S |
    +   * 
    + * + * @see #encodePacked32TimeSeconds(LocalTime) + * @see #encodePacked32TimeSecondsAsJavaTime(int) + */ + @SuppressWarnings("GoodTime") // should return a java.time.LocalTime + public static LocalTime decodePacked32TimeSeconds(int bitFieldTimeSeconds) { + checkValidBitField(bitFieldTimeSeconds, TIME_SECONDS_MASK); + int hourOfDay = getFieldFromBitField(bitFieldTimeSeconds, HOUR_MASK, HOUR_SHIFT); + int minuteOfHour = getFieldFromBitField(bitFieldTimeSeconds, MINUTE_MASK, MINUTE_SHIFT); + int secondOfMinute = getFieldFromBitField(bitFieldTimeSeconds, SECOND_MASK, SECOND_SHIFT); + LocalTime time = new LocalTime(hourOfDay, minuteOfHour, secondOfMinute); + checkValidTimeSeconds(time); + return time; + } + + /** + * Decodes {@code bitFieldTimeSeconds} as a {@link java.time.LocalTime} with seconds precision. + * + *

    Encoding is as the following: + * + *

    +   *      3         2         1
    +   * MSB 10987654321098765432109876543210 LSB
    +   *                    | H ||  M ||  S |
    +   * 
    + * + * @see #encodePacked32TimeSeconds(java.time.LocalTime) + */ + @SuppressWarnings("GoodTime-ApiWithNumericTimeUnit") + public static java.time.LocalTime decodePacked32TimeSecondsAsJavaTime(int bitFieldTimeSeconds) { + checkValidBitField(bitFieldTimeSeconds, TIME_SECONDS_MASK); + int hourOfDay = getFieldFromBitField(bitFieldTimeSeconds, HOUR_MASK, HOUR_SHIFT); + int minuteOfHour = getFieldFromBitField(bitFieldTimeSeconds, MINUTE_MASK, MINUTE_SHIFT); + int secondOfMinute = getFieldFromBitField(bitFieldTimeSeconds, SECOND_MASK, SECOND_SHIFT); + // java.time.LocalTime validates the input parameters. + try { + return java.time.LocalTime.of(hourOfDay, minuteOfHour, secondOfMinute); + } catch (java.time.DateTimeException e) { + throw new IllegalArgumentException(e.getMessage(), e); + } + } + + /** + * Encodes {@code time} as a 8-byte integer with microseconds precision. + * + *

    Encoding is as the following: + * + *

    +   *        6         5         4         3         2         1
    +   * MSB 3210987654321098765432109876543210987654321098765432109876543210 LSB
    +   *                                | H ||  M ||  S ||-------micros-----|
    +   * 
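 +   * For example (illustrative values), 12:34:56.789 encodes to
 +   * (0xC8B8L << 20) | 789_000 = 0xC8B8C0A08L, i.e. the packed seconds bit field followed by
 +   * the microsecond-of-second value.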
    + * + * @see #decodePacked64TimeMicros(long) + * @see #encodePacked64TimeMicros(java.time.LocalTime) + */ + @SuppressWarnings("GoodTime") // should accept a java.time.LocalTime + public static long encodePacked64TimeMicros(LocalTime time) { + checkValidTimeMillis(time); + return (((long) encodePacked32TimeSeconds(time)) << MICRO_LENGTH) + | (time.getMillisOfSecond() * 1_000L); + } + + /** + * Encodes {@code time} as a 8-byte integer with microseconds precision. + * + *

    Encoding is as the following: + * + *

    +   *        6         5         4         3         2         1
    +   * MSB 3210987654321098765432109876543210987654321098765432109876543210 LSB
    +   *                                | H ||  M ||  S ||-------micros-----|
    +   * 
    + * + * @see #decodePacked64TimeMicrosAsJavaTime(long) + */ + @SuppressWarnings({"GoodTime-ApiWithNumericTimeUnit", "JavaLocalTimeGetNano"}) + public static long encodePacked64TimeMicros(java.time.LocalTime time) { + checkValidTimeMicros(time); + return (((long) encodePacked32TimeSeconds(time)) << MICRO_LENGTH) | (time.getNano() / 1_000L); + } + + /** + * Decodes {@code bitFieldTimeMicros} as a {@link LocalTime} with microseconds precision. + * + *

    Encoding is as the following: + * + *

    +   *        6         5         4         3         2         1
    +   * MSB 3210987654321098765432109876543210987654321098765432109876543210 LSB
    +   *                                | H ||  M ||  S ||-------micros-----|
    +   * 
    + * + *

    Warning: LocalTime only supports milliseconds precision. Result is truncated. + * + * @see #encodePacked64TimeMicros(LocalTime) + * @see #decodePacked64TimeMicrosAsJavaTime(long) + */ + @SuppressWarnings("GoodTime") // should return a java.time.LocalTime + public static LocalTime decodePacked64TimeMicros(long bitFieldTimeMicros) { + checkValidBitField(bitFieldTimeMicros, TIME_MICROS_MASK); + int bitFieldTimeSeconds = (int) (bitFieldTimeMicros >> MICRO_LENGTH); + LocalTime timeSeconds = decodePacked32TimeSeconds(bitFieldTimeSeconds); + int microOfSecond = getFieldFromBitField(bitFieldTimeMicros, MICRO_MASK, MICRO_SHIFT); + checkValidMicroOfSecond(microOfSecond); + LocalTime time = timeSeconds.withMillisOfSecond(microOfSecond / 1_000); + checkValidTimeMillis(time); + return time; + } + + /** + * Decodes {@code bitFieldTimeMicros} as a {@link java.time.LocalTime} with microseconds + * precision. + * + *

    Encoding is as the following: + * + *

    +   *        6         5         4         3         2         1
    +   * MSB 3210987654321098765432109876543210987654321098765432109876543210 LSB
    +   *                                | H ||  M ||  S ||-------micros-----|
    +   * 
    + * + * @see #encodePacked64TimeMicros(java.time.LocalTime) + */ + @SuppressWarnings("GoodTime-ApiWithNumericTimeUnit") + public static java.time.LocalTime decodePacked64TimeMicrosAsJavaTime(long bitFieldTimeMicros) { + checkValidBitField(bitFieldTimeMicros, TIME_MICROS_MASK); + int bitFieldTimeSeconds = (int) (bitFieldTimeMicros >> MICRO_LENGTH); + java.time.LocalTime timeSeconds = decodePacked32TimeSecondsAsJavaTime(bitFieldTimeSeconds); + int microOfSecond = getFieldFromBitField(bitFieldTimeMicros, MICRO_MASK, MICRO_SHIFT); + checkValidMicroOfSecond(microOfSecond); + return timeSeconds.withNano(microOfSecond * 1_000); + } + + /** + * Encodes {@code time} as a 8-byte integer with nanoseconds precision. + * + *

    Encoding is as the following: + * + *

    +   *        6         5         4         3         2         1
    +   * MSB 3210987654321098765432109876543210987654321098765432109876543210 LSB
    +   *                      | H ||  M ||  S ||---------- nanos -----------|
    +   * 
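 +   * For example (illustrative values), 12:34:56.789 encodes to (0xC8B8L << 30) | 789_000_000L,
 +   * i.e. the packed seconds bit field followed by the nanosecond-of-second value.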
    + * + * @see #decodePacked64TimeNanos(long) + * @see #encodePacked64TimeNanos(java.time.LocalTime) + */ + @SuppressWarnings("GoodTime") // should accept a java.time.LocalTime + public static long encodePacked64TimeNanos(LocalTime time) { + checkValidTimeMillis(time); + return (((long) encodePacked32TimeSeconds(time)) << NANO_LENGTH) + | (time.getMillisOfSecond() * 1_000_000L); + } + + /** + * Encodes {@code time} as a 8-byte integer with nanoseconds precision. + * + *

    Encoding is as the following: + * + *

    +   *        6         5         4         3         2         1
    +   * MSB 3210987654321098765432109876543210987654321098765432109876543210 LSB
    +   *                      | H ||  M ||  S ||---------- nanos -----------|
    +   * 
    + * + * @see #decodePacked64TimeNanosAsJavaTime(long) + */ + @SuppressWarnings({"GoodTime-ApiWithNumericTimeUnit", "JavaLocalTimeGetNano"}) + public static long encodePacked64TimeNanos(java.time.LocalTime time) { + checkValidTimeNanos(time); + return (((long) encodePacked32TimeSeconds(time)) << NANO_LENGTH) | time.getNano(); + } + + /** + * Decodes {@code bitFieldTimeNanos} as a {@link LocalTime} with nanoseconds precision. + * + *

    Encoding is as the following: + * + *

    +   *        6         5         4         3         2         1
    +   * MSB 3210987654321098765432109876543210987654321098765432109876543210 LSB
    +   *                      | H ||  M ||  S ||---------- nanos -----------|
    +   * 
    + * + *

    Warning: LocalTime only supports milliseconds precision. Result is truncated. + * + * @see #encodePacked64TimeNanos(LocalTime) + * @see #decodePacked64TimeNanosAsJavaTime(long) + */ + @SuppressWarnings("GoodTime") // should return a java.time.LocalTime + public static LocalTime decodePacked64TimeNanos(long bitFieldTimeNanos) { + checkValidBitField(bitFieldTimeNanos, TIME_NANOS_MASK); + int bitFieldTimeSeconds = (int) (bitFieldTimeNanos >> NANO_LENGTH); + LocalTime timeSeconds = decodePacked32TimeSeconds(bitFieldTimeSeconds); + int nanoOfSecond = getFieldFromBitField(bitFieldTimeNanos, NANO_MASK, NANO_SHIFT); + checkValidNanoOfSecond(nanoOfSecond); + LocalTime time = timeSeconds.withMillisOfSecond(nanoOfSecond / 1_000_000); + checkValidTimeMillis(time); + return time; + } + + /** + * Decodes {@code bitFieldTimeNanos} as a {@link java.time.LocalTime} with nanoseconds precision. + * + *

    Encoding is as the following: + * + *

    +   *        6         5         4         3         2         1
    +   * MSB 3210987654321098765432109876543210987654321098765432109876543210 LSB
    +   *                      | H ||  M ||  S ||---------- nanos -----------|
    +   * 
    + * + * @see #encodePacked64TimeNanos(java.time.LocalTime) + */ + @SuppressWarnings("GoodTime-ApiWithNumericTimeUnit") + public static java.time.LocalTime decodePacked64TimeNanosAsJavaTime(long bitFieldTimeNanos) { + checkValidBitField(bitFieldTimeNanos, TIME_NANOS_MASK); + int bitFieldTimeSeconds = (int) (bitFieldTimeNanos >> NANO_LENGTH); + java.time.LocalTime timeSeconds = decodePacked32TimeSecondsAsJavaTime(bitFieldTimeSeconds); + int nanoOfSecond = getFieldFromBitField(bitFieldTimeNanos, NANO_MASK, NANO_SHIFT); + checkValidNanoOfSecond(nanoOfSecond); + return timeSeconds.withNano(nanoOfSecond); + } + + /** + * Encodes {@code dateTime} as a 8-byte integer with seconds precision. + * + *

    Encoding is as the following: + * + *

    +   *        6         5         4         3         2         1
    +   * MSB 3210987654321098765432109876543210987654321098765432109876543210 LSB
    +   *                             |--- year ---||m || D || H ||  M ||  S |
    +   * 
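 +   * For example (illustrative values), 2021-07-15 12:34:56 encodes to
 +   * (2021L << 26) | (7L << 22) | (15L << 17) | (12 << 12) | (34 << 6) | 56.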
    + * + * @see #decodePacked64DatetimeSeconds(long) + * @see #encodePacked64DatetimeSeconds(java.time.LocalDateTime) + */ + @SuppressWarnings("GoodTime") // should accept a java.time.LocalDateTime + public static long encodePacked64DatetimeSeconds(LocalDateTime dateTime) { + checkValidDateTimeSeconds(dateTime); + long bitFieldDatetimeSeconds = 0x0L; + bitFieldDatetimeSeconds |= (long) dateTime.getYear() << YEAR_SHIFT; + bitFieldDatetimeSeconds |= (long) dateTime.getMonthOfYear() << MONTH_SHIFT; + bitFieldDatetimeSeconds |= (long) dateTime.getDayOfMonth() << DAY_SHIFT; + bitFieldDatetimeSeconds |= (long) encodePacked32TimeSeconds(dateTime.toLocalTime()); + return bitFieldDatetimeSeconds; + } + + /** + * Encodes {@code dateTime} as a 8-byte integer with seconds precision. + * + *

    Encoding is as the following: + * + *

    +   *        6         5         4         3         2         1
    +   * MSB 3210987654321098765432109876543210987654321098765432109876543210 LSB
    +   *                             |--- year ---||m || D || H ||  M ||  S |
    +   * 
    + * + * @see #decodePacked64DatetimeSecondsAsJavaTime(long) + */ + @SuppressWarnings("GoodTime-ApiWithNumericTimeUnit") + public static long encodePacked64DatetimeSeconds(java.time.LocalDateTime dateTime) { + checkValidDateTimeSeconds(dateTime); + long bitFieldDatetimeSeconds = 0x0L; + bitFieldDatetimeSeconds |= (long) dateTime.getYear() << YEAR_SHIFT; + bitFieldDatetimeSeconds |= (long) dateTime.getMonthValue() << MONTH_SHIFT; + bitFieldDatetimeSeconds |= (long) dateTime.getDayOfMonth() << DAY_SHIFT; + bitFieldDatetimeSeconds |= (long) encodePacked32TimeSeconds(dateTime.toLocalTime()); + return bitFieldDatetimeSeconds; + } + + /** + * Decodes {@code bitFieldDatetimeSeconds} as a {@link LocalDateTime} with seconds precision. + * + *

    Encoding is as the following: + * + *

    +   *        6         5         4         3         2         1
    +   * MSB 3210987654321098765432109876543210987654321098765432109876543210 LSB
    +   *                             |--- year ---||m || D || H ||  M ||  S |
    +   * 
    + * + * @see #encodePacked64DatetimeSeconds(LocalDateTime) + * @see #decodePacked64DatetimeSecondsAsJavaTime(long) + */ + @SuppressWarnings("GoodTime") // should return a java.time.LocalDateTime + public static LocalDateTime decodePacked64DatetimeSeconds(long bitFieldDatetimeSeconds) { + checkValidBitField(bitFieldDatetimeSeconds, DATETIME_SECONDS_MASK); + int bitFieldTimeSeconds = (int) (bitFieldDatetimeSeconds & TIME_SECONDS_MASK); + LocalTime timeSeconds = decodePacked32TimeSeconds(bitFieldTimeSeconds); + int year = getFieldFromBitField(bitFieldDatetimeSeconds, YEAR_MASK, YEAR_SHIFT); + int monthOfYear = getFieldFromBitField(bitFieldDatetimeSeconds, MONTH_MASK, MONTH_SHIFT); + int dayOfMonth = getFieldFromBitField(bitFieldDatetimeSeconds, DAY_MASK, DAY_SHIFT); + LocalDateTime dateTime = + new LocalDateTime( + year, + monthOfYear, + dayOfMonth, + timeSeconds.getHourOfDay(), + timeSeconds.getMinuteOfHour(), + timeSeconds.getSecondOfMinute()); + checkValidDateTimeSeconds(dateTime); + return dateTime; + } + + /** + * Decodes {@code bitFieldDatetimeSeconds} as a {@link LocalDateTime} with seconds precision. + * + *

    Encoding is as the following: + * + *

    +   *        6         5         4         3         2         1
    +   * MSB 3210987654321098765432109876543210987654321098765432109876543210 LSB
    +   *                             |--- year ---||m || D || H ||  M ||  S |
    +   * 
    + * + * @see #encodePacked64DatetimeSeconds(java.time.LocalDateTime) + */ + @SuppressWarnings("GoodTime-ApiWithNumericTimeUnit") + public static java.time.LocalDateTime decodePacked64DatetimeSecondsAsJavaTime( + long bitFieldDatetimeSeconds) { + checkValidBitField(bitFieldDatetimeSeconds, DATETIME_SECONDS_MASK); + int bitFieldTimeSeconds = (int) (bitFieldDatetimeSeconds & TIME_SECONDS_MASK); + java.time.LocalTime timeSeconds = decodePacked32TimeSecondsAsJavaTime(bitFieldTimeSeconds); + int year = getFieldFromBitField(bitFieldDatetimeSeconds, YEAR_MASK, YEAR_SHIFT); + int monthOfYear = getFieldFromBitField(bitFieldDatetimeSeconds, MONTH_MASK, MONTH_SHIFT); + int dayOfMonth = getFieldFromBitField(bitFieldDatetimeSeconds, DAY_MASK, DAY_SHIFT); + try { + java.time.LocalDateTime dateTime = + java.time.LocalDateTime.of( + year, + monthOfYear, + dayOfMonth, + timeSeconds.getHour(), + timeSeconds.getMinute(), + timeSeconds.getSecond()); + checkValidDateTimeSeconds(dateTime); + return dateTime; + } catch (java.time.DateTimeException e) { + throw new IllegalArgumentException(e.getMessage(), e); + } + } + + /** + * Encodes {@code dateTime} as a 8-byte integer with microseconds precision. + * + *

    Encoding is as the following: + * + *

    +   *        6         5         4         3         2         1
    +   * MSB 3210987654321098765432109876543210987654321098765432109876543210 LSB
    +   *         |--- year ---||m || D || H ||  M ||  S ||-------micros-----|
    +   * 
    + * + * @see #decodePacked64DatetimeMicros(long) + * @see #encodePacked64DatetimeMicros(java.time.LocalDateTime) + */ + @SuppressWarnings("GoodTime") // should accept a java.time.LocalDateTime + public static long encodePacked64DatetimeMicros(LocalDateTime dateTime) { + checkValidDateTimeMillis(dateTime); + return (encodePacked64DatetimeSeconds(dateTime) << MICRO_LENGTH) + | (dateTime.getMillisOfSecond() * 1_000L); + } + + /** + * Encodes {@code dateTime} as a 8-byte integer with microseconds precision. + * + *

    Encoding is as the following: + * + *

    +   *        6         5         4         3         2         1
    +   * MSB 3210987654321098765432109876543210987654321098765432109876543210 LSB
    +   *         |--- year ---||m || D || H ||  M ||  S ||-------micros-----|
    +   * 
    + * + * @see #decodePacked64DatetimeMicrosAsJavaTime(long) + */ + @SuppressWarnings({"GoodTime-ApiWithNumericTimeUnit", "JavaLocalDateTimeGetNano"}) + public static long encodePacked64DatetimeMicros(java.time.LocalDateTime dateTime) { + checkValidDateTimeMicros(dateTime); + return (encodePacked64DatetimeSeconds(dateTime) << MICRO_LENGTH) + | (dateTime.getNano() / 1_000L); + } + + /** + * Decodes {@code bitFieldDatetimeMicros} as a {@link LocalDateTime} with microseconds precision. + * + *

    Encoding is as the following: + * + *

    +   *        6         5         4         3         2         1
    +   * MSB 3210987654321098765432109876543210987654321098765432109876543210 LSB
    +   *         |--- year ---||m || D || H ||  M ||  S ||-------micros-----|
    +   * 
    + * + *

    Warning: LocalDateTime only supports milliseconds precision. Result is truncated. + * + * @see #encodePacked64DatetimeMicros(LocalDateTime) + * @see #decodePacked64DatetimeMicrosAsJavaTime(long) + */ + @SuppressWarnings("GoodTime") // should return a java.time.LocalDateTime + public static LocalDateTime decodePacked64DatetimeMicros(long bitFieldDatetimeMicros) { + checkValidBitField(bitFieldDatetimeMicros, DATETIME_MICROS_MASK); + long bitFieldDatetimeSeconds = bitFieldDatetimeMicros >> MICRO_LENGTH; + LocalDateTime dateTimeSeconds = decodePacked64DatetimeSeconds(bitFieldDatetimeSeconds); + int microOfSecond = getFieldFromBitField(bitFieldDatetimeMicros, MICRO_MASK, MICRO_SHIFT); + checkValidMicroOfSecond(microOfSecond); + LocalDateTime dateTime = dateTimeSeconds.withMillisOfSecond(microOfSecond / 1_000); + checkValidDateTimeMillis(dateTime); + return dateTime; + } + + /** + * Decodes {@code bitFieldDatetimeMicros} as a {@link java.time.LocalDateTime} with microseconds + * precision. + * + *

    Encoding is as the following: + * + *

    +   *        6         5         4         3         2         1
    +   * MSB 3210987654321098765432109876543210987654321098765432109876543210 LSB
    +   *         |--- year ---||m || D || H ||  M ||  S ||-------micros-----|
    +   * 
    + * + * @see #encodePacked64DatetimeMicros(java.time.LocalDateTime) + */ + @SuppressWarnings("GoodTime-ApiWithNumericTimeUnit") + public static java.time.LocalDateTime decodePacked64DatetimeMicrosAsJavaTime( + long bitFieldDatetimeMicros) { + checkValidBitField(bitFieldDatetimeMicros, DATETIME_MICROS_MASK); + long bitFieldDatetimeSeconds = bitFieldDatetimeMicros >> MICRO_LENGTH; + java.time.LocalDateTime dateTimeSeconds = + decodePacked64DatetimeSecondsAsJavaTime(bitFieldDatetimeSeconds); + int microOfSecond = getFieldFromBitField(bitFieldDatetimeMicros, MICRO_MASK, MICRO_SHIFT); + checkValidMicroOfSecond(microOfSecond); + java.time.LocalDateTime dateTime = dateTimeSeconds.withNano(microOfSecond * 1_000); + checkValidDateTimeMicros(dateTime); + return dateTime; + } + + private static int getFieldFromBitField(long bitField, long mask, int shift) { + return (int) ((bitField & mask) >> shift); + } + + private static void checkValidTimeSeconds(LocalTime time) { + checkArgument(time.getHourOfDay() >= 0 && time.getHourOfDay() <= 23); + checkArgument(time.getMinuteOfHour() >= 0 && time.getMinuteOfHour() <= 59); + checkArgument(time.getSecondOfMinute() >= 0 && time.getSecondOfMinute() <= 59); + } + + private static void checkValidTimeSeconds(java.time.LocalTime time) { + checkArgument(time.getHour() >= 0 && time.getHour() <= 23); + checkArgument(time.getMinute() >= 0 && time.getMinute() <= 59); + checkArgument(time.getSecond() >= 0 && time.getSecond() <= 59); + } + + private static void checkValidTimeMillis(LocalTime time) { + checkValidTimeSeconds(time); + checkArgument(time.getMillisOfSecond() >= 0 && time.getMillisOfSecond() <= 999); + } + + private static void checkValidTimeMicros(java.time.LocalTime time) { + checkValidTimeSeconds(time); + checkArgument(time.equals(time.truncatedTo(ChronoUnit.MICROS))); + } + + private static void checkValidTimeNanos(java.time.LocalTime time) { + checkValidTimeSeconds(time); + } + + private static void checkValidDateTimeSeconds(LocalDateTime dateTime) { + checkArgument(dateTime.getYear() >= 1 && dateTime.getYear() <= 9999); + checkArgument(dateTime.getMonthOfYear() >= 1 && dateTime.getMonthOfYear() <= 12); + checkArgument(dateTime.getDayOfMonth() >= 1 && dateTime.getDayOfMonth() <= 31); + checkValidTimeSeconds(dateTime.toLocalTime()); + } + + private static void checkValidDateTimeSeconds(java.time.LocalDateTime dateTime) { + checkArgument(dateTime.getYear() >= 1 && dateTime.getYear() <= 9999); + checkArgument(dateTime.getMonthValue() >= 1 && dateTime.getMonthValue() <= 12); + checkArgument(dateTime.getDayOfMonth() >= 1 && dateTime.getDayOfMonth() <= 31); + checkValidTimeSeconds(dateTime.toLocalTime()); + } + + private static void checkValidDateTimeMillis(LocalDateTime dateTime) { + checkValidDateTimeSeconds(dateTime); + checkArgument(dateTime.getMillisOfSecond() >= 0 && dateTime.getMillisOfSecond() <= 999); + } + + private static void checkValidDateTimeMicros(java.time.LocalDateTime dateTime) { + checkValidDateTimeSeconds(dateTime); + checkArgument(dateTime.equals(dateTime.truncatedTo(ChronoUnit.MICROS))); + } + + private static void checkValidDateTimeNanos(java.time.LocalDateTime dateTime) { + checkValidDateTimeSeconds(dateTime); + } + + private static void checkValidMicroOfSecond(int microOfSecond) { + checkArgument(microOfSecond >= 0 && microOfSecond <= 999999); + } + + private static void checkValidNanoOfSecond(int nanoOfSecond) { + checkArgument(nanoOfSecond >= 0 && nanoOfSecond <= 999999999); + } + + private static void checkValidBitField(long 
bitField, long mask) { + checkArgument((bitField & ~mask) == 0x0L); + } + + private CivilTimeEncoder() {} +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/CreateTableHelpers.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/CreateTableHelpers.java new file mode 100644 index 000000000000..11e6b329e0f9 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/CreateTableHelpers.java @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.bigquery; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import com.google.api.services.bigquery.model.EncryptionConfiguration; +import com.google.api.services.bigquery.model.Table; +import com.google.api.services.bigquery.model.TableReference; +import com.google.api.services.bigquery.model.TableSchema; +import java.util.Collections; +import java.util.Set; +import java.util.concurrent.ConcurrentHashMap; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition; +import org.apache.beam.sdk.io.gcp.bigquery.BigQueryServices.DatasetService; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Strings; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Supplier; + +public class CreateTableHelpers { + /** + * The list of tables created so far, so we don't try the creation each time. + * + *

    TODO: We should put a bound on memory usage of this. Use guava cache instead. + */ + private static Set createdTables = + Collections.newSetFromMap(new ConcurrentHashMap()); + + static TableDestination possiblyCreateTable( + DoFn.ProcessContext context, + TableDestination tableDestination, + Supplier schemaSupplier, + CreateDisposition createDisposition, + Coder tableDestinationCoder, + String kmsKey, + BigQueryServices bqServices) { + checkArgument( + tableDestination.getTableSpec() != null, + "DynamicDestinations.getTable() must return a TableDestination " + + "with a non-null table spec, but %s returned %s for destination %s," + + "which has a null table spec", + tableDestination); + boolean destinationCoderSupportsClustering = + !(tableDestinationCoder instanceof TableDestinationCoderV2); + checkArgument( + tableDestination.getClustering() == null || destinationCoderSupportsClustering, + "DynamicDestinations.getTable() may only return destinations with clustering configured" + + " if a destination coder is supplied that supports clustering, but %s is configured" + + " to use TableDestinationCoderV2. Set withClustering() on BigQueryIO.write() and, " + + " if you provided a custom DynamicDestinations instance, override" + + " getDestinationCoder() to return TableDestinationCoderV3."); + TableReference tableReference = tableDestination.getTableReference().clone(); + if (Strings.isNullOrEmpty(tableReference.getProjectId())) { + tableReference.setProjectId( + context.getPipelineOptions().as(BigQueryOptions.class).getProject()); + tableDestination = tableDestination.withTableReference(tableReference); + } + if (createDisposition == CreateDisposition.CREATE_NEVER) { + return tableDestination; + } + + String tableSpec = BigQueryHelpers.stripPartitionDecorator(tableDestination.getTableSpec()); + if (!createdTables.contains(tableSpec)) { + // Another thread may have succeeded in creating the table in the meanwhile, so + // check again. This check isn't needed for correctness, but we add it to prevent + // every thread from attempting a create and overwhelming our BigQuery quota. + synchronized (createdTables) { + if (!createdTables.contains(tableSpec)) { + tryCreateTable( + context, + schemaSupplier, + tableDestination, + createDisposition, + tableSpec, + kmsKey, + bqServices); + } + } + } + return tableDestination; + } + + @SuppressWarnings({"nullness"}) + private static void tryCreateTable( + DoFn.ProcessContext context, + Supplier schemaSupplier, + TableDestination tableDestination, + CreateDisposition createDisposition, + String tableSpec, + String kmsKey, + BigQueryServices bqServices) { + TableReference tableReference = tableDestination.getTableReference().clone(); + tableReference.setTableId(BigQueryHelpers.stripPartitionDecorator(tableReference.getTableId())); + try (DatasetService datasetService = + bqServices.getDatasetService(context.getPipelineOptions().as(BigQueryOptions.class))) { + if (datasetService.getTable(tableReference) == null) { + TableSchema tableSchema = schemaSupplier.get(); + checkArgument( + tableSchema != null, + "Unless create disposition is %s, a schema must be specified, i.e. " + + "DynamicDestinations.getSchema() may not return null. 
" + + "However, create disposition is %s, and " + + " %s returned null for destination %s", + CreateDisposition.CREATE_NEVER, + createDisposition, + tableDestination); + Table table = new Table().setTableReference(tableReference).setSchema(tableSchema); + if (tableDestination.getTableDescription() != null) { + table = table.setDescription(tableDestination.getTableDescription()); + } + if (tableDestination.getTimePartitioning() != null) { + table.setTimePartitioning(tableDestination.getTimePartitioning()); + if (tableDestination.getClustering() != null) { + table.setClustering(tableDestination.getClustering()); + } + } + if (kmsKey != null) { + table.setEncryptionConfiguration(new EncryptionConfiguration().setKmsKeyName(kmsKey)); + } + datasetService.createTable(table); + } + } catch (Exception e) { + throw new RuntimeException(e); + } + createdTables.add(tableSpec); + } + + @VisibleForTesting + static void clearCreatedTables() { + synchronized (createdTables) { + createdTables.clear(); + } + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/CreateTables.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/CreateTables.java index 51af95d4af29..d9bb6dd57192 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/CreateTables.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/CreateTables.java @@ -19,9 +19,6 @@ import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; -import com.google.api.services.bigquery.model.EncryptionConfiguration; -import com.google.api.services.bigquery.model.Table; -import com.google.api.services.bigquery.model.TableReference; import com.google.api.services.bigquery.model.TableSchema; import java.util.Collections; import java.util.List; @@ -29,7 +26,6 @@ import java.util.Set; import java.util.concurrent.ConcurrentHashMap; import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition; -import org.apache.beam.sdk.io.gcp.bigquery.BigQueryServices.DatasetService; import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.transforms.ParDo; @@ -37,7 +33,7 @@ import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionView; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Strings; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Supplier; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; @@ -110,115 +106,35 @@ public void startBundle() { @ProcessElement public void processElement(ProcessContext context) { dynamicDestinations.setSideInputAccessorFromProcessContext(context); - context.output( - KV.of( - destinations.computeIfAbsent( - context.element().getKey(), dest -> getTableDestination(context, dest)), - context.element().getValue())); - } - - private TableDestination getTableDestination(ProcessContext context, DestinationT destination) { - TableDestination tableDestination = dynamicDestinations.getTable(destination); - checkArgument( - tableDestination != null, - "DynamicDestinations.getTable() may not return null, " - + "but %s returned null for destination %s", - dynamicDestinations, - destination); 
- checkArgument( - tableDestination.getTableSpec() != null, - "DynamicDestinations.getTable() must return a TableDestination " - + "with a non-null table spec, but %s returned %s for destination %s," - + "which has a null table spec", - dynamicDestinations, - tableDestination, - destination); - boolean destinationCoderSupportsClustering = - !(dynamicDestinations.getDestinationCoder() instanceof TableDestinationCoderV2); - checkArgument( - tableDestination.getClustering() == null || destinationCoderSupportsClustering, - "DynamicDestinations.getTable() may only return destinations with clustering configured" - + " if a destination coder is supplied that supports clustering, but %s is configured" - + " to use TableDestinationCoderV2. Set withClustering() on BigQueryIO.write() and, " - + " if you provided a custom DynamicDestinations instance, override" - + " getDestinationCoder() to return TableDestinationCoderV3.", - dynamicDestinations); - TableReference tableReference = tableDestination.getTableReference().clone(); - if (Strings.isNullOrEmpty(tableReference.getProjectId())) { - tableReference.setProjectId( - context.getPipelineOptions().as(BigQueryOptions.class).getProject()); - tableDestination = tableDestination.withTableReference(tableReference); - } - if (createDisposition == CreateDisposition.CREATE_NEVER) { - return tableDestination; - } - - String tableSpec = BigQueryHelpers.stripPartitionDecorator(tableDestination.getTableSpec()); - if (!createdTables.contains(tableSpec)) { - // Another thread may have succeeded in creating the table in the meanwhile, so - // check again. This check isn't needed for correctness, but we add it to prevent - // every thread from attempting a create and overwhelming our BigQuery quota. - synchronized (createdTables) { - if (!createdTables.contains(tableSpec)) { - tryCreateTable(context, destination, tableDestination, tableSpec, kmsKey); - } - } - } - return tableDestination; - } - - private void tryCreateTable( - ProcessContext context, - DestinationT destination, - TableDestination tableDestination, - String tableSpec, - String kmsKey) { - DatasetService datasetService = - bqServices.getDatasetService(context.getPipelineOptions().as(BigQueryOptions.class)); - TableReference tableReference = tableDestination.getTableReference().clone(); - tableReference.setTableId( - BigQueryHelpers.stripPartitionDecorator(tableReference.getTableId())); - try { - if (datasetService.getTable(tableReference) == null) { - TableSchema tableSchema = dynamicDestinations.getSchema(destination); - checkArgument( - tableSchema != null, - "Unless create disposition is %s, a schema must be specified, i.e. " - + "DynamicDestinations.getSchema() may not return null. 
" - + "However, create disposition is %s, and " - + " %s returned null for destination %s", - CreateDisposition.CREATE_NEVER, - createDisposition, - dynamicDestinations, - destination); - Table table = - new Table() - .setTableReference(tableReference) - .setSchema(tableSchema) - .setDescription(tableDestination.getTableDescription()); - if (tableDestination.getTimePartitioning() != null) { - table.setTimePartitioning(tableDestination.getTimePartitioning()); - if (tableDestination.getClustering() != null) { - table.setClustering(tableDestination.getClustering()); - } - } - if (kmsKey != null) { - table.setEncryptionConfiguration(new EncryptionConfiguration().setKmsKeyName(kmsKey)); - } - datasetService.createTable(table); - } - } catch (Exception e) { - throw new RuntimeException(e); - } - createdTables.add(tableSpec); + TableDestination tableDestination = + destinations.computeIfAbsent( + context.element().getKey(), + dest -> { + TableDestination tableDestination1 = dynamicDestinations.getTable(dest); + checkArgument( + tableDestination1 != null, + "DynamicDestinations.getTable() may not return null, " + + "but %s returned null for destination %s", + dynamicDestinations, + dest); + Supplier schemaSupplier = () -> dynamicDestinations.getSchema(dest); + return CreateTableHelpers.possiblyCreateTable( + context, + tableDestination1, + schemaSupplier, + createDisposition, + dynamicDestinations.getDestinationCoder(), + kmsKey, + bqServices); + }); + + context.output(KV.of(tableDestination, context.element().getValue())); } } /** This method is used by the testing fake to clear static state. */ @VisibleForTesting static void clearCreatedTables() { - synchronized (createdTables) { - createdTables.clear(); - } + CreateTableHelpers.clearCreatedTables(); } } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/DynamicDestinationsHelpers.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/DynamicDestinationsHelpers.java index 18af86ca04dc..fb13ba85cff5 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/DynamicDestinationsHelpers.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/DynamicDestinationsHelpers.java @@ -34,6 +34,7 @@ import org.apache.beam.sdk.coders.CoderRegistry; import org.apache.beam.sdk.extensions.gcp.util.BackOffAdapter; import org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.JsonTableRefToTableSpec; +import org.apache.beam.sdk.io.gcp.bigquery.BigQueryServices.DatasetService; import org.apache.beam.sdk.options.ValueProvider; import org.apache.beam.sdk.options.ValueProvider.NestedValueProvider; import org.apache.beam.sdk.transforms.DoFn; @@ -356,11 +357,18 @@ private Table getBigQueryTable(TableReference tableReference) { try { BigQueryOptions bqOptions = getPipelineOptions().as(BigQueryOptions.class); if (tableReference.getProjectId() == null) { - tableReference.setProjectId(bqOptions.getProject()); + tableReference.setProjectId( + bqOptions.getBigQueryProject() == null + ? 
bqOptions.getProject() + : bqOptions.getBigQueryProject()); } - return bqServices.getDatasetService(bqOptions).getTable(tableReference); - } catch (InterruptedException | IOException e) { - LOG.info("Failed to get BigQuery table " + tableReference); + try (DatasetService datasetService = bqServices.getDatasetService(bqOptions)) { + return datasetService.getTable(tableReference); + } catch (InterruptedException | IOException e) { + LOG.info("Failed to get BigQuery table " + tableReference); + } + } catch (Exception e) { + throw new RuntimeException(e); } } while (nextBackOff(Sleeper.DEFAULT, backoff)); } catch (InterruptedException e) { diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/RetryManager.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/RetryManager.java new file mode 100644 index 000000000000..16ae19712aa3 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/RetryManager.java @@ -0,0 +1,283 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.bigquery; + +import com.google.api.core.ApiFuture; +import com.google.api.core.ApiFutureCallback; +import com.google.api.core.ApiFutures; +import java.util.Queue; +import java.util.concurrent.CountDownLatch; +import java.util.concurrent.Executor; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Executors; +import java.util.concurrent.TimeUnit; +import java.util.function.Consumer; +import java.util.function.Function; +import java.util.stream.Collectors; +import javax.annotation.Nullable; +import org.apache.beam.sdk.io.gcp.bigquery.RetryManager.Operation.Context; +import org.apache.beam.sdk.util.BackOff; +import org.apache.beam.sdk.util.BackOffUtils; +import org.apache.beam.sdk.util.FluentBackoff; +import org.apache.beam.sdk.util.Sleeper; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Queues; +import org.joda.time.Duration; + +/** + * Retry manager used by Storage API operations. This class manages a sequence of operations (e.g. + * sequential appends to a stream) and retries of those operations. If any one operation fails, then + * all subsequent operations are expected to fail true and will alll be retried. + */ +class RetryManager> { + private Queue> operations; + private final BackOff backoff; + private final ExecutorService executor; + + // Enum returned by onError indicating whether errors should be retried. + enum RetryType { + // The in-flight operations will not be retried. + DONT_RETRY, + // All operations will be retried. 
+ RETRY_ALL_OPERATIONS + }; + + static class WrappedFailure extends Throwable { + @Nullable private final Object result; + + public WrappedFailure(@Nullable Object result) { + this.result = result; + } + + @Nullable + Object getResult() { + return result; + } + } + + RetryManager(Duration initialBackoff, Duration maxBackoff, int maxRetries) { + this.operations = Queues.newArrayDeque(); + backoff = + FluentBackoff.DEFAULT + .withInitialBackoff(initialBackoff) + .withMaxBackoff(maxBackoff) + .withMaxRetries(maxRetries) + .backoff(); + this.executor = Executors.newCachedThreadPool(); + } + + static class Operation> { + static class Context { + private @Nullable Throwable error = null; + private @Nullable ResultT result = null; + + public void setError(@Nullable Throwable error) { + this.error = error; + } + + public @Nullable Throwable getError() { + return error; + } + + public void setResult(@Nullable ResultT result) { + this.result = result; + } + + public @Nullable ResultT getResult() { + return result; + } + } + + private final Function> runOperation; + private final Function, RetryType> onError; + private final Consumer onSuccess; + private final Function hasSucceeded; + @Nullable private ApiFuture future = null; + @Nullable private Callback callback = null; + @Nullable ContextT context = null; + + public Operation( + Function> runOperation, + Function, RetryType> onError, + Consumer onSuccess, + Function hasSucceeded, + ContextT context) { + this.runOperation = runOperation; + this.onError = onError; + this.onSuccess = onSuccess; + this.hasSucceeded = hasSucceeded; + this.context = context; + } + + @SuppressWarnings({"nullness"}) + void run(Executor executor) { + this.future = runOperation.apply(context); + this.callback = new Callback<>(hasSucceeded); + ApiFutures.addCallback(future, callback, executor); + } + + @SuppressWarnings({"nullness"}) + boolean await() throws Exception { + callback.await(); + return callback.getFailed(); + } + } + + private static class Callback implements ApiFutureCallback { + private final CountDownLatch waiter; + private final Function hasSucceeded; + @Nullable private Throwable failure = null; + boolean failed = false; + + Callback(Function hasSucceeded) { + this.waiter = new CountDownLatch(1); + this.hasSucceeded = hasSucceeded; + } + + void await() throws InterruptedException { + waiter.await(); + } + + boolean await(long timeoutSec) throws InterruptedException { + return waiter.await(timeoutSec, TimeUnit.SECONDS); + } + + @Override + public void onFailure(Throwable t) { + synchronized (this) { + failure = t; + failed = true; + } + waiter.countDown(); + } + + @Override + public void onSuccess(ResultT result) { + synchronized (this) { + if (hasSucceeded.apply(result)) { + failure = null; + } else { + failure = new WrappedFailure(result); + failed = true; + } + } + waiter.countDown(); + } + + @Nullable + Throwable getFailure() { + synchronized (this) { + return failure; + } + } + + boolean getFailed() { + synchronized (this) { + return failed; + } + } + } + + void addOperation( + Function> runOperation, + Function, RetryType> onError, + Consumer onSuccess, + ContextT context) + throws Exception { + addOperation(runOperation, onError, onSuccess, r -> true, context); + } + + void addOperation( + Function> runOperation, + Function, RetryType> onError, + Consumer onSuccess, + Function hasSucceeded, + ContextT context) + throws Exception { + addOperation(new Operation<>(runOperation, onError, onSuccess, hasSucceeded, context)); + } + + void addAndRunOperation( 
+ Function> runOperation, + Function, RetryType> onError, + Consumer onSuccess, + ContextT context) + throws Exception { + addAndRunOperation(new Operation<>(runOperation, onError, onSuccess, r -> true, context)); + } + + void addAndRunOperation( + Function> runOperation, + Function, RetryType> onError, + Consumer onSuccess, + Function hasSucceeded, + ContextT context) + throws Exception { + addAndRunOperation(new Operation<>(runOperation, onError, onSuccess, hasSucceeded, context)); + } + + void addOperation(Operation operation) { + operations.add(operation); + } + + void addAndRunOperation(Operation operation) { + operation.run(executor); + operations.add(operation); + } + + void run(boolean await) throws Exception { + for (Operation operation : operations) { + operation.run(executor); + } + if (await) { + await(); + } + } + + @SuppressWarnings({"nullness"}) + void await() throws Exception { + while (!this.operations.isEmpty()) { + Operation operation = this.operations.element(); + boolean failed = operation.await(); + if (failed) { + Throwable failure = operation.callback.getFailure(); + operation.context.setError(failure); + RetryType retryType = + operation.onError.apply( + operations.stream().map(o -> o.context).collect(Collectors.toList())); + if (retryType == RetryType.DONT_RETRY) { + operations.clear(); + } else { + Preconditions.checkState(RetryType.RETRY_ALL_OPERATIONS == retryType); + if (!BackOffUtils.next(Sleeper.DEFAULT, backoff)) { + throw new RuntimeException(failure); + } + for (Operation awaitOperation : operations) { + awaitOperation.await(); + } + // Run all the operations again. + run(false); + } + } else { + operation.context.setResult(operation.future.get()); + operation.onSuccess.accept(operation.context); + operations.remove(); + } + } + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiConvertMessages.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiConvertMessages.java new file mode 100644 index 000000000000..d58c7af8cf66 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiConvertMessages.java @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.bigquery; + +import org.apache.beam.sdk.io.gcp.bigquery.StorageApiDynamicDestinations.MessageConverter; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; + +/** + * A transform that converts messages to protocol buffers in preparation for writing to BigQuery. 
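+ *
+ * <p>Concretely, the {@code ConvertMessagesDoFn} below maps each input {@code KV<DestinationT, ElementT>}
+ * to a {@code KV<DestinationT, byte[]>} by looking up the destination's {@code MessageConverter}
+ * and serializing {@code messageConverter.toMessage(element)} to bytes.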
+ */ +public class StorageApiConvertMessages + extends PTransform< + PCollection>, PCollection>> { + private final StorageApiDynamicDestinations dynamicDestinations; + + public StorageApiConvertMessages( + StorageApiDynamicDestinations dynamicDestinations) { + this.dynamicDestinations = dynamicDestinations; + } + + @Override + public PCollection> expand( + PCollection> input) { + String operationName = input.getName() + "/" + getName(); + + return input.apply( + "Convert to message", + ParDo.of(new ConvertMessagesDoFn<>(dynamicDestinations, operationName)) + .withSideInputs(dynamicDestinations.getSideInputs())); + } + + public static class ConvertMessagesDoFn + extends DoFn, KV> { + private final StorageApiDynamicDestinations dynamicDestinations; + private TwoLevelMessageConverterCache messageConverters; + + ConvertMessagesDoFn( + StorageApiDynamicDestinations dynamicDestinations, + String operationName) { + this.dynamicDestinations = dynamicDestinations; + this.messageConverters = new TwoLevelMessageConverterCache<>(operationName); + } + + @ProcessElement + public void processElement( + ProcessContext c, + @Element KV element, + OutputReceiver> o) + throws Exception { + dynamicDestinations.setSideInputAccessorFromProcessContext(c); + MessageConverter messageConverter = + messageConverters.get(element.getKey(), dynamicDestinations); + o.output( + KV.of(element.getKey(), messageConverter.toMessage(element.getValue()).toByteArray())); + } + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiDynamicDestinations.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiDynamicDestinations.java new file mode 100644 index 000000000000..65e14c602962 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiDynamicDestinations.java @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.bigquery; + +import com.google.api.services.bigquery.model.TableSchema; +import com.google.protobuf.Descriptors.Descriptor; +import com.google.protobuf.Message; +import java.util.List; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.values.PCollectionView; +import org.apache.beam.sdk.values.ValueInSingleWindow; +import org.checkerframework.checker.nullness.qual.Nullable; + +/** Base dynamicDestinations class used by the Storage API sink. 
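+ *
+ * <p>Implementations wrap an existing {@code DynamicDestinations} instance, delegate the standard
+ * callbacks ({@code getDestination}, {@code getTable}, {@code getSchema}, side inputs) to it, and
+ * add {@code getMessageConverter}, which supplies the protocol-buffer descriptor and per-element
+ * conversion used by the Storage API write path.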
*/ +abstract class StorageApiDynamicDestinations + extends DynamicDestinations { + public interface MessageConverter { + Descriptor getSchemaDescriptor(); + + Message toMessage(T element); + } + + private DynamicDestinations inner; + + StorageApiDynamicDestinations(DynamicDestinations inner) { + this.inner = inner; + } + + public abstract MessageConverter getMessageConverter(DestinationT destination) + throws Exception; + + @Override + public DestinationT getDestination(ValueInSingleWindow element) { + return inner.getDestination(element); + } + + @Override + public @Nullable Coder getDestinationCoder() { + return inner.getDestinationCoder(); + } + + @Override + public TableDestination getTable(DestinationT destination) { + return inner.getTable(destination); + } + + @Override + public TableSchema getSchema(DestinationT destination) { + return inner.getSchema(destination); + } + + @Override + public List> getSideInputs() { + return inner.getSideInputs(); + } + + @Override + void setSideInputAccessorFromProcessContext(DoFn.ProcessContext context) { + super.setSideInputAccessorFromProcessContext(context); + inner.setSideInputAccessorFromProcessContext(context); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiDynamicDestinationsBeamRow.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiDynamicDestinationsBeamRow.java new file mode 100644 index 000000000000..40814a0a359d --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiDynamicDestinationsBeamRow.java @@ -0,0 +1,65 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.bigquery; + +import com.google.protobuf.Descriptors.Descriptor; +import com.google.protobuf.Message; +import java.time.Duration; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.Cache; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.CacheBuilder; + +@SuppressWarnings({"nullness"}) +/** Storage API DynamicDestinations used when the input is a Beam Row. 
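+ *
+ * <p>Elements are converted to {@code Row} via the supplied {@code toRow} function and then to
+ * protocol-buffer messages with {@code BeamRowToStorageApiProto}; the generated descriptor is
+ * cached per destination (15-minute expire-after-access).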
*/ +class StorageApiDynamicDestinationsBeamRow + extends StorageApiDynamicDestinations { + private final Schema schema; + private final SerializableFunction toRow; + private final Cache destinationDescriptorCache = + CacheBuilder.newBuilder().expireAfterAccess(Duration.ofMinutes(15)).build(); + + StorageApiDynamicDestinationsBeamRow( + DynamicDestinations inner, + Schema schema, + SerializableFunction toRow) { + super(inner); + this.schema = schema; + this.toRow = toRow; + } + + @Override + public MessageConverter getMessageConverter(DestinationT destination) throws Exception { + return new MessageConverter() { + Descriptor descriptor = + destinationDescriptorCache.get( + destination, () -> BeamRowToStorageApiProto.getDescriptorFromSchema(schema)); + + @Override + public Descriptor getSchemaDescriptor() { + return descriptor; + } + + @Override + public Message toMessage(T element) { + return BeamRowToStorageApiProto.messageFromBeamRow(descriptor, toRow.apply(element)); + } + }; + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiDynamicDestinationsTableRow.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiDynamicDestinationsTableRow.java new file mode 100644 index 000000000000..0204488079e4 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiDynamicDestinationsTableRow.java @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.bigquery; + +import com.google.api.services.bigquery.model.TableRow; +import com.google.api.services.bigquery.model.TableSchema; +import com.google.protobuf.Descriptors.Descriptor; +import com.google.protobuf.Message; +import java.time.Duration; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.Cache; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.CacheBuilder; + +@SuppressWarnings({"nullness"}) +public class StorageApiDynamicDestinationsTableRow + extends StorageApiDynamicDestinations { + private final SerializableFunction formatFunction; + + // TODO: Make static! Or at least optimize the constant schema case. 
+ private final Cache destinationDescriptorCache = + CacheBuilder.newBuilder().expireAfterAccess(Duration.ofMinutes(15)).build(); + + StorageApiDynamicDestinationsTableRow( + DynamicDestinations inner, + SerializableFunction formatFunction) { + super(inner); + this.formatFunction = formatFunction; + } + + @Override + public MessageConverter getMessageConverter(DestinationT destination) throws Exception { + final TableSchema tableSchema = getSchema(destination); + if (tableSchema == null) { + throw new RuntimeException( + "Schema must be set when writing TableRows using Storage API. Use " + + "BigQueryIO.Write.withSchema to set the schema."); + } + return new MessageConverter() { + Descriptor descriptor = + destinationDescriptorCache.get( + destination, + () -> TableRowToStorageApiProto.getDescriptorFromTableSchema(tableSchema)); + + @Override + public Descriptor getSchemaDescriptor() { + return descriptor; + } + + @Override + public Message toMessage(T element) { + return TableRowToStorageApiProto.messageFromTableRow( + descriptor, formatFunction.apply(element)); + } + }; + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiFinalizeWritesDoFn.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiFinalizeWritesDoFn.java new file mode 100644 index 000000000000..f935a638fa03 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiFinalizeWritesDoFn.java @@ -0,0 +1,183 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.bigquery; + +import com.google.cloud.bigquery.storage.v1beta2.BatchCommitWriteStreamsResponse; +import com.google.cloud.bigquery.storage.v1beta2.FinalizeWriteStreamResponse; +import com.google.cloud.bigquery.storage.v1beta2.StorageError; +import com.google.cloud.bigquery.storage.v1beta2.StorageError.StorageErrorCode; +import java.io.IOException; +import java.util.Collection; +import java.util.Map; +import java.util.Set; +import javax.annotation.Nullable; +import org.apache.beam.sdk.io.gcp.bigquery.BigQueryServices.DatasetService; +import org.apache.beam.sdk.io.gcp.bigquery.RetryManager.Operation.Context; +import org.apache.beam.sdk.io.gcp.bigquery.RetryManager.RetryType; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.Metrics; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Sets; +import org.joda.time.Duration; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** This DoFn finalizes and commits Storage API streams. */ +class StorageApiFinalizeWritesDoFn extends DoFn, Void> { + private static final Logger LOG = LoggerFactory.getLogger(StorageApiFinalizeWritesDoFn.class); + + private final Counter finalizeOperationsSent = + Metrics.counter(StorageApiFinalizeWritesDoFn.class, "finalizeOperationsSent"); + private final Counter finalizeOperationsSucceeded = + Metrics.counter(StorageApiFinalizeWritesDoFn.class, "finalizeOperationsSucceeded"); + private final Counter finalizeOperationsFailed = + Metrics.counter(StorageApiFinalizeWritesDoFn.class, "finalizeOperationsFailed"); + private final Counter batchCommitOperationsSent = + Metrics.counter(StorageApiFinalizeWritesDoFn.class, "batchCommitOperationsSent"); + private final Counter batchCommitOperationsSucceeded = + Metrics.counter(StorageApiFinalizeWritesDoFn.class, "batchCommitOperationsSucceeded"); + private final Counter batchCommitOperationsFailed = + Metrics.counter(StorageApiFinalizeWritesDoFn.class, "batchCommitOperationsFailed"); + + private Map> commitStreams; + private final BigQueryServices bqServices; + @Nullable private DatasetService datasetService; + + public StorageApiFinalizeWritesDoFn(BigQueryServices bqServices) { + this.bqServices = bqServices; + this.commitStreams = Maps.newHashMap(); + this.datasetService = null; + } + + private DatasetService getDatasetService(PipelineOptions pipelineOptions) throws IOException { + if (datasetService == null) { + datasetService = bqServices.getDatasetService(pipelineOptions.as(BigQueryOptions.class)); + } + return datasetService; + } + + @Teardown + public void onTeardown() { + try { + if (datasetService != null) { + datasetService.close(); + datasetService = null; + } + } catch (Exception e) { + throw new RuntimeException(e); + } + } + + @StartBundle + public void startBundle() throws IOException { + commitStreams = Maps.newHashMap(); + } + + @ProcessElement + @SuppressWarnings({"nullness"}) + public void process(PipelineOptions pipelineOptions, @Element KV element) + throws Exception { + String tableId = element.getKey(); + String streamId = element.getValue(); + DatasetService datasetService = 
getDatasetService(pipelineOptions); + + RetryManager> retryManager = + new RetryManager<>(Duration.standardSeconds(1), Duration.standardMinutes(1), 3); + retryManager.addOperation( + c -> { + finalizeOperationsSent.inc(); + return datasetService.finalizeWriteStream(streamId); + }, + contexts -> { + LOG.error( + "Finalize of stream " + + streamId + + " failed with " + + Iterables.getFirst(contexts, null).getError()); + finalizeOperationsFailed.inc(); + return RetryType.RETRY_ALL_OPERATIONS; + }, + c -> { + LOG.info("Finalize of stream " + streamId + " finished with " + c.getResult()); + finalizeOperationsSucceeded.inc(); + commitStreams.computeIfAbsent(tableId, d -> Lists.newArrayList()).add(streamId); + }, + new Context<>()); + retryManager.run(true); + } + + @FinishBundle + @SuppressWarnings({"nullness"}) + public void finishBundle(PipelineOptions pipelineOptions) throws Exception { + DatasetService datasetService = getDatasetService(pipelineOptions); + for (Map.Entry> entry : commitStreams.entrySet()) { + final String tableId = entry.getKey(); + final Collection streamNames = entry.getValue(); + final Set alreadyCommittedStreams = Sets.newHashSet(); + RetryManager> + retryManager = + new RetryManager<>(Duration.standardSeconds(1), Duration.standardMinutes(1), 3); + retryManager.addOperation( + c -> { + Iterable streamsToCommit = + Iterables.filter(streamNames, s -> !alreadyCommittedStreams.contains(s)); + batchCommitOperationsSent.inc(); + return datasetService.commitWriteStreams(tableId, streamsToCommit); + }, + contexts -> { + LOG.error( + "BatchCommit failed. tableId " + + tableId + + " streamNames " + + streamNames + + " error: " + + Iterables.getFirst(contexts, null).getError()); + batchCommitOperationsFailed.inc(); + return RetryType.RETRY_ALL_OPERATIONS; + }, + c -> { + LOG.info("BatchCommit succeeded for tableId " + tableId + " response " + c.getResult()); + batchCommitOperationsSucceeded.inc(); + }, + response -> { + if (!response.hasCommitTime()) { + for (StorageError storageError : response.getStreamErrorsList()) { + if (storageError.getCode() == StorageErrorCode.STREAM_ALREADY_COMMITTED) { + // Make sure that we don't retry any streams that are already committed. + alreadyCommittedStreams.add(storageError.getEntity()); + } + } + Iterable streamsToCommit = + Iterables.filter(streamNames, s -> !alreadyCommittedStreams.contains(s)); + // If there are no more streams left to commit, then report this operation as having + // succeeded. Otherwise, + // retry. + return Iterables.isEmpty(streamsToCommit); + } + return true; + }, + new Context<>()); + retryManager.run(true); + } + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiFlushAndFinalizeDoFn.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiFlushAndFinalizeDoFn.java new file mode 100644 index 000000000000..4cf312e22c18 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiFlushAndFinalizeDoFn.java @@ -0,0 +1,219 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.bigquery; + +import com.google.api.gax.rpc.ApiException; +import com.google.api.gax.rpc.StatusCode.Code; +import com.google.cloud.bigquery.storage.v1beta2.FinalizeWriteStreamResponse; +import com.google.cloud.bigquery.storage.v1beta2.FlushRowsResponse; +import java.io.IOException; +import java.io.Serializable; +import java.time.Instant; +import java.util.Objects; +import javax.annotation.Nullable; +import org.apache.beam.sdk.io.gcp.bigquery.BigQueryServices.DatasetService; +import org.apache.beam.sdk.io.gcp.bigquery.RetryManager.Operation.Context; +import org.apache.beam.sdk.io.gcp.bigquery.RetryManager.RetryType; +import org.apache.beam.sdk.io.gcp.bigquery.StorageApiFlushAndFinalizeDoFn.Operation; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.Distribution; +import org.apache.beam.sdk.metrics.Metrics; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.schemas.JavaFieldSchema; +import org.apache.beam.sdk.schemas.annotations.DefaultSchema; +import org.apache.beam.sdk.schemas.annotations.SchemaCreate; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; +import org.joda.time.Duration; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** This DoFn flushes and optionally (if requested) finalizes Storage API streams. 
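+ *
+ * <p>Each input element pairs a stream name with an {@code Operation}: a non-negative
+ * {@code flushOffset} flushes the stream up to that offset, and {@code finalizeStream} then
+ * finalizes it. Both calls go through a {@code RetryManager} with bounded exponential backoff.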
*/ +public class StorageApiFlushAndFinalizeDoFn extends DoFn, Void> { + private static final Logger LOG = LoggerFactory.getLogger(StorageApiFlushAndFinalizeDoFn.class); + + private final BigQueryServices bqServices; + @Nullable private DatasetService datasetService = null; + private final Counter flushOperationsSent = + Metrics.counter(StorageApiFlushAndFinalizeDoFn.class, "flushOperationsSent"); + private final Counter flushOperationsSucceeded = + Metrics.counter(StorageApiFlushAndFinalizeDoFn.class, "flushOperationsSucceeded"); + private final Counter flushOperationsFailed = + Metrics.counter(StorageApiFlushAndFinalizeDoFn.class, "flushOperationsFailed"); + private final Counter flushOperationsAlreadyExists = + Metrics.counter(StorageApiFlushAndFinalizeDoFn.class, "flushOperationsAlreadyExists"); + private final Counter flushOperationsInvalidArgument = + Metrics.counter(StorageApiFlushAndFinalizeDoFn.class, "flushOperationsInvalidArgument"); + private final Distribution flushLatencyDistribution = + Metrics.distribution(StorageApiFlushAndFinalizeDoFn.class, "flushOperationLatencyMs"); + private final Counter finalizeOperationsSent = + Metrics.counter(StorageApiFlushAndFinalizeDoFn.class, "finalizeOperationsSent"); + private final Counter finalizeOperationsSucceeded = + Metrics.counter(StorageApiFlushAndFinalizeDoFn.class, "finalizeOperationsSucceeded"); + private final Counter finalizeOperationsFailed = + Metrics.counter(StorageApiFlushAndFinalizeDoFn.class, "finalizeOperationsFailed"); + + @DefaultSchema(JavaFieldSchema.class) + static class Operation implements Comparable, Serializable { + final long flushOffset; + final boolean finalizeStream; + + @SchemaCreate + public Operation(long flushOffset, boolean finalizeStream) { + this.flushOffset = flushOffset; + this.finalizeStream = finalizeStream; + } + + @Override + public boolean equals(@Nullable Object o) { + if (this == o) { + return true; + } + if (o == null || getClass() != o.getClass()) { + return false; + } + Operation operation = (Operation) o; + return flushOffset == operation.flushOffset && finalizeStream == operation.finalizeStream; + } + + @Override + public int hashCode() { + return Objects.hash(flushOffset, finalizeStream); + } + + @Override + public int compareTo(Operation other) { + int compValue = Long.compare(this.flushOffset, other.flushOffset); + if (compValue == 0) { + compValue = Boolean.compare(this.finalizeStream, other.finalizeStream); + } + return compValue; + } + } + + public StorageApiFlushAndFinalizeDoFn(BigQueryServices bqServices) { + this.bqServices = bqServices; + } + + private DatasetService getDatasetService(PipelineOptions pipelineOptions) throws IOException { + if (datasetService == null) { + datasetService = bqServices.getDatasetService(pipelineOptions.as(BigQueryOptions.class)); + } + return datasetService; + } + + @Teardown + public void onTeardown() { + try { + if (datasetService != null) { + datasetService.close(); + datasetService = null; + } + } catch (Exception e) { + throw new RuntimeException(e); + } + } + + @SuppressWarnings({"nullness"}) + @ProcessElement + public void process(PipelineOptions pipelineOptions, @Element KV element) + throws Exception { + final String streamId = element.getKey(); + final Operation operation = element.getValue(); + final DatasetService datasetService = getDatasetService(pipelineOptions); + // Flush the stream. If the flush offset < 0, that means we only need to finalize. 
+ long offset = operation.flushOffset; + if (offset >= 0) { + Instant now = Instant.now(); + RetryManager> retryManager = + new RetryManager<>(Duration.standardSeconds(1), Duration.standardMinutes(1), 3); + retryManager.addOperation( + // runOperation + c -> { + try { + flushOperationsSent.inc(); + return datasetService.flush(streamId, offset); + } catch (Exception e) { + throw new RuntimeException(e); + } + }, + // onError + contexts -> { + Throwable error = Iterables.getFirst(contexts, null).getError(); + LOG.warn( + "Flush of stream " + streamId + " to offset " + offset + " failed with " + error); + flushOperationsFailed.inc(); + if (error instanceof ApiException) { + Code statusCode = ((ApiException) error).getStatusCode().getCode(); + if (statusCode.equals(Code.ALREADY_EXISTS)) { + flushOperationsAlreadyExists.inc(); + // Implies that we have already flushed up to this point, so don't retry. + return RetryType.DONT_RETRY; + } + if (statusCode.equals(Code.INVALID_ARGUMENT)) { + flushOperationsInvalidArgument.inc(); + // Implies that the stream has already been finalized. + // TODO: Storage API should provide a more-specific way of identifying this failure. + return RetryType.DONT_RETRY; + } + } + return RetryType.RETRY_ALL_OPERATIONS; + }, + // onSuccess + c -> { + flushOperationsSucceeded.inc(); + }, + new Context<>()); + retryManager.run(true); + java.time.Duration timeElapsed = java.time.Duration.between(now, Instant.now()); + flushLatencyDistribution.update(timeElapsed.toMillis()); + } + + // Finalize the stream. No need to commit the stream, since we are only dealing with BUFFERED + // streams here that have + // already been flushed. Note that in the case of errors upstream, we will leave an unflushed + // tail in the stream. + // This is by design - those records will be retried on a new stream, so we don't want to flush + // them in this stream + // or we would end up with duplicates. + if (operation.finalizeStream) { + RetryManager> retryManager = + new RetryManager<>(Duration.standardSeconds(1), Duration.standardMinutes(1), 3); + retryManager.addOperation( + c -> { + finalizeOperationsSent.inc(); + return datasetService.finalizeWriteStream(streamId); + }, + contexts -> { + LOG.warn( + "Finalize of stream " + + streamId + + " failed with " + + Iterables.getFirst(contexts, null).getError()); + finalizeOperationsFailed.inc(); + return RetryType.RETRY_ALL_OPERATIONS; + }, + r -> { + finalizeOperationsSucceeded.inc(); + }, + new Context<>()); + retryManager.run(true); + } + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiLoads.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiLoads.java new file mode 100644 index 000000000000..3c27ddc8e42f --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiLoads.java @@ -0,0 +1,147 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.bigquery; + +import com.google.api.services.bigquery.model.TableRow; +import java.nio.ByteBuffer; +import java.util.concurrent.ThreadLocalRandom; +import org.apache.beam.sdk.Pipeline; +import org.apache.beam.sdk.coders.ByteArrayCoder; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.KvCoder; +import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.GroupIntoBatches; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.windowing.GlobalWindows; +import org.apache.beam.sdk.transforms.windowing.Window; +import org.apache.beam.sdk.util.ShardedKey; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.TupleTag; +import org.apache.beam.sdk.values.TypeDescriptor; +import org.joda.time.Duration; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** This {@link PTransform} manages loads into BigQuery using the Storage API. */ +public class StorageApiLoads + extends PTransform>, WriteResult> { + private static final Logger LOG = LoggerFactory.getLogger(StorageApiLoads.class); + static final int MAX_BATCH_SIZE_BYTES = 2 * 1024 * 1024; + + private final Coder destinationCoder; + private final StorageApiDynamicDestinations dynamicDestinations; + private final CreateDisposition createDisposition; + private final String kmsKey; + private final Duration triggeringFrequency; + private final BigQueryServices bqServices; + private final int numShards; + + public StorageApiLoads( + Coder destinationCoder, + StorageApiDynamicDestinations dynamicDestinations, + CreateDisposition createDisposition, + String kmsKey, + Duration triggeringFrequency, + BigQueryServices bqServices, + int numShards) { + this.destinationCoder = destinationCoder; + this.dynamicDestinations = dynamicDestinations; + this.createDisposition = createDisposition; + this.kmsKey = kmsKey; + this.triggeringFrequency = triggeringFrequency; + this.bqServices = bqServices; + this.numShards = numShards; + } + + @Override + public WriteResult expand(PCollection> input) { + return triggeringFrequency != null ? expandTriggered(input) : expandUntriggered(input); + } + + public WriteResult expandTriggered(PCollection> input) { + // Handle triggered, low-latency loads into BigQuery. + PCollection> inputInGlobalWindow = + input.apply("rewindowIntoGlobal", Window.into(new GlobalWindows())); + + // First shard all the records. + // TODO(reuvenlax): Add autosharding support so that users don't have to pick a shard count. 
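+    // Sharding scheme used below: each DoFn instance picks a random starting shard in setup() and
+    // then round-robins elements across numShards, attaching the shard number as a 4-byte suffix
+    // to the destination key (ShardedKey). GroupIntoBatches then batches per (destination, shard)
+    // up to MAX_BATCH_SIZE_BYTES, flushing at least every triggeringFrequency.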
+ PCollection, byte[]>> shardedRecords = + inputInGlobalWindow + .apply("Convert", new StorageApiConvertMessages<>(dynamicDestinations)) + .apply( + "AddShard", + ParDo.of( + new DoFn, KV, byte[]>>() { + int shardNumber; + + @Setup + public void setup() { + shardNumber = ThreadLocalRandom.current().nextInt(numShards); + } + + @ProcessElement + public void processElement( + @Element KV element, + OutputReceiver, byte[]>> o) { + DestinationT destination = element.getKey(); + ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES); + buffer.putInt(++shardNumber % numShards); + o.output( + KV.of(ShardedKey.of(destination, buffer.array()), element.getValue())); + } + })) + .setCoder(KvCoder.of(ShardedKey.Coder.of(destinationCoder), ByteArrayCoder.of())); + + PCollection, Iterable>> groupedRecords = + shardedRecords.apply( + "GroupIntoBatches", + GroupIntoBatches., byte[]>ofByteSize( + MAX_BATCH_SIZE_BYTES, (byte[] e) -> (long) e.length) + .withMaxBufferingDuration(triggeringFrequency)); + + groupedRecords.apply( + "StorageApiWriteSharded", + new StorageApiWritesShardedRecords<>( + dynamicDestinations, createDisposition, kmsKey, bqServices, destinationCoder)); + + return writeResult(input.getPipeline()); + } + + public WriteResult expandUntriggered(PCollection> input) { + PCollection> inputInGlobalWindow = + input.apply( + "rewindowIntoGlobal", Window.>into(new GlobalWindows())); + inputInGlobalWindow.apply( + "StorageApiWriteUnsharded", + new StorageApiWriteUnshardedRecords<>( + dynamicDestinations, createDisposition, kmsKey, bqServices, destinationCoder)); + return writeResult(input.getPipeline()); + } + + private WriteResult writeResult(Pipeline p) { + // TODO(reuvenlax): Support per-record failures if schema doesn't match or if the record is too + // large. + PCollection empty = + p.apply("CreateEmptyFailedInserts", Create.empty(TypeDescriptor.of(TableRow.class))); + return WriteResult.in(p, new TupleTag<>("failedInserts"), empty, null); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWriteUnshardedRecords.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWriteUnshardedRecords.java new file mode 100644 index 000000000000..7ecc1fac1036 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWriteUnshardedRecords.java @@ -0,0 +1,331 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.bigquery; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import com.google.api.services.bigquery.model.TableSchema; +import com.google.cloud.bigquery.storage.v1beta2.AppendRowsResponse; +import com.google.cloud.bigquery.storage.v1beta2.ProtoRows; +import com.google.cloud.bigquery.storage.v1beta2.WriteStream.Type; +import com.google.protobuf.ByteString; +import java.io.IOException; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Executors; +import javax.annotation.Nullable; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.KvCoder; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition; +import org.apache.beam.sdk.io.gcp.bigquery.BigQueryServices.DatasetService; +import org.apache.beam.sdk.io.gcp.bigquery.BigQueryServices.StreamAppendClient; +import org.apache.beam.sdk.io.gcp.bigquery.RetryManager.Operation.Context; +import org.apache.beam.sdk.io.gcp.bigquery.RetryManager.RetryType; +import org.apache.beam.sdk.io.gcp.bigquery.StorageApiDynamicDestinations.MessageConverter; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.Reshuffle; +import org.apache.beam.sdk.transforms.windowing.BoundedWindow; +import org.apache.beam.sdk.transforms.windowing.GlobalWindow; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Supplier; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; +import org.joda.time.Duration; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +@SuppressWarnings({"nullness"}) +/** + * Write records to the Storage API using a standard batch approach. PENDING streams are used, which + * do not become visible until they are finalized and committed. Each input bundle to the DoFn + * creates a stream per output table, appends all records in the bundle to the stream, and schedules + * a finalize/commit operation at the end. 
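+ *
+ * <p>In {@code expand}, {@code WriteRecordsDoFn} emits a {@code (tableUrn, streamName)} pair for
+ * every stream it wrote to; a {@code Reshuffle} makes that output stable so the appends are not
+ * retried once it completes, and {@code StorageApiFinalizeWritesDoFn} then finalizes and commits
+ * the streams.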
+ */ +public class StorageApiWriteUnshardedRecords + extends PTransform>, PCollection> { + private static final Logger LOG = LoggerFactory.getLogger(StorageApiWriteUnshardedRecords.class); + + private final StorageApiDynamicDestinations dynamicDestinations; + private final CreateDisposition createDisposition; + private final String kmsKey; + private final BigQueryServices bqServices; + private final Coder destinationCoder; + + public StorageApiWriteUnshardedRecords( + StorageApiDynamicDestinations dynamicDestinations, + CreateDisposition createDisposition, + String kmsKey, + BigQueryServices bqServices, + Coder destinationCoder) { + this.dynamicDestinations = dynamicDestinations; + this.createDisposition = createDisposition; + this.kmsKey = kmsKey; + this.bqServices = bqServices; + this.destinationCoder = destinationCoder; + } + + @Override + public PCollection expand(PCollection> input) { + String operationName = input.getName() + "/" + getName(); + return input + .apply( + "Write Records", + ParDo.of(new WriteRecordsDoFn(operationName)) + .withSideInputs(dynamicDestinations.getSideInputs())) + .setCoder(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of())) + // Calling Reshuffle makes the output stable - once this completes, the append operations + // will not retry. + // TODO(reuvenlax): This should use RequiresStableInput instead. + .apply("Reshuffle", Reshuffle.of()) + .apply("Finalize writes", ParDo.of(new StorageApiFinalizeWritesDoFn(bqServices))); + } + + private static final ExecutorService closeWriterExecutor = Executors.newCachedThreadPool(); + // Run a closure asynchronously, ignoring failures. + private interface ThrowingRunnable { + void run() throws Exception; + } + + @SuppressWarnings("FutureReturnValueIgnored") + private static void runAsyncIgnoreFailure(ExecutorService executor, ThrowingRunnable task) { + executor.submit( + () -> { + try { + task.run(); + } catch (Exception e) { + // + } + }); + } + + class WriteRecordsDoFn extends DoFn, KV> { + class DestinationState { + private final String tableUrn; + private final MessageConverter messageConverter; + private String streamName = ""; + private @Nullable StreamAppendClient streamAppendClient = null; + private long currentOffset = 0; + private List pendingMessages; + @Nullable private DatasetService datasetService; + + public DestinationState( + String tableUrn, + MessageConverter messageConverter, + DatasetService datasetService) { + this.tableUrn = tableUrn; + this.messageConverter = messageConverter; + this.pendingMessages = Lists.newArrayList(); + this.datasetService = datasetService; + } + + void close() { + if (streamAppendClient != null) { + try { + streamAppendClient.close(); + streamAppendClient = null; + } catch (Exception e) { + throw new RuntimeException(e); + } + } + } + + StreamAppendClient getWriteStream() { + try { + if (streamAppendClient == null) { + this.streamName = + Preconditions.checkNotNull(datasetService) + .createWriteStream(tableUrn, Type.PENDING) + .getName(); + this.streamAppendClient = + Preconditions.checkNotNull(datasetService) + .getStreamAppendClient(streamName, messageConverter.getSchemaDescriptor()); + this.currentOffset = 0; + } + return streamAppendClient; + } catch (Exception e) { + throw new RuntimeException(e); + } + } + + void invalidateWriteStream() { + try { + runAsyncIgnoreFailure(closeWriterExecutor, streamAppendClient::close); + streamAppendClient = null; + } catch (Exception e) { + throw new RuntimeException(e); + } + } + + void addMessage(ElementT element) throws 
Exception { + ByteString message = messageConverter.toMessage(element).toByteString(); + pendingMessages.add(message); + if (shouldFlush()) { + flush(); + } + } + + boolean shouldFlush() { + // TODO: look at byte size too? + return pendingMessages.size() > 100; + } + + @SuppressWarnings({"nullness"}) + void flush() throws Exception { + if (pendingMessages.isEmpty()) { + return; + } + final ProtoRows.Builder inserts = ProtoRows.newBuilder(); + for (ByteString m : pendingMessages) { + inserts.addSerializedRows(m); + } + + ProtoRows protoRows = inserts.build(); + pendingMessages.clear(); + + RetryManager> retryManager = + new RetryManager<>(Duration.standardSeconds(1), Duration.standardMinutes(1), 5); + retryManager.addOperation( + c -> { + try { + long offset = currentOffset; + currentOffset += inserts.getSerializedRowsCount(); + return getWriteStream().appendRows(offset, protoRows); + } catch (Exception e) { + throw new RuntimeException(e); + } + }, + contexts -> { + LOG.info( + "Append to stream " + + streamName + + " failed with error " + + Iterables.getFirst(contexts, null).getError()); + invalidateWriteStream(); + return RetryType.RETRY_ALL_OPERATIONS; + }, + response -> { + LOG.info("Append to stream {} succeeded.", streamName); + }, + new Context<>()); + // TODO: Do we have to wait on every append? + retryManager.run(true); + } + } + + private Map destinations = Maps.newHashMap(); + private final TwoLevelMessageConverterCache messageConverters; + private @Nullable DatasetService datasetService; + + WriteRecordsDoFn(String operationName) { + this.messageConverters = new TwoLevelMessageConverterCache<>(operationName); + } + + private void initializeDatasetService(PipelineOptions pipelineOptions) { + if (datasetService == null) { + datasetService = bqServices.getDatasetService(pipelineOptions.as(BigQueryOptions.class)); + } + } + + @StartBundle + public void startBundle() throws IOException { + destinations = Maps.newHashMap(); + } + + DestinationState createDestinationState(ProcessContext c, DestinationT destination) { + TableDestination tableDestination1 = dynamicDestinations.getTable(destination); + checkArgument( + tableDestination1 != null, + "DynamicDestinations.getTable() may not return null, " + + "but %s returned null for destination %s", + dynamicDestinations, + destination); + Supplier schemaSupplier = () -> dynamicDestinations.getSchema(destination); + TableDestination createdTable = + CreateTableHelpers.possiblyCreateTable( + c, + tableDestination1, + schemaSupplier, + createDisposition, + destinationCoder, + kmsKey, + bqServices); + + MessageConverter messageConverter; + try { + messageConverter = messageConverters.get(destination, dynamicDestinations); + } catch (Exception e) { + throw new RuntimeException(e); + } + return new DestinationState(createdTable.getTableUrn(), messageConverter, datasetService); + } + + @ProcessElement + public void process( + ProcessContext c, + PipelineOptions pipelineOptions, + @Element KV element) + throws Exception { + initializeDatasetService(pipelineOptions); + dynamicDestinations.setSideInputAccessorFromProcessContext(c); + DestinationState state = + destinations.computeIfAbsent(element.getKey(), k -> createDestinationState(c, k)); + + if (state.shouldFlush()) { + // Too much memory being used. Flush the state and wait for it to drain out. + // TODO(reuvenlax): Consider waiting for memory usage to drop instead of waiting for all the + // appends to finish. 
+ state.flush(); + } + state.addMessage(element.getValue()); + } + + @FinishBundle + public void finishBundle(FinishBundleContext context) throws Exception { + for (DestinationState state : destinations.values()) { + state.flush(); + context.output( + KV.of(state.tableUrn, state.streamName), + BoundedWindow.TIMESTAMP_MAX_VALUE.minus(1), + GlobalWindow.INSTANCE); + } + } + + @Teardown + public void teardown() { + for (DestinationState state : destinations.values()) { + state.close(); + } + try { + if (datasetService != null) { + datasetService.close(); + datasetService = null; + } + } catch (Exception e) { + throw new RuntimeException(e); + } + } + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWritesShardedRecords.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWritesShardedRecords.java new file mode 100644 index 000000000000..52871845ad6f --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWritesShardedRecords.java @@ -0,0 +1,535 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.bigquery; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import com.google.api.core.ApiFuture; +import com.google.api.services.bigquery.model.TableSchema; +import com.google.cloud.bigquery.storage.v1beta2.AppendRowsResponse; +import com.google.cloud.bigquery.storage.v1beta2.ProtoRows; +import com.google.cloud.bigquery.storage.v1beta2.WriteStream.Type; +import com.google.protobuf.ByteString; +import com.google.protobuf.Descriptors.Descriptor; +import io.grpc.Status; +import io.grpc.Status.Code; +import java.io.IOException; +import java.time.Instant; +import java.util.Iterator; +import java.util.List; +import java.util.Map; +import java.util.NoSuchElementException; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Executors; +import java.util.concurrent.TimeUnit; +import java.util.function.BiConsumer; +import java.util.function.Consumer; +import java.util.function.Function; +import javax.annotation.Nullable; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.KvCoder; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition; +import org.apache.beam.sdk.io.gcp.bigquery.BigQueryServices.DatasetService; +import org.apache.beam.sdk.io.gcp.bigquery.BigQueryServices.StreamAppendClient; +import org.apache.beam.sdk.io.gcp.bigquery.RetryManager.RetryType; +import org.apache.beam.sdk.io.gcp.bigquery.StorageApiDynamicDestinations.MessageConverter; +import org.apache.beam.sdk.io.gcp.bigquery.StorageApiFlushAndFinalizeDoFn.Operation; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.Distribution; +import org.apache.beam.sdk.metrics.Metrics; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.schemas.NoSuchSchemaException; +import org.apache.beam.sdk.schemas.SchemaCoder; +import org.apache.beam.sdk.schemas.SchemaRegistry; +import org.apache.beam.sdk.state.StateSpec; +import org.apache.beam.sdk.state.StateSpecs; +import org.apache.beam.sdk.state.ValueState; +import org.apache.beam.sdk.transforms.Combine; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.Max; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime; +import org.apache.beam.sdk.transforms.windowing.Repeatedly; +import org.apache.beam.sdk.transforms.windowing.Window; +import org.apache.beam.sdk.util.ShardedKey; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.TypeDescriptor; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Strings; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Supplier; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.Cache; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.CacheBuilder; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.RemovalNotification; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; +import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; +import org.joda.time.Duration; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** A transform to write sharded records to BigQuery using the Storage API. */ +@SuppressWarnings("FutureReturnValueIgnored") +public class StorageApiWritesShardedRecords + extends PTransform< + PCollection, Iterable>>, PCollection> { + private static final Logger LOG = LoggerFactory.getLogger(StorageApiWritesShardedRecords.class); + + private final StorageApiDynamicDestinations dynamicDestinations; + private final CreateDisposition createDisposition; + private final String kmsKey; + private final BigQueryServices bqServices; + private final Coder destinationCoder; + private static final ExecutorService closeWriterExecutor = Executors.newCachedThreadPool(); + + private static final Cache APPEND_CLIENTS = + CacheBuilder.newBuilder() + .expireAfterAccess(5, TimeUnit.MINUTES) + .removalListener( + (RemovalNotification removal) -> { + @Nullable final StreamAppendClient streamAppendClient = removal.getValue(); + // Close the writer in a different thread so as not to block the main one. + runAsyncIgnoreFailure(closeWriterExecutor, streamAppendClient::close); + }) + .build(); + + // Run a closure asynchronously, ignoring failures. + private interface ThrowingRunnable { + void run() throws Exception; + } + + private static void runAsyncIgnoreFailure(ExecutorService executor, ThrowingRunnable task) { + executor.submit( + () -> { + try { + task.run(); + } catch (Exception e) { + // + } + }); + } + + public StorageApiWritesShardedRecords( + StorageApiDynamicDestinations dynamicDestinations, + CreateDisposition createDisposition, + String kmsKey, + BigQueryServices bqServices, + Coder destinationCoder) { + this.dynamicDestinations = dynamicDestinations; + this.createDisposition = createDisposition; + this.kmsKey = kmsKey; + this.bqServices = bqServices; + this.destinationCoder = destinationCoder; + } + + @Override + public PCollection expand( + PCollection, Iterable>> input) { + String operationName = input.getName() + "/" + getName(); + // Append records to the Storage API streams. + PCollection> written = + input.apply( + "Write Records", + ParDo.of(new WriteRecordsDoFn(operationName)) + .withSideInputs(dynamicDestinations.getSideInputs())); + + SchemaCoder operationCoder; + try { + SchemaRegistry schemaRegistry = input.getPipeline().getSchemaRegistry(); + operationCoder = + SchemaCoder.of( + schemaRegistry.getSchema(Operation.class), + TypeDescriptor.of(Operation.class), + schemaRegistry.getToRowFunction(Operation.class), + schemaRegistry.getFromRowFunction(Operation.class)); + } catch (NoSuchSchemaException e) { + throw new RuntimeException(e); + } + + // Send all successful writes to be flushed. + return written + .setCoder(KvCoder.of(StringUtf8Coder.of(), operationCoder)) + .apply( + Window.>configure() + .triggering( + Repeatedly.forever( + AfterProcessingTime.pastFirstElementInPane() + .plusDelayOf(Duration.standardSeconds(1)))) + .discardingFiredPanes()) + .apply("maxFlushPosition", Combine.perKey(Max.naturalOrder(new Operation(-1, false)))) + .apply( + "Flush and finalize writes", ParDo.of(new StorageApiFlushAndFinalizeDoFn(bqServices))); + } + + /** + * Takes in an iterable and batches the results into multiple ProtoRows objects. The splitSize + * parameter controls how many rows are batched into a single ProtoRows object before we move on + * to the next one. 
+ */ + static class SplittingIterable implements Iterable { + private final Iterable underlying; + private final long splitSize; + + public SplittingIterable(Iterable underlying, long splitSize) { + this.underlying = underlying; + this.splitSize = splitSize; + } + + @Override + public Iterator iterator() { + return new Iterator() { + final Iterator underlyingIterator = underlying.iterator(); + + @Override + public boolean hasNext() { + return underlyingIterator.hasNext(); + } + + @Override + public ProtoRows next() { + if (!hasNext()) { + throw new NoSuchElementException(); + } + + ProtoRows.Builder inserts = ProtoRows.newBuilder(); + long bytesSize = 0; + while (underlyingIterator.hasNext()) { + ByteString byteString = ByteString.copyFrom(underlyingIterator.next()); + inserts.addSerializedRows(byteString); + bytesSize += byteString.size(); + if (bytesSize > splitSize) { + break; + } + } + return inserts.build(); + } + }; + } + } + + class WriteRecordsDoFn + extends DoFn, Iterable>, KV> { + private final Counter recordsAppended = + Metrics.counter(WriteRecordsDoFn.class, "recordsAppended"); + private final Counter streamsCreated = + Metrics.counter(WriteRecordsDoFn.class, "streamsCreated"); + private final Counter appendFailures = + Metrics.counter(WriteRecordsDoFn.class, "appendFailures"); + private final Counter appendOffsetFailures = + Metrics.counter(WriteRecordsDoFn.class, "appendOffsetFailures"); + private final Counter flushesScheduled = + Metrics.counter(WriteRecordsDoFn.class, "flushesScheduled"); + private final Distribution appendLatencyDistribution = + Metrics.distribution(WriteRecordsDoFn.class, "appendLatencyDistributionMs"); + private final Distribution appendSizeDistribution = + Metrics.distribution(WriteRecordsDoFn.class, "appendSizeDistribution"); + private final Distribution appendSplitDistribution = + Metrics.distribution(WriteRecordsDoFn.class, "appendSplitDistribution"); + + private TwoLevelMessageConverterCache messageConverters; + + private Map destinations = Maps.newHashMap(); + + private @Nullable DatasetService datasetServiceInternal = null; + + // Stores the current stream for this key. + @StateId("streamName") + private final StateSpec> streamNameSpec = StateSpecs.value(); + + // Stores the current stream offset. + @StateId("streamOffset") + private final StateSpec> streamOffsetSpec = StateSpecs.value(); + + public WriteRecordsDoFn(String operationName) { + this.messageConverters = new TwoLevelMessageConverterCache<>(operationName); + } + + @StartBundle + public void startBundle() throws IOException { + destinations = Maps.newHashMap(); + } + + // Get the current stream for this key. If there is no current stream, create one and store the + // stream name in + // persistent state. + @SuppressWarnings({"nullness"}) + String getOrCreateStream( + String tableId, + ValueState streamName, + ValueState streamOffset, + DatasetService datasetService) + throws IOException, InterruptedException { + String stream = streamName.read(); + if (Strings.isNullOrEmpty(stream)) { + // In a buffered stream, data is only visible up to the offset to which it was flushed. 
+ stream = datasetService.createWriteStream(tableId, Type.BUFFERED).getName(); + streamName.write(stream); + streamOffset.write(0L); + streamsCreated.inc(); + } + return stream; + } + + private DatasetService getDatasetService(PipelineOptions pipelineOptions) throws IOException { + if (datasetServiceInternal == null) { + datasetServiceInternal = + bqServices.getDatasetService(pipelineOptions.as(BigQueryOptions.class)); + } + return datasetServiceInternal; + } + + @Teardown + public void onTeardown() { + try { + if (datasetServiceInternal != null) { + datasetServiceInternal.close(); + datasetServiceInternal = null; + } + } catch (Exception e) { + throw new RuntimeException(e); + } + } + + @SuppressWarnings({"nullness"}) + @ProcessElement + public void process( + ProcessContext c, + final PipelineOptions pipelineOptions, + @Element KV, Iterable> element, + final @AlwaysFetched @StateId("streamName") ValueState streamName, + final @AlwaysFetched @StateId("streamOffset") ValueState streamOffset, + final OutputReceiver> o) + throws Exception { + dynamicDestinations.setSideInputAccessorFromProcessContext(c); + TableDestination tableDestination = + destinations.computeIfAbsent( + element.getKey().getKey(), + dest -> { + TableDestination tableDestination1 = dynamicDestinations.getTable(dest); + checkArgument( + tableDestination1 != null, + "DynamicDestinations.getTable() may not return null, " + + "but %s returned null for destination %s", + dynamicDestinations, + dest); + Supplier schemaSupplier = () -> dynamicDestinations.getSchema(dest); + return CreateTableHelpers.possiblyCreateTable( + c, + tableDestination1, + schemaSupplier, + createDisposition, + destinationCoder, + kmsKey, + bqServices); + }); + final String tableId = tableDestination.getTableUrn(); + final DatasetService datasetService = getDatasetService(pipelineOptions); + MessageConverter messageConverter = + messageConverters.get(element.getKey().getKey(), dynamicDestinations); + Descriptor descriptor = messageConverter.getSchemaDescriptor(); + + // Each ProtoRows object contains at most 1MB of rows. + // TODO: Push messageFromTableRow up to top level. That way we can skip TableRow entirely if + // it is already proto or already schema. + final long oneMb = 1024 * 1024; + Iterable messages = new SplittingIterable(element.getValue(), oneMb); + + class AppendRowsContext extends RetryManager.Operation.Context { + final ShardedKey key; + String streamName = ""; + StreamAppendClient client = null; + long offset = -1; + long numRows = 0; + long tryIteration = 0; + + AppendRowsContext(ShardedKey key) { + this.key = key; + } + + @Override + public String toString() { + return "Context: key=" + + key + + " streamName=" + + streamName + + " offset=" + + offset + + " numRows=" + + numRows + + " tryIteration: " + + tryIteration; + } + }; + + // Initialize stream names and offsets for all contexts. This will be called initially, but + // will also be called + // if we roll over to a new stream on a retry. + BiConsumer, Boolean> initializeContexts = + (contexts, isFailure) -> { + try { + if (isFailure) { + // Clear the stream name, forcing a new one to be created.
+ streamName.write(""); + } + String stream = getOrCreateStream(tableId, streamName, streamOffset, datasetService); + StreamAppendClient appendClient = + APPEND_CLIENTS.get( + stream, () -> datasetService.getStreamAppendClient(stream, descriptor)); + for (AppendRowsContext context : contexts) { + context.streamName = stream; + appendClient.pin(); + context.client = appendClient; + context.offset = streamOffset.read(); + ++context.tryIteration; + streamOffset.write(context.offset + context.numRows); + } + } catch (Exception e) { + throw new RuntimeException(e); + } + }; + + Consumer> clearClients = + contexts -> { + APPEND_CLIENTS.invalidate(streamName.read()); + for (AppendRowsContext context : contexts) { + if (context.client != null) { + // Unpin in a different thread, as it may execute a blocking close. + runAsyncIgnoreFailure(closeWriterExecutor, context.client::unpin); + context.client = null; + } + } + }; + + Instant now = Instant.now(); + List contexts = Lists.newArrayList(); + RetryManager retryManager = + new RetryManager<>(Duration.standardSeconds(1), Duration.standardSeconds(10), 1000); + int numSplits = 0; + for (ProtoRows protoRows : messages) { + ++numSplits; + Function> run = + context -> { + try { + StreamAppendClient appendClient = + APPEND_CLIENTS.get( + context.streamName, + () -> datasetService.getStreamAppendClient(context.streamName, descriptor)); + return appendClient.appendRows(context.offset, protoRows); + } catch (Exception e) { + throw new RuntimeException(e); + } + }; + + // RetryManager + Function, RetryType> onError = + failedContexts -> { + // The first context is always the one that fails. + AppendRowsContext failedContext = + Preconditions.checkNotNull(Iterables.getFirst(failedContexts, null)); + Status.Code statusCode = Status.fromThrowable(failedContext.getError()).getCode(); + // Invalidate the StreamWriter and force a new one to be created. + LOG.error( + "Got error " + failedContext.getError() + " closing " + failedContext.streamName); + clearClients.accept(contexts); + appendFailures.inc(); + if (statusCode.equals(Code.OUT_OF_RANGE) || statusCode.equals(Code.ALREADY_EXISTS)) { + appendOffsetFailures.inc(); + LOG.warn( + "Append to " + + failedContext + + " failed with " + + failedContext.getError() + + " Will retry with a new stream"); + // This means that the offset we have stored does not match the current end of + // the stream in the Storage API. Usually this happens because a crash or a bundle + // failure + // happened after an append but before the worker could checkpoint its + // state. The records that were appended in a failed bundle will be retried, + // meaning that the unflushed tail of the stream must be discarded to prevent + // duplicates. + + // Finalize the stream and clear streamName so a new stream will be created. + o.output( + KV.of(failedContext.streamName, new Operation(failedContext.offset - 1, true))); + // Reinitialize all contexts with the new stream and new offsets. + initializeContexts.accept(failedContexts, true); + + // Offset failures imply that all subsequent parallel appends will also fail. + // Retry them all.
+ return RetryType.RETRY_ALL_OPERATIONS; + } + + return RetryType.RETRY_ALL_OPERATIONS; + }; + + Consumer onSuccess = + context -> { + o.output( + KV.of( + context.streamName, + new Operation(context.offset + context.numRows - 1, false))); + flushesScheduled.inc(protoRows.getSerializedRowsCount()); + }; + + AppendRowsContext context = new AppendRowsContext(element.getKey()); + context.numRows = protoRows.getSerializedRowsCount(); + contexts.add(context); + retryManager.addOperation(run, onError, onSuccess, context); + recordsAppended.inc(protoRows.getSerializedRowsCount()); + appendSizeDistribution.update(context.numRows); + } + initializeContexts.accept(contexts, false); + + try { + retryManager.run(true); + } finally { + // Make sure that all pins are removed. + for (AppendRowsContext context : contexts) { + if (context.client != null) { + runAsyncIgnoreFailure(closeWriterExecutor, context.client::unpin); + } + } + } + appendSplitDistribution.update(numSplits); + + java.time.Duration timeElapsed = java.time.Duration.between(now, Instant.now()); + appendLatencyDistribution.update(timeElapsed.toMillis()); + } + + @OnWindowExpiration + public void onWindowExpiration( + @AlwaysFetched @StateId("streamName") ValueState streamName, + @AlwaysFetched @StateId("streamOffset") ValueState streamOffset, + OutputReceiver> o) { + // Window is done - usually because the pipeline has been drained. Make sure to clean up + // streams so that they are not leaked. + String stream = MoreObjects.firstNonNull(streamName.read(), null); + + if (!Strings.isNullOrEmpty(stream)) { + // Finalize the stream + long nextOffset = MoreObjects.firstNonNull(streamOffset.read(), 0L); + o.output(KV.of(stream, new Operation(nextOffset - 1, true))); + // Make sure that the stream object is closed. 
+ APPEND_CLIENTS.invalidate(stream); + } + } + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingInserts.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingInserts.java index 149efa3ce1ee..1e4150a1436b 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingInserts.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingInserts.java @@ -42,6 +42,7 @@ public class StreamingInserts private final boolean skipInvalidRows; private final boolean ignoreUnknownValues; private final boolean ignoreInsertIds; + private final boolean autoSharding; private final String kmsKey; private final Coder elementCoder; private final SerializableFunction toTableRow; @@ -63,6 +64,7 @@ public StreamingInserts( false, false, false, + false, elementCoder, toTableRow, toFailsafeTableRow, @@ -79,6 +81,7 @@ private StreamingInserts( boolean skipInvalidRows, boolean ignoreUnknownValues, boolean ignoreInsertIds, + boolean autoSharding, Coder elementCoder, SerializableFunction toTableRow, SerializableFunction toFailsafeTableRow, @@ -91,6 +94,7 @@ private StreamingInserts( this.skipInvalidRows = skipInvalidRows; this.ignoreUnknownValues = ignoreUnknownValues; this.ignoreInsertIds = ignoreInsertIds; + this.autoSharding = autoSharding; this.elementCoder = elementCoder; this.toTableRow = toTableRow; this.toFailsafeTableRow = toFailsafeTableRow; @@ -109,6 +113,7 @@ public StreamingInserts withInsertRetryPolicy( skipInvalidRows, ignoreUnknownValues, ignoreInsertIds, + autoSharding, elementCoder, toTableRow, toFailsafeTableRow, @@ -126,6 +131,7 @@ public StreamingInserts withExtendedErrorInfo(boolean ex skipInvalidRows, ignoreUnknownValues, ignoreInsertIds, + autoSharding, elementCoder, toTableRow, toFailsafeTableRow, @@ -142,6 +148,7 @@ StreamingInserts withSkipInvalidRows(boolean skipInvalid skipInvalidRows, ignoreUnknownValues, ignoreInsertIds, + autoSharding, elementCoder, toTableRow, toFailsafeTableRow, @@ -158,6 +165,7 @@ StreamingInserts withIgnoreUnknownValues(boolean ignoreU skipInvalidRows, ignoreUnknownValues, ignoreInsertIds, + autoSharding, elementCoder, toTableRow, toFailsafeTableRow, @@ -174,6 +182,24 @@ StreamingInserts withIgnoreInsertIds(boolean ignoreInser skipInvalidRows, ignoreUnknownValues, ignoreInsertIds, + autoSharding, + elementCoder, + toTableRow, + toFailsafeTableRow, + kmsKey); + } + + StreamingInserts withAutoSharding(boolean autoSharding) { + return new StreamingInserts<>( + createDisposition, + dynamicDestinations, + bigQueryServices, + retryPolicy, + extendedErrorInfo, + skipInvalidRows, + ignoreUnknownValues, + ignoreInsertIds, + autoSharding, elementCoder, toTableRow, toFailsafeTableRow, @@ -190,6 +216,7 @@ StreamingInserts withKmsKey(String kmsKey) { skipInvalidRows, ignoreUnknownValues, ignoreInsertIds, + autoSharding, elementCoder, toTableRow, toFailsafeTableRow, @@ -206,6 +233,7 @@ StreamingInserts withTestServices(BigQueryServices bigQu skipInvalidRows, ignoreUnknownValues, ignoreInsertIds, + autoSharding, elementCoder, toTableRow, toFailsafeTableRow, @@ -229,6 +257,7 @@ public WriteResult expand(PCollection> input) { .withSkipInvalidRows(skipInvalidRows) .withIgnoreUnknownValues(ignoreUnknownValues) .withIgnoreInsertIds(ignoreInsertIds) + .withAutoSharding(autoSharding) .withElementCoder(elementCoder) .withToTableRow(toTableRow) .withToFailsafeTableRow(toFailsafeTableRow)); diff 
--git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteFn.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteFn.java deleted file mode 100644 index 13fda34a8a1d..000000000000 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteFn.java +++ /dev/null @@ -1,211 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package org.apache.beam.sdk.io.gcp.bigquery; - -import com.google.api.services.bigquery.model.TableReference; -import com.google.api.services.bigquery.model.TableRow; -import java.io.IOException; -import java.math.RoundingMode; -import java.util.HashMap; -import java.util.List; -import java.util.Map; -import org.apache.beam.sdk.metrics.Counter; -import org.apache.beam.sdk.metrics.SinkMetrics; -import org.apache.beam.sdk.transforms.DoFn; -import org.apache.beam.sdk.transforms.SerializableFunction; -import org.apache.beam.sdk.transforms.windowing.BoundedWindow; -import org.apache.beam.sdk.transforms.windowing.PaneInfo; -import org.apache.beam.sdk.util.Histogram; -import org.apache.beam.sdk.util.SystemDoFnInternal; -import org.apache.beam.sdk.values.FailsafeValueInSingleWindow; -import org.apache.beam.sdk.values.KV; -import org.apache.beam.sdk.values.ShardedKey; -import org.apache.beam.sdk.values.TupleTag; -import org.apache.beam.sdk.values.ValueInSingleWindow; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.math.DoubleMath; -import org.joda.time.Instant; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -/** Implementation of DoFn to perform streaming BigQuery write. */ -@SystemDoFnInternal -@VisibleForTesting -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) -class StreamingWriteFn - extends DoFn, TableRowInfo>, Void> { - private static final Logger LOG = LoggerFactory.getLogger(StreamingWriteFn.class); - - private final BigQueryServices bqServices; - private final InsertRetryPolicy retryPolicy; - private final TupleTag failedOutputTag; - private final ErrorContainer errorContainer; - private final boolean skipInvalidRows; - private final boolean ignoreUnknownValues; - private final boolean ignoreInsertIds; - private final SerializableFunction toTableRow; - private final SerializableFunction toFailsafeTableRow; - - /** JsonTableRows to accumulate BigQuery rows in order to batch writes. */ - private transient Map>> tableRows; - - /** The list of unique ids for each BigQuery table row. 
*/ - private transient Map> uniqueIdsForTableRows; - - private transient long lastReportedSystemClockMillis; - private transient Histogram histogram; - - /** Tracks bytes written, exposed as "ByteCount" Counter. */ - private Counter byteCounter = SinkMetrics.bytesWritten(); - - StreamingWriteFn( - BigQueryServices bqServices, - InsertRetryPolicy retryPolicy, - TupleTag failedOutputTag, - ErrorContainer errorContainer, - boolean skipInvalidRows, - boolean ignoreUnknownValues, - boolean ignoreInsertIds, - SerializableFunction toTableRow, - SerializableFunction toFailsafeTableRow) { - this.bqServices = bqServices; - this.retryPolicy = retryPolicy; - this.failedOutputTag = failedOutputTag; - this.errorContainer = errorContainer; - this.skipInvalidRows = skipInvalidRows; - this.ignoreUnknownValues = ignoreUnknownValues; - this.ignoreInsertIds = ignoreInsertIds; - this.toTableRow = toTableRow; - this.toFailsafeTableRow = toFailsafeTableRow; - } - - @Setup - public void setup() { - // record latency upto 60 seconds in the resolution of 20ms - histogram = Histogram.linear(0, 20, 3000); - lastReportedSystemClockMillis = System.currentTimeMillis(); - } - - @Teardown - public void teardown() { - if (histogram.getTotalCount() > 0) { - logPercentiles(); - histogram.clear(); - } - } - - /** Prepares a target BigQuery table. */ - @StartBundle - public void startBundle() { - tableRows = new HashMap<>(); - uniqueIdsForTableRows = new HashMap<>(); - } - - /** Accumulates the input into JsonTableRows and uniqueIdsForTableRows. */ - @ProcessElement - public void processElement( - @Element KV, TableRowInfo> element, - @Timestamp Instant timestamp, - BoundedWindow window, - PaneInfo pane) { - String tableSpec = element.getKey().getKey(); - List> rows = - BigQueryHelpers.getOrCreateMapListValue(tableRows, tableSpec); - List uniqueIds = - BigQueryHelpers.getOrCreateMapListValue(uniqueIdsForTableRows, tableSpec); - - TableRow tableRow = toTableRow.apply(element.getValue().tableRow); - TableRow failsafeTableRow = toFailsafeTableRow.apply(element.getValue().tableRow); - rows.add(FailsafeValueInSingleWindow.of(tableRow, timestamp, window, pane, failsafeTableRow)); - uniqueIds.add(element.getValue().uniqueId); - } - - /** Writes the accumulated rows into BigQuery with streaming API. 
*/ - @FinishBundle - public void finishBundle(FinishBundleContext context) throws Exception { - List> failedInserts = Lists.newArrayList(); - BigQueryOptions options = context.getPipelineOptions().as(BigQueryOptions.class); - for (Map.Entry>> entry : - tableRows.entrySet()) { - TableReference tableReference = BigQueryHelpers.parseTableSpec(entry.getKey()); - flushRows( - tableReference, - entry.getValue(), - uniqueIdsForTableRows.get(entry.getKey()), - options, - failedInserts); - } - tableRows.clear(); - uniqueIdsForTableRows.clear(); - - for (ValueInSingleWindow row : failedInserts) { - context.output(failedOutputTag, row.getValue(), row.getTimestamp(), row.getWindow()); - } - - long currentTimeMillis = System.currentTimeMillis(); - if (histogram.getTotalCount() > 0 - && (currentTimeMillis - lastReportedSystemClockMillis) - > options.getLatencyLoggingFrequency() * 1000L) { - logPercentiles(); - histogram.clear(); - lastReportedSystemClockMillis = currentTimeMillis; - } - } - - private void logPercentiles() { - LOG.info( - "Total number of streaming insert requests: {}, P99: {}ms, P90: {}ms, P50: {}ms", - histogram.getTotalCount(), - DoubleMath.roundToInt(histogram.p99(), RoundingMode.HALF_UP), - DoubleMath.roundToInt(histogram.p90(), RoundingMode.HALF_UP), - DoubleMath.roundToInt(histogram.p50(), RoundingMode.HALF_UP)); - } - - /** Writes the accumulated rows into BigQuery with streaming API. */ - private void flushRows( - TableReference tableReference, - List> tableRows, - List uniqueIds, - BigQueryOptions options, - List> failedInserts) - throws InterruptedException { - if (!tableRows.isEmpty()) { - try { - long totalBytes = - bqServices - .getDatasetService(options, histogram) - .insertAll( - tableReference, - tableRows, - uniqueIds, - retryPolicy, - failedInserts, - errorContainer, - skipInvalidRows, - ignoreUnknownValues, - ignoreInsertIds); - byteCounter.inc(totalBytes); - } catch (IOException e) { - throw new RuntimeException(e); - } - } - } -} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java index 6afe09331c69..5afd1ae93024 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java @@ -23,10 +23,12 @@ import org.apache.beam.sdk.coders.KvCoder; import org.apache.beam.sdk.coders.ShardedKeyCoder; import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.transforms.MapElements; import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.transforms.ParDo; import org.apache.beam.sdk.transforms.Reshuffle; import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.transforms.SimpleFunction; import org.apache.beam.sdk.transforms.windowing.DefaultTrigger; import org.apache.beam.sdk.transforms.windowing.GlobalWindows; import org.apache.beam.sdk.transforms.windowing.Window; @@ -35,7 +37,6 @@ import org.apache.beam.sdk.values.PCollectionTuple; import org.apache.beam.sdk.values.ShardedKey; import org.apache.beam.sdk.values.TupleTag; -import org.apache.beam.sdk.values.TupleTagList; /** * This transform takes in key-value pairs of {@link TableRow} entries and the {@link @@ -57,6 +58,7 @@ public class StreamingWriteTables private final boolean skipInvalidRows; private 
final boolean ignoreUnknownValues; private final boolean ignoreInsertIds; + private final boolean autoSharding; private final Coder elementCoder; private final SerializableFunction toTableRow; private final SerializableFunction toFailsafeTableRow; @@ -69,6 +71,7 @@ public StreamingWriteTables() { false, // skipInvalidRows false, // ignoreUnknownValues false, // ignoreInsertIds + false, // autoSharding null, // elementCoder null, // toTableRow null); // toFailsafeTableRow @@ -81,6 +84,7 @@ private StreamingWriteTables( boolean skipInvalidRows, boolean ignoreUnknownValues, boolean ignoreInsertIds, + boolean autoSharding, Coder elementCoder, SerializableFunction toTableRow, SerializableFunction toFailsafeTableRow) { @@ -90,6 +94,7 @@ private StreamingWriteTables( this.skipInvalidRows = skipInvalidRows; this.ignoreUnknownValues = ignoreUnknownValues; this.ignoreInsertIds = ignoreInsertIds; + this.autoSharding = autoSharding; this.elementCoder = elementCoder; this.toTableRow = toTableRow; this.toFailsafeTableRow = toFailsafeTableRow; @@ -103,6 +108,7 @@ StreamingWriteTables withTestServices(BigQueryServices bigQueryService skipInvalidRows, ignoreUnknownValues, ignoreInsertIds, + autoSharding, elementCoder, toTableRow, toFailsafeTableRow); @@ -116,6 +122,7 @@ StreamingWriteTables withInsertRetryPolicy(InsertRetryPolicy retryPoli skipInvalidRows, ignoreUnknownValues, ignoreInsertIds, + autoSharding, elementCoder, toTableRow, toFailsafeTableRow); @@ -129,6 +136,7 @@ StreamingWriteTables withExtendedErrorInfo(boolean extendedErrorInfo) skipInvalidRows, ignoreUnknownValues, ignoreInsertIds, + autoSharding, elementCoder, toTableRow, toFailsafeTableRow); @@ -142,6 +150,7 @@ StreamingWriteTables withSkipInvalidRows(boolean skipInvalidRows) { skipInvalidRows, ignoreUnknownValues, ignoreInsertIds, + autoSharding, elementCoder, toTableRow, toFailsafeTableRow); @@ -155,6 +164,7 @@ StreamingWriteTables withIgnoreUnknownValues(boolean ignoreUnknownValu skipInvalidRows, ignoreUnknownValues, ignoreInsertIds, + autoSharding, elementCoder, toTableRow, toFailsafeTableRow); @@ -168,6 +178,21 @@ StreamingWriteTables withIgnoreInsertIds(boolean ignoreInsertIds) { skipInvalidRows, ignoreUnknownValues, ignoreInsertIds, + autoSharding, + elementCoder, + toTableRow, + toFailsafeTableRow); + } + + StreamingWriteTables withAutoSharding(boolean autoSharding) { + return new StreamingWriteTables<>( + bigQueryServices, + retryPolicy, + extendedErrorInfo, + skipInvalidRows, + ignoreUnknownValues, + ignoreInsertIds, + autoSharding, elementCoder, toTableRow, toFailsafeTableRow); @@ -181,6 +206,7 @@ StreamingWriteTables withElementCoder(Coder elementCoder) { skipInvalidRows, ignoreUnknownValues, ignoreInsertIds, + autoSharding, elementCoder, toTableRow, toFailsafeTableRow); @@ -195,6 +221,7 @@ StreamingWriteTables withToTableRow( skipInvalidRows, ignoreUnknownValues, ignoreInsertIds, + autoSharding, elementCoder, toTableRow, toFailsafeTableRow); @@ -209,6 +236,7 @@ StreamingWriteTables withToFailsafeTableRow( skipInvalidRows, ignoreUnknownValues, ignoreInsertIds, + autoSharding, elementCoder, toTableRow, toFailsafeTableRow); @@ -218,32 +246,41 @@ StreamingWriteTables withToFailsafeTableRow( public WriteResult expand(PCollection> input) { if (extendedErrorInfo) { TupleTag failedInsertsTag = new TupleTag<>(FAILED_INSERTS_TAG_ID); - PCollection failedInserts = + PCollectionTuple result = writeAndGetErrors( input, failedInsertsTag, BigQueryInsertErrorCoder.of(), ErrorContainer.BIG_QUERY_INSERT_ERROR_ERROR_CONTAINER); - return 
WriteResult.withExtendedErrors(input.getPipeline(), failedInsertsTag, failedInserts); + PCollection failedInserts = result.get(failedInsertsTag); + return WriteResult.withExtendedErrors( + input.getPipeline(), + failedInsertsTag, + failedInserts, + result.get(BatchedStreamingWrite.SUCCESSFUL_ROWS_TAG)); } else { TupleTag failedInsertsTag = new TupleTag<>(FAILED_INSERTS_TAG_ID); - PCollection failedInserts = + PCollectionTuple result = writeAndGetErrors( input, failedInsertsTag, TableRowJsonCoder.of(), ErrorContainer.TABLE_ROW_ERROR_CONTAINER); - return WriteResult.in(input.getPipeline(), failedInsertsTag, failedInserts); + PCollection failedInserts = result.get(failedInsertsTag); + return WriteResult.in( + input.getPipeline(), + failedInsertsTag, + failedInserts, + result.get(BatchedStreamingWrite.SUCCESSFUL_ROWS_TAG)); } } - private PCollection writeAndGetErrors( + private PCollectionTuple writeAndGetErrors( PCollection> input, TupleTag failedInsertsTag, AtomicCoder coder, ErrorContainer errorContainer) { BigQueryOptions options = input.getPipeline().getOptions().as(BigQueryOptions.class); - int numShards = options.getNumStreamingKeys(); // A naive implementation would be to simply stream data directly to BigQuery. // However, this could occasionally lead to duplicated data, e.g., when @@ -251,53 +288,102 @@ private PCollection writeAndGetErrors( // The above risk is mitigated in this implementation by relying on // BigQuery built-in best effort de-dup mechanism. - // To use this mechanism, each input TableRow is tagged with a generated - // unique id, which is then passed to BigQuery and used to ignore duplicates - // We create 50 keys per BigQuery table to generate output on. This is few enough that we - // get good batching into BigQuery's insert calls, and enough that we can max out the - // streaming insert quota. - PCollection, TableRowInfo>> tagged = - input - .apply("ShardTableWrites", ParDo.of(new GenerateShardedTable<>(numShards))) - .setCoder(KvCoder.of(ShardedKeyCoder.of(StringUtf8Coder.of()), elementCoder)) - .apply("TagWithUniqueIds", ParDo.of(new TagWithUniqueIds<>())) - .setCoder( - KvCoder.of( - ShardedKeyCoder.of(StringUtf8Coder.of()), TableRowInfoCoder.of(elementCoder))); - - TupleTag mainOutputTag = new TupleTag<>("mainOutput"); + // unique id, which is then passed to BigQuery and used to ignore duplicates. // To prevent having the same TableRow processed more than once with regenerated // different unique ids, this implementation relies on "checkpointing", which is - // achieved as a side effect of having StreamingWriteFn immediately follow a GBK, - // performed by Reshuffle. - PCollectionTuple tuple = - tagged - .apply(Reshuffle.of()) - // Put in the global window to ensure that DynamicDestinations side inputs are accessed - // correctly. - .apply( - "GlobalWindow", - Window., TableRowInfo>>into(new GlobalWindows()) - .triggering(DefaultTrigger.of()) - .discardingFiredPanes()) - .apply( - "StreamingWrite", - ParDo.of( - new StreamingWriteFn<>( - bigQueryServices, - retryPolicy, - failedInsertsTag, - errorContainer, - skipInvalidRows, - ignoreUnknownValues, - ignoreInsertIds, - toTableRow, - toFailsafeTableRow)) - .withOutputTags(mainOutputTag, TupleTagList.of(failedInsertsTag))); - PCollection failedInserts = tuple.get(failedInsertsTag); - failedInserts.setCoder(coder); - return failedInserts; + // achieved as a side effect of having BigQuery insertion immediately follow a GBK. 
+ + if (autoSharding) { + // If runner determined dynamic sharding is enabled, group TableRows on table destinations + // that may be sharded during the runtime. Otherwise, we choose a fixed number of shards per + // table destination following the logic below in the other branch. + PCollection>> unshardedTagged = + input + .apply( + "MapToTableSpec", + MapElements.via( + new SimpleFunction, KV>() { + @Override + public KV apply(KV input) { + return KV.of(input.getKey().getTableSpec(), input.getValue()); + } + })) + .setCoder(KvCoder.of(StringUtf8Coder.of(), elementCoder)) + .apply("TagWithUniqueIds", ParDo.of(new TagWithUniqueIds<>())) + .setCoder(KvCoder.of(StringUtf8Coder.of(), TableRowInfoCoder.of(elementCoder))); + + // Auto-sharding is achieved via GroupIntoBatches.WithShardedKey transform which groups and at + // the same time batches the TableRows to be inserted to BigQuery. + return unshardedTagged.apply( + "StreamingWrite", + new BatchedStreamingWrite<>( + bigQueryServices, + retryPolicy, + failedInsertsTag, + coder, + errorContainer, + skipInvalidRows, + ignoreUnknownValues, + ignoreInsertIds, + toTableRow, + toFailsafeTableRow) + .viaStateful()); + } else { + // We create 50 keys per BigQuery table to generate output on. This is few enough that we + // get good batching into BigQuery's insert calls, and enough that we can max out the + // streaming insert quota. + int numShards = options.getNumStreamingKeys(); + PCollection, TableRowInfo>> shardedTagged = + input + .apply("ShardTableWrites", ParDo.of(new GenerateShardedTable<>(numShards))) + .setCoder(KvCoder.of(ShardedKeyCoder.of(StringUtf8Coder.of()), elementCoder)) + .apply("TagWithUniqueIds", ParDo.of(new TagWithUniqueIds<>())) + .setCoder( + KvCoder.of( + ShardedKeyCoder.of(StringUtf8Coder.of()), + TableRowInfoCoder.of(elementCoder))); + + return shardedTagged + .apply(Reshuffle.of()) + // Put in the global window to ensure that DynamicDestinations side inputs are + // accessed + // correctly. + .apply( + "GlobalWindow", + Window., TableRowInfo>>into(new GlobalWindows()) + .triggering(DefaultTrigger.of()) + .discardingFiredPanes()) + .apply( + "StripShardId", + MapElements.via( + new SimpleFunction< + KV, TableRowInfo>, + KV>>() { + @Override + public KV> apply( + KV, TableRowInfo> input) { + return KV.of(input.getKey().getKey(), input.getValue()); + } + })) + .setCoder(KvCoder.of(StringUtf8Coder.of(), TableRowInfoCoder.of(elementCoder))) + // Also batch the TableRows in a best effort manner via bundle finalization before + // inserting to BigQuery. 
+ .apply( + "StreamingWrite", + new BatchedStreamingWrite<>( + bigQueryServices, + retryPolicy, + failedInsertsTag, + coder, + errorContainer, + skipInvalidRows, + ignoreUnknownValues, + ignoreInsertIds, + toTableRow, + toFailsafeTableRow) + .viaDoFnFinalization()); + } } } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableDestination.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableDestination.java index f8513502c5f7..ddd0b780c6ea 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableDestination.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableDestination.java @@ -119,10 +119,19 @@ public TableDestination withTableReference(TableReference tableReference) { tableReference, tableDescription, jsonTimePartitioning, jsonClustering); } + /** Return the tablespec in [project:].dataset.tableid format. */ public String getTableSpec() { return tableSpec; } + /** Return the tablespec in projects/[project]/datasets/[dataset]/tables/[table] format. */ + public String getTableUrn() { + TableReference table = getTableReference(); + return String.format( + "projects/%s/datasets/%s/tables/%s", + table.getProjectId(), table.getDatasetId(), table.getTableId()); + } + public TableReference getTableReference() { return BigQueryHelpers.parseTableSpec(tableSpec); } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProto.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProto.java new file mode 100644 index 000000000000..710d47dd76e1 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProto.java @@ -0,0 +1,309 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.bigquery; + +import static java.util.stream.Collectors.toList; + +import com.google.api.services.bigquery.model.TableFieldSchema; +import com.google.api.services.bigquery.model.TableRow; +import com.google.api.services.bigquery.model.TableSchema; +import com.google.protobuf.ByteString; +import com.google.protobuf.DescriptorProtos.DescriptorProto; +import com.google.protobuf.DescriptorProtos.FieldDescriptorProto; +import com.google.protobuf.DescriptorProtos.FieldDescriptorProto.Label; +import com.google.protobuf.DescriptorProtos.FieldDescriptorProto.Type; +import com.google.protobuf.DescriptorProtos.FileDescriptorProto; +import com.google.protobuf.Descriptors.Descriptor; +import com.google.protobuf.Descriptors.DescriptorValidationException; +import com.google.protobuf.Descriptors.FieldDescriptor; +import com.google.protobuf.Descriptors.FileDescriptor; +import com.google.protobuf.DynamicMessage; +import com.google.protobuf.Message; +import java.util.AbstractMap; +import java.util.List; +import java.util.Map; +import java.util.Optional; +import java.util.UUID; +import java.util.function.Function; +import javax.annotation.Nullable; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.BaseEncoding; + +/** + * Utility methods for converting JSON {@link TableRow} objects to dynamic protocol message, for use + * with the Storage write API. + */ +public class TableRowToStorageApiProto { + static final Map PRIMITIVE_TYPES = + ImmutableMap.builder() + .put("INT64", Type.TYPE_INT64) + .put("INTEGER", Type.TYPE_INT64) + .put("FLOAT64", Type.TYPE_DOUBLE) + .put("FLOAT", Type.TYPE_DOUBLE) + .put("STRING", Type.TYPE_STRING) + .put("BOOL", Type.TYPE_BOOL) + .put("BOOLEAN", Type.TYPE_BOOL) + .put("BYTES", Type.TYPE_BYTES) + .put("NUMERIC", Type.TYPE_STRING) // Pass through the JSON encoding. + .put("BIGNUMERIC", Type.TYPE_STRING) // Pass through the JSON encoding. + .put("GEOGRAPHY", Type.TYPE_STRING) // Pass through the JSON encoding. + .put("DATE", Type.TYPE_STRING) // Pass through the JSON encoding. + .put("TIME", Type.TYPE_STRING) // Pass through the JSON encoding. + .put("DATETIME", Type.TYPE_STRING) // Pass through the JSON encoding. + .put("TIMESTAMP", Type.TYPE_STRING) // Pass through the JSON encoding. + .build(); + + /** + * Given a BigQuery TableSchema, returns a protocol-buffer Descriptor that can be used to write + * data using the BigQuery Storage API. + */ + public static Descriptor getDescriptorFromTableSchema(TableSchema jsonSchema) + throws DescriptorValidationException { + DescriptorProto descriptorProto = descriptorSchemaFromTableSchema(jsonSchema); + FileDescriptorProto fileDescriptorProto = + FileDescriptorProto.newBuilder().addMessageType(descriptorProto).build(); + FileDescriptor fileDescriptor = + FileDescriptor.buildFrom(fileDescriptorProto, new FileDescriptor[0]); + + return Iterables.getOnlyElement(fileDescriptor.getMessageTypes()); + } + + /** + * Given a BigQuery TableRow, returns a protocol-buffer message that can be used to write data + * using the BigQuery Storage API. 
+ */ + public static DynamicMessage messageFromTableRow(Descriptor descriptor, TableRow tableRow) { + DynamicMessage.Builder builder = DynamicMessage.newBuilder(descriptor); + for (Map.Entry entry : tableRow.entrySet()) { + @Nullable + FieldDescriptor fieldDescriptor = descriptor.findFieldByName(entry.getKey().toLowerCase()); + if (fieldDescriptor == null) { + throw new RuntimeException( + "TableRow contained unexpected field with name " + entry.getKey()); + } + @Nullable Object value = messageValueFromFieldValue(fieldDescriptor, entry.getValue()); + if (value != null) { + builder.setField(fieldDescriptor, value); + } + } + return builder.build(); + } + + @VisibleForTesting + static DescriptorProto descriptorSchemaFromTableSchema(TableSchema tableSchema) { + return descriptorSchemaFromTableFieldSchemas(tableSchema.getFields()); + } + + private static DescriptorProto descriptorSchemaFromTableFieldSchemas( + Iterable tableFieldSchemas) { + DescriptorProto.Builder descriptorBuilder = DescriptorProto.newBuilder(); + // Create a unique name for the descriptor ('-' characters cannot be used). + descriptorBuilder.setName("D" + UUID.randomUUID().toString().replace("-", "_")); + int i = 1; + for (TableFieldSchema fieldSchema : tableFieldSchemas) { + fieldDescriptorFromTableField(fieldSchema, i++, descriptorBuilder); + } + return descriptorBuilder.build(); + } + + private static void fieldDescriptorFromTableField( + TableFieldSchema fieldSchema, int fieldNumber, DescriptorProto.Builder descriptorBuilder) { + FieldDescriptorProto.Builder fieldDescriptorBuilder = FieldDescriptorProto.newBuilder(); + fieldDescriptorBuilder = fieldDescriptorBuilder.setName(fieldSchema.getName().toLowerCase()); + fieldDescriptorBuilder = fieldDescriptorBuilder.setNumber(fieldNumber); + switch (fieldSchema.getType()) { + case "STRUCT": + case "RECORD": + DescriptorProto nested = descriptorSchemaFromTableFieldSchemas(fieldSchema.getFields()); + descriptorBuilder.addNestedType(nested); + fieldDescriptorBuilder = + fieldDescriptorBuilder.setType(Type.TYPE_MESSAGE).setTypeName(nested.getName()); + break; + default: + @Nullable Type type = PRIMITIVE_TYPES.get(fieldSchema.getType()); + if (type == null) { + throw new UnsupportedOperationException( + "Converting BigQuery type " + fieldSchema.getType() + " to Beam type is unsupported"); + } + fieldDescriptorBuilder = fieldDescriptorBuilder.setType(type); + } + + Optional fieldMode = Optional.ofNullable(fieldSchema.getMode()).map(Mode::valueOf); + if (fieldMode.filter(m -> m == Mode.REPEATED).isPresent()) { + fieldDescriptorBuilder = fieldDescriptorBuilder.setLabel(Label.LABEL_REPEATED); + } else if (!fieldMode.isPresent() || fieldMode.filter(m -> m == Mode.NULLABLE).isPresent()) { + fieldDescriptorBuilder = fieldDescriptorBuilder.setLabel(Label.LABEL_OPTIONAL); + } else { + fieldDescriptorBuilder = fieldDescriptorBuilder.setLabel(Label.LABEL_REQUIRED); + } + descriptorBuilder.addField(fieldDescriptorBuilder.build()); + } + + @Nullable + private static Object messageValueFromFieldValue( + FieldDescriptor fieldDescriptor, Object bqValue) { + if (bqValue == null) { + if (fieldDescriptor.isOptional()) { + return null; + } else { + throw new IllegalArgumentException( + "Received null value for non-nullable field " + fieldDescriptor.getName()); + } + } + return toProtoValue(fieldDescriptor, bqValue, fieldDescriptor.isRepeated()); + } + + private static final Map> + JSON_PROTO_STRING_PARSERS = + ImmutableMap.>builder() + .put(FieldDescriptor.Type.INT32, Integer::valueOf) + 
.put(FieldDescriptor.Type.INT64, Long::valueOf) + .put(FieldDescriptor.Type.FLOAT, Float::valueOf) + .put(FieldDescriptor.Type.DOUBLE, Double::valueOf) + .put(FieldDescriptor.Type.BOOL, Boolean::valueOf) + .put(FieldDescriptor.Type.STRING, str -> str) + .put( + FieldDescriptor.Type.BYTES, + b64 -> ByteString.copyFrom(BaseEncoding.base64().decode(b64))) + .build(); + + @Nullable + @SuppressWarnings({"nullness"}) + @VisibleForTesting + static Object toProtoValue( + FieldDescriptor fieldDescriptor, Object jsonBQValue, boolean isRepeated) { + if (isRepeated) { + return ((List) jsonBQValue) + .stream() + .map( + v -> { + if (fieldDescriptor.getType() == FieldDescriptor.Type.MESSAGE) { + return ((Map) v).get("v"); + } else { + return v; + } + }) + .map(v -> toProtoValue(fieldDescriptor, v, false)) + .collect(toList()); + } + + if (fieldDescriptor.getType() == FieldDescriptor.Type.MESSAGE) { + if (jsonBQValue instanceof AbstractMap) { + // This will handle nested rows. + TableRow tr = new TableRow(); + tr.putAll((AbstractMap) jsonBQValue); + return messageFromTableRow(fieldDescriptor.getMessageType(), tr); + } else { + throw new RuntimeException("Unexpected value " + jsonBQValue + " Expected a JSON map."); + } + } + @Nullable Object scalarValue = scalarToProtoValue(fieldDescriptor, jsonBQValue); + if (scalarValue == null) { + return toProtoValue(fieldDescriptor, jsonBQValue.toString(), isRepeated); + } else { + return scalarValue; + } + } + + @VisibleForTesting + @Nullable + static Object scalarToProtoValue(FieldDescriptor fieldDescriptor, Object jsonBQValue) { + if (jsonBQValue instanceof String) { + Function mapper = JSON_PROTO_STRING_PARSERS.get(fieldDescriptor.getType()); + if (mapper == null) { + throw new UnsupportedOperationException( + "Converting BigQuery type '" + + jsonBQValue.getClass() + + "' to '" + + fieldDescriptor + + "' is not supported"); + } + return mapper.apply((String) jsonBQValue); + } + + switch (fieldDescriptor.getType()) { + case BOOL: + if (jsonBQValue instanceof Boolean) { + return jsonBQValue; + } + break; + case BYTES: + break; + case INT64: + if (jsonBQValue instanceof Integer) { + return Long.valueOf((Integer) jsonBQValue); + } else if (jsonBQValue instanceof Long) { + return jsonBQValue; + } + break; + case INT32: + if (jsonBQValue instanceof Integer) { + return jsonBQValue; + } + break; + case STRING: + break; + case DOUBLE: + if (jsonBQValue instanceof Double) { + return jsonBQValue; + } else if (jsonBQValue instanceof Float) { + return Double.valueOf((Float) jsonBQValue); + } + break; + default: + throw new RuntimeException("Unsupported proto type " + fieldDescriptor.getType()); + } + return null; + } + + @VisibleForTesting + public static TableRow tableRowFromMessage(Message message) { + TableRow tableRow = new TableRow(); + for (Map.Entry field : message.getAllFields().entrySet()) { + FieldDescriptor fieldDescriptor = field.getKey(); + Object fieldValue = field.getValue(); + tableRow.putIfAbsent( + fieldDescriptor.getName(), jsonValueFromMessageValue(fieldDescriptor, fieldValue, true)); + } + return tableRow; + } + + public static Object jsonValueFromMessageValue( + FieldDescriptor fieldDescriptor, Object fieldValue, boolean expandRepeated) { + if (expandRepeated && fieldDescriptor.isRepeated()) { + List valueList = (List) fieldValue; + return valueList.stream() + .map(v -> jsonValueFromMessageValue(fieldDescriptor, v, false)) + .collect(toList()); + } + + switch (fieldDescriptor.getType()) { + case GROUP: + case MESSAGE: + return 
tableRowFromMessage((Message) fieldValue); + case BYTES: + return BaseEncoding.base64().encode(((ByteString) fieldValue).toByteArray()); + case ENUM: + throw new RuntimeException("Enumerations not supported"); + default: + return fieldValue.toString(); + } + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TagWithUniqueIds.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TagWithUniqueIds.java index 40e9b5fa7b85..bd4e230e07ca 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TagWithUniqueIds.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TagWithUniqueIds.java @@ -20,9 +20,7 @@ import java.io.IOException; import java.util.UUID; import org.apache.beam.sdk.transforms.DoFn; -import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.sdk.values.KV; -import org.apache.beam.sdk.values.ShardedKey; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; /** @@ -35,8 +33,8 @@ @SuppressWarnings({ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) -class TagWithUniqueIds - extends DoFn, ElementT>, KV, TableRowInfo>> { +class TagWithUniqueIds + extends DoFn, KV>> { private transient String randomUUID; private transient long sequenceNo = 0L; @@ -47,7 +45,7 @@ public void startBundle() { /** Tag the input with a unique id. */ @ProcessElement - public void processElement(ProcessContext context, BoundedWindow window) throws IOException { + public void processElement(ProcessContext context) throws IOException { String uniqueId = randomUUID + sequenceNo++; // We output on keys 0-50 to ensure that there's enough batching for // BigQuery. diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TestBigQuery.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TestBigQuery.java index 031e4c109730..954f5f540e5a 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TestBigQuery.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TestBigQuery.java @@ -128,7 +128,10 @@ private void initializeBigQuery(Description description) private Table createTable(Description description) throws IOException, InterruptedException { TableReference tableReference = new TableReference() - .setProjectId(pipelineOptions.getProject()) + .setProjectId( + pipelineOptions.getBigQueryProject() == null + ? pipelineOptions.getProject() + : pipelineOptions.getBigQueryProject()) .setDatasetId(pipelineOptions.getTargetDataset()) .setTableId(createRandomizedName(description)); @@ -213,7 +216,9 @@ public TableDataInsertAllResponse insertRows(Schema rowSchema, Row... rows) thro return bq.tabledata() .insertAll( - pipelineOptions.getProject(), + pipelineOptions.getBigQueryProject() == null + ? pipelineOptions.getProject() + : pipelineOptions.getBigQueryProject(), pipelineOptions.getTargetDataset(), table.getTableReference().getTableId(), new TableDataInsertAllRequest().setRows(bqRows)) @@ -282,7 +287,9 @@ private TableSchema getSchema(Bigquery bq) { try { return bq.tables() .get( - pipelineOptions.getProject(), + pipelineOptions.getBigQueryProject() == null + ? 
pipelineOptions.getProject() + : pipelineOptions.getBigQueryProject(), pipelineOptions.getTargetDataset(), table.getTableReference().getTableId()) .setPrettyPrint(false) @@ -297,7 +304,9 @@ private List getTableRows(Bigquery bq) { try { return bq.tabledata() .list( - pipelineOptions.getProject(), + pipelineOptions.getBigQueryProject() == null + ? pipelineOptions.getProject() + : pipelineOptions.getBigQueryProject(), pipelineOptions.getTargetDataset(), table.getTableReference().getTableId()) .setPrettyPrint(false) diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TwoLevelMessageConverterCache.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TwoLevelMessageConverterCache.java new file mode 100644 index 000000000000..3e928ff29655 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TwoLevelMessageConverterCache.java @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.bigquery; + +import java.io.Serializable; +import org.apache.beam.sdk.io.gcp.bigquery.StorageApiDynamicDestinations.MessageConverter; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.Cache; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.CacheBuilder; + +/** + * A cache for {@link MessageConverter} objects. + * + *
<p>
    There is an instance-level cache which we try to use to avoid hashing the entire operation + * name. However since this object is stored in DoFns and many DoFns share the same + * MessageConverters, we also store a static cache keyed by operation name. + */ +class TwoLevelMessageConverterCache implements Serializable { + final String operationName; + + TwoLevelMessageConverterCache(String operationName) { + this.operationName = operationName; + } + + // Cache MessageConverters since creating them can be expensive. Cache is keyed by transform name + // and the destination. + @SuppressWarnings({"nullness"}) + private static final Cache, MessageConverter> CACHED_MESSAGE_CONVERTERS = + CacheBuilder.newBuilder().expireAfterAccess(java.time.Duration.ofMinutes(15)).build(); + + // Keep an instance-level cache of MessageConverter objects. This allows us to avoid hashing the + // entire operation name + // on every element. Since there will be multiple DoFn instances (and they may periodically be + // recreated), we + // still need the static cache to allow reuse. + @SuppressWarnings({"nullness"}) + private final Cache> localMessageConverters = + CacheBuilder.newBuilder().expireAfterAccess(java.time.Duration.ofMinutes(15)).build(); + + public MessageConverter get( + DestinationT destination, + StorageApiDynamicDestinations dynamicDestinations) + throws Exception { + // Lookup first in the local cache, and fall back to the static cache if necessary. + return localMessageConverters.get( + destination, + () -> + (MessageConverter) + CACHED_MESSAGE_CONVERTERS.get( + KV.of(operationName, destination), + () -> dynamicDestinations.getMessageConverter(destination))); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteGroupedRecordsToFiles.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteGroupedRecordsToFiles.java index 6db179bfeb8f..8c6366d0183b 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteGroupedRecordsToFiles.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteGroupedRecordsToFiles.java @@ -20,17 +20,14 @@ import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollectionView; -import org.apache.beam.sdk.values.ShardedKey; /** - * Receives elements grouped by their (sharded) destination, and writes them out to a file. Since - * all the elements in the {@link Iterable} are destined to the same table, they are all written to - * the same file. Ensures that only one {@link TableRowWriter} is active per bundle. + * Receives elements grouped by their destination, and writes them out to a file. Since all the + * elements in the {@link Iterable} are destined to the same table, they are all written to the same + * file. Ensures that only one {@link TableRowWriter} is active per bundle. 
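For illustration only, the two-level lookup in TwoLevelMessageConverterCache above can be reduced to two plain Guava caches: a per-instance cache keyed by the destination alone, backed by a process-wide static cache whose key also carries the operation name. The sketch below uses the unvendored com.google.common.cache package, a String value as a stand-in for MessageConverter, and a placeholder expensiveLoad in place of dynamicDestinations.getMessageConverter.

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.concurrent.TimeUnit;

// Two-level cache sketch: a cheap instance-level cache in front of a process-wide
// static cache whose key also carries the operation name.
public class TwoLevelCacheSketch {
  // Shared across all instances (analogous to CACHED_MESSAGE_CONVERTERS).
  private static final Cache<String, String> STATIC_CACHE =
      CacheBuilder.newBuilder().expireAfterAccess(15, TimeUnit.MINUTES).build();

  private final String operationName;

  // Per-instance cache keyed by the destination only (analogous to localMessageConverters).
  private final Cache<String, String> localCache =
      CacheBuilder.newBuilder().expireAfterAccess(15, TimeUnit.MINUTES).build();

  TwoLevelCacheSketch(String operationName) {
    this.operationName = operationName;
  }

  String get(String destination) throws Exception {
    // The first level avoids hashing the long operation name per element; the second
    // level lets a freshly created instance reuse values computed by earlier instances.
    return localCache.get(
        destination,
        () ->
            STATIC_CACHE.get(
                operationName + "/" + destination, () -> expensiveLoad(destination)));
  }

  private static String expensiveLoad(String destination) {
    return "converter-for-" + destination; // placeholder for an expensive construction
  }

  public static void main(String[] args) throws Exception {
    TwoLevelCacheSketch cache = new TwoLevelCacheSketch("WriteToBigQuery");
    System.out.println(cache.get("project:dataset.table"));
    System.out.println(cache.get("project:dataset.table")); // now served from the local cache
  }
}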
*/ class WriteGroupedRecordsToFiles - extends DoFn< - KV, Iterable>, - WriteBundlesToFiles.Result> { + extends DoFn>, WriteBundlesToFiles.Result> { private final PCollectionView tempFilePrefix; private final long maxFileSize; @@ -48,24 +45,24 @@ class WriteGroupedRecordsToFiles @ProcessElement public void processElement( ProcessContext c, - @Element KV, Iterable> element, + @Element KV> element, OutputReceiver> o) throws Exception { String tempFilePrefix = c.sideInput(this.tempFilePrefix); BigQueryRowWriter writer = - rowWriterFactory.createRowWriter(tempFilePrefix, element.getKey().getKey()); + rowWriterFactory.createRowWriter(tempFilePrefix, element.getKey()); try { for (ElementT tableRow : element.getValue()) { if (writer.getByteSize() > maxFileSize) { writer.close(); - writer = rowWriterFactory.createRowWriter(tempFilePrefix, element.getKey().getKey()); + writer = rowWriterFactory.createRowWriter(tempFilePrefix, element.getKey()); BigQueryRowWriter.Result result = writer.getResult(); o.output( new WriteBundlesToFiles.Result<>( - result.resourceId.toString(), result.byteSize, c.element().getKey().getKey())); + result.resourceId.toString(), result.byteSize, c.element().getKey())); } writer.write(tableRow); } @@ -76,6 +73,6 @@ public void processElement( BigQueryRowWriter.Result result = writer.getResult(); o.output( new WriteBundlesToFiles.Result<>( - result.resourceId.toString(), result.byteSize, c.element().getKey().getKey())); + result.resourceId.toString(), result.byteSize, c.element().getKey())); } } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WritePartition.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WritePartition.java index cd4f16325da2..e1e0566c1704 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WritePartition.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WritePartition.java @@ -17,8 +17,17 @@ */ package org.apache.beam.sdk.io.gcp.bigquery; +import com.google.auto.value.AutoValue; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; import java.util.List; import java.util.Map; +import org.apache.beam.sdk.coders.AtomicCoder; +import org.apache.beam.sdk.coders.BooleanCoder; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.ListCoder; +import org.apache.beam.sdk.coders.StringUtf8Coder; import org.apache.beam.sdk.io.gcp.bigquery.WriteBundlesToFiles.Result; import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.values.KV; @@ -39,7 +48,32 @@ class WritePartition extends DoFn< Iterable>, - KV, List>> { + KV, WritePartition.Result>> { + @AutoValue + abstract static class Result { + public abstract List getFilenames(); + + abstract Boolean isFirstPane(); + } + + static class ResultCoder extends AtomicCoder { + private static final Coder> FILENAMES_CODER = ListCoder.of(StringUtf8Coder.of()); + private static final Coder FIRST_PANE_CODER = BooleanCoder.of(); + static final ResultCoder INSTANCE = new ResultCoder(); + + @Override + public void encode(Result value, OutputStream outStream) throws IOException { + FILENAMES_CODER.encode(value.getFilenames(), outStream); + FIRST_PANE_CODER.encode(value.isFirstPane(), outStream); + } + + @Override + public Result decode(InputStream inStream) throws IOException { + return new AutoValue_WritePartition_Result( + FILENAMES_CODER.decode(inStream), 
FIRST_PANE_CODER.decode(inStream)); + } + } + private final boolean singletonTable; private final DynamicDestinations dynamicDestinations; private final PCollectionView tempFilePrefix; @@ -47,8 +81,9 @@ class WritePartition private final long maxSizeBytes; private final RowWriterFactory rowWriterFactory; - private @Nullable TupleTag, List>> multiPartitionsTag; - private TupleTag, List>> singlePartitionTag; + private @Nullable TupleTag, WritePartition.Result>> + multiPartitionsTag; + private TupleTag, WritePartition.Result>> singlePartitionTag; private static class PartitionData { private int numFiles = 0; @@ -131,8 +166,8 @@ void addPartition(PartitionData partition) { PCollectionView tempFilePrefix, int maxNumFiles, long maxSizeBytes, - TupleTag, List>> multiPartitionsTag, - TupleTag, List>> singlePartitionTag, + TupleTag, WritePartition.Result>> multiPartitionsTag, + TupleTag, WritePartition.Result>> singlePartitionTag, RowWriterFactory rowWriterFactory) { this.singletonTable = singletonTable; this.dynamicDestinations = dynamicDestinations; @@ -147,7 +182,6 @@ void addPartition(PartitionData partition) { @ProcessElement public void processElement(ProcessContext c) throws Exception { List> results = Lists.newArrayList(c.element()); - // If there are no elements to write _and_ the user specified a constant output table, then // generate an empty table of that name. if (results.isEmpty() && singletonTable) { @@ -161,7 +195,8 @@ public void processElement(ProcessContext c) throws Exception { BigQueryRowWriter.Result writerResult = writer.getResult(); results.add( - new Result<>(writerResult.resourceId.toString(), writerResult.byteSize, destination)); + new WriteBundlesToFiles.Result<>( + writerResult.resourceId.toString(), writerResult.byteSize, destination)); } Map currentResults = Maps.newHashMap(); @@ -190,11 +225,16 @@ public void processElement(ProcessContext c) throws Exception { // In the fast-path case where we only output one table, the transform loads it directly // to the final table. In this case, we output on a special TupleTag so the enclosing // transform knows to skip the rename step. - TupleTag, List>> outputTag = + TupleTag, WritePartition.Result>> outputTag = (destinationData.getPartitions().size() == 1) ? 
singlePartitionTag : multiPartitionsTag; for (int i = 0; i < destinationData.getPartitions().size(); ++i) { PartitionData partitionData = destinationData.getPartitions().get(i); - c.output(outputTag, KV.of(ShardedKey.of(destination, i + 1), partitionData.getFilenames())); + c.output( + outputTag, + KV.of( + ShardedKey.of(destination, i + 1), + new AutoValue_WritePartition_Result( + partitionData.getFilenames(), c.pane().isFirst()))); } } } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteRename.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteRename.java index 07b9b34ebf33..80201ff0be28 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteRename.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteRename.java @@ -27,16 +27,19 @@ import java.util.Map; import java.util.stream.Collectors; import java.util.stream.StreamSupport; +import javax.annotation.Nullable; import org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.PendingJobManager; import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition; import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition; import org.apache.beam.sdk.io.gcp.bigquery.BigQueryServices.DatasetService; import org.apache.beam.sdk.io.gcp.bigquery.BigQueryServices.JobService; +import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.display.DisplayData; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollectionView; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ArrayListMultimap; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Multimap; import org.slf4j.Logger; @@ -49,7 +52,7 @@ @SuppressWarnings({ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) -class WriteRename extends DoFn>, Void> { +class WriteRename extends DoFn>, Void> { private static final Logger LOG = LoggerFactory.getLogger(WriteRename.class); private final BigQueryServices bqServices; @@ -62,6 +65,7 @@ class WriteRename extends DoFn>, Void> { private final CreateDisposition firstPaneCreateDisposition; private final int maxRetryJobs; private final String kmsKey; + private @Nullable DatasetService datasetService; private static class PendingJobData { final BigQueryHelpers.PendingJob retryJob; @@ -100,13 +104,28 @@ public void startBundle(StartBundleContext c) { pendingJobs.clear(); } + @Teardown + public void onTeardown() { + try { + if (datasetService != null) { + datasetService.close(); + datasetService = null; + } + } catch (Exception e) { + throw new RuntimeException(e); + } + } + @ProcessElement - public void processElement(ProcessContext c) throws Exception { - Multimap tempTables = ArrayListMultimap.create(); - for (KV entry : c.element()) { + public void processElement( + @Element Iterable> element, ProcessContext c) + throws Exception { + Multimap tempTables = ArrayListMultimap.create(); + for (KV entry : element) { tempTables.put(entry.getKey(), entry.getValue()); } - for (Map.Entry> entry : tempTables.asMap().entrySet()) { + for (Map.Entry> entry : + tempTables.asMap().entrySet()) { // Process each destination table. 
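WriteRename above now creates its DatasetService lazily, reuses it across bundles, and closes it in a @Teardown method (WriteTables further down does the same for both of its services). A minimal sketch of that lifecycle pattern, with a hypothetical CloseableClient standing in for the BigQuery service:

import org.apache.beam.sdk.transforms.DoFn;

// Lifecycle sketch: build an expensive client on first use, keep it for the lifetime of
// the DoFn instance, and release it in @Teardown. CloseableClient is a made-up stand-in.
class ClientReusingDoFn extends DoFn<String, String> {

  interface CloseableClient extends AutoCloseable {
    String lookup(String key) throws Exception;
  }

  private transient CloseableClient client;

  private CloseableClient getClient() {
    if (client == null) {
      // Placeholder construction; the real code asks bqServices for a DatasetService.
      client =
          new CloseableClient() {
            @Override
            public String lookup(String key) {
              return "value-for-" + key;
            }

            @Override
            public void close() {}
          };
    }
    return client;
  }

  @ProcessElement
  public void processElement(ProcessContext c) throws Exception {
    c.output(getClient().lookup(c.element()));
  }

  @Teardown
  public void teardown() throws Exception {
    if (client != null) {
      client.close();
      client = null;
    }
  }
}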
// Do not copy if no temp tables are provided. if (!entry.getValue().isEmpty()) { @@ -118,7 +137,7 @@ public void processElement(ProcessContext c) throws Exception { @FinishBundle public void finishBundle(FinishBundleContext c) throws Exception { DatasetService datasetService = - bqServices.getDatasetService(c.getPipelineOptions().as(BigQueryOptions.class)); + getDatasetService(c.getPipelineOptions().as(BigQueryOptions.class)); PendingJobManager jobManager = new PendingJobManager(); for (PendingJobData pendingJob : pendingJobs) { jobManager.addPendingJob( @@ -142,18 +161,35 @@ public void finishBundle(FinishBundleContext c) throws Exception { jobManager.waitForDone(); } + private DatasetService getDatasetService(PipelineOptions pipelineOptions) throws IOException { + if (datasetService == null) { + datasetService = bqServices.getDatasetService(pipelineOptions.as(BigQueryOptions.class)); + } + return datasetService; + } + private PendingJobData startWriteRename( - TableDestination finalTableDestination, Iterable tempTableNames, ProcessContext c) + TableDestination finalTableDestination, + Iterable tempTableNames, + ProcessContext c) throws Exception { + // The pane may have advanced either here due to triggering or due to an upstream trigger. We + // check the upstream + // trigger to handle the case where an earlier pane triggered the single-partition path. If this + // happened, then the + // table will already exist so we want to append to the table. + boolean isFirstPane = + Iterables.getFirst(tempTableNames, null).isFirstPane() && c.pane().isFirst(); WriteDisposition writeDisposition = - (c.pane().getIndex() == 0) ? firstPaneWriteDisposition : WriteDisposition.WRITE_APPEND; + isFirstPane ? firstPaneWriteDisposition : WriteDisposition.WRITE_APPEND; CreateDisposition createDisposition = - (c.pane().getIndex() == 0) ? firstPaneCreateDisposition : CreateDisposition.CREATE_NEVER; + isFirstPane ? firstPaneCreateDisposition : CreateDisposition.CREATE_NEVER; List tempTables = StreamSupport.stream(tempTableNames.spliterator(), false) - .map(table -> BigQueryHelpers.fromJsonString(table, TableReference.class)) + .map( + result -> + BigQueryHelpers.fromJsonString(result.getTableName(), TableReference.class)) .collect(Collectors.toList()); - ; // Make sure each destination table gets a unique job id. String jobIdPrefix = @@ -163,7 +199,7 @@ private PendingJobData startWriteRename( BigQueryHelpers.PendingJob retryJob = startCopy( bqServices.getJobService(c.getPipelineOptions().as(BigQueryOptions.class)), - bqServices.getDatasetService(c.getPipelineOptions().as(BigQueryOptions.class)), + getDatasetService(c.getPipelineOptions().as(BigQueryOptions.class)), jobIdPrefix, finalTableDestination.getTableReference(), tempTables, diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteResult.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteResult.java index fd75f671cd91..85ef275ba3f9 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteResult.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteResult.java @@ -29,6 +29,7 @@ import org.apache.beam.sdk.values.PValue; import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.checkerframework.checker.nullness.qual.Nullable; /** The result of a {@link BigQueryIO.Write} transform. 
*/ @SuppressWarnings({ @@ -40,18 +41,25 @@ public final class WriteResult implements POutput { private final PCollection failedInserts; private final TupleTag failedInsertsWithErrTag; private final PCollection failedInsertsWithErr; + private final PCollection successfulInserts; /** Creates a {@link WriteResult} in the given {@link Pipeline}. */ static WriteResult in( - Pipeline pipeline, TupleTag failedInsertsTag, PCollection failedInserts) { - return new WriteResult(pipeline, failedInsertsTag, failedInserts, null, null); + Pipeline pipeline, + TupleTag failedInsertsTag, + PCollection failedInserts, + @Nullable PCollection successfulInserts) { + return new WriteResult( + pipeline, failedInsertsTag, failedInserts, null, null, successfulInserts); } static WriteResult withExtendedErrors( Pipeline pipeline, TupleTag failedInsertsTag, - PCollection failedInserts) { - return new WriteResult(pipeline, null, null, failedInsertsTag, failedInserts); + PCollection failedInserts, + PCollection successfulInserts) { + return new WriteResult( + pipeline, null, null, failedInsertsTag, failedInserts, successfulInserts); } @Override @@ -68,12 +76,14 @@ private WriteResult( TupleTag failedInsertsTag, PCollection failedInserts, TupleTag failedInsertsWithErrTag, - PCollection failedInsertsWithErr) { + PCollection failedInsertsWithErr, + PCollection successfulInserts) { this.pipeline = pipeline; this.failedInsertsTag = failedInsertsTag; this.failedInserts = failedInserts; this.failedInsertsWithErrTag = failedInsertsWithErrTag; this.failedInsertsWithErr = failedInsertsWithErr; + this.successfulInserts = successfulInserts; } /** @@ -91,6 +101,14 @@ public PCollection getFailedInserts() { return failedInserts; } + /** Returns a {@link PCollection} containing the {@link TableRow}s that were written to BQ. */ + public PCollection getSuccessfulInserts() { + checkArgument( + successfulInserts != null, + "Retrieving successful inserts is only supported for streaming inserts."); + return successfulInserts; + } + /** * Returns a {@link PCollection} containing the {@link BigQueryInsertError}s with detailed error * information. 
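A sketch of how the new WriteResult#getSuccessfulInserts output might be consumed. It assumes the streaming-inserts path (the only one supported per the check above); the table spec is a placeholder and the pipeline is not actually run here.

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;

public class SuccessfulInsertsSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    PCollection<TableRow> rows =
        p.apply(
            Create.of(new TableRow().set("name", "beam")).withCoder(TableRowJsonCoder.of()));

    WriteResult result =
        rows.apply(
            BigQueryIO.writeTableRows()
                .to("my-project:my_dataset.my_table") // placeholder table spec
                .withCreateDisposition(CreateDisposition.CREATE_NEVER)
                .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS));

    // Rows acknowledged by the streaming-insert path become available downstream.
    result.getSuccessfulInserts().apply("CountSuccessfulRows", Count.globally());

    // p.run().waitUntilFinish(); // needs real credentials and an existing table
  }
}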
diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java index e3b7fc68d5ef..32ed1fe61275 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java @@ -26,11 +26,17 @@ import com.google.api.services.bigquery.model.TableReference; import com.google.api.services.bigquery.model.TableSchema; import com.google.api.services.bigquery.model.TimePartitioning; +import com.google.auto.value.AutoValue; import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; import java.util.List; import java.util.Map; import java.util.Set; import java.util.stream.Collectors; +import org.apache.beam.sdk.coders.AtomicCoder; +import org.apache.beam.sdk.coders.BooleanCoder; +import org.apache.beam.sdk.coders.CoderException; import org.apache.beam.sdk.coders.KvCoder; import org.apache.beam.sdk.coders.StringUtf8Coder; import org.apache.beam.sdk.coders.VoidCoder; @@ -43,6 +49,7 @@ import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition; import org.apache.beam.sdk.io.gcp.bigquery.BigQueryServices.DatasetService; import org.apache.beam.sdk.io.gcp.bigquery.BigQueryServices.JobService; +import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.ValueProvider; import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.GroupByKey; @@ -67,7 +74,10 @@ import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; +import org.checkerframework.checker.initialization.qual.Initialized; +import org.checkerframework.checker.nullness.qual.NonNull; import org.checkerframework.checker.nullness.qual.Nullable; +import org.checkerframework.checker.nullness.qual.UnknownKeyFor; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -88,8 +98,35 @@ }) class WriteTables extends PTransform< - PCollection, List>>, - PCollection>> { + PCollection, WritePartition.Result>>, + PCollection>> { + @AutoValue + abstract static class Result { + abstract String getTableName(); + + abstract Boolean isFirstPane(); + } + + static class ResultCoder extends AtomicCoder { + static final ResultCoder INSTANCE = new ResultCoder(); + + @Override + public void encode(Result value, @UnknownKeyFor @NonNull @Initialized OutputStream outStream) + throws @UnknownKeyFor @NonNull @Initialized CoderException, @UnknownKeyFor @NonNull + @Initialized IOException { + StringUtf8Coder.of().encode(value.getTableName(), outStream); + BooleanCoder.of().encode(value.isFirstPane(), outStream); + } + + @Override + public Result decode(@UnknownKeyFor @NonNull @Initialized InputStream inStream) + throws @UnknownKeyFor @NonNull @Initialized CoderException, @UnknownKeyFor @NonNull + @Initialized IOException { + return new AutoValue_WriteTables_Result( + StringUtf8Coder.of().decode(inStream), BooleanCoder.of().decode(inStream)); + } + } + private static final Logger LOG = LoggerFactory.getLogger(WriteTables.class); private final boolean tempTable; @@ -100,7 +137,7 @@ class WriteTables private final Set schemaUpdateOptions; private final DynamicDestinations dynamicDestinations; private final 
List> sideInputs; - private final TupleTag> mainOutputTag; + private final TupleTag> mainOutputTag; private final TupleTag temporaryFilesTag; private final ValueProvider loadJobProjectId; private final int maxRetryJobs; @@ -108,9 +145,13 @@ class WriteTables private final @Nullable String kmsKey; private final String sourceFormat; private final boolean useAvroLogicalTypes; + private @Nullable DatasetService datasetService; + private @Nullable JobService jobService; private class WriteTablesDoFn - extends DoFn, List>, KV> { + extends DoFn< + KV, WritePartition.Result>, KV> { + private Map jsonSchemas = Maps.newHashMap(); // Represents a pending BigQuery load job. @@ -120,18 +161,21 @@ private class PendingJobData { final List partitionFiles; final TableDestination tableDestination; final TableReference tableReference; + final boolean isFirstPane; public PendingJobData( BoundedWindow window, BigQueryHelpers.PendingJob retryJob, List partitionFiles, TableDestination tableDestination, - TableReference tableReference) { + TableReference tableReference, + boolean isFirstPane) { this.window = window; this.retryJob = retryJob; this.partitionFiles = partitionFiles; this.tableDestination = tableDestination; this.tableReference = tableReference; + this.isFirstPane = isFirstPane; } } // All pending load jobs. @@ -146,7 +190,11 @@ public void startBundle(StartBundleContext c) { } @ProcessElement - public void processElement(ProcessContext c, BoundedWindow window) throws Exception { + public void processElement( + @Element KV, WritePartition.Result> element, + ProcessContext c, + BoundedWindow window) + throws Exception { dynamicDestinations.setSideInputAccessorFromProcessContext(c); DestinationT destination = c.element().getKey().getKey(); TableSchema tableSchema; @@ -188,12 +236,16 @@ public void processElement(ProcessContext c, BoundedWindow window) throws Except dynamicDestinations); TableReference tableReference = tableDestination.getTableReference(); if (Strings.isNullOrEmpty(tableReference.getProjectId())) { - tableReference.setProjectId(c.getPipelineOptions().as(BigQueryOptions.class).getProject()); + BigQueryOptions options = c.getPipelineOptions().as(BigQueryOptions.class); + tableReference.setProjectId( + options.getBigQueryProject() == null + ? options.getProject() + : options.getBigQueryProject()); tableDestination = tableDestination.withTableReference(tableReference); } - Integer partition = c.element().getKey().getShardNumber(); - List partitionFiles = Lists.newArrayList(c.element().getValue()); + Integer partition = element.getKey().getShardNumber(); + List partitionFiles = Lists.newArrayList(element.getValue().getFilenames()); String jobIdPrefix = BigQueryResourceNaming.createJobIdWithDestination( c.sideInput(loadJobIdPrefixView), tableDestination, partition, c.pane().getIndex()); @@ -205,7 +257,7 @@ public void processElement(ProcessContext c, BoundedWindow window) throws Except WriteDisposition writeDisposition = firstPaneWriteDisposition; CreateDisposition createDisposition = firstPaneCreateDisposition; - if (c.pane().getIndex() > 0 && !tempTable) { + if (!element.getValue().isFirstPane() && !tempTable) { // If writing directly to the destination, then the table is created on the first write // and we should change the disposition for subsequent writes. 
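The Result classes introduced above for WritePartition and WriteTables are each serialized by a small coder that simply composes StringUtf8Coder and BooleanCoder. A standalone sketch of that pattern, with a hand-written value class in place of the AutoValue types:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.beam.sdk.coders.AtomicCoder;
import org.apache.beam.sdk.coders.BooleanCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;

// A tiny value type plus a coder built by composing existing Beam coders.
class TableResult {
  final String tableName;
  final boolean firstPane;

  TableResult(String tableName, boolean firstPane) {
    this.tableName = tableName;
    this.firstPane = firstPane;
  }
}

class TableResultCoder extends AtomicCoder<TableResult> {
  static final TableResultCoder INSTANCE = new TableResultCoder();

  @Override
  public void encode(TableResult value, OutputStream outStream) throws IOException {
    // Field order here must match decode() exactly.
    StringUtf8Coder.of().encode(value.tableName, outStream);
    BooleanCoder.of().encode(value.firstPane, outStream);
  }

  @Override
  public TableResult decode(InputStream inStream) throws IOException {
    return new TableResult(
        StringUtf8Coder.of().decode(inStream), BooleanCoder.of().decode(inStream));
  }
}

Such a coder would typically be attached to the intermediate PCollection with setCoder, which is presumably why both ResultCoder classes expose a shared INSTANCE.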
writeDisposition = WriteDisposition.WRITE_APPEND; @@ -219,8 +271,8 @@ public void processElement(ProcessContext c, BoundedWindow window) throws Except BigQueryHelpers.PendingJob retryJob = startLoad( - bqServices.getJobService(c.getPipelineOptions().as(BigQueryOptions.class)), - bqServices.getDatasetService(c.getPipelineOptions().as(BigQueryOptions.class)), + getJobService(c.getPipelineOptions().as(BigQueryOptions.class)), + getDatasetService(c.getPipelineOptions().as(BigQueryOptions.class)), jobIdPrefix, tableReference, tableDestination.getTimePartitioning(), @@ -231,7 +283,43 @@ public void processElement(ProcessContext c, BoundedWindow window) throws Except createDisposition, schemaUpdateOptions); pendingJobs.add( - new PendingJobData(window, retryJob, partitionFiles, tableDestination, tableReference)); + new PendingJobData( + window, + retryJob, + partitionFiles, + tableDestination, + tableReference, + element.getValue().isFirstPane())); + } + + @Teardown + public void onTeardown() { + try { + if (datasetService != null) { + datasetService.close(); + datasetService = null; + } + if (jobService != null) { + jobService.close(); + jobService = null; + } + } catch (Exception e) { + throw new RuntimeException(e); + } + } + + private DatasetService getDatasetService(PipelineOptions pipelineOptions) throws IOException { + if (datasetService == null) { + datasetService = bqServices.getDatasetService(pipelineOptions.as(BigQueryOptions.class)); + } + return datasetService; + } + + private JobService getJobService(PipelineOptions pipelineOptions) throws IOException { + if (jobService == null) { + jobService = bqServices.getJobService(pipelineOptions.as(BigQueryOptions.class)); + } + return jobService; } @Override @@ -247,7 +335,7 @@ public void finishBundle(FinishBundleContext c) throws Exception { bqServices.getDatasetService(c.getPipelineOptions().as(BigQueryOptions.class)); PendingJobManager jobManager = new PendingJobManager(); - for (PendingJobData pendingJob : pendingJobs) { + for (final PendingJobData pendingJob : pendingJobs) { jobManager = jobManager.addPendingJob( pendingJob.retryJob, @@ -262,11 +350,14 @@ public void finishBundle(FinishBundleContext c) throws Exception { BigQueryHelpers.stripPartitionDecorator(ref.getTableId())), pendingJob.tableDestination.getTableDescription()); } + + Result result = + new AutoValue_WriteTables_Result( + BigQueryHelpers.toJsonString(pendingJob.tableReference), + pendingJob.isFirstPane); c.output( mainOutputTag, - KV.of( - pendingJob.tableDestination, - BigQueryHelpers.toJsonString(pendingJob.tableReference)), + KV.of(pendingJob.tableDestination, result), pendingJob.window.maxTimestamp(), pendingJob.window); for (String file : pendingJob.partitionFiles) { @@ -328,8 +419,8 @@ public WriteTables( } @Override - public PCollection> expand( - PCollection, List>> input) { + public PCollection> expand( + PCollection, WritePartition.Result>> input) { PCollectionTuple writeTablesOutputs = input.apply( ParDo.of(new WriteTablesDoFn()) @@ -354,7 +445,6 @@ public PCollection> expand( .apply(GroupByKey.create()) .apply(Values.create()) .apply(ParDo.of(new GarbageCollectTemporaryFiles())); - return writeTablesOutputs.get(mainOutputTag); } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BeamRowToBigtableMutation.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BeamRowToBigtableMutation.java index f394cbed607a..e1f2a0010a10 100644 --- 
a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BeamRowToBigtableMutation.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BeamRowToBigtableMutation.java @@ -111,12 +111,11 @@ private Mutation mutation(String family, String column, Row row) { private ByteString convertValueToByteString(Row row, String column) { Schema.Field field = row.getSchema().getField(column); - Schema.TypeName typeName = field.getType().getTypeName(); Object value = row.getValue(column); if (value == null) { throw new NullPointerException("Null value at column " + column); } else { - return cellValueParser.valueToByteString(value, typeName); + return cellValueParser.valueToByteString(value, field.getType()); } } } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableConfig.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableConfig.java index 8592d45cee7d..7e5a775af4cc 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableConfig.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableConfig.java @@ -206,9 +206,6 @@ BigtableService getBigtableService(PipelineOptions pipelineOptions) { CredentialOptions.credential(pipelineOptions.as(GcpOptions.class).getGcpCredential())); } - // Default option that should be forced - bigtableOptions.setUseCachedDataPool(true); - return new BigtableServiceImpl(bigtableOptions.build()); } @@ -228,6 +225,9 @@ private BigtableOptions.Builder effectiveUserProvidedBigtableOptions() { effectiveOptions = getBigtableOptionsConfigurator().apply(effectiveOptions); } + // Default option that should be forced in most cases + effectiveOptions.setUseCachedDataPool(true); + if (getInstanceId() != null) { effectiveOptions.setInstanceId(getInstanceId().get()); } @@ -238,6 +238,7 @@ private BigtableOptions.Builder effectiveUserProvidedBigtableOptions() { if (getEmulatorHost() != null) { effectiveOptions.enableEmulator(getEmulatorHost()); + effectiveOptions.setUseCachedDataPool(false); } return effectiveOptions; diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableRowToBeamRow.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableRowToBeamRow.java index 6dff82dd98c5..22627801ea56 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableRowToBeamRow.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableRowToBeamRow.java @@ -94,8 +94,7 @@ private Row cellToRow(Cell cell, Schema cellSchema) { Row.FieldValueBuilder rowBuilder = Row.withSchema(cellSchema) - .withFieldValue( - VALUE, getCellValue(cell, cellSchema.getField(VALUE).getType().getTypeName())); + .withFieldValue(VALUE, getCellValue(cell, cellSchema.getField(VALUE).getType())); if (cellSchema.hasField(TIMESTAMP_MICROS)) { rowBuilder.withFieldValue(TIMESTAMP_MICROS, cell.getTimestampMicros()); } @@ -115,7 +114,7 @@ private Object columnToRow(Column column, Schema schema) { Schema.FieldType collectionElementType = columnType.getCollectionElementType(); if (collectionElementType != null) { return cells.stream() - .map(cell -> getCellValue(cell, collectionElementType.getTypeName())) + .map(cell -> getCellValue(cell, collectionElementType)) .collect(toList()); } else { throw new 
NullPointerException("Null collectionElementType at column " + columnName); @@ -128,7 +127,7 @@ private Object columnToRow(Column column, Schema schema) { return cellToRow(getLastCell(cells), rowSchema); } default: - return getCellValue(getLastCell(cells), columnType.getTypeName()); + return getCellValue(getLastCell(cells), columnType); } } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableRowToBeamRowFlat.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableRowToBeamRowFlat.java index 55cecbcaf2c8..8e9cc16e2d57 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableRowToBeamRowFlat.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableRowToBeamRowFlat.java @@ -101,7 +101,7 @@ private void setFamily(Row.FieldValueBuilder rowBuilder, Family family) { private void setColumn(Row.FieldValueBuilder rowBuilder, Column column) { String columnName = column.getQualifier().toStringUtf8(); - Schema.TypeName type = schema.getField(columnName).getType().getTypeName(); + Schema.FieldType type = schema.getField(columnName).getType(); rowBuilder.withFieldValue(columnName, getCellValue(getLastCell(column.getCellsList()), type)); } } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableRowToBeamRowFn.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableRowToBeamRowFn.java index 0e7da66fc89e..a72f58dfce94 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableRowToBeamRowFn.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableRowToBeamRowFn.java @@ -40,7 +40,7 @@ protected Cell getLastCell(List cells) { .orElseThrow(() -> new RuntimeException("Couldn't retrieve the most recent cell value")); } - protected Object getCellValue(Cell cell, Schema.TypeName typeName) { - return valueParser.getCellValue(cell, typeName); + protected Object getCellValue(Cell cell, Schema.FieldType type) { + return valueParser.getCellValue(cell, type); } } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableServiceImpl.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableServiceImpl.java index aeee3d2864c8..eb9f8c40aa4c 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableServiceImpl.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableServiceImpl.java @@ -38,10 +38,14 @@ import io.grpc.Status.Code; import io.grpc.StatusRuntimeException; import java.io.IOException; +import java.util.HashMap; import java.util.List; import java.util.NoSuchElementException; import java.util.concurrent.CompletableFuture; import java.util.concurrent.CompletionStage; +import org.apache.beam.runners.core.metrics.GcpResourceIdentifiers; +import org.apache.beam.runners.core.metrics.MonitoringInfoConstants; +import org.apache.beam.runners.core.metrics.ServiceCallMetric; import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO.BigtableSource; import org.apache.beam.sdk.io.range.ByteKeyRange; import org.apache.beam.sdk.values.KV; @@ -130,12 +134,40 @@ public boolean start() throws IOException { String tableNameSr = 
session.getOptions().getInstanceName().toTableNameStr(source.getTableId().get()); + HashMap baseLabels = new HashMap<>(); + baseLabels.put(MonitoringInfoConstants.Labels.PTRANSFORM, ""); + baseLabels.put(MonitoringInfoConstants.Labels.SERVICE, "BigTable"); + baseLabels.put(MonitoringInfoConstants.Labels.METHOD, "google.bigtable.v2.ReadRows"); + baseLabels.put( + MonitoringInfoConstants.Labels.RESOURCE, + GcpResourceIdentifiers.bigtableResource( + session.getOptions().getProjectId(), + session.getOptions().getInstanceId(), + source.getTableId().get())); + baseLabels.put( + MonitoringInfoConstants.Labels.BIGTABLE_PROJECT_ID, session.getOptions().getProjectId()); + baseLabels.put( + MonitoringInfoConstants.Labels.INSTANCE_ID, session.getOptions().getInstanceId()); + baseLabels.put( + MonitoringInfoConstants.Labels.TABLE_ID, + GcpResourceIdentifiers.bigtableTableID( + session.getOptions().getProjectId(), + session.getOptions().getInstanceId(), + source.getTableId().get())); + ServiceCallMetric serviceCallMetric = + new ServiceCallMetric(MonitoringInfoConstants.Urns.API_REQUEST_COUNT, baseLabels); ReadRowsRequest.Builder requestB = ReadRowsRequest.newBuilder().setRows(rowSet).setTableName(tableNameSr); if (source.getRowFilter() != null) { requestB.setFilter(source.getRowFilter()); } - results = session.getDataClient().readRows(requestB.build()); + try { + results = session.getDataClient().readRows(requestB.build()); + serviceCallMetric.call("ok"); + } catch (StatusRuntimeException e) { + serviceCallMetric.call(e.getStatus().getCode().value()); + throw e; + } return advance(); } @@ -182,10 +214,12 @@ public Row getCurrentRow() throws NoSuchElementException { static class BigtableWriterImpl implements Writer { private BigtableSession session; private BulkMutation bulkMutation; + private BigtableTableName tableName; BigtableWriterImpl(BigtableSession session, BigtableTableName tableName) { this.session = session; bulkMutation = session.createBulkMutation(tableName); + this.tableName = tableName; } @Override @@ -231,6 +265,28 @@ public CompletionStage writeRecord(KV baseLabels = new HashMap<>(); + baseLabels.put(MonitoringInfoConstants.Labels.PTRANSFORM, ""); + baseLabels.put(MonitoringInfoConstants.Labels.SERVICE, "BigTable"); + baseLabels.put(MonitoringInfoConstants.Labels.METHOD, "google.bigtable.v2.MutateRows"); + baseLabels.put( + MonitoringInfoConstants.Labels.RESOURCE, + GcpResourceIdentifiers.bigtableResource( + session.getOptions().getProjectId(), + session.getOptions().getInstanceId(), + tableName.getTableId())); + baseLabels.put( + MonitoringInfoConstants.Labels.BIGTABLE_PROJECT_ID, session.getOptions().getProjectId()); + baseLabels.put( + MonitoringInfoConstants.Labels.INSTANCE_ID, session.getOptions().getInstanceId()); + baseLabels.put( + MonitoringInfoConstants.Labels.TABLE_ID, + GcpResourceIdentifiers.bigtableTableID( + session.getOptions().getProjectId(), + session.getOptions().getInstanceId(), + tableName.getTableId())); + ServiceCallMetric serviceCallMetric = + new ServiceCallMetric(MonitoringInfoConstants.Urns.API_REQUEST_COUNT, baseLabels); CompletableFuture result = new CompletableFuture<>(); Futures.addCallback( new VendoredListenableFutureAdapter<>(bulkMutation.add(request)), @@ -238,10 +294,17 @@ public CompletionStage writeRecord(KV * + *
<p>
    Write and delete operations will follow a gradual ramp-up by default in order to protect Cloud + * Datastore from potential overload. This rate limit follows a heuristic based on the expected + * number of workers. To optimize throughput in this initial stage, you can provide a hint to the + * relevant {@code PTransform} by calling {@code withHintNumWorkers}, e.g., {@code + * DatastoreIO.v1().deleteKey().withHintNumWorkers(numWorkers)}. While not recommended, you can also + * turn this off via {@code .withRampupThrottlingDisabled()}. + * *
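As a usage sketch (the project id and the worker-count hint are placeholders), the knobs described above might be wired into a write like this:

import com.google.datastore.v1.Entity;
import org.apache.beam.sdk.io.gcp.datastore.DatastoreIO;
import org.apache.beam.sdk.values.PCollection;

public class DatastoreRampupSketch {

  // entities must contain complete keys, as described below.
  static void write(PCollection<Entity> entities) {
    entities.apply(
        "WriteToDatastore",
        DatastoreIO.v1()
            .write()
            .withProjectId("my-project") // placeholder project id
            .withHintNumWorkers(20)); // roughly match the expected number of workers
  }

  static void writeWithoutRampup(PCollection<Entity> entities) {
    // Not recommended for large jobs: disables the gradual ramp-up entirely.
    entities.apply(
        DatastoreIO.v1().write().withProjectId("my-project").withRampupThrottlingDisabled());
  }
}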
<p>
    {@link Entity Entities} in the {@code PCollection} to be written or deleted must have complete * {@link Key Keys}. Complete {@code Keys} specify the {@code name} and {@code id} of the {@code * Entity}, where incomplete {@code Keys} do not. A {@code namespace} other than {@code projectId} @@ -209,10 +221,10 @@ public class DatastoreV1 { * size; it is adjusted at runtime based on the performance of previous writes (see {@link * DatastoreV1.WriteBatcher}). * - *
<p>
    Testing has found that a batch of 200 entities will generally finish within the timeout even + *
<p>
    Testing has found that a batch of 50 entities will generally finish within the timeout even * in adverse conditions. */ - @VisibleForTesting static final int DATASTORE_BATCH_UPDATE_ENTITIES_START = 200; + @VisibleForTesting static final int DATASTORE_BATCH_UPDATE_ENTITIES_START = 50; /** * When choosing the number of updates in a single RPC, never exceed the maximum allowed by the @@ -225,7 +237,7 @@ public class DatastoreV1 { * number of entities per request may be lower when we flush for the end of a bundle or if we hit * {@link DatastoreV1.DATASTORE_BATCH_UPDATE_BYTES_LIMIT}. */ - @VisibleForTesting static final int DATASTORE_BATCH_UPDATE_ENTITIES_MIN = 10; + @VisibleForTesting static final int DATASTORE_BATCH_UPDATE_ENTITIES_MIN = 5; /** * Cloud Datastore has a limit of 10MB per RPC, so we also flush if the total size of mutations @@ -235,7 +247,14 @@ public class DatastoreV1 { @VisibleForTesting static final int DATASTORE_BATCH_UPDATE_BYTES_LIMIT = 9_000_000; /** - * Non-retryable errors. See https://cloud.google.com/datastore/docs/concepts/errors#Error_Codes . + * Default hint for the expected number of workers in the ramp-up throttling step. This is used to + * approximate a global rate limit, so a mismatch can yield slightly elevated throttling. + * Reconfigure on the {@link Mutate} object to align with the worker count and improve throughput. + */ + private static final int DEFAULT_HINT_NUM_WORKERS = 500; + + /** + * Non-retryable errors. See https://cloud.google.com/datastore/docs/concepts/errors#Error_Codes. */ private static final Set NON_RETRYABLE_ERRORS = ImmutableSet.of( @@ -886,12 +905,27 @@ private RunQueryResponse runQueryWithRetries(RunQueryRequest request) throws Exc Sleeper sleeper = Sleeper.DEFAULT; BackOff backoff = RUNQUERY_BACKOFF.backoff(); while (true) { + HashMap baseLabels = new HashMap<>(); + baseLabels.put(MonitoringInfoConstants.Labels.PTRANSFORM, ""); + baseLabels.put(MonitoringInfoConstants.Labels.SERVICE, "Datastore"); + baseLabels.put(MonitoringInfoConstants.Labels.METHOD, "BatchDatastoreRead"); + baseLabels.put( + MonitoringInfoConstants.Labels.RESOURCE, + GcpResourceIdentifiers.datastoreResource( + options.getProjectId(), options.getNamespace())); + baseLabels.put(MonitoringInfoConstants.Labels.DATASTORE_PROJECT, options.getProjectId()); + baseLabels.put( + MonitoringInfoConstants.Labels.DATASTORE_NAMESPACE, options.getNamespace()); + ServiceCallMetric serviceCallMetric = + new ServiceCallMetric(MonitoringInfoConstants.Urns.API_REQUEST_COUNT, baseLabels); try { RunQueryResponse response = datastore.runQuery(request); + serviceCallMetric.call("ok"); rpcSuccesses.inc(); return response; } catch (DatastoreException exception) { rpcErrors.inc(); + serviceCallMetric.call(exception.getCode().getNumber()); if (NON_RETRYABLE_ERRORS.contains(exception.getCode())) { throw exception; @@ -970,7 +1004,7 @@ public void populateDisplayData(DisplayData.Builder builder) { * using {@link DatastoreV1.Write#withProjectId}. */ public Write write() { - return new Write(null, null); + return new Write(null, null, true, DEFAULT_HINT_NUM_WORKERS); } /** @@ -978,7 +1012,7 @@ public Write write() { * using {@link DeleteEntity#withProjectId}. */ public DeleteEntity deleteEntity() { - return new DeleteEntity(null, null); + return new DeleteEntity(null, null, true, DEFAULT_HINT_NUM_WORKERS); } /** @@ -986,7 +1020,7 @@ public DeleteEntity deleteEntity() { * {@link DeleteKey#withProjectId}. 
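The runQueryWithRetries change above wraps each Datastore RPC in a ServiceCallMetric and records either "ok" or the failing status code. A trimmed sketch of that instrumentation pattern; the label values and fakeRpc are placeholders for the real request:

import java.util.HashMap;
import org.apache.beam.runners.core.metrics.MonitoringInfoConstants;
import org.apache.beam.runners.core.metrics.ServiceCallMetric;

// Request-count instrumentation sketch: one ServiceCallMetric per call, reporting the outcome.
public class ServiceCallMetricSketch {

  static String instrumentedCall(String projectId, String namespace) {
    HashMap<String, String> labels = new HashMap<>();
    labels.put(MonitoringInfoConstants.Labels.PTRANSFORM, "");
    labels.put(MonitoringInfoConstants.Labels.SERVICE, "Datastore");
    labels.put(MonitoringInfoConstants.Labels.METHOD, "BatchDatastoreRead");
    labels.put(MonitoringInfoConstants.Labels.DATASTORE_PROJECT, projectId);
    labels.put(MonitoringInfoConstants.Labels.DATASTORE_NAMESPACE, namespace);
    ServiceCallMetric metric =
        new ServiceCallMetric(MonitoringInfoConstants.Urns.API_REQUEST_COUNT, labels);
    try {
      String response = fakeRpc();
      metric.call("ok"); // count a successful request
      return response;
    } catch (RuntimeException e) {
      metric.call("unknown"); // the real code reports the RPC status code here
      throw e;
    }
  }

  private static String fakeRpc() {
    return "response"; // placeholder for datastore.runQuery(request)
  }
}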
*/ public DeleteKey deleteKey() { - return new DeleteKey(null, null); + return new DeleteKey(null, null, true, DEFAULT_HINT_NUM_WORKERS); } /** @@ -995,12 +1029,17 @@ public DeleteKey deleteKey() { * @see DatastoreIO */ public static class Write extends Mutate { + /** * Note that {@code projectId} is only {@code @Nullable} as a matter of build order, but if it * is {@code null} at instantiation time, an error will be thrown. */ - Write(@Nullable ValueProvider projectId, @Nullable String localhost) { - super(projectId, localhost, new UpsertFn()); + Write( + @Nullable ValueProvider projectId, + @Nullable String localhost, + boolean throttleRampup, + int hintNumWorkers) { + super(projectId, localhost, new UpsertFn(), throttleRampup, hintNumWorkers); } /** Returns a new {@link Write} that writes to the Cloud Datastore for the specified project. */ @@ -1012,7 +1051,7 @@ public Write withProjectId(String projectId) { /** Same as {@link Write#withProjectId(String)} but with a {@link ValueProvider}. */ public Write withProjectId(ValueProvider projectId) { checkArgument(projectId != null, "projectId can not be null"); - return new Write(projectId, localhost); + return new Write(projectId, localhost, throttleRampup, hintNumWorkers); } /** @@ -1021,7 +1060,21 @@ public Write withProjectId(ValueProvider projectId) { */ public Write withLocalhost(String localhost) { checkArgument(localhost != null, "localhost can not be null"); - return new Write(projectId, localhost); + return new Write(projectId, localhost, throttleRampup, hintNumWorkers); + } + + /** Returns a new {@link Write} that does not throttle during ramp-up. */ + public Write withRampupThrottlingDisabled() { + return new Write(projectId, localhost, false, hintNumWorkers); + } + + /** + * Returns a new {@link Write} with a different worker count hint for ramp-up throttling. Value + * is ignored if ramp-up throttling is disabled. + */ + public Write withHintNumWorkers(int hintNumWorkers) { + checkArgument(hintNumWorkers > 0, "hintNumWorkers must be positive"); + return new Write(projectId, localhost, throttleRampup, hintNumWorkers); } } @@ -1031,12 +1084,17 @@ public Write withLocalhost(String localhost) { * @see DatastoreIO */ public static class DeleteEntity extends Mutate { + /** * Note that {@code projectId} is only {@code @Nullable} as a matter of build order, but if it * is {@code null} at instantiation time, an error will be thrown. */ - DeleteEntity(@Nullable ValueProvider projectId, @Nullable String localhost) { - super(projectId, localhost, new DeleteEntityFn()); + DeleteEntity( + @Nullable ValueProvider projectId, + @Nullable String localhost, + boolean throttleRampup, + int hintNumWorkers) { + super(projectId, localhost, new DeleteEntityFn(), throttleRampup, hintNumWorkers); } /** @@ -1051,7 +1109,7 @@ public DeleteEntity withProjectId(String projectId) { /** Same as {@link DeleteEntity#withProjectId(String)} but with a {@link ValueProvider}. 
*/ public DeleteEntity withProjectId(ValueProvider projectId) { checkArgument(projectId != null, "projectId can not be null"); - return new DeleteEntity(projectId, localhost); + return new DeleteEntity(projectId, localhost, throttleRampup, hintNumWorkers); } /** @@ -1060,7 +1118,21 @@ public DeleteEntity withProjectId(ValueProvider projectId) { */ public DeleteEntity withLocalhost(String localhost) { checkArgument(localhost != null, "localhost can not be null"); - return new DeleteEntity(projectId, localhost); + return new DeleteEntity(projectId, localhost, throttleRampup, hintNumWorkers); + } + + /** Returns a new {@link DeleteEntity} that does not throttle during ramp-up. */ + public DeleteEntity withRampupThrottlingDisabled() { + return new DeleteEntity(projectId, localhost, false, hintNumWorkers); + } + + /** + * Returns a new {@link DeleteEntity} with a different worker count hint for ramp-up throttling. + * Value is ignored if ramp-up throttling is disabled. + */ + public DeleteEntity withHintNumWorkers(int hintNumWorkers) { + checkArgument(hintNumWorkers > 0, "hintNumWorkers must be positive"); + return new DeleteEntity(projectId, localhost, throttleRampup, hintNumWorkers); } } @@ -1071,12 +1143,17 @@ public DeleteEntity withLocalhost(String localhost) { * @see DatastoreIO */ public static class DeleteKey extends Mutate { + /** * Note that {@code projectId} is only {@code @Nullable} as a matter of build order, but if it * is {@code null} at instantiation time, an error will be thrown. */ - DeleteKey(@Nullable ValueProvider projectId, @Nullable String localhost) { - super(projectId, localhost, new DeleteKeyFn()); + DeleteKey( + @Nullable ValueProvider projectId, + @Nullable String localhost, + boolean throttleRampup, + int hintNumWorkers) { + super(projectId, localhost, new DeleteKeyFn(), throttleRampup, hintNumWorkers); } /** @@ -1094,13 +1171,27 @@ public DeleteKey withProjectId(String projectId) { */ public DeleteKey withLocalhost(String localhost) { checkArgument(localhost != null, "localhost can not be null"); - return new DeleteKey(projectId, localhost); + return new DeleteKey(projectId, localhost, throttleRampup, hintNumWorkers); } /** Same as {@link DeleteKey#withProjectId(String)} but with a {@link ValueProvider}. */ public DeleteKey withProjectId(ValueProvider projectId) { checkArgument(projectId != null, "projectId can not be null"); - return new DeleteKey(projectId, localhost); + return new DeleteKey(projectId, localhost, throttleRampup, hintNumWorkers); + } + + /** Returns a new {@link DeleteKey} that does not throttle during ramp-up. */ + public DeleteKey withRampupThrottlingDisabled() { + return new DeleteKey(projectId, localhost, false, hintNumWorkers); + } + + /** + * Returns a new {@link DeleteKey} with a different worker count hint for ramp-up throttling. + * Value is ignored if ramp-up throttling is disabled. + */ + public DeleteKey withHintNumWorkers(int hintNumWorkers) { + checkArgument(hintNumWorkers > 0, "hintNumWorkers must be positive"); + return new DeleteKey(projectId, localhost, throttleRampup, hintNumWorkers); } } @@ -1113,11 +1204,16 @@ public DeleteKey withProjectId(ValueProvider projectId) { * provided, as the commits are retried when failures occur. */ private abstract static class Mutate extends PTransform, PDone> { + protected ValueProvider projectId; protected @Nullable String localhost; + protected boolean throttleRampup; + protected int hintNumWorkers; /** A function that transforms each {@code T} into a mutation. 
*/ private final SimpleFunction mutationFn; + private final RampupThrottlingFn rampupThrottlingFn; + /** * Note that {@code projectId} is only {@code @Nullable} as a matter of build order, but if it * is {@code null} at instantiation time, an error will be thrown. @@ -1125,10 +1221,15 @@ private abstract static class Mutate extends PTransform, PDone Mutate( @Nullable ValueProvider projectId, @Nullable String localhost, - SimpleFunction mutationFn) { + SimpleFunction mutationFn, + boolean throttleRampup, + int hintNumWorkers) { this.projectId = projectId; this.localhost = localhost; + this.throttleRampup = throttleRampup; + this.hintNumWorkers = hintNumWorkers; this.mutationFn = checkNotNull(mutationFn); + this.rampupThrottlingFn = new RampupThrottlingFn<>(hintNumWorkers); } @Override @@ -1139,10 +1240,15 @@ public PDone expand(PCollection input) { } checkArgument(mutationFn != null, "mutationFn can not be null"); - input - .apply("Convert to Mutation", MapElements.via(mutationFn)) - .apply( - "Write Mutation to Datastore", ParDo.of(new DatastoreWriterFn(projectId, localhost))); + PCollection intermediateOutput = + input.apply("Convert to Mutation", MapElements.via(mutationFn)); + if (throttleRampup) { + intermediateOutput = + intermediateOutput.apply( + "Enforce ramp-up through throttling", ParDo.of(rampupThrottlingFn)); + } + intermediateOutput.apply( + "Write Mutation to Datastore", ParDo.of(new DatastoreWriterFn(projectId, localhost))); return PDone.in(input.getPipeline()); } @@ -1161,6 +1267,9 @@ public void populateDisplayData(DisplayData.Builder builder) { builder .addIfNotNull(DisplayData.item("projectId", projectId).withLabel("Output Project")) .include("mutationFn", mutationFn); + if (throttleRampup) { + builder.include("rampupThrottlingFn", rampupThrottlingFn); + } } public String getProjectId() { @@ -1171,6 +1280,7 @@ public String getProjectId() { /** Determines batch sizes for commit RPCs. */ @VisibleForTesting interface WriteBatcher { + /** Call before using this WriteBatcher. */ void start(); @@ -1195,8 +1305,9 @@ interface WriteBatcher { */ @VisibleForTesting static class WriteBatcherImpl implements WriteBatcher, Serializable { + /** Target time per RPC for writes. */ - static final int DATASTORE_BATCH_TARGET_LATENCY_MS = 5000; + static final int DATASTORE_BATCH_TARGET_LATENCY_MS = 6000; @Override public void start() { @@ -1217,12 +1328,11 @@ public int nextBatchSize(long timeSinceEpochMillis) { return DATASTORE_BATCH_UPDATE_ENTITIES_START; } long recentMeanLatency = Math.max(meanLatencyPerEntityMs.get(timeSinceEpochMillis), 1); + long targetBatchSize = DATASTORE_BATCH_TARGET_LATENCY_MS / recentMeanLatency; return (int) Math.max( DATASTORE_BATCH_UPDATE_ENTITIES_MIN, - Math.min( - DATASTORE_BATCH_UPDATE_ENTITIES_LIMIT, - DATASTORE_BATCH_TARGET_LATENCY_MS / recentMeanLatency)); + Math.min(DATASTORE_BATCH_UPDATE_ENTITIES_LIMIT, targetBatchSize)); } private transient MovingAverage meanLatencyPerEntityMs; @@ -1243,6 +1353,7 @@ public int nextBatchSize(long timeSinceEpochMillis) { */ @VisibleForTesting static class DatastoreWriterFn extends DoFn { + private static final Logger LOG = LoggerFactory.getLogger(DatastoreWriterFn.class); private final ValueProvider projectId; private final @Nullable String localhost; @@ -1252,13 +1363,17 @@ static class DatastoreWriterFn extends DoFn { private final List mutations = new ArrayList<>(); private int mutationsSize = 0; // Accumulated size of protos in mutations. 
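WriteBatcherImpl.nextBatchSize above targets roughly six seconds per commit RPC by dividing the target latency by the recent mean latency per entity and clamping the result between the minimum and the per-commit limit. A worked version of that arithmetic; the 500-entity upper bound is assumed from the Datastore per-commit mutation limit and may differ from the SDK's DATASTORE_BATCH_UPDATE_ENTITIES_LIMIT:

// Adaptive batch sizing sketch mirroring nextBatchSize: target latency / recent mean latency,
// clamped to [ENTITIES_MIN, ENTITIES_LIMIT].
public class BatchSizeSketch {
  static final int TARGET_LATENCY_MS = 6000;
  static final int ENTITIES_MIN = 5;
  static final int ENTITIES_LIMIT = 500; // assumed per-commit limit

  static int nextBatchSize(long recentMeanLatencyMsPerEntity) {
    long latency = Math.max(recentMeanLatencyMsPerEntity, 1);
    long target = TARGET_LATENCY_MS / latency;
    return (int) Math.max(ENTITIES_MIN, Math.min(ENTITIES_LIMIT, target));
  }

  public static void main(String[] args) {
    System.out.println(nextBatchSize(40)); // 150 entities: 6000 ms / 40 ms per entity
    System.out.println(nextBatchSize(2000)); // slow backend: raised to the minimum of 5
    System.out.println(nextBatchSize(1)); // fast backend: capped at the limit of 500
  }
}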
private WriteBatcher writeBatcher; - private transient AdaptiveThrottler throttler; + private transient AdaptiveThrottler adaptiveThrottler; private final Counter throttlingMsecs = Metrics.counter(DatastoreWriterFn.class, "throttling-msecs"); private final Counter rpcErrors = Metrics.counter(DatastoreWriterFn.class, "datastoreRpcErrors"); private final Counter rpcSuccesses = Metrics.counter(DatastoreWriterFn.class, "datastoreRpcSuccesses"); + private final Counter entitiesMutated = + Metrics.counter(DatastoreWriterFn.class, "datastoreEntitiesMutated"); + private final Distribution latencyMsPerMutation = + Metrics.distribution(DatastoreWriterFn.class, "datastoreLatencyMsPerMutation"); private static final int MAX_RETRIES = 5; private static final FluentBackoff BUNDLE_WRITE_BACKOFF = @@ -1294,9 +1409,9 @@ static class DatastoreWriterFn extends DoFn { public void startBundle(StartBundleContext c) { datastore = datastoreFactory.getDatastore(c.getPipelineOptions(), projectId.get(), localhost); writeBatcher.start(); - if (throttler == null) { + if (adaptiveThrottler == null) { // Initialize throttler at first use, because it is not serializable. - throttler = new AdaptiveThrottler(120000, 10000, 1.25); + adaptiveThrottler = new AdaptiveThrottler(120000, 10000, 1.25); } } @@ -1344,30 +1459,46 @@ private void flushBatch() throws DatastoreException, IOException, InterruptedExc commitRequest.setMode(CommitRequest.Mode.NON_TRANSACTIONAL); long startTime = System.currentTimeMillis(), endTime; - if (throttler.throttleRequest(startTime)) { + if (adaptiveThrottler.throttleRequest(startTime)) { LOG.info("Delaying request due to previous failures"); throttlingMsecs.inc(WriteBatcherImpl.DATASTORE_BATCH_TARGET_LATENCY_MS); sleeper.sleep(WriteBatcherImpl.DATASTORE_BATCH_TARGET_LATENCY_MS); continue; } + HashMap baseLabels = new HashMap<>(); + baseLabels.put(MonitoringInfoConstants.Labels.PTRANSFORM, ""); + baseLabels.put(MonitoringInfoConstants.Labels.SERVICE, "Datastore"); + baseLabels.put(MonitoringInfoConstants.Labels.METHOD, "BatchDatastoreWrite"); + baseLabels.put( + MonitoringInfoConstants.Labels.RESOURCE, + GcpResourceIdentifiers.datastoreResource(projectId.get(), "")); + baseLabels.put(MonitoringInfoConstants.Labels.DATASTORE_PROJECT, projectId.get()); + baseLabels.put(MonitoringInfoConstants.Labels.DATASTORE_NAMESPACE, ""); + ServiceCallMetric serviceCallMetric = + new ServiceCallMetric(MonitoringInfoConstants.Urns.API_REQUEST_COUNT, baseLabels); try { datastore.commit(commitRequest.build()); endTime = System.currentTimeMillis(); + serviceCallMetric.call("ok"); writeBatcher.addRequestLatency(endTime, endTime - startTime, mutations.size()); - throttler.successfulRequest(startTime); + adaptiveThrottler.successfulRequest(startTime); + latencyMsPerMutation.update((endTime - startTime) / mutations.size()); rpcSuccesses.inc(); + entitiesMutated.inc(mutations.size()); // Break if the commit threw no exception. break; } catch (DatastoreException exception) { + serviceCallMetric.call(exception.getCode().getNumber()); if (exception.getCode() == Code.DEADLINE_EXCEEDED) { /* Most errors are not related to request size, and should not change our expectation of * the latency of successful requests. DEADLINE_EXCEEDED can be taken into * consideration, though. 
*/ endTime = System.currentTimeMillis(); writeBatcher.addRequestLatency(endTime, endTime - startTime, mutations.size()); + latencyMsPerMutation.update((endTime - startTime) / mutations.size()); } // Only log the code and message for potentially-transient errors. The entire exception // will be propagated upon the last retry. @@ -1497,13 +1628,21 @@ public Datastore getDatastore(PipelineOptions pipelineOptions, String projectId) public Datastore getDatastore( PipelineOptions pipelineOptions, String projectId, @Nullable String localhost) { Credentials credential = pipelineOptions.as(GcpOptions.class).getGcpCredential(); + + // Add Beam version to user agent header. + HttpRequestInitializer userAgentInitializer = + request -> request.getHeaders().setUserAgent(pipelineOptions.getUserAgent()); HttpRequestInitializer initializer; if (credential != null) { initializer = new ChainingHttpRequestInitializer( - new HttpCredentialsAdapter(credential), new RetryHttpRequestInitializer()); + new HttpCredentialsAdapter(credential), + new RetryHttpRequestInitializer(), + userAgentInitializer); } else { - initializer = new RetryHttpRequestInitializer(); + initializer = + new ChainingHttpRequestInitializer( + new RetryHttpRequestInitializer(), userAgentInitializer); } DatastoreOptions.Builder builder = diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore/RampupThrottlingFn.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore/RampupThrottlingFn.java new file mode 100644 index 000000000000..e58814c7b5e5 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore/RampupThrottlingFn.java @@ -0,0 +1,130 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.datastore; + +import java.io.IOException; +import java.io.Serializable; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.Metrics; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.Sum; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.util.BackOff; +import org.apache.beam.sdk.util.FluentBackoff; +import org.apache.beam.sdk.util.MovingFunction; +import org.apache.beam.sdk.util.Sleeper; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; +import org.joda.time.Duration; +import org.joda.time.Instant; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * An implementation of a client-side throttler that enforces a gradual ramp-up, broadly in line + * with Datastore best practices. See also + * https://cloud.google.com/datastore/docs/best-practices#ramping_up_traffic. 
+ */ +public class RampupThrottlingFn extends DoFn implements Serializable { + + private static final Logger LOG = LoggerFactory.getLogger(RampupThrottlingFn.class); + private static final double BASE_BUDGET = 500.0; + private static final Duration RAMP_UP_INTERVAL = Duration.standardMinutes(5); + private static final FluentBackoff fluentBackoff = FluentBackoff.DEFAULT; + + private final int numWorkers; + + @VisibleForTesting + Counter throttlingMsecs = Metrics.counter(RampupThrottlingFn.class, "throttling-msecs"); + + // Initialized on every setup. + private transient MovingFunction successfulOps; + // Initialized once in constructor. + private Instant firstInstant; + + @VisibleForTesting transient Sleeper sleeper; + + public RampupThrottlingFn(int numWorkers) { + this.numWorkers = numWorkers; + this.sleeper = Sleeper.DEFAULT; + this.successfulOps = + new MovingFunction( + Duration.standardSeconds(1).getMillis(), + Duration.standardSeconds(1).getMillis(), + 1 /* numSignificantBuckets */, + 1 /* numSignificantSamples */, + Sum.ofLongs()); + this.firstInstant = Instant.now(); + } + + // 500 / numWorkers * 1.5^max(0, (x-5)/5), or "+50% every 5 minutes" + private int calcMaxOpsBudget(Instant first, Instant instant) { + double rampUpIntervalMinutes = (double) RAMP_UP_INTERVAL.getStandardMinutes(); + Duration durationSinceFirst = new Duration(first, instant); + + double calculatedGrowth = + (durationSinceFirst.getStandardMinutes() - rampUpIntervalMinutes) / rampUpIntervalMinutes; + double growth = Math.max(0, calculatedGrowth); + double maxOpsBudget = BASE_BUDGET / this.numWorkers * Math.pow(1.5, growth); + return (int) Math.min(Integer.MAX_VALUE, Math.max(1, maxOpsBudget)); + } + + @Setup + public void setup() { + this.sleeper = Sleeper.DEFAULT; + this.successfulOps = + new MovingFunction( + Duration.standardSeconds(1).getMillis(), + Duration.standardSeconds(1).getMillis(), + 1 /* numSignificantBuckets */, + 1 /* numSignificantSamples */, + Sum.ofLongs()); + } + + /** Emit only as many elements as the exponentially increasing budget allows. 
*/ + @ProcessElement + public void processElement(ProcessContext c) throws IOException, InterruptedException { + Instant nonNullableFirstInstant = firstInstant; + + T element = c.element(); + BackOff backoff = fluentBackoff.backoff(); + while (true) { + Instant instant = Instant.now(); + int maxOpsBudget = calcMaxOpsBudget(nonNullableFirstInstant, instant); + long currentOpCount = successfulOps.get(instant.getMillis()); + long availableOps = maxOpsBudget - currentOpCount; + + if (maxOpsBudget >= Integer.MAX_VALUE || availableOps > 0) { + c.output(element); + successfulOps.add(instant.getMillis(), 1); + return; + } + + long backoffMillis = backoff.nextBackOffMillis(); + LOG.info("Delaying by {}ms to conform to gradual ramp-up.", backoffMillis); + throttlingMsecs.inc(backoffMillis); + sleeper.sleep(backoffMillis); + } + } + + @Override + public void populateDisplayData(DisplayData.Builder builder) { + builder.add( + DisplayData.item("hintNumWorkers", numWorkers) + .withLabel("Number of workers for ramp-up throttling algorithm")); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/CounterFactory.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/CounterFactory.java new file mode 100644 index 000000000000..c3d123b174d7 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/CounterFactory.java @@ -0,0 +1,29 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.firestore; + +import java.io.Serializable; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.Metrics; + +interface CounterFactory extends Serializable { + + CounterFactory DEFAULT = Metrics::counter; + + Counter get(String namespace, String name); +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/DistributionFactory.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/DistributionFactory.java new file mode 100644 index 000000000000..14f414ac3be7 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/DistributionFactory.java @@ -0,0 +1,29 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.firestore; + +import java.io.Serializable; +import org.apache.beam.sdk.metrics.Distribution; +import org.apache.beam.sdk.metrics.Metrics; + +interface DistributionFactory extends Serializable { + + DistributionFactory DEFAULT = Metrics::distribution; + + Distribution get(String namespace, String name); +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreDoFn.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreDoFn.java new file mode 100644 index 000000000000..d7e73aea0f84 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreDoFn.java @@ -0,0 +1,93 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.firestore; + +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.transforms.windowing.BoundedWindow; + +/** + * Base class for all stateful {@link DoFn} defined in the Firestore Connector. + * + *

    This class defines all of the lifecycle events as abstract methods, ensuring each is accounted + * for in any implementing function. + * + *

    This base class also serves as an upper bound for the unit tests where all DoFn are checked to + * ensure they are serializable and adhere to specific lifecycle events. + * + * @param The type of the previous stage of the pipeline + * @param The type output to the next stage of the pipeline + */ +abstract class FirestoreDoFn extends DoFn { + + @Override + public abstract void populateDisplayData(DisplayData.Builder builder); + + /** @see org.apache.beam.sdk.transforms.DoFn.Setup */ + @Setup + public abstract void setup() throws Exception; + + /** @see org.apache.beam.sdk.transforms.DoFn.StartBundle */ + @StartBundle + public abstract void startBundle(DoFn.StartBundleContext context) throws Exception; + + /** + * This class defines a common parent class for those DoFn which rely on the implicit window for + * emitting values while processing a bundle. + */ + abstract static class ImplicitlyWindowedFirestoreDoFn + extends FirestoreDoFn { + /** + * {@link ProcessContext#element() context.element()} must be non-null, otherwise a + * NullPointerException will be thrown. + * + * @param context Context to source element from, and output to + * @see org.apache.beam.sdk.transforms.DoFn.ProcessElement + */ + @ProcessElement + public abstract void processElement(DoFn.ProcessContext context) throws Exception; + + /** @see org.apache.beam.sdk.transforms.DoFn.FinishBundle */ + @FinishBundle + public abstract void finishBundle() throws Exception; + } + + /** + * This class defines a common parent class for those DoFn which must explicitly track the window + * for emitting values while processing bundles. This is primarily necessary to support the + * ability to emit values during {@link #finishBundle(DoFn.FinishBundleContext)} where an output + * value must be explicitly correlated to a window. + */ + abstract static class ExplicitlyWindowedFirestoreDoFn + extends FirestoreDoFn { + /** + * {@link ProcessContext#element() context.element()} must be non-null, otherwise a + * NullPointerException will be thrown. + * + * @param context Context to source element from, and output to + * @see org.apache.beam.sdk.transforms.DoFn.ProcessElement + */ + @ProcessElement + public abstract void processElement( + DoFn.ProcessContext context, BoundedWindow window) throws Exception; + + /** @see org.apache.beam.sdk.transforms.DoFn.FinishBundle */ + @FinishBundle + public abstract void finishBundle(FinishBundleContext context) throws Exception; + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreIO.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreIO.java new file mode 100644 index 000000000000..66956d5e2ab5 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreIO.java @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.firestore; + +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; + +/** + * {@link FirestoreIO} provides an API for reading from and writing to Google Cloud + * Firestore. + * + *

    For documentation see {@link FirestoreV1}. + */ +@Experimental(Kind.SOURCE_SINK) +public final class FirestoreIO { + + private FirestoreIO() {} + + public static FirestoreV1 v1() { + return FirestoreV1.INSTANCE; + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreOptions.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreOptions.java new file mode 100644 index 000000000000..e7afbd5bb595 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreOptions.java @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.firestore; + +import org.apache.beam.sdk.options.Description; +import org.apache.beam.sdk.options.PipelineOptions; +import org.checkerframework.checker.nullness.qual.Nullable; + +@Description("Options used to configure Cloud Firestore IO") +public interface FirestoreOptions extends PipelineOptions { + + /** + * A host port pair to allow connecting to a Cloud Firestore emulator instead of the live service. + * The value passed to this method will take precedent if the {@code FIRESTORE_EMULATOR_HOST} + * environment variable is also set. + * + * @return the string representation of a host and port pair to be used when constructing Cloud + * Firestore clients. + * @see com.google.cloud.firestore.FirestoreOptions.Builder#setEmulatorHost(java.lang.String) + */ + @Nullable + String getEmulatorHost(); + + /** + * Define a host port pair to allow connecting to a Cloud Firestore emulator instead of the live + * service. The value passed to this method will take precedent if the {@code + * FIRESTORE_EMULATOR_HOST} environment variable is also set. + * + * @param host the emulator host and port to connect to + * @see com.google.cloud.firestore.FirestoreOptions.Builder#setEmulatorHost(java.lang.String) + */ + void setEmulatorHost(String host); +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreStatefulComponentFactory.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreStatefulComponentFactory.java new file mode 100644 index 000000000000..d8b8ba75a1d6 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreStatefulComponentFactory.java @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.firestore; + +import com.google.api.gax.core.FixedCredentialsProvider; +import com.google.api.gax.grpc.InstantiatingGrpcChannelProvider; +import com.google.api.gax.retrying.RetrySettings; +import com.google.api.gax.rpc.ClientContext; +import com.google.api.gax.rpc.FixedHeaderProvider; +import com.google.cloud.firestore.FirestoreOptions.EmulatorCredentials; +import com.google.cloud.firestore.v1.FirestoreSettings; +import com.google.cloud.firestore.v1.stub.FirestoreStub; +import com.google.cloud.firestore.v1.stub.GrpcFirestoreStub; +import java.io.Serializable; +import java.security.SecureRandom; +import java.util.Map; +import javax.annotation.concurrent.Immutable; +import org.apache.beam.sdk.extensions.gcp.options.GcpOptions; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.util.Sleeper; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.checkerframework.checker.nullness.qual.NonNull; + +/** + * Factory class for all stateful components used in the Firestore Connector. + * + *

    None of the components returned by any of these factory methods are serializable; this factory + * exists to give a serialization-friendly handle for creating instances of these components. + * + *

    This class is stateless. + */ +@Immutable +class FirestoreStatefulComponentFactory implements Serializable { + + static final FirestoreStatefulComponentFactory INSTANCE = new FirestoreStatefulComponentFactory(); + + private FirestoreStatefulComponentFactory() {} + + /** + * Given a {@link PipelineOptions}, return a pre-configured {@link FirestoreStub} with values set + * based on those options. + * + *

    The provided {@link PipelineOptions} is expected to provide {@link FirestoreOptions} and + * {@link org.apache.beam.sdk.extensions.gcp.options.GcpOptions GcpOptions} for access to {@link + * GcpOptions#getProject()} + * + *

    The instance returned by this method is expected to bind to the lifecycle of a bundle. + * + * @param options The instance of options to read from + * @return a new {@link FirestoreStub} pre-configured with values from the provided options + */ + FirestoreStub getFirestoreStub(PipelineOptions options) { + try { + FirestoreSettings.Builder builder = + FirestoreSettings.newBuilder() + .setHeaderProvider( + new FixedHeaderProvider() { + @Override + public Map<@NonNull String, @NonNull String> getHeaders() { + return ImmutableMap.of("User-Agent", options.getUserAgent()); + } + }); + + RetrySettings retrySettings = RetrySettings.newBuilder().setMaxAttempts(1).build(); + + builder.applyToAllUnaryMethods( + b -> { + b.setRetrySettings(retrySettings); + return null; + }); + + FirestoreOptions firestoreOptions = options.as(FirestoreOptions.class); + String emulatorHostPort = firestoreOptions.getEmulatorHost(); + if (emulatorHostPort != null) { + builder + .setCredentialsProvider(FixedCredentialsProvider.create(new EmulatorCredentials())) + .setEndpoint(emulatorHostPort) + .setTransportChannelProvider( + InstantiatingGrpcChannelProvider.newBuilder() + .setEndpoint(emulatorHostPort) + .setChannelConfigurator(c -> c.usePlaintext()) + .build()); + } else { + GcpOptions gcpOptions = options.as(GcpOptions.class); + builder + .setCredentialsProvider(FixedCredentialsProvider.create(gcpOptions.getGcpCredential())) + .setEndpoint("batch-firestore.googleapis.com:443"); + } + + ClientContext clientContext = ClientContext.create(builder.build()); + return GrpcFirestoreStub.create(clientContext); + } catch (Exception e) { + throw new RuntimeException(e); + } + } + + /** + * Given a {@link RpcQosOptions}, return a new instance of {@link RpcQos} + * + *

    The instance returned by this method is expected to bind to the lifecycle of a worker, and + * specifically live longer than a single bundle. + * + * @param options The instance of options to read from + * @return a new {@link RpcQos} based on the provided options + */ + RpcQos getRpcQos(RpcQosOptions options) { + return new RpcQosImpl( + options, + new SecureRandom(), + Sleeper.DEFAULT, + CounterFactory.DEFAULT, + DistributionFactory.DEFAULT); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1.java new file mode 100644 index 000000000000..74d6636b28a9 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1.java @@ -0,0 +1,1815 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.firestore; + +import static java.util.Objects.requireNonNull; + +import com.google.firestore.v1.BatchGetDocumentsRequest; +import com.google.firestore.v1.BatchGetDocumentsResponse; +import com.google.firestore.v1.BatchWriteRequest; +import com.google.firestore.v1.Cursor; +import com.google.firestore.v1.Document; +import com.google.firestore.v1.ListCollectionIdsRequest; +import com.google.firestore.v1.ListCollectionIdsResponse; +import com.google.firestore.v1.ListDocumentsRequest; +import com.google.firestore.v1.ListDocumentsResponse; +import com.google.firestore.v1.PartitionQueryRequest; +import com.google.firestore.v1.PartitionQueryResponse; +import com.google.firestore.v1.RunQueryRequest; +import com.google.firestore.v1.RunQueryResponse; +import com.google.firestore.v1.StructuredQuery; +import com.google.firestore.v1.StructuredQuery.FieldReference; +import com.google.firestore.v1.StructuredQuery.Projection; +import com.google.firestore.v1.Value; +import com.google.firestore.v1.WriteResult; +import com.google.rpc.Status; +import java.io.Serializable; +import java.util.ArrayList; +import java.util.Comparator; +import java.util.List; +import java.util.Objects; +import java.util.Optional; +import java.util.function.Function; +import javax.annotation.concurrent.Immutable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1ReadFn.BatchGetDocumentsFn; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1ReadFn.ListCollectionIdsFn; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1ReadFn.ListDocumentsFn; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1ReadFn.PartitionQueryFn; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1ReadFn.PartitionQueryPair; +import 
org.apache.beam.sdk.io.gcp.firestore.FirestoreV1ReadFn.RunQueryFn; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1WriteFn.BatchWriteFnWithDeadLetterQueue; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1WriteFn.BatchWriteFnWithSummary; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.MapElements; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.Reshuffle; +import org.apache.beam.sdk.transforms.SimpleFunction; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.transforms.display.HasDisplayData; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PDone; +import org.apache.beam.sdk.values.PInput; +import org.apache.beam.sdk.values.POutput; +import org.checkerframework.checker.nullness.qual.Nullable; + +/** + * {@link FirestoreV1} provides an API which provides lifecycle managed {@link PTransform}s for Cloud Firestore + * v1 API. + * + *

    This class is part of the Firestore Connector DSL and should be accessed via {@link + * FirestoreIO#v1()}. + * + *

    All {@link PTransform}s provided by this API use {@link + * org.apache.beam.sdk.extensions.gcp.options.GcpOptions GcpOptions} on {@link + * org.apache.beam.sdk.options.PipelineOptions PipelineOptions} for credentials access and projectId + * resolution. As such, the lifecycle of gRPC clients and project information is scoped to the + * bundle level, not the worker level. + * + *
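Because credentials, the project id, and the optional emulator endpoint all come from pipeline options rather than from the transforms themselves, the option plumbing is the same for every transform in this connector. A minimal sketch with placeholder values, using the {@code FirestoreOptions} interface added in this patch:

    // Standard Beam option plumbing; "my-project" and the emulator host are placeholders.
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
    options.as(GcpOptions.class).setProject("my-project");
    // Optional: direct all Firestore RPCs at a locally running emulator.
    options.as(FirestoreOptions.class).setEmulatorHost("localhost:8080");
    Pipeline pipeline = Pipeline.create(options);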

    + * + *

    Operations

    + * + *

    Read

    + * + *

    The currently supported read operations and their execution behavior are as follows:
    + *
    + * <table>
    + *   <tr>
    + *     <th>RPC</th>
    + *     <th>Execution Behavior</th>
    + *     <th>Example Usage</th>
    + *   </tr>
    + *   <tr>
    + *     <td>{@link PartitionQuery}</td>
    + *     <td>Parallel Streaming</td>
    + *     <td><pre>
    + * PCollection<{@link PartitionQueryRequest}> partitionQueryRequests = ...;
    + * PCollection<{@link RunQueryRequest}> runQueryRequests = partitionQueryRequests
    + *     .apply(FirestoreIO.v1().read().{@link Read#partitionQuery() partitionQuery()}.build());
    + * PCollection<{@link RunQueryResponse}> runQueryResponses = runQueryRequests
    + *     .apply(FirestoreIO.v1().read().{@link Read#runQuery() runQuery()}.build());
    + *     </pre></td>
    + *   </tr>
    + *   <tr>
    + *     <td>{@link RunQuery}</td>
    + *     <td>Sequential Streaming</td>
    + *     <td><pre>
    + * PCollection<{@link RunQueryRequest}> runQueryRequests = ...;
    + * PCollection<{@link RunQueryResponse}> runQueryResponses = runQueryRequests
    + *     .apply(FirestoreIO.v1().read().{@link Read#runQuery() runQuery()}.build());
    + *     </pre></td>
    + *   </tr>
    + *   <tr>
    + *     <td>{@link BatchGetDocuments}</td>
    + *     <td>Sequential Streaming</td>
    + *     <td><pre>
    + * PCollection<{@link BatchGetDocumentsRequest}> batchGetDocumentsRequests = ...;
    + * PCollection<{@link BatchGetDocumentsResponse}> batchGetDocumentsResponses = batchGetDocumentsRequests
    + *     .apply(FirestoreIO.v1().read().{@link Read#batchGetDocuments() batchGetDocuments()}.build());
    + *     </pre></td>
    + *   </tr>
    + *   <tr>
    + *     <td>{@link ListCollectionIds}</td>
    + *     <td>Sequential Paginated</td>
    + *     <td><pre>
    + * PCollection<{@link ListCollectionIdsRequest}> listCollectionIdsRequests = ...;
    + * PCollection<{@link ListCollectionIdsResponse}> listCollectionIdsResponses = listCollectionIdsRequests
    + *     .apply(FirestoreIO.v1().read().{@link Read#listCollectionIds() listCollectionIds()}.build());
    + *     </pre></td>
    + *   </tr>
    + *   <tr>
    + *     <td>{@link ListDocuments}</td>
    + *     <td>Sequential Paginated</td>
    + *     <td><pre>
    + * PCollection<{@link ListDocumentsRequest}> listDocumentsRequests = ...;
    + * PCollection<{@link ListDocumentsResponse}> listDocumentsResponses = listDocumentsRequests
    + *     .apply(FirestoreIO.v1().read().{@link Read#listDocuments() listDocuments()}.build());
    + *     </pre></td>
    + *   </tr>
    + * </table>
    + * + *

    PartitionQuery should be preferred over other options if at all possible, because it has the + * ability to parallelize execution of multiple queries for specific sub-ranges of the full results. + * When choosing the value to set for {@link PartitionQueryRequest.Builder#setPartitionCount(long)}, + * ensure you are picking a value that makes sense for your data set and your maximum number of + * workers. If you find that a partition query is taking an unexpectedly long time, try increasing + * the number of partitions. Depending on how large your dataset is, increasing the partition count + * by as much as 10x can significantly reduce total partition query wall time. + * + *
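For concreteness, a sketch of feeding {@code partitionQuery()} from a single request. The parent path, collection id, and partition count are placeholder values, the request and query builders are the {@code com.google.firestore.v1} protos, and a {@code Pipeline pipeline} is assumed to be in scope:

    StructuredQuery query =
        StructuredQuery.newBuilder()
            .addFrom(
                StructuredQuery.CollectionSelector.newBuilder()
                    .setCollectionId("my-collection")
                    .setAllDescendants(true))
            .build();
    PartitionQueryRequest request =
        PartitionQueryRequest.newBuilder()
            .setParent("projects/my-project/databases/(default)/documents")
            .setStructuredQuery(query)
            .setPartitionCount(100) // tune to data size and max worker count
            .build();

    PCollection<RunQueryResponse> responses =
        pipeline
            .apply(Create.of(request))
            .apply(FirestoreIO.v1().read().partitionQuery().build())
            .apply(FirestoreIO.v1().read().runQuery().build());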

    You should only ever use ListDocuments if the use of {@code + * show_missing} is needed to access a document. RunQuery and PartitionQuery will always be + * faster if the use of {@code show_missing} is not needed. + * + *
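A corresponding sketch of the {@code show_missing} case mentioned above, again with placeholder names; {@code setShowMissing} is a field on the {@code ListDocumentsRequest} proto:

    ListDocumentsRequest listRequest =
        ListDocumentsRequest.newBuilder()
            .setParent("projects/my-project/databases/(default)/documents")
            .setCollectionId("my-collection")
            // include documents that exist only as parents of sub-collections
            .setShowMissing(true)
            .build();
    PCollection<Document> documents =
        pipeline
            .apply(Create.of(listRequest))
            .apply(FirestoreIO.v1().read().listDocuments().build());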

    Write

    + * + * To write a {@link PCollection} to Cloud Firestore use {@link FirestoreV1#write()}, picking the + * behavior of the writer. + * + *

    Writes use Cloud Firestore's BatchWrite API, which provides fine-grained write semantics. + * + *

    The default behavior is to fail a bundle if any single write fails with a non-retryable error. + * + *

    + * PCollection<{@link com.google.firestore.v1.Write}> writes = ...;
    + * PCollection<{@link WriteSuccessSummary}> sink = writes
    + *     .apply(FirestoreIO.v1().write().{@link Write#batchWrite() batchWrite()}.build());
    + * 
    + * + * Alternatively, if you'd rather output write failures to a Dead Letter Queue add {@link + * BatchWriteWithSummary.Builder#withDeadLetterQueue() withDeadLetterQueue} when building your + * writer. + * + *
    + * PCollection<{@link com.google.firestore.v1.Write}> writes = ...;
    + * PCollection<{@link WriteFailure}> writeFailures = writes
    + *     .apply(FirestoreIO.v1().write().{@link Write#batchWrite() batchWrite()}.withDeadLetterQueue().build());
    + * 
    + * + *
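To make the dead-letter path concrete, a sketch that builds v1 {@code Write} protos from an assumed upstream {@code PCollection<String> ids} of document ids; the project, database, and collection names below are placeholders:

    PCollection<com.google.firestore.v1.Write> writes =
        ids.apply(
            "ToWrite",
            MapElements.via(
                new SimpleFunction<String, com.google.firestore.v1.Write>() {
                  @Override
                  public com.google.firestore.v1.Write apply(String id) {
                    Document doc =
                        Document.newBuilder()
                            .setName(
                                "projects/my-project/databases/(default)/documents/my-collection/"
                                    + id)
                            .putFields("id", Value.newBuilder().setStringValue(id).build())
                            .build();
                    return com.google.firestore.v1.Write.newBuilder().setUpdate(doc).build();
                  }
                }));

    // With the dead letter queue enabled, failed writes become output elements
    // instead of failing the bundle.
    PCollection<WriteFailure> writeFailures =
        writes.apply(FirestoreIO.v1().write().batchWrite().withDeadLetterQueue().build());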

    Permissions

    + * + * Permission requirements depend on the {@code PipelineRunner} that is used to execute the + * pipeline. Please refer to the documentation of corresponding {@code PipelineRunner}s for more + * details. + * + *

    Please see Security for server client + * libraries > Roles for security and permission related information specific to Cloud + * Firestore. + * + *

    Optionally, Cloud Firestore V1 Emulator, running locally, could be used for testing purposes + * by providing the host port information via {@link FirestoreOptions#setEmulatorHost(String)}. In + * such a case, all the Cloud Firestore API calls are directed to the Emulator. + * + * @see FirestoreIO#v1() + * @see org.apache.beam.sdk.PipelineRunner + * @see org.apache.beam.sdk.options.PipelineOptions + * @see org.apache.beam.sdk.extensions.gcp.options.GcpOptions + * @see Cloud + * Firestore v1 API + */ +@Immutable +public final class FirestoreV1 { + static final FirestoreV1 INSTANCE = new FirestoreV1(); + + private FirestoreV1() {} + + /** + * The class returned by this method provides the ability to create {@link PTransform PTransforms} + * for read operations available in the Firestore V1 API provided by {@link + * com.google.cloud.firestore.v1.stub.FirestoreStub FirestoreStub}. + * + *

    This method is part of the Firestore Connector DSL and should be accessed via {@link + * FirestoreIO#v1()}. + * + *

    + * + * @return Type safe builder factory for read operations. + * @see FirestoreIO#v1() + */ + public Read read() { + return Read.INSTANCE; + } + + /** + * The class returned by this method provides the ability to create {@link PTransform PTransforms} + * for write operations available in the Firestore V1 API provided by {@link + * com.google.cloud.firestore.v1.stub.FirestoreStub FirestoreStub}. + * + *

    This method is part of the Firestore Connector DSL and should be accessed via {@link + * FirestoreIO#v1()}. + * + *

    + * + * @return Type safe builder factory for write operations. + * @see FirestoreIO#v1() + */ + public Write write() { + return Write.INSTANCE; + } + + /** + * Type safe builder factory for read operations. + * + *

    This class is part of the Firestore Connector DSL and should be accessed via {@link #read() + * FirestoreIO.v1().read()}. + * + *

    This class provides access to a set of type safe builders for read operations available in + * the Firestore V1 API accessed through {@link com.google.cloud.firestore.v1.stub.FirestoreStub + * FirestoreStub}. Each builder allows configuration before creating an immutable instance which + * can be used in your pipeline. + * + *

    + * + * @see FirestoreIO#v1() + * @see #read() + */ + @Experimental(Kind.SOURCE_SINK) + @Immutable + public static final class Read { + private static final Read INSTANCE = new Read(); + + private Read() {} + + /** + * Factory method to create a new type safe builder for {@link ListDocumentsRequest} operations. + * + *

    This method is part of the Firestore Connector DSL and should be accessed via {@link + * FirestoreIO#v1()}. + * + *

    All request quality-of-service for the built {@link ListDocuments} PTransform is scoped to + * the worker and configured based on the {@link RpcQosOptions} specified via this builder. + * + *

    All logging for the built instance of {@link ListDocuments} will be sent to appender + * {@code org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.ListDocuments}. + * + *

    The following metrics will be available for the built instance of {@link ListDocuments} + * + *

      + *
    1. {@code org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.ListDocuments.throttlingMs} A + * counter tracking the number of milliseconds RPCs are throttled by Qos + *
    2. {@code org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.ListDocuments.rpcFailures} A + * counter tracking the number of failed RPCs + *
    3. {@code org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.ListDocuments.rpcSuccesses} A + * counter tracking the number of successful RPCs + *
    + * + * @return A new type safe builder providing configuration for processing of {@link + * ListDocumentsRequest}s + * @see FirestoreIO#v1() + * @see ListDocuments + * @see ListDocumentsRequest + * @see ListDocumentsResponse + * @see google.firestore.v1.Firestore.ListDocuments + * @see google.firestore.v1.ListDocumentsRequest + * @see google.firestore.v1.ListDocumentsResponse + */ + public ListDocuments.Builder listDocuments() { + return new ListDocuments.Builder(); + } + + /** + * Factory method to create a new type safe builder for {@link ListCollectionIdsRequest} + * operations. + * + *

    This method is part of the Firestore Connector DSL and should be accessed via {@link + * FirestoreIO#v1()}. + * + *

    All request quality-of-service for the built {@link ListCollectionIds} PTransform is + * scoped to the worker and configured based on the {@link RpcQosOptions} specified via this + * builder. + * + *

    All logging for the built instance of {@link ListCollectionIds} will be sent to appender + * {@code org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.ListCollectionIds}. + * + *

    The following metrics will be available for the built instance of {@link + * ListCollectionIds} + * + *

      + *
    1. {@code org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.ListCollectionIds.throttlingMs} + * A counter tracking the number of milliseconds RPCs are throttled by Qos + *
    2. {@code org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.ListCollectionIds.rpcFailures} + * A counter tracking the number of failed RPCs + *
    3. {@code org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.ListCollectionIds.rpcSuccesses} + * A counter tracking the number of successful RPCs + *
    + * + * @return A new type safe builder providing configuration for processing of {@link + * ListCollectionIdsRequest}s + * @see FirestoreIO#v1() + * @see ListCollectionIds + * @see ListCollectionIdsRequest + * @see ListCollectionIdsResponse + * @see google.firestore.v1.Firestore.ListCollectionIds + * @see google.firestore.v1.ListCollectionIdsRequest + * @see google.firestore.v1.ListCollectionIdsResponse + */ + public ListCollectionIds.Builder listCollectionIds() { + return new ListCollectionIds.Builder(); + } + + /** + * Factory method to create a new type safe builder for {@link BatchGetDocumentsRequest} + * operations. + * + *

    This method is part of the Firestore Connector DSL and should be accessed via {@link + * FirestoreIO#v1()}. + * + *

    All request quality-of-service for the built {@link BatchGetDocuments} PTransform is + * scoped to the worker and configured based on the {@link RpcQosOptions} specified via this + * builder. + * + *

    All logging for the built instance of {@link BatchGetDocuments} will be sent to appender + * {@code org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.BatchGetDocuments}. + * + *

    The following metrics will be available for the built instance of {@link + * BatchGetDocuments} + * + *

      + *
    1. {@code org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.BatchGetDocuments.throttlingMs} + * A counter tracking the number of milliseconds RPCs are throttled by Qos + *
    2. {@code org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.BatchGetDocuments.rpcFailures} + * A counter tracking the number of failed RPCs + *
    3. {@code org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.BatchGetDocuments.rpcSuccesses} + * A counter tracking the number of successful RPCs + *
    4. {@code + * org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.BatchGetDocuments.rpcStreamValueReceived} A + * counter tracking the number of values received by a streaming RPC + *
    + * + * @return A new type safe builder providing configuration for processing of {@link + * BatchGetDocumentsRequest}s + * @see FirestoreIO#v1() + * @see BatchGetDocuments + * @see BatchGetDocumentsRequest + * @see BatchGetDocumentsResponse + * @see google.firestore.v1.Firestore.BatchGetDocuments + * @see google.firestore.v1.BatchGetDocumentsRequest + * @see google.firestore.v1.BatchGetDocumentsResponse + */ + public BatchGetDocuments.Builder batchGetDocuments() { + return new BatchGetDocuments.Builder(); + } + + /** + * Factory method to create a new type safe builder for {@link RunQueryRequest} operations. + * + *

    This method is part of the Firestore Connector DSL and should be accessed via {@link + * FirestoreIO#v1()}. + * + *

    All request quality-of-service for the built {@link RunQuery} PTransform is scoped to the + * worker and configured based on the {@link RpcQosOptions} specified via this builder. + * + *

    All logging for the built instance of {@link RunQuery} will be sent to appender {@code + * org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.RunQuery}. + * + *

    The following metrics will be available for the built instance of {@link RunQuery} + * + *

      + *
    1. {@code org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.RunQuery.throttlingMs} A + * counter tracking the number of milliseconds RPCs are throttled by Qos + *
    2. {@code org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.RunQuery.rpcFailures} A counter + * tracking the number of failed RPCs + *
    3. {@code org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.RunQuery.rpcSuccesses} A + * counter tracking the number of successful RPCs + *
    4. {@code + * org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.RunQuery.rpcStreamValueReceived} A + * counter tracking the number of values received by a streaming RPC + *
    + * + * @return A new type safe builder providing configuration for processing of {@link + * RunQueryRequest}s + * @see FirestoreIO#v1() + * @see RunQuery + * @see RunQueryRequest + * @see RunQueryResponse + * @see google.firestore.v1.Firestore.RunQuery + * @see google.firestore.v1.RunQueryRequest + * @see google.firestore.v1.RunQueryResponse + */ + public RunQuery.Builder runQuery() { + return new RunQuery.Builder(); + } + + /** + * Factory method to create a new type safe builder for {@link PartitionQueryRequest} + * operations. + * + *

    This method is part of the Firestore Connector DSL and should be accessed via {@link + * FirestoreIO#v1()}. + * + *

    All request quality-of-service for the built {@link PartitionQuery} PTransform is scoped + * to the worker and configured based on the {@link RpcQosOptions} specified via this builder. + * + * @return A new type safe builder providing configuration for processing of {@link + * PartitionQueryRequest}s + * @see FirestoreIO#v1() + * @see PartitionQuery + * @see PartitionQueryRequest + * @see RunQueryResponse + * @see google.firestore.v1.Firestore.PartitionQuery + * @see google.firestore.v1.PartitionQueryRequest + * @see google.firestore.v1.PartitionQueryResponse + */ + public PartitionQuery.Builder partitionQuery() { + return new PartitionQuery.Builder(); + } + } + + /** + * Type safe builder factory for write operations. + * + *

    This class is part of the Firestore Connector DSL and should be accessed via {@link #write() + * FirestoreIO.v1().write()}. + * + *

    + * + *

    This class provides access to a set of type safe builders for supported write operations + * available in the Firestore V1 API accessed through {@link + * com.google.cloud.firestore.v1.stub.FirestoreStub FirestoreStub}. Each builder allows + * configuration before creating an immutable instance which can be used in your pipeline. + * + * @see FirestoreIO#v1() + * @see #write() + */ + @Experimental(Kind.SOURCE_SINK) + @Immutable + public static final class Write { + private static final Write INSTANCE = new Write(); + + private Write() {} + + /** + * Factory method to create a new type safe builder for {@link com.google.firestore.v1.Write} + * operations. + * + *

    By default, when an error is encountered while trying to write to Cloud Firestore, a {@link + * FailedWritesException} will be thrown. If you would prefer a failed write not to result in a + * {@link FailedWritesException}, you can instead use {@link BatchWriteWithDeadLetterQueue}, + * which will output any failed write. {@link BatchWriteWithDeadLetterQueue} can be used by + * including {@link BatchWriteWithSummary.Builder#withDeadLetterQueue()} when constructing the + * write handler. + *

    This method is part of the Firestore Connector DSL and should be accessed via {@link + * FirestoreIO#v1()}. + * + *

    All request quality-of-service for the built {@link BatchWriteWithSummary} PTransform is + * scoped to the worker and configured based on the {@link RpcQosOptions} specified via this + * builder. + * + * @return A new type safe builder providing configuration for processing of {@link + * com.google.firestore.v1.Write}s + * @see FirestoreIO#v1() + * @see BatchWriteWithSummary + * @see BatchWriteRequest + * @see com.google.firestore.v1.BatchWriteResponse + * @see google.firestore.v1.Firestore.BatchWrite + * @see google.firestore.v1.BatchWriteRequest + * @see google.firestore.v1.BatchWriteResponse + */ + public BatchWriteWithSummary.Builder batchWrite() { + return new BatchWriteWithSummary.Builder(); + } + } + + /** + * Concrete class representing a {@link PTransform}{@code <}{@link PCollection}{@code <}{@link + * ListCollectionIdsRequest}{@code >, }{@link PTransform}{@code <}{@link + * ListCollectionIdsResponse}{@code >>} which will read from Firestore. + * + *

    This class is part of the Firestore Connector DSL, it has a type safe builder accessible via + * {@link FirestoreIO#v1()}{@code .}{@link FirestoreV1#read() read()}{@code .}{@link + * FirestoreV1.Read#listCollectionIds() listCollectionIds()}. + * + *

    All request quality-of-service for an instance of this PTransform is scoped to the worker + * and configured via {@link ListCollectionIds.Builder#withRpcQosOptions(RpcQosOptions)}. + * + * @see FirestoreIO#v1() + * @see FirestoreV1#read() + * @see FirestoreV1.Read#listCollectionIds() + * @see FirestoreV1.ListCollectionIds.Builder + * @see ListCollectionIdsRequest + * @see ListCollectionIdsResponse + * @see google.firestore.v1.Firestore.ListCollectionIds + * @see google.firestore.v1.ListCollectionIdsRequest + * @see google.firestore.v1.ListCollectionIdsResponse + */ + public static final class ListCollectionIds + extends Transform< + PCollection, + PCollection, + ListCollectionIds, + ListCollectionIds.Builder> { + + private ListCollectionIds( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + super(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + + @Override + public PCollection expand(PCollection input) { + return input + .apply( + "listCollectionIds", + ParDo.of( + new ListCollectionIdsFn(clock, firestoreStatefulComponentFactory, rpcQosOptions))) + .apply(ParDo.of(new FlattenListCollectionIdsResponse())) + .apply(Reshuffle.viaRandomKey()); + } + + @Override + public Builder toBuilder() { + return new Builder(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + + /** + * A type safe builder for {@link ListCollectionIds} allowing configuration and instantiation. + * + *

    This class is part of the Firestore Connector DSL, it has a type safe builder accessible + * via {@link FirestoreIO#v1()}{@code .}{@link FirestoreV1#read() read()}{@code .}{@link + * FirestoreV1.Read#listCollectionIds() listCollectionIds()}. + * + *

    + * + * @see FirestoreIO#v1() + * @see FirestoreV1#read() + * @see FirestoreV1.Read#listCollectionIds() + * @see FirestoreV1.ListCollectionIds + * @see ListCollectionIdsRequest + * @see ListCollectionIdsResponse + * @see google.firestore.v1.Firestore.ListCollectionIds + * @see google.firestore.v1.ListCollectionIdsRequest + * @see google.firestore.v1.ListCollectionIdsResponse + */ + public static final class Builder + extends Transform.Builder< + PCollection, + PCollection, + ListCollectionIds, + ListCollectionIds.Builder> { + + private Builder() { + super(); + } + + private Builder( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + super(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + + @Override + public ListCollectionIds build() { + return genericBuild(); + } + + @Override + ListCollectionIds buildSafe( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + return new ListCollectionIds(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + } + } + + /** + * Concrete class representing a {@link PTransform}{@code <}{@link PCollection}{@code <}{@link + * ListDocumentsRequest}{@code >, }{@link PTransform}{@code <}{@link ListDocumentsResponse}{@code + * >>} which will read from Firestore. + * + *

    This class is part of the Firestore Connector DSL, it has a type safe builder accessible via + * {@link FirestoreIO#v1()}{@code .}{@link FirestoreV1#read() read()}{@code .}{@link + * FirestoreV1.Read#listDocuments() listDocuments()}. + * + *

    All request quality-of-service for an instance of this PTransform is scoped to the worker + * and configured via {@link ListDocuments.Builder#withRpcQosOptions(RpcQosOptions)}. + * + * @see FirestoreIO#v1() + * @see FirestoreV1#read() + * @see FirestoreV1.Read#listDocuments() + * @see FirestoreV1.ListDocuments.Builder + * @see ListDocumentsRequest + * @see ListDocumentsResponse + * @see google.firestore.v1.Firestore.ListDocuments + * @see google.firestore.v1.ListDocumentsRequest + * @see google.firestore.v1.ListDocumentsResponse + */ + public static final class ListDocuments + extends Transform< + PCollection, + PCollection, + ListDocuments, + ListDocuments.Builder> { + + private ListDocuments( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + super(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + + @Override + public PCollection expand(PCollection input) { + return input + .apply( + "listDocuments", + ParDo.of( + new ListDocumentsFn(clock, firestoreStatefulComponentFactory, rpcQosOptions))) + .apply(ParDo.of(new ListDocumentsResponseToDocument())) + .apply(Reshuffle.viaRandomKey()); + } + + @Override + public Builder toBuilder() { + return new Builder(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + + /** + * A type safe builder for {@link ListDocuments} allowing configuration and instantiation. + * + *

    This class is part of the Firestore Connector DSL, it has a type safe builder accessible + * via {@link FirestoreIO#v1()}{@code .}{@link FirestoreV1#read() read()}{@code .}{@link + * FirestoreV1.Read#listDocuments() listDocuments()}. + * + *

    + * + * @see FirestoreIO#v1() + * @see FirestoreV1#read() + * @see FirestoreV1.Read#listDocuments() + * @see FirestoreV1.ListDocuments + * @see ListDocumentsRequest + * @see ListDocumentsResponse + * @see google.firestore.v1.Firestore.ListDocuments + * @see google.firestore.v1.ListDocumentsRequest + * @see google.firestore.v1.ListDocumentsResponse + */ + public static final class Builder + extends Transform.Builder< + PCollection, + PCollection, + ListDocuments, + ListDocuments.Builder> { + + private Builder() { + super(); + } + + private Builder( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + super(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + + @Override + public ListDocuments build() { + return genericBuild(); + } + + @Override + ListDocuments buildSafe( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + return new ListDocuments(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + } + } + + /** + * Concrete class representing a {@link PTransform}{@code <}{@link PCollection}{@code <}{@link + * RunQueryRequest}{@code >, }{@link PTransform}{@code <}{@link RunQueryResponse}{@code >>} which + * will read from Firestore. + * + *

    This class is part of the Firestore Connector DSL, it has a type safe builder accessible via + * {@link FirestoreIO#v1()}{@code .}{@link FirestoreV1#read() read()}{@code .}{@link + * FirestoreV1.Read#runQuery() runQuery()}. + * + *

    All request quality-of-service for an instance of this PTransform is scoped to the worker + * and configured via {@link RunQuery.Builder#withRpcQosOptions(RpcQosOptions)}. + * + * @see FirestoreIO#v1() + * @see FirestoreV1#read() + * @see FirestoreV1.Read#runQuery() + * @see FirestoreV1.RunQuery.Builder + * @see RunQueryRequest + * @see RunQueryResponse + * @see google.firestore.v1.Firestore.RunQuery + * @see google.firestore.v1.RunQueryRequest + * @see google.firestore.v1.RunQueryResponse + */ + // TODO(BEAM-12605): Add dynamic work rebalancing to support a Splittable DoFn + // TODO(BEAM-12606): Add support for progress reporting + public static final class RunQuery + extends Transform< + PCollection, PCollection, RunQuery, RunQuery.Builder> { + + private RunQuery( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + super(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + + @Override + public PCollection expand(PCollection input) { + return input + .apply( + "runQuery", + ParDo.of(new RunQueryFn(clock, firestoreStatefulComponentFactory, rpcQosOptions))) + .apply(Reshuffle.viaRandomKey()); + } + + @Override + public Builder toBuilder() { + return new Builder(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + + /** + * A type safe builder for {@link RunQuery} allowing configuration and instantiation. + * + *

    This class is part of the Firestore Connector DSL, it has a type safe builder accessible + * via {@link FirestoreIO#v1()}{@code .}{@link FirestoreV1#read() read()}{@code .}{@link + * FirestoreV1.Read#runQuery() runQuery()}. + * + *
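+     * <p>For illustration only, a sketch of configuring the optional QOS settings before building
+     * the transform; the {@code queries} and {@code responses} names are assumptions:
+     *
+     * <pre>{@code
+     * PCollection<RunQueryRequest> queries = ...;  // assumed upstream collection
+     * PCollection<RunQueryResponse> responses =
+     *     queries.apply(
+     *         FirestoreIO.v1()
+     *             .read()
+     *             .runQuery()
+     *             .withRpcQosOptions(RpcQosOptions.defaultOptions())
+     *             .build());
+     * }</pre>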

    + * + * @see FirestoreIO#v1() + * @see FirestoreV1#read() + * @see FirestoreV1.Read#runQuery() + * @see FirestoreV1.RunQuery + * @see RunQueryRequest + * @see RunQueryResponse + * @see google.firestore.v1.Firestore.RunQuery + * @see google.firestore.v1.RunQueryRequest + * @see google.firestore.v1.RunQueryResponse + */ + public static final class Builder + extends Transform.Builder< + PCollection, + PCollection, + RunQuery, + RunQuery.Builder> { + + private Builder() { + super(); + } + + private Builder( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + super(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + + @Override + public RunQuery build() { + return genericBuild(); + } + + @Override + RunQuery buildSafe( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + return new RunQuery(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + } + } + + /** + * Concrete class representing a {@link PTransform}{@code <}{@link PCollection}{@code <}{@link + * BatchGetDocumentsRequest}{@code >, }{@link PTransform}{@code <}{@link + * BatchGetDocumentsResponse}{@code >>} which will read from Firestore. + * + *

    This class is part of the Firestore Connector DSL, it has a type safe builder accessible via + * {@link FirestoreIO#v1()}{@code .}{@link FirestoreV1#read() read()}{@code .}{@link + * FirestoreV1.Read#batchGetDocuments() batchGetDocuments()}. + * + *

    All request quality-of-service for an instance of this PTransform is scoped to the worker + * and configured via {@link BatchGetDocuments.Builder#withRpcQosOptions(RpcQosOptions)}. + * + * @see FirestoreIO#v1() + * @see FirestoreV1#read() + * @see FirestoreV1.Read#batchGetDocuments() + * @see FirestoreV1.BatchGetDocuments.Builder + * @see BatchGetDocumentsRequest + * @see BatchGetDocumentsResponse + * @see google.firestore.v1.Firestore.BatchGetDocuments + * @see google.firestore.v1.BatchGetDocumentsRequest + * @see google.firestore.v1.BatchGetDocumentsResponse + */ + public static final class BatchGetDocuments + extends Transform< + PCollection, + PCollection, + BatchGetDocuments, + BatchGetDocuments.Builder> { + + private BatchGetDocuments( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + super(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + + @Override + public PCollection expand( + PCollection input) { + return input + .apply( + "batchGetDocuments", + ParDo.of( + new BatchGetDocumentsFn(clock, firestoreStatefulComponentFactory, rpcQosOptions))) + .apply(Reshuffle.viaRandomKey()); + } + + @Override + public Builder toBuilder() { + return new Builder(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + + /** + * A type safe builder for {@link BatchGetDocuments} allowing configuration and instantiation. + * + *

    This class is part of the Firestore Connector DSL, it has a type safe builder accessible + * via {@link FirestoreIO#v1()}{@code .}{@link FirestoreV1#read() read()}{@code .}{@link + * FirestoreV1.Read#batchGetDocuments() batchGetDocuments()}. + * + *
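+     * <p>For illustration only, a sketch of seeding a single request and applying the built
+     * transform; the {@code pipeline} name, the project, and the document path are assumed values,
+     * and {@code Create} refers to {@code org.apache.beam.sdk.transforms.Create}:
+     *
+     * <pre>{@code
+     * BatchGetDocumentsRequest request =
+     *     BatchGetDocumentsRequest.newBuilder()
+     *         .setDatabase("projects/example-project/databases/(default)")
+     *         .addDocuments("projects/example-project/databases/(default)/documents/users/alice")
+     *         .build();
+     * PCollection<BatchGetDocumentsResponse> responses =
+     *     pipeline
+     *         .apply(Create.of(request))
+     *         .apply(FirestoreIO.v1().read().batchGetDocuments().build());
+     * }</pre>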

    + * + * @see FirestoreIO#v1() + * @see FirestoreV1#read() + * @see FirestoreV1.Read#batchGetDocuments() + * @see FirestoreV1.BatchGetDocuments + * @see BatchGetDocumentsRequest + * @see BatchGetDocumentsResponse + * @see google.firestore.v1.Firestore.BatchGetDocuments + * @see google.firestore.v1.BatchGetDocumentsRequest + * @see google.firestore.v1.BatchGetDocumentsResponse + */ + public static final class Builder + extends Transform.Builder< + PCollection, + PCollection, + BatchGetDocuments, + BatchGetDocuments.Builder> { + + private Builder() { + super(); + } + + public Builder( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + super(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + + @Override + public BatchGetDocuments build() { + return genericBuild(); + } + + @Override + BatchGetDocuments buildSafe( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + return new BatchGetDocuments(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + } + } + + /** + * Concrete class representing a {@link PTransform}{@code <}{@link PCollection}{@code <}{@link + * PartitionQueryRequest}{@code >, }{@link PTransform}{@code <}{@link RunQueryRequest}{@code >>} + * which will read from Firestore. + * + *

Performs the necessary operations and handling of {@link PartitionQueryResponse}s to yield a + * number of {@link RunQueryRequest}s that can be executed in parallel. + * + *

    This class is part of the Firestore Connector DSL, it has a type safe builder accessible via + * {@link FirestoreIO#v1()}{@code .}{@link FirestoreV1#read() read()}{@code .}{@link + * FirestoreV1.Read#partitionQuery() partitionQuery()}. + * + *

    All request quality-of-service for an instance of this PTransform is scoped to the worker + * and configured via {@link PartitionQuery.Builder#withRpcQosOptions(RpcQosOptions)}. + * + * @see FirestoreIO#v1() + * @see FirestoreV1#read() + * @see FirestoreV1.Read#partitionQuery() + * @see FirestoreV1.PartitionQuery.Builder + * @see FirestoreV1.Read#runQuery() + * @see FirestoreV1.RunQuery + * @see FirestoreV1.RunQuery.Builder + * @see PartitionQueryRequest + * @see RunQueryResponse + * @see google.firestore.v1.Firestore.PartitionQuery + * @see google.firestore.v1.PartitionQueryRequest + * @see google.firestore.v1.PartitionQueryResponse + */ + public static final class PartitionQuery + extends Transform< + PCollection, + PCollection, + PartitionQuery, + PartitionQuery.Builder> { + + private final boolean nameOnlyQuery; + + private PartitionQuery( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions, + boolean nameOnlyQuery) { + super(clock, firestoreStatefulComponentFactory, rpcQosOptions); + this.nameOnlyQuery = nameOnlyQuery; + } + + @Override + public PCollection expand(PCollection input) { + PCollection queries = + input + .apply( + "PartitionQuery", + ParDo.of( + new PartitionQueryFn( + clock, firestoreStatefulComponentFactory, rpcQosOptions))) + .apply("expand queries", ParDo.of(new PartitionQueryResponseToRunQueryRequest())); + if (nameOnlyQuery) { + queries = + queries.apply( + "set name only query", + MapElements.via( + new SimpleFunction() { + @Override + public RunQueryRequest apply(RunQueryRequest input) { + RunQueryRequest.Builder builder = input.toBuilder(); + builder + .getStructuredQueryBuilder() + .setSelect( + Projection.newBuilder() + .addFields( + FieldReference.newBuilder() + .setFieldPath("__name__") + .build()) + .build()); + return builder.build(); + } + })); + } + return queries.apply(Reshuffle.viaRandomKey()); + } + + @Override + public Builder toBuilder() { + return new Builder(clock, firestoreStatefulComponentFactory, rpcQosOptions, nameOnlyQuery); + } + + /** + * A type safe builder for {@link PartitionQuery} allowing configuration and instantiation. + * + *

    This class is part of the Firestore Connector DSL, it has a type safe builder accessible + * via {@link FirestoreIO#v1()}{@code .}{@link FirestoreV1#read() read()}{@code .}{@link + * FirestoreV1.Read#partitionQuery() partitionQuery()}. + * + *
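+     * <p>For illustration only, a sketch of pairing the built transform with {@code runQuery()} so
+     * that the produced queries are executed in parallel (a response carrying N partition cursors
+     * yields N + 1 {@code RunQueryRequest}s); the {@code partitionRequests} name is an assumption:
+     *
+     * <pre>{@code
+     * PCollection<PartitionQueryRequest> partitionRequests = ...;  // assumed upstream collection
+     * PCollection<RunQueryRequest> queries =
+     *     partitionRequests.apply(
+     *         FirestoreIO.v1().read().partitionQuery().withNameOnlyQuery().build());
+     * PCollection<RunQueryResponse> responses =
+     *     queries.apply(FirestoreIO.v1().read().runQuery().build());
+     * }</pre>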

    + * + * @see FirestoreIO#v1() + * @see FirestoreV1#read() + * @see FirestoreV1.Read#partitionQuery() + * @see FirestoreV1.PartitionQuery + * @see PartitionQueryRequest + * @see RunQueryResponse + * @see google.firestore.v1.Firestore.PartitionQuery + * @see google.firestore.v1.PartitionQueryRequest + * @see google.firestore.v1.PartitionQueryResponse + */ + public static final class Builder + extends Transform.Builder< + PCollection, + PCollection, + PartitionQuery, + FirestoreV1.PartitionQuery.Builder> { + + private boolean nameOnlyQuery = false; + + private Builder() { + super(); + } + + public Builder( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions, + boolean nameOnlyQuery) { + super(clock, firestoreStatefulComponentFactory, rpcQosOptions); + this.nameOnlyQuery = nameOnlyQuery; + } + + @Override + public PartitionQuery build() { + return genericBuild(); + } + + /** + * Update produced queries to only retrieve their {@code __name__} thereby not retrieving any + * fields and reducing resource requirements. + * + * @return this builder + */ + public Builder withNameOnlyQuery() { + this.nameOnlyQuery = true; + return this; + } + + @Override + PartitionQuery buildSafe( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + return new PartitionQuery( + clock, firestoreStatefulComponentFactory, rpcQosOptions, nameOnlyQuery); + } + } + + /** + * DoFn which contains the logic necessary to turn a {@link PartitionQueryRequest} and {@link + * PartitionQueryResponse} pair into {@code N} {@link RunQueryRequest}. + */ + static final class PartitionQueryResponseToRunQueryRequest + extends DoFn { + + /** + * When fetching cursors that span multiple pages it is expected (per + * PartitionQueryRequest.page_token) for the client to sort the cursors before processing + * them to define the sub-queries. So here we're defining a Comparator which will sort Cursors + * by the first reference value present, then comparing the reference values + * lexicographically. + */ + static final Comparator CURSOR_REFERENCE_VALUE_COMPARATOR; + + static { + Function> firstReferenceValue = + (Cursor c) -> + c.getValuesList().stream() + .filter( + v -> { + String referenceValue = v.getReferenceValue(); + return referenceValue != null && !referenceValue.isEmpty(); + }) + .findFirst(); + Function stringToPath = (String s) -> s.split("/"); + // compare references by their path segments rather than as a whole string to ensure + // per path segment comparison is taken into account. 
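+      // Illustrative ordering only (shortened paths; real reference values are full resource
+      // names): "users/alice" sorts before "users/alice/posts/1" because a path that is a strict
+      // prefix of another compares as smaller, and "users/alice" sorts before "users/bob" because
+      // the second segments compare lexicographically.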
+ Comparator pathWiseCompare = + (String[] path1, String[] path2) -> { + int minLength = Math.min(path1.length, path2.length); + for (int i = 0; i < minLength; i++) { + String pathSegment1 = path1[i]; + String pathSegment2 = path2[i]; + int compare = pathSegment1.compareTo(pathSegment2); + if (compare != 0) { + return compare; + } + } + if (path1.length == path2.length) { + return 0; + } else if (minLength == path1.length) { + return -1; + } else { + return 1; + } + }; + + // Sort those cursors which have no firstReferenceValue at the bottom of the list + CURSOR_REFERENCE_VALUE_COMPARATOR = + Comparator.comparing( + firstReferenceValue, + (o1, o2) -> { + if (o1.isPresent() && o2.isPresent()) { + return pathWiseCompare.compare( + stringToPath.apply(o1.get().getReferenceValue()), + stringToPath.apply(o2.get().getReferenceValue())); + } else if (o1.isPresent()) { + return -1; + } else { + return 1; + } + }); + } + + @ProcessElement + public void processElement(ProcessContext c) { + PartitionQueryPair pair = c.element(); + PartitionQueryRequest partitionQueryRequest = pair.getRequest(); + String dbRoot = partitionQueryRequest.getParent(); + StructuredQuery structuredQuery = partitionQueryRequest.getStructuredQuery(); + PartitionQueryResponse partitionQueryResponse = pair.getResponse(); + // create a new list before we sort things + List cursors = new ArrayList<>(partitionQueryResponse.getPartitionsList()); + cursors.sort(CURSOR_REFERENCE_VALUE_COMPARATOR); + final int size = cursors.size(); + if (size == 0) { + emit(c, dbRoot, structuredQuery.toBuilder()); + return; + } + final int lastIdx = size - 1; + for (int i = 0; i < size; i++) { + Cursor curr = cursors.get(i); + + if (i == 0) { + // first cursor, emit a range of everything up to the current cursor + emit(c, dbRoot, structuredQuery.toBuilder().setEndAt(curr)); + } + + if (0 < i && i <= lastIdx) { + Cursor prev = cursors.get(i - 1); + // emit a range for values between prev and curr + emit(c, dbRoot, structuredQuery.toBuilder().setStartAt(prev).setEndAt(curr)); + } + + if (i == lastIdx) { + // last cursor, emit a range of everything from the current cursor onward + emit(c, dbRoot, structuredQuery.toBuilder().setStartAt(curr)); + } + } + } + + private void emit(ProcessContext c, String dbRoot, StructuredQuery.Builder builder) { + RunQueryRequest runQueryRequest = + RunQueryRequest.newBuilder() + .setParent(dbRoot) + .setStructuredQuery(builder.build()) + .build(); + c.output(runQueryRequest); + } + } + } + + /** DoFn to output CollectionIds from a {@link ListCollectionIdsResponse}. */ + private static final class FlattenListCollectionIdsResponse + extends DoFn { + @ProcessElement + public void processElement(ProcessContext c) { + c.element().getCollectionIdsList().forEach(c::output); + } + } + + /** DoFn to output {@link Document}s from a {@link ListDocumentsResponse}. */ + private static final class ListDocumentsResponseToDocument + extends DoFn { + @ProcessElement + public void processElement(ProcessContext c) { + c.element().getDocumentsList().forEach(c::output); + } + } + + /** + * Concrete class representing a {@link PTransform}{@code <}{@link PCollection}{@code <}{@link + * com.google.firestore.v1.Write}{@code >, }{@link PDone}{@code >} which will write to Firestore. + * + *

If an error is encountered while trying to write to Cloud Firestore, a {@link + * FailedWritesException} will be thrown. If you would prefer that a failed write not result in a + * {@link FailedWritesException}, use {@link BatchWriteWithDeadLetterQueue} instead, which + * will output any failed write. {@link BatchWriteWithDeadLetterQueue} can be used by + * including {@link Builder#withDeadLetterQueue()} when constructing the write handler. + * + *

    This class is part of the Firestore Connector DSL, it has a type safe builder accessible via + * {@link FirestoreIO#v1()}{@code .}{@link FirestoreV1#write() write()}{@code .}{@link + * FirestoreV1.Write#batchWrite() batchWrite()}. + * + *

    All request quality-of-service for an instance of this PTransform is scoped to the worker + * and configured via {@link Builder#withRpcQosOptions(RpcQosOptions)}. + * + *

    Writes performed against Firestore will be ordered and grouped to maximize throughput while + * maintaining a high request success rate. Batch sizes will be determined by the QOS layer. + * + * @see FirestoreIO#v1() + * @see FirestoreV1#write() + * @see FirestoreV1.Write#batchWrite() + * @see BatchWriteWithSummary.Builder + * @see BatchWriteWithDeadLetterQueue + * @see BatchWriteRequest + * @see com.google.firestore.v1.BatchWriteResponse + * @see google.firestore.v1.Firestore.BatchWrite + * @see google.firestore.v1.BatchWriteRequest + * @see google.firestore.v1.BatchWriteResponse + */ + public static final class BatchWriteWithSummary + extends Transform< + PCollection, + PCollection, + BatchWriteWithSummary, + BatchWriteWithSummary.Builder> { + + private BatchWriteWithSummary( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + super(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + + @Override + public PCollection expand( + PCollection input) { + return input.apply( + "batchWrite", + ParDo.of( + new BatchWriteFnWithSummary( + clock, + firestoreStatefulComponentFactory, + rpcQosOptions, + CounterFactory.DEFAULT))); + } + + @Override + public Builder toBuilder() { + return new Builder(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + + /** + * A type safe builder for {@link BatchWriteWithSummary} allowing configuration and + * instantiation. + * + *

    This class is part of the Firestore Connector DSL, it has a type safe builder accessible + * via {@link FirestoreIO#v1()}{@code .}{@link FirestoreV1#write() write()}{@code .}{@link + * FirestoreV1.Write#batchWrite() batchWrite()}. + * + *
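+     * <p>For illustration only, a sketch of writing a {@code PCollection} of {@code
+     * com.google.firestore.v1.Write} and observing the per-batch success summaries; the
+     * {@code writes} and {@code summaries} names are assumptions:
+     *
+     * <pre>{@code
+     * PCollection<com.google.firestore.v1.Write> writes = ...;  // assumed upstream collection
+     * PCollection<FirestoreV1.WriteSuccessSummary> summaries =
+     *     writes.apply(
+     *         FirestoreIO.v1()
+     *             .write()
+     *             .batchWrite()
+     *             .withRpcQosOptions(RpcQosOptions.defaultOptions())
+     *             .build());
+     * }</pre>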

    + * + * @see FirestoreIO#v1() + * @see FirestoreV1#write() + * @see FirestoreV1.Write#batchWrite() + * @see BatchWriteWithSummary + * @see BatchWriteRequest + * @see com.google.firestore.v1.BatchWriteResponse + * @see google.firestore.v1.Firestore.BatchWrite + * @see google.firestore.v1.BatchWriteRequest + * @see google.firestore.v1.BatchWriteResponse + */ + public static final class Builder + extends Transform.Builder< + PCollection, + PCollection, + BatchWriteWithSummary, + BatchWriteWithSummary.Builder> { + + private Builder() { + super(); + } + + private Builder( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + super(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + + public BatchWriteWithDeadLetterQueue.Builder withDeadLetterQueue() { + return new BatchWriteWithDeadLetterQueue.Builder( + clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + + @Override + public BatchWriteWithSummary build() { + return genericBuild(); + } + + @Override + BatchWriteWithSummary buildSafe( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + return new BatchWriteWithSummary(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + } + } + + /** + * Concrete class representing a {@link PTransform}{@code <}{@link PCollection}{@code <}{@link + * com.google.firestore.v1.Write}{@code >, }{@link PCollection}{@code <}{@link WriteFailure}{@code + * >} which will write to Firestore. {@link WriteFailure}s output by this {@code PTransform} are + * those writes which were not able to be applied to Cloud Firestore. + * + *

Use this BatchWrite when you do not want a failed write to cause an entire bundle to fail. + * + *

    This class is part of the Firestore Connector DSL, it has a type safe builder accessible via + * {@link FirestoreIO#v1()}{@code .}{@link FirestoreV1#write() write()}{@code .}{@link + * FirestoreV1.Write#batchWrite() batchWrite()}{@code .}{@link + * BatchWriteWithSummary.Builder#withDeadLetterQueue() withDeadLetterQueue()}. + * + *
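+   * <p>For illustration only, a sketch of routing failed writes to a dead letter output instead of
+   * failing the bundle; the {@code writes} and {@code failures} names are assumptions:
+   *
+   * <pre>{@code
+   * PCollection<com.google.firestore.v1.Write> writes = ...;  // assumed upstream collection
+   * PCollection<FirestoreV1.WriteFailure> failures =
+   *     writes.apply(
+   *         FirestoreIO.v1().write().batchWrite().withDeadLetterQueue().build());
+   * // Each WriteFailure exposes the original write and the returned Status, e.g.
+   * // failure.getStatus().getMessage(), for logging or reprocessing.
+   * }</pre>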

    All request quality-of-service for an instance of this PTransform is scoped to the worker + * and configured via {@link Builder#withRpcQosOptions(RpcQosOptions)}. + * + *

    Writes performed against Firestore will be ordered and grouped to maximize throughput while + * maintaining a high request success rate. Batch sizes will be determined by the QOS layer. + * + * @see FirestoreIO#v1() + * @see FirestoreV1#write() + * @see FirestoreV1.Write#batchWrite() + * @see BatchWriteWithSummary.Builder + * @see BatchWriteWithSummary.Builder#withDeadLetterQueue() + * @see BatchWriteRequest + * @see com.google.firestore.v1.BatchWriteResponse + * @see google.firestore.v1.Firestore.BatchWrite + * @see google.firestore.v1.BatchWriteRequest + * @see google.firestore.v1.BatchWriteResponse + */ + public static final class BatchWriteWithDeadLetterQueue + extends Transform< + PCollection, + PCollection, + BatchWriteWithDeadLetterQueue, + BatchWriteWithDeadLetterQueue.Builder> { + + private BatchWriteWithDeadLetterQueue( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + super(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + + @Override + public PCollection expand(PCollection input) { + return input.apply( + "batchWrite", + ParDo.of( + new BatchWriteFnWithDeadLetterQueue( + clock, + firestoreStatefulComponentFactory, + rpcQosOptions, + CounterFactory.DEFAULT))); + } + + @Override + public Builder toBuilder() { + return new Builder(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + + /** + * A type safe builder for {@link BatchWriteWithDeadLetterQueue} allowing configuration and + * instantiation. + * + *

    This class is part of the Firestore Connector DSL, it has a type safe builder accessible + * via {@link FirestoreIO#v1()}{@code .}{@link FirestoreV1#write() write()}{@code .}{@link + * FirestoreV1.Write#batchWrite() batchWrite()}. + * + *

    + * + * @see FirestoreIO#v1() + * @see FirestoreV1#write() + * @see FirestoreV1.Write#batchWrite() + * @see BatchWriteWithSummary + * @see BatchWriteRequest + * @see com.google.firestore.v1.BatchWriteResponse + * @see google.firestore.v1.Firestore.BatchWrite + * @see google.firestore.v1.BatchWriteRequest + * @see google.firestore.v1.BatchWriteResponse + */ + public static final class Builder + extends Transform.Builder< + PCollection, + PCollection, + BatchWriteWithDeadLetterQueue, + BatchWriteWithDeadLetterQueue.Builder> { + + private Builder() { + super(); + } + + private Builder( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + super(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + + @Override + public BatchWriteWithDeadLetterQueue build() { + return genericBuild(); + } + + @Override + BatchWriteWithDeadLetterQueue buildSafe( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + return new BatchWriteWithDeadLetterQueue( + clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + } + } + + /** + * Summary object produced when a number of writes are successfully written to Firestore in a + * single BatchWrite. + */ + @Immutable + public static final class WriteSuccessSummary implements Serializable { + private final int numWrites; + private final long numBytes; + + public WriteSuccessSummary(int numWrites, long numBytes) { + this.numWrites = numWrites; + this.numBytes = numBytes; + } + + public int getNumWrites() { + return numWrites; + } + + public long getNumBytes() { + return numBytes; + } + + @Override + public boolean equals(@Nullable Object o) { + if (this == o) { + return true; + } + if (!(o instanceof WriteSuccessSummary)) { + return false; + } + WriteSuccessSummary that = (WriteSuccessSummary) o; + return numWrites == that.numWrites && numBytes == that.numBytes; + } + + @Override + public int hashCode() { + return Objects.hash(numWrites, numBytes); + } + + @Override + public String toString() { + return "WriteSummary{" + "numWrites=" + numWrites + ", numBytes=" + numBytes + '}'; + } + } + + /** + * Failure details for an attempted {@link com.google.firestore.v1.Write}. When a {@link + * com.google.firestore.v1.Write} is unable to be applied an instance of this class will be + * created with the details of the failure. + * + *

    Included data: + * + *

      + *
    • The original {@link com.google.firestore.v1.Write} + *
    • The {@link WriteResult} returned by the Cloud Firestore API + *
• The {@link Status} returned by the Cloud Firestore API (often {@link Status#getMessage()} + * will provide details of why the write was unsuccessful) + *
    + */ + @Immutable + public static final class WriteFailure implements Serializable { + private final com.google.firestore.v1.Write write; + private final WriteResult writeResult; + private final Status status; + + public WriteFailure( + com.google.firestore.v1.Write write, WriteResult writeResult, Status status) { + this.write = write; + this.writeResult = writeResult; + this.status = status; + } + + public com.google.firestore.v1.Write getWrite() { + return write; + } + + public WriteResult getWriteResult() { + return writeResult; + } + + public Status getStatus() { + return status; + } + + @Override + public boolean equals(@Nullable Object o) { + if (this == o) { + return true; + } + if (!(o instanceof WriteFailure)) { + return false; + } + WriteFailure that = (WriteFailure) o; + return write.equals(that.write) + && writeResult.equals(that.writeResult) + && status.equals(that.status); + } + + @Override + public int hashCode() { + return Objects.hash(write, writeResult, status); + } + } + + /** + * Exception that is thrown if one or more {@link com.google.firestore.v1.Write}s is unsuccessful + * with a non-retryable status code. + */ + public static class FailedWritesException extends RuntimeException { + private final List writeFailures; + + public FailedWritesException(List writeFailures) { + super(String.format("Not-retryable status code(s) for %d writes", writeFailures.size())); + this.writeFailures = writeFailures; + } + + /** This list of {@link WriteFailure}s detailing which writes failed and for what reason. */ + public List getWriteFailures() { + return writeFailures; + } + } + + /** + * Our base PTransform class for Firestore V1 API related functions. + * + *

    + * + * @param The type of the previous stage of the pipeline, usually a {@link PCollection} of a + * request type from {@link com.google.firestore.v1} + * @param The type returned from the RPC operation (usually a response class from {@link + * com.google.firestore.v1}) + * @param The type of this transform used to bind this type and the corresponding type + * safe {@link BldrT} together + * @param The type of the type safe builder which is used to build and instance of {@link + * TrfmT} + */ + private abstract static class Transform< + InT extends PInput, + OutT extends POutput, + TrfmT extends Transform, + BldrT extends Transform.Builder> + extends PTransform implements HasDisplayData { + final JodaClock clock; + final FirestoreStatefulComponentFactory firestoreStatefulComponentFactory; + final RpcQosOptions rpcQosOptions; + + Transform( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + this.clock = clock; + this.firestoreStatefulComponentFactory = firestoreStatefulComponentFactory; + this.rpcQosOptions = rpcQosOptions; + } + + @Override + public final void populateDisplayData(DisplayData.Builder builder) { + super.populateDisplayData(builder); + } + + /** + * Create a new {@link BldrT Builder} from the current instance. + * + * @return a new instance of a {@link BldrT Builder} initialized to the current state of this + * instance + */ + public abstract BldrT toBuilder(); + + /** + * Our base type safe builder for a {@link FirestoreV1.Transform} + * + *

    This type safe builder provides a user (and semver) friendly way to expose optional + * parameters to those users that wish to configure them. Additionally, we are able to add and + * deprecate individual parameters as may be needed. + * + * @param The type of the previous stage of the pipeline, usually a {@link PCollection} of + * a request type from {@link com.google.firestore.v1} + * @param The type returned from the RPC operation (usually a response class from {@link + * com.google.firestore.v1}) + * @param The type of this transform used to bind this type and the corresponding type + * safe {@link BldrT} together + * @param The type of the type safe builder which is used to build and instance of + * {@link TrfmT} + */ + abstract static class Builder< + InT extends PInput, + OutT extends POutput, + TrfmT extends Transform, + BldrT extends Transform.Builder> { + + JodaClock clock; + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory; + RpcQosOptions rpcQosOptions; + + Builder() { + clock = JodaClock.DEFAULT; + firestoreStatefulComponentFactory = FirestoreStatefulComponentFactory.INSTANCE; + rpcQosOptions = RpcQosOptions.defaultOptions(); + } + + private Builder( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + this.clock = clock; + this.firestoreStatefulComponentFactory = firestoreStatefulComponentFactory; + this.rpcQosOptions = rpcQosOptions; + } + + /** + * Convenience method to take care of hiding the unchecked cast warning from the compiler. + * This cast is safe because we are always an instance of {@link BldrT} as the only way to get + * an instance of {@link FirestoreV1.Transform.Builder} is for it to conform to {@code Bldr}'s + * constraints. + * + * @return Down cast this + */ + @SuppressWarnings({"unchecked", "RedundantSuppression"}) + BldrT self() { + return (BldrT) this; + } + + /** + * Create a new instance of {@link TrfmT Transform} from the current builder state. + * + * @return a new instance of {@link TrfmT Transform} from the current builder state. + */ + public abstract TrfmT build(); + + /** + * Provide a central location for the validation before ultimately constructing a transformer. + * + *

While this method may look like pure duplication (given that each implementation of + * {@link #build()} simply delegates to this method), the build method carries with it the + * concrete class rather than the generic type information. Having the concrete class + * available to users is advantageous because it reduces the need to read the complex type + * information and instead presents them with a concrete class name. + * + *

Comparing the type of the builder at the use site of each method:
+       *
+       * <table>
+       *   <tr>
+       *     <th>{@code build()}</th>
+       *     <th>{@code genericBuild()}</th>
+       *   </tr>
+       *   <tr>
+       *     <td>{@code FirestoreV1.BatchGetDocuments.Builder}</td>
+       *     <td>{@code FirestoreV1.Transform.Builder<
+       *            In extends PInput,
+       *            Out extends POutput,
+       *            Trfm extends FirestoreV1.Transform,
+       *            Bldr extends FirestoreV1.Transform.Builder
+       *          >}</td>
+       *   </tr>
+       * </table>
    + * + * While this type information is important for our implementation, it is less important for + * the users using our implementation. + */ + final TrfmT genericBuild() { + return buildSafe( + requireNonNull(clock, "clock must be non null"), + requireNonNull(firestoreStatefulComponentFactory, "firestoreFactory must be non null"), + requireNonNull(rpcQosOptions, "rpcQosOptions must be non null")); + } + + abstract TrfmT buildSafe( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions); + + /** + * Specify the {@link RpcQosOptions} that will be used when bootstrapping the QOS of each + * running instance of the {@link TrfmT Transform} created by this builder. + * + *

    NOTE This method behaves as set, mutating the value in this builder instance. + * + * @param rpcQosOptions The QOS Options to use when bootstrapping and running the built {@link + * TrfmT Transform}. + * @return this builder + * @see RpcQosOptions + * @see RpcQosOptions#defaultOptions() + * @see RpcQosOptions#newBuilder() + */ + public final BldrT withRpcQosOptions(RpcQosOptions rpcQosOptions) { + requireNonNull(rpcQosOptions, "rpcQosOptions must be non null"); + this.rpcQosOptions = rpcQosOptions; + return self(); + } + } + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1ReadFn.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1ReadFn.java new file mode 100644 index 000000000000..bfbc6362da4f --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1ReadFn.java @@ -0,0 +1,632 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.firestore; + +import static java.util.Objects.requireNonNull; + +import com.google.api.gax.paging.AbstractPage; +import com.google.api.gax.paging.AbstractPagedListResponse; +import com.google.api.gax.rpc.ServerStream; +import com.google.api.gax.rpc.ServerStreamingCallable; +import com.google.api.gax.rpc.UnaryCallable; +import com.google.cloud.firestore.v1.FirestoreClient.ListCollectionIdsPage; +import com.google.cloud.firestore.v1.FirestoreClient.ListCollectionIdsPagedResponse; +import com.google.cloud.firestore.v1.FirestoreClient.ListDocumentsPage; +import com.google.cloud.firestore.v1.FirestoreClient.ListDocumentsPagedResponse; +import com.google.cloud.firestore.v1.FirestoreClient.PartitionQueryPage; +import com.google.cloud.firestore.v1.FirestoreClient.PartitionQueryPagedResponse; +import com.google.cloud.firestore.v1.stub.FirestoreStub; +import com.google.firestore.v1.BatchGetDocumentsRequest; +import com.google.firestore.v1.BatchGetDocumentsResponse; +import com.google.firestore.v1.Cursor; +import com.google.firestore.v1.ListCollectionIdsRequest; +import com.google.firestore.v1.ListCollectionIdsResponse; +import com.google.firestore.v1.ListDocumentsRequest; +import com.google.firestore.v1.ListDocumentsResponse; +import com.google.firestore.v1.PartitionQueryRequest; +import com.google.firestore.v1.PartitionQueryResponse; +import com.google.firestore.v1.RunQueryRequest; +import com.google.firestore.v1.RunQueryResponse; +import com.google.firestore.v1.StructuredQuery; +import com.google.firestore.v1.StructuredQuery.Direction; +import com.google.firestore.v1.StructuredQuery.FieldReference; +import com.google.firestore.v1.StructuredQuery.Order; +import com.google.firestore.v1.Value; +import com.google.protobuf.Message; +import com.google.protobuf.ProtocolStringList; +import java.io.Serializable; +import java.util.List; +import java.util.Map; +import java.util.Objects; +import org.apache.beam.sdk.extensions.gcp.options.GcpOptions; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreDoFn.ImplicitlyWindowedFirestoreDoFn; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1RpcAttemptContexts.HasRpcAttemptContext; +import org.apache.beam.sdk.io.gcp.firestore.RpcQos.RpcAttempt.Context; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; +import org.checkerframework.checker.nullness.qual.Nullable; +import org.joda.time.Instant; + +/** + * A collection of {@link org.apache.beam.sdk.transforms.DoFn DoFn}s for each of the supported read + * RPC methods from the Cloud Firestore V1 API. + */ +final class FirestoreV1ReadFn { + + /** + * {@link DoFn} for Firestore V1 {@link RunQueryRequest}s. + * + *

This Fn uses a stream to obtain responses; each response from the stream will be output to + * the next stage of the pipeline. Each response from the stream represents an individual document + * with the associated metadata. + * + *

If an error is encountered while reading from the stream, the stream will attempt to resume + * rather than starting over. Restarting of the stream remains scoped to the completion of the + * original request (meaning any possibility of resumption is contingent upon an attempt being + * available in the QoS budget). + * + *

    All request quality-of-service is managed via the instance of {@link RpcQos} associated with + * the lifecycle of this Fn. + */ + static final class RunQueryFn + extends StreamingFirestoreV1ReadFn { + + RunQueryFn( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + super(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + + @Override + public Context getRpcAttemptContext() { + return FirestoreV1RpcAttemptContexts.V1FnRpcAttemptContext.RunQuery; + } + + @Override + protected ServerStreamingCallable getCallable( + FirestoreStub firestoreStub) { + return firestoreStub.runQueryCallable(); + } + + @Override + protected RunQueryRequest setStartFrom( + RunQueryRequest element, RunQueryResponse runQueryResponse) { + StructuredQuery query = element.getStructuredQuery(); + StructuredQuery.Builder builder; + List orderByList = query.getOrderByList(); + // if the orderByList is empty that means the default sort of "__name__ ASC" will be used + // Before we can set the cursor to the last document name read, we need to explicitly add + // the order of "__name__ ASC" because a cursor value must map to an order by + if (orderByList.isEmpty()) { + builder = + query + .toBuilder() + .addOrderBy( + Order.newBuilder() + .setField(FieldReference.newBuilder().setFieldPath("__name__").build()) + .setDirection(Direction.ASCENDING) + .build()) + .setStartAt( + Cursor.newBuilder() + .setBefore(false) + .addValues( + Value.newBuilder() + .setReferenceValue(runQueryResponse.getDocument().getName()) + .build())); + } else { + Cursor.Builder cursor = Cursor.newBuilder().setBefore(false); + Map fieldsMap = runQueryResponse.getDocument().getFieldsMap(); + for (Order order : orderByList) { + String fieldPath = order.getField().getFieldPath(); + Value value = fieldsMap.get(fieldPath); + if (value != null) { + cursor.addValues(value); + } else if ("__name__".equals(fieldPath)) { + cursor.addValues( + Value.newBuilder() + .setReferenceValue(runQueryResponse.getDocument().getName()) + .build()); + } + } + builder = query.toBuilder().setStartAt(cursor.build()); + } + return element.toBuilder().setStructuredQuery(builder.build()).build(); + } + } + + /** + * {@link DoFn} for Firestore V1 {@link PartitionQueryRequest}s. + * + *

This Fn uses pagination to obtain responses; all pages will be aggregated before being + * emitted to the next stage of the pipeline. Aggregation of pages is necessary because the next + * step, pairing cursors to create N queries, must first sort all cursors. See the {@code + * pageToken} documentation for details. + * + *

    All request quality-of-service is managed via the instance of {@link RpcQos} associated with + * the lifecycle of this Fn. + */ + static final class PartitionQueryFn + extends BaseFirestoreV1ReadFn { + + public PartitionQueryFn( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + super(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + + @Override + public Context getRpcAttemptContext() { + return FirestoreV1RpcAttemptContexts.V1FnRpcAttemptContext.PartitionQuery; + } + + @Override + public void processElement(ProcessContext context) throws Exception { + @SuppressWarnings("nullness") + final PartitionQueryRequest element = + requireNonNull(context.element(), "c.element() must be non null"); + + RpcQos.RpcReadAttempt attempt = rpcQos.newReadAttempt(getRpcAttemptContext()); + PartitionQueryResponse.Builder aggregate = null; + while (true) { + if (!attempt.awaitSafeToProceed(clock.instant())) { + continue; + } + + try { + PartitionQueryRequest request = setPageToken(element, aggregate); + attempt.recordRequestStart(clock.instant()); + PartitionQueryPagedResponse pagedResponse = + firestoreStub.partitionQueryPagedCallable().call(request); + for (PartitionQueryPage page : pagedResponse.iteratePages()) { + attempt.recordRequestSuccessful(clock.instant()); + PartitionQueryResponse response = page.getResponse(); + if (aggregate == null) { + aggregate = response.toBuilder(); + } else { + aggregate.addAllPartitions(response.getPartitionsList()); + if (page.hasNextPage()) { + aggregate.setNextPageToken(response.getNextPageToken()); + } else { + aggregate.clearNextPageToken(); + } + } + if (page.hasNextPage()) { + attempt.recordRequestStart(clock.instant()); + } + } + attempt.completeSuccess(); + break; + } catch (RuntimeException exception) { + Instant end = clock.instant(); + attempt.recordRequestFailed(end); + attempt.checkCanRetry(end, exception); + } + } + if (aggregate != null) { + context.output(new PartitionQueryPair(element, aggregate.build())); + } + } + + private PartitionQueryRequest setPageToken( + PartitionQueryRequest request, PartitionQueryResponse.@Nullable Builder aggregate) { + if (aggregate != null && aggregate.getNextPageToken() != null) { + return request.toBuilder().setPageToken(aggregate.getNextPageToken()).build(); + } + return request; + } + } + + /** + * {@link DoFn} for Firestore V1 {@link ListDocumentsRequest}s. + * + *

This Fn uses pagination to obtain responses; the response from each page will be output to + * the next stage of the pipeline. + * + *

    All request quality-of-service is managed via the instance of {@link RpcQos} associated with + * the lifecycle of this Fn. + */ + static final class ListDocumentsFn + extends PaginatedFirestoreV1ReadFn< + ListDocumentsRequest, + ListDocumentsPagedResponse, + ListDocumentsPage, + ListDocumentsResponse> { + + ListDocumentsFn( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + super(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + + @Override + public Context getRpcAttemptContext() { + return FirestoreV1RpcAttemptContexts.V1FnRpcAttemptContext.ListDocuments; + } + + @Override + protected UnaryCallable getCallable( + FirestoreStub firestoreStub) { + return firestoreStub.listDocumentsPagedCallable(); + } + + @Override + protected ListDocumentsRequest setPageToken( + ListDocumentsRequest request, String nextPageToken) { + return request.toBuilder().setPageToken(nextPageToken).build(); + } + } + + /** + * {@link DoFn} for Firestore V1 {@link ListCollectionIdsRequest}s. + * + *

This Fn uses pagination to obtain responses; the response from each page will be output to + * the next stage of the pipeline. + * + *

    All request quality-of-service is managed via the instance of {@link RpcQos} associated with + * the lifecycle of this Fn. + */ + static final class ListCollectionIdsFn + extends PaginatedFirestoreV1ReadFn< + ListCollectionIdsRequest, + ListCollectionIdsPagedResponse, + ListCollectionIdsPage, + ListCollectionIdsResponse> { + + ListCollectionIdsFn( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + super(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + + @Override + public Context getRpcAttemptContext() { + return FirestoreV1RpcAttemptContexts.V1FnRpcAttemptContext.ListCollectionIds; + } + + @Override + protected UnaryCallable getCallable( + FirestoreStub firestoreStub) { + return firestoreStub.listCollectionIdsPagedCallable(); + } + + @Override + protected ListCollectionIdsRequest setPageToken( + ListCollectionIdsRequest request, String nextPageToken) { + return request.toBuilder().setPageToken(nextPageToken).build(); + } + } + + /** + * {@link DoFn} for Firestore V1 {@link BatchGetDocumentsRequest}s. + * + *

This Fn uses a stream to obtain responses; each response from the stream will be output to + * the next stage of the pipeline. Each response from the stream represents an individual document + * with the associated metadata. + * + *

    All request quality-of-service is managed via the instance of {@link RpcQos} associated with + * the lifecycle of this Fn. + */ + static final class BatchGetDocumentsFn + extends StreamingFirestoreV1ReadFn { + + BatchGetDocumentsFn( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + super(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + + @Override + public Context getRpcAttemptContext() { + return FirestoreV1RpcAttemptContexts.V1FnRpcAttemptContext.BatchGetDocuments; + } + + @Override + protected ServerStreamingCallable + getCallable(FirestoreStub firestoreStub) { + return firestoreStub.batchGetDocumentsCallable(); + } + + @Override + protected BatchGetDocumentsRequest setStartFrom( + BatchGetDocumentsRequest originalRequest, BatchGetDocumentsResponse mostRecentResponse) { + int startIndex = -1; + ProtocolStringList documentsList = originalRequest.getDocumentsList(); + String missing = mostRecentResponse.getMissing(); + String foundName = + mostRecentResponse.hasFound() ? mostRecentResponse.getFound().getName() : null; + // we only scan until the second to last originalRequest. If the final element were to be + // reached + // the full request would be complete and we wouldn't be in this scenario + int maxIndex = documentsList.size() - 2; + for (int i = 0; i <= maxIndex; i++) { + String docName = documentsList.get(i); + if (docName.equals(missing) || docName.equals(foundName)) { + startIndex = i; + break; + } + } + if (0 <= startIndex) { + BatchGetDocumentsRequest.Builder builder = originalRequest.toBuilder().clearDocuments(); + documentsList.stream() + .skip(startIndex + 1) // start from the next entry from the one we found + .forEach(builder::addDocuments); + return builder.build(); + } + throw new IllegalStateException( + String.format( + "Unable to determine BatchGet resumption point. Most recently received doc __name__ '%s'", + foundName != null ? foundName : missing)); + } + } + + /** + * {@link DoFn} Providing support for a Read type RPC operation which uses a Stream rather than + * pagination. Each response from the stream will be output to the next stage of the pipeline. + * + *

    All request quality-of-service is managed via the instance of {@link RpcQos} associated with + * the lifecycle of this Fn. + * + * @param Request type + * @param Response type + */ + private abstract static class StreamingFirestoreV1ReadFn< + InT extends Message, OutT extends Message> + extends BaseFirestoreV1ReadFn { + + protected StreamingFirestoreV1ReadFn( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + super(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + + protected abstract ServerStreamingCallable getCallable(FirestoreStub firestoreStub); + + protected abstract InT setStartFrom(InT element, OutT out); + + @Override + public final void processElement(ProcessContext c) throws Exception { + @SuppressWarnings( + "nullness") // for some reason requireNonNull thinks its parameter but be non-null... + final InT element = requireNonNull(c.element(), "c.element() must be non null"); + + RpcQos.RpcReadAttempt attempt = rpcQos.newReadAttempt(getRpcAttemptContext()); + OutT lastReceivedValue = null; + while (true) { + if (!attempt.awaitSafeToProceed(clock.instant())) { + continue; + } + + Instant start = clock.instant(); + try { + InT request = + lastReceivedValue == null ? element : setStartFrom(element, lastReceivedValue); + attempt.recordRequestStart(start); + ServerStream serverStream = getCallable(firestoreStub).call(request); + attempt.recordRequestSuccessful(clock.instant()); + for (OutT out : serverStream) { + lastReceivedValue = out; + attempt.recordStreamValue(clock.instant()); + c.output(out); + } + attempt.completeSuccess(); + break; + } catch (RuntimeException exception) { + Instant end = clock.instant(); + attempt.recordRequestFailed(end); + attempt.checkCanRetry(end, exception); + } + } + } + } + + /** + * {@link DoFn} Providing support for a Read type RPC operation which uses pagination rather than + * a Stream. + * + * @param Request type + * @param Response type + */ + @SuppressWarnings({ + // errorchecker doesn't like the second ? on PagedResponse, seemingly because of different + // recursion depth limits; 3 on the found vs 4 on the required. + // The second ? is the type of collection the paged response uses to hold all responses if + // trying to expand all pages to a single collection. We are emitting a single page at at time + // while tracking read progress so we can resume if an error has occurred and we still have + // attempt budget available. + "type.argument.type.incompatible" + }) + private abstract static class PaginatedFirestoreV1ReadFn< + RequestT extends Message, + PagedResponseT extends AbstractPagedListResponse, + PageT extends AbstractPage, + ResponseT extends Message> + extends BaseFirestoreV1ReadFn { + + protected PaginatedFirestoreV1ReadFn( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + super(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + + protected abstract UnaryCallable getCallable( + FirestoreStub firestoreStub); + + protected abstract RequestT setPageToken(RequestT request, String nextPageToken); + + @Override + public final void processElement(ProcessContext c) throws Exception { + @SuppressWarnings( + "nullness") // for some reason requireNonNull thinks its parameter but be non-null... 
+ final RequestT element = requireNonNull(c.element(), "c.element() must be non null"); + + RpcQos.RpcReadAttempt attempt = rpcQos.newReadAttempt(getRpcAttemptContext()); + String nextPageToken = null; + while (true) { + if (!attempt.awaitSafeToProceed(clock.instant())) { + continue; + } + + try { + RequestT request = nextPageToken == null ? element : setPageToken(element, nextPageToken); + attempt.recordRequestStart(clock.instant()); + PagedResponseT pagedResponse = getCallable(firestoreStub).call(request); + for (PageT page : pagedResponse.iteratePages()) { + ResponseT response = page.getResponse(); + attempt.recordRequestSuccessful(clock.instant()); + c.output(response); + if (page.hasNextPage()) { + nextPageToken = page.getNextPageToken(); + attempt.recordRequestStart(clock.instant()); + } + } + attempt.completeSuccess(); + break; + } catch (RuntimeException exception) { + Instant end = clock.instant(); + attempt.recordRequestFailed(end); + attempt.checkCanRetry(end, exception); + } + } + } + } + + /** + * Base class for all {@link org.apache.beam.sdk.transforms.DoFn DoFn}s which provide access to + * RPCs from the Cloud Firestore V1 API. + * + *

    This class takes care of common lifecycle elements and transient state management for + * subclasses allowing subclasses to provide the minimal implementation for {@link + * ImplicitlyWindowedFirestoreDoFn#processElement(DoFn.ProcessContext)}} + * + * @param The type of element coming into this {@link DoFn} + * @param The type of element output from this {@link DoFn} + */ + abstract static class BaseFirestoreV1ReadFn + extends ImplicitlyWindowedFirestoreDoFn implements HasRpcAttemptContext { + + protected final JodaClock clock; + protected final FirestoreStatefulComponentFactory firestoreStatefulComponentFactory; + protected final RpcQosOptions rpcQosOptions; + + // transient running state information, not important to any possible checkpointing + protected transient FirestoreStub firestoreStub; + protected transient RpcQos rpcQos; + protected transient String projectId; + + @SuppressWarnings( + "initialization.fields.uninitialized") // allow transient fields to be managed by component + // lifecycle + protected BaseFirestoreV1ReadFn( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + this.clock = requireNonNull(clock, "clock must be non null"); + this.firestoreStatefulComponentFactory = + requireNonNull(firestoreStatefulComponentFactory, "firestoreFactory must be non null"); + this.rpcQosOptions = requireNonNull(rpcQosOptions, "rpcQosOptions must be non null"); + } + + /** {@inheritDoc} */ + @Override + public void setup() { + rpcQos = firestoreStatefulComponentFactory.getRpcQos(rpcQosOptions); + } + + /** {@inheritDoc} */ + @Override + public final void startBundle(StartBundleContext c) { + String project = c.getPipelineOptions().as(GcpOptions.class).getProject(); + projectId = + requireNonNull(project, "project must be defined on GcpOptions of PipelineOptions"); + firestoreStub = firestoreStatefulComponentFactory.getFirestoreStub(c.getPipelineOptions()); + } + + /** {@inheritDoc} */ + @SuppressWarnings("nullness") // allow clearing transient fields + @Override + public void finishBundle() throws Exception { + projectId = null; + firestoreStub.close(); + } + + /** {@inheritDoc} */ + @Override + public final void populateDisplayData(DisplayData.Builder builder) { + builder.include("rpcQosOptions", rpcQosOptions); + } + } + + /** + * Tuple class for a PartitionQuery Request and Response Pair. + * + *

    When processing the response of a ParitionQuery it only is useful in the context of the + * original request as the cursors from the response are tied to the index resolved from the + * request. This class ties these two together so that they can be passed along the pipeline + * together. + */ + static final class PartitionQueryPair implements Serializable { + private final PartitionQueryRequest request; + private final PartitionQueryResponse response; + + @VisibleForTesting + PartitionQueryPair(PartitionQueryRequest request, PartitionQueryResponse response) { + this.request = request; + this.response = response; + } + + public PartitionQueryRequest getRequest() { + return request; + } + + public PartitionQueryResponse getResponse() { + return response; + } + + @Override + public boolean equals(@Nullable Object o) { + if (this == o) { + return true; + } + if (!(o instanceof PartitionQueryPair)) { + return false; + } + PartitionQueryPair that = (PartitionQueryPair) o; + return request.equals(that.request) && response.equals(that.response); + } + + @Override + public int hashCode() { + return Objects.hash(request, response); + } + + @Override + public String toString() { + return "PartitionQueryPair{" + "request=" + request + ", response=" + response + '}'; + } + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1RpcAttemptContexts.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1RpcAttemptContexts.java new file mode 100644 index 000000000000..0a75eeb59c96 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1RpcAttemptContexts.java @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.firestore; + +import org.apache.beam.sdk.io.gcp.firestore.RpcQos.RpcAttempt.Context; + +final class FirestoreV1RpcAttemptContexts { + + /** + * The base namespace used for {@link Context RpcAttempt.Context} values in {@link + * V1FnRpcAttemptContext}. This value directly impacts the names of log appender and metrics. + * + *

This name is part of the public API and must not be changed outside of a deprecation cycle. + */ + private static final String CONTEXT_BASE_NAMESPACE = + "org.apache.beam.sdk.io.gcp.firestore.FirestoreV1"; + + interface HasRpcAttemptContext { + Context getRpcAttemptContext(); + } + + /** + * A set of defined {@link Context RpcAttempt.Context} values used to determine metrics and + * logging namespaces. Implemented as an enum to ensure a single instance. + * + *
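+   * <p>For example, {@code V1FnRpcAttemptContext.RunQuery} resolves to the namespace {@code
+   * org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.RunQuery}.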

    These names are part of the public API and must not be changed outside of a deprecation + * cycle. + */ + enum V1FnRpcAttemptContext implements Context { + BatchGetDocuments(), + BatchWrite(), + ListCollectionIds(), + ListDocuments(), + PartitionQuery(), + RunQuery(); + + private final String namespace; + + V1FnRpcAttemptContext() { + this.namespace = CONTEXT_BASE_NAMESPACE + "." + this.name(); + } + + @Override + public final String getNamespace() { + return namespace; + } + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1WriteFn.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1WriteFn.java new file mode 100644 index 000000000000..bbf40603aa7c --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1WriteFn.java @@ -0,0 +1,631 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.firestore; + +import static java.util.Objects.requireNonNull; + +import com.google.cloud.firestore.v1.stub.FirestoreStub; +import com.google.firestore.v1.BatchWriteRequest; +import com.google.firestore.v1.BatchWriteResponse; +import com.google.firestore.v1.DatabaseRootName; +import com.google.firestore.v1.Write; +import com.google.firestore.v1.WriteResult; +import com.google.rpc.Code; +import com.google.rpc.Status; +import java.util.ArrayList; +import java.util.Comparator; +import java.util.Iterator; +import java.util.List; +import java.util.Objects; +import java.util.PriorityQueue; +import java.util.Queue; +import java.util.stream.Collectors; +import org.apache.beam.sdk.extensions.gcp.options.GcpOptions; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreDoFn.ExplicitlyWindowedFirestoreDoFn; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.FailedWritesException; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.WriteFailure; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.WriteSuccessSummary; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1RpcAttemptContexts.HasRpcAttemptContext; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1RpcAttemptContexts.V1FnRpcAttemptContext; +import org.apache.beam.sdk.io.gcp.firestore.RpcQos.RpcAttempt.Context; +import org.apache.beam.sdk.io.gcp.firestore.RpcQos.RpcWriteAttempt; +import org.apache.beam.sdk.io.gcp.firestore.RpcQos.RpcWriteAttempt.Element; +import org.apache.beam.sdk.io.gcp.firestore.RpcQos.RpcWriteAttempt.FlushBuffer; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.transforms.windowing.BoundedWindow; +import 
org.apache.beam.sdk.util.BackOffUtils; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.checkerframework.checker.nullness.qual.NonNull; +import org.checkerframework.checker.nullness.qual.Nullable; +import org.joda.time.Instant; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * A collection of {@link org.apache.beam.sdk.transforms.DoFn DoFn}s for each of the supported write + * RPC methods from the Cloud Firestore V1 API. + */ +final class FirestoreV1WriteFn { + + static final class BatchWriteFnWithSummary extends BaseBatchWriteFn { + BatchWriteFnWithSummary( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions, + CounterFactory counterFactory) { + super(clock, firestoreStatefulComponentFactory, rpcQosOptions, counterFactory); + } + + @Override + void handleWriteFailures( + ContextAdapter context, + Instant timestamp, + List> writeFailures, + Runnable logMessage) { + throw new FailedWritesException( + writeFailures.stream().map(KV::getKey).collect(Collectors.toList())); + } + + @Override + void handleWriteSummary( + ContextAdapter context, + Instant timestamp, + KV tuple, + Runnable logMessage) { + logMessage.run(); + context.output(tuple.getKey(), timestamp, tuple.getValue()); + } + } + + static final class BatchWriteFnWithDeadLetterQueue extends BaseBatchWriteFn { + BatchWriteFnWithDeadLetterQueue( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions, + CounterFactory counterFactory) { + super(clock, firestoreStatefulComponentFactory, rpcQosOptions, counterFactory); + } + + @Override + void handleWriteFailures( + ContextAdapter context, + Instant timestamp, + List> writeFailures, + Runnable logMessage) { + logMessage.run(); + for (KV kv : writeFailures) { + context.output(kv.getKey(), timestamp, kv.getValue()); + } + } + + @Override + void handleWriteSummary( + ContextAdapter context, + Instant timestamp, + KV tuple, + Runnable logMessage) { + logMessage.run(); + } + } + + /** + * {@link DoFn} for Firestore V1 {@link BatchWriteRequest}s. + * + *
    Writes will be enqueued to be sent at a potentially later time when more writes are + * available. This Fn attempts to maximize throughput while maintaining a high request success + * rate. + * + *
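The enqueue-then-flush behaviour described here can be sketched outside of Beam with plain JDK collections. This is a minimal illustration and not the Fn itself: the element type, the fixed batch size, and the println are hypothetical stand-ins for WriteElement, the QoS-sized FlushBuffer, and the actual BatchWrite RPC.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Queue;

// Minimal sketch of the enqueue-then-flush idea: elements keep their enqueue
// position so that re-offered (retried) writes still drain in a stable order.
final class BufferedWriterSketch {
  // Stand-in for WriteElement: a value plus its position in the queue.
  static final class Element {
    final int queuePosition;
    final String value;

    Element(int queuePosition, String value) {
      this.queuePosition = queuePosition;
      this.value = value;
    }
  }

  private final Queue<Element> writes =
      new PriorityQueue<>(Comparator.comparingInt((Element e) -> e.queuePosition));
  private int nextPosition = 0;
  private final int batchSize; // in the real Fn the size is decided by RpcQos, not a constant

  BufferedWriterSketch(int batchSize) {
    this.batchSize = batchSize;
  }

  void enqueue(String value) {
    writes.offer(new Element(nextPosition++, value));
    maybeFlush(false);
  }

  void finish() {
    maybeFlush(true); // when the bundle finishes, everything remaining must be flushed
  }

  private void maybeFlush(boolean finishing) {
    while (writes.size() >= batchSize || (finishing && !writes.isEmpty())) {
      List<Element> batch = new ArrayList<>();
      while (batch.size() < batchSize && !writes.isEmpty()) {
        batch.add(writes.poll());
      }
      System.out.println("flushing batch of " + batch.size());
      // a write that failed retryably would be re-offered here: writes.offer(element)
    }
    if (writes.isEmpty()) {
      nextPosition = 0; // mirrors the queueNextEntryPriority reset to avoid overflow
    }
  }
}
```

Keeping the enqueue position on each element is what lets a failed write be re-offered to the queue and still drain ahead of newer writes.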
    All request quality-of-service is managed via the instance of {@link RpcQos} associated with + * the lifecycle of this Fn. + */ + abstract static class BaseBatchWriteFn extends ExplicitlyWindowedFirestoreDoFn + implements HasRpcAttemptContext { + private static final Logger LOG = + LoggerFactory.getLogger( + FirestoreV1RpcAttemptContexts.V1FnRpcAttemptContext.BatchWrite.getNamespace()); + private final JodaClock clock; + private final FirestoreStatefulComponentFactory firestoreStatefulComponentFactory; + private final RpcQosOptions rpcQosOptions; + private final CounterFactory counterFactory; + private final V1FnRpcAttemptContext rpcAttemptContext; + + // transient running state information, not important to any possible checkpointing + // worker scoped state + private transient RpcQos rpcQos; + private transient Counter writesSuccessful; + private transient Counter writesFailedRetryable; + private transient Counter writesFailedNonRetryable; + // bundle scoped state + private transient FirestoreStub firestoreStub; + private transient DatabaseRootName databaseRootName; + + @VisibleForTesting + transient Queue<@NonNull WriteElement> writes = new PriorityQueue<>(WriteElement.COMPARATOR); + + @VisibleForTesting transient int queueNextEntryPriority = 0; + + @SuppressWarnings( + "initialization.fields.uninitialized") // allow transient fields to be managed by component + // lifecycle + BaseBatchWriteFn( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions, + CounterFactory counterFactory) { + this.clock = clock; + this.firestoreStatefulComponentFactory = firestoreStatefulComponentFactory; + this.rpcQosOptions = rpcQosOptions; + this.counterFactory = counterFactory; + this.rpcAttemptContext = V1FnRpcAttemptContext.BatchWrite; + } + + @Override + public Context getRpcAttemptContext() { + return rpcAttemptContext; + } + + @Override + public final void populateDisplayData(DisplayData.Builder builder) { + builder.include("rpcQosOptions", rpcQosOptions); + } + + @Override + public void setup() { + rpcQos = firestoreStatefulComponentFactory.getRpcQos(rpcQosOptions); + writes = new PriorityQueue<>(WriteElement.COMPARATOR); + + String namespace = rpcAttemptContext.getNamespace(); + writesSuccessful = counterFactory.get(namespace, "writes_successful"); + writesFailedRetryable = counterFactory.get(namespace, "writes_failed_retryable"); + writesFailedNonRetryable = counterFactory.get(namespace, "writes_failed_non-retryable"); + } + + @Override + public final void startBundle(StartBundleContext c) { + String project = c.getPipelineOptions().as(GcpOptions.class).getProject(); + databaseRootName = + DatabaseRootName.of( + requireNonNull(project, "project must be defined on GcpOptions of PipelineOptions"), + "(default)"); + firestoreStub = firestoreStatefulComponentFactory.getFirestoreStub(c.getPipelineOptions()); + } + + /** + * For each element extract and enqueue all writes from the commit. Then potentially flush any + * previously and currently enqueued writes. + * + *
    In order for writes to be enqueued, the value of {@link BatchWriteRequest#getDatabase()} + * must exactly match the database name this instance is configured for via the provided + * {@link org.apache.beam.sdk.options.PipelineOptions PipelineOptions}. + * + *
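As a concrete illustration of that check: the Fn builds its DatabaseRootName from the project configured in GcpOptions plus the literal "(default)" database id, so only writes addressed to that resource are accepted. The snippet below assumes toString() renders the usual projects/{project}/databases/{database} form; the project id is a made-up example.

```java
import com.google.firestore.v1.DatabaseRootName;

// Sketch only: in the Fn the project comes from GcpOptions, not a literal.
public class DatabaseNameSketch {
  public static void main(String[] args) {
    DatabaseRootName database = DatabaseRootName.of("my-project", "(default)");
    // Expected to print something like: projects/my-project/databases/(default)
    System.out.println(database);
  }
}
```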
    {@inheritDoc} + */ + @Override + public void processElement(ProcessContext context, BoundedWindow window) throws Exception { + @SuppressWarnings( + "nullness") // error checker is configured to treat any method not explicitly annotated as + // @Nullable as non-null, this includes Objects.requireNonNull + Write write = requireNonNull(context.element(), "context.element() must be non null"); + ProcessContextAdapter contextAdapter = new ProcessContextAdapter<>(context); + int serializedSize = write.getSerializedSize(); + boolean tooLarge = rpcQos.bytesOverLimit(serializedSize); + if (tooLarge) { + String message = + String.format( + "%s for document '%s' larger than configured max allowed bytes per batch", + getWriteType(write), getName(write)); + handleWriteFailures( + contextAdapter, + clock.instant(), + ImmutableList.of( + KV.of( + new WriteFailure( + write, + WriteResult.newBuilder().build(), + Status.newBuilder() + .setCode(Code.INVALID_ARGUMENT.getNumber()) + .setMessage(message) + .build()), + window)), + () -> LOG.info(message)); + } else { + writes.offer(new WriteElement(queueNextEntryPriority++, write, window)); + flushBatch(/* finishingBundle */ false, contextAdapter); + } + } + + /** + * Attempt to flush any outstanding enqueued writes before cleaning up any bundle related state. + * {@inheritDoc} + */ + @SuppressWarnings("nullness") // allow clearing transient fields + @Override + public void finishBundle(FinishBundleContext context) throws Exception { + try { + flushBatch(/* finishingBundle */ true, new FinishBundleContextAdapter<>(context)); + } finally { + databaseRootName = null; + firestoreStub.close(); + } + } + + /** + * Possibly flush enqueued writes to Firestore. + * + *
    This flush attempts to maximize throughput and success rate of RPCs. When a flush should + * happen and how many writes are included is determined and managed by the {@link RpcQos} + * instance of this class. + * + * @param finishingBundle A boolean specifying if this call is from {@link + * #finishBundle(DoFn.FinishBundleContext)}. If {@code true}, this method will not return + * until a terminal state (success, attempts exhausted) for all enqueued writes is reached. + * If {@code false} and the batch buffer is not full, this method will yield until another + * element is added. + * @throws InterruptedException If the current thread is interrupted at anytime, such as while + * waiting for the next attempt + * @see RpcQos + * @see RpcQos.RpcWriteAttempt + * @see BackOffUtils#next(org.apache.beam.sdk.util.Sleeper, org.apache.beam.sdk.util.BackOff) + */ + private void flushBatch(boolean finishingBundle, ContextAdapter context) + throws InterruptedException { + while (!writes.isEmpty()) { + RpcWriteAttempt attempt = rpcQos.newWriteAttempt(getRpcAttemptContext()); + Instant begin = clock.instant(); + if (!attempt.awaitSafeToProceed(begin)) { + continue; + } + + FlushBuffer flushBuffer = getFlushBuffer(attempt, begin); + if (flushBuffer.isFull() || (finishingBundle && flushBuffer.isNonEmpty())) { + DoFlushStatus flushStatus = doFlush(attempt, flushBuffer, context); + if (flushStatus == DoFlushStatus.ONE_OR_MORE_FAILURES && !finishingBundle) { + break; + } + } else { + // since we're not going to perform a flush, we need to return the writes that were + // preemptively removed + flushBuffer.forEach(writes::offer); + if (!finishingBundle) { + // we're not on the final flush, so yield until more elements are delivered or + // finishingBundle is true + return; + } + } + } + if (writes.isEmpty()) { + // now that the queue has been emptied reset our priority back to 0 to try and ensure + // we won't run into overflow issues if a worker runs for a long time and processes + // many writes. 
+ queueNextEntryPriority = 0; + } + } + + private FlushBuffer getFlushBuffer(RpcWriteAttempt attempt, Instant start) { + FlushBuffer buffer = attempt.newFlushBuffer(start); + + WriteElement peek; + while ((peek = writes.peek()) != null) { + if (buffer.offer(peek)) { + writes.poll(); + } else { + break; + } + } + return buffer; + } + + private BatchWriteRequest getBatchWriteRequest(FlushBuffer flushBuffer) { + BatchWriteRequest.Builder commitBuilder = + BatchWriteRequest.newBuilder().setDatabase(databaseRootName.toString()); + for (WriteElement element : flushBuffer) { + commitBuilder.addWrites(element.getValue()); + } + return commitBuilder.build(); + } + + private DoFlushStatus doFlush( + RpcWriteAttempt attempt, + FlushBuffer flushBuffer, + ContextAdapter context) + throws InterruptedException { + int writesCount = flushBuffer.getBufferedElementsCount(); + long bytes = flushBuffer.getBufferedElementsBytes(); + BatchWriteRequest request = getBatchWriteRequest(flushBuffer); + // This is our retry loop + // If an error is encountered and is retryable, continue will start the loop over again + // If an error is encountered and is not retryable, the error will be thrown and the loop + // will end + // If no error is encountered the responses WriteResults will be inspected before breaking + // the loop + while (true) { + Instant start = clock.instant(); + LOG.debug( + "Sending BatchWrite request with {} writes totalling {} bytes", writesCount, bytes); + Instant end; + BatchWriteResponse response; + try { + attempt.recordRequestStart(start, writesCount); + response = firestoreStub.batchWriteCallable().call(request); + end = clock.instant(); + attempt.recordRequestSuccessful(end); + } catch (RuntimeException exception) { + end = clock.instant(); + String exceptionMessage = exception.getMessage(); + LOG.warn( + "Sending BatchWrite request with {} writes totalling {} bytes failed due to error: {}", + writesCount, + bytes, + exceptionMessage != null ? 
exceptionMessage : exception.getClass().getName()); + attempt.recordRequestFailed(end); + attempt.recordWriteCounts(end, 0, writesCount); + flushBuffer.forEach(writes::offer); + attempt.checkCanRetry(end, exception); + continue; + } + + long elapsedMillis = end.minus(start.getMillis()).getMillis(); + + int okCount = 0; + long okBytes = 0L; + BoundedWindow okWindow = null; + List> nonRetryableWrites = new ArrayList<>(); + + List writeResultList = response.getWriteResultsList(); + List statusList = response.getStatusList(); + + Iterator iterator = flushBuffer.iterator(); + for (int i = 0; iterator.hasNext() && i < statusList.size(); i++) { + WriteElement writeElement = iterator.next(); + Status writeStatus = statusList.get(i); + Code code = Code.forNumber(writeStatus.getCode()); + + if (code == Code.OK) { + okCount++; + okBytes += writeElement.getSerializedSize(); + okWindow = writeElement.window; + } else { + if (attempt.isCodeRetryable(code)) { + writes.offer(writeElement); + } else { + nonRetryableWrites.add( + KV.of( + new WriteFailure( + writeElement.getValue(), writeResultList.get(i), writeStatus), + writeElement.window)); + } + } + } + + int nonRetryableCount = nonRetryableWrites.size(); + int retryableCount = writesCount - okCount - nonRetryableCount; + + writesSuccessful.inc(okCount); + writesFailedRetryable.inc(retryableCount); + writesFailedNonRetryable.inc(nonRetryableCount); + + attempt.recordWriteCounts(end, okCount, nonRetryableCount + retryableCount); + if (okCount == writesCount) { + handleWriteSummary( + context, + end, + KV.of(new WriteSuccessSummary(okCount, okBytes), coerceNonNull(okWindow)), + () -> + LOG.debug( + "Sending BatchWrite request with {} writes totalling {} bytes was completely applied in {}ms", + writesCount, + bytes, + elapsedMillis)); + attempt.completeSuccess(); + return DoFlushStatus.OK; + } else { + if (nonRetryableCount > 0) { + int finalOkCount = okCount; + handleWriteFailures( + context, + end, + ImmutableList.copyOf(nonRetryableWrites), + () -> + LOG.warn( + "Sending BatchWrite request with {} writes totalling {} bytes was incompletely applied in {}ms ({} ok, {} retryable, {} non-retryable)", + writesCount, + bytes, + elapsedMillis, + finalOkCount, + retryableCount, + nonRetryableCount)); + } else if (retryableCount > 0) { + int finalOkCount = okCount; + Runnable logMessage = + () -> + LOG.debug( + "Sending BatchWrite request with {} writes totalling {} bytes was incompletely applied in {}ms ({} ok, {} retryable)", + writesCount, + bytes, + elapsedMillis, + finalOkCount, + retryableCount); + if (okCount > 0) { + handleWriteSummary( + context, + end, + KV.of(new WriteSuccessSummary(okCount, okBytes), coerceNonNull(okWindow)), + logMessage); + } else { + logMessage.run(); + } + } + return DoFlushStatus.ONE_OR_MORE_FAILURES; + } + } + } + + /** + * Window values are part of the WriteElement which is used to full the flushBuffer and + * accessible after the response is returned. Our FlushBuffer is ensured to be non-empty before + * passed to this method, so the loop above will always iterate at least once and by virtue of + * this method only being when at least one ok status OK is present successfulWindow will be + * non-null. + * + *
    This method is here to prove to error prone that the value of successfulWindow we are + * passing along is in fact non-null. + */ + private static BoundedWindow coerceNonNull(@Nullable BoundedWindow successfulWindow) { + if (successfulWindow == null) { + throw new IllegalStateException("Unable to locate window for successful request"); + } + return successfulWindow; + } + + private enum DoFlushStatus { + OK, + ONE_OR_MORE_FAILURES + } + + abstract void handleWriteFailures( + ContextAdapter context, + Instant timestamp, + List> writeFailures, + Runnable logMessage); + + abstract void handleWriteSummary( + ContextAdapter context, + Instant timestamp, + KV tuple, + Runnable logMessage); + + private static String getWriteType(Write w) { + if (w.hasUpdate()) { + return "UPDATE"; + } else if (w.hasTransform()) { + return "TRANSFORM"; + } else { + return "DELETE"; + } + } + + private static String getName(Write w) { + if (w.hasUpdate()) { + return w.getUpdate().getName(); + } else if (w.hasTransform()) { + return w.getTransform().getDocument(); + } else { + return w.getDelete(); + } + } + + /** + * Adapter interface which provides a common parent for {@link ProcessContext} and {@link + * FinishBundleContext} so that we are able to use a single common invocation to output from. + */ + interface ContextAdapter { + void output(T t, Instant timestamp, BoundedWindow window); + } + + private static final class ProcessContextAdapter implements ContextAdapter { + private final DoFn.ProcessContext context; + + private ProcessContextAdapter(DoFn.ProcessContext context) { + this.context = context; + } + + @Override + public void output(T t, Instant timestamp, BoundedWindow window) { + context.outputWithTimestamp(t, timestamp); + } + } + + private static final class FinishBundleContextAdapter implements ContextAdapter { + private final DoFn.FinishBundleContext context; + + private FinishBundleContextAdapter(DoFn.FinishBundleContext context) { + this.context = context; + } + + @Override + public void output(T t, Instant timestamp, BoundedWindow window) { + context.output(t, timestamp, window); + } + } + } + + static final class WriteElement implements Element { + + private static final Comparator COMPARATOR = + Comparator.comparing(WriteElement::getQueuePosition); + private final int queuePosition; + private final Write value; + + private final BoundedWindow window; + + WriteElement(int queuePosition, Write value, BoundedWindow window) { + this.value = value; + this.queuePosition = queuePosition; + this.window = window; + } + + public int getQueuePosition() { + return queuePosition; + } + + @Override + public Write getValue() { + return value; + } + + @Override + public long getSerializedSize() { + return value.getSerializedSize(); + } + + @Override + public boolean equals(@Nullable Object o) { + if (this == o) { + return true; + } + if (!(o instanceof WriteElement)) { + return false; + } + WriteElement that = (WriteElement) o; + return queuePosition == that.queuePosition + && value.equals(that.value) + && window.equals(that.window); + } + + @Override + public int hashCode() { + return Objects.hash(queuePosition, value, window); + } + + @Override + public String toString() { + return "WriteElement{" + + "queuePosition=" + + queuePosition + + ", value=" + + value + + ", window=" + + window + + '}'; + } + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/JodaClock.java 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/JodaClock.java new file mode 100644 index 000000000000..187d8b623b99 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/JodaClock.java @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.firestore; + +import java.io.Serializable; +import org.joda.time.Instant; + +/** Simple clock interface to get an instant in a test friendly way. */ +interface JodaClock extends Serializable { + JodaClock DEFAULT = Instant::now; + + Instant instant(); +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/RpcQos.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/RpcQos.java new file mode 100644 index 000000000000..dca12db0c211 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/RpcQos.java @@ -0,0 +1,269 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.firestore; + +import com.google.rpc.Code; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.BatchWriteWithSummary; +import org.apache.beam.sdk.io.gcp.firestore.RpcQos.RpcAttempt.Context; +import org.joda.time.Instant; + +/** + * Quality of Service manager for Firestore RPCs. + * + *
    Cloud Firestore has a number of considerations for interacting with the database in a reliable + * manner. + * + *
    Every RPC which is sent to Cloud Firestore is subject to QoS considerations. Successful, + * failed, and attempted requests are all tracked and directly drive the determination of when to + * attempt an RPC. In the case of a write RPC, the QoS will also determine the size of the request + * in order to maximize throughput while maintaining a high success rate. + * + *
    The lifecycle of an instance of {@link RpcQos} is expected to be bound to the lifetime of the + * worker the RPC Functions run on. Explicitly, this instance should live longer than an individual + * bundle. + * + *
    Each request processed via one of the {@link org.apache.beam.sdk.transforms.PTransform}s + * available in {@link FirestoreV1} will work its way through a state machine provided by this + * {@link RpcQos}. The high level state machine events are as follows: + * + *
    1. Create new {@link RpcAttempt}
    2. Check if it is safe to proceed with sending the request ({@link RpcAttempt#awaitSafeToProceed(Instant)})
    3. Record start of trying to send request
    4. Record success or failure state of send request attempt
    5. If success, output all returned responses
    6. If failure, check retry ability ({@link RpcAttempt#checkCanRetry(Instant, RuntimeException)})
       1. Ensure the request has budget to retry ({@link RpcQosOptions#getMaxAttempts()})
       2. Ensure the error is not a non-retryable error
    (A simplified caller loop following these steps is sketched below.)
    + * + * Configuration of options can be accomplished by passing an instance of {@link RpcQosOptions} to + * the {@code withRpcQosOptions} method of each {@code Builder} available in {@link FirestoreV1}. + * + *
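A compressed view of that state machine, as the write-side caller drives it. This mirrors the retry loop in BaseBatchWriteFn above, but it is only a sketch: the Rpc interface is a placeholder for the real stub call (firestoreStub.batchWriteCallable().call(request)), and the interfaces it uses are package-private, so it assumes it lives in the same package.

```java
package org.apache.beam.sdk.io.gcp.firestore; // RpcQos and its attempt interfaces are package-private

import com.google.firestore.v1.BatchWriteResponse;
import org.joda.time.Instant;

// Illustrative caller loop over RpcQos / RpcWriteAttempt; not the Fn's actual code.
final class RpcAttemptLifecycleSketch {
  interface Rpc {
    BatchWriteResponse call();
  }

  static BatchWriteResponse send(
      RpcQos qos, RpcQos.RpcAttempt.Context context, Rpc rpc, int numWrites)
      throws InterruptedException {
    RpcQos.RpcWriteAttempt attempt = qos.newWriteAttempt(context); // 1. create attempt
    while (true) {
      if (!attempt.awaitSafeToProceed(Instant.now())) {            // 2. throttling / ramp-up gate
        continue;
      }
      attempt.recordRequestStart(Instant.now(), numWrites);        // 3. record start
      try {
        BatchWriteResponse response = rpc.call();
        attempt.recordRequestSuccessful(Instant.now());            // 4. record success
        attempt.completeSuccess();
        return response;                                           // 5. caller outputs responses
      } catch (RuntimeException e) {
        attempt.recordRequestFailed(Instant.now());                // 4. record failure
        attempt.checkCanRetry(Instant.now(), e);                   // 6. budget + retryability (throws when exhausted)
      }
    }
  }
}
```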
    A new instance of {@link RpcQosOptions.Builder} can be created via {@link + * RpcQosOptions#newBuilder()}. A default instance of {@link RpcQosOptions} can be created via + * {@link RpcQosOptions#defaultOptions()}. + * + *
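For example, a pipeline author might tune the QoS before handing the options to a builder. Only newBuilder() and defaultOptions() are documented above; the with* setter names and build() below are assumed to mirror the getters on RpcQosOptions, so treat them as illustrative rather than a definitive API listing.

```java
import org.apache.beam.sdk.io.gcp.firestore.RpcQosOptions;

// Illustrative only: setter names are assumptions mirroring getMaxAttempts(),
// getBatchMaxCount(), and getHintMaxNumWorkers().
public class RpcQosOptionsSketch {
  public static void main(String[] args) {
    RpcQosOptions defaults = RpcQosOptions.defaultOptions();

    RpcQosOptions tuned =
        RpcQosOptions.newBuilder()
            .withMaxAttempts(3)          // fail faster than the default retry budget
            .withBatchMaxCount(250)      // cap writes per BatchWrite request
            .withHintMaxNumWorkers(100)  // divides the 500/50/5 ramp-up budget across workers
            .build();

    System.out.println(defaults);
    System.out.println(tuned);
    // The tuned options would then be passed to a FirestoreV1 builder via withRpcQosOptions(...).
  }
}
```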
    + * + * @see FirestoreV1 + * @see FirestoreV1.BatchGetDocuments.Builder#withRpcQosOptions(RpcQosOptions) + * @see BatchWriteWithSummary.Builder#withRpcQosOptions(RpcQosOptions) + * @see FirestoreV1.ListCollectionIds.Builder#withRpcQosOptions(RpcQosOptions) + * @see FirestoreV1.ListDocuments.Builder#withRpcQosOptions(RpcQosOptions) + * @see FirestoreV1.PartitionQuery.Builder#withRpcQosOptions(RpcQosOptions) + * @see FirestoreV1.RunQuery.Builder#withRpcQosOptions(RpcQosOptions) + * @see Standard limits + * @see Designing + * for scale + */ +interface RpcQos { + + /** + * Create a new stateful attempt for a read operation. The returned {@link RpcReadAttempt} will be + * used for the full lifetime of trying to successfully process a request. + * + * @param context The {@link Context} which this new attempt should be associated with. + * @return A new {@link RpcReadAttempt} which will be used while trying to successfully a request + */ + RpcReadAttempt newReadAttempt(Context context); + + /** + * Create a new stateful attempt for a write operation. The returned {@link RpcWriteAttempt} will + * be used for the full lifetime of trying to successfully process a request. + * + * @param context The {@link Context} which this new attempt should be associated with. + * @return A new {@link RpcWriteAttempt} which will be used while trying to successfully a request + */ + RpcWriteAttempt newWriteAttempt(Context context); + + /** + * Check if a request is over the max allowed number of bytes. + * + * @param bytes number of bytes to check against the allowed limit + * @return true if {@code bytes} is over the allowed limit, false otherwise + * @see RpcQosOptions#getBatchMaxBytes() + */ + boolean bytesOverLimit(long bytes); + + /** + * Base interface representing the lifespan of attempting to successfully process a single + * request. + */ + interface RpcAttempt { + + /** + * Await until it is safe to proceed sending the rpc, evaluated relative to {@code start}. If it + * is not yet safe to proceed, this method will block until it is safe to proceed. + * + * @param start The intended start time of the next rpc + * @return true if it is safe to proceed with sending the next rpc, false otherwise. + * @throws InterruptedException if this thread is interrupted while waiting + * @see Thread#sleep(long) + * @see org.apache.beam.sdk.util.Sleeper#sleep(long) + */ + boolean awaitSafeToProceed(Instant start) throws InterruptedException; + + /** + * Determine if an rpc can be retried given {@code instant} and {@code exception}. + * + *
    If a backoff is necessary before retrying, this method may block for the backoff duration before + * returning. + * + *
    If no retry is available this {@link RpcAttempt} will move to a terminal failed state and + * will error if further interaction is attempted. + * + * @param instant The instant with which to evaluate retryability + * @param exception Exception to evaluate for retry ability + * @throws InterruptedException if this thread is interrupted while waiting + */ + void checkCanRetry(Instant instant, RuntimeException exception) throws InterruptedException; + + /** + * Mark this {@link RpcAttempt} as having completed successfully, moving to a terminal success + * state. If any further interaction is attempted an error will be thrown. + */ + void completeSuccess(); + + boolean isCodeRetryable(Code code); + + void recordRequestSuccessful(Instant end); + + void recordRequestFailed(Instant end); + + /** + * Context which an attempt should be associated with. + * + *
    Some things which are associated with an attempt: + * + *
    1. Log appender
    2. Metrics
    + */ + interface Context { + + /** + * The namespace used for log appender and metrics. + * + * @return the namespace to use + */ + String getNamespace(); + } + } + + /** + * Read specific interface for {@link RpcAttempt}. + * + *
    This interface provides those methods which apply to read operations and the state tracked + * related to read operations. + */ + interface RpcReadAttempt extends RpcAttempt { + + /** Record the start time of sending the rpc. */ + void recordRequestStart(Instant start); + + void recordStreamValue(Instant now); + } + + /** + * Write specific interface for {@link RpcAttempt}. + * + *
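A read-side counterpart to the write loop sketched earlier: the stream of results is walked while the attempt records each received value. As before this is only an illustration; the Stream interface stands in for a server-streaming call such as RunQuery or BatchGetDocuments, and a real read would typically resume from the last received document rather than rerun the whole stream.

```java
package org.apache.beam.sdk.io.gcp.firestore; // RpcQos and friends are package-private

import java.util.Iterator;
import org.joda.time.Instant;

// Illustrative read-attempt lifecycle; not the actual read Fn.
final class RpcReadAttemptLifecycleSketch {
  interface Stream<T> {
    Iterator<T> run();
  }

  static <T> void readAll(RpcQos qos, RpcQos.RpcAttempt.Context context, Stream<T> stream)
      throws InterruptedException {
    RpcQos.RpcReadAttempt attempt = qos.newReadAttempt(context);
    while (true) {
      if (!attempt.awaitSafeToProceed(Instant.now())) {
        continue;
      }
      attempt.recordRequestStart(Instant.now());
      try {
        Iterator<T> results = stream.run();
        while (results.hasNext()) {
          results.next();
          attempt.recordStreamValue(Instant.now()); // one tick per streamed value
        }
        attempt.recordRequestSuccessful(Instant.now());
        attempt.completeSuccess();
        return;
      } catch (RuntimeException e) {
        attempt.recordRequestFailed(Instant.now());
        attempt.checkCanRetry(Instant.now(), e); // throws when retries are exhausted
      }
    }
  }
}
```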
    This interface provides those methods which apply to write operations and the state tracked + * related to write operations. + */ + interface RpcWriteAttempt extends RpcAttempt { + + /** + * Create a new {@link FlushBuffer} that can be sized relative to the QoS state and to the + * provided {@code instant}. + * + * @param instant The intended start time of the next rpc + * @param The type which will be sent in the request + * @param The {@link Element} type which the returned buffer will contain + * @return a new {@link FlushBuffer} which queued messages can be staged to before final flush + */ + > FlushBuffer newFlushBuffer(Instant instant); + + /** Record the start time of sending the rpc. */ + void recordRequestStart(Instant start, int numWrites); + + void recordWriteCounts(Instant end, int successfulWrites, int failedWrites); + + /** + * A buffer which is sized related to QoS state and provides a staging location for elements + * before a request is actually created and sent. + * + * @param The {@link Element} type which will be stored in this instance + */ + interface FlushBuffer> extends Iterable { + /** + * Attempt to add {@code newElement} to this {@link FlushBuffer}. + * + * @param newElement The {@link Element} to try and add + * @return true if the flush group has capacity for newElement, false otherwise + */ + boolean offer(ElementT newElement); + + /** @return the number of elements that are currently buffered in this instance */ + int getBufferedElementsCount(); + + /** @return the number of bytes that are currently buffered in this instance */ + long getBufferedElementsBytes(); + + /** + * @return true if the buffer contains enough {@link Element}s such that a flush should + * happen, false otherwise + */ + boolean isFull(); + + boolean isNonEmpty(); + } + + /** + * An element which can be added to a {@link FlushBuffer}. This interface is mainly a marker for + * ensuring an encapsulated lifecycle managed way of determining the serialized size of {@code + * T} which will need to be sent in a request. + * + *
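To make the FlushBuffer and Element contracts concrete, here is a minimal in-memory pair that mirrors (but does not implement) the interfaces above, assuming fixed count and byte caps; the real buffer in RpcQosImpl derives its capacity from the QoS state, and the real element wraps a Write proto.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Standalone mirror of the FlushBuffer/Element contract: offer() respects both
// a count cap and a byte cap, and the element memoizes its serialized size.
final class FlushBufferSketch implements Iterable<FlushBufferSketch.SizedElement> {

  // Mirror of Element: a value plus a lazily memoized size.
  static final class SizedElement {
    private final byte[] value;
    private long memoizedSize = -1;

    SizedElement(byte[] value) {
      this.value = value;
    }

    long getSerializedSize() {
      if (memoizedSize < 0) {
        memoizedSize = value.length; // a real Write would compute its proto size once here
      }
      return memoizedSize;
    }
  }

  private final int maxCount;
  private final long maxBytes;
  private final List<SizedElement> elements = new ArrayList<>();
  private long bufferedBytes = 0;

  FlushBufferSketch(int maxCount, long maxBytes) {
    this.maxCount = maxCount;
    this.maxBytes = maxBytes;
  }

  boolean offer(SizedElement e) {
    if (elements.size() >= maxCount || bufferedBytes + e.getSerializedSize() > maxBytes) {
      return false; // caller keeps the element queued for a later flush
    }
    elements.add(e);
    bufferedBytes += e.getSerializedSize();
    return true;
  }

  int getBufferedElementsCount() {
    return elements.size();
  }

  long getBufferedElementsBytes() {
    return bufferedBytes;
  }

  boolean isFull() {
    return elements.size() >= maxCount || bufferedBytes >= maxBytes;
  }

  boolean isNonEmpty() {
    return !elements.isEmpty();
  }

  @Override
  public Iterator<SizedElement> iterator() {
    return elements.iterator();
  }
}
```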
    Due to the nature of how protobufs handle serialization, we want to ensure we only + * calculate the serialization size once so we avoid unnecessary memory pressure. The serialized + * size is read many times through out the lifecycle of determining to send a T. + * + * @param The type which will sent in the request + */ + interface Element { + + /** @return the value that will be sent in a request */ + T getValue(); + + /** + * This method should memoize the calculated size to avoid unnecessary memory pressure due to + * this method being called many times over the course of its lifecycle. + * + * @return the number of bytes T is when serialized + */ + long getSerializedSize(); + } + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/RpcQosImpl.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/RpcQosImpl.java new file mode 100644 index 000000000000..f91f18cd4bd4 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/RpcQosImpl.java @@ -0,0 +1,1072 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.firestore; + +import com.google.api.gax.grpc.GrpcStatusCode; +import com.google.api.gax.rpc.ApiException; +import com.google.api.gax.rpc.StatusCode; +import com.google.rpc.Code; +import java.util.Collections; +import java.util.Iterator; +import java.util.Objects; +import java.util.Optional; +import java.util.Random; +import java.util.Set; +import java.util.WeakHashMap; +import java.util.function.Function; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1RpcAttemptContexts.V1FnRpcAttemptContext; +import org.apache.beam.sdk.io.gcp.firestore.RpcQos.RpcAttempt.Context; +import org.apache.beam.sdk.io.gcp.firestore.RpcQos.RpcWriteAttempt.Element; +import org.apache.beam.sdk.io.gcp.firestore.RpcQos.RpcWriteAttempt.FlushBuffer; +import org.apache.beam.sdk.io.gcp.firestore.RpcQosImpl.StatusCodeAwareBackoff.BackoffDuration; +import org.apache.beam.sdk.io.gcp.firestore.RpcQosImpl.StatusCodeAwareBackoff.BackoffResult; +import org.apache.beam.sdk.io.gcp.firestore.RpcQosImpl.StatusCodeAwareBackoff.BackoffResults; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.Distribution; +import org.apache.beam.sdk.metrics.MetricName; +import org.apache.beam.sdk.transforms.Sum; +import org.apache.beam.sdk.util.BackOff; +import org.apache.beam.sdk.util.MovingFunction; +import org.apache.beam.sdk.util.Sleeper; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.primitives.Ints; +import org.checkerframework.checker.nullness.qual.Nullable; +import org.joda.time.Duration; +import org.joda.time.Instant; +import org.joda.time.Interval; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +final class RpcQosImpl implements RpcQos { + + /** Non-retryable errors. See https://cloud.google.com/apis/design/errors#handling_errors. */ + private static final Set NON_RETRYABLE_ERROR_NUMBERS = + ImmutableSet.of( + Code.ALREADY_EXISTS, + Code.DATA_LOSS, + Code.FAILED_PRECONDITION, + Code.INVALID_ARGUMENT, + Code.OUT_OF_RANGE, + Code.NOT_FOUND, + Code.PERMISSION_DENIED, + Code.UNIMPLEMENTED) + .stream() + .map(Code::getNumber) + .collect(ImmutableSet.toImmutableSet()); + /** + * The target minimum number of requests per samplePeriodMs, even if no requests succeed. Must be + * greater than 0, else we could throttle to zero. Because every decision is probabilistic, there + * is no guarantee that the request rate in any given interval will not be zero. 
(This is the +1 + * from the formula in https://landing.google.com/sre/book/chapters/handling-overload.html) + */ + private static final double MIN_REQUESTS = 1; + + private final RpcQosOptions options; + + private final AdaptiveThrottler at; + private final WriteBatcher wb; + private final WriteRampUp writeRampUp; + + private final WeakHashMap counters; + private final Random random; + private final Sleeper sleeper; + private final Function computeCounters; + private final DistributionFactory distributionFactory; + + RpcQosImpl( + RpcQosOptions options, + Random random, + Sleeper sleeper, + CounterFactory counterFactory, + DistributionFactory distributionFactory) { + this.options = options; + this.random = random; + this.sleeper = sleeper; + DistributionFactory filteringDistributionFactory = + new DiagnosticOnlyFilteringDistributionFactory( + !options.isShouldReportDiagnosticMetrics(), distributionFactory); + this.distributionFactory = filteringDistributionFactory; + at = + new AdaptiveThrottler( + options.getSamplePeriod(), + options.getSamplePeriodBucketSize(), + options.getThrottleDuration(), + options.getOverloadRatio()); + wb = + new WriteBatcher( + options.getSamplePeriod(), + options.getSamplePeriodBucketSize(), + options.getBatchInitialCount(), + options.getBatchTargetLatency(), + filteringDistributionFactory); + writeRampUp = + new WriteRampUp(500.0 / options.getHintMaxNumWorkers(), filteringDistributionFactory); + counters = new WeakHashMap<>(); + computeCounters = (Context c) -> O11y.create(c, counterFactory, filteringDistributionFactory); + } + + @Override + public RpcWriteAttemptImpl newWriteAttempt(Context context) { + return new RpcWriteAttemptImpl( + context, + counters.computeIfAbsent(context, computeCounters), + new StatusCodeAwareBackoff( + random, + options.getMaxAttempts(), + options.getThrottleDuration(), + Collections.emptySet()), + sleeper); + } + + @Override + public RpcReadAttemptImpl newReadAttempt(Context context) { + Set graceStatusCodeNumbers = Collections.emptySet(); + // When reading results from a RunQuery or BatchGet the stream returning the results has a + // maximum lifetime of 60 seconds at which point it will be broken with an UNAVAILABLE + // status code. Since this is expected for semi-large query result set sizes we specify + // it as a grace value for backoff evaluation. 
+ if (V1FnRpcAttemptContext.RunQuery.equals(context) + || V1FnRpcAttemptContext.BatchGetDocuments.equals(context)) { + graceStatusCodeNumbers = ImmutableSet.of(Code.UNAVAILABLE_VALUE); + } + return new RpcReadAttemptImpl( + context, + counters.computeIfAbsent(context, computeCounters), + new StatusCodeAwareBackoff( + random, + options.getMaxAttempts(), + options.getThrottleDuration(), + graceStatusCodeNumbers), + sleeper); + } + + @Override + public boolean bytesOverLimit(long bytes) { + return bytes > options.getBatchMaxBytes(); + } + + private static MovingFunction createMovingFunction(Duration samplePeriod, Duration sampleUpdate) { + return new MovingFunction( + samplePeriod.getMillis(), + sampleUpdate.getMillis(), + 1 /* numSignificantBuckets */, + 1 /* numSignificantSamples */, + Sum.ofLongs()); + } + + private enum AttemptState { + PENDING, + STARTED, + COMPLETE_SUCCESS, + COMPLETE_ERROR; + + public void checkActive() { + switch (this) { + case PENDING: + case STARTED: + return; + case COMPLETE_SUCCESS: + throw new IllegalStateException( + "Expected state to be PENDING or STARTED, but was COMPLETE_SUCCESS"); + case COMPLETE_ERROR: + throw new IllegalStateException( + "Expected state to be PENDING or STARTED, but was COMPLETE_ERROR"); + } + } + + public void checkStarted() { + switch (this) { + case STARTED: + return; + case PENDING: + throw new IllegalStateException("Expected state to be STARTED, but was PENDING"); + case COMPLETE_SUCCESS: + throw new IllegalStateException("Expected state to be STARTED, but was COMPLETE_SUCCESS"); + case COMPLETE_ERROR: + throw new IllegalStateException("Expected state to be STARTED, but was COMPLETE_ERROR"); + } + } + } + + private abstract class BaseRpcAttempt implements RpcAttempt { + private final Logger logger; + final O11y o11y; + final StatusCodeAwareBackoff backoff; + final Sleeper sleeper; + + AttemptState state; + Instant start; + + @SuppressWarnings( + "initialization.fields.uninitialized") // allow transient fields to be managed by component + // lifecycle + BaseRpcAttempt(Context context, O11y o11y, StatusCodeAwareBackoff backoff, Sleeper sleeper) { + this.logger = LoggerFactory.getLogger(String.format("%s.RpcQos", context.getNamespace())); + this.o11y = o11y; + this.backoff = backoff; + this.sleeper = sleeper; + this.state = AttemptState.PENDING; + } + + @Override + public boolean awaitSafeToProceed(Instant instant) throws InterruptedException { + state.checkActive(); + Duration shouldThrottleRequest = at.shouldThrottleRequest(instant); + if (shouldThrottleRequest.compareTo(Duration.ZERO) > 0) { + long throttleRequestMillis = shouldThrottleRequest.getMillis(); + logger.debug("Delaying request by {}ms", throttleRequestMillis); + throttleRequest(shouldThrottleRequest); + return false; + } + + return true; + } + + @Override + public void checkCanRetry(Instant instant, RuntimeException exception) + throws InterruptedException { + state.checkActive(); + + Optional findApiException = findApiException(exception); + + if (findApiException.isPresent()) { + ApiException apiException = findApiException.get(); + // order here is semi-important + // First we always want to test if the error code is one of the codes we have deemed + // non-retryable before delegating to the exceptions default set. 
+ Optional statusCodeNumber = getStatusCodeNumber(apiException); + if (maxAttemptsExhausted(instant, statusCodeNumber.orElse(Code.UNKNOWN_VALUE)) + || statusCodeNumber.map(NON_RETRYABLE_ERROR_NUMBERS::contains).orElse(false) + || !apiException.isRetryable()) { + state = AttemptState.COMPLETE_ERROR; + throw apiException; + } + } else { + state = AttemptState.COMPLETE_ERROR; + throw exception; + } + } + + @Override + public void completeSuccess() { + state.checkActive(); + state = AttemptState.COMPLETE_SUCCESS; + } + + @Override + public boolean isCodeRetryable(Code code) { + return !NON_RETRYABLE_ERROR_NUMBERS.contains(code.getNumber()); + } + + @Override + public void recordRequestSuccessful(Instant end) { + state.checkStarted(); + o11y.rpcSuccesses.inc(); + o11y.rpcDurationMs.update(durationMs(end)); + at.recordRequestSuccessful(start); + } + + @Override + public void recordRequestFailed(Instant end) { + state.checkStarted(); + o11y.rpcFailures.inc(); + o11y.rpcDurationMs.update(durationMs(end)); + at.recordRequestFailed(start); + } + + private boolean maxAttemptsExhausted(Instant now, int statusCodeNumber) + throws InterruptedException { + BackoffResult backoffResult = backoff.nextBackoff(now, statusCodeNumber); + if (BackoffResults.EXHAUSTED.equals(backoffResult)) { + logger.error("Max attempts exhausted after {} attempts.", options.getMaxAttempts()); + return true; + } else if (backoffResult instanceof BackoffDuration) { + BackoffDuration result = (BackoffDuration) backoffResult; + sleeper.sleep(result.getDuration().getMillis()); + return false; + } else { + return false; + } + } + + Logger getLogger() { + return logger; + } + + final void throttleRequest(Duration shouldThrottleRequest) throws InterruptedException { + o11y.throttlingMs.inc(shouldThrottleRequest.getMillis()); + sleeper.sleep(shouldThrottleRequest.getMillis()); + } + + final long durationMs(Instant end) { + return end.minus(start.getMillis()).getMillis(); + } + + private Optional getStatusCodeNumber(ApiException apiException) { + StatusCode statusCode = apiException.getStatusCode(); + if (statusCode instanceof GrpcStatusCode) { + GrpcStatusCode grpcStatusCode = (GrpcStatusCode) statusCode; + return Optional.of(grpcStatusCode.getTransportCode().value()); + } + return Optional.empty(); + } + + private Optional findApiException(Throwable throwable) { + if (throwable instanceof ApiException) { + ApiException apiException = (ApiException) throwable; + return Optional.of(apiException); + } else { + Throwable cause = throwable.getCause(); + if (cause != null) { + return findApiException(cause); + } else { + return Optional.empty(); + } + } + } + } + + private final class RpcReadAttemptImpl extends BaseRpcAttempt implements RpcReadAttempt { + private RpcReadAttemptImpl( + Context context, O11y o11y, StatusCodeAwareBackoff backoff, Sleeper sleeper) { + super(context, o11y, backoff, sleeper); + } + + @Override + public void recordRequestStart(Instant start) { + at.recordRequestStart(start); + this.start = start; + state = AttemptState.STARTED; + } + + @Override + public void recordStreamValue(Instant now) { + state.checkActive(); + o11y.rpcStreamValueReceived.inc(); + } + } + + final class RpcWriteAttemptImpl extends BaseRpcAttempt implements RpcWriteAttempt { + + private RpcWriteAttemptImpl( + Context context, O11y o11y, StatusCodeAwareBackoff backoff, Sleeper sleeper) { + super(context, o11y, backoff, sleeper); + } + + @Override + public boolean awaitSafeToProceed(Instant instant) throws InterruptedException { + 
state.checkActive(); + Optional shouldThrottle = writeRampUp.shouldThrottle(instant); + if (shouldThrottle.isPresent()) { + Duration throttleDuration = shouldThrottle.get(); + long throttleDurationMillis = throttleDuration.getMillis(); + getLogger().debug("Still ramping up, Delaying request by {}ms", throttleDurationMillis); + throttleRequest(throttleDuration); + return false; + } else { + return super.awaitSafeToProceed(instant); + } + } + + @Override + public > FlushBufferImpl newFlushBuffer( + Instant instantSinceEpoch) { + state.checkActive(); + int availableWriteCountBudget = writeRampUp.getAvailableWriteCountBudget(instantSinceEpoch); + int nextBatchMaxCount = wb.nextBatchMaxCount(instantSinceEpoch); + int batchMaxCount = + Ints.min( + Math.max(0, availableWriteCountBudget), + Math.max(0, nextBatchMaxCount), + options.getBatchMaxCount()); + o11y.batchCapacityCount.update(batchMaxCount); + return new FlushBufferImpl<>(batchMaxCount, options.getBatchMaxBytes()); + } + + @Override + public void recordRequestStart(Instant start, int numWrites) { + at.recordRequestStart(start, numWrites); + writeRampUp.recordWriteCount(start, numWrites); + this.start = start; + state = AttemptState.STARTED; + } + + @Override + public void recordWriteCounts(Instant end, int successfulWrites, int failedWrites) { + int totalWrites = successfulWrites + failedWrites; + state.checkStarted(); + wb.recordRequestLatency(start, end, totalWrites, o11y.latencyPerDocumentMs); + if (successfulWrites > 0) { + at.recordRequestSuccessful(start, successfulWrites); + } + if (failedWrites > 0) { + at.recordRequestFailed(start, failedWrites); + } + } + } + + /** + * Determines batch sizes based on past performance. + * + *
    It aims for a target response time per RPC: it uses the response times for previous RPCs and + * the number of documents contained in them, calculates a rolling average time-per-document, and + * chooses the number of documents for future writes to hit the target time. + * + *
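The sizing rule in that paragraph reduces to a small calculation; the sketch below uses made-up numbers (a fixed mean latency) in place of the MovingAverage the batcher actually maintains.

```java
// Sketch of the WriteBatcher sizing rule:
// next batch size = target RPC latency / observed latency per document.
public class BatchSizeSketch {
  public static void main(String[] args) {
    long batchTargetLatencyMs = 5_000;  // e.g. aim for roughly 5s per BatchWrite RPC
    long meanLatencyPerDocumentMs = 20; // rolling average from prior responses (made-up here)

    long nextBatchMaxCount = batchTargetLatencyMs / Math.max(meanLatencyPerDocumentMs, 1);
    System.out.println("next batch max count = " + nextBatchMaxCount); // 250 with these numbers

    // The attempt then caps this further by the ramp-up budget and RpcQosOptions#getBatchMaxCount().
  }
}
```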
    This enables us to send large batches without sending overly-large requests in the case of + * expensive document writes that may timeout before the server can apply them all. + */ + private static final class WriteBatcher { + private static final Logger LOG = LoggerFactory.getLogger(WriteBatcher.class); + + private final int batchInitialCount; + private final Duration batchTargetLatency; + private final MovingAverage meanLatencyPerDocumentMs; + private final Distribution batchMaxCount; + + private WriteBatcher( + Duration samplePeriod, + Duration samplePeriodBucketSize, + int batchInitialCount, + Duration batchTargetLatency, + DistributionFactory distributionFactory) { + this.batchInitialCount = batchInitialCount; + this.batchTargetLatency = batchTargetLatency; + this.meanLatencyPerDocumentMs = new MovingAverage(samplePeriod, samplePeriodBucketSize); + this.batchMaxCount = + distributionFactory.get(RpcQos.class.getName(), "qos_writeBatcher_batchMaxCount"); + } + + private void recordRequestLatency( + Instant start, Instant end, int numWrites, Distribution distribution) { + try { + Interval interval = new Interval(start, end); + long msPerWrite = numWrites == 0 ? 0 : interval.toDurationMillis() / numWrites; + distribution.update(msPerWrite); + meanLatencyPerDocumentMs.add(end, msPerWrite); + } catch (IllegalArgumentException e) { + LOG.warn("Invalid time interval start = {} end = {}", start, end, e); + } + } + + private int nextBatchMaxCount(Instant instantSinceEpoch) { + if (!meanLatencyPerDocumentMs.hasValue(instantSinceEpoch)) { + return batchInitialCount; + } + long recentMeanLatency = Math.max(meanLatencyPerDocumentMs.get(instantSinceEpoch), 1); + long nextBatchMaxCount = batchTargetLatency.getMillis() / recentMeanLatency; + int count = Math.toIntExact(nextBatchMaxCount); + batchMaxCount.update(count); + return count; + } + } + + /** + * An implementation of client-side adaptive throttling. See + * https://sre.google/sre-book/handling-overload/#client-side-throttling-a7sYUg for a full + * discussion of the use case and algorithm applied. 
+ */ + private final class AdaptiveThrottler { + private final MovingFunction successfulRequestsMovingFunction; + private final MovingFunction failedRequestsMovingFunction; + private final MovingFunction allRequestsMovingFunction; + private final Distribution allRequestsCountDist; + private final Distribution successfulRequestsCountDist; + private final Distribution overloadMaxCountDist; + private final Distribution overloadUsageDist; + private final Distribution throttleProbabilityDist; + private final Distribution throttlingMs; + private final LinearBackoff backoff; + private final double overloadRatio; + + private AdaptiveThrottler( + Duration samplePeriod, + Duration samplePeriodBucketSize, + Duration throttleDuration, + double overloadRatio) { + allRequestsMovingFunction = createMovingFunction(samplePeriod, samplePeriodBucketSize); + successfulRequestsMovingFunction = createMovingFunction(samplePeriod, samplePeriodBucketSize); + failedRequestsMovingFunction = createMovingFunction(samplePeriod, samplePeriodBucketSize); + allRequestsCountDist = + distributionFactory.get(RpcQos.class.getName(), "qos_adaptiveThrottler_allRequestsCount"); + successfulRequestsCountDist = + distributionFactory.get( + RpcQos.class.getName(), "qos_adaptiveThrottler_successfulRequestsCount"); + overloadMaxCountDist = + distributionFactory.get(RpcQos.class.getName(), "qos_adaptiveThrottler_overloadMaxCount"); + overloadUsageDist = + distributionFactory.get(RpcQos.class.getName(), "qos_adaptiveThrottler_overloadUsagePct"); + throttleProbabilityDist = + distributionFactory.get( + RpcQos.class.getName(), "qos_adaptiveThrottler_throttleProbabilityPct"); + throttlingMs = + distributionFactory.get(RpcQos.class.getName(), "qos_adaptiveThrottler_throttlingMs"); + backoff = new LinearBackoff(throttleDuration); + this.overloadRatio = overloadRatio; + } + + private Duration shouldThrottleRequest(Instant instantSinceEpoch) { + double delayProbability = throttlingProbability(instantSinceEpoch); + + if (random.nextDouble() < delayProbability) { + long millis = backoff.nextBackOffMillis(); + throttlingMs.update(millis); + return Duration.millis(millis); + } else { + backoff.reset(); + return Duration.ZERO; + } + } + + private void recordRequestStart(Instant instantSinceEpoch) { + recordRequestStart(instantSinceEpoch, 1); + } + + private void recordRequestStart(Instant instantSinceEpoch, int value) { + allRequestsMovingFunction.add(instantSinceEpoch.getMillis(), value); + } + + private void recordRequestSuccessful(Instant instantSinceEpoch) { + recordRequestSuccessful(instantSinceEpoch, 1); + } + + private void recordRequestSuccessful(Instant instantSinceEpoch, int value) { + successfulRequestsMovingFunction.add(instantSinceEpoch.getMillis(), value); + } + + private void recordRequestFailed(Instant instantSinceEpoch) { + recordRequestFailed(instantSinceEpoch, 1); + } + + private void recordRequestFailed(Instant instantSinceEpoch, int value) { + failedRequestsMovingFunction.add(instantSinceEpoch.getMillis(), value); + } + + /** + * Implementation of the formula from Handling Overload from SRE + * Book. 
+ */ + private double throttlingProbability(Instant instantSinceEpoch) { + if (!allRequestsMovingFunction.isSignificant()) { + return 0; + } + long nowMsSinceEpoch = instantSinceEpoch.getMillis(); + long allRequestsCount = allRequestsMovingFunction.get(nowMsSinceEpoch); + long successfulRequestsCount = successfulRequestsMovingFunction.get(nowMsSinceEpoch); + + double overloadMaxCount = overloadRatio * successfulRequestsCount; + double overloadUsage = allRequestsCount - overloadMaxCount; + + double calcProbability = overloadUsage / (allRequestsCount + MIN_REQUESTS); + allRequestsCountDist.update(allRequestsCount); + successfulRequestsCountDist.update(successfulRequestsCount); + overloadMaxCountDist.update((long) overloadMaxCount); + overloadUsageDist.update((long) (overloadUsage * 100)); + throttleProbabilityDist.update((long) (calcProbability * 100)); + return Math.max(0, calcProbability); + } + } + + /** + * An implementation providing the 500/50/5 ramp up strategy recommended by Ramping up + * traffic. + */ + @VisibleForTesting + static final class WriteRampUp { + + private static final Duration RAMP_UP_INTERVAL = Duration.standardMinutes(5); + private final double baseBatchBudget; + private final long rampUpIntervalMinutes; + private final MovingFunction writeCounts; + private final LinearBackoff backoff; + private final Distribution throttlingMs; + private final Distribution availableWriteCountBudget; + + @SuppressWarnings("OptionalUsedAsFieldOrParameterType") + private Optional firstInstant = Optional.empty(); + + WriteRampUp(double baseBatchBudget, DistributionFactory distributionFactory) { + this.baseBatchBudget = baseBatchBudget; + this.rampUpIntervalMinutes = RAMP_UP_INTERVAL.getStandardMinutes(); + this.writeCounts = + createMovingFunction( + // track up to one second of budget usage. + // this determines the full duration of time we want to keep track of request counts + Duration.standardSeconds(1), + // refill the budget each second + // this determines the sub-granularity the full duration will be broken into. So if + // we wanted budget to refill twice per second, this could be passed + // Duration.millis(500) + Duration.standardSeconds(1)); + this.backoff = new LinearBackoff(Duration.standardSeconds(1)); + this.throttlingMs = + distributionFactory.get(RpcQos.class.getName(), "qos_rampUp_throttlingMs"); + this.availableWriteCountBudget = + distributionFactory.get(RpcQos.class.getName(), "qos_rampUp_availableWriteCountBudget"); + } + + int getAvailableWriteCountBudget(Instant instant) { + if (!firstInstant.isPresent()) { + firstInstant = Optional.of(instant); + return (int) Math.max(1, baseBatchBudget); + } + + Instant first = firstInstant.get(); + double maxRequestBudget = calcMaxRequestBudget(instant, first); + long writeCount = writeCounts.get(instant.getMillis()); + double availableBudget = maxRequestBudget - writeCount; + int budget = Ints.saturatedCast((long) availableBudget); + availableWriteCountBudget.update(budget); + return budget; + } + + /** + * Calculate the value relative to the growth line relative to the {@code first}. + * + *
    For a {@code baseBatchBudget} of 500 the line would look like this + */ + private double calcMaxRequestBudget(Instant instant, Instant first) { + Duration durationSinceFirst = new Duration(first, instant); + long calculatedGrowth = + (durationSinceFirst.getStandardMinutes() - rampUpIntervalMinutes) / rampUpIntervalMinutes; + long growth = Math.max(0, calculatedGrowth); + return baseBatchBudget * Math.pow(1.5, growth); + } + + void recordWriteCount(Instant instant, int numWrites) { + writeCounts.add(instant.getMillis(), numWrites); + } + + Optional shouldThrottle(Instant instant) { + int availableWriteCountBudget = getAvailableWriteCountBudget(instant); + if (availableWriteCountBudget <= 0) { + long nextBackOffMillis = backoff.nextBackOffMillis(); + if (nextBackOffMillis > BackOff.STOP) { + Duration throttleDuration = Duration.millis(nextBackOffMillis); + throttlingMs.update(throttleDuration.getMillis()); + return Optional.of(throttleDuration); + } else { + // we've exhausted our backoff, and have moved into the next time window try again + backoff.reset(); + return Optional.empty(); + } + } else { + // budget is available reset backoff tracking + backoff.reset(); + return Optional.empty(); + } + } + } + + /** + * For ramp up we're following a simplistic linear growth formula, when calculating backoff we're + * calculating the next time to check a client side budget and we don't need the randomness + * introduced by FluentBackoff. (Being linear also makes our test simulations easier to model and + * verify) + */ + private static class LinearBackoff implements BackOff { + private static final long MAX_BACKOFF_MILLIS = 60_000; + private static final long MAX_CUMULATIVE_MILLIS = 60_000; + private final long startBackoffMillis; + private long currentBackoffMillis; + private long cumulativeMillis; + + public LinearBackoff(Duration throttleDuration) { + startBackoffMillis = throttleDuration.getMillis(); + currentBackoffMillis = startBackoffMillis; + cumulativeMillis = 0; + } + + @Override + public void reset() { + currentBackoffMillis = startBackoffMillis; + cumulativeMillis = 0; + } + + @Override + public long nextBackOffMillis() { + if (currentBackoffMillis > MAX_BACKOFF_MILLIS) { + reset(); + return MAX_BACKOFF_MILLIS; + } else { + long remainingBudget = Math.max(MAX_CUMULATIVE_MILLIS - cumulativeMillis, 0); + if (remainingBudget == 0) { + reset(); + return STOP; + } + long retVal = Math.min(currentBackoffMillis, remainingBudget); + currentBackoffMillis = (long) (currentBackoffMillis * 1.5); + cumulativeMillis += retVal; + return retVal; + } + } + } + + /** + * This class implements a backoff algorithm similar to that of {@link + * org.apache.beam.sdk.util.FluentBackoff} with some key differences: + * + *
    1. A set of status code numbers may be specified to receive a graceful evaluation
    2. Gracefully evaluated status code numbers increment a decaying counter, so that if the graceful status codes occur more than once in the previous 60 seconds the regular backoff behavior kicks in
    3. The random number generator used to induce jitter is provided via a constructor parameter rather than using {@link Math#random()}
    + * + * The primary motivation for creating this implementation is to support streamed responses from + * Firestore. In the case of RunQuery and BatchGet the results are returned via stream. The result + * stream has a maximum lifetime of 60 seconds before it will be broken and an UNAVAILABLE status + * code will be raised. Give that UNAVAILABLE is expected for streams, this class allows for + * defining a set of status code numbers which are given a grace count of 1 before backoff kicks + * in. When backoff does kick in, it is implemented using the same calculations as {@link + * org.apache.beam.sdk.util.FluentBackoff}. + */ + static final class StatusCodeAwareBackoff { + private static final double RANDOMIZATION_FACTOR = 0.5; + private static final Duration MAX_BACKOFF = Duration.standardMinutes(1); + private static final Duration MAX_CUMULATIVE_BACKOFF = Duration.standardMinutes(1); + + private final Random rand; + private final int maxAttempts; + private final Duration initialBackoff; + private final Set graceStatusCodeNumbers; + private final MovingFunction graceStatusCodeTracker; + + private Duration cumulativeBackoff; + private int attempt; + + StatusCodeAwareBackoff( + Random rand, + int maxAttempts, + Duration throttleDuration, + Set graceStatusCodeNumbers) { + this.rand = rand; + this.graceStatusCodeNumbers = graceStatusCodeNumbers; + this.maxAttempts = maxAttempts; + this.initialBackoff = throttleDuration; + this.graceStatusCodeTracker = createGraceStatusCodeTracker(); + this.cumulativeBackoff = Duration.ZERO; + this.attempt = 1; + } + + BackoffResult nextBackoff(Instant now, int statusCodeNumber) { + if (graceStatusCodeNumbers.contains(statusCodeNumber)) { + long nowMillis = now.getMillis(); + long numGraceStatusCode = graceStatusCodeTracker.get(nowMillis); + graceStatusCodeTracker.add(nowMillis, 1); + if (numGraceStatusCode < 1) { + return BackoffResults.NONE; + } else { + return doBackoff(); + } + } else { + return doBackoff(); + } + } + + private BackoffResult doBackoff() { + // Maximum number of retries reached. + if (attempt >= maxAttempts) { + return BackoffResults.EXHAUSTED; + } + // Maximum cumulative backoff reached. + if (cumulativeBackoff.compareTo(MAX_CUMULATIVE_BACKOFF) >= 0) { + return BackoffResults.EXHAUSTED; + } + + double currentIntervalMillis = + Math.min( + initialBackoff.getMillis() * Math.pow(1.5, attempt - 1), MAX_BACKOFF.getMillis()); + double randomOffset = + (rand.nextDouble() * 2 - 1) * RANDOMIZATION_FACTOR * currentIntervalMillis; + long nextBackoffMillis = Math.round(currentIntervalMillis + randomOffset); + // Cap to limit on cumulative backoff + Duration remainingCumulative = MAX_CUMULATIVE_BACKOFF.minus(cumulativeBackoff); + nextBackoffMillis = Math.min(nextBackoffMillis, remainingCumulative.getMillis()); + + // Update state and return backoff. 
+ cumulativeBackoff = cumulativeBackoff.plus(nextBackoffMillis); + attempt += 1; + return new BackoffDuration(Duration.millis(nextBackoffMillis)); + } + + private static MovingFunction createGraceStatusCodeTracker() { + return createMovingFunction(Duration.standardMinutes(1), Duration.millis(500)); + } + + interface BackoffResult {} + + enum BackoffResults implements BackoffResult { + EXHAUSTED, + NONE + } + + static final class BackoffDuration implements BackoffResult { + private final Duration duration; + + BackoffDuration(Duration duration) { + this.duration = duration; + } + + Duration getDuration() { + return duration; + } + + @Override + public boolean equals(@Nullable Object o) { + if (this == o) { + return true; + } + if (!(o instanceof BackoffDuration)) { + return false; + } + BackoffDuration that = (BackoffDuration) o; + return Objects.equals(duration, that.duration); + } + + @Override + public int hashCode() { + return Objects.hash(duration); + } + + @Override + public String toString() { + return "BackoffDuration{" + "duration=" + duration + '}'; + } + } + } + + private static class MovingAverage { + private final MovingFunction sum; + private final MovingFunction count; + + private MovingAverage(Duration samplePeriod, Duration sampleUpdate) { + sum = createMovingFunction(samplePeriod, sampleUpdate); + count = createMovingFunction(samplePeriod, sampleUpdate); + } + + private void add(Instant instantSinceEpoch, long value) { + sum.add(instantSinceEpoch.getMillis(), value); + count.add(instantSinceEpoch.getMillis(), 1); + } + + private long get(Instant instantSinceEpoch) { + return sum.get(instantSinceEpoch.getMillis()) / count.get(instantSinceEpoch.getMillis()); + } + + private boolean hasValue(Instant instantSinceEpoch) { + return sum.isSignificant() + && count.isSignificant() + && count.get(instantSinceEpoch.getMillis()) > 0; + } + } + + /** + * Observability (o11y) related metrics. Contains handles to counters and distributions related to + * QoS. + */ + private static final class O11y { + final Counter throttlingMs; + final Counter rpcFailures; + final Counter rpcSuccesses; + final Counter rpcStreamValueReceived; + final Distribution rpcDurationMs; + final Distribution latencyPerDocumentMs; + final Distribution batchCapacityCount; + + private O11y( + Counter throttlingMs, + Counter rpcFailures, + Counter rpcSuccesses, + Counter rpcStreamValueReceived, + Distribution rpcDurationMs, + Distribution latencyPerDocumentMs, + Distribution batchCapacityCount) { + this.throttlingMs = throttlingMs; + this.rpcFailures = rpcFailures; + this.rpcSuccesses = rpcSuccesses; + this.rpcStreamValueReceived = rpcStreamValueReceived; + this.rpcDurationMs = rpcDurationMs; + this.latencyPerDocumentMs = latencyPerDocumentMs; + this.batchCapacityCount = batchCapacityCount; + } + + private static O11y create( + Context context, CounterFactory counterFactory, DistributionFactory distributionFactory) { + // metrics are named using '_' (underscore) instead of '/' (slash) as separators, because + // metric names become gcp resources and get unique URLs, so some parts of the UI will only + // show the value after the last '/' + return new O11y( + // throttlingMs is a special counter used by dataflow. When we are having to throttle, + // we signal to dataflow that fact by adding to this counter. + // Signaling to dataflow is important so that a bundle isn't categorised as hung. 
+ counterFactory.get(context.getNamespace(), "throttlingMs"), + // metrics specific to each rpc + counterFactory.get(context.getNamespace(), "rpc_failures"), + counterFactory.get(context.getNamespace(), "rpc_successes"), + counterFactory.get(context.getNamespace(), "rpc_streamValueReceived"), + distributionFactory.get(context.getNamespace(), "rpc_durationMs"), + // qos wide metrics + distributionFactory.get(RpcQos.class.getName(), "qos_write_latencyPerDocumentMs"), + distributionFactory.get(RpcQos.class.getName(), "qos_write_batchCapacityCount")); + } + } + + static class FlushBufferImpl> implements FlushBuffer { + + final int nextBatchMaxCount; + final long nextBatchMaxBytes; + final ImmutableList.Builder elements; + + int offersAcceptedCount = 0; + long offersAcceptedBytes = 0; + + public FlushBufferImpl(int nextBatchMaxCount, long nextBatchMaxBytes) { + this.nextBatchMaxCount = nextBatchMaxCount; + this.nextBatchMaxBytes = nextBatchMaxBytes; + this.elements = ImmutableList.builder(); + } + + @Override + public boolean offer(ElementT newElement) { + if (offersAcceptedCount < nextBatchMaxCount) { + long newBytesTotal = offersAcceptedBytes + newElement.getSerializedSize(); + if (newBytesTotal <= nextBatchMaxBytes) { + elements.add(newElement); + offersAcceptedCount++; + offersAcceptedBytes = newBytesTotal; + return true; + } else { + return false; + } + } else { + return false; + } + } + + @Override + public Iterator iterator() { + return elements.build().iterator(); + } + + @Override + public int getBufferedElementsCount() { + return offersAcceptedCount; + } + + @Override + public long getBufferedElementsBytes() { + return offersAcceptedBytes; + } + + @Override + public boolean isFull() { + return isNonEmpty() + && (offersAcceptedCount == nextBatchMaxCount || offersAcceptedBytes >= nextBatchMaxBytes); + } + + @Override + public boolean isNonEmpty() { + return offersAcceptedCount > 0; + } + } + + private static final class DiagnosticOnlyFilteringDistributionFactory + implements DistributionFactory { + + private static final Set DIAGNOSTIC_ONLY_METRIC_NAMES = + ImmutableSet.of( + "qos_adaptiveThrottler_allRequestsCount", + "qos_adaptiveThrottler_overloadMaxCount", + "qos_adaptiveThrottler_overloadUsagePct", + "qos_adaptiveThrottler_successfulRequestsCount", + "qos_adaptiveThrottler_throttleProbabilityPct", + "qos_adaptiveThrottler_throttlingMs", + "qos_rampUp_availableWriteCountBudget", + "qos_rampUp_throttlingMs", + "qos_writeBatcher_batchMaxCount", + "qos_write_latencyPerDocumentMs"); + + private final boolean excludeMetrics; + private final DistributionFactory delegate; + + private DiagnosticOnlyFilteringDistributionFactory( + boolean excludeMetrics, DistributionFactory delegate) { + this.excludeMetrics = excludeMetrics; + this.delegate = delegate; + } + + @Override + public Distribution get(String namespace, String name) { + if (excludeMetrics && DIAGNOSTIC_ONLY_METRIC_NAMES.contains(name)) { + return new NullDistribution(new SimpleMetricName(namespace, name)); + } else { + return delegate.get(namespace, name); + } + } + } + + private static final class NullDistribution implements Distribution { + + private final MetricName name; + + private NullDistribution(MetricName name) { + this.name = name; + } + + @Override + public void update(long value) {} + + @Override + public void update(long sum, long count, long min, long max) {} + + @Override + public MetricName getName() { + return name; + } + } + + private static class SimpleMetricName extends MetricName { + + private final String 
namespace; + private final String name; + + public SimpleMetricName(String namespace, String name) { + this.namespace = namespace; + this.name = name; + } + + @Override + public String getNamespace() { + return namespace; + } + + @Override + public String getName() { + return name; + } + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/RpcQosOptions.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/RpcQosOptions.java new file mode 100644 index 000000000000..f228beadbfc1 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/RpcQosOptions.java @@ -0,0 +1,775 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.firestore; + +import static java.util.Objects.requireNonNull; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import java.io.Serializable; +import java.util.Comparator; +import java.util.Objects; +import javax.annotation.concurrent.Immutable; +import javax.annotation.concurrent.ThreadSafe; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.BatchWriteWithSummary; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.transforms.display.HasDisplayData; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; +import org.checkerframework.checker.nullness.qual.Nullable; +import org.joda.time.Duration; + +/** + * Quality of Service manager options for Firestore RPCs. + * + *

    Every RPC sent to Cloud Firestore is subject to QoS considerations. Successful, failed,
+ * and attempted requests are all tracked and directly drive the determination of when to
+ * attempt an RPC.
+ *

    Configuration of options can be accomplished by passing an instance of {@link RpcQosOptions}
+ * to the {@code withRpcQosOptions} method of each {@code Builder} available in {@link FirestoreV1}.
+ *

    A new instance of {@link RpcQosOptions.Builder} can be created via {@link + * RpcQosOptions#newBuilder()}. A default instance of {@link RpcQosOptions} can be created via + * {@link RpcQosOptions#defaultOptions()}. + * + *
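For example (the values are arbitrary but fall within the documented ranges of each setter):

    RpcQosOptions options =
        RpcQosOptions.newBuilder()
            .withMaxAttempts(3)
            .withBatchMaxCount(250)
            .withHintMaxNumWorkers(100)
            .build();
    // The resulting instance is then handed to a FirestoreV1 builder via withRpcQosOptions(options).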

    + * + * @see FirestoreV1 + * @see FirestoreV1.BatchGetDocuments.Builder#withRpcQosOptions(RpcQosOptions) + * @see BatchWriteWithSummary.Builder#withRpcQosOptions(RpcQosOptions) + * @see FirestoreV1.ListCollectionIds.Builder#withRpcQosOptions(RpcQosOptions) + * @see FirestoreV1.ListDocuments.Builder#withRpcQosOptions(RpcQosOptions) + * @see FirestoreV1.PartitionQuery.Builder#withRpcQosOptions(RpcQosOptions) + * @see FirestoreV1.RunQuery.Builder#withRpcQosOptions(RpcQosOptions) + * @see Standard limits + * @see Designing + * for scale + */ +@Immutable +@ThreadSafe +public final class RpcQosOptions implements Serializable, HasDisplayData { + + private final int maxAttempts; + private final Duration initialBackoff; + private final Duration samplePeriod; + private final Duration samplePeriodBucketSize; + private final double overloadRatio; + private final Duration throttleDuration; + private final int batchInitialCount; + private final int batchMaxCount; + private final long batchMaxBytes; + private final Duration batchTargetLatency; + private final int hintMaxNumWorkers; + private final boolean shouldReportDiagnosticMetrics; + + private RpcQosOptions( + int maxAttempts, + Duration initialBackoff, + Duration samplePeriod, + Duration samplePeriodBucketSize, + double overloadRatio, + Duration throttleDuration, + int batchInitialCount, + int batchMaxCount, + long batchMaxBytes, + Duration batchTargetLatency, + int hintMaxNumWorkers, + boolean shouldReportDiagnosticMetrics) { + this.maxAttempts = maxAttempts; + this.initialBackoff = initialBackoff; + this.samplePeriod = samplePeriod; + this.samplePeriodBucketSize = samplePeriodBucketSize; + this.overloadRatio = overloadRatio; + this.throttleDuration = throttleDuration; + this.batchInitialCount = batchInitialCount; + this.batchMaxCount = batchMaxCount; + this.batchMaxBytes = batchMaxBytes; + this.batchTargetLatency = batchTargetLatency; + this.hintMaxNumWorkers = hintMaxNumWorkers; + this.shouldReportDiagnosticMetrics = shouldReportDiagnosticMetrics; + } + + @Override + public void populateDisplayData(DisplayData.Builder builder) { + builder + .add(DisplayData.item("maxAttempts", maxAttempts).withLabel("maxAttempts")) + .add(DisplayData.item("initialBackoff", initialBackoff).withLabel("initialBackoff")) + .add(DisplayData.item("samplePeriod", samplePeriod).withLabel("samplePeriod")) + .add( + DisplayData.item("samplePeriodBucketSize", samplePeriodBucketSize) + .withLabel("samplePeriodBucketSize")) + .add(DisplayData.item("overloadRatio", overloadRatio).withLabel("overloadRatio")) + .add(DisplayData.item("throttleDuration", throttleDuration).withLabel("throttleDuration")) + .add( + DisplayData.item("batchInitialCount", batchInitialCount).withLabel("batchInitialCount")) + .add(DisplayData.item("batchMaxCount", batchMaxCount).withLabel("batchMaxCount")) + .add(DisplayData.item("batchMaxBytes", batchMaxBytes).withLabel("batchMaxBytes")) + .add( + DisplayData.item("batchTargetLatency", batchTargetLatency) + .withLabel("batchTargetLatency")) + .add( + DisplayData.item("hintMaxNumWorkers", hintMaxNumWorkers).withLabel("hintMaxNumWorkers")) + .add( + DisplayData.item("shouldReportDiagnosticMetrics", shouldReportDiagnosticMetrics) + .withLabel("shouldReportDiagnosticMetrics")); + } + + /** + * The maximum number of times a request will be attempted for a complete successful result. + * + *

    For a stream-based response, the full stream read must complete within the specified number
+ * of attempts. Restarting a stream will count as a new attempt.
+ *

    Default Value: 5 + * + * @see RpcQosOptions.Builder#withMaxAttempts(int) + */ + public int getMaxAttempts() { + return maxAttempts; + } + + /** + * The initial backoff duration to be used before retrying a request for the first time. + * + *

    Default Value: 5 sec + * + * @see RpcQosOptions.Builder#withInitialBackoff(Duration) + */ + public Duration getInitialBackoff() { + return initialBackoff; + } + + /** + * The length of time sampled request data will be retained. + * + *

    Default Value: 2 min + * + * @see RpcQosOptions.Builder#withSamplePeriod(Duration) + */ + public Duration getSamplePeriod() { + return samplePeriod; + } + + /** + * The size of buckets within the specified {@link #getSamplePeriod() samplePeriod}. + * + *

    Default Value: 10 sec + * + * @see RpcQosOptions.Builder#withSamplePeriodBucketSize(Duration) + */ + public Duration getSamplePeriodBucketSize() { + return samplePeriodBucketSize; + } + + /** + * The target ratio between requests sent and successful requests. This is "K" in the formula in + * SRE Book - Handling + * Overload + * + *
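As a worked example with the default ratio of 1.05: if the sample period saw 1000 requests of which 800 succeeded, the overload usage is 1000 - 1.05 * 800 = 160, giving a local throttling probability of roughly 160 / 1000 = 0.16 (the denominator also includes a small smoothing constant, and negative results are clamped to zero).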

    Default Value: 1.05 + * + * @see RpcQosOptions.Builder#withOverloadRatio(double) + */ + public double getOverloadRatio() { + return overloadRatio; + } + + /** + * The amount of time an attempt will be throttled if deemed necessary based on previous success + * rate. + * + *

    Default value: 5 sec + * + * @see RpcQosOptions.Builder#withThrottleDuration(Duration) + */ + public Duration getThrottleDuration() { + return throttleDuration; + } + + /** + * The initial size of a batch; used in the absence of the QoS system having significant data to + * determine a better batch size. + * + *

    Default Value: 20 + * + * @see RpcQosOptions.Builder#withBatchInitialCount(int) + */ + public int getBatchInitialCount() { + return batchInitialCount; + } + + /** + * The maximum number of writes to include in a batch. (Used in the absence of the QoS system + * having significant data to determine a better value). The actual number of writes per request + * may be lower if we reach {@link #getBatchMaxBytes() batchMaxBytes}. + * + *

    Default Value: 500 + * + * @see RpcQosOptions.Builder#withBatchMaxCount(int) + */ + public int getBatchMaxCount() { + return batchMaxCount; + } + + /** + * The maximum number of bytes to include in a batch. (Used in the absence of the QoS system + * having significant data to determine a better value). The actual number of bytes per request + * may be lower if {@link #getBatchMaxCount() batchMaxCount} is reached first. + * + *

    Default Value: 9.5 MiB
+ *
+ * @see RpcQosOptions.Builder#withBatchMaxBytes(long)
+ */
+ public long getBatchMaxBytes() {
+ return batchMaxBytes;
+ }
+
+ /**
+ * Target latency for batch requests. The batcher aims for a target response time per RPC: using
+ * the response times of previous RPCs and the number of writes they contained, it calculates a
+ * rolling average time-per-write and chooses the number of writes for future requests to hit the
+ * target time.
+ *
+ *
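As a worked example: if recent RPCs have averaged about 20 ms per written document, a 5 second target steers the batcher toward roughly 5000 / 20 = 250 writes in the next request, still subject to the batchMaxCount and batchMaxBytes caps.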

    Default Value: 5 sec + * + * @see RpcQosOptions.Builder#withBatchTargetLatency(Duration) + */ + public Duration getBatchTargetLatency() { + return batchTargetLatency; + } + + /** + * A hint to the QoS system for the intended max number of workers for a pipeline. The provided + * value can be used to try to scale calculations to values appropriate for the pipeline as a + * whole. + * + *

    If you are running your pipeline on Cloud Dataflow, this parameter should be set to the
+ * same value as {@code maxNumWorkers}
+ *

    Default Value: 500
+ *
+ * @see RpcQosOptions.Builder#withHintMaxNumWorkers(int)
+ */
+ public int getHintMaxNumWorkers() {
+ return hintMaxNumWorkers;
+ }
+
+ /**
+ * Whether additional diagnostic metrics should be reported for a Transform.
+ *
+ *

    If true, additional detailed diagnostic metrics will be reported for the RPC QoS subsystem. + * + *

    This parameter should be used with care as it will output ~40 additional custom metrics + * which will count toward any possible pipeline metrics limits (For example Dataflow + * Custom Metrics limits and Cloud Monitoring custom + * metrics quota) + * + *

    Default Value: {@code false} + * + * @see RpcQosOptions.Builder#withReportDiagnosticMetrics() + */ + public boolean isShouldReportDiagnosticMetrics() { + return shouldReportDiagnosticMetrics; + } + + /** + * Create a new {@link Builder} initialized with the values from this instance. + * + * @return a new {@link Builder} initialized with the values from this instance. + */ + public Builder toBuilder() { + Builder builder = + new Builder() + .withMaxAttempts(maxAttempts) + .withInitialBackoff(initialBackoff) + .withSamplePeriod(samplePeriod) + .withSamplePeriodBucketSize(samplePeriodBucketSize) + .withOverloadRatio(overloadRatio) + .withThrottleDuration(throttleDuration) + .withBatchInitialCount(batchInitialCount) + .withBatchMaxCount(batchMaxCount) + .withBatchMaxBytes(batchMaxBytes) + .withBatchTargetLatency(batchTargetLatency) + .withHintMaxNumWorkers(hintMaxNumWorkers); + if (shouldReportDiagnosticMetrics) { + return builder.withReportDiagnosticMetrics(); + } else { + return builder; + } + } + + @Override + public boolean equals(@Nullable Object o) { + if (this == o) { + return true; + } + if (!(o instanceof RpcQosOptions)) { + return false; + } + RpcQosOptions that = (RpcQosOptions) o; + return maxAttempts == that.maxAttempts + && Double.compare(that.overloadRatio, overloadRatio) == 0 + && batchInitialCount == that.batchInitialCount + && batchMaxCount == that.batchMaxCount + && batchMaxBytes == that.batchMaxBytes + && hintMaxNumWorkers == that.hintMaxNumWorkers + && shouldReportDiagnosticMetrics == that.shouldReportDiagnosticMetrics + && initialBackoff.equals(that.initialBackoff) + && samplePeriod.equals(that.samplePeriod) + && samplePeriodBucketSize.equals(that.samplePeriodBucketSize) + && throttleDuration.equals(that.throttleDuration) + && batchTargetLatency.equals(that.batchTargetLatency); + } + + @Override + public int hashCode() { + return Objects.hash( + maxAttempts, + initialBackoff, + samplePeriod, + samplePeriodBucketSize, + overloadRatio, + throttleDuration, + batchInitialCount, + batchMaxCount, + batchMaxBytes, + batchTargetLatency, + hintMaxNumWorkers, + shouldReportDiagnosticMetrics); + } + + @Override + public String toString() { + return "RpcQosOptions{" + + "maxAttempts=" + + maxAttempts + + ", initialBackoff=" + + initialBackoff + + ", samplePeriod=" + + samplePeriod + + ", samplePeriodBucketSize=" + + samplePeriodBucketSize + + ", overloadRatio=" + + overloadRatio + + ", throttleDuration=" + + throttleDuration + + ", batchInitialCount=" + + batchInitialCount + + ", batchMaxCount=" + + batchMaxCount + + ", batchMaxBytes=" + + batchMaxBytes + + ", batchTargetLatency=" + + batchTargetLatency + + ", hintMaxNumWorkers=" + + hintMaxNumWorkers + + ", shouldReportDiagnosticMetrics=" + + shouldReportDiagnosticMetrics + + '}'; + } + + /** + * Factory method to return a new instance of {@link RpcQosOptions} with all default values. + * + * @return New instance of {@link RpcQosOptions} with all default values + * @see #newBuilder() + */ + public static RpcQosOptions defaultOptions() { + return newBuilder().build(); + } + + /** + * Factory method to return a new instance of {@link Builder} with all values set to their initial + * default values. + * + * @return New instance of {@link Builder} with all values set to their initial default values + * @see #defaultOptions() + */ + public static Builder newBuilder() { + return new Builder(); + } + + /** + * Mutable Builder class for creating instances of {@link RpcQosOptions}. + * + *

    A new instance of {@link RpcQosOptions.Builder} can be created via {@link + * RpcQosOptions#newBuilder()}. + * + *

    NOTE: All {@code with} methods in this class function as set rather than copy with + * new value + */ + public static final class Builder { + + /** + * Cloud Firestore has a limit of 10MB per RPC. This is set lower than the 10MB limit on the + * RPC, as this only accounts for the mutations themselves and not the Request wrapper around + * them. + */ + private static final long FIRESTORE_RPC_BYTES_MAX = (long) (9.5 * 1024 * 1024); + /** The Cloud Firestore API has a limit of 500 document updates per request. */ + private static final int FIRESTORE_SINGLE_REQUEST_UPDATE_DOCUMENTS_MAX = 500; + + private int maxAttempts; + private Duration initialBackoff; + private Duration samplePeriod; + private Duration samplePeriodBucketSize; + private double overloadRatio; + private Duration throttleDuration; + private int batchInitialCount; + private int batchMaxCount; + private long batchMaxBytes; + private Duration batchTargetLatency; + private int hintMaxNumWorkers; + private boolean shouldReportDiagnosticMetrics; + + private Builder() { + maxAttempts = 5; + initialBackoff = Duration.standardSeconds(5); + samplePeriod = Duration.standardMinutes(2); + samplePeriodBucketSize = Duration.standardSeconds(10); + overloadRatio = 1.05; + throttleDuration = Duration.standardSeconds(5); + batchInitialCount = 20; + batchMaxCount = FIRESTORE_SINGLE_REQUEST_UPDATE_DOCUMENTS_MAX; + batchMaxBytes = FIRESTORE_RPC_BYTES_MAX; + batchTargetLatency = Duration.standardSeconds(5); + hintMaxNumWorkers = 500; + shouldReportDiagnosticMetrics = false; + } + + /** + * Configure the maximum number of times a request will be attempted for a complete successful + * result. + * + *

    For a stream-based response, the full stream read must complete within the specified
+ * number of attempts. Restarting a stream will count as a new attempt.
+ *

    Default Value: 5 + * + * @param maxAttempts an int in the range 1 <= {@code maxAttempts} <= 5 + * @return this builder + * @see RpcQosOptions#getMaxAttempts() + */ + public Builder withMaxAttempts(int maxAttempts) { + this.maxAttempts = maxAttempts; + return this; + } + + /** + * Configure the initial backoff duration to be used before retrying a request for the first + * time. + * + *

    Default Value: 5 sec + * + * @param initialBackoff a {@link Duration} in the range 5 sec <= {@code initialBackoff} <= 2 + * min + * @return this builder + * @see RpcQosOptions#getInitialBackoff() + */ + public Builder withInitialBackoff(Duration initialBackoff) { + this.initialBackoff = initialBackoff; + return this; + } + + /** + * Configure the length of time sampled request data will be retained. + * + *

    Default Value: 2 min + * + * @param samplePeriod a {@link Duration} in the range 2 min <= {@code samplePeriod} <= 20 min + * @return this builder + * @see RpcQosOptions#getSamplePeriod() + */ + public Builder withSamplePeriod(Duration samplePeriod) { + this.samplePeriod = samplePeriod; + return this; + } + + /** + * Configure the size of buckets within the specified {@link #withSamplePeriod(Duration) + * samplePeriod}. + * + *

    Default Value: 10 sec + * + * @param samplePeriodBucketSize a {@link Duration} in the range 10 sec <= {@code + * samplePeriodBucketSize} <= 20 min + * @return this builder + * @see RpcQosOptions#getSamplePeriodBucketSize() + */ + public Builder withSamplePeriodBucketSize(Duration samplePeriodBucketSize) { + this.samplePeriodBucketSize = samplePeriodBucketSize; + return this; + } + + /** + * The target ratio between requests sent and successful requests. This is "K" in the formula in + * SRE Book - Handling Overload + * + *

    Default Value: 1.05 + * + * @param overloadRatio the target ratio between requests sent and successful requests. A double + * in the range 1.0 <= {@code overloadRatio} <= 1.5 + * @return this builder + * @see RpcQosOptions#getOverloadRatio() + */ + public Builder withOverloadRatio(double overloadRatio) { + this.overloadRatio = overloadRatio; + return this; + } + + /** + * Configure the amount of time an attempt will be throttled if deemed necessary based on + * previous success rate. + * + *

    Default value: 5 sec + * + * @param throttleDuration a {@link Duration} in the range 5 sec <= {@code throttleDuration} <= + * 1 min + * @return this builder + * @see RpcQosOptions#getThrottleDuration() + */ + public Builder withThrottleDuration(Duration throttleDuration) { + this.throttleDuration = throttleDuration; + return this; + } + + /** + * Configure the initial size of a batch; used in the absence of the QoS system having + * significant data to determine a better batch size. + * + *

    Default Value: 20 + * + * @param batchInitialCount an int in the range 1 <= {@code batchInitialCount} <= 500 + * @return this builder + * @see RpcQosOptions#getBatchInitialCount() + */ + public Builder withBatchInitialCount(int batchInitialCount) { + this.batchInitialCount = batchInitialCount; + return this; + } + + /** + * Configure the maximum number of writes to include in a batch. (Used in the absence of the QoS + * system having significant data to determine a better value). The actual number of writes per + * request may be lower if {@link #withBatchMaxBytes(long) batchMaxBytes} is reached first. + * + *

    Default Value: 500 + * + * @param batchMaxCount an int in the range 1 <= {@code batchMaxCount} <= 500 + * @return this builder + * @see RpcQosOptions#getBatchMaxCount() + */ + public Builder withBatchMaxCount(int batchMaxCount) { + this.batchMaxCount = batchMaxCount; + return this; + } + + /** + * Configure the maximum number of bytes to include in a batch. (Used in the absence of the QoS + * system having significant data to determine a better value). The actual number of bytes per + * request may be lower if {@link #withBatchMaxCount(int) batchMaxCount} is reached first. + * + *

    Default Value: 9.5 MiB
+ *
+ * @param batchMaxBytes a long in the range 1 B <= {@code batchMaxBytes} <= 9.5 MiB
+ * @return this builder
+ * @see RpcQosOptions#getBatchMaxBytes()
+ */
+ public Builder withBatchMaxBytes(long batchMaxBytes) {
+ this.batchMaxBytes = batchMaxBytes;
+ return this;
+ }
+
+ /**
+ * Target latency for batch requests. The batcher aims for a target response time per RPC: using
+ * the response times of previous RPCs and the number of writes they contained, it calculates a
+ * rolling average time-per-write and chooses the number of writes for future requests to hit the
+ * target time.
+ *
+ *

    Default Value: 5 sec + * + * @param batchTargetLatency a {@link Duration} in the range 5 sec <= {@code batchTargetLatency} + * <= 2 min + * @return this builder + * @see RpcQosOptions#getBatchTargetLatency() + */ + public Builder withBatchTargetLatency(Duration batchTargetLatency) { + this.batchTargetLatency = batchTargetLatency; + return this; + } + + /** + * Provide a hint to the QoS system for the intended max number of workers for a pipeline. The + * provided value can be used to try to scale calculations to values appropriate for the + * pipeline as a whole. + * + *

    If you are running your pipeline on Cloud Dataflow, this parameter should be set to
+ * the same value as {@code maxNumWorkers}
+ *
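A sketch of keeping this hint aligned with Dataflow, assuming the pipeline is already configured through DataflowPipelineOptions:

    DataflowPipelineOptions dataflowOptions =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);
    RpcQosOptions qosOptions =
        RpcQosOptions.newBuilder()
            .withHintMaxNumWorkers(dataflowOptions.getMaxNumWorkers())
            .build();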

    Default Value: 500 + * + * @param hintMaxNumWorkers an int in the range 1 <= {@code hintMaxNumWorkers} <= + * Integer.MAX_VALUE + * @return this builder + * @see RpcQosOptions#getHintMaxNumWorkers() + */ + public Builder withHintMaxNumWorkers(int hintMaxNumWorkers) { + this.hintMaxNumWorkers = hintMaxNumWorkers; + return this; + } + + /** + * Whether additional diagnostic metrics should be reported for a Transform. + * + *

    If invoked on this builder, additional detailed diagnostic metrics will be reported for + * the RPC QoS subsystem. + * + *

    This parameter should be used with care as it will output ~40 additional custom metrics + * which will count toward any possible pipeline metrics limits (For example Dataflow + * Custom Metrics limits and Cloud Monitoring + * custom metrics quota) + * + * @return this builder + * @see RpcQosOptions#isShouldReportDiagnosticMetrics() + */ + public Builder withReportDiagnosticMetrics() { + this.shouldReportDiagnosticMetrics = true; + return this; + } + + /** + * Create a new instance of {@link RpcQosOptions} from the current builder state. + * + *
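Out-of-range values are rejected when build() runs, for example:

    // Throws IllegalArgumentException: "maxAttempts must be in the range 1 to 5, but was 10"
    RpcQosOptions.newBuilder().withMaxAttempts(10).build();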

    All provided values will be validated for non-nullness and that each value falls within + * its allowed range. + * + * @return a new instance of {@link RpcQosOptions} from the current builder state. + * @throws NullPointerException if any nullable value is null + * @throws IllegalArgumentException if any provided value does not fall within its allowed range + */ + public RpcQosOptions build() { + validateIndividualFields(); + validateRelatedFields(); + return unsafeBuild(); + } + + @VisibleForTesting + RpcQosOptions unsafeBuild() { + return new RpcQosOptions( + maxAttempts, + initialBackoff, + samplePeriod, + samplePeriodBucketSize, + overloadRatio, + throttleDuration, + batchInitialCount, + batchMaxCount, + batchMaxBytes, + batchTargetLatency, + hintMaxNumWorkers, + shouldReportDiagnosticMetrics); + } + + @VisibleForTesting + void validateIndividualFields() { + checkInRange("maxAttempts", 1, 5, maxAttempts, Integer::compare); + requireNonNull(initialBackoff, "initialBackoff must be non null"); + checkInRange( + "initialBackoff", + Duration.standardSeconds(5), + Duration.standardMinutes(2), + initialBackoff, + Duration::compareTo); + requireNonNull(samplePeriod, "samplePeriod must be non null"); + checkInRange( + "samplePeriod", + Duration.standardMinutes(2), + Duration.standardMinutes(20), + samplePeriod, + Duration::compareTo); + requireNonNull(samplePeriodBucketSize, "samplePeriodBucketSize must be non null"); + checkInRange( + "samplePeriodBucketSize", + Duration.standardSeconds(10), + Duration.standardMinutes(20), + samplePeriodBucketSize, + Duration::compareTo); + checkInRange("overloadRatio", 1.0, 1.5, overloadRatio, Double::compare); + requireNonNull(throttleDuration, "throttleDuration must be non null"); + checkInRange( + "throttleDuration", + Duration.standardSeconds(5), + Duration.standardMinutes(1), + throttleDuration, + Duration::compareTo); + checkInRange( + "batchInitialCount", + 1, + FIRESTORE_SINGLE_REQUEST_UPDATE_DOCUMENTS_MAX, + batchInitialCount, + Integer::compare); + checkInRange( + "batchMaxCount", + 1, + FIRESTORE_SINGLE_REQUEST_UPDATE_DOCUMENTS_MAX, + batchMaxCount, + Integer::compare); + checkInRange("batchMaxBytes", 1L, FIRESTORE_RPC_BYTES_MAX, batchMaxBytes, Long::compare); + requireNonNull(batchTargetLatency, "batchTargetLatency must be non null"); + checkInRange( + "batchTargetLatency", + Duration.standardSeconds(5), + Duration.standardMinutes(2), + batchTargetLatency, + Duration::compareTo); + checkInRange("hintWorkerCount", 1, Integer.MAX_VALUE, hintMaxNumWorkers, Integer::compare); + } + + @VisibleForTesting + void validateRelatedFields() { + checkArgument( + samplePeriodBucketSize.compareTo(samplePeriod) <= 0, + String.format( + "expected samplePeriodBucketSize <= samplePeriod, but was %s <= %s", + samplePeriodBucketSize, samplePeriod)); + } + + private static void checkInRange( + String fieldName, T min, T max, T actual, Comparator comp) { + checkArgument( + 0 <= comp.compare(actual, min) && comp.compare(actual, max) <= 0, + String.format( + "%s must be in the range %s to %s, but was %s", fieldName, min, max, actual)); + } + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/package-info.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/package-info.java new file mode 100644 index 000000000000..779df15c1772 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/package-info.java @@ -0,0 +1,27 @@ +/* + * 
Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Provides an API for reading from and writing to Google Cloud Firestore. + */ +@Experimental(Kind.SOURCE_SINK) +package org.apache.beam.sdk.io.gcp.firestore; + +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.java index 6b2f263a4e15..50ab75ba9041 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.java @@ -21,9 +21,9 @@ import com.fasterxml.jackson.core.JsonProcessingException; import com.fasterxml.jackson.databind.ObjectMapper; -import com.google.api.services.healthcare.v1beta1.model.DeidentifyConfig; -import com.google.api.services.healthcare.v1beta1.model.HttpBody; -import com.google.api.services.healthcare.v1beta1.model.Operation; +import com.google.api.services.healthcare.v1.model.DeidentifyConfig; +import com.google.api.services.healthcare.v1.model.HttpBody; +import com.google.api.services.healthcare.v1.model.Operation; import com.google.auto.value.AutoValue; import com.google.gson.Gson; import com.google.gson.JsonArray; @@ -31,9 +31,10 @@ import java.nio.ByteBuffer; import java.nio.channels.WritableByteChannel; import java.nio.charset.StandardCharsets; +import java.time.Instant; import java.util.ArrayList; -import java.util.Collection; import java.util.Collections; +import java.util.HashMap; import java.util.List; import java.util.Map; import java.util.NoSuchElementException; @@ -83,6 +84,7 @@ import org.apache.beam.sdk.values.PValue; import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.sdk.values.TupleTagList; +import org.apache.beam.sdk.values.TypeDescriptor; import org.apache.beam.sdk.values.TypeDescriptors; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Throwables; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; @@ -135,22 +137,22 @@ * to a destination FHIR store. It is important that the destination store must already exist. * *

    Search This is to search FHIR resources within a given FHIR store. The inputs are individual - * FHIR Search queries, represented by KV. The outputs are results - * of each Search, represented as a Json array of FHIR resources in string form, with pagination - * handled. + * FHIR Search queries, represented by the FhirSearchParameter class. The outputs are results of + * each Search, represented as a Json array of FHIR resources in string form, with pagination + * handled, and an optional input key. * * @see https://cloud.google.com/healthcare/docs/reference/rest/v1beta1/projects.locations.datasets.fhirStores.fhir/executeBundle> + * href=>https://cloud.google.com/healthcare/docs/reference/rest/v1/projects.locations.datasets.fhirStores.fhir/executeBundle> * @see https://cloud.google.com/healthcare/docs/how-tos/permissions-healthcare-api-gcp-products#fhir_store_cloud_storage_permissions> * @see https://cloud.google.com/healthcare/docs/reference/rest/v1beta1/projects.locations.datasets.fhirStores/import> + * href=>https://cloud.google.com/healthcare/docs/reference/rest/v1/projects.locations.datasets.fhirStores/import> * @see https://cloud.google.com/healthcare/docs/reference/rest/v1beta1/projects.locations.datasets.fhirStores/export> + * href=>https://cloud.google.com/healthcare/docs/reference/rest/v1/projects.locations.datasets.fhirStores/export> * @see https://cloud.google.com/healthcare/docs/reference/rest/v1beta1/projects.locations.datasets.fhirStores/deidentify> + * href=>https://cloud.google.com/healthcare/docs/reference/rest/v1/projects.locations.datasets.fhirStores/deidentify> * @see https://cloud.google.com/healthcare/docs/reference/rest/v1beta1/projects.locations.datasets.fhirStores/search> + * href=>https://cloud.google.com/healthcare/docs/reference/rest/v1/projects.locations.datasets.fhirStores/search> * A {@link PCollection} of {@link String} can be ingested into an Fhir store using {@link * FhirIO.Write#fhirStoresImport(String, String, String, FhirIO.Import.ContentStructure)} This * will return a {@link FhirIO.Write.Result} on which you can call {@link @@ -206,18 +208,43 @@ * DeidentifyConfig deidConfig = new DeidentifyConfig(); // use default DeidentifyConfig * pipeline.apply(FhirIO.deidentify(fhirStoreName, destinationFhirStoreName, deidConfig)); * - * // Search FHIR resources. - * PCollection>> searchQueries = ...; + * // Search FHIR resources using an "OR" query. + * Map queries = new HashMap<>(); + * queries.put("name", "Alice,Bob"); + * FhirSearchParameter searchParameter = FhirSearchParameter.of("Patient", queries); + * PCollection> searchQueries = + * pipeline.apply( + * Create.of(searchParameter) + * .withCoder(FhirSearchParameterCoder.of(StringUtf8Coder.of()))); * FhirIO.Search.Result searchResult = * searchQueries.apply(FhirIO.searchResources(options.getFhirStore())); + * PCollection resources = searchResult.getResources(); // JsonArray of results + * + * // Search FHIR resources using an "AND" query with a key. 
+ * Map> listQueries = new HashMap<>(); + * listQueries.put("name", Arrays.asList("Alice", "Bob")); + * FhirSearchParameter> listSearchParameter = + * FhirSearchParameter.of("Patient", "Alice-Bob-Search", listQueries); + * PCollection>> listSearchQueries = + * pipeline.apply( + * Create.of(listSearchParameter) + * .withCoder(FhirSearchParameterCoder.of(ListCoder.of(StringUtf8Coder.of())))); + * FhirIO.Search.Result listSearchResult = + * searchQueries.apply(FhirIO.searchResources(options.getFhirStore())); + * PCollection> listResource = + * listSearchResult.getKeyedResources(); // KV<"Alice-Bob-Search", JsonArray of results> * - * }*** * */ @SuppressWarnings({ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class FhirIO { + private static final String BASE_METRIC_PREFIX = "fhirio/"; + private static final String LRO_COUNTER_KEY = "counter"; + private static final String LRO_SUCCESS_KEY = "success"; + private static final String LRO_FAILURE_KEY = "failure"; + private static final Logger LOG = LoggerFactory.getLogger(FhirIO.class); /** * Read resources from a PCollection of resource IDs (e.g. when subscribing the pubsub @@ -231,13 +258,23 @@ public static Read readResources() { } /** - * Search resources from a Fhir store. + * Search resources from a Fhir store with String parameter values. * * @return the search * @see Search */ - public static Search searchResources(String fhirStore) { - return new Search(fhirStore); + public static Search searchResources(String fhirStore) { + return new Search(fhirStore); + } + + /** + * Search resources from a Fhir store with any type of parameter values. + * + * @return the search + * @see Search + */ + public static Search searchResourcesWithGenericParameters(String fhirStore) { + return new Search<>(fhirStore); } /** @@ -345,6 +382,37 @@ public static Deidentify deidentify( return new Deidentify(sourceFhirStore, destinationFhirStore, deidConfig); } + /** + * Increments success and failure counters for an LRO. To be used after the LRO has completed. + * This function leverages the fact that the LRO metadata is always of the format: "counter": { + * "success": "1", "failure": "1" } + * + * @param operation LRO operation object. + * @param successCounter the success counter for this operation. + * @param failureCounter the failure counter for this operation. + */ + private static void incrementLroCounters( + Operation operation, Counter successCounter, Counter failureCounter) { + Map opMetadata = operation.getMetadata(); + if (opMetadata.containsKey(LRO_COUNTER_KEY)) { + try { + Map counters = (Map) opMetadata.get(LRO_COUNTER_KEY); + if (counters.containsKey(LRO_SUCCESS_KEY)) { + successCounter.inc(Long.parseLong(counters.get(LRO_SUCCESS_KEY))); + } + if (counters.containsKey(LRO_FAILURE_KEY)) { + Long numFailures = Long.parseLong(counters.get(LRO_FAILURE_KEY)); + failureCounter.inc(numFailures); + if (numFailures > 0) { + LOG.error("LRO: " + operation.getName() + " had " + numFailures + " errors."); + } + } + } catch (Exception e) { + LOG.error("failed to increment LRO counters, error message: " + e.getMessage()); + } + } + } + /** The type Read. 
*/ public static class Read extends PTransform, FhirIO.Read.Result> { private static final Logger LOG = LoggerFactory.getLogger(Read.class); @@ -368,9 +436,7 @@ public static class Result implements POutput, PInput { * @throws IllegalArgumentException the illegal argument exception */ static FhirIO.Read.Result of(PCollectionTuple pct) throws IllegalArgumentException { - if (pct.getAll() - .keySet() - .containsAll((Collection) TupleTagList.of(OUT).and(DEAD_LETTER))) { + if (pct.has(OUT) && pct.has(DEAD_LETTER)) { return new FhirIO.Read.Result(pct); } else { throw new IllegalArgumentException( @@ -465,11 +531,16 @@ public FhirIO.Read.Result expand(PCollection resourceIds) { /** DoFn for fetching messages from the Fhir store with error handling. */ static class ReadResourceFn extends DoFn { - private Counter failedMessageGets = - Metrics.counter(ReadResourceFn.class, "failed-message-reads"); private static final Logger LOG = LoggerFactory.getLogger(ReadResourceFn.class); - private final Counter successfulStringGets = - Metrics.counter(ReadResourceFn.class, "successful-hl7v2-message-gets"); + private static final Counter READ_RESOURCE_ERRORS = + Metrics.counter(ReadResourceFn.class, BASE_METRIC_PREFIX + "read_resource_error_count"); + private static final Counter READ_RESOURCE_SUCCESS = + Metrics.counter( + ReadResourceFn.class, BASE_METRIC_PREFIX + "read_resource_success_count"); + private static final Distribution READ_RESOURCE_LATENCY_MS = + Metrics.distribution( + ReadResourceFn.class, BASE_METRIC_PREFIX + "read_resource_latency_ms"); + private HealthcareApiClient client; private ObjectMapper mapper; @@ -498,7 +569,7 @@ public void processElement(ProcessContext context) { try { context.output(fetchResource(this.client, resourceId)); } catch (Exception e) { - failedMessageGets.inc(); + READ_RESOURCE_ERRORS.inc(); LOG.warn( String.format( "Error fetching Fhir message with ID %s writing to Dead Letter " @@ -510,14 +581,15 @@ public void processElement(ProcessContext context) { private String fetchResource(HealthcareApiClient client, String resourceId) throws IOException, IllegalArgumentException { - long startTime = System.currentTimeMillis(); + long startTime = Instant.now().toEpochMilli(); HttpBody resource = client.readFhirResource(resourceId); + READ_RESOURCE_LATENCY_MS.update(Instant.now().toEpochMilli() - startTime); if (resource == null) { throw new IOException(String.format("GET request for %s returned null", resourceId)); } - this.successfulStringGets.inc(); + READ_RESOURCE_SUCCESS.inc(); return mapper.writeValueAsString(resource); } } @@ -528,26 +600,28 @@ private String fetchResource(HealthcareApiClient client, String resourceId) @AutoValue public abstract static class Write extends PTransform, Write.Result> { - /** The tag for the failed writes to FHIR store`. */ + /** The tag for successful writes to FHIR store. */ + public static final TupleTag SUCCESSFUL_BODY = new TupleTag() {}; + /** The tag for the failed writes to FHIR store. */ public static final TupleTag> FAILED_BODY = new TupleTag>() {}; - /** The tag for the files that failed to FHIR store`. */ + /** The tag for the files that failed to FHIR store. */ public static final TupleTag> FAILED_FILES = new TupleTag>() {}; - /** The tag for temp files for import to FHIR store`. */ + /** The tag for temp files for import to FHIR store. */ public static final TupleTag TEMP_FILES = new TupleTag() {}; /** The enum Write method. 
*/ public enum WriteMethod { /** * Execute Bundle Method executes a batch of requests as a single transaction @see . + * href=https://cloud.google.com/healthcare/docs/reference/rest/v1/projects.locations.datasets.fhirStores.fhir/executeBundle>. */ EXECUTE_BUNDLE, /** * Import Method bulk imports resources from GCS. This is ideal for initial loads to empty * FHIR stores. . + * href=https://cloud.google.com/healthcare/docs/reference/rest/v1/projects.locations.datasets.fhirStores/import>. */ IMPORT } @@ -555,25 +629,41 @@ public enum WriteMethod { /** The type Result. */ public static class Result implements POutput { private final Pipeline pipeline; + private final PCollection successfulBodies; private final PCollection> failedBodies; private final PCollection> failedFiles; /** - * Creates a {@link FhirIO.Write.Result} in the given {@link Pipeline}. @param pipeline the - * pipeline + * Creates a {@link FhirIO.Write.Result} in the given {@link Pipeline}. * - * @param failedBodies the failed inserts + * @param pipeline the pipeline + * @param bodies the successful and failing bodies results. * @return the result */ - static Result in(Pipeline pipeline, PCollection> failedBodies) { - return new Result(pipeline, failedBodies, null); + static Result in(Pipeline pipeline, PCollectionTuple bodies) throws IllegalArgumentException { + if (bodies.has(SUCCESSFUL_BODY) && bodies.has(FAILED_BODY)) { + return new Result(pipeline, bodies.get(SUCCESSFUL_BODY), bodies.get(FAILED_BODY), null); + } else { + throw new IllegalArgumentException( + "The PCollection tuple bodies must have the FhirIO.Write.SUCCESSFUL_BODY " + + "and FhirIO.Write.FAILED_BODY tuple tags."); + } } static Result in( Pipeline pipeline, PCollection> failedBodies, PCollection> failedFiles) { - return new Result(pipeline, failedBodies, failedFiles); + return new Result(pipeline, null, failedBodies, failedFiles); + } + + /** + * Gets successful bodies from Write. + * + * @return the entries that were inserted + */ + public PCollection getSuccessfulBodies() { + return this.successfulBodies; } /** @@ -601,7 +691,13 @@ public Pipeline getPipeline() { @Override public Map, PValue> expand() { - return ImmutableMap.of(Write.FAILED_BODY, failedBodies, Write.FAILED_FILES, failedFiles); + return ImmutableMap.of( + SUCCESSFUL_BODY, + successfulBodies, + FAILED_BODY, + failedBodies, + Write.FAILED_FILES, + failedFiles); } @Override @@ -610,9 +706,15 @@ public void finishSpecifyingOutput( private Result( Pipeline pipeline, + @Nullable PCollection successfulBodies, PCollection> failedBodies, @Nullable PCollection> failedFiles) { this.pipeline = pipeline; + if (successfulBodies == null) { + successfulBodies = + (PCollection) pipeline.apply(Create.empty(StringUtf8Coder.of())); + } + this.successfulBodies = successfulBodies; this.failedBodies = failedBodies; if (failedFiles == null) { failedFiles = @@ -716,7 +818,7 @@ private static Write.Builder write(String fhirStore) { /** * Create Method creates a single FHIR resource. @see + * href=https://cloud.google.com/healthcare/docs/reference/rest/v1/projects.locations.datasets.fhirStores.fhir/create> * * @param fhirStore the hl 7 v 2 store * @param gcsTempPath the gcs temp path @@ -766,9 +868,9 @@ public static Write fhirStoresImport( /** * Execute Bundle Method executes a batch of requests as a single transaction @see . + * href=https://cloud.google.com/healthcare/docs/reference/rest/v1/projects.locations.datasets.fhirStores.fhir/executeBundle>. 
* - * @param fhirStore the hl 7 v 2 store + * @param fhirStore the fhir store * @return the write */ public static Write executeBundles(String fhirStore) { @@ -795,8 +897,7 @@ public static Write executeBundles(ValueProvider fhirStore) { @Override public Result expand(PCollection input) { - PCollection> failedBundles; - PCollection> failedImports; + PCollectionTuple bundles; switch (this.getWriteMethod()) { case IMPORT: LOG.warn( @@ -814,14 +915,15 @@ public Result expand(PCollection input) { return input.apply(new Import(getFhirStore(), tempPath, deadPath, contentStructure)); case EXECUTE_BUNDLE: default: - failedBundles = - input - .apply( - "Execute FHIR Bundles", - ParDo.of(new ExecuteBundles.ExecuteBundlesFn(this.getFhirStore()))) - .setCoder(HealthcareIOErrorCoder.of(StringUtf8Coder.of())); + bundles = + input.apply( + "Execute FHIR Bundles", + ParDo.of(new ExecuteBundles.ExecuteBundlesFn(this.getFhirStore())) + .withOutputTags(SUCCESSFUL_BODY, TupleTagList.of(FAILED_BODY))); + bundles.get(SUCCESSFUL_BODY).setCoder(StringUtf8Coder.of()); + bundles.get(FAILED_BODY).setCoder(HealthcareIOErrorCoder.of(StringUtf8Coder.of())); } - return Result.in(input.getPipeline(), failedBundles); + return Result.in(input.getPipeline(), bundles); } } @@ -923,7 +1025,7 @@ Optional> getImportGcsDeadLetterPath() { public Write.Result expand(PCollection input) { checkState( input.isBounded() == IsBounded.BOUNDED, - "FhirIO.Import should only be used on unbounded PCollections as it is" + "FhirIO.Import should only be used on bounded PCollections as it is" + "intended for batch use only."); // fall back on pipeline's temp location. @@ -977,13 +1079,11 @@ public Write.Result expand(PCollection input) { @ProcessElement public void delete(@Element Metadata path, ProcessContext context) { // Wait til window closes for failedBodies and failedFiles to ensure we are - // done processing - // anything under tempGcsPath because it has been successfully imported to - // FHIR store or - // copies have been moved to the dead letter path. + // done processing anything under tempGcsPath because it has been + // successfully imported to FHIR store or copies have been moved to the + // dead letter path. // Clean up all of tempGcsPath. This will handle removing phantom temporary - // objects from - // failed / rescheduled ImportFn::importBatch. + // objects from failed / rescheduled ImportFn::importBatch. try { FileSystems.delete( Collections.singleton(path.resourceId()), @@ -1106,6 +1206,10 @@ public void closeFile(FinishBundleContext context) throws IOException { static class ImportFn extends DoFn>, HealthcareIOError> { + private static final Counter IMPORT_ERRORS = + Metrics.counter(ImportFn.class, BASE_METRIC_PREFIX + "resources_imported_failure_count"); + private static final Counter IMPORT_SUCCESS = + Metrics.counter(ImportFn.class, BASE_METRIC_PREFIX + "resources_imported_success_count"); private static final Logger LOG = LoggerFactory.getLogger(ImportFn.class); private final ValueProvider tempGcsPath; private final ValueProvider deadLetterGcsPath; @@ -1179,7 +1283,8 @@ public void importBatch( Operation operation = client.importFhirResource( fhirStore.get(), importUri.toString(), contentStructure.name()); - client.pollOperation(operation, 500L); + operation = client.pollOperation(operation, 500L); + incrementLroCounters(operation, IMPORT_SUCCESS, IMPORT_ERRORS); // Clean up temp files on GCS as they we successfully imported to FHIR store and no longer // needed. 
FileSystems.delete(tempDestinations); @@ -1247,17 +1352,28 @@ public static class ExecuteBundles extends PTransform, Write @Override public FhirIO.Write.Result expand(PCollection input) { - return Write.Result.in( - input.getPipeline(), - input - .apply(ParDo.of(new ExecuteBundlesFn(fhirStore))) - .setCoder(HealthcareIOErrorCoder.of(StringUtf8Coder.of()))); + PCollectionTuple bodies = + input.apply( + ParDo.of(new ExecuteBundlesFn(fhirStore)) + .withOutputTags(Write.SUCCESSFUL_BODY, TupleTagList.of(Write.FAILED_BODY))); + bodies.get(Write.SUCCESSFUL_BODY).setCoder(StringUtf8Coder.of()); + bodies.get(Write.FAILED_BODY).setCoder(HealthcareIOErrorCoder.of(StringUtf8Coder.of())); + return Write.Result.in(input.getPipeline(), bodies); } /** The type Write Fhir fn. */ - static class ExecuteBundlesFn extends DoFn> { + static class ExecuteBundlesFn extends DoFn { + + private static final Counter EXECUTE_BUNDLE_ERRORS = + Metrics.counter( + ExecuteBundlesFn.class, BASE_METRIC_PREFIX + "execute_bundle_error_count"); + private static final Counter EXECUTE_BUNDLE_SUCCESS = + Metrics.counter( + ExecuteBundlesFn.class, BASE_METRIC_PREFIX + "execute_bundle_success_count"); + private static final Distribution EXECUTE_BUNDLE_LATENCY_MS = + Metrics.distribution( + ExecuteBundlesFn.class, BASE_METRIC_PREFIX + "execute_bundle_latency_ms"); - private Counter failedBundles = Metrics.counter(ExecuteBundlesFn.class, "failed-bundles"); private transient HealthcareApiClient client; private final ObjectMapper mapper = new ObjectMapper(); /** The Fhir store. */ @@ -1275,28 +1391,27 @@ static class ExecuteBundlesFn extends DoFn> { /** * Initialize healthcare client. * - * @throws IOException the io exception + * @throws IOException If the Healthcare client cannot be created. */ @Setup public void initClient() throws IOException { this.client = new HttpHealthcareApiClient(); } - /** - * Execute Bundles. - * - * @param context the context - */ @ProcessElement public void executeBundles(ProcessContext context) { String body = context.element(); try { + long startTime = Instant.now().toEpochMilli(); // Validate that data was set to valid JSON. mapper.readTree(body); client.executeFhirBundle(fhirStore.get(), body); + EXECUTE_BUNDLE_LATENCY_MS.update(Instant.now().toEpochMilli() - startTime); + EXECUTE_BUNDLE_SUCCESS.inc(); + context.output(Write.SUCCESSFUL_BODY, body); } catch (IOException | HealthcareHttpException e) { - failedBundles.inc(); - context.output(HealthcareIOError.of(body, e)); + EXECUTE_BUNDLE_ERRORS.inc(); + context.output(Write.FAILED_BODY, HealthcareIOError.of(body, e)); } } } @@ -1326,6 +1441,15 @@ public PCollection expand(PBegin input) { /** A function that schedules an export operation and monitors the status. 
*/ public static class ExportResourcesToGcsFn extends DoFn { + + private static final Counter EXPORT_ERRORS = + Metrics.counter( + ExportResourcesToGcsFn.class, + BASE_METRIC_PREFIX + "resources_exported_failure_count"); + private static final Counter EXPORT_SUCCESS = + Metrics.counter( + ExportResourcesToGcsFn.class, + BASE_METRIC_PREFIX + "resources_exported_success_count"); private HealthcareApiClient client; private final ValueProvider exportGcsUriPrefix; @@ -1349,6 +1473,7 @@ public void exportResourcesToGcs(ProcessContext context) throw new RuntimeException( String.format("Export operation (%s) failed.", operation.getName())); } + incrementLroCounters(operation, EXPORT_SUCCESS, EXPORT_ERRORS); context.output(String.format("%s/*", gcsPrefix.replaceAll("/+$", ""))); } } @@ -1381,6 +1506,13 @@ public PCollection expand(PBegin input) { /** A function that schedules a deidentify operation and monitors the status. */ public static class DeidentifyFn extends DoFn { + + private static final Counter DEIDENTIFY_ERRORS = + Metrics.counter( + DeidentifyFn.class, BASE_METRIC_PREFIX + "resources_deidentified_failure_count"); + private static final Counter DEIDENTIFY_SUCCESS = + Metrics.counter( + DeidentifyFn.class, BASE_METRIC_PREFIX + "resources_deidentified_success_count"); private HealthcareApiClient client; private final ValueProvider destinationFhirStore; private static final Gson gson = new Gson(); @@ -1410,14 +1542,15 @@ public void deidentify(ProcessContext context) throw new IOException( String.format("DeidentifyFhirStore operation (%s) failed.", operation.getName())); } + incrementLroCounters(operation, DEIDENTIFY_SUCCESS, DEIDENTIFY_ERRORS); context.output(destinationFhirStore); } } } /** The type Search. */ - public static class Search - extends PTransform>>, FhirIO.Search.Result> { + public static class Search + extends PTransform>, FhirIO.Search.Result> { private static final Logger LOG = LoggerFactory.getLogger(Search.class); private final ValueProvider fhirStore; @@ -1431,7 +1564,8 @@ public static class Search } public static class Result implements POutput, PInput { - private PCollection resources; + private PCollection> keyedResources; + private PCollection resources; private PCollection> failedSearches; PCollectionTuple pct; @@ -1444,9 +1578,7 @@ public static class Result implements POutput, PInput { * @throws IllegalArgumentException the illegal argument exception */ static FhirIO.Search.Result of(PCollectionTuple pct) throws IllegalArgumentException { - if (pct.getAll() - .keySet() - .containsAll((Collection) TupleTagList.of(OUT).and(DEAD_LETTER))) { + if (pct.has(OUT) && pct.has(DEAD_LETTER)) { return new FhirIO.Search.Result(pct); } else { throw new IllegalArgumentException( @@ -1457,7 +1589,15 @@ static FhirIO.Search.Result of(PCollectionTuple pct) throws IllegalArgumentExcep private Result(PCollectionTuple pct) { this.pct = pct; - this.resources = pct.get(OUT); + this.keyedResources = + pct.get(OUT).setCoder(KvCoder.of(StringUtf8Coder.of(), JsonArrayCoder.of())); + this.resources = + this.keyedResources + .apply( + "Extract Values", + MapElements.into(TypeDescriptor.of(JsonArray.class)) + .via((KV in) -> in.getValue())) + .setCoder(JsonArrayCoder.of()); this.failedSearches = pct.get(DEAD_LETTER).setCoder(HealthcareIOErrorCoder.of(StringUtf8Coder.of())); } @@ -1476,10 +1616,19 @@ public PCollection> getFailedSearches() { * * @return the resources */ - public PCollection getResources() { + public PCollection getResources() { return resources; } + /** + * Gets resources 
with input SearchParameter key. + * + * @return the resources with input SearchParameter key. + */ + public PCollection> getKeyedResources() { + return keyedResources; + } + @Override public Pipeline getPipeline() { return this.pct.getPipeline(); @@ -1487,7 +1636,7 @@ public Pipeline getPipeline() { @Override public Map, PValue> expand() { - return ImmutableMap.of(OUT, resources); + return ImmutableMap.of(OUT, keyedResources); } @Override @@ -1496,13 +1645,14 @@ public void finishSpecifyingOutput( } /** The tag for the main output of Fhir Messages. */ - public static final TupleTag OUT = new TupleTag() {}; + public static final TupleTag> OUT = + new TupleTag>() {}; /** The tag for the deadletter output of Fhir Messages. */ public static final TupleTag> DEAD_LETTER = new TupleTag>() {}; @Override - public FhirIO.Search.Result expand(PCollection>> input) { + public FhirIO.Search.Result expand(PCollection> input) { return input.apply("Fetch Fhir messages", new SearchResourcesJsonString(this.fhirStore)); } @@ -1524,8 +1674,8 @@ public FhirIO.Search.Result expand(PCollection>> * stacktrace. * */ - static class SearchResourcesJsonString - extends PTransform>>, FhirIO.Search.Result> { + class SearchResourcesJsonString + extends PTransform>, FhirIO.Search.Result> { private final ValueProvider fhirStore; @@ -1534,7 +1684,7 @@ public SearchResourcesJsonString(ValueProvider fhirStore) { } @Override - public FhirIO.Search.Result expand(PCollection>> resourceIds) { + public FhirIO.Search.Result expand(PCollection> resourceIds) { return new FhirIO.Search.Result( resourceIds.apply( ParDo.of(new SearchResourcesFn(this.fhirStore)) @@ -1543,15 +1693,19 @@ public FhirIO.Search.Result expand(PCollection>> } /** DoFn for searching messages from the Fhir store with error handling. 
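The Search.Result changes above keep each result JsonArray keyed by the key of the FhirSearchParameter that produced it, so responses can be correlated with their requests downstream. A usage sketch, assuming a factory along the lines of FhirIO.searchResources(String) for constructing the Search transform (its definition sits outside this hunk); store path and query values are placeholders.

```java
import com.google.gson.JsonArray;
import java.util.Collections;
import java.util.Map;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.gcp.healthcare.FhirIO;
import org.apache.beam.sdk.io.gcp.healthcare.FhirSearchParameter;
import org.apache.beam.sdk.io.gcp.healthcare.FhirSearchParameterCoder;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

public class FhirSearchSketch {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();

    // One search request per element; "patient-123" is the key that travels with the result.
    Map<String, String> query = Collections.singletonMap("family", "Smith");
    PCollection<FhirSearchParameter<String>> requests =
        pipeline.apply(
            Create.of(FhirSearchParameter.of("Patient", "patient-123", query))
                .withCoder(FhirSearchParameterCoder.of(StringUtf8Coder.of())));

    FhirIO.Search.Result result =
        requests.apply(
            // assumed factory name; not part of this hunk
            FhirIO.searchResources("projects/p/locations/l/datasets/d/fhirStores/store"));

    // Keyed view for joining back to the request, plain view when the key is not needed.
    PCollection<KV<String, JsonArray>> keyed = result.getKeyedResources();
    PCollection<JsonArray> flat = result.getResources();

    pipeline.run().waitUntilFinish();
  }
}
```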
*/ - static class SearchResourcesFn extends DoFn>, String> { - - private Distribution searchLatencyMs = - Metrics.distribution(SearchResourcesFn.class, "fhir-search-latency-ms"); - private Counter failedSearches = - Metrics.counter(SearchResourcesFn.class, "failed-fhir-searches"); - private static final Logger LOG = LoggerFactory.getLogger(SearchResourcesFn.class); - private final Counter successfulSearches = - Metrics.counter(SearchResourcesFn.class, "successful-fhir-searches"); + class SearchResourcesFn extends DoFn, KV> { + + private final Counter searchResourceErrors = + Metrics.counter( + SearchResourcesFn.class, BASE_METRIC_PREFIX + "search_resource_error_count"); + private final Counter searchResourceSuccess = + Metrics.counter( + SearchResourcesFn.class, BASE_METRIC_PREFIX + "search_resource_success_count"); + private final Distribution searchResourceLatencyMs = + Metrics.distribution( + SearchResourcesFn.class, BASE_METRIC_PREFIX + "search_resource_latency_ms"); + + private final Logger log = LoggerFactory.getLogger(SearchResourcesFn.class); private HealthcareApiClient client; private final ValueProvider fhirStore; @@ -1577,17 +1731,19 @@ public void instantiateHealthcareClient() throws IOException { */ @ProcessElement public void processElement(ProcessContext context) { - KV> elementValues = context.element(); + FhirSearchParameter fhirSearchParameters = context.element(); try { context.output( - searchResources( - this.client, - this.fhirStore.toString(), - elementValues.getKey(), - elementValues.getValue())); + KV.of( + fhirSearchParameters.getKey(), + searchResources( + this.client, + this.fhirStore.toString(), + fhirSearchParameters.getResourceType(), + fhirSearchParameters.getQueries()))); } catch (IllegalArgumentException | NoSuchElementException e) { - failedSearches.inc(); - LOG.warn( + searchResourceErrors.inc(); + log.warn( String.format( "Error search FHIR messages writing to Dead Letter " + "Queue. 
Cause: %s Stack Trace: %s", @@ -1597,25 +1753,28 @@ public void processElement(ProcessContext context) { } } - private String searchResources( + private JsonArray searchResources( HealthcareApiClient client, String fhirStore, String resourceType, - @Nullable Map parameters) + @Nullable Map parameters) throws NoSuchElementException { - long startTime = System.currentTimeMillis(); + long start = Instant.now().toEpochMilli(); + HashMap parameterObjects = new HashMap<>(); + if (parameters != null) { + parameters.forEach(parameterObjects::put); + } HttpHealthcareApiClient.FhirResourcePages.FhirResourcePagesIterator iter = new HttpHealthcareApiClient.FhirResourcePages.FhirResourcePagesIterator( - client, fhirStore, resourceType, parameters); + client, fhirStore, resourceType, parameterObjects); JsonArray result = new JsonArray(); - result.addAll(iter.next()); while (iter.hasNext()) { result.addAll(iter.next()); } - searchLatencyMs.update(System.currentTimeMillis() - startTime); - this.successfulSearches.inc(); - return result.toString(); + searchResourceLatencyMs.update(java.time.Instant.now().toEpochMilli() - start); + searchResourceSuccess.inc(); + return result; } } } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirSearchParameter.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirSearchParameter.java new file mode 100644 index 000000000000..6f80d8619d4e --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirSearchParameter.java @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.healthcare; + +import java.util.Map; +import java.util.Objects; +import org.apache.beam.sdk.coders.DefaultCoder; +import org.checkerframework.checker.nullness.qual.Nullable; + +/** + * FhirSearchParameter represents the query parameters for a FHIR search request, used as a + * parameter for {@link FhirIO.Search}. + */ +@DefaultCoder(FhirSearchParameterCoder.class) +public class FhirSearchParameter { + + private final String resourceType; + // The key is used as a key for the search query, if there is source information to propagate + // through the pipeline. 
+ private final String key; + private final @Nullable Map queries; + + private FhirSearchParameter( + String resourceType, @Nullable String key, @Nullable Map queries) { + this.resourceType = resourceType; + if (key != null) { + this.key = key; + } else { + this.key = ""; + } + this.queries = queries; + } + + public static FhirSearchParameter of( + String resourceType, @Nullable String key, @Nullable Map queries) { + return new FhirSearchParameter<>(resourceType, key, queries); + } + + public static FhirSearchParameter of( + String resourceType, @Nullable Map queries) { + return new FhirSearchParameter<>(resourceType, null, queries); + } + + public String getResourceType() { + return resourceType; + } + + public String getKey() { + return key; + } + + public @Nullable Map getQueries() { + return queries; + } + + @Override + public boolean equals(@Nullable Object o) { + if (this == o) { + return true; + } + if (o == null || getClass() != o.getClass()) { + return false; + } + FhirSearchParameter that = (FhirSearchParameter) o; + return Objects.equals(resourceType, that.resourceType) + && Objects.equals(key, that.key) + && Objects.equals(queries, that.queries); + } + + @Override + public int hashCode() { + return Objects.hash(resourceType, queries); + } + + @Override + public String toString() { + return "FhirSearchParameter{" + + "resourceType='" + + resourceType + + '\'' + + ", key='" + + key + + '\'' + + ", queries=" + + queries + + '}'; + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirSearchParameterCoder.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirSearchParameterCoder.java new file mode 100644 index 000000000000..06c4c1cdee98 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirSearchParameterCoder.java @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.healthcare; + +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.util.Map; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.CustomCoder; +import org.apache.beam.sdk.coders.MapCoder; +import org.apache.beam.sdk.coders.NullableCoder; +import org.apache.beam.sdk.coders.StringUtf8Coder; + +/** + * FhirSearchParameterCoder is the coder for {@link FhirSearchParameter}, which takes a coder for + * type T. 
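FhirSearchParameter and FhirSearchParameterCoder (both added in this change) fit together as a simple value type plus coder. A minimal round trip using only names defined above, plus the standard CoderUtils helper:

```java
import java.util.Collections;
import java.util.Map;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.gcp.healthcare.FhirSearchParameter;
import org.apache.beam.sdk.io.gcp.healthcare.FhirSearchParameterCoder;
import org.apache.beam.sdk.util.CoderUtils;

public class FhirSearchParameterRoundTrip {
  public static void main(String[] args) throws Exception {
    Map<String, String> queries = Collections.singletonMap("family", "Smith");

    // Passing null for the key would default it to "" (see the constructor above).
    FhirSearchParameter<String> request =
        FhirSearchParameter.of("Patient", "patient-123", queries);

    FhirSearchParameterCoder<String> coder = FhirSearchParameterCoder.of(StringUtf8Coder.of());
    byte[] bytes = CoderUtils.encodeToByteArray(coder, request);
    FhirSearchParameter<String> decoded = CoderUtils.decodeFromByteArray(coder, bytes);

    // equals() compares resourceType, key and queries, so the round trip is lossless.
    assert request.equals(decoded);
  }
}
```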
+ */ +public class FhirSearchParameterCoder extends CustomCoder> { + + private static final NullableCoder STRING_CODER = NullableCoder.of(StringUtf8Coder.of()); + private final NullableCoder> originalCoder; + + FhirSearchParameterCoder(Coder originalCoder) { + this.originalCoder = NullableCoder.of(MapCoder.of(STRING_CODER, originalCoder)); + } + + public static FhirSearchParameterCoder of(Coder originalCoder) { + return new FhirSearchParameterCoder(originalCoder); + } + + @Override + public void encode(FhirSearchParameter value, OutputStream outStream) throws IOException { + STRING_CODER.encode(value.getResourceType(), outStream); + STRING_CODER.encode(value.getKey(), outStream); + originalCoder.encode(value.getQueries(), outStream); + } + + @Override + public FhirSearchParameter decode(InputStream inStream) throws IOException { + String resourceType = STRING_CODER.decode(inStream); + String key = STRING_CODER.decode(inStream); + Map queries = originalCoder.decode(inStream); + return FhirSearchParameter.of(resourceType, key, queries); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IO.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IO.java index 325c628c0c75..540b82e9ff33 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IO.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IO.java @@ -17,7 +17,7 @@ */ package org.apache.beam.sdk.io.gcp.healthcare; -import com.google.api.services.healthcare.v1beta1.model.Message; +import com.google.api.services.healthcare.v1.model.Message; import com.google.auto.value.AutoValue; import java.io.IOException; import java.text.ParseException; @@ -92,9 +92,9 @@ *

    Message Listing Message Listing with {@link HL7v2IO.ListHL7v2Messages} supports batch use * cases where you want to process all the messages in an HL7v2 store or those matching a * filter @see https://cloud.google.com/healthcare/docs/reference/rest/v1beta1/projects.locations.datasets.hl7V2Stores.messages/list#query-parameters + * href=>https://cloud.google.com/healthcare/docs/reference/rest/v1/projects.locations.datasets.hl7V2Stores.messages/list#query-parameters * This paginates through results of a Messages.List call @see https://cloud.google.com/healthcare/docs/reference/rest/v1beta1/projects.locations.datasets.hl7V2Stores.messages/list + * href=>https://cloud.google.com/healthcare/docs/reference/rest/v1/projects.locations.datasets.hl7V2Stores.messages/list * and outputs directly to a {@link PCollection} of {@link HL7v2Message}. In these use cases, the * error handling similar to above is unnecessary because we are listing from the source of truth * the pipeline should fail transparently if this transform fails to paginate through all the @@ -204,7 +204,7 @@ public static ListHL7v2Messages read(ValueProvider hl7v2Store) { * Read all HL7v2 Messages from a single store matching a filter. * * @see + * href=https://cloud.google.com/healthcare/docs/reference/rest/v1/projects.locations.datasets.hl7V2Stores.messages/list#query-parameters> */ public static ListHL7v2Messages readWithFilter(String hl7v2Store, String filter) { return new ListHL7v2Messages( @@ -216,7 +216,7 @@ public static ListHL7v2Messages readWithFilter(String hl7v2Store, String filter) * Read all HL7v2 Messages from a single store matching a filter. * * @see + * href=https://cloud.google.com/healthcare/docs/reference/rest/v1/projects.locations.datasets.hl7V2Stores.messages/list#query-parameters> */ public static ListHL7v2Messages readWithFilter( ValueProvider hl7v2Store, ValueProvider filter) { @@ -228,7 +228,7 @@ public static ListHL7v2Messages readWithFilter( * Read all HL7v2 Messages from a multiple stores matching a filter. * * @see + * href=https://cloud.google.com/healthcare/docs/reference/rest/v1/projects.locations.datasets.hl7V2Stores.messages/list#query-parameters> */ public static ListHL7v2Messages readAllWithFilter(List hl7v2Stores, String filter) { return new ListHL7v2Messages( @@ -239,7 +239,7 @@ public static ListHL7v2Messages readAllWithFilter(List hl7v2Stores, Stri * Read all HL7v2 Messages from a multiple stores matching a filter. * * @see + * href=https://cloud.google.com/healthcare/docs/reference/rest/v1/projects.locations.datasets.hl7V2Stores.messages/list#query-parameters> */ public static ListHL7v2Messages readAllWithFilter( ValueProvider> hl7v2Stores, ValueProvider filter) { @@ -248,7 +248,7 @@ public static ListHL7v2Messages readAllWithFilter( /** * Write with Messages.Ingest method. @see + * href=https://cloud.google.com/healthcare/docs/reference/rest/v1/projects.locations.datasets.hl7V2Stores.messages/ingest> * * @param hl7v2Store the hl 7 v 2 store * @return the write @@ -264,7 +264,7 @@ public static Write ingestMessages(String hl7v2Store) { * patterns would be {@link PubsubIO#readStrings()} reading a subscription on an HL7v2 Store's * notification channel topic or using {@link ListHL7v2Messages} to list HL7v2 message IDs with an * optional filter using Ingest write method. @see . + * href=https://cloud.google.com/healthcare/docs/reference/rest/v1/projects.locations.datasets.hl7V2Stores.messages/list>. 
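For completeness, the listing and ingest factories above (now pointing at the v1 endpoints) compose into the usual store-to-store copy. The sketch below uses only signatures shown in this file; store paths are placeholders and the filter string is illustrative.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.healthcare.HL7v2IO;
import org.apache.beam.sdk.io.gcp.healthcare.HL7v2Message;
import org.apache.beam.sdk.values.PCollection;

public class Hl7v2CopySketch {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();

    String sourceStore = "projects/p/locations/l/datasets/d/hl7V2Stores/source";
    String targetStore = "projects/p/locations/l/datasets/d/hl7V2Stores/target";

    // Batch-list messages matching a filter from the source store.
    PCollection<HL7v2Message> messages =
        pipeline.apply(HL7v2IO.readWithFilter(sourceStore, "messageType = \"ADT\""));

    // Ingest them into the target store; failures are exposed on the returned Write.Result.
    messages.apply(HL7v2IO.ingestMessages(targetStore));

    pipeline.run().waitUntilFinish();
  }
}
```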
*/ public static class Read extends PTransform, Read.Result> { @@ -410,12 +410,9 @@ public void processElement(ProcessContext context) { } private Message fetchMessage(HealthcareApiClient client, String msgId) - throws IOException, ParseException, IllegalArgumentException, InterruptedException { - long startTime = System.currentTimeMillis(); - + throws IOException, ParseException, IllegalArgumentException { try { - com.google.api.services.healthcare.v1beta1.model.Message msg = - client.getHL7v2Message(msgId); + com.google.api.services.healthcare.v1.model.Message msg = client.getHL7v2Message(msgId); if (msg == null) { throw new IOException(String.format("GET request for %s returned null", msgId)); } @@ -641,7 +638,7 @@ public Result expand(PCollection messages) { public enum WriteMethod { /** * Ingest write method. @see + * href=https://cloud.google.com/healthcare/docs/reference/rest/v1/projects.locations.datasets.hl7V2Stores.messages/ingest> */ INGEST, /** diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2Message.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2Message.java index 992d4c140f1a..b9a72f315b79 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2Message.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2Message.java @@ -18,8 +18,8 @@ package org.apache.beam.sdk.io.gcp.healthcare; import com.fasterxml.jackson.databind.ObjectMapper; -import com.google.api.services.healthcare.v1beta1.model.Message; -import com.google.api.services.healthcare.v1beta1.model.SchematizedData; +import com.google.api.services.healthcare.v1.model.Message; +import com.google.api.services.healthcare.v1.model.SchematizedData; import java.io.IOException; import java.util.Map; import org.checkerframework.checker.nullness.qual.Nullable; @@ -80,7 +80,9 @@ public Message toModel() { out.setCreateTime(this.getCreateTime()); out.setData(this.getData()); out.setSendFacility(this.getSendFacility()); - out.setSchematizedData(new SchematizedData().setData(this.schematizedData)); + if (this.schematizedData != null) { + out.setSchematizedData(new SchematizedData().setData(this.schematizedData)); + } out.setLabels(this.labels); return out; } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2MessageCoder.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2MessageCoder.java index d1af961531ed..b471fbb45500 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2MessageCoder.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2MessageCoder.java @@ -17,7 +17,7 @@ */ package org.apache.beam.sdk.io.gcp.healthcare; -import com.google.api.services.healthcare.v1beta1.model.Message; +import com.google.api.services.healthcare.v1.model.Message; import java.io.IOException; import java.io.InputStream; import java.io.OutputStream; diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HealthcareApiClient.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HealthcareApiClient.java index dbbedc8375d0..13b2d5c06502 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HealthcareApiClient.java +++ 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HealthcareApiClient.java @@ -17,16 +17,16 @@ */ package org.apache.beam.sdk.io.gcp.healthcare; -import com.google.api.services.healthcare.v1beta1.model.DeidentifyConfig; -import com.google.api.services.healthcare.v1beta1.model.DicomStore; -import com.google.api.services.healthcare.v1beta1.model.Empty; -import com.google.api.services.healthcare.v1beta1.model.FhirStore; -import com.google.api.services.healthcare.v1beta1.model.Hl7V2Store; -import com.google.api.services.healthcare.v1beta1.model.HttpBody; -import com.google.api.services.healthcare.v1beta1.model.IngestMessageResponse; -import com.google.api.services.healthcare.v1beta1.model.ListMessagesResponse; -import com.google.api.services.healthcare.v1beta1.model.Message; -import com.google.api.services.healthcare.v1beta1.model.Operation; +import com.google.api.services.healthcare.v1.model.DeidentifyConfig; +import com.google.api.services.healthcare.v1.model.DicomStore; +import com.google.api.services.healthcare.v1.model.Empty; +import com.google.api.services.healthcare.v1.model.FhirStore; +import com.google.api.services.healthcare.v1.model.Hl7V2Store; +import com.google.api.services.healthcare.v1.model.HttpBody; +import com.google.api.services.healthcare.v1.model.IngestMessageResponse; +import com.google.api.services.healthcare.v1.model.ListMessagesResponse; +import com.google.api.services.healthcare.v1.model.Message; +import com.google.api.services.healthcare.v1.model.Operation; import java.io.IOException; import java.net.URISyntaxException; import java.text.ParseException; @@ -182,7 +182,7 @@ HttpBody executeFhirBundle(String fhirStore, String bundle) HttpBody searchFhirResource( String fhirStore, String resourceType, - @Nullable Map parameters, + @Nullable Map parameters, String pageToken) throws IOException; diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HttpHealthcareApiClient.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HttpHealthcareApiClient.java index 47cbfe6d0d78..be07db0ab856 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HttpHealthcareApiClient.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HttpHealthcareApiClient.java @@ -24,30 +24,30 @@ import com.google.api.client.http.javanet.NetHttpTransport; import com.google.api.client.json.JsonFactory; import com.google.api.client.json.jackson2.JacksonFactory; -import com.google.api.services.healthcare.v1beta1.CloudHealthcare; -import com.google.api.services.healthcare.v1beta1.CloudHealthcare.Projects.Locations.Datasets.FhirStores.Fhir.Search; -import com.google.api.services.healthcare.v1beta1.CloudHealthcare.Projects.Locations.Datasets.Hl7V2Stores.Messages; -import com.google.api.services.healthcare.v1beta1.CloudHealthcareScopes; -import com.google.api.services.healthcare.v1beta1.model.CreateMessageRequest; -import com.google.api.services.healthcare.v1beta1.model.DeidentifyConfig; -import com.google.api.services.healthcare.v1beta1.model.DeidentifyFhirStoreRequest; -import com.google.api.services.healthcare.v1beta1.model.DicomStore; -import com.google.api.services.healthcare.v1beta1.model.Empty; -import com.google.api.services.healthcare.v1beta1.model.ExportResourcesRequest; -import com.google.api.services.healthcare.v1beta1.model.FhirStore; -import 
com.google.api.services.healthcare.v1beta1.model.GoogleCloudHealthcareV1beta1FhirRestGcsDestination; -import com.google.api.services.healthcare.v1beta1.model.GoogleCloudHealthcareV1beta1FhirRestGcsSource; -import com.google.api.services.healthcare.v1beta1.model.Hl7V2Store; -import com.google.api.services.healthcare.v1beta1.model.HttpBody; -import com.google.api.services.healthcare.v1beta1.model.ImportResourcesRequest; -import com.google.api.services.healthcare.v1beta1.model.IngestMessageRequest; -import com.google.api.services.healthcare.v1beta1.model.IngestMessageResponse; -import com.google.api.services.healthcare.v1beta1.model.ListFhirStoresResponse; -import com.google.api.services.healthcare.v1beta1.model.ListMessagesResponse; -import com.google.api.services.healthcare.v1beta1.model.Message; -import com.google.api.services.healthcare.v1beta1.model.NotificationConfig; -import com.google.api.services.healthcare.v1beta1.model.Operation; -import com.google.api.services.healthcare.v1beta1.model.SearchResourcesRequest; +import com.google.api.services.healthcare.v1.CloudHealthcare; +import com.google.api.services.healthcare.v1.CloudHealthcare.Projects.Locations.Datasets.FhirStores.Fhir.Search; +import com.google.api.services.healthcare.v1.CloudHealthcare.Projects.Locations.Datasets.Hl7V2Stores.Messages; +import com.google.api.services.healthcare.v1.CloudHealthcareScopes; +import com.google.api.services.healthcare.v1.model.CreateMessageRequest; +import com.google.api.services.healthcare.v1.model.DeidentifyConfig; +import com.google.api.services.healthcare.v1.model.DeidentifyFhirStoreRequest; +import com.google.api.services.healthcare.v1.model.DicomStore; +import com.google.api.services.healthcare.v1.model.Empty; +import com.google.api.services.healthcare.v1.model.ExportResourcesRequest; +import com.google.api.services.healthcare.v1.model.FhirStore; +import com.google.api.services.healthcare.v1.model.GoogleCloudHealthcareV1FhirGcsDestination; +import com.google.api.services.healthcare.v1.model.GoogleCloudHealthcareV1FhirGcsSource; +import com.google.api.services.healthcare.v1.model.Hl7V2Store; +import com.google.api.services.healthcare.v1.model.HttpBody; +import com.google.api.services.healthcare.v1.model.ImportResourcesRequest; +import com.google.api.services.healthcare.v1.model.IngestMessageRequest; +import com.google.api.services.healthcare.v1.model.IngestMessageResponse; +import com.google.api.services.healthcare.v1.model.ListFhirStoresResponse; +import com.google.api.services.healthcare.v1.model.ListMessagesResponse; +import com.google.api.services.healthcare.v1.model.Message; +import com.google.api.services.healthcare.v1.model.NotificationConfig; +import com.google.api.services.healthcare.v1.model.Operation; +import com.google.api.services.healthcare.v1.model.SearchResourcesRequest; import com.google.api.services.storage.StorageScopes; import com.google.auth.oauth2.GoogleCredentials; import com.google.gson.JsonArray; @@ -322,7 +322,7 @@ public Instant getEarliestHL7v2SendTime(String hl7v2Store, @Nullable String filt return Instant.ofEpochMilli(0); } // sendTime is conveniently RFC3339 UTC "Zulu" - // https://cloud.google.com/healthcare/docs/reference/rest/v1beta1/projects.locations.datasets.hl7V2Stores.messages#Message + // https://cloud.google.com/healthcare/docs/reference/rest/v1/projects.locations.datasets.hl7V2Stores.messages#Message return Instant.parse(sendTime); } @@ -358,7 +358,7 @@ public Instant getLatestHL7v2SendTime(String hl7v2Store, @Nullable String filter return 
Instant.now(); } // sendTime is conveniently RFC3339 UTC "Zulu" - // https://cloud.google.com/healthcare/docs/reference/rest/v1beta1/projects.locations.datasets.hl7V2Stores.messages#Message + // https://cloud.google.com/healthcare/docs/reference/rest/v1/projects.locations.datasets.hl7V2Stores.messages#Message return Instant.parse(sendTime); } @@ -491,8 +491,7 @@ public Message createHL7v2Message(String hl7v2Store, Message msg) throws IOExcep public Operation importFhirResource( String fhirStore, String gcsSourcePath, @Nullable String contentStructure) throws IOException { - GoogleCloudHealthcareV1beta1FhirRestGcsSource gcsSrc = - new GoogleCloudHealthcareV1beta1FhirRestGcsSource(); + GoogleCloudHealthcareV1FhirGcsSource gcsSrc = new GoogleCloudHealthcareV1FhirGcsSource(); gcsSrc.setUri(gcsSourcePath); ImportResourcesRequest importRequest = new ImportResourcesRequest(); @@ -509,9 +508,8 @@ public Operation importFhirResource( @Override public Operation exportFhirResourceToGcs(String fhirStore, String gcsDestinationPrefix) throws IOException { - GoogleCloudHealthcareV1beta1FhirRestGcsDestination gcsDst = - new GoogleCloudHealthcareV1beta1FhirRestGcsDestination(); - + GoogleCloudHealthcareV1FhirGcsDestination gcsDst = + new GoogleCloudHealthcareV1FhirGcsDestination(); gcsDst.setUriPrefix(gcsDestinationPrefix); ExportResourcesRequest exportRequest = new ExportResourcesRequest(); exportRequest.setGcsDestination(gcsDst); @@ -564,7 +562,7 @@ public HttpBody executeFhirBundle(String fhirStore, String bundle) StringEntity requestEntity = new StringEntity(bundle, ContentType.APPLICATION_JSON); URI uri; try { - uri = new URIBuilder(client.getRootUrl() + "v1beta1/" + fhirStore + "/fhir").build(); + uri = new URIBuilder(client.getRootUrl() + "v1/" + fhirStore + "/fhir").build(); } catch (URISyntaxException e) { LOG.error("URL error when making executeBundle request to FHIR API. 
" + e.getMessage()); throw new IllegalArgumentException(e); @@ -639,7 +637,7 @@ public HttpBody readFhirResource(String resourceId) throws IOException { public HttpBody searchFhirResource( String fhirStore, String resourceType, - @Nullable Map parameters, + @Nullable Map parameters, String pageToken) throws IOException { SearchResourcesRequest request = new SearchResourcesRequest().setResourceType(resourceType); @@ -868,7 +866,7 @@ public static class FhirResourcePages implements Iterable { private final String fhirStore; private final String resourceType; - private final Map parameters; + private final Map parameters; private transient HealthcareApiClient client; /** @@ -883,7 +881,7 @@ public static class FhirResourcePages implements Iterable { HealthcareApiClient client, String fhirStore, String resourceType, - @Nullable Map parameters) { + @Nullable Map parameters) { this.client = client; this.fhirStore = fhirStore; this.resourceType = resourceType; @@ -905,7 +903,7 @@ public static HttpBody makeSearchRequest( HealthcareApiClient client, String fhirStore, String resourceType, - @Nullable Map parameters, + @Nullable Map parameters, String pageToken) throws IOException { return client.searchFhirResource(fhirStore, resourceType, parameters, pageToken); @@ -922,7 +920,7 @@ public static class FhirResourcePagesIterator implements Iterator { private final String fhirStore; private final String resourceType; - private final Map parameters; + private final Map parameters; private HealthcareApiClient client; private String pageToken; private boolean isFirstRequest; @@ -940,7 +938,7 @@ public static class FhirResourcePagesIterator implements Iterator { HealthcareApiClient client, String fhirStore, String resourceType, - @Nullable Map parameters) { + @Nullable Map parameters) { this.client = client; this.fhirStore = fhirStore; this.resourceType = resourceType; @@ -961,7 +959,7 @@ public boolean hasNext() throws NoSuchElementException { JsonObject jsonResponse = JsonParser.parseString(mapper.writeValueAsString(response)).getAsJsonObject(); JsonArray resources = jsonResponse.getAsJsonArray("entry"); - return resources.size() != 0; + return resources != null && resources.size() != 0; } catch (IOException e) { throw new NoSuchElementException( String.format( diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTableProviderCSVIT.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/JsonArrayCoder.java similarity index 52% rename from sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTableProviderCSVIT.java rename to sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/JsonArrayCoder.java index 935ac833442a..6f4c2db538a2 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTableProviderCSVIT.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/JsonArrayCoder.java @@ -15,28 +15,34 @@ * See the License for the specific language governing permissions and * limitations under the License. 
*/ -package org.apache.beam.sdk.extensions.sql.meta.provider.kafka; +package org.apache.beam.sdk.io.gcp.healthcare; -import static java.nio.charset.StandardCharsets.UTF_8; -import static org.apache.beam.sdk.extensions.sql.impl.schema.BeamTableUtils.beamRow2CsvLine; - -import org.apache.commons.csv.CSVFormat; -import org.apache.kafka.clients.producer.ProducerRecord; +import com.google.gson.JsonArray; +import com.google.gson.JsonParser; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import org.apache.beam.sdk.coders.CustomCoder; +import org.apache.beam.sdk.coders.StringUtf8Coder; @SuppressWarnings({ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) -public class KafkaTableProviderCSVIT extends KafkaTableProviderIT { +public class JsonArrayCoder extends CustomCoder { + private static final JsonArrayCoder CODER = new JsonArrayCoder(); + private static final StringUtf8Coder STRING_CODER = StringUtf8Coder.of(); + + public static JsonArrayCoder of() { + return CODER; + } + @Override - protected ProducerRecord generateProducerRecord(int i) { - return new ProducerRecord<>( - kafkaOptions.getKafkaTopic(), - "k" + i, - beamRow2CsvLine(generateRow(i), CSVFormat.DEFAULT).getBytes(UTF_8)); + public void encode(JsonArray value, OutputStream outStream) throws IOException { + STRING_CODER.encode(value.toString(), outStream); } @Override - protected String getPayloadFormat() { - return null; + public JsonArray decode(InputStream inStream) throws IOException { + return JsonParser.parseString(STRING_CODER.decode(inStream)).getAsJsonArray(); } } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/AddTimestampAttribute.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/AddTimestampAttribute.java new file mode 100644 index 000000000000..66eed6e86d4c --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/AddTimestampAttribute.java @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.pubsub; + +import static org.apache.beam.sdk.io.gcp.pubsub.PubsubMessageToRow.TIMESTAMP_FIELD; + +import org.apache.beam.sdk.schemas.transforms.DropFields; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.WithTimestamps; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * Adds a timestamp attribute if desired and filters it out of the underlying row if no timestamp + * attribute exists. 
+ */ +@SuppressWarnings({ + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) +class AddTimestampAttribute extends PTransform, PCollection> { + private static final Logger LOG = LoggerFactory.getLogger(AddTimestampAttribute.class); + private final boolean useTimestampAttribute; + + AddTimestampAttribute(boolean useTimestampAttribute) { + this.useTimestampAttribute = useTimestampAttribute; + } + + @Override + public PCollection expand(PCollection input) { + // If a timestamp attribute is used, make sure the TIMESTAMP_FIELD is propagated to the + // element's event time. PubSubIO will populate the attribute from there. + PCollection withTimestamp = + useTimestampAttribute + ? input.apply(WithTimestamps.of((row) -> row.getDateTime(TIMESTAMP_FIELD).toInstant())) + : input; + + PCollection rows; + if (withTimestamp.getSchema().hasField(TIMESTAMP_FIELD)) { + if (!useTimestampAttribute) { + // Warn the user if they're writing data to TIMESTAMP_FIELD, but event timestamp is mapped + // to publish time. The data will be dropped. + LOG.warn( + String.format( + "Dropping output field '%s' before writing to PubSub because this is a read-only " + + "column. To preserve this information you must configure a timestamp attribute.", + TIMESTAMP_FIELD)); + } + rows = withTimestamp.apply(DropFields.fields(TIMESTAMP_FIELD)); + } else { + rows = withTimestamp; + } + + return rows; + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/NestedRowToMessage.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/NestedRowToMessage.java new file mode 100644 index 000000000000..c5cae3416c3d --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/NestedRowToMessage.java @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.pubsub; + +import static org.apache.beam.sdk.io.gcp.pubsub.PubsubMessageToRow.ATTRIBUTES_FIELD; +import static org.apache.beam.sdk.io.gcp.pubsub.PubsubMessageToRow.PAYLOAD_FIELD; +import static org.apache.beam.sdk.io.gcp.pubsub.PubsubSchemaIOProvider.ATTRIBUTE_ARRAY_FIELD_TYPE; +import static org.apache.beam.sdk.io.gcp.pubsub.PubsubSchemaIOProvider.ATTRIBUTE_MAP_FIELD_TYPE; +import static org.apache.beam.sdk.util.Preconditions.checkArgumentNotNull; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import java.util.Collection; +import java.util.Map; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.schemas.Schema.TypeName; +import org.apache.beam.sdk.schemas.io.payloads.PayloadSerializer; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.transforms.SimpleFunction; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; + +class NestedRowToMessage extends SimpleFunction { + private static final long serialVersionUID = 65176815766314684L; + + private final PayloadSerializer serializer; + private final SerializableFunction> attributesExtractor; + private final SerializableFunction payloadExtractor; + + @SuppressWarnings("methodref.receiver.bound.invalid") + NestedRowToMessage(PayloadSerializer serializer, Schema schema) { + this.serializer = serializer; + if (schema.getField(ATTRIBUTES_FIELD).getType().equals(ATTRIBUTE_MAP_FIELD_TYPE)) { + attributesExtractor = NestedRowToMessage::getAttributesFromMap; + } else { + checkArgument(schema.getField(ATTRIBUTES_FIELD).getType().equals(ATTRIBUTE_ARRAY_FIELD_TYPE)); + attributesExtractor = NestedRowToMessage::getAttributesFromArray; + } + if (schema.getField(PAYLOAD_FIELD).getType().equals(FieldType.BYTES)) { + payloadExtractor = NestedRowToMessage::getPayloadFromBytes; + } else { + checkArgument(schema.getField(PAYLOAD_FIELD).getType().getTypeName().equals(TypeName.ROW)); + payloadExtractor = this::getPayloadFromNested; + } + } + + private static Map getAttributesFromMap(Row row) { + return ImmutableMap.builder() + .putAll(checkArgumentNotNull(row.getMap(ATTRIBUTES_FIELD))) + .build(); + } + + private static Map getAttributesFromArray(Row row) { + ImmutableMap.Builder attributes = ImmutableMap.builder(); + Collection attributeEntries = checkArgumentNotNull(row.getArray(ATTRIBUTES_FIELD)); + for (Row entry : attributeEntries) { + attributes.put( + checkArgumentNotNull(entry.getString("key")), + checkArgumentNotNull(entry.getString("value"))); + } + return attributes.build(); + } + + private static byte[] getPayloadFromBytes(Row row) { + return checkArgumentNotNull(row.getBytes(PAYLOAD_FIELD)); + } + + private byte[] getPayloadFromNested(Row row) { + return serializer.serialize(checkArgumentNotNull(row.getRow(PAYLOAD_FIELD))); + } + + @Override + public PubsubMessage apply(Row row) { + return new PubsubMessage(payloadExtractor.apply(row), attributesExtractor.apply(row)); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubSubPayloadTranslation.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubSubPayloadTranslation.java new file mode 100644 index 000000000000..a1a267507bc3 --- /dev/null +++ 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubSubPayloadTranslation.java @@ -0,0 +1,159 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.pubsub; + +import com.google.auto.service.AutoService; +import java.util.Collections; +import java.util.Map; +import org.apache.beam.model.pipeline.v1.RunnerApi; +import org.apache.beam.model.pipeline.v1.RunnerApi.FunctionSpec; +import org.apache.beam.model.pipeline.v1.RunnerApi.PubSubReadPayload; +import org.apache.beam.model.pipeline.v1.RunnerApi.PubSubWritePayload; +import org.apache.beam.runners.core.construction.PTransformTranslation; +import org.apache.beam.runners.core.construction.PTransformTranslation.TransformPayloadTranslator; +import org.apache.beam.runners.core.construction.SdkComponents; +import org.apache.beam.runners.core.construction.TransformPayloadTranslatorRegistrar; +import org.apache.beam.sdk.io.Read; +import org.apache.beam.sdk.io.Read.Unbounded; +import org.apache.beam.sdk.io.gcp.pubsub.PubsubClient.SubscriptionPath; +import org.apache.beam.sdk.io.gcp.pubsub.PubsubClient.TopicPath; +import org.apache.beam.sdk.io.gcp.pubsub.PubsubUnboundedSource.PubsubSource; +import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.options.ValueProvider.NestedValueProvider; +import org.apache.beam.sdk.runners.AppliedPTransform; +import org.apache.beam.sdk.transforms.PTransform; + +@SuppressWarnings({ + "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) +/** + * Utility methods for translating a {@link Unbounded} which reads from {@link + * PubsubUnboundedSource} to {@link RunnerApi} representations. 
+ */ +public class PubSubPayloadTranslation { + static class PubSubReadPayloadTranslator + implements TransformPayloadTranslator> { + + @Override + public String getUrn(Read.Unbounded transform) { + if (!(transform.getSource() instanceof PubsubUnboundedSource.PubsubSource)) { + return null; + } + return PTransformTranslation.PUBSUB_READ; + } + + @Override + public RunnerApi.FunctionSpec translate( + AppliedPTransform> transform, SdkComponents components) { + if (!(transform.getTransform().getSource() instanceof PubsubUnboundedSource.PubsubSource)) { + return null; + } + PubSubReadPayload.Builder payloadBuilder = PubSubReadPayload.newBuilder(); + PubsubUnboundedSource pubsubUnboundedSource = + ((PubsubSource) transform.getTransform().getSource()).outer; + ValueProvider topicProvider = pubsubUnboundedSource.getTopicProvider(); + if (topicProvider != null) { + if (topicProvider.isAccessible()) { + payloadBuilder.setTopic(topicProvider.get().getFullPath()); + } else { + payloadBuilder.setTopicRuntimeOverridden( + ((NestedValueProvider) topicProvider).propertyName()); + } + } + ValueProvider subscriptionProvider = + pubsubUnboundedSource.getSubscriptionProvider(); + if (subscriptionProvider != null) { + if (subscriptionProvider.isAccessible()) { + payloadBuilder.setSubscription(subscriptionProvider.get().getFullPath()); + } else { + payloadBuilder.setSubscriptionRuntimeOverridden( + ((NestedValueProvider) subscriptionProvider).propertyName()); + } + } + + if (pubsubUnboundedSource.getTimestampAttribute() != null) { + payloadBuilder.setTimestampAttribute(pubsubUnboundedSource.getTimestampAttribute()); + } + if (pubsubUnboundedSource.getIdAttribute() != null) { + payloadBuilder.setIdAttribute(pubsubUnboundedSource.getIdAttribute()); + } + payloadBuilder.setWithAttributes( + pubsubUnboundedSource.getNeedsAttributes() || pubsubUnboundedSource.getNeedsMessageId()); + return FunctionSpec.newBuilder() + .setUrn(getUrn(transform.getTransform())) + .setPayload(payloadBuilder.build().toByteString()) + .build(); + } + } + + static class PubSubWritePayloadTranslator + implements TransformPayloadTranslator { + + @Override + public String getUrn(PubsubUnboundedSink.PubsubSink transform) { + return PTransformTranslation.PUBSUB_WRITE; + } + + @Override + public RunnerApi.FunctionSpec translate( + AppliedPTransform transform, + SdkComponents components) { + PubSubWritePayload.Builder payloadBuilder = PubSubWritePayload.newBuilder(); + ValueProvider topicProvider = transform.getTransform().outer.getTopicProvider(); + if (topicProvider.isAccessible()) { + payloadBuilder.setTopic(topicProvider.get().getFullPath()); + } else { + payloadBuilder.setTopicRuntimeOverridden( + ((NestedValueProvider) topicProvider).propertyName()); + } + if (transform.getTransform().outer.getTimestampAttribute() != null) { + payloadBuilder.setTimestampAttribute( + transform.getTransform().outer.getTimestampAttribute()); + } + if (transform.getTransform().outer.getIdAttribute() != null) { + payloadBuilder.setIdAttribute(transform.getTransform().outer.getIdAttribute()); + } + return FunctionSpec.newBuilder() + .setUrn(getUrn(transform.getTransform())) + .setPayload(payloadBuilder.build().toByteString()) + .build(); + } + } + + @AutoService(TransformPayloadTranslatorRegistrar.class) + public static class WriteRegistrar implements TransformPayloadTranslatorRegistrar { + + @Override + public Map, ? 
extends TransformPayloadTranslator> + getTransformPayloadTranslators() { + return Collections.singletonMap( + PubsubUnboundedSink.PubsubSink.class, new PubSubWritePayloadTranslator()); + } + } + + @AutoService(TransformPayloadTranslatorRegistrar.class) + public static class ReadRegistrar implements TransformPayloadTranslatorRegistrar { + + @Override + public Map, ? extends TransformPayloadTranslator> + getTransformPayloadTranslators() { + return Collections.singletonMap(Read.Unbounded.class, new PubSubReadPayloadTranslator()); + } + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubClient.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubClient.java index 8fa8690b424f..936092a3ba06 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubClient.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubClient.java @@ -31,8 +31,8 @@ import java.util.Map; import java.util.concurrent.ThreadLocalRandom; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Objects; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Splitter; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Strings; import org.checkerframework.checker.nullness.qual.Nullable; /** An (abstract) helper class for talking to Pubsub via an underlying transport. */ @@ -58,13 +58,12 @@ PubsubClient newClient( } /** - * Return timestamp as ms-since-unix-epoch corresponding to {@code timestamp}. Return {@literal - * null} if no timestamp could be found. Throw {@link IllegalArgumentException} if timestamp - * cannot be recognized. + * Return timestamp as ms-since-unix-epoch corresponding to {@code timestamp}. Throw {@link + * IllegalArgumentException} if timestamp cannot be recognized. */ - private static @Nullable Long asMsSinceEpoch(@Nullable String timestamp) { - if (Strings.isNullOrEmpty(timestamp)) { - return null; + protected static Long parseTimestampAsMsSinceEpoch(String timestamp) { + if (timestamp.isEmpty()) { + throw new IllegalArgumentException("Empty timestamp."); } try { // Try parsing as milliseconds since epoch. Note there is no way to parse a @@ -81,40 +80,28 @@ PubsubClient newClient( /** * Return the timestamp (in ms since unix epoch) to use for a Pubsub message with {@code - * attributes} and {@code pubsubTimestamp}. + * timestampAttribute} and {@code attriutes}. * - *
<p>
    If {@code timestampAttribute} is non-{@literal null} then the message attributes must - * contain that attribute, and the value of that attribute will be taken as the timestamp. - * Otherwise the timestamp will be taken from the Pubsub publish timestamp {@code - * pubsubTimestamp}. + *
<p>
    The message attributes must contain {@code timestampAttribute}, and the value of that + * attribute will be taken as the timestamp. * * @throws IllegalArgumentException if the timestamp cannot be recognized as a ms-since-unix-epoch * or RFC3339 time. */ - protected static long extractTimestamp( - @Nullable String timestampAttribute, - @Nullable String pubsubTimestamp, - @Nullable Map attributes) { - Long timestampMsSinceEpoch; - if (Strings.isNullOrEmpty(timestampAttribute)) { - timestampMsSinceEpoch = asMsSinceEpoch(pubsubTimestamp); - checkArgument( - timestampMsSinceEpoch != null, - "Cannot interpret PubSub publish timestamp: %s", - pubsubTimestamp); - } else { - String value = attributes == null ? null : attributes.get(timestampAttribute); - checkArgument( - value != null, - "PubSub message is missing a value for timestamp attribute %s", - timestampAttribute); - timestampMsSinceEpoch = asMsSinceEpoch(value); - checkArgument( - timestampMsSinceEpoch != null, - "Cannot interpret value of attribute %s as timestamp: %s", - timestampAttribute, - value); - } + protected static long extractTimestampAttribute( + String timestampAttribute, @Nullable Map attributes) { + Preconditions.checkState(!timestampAttribute.isEmpty()); + String value = attributes == null ? null : attributes.get(timestampAttribute); + checkArgument( + value != null, + "PubSub message is missing a value for timestamp attribute %s", + timestampAttribute); + Long timestampMsSinceEpoch = parseTimestampAsMsSinceEpoch(value); + checkArgument( + timestampMsSinceEpoch != null, + "Cannot interpret value of attribute %s as timestamp: %s", + timestampAttribute, + value); return timestampMsSinceEpoch; } @@ -202,7 +189,7 @@ public String getName() { return subscriptionName; } - public String getV1Beta1Path() { + public String getFullPath() { return String.format("/subscriptions/%s/%s", projectId, subscriptionName); } @@ -259,7 +246,7 @@ public String getName() { return splits.get(3); } - public String getV1Beta1Path() { + public String getFullPath() { List splits = Splitter.on('/').splitToList(path); checkState(splits.size() == 4, "Malformed topic path %s", path); return String.format("/topics/%s/%s", splits.get(1), splits.get(3)); diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubDlqProvider.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubDlqProvider.java new file mode 100644 index 000000000000..c950f7c1f05e --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubDlqProvider.java @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
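The refactor above separates publish-time handling from attribute handling and makes the configured timestamp attribute strictly required on every message. On the user side the configuration is unchanged; a small sketch with the public PubsubIO API, where the subscription and attribute names are placeholders:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.values.PCollection;

public class TimestampAttributeSketch {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();

    // With a timestamp attribute configured, every message must carry "event_ts" as either
    // ms-since-epoch or an RFC 3339 string; without the configuration, publish time is used.
    PCollection<String> events =
        pipeline.apply(
            PubsubIO.readStrings()
                .fromSubscription("projects/p/subscriptions/events-sub")
                .withTimestampAttribute("event_ts"));

    pipeline.run();
  }
}
```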
+ */ +package org.apache.beam.sdk.io.gcp.pubsub; + +import com.google.auto.service.AutoService; +import org.apache.beam.sdk.annotations.Internal; +import org.apache.beam.sdk.schemas.io.Failure; +import org.apache.beam.sdk.schemas.io.GenericDlqProvider; +import org.apache.beam.sdk.transforms.MapElements; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PDone; +import org.apache.beam.sdk.values.TypeDescriptor; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; + +@Internal +@AutoService(GenericDlqProvider.class) +public class PubsubDlqProvider implements GenericDlqProvider { + @Override + public String identifier() { + return "pubsub"; + } + + @Override + public PTransform, PDone> newDlqTransform(String config) { + return new DlqTransform(config); + } + + private static class DlqTransform extends PTransform, PDone> { + private final String topic; + + DlqTransform(String topic) { + this.topic = topic; + } + + @Override + public PDone expand(PCollection input) { + return input + .apply( + "Failure to PubsubMessage", + MapElements.into(TypeDescriptor.of(PubsubMessage.class)) + .via(DlqTransform::getMessage)) + .apply("Write Failures to Pub/Sub", PubsubIO.writeMessages().to(topic)); + } + + private static PubsubMessage getMessage(Failure failure) { + return new PubsubMessage( + failure.getPayload(), ImmutableMap.of("beam-dlq-error", failure.getError())); + } + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubGrpcClient.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubGrpcClient.java index 06187419218f..43b932e80e35 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubGrpcClient.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubGrpcClient.java @@ -17,6 +17,7 @@ */ package org.apache.beam.sdk.io.gcp.pubsub; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; import com.google.auth.Credentials; @@ -230,14 +231,15 @@ public List pull( @Nullable Map attributes = pubsubMessage.getAttributes(); // Timestamp. - String pubsubTimestampString = null; - Timestamp timestampProto = pubsubMessage.getPublishTime(); - if (timestampProto != null) { - pubsubTimestampString = - String.valueOf(timestampProto.getSeconds() + timestampProto.getNanos() / 1000L); + long timestampMsSinceEpoch; + if (Strings.isNullOrEmpty(timestampAttribute)) { + Timestamp timestampProto = pubsubMessage.getPublishTime(); + checkArgument(timestampProto != null, "Pubsub message is missing timestamp proto"); + timestampMsSinceEpoch = + timestampProto.getSeconds() * 1000 + timestampProto.getNanos() / 1000L / 1000L; + } else { + timestampMsSinceEpoch = extractTimestampAttribute(timestampAttribute, attributes); } - long timestampMsSinceEpoch = - extractTimestamp(timestampAttribute, pubsubTimestampString, attributes); // Ack id. 
String ackId = message.getAckId(); diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.java index 601ed1c82874..2ade12941d7b 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.java @@ -22,6 +22,9 @@ import com.google.api.client.util.Clock; import com.google.auto.value.AutoValue; import com.google.protobuf.ByteString; +import com.google.protobuf.Descriptors.Descriptor; +import com.google.protobuf.DynamicMessage; +import com.google.protobuf.InvalidProtocolBufferException; import com.google.protobuf.Message; import java.io.IOException; import java.io.Serializable; @@ -42,6 +45,8 @@ import org.apache.beam.sdk.coders.CoderException; import org.apache.beam.sdk.coders.StringUtf8Coder; import org.apache.beam.sdk.extensions.protobuf.ProtoCoder; +import org.apache.beam.sdk.extensions.protobuf.ProtoDomain; +import org.apache.beam.sdk.extensions.protobuf.ProtoDynamicMessageSchema; import org.apache.beam.sdk.io.gcp.pubsub.PubsubClient.OutgoingMessage; import org.apache.beam.sdk.io.gcp.pubsub.PubsubClient.SubscriptionPath; import org.apache.beam.sdk.io.gcp.pubsub.PubsubClient.TopicPath; @@ -57,9 +62,13 @@ import org.apache.beam.sdk.transforms.ParDo; import org.apache.beam.sdk.transforms.SerializableFunction; import org.apache.beam.sdk.transforms.SimpleFunction; +import org.apache.beam.sdk.transforms.WithFailures; +import org.apache.beam.sdk.transforms.WithFailures.Result; import org.apache.beam.sdk.transforms.display.DisplayData; import org.apache.beam.sdk.transforms.windowing.AfterWatermark; import org.apache.beam.sdk.util.CoderUtils; +import org.apache.beam.sdk.values.EncodableThrowable; +import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PBegin; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PDone; @@ -483,6 +492,54 @@ public static Read readProtos(Class messageClass) { return Read.newBuilder(parsePayloadUsingCoder(coder)).setCoder(coder).build(); } + /** + * Returns a {@link PTransform} that continuously reads binary encoded protobuf messages for the + * type specified by {@code fullMessageName}. + * + *

    This is primarily here for cases where the message type cannot be known at compile time. If + * it can be known, prefer {@link PubsubIO#readProtos(Class)}, as {@link DynamicMessage} tends to + * perform worse than concrete types. + * + *
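A minimal sketch of how {@code readProtoDynamicMessages} might be wired into a pipeline, assuming placeholder project and subscription names and using the well-known protobuf {@code Timestamp} descriptor as a stand-in for a descriptor that is only resolved at runtime:

    import com.google.protobuf.Descriptors.Descriptor;
    import com.google.protobuf.DynamicMessage;
    import com.google.protobuf.Timestamp;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.extensions.protobuf.ProtoDomain;
    import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.values.PCollection;

    public class ReadDynamicProtoSketch {
      public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
        // Stand-in for a descriptor resolved at runtime (e.g. loaded from a FileDescriptorSet).
        Descriptor descriptor = Timestamp.getDescriptor();
        ProtoDomain domain = ProtoDomain.buildFrom(descriptor);
        PCollection<DynamicMessage> events =
            pipeline.apply(
                "ReadDynamic",
                PubsubIO.readProtoDynamicMessages(domain, descriptor.getFullName())
                    .fromSubscription("projects/my-project/subscriptions/my-sub"));
        pipeline.run();
      }
    }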

    Beam will infer a schema for the {@link DynamicMessage} schema. Note that some proto schema + * features are not supported by all sinks. + * + * @param domain The {@link ProtoDomain} that contains the target message and its dependencies. + * @param fullMessageName The full name of the message for lookup in {@code domain}. + */ + @Experimental(Kind.SCHEMAS) + public static Read readProtoDynamicMessages( + ProtoDomain domain, String fullMessageName) { + SerializableFunction parser = + message -> { + try { + return DynamicMessage.parseFrom( + domain.getDescriptor(fullMessageName), message.getPayload()); + } catch (InvalidProtocolBufferException e) { + throw new RuntimeException("Could not parse Pub/Sub message", e); + } + }; + + ProtoDynamicMessageSchema schema = + ProtoDynamicMessageSchema.forDescriptor(domain, domain.getDescriptor(fullMessageName)); + return Read.newBuilder(parser) + .setCoder( + SchemaCoder.of( + schema.getSchema(), + TypeDescriptor.of(DynamicMessage.class), + schema.getToRowFunction(), + schema.getFromRowFunction())) + .build(); + } + + /** + * Similar to {@link PubsubIO#readProtoDynamicMessages(ProtoDomain, String)} but for when the + * {@link Descriptor} is already known. + */ + @Experimental(Kind.SCHEMAS) + public static Read readProtoDynamicMessages(Descriptor descriptor) { + return readProtoDynamicMessages(ProtoDomain.buildFrom(descriptor), descriptor.getFullName()); + } + /** * Returns A {@link PTransform} that continuously reads binary encoded Avro messages of the given * type from a Google Cloud Pub/Sub stream. @@ -590,6 +647,8 @@ public abstract static class Read extends PTransform> abstract @Nullable ValueProvider getTopicProvider(); + abstract @Nullable ValueProvider getDeadLetterTopicProvider(); + abstract PubsubClient.PubsubClientFactory getPubsubClientFactory(); abstract @Nullable ValueProvider getSubscriptionProvider(); @@ -640,6 +699,8 @@ static Builder newBuilder() { abstract static class Builder { abstract Builder setTopicProvider(ValueProvider topic); + abstract Builder setDeadLetterTopicProvider(ValueProvider deadLetterTopic); + abstract Builder setPubsubClientFactory(PubsubClient.PubsubClientFactory clientFactory); abstract Builder setSubscriptionProvider(ValueProvider subscription); @@ -710,17 +771,62 @@ public Read fromTopic(String topic) { return fromTopic(StaticValueProvider.of(topic)); } - /** Like {@code topic()} but with a {@link ValueProvider}. */ + /** Like {@link Read#fromTopic(String)} but with a {@link ValueProvider}. */ public Read fromTopic(ValueProvider topic) { - if (topic.isAccessible()) { - // Validate. - PubsubTopic.fromPath(topic.get()); - } + validateTopic(topic); return toBuilder() .setTopicProvider(NestedValueProvider.of(topic, PubsubTopic::fromPath)) .build(); } + /** + * Creates and returns a transform for writing read failures out to a dead-letter topic. + * + *

    The message written to the dead-letter topic will contain three attributes (a reader-side sketch follows the list): + *

      + *
    • exceptionClassName: The type of exception that was thrown. + *
    • exceptionMessage: The message in the exception + *
    • pubsubMessageId: The message id of the original Pub/Sub message if it was read in, + * otherwise "" + *
    + * + *
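A reader-side sketch, assuming a placeholder dead-letter subscription, showing how a downstream consumer might surface these three attributes; {@code LogDeadLetterAttributesFn} is a hypothetical helper:

    import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
    import org.apache.beam.sdk.transforms.DoFn;

    class LogDeadLetterAttributesFn extends DoFn<PubsubMessage, PubsubMessage> {
      @ProcessElement
      public void processElement(@Element PubsubMessage msg, OutputReceiver<PubsubMessage> out) {
        // The three attributes listed above, attached when the Read forwards a failure.
        String exceptionClassName = msg.getAttribute("exceptionClassName");
        String exceptionMessage = msg.getAttribute("exceptionMessage");
        String pubsubMessageId = msg.getAttribute("pubsubMessageId");
        System.err.printf(
            "Dead-lettered message %s: %s (%s)%n",
            pubsubMessageId, exceptionMessage, exceptionClassName);
        out.output(msg);
      }
    }

    // Hypothetical wiring:
    //   pipeline
    //       .apply(PubsubIO.readMessagesWithAttributes()
    //           .fromSubscription("projects/my-project/subscriptions/my-dlq-sub"))
    //       .apply(ParDo.of(new LogDeadLetterAttributesFn()));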

    The {@link PubsubClient.PubsubClientFactory} used by the error-handling {@link Write} transform + * will be the same as the one used in the final {@link Read} transform. + * + *

    If there might be a parsing error (or similar), then a dead-letter topic should be set up here to + * avoid wasting resources and to provide more error details with the message written + * to Pub/Sub. Otherwise, the Pub/Sub topic itself should have a dead-letter configuration set up to + * avoid an infinite retry loop. + * + *

    Only failures that result from the {@link Read} configuration (e.g. parsing errors) will + * be sent to the dead-letter topic. Errors that occur after a successful read require setting + * up a separate {@link Write} transform. Delivery errors require configuring Pub/Sub itself + * to write to the dead-letter topic after a certain number of failed attempts. + * + *
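A minimal wiring sketch, assuming placeholder project, subscription, and topic names and using the well-known protobuf {@code Timestamp} class as a stand-in for any generated message class:

    import com.google.protobuf.Timestamp;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class DeadLetterReadSketch {
      public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
        pipeline.apply(
            "ReadWithDeadLetter",
            PubsubIO.readProtos(Timestamp.class)
                .fromSubscription("projects/my-project/subscriptions/my-sub")
                .withDeadLetterTopic("projects/my-project/topics/my-dead-letter"));
        pipeline.run();
      }
    }

Payloads that fail to parse as the given message type are forwarded to the dead-letter topic with the attributes described above, while successfully parsed messages continue downstream.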

    See {@link PubsubIO.PubsubTopic#fromPath(String)} for more details on the format of the + * {@code deadLetterTopic} string. + */ + public Read withDeadLetterTopic(String deadLetterTopic) { + return withDeadLetterTopic(StaticValueProvider.of(deadLetterTopic)); + } + + /** Like {@link Read#withDeadLetterTopic(String)} but with a {@link ValueProvider}. */ + public Read withDeadLetterTopic(ValueProvider deadLetterTopic) { + validateTopic(deadLetterTopic); + return toBuilder() + .setDeadLetterTopicProvider( + NestedValueProvider.of(deadLetterTopic, PubsubTopic::fromPath)) + .build(); + } + + /** Handles validation of {@code topic}. */ + private static void validateTopic(ValueProvider topic) { + if (topic.isAccessible()) { + PubsubTopic.fromPath(topic.get()); + } + } + /** * The default client to write to Pub/Sub is the {@link PubsubJsonClient}, created by the {@link * PubsubJsonClient.PubsubJsonClientFactory}. This function allows to change the Pub/Sub client @@ -828,8 +934,60 @@ public PCollection expand(PBegin input) { getIdAttribute(), getNeedsAttributes(), getNeedsMessageId()); - PCollection read = - input.apply(source).apply(MapElements.into(new TypeDescriptor() {}).via(getParseFn())); + + PCollection read; + PCollection preParse = input.apply(source); + TypeDescriptor typeDescriptor = new TypeDescriptor() {}; + if (getDeadLetterTopicProvider() == null) { + read = preParse.apply(MapElements.into(typeDescriptor).via(getParseFn())); + } else { + Result, KV> result = + preParse.apply( + "PubsubIO.Read/Map/Parse-Incoming-Messages", + MapElements.into(typeDescriptor) + .via(getParseFn()) + .exceptionsVia(new WithFailures.ThrowableHandler() {})); + read = result.output(); + + // Write out failures to the provided dead-letter topic. + result + .failures() + // Since the stack trace could easily exceed Pub/Sub limits, we need to remove it from + // the attributes. + .apply( + "PubsubIO.Read/Map/Remove-Stack-Trace-Attribute", + MapElements.into(new TypeDescriptor>>() {}) + .via( + kv -> { + PubsubMessage message = kv.getKey(); + String messageId = + message.getMessageId() == null ? "" : message.getMessageId(); + Throwable throwable = kv.getValue().throwable(); + + // In order to stay within Pub/Sub limits, we aren't adding the stack + // trace to the attributes. Therefore, we need to log the throwable. 
+ LOG.error( + "Error parsing Pub/Sub message with id '{}'", messageId, throwable); + + ImmutableMap attributes = + ImmutableMap.builder() + .put("exceptionClassName", throwable.getClass().getName()) + .put("exceptionMessage", throwable.getMessage()) + .put("pubsubMessageId", messageId) + .build(); + + return KV.of(kv.getKey(), attributes); + })) + .apply( + "PubsubIO.Read/Map/Create-Dead-Letter-Payload", + MapElements.into(TypeDescriptor.of(PubsubMessage.class)) + .via(kv -> new PubsubMessage(kv.getKey().getPayload(), kv.getValue()))) + .apply( + writeMessages() + .to(getDeadLetterTopicProvider().get().asPath()) + .withClientFactory(getPubsubClientFactory())); + } + return read.setCoder(getCoder()); } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubJsonClient.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubJsonClient.java index 198c5389c831..145fe95ce41e 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubJsonClient.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubJsonClient.java @@ -190,8 +190,12 @@ public List pull( } // Timestamp. - long timestampMsSinceEpoch = - extractTimestamp(timestampAttribute, message.getMessage().getPublishTime(), attributes); + long timestampMsSinceEpoch; + if (Strings.isNullOrEmpty(timestampAttribute)) { + timestampMsSinceEpoch = parseTimestampAsMsSinceEpoch(message.getMessage().getPublishTime()); + } else { + timestampMsSinceEpoch = extractTimestampAttribute(timestampAttribute, attributes); + } // Ack id. String ackId = message.getAckId(); diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessage.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessage.java index ee53cf691d83..6b23506dcc88 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessage.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessage.java @@ -19,8 +19,8 @@ import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; +import com.google.auto.value.AutoValue; import java.util.Map; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; import org.checkerframework.checker.nullness.qual.Nullable; /** @@ -31,51 +31,68 @@ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class PubsubMessage { + @AutoValue + abstract static class Impl { + @SuppressWarnings("mutable") + abstract byte[] getPayload(); - private byte[] message; - private @Nullable Map attributes; - private @Nullable String messageId; + abstract @Nullable Map getAttributeMap(); + + abstract @Nullable String getMessageId(); + + static Impl create( + byte[] payload, @Nullable Map attributes, @Nullable String messageId) { + return new AutoValue_PubsubMessage_Impl(payload, attributes, messageId); + } + } + + private Impl impl; public PubsubMessage(byte[] payload, @Nullable Map attributes) { - this.message = payload; - this.attributes = attributes; - this.messageId = null; + this(payload, attributes, null); } public PubsubMessage( byte[] payload, @Nullable Map attributes, @Nullable String messageId) { - this.message = payload; - this.attributes = attributes; - this.messageId = messageId; + impl = Impl.create(payload, attributes, messageId); } /** 
Returns the main PubSub message. */ public byte[] getPayload() { - return message; + return impl.getPayload(); } /** Returns the given attribute value. If not such attribute exists, returns null. */ public @Nullable String getAttribute(String attribute) { checkNotNull(attribute, "attribute"); - return attributes.get(attribute); + return impl.getAttributeMap().get(attribute); } /** Returns the full map of attributes. This is an unmodifiable map. */ public @Nullable Map getAttributeMap() { - return attributes; + return impl.getAttributeMap(); } /** Returns the messageId of the message populated by Cloud Pub/Sub. */ public @Nullable String getMessageId() { - return messageId; + return impl.getMessageId(); } @Override public String toString() { - return MoreObjects.toStringHelper(this) - .add("message", message) - .add("attributes", attributes) - .add("messageId", messageId) - .toString(); + return impl.toString(); + } + + @Override + public boolean equals(Object other) { + if (!(other instanceof PubsubMessage)) { + return false; + } + return impl.equals(((PubsubMessage) other).impl); + } + + @Override + public int hashCode() { + return impl.hashCode(); } } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessageToRow.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessageToRow.java index f2d602288a47..ec61bf1a96b9 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessageToRow.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessageToRow.java @@ -17,36 +17,31 @@ */ package org.apache.beam.sdk.io.gcp.pubsub; -import static java.nio.charset.StandardCharsets.UTF_8; import static java.util.stream.Collectors.toList; -import static org.apache.beam.sdk.util.RowJsonUtils.newObjectMapperWith; +import static org.apache.beam.sdk.io.gcp.pubsub.PubsubSchemaIOProvider.ATTRIBUTE_ARRAY_ENTRY_SCHEMA; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; -import com.fasterxml.jackson.databind.ObjectMapper; import com.google.auto.value.AutoValue; import java.io.Serializable; import java.util.List; import java.util.Map; import java.util.stream.Collectors; -import org.apache.avro.AvroRuntimeException; +import javax.annotation.Nullable; import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.annotations.Internal; -import org.apache.beam.sdk.io.gcp.pubsub.PubsubSchemaIOProvider.PayloadFormat; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.schemas.Schema.TypeName; -import org.apache.beam.sdk.schemas.utils.AvroUtils; +import org.apache.beam.sdk.schemas.io.payloads.PayloadSerializer; import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.transforms.ParDo; -import org.apache.beam.sdk.transforms.SimpleFunction; -import org.apache.beam.sdk.util.RowJson.RowJsonDeserializer; -import org.apache.beam.sdk.util.RowJson.UnsupportedRowJsonException; -import org.apache.beam.sdk.util.RowJsonUtils; +import org.apache.beam.sdk.transforms.SerializableFunction; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionTuple; import org.apache.beam.sdk.values.Row; import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.sdk.values.TupleTagList; -import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.joda.time.Instant; /** Read side converter for {@link PubsubMessage} with JSON/AVRO payload. */ @@ -58,6 +53,8 @@ }) abstract class PubsubMessageToRow extends PTransform, PCollectionTuple> implements Serializable { + interface SerializerProvider extends SerializableFunction {} + static final String TIMESTAMP_FIELD = "event_timestamp"; static final String ATTRIBUTES_FIELD = "attributes"; static final String PAYLOAD_FIELD = "payload"; @@ -85,10 +82,11 @@ abstract class PubsubMessageToRow extends PTransform, public abstract boolean useFlatSchema(); - public abstract PayloadFormat payloadFormat(); + // A provider for a PayloadSerializer given the expected payload schema. + public abstract @Nullable SerializerProvider serializerProvider(); public static Builder builder() { - return new AutoValue_PubsubMessageToRow.Builder().payloadFormat(PayloadFormat.JSON); + return new AutoValue_PubsubMessageToRow.Builder(); } @Override @@ -98,63 +96,15 @@ public PCollectionTuple expand(PCollection input) { ParDo.of( useFlatSchema() ? new FlatSchemaPubsubMessageToRow( - messageSchema(), useDlq(), payloadFormat()) + messageSchema(), useDlq(), serializerProvider()) : new NestedSchemaPubsubMessageToRow( - messageSchema(), useDlq(), payloadFormat())) + messageSchema(), useDlq(), serializerProvider())) .withOutputTags( MAIN_TAG, useDlq() ? TupleTagList.of(DLQ_TAG) : TupleTagList.empty())); rows.get(MAIN_TAG).setRowSchema(messageSchema()); return rows; } - @VisibleForTesting - static SimpleFunction getParsePayloadFn( - PayloadFormat format, Schema payloadSchema) { - switch (format) { - case JSON: - return new ParseJsonPayloadFn(payloadSchema); - case AVRO: - return new ParseAvroPayloadFn(payloadSchema); - default: - throw new IllegalArgumentException("Unsupported payload format given: " + format); - } - } - - private static class ParseJsonPayloadFn extends SimpleFunction { - private final ObjectMapper jsonMapper; - - ParseJsonPayloadFn(Schema payloadSchema) { - jsonMapper = newObjectMapperWith(RowJsonDeserializer.forSchema(payloadSchema)); - } - - @Override - public Row apply(PubsubMessage message) { - String payloadJson = new String(message.getPayload(), UTF_8); - try { - return RowJsonUtils.jsonToRow(jsonMapper, payloadJson); - } catch (UnsupportedRowJsonException e) { - throw new ParseException(e); - } - } - } - - private static class ParseAvroPayloadFn extends SimpleFunction { - private final SimpleFunction avroBytesToRowFn; - - public ParseAvroPayloadFn(Schema payloadSchema) { - avroBytesToRowFn = AvroUtils.getAvroBytesToRowFunction(payloadSchema); - } - - @Override - public Row apply(PubsubMessage message) { - try { - return avroBytesToRowFn.apply(message.getPayload()); - } catch (AvroRuntimeException e) { - throw new ParseException(e); - } - } - } - /** * A {@link DoFn} to convert a flat schema{@link PubsubMessage} with JSON/AVRO payload to {@link * Row}. @@ -166,10 +116,10 @@ private static class FlatSchemaPubsubMessageToRow extends DoFn parsePayloadFn; + private final PayloadSerializer payloadSerializer; protected FlatSchemaPubsubMessageToRow( - Schema messageSchema, boolean useDlq, PayloadFormat payloadFormat) { + Schema messageSchema, boolean useDlq, SerializerProvider serializerProvider) { this.messageSchema = messageSchema; // Construct flat payload schema. 
Schema payloadSchema = @@ -178,7 +128,7 @@ protected FlatSchemaPubsubMessageToRow( .filter(f -> !f.getName().equals(TIMESTAMP_FIELD)) .collect(Collectors.toList())); this.useDlq = useDlq; - this.parsePayloadFn = getParsePayloadFn(payloadFormat, payloadSchema); + this.payloadSerializer = serializerProvider.apply(payloadSchema); } /** @@ -198,17 +148,17 @@ private Object getValueForFieldFlatSchema(Schema.Field field, Instant timestamp, public void processElement( @Element PubsubMessage element, @Timestamp Instant timestamp, MultiOutputReceiver o) { try { - Row payload = parsePayloadFn.apply(element); + Row payload = payloadSerializer.deserialize(element.getPayload()); List values = messageSchema.getFields().stream() .map(field -> getValueForFieldFlatSchema(field, timestamp, payload)) .collect(toList()); o.get(MAIN_TAG).output(Row.withSchema(messageSchema).addValues(values).build()); - } catch (ParseException pe) { + } catch (Exception e) { if (useDlq) { o.get(DLQ_TAG).output(element); } else { - throw new RuntimeException(pe); + throw new RuntimeException(e); } } } @@ -225,26 +175,51 @@ private static class NestedSchemaPubsubMessageToRow extends DoFn parsePayloadFn; + private final @Nullable PayloadSerializer payloadSerializer; - protected NestedSchemaPubsubMessageToRow( - Schema messageSchema, boolean useDlq, PayloadFormat payloadFormat) { + private NestedSchemaPubsubMessageToRow( + Schema messageSchema, boolean useDlq, @Nullable SerializerProvider serializerProvider) { this.messageSchema = messageSchema; this.useDlq = useDlq; - Schema payloadSchema = messageSchema.getField(PAYLOAD_FIELD).getType().getRowSchema(); - this.parsePayloadFn = getParsePayloadFn(payloadFormat, payloadSchema); + if (serializerProvider == null) { + checkArgument( + messageSchema.getField(PAYLOAD_FIELD).getType().getTypeName().equals(TypeName.BYTES)); + this.payloadSerializer = null; + } else { + checkArgument( + messageSchema.getField(PAYLOAD_FIELD).getType().getTypeName().equals(TypeName.ROW)); + Schema payloadSchema = messageSchema.getField(PAYLOAD_FIELD).getType().getRowSchema(); + this.payloadSerializer = serializerProvider.apply(payloadSchema); + } + } + + private Object maybeDeserialize(byte[] payload) { + if (payloadSerializer == null) { + return payload; + } + return payloadSerializer.deserialize(payload); + } + + private Object handleAttributes(Map attributeMap) { + if (messageSchema.getField(ATTRIBUTES_FIELD).getType().getTypeName().isMapType()) { + return attributeMap; + } + ImmutableList.Builder rows = ImmutableList.builder(); + attributeMap.forEach( + (k, v) -> rows.add(Row.withSchema(ATTRIBUTE_ARRAY_ENTRY_SCHEMA).attachValues(k, v))); + return rows.build(); } /** Get the value for a field int the order they're specified in the nested schema. 
*/ private Object getValueForFieldNestedSchema( - Schema.Field field, Instant timestamp, Map attributeMap, Row payload) { + Schema.Field field, Instant timestamp, Map attributeMap, byte[] payload) { switch (field.getName()) { case TIMESTAMP_FIELD: return timestamp; case ATTRIBUTES_FIELD: - return attributeMap; + return handleAttributes(attributeMap); case PAYLOAD_FIELD: - return payload; + return maybeDeserialize(payload); default: throw new IllegalArgumentException( "Unexpected field '" @@ -259,20 +234,19 @@ private Object getValueForFieldNestedSchema( public void processElement( @Element PubsubMessage element, @Timestamp Instant timestamp, MultiOutputReceiver o) { try { - Row payload = parsePayloadFn.apply(element); List values = messageSchema.getFields().stream() .map( field -> getValueForFieldNestedSchema( - field, timestamp, element.getAttributeMap(), payload)) + field, timestamp, element.getAttributeMap(), element.getPayload())) .collect(toList()); o.get(MAIN_TAG).output(Row.withSchema(messageSchema).addValues(values).build()); - } catch (ParseException exception) { + } catch (Exception e) { if (useDlq) { o.get(DLQ_TAG).output(element); } else { - throw new RuntimeException(exception); + throw new RuntimeException(e); } } } @@ -286,7 +260,7 @@ abstract static class Builder { public abstract Builder useFlatSchema(boolean useFlatSchema); - public abstract Builder payloadFormat(PayloadFormat payloadFormat); + public abstract Builder serializerProvider(SerializerProvider serializerProvider); public abstract PubsubMessageToRow build(); } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessages.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessages.java index 6d9e2956834e..bf6a28863b01 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessages.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessages.java @@ -21,23 +21,39 @@ import com.google.protobuf.InvalidProtocolBufferException; import java.util.Map; import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; /** Common util functions for converting between PubsubMessage proto and {@link PubsubMessage}. */ -public class PubsubMessages { +public final class PubsubMessages { + private PubsubMessages() {} + + public static com.google.pubsub.v1.PubsubMessage toProto(PubsubMessage input) { + Map attributes = input.getAttributeMap(); + com.google.pubsub.v1.PubsubMessage.Builder message = + com.google.pubsub.v1.PubsubMessage.newBuilder() + .setData(ByteString.copyFrom(input.getPayload())); + // TODO(BEAM-8085) this should not be null + if (attributes != null) { + message.putAllAttributes(attributes); + } + String messageId = input.getMessageId(); + if (messageId != null) { + message.setMessageId(messageId); + } + return message.build(); + } + + public static PubsubMessage fromProto(com.google.pubsub.v1.PubsubMessage input) { + return new PubsubMessage( + input.getData().toByteArray(), input.getAttributesMap(), input.getMessageId()); + } + // Convert the PubsubMessage to a PubsubMessage proto, then return its serialized representation. 
public static class ParsePayloadAsPubsubMessageProto implements SerializableFunction { @Override public byte[] apply(PubsubMessage input) { - Map attributes = input.getAttributeMap(); - com.google.pubsub.v1.PubsubMessage.Builder message = - com.google.pubsub.v1.PubsubMessage.newBuilder() - .setData(ByteString.copyFrom(input.getPayload())); - // TODO(BEAM-8085) this should not be null - if (attributes != null) { - message.putAllAttributes(attributes); - } - return message.build().toByteArray(); + return toProto(input).toByteArray(); } } @@ -49,10 +65,19 @@ public PubsubMessage apply(byte[] input) { try { com.google.pubsub.v1.PubsubMessage message = com.google.pubsub.v1.PubsubMessage.parseFrom(input); - return new PubsubMessage(message.getData().toByteArray(), message.getAttributesMap()); + return fromProto(message); } catch (InvalidProtocolBufferException e) { throw new RuntimeException("Could not decode Pubsub message", e); } } } + + public static class DeserializeBytesIntoPubsubMessagePayloadOnly + implements SerializableFunction { + + @Override + public PubsubMessage apply(byte[] value) { + return new PubsubMessage(value, ImmutableMap.of()); + } + } } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubSchemaIOProvider.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubSchemaIOProvider.java index fa1c0e2ef96c..3a357b904a9d 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubSchemaIOProvider.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubSchemaIOProvider.java @@ -17,6 +17,7 @@ */ package org.apache.beam.sdk.io.gcp.pubsub; +import static java.util.stream.Collectors.toList; import static org.apache.beam.sdk.io.gcp.pubsub.PubsubMessageToRow.ATTRIBUTES_FIELD; import static org.apache.beam.sdk.io.gcp.pubsub.PubsubMessageToRow.DLQ_TAG; import static org.apache.beam.sdk.io.gcp.pubsub.PubsubMessageToRow.MAIN_TAG; @@ -27,14 +28,19 @@ import com.google.auto.service.AutoService; import com.google.auto.value.AutoValue; import java.io.Serializable; +import java.util.List; import org.apache.beam.sdk.annotations.Internal; import org.apache.beam.sdk.schemas.AutoValueSchema; import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.Field; import org.apache.beam.sdk.schemas.Schema.FieldType; import org.apache.beam.sdk.schemas.io.InvalidConfigurationException; import org.apache.beam.sdk.schemas.io.InvalidSchemaException; import org.apache.beam.sdk.schemas.io.SchemaIO; import org.apache.beam.sdk.schemas.io.SchemaIOProvider; +import org.apache.beam.sdk.schemas.io.payloads.PayloadSerializer; +import org.apache.beam.sdk.schemas.io.payloads.PayloadSerializers; +import org.apache.beam.sdk.transforms.MapElements; import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.values.PBegin; import org.apache.beam.sdk.values.PCollection; @@ -42,6 +48,7 @@ import org.apache.beam.sdk.values.POutput; import org.apache.beam.sdk.values.Row; import org.apache.beam.sdk.values.TypeDescriptor; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.checkerframework.checker.nullness.qual.Nullable; /** @@ -94,13 +101,12 @@ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class PubsubSchemaIOProvider implements SchemaIOProvider { - public static final FieldType VARCHAR = FieldType.STRING; - public static final FieldType 
TIMESTAMP = FieldType.DATETIME; - - public enum PayloadFormat { - JSON, - AVRO - } + public static final FieldType ATTRIBUTE_MAP_FIELD_TYPE = + Schema.FieldType.map(FieldType.STRING.withNullable(false), FieldType.STRING); + public static final Schema ATTRIBUTE_ARRAY_ENTRY_SCHEMA = + Schema.builder().addStringField("key").addStringField("value").build(); + public static final FieldType ATTRIBUTE_ARRAY_FIELD_TYPE = + Schema.FieldType.array(Schema.FieldType.row(ATTRIBUTE_ARRAY_ENTRY_SCHEMA)); /** Returns an id that uniquely represents this IO. */ @Override @@ -118,6 +124,11 @@ public Schema configurationSchema() { .addNullableField("timestampAttributeKey", FieldType.STRING) .addNullableField("deadLetterQueue", FieldType.STRING) .addNullableField("format", FieldType.STRING) + // For ThriftPayloadSerializerProvider + .addNullableField("thriftClass", FieldType.STRING) + .addNullableField("thriftProtocolFactoryClass", FieldType.STRING) + // For ProtoPayloadSerializerProvider + .addNullableField("protoClass", FieldType.STRING) .build(); } @@ -149,7 +160,7 @@ private void validateDataSchema(Schema schema) { "Unsupported schema specified for Pubsub source in CREATE TABLE." + "CREATE TABLE for Pubsub topic must not be null"); } - if (!PubsubSchemaIO.fieldPresent(schema, TIMESTAMP_FIELD, TIMESTAMP)) { + if (!PubsubSchemaIO.fieldPresent(schema, TIMESTAMP_FIELD, FieldType.DATETIME)) { throw new InvalidSchemaException( "Unsupported schema specified for Pubsub source in CREATE TABLE." + "CREATE TABLE for Pubsub topic must include at least 'event_timestamp' field of " @@ -180,7 +191,7 @@ private static class PubsubSchemaIO implements SchemaIO, Serializable { private PubsubSchemaIO(String location, Row config, Schema dataSchema) { this.dataSchema = dataSchema; this.location = location; - this.useFlatSchema = !definesAttributeAndPayload(dataSchema); + this.useFlatSchema = !shouldUseNestedSchema(dataSchema); this.config = new AutoValueSchema().fromRowFunction(TypeDescriptor.of(Config.class)).apply(config); } @@ -190,22 +201,27 @@ public Schema schema() { return dataSchema; } + private boolean needsSerializer() { + return useFlatSchema || !fieldPresent(schema(), PAYLOAD_FIELD, FieldType.BYTES); + } + @Override public PTransform> buildReader() { return new PTransform>() { @Override public PCollection expand(PBegin begin) { + PubsubMessageToRow.Builder builder = + PubsubMessageToRow.builder() + .messageSchema(dataSchema) + .useDlq(config.useDeadLetterQueue()) + .useFlatSchema(useFlatSchema); + if (needsSerializer()) { + builder.serializerProvider(config::serializer); + } PCollectionTuple rowsWithDlq = begin .apply("ReadFromPubsub", readMessagesWithAttributes()) - .apply( - "PubsubMessageToRow", - PubsubMessageToRow.builder() - .messageSchema(dataSchema) - .useDlq(config.useDeadLetterQueue()) - .useFlatSchema(useFlatSchema) - .payloadFormat(config.format()) - .build()); + .apply("PubsubMessageToRow", builder.build()); rowsWithDlq.get(MAIN_TAG).setRowSchema(dataSchema); if (config.useDeadLetterQueue()) { @@ -219,19 +235,31 @@ public PCollection expand(PBegin begin) { @Override public PTransform, POutput> buildWriter() { - if (!useFlatSchema) { - throw new UnsupportedOperationException( - "Writing to a Pubsub topic is only supported for flattened schemas"); - } + @Nullable + PayloadSerializer serializer = + needsSerializer() ? 
config.serializer(stripFromTimestampField(dataSchema)) : null; return new PTransform, POutput>() { @Override public POutput expand(PCollection input) { - return input - .apply( - RowToPubsubMessage.of( - config.useTimestampAttribute(), config.format(), dataSchema)) - .apply(createPubsubMessageWrite()); + PCollection filtered = + input.apply(new AddTimestampAttribute(config.useTimestampAttribute())); + PCollection transformed; + if (useFlatSchema) { + transformed = + filtered.apply( + "Transform Flat Schema", + MapElements.into(TypeDescriptor.of(PubsubMessage.class)) + .via( + row -> + new PubsubMessage(serializer.serialize(row), ImmutableMap.of()))); + } else { + transformed = + filtered.apply( + "Transform Nested Schema", + MapElements.via(new NestedRowToMessage(serializer, filtered.getSchema()))); + } + return transformed.apply(createPubsubMessageWrite()); } }; } @@ -261,11 +289,23 @@ private PubsubIO.Write writeMessagesToDlq() { : write; } - private boolean definesAttributeAndPayload(Schema schema) { - return fieldPresent( - schema, ATTRIBUTES_FIELD, Schema.FieldType.map(VARCHAR.withNullable(false), VARCHAR)) - && (schema.hasField(PAYLOAD_FIELD) - && ROW.equals(schema.getField(PAYLOAD_FIELD).getType().getTypeName())); + private boolean hasValidAttributesField(Schema schema) { + return fieldPresent(schema, ATTRIBUTES_FIELD, ATTRIBUTE_MAP_FIELD_TYPE) + || fieldPresent(schema, ATTRIBUTES_FIELD, ATTRIBUTE_ARRAY_FIELD_TYPE); + } + + private boolean hasValidPayloadField(Schema schema) { + if (!schema.hasField(PAYLOAD_FIELD)) { + return false; + } + if (fieldPresent(schema, PAYLOAD_FIELD, FieldType.BYTES)) { + return true; + } + return schema.getField(PAYLOAD_FIELD).getType().getTypeName().equals(ROW); + } + + private boolean shouldUseNestedSchema(Schema schema) { + return hasValidPayloadField(schema) && hasValidAttributesField(schema); } private static boolean fieldPresent( @@ -276,6 +316,14 @@ private static boolean fieldPresent( } } + private static Schema stripFromTimestampField(Schema schema) { + List selectedFields = + schema.getFields().stream() + .filter(field -> !TIMESTAMP_FIELD.equals(field.getName())) + .collect(toList()); + return Schema.of(selectedFields.toArray(new Schema.Field[0])); + } + @AutoValue abstract static class Config implements Serializable { @@ -285,6 +333,14 @@ abstract static class Config implements Serializable { abstract @Nullable String getFormat(); + // For ThriftPayloadSerializerProvider + abstract @Nullable String getThriftClass(); + + abstract @Nullable String getThriftProtocolFactoryClass(); + + // For ProtoPayloadSerializerProvider + abstract @Nullable String getProtoClass(); + boolean useDeadLetterQueue() { return getDeadLetterQueue() != null; } @@ -293,10 +349,19 @@ boolean useTimestampAttribute() { return getTimestampAttributeKey() != null; } - PayloadFormat format() { - return getFormat() == null - ? PayloadFormat.JSON - : PayloadFormat.valueOf(getFormat().toUpperCase()); + PayloadSerializer serializer(Schema schema) { + String format = getFormat() == null ? 
"json" : getFormat(); + ImmutableMap.Builder params = ImmutableMap.builder(); + if (getThriftClass() != null) { + params.put("thriftClass", getThriftClass()); + } + if (getThriftProtocolFactoryClass() != null) { + params.put("thriftProtocolFactoryClass", getThriftProtocolFactoryClass()); + } + if (getProtoClass() != null) { + params.put("protoClass", getProtoClass()); + } + return PayloadSerializers.getSerializer(format, schema, params.build()); } } } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubTestClient.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubTestClient.java index b5de4a31197a..08a6bddb6684 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubTestClient.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubTestClient.java @@ -103,13 +103,8 @@ public static PubsubTestClientFactory createFactoryForPublish( final TopicPath expectedTopic, final Iterable expectedOutgoingMessages, final Iterable failingOutgoingMessages) { - synchronized (STATE) { - checkState(!STATE.isActive, "Test still in flight"); - STATE.expectedTopic = expectedTopic; - STATE.remainingExpectedOutgoingMessages = Sets.newHashSet(expectedOutgoingMessages); - STATE.remainingFailingOutgoingMessages = Sets.newHashSet(failingOutgoingMessages); - STATE.isActive = true; - } + activate( + () -> setPublishState(expectedTopic, expectedOutgoingMessages, failingOutgoingMessages)); return new PubsubTestClientFactory() { @Override public PubsubClient newClient( @@ -125,15 +120,7 @@ public String getKind() { @Override public void close() { - synchronized (STATE) { - checkState(STATE.isActive, "No test still in flight"); - checkState( - STATE.remainingExpectedOutgoingMessages.isEmpty(), - "Still waiting for %s messages to be published", - STATE.remainingExpectedOutgoingMessages.size()); - STATE.isActive = false; - STATE.remainingExpectedOutgoingMessages = null; - } + deactivate(PubsubTestClient::performFinalPublishStateChecks); } }; } @@ -147,16 +134,8 @@ public static PubsubTestClientFactory createFactoryForPull( final SubscriptionPath expectedSubscription, final int ackTimeoutSec, final Iterable expectedIncomingMessages) { - synchronized (STATE) { - checkState(!STATE.isActive, "Test still in flight"); - STATE.clock = clock; - STATE.expectedSubscription = expectedSubscription; - STATE.ackTimeoutSec = ackTimeoutSec; - STATE.remainingPendingIncomingMessages = Lists.newArrayList(expectedIncomingMessages); - STATE.pendingAckIncomingMessages = new HashMap<>(); - STATE.ackDeadline = new HashMap<>(); - STATE.isActive = true; - } + activate( + () -> setPullState(expectedSubscription, clock, ackTimeoutSec, expectedIncomingMessages)); return new PubsubTestClientFactory() { @Override public PubsubClient newClient( @@ -172,29 +151,139 @@ public String getKind() { @Override public void close() { - synchronized (STATE) { - checkState(STATE.isActive, "No test still in flight"); - checkState( - STATE.remainingPendingIncomingMessages.isEmpty(), - "Still waiting for %s messages to be pulled", - STATE.remainingPendingIncomingMessages.size()); - checkState( - STATE.pendingAckIncomingMessages.isEmpty(), - "Still waiting for %s messages to be ACKed", - STATE.pendingAckIncomingMessages.size()); - checkState( - STATE.ackDeadline.isEmpty(), - "Still waiting for %s messages to be ACKed", - STATE.ackDeadline.size()); - STATE.isActive = false; - 
STATE.remainingPendingIncomingMessages = null; - STATE.pendingAckIncomingMessages = null; - STATE.ackDeadline = null; - } + deactivate(PubsubTestClient::performFinalPullStateChecks); } }; } + /** + * Returns a factory for a test that is expected to both publish and pull messages over the course + * of the test. + */ + public static PubsubTestClientFactory createFactoryForPullAndPublish( + final SubscriptionPath pullSubscription, + final TopicPath publishTopicPath, + final Clock pullClock, + final int pullAckTimeoutSec, + final Iterable expectedIncomingMessages, + final Iterable expectedOutgoingMessages, + final Iterable failingOutgoingMessages) { + activate( + () -> { + setPublishState(publishTopicPath, expectedOutgoingMessages, failingOutgoingMessages); + setPullState(pullSubscription, pullClock, pullAckTimeoutSec, expectedIncomingMessages); + }); + return new PubsubTestClientFactory() { + @Override + public void close() throws IOException { + deactivate( + () -> { + performFinalPublishStateChecks(); + performFinalPullStateChecks(); + }); + } + + @Override + public PubsubClient newClient( + @Nullable String timestampAttribute, @Nullable String idAttribute, PubsubOptions options) + throws IOException { + return new PubsubTestClient(); + } + + @Override + public String getKind() { + return "PublishAndPullTest"; + } + }; + } + + /** + * Activates {@link PubsubTestClientFactory} state for the test. This can only be called once per + * test. + * + *

    It is not necessary to set {@code STATE.isActive}. That will be set regardless. + * + * @param setStateValues a {@link Runnable} that sets all the desired values. + */ + private static void activate(Runnable setStateValues) { + synchronized (STATE) { + checkState(!STATE.isActive, "Test still in flight"); + setStateValues.run(); + STATE.isActive = true; + } + } + + /** + * Deactivates {@link PubsubTestClientFactory} state for use in other tests. This can only be + * called once per test. + * + *

    It is not necessary to check or set {@code STATE.isActivate}. That will be handled by this + * method. + * + * @param runFinalChecks a {@link Runnable} to handle any final state checking before marking + * {@code STATE} as no longer active + */ + private static void deactivate(Runnable runFinalChecks) { + synchronized (STATE) { + checkState(STATE.isActive, "No test still in flight"); + runFinalChecks.run(); + STATE.remainingExpectedOutgoingMessages = null; + STATE.remainingPendingIncomingMessages = null; + STATE.pendingAckIncomingMessages = null; + STATE.ackDeadline = null; + STATE.isActive = false; + } + } + + /** Handles setting {@code STATE} values for a publishing client. */ + private static void setPublishState( + final TopicPath expectedTopic, + final Iterable expectedOutgoingMessages, + final Iterable failingOutgoingMessages) { + STATE.expectedTopic = expectedTopic; + STATE.remainingExpectedOutgoingMessages = Sets.newHashSet(expectedOutgoingMessages); + STATE.remainingFailingOutgoingMessages = Sets.newHashSet(failingOutgoingMessages); + } + + /** Handles setting {@code STATE} values for a pulling client. */ + private static void setPullState( + final SubscriptionPath expectedSubscription, + final Clock clock, + final int ackTimeoutSec, + final Iterable expectedIncomingMessages) { + STATE.clock = clock; + STATE.expectedSubscription = expectedSubscription; + STATE.ackTimeoutSec = ackTimeoutSec; + STATE.remainingPendingIncomingMessages = Lists.newArrayList(expectedIncomingMessages); + STATE.pendingAckIncomingMessages = new HashMap<>(); + STATE.ackDeadline = new HashMap<>(); + } + + /** Handles verifying {@code STATE} at end of publish test. */ + private static void performFinalPublishStateChecks() { + checkState(STATE.isActive, "No test still in flight"); + checkState( + STATE.remainingExpectedOutgoingMessages.isEmpty(), + "Still waiting for %s messages to be published", + STATE.remainingExpectedOutgoingMessages.size()); + } + + /** Handles verifying {@code STATE} at end of pull test. 
*/ + private static void performFinalPullStateChecks() { + checkState( + STATE.remainingPendingIncomingMessages.isEmpty(), + "Still waiting for %s messages to be pulled", + STATE.remainingPendingIncomingMessages.size()); + checkState( + STATE.pendingAckIncomingMessages.isEmpty(), + "Still waiting for %s messages to be ACKed", + STATE.pendingAckIncomingMessages.size()); + checkState( + STATE.ackDeadline.isEmpty(), + "Still waiting for %s messages to be ACKed", + STATE.ackDeadline.size()); + } + public static PubsubTestClientFactory createFactoryForCreateSubscription() { return new PubsubTestClientFactory() { int numCalls = 0; diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSink.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSink.java index 5c9fca3d77f7..e6ce089e9260 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSink.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSink.java @@ -29,6 +29,7 @@ import java.util.concurrent.ThreadLocalRandom; import org.apache.beam.sdk.coders.AtomicCoder; import org.apache.beam.sdk.coders.BigEndianLongCoder; +import org.apache.beam.sdk.coders.ByteArrayCoder; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.CoderException; import org.apache.beam.sdk.coders.KvCoder; @@ -46,6 +47,7 @@ import org.apache.beam.sdk.options.ValueProvider; import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.GroupByKey; +import org.apache.beam.sdk.transforms.MapElements; import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.transforms.ParDo; import org.apache.beam.sdk.transforms.display.DisplayData; @@ -59,6 +61,7 @@ import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PDone; +import org.apache.beam.sdk.values.TypeDescriptor; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.hash.Hashing; import org.checkerframework.checker.nullness.qual.Nullable; @@ -141,7 +144,7 @@ enum RecordIdMethod { // ================================================================================ /** Convert elements to messages and shard them. 
*/ - private static class ShardFn extends DoFn> { + private static class ShardFn extends DoFn> { private final Counter elementCounter = Metrics.counter(ShardFn.class, "elements"); private final int numShards; private final RecordIdMethod recordIdMethod; @@ -154,8 +157,9 @@ private static class ShardFn extends DoFn getTopicProvider() { @Override public PDone expand(PCollection input) { - input - .apply( - "PubsubUnboundedSink.Window", - Window.into(new GlobalWindows()) - .triggering( - Repeatedly.forever( - AfterFirst.of( - AfterPane.elementCountAtLeast(publishBatchSize), - AfterProcessingTime.pastFirstElementInPane().plusDelayOf(maxLatency)))) - .discardingFiredPanes()) - .apply("PubsubUnboundedSink.Shard", ParDo.of(new ShardFn(numShards, recordIdMethod))) - .setCoder(KvCoder.of(VarIntCoder.of(), CODER)) - .apply(GroupByKey.create()) + return input .apply( - "PubsubUnboundedSink.Writer", - ParDo.of( - new WriterFn( - pubsubFactory, - topic, - timestampAttribute, - idAttribute, - publishBatchSize, - publishBatchBytes))); - return PDone.in(input.getPipeline()); + "Output Serialized PubsubMessage Proto", + MapElements.into(new TypeDescriptor() {}) + .via(new PubsubMessages.ParsePayloadAsPubsubMessageProto())) + .setCoder(ByteArrayCoder.of()) + .apply(new PubsubSink(this)); + } + + static class PubsubSink extends PTransform, PDone> { + public final PubsubUnboundedSink outer; + + PubsubSink(PubsubUnboundedSink outer) { + this.outer = outer; + } + + @Override + public PDone expand(PCollection input) { + input + .apply( + "PubsubUnboundedSink.Window", + Window.into(new GlobalWindows()) + .triggering( + Repeatedly.forever( + AfterFirst.of( + AfterPane.elementCountAtLeast(outer.publishBatchSize), + AfterProcessingTime.pastFirstElementInPane() + .plusDelayOf(outer.maxLatency)))) + .discardingFiredPanes()) + .apply( + "PubsubUnboundedSink.Shard", + ParDo.of(new ShardFn(outer.numShards, outer.recordIdMethod))) + .setCoder(KvCoder.of(VarIntCoder.of(), CODER)) + .apply(GroupByKey.create()) + .apply( + "PubsubUnboundedSink.Writer", + ParDo.of( + new WriterFn( + outer.pubsubFactory, + outer.topic, + outer.timestampAttribute, + outer.idAttribute, + outer.publishBatchSize, + outer.publishBatchBytes))); + return PDone.in(input.getPipeline()); + } } } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSource.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSource.java index 37d0e943345d..c5b7f3fb0ff5 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSource.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSource.java @@ -42,6 +42,7 @@ import java.util.concurrent.atomic.AtomicInteger; import java.util.concurrent.atomic.AtomicReference; import org.apache.beam.sdk.coders.AtomicCoder; +import org.apache.beam.sdk.coders.ByteArrayCoder; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.ListCoder; import org.apache.beam.sdk.coders.NullableCoder; @@ -53,15 +54,19 @@ import org.apache.beam.sdk.io.gcp.pubsub.PubsubClient.PubsubClientFactory; import org.apache.beam.sdk.io.gcp.pubsub.PubsubClient.SubscriptionPath; import org.apache.beam.sdk.io.gcp.pubsub.PubsubClient.TopicPath; +import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessages.DeserializeBytesIntoPubsubMessagePayloadOnly; import org.apache.beam.sdk.metrics.Counter; import 
org.apache.beam.sdk.metrics.SourceMetrics; +import org.apache.beam.sdk.options.ExperimentalOptions; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.ValueProvider; import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider; import org.apache.beam.sdk.transforms.Combine; import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.MapElements; import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.SerializableFunction; import org.apache.beam.sdk.transforms.Sum; import org.apache.beam.sdk.transforms.display.DisplayData; import org.apache.beam.sdk.transforms.display.DisplayData.Builder; @@ -70,6 +75,7 @@ import org.apache.beam.sdk.util.MovingFunction; import org.apache.beam.sdk.values.PBegin; import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.TypeDescriptor; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; @@ -358,7 +364,7 @@ public PubsubCheckpoint decode(InputStream inStream) throws IOException { * consumed downstream and/or ACKed back to Pubsub. */ @VisibleForTesting - static class PubsubReader extends UnboundedSource.UnboundedReader { + static class PubsubReader extends UnboundedSource.UnboundedReader { /** For access to topic and checkpointCoder. */ private final PubsubSource outer; @@ -882,14 +888,16 @@ public boolean advance() throws IOException { } @Override - public PubsubMessage getCurrent() throws NoSuchElementException { + public byte[] getCurrent() throws NoSuchElementException { if (current == null) { throw new NoSuchElementException(); } - return new PubsubMessage( - current.message().getData().toByteArray(), - current.message().getAttributesMap(), - current.recordId()); + if (this.outer.outer.getNeedsMessageId() || this.outer.outer.getNeedsAttributes()) { + com.google.pubsub.v1.PubsubMessage output = + current.message().toBuilder().setMessageId(current.recordId()).build(); + return output.toByteArray(); + } + return current.message().getData().toByteArray(); } @Override @@ -1010,7 +1018,7 @@ public long getSplitBacklogBytes() { // ================================================================================ @VisibleForTesting - static class PubsubSource extends UnboundedSource { + static class PubsubSource extends UnboundedSource { public final PubsubUnboundedSource outer; // The subscription to read from. @VisibleForTesting final ValueProvider subscriptionPath; @@ -1086,16 +1094,8 @@ public PubsubReader createReader( } @Override - public Coder getOutputCoder() { - if (outer.getNeedsMessageId()) { - return outer.getNeedsAttributes() - ? PubsubMessageWithAttributesAndMessageIdCoder.of() - : PubsubMessageWithMessageIdCoder.of(); - } else { - return outer.getNeedsAttributes() - ? 
PubsubMessageWithAttributesCoder.of() - : PubsubMessagePayloadOnlyCoder.of(); - } + public Coder getOutputCoder() { + return ByteArrayCoder.of(); } @Override @@ -1336,14 +1336,52 @@ public boolean getNeedsMessageId() { @Override public PCollection expand(PBegin input) { - return input - .getPipeline() - .begin() - .apply(Read.from(new PubsubSource(this))) - .apply( - "PubsubUnboundedSource.Stats", - ParDo.of( - new StatsFn(pubsubFactory, subscription, topic, timestampAttribute, idAttribute))); + SerializableFunction function; + if (getNeedsAttributes() || getNeedsMessageId()) { + function = new PubsubMessages.ParsePubsubMessageProtoAsPayload(); + } else { + function = new DeserializeBytesIntoPubsubMessagePayloadOnly(); + } + Coder messageCoder; + if (getNeedsMessageId()) { + messageCoder = + getNeedsAttributes() + ? PubsubMessageWithAttributesAndMessageIdCoder.of() + : PubsubMessageWithMessageIdCoder.of(); + } else { + messageCoder = + getNeedsAttributes() + ? PubsubMessageWithAttributesCoder.of() + : PubsubMessagePayloadOnlyCoder.of(); + } + PCollection messages = + input + .getPipeline() + .begin() + .apply(Read.from(new PubsubSource(this))) + .apply( + "MapBytesToPubsubMessages", + MapElements.into(TypeDescriptor.of(PubsubMessage.class)).via(function)) + .setCoder(messageCoder); + if (usesStatsFn(input.getPipeline().getOptions())) { + messages = + messages.apply( + "PubsubUnboundedSource.Stats", + ParDo.of( + new StatsFn( + pubsubFactory, subscription, topic, timestampAttribute, idAttribute))); + } + return messages; + } + + private boolean usesStatsFn(PipelineOptions options) { + if (ExperimentalOptions.hasExperiment(options, "enable_custom_pubsub_source")) { + return true; + } + if (!options.getRunner().getName().startsWith("org.apache.beam.runners.dataflow.")) { + return true; + } + return false; } private SubscriptionPath createRandomSubscription(PipelineOptions options) { diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/RowToPubsubMessage.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/RowToPubsubMessage.java deleted file mode 100644 index 4208e7fcb5bc..000000000000 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/RowToPubsubMessage.java +++ /dev/null @@ -1,117 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ -package org.apache.beam.sdk.io.gcp.pubsub; - -import static java.nio.charset.StandardCharsets.UTF_8; -import static java.util.stream.Collectors.toList; -import static org.apache.beam.sdk.io.gcp.pubsub.PubsubMessageToRow.TIMESTAMP_FIELD; -import static org.apache.beam.sdk.io.gcp.pubsub.PubsubSchemaIOProvider.PayloadFormat; - -import java.util.List; -import org.apache.beam.sdk.schemas.Schema; -import org.apache.beam.sdk.schemas.transforms.DropFields; -import org.apache.beam.sdk.schemas.utils.AvroUtils; -import org.apache.beam.sdk.transforms.MapElements; -import org.apache.beam.sdk.transforms.PTransform; -import org.apache.beam.sdk.transforms.SimpleFunction; -import org.apache.beam.sdk.transforms.ToJson; -import org.apache.beam.sdk.transforms.WithTimestamps; -import org.apache.beam.sdk.values.PCollection; -import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; -import org.checkerframework.checker.nullness.qual.Nullable; - -/** - * A {@link PTransform} to convert {@link Row} to {@link PubsubMessage} with JSON/AVRO payload. - * - *

    Currently only supports writing a flat schema into a JSON/AVRO payload. This means that all - * Row field values are written to the {@link PubsubMessage} payload, except for {@code - * event_timestamp}, which is either ignored or written to the message attributes, depending on - * whether config.getValue("timestampAttributeKey") is set. - */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) -class RowToPubsubMessage extends PTransform, PCollection> { - private final boolean useTimestampAttribute; - private final PayloadFormat payloadFormat; - private final @Nullable Schema payloadSchema; - - private RowToPubsubMessage( - boolean useTimestampAttribute, PayloadFormat payloadFormat, @Nullable Schema schema) { - this.useTimestampAttribute = useTimestampAttribute; - this.payloadFormat = payloadFormat; - this.payloadSchema = schema == null ? null : stripFromTimestampField(schema); - } - - public static RowToPubsubMessage of(boolean useTimestampAttribute, PayloadFormat payloadFormat) { - return new RowToPubsubMessage(useTimestampAttribute, payloadFormat, null); - } - - public static RowToPubsubMessage of( - boolean useTimestampAttribute, PayloadFormat payloadFormat, Schema schema) { - return new RowToPubsubMessage(useTimestampAttribute, payloadFormat, schema); - } - - @Override - public PCollection expand(PCollection input) { - PCollection withTimestamp = - useTimestampAttribute - ? input.apply(WithTimestamps.of((row) -> row.getDateTime(TIMESTAMP_FIELD).toInstant())) - : input; - - withTimestamp = withTimestamp.apply(DropFields.fields(TIMESTAMP_FIELD)); - switch (payloadFormat) { - case JSON: - return withTimestamp - .apply("MapRowToJsonString", ToJson.of()) - .apply("MapToJsonBytes", MapElements.via(new StringToBytes())) - .apply("MapToPubsubMessage", MapElements.via(new ToPubsubMessage())); - case AVRO: - return withTimestamp - .apply( - "MapRowToAvroBytes", - MapElements.via(AvroUtils.getRowToAvroBytesFunction(payloadSchema))) - .apply("MapToPubsubMessage", MapElements.via(new ToPubsubMessage())); - default: - throw new IllegalArgumentException("Unsupported payload format: " + payloadFormat); - } - } - - private static class StringToBytes extends SimpleFunction { - @Override - public byte[] apply(String s) { - return s.getBytes(UTF_8); - } - } - - private static class ToPubsubMessage extends SimpleFunction { - @Override - public PubsubMessage apply(byte[] bytes) { - return new PubsubMessage(bytes, ImmutableMap.of()); - } - } - - private Schema stripFromTimestampField(Schema schema) { - List selectedFields = - schema.getFields().stream() - .filter(field -> !TIMESTAMP_FIELD.equals(field.getName())) - .collect(toList()); - return Schema.of(selectedFields.toArray(new Schema.Field[0])); - } -} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/TestPubsub.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/TestPubsub.java index 2c32c550886a..daa487789700 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/TestPubsub.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/TestPubsub.java @@ -22,6 +22,9 @@ import com.google.api.core.ApiFuture; import com.google.api.core.ApiFutures; +import com.google.api.gax.grpc.GrpcTransportChannel; +import com.google.api.gax.rpc.FixedTransportChannelProvider; +import com.google.api.gax.rpc.TransportChannelProvider; import 
com.google.cloud.pubsub.v1.AckReplyConsumer; import com.google.cloud.pubsub.v1.MessageReceiver; import com.google.cloud.pubsub.v1.Publisher; @@ -32,7 +35,8 @@ import com.google.cloud.pubsub.v1.TopicAdminSettings; import com.google.protobuf.ByteString; import com.google.pubsub.v1.PushConfig; -import com.google.pubsub.v1.Subscription; +import io.grpc.ManagedChannel; +import io.grpc.ManagedChannelBuilder; import java.io.IOException; import java.util.List; import java.util.concurrent.BlockingQueue; @@ -43,6 +47,7 @@ import java.util.stream.Collectors; import org.apache.beam.sdk.io.gcp.pubsub.PubsubClient.SubscriptionPath; import org.apache.beam.sdk.io.gcp.pubsub.PubsubClient.TopicPath; +import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.testing.TestPipelineOptions; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; @@ -78,25 +83,38 @@ public class TestPubsub implements TestRule { private final TestPubsubOptions pipelineOptions; private final String pubsubEndpoint; + private final boolean isLocalhost; private @Nullable TopicAdminClient topicAdmin = null; private @Nullable SubscriptionAdminClient subscriptionAdmin = null; private @Nullable TopicPath eventsTopicPath = null; private @Nullable SubscriptionPath subscriptionPath = null; + private @Nullable ManagedChannel channel = null; + private @Nullable TransportChannelProvider channelProvider = null; /** - * Creates an instance of this rule. + * Creates an instance of this rule using options provided by {@link + * TestPipeline#testingPipelineOptions()}. * *
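 * Typically declared as a JUnit {@code @Rule} alongside {@link TestPipeline} (an assumed usage
 * pattern), so each test gets a temporary topic and subscription that are created before the
 * test runs and deleted afterwards.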

    Loads GCP configuration from {@link TestPipelineOptions}. */ public static TestPubsub create() { - TestPubsubOptions options = TestPipeline.testingPipelineOptions().as(TestPubsubOptions.class); - return new TestPubsub(options); + return fromOptions(TestPipeline.testingPipelineOptions()); + } + + /** + * Creates an instance of this rule using provided options. + * + *

    Loads GCP configuration from {@link TestPipelineOptions}. + */ + public static TestPubsub fromOptions(PipelineOptions options) { + return new TestPubsub(options.as(TestPubsubOptions.class)); } private TestPubsub(TestPubsubOptions pipelineOptions) { this.pipelineOptions = pipelineOptions; this.pubsubEndpoint = PubsubOptions.targetForRootUrl(this.pipelineOptions.getPubsubRootUrl()); + this.isLocalhost = this.pubsubEndpoint.startsWith("localhost"); } @Override @@ -125,16 +143,24 @@ public void evaluate() throws Throwable { } private void initializePubsub(Description description) throws IOException { + if (isLocalhost) { + channel = ManagedChannelBuilder.forTarget(pubsubEndpoint).usePlaintext().build(); + } else { + channel = ManagedChannelBuilder.forTarget(pubsubEndpoint).useTransportSecurity().build(); + } + channelProvider = FixedTransportChannelProvider.create(GrpcTransportChannel.create(channel)); topicAdmin = TopicAdminClient.create( TopicAdminSettings.newBuilder() .setCredentialsProvider(pipelineOptions::getGcpCredential) + .setTransportChannelProvider(channelProvider) .setEndpoint(pubsubEndpoint) .build()); subscriptionAdmin = SubscriptionAdminClient.create( SubscriptionAdminSettings.newBuilder() .setCredentialsProvider(pipelineOptions::getGcpCredential) + .setTransportChannelProvider(channelProvider) .setEndpoint(pubsubEndpoint) .build()); TopicPath eventsTopicPathTmp = @@ -142,25 +168,27 @@ private void initializePubsub(Description description) throws IOException { pipelineOptions.getProject(), createTopicName(description, EVENTS_TOPIC_NAME)); topicAdmin.createTopic(eventsTopicPathTmp.getPath()); + // Set this after successful creation; it signals that the topic needs teardown eventsTopicPath = eventsTopicPathTmp; String subscriptionName = topicPath().getName() + "_beam_" + ThreadLocalRandom.current().nextLong(); - subscriptionPath = + SubscriptionPath subscriptionPathTmp = new SubscriptionPath( String.format( "projects/%s/subscriptions/%s", pipelineOptions.getProject(), subscriptionName)); - Subscription subscription = - subscriptionAdmin.createSubscription( - subscriptionPath.getPath(), - topicPath().getPath(), - PushConfig.getDefaultInstance(), - DEFAULT_ACK_DEADLINE_SECONDS); + subscriptionAdmin.createSubscription( + subscriptionPathTmp.getPath(), + topicPath().getPath(), + PushConfig.getDefaultInstance(), + DEFAULT_ACK_DEADLINE_SECONDS); + + subscriptionPath = subscriptionPathTmp; } - private void tearDown() throws IOException { - if (subscriptionAdmin == null || topicAdmin == null) { + private void tearDown() { + if (subscriptionAdmin == null || topicAdmin == null || channel == null) { return; } @@ -178,9 +206,12 @@ private void tearDown() throws IOException { } finally { subscriptionAdmin.close(); topicAdmin.close(); + channel.shutdown(); subscriptionAdmin = null; topicAdmin = null; + channelProvider = null; + channel = null; eventsTopicPath = null; subscriptionPath = null; @@ -244,6 +275,7 @@ public void publish(List messages) { eventPublisher = Publisher.newBuilder(eventsTopicPath.getPath()) .setCredentialsProvider(pipelineOptions::getGcpCredential) + .setChannelProvider(channelProvider) .setEndpoint(pubsubEndpoint) .build(); } catch (IOException e) { @@ -296,6 +328,7 @@ public List waitForNMessages(int n, Duration timeoutDuration) Subscriber subscriber = Subscriber.newBuilder(subscriptionPath.getPath(), receiver) .setCredentialsProvider(pipelineOptions::getGcpCredential) + .setChannelProvider(channelProvider) .setEndpoint(pubsubEndpoint) .build(); 
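// The subscriber reuses the channel provider built in initializePubsub(): a plaintext channel
// when the endpoint is a localhost emulator, a TLS channel otherwise, so the same code path
// works against both the emulator and the production service.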
subscriber.startAsync(); @@ -381,7 +414,7 @@ public void assertSubscriptionEventuallyCreated(String project, Duration timeout DateTime startTime = new DateTime(); int sizeOfSubscriptionList = 0; while (sizeOfSubscriptionList == 0 - && Seconds.secondsBetween(new DateTime(), startTime).getSeconds() + && Seconds.secondsBetween(startTime, new DateTime()).getSeconds() < timeoutDuration.toStandardSeconds().getSeconds()) { // Sleep 1 sec Thread.sleep(1000); diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/TestPubsubSignal.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/TestPubsubSignal.java index c185914c093e..2d93946c6279 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/TestPubsubSignal.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/TestPubsubSignal.java @@ -17,18 +17,23 @@ */ package org.apache.beam.sdk.io.gcp.pubsub; -import static java.util.stream.Collectors.toList; import static org.apache.beam.sdk.io.gcp.pubsub.TestPubsub.createTopicName; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; -import io.grpc.Status; -import io.grpc.StatusRuntimeException; +import com.google.api.gax.rpc.ApiException; +import com.google.cloud.pubsub.v1.AckReplyConsumer; +import com.google.cloud.pubsub.v1.MessageReceiver; +import com.google.cloud.pubsub.v1.Subscriber; +import com.google.cloud.pubsub.v1.SubscriptionAdminClient; +import com.google.cloud.pubsub.v1.SubscriptionAdminSettings; +import com.google.cloud.pubsub.v1.TopicAdminClient; +import com.google.cloud.pubsub.v1.TopicAdminSettings; +import com.google.pubsub.v1.PushConfig; import java.io.IOException; -import java.util.List; import java.util.Set; import java.util.concurrent.ThreadLocalRandom; +import java.util.concurrent.atomic.AtomicReference; import org.apache.beam.sdk.coders.Coder; -import org.apache.beam.sdk.io.gcp.pubsub.PubsubClient.IncomingMessage; import org.apache.beam.sdk.io.gcp.pubsub.PubsubClient.SubscriptionPath; import org.apache.beam.sdk.io.gcp.pubsub.PubsubClient.TopicPath; import org.apache.beam.sdk.state.BagState; @@ -55,6 +60,7 @@ import org.checkerframework.checker.nullness.qual.Nullable; import org.joda.time.DateTime; import org.joda.time.Duration; +import org.joda.time.Seconds; import org.junit.rules.TestRule; import org.junit.runner.Description; import org.junit.runners.model.Statement; @@ -76,12 +82,16 @@ public class TestPubsubSignal implements TestRule { private static final String RESULT_SUCCESS_MESSAGE = "SUCCESS"; private static final String START_TOPIC_NAME = "start"; private static final String START_SIGNAL_MESSAGE = "START SIGNAL"; + private static final Integer DEFAULT_ACK_DEADLINE_SECONDS = 60; private static final String NO_ID_ATTRIBUTE = null; private static final String NO_TIMESTAMP_ATTRIBUTE = null; - PubsubClient pubsub; - private TestPubsubOptions pipelineOptions; + private final TestPubsubOptions pipelineOptions; + private final String pubsubEndpoint; + + private @Nullable TopicAdminClient topicAdmin = null; + private @Nullable SubscriptionAdminClient subscriptionAdmin = null; private @Nullable TopicPath resultTopicPath = null; private @Nullable TopicPath startTopicPath = null; @@ -97,6 +107,7 @@ public static TestPubsubSignal create() { private TestPubsubSignal(TestPubsubOptions pipelineOptions) { this.pipelineOptions = pipelineOptions; + this.pubsubEndpoint = 
PubsubOptions.targetForRootUrl(this.pipelineOptions.getPubsubRootUrl()); } @Override @@ -104,7 +115,7 @@ public Statement apply(Statement base, Description description) { return new Statement() { @Override public void evaluate() throws Throwable { - if (TestPubsubSignal.this.pubsub != null) { + if (topicAdmin != null || subscriptionAdmin != null) { throw new AssertionError( "Pubsub client was not shutdown in previous test. " + "Topic path is'" @@ -125,9 +136,18 @@ public void evaluate() throws Throwable { } private void initializePubsub(Description description) throws IOException { - pubsub = - PubsubGrpcClient.FACTORY.newClient( - NO_TIMESTAMP_ATTRIBUTE, NO_ID_ATTRIBUTE, pipelineOptions); + topicAdmin = + TopicAdminClient.create( + TopicAdminSettings.newBuilder() + .setCredentialsProvider(pipelineOptions::getGcpCredential) + .setEndpoint(pubsubEndpoint) + .build()); + subscriptionAdmin = + SubscriptionAdminClient.create( + SubscriptionAdminSettings.newBuilder() + .setCredentialsProvider(pipelineOptions::getGcpCredential) + .setEndpoint(pubsubEndpoint) + .build()); // Example topic name: // integ-test-TestClassName-testMethodName-2018-12-11-23-32-333--result @@ -138,8 +158,8 @@ private void initializePubsub(Description description) throws IOException { PubsubClient.topicPathFromName( pipelineOptions.getProject(), createTopicName(description, START_TOPIC_NAME)); - pubsub.createTopic(resultTopicPathTmp); - pubsub.createTopic(startTopicPathTmp); + topicAdmin.createTopic(resultTopicPathTmp.getPath()); + topicAdmin.createTopic(startTopicPathTmp.getPath()); // Set these after successful creation; this signals that they need teardown resultTopicPath = resultTopicPathTmp; @@ -147,21 +167,34 @@ private void initializePubsub(Description description) throws IOException { } private void tearDown() throws IOException { - if (pubsub == null) { + if (subscriptionAdmin == null || topicAdmin == null) { return; } try { if (resultTopicPath != null) { - pubsub.deleteTopic(resultTopicPath); + for (String subscriptionPath : + topicAdmin.listTopicSubscriptions(resultTopicPath.getPath()).iterateAll()) { + subscriptionAdmin.deleteSubscription(subscriptionPath); + } + topicAdmin.deleteTopic(resultTopicPath.getPath()); } if (startTopicPath != null) { - pubsub.deleteTopic(startTopicPath); + for (String subscriptionPath : + topicAdmin.listTopicSubscriptions(startTopicPath.getPath()).iterateAll()) { + subscriptionAdmin.deleteSubscription(subscriptionPath); + } + topicAdmin.deleteTopic(startTopicPath.getPath()); } } finally { - pubsub.close(); - pubsub = null; + subscriptionAdmin.close(); + topicAdmin.close(); + + subscriptionAdmin = null; + topicAdmin = null; + resultTopicPath = null; + startTopicPath = null; } } @@ -215,8 +248,11 @@ public Supplier waitForStart(Duration duration) throws IOException { pipelineOptions.getProject(), "start-subscription-" + String.valueOf(ThreadLocalRandom.current().nextLong())); - pubsub.createSubscription( - startTopicPath, startSubscriptionPath, (int) duration.getStandardSeconds()); + subscriptionAdmin.createSubscription( + startSubscriptionPath.getPath(), + startTopicPath.getPath(), + PushConfig.getDefaultInstance(), + (int) duration.getStandardSeconds()); return Suppliers.memoize( () -> { @@ -228,8 +264,8 @@ public Supplier waitForStart(Duration duration) throws IOException { throw new RuntimeException(e); } finally { try { - pubsub.deleteSubscription(startSubscriptionPath); - } catch (IOException e) { + subscriptionAdmin.deleteSubscription(startSubscriptionPath.getPath()); + } catch 
(ApiException e) { LOG.error(String.format("Leaked PubSub subscription '%s'", startSubscriptionPath)); } } @@ -243,14 +279,17 @@ public void waitForSuccess(Duration duration) throws IOException { pipelineOptions.getProject(), "result-subscription-" + String.valueOf(ThreadLocalRandom.current().nextLong())); - pubsub.createSubscription( - resultTopicPath, resultSubscriptionPath, (int) duration.getStandardSeconds()); + subscriptionAdmin.createSubscription( + resultSubscriptionPath.getPath(), + resultTopicPath.getPath(), + PushConfig.getDefaultInstance(), + (int) duration.getStandardSeconds()); String result = pollForResultForDuration(resultSubscriptionPath, duration); try { - pubsub.deleteSubscription(resultSubscriptionPath); - } catch (IOException e) { + subscriptionAdmin.deleteSubscription(resultSubscriptionPath.getPath()); + } catch (ApiException e) { LOG.error(String.format("Leaked PubSub subscription '%s'", resultSubscriptionPath)); } @@ -260,39 +299,51 @@ public void waitForSuccess(Duration duration) throws IOException { } private String pollForResultForDuration( - SubscriptionPath signalSubscriptionPath, Duration duration) throws IOException { + SubscriptionPath signalSubscriptionPath, Duration timeoutDuration) throws IOException { - List signal = null; - DateTime endPolling = DateTime.now().plus(duration.getMillis()); + AtomicReference result = new AtomicReference<>(null); - do { + MessageReceiver receiver = + (com.google.pubsub.v1.PubsubMessage message, AckReplyConsumer replyConsumer) -> { + LOG.info("Received message: {}", message.getData().toStringUtf8()); + // Ignore empty messages + if (message.getData().isEmpty()) { + replyConsumer.ack(); + } + if (result.compareAndSet(null, message.getData().toStringUtf8())) { + replyConsumer.ack(); + } else { + replyConsumer.nack(); + } + }; + + Subscriber subscriber = + Subscriber.newBuilder(signalSubscriptionPath.getPath(), receiver) + .setCredentialsProvider(pipelineOptions::getGcpCredential) + .setEndpoint(pubsubEndpoint) + .build(); + subscriber.startAsync(); + + DateTime startTime = new DateTime(); + int timeoutSeconds = timeoutDuration.toStandardSeconds().getSeconds(); + while (result.get() == null + && Seconds.secondsBetween(startTime, new DateTime()).getSeconds() < timeoutSeconds) { try { - signal = pubsub.pull(DateTime.now().getMillis(), signalSubscriptionPath, 1, false); - if (signal.isEmpty()) { - continue; - } - pubsub.acknowledge( - signalSubscriptionPath, signal.stream().map(IncomingMessage::ackId).collect(toList())); - break; - } catch (StatusRuntimeException e) { - if (!Status.DEADLINE_EXCEEDED.equals(e.getStatus())) { - LOG.warn( - "(Will retry) Error while polling {} for signal: {}", - signalSubscriptionPath, - e.getStatus()); - } - sleep(500); + Thread.sleep(1000); + } catch (InterruptedException ignored) { } - } while (DateTime.now().isBefore(endPolling)); + } + + subscriber.stopAsync(); + subscriber.awaitTerminated(); - if (signal == null || signal.isEmpty()) { + if (result.get() == null) { throw new AssertionError( String.format( "Did not receive signal on %s in %ss", - signalSubscriptionPath, duration.getStandardSeconds())); + signalSubscriptionPath, timeoutDuration.getStandardSeconds())); } - - return signal.get(0).message().getData().toStringUtf8(); + return result.get(); } private void sleep(long t) { @@ -389,9 +440,11 @@ public void processElement( // check if all elements seen so far satisfy the success predicate try { if (successPredicate.apply(eventsSoFar)) { + LOG.info("Predicate has been satisfied. 
Sending SUCCESS message."); context.output("SUCCESS"); } } catch (Throwable e) { + LOG.error("Error while applying predicate.", e); context.output("FAILURE: " + e.getMessage()); } } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/CloudPubsubChecks.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/CloudPubsubChecks.java deleted file mode 100644 index 6dc15166666a..000000000000 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/CloudPubsubChecks.java +++ /dev/null @@ -1,51 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package org.apache.beam.sdk.io.gcp.pubsublite; - -import static com.google.cloud.pubsublite.cloudpubsub.MessageTransforms.toCpsPublishTransformer; - -import com.google.cloud.pubsublite.Message; -import com.google.cloud.pubsublite.proto.PubSubMessage; -import org.apache.beam.sdk.transforms.MapElements; -import org.apache.beam.sdk.transforms.PTransform; -import org.apache.beam.sdk.values.PCollection; -import org.apache.beam.sdk.values.TypeDescriptor; - -/** - * A class providing a conversion validity check between Cloud Pub/Sub and Pub/Sub Lite message - * types. - */ -public final class CloudPubsubChecks { - private CloudPubsubChecks() {} - - /** - * Ensure that all messages that pass through can be converted to Cloud Pub/Sub messages using the - * standard transformation methods in the client library. - * - *

    Will fail the pipeline if a message has multiple attributes per key. - */ - public static PTransform, PCollection> - ensureUsableAsCloudPubsub() { - return MapElements.into(TypeDescriptor.of(PubSubMessage.class)) - .via( - message -> { - Object unused = toCpsPublishTransformer().transform(Message.fromProto(message)); - return message; - }); - } -} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/CloudPubsubTransforms.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/CloudPubsubTransforms.java new file mode 100644 index 000000000000..1140c11c2767 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/CloudPubsubTransforms.java @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.pubsublite; + +import static com.google.cloud.pubsublite.cloudpubsub.MessageTransforms.fromCpsPublishTransformer; +import static com.google.cloud.pubsublite.cloudpubsub.MessageTransforms.toCpsPublishTransformer; +import static com.google.cloud.pubsublite.cloudpubsub.MessageTransforms.toCpsSubscribeTransformer; + +import com.google.cloud.pubsublite.Message; +import com.google.cloud.pubsublite.cloudpubsub.KeyExtractor; +import com.google.cloud.pubsublite.proto.PubSubMessage; +import com.google.cloud.pubsublite.proto.SequencedMessage; +import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage; +import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessages; +import org.apache.beam.sdk.transforms.MapElements; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.TypeDescriptor; + +/** A class providing transforms between Cloud Pub/Sub and Pub/Sub Lite message types. */ +public final class CloudPubsubTransforms { + private CloudPubsubTransforms() {} + /** + * Ensure that all messages that pass through can be converted to Cloud Pub/Sub messages using the + * standard transformation methods in the client library. + * + *

    Will fail the pipeline if a message has multiple attributes per key. + */ + public static PTransform, PCollection> + ensureUsableAsCloudPubsub() { + return new PTransform, PCollection>() { + @Override + public PCollection expand(PCollection input) { + return input.apply( + MapElements.into(TypeDescriptor.of(PubSubMessage.class)) + .via( + message -> { + Object unused = + toCpsPublishTransformer().transform(Message.fromProto(message)); + return message; + })); + } + }; + } + + /** + * Transform messages read from Pub/Sub Lite to their equivalent Cloud Pub/Sub Message that would + * have been read from PubsubIO. + * + *

    Will fail the pipeline if a message has multiple attributes per map key. + */ + public static PTransform, PCollection> + toCloudPubsubMessages() { + return new PTransform, PCollection>() { + @Override + public PCollection expand(PCollection input) { + return input.apply( + MapElements.into(TypeDescriptor.of(PubsubMessage.class)) + .via( + message -> + PubsubMessages.fromProto( + toCpsSubscribeTransformer() + .transform( + com.google.cloud.pubsublite.SequencedMessage.fromProto( + message))))); + } + }; + } + + /** + * Transform messages publishable using PubsubIO to their equivalent Pub/Sub Lite publishable + * message. + */ + public static PTransform, PCollection> + fromCloudPubsubMessages() { + return new PTransform, PCollection>() { + @Override + public PCollection expand(PCollection input) { + return input.apply( + MapElements.into(TypeDescriptor.of(PubSubMessage.class)) + .via( + message -> + fromCpsPublishTransformer(KeyExtractor.DEFAULT) + .transform(PubsubMessages.toProto(message)) + .toProto())); + } + }; + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/DlqProvider.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/DlqProvider.java new file mode 100644 index 000000000000..987419d6fc88 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/DlqProvider.java @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.pubsublite; + +import com.google.auto.service.AutoService; +import com.google.cloud.pubsublite.TopicPath; +import com.google.cloud.pubsublite.proto.AttributeValues; +import com.google.cloud.pubsublite.proto.PubSubMessage; +import com.google.protobuf.ByteString; +import org.apache.beam.sdk.annotations.Internal; +import org.apache.beam.sdk.schemas.io.Failure; +import org.apache.beam.sdk.schemas.io.GenericDlqProvider; +import org.apache.beam.sdk.transforms.MapElements; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PDone; +import org.apache.beam.sdk.values.TypeDescriptor; + +@Internal +@AutoService(GenericDlqProvider.class) +public class DlqProvider implements GenericDlqProvider { + @Override + public String identifier() { + return "pubsublite"; + } + + @Override + public PTransform, PDone> newDlqTransform(String config) { + return new DlqTransform(TopicPath.parse(config)); + } + + private static class DlqTransform extends PTransform, PDone> { + private final TopicPath topic; + + DlqTransform(TopicPath topic) { + this.topic = topic; + } + + @Override + public PDone expand(PCollection input) { + return input + .apply( + "Failure to PubSubMessage", + MapElements.into(TypeDescriptor.of(PubSubMessage.class)) + .via(DlqTransform::getMessage)) + .apply( + "Write Failures to Pub/Sub Lite", + PubsubLiteIO.write(PublisherOptions.newBuilder().setTopicPath(topic).build())); + } + + private static PubSubMessage getMessage(Failure failure) { + PubSubMessage.Builder builder = PubSubMessage.newBuilder(); + builder.putAttributes( + "beam-dlq-error", + AttributeValues.newBuilder() + .addValues(ByteString.copyFromUtf8(failure.getError())) + .build()); + builder.setData(ByteString.copyFrom(failure.getPayload())); + return builder.build(); + } + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/OffsetFinalizer.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/InitialOffsetReader.java similarity index 75% rename from sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/OffsetFinalizer.java rename to sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/InitialOffsetReader.java index 532d3f088db2..0cbafaf369b9 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/OffsetFinalizer.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/InitialOffsetReader.java @@ -17,12 +17,12 @@ */ package org.apache.beam.sdk.io.gcp.pubsublite; +import com.google.api.gax.rpc.ApiException; import com.google.cloud.pubsublite.Offset; -import com.google.cloud.pubsublite.Partition; -import com.google.cloud.pubsublite.internal.CheckedApiException; -import java.util.Map; -/** An internal interface for finalizing offsets. 
*/ -interface OffsetFinalizer { - void finalizeOffsets(Map offsets) throws CheckedApiException; +interface InitialOffsetReader extends AutoCloseable { + Offset read() throws ApiException; + + @Override + void close(); } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/InitialOffsetReaderImpl.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/InitialOffsetReaderImpl.java new file mode 100644 index 000000000000..c97c7aa1f32d --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/InitialOffsetReaderImpl.java @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.pubsublite; + +import static com.google.cloud.pubsublite.internal.ExtractStatus.toCanonical; + +import com.google.api.gax.rpc.ApiException; +import com.google.cloud.pubsublite.Offset; +import com.google.cloud.pubsublite.Partition; +import com.google.cloud.pubsublite.SubscriptionPath; +import com.google.cloud.pubsublite.internal.CursorClient; +import java.util.Map; + +class InitialOffsetReaderImpl implements InitialOffsetReader { + private final CursorClient client; + private final SubscriptionPath subscription; + private final Partition partition; + + InitialOffsetReaderImpl(CursorClient client, SubscriptionPath subscription, Partition partition) { + this.client = client; + this.subscription = subscription; + this.partition = partition; + } + + @Override + public Offset read() throws ApiException { + try { + Map results = client.listPartitionCursors(subscription).get(); + return results.getOrDefault(partition, Offset.of(0)); + } catch (Throwable t) { + throw toCanonical(t).underlying; + } + } + + @Override + public void close() { + client.close(); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/LimitingTopicBacklogReader.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/LimitingTopicBacklogReader.java new file mode 100644 index 000000000000..1e10496eed5b --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/LimitingTopicBacklogReader.java @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.pubsublite; + +import static com.google.cloud.pubsublite.internal.ExtractStatus.toCanonical; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.api.gax.rpc.ApiException; +import com.google.cloud.pubsublite.Offset; +import com.google.cloud.pubsublite.proto.ComputeMessageStatsResponse; +import com.google.errorprone.annotations.concurrent.GuardedBy; +import java.util.concurrent.TimeUnit; +import javax.annotation.Nullable; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Ticker; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.CacheBuilder; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.CacheLoader; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LoadingCache; + +final class LimitingTopicBacklogReader implements TopicBacklogReader { + private final TopicBacklogReader underlying; + private final LoadingCache backlogCache; + + @GuardedBy("this") + @Nullable + private Offset currentRequestOffset = null; + + @SuppressWarnings("method.invocation.invalid") + LimitingTopicBacklogReader(TopicBacklogReader underlying, Ticker ticker) { + this.underlying = underlying; + backlogCache = + CacheBuilder.newBuilder() + .ticker(ticker) + .maximumSize(1) + .expireAfterWrite(1, TimeUnit.MINUTES) + .refreshAfterWrite(10, TimeUnit.SECONDS) + .build( + new CacheLoader() { + @Override + public ComputeMessageStatsResponse load(String val) { + return loadFromUnderlying(); + } + }); + } + + @SuppressWarnings("argument.type.incompatible") + private synchronized ComputeMessageStatsResponse loadFromUnderlying() { + return underlying.computeMessageStats(checkNotNull(currentRequestOffset)); + } + + @Override + public synchronized ComputeMessageStatsResponse computeMessageStats(Offset offset) + throws ApiException { + currentRequestOffset = offset; + try { + // There is only a single entry in the cache. + return backlogCache.get("cache"); + } catch (Throwable t) { + throw toCanonical(t).underlying; + } + } + + @Override + public void close() { + underlying.close(); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/ManagedBacklogReaderFactory.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/ManagedBacklogReaderFactory.java new file mode 100644 index 000000000000..de0cf433ff33 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/ManagedBacklogReaderFactory.java @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.pubsublite; + +import java.io.Serializable; + +/** + * A ManagedBacklogReaderFactory produces TopicBacklogReaders and tears down any produced readers + * when it is itself closed. + * + *
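 * The factory is Serializable so a single instance can be captured by the splittable DoFn and
 * closed once in its Teardown.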

    close() should never be called on produced readers. + */ +public interface ManagedBacklogReaderFactory extends AutoCloseable, Serializable { + TopicBacklogReader newReader(SubscriptionPartition subscriptionPartition); + + @Override + void close(); +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/ManagedBacklogReaderFactoryImpl.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/ManagedBacklogReaderFactoryImpl.java new file mode 100644 index 000000000000..9a337bfdb784 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/ManagedBacklogReaderFactoryImpl.java @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.pubsublite; + +import com.google.api.gax.rpc.ApiException; +import com.google.cloud.pubsublite.Offset; +import com.google.cloud.pubsublite.proto.ComputeMessageStatsResponse; +import java.util.HashMap; +import java.util.Map; +import javax.annotation.concurrent.GuardedBy; +import org.apache.beam.sdk.transforms.SerializableFunction; + +public class ManagedBacklogReaderFactoryImpl implements ManagedBacklogReaderFactory { + private final SerializableFunction newReader; + + @GuardedBy("this") + private final Map readers = new HashMap<>(); + + ManagedBacklogReaderFactoryImpl( + SerializableFunction newReader) { + this.newReader = newReader; + } + + private static final class NonCloseableTopicBacklogReader implements TopicBacklogReader { + private final TopicBacklogReader underlying; + + NonCloseableTopicBacklogReader(TopicBacklogReader underlying) { + this.underlying = underlying; + } + + @Override + public ComputeMessageStatsResponse computeMessageStats(Offset offset) throws ApiException { + return underlying.computeMessageStats(offset); + } + + @Override + public void close() { + throw new IllegalArgumentException( + "Cannot call close() on a reader returned from ManagedBacklogReaderFactory."); + } + } + + @Override + public synchronized TopicBacklogReader newReader(SubscriptionPartition subscriptionPartition) { + return new NonCloseableTopicBacklogReader( + readers.computeIfAbsent(subscriptionPartition, newReader::apply)); + } + + @Override + public synchronized void close() { + readers.values().forEach(TopicBacklogReader::close); + } +} diff --git a/runners/flink/1.8/src/main/java/org/apache/beam/runners/flink/FlinkCapabilities.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/OffsetByteProgress.java similarity index 61% rename from runners/flink/1.8/src/main/java/org/apache/beam/runners/flink/FlinkCapabilities.java rename to 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/OffsetByteProgress.java index e1d2a44f9597..2572909a1e91 100644 --- a/runners/flink/1.8/src/main/java/org/apache/beam/runners/flink/FlinkCapabilities.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/OffsetByteProgress.java @@ -15,20 +15,19 @@ * See the License for the specific language governing permissions and * limitations under the License. */ -package org.apache.beam.runners.flink; +package org.apache.beam.sdk.io.gcp.pubsublite; -/** Handle different capabilities between flink versions. */ -public class FlinkCapabilities { +import com.google.auto.value.AutoValue; +import com.google.cloud.pubsublite.Offset; - /** - * Support for outputting elements in close method of chained drivers. - * - *

    {@see FLINK-14709} for more - * details. - * - * @return True if feature is supported. - */ - public static boolean supportsOutputDuringClosing() { - return false; +/** A representation of progress through a Pub/Sub lite partition. */ +@AutoValue +abstract class OffsetByteProgress { + static OffsetByteProgress of(Offset lastOffset, long batchBytes) { + return new AutoValue_OffsetByteProgress(lastOffset, batchBytes); } + /** The last offset of the messages received. */ + abstract Offset lastOffset(); + + abstract long batchBytes(); } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/OffsetByteRange.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/OffsetByteRange.java new file mode 100644 index 000000000000..b39d87e6e1f0 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/OffsetByteRange.java @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.pubsublite; + +import com.google.auto.value.AutoValue; +import org.apache.beam.sdk.coders.DefaultCoder; +import org.apache.beam.sdk.io.range.OffsetRange; + +@AutoValue +@DefaultCoder(OffsetByteRangeCoder.class) +abstract class OffsetByteRange { + abstract OffsetRange getRange(); + + abstract long getByteCount(); + + static OffsetByteRange of(OffsetRange range, long byteCount) { + return new AutoValue_OffsetByteRange(range, byteCount); + } + + static OffsetByteRange of(OffsetRange range) { + return of(range, 0); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/OffsetByteRangeCoder.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/OffsetByteRangeCoder.java new file mode 100644 index 000000000000..076cda13e193 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/OffsetByteRangeCoder.java @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.pubsublite; + +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import org.apache.beam.sdk.coders.AtomicCoder; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.CoderProvider; +import org.apache.beam.sdk.coders.CoderProviders; +import org.apache.beam.sdk.coders.DelegateCoder; +import org.apache.beam.sdk.coders.KvCoder; +import org.apache.beam.sdk.coders.VarLongCoder; +import org.apache.beam.sdk.io.range.OffsetRange; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.TypeDescriptor; + +public class OffsetByteRangeCoder extends AtomicCoder { + private static final Coder CODER = + DelegateCoder.of( + KvCoder.of(OffsetRange.Coder.of(), VarLongCoder.of()), + OffsetByteRangeCoder::toKv, + OffsetByteRangeCoder::fromKv); + + private static KV toKv(OffsetByteRange value) { + return KV.of(value.getRange(), value.getByteCount()); + } + + private static OffsetByteRange fromKv(KV kv) { + return OffsetByteRange.of(kv.getKey(), kv.getValue()); + } + + @Override + public void encode(OffsetByteRange value, OutputStream outStream) throws IOException { + CODER.encode(value, outStream); + } + + @Override + public OffsetByteRange decode(InputStream inStream) throws IOException { + return CODER.decode(inStream); + } + + public static CoderProvider getCoderProvider() { + return CoderProviders.forCoder( + TypeDescriptor.of(OffsetByteRange.class), new OffsetByteRangeCoder()); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/OffsetByteRangeTracker.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/OffsetByteRangeTracker.java new file mode 100644 index 000000000000..da9aaaa03ac3 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/OffsetByteRangeTracker.java @@ -0,0 +1,172 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.pubsublite; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; + +import com.google.cloud.pubsublite.Offset; +import com.google.cloud.pubsublite.proto.ComputeMessageStatsResponse; +import java.util.concurrent.TimeUnit; +import javax.annotation.Nullable; +import org.apache.beam.sdk.io.range.OffsetRange; +import org.apache.beam.sdk.transforms.splittabledofn.SplitResult; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Stopwatch; +import org.joda.time.Duration; + +/** + * OffsetByteRangeTracker is an unbounded restriction tracker for Pub/Sub lite partitions that + * tracks offsets for checkpointing and bytes for progress. + * + *

    Any valid instance of an OffsetByteRangeTracker tracks one of exactly two types of ranges: - + * Unbounded ranges whose last offset is Long.MAX_VALUE - Completed ranges that are either empty + * (From == To) or fully claimed (lastClaimed == To - 1) + * + *

    Also prevents splitting until minTrackingTime has passed or minBytesReceived have been + * received. IMPORTANT: minTrackingTime must be strictly smaller than the SDF read timeout when it + * would return ProcessContinuation.resume(). + */ +class OffsetByteRangeTracker extends TrackerWithProgress { + private final TopicBacklogReader unownedBacklogReader; + private final Duration minTrackingTime; + private final long minBytesReceived; + private final Stopwatch stopwatch; + private OffsetByteRange range; + private @Nullable Long lastClaimed; + + public OffsetByteRangeTracker( + OffsetByteRange range, + TopicBacklogReader unownedBacklogReader, + Stopwatch stopwatch, + Duration minTrackingTime, + long minBytesReceived) { + checkArgument( + range.getRange().getTo() == Long.MAX_VALUE, + "May only construct OffsetByteRangeTracker with an unbounded range with no progress."); + checkArgument( + range.getByteCount() == 0L, + "May only construct OffsetByteRangeTracker with an unbounded range with no progress."); + this.unownedBacklogReader = unownedBacklogReader; + this.minTrackingTime = minTrackingTime; + this.minBytesReceived = minBytesReceived; + this.stopwatch = stopwatch.reset().start(); + this.range = range; + } + + @Override + public IsBounded isBounded() { + return IsBounded.UNBOUNDED; + } + + @Override + public boolean tryClaim(OffsetByteProgress position) { + long toClaim = position.lastOffset().value(); + checkArgument( + lastClaimed == null || toClaim > lastClaimed, + "Trying to claim offset %s while last attempted was %s", + position.lastOffset().value(), + lastClaimed); + checkArgument( + toClaim >= range.getRange().getFrom(), + "Trying to claim offset %s before start of the range %s", + toClaim, + range); + // split() has already been called, truncating this range. No more offsets may be claimed. + if (range.getRange().getTo() != Long.MAX_VALUE) { + boolean isRangeEmpty = range.getRange().getTo() == range.getRange().getFrom(); + boolean isValidClosedRange = nextOffset() == range.getRange().getTo(); + checkState( + isRangeEmpty || isValidClosedRange, + "Violated class precondition: offset range improperly split. Please report a beam bug."); + return false; + } + lastClaimed = toClaim; + range = OffsetByteRange.of(range.getRange(), range.getByteCount() + position.batchBytes()); + return true; + } + + @Override + public OffsetByteRange currentRestriction() { + return range; + } + + private long nextOffset() { + checkState(lastClaimed == null || lastClaimed < Long.MAX_VALUE); + return lastClaimed == null ? currentRestriction().getRange().getFrom() : lastClaimed + 1; + } + + /** + * Whether the tracker has received enough data/been running for enough time that it can + * checkpoint and be confident it can get sufficient throughput. + */ + private boolean receivedEnough() { + Duration duration = Duration.millis(stopwatch.elapsed(TimeUnit.MILLISECONDS)); + if (duration.isLongerThan(minTrackingTime)) { + return true; + } + if (currentRestriction().getByteCount() >= minBytesReceived) { + return true; + } + return false; + } + + @Override + public @Nullable SplitResult trySplit(double fractionOfRemainder) { + // Cannot split a bounded range. This should already be completely claimed. 
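// A successful split below truncates the primary to [from, nextOffset()), the offsets already
// claimed, and hands back an unbounded residual [nextOffset(), Long.MAX_VALUE) with a zero byte
// count; tryClaim() then refuses any further claims against the truncated primary.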
+ if (range.getRange().getTo() != Long.MAX_VALUE) { + return null; + } + if (!receivedEnough()) { + return null; + } + range = + OffsetByteRange.of( + new OffsetRange(currentRestriction().getRange().getFrom(), nextOffset()), + range.getByteCount()); + return SplitResult.of( + this.range, OffsetByteRange.of(new OffsetRange(nextOffset(), Long.MAX_VALUE), 0)); + } + + @Override + @SuppressWarnings("unboxing.of.nullable") + public void checkDone() throws IllegalStateException { + if (range.getRange().getFrom() == range.getRange().getTo()) { + return; + } + checkState( + lastClaimed != null, + "Last attempted offset should not be null. No work was claimed in non-empty range %s.", + range); + long lastClaimedNotNull = checkNotNull(lastClaimed); + checkState( + lastClaimedNotNull >= range.getRange().getTo() - 1, + "Last attempted offset was %s in range %s, claiming work in [%s, %s) was not attempted", + lastClaimedNotNull, + range, + lastClaimedNotNull + 1, + range.getRange().getTo()); + } + + @Override + public Progress getProgress() { + ComputeMessageStatsResponse stats = + this.unownedBacklogReader.computeMessageStats(Offset.of(nextOffset())); + return Progress.from(range.getByteCount(), stats.getMessageBytes()); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/OffsetCheckpointMark.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/OffsetCheckpointMark.java deleted file mode 100644 index e78902df9826..000000000000 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/OffsetCheckpointMark.java +++ /dev/null @@ -1,74 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package org.apache.beam.sdk.io.gcp.pubsublite; - -import com.google.cloud.pubsublite.Offset; -import com.google.cloud.pubsublite.Partition; -import com.google.cloud.pubsublite.internal.CheckedApiException; -import java.io.IOException; -import java.util.Map; -import java.util.Optional; -import org.apache.beam.sdk.coders.BigEndianLongCoder; -import org.apache.beam.sdk.coders.Coder; -import org.apache.beam.sdk.coders.DelegateCoder; -import org.apache.beam.sdk.coders.MapCoder; -import org.apache.beam.sdk.io.UnboundedSource.CheckpointMark; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; - -/** A CheckpointMark holding a map from partition numbers to the checkpointed offset. 
*/ -class OffsetCheckpointMark implements CheckpointMark { - private final Optional finalizer; - final Map partitionOffsetMap; - - OffsetCheckpointMark(OffsetFinalizer finalizer, Map partitionOffsetMap) { - this.finalizer = Optional.of(finalizer); - this.partitionOffsetMap = partitionOffsetMap; - } - - private OffsetCheckpointMark(Map encodedMap) { - ImmutableMap.Builder builder = ImmutableMap.builder(); - for (Map.Entry entry : encodedMap.entrySet()) { - builder.put(Partition.of(entry.getKey()), Offset.of(entry.getValue())); - } - finalizer = Optional.empty(); - partitionOffsetMap = builder.build(); - } - - @Override - public void finalizeCheckpoint() throws IOException { - if (!finalizer.isPresent()) { - return; - } - try { - finalizer.get().finalizeOffsets(partitionOffsetMap); - } catch (CheckedApiException e) { - throw new IOException(e); - } - } - - static Coder getCoder() { - return DelegateCoder.of( - MapCoder.of(BigEndianLongCoder.of(), BigEndianLongCoder.of()), - (OffsetCheckpointMark mark) -> { - ImmutableMap.Builder builder = ImmutableMap.builder(); - mark.partitionOffsetMap.forEach((key, value) -> builder.put(key.value(), value.value())); - return builder.build(); - }, - OffsetCheckpointMark::new); - } -} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PerServerPublisherCache.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PerServerPublisherCache.java index 623e20c09b45..d7526d88e089 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PerServerPublisherCache.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PerServerPublisherCache.java @@ -27,4 +27,8 @@ final class PerServerPublisherCache { private PerServerPublisherCache() {} static final PublisherCache PUBLISHER_CACHE = new PublisherCache(); + + static { + Runtime.getRuntime().addShutdownHook(new Thread(PUBLISHER_CACHE::close)); + } } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PerSubscriptionPartitionSdf.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PerSubscriptionPartitionSdf.java new file mode 100644 index 000000000000..fdf792029863 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PerSubscriptionPartitionSdf.java @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.pubsublite; + +import static com.google.cloud.pubsublite.internal.wire.ApiServiceUtils.blockingShutdown; + +import com.google.cloud.pubsublite.Offset; +import com.google.cloud.pubsublite.internal.ExtractStatus; +import com.google.cloud.pubsublite.internal.wire.Committer; +import com.google.cloud.pubsublite.proto.SequencedMessage; +import org.apache.beam.sdk.io.range.OffsetRange; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.SerializableBiFunction; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker; +import org.apache.beam.sdk.transforms.splittabledofn.WatermarkEstimators.MonotonicallyIncreasing; +import org.joda.time.Duration; +import org.joda.time.Instant; + +class PerSubscriptionPartitionSdf extends DoFn { + private final Duration maxSleepTime; + private final ManagedBacklogReaderFactory backlogReaderFactory; + private final SubscriptionPartitionProcessorFactory processorFactory; + private final SerializableFunction + offsetReaderFactory; + private final SerializableBiFunction + trackerFactory; + private final SerializableFunction committerFactory; + + PerSubscriptionPartitionSdf( + Duration maxSleepTime, + ManagedBacklogReaderFactory backlogReaderFactory, + SerializableFunction offsetReaderFactory, + SerializableBiFunction + trackerFactory, + SubscriptionPartitionProcessorFactory processorFactory, + SerializableFunction committerFactory) { + this.maxSleepTime = maxSleepTime; + this.backlogReaderFactory = backlogReaderFactory; + this.processorFactory = processorFactory; + this.offsetReaderFactory = offsetReaderFactory; + this.trackerFactory = trackerFactory; + this.committerFactory = committerFactory; + } + + @Teardown + public void teardown() { + backlogReaderFactory.close(); + } + + @GetInitialWatermarkEstimatorState + public Instant getInitialWatermarkState() { + return Instant.EPOCH; + } + + @NewWatermarkEstimator + public MonotonicallyIncreasing newWatermarkEstimator(@WatermarkEstimatorState Instant state) { + return new MonotonicallyIncreasing(state); + } + + @ProcessElement + public ProcessContinuation processElement( + RestrictionTracker tracker, + @Element SubscriptionPartition subscriptionPartition, + OutputReceiver receiver) + throws Exception { + try (SubscriptionPartitionProcessor processor = + processorFactory.newProcessor(subscriptionPartition, tracker, receiver)) { + processor.start(); + ProcessContinuation result = processor.waitForCompletion(maxSleepTime); + processor + .lastClaimed() + .ifPresent( + lastClaimedOffset -> { + Committer committer = committerFactory.apply(subscriptionPartition); + committer.startAsync().awaitRunning(); + // Commit the next-to-deliver offset. 
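// A Pub/Sub Lite cursor names the next offset to deliver rather than the last offset processed,
// hence committing lastClaimedOffset + 1.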
+ try { + committer.commitOffset(Offset.of(lastClaimedOffset.value() + 1)).get(); + } catch (Exception e) { + throw ExtractStatus.toCanonical(e).underlying; + } + blockingShutdown(committer); + }); + return result; + } + } + + @GetInitialRestriction + public OffsetByteRange getInitialRestriction( + @Element SubscriptionPartition subscriptionPartition) { + try (InitialOffsetReader reader = offsetReaderFactory.apply(subscriptionPartition)) { + Offset offset = reader.read(); + return OffsetByteRange.of( + new OffsetRange(offset.value(), Long.MAX_VALUE /* open interval */)); + } + } + + @NewTracker + public TrackerWithProgress newTracker( + @Element SubscriptionPartition subscriptionPartition, @Restriction OffsetByteRange range) { + return trackerFactory.apply(backlogReaderFactory.newReader(subscriptionPartition), range); + } + + @GetSize + public double getSize( + @Element SubscriptionPartition subscriptionPartition, + @Restriction OffsetByteRange restriction) { + if (restriction.getRange().getTo() != Long.MAX_VALUE) { + return restriction.getByteCount(); + } + return newTracker(subscriptionPartition, restriction).getProgress().getWorkRemaining(); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PublisherCache.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PublisherCache.java index b800d731b440..3dbdec69db99 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PublisherCache.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PublisherCache.java @@ -22,53 +22,51 @@ import com.google.api.core.ApiService.Listener; import com.google.api.core.ApiService.State; import com.google.api.gax.rpc.ApiException; -import com.google.cloud.pubsublite.PublishMetadata; -import com.google.cloud.pubsublite.internal.CloseableMonitor; +import com.google.cloud.pubsublite.MessageMetadata; import com.google.cloud.pubsublite.internal.Publisher; +import com.google.cloud.pubsublite.internal.wire.SystemExecutors; import com.google.errorprone.annotations.concurrent.GuardedBy; import java.util.HashMap; -import java.util.concurrent.Executor; -import java.util.concurrent.Executors; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; /** A map of working publishers by PublisherOptions. 
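PerSubscriptionPartitionSdf above is a splittable DoFn: it derives an initial restriction from the committed cursor, wraps it in a tracker, claims offsets until the runner asks it to yield, and then commits the next-to-deliver offset. The sketch below shows the same restriction/tracker/claim lifecycle on a plain OffsetRange with the built-in OffsetRangeTracker; it is a simplified illustration under those assumptions, not the Pub/Sub Lite implementation.

```java
import org.apache.beam.sdk.io.range.OffsetRange;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker;
import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker;

// Simplified stand-in for PerSubscriptionPartitionSdf: claim offsets from a restriction,
// emit them, and periodically yield so the runner can checkpoint the restriction.
@DoFn.UnboundedPerElement
class EmitClaimedOffsetsFn extends DoFn<String, Long> {

  @GetInitialRestriction
  public OffsetRange initialRestriction(@Element String element) {
    // The real SDF reads the start offset from the committed cursor; here we start at zero
    // and leave the range open-ended.
    return new OffsetRange(0, Long.MAX_VALUE);
  }

  @NewTracker
  public OffsetRangeTracker newTracker(@Restriction OffsetRange range) {
    return new OffsetRangeTracker(range);
  }

  @ProcessElement
  public ProcessContinuation processElement(
      RestrictionTracker<OffsetRange, Long> tracker, OutputReceiver<Long> receiver) {
    long start = tracker.currentRestriction().getFrom();
    for (long offset = start; ; ++offset) {
      if (!tracker.tryClaim(offset)) {
        return ProcessContinuation.stop();
      }
      receiver.output(offset);
      if (offset - start >= 1000) {
        // Yield after a while, much like the real SDF does once maxSleepTime elapses.
        return ProcessContinuation.resume();
      }
    }
  }
}
```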
*/ -class PublisherCache { - private final CloseableMonitor monitor = new CloseableMonitor(); - - private final Executor listenerExecutor = Executors.newSingleThreadExecutor(); - - @GuardedBy("monitor.monitor") - private final HashMap> livePublishers = +class PublisherCache implements AutoCloseable { + @GuardedBy("this") + private final HashMap> livePublishers = new HashMap<>(); - Publisher get(PublisherOptions options) throws ApiException { + private synchronized void evict(PublisherOptions options) { + livePublishers.remove(options); + } + + synchronized Publisher get(PublisherOptions options) throws ApiException { checkArgument(options.usesCache()); - try (CloseableMonitor.Hold h = monitor.enter()) { - Publisher publisher = livePublishers.get(options); - if (publisher != null) { - return publisher; - } - publisher = Publishers.newPublisher(options); - livePublishers.put(options, publisher); - publisher.addListener( - new Listener() { - @Override - public void failed(State s, Throwable t) { - try (CloseableMonitor.Hold h = monitor.enter()) { - livePublishers.remove(options); - } - } - }, - listenerExecutor); - publisher.startAsync(); + Publisher publisher = livePublishers.get(options); + if (publisher != null) { return publisher; } + publisher = Publishers.newPublisher(options); + livePublishers.put(options, publisher); + publisher.addListener( + new Listener() { + @Override + public void failed(State s, Throwable t) { + evict(options); + } + }, + SystemExecutors.getFuturesExecutor()); + publisher.startAsync().awaitRunning(); + return publisher; } @VisibleForTesting - void set(PublisherOptions options, Publisher toCache) { - try (CloseableMonitor.Hold h = monitor.enter()) { - livePublishers.put(options, toCache); - } + synchronized void set(PublisherOptions options, Publisher toCache) { + livePublishers.put(options, toCache); + } + + @Override + public synchronized void close() { + livePublishers.forEach(((options, publisher) -> publisher.stopAsync())); + livePublishers.clear(); } } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PublisherOptions.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PublisherOptions.java index 99d6d02ba29d..1ddd7cce995b 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PublisherOptions.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PublisherOptions.java @@ -34,7 +34,7 @@ public abstract class PublisherOptions implements Serializable { /** * A supplier for the publisher to be used. If enabled, does not use the publisher cache. * - *

<p>The returned type must be convertible to Publisher<PublishMetadata>, but Object is used to + * <p>The returned type must be convertible to Publisher<MessageMetadata>, but Object is used to * prevent adding an api surface dependency on guava when this is not used. */ public abstract @Nullable SerializableSupplier<Object> publisherSupplier(); @@ -59,7 +59,7 @@ public abstract static class Builder { /** * A supplier for the publisher to be used. If enabled, does not use the publisher cache. * - *

<p>The returned type must be convertible to Publisher<PublishMetadata>, but Object is used to + *

    The returned type must be convertible to Publisher, but Object is used to * prevent adding an api surface dependency on guava when this is not used. */ public abstract Builder setPublisherSupplier(SerializableSupplier stubSupplier); diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PublisherOrError.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PublisherOrError.java index 111b84bf8e1a..f0a4aeeeb7d0 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PublisherOrError.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PublisherOrError.java @@ -18,7 +18,7 @@ package org.apache.beam.sdk.io.gcp.pubsublite; import com.google.auto.value.AutoOneOf; -import com.google.cloud.pubsublite.PublishMetadata; +import com.google.cloud.pubsublite.MessageMetadata; import com.google.cloud.pubsublite.internal.CheckedApiException; import com.google.cloud.pubsublite.internal.Publisher; @@ -35,11 +35,11 @@ enum Kind { abstract Kind getKind(); - abstract Publisher publisher(); + abstract Publisher publisher(); abstract CheckedApiException error(); - static PublisherOrError ofPublisher(Publisher p) { + static PublisherOrError ofPublisher(Publisher p) { return AutoOneOf_PublisherOrError.publisher(p); } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/Publishers.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/Publishers.java index c44c3df2edfb..67ea6cf6062d 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/Publishers.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/Publishers.java @@ -17,15 +17,27 @@ */ package org.apache.beam.sdk.io.gcp.pubsublite; +import static com.google.cloud.pubsublite.internal.ExtractStatus.toCanonical; import static com.google.cloud.pubsublite.internal.UncheckedApiPreconditions.checkArgument; +import static com.google.cloud.pubsublite.internal.wire.ServiceClients.addDefaultMetadata; +import static com.google.cloud.pubsublite.internal.wire.ServiceClients.addDefaultSettings; import com.google.api.gax.rpc.ApiException; -import com.google.cloud.pubsublite.PublishMetadata; +import com.google.cloud.pubsublite.AdminClient; +import com.google.cloud.pubsublite.AdminClientSettings; +import com.google.cloud.pubsublite.MessageMetadata; +import com.google.cloud.pubsublite.Partition; +import com.google.cloud.pubsublite.cloudpubsub.PublisherSettings; import com.google.cloud.pubsublite.internal.Publisher; +import com.google.cloud.pubsublite.internal.wire.PartitionCountWatchingPublisherSettings; import com.google.cloud.pubsublite.internal.wire.PubsubContext; import com.google.cloud.pubsublite.internal.wire.PubsubContext.Framework; -import com.google.cloud.pubsublite.internal.wire.RoutingPublisherBuilder; +import com.google.cloud.pubsublite.internal.wire.RoutingMetadata; import com.google.cloud.pubsublite.internal.wire.SinglePartitionPublisherBuilder; +import com.google.cloud.pubsublite.v1.AdminServiceClient; +import com.google.cloud.pubsublite.v1.AdminServiceSettings; +import com.google.cloud.pubsublite.v1.PublisherServiceClient; +import com.google.cloud.pubsublite.v1.PublisherServiceSettings; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.reflect.TypeToken; class Publishers { @@ -33,24 +45,59 @@ class 
Publishers { private Publishers() {} + private static AdminClient newAdminClient(PublisherOptions options) throws ApiException { + try { + return AdminClient.create( + AdminClientSettings.newBuilder() + .setServiceClient( + AdminServiceClient.create( + addDefaultSettings( + options.topicPath().location().extractRegion(), + AdminServiceSettings.newBuilder()))) + .setRegion(options.topicPath().location().extractRegion()) + .build()); + } catch (Throwable t) { + throw toCanonical(t).underlying; + } + } + + private static PublisherServiceClient newServiceClient( + PublisherOptions options, Partition partition) { + PublisherServiceSettings.Builder settingsBuilder = PublisherServiceSettings.newBuilder(); + settingsBuilder = + addDefaultMetadata( + PubsubContext.of(FRAMEWORK), + RoutingMetadata.of(options.topicPath(), partition), + settingsBuilder); + try { + return PublisherServiceClient.create( + addDefaultSettings(options.topicPath().location().extractRegion(), settingsBuilder)); + } catch (Throwable t) { + throw toCanonical(t).underlying; + } + } + @SuppressWarnings("unchecked") - static Publisher newPublisher(PublisherOptions options) throws ApiException { + static Publisher newPublisher(PublisherOptions options) throws ApiException { SerializableSupplier supplier = options.publisherSupplier(); if (supplier != null) { Object supplied = supplier.get(); - TypeToken> token = new TypeToken>() {}; + TypeToken> token = new TypeToken>() {}; checkArgument(token.isSupertypeOf(supplied.getClass())); - return (Publisher) supplied; + return (Publisher) supplied; } - return RoutingPublisherBuilder.newBuilder() + return PartitionCountWatchingPublisherSettings.newBuilder() .setTopic(options.topicPath()) .setPublisherFactory( partition -> SinglePartitionPublisherBuilder.newBuilder() .setTopic(options.topicPath()) .setPartition(partition) - .setContext(PubsubContext.of(FRAMEWORK)) + .setServiceClient(newServiceClient(options, partition)) + .setBatchingSettings(PublisherSettings.DEFAULT_BATCHING_SETTINGS) .build()) - .build(); + .setAdminClient(newAdminClient(options)) + .build() + .instantiate(); } } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteIO.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteIO.java index b52a54d5e10f..b93ac61f33be 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteIO.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteIO.java @@ -20,9 +20,9 @@ import com.google.cloud.pubsublite.proto.PubSubMessage; import com.google.cloud.pubsublite.proto.SequencedMessage; import org.apache.beam.sdk.annotations.Experimental; -import org.apache.beam.sdk.io.Read; import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.PBegin; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PDone; @@ -51,21 +51,13 @@ private PubsubLiteIO() {} * .setName(subscriptionName) * .build(); * - * FlowControlSettings flowControlSettings = - * FlowControlSettings.builder() - * // Set outstanding bytes to 10 MiB per partition. 
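When a publisherSupplier() is present, Publishers.newPublisher receives its result as an Object and uses a Guava TypeToken to verify at runtime that the supplied value really is a Publisher<MessageMetadata> before casting. A small self-contained sketch of that check, using List<String> as a stand-in for the publisher type; the names are illustrative, and plain Guava is shown where Beam uses its vendored copy.

```java
import com.google.common.reflect.TypeToken;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Illustrative only: verify at runtime that an opaquely typed supplier produced the expected
// generic type before casting, the same check Publishers.newPublisher performs with TypeToken.
final class CheckedSupplierExample {
  // A concrete class that fixes the type parameter so TypeToken can resolve it.
  static final class StringList extends ArrayList<String> {}

  @SuppressWarnings("unchecked")
  static List<String> getStringList(Supplier<Object> supplier) {
    Object supplied = supplier.get();
    TypeToken<List<String>> token = new TypeToken<List<String>>() {};
    if (!token.isSupertypeOf(supplied.getClass())) {
      throw new IllegalArgumentException("supplier did not produce a List<String>");
    }
    return (List<String>) supplied;
  }

  public static void main(String[] args) {
    List<String> list = getStringList(StringList::new);
    list.add("ok");
    System.out.println(list);
  }
}
```

The check only succeeds when the supplied object's concrete class binds the type parameter, which is the case for test fakes that implement Publisher<MessageMetadata> directly.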
- * .setBytesOutstanding(10 * 1024 * 1024L) - * .setMessagesOutstanding(Long.MAX_VALUE) - * .build(); - * * PCollection messages = p.apply(PubsubLiteIO.read(SubscriberOptions.newBuilder() * .setSubscriptionPath(subscriptionPath) - * .setFlowControlSettings(flowControlSettings) * .build()), "read"); * } */ - public static Read.Unbounded read(SubscriberOptions options) { - return Read.from(new PubsubLiteUnboundedSource(options)); + public static PTransform> read(SubscriberOptions options) { + return new SubscribeTransform(options); } /** @@ -115,7 +107,7 @@ public static PTransform, PCollection> * } */ public static PTransform, PDone> write(PublisherOptions options) { - return new PTransform, PDone>("PubsubLiteIO") { + return new PTransform, PDone>() { @Override public PDone expand(PCollection input) { PubsubLiteSink sink = new PubsubLiteSink(options); diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteSink.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteSink.java index 1d0a6db671af..d0e3afa2ac07 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteSink.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteSink.java @@ -24,20 +24,18 @@ import com.google.api.core.ApiService.State; import com.google.api.gax.rpc.ApiException; import com.google.cloud.pubsublite.Message; -import com.google.cloud.pubsublite.PublishMetadata; +import com.google.cloud.pubsublite.MessageMetadata; import com.google.cloud.pubsublite.internal.CheckedApiException; import com.google.cloud.pubsublite.internal.ExtractStatus; import com.google.cloud.pubsublite.internal.Publisher; +import com.google.cloud.pubsublite.internal.wire.SystemExecutors; import com.google.cloud.pubsublite.proto.PubSubMessage; import com.google.errorprone.annotations.concurrent.GuardedBy; import java.util.ArrayDeque; import java.util.Deque; -import java.util.concurrent.Executor; -import java.util.concurrent.Executors; import java.util.function.Consumer; import org.apache.beam.sdk.io.gcp.pubsublite.PublisherOrError.Kind; import org.apache.beam.sdk.transforms.DoFn; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.MoreExecutors; /** A sink which publishes messages to Pub/Sub Lite. */ @SuppressWarnings({ @@ -56,15 +54,13 @@ class PubsubLiteSink extends DoFn { @GuardedBy("this") private transient Deque errorsSinceLastFinish; - private static final Executor executor = Executors.newCachedThreadPool(); - PubsubLiteSink(PublisherOptions options) { this.options = options; } @Setup public void setup() throws ApiException { - Publisher publisher; + Publisher publisher; if (options.usesCache()) { publisher = PerServerPublisherCache.PUBLISHER_CACHE.get(options); } else { @@ -89,7 +85,7 @@ public void failed(State s, Throwable t) { onFailure.accept(t); } }, - MoreExecutors.directExecutor()); + SystemExecutors.getFuturesExecutor()); if (!options.usesCache()) { publisher.startAsync(); } @@ -107,7 +103,7 @@ public synchronized void processElement(@Element PubSubMessage message) if (publisherOrError.getKind() == Kind.ERROR) { throw publisherOrError.error(); } - ApiFuture future = + ApiFuture future = publisherOrError.publisher().publish(Message.fromProto(message)); // cannot declare in inner class since 'this' means something different. 
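With this change PubsubLiteIO.read is no longer an UnboundedSource wrapper; it returns a PTransform<PBegin, PCollection<SequencedMessage>> backed by SubscribeTransform, so it is applied directly to the pipeline. A hedged usage sketch; the project, location, and subscription names are placeholders.

```java
import com.google.cloud.pubsublite.SubscriptionPath;
import com.google.cloud.pubsublite.proto.SequencedMessage;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsublite.PubsubLiteIO;
import org.apache.beam.sdk.io.gcp.pubsublite.SubscriberOptions;
import org.apache.beam.sdk.values.PCollection;

// Usage sketch only: the project, location, and subscription names are placeholders.
public class PubsubLiteReadExample {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();
    SubscriberOptions options =
        SubscriberOptions.newBuilder()
            .setSubscriptionPath(
                SubscriptionPath.parse(
                    "projects/my-project/locations/us-central1-a/subscriptions/my-subscription"))
            .build();
    PCollection<SequencedMessage> messages =
        p.apply("ReadPubsubLite", PubsubLiteIO.read(options));
    // ... parse, window, and process messages here ...
    p.run().waitUntilFinish();
  }
}
```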
Consumer onFailure = @@ -119,9 +115,9 @@ public synchronized void processElement(@Element PubSubMessage message) }; ApiFutures.addCallback( future, - new ApiFutureCallback() { + new ApiFutureCallback() { @Override - public void onSuccess(PublishMetadata publishMetadata) { + public void onSuccess(MessageMetadata messageMetadata) { decrementOutstanding(); } @@ -130,7 +126,7 @@ public void onFailure(Throwable t) { onFailure.accept(t); } }, - executor); + SystemExecutors.getFuturesExecutor()); } // Intentionally don't flush on bundle finish to allow multi-sink client reuse. diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteUnboundedReader.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteUnboundedReader.java deleted file mode 100644 index 211e00c2f6fa..000000000000 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteUnboundedReader.java +++ /dev/null @@ -1,333 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
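PubsubLiteSink tracks outstanding publishes by incrementing a counter per message and decrementing it from an ApiFutureCallback, then waiting for the count to reach zero when the bundle finishes. The sketch below reproduces that callback pattern with a SettableApiFuture standing in for a real publish future; the class and names are illustrative.

```java
import com.google.api.core.ApiFutureCallback;
import com.google.api.core.ApiFutures;
import com.google.api.core.SettableApiFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: count an outstanding publish and release it from the callback.
public class OutstandingPublishExample {
  public static void main(String[] args) throws InterruptedException {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    CountDownLatch outstanding = new CountDownLatch(1);

    SettableApiFuture<String> publishResult = SettableApiFuture.create();
    ApiFutures.addCallback(
        publishResult,
        new ApiFutureCallback<String>() {
          @Override
          public void onSuccess(String messageId) {
            outstanding.countDown();
          }

          @Override
          public void onFailure(Throwable t) {
            // The real sink records the error and rethrows it when the bundle finishes.
            outstanding.countDown();
          }
        },
        executor);

    publishResult.set("message-id-0"); // simulate the publish completing
    outstanding.await(); // FinishBundle-style wait until nothing is outstanding
    executor.shutdown();
  }
}
```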
- */ -package org.apache.beam.sdk.io.gcp.pubsublite; - -import com.google.api.core.ApiFuture; -import com.google.api.gax.rpc.ApiException; -import com.google.api.gax.rpc.StatusCode.Code; -import com.google.auto.value.AutoValue; -import com.google.cloud.pubsublite.Offset; -import com.google.cloud.pubsublite.Partition; -import com.google.cloud.pubsublite.internal.CheckedApiException; -import com.google.cloud.pubsublite.internal.CloseableMonitor; -import com.google.cloud.pubsublite.internal.ExtractStatus; -import com.google.cloud.pubsublite.internal.ProxyService; -import com.google.cloud.pubsublite.internal.PullSubscriber; -import com.google.cloud.pubsublite.internal.wire.Committer; -import com.google.cloud.pubsublite.proto.ComputeMessageStatsResponse; -import com.google.cloud.pubsublite.proto.SequencedMessage; -import com.google.errorprone.annotations.concurrent.GuardedBy; -import com.google.protobuf.Timestamp; -import com.google.protobuf.util.Timestamps; -import java.io.IOException; -import java.util.ArrayDeque; -import java.util.ArrayList; -import java.util.Collection; -import java.util.List; -import java.util.Map; -import java.util.NoSuchElementException; -import java.util.Optional; -import java.util.Queue; -import java.util.concurrent.ExecutionException; -import java.util.concurrent.TimeUnit; -import java.util.function.Consumer; -import java.util.stream.Collectors; -import org.apache.beam.sdk.io.UnboundedSource; -import org.apache.beam.sdk.io.UnboundedSource.CheckpointMark; -import org.apache.beam.sdk.io.UnboundedSource.UnboundedReader; -import org.apache.beam.sdk.transforms.windowing.BoundedWindow; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Ticker; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.CacheBuilder; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.CacheLoader; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LoadingCache; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; -import org.joda.time.Instant; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -/** A reader for Pub/Sub Lite that generates a stream of SequencedMessages. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) -class PubsubLiteUnboundedReader extends UnboundedReader - implements OffsetFinalizer { - private static final Logger LOG = LoggerFactory.getLogger(PubsubLiteUnboundedReader.class); - private final UnboundedSource source; - private final TopicBacklogReader backlogReader; - private final LoadingCache backlogCache; - private final CloseableMonitor monitor = new CloseableMonitor(); - - @GuardedBy("monitor.monitor") - private final ImmutableMap subscriberMap; - - private final CommitterProxy committerProxy; - - @GuardedBy("monitor.monitor") - private final Queue messages = new ArrayDeque<>(); - - @GuardedBy("monitor.monitor") - private Optional permanentError = Optional.empty(); - - private static class CommitterProxy extends ProxyService { - private final Consumer permanentErrorSetter; - - CommitterProxy( - Collection states, Consumer permanentErrorSetter) - throws ApiException { - this.permanentErrorSetter = permanentErrorSetter; - addServices(states.stream().map(state -> state.committer).collect(Collectors.toList())); - } - - @Override - protected void start() {} - - @Override - protected void stop() {} - - @Override - protected void handlePermanentError(CheckedApiException error) { - permanentErrorSetter.accept(error); - } - } - - public PubsubLiteUnboundedReader( - UnboundedSource source, - Map subscriberMap, - TopicBacklogReader backlogReader) - throws ApiException { - this(source, subscriberMap, backlogReader, Ticker.systemTicker()); - } - - PubsubLiteUnboundedReader( - UnboundedSource source, - Map subscriberMap, - TopicBacklogReader backlogReader, - Ticker ticker) - throws ApiException { - this.source = source; - this.subscriberMap = ImmutableMap.copyOf(subscriberMap); - this.committerProxy = - new CommitterProxy( - subscriberMap.values(), - error -> { - try (CloseableMonitor.Hold h = monitor.enter()) { - permanentError = Optional.of(permanentError.orElse(error)); - } - }); - this.backlogReader = backlogReader; - this.backlogCache = - CacheBuilder.newBuilder() - .ticker(ticker) - .maximumSize(1) - .expireAfterWrite(1, TimeUnit.MINUTES) - .refreshAfterWrite(10, TimeUnit.SECONDS) - .build( - new CacheLoader() { - @Override - public Long load(Object val) throws InterruptedException, ExecutionException { - return computeSplitBacklog().get().getMessageBytes(); - } - }); - this.committerProxy.startAsync().awaitRunning(); - } - - private ApiFuture computeSplitBacklog() { - ImmutableMap.Builder builder = ImmutableMap.builder(); - try (CloseableMonitor.Hold h = monitor.enter()) { - subscriberMap.forEach( - (partition, subscriberState) -> - subscriberState.lastDelivered.ifPresent(offset -> builder.put(partition, offset))); - } - return backlogReader.computeMessageStats(builder.build()); - } - - @Override - public void finalizeOffsets(Map offsets) throws CheckedApiException { - List> commitFutures = new ArrayList<>(); - try (CloseableMonitor.Hold h = monitor.enter()) { - for (Partition partition : offsets.keySet()) { - if (!subscriberMap.containsKey(partition)) { - throw new CheckedApiException( - String.format( - "Asked to finalize an offset for partition %s which was not managed by this" - + " reader.", - partition), - Code.INVALID_ARGUMENT); - } - commitFutures.add( - subscriberMap.get(partition).committer.commitOffset(offsets.get(partition))); - } - } - // Add outside of monitor in case they are finished inline. 
- commitFutures.forEach( - commitFuture -> - ExtractStatus.addFailureHandler( - commitFuture, - error -> { - try (CloseableMonitor.Hold h = monitor.enter()) { - if (!permanentError.isPresent()) { - permanentError = Optional.of(error); - } - } - })); - } - - static class SubscriberState { - Instant lastDeliveredPublishTimestamp = BoundedWindow.TIMESTAMP_MIN_VALUE; - Optional lastDelivered = Optional.empty(); - PullSubscriber subscriber; - Committer committer; - } - - @AutoValue - abstract static class PartitionedSequencedMessage { - abstract Partition partition(); - - abstract SequencedMessage sequencedMessage(); - - private static PartitionedSequencedMessage of( - Partition partition, SequencedMessage sequencedMessage) { - return new AutoValue_PubsubLiteUnboundedReader_PartitionedSequencedMessage( - partition, sequencedMessage); - } - } - - @Override - public boolean start() throws IOException { - return advance(); - } - - @Override - public boolean advance() throws IOException { - try (CloseableMonitor.Hold h = monitor.enter()) { - if (permanentError.isPresent()) { - throw permanentError.get(); - } - // messages starts empty. This will not remove messages on the first iteration. - if (!messages.isEmpty()) { - PartitionedSequencedMessage unusedMessage = messages.poll(); - } - // Intentionally do this twice: We don't bound the buffer in this class, so we want to flush - // the last pull from the subscribers before pulling new messages. - if (!messages.isEmpty()) { - setLastDelivered(messages.peek()); - return true; - } - pullFromSubscribers(); - if (!messages.isEmpty()) { - setLastDelivered(messages.peek()); - return true; - } - return false; - } catch (CheckedApiException e) { - throw new IOException(e); - } - } - - @GuardedBy("monitor.monitor") - private void setLastDelivered(PartitionedSequencedMessage message) { - SubscriberState state = subscriberMap.get(message.partition()); - state.lastDelivered = - Optional.of(Offset.of(message.sequencedMessage().getCursor().getOffset())); - Timestamp timestamp = message.sequencedMessage().getPublishTime(); - state.lastDeliveredPublishTimestamp = new Instant(Timestamps.toMillis(timestamp)); - } - - @GuardedBy("monitor.monitor") - private void pullFromSubscribers() throws CheckedApiException { - for (Map.Entry entry : subscriberMap.entrySet()) { - for (SequencedMessage message : entry.getValue().subscriber.pull()) { - messages.add(PartitionedSequencedMessage.of(entry.getKey(), message)); - } - } - } - - @Override - public SequencedMessage getCurrent() throws NoSuchElementException { - try (CloseableMonitor.Hold h = monitor.enter()) { - if (messages.isEmpty()) { - throw new NoSuchElementException(); - } - return messages.peek().sequencedMessage(); - } - } - - @Override - public Instant getCurrentTimestamp() throws NoSuchElementException { - try (CloseableMonitor.Hold h = monitor.enter()) { - if (messages.isEmpty()) { - throw new NoSuchElementException(); - } - return new Instant(Timestamps.toMillis(messages.peek().sequencedMessage().getPublishTime())); - } - } - - @Override - public void close() { - try (CloseableMonitor.Hold h = monitor.enter()) { - for (SubscriberState state : subscriberMap.values()) { - try { - state.subscriber.close(); - } catch (Exception e) { - throw new IllegalStateException(e); - } - } - } - committerProxy.stopAsync().awaitTerminated(); - } - - @Override - public Instant getWatermark() { - try (CloseableMonitor.Hold h = monitor.enter()) { - return subscriberMap.values().stream() - .map(state -> 
state.lastDeliveredPublishTimestamp) - .min(Instant::compareTo) - .get(); - } - } - - @Override - public CheckpointMark getCheckpointMark() { - try (CloseableMonitor.Hold h = monitor.enter()) { - ImmutableMap.Builder builder = ImmutableMap.builder(); - subscriberMap.forEach( - (partition, subscriberState) -> - subscriberState.lastDelivered.ifPresent(offset -> builder.put(partition, offset))); - return new OffsetCheckpointMark(this, builder.build()); - } - } - - @Override - public long getSplitBacklogBytes() { - try { - // We use the cache because it allows us to coalesce request, periodically refresh the value - // and expire the value after a maximum staleness, but there is only ever one key. - return backlogCache.get("Backlog"); - } catch (ExecutionException e) { - LOG.warn( - "Failed to retrieve backlog information, reporting the backlog size as UNKNOWN: {}", - e.getCause().getMessage()); - return BACKLOG_UNKNOWN; - } - } - - @Override - public UnboundedSource getCurrentSource() { - return source; - } -} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteUnboundedSource.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteUnboundedSource.java deleted file mode 100644 index 9948a19c0bfd..000000000000 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteUnboundedSource.java +++ /dev/null @@ -1,126 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package org.apache.beam.sdk.io.gcp.pubsublite; - -import static java.lang.Math.min; - -import com.google.cloud.pubsublite.Offset; -import com.google.cloud.pubsublite.Partition; -import com.google.cloud.pubsublite.internal.BufferingPullSubscriber; -import com.google.cloud.pubsublite.proto.Cursor; -import com.google.cloud.pubsublite.proto.SeekRequest; -import com.google.cloud.pubsublite.proto.SequencedMessage; -import java.io.IOException; -import java.util.ArrayList; -import java.util.List; -import java.util.Optional; -import org.apache.beam.sdk.coders.Coder; -import org.apache.beam.sdk.extensions.protobuf.ProtoCoder; -import org.apache.beam.sdk.io.UnboundedSource; -import org.apache.beam.sdk.options.PipelineOptions; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; -import org.checkerframework.checker.nullness.qual.Nullable; - -/** An UnboundedSource of Pub/Sub Lite SequencedMessages. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) -class PubsubLiteUnboundedSource extends UnboundedSource { - private final SubscriberOptions subscriberOptions; - - PubsubLiteUnboundedSource(SubscriberOptions options) { - this.subscriberOptions = options; - } - - @Override - public List> split( - int desiredNumSplits, PipelineOptions options) { - ArrayList> partitionPartitions = - new ArrayList<>(min(desiredNumSplits, subscriberOptions.partitions().size())); - for (int i = 0; i < desiredNumSplits; i++) { - partitionPartitions.add(new ArrayList<>()); - } - int counter = 0; - for (Partition partition : subscriberOptions.partitions()) { - partitionPartitions.get(counter % desiredNumSplits).add(partition); - ++counter; - } - ImmutableList.Builder builder = ImmutableList.builder(); - for (List partitionSubset : partitionPartitions) { - if (partitionSubset.isEmpty()) { - continue; - } - builder.add( - new PubsubLiteUnboundedSource( - subscriberOptions - .toBuilder() - .setPartitions(ImmutableSet.copyOf(partitionSubset)) - .build())); - } - return builder.build(); - } - - @Override - public UnboundedReader createReader( - PipelineOptions options, @Nullable OffsetCheckpointMark checkpointMark) throws IOException { - try { - ImmutableMap.Builder statesBuilder = - ImmutableMap.builder(); - for (Partition partition : subscriberOptions.partitions()) { - PubsubLiteUnboundedReader.SubscriberState state = - new PubsubLiteUnboundedReader.SubscriberState(); - state.committer = subscriberOptions.getCommitter(partition); - if (checkpointMark != null && checkpointMark.partitionOffsetMap.containsKey(partition)) { - Offset checkpointed = checkpointMark.partitionOffsetMap.get(partition); - state.lastDelivered = Optional.of(checkpointed); - state.subscriber = - new TranslatingPullSubscriber( - new BufferingPullSubscriber( - subscriberOptions.getSubscriberFactory(partition), - subscriberOptions.flowControlSettings(), - SeekRequest.newBuilder() - .setCursor(Cursor.newBuilder().setOffset(checkpointed.value())) - .build())); - } else { - state.subscriber = - new TranslatingPullSubscriber( - new BufferingPullSubscriber( - subscriberOptions.getSubscriberFactory(partition), - subscriberOptions.flowControlSettings())); - } - statesBuilder.put(partition, state); - } - return new PubsubLiteUnboundedReader( - this, statesBuilder.build(), subscriberOptions.getBacklogReader()); - } catch (Throwable t) { - throw new IOException(t); - } - } - - @Override - public Coder getCheckpointMarkCoder() { - return OffsetCheckpointMark.getCoder(); - } - - @Override - public Coder getOutputCoder() { - return ProtoCoder.of(SequencedMessage.class); - } -} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SerializableSubscriberFactory.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SerializableSubscriberFactory.java new file mode 100644 index 000000000000..c65491903b69 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SerializableSubscriberFactory.java @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.pubsublite; + +import com.google.api.gax.rpc.ApiException; +import com.google.cloud.pubsublite.Partition; +import com.google.cloud.pubsublite.internal.wire.Subscriber; +import com.google.cloud.pubsublite.proto.SequencedMessage; +import java.io.Serializable; +import java.util.List; +import java.util.function.Consumer; + +interface SerializableSubscriberFactory extends Serializable { + long serialVersionUID = -6978345654136456L; + + Subscriber newSubscriber(Partition partition, Consumer> messageConsumer) + throws ApiException; +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SerializableSupplier.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SerializableSupplier.java index a906a5039627..48b8c4db66b0 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SerializableSupplier.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SerializableSupplier.java @@ -19,7 +19,7 @@ import java.io.Serializable; -/** A serializable Supplier that can throw a StatusException. */ +/** A serializable Supplier. */ public interface SerializableSupplier extends Serializable { T get(); } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscribeTransform.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscribeTransform.java new file mode 100644 index 000000000000..b6a9f5d59090 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscribeTransform.java @@ -0,0 +1,143 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.pubsublite; + +import static com.google.cloud.pubsublite.internal.ExtractStatus.toCanonical; +import static com.google.cloud.pubsublite.internal.UncheckedApiPreconditions.checkArgument; + +import com.google.api.gax.rpc.ApiException; +import com.google.cloud.pubsublite.AdminClient; +import com.google.cloud.pubsublite.AdminClientSettings; +import com.google.cloud.pubsublite.Offset; +import com.google.cloud.pubsublite.Partition; +import com.google.cloud.pubsublite.TopicPath; +import com.google.cloud.pubsublite.internal.wire.Committer; +import com.google.cloud.pubsublite.internal.wire.Subscriber; +import com.google.cloud.pubsublite.proto.SequencedMessage; +import java.util.List; +import java.util.function.Consumer; +import java.util.stream.Collectors; +import org.apache.beam.sdk.transforms.DoFn.OutputReceiver; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Stopwatch; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.math.LongMath; +import org.joda.time.Duration; + +class SubscribeTransform extends PTransform> { + private final SubscriberOptions options; + + SubscribeTransform(SubscriberOptions options) { + this.options = options; + } + + private void checkSubscription(SubscriptionPartition subscriptionPartition) throws ApiException { + checkArgument(subscriptionPartition.subscription().equals(options.subscriptionPath())); + } + + private Subscriber newSubscriber( + Partition partition, Offset initialOffset, Consumer> consumer) { + try { + return options + .getSubscriberFactory(partition, initialOffset) + .newSubscriber( + messages -> + consumer.accept( + messages.stream() + .map(message -> message.toProto()) + .collect(Collectors.toList()))); + } catch (Throwable t) { + throw toCanonical(t).underlying; + } + } + + private SubscriptionPartitionProcessor newPartitionProcessor( + SubscriptionPartition subscriptionPartition, + RestrictionTracker tracker, + OutputReceiver receiver) + throws ApiException { + checkSubscription(subscriptionPartition); + return new SubscriptionPartitionProcessorImpl( + tracker, + receiver, + consumer -> + newSubscriber( + subscriptionPartition.partition(), + Offset.of(tracker.currentRestriction().getRange().getFrom()), + consumer), + options.flowControlSettings()); + } + + private TopicBacklogReader newBacklogReader(SubscriptionPartition subscriptionPartition) { + checkSubscription(subscriptionPartition); + return options.getBacklogReader(subscriptionPartition.partition()); + } + + private TrackerWithProgress newRestrictionTracker( + TopicBacklogReader backlogReader, OffsetByteRange initial) { + return new OffsetByteRangeTracker( + initial, + backlogReader, + Stopwatch.createUnstarted(), + options.minBundleTimeout(), + LongMath.saturatedMultiply(options.flowControlSettings().bytesOutstanding(), 10)); + } + + private InitialOffsetReader newInitialOffsetReader(SubscriptionPartition subscriptionPartition) { + checkSubscription(subscriptionPartition); + return options.getInitialOffsetReader(subscriptionPartition.partition()); + } + + private Committer newCommitter(SubscriptionPartition subscriptionPartition) { + checkSubscription(subscriptionPartition); + return options.getCommitter(subscriptionPartition.partition()); + } + + private 
TopicPath getTopicPath() { + try (AdminClient admin = + AdminClient.create( + AdminClientSettings.newBuilder() + .setRegion(options.subscriptionPath().location().extractRegion()) + .build())) { + return TopicPath.parse(admin.getSubscription(options.subscriptionPath()).get().getTopic()); + } catch (Throwable t) { + throw toCanonical(t).underlying; + } + } + + @Override + public PCollection expand(PBegin input) { + PCollection subscriptionPartitions; + subscriptionPartitions = + input.apply(new SubscriptionPartitionLoader(getTopicPath(), options.subscriptionPath())); + + return subscriptionPartitions.apply( + ParDo.of( + new PerSubscriptionPartitionSdf( + // Ensure we read for at least 5 seconds more than the bundle timeout. + options.minBundleTimeout().plus(Duration.standardSeconds(5)), + new ManagedBacklogReaderFactoryImpl(this::newBacklogReader), + this::newInitialOffsetReader, + this::newRestrictionTracker, + this::newPartitionProcessor, + this::newCommitter))); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscriberOptions.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscriberOptions.java index 4c3f5054e7b0..a9625be608fd 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscriberOptions.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscriberOptions.java @@ -17,22 +17,34 @@ */ package org.apache.beam.sdk.io.gcp.pubsublite; +import static com.google.cloud.pubsublite.internal.ExtractStatus.toCanonical; +import static com.google.cloud.pubsublite.internal.wire.ServiceClients.addDefaultMetadata; +import static com.google.cloud.pubsublite.internal.wire.ServiceClients.addDefaultSettings; + import com.google.api.gax.rpc.ApiException; import com.google.auto.value.AutoValue; +import com.google.cloud.pubsublite.Offset; import com.google.cloud.pubsublite.Partition; -import com.google.cloud.pubsublite.PartitionLookupUtils; import com.google.cloud.pubsublite.SubscriptionPath; import com.google.cloud.pubsublite.cloudpubsub.FlowControlSettings; +import com.google.cloud.pubsublite.internal.CursorClient; +import com.google.cloud.pubsublite.internal.CursorClientSettings; import com.google.cloud.pubsublite.internal.wire.Committer; -import com.google.cloud.pubsublite.internal.wire.CommitterBuilder; +import com.google.cloud.pubsublite.internal.wire.CommitterSettings; import com.google.cloud.pubsublite.internal.wire.PubsubContext; import com.google.cloud.pubsublite.internal.wire.PubsubContext.Framework; +import com.google.cloud.pubsublite.internal.wire.RoutingMetadata; import com.google.cloud.pubsublite.internal.wire.SubscriberBuilder; import com.google.cloud.pubsublite.internal.wire.SubscriberFactory; +import com.google.cloud.pubsublite.proto.Cursor; +import com.google.cloud.pubsublite.proto.SeekRequest; +import com.google.cloud.pubsublite.v1.CursorServiceClient; +import com.google.cloud.pubsublite.v1.CursorServiceSettings; +import com.google.cloud.pubsublite.v1.SubscriberServiceClient; +import com.google.cloud.pubsublite.v1.SubscriberServiceSettings; import java.io.Serializable; -import java.util.Set; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; import org.checkerframework.checker.nullness.qual.Nullable; +import org.joda.time.Duration; @AutoValue public abstract class SubscriberOptions implements Serializable { @@ -40,14 +52,30 @@ public abstract class 
SubscriberOptions implements Serializable { private static final Framework FRAMEWORK = Framework.of("BEAM"); + private static final long MEBIBYTE = 1L << 20; + + private static final Duration MIN_BUNDLE_TIMEOUT = Duration.standardMinutes(1); + + public static final FlowControlSettings DEFAULT_FLOW_CONTROL = + FlowControlSettings.builder() + .setMessagesOutstanding(Long.MAX_VALUE) + .setBytesOutstanding(100 * MEBIBYTE) + .build(); + // Required parameters. public abstract SubscriptionPath subscriptionPath(); + // Optional parameters. + /** Per-partition flow control parameters for this subscription. */ public abstract FlowControlSettings flowControlSettings(); - // Optional parameters. - /** A set of partitions. If empty, retrieve the set of partitions using an admin client. */ - public abstract Set partitions(); + /** + * The minimum wall time to pass before allowing bundle closure. + * + *

    Setting this to too small of a value will result in increased compute costs and lower + * throughput per byte. Immediate timeouts (Duration.ZERO) may be useful for testing. + */ + public abstract Duration minBundleTimeout(); /** * A factory to override subscriber creation entirely and delegate to another method. Primarily @@ -67,14 +95,38 @@ public abstract class SubscriberOptions implements Serializable { */ abstract @Nullable SerializableSupplier backlogReaderSupplier(); + /** + * A supplier to override offset reader creation entirely and delegate to another method. + * Primarily useful for testing. + */ + abstract @Nullable SerializableSupplier offsetReaderSupplier(); + public static Builder newBuilder() { Builder builder = new AutoValue_SubscriberOptions.Builder(); - return builder.setPartitions(ImmutableSet.of()); + return builder + .setFlowControlSettings(DEFAULT_FLOW_CONTROL) + .setMinBundleTimeout(MIN_BUNDLE_TIMEOUT); } public abstract Builder toBuilder(); - SubscriberFactory getSubscriberFactory(Partition partition) { + private SubscriberServiceClient newSubscriberServiceClient(Partition partition) + throws ApiException { + try { + SubscriberServiceSettings.Builder settingsBuilder = SubscriberServiceSettings.newBuilder(); + settingsBuilder = + addDefaultMetadata( + PubsubContext.of(FRAMEWORK), + RoutingMetadata.of(subscriptionPath(), partition), + settingsBuilder); + return SubscriberServiceClient.create( + addDefaultSettings(subscriptionPath().location().extractRegion(), settingsBuilder)); + } catch (Throwable t) { + throw toCanonical(t).underlying; + } + } + + SubscriberFactory getSubscriberFactory(Partition partition, Offset initialOffset) { SubscriberFactory factory = subscriberFactory(); if (factory != null) { return factory; @@ -84,41 +136,73 @@ SubscriberFactory getSubscriberFactory(Partition partition) { .setMessageConsumer(consumer) .setSubscriptionPath(subscriptionPath()) .setPartition(partition) - .setContext(PubsubContext.of(FRAMEWORK)) + .setServiceClient(newSubscriberServiceClient(partition)) + .setInitialLocation( + SeekRequest.newBuilder() + .setCursor(Cursor.newBuilder().setOffset(initialOffset.value())) + .build()) .build(); } + private CursorServiceClient newCursorServiceClient() throws ApiException { + try { + return CursorServiceClient.create( + addDefaultSettings( + subscriptionPath().location().extractRegion(), CursorServiceSettings.newBuilder())); + } catch (Throwable t) { + throw toCanonical(t).underlying; + } + } + Committer getCommitter(Partition partition) { SerializableSupplier supplier = committerSupplier(); if (supplier != null) { return supplier.get(); } - return CommitterBuilder.newBuilder() + return CommitterSettings.newBuilder() .setSubscriptionPath(subscriptionPath()) .setPartition(partition) - .build(); + .setServiceClient(newCursorServiceClient()) + .build() + .instantiate(); } - TopicBacklogReader getBacklogReader() { + TopicBacklogReader getBacklogReader(Partition partition) { SerializableSupplier supplier = backlogReaderSupplier(); if (supplier != null) { return supplier.get(); } return TopicBacklogReaderSettings.newBuilder() .setTopicPathFromSubscriptionPath(subscriptionPath()) + .setPartition(partition) .build() .instantiate(); } + InitialOffsetReader getInitialOffsetReader(Partition partition) { + SerializableSupplier supplier = offsetReaderSupplier(); + if (supplier != null) { + return supplier.get(); + } + return new InitialOffsetReaderImpl( + CursorClient.create( + CursorClientSettings.newBuilder() + 
.setRegion(subscriptionPath().location().extractRegion()) + .build()), + subscriptionPath(), + partition); + } + @AutoValue.Builder public abstract static class Builder { // Required parameters. public abstract Builder setSubscriptionPath(SubscriptionPath path); - public abstract Builder setPartitions(Set partitions); - + // Optional parameters public abstract Builder setFlowControlSettings(FlowControlSettings flowControlSettings); + public abstract Builder setMinBundleTimeout(Duration minBundleTimeout); + // Used in unit tests abstract Builder setSubscriberFactory(SubscriberFactory subscriberFactory); @@ -127,28 +211,9 @@ public abstract static class Builder { abstract Builder setBacklogReaderSupplier( SerializableSupplier backlogReaderSupplier); - // Used for implementing build(); - abstract SubscriptionPath subscriptionPath(); + abstract Builder setOffsetReaderSupplier( + SerializableSupplier offsetReaderSupplier); - abstract Set partitions(); - - abstract SubscriberOptions autoBuild(); - - @SuppressWarnings("CheckReturnValue") - public SubscriberOptions build() throws ApiException { - if (!partitions().isEmpty()) { - return autoBuild(); - } - - if (partitions().isEmpty()) { - int partitionCount = PartitionLookupUtils.numPartitions(subscriptionPath()); - ImmutableSet.Builder partitions = ImmutableSet.builder(); - for (int i = 0; i < partitionCount; i++) { - partitions.add(Partition.of(i)); - } - setPartitions(partitions.build()); - } - return autoBuild(); - } + public abstract SubscriberOptions build(); } } diff --git a/runners/flink/1.8/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/AbstractStreamOperatorCompat.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscriptionPartition.java similarity index 58% rename from runners/flink/1.8/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/AbstractStreamOperatorCompat.java rename to sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscriptionPartition.java index cd7920d542da..753ad2a921e2 100644 --- a/runners/flink/1.8/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/AbstractStreamOperatorCompat.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscriptionPartition.java @@ -15,16 +15,21 @@ * See the License for the specific language governing permissions and * limitations under the License. */ -package org.apache.beam.runners.flink.translation.wrappers.streaming; +package org.apache.beam.sdk.io.gcp.pubsublite; -import org.apache.flink.streaming.api.operators.AbstractStreamOperator; -import org.apache.flink.streaming.api.operators.InternalTimeServiceManager; +import com.google.auto.value.AutoValue; +import com.google.cloud.pubsublite.Partition; +import com.google.cloud.pubsublite.SubscriptionPath; +import org.apache.beam.sdk.coders.DefaultCoder; -/** Compatibility layer for {@link AbstractStreamOperator} breaking changes. 
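SubscriberOptions now supplies defaults for flow control (100 MiB outstanding per partition, unlimited messages) and a one minute minimum bundle timeout, so only the subscription path is required. If the defaults need to be overridden, the builder accepts explicit values; a hedged sketch with a placeholder subscription path follows.

```java
import com.google.cloud.pubsublite.SubscriptionPath;
import com.google.cloud.pubsublite.cloudpubsub.FlowControlSettings;
import org.apache.beam.sdk.io.gcp.pubsublite.SubscriberOptions;
import org.joda.time.Duration;

// Hedged sketch: overriding the flow control and bundle timeout defaults.
// The subscription path is a placeholder.
public class SubscriberOptionsExample {
  static SubscriberOptions customOptions() {
    return SubscriberOptions.newBuilder()
        .setSubscriptionPath(
            SubscriptionPath.parse(
                "projects/my-project/locations/us-central1-a/subscriptions/my-subscription"))
        .setFlowControlSettings(
            FlowControlSettings.builder()
                .setBytesOutstanding(50 * 1024 * 1024L) // 50 MiB per partition
                .setMessagesOutstanding(Long.MAX_VALUE)
                .build())
        .setMinBundleTimeout(Duration.standardSeconds(30))
        .build();
  }
}
```

As the javadoc above warns, very small bundle timeouts increase per-bundle overhead; Duration.ZERO is mainly useful in tests.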
*/ -public abstract class AbstractStreamOperatorCompat - extends AbstractStreamOperator { - // timeServiceManager was made private behind a getter in Flink 1.11 - protected InternalTimeServiceManager getTimeServiceManagerCompat() { - return timeServiceManager; +@AutoValue +@DefaultCoder(SubscriptionPartitionCoder.class) +public abstract class SubscriptionPartition { + static SubscriptionPartition of(SubscriptionPath subscription, Partition partition) { + return new AutoValue_SubscriptionPartition(subscription, partition); } + + abstract SubscriptionPath subscription(); + + abstract Partition partition(); } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscriptionPartitionCoder.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscriptionPartitionCoder.java new file mode 100644 index 000000000000..77380b474dee --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscriptionPartitionCoder.java @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
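SubscriptionPartition uses @DefaultCoder to bind its coder to the type itself, so downstream transforms do not need to set a coder explicitly. The same annotation works for ordinary POJOs; a minimal sketch using AvroCoder with an invented SensorReading class follows.

```java
import org.apache.beam.sdk.coders.AvroCoder;
import org.apache.beam.sdk.coders.DefaultCoder;

// Illustrative only: @DefaultCoder lets a type name its own coder, as SubscriptionPartition
// does with SubscriptionPartitionCoder. AvroCoder is the common choice for plain POJOs.
@DefaultCoder(AvroCoder.class)
public class SensorReading {
  public String sensorId;
  public double value;

  // AvroCoder requires a no-argument constructor.
  public SensorReading() {}

  public SensorReading(String sensorId, double value) {
    this.sensorId = sensorId;
    this.value = value;
  }
}
```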
+ */ +package org.apache.beam.sdk.io.gcp.pubsublite; + +import com.google.cloud.pubsublite.Partition; +import com.google.cloud.pubsublite.SubscriptionPath; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import org.apache.beam.sdk.coders.AtomicCoder; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.CoderProvider; +import org.apache.beam.sdk.coders.CoderProviders; +import org.apache.beam.sdk.coders.DelegateCoder; +import org.apache.beam.sdk.coders.KvCoder; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.coders.VarLongCoder; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.TypeDescriptor; + +public class SubscriptionPartitionCoder extends AtomicCoder { + private static final Coder CODER = + DelegateCoder.of( + KvCoder.of(StringUtf8Coder.of(), VarLongCoder.of()), + SubscriptionPartitionCoder::toKv, + SubscriptionPartitionCoder::fromKv); + + private static KV toKv(SubscriptionPartition value) { + return KV.of(value.subscription().toString(), value.partition().value()); + } + + private static SubscriptionPartition fromKv(KV kv) { + return SubscriptionPartition.of( + SubscriptionPath.parse(kv.getKey()), Partition.of(kv.getValue())); + } + + @Override + public void encode(SubscriptionPartition value, OutputStream outStream) throws IOException { + CODER.encode(value, outStream); + } + + @Override + public SubscriptionPartition decode(InputStream inStream) throws IOException { + return CODER.decode(inStream); + } + + public static CoderProvider getCoderProvider() { + return CoderProviders.forCoder( + TypeDescriptor.of(SubscriptionPartition.class), new SubscriptionPartitionCoder()); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscriptionPartitionLoader.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscriptionPartitionLoader.java new file mode 100644 index 000000000000..e411d801a7f7 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscriptionPartitionLoader.java @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
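SubscriptionPartitionCoder delegates to a KvCoder<String, Long> via DelegateCoder: the value is mapped to an already-encodable representation for encoding and mapped back on decoding. A minimal sketch of the same pattern for a different, illustrative type (URI encoded as a UTF-8 string) follows.

```java
import java.net.URI;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.DelegateCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;

// Illustrative only: encode a type by mapping it to an already-encodable representation,
// the same DelegateCoder pattern SubscriptionPartitionCoder wraps around KvCoder.
public class UriCoderExample {
  static Coder<URI> uriCoder() {
    return DelegateCoder.<URI, String>of(
        StringUtf8Coder.of(),
        URI::toString, // URI -> String for encoding
        URI::create); // String -> URI for decoding
  }
}
```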
+ */
+package org.apache.beam.sdk.io.gcp.pubsublite;
+
+import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+
+import com.google.cloud.pubsublite.Partition;
+import com.google.cloud.pubsublite.PartitionLookupUtils;
+import com.google.cloud.pubsublite.SubscriptionPath;
+import com.google.cloud.pubsublite.TopicPath;
+import java.util.List;
+import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.MapElements;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.SerializableFunction;
+import org.apache.beam.sdk.transforms.Watch;
+import org.apache.beam.sdk.transforms.Watch.Growth.PollFn;
+import org.apache.beam.sdk.transforms.Watch.Growth.PollResult;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.TypeDescriptor;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+import org.joda.time.Duration;
+import org.joda.time.Instant;
+
+class SubscriptionPartitionLoader extends PTransform<PBegin, PCollection<SubscriptionPartition>> {
+  private final TopicPath topic;
+  private final SubscriptionPath subscription;
+  private final SerializableFunction<TopicPath, Integer> getPartitionCount;
+  private final Duration pollDuration;
+  private final boolean terminate;
+
+  SubscriptionPartitionLoader(TopicPath topic, SubscriptionPath subscription) {
+    this(
+        topic,
+        subscription,
+        PartitionLookupUtils::numPartitions,
+        Duration.standardMinutes(1),
+        false);
+  }
+
+  @VisibleForTesting
+  SubscriptionPartitionLoader(
+      TopicPath topic,
+      SubscriptionPath subscription,
+      SerializableFunction<TopicPath, Integer> getPartitionCount,
+      Duration pollDuration,
+      boolean terminate) {
+    this.topic = topic;
+    this.subscription = subscription;
+    this.getPartitionCount = getPartitionCount;
+    this.pollDuration = pollDuration;
+    this.terminate = terminate;
+  }
+
+  @Override
+  public PCollection<SubscriptionPartition> expand(PBegin input) {
+    PCollection<TopicPath> start = input.apply(Create.of(ImmutableList.of(topic)));
+    PCollection<KV<TopicPath, Partition>> partitions =
+        start.apply(
+            Watch.growthOf(
+                    new PollFn<TopicPath, Partition>() {
+                      @Override
+                      public PollResult<Partition> apply(TopicPath element, Context c) {
+                        checkArgument(element.equals(topic));
+                        int partitionCount = getPartitionCount.apply(element);
+                        List<Partition> partitions =
+                            IntStream.range(0, partitionCount)
+                                .mapToObj(Partition::of)
+                                .collect(Collectors.toList());
+                        return PollResult.incomplete(Instant.now(), partitions);
+                      }
+                    })
+                .withPollInterval(pollDuration)
+                .withTerminationPerInput(
+                    terminate
+                        ? Watch.Growth.afterTotalOf(pollDuration.multipliedBy(10))
+                        : Watch.Growth.never()));
+    return partitions.apply(
+        MapElements.into(TypeDescriptor.of(SubscriptionPartition.class))
+            .via(kv -> SubscriptionPartition.of(subscription, kv.getValue())));
+  }
+}
diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscriptionPartitionProcessor.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscriptionPartitionProcessor.java
new file mode 100644
index 000000000000..471e651508f1
--- /dev/null
+++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscriptionPartitionProcessor.java
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.gcp.pubsublite;
+
+import com.google.cloud.pubsublite.Offset;
+import com.google.cloud.pubsublite.internal.CheckedApiException;
+import java.util.Optional;
+import org.apache.beam.sdk.transforms.DoFn.ProcessContinuation;
+import org.joda.time.Duration;
+
+interface SubscriptionPartitionProcessor extends AutoCloseable {
+  void start() throws CheckedApiException;
+
+  ProcessContinuation waitForCompletion(Duration duration);
+
+  Optional<Offset> lastClaimed();
+}
diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscriptionPartitionProcessorFactory.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscriptionPartitionProcessorFactory.java
new file mode 100644
index 000000000000..530c180ebd88
--- /dev/null
+++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscriptionPartitionProcessorFactory.java
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */ +package org.apache.beam.sdk.io.gcp.pubsublite; + +import com.google.cloud.pubsublite.proto.SequencedMessage; +import java.io.Serializable; +import org.apache.beam.sdk.transforms.DoFn.OutputReceiver; +import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker; + +interface SubscriptionPartitionProcessorFactory extends Serializable { + long serialVersionUID = 765145146544654L; + + SubscriptionPartitionProcessor newProcessor( + SubscriptionPartition subscriptionPartition, + RestrictionTracker tracker, + OutputReceiver receiver); +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscriptionPartitionProcessorImpl.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscriptionPartitionProcessorImpl.java new file mode 100644 index 000000000000..a086d18b2f65 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscriptionPartitionProcessorImpl.java @@ -0,0 +1,143 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.pubsublite; + +import static com.google.cloud.pubsublite.internal.wire.ApiServiceUtils.blockingShutdown; + +import com.google.api.core.ApiService.Listener; +import com.google.api.core.ApiService.State; +import com.google.cloud.pubsublite.Offset; +import com.google.cloud.pubsublite.cloudpubsub.FlowControlSettings; +import com.google.cloud.pubsublite.internal.CheckedApiException; +import com.google.cloud.pubsublite.internal.ExtractStatus; +import com.google.cloud.pubsublite.internal.wire.Subscriber; +import com.google.cloud.pubsublite.internal.wire.SystemExecutors; +import com.google.cloud.pubsublite.proto.FlowControlRequest; +import com.google.cloud.pubsublite.proto.SequencedMessage; +import com.google.protobuf.util.Timestamps; +import java.util.List; +import java.util.Optional; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.TimeoutException; +import java.util.function.Consumer; +import java.util.function.Function; +import org.apache.beam.sdk.transforms.DoFn.OutputReceiver; +import org.apache.beam.sdk.transforms.DoFn.ProcessContinuation; +import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.SettableFuture; +import org.joda.time.Duration; +import org.joda.time.Instant; + +class SubscriptionPartitionProcessorImpl extends Listener + implements SubscriptionPartitionProcessor { + private final RestrictionTracker tracker; + private final OutputReceiver receiver; + private final Subscriber subscriber; + private final SettableFuture completionFuture = SettableFuture.create(); + private final FlowControlSettings flowControlSettings; + private Optional lastClaimedOffset = Optional.empty(); + + @SuppressWarnings("methodref.receiver.bound.invalid") + SubscriptionPartitionProcessorImpl( + RestrictionTracker tracker, + OutputReceiver receiver, + Function>, Subscriber> subscriberFactory, + FlowControlSettings flowControlSettings) { + this.tracker = tracker; + this.receiver = receiver; + this.subscriber = subscriberFactory.apply(this::onMessages); + this.flowControlSettings = flowControlSettings; + } + + @Override + @SuppressWarnings("argument.type.incompatible") + public void start() throws CheckedApiException { + this.subscriber.addListener(this, SystemExecutors.getFuturesExecutor()); + this.subscriber.startAsync(); + this.subscriber.awaitRunning(); + try { + this.subscriber.allowFlow( + FlowControlRequest.newBuilder() + .setAllowedBytes(flowControlSettings.bytesOutstanding()) + .setAllowedMessages(flowControlSettings.messagesOutstanding()) + .build()); + } catch (Throwable t) { + throw ExtractStatus.toCanonical(t); + } + } + + private void onMessages(List messages) { + if (completionFuture.isDone()) { + return; + } + Offset lastOffset = Offset.of(Iterables.getLast(messages).getCursor().getOffset()); + long byteSize = messages.stream().mapToLong(SequencedMessage::getSizeBytes).sum(); + if (tracker.tryClaim(OffsetByteProgress.of(lastOffset, byteSize))) { + lastClaimedOffset = Optional.of(lastOffset); + messages.forEach( + message -> + receiver.outputWithTimestamp( + message, new Instant(Timestamps.toMillis(message.getPublishTime())))); + try { + subscriber.allowFlow( + FlowControlRequest.newBuilder() + .setAllowedBytes(byteSize) + .setAllowedMessages(messages.size()) + .build()); + } catch (CheckedApiException e) { 
+ completionFuture.setException(e); + } + } else { + completionFuture.set(null); + } + } + + @Override + public void failed(State from, Throwable failure) { + completionFuture.setException(ExtractStatus.toCanonical(failure)); + } + + @Override + public void close() { + blockingShutdown(subscriber); + } + + @Override + @SuppressWarnings("argument.type.incompatible") + public ProcessContinuation waitForCompletion(Duration duration) { + try { + completionFuture.get(duration.getMillis(), TimeUnit.MILLISECONDS); + // CompletionFuture set with null when tryClaim returned false. + return ProcessContinuation.stop(); + } catch (TimeoutException ignored) { + // Timed out waiting, yield to the runtime. + return ProcessContinuation.resume(); + } catch (ExecutionException e) { + throw ExtractStatus.toCanonical(e.getCause()).underlying; + } catch (Throwable t) { + throw ExtractStatus.toCanonical(t).underlying; + } + } + + @Override + public Optional lastClaimed() { + return lastClaimedOffset; + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/TopicBacklogReader.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/TopicBacklogReader.java index 40d402672dc8..1cf27cc49012 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/TopicBacklogReader.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/TopicBacklogReader.java @@ -17,32 +17,25 @@ */ package org.apache.beam.sdk.io.gcp.pubsublite; -import com.google.api.core.ApiFuture; +import com.google.api.gax.rpc.ApiException; import com.google.cloud.pubsublite.Offset; -import com.google.cloud.pubsublite.Partition; import com.google.cloud.pubsublite.proto.ComputeMessageStatsResponse; -import java.util.Map; /** - * The TopicBacklogReader is intended for clients who would like to use the TopicStats API to - * aggregate the backlog, or the distance between the current cursor and HEAD across multiple - * partitions within a subscription. + * The TopicBacklogReader uses the TopicStats API to aggregate the backlog, or the distance between + * the current cursor and HEAD for a single {subscription, partition} pair. */ -public interface TopicBacklogReader { - - /** Create a TopicBacklogReader from settings. */ - static TopicBacklogReader create(TopicBacklogReaderSettings settings) { - return settings.instantiate(); - } +interface TopicBacklogReader extends AutoCloseable { /** - * Compute and aggregate message statistics for message between the provided start offset and HEAD - * for each partition. + * Compute and aggregate message statistics for message between the provided start offset and + * HEAD. This method is blocking. * - * @param subscriptionState A map from partition to the current offset of the subscriber in a - * given partition. - * @return a future with either an error or a ComputeMessageStatsResponse with the aggregated - * statistics for messages in the backlog on success. + * @param offset The current offset of the subscriber. + * @return A ComputeMessageStatsResponse with the aggregated statistics for messages in the + * backlog. 
*/ - ApiFuture computeMessageStats( - Map subscriptionState); + ComputeMessageStatsResponse computeMessageStats(Offset offset) throws ApiException; + + @Override + void close(); } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/TopicBacklogReaderImpl.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/TopicBacklogReaderImpl.java index 58e8f74bd1af..94b66da27c43 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/TopicBacklogReaderImpl.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/TopicBacklogReaderImpl.java @@ -17,73 +17,53 @@ */ package org.apache.beam.sdk.io.gcp.pubsublite; -import com.google.api.core.ApiFuture; -import com.google.api.core.ApiFutures; +import static com.google.cloud.pubsublite.internal.ExtractStatus.toCanonical; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.api.gax.rpc.ApiException; import com.google.cloud.pubsublite.Offset; import com.google.cloud.pubsublite.Partition; import com.google.cloud.pubsublite.TopicPath; import com.google.cloud.pubsublite.internal.TopicStatsClient; import com.google.cloud.pubsublite.proto.ComputeMessageStatsResponse; -import com.google.protobuf.Timestamp; -import com.google.protobuf.util.Timestamps; -import java.util.List; -import java.util.Map; -import java.util.Optional; -import java.util.stream.Collectors; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.MoreExecutors; +import java.util.concurrent.ExecutionException; +import javax.annotation.Nonnull; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; final class TopicBacklogReaderImpl implements TopicBacklogReader { + private static final Logger LOG = LoggerFactory.getLogger(TopicBacklogReaderImpl.class); private final TopicStatsClient client; private final TopicPath topicPath; + private final Partition partition; - public TopicBacklogReaderImpl(TopicStatsClient client, TopicPath topicPath) { + public TopicBacklogReaderImpl(TopicStatsClient client, TopicPath topicPath, Partition partition) { this.client = client; this.topicPath = topicPath; + this.partition = partition; } - private static Optional minTimestamp(Optional t1, Timestamp t2) { - if (!t1.isPresent() || Timestamps.compare(t1.get(), t2) > 0) { - return Optional.of(t2); + @Override + @SuppressWarnings("assignment.type.incompatible") + public ComputeMessageStatsResponse computeMessageStats(Offset offset) throws ApiException { + try { + return client + .computeMessageStats(topicPath, partition, offset, Offset.of(Integer.MAX_VALUE)) + .get(); + } catch (ExecutionException e) { + @Nonnull Throwable cause = checkNotNull(e.getCause()); + throw toCanonical(cause).underlying; + } catch (InterruptedException e) { + throw toCanonical(e).underlying; } - return t1; } @Override - @SuppressWarnings("dereference.of.nullable") - public ApiFuture computeMessageStats( - Map subscriptionState) { - List> perPartitionFutures = - subscriptionState.entrySet().stream() - .map( - e -> - client.computeMessageStats( - topicPath, e.getKey(), e.getValue(), Offset.of(Integer.MAX_VALUE))) - .collect(Collectors.toList()); - return ApiFutures.transform( - ApiFutures.allAsList(perPartitionFutures), - responses -> { - Optional minPublishTime = Optional.empty(); - Optional minEventTime = Optional.empty(); - long messageBytes = 0; - long messageCount = 0; - for 
(ComputeMessageStatsResponse response : responses) { - messageBytes += response.getMessageBytes(); - messageCount += response.getMessageCount(); - if (response.hasMinimumPublishTime()) { - minPublishTime = minTimestamp(minPublishTime, response.getMinimumPublishTime()); - } - if (response.hasMinimumEventTime()) { - minEventTime = minTimestamp(minPublishTime, response.getMinimumEventTime()); - } - } - ComputeMessageStatsResponse.Builder builder = - ComputeMessageStatsResponse.newBuilder() - .setMessageBytes(messageBytes) - .setMessageCount(messageCount); - minPublishTime.ifPresent(builder::setMinimumPublishTime); - minEventTime.ifPresent(builder::setMinimumEventTime); - return builder.build(); - }, - MoreExecutors.directExecutor()); + public void close() { + try { + client.close(); + } catch (Exception e) { + LOG.warn("Failed to close topic stats client.", e); + } } } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/TopicBacklogReaderSettings.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/TopicBacklogReaderSettings.java index 86d75e985fb0..79db0f19f5dd 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/TopicBacklogReaderSettings.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/TopicBacklogReaderSettings.java @@ -17,10 +17,13 @@ */ package org.apache.beam.sdk.io.gcp.pubsublite; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + import com.google.api.gax.rpc.ApiException; import com.google.auto.value.AutoValue; import com.google.cloud.pubsublite.AdminClient; import com.google.cloud.pubsublite.AdminClientSettings; +import com.google.cloud.pubsublite.Partition; import com.google.cloud.pubsublite.SubscriptionPath; import com.google.cloud.pubsublite.TopicPath; import com.google.cloud.pubsublite.internal.ExtractStatus; @@ -28,9 +31,11 @@ import com.google.cloud.pubsublite.internal.TopicStatsClientSettings; import java.io.Serializable; import java.util.concurrent.ExecutionException; +import javax.annotation.Nonnull; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Ticker; @AutoValue -public abstract class TopicBacklogReaderSettings implements Serializable { +abstract class TopicBacklogReaderSettings implements Serializable { private static final long serialVersionUID = -4001752066450248673L; /** @@ -39,40 +44,48 @@ public abstract class TopicBacklogReaderSettings implements Serializable { */ abstract TopicPath topicPath(); - public static Builder newBuilder() { + abstract Partition partition(); + + static Builder newBuilder() { return new AutoValue_TopicBacklogReaderSettings.Builder(); } @AutoValue.Builder - public abstract static class Builder { + abstract static class Builder { // Required parameters. 
- public abstract Builder setTopicPath(TopicPath topicPath); + abstract Builder setTopicPath(TopicPath topicPath); - @SuppressWarnings("argument.type.incompatible") - public Builder setTopicPathFromSubscriptionPath(SubscriptionPath subscriptionPath) + @SuppressWarnings("assignment.type.incompatible") + Builder setTopicPathFromSubscriptionPath(SubscriptionPath subscriptionPath) throws ApiException { try (AdminClient adminClient = AdminClient.create( AdminClientSettings.newBuilder() - .setRegion(subscriptionPath.location().region()) + .setRegion(subscriptionPath.location().extractRegion()) .build())) { return setTopicPath( TopicPath.parse(adminClient.getSubscription(subscriptionPath).get().getTopic())); } catch (ExecutionException e) { - throw ExtractStatus.toCanonical(e.getCause()).underlying; + @Nonnull Throwable cause = checkNotNull(e.getCause()); + throw ExtractStatus.toCanonical(cause).underlying; } catch (Throwable t) { throw ExtractStatus.toCanonical(t).underlying; } } - public abstract TopicBacklogReaderSettings build(); + abstract Builder setPartition(Partition partition); + + abstract TopicBacklogReaderSettings build(); } - @SuppressWarnings("CheckReturnValue") TopicBacklogReader instantiate() throws ApiException { TopicStatsClientSettings settings = - TopicStatsClientSettings.newBuilder().setRegion(topicPath().location().region()).build(); - return new TopicBacklogReaderImpl(TopicStatsClient.create(settings), topicPath()); + TopicStatsClientSettings.newBuilder() + .setRegion(topicPath().location().extractRegion()) + .build(); + TopicBacklogReader impl = + new TopicBacklogReaderImpl(TopicStatsClient.create(settings), topicPath(), partition()); + return new LimitingTopicBacklogReader(impl, Ticker.systemTicker()); } } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/TrackerWithProgress.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/TrackerWithProgress.java new file mode 100644 index 000000000000..7f0d0309a597 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/TrackerWithProgress.java @@ -0,0 +1,24 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.pubsublite; + +import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker; +import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker.HasProgress; + +public abstract class TrackerWithProgress + extends RestrictionTracker implements HasProgress {} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/TranslatingPullSubscriber.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/TranslatingPullSubscriber.java deleted file mode 100644 index fa3dd3aef7f6..000000000000 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/TranslatingPullSubscriber.java +++ /dev/null @@ -1,55 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package org.apache.beam.sdk.io.gcp.pubsublite; - -import com.google.cloud.pubsublite.Offset; -import com.google.cloud.pubsublite.internal.CheckedApiException; -import com.google.cloud.pubsublite.internal.PullSubscriber; -import com.google.cloud.pubsublite.proto.SequencedMessage; -import java.util.List; -import java.util.Optional; -import java.util.stream.Collectors; - -/** - * A PullSubscriber translating from {@link com.google.cloud.pubsublite.SequencedMessage}to {@link - * com.google.cloud.pubsublite.proto.SequencedMessage}. 
- */ -class TranslatingPullSubscriber implements PullSubscriber { - private final PullSubscriber underlying; - - TranslatingPullSubscriber( - PullSubscriber underlying) { - this.underlying = underlying; - } - - @Override - public List pull() throws CheckedApiException { - List messages = underlying.pull(); - return messages.stream().map(m -> m.toProto()).collect(Collectors.toList()); - } - - @Override - public Optional nextOffset() { - return underlying.nextOffset(); - } - - @Override - public void close() throws Exception { - underlying.close(); - } -} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/Uuid.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/Uuid.java index 8a608e0b770f..5d3bed9e4056 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/Uuid.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/Uuid.java @@ -21,6 +21,7 @@ import com.google.protobuf.ByteString; import java.io.DataOutputStream; import java.io.IOException; +import java.util.Base64; import java.util.UUID; import org.apache.beam.sdk.coders.DefaultCoder; @@ -49,6 +50,9 @@ public static Uuid random() { } catch (IOException e) { throw new RuntimeException("Should never have an IOException since there is no io.", e); } - return Uuid.of(output.toByteString()); + // Encode to Base64 so the random UUIDs are valid if consumed from the Cloud Pub/Sub client. + return Uuid.of( + ByteString.copyFrom( + Base64.getEncoder().encode(output.toByteString().asReadOnlyByteBuffer()))); } } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/UuidCoder.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/UuidCoder.java index b71226d7b74d..d23fb3080c2c 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/UuidCoder.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/UuidCoder.java @@ -30,16 +30,17 @@ /** A coder for a Uuid. */ public class UuidCoder extends AtomicCoder { - private static Coder coder = DelegateCoder.of(ByteStringCoder.of(), Uuid::value, Uuid::of); + private static final Coder CODER = + DelegateCoder.of(ByteStringCoder.of(), Uuid::value, Uuid::of); @Override public void encode(Uuid value, OutputStream outStream) throws IOException { - coder.encode(value, outStream); + CODER.encode(value, outStream); } @Override public Uuid decode(InputStream inStream) throws IOException { - return coder.decode(inStream); + return CODER.decode(inStream); } public static CoderProvider getCoderProvider() { diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/CreateTransactionFn.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/CreateTransactionFn.java index cff3fb07a79a..cf269851e17f 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/CreateTransactionFn.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/CreateTransactionFn.java @@ -18,6 +18,7 @@ package org.apache.beam.sdk.io.gcp.spanner; import com.google.cloud.spanner.BatchReadOnlyTransaction; +import com.google.cloud.spanner.TimestampBound; import org.apache.beam.sdk.transforms.DoFn; /** Creates a batch transaction. 
*/ @@ -25,18 +26,19 @@ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) class CreateTransactionFn extends DoFn { + private final SpannerConfig config; + private final TimestampBound timestampBound; - private final SpannerIO.CreateTransaction config; - - CreateTransactionFn(SpannerIO.CreateTransaction config) { + CreateTransactionFn(SpannerConfig config, TimestampBound timestampBound) { this.config = config; + this.timestampBound = timestampBound; } private transient SpannerAccessor spannerAccessor; @DoFn.Setup public void setup() throws Exception { - spannerAccessor = SpannerAccessor.getOrCreate(config.getSpannerConfig()); + spannerAccessor = SpannerAccessor.getOrCreate(config); } @Teardown @@ -47,7 +49,7 @@ public void teardown() throws Exception { @ProcessElement public void processElement(ProcessContext c) throws Exception { BatchReadOnlyTransaction tx = - spannerAccessor.getBatchClient().batchReadOnlyTransaction(config.getTimestampBound()); + spannerAccessor.getBatchClient().batchReadOnlyTransaction(timestampBound); c.output(Transaction.create(tx.getBatchTransactionId())); } } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/MutationSizeEstimator.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/MutationSizeEstimator.java index d2ec52f2dafa..69806927ddfb 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/MutationSizeEstimator.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/MutationSizeEstimator.java @@ -25,6 +25,7 @@ import com.google.cloud.spanner.KeySet; import com.google.cloud.spanner.Mutation; import com.google.cloud.spanner.Value; +import java.math.BigDecimal; /** Estimates the logical size of {@link com.google.cloud.spanner.Mutation}. */ class MutationSizeEstimator { @@ -115,6 +116,13 @@ private static long estimatePrimitiveValue(Value v) { return v.isNull() ? 0 : v.getString().length(); case BYTES: return v.isNull() ? 0 : v.getBytes().length(); + case NUMERIC: + // see + // https://cloud.google.com/spanner/docs/working-with-numerics#handling_numeric_when_creating_a_client_library_or_driver + // Numeric/BigDecimal are stored in protos as String. It is likely that they + // are also stored in the Spanner database as String, so this gives an approximation for + // mutation value size. + return v.isNull() ? 0 : v.getNumeric().toString().length(); default: throw new IllegalArgumentException("Unsupported type " + v.getType()); } @@ -153,6 +161,21 @@ private static long estimateArrayValue(Value v) { return 12L * v.getDateArray().size(); case TIMESTAMP: return 12L * v.getTimestampArray().size(); + case NUMERIC: + totalLength = 0; + for (BigDecimal n : v.getNumericArray()) { + if (n == null) { + continue; + } + // see + // https://cloud.google.com/spanner/docs/working-with-numerics#handling_numeric_when_creating_a_client_library_or_driver + // Numeric/BigDecimal are stored in protos as String. It is likely that they + // are also stored in the Spanner database as String, so this gives an approximation for + // mutation value size. 
+ totalLength += n.toString().length(); + } + return totalLength; + default: throw new IllegalArgumentException("Unsupported type " + v.getType()); } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerAccessor.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerAccessor.java index a22b5b9ae6c2..c34e21c3e171 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerAccessor.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerAccessor.java @@ -17,8 +17,11 @@ */ package org.apache.beam.sdk.io.gcp.spanner; +import com.google.api.gax.core.ExecutorProvider; +import com.google.api.gax.grpc.InstantiatingGrpcChannelProvider; import com.google.api.gax.retrying.RetrySettings; -import com.google.api.gax.rpc.FixedHeaderProvider; +import com.google.api.gax.rpc.HeaderProvider; +import com.google.api.gax.rpc.ServerStreamingCallSettings; import com.google.api.gax.rpc.UnaryCallSettings; import com.google.cloud.NoCredentials; import com.google.cloud.ServiceFactory; @@ -28,18 +31,31 @@ import com.google.cloud.spanner.DatabaseId; import com.google.cloud.spanner.Spanner; import com.google.cloud.spanner.SpannerOptions; +import com.google.cloud.spanner.spi.v1.SpannerInterceptorProvider; import com.google.spanner.v1.CommitRequest; import com.google.spanner.v1.CommitResponse; +import com.google.spanner.v1.ExecuteSqlRequest; +import com.google.spanner.v1.PartialResultSet; import io.grpc.CallOptions; import io.grpc.Channel; import io.grpc.ClientCall; import io.grpc.ClientInterceptor; import io.grpc.MethodDescriptor; +import java.net.MalformedURLException; +import java.net.URL; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; import java.util.concurrent.ConcurrentHashMap; +import java.util.concurrent.ScheduledExecutorService; +import java.util.concurrent.ScheduledThreadPoolExecutor; +import java.util.concurrent.ThreadFactory; import java.util.concurrent.TimeUnit; import java.util.concurrent.atomic.AtomicInteger; import org.apache.beam.sdk.options.ValueProvider; import org.apache.beam.sdk.util.ReleaseInfo; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.ThreadFactoryBuilder; import org.joda.time.Duration; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -48,10 +64,13 @@ @SuppressWarnings({ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) -class SpannerAccessor implements AutoCloseable { +public class SpannerAccessor implements AutoCloseable { private static final Logger LOG = LoggerFactory.getLogger(SpannerAccessor.class); - // A common user agent token that indicates that this request was originated from Apache Beam. + /* A common user agent token that indicates that this request was originated from + * Apache Beam. Setting the user-agent allows Cloud Spanner to detect that the + * workload is coming from Dataflow and to potentially apply performance optimizations + */ private static final String USER_AGENT_PREFIX = "Apache_Beam_Java"; // Only create one SpannerAccessor for each different SpannerConfig. 
@@ -69,6 +88,12 @@ class SpannerAccessor implements AutoCloseable { private final DatabaseAdminClient databaseAdminClient; private final SpannerConfig spannerConfig; + private static final int MAX_MESSAGE_SIZE = 100 * 1024 * 1024; + private static final int MAX_METADATA_SIZE = 32 * 1024; // bytes + private static final int NUM_CHANNELS = 4; + public static final org.threeten.bp.Duration GRPC_KEEP_ALIVE_SECONDS = + org.threeten.bp.Duration.ofSeconds(120); + private SpannerAccessor( Spanner spanner, DatabaseClient databaseClient, @@ -82,7 +107,7 @@ private SpannerAccessor( this.spannerConfig = spannerConfig; } - static SpannerAccessor getOrCreate(SpannerConfig spannerConfig) { + public static SpannerAccessor getOrCreate(SpannerConfig spannerConfig) { SpannerAccessor self = spannerAccessors.get(spannerConfig); if (self == null) { @@ -122,6 +147,36 @@ private static SpannerAccessor createAndConnect(SpannerConfig spannerConfig) { org.threeten.bp.Duration.ofMillis(commitDeadline.get().getMillis())) .build()); } + // Setting the timeout for streaming read to 2 hours. This is 1 hour by default + // after BEAM 2.20. + ServerStreamingCallSettings.Builder + executeStreamingSqlSettings = + builder.getSpannerStubSettingsBuilder().executeStreamingSqlSettings(); + RetrySettings.Builder executeSqlStreamingRetrySettings = + executeStreamingSqlSettings.getRetrySettings().toBuilder(); + executeStreamingSqlSettings.setRetrySettings( + executeSqlStreamingRetrySettings + .setInitialRpcTimeout(org.threeten.bp.Duration.ofMinutes(120)) + .setMaxRpcTimeout(org.threeten.bp.Duration.ofMinutes(120)) + .setTotalTimeout(org.threeten.bp.Duration.ofMinutes(120)) + .build()); + + ManagedInstantiatingExecutorProvider executorProvider = + new ManagedInstantiatingExecutorProvider( + new ThreadFactoryBuilder() + .setDaemon(true) + .setNameFormat("Cloud-Spanner-TransportChannel-%d") + .build()); + + InstantiatingGrpcChannelProvider.Builder instantiatingGrpcChannelProvider = + InstantiatingGrpcChannelProvider.newBuilder() + .setMaxInboundMessageSize(MAX_MESSAGE_SIZE) + .setMaxInboundMetadataSize(MAX_METADATA_SIZE) + .setPoolSize(NUM_CHANNELS) + .setExecutorProvider(executorProvider) + .setKeepAliveTime(GRPC_KEEP_ALIVE_SECONDS) + .setInterceptorProvider(SpannerInterceptorProvider.createDefault()) + .setAttemptDirectPath(true); ValueProvider projectId = spannerConfig.getProjectId(); if (projectId != null) { @@ -134,14 +189,34 @@ private static SpannerAccessor createAndConnect(SpannerConfig spannerConfig) { ValueProvider host = spannerConfig.getHost(); if (host != null) { builder.setHost(host.get()); + instantiatingGrpcChannelProvider.setEndpoint(getEndpoint(host.get())); } ValueProvider emulatorHost = spannerConfig.getEmulatorHost(); if (emulatorHost != null) { builder.setEmulatorHost(emulatorHost.get()); builder.setCredentials(NoCredentials.getInstance()); + } else { + String userAgentString = USER_AGENT_PREFIX + "/" + ReleaseInfo.getReleaseInfo().getVersion(); + /* Workaround to setup user-agent string. + * InstantiatingGrpcChannelProvider will override the settings provided. + * The section below and all associated artifacts will be removed once the bug + * that prevents setting user-agent is fixed. 
+ * https://github.com/googleapis/java-spanner/pull/871 + * + * Code to be replaced: + * builder.setHeaderProvider(FixedHeaderProvider.create("user-agent", userAgentString)); + */ + instantiatingGrpcChannelProvider.setHeaderProvider( + new HeaderProvider() { + @Override + public Map getHeaders() { + final Map headers = new HashMap<>(); + headers.put("user-agent", userAgentString); + return headers; + } + }); + builder.setChannelProvider(instantiatingGrpcChannelProvider.build()); } - String userAgentString = USER_AGENT_PREFIX + "/" + ReleaseInfo.getReleaseInfo().getVersion(); - builder.setHeaderProvider(FixedHeaderProvider.create("user-agent", userAgentString)); SpannerOptions options = builder.build(); Spanner spanner = options.getService(); @@ -157,15 +232,26 @@ private static SpannerAccessor createAndConnect(SpannerConfig spannerConfig) { spanner, databaseClient, databaseAdminClient, batchClient, spannerConfig); } - DatabaseClient getDatabaseClient() { + private static String getEndpoint(String host) { + URL url; + try { + url = new URL(host); + } catch (MalformedURLException e) { + throw new IllegalArgumentException("Invalid host: " + host, e); + } + return String.format( + "%s:%s", url.getHost(), url.getPort() < 0 ? url.getDefaultPort() : url.getPort()); + } + + public DatabaseClient getDatabaseClient() { return databaseClient; } - BatchClient getBatchClient() { + public BatchClient getBatchClient() { return batchClient; } - DatabaseAdminClient getDatabaseAdminClient() { + public DatabaseAdminClient getDatabaseAdminClient() { return databaseAdminClient; } @@ -205,4 +291,32 @@ public ClientCall interceptCall( return next.newCall(method, callOptions); } } + + private static final class ManagedInstantiatingExecutorProvider implements ExecutorProvider { + // 4 Gapic clients * 4 channels per client. + private static final int DEFAULT_MIN_THREAD_COUNT = 16; + private final List executors = new ArrayList<>(); + private final ThreadFactory threadFactory; + + private ManagedInstantiatingExecutorProvider(ThreadFactory threadFactory) { + this.threadFactory = threadFactory; + } + + @Override + public boolean shouldAutoClose() { + return false; + } + + @Override + public ScheduledExecutorService getExecutor() { + int numCpus = Runtime.getRuntime().availableProcessors(); + int numThreads = Math.max(DEFAULT_MIN_THREAD_COUNT, numCpus); + ScheduledExecutorService executor = + new ScheduledThreadPoolExecutor(numThreads, threadFactory); + synchronized (this) { + executors.add(executor); + } + return executor; + } + } } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java index b31d71672873..07ff21663736 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java @@ -79,6 +79,7 @@ import org.apache.beam.sdk.values.PCollectionTuple; import org.apache.beam.sdk.values.PCollectionView; import org.apache.beam.sdk.values.PDone; +import org.apache.beam.sdk.values.PInput; import org.apache.beam.sdk.values.Row; import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.sdk.values.TupleTagList; @@ -739,13 +740,15 @@ public PCollection expand(PBegin input) { } /** - * A {@link PTransform} that create a transaction. + * A {@link PTransform} that create a transaction. 
If applied to a {@link PCollection}, it will + * create a transaction after the {@link PCollection} is closed. * * @see SpannerIO + * @see Wait */ @AutoValue public abstract static class CreateTransaction - extends PTransform> { + extends PTransform> { abstract SpannerConfig getSpannerConfig(); @@ -754,12 +757,21 @@ public abstract static class CreateTransaction abstract Builder toBuilder(); @Override - public PCollectionView expand(PBegin input) { + public PCollectionView expand(PInput input) { getSpannerConfig().validate(); - return input - .apply(Create.of(1)) - .apply("Create transaction", ParDo.of(new CreateTransactionFn(this))) + PCollection collection = input.getPipeline().apply(Create.of(1)); + + if (input instanceof PCollection) { + collection = collection.apply(Wait.on((PCollection) input)); + } else if (!(input instanceof PBegin)) { + throw new RuntimeException("input must be PBegin or PCollection"); + } + + return collection + .apply( + "Create transaction", + ParDo.of(new CreateTransactionFn(this.getSpannerConfig(), this.getTimestampBound()))) .apply("As PCollectionView", View.asSingleton()); } diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerTransformRegistrar.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerTransformRegistrar.java index 891dd3702f4c..42bb0a487a1b 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerTransformRegistrar.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerTransformRegistrar.java @@ -38,7 +38,7 @@ import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PDone; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.InvalidProtocolBufferException; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.InvalidProtocolBufferException; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.checkerframework.checker.nullness.qual.NonNull; import org.checkerframework.checker.nullness.qual.Nullable; diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/BigtableEmulatorWrapper.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/BigtableEmulatorWrapper.java deleted file mode 100644 index 2a4b87eda2bf..000000000000 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/BigtableEmulatorWrapper.java +++ /dev/null @@ -1,75 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ -package org.apache.beam.sdk.io.gcp.testing; - -import com.google.api.core.ApiFuture; -import com.google.cloud.bigtable.admin.v2.BigtableTableAdminClient; -import com.google.cloud.bigtable.admin.v2.BigtableTableAdminSettings; -import com.google.cloud.bigtable.admin.v2.models.CreateTableRequest; -import com.google.cloud.bigtable.data.v2.BigtableDataClient; -import com.google.cloud.bigtable.data.v2.BigtableDataSettings; -import com.google.cloud.bigtable.data.v2.models.RowMutation; -import com.google.protobuf.ByteString; -import java.io.IOException; -import java.util.concurrent.ExecutionException; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; - -public class BigtableEmulatorWrapper { - private final BigtableTableAdminClient tableAdminClient; - private final BigtableDataClient dataClient; - - public BigtableEmulatorWrapper(int emulatorPort, String projectId, String instanceId) - throws IOException { - BigtableTableAdminSettings.Builder tableAdminSettings = - BigtableTableAdminSettings.newBuilderForEmulator(emulatorPort) - .setProjectId(projectId) - .setInstanceId(instanceId); - tableAdminClient = BigtableTableAdminClient.create(tableAdminSettings.build()); - - BigtableDataSettings.Builder dataSettings = - BigtableDataSettings.newBuilderForEmulator(emulatorPort) - .setProjectId(projectId) - .setInstanceId(instanceId); - dataClient = BigtableDataClient.create(dataSettings.build()); - } - - public void writeRow( - String key, - String table, - String familyColumn, - String columnQualifier, - byte[] value, - long timestampMicros) - throws ExecutionException, InterruptedException { - ApiFuture mutateFuture = - dataClient.mutateRowAsync( - RowMutation.create(table, key) - .setCell( - familyColumn, - ByteString.copyFromUtf8(columnQualifier), - timestampMicros, - ByteString.copyFrom(value))); - mutateFuture.get(); - } - - public void createTable(String tableName, String... families) { - CreateTableRequest request = CreateTableRequest.of(tableName); - ImmutableList.copyOf(families).forEach(request::addFamily); - tableAdminClient.createTable(request); - } -} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/BigtableTestUtils.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/BigtableTestUtils.java deleted file mode 100644 index eefc51f39794..000000000000 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/BigtableTestUtils.java +++ /dev/null @@ -1,154 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ -package org.apache.beam.sdk.io.gcp.testing; - -import static java.nio.charset.StandardCharsets.UTF_8; -import static java.util.stream.Collectors.toList; -import static org.apache.beam.sdk.io.gcp.bigtable.RowUtils.KEY; -import static org.apache.beam.sdk.io.gcp.bigtable.RowUtils.LABELS; -import static org.apache.beam.sdk.io.gcp.bigtable.RowUtils.TIMESTAMP_MICROS; -import static org.apache.beam.sdk.io.gcp.bigtable.RowUtils.VALUE; -import static org.hamcrest.MatcherAssert.assertThat; -import static org.hamcrest.Matchers.containsString; -import static org.junit.Assert.fail; - -import com.google.bigtable.v2.Cell; -import com.google.bigtable.v2.Column; -import com.google.bigtable.v2.Family; -import com.google.protobuf.ByteString; -import java.time.Instant; -import java.util.List; -import org.apache.beam.sdk.schemas.Schema; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.primitives.Ints; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.primitives.Longs; -import org.checkerframework.checker.nullness.qual.Nullable; - -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) -public class BigtableTestUtils { - - public static final String KEY1 = "key1"; - public static final String KEY2 = "key2"; - - public static final String BOOL_COLUMN = "boolColumn"; - public static final String LONG_COLUMN = "longColumn"; - public static final String STRING_COLUMN = "stringColumn"; - public static final String DOUBLE_COLUMN = "doubleColumn"; - public static final String BINARY_COLUMN = "binaryColumn"; - public static final String FAMILY_TEST = "familyTest"; - - public static final Schema LONG_COLUMN_SCHEMA = - Schema.builder() - .addInt64Field(VALUE) - .addInt64Field(TIMESTAMP_MICROS) - .addArrayField(LABELS, Schema.FieldType.STRING) - .build(); - - public static final Schema TEST_FAMILY_SCHEMA = - Schema.builder() - .addBooleanField(BOOL_COLUMN) - .addRowField(LONG_COLUMN, LONG_COLUMN_SCHEMA) - .addArrayField(STRING_COLUMN, Schema.FieldType.STRING) - .addDoubleField(DOUBLE_COLUMN) - .addByteArrayField(BINARY_COLUMN) - .build(); - - public static final Schema TEST_SCHEMA = - Schema.builder().addStringField(KEY).addRowField(FAMILY_TEST, TEST_FAMILY_SCHEMA).build(); - - public static final Schema TEST_FLAT_SCHEMA = - Schema.builder() - .addStringField(KEY) - .addBooleanField(BOOL_COLUMN) - .addInt64Field(LONG_COLUMN) - .addStringField(STRING_COLUMN) - .addDoubleField(DOUBLE_COLUMN) - .build(); - - public static final long NOW = Instant.now().toEpochMilli() * 1_000; - public static final long LATER = NOW + 1_000; - - public static byte[] floatToByteArray(float number) { - return Ints.toByteArray(Float.floatToIntBits(number)); - } - - public static byte[] doubleToByteArray(double number) { - return Longs.toByteArray(Double.doubleToLongBits(number)); - } - - public static byte[] booleanToByteArray(boolean condition) { - return condition ? 
new byte[] {1} : new byte[] {0}; - } - - public static void checkMessage(@Nullable String message, String substring) { - if (message != null) { - assertThat(message, containsString(substring)); - } else { - fail(); - } - } - - public static com.google.bigtable.v2.Row bigTableRow() { - List columns = - ImmutableList.of( - column("boolColumn", booleanToByteArray(true)), - column("doubleColumn", doubleToByteArray(5.5)), - column("longColumn", Longs.toByteArray(10L)), - column("stringColumn", "stringValue".getBytes(UTF_8))); - Family family = Family.newBuilder().setName("familyTest").addAllColumns(columns).build(); - return com.google.bigtable.v2.Row.newBuilder() - .setKey(ByteString.copyFromUtf8("key")) - .addFamilies(family) - .build(); - } - - // There is no possibility to insert a value with fixed timestamp so we have to replace it - // for the testing purpose. - public static com.google.bigtable.v2.Row setFixedTimestamp(com.google.bigtable.v2.Row row) { - Family family = row.getFamilies(0); - - List columnsReplaced = - family.getColumnsList().stream() - .map( - column -> { - Cell cell = column.getCells(0); - return column( - column.getQualifier().toStringUtf8(), cell.getValue().toByteArray()); - }) - .collect(toList()); - Family familyReplaced = - Family.newBuilder().setName(family.getName()).addAllColumns(columnsReplaced).build(); - return com.google.bigtable.v2.Row.newBuilder() - .setKey(row.getKey()) - .addFamilies(familyReplaced) - .build(); - } - - private static Column column(String qualifier, byte[] value) { - return Column.newBuilder() - .setQualifier(ByteString.copyFromUtf8(qualifier)) - .addCells(cell(value)) - .build(); - } - - private static Cell cell(byte[] value) { - return Cell.newBuilder().setValue(ByteString.copyFrom(value)).setTimestampMicros(NOW).build(); - } -} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/BigtableUtils.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/BigtableUtils.java new file mode 100644 index 000000000000..67ed384e007f --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/BigtableUtils.java @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.testing; + +import com.google.protobuf.ByteString; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.primitives.Ints; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.primitives.Longs; + +public class BigtableUtils { + + public static ByteString byteString(byte[] bytes) { + return ByteString.copyFrom(bytes); + } + + public static ByteString byteStringUtf8(String s) { + return ByteString.copyFromUtf8(s); + } + + public static byte[] floatToByteArray(float number) { + return Ints.toByteArray(Float.floatToIntBits(number)); + } + + public static byte[] longToByteArray(long number) { + return Longs.toByteArray(number); + } + + public static byte[] doubleToByteArray(double number) { + return Longs.toByteArray(Double.doubleToLongBits(number)); + } + + public static byte[] booleanToByteArray(boolean condition) { + return condition ? new byte[] {1} : new byte[] {0}; + } +} diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/FakeBigQueryServices.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/FakeBigQueryServices.java index 6bbb9dabebbd..2296e934839f 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/FakeBigQueryServices.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/FakeBigQueryServices.java @@ -33,7 +33,6 @@ import org.apache.beam.sdk.io.gcp.bigquery.BigQueryOptions; import org.apache.beam.sdk.io.gcp.bigquery.BigQueryServices; import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder; -import org.apache.beam.sdk.util.Histogram; import org.apache.beam.sdk.values.KV; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; @@ -72,11 +71,6 @@ public DatasetService getDatasetService(BigQueryOptions bqOptions) { return datasetService; } - @Override - public DatasetService getDatasetService(BigQueryOptions bqOptions, Histogram requestLatencies) { - return datasetService; - } - @Override public StorageClient getStorageClient(BigQueryOptions bqOptions) { return storageClient; diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/FakeDatasetService.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/FakeDatasetService.java index 7534a86ed1ee..1ee6501ea23a 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/FakeDatasetService.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/FakeDatasetService.java @@ -21,32 +21,50 @@ import com.google.api.client.googleapis.json.GoogleJsonResponseException; import com.google.api.client.http.HttpHeaders; +import com.google.api.core.ApiFuture; +import com.google.api.core.ApiFutures; import com.google.api.services.bigquery.model.Dataset; import com.google.api.services.bigquery.model.DatasetReference; import com.google.api.services.bigquery.model.Table; import com.google.api.services.bigquery.model.TableDataInsertAllResponse; import com.google.api.services.bigquery.model.TableReference; import com.google.api.services.bigquery.model.TableRow; +import com.google.cloud.bigquery.storage.v1beta2.AppendRowsResponse; +import com.google.cloud.bigquery.storage.v1beta2.BatchCommitWriteStreamsResponse; +import com.google.cloud.bigquery.storage.v1beta2.FinalizeWriteStreamResponse; +import 
com.google.cloud.bigquery.storage.v1beta2.FlushRowsResponse; +import com.google.cloud.bigquery.storage.v1beta2.ProtoRows; +import com.google.cloud.bigquery.storage.v1beta2.WriteStream; +import com.google.cloud.bigquery.storage.v1beta2.WriteStream.Type; +import com.google.protobuf.ByteString; +import com.google.protobuf.Descriptors.Descriptor; +import com.google.protobuf.DynamicMessage; +import com.google.protobuf.Timestamp; import java.io.IOException; import java.io.Serializable; import java.util.HashMap; import java.util.List; import java.util.Map; +import java.util.UUID; +import java.util.concurrent.atomic.AtomicInteger; import java.util.regex.Pattern; +import javax.annotation.Nullable; import org.apache.beam.sdk.annotations.Internal; import org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers; import org.apache.beam.sdk.io.gcp.bigquery.BigQueryServices.DatasetService; +import org.apache.beam.sdk.io.gcp.bigquery.BigQueryServices.StreamAppendClient; import org.apache.beam.sdk.io.gcp.bigquery.ErrorContainer; import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy; import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy.Context; +import org.apache.beam.sdk.io.gcp.bigquery.TableRowToStorageApiProto; import org.apache.beam.sdk.transforms.windowing.GlobalWindow; import org.apache.beam.sdk.transforms.windowing.PaneInfo; import org.apache.beam.sdk.values.FailsafeValueInSingleWindow; import org.apache.beam.sdk.values.ValueInSingleWindow; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.HashBasedTable; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; -import org.checkerframework.checker.nullness.qual.Nullable; /** A fake dataset service that can be serialized, for use in testReadFromTable. */ @Internal @@ -60,10 +78,70 @@ public class FakeDatasetService implements DatasetService, Serializable { String, String, Map> tables; + static Map writeStreams; + + @Override + public void close() throws Exception {} + + static class Stream { + final List stream; + final TableContainer tableContainer; + final Type type; + long nextFlushPosition; + boolean finalized; + + Stream(TableContainer tableContainer, Type type) { + this.stream = Lists.newArrayList(); + this.tableContainer = tableContainer; + this.type = type; + this.finalized = false; + this.nextFlushPosition = 0; + } + + long finalizeStream() { + this.finalized = true; + return stream.size(); + } + + void appendRows(long position, List rowsToAppend) { + if (finalized) { + throw new RuntimeException("Stream already finalized."); + } + if (position != -1 && position != stream.size()) { + throw new RuntimeException("Bad append: " + position); + } + stream.addAll(rowsToAppend); + } + + void flush(long position) { + Preconditions.checkState(type == Type.BUFFERED); + Preconditions.checkState(!finalized); + if (position >= stream.size()) { + throw new RuntimeException(""); + } + for (; nextFlushPosition <= position; ++nextFlushPosition) { + tableContainer.addRow(stream.get((int) nextFlushPosition), ""); + } + } + + void commit() { + if (!finalized) { + throw new RuntimeException("Can't commit unfinalized stream."); + } + Preconditions.checkState(type == Type.PENDING); + stream.forEach(tr -> tableContainer.addRow(tr, null)); + } + } + Map> insertErrors = Maps.newHashMap(); + // The counter for the number of insertions performed. 
+ static AtomicInteger insertCount; + public static void setUp() { tables = HashBasedTable.create(); + insertCount = new AtomicInteger(0); + writeStreams = Maps.newHashMap(); FakeJobService.setUp(); } @@ -87,6 +165,7 @@ public Table getTable(TableReference tableRef, @Nullable List selectedFi tableRef.getProjectId(), tableRef.getDatasetId()); } TableContainer tableContainer = dataset.get(tableRef.getTableId()); + return tableContainer == null ? null : tableContainer.getTable(); } } @@ -94,6 +173,7 @@ public Table getTable(TableReference tableRef, @Nullable List selectedFi public List<TableRow> getAllRows(String projectId, String datasetId, String tableId) throws InterruptedException, IOException { synchronized (tables) { + TableContainer tableContainer = getTableContainer(projectId, datasetId, tableId); return getTableContainer(projectId, datasetId, tableId).getRows(); } } @@ -217,6 +297,10 @@ public void deleteDataset(String projectId, String datasetId) } } + public int getInsertCount() { + return insertCount.get(); + } + public long insertAll( TableReference ref, List<TableRow> rowList, @Nullable List<String> insertIdList) throws IOException, InterruptedException { @@ -239,7 +323,8 @@ public long insertAll( null, false, false, - false); + false, + null); } @Override @@ -252,7 +337,8 @@ public long insertAll( ErrorContainer<T> errorContainer, boolean skipInvalidRows, boolean ignoreUnknownValues, - boolean ignoreInsertIds) + boolean ignoreInsertIds, + List<ValueInSingleWindow<TableRow>> successfulRows) throws IOException, InterruptedException { Map<TableRow, List<TableDataInsertAllResponse.InsertErrors>> insertErrors = getInsertErrors(); synchronized (tables) { @@ -287,11 +373,20 @@ public long insertAll( } else { dataSize += tableContainer.addRow(row, insertIdList.get(i)); } + if (successfulRows != null) { + successfulRows.add( + ValueInSingleWindow.of( + row, + rowList.get(i).getTimestamp(), + rowList.get(i).getWindow(), + rowList.get(i).getPane())); + } } else { errorContainer.add( failedInserts, allErrors.get(allErrors.size() - 1), ref, rowList.get(i)); } } + insertCount.addAndGet(1); return dataSize; } } @@ -312,6 +407,103 @@ public Table patchTableDescription( } } + @Override + public WriteStream createWriteStream(String tableUrn, Type type) + throws IOException, InterruptedException { + if (type != Type.PENDING && type != Type.BUFFERED) { + throw new RuntimeException("We only support PENDING or BUFFERED streams."); + } + TableReference tableReference = + BigQueryHelpers.parseTableUrn(BigQueryHelpers.stripPartitionDecorator(tableUrn)); + synchronized (tables) { + TableContainer tableContainer = + getTableContainer( + tableReference.getProjectId(), + tableReference.getDatasetId(), + tableReference.getTableId()); + String streamName = UUID.randomUUID().toString(); + writeStreams.put(streamName, new Stream(tableContainer, type)); + return WriteStream.newBuilder().setName(streamName).build(); + } + } + + @Override + public StreamAppendClient getStreamAppendClient(String streamName, Descriptor descriptor) { + return new StreamAppendClient() { + @Override + public ApiFuture<AppendRowsResponse> appendRows(long offset, ProtoRows rows) + throws Exception { + synchronized (tables) { + Stream stream = writeStreams.get(streamName); + if (stream == null) { + throw new RuntimeException("No such stream: " + streamName); + } + List<TableRow> tableRows = + Lists.newArrayListWithExpectedSize(rows.getSerializedRowsCount()); + for (ByteString bytes : rows.getSerializedRowsList()) { + tableRows.add( + TableRowToStorageApiProto.tableRowFromMessage( + DynamicMessage.parseFrom(descriptor, bytes))); + } + stream.appendRows(offset, tableRows); + } + 
return ApiFutures.immediateFuture(AppendRowsResponse.newBuilder().build()); + } + + @Override + public void close() throws Exception {} + + @Override + public void pin() {} + + @Override + public void unpin() throws Exception {} + }; + } + + @Override + public ApiFuture<FlushRowsResponse> flush(String streamName, long offset) { + synchronized (tables) { + Stream stream = writeStreams.get(streamName); + if (stream == null) { + throw new RuntimeException("No such stream: " + streamName); + } + stream.flush(offset); + } + return ApiFutures.immediateFuture(FlushRowsResponse.newBuilder().build()); + } + + @Override + public ApiFuture<FinalizeWriteStreamResponse> finalizeWriteStream(String streamName) { + synchronized (tables) { + Stream stream = writeStreams.get(streamName); + if (stream == null) { + throw new RuntimeException("No such stream: " + streamName); + } + long numRows = stream.finalizeStream(); + return ApiFutures.immediateFuture( + FinalizeWriteStreamResponse.newBuilder().setRowCount(numRows).build()); + } + } + + @Override + public ApiFuture<BatchCommitWriteStreamsResponse> commitWriteStreams( + String tableUrn, Iterable<String> writeStreamNames) { + synchronized (tables) { + for (String streamName : writeStreamNames) { + Stream stream = writeStreams.get(streamName); + if (stream == null) { + throw new RuntimeException("No such stream: " + streamName); + } + stream.commit(); + } + } + return ApiFutures.immediateFuture( + BatchCommitWriteStreamsResponse.newBuilder() + .setCommitTime(Timestamp.newBuilder().build()) + .build()); + } + /** * Cause a given {@link TableRow} object to fail when it's inserted. The errors in the list will * be returned on subsequent retries, and the insert will succeed when the errors run out. diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/FakeJobService.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/FakeJobService.java index 924a9962294c..a891c484ad19 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/FakeJobService.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/FakeJobService.java @@ -103,6 +103,9 @@ public class FakeJobService implements JobService, Serializable { private final FakeDatasetService datasetService; + @Override + public void close() throws Exception {} + private static class JobInfo { Job job; int getJobCount = 0; @@ -170,7 +173,6 @@ public void startLoadJob(JobReference jobRef, JobConfigurationLoad loadConfig) FileSystems.copy(sourceFiles.build(), loadFiles.build()); filesForLoadJobs.put(jobRef.getProjectId(), jobRef.getJobId(), loadFiles.build()); } - allJobs.put(jobRef.getProjectId(), jobRef.getJobId(), new JobInfo(job)); } } diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/GcpApiSurfaceTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/GcpApiSurfaceTest.java index d894ebfde1dc..118210ead32c 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/GcpApiSurfaceTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/GcpApiSurfaceTest.java @@ -34,9 +34,6 @@ /** API surface verification for {@link org.apache.beam.sdk.io.gcp}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class GcpApiSurfaceTest { @Test @@ -63,18 +60,27 @@ public void testGcpApiSurface() throws Exception { classesInPackage("com.google.api.client.http"), classesInPackage("com.google.api.client.json"), classesInPackage("com.google.api.client.util"), + classesInPackage("com.google.api.gax.retrying"), + classesInPackage("com.google.api.gax.longrunning"), classesInPackage("com.google.api.services.bigquery.model"), classesInPackage("com.google.api.services.healthcare"), classesInPackage("com.google.auth"), classesInPackage("com.google.bigtable.v2"), + classesInPackage("com.google.cloud"), + classesInPackage("com.google.common.collect"), classesInPackage("com.google.cloud.bigquery.storage.v1"), + classesInPackage("com.google.cloud.bigquery.storage.v1beta2"), classesInPackage("com.google.cloud.bigtable.config"), + classesInPackage("com.google.iam.v1"), classesInPackage("com.google.spanner.v1"), classesInPackage("com.google.pubsub.v1"), classesInPackage("com.google.cloud.pubsublite"), Matchers.equalTo(com.google.api.gax.rpc.ApiException.class), + Matchers.equalTo(com.google.api.gax.paging.Page.class), Matchers.<Class<?>>equalTo(com.google.api.gax.rpc.StatusCode.class), Matchers.<Class<?>>equalTo(com.google.common.base.Function.class), + Matchers.<Class<?>>equalTo(com.google.common.base.Optional.class), + Matchers.<Class<?>>equalTo(com.google.common.base.Supplier.class), Matchers.<Class<?>>equalTo(com.google.api.gax.rpc.StatusCode.Code.class), Matchers.<Class<?>>equalTo(com.google.cloud.bigtable.grpc.BigtableClusterName.class), Matchers.<Class<?>>equalTo(com.google.cloud.bigtable.grpc.BigtableInstanceName.class), @@ -89,9 +95,19 @@ public void testGcpApiSurface() throws Exception { Matchers.<Class<?>>equalTo(com.google.cloud.ByteArray.class), Matchers.<Class<?>>equalTo(com.google.cloud.Date.class), Matchers.<Class<?>>equalTo(com.google.cloud.Timestamp.class), + // BatchWrite for Firestore returns individual Status for each write attempted, in the + // case of a failure and using the dead letter queue the Status is returned as part of + // the WriteFailure + Matchers.<Class<?>>equalTo(com.google.rpc.Status.class), + Matchers.<Class<?>>equalTo(com.google.rpc.Status.Builder.class), + Matchers.<Class<?>>equalTo(com.google.rpc.StatusOrBuilder.class), classesInPackage("com.google.cloud.spanner"), + classesInPackage("com.google.longrunning"), + classesInPackage("com.google.spanner.admin.database.v1"), classesInPackage("com.google.datastore.v1"), + classesInPackage("com.google.firestore.v1"), classesInPackage("com.google.protobuf"), + classesInPackage("com.google.rpc"), classesInPackage("com.google.type"), classesInPackage("com.fasterxml.jackson.annotation"), classesInPackage("com.fasterxml.jackson.core"), @@ -104,6 +120,9 @@ public void testGcpApiSurface() throws Exception { classesInPackage("org.apache.commons.logging"), classesInPackage("org.codehaus.jackson"), classesInPackage("org.joda.time"), + Matchers.<Class<?>>equalTo(org.threeten.bp.Duration.class), + Matchers.<Class<?>>equalTo(org.threeten.bp.format.ResolverStyle.class), + classesInPackage("org.threeten.bp.temporal"), classesInPackage("com.google.gson")); assertThat(apiSurface, containsOnlyClassesMatching(allowedClasses)); diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BeamRowToStorageApiProtoTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BeamRowToStorageApiProtoTest.java new file mode 100644 index 000000000000..32c0b7a90878 --- /dev/null +++ 
b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BeamRowToStorageApiProtoTest.java @@ -0,0 +1,391 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.bigquery; + +import static org.junit.Assert.assertEquals; + +import com.google.protobuf.ByteString; +import com.google.protobuf.DescriptorProtos.DescriptorProto; +import com.google.protobuf.DescriptorProtos.FieldDescriptorProto; +import com.google.protobuf.DescriptorProtos.FieldDescriptorProto.Label; +import com.google.protobuf.DescriptorProtos.FieldDescriptorProto.Type; +import com.google.protobuf.Descriptors.Descriptor; +import com.google.protobuf.Descriptors.FieldDescriptor; +import com.google.protobuf.DynamicMessage; +import java.math.BigDecimal; +import java.nio.charset.StandardCharsets; +import java.time.LocalDate; +import java.time.LocalDateTime; +import java.time.LocalTime; +import java.util.Map; +import java.util.stream.Collectors; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.Field; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.schemas.logicaltypes.EnumerationType; +import org.apache.beam.sdk.schemas.logicaltypes.SqlTypes; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Functions; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.joda.time.Instant; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +@RunWith(JUnit4.class) +@SuppressWarnings({ + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) +/** Unit tests form {@link BeamRowToStorageApiProto}. 
*/ +public class BeamRowToStorageApiProtoTest { + private static final EnumerationType TEST_ENUM = + EnumerationType.create("ONE", "TWO", "RED", "BLUE"); + private static final Schema BASE_SCHEMA = + Schema.builder() + .addField("byteValue", FieldType.BYTE.withNullable(true)) + .addField("int16Value", FieldType.INT16) + .addField("int32Value", FieldType.INT32.withNullable(true)) + .addField("int64Value", FieldType.INT64.withNullable(true)) + .addField("decimalValue", FieldType.DECIMAL.withNullable(true)) + .addField("floatValue", FieldType.FLOAT.withNullable(true)) + .addField("doubleValue", FieldType.DOUBLE.withNullable(true)) + .addField("stringValue", FieldType.STRING.withNullable(true)) + .addField("datetimeValue", FieldType.DATETIME.withNullable(true)) + .addField("booleanValue", FieldType.BOOLEAN.withNullable(true)) + .addField("bytesValue", FieldType.BYTES.withNullable(true)) + .addField("arrayValue", FieldType.array(FieldType.STRING)) + .addField("iterableValue", FieldType.array(FieldType.STRING)) + .addField("sqlDateValue", FieldType.logicalType(SqlTypes.DATE).withNullable(true)) + .addField("sqlTimeValue", FieldType.logicalType(SqlTypes.TIME).withNullable(true)) + .addField("sqlDatetimeValue", FieldType.logicalType(SqlTypes.DATETIME).withNullable(true)) + .addField( + "sqlTimestampValue", FieldType.logicalType(SqlTypes.TIMESTAMP).withNullable(true)) + .addField("enumValue", FieldType.logicalType(TEST_ENUM).withNullable(true)) + .build(); + + private static final DescriptorProto BASE_SCHEMA_PROTO = + DescriptorProto.newBuilder() + .addField( + FieldDescriptorProto.newBuilder() + .setName("bytevalue") + .setNumber(1) + .setType(Type.TYPE_INT32) + .setLabel(Label.LABEL_OPTIONAL) + .build()) + .addField( + FieldDescriptorProto.newBuilder() + .setName("int16value") + .setNumber(2) + .setType(Type.TYPE_INT32) + .setLabel(Label.LABEL_REQUIRED) + .build()) + .addField( + FieldDescriptorProto.newBuilder() + .setName("int32value") + .setNumber(3) + .setType(Type.TYPE_INT32) + .setLabel(Label.LABEL_OPTIONAL) + .build()) + .addField( + FieldDescriptorProto.newBuilder() + .setName("int64value") + .setNumber(4) + .setType(Type.TYPE_INT64) + .setLabel(Label.LABEL_OPTIONAL) + .build()) + .addField( + FieldDescriptorProto.newBuilder() + .setName("decimalvalue") + .setNumber(5) + .setType(Type.TYPE_BYTES) + .setLabel(Label.LABEL_OPTIONAL) + .build()) + .addField( + FieldDescriptorProto.newBuilder() + .setName("floatvalue") + .setNumber(6) + .setType(Type.TYPE_FLOAT) + .setLabel(Label.LABEL_OPTIONAL) + .build()) + .addField( + FieldDescriptorProto.newBuilder() + .setName("doublevalue") + .setNumber(7) + .setType(Type.TYPE_DOUBLE) + .setLabel(Label.LABEL_OPTIONAL) + .build()) + .addField( + FieldDescriptorProto.newBuilder() + .setName("stringvalue") + .setNumber(8) + .setType(Type.TYPE_STRING) + .setLabel(Label.LABEL_OPTIONAL) + .build()) + .addField( + FieldDescriptorProto.newBuilder() + .setName("datetimevalue") + .setNumber(9) + .setType(Type.TYPE_INT64) + .setLabel(Label.LABEL_OPTIONAL) + .build()) + .addField( + FieldDescriptorProto.newBuilder() + .setName("booleanvalue") + .setNumber(10) + .setType(Type.TYPE_BOOL) + .setLabel(Label.LABEL_OPTIONAL) + .build()) + .addField( + FieldDescriptorProto.newBuilder() + .setName("bytesvalue") + .setNumber(11) + .setType(Type.TYPE_BYTES) + .setLabel(Label.LABEL_OPTIONAL) + .build()) + .addField( + FieldDescriptorProto.newBuilder() + .setName("arrayvalue") + .setNumber(12) + .setType(Type.TYPE_STRING) + .setLabel(Label.LABEL_REPEATED) + .build()) + 
.addField( + FieldDescriptorProto.newBuilder() + .setName("iterablevalue") + .setNumber(13) + .setType(Type.TYPE_STRING) + .setLabel(Label.LABEL_REPEATED) + .build()) + .addField( + FieldDescriptorProto.newBuilder() + .setName("sqldatevalue") + .setNumber(14) + .setType(Type.TYPE_INT32) + .setLabel(Label.LABEL_OPTIONAL) + .build()) + .addField( + FieldDescriptorProto.newBuilder() + .setName("sqltimevalue") + .setNumber(15) + .setType(Type.TYPE_INT64) + .setLabel(Label.LABEL_OPTIONAL) + .build()) + .addField( + FieldDescriptorProto.newBuilder() + .setName("sqldatetimevalue") + .setNumber(16) + .setType(Type.TYPE_INT64) + .setLabel(Label.LABEL_OPTIONAL) + .build()) + .addField( + FieldDescriptorProto.newBuilder() + .setName("sqltimestampvalue") + .setNumber(17) + .setType(Type.TYPE_INT64) + .setLabel(Label.LABEL_OPTIONAL) + .build()) + .addField( + FieldDescriptorProto.newBuilder() + .setName("enumvalue") + .setNumber(18) + .setType(Type.TYPE_STRING) + .setLabel(Label.LABEL_OPTIONAL) + .build()) + .build(); + + private static final byte[] BYTES = "BYTE BYTE BYTE".getBytes(StandardCharsets.UTF_8); + private static final Row BASE_ROW = + Row.withSchema(BASE_SCHEMA) + .withFieldValue("byteValue", (byte) 1) + .withFieldValue("int16Value", (short) 2) + .withFieldValue("int32Value", (int) 3) + .withFieldValue("int64Value", (long) 4) + .withFieldValue("decimalValue", BigDecimal.valueOf(5)) + .withFieldValue("floatValue", (float) 3.14) + .withFieldValue("doubleValue", (double) 2.68) + .withFieldValue("stringValue", "I am a string. Hear me roar.") + .withFieldValue("datetimeValue", Instant.now()) + .withFieldValue("booleanValue", true) + .withFieldValue("bytesValue", BYTES) + .withFieldValue("arrayValue", ImmutableList.of("one", "two", "red", "blue")) + .withFieldValue("iterableValue", ImmutableList.of("blue", "red", "two", "one")) + .withFieldValue("sqlDateValue", LocalDate.now()) + .withFieldValue("sqlTimeValue", LocalTime.now()) + .withFieldValue("sqlDatetimeValue", LocalDateTime.now()) + .withFieldValue("sqlTimestampValue", java.time.Instant.now()) + .withFieldValue("enumValue", TEST_ENUM.valueOf("RED")) + .build(); + private static final Map BASE_PROTO_EXPECTED_FIELDS = + ImmutableMap.builder() + .put("bytevalue", (int) 1) + .put("int16value", (int) 2) + .put("int32value", (int) 3) + .put("int64value", (long) 4) + .put( + "decimalvalue", + BeamRowToStorageApiProto.serializeBigDecimalToNumeric(BigDecimal.valueOf(5))) + .put("floatvalue", (float) 3.14) + .put("doublevalue", (double) 2.68) + .put("stringvalue", "I am a string. 
Hear me roar.") + .put("datetimevalue", BASE_ROW.getDateTime("datetimeValue").getMillis() * 1000) + .put("booleanvalue", true) + .put("bytesvalue", ByteString.copyFrom(BYTES)) + .put("arrayvalue", ImmutableList.of("one", "two", "red", "blue")) + .put("iterablevalue", ImmutableList.of("blue", "red", "two", "one")) + .put( + "sqldatevalue", + (int) BASE_ROW.getLogicalTypeValue("sqlDateValue", LocalDate.class).toEpochDay()) + .put( + "sqltimevalue", + CivilTimeEncoder.encodePacked64TimeMicros( + BASE_ROW.getLogicalTypeValue("sqlTimeValue", LocalTime.class))) + .put( + "sqldatetimevalue", + CivilTimeEncoder.encodePacked64DatetimeSeconds( + BASE_ROW.getLogicalTypeValue("sqlDatetimeValue", LocalDateTime.class))) + .put( + "sqltimestampvalue", + BASE_ROW + .getLogicalTypeValue("sqlTimestampValue", java.time.Instant.class) + .toEpochMilli() + * 1000) + .put("enumvalue", "RED") + .build(); + + private static final Schema NESTED_SCHEMA = + Schema.builder() + .addField("nested", FieldType.row(BASE_SCHEMA).withNullable(true)) + .addField("nestedArray", FieldType.array(FieldType.row(BASE_SCHEMA))) + .addField("nestedIterable", FieldType.iterable(FieldType.row(BASE_SCHEMA))) + .build(); + private static final Row NESTED_ROW = + Row.withSchema(NESTED_SCHEMA) + .withFieldValue("nested", BASE_ROW) + .withFieldValue("nestedArray", ImmutableList.of(BASE_ROW, BASE_ROW)) + .withFieldValue("nestedIterable", ImmutableList.of(BASE_ROW, BASE_ROW)) + .build(); + + @Test + public void testDescriptorFromSchema() { + DescriptorProto descriptor = + BeamRowToStorageApiProto.descriptorSchemaFromBeamSchema(BASE_SCHEMA); + Map types = + descriptor.getFieldList().stream() + .collect( + Collectors.toMap(FieldDescriptorProto::getName, FieldDescriptorProto::getType)); + Map expectedTypes = + BASE_SCHEMA_PROTO.getFieldList().stream() + .collect( + Collectors.toMap(FieldDescriptorProto::getName, FieldDescriptorProto::getType)); + assertEquals(expectedTypes, types); + + Map nameMapping = + BASE_SCHEMA.getFields().stream() + .collect(Collectors.toMap(f -> f.getName().toLowerCase(), Field::getName)); + descriptor + .getFieldList() + .forEach( + p -> { + FieldType schemaFieldType = + BASE_SCHEMA.getField(nameMapping.get(p.getName())).getType(); + Label label = + schemaFieldType.getTypeName().isCollectionType() + ? Label.LABEL_REPEATED + : schemaFieldType.getNullable() ? 
Label.LABEL_OPTIONAL : Label.LABEL_REQUIRED; + assertEquals(label, p.getLabel()); + }); + } + + @Test + public void testNestedFromSchema() { + DescriptorProto descriptor = + BeamRowToStorageApiProto.descriptorSchemaFromBeamSchema(NESTED_SCHEMA); + Map expectedBaseTypes = + BASE_SCHEMA_PROTO.getFieldList().stream() + .collect( + Collectors.toMap(FieldDescriptorProto::getName, FieldDescriptorProto::getType)); + + Map types = + descriptor.getFieldList().stream() + .collect( + Collectors.toMap(FieldDescriptorProto::getName, FieldDescriptorProto::getType)); + Map typeNames = + descriptor.getFieldList().stream() + .collect( + Collectors.toMap(FieldDescriptorProto::getName, FieldDescriptorProto::getTypeName)); + Map typeLabels = + descriptor.getFieldList().stream() + .collect( + Collectors.toMap(FieldDescriptorProto::getName, FieldDescriptorProto::getLabel)); + + assertEquals(3, types.size()); + + Map nestedTypes = + descriptor.getNestedTypeList().stream() + .collect(Collectors.toMap(DescriptorProto::getName, Functions.identity())); + assertEquals(3, nestedTypes.size()); + assertEquals(Type.TYPE_MESSAGE, types.get("nested")); + assertEquals(Label.LABEL_OPTIONAL, typeLabels.get("nested")); + String nestedTypeName1 = typeNames.get("nested"); + Map nestedTypes1 = + nestedTypes.get(nestedTypeName1).getFieldList().stream() + .collect( + Collectors.toMap(FieldDescriptorProto::getName, FieldDescriptorProto::getType)); + assertEquals(expectedBaseTypes, nestedTypes1); + + assertEquals(Type.TYPE_MESSAGE, types.get("nestedarray")); + assertEquals(Label.LABEL_REPEATED, typeLabels.get("nestedarray")); + String nestedTypeName2 = typeNames.get("nestedarray"); + Map nestedTypes2 = + nestedTypes.get(nestedTypeName2).getFieldList().stream() + .collect( + Collectors.toMap(FieldDescriptorProto::getName, FieldDescriptorProto::getType)); + assertEquals(expectedBaseTypes, nestedTypes2); + + assertEquals(Type.TYPE_MESSAGE, types.get("nestediterable")); + assertEquals(Label.LABEL_REPEATED, typeLabels.get("nestediterable")); + String nestedTypeName3 = typeNames.get("nestediterable"); + Map nestedTypes3 = + nestedTypes.get(nestedTypeName3).getFieldList().stream() + .collect( + Collectors.toMap(FieldDescriptorProto::getName, FieldDescriptorProto::getType)); + assertEquals(expectedBaseTypes, nestedTypes3); + } + + private void assertBaseRecord(DynamicMessage msg) { + Map recordFields = + msg.getAllFields().entrySet().stream() + .collect( + Collectors.toMap(entry -> entry.getKey().getName(), entry -> entry.getValue())); + assertEquals(BASE_PROTO_EXPECTED_FIELDS, recordFields); + } + + @Test + public void testMessageFromTableRow() throws Exception { + Descriptor descriptor = BeamRowToStorageApiProto.getDescriptorFromSchema(NESTED_SCHEMA); + DynamicMessage msg = BeamRowToStorageApiProto.messageFromBeamRow(descriptor, NESTED_ROW); + assertEquals(3, msg.getAllFields().size()); + + Map fieldDescriptors = + descriptor.getFields().stream() + .collect(Collectors.toMap(FieldDescriptor::getName, Functions.identity())); + DynamicMessage nestedMsg = (DynamicMessage) msg.getField(fieldDescriptors.get("nested")); + assertBaseRecord(nestedMsg); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryAvroUtilsTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryAvroUtilsTest.java index 1adffa89c522..88ad9be52073 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryAvroUtilsTest.java 
+++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryAvroUtilsTest.java @@ -44,15 +44,13 @@ import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.BaseEncoding; +import org.apache.commons.lang3.tuple.Pair; import org.junit.Test; import org.junit.runner.RunWith; import org.junit.runners.JUnit4; /** Tests for {@link BigQueryAvroUtils}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigQueryAvroUtilsTest { private List subFields = Lists.newArrayList( @@ -73,6 +71,10 @@ public class BigQueryAvroUtilsTest { new TableFieldSchema().setName("quantity").setType("INTEGER") /* default to NULLABLE */, new TableFieldSchema().setName("birthday").setType("TIMESTAMP").setMode("NULLABLE"), new TableFieldSchema().setName("birthdayMoney").setType("NUMERIC").setMode("NULLABLE"), + new TableFieldSchema() + .setName("lotteryWinnings") + .setType("BIGNUMERIC") + .setMode("NULLABLE"), new TableFieldSchema().setName("flighted").setType("BOOLEAN").setMode("NULLABLE"), new TableFieldSchema().setName("sound").setType("BYTES").setMode("NULLABLE"), new TableFieldSchema().setName("anniversaryDate").setType("DATE").setMode("NULLABLE"), @@ -93,37 +95,52 @@ public class BigQueryAvroUtilsTest { .setFields(subFields), new TableFieldSchema().setName("geoPositions").setType("GEOGRAPHY").setMode("NULLABLE")); + private Pair convertToByteBuffer(BigDecimal bigDecimal, Schema schema) { + LogicalType bigDecimalLogicalType = + LogicalTypes.decimal(bigDecimal.precision(), bigDecimal.scale()); + // DecimalConversion.toBytes returns a ByteBuffer, which can be mutated by callees if passed + // to other methods. We wrap the byte array as a ByteBuffer before adding it to the + // GenericRecords. + byte[] bigDecimalBytes = + new Conversions.DecimalConversion() + .toBytes(bigDecimal, schema, bigDecimalLogicalType) + .array(); + return Pair.of(bigDecimalLogicalType, bigDecimalBytes); + } + @Test public void testConvertGenericRecordToTableRow() throws Exception { TableSchema tableSchema = new TableSchema(); tableSchema.setFields(fields); - // BigQuery encodes NUMERIC values to Avro using the BYTES type with the DECIMAL logical - // type. AvroCoder can't apply logical types to Schemas directly, so we need to get the + // BigQuery encodes NUMERIC and BIGNUMERIC values to Avro using the BYTES type with the DECIMAL + // logical type. AvroCoder can't apply logical types to Schemas directly, so we need to get the // Schema for the Bird class defined below, then replace the field used to test NUMERIC with // a field that has the appropriate Schema. - BigDecimal birthdayMoney = new BigDecimal("123456789.123456789"); - Schema birthdayMoneySchema = Schema.create(Type.BYTES); - LogicalType birthdayMoneyLogicalType = - LogicalTypes.decimal(birthdayMoney.precision(), birthdayMoney.scale()); - // DecimalConversion.toBytes returns a ByteBuffer, which can be mutated by callees if passed - // to other methods. We wrap the byte array as a ByteBuffer when adding it to the - // GenericRecords below. 
- byte[] birthdayMoneyBytes = - new Conversions.DecimalConversion() - .toBytes(birthdayMoney, birthdayMoneySchema, birthdayMoneyLogicalType) - .array(); + Schema numericSchema = Schema.create(Type.BYTES); + BigDecimal numeric = new BigDecimal("123456789.123456789"); + Pair numericPair = convertToByteBuffer(numeric, numericSchema); + Schema bigNumericSchema = Schema.create(Type.BYTES); + BigDecimal bigNumeric = + new BigDecimal( + "578960446186580977117854925043439539266.34992332820282019728792003956564819967"); + Pair bigNumericPair = convertToByteBuffer(bigNumeric, bigNumericSchema); - // In order to update the Schema for birthdayMoney, we need to recreate all of the Fields. + // In order to update the Schema for NUMERIC and BIGNUMERIC values, we need to recreate all of + // the Fields. List avroFields = new ArrayList<>(); for (Schema.Field field : AvroCoder.of(Bird.class).getSchema().getFields()) { Schema schema = field.schema(); if ("birthdayMoney".equals(field.name())) { - // birthdayMoney is a nullable field with type BYTES/DECIMAL. + // birthdayMoney is nullable field with type BYTES/DECIMAL. schema = Schema.createUnion( - Schema.create(Type.NULL), - birthdayMoneyLogicalType.addToSchema(birthdayMoneySchema)); + Schema.create(Type.NULL), numericPair.getLeft().addToSchema(numericSchema)); + } else if ("lotteryWinnings".equals(field.name())) { + // lotteryWinnings is nullable field with type BYTES/DECIMAL. + schema = + Schema.createUnion( + Schema.create(Type.NULL), bigNumericPair.getLeft().addToSchema(bigNumericSchema)); } // After a Field is added to a Schema, it is assigned a position, so we can't simply reuse // the existing Field. @@ -151,7 +168,8 @@ public void testConvertGenericRecordToTableRow() throws Exception { record.put("number", 5L); record.put("quality", 5.0); record.put("birthday", 5L); - record.put("birthdayMoney", ByteBuffer.wrap(birthdayMoneyBytes)); + record.put("birthdayMoney", ByteBuffer.wrap(numericPair.getRight())); + record.put("lotteryWinnings", ByteBuffer.wrap(bigNumericPair.getRight())); record.put("flighted", Boolean.TRUE); record.put("sound", soundByteBuffer); record.put("anniversaryDate", new Utf8("2000-01-01")); @@ -163,7 +181,8 @@ public void testConvertGenericRecordToTableRow() throws Exception { new TableRow() .set("number", "5") .set("birthday", "1970-01-01 00:00:00.000005 UTC") - .set("birthdayMoney", birthdayMoney.toString()) + .set("birthdayMoney", numeric.toString()) + .set("lotteryWinnings", bigNumeric.toString()) .set("quality", 5.0) .set("associates", new ArrayList()) .set("flighted", Boolean.TRUE) @@ -184,13 +203,15 @@ public void testConvertGenericRecordToTableRow() throws Exception { GenericRecord record = new GenericData.Record(avroSchema); record.put("number", 5L); record.put("associates", Lists.newArrayList(nestedRecord)); - record.put("birthdayMoney", ByteBuffer.wrap(birthdayMoneyBytes)); + record.put("birthdayMoney", ByteBuffer.wrap(numericPair.getRight())); + record.put("lotteryWinnings", ByteBuffer.wrap(bigNumericPair.getRight())); TableRow convertedRow = BigQueryAvroUtils.convertGenericRecordToTableRow(record, tableSchema); TableRow row = new TableRow() .set("associates", Lists.newArrayList(new TableRow().set("species", "other"))) .set("number", "5") - .set("birthdayMoney", birthdayMoney.toString()); + .set("birthdayMoney", numeric.toString()) + .set("lotteryWinnings", bigNumeric.toString()); assertEquals(row, convertedRow); TableRow clonedRow = convertedRow.clone(); assertEquals(convertedRow, clonedRow); @@ -220,6 +241,9 @@ public 
void testConvertBigQuerySchemaToAvroSchema() { assertThat( avroSchema.getField("birthdayMoney").schema(), equalTo(Schema.createUnion(Schema.create(Type.NULL), Schema.create(Type.BYTES)))); + assertThat( + avroSchema.getField("lotteryWinnings").schema(), + equalTo(Schema.createUnion(Schema.create(Type.NULL), Schema.create(Type.BYTES)))); assertThat( avroSchema.getField("flighted").schema(), equalTo(Schema.createUnion(Schema.create(Type.NULL), Schema.create(Type.BOOLEAN)))); @@ -317,6 +341,7 @@ static class Bird { @Nullable Long quantity; @Nullable Long birthday; // Exercises TIMESTAMP. @Nullable ByteBuffer birthdayMoney; // Exercises NUMERIC. + @Nullable ByteBuffer lotteryWinnings; // Exercises BIGNUMERIC. @Nullable String geoPositions; // Exercises GEOGRAPHY. @Nullable Boolean flighted; @Nullable ByteBuffer sound; diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryHelpersTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryHelpersTest.java index a9cddce49b45..de6f277322bb 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryHelpersTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryHelpersTest.java @@ -54,20 +54,26 @@ /** Tests for {@link BigQueryHelpers}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigQueryHelpersTest { @Rule public transient ExpectedException thrown = ExpectedException.none(); @Test - public void testTableParsing() { + public void testTablesspecParsing() { TableReference ref = BigQueryHelpers.parseTableSpec("my-project:data_set.table_name"); assertEquals("my-project", ref.getProjectId()); assertEquals("data_set", ref.getDatasetId()); assertEquals("table_name", ref.getTableId()); } + @Test + public void testTableUrnParsing() { + TableReference ref = + BigQueryHelpers.parseTableUrn("projects/my-project/datasets/data_set/tables/table_name"); + assertEquals("my-project", ref.getProjectId()); + assertEquals("data_set", ref.getDatasetId()); + assertEquals("table_name", ref.getTableId()); + } + @Test public void testTableParsing_validPatterns() { BigQueryHelpers.parseTableSpec("a123-456:foo_bar.d"); diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOReadIT.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOReadIT.java index 3971de4ea0f5..40d6de288008 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOReadIT.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOReadIT.java @@ -39,9 +39,6 @@ * number of records read equals the given expected number of records. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigQueryIOReadIT { private BigQueryIOReadOptions options; diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOReadTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOReadTest.java index 69439bb32011..d9feefd45429 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOReadTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOReadTest.java @@ -19,10 +19,10 @@ import static org.apache.beam.sdk.io.gcp.bigquery.BigQueryResourceNaming.createTempTableReference; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasDisplayItem; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNull; -import static org.junit.Assert.assertThat; import com.google.api.services.bigquery.model.JobStatistics; import com.google.api.services.bigquery.model.JobStatistics2; @@ -85,9 +85,6 @@ /** Tests for {@link BigQueryIO#read}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigQueryIOReadTest implements Serializable { private transient PipelineOptions options; private transient TemporaryFolder testFolder = new TemporaryFolder(); @@ -108,6 +105,10 @@ public Statement apply(final Statement base, final Description description) { public void evaluate() throws Throwable { options = TestPipeline.testingPipelineOptions(); options.as(BigQueryOptions.class).setProject("project-id"); + if (description.getAnnotations().stream() + .anyMatch(a -> a.annotationType().equals(ProjectOverride.class))) { + options.as(BigQueryOptions.class).setBigQueryProject("bigquery-project-id"); + } options .as(BigQueryOptions.class) .setTempLocation(testFolder.getRoot().getAbsolutePath()); @@ -128,6 +129,64 @@ public void evaluate() throws Throwable { .withDatasetService(fakeDatasetService) .withJobService(fakeJobService); + private void checkSetsProject(String projectId) throws Exception { + fakeDatasetService.createDataset(projectId, "dataset-id", "", "", null); + String tableId = "sometable"; + TableReference tableReference = + new TableReference().setProjectId(projectId).setDatasetId("dataset-id").setTableId(tableId); + + fakeDatasetService.createTable( + new Table() + .setTableReference(tableReference) + .setSchema( + new TableSchema() + .setFields( + ImmutableList.of( + new TableFieldSchema().setName("name").setType("STRING"), + new TableFieldSchema().setName("number").setType("INTEGER"))))); + + FakeBigQueryServices fakeBqServices = + new FakeBigQueryServices() + .withJobService(new FakeJobService()) + .withDatasetService(fakeDatasetService); + + List expected = + ImmutableList.of( + new TableRow().set("name", "a").set("number", 1L), + new TableRow().set("name", "b").set("number", 2L), + new TableRow().set("name", "c").set("number", 3L), + new TableRow().set("name", "d").set("number", 4L), + new TableRow().set("name", "e").set("number", 5L), + new TableRow().set("name", "f").set("number", 6L)); + fakeDatasetService.insertAll(tableReference, expected, null); + + TableReference tableRef = new 
TableReference().setDatasetId("dataset-id").setTableId(tableId); + + PCollection> output = + p.apply(BigQueryIO.read().from(tableRef).withTestServices(fakeBqServices)) + .apply( + ParDo.of( + new DoFn>() { + @ProcessElement + public void processElement(ProcessContext c) throws Exception { + c.output( + KV.of( + (String) c.element().get("name"), + Long.valueOf((String) c.element().get("number")))); + } + })); + PAssert.that(output) + .containsInAnyOrder( + ImmutableList.of( + KV.of("a", 1L), + KV.of("b", 2L), + KV.of("c", 3L), + KV.of("d", 4L), + KV.of("e", 5L), + KV.of("f", 6L))); + p.run(); + } + private void checkReadTableObject( BigQueryIO.Read read, String project, String dataset, String table) { checkReadTableObjectWithValidate(read, project, dataset, table, true); @@ -235,63 +294,14 @@ public void testBuildQueryBasedTypedReadSource() { } @Test - public void testValidateReadSetsDefaultProject() throws Exception { - String tableId = "sometable"; - TableReference tableReference = - new TableReference() - .setProjectId("project-id") - .setDatasetId("dataset-id") - .setTableId(tableId); - fakeDatasetService.createTable( - new Table() - .setTableReference(tableReference) - .setSchema( - new TableSchema() - .setFields( - ImmutableList.of( - new TableFieldSchema().setName("name").setType("STRING"), - new TableFieldSchema().setName("number").setType("INTEGER"))))); - - FakeBigQueryServices fakeBqServices = - new FakeBigQueryServices() - .withJobService(new FakeJobService()) - .withDatasetService(fakeDatasetService); - - List expected = - ImmutableList.of( - new TableRow().set("name", "a").set("number", 1L), - new TableRow().set("name", "b").set("number", 2L), - new TableRow().set("name", "c").set("number", 3L), - new TableRow().set("name", "d").set("number", 4L), - new TableRow().set("name", "e").set("number", 5L), - new TableRow().set("name", "f").set("number", 6L)); - fakeDatasetService.insertAll(tableReference, expected, null); - - TableReference tableRef = new TableReference().setDatasetId("dataset-id").setTableId(tableId); + @ProjectOverride + public void testValidateReadSetsBigQueryProject() throws Exception { + checkSetsProject("bigquery-project-id"); + } - PCollection> output = - p.apply(BigQueryIO.read().from(tableRef).withTestServices(fakeBqServices)) - .apply( - ParDo.of( - new DoFn>() { - @ProcessElement - public void processElement(ProcessContext c) throws Exception { - c.output( - KV.of( - (String) c.element().get("name"), - Long.valueOf((String) c.element().get("number")))); - } - })); - PAssert.that(output) - .containsInAnyOrder( - ImmutableList.of( - KV.of("a", 1L), - KV.of("b", 2L), - KV.of("c", 3L), - KV.of("d", 4L), - KV.of("e", 5L), - KV.of("f", 6L))); - p.run(); + @Test + public void testValidateReadSetsDefaultProject() throws Exception { + checkSetsProject("project-id"); } @Test diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOStorageQueryIT.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOStorageQueryIT.java index dee6417579f9..fa29f7b8447b 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOStorageQueryIT.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOStorageQueryIT.java @@ -43,9 +43,6 @@ * pre-defined table and asserts that the number of records read is equal to the expected count. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigQueryIOStorageQueryIT { private static final Map EXPECTED_NUM_RECORDS = diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOStorageQueryTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOStorageQueryTest.java index 124209905cd5..86f538a4d0f3 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOStorageQueryTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOStorageQueryTest.java @@ -95,9 +95,6 @@ /** Tests for {@link BigQueryIO#readTableRows()} using {@link Method#DIRECT_READ}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigQueryIOStorageQueryTest { private transient BigQueryOptions options; @@ -119,6 +116,10 @@ public Statement apply(Statement base, Description description) { public void evaluate() throws Throwable { options = TestPipeline.testingPipelineOptions().as(BigQueryOptions.class); options.setProject("project-id"); + if (description.getAnnotations().stream() + .anyMatch(a -> a.annotationType().equals(ProjectOverride.class))) { + options.as(BigQueryOptions.class).setBigQueryProject("bigquery-project-id"); + } options.setTempLocation(testFolder.getRoot().getAbsolutePath()); p = TestPipeline.fromOptions(options); p.apply(base, description).evaluate(); @@ -300,6 +301,7 @@ public void testQuerySourceEstimatedSize() throws Exception { /* location = */ null, /* queryTempDataset = */ null, /* kmsKey = */ null, + null, new TableRowParser(), TableRowJsonCoder.of(), fakeBigQueryServices); @@ -371,8 +373,7 @@ private void doQuerySourceInitialSplit( .setParent("projects/" + options.getProject()) .setReadSession( ReadSession.newBuilder() - .setTable(BigQueryHelpers.toTableResourceName(tempTableReference)) - .setDataFormat(DataFormat.AVRO)) + .setTable(BigQueryHelpers.toTableResourceName(tempTableReference))) .setMaxStreamCount(requestedStreamCount) .build(); @@ -393,7 +394,8 @@ private void doQuerySourceInitialSplit( ReadSession.Builder builder = ReadSession.newBuilder() - .setAvroSchema(AvroSchema.newBuilder().setSchema(sessionSchema.toString())); + .setAvroSchema(AvroSchema.newBuilder().setSchema(sessionSchema.toString())) + .setDataFormat(DataFormat.AVRO); for (int i = 0; i < expectedStreamCount; i++) { builder.addStreams(ReadStream.newBuilder().setName("stream-" + i)); } @@ -411,6 +413,7 @@ private void doQuerySourceInitialSplit( /* location = */ null, /* queryTempDataset = */ null, /* kmsKey = */ null, + null, new TableRowParser(), TableRowJsonCoder.of(), new FakeBigQueryServices() @@ -464,8 +467,7 @@ public void testQuerySourceInitialSplit_NoReferencedTables() throws Exception { .setParent("projects/" + options.getProject()) .setReadSession( ReadSession.newBuilder() - .setTable(BigQueryHelpers.toTableResourceName(tempTableReference)) - .setDataFormat(DataFormat.AVRO)) + .setTable(BigQueryHelpers.toTableResourceName(tempTableReference))) .setMaxStreamCount(1024) .build(); @@ -486,7 +488,8 @@ public void testQuerySourceInitialSplit_NoReferencedTables() throws Exception { ReadSession.Builder builder = ReadSession.newBuilder() - .setAvroSchema(AvroSchema.newBuilder().setSchema(sessionSchema.toString())); + 
.setAvroSchema(AvroSchema.newBuilder().setSchema(sessionSchema.toString())) + .setDataFormat(DataFormat.AVRO); for (int i = 0; i < 1024; i++) { builder.addStreams(ReadStream.newBuilder().setName("stream-" + i)); } @@ -504,6 +507,7 @@ public void testQuerySourceInitialSplit_NoReferencedTables() throws Exception { /* location = */ null, /* queryTempDataset = */ null, /* kmsKey = */ null, + null, new TableRowParser(), TableRowJsonCoder.of(), new FakeBigQueryServices() @@ -582,6 +586,80 @@ public KV apply(SchemaAndRecord input) { } } + @Test + @ProjectOverride + public void testQuerySourceInitialSplitWithBigQueryProject_EmptyResult() throws Exception { + + TableReference sourceTableRef = + BigQueryHelpers.parseTableSpec("bigquery-project-id:dataset.table"); + + fakeDatasetService.createDataset( + sourceTableRef.getProjectId(), + sourceTableRef.getDatasetId(), + "asia-northeast1", + "Fake plastic tree^H^H^H^Htables", + null); + + fakeDatasetService.createTable( + new Table().setTableReference(sourceTableRef).setLocation("asia-northeast1")); + + Table queryResultTable = new Table().setSchema(TABLE_SCHEMA).setNumBytes(0L); + + String encodedQuery = FakeBigQueryServices.encodeQueryResult(queryResultTable); + + fakeJobService.expectDryRunQuery( + options.getBigQueryProject(), + encodedQuery, + new JobStatistics() + .setQuery( + new JobStatistics2() + .setTotalBytesProcessed(1024L * 1024L) + .setReferencedTables(ImmutableList.of(sourceTableRef)))); + + String stepUuid = "testStepUuid"; + + TableReference tempTableReference = + createTempTableReference( + options.getBigQueryProject(), + BigQueryResourceNaming.createJobIdPrefix(options.getJobName(), stepUuid, JobType.QUERY), + Optional.empty()); + + CreateReadSessionRequest expectedRequest = + CreateReadSessionRequest.newBuilder() + .setParent("projects/" + options.getBigQueryProject()) + .setReadSession( + ReadSession.newBuilder() + .setTable(BigQueryHelpers.toTableResourceName(tempTableReference)) + .setDataFormat(DataFormat.AVRO)) + .setMaxStreamCount(10) + .build(); + + ReadSession emptyReadSession = ReadSession.newBuilder().build(); + StorageClient fakeStorageClient = mock(StorageClient.class); + when(fakeStorageClient.createReadSession(expectedRequest)).thenReturn(emptyReadSession); + + BigQueryStorageQuerySource querySource = + BigQueryStorageQuerySource.create( + stepUuid, + ValueProvider.StaticValueProvider.of(encodedQuery), + /* flattenResults = */ true, + /* useLegacySql = */ true, + /* priority = */ QueryPriority.BATCH, + /* location = */ null, + /* queryTempDataset = */ null, + /* kmsKey = */ null, + DataFormat.AVRO, + new TableRowParser(), + TableRowJsonCoder.of(), + new FakeBigQueryServices() + .withDatasetService(fakeDatasetService) + .withJobService(fakeJobService) + .withStorageClient(fakeStorageClient)); + + List> sources = querySource.split(1024L, options); + assertTrue(sources.isEmpty()); + } + @Test public void testQuerySourceInitialSplit_EmptyResult() throws Exception { @@ -623,8 +701,7 @@ public void testQuerySourceInitialSplit_EmptyResult() throws Exception { .setParent("projects/" + options.getProject()) .setReadSession( ReadSession.newBuilder() - .setTable(BigQueryHelpers.toTableResourceName(tempTableReference)) - .setDataFormat(DataFormat.AVRO)) + .setTable(BigQueryHelpers.toTableResourceName(tempTableReference))) .setMaxStreamCount(10) .build(); @@ -642,6 +719,7 @@ public void testQuerySourceInitialSplit_EmptyResult() throws Exception { /* location = */ null, /* queryTempDataset = */ null, /* kmsKey = */ null, + null, new 
TableRowParser(), TableRowJsonCoder.of(), new FakeBigQueryServices() @@ -665,6 +743,7 @@ public void testQuerySourceCreateReader() throws Exception { /* location = */ "asia-northeast1", /* queryTempDataset = */ null, /* kmsKey = */ null, + null, new TableRowParser(), TableRowJsonCoder.of(), fakeBigQueryServices); @@ -716,6 +795,7 @@ private void doReadFromBigQueryIO(boolean templateCompatibility) throws Exceptio .setName("readSessionName") .setAvroSchema(AvroSchema.newBuilder().setSchema(AVRO_SCHEMA_STRING)) .addStreams(ReadStream.newBuilder().setName("streamName")) + .setDataFormat(DataFormat.AVRO) .build(); ReadRowsRequest expectedReadRowsRequest = @@ -741,7 +821,7 @@ private void doReadFromBigQueryIO(boolean templateCompatibility) throws Exceptio StorageClient fakeStorageClient = mock(StorageClient.class, withSettings().serializable()); when(fakeStorageClient.createReadSession(any())).thenReturn(readSession); - when(fakeStorageClient.readRows(expectedReadRowsRequest)) + when(fakeStorageClient.readRows(expectedReadRowsRequest, "")) .thenReturn(new FakeBigQueryServerStream<>(readRowsResponses)); BigQueryIO.TypedRead> typedRead = diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOStorageReadIT.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOStorageReadIT.java index c2bcd9352b2a..d2794a656f13 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOStorageReadIT.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOStorageReadIT.java @@ -17,6 +17,7 @@ */ package org.apache.beam.sdk.io.gcp.bigquery; +import com.google.cloud.bigquery.storage.v1.DataFormat; import java.util.Map; import org.apache.beam.sdk.Pipeline; import org.apache.beam.sdk.extensions.gcp.options.GcpOptions; @@ -43,9 +44,6 @@ * records read is equal to the expected count. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigQueryIOStorageReadIT { private static final Map EXPECTED_NUM_RECORDS = @@ -73,12 +71,19 @@ public interface BigQueryIOStorageReadOptions extends TestPipelineOptions, Exper long getNumRecords(); void setNumRecords(long numRecords); + + @Description("The data format to use") + @Validation.Required + DataFormat getDataFormat(); + + void setDataFormat(DataFormat dataFormat); } - private void setUpTestEnvironment(String tableSize) { + private void setUpTestEnvironment(String tableSize, DataFormat format) { PipelineOptionsFactory.register(BigQueryIOStorageReadOptions.class); options = TestPipeline.testingPipelineOptions().as(BigQueryIOStorageReadOptions.class); options.setNumRecords(EXPECTED_NUM_RECORDS.get(tableSize)); + options.setDataFormat(format); String project = TestPipeline.testingPipelineOptions().as(GcpOptions.class).getProject(); options.setInputTable(project + ":" + DATASET_ID + "." 
+ TABLE_PREFIX + tableSize); } @@ -90,15 +95,22 @@ private void runBigQueryIOStorageReadPipeline() { "Read", BigQueryIO.read(TableRowParser.INSTANCE) .from(options.getInputTable()) - .withMethod(Method.DIRECT_READ)) + .withMethod(Method.DIRECT_READ) + .withFormat(options.getDataFormat())) .apply("Count", Count.globally()); PAssert.thatSingleton(count).isEqualTo(options.getNumRecords()); p.run().waitUntilFinish(); } @Test - public void testBigQueryStorageRead1G() throws Exception { - setUpTestEnvironment("1G"); + public void testBigQueryStorageRead1GAvro() throws Exception { + setUpTestEnvironment("1G", DataFormat.AVRO); + runBigQueryIOStorageReadPipeline(); + } + + @Test + public void testBigQueryStorageRead1GArrow() throws Exception { + setUpTestEnvironment("1G", DataFormat.ARROW); runBigQueryIOStorageReadPipeline(); } } diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOStorageReadTableRowIT.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOStorageReadTableRowIT.java index 1dcde1d8cdfd..734c3af2c4d4 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOStorageReadTableRowIT.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOStorageReadTableRowIT.java @@ -50,9 +50,6 @@ * combination with {@link TableRowParser} to generate output in {@link TableRow} form. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigQueryIOStorageReadTableRowIT { private static final String DATASET_ID = "big_query_import_export"; diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOStorageReadTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOStorageReadTest.java index 8bcd106bf215..135a536b7ba2 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOStorageReadTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOStorageReadTest.java @@ -17,6 +17,7 @@ */ package org.apache.beam.sdk.io.gcp.bigquery; +import static java.util.Arrays.asList; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasDisplayItem; import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; @@ -26,6 +27,8 @@ import static org.junit.Assert.assertNull; import static org.junit.Assert.assertTrue; import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.times; +import static org.mockito.Mockito.verify; import static org.mockito.Mockito.when; import static org.mockito.Mockito.withSettings; @@ -37,6 +40,8 @@ import com.google.api.services.bigquery.model.TableReference; import com.google.api.services.bigquery.model.TableRow; import com.google.api.services.bigquery.model.TableSchema; +import com.google.cloud.bigquery.storage.v1.ArrowRecordBatch; +import com.google.cloud.bigquery.storage.v1.ArrowSchema; import com.google.cloud.bigquery.storage.v1.AvroRows; import com.google.cloud.bigquery.storage.v1.AvroSchema; import com.google.cloud.bigquery.storage.v1.CreateReadSessionRequest; @@ -54,10 +59,23 @@ import io.grpc.Status.Code; import io.grpc.StatusRuntimeException; import java.io.ByteArrayOutputStream; +import java.io.IOException; import 
java.math.BigInteger; +import java.nio.channels.Channels; import java.util.ArrayList; +import java.util.Arrays; import java.util.Collection; import java.util.List; +import org.apache.arrow.memory.BufferAllocator; +import org.apache.arrow.memory.RootAllocator; +import org.apache.arrow.vector.BigIntVector; +import org.apache.arrow.vector.VarCharVector; +import org.apache.arrow.vector.VectorSchemaRoot; +import org.apache.arrow.vector.VectorUnloader; +import org.apache.arrow.vector.ipc.WriteChannel; +import org.apache.arrow.vector.ipc.message.MessageSerializer; +import org.apache.arrow.vector.types.pojo.ArrowType; +import org.apache.arrow.vector.util.Text; import org.apache.avro.Schema; import org.apache.avro.generic.GenericData.Record; import org.apache.avro.generic.GenericDatumWriter; @@ -88,6 +106,7 @@ import org.apache.beam.sdk.values.PCollection; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; +import org.junit.After; import org.junit.Before; import org.junit.Rule; import org.junit.Test; @@ -98,17 +117,16 @@ import org.junit.runner.RunWith; import org.junit.runners.JUnit4; import org.junit.runners.model.Statement; +import org.mockito.ArgumentMatchers; /** Tests for {@link BigQueryIO#readTableRows() using {@link Method#DIRECT_READ}}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigQueryIOStorageReadTest { private transient PipelineOptions options; - private transient TemporaryFolder testFolder = new TemporaryFolder(); + private final transient TemporaryFolder testFolder = new TemporaryFolder(); private transient TestPipeline p; + private BufferAllocator allocator; @Rule public final transient TestRule folderThenPipeline = @@ -125,6 +143,10 @@ public Statement apply(Statement base, Description description) { public void evaluate() throws Throwable { options = TestPipeline.testingPipelineOptions(); options.as(BigQueryOptions.class).setProject("project-id"); + if (description.getAnnotations().stream() + .anyMatch(a -> a.annotationType().equals(ProjectOverride.class))) { + options.as(BigQueryOptions.class).setBigQueryProject("bigquery-project-id"); + } options .as(BigQueryOptions.class) .setTempLocation(testFolder.getRoot().getAbsolutePath()); @@ -138,11 +160,17 @@ public void evaluate() throws Throwable { @Rule public transient ExpectedException thrown = ExpectedException.none(); - private FakeDatasetService fakeDatasetService = new FakeDatasetService(); + private final FakeDatasetService fakeDatasetService = new FakeDatasetService(); @Before public void setUp() throws Exception { FakeDatasetService.setUp(); + allocator = new RootAllocator(Long.MAX_VALUE); + } + + @After + public void teardown() { + allocator.close(); } @Test @@ -305,6 +333,26 @@ private void doTableSourceEstimatedSizeTest(boolean useStreamingBuffer) throws E assertEquals(100, tableSource.getEstimatedSizeBytes(options)); } + @Test + @ProjectOverride + public void testTableSourceEstimatedSize_WithBigQueryProject() throws Exception { + fakeDatasetService.createDataset("bigquery-project-id", "dataset", "", "", null); + TableReference tableRef = BigQueryHelpers.parseTableSpec("bigquery-project-id:dataset.table"); + Table table = new Table().setTableReference(tableRef).setNumBytes(100L); + fakeDatasetService.createTable(table); + + BigQueryStorageTableSource tableSource = + BigQueryStorageTableSource.create( + 
ValueProvider.StaticValueProvider.of(BigQueryHelpers.parseTableSpec("dataset.table")), + null, + null, + new TableRowParser(), + TableRowJsonCoder.of(), + new FakeBigQueryServices().withDatasetService(fakeDatasetService)); + + assertEquals(100, tableSource.getEstimatedSizeBytes(options)); + } + @Test public void testTableSourceEstimatedSize_WithDefaultProject() throws Exception { fakeDatasetService.createDataset("project-id", "dataset", "", "", null); @@ -370,6 +418,11 @@ public void testTableSourceInitialSplit_MaxSplitCount() throws Exception { new TableFieldSchema().setName("name").setType("STRING").setMode("REQUIRED"), new TableFieldSchema().setName("number").setType("INTEGER").setMode("REQUIRED"))); + private static final org.apache.arrow.vector.types.pojo.Schema ARROW_SCHEMA = + new org.apache.arrow.vector.types.pojo.Schema( + asList( + field("name", new ArrowType.Utf8()), field("number", new ArrowType.Int(64, true)))); + private void doTableSourceInitialSplitTest(long bundleSize, int streamCount) throws Exception { fakeDatasetService.createDataset("foo.com:project", "dataset", "", "", null); TableReference tableRef = BigQueryHelpers.parseTableSpec("foo.com:project:dataset.table"); @@ -384,14 +437,14 @@ private void doTableSourceInitialSplitTest(long bundleSize, int streamCount) thr .setParent("projects/project-id") .setReadSession( ReadSession.newBuilder() - .setTable("projects/foo.com:project/datasets/dataset/tables/table") - .setDataFormat(DataFormat.AVRO)) + .setTable("projects/foo.com:project/datasets/dataset/tables/table")) .setMaxStreamCount(streamCount) .build(); ReadSession.Builder builder = ReadSession.newBuilder() - .setAvroSchema(AvroSchema.newBuilder().setSchema(AVRO_SCHEMA_STRING)); + .setAvroSchema(AvroSchema.newBuilder().setSchema(AVRO_SCHEMA_STRING)) + .setDataFormat(DataFormat.AVRO); for (int i = 0; i < streamCount; i++) { builder.addStreams(ReadStream.newBuilder().setName("stream-" + i)); } @@ -429,7 +482,6 @@ public void testTableSourceInitialSplit_WithSelectedFieldsAndRowRestriction() th .setReadSession( ReadSession.newBuilder() .setTable("projects/foo.com:project/datasets/dataset/tables/table") - .setDataFormat(DataFormat.AVRO) .setReadOptions( ReadSession.TableReadOptions.newBuilder() .addSelectedFields("name") @@ -439,7 +491,8 @@ public void testTableSourceInitialSplit_WithSelectedFieldsAndRowRestriction() th ReadSession.Builder builder = ReadSession.newBuilder() - .setAvroSchema(AvroSchema.newBuilder().setSchema(TRIMMED_AVRO_SCHEMA_STRING)); + .setAvroSchema(AvroSchema.newBuilder().setSchema(TRIMMED_AVRO_SCHEMA_STRING)) + .setDataFormat(DataFormat.AVRO); for (int i = 0; i < 10; i++) { builder.addStreams(ReadStream.newBuilder().setName("stream-" + i)); } @@ -477,14 +530,14 @@ public void testTableSourceInitialSplit_WithDefaultProject() throws Exception { .setParent("projects/project-id") .setReadSession( ReadSession.newBuilder() - .setTable("projects/project-id/datasets/dataset/tables/table") - .setDataFormat(DataFormat.AVRO)) + .setTable("projects/project-id/datasets/dataset/tables/table")) .setMaxStreamCount(1024) .build(); ReadSession.Builder builder = ReadSession.newBuilder() - .setAvroSchema(AvroSchema.newBuilder().setSchema(AVRO_SCHEMA_STRING)); + .setAvroSchema(AvroSchema.newBuilder().setSchema(AVRO_SCHEMA_STRING)) + .setDataFormat(DataFormat.AVRO); for (int i = 0; i < 50; i++) { builder.addStreams(ReadStream.newBuilder().setName("stream-" + i)); } @@ -525,8 +578,7 @@ public void testTableSourceInitialSplit_EmptyTable() throws Exception { 
.setParent("projects/project-id") .setReadSession( ReadSession.newBuilder() - .setTable("projects/foo.com:project/datasets/dataset/tables/table") - .setDataFormat(DataFormat.AVRO)) + .setTable("projects/foo.com:project/datasets/dataset/tables/table")) .setMaxStreamCount(1024) .build(); @@ -579,6 +631,18 @@ private static GenericRecord createRecord(String name, long number, Schema schem return genericRecord; } + private static ByteString serializeArrowSchema( + org.apache.arrow.vector.types.pojo.Schema arrowSchema) { + ByteArrayOutputStream byteOutputStream = new ByteArrayOutputStream(); + try { + MessageSerializer.serialize( + new WriteChannel(Channels.newChannel(byteOutputStream)), arrowSchema); + } catch (IOException ex) { + throw new RuntimeException("Failed to serialize arrow schema.", ex); + } + return ByteString.copyFrom(byteOutputStream.toByteArray()); + } + private static final EncoderFactory ENCODER_FACTORY = EncoderFactory.get(); private static ReadRowsResponse createResponse( @@ -611,6 +675,51 @@ private static ReadRowsResponse createResponse( .build(); } + private ReadRowsResponse createResponseArrow( + org.apache.arrow.vector.types.pojo.Schema arrowSchema, + List name, + List number, + double progressAtResponseStart, + double progressAtResponseEnd) { + ArrowRecordBatch serializedRecord; + try (VectorSchemaRoot schemaRoot = VectorSchemaRoot.create(arrowSchema, allocator)) { + schemaRoot.allocateNew(); + schemaRoot.setRowCount(name.size()); + VarCharVector strVector = (VarCharVector) schemaRoot.getFieldVectors().get(0); + BigIntVector bigIntVector = (BigIntVector) schemaRoot.getFieldVectors().get(1); + for (int i = 0; i < name.size(); i++) { + bigIntVector.set(i, number.get(i)); + strVector.set(i, new Text(name.get(i))); + } + + VectorUnloader unLoader = new VectorUnloader(schemaRoot); + try (org.apache.arrow.vector.ipc.message.ArrowRecordBatch records = + unLoader.getRecordBatch()) { + try (ByteArrayOutputStream os = new ByteArrayOutputStream()) { + MessageSerializer.serialize(new WriteChannel(Channels.newChannel(os)), records); + serializedRecord = + ArrowRecordBatch.newBuilder() + .setRowCount(records.getLength()) + .setSerializedRecordBatch(ByteString.copyFrom(os.toByteArray())) + .build(); + } catch (IOException e) { + throw new RuntimeException("Error writing to byte array output stream", e); + } + } + } + + return ReadRowsResponse.newBuilder() + .setArrowRecordBatch(serializedRecord) + .setRowCount(name.size()) + .setStats( + StreamStats.newBuilder() + .setProgress( + Progress.newBuilder() + .setAtResponseStart(progressAtResponseStart) + .setAtResponseEnd(progressAtResponseEnd))) + .build(); + } + @Test public void testStreamSourceEstimatedSizeBytes() throws Exception { @@ -665,7 +774,7 @@ public void testReadFromStreamSource() throws Exception { createResponse(AVRO_SCHEMA, records.subList(2, 3), 0.5, 0.75)); StorageClient fakeStorageClient = mock(StorageClient.class); - when(fakeStorageClient.readRows(expectedRequest)) + when(fakeStorageClient.readRows(expectedRequest, "")) .thenReturn(new FakeBigQueryServerStream<>(responses)); BigQueryStorageStreamSource streamSource = @@ -721,7 +830,7 @@ public void testFractionConsumed() throws Exception { createResponse(AVRO_SCHEMA, records.subList(4, 7), 0.7, 1.0)); StorageClient fakeStorageClient = mock(StorageClient.class); - when(fakeStorageClient.readRows(expectedRequest)) + when(fakeStorageClient.readRows(expectedRequest, "")) .thenReturn(new FakeBigQueryServerStream<>(responses)); BigQueryStorageStreamSource streamSource = @@ 
-790,7 +899,7 @@ public void testFractionConsumedWithSplit() throws Exception { createResponse(AVRO_SCHEMA, records.subList(4, 7), 0.800, 0.875)); StorageClient fakeStorageClient = mock(StorageClient.class); - when(fakeStorageClient.readRows(expectedRequest)) + when(fakeStorageClient.readRows(expectedRequest, "")) .thenReturn(new FakeBigQueryServerStream<>(parentResponses)); when(fakeStorageClient.splitReadStream( @@ -807,7 +916,7 @@ public void testFractionConsumedWithSplit() throws Exception { createResponse(AVRO_SCHEMA, records.subList(3, 4), 0.8, 1.0)); when(fakeStorageClient.readRows( - ReadRowsRequest.newBuilder().setReadStream("primaryStream").setOffset(1).build())) + ReadRowsRequest.newBuilder().setReadStream("primaryStream").setOffset(1).build(), "")) .thenReturn(new FakeBigQueryServerStream<>(primaryResponses)); BigQueryStorageStreamSource streamSource = @@ -865,7 +974,7 @@ public void testStreamSourceSplitAtFractionSucceeds() throws Exception { StorageClient fakeStorageClient = mock(StorageClient.class); when(fakeStorageClient.readRows( - ReadRowsRequest.newBuilder().setReadStream("parentStream").build())) + ReadRowsRequest.newBuilder().setReadStream("parentStream").build(), "")) .thenReturn(new FakeBigQueryServerStream<>(parentResponses)); // Mocks the split call. @@ -884,11 +993,12 @@ public void testStreamSourceSplitAtFractionSucceeds() throws Exception { // This test will read rows 0 and 1 from the parent before calling split, // so we expect the primary read to start at offset 2. .setOffset(2) - .build())) + .build(), + "")) .thenReturn(new FakeBigQueryServerStream<>(parentResponses.subList(1, 2))); when(fakeStorageClient.readRows( - ReadRowsRequest.newBuilder().setReadStream("remainderStream").build())) + ReadRowsRequest.newBuilder().setReadStream("remainderStream").build(), "")) .thenReturn( new FakeBigQueryServerStream<>(parentResponses.subList(2, parentResponses.size()))); @@ -942,7 +1052,7 @@ public void testStreamSourceSplitAtFractionRepeated() throws Exception { // Mock the initial ReadRows call. when(fakeStorageClient.readRows( - ReadRowsRequest.newBuilder().setReadStream(readStreams.get(0).getName()).build())) + ReadRowsRequest.newBuilder().setReadStream(readStreams.get(0).getName()).build(), "")) .thenReturn( new FakeBigQueryServerStream<>( Lists.newArrayList( @@ -982,7 +1092,8 @@ public void testStreamSourceSplitAtFractionRepeated() throws Exception { ReadRowsRequest.newBuilder() .setReadStream(readStreams.get(1).getName()) .setOffset(1) - .build())) + .build(), + "")) .thenReturn( new FakeBigQueryServerStream<>( Lists.newArrayList( @@ -1016,7 +1127,8 @@ public void testStreamSourceSplitAtFractionRepeated() throws Exception { ReadRowsRequest.newBuilder() .setReadStream(readStreams.get(2).getName()) .setOffset(2) - .build())) + .build(), + "")) .thenReturn( new FakeBigQueryServerStream<>( Lists.newArrayList( @@ -1082,7 +1194,7 @@ public void testStreamSourceSplitAtFractionFailsWhenSplitIsNotPossible() throws StorageClient fakeStorageClient = mock(StorageClient.class); when(fakeStorageClient.readRows( - ReadRowsRequest.newBuilder().setReadStream("parentStream").build())) + ReadRowsRequest.newBuilder().setReadStream("parentStream").build(), "")) .thenReturn(new FakeBigQueryServerStream<>(parentResponses)); // Mocks the split call. 
A response without a primary_stream and remainder_stream means @@ -1112,6 +1224,12 @@ public void testStreamSourceSplitAtFractionFailsWhenSplitIsNotPossible() throws assertEquals("B", parent.getCurrent().get("name")); assertNull(parent.splitAtFraction(0.5)); + verify(fakeStorageClient, times(1)).splitReadStream(ArgumentMatchers.any()); + + // Verify that subsequent splitAtFraction() calls after a failed splitAtFraction() attempt + // do NOT invoke SplitReadStream. + assertNull(parent.splitAtFraction(0.5)); + verify(fakeStorageClient, times(1)).splitReadStream(ArgumentMatchers.any()); // Verify that the parent source still works okay even after an unsuccessful split attempt. assertTrue(parent.advance()); @@ -1144,7 +1262,7 @@ public void testStreamSourceSplitAtFractionFailsWhenParentIsPastSplitPoint() thr StorageClient fakeStorageClient = mock(StorageClient.class); when(fakeStorageClient.readRows( - ReadRowsRequest.newBuilder().setReadStream("parentStream").build())) + ReadRowsRequest.newBuilder().setReadStream("parentStream").build(), "")) .thenReturn(new FakeBigQueryServerStream<>(parentResponses)); // Mocks the split call. A response without a primary_stream and remainder_stream means @@ -1165,7 +1283,8 @@ public void testStreamSourceSplitAtFractionFailsWhenParentIsPastSplitPoint() thr // This test will read rows 0 and 1 from the parent before calling split, // so we expect the primary read to start at offset 2. .setOffset(2) - .build())) + .build(), + "")) .thenThrow( new FailedPreconditionException( "Given row offset is invalid for stream.", @@ -1237,6 +1356,7 @@ public void testReadFromBigQueryIO() throws Exception { .setName("readSessionName") .setAvroSchema(AvroSchema.newBuilder().setSchema(AVRO_SCHEMA_STRING)) .addStreams(ReadStream.newBuilder().setName("streamName")) + .setDataFormat(DataFormat.AVRO) .build(); ReadRowsRequest expectedReadRowsRequest = @@ -1257,7 +1377,7 @@ public void testReadFromBigQueryIO() throws Exception { StorageClient fakeStorageClient = mock(StorageClient.class, withSettings().serializable()); when(fakeStorageClient.createReadSession(expectedCreateReadSessionRequest)) .thenReturn(readSession); - when(fakeStorageClient.readRows(expectedReadRowsRequest)) + when(fakeStorageClient.readRows(expectedReadRowsRequest, "")) .thenReturn(new FakeBigQueryServerStream<>(readRowsResponses)); PCollection> output = @@ -1265,6 +1385,7 @@ public void testReadFromBigQueryIO() throws Exception { BigQueryIO.read(new ParseKeyValue()) .from("foo.com:project:dataset.table") .withMethod(Method.DIRECT_READ) + .withFormat(DataFormat.AVRO) .withTestServices( new FakeBigQueryServices() .withDatasetService(fakeDatasetService) @@ -1290,9 +1411,9 @@ public void testReadFromBigQueryIOWithTrimmedSchema() throws Exception { .setReadSession( ReadSession.newBuilder() .setTable("projects/foo.com:project/datasets/dataset/tables/table") - .setDataFormat(DataFormat.AVRO) .setReadOptions( - ReadSession.TableReadOptions.newBuilder().addSelectedFields("name"))) + ReadSession.TableReadOptions.newBuilder().addSelectedFields("name")) + .setDataFormat(DataFormat.AVRO)) .setMaxStreamCount(10) .build(); @@ -1301,6 +1422,7 @@ public void testReadFromBigQueryIOWithTrimmedSchema() throws Exception { .setName("readSessionName") .setAvroSchema(AvroSchema.newBuilder().setSchema(TRIMMED_AVRO_SCHEMA_STRING)) .addStreams(ReadStream.newBuilder().setName("streamName")) + .setDataFormat(DataFormat.AVRO) .build(); ReadRowsRequest expectedReadRowsRequest = @@ -1321,7 +1443,7 @@ public void 
testReadFromBigQueryIOWithTrimmedSchema() throws Exception { StorageClient fakeStorageClient = mock(StorageClient.class, withSettings().serializable()); when(fakeStorageClient.createReadSession(expectedCreateReadSessionRequest)) .thenReturn(readSession); - when(fakeStorageClient.readRows(expectedReadRowsRequest)) + when(fakeStorageClient.readRows(expectedReadRowsRequest, "")) .thenReturn(new FakeBigQueryServerStream<>(readRowsResponses)); PCollection output = @@ -1330,6 +1452,7 @@ public void testReadFromBigQueryIOWithTrimmedSchema() throws Exception { .from("foo.com:project:dataset.table") .withMethod(Method.DIRECT_READ) .withSelectedFields(Lists.newArrayList("name")) + .withFormat(DataFormat.AVRO) .withTestServices( new FakeBigQueryServices() .withDatasetService(fakeDatasetService) @@ -1345,4 +1468,618 @@ public void testReadFromBigQueryIOWithTrimmedSchema() throws Exception { p.run(); } + + @Test + public void testReadFromBigQueryIOArrow() throws Exception { + fakeDatasetService.createDataset("foo.com:project", "dataset", "", "", null); + TableReference tableRef = BigQueryHelpers.parseTableSpec("foo.com:project:dataset.table"); + Table table = new Table().setTableReference(tableRef).setNumBytes(10L).setSchema(TABLE_SCHEMA); + fakeDatasetService.createTable(table); + + CreateReadSessionRequest expectedCreateReadSessionRequest = + CreateReadSessionRequest.newBuilder() + .setParent("projects/project-id") + .setReadSession( + ReadSession.newBuilder() + .setTable("projects/foo.com:project/datasets/dataset/tables/table") + .setDataFormat(DataFormat.ARROW)) + .setMaxStreamCount(10) + .build(); + + ReadSession readSession = + ReadSession.newBuilder() + .setName("readSessionName") + .setArrowSchema( + ArrowSchema.newBuilder() + .setSerializedSchema(serializeArrowSchema(ARROW_SCHEMA)) + .build()) + .addStreams(ReadStream.newBuilder().setName("streamName")) + .setDataFormat(DataFormat.ARROW) + .build(); + + ReadRowsRequest expectedReadRowsRequest = + ReadRowsRequest.newBuilder().setReadStream("streamName").build(); + + List names = Arrays.asList("A", "B", "C", "D"); + List values = Arrays.asList(1L, 2L, 3L, 4L); + List readRowsResponses = + Lists.newArrayList( + createResponseArrow(ARROW_SCHEMA, names.subList(0, 2), values.subList(0, 2), 0.0, 0.50), + createResponseArrow( + ARROW_SCHEMA, names.subList(2, 4), values.subList(2, 4), 0.5, 0.75)); + + StorageClient fakeStorageClient = mock(StorageClient.class, withSettings().serializable()); + when(fakeStorageClient.createReadSession(expectedCreateReadSessionRequest)) + .thenReturn(readSession); + when(fakeStorageClient.readRows(expectedReadRowsRequest, "")) + .thenReturn(new FakeBigQueryServerStream<>(readRowsResponses)); + + PCollection> output = + p.apply( + BigQueryIO.read(new ParseKeyValue()) + .from("foo.com:project:dataset.table") + .withMethod(Method.DIRECT_READ) + .withFormat(DataFormat.ARROW) + .withTestServices( + new FakeBigQueryServices() + .withDatasetService(fakeDatasetService) + .withStorageClient(fakeStorageClient))); + + PAssert.that(output) + .containsInAnyOrder( + ImmutableList.of(KV.of("A", 1L), KV.of("B", 2L), KV.of("C", 3L), KV.of("D", 4L))); + + p.run(); + } + + @Test + public void testReadFromStreamSourceArrow() throws Exception { + + ReadSession readSession = + ReadSession.newBuilder() + .setName("readSession") + .setArrowSchema( + ArrowSchema.newBuilder() + .setSerializedSchema(serializeArrowSchema(ARROW_SCHEMA)) + .build()) + .setDataFormat(DataFormat.ARROW) + .build(); + + ReadRowsRequest expectedRequest = + 
ReadRowsRequest.newBuilder().setReadStream("readStream").build(); + + List names = Arrays.asList("A", "B", "C"); + List values = Arrays.asList(1L, 2L, 3L); + List responses = + Lists.newArrayList( + createResponseArrow(ARROW_SCHEMA, names.subList(0, 2), values.subList(0, 2), 0.0, 0.50), + createResponseArrow( + ARROW_SCHEMA, names.subList(2, 3), values.subList(2, 3), 0.5, 0.75)); + + StorageClient fakeStorageClient = mock(StorageClient.class); + when(fakeStorageClient.readRows(expectedRequest, "")) + .thenReturn(new FakeBigQueryServerStream<>(responses)); + + BigQueryStorageStreamSource streamSource = + BigQueryStorageStreamSource.create( + readSession, + ReadStream.newBuilder().setName("readStream").build(), + TABLE_SCHEMA, + new TableRowParser(), + TableRowJsonCoder.of(), + new FakeBigQueryServices().withStorageClient(fakeStorageClient)); + + List rows = new ArrayList<>(); + BoundedReader reader = streamSource.createReader(options); + for (boolean hasNext = reader.start(); hasNext; hasNext = reader.advance()) { + rows.add(reader.getCurrent()); + } + + System.out.println("Rows: " + rows); + + assertEquals(3, rows.size()); + } + + @Test + public void testFractionConsumedArrow() throws Exception { + ReadSession readSession = + ReadSession.newBuilder() + .setName("readSession") + .setArrowSchema( + ArrowSchema.newBuilder() + .setSerializedSchema(serializeArrowSchema(ARROW_SCHEMA)) + .build()) + .setDataFormat(DataFormat.ARROW) + .build(); + + ReadRowsRequest expectedRequest = + ReadRowsRequest.newBuilder().setReadStream("readStream").build(); + + List names = Arrays.asList("A", "B", "C", "D", "E", "F", "G"); + List values = Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L, 7L); + List responses = + Lists.newArrayList( + createResponseArrow(ARROW_SCHEMA, names.subList(0, 2), values.subList(0, 2), 0.0, 0.25), + createResponseArrow( + ARROW_SCHEMA, Lists.newArrayList(), Lists.newArrayList(), 0.25, 0.25), + createResponseArrow(ARROW_SCHEMA, names.subList(2, 4), values.subList(2, 4), 0.3, 0.5), + createResponseArrow(ARROW_SCHEMA, names.subList(4, 7), values.subList(4, 7), 0.7, 1.0)); + + StorageClient fakeStorageClient = mock(StorageClient.class); + when(fakeStorageClient.readRows(expectedRequest, "")) + .thenReturn(new FakeBigQueryServerStream<>(responses)); + + BigQueryStorageStreamSource streamSource = + BigQueryStorageStreamSource.create( + readSession, + ReadStream.newBuilder().setName("readStream").build(), + TABLE_SCHEMA, + new TableRowParser(), + TableRowJsonCoder.of(), + new FakeBigQueryServices().withStorageClient(fakeStorageClient)); + + List rows = new ArrayList<>(); + BoundedReader reader = streamSource.createReader(options); + + // Before call to BoundedReader#start, fraction consumed must be zero. + assertEquals(0.0, reader.getFractionConsumed(), DELTA); + + assertTrue(reader.start()); // Reads A. + assertEquals(0.125, reader.getFractionConsumed(), DELTA); + assertTrue(reader.advance()); // Reads B. + assertEquals(0.25, reader.getFractionConsumed(), DELTA); + + assertTrue(reader.advance()); // Reads C. + assertEquals(0.4, reader.getFractionConsumed(), DELTA); + assertTrue(reader.advance()); // Reads D. + assertEquals(0.5, reader.getFractionConsumed(), DELTA); + + assertTrue(reader.advance()); // Reads E. + assertEquals(0.8, reader.getFractionConsumed(), DELTA); + assertTrue(reader.advance()); // Reads F. + assertEquals(0.9, reader.getFractionConsumed(), DELTA); + assertTrue(reader.advance()); // Reads G. 
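The expected values in the fraction-consumed assertions above follow from interpolating linearly within each response's reported progress range by the number of rows already read from that response. A minimal standalone sketch of that arithmetic (class and method names here are illustrative only, not part of the Beam API or of this patch):

```java
public class FractionInterpolationSketch {
  // Progress within one ReadRowsResponse, assuming linear interpolation between the
  // response's at_response_start and at_response_end by rows consumed so far.
  static double fractionAfterRows(double atStart, double atEnd, int rowsRead, int rowsInResponse) {
    return atStart + (atEnd - atStart) * ((double) rowsRead / rowsInResponse);
  }

  public static void main(String[] args) {
    // First response spans [0.0, 0.25] and carries rows A and B.
    System.out.println(fractionAfterRows(0.0, 0.25, 1, 2)); // 0.125 after A
    System.out.println(fractionAfterRows(0.0, 0.25, 2, 2)); // 0.25  after B
    // Third response spans [0.3, 0.5] and carries rows C and D.
    System.out.println(fractionAfterRows(0.3, 0.5, 1, 2));  // 0.4   after C
    // Final response spans [0.7, 1.0] and carries rows E, F, and G.
    System.out.println(fractionAfterRows(0.7, 1.0, 1, 3));  // 0.8   after E
    System.out.println(fractionAfterRows(0.7, 1.0, 3, 3));  // 1.0   after G
  }
}
```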
+ assertEquals(1.0, reader.getFractionConsumed(), DELTA); + + assertFalse(reader.advance()); // Reaches the end. + + // We are done with the stream, so we should report 100% consumption. + assertEquals(Double.valueOf(1.0), reader.getFractionConsumed()); + } + + @Test + public void testFractionConsumedWithSplitArrow() throws Exception { + ReadSession readSession = + ReadSession.newBuilder() + .setName("readSession") + .setArrowSchema( + ArrowSchema.newBuilder() + .setSerializedSchema(serializeArrowSchema(ARROW_SCHEMA)) + .build()) + .setDataFormat(DataFormat.ARROW) + .build(); + + ReadRowsRequest expectedRequest = + ReadRowsRequest.newBuilder().setReadStream("parentStream").build(); + + List names = Arrays.asList("A", "B", "C", "D", "E", "F", "G"); + List values = Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L, 7L); + List parentResponse = + Lists.newArrayList( + createResponseArrow(ARROW_SCHEMA, names.subList(0, 2), values.subList(0, 2), 0.0, 0.25), + createResponseArrow(ARROW_SCHEMA, names.subList(2, 4), values.subList(2, 4), 0.3, 0.5), + createResponseArrow( + ARROW_SCHEMA, names.subList(4, 7), values.subList(4, 7), 0.7, 0.875)); + + StorageClient fakeStorageClient = mock(StorageClient.class); + when(fakeStorageClient.readRows(expectedRequest, "")) + .thenReturn(new FakeBigQueryServerStream<>(parentResponse)); + + when(fakeStorageClient.splitReadStream( + SplitReadStreamRequest.newBuilder().setName("parentStream").setFraction(0.5f).build())) + .thenReturn( + SplitReadStreamResponse.newBuilder() + .setPrimaryStream(ReadStream.newBuilder().setName("primaryStream")) + .setRemainderStream(ReadStream.newBuilder().setName("remainderStream")) + .build()); + List primaryResponses = + Lists.newArrayList( + createResponseArrow( + ARROW_SCHEMA, names.subList(1, 3), values.subList(1, 3), 0.25, 0.75), + createResponseArrow(ARROW_SCHEMA, names.subList(3, 4), values.subList(3, 4), 0.8, 1.0)); + + when(fakeStorageClient.readRows( + ReadRowsRequest.newBuilder().setReadStream("primaryStream").setOffset(1).build(), "")) + .thenReturn(new FakeBigQueryServerStream<>(primaryResponses)); + + BigQueryStorageStreamSource streamSource = + BigQueryStorageStreamSource.create( + readSession, + ReadStream.newBuilder().setName("parentStream").build(), + TABLE_SCHEMA, + new TableRowParser(), + TableRowJsonCoder.of(), + new FakeBigQueryServices().withStorageClient(fakeStorageClient)); + + BoundedReader reader = streamSource.createReader(options); + + // Before call to BoundedReader#start, fraction consumed must be zero. + assertEquals(0.0, reader.getFractionConsumed(), DELTA); + + assertTrue(reader.start()); // Reads A. + assertEquals(0.125, reader.getFractionConsumed(), DELTA); + + reader.splitAtFraction(0.5); + assertEquals(0.125, reader.getFractionConsumed(), DELTA); + + assertTrue(reader.advance()); // Reads B. + assertEquals(0.5, reader.getFractionConsumed(), DELTA); + + assertTrue(reader.advance()); // Reads C. + assertEquals(0.75, reader.getFractionConsumed(), DELTA); + + assertTrue(reader.advance()); // Reads D. 
+ assertEquals(1.0, reader.getFractionConsumed(), DELTA); + + assertFalse(reader.advance()); + assertEquals(1.0, reader.getFractionConsumed(), DELTA); + } + + @Test + public void testStreamSourceSplitAtFractionSucceedsArrow() throws Exception { + List names = Arrays.asList("A", "B", "C", "D", "E", "F", "G"); + List values = Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L, 7L); + List parentResponses = + Lists.newArrayList( + createResponseArrow(ARROW_SCHEMA, names.subList(0, 2), values.subList(0, 2), 0.0, 0.25), + createResponseArrow(ARROW_SCHEMA, names.subList(2, 3), values.subList(2, 3), 0.25, 0.5), + createResponseArrow( + ARROW_SCHEMA, names.subList(3, 5), values.subList(3, 5), 0.5, 0.75)); + + StorageClient fakeStorageClient = mock(StorageClient.class); + when(fakeStorageClient.readRows( + ReadRowsRequest.newBuilder().setReadStream("parentStream").build(), "")) + .thenReturn(new FakeBigQueryServerStream<>(parentResponses)); + + // Mocks the split call. + when(fakeStorageClient.splitReadStream( + SplitReadStreamRequest.newBuilder().setName("parentStream").setFraction(0.5f).build())) + .thenReturn( + SplitReadStreamResponse.newBuilder() + .setPrimaryStream(ReadStream.newBuilder().setName("primaryStream")) + .setRemainderStream(ReadStream.newBuilder().setName("remainderStream")) + .build()); + + // Mocks the ReadRows calls expected on the primary and residual streams. + when(fakeStorageClient.readRows( + ReadRowsRequest.newBuilder() + .setReadStream("primaryStream") + // This test will read rows 0 and 1 from the parent before calling split, + // so we expect the primary read to start at offset 2. + .setOffset(2) + .build(), + "")) + .thenReturn(new FakeBigQueryServerStream<>(parentResponses.subList(1, 2))); + + when(fakeStorageClient.readRows( + ReadRowsRequest.newBuilder().setReadStream("remainderStream").build(), "")) + .thenReturn( + new FakeBigQueryServerStream<>(parentResponses.subList(2, parentResponses.size()))); + + BigQueryStorageStreamSource streamSource = + BigQueryStorageStreamSource.create( + ReadSession.newBuilder() + .setName("readSession") + .setArrowSchema( + ArrowSchema.newBuilder() + .setSerializedSchema(serializeArrowSchema(ARROW_SCHEMA)) + .build()) + .setDataFormat(DataFormat.ARROW) + .build(), + ReadStream.newBuilder().setName("parentStream").build(), + TABLE_SCHEMA, + new TableRowParser(), + TableRowJsonCoder.of(), + new FakeBigQueryServices().withStorageClient(fakeStorageClient)); + + // Read a few records from the parent stream and ensure that records are returned in the + // prescribed order. + BoundedReader parent = streamSource.createReader(options); + assertTrue(parent.start()); + assertEquals("A", parent.getCurrent().get("name")); + assertTrue(parent.advance()); + assertEquals("B", parent.getCurrent().get("name")); + + // Now split the stream, and ensure that the "parent" reader has been replaced with the + // primary stream and that the returned source points to the residual stream. 
+ BoundedReader primary = parent; + BoundedSource residualSource = parent.splitAtFraction(0.5); + assertNotNull(residualSource); + BoundedReader residual = residualSource.createReader(options); + + assertTrue(primary.advance()); + assertEquals("C", primary.getCurrent().get("name")); + assertFalse(primary.advance()); + + assertTrue(residual.start()); + assertEquals("D", residual.getCurrent().get("name")); + assertTrue(residual.advance()); + assertEquals("E", residual.getCurrent().get("name")); + assertFalse(residual.advance()); + } + + @Test + public void testStreamSourceSplitAtFractionRepeatedArrow() throws Exception { + List readStreams = + Lists.newArrayList( + ReadStream.newBuilder().setName("stream1").build(), + ReadStream.newBuilder().setName("stream2").build(), + ReadStream.newBuilder().setName("stream3").build()); + + StorageClient fakeStorageClient = mock(StorageClient.class); + + List names = Arrays.asList("A", "B", "C", "D", "E", "F"); + List values = Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L); + List parentResponses = + Lists.newArrayList( + createResponseArrow(ARROW_SCHEMA, names.subList(0, 2), values.subList(0, 2), 0.0, 0.25), + createResponseArrow(ARROW_SCHEMA, names.subList(2, 4), values.subList(2, 4), 0.25, 0.5), + createResponseArrow( + ARROW_SCHEMA, names.subList(4, 6), values.subList(4, 6), 0.5, 0.75)); + + // Mock the initial ReadRows call. + when(fakeStorageClient.readRows( + ReadRowsRequest.newBuilder().setReadStream(readStreams.get(0).getName()).build(), "")) + .thenReturn(new FakeBigQueryServerStream<>(parentResponses)); + + // Mock the first SplitReadStream call. + when(fakeStorageClient.splitReadStream( + SplitReadStreamRequest.newBuilder() + .setName(readStreams.get(0).getName()) + .setFraction(0.83f) + .build())) + .thenReturn( + SplitReadStreamResponse.newBuilder() + .setPrimaryStream(readStreams.get(1)) + .setRemainderStream(ReadStream.newBuilder().setName("ignored")) + .build()); + + List otherResponses = + Lists.newArrayList( + createResponseArrow(ARROW_SCHEMA, names.subList(1, 3), values.subList(1, 3), 0.0, 0.50), + createResponseArrow( + ARROW_SCHEMA, names.subList(3, 4), values.subList(3, 4), 0.5, 0.75)); + + // Mock the second ReadRows call. + when(fakeStorageClient.readRows( + ReadRowsRequest.newBuilder() + .setReadStream(readStreams.get(1).getName()) + .setOffset(1) + .build(), + "")) + .thenReturn(new FakeBigQueryServerStream<>(otherResponses)); + + // Mock the second SplitReadStream call. + when(fakeStorageClient.splitReadStream( + SplitReadStreamRequest.newBuilder() + .setName(readStreams.get(1).getName()) + .setFraction(0.75f) + .build())) + .thenReturn( + SplitReadStreamResponse.newBuilder() + .setPrimaryStream(readStreams.get(2)) + .setRemainderStream(ReadStream.newBuilder().setName("ignored")) + .build()); + + List lastResponses = + Lists.newArrayList( + createResponseArrow( + ARROW_SCHEMA, names.subList(2, 4), values.subList(2, 4), 0.80, 0.90)); + + // Mock the third ReadRows call. 
+ when(fakeStorageClient.readRows( + ReadRowsRequest.newBuilder() + .setReadStream(readStreams.get(2).getName()) + .setOffset(2) + .build(), + "")) + .thenReturn(new FakeBigQueryServerStream<>(lastResponses)); + + BoundedSource source = + BigQueryStorageStreamSource.create( + ReadSession.newBuilder() + .setName("readSession") + .setArrowSchema( + ArrowSchema.newBuilder() + .setSerializedSchema(serializeArrowSchema(ARROW_SCHEMA)) + .build()) + .setDataFormat(DataFormat.ARROW) + .build(), + readStreams.get(0), + TABLE_SCHEMA, + new TableRowParser(), + TableRowJsonCoder.of(), + new FakeBigQueryServices().withStorageClient(fakeStorageClient)); + + BoundedReader reader = source.createReader(options); + assertTrue(reader.start()); + assertEquals("A", reader.getCurrent().get("name")); + + BoundedSource residualSource = reader.splitAtFraction(0.83f); + assertNotNull(residualSource); + assertEquals("A", reader.getCurrent().get("name")); + + assertTrue(reader.advance()); + assertEquals("B", reader.getCurrent().get("name")); + + residualSource = reader.splitAtFraction(0.75f); + assertNotNull(residualSource); + assertEquals("B", reader.getCurrent().get("name")); + + assertTrue(reader.advance()); + assertEquals("C", reader.getCurrent().get("name")); + assertTrue(reader.advance()); + assertEquals("D", reader.getCurrent().get("name")); + assertFalse(reader.advance()); + } + + @Test + public void testStreamSourceSplitAtFractionFailsWhenSplitIsNotPossibleArrow() throws Exception { + List names = Arrays.asList("A", "B", "C", "D", "E", "F"); + List values = Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L); + List parentResponses = + Lists.newArrayList( + createResponseArrow(ARROW_SCHEMA, names.subList(0, 2), values.subList(0, 2), 0.0, 0.25), + createResponseArrow(ARROW_SCHEMA, names.subList(2, 3), values.subList(2, 3), 0.25, 0.5), + createResponseArrow( + ARROW_SCHEMA, names.subList(3, 5), values.subList(3, 5), 0.5, 0.75)); + + StorageClient fakeStorageClient = mock(StorageClient.class); + when(fakeStorageClient.readRows( + ReadRowsRequest.newBuilder().setReadStream("parentStream").build(), "")) + .thenReturn(new FakeBigQueryServerStream<>(parentResponses)); + + // Mocks the split call. A response without a primary_stream and remainder_stream means + // that the split is not possible. + when(fakeStorageClient.splitReadStream( + SplitReadStreamRequest.newBuilder().setName("parentStream").setFraction(0.5f).build())) + .thenReturn(SplitReadStreamResponse.getDefaultInstance()); + + BigQueryStorageStreamSource streamSource = + BigQueryStorageStreamSource.create( + ReadSession.newBuilder() + .setName("readSession") + .setArrowSchema( + ArrowSchema.newBuilder() + .setSerializedSchema(serializeArrowSchema(ARROW_SCHEMA)) + .build()) + .setDataFormat(DataFormat.ARROW) + .build(), + ReadStream.newBuilder().setName("parentStream").build(), + TABLE_SCHEMA, + new TableRowParser(), + TableRowJsonCoder.of(), + new FakeBigQueryServices().withStorageClient(fakeStorageClient)); + + // Read a few records from the parent stream and ensure that records are returned in the + // prescribed order. 
+ BoundedReader parent = streamSource.createReader(options); + assertTrue(parent.start()); + assertEquals("A", parent.getCurrent().get("name")); + assertTrue(parent.advance()); + assertEquals("B", parent.getCurrent().get("name")); + + assertNull(parent.splitAtFraction(0.5)); + verify(fakeStorageClient, times(1)).splitReadStream(ArgumentMatchers.any()); + + // Verify that subsequent splitAtFraction() calls after a failed splitAtFraction() attempt + // do NOT invoke SplitReadStream. + assertNull(parent.splitAtFraction(0.5)); + verify(fakeStorageClient, times(1)).splitReadStream(ArgumentMatchers.any()); + + // Verify that the parent source still works okay even after an unsuccessful split attempt. + assertTrue(parent.advance()); + assertEquals("C", parent.getCurrent().get("name")); + assertTrue(parent.advance()); + assertEquals("D", parent.getCurrent().get("name")); + assertTrue(parent.advance()); + assertEquals("E", parent.getCurrent().get("name")); + assertFalse(parent.advance()); + } + + @Test + public void testStreamSourceSplitAtFractionFailsWhenParentIsPastSplitPointArrow() + throws Exception { + List names = Arrays.asList("A", "B", "C", "D", "E", "F"); + List values = Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L); + List parentResponses = + Lists.newArrayList( + createResponseArrow(ARROW_SCHEMA, names.subList(0, 2), values.subList(0, 2), 0.0, 0.25), + createResponseArrow(ARROW_SCHEMA, names.subList(2, 3), values.subList(2, 3), 0.25, 0.5), + createResponseArrow( + ARROW_SCHEMA, names.subList(3, 5), values.subList(3, 5), 0.5, 0.75)); + + StorageClient fakeStorageClient = mock(StorageClient.class); + when(fakeStorageClient.readRows( + ReadRowsRequest.newBuilder().setReadStream("parentStream").build(), "")) + .thenReturn(new FakeBigQueryServerStream<>(parentResponses)); + + // Mocks the split call. A response without a primary_stream and remainder_stream means + // that the split is not possible. + // Mocks the split call. + when(fakeStorageClient.splitReadStream( + SplitReadStreamRequest.newBuilder().setName("parentStream").setFraction(0.5f).build())) + .thenReturn( + SplitReadStreamResponse.newBuilder() + .setPrimaryStream(ReadStream.newBuilder().setName("primaryStream")) + .setRemainderStream(ReadStream.newBuilder().setName("remainderStream")) + .build()); + + // Mocks the ReadRows calls expected on the primary and residual streams. + when(fakeStorageClient.readRows( + ReadRowsRequest.newBuilder() + .setReadStream("primaryStream") + // This test will read rows 0 and 1 from the parent before calling split, + // so we expect the primary read to start at offset 2. + .setOffset(2) + .build(), + "")) + .thenThrow( + new FailedPreconditionException( + "Given row offset is invalid for stream.", + new StatusRuntimeException(Status.FAILED_PRECONDITION), + GrpcStatusCode.of(Code.FAILED_PRECONDITION), + /* retryable = */ false)); + + BigQueryStorageStreamSource streamSource = + BigQueryStorageStreamSource.create( + ReadSession.newBuilder() + .setName("readSession") + .setArrowSchema( + ArrowSchema.newBuilder() + .setSerializedSchema(serializeArrowSchema(ARROW_SCHEMA)) + .build()) + .setDataFormat(DataFormat.ARROW) + .build(), + ReadStream.newBuilder().setName("parentStream").build(), + TABLE_SCHEMA, + new TableRowParser(), + TableRowJsonCoder.of(), + new FakeBigQueryServices().withStorageClient(fakeStorageClient)); + + // Read a few records from the parent stream and ensure that records are returned in the + // prescribed order. 
+ BoundedReader parent = streamSource.createReader(options); + assertTrue(parent.start()); + assertEquals("A", parent.getCurrent().get("name")); + assertTrue(parent.advance()); + assertEquals("B", parent.getCurrent().get("name")); + + assertNull(parent.splitAtFraction(0.5)); + + // Verify that the parent source still works okay even after an unsuccessful split attempt. + assertTrue(parent.advance()); + assertEquals("C", parent.getCurrent().get("name")); + assertTrue(parent.advance()); + assertEquals("D", parent.getCurrent().get("name")); + assertTrue(parent.advance()); + assertEquals("E", parent.getCurrent().get("name")); + assertFalse(parent.advance()); + } + + private static org.apache.arrow.vector.types.pojo.Field field( + String name, + boolean nullable, + ArrowType type, + org.apache.arrow.vector.types.pojo.Field... children) { + return new org.apache.arrow.vector.types.pojo.Field( + name, + new org.apache.arrow.vector.types.pojo.FieldType(nullable, type, null, null), + asList(children)); + } + + static org.apache.arrow.vector.types.pojo.Field field( + String name, ArrowType type, org.apache.arrow.vector.types.pojo.Field... children) { + return field(name, false, type, children); + } } diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOWriteTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOWriteTest.java index 6ab0183b5b91..6799c6786378 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOWriteTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOWriteTest.java @@ -22,6 +22,7 @@ import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.allOf; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.equalTo; @@ -30,7 +31,6 @@ import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertNull; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.junit.Assert.fail; @@ -57,6 +57,7 @@ import java.util.Collection; import java.util.Collections; import java.util.EnumSet; +import java.util.HashMap; import java.util.List; import java.util.Map; import java.util.Set; @@ -64,6 +65,7 @@ import java.util.regex.Matcher; import java.util.regex.Pattern; import java.util.stream.Collectors; +import java.util.stream.StreamSupport; import org.apache.avro.generic.GenericData; import org.apache.avro.generic.GenericDatumWriter; import org.apache.avro.generic.GenericRecord; @@ -71,12 +73,17 @@ import org.apache.avro.io.Encoder; import org.apache.beam.sdk.coders.AtomicCoder; import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.KvCoder; import org.apache.beam.sdk.coders.SerializableCoder; +import org.apache.beam.sdk.coders.ShardedKeyCoder; import org.apache.beam.sdk.coders.StringUtf8Coder; import org.apache.beam.sdk.io.GenerateSequence; import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write; +import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition; import 
org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.Method; import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.SchemaUpdateOption; +import org.apache.beam.sdk.io.gcp.bigquery.WritePartition.ResultCoder; +import org.apache.beam.sdk.io.gcp.bigquery.WriteTables.Result; import org.apache.beam.sdk.io.gcp.testing.FakeBigQueryServices; import org.apache.beam.sdk.io.gcp.testing.FakeDatasetService; import org.apache.beam.sdk.io.gcp.testing.FakeJobService; @@ -134,11 +141,13 @@ import org.junit.rules.TestRule; import org.junit.runner.Description; import org.junit.runner.RunWith; -import org.junit.runners.JUnit4; +import org.junit.runners.Parameterized; +import org.junit.runners.Parameterized.Parameter; +import org.junit.runners.Parameterized.Parameters; import org.junit.runners.model.Statement; /** Tests for {@link BigQueryIO#write}. */ -@RunWith(JUnit4.class) +@RunWith(Parameterized.class) @SuppressWarnings({ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) @@ -147,6 +156,21 @@ public class BigQueryIOWriteTest implements Serializable { private transient TemporaryFolder testFolder = new TemporaryFolder(); private transient TestPipeline p; + @Parameters + public static Iterable data() { + return ImmutableList.of( + new Object[] {false, false}, + new Object[] {false, true}, + new Object[] {true, false}, + new Object[] {true, true}); + } + + @Parameter(0) + public boolean useStorageApi; + + @Parameter(1) + public boolean useStreaming; + @Rule public final transient TestRule folderThenPipeline = new TestRule() { @@ -161,10 +185,20 @@ public Statement apply(final Statement base, final Description description) { @Override public void evaluate() throws Throwable { options = TestPipeline.testingPipelineOptions(); - options.as(BigQueryOptions.class).setProject("project-id"); - options - .as(BigQueryOptions.class) - .setTempLocation(testFolder.getRoot().getAbsolutePath()); + BigQueryOptions bqOptions = options.as(BigQueryOptions.class); + bqOptions.setProject("project-id"); + if (description.getAnnotations().stream() + .anyMatch(a -> a.annotationType().equals(ProjectOverride.class))) { + options.as(BigQueryOptions.class).setBigQueryProject("bigquery-project-id"); + } + bqOptions.setTempLocation(testFolder.getRoot().getAbsolutePath()); + if (useStorageApi) { + bqOptions.setUseStorageWriteApi(true); + if (useStreaming) { + bqOptions.setNumStorageWriteApiStreams(2); + bqOptions.setStorageWriteApiTriggeringFrequencySec(1); + } + } p = TestPipeline.fromOptions(options); p.apply(base, description).evaluate(); } @@ -187,7 +221,7 @@ public void evaluate() throws Throwable { public void setUp() throws IOException, InterruptedException { FakeDatasetService.setUp(); BigQueryIO.clearCreatedTables(); - + fakeDatasetService.createDataset("bigquery-project-id", "dataset-id", "", "", null); fakeDatasetService.createDataset("project-id", "dataset-id", "", "", null); } @@ -201,6 +235,9 @@ abstract static class StringLongDestinations extends DynamicDestinations(arg -> arg))); - if (streaming) { + if (useStreaming) { users = users.setIsBoundedInternal(PCollection.IsBounded.UNBOUNDED); } @@ -358,6 +396,9 @@ private void verifySideInputs() { return new TableRow().set("name", matcher.group(1)).set("id", matcher.group(2)); }); } + if (autoSharding) { + write = write.withAutoSharding(); + } users.apply("WriteBigQuery", write); p.run(); @@ -391,11 +432,8 @@ void testTimePartitioningClustering( new TableSchema() .setFields( ImmutableList.of( - new TableFieldSchema() - .setName("date") - .setType("DATE") 
- .setName("number") - .setType("INTEGER"))); + new TableFieldSchema().setName("date").setType("DATE"), + new TableFieldSchema().setName("number").setType("INTEGER"))); Write writeTransform = BigQueryIO.writeTableRows() @@ -436,27 +474,38 @@ void testClustering(BigQueryIO.Write.Method insertMethod) throws Exception { } @Test - public void testTimePartitioningStreamingInserts() throws Exception { - testTimePartitioning(BigQueryIO.Write.Method.STREAMING_INSERTS); - } - - @Test - public void testTimePartitioningBatchLoads() throws Exception { - testTimePartitioning(BigQueryIO.Write.Method.FILE_LOADS); + public void testTimePartitioning() throws Exception { + BigQueryIO.Write.Method method; + if (useStorageApi) { + method = Method.STORAGE_WRITE_API; + } else if (useStreaming) { + method = Method.STREAMING_INSERTS; + } else { + method = Method.FILE_LOADS; + } + testTimePartitioning(method); } @Test - public void testClusteringStreamingInserts() throws Exception { - testClustering(BigQueryIO.Write.Method.STREAMING_INSERTS); + public void testTimePartitioningStorageApi() throws Exception { + if (!useStorageApi) { + return; + } + testTimePartitioning(Method.STORAGE_WRITE_API); } @Test - public void testClusteringBatchLoads() throws Exception { - testClustering(BigQueryIO.Write.Method.FILE_LOADS); + public void testClusteringStorageApi() throws Exception { + if (useStorageApi) { + testClustering(Method.STORAGE_WRITE_API); + } } @Test(expected = IllegalArgumentException.class) public void testClusteringThrowsWithoutPartitioning() throws Exception { + if (useStorageApi || !useStreaming) { + throw new IllegalArgumentException(); + } p.enableAbandonedNodeEnforcement(false); testTimePartitioningClustering(Method.STREAMING_INSERTS, false, true); } @@ -472,11 +521,12 @@ public void testClusteringTableFunction() throws Exception { new TableSchema() .setFields( ImmutableList.of( - new TableFieldSchema() - .setName("date") - .setType("DATE") - .setName("number") - .setType("INTEGER"))); + new TableFieldSchema().setName("date").setType("DATE"), + new TableFieldSchema().setName("number").setType("INTEGER"))); + + // withMethod overrides the pipeline option, so we need to explicitly request + // STORAGE_API_WRITES. + BigQueryIO.Write.Method method = useStorageApi ? 
Method.STORAGE_WRITE_API : Method.FILE_LOADS; p.apply(Create.of(row1, row2)) .apply( BigQueryIO.writeTableRows() @@ -491,7 +541,7 @@ public void testClusteringTableFunction() throws Exception { new Clustering().setFields(ImmutableList.of("date"))); }) .withTestServices(fakeBqServices) - .withMethod(BigQueryIO.Write.Method.FILE_LOADS) + .withMethod(method) .withSchema(schema) .withClustering() .withoutValidation()); @@ -506,6 +556,9 @@ public void testClusteringTableFunction() throws Exception { @Test public void testTriggeredFileLoads() throws Exception { + if (useStorageApi || !useStreaming) { + return; + } List elements = Lists.newArrayList(); for (int i = 0; i < 30; ++i) { elements.add(new TableRow().set("number", i)); @@ -523,6 +576,7 @@ public void testTriggeredFileLoads() throws Exception { elements.get(20), Iterables.toArray(elements.subList(21, 30), TableRow.class)) .advanceWatermarkToInfinity(); + BigQueryIO.Write.Method method = Method.FILE_LOADS; p.apply(testStream) .apply( BigQueryIO.writeTableRows() @@ -535,7 +589,7 @@ public void testTriggeredFileLoads() throws Exception { .withTestServices(fakeBqServices) .withTriggeringFrequency(Duration.standardSeconds(30)) .withNumFileShards(2) - .withMethod(BigQueryIO.Write.Method.FILE_LOADS) + .withMethod(method) .withoutValidation()); p.run(); @@ -545,6 +599,9 @@ public void testTriggeredFileLoads() throws Exception { } public void testTriggeredFileLoadsWithTempTables(String tableRef) throws Exception { + if (useStorageApi || !useStreaming) { + return; + } List elements = Lists.newArrayList(); for (int i = 0; i < 30; ++i) { elements.add(new TableRow().set("number", i)); @@ -562,6 +619,7 @@ public void testTriggeredFileLoadsWithTempTables(String tableRef) throws Excepti elements.get(20), Iterables.toArray(elements.subList(21, 30), TableRow.class)) .advanceWatermarkToInfinity(); + BigQueryIO.Write.Method method = Method.FILE_LOADS; p.apply(testStream) .apply( BigQueryIO.writeTableRows() @@ -576,15 +634,25 @@ public void testTriggeredFileLoadsWithTempTables(String tableRef) throws Excepti .withNumFileShards(2) .withMaxBytesPerPartition(1) .withMaxFilesPerPartition(1) - .withMethod(BigQueryIO.Write.Method.FILE_LOADS) + .withMethod(method) .withoutValidation()); p.run(); + final int projectIdSplitter = tableRef.indexOf(':'); + final String projectId = + projectIdSplitter == -1 ? "project-id" : tableRef.substring(0, projectIdSplitter); + assertThat( - fakeDatasetService.getAllRows("project-id", "dataset-id", "table-id"), + fakeDatasetService.getAllRows(projectId, "dataset-id", "table-id"), containsInAnyOrder(Iterables.toArray(elements, TableRow.class))); } + @Test + @ProjectOverride + public void testTriggeredFileLoadsWithTempTablesBigQueryProject() throws Exception { + testTriggeredFileLoadsWithTempTables("bigquery-project-id:dataset-id.table-id"); + } + @Test public void testTriggeredFileLoadsWithTempTables() throws Exception { testTriggeredFileLoadsWithTempTables("project-id:dataset-id.table-id"); @@ -595,8 +663,83 @@ public void testTriggeredFileLoadsWithTempTablesDefaultProject() throws Exceptio testTriggeredFileLoadsWithTempTables("dataset-id.table-id"); } + @Test + public void testTriggeredFileLoadsWithAutoSharding() throws Exception { + if (useStorageApi || !useStreaming) { + // This test does not make sense for the storage API. 
+ return; + } + List elements = Lists.newArrayList(); + for (int i = 0; i < 30; ++i) { + elements.add(new TableRow().set("number", i)); + } + + Instant startInstant = new Instant(0L); + TestStream testStream = + TestStream.create(TableRowJsonCoder.of()) + // Initialize watermark for timer to be triggered correctly. + .advanceWatermarkTo(startInstant) + .addElements( + elements.get(0), Iterables.toArray(elements.subList(1, 10), TableRow.class)) + .advanceProcessingTime(Duration.standardMinutes(1)) + .advanceWatermarkTo(startInstant.plus(Duration.standardSeconds(10))) + .addElements( + elements.get(10), Iterables.toArray(elements.subList(11, 20), TableRow.class)) + .advanceProcessingTime(Duration.standardMinutes(1)) + .advanceWatermarkTo(startInstant.plus(Duration.standardSeconds(30))) + .addElements( + elements.get(20), Iterables.toArray(elements.subList(21, 30), TableRow.class)) + .advanceProcessingTime(Duration.standardMinutes(2)) + .advanceWatermarkToInfinity(); + + int numTables = 3; + p.apply(testStream) + .apply( + BigQueryIO.writeTableRows() + .to( + (ValueInSingleWindow vsw) -> { + String tableSpec = + "project-id:dataset-id.table-" + + ((int) vsw.getValue().get("number") % numTables); + return new TableDestination(tableSpec, null); + }) + .withSchema( + new TableSchema() + .setFields( + ImmutableList.of( + new TableFieldSchema().setName("number").setType("INTEGER")))) + .withTestServices(fakeBqServices) + // Set a triggering frequency without needing to also specify numFileShards when + // using autoSharding. + .withTriggeringFrequency(Duration.standardSeconds(100)) + .withAutoSharding() + .withMaxBytesPerPartition(1000) + .withMaxFilesPerPartition(10) + .withMethod(BigQueryIO.Write.Method.FILE_LOADS) + .withoutValidation()); + p.run(); + + Map> elementsByTableIdx = new HashMap<>(); + for (int i = 0; i < elements.size(); i++) { + elementsByTableIdx + .computeIfAbsent(i % numTables, k -> new ArrayList<>()) + .add(elements.get(i)); + } + for (Map.Entry> entry : elementsByTableIdx.entrySet()) { + assertThat( + fakeDatasetService.getAllRows("project-id", "dataset-id", "table-" + entry.getKey()), + containsInAnyOrder(Iterables.toArray(entry.getValue(), TableRow.class))); + } + // For each table destination, it's expected to create two load jobs based on the triggering + // frequency and processing time intervals. 
+ assertEquals(2 * numTables, fakeDatasetService.getInsertCount()); + } + @Test public void testFailuresNoRetryPolicy() throws Exception { + if (useStorageApi || !useStreaming) { + return; + } TableRow row1 = new TableRow().set("name", "a").set("number", "1"); TableRow row2 = new TableRow().set("name", "b").set("number", "2"); TableRow row3 = new TableRow().set("name", "c").set("number", "3"); @@ -633,6 +776,9 @@ public void testFailuresNoRetryPolicy() throws Exception { @Test public void testRetryPolicy() throws Exception { + if (useStorageApi || !useStreaming) { + return; + } TableRow row1 = new TableRow().set("name", "a").set("number", "1"); TableRow row2 = new TableRow().set("name", "b").set("number", "2"); TableRow row3 = new TableRow().set("name", "c").set("number", "3"); @@ -649,13 +795,13 @@ public void testRetryPolicy() throws Exception { row1, ImmutableList.of(ephemeralError, ephemeralError), row2, ImmutableList.of(ephemeralError, ephemeralError, persistentError))); - PCollection failedRows = + WriteResult result = p.apply(Create.of(row1, row2, row3)) .apply( BigQueryIO.writeTableRows() .to("project-id:dataset-id.table-id") - .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED) - .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS) + .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED) + .withMethod(Method.STREAMING_INSERTS) .withSchema( new TableSchema() .setFields( @@ -664,11 +810,15 @@ public void testRetryPolicy() throws Exception { new TableFieldSchema().setName("number").setType("INTEGER")))) .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors()) .withTestServices(fakeBqServices) - .withoutValidation()) - .getFailedInserts(); + .withoutValidation()); + + PCollection failedRows = result.getFailedInserts(); // row2 finally fails with a non-retryable error, so we expect to see it in the collection of // failed rows. PAssert.that(failedRows).containsInAnyOrder(row2); + if (useStorageApi || !useStreaming) { + PAssert.that(result.getSuccessfulInserts()).containsInAnyOrder(row1, row3); + } p.run(); // Only row1 and row3 were successfully inserted. 
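The retry-policy hunk above relies on the split exposed by WriteResult: transient insert errors are retried under InsertRetryPolicy.retryTransientErrors(), and only rows that exhaust retries surface through getFailedInserts(). A rough usage sketch of that pattern outside the test harness (the destination table and the dead-letter handling are placeholders, not taken from this patch):

```java
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.values.PCollection;

class FailedInsertHandlingSketch {
  // Writes rows with streaming inserts, retrying transient errors, and returns the
  // rows that still failed so the caller can route them to a dead-letter destination.
  static PCollection<TableRow> writeWithDeadLetter(
      PCollection<TableRow> rows, TableSchema schema) {
    WriteResult result =
        rows.apply(
            "WriteRows",
            BigQueryIO.writeTableRows()
                .to("my-project:my_dataset.my_table") // placeholder destination
                .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
                .withSchema(schema)
                .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors()));
    // Rows that still fail after retries come back here for side handling.
    return result.getFailedInserts();
  }
}
```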
@@ -702,6 +852,9 @@ public void testWrite() throws Exception { @Test public void testWriteWithoutInsertId() throws Exception { + if (useStorageApi || !useStreaming) { + return; + } TableRow row1 = new TableRow().set("name", "a").set("number", 1); TableRow row2 = new TableRow().set("name", "b").set("number", 2); TableRow row3 = new TableRow().set("name", "c").set("number", 3); @@ -752,6 +905,9 @@ public static InputRecord create( @Test public void testWriteAvro() throws Exception { + if (useStorageApi || useStreaming) { + return; + } p.apply( Create.of( InputRecord.create("test", 1, 1.0, Instant.parse("2019-01-01T00:00:00Z")), @@ -800,6 +956,9 @@ public void testWriteAvro() throws Exception { @Test public void testWriteAvroWithCustomWriter() throws Exception { + if (useStorageApi || useStreaming) { + return; + } SerializableFunction, GenericRecord> formatFunction = r -> { GenericRecord rec = new GenericData.Record(r.getSchema()); @@ -860,35 +1019,54 @@ protected void writeString(org.apache.avro.Schema schema, Object datum, Encoder @Test public void testStreamingWrite() throws Exception { + streamingWrite(false); + } + + @Test + public void testStreamingWriteWithAutoSharding() throws Exception { + if (useStorageApi) { + return; + } + streamingWrite(true); + } + + private void streamingWrite(boolean autoSharding) throws Exception { + if (!useStreaming) { + return; + } + BigQueryIO.Write write = + BigQueryIO.writeTableRows() + .to("project-id:dataset-id.table-id") + .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED) + .withSchema( + new TableSchema() + .setFields( + ImmutableList.of( + new TableFieldSchema().setName("name").setType("STRING"), + new TableFieldSchema().setName("number").setType("INTEGER")))) + .withTestServices(fakeBqServices) + .withoutValidation(); + if (autoSharding) { + write = write.withAutoSharding(); + } p.apply( Create.of( - new TableRow().set("name", "a").set("number", 1), - new TableRow().set("name", "b").set("number", 2), - new TableRow().set("name", "c").set("number", 3), - new TableRow().set("name", "d").set("number", 4)) + new TableRow().set("name", "a").set("number", "1"), + new TableRow().set("name", "b").set("number", "2"), + new TableRow().set("name", "c").set("number", "3"), + new TableRow().set("name", "d").set("number", "4")) .withCoder(TableRowJsonCoder.of())) .setIsBoundedInternal(PCollection.IsBounded.UNBOUNDED) - .apply( - BigQueryIO.writeTableRows() - .to("project-id:dataset-id.table-id") - .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED) - .withSchema( - new TableSchema() - .setFields( - ImmutableList.of( - new TableFieldSchema().setName("name").setType("STRING"), - new TableFieldSchema().setName("number").setType("INTEGER")))) - .withTestServices(fakeBqServices) - .withoutValidation()); + .apply("WriteToBQ", write); p.run(); assertThat( fakeDatasetService.getAllRows("project-id", "dataset-id", "table-id"), containsInAnyOrder( - new TableRow().set("name", "a").set("number", 1), - new TableRow().set("name", "b").set("number", 2), - new TableRow().set("name", "c").set("number", 3), - new TableRow().set("name", "d").set("number", 4))); + new TableRow().set("name", "a").set("number", "1"), + new TableRow().set("name", "b").set("number", "2"), + new TableRow().set("name", "c").set("number", "3"), + new TableRow().set("name", "d").set("number", "4"))); } @DefaultSchema(JavaFieldSchema.class) @@ -905,6 +1083,9 @@ static class SchemaPojo { @Test public void testSchemaWriteLoads() throws Exception { + // 
withMethod overrides the pipeline option, so we need to explicitly request + // STORAGE_API_WRITES. + BigQueryIO.Write.Method method = useStorageApi ? Method.STORAGE_WRITE_API : Method.FILE_LOADS; p.apply( Create.of( new SchemaPojo("a", 1), @@ -915,12 +1096,14 @@ public void testSchemaWriteLoads() throws Exception { BigQueryIO.write() .to("project-id:dataset-id.table-id") .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED) - .withMethod(Method.FILE_LOADS) + .withMethod(method) .useBeamSchema() .withTestServices(fakeBqServices) .withoutValidation()); p.run(); + System.err.println( + "Wrote: " + fakeDatasetService.getAllRows("project-id", "dataset-id", "table-id")); assertThat( fakeDatasetService.getAllRows("project-id", "dataset-id", "table-id"), containsInAnyOrder( @@ -932,6 +1115,9 @@ public void testSchemaWriteLoads() throws Exception { @Test public void testSchemaWriteStreams() throws Exception { + if (useStorageApi || !useStreaming) { + return; + } p.apply( Create.of( new SchemaPojo("a", 1), @@ -1004,11 +1190,6 @@ public WindowMappingFn getDefaultWindowMappingFn() { throw new UnsupportedOperationException( "PartitionedGlobalWindows is not allowed in side inputs"); } - - @Override - public Instant getOutputTime(Instant inputTimestamp, PartitionedGlobalWindow window) { - return inputTimestamp; - } } /** Custom Window object that encodes a String value. */ @@ -1067,16 +1248,7 @@ public void verifyDeterministic() {} } @Test - public void testStreamingWriteWithDynamicTables() throws Exception { - testWriteWithDynamicTables(true); - } - - @Test - public void testBatchWriteWithDynamicTables() throws Exception { - testWriteWithDynamicTables(false); - } - - public void testWriteWithDynamicTables(boolean streaming) throws Exception { + public void testWriteWithDynamicTables() throws Exception { List inserts = new ArrayList<>(); for (int i = 0; i < 10; i++) { inserts.add(i); @@ -1115,7 +1287,7 @@ public void testWriteWithDynamicTables(boolean streaming) throws Exception { }; PCollection input = p.apply("CreateSource", Create.of(inserts)); - if (streaming) { + if (useStreaming) { input = input.setIsBoundedInternal(PCollection.IsBounded.UNBOUNDED); } @@ -1127,7 +1299,9 @@ public void testWriteWithDynamicTables(boolean streaming) throws Exception { .apply( BigQueryIO.write() .to(tableFunction) - .withFormatFunction(i -> new TableRow().set("name", "number" + i).set("number", i)) + .withFormatFunction( + i -> + new TableRow().set("name", "number" + i).set("number", Integer.toString(i))) .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED) .withSchemaFromView(schemasView) .withTestServices(fakeBqServices) @@ -1154,13 +1328,20 @@ public void testWriteWithDynamicTables(boolean streaming) throws Exception { assertThat( fakeDatasetService.getAllRows("project-id", "dataset-id", tableId), containsInAnyOrder( - new TableRow().set("name", String.format("number%d", i)).set("number", i), - new TableRow().set("name", String.format("number%d", i + 5)).set("number", i + 5))); + new TableRow() + .set("name", String.format("number%d", i)) + .set("number", Integer.toString(i)), + new TableRow() + .set("name", String.format("number%d", i + 5)) + .set("number", Integer.toString(i + 5)))); } } @Test public void testWriteUnknown() throws Exception { + if (useStorageApi) { + return; + } p.apply( Create.of( new TableRow().set("name", "a").set("number", 1), @@ -1181,6 +1362,9 @@ public void testWriteUnknown() throws Exception { @Test public void testWriteFailedJobs() throws 
Exception { + if (useStorageApi) { + return; + } p.apply( Create.of( new TableRow().set("name", "a").set("number", 1), @@ -1204,10 +1388,16 @@ public void testWriteFailedJobs() throws Exception { @Test public void testWriteWithMissingSchemaFromView() throws Exception { + // Because no messages PCollectionView> view = p.apply("Create schema view", Create.of(KV.of("foo", "bar"), KV.of("bar", "boo"))) .apply(View.asMap()); - p.apply(Create.empty(TableRowJsonCoder.of())) + p.apply( + Create.of( + new TableRow().set("name", "a").set("number", 1), + new TableRow().set("name", "b").set("number", 2), + new TableRow().set("name", "c").set("number", 3)) + .withCoder(TableRowJsonCoder.of())) .apply( BigQueryIO.writeTableRows() .to("dataset-id.table-id") @@ -1248,6 +1438,7 @@ public void testWriteBuilderMethods() { assertEquals(BigQueryIO.Write.WriteDisposition.WRITE_EMPTY, write.getWriteDisposition()); assertEquals(null, write.getTableDescription()); assertTrue(write.getValidate()); + assertFalse(write.getAutoSharding()); assertFalse(write.withoutValidation().getValidate()); TableSchema schema = new TableSchema(); @@ -1316,13 +1507,14 @@ public void testBuildWriteDisplayData() { assertThat(displayData, hasDisplayItem("validation", false)); } - private void testWriteValidatesDataset(boolean unbounded) throws Exception { + @Test + public void testWriteValidatesDataset() throws Exception { TableReference tableRef = new TableReference(); tableRef.setDatasetId("somedataset"); tableRef.setTableId("sometable"); PCollection tableRows; - if (unbounded) { + if (useStreaming) { tableRows = p.apply(GenerateSequence.from(0)) .apply( @@ -1352,18 +1544,11 @@ public TableRow apply(Long input) { p.run(); } - @Test - public void testWriteValidatesDatasetBatch() throws Exception { - testWriteValidatesDataset(false); - } - - @Test - public void testWriteValidatesDatasetStreaming() throws Exception { - testWriteValidatesDataset(true); - } - @Test public void testCreateNeverWithStreaming() throws Exception { + if (!useStreaming) { + return; + } p.enableAbandonedNodeEnforcement(false); TableReference tableRef = new TableReference(); @@ -1424,6 +1609,9 @@ public void testWriteValidateFailsNoFormatFunction() { @Test public void testWriteValidateFailsBothFormatFunctions() { + if (useStorageApi) { + return; + } p.enableAbandonedNodeEnforcement(false); thrown.expect(IllegalArgumentException.class); @@ -1441,6 +1629,9 @@ public void testWriteValidateFailsBothFormatFunctions() { @Test public void testWriteValidateFailsWithBeamSchemaAndAvroFormatFunction() { + if (useStorageApi) { + return; + } p.enableAbandonedNodeEnforcement(false); thrown.expect(IllegalArgumentException.class); @@ -1456,6 +1647,9 @@ public void testWriteValidateFailsWithBeamSchemaAndAvroFormatFunction() { @Test public void testWriteValidateFailsWithAvroFormatAndStreamingInserts() { + if (!useStreaming && !useStorageApi) { + return; + } p.enableAbandonedNodeEnforcement(false); thrown.expect(IllegalArgumentException.class); @@ -1470,6 +1664,25 @@ public void testWriteValidateFailsWithAvroFormatAndStreamingInserts() { .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)); } + @Test + public void testWriteValidateFailsWithBatchAutoSharding() { + if (useStorageApi) { + return; + } + p.enableAbandonedNodeEnforcement(false); + + thrown.expect(IllegalArgumentException.class); + thrown.expectMessage("Auto-sharding is only applicable to unbounded input."); + p.apply(Create.empty(INPUT_RECORD_CODER)) + .apply( + BigQueryIO.write() + 
.to("dataset.table") + .withSchema(new TableSchema()) + .withMethod(Method.STREAMING_INSERTS) + .withAutoSharding() + .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)); + } + @Test public void testWritePartitionEmptyData() throws Exception { long numFiles = 0; @@ -1550,10 +1763,12 @@ private void testWritePartition( } } - TupleTag, List>> multiPartitionsTag = - new TupleTag, List>>("multiPartitionsTag") {}; - TupleTag, List>> singlePartitionTag = - new TupleTag, List>>("singlePartitionTag") {}; + TupleTag, WritePartition.Result>> multiPartitionsTag = + new TupleTag, WritePartition.Result>>( + "multiPartitionsTag") {}; + TupleTag, WritePartition.Result>> singlePartitionTag = + new TupleTag, WritePartition.Result>>( + "singlePartitionTag") {}; String tempFilePrefix = testFolder.newFolder("BigQueryIOTest").getAbsolutePath(); PCollectionView tempFilePrefixView = @@ -1573,12 +1788,12 @@ private void testWritePartition( DoFnTester< Iterable>, - KV, List>> + KV, WritePartition.Result>> tester = DoFnTester.of(writePartition); tester.setSideInput(tempFilePrefixView, GlobalWindow.INSTANCE, tempFilePrefix); tester.processElement(files); - List, List>> partitions; + List, WritePartition.Result>> partitions; if (expectedNumPartitionsPerTable > 1) { partitions = tester.takeOutputElements(multiPartitionsTag); } else { @@ -1587,12 +1802,12 @@ private void testWritePartition( List> partitionsResult = Lists.newArrayList(); Map> filesPerTableResult = Maps.newHashMap(); - for (KV, List> partition : partitions) { + for (KV, WritePartition.Result> partition : partitions) { String table = partition.getKey().getKey().getTableSpec(); partitionsResult.add(partition.getKey()); List tableFilesResult = filesPerTableResult.computeIfAbsent(table, k -> Lists.newArrayList()); - tableFilesResult.addAll(partition.getValue()); + tableFilesResult.addAll(partition.getValue().getFilenames()); } assertThat( @@ -1639,7 +1854,7 @@ public void testWriteTables() throws Exception { String jobIdToken = "jobId"; final Multimap expectedTempTables = ArrayListMultimap.create(); - List, List>> partitions = Lists.newArrayList(); + List, WritePartition.Result>> partitions = Lists.newArrayList(); for (int i = 0; i < numTables; ++i) { String tableName = String.format("project-id:dataset-id.table%05d", i); TableDestination tableDestination = new TableDestination(tableName, tableName); @@ -1661,7 +1876,10 @@ public void testWriteTables() throws Exception { } filesPerPartition.add(writer.getResult().resourceId.toString()); } - partitions.add(KV.of(ShardedKey.of(tableDestination.getTableSpec(), j), filesPerPartition)); + partitions.add( + KV.of( + ShardedKey.of(tableDestination.getTableSpec(), j), + new AutoValue_WritePartition_Result(filesPerPartition, true))); String json = String.format( @@ -1671,8 +1889,11 @@ public void testWriteTables() throws Exception { } } - PCollection, List>> writeTablesInput = - p.apply(Create.of(partitions)); + PCollection, WritePartition.Result>> writeTablesInput = + p.apply( + Create.of(partitions) + .withCoder( + KvCoder.of(ShardedKeyCoder.of(StringUtf8Coder.of()), ResultCoder.INSTANCE))); PCollectionView jobIdTokenView = p.apply("CreateJobId", Create.of("jobId")).apply(View.asSingleton()); List> sideInputs = ImmutableList.of(jobIdTokenView); @@ -1695,18 +1916,25 @@ public void testWriteTables() throws Exception { false, Collections.emptySet()); - PCollection> writeTablesOutput = - writeTablesInput.apply(writeTables); + PCollection> writeTablesOutput = + writeTablesInput + 
.apply(writeTables) + .setCoder(KvCoder.of(TableDestinationCoderV3.of(), WriteTables.ResultCoder.INSTANCE)); PAssert.thatMultimap(writeTablesOutput) .satisfies( input -> { assertEquals(input.keySet(), expectedTempTables.keySet()); - for (Map.Entry> entry : input.entrySet()) { + for (Map.Entry> entry : + input.entrySet()) { + Iterable tableNames = + StreamSupport.stream(entry.getValue().spliterator(), false) + .map(Result::getTableName) + .collect(Collectors.toList()); @SuppressWarnings("unchecked") String[] expectedValues = Iterables.toArray(expectedTempTables.get(entry.getKey()), String.class); - assertThat(entry.getValue(), containsInAnyOrder(expectedValues)); + assertThat(tableNames, containsInAnyOrder(expectedValues)); } return null; }); @@ -1743,7 +1971,7 @@ public void testWriteRename() throws Exception { Multimap expectedRowsPerTable = ArrayListMultimap.create(); String jobIdToken = "jobIdToken"; Multimap tempTables = ArrayListMultimap.create(); - List> tempTablesElement = Lists.newArrayList(); + List> tempTablesElement = Lists.newArrayList(); for (int i = 0; i < numFinalTables; ++i) { String tableName = "project-id:dataset-id.table_" + i; TableDestination tableDestination = new TableDestination(tableName, "table_" + i + "_desc"); @@ -1763,7 +1991,8 @@ public void testWriteRename() throws Exception { expectedRowsPerTable.putAll(tableDestination, rows); String tableJson = toJsonString(tempTable); tempTables.put(tableDestination, tableJson); - tempTablesElement.add(KV.of(tableDestination, tableJson)); + tempTablesElement.add( + KV.of(tableDestination, new AutoValue_WriteTables_Result(tableJson, true))); } } @@ -1779,7 +2008,8 @@ public void testWriteRename() throws Exception { 3, "kms_key"); - DoFnTester>, Void> tester = DoFnTester.of(writeRename); + DoFnTester>, Void> tester = + DoFnTester.of(writeRename); tester.setSideInput(jobIdTokenView, GlobalWindow.INSTANCE, jobIdToken); tester.processElement(tempTablesElement); tester.finishBundle(); @@ -1863,16 +2093,22 @@ public void testWriteToTableDecorator() throws Exception { TableRow row1 = new TableRow().set("name", "a").set("number", "1"); TableRow row2 = new TableRow().set("name", "b").set("number", "2"); + // withMethod overrides the pipeline option, so we need to explicitly requiest + // STORAGE_API_WRITES. + BigQueryIO.Write.Method method = + useStorageApi ? 
Method.STORAGE_WRITE_API : Method.STREAMING_INSERTS; TableSchema schema = new TableSchema() .setFields( - ImmutableList.of(new TableFieldSchema().setName("number").setType("INTEGER"))); + ImmutableList.of( + new TableFieldSchema().setName("name").setType("STRING"), + new TableFieldSchema().setName("number").setType("INTEGER"))); p.apply(Create.of(row1, row2)) .apply( BigQueryIO.writeTableRows() .to("project-id:dataset-id.table-id$20171127") .withTestServices(fakeBqServices) - .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS) + .withMethod(method) .withSchema(schema) .withoutValidation()); p.run(); @@ -1880,6 +2116,9 @@ public void testWriteToTableDecorator() throws Exception { @Test public void testExtendedErrorRetrieval() throws Exception { + if (useStorageApi) { + return; + } TableRow row1 = new TableRow().set("name", "a").set("number", "1"); TableRow row2 = new TableRow().set("name", "b").set("number", "2"); TableRow row3 = new TableRow().set("name", "c").set("number", "3"); @@ -1932,6 +2171,9 @@ public void testExtendedErrorRetrieval() throws Exception { @Test public void testWrongErrorConfigs() { + if (useStorageApi) { + return; + } p.enableAutoRunIfMissing(true); TableRow row1 = new TableRow().set("name", "a").set("number", "1"); @@ -2002,7 +2244,8 @@ void schemaUpdateOptionsTest( .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND) .withSchemaUpdateOptions(schemaUpdateOptions); - p.apply(Create.of(row)).apply(writeTransform); + p.apply("Create" + insertMethod, Create.of(row)) + .apply("Write" + insertMethod, writeTransform); p.run(); List expectedOptions = @@ -2034,6 +2277,9 @@ public void testWriteFileSchemaUpdateOptionAll() throws Exception { @Test public void testSchemaUpdateOptionsFailsStreamingInserts() throws Exception { + if (!useStreaming && !useStorageApi) { + return; + } Set options = EnumSet.of(SchemaUpdateOption.ALLOW_FIELD_ADDITION); p.enableAbandonedNodeEnforcement(false); thrown.expect(IllegalArgumentException.class); diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryKmsKeyIT.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryKmsKeyIT.java index 0d6e73048dd8..50610f351902 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryKmsKeyIT.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryKmsKeyIT.java @@ -49,9 +49,6 @@ */ @RunWith(JUnit4.class) @Category(UsesKms.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigQueryKmsKeyIT { private static final Logger LOG = LoggerFactory.getLogger(BigQueryKmsKeyIT.class); diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQuerySchemaUpdateOptionsIT.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQuerySchemaUpdateOptionsIT.java index 99b91d4302a1..8d030ca98b46 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQuerySchemaUpdateOptionsIT.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQuerySchemaUpdateOptionsIT.java @@ -50,9 +50,6 @@ /** Integration test for BigqueryIO with DataflowRunner and DirectRunner. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigQuerySchemaUpdateOptionsIT { private static final Logger LOG = LoggerFactory.getLogger(BigQuerySchemaUpdateOptionsIT.class); private static String project; diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImplTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImplTest.java index e809d26a56ad..a9a360917dae 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImplTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImplTest.java @@ -17,17 +17,20 @@ */ package org.apache.beam.sdk.io.gcp.bigquery; -import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Verify.verifyNotNull; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsString; import static org.hamcrest.Matchers.instanceOf; import static org.hamcrest.Matchers.is; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertNotNull; import static org.junit.Assert.assertNull; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.junit.Assert.fail; +import static org.mockito.ArgumentMatchers.any; +import static org.mockito.Mockito.atLeastOnce; import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.never; import static org.mockito.Mockito.times; import static org.mockito.Mockito.verify; import static org.mockito.Mockito.when; @@ -46,6 +49,8 @@ import com.google.api.client.testing.util.MockSleeper; import com.google.api.client.util.BackOff; import com.google.api.client.util.Sleeper; +import com.google.api.gax.rpc.ApiException; +import com.google.api.gax.rpc.StatusCode; import com.google.api.services.bigquery.Bigquery; import com.google.api.services.bigquery.model.ErrorProto; import com.google.api.services.bigquery.model.Job; @@ -60,19 +65,37 @@ import com.google.api.services.bigquery.model.TableReference; import com.google.api.services.bigquery.model.TableRow; import com.google.api.services.bigquery.model.TableSchema; +import com.google.cloud.bigquery.storage.v1.CreateReadSessionRequest; +import com.google.cloud.bigquery.storage.v1.ReadRowsRequest; +import com.google.cloud.bigquery.storage.v1.ReadRowsResponse; +import com.google.cloud.bigquery.storage.v1.ReadSession; +import com.google.cloud.bigquery.storage.v1.SplitReadStreamRequest; +import com.google.cloud.bigquery.storage.v1.SplitReadStreamResponse; import com.google.cloud.hadoop.util.ApiErrorExtractor; import com.google.cloud.hadoop.util.RetryBoundedBackOff; +import com.google.protobuf.Parser; +import com.google.rpc.RetryInfo; +import io.grpc.Metadata; +import io.grpc.Status; import java.io.ByteArrayInputStream; import java.io.IOException; import java.io.InputStream; import java.util.ArrayList; +import java.util.HashMap; +import java.util.Iterator; import java.util.List; +import org.apache.beam.runners.core.metrics.GcpResourceIdentifiers; +import org.apache.beam.runners.core.metrics.MetricsContainerImpl; +import org.apache.beam.runners.core.metrics.MonitoringInfoConstants; +import org.apache.beam.runners.core.metrics.MonitoringInfoMetricName; import org.apache.beam.sdk.extensions.gcp.util.BackOffAdapter; 
import org.apache.beam.sdk.extensions.gcp.util.FastNanoClockAndSleeper; import org.apache.beam.sdk.extensions.gcp.util.RetryHttpRequestInitializer; import org.apache.beam.sdk.extensions.gcp.util.Transport; import org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.DatasetServiceImpl; import org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.JobServiceImpl; +import org.apache.beam.sdk.metrics.MetricName; +import org.apache.beam.sdk.metrics.MetricsEnvironment; import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.testing.ExpectedLogs; import org.apache.beam.sdk.transforms.windowing.GlobalWindow; @@ -80,6 +103,7 @@ import org.apache.beam.sdk.util.FluentBackoff; import org.apache.beam.sdk.values.FailsafeValueInSingleWindow; import org.apache.beam.sdk.values.ValueInSingleWindow; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Verify; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; import org.joda.time.Duration; @@ -89,18 +113,16 @@ import org.junit.rules.ExpectedException; import org.junit.runner.RunWith; import org.junit.runners.JUnit4; -import org.mockito.Mock; import org.mockito.MockitoAnnotations; /** Tests for {@link BigQueryServicesImpl}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigQueryServicesImplTest { @Rule public ExpectedException thrown = ExpectedException.none(); @Rule public ExpectedLogs expectedLogs = ExpectedLogs.none(BigQueryServicesImpl.class); - @Mock private LowLevelHttpResponse response; + // A test can make mock responses through setupMockResponses + private LowLevelHttpResponse[] responses; + private MockLowLevelHttpRequest request; private Bigquery bigquery; @@ -108,12 +130,18 @@ public class BigQueryServicesImplTest { public void setUp() { MockitoAnnotations.initMocks(this); - // Set up the MockHttpRequest for future inspection + // Set up the MockHttpRequest for future inspection. request = new MockLowLevelHttpRequest() { + int index = 0; + @Override public LowLevelHttpResponse execute() throws IOException { - return response; + Verify.verify( + index < responses.length, + "The number of HttpRequest invocation exceeded the number of prepared mock requests. Index: %d", + index); + return responses[index++]; } }; // A mock transport that lets us mock the API responses. @@ -125,6 +153,80 @@ public LowLevelHttpResponse execute() throws IOException { new Bigquery.Builder( transport, Transport.getJsonFactory(), new RetryHttpRequestInitializer()) .build(); + + // Setup the ProcessWideContainer for testing metrics are set. + MetricsContainerImpl container = new MetricsContainerImpl(null); + MetricsEnvironment.setProcessWideContainer(container); + MetricsEnvironment.setCurrentContainer(container); + } + + @FunctionalInterface + private interface MockSetupFunction { + void apply(LowLevelHttpResponse t) throws IOException; + } + + /** + * Prepares the mock objects using {@code mockPreparations}, and assigns them to {@link + * #responses}. + */ + private void setupMockResponses(MockSetupFunction... 
mockPreparations) throws IOException { + responses = new LowLevelHttpResponse[mockPreparations.length]; + for (int i = 0; i < mockPreparations.length; ++i) { + MockSetupFunction setupFunction = mockPreparations[i]; + LowLevelHttpResponse response = mock(LowLevelHttpResponse.class); + setupFunction.apply(response); + responses[i] = response; + } + } + + /** + * Verifies the test interacted the mock objects in {@link #responses}. + * + *

    The implementation of google-api-client or google-http-client may influence the number of + * interaction in future + */ + private void verifyAllResponsesAreRead() throws IOException { + Verify.verify(responses != null, "The test setup is incorrect. Responses are not setup"); + for (LowLevelHttpResponse response : responses) { + // Google-http-client reads the field twice per response. + verify(response, atLeastOnce()).getStatusCode(); + verify(response, times(1)).getContent(); + verify(response, times(1)).getContentType(); + } + } + + private void verifyRequestMetricWasSet( + String method, String projectId, String dataset, String table, String status, long count) { + // Verify the metric as reported. + HashMap labels = new HashMap(); + // TODO(ajamato): Add Ptransform label. Populate it as empty for now to prevent the + // SpecMonitoringInfoValidator from dropping the MonitoringInfo. + labels.put(MonitoringInfoConstants.Labels.PTRANSFORM, ""); + labels.put(MonitoringInfoConstants.Labels.SERVICE, "BigQuery"); + labels.put(MonitoringInfoConstants.Labels.METHOD, method); + labels.put( + MonitoringInfoConstants.Labels.RESOURCE, + GcpResourceIdentifiers.bigQueryTable(projectId, dataset, table)); + labels.put(MonitoringInfoConstants.Labels.BIGQUERY_PROJECT_ID, projectId); + labels.put(MonitoringInfoConstants.Labels.BIGQUERY_DATASET, dataset); + labels.put(MonitoringInfoConstants.Labels.BIGQUERY_TABLE, table); + labels.put(MonitoringInfoConstants.Labels.STATUS, status); + + MonitoringInfoMetricName name = + MonitoringInfoMetricName.named(MonitoringInfoConstants.Urns.API_REQUEST_COUNT, labels); + MetricsContainerImpl container = + (MetricsContainerImpl) MetricsEnvironment.getProcessWideContainer(); + assertEquals(count, (long) container.getCounter(name).getCumulative()); + } + + private void verifyWriteMetricWasSet( + String projectId, String dataset, String table, String status, long count) { + verifyRequestMetricWasSet("BigQueryBatchWrite", projectId, dataset, table, status, count); + } + + private void verifyReadMetricWasSet( + String projectId, String dataset, String table, String status, long count) { + verifyRequestMetricWasSet("BigQueryBatchRead", projectId, dataset, table, status, count); } /** Tests that {@link BigQueryServicesImpl.JobServiceImpl#startLoadJob} succeeds. 
*/ @@ -136,9 +238,12 @@ public void testStartLoadJobSucceeds() throws IOException, InterruptedException jobRef.setProjectId("projectId"); testJob.setJobReference(jobRef); - when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - when(response.getStatusCode()).thenReturn(200); - when(response.getContent()).thenReturn(toStream(testJob)); + setupMockResponses( + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getStatusCode()).thenReturn(200); + when(response.getContent()).thenReturn(toStream(testJob)); + }); Sleeper sleeper = new FastNanoClockAndSleeper(); JobServiceImpl.startJob( @@ -148,9 +253,7 @@ public void testStartLoadJobSucceeds() throws IOException, InterruptedException sleeper, BackOffAdapter.toGcpBackOff(FluentBackoff.DEFAULT.backoff())); - verify(response, times(1)).getStatusCode(); - verify(response, times(1)).getContent(); - verify(response, times(1)).getContentType(); + verifyAllResponsesAreRead(); expectedLogs.verifyInfo(String.format("Started BigQuery job: %s", jobRef)); } @@ -166,7 +269,10 @@ public void testStartLoadJobSucceedsAlreadyExists() throws IOException, Interrup jobRef.setProjectId("projectId"); testJob.setJobReference(jobRef); - when(response.getStatusCode()).thenReturn(409); // 409 means already exists + setupMockResponses( + response -> { + when(response.getStatusCode()).thenReturn(409); // 409 means already exists + }); Sleeper sleeper = new FastNanoClockAndSleeper(); JobServiceImpl.startJob( @@ -176,9 +282,7 @@ public void testStartLoadJobSucceedsAlreadyExists() throws IOException, Interrup sleeper, BackOffAdapter.toGcpBackOff(FluentBackoff.DEFAULT.backoff())); - verify(response, times(1)).getStatusCode(); - verify(response, times(1)).getContent(); - verify(response, times(1)).getContentType(); + verifyAllResponsesAreRead(); expectedLogs.verifyNotLogged("Started BigQuery job"); } @@ -192,11 +296,18 @@ public void testStartLoadJobRetry() throws IOException, InterruptedException { testJob.setJobReference(jobRef); // First response is 403 rate limited, second response has valid payload. - when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - when(response.getStatusCode()).thenReturn(403).thenReturn(200); - when(response.getContent()) - .thenReturn(toStream(errorWithReasonAndStatus("rateLimitExceeded", 403))) - .thenReturn(toStream(testJob)); + setupMockResponses( + response -> { + when(response.getStatusCode()).thenReturn(403); + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getContent()) + .thenReturn(toStream(errorWithReasonAndStatus("rateLimitExceeded", 403))); + }, + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getStatusCode()).thenReturn(200); + when(response.getContent()).thenReturn(toStream(testJob)); + }); Sleeper sleeper = new FastNanoClockAndSleeper(); JobServiceImpl.startJob( @@ -206,9 +317,7 @@ public void testStartLoadJobRetry() throws IOException, InterruptedException { sleeper, BackOffAdapter.toGcpBackOff(FluentBackoff.DEFAULT.backoff())); - verify(response, times(2)).getStatusCode(); - verify(response, times(2)).getContent(); - verify(response, times(2)).getContentType(); + verifyAllResponsesAreRead(); } /** Tests that {@link BigQueryServicesImpl.JobServiceImpl#pollJob} succeeds. 
*/ @@ -217,9 +326,12 @@ public void testPollJobSucceeds() throws IOException, InterruptedException { Job testJob = new Job(); testJob.setStatus(new JobStatus().setState("DONE")); - when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - when(response.getStatusCode()).thenReturn(200); - when(response.getContent()).thenReturn(toStream(testJob)); + setupMockResponses( + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getStatusCode()).thenReturn(200); + when(response.getContent()).thenReturn(toStream(testJob)); + }); BigQueryServicesImpl.JobServiceImpl jobService = new BigQueryServicesImpl.JobServiceImpl(bigquery); @@ -227,9 +339,7 @@ public void testPollJobSucceeds() throws IOException, InterruptedException { Job job = jobService.pollJob(jobRef, Sleeper.DEFAULT, BackOff.ZERO_BACKOFF); assertEquals(testJob, job); - verify(response, times(1)).getStatusCode(); - verify(response, times(1)).getContent(); - verify(response, times(1)).getContentType(); + verifyAllResponsesAreRead(); } /** Tests that {@link BigQueryServicesImpl.JobServiceImpl#pollJob} fails. */ @@ -238,9 +348,12 @@ public void testPollJobFailed() throws IOException, InterruptedException { Job testJob = new Job(); testJob.setStatus(new JobStatus().setState("DONE").setErrorResult(new ErrorProto())); - when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - when(response.getStatusCode()).thenReturn(200); - when(response.getContent()).thenReturn(toStream(testJob)); + setupMockResponses( + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getStatusCode()).thenReturn(200); + when(response.getContent()).thenReturn(toStream(testJob)); + }); BigQueryServicesImpl.JobServiceImpl jobService = new BigQueryServicesImpl.JobServiceImpl(bigquery); @@ -248,9 +361,7 @@ public void testPollJobFailed() throws IOException, InterruptedException { Job job = jobService.pollJob(jobRef, Sleeper.DEFAULT, BackOff.ZERO_BACKOFF); assertEquals(testJob, job); - verify(response, times(1)).getStatusCode(); - verify(response, times(1)).getContent(); - verify(response, times(1)).getContentType(); + verifyAllResponsesAreRead(); } /** Tests that {@link BigQueryServicesImpl.JobServiceImpl#pollJob} returns UNKNOWN. 
*/ @@ -259,9 +370,12 @@ public void testPollJobUnknown() throws IOException, InterruptedException { Job testJob = new Job(); testJob.setStatus(new JobStatus()); - when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - when(response.getStatusCode()).thenReturn(200); - when(response.getContent()).thenReturn(toStream(testJob)); + setupMockResponses( + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getStatusCode()).thenReturn(200); + when(response.getContent()).thenReturn(toStream(testJob)); + }); BigQueryServicesImpl.JobServiceImpl jobService = new BigQueryServicesImpl.JobServiceImpl(bigquery); @@ -269,9 +383,7 @@ public void testPollJobUnknown() throws IOException, InterruptedException { Job job = jobService.pollJob(jobRef, Sleeper.DEFAULT, BackOff.STOP_BACKOFF); assertEquals(null, job); - verify(response, times(1)).getStatusCode(); - verify(response, times(1)).getContent(); - verify(response, times(1)).getContentType(); + verifyAllResponsesAreRead(); } @Test @@ -279,9 +391,12 @@ public void testGetJobSucceeds() throws Exception { Job testJob = new Job(); testJob.setStatus(new JobStatus()); - when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - when(response.getStatusCode()).thenReturn(200); - when(response.getContent()).thenReturn(toStream(testJob)); + setupMockResponses( + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getStatusCode()).thenReturn(200); + when(response.getContent()).thenReturn(toStream(testJob)); + }); BigQueryServicesImpl.JobServiceImpl jobService = new BigQueryServicesImpl.JobServiceImpl(bigquery); @@ -289,15 +404,16 @@ public void testGetJobSucceeds() throws Exception { Job job = jobService.getJob(jobRef, Sleeper.DEFAULT, BackOff.ZERO_BACKOFF); assertEquals(testJob, job); - verify(response, times(1)).getStatusCode(); - verify(response, times(1)).getContent(); - verify(response, times(1)).getContentType(); + verifyAllResponsesAreRead(); } @Test public void testGetJobNotFound() throws Exception { - when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - when(response.getStatusCode()).thenReturn(404); + setupMockResponses( + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getStatusCode()).thenReturn(404); + }); BigQueryServicesImpl.JobServiceImpl jobService = new BigQueryServicesImpl.JobServiceImpl(bigquery); @@ -305,15 +421,16 @@ public void testGetJobNotFound() throws Exception { Job job = jobService.getJob(jobRef, Sleeper.DEFAULT, BackOff.ZERO_BACKOFF); assertEquals(null, job); - verify(response, times(1)).getStatusCode(); - verify(response, times(1)).getContent(); - verify(response, times(1)).getContentType(); + verifyAllResponsesAreRead(); } @Test public void testGetJobThrows() throws Exception { - when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - when(response.getStatusCode()).thenReturn(401); + setupMockResponses( + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getStatusCode()).thenReturn(401); + }); BigQueryServicesImpl.JobServiceImpl jobService = new BigQueryServicesImpl.JobServiceImpl(bigquery); @@ -335,30 +452,40 @@ public void testGetTableSucceeds() throws Exception { Table testTable = new Table(); testTable.setTableReference(tableRef); - when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - when(response.getStatusCode()).thenReturn(403).thenReturn(200); - when(response.getContent()) - 
.thenReturn(toStream(errorWithReasonAndStatus("rateLimitExceeded", 403))) - .thenReturn(toStream(testTable)); + setupMockResponses( + response -> { + when(response.getStatusCode()).thenReturn(403); + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getContent()) + .thenReturn(toStream(errorWithReasonAndStatus("rateLimitExceeded", 403))); + }, + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getStatusCode()).thenReturn(200); + when(response.getContent()).thenReturn(toStream(testTable)); + }); BigQueryServicesImpl.DatasetServiceImpl datasetService = - new BigQueryServicesImpl.DatasetServiceImpl(bigquery, PipelineOptionsFactory.create()); + new BigQueryServicesImpl.DatasetServiceImpl( + bigquery, null, PipelineOptionsFactory.create()); Table table = datasetService.getTable(tableRef, null, BackOff.ZERO_BACKOFF, Sleeper.DEFAULT); assertEquals(testTable, table); - verify(response, times(2)).getStatusCode(); - verify(response, times(2)).getContent(); - verify(response, times(2)).getContentType(); + verifyAllResponsesAreRead(); } @Test public void testGetTableNotFound() throws IOException, InterruptedException { - when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - when(response.getStatusCode()).thenReturn(404); + setupMockResponses( + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getStatusCode()).thenReturn(404); + }); BigQueryServicesImpl.DatasetServiceImpl datasetService = - new BigQueryServicesImpl.DatasetServiceImpl(bigquery, PipelineOptionsFactory.create()); + new BigQueryServicesImpl.DatasetServiceImpl( + bigquery, null, PipelineOptionsFactory.create()); TableReference tableRef = new TableReference() @@ -368,15 +495,16 @@ public void testGetTableNotFound() throws IOException, InterruptedException { Table table = datasetService.getTable(tableRef, null, BackOff.ZERO_BACKOFF, Sleeper.DEFAULT); assertNull(table); - verify(response, times(1)).getStatusCode(); - verify(response, times(1)).getContent(); - verify(response, times(1)).getContentType(); + verifyAllResponsesAreRead(); } @Test public void testGetTableThrows() throws Exception { - when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - when(response.getStatusCode()).thenReturn(401); + setupMockResponses( + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getStatusCode()).thenReturn(401); + }); TableReference tableRef = new TableReference() @@ -388,7 +516,8 @@ public void testGetTableThrows() throws Exception { thrown.expectMessage(String.format("Unable to get table: %s", tableRef.getTableId())); BigQueryServicesImpl.DatasetServiceImpl datasetService = - new BigQueryServicesImpl.DatasetServiceImpl(bigquery, PipelineOptionsFactory.create()); + new BigQueryServicesImpl.DatasetServiceImpl( + bigquery, null, PipelineOptionsFactory.create()); datasetService.getTable(tableRef, null, BackOff.STOP_BACKOFF, Sleeper.DEFAULT); } @@ -403,29 +532,39 @@ public void testIsTableEmptySucceeds() throws Exception { TableDataList testDataList = new TableDataList().setRows(ImmutableList.of(new TableRow())); // First response is 403 rate limited, second response has valid payload. 
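For reference, the `BackOff` arguments that these service tests replace with `BackOff.ZERO_BACKOFF` are, outside tests, built from Beam's `FluentBackoff` and adapted to the google-http-client type via `BackOffAdapter`; a sketch with illustrative retry limits:

```java
// A bounded backoff: up to 3 retries, starting at 1s and capped at 1 minute.
BackOff backoff =
    BackOffAdapter.toGcpBackOff(
        FluentBackoff.DEFAULT
            .withMaxRetries(3)
            .withInitialBackoff(org.joda.time.Duration.standardSeconds(1))
            .withMaxBackoff(org.joda.time.Duration.standardMinutes(1))
            .backoff());
```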
- when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - when(response.getStatusCode()).thenReturn(403).thenReturn(200); - when(response.getContent()) - .thenReturn(toStream(errorWithReasonAndStatus("rateLimitExceeded", 403))) - .thenReturn(toStream(testDataList)); + setupMockResponses( + response -> { + when(response.getStatusCode()).thenReturn(403); + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getContent()) + .thenReturn(toStream(errorWithReasonAndStatus("rateLimitExceeded", 403))); + }, + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getStatusCode()).thenReturn(200); + when(response.getContent()).thenReturn(toStream(testDataList)); + }); BigQueryServicesImpl.DatasetServiceImpl datasetService = - new BigQueryServicesImpl.DatasetServiceImpl(bigquery, PipelineOptionsFactory.create()); + new BigQueryServicesImpl.DatasetServiceImpl( + bigquery, null, PipelineOptionsFactory.create()); assertFalse(datasetService.isTableEmpty(tableRef, BackOff.ZERO_BACKOFF, Sleeper.DEFAULT)); - verify(response, times(2)).getStatusCode(); - verify(response, times(2)).getContent(); - verify(response, times(2)).getContentType(); + verifyAllResponsesAreRead(); } @Test public void testIsTableEmptyNoRetryForNotFound() throws IOException, InterruptedException { - when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - when(response.getStatusCode()).thenReturn(404); + setupMockResponses( + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getStatusCode()).thenReturn(404); + }); BigQueryServicesImpl.DatasetServiceImpl datasetService = - new BigQueryServicesImpl.DatasetServiceImpl(bigquery, PipelineOptionsFactory.create()); + new BigQueryServicesImpl.DatasetServiceImpl( + bigquery, null, PipelineOptionsFactory.create()); TableReference tableRef = new TableReference() @@ -439,16 +578,17 @@ public void testIsTableEmptyNoRetryForNotFound() throws IOException, Interrupted try { datasetService.isTableEmpty(tableRef, BackOff.ZERO_BACKOFF, Sleeper.DEFAULT); } finally { - verify(response, times(1)).getStatusCode(); - verify(response, times(1)).getContent(); - verify(response, times(1)).getContentType(); + verifyAllResponsesAreRead(); } } @Test public void testIsTableEmptyThrows() throws Exception { - when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - when(response.getStatusCode()).thenReturn(401); + setupMockResponses( + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getStatusCode()).thenReturn(401); + }); TableReference tableRef = new TableReference() @@ -457,7 +597,8 @@ public void testIsTableEmptyThrows() throws Exception { .setTableId("tableId"); BigQueryServicesImpl.DatasetServiceImpl datasetService = - new BigQueryServicesImpl.DatasetServiceImpl(bigquery, PipelineOptionsFactory.create()); + new BigQueryServicesImpl.DatasetServiceImpl( + bigquery, null, PipelineOptionsFactory.create()); thrown.expect(IOException.class); thrown.expectMessage(String.format("Unable to list table data: %s", tableRef.getTableId())); @@ -468,10 +609,12 @@ public void testIsTableEmptyThrows() throws Exception { @Test public void testExecuteWithRetries() throws IOException, InterruptedException { Table testTable = new Table(); - - when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - when(response.getStatusCode()).thenReturn(200); - when(response.getContent()).thenReturn(toStream(testTable)); + setupMockResponses( + response -> { + 
when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getStatusCode()).thenReturn(200); + when(response.getContent()).thenReturn(toStream(testTable)); + }); Table table = BigQueryServicesImpl.executeWithRetries( @@ -482,9 +625,7 @@ public void testExecuteWithRetries() throws IOException, InterruptedException { BigQueryServicesImpl.ALWAYS_RETRY); assertEquals(testTable, table); - verify(response, times(1)).getStatusCode(); - verify(response, times(1)).getContent(); - verify(response, times(1)).getContentType(); + verifyAllResponsesAreRead(); } private FailsafeValueInSingleWindow wrapValue(T value) { @@ -513,14 +654,21 @@ public void testInsertRateLimitRetry() throws Exception { rows.add(wrapValue(new TableRow())); // First response is 403 rate limited, second response has valid payload. - when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - when(response.getStatusCode()).thenReturn(403).thenReturn(200); - when(response.getContent()) - .thenReturn(toStream(errorWithReasonAndStatus("rateLimitExceeded", 403))) - .thenReturn(toStream(new TableDataInsertAllResponse())); + setupMockResponses( + response -> { + when(response.getStatusCode()).thenReturn(403); + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getContent()) + .thenReturn(toStream(errorWithReasonAndStatus("rateLimitExceeded", 403))); + }, + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getStatusCode()).thenReturn(200); + when(response.getContent()).thenReturn(toStream(new TableDataInsertAllResponse())); + }); DatasetServiceImpl dataService = - new DatasetServiceImpl(bigquery, PipelineOptionsFactory.create()); + new DatasetServiceImpl(bigquery, null, PipelineOptionsFactory.create()); dataService.insertAll( ref, rows, @@ -533,11 +681,13 @@ public void testInsertRateLimitRetry() throws Exception { null, false, false, - false); - verify(response, times(2)).getStatusCode(); - verify(response, times(2)).getContent(); - verify(response, times(2)).getContentType(); + false, + null); + + verifyAllResponsesAreRead(); expectedLogs.verifyInfo("BigQuery insertAll error, retrying:"); + + verifyWriteMetricWasSet("project", "dataset", "table", "ratelimitexceeded", 1); } /** Tests that {@link DatasetServiceImpl#insertAll} retries quota exceeded attempts. */ @@ -549,14 +699,21 @@ public void testInsertQuotaExceededRetry() throws Exception { rows.add(wrapValue(new TableRow())); // First response is 403 quota exceeded, second response has valid payload. 
- when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - when(response.getStatusCode()).thenReturn(403).thenReturn(200); - when(response.getContent()) - .thenReturn(toStream(errorWithReasonAndStatus("quotaExceeded", 403))) - .thenReturn(toStream(new TableDataInsertAllResponse())); + setupMockResponses( + response -> { + when(response.getStatusCode()).thenReturn(403); + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getContent()) + .thenReturn(toStream(errorWithReasonAndStatus("quotaExceeded", 403))); + }, + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getStatusCode()).thenReturn(200); + when(response.getContent()).thenReturn(toStream(new TableDataInsertAllResponse())); + }); DatasetServiceImpl dataService = - new DatasetServiceImpl(bigquery, PipelineOptionsFactory.create()); + new DatasetServiceImpl(bigquery, null, PipelineOptionsFactory.create()); dataService.insertAll( ref, rows, @@ -569,11 +726,13 @@ public void testInsertQuotaExceededRetry() throws Exception { null, false, false, - false); - verify(response, times(2)).getStatusCode(); - verify(response, times(2)).getContent(); - verify(response, times(2)).getContentType(); + false, + null); + + verifyAllResponsesAreRead(); expectedLogs.verifyInfo("BigQuery insertAll error, retrying:"); + + verifyWriteMetricWasSet("project", "dataset", "table", "quotaexceeded", 1); } /** Tests that {@link DatasetServiceImpl#insertAll} can stop quotaExceeded retry attempts. */ @@ -584,25 +743,34 @@ public void testInsertStoppedRetry() throws Exception { List> rows = new ArrayList<>(); rows.add(wrapValue(new TableRow())); + MockSetupFunction quotaExceededResponse = + response -> { + when(response.getStatusCode()).thenReturn(403); + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getContent()) + .thenReturn(toStream(errorWithReasonAndStatus("quotaExceeded", 403))); + }; + // Respond 403 four times, then valid payload. - when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - when(response.getStatusCode()) - .thenReturn(403) - .thenReturn(403) - .thenReturn(403) - .thenReturn(403) - .thenReturn(200); - when(response.getContent()) - .thenReturn(toStream(errorWithReasonAndStatus("quotaExceeded", 403))) - .thenReturn(toStream(errorWithReasonAndStatus("quotaExceeded", 403))) - .thenReturn(toStream(errorWithReasonAndStatus("quotaExceeded", 403))) - .thenReturn(toStream(errorWithReasonAndStatus("quotaExceeded", 403))) - .thenReturn(toStream(new TableDataInsertAllResponse())); + setupMockResponses( + quotaExceededResponse, + quotaExceededResponse, + quotaExceededResponse, + quotaExceededResponse, + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getStatusCode()).thenReturn(200); + when(response.getContent()).thenReturn(toStream(new TableDataInsertAllResponse())); + }); + thrown.expect(RuntimeException.class); + + // Google-http-client 1.39.1 and higher does not read the content of the response with error + // status code. How can we ensure appropriate exception is thrown? 
thrown.expectMessage("quotaExceeded"); DatasetServiceImpl dataService = - new DatasetServiceImpl(bigquery, PipelineOptionsFactory.create()); + new DatasetServiceImpl(bigquery, null, PipelineOptionsFactory.create()); dataService.insertAll( ref, rows, @@ -615,10 +783,12 @@ public void testInsertStoppedRetry() throws Exception { null, false, false, - false); - verify(response, times(5)).getStatusCode(); - verify(response, times(5)).getContent(); - verify(response, times(5)).getContentType(); + false, + null); + + verifyAllResponsesAreRead(); + + verifyWriteMetricWasSet("project", "dataset", "table", "quotaexceeded", 1); } // A BackOff that makes a total of 4 attempts @@ -646,14 +816,20 @@ public void testInsertRetrySelectRows() throws Exception { final TableDataInsertAllResponse allRowsSucceeded = new TableDataInsertAllResponse(); - when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - when(response.getStatusCode()).thenReturn(200).thenReturn(200); - when(response.getContent()) - .thenReturn(toStream(bFailed)) - .thenReturn(toStream(allRowsSucceeded)); + setupMockResponses( + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getStatusCode()).thenReturn(200); + when(response.getContent()).thenReturn(toStream(bFailed)); + }, + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getStatusCode()).thenReturn(200); + when(response.getContent()).thenReturn(toStream(allRowsSucceeded)); + }); DatasetServiceImpl dataService = - new DatasetServiceImpl(bigquery, PipelineOptionsFactory.create()); + new DatasetServiceImpl(bigquery, null, PipelineOptionsFactory.create()); dataService.insertAll( ref, rows, @@ -666,10 +842,13 @@ public void testInsertRetrySelectRows() throws Exception { null, false, false, - false); - verify(response, times(2)).getStatusCode(); - verify(response, times(2)).getContent(); - verify(response, times(2)).getContentType(); + false, + null); + + verifyAllResponsesAreRead(); + + verifyWriteMetricWasSet("project", "dataset", "table", "unknown", 1); + verifyWriteMetricWasSet("project", "dataset", "table", "ok", 1); } /** Tests that {@link DatasetServiceImpl#insertAll} fails gracefully when persistent issues. */ @@ -680,24 +859,42 @@ public void testInsertFailsGracefully() throws Exception { List> rows = ImmutableList.of(wrapValue(new TableRow()), wrapValue(new TableRow())); + ErrorProto errorProto = new ErrorProto().setReason("schemaMismatch"); final TableDataInsertAllResponse row1Failed = new TableDataInsertAllResponse() - .setInsertErrors(ImmutableList.of(new InsertErrors().setIndex(1L))); + .setInsertErrors( + ImmutableList.of( + new InsertErrors().setIndex(1L).setErrors(ImmutableList.of(errorProto)))); final TableDataInsertAllResponse row0Failed = new TableDataInsertAllResponse() - .setInsertErrors(ImmutableList.of(new InsertErrors().setIndex(0L))); + .setInsertErrors( + ImmutableList.of( + new InsertErrors().setIndex(0L).setErrors(ImmutableList.of(errorProto)))); + + MockSetupFunction row0FailureResponseFunction = + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + // Always return 200. + when(response.getStatusCode()).thenReturn(200); + when(response.getContent()).thenAnswer(invocation -> toStream(row0Failed)); + }; - when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - // Always return 200. - when(response.getStatusCode()).thenReturn(200); - // Return row 1 failing, then we retry row 1 as row 0, and row 0 persistently fails. 
- when(response.getContent()) - .thenReturn(toStream(row1Failed)) - .thenAnswer(invocation -> toStream(row0Failed)); + setupMockResponses( + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + // Always return 200. + when(response.getStatusCode()).thenReturn(200); + // Return row 1 failing, then we retry row 1 as row 0, and row 0 persistently fails. + when(response.getContent()).thenReturn(toStream(row1Failed)); + }, + // 3 failures + row0FailureResponseFunction, + row0FailureResponseFunction, + row0FailureResponseFunction); DatasetServiceImpl dataService = - new DatasetServiceImpl(bigquery, PipelineOptionsFactory.create()); + new DatasetServiceImpl(bigquery, null, PipelineOptionsFactory.create()); // Expect it to fail. try { @@ -713,19 +910,20 @@ public void testInsertFailsGracefully() throws Exception { null, false, false, - false); + false, + null); fail(); } catch (IOException e) { assertThat(e, instanceOf(IOException.class)); assertThat(e.getMessage(), containsString("Insert failed:")); - assertThat(e.getMessage(), containsString("[{\"index\":0}]")); + assertThat(e.getMessage(), containsString("[{\"errors\":[{\"reason\":\"schemaMismatch\"}]")); } // Verify the exact number of retries as well as log messages. - verify(response, times(4)).getStatusCode(); - verify(response, times(4)).getContent(); - verify(response, times(4)).getContentType(); + verifyAllResponsesAreRead(); expectedLogs.verifyInfo("Retrying 1 failed inserts to BigQuery"); + + verifyWriteMetricWasSet("project", "dataset", "table", "schemamismatch", 4); } /** @@ -739,16 +937,25 @@ public void testFailInsertOtherRetry() throws Exception { List> rows = new ArrayList<>(); rows.add(wrapValue(new TableRow())); + final TableDataInsertAllResponse allRowsSucceeded = + new TableDataInsertAllResponse().setInsertErrors(ImmutableList.of()); // First response is 403 non-{rate-limited, quota-exceeded}, second response has valid payload - // but should not - // be invoked. - when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - when(response.getStatusCode()).thenReturn(403).thenReturn(200); - when(response.getContent()) - .thenReturn(toStream(errorWithReasonAndStatus("actually forbidden", 403))) - .thenReturn(toStream(new TableDataInsertAllResponse())); + // but should not be invoked. 
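The per-row retry behaviour driven by `testInsertRetryPolicy` below is governed by an `InsertRetryPolicy`; for orientation, a custom policy has roughly this shape. This is a sketch only, assuming `Context#getInsertErrors()` exposes the row's `InsertErrors` and reusing this test file's imports:

```java
// Retry a failed row only while every reported error reason is "timeout".
InsertRetryPolicy retryTimeoutsOnly =
    new InsertRetryPolicy() {
      @Override
      public boolean shouldRetry(Context context) {
        List<ErrorProto> errors = context.getInsertErrors().getErrors();
        if (errors == null) {
          return true; // no detail reported; keep retrying
        }
        for (ErrorProto error : errors) {
          if (!"timeout".equals(error.getReason())) {
            return false;
          }
        }
        return true;
      }
    };
```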
+ setupMockResponses( + response -> { + when(response.getStatusCode()).thenReturn(403); + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getContent()) + .thenReturn(toStream(errorWithReasonAndStatus("actually forbidden", 403))); + }, + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getStatusCode()).thenReturn(200); + when(response.getContent()).thenReturn(toStream(new TableDataInsertAllResponse())); + }); + DatasetServiceImpl dataService = - new DatasetServiceImpl(bigquery, PipelineOptionsFactory.create()); + new DatasetServiceImpl(bigquery, null, PipelineOptionsFactory.create()); thrown.expect(RuntimeException.class); thrown.expectMessage("actually forbidden"); try { @@ -764,12 +971,19 @@ public void testFailInsertOtherRetry() throws Exception { null, false, false, - false); + false, + null); } finally { - verify(response, times(1)).getStatusCode(); - verify(response, times(1)).getContent(); - verify(response, times(1)).getContentType(); + verify(responses[0], atLeastOnce()).getStatusCode(); + verify(responses[0]).getContent(); + verify(responses[0]).getContentType(); + // It should not invoke 2nd response + verify(responses[1], never()).getStatusCode(); + verify(responses[1], never()).getContent(); + verify(responses[1], never()).getContentType(); } + + verifyWriteMetricWasSet("project", "dataset", "table", "actually forbidden", 1); } /** @@ -807,20 +1021,26 @@ public void testInsertRetryPolicy() throws InterruptedException, IOException { // On the final attempt, no failures are returned. final TableDataInsertAllResponse allRowsSucceeded = new TableDataInsertAllResponse(); - when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - // Always return 200. - when(response.getStatusCode()).thenReturn(200); - when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - when(response.getStatusCode()).thenReturn(200).thenReturn(200); - - // First fail - when(response.getContent()) - .thenReturn(toStream(firstFailure)) - .thenReturn(toStream(secondFialure)) - .thenReturn(toStream(allRowsSucceeded)); + setupMockResponses( + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + // Always return 200. 
+ when(response.getStatusCode()).thenReturn(200); + when(response.getContent()).thenReturn(toStream(firstFailure)); + }, + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getStatusCode()).thenReturn(200); + when(response.getContent()).thenReturn(toStream(secondFialure)); + }, + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getStatusCode()).thenReturn(200); + when(response.getContent()).thenReturn(toStream(allRowsSucceeded)); + }); DatasetServiceImpl dataService = - new DatasetServiceImpl(bigquery, PipelineOptionsFactory.create()); + new DatasetServiceImpl(bigquery, null, PipelineOptionsFactory.create()); List> failedInserts = Lists.newArrayList(); dataService.insertAll( @@ -835,9 +1055,12 @@ public void testInsertRetryPolicy() throws InterruptedException, IOException { ErrorContainer.TABLE_ROW_ERROR_CONTAINER, false, false, - false); + false, + null); assertEquals(1, failedInserts.size()); expectedLogs.verifyInfo("Retrying 1 failed inserts to BigQuery"); + + verifyWriteMetricWasSet("project", "dataset", "table", "timeout", 2); } /** @@ -855,14 +1078,16 @@ public void testSkipInvalidRowsIgnoreUnknownIgnoreInsertIdsValuesStreaming() final TableDataInsertAllResponse allRowsSucceeded = new TableDataInsertAllResponse(); // Return a 200 response each time - when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - when(response.getStatusCode()).thenReturn(200); - when(response.getContent()) - .thenReturn(toStream(allRowsSucceeded)) - .thenReturn(toStream(allRowsSucceeded)); + MockSetupFunction allRowsSucceededResponseFunction = + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getStatusCode()).thenReturn(200); + when(response.getContent()).thenReturn(toStream(allRowsSucceeded)); + }; + setupMockResponses(allRowsSucceededResponseFunction, allRowsSucceededResponseFunction); DatasetServiceImpl dataService = - new DatasetServiceImpl(bigquery, PipelineOptionsFactory.create()); + new DatasetServiceImpl(bigquery, null, PipelineOptionsFactory.create()); // First, test with all flags disabled dataService.insertAll( @@ -877,7 +1102,8 @@ public void testSkipInvalidRowsIgnoreUnknownIgnoreInsertIdsValuesStreaming() ErrorContainer.TABLE_ROW_ERROR_CONTAINER, false, false, - false); + false, + null); TableDataInsertAllRequest parsedRequest = fromString(request.getContentAsString(), TableDataInsertAllRequest.class); @@ -898,7 +1124,8 @@ public void testSkipInvalidRowsIgnoreUnknownIgnoreInsertIdsValuesStreaming() ErrorContainer.TABLE_ROW_ERROR_CONTAINER, true, true, - true); + true, + null); parsedRequest = fromString(request.getContentAsString(), TableDataInsertAllRequest.class); @@ -906,6 +1133,8 @@ public void testSkipInvalidRowsIgnoreUnknownIgnoreInsertIdsValuesStreaming() assertTrue(parsedRequest.getIgnoreUnknownValues()); assertNull(parsedRequest.getRows().get(0).getInsertId()); assertNull(parsedRequest.getRows().get(1).getInsertId()); + + verifyWriteMetricWasSet("project", "dataset", "table", "ok", 2); } /** A helper to convert a string response back to a {@link GenericJson} subclass. 
*/ @@ -938,7 +1167,7 @@ private static GoogleJsonErrorContainer errorWithReasonAndStatus(String reason, @Test public void testGetErrorInfo() throws IOException { DatasetServiceImpl dataService = - new DatasetServiceImpl(bigquery, PipelineOptionsFactory.create()); + new DatasetServiceImpl(bigquery, null, PipelineOptionsFactory.create()); ErrorInfo info = new ErrorInfo(); List infoList = new ArrayList<>(); infoList.add(info); @@ -957,19 +1186,22 @@ public void testCreateTableSucceeds() throws IOException { TableReference ref = new TableReference().setProjectId("project").setDatasetId("dataset").setTableId("table"); Table testTable = new Table().setTableReference(ref); - when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - when(response.getStatusCode()).thenReturn(200); - when(response.getContent()).thenReturn(toStream(testTable)); + + setupMockResponses( + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getStatusCode()).thenReturn(200); + when(response.getContent()).thenReturn(toStream(testTable)); + }); BigQueryServicesImpl.DatasetServiceImpl services = - new BigQueryServicesImpl.DatasetServiceImpl(bigquery, PipelineOptionsFactory.create()); + new BigQueryServicesImpl.DatasetServiceImpl( + bigquery, null, PipelineOptionsFactory.create()); Table ret = services.tryCreateTable( - testTable, new RetryBoundedBackOff(0, BackOff.ZERO_BACKOFF), Sleeper.DEFAULT); + testTable, new RetryBoundedBackOff(BackOff.ZERO_BACKOFF, 0), Sleeper.DEFAULT); assertEquals(testTable, ret); - verify(response, times(1)).getStatusCode(); - verify(response, times(1)).getContent(); - verify(response, times(1)).getContentType(); + verifyAllResponsesAreRead(); } /** Tests that {@link BigQueryServicesImpl} does not retry non-rate-limited attempts. */ @@ -980,25 +1212,37 @@ public void testCreateTableDoesNotRetry() throws IOException { Table testTable = new Table().setTableReference(ref); // First response is 403 not-rate-limited, second response has valid payload but should not // be invoked. 
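+ // setupMockResponses(...) and the responses array come from a test helper added earlier in this
+ // class; presumably it creates one mocked low-level HTTP response per setup function, so each
+ // HTTP call made by the service under test consumes the next response in order.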
- when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - when(response.getStatusCode()).thenReturn(403).thenReturn(200); - when(response.getContent()) - .thenReturn(toStream(errorWithReasonAndStatus("actually forbidden", 403))) - .thenReturn(toStream(testTable)); + setupMockResponses( + response -> { + when(response.getStatusCode()).thenReturn(403); + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getContent()) + .thenReturn(toStream(errorWithReasonAndStatus("actually forbidden", 403))); + }, + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getStatusCode()).thenReturn(200); + when(response.getContent()).thenReturn(toStream(testTable)); + }); thrown.expect(GoogleJsonResponseException.class); thrown.expectMessage("actually forbidden"); BigQueryServicesImpl.DatasetServiceImpl services = - new BigQueryServicesImpl.DatasetServiceImpl(bigquery, PipelineOptionsFactory.create()); + new BigQueryServicesImpl.DatasetServiceImpl( + bigquery, null, PipelineOptionsFactory.create()); try { services.tryCreateTable( - testTable, new RetryBoundedBackOff(3, BackOff.ZERO_BACKOFF), Sleeper.DEFAULT); + testTable, new RetryBoundedBackOff(BackOff.ZERO_BACKOFF, 3), Sleeper.DEFAULT); fail(); } catch (IOException e) { - verify(response, times(1)).getStatusCode(); - verify(response, times(1)).getContent(); - verify(response, times(1)).getContentType(); + verify(responses[0], atLeastOnce()).getStatusCode(); + verify(responses[0]).getContent(); + verify(responses[0]).getContentType(); + // It should not invoke 2nd response + verify(responses[1], never()).getStatusCode(); + verify(responses[1], never()).getContent(); + verify(responses[1], never()).getContentType(); throw e; } } @@ -1016,18 +1260,20 @@ public void testCreateTableSucceedsAlreadyExists() throws IOException { new TableFieldSchema().setName("column2").setType("Integer"))); Table testTable = new Table().setTableReference(ref).setSchema(schema); - when(response.getStatusCode()).thenReturn(409); // 409 means already exists + setupMockResponses( + response -> { + when(response.getStatusCode()).thenReturn(409); // 409 means already exists + }); BigQueryServicesImpl.DatasetServiceImpl services = - new BigQueryServicesImpl.DatasetServiceImpl(bigquery, PipelineOptionsFactory.create()); + new BigQueryServicesImpl.DatasetServiceImpl( + bigquery, null, PipelineOptionsFactory.create()); Table ret = services.tryCreateTable( - testTable, new RetryBoundedBackOff(0, BackOff.ZERO_BACKOFF), Sleeper.DEFAULT); + testTable, new RetryBoundedBackOff(BackOff.ZERO_BACKOFF, 0), Sleeper.DEFAULT); assertNull(ret); - verify(response, times(1)).getStatusCode(); - verify(response, times(1)).getContent(); - verify(response, times(1)).getContentType(); + verifyAllResponsesAreRead(); } /** Tests that {@link BigQueryServicesImpl} retries quota rate limited attempts. */ @@ -1038,25 +1284,33 @@ public void testCreateTableRetry() throws IOException { Table testTable = new Table().setTableReference(ref); // First response is 403 rate limited, second response has valid payload. 
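+ // Unlike the non-rate-limited case above, both responses should be consumed here: the 403
+ // rateLimitExceeded error is retried under RetryBoundedBackOff and the second 200 response
+ // returns the created table (checked via verifyAllResponsesAreRead() below).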
- when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - when(response.getStatusCode()).thenReturn(403).thenReturn(200); - when(response.getContent()) - .thenReturn(toStream(errorWithReasonAndStatus("rateLimitExceeded", 403))) - .thenReturn(toStream(testTable)); + setupMockResponses( + response -> { + when(response.getStatusCode()).thenReturn(403); + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getContent()) + .thenReturn(toStream(errorWithReasonAndStatus("rateLimitExceeded", 403))); + }, + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getStatusCode()).thenReturn(200); + when(response.getContent()).thenReturn(toStream(testTable)); + }); BigQueryServicesImpl.DatasetServiceImpl services = - new BigQueryServicesImpl.DatasetServiceImpl(bigquery, PipelineOptionsFactory.create()); + new BigQueryServicesImpl.DatasetServiceImpl( + bigquery, null, PipelineOptionsFactory.create()); Table ret = services.tryCreateTable( - testTable, new RetryBoundedBackOff(3, BackOff.ZERO_BACKOFF), Sleeper.DEFAULT); + testTable, new RetryBoundedBackOff(BackOff.ZERO_BACKOFF, 3), Sleeper.DEFAULT); assertEquals(testTable, ret); - verify(response, times(2)).getStatusCode(); - verify(response, times(2)).getContent(); - verify(response, times(2)).getContentType(); - verifyNotNull(ret.getTableReference()); + verifyAllResponsesAreRead(); + + assertNotNull(ret.getTableReference()); + expectedLogs.verifyInfo( "Quota limit reached when creating table project:dataset.table, " - + "retrying up to 5.0 minutes"); + + "retrying up to 5 minutes"); } /** Tests that {@link DatasetServiceImpl#insertAll} uses the supplied {@link ErrorContainer}. */ @@ -1083,14 +1337,15 @@ public void testSimpleErrorRetrieval() throws InterruptedException, IOException .setIndex(1L) .setErrors(ImmutableList.of(new ErrorProto().setReason("invalid"))))); - when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - when(response.getStatusCode()).thenReturn(200); - when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - - when(response.getContent()).thenReturn(toStream(failures)); + setupMockResponses( + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getStatusCode()).thenReturn(200); + when(response.getContent()).thenReturn(toStream(failures)); + }); DatasetServiceImpl dataService = - new DatasetServiceImpl(bigquery, PipelineOptionsFactory.create()); + new DatasetServiceImpl(bigquery, null, PipelineOptionsFactory.create()); List> failedInserts = Lists.newArrayList(); dataService.insertAll( @@ -1105,7 +1360,8 @@ public void testSimpleErrorRetrieval() throws InterruptedException, IOException ErrorContainer.TABLE_ROW_ERROR_CONTAINER, false, false, - false); + false, + null); assertThat(failedInserts, is(expected)); } @@ -1139,14 +1395,17 @@ public void testExtendedErrorRetrieval() throws InterruptedException, IOExceptio new BigQueryInsertError( rows.get(1).getValue(), failures.getInsertErrors().get(1), ref))); - when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - when(response.getStatusCode()).thenReturn(200); - when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + setupMockResponses( + response -> { + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); + when(response.getStatusCode()).thenReturn(200); + when(response.getContentType()).thenReturn(Json.MEDIA_TYPE); - when(response.getContent()).thenReturn(toStream(failures)); + when(response.getContent()).thenReturn(toStream(failures)); + }); 
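+ // A single mocked 200 response whose payload lists the per-row insert errors; the insertAll
+ // call below collects them into failedInserts via BIG_QUERY_INSERT_ERROR_ERROR_CONTAINER.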
DatasetServiceImpl dataService = - new DatasetServiceImpl(bigquery, PipelineOptionsFactory.create()); + new DatasetServiceImpl(bigquery, null, PipelineOptionsFactory.create()); List> failedInserts = Lists.newArrayList(); dataService.insertAll( @@ -1161,8 +1420,211 @@ public void testExtendedErrorRetrieval() throws InterruptedException, IOExceptio ErrorContainer.BIG_QUERY_INSERT_ERROR_ERROR_CONTAINER, false, false, - false); + false, + null); assertThat(failedInserts, is(expected)); } + + @Test + public void testCreateReadSessionSetsRequestCountMetric() + throws InterruptedException, IOException { + BigQueryServicesImpl.StorageClientImpl client = + mock(BigQueryServicesImpl.StorageClientImpl.class); + + CreateReadSessionRequest.Builder builder = CreateReadSessionRequest.newBuilder(); + builder.getReadSessionBuilder().setTable("myproject:mydataset.mytable"); + CreateReadSessionRequest request = builder.build(); + when(client.callCreateReadSession(request)) + .thenReturn(ReadSession.newBuilder().build()); // Mock implementation. + when(client.createReadSession(any())).thenCallRealMethod(); // Real implementation. + + client.createReadSession(request); + verifyReadMetricWasSet("myproject", "mydataset", "mytable", "ok", 1); + } + + @Test + public void testCreateReadSessionSetsRequestCountMetricOnError() + throws InterruptedException, IOException { + BigQueryServicesImpl.StorageClientImpl client = + mock(BigQueryServicesImpl.StorageClientImpl.class); + + CreateReadSessionRequest.Builder builder = CreateReadSessionRequest.newBuilder(); + builder.getReadSessionBuilder().setTable("myproject:mydataset.mytable"); + CreateReadSessionRequest request = builder.build(); + StatusCode statusCode = + new StatusCode() { + @Override + public Code getCode() { + return Code.NOT_FOUND; + } + + @Override + public Object getTransportCode() { + return null; + } + }; + when(client.callCreateReadSession(request)) + .thenThrow(new ApiException("Not Found", null, statusCode, false)); // Mock implementation. + when(client.createReadSession(any())).thenCallRealMethod(); // Real implementation. + + thrown.expect(ApiException.class); + thrown.expectMessage("Not Found"); + + client.createReadSession(request); + verifyReadMetricWasSet("myproject", "mydataset", "mytable", "not_found", 1); + } + + @Test + public void testReadRowsSetsRequestCountMetric() throws InterruptedException, IOException { + BigQueryServices.StorageClient client = mock(BigQueryServicesImpl.StorageClientImpl.class); + ReadRowsRequest request = null; + BigQueryServices.BigQueryServerStream response = + new BigQueryServices.BigQueryServerStream() { + @Override + public Iterator iterator() { + return null; + } + + @Override + public void cancel() {} + }; + + when(client.readRows(request)).thenReturn(response); // Mock implementation. + when(client.readRows(any(), any())).thenCallRealMethod(); // Real implementation. 
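+ // Only the low-level readRows(request) call is stubbed; thenCallRealMethod() lets the
+ // two-argument readRows(...) overload run for real, so its request-count metric reporting is
+ // exercised without any Storage API traffic.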
+ + client.readRows(request, "myproject:mydataset.mytable"); + verifyReadMetricWasSet("myproject", "mydataset", "mytable", "ok", 1); + } + + @Test + public void testReadRowsSetsRequestCountMetricOnError() throws InterruptedException, IOException { + BigQueryServices.StorageClient client = mock(BigQueryServicesImpl.StorageClientImpl.class); + ReadRowsRequest request = null; + StatusCode statusCode = + new StatusCode() { + @Override + public Code getCode() { + return Code.INTERNAL; + } + + @Override + public Object getTransportCode() { + return null; + } + }; + when(client.readRows(request)) + .thenThrow(new ApiException("Internal", null, statusCode, false)); // Mock implementation. + when(client.readRows(any(), any())).thenCallRealMethod(); // Real implementation. + + thrown.expect(ApiException.class); + thrown.expectMessage("Internal"); + + client.readRows(request, "myproject:mydataset.mytable"); + verifyReadMetricWasSet("myproject", "mydataset", "mytable", "internal", 1); + } + + @Test + public void testSplitReadStreamSetsRequestCountMetric() throws InterruptedException, IOException { + BigQueryServices.StorageClient client = mock(BigQueryServicesImpl.StorageClientImpl.class); + + SplitReadStreamRequest request = null; + when(client.splitReadStream(request)) + .thenReturn(SplitReadStreamResponse.newBuilder().build()); // Mock implementation. + when(client.splitReadStream(any(), any())).thenCallRealMethod(); // Real implementation. + + client.splitReadStream(request, "myproject:mydataset.mytable"); + verifyReadMetricWasSet("myproject", "mydataset", "mytable", "ok", 1); + } + + @Test + public void testSplitReadStreamSetsRequestCountMetricOnError() + throws InterruptedException, IOException { + BigQueryServices.StorageClient client = mock(BigQueryServicesImpl.StorageClientImpl.class); + SplitReadStreamRequest request = null; + StatusCode statusCode = + new StatusCode() { + @Override + public Code getCode() { + return Code.RESOURCE_EXHAUSTED; + } + + @Override + public Object getTransportCode() { + return null; + } + }; + when(client.splitReadStream(request)) + .thenThrow( + new ApiException( + "Resource Exhausted", null, statusCode, false)); // Mock implementation. + when(client.splitReadStream(any(), any())).thenCallRealMethod(); // Real implementation. 
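+ // As in the other error cases, the request-count metric is expected to be labeled with the
+ // lower-cased status code name, hence "resource_exhausted" in the verification below.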
+ + thrown.expect(ApiException.class); + thrown.expectMessage("Resource Exhausted"); + + client.splitReadStream(request, "myproject:mydataset.mytable"); + verifyReadMetricWasSet("myproject", "mydataset", "mytable", "resource_exhausted", 1); + } + + @Test + public void testRetryAttemptCounter() { + BigQueryServicesImpl.StorageClientImpl.RetryAttemptCounter counter = + new BigQueryServicesImpl.StorageClientImpl.RetryAttemptCounter(); + + RetryInfo retryInfo = + RetryInfo.newBuilder() + .setRetryDelay( + com.google.protobuf.Duration.newBuilder() + .setSeconds(123) + .setNanos(456000000) + .build()) + .build(); + + Metadata metadata = new Metadata(); + metadata.put( + Metadata.Key.of( + "google.rpc.retryinfo-bin", + new Metadata.BinaryMarshaller() { + @Override + public byte[] toBytes(RetryInfo value) { + return value.toByteArray(); + } + + @Override + public RetryInfo parseBytes(byte[] serialized) { + try { + Parser parser = (RetryInfo.newBuilder().build()).getParserForType(); + return parser.parseFrom(serialized); + } catch (Exception e) { + return null; + } + } + }), + retryInfo); + + MetricName metricName = + MetricName.named( + "org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$StorageClientImpl", + "throttling-msecs"); + MetricsContainerImpl container = + (MetricsContainerImpl) MetricsEnvironment.getCurrentContainer(); + + // Nulls don't bump the counter. + counter.onRetryAttempt(null, null); + assertEquals(0, (long) container.getCounter(metricName).getCumulative()); + + // Resource exhausted with empty metadata doesn't bump the counter. + counter.onRetryAttempt( + Status.RESOURCE_EXHAUSTED.withDescription("You have consumed some quota"), new Metadata()); + assertEquals(0, (long) container.getCounter(metricName).getCumulative()); + + // Resource exhausted with retry info bumps the counter. + counter.onRetryAttempt(Status.RESOURCE_EXHAUSTED.withDescription("Stop for a while"), metadata); + assertEquals(123456, (long) container.getCounter(metricName).getCumulative()); + + // Other errors with retry info doesn't bump the counter. + counter.onRetryAttempt(Status.UNAVAILABLE.withDescription("Server is gone"), metadata); + assertEquals(123456, (long) container.getCounter(metricName).getCumulative()); + } } diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageReaderTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageReaderTest.java new file mode 100644 index 000000000000..d98114071698 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageReaderTest.java @@ -0,0 +1,92 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.bigquery; + +import static java.util.Arrays.asList; +import static org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOStorageReadTest.field; +import static org.hamcrest.CoreMatchers.instanceOf; +import static org.hamcrest.MatcherAssert.assertThat; + +import com.google.cloud.bigquery.storage.v1.ArrowSchema; +import com.google.cloud.bigquery.storage.v1.AvroSchema; +import com.google.cloud.bigquery.storage.v1.ReadSession; +import com.google.protobuf.ByteString; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.nio.channels.Channels; +import org.apache.arrow.vector.ipc.WriteChannel; +import org.apache.arrow.vector.ipc.message.MessageSerializer; +import org.apache.arrow.vector.types.pojo.ArrowType; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +@RunWith(JUnit4.class) +public class BigQueryStorageReaderTest { + + private static final org.apache.arrow.vector.types.pojo.Schema ARROW_SCHEMA = + new org.apache.arrow.vector.types.pojo.Schema( + asList( + field("name", new ArrowType.Utf8()), field("number", new ArrowType.Int(64, true)))); + private static final ReadSession ARROW_READ_SESSION = + ReadSession.newBuilder() + .setName("readSession") + .setArrowSchema( + ArrowSchema.newBuilder() + .setSerializedSchema(serializeArrowSchema(ARROW_SCHEMA)) + .build()) + .build(); + private static final String AVRO_SCHEMA_STRING = + "{\"namespace\": \"example.avro\",\n" + + " \"type\": \"record\",\n" + + " \"name\": \"RowRecord\",\n" + + " \"fields\": [\n" + + " {\"name\": \"name\", \"type\": \"string\"},\n" + + " {\"name\": \"number\", \"type\": \"long\"}\n" + + " ]\n" + + "}"; + private static final ReadSession AVRO_READ_SESSION = + ReadSession.newBuilder() + .setName("readSession") + .setAvroSchema(AvroSchema.newBuilder().setSchema(AVRO_SCHEMA_STRING)) + .build(); + + @Test + public void bigQueryStorageReaderFactory_arrowReader() throws Exception { + BigQueryStorageReader reader = BigQueryStorageReaderFactory.getReader(ARROW_READ_SESSION); + assertThat(reader, instanceOf(BigQueryStorageArrowReader.class)); + } + + @Test + public void bigQueryStorageReaderFactory_avroReader() throws Exception { + BigQueryStorageReader reader = BigQueryStorageReaderFactory.getReader(AVRO_READ_SESSION); + assertThat(reader, instanceOf(BigQueryStorageAvroReader.class)); + } + + private static ByteString serializeArrowSchema( + org.apache.arrow.vector.types.pojo.Schema arrowSchema) { + ByteArrayOutputStream byteOutputStream = new ByteArrayOutputStream(); + try { + MessageSerializer.serialize( + new WriteChannel(Channels.newChannel(byteOutputStream)), arrowSchema); + } catch (IOException ex) { + throw new RuntimeException("Failed to serialize arrow schema.", ex); + } + return ByteString.copyFrom(byteOutputStream.toByteArray()); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryTimePartitioningClusteringIT.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryTimePartitioningClusteringIT.java index da9e645993e4..c0e5bf7a48ef 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryTimePartitioningClusteringIT.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryTimePartitioningClusteringIT.java @@ -46,9 +46,6 @@ /** Integration test that partitions and clusters sample data in BigQuery. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigQueryTimePartitioningClusteringIT { private static final String WEATHER_SAMPLES_TABLE = "clouddataflow-readonly:samples.weather_stations"; diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryToTableIT.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryToTableIT.java index 85d6ea7db831..0cdd98775337 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryToTableIT.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryToTableIT.java @@ -65,9 +65,6 @@ /** Integration test for BigqueryIO with DataflowRunner and DirectRunner. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigQueryToTableIT { private static final Logger LOG = LoggerFactory.getLogger(BigQueryToTableIT.class); private static String project; diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtilTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtilTest.java index ebf4b20a45ff..1e64e22b4146 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtilTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtilTest.java @@ -60,9 +60,6 @@ /** Tests for util classes related to BigQuery. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigQueryUtilTest { @Rule public ExpectedException thrown = ExpectedException.none(); @@ -177,7 +174,7 @@ public void testTableGet() throws InterruptedException, IOException { onTableList(dataList); BigQueryServicesImpl.DatasetServiceImpl services = - new BigQueryServicesImpl.DatasetServiceImpl(mockClient, options); + new BigQueryServicesImpl.DatasetServiceImpl(mockClient, null, options); services.getTable( new TableReference().setProjectId("project").setDatasetId("dataset").setTableId("table")); @@ -197,7 +194,7 @@ public void testInsertAll() throws Exception { onInsertAll(errorsIndices); TableReference ref = BigQueryHelpers.parseTableSpec("project:dataset.table"); - DatasetServiceImpl datasetService = new DatasetServiceImpl(mockClient, options, 5); + DatasetServiceImpl datasetService = new DatasetServiceImpl(mockClient, null, options, 5); List> rows = new ArrayList<>(); List ids = new ArrayList<>(); @@ -216,7 +213,16 @@ public void testInsertAll() throws Exception { try { totalBytes = datasetService.insertAll( - ref, rows, ids, InsertRetryPolicy.alwaysRetry(), null, null, false, false, false); + ref, + rows, + ids, + InsertRetryPolicy.alwaysRetry(), + null, + null, + false, + false, + false, + null); } finally { verifyInsertAll(5); // Each of the 25 rows has 1 byte for length and 30 bytes: '{"f":[{"v":"foo"},{"v":1234}]}' diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtilsTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtilsTest.java index 871f4d04c274..474eea7fd01a 100644 --- 
a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtilsTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtilsTest.java @@ -19,16 +19,18 @@ import static org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils.toTableRow; import static org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils.toTableSchema; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.nullValue; import static org.hamcrest.collection.IsMapContaining.hasEntry; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; +import static org.junit.Assert.assertNull; import static org.junit.Assert.assertThrows; import com.google.api.services.bigquery.model.TableFieldSchema; +import com.google.api.services.bigquery.model.TableReference; import com.google.api.services.bigquery.model.TableRow; import com.google.api.services.bigquery.model.TableSchema; import java.math.BigDecimal; @@ -64,9 +66,6 @@ /** Tests for {@link BigQueryUtils}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigQueryUtilsTest { private static final Schema FLAT_TYPE = Schema.builder() @@ -78,11 +77,20 @@ public class BigQueryUtilsTest { .addNullableField("timestamp_variant3", Schema.FieldType.DATETIME) .addNullableField("timestamp_variant4", Schema.FieldType.DATETIME) .addNullableField("datetime", Schema.FieldType.logicalType(SqlTypes.DATETIME)) + .addNullableField("datetime0ms", Schema.FieldType.logicalType(SqlTypes.DATETIME)) + .addNullableField("datetime0s_ns", Schema.FieldType.logicalType(SqlTypes.DATETIME)) + .addNullableField("datetime0s_0ns", Schema.FieldType.logicalType(SqlTypes.DATETIME)) .addNullableField("date", Schema.FieldType.logicalType(SqlTypes.DATE)) .addNullableField("time", Schema.FieldType.logicalType(SqlTypes.TIME)) + .addNullableField("time0ms", Schema.FieldType.logicalType(SqlTypes.TIME)) + .addNullableField("time0s_ns", Schema.FieldType.logicalType(SqlTypes.TIME)) + .addNullableField("time0s_0ns", Schema.FieldType.logicalType(SqlTypes.TIME)) .addNullableField("valid", Schema.FieldType.BOOLEAN) .addNullableField("binary", Schema.FieldType.BYTES) .addNullableField("numeric", Schema.FieldType.DECIMAL) + .addNullableField("boolean", Schema.FieldType.BOOLEAN) + .addNullableField("long", Schema.FieldType.INT64) + .addNullableField("double", Schema.FieldType.DOUBLE) .build(); private static final Schema ENUM_TYPE = @@ -141,12 +149,36 @@ public class BigQueryUtilsTest { private static final TableFieldSchema DATETIME = new TableFieldSchema().setName("datetime").setType(StandardSQLTypeName.DATETIME.toString()); + private static final TableFieldSchema DATETIME_0MS = + new TableFieldSchema() + .setName("datetime0ms") + .setType(StandardSQLTypeName.DATETIME.toString()); + + private static final TableFieldSchema DATETIME_0S_NS = + new TableFieldSchema() + .setName("datetime0s_ns") + .setType(StandardSQLTypeName.DATETIME.toString()); + + private static final TableFieldSchema DATETIME_0S_0NS = + new TableFieldSchema() + .setName("datetime0s_0ns") + .setType(StandardSQLTypeName.DATETIME.toString()); + private static final TableFieldSchema DATE = new TableFieldSchema().setName("date").setType(StandardSQLTypeName.DATE.toString()); private static final 
TableFieldSchema TIME = new TableFieldSchema().setName("time").setType(StandardSQLTypeName.TIME.toString()); + private static final TableFieldSchema TIME_0MS = + new TableFieldSchema().setName("time0ms").setType(StandardSQLTypeName.TIME.toString()); + + private static final TableFieldSchema TIME_0S_NS = + new TableFieldSchema().setName("time0s_ns").setType(StandardSQLTypeName.TIME.toString()); + + private static final TableFieldSchema TIME_0S_0NS = + new TableFieldSchema().setName("time0s_0ns").setType(StandardSQLTypeName.TIME.toString()); + private static final TableFieldSchema VALID = new TableFieldSchema().setName("valid").setType(StandardSQLTypeName.BOOL.toString()); @@ -156,6 +188,15 @@ public class BigQueryUtilsTest { private static final TableFieldSchema NUMERIC = new TableFieldSchema().setName("numeric").setType(StandardSQLTypeName.NUMERIC.toString()); + private static final TableFieldSchema BOOLEAN = + new TableFieldSchema().setName("boolean").setType(StandardSQLTypeName.BOOL.toString()); + + private static final TableFieldSchema LONG = + new TableFieldSchema().setName("long").setType(StandardSQLTypeName.INT64.toString()); + + private static final TableFieldSchema DOUBLE = + new TableFieldSchema().setName("double").setType(StandardSQLTypeName.FLOAT64.toString()); + private static final TableFieldSchema COLOR = new TableFieldSchema().setName("color").setType(StandardSQLTypeName.STRING.toString()); @@ -192,11 +233,20 @@ public class BigQueryUtilsTest { TIMESTAMP_VARIANT3, TIMESTAMP_VARIANT4, DATETIME, + DATETIME_0MS, + DATETIME_0S_NS, + DATETIME_0S_0NS, DATE, TIME, + TIME_0MS, + TIME_0S_NS, + TIME_0S_0NS, VALID, BINARY, - NUMERIC)); + NUMERIC, + BOOLEAN, + LONG, + DOUBLE)); private static final TableFieldSchema ROWS = new TableFieldSchema() @@ -213,11 +263,20 @@ public class BigQueryUtilsTest { TIMESTAMP_VARIANT3, TIMESTAMP_VARIANT4, DATETIME, + DATETIME_0MS, + DATETIME_0S_NS, + DATETIME_0S_0NS, DATE, TIME, + TIME_0MS, + TIME_0S_NS, + TIME_0S_0NS, VALID, BINARY, - NUMERIC)); + NUMERIC, + BOOLEAN, + LONG, + DOUBLE)); private static final TableFieldSchema MAP = new TableFieldSchema() @@ -244,11 +303,20 @@ public class BigQueryUtilsTest { .parseDateTime("2019-08-18T15:52:07.123"), new DateTime(123456), LocalDateTime.parse("2020-11-02T12:34:56.789876"), + LocalDateTime.parse("2020-11-02T12:34:56"), + LocalDateTime.parse("2020-11-02T12:34:00.789876"), + LocalDateTime.parse("2020-11-02T12:34"), LocalDate.parse("2020-11-02"), LocalTime.parse("12:34:56.789876"), + LocalTime.parse("12:34:56"), + LocalTime.parse("12:34:00.789876"), + LocalTime.parse("12:34"), false, Base64.getDecoder().decode("ABCD1234"), - new BigDecimal(123.456).setScale(3, RoundingMode.HALF_UP)) + new BigDecimal(123.456).setScale(3, RoundingMode.HALF_UP), + true, + 123L, + 123.456d) .build(); private static final TableRow BQ_FLAT_ROW = @@ -264,16 +332,27 @@ public class BigQueryUtilsTest { "timestamp_variant4", String.valueOf( new DateTime(123456L, ISOChronology.getInstanceUTC()).getMillis() / 1000.0D)) - .set("datetime", "2020-11-02 12:34:56.789876") + .set("datetime", "2020-11-02T12:34:56.789876") + .set("datetime0ms", "2020-11-02T12:34:56") + .set("datetime0s_ns", "2020-11-02T12:34:00.789876") + .set("datetime0s_0ns", "2020-11-02T12:34:00") .set("date", "2020-11-02") .set("time", "12:34:56.789876") + .set("time0ms", "12:34:56") + .set("time0s_ns", "12:34:00.789876") + .set("time0s_0ns", "12:34:00") .set("valid", "false") .set("binary", "ABCD1234") - .set("numeric", "123.456"); + .set("numeric", "123.456") + 
.set("boolean", true) + .set("long", 123L) + .set("double", 123.456d); private static final Row NULL_FLAT_ROW = Row.withSchema(FLAT_TYPE) - .addValues(null, null, null, null, null, null, null, null, null, null, null, null, null) + .addValues( + null, null, null, null, null, null, null, null, null, null, null, null, null, null, + null, null, null, null, null, null, null, null) .build(); private static final TableRow BQ_NULL_FLAT_ROW = @@ -286,11 +365,20 @@ public class BigQueryUtilsTest { .set("timestamp_variant3", null) .set("timestamp_variant4", null) .set("datetime", null) + .set("datetime0ms", null) + .set("datetime0s_ns", null) + .set("datetime0s_0ns", null) .set("date", null) .set("time", null) + .set("time0ms", null) + .set("time0s_ns", null) + .set("time0s_0ns", null) .set("valid", null) .set("binary", null) - .set("numeric", null); + .set("numeric", null) + .set("boolean", null) + .set("long", null) + .set("double", null); private static final Row ENUM_ROW = Row.withSchema(ENUM_TYPE).addValues(new EnumerationType.Value(1)).build(); @@ -336,11 +424,20 @@ public class BigQueryUtilsTest { TIMESTAMP_VARIANT3, TIMESTAMP_VARIANT4, DATETIME, + DATETIME_0MS, + DATETIME_0S_NS, + DATETIME_0S_0NS, DATE, TIME, + TIME_0MS, + TIME_0S_NS, + TIME_0S_0NS, VALID, BINARY, - NUMERIC)); + NUMERIC, + BOOLEAN, + LONG, + DOUBLE)); private static final TableSchema BQ_ENUM_TYPE = new TableSchema().setFields(Arrays.asList(COLOR)); @@ -382,11 +479,20 @@ public void testToTableSchema_flat() { TIMESTAMP_VARIANT3, TIMESTAMP_VARIANT4, DATETIME, + DATETIME_0MS, + DATETIME_0S_NS, + DATETIME_0S_0NS, DATE, TIME, + TIME_0MS, + TIME_0S_NS, + TIME_0S_0NS, VALID, BINARY, - NUMERIC)); + NUMERIC, + BOOLEAN, + LONG, + DOUBLE)); } @Test @@ -423,11 +529,20 @@ public void testToTableSchema_row() { TIMESTAMP_VARIANT3, TIMESTAMP_VARIANT4, DATETIME, + DATETIME_0MS, + DATETIME_0S_NS, + DATETIME_0S_0NS, DATE, TIME, + TIME_0MS, + TIME_0S_NS, + TIME_0S_0NS, VALID, BINARY, - NUMERIC)); + NUMERIC, + BOOLEAN, + LONG, + DOUBLE)); } @Test @@ -450,11 +565,20 @@ public void testToTableSchema_array_row() { TIMESTAMP_VARIANT3, TIMESTAMP_VARIANT4, DATETIME, + DATETIME_0MS, + DATETIME_0S_NS, + DATETIME_0S_0NS, DATE, TIME, + TIME_0MS, + TIME_0S_NS, + TIME_0S_0NS, VALID, BINARY, - NUMERIC)); + NUMERIC, + BOOLEAN, + LONG, + DOUBLE)); } @Test @@ -472,18 +596,26 @@ public void testToTableSchema_map() { @Test public void testToTableRow_flat() { TableRow row = toTableRow().apply(FLAT_ROW); - System.out.println(row); - assertThat(row.size(), equalTo(13)); + assertThat(row.size(), equalTo(22)); assertThat(row, hasEntry("id", "123")); assertThat(row, hasEntry("value", "123.456")); - assertThat(row, hasEntry("datetime", "2020-11-02 12:34:56.789876")); + assertThat(row, hasEntry("datetime", "2020-11-02T12:34:56.789876")); + assertThat(row, hasEntry("datetime0ms", "2020-11-02T12:34:56")); + assertThat(row, hasEntry("datetime0s_ns", "2020-11-02T12:34:00.789876")); + assertThat(row, hasEntry("datetime0s_0ns", "2020-11-02T12:34:00")); assertThat(row, hasEntry("date", "2020-11-02")); assertThat(row, hasEntry("time", "12:34:56.789876")); + assertThat(row, hasEntry("time0ms", "12:34:56")); + assertThat(row, hasEntry("time0s_ns", "12:34:00.789876")); + assertThat(row, hasEntry("time0s_0ns", "12:34:00")); assertThat(row, hasEntry("name", "test")); assertThat(row, hasEntry("valid", "false")); assertThat(row, hasEntry("binary", "ABCD1234")); assertThat(row, hasEntry("numeric", "123.456")); + assertThat(row, hasEntry("boolean", "true")); + assertThat(row, 
hasEntry("long", "123")); + assertThat(row, hasEntry("double", "123.456")); } @Test @@ -519,16 +651,25 @@ public void testToTableRow_row() { assertThat(row.size(), equalTo(1)); row = (TableRow) row.get("row"); - assertThat(row.size(), equalTo(13)); + assertThat(row.size(), equalTo(22)); assertThat(row, hasEntry("id", "123")); assertThat(row, hasEntry("value", "123.456")); - assertThat(row, hasEntry("datetime", "2020-11-02 12:34:56.789876")); + assertThat(row, hasEntry("datetime", "2020-11-02T12:34:56.789876")); + assertThat(row, hasEntry("datetime0ms", "2020-11-02T12:34:56")); + assertThat(row, hasEntry("datetime0s_ns", "2020-11-02T12:34:00.789876")); + assertThat(row, hasEntry("datetime0s_0ns", "2020-11-02T12:34:00")); assertThat(row, hasEntry("date", "2020-11-02")); assertThat(row, hasEntry("time", "12:34:56.789876")); + assertThat(row, hasEntry("time0ms", "12:34:56")); + assertThat(row, hasEntry("time0s_ns", "12:34:00.789876")); + assertThat(row, hasEntry("time0s_0ns", "12:34:00")); assertThat(row, hasEntry("name", "test")); assertThat(row, hasEntry("valid", "false")); assertThat(row, hasEntry("binary", "ABCD1234")); assertThat(row, hasEntry("numeric", "123.456")); + assertThat(row, hasEntry("boolean", "true")); + assertThat(row, hasEntry("long", "123")); + assertThat(row, hasEntry("double", "123.456")); } @Test @@ -537,23 +678,32 @@ public void testToTableRow_array_row() { assertThat(row.size(), equalTo(1)); row = ((List) row.get("rows")).get(0); - assertThat(row.size(), equalTo(13)); + assertThat(row.size(), equalTo(22)); assertThat(row, hasEntry("id", "123")); assertThat(row, hasEntry("value", "123.456")); - assertThat(row, hasEntry("datetime", "2020-11-02 12:34:56.789876")); + assertThat(row, hasEntry("datetime", "2020-11-02T12:34:56.789876")); + assertThat(row, hasEntry("datetime0ms", "2020-11-02T12:34:56")); + assertThat(row, hasEntry("datetime0s_ns", "2020-11-02T12:34:00.789876")); + assertThat(row, hasEntry("datetime0s_0ns", "2020-11-02T12:34:00")); assertThat(row, hasEntry("date", "2020-11-02")); assertThat(row, hasEntry("time", "12:34:56.789876")); + assertThat(row, hasEntry("time0ms", "12:34:56")); + assertThat(row, hasEntry("time0s_ns", "12:34:00.789876")); + assertThat(row, hasEntry("time0s_0ns", "12:34:00")); assertThat(row, hasEntry("name", "test")); assertThat(row, hasEntry("valid", "false")); assertThat(row, hasEntry("binary", "ABCD1234")); assertThat(row, hasEntry("numeric", "123.456")); + assertThat(row, hasEntry("boolean", "true")); + assertThat(row, hasEntry("long", "123")); + assertThat(row, hasEntry("double", "123.456")); } @Test public void testToTableRow_null_row() { TableRow row = toTableRow().apply(NULL_FLAT_ROW); - assertThat(row.size(), equalTo(13)); + assertThat(row.size(), equalTo(22)); assertThat(row, hasEntry("id", null)); assertThat(row, hasEntry("value", null)); assertThat(row, hasEntry("name", null)); @@ -562,11 +712,20 @@ public void testToTableRow_null_row() { assertThat(row, hasEntry("timestamp_variant3", null)); assertThat(row, hasEntry("timestamp_variant4", null)); assertThat(row, hasEntry("datetime", null)); + assertThat(row, hasEntry("datetime0ms", null)); + assertThat(row, hasEntry("datetime0s_ns", null)); + assertThat(row, hasEntry("datetime0s_0ns", null)); assertThat(row, hasEntry("date", null)); assertThat(row, hasEntry("time", null)); + assertThat(row, hasEntry("time0ms", null)); + assertThat(row, hasEntry("time0s_ns", null)); + assertThat(row, hasEntry("time0s_0ns", null)); assertThat(row, hasEntry("valid", null)); assertThat(row, 
hasEntry("binary", null)); assertThat(row, hasEntry("numeric", null)); + assertThat(row, hasEntry("boolean", null)); + assertThat(row, hasEntry("long", null)); + assertThat(row, hasEntry("double", null)); } private static final BigQueryUtils.ConversionOptions TRUNCATE_OPTIONS = @@ -776,4 +935,82 @@ public void testToBeamRow_avro_array_array_row() { record, AVRO_ARRAY_ARRAY_TYPE, BigQueryUtils.ConversionOptions.builder().build()); assertEquals(expected, beamRow); } + + @Test + public void testToTableReference() { + { + TableReference tr = + BigQueryUtils.toTableReference("projects/myproject/datasets/mydataset/tables/mytable"); + assertEquals("myproject", tr.getProjectId()); + assertEquals("mydataset", tr.getDatasetId()); + assertEquals("mytable", tr.getTableId()); + } + + { + // Test colon(":") after project format + TableReference tr = BigQueryUtils.toTableReference("myprojectwithcolon:mydataset.mytable"); + assertEquals("myprojectwithcolon", tr.getProjectId()); + assertEquals("mydataset", tr.getDatasetId()); + assertEquals("mytable", tr.getTableId()); + } + + { + // Test dot(".") after project format + TableReference tr = BigQueryUtils.toTableReference("myprojectwithdot.mydataset.mytable"); + assertEquals("myprojectwithdot", tr.getProjectId()); + assertEquals("mydataset", tr.getDatasetId()); + assertEquals("mytable", tr.getTableId()); + } + + // Invalid scenarios + assertNull(BigQueryUtils.toTableReference("")); + assertNull(BigQueryUtils.toTableReference(":.")); + assertNull(BigQueryUtils.toTableReference("..")); + assertNull(BigQueryUtils.toTableReference("myproject")); + assertNull(BigQueryUtils.toTableReference("myproject:")); + assertNull(BigQueryUtils.toTableReference("myproject.")); + assertNull(BigQueryUtils.toTableReference("myproject:mydataset")); + assertNull(BigQueryUtils.toTableReference("myproject:mydataset.")); + assertNull(BigQueryUtils.toTableReference("myproject:mydataset.mytable.")); + assertNull(BigQueryUtils.toTableReference("myproject:mydataset:mytable:")); + assertNull(BigQueryUtils.toTableReference(".invalidleadingdot.mydataset.mytable")); + assertNull(BigQueryUtils.toTableReference("invalidtrailingdot.mydataset.mytable.")); + assertNull(BigQueryUtils.toTableReference(":invalidleadingcolon.mydataset.mytable")); + assertNull(BigQueryUtils.toTableReference("invalidtrailingcolon.mydataset.mytable:")); + assertNull(BigQueryUtils.toTableReference("myproject.mydataset.mytable.myinvalidpart")); + assertNull(BigQueryUtils.toTableReference("myproject:mydataset.mytable.myinvalidpart")); + + assertNull( + BigQueryUtils.toTableReference("/projects/extraslash/datasets/mydataset/tables/mytable")); + assertNull( + BigQueryUtils.toTableReference("projects//extraslash/datasets/mydataset/tables/mytable")); + assertNull( + BigQueryUtils.toTableReference("projects/extraslash//datasets/mydataset/tables/mytable")); + assertNull( + BigQueryUtils.toTableReference("projects/extraslash/datasets//mydataset/tables/mytable")); + assertNull( + BigQueryUtils.toTableReference("projects/extraslash/datasets/mydataset//tables/mytable")); + assertNull( + BigQueryUtils.toTableReference("projects/extraslash/datasets/mydataset/tables//mytable")); + assertNull( + BigQueryUtils.toTableReference("projects/extraslash/datasets/mydataset/tables/mytable/")); + + assertNull(BigQueryUtils.toTableReference("projects/myproject/datasets/mydataset/tables//")); + assertNull(BigQueryUtils.toTableReference("projects/myproject/datasets//tables/mytable/")); + 
assertNull(BigQueryUtils.toTableReference("projects//datasets/mydataset/tables/mytable/")); + assertNull(BigQueryUtils.toTableReference("projects//datasets//tables//")); + + assertNull( + BigQueryUtils.toTableReference("projects/myproject/datasets/mydataset/tables/mytable/")); + assertNull(BigQueryUtils.toTableReference("projects/myproject/datasets/mydataset/tables/")); + assertNull(BigQueryUtils.toTableReference("projects/myproject/datasets/mydataset/tables")); + assertNull(BigQueryUtils.toTableReference("projects/myproject/datasets/mydataset/")); + assertNull(BigQueryUtils.toTableReference("projects/myproject/datasets/mydataset")); + assertNull(BigQueryUtils.toTableReference("projects/myproject/datasets/")); + assertNull(BigQueryUtils.toTableReference("projects/myproject/datasets")); + assertNull(BigQueryUtils.toTableReference("projects/myproject/")); + assertNull(BigQueryUtils.toTableReference("projects/myproject")); + assertNull(BigQueryUtils.toTableReference("projects/")); + assertNull(BigQueryUtils.toTableReference("projects")); + } } diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/ProjectOverride.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/ProjectOverride.java new file mode 100644 index 000000000000..974caaa824d4 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/ProjectOverride.java @@ -0,0 +1,32 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.bigquery; + +import java.lang.annotation.ElementType; +import java.lang.annotation.Retention; +import java.lang.annotation.RetentionPolicy; +import java.lang.annotation.Target; + +/** + * This used-case handles situation where bigquery project id may differ from default project. Used + * to mark unit test method where the project id will be overridden with supplied project id through + * pipeline options. The annotation is checked in @Rule. + */ +@Retention(RetentionPolicy.RUNTIME) +@Target({ElementType.METHOD}) +public @interface ProjectOverride {} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/RetryManagerTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/RetryManagerTest.java new file mode 100644 index 000000000000..6e8a5746b0b2 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/RetryManagerTest.java @@ -0,0 +1,212 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.bigquery; + +import static org.junit.Assert.assertEquals; + +import com.google.api.core.ApiFutures; +import java.util.List; +import java.util.Map; +import org.apache.beam.sdk.io.gcp.bigquery.RetryManager.Operation; +import org.apache.beam.sdk.io.gcp.bigquery.RetryManager.Operation.Context; +import org.apache.beam.sdk.io.gcp.bigquery.RetryManager.RetryType; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; +import org.joda.time.Duration; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Tests for {@link RetryManager}. */ +@RunWith(JUnit4.class) +public class RetryManagerTest { + static class Context extends Operation.Context { + int numStarted = 0; + int numSucceeded = 0; + int numFailed = 0; + } + + @Test + public void testNoFailures() throws Exception { + List contexts = Lists.newArrayList(); + RetryManager retryManager = + new RetryManager<>(Duration.millis(1), Duration.millis(1), 5); + for (int i = 0; i < 5; ++i) { + Context context = new Context(); + contexts.add(context); + retryManager.addOperation( + c -> { + ++c.numStarted; + return ApiFutures.immediateFuture("yes"); + }, + cs -> { + cs.forEach(c -> ++c.numFailed); + return RetryType.DONT_RETRY; + }, + c -> ++c.numSucceeded, + context); + } + contexts.forEach( + c -> { + assertEquals(0, c.numStarted); + assertEquals(0, c.numSucceeded); + assertEquals(0, c.numFailed); + }); + retryManager.run(true); + contexts.forEach( + c -> { + assertEquals(1, c.numStarted); + assertEquals(1, c.numSucceeded); + assertEquals(0, c.numFailed); + }); + } + + @Test + public void testRetryInOrder() throws Exception { + Map contexts = Maps.newHashMap(); + Map expectedStarts = Maps.newHashMap(); + Map expectedFailures = Maps.newHashMap(); + + RetryManager retryManager = + new RetryManager<>(Duration.millis(1), Duration.millis(1), 50); + for (int i = 0; i < 5; ++i) { + final int index = i; + String value = "yes " + i; + Context context = new Context(); + contexts.put(value, context); + expectedStarts.put(value, i + 2); + expectedFailures.put(value, i + 1); + retryManager.addOperation( + c -> { + // Make sure that each operation fails on its own. Failing a previous operation + // automatically + // fails all subsequent operations. 
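+ // With RETRY_ALL_OPERATIONS below, operation i is therefore expected to run i + 2 times in
+ // total (i + 1 failing attempts plus one success), matching expectedStarts and
+ // expectedFailures above.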
+ if (c.numStarted <= index) { + ++c.numStarted; + RuntimeException e = new RuntimeException("foo"); + return ApiFutures.immediateFailedFuture(e); + } else { + ++c.numStarted; + return ApiFutures.immediateFuture(value); + } + }, + cs -> { + cs.forEach(c -> ++c.numFailed); + return RetryType.RETRY_ALL_OPERATIONS; + }, + c -> ++c.numSucceeded, + context); + } + contexts + .values() + .forEach( + c -> { + assertEquals(0, c.numStarted); + assertEquals(0, c.numSucceeded); + assertEquals(0, c.numFailed); + }); + retryManager.run(true); + contexts + .entrySet() + .forEach( + e -> { + assertEquals((int) expectedStarts.get(e.getKey()), e.getValue().numStarted); + assertEquals(1, e.getValue().numSucceeded); + assertEquals((int) expectedFailures.get(e.getKey()), e.getValue().numFailed); + }); + } + + @Test + public void testDontRetry() throws Exception { + List contexts = Lists.newArrayList(); + + RetryManager retryManager = + new RetryManager<>(Duration.millis(1), Duration.millis(1), 50); + for (int i = 0; i < 5; ++i) { + Context context = new Context(); + contexts.add(context); + String value = "yes " + i; + retryManager.addOperation( + c -> { + if (c.numStarted == 0) { + ++c.numStarted; + RuntimeException e = new RuntimeException("foo"); + return ApiFutures.immediateFailedFuture(e); + } else { + ++c.numStarted; + return ApiFutures.immediateFuture(value); + } + }, + cs -> { + cs.forEach(c -> ++c.numFailed); + return RetryType.DONT_RETRY; + }, + c -> ++c.numSucceeded, + context); + } + contexts.forEach( + c -> { + assertEquals(0, c.numStarted); + assertEquals(0, c.numSucceeded); + assertEquals(0, c.numFailed); + }); + retryManager.run(true); + contexts.forEach( + c -> { + assertEquals(1, c.numStarted); + assertEquals(0, c.numSucceeded); + assertEquals(1, c.numFailed); + }); + } + + @Test + public void testHasSucceeded() throws Exception { + List contexts = Lists.newArrayList(); + RetryManager retryManager = + new RetryManager<>(Duration.millis(1), Duration.millis(1), 5); + for (int i = 0; i < 5; ++i) { + Context context = new Context(); + contexts.add(context); + retryManager.addOperation( + c -> { + ++c.numStarted; + return ApiFutures.immediateFuture("yes"); + }, + cs -> { + cs.forEach(c -> ++c.numFailed); + return RetryType.DONT_RETRY; + }, + c -> ++c.numSucceeded, + c -> false, + context); + } + contexts.forEach( + c -> { + assertEquals(0, c.numStarted); + assertEquals(0, c.numSucceeded); + assertEquals(0, c.numFailed); + }); + retryManager.run(true); + contexts.forEach( + c -> { + assertEquals(1, c.numStarted); + assertEquals(0, c.numSucceeded); + assertEquals(1, c.numFailed); + }); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowJsonCoderTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowJsonCoderTest.java index 15e50f593290..b09832085a7d 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowJsonCoderTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowJsonCoderTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.io.gcp.bigquery; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import com.google.api.services.bigquery.model.TableRow; import java.util.Arrays; diff --git 
a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProtoTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProtoTest.java new file mode 100644 index 000000000000..0acc41f62eb5 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProtoTest.java @@ -0,0 +1,312 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.bigquery; + +import static org.junit.Assert.assertEquals; + +import com.google.api.services.bigquery.model.TableFieldSchema; +import com.google.api.services.bigquery.model.TableRow; +import com.google.api.services.bigquery.model.TableSchema; +import com.google.protobuf.ByteString; +import com.google.protobuf.DescriptorProtos.DescriptorProto; +import com.google.protobuf.DescriptorProtos.FieldDescriptorProto; +import com.google.protobuf.DescriptorProtos.FieldDescriptorProto.Label; +import com.google.protobuf.DescriptorProtos.FieldDescriptorProto.Type; +import com.google.protobuf.Descriptors.Descriptor; +import com.google.protobuf.Descriptors.FieldDescriptor; +import com.google.protobuf.DynamicMessage; +import java.nio.charset.StandardCharsets; +import java.util.Map; +import java.util.stream.Collectors; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Functions; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.BaseEncoding; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +@RunWith(JUnit4.class) +@SuppressWarnings({ + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) +/** Unit tests for {@link org.apache.beam.sdk.io.gcp.bigquery.TableRowToStorageApiProto}. 
*/ +public class TableRowToStorageApiProtoTest { + private static final TableSchema BASE_TABLE_SCHEMA = + new TableSchema() + .setFields( + ImmutableList.builder() + .add(new TableFieldSchema().setType("STRING").setName("stringValue")) + .add(new TableFieldSchema().setType("BYTES").setName("bytesValue")) + .add(new TableFieldSchema().setType("INT64").setName("int64Value")) + .add(new TableFieldSchema().setType("INTEGER").setName("intValue")) + .add(new TableFieldSchema().setType("FLOAT64").setName("float64Value")) + .add(new TableFieldSchema().setType("FLOAT").setName("floatValue")) + .add(new TableFieldSchema().setType("BOOL").setName("boolValue")) + .add(new TableFieldSchema().setType("BOOLEAN").setName("booleanValue")) + .add(new TableFieldSchema().setType("TIMESTAMP").setName("timestampValue")) + .add(new TableFieldSchema().setType("TIME").setName("timeValue")) + .add(new TableFieldSchema().setType("DATETIME").setName("datetimeValue")) + .add(new TableFieldSchema().setType("DATE").setName("dateValue")) + .add(new TableFieldSchema().setType("NUMERIC").setName("numericValue")) + .add( + new TableFieldSchema() + .setType("STRING") + .setMode("REPEATED") + .setName("arrayValue")) + .build()); + + private static final DescriptorProto BASE_TABLE_SCHEMA_PROTO = + DescriptorProto.newBuilder() + .addField( + FieldDescriptorProto.newBuilder() + .setName("stringvalue") + .setNumber(1) + .setType(Type.TYPE_STRING) + .setLabel(Label.LABEL_OPTIONAL) + .build()) + .addField( + FieldDescriptorProto.newBuilder() + .setName("bytesvalue") + .setNumber(2) + .setType(Type.TYPE_BYTES) + .setLabel(Label.LABEL_OPTIONAL) + .build()) + .addField( + FieldDescriptorProto.newBuilder() + .setName("int64value") + .setNumber(3) + .setType(Type.TYPE_INT64) + .setLabel(Label.LABEL_OPTIONAL) + .build()) + .addField( + FieldDescriptorProto.newBuilder() + .setName("intvalue") + .setNumber(4) + .setType(Type.TYPE_INT64) + .setLabel(Label.LABEL_OPTIONAL) + .build()) + .addField( + FieldDescriptorProto.newBuilder() + .setName("float64value") + .setNumber(5) + .setType(Type.TYPE_DOUBLE) + .setLabel(Label.LABEL_OPTIONAL) + .build()) + .addField( + FieldDescriptorProto.newBuilder() + .setName("floatvalue") + .setNumber(6) + .setType(Type.TYPE_DOUBLE) + .setLabel(Label.LABEL_OPTIONAL) + .build()) + .addField( + FieldDescriptorProto.newBuilder() + .setName("boolvalue") + .setNumber(7) + .setType(Type.TYPE_BOOL) + .setLabel(Label.LABEL_OPTIONAL) + .build()) + .addField( + FieldDescriptorProto.newBuilder() + .setName("booleanvalue") + .setNumber(8) + .setType(Type.TYPE_BOOL) + .setLabel(Label.LABEL_OPTIONAL) + .build()) + .addField( + FieldDescriptorProto.newBuilder() + .setName("timestampvalue") + .setNumber(9) + .setType(Type.TYPE_STRING) + .setLabel(Label.LABEL_OPTIONAL) + .build()) + .addField( + FieldDescriptorProto.newBuilder() + .setName("timevalue") + .setNumber(10) + .setType(Type.TYPE_STRING) + .setLabel(Label.LABEL_OPTIONAL) + .build()) + .addField( + FieldDescriptorProto.newBuilder() + .setName("datetimevalue") + .setNumber(11) + .setType(Type.TYPE_STRING) + .setLabel(Label.LABEL_OPTIONAL) + .build()) + .addField( + FieldDescriptorProto.newBuilder() + .setName("datevalue") + .setNumber(12) + .setType(Type.TYPE_STRING) + .setLabel(Label.LABEL_OPTIONAL) + .build()) + .addField( + FieldDescriptorProto.newBuilder() + .setName("numericvalue") + .setNumber(13) + .setType(Type.TYPE_STRING) + .setLabel(Label.LABEL_OPTIONAL) + .build()) + .addField( + FieldDescriptorProto.newBuilder() + .setName("arrayvalue") + .setNumber(14) + 
.setType(Type.TYPE_STRING) + .setLabel(Label.LABEL_REPEATED) + .build()) + .build(); + + private static final TableSchema NESTED_TABLE_SCHEMA = + new TableSchema() + .setFields( + ImmutableList.builder() + .add( + new TableFieldSchema() + .setType("STRUCT") + .setName("nestedValue1") + .setFields(BASE_TABLE_SCHEMA.getFields())) + .add( + new TableFieldSchema() + .setType("RECORD") + .setName("nestedValue2") + .setFields(BASE_TABLE_SCHEMA.getFields())) + .build()); + + // For now, test that no exceptions are thrown. + @Test + public void testDescriptorFromTableSchema() { + DescriptorProto descriptor = + TableRowToStorageApiProto.descriptorSchemaFromTableSchema(BASE_TABLE_SCHEMA); + Map types = + descriptor.getFieldList().stream() + .collect( + Collectors.toMap(FieldDescriptorProto::getName, FieldDescriptorProto::getType)); + Map expectedTypes = + BASE_TABLE_SCHEMA_PROTO.getFieldList().stream() + .collect( + Collectors.toMap(FieldDescriptorProto::getName, FieldDescriptorProto::getType)); + assertEquals(expectedTypes, types); + } + + @Test + public void testNestedFromTableSchema() { + DescriptorProto descriptor = + TableRowToStorageApiProto.descriptorSchemaFromTableSchema(NESTED_TABLE_SCHEMA); + Map expectedBaseTypes = + BASE_TABLE_SCHEMA_PROTO.getFieldList().stream() + .collect( + Collectors.toMap(FieldDescriptorProto::getName, FieldDescriptorProto::getType)); + + Map types = + descriptor.getFieldList().stream() + .collect( + Collectors.toMap(FieldDescriptorProto::getName, FieldDescriptorProto::getType)); + Map typeNames = + descriptor.getFieldList().stream() + .collect( + Collectors.toMap(FieldDescriptorProto::getName, FieldDescriptorProto::getTypeName)); + assertEquals(2, types.size()); + + Map nestedTypes = + descriptor.getNestedTypeList().stream() + .collect(Collectors.toMap(DescriptorProto::getName, Functions.identity())); + assertEquals(2, nestedTypes.size()); + assertEquals(Type.TYPE_MESSAGE, types.get("nestedvalue1")); + String nestedTypeName1 = typeNames.get("nestedvalue1"); + Map nestedTypes1 = + nestedTypes.get(nestedTypeName1).getFieldList().stream() + .collect( + Collectors.toMap(FieldDescriptorProto::getName, FieldDescriptorProto::getType)); + assertEquals(expectedBaseTypes, nestedTypes1); + + assertEquals(Type.TYPE_MESSAGE, types.get("nestedvalue2")); + String nestedTypeName2 = typeNames.get("nestedvalue2"); + Map nestedTypes2 = + nestedTypes.get(nestedTypeName2).getFieldList().stream() + .collect( + Collectors.toMap(FieldDescriptorProto::getName, FieldDescriptorProto::getType)); + assertEquals(expectedBaseTypes, nestedTypes2); + } + + @Test + public void testRepeatedDescriptorFromTableSchema() { + TableRowToStorageApiProto.descriptorSchemaFromTableSchema(BASE_TABLE_SCHEMA); + } + + private static final TableRow BASE_TABLE_ROW = + new TableRow() + .set("stringValue", "string") + .set( + "bytesValue", BaseEncoding.base64().encode("string".getBytes(StandardCharsets.UTF_8))) + .set("int64Value", "42") + .set("intValue", "43") + .set("float64Value", "2.8168") + .set("floatValue", "2.817") + .set("boolValue", "true") + .set("booleanValue", "true") + .set("timestampValue", "43") + .set("timeValue", "00:52:07[.123]|[.123456] UTC") + .set("datetimeValue", "2019-08-16 00:52:07[.123]|[.123456] UTC") + .set("dateValue", "2019-08-16") + .set("numericValue", "23.4") + .set("arrayValue", ImmutableList.of("hello", "goodbye")); + + private static final Map BASE_ROW_EXPECTED_PROTO_VALUES = + ImmutableMap.builder() + .put("stringvalue", "string") + .put("bytesvalue", 
ByteString.copyFrom("string".getBytes(StandardCharsets.UTF_8))) + .put("int64value", (long) 42) + .put("intvalue", (long) 43) + .put("float64value", (double) 2.8168) + .put("floatvalue", (double) 2.817) + .put("boolvalue", true) + .put("booleanvalue", true) + .put("timestampvalue", "43") + .put("timevalue", "00:52:07[.123]|[.123456] UTC") + .put("datetimevalue", "2019-08-16 00:52:07[.123]|[.123456] UTC") + .put("datevalue", "2019-08-16") + .put("numericvalue", "23.4") + .put("arrayvalue", ImmutableList.of("hello", "goodbye")) + .build(); + + private void assertBaseRecord(DynamicMessage msg) { + Map recordFields = + msg.getAllFields().entrySet().stream() + .collect( + Collectors.toMap(entry -> entry.getKey().getName(), entry -> entry.getValue())); + assertEquals(BASE_ROW_EXPECTED_PROTO_VALUES, recordFields); + } + + @Test + public void testMessageFromTableRow() throws Exception { + TableRow tableRow = + new TableRow().set("nestedValue1", BASE_TABLE_ROW).set("nestedValue2", BASE_TABLE_ROW); + Descriptor descriptor = + TableRowToStorageApiProto.getDescriptorFromTableSchema(NESTED_TABLE_SCHEMA); + DynamicMessage msg = TableRowToStorageApiProto.messageFromTableRow(descriptor, tableRow); + assertEquals(2, msg.getAllFields().size()); + + Map fieldDescriptors = + descriptor.getFields().stream() + .collect(Collectors.toMap(FieldDescriptor::getName, Functions.identity())); + assertBaseRecord((DynamicMessage) msg.getField(fieldDescriptors.get("nestedvalue1"))); + assertBaseRecord((DynamicMessage) msg.getField(fieldDescriptors.get("nestedvalue2"))); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BeamRowToBigtableMutationTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BeamRowToBigtableMutationTest.java index 2583ce349b16..5fa16dbffafa 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BeamRowToBigtableMutationTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BeamRowToBigtableMutationTest.java @@ -18,13 +18,13 @@ package org.apache.beam.sdk.io.gcp.bigtable; import static java.util.stream.Collectors.toList; -import static org.apache.beam.sdk.io.gcp.bigtable.TestUtils.rowMutation; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.BOOL_COLUMN; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.DOUBLE_COLUMN; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.FAMILY_TEST; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.LONG_COLUMN; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.STRING_COLUMN; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.TEST_FLAT_SCHEMA; +import static org.apache.beam.sdk.io.gcp.bigtable.BigtableTestUtils.BOOL_COLUMN; +import static org.apache.beam.sdk.io.gcp.bigtable.BigtableTestUtils.DOUBLE_COLUMN; +import static org.apache.beam.sdk.io.gcp.bigtable.BigtableTestUtils.FAMILY_TEST; +import static org.apache.beam.sdk.io.gcp.bigtable.BigtableTestUtils.LONG_COLUMN; +import static org.apache.beam.sdk.io.gcp.bigtable.BigtableTestUtils.STRING_COLUMN; +import static org.apache.beam.sdk.io.gcp.bigtable.BigtableTestUtils.TEST_FLAT_SCHEMA; +import static org.apache.beam.sdk.io.gcp.bigtable.BigtableTestUtils.rowMutation; import com.google.bigtable.v2.Mutation; import com.google.protobuf.ByteString; diff --git 
a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableConfigTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableConfigTest.java index 20e564b4c582..3636aaa0d71a 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableConfigTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableConfigTest.java @@ -21,10 +21,10 @@ import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasKey; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasLabel; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasValue; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.allOf; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import com.google.cloud.bigtable.config.BigtableOptions; @@ -44,9 +44,6 @@ /** Unit tests for {@link BigtableConfig}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigtableConfigTest { static final ValueProvider NOT_ACCESSIBLE_VALUE = diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIOTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIOTest.java index 10310a744a47..abe2749ba21b 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIOTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIOTest.java @@ -29,6 +29,7 @@ import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Verify.verifyNotNull; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.allOf; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.greaterThan; @@ -37,7 +38,6 @@ import static org.hamcrest.Matchers.lessThan; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNotNull; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import com.google.bigtable.v2.Cell; @@ -77,7 +77,7 @@ import org.apache.beam.sdk.coders.IterableCoder; import org.apache.beam.sdk.coders.KvCoder; import org.apache.beam.sdk.coders.StringUtf8Coder; -import org.apache.beam.sdk.coders.VarIntCoder; +import org.apache.beam.sdk.coders.VarLongCoder; import org.apache.beam.sdk.extensions.gcp.auth.TestCredential; import org.apache.beam.sdk.extensions.gcp.options.GcpOptions; import org.apache.beam.sdk.extensions.protobuf.ByteStringCoder; @@ -126,9 +126,6 @@ /** Unit tests for {@link BigtableIO}. 
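 * <p>The windowed-write test in this file, {@code testWritingEmitsResultsWhenDoneInFixedWindow},
 * drives the pipeline with a {@code TestStream} of {@code Long} elements (encoded with
 * {@link VarLongCoder}), emitting {@code 1L} and {@code 2L} in consecutive one-minute windows so
 * that a write result is produced for each window.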
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigtableIOTest { @Rule public final transient TestPipeline p = TestPipeline.create(); @Rule public ExpectedException thrown = ExpectedException.none(); @@ -1274,8 +1271,7 @@ public void testWritingAndWaitingOnResults() throws Exception { * A DoFn used to generate N outputs, where N is the input. Used to generate bundles of >= 1 * element. */ - private static class WriteGeneratorDoFn - extends DoFn>> { + private static class WriteGeneratorDoFn extends DoFn>> { @ProcessElement public void processElement(ProcessContext ctx) { for (int i = 0; i < ctx.element(); i++) { @@ -1296,12 +1292,12 @@ public void testWritingEmitsResultsWhenDoneInFixedWindow() throws Exception { Instant elementTimestamp = Instant.parse("2019-06-10T00:00:00"); Duration windowDuration = Duration.standardMinutes(1); - TestStream input = - TestStream.create(VarIntCoder.of()) + TestStream input = + TestStream.create(VarLongCoder.of()) .advanceWatermarkTo(elementTimestamp) - .addElements(1) + .addElements(1L) .advanceWatermarkTo(elementTimestamp.plus(windowDuration)) - .addElements(2) + .addElements(2L) .advanceWatermarkToInfinity(); BoundedWindow expectedFirstWindow = new IntervalWindow(elementTimestamp, windowDuration); diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableRowToBeamRowFlatTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableRowToBeamRowFlatTest.java index f201b3a8b685..ef59c6ab7466 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableRowToBeamRowFlatTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableRowToBeamRowFlatTest.java @@ -17,13 +17,13 @@ */ package org.apache.beam.sdk.io.gcp.bigtable; -import static org.apache.beam.sdk.io.gcp.bigtable.TestUtils.bigtableRow; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.BOOL_COLUMN; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.DOUBLE_COLUMN; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.FAMILY_TEST; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.LONG_COLUMN; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.STRING_COLUMN; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.TEST_FLAT_SCHEMA; +import static org.apache.beam.sdk.io.gcp.bigtable.BigtableTestUtils.BOOL_COLUMN; +import static org.apache.beam.sdk.io.gcp.bigtable.BigtableTestUtils.DOUBLE_COLUMN; +import static org.apache.beam.sdk.io.gcp.bigtable.BigtableTestUtils.FAMILY_TEST; +import static org.apache.beam.sdk.io.gcp.bigtable.BigtableTestUtils.LONG_COLUMN; +import static org.apache.beam.sdk.io.gcp.bigtable.BigtableTestUtils.STRING_COLUMN; +import static org.apache.beam.sdk.io.gcp.bigtable.BigtableTestUtils.TEST_FLAT_SCHEMA; +import static org.apache.beam.sdk.io.gcp.bigtable.BigtableTestUtils.bigtableRow; import java.util.Map; import java.util.Set; diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableRowToBeamRowTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableRowToBeamRowTest.java index 0a683ed3f68f..dfcefab5728c 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableRowToBeamRowTest.java 
+++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableRowToBeamRowTest.java @@ -18,13 +18,13 @@ package org.apache.beam.sdk.io.gcp.bigtable; import static java.util.stream.Collectors.toList; -import static org.apache.beam.sdk.io.gcp.bigtable.TestUtils.bigtableRow; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.FAMILY_TEST; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.LATER; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.LONG_COLUMN_SCHEMA; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.STRING_COLUMN; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.TEST_FAMILY_SCHEMA; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.TEST_SCHEMA; +import static org.apache.beam.sdk.io.gcp.bigtable.BigtableTestUtils.FAMILY_TEST; +import static org.apache.beam.sdk.io.gcp.bigtable.BigtableTestUtils.LATER; +import static org.apache.beam.sdk.io.gcp.bigtable.BigtableTestUtils.LONG_COLUMN_SCHEMA; +import static org.apache.beam.sdk.io.gcp.bigtable.BigtableTestUtils.STRING_COLUMN; +import static org.apache.beam.sdk.io.gcp.bigtable.BigtableTestUtils.TEST_FAMILY_SCHEMA; +import static org.apache.beam.sdk.io.gcp.bigtable.BigtableTestUtils.TEST_SCHEMA; +import static org.apache.beam.sdk.io.gcp.bigtable.BigtableTestUtils.bigtableRow; import org.apache.beam.sdk.extensions.protobuf.ProtoCoder; import org.apache.beam.sdk.testing.PAssert; @@ -39,9 +39,6 @@ import org.junit.Rule; import org.junit.Test; -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigtableRowToBeamRowTest { @Rule public final TestPipeline pipeline = TestPipeline.create(); @@ -71,8 +68,7 @@ private Row familyRow() { false, Row.withSchema(LONG_COLUMN_SCHEMA).attachValues(2L, LATER, ImmutableList.of("label1")), ImmutableList.of("value1", "value2"), - 5.5, - new byte[] {2, 1, 0}); + 5.5); } private static class SortStringColumn extends SimpleFunction { diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableServiceImplTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableServiceImplTest.java index 13b135a07594..b983cc3ac86a 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableServiceImplTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableServiceImplTest.java @@ -17,6 +17,7 @@ */ package org.apache.beam.sdk.io.gcp.bigtable; +import static org.junit.Assert.assertEquals; import static org.mockito.Matchers.any; import static org.mockito.Matchers.eq; import static org.mockito.Mockito.times; @@ -42,9 +43,15 @@ import java.io.IOException; import java.nio.charset.StandardCharsets; import java.util.Arrays; +import java.util.HashMap; +import org.apache.beam.runners.core.metrics.GcpResourceIdentifiers; +import org.apache.beam.runners.core.metrics.MetricsContainerImpl; +import org.apache.beam.runners.core.metrics.MonitoringInfoConstants; +import org.apache.beam.runners.core.metrics.MonitoringInfoMetricName; import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO.BigtableSource; import org.apache.beam.sdk.io.range.ByteKey; import org.apache.beam.sdk.io.range.ByteKeyRange; +import org.apache.beam.sdk.metrics.MetricsEnvironment; import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider; import org.apache.beam.sdk.values.KV; import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; @@ -59,13 +66,14 @@ /** Unit tests of BigtableServiceImpl. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigtableServiceImplTest { + private static final String PROJECT_ID = "project"; + private static final String INSTANCE_ID = "instance"; + private static final String TABLE_ID = "table"; + private static final BigtableTableName TABLE_NAME = - new BigtableInstanceName("project", "instance").toTableName("table"); + new BigtableInstanceName(PROJECT_ID, INSTANCE_ID).toTableName(TABLE_ID); @Mock private BigtableSession mockSession; @@ -79,10 +87,13 @@ public class BigtableServiceImplTest { public void setup() { MockitoAnnotations.initMocks(this); BigtableOptions options = - new BigtableOptions.Builder().setProjectId("project").setInstanceId("instance").build(); + new BigtableOptions.Builder().setProjectId(PROJECT_ID).setInstanceId(INSTANCE_ID).build(); when(mockSession.getOptions()).thenReturn(options); when(mockSession.createBulkMutation(eq(TABLE_NAME))).thenReturn(mockBulkMutation); when(mockSession.getDataClient()).thenReturn(mockBigtableDataClient); + // Setup the ProcessWideContainer for testing metrics are set. + MetricsContainerImpl container = new MetricsContainerImpl(null); + MetricsEnvironment.setProcessWideContainer(container); } /** @@ -97,7 +108,7 @@ public void testRead() throws IOException { ByteKey start = ByteKey.copyFrom("a".getBytes(StandardCharsets.UTF_8)); ByteKey end = ByteKey.copyFrom("b".getBytes(StandardCharsets.UTF_8)); when(mockBigtableSource.getRanges()).thenReturn(Arrays.asList(ByteKeyRange.of(start, end))); - when(mockBigtableSource.getTableId()).thenReturn(StaticValueProvider.of("table_name")); + when(mockBigtableSource.getTableId()).thenReturn(StaticValueProvider.of(TABLE_ID)); @SuppressWarnings("unchecked") ResultScanner mockResultScanner = Mockito.mock(ResultScanner.class); Row expectedRow = Row.newBuilder().setKey(ByteString.copyFromUtf8("a")).build(); @@ -112,6 +123,7 @@ public void testRead() throws IOException { underTest.close(); verify(mockResultScanner, times(1)).close(); + verifyMetricWasSet("google.bigtable.v2.ReadRows", "ok", 1); } /** @@ -143,4 +155,27 @@ public void testWrite() throws IOException, InterruptedException { underTest.close(); verify(mockBulkMutation, times(1)).flush(); } + + private void verifyMetricWasSet(String method, String status, long count) { + // Verify the metric as reported. 
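// The labels assembled below have to line up exactly with the labels the Bigtable service
// implementation attaches when it reports the API_REQUEST_COUNT monitoring info:
// MonitoringInfoMetricName is keyed on the URN plus the complete label map, so a single
// mismatched label would resolve to a different (empty) counter and the assertion would read 0.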
+ HashMap labels = new HashMap<>(); + labels.put(MonitoringInfoConstants.Labels.PTRANSFORM, ""); + labels.put(MonitoringInfoConstants.Labels.SERVICE, "BigTable"); + labels.put(MonitoringInfoConstants.Labels.METHOD, method); + labels.put( + MonitoringInfoConstants.Labels.RESOURCE, + GcpResourceIdentifiers.bigtableResource(PROJECT_ID, INSTANCE_ID, TABLE_ID)); + labels.put(MonitoringInfoConstants.Labels.BIGTABLE_PROJECT_ID, PROJECT_ID); + labels.put(MonitoringInfoConstants.Labels.INSTANCE_ID, INSTANCE_ID); + labels.put( + MonitoringInfoConstants.Labels.TABLE_ID, + GcpResourceIdentifiers.bigtableTableID(PROJECT_ID, INSTANCE_ID, TABLE_ID)); + labels.put(MonitoringInfoConstants.Labels.STATUS, status); + + MonitoringInfoMetricName name = + MonitoringInfoMetricName.named(MonitoringInfoConstants.Urns.API_REQUEST_COUNT, labels); + MetricsContainerImpl container = + (MetricsContainerImpl) MetricsEnvironment.getProcessWideContainer(); + assertEquals(count, (long) container.getCounter(name).getCumulative()); + } } diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/TestUtils.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableTestUtils.java similarity index 71% rename from sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/TestUtils.java rename to sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableTestUtils.java index d7345cbeccf5..5c5af10b37b7 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/TestUtils.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableTestUtils.java @@ -18,11 +18,12 @@ package org.apache.beam.sdk.io.gcp.bigtable; import static java.nio.charset.StandardCharsets.UTF_8; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.FAMILY_TEST; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.LATER; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.NOW; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.booleanToByteArray; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.doubleToByteArray; +import static org.apache.beam.sdk.io.gcp.bigtable.RowUtils.KEY; +import static org.apache.beam.sdk.io.gcp.bigtable.RowUtils.LABELS; +import static org.apache.beam.sdk.io.gcp.bigtable.RowUtils.TIMESTAMP_MICROS; +import static org.apache.beam.sdk.io.gcp.bigtable.RowUtils.VALUE; +import static org.apache.beam.sdk.io.gcp.testing.BigtableUtils.booleanToByteArray; +import static org.apache.beam.sdk.io.gcp.testing.BigtableUtils.doubleToByteArray; import com.google.bigtable.v2.Cell; import com.google.bigtable.v2.Column; @@ -30,11 +31,48 @@ import com.google.bigtable.v2.Mutation; import com.google.protobuf.ByteString; import java.util.List; +import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.values.KV; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.primitives.Longs; -public class TestUtils { +class BigtableTestUtils { + + static final String BOOL_COLUMN = "boolColumn"; + static final String LONG_COLUMN = "longColumn"; + static final String STRING_COLUMN = "stringColumn"; + static final String DOUBLE_COLUMN = "doubleColumn"; + static final String FAMILY_TEST = "familyTest"; + + static final Schema LONG_COLUMN_SCHEMA = + Schema.builder() + 
.addInt64Field(VALUE) + .addInt64Field(TIMESTAMP_MICROS) + .addArrayField(LABELS, Schema.FieldType.STRING) + .build(); + + static final Schema TEST_FAMILY_SCHEMA = + Schema.builder() + .addBooleanField(BOOL_COLUMN) + .addRowField(LONG_COLUMN, LONG_COLUMN_SCHEMA) + .addArrayField(STRING_COLUMN, Schema.FieldType.STRING) + .addDoubleField(DOUBLE_COLUMN) + .build(); + + static final Schema TEST_SCHEMA = + Schema.builder().addStringField(KEY).addRowField(FAMILY_TEST, TEST_FAMILY_SCHEMA).build(); + + static final Schema TEST_FLAT_SCHEMA = + Schema.builder() + .addStringField(KEY) + .addBooleanField(BOOL_COLUMN) + .addInt64Field(LONG_COLUMN) + .addStringField(STRING_COLUMN) + .addDoubleField(DOUBLE_COLUMN) + .build(); + + static final long NOW = 5_000_000_000L; + static final long LATER = NOW + 1_000L; static com.google.bigtable.v2.Row bigtableRow(long i) { return com.google.bigtable.v2.Row.newBuilder() diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableWriteIT.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableWriteIT.java index 25579a8245fd..0aa8650ca99f 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableWriteIT.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableWriteIT.java @@ -17,7 +17,7 @@ */ package org.apache.beam.sdk.io.gcp.bigtable; -import static org.junit.Assert.assertThat; +import static org.hamcrest.MatcherAssert.assertThat; import com.google.bigtable.admin.v2.ColumnFamily; import com.google.bigtable.admin.v2.CreateTableRequest; @@ -59,9 +59,6 @@ /** End-to-end tests of BigtableWrite. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigtableWriteIT implements Serializable { /** * These tests requires a static instances because the writers go through a serialization step diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/CellValueParserTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/CellValueParserTest.java index fccb20d9817b..0fe6b7f56ce8 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/CellValueParserTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/CellValueParserTest.java @@ -18,23 +18,27 @@ package org.apache.beam.sdk.io.gcp.bigtable; import static java.nio.charset.StandardCharsets.UTF_8; -import static org.apache.beam.sdk.io.gcp.testing.BigtableTestUtils.checkMessage; -import static org.apache.beam.sdk.schemas.Schema.TypeName.BOOLEAN; -import static org.apache.beam.sdk.schemas.Schema.TypeName.BYTE; -import static org.apache.beam.sdk.schemas.Schema.TypeName.BYTES; -import static org.apache.beam.sdk.schemas.Schema.TypeName.DATETIME; -import static org.apache.beam.sdk.schemas.Schema.TypeName.DOUBLE; -import static org.apache.beam.sdk.schemas.Schema.TypeName.FLOAT; -import static org.apache.beam.sdk.schemas.Schema.TypeName.INT16; -import static org.apache.beam.sdk.schemas.Schema.TypeName.INT32; -import static org.apache.beam.sdk.schemas.Schema.TypeName.INT64; +import static org.apache.beam.sdk.schemas.Schema.FieldType.BOOLEAN; +import static org.apache.beam.sdk.schemas.Schema.FieldType.BYTE; +import static org.apache.beam.sdk.schemas.Schema.FieldType.BYTES; +import static 
org.apache.beam.sdk.schemas.Schema.FieldType.DATETIME; +import static org.apache.beam.sdk.schemas.Schema.FieldType.DOUBLE; +import static org.apache.beam.sdk.schemas.Schema.FieldType.FLOAT; +import static org.apache.beam.sdk.schemas.Schema.FieldType.INT16; +import static org.apache.beam.sdk.schemas.Schema.FieldType.INT32; +import static org.apache.beam.sdk.schemas.Schema.FieldType.INT64; +import static org.apache.beam.sdk.schemas.Schema.FieldType.STRING; import static org.apache.beam.sdk.schemas.Schema.TypeName.MAP; -import static org.apache.beam.sdk.schemas.Schema.TypeName.STRING; +import static org.hamcrest.MatcherAssert.assertThat; +import static org.hamcrest.Matchers.containsString; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertThrows; +import static org.junit.Assert.fail; import com.google.bigtable.v2.Cell; import com.google.protobuf.ByteString; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.checkerframework.checker.nullness.qual.Nullable; import org.joda.time.DateTime; import org.joda.time.DateTimeZone; import org.junit.Test; @@ -185,7 +189,9 @@ public void shouldParseBytesType() { public void shouldFailOnUnsupportedType() { byte[] value = new byte[0]; IllegalArgumentException exception = - assertThrows(IllegalArgumentException.class, () -> PARSER.getCellValue(cell(value), MAP)); + assertThrows( + IllegalArgumentException.class, + () -> PARSER.getCellValue(cell(value), FieldType.of(MAP))); checkMessage(exception.getMessage(), "Unsupported cell value type 'MAP'."); } @@ -274,4 +280,12 @@ private ByteString byteString(byte[] bytes) { private Cell cell(byte[] value) { return Cell.newBuilder().setValue(ByteString.copyFrom(value)).build(); } + + private void checkMessage(@Nullable String message, String substring) { + if (message != null) { + assertThat(message, containsString(substring)); + } else { + fail(); + } + } } diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/AdaptiveThrottlerTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/AdaptiveThrottlerTest.java index 4be4b5057880..6db8af3906c3 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/AdaptiveThrottlerTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/AdaptiveThrottlerTest.java @@ -17,11 +17,11 @@ */ package org.apache.beam.sdk.io.gcp.datastore; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.closeTo; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.greaterThan; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import java.util.Random; import org.junit.Test; diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/DataStoreV1SchemaIOProviderTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/DataStoreV1SchemaIOProviderTest.java index 5ce5f72a3262..147474496e7b 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/DataStoreV1SchemaIOProviderTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/DataStoreV1SchemaIOProviderTest.java @@ -31,9 +31,6 @@ import org.junit.runners.JUnit4; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) 
public class DataStoreV1SchemaIOProviderTest { static final String DEFAULT_KEY_FIELD = "__key__"; public static final String KEY_FIELD_PROPERTY = "keyField"; diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1Test.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1Test.java index 4d752847c19a..8ab31dc29802 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1Test.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1Test.java @@ -34,13 +34,14 @@ import static org.apache.beam.sdk.io.gcp.datastore.DatastoreV1.Read.translateGqlQueryWithLimitCheck; import static org.apache.beam.sdk.io.gcp.datastore.DatastoreV1.isValidKey; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasDisplayItem; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.greaterThanOrEqualTo; import static org.hamcrest.Matchers.hasItem; import static org.hamcrest.Matchers.lessThanOrEqualTo; +import static org.hamcrest.Matchers.not; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertNotEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.mockito.Matchers.any; import static org.mockito.Matchers.eq; @@ -69,9 +70,13 @@ import com.google.rpc.Code; import java.util.ArrayList; import java.util.Date; +import java.util.HashMap; import java.util.HashSet; import java.util.List; import java.util.Set; +import org.apache.beam.runners.core.metrics.MetricsContainerImpl; +import org.apache.beam.runners.core.metrics.MonitoringInfoConstants; +import org.apache.beam.runners.core.metrics.MonitoringInfoMetricName; import org.apache.beam.sdk.Pipeline; import org.apache.beam.sdk.io.gcp.datastore.DatastoreV1.DatastoreWriterFn; import org.apache.beam.sdk.io.gcp.datastore.DatastoreV1.DeleteEntity; @@ -84,6 +89,7 @@ import org.apache.beam.sdk.io.gcp.datastore.DatastoreV1.UpsertFn; import org.apache.beam.sdk.io.gcp.datastore.DatastoreV1.V1DatastoreFactory; import org.apache.beam.sdk.io.gcp.datastore.DatastoreV1.Write; +import org.apache.beam.sdk.metrics.MetricsEnvironment; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.options.ValueProvider; @@ -108,9 +114,6 @@ /** Tests for {@link DatastoreV1}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DatastoreV1Test { private static final String PROJECT_ID = "testProject"; private static final String NAMESPACE = "testNamespace"; @@ -143,6 +146,9 @@ public void setUp() { when(mockDatastoreFactory.getDatastore(any(PipelineOptions.class), any(String.class), any())) .thenReturn(mockDatastore); when(mockDatastoreFactory.getQuerySplitter()).thenReturn(mockQuerySplitter); + // Setup the ProcessWideContainer for testing metrics are set. 
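// The request counters checked by verifyMetricWasSet below are reported to the process-wide
// metrics container rather than to a per-bundle container, so a fresh container is installed
// before every test to keep the expected counts deterministic across test methods.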
+ MetricsContainerImpl container = new MetricsContainerImpl(null); + MetricsEnvironment.setProcessWideContainer(container); } @Test @@ -295,8 +301,10 @@ public void testDeleteKeyDisplayData() { @Test public void testWritePrimitiveDisplayData() { + int hintNumWorkers = 10; DisplayDataEvaluator evaluator = DisplayDataEvaluator.create(); - PTransform, ?> write = DatastoreIO.v1().write().withProjectId("myProject"); + PTransform, ?> write = + DatastoreIO.v1().write().withProjectId("myProject").withHintNumWorkers(hintNumWorkers); Set displayData = evaluator.displayDataForPrimitiveTransforms(write); assertThat( @@ -307,13 +315,42 @@ public void testWritePrimitiveDisplayData() { "DatastoreIO write should include the upsertFn in its primitive display data", displayData, hasItem(hasDisplayItem("upsertFn"))); + assertThat( + "DatastoreIO write should include ramp-up throttling worker count hint if enabled", + displayData, + hasItem(hasDisplayItem("hintNumWorkers", hintNumWorkers))); + } + + @Test + public void testWritePrimitiveDisplayDataDisabledThrottler() { + DisplayDataEvaluator evaluator = DisplayDataEvaluator.create(); + PTransform, ?> write = + DatastoreIO.v1().write().withProjectId("myProject").withRampupThrottlingDisabled(); + + Set displayData = evaluator.displayDataForPrimitiveTransforms(write); + assertThat( + "DatastoreIO write should include the project in its primitive display data", + displayData, + hasItem(hasDisplayItem("projectId"))); + assertThat( + "DatastoreIO write should include the upsertFn in its primitive display data", + displayData, + hasItem(hasDisplayItem("upsertFn"))); + assertThat( + "DatastoreIO write should include ramp-up throttling worker count hint if enabled", + displayData, + not(hasItem(hasDisplayItem("hintNumWorkers")))); } @Test public void testDeleteEntityPrimitiveDisplayData() { + int hintNumWorkers = 10; DisplayDataEvaluator evaluator = DisplayDataEvaluator.create(); PTransform, ?> write = - DatastoreIO.v1().deleteEntity().withProjectId("myProject"); + DatastoreIO.v1() + .deleteEntity() + .withProjectId("myProject") + .withHintNumWorkers(hintNumWorkers); Set displayData = evaluator.displayDataForPrimitiveTransforms(write); assertThat( @@ -324,12 +361,18 @@ public void testDeleteEntityPrimitiveDisplayData() { "DatastoreIO write should include the deleteEntityFn in its primitive display data", displayData, hasItem(hasDisplayItem("deleteEntityFn"))); + assertThat( + "DatastoreIO write should include ramp-up throttling worker count hint if enabled", + displayData, + hasItem(hasDisplayItem("hintNumWorkers", hintNumWorkers))); } @Test public void testDeleteKeyPrimitiveDisplayData() { + int hintNumWorkers = 10; DisplayDataEvaluator evaluator = DisplayDataEvaluator.create(); - PTransform, ?> write = DatastoreIO.v1().deleteKey().withProjectId("myProject"); + PTransform, ?> write = + DatastoreIO.v1().deleteKey().withProjectId("myProject").withHintNumWorkers(hintNumWorkers); Set displayData = evaluator.displayDataForPrimitiveTransforms(write); assertThat( @@ -340,6 +383,10 @@ public void testDeleteKeyPrimitiveDisplayData() { "DatastoreIO write should include the deleteKeyFn in its primitive display data", displayData, hasItem(hasDisplayItem("deleteKeyFn"))); + assertThat( + "DatastoreIO write should include ramp-up throttling worker count hint if enabled", + displayData, + hasItem(hasDisplayItem("hintNumWorkers", hintNumWorkers))); } /** Test building a Write using builder methods. 
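 * <p>With the ramp-up throttling options exercised above, a write can be configured along these
 * lines (a sketch; the project id and worker-count hint are illustrative values, not defaults):
 *
 * <pre>{@code
 * // Hint the expected number of workers so the ramp-up throttler can account for them.
 * DatastoreIO.v1().write().withProjectId("myProject").withHintNumWorkers(10);
 * // Or opt out of ramp-up throttling entirely.
 * DatastoreIO.v1().write().withProjectId("myProject").withRampupThrottlingDisabled();
 * }</pre>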
*/ @@ -465,12 +512,14 @@ public void testDatastoreWriteFnDisplayData() { @Test public void testDatatoreWriterFnWithOneBatch() throws Exception { datastoreWriterFnTest(100); + verifyMetricWasSet("BatchDatastoreWrite", "ok", "", 2); } /** Tests {@link DatastoreWriterFn} with entities of more than one batches, but not a multiple. */ @Test public void testDatatoreWriterFnWithMultipleBatches() throws Exception { datastoreWriterFnTest(DatastoreV1.DATASTORE_BATCH_UPDATE_ENTITIES_START * 3 + 100); + verifyMetricWasSet("BatchDatastoreWrite", "ok", "", 5); } /** @@ -480,6 +529,7 @@ public void testDatatoreWriterFnWithMultipleBatches() throws Exception { @Test public void testDatatoreWriterFnWithBatchesExactMultiple() throws Exception { datastoreWriterFnTest(DatastoreV1.DATASTORE_BATCH_UPDATE_ENTITIES_START * 2); + verifyMetricWasSet("BatchDatastoreWrite", "ok", "", 2); } // A helper method to test DatastoreWriterFn for various batch sizes. @@ -573,6 +623,8 @@ public void testDatatoreWriterFnRetriesErrors() throws Exception { DoFnTester doFnTester = DoFnTester.of(datastoreWriter); doFnTester.setCloningBehavior(CloningBehavior.DO_NOT_CLONE); doFnTester.processBundle(mutations); + verifyMetricWasSet("BatchDatastoreWrite", "ok", "", 2); + verifyMetricWasSet("BatchDatastoreWrite", "unknown", "", 1); } /** @@ -686,18 +738,21 @@ public void testSplitQueryFnWithQueryLimit() throws Exception { @Test public void testReadFnWithOneBatch() throws Exception { readFnTest(5); + verifyMetricWasSet("BatchDatastoreRead", "ok", NAMESPACE, 1); } /** Tests {@link ReadFn} with a query limit more than one batch, and not a multiple. */ @Test public void testReadFnWithMultipleBatches() throws Exception { readFnTest(QUERY_BATCH_LIMIT + 5); + verifyMetricWasSet("BatchDatastoreRead", "ok", NAMESPACE, 2); } /** Tests {@link ReadFn} for several batches, using an exact multiple of batch size results. */ @Test public void testReadFnWithBatchesExactMultiple() throws Exception { readFnTest(5 * QUERY_BATCH_LIMIT); + verifyMetricWasSet("BatchDatastoreRead", "ok", NAMESPACE, 5); } /** Tests that {@link ReadFn} retries after an error. 
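 * <p>Because the first read attempt fails and the retry succeeds, the test also expects one
 * {@code BatchDatastoreRead} request counter with status {@code ok} and one with status
 * {@code unknown} (see the {@code verifyMetricWasSet} calls below).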
*/ @@ -719,6 +774,8 @@ public void testReadFnRetriesErrors() throws Exception { DoFnTester doFnTester = DoFnTester.of(readFn); doFnTester.setCloningBehavior(CloningBehavior.DO_NOT_CLONE); doFnTester.processBundle(query); + verifyMetricWasSet("BatchDatastoreRead", "ok", NAMESPACE, 1); + verifyMetricWasSet("BatchDatastoreRead", "unknown", NAMESPACE, 1); } @Test @@ -834,15 +891,15 @@ public void testWriteBatcherSlowQueries() { writeBatcher.start(); writeBatcher.addRequestLatency(0, 10000, 200); writeBatcher.addRequestLatency(0, 10000, 200); - assertEquals(100, writeBatcher.nextBatchSize(0)); + assertEquals(120, writeBatcher.nextBatchSize(0)); } @Test public void testWriteBatcherSizeNotBelowMinimum() { DatastoreV1.WriteBatcher writeBatcher = new DatastoreV1.WriteBatcherImpl(); writeBatcher.start(); - writeBatcher.addRequestLatency(0, 30000, 50); - writeBatcher.addRequestLatency(0, 30000, 50); + writeBatcher.addRequestLatency(0, 75000, 50); + writeBatcher.addRequestLatency(0, 75000, 50); assertEquals(DatastoreV1.DATASTORE_BATCH_UPDATE_ENTITIES_MIN, writeBatcher.nextBatchSize(0)); } @@ -851,9 +908,9 @@ public void testWriteBatcherSlidingWindow() { DatastoreV1.WriteBatcher writeBatcher = new DatastoreV1.WriteBatcherImpl(); writeBatcher.start(); writeBatcher.addRequestLatency(0, 30000, 50); - writeBatcher.addRequestLatency(50000, 5000, 200); - writeBatcher.addRequestLatency(100000, 5000, 200); - assertEquals(200, writeBatcher.nextBatchSize(150000)); + writeBatcher.addRequestLatency(50000, 8000, 200); + writeBatcher.addRequestLatency(100000, 8000, 200); + assertEquals(150, writeBatcher.nextBatchSize(150000)); } /** Helper Methods */ @@ -1022,4 +1079,24 @@ public int nextBatchSize(long timeSinceEpochMillis) { return DatastoreV1.DATASTORE_BATCH_UPDATE_ENTITIES_START; } } + + private void verifyMetricWasSet(String method, String status, String namespace, long count) { + // Verify the metric as reported. 
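// Reconstruct the label map that the Datastore read/write fns attach to the API_REQUEST_COUNT
// monitoring info for a single batch RPC: the method and status labels come from the caller,
// while the RESOURCE, project, and namespace labels must match the resource string the connector
// reports for this project/namespace combination.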
+ HashMap labels = new HashMap<>(); + labels.put(MonitoringInfoConstants.Labels.PTRANSFORM, ""); + labels.put(MonitoringInfoConstants.Labels.SERVICE, "Datastore"); + labels.put(MonitoringInfoConstants.Labels.METHOD, method); + labels.put( + MonitoringInfoConstants.Labels.RESOURCE, + "//bigtable.googleapis.com/projects/" + PROJECT_ID + "/namespaces/" + namespace); + labels.put(MonitoringInfoConstants.Labels.DATASTORE_PROJECT, PROJECT_ID); + labels.put(MonitoringInfoConstants.Labels.DATASTORE_NAMESPACE, namespace); + labels.put(MonitoringInfoConstants.Labels.STATUS, status); + + MonitoringInfoMetricName name = + MonitoringInfoMetricName.named(MonitoringInfoConstants.Urns.API_REQUEST_COUNT, labels); + MetricsContainerImpl container = + (MetricsContainerImpl) MetricsEnvironment.getProcessWideContainer(); + assertEquals(count, (long) container.getCounter(name).getCumulative()); + } } diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/EntityToRowRowToEntityTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/EntityToRowRowToEntityTest.java index 159a13164977..7026d1fb34e7 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/EntityToRowRowToEntityTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/EntityToRowRowToEntityTest.java @@ -31,7 +31,7 @@ import com.google.datastore.v1.Key; import com.google.datastore.v1.Value; import com.google.protobuf.ByteString; -import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; import java.util.Arrays; import java.util.Collections; import java.util.UUID; @@ -51,9 +51,6 @@ import org.junit.runners.JUnit4; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class EntityToRowRowToEntityTest { private static final String KIND = "kind"; private static final String UUID_VALUE = UUID.randomUUID().toString(); @@ -91,7 +88,7 @@ public class EntityToRowRowToEntityTest { makeValue(Collections.singletonList(makeValue(NESTED_ENTITY).build())).build()) .putProperties("double", makeValue(Double.MAX_VALUE).build()) .putProperties( - "bytes", makeValue(ByteString.copyFrom("hello", Charset.defaultCharset())).build()) + "bytes", makeValue(ByteString.copyFrom("hello", StandardCharsets.UTF_8)).build()) .putProperties("string", makeValue("string").build()) .putProperties("nullable", Value.newBuilder().build()) .build(); @@ -105,7 +102,7 @@ public class EntityToRowRowToEntityTest { Arrays.asList("string1", "string2"), Collections.singletonList(row(NESTED_ROW_SCHEMA, Long.MIN_VALUE)), Double.MAX_VALUE, - "hello".getBytes(Charset.defaultCharset()), + "hello".getBytes(StandardCharsets.UTF_8), "string", null); diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/RampupThrottlingFnTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/RampupThrottlingFnTest.java new file mode 100644 index 000000000000..6c57a8485578 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/RampupThrottlingFnTest.java @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.datastore; + +import static org.junit.Assert.assertThrows; +import static org.mockito.Mockito.verify; + +import java.util.Map; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.transforms.DoFnTester; +import org.apache.beam.sdk.transforms.DoFnTester.CloningBehavior; +import org.apache.beam.sdk.util.Sleeper; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.joda.time.DateTimeUtils; +import org.joda.time.Duration; +import org.junit.Before; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; +import org.mockito.Mock; +import org.mockito.MockitoAnnotations; + +/** Tests for {@link RampupThrottlingFn}. */ +@RunWith(JUnit4.class) +public class RampupThrottlingFnTest { + + @Mock private Counter mockCounter; + private final Sleeper mockSleeper = + millis -> { + verify(mockCounter).inc(millis); + throw new RampupDelayException(); + }; + private DoFnTester rampupThrottlingFnTester; + + @Before + public void setUp() throws Exception { + MockitoAnnotations.openMocks(this); + + DateTimeUtils.setCurrentMillisFixed(0); + RampupThrottlingFn rampupThrottlingFn = new RampupThrottlingFn<>(1); + rampupThrottlingFnTester = DoFnTester.of(rampupThrottlingFn); + rampupThrottlingFnTester.setCloningBehavior(CloningBehavior.DO_NOT_CLONE); + rampupThrottlingFnTester.startBundle(); + rampupThrottlingFn.sleeper = mockSleeper; + rampupThrottlingFn.throttlingMsecs = mockCounter; + } + + @Test + public void testRampupThrottler() throws Exception { + Map rampupSchedule = + ImmutableMap.builder() + .put(Duration.ZERO, 500) + .put(Duration.millis(1), 0) + .put(Duration.standardSeconds(1), 500) + .put(Duration.standardSeconds(1).plus(Duration.millis(1)), 0) + .put(Duration.standardMinutes(5), 500) + .put(Duration.standardMinutes(10), 750) + .put(Duration.standardMinutes(15), 1125) + .put(Duration.standardMinutes(30), 3796) + .put(Duration.standardMinutes(60), 43248) + .build(); + + for (Map.Entry entry : rampupSchedule.entrySet()) { + DateTimeUtils.setCurrentMillisFixed(entry.getKey().getMillis()); + for (int i = 0; i < entry.getValue(); i++) { + rampupThrottlingFnTester.processElement(null); + } + assertThrows(RampupDelayException.class, () -> rampupThrottlingFnTester.processElement(null)); + } + } + + static class RampupDelayException extends InterruptedException {} +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/SplitQueryFnIT.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/SplitQueryFnIT.java index 2fa80eaa4bb9..6d0bd52f26dd 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/SplitQueryFnIT.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/SplitQueryFnIT.java @@ -49,9 +49,6 @@ * */ @RunWith(JUnit4.class) 
-@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SplitQueryFnIT { /** Tests {@link SplitQueryFn} to generate expected number of splits for a large dataset. */ @Test diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/V1ReadIT.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/V1ReadIT.java index 6c8c3c425607..55b53b3f0c91 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/V1ReadIT.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/V1ReadIT.java @@ -44,9 +44,6 @@ /** End-to-end tests for Datastore DatastoreV1.Read. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class V1ReadIT { private V1TestOptions options; private String project; diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/V1TestUtil.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/V1TestUtil.java index 37929c9605f6..e3057c4b90ca 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/V1TestUtil.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/V1TestUtil.java @@ -62,9 +62,6 @@ import org.slf4j.Logger; import org.slf4j.LoggerFactory; -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) class V1TestUtil { private static final Logger LOG = LoggerFactory.getLogger(V1TestUtil.class); diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/V1WriteIT.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/V1WriteIT.java index 3ccb57b334b1..78a50712e43a 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/V1WriteIT.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/V1WriteIT.java @@ -37,9 +37,6 @@ /** End-to-end tests for Datastore DatastoreV1.Write. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class V1WriteIT { private V1TestOptions options; private String project; diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/BaseFirestoreFnTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/BaseFirestoreFnTest.java new file mode 100644 index 000000000000..8c8c0558974b --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/BaseFirestoreFnTest.java @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.firestore; + +import static org.mockito.ArgumentMatchers.any; +import static org.mockito.Mockito.when; + +import com.google.auth.Credentials; +import org.apache.beam.sdk.extensions.gcp.options.GcpOptions; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.util.SerializableUtils; +import org.junit.Before; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.mockito.Mock; +import org.mockito.junit.MockitoJUnitRunner; + +@RunWith(MockitoJUnitRunner.class) +@SuppressWarnings( + "initialization.fields.uninitialized") // mockito fields are initialized via the Mockito Runner +abstract class BaseFirestoreFnTest> { + + protected final String projectId = "testing-project"; + + // @Rule public final Timeout timeout = new Timeout(10, TimeUnit.SECONDS); + @Mock protected DoFn.StartBundleContext startBundleContext; + @Mock protected DoFn.ProcessContext processContext; + + @Mock(lenient = true) + protected DisplayData.Builder displayDataBuilder; + + @Mock(lenient = true) + protected PipelineOptions pipelineOptions; + + @Mock(lenient = true) + protected GcpOptions gcpOptions; + + @Mock(lenient = true) + protected FirestoreOptions firestoreOptions; + + @Mock protected Credentials credentials; + + @Before + public void stubDisplayDataBuilderChains() { + when(displayDataBuilder.include(any(), any())).thenReturn(displayDataBuilder); + when(displayDataBuilder.add(any())).thenReturn(displayDataBuilder); + when(displayDataBuilder.addIfNotNull(any())).thenReturn(displayDataBuilder); + when(displayDataBuilder.addIfNotDefault(any(), any())).thenReturn(displayDataBuilder); + } + + @Before + public void stubPipelineOptions() { + when(startBundleContext.getPipelineOptions()).thenReturn(pipelineOptions); + when(pipelineOptions.as(FirestoreOptions.class)).thenReturn(firestoreOptions); + when(firestoreOptions.getEmulatorHost()).thenReturn(null); + when(pipelineOptions.as(GcpOptions.class)).thenReturn(gcpOptions); + when(gcpOptions.getProject()).thenReturn(projectId); + when(gcpOptions.getGcpCredential()).thenReturn(credentials); + } + + @Test + public final void ensureSerializable() { + SerializableUtils.ensureSerializable(getFn()); + } + + protected abstract FnT getFn(); + + protected final void runFunction(FnT fn) throws Exception { + runFunction(fn, 1); + } + + protected final void runFunction(FnT fn, int processElementCount) throws Exception { + fn.populateDisplayData(displayDataBuilder); + fn.setup(); + fn.startBundle(startBundleContext); + processElementsAndFinishBundle(fn, processElementCount); + } + + protected abstract void processElementsAndFinishBundle(FnT fn, int processElementCount) + throws Exception; +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/BaseFirestoreV1FnTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/BaseFirestoreV1FnTest.java new file mode 100644 index 000000000000..39e0db3ee194 --- 
/dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/BaseFirestoreV1FnTest.java @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.firestore; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.fail; + +import com.google.api.gax.grpc.GrpcStatusCode; +import com.google.api.gax.rpc.ApiException; +import com.google.api.gax.rpc.ApiExceptionFactory; +import com.google.cloud.firestore.v1.stub.FirestoreStub; +import io.grpc.Status.Code; +import java.net.SocketTimeoutException; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1RpcAttemptContexts.HasRpcAttemptContext; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1RpcAttemptContexts.V1FnRpcAttemptContext; +import org.apache.beam.sdk.io.gcp.firestore.RpcQos.RpcAttempt.Context; +import org.joda.time.Instant; +import org.junit.Test; +import org.mockito.Mock; + +@SuppressWarnings( + "initialization.fields.uninitialized") // mockito fields are initialized via the Mockito Runner +abstract class BaseFirestoreV1FnTest< + InT, OutT, FnT extends FirestoreDoFn & HasRpcAttemptContext> + extends BaseFirestoreFnTest { + protected static final ApiException RETRYABLE_ERROR = + ApiExceptionFactory.createException( + new SocketTimeoutException("retryableError"), GrpcStatusCode.of(Code.CANCELLED), true); + + protected final RpcQosOptions rpcQosOptions = RpcQosOptions.newBuilder().build(); + + protected final JodaClock clock = new MonotonicJodaClock(); + @Mock protected FirestoreStatefulComponentFactory ff; + @Mock protected FirestoreStub stub; + @Mock protected RpcQos rpcQos; + + @Test + public abstract void attemptsExhaustedForRetryableError() throws Exception; + + @Test + public abstract void noRequestIsSentIfNotSafeToProceed() throws Exception; + + @Test + public final void contextNamespaceMatchesPublicAPIDefinedValue() { + FnT fn = getFn(); + Context rpcAttemptContext = fn.getRpcAttemptContext(); + if (rpcAttemptContext instanceof V1FnRpcAttemptContext) { + V1FnRpcAttemptContext v1FnRpcAttemptContext = (V1FnRpcAttemptContext) rpcAttemptContext; + switch (v1FnRpcAttemptContext) { + case BatchGetDocuments: + assertEquals( + "org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.BatchGetDocuments", + v1FnRpcAttemptContext.getNamespace()); + break; + case BatchWrite: + assertEquals( + "org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.BatchWrite", + v1FnRpcAttemptContext.getNamespace()); + break; + case ListCollectionIds: + assertEquals( + "org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.ListCollectionIds", + v1FnRpcAttemptContext.getNamespace()); + break; + case ListDocuments: + assertEquals( + "org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.ListDocuments", + 
v1FnRpcAttemptContext.getNamespace()); + break; + case PartitionQuery: + assertEquals( + "org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.PartitionQuery", + v1FnRpcAttemptContext.getNamespace()); + break; + case RunQuery: + assertEquals( + "org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.RunQuery", + v1FnRpcAttemptContext.getNamespace()); + break; + default: + fail("Unverified V1FnRpcAttemptContext value"); + } + } + } + + private static class MonotonicJodaClock implements JodaClock { + private long counter = 0; + + @Override + public Instant instant() { + return Instant.ofEpochMilli(counter++); + } + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/BaseFirestoreV1ReadFnTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/BaseFirestoreV1ReadFnTest.java new file mode 100644 index 000000000000..0aab59d3aacd --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/BaseFirestoreV1ReadFnTest.java @@ -0,0 +1,149 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.firestore; + +import static org.junit.Assert.assertSame; +import static org.junit.Assert.fail; +import static org.mockito.ArgumentMatchers.any; +import static org.mockito.ArgumentMatchers.eq; +import static org.mockito.Mockito.doNothing; +import static org.mockito.Mockito.times; +import static org.mockito.Mockito.verify; +import static org.mockito.Mockito.verifyNoMoreInteractions; +import static org.mockito.Mockito.when; + +import com.google.api.gax.rpc.ApiException; +import com.google.cloud.firestore.v1.stub.FirestoreStub; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1ReadFn.BaseFirestoreV1ReadFn; +import org.junit.Test; +import org.mockito.Mock; + +@SuppressWarnings( + "initialization.fields.uninitialized") // mockito fields are initialized via the Mockito Runner +abstract class BaseFirestoreV1ReadFnTest<InT, OutT> + extends BaseFirestoreV1FnTest<InT, OutT, BaseFirestoreV1ReadFn<InT, OutT>> { + + @Mock protected RpcQos.RpcReadAttempt attempt; + + @Override + @Test + public final void attemptsExhaustedForRetryableError() throws Exception { + BaseFirestoreV1ReadFn<InT, OutT> fn = getFn(clock, ff, rpcQosOptions); + V1RpcFnTestCtx ctx = newCtx(); + when(ff.getFirestoreStub(any())).thenReturn(stub); + when(ff.getRpcQos(any())).thenReturn(rpcQos); + when(rpcQos.newReadAttempt(fn.getRpcAttemptContext())).thenReturn(attempt); + ctx.mockRpcToCallable(stub); + + when(attempt.awaitSafeToProceed(any())).thenReturn(true, true, true); + ctx.whenCallableCall(any(), RETRYABLE_ERROR, RETRYABLE_ERROR, RETRYABLE_ERROR); + doNothing().when(attempt).recordRequestFailed(any()); + doNothing() + .doNothing() + .doThrow(RETRYABLE_ERROR) + .when(attempt) + .checkCanRetry(any(), eq(RETRYABLE_ERROR)); + + when(processContext.element()).thenReturn(ctx.getRequest()); + + try { + runFunction(fn); + fail("Expected ApiException to be thrown after attempts were exhausted"); + } catch (ApiException e) { + assertSame(RETRYABLE_ERROR, e); + } + + verify(attempt, times(3)).awaitSafeToProceed(any()); + verify(attempt, times(3)).recordRequestFailed(any()); + verify(attempt, times(0)).recordStreamValue(any()); + verify(attempt, times(0)).recordRequestSuccessful(any()); + } + + @Override + @Test + public final void noRequestIsSentIfNotSafeToProceed() throws Exception { + BaseFirestoreV1ReadFn<InT, OutT> fn = getFn(clock, ff, rpcQosOptions); + V1RpcFnTestCtx ctx = newCtx(); + when(ff.getFirestoreStub(any())).thenReturn(stub); + when(ff.getRpcQos(any())).thenReturn(rpcQos); + when(rpcQos.newReadAttempt(fn.getRpcAttemptContext())).thenReturn(attempt); + + InterruptedException interruptedException = new InterruptedException(); + when(attempt.awaitSafeToProceed(any())).thenReturn(false).thenThrow(interruptedException); + + when(processContext.element()).thenReturn(ctx.getRequest()); + + try { + runFunction(fn); + fail("Expected InterruptedException to be thrown when it is not safe to proceed"); + } catch (InterruptedException e) { + assertSame(interruptedException, e); + } + + verify(stub, times(1)).close(); + verifyNoMoreInteractions(stub); + ctx.verifyNoInteractionsWithCallable(); + verify(attempt, times(0)).recordRequestFailed(any()); + verify(attempt, times(0)).recordStreamValue(any()); + verify(attempt, times(0)).recordRequestSuccessful(any()); + } + + @Test + public abstract void resumeFromLastReadValue() throws Exception; + + protected abstract V1RpcFnTestCtx newCtx(); + + @Override + protected final BaseFirestoreV1ReadFn<InT, OutT> getFn() { + return getFn(JodaClock.DEFAULT, FirestoreStatefulComponentFactory.INSTANCE, rpcQosOptions); + } + + protected abstract BaseFirestoreV1ReadFn<InT, OutT> 
getFn( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions); + + @Override + protected void processElementsAndFinishBundle( + BaseFirestoreV1ReadFn<InT, OutT> fn, int processElementCount) throws Exception { + try { + for (int i = 0; i < processElementCount; i++) { + fn.processElement(processContext); + } + } finally { + fn.finishBundle(); + } + } + + /** + * This class exists because gax does not provide a common parent interface for the different + * Callable types it generates. + */ + protected abstract class V1RpcFnTestCtx { + + protected V1RpcFnTestCtx() {} + + public abstract InT getRequest(); + + public abstract void mockRpcToCallable(FirestoreStub stub); + + public abstract void whenCallableCall(InT in, Throwable... throwables); + + public abstract void verifyNoInteractionsWithCallable(); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/BaseFirestoreV1WriteFnTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/BaseFirestoreV1WriteFnTest.java new file mode 100644 index 000000000000..727fdc07d42a --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/BaseFirestoreV1WriteFnTest.java @@ -0,0 +1,919 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.firestore; + +import static org.apache.beam.sdk.io.gcp.firestore.FirestoreProtoHelpers.newWrite; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists.newArrayList; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertNotNull; +import static org.junit.Assert.assertSame; +import static org.junit.Assert.assertTrue; +import static org.junit.Assert.fail; +import static org.mockito.AdditionalMatchers.gt; +import static org.mockito.ArgumentMatchers.any; +import static org.mockito.ArgumentMatchers.anyInt; +import static org.mockito.ArgumentMatchers.eq; +import static org.mockito.Mockito.atLeastOnce; +import static org.mockito.Mockito.doNothing; +import static org.mockito.Mockito.doThrow; +import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.never; +import static org.mockito.Mockito.times; +import static org.mockito.Mockito.verify; +import static org.mockito.Mockito.verifyNoMoreInteractions; +import static org.mockito.Mockito.when; +import static org.powermock.api.mockito.PowerMockito.spy; + +import com.google.api.gax.grpc.GrpcStatusCode; +import com.google.api.gax.rpc.ApiException; +import com.google.api.gax.rpc.ApiExceptionFactory; +import com.google.api.gax.rpc.UnaryCallable; +import com.google.firestore.v1.BatchWriteRequest; +import com.google.firestore.v1.BatchWriteResponse; +import com.google.firestore.v1.Write; +import com.google.firestore.v1.WriteResult; +import com.google.rpc.Code; +import com.google.rpc.Status; +import java.io.IOException; +import java.lang.reflect.Method; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Random; +import java.util.concurrent.atomic.AtomicLong; +import java.util.function.Function; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1RpcAttemptContexts.HasRpcAttemptContext; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1WriteFn.BaseBatchWriteFn; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1WriteFn.WriteElement; +import org.apache.beam.sdk.io.gcp.firestore.RpcQos.RpcWriteAttempt; +import org.apache.beam.sdk.io.gcp.firestore.RpcQos.RpcWriteAttempt.Element; +import org.apache.beam.sdk.io.gcp.firestore.RpcQos.RpcWriteAttempt.FlushBuffer; +import org.apache.beam.sdk.io.gcp.firestore.RpcQosImpl.FlushBufferImpl; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.Distribution; +import org.apache.beam.sdk.metrics.MetricName; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.windowing.BoundedWindow; +import org.apache.beam.sdk.util.Sleeper; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.joda.time.Duration; +import org.joda.time.Instant; +import org.junit.Before; +import org.junit.Test; +import org.mockito.ArgumentCaptor; +import org.mockito.Mock; +import org.mockito.invocation.InvocationOnMock; +import org.mockito.stubbing.Answer; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +@SuppressWarnings( + "initialization.fields.uninitialized") // mockito fields are initialized via the Mockito Runner +public abstract class BaseFirestoreV1WriteFnTest< + OutT, FnT extends BaseBatchWriteFn & HasRpcAttemptContext> + extends BaseFirestoreV1FnTest { + private static final Logger LOG = LoggerFactory.getLogger(BaseFirestoreV1WriteFnTest.class); + + protected static final Status STATUS_OK = + 
Status.newBuilder().setCode(Code.OK.getNumber()).build(); + protected static final Status STATUS_DEADLINE_EXCEEDED = + Status.newBuilder().setCode(Code.DEADLINE_EXCEEDED.getNumber()).build(); + + @Mock(lenient = true) + protected BoundedWindow window; + + @Mock protected DoFn.FinishBundleContext finishBundleContext; + @Mock protected UnaryCallable callable; + @Mock protected RpcQos.RpcWriteAttempt attempt; + @Mock protected RpcQos.RpcWriteAttempt attempt2; + + protected MetricsFixture metricsFixture; + + @Before + public final void setUp() { + when(rpcQos.newWriteAttempt(any())).thenReturn(attempt, attempt2); + + when(ff.getRpcQos(any())).thenReturn(rpcQos); + when(ff.getFirestoreStub(pipelineOptions)).thenReturn(stub); + when(stub.batchWriteCallable()).thenReturn(callable); + metricsFixture = new MetricsFixture(); + } + + @Override + @Test + public final void attemptsExhaustedForRetryableError() throws Exception { + Instant attemptStart = Instant.ofEpochMilli(0); + Instant rpc1Start = Instant.ofEpochMilli(1); + Instant rpc1End = Instant.ofEpochMilli(2); + Instant rpc2Start = Instant.ofEpochMilli(3); + Instant rpc2End = Instant.ofEpochMilli(4); + Instant rpc3Start = Instant.ofEpochMilli(5); + Instant rpc3End = Instant.ofEpochMilli(6); + Write write = newWrite(); + Element element1 = new WriteElement(0, write, window); + + when(ff.getFirestoreStub(any())).thenReturn(stub); + when(ff.getRpcQos(any())).thenReturn(rpcQos); + when(rpcQos.newWriteAttempt(FirestoreV1RpcAttemptContexts.V1FnRpcAttemptContext.BatchWrite)) + .thenReturn(attempt); + when(stub.batchWriteCallable()).thenReturn(callable); + + FlushBuffer> flushBuffer = spy(newFlushBuffer(rpcQosOptions)); + when(attempt.awaitSafeToProceed(any())).thenReturn(true); + when(attempt.>newFlushBuffer(attemptStart)).thenReturn(flushBuffer); + when(flushBuffer.offer(element1)).thenReturn(true); + when(flushBuffer.iterator()).thenReturn(newArrayList(element1).iterator()); + when(flushBuffer.getBufferedElementsCount()).thenReturn(1); + when(flushBuffer.isFull()).thenReturn(true); + + when(callable.call(any())).thenThrow(RETRYABLE_ERROR, RETRYABLE_ERROR, RETRYABLE_ERROR); + doNothing().when(attempt).recordWriteCounts(any(), anyInt(), anyInt()); + doNothing() + .doNothing() + .doThrow(RETRYABLE_ERROR) + .when(attempt) + .checkCanRetry(any(), eq(RETRYABLE_ERROR)); + + when(processContext.element()).thenReturn(write); + + try { + runFunction( + getFn(clock, ff, rpcQosOptions, CounterFactory.DEFAULT, DistributionFactory.DEFAULT)); + fail("Expected ApiException to be throw after exhausted attempts"); + } catch (ApiException e) { + assertSame(RETRYABLE_ERROR, e); + } + + verify(attempt, times(1)).awaitSafeToProceed(attemptStart); + verify(attempt, times(1)).recordRequestStart(rpc1Start, 1); + verify(attempt, times(1)).recordWriteCounts(rpc1End, 0, 1); + verify(attempt, times(1)).recordRequestStart(rpc2Start, 1); + verify(attempt, times(1)).recordWriteCounts(rpc2End, 0, 1); + verify(attempt, times(1)).recordRequestStart(rpc3Start, 1); + verify(attempt, times(1)).recordWriteCounts(rpc3End, 0, 1); + verify(attempt, times(0)).recordWriteCounts(any(), gt(0), anyInt()); + verify(attempt, never()).completeSuccess(); + } + + @Override + @Test + public final void noRequestIsSentIfNotSafeToProceed() throws Exception { + when(ff.getFirestoreStub(any())).thenReturn(stub); + when(ff.getRpcQos(any())).thenReturn(rpcQos); + when(rpcQos.newWriteAttempt(FirestoreV1RpcAttemptContexts.V1FnRpcAttemptContext.BatchWrite)) + .thenReturn(attempt); + + InterruptedException 
interruptedException = new InterruptedException(); + when(attempt.awaitSafeToProceed(any())).thenReturn(false).thenThrow(interruptedException); + + when(processContext.element()).thenReturn(newWrite()); + + try { + runFunction( + getFn(clock, ff, rpcQosOptions, CounterFactory.DEFAULT, DistributionFactory.DEFAULT)); + fail("Expected ApiException to be throw after exhausted attempts"); + } catch (InterruptedException e) { + assertSame(interruptedException, e); + } + + verify(stub, times(1)).close(); + verifyNoMoreInteractions(stub); + verifyNoMoreInteractions(callable); + verify(attempt, times(0)).recordWriteCounts(any(), anyInt(), anyInt()); + } + + @Test + public abstract void enqueueingWritesValidateBytesSize() throws Exception; + + @Test + public final void endToEnd_success() throws Exception { + + Write write = newWrite(); + BatchWriteRequest expectedRequest = + BatchWriteRequest.newBuilder() + .setDatabase("projects/testing-project/databases/(default)") + .addWrites(write) + .build(); + + BatchWriteResponse response = BatchWriteResponse.newBuilder().addStatus(STATUS_OK).build(); + + Element element1 = new WriteElement(0, write, window); + + Instant attemptStart = Instant.ofEpochMilli(0); + Instant rpcStart = Instant.ofEpochMilli(1); + Instant rpcEnd = Instant.ofEpochMilli(2); + + RpcQosOptions options = rpcQosOptions.toBuilder().withBatchMaxCount(1).build(); + FlushBuffer> flushBuffer = spy(newFlushBuffer(options)); + when(processContext.element()).thenReturn(write); + when(attempt.awaitSafeToProceed(any())).thenReturn(true); + when(attempt.>newFlushBuffer(attemptStart)).thenReturn(flushBuffer); + ArgumentCaptor requestCaptor = + ArgumentCaptor.forClass(BatchWriteRequest.class); + when(callable.call(requestCaptor.capture())).thenReturn(response); + + runFunction(getFn(clock, ff, options, CounterFactory.DEFAULT, DistributionFactory.DEFAULT)); + + assertEquals(expectedRequest, requestCaptor.getValue()); + verify(flushBuffer, times(1)).offer(element1); + verify(flushBuffer, times(1)).isFull(); + verify(attempt, times(1)).recordRequestStart(rpcStart, 1); + verify(attempt, times(1)).recordWriteCounts(rpcEnd, 1, 0); + verify(attempt, never()).recordWriteCounts(any(), anyInt(), gt(0)); + verify(attempt, never()).checkCanRetry(any(), any()); + } + + @Test + public final void endToEnd_exhaustingAttemptsResultsInException() throws Exception { + ApiException err1 = + ApiExceptionFactory.createException( + new IOException("err1"), GrpcStatusCode.of(io.grpc.Status.Code.ABORTED), false); + ApiException err2 = + ApiExceptionFactory.createException( + new IOException("err2"), GrpcStatusCode.of(io.grpc.Status.Code.ABORTED), false); + ApiException err3 = + ApiExceptionFactory.createException( + new IOException("err3"), GrpcStatusCode.of(io.grpc.Status.Code.ABORTED), false); + + Instant attemptStart = Instant.ofEpochMilli(0); + Instant rpc1Start = Instant.ofEpochMilli(1); + Instant rpc1End = Instant.ofEpochMilli(2); + Instant rpc2Start = Instant.ofEpochMilli(3); + Instant rpc2End = Instant.ofEpochMilli(4); + Instant rpc3Start = Instant.ofEpochMilli(5); + Instant rpc3End = Instant.ofEpochMilli(6); + + Write write = newWrite(); + + Element element1 = new WriteElement(0, write, window); + + FlushBuffer> flushBuffer = spy(newFlushBuffer(rpcQosOptions)); + when(processContext.element()).thenReturn(write); + when(attempt.awaitSafeToProceed(any())).thenReturn(true); + when(attempt.>newFlushBuffer(attemptStart)).thenReturn(flushBuffer); + when(flushBuffer.isFull()).thenReturn(true); + 
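+ // The flush buffer stub accepts the single element and always reports itself full, so every attempt immediately flushes and hits the stubbed RPC error.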
when(flushBuffer.offer(element1)).thenReturn(true); + when(flushBuffer.iterator()).thenReturn(newArrayList(element1).iterator()); + when(flushBuffer.getBufferedElementsCount()).thenReturn(1); + when(callable.call(any())).thenThrow(err1, err2, err3); + doNothing().when(attempt).checkCanRetry(any(), eq(err1)); + doNothing().when(attempt).checkCanRetry(any(), eq(err2)); + doThrow(err3).when(attempt).checkCanRetry(any(), eq(err3)); + + try { + FnT fn = getFn(clock, ff, rpcQosOptions, CounterFactory.DEFAULT, DistributionFactory.DEFAULT); + runFunction(fn); + fail("Expected exception"); + } catch (ApiException e) { + assertNotNull(e.getMessage()); + assertTrue(e.getMessage().contains("err3")); + } + + verify(flushBuffer, times(1)).offer(element1); + verify(flushBuffer, atLeastOnce()).isFull(); + verify(attempt, times(1)).recordRequestStart(rpc1Start, 1); + verify(attempt, times(1)).recordWriteCounts(rpc1End, 0, 1); + verify(attempt, times(1)).recordRequestStart(rpc2Start, 1); + verify(attempt, times(1)).recordWriteCounts(rpc2End, 0, 1); + verify(attempt, times(1)).recordRequestStart(rpc3Start, 1); + verify(attempt, times(1)).recordWriteCounts(rpc3End, 0, 1); + verify(attempt, never()).recordWriteCounts(any(), gt(0), anyInt()); + verify(attempt, never()).completeSuccess(); + } + + @Test + public final void endToEnd_awaitSafeToProceed_falseIsTerminalForAttempt() throws Exception { + RpcQosOptions options = rpcQosOptions.toBuilder().withBatchMaxCount(2).build(); + + Instant rpc1Start = Instant.ofEpochMilli(3); + Instant rpc1End = Instant.ofEpochMilli(4); + ArgumentCaptor requestCaptor = + ArgumentCaptor.forClass(BatchWriteRequest.class); + + Write write = newWrite(); + BatchWriteRequest expectedRequest = + BatchWriteRequest.newBuilder() + .setDatabase("projects/testing-project/databases/(default)") + .addWrites(write) + .build(); + BatchWriteResponse response = BatchWriteResponse.newBuilder().addStatus(STATUS_OK).build(); + + when(processContext.element()).thenReturn(write); + // process element attempt 1 + when(attempt.awaitSafeToProceed(any())) + .thenReturn(false) + .thenThrow(new IllegalStateException("too many attempt1#awaitSafeToProceed")); + // process element attempt 2 + when(attempt2.awaitSafeToProceed(any())) + .thenReturn(true) + .thenThrow(new IllegalStateException("too many attempt2#awaitSafeToProceed")); + when(attempt2.>newFlushBuffer(any())) + .thenAnswer(invocation -> newFlushBuffer(options)); + // finish bundle attempt + RpcQos.RpcWriteAttempt finishBundleAttempt = mock(RpcWriteAttempt.class); + when(finishBundleAttempt.awaitSafeToProceed(any())) + .thenReturn(true, true) + .thenThrow(new IllegalStateException("too many finishBundleAttempt#awaitSafeToProceed")); + when(finishBundleAttempt.>newFlushBuffer(any())) + .thenAnswer(invocation -> newFlushBuffer(options)); + when(rpcQos.newWriteAttempt(any())).thenReturn(attempt, attempt2, finishBundleAttempt); + when(callable.call(requestCaptor.capture())).thenReturn(response); + + FnT fn = getFn(clock, ff, options, CounterFactory.DEFAULT, DistributionFactory.DEFAULT); + runFunction(fn); + + assertEquals(expectedRequest, requestCaptor.getValue()); + verify(attempt, times(1)).awaitSafeToProceed(any()); + verifyNoMoreInteractions(attempt); + verify(attempt2, times(1)).awaitSafeToProceed(any()); + verify(attempt2, times(1)).newFlushBuffer(any()); + verifyNoMoreInteractions(attempt2); + verify(finishBundleAttempt, times(1)).recordRequestStart(rpc1Start, 1); + verify(finishBundleAttempt, times(1)).recordWriteCounts(rpc1End, 1, 0); + 
verify(finishBundleAttempt, times(1)).completeSuccess(); + verify(finishBundleAttempt, never()).checkCanRetry(any(), any()); + } + + @Test + public final void endToEnd_deadlineExceededOnAnIndividualWriteResultsInThrottling() + throws Exception { + final long totalDocCount = 1_000_000; + final int numWorkers = 100; + final long docCount = totalDocCount / numWorkers; + LOG.info("docCount = {}", docCount); + RpcQosOptions options = + rpcQosOptions + .toBuilder() + .withHintMaxNumWorkers(numWorkers) + .withSamplePeriod(Duration.standardMinutes(10)) + // .withBatchInitialCount(5) + .withReportDiagnosticMetrics() + .build(); + LOG.debug("options = {}", options); + + FirestoreStatefulComponentFactory ff = mock(FirestoreStatefulComponentFactory.class); + when(ff.getFirestoreStub(any())).thenReturn(stub); + Random random = new Random(12345); + TestClock clock = new TestClock(Instant.EPOCH, Duration.standardSeconds(1)); + Sleeper sleeper = millis -> clock.setNext(advanceClockBy(Duration.millis(millis))); + RpcQosImpl qos = + new RpcQosImpl( + options, + random, + sleeper, + metricsFixture.counterFactory, + metricsFixture.distributionFactory); + RpcQos qosSpy = + mock( + RpcQos.class, + invocation -> { + Method method = invocation.getMethod(); + LOG.debug("method = {}", method); + Method actualMethod = + qos.getClass().getMethod(method.getName(), method.getParameterTypes()); + return actualMethod.invoke(qos, invocation.getArguments()); + }); + + when(ff.getRpcQos(options)).thenReturn(qosSpy); + + int defaultDocumentWriteLatency = 30; + final AtomicLong writeCounter = new AtomicLong(); + when(processContext.element()) + .thenAnswer(invocation -> newWrite(writeCounter.getAndIncrement())); + when(callable.call(any())) + .thenAnswer( + new Answer() { + private final Random rand = new Random(84572908); + private final Instant threshold = + Instant.ofEpochMilli(Duration.standardMinutes(20).getMillis()); + + @Override + public BatchWriteResponse answer(InvocationOnMock invocation) throws Throwable { + BatchWriteRequest request = invocation.getArgument(0, BatchWriteRequest.class); + LOG.debug("request = {}", request); + long requestDurationMs = 0; + BatchWriteResponse.Builder builder = BatchWriteResponse.newBuilder(); + for (Write w : request.getWritesList()) { + builder.addWriteResults(WriteResult.newBuilder().build()); + if (clock.prev.isBefore(threshold)) { + requestDurationMs += defaultDocumentWriteLatency; + builder.addStatus(STATUS_OK); + } else { + int latency = rand.nextInt(1500); + LOG.debug("latency = {}", latency); + if (latency > 300) { + builder.addStatus(STATUS_DEADLINE_EXCEEDED); + } else { + builder.addStatus(STATUS_OK); + } + requestDurationMs += latency; + } + } + clock.setNext(advanceClockBy(Duration.millis(requestDurationMs))); + return builder.build(); + } + }); + LOG.info( + "### parameters: {defaultDocumentWriteLatency: {}, rpcQosOptions: {}}", + defaultDocumentWriteLatency, + options); + + FnT fn = + getFn( + clock, ff, options, metricsFixture.counterFactory, metricsFixture.distributionFactory); + fn.setup(); + fn.startBundle(startBundleContext); + while (writeCounter.get() < docCount) { + fn.processElement(processContext, window); + } + fn.finishBundle(finishBundleContext); + + LOG.info("writeCounter = {}", writeCounter.get()); + LOG.info("clock.prev = {}", clock.prev); + + MyDistribution qosAdaptiveThrottlerThrottlingMs = + metricsFixture.distributions.get("qos_adaptiveThrottler_throttlingMs"); + assertNotNull(qosAdaptiveThrottlerThrottlingMs); + List updateInvocations = 
qosAdaptiveThrottlerThrottlingMs.updateInvocations; + assertFalse(updateInvocations.isEmpty()); + } + + @Test + public final void endToEnd_maxBatchSizeRespected() throws Exception { + + Instant enqueue0 = Instant.ofEpochMilli(0); + Instant enqueue1 = Instant.ofEpochMilli(1); + Instant enqueue2 = Instant.ofEpochMilli(2); + Instant enqueue3 = Instant.ofEpochMilli(3); + Instant enqueue4 = Instant.ofEpochMilli(4); + Instant group1Rpc1Start = Instant.ofEpochMilli(5); + Instant group1Rpc1End = Instant.ofEpochMilli(6); + + Instant enqueue5 = Instant.ofEpochMilli(7); + Instant finalFlush = Instant.ofEpochMilli(8); + Instant group2Rpc1Start = Instant.ofEpochMilli(9); + Instant group2Rpc1End = Instant.ofEpochMilli(10); + + Write write0 = newWrite(0); + Write write1 = newWrite(1); + Write write2 = newWrite(2); + Write write3 = newWrite(3); + Write write4 = newWrite(4); + Write write5 = newWrite(5); + int maxValuesPerGroup = 5; + + BatchWriteRequest.Builder builder = + BatchWriteRequest.newBuilder().setDatabase("projects/testing-project/databases/(default)"); + + BatchWriteRequest expectedGroup1Request = + builder + .build() + .toBuilder() + .addWrites(write0) + .addWrites(write1) + .addWrites(write2) + .addWrites(write3) + .addWrites(write4) + .build(); + + BatchWriteRequest expectedGroup2Request = builder.build().toBuilder().addWrites(write5).build(); + + BatchWriteResponse group1Response = + BatchWriteResponse.newBuilder() + .addStatus(STATUS_OK) + .addStatus(STATUS_OK) + .addStatus(STATUS_OK) + .addStatus(STATUS_OK) + .addStatus(STATUS_OK) + .build(); + + BatchWriteResponse group2Response = + BatchWriteResponse.newBuilder().addStatus(STATUS_OK).build(); + + RpcQosOptions options = rpcQosOptions.toBuilder().withBatchMaxCount(maxValuesPerGroup).build(); + FlushBuffer> flushBuffer = spy(newFlushBuffer(options)); + FlushBuffer> flushBuffer2 = spy(newFlushBuffer(options)); + + when(processContext.element()).thenReturn(write0, write1, write2, write3, write4, write5); + + when(rpcQos.newWriteAttempt(any())) + .thenReturn(attempt, attempt, attempt, attempt, attempt, attempt2, attempt2, attempt2) + .thenThrow(new IllegalStateException("too many attempts")); + when(attempt.awaitSafeToProceed(any())).thenReturn(true); + when(attempt2.awaitSafeToProceed(any())).thenReturn(true); + + when(attempt.>newFlushBuffer(enqueue0)) + .thenReturn(newFlushBuffer(options)); + when(attempt.>newFlushBuffer(enqueue1)) + .thenReturn(newFlushBuffer(options)); + when(attempt.>newFlushBuffer(enqueue2)) + .thenReturn(newFlushBuffer(options)); + when(attempt.>newFlushBuffer(enqueue3)) + .thenReturn(newFlushBuffer(options)); + when(attempt.>newFlushBuffer(enqueue4)).thenReturn(flushBuffer); + when(callable.call(expectedGroup1Request)).thenReturn(group1Response); + + when(attempt2.>newFlushBuffer(enqueue5)) + .thenReturn(newFlushBuffer(options)); + when(attempt2.>newFlushBuffer(finalFlush)).thenReturn(flushBuffer2); + when(callable.call(expectedGroup2Request)).thenReturn(group2Response); + + runFunction( + getFn(clock, ff, options, CounterFactory.DEFAULT, DistributionFactory.DEFAULT), + maxValuesPerGroup + 1); + + verify(attempt, times(1)).recordRequestStart(group1Rpc1Start, 5); + verify(attempt, times(1)).recordWriteCounts(group1Rpc1End, 5, 0); + verify(attempt, times(1)).completeSuccess(); + verify(attempt2, times(1)).recordRequestStart(group2Rpc1Start, 1); + verify(attempt2, times(1)).recordWriteCounts(group2Rpc1End, 1, 0); + verify(attempt2, times(1)).completeSuccess(); + verify(callable, times(1)).call(expectedGroup1Request); + 
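+ // The second request carries only the overflow write, confirming the configured batchMaxCount of 5 was honored for the first request.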
verify(callable, times(1)).call(expectedGroup2Request); + verifyNoMoreInteractions(callable); + verify(flushBuffer, times(maxValuesPerGroup)).offer(any()); + verify(flushBuffer2, times(1)).offer(any()); + } + + @Test + public final void endToEnd_partialSuccessReturnsWritesToQueue() throws Exception { + Write write0 = newWrite(0); + Write write1 = newWrite(1); + Write write2 = newWrite(2); + Write write3 = newWrite(3); + Write write4 = newWrite(4); + + BatchWriteRequest expectedRequest1 = + BatchWriteRequest.newBuilder() + .setDatabase("projects/testing-project/databases/(default)") + .addWrites(write0) + .addWrites(write1) + .addWrites(write2) + .addWrites(write3) + .addWrites(write4) + .build(); + + BatchWriteResponse response1 = + BatchWriteResponse.newBuilder() + .addStatus(STATUS_OK) + .addStatus(statusForCode(Code.INVALID_ARGUMENT)) + .addStatus(statusForCode(Code.FAILED_PRECONDITION)) + .addStatus(statusForCode(Code.UNAUTHENTICATED)) + .addStatus(STATUS_OK) + .build(); + + BatchWriteRequest expectedRequest2 = + BatchWriteRequest.newBuilder() + .setDatabase("projects/testing-project/databases/(default)") + .addWrites(write1) + .addWrites(write2) + .addWrites(write3) + .build(); + + BatchWriteResponse response2 = + BatchWriteResponse.newBuilder() + .addStatus(STATUS_OK) + .addStatus(STATUS_OK) + .addStatus(STATUS_OK) + .build(); + + RpcQosOptions options = + rpcQosOptions.toBuilder().withMaxAttempts(1).withBatchMaxCount(5).build(); + + when(processContext.element()) + .thenReturn(write0, write1, write2, write3, write4) + .thenThrow(new IllegalStateException("too many calls")); + + when(rpcQos.newWriteAttempt(any())).thenReturn(attempt); + when(attempt.awaitSafeToProceed(any())).thenReturn(true); + when(attempt.>newFlushBuffer(any())) + .thenAnswer(invocation -> newFlushBuffer(options)); + when(attempt.isCodeRetryable(Code.INVALID_ARGUMENT)).thenReturn(true); + when(attempt.isCodeRetryable(Code.FAILED_PRECONDITION)).thenReturn(true); + when(attempt.isCodeRetryable(Code.UNAUTHENTICATED)).thenReturn(true); + + ArgumentCaptor requestCaptor1 = + ArgumentCaptor.forClass(BatchWriteRequest.class); + when(callable.call(requestCaptor1.capture())).thenReturn(response1); + + FnT fn = getFn(clock, ff, options, CounterFactory.DEFAULT, DistributionFactory.DEFAULT); + fn.setup(); + fn.startBundle(startBundleContext); + fn.processElement(processContext, window); // write0 + fn.processElement(processContext, window); // write1 + fn.processElement(processContext, window); // write2 + fn.processElement(processContext, window); // write3 + fn.processElement(processContext, window); // write4 + + assertEquals(expectedRequest1, requestCaptor1.getValue()); + + List expectedRemainingWrites = + newArrayList( + new WriteElement(1, write1, window), + new WriteElement(2, write2, window), + new WriteElement(3, write3, window)); + List actualWrites = new ArrayList<>(fn.writes); + + assertEquals(expectedRemainingWrites, actualWrites); + assertEquals(5, fn.queueNextEntryPriority); + + ArgumentCaptor requestCaptor2 = + ArgumentCaptor.forClass(BatchWriteRequest.class); + when(callable.call(requestCaptor2.capture())).thenReturn(response2); + fn.finishBundle(finishBundleContext); + assertEquals(expectedRequest2, requestCaptor2.getValue()); + + assertEquals(0, fn.queueNextEntryPriority); + + verify(attempt, times(1)).recordRequestStart(any(), eq(5)); + verify(attempt, times(1)).recordWriteCounts(any(), eq(2), eq(3)); + + verify(attempt, times(1)).recordRequestStart(any(), eq(3)); + verify(attempt, 
times(1)).recordWriteCounts(any(), eq(3), eq(0)); + verify(attempt, times(1)).completeSuccess(); + verify(callable, times(2)).call(any()); + verifyNoMoreInteractions(callable); + } + + @Test + public final void writesRemainInQueueWhenFlushIsNotReadyAndThenFlushesInFinishBundle() + throws Exception { + RpcQosOptions options = rpcQosOptions.toBuilder().withMaxAttempts(1).build(); + + Write write = newWrite(); + BatchWriteRequest expectedRequest = + BatchWriteRequest.newBuilder() + .setDatabase("projects/testing-project/databases/(default)") + .addWrites(write) + .build(); + BatchWriteResponse response = BatchWriteResponse.newBuilder().addStatus(STATUS_OK).build(); + + when(processContext.element()) + .thenReturn(write) + .thenThrow(new IllegalStateException("too many element calls")); + when(rpcQos.newWriteAttempt(any())) + .thenReturn(attempt, attempt2) + .thenThrow(new IllegalStateException("too many attempt calls")); + when(attempt.awaitSafeToProceed(any())).thenReturn(true); + when(attempt2.awaitSafeToProceed(any())).thenReturn(true); + when(attempt.>newFlushBuffer(any())) + .thenAnswer(invocation -> newFlushBuffer(options)); + when(attempt2.>newFlushBuffer(any())) + .thenAnswer(invocation -> newFlushBuffer(options)); + + FnT fn = getFn(clock, ff, options, CounterFactory.DEFAULT, DistributionFactory.DEFAULT); + fn.populateDisplayData(displayDataBuilder); + fn.setup(); + fn.startBundle(startBundleContext); + fn.processElement(processContext, window); + + assertEquals(1, fn.writes.size()); + verify(attempt, never()).recordWriteCounts(any(), anyInt(), anyInt()); + verify(attempt, never()).checkCanRetry(any(), any()); + verify(attempt, never()).completeSuccess(); + + Instant attempt2RpcStart = Instant.ofEpochMilli(2); + Instant attempt2RpcEnd = Instant.ofEpochMilli(3); + + ArgumentCaptor requestCaptor = + ArgumentCaptor.forClass(BatchWriteRequest.class); + when(callable.call(requestCaptor.capture())).thenReturn(response); + fn.finishBundle(finishBundleContext); + + assertEquals(0, fn.writes.size()); + assertEquals(expectedRequest, requestCaptor.getValue()); + verify(attempt2, times(1)).recordRequestStart(attempt2RpcStart, 1); + verify(attempt2, times(1)).recordWriteCounts(attempt2RpcEnd, 1, 0); + verify(attempt2, never()).recordWriteCounts(any(), anyInt(), gt(0)); + verify(attempt2, never()).checkCanRetry(any(), any()); + verify(attempt2, times(1)).completeSuccess(); + } + + @Test + public final void queuedWritesMaintainPriorityIfNotFlushed() throws Exception { + RpcQosOptions options = rpcQosOptions.toBuilder().withMaxAttempts(1).build(); + + Write write0 = newWrite(0); + Write write1 = newWrite(1); + Write write2 = newWrite(2); + Write write3 = newWrite(3); + Write write4 = newWrite(4); + Instant write4Start = Instant.ofEpochMilli(4); + + when(processContext.element()) + .thenReturn(write0, write1, write2, write3, write4) + .thenThrow(new IllegalStateException("too many calls")); + + when(rpcQos.newWriteAttempt(any())).thenReturn(attempt); + when(attempt.awaitSafeToProceed(any())).thenReturn(true); + when(attempt.>newFlushBuffer(any())) + .thenAnswer(invocation -> newFlushBuffer(options)); + + FnT fn = getFn(clock, ff, options, CounterFactory.DEFAULT, DistributionFactory.DEFAULT); + fn.setup(); + fn.startBundle(startBundleContext); + fn.processElement(processContext, window); // write0 + fn.processElement(processContext, window); // write1 + fn.processElement(processContext, window); // write2 + fn.processElement(processContext, window); // write3 + fn.processElement(processContext, 
window); // write4 + + List expectedWrites = + newArrayList( + new WriteElement(0, write0, window), + new WriteElement(1, write1, window), + new WriteElement(2, write2, window), + new WriteElement(3, write3, window), + new WriteElement(4, write4, window)); + List actualWrites = new ArrayList<>(fn.writes); + + assertEquals(expectedWrites, actualWrites); + assertEquals(5, fn.queueNextEntryPriority); + verify(attempt, times(1)).newFlushBuffer(write4Start); + verifyNoMoreInteractions(callable); + } + + @Override + protected final FnT getFn() { + return getFn( + JodaClock.DEFAULT, + FirestoreStatefulComponentFactory.INSTANCE, + rpcQosOptions, + CounterFactory.DEFAULT, + DistributionFactory.DEFAULT); + } + + protected abstract FnT getFn( + JodaClock clock, + FirestoreStatefulComponentFactory ff, + RpcQosOptions rpcQosOptions, + CounterFactory counterFactory, + DistributionFactory distributionFactory); + + @Override + protected final void processElementsAndFinishBundle(FnT fn, int processElementCount) + throws Exception { + try { + for (int i = 0; i < processElementCount; i++) { + fn.processElement(processContext, window); + } + } finally { + fn.finishBundle(finishBundleContext); + } + } + + protected FlushBufferImpl> newFlushBuffer(RpcQosOptions options) { + return new FlushBufferImpl<>(options.getBatchMaxCount(), options.getBatchMaxBytes()); + } + + protected static Status statusForCode(Code code) { + return Status.newBuilder().setCode(code.getNumber()).build(); + } + + private static Function advanceClockBy(Duration duration) { + return (i) -> i.withDurationAdded(duration, 1); + } + + private static class TestClock implements JodaClock { + private final Function defaultNext; + + private Function next; + private Instant prev; + + public TestClock(Instant start, Duration defaultInterval) { + prev = start; + defaultNext = advanceClockBy(defaultInterval); + } + + public TestClock setNext(Function next) { + this.next = next; + return this; + } + + @Override + public Instant instant() { + final Instant ret; + if (next != null) { + ret = next.apply(prev); + next = null; + } else { + ret = defaultNext.apply(prev); + } + prev = ret; + LOG.trace("{} testClock:instant:{}", METRIC_MARKER, ret.toString()); + return ret; + } + } + + private static final String METRIC_MARKER = "XXX"; + + public static final class MetricsFixture { + final Map counters; + final Map distributions; + final CounterFactory counterFactory; + final DistributionFactory distributionFactory; + + public MetricsFixture() { + counters = new HashMap<>(); + distributions = new HashMap<>(); + counterFactory = + (namespace, name) -> + counters.computeIfAbsent(name, (k) -> new MyCounter(namespace, name)); + distributionFactory = + (namespace, name) -> + distributions.computeIfAbsent(name, (k) -> new MyDistribution(namespace, name)); + } + + public Map getCounters() { + return ImmutableMap.copyOf(counters); + } + + public Map getDistributions() { + return ImmutableMap.copyOf(distributions); + } + } + + private static class MyCounter implements Counter { + private final MetricName named; + + private long incInvocationCount; + private final List incInvocations; + + public MyCounter(String namespace, String name) { + named = MetricName.named(namespace, name); + incInvocationCount = 0; + incInvocations = new ArrayList<>(); + } + + @Override + public void inc() { + LOG.trace("{} {}:inc()", METRIC_MARKER, named); + incInvocationCount++; + } + + @Override + public void inc(long n) { + LOG.trace("{} {}:inc(n = {})", METRIC_MARKER, named, n); + 
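+ // Record each increment amount so tests can assert on the exact values reported to this counter.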
incInvocations.add(n); + } + + @Override + public void dec() { + dec(1); + } + + @Override + public void dec(long n) { + throw new IllegalStateException("not implemented"); + } + + @Override + public MetricName getName() { + return named; + } + } + + private static class MyDistribution implements Distribution { + private final MetricName name; + + private final List updateInvocations; + + public MyDistribution(String namespace, String name) { + this.name = MetricName.named(namespace, name); + this.updateInvocations = new ArrayList<>(); + } + + @Override + public void update(long value) { + LOG.trace("{} {}:update(value = {})", METRIC_MARKER, name, value); + updateInvocations.add(value); + } + + @Override + public void update(long sum, long count, long min, long max) { + throw new IllegalStateException("not implemented"); + } + + @Override + public MetricName getName() { + return name; + } + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreProtoHelpers.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreProtoHelpers.java new file mode 100644 index 000000000000..dbcfdb0e5896 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreProtoHelpers.java @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.firestore; + +import com.google.firestore.v1.Write; + +final class FirestoreProtoHelpers { + + static Write newWrite() { + return newWrite(1); + } + + static Write newWrite(long i) { + Write.Builder writeBuilder = Write.newBuilder(); + writeBuilder.getUpdateBuilder().setName(String.format("doc-%012d", i)); + return writeBuilder.build(); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1FnBatchGetDocumentsTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1FnBatchGetDocumentsTest.java new file mode 100644 index 000000000000..b45b99a2d41b --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1FnBatchGetDocumentsTest.java @@ -0,0 +1,246 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.firestore; + +import static org.junit.Assert.assertEquals; +import static org.mockito.ArgumentMatchers.any; +import static org.mockito.ArgumentMatchers.eq; +import static org.mockito.Mockito.doNothing; +import static org.mockito.Mockito.times; +import static org.mockito.Mockito.verify; +import static org.mockito.Mockito.verifyNoMoreInteractions; +import static org.mockito.Mockito.when; + +import com.google.api.gax.rpc.ServerStream; +import com.google.api.gax.rpc.ServerStreamingCallable; +import com.google.cloud.firestore.v1.stub.FirestoreStub; +import com.google.firestore.v1.BatchGetDocumentsRequest; +import com.google.firestore.v1.BatchGetDocumentsResponse; +import com.google.firestore.v1.Document; +import com.google.firestore.v1.Value; +import java.util.List; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1ReadFn.BatchGetDocumentsFn; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.AbstractIterator; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.junit.Test; +import org.mockito.ArgumentCaptor; +import org.mockito.Mock; + +@SuppressWarnings( + "initialization.fields.uninitialized") // mockito fields are initialized via the Mockito Runner +public final class FirestoreV1FnBatchGetDocumentsTest + extends BaseFirestoreV1ReadFnTest { + + @Mock + private ServerStreamingCallable callable; + + @Mock private ServerStream responseStream1; + @Mock private ServerStream responseStream2; + @Mock private ServerStream responseStream3; + + @Test + public void endToEnd() throws Exception { + final BatchGetDocumentsRequest request = + BatchGetDocumentsRequest.newBuilder() + .setDatabase(String.format("projects/%s/databases/(default)", projectId)) + .addDocuments("doc_1-1") + .addDocuments("doc_1-2") + .addDocuments("doc_1-3") + .build(); + + final BatchGetDocumentsResponse response1 = newFound(1); + final BatchGetDocumentsResponse response2 = newFound(2); + final BatchGetDocumentsResponse response3 = newFound(3); + + List responses = ImmutableList.of(response1, response2, response3); + when(responseStream1.iterator()).thenReturn(responses.iterator()); + + when(callable.call(request)).thenReturn(responseStream1); + + when(stub.batchGetDocumentsCallable()).thenReturn(callable); + + when(ff.getFirestoreStub(any())).thenReturn(stub); + when(ff.getRpcQos(any())) + .thenReturn(FirestoreStatefulComponentFactory.INSTANCE.getRpcQos(rpcQosOptions)); + + ArgumentCaptor responsesCaptor = + ArgumentCaptor.forClass(BatchGetDocumentsResponse.class); + doNothing().when(processContext).output(responsesCaptor.capture()); + + when(processContext.element()).thenReturn(request); + + runFunction(getFn(clock, ff, rpcQosOptions)); + + List allValues = responsesCaptor.getAllValues(); + assertEquals(responses, allValues); + } + + @Override + public void resumeFromLastReadValue() throws Exception { + + final BatchGetDocumentsResponse response1 = newMissing(1); + final BatchGetDocumentsResponse response2 = 
newFound(2); + final BatchGetDocumentsResponse response3 = newMissing(3); + final BatchGetDocumentsResponse response4 = newFound(4); + + final BatchGetDocumentsRequest request1 = + BatchGetDocumentsRequest.newBuilder() + .setDatabase(String.format("projects/%s/databases/(default)", projectId)) + .addDocuments(response1.getMissing()) + .addDocuments(response2.getFound().getName()) + .addDocuments(response3.getMissing()) + .addDocuments(response4.getFound().getName()) + .build(); + + BatchGetDocumentsRequest request2 = + BatchGetDocumentsRequest.newBuilder() + .setDatabase(String.format("projects/%s/databases/(default)", projectId)) + .addDocuments(response3.getMissing()) + .addDocuments(response4.getFound().getName()) + .build(); + BatchGetDocumentsRequest request3 = + BatchGetDocumentsRequest.newBuilder() + .setDatabase(String.format("projects/%s/databases/(default)", projectId)) + .addDocuments(response4.getFound().getName()) + .build(); + + when(responseStream1.iterator()) + .thenReturn( + new AbstractIterator() { + private int counter = 10; + + @Override + protected BatchGetDocumentsResponse computeNext() { + int count = counter++; + if (count == 10) { + return response1; + } else if (count == 11) { + return response2; + } else { + throw RETRYABLE_ERROR; + } + } + }); + when(responseStream2.iterator()) + .thenReturn( + new AbstractIterator() { + private int counter = 20; + + @Override + protected BatchGetDocumentsResponse computeNext() { + int count = counter++; + if (count == 20) { + return response3; + } else { + throw RETRYABLE_ERROR; + } + } + }); + when(responseStream3.iterator()).thenReturn(ImmutableList.of(response4).iterator()); + + doNothing().when(attempt).checkCanRetry(any(), eq(RETRYABLE_ERROR)); + when(callable.call(request1)).thenReturn(responseStream1); + when(callable.call(request2)).thenReturn(responseStream2); + when(callable.call(request3)).thenReturn(responseStream3); + + when(stub.batchGetDocumentsCallable()).thenReturn(callable); + + when(ff.getFirestoreStub(any())).thenReturn(stub); + when(ff.getRpcQos(any())).thenReturn(rpcQos); + when(rpcQos.newReadAttempt(any())).thenReturn(attempt); + when(attempt.awaitSafeToProceed(any())).thenReturn(true); + + ArgumentCaptor responsesCaptor = + ArgumentCaptor.forClass(BatchGetDocumentsResponse.class); + + doNothing().when(processContext).output(responsesCaptor.capture()); + + when(processContext.element()).thenReturn(request1); + + BatchGetDocumentsFn fn = new BatchGetDocumentsFn(clock, ff, rpcQosOptions); + + runFunction(fn); + + List expectedResponses = + ImmutableList.of(response1, response2, response3, response4); + List actualResponses = responsesCaptor.getAllValues(); + assertEquals(expectedResponses, actualResponses); + + verify(callable, times(1)).call(request1); + verify(callable, times(1)).call(request2); + verify(attempt, times(4)).recordStreamValue(any()); + } + + @Override + protected V1RpcFnTestCtx newCtx() { + return new V1RpcFnTestCtx() { + @Override + public BatchGetDocumentsRequest getRequest() { + return BatchGetDocumentsRequest.newBuilder() + .setDatabase(String.format("projects/%s/databases/(default)", projectId)) + .addDocuments("doc_1-1") + .build(); + } + + @Override + public void mockRpcToCallable(FirestoreStub stub) { + when(stub.batchGetDocumentsCallable()).thenReturn(callable); + } + + @Override + public void whenCallableCall(BatchGetDocumentsRequest in, Throwable... 
throwables) { + when(callable.call(in)).thenThrow(throwables); + } + + @Override + public void verifyNoInteractionsWithCallable() { + verifyNoMoreInteractions(callable); + } + }; + } + + @Override + protected BatchGetDocumentsFn getFn( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + return new BatchGetDocumentsFn(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + + private static BatchGetDocumentsResponse newFound(int docNumber) { + String docName = docName(docNumber); + return BatchGetDocumentsResponse.newBuilder() + .setFound( + Document.newBuilder() + .setName(docName) + .putAllFields( + ImmutableMap.of("foo", Value.newBuilder().setStringValue("bar").build())) + .build()) + .build(); + } + + private static BatchGetDocumentsResponse newMissing(int docNumber) { + String docName = docName(docNumber); + return BatchGetDocumentsResponse.newBuilder().setMissing(docName).build(); + } + + private static String docName(int docNumber) { + return String.format("doc-%d", docNumber); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1FnBatchWriteWithDeadLetterQueueTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1FnBatchWriteWithDeadLetterQueueTest.java new file mode 100644 index 000000000000..83c3a166bd62 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1FnBatchWriteWithDeadLetterQueueTest.java @@ -0,0 +1,228 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.firestore; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertNotNull; +import static org.junit.Assert.assertTrue; +import static org.mockito.ArgumentMatchers.any; +import static org.mockito.ArgumentMatchers.eq; +import static org.mockito.Mockito.doNothing; +import static org.mockito.Mockito.never; +import static org.mockito.Mockito.times; +import static org.mockito.Mockito.verify; +import static org.mockito.Mockito.verifyNoMoreInteractions; +import static org.mockito.Mockito.when; + +import com.google.firestore.v1.ArrayValue; +import com.google.firestore.v1.BatchWriteRequest; +import com.google.firestore.v1.BatchWriteResponse; +import com.google.firestore.v1.Document; +import com.google.firestore.v1.DocumentTransform; +import com.google.firestore.v1.DocumentTransform.FieldTransform; +import com.google.firestore.v1.Precondition; +import com.google.firestore.v1.Value; +import com.google.firestore.v1.Write; +import com.google.firestore.v1.WriteResult; +import com.google.protobuf.ByteString; +import com.google.protobuf.Timestamp; +import com.google.rpc.Code; +import java.security.SecureRandom; +import java.util.ArrayList; +import java.util.Base64; +import java.util.List; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.WriteFailure; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1WriteFn.BatchWriteFnWithDeadLetterQueue; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1WriteFn.WriteElement; +import org.apache.beam.sdk.io.gcp.firestore.RpcQos.RpcWriteAttempt.Element; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.junit.Test; +import org.mockito.ArgumentCaptor; + +@SuppressWarnings({ + "initialization.fields.uninitialized", // mockito fields are initialized via the Mockito Runner + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) +public final class FirestoreV1FnBatchWriteWithDeadLetterQueueTest + extends BaseFirestoreV1WriteFnTest { + + @Override + @Test + public void enqueueingWritesValidateBytesSize() throws Exception { + int maxBytes = 50; + RpcQosOptions options = rpcQosOptions.toBuilder().withBatchMaxBytes(maxBytes).build(); + + when(ff.getFirestoreStub(any())).thenReturn(stub); + when(ff.getRpcQos(any())) + .thenReturn(FirestoreStatefulComponentFactory.INSTANCE.getRpcQos(options)); + + byte[] bytes = new byte[maxBytes + 1]; + SecureRandom.getInstanceStrong().nextBytes(bytes); + byte[] base64Bytes = Base64.getEncoder().encode(bytes); + String base64String = Base64.getEncoder().encodeToString(bytes); + + Value largeValue = + Value.newBuilder().setStringValueBytes(ByteString.copyFrom(base64Bytes)).build(); + + // apply a doc transform that is too large + Write write1 = + Write.newBuilder() + .setTransform( + DocumentTransform.newBuilder() + .setDocument(String.format("doc-%03d", 2)) + .addFieldTransforms( + FieldTransform.newBuilder() + .setAppendMissingElements( + ArrayValue.newBuilder().addValues(largeValue)))) + .build(); + // delete a doc that is too large + Write write2 = + Write.newBuilder().setDelete(String.format("doc-%03d_%s", 3, base64String)).build(); + // update a doc that is too large + Write write3 = + Write.newBuilder() + .setUpdate( + Document.newBuilder() + .setName(String.format("doc-%03d", 4)) + .putAllFields(ImmutableMap.of("foo", largeValue))) + .build(); + + BatchWriteFnWithDeadLetterQueue fn = + getFn( + clock, ff, options, metricsFixture.counterFactory, metricsFixture.distributionFactory); + 
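+ // Each of the three oversized writes below should be emitted as a WriteFailure to the dead letter output rather than being enqueued for a batch.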
fn.populateDisplayData(displayDataBuilder); + fn.setup(); + fn.startBundle(startBundleContext); + ArgumentCaptor write1FailureCapture = ArgumentCaptor.forClass(WriteFailure.class); + doNothing().when(processContext).outputWithTimestamp(write1FailureCapture.capture(), any()); + when(processContext.element()).thenReturn(write1); + fn.processElement(processContext, window); + WriteFailure failure = write1FailureCapture.getValue(); + assertNotNull(failure); + String message = failure.getStatus().getMessage(); + assertTrue(message.contains("TRANSFORM")); + assertTrue(message.contains("larger than configured max allowed bytes per batch")); + + ArgumentCaptor write2FailureCapture = ArgumentCaptor.forClass(WriteFailure.class); + doNothing().when(processContext).outputWithTimestamp(write2FailureCapture.capture(), any()); + when(processContext.element()).thenReturn(write2); + fn.processElement(processContext, window); + WriteFailure failure2 = write2FailureCapture.getValue(); + assertNotNull(failure2); + String message2 = failure2.getStatus().getMessage(); + assertTrue(message2.contains("DELETE")); + assertTrue(message2.contains("larger than configured max allowed bytes per batch")); + + ArgumentCaptor write3FailureCapture = ArgumentCaptor.forClass(WriteFailure.class); + doNothing().when(processContext).outputWithTimestamp(write3FailureCapture.capture(), any()); + when(processContext.element()).thenReturn(write3); + fn.processElement(processContext, window); + WriteFailure failure3 = write3FailureCapture.getValue(); + assertNotNull(failure3); + String message3 = failure3.getStatus().getMessage(); + assertTrue(message3.contains("UPDATE")); + assertTrue(message3.contains("larger than configured max allowed bytes per batch")); + + assertEquals(0, fn.writes.size()); + } + + @Test + public void nonRetryableWriteIsOutput() throws Exception { + Write write0 = FirestoreProtoHelpers.newWrite(0); + Write write1 = + FirestoreProtoHelpers.newWrite(1) + .toBuilder() + .setCurrentDocument(Precondition.newBuilder().setExists(false).build()) + .build(); + + BatchWriteRequest expectedRequest1 = + BatchWriteRequest.newBuilder() + .setDatabase("projects/testing-project/databases/(default)") + .addWrites(write0) + .addWrites(write1) + .build(); + + BatchWriteResponse response1 = + BatchWriteResponse.newBuilder() + .addStatus(STATUS_OK) + .addWriteResults( + WriteResult.newBuilder() + .setUpdateTime(Timestamp.newBuilder().setSeconds(1).build()) + .build()) + .addStatus(statusForCode(Code.ALREADY_EXISTS)) + .addWriteResults(WriteResult.newBuilder().build()) + .build(); + + RpcQosOptions options = + rpcQosOptions.toBuilder().withMaxAttempts(1).withBatchMaxCount(2).build(); + + when(processContext.element()) + .thenReturn(write0, write1) + .thenThrow(new IllegalStateException("too many calls")); + + when(rpcQos.newWriteAttempt(any())).thenReturn(attempt); + when(attempt.awaitSafeToProceed(any())).thenReturn(true); + when(attempt.>newFlushBuffer(any())) + .thenReturn(newFlushBuffer(options)) + .thenReturn(newFlushBuffer(options)) + .thenThrow(new IllegalStateException("too many attempt#newFlushBuffer calls")); + when(attempt.isCodeRetryable(Code.ALREADY_EXISTS)).thenReturn(false); + + ArgumentCaptor requestCaptor1 = + ArgumentCaptor.forClass(BatchWriteRequest.class); + when(callable.call(requestCaptor1.capture())).thenReturn(response1); + + BatchWriteFnWithDeadLetterQueue fn = + getFn(clock, ff, options, CounterFactory.DEFAULT, DistributionFactory.DEFAULT); + fn.setup(); + fn.startBundle(startBundleContext); + 
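+ // write0 succeeds while write1 fails with non-retryable ALREADY_EXISTS; the failure should surface as a WriteFailure output instead of being re-queued.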
fn.processElement(processContext, window); // write0 + ArgumentCaptor writeFailureCapture = ArgumentCaptor.forClass(WriteFailure.class); + doNothing().when(processContext).outputWithTimestamp(writeFailureCapture.capture(), any()); + fn.processElement(processContext, window); // write1 + WriteFailure failure = writeFailureCapture.getValue(); + assertEquals(Code.ALREADY_EXISTS.getNumber(), failure.getStatus().getCode()); + assertEquals(write1, failure.getWrite()); + assertEquals(WriteResult.getDefaultInstance(), failure.getWriteResult()); + + assertEquals(expectedRequest1, requestCaptor1.getValue()); + + List actualWrites = new ArrayList<>(fn.writes); + + assertTrue(actualWrites.isEmpty()); + + fn.finishBundle(finishBundleContext); + + verify(attempt, times(1)).recordRequestStart(any(), eq(2)); + verify(attempt, times(1)).recordWriteCounts(any(), eq(1), eq(1)); + + verify(attempt, never()).completeSuccess(); + verify(callable, times(1)).call(any()); + verifyNoMoreInteractions(callable); + } + + @Override + protected BatchWriteFnWithDeadLetterQueue getFn( + JodaClock clock, + FirestoreStatefulComponentFactory ff, + RpcQosOptions rpcQosOptions, + CounterFactory counterFactory, + DistributionFactory distributionFactory) { + return new BatchWriteFnWithDeadLetterQueue(clock, ff, rpcQosOptions, counterFactory); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1FnBatchWriteWithSummaryTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1FnBatchWriteWithSummaryTest.java new file mode 100644 index 000000000000..984b8e453a47 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1FnBatchWriteWithSummaryTest.java @@ -0,0 +1,250 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.firestore; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertNotNull; +import static org.junit.Assert.assertTrue; +import static org.junit.Assert.fail; +import static org.mockito.ArgumentMatchers.any; +import static org.mockito.Mockito.never; +import static org.mockito.Mockito.times; +import static org.mockito.Mockito.verify; +import static org.mockito.Mockito.verifyNoMoreInteractions; +import static org.mockito.Mockito.when; + +import com.google.firestore.v1.ArrayValue; +import com.google.firestore.v1.BatchWriteRequest; +import com.google.firestore.v1.BatchWriteResponse; +import com.google.firestore.v1.Document; +import com.google.firestore.v1.DocumentTransform; +import com.google.firestore.v1.DocumentTransform.FieldTransform; +import com.google.firestore.v1.Precondition; +import com.google.firestore.v1.Value; +import com.google.firestore.v1.Write; +import com.google.firestore.v1.WriteResult; +import com.google.protobuf.ByteString; +import com.google.protobuf.Timestamp; +import com.google.rpc.Code; +import java.security.SecureRandom; +import java.util.ArrayList; +import java.util.Base64; +import java.util.List; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.FailedWritesException; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.WriteFailure; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.WriteSuccessSummary; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1WriteFn.BaseBatchWriteFn; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1WriteFn.BatchWriteFnWithSummary; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1WriteFn.WriteElement; +import org.apache.beam.sdk.io.gcp.firestore.RpcQos.RpcWriteAttempt.Element; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.joda.time.Instant; +import org.junit.After; +import org.junit.Test; +import org.mockito.ArgumentCaptor; + +@SuppressWarnings({ + "initialization.fields.uninitialized", // mockito fields are initialized via the Mockito Runner + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) +public final class FirestoreV1FnBatchWriteWithSummaryTest + extends BaseFirestoreV1WriteFnTest { + + @After + public void tearDown() { + verify(processContext, never()).output(any()); + } + + @Override + @Test + public void enqueueingWritesValidateBytesSize() throws Exception { + int maxBytes = 50; + RpcQosOptions options = rpcQosOptions.toBuilder().withBatchMaxBytes(maxBytes).build(); + + when(ff.getFirestoreStub(any())).thenReturn(stub); + when(ff.getRpcQos(any())) + .thenReturn(FirestoreStatefulComponentFactory.INSTANCE.getRpcQos(options)); + + byte[] bytes = new byte[maxBytes + 1]; + SecureRandom.getInstanceStrong().nextBytes(bytes); + byte[] base64Bytes = Base64.getEncoder().encode(bytes); + String base64String = Base64.getEncoder().encodeToString(bytes); + + Value largeValue = + Value.newBuilder().setStringValueBytes(ByteString.copyFrom(base64Bytes)).build(); + + // apply a doc transform that is too large + Write write1 = + Write.newBuilder() + .setTransform( + DocumentTransform.newBuilder() + .setDocument(String.format("doc-%03d", 2)) + .addFieldTransforms( + FieldTransform.newBuilder() + .setAppendMissingElements( + ArrayValue.newBuilder().addValues(largeValue)))) + .build(); + // delete a doc that is too large + Write write2 = + Write.newBuilder().setDelete(String.format("doc-%03d_%s", 3, base64String)).build(); + // update a doc that is too large + Write 
write3 =
+        Write.newBuilder()
+            .setUpdate(
+                Document.newBuilder()
+                    .setName(String.format("doc-%03d", 4))
+                    .putAllFields(ImmutableMap.of("foo", largeValue)))
+            .build();
+
+    BatchWriteFnWithSummary fn =
+        getFn(
+            clock, ff, options, metricsFixture.counterFactory, metricsFixture.distributionFactory);
+    fn.populateDisplayData(displayDataBuilder);
+    fn.setup();
+    fn.startBundle(startBundleContext);
+    try {
+      when(processContext.element()).thenReturn(write1);
+      fn.processElement(processContext, window);
+      fail("expected validation error");
+    } catch (FailedWritesException e) {
+      WriteFailure failure = e.getWriteFailures().get(0);
+      assertNotNull(failure);
+      String message = failure.getStatus().getMessage();
+      assertTrue(message.contains("TRANSFORM"));
+      assertTrue(message.contains("larger than configured max allowed bytes per batch"));
+    }
+    try {
+      when(processContext.element()).thenReturn(write2);
+      fn.processElement(processContext, window);
+      fail("expected validation error");
+    } catch (FailedWritesException e) {
+      WriteFailure failure = e.getWriteFailures().get(0);
+      assertNotNull(failure);
+      String message = failure.getStatus().getMessage();
+      assertTrue(message.contains("DELETE"));
+      assertTrue(message.contains("larger than configured max allowed bytes per batch"));
+    }
+    try {
+      when(processContext.element()).thenReturn(write3);
+      fn.processElement(processContext, window);
+      fail("expected validation error");
+    } catch (FailedWritesException e) {
+      WriteFailure failure = e.getWriteFailures().get(0);
+      assertNotNull(failure);
+      String message = failure.getStatus().getMessage();
+      assertTrue(message.contains("UPDATE"));
+      assertTrue(message.contains("larger than configured max allowed bytes per batch"));
+    }
+
+    assertEquals(0, fn.writes.size());
+  }
+
+  @Test
+  public void nonRetryableWriteResultStopsAttempts() throws Exception {
+    Write write0 = FirestoreProtoHelpers.newWrite(0);
+    Write write1 =
+        FirestoreProtoHelpers.newWrite(1)
+            .toBuilder()
+            .setCurrentDocument(Precondition.newBuilder().setExists(false).build())
+            .build();
+
+    BatchWriteRequest expectedRequest1 =
+        BatchWriteRequest.newBuilder()
+            .setDatabase("projects/testing-project/databases/(default)")
+            .addWrites(write0)
+            .addWrites(write1)
+            .build();
+
+    BatchWriteResponse response1 =
+        BatchWriteResponse.newBuilder()
+            .addStatus(STATUS_OK)
+            .addWriteResults(
+                WriteResult.newBuilder()
+                    .setUpdateTime(Timestamp.newBuilder().setSeconds(1).build())
+                    .build())
+            .addStatus(statusForCode(Code.ALREADY_EXISTS))
+            .addWriteResults(WriteResult.newBuilder().build())
+            .build();
+
+    RpcQosOptions options =
+        rpcQosOptions.toBuilder().withMaxAttempts(1).withBatchMaxCount(2).build();
+
+    when(processContext.element())
+        .thenReturn(write0, write1)
+        .thenThrow(new IllegalStateException("too many calls"));
+
+    when(rpcQos.newWriteAttempt(any())).thenReturn(attempt);
+    when(attempt.awaitSafeToProceed(any())).thenReturn(true);
+    when(attempt.<Write, Element<Write>>newFlushBuffer(any()))
+        .thenReturn(newFlushBuffer(options))
+        .thenReturn(newFlushBuffer(options))
+        .thenThrow(new IllegalStateException("too many attempt#newFlushBuffer calls"));
+    when(attempt.isCodeRetryable(Code.ALREADY_EXISTS)).thenReturn(false);
+
+    ArgumentCaptor<BatchWriteRequest> requestCaptor1 =
+        ArgumentCaptor.forClass(BatchWriteRequest.class);
+    when(callable.call(requestCaptor1.capture())).thenReturn(response1);
+
+    BaseBatchWriteFn<WriteSuccessSummary> fn =
+        new BatchWriteFnWithSummary(clock, ff, options, CounterFactory.DEFAULT);
+    fn.setup();
+    fn.startBundle(startBundleContext);
+
fn.processElement(processContext, window); // write0 + try { + fn.processElement(processContext, window); // write1 + fail("expected an exception when trying to apply a write with a failed precondition"); + } catch (FailedWritesException e) { + List writeFailures = e.getWriteFailures(); + assertEquals(1, writeFailures.size()); + WriteFailure failure = writeFailures.get(0); + assertEquals(Code.ALREADY_EXISTS.getNumber(), failure.getStatus().getCode()); + assertEquals(write1, failure.getWrite()); + assertEquals(WriteResult.getDefaultInstance(), failure.getWriteResult()); + } + + assertEquals(expectedRequest1, requestCaptor1.getValue()); + + List actualWrites = new ArrayList<>(fn.writes); + + Instant flush1Attempt1Begin = Instant.ofEpochMilli(1); + Instant flush1Attempt1RpcStart = Instant.ofEpochMilli(2); + Instant flush1Attempt1RpcEnd = Instant.ofEpochMilli(3); + + assertTrue(actualWrites.isEmpty()); + + fn.finishBundle(finishBundleContext); + + verify(attempt, times(1)).newFlushBuffer(flush1Attempt1Begin); + verify(attempt, times(1)).recordRequestStart(flush1Attempt1RpcStart, 2); + verify(attempt, times(1)).recordWriteCounts(flush1Attempt1RpcEnd, 1, 1); + + verify(attempt, never()).completeSuccess(); + verify(callable, times(1)).call(any()); + verifyNoMoreInteractions(callable); + } + + @Override + protected BatchWriteFnWithSummary getFn( + JodaClock clock, + FirestoreStatefulComponentFactory ff, + RpcQosOptions rpcQosOptions, + CounterFactory counterFactory, + DistributionFactory distributionFactory) { + return new BatchWriteFnWithSummary(clock, ff, rpcQosOptions, counterFactory); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1FnListCollectionIdsTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1FnListCollectionIdsTest.java new file mode 100644 index 000000000000..8a0275734c71 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1FnListCollectionIdsTest.java @@ -0,0 +1,216 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.firestore; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists.newArrayList; +import static org.junit.Assert.assertEquals; +import static org.mockito.ArgumentMatchers.any; +import static org.mockito.ArgumentMatchers.eq; +import static org.mockito.Mockito.doNothing; +import static org.mockito.Mockito.verifyNoMoreInteractions; +import static org.mockito.Mockito.when; + +import com.google.api.gax.rpc.UnaryCallable; +import com.google.cloud.firestore.v1.FirestoreClient.ListCollectionIdsPage; +import com.google.cloud.firestore.v1.FirestoreClient.ListCollectionIdsPagedResponse; +import com.google.cloud.firestore.v1.stub.FirestoreStub; +import com.google.firestore.v1.ListCollectionIdsRequest; +import com.google.firestore.v1.ListCollectionIdsResponse; +import java.util.Iterator; +import java.util.List; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1ReadFn.ListCollectionIdsFn; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.AbstractIterator; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.junit.Test; +import org.mockito.ArgumentCaptor; +import org.mockito.Mock; + +@SuppressWarnings( + "initialization.fields.uninitialized") // mockito fields are initialized via the Mockito Runner +public final class FirestoreV1FnListCollectionIdsTest + extends BaseFirestoreV1ReadFnTest { + + @Mock private UnaryCallable callable; + @Mock private ListCollectionIdsPagedResponse pagedResponse1; + @Mock private ListCollectionIdsPage page1; + @Mock private ListCollectionIdsPagedResponse pagedResponse2; + @Mock private ListCollectionIdsPage page2; + + @Test + public void endToEnd() throws Exception { + // First page of the response + ListCollectionIdsRequest request1 = + ListCollectionIdsRequest.newBuilder() + .setParent(String.format("projects/%s/databases/(default)/document", projectId)) + .build(); + ListCollectionIdsResponse response1 = + ListCollectionIdsResponse.newBuilder() + .addCollectionIds("col_1-1") + .addCollectionIds("col_1-2") + .addCollectionIds("col_1-3") + .setNextPageToken("page2") + .build(); + when(page1.getNextPageToken()).thenReturn(response1.getNextPageToken()); + when(page1.getResponse()).thenReturn(response1); + when(page1.hasNextPage()).thenReturn(true); + + // Second page of the response + ListCollectionIdsResponse response2 = + ListCollectionIdsResponse.newBuilder().addCollectionIds("col_2-1").build(); + when(page2.getResponse()).thenReturn(response2); + when(page2.hasNextPage()).thenReturn(false); + when(pagedResponse1.iteratePages()).thenReturn(ImmutableList.of(page1, page2)); + when(callable.call(request1)).thenReturn(pagedResponse1); + + when(stub.listCollectionIdsPagedCallable()).thenReturn(callable); + + when(ff.getFirestoreStub(any())).thenReturn(stub); + RpcQosOptions options = RpcQosOptions.defaultOptions(); + when(ff.getRpcQos(any())) + .thenReturn(FirestoreStatefulComponentFactory.INSTANCE.getRpcQos(options)); + + ArgumentCaptor responses = + ArgumentCaptor.forClass(ListCollectionIdsResponse.class); + + doNothing().when(processContext).output(responses.capture()); + + when(processContext.element()).thenReturn(request1); + + ListCollectionIdsFn fn = new ListCollectionIdsFn(clock, ff, options); + + runFunction(fn); + + List expected = newArrayList(response1, response2); + List allValues = responses.getAllValues(); + assertEquals(expected, allValues); + } + + @Override + public void resumeFromLastReadValue() throws Exception { 
+ when(ff.getFirestoreStub(any())).thenReturn(stub); + when(ff.getRpcQos(any())).thenReturn(rpcQos); + when(rpcQos.newReadAttempt(any())).thenReturn(attempt); + when(attempt.awaitSafeToProceed(any())).thenReturn(true); + + // First page of the response + ListCollectionIdsRequest request1 = + ListCollectionIdsRequest.newBuilder() + .setParent(String.format("projects/%s/databases/(default)/document", projectId)) + .build(); + ListCollectionIdsResponse response1 = + ListCollectionIdsResponse.newBuilder() + .addCollectionIds("col_1-1") + .addCollectionIds("col_1-2") + .addCollectionIds("col_1-3") + .setNextPageToken("page2") + .build(); + when(page1.getNextPageToken()).thenReturn(response1.getNextPageToken()); + when(page1.getResponse()).thenReturn(response1); + when(page1.hasNextPage()).thenReturn(true); + when(callable.call(request1)).thenReturn(pagedResponse1); + doNothing().when(attempt).checkCanRetry(any(), eq(RETRYABLE_ERROR)); + when(pagedResponse1.iteratePages()) + .thenAnswer( + invocation -> + new Iterable() { + @Override + public Iterator iterator() { + return new AbstractIterator() { + private boolean first = true; + + @Override + protected ListCollectionIdsPage computeNext() { + if (first) { + first = false; + return page1; + } else { + throw RETRYABLE_ERROR; + } + } + }; + } + }); + + // Second page of the response + ListCollectionIdsRequest request2 = + ListCollectionIdsRequest.newBuilder() + .setParent(String.format("projects/%s/databases/(default)/document", projectId)) + .setPageToken("page2") + .build(); + ListCollectionIdsResponse response2 = + ListCollectionIdsResponse.newBuilder().addCollectionIds("col_2-1").build(); + when(page2.getResponse()).thenReturn(response2); + when(page2.hasNextPage()).thenReturn(false); + when(callable.call(request2)).thenReturn(pagedResponse2); + when(pagedResponse2.iteratePages()).thenReturn(ImmutableList.of(page2)); + + when(stub.listCollectionIdsPagedCallable()).thenReturn(callable); + + when(ff.getFirestoreStub(any())).thenReturn(stub); + + ArgumentCaptor responses = + ArgumentCaptor.forClass(ListCollectionIdsResponse.class); + + doNothing().when(processContext).output(responses.capture()); + + when(processContext.element()).thenReturn(request1); + + ListCollectionIdsFn fn = new ListCollectionIdsFn(clock, ff, rpcQosOptions); + + runFunction(fn); + + List expected = newArrayList(response1, response2); + List allValues = responses.getAllValues(); + assertEquals(expected, allValues); + } + + @Override + protected V1RpcFnTestCtx newCtx() { + return new V1RpcFnTestCtx() { + @Override + public ListCollectionIdsRequest getRequest() { + return ListCollectionIdsRequest.newBuilder() + .setParent(String.format("projects/%s/databases/(default)/document", projectId)) + .build(); + } + + @Override + public void mockRpcToCallable(FirestoreStub stub) { + when(stub.listCollectionIdsPagedCallable()).thenReturn(callable); + } + + @Override + public void whenCallableCall(ListCollectionIdsRequest in, Throwable... 
throwables) { + when(callable.call(in)).thenThrow(throwables); + } + + @Override + public void verifyNoInteractionsWithCallable() { + verifyNoMoreInteractions(callable); + } + }; + } + + @Override + protected ListCollectionIdsFn getFn( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + return new ListCollectionIdsFn(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1FnListDocumentsTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1FnListDocumentsTest.java new file mode 100644 index 000000000000..c81aae770e7d --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1FnListDocumentsTest.java @@ -0,0 +1,263 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.firestore; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists.newArrayList; +import static org.junit.Assert.assertEquals; +import static org.mockito.ArgumentMatchers.any; +import static org.mockito.ArgumentMatchers.eq; +import static org.mockito.Mockito.doNothing; +import static org.mockito.Mockito.verifyNoMoreInteractions; +import static org.mockito.Mockito.when; + +import com.google.api.gax.rpc.UnaryCallable; +import com.google.cloud.firestore.v1.FirestoreClient.ListDocumentsPage; +import com.google.cloud.firestore.v1.FirestoreClient.ListDocumentsPagedResponse; +import com.google.cloud.firestore.v1.stub.FirestoreStub; +import com.google.firestore.v1.Document; +import com.google.firestore.v1.ListDocumentsRequest; +import com.google.firestore.v1.ListDocumentsResponse; +import com.google.firestore.v1.Value; +import java.util.Iterator; +import java.util.List; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1ReadFn.ListDocumentsFn; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.AbstractIterator; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.junit.Test; +import org.mockito.ArgumentCaptor; +import org.mockito.Mock; + +@SuppressWarnings( + "initialization.fields.uninitialized") // mockito fields are initialized via the Mockito Runner +public final class FirestoreV1FnListDocumentsTest + extends BaseFirestoreV1ReadFnTest { + + @Mock private UnaryCallable callable; + @Mock private ListDocumentsPagedResponse pagedResponse1; + @Mock private ListDocumentsPage page1; + @Mock private ListDocumentsPagedResponse pagedResponse2; + @Mock private 
ListDocumentsPage page2; + + @Test + public void endToEnd() throws Exception { + // First page of the response + ListDocumentsRequest request1 = + ListDocumentsRequest.newBuilder() + .setParent(String.format("projects/%s/databases/(default)/document", projectId)) + .build(); + ListDocumentsResponse response1 = + ListDocumentsResponse.newBuilder() + .addDocuments( + Document.newBuilder() + .setName("doc_1-1") + .putAllFields( + ImmutableMap.of("foo", Value.newBuilder().setStringValue("bar").build())) + .build()) + .addDocuments( + Document.newBuilder() + .setName("doc_1-2") + .putAllFields( + ImmutableMap.of("foo", Value.newBuilder().setStringValue("bar").build())) + .build()) + .addDocuments( + Document.newBuilder() + .setName("doc_1-3") + .putAllFields( + ImmutableMap.of("foo", Value.newBuilder().setStringValue("bar").build())) + .build()) + .setNextPageToken("page2") + .build(); + when(page1.getNextPageToken()).thenReturn(response1.getNextPageToken()); + when(page1.getResponse()).thenReturn(response1); + when(page1.hasNextPage()).thenReturn(true); + + // Second page of the response + ListDocumentsResponse response2 = + ListDocumentsResponse.newBuilder() + .addDocuments( + Document.newBuilder() + .setName("doc_2-1") + .putAllFields( + ImmutableMap.of("foo", Value.newBuilder().setStringValue("bar").build())) + .build()) + .build(); + when(page2.getResponse()).thenReturn(response2); + when(page2.hasNextPage()).thenReturn(false); + when(pagedResponse1.iteratePages()).thenReturn(ImmutableList.of(page1, page2)); + when(callable.call(request1)).thenReturn(pagedResponse1); + + when(stub.listDocumentsPagedCallable()).thenReturn(callable); + + when(ff.getFirestoreStub(any())).thenReturn(stub); + RpcQosOptions options = RpcQosOptions.defaultOptions(); + when(ff.getRpcQos(any())) + .thenReturn(FirestoreStatefulComponentFactory.INSTANCE.getRpcQos(options)); + + ArgumentCaptor responses = + ArgumentCaptor.forClass(ListDocumentsResponse.class); + + doNothing().when(processContext).output(responses.capture()); + + when(processContext.element()).thenReturn(request1); + + ListDocumentsFn fn = new ListDocumentsFn(clock, ff, options); + + runFunction(fn); + + List expected = newArrayList(response1, response2); + List allValues = responses.getAllValues(); + assertEquals(expected, allValues); + } + + @Override + public void resumeFromLastReadValue() throws Exception { + when(ff.getFirestoreStub(any())).thenReturn(stub); + when(ff.getRpcQos(any())).thenReturn(rpcQos); + when(rpcQos.newReadAttempt(any())).thenReturn(attempt); + when(attempt.awaitSafeToProceed(any())).thenReturn(true); + + // First page of the response + ListDocumentsRequest request1 = + ListDocumentsRequest.newBuilder() + .setParent(String.format("projects/%s/databases/(default)/document", projectId)) + .build(); + ListDocumentsResponse response1 = + ListDocumentsResponse.newBuilder() + .addDocuments( + Document.newBuilder() + .setName("doc_1-1") + .putAllFields( + ImmutableMap.of("foo", Value.newBuilder().setStringValue("bar").build())) + .build()) + .addDocuments( + Document.newBuilder() + .setName("doc_1-2") + .putAllFields( + ImmutableMap.of("foo", Value.newBuilder().setStringValue("bar").build())) + .build()) + .addDocuments( + Document.newBuilder() + .setName("doc_1-3") + .putAllFields( + ImmutableMap.of("foo", Value.newBuilder().setStringValue("bar").build())) + .build()) + .setNextPageToken("page2") + .build(); + when(page1.getNextPageToken()).thenReturn(response1.getNextPageToken()); + when(page1.getResponse()).thenReturn(response1); + 
when(page1.hasNextPage()).thenReturn(true); + when(callable.call(request1)).thenReturn(pagedResponse1); + doNothing().when(attempt).checkCanRetry(any(), eq(RETRYABLE_ERROR)); + when(pagedResponse1.iteratePages()) + .thenAnswer( + invocation -> + new Iterable() { + @Override + public Iterator iterator() { + return new AbstractIterator() { + private boolean first = true; + + @Override + protected ListDocumentsPage computeNext() { + if (first) { + first = false; + return page1; + } else { + throw RETRYABLE_ERROR; + } + } + }; + } + }); + + // Second page of the response + ListDocumentsRequest request2 = + ListDocumentsRequest.newBuilder() + .setParent(String.format("projects/%s/databases/(default)/document", projectId)) + .setPageToken("page2") + .build(); + ListDocumentsResponse response2 = + ListDocumentsResponse.newBuilder() + .addDocuments( + Document.newBuilder() + .setName("doc_2-1") + .putAllFields( + ImmutableMap.of("foo", Value.newBuilder().setStringValue("bar").build())) + .build()) + .build(); + when(page2.getResponse()).thenReturn(response2); + when(page2.hasNextPage()).thenReturn(false); + when(callable.call(request2)).thenReturn(pagedResponse2); + when(pagedResponse2.iteratePages()).thenReturn(ImmutableList.of(page2)); + + when(stub.listDocumentsPagedCallable()).thenReturn(callable); + + when(ff.getFirestoreStub(any())).thenReturn(stub); + + ArgumentCaptor responses = + ArgumentCaptor.forClass(ListDocumentsResponse.class); + + doNothing().when(processContext).output(responses.capture()); + + when(processContext.element()).thenReturn(request1); + + ListDocumentsFn fn = new ListDocumentsFn(clock, ff, rpcQosOptions); + + runFunction(fn); + + List expected = newArrayList(response1, response2); + List allValues = responses.getAllValues(); + assertEquals(expected, allValues); + } + + @Override + protected V1RpcFnTestCtx newCtx() { + return new V1RpcFnTestCtx() { + @Override + public ListDocumentsRequest getRequest() { + return ListDocumentsRequest.newBuilder() + .setParent(String.format("projects/%s/databases/(default)/document", projectId)) + .build(); + } + + @Override + public void mockRpcToCallable(FirestoreStub stub) { + when(stub.listDocumentsPagedCallable()).thenReturn(callable); + } + + @Override + public void whenCallableCall(ListDocumentsRequest in, Throwable... throwables) { + when(callable.call(in)).thenThrow(throwables); + } + + @Override + public void verifyNoInteractionsWithCallable() { + verifyNoMoreInteractions(callable); + } + }; + } + + @Override + protected ListDocumentsFn getFn( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + return new ListDocumentsFn(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1FnPartitionQueryTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1FnPartitionQueryTest.java new file mode 100644 index 000000000000..1f298837ea6a --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1FnPartitionQueryTest.java @@ -0,0 +1,263 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.firestore; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists.newArrayList; +import static org.junit.Assert.assertEquals; +import static org.mockito.ArgumentMatchers.any; +import static org.mockito.ArgumentMatchers.eq; +import static org.mockito.Mockito.doNothing; +import static org.mockito.Mockito.verifyNoMoreInteractions; +import static org.mockito.Mockito.when; + +import com.google.api.gax.rpc.UnaryCallable; +import com.google.cloud.firestore.v1.FirestoreClient.PartitionQueryPage; +import com.google.cloud.firestore.v1.FirestoreClient.PartitionQueryPagedResponse; +import com.google.cloud.firestore.v1.stub.FirestoreStub; +import com.google.firestore.v1.Cursor; +import com.google.firestore.v1.PartitionQueryRequest; +import com.google.firestore.v1.PartitionQueryResponse; +import com.google.firestore.v1.Value; +import java.util.Iterator; +import java.util.List; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1ReadFn.PartitionQueryFn; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1ReadFn.PartitionQueryPair; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.AbstractIterator; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.junit.Test; +import org.mockito.ArgumentCaptor; +import org.mockito.Mock; + +@SuppressWarnings( + "initialization.fields.uninitialized") // mockito fields are initialized via the Mockito Runner +public final class FirestoreV1FnPartitionQueryTest + extends BaseFirestoreV1ReadFnTest { + + @Mock private UnaryCallable callable; + @Mock private PartitionQueryPagedResponse pagedResponse1; + @Mock private PartitionQueryPage page1; + @Mock private PartitionQueryPagedResponse pagedResponse2; + @Mock private PartitionQueryPage page2; + + @Test + public void endToEnd() throws Exception { + // First page of the response + PartitionQueryRequest request1 = + PartitionQueryRequest.newBuilder() + .setParent(String.format("projects/%s/databases/(default)/document", projectId)) + .build(); + PartitionQueryResponse response1 = + PartitionQueryResponse.newBuilder() + .addPartitions( + Cursor.newBuilder().addValues(Value.newBuilder().setReferenceValue("doc-100"))) + .addPartitions( + Cursor.newBuilder().addValues(Value.newBuilder().setReferenceValue("doc-200"))) + .addPartitions( + Cursor.newBuilder().addValues(Value.newBuilder().setReferenceValue("doc-300"))) + .addPartitions( + Cursor.newBuilder().addValues(Value.newBuilder().setReferenceValue("doc-400"))) + .build(); + when(callable.call(request1)).thenReturn(pagedResponse1); + when(page1.getResponse()).thenReturn(response1); + when(pagedResponse1.iteratePages()).thenReturn(ImmutableList.of(page1)); + + when(stub.partitionQueryPagedCallable()).thenReturn(callable); + + when(ff.getFirestoreStub(any())).thenReturn(stub); + RpcQosOptions options = RpcQosOptions.defaultOptions(); + 
when(ff.getRpcQos(any())) + .thenReturn(FirestoreStatefulComponentFactory.INSTANCE.getRpcQos(options)); + + ArgumentCaptor responses = + ArgumentCaptor.forClass(PartitionQueryPair.class); + + doNothing().when(processContext).output(responses.capture()); + + when(processContext.element()).thenReturn(request1); + + PartitionQueryFn fn = new PartitionQueryFn(clock, ff, options); + + runFunction(fn); + + List expected = newArrayList(new PartitionQueryPair(request1, response1)); + List allValues = responses.getAllValues(); + assertEquals(expected, allValues); + } + + @Test + public void endToEnd_emptyCursors() throws Exception { + // First page of the response + PartitionQueryRequest request1 = + PartitionQueryRequest.newBuilder() + .setParent(String.format("projects/%s/databases/(default)/document", projectId)) + .build(); + PartitionQueryResponse response1 = PartitionQueryResponse.newBuilder().build(); + when(callable.call(request1)).thenReturn(pagedResponse1); + when(page1.getResponse()).thenReturn(response1); + when(pagedResponse1.iteratePages()).thenReturn(ImmutableList.of(page1)); + + when(stub.partitionQueryPagedCallable()).thenReturn(callable); + + when(ff.getFirestoreStub(any())).thenReturn(stub); + RpcQosOptions options = RpcQosOptions.defaultOptions(); + when(ff.getRpcQos(any())) + .thenReturn(FirestoreStatefulComponentFactory.INSTANCE.getRpcQos(options)); + + ArgumentCaptor responses = + ArgumentCaptor.forClass(PartitionQueryPair.class); + + doNothing().when(processContext).output(responses.capture()); + + when(processContext.element()).thenReturn(request1); + + PartitionQueryFn fn = new PartitionQueryFn(clock, ff, options); + + runFunction(fn); + + List expected = newArrayList(new PartitionQueryPair(request1, response1)); + List allValues = responses.getAllValues(); + assertEquals(expected, allValues); + } + + @Override + public void resumeFromLastReadValue() throws Exception { + when(ff.getFirestoreStub(any())).thenReturn(stub); + when(ff.getRpcQos(any())).thenReturn(rpcQos); + when(rpcQos.newReadAttempt(any())).thenReturn(attempt); + when(attempt.awaitSafeToProceed(any())).thenReturn(true); + + // First page of the response + PartitionQueryRequest request1 = + PartitionQueryRequest.newBuilder() + .setParent(String.format("projects/%s/databases/(default)/document", projectId)) + .build(); + PartitionQueryResponse response1 = + PartitionQueryResponse.newBuilder() + .addPartitions( + Cursor.newBuilder().addValues(Value.newBuilder().setReferenceValue("doc-100"))) + .addPartitions( + Cursor.newBuilder().addValues(Value.newBuilder().setReferenceValue("doc-200"))) + .addPartitions( + Cursor.newBuilder().addValues(Value.newBuilder().setReferenceValue("doc-300"))) + .setNextPageToken("page2") + .build(); + when(page1.getResponse()).thenReturn(response1); + when(callable.call(request1)).thenReturn(pagedResponse1); + doNothing().when(attempt).checkCanRetry(any(), eq(RETRYABLE_ERROR)); + when(pagedResponse1.iteratePages()) + .thenAnswer( + invocation -> + new Iterable() { + @Override + public Iterator iterator() { + return new AbstractIterator() { + private boolean first = true; + + @Override + protected PartitionQueryPage computeNext() { + if (first) { + first = false; + return page1; + } else { + throw RETRYABLE_ERROR; + } + } + }; + } + }); + + // Second page of the response + PartitionQueryRequest request2 = + PartitionQueryRequest.newBuilder() + .setParent(String.format("projects/%s/databases/(default)/document", projectId)) + .setPageToken("page2") + .build(); + PartitionQueryResponse 
response2 = + PartitionQueryResponse.newBuilder() + .addPartitions( + Cursor.newBuilder().addValues(Value.newBuilder().setReferenceValue("doc-400"))) + .build(); + + PartitionQueryResponse expectedResponse = + response1 + .toBuilder() + .clearNextPageToken() + .addAllPartitions(response2.getPartitionsList()) + .build(); + + when(page2.getResponse()).thenReturn(response2); + when(page2.hasNextPage()).thenReturn(false); + when(callable.call(request2)).thenReturn(pagedResponse2); + when(pagedResponse2.iteratePages()).thenReturn(ImmutableList.of(page2)); + + when(stub.partitionQueryPagedCallable()).thenReturn(callable); + + when(ff.getFirestoreStub(any())).thenReturn(stub); + + ArgumentCaptor responses = + ArgumentCaptor.forClass(PartitionQueryPair.class); + + doNothing().when(processContext).output(responses.capture()); + + when(processContext.element()).thenReturn(request1); + + PartitionQueryFn fn = new PartitionQueryFn(clock, ff, rpcQosOptions); + + runFunction(fn); + + List expected = + newArrayList(new PartitionQueryPair(request1, expectedResponse)); + List allValues = responses.getAllValues(); + assertEquals(expected, allValues); + } + + @Override + protected V1RpcFnTestCtx newCtx() { + return new V1RpcFnTestCtx() { + @Override + public PartitionQueryRequest getRequest() { + return PartitionQueryRequest.newBuilder() + .setParent(String.format("projects/%s/databases/(default)/document", projectId)) + .build(); + } + + @Override + public void mockRpcToCallable(FirestoreStub stub) { + when(stub.partitionQueryPagedCallable()).thenReturn(callable); + } + + @Override + public void whenCallableCall(PartitionQueryRequest in, Throwable... throwables) { + when(callable.call(in)).thenThrow(throwables); + } + + @Override + public void verifyNoInteractionsWithCallable() { + verifyNoMoreInteractions(callable); + } + }; + } + + @Override + protected PartitionQueryFn getFn( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + return new PartitionQueryFn(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1FnRunQueryTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1FnRunQueryTest.java new file mode 100644 index 000000000000..23b18d0a6253 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1FnRunQueryTest.java @@ -0,0 +1,364 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.firestore; + +import static java.util.Objects.requireNonNull; +import static org.junit.Assert.assertEquals; +import static org.mockito.ArgumentMatchers.any; +import static org.mockito.ArgumentMatchers.eq; +import static org.mockito.Mockito.doNothing; +import static org.mockito.Mockito.times; +import static org.mockito.Mockito.verify; +import static org.mockito.Mockito.verifyNoMoreInteractions; +import static org.mockito.Mockito.when; + +import com.google.api.gax.rpc.ServerStream; +import com.google.api.gax.rpc.ServerStreamingCallable; +import com.google.cloud.firestore.v1.stub.FirestoreStub; +import com.google.firestore.v1.Cursor; +import com.google.firestore.v1.Document; +import com.google.firestore.v1.RunQueryRequest; +import com.google.firestore.v1.RunQueryResponse; +import com.google.firestore.v1.StructuredQuery; +import com.google.firestore.v1.StructuredQuery.CollectionSelector; +import com.google.firestore.v1.StructuredQuery.Direction; +import com.google.firestore.v1.StructuredQuery.FieldFilter; +import com.google.firestore.v1.StructuredQuery.FieldFilter.Operator; +import com.google.firestore.v1.StructuredQuery.FieldReference; +import com.google.firestore.v1.StructuredQuery.Filter; +import com.google.firestore.v1.StructuredQuery.Order; +import com.google.firestore.v1.Value; +import java.util.Collections; +import java.util.List; +import java.util.function.Function; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1ReadFn.RunQueryFn; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.AbstractIterator; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.junit.Test; +import org.mockito.ArgumentCaptor; +import org.mockito.Mock; + +@SuppressWarnings( + "initialization.fields.uninitialized") // mockito fields are initialized via the Mockito Runner +public final class FirestoreV1FnRunQueryTest + extends BaseFirestoreV1ReadFnTest { + + @Mock private ServerStreamingCallable callable; + @Mock private ServerStream responseStream1; + @Mock private ServerStream responseStream2; + + @Test + public void endToEnd() throws Exception { + TestData testData = TestData.fieldEqualsBar().setProjectId(projectId).build(); + + List responses = + ImmutableList.of(testData.response1, testData.response2, testData.response3); + when(responseStream1.iterator()).thenReturn(responses.iterator()); + + when(callable.call(testData.request)).thenReturn(responseStream1); + + when(stub.runQueryCallable()).thenReturn(callable); + + when(ff.getFirestoreStub(any())).thenReturn(stub); + RpcQosOptions options = RpcQosOptions.defaultOptions(); + when(ff.getRpcQos(any())) + .thenReturn(FirestoreStatefulComponentFactory.INSTANCE.getRpcQos(options)); + + ArgumentCaptor responsesCaptor = + ArgumentCaptor.forClass(RunQueryResponse.class); + + doNothing().when(processContext).output(responsesCaptor.capture()); + + when(processContext.element()).thenReturn(testData.request); + + RunQueryFn fn = new RunQueryFn(clock, ff, options); + + runFunction(fn); + + List allValues = responsesCaptor.getAllValues(); + assertEquals(responses, allValues); + } + + @Override + public void resumeFromLastReadValue() throws Exception { + TestData testData = + TestData.fieldEqualsBar() + .setProjectId(projectId) + .setOrderFunction( + f -> + Collections.singletonList( + Order.newBuilder().setDirection(Direction.ASCENDING).setField(f).build())) + .build(); + 
+ RunQueryRequest request2 = + RunQueryRequest.newBuilder() + .setParent(String.format("projects/%s/databases/(default)/document", projectId)) + .setStructuredQuery( + testData + .request + .getStructuredQuery() + .toBuilder() + .setStartAt( + Cursor.newBuilder() + .setBefore(false) + .addValues(Value.newBuilder().setStringValue("bar")))) + .build(); + + List responses = + ImmutableList.of(testData.response1, testData.response2, testData.response3); + when(responseStream1.iterator()) + .thenReturn( + new AbstractIterator() { + private int invocationCount = 1; + + @Override + protected RunQueryResponse computeNext() { + int count = invocationCount++; + if (count == 1) { + return responses.get(0); + } else if (count == 2) { + return responses.get(1); + } else { + throw RETRYABLE_ERROR; + } + } + }); + + when(callable.call(testData.request)).thenReturn(responseStream1); + doNothing().when(attempt).checkCanRetry(any(), eq(RETRYABLE_ERROR)); + when(responseStream2.iterator()).thenReturn(ImmutableList.of(responses.get(2)).iterator()); + when(callable.call(request2)).thenReturn(responseStream2); + + when(stub.runQueryCallable()).thenReturn(callable); + + when(ff.getFirestoreStub(any())).thenReturn(stub); + when(ff.getRpcQos(any())).thenReturn(rpcQos); + when(rpcQos.newReadAttempt(any())).thenReturn(attempt); + when(attempt.awaitSafeToProceed(any())).thenReturn(true); + + ArgumentCaptor responsesCaptor = + ArgumentCaptor.forClass(RunQueryResponse.class); + + doNothing().when(processContext).output(responsesCaptor.capture()); + + when(processContext.element()).thenReturn(testData.request); + + RunQueryFn fn = new RunQueryFn(clock, ff, rpcQosOptions); + + runFunction(fn); + + List allValues = responsesCaptor.getAllValues(); + assertEquals(responses, allValues); + + verify(callable, times(1)).call(testData.request); + verify(callable, times(1)).call(request2); + verify(attempt, times(3)).recordStreamValue(any()); + } + + @Test + public void resumeFromLastReadValue_withNoOrderBy() throws Exception { + TestData testData = TestData.fieldEqualsBar().setProjectId(projectId).build(); + + RunQueryRequest request2 = + RunQueryRequest.newBuilder() + .setParent(String.format("projects/%s/databases/(default)/document", projectId)) + .setStructuredQuery( + testData + .request + .getStructuredQuery() + .toBuilder() + .setStartAt( + Cursor.newBuilder() + .setBefore(false) + .addValues( + Value.newBuilder() + .setReferenceValue(testData.response2.getDocument().getName()))) + .addOrderBy( + Order.newBuilder() + .setField(FieldReference.newBuilder().setFieldPath("__name__")) + .setDirection(Direction.ASCENDING))) + .build(); + + List responses = + ImmutableList.of(testData.response1, testData.response2, testData.response3); + when(responseStream1.iterator()) + .thenReturn( + new AbstractIterator() { + private int invocationCount = 1; + + @Override + protected RunQueryResponse computeNext() { + int count = invocationCount++; + if (count == 1) { + return responses.get(0); + } else if (count == 2) { + return responses.get(1); + } else { + throw RETRYABLE_ERROR; + } + } + }); + + when(callable.call(testData.request)).thenReturn(responseStream1); + doNothing().when(attempt).checkCanRetry(any(), eq(RETRYABLE_ERROR)); + when(responseStream2.iterator()).thenReturn(ImmutableList.of(testData.response3).iterator()); + when(callable.call(request2)).thenReturn(responseStream2); + + when(stub.runQueryCallable()).thenReturn(callable); + + when(ff.getFirestoreStub(any())).thenReturn(stub); + 
when(ff.getRpcQos(any())).thenReturn(rpcQos); + when(rpcQos.newReadAttempt(any())).thenReturn(attempt); + when(attempt.awaitSafeToProceed(any())).thenReturn(true); + + ArgumentCaptor responsesCaptor = + ArgumentCaptor.forClass(RunQueryResponse.class); + + doNothing().when(processContext).output(responsesCaptor.capture()); + + when(processContext.element()).thenReturn(testData.request); + + RunQueryFn fn = new RunQueryFn(clock, ff, rpcQosOptions); + + runFunction(fn); + + List allValues = responsesCaptor.getAllValues(); + assertEquals(responses, allValues); + + verify(callable, times(1)).call(testData.request); + verify(callable, times(1)).call(request2); + verify(attempt, times(3)).recordStreamValue(any()); + } + + @Override + protected V1RpcFnTestCtx newCtx() { + return new V1RpcFnTestCtx() { + @Override + public RunQueryRequest getRequest() { + return RunQueryRequest.newBuilder() + .setParent(String.format("projects/%s/databases/(default)/document", projectId)) + .build(); + } + + @Override + public void mockRpcToCallable(FirestoreStub stub) { + when(stub.runQueryCallable()).thenReturn(callable); + } + + @Override + public void whenCallableCall(RunQueryRequest in, Throwable... throwables) { + when(callable.call(in)).thenThrow(throwables); + } + + @Override + public void verifyNoInteractionsWithCallable() { + verifyNoMoreInteractions(callable); + } + }; + } + + @Override + protected RunQueryFn getFn( + JodaClock clock, + FirestoreStatefulComponentFactory firestoreStatefulComponentFactory, + RpcQosOptions rpcQosOptions) { + return new RunQueryFn(clock, firestoreStatefulComponentFactory, rpcQosOptions); + } + + private static final class TestData { + + private final RunQueryRequest request; + private final RunQueryResponse response1; + private final RunQueryResponse response2; + private final RunQueryResponse response3; + + public TestData(String projectId, Function> orderFunction) { + String fieldPath = "foo"; + FieldReference foo = FieldReference.newBuilder().setFieldPath(fieldPath).build(); + StructuredQuery.Builder builder = + StructuredQuery.newBuilder() + .addFrom( + CollectionSelector.newBuilder() + .setAllDescendants(false) + .setCollectionId("collection")) + .setWhere( + Filter.newBuilder() + .setFieldFilter( + FieldFilter.newBuilder() + .setField(foo) + .setOp(Operator.EQUAL) + .setValue(Value.newBuilder().setStringValue("bar")) + .build())); + + orderFunction.apply(foo).forEach(builder::addOrderBy); + request = + RunQueryRequest.newBuilder() + .setParent(String.format("projects/%s/databases/(default)/document", projectId)) + .setStructuredQuery(builder) + .build(); + + response1 = newResponse(fieldPath, 1); + response2 = newResponse(fieldPath, 2); + response3 = newResponse(fieldPath, 3); + } + + private static RunQueryResponse newResponse(String field, int docNumber) { + String docId = String.format("doc-%d", docNumber); + return RunQueryResponse.newBuilder() + .setDocument( + Document.newBuilder() + .setName(docId) + .putAllFields( + ImmutableMap.of(field, Value.newBuilder().setStringValue("bar").build())) + .build()) + .build(); + } + + private static Builder fieldEqualsBar() { + return new Builder(); + } + + @SuppressWarnings("initialization.fields.uninitialized") // fields set via builder methods + private static final class Builder { + + private String projectId; + private Function> orderFunction; + + public Builder() { + orderFunction = f -> Collections.emptyList(); + } + + public Builder setProjectId(String projectId) { + this.projectId = projectId; + return this; + } + + 
public Builder setOrderFunction(Function> orderFunction) { + this.orderFunction = orderFunction; + return this; + } + + private TestData build() { + return new TestData( + requireNonNull(projectId, "projectId must be non null"), + requireNonNull(orderFunction, "orderFunction must be non null")); + } + } + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/PartitionQueryResponseToRunQueryRequestTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/PartitionQueryResponseToRunQueryRequestTest.java new file mode 100644 index 000000000000..ed789dae59f4 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/PartitionQueryResponseToRunQueryRequestTest.java @@ -0,0 +1,174 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.firestore; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists.newArrayList; +import static org.junit.Assert.assertEquals; +import static org.mockito.Mockito.doNothing; +import static org.mockito.Mockito.when; + +import com.google.firestore.v1.Cursor; +import com.google.firestore.v1.PartitionQueryRequest; +import com.google.firestore.v1.PartitionQueryResponse; +import com.google.firestore.v1.RunQueryRequest; +import com.google.firestore.v1.StructuredQuery; +import com.google.firestore.v1.StructuredQuery.CollectionSelector; +import com.google.firestore.v1.Value; +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.stream.Collectors; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.PartitionQuery.PartitionQueryResponseToRunQueryRequest; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1ReadFn.PartitionQueryPair; +import org.apache.beam.sdk.transforms.DoFn; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.mockito.ArgumentCaptor; +import org.mockito.Mock; +import org.mockito.junit.MockitoJUnitRunner; + +@RunWith(MockitoJUnitRunner.class) +@SuppressWarnings( + "initialization.fields.uninitialized") // mockito fields are initialized via the Mockito Runner +public final class PartitionQueryResponseToRunQueryRequestTest { + + @Mock protected DoFn.ProcessContext processContext; + + @Test + public void ensureSortingCorrectlyHandlesPathSegments() { + List expected = + newArrayList( + referenceValueCursor("projects/p1/databases/d1/documents/c1/doc1"), + referenceValueCursor("projects/p1/databases/d1/documents/c1/doc2"), + referenceValueCursor("projects/p1/databases/d1/documents/c1/doc2/c2/doc1"), + referenceValueCursor("projects/p1/databases/d1/documents/c1/doc2/c2/doc2"), + 
referenceValueCursor("projects/p1/databases/d1/documents/c10/doc1"), + referenceValueCursor("projects/p1/databases/d1/documents/c2/doc1"), + referenceValueCursor("projects/p2/databases/d2/documents/c1/doc1"), + referenceValueCursor("projects/p2/databases/d2/documents/c1-/doc1"), + referenceValueCursor("projects/p2/databases/d3/documents/c1-/doc1"), + referenceValueCursor("projects/p2/databases/d3/documents/c1-/doc1"), + Cursor.newBuilder().build()); + + for (int i = 0; i < 1000; i++) { + List list = new ArrayList<>(expected); + Collections.shuffle(list); + + list.sort(PartitionQueryResponseToRunQueryRequest.CURSOR_REFERENCE_VALUE_COMPARATOR); + + assertEquals(expected, list); + } + } + + @Test + public void ensureCursorPairingWorks() { + StructuredQuery query = + StructuredQuery.newBuilder() + .addFrom( + CollectionSelector.newBuilder() + .setAllDescendants(true) + .setCollectionId("c1") + .build()) + .build(); + + Cursor cursor1 = referenceValueCursor("projects/p1/databases/d1/documents/c1/doc1"); + Cursor cursor2 = referenceValueCursor("projects/p1/databases/d1/documents/c1/doc2"); + Cursor cursor3 = referenceValueCursor("projects/p1/databases/d1/documents/c1/doc2/c2/doc2"); + + List expectedQueries = + newArrayList( + newQueryWithCursors(query, null, cursor1), + newQueryWithCursors(query, cursor1, cursor2), + newQueryWithCursors(query, cursor2, cursor3), + newQueryWithCursors(query, cursor3, null)); + + PartitionQueryPair partitionQueryPair = + new PartitionQueryPair( + PartitionQueryRequest.newBuilder().setStructuredQuery(query).build(), + PartitionQueryResponse.newBuilder() + .addPartitions(cursor3) + .addPartitions(cursor1) + .addPartitions(cursor2) + .build()); + + ArgumentCaptor captor = ArgumentCaptor.forClass(RunQueryRequest.class); + when(processContext.element()).thenReturn(partitionQueryPair); + doNothing().when(processContext).output(captor.capture()); + + PartitionQueryResponseToRunQueryRequest fn = new PartitionQueryResponseToRunQueryRequest(); + fn.processElement(processContext); + + List actualQueries = + captor.getAllValues().stream() + .map(RunQueryRequest::getStructuredQuery) + .collect(Collectors.toList()); + + assertEquals(expectedQueries, actualQueries); + } + + @Test + public void ensureCursorPairingWorks_emptyCursorsInResponse() { + StructuredQuery query = + StructuredQuery.newBuilder() + .addFrom( + CollectionSelector.newBuilder() + .setAllDescendants(true) + .setCollectionId("c1") + .build()) + .build(); + + List expectedQueries = newArrayList(query); + + PartitionQueryPair partitionQueryPair = + new PartitionQueryPair( + PartitionQueryRequest.newBuilder().setStructuredQuery(query).build(), + PartitionQueryResponse.newBuilder().build()); + + ArgumentCaptor captor = ArgumentCaptor.forClass(RunQueryRequest.class); + when(processContext.element()).thenReturn(partitionQueryPair); + doNothing().when(processContext).output(captor.capture()); + + PartitionQueryResponseToRunQueryRequest fn = new PartitionQueryResponseToRunQueryRequest(); + fn.processElement(processContext); + + List actualQueries = + captor.getAllValues().stream() + .map(RunQueryRequest::getStructuredQuery) + .collect(Collectors.toList()); + + assertEquals(expectedQueries, actualQueries); + } + + private static Cursor referenceValueCursor(String referenceValue) { + return Cursor.newBuilder() + .addValues(Value.newBuilder().setReferenceValue(referenceValue).build()) + .build(); + } + + private static StructuredQuery newQueryWithCursors( + StructuredQuery query, Cursor startAt, Cursor endAt) { + 
StructuredQuery.Builder builder = query.toBuilder(); + if (startAt != null) { + builder.setStartAt(startAt); + } + if (endAt != null) { + builder.setEndAt(endAt); + } + return builder.build(); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/RpcQosOptionsTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/RpcQosOptionsTest.java new file mode 100644 index 000000000000..363561784be4 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/RpcQosOptionsTest.java @@ -0,0 +1,297 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.firestore; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists.newArrayList; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertNotEquals; +import static org.junit.Assert.assertNotNull; +import static org.junit.Assert.assertTrue; +import static org.junit.Assert.fail; +import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.when; + +import java.util.ArrayList; +import java.util.List; +import java.util.function.BiFunction; +import java.util.stream.Collectors; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.transforms.display.DisplayData.ItemSpec; +import org.apache.beam.sdk.util.SerializableUtils; +import org.joda.time.Duration; +import org.junit.Test; +import org.junit.runners.model.MultipleFailureException; +import org.mockito.ArgumentCaptor; + +@SuppressWarnings({ + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) +public final class RpcQosOptionsTest { + + @Test + public void ensureSerializable() { + SerializableUtils.ensureSerializable(RpcQosOptions.defaultOptions()); + } + + @Test + public void builderBuildBuilder() { + RpcQosOptions rpcQosOptions = RpcQosOptions.defaultOptions(); + int newMaxAttempts = rpcQosOptions.getMaxAttempts() - 1; + RpcQosOptions.Builder builder = rpcQosOptions.toBuilder().withMaxAttempts(newMaxAttempts); + RpcQosOptions build = builder.build(); + + assertNotEquals(rpcQosOptions, build); + assertEquals(newMaxAttempts, build.getMaxAttempts()); + } + + @Test + public void populateDisplayData() { + //noinspection unchecked + ArgumentCaptor> captor = ArgumentCaptor.forClass(DisplayData.ItemSpec.class); + + DisplayData.Builder builder = mock(DisplayData.Builder.class); + when(builder.add(captor.capture())).thenReturn(builder); + RpcQosOptions rpcQosOptions = RpcQosOptions.defaultOptions(); + rpcQosOptions.populateDisplayData(builder); + + List actualKeys = + captor.getAllValues().stream().map(ItemSpec::getKey).sorted().collect(Collectors.toList()); + + List 
expectedKeys = + newArrayList( + "batchInitialCount", + "batchMaxBytes", + "batchMaxCount", + "batchTargetLatency", + "hintMaxNumWorkers", + "initialBackoff", + "maxAttempts", + "overloadRatio", + "samplePeriod", + "samplePeriodBucketSize", + "shouldReportDiagnosticMetrics", + "throttleDuration"); + + assertEquals(expectedKeys, actualKeys); + } + + @Test + public void defaultOptionsBuildSuccessfully() { + assertNotNull(RpcQosOptions.defaultOptions()); + } + + @Test + public void argumentValidation_maxAttempts() throws Exception { + BiFunction f = + RpcQosOptions.Builder::withMaxAttempts; + testNullability(f); + testIntRange(f, 1, 5); + } + + @Test + public void argumentValidation_withInitialBackoff() throws Exception { + BiFunction f = + RpcQosOptions.Builder::withInitialBackoff; + testNullability(f); + testDurationRange(f, Duration.standardSeconds(5), Duration.standardMinutes(2)); + } + + @Test + public void argumentValidation_withSamplePeriod() throws Exception { + BiFunction f = + RpcQosOptions.Builder::withSamplePeriod; + testNullability(f); + testDurationRange(f, Duration.standardMinutes(2), Duration.standardMinutes(20)); + } + + @Test + public void argumentValidation_withSampleUpdateFrequency() throws Exception { + BiFunction f = + RpcQosOptions.Builder::withSamplePeriodBucketSize; + testNullability(f); + testDurationRange(f, Duration.standardSeconds(10), Duration.standardMinutes(20)); + } + + @Test + public void argumentValidation_withSampleUpdateFrequency_lteqSamplePeriod() { + RpcQosOptions.newBuilder() + .withSamplePeriod(Duration.millis(5)) + .withSamplePeriodBucketSize(Duration.millis(5)) + .validateRelatedFields(); + try { + RpcQosOptions.newBuilder() + .withSamplePeriod(Duration.millis(5)) + .withSamplePeriodBucketSize(Duration.millis(6)) + .validateRelatedFields(); + fail("expected validation failure for samplePeriodBucketSize > samplePeriod"); + } catch (IllegalArgumentException e) { + assertTrue(e.getMessage().contains("samplePeriodBucketSize <= samplePeriod")); + } + } + + @Test + public void argumentValidation_withOverloadRatio() throws Exception { + BiFunction f = + RpcQosOptions.Builder::withOverloadRatio; + testNullability(f); + testRange(0.0001, Double::sum, (a, b) -> a - b, f, 1.0, 1.5); + } + + @Test + public void argumentValidation_withThrottleDuration() throws Exception { + BiFunction f = + RpcQosOptions.Builder::withThrottleDuration; + testNullability(f); + testDurationRange(f, Duration.standardSeconds(5), Duration.standardMinutes(1)); + } + + @Test + public void argumentValidation_withBatchInitialCount() throws Exception { + BiFunction f = + RpcQosOptions.Builder::withBatchInitialCount; + testNullability(f); + testIntRange(f, 1, 500); + } + + @Test + public void argumentValidation_withBatchMaxCount() throws Exception { + BiFunction f = + RpcQosOptions.Builder::withBatchMaxCount; + testNullability(f); + testIntRange(f, 1, 500); + } + + @Test + public void argumentValidation_withBatchMaxBytes() throws Exception { + BiFunction f = + RpcQosOptions.Builder::withBatchMaxBytes; + testNullability(f); + testRange(1L, Math::addExact, Math::subtractExact, f, 1L, (long) (9.5 * 1024 * 1024)); + } + + @Test + public void argumentValidation_withBatchTargetLatency() throws Exception { + BiFunction f = + RpcQosOptions.Builder::withBatchTargetLatency; + testNullability(f); + testDurationRange(f, Duration.standardSeconds(5), Duration.standardMinutes(2)); + } + + private static void testNullability( + BiFunction f) { + try { + f.apply(RpcQosOptions.newBuilder(), null).build(); + 
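// Reaching the fail() below means the builder accepted a null argument instead of rejecting it. + 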
fail("expected NullPointerException"); + } catch (NullPointerException e) { + // pass + } + } + + private static void testIntRange( + BiFunction f, int min, int max) + throws Exception { + testRange(1, Math::addExact, Math::subtractExact, f, min, max); + } + + private static void testDurationRange( + BiFunction f, + Duration min, + Duration max) + throws Exception { + testRange(Duration.millis(1), Duration::plus, Duration::minus, f, min, max); + } + + private static void testRange( + T epsilon, + BiFunction plus, + BiFunction minus, + BiFunction f, + T min, + T max) + throws Exception { + List errors = new ArrayList<>(); + errors.addAll(testMinBoundary(epsilon, plus, minus, f, min)); + errors.addAll(testMaxBoundary(epsilon, plus, minus, f, max)); + MultipleFailureException.assertEmpty(errors); + } + + private static List testMaxBoundary( + T epsilon, + BiFunction plus, + BiFunction minus, + BiFunction f, + T max) { + List errors = new ArrayList<>(); + try { + f.apply(RpcQosOptions.newBuilder(), minus.apply(max, epsilon)).validateIndividualFields(); + } catch (Throwable t) { + errors.add(newError(t, "max - epsilon")); + } + try { + f.apply(RpcQosOptions.newBuilder(), max).validateIndividualFields(); + } catch (Throwable t) { + errors.add(newError(t, "max")); + } + try { + try { + f.apply(RpcQosOptions.newBuilder(), plus.apply(max, epsilon)).validateIndividualFields(); + fail("expected IllegalArgumentException"); + } catch (IllegalArgumentException e) { + // pass + } + } catch (Throwable t) { + errors.add(newError(t, "max + epsilon")); + } + return errors; + } + + private static List testMinBoundary( + T epsilon, + BiFunction plus, + BiFunction minus, + BiFunction f, + T min) { + List errors = new ArrayList<>(); + try { + try { + f.apply(RpcQosOptions.newBuilder(), minus.apply(min, epsilon)).validateIndividualFields(); + fail("expected IllegalArgumentException"); + } catch (IllegalArgumentException e) { + // pass + } + } catch (Throwable t) { + errors.add(newError(t, "min - epsilon")); + } + try { + f.apply(RpcQosOptions.newBuilder(), min).validateIndividualFields(); + } catch (Throwable t) { + errors.add(newError(t, "min")); + } + try { + f.apply(RpcQosOptions.newBuilder(), plus.apply(min, epsilon)).validateIndividualFields(); + } catch (Throwable t) { + errors.add(newError(t, "min + epsilon")); + } + return errors; + } + + private static AssertionError newError(Throwable t, String conditionDescription) { + return new AssertionError( + String.format("error while testing boundary condition (%s)", conditionDescription), t); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/RpcQosSimulationTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/RpcQosSimulationTest.java new file mode 100644 index 000000000000..bbf3e135e43f --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/RpcQosSimulationTest.java @@ -0,0 +1,251 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.firestore; + +import static org.joda.time.Duration.ZERO; +import static org.joda.time.Duration.millis; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertTrue; +import static org.mockito.ArgumentMatchers.any; +import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.when; + +import java.util.Arrays; +import java.util.Random; +import org.apache.beam.sdk.io.gcp.firestore.RpcQos.RpcAttempt.Context; +import org.apache.beam.sdk.io.gcp.firestore.RpcQos.RpcWriteAttempt.Element; +import org.apache.beam.sdk.io.gcp.firestore.RpcQosImpl.FlushBufferImpl; +import org.apache.beam.sdk.io.gcp.firestore.RpcQosImpl.RpcWriteAttemptImpl; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.Distribution; +import org.apache.beam.sdk.util.Sleeper; +import org.joda.time.Duration; +import org.joda.time.Instant; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.TestName; +import org.junit.runner.RunWith; +import org.mockito.Mock; +import org.mockito.junit.MockitoJUnitRunner; + +@RunWith(MockitoJUnitRunner.class) +@SuppressWarnings( + "initialization.fields.uninitialized") // mockito fields are initialized via the Mockito Runner +public final class RpcQosSimulationTest { + + @Mock(lenient = true) + private Sleeper sleeper; + + @Mock(lenient = true) + private CounterFactory counterFactory; + + @Mock(lenient = true) + private DistributionFactory distributionFactory; + + @Mock(lenient = true) + private Counter counterThrottlingMs; + + @Mock(lenient = true) + private Counter counterRpcFailures; + + @Mock(lenient = true) + private Counter counterRpcSuccesses; + + @Mock(lenient = true) + private Counter counterRpcStreamValueReceived; + + @Rule public final TestName testName = new TestName(); + + // should not be static, important to reinitialize for each test + private final Random random = + new Random(1234567890); // fix the seed so we have deterministic tests + + private Context rpcAttemptContext; + + @Before + public void setUp() { + rpcAttemptContext = + () -> String.format("%s.%s", this.getClass().getName(), testName.getMethodName()); + when(counterFactory.get(rpcAttemptContext.getNamespace(), "throttlingMs")) + .thenReturn(counterThrottlingMs); + when(counterFactory.get(rpcAttemptContext.getNamespace(), "rpc_failures")) + .thenReturn(counterRpcFailures); + when(counterFactory.get(rpcAttemptContext.getNamespace(), "rpc_successes")) + .thenReturn(counterRpcSuccesses); + when(counterFactory.get(rpcAttemptContext.getNamespace(), "rpc_streamValueReceived")) + .thenReturn(counterRpcStreamValueReceived); + when(distributionFactory.get(any(), any())) + .thenAnswer( + invocation -> mock(Distribution.class, invocation.getArgument(1, String.class))); + } + + @Test + public void writeRampUp_shouldScaleAlongTheExpectedLine() throws InterruptedException { + RpcQosOptions options = + RpcQosOptions.newBuilder() + .withHintMaxNumWorkers(1) + .withThrottleDuration(Duration.standardSeconds(5)) + 
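// Note (inferred from the single-worker ramp-up tests): hintMaxNumWorkers(1) keeps the per-worker ramp-up base at 500 writes, matching the budget table below. + 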
.withBatchInitialCount(200) + // in the test we're jumping ahead by 5 minutes at multiple points, so the default 2 + // minute + // history for the adaptive throttler empties and the batch count drops back to the + // initial value. Increase the sample period to 10 minutes to ease the amount of state + // tracking that needs to be taken into account for calculating the ramp up values being + // asserted. + .withSamplePeriod(Duration.standardMinutes(10)) + .withSamplePeriodBucketSize(Duration.standardMinutes(2)) + .build(); + + RpcQosImpl qos = new RpcQosImpl(options, random, sleeper, counterFactory, distributionFactory); + /* + Ramp up budgets for 0 -> 90 minutes + 1970-01-01T00:00:00.000Z -> 500 + 1970-01-01T00:05:00.000Z -> 500 + 1970-01-01T00:10:00.000Z -> 750 + 1970-01-01T00:15:00.000Z -> 1,125 + 1970-01-01T00:20:00.000Z -> 1,687 + 1970-01-01T00:25:00.000Z -> 2,531 + 1970-01-01T00:30:00.000Z -> 3,796 + 1970-01-01T00:35:00.000Z -> 5,695 + 1970-01-01T00:40:00.000Z -> 8,542 + 1970-01-01T00:45:00.000Z -> 12,814 + 1970-01-01T00:50:00.000Z -> 19,221 + 1970-01-01T00:55:00.000Z -> 28,832 + 1970-01-01T01:00:00.000Z -> 43,248 + 1970-01-01T01:05:00.000Z -> 64,873 + 1970-01-01T01:10:00.000Z -> 97,309 + 1970-01-01T01:15:00.000Z -> 145,964 + 1970-01-01T01:20:00.000Z -> 218,946 + 1970-01-01T01:25:00.000Z -> 328,420 + 1970-01-01T01:30:00.000Z -> 492,630 + */ + // timeline values (names are of the format t'minute'm'seconds's'millis'ms) + Instant t00m00s000ms = t(ZERO); + Instant t00m00s010ms = t(millis(10)); + Instant t00m00s020ms = t(millis(20)); + Instant t00m00s999ms = t(millis(999)); + + Instant t00m01s000ms = t(seconds(1)); + Instant t00m01s001ms = t(seconds(1), millis(1)); + + Instant t05m00s000ms = t(minutes(5)); + Instant t05m00s001ms = t(minutes(5), millis(1)); + Instant t05m00s002ms = t(minutes(5), millis(2)); + + Instant t10m00s000ms = t(minutes(10)); + Instant t10m00s001ms = t(minutes(10), millis(1)); + Instant t10m00s002ms = t(minutes(10), millis(2)); + + Instant t15m00s000ms = t(minutes(15), millis(1)); + + Instant t20m00s000ms = t(minutes(20)); + Instant t20m01s000ms = t(minutes(20), seconds(1)); + + safeToProceedAndWithBudgetAndWrite(qos, t00m00s000ms, 200, 200, "write 200"); + safeToProceedAndWithBudgetAndWrite(qos, t00m00s010ms, 300, 300, "write 300"); + unsafeToProceed(qos, t00m00s020ms); + unsafeToProceed(qos, t00m00s999ms); + safeToProceedAndWithBudgetAndWrite( + qos, t00m01s000ms, 500, 500, "wait 1 second for budget to refill, write 500"); + unsafeToProceed(qos, t00m01s001ms); + safeToProceedAndWithBudgetAndWrite( + qos, t05m00s000ms, 500, 100, "jump ahead to next ramp up interval and write 100"); + safeToProceedAndWithBudgetAndWrite( + qos, t05m00s001ms, 400, 400, "write another 400 exhausting budget"); + unsafeToProceed(qos, t05m00s002ms); + safeToProceedAndWithBudgetAndWrite( + qos, + t10m00s000ms, + 500, + 500, + "after 10 minutes the ramp up should allow 750 writes, write 500"); + safeToProceedAndWithBudgetAndWrite(qos, t10m00s001ms, 250, 250, "write 250 more"); + unsafeToProceed(qos, t10m00s002ms); + safeToProceedAndWithBudgetAndWrite( + qos, + t15m00s000ms, + 500, + 500, + "after 15 minutes the ramp up should allow 1,125 writes, write 500"); + safeToProceedAndWithBudgetAndWrite(qos, t15m00s000ms, 500, 500, "write 500 more"); + safeToProceedAndWithBudgetAndWrite(qos, t15m00s000ms, 125, 125, "write 125 more"); + unsafeToProceed(qos, t15m00s000ms); + safeToProceedAndWithBudgetAndWrite( + qos, + t20m00s000ms, + 500, + 500, + "after 20 minutes the ramp up should allow 1,687 
writes, write 500"); + safeToProceedAndWithBudgetAndWrite(qos, t20m00s000ms, 500, 500, "write 500 more"); + safeToProceedAndWithBudgetAndWrite(qos, t20m00s000ms, 500, 500, "write 500 more"); + safeToProceedAndWithBudgetAndWrite(qos, t20m00s000ms, 187, 187, "write 125 more"); + unsafeToProceed(qos, t20m00s000ms); + safeToProceedAndWithBudgetAndWrite( + qos, t20m01s000ms, 500, 500, "wait 1 second for the budget to refill, write 500"); + safeToProceedAndWithBudgetAndWrite(qos, t20m01s000ms, 500, 500, "write 500 more"); + safeToProceedAndWithBudgetAndWrite(qos, t20m01s000ms, 500, 500, "write 500 more"); + safeToProceedAndWithBudgetAndWrite(qos, t20m01s000ms, 187, 187, "write 125 more"); + unsafeToProceed(qos, t20m01s000ms); + } + + private static Instant t(Duration d) { + return Instant.ofEpochMilli(d.getMillis()); + } + + private static Instant t(Duration... ds) { + Duration sum = Arrays.stream(ds).reduce(Duration.ZERO, Duration::plus, Duration::plus); + return t(sum); + } + + private static Duration minutes(int i) { + return Duration.standardMinutes(i); + } + + private static Duration seconds(int i) { + return Duration.standardSeconds(i); + } + + private void unsafeToProceed(RpcQosImpl qos, Instant t) throws InterruptedException { + RpcWriteAttemptImpl attempt = qos.newWriteAttempt(rpcAttemptContext); + assertFalse( + msg("verify budget depleted", t, "awaitSafeToProceed was true, expected false"), + attempt.awaitSafeToProceed(t)); + } + + private void safeToProceedAndWithBudgetAndWrite( + RpcQosImpl qos, Instant t, int expectedBatchMaxCount, int writeCount, String description) + throws InterruptedException { + RpcWriteAttemptImpl attempt = qos.newWriteAttempt(rpcAttemptContext); + assertTrue( + msg(description, t, "awaitSafeToProceed was false, expected true"), + attempt.awaitSafeToProceed(t)); + FlushBufferImpl> buffer = attempt.newFlushBuffer(t); + assertEquals( + msg(description, t, "unexpected batchMaxCount"), + expectedBatchMaxCount, + buffer.nextBatchMaxCount); + attempt.recordRequestStart(t, writeCount); + attempt.recordWriteCounts(t, writeCount, 0); + } + + private static String msg(String description, Instant t, String message) { + return String.format("[%s @ t = %s] %s", description, t, message); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/RpcQosTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/RpcQosTest.java new file mode 100644 index 000000000000..e1e130614728 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/RpcQosTest.java @@ -0,0 +1,763 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.firestore; + +import static org.apache.beam.sdk.io.gcp.firestore.FirestoreProtoHelpers.newWrite; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists.newArrayList; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertSame; +import static org.junit.Assert.assertTrue; +import static org.junit.Assert.fail; +import static org.mockito.ArgumentMatchers.any; +import static org.mockito.ArgumentMatchers.anyLong; +import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.times; +import static org.mockito.Mockito.verify; +import static org.mockito.Mockito.when; + +import com.google.api.gax.grpc.GrpcStatusCode; +import com.google.api.gax.rpc.ApiException; +import com.google.api.gax.rpc.ApiExceptionFactory; +import com.google.firestore.v1.Write; +import com.google.rpc.Code; +import edu.umd.cs.findbugs.annotations.Nullable; +import io.grpc.Status; +import java.io.IOException; +import java.net.SocketTimeoutException; +import java.util.Collections; +import java.util.List; +import java.util.Objects; +import java.util.Random; +import java.util.function.Consumer; +import java.util.stream.Collectors; +import java.util.stream.IntStream; +import java.util.stream.StreamSupport; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1WriteFn.WriteElement; +import org.apache.beam.sdk.io.gcp.firestore.RpcQos.RpcAttempt.Context; +import org.apache.beam.sdk.io.gcp.firestore.RpcQos.RpcReadAttempt; +import org.apache.beam.sdk.io.gcp.firestore.RpcQos.RpcWriteAttempt; +import org.apache.beam.sdk.io.gcp.firestore.RpcQos.RpcWriteAttempt.Element; +import org.apache.beam.sdk.io.gcp.firestore.RpcQos.RpcWriteAttempt.FlushBuffer; +import org.apache.beam.sdk.io.gcp.firestore.RpcQosImpl.FlushBufferImpl; +import org.apache.beam.sdk.io.gcp.firestore.RpcQosImpl.RpcWriteAttemptImpl; +import org.apache.beam.sdk.io.gcp.firestore.RpcQosImpl.StatusCodeAwareBackoff; +import org.apache.beam.sdk.io.gcp.firestore.RpcQosImpl.StatusCodeAwareBackoff.BackoffDuration; +import org.apache.beam.sdk.io.gcp.firestore.RpcQosImpl.StatusCodeAwareBackoff.BackoffResult; +import org.apache.beam.sdk.io.gcp.firestore.RpcQosImpl.StatusCodeAwareBackoff.BackoffResults; +import org.apache.beam.sdk.io.gcp.firestore.RpcQosImpl.WriteRampUp; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.Distribution; +import org.apache.beam.sdk.transforms.windowing.BoundedWindow; +import org.apache.beam.sdk.util.Sleeper; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; +import org.joda.time.Duration; +import org.joda.time.Instant; +import org.junit.Before; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.mockito.Mock; +import org.mockito.junit.MockitoJUnitRunner; + +@RunWith(MockitoJUnitRunner.class) +@SuppressWarnings( + "initialization.fields.uninitialized") // mockito fields are initialized via the Mockito Runner +public final class RpcQosTest { + private static final ApiException RETRYABLE_ERROR = + ApiExceptionFactory.createException( + new SocketTimeoutException("retryableError"), + GrpcStatusCode.of(Status.Code.CANCELLED), + true); + private static final ApiException NON_RETRYABLE_ERROR = + ApiExceptionFactory.createException( + new IOException("nonRetryableError"), + GrpcStatusCode.of(Status.Code.FAILED_PRECONDITION), + false); + private static final ApiException RETRYABLE_ERROR_WITH_NON_RETRYABLE_CODE = + 
ApiExceptionFactory.createException( + new SocketTimeoutException("retryableError"), + GrpcStatusCode.of(Status.Code.INVALID_ARGUMENT), + true); + + private static final Context RPC_ATTEMPT_CONTEXT = RpcQosTest.class::getName; + + @Mock(lenient = true) + private Sleeper sleeper; + + @Mock(lenient = true) + private CounterFactory counterFactory; + + @Mock(lenient = true) + private DistributionFactory distributionFactory; + + @Mock(lenient = true) + private Counter counterThrottlingMs; + + @Mock(lenient = true) + private Counter counterRpcFailures; + + @Mock(lenient = true) + private Counter counterRpcSuccesses; + + @Mock(lenient = true) + private Counter counterRpcStreamValueReceived; + + @Mock(lenient = true) + private BoundedWindow window; + + // A clock that increments one from the epoch each time it's called + private final JodaClock monotonicClock = + new JodaClock() { + private long counter = 0; + + @Override + public Instant instant() { + return Instant.ofEpochMilli(counter++); + } + }; + // should not be static, important to reinitialize for each test + private final Random random = + new Random(1234567890); // fix the seed so we have deterministic tests + + private RpcQosOptions options; + + @Before + public void setUp() { + when(counterFactory.get(RPC_ATTEMPT_CONTEXT.getNamespace(), "throttlingMs")) + .thenReturn(counterThrottlingMs); + when(counterFactory.get(RPC_ATTEMPT_CONTEXT.getNamespace(), "rpc_failures")) + .thenReturn(counterRpcFailures); + when(counterFactory.get(RPC_ATTEMPT_CONTEXT.getNamespace(), "rpc_successes")) + .thenReturn(counterRpcSuccesses); + when(counterFactory.get(RPC_ATTEMPT_CONTEXT.getNamespace(), "rpc_streamValueReceived")) + .thenReturn(counterRpcStreamValueReceived); + when(distributionFactory.get(any(), any())) + .thenAnswer( + invocation -> mock(Distribution.class, invocation.getArgument(1, String.class))); + + // init here after mocks have been initialized + options = + RpcQosOptions.defaultOptions() + .toBuilder() + .withInitialBackoff(Duration.millis(1)) + .withSamplePeriod(Duration.millis(100)) + .withSamplePeriodBucketSize(Duration.millis(10)) + .withOverloadRatio(2.0) + .withThrottleDuration(Duration.millis(50)) + .withHintMaxNumWorkers(1) + .unsafeBuild(); + } + + @Test + public void reads_processedWhenNoErrors() throws InterruptedException { + + RpcQos qos = new RpcQosImpl(options, random, sleeper, counterFactory, distributionFactory); + + int numSuccesses = 100; + int numStreamElements = 25; + // record enough successful requests to fill up the sample period + for (int i = 0; i < numSuccesses; i++) { + RpcReadAttempt readAttempt = qos.newReadAttempt(RPC_ATTEMPT_CONTEXT); + Instant start = monotonicClock.instant(); + assertTrue(readAttempt.awaitSafeToProceed(start)); + for (int j = 0; j < numStreamElements; j++) { + readAttempt.recordStreamValue(monotonicClock.instant()); + } + readAttempt.recordRequestStart(monotonicClock.instant()); + readAttempt.recordRequestSuccessful(monotonicClock.instant()); + } + + verify(sleeper, times(0)).sleep(anyLong()); + verify(counterThrottlingMs, times(0)).inc(anyLong()); + verify(counterRpcFailures, times(0)).inc(); + verify(counterRpcSuccesses, times(numSuccesses)).inc(); + verify(counterRpcStreamValueReceived, times(numSuccesses * numStreamElements)).inc(); + } + + @Test + public void reads_blockWhenNotSafeToProceed() throws InterruptedException { + + RpcQos qos = new RpcQosImpl(options, random, sleeper, counterFactory, distributionFactory); + + // Based on the defined options, 3 failures is the upper bound 
before the next attempt + // will have to wait before proceeding + int numFailures = 3; + for (int i = 0; i < numFailures; i++) { + RpcReadAttempt readAttempt = qos.newReadAttempt(RPC_ATTEMPT_CONTEXT); + Instant start = monotonicClock.instant(); + assertTrue(readAttempt.awaitSafeToProceed(start)); + Instant end = monotonicClock.instant(); + readAttempt.recordRequestStart(start); + readAttempt.recordRequestFailed(end); + } + + RpcReadAttempt readAttempt2 = qos.newReadAttempt(RPC_ATTEMPT_CONTEXT); + assertFalse(readAttempt2.awaitSafeToProceed(monotonicClock.instant())); + + long sleepMillis = options.getInitialBackoff().getMillis(); + + verify(sleeper, times(0)).sleep(sleepMillis); + verify(counterThrottlingMs, times(0)).inc(sleepMillis); + verify(counterRpcFailures, times(numFailures)).inc(); + verify(counterRpcSuccesses, times(0)).inc(); + verify(counterRpcStreamValueReceived, times(0)).inc(); + } + + @Test + public void writes_blockWhenNotSafeToProceed() throws InterruptedException { + + RpcQos qos = new RpcQosImpl(options, random, sleeper, counterFactory, distributionFactory); + + // Based on the defined options, 3 failures is the upper bound before the next attempt + // will have to wait before proceeding + int numFailures = 3; + for (int i = 0; i < numFailures; i++) { + RpcWriteAttempt writeAttempt = qos.newWriteAttempt(RPC_ATTEMPT_CONTEXT); + Instant start = monotonicClock.instant(); + assertTrue(writeAttempt.awaitSafeToProceed(start)); + writeAttempt.recordRequestStart(start, 1); + Instant end = monotonicClock.instant(); + writeAttempt.recordWriteCounts(end, 0, 1); + writeAttempt.recordRequestFailed(end); + } + + RpcWriteAttempt writeAttempt2 = qos.newWriteAttempt(RPC_ATTEMPT_CONTEXT); + assertFalse(writeAttempt2.awaitSafeToProceed(monotonicClock.instant())); + + long sleepMillis = options.getInitialBackoff().getMillis(); + + verify(sleeper, times(0)).sleep(sleepMillis); + verify(counterThrottlingMs, times(0)).inc(sleepMillis); + verify(counterRpcFailures, times(numFailures)).inc(); + verify(counterRpcSuccesses, times(0)).inc(); + verify(counterRpcStreamValueReceived, times(0)).inc(); + } + + @Test + public void writes_shouldFlush_numWritesHigherThanBatchCount_newTimeBucket_lteq0() { + doTest_writes_shouldFlush_numWritesHigherThanBatchCount_newTimeBucket(false, 0); + } + + @Test + public void writes_shouldFlush_numWritesHigherThanBatchCount_newTimeBucket_lt() { + doTest_writes_shouldFlush_numWritesHigherThanBatchCount_newTimeBucket(false, 9); + } + + @Test + public void writes_shouldFlush_numWritesHigherThanBatchCount_newTimeBucket_eq() { + doTest_writes_shouldFlush_numWritesHigherThanBatchCount_newTimeBucket(true, 10); + } + + @Test + public void writes_shouldFlush_numWritesHigherThanBatchCount_newTimeBucket_gt() { + doTest_writes_shouldFlush_numWritesHigherThanBatchCount_newTimeBucket(true, 11); + } + + @Test + public void writes_shouldFlush_numWritesHigherThanBatchCount_existingTimeBucket_lteq0() { + doTest_writes_shouldFlush_numWritesHigherThanBatchCount_existingTimeBucket(false, 0); + } + + @Test + public void writes_shouldFlush_numWritesHigherThanBatchCount_existingTimeBucket_lt() { + doTest_writes_shouldFlush_numWritesHigherThanBatchCount_existingTimeBucket(false, 9); + } + + @Test + public void writes_shouldFlush_numWritesHigherThanBatchCount_existingTimeBucket_eq() { + doTest_writes_shouldFlush_numWritesHigherThanBatchCount_existingTimeBucket(true, 10); + } + + @Test + public void writes_shouldFlush_numWritesHigherThanBatchCount_existingTimeBucket_gt() { + 
doTest_writes_shouldFlush_numWritesHigherThanBatchCount_existingTimeBucket(true, 11); + } + + @Test + public void writes_shouldFlush_numBytes_lt() { + doTest_writes_shouldFlush_numBytes(false, 2999); + } + + @Test + public void writes_shouldFlush_numBytes_eq() { + doTest_writes_shouldFlush_numBytes(true, 3000); + } + + @Test + public void attemptsExhaustCorrectly() throws InterruptedException { + + RpcQosOptions rpcQosOptions = options.toBuilder().withMaxAttempts(3).unsafeBuild(); + RpcQos qos = + new RpcQosImpl(rpcQosOptions, random, sleeper, counterFactory, distributionFactory); + + RpcReadAttempt readAttempt = qos.newReadAttempt(RPC_ATTEMPT_CONTEXT); + // try 1 + readAttempt.recordRequestStart(monotonicClock.instant()); + readAttempt.recordRequestFailed(monotonicClock.instant()); + readAttempt.checkCanRetry(monotonicClock.instant(), RETRYABLE_ERROR); + // try 2 + readAttempt.recordRequestStart(monotonicClock.instant()); + readAttempt.recordRequestFailed(monotonicClock.instant()); + readAttempt.checkCanRetry(monotonicClock.instant(), RETRYABLE_ERROR); + // try 3 + readAttempt.recordRequestStart(monotonicClock.instant()); + readAttempt.recordRequestFailed(monotonicClock.instant()); + try { + readAttempt.checkCanRetry(monotonicClock.instant(), RETRYABLE_ERROR); + fail("expected retry to be exhausted after third attempt"); + } catch (ApiException e) { + assertSame(e, RETRYABLE_ERROR); + } + + verify(counterThrottlingMs, times(0)).inc(anyLong()); + verify(counterRpcFailures, times(3)).inc(); + verify(counterRpcSuccesses, times(0)).inc(); + verify(counterRpcStreamValueReceived, times(0)).inc(); + } + + @Test + public void attemptThrowsOnNonRetryableError() throws InterruptedException { + + RpcQosOptions rpcQosOptions = options.toBuilder().withMaxAttempts(3).unsafeBuild(); + RpcQos qos = + new RpcQosImpl(rpcQosOptions, random, sleeper, counterFactory, distributionFactory); + + RpcReadAttempt readAttempt = qos.newReadAttempt(RPC_ATTEMPT_CONTEXT); + readAttempt.recordRequestStart(monotonicClock.instant()); + // try 1 + readAttempt.recordRequestFailed(monotonicClock.instant()); + try { + readAttempt.checkCanRetry(monotonicClock.instant(), NON_RETRYABLE_ERROR); + fail("expected non-retryable error to throw error on first occurrence"); + } catch (ApiException e) { + assertSame(e, NON_RETRYABLE_ERROR); + } + + verify(counterThrottlingMs, times(0)).inc(anyLong()); + verify(counterRpcFailures, times(1)).inc(); + verify(counterRpcSuccesses, times(0)).inc(); + verify(counterRpcStreamValueReceived, times(0)).inc(); + } + + @Test + public void attemptThrowsOnNonRetryableErrorCode() throws InterruptedException { + + RpcQosOptions rpcQosOptions = options.toBuilder().withMaxAttempts(3).unsafeBuild(); + RpcQos qos = + new RpcQosImpl(rpcQosOptions, random, sleeper, counterFactory, distributionFactory); + + RpcReadAttempt readAttempt = qos.newReadAttempt(RPC_ATTEMPT_CONTEXT); + readAttempt.recordRequestStart(monotonicClock.instant()); + // try 1 + readAttempt.recordRequestFailed(monotonicClock.instant()); + try { + readAttempt.checkCanRetry(monotonicClock.instant(), RETRYABLE_ERROR_WITH_NON_RETRYABLE_CODE); + fail("expected non-retryable error to throw error on first occurrence"); + } catch (ApiException e) { + assertSame(e, RETRYABLE_ERROR_WITH_NON_RETRYABLE_CODE); + } + + verify(counterThrottlingMs, times(0)).inc(anyLong()); + verify(counterRpcFailures, times(1)).inc(); + verify(counterRpcSuccesses, times(0)).inc(); + verify(counterRpcStreamValueReceived, times(0)).inc(); + } + + @Test + public void 
attemptEnforcesActiveStateToPerformOperations_maxAttemptsExhausted() + throws InterruptedException { + RpcQosOptions rpcQosOptions = options.toBuilder().withMaxAttempts(1).unsafeBuild(); + RpcQos qos = + new RpcQosImpl(rpcQosOptions, random, sleeper, counterFactory, distributionFactory); + + RpcReadAttempt readAttempt = qos.newReadAttempt(RPC_ATTEMPT_CONTEXT); + + readAttempt.recordRequestStart(monotonicClock.instant()); + readAttempt.recordRequestFailed(monotonicClock.instant()); + try { + readAttempt.checkCanRetry(monotonicClock.instant(), RETRYABLE_ERROR); + fail("expected error to be re-thrown due to max attempts exhaustion"); + } catch (ApiException e) { + // expected + } + + try { + readAttempt.recordStreamValue(monotonicClock.instant()); + fail("expected IllegalStateException due to attempt being in terminal state"); + } catch (IllegalStateException e) { + // expected + } + + verify(sleeper, times(0)) + .sleep(anyLong()); // happens in checkCanRetry when the backoff is checked + verify(counterThrottlingMs, times(0)).inc(anyLong()); + verify(counterRpcFailures, times(1)).inc(); + verify(counterRpcSuccesses, times(0)).inc(); + verify(counterRpcStreamValueReceived, times(0)).inc(); + } + + @Test + public void attemptEnforcesActiveStateToPerformOperations_successful() + throws InterruptedException { + RpcQosOptions rpcQosOptions = options.toBuilder().withMaxAttempts(1).unsafeBuild(); + RpcQos qos = + new RpcQosImpl(rpcQosOptions, random, sleeper, counterFactory, distributionFactory); + + RpcReadAttempt readAttempt = qos.newReadAttempt(RPC_ATTEMPT_CONTEXT); + + readAttempt.recordRequestStart(monotonicClock.instant()); + readAttempt.recordRequestSuccessful(monotonicClock.instant()); + readAttempt.completeSuccess(); + try { + readAttempt.recordStreamValue(monotonicClock.instant()); + fail("expected IllegalStateException due to attempt being in terminal state"); + } catch (IllegalStateException e) { + // expected + } + + verify(sleeper, times(0)) + .sleep(anyLong()); // happens in checkCanRetry when the backoff is checked + verify(counterThrottlingMs, times(0)).inc(anyLong()); + verify(counterRpcFailures, times(0)).inc(); + verify(counterRpcSuccesses, times(1)).inc(); + verify(counterRpcStreamValueReceived, times(0)).inc(); + } + + @Test + public void offerOfElementWhichWouldCrossMaxBytesReturnFalse() { + RpcQosOptions rpcQosOptions = options.toBuilder().withBatchMaxBytes(5000).unsafeBuild(); + RpcQos qos = + new RpcQosImpl(rpcQosOptions, random, sleeper, counterFactory, distributionFactory); + + RpcWriteAttempt attempt = qos.newWriteAttempt(RPC_ATTEMPT_CONTEXT); + FlushBuffer> accumulator = attempt.newFlushBuffer(monotonicClock.instant()); + assertFalse(accumulator.offer(new FixedSerializationSize<>(newWrite(), 5001))); + + assertFalse(accumulator.isFull()); + assertEquals(0, accumulator.getBufferedElementsBytes()); + assertEquals(0, accumulator.getBufferedElementsCount()); + } + + @Test + public void flushBuffer_doesNotErrorWhenMaxIsOne() { + FlushBufferImpl> buffer = new FlushBufferImpl<>(1, 1000); + assertTrue(buffer.offer(new FixedSerializationSize<>("a", 1))); + assertFalse(buffer.offer(new FixedSerializationSize<>("b", 1))); + assertEquals(1, buffer.getBufferedElementsCount()); + } + + @Test + public void flushBuffer_doesNotErrorWhenMaxIsZero() { + FlushBufferImpl> buffer = new FlushBufferImpl<>(0, 1000); + assertFalse(buffer.offer(new FixedSerializationSize<>("a", 1))); + assertEquals(0, buffer.getBufferedElementsCount()); + assertFalse(buffer.isFull()); + 
assertFalse(buffer.isNonEmpty()); + } + + @Test + public void rampUp_calcForWorkerCount1() { + Instant t0 = Instant.ofEpochMilli(0); + Instant t5 = Instant.ofEpochSecond(60 * 5); + Instant t10 = Instant.ofEpochSecond(60 * 10); + Instant t15 = Instant.ofEpochSecond(60 * 15); + Instant t90 = Instant.ofEpochSecond(60 * 90); + + WriteRampUp tracker = new WriteRampUp(500, distributionFactory); + assertEquals(500, tracker.getAvailableWriteCountBudget(t0)); + assertEquals(500, tracker.getAvailableWriteCountBudget(t5)); + assertEquals(750, tracker.getAvailableWriteCountBudget(t10)); + assertEquals(1_125, tracker.getAvailableWriteCountBudget(t15)); + assertEquals(492_630, tracker.getAvailableWriteCountBudget(t90)); + } + + @Test + public void rampUp_calcForWorkerCount100() { + Instant t0 = Instant.ofEpochMilli(0); + Instant t5 = Instant.ofEpochSecond(60 * 5); + Instant t10 = Instant.ofEpochSecond(60 * 10); + Instant t15 = Instant.ofEpochSecond(60 * 15); + Instant t90 = Instant.ofEpochSecond(60 * 90); + + WriteRampUp tracker = new WriteRampUp(5, distributionFactory); + assertEquals(5, tracker.getAvailableWriteCountBudget(t0)); + assertEquals(5, tracker.getAvailableWriteCountBudget(t5)); + assertEquals(7, tracker.getAvailableWriteCountBudget(t10)); + assertEquals(11, tracker.getAvailableWriteCountBudget(t15)); + assertEquals(4_926, tracker.getAvailableWriteCountBudget(t90)); + } + + @Test + public void rampUp_calcForWorkerCount1000() { + Instant t0 = Instant.ofEpochMilli(0); + Instant t5 = Instant.ofEpochSecond(60 * 5); + Instant t10 = Instant.ofEpochSecond(60 * 10); + Instant t15 = Instant.ofEpochSecond(60 * 15); + Instant t90 = Instant.ofEpochSecond(60 * 90); + + WriteRampUp tracker = new WriteRampUp(1, distributionFactory); + assertEquals(1, tracker.getAvailableWriteCountBudget(t0)); + assertEquals(1, tracker.getAvailableWriteCountBudget(t5)); + assertEquals(1, tracker.getAvailableWriteCountBudget(t10)); + assertEquals(2, tracker.getAvailableWriteCountBudget(t15)); + assertEquals(985, tracker.getAvailableWriteCountBudget(t90)); + } + + @Test + public void rampUp_calcFor90Minutes() { + int increment = 5; + // 500 * 1.5^max(0, (x-5)/5) + List expected = + from0To90By(increment) + .map(x -> (int) (500 * Math.pow(1.5, Math.max(0, (x - increment) / increment)))) + .boxed() + .collect(Collectors.toList()); + + WriteRampUp tracker = new WriteRampUp(500, distributionFactory); + List actual = + from0To90By(increment) + .mapToObj(i -> Instant.ofEpochSecond(60 * i)) + .map(tracker::getAvailableWriteCountBudget) + .collect(Collectors.toList()); + + assertEquals(expected, actual); + } + + @Test + public void initialBatchSizeRelativeToWorkerCount_10000() { + doTest_initialBatchSizeRelativeToWorkerCount(10000, 1); + } + + @Test + public void initialBatchSizeRelativeToWorkerCount_1000() { + doTest_initialBatchSizeRelativeToWorkerCount(1000, 1); + } + + @Test + public void initialBatchSizeRelativeToWorkerCount_100() { + doTest_initialBatchSizeRelativeToWorkerCount(100, 5); + } + + @Test + public void initialBatchSizeRelativeToWorkerCount_10() { + doTest_initialBatchSizeRelativeToWorkerCount(10, 50); + } + + @Test + public void initialBatchSizeRelativeToWorkerCount_1() { + doTest_initialBatchSizeRelativeToWorkerCount(1, 500); + } + + @Test + public void isCodeRetryable() { + doTest_isCodeRetryable(Code.ABORTED, true); + doTest_isCodeRetryable(Code.ALREADY_EXISTS, false); + doTest_isCodeRetryable(Code.CANCELLED, true); + doTest_isCodeRetryable(Code.DATA_LOSS, false); + doTest_isCodeRetryable(Code.DEADLINE_EXCEEDED, 
true); + doTest_isCodeRetryable(Code.FAILED_PRECONDITION, false); + doTest_isCodeRetryable(Code.INTERNAL, true); + doTest_isCodeRetryable(Code.INVALID_ARGUMENT, false); + doTest_isCodeRetryable(Code.NOT_FOUND, false); + doTest_isCodeRetryable(Code.OK, true); + doTest_isCodeRetryable(Code.OUT_OF_RANGE, false); + doTest_isCodeRetryable(Code.PERMISSION_DENIED, false); + doTest_isCodeRetryable(Code.RESOURCE_EXHAUSTED, true); + doTest_isCodeRetryable(Code.UNAUTHENTICATED, true); + doTest_isCodeRetryable(Code.UNAVAILABLE, true); + doTest_isCodeRetryable(Code.UNIMPLEMENTED, false); + doTest_isCodeRetryable(Code.UNKNOWN, true); + } + + @Test + public void statusCodeAwareBackoff_graceCodeBackoffWithin60sec() { + + StatusCodeAwareBackoff backoff = + new StatusCodeAwareBackoff( + random, 5, Duration.standardSeconds(5), ImmutableSet.of(Code.UNAVAILABLE_VALUE)); + + BackoffResult backoffResult1 = + backoff.nextBackoff(Instant.ofEpochMilli(1), Code.UNAVAILABLE_VALUE); + assertEquals(BackoffResults.NONE, backoffResult1); + + BackoffResult backoffResult2 = + backoff.nextBackoff(Instant.ofEpochMilli(2), Code.UNAVAILABLE_VALUE); + assertEquals(new BackoffDuration(Duration.millis(6_091)), backoffResult2); + + BackoffResult backoffResult3 = + backoff.nextBackoff(Instant.ofEpochMilli(60_100), Code.UNAVAILABLE_VALUE); + assertEquals(BackoffResults.NONE, backoffResult3); + } + + @Test + public void statusCodeAwareBackoff_exhausted_attemptCount() { + + StatusCodeAwareBackoff backoff = + new StatusCodeAwareBackoff(random, 1, Duration.standardSeconds(5), Collections.emptySet()); + + BackoffResult backoffResult1 = + backoff.nextBackoff(Instant.ofEpochMilli(1), Code.UNAVAILABLE_VALUE); + assertEquals(BackoffResults.EXHAUSTED, backoffResult1); + } + + @Test + public void statusCodeAwareBackoff_exhausted_cumulativeBackoff() { + + StatusCodeAwareBackoff backoff = + new StatusCodeAwareBackoff(random, 3, Duration.standardSeconds(60), Collections.emptySet()); + + BackoffDuration backoff60Sec = new BackoffDuration(Duration.standardMinutes(1)); + BackoffResult backoffResult1 = + backoff.nextBackoff(Instant.ofEpochMilli(1), Code.DEADLINE_EXCEEDED_VALUE); + assertEquals(backoff60Sec, backoffResult1); + + BackoffResult backoffResult2 = + backoff.nextBackoff(Instant.ofEpochMilli(2), Code.DEADLINE_EXCEEDED_VALUE); + assertEquals(BackoffResults.EXHAUSTED, backoffResult2); + } + + private IntStream from0To90By(int increment) { + return IntStream.iterate(0, i -> i + increment).limit((90 / increment) + 1); + } + + private void doTest_writes_shouldFlush_numWritesHigherThanBatchCount_newTimeBucket( + boolean expectFlush, int batchCount) { + doTest_shouldFlush_numWritesHigherThanBatchCount(expectFlush, batchCount, qos -> {}); + } + + private void doTest_writes_shouldFlush_numWritesHigherThanBatchCount_existingTimeBucket( + boolean expectFlush, int batchCount) { + doTest_shouldFlush_numWritesHigherThanBatchCount( + expectFlush, + batchCount, + (qos) -> { + RpcWriteAttempt attempt = qos.newWriteAttempt(RPC_ATTEMPT_CONTEXT); + attempt.recordRequestStart(monotonicClock.instant(), 1); + attempt.recordWriteCounts(monotonicClock.instant(), 1, 0); + }); + } + + private void doTest_shouldFlush_numWritesHigherThanBatchCount( + boolean expectFlush, int batchCount, Consumer preAttempt) { + RpcQosOptions rpcQosOptions = + options.toBuilder().withBatchInitialCount(10).withBatchMaxCount(10).unsafeBuild(); + RpcQos qos = + new RpcQosImpl(rpcQosOptions, random, sleeper, counterFactory, distributionFactory); + + preAttempt.accept(qos); + + 
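// With batchInitialCount and batchMaxCount both set to 10 above, the buffer should only report full once at least 10 writes have been offered. + 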
RpcWriteAttempt attempt = qos.newWriteAttempt(RPC_ATTEMPT_CONTEXT); + FlushBuffer> accumulator = attempt.newFlushBuffer(monotonicClock.instant()); + for (int i = 0; i < batchCount; i++) { + accumulator.offer(new WriteElement(i, newWrite(), window)); + } + + if (expectFlush) { + assertTrue(accumulator.isFull()); + assertEquals(10, accumulator.getBufferedElementsCount()); + } else { + assertFalse(accumulator.isFull()); + assertEquals(batchCount, accumulator.getBufferedElementsCount()); + } + } + + private void doTest_writes_shouldFlush_numBytes(boolean expectFlush, long numBytes) { + RpcQosOptions rpcQosOptions = options.toBuilder().withBatchMaxBytes(3000).unsafeBuild(); + RpcQos qos = + new RpcQosImpl(rpcQosOptions, random, sleeper, counterFactory, distributionFactory); + + RpcWriteAttempt attempt = qos.newWriteAttempt(RPC_ATTEMPT_CONTEXT); + FlushBuffer> accumulator = attempt.newFlushBuffer(monotonicClock.instant()); + assertTrue(accumulator.offer(new FixedSerializationSize<>(newWrite(), numBytes))); + + assertEquals(expectFlush, accumulator.isFull()); + assertEquals(numBytes, accumulator.getBufferedElementsBytes()); + assertEquals( + newArrayList(newWrite()), + StreamSupport.stream(accumulator.spliterator(), false) + .map(Element::getValue) + .collect(Collectors.toList())); + } + + private void doTest_initialBatchSizeRelativeToWorkerCount( + int hintWorkerCount, int expectedBatchMaxCount) { + RpcQosOptions options = + RpcQosOptions.newBuilder() + .withHintMaxNumWorkers(hintWorkerCount) + .withBatchInitialCount(500) + .build(); + RpcQosImpl qos = new RpcQosImpl(options, random, sleeper, counterFactory, distributionFactory); + RpcWriteAttemptImpl attempt = qos.newWriteAttempt(RPC_ATTEMPT_CONTEXT); + FlushBufferImpl> buffer = attempt.newFlushBuffer(Instant.EPOCH); + assertEquals(expectedBatchMaxCount, buffer.nextBatchMaxCount); + } + + private void doTest_isCodeRetryable(Code code, boolean shouldBeRetryable) { + RpcQosOptions options = RpcQosOptions.defaultOptions(); + RpcQosImpl qos = new RpcQosImpl(options, random, sleeper, counterFactory, distributionFactory); + RpcWriteAttemptImpl attempt = qos.newWriteAttempt(RPC_ATTEMPT_CONTEXT); + assertEquals(shouldBeRetryable, attempt.isCodeRetryable(code)); + } + + private static final class FixedSerializationSize implements Element { + private final long serializedSize; + private final T write; + + public FixedSerializationSize(T write, long serializedSize) { + this.write = write; + this.serializedSize = serializedSize; + } + + @Override + public T getValue() { + return write; + } + + @Override + public long getSerializedSize() { + return serializedSize; + } + + @Override + public String toString() { + return "FixedSerializationSize{" + + "serializedSize=" + + serializedSize + + ", write=" + + write + + '}'; + } + + @Override + public boolean equals(@Nullable Object o) { + if (this == o) { + return true; + } + if (!(o instanceof FixedSerializationSize)) { + return false; + } + FixedSerializationSize that = (FixedSerializationSize) o; + return serializedSize == that.serializedSize && Objects.equals(write, that.write); + } + + @Override + public int hashCode() { + return Objects.hash(serializedSize, write); + } + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/it/BaseFirestoreIT.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/it/BaseFirestoreIT.java new file mode 100644 index 000000000000..94e1bfad9b04 --- /dev/null +++ 
b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/it/BaseFirestoreIT.java @@ -0,0 +1,329 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.firestore.it; + +import static org.apache.beam.sdk.io.gcp.firestore.it.FirestoreTestingHelper.assumeEnvVarSet; +import static org.apache.beam.sdk.io.gcp.firestore.it.FirestoreTestingHelper.chunkUpDocIds; +import static org.hamcrest.Matchers.equalTo; +import static org.junit.Assert.assertEquals; +import static org.junit.Assume.assumeThat; + +import com.google.api.core.ApiFutures; +import com.google.cloud.firestore.WriteBatch; +import com.google.firestore.v1.BatchGetDocumentsRequest; +import com.google.firestore.v1.BatchGetDocumentsResponse; +import com.google.firestore.v1.Document; +import com.google.firestore.v1.ListCollectionIdsRequest; +import com.google.firestore.v1.ListDocumentsRequest; +import com.google.firestore.v1.PartitionQueryRequest; +import com.google.firestore.v1.RunQueryRequest; +import com.google.firestore.v1.RunQueryResponse; +import com.google.firestore.v1.Write; +import java.util.Collections; +import java.util.List; +import java.util.UUID; +import java.util.concurrent.TimeUnit; +import java.util.stream.Collectors; +import java.util.stream.IntStream; +import org.apache.beam.sdk.extensions.gcp.options.GcpOptions; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreIO; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreOptions; +import org.apache.beam.sdk.io.gcp.firestore.RpcQosOptions; +import org.apache.beam.sdk.io.gcp.firestore.it.FirestoreTestingHelper.CleanupMode; +import org.apache.beam.sdk.io.gcp.firestore.it.FirestoreTestingHelper.DataLayout; +import org.apache.beam.sdk.io.gcp.firestore.it.FirestoreTestingHelper.DocumentGenerator; +import org.apache.beam.sdk.io.gcp.firestore.it.FirestoreTestingHelper.TestDataLayoutHint; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.Filter; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.MoreExecutors; +import org.junit.AssumptionViolatedException; +import org.junit.Before; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.TestName; +import org.junit.rules.Timeout; + +@SuppressWarnings({ + "initialization.fields.uninitialized", + "initialization.static.fields.uninitialized" +}) 
// fields are managed via #beforeClass & #setup +abstract class BaseFirestoreIT { + + protected static final int NUM_ITEMS_TO_GENERATE = + 768; // more than one read page and one write page + private static final String ENV_GOOGLE_APPLICATION_CREDENTIALS = "GOOGLE_APPLICATION_CREDENTIALS"; + private static final String ENV_FIRESTORE_EMULATOR_HOST = "FIRESTORE_EMULATOR_HOST"; + private static final String ENV_GOOGLE_CLOUD_PROJECT = "GOOGLE_CLOUD_PROJECT"; + + @Rule(order = 1) + public final TestName testName = new TestName(); + + @Rule( + order = + 2) // ensure our helper is "outer" to the timeout so we are allowed to cleanup even if a + // test times out + public final FirestoreTestingHelper helper = new FirestoreTestingHelper(CleanupMode.ALWAYS); + + @Rule(order = 3) + public final Timeout timeout = new Timeout(5, TimeUnit.MINUTES); + + @Rule(order = 4) + public final TestPipeline testPipeline = TestPipeline.create(); + + protected static String project; + protected static RpcQosOptions rpcQosOptions; + protected GcpOptions options; + + @BeforeClass + public static void beforeClass() { + try { + assumeEnvVarSet(ENV_GOOGLE_APPLICATION_CREDENTIALS); + } catch (AssumptionViolatedException e) { + try { + assumeEnvVarSet(ENV_FIRESTORE_EMULATOR_HOST); + } catch (AssumptionViolatedException exception) { + assumeThat( + String.format( + "Either %s or %s must be set", + ENV_GOOGLE_APPLICATION_CREDENTIALS, ENV_FIRESTORE_EMULATOR_HOST), + false, + equalTo(true)); + } + } + project = assumeEnvVarSet(ENV_GOOGLE_CLOUD_PROJECT); + rpcQosOptions = + RpcQosOptions.defaultOptions() + .toBuilder() + .withMaxAttempts(1) + .withHintMaxNumWorkers(1) + .build(); + } + + @Before + public void setup() { + options = TestPipeline.testingPipelineOptions().as(GcpOptions.class); + String emulatorHostPort = System.getenv(ENV_FIRESTORE_EMULATOR_HOST); + if (emulatorHostPort != null) { + options.as(FirestoreOptions.class).setEmulatorHost(emulatorHostPort); + } + options.setProject(project); + } + + @Test + @TestDataLayoutHint(DataLayout.Deep) + public final void listCollections() throws Exception { + // verification and cleanup of nested collections is much slower because each document + // requires an rpc to find its collections, instead of using the usual size, use 20 + // to keep the test quick + List collectionIds = + IntStream.rangeClosed(1, 20).mapToObj(i -> helper.colId()).collect(Collectors.toList()); + + ApiFutures.transform( + ApiFutures.allAsList( + chunkUpDocIds(collectionIds) + .map( + chunk -> { + WriteBatch batch = helper.getFs().batch(); + chunk.stream() + .map(col -> helper.getBaseDocument().collection(col).document()) + .forEach(ref -> batch.set(ref, ImmutableMap.of("foo", "bar"))); + return batch.commit(); + }) + .collect(Collectors.toList())), + FirestoreTestingHelper.flattenListList(), + MoreExecutors.directExecutor()) + .get(10, TimeUnit.SECONDS); + + PCollection actualCollectionIds = + testPipeline + .apply(Create.of("")) + .apply(getListCollectionIdsPTransform(testName.getMethodName())) + .apply( + FirestoreIO.v1() + .read() + .listCollectionIds() + .withRpcQosOptions(rpcQosOptions) + .build()); + + PAssert.that(actualCollectionIds).containsInAnyOrder(collectionIds); + testPipeline.run(options); + } + + @Test + public final void listDocuments() throws Exception { + DocumentGenerator documentGenerator = helper.documentGenerator(NUM_ITEMS_TO_GENERATE, "a"); + documentGenerator.generateDocuments().get(10, TimeUnit.SECONDS); + + PCollection listDocumentPaths = + testPipeline + .apply(Create.of("a")) + 
.apply(getListDocumentsPTransform(testName.getMethodName())) + .apply(FirestoreIO.v1().read().listDocuments().withRpcQosOptions(rpcQosOptions).build()) + .apply(ParDo.of(new DocumentToName())); + + PAssert.that(listDocumentPaths).containsInAnyOrder(documentGenerator.expectedDocumentPaths()); + testPipeline.run(options); + } + + @Test + public final void runQuery() throws Exception { + String collectionId = "a"; + DocumentGenerator documentGenerator = + helper.documentGenerator(NUM_ITEMS_TO_GENERATE, collectionId, /* addBazDoc = */ true); + documentGenerator.generateDocuments().get(10, TimeUnit.SECONDS); + + PCollection listDocumentPaths = + testPipeline + .apply(Create.of(collectionId)) + .apply(getRunQueryPTransform(testName.getMethodName())) + .apply(FirestoreIO.v1().read().runQuery().withRpcQosOptions(rpcQosOptions).build()) + .apply(ParDo.of(new RunQueryResponseToDocument())) + .apply(ParDo.of(new DocumentToName())); + + PAssert.that(listDocumentPaths).containsInAnyOrder(documentGenerator.expectedDocumentPaths()); + testPipeline.run(options); + } + + @Test + public final void partitionQuery() throws Exception { + String collectionGroupId = UUID.randomUUID().toString(); + // currently firestore will only generate a partition every 128 documents, so generate enough + // documents to get 2 cursors returned, resulting in 3 partitions + int partitionCount = 3; + int documentCount = (partitionCount * 128) - 1; + + // create some documents for listing and asserting in the test + DocumentGenerator documentGenerator = + helper.documentGenerator(documentCount, collectionGroupId); + documentGenerator.generateDocuments().get(10, TimeUnit.SECONDS); + + PCollection listDocumentPaths = + testPipeline + .apply(Create.of(collectionGroupId)) + .apply(getPartitionQueryPTransform(testName.getMethodName(), partitionCount)) + .apply(FirestoreIO.v1().read().partitionQuery().withNameOnlyQuery().build()) + .apply(FirestoreIO.v1().read().runQuery().build()) + .apply(ParDo.of(new RunQueryResponseToDocument())) + .apply(ParDo.of(new DocumentToName())); + + PAssert.that(listDocumentPaths).containsInAnyOrder(documentGenerator.expectedDocumentPaths()); + testPipeline.run(options); + } + + @Test + public final void batchGet() throws Exception { + String collectionId = "a"; + DocumentGenerator documentGenerator = + helper.documentGenerator(NUM_ITEMS_TO_GENERATE, collectionId); + documentGenerator.generateDocuments().get(10, TimeUnit.SECONDS); + + PCollection listDocumentPaths = + testPipeline + .apply(Create.of(Collections.singletonList(documentGenerator.getDocumentIds()))) + .apply(getBatchGetDocumentsPTransform(testName.getMethodName(), collectionId)) + .apply( + FirestoreIO.v1() + .read() + .batchGetDocuments() + .withRpcQosOptions(rpcQosOptions) + .build()) + .apply(Filter.by(BatchGetDocumentsResponse::hasFound)) + .apply(ParDo.of(new BatchGetDocumentsResponseToDocument())) + .apply(ParDo.of(new DocumentToName())); + + PAssert.that(listDocumentPaths).containsInAnyOrder(documentGenerator.expectedDocumentPaths()); + testPipeline.run(options); + } + + @Test + public final void write() { + String collectionId = "a"; + runWriteTest(getWritePTransform(testName.getMethodName(), collectionId), collectionId); + } + + protected abstract PTransform, PCollection> + getListCollectionIdsPTransform(String testMethodName); + + protected abstract PTransform, PCollection> + getListDocumentsPTransform(String testMethodName); + + protected abstract PTransform>, PCollection> + getBatchGetDocumentsPTransform(String testMethodName, 
String collectionId); + + protected abstract PTransform, PCollection> + getRunQueryPTransform(String testMethodName); + + protected abstract PTransform, PCollection> + getPartitionQueryPTransform(String testMethodName, int partitionCount); + + protected abstract PTransform>, PCollection> getWritePTransform( + String testMethodName, String collectionId); + + protected final void runWriteTest( + PTransform>, PCollection> createWrite, String collectionId) { + List documentIds = + IntStream.rangeClosed(1, 1_000).mapToObj(i -> helper.docId()).collect(Collectors.toList()); + + // Create.of unwraps the list of document ids, so wrap it in another list + testPipeline + .apply(Create.of(Collections.singletonList(documentIds))) + .apply(createWrite) + .apply(FirestoreIO.v1().write().batchWrite().withRpcQosOptions(rpcQosOptions).build()); + + testPipeline.run(options); + + List actualDocumentIds = + helper + .listDocumentsViaQuery( + String.format("%s/%s", helper.getBaseDocumentPath(), collectionId)) + .map(name -> name.substring(name.lastIndexOf("/") + 1)) + .collect(Collectors.toList()); + + assertEquals(documentIds, actualDocumentIds); + } + + private static final class RunQueryResponseToDocument extends DoFn { + @ProcessElement + public void processElement(ProcessContext c) { + c.output(c.element().getDocument()); + } + } + + private static final class BatchGetDocumentsResponseToDocument + extends DoFn { + @ProcessElement + public void processElement(ProcessContext c) { + c.output(c.element().getFound()); + } + } + + private static final class DocumentToName extends DoFn { + @ProcessElement + public void processElement(ProcessContext c) { + c.output(c.element().getName()); + } + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/it/FirestoreTestingHelper.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/it/FirestoreTestingHelper.java new file mode 100644 index 000000000000..572d2dd9e4dd --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/it/FirestoreTestingHelper.java @@ -0,0 +1,438 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.firestore.it; + +import static org.hamcrest.Matchers.is; +import static org.hamcrest.Matchers.notNullValue; +import static org.junit.Assume.assumeThat; + +import com.google.api.core.ApiFunction; +import com.google.api.core.ApiFuture; +import com.google.api.core.ApiFutures; +import com.google.cloud.firestore.CollectionReference; +import com.google.cloud.firestore.DocumentReference; +import com.google.cloud.firestore.Firestore; +import com.google.cloud.firestore.FirestoreOptions; +import com.google.cloud.firestore.WriteBatch; +import com.google.cloud.firestore.WriteResult; +import com.google.cloud.firestore.spi.v1.FirestoreRpc; +import com.google.cloud.firestore.v1.FirestoreClient.ListCollectionIdsPagedResponse; +import com.google.cloud.firestore.v1.FirestoreClient.ListDocumentsPagedResponse; +import com.google.firestore.v1.BatchWriteRequest; +import com.google.firestore.v1.Document; +import com.google.firestore.v1.ListCollectionIdsRequest; +import com.google.firestore.v1.ListDocumentsRequest; +import com.google.firestore.v1.RunQueryRequest; +import com.google.firestore.v1.RunQueryResponse; +import com.google.firestore.v1.StructuredQuery; +import com.google.firestore.v1.StructuredQuery.CollectionSelector; +import com.google.firestore.v1.StructuredQuery.Direction; +import com.google.firestore.v1.StructuredQuery.FieldReference; +import com.google.firestore.v1.StructuredQuery.Order; +import com.google.firestore.v1.StructuredQuery.Projection; +import com.google.firestore.v1.Write; +import edu.umd.cs.findbugs.annotations.NonNull; +import java.lang.annotation.ElementType; +import java.lang.annotation.Retention; +import java.lang.annotation.RetentionPolicy; +import java.lang.annotation.Target; +import java.time.Clock; +import java.time.Instant; +import java.time.ZoneId; +import java.time.ZoneOffset; +import java.time.format.DateTimeFormatter; +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.Optional; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.TimeoutException; +import java.util.stream.Collectors; +import java.util.stream.IntStream; +import java.util.stream.Stream; +import java.util.stream.StreamSupport; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Streams; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.MoreExecutors; +import org.junit.rules.TestRule; +import org.junit.runner.Description; +import org.junit.runners.model.Statement; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +final class FirestoreTestingHelper implements TestRule { + private static final Logger LOG = LoggerFactory.getLogger(FirestoreTestingHelper.class); + + static final String BASE_COLLECTION_ID; + + static { + Instant now = Clock.systemUTC().instant(); + DateTimeFormatter formatter = + DateTimeFormatter.ISO_LOCAL_DATE_TIME.withZone(ZoneId.from(ZoneOffset.UTC)); + BASE_COLLECTION_ID = formatter.format(now); + } + + enum CleanupMode { + ALWAYS, + ON_SUCCESS_ONLY, + NEVER + } + + enum DataLayout { + Shallow, + Deep + } + + @Retention(RetentionPolicy.RUNTIME) + @Target(ElementType.METHOD) + @interface TestDataLayoutHint { + DataLayout value() default DataLayout.Shallow; + } + + private static final FirestoreOptions 
FIRESTORE_OPTIONS = FirestoreOptions.getDefaultInstance(); + + private final Firestore fs = FIRESTORE_OPTIONS.getService(); + private final FirestoreRpc rpc = (FirestoreRpc) FIRESTORE_OPTIONS.getRpc(); + private final CleanupMode cleanupMode; + + private Class testClass; + private String testName; + private DataLayout dataLayout; + + private int docIdCounter; + private int colIdCounter; + + private boolean testSuccess = false; + + @SuppressWarnings( + "initialization.fields.uninitialized") // testClass and testName are managed via #apply + public FirestoreTestingHelper(CleanupMode cleanupMode) { + this.cleanupMode = cleanupMode; + } + + @Override + public Statement apply(@NonNull Statement base, Description description) { + testClass = description.getTestClass(); + testName = description.getMethodName(); + TestDataLayoutHint hint = description.getAnnotation(TestDataLayoutHint.class); + if (hint != null) { + dataLayout = hint.value(); + } else { + dataLayout = DataLayout.Shallow; + } + return new Statement() { + @Override + public void evaluate() throws Throwable { + try { + base.evaluate(); + testSuccess = true; + } finally { + if (cleanupMode == CleanupMode.ALWAYS + || (cleanupMode == CleanupMode.ON_SUCCESS_ONLY && testSuccess)) { + try { + cleanUp(getBaseDocumentPath()); + } catch (Exception e) { + if (LOG.isDebugEnabled() || LOG.isTraceEnabled()) { + LOG.debug("Error while running cleanup", e); + } else { + LOG.info( + "Error while running cleanup: {} (set log level higher for stacktrace)", + e.getMessage() == null ? e.getClass().getName() : e.getMessage()); + } + } + } + } + } + }; + } + + Firestore getFs() { + return fs; + } + + String getDatabase() { + return String.format( + "projects/%s/databases/%s", + FIRESTORE_OPTIONS.getProjectId(), FIRESTORE_OPTIONS.getDatabaseId()); + } + + String getDocumentRoot() { + return getDatabase() + "/documents"; + } + + DocumentReference getBaseDocument() { + return getBaseDocument(fs, testClass, testName); + } + + String getBaseDocumentPath() { + return String.format("%s/%s", getDocumentRoot(), getBaseDocument().getPath()); + } + + String docId() { + return id("doc", docIdCounter++); + } + + String colId() { + return id("col", colIdCounter++); + } + + DocumentGenerator documentGenerator(int to, String collectionId) { + return documentGenerator(to, collectionId, false); + } + + DocumentGenerator documentGenerator(int to, String collectionId, boolean addBazDoc) { + return new DocumentGenerator(to, collectionId, addBazDoc); + } + + Stream listCollectionIds(String parent) { + ListCollectionIdsRequest lcir = ListCollectionIdsRequest.newBuilder().setParent(parent).build(); + // LOGGER.debug("lcir = {}", lcir); + + ListCollectionIdsPagedResponse response = rpc.listCollectionIdsPagedCallable().call(lcir); + return StreamSupport.stream(response.iteratePages().spliterator(), false) + .flatMap(page -> StreamSupport.stream(page.getValues().spliterator(), false)) + .map(colId -> String.format("%s/%s", parent, colId)); + } + + Stream listDocumentIds(String collectionPath) { + int index = collectionPath.lastIndexOf('/'); + String parent = collectionPath.substring(0, index); + String collectionId = collectionPath.substring(index + 1); + ListDocumentsRequest ldr = + ListDocumentsRequest.newBuilder() + .setParent(parent) + .setCollectionId(collectionId) + .setShowMissing(true) + .build(); + // LOGGER.debug("ldr = {}", ldr); + + ListDocumentsPagedResponse response = rpc.listDocumentsPagedCallable().call(ldr); + return 
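// Note on the cleanup performed in #apply above: teardown failures are logged and then
// swallowed so that a broken cleanup never masks the test's own pass/fail result, and
// CleanupMode.ON_SUCCESS_ONLY leaves the generated documents in place after a failing run.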
StreamSupport.stream(response.iteratePages().spliterator(), false) + .flatMap(page -> page.getResponse().getDocumentsList().stream()) + .map(Document::getName) + .filter(s -> !s.isEmpty()); + } + + Stream listDocumentsViaQuery(String collectionPath) { + int index = collectionPath.lastIndexOf('/'); + String parent = collectionPath.substring(0, index); + String collectionId = collectionPath.substring(index + 1); + FieldReference nameField = FieldReference.newBuilder().setFieldPath("__name__").build(); + RunQueryRequest rqr = + RunQueryRequest.newBuilder() + .setParent(parent) + .setStructuredQuery( + StructuredQuery.newBuilder() + .addFrom(CollectionSelector.newBuilder().setCollectionId(collectionId)) + .addOrderBy( + Order.newBuilder() + .setField(nameField) + .setDirection(Direction.ASCENDING) + .build()) + .setSelect(Projection.newBuilder().addFields(nameField).build())) + .build(); + // LOGGER.debug("rqr = {}", rqr); + + return StreamSupport.stream(rpc.runQueryCallable().call(rqr).spliterator(), false) + .filter(RunQueryResponse::hasDocument) + .map(RunQueryResponse::getDocument) + .map(Document::getName); + } + + private void cleanUp(String baseDocument) { + LOG.debug("Running cleanup..."); + Batcher batcher = new Batcher(); + walkDoc(batcher, baseDocument); + batcher.flush(); + LOG.debug("Running cleanup complete"); + } + + private void walkDoc(Batcher batcher, String baseDocument) { + batcher.checkState(); + listCollectionIds(baseDocument).forEach(col -> walkCol(batcher, col)); + batcher.add(baseDocument); + } + + private void walkCol(Batcher batcher, String baseCollection) { + batcher.checkState(); + if (dataLayout == DataLayout.Shallow) { + // queries are much faster for flat listing when we don't need to try and account for + // `show_missing` + listDocumentsViaQuery(baseCollection).forEach(batcher::add); + // flush any pending writes before we start recursively walking down the tree + batcher.flush(); + } + listDocumentIds(baseCollection).forEach(doc -> walkDoc(batcher, doc)); + } + + static DocumentReference getBaseDocument(Firestore fs, Class testClass, String testName) { + return fs.collection("beam") + .document("IT") + .collection(BASE_COLLECTION_ID) + .document(testClass.getSimpleName()) + .collection("test") + .document(testName); + } + + static Stream> chunkUpDocIds(List things) { + return Streams.stream(Iterables.partition(things, 500)); + } + + static ApiFunction>, List> flattenListList() { + return input -> { + List retVal = new ArrayList<>(); + for (List writeResults : input) { + retVal.addAll(writeResults); + } + return retVal; + }; + } + + @SuppressWarnings("nullness") + static String assumeEnvVarSet(@NonNull String name) { + LOG.debug(">>> assumeEnvVarSet(name : {})", name); + try { + String value = System.getenv(name); + LOG.debug("value = {}", value); + assumeThat(name + " not set", value, is(notNullValue())); + return value; + } finally { + LOG.debug("<<< assumeEnvVarSet(name : {})", name); + } + } + + private static String id(String docOrCol, int counter) { + return String.format("%s-%05d", docOrCol, counter); + } + + final class DocumentGenerator { + + private final List documentIds; + private final String collectionId; + private final boolean addBazDoc; + + private DocumentGenerator(int to, String collectionId, boolean addBazDoc) { + this.documentIds = + Collections.unmodifiableList( + IntStream.rangeClosed(1, to).mapToObj(i -> docId()).collect(Collectors.toList())); + this.collectionId = collectionId; + this.addBazDoc = addBazDoc; + } + + ApiFuture> 
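// For orientation (illustrative values): the per-test base path assembled by
// getBaseDocumentPath() has the shape
//   projects/<project>/databases/<database>/documents/beam/IT/<BASE_COLLECTION_ID>/<TestClass>/test/<testName>
// so every test method writes into, and cleans up, its own isolated subtree.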
generateDocuments() { + // create some documents for listing and asserting in the test + CollectionReference baseCollection = getBaseDocument().collection(collectionId); + + List>> futures = + Streams.concat( + chunkUpDocIds(documentIds) + .map( + chunk -> { + WriteBatch batch = fs.batch(); + chunk.stream() + .map(baseCollection::document) + .forEach(ref -> batch.set(ref, ImmutableMap.of("foo", "bar"))); + return batch.commit(); + }), + Stream.of(Optional.of(addBazDoc)) + .filter(o -> o.filter(b -> b).isPresent()) + .map( + x -> { + WriteBatch batch = fs.batch(); + batch.set(baseCollection.document(), ImmutableMap.of("foo", "baz")); + return batch.commit(); + })) + .collect(Collectors.toList()); + + return ApiFutures.transform( + ApiFutures.allAsList(futures), + FirestoreTestingHelper.flattenListList(), + MoreExecutors.directExecutor()); + } + + List getDocumentIds() { + return documentIds; + } + + List expectedDocumentPaths() { + return documentIds.stream() + .map( + id -> + String.format( + "%s/%s/%s", + getDocumentRoot(), getBaseDocument().collection(collectionId).getPath(), id)) + .collect(Collectors.toList()); + } + } + + @SuppressWarnings({ + "nullness", + "initialization.fields.uninitialized" + }) // batch is managed as part of use + private final class Batcher { + private static final int MAX_BATCH_SIZE = 500; + + private final BatchWriteRequest.Builder batch; + private boolean failed; + + public Batcher() { + batch = BatchWriteRequest.newBuilder().setDatabase(getDatabase()); + this.failed = false; + } + + public void add(String docName) { + checkState(); + batch.addWrites(Write.newBuilder().setDelete(docName)); + } + + private void checkState() { + requireNotFailed(); + maybeFlush(); + } + + private void requireNotFailed() { + if (failed) { + throw new IllegalStateException( + "Previous batch commit failed, unable to enqueue new operation"); + } + } + + private void maybeFlush() { + if (batch.getWritesCount() == MAX_BATCH_SIZE) { + flush(); + } + } + + private void flush() { + if (batch.getWritesCount() == 0) { + return; + } + try { + LOG.trace("Flushing {} elements...", batch.getWritesCount()); + rpc.batchWriteCallable().futureCall(batch.build()).get(30, TimeUnit.SECONDS); + LOG.trace("Flushing {} elements complete", batch.getWritesCount()); + batch.clearWrites(); + } catch (InterruptedException | ExecutionException | TimeoutException e) { + failed = true; + throw new RuntimeException(e); + } + } + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/it/FirestoreV1IT.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/it/FirestoreV1IT.java new file mode 100644 index 000000000000..157624eb32fe --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/firestore/it/FirestoreV1IT.java @@ -0,0 +1,385 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.firestore.it; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists.newArrayList; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertNotNull; +import static org.junit.Assert.assertTrue; + +import com.google.api.core.ApiFuture; +import com.google.cloud.firestore.QueryDocumentSnapshot; +import com.google.cloud.firestore.QuerySnapshot; +import com.google.firestore.v1.BatchGetDocumentsRequest; +import com.google.firestore.v1.Document; +import com.google.firestore.v1.ListCollectionIdsRequest; +import com.google.firestore.v1.ListDocumentsRequest; +import com.google.firestore.v1.PartitionQueryRequest; +import com.google.firestore.v1.Precondition; +import com.google.firestore.v1.RunQueryRequest; +import com.google.firestore.v1.StructuredQuery; +import com.google.firestore.v1.StructuredQuery.CollectionSelector; +import com.google.firestore.v1.StructuredQuery.Direction; +import com.google.firestore.v1.StructuredQuery.FieldFilter; +import com.google.firestore.v1.StructuredQuery.FieldFilter.Operator; +import com.google.firestore.v1.StructuredQuery.FieldReference; +import com.google.firestore.v1.StructuredQuery.Filter; +import com.google.firestore.v1.StructuredQuery.Order; +import com.google.firestore.v1.Value; +import com.google.firestore.v1.Write; +import com.google.protobuf.Timestamp; +import com.google.rpc.Code; +import java.util.Iterator; +import java.util.List; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.TimeoutException; +import java.util.stream.Collectors; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreIO; +import org.apache.beam.sdk.io.gcp.firestore.FirestoreV1.WriteFailure; +import org.apache.beam.sdk.io.gcp.firestore.RpcQosOptions; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.junit.Test; + +public final class FirestoreV1IT extends BaseFirestoreIT { + + @Test + public void batchWrite_partialFailureOutputsToDeadLetterQueue() + throws InterruptedException, ExecutionException, TimeoutException { + String collectionId = "a"; + + String docId = helper.docId(); + Write validWrite = + Write.newBuilder() + .setUpdate( + Document.newBuilder() + .setName(docPath(helper.getBaseDocumentPath(), collectionId, docId)) + .putFields("foo", Value.newBuilder().setStringValue(docId).build())) + .build(); + + long millis = System.currentTimeMillis(); + Timestamp timestamp = + Timestamp.newBuilder() + .setSeconds(millis / 1000) + .setNanos((int) ((millis % 1000) * 1000000)) + .build(); + + String docId2 = helper.docId(); + helper + .getBaseDocument() + .collection(collectionId) + .document(docId2) + 
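// Worked example for the Timestamp built above: protobuf Timestamps carry whole seconds
// plus the sub-second remainder expressed in nanoseconds, e.g.
//   millis = 1_620_000_123_456L  ->  seconds = 1_620_000_123, nanos = 456_000_000
long millis = 1_620_000_123_456L;
Timestamp ts =
    Timestamp.newBuilder()
        .setSeconds(millis / 1000)
        .setNanos((int) ((millis % 1000) * 1_000_000))
        .build();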
.create(ImmutableMap.of("foo", "baz")) + .get(10, TimeUnit.SECONDS); + + // this will fail because we're setting a updateTime precondition to before it was created + Write conditionalUpdate = + Write.newBuilder() + .setUpdate( + Document.newBuilder() + .setName(docPath(helper.getBaseDocumentPath(), collectionId, docId2)) + .putFields("foo", Value.newBuilder().setStringValue(docId).build())) + .setCurrentDocument(Precondition.newBuilder().setUpdateTime(timestamp)) + .build(); + + List writes = newArrayList(validWrite, conditionalUpdate); + + RpcQosOptions options = BaseFirestoreIT.rpcQosOptions.toBuilder().withBatchMaxCount(2).build(); + PCollection writeFailurePCollection = + testPipeline + .apply(Create.of(writes)) + .apply( + FirestoreIO.v1() + .write() + .batchWrite() + .withDeadLetterQueue() + .withRpcQosOptions(options) + .build()); + + PAssert.that(writeFailurePCollection) + .satisfies( + (writeFailures) -> { + Iterator iterator = writeFailures.iterator(); + assertTrue(iterator.hasNext()); + WriteFailure failure = iterator.next(); + assertEquals(Code.FAILED_PRECONDITION, Code.forNumber(failure.getStatus().getCode())); + assertNotNull(failure.getWriteResult()); + assertFalse(failure.getWriteResult().hasUpdateTime()); + assertEquals(conditionalUpdate, failure.getWrite()); + assertFalse(iterator.hasNext()); + return null; + }); + testPipeline.run(this.options); + + ApiFuture actualDocsQuery = + helper.getBaseDocument().collection(collectionId).orderBy("__name__").get(); + QuerySnapshot querySnapshot = actualDocsQuery.get(10, TimeUnit.SECONDS); + List documents = querySnapshot.getDocuments(); + List> actualDocumentIds = + documents.stream() + .map(doc -> KV.of(doc.getId(), doc.getString("foo"))) + .collect(Collectors.toList()); + + List> expected = newArrayList(KV.of(docId, docId), KV.of(docId2, "baz")); + assertEquals(expected, actualDocumentIds); + } + + @Override + protected PTransform, PCollection> + getListCollectionIdsPTransform(String testMethodName) { + return new ListCollectionIds(helper.getDatabase(), helper.getBaseDocumentPath()); + } + + @Override + protected PTransform, PCollection> + getListDocumentsPTransform(String testMethodName) { + return new ListDocuments(helper.getDatabase(), helper.getBaseDocumentPath()); + } + + @Override + protected PTransform>, PCollection> + getBatchGetDocumentsPTransform(String testMethodName, String collectionId) { + return new BatchGetDocuments(helper.getDatabase(), helper.getBaseDocumentPath(), collectionId); + } + + @Override + protected PTransform, PCollection> getRunQueryPTransform( + String testMethodName) { + return new RunQuery(helper.getDatabase(), helper.getBaseDocumentPath()); + } + + @Override + protected PTransform, PCollection> + getPartitionQueryPTransform(String testMethodName, int partitionCount) { + return new PartitionQuery(helper.getDatabase(), helper.getDocumentRoot(), partitionCount); + } + + @Override + protected PTransform>, PCollection> getWritePTransform( + String testMethodName, String collectionId) { + return new WritePTransform(helper.getDatabase(), helper.getBaseDocumentPath(), collectionId); + } + + private static final class ListCollectionIds + extends BasePTransform { + + public ListCollectionIds(String database, String baseDocumentPath) { + super(database, baseDocumentPath, ""); + } + + @Override + public PCollection expand(PCollection input) { + return input.apply( + ParDo.of( + new DoFn() { + @ProcessElement + public void processElement(ProcessContext c) { + c.output( + 
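// The expected end state asserted above follows from the two writes: the unconditional
// write for docId landed (foo = docId), while the conditional update for docId2 failed its
// updateTime precondition and was routed to the dead letter queue, so docId2 keeps its
// original foo = "baz" value.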
ListCollectionIdsRequest.newBuilder().setParent(baseDocumentPath).build()); + } + })); + } + } + + private static final class ListDocuments extends BasePTransform { + + public ListDocuments(String database, String baseDocumentPath) { + super(database, baseDocumentPath, ""); + } + + @Override + public PCollection expand(PCollection input) { + return input.apply( + ParDo.of( + new DoFn() { + @ProcessElement + public void processElement(ProcessContext c) { + c.output( + ListDocumentsRequest.newBuilder() + .setParent(baseDocumentPath) + .setCollectionId(c.element()) + .build()); + } + })); + } + } + + private static final class BatchGetDocuments + extends BasePTransform, BatchGetDocumentsRequest> { + + public BatchGetDocuments(String database, String baseDocumentPath, String collectionId) { + super(database, baseDocumentPath, collectionId); + } + + @Override + public PCollection expand(PCollection> input) { + return input.apply( + ParDo.of( + new DoFn, BatchGetDocumentsRequest>() { + @ProcessElement + public void processElement(ProcessContext c) { + BatchGetDocumentsRequest.Builder builder = + BatchGetDocumentsRequest.newBuilder().setDatabase(database); + builder.addDocuments(docPath("404")); + c.element().stream().map(docId -> docPath(docId)).forEach(builder::addDocuments); + c.output(builder.build()); + } + })); + } + } + + private static final class RunQuery extends BasePTransform { + + public RunQuery(String database, String baseDocumentPath) { + super(database, baseDocumentPath, ""); + } + + @Override + public PCollection expand(PCollection input) { + return input.apply( + ParDo.of( + new DoFn() { + @ProcessElement + public void processElement(ProcessContext c) { + c.output( + RunQueryRequest.newBuilder() + .setParent(baseDocumentPath) + .setStructuredQuery( + StructuredQuery.newBuilder() + .addFrom( + CollectionSelector.newBuilder().setCollectionId(c.element())) + .setWhere( + Filter.newBuilder() + .setFieldFilter( + FieldFilter.newBuilder() + .setField( + FieldReference.newBuilder() + .setFieldPath("foo")) + .setOp(Operator.EQUAL) + .setValue( + Value.newBuilder().setStringValue("bar"))))) + .build()); + } + })); + } + } + + private static final class PartitionQuery extends BasePTransform { + + private final int partitionCount; + + public PartitionQuery(String database, String baseDocumentPath, int partitionCount) { + super(database, baseDocumentPath, ""); + this.partitionCount = partitionCount; + } + + @Override + public PCollection expand(PCollection input) { + return input.apply( + ParDo.of( + new DoFn() { + @ProcessElement + public void processElement(ProcessContext c) { + c.output( + PartitionQueryRequest.newBuilder() + .setParent(baseDocumentPath) + // set the page size smaller than the number of partitions we're + // requesting to ensure + // that more than one page is to be fetched so that our multi-page + // handling code is + // tested. 
+ .setPageSize(1) + .setPartitionCount(partitionCount) + .setStructuredQuery( + StructuredQuery.newBuilder() + .addFrom( + CollectionSelector.newBuilder() + .setCollectionId(c.element()) + .setAllDescendants(true)) + .addOrderBy( + Order.newBuilder() + .setField( + FieldReference.newBuilder().setFieldPath("__name__")) + .setDirection(Direction.ASCENDING))) + .build()); + } + })); + } + } + + private static final class WritePTransform extends BasePTransform, Write> { + + public WritePTransform(String database, String baseDocumentPath, String collectionId) { + super(database, baseDocumentPath, collectionId); + } + + @Override + public PCollection expand(PCollection> input) { + return input.apply( + ParDo.of( + new DoFn, Write>() { + @ProcessElement + public void processElement(ProcessContext c) { + List documentIds = c.element(); + documentIds.stream() + .map( + docId -> + Write.newBuilder() + .setUpdate( + Document.newBuilder() + .setName(docPath(docId)) + .putFields( + "foo", + Value.newBuilder().setStringValue(docId).build())) + .build()) + .forEach(c::output); + } + })); + } + } + + private abstract static class BasePTransform + extends PTransform, PCollection> { + + protected final String database; + protected final String baseDocumentPath; + protected final String collectionId; + + private BasePTransform(String database, String baseDocumentPath, String collectionId) { + this.database = database; + this.baseDocumentPath = baseDocumentPath; + this.collectionId = collectionId; + } + + protected String docPath(String docId) { + return FirestoreV1IT.docPath(baseDocumentPath, collectionId, docId); + } + } + + private static String docPath(String baseDocumentPath, String collectionId, String docId) { + return String.format("%s/%s/%s", baseDocumentPath, collectionId, docId); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIOLROIT.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIOLROIT.java index 49c2c064232f..8deefc207e1b 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIOLROIT.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIOLROIT.java @@ -20,7 +20,7 @@ import static org.apache.beam.sdk.io.gcp.healthcare.FhirIOTestUtil.DEFAULT_TEMP_BUCKET; import static org.apache.beam.sdk.io.gcp.healthcare.HL7v2IOTestUtil.HEALTHCARE_DATASET_TEMPLATE; -import com.google.api.services.healthcare.v1beta1.model.DeidentifyConfig; +import com.google.api.services.healthcare.v1.model.DeidentifyConfig; import java.io.IOException; import java.security.SecureRandom; import org.apache.beam.sdk.testing.TestPipeline; @@ -34,9 +34,6 @@ import org.junit.runners.JUnit4; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FhirIOLROIT { @Rule public final transient TestPipeline pipeline = TestPipeline.create(); diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIOReadIT.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIOReadIT.java index 32e8eda62c85..38c310e1c0bc 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIOReadIT.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIOReadIT.java @@ -45,9 +45,6 @@ import 
org.junit.runners.Parameterized.Parameters; @RunWith(Parameterized.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FhirIOReadIT { @Parameters(name = "{0}") diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIOSearchIT.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIOSearchIT.java index 307d0f650c10..e3a5b4e0c759 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIOSearchIT.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIOSearchIT.java @@ -18,6 +18,7 @@ package org.apache.beam.sdk.io.gcp.healthcare; import static org.apache.beam.sdk.io.gcp.healthcare.HL7v2IOTestUtil.HEALTHCARE_DATASET_TEMPLATE; +import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNotEquals; import com.google.gson.JsonArray; @@ -30,17 +31,17 @@ import java.util.Collection; import java.util.HashMap; import java.util.List; -import java.util.Map; import org.apache.beam.runners.direct.DirectOptions; -import org.apache.beam.sdk.coders.KvCoder; -import org.apache.beam.sdk.coders.MapCoder; +import org.apache.beam.sdk.coders.ListCoder; import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.coders.VarIntCoder; import org.apache.beam.sdk.testing.PAssert; import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.transforms.Create; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollection; -import org.junit.AfterClass; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.junit.After; import org.junit.Before; import org.junit.Rule; import org.junit.Test; @@ -49,9 +50,6 @@ import org.junit.runners.Parameterized.Parameters; @RunWith(Parameterized.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FhirIOSearchIT { @Parameters(name = "{0}") @@ -67,8 +65,10 @@ public static Collection versions() { private static final String BASE_STORE_ID = "FHIR_store_search_it_" + System.currentTimeMillis() + "_" + (new SecureRandom().nextInt(32)); private String fhirStoreId; - private static final int MAX_NUM_OF_SEARCHES = 100; - private List>> input = new ArrayList<>(); + private static final int MAX_NUM_OF_SEARCHES = 50; + private List> input = new ArrayList<>(); + private List>> genericParametersInput = new ArrayList<>(); + private static final String KEY = "key"; public String version; @@ -96,17 +96,15 @@ public void setup() throws Exception { JsonArray fhirResources = JsonParser.parseString(bundles.get(0)).getAsJsonObject().getAsJsonArray("entry"); HashMap searchParameters = new HashMap<>(); - searchParameters.put("_count", Integer.toString(100)); + searchParameters.put("_count", Integer.toString(50)); + HashMap> genericSearchParameters = new HashMap<>(); + genericSearchParameters.put("_count", Arrays.asList(50)); int searches = 0; for (JsonElement resource : fhirResources) { - input.add( - KV.of( - resource - .getAsJsonObject() - .getAsJsonObject("resource") - .get("resourceType") - .getAsString(), - searchParameters)); + String resourceType = + resource.getAsJsonObject().getAsJsonObject("resource").get("resourceType").getAsString(); + input.add(FhirSearchParameter.of(resourceType, KEY, searchParameters)); + genericParametersInput.add(FhirSearchParameter.of(resourceType, 
genericSearchParameters)); searches++; if (searches > MAX_NUM_OF_SEARCHES) { break; @@ -114,8 +112,8 @@ public void setup() throws Exception { } } - @AfterClass - public static void teardown() throws IOException { + @After + public void teardown() throws IOException { HealthcareApiClient client = new HttpHealthcareApiClient(); for (String version : versions()) { client.deleteFhirStore(healthcareDataset + "/fhirStores/" + BASE_STORE_ID + version); @@ -127,25 +125,85 @@ public void testFhirIOSearch() { pipeline.getOptions().as(DirectOptions.class).setBlockOnRun(false); // Search using the resource type of each written resource and empty search parameters. - PCollection>> searchConfigs = + PCollection> searchConfigs = pipeline.apply( - Create.of(input) - .withCoder( - KvCoder.of( - StringUtf8Coder.of(), - MapCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of())))); + Create.of(input).withCoder(FhirSearchParameterCoder.of(StringUtf8Coder.of()))); FhirIO.Search.Result result = searchConfigs.apply( FhirIO.searchResources(healthcareDataset + "/fhirStores/" + fhirStoreId)); // Verify that there are no failures. PAssert.that(result.getFailedSearches()).empty(); - // Verify that none of the result resource sets are empty sets. - PAssert.that(result.getResources()) + // Verify that none of the result resource sets are empty sets, using both getResources methods. + PCollection> keyedResources = result.getKeyedResources(); + PAssert.that(keyedResources) .satisfies( input -> { - for (String resource : input) { - assertNotEquals(JsonParser.parseString(resource).getAsJsonArray().size(), 0); + for (KV resource : input) { + assertEquals(KEY, resource.getKey()); + assertNotEquals(0, resource.getValue().size()); + } + return null; + }); + + pipeline.run().waitUntilFinish(); + } + + @Test + public void testFhirIOSearchWithGenericParameters() { + pipeline.getOptions().as(DirectOptions.class).setBlockOnRun(false); + + // Search using the resource type of each written resource and empty search parameters. + PCollection>> searchConfigs = + pipeline.apply( + Create.of(genericParametersInput) + .withCoder(FhirSearchParameterCoder.of(ListCoder.of(VarIntCoder.of())))); + FhirIO.Search.Result result = + searchConfigs.apply( + (FhirIO.Search>) + FhirIO.searchResourcesWithGenericParameters( + healthcareDataset + "/fhirStores/" + fhirStoreId)); + + // Verify that there are no failures. + PAssert.that(result.getFailedSearches()).empty(); + // Verify that none of the result resource sets are empty sets, using both getResources methods. + PCollection resources = result.getResources(); + PAssert.that(resources) + .satisfies( + input -> { + for (JsonArray resource : input) { + assertNotEquals(0, resource.size()); + } + return null; + }); + + pipeline.run().waitUntilFinish(); + } + + @Test + public void testFhirIOSearch_emptyResult() { + pipeline.getOptions().as(DirectOptions.class).setBlockOnRun(false); + + // Search using a search that will return no results. + FhirSearchParameter emptySearch = + FhirSearchParameter.of("Patient", KEY, ImmutableMap.of("name", "INVALID_NAME")); + PCollection> searchConfigs = + pipeline.apply( + Create.of(emptySearch).withCoder(FhirSearchParameterCoder.of(StringUtf8Coder.of()))); + FhirIO.Search.Result result = + searchConfigs.apply( + FhirIO.searchResources(healthcareDataset + "/fhirStores/" + fhirStoreId)); + + // Verify that there are no failures. + PAssert.that(result.getFailedSearches()).empty(); + // Verify that the result is empty. 
+ PCollection> keyedResources = result.getKeyedResources(); + PAssert.that(keyedResources) + .satisfies( + input -> { + for (KV resource : input) { + assertEquals(KEY, resource.getKey()); + assertEquals(0, resource.getValue().size()); } return null; }); diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIOTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIOTest.java index 4736d8d97b75..8313a01e3d63 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIOTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIOTest.java @@ -21,7 +21,8 @@ import java.util.Arrays; import java.util.Collections; import java.util.List; -import java.util.Map; +import org.apache.beam.sdk.coders.ListCoder; +import org.apache.beam.sdk.coders.StringUtf8Coder; import org.apache.beam.sdk.testing.PAssert; import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.transforms.Count; @@ -61,20 +62,47 @@ public void test_FhirIO_failedReads() { @Test public void test_FhirIO_failedSearches() { - List>> input = Arrays.asList(KV.of("resource-type-1", null)); + List> input = + Arrays.asList(FhirSearchParameter.of("resource-type-1", null)); FhirIO.Search.Result searchResult = - pipeline.apply(Create.of(input)).apply(FhirIO.searchResources("bad-store")); + pipeline + .apply(Create.of(input).withCoder(FhirSearchParameterCoder.of(StringUtf8Coder.of()))) + .apply(FhirIO.searchResources("bad-store")); PCollection> failed = searchResult.getFailedSearches(); - PCollection resources = searchResult.getResources(); + PCollection failedMsgIds = + failed.apply( + MapElements.into(TypeDescriptors.strings()).via(HealthcareIOError::getDataResource)); + + PAssert.that(failedMsgIds).containsInAnyOrder(Arrays.asList("bad-store")); + PAssert.that(searchResult.getResources()).empty(); + PAssert.that(searchResult.getKeyedResources()).empty(); + pipeline.run(); + } + + @Test + public void test_FhirIO_failedSearchesWithGenericParameters() { + List>> input = + Arrays.asList(FhirSearchParameter.of("resource-type-1", null)); + FhirIO.Search.Result searchResult = + pipeline + .apply( + Create.of(input) + .withCoder(FhirSearchParameterCoder.of(ListCoder.of(StringUtf8Coder.of())))) + .apply( + (FhirIO.Search>) + FhirIO.searchResourcesWithGenericParameters("bad-store")); + + PCollection> failed = searchResult.getFailedSearches(); PCollection failedMsgIds = failed.apply( MapElements.into(TypeDescriptors.strings()).via(HealthcareIOError::getDataResource)); PAssert.that(failedMsgIds).containsInAnyOrder(Arrays.asList("bad-store")); - PAssert.that(resources).empty(); + PAssert.that(searchResult.getResources()).empty(); + PAssert.that(searchResult.getKeyedResources()).empty(); pipeline.run(); } diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIOTestUtil.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIOTestUtil.java index 5b584426baf1..c893f59f744b 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIOTestUtil.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIOTestUtil.java @@ -40,9 +40,6 @@ import java.util.stream.Stream; import org.apache.beam.sdk.io.gcp.healthcare.HttpHealthcareApiClient.HealthcareHttpException; -@SuppressWarnings({ - 
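// Presumably because FhirSearchParameter is a parameterized type, Create cannot infer a
// coder for it, so each search-input PCollection above sets one explicitly via
// FhirSearchParameterCoder.of(<coder for the parameter value type>): StringUtf8Coder for
// string-valued parameters, ListCoder.of(VarIntCoder) for the generic integer-list ones.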
"nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) class FhirIOTestUtil { public static final String DEFAULT_TEMP_BUCKET = "temp-storage-for-healthcare-io-tests"; diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIOWriteIT.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIOWriteIT.java index e25696f37146..6e260ec1a607 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIOWriteIT.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIOWriteIT.java @@ -41,9 +41,6 @@ import org.junit.runners.Parameterized.Parameters; @RunWith(Parameterized.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FhirIOWriteIT { @Parameters(name = "{0}") diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7V2MessagePagesTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7V2MessagePagesTest.java index b47225a51b6f..95617b94a791 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7V2MessagePagesTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7V2MessagePagesTest.java @@ -21,8 +21,8 @@ import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertTrue; -import com.google.api.services.healthcare.v1beta1.model.ListMessagesResponse; -import com.google.api.services.healthcare.v1beta1.model.Message; +import com.google.api.services.healthcare.v1.model.ListMessagesResponse; +import com.google.api.services.healthcare.v1.model.Message; import java.io.IOException; import java.util.Iterator; import java.util.List; @@ -37,9 +37,6 @@ /** The type HL7v2 message id pages test. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class HL7V2MessagePagesTest { /** The Healthcare API. */ diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOReadIT.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOReadIT.java index a9e7aa75a480..610c830b861e 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOReadIT.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOReadIT.java @@ -42,9 +42,6 @@ import org.junit.runners.JUnit4; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class HL7v2IOReadIT { private transient HealthcareApiClient client; private static String healthcareDataset; diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOReadWriteIT.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOReadWriteIT.java index 1959aa0eada2..bb8079c3420d 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOReadWriteIT.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOReadWriteIT.java @@ -47,9 +47,6 @@ * with schematized data which should be output only. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class HL7v2IOReadWriteIT { private transient HealthcareApiClient client; diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOTest.java index 34d9254d1e0a..7a5db64e5b2a 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOTest.java @@ -17,7 +17,7 @@ */ package org.apache.beam.sdk.io.gcp.healthcare; -import com.google.api.services.healthcare.v1beta1.model.Message; +import com.google.api.services.healthcare.v1.model.Message; import java.util.Arrays; import java.util.Collections; import java.util.List; diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOTestUtil.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOTestUtil.java index d8ad6f298458..997ebc98936d 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOTestUtil.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOTestUtil.java @@ -19,7 +19,7 @@ import com.google.api.client.util.Base64; import com.google.api.client.util.Sleeper; -import com.google.api.services.healthcare.v1beta1.model.Message; +import com.google.api.services.healthcare.v1.model.Message; import java.io.IOException; import java.util.Arrays; import java.util.List; @@ -35,9 +35,6 @@ import org.joda.time.Duration; import org.joda.time.Instant; -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) class HL7v2IOTestUtil { public static final long HL7V2_INDEXING_TIMEOUT_MINUTES = 10L; /** Google Cloud Healthcare Dataset in Apache Beam integration test project. 
*/ diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOWriteIT.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOWriteIT.java index b9b6fd7b1ced..34fcabad5680 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOWriteIT.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOWriteIT.java @@ -22,7 +22,7 @@ import static org.apache.beam.sdk.io.gcp.healthcare.HL7v2IOTestUtil.MESSAGES; import static org.apache.beam.sdk.io.gcp.healthcare.HL7v2IOTestUtil.deleteAllHL7v2Messages; -import com.google.api.services.healthcare.v1beta1.model.Hl7V2Store; +import com.google.api.services.healthcare.v1.model.Hl7V2Store; import java.io.IOException; import java.security.SecureRandom; import java.util.concurrent.TimeoutException; @@ -41,9 +41,6 @@ import org.junit.runners.JUnit4; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class HL7v2IOWriteIT { private transient HealthcareApiClient client; diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/NestedRowToMessageTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/NestedRowToMessageTest.java new file mode 100644 index 000000000000..af2cdae176c2 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/NestedRowToMessageTest.java @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.pubsub; + +import static java.nio.charset.StandardCharsets.UTF_8; +import static org.apache.beam.sdk.io.gcp.pubsub.PubsubMessageToRow.ATTRIBUTES_FIELD; +import static org.apache.beam.sdk.io.gcp.pubsub.PubsubMessageToRow.PAYLOAD_FIELD; +import static org.apache.beam.sdk.io.gcp.pubsub.PubsubSchemaIOProvider.ATTRIBUTE_ARRAY_ENTRY_SCHEMA; +import static org.apache.beam.sdk.io.gcp.pubsub.PubsubSchemaIOProvider.ATTRIBUTE_ARRAY_FIELD_TYPE; +import static org.apache.beam.sdk.io.gcp.pubsub.PubsubSchemaIOProvider.ATTRIBUTE_MAP_FIELD_TYPE; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertThrows; +import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.when; + +import java.util.Map; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.io.payloads.PayloadSerializer; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.mockito.junit.MockitoJUnitRunner; + +@RunWith(MockitoJUnitRunner.class) +public class NestedRowToMessageTest { + private static final PayloadSerializer SERIALIZER = mock(PayloadSerializer.class); + private static final Map ATTRIBUTES = ImmutableMap.of("k1", "v1", "k2", "v2"); + + @Test + public void mapAttributesTransformed() { + Schema schema = + Schema.builder() + .addByteArrayField(PAYLOAD_FIELD) + .addField(ATTRIBUTES_FIELD, ATTRIBUTE_MAP_FIELD_TYPE) + .build(); + Row row = Row.withSchema(schema).attachValues("abc".getBytes(UTF_8), ATTRIBUTES); + PubsubMessage message = new PubsubMessage("abc".getBytes(UTF_8), ATTRIBUTES); + assertEquals(message, new NestedRowToMessage(SERIALIZER, schema).apply(row)); + } + + @Test + public void entriesAttributesTransformed() { + Schema schema = + Schema.builder() + .addByteArrayField(PAYLOAD_FIELD) + .addField(ATTRIBUTES_FIELD, ATTRIBUTE_ARRAY_FIELD_TYPE) + .build(); + Row row = + Row.withSchema(schema) + .attachValues( + "abc".getBytes(UTF_8), + ImmutableList.of( + Row.withSchema(ATTRIBUTE_ARRAY_ENTRY_SCHEMA).attachValues("k1", "v1"), + Row.withSchema(ATTRIBUTE_ARRAY_ENTRY_SCHEMA).attachValues("k2", "v2"))); + PubsubMessage message = new PubsubMessage("abc".getBytes(UTF_8), ATTRIBUTES); + assertEquals(message, new NestedRowToMessage(SERIALIZER, schema).apply(row)); + } + + @Test + public void rowPayloadTransformed() { + Schema payloadSchema = Schema.builder().addStringField("fieldName").build(); + Row payload = Row.withSchema(payloadSchema).attachValues("abc"); + Schema schema = + Schema.builder() + .addRowField(PAYLOAD_FIELD, payloadSchema) + .addField(ATTRIBUTES_FIELD, ATTRIBUTE_MAP_FIELD_TYPE) + .build(); + Row row = Row.withSchema(schema).attachValues(payload, ATTRIBUTES); + when(SERIALIZER.serialize(payload)).thenReturn("abc".getBytes(UTF_8)); + PubsubMessage message = new PubsubMessage("abc".getBytes(UTF_8), ATTRIBUTES); + assertEquals(message, new NestedRowToMessage(SERIALIZER, schema).apply(row)); + } + + @Test + public void rowPayloadTransformFailure() { + Schema payloadSchema = Schema.builder().addStringField("fieldName").build(); + Row payload = Row.withSchema(payloadSchema).attachValues("abc"); + Schema schema = + Schema.builder() + .addRowField(PAYLOAD_FIELD, payloadSchema) + .addField(ATTRIBUTES_FIELD, ATTRIBUTE_MAP_FIELD_TYPE) + .build(); + Row row = 
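// Both attribute encodings exercised above carry the same logical map {k1: v1, k2: v2}:
// ATTRIBUTE_MAP_FIELD_TYPE stores it directly as a map field, while
// ATTRIBUTE_ARRAY_FIELD_TYPE stores one (key, value) Row per entry using
// ATTRIBUTE_ARRAY_ENTRY_SCHEMA, and both are expected to yield an identical PubsubMessage.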
Row.withSchema(schema).attachValues(payload, ATTRIBUTES); + when(SERIALIZER.serialize(payload)).thenThrow(new IllegalArgumentException()); + assertThrows( + IllegalArgumentException.class, + () -> new NestedRowToMessage(SERIALIZER, schema).apply(row)); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubSubReadPayloadTranslationTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubSubReadPayloadTranslationTest.java new file mode 100644 index 000000000000..4cf547dea934 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubSubReadPayloadTranslationTest.java @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.pubsub; + +import static org.junit.Assert.assertEquals; + +import java.util.Arrays; +import org.apache.beam.model.pipeline.v1.RunnerApi; +import org.apache.beam.model.pipeline.v1.RunnerApi.PubSubReadPayload; +import org.apache.beam.runners.core.construction.Environments; +import org.apache.beam.runners.core.construction.PTransformTranslation; +import org.apache.beam.runners.core.construction.SdkComponents; +import org.apache.beam.sdk.io.Read; +import org.apache.beam.sdk.io.gcp.pubsub.PubsubClient.SubscriptionPath; +import org.apache.beam.sdk.io.gcp.pubsub.PubsubClient.TopicPath; +import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.options.ValueProvider.NestedValueProvider; +import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider; +import org.apache.beam.sdk.runners.AppliedPTransform; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PValues; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.Parameterized; +import org.junit.runners.Parameterized.Parameter; +import org.junit.runners.Parameterized.Parameters; + +/** Test RunnerImplementedSourceTranslator. 
*/ +@RunWith(Parameterized.class) +public class PubSubReadPayloadTranslationTest { + private static final String TIMESTAMP_ATTRIBUTE = "timestamp"; + private static final String ID_ATTRIBUTE = "id"; + private static final String PROJECT = "project"; + private static final TopicPath TOPIC = PubsubClient.topicPathFromName(PROJECT, "testTopic"); + private static final SubscriptionPath SUBSCRIPTION = + PubsubClient.subscriptionPathFromName(PROJECT, "testSubscription"); + private final PubSubPayloadTranslation.PubSubReadPayloadTranslator sourceTranslator = + new PubSubPayloadTranslation.PubSubReadPayloadTranslator(); + + public static TestPipeline pipeline = TestPipeline.create().enableAbandonedNodeEnforcement(false); + private static final ValueProvider TOPIC_PROVIDER = pipeline.newProvider(TOPIC); + private static final ValueProvider SUBSCRIPTION_PROVIDER = + pipeline.newProvider(SUBSCRIPTION); + + @Parameters + public static Iterable data() { + return Arrays.asList( + new Object[][] { + { + // Read payload only from TOPIC. + Read.from( + new PubsubUnboundedSource.PubsubSource( + new PubsubUnboundedSource( + PubsubTestClient.createFactoryForCreateSubscription(), + StaticValueProvider.of(PubsubClient.projectPathFromId(PROJECT)), + StaticValueProvider.of(TOPIC), + null /* subscription */, + null /* timestampLabel */, + null /* idLabel */, + false /* needsAttributes */, + false /* needsMessageId*/))), + PubSubReadPayload.newBuilder() + .setTopic(TOPIC.getFullPath()) + .setWithAttributes(false) + .build() + }, + { + // Read with attributes and message id from TOPIC. + Read.from( + new PubsubUnboundedSource.PubsubSource( + new PubsubUnboundedSource( + PubsubTestClient.createFactoryForCreateSubscription(), + StaticValueProvider.of(PubsubClient.projectPathFromId(PROJECT)), + StaticValueProvider.of(TOPIC), + null /* subscription */, + TIMESTAMP_ATTRIBUTE /* timestampLabel */, + ID_ATTRIBUTE /* idLabel */, + true /* needsAttributes */, + true /* needsMessageId */))), + PubSubReadPayload.newBuilder() + .setTopic(TOPIC.getFullPath()) + .setIdAttribute(ID_ATTRIBUTE) + .setTimestampAttribute(TIMESTAMP_ATTRIBUTE) + .setWithAttributes(true) + .build() + }, + { + // Read payload from runtime provided topic. + Read.from( + new PubsubUnboundedSource.PubsubSource( + new PubsubUnboundedSource( + PubsubTestClient.createFactoryForCreateSubscription(), + StaticValueProvider.of(PubsubClient.projectPathFromId(PROJECT)), + TOPIC_PROVIDER, + null /* subscription */, + null /* timestampLabel */, + null /* idLabel */, + false /* needsAttributes */, + false /* needsMessageId */))), + PubSubReadPayload.newBuilder() + .setTopicRuntimeOverridden(((NestedValueProvider) TOPIC_PROVIDER).propertyName()) + .setWithAttributes(false) + .build() + }, + { + // Read payload with attributes and message id from runtime provided topic. + Read.from( + new PubsubUnboundedSource.PubsubSource( + new PubsubUnboundedSource( + PubsubTestClient.createFactoryForCreateSubscription(), + StaticValueProvider.of(PubsubClient.projectPathFromId(PROJECT)), + TOPIC_PROVIDER, + null /* subscription */, + TIMESTAMP_ATTRIBUTE /* timestampLabel */, + ID_ATTRIBUTE /* idLabel */, + true /* needsAttributes */, + true /* needsMessageId */))), + PubSubReadPayload.newBuilder() + .setTopicRuntimeOverridden(((NestedValueProvider) TOPIC_PROVIDER).propertyName()) + .setIdAttribute(ID_ATTRIBUTE) + .setTimestampAttribute(TIMESTAMP_ATTRIBUTE) + .setWithAttributes(true) + .build() + }, + { + // Read payload only from SUBSCRIPTION. 
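// The four subscription-based cases below mirror the topic-based cases above: static vs
// runtime-provided subscription, each with and without attributes and message id.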
+ Read.from( + new PubsubUnboundedSource.PubsubSource( + new PubsubUnboundedSource( + PubsubTestClient.createFactoryForCreateSubscription(), + StaticValueProvider.of(PubsubClient.projectPathFromId(PROJECT)), + null /* topic */, + StaticValueProvider.of(SUBSCRIPTION), + null /* timestampLabel */, + null /* idLabel */, + false /* needsAttributes */, + false /* needsMessageId */))), + PubSubReadPayload.newBuilder() + .setSubscription(SUBSCRIPTION.getFullPath()) + .setWithAttributes(false) + .build() + }, + { + // Read payload with attributes and message id from SUBSCRIPTION. + Read.from( + new PubsubUnboundedSource.PubsubSource( + new PubsubUnboundedSource( + PubsubTestClient.createFactoryForCreateSubscription(), + StaticValueProvider.of(PubsubClient.projectPathFromId(PROJECT)), + null /* topic */, + StaticValueProvider.of(SUBSCRIPTION), + TIMESTAMP_ATTRIBUTE /* timestampLabel */, + ID_ATTRIBUTE /* idLabel */, + true /* needsAttributes */, + true /* needsMessageId */))), + PubSubReadPayload.newBuilder() + .setSubscription(SUBSCRIPTION.getFullPath()) + .setIdAttribute(ID_ATTRIBUTE) + .setTimestampAttribute(TIMESTAMP_ATTRIBUTE) + .setWithAttributes(true) + .build() + }, + { + // Read payload only from runtime provided subscription. + Read.from( + new PubsubUnboundedSource.PubsubSource( + new PubsubUnboundedSource( + PubsubTestClient.createFactoryForCreateSubscription(), + StaticValueProvider.of(PubsubClient.projectPathFromId(PROJECT)), + null /* topic */, + SUBSCRIPTION_PROVIDER, + null /* timestampLabel */, + null /* idLabel */, + false /* needsAttributes */, + false /* needsMessageId */))), + PubSubReadPayload.newBuilder() + .setSubscriptionRuntimeOverridden( + ((NestedValueProvider) SUBSCRIPTION_PROVIDER).propertyName()) + .setWithAttributes(false) + .build() + }, + { + // Read payload with attributes and message id from runtime provided subscription. 
+ Read.from( + new PubsubUnboundedSource.PubsubSource( + new PubsubUnboundedSource( + PubsubTestClient.createFactoryForCreateSubscription(), + StaticValueProvider.of(PubsubClient.projectPathFromId(PROJECT)), + null /* topic */, + SUBSCRIPTION_PROVIDER, + TIMESTAMP_ATTRIBUTE /* timestampLabel */, + ID_ATTRIBUTE /* idLabel */, + true /* needsAttributes */, + true /* needsMessageId */))), + PubSubReadPayload.newBuilder() + .setSubscriptionRuntimeOverridden( + ((NestedValueProvider) SUBSCRIPTION_PROVIDER).propertyName()) + .setIdAttribute(ID_ATTRIBUTE) + .setTimestampAttribute(TIMESTAMP_ATTRIBUTE) + .setWithAttributes(true) + .build() + }, + }); + } + + @Parameter(0) + public Read.Unbounded readFromPubSub; + + @Parameter(1) + public PubSubReadPayload pubsubReadPayload; + + @Test + public void testTranslateSourceToFunctionSpec() throws Exception { + PCollection output = pipeline.apply(readFromPubSub); + AppliedPTransform> appliedPTransform = + AppliedPTransform.of( + "ReadFromPubsub", + PValues.expandInput(pipeline.begin()), + PValues.expandOutput(output), + readFromPubSub, + ResourceHints.create(), + pipeline); + SdkComponents components = SdkComponents.create(); + components.registerEnvironment(Environments.createDockerEnvironment("java")); + RunnerApi.FunctionSpec spec = + sourceTranslator.translate((AppliedPTransform) appliedPTransform, components); + assertEquals(PTransformTranslation.PUBSUB_READ, spec.getUrn()); + PubSubReadPayload result = PubSubReadPayload.parseFrom(spec.getPayload()); + assertEquals(pubsubReadPayload, result); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubSubWritePayloadTranslationTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubSubWritePayloadTranslationTest.java new file mode 100644 index 000000000000..ea722ac70c48 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubSubWritePayloadTranslationTest.java @@ -0,0 +1,124 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.pubsub; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertTrue; + +import org.apache.beam.model.pipeline.v1.RunnerApi; +import org.apache.beam.model.pipeline.v1.RunnerApi.PubSubWritePayload; +import org.apache.beam.runners.core.construction.Environments; +import org.apache.beam.runners.core.construction.PTransformTranslation; +import org.apache.beam.runners.core.construction.SdkComponents; +import org.apache.beam.sdk.io.gcp.pubsub.PubSubPayloadTranslation.PubSubWritePayloadTranslator; +import org.apache.beam.sdk.io.gcp.pubsub.PubsubClient.TopicPath; +import org.apache.beam.sdk.io.gcp.pubsub.PubsubUnboundedSink.PubsubSink; +import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.options.ValueProvider.NestedValueProvider; +import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider; +import org.apache.beam.sdk.runners.AppliedPTransform; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.resourcehints.ResourceHints; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PDone; +import org.apache.beam.sdk.values.PValues; +import org.joda.time.Duration; +import org.junit.Rule; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Test RunnerImplementedSinkTranslator. */ +@RunWith(JUnit4.class) +public class PubSubWritePayloadTranslationTest { + private static final String TIMESTAMP_ATTRIBUTE = "timestamp"; + private static final String ID_ATTRIBUTE = "id"; + private static final TopicPath TOPIC = PubsubClient.topicPathFromName("testProject", "testTopic"); + private final PubSubPayloadTranslation.PubSubWritePayloadTranslator sinkTranslator = + new PubSubWritePayloadTranslator(); + + @Rule public TestPipeline pipeline = TestPipeline.create().enableAbandonedNodeEnforcement(false); + + @Test + public void testTranslateSinkWithTopic() throws Exception { + PubsubUnboundedSink pubsubUnboundedSink = + new PubsubUnboundedSink( + null, + StaticValueProvider.of(TOPIC), + TIMESTAMP_ATTRIBUTE, + ID_ATTRIBUTE, + 0, + 0, + 0, + Duration.ZERO, + null); + PubsubUnboundedSink.PubsubSink pubsubSink = new PubsubSink(pubsubUnboundedSink); + PCollection input = pipeline.apply(Create.of(new byte[0])); + PDone output = input.apply(pubsubSink); + AppliedPTransform appliedPTransform = + AppliedPTransform.of( + "sink", + PValues.expandInput(input), + PValues.expandOutput(output), + pubsubSink, + ResourceHints.create(), + pipeline); + SdkComponents components = SdkComponents.create(); + components.registerEnvironment(Environments.createDockerEnvironment("java")); + RunnerApi.FunctionSpec spec = sinkTranslator.translate(appliedPTransform, components); + + assertEquals(PTransformTranslation.PUBSUB_WRITE, spec.getUrn()); + PubSubWritePayload payload = PubSubWritePayload.parseFrom(spec.getPayload()); + assertEquals(TOPIC.getFullPath(), payload.getTopic()); + assertTrue(payload.getTopicRuntimeOverridden().isEmpty()); + assertEquals(TIMESTAMP_ATTRIBUTE, payload.getTimestampAttribute()); + assertEquals(ID_ATTRIBUTE, payload.getIdAttribute()); + } + + @Test + public void testTranslateSinkWithTopicOverridden() throws Exception { + ValueProvider runtimeProvider = pipeline.newProvider(TOPIC); + PubsubUnboundedSink pubsubUnboundedSinkSink = + new PubsubUnboundedSink( + null, runtimeProvider, TIMESTAMP_ATTRIBUTE, ID_ATTRIBUTE, 0, 0, 0, Duration.ZERO, null); + PubsubSink 
pubsubSink = new PubsubSink(pubsubUnboundedSinkSink); + PCollection input = pipeline.apply(Create.of(new byte[0])); + PDone output = input.apply(pubsubSink); + AppliedPTransform appliedPTransform = + AppliedPTransform.of( + "sink", + PValues.expandInput(input), + PValues.expandOutput(output), + pubsubSink, + ResourceHints.create(), + pipeline); + SdkComponents components = SdkComponents.create(); + components.registerEnvironment(Environments.createDockerEnvironment("java")); + RunnerApi.FunctionSpec spec = sinkTranslator.translate(appliedPTransform, components); + + assertEquals(PTransformTranslation.PUBSUB_WRITE, spec.getUrn()); + PubSubWritePayload payload = PubSubWritePayload.parseFrom(spec.getPayload()); + assertEquals( + ((NestedValueProvider) runtimeProvider).propertyName(), + payload.getTopicRuntimeOverridden()); + assertTrue(payload.getTopic().isEmpty()); + assertEquals(TIMESTAMP_ATTRIBUTE, payload.getTimestampAttribute()); + assertEquals(ID_ATTRIBUTE, payload.getIdAttribute()); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubClientTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubClientTest.java index 4161122f9608..ab62c1e907a5 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubClientTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubClientTest.java @@ -42,7 +42,7 @@ public class PubsubClientTest { private long parse(String timestamp) { Map map = ImmutableMap.of("myAttribute", timestamp); - return PubsubClient.extractTimestamp("myAttribute", null, map); + return PubsubClient.extractTimestampAttribute("myAttribute", map); } private void roundTripRfc339(String timestamp) { @@ -53,24 +53,17 @@ private void truncatedRfc339(String timestamp, String truncatedTimestmap) { assertEquals(Instant.parse(truncatedTimestmap).getMillis(), parse(timestamp)); } - @Test - public void noTimestampAttributeReturnsPubsubPublish() { - final long time = 987654321L; - long timestamp = PubsubClient.extractTimestamp(null, String.valueOf(time), null); - assertEquals(time, timestamp); - } - @Test public void noTimestampAttributeAndInvalidPubsubPublishThrowsError() { thrown.expect(NumberFormatException.class); - PubsubClient.extractTimestamp(null, "not-a-date", null); + PubsubClient.parseTimestampAsMsSinceEpoch("not-a-date"); } @Test public void timestampAttributeWithNullAttributesThrowsError() { thrown.expect(RuntimeException.class); thrown.expectMessage("PubSub message is missing a value for timestamp attribute myAttribute"); - PubsubClient.extractTimestamp("myAttribute", null, null); + PubsubClient.extractTimestampAttribute("myAttribute", null); } @Test @@ -78,14 +71,14 @@ public void timestampAttributeSetWithMissingAttributeThrowsError() { thrown.expect(RuntimeException.class); thrown.expectMessage("PubSub message is missing a value for timestamp attribute myAttribute"); Map map = ImmutableMap.of("otherLabel", "whatever"); - PubsubClient.extractTimestamp("myAttribute", null, map); + PubsubClient.extractTimestampAttribute("myAttribute", map); } @Test public void timestampAttributeParsesMillisecondsSinceEpoch() { long time = 1446162101123L; Map map = ImmutableMap.of("myAttribute", String.valueOf(time)); - long timestamp = PubsubClient.extractTimestamp("myAttribute", null, map); + long timestamp = PubsubClient.extractTimestampAttribute("myAttribute", map); assertEquals(time, timestamp); } @@ 
-173,13 +166,13 @@ public void projectPathFromIdWellFormed() { public void subscriptionPathFromNameWellFormed() { SubscriptionPath path = PubsubClient.subscriptionPathFromName("test", "something"); assertEquals("projects/test/subscriptions/something", path.getPath()); - assertEquals("/subscriptions/test/something", path.getV1Beta1Path()); + assertEquals("/subscriptions/test/something", path.getFullPath()); } @Test public void topicPathFromNameWellFormed() { TopicPath path = PubsubClient.topicPathFromName("test", "something"); assertEquals("projects/test/topics/something", path.getPath()); - assertEquals("/topics/test/something", path.getV1Beta1Path()); + assertEquals("/topics/test/something", path.getFullPath()); } } diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubGrpcClientTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubGrpcClientTest.java index ff32671f00a5..8eeacc47e671 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubGrpcClientTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubGrpcClientTest.java @@ -48,6 +48,7 @@ import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; +import org.checkerframework.checker.nullness.qual.Nullable; import org.junit.After; import org.junit.Before; import org.junit.Test; @@ -56,9 +57,6 @@ /** Tests for PubsubGrpcClient. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PubsubGrpcClientTest { private ManagedChannel inProcessChannel; @@ -68,9 +66,9 @@ public class PubsubGrpcClientTest { private static final TopicPath TOPIC = PubsubClient.topicPathFromName("testProject", "testTopic"); private static final SubscriptionPath SUBSCRIPTION = PubsubClient.subscriptionPathFromName("testProject", "testSubscription"); - private static final long REQ_TIME = 1234L; - private static final long PUB_TIME = 3456L; - private static final long MESSAGE_TIME = 6789L; + private static final long REQ_TIME_MS = 1234L; + private static final long PUB_TIME_MS = 3456L; + private static final long MESSAGE_TIME_MS = 6789L; private static final String TIMESTAMP_ATTRIBUTE = "timestamp"; private static final String ID_ATTRIBUTE = "id"; private static final String MESSAGE_ID = "testMessageId"; @@ -86,10 +84,14 @@ public void setup() { String.format( "%s-%s", PubsubGrpcClientTest.class.getName(), ThreadLocalRandom.current().nextInt()); inProcessChannel = InProcessChannelBuilder.forName(channelName).directExecutor().build(); + } + + protected void initializeClient( + @Nullable String timestampAttribute, @Nullable String idAttribute) { Credentials testCredentials = new TestCredential(); client = new PubsubGrpcClient( - TIMESTAMP_ATTRIBUTE, ID_ATTRIBUTE, 10, inProcessChannel, testCredentials); + timestampAttribute, idAttribute, 10, inProcessChannel, testCredentials); } @After @@ -100,6 +102,62 @@ public void teardown() throws IOException { @Test public void pullOneMessage() throws IOException { + initializeClient(null, null); + String expectedSubscription = SUBSCRIPTION.getPath(); + final PullRequest expectedRequest = + PullRequest.newBuilder() + .setSubscription(expectedSubscription) + 
.setReturnImmediately(true) + .setMaxMessages(10) + .build(); + Timestamp timestamp = + Timestamp.newBuilder() + .setSeconds(PUB_TIME_MS / 1000) + .setNanos((int) (PUB_TIME_MS % 1000) * 1000 * 1000) + .build(); + PubsubMessage expectedPubsubMessage = + PubsubMessage.newBuilder() + .setMessageId(MESSAGE_ID) + .setData(ByteString.copyFrom(DATA.getBytes(StandardCharsets.UTF_8))) + .setPublishTime(timestamp) + .build(); + ReceivedMessage expectedReceivedMessage = + ReceivedMessage.newBuilder().setMessage(expectedPubsubMessage).setAckId(ACK_ID).build(); + final PullResponse response = + PullResponse.newBuilder() + .addAllReceivedMessages(ImmutableList.of(expectedReceivedMessage)) + .build(); + + final List requestsReceived = new ArrayList<>(); + SubscriberImplBase subscriberImplBase = + new SubscriberImplBase() { + @Override + public void pull(PullRequest request, StreamObserver responseObserver) { + requestsReceived.add(request); + responseObserver.onNext(response); + responseObserver.onCompleted(); + } + }; + Server server = + InProcessServerBuilder.forName(channelName).addService(subscriberImplBase).build().start(); + try { + List actualMessages = client.pull(REQ_TIME_MS, SUBSCRIPTION, 10, true); + assertEquals(1, actualMessages.size()); + IncomingMessage actualMessage = actualMessages.get(0); + assertEquals(ACK_ID, actualMessage.ackId()); + assertEquals(DATA, actualMessage.message().getData().toStringUtf8()); + assertEquals(MESSAGE_ID, actualMessage.recordId()); + assertEquals(REQ_TIME_MS, actualMessage.requestTimeMsSinceEpoch()); + assertEquals(PUB_TIME_MS, actualMessage.timestampMsSinceEpoch()); + assertEquals(expectedRequest, Iterables.getOnlyElement(requestsReceived)); + } finally { + server.shutdownNow(); + } + } + + @Test + public void pullOneMessageUsingAttributes() throws IOException { + initializeClient(TIMESTAMP_ATTRIBUTE, ID_ATTRIBUTE); String expectedSubscription = SUBSCRIPTION.getPath(); final PullRequest expectedRequest = PullRequest.newBuilder() @@ -109,8 +167,8 @@ public void pullOneMessage() throws IOException { .build(); Timestamp timestamp = Timestamp.newBuilder() - .setSeconds(PUB_TIME / 1000) - .setNanos((int) (PUB_TIME % 1000) * 1000) + .setSeconds(PUB_TIME_MS / 1000) + .setNanos((int) (PUB_TIME_MS % 1000) * 1000 * 1000) .build(); PubsubMessage expectedPubsubMessage = PubsubMessage.newBuilder() @@ -120,7 +178,7 @@ public void pullOneMessage() throws IOException { .putAllAttributes(ATTRIBUTES) .putAllAttributes( ImmutableMap.of( - TIMESTAMP_ATTRIBUTE, String.valueOf(MESSAGE_TIME), ID_ATTRIBUTE, RECORD_ID)) + TIMESTAMP_ATTRIBUTE, String.valueOf(MESSAGE_TIME_MS), ID_ATTRIBUTE, RECORD_ID)) .build(); ReceivedMessage expectedReceivedMessage = ReceivedMessage.newBuilder().setMessage(expectedPubsubMessage).setAckId(ACK_ID).build(); @@ -142,14 +200,14 @@ public void pull(PullRequest request, StreamObserver responseObser Server server = InProcessServerBuilder.forName(channelName).addService(subscriberImplBase).build().start(); try { - List acutalMessages = client.pull(REQ_TIME, SUBSCRIPTION, 10, true); - assertEquals(1, acutalMessages.size()); - IncomingMessage actualMessage = acutalMessages.get(0); + List actualMessages = client.pull(REQ_TIME_MS, SUBSCRIPTION, 10, true); + assertEquals(1, actualMessages.size()); + IncomingMessage actualMessage = actualMessages.get(0); assertEquals(ACK_ID, actualMessage.ackId()); assertEquals(DATA, actualMessage.message().getData().toStringUtf8()); assertEquals(RECORD_ID, actualMessage.recordId()); - assertEquals(REQ_TIME, 
actualMessage.requestTimeMsSinceEpoch()); - assertEquals(MESSAGE_TIME, actualMessage.timestampMsSinceEpoch()); + assertEquals(REQ_TIME_MS, actualMessage.requestTimeMsSinceEpoch()); + assertEquals(MESSAGE_TIME_MS, actualMessage.timestampMsSinceEpoch()); assertEquals(expectedRequest, Iterables.getOnlyElement(requestsReceived)); } finally { server.shutdownNow(); @@ -158,6 +216,7 @@ public void pull(PullRequest request, StreamObserver responseObser @Test public void publishOneMessage() throws IOException { + initializeClient(TIMESTAMP_ATTRIBUTE, ID_ATTRIBUTE); String expectedTopic = TOPIC.getPath(); PubsubMessage expectedPubsubMessage = PubsubMessage.newBuilder() @@ -165,7 +224,7 @@ public void publishOneMessage() throws IOException { .putAllAttributes(ATTRIBUTES) .putAllAttributes( ImmutableMap.of( - TIMESTAMP_ATTRIBUTE, String.valueOf(MESSAGE_TIME), ID_ATTRIBUTE, RECORD_ID)) + TIMESTAMP_ATTRIBUTE, String.valueOf(MESSAGE_TIME_MS), ID_ATTRIBUTE, RECORD_ID)) .build(); final PublishRequest expectedRequest = PublishRequest.newBuilder() @@ -195,7 +254,7 @@ public void publish( .setData(ByteString.copyFromUtf8(DATA)) .putAllAttributes(ATTRIBUTES) .build(), - MESSAGE_TIME, + MESSAGE_TIME_MS, RECORD_ID); int n = client.publish(TOPIC, ImmutableList.of(actualMessage)); assertEquals(1, n); diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIOExternalTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIOExternalTest.java index 26630384f50e..6c2ab8169d00 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIOExternalTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIOExternalTest.java @@ -40,11 +40,12 @@ import org.apache.beam.sdk.transforms.Impulse; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.checkerframework.checker.nullness.qual.Nullable; import org.hamcrest.Matchers; +import org.hamcrest.text.MatchesPattern; import org.junit.Test; import org.junit.runner.RunWith; import org.junit.runners.JUnit4; @@ -52,9 +53,6 @@ /** Tests for building {@link PubsubIO} externally via the ExpansionService. 
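+ *
+ * <p>Sub-transform names are asserted with regex matchers rather than an exact list, since the
+ * nesting of the expanded composites (for example the {@code PubsubSink} level now sitting under
+ * {@code PubsubUnboundedSink}) can shift names and positions. A minimal sketch of the assertion
+ * style these tests use:
+ *
+ * <pre>{@code
+ * assertThat(
+ *     transform.getSubtransformsList(),
+ *     Matchers.hasItem(MatchesPattern.matchesPattern(".*PubsubUnboundedSource.*")));
+ * }</pre>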
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PubsubIOExternalTest { @Test public void testConstructPubsubRead() throws Exception { @@ -96,8 +94,11 @@ public void testConstructPubsubRead() throws Exception { RunnerApi.PTransform transform = result.getTransform(); assertThat( transform.getSubtransformsList(), - Matchers.contains( - "test_namespacetest/PubsubUnboundedSource", "test_namespacetest/MapElements")); + Matchers.hasItem(MatchesPattern.matchesPattern(".*PubsubUnboundedSource.*"))); + assertThat( + transform.getSubtransformsList(), + Matchers.hasItem(MatchesPattern.matchesPattern(".*MapElements.*"))); + assertThat(transform.getInputsCount(), Matchers.is(0)); assertThat(transform.getOutputsCount(), Matchers.is(1)); } @@ -153,8 +154,10 @@ public void testConstructPubsubWrite() throws Exception { RunnerApi.PTransform transform = result.getTransform(); assertThat( transform.getSubtransformsList(), - Matchers.contains( - "test_namespacetest/MapElements", "test_namespacetest/PubsubUnboundedSink")); + Matchers.hasItem(MatchesPattern.matchesPattern(".*MapElements.*"))); + assertThat( + transform.getSubtransformsList(), + Matchers.hasItem(MatchesPattern.matchesPattern(".*PubsubUnboundedSink.*"))); assertThat(transform.getInputsCount(), Matchers.is(1)); assertThat(transform.getOutputsCount(), Matchers.is(0)); @@ -162,13 +165,17 @@ public void testConstructPubsubWrite() throws Exception { RunnerApi.PTransform writeComposite = result.getComponents().getTransformsOrThrow(transform.getSubtransforms(1)); - // test_namespacetest/PubsubUnboundedSink/PubsubUnboundedSink.Writer + // test_namespacetest/PubsubUnboundedSink/PubsubSink RunnerApi.PTransform writeComposite2 = - result.getComponents().getTransformsOrThrow(writeComposite.getSubtransforms(3)); + result.getComponents().getTransformsOrThrow(writeComposite.getSubtransforms(1)); + + // test_namespacetest/PubsubUnboundedSink/PubsubSink/PubsubUnboundedSink.Writer + RunnerApi.PTransform writeComposite3 = + result.getComponents().getTransformsOrThrow(writeComposite2.getSubtransforms(3)); - // test_namespacetest/PubsubUnboundedSink/PubsubUnboundedSink.Writer/ParMultiDo(Writer) + // test_namespacetest/PubsubUnboundedSink/PubsubSink/PubsubUnboundedSink.Writer/ParMultiDo(Writer) RunnerApi.PTransform writeParDo = - result.getComponents().getTransformsOrThrow(writeComposite2.getSubtransforms(0)); + result.getComponents().getTransformsOrThrow(writeComposite3.getSubtransforms(0)); RunnerApi.ParDoPayload parDoPayload = RunnerApi.ParDoPayload.parseFrom(writeParDo.getSpec().getPayload()); diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIOTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIOTest.java index d8569c9555ed..21671fef556f 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIOTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIOTest.java @@ -18,6 +18,7 @@ package org.apache.beam.sdk.io.gcp.pubsub; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasDisplayItem; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.hasItem; import static org.hamcrest.Matchers.is; @@ -26,10 +27,11 @@ import static org.junit.Assert.assertEquals; import static 
org.junit.Assert.assertNotNull; import static org.junit.Assert.assertNull; -import static org.junit.Assert.assertThat; import com.google.api.client.util.Clock; import com.google.protobuf.ByteString; +import com.google.protobuf.DynamicMessage; +import com.google.protobuf.InvalidProtocolBufferException; import java.io.IOException; import java.io.Serializable; import java.nio.charset.StandardCharsets; @@ -47,9 +49,14 @@ import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.CoderException; import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.extensions.protobuf.Proto3SchemaMessages.Primitive; +import org.apache.beam.sdk.extensions.protobuf.ProtoCoder; +import org.apache.beam.sdk.extensions.protobuf.ProtoDomain; import org.apache.beam.sdk.io.AvroGeneratedUser; import org.apache.beam.sdk.io.gcp.pubsub.PubsubClient.IncomingMessage; +import org.apache.beam.sdk.io.gcp.pubsub.PubsubClient.OutgoingMessage; import org.apache.beam.sdk.io.gcp.pubsub.PubsubClient.SubscriptionPath; +import org.apache.beam.sdk.io.gcp.pubsub.PubsubClient.TopicPath; import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO.Read; import org.apache.beam.sdk.io.gcp.pubsub.PubsubTestClient.PubsubTestClientFactory; import org.apache.beam.sdk.options.PipelineOptions; @@ -57,11 +64,14 @@ import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider; import org.apache.beam.sdk.testing.PAssert; import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.transforms.MapElements; import org.apache.beam.sdk.transforms.SimpleFunction; import org.apache.beam.sdk.transforms.display.DisplayData; import org.apache.beam.sdk.transforms.display.DisplayDataEvaluator; import org.apache.beam.sdk.util.CoderUtils; import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.TypeDescriptor; +import org.apache.beam.sdk.values.TypeDescriptors; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; @@ -80,9 +90,6 @@ /** Tests for PubsubIO Read and Write transforms. 
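+ *
+ * <p>The dead-letter test below configures a parse function that always throws; the expectation
+ * is that the original payload is republished to the configured dead-letter topic with the
+ * exception class name and message attached as attributes. A hedged, abbreviated sketch of that
+ * read configuration ({@code parseFnThatThrows} is a placeholder for the failing parse function):
+ *
+ * <pre>{@code
+ * PubsubIO.readStrings()
+ *     .fromSubscription(SUBSCRIPTION.getPath())
+ *     .withDeadLetterTopic(TOPIC.getPath())
+ *     .withCoderAndParseFn(StringUtf8Coder.of(), parseFnThatThrows);
+ * }</pre>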
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PubsubIOTest { @Rule public ExpectedException thrown = ExpectedException.none(); @@ -344,6 +351,8 @@ public boolean equals(@Nullable Object other) { private transient PipelineOptions options; private static final SubscriptionPath SUBSCRIPTION = PubsubClient.subscriptionPathFromName("test-project", "testSubscription"); + private static final TopicPath TOPIC = + PubsubClient.topicPathFromName("test-project", "testTopic"); private static final Clock CLOCK = (Clock & Serializable) () -> 673L; transient TestPipeline readPipeline; @@ -419,6 +428,142 @@ public void after() throws IOException { } } + @Test + public void testFailedParseWithDeadLetterConfigured() { + ByteString data = ByteString.copyFrom("Hello, World!".getBytes(StandardCharsets.UTF_8)); + RuntimeException exception = new RuntimeException("Some error message"); + ImmutableList expectedReads = + ImmutableList.of( + IncomingMessage.of( + com.google.pubsub.v1.PubsubMessage.newBuilder().setData(data).build(), + 1234L, + 0, + UUID.randomUUID().toString(), + UUID.randomUUID().toString())); + ImmutableList expectedWrites = + ImmutableList.of( + OutgoingMessage.of( + com.google.pubsub.v1.PubsubMessage.newBuilder() + .setData(data) + .putAttributes("exceptionClassName", exception.getClass().getName()) + .putAttributes("exceptionMessage", exception.getMessage()) + .putAttributes("pubsubMessageId", "") + .build(), + 1234L, + null)); + clientFactory = + PubsubTestClient.createFactoryForPullAndPublish( + SUBSCRIPTION, TOPIC, CLOCK, 60, expectedReads, expectedWrites, ImmutableList.of()); + + PCollection read = + readPipeline.apply( + PubsubIO.readStrings() + .fromSubscription(SUBSCRIPTION.getPath()) + .withDeadLetterTopic(TOPIC.getPath()) + .withClock(CLOCK) + .withClientFactory(clientFactory) + .withCoderAndParseFn( + StringUtf8Coder.of(), + SimpleFunction.fromSerializableFunctionWithOutputType( + message -> { + throw exception; + }, + TypeDescriptors.strings()))); + + PAssert.that(read).empty(); + readPipeline.run(); + } + + @Test + public void testProto() { + ProtoCoder coder = ProtoCoder.of(Primitive.class); + ImmutableList inputs = + ImmutableList.of( + Primitive.newBuilder().setPrimitiveInt32(42).build(), + Primitive.newBuilder().setPrimitiveBool(true).build(), + Primitive.newBuilder().setPrimitiveString("Hello, World!").build()); + setupTestClient(inputs, coder); + PCollection read = + readPipeline.apply( + PubsubIO.readProtos(Primitive.class) + .fromSubscription(SUBSCRIPTION.getPath()) + .withClock(CLOCK) + .withClientFactory(clientFactory)); + PAssert.that(read).containsInAnyOrder(inputs); + readPipeline.run(); + } + + @Test + public void testProtoDynamicMessages() { + ProtoCoder coder = ProtoCoder.of(Primitive.class); + ImmutableList inputs = + ImmutableList.of( + Primitive.newBuilder().setPrimitiveInt32(42).build(), + Primitive.newBuilder().setPrimitiveBool(true).build(), + Primitive.newBuilder().setPrimitiveString("Hello, World!").build()); + setupTestClient(inputs, coder); + + ProtoDomain domain = ProtoDomain.buildFrom(Primitive.getDescriptor()); + String name = Primitive.getDescriptor().getFullName(); + PCollection read = + readPipeline + .apply( + PubsubIO.readProtoDynamicMessages(domain, name) + .fromSubscription(SUBSCRIPTION.getPath()) + .withClock(CLOCK) + .withClientFactory(clientFactory)) + // DynamicMessage doesn't work well with PAssert, but if the content can be successfully + // converted 
back into the original Primitive, then that should be good enough to + // consider it a successful read. + .apply( + "Return To Primitive", + MapElements.into(TypeDescriptor.of(Primitive.class)) + .via( + (DynamicMessage message) -> { + try { + return Primitive.parseFrom(message.toByteArray()); + } catch (InvalidProtocolBufferException e) { + throw new RuntimeException("Could not return to Primitive", e); + } + })); + + PAssert.that(read).containsInAnyOrder(inputs); + readPipeline.run(); + } + + @Test + public void testProtoDynamicMessagesFromDescriptor() { + ProtoCoder coder = ProtoCoder.of(Primitive.class); + ImmutableList inputs = + ImmutableList.of( + Primitive.newBuilder().setPrimitiveInt32(42).build(), + Primitive.newBuilder().setPrimitiveBool(true).build(), + Primitive.newBuilder().setPrimitiveString("Hello, World!").build()); + setupTestClient(inputs, coder); + + PCollection read = + readPipeline + .apply( + PubsubIO.readProtoDynamicMessages(Primitive.getDescriptor()) + .fromSubscription(SUBSCRIPTION.getPath()) + .withClock(CLOCK) + .withClientFactory(clientFactory)) + .apply( + "Return To Primitive", + MapElements.into(TypeDescriptor.of(Primitive.class)) + .via( + (DynamicMessage message) -> { + try { + return Primitive.parseFrom(message.toByteArray()); + } catch (InvalidProtocolBufferException e) { + throw new RuntimeException("Could not return to Primitive", e); + } + })); + + PAssert.that(read).containsInAnyOrder(inputs); + readPipeline.run(); + } + @Test public void testAvroGenericRecords() { AvroCoder coder = AvroCoder.of(GenericRecord.class, SCHEMA); diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubJsonClientTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubJsonClientTest.java index e6662d2b187c..22c1cb16fe37 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubJsonClientTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubJsonClientTest.java @@ -57,9 +57,6 @@ /** Tests for PubsubJsonClient. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PubsubJsonClientTest { private Pubsub mockPubsub; private PubsubClient client; diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessagePayloadOnlyCoderTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessagePayloadOnlyCoderTest.java index b132814921de..cbdf08515b95 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessagePayloadOnlyCoderTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessagePayloadOnlyCoderTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.io.gcp.pubsub; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.nio.charset.StandardCharsets; import org.apache.beam.sdk.coders.Coder; diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessageToRowTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessageToRowTest.java index e98d4e8fb7b4..bbde5f9c653b 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessageToRowTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessageToRowTest.java @@ -21,28 +21,28 @@ import static java.util.stream.Collectors.toSet; import static org.apache.beam.sdk.io.gcp.pubsub.PubsubMessageToRow.DLQ_TAG; import static org.apache.beam.sdk.io.gcp.pubsub.PubsubMessageToRow.MAIN_TAG; -import static org.apache.beam.sdk.io.gcp.pubsub.PubsubMessageToRow.getParsePayloadFn; -import static org.apache.beam.sdk.io.gcp.pubsub.PubsubSchemaIOProvider.VARCHAR; +import static org.apache.beam.sdk.io.gcp.pubsub.PubsubSchemaIOProvider.ATTRIBUTE_ARRAY_ENTRY_SCHEMA; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables.size; import static org.junit.Assert.assertEquals; import java.io.Serializable; +import java.util.List; import java.util.Map; import java.util.Set; import java.util.function.Function; import java.util.stream.StreamSupport; -import org.apache.beam.sdk.io.gcp.pubsub.PubsubSchemaIOProvider.PayloadFormat; +import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessageToRow.SerializerProvider; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.schemas.Schema.FieldType; -import org.apache.beam.sdk.schemas.utils.AvroUtils; +import org.apache.beam.sdk.schemas.io.payloads.PayloadSerializers; import org.apache.beam.sdk.testing.PAssert; import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.transforms.Create; -import org.apache.beam.sdk.transforms.SimpleFunction; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionTuple; import org.apache.beam.sdk.values.Row; import org.apache.beam.sdk.values.TimestampedValue; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; import org.joda.time.DateTime; @@ -52,10 +52,9 @@ import org.junit.Test; /** Unit tests for {@link PubsubMessageToRow}. 
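+ *
+ * <p>Payload parsing is supplied through a pluggable {@code SerializerProvider}; these tests use
+ * the JSON serializer obtained from {@code PayloadSerializers}, mirroring the constant defined
+ * below:
+ *
+ * <pre>{@code
+ * SerializerProvider json =
+ *     schema -> PayloadSerializers.getSerializer("json", schema, ImmutableMap.of());
+ * }</pre>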
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PubsubMessageToRowTest implements Serializable { + private static final SerializerProvider JSON_SERIALIZER_PROVIDER = + schema -> PayloadSerializers.getSerializer("json", schema, ImmutableMap.of()); @Rule public transient TestPipeline pipeline = TestPipeline.create(); @@ -70,7 +69,7 @@ public void testConvertsMessages() { Schema messageSchema = Schema.builder() .addDateTimeField("event_timestamp") - .addMapField("attributes", VARCHAR, VARCHAR) + .addMapField("attributes", FieldType.STRING, FieldType.STRING) .addRowField("payload", payloadSchema) .build(); @@ -90,6 +89,7 @@ public void testConvertsMessages() { .messageSchema(messageSchema) .useDlq(false) .useFlatSchema(false) + .serializerProvider(JSON_SERIALIZER_PROVIDER) .build()); PAssert.that(rows.get(MAIN_TAG)) @@ -113,6 +113,50 @@ public void testConvertsMessages() { pipeline.run(); } + @Test + public void testConvertsMessagesBytesPayloadArrayAttributes() { + Schema messageSchema = + Schema.builder() + .addDateTimeField("event_timestamp") + .addArrayField("attributes", FieldType.row(ATTRIBUTE_ARRAY_ENTRY_SCHEMA)) + .addByteArrayField("payload") + .build(); + + PCollectionTuple rows = + pipeline + .apply( + "create", + Create.timestamped( + message(1, map("attr", "val"), "foo"), + message(2, map("bttr", "vbl"), "baz"), + message(3, map("cttr", "vcl"), "bar"), + message(4, map("dttr", "vdl"), "qaz"))) + .apply( + "convert", + PubsubMessageToRow.builder() + .messageSchema(messageSchema) + .useDlq(false) + .useFlatSchema(false) + .build()); + + PAssert.that(rows.get(MAIN_TAG)) + .containsInAnyOrder( + Row.withSchema(messageSchema) + .addValues(ts(1), mapEntries("attr", "val"), "foo".getBytes(UTF_8)) + .build(), + Row.withSchema(messageSchema) + .addValues(ts(2), mapEntries("bttr", "vbl"), "baz".getBytes(UTF_8)) + .build(), + Row.withSchema(messageSchema) + .addValues(ts(3), mapEntries("cttr", "vcl"), "bar".getBytes(UTF_8)) + .build(), + Row.withSchema(messageSchema) + .addValues(ts(4), mapEntries("dttr", "vdl"), "qaz".getBytes(UTF_8)) + .build()); + + pipeline.run(); + } + @Test public void testSendsInvalidToDLQ() { Schema payloadSchema = Schema.builder().addInt32Field("id").addStringField("name").build(); @@ -120,7 +164,7 @@ public void testSendsInvalidToDLQ() { Schema messageSchema = Schema.builder() .addDateTimeField("event_timestamp") - .addMapField("attributes", VARCHAR, VARCHAR) + .addMapField("attributes", FieldType.STRING, FieldType.STRING) .addRowField("payload", payloadSchema) .build(); @@ -139,6 +183,7 @@ public void testSendsInvalidToDLQ() { .messageSchema(messageSchema) .useDlq(true) .useFlatSchema(false) + .serializerProvider(JSON_SERIALIZER_PROVIDER) .build()); PCollection rows = outputs.get(MAIN_TAG); @@ -195,6 +240,7 @@ public void testConvertsMessagesToFlatRow() { .messageSchema(messageSchema) .useDlq(false) .useFlatSchema(true) + .serializerProvider(JSON_SERIALIZER_PROVIDER) .build()); PAssert.that(rows.get(MAIN_TAG)) @@ -242,6 +288,7 @@ public void testSendsFlatRowInvalidToDLQ() { .messageSchema(messageSchema) .useDlq(true) .useFlatSchema(true) + .serializerProvider(JSON_SERIALIZER_PROVIDER) .build()); PCollection rows = outputs.get(MAIN_TAG); @@ -294,10 +341,12 @@ public void testFlatSchemaMessageInvalidElement() { .messageSchema(messageSchema) .useDlq(false) .useFlatSchema(true) + .serializerProvider(JSON_SERIALIZER_PROVIDER) .build()); Exception exception = Assert.assertThrows(RuntimeException.class, () -> 
pipeline.run()); - Assert.assertTrue(exception.getMessage().contains("Error parsing message")); + Assert.assertTrue( + exception.toString(), exception.getMessage().contains("Non-nullable field 'id'")); } @Test @@ -307,7 +356,7 @@ public void testNestedSchemaMessageInvalidElement() { Schema messageSchema = Schema.builder() .addDateTimeField("event_timestamp") - .addMapField("attributes", VARCHAR, VARCHAR) + .addMapField("attributes", FieldType.STRING, FieldType.STRING) .addRowField("payload", payloadSchema) .build(); @@ -323,30 +372,12 @@ public void testNestedSchemaMessageInvalidElement() { .messageSchema(messageSchema) .useDlq(false) .useFlatSchema(false) + .serializerProvider(JSON_SERIALIZER_PROVIDER) .build()); Exception exception = Assert.assertThrows(RuntimeException.class, () -> pipeline.run()); - Assert.assertTrue(exception.getMessage().contains("Error parsing message")); - } - - @Test - public void testParsesAvroPayload() { - Schema payloadSchema = getParserSchema(); - Row row = row(payloadSchema, 3, "Dovahkiin", 5.5, 5L); - byte[] payload = AvroUtils.getRowToAvroBytesFunction(payloadSchema).apply(row); - SimpleFunction messageToRowFn = - getParsePayloadFn(PayloadFormat.AVRO, payloadSchema); - Row parsedRow = messageToRowFn.apply(new PubsubMessage(payload, null)); - assertEquals(row, parsedRow); - } - - private Schema getParserSchema() { - return Schema.builder() - .addInt32Field("id") - .addNullableField("name", FieldType.STRING) - .addNullableField("real", FieldType.DOUBLE) - .addNullableField("number", FieldType.INT64) - .build(); + Assert.assertTrue( + exception.toString(), exception.getMessage().contains("Non-nullable field 'id'")); } private Row row(Schema schema, Object... objects) { @@ -357,6 +388,10 @@ private Map map(String attr, String val) { return ImmutableMap.of(attr, val); } + private List mapEntries(String attr, String val) { + return ImmutableList.of(Row.withSchema(ATTRIBUTE_ARRAY_ENTRY_SCHEMA).attachValues(attr, val)); + } + private TimestampedValue message( int timestamp, Map attributes, String payload) { diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessageWithAttributesAndMessageIdCoderTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessageWithAttributesAndMessageIdCoderTest.java index 5a061ad2cd56..1967bb1e4c15 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessageWithAttributesAndMessageIdCoderTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessageWithAttributesAndMessageIdCoderTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.io.gcp.pubsub; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.nio.charset.StandardCharsets; import java.util.Map; diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessageWithAttributesCoderTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessageWithAttributesCoderTest.java index eb33fd3895a4..67267a9e2685 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessageWithAttributesCoderTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessageWithAttributesCoderTest.java @@ -17,8 +17,8 @@ */ package 
org.apache.beam.sdk.io.gcp.pubsub; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.nio.charset.StandardCharsets; import java.util.Map; diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessageWithMessageIdCoderTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessageWithMessageIdCoderTest.java index 586b53687f39..79c719a8f766 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessageWithMessageIdCoderTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessageWithMessageIdCoderTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.io.gcp.pubsub; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.nio.charset.StandardCharsets; import org.apache.beam.sdk.coders.Coder; diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubReadIT.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubReadIT.java index fa056793a9c7..66b4582016b6 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubReadIT.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubReadIT.java @@ -18,9 +18,9 @@ package org.apache.beam.sdk.io.gcp.pubsub; import java.util.Set; -import org.apache.beam.runners.direct.DirectOptions; import org.apache.beam.sdk.PipelineResult; import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.testing.TestPipelineOptions; import org.apache.beam.sdk.transforms.SerializableFunction; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Strings; @@ -44,7 +44,7 @@ public class PubsubReadIT { @Test public void testReadPublicData() throws Exception { // The pipeline will never terminate on its own - pipeline.getOptions().as(DirectOptions.class).setBlockOnRun(false); + pipeline.getOptions().as(TestPipelineOptions.class).setBlockOnRun(false); PCollection messages = pipeline.apply( @@ -71,7 +71,7 @@ public void testReadPublicData() throws Exception { @Test public void testReadPubsubMessageId() throws Exception { // The pipeline will never terminate on its own - pipeline.getOptions().as(DirectOptions.class).setBlockOnRun(false); + pipeline.getOptions().as(TestPipelineOptions.class).setBlockOnRun(false); PCollection messages = pipeline.apply( diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubTestClientTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubTestClientTest.java index 7472766996a8..b0746392d998 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubTestClientTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubTestClientTest.java @@ -39,9 +39,6 @@ /** Tests for PubsubTestClient. 
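+ *
+ * <p>The pull-then-publish test relies on {@code createFactoryForPullAndPublish}, which primes a
+ * single test client with both the messages it should hand out on pull and the messages it
+ * expects to see published. A rough outline of the factory call, using the test's own constants:
+ *
+ * <pre>{@code
+ * PubsubTestClientFactory factory =
+ *     PubsubTestClient.createFactoryForPullAndPublish(
+ *         SUBSCRIPTION, TOPIC, clock, ACK_TIMEOUT_S,
+ *         ImmutableList.of(expectedIncomingMessage),
+ *         ImmutableList.of(expectedOutgoingMessage),
+ *         ImmutableList.of());
+ * }</pre>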
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PubsubTestClientTest { private static final TopicPath TOPIC = PubsubClient.topicPathFromName("testProject", "testTopic"); private static final SubscriptionPath SUBSCRIPTION = @@ -128,4 +125,48 @@ public void publishOneMessage() throws IOException { } } } + + @Test + public void testPullThenPublish() throws IOException { + AtomicLong now = new AtomicLong(); + Clock clock = now::get; + PubsubMessage message = + PubsubMessage.newBuilder().setData(ByteString.copyFromUtf8(DATA)).build(); + IncomingMessage expectedIncomingMessage = + IncomingMessage.of(message, MESSAGE_TIME, REQ_TIME, ACK_ID, MESSAGE_ID); + OutgoingMessage expectedOutgoingMessage = OutgoingMessage.of(message, MESSAGE_TIME, MESSAGE_ID); + + try (PubsubTestClientFactory factory = + PubsubTestClient.createFactoryForPullAndPublish( + SUBSCRIPTION, + TOPIC, + clock, + ACK_TIMEOUT_S, + ImmutableList.of(expectedIncomingMessage), + ImmutableList.of(expectedOutgoingMessage), + ImmutableList.of())) { + try (PubsubTestClient client = (PubsubTestClient) factory.newClient(null, null, null)) { + // Pull + now.set(REQ_TIME); + client.advance(); + List actualIncomingMessages = + client.pull(now.get(), SUBSCRIPTION, 1, true); + now.addAndGet(ACK_TIMEOUT_S - 10); + client.advance(); + client.acknowledge(SUBSCRIPTION, ImmutableList.of(ACK_ID)); + + assertEquals(1, actualIncomingMessages.size()); + assertEquals(expectedIncomingMessage, actualIncomingMessages.get(0)); + + // Publish + IncomingMessage incomingMessage = actualIncomingMessages.get(0); + OutgoingMessage actualOutgoingMessage = + OutgoingMessage.of( + incomingMessage.message(), + incomingMessage.timestampMsSinceEpoch(), + incomingMessage.recordId()); + client.publish(TOPIC, ImmutableList.of(actualOutgoingMessage)); + } + } + } } diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSinkTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSinkTest.java index 5fb102c32646..f8cd86ee463c 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSinkTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSinkTest.java @@ -46,9 +46,6 @@ /** Test PubsubUnboundedSink. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PubsubUnboundedSinkTest implements Serializable { private static final TopicPath TOPIC = PubsubClient.topicPathFromName("testProject", "testTopic"); private static final String DATA = "testData"; diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSourceTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSourceTest.java index 6025449b49bc..a9564d3ee512 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSourceTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSourceTest.java @@ -18,6 +18,7 @@ package org.apache.beam.sdk.io.gcp.pubsub; import static junit.framework.TestCase.assertFalse; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.greaterThan; import static org.hamcrest.Matchers.hasSize; @@ -27,11 +28,11 @@ import static org.junit.Assert.assertArrayEquals; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNotNull; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import com.google.api.client.util.Clock; import com.google.protobuf.ByteString; +import com.google.pubsub.v1.PubsubMessage; import java.io.IOException; import java.nio.charset.StandardCharsets; import java.util.ArrayList; @@ -62,9 +63,6 @@ /** Test PubsubUnboundedSource. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PubsubUnboundedSourceTest { private static final SubscriptionPath SUBSCRIPTION = PubsubClient.subscriptionPathFromName("testProject", "testSubscription"); @@ -123,8 +121,12 @@ public void after() throws IOException { factory = null; } - private static String data(PubsubMessage message) { - return new String(message.getPayload(), StandardCharsets.UTF_8); + private static String data(byte[] message, boolean payloadOnly) throws Exception { + if (payloadOnly) { + return new String(message, StandardCharsets.UTF_8); + } + PubsubMessage data = PubsubMessage.parseFrom(message); + return new String(data.getData().toByteArray(), StandardCharsets.UTF_8); } @Test @@ -137,12 +139,16 @@ public void checkpointCoderIsSane() { } @Test - public void readOneMessage() throws IOException { + public void readOneMessage() throws Exception { setupOneMessage(); PubsubReader reader = primSource.createReader(p.getOptions(), null); // Read one message. assertTrue(reader.start()); - assertEquals(DATA, data(reader.getCurrent())); + assertEquals( + DATA, + data( + reader.getCurrent(), + !(primSource.outer.getNeedsAttributes() || primSource.outer.getNeedsMessageId()))); assertFalse(reader.advance()); // ACK the message. 
PubsubCheckpoint checkpoint = reader.getCheckpointMark(); @@ -151,18 +157,26 @@ public void readOneMessage() throws IOException { } @Test - public void timeoutAckAndRereadOneMessage() throws IOException { + public void timeoutAckAndRereadOneMessage() throws Exception { setupOneMessage(); PubsubReader reader = primSource.createReader(p.getOptions(), null); PubsubTestClient pubsubClient = (PubsubTestClient) reader.getPubsubClient(); assertTrue(reader.start()); - assertEquals(DATA, data(reader.getCurrent())); + assertEquals( + DATA, + data( + reader.getCurrent(), + !(primSource.outer.getNeedsAttributes() || primSource.outer.getNeedsMessageId()))); // Let the ACK deadline for the above expire. now.addAndGet(65 * 1000); pubsubClient.advance(); // We'll now receive the same message again. assertTrue(reader.advance()); - assertEquals(DATA, data(reader.getCurrent())); + assertEquals( + DATA, + data( + reader.getCurrent(), + !(primSource.outer.getNeedsAttributes() || primSource.outer.getNeedsMessageId()))); assertFalse(reader.advance()); // Now ACK the message. PubsubCheckpoint checkpoint = reader.getCheckpointMark(); @@ -171,13 +185,17 @@ public void timeoutAckAndRereadOneMessage() throws IOException { } @Test - public void extendAck() throws IOException { + public void extendAck() throws Exception { setupOneMessage(); PubsubReader reader = primSource.createReader(p.getOptions(), null); PubsubTestClient pubsubClient = (PubsubTestClient) reader.getPubsubClient(); // Pull the first message but don't take a checkpoint for it. assertTrue(reader.start()); - assertEquals(DATA, data(reader.getCurrent())); + assertEquals( + DATA, + data( + reader.getCurrent(), + !(primSource.outer.getNeedsAttributes() || primSource.outer.getNeedsMessageId()))); // Extend the ack now.addAndGet(55 * 1000); pubsubClient.advance(); @@ -193,13 +211,17 @@ public void extendAck() throws IOException { } @Test - public void timeoutAckExtensions() throws IOException { + public void timeoutAckExtensions() throws Exception { setupOneMessage(); PubsubReader reader = primSource.createReader(p.getOptions(), null); PubsubTestClient pubsubClient = (PubsubTestClient) reader.getPubsubClient(); // Pull the first message but don't take a checkpoint for it. assertTrue(reader.start()); - assertEquals(DATA, data(reader.getCurrent())); + assertEquals( + DATA, + data( + reader.getCurrent(), + !(primSource.outer.getNeedsAttributes() || primSource.outer.getNeedsMessageId()))); // Extend the ack. now.addAndGet(55 * 1000); pubsubClient.advance(); @@ -215,7 +237,11 @@ public void timeoutAckExtensions() throws IOException { pubsubClient.advance(); // Reread the same message. assertTrue(reader.advance()); - assertEquals(DATA, data(reader.getCurrent())); + assertEquals( + DATA, + data( + reader.getCurrent(), + !(primSource.outer.getNeedsAttributes() || primSource.outer.getNeedsMessageId()))); // Now ACK the message. PubsubCheckpoint checkpoint = reader.getCheckpointMark(); checkpoint.finalizeCheckpoint(); @@ -223,7 +249,7 @@ public void timeoutAckExtensions() throws IOException { } @Test - public void multipleReaders() throws IOException { + public void multipleReaders() throws Exception { List incoming = new ArrayList<>(); for (int i = 0; i < 2; i++) { String data = String.format("data_%d", i); @@ -242,7 +268,11 @@ public void multipleReaders() throws IOException { PubsubReader reader = primSource.createReader(p.getOptions(), null); // Consume two messages, only read one. 
assertTrue(reader.start()); - assertEquals("data_0", data(reader.getCurrent())); + assertEquals( + "data_0", + data( + reader.getCurrent(), + !(primSource.outer.getNeedsAttributes() || primSource.outer.getNeedsMessageId()))); // Grab checkpoint. PubsubCheckpoint checkpoint = reader.getCheckpointMark(); @@ -252,7 +282,11 @@ public void multipleReaders() throws IOException { // Read second message. assertTrue(reader.advance()); - assertEquals("data_1", data(reader.getCurrent())); + assertEquals( + "data_1", + data( + reader.getCurrent(), + !(primSource.outer.getNeedsAttributes() || primSource.outer.getNeedsMessageId()))); // Restore from checkpoint. byte[] checkpointBytes = @@ -265,7 +299,11 @@ public void multipleReaders() throws IOException { // Re-read second message. reader = primSource.createReader(p.getOptions(), checkpoint); assertTrue(reader.start()); - assertEquals("data_1", data(reader.getCurrent())); + assertEquals( + "data_1", + data( + reader.getCurrent(), + !(primSource.outer.getNeedsAttributes() || primSource.outer.getNeedsMessageId()))); // We are done. assertFalse(reader.advance()); @@ -281,7 +319,7 @@ private long messageNumToTimestamp(int messageNum) { } @Test - public void readManyMessages() throws IOException { + public void readManyMessages() throws Exception { Map dataToMessageNum = new HashMap<>(); final int m = 97; @@ -318,7 +356,10 @@ public void readManyMessages() throws IOException { // We'll checkpoint and ack within the 2min limit. now.addAndGet(30); pubsubClient.advance(); - String data = data(reader.getCurrent()); + String data = + data( + reader.getCurrent(), + !(primSource.outer.getNeedsAttributes() || primSource.outer.getNeedsMessageId())); Integer messageNum = dataToMessageNum.remove(data); // No duplicate messages. assertNotNull(messageNum); diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/AddUuidsTransformTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/AddUuidsTransformTest.java index ea233b989b15..5741a683f712 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/AddUuidsTransformTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/AddUuidsTransformTest.java @@ -36,9 +36,6 @@ import org.junit.runners.JUnit4; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public final class AddUuidsTransformTest { @Rule public final TestPipeline pipeline = TestPipeline.create(); diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/FakeSerializable.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/FakeSerializable.java index 6bedbe999f7e..c802b9a7afe1 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/FakeSerializable.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/FakeSerializable.java @@ -26,9 +26,6 @@ * static map. It is useful in the presence of in-process serialization, but not out of process * serialization. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) final class FakeSerializable { private static final AtomicInteger idCounter = new AtomicInteger(0); private static final ConcurrentHashMap map = new ConcurrentHashMap<>(); diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/OffsetByteRangeTrackerTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/OffsetByteRangeTrackerTest.java new file mode 100644 index 000000000000..5a31f4fc686d --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/OffsetByteRangeTrackerTest.java @@ -0,0 +1,166 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.pubsublite; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertNull; +import static org.junit.Assert.assertThrows; +import static org.junit.Assert.assertTrue; +import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.when; +import static org.mockito.MockitoAnnotations.initMocks; + +import com.google.api.gax.rpc.ApiException; +import com.google.api.gax.rpc.StatusCode.Code; +import com.google.cloud.pubsublite.Offset; +import com.google.cloud.pubsublite.internal.CheckedApiException; +import com.google.cloud.pubsublite.proto.ComputeMessageStatsResponse; +import org.apache.beam.sdk.io.range.OffsetRange; +import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker.Progress; +import org.apache.beam.sdk.transforms.splittabledofn.SplitResult; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Stopwatch; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Ticker; +import org.joda.time.Duration; +import org.junit.Before; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; +import org.mockito.Spy; + +@RunWith(JUnit4.class) +@SuppressWarnings("initialization.fields.uninitialized") +public class OffsetByteRangeTrackerTest { + private static final double IGNORED_FRACTION = -10000000.0; + private static final long MIN_BYTES = 1000; + private static final OffsetRange RANGE = new OffsetRange(123L, Long.MAX_VALUE); + private final TopicBacklogReader unownedBacklogReader = mock(TopicBacklogReader.class); + + @Spy Ticker ticker; + private OffsetByteRangeTracker tracker; + + @Before + public void setUp() { + initMocks(this); + when(ticker.read()).thenReturn(0L); + tracker = + new OffsetByteRangeTracker( + OffsetByteRange.of(RANGE, 0), + unownedBacklogReader, + Stopwatch.createUnstarted(ticker), + Duration.millis(500), + MIN_BYTES); + } + + @Test + public void 
progressTracked() { + assertTrue(tracker.tryClaim(OffsetByteProgress.of(Offset.of(123), 10))); + assertTrue(tracker.tryClaim(OffsetByteProgress.of(Offset.of(124), 11))); + when(unownedBacklogReader.computeMessageStats(Offset.of(125))) + .thenReturn(ComputeMessageStatsResponse.newBuilder().setMessageBytes(1000).build()); + Progress progress = tracker.getProgress(); + assertEquals(21, progress.getWorkCompleted(), .0001); + assertEquals(1000, progress.getWorkRemaining(), .0001); + } + + @Test + public void getProgressStatsFailure() { + when(unownedBacklogReader.computeMessageStats(Offset.of(123))) + .thenThrow(new CheckedApiException(Code.INTERNAL).underlying); + assertThrows(ApiException.class, tracker::getProgress); + } + + @Test + @SuppressWarnings({"dereference.of.nullable", "argument.type.incompatible"}) + public void claimSplitSuccess() { + assertTrue(tracker.tryClaim(OffsetByteProgress.of(Offset.of(1_000), MIN_BYTES))); + assertTrue(tracker.tryClaim(OffsetByteProgress.of(Offset.of(10_000), MIN_BYTES))); + SplitResult splits = tracker.trySplit(IGNORED_FRACTION); + OffsetByteRange primary = splits.getPrimary(); + assertEquals(RANGE.getFrom(), primary.getRange().getFrom()); + assertEquals(10_001, primary.getRange().getTo()); + assertEquals(MIN_BYTES * 2, primary.getByteCount()); + OffsetByteRange residual = splits.getResidual(); + assertEquals(10_001, residual.getRange().getFrom()); + assertEquals(Long.MAX_VALUE, residual.getRange().getTo()); + assertEquals(0, residual.getByteCount()); + assertEquals(splits.getPrimary(), tracker.currentRestriction()); + tracker.checkDone(); + assertNull(tracker.trySplit(IGNORED_FRACTION)); + } + + @Test + @SuppressWarnings({"dereference.of.nullable", "argument.type.incompatible"}) + public void splitWithoutClaimEmpty() { + when(ticker.read()).thenReturn(100000000000000L); + SplitResult splits = tracker.trySplit(IGNORED_FRACTION); + assertEquals(RANGE.getFrom(), splits.getPrimary().getRange().getFrom()); + assertEquals(RANGE.getFrom(), splits.getPrimary().getRange().getTo()); + assertEquals(RANGE, splits.getResidual().getRange()); + assertEquals(splits.getPrimary(), tracker.currentRestriction()); + tracker.checkDone(); + assertNull(tracker.trySplit(IGNORED_FRACTION)); + } + + @Test + public void unboundedNotDone() { + assertThrows(IllegalStateException.class, tracker::checkDone); + } + + @Test + public void cannotClaimBackwards() { + assertTrue(tracker.tryClaim(OffsetByteProgress.of(Offset.of(1_000), MIN_BYTES))); + assertThrows( + IllegalArgumentException.class, + () -> tracker.tryClaim(OffsetByteProgress.of(Offset.of(1_000), MIN_BYTES))); + assertThrows( + IllegalArgumentException.class, + () -> tracker.tryClaim(OffsetByteProgress.of(Offset.of(999), MIN_BYTES))); + } + + @Test + public void cannotClaimSplitRange() { + assertTrue(tracker.tryClaim(OffsetByteProgress.of(Offset.of(1_000), MIN_BYTES))); + assertTrue(tracker.trySplit(IGNORED_FRACTION) != null); + assertFalse(tracker.tryClaim(OffsetByteProgress.of(Offset.of(1_001), MIN_BYTES))); + } + + @Test + public void cannotSplitNotEnoughBytesOrTime() { + assertTrue(tracker.tryClaim(OffsetByteProgress.of(Offset.of(1_000), MIN_BYTES - 2))); + assertTrue(tracker.tryClaim(OffsetByteProgress.of(Offset.of(1_001), 1))); + when(ticker.read()).thenReturn(100_000_000L); + assertTrue(tracker.trySplit(IGNORED_FRACTION) == null); + } + + @Test + public void canSplitTimeOnly() { + assertTrue(tracker.tryClaim(OffsetByteProgress.of(Offset.of(1_000), MIN_BYTES - 2))); + 
assertTrue(tracker.tryClaim(OffsetByteProgress.of(Offset.of(1_001), 1))); + when(ticker.read()).thenReturn(1_000_000_000L); + assertTrue(tracker.trySplit(IGNORED_FRACTION) != null); + } + + @Test + public void canSplitBytesOnly() { + assertTrue(tracker.tryClaim(OffsetByteProgress.of(Offset.of(1_000), MIN_BYTES - 2))); + assertTrue(tracker.tryClaim(OffsetByteProgress.of(Offset.of(1_001), 2))); + when(ticker.read()).thenReturn(100_000_000L); + assertTrue(tracker.trySplit(IGNORED_FRACTION) != null); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/OffsetCheckpointMarkTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/OffsetCheckpointMarkTest.java deleted file mode 100644 index 4c55d8baedcc..000000000000 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/OffsetCheckpointMarkTest.java +++ /dev/null @@ -1,81 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ -package org.apache.beam.sdk.io.gcp.pubsublite; - -import static org.hamcrest.MatcherAssert.assertThat; -import static org.hamcrest.Matchers.equalTo; -import static org.mockito.Mockito.mock; -import static org.mockito.Mockito.verify; -import static org.mockito.Mockito.verifyZeroInteractions; - -import com.google.cloud.pubsublite.Offset; -import com.google.cloud.pubsublite.Partition; -import java.io.ByteArrayInputStream; -import java.io.ByteArrayOutputStream; -import java.util.Map; -import org.apache.beam.sdk.coders.Coder; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; -import org.junit.Before; -import org.junit.Test; -import org.junit.runner.RunWith; -import org.junit.runners.JUnit4; -import org.mockito.ArgumentCaptor; -import org.mockito.Captor; -import org.mockito.MockitoAnnotations; - -@RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) -public class OffsetCheckpointMarkTest { - @Captor private ArgumentCaptor> mapCaptor; - - @Before - public void setUp() { - MockitoAnnotations.initMocks(this); - } - - @Test - public void finalizeFinalizesWithOffsets() throws Exception { - Map map = - ImmutableMap.of(Partition.of(10), Offset.of(15), Partition.of(85), Offset.of(0)); - OffsetFinalizer finalizer = mock(OffsetFinalizer.class); - OffsetCheckpointMark mark = new OffsetCheckpointMark(finalizer, map); - mark.finalizeCheckpoint(); - verify(finalizer).finalizeOffsets(mapCaptor.capture()); - assertThat(mapCaptor.getValue(), equalTo(map)); - } - - @Test - public void coderDropsFinalizerKeepsOffsets() throws Exception { - Coder coder = OffsetCheckpointMark.getCoder(); - OffsetFinalizer finalizer = mock(OffsetFinalizer.class); - OffsetCheckpointMark mark = - new OffsetCheckpointMark( - finalizer, - ImmutableMap.of(Partition.of(10), Offset.of(15), Partition.of(85), Offset.of(0))); - - ByteArrayOutputStream output = new ByteArrayOutputStream(); - coder.encode(mark, output); - ByteArrayInputStream input = new ByteArrayInputStream(output.toByteArray()); - OffsetCheckpointMark decoded = coder.decode(input); - assertThat(mark.partitionOffsetMap, equalTo(decoded.partitionOffsetMap)); - decoded.finalizeCheckpoint(); - verifyZeroInteractions(finalizer); - } -} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/PerSubscriptionPartitionSdfTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/PerSubscriptionPartitionSdfTest.java new file mode 100644 index 000000000000..0a4e3e7458f5 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/PerSubscriptionPartitionSdfTest.java @@ -0,0 +1,220 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.pubsublite; + +import static com.google.cloud.pubsublite.internal.testing.UnitTestExamples.example; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertSame; +import static org.junit.Assert.assertThrows; +import static org.junit.Assert.assertTrue; +import static org.mockito.ArgumentMatchers.any; +import static org.mockito.ArgumentMatchers.eq; +import static org.mockito.Mockito.doReturn; +import static org.mockito.Mockito.inOrder; +import static org.mockito.Mockito.verify; +import static org.mockito.Mockito.verifyNoInteractions; +import static org.mockito.Mockito.when; +import static org.mockito.MockitoAnnotations.initMocks; + +import com.google.api.core.ApiFutures; +import com.google.api.gax.rpc.ApiException; +import com.google.api.gax.rpc.StatusCode.Code; +import com.google.cloud.pubsublite.Offset; +import com.google.cloud.pubsublite.Partition; +import com.google.cloud.pubsublite.SubscriptionPath; +import com.google.cloud.pubsublite.internal.CheckedApiException; +import com.google.cloud.pubsublite.internal.testing.FakeApiService; +import com.google.cloud.pubsublite.internal.wire.Committer; +import com.google.cloud.pubsublite.proto.SequencedMessage; +import java.io.ByteArrayOutputStream; +import java.io.ObjectOutputStream; +import java.util.Optional; +import javax.annotation.Nonnull; +import org.apache.beam.sdk.io.range.OffsetRange; +import org.apache.beam.sdk.transforms.DoFn.OutputReceiver; +import org.apache.beam.sdk.transforms.DoFn.ProcessContinuation; +import org.apache.beam.sdk.transforms.SerializableBiFunction; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker; +import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker.Progress; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.math.DoubleMath; +import org.joda.time.Duration; +import org.junit.Before; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; +import org.mockito.InOrder; +import org.mockito.Mock; +import org.mockito.Spy; + +@RunWith(JUnit4.class) +@SuppressWarnings("initialization.fields.uninitialized") +public class PerSubscriptionPartitionSdfTest { + private static final Duration MAX_SLEEP_TIME = + Duration.standardMinutes(10).plus(Duration.millis(10)); + private static final OffsetByteRange RESTRICTION = + OffsetByteRange.of(new OffsetRange(1, Long.MAX_VALUE), 0); + private static final SubscriptionPartition PARTITION = + SubscriptionPartition.of(example(SubscriptionPath.class), example(Partition.class)); + + @Mock SerializableFunction offsetReaderFactory; + + @Mock ManagedBacklogReaderFactory backlogReaderFactory; + @Mock TopicBacklogReader backlogReader; + + @Mock + SerializableBiFunction trackerFactory; + + @Mock SubscriptionPartitionProcessorFactory processorFactory; + @Mock SerializableFunction committerFactory; + + @Mock InitialOffsetReader initialOffsetReader; + @Spy TrackerWithProgress tracker; + @Mock OutputReceiver output; + @Mock SubscriptionPartitionProcessor processor; + + abstract static class FakeCommitter extends FakeApiService implements Committer {} + + @Spy FakeCommitter committer; + + PerSubscriptionPartitionSdf sdf; + + @Before + public void setUp() { + initMocks(this); + 
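+    // Editorial note (added comment, not in the original patch): the stubs below wire every
+    // mocked factory to return its corresponding test double, so the SDF constructed in this
+    // setUp() runs against fully controlled dependencies during the tests that follow.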
when(offsetReaderFactory.apply(any())).thenReturn(initialOffsetReader); + when(processorFactory.newProcessor(any(), any(), any())).thenReturn(processor); + when(trackerFactory.apply(any(), any())).thenReturn(tracker); + when(committerFactory.apply(any())).thenReturn(committer); + when(tracker.currentRestriction()).thenReturn(RESTRICTION); + when(backlogReaderFactory.newReader(any())).thenReturn(backlogReader); + sdf = + new PerSubscriptionPartitionSdf( + MAX_SLEEP_TIME, + backlogReaderFactory, + offsetReaderFactory, + trackerFactory, + processorFactory, + committerFactory); + } + + @Test + public void getInitialRestrictionReadSuccess() { + when(initialOffsetReader.read()).thenReturn(example(Offset.class)); + OffsetByteRange range = sdf.getInitialRestriction(PARTITION); + assertEquals(example(Offset.class).value(), range.getRange().getFrom()); + assertEquals(Long.MAX_VALUE, range.getRange().getTo()); + assertEquals(0, range.getByteCount()); + verify(offsetReaderFactory).apply(PARTITION); + } + + @Test + public void getInitialRestrictionReadFailure() { + when(initialOffsetReader.read()).thenThrow(new CheckedApiException(Code.INTERNAL).underlying); + assertThrows(ApiException.class, () -> sdf.getInitialRestriction(PARTITION)); + } + + @Test + public void newTrackerCallsFactory() { + assertSame(tracker, sdf.newTracker(PARTITION, RESTRICTION)); + verify(trackerFactory).apply(backlogReader, RESTRICTION); + } + + @Test + public void tearDownClosesBacklogReaderFactory() { + sdf.teardown(); + verify(backlogReaderFactory).close(); + } + + @Test + @SuppressWarnings("argument.type.incompatible") + public void process() throws Exception { + when(processor.waitForCompletion(MAX_SLEEP_TIME)).thenReturn(ProcessContinuation.resume()); + when(processorFactory.newProcessor(any(), any(), any())) + .thenAnswer( + args -> { + @Nonnull + RestrictionTracker wrapped = args.getArgument(1); + when(tracker.tryClaim(any())).thenReturn(true).thenReturn(false); + assertTrue(wrapped.tryClaim(OffsetByteProgress.of(example(Offset.class), 123))); + assertFalse(wrapped.tryClaim(OffsetByteProgress.of(Offset.of(333333), 123))); + return processor; + }); + doReturn(Optional.of(example(Offset.class))).when(processor).lastClaimed(); + when(committer.commitOffset(any())).thenReturn(ApiFutures.immediateFuture(null)); + assertEquals(ProcessContinuation.resume(), sdf.processElement(tracker, PARTITION, output)); + verify(processorFactory).newProcessor(eq(PARTITION), any(), eq(output)); + InOrder order = inOrder(processor); + order.verify(processor).start(); + order.verify(processor).waitForCompletion(MAX_SLEEP_TIME); + order.verify(processor).lastClaimed(); + order.verify(processor).close(); + InOrder order2 = inOrder(committerFactory, committer); + order2.verify(committer).startAsync(); + order2.verify(committer).awaitRunning(); + order2.verify(committer).commitOffset(Offset.of(example(Offset.class).value() + 1)); + order2.verify(committer).stopAsync(); + order2.verify(committer).awaitTerminated(); + } + + private static final class NoopManagedBacklogReaderFactory + implements ManagedBacklogReaderFactory { + @Override + public TopicBacklogReader newReader(SubscriptionPartition subscriptionPartition) { + return null; + } + + @Override + public void close() {} + } + + @Test + @SuppressWarnings("return.type.incompatible") + public void dofnIsSerializable() throws Exception { + ObjectOutputStream output = new ObjectOutputStream(new ByteArrayOutputStream()); + output.writeObject( + new PerSubscriptionPartitionSdf( + MAX_SLEEP_TIME, + new 
NoopManagedBacklogReaderFactory(), + x -> null, + (x, y) -> null, + (x, y, z) -> null, + (x) -> null)); + } + + @Test + public void getProgressUnboundedRangeDelegates() { + Progress progress = Progress.from(0, 0.2); + when(tracker.getProgress()).thenReturn(progress); + assertTrue( + DoubleMath.fuzzyEquals( + progress.getWorkRemaining(), sdf.getSize(PARTITION, RESTRICTION), .0001)); + verify(tracker).getProgress(); + } + + @Test + public void getProgressBoundedReturnsBytes() { + assertTrue( + DoubleMath.fuzzyEquals( + 123.0, + sdf.getSize(PARTITION, OffsetByteRange.of(new OffsetRange(87, 8000), 123)), + .0001)); + verifyNoInteractions(tracker); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteSinkTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteSinkTest.java index dd28deb3554e..0a9fc18a1dd0 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteSinkTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteSinkTest.java @@ -36,10 +36,10 @@ import com.google.cloud.pubsublite.CloudRegion; import com.google.cloud.pubsublite.CloudZone; import com.google.cloud.pubsublite.Message; +import com.google.cloud.pubsublite.MessageMetadata; import com.google.cloud.pubsublite.Offset; import com.google.cloud.pubsublite.Partition; import com.google.cloud.pubsublite.ProjectNumber; -import com.google.cloud.pubsublite.PublishMetadata; import com.google.cloud.pubsublite.TopicName; import com.google.cloud.pubsublite.TopicPath; import com.google.cloud.pubsublite.internal.CheckedApiException; @@ -70,14 +70,11 @@ import org.mockito.stubbing.Answer; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PubsubLiteSinkTest { @Rule public final TestPipeline pipeline = TestPipeline.create(); abstract static class PublisherFakeService extends FakeApiService - implements Publisher {} + implements Publisher {} @Spy private PublisherFakeService publisher; @@ -127,7 +124,7 @@ public void setUp() throws Exception { @Test public void singleMessagePublishes() throws Exception { when(publisher.publish(Message.builder().build())) - .thenReturn(ApiFutures.immediateFuture(PublishMetadata.of(Partition.of(1), Offset.of(2)))); + .thenReturn(ApiFutures.immediateFuture(MessageMetadata.of(Partition.of(1), Offset.of(2)))); runWith(Message.builder().build()); verify(publisher).publish(Message.builder().build()); } @@ -137,9 +134,9 @@ public void manyMessagePublishes() throws Exception { Message message1 = Message.builder().build(); Message message2 = Message.builder().setKey(ByteString.copyFromUtf8("abc")).build(); when(publisher.publish(message1)) - .thenReturn(ApiFutures.immediateFuture(PublishMetadata.of(Partition.of(1), Offset.of(2)))); + .thenReturn(ApiFutures.immediateFuture(MessageMetadata.of(Partition.of(1), Offset.of(2)))); when(publisher.publish(message2)) - .thenReturn(ApiFutures.immediateFuture(PublishMetadata.of(Partition.of(85), Offset.of(3)))); + .thenReturn(ApiFutures.immediateFuture(MessageMetadata.of(Partition.of(85), Offset.of(3)))); runWith(message1, message2); verify(publisher, times(2)).publish(publishedMessageCaptor.capture()); assertThat(publishedMessageCaptor.getAllValues(), containsInAnyOrder(message1, message2)); @@ -164,9 +161,9 @@ public void exceptionMixedWithOK() throws Exception { Message message1 = 
Message.builder().build(); Message message2 = Message.builder().setKey(ByteString.copyFromUtf8("abc")).build(); Message message3 = Message.builder().setKey(ByteString.copyFromUtf8("def")).build(); - SettableApiFuture future1 = SettableApiFuture.create(); - SettableApiFuture future2 = SettableApiFuture.create(); - SettableApiFuture future3 = SettableApiFuture.create(); + SettableApiFuture future1 = SettableApiFuture.create(); + SettableApiFuture future2 = SettableApiFuture.create(); + SettableApiFuture future3 = SettableApiFuture.create(); CountDownLatch startedLatch = new CountDownLatch(3); when(publisher.publish(message1)) .then( @@ -191,9 +188,9 @@ public void exceptionMixedWithOK() throws Exception { () -> { try { startedLatch.await(); - future1.set(PublishMetadata.of(Partition.of(1), Offset.of(2))); + future1.set(MessageMetadata.of(Partition.of(1), Offset.of(2))); future2.setException(new CheckedApiException(Code.INTERNAL).underlying); - future3.set(PublishMetadata.of(Partition.of(1), Offset.of(3))); + future3.set(MessageMetadata.of(Partition.of(1), Offset.of(3))); } catch (InterruptedException e) { fail(); throw new RuntimeException(e); @@ -213,7 +210,7 @@ public void exceptionMixedWithOK() throws Exception { @Test public void listenerExceptionOnBundleFinish() throws Exception { Message message1 = Message.builder().build(); - SettableApiFuture future = SettableApiFuture.create(); + SettableApiFuture future = SettableApiFuture.create(); SettableApiFuture publishFuture = SettableApiFuture.create(); when(publisher.publish(message1)) @@ -234,7 +231,7 @@ public void listenerExceptionOnBundleFinish() throws Exception { }); publishFuture.get(); listener.failed(null, new CheckedApiException(Code.INTERNAL).underlying); - future.set(PublishMetadata.of(Partition.of(1), Offset.of(2))); + future.set(MessageMetadata.of(Partition.of(1), Offset.of(2))); executorFuture.get(); } } diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteUnboundedReaderTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteUnboundedReaderTest.java deleted file mode 100644 index fddd0188a5e6..000000000000 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteUnboundedReaderTest.java +++ /dev/null @@ -1,330 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ -package org.apache.beam.sdk.io.gcp.pubsublite; - -import static org.hamcrest.MatcherAssert.assertThat; -import static org.hamcrest.Matchers.equalTo; -import static org.hamcrest.Matchers.sameInstance; -import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertTrue; -import static org.mockito.Mockito.mock; -import static org.mockito.Mockito.reset; -import static org.mockito.Mockito.times; -import static org.mockito.Mockito.verify; -import static org.mockito.Mockito.verifyNoMoreInteractions; -import static org.mockito.Mockito.when; - -import com.google.api.core.ApiFutures; -import com.google.cloud.pubsublite.Offset; -import com.google.cloud.pubsublite.Partition; -import com.google.cloud.pubsublite.internal.PullSubscriber; -import com.google.cloud.pubsublite.internal.testing.FakeApiService; -import com.google.cloud.pubsublite.internal.wire.Committer; -import com.google.cloud.pubsublite.proto.ComputeMessageStatsResponse; -import com.google.cloud.pubsublite.proto.Cursor; -import com.google.cloud.pubsublite.proto.SequencedMessage; -import com.google.protobuf.Duration; -import com.google.protobuf.Timestamp; -import com.google.protobuf.util.Durations; -import com.google.protobuf.util.Timestamps; -import io.grpc.Status; -import io.grpc.StatusException; -import java.util.ArrayList; -import java.util.Arrays; -import java.util.Collections; -import java.util.List; -import java.util.Random; -import org.apache.beam.sdk.io.UnboundedSource; -import org.apache.beam.sdk.io.UnboundedSource.CheckpointMark; -import org.apache.beam.sdk.io.gcp.pubsublite.PubsubLiteUnboundedReader.SubscriberState; -import org.apache.beam.sdk.transforms.windowing.BoundedWindow; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Ticker; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; -import org.joda.time.Instant; -import org.junit.Test; -import org.junit.runner.RunWith; -import org.junit.runners.JUnit4; -import org.mockito.Mock; -import org.mockito.MockitoAnnotations; -import org.mockito.Spy; - -@RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) -public class PubsubLiteUnboundedReaderTest { - @Mock private PullSubscriber subscriber5; - - @Mock private PullSubscriber subscriber8; - - abstract static class CommitterFakeService extends FakeApiService implements Committer {} - - private static class FakeTicker extends Ticker { - private Timestamp time; - - FakeTicker(Timestamp start) { - time = start; - } - - @Override - public long read() { - return Timestamps.toNanos(time); - } - - public void advance(Duration duration) { - time = Timestamps.add(time, duration); - } - } - - @Spy private CommitterFakeService committer5; - @Spy private CommitterFakeService committer8; - - @SuppressWarnings("unchecked") - private final UnboundedSource source = mock(UnboundedSource.class); - - @Mock private TopicBacklogReader backlogReader; - - private final FakeTicker ticker = new FakeTicker(Timestamps.fromSeconds(450)); - - private final PubsubLiteUnboundedReader reader; - - private static SequencedMessage exampleMessage(Offset offset, Timestamp publishTime) { - return SequencedMessage.newBuilder() - .setPublishTime(publishTime) - .setCursor(Cursor.newBuilder().setOffset(offset.value())) - .setSizeBytes(100) - .build(); - } - - private static Timestamp randomMilliAllignedTimestamp() { - return 
Timestamps.fromMillis(new Random().nextInt(Integer.MAX_VALUE)); - } - - private static Instant toInstant(Timestamp timestamp) { - return new Instant(Timestamps.toMillis(timestamp)); - } - - public PubsubLiteUnboundedReaderTest() throws StatusException { - MockitoAnnotations.initMocks(this); - SubscriberState state5 = new SubscriberState(); - state5.subscriber = subscriber5; - state5.committer = committer5; - SubscriberState state8 = new SubscriberState(); - state8.subscriber = subscriber8; - state8.committer = committer8; - reader = - new PubsubLiteUnboundedReader( - source, - ImmutableMap.of(Partition.of(5), state5, Partition.of(8), state8), - backlogReader, - ticker); - } - - @Test - public void sourceReturnsSource() { - assertThat(reader.getCurrentSource(), sameInstance(source)); - } - - @Test - public void startPullsFromAllSubscribers() throws Exception { - when(subscriber5.pull()).thenReturn(ImmutableList.of()); - when(subscriber8.pull()).thenReturn(ImmutableList.of()); - assertFalse(reader.start()); - verify(subscriber5).pull(); - verify(subscriber8).pull(); - assertThat(reader.getWatermark(), equalTo(BoundedWindow.TIMESTAMP_MIN_VALUE)); - verifyNoMoreInteractions(subscriber5, subscriber8); - } - - @Test - public void startReturnsTrueIfMessagesExist() throws Exception { - Timestamp ts = randomMilliAllignedTimestamp(); - SequencedMessage message = exampleMessage(Offset.of(10), ts); - when(subscriber5.pull()).thenReturn(ImmutableList.of(message)); - when(subscriber8.pull()).thenReturn(ImmutableList.of()); - assertTrue(reader.start()); - verify(subscriber5).pull(); - verify(subscriber8).pull(); - assertThat(reader.getCurrent(), equalTo(message)); - assertThat(reader.getWatermark(), equalTo(BoundedWindow.TIMESTAMP_MIN_VALUE)); - assertThat(reader.getCurrentTimestamp(), equalTo(toInstant(ts))); - verifyNoMoreInteractions(subscriber5, subscriber8); - } - - @Test - public void advanceSetsWatermarkAfterAllSubscribersPopulated() throws Exception { - Timestamp ts1 = randomMilliAllignedTimestamp(); - Timestamp ts2 = randomMilliAllignedTimestamp(); - SequencedMessage message1 = exampleMessage(Offset.of(10), ts1); - SequencedMessage message2 = exampleMessage(Offset.of(888), ts2); - when(subscriber5.pull()).thenReturn(ImmutableList.of(message1)); - when(subscriber8.pull()).thenReturn(ImmutableList.of(message2)); - assertTrue(reader.start()); - verify(subscriber5).pull(); - verify(subscriber8).pull(); - verifyNoMoreInteractions(subscriber5, subscriber8); - reset(subscriber5, subscriber8); - List messages = new ArrayList<>(); - messages.add(reader.getCurrent()); - assertThat(reader.getWatermark(), equalTo(BoundedWindow.TIMESTAMP_MIN_VALUE)); - // This could be either original message, but is the current message from the reader. - assertThat(reader.getCurrentTimestamp(), equalTo(toInstant(messages.get(0).getPublishTime()))); - assertTrue(reader.advance()); - messages.add(reader.getCurrent()); - assertThat( - reader.getWatermark(), - equalTo(Collections.min(Arrays.asList(toInstant(ts1), toInstant(ts2))))); - assertThat(reader.getCurrentTimestamp(), equalTo(toInstant(messages.get(1).getPublishTime()))); - // Second pull yields no more messages. 
- when(subscriber5.pull()).thenReturn(ImmutableList.of()); - when(subscriber8.pull()).thenReturn(ImmutableList.of()); - assertFalse(reader.advance()); - verify(subscriber5).pull(); - verify(subscriber8).pull(); - verifyNoMoreInteractions(subscriber5, subscriber8); - } - - @Test - public void multipleMessagesInPullReadsAllBeforeNextPull() throws Exception { - SequencedMessage message1 = exampleMessage(Offset.of(10), randomMilliAllignedTimestamp()); - SequencedMessage message2 = exampleMessage(Offset.of(888), randomMilliAllignedTimestamp()); - SequencedMessage message3 = exampleMessage(Offset.of(999), randomMilliAllignedTimestamp()); - when(subscriber5.pull()) - .thenReturn(ImmutableList.of(message1, message2, message3)) - .thenReturn(ImmutableList.of()); - when(subscriber8.pull()).thenReturn(ImmutableList.of()).thenReturn(ImmutableList.of()); - assertTrue(reader.start()); - assertTrue(reader.advance()); - assertTrue(reader.advance()); - assertFalse(reader.advance()); - verify(subscriber5, times(2)).pull(); - verify(subscriber8, times(2)).pull(); - verifyNoMoreInteractions(subscriber5, subscriber8); - } - - @Test - public void messagesOnSubsequentPullsProcessed() throws Exception { - SequencedMessage message1 = exampleMessage(Offset.of(10), randomMilliAllignedTimestamp()); - SequencedMessage message2 = exampleMessage(Offset.of(888), randomMilliAllignedTimestamp()); - SequencedMessage message3 = exampleMessage(Offset.of(999), randomMilliAllignedTimestamp()); - when(subscriber5.pull()) - .thenReturn(ImmutableList.of(message1)) - .thenReturn(ImmutableList.of(message2)) - .thenReturn(ImmutableList.of()); - when(subscriber8.pull()) - .thenReturn(ImmutableList.of()) - .thenReturn(ImmutableList.of(message3)) - .thenReturn(ImmutableList.of()); - assertTrue(reader.start()); - assertTrue(reader.advance()); - assertTrue(reader.advance()); - assertFalse(reader.advance()); - verify(subscriber5, times(3)).pull(); - verify(subscriber8, times(3)).pull(); - verifyNoMoreInteractions(subscriber5, subscriber8); - } - - @Test - public void checkpointMarkFinalizeCommits() throws Exception { - Timestamp ts = randomMilliAllignedTimestamp(); - SequencedMessage message = exampleMessage(Offset.of(10), ts); - when(subscriber5.pull()).thenReturn(ImmutableList.of(message)); - when(subscriber8.pull()).thenReturn(ImmutableList.of()); - assertTrue(reader.start()); - verify(subscriber5).pull(); - verify(subscriber8).pull(); - verifyNoMoreInteractions(subscriber5, subscriber8); - - CheckpointMark mark = reader.getCheckpointMark(); - - when(committer5.commitOffset(Offset.of(10))).thenReturn(ApiFutures.immediateFuture(null)); - mark.finalizeCheckpoint(); - verify(committer5).commitOffset(Offset.of(10)); - } - - @Test - public void splitBacklogBytes_returnsUnknownBacklogOnError() throws Exception { - when(backlogReader.computeMessageStats(ImmutableMap.of())) - .thenReturn(ApiFutures.immediateFailedFuture(new StatusException(Status.UNAVAILABLE))); - assertThat(PubsubLiteUnboundedReader.BACKLOG_UNKNOWN, equalTo(reader.getSplitBacklogBytes())); - } - - @Test - public void splitBacklogBytes_computesBacklog() throws Exception { - ComputeMessageStatsResponse response = - ComputeMessageStatsResponse.newBuilder().setMessageBytes(40).build(); - when(backlogReader.computeMessageStats(ImmutableMap.of())) - .thenReturn(ApiFutures.immediateFuture(response)); - assertThat(response.getMessageBytes(), equalTo(reader.getSplitBacklogBytes())); - } - - @SuppressWarnings("unchecked") - @Test - public void 
splitBacklogBytes_computesBacklogOncePerTenSeconds() throws Exception { - ComputeMessageStatsResponse response1 = - ComputeMessageStatsResponse.newBuilder().setMessageBytes(40).build(); - ComputeMessageStatsResponse response2 = - ComputeMessageStatsResponse.newBuilder().setMessageBytes(50).build(); - - when(backlogReader.computeMessageStats(ImmutableMap.of())) - .thenReturn(ApiFutures.immediateFuture(response1), ApiFutures.immediateFuture(response2)); - - assertThat(response1.getMessageBytes(), equalTo(reader.getSplitBacklogBytes())); - ticker.advance(Durations.fromSeconds(10)); - assertThat(response1.getMessageBytes(), equalTo(reader.getSplitBacklogBytes())); - ticker.advance(Durations.fromSeconds(1)); - assertThat(response2.getMessageBytes(), equalTo(reader.getSplitBacklogBytes())); - } - - @SuppressWarnings("unchecked") - @Test - public void splitBacklogBytes_oldValueExpiresAfterOneMinute() throws Exception { - ComputeMessageStatsResponse response = - ComputeMessageStatsResponse.newBuilder().setMessageBytes(40).build(); - - when(backlogReader.computeMessageStats(ImmutableMap.of())) - .thenReturn( - ApiFutures.immediateFuture(response), - ApiFutures.immediateFailedFuture(new StatusException(Status.UNAVAILABLE))); - - assertThat(response.getMessageBytes(), equalTo(reader.getSplitBacklogBytes())); - ticker.advance(Durations.fromSeconds(30)); - assertThat(response.getMessageBytes(), equalTo(reader.getSplitBacklogBytes())); - ticker.advance(Durations.fromSeconds(31)); - assertThat(PubsubLiteUnboundedReader.BACKLOG_UNKNOWN, equalTo(reader.getSplitBacklogBytes())); - } - - @Test - public void splitBacklogBytes_usesCorrectCursorValues() throws Exception { - SequencedMessage message1 = exampleMessage(Offset.of(10), randomMilliAllignedTimestamp()); - SequencedMessage message2 = exampleMessage(Offset.of(888), randomMilliAllignedTimestamp()); - ComputeMessageStatsResponse response = - ComputeMessageStatsResponse.newBuilder().setMessageBytes(40).build(); - - when(subscriber5.pull()).thenReturn(ImmutableList.of(message1)); - when(subscriber8.pull()).thenReturn(ImmutableList.of(message2)); - when(backlogReader.computeMessageStats( - ImmutableMap.of(Partition.of(5), Offset.of(10), Partition.of(8), Offset.of(888)))) - .thenReturn(ApiFutures.immediateFuture(response)); - - assertTrue(reader.start()); - assertTrue(reader.advance()); - assertThat(response.getMessageBytes(), equalTo(reader.getSplitBacklogBytes())); - } -} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/ReadWriteIT.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/ReadWriteIT.java new file mode 100644 index 000000000000..e2429423dd0c --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/ReadWriteIT.java @@ -0,0 +1,280 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.pubsublite; + +import static org.apache.beam.sdk.util.Preconditions.checkArgumentNotNull; +import static org.junit.Assert.fail; + +import com.google.cloud.pubsublite.AdminClient; +import com.google.cloud.pubsublite.AdminClientSettings; +import com.google.cloud.pubsublite.BacklogLocation; +import com.google.cloud.pubsublite.CloudZone; +import com.google.cloud.pubsublite.Message; +import com.google.cloud.pubsublite.ProjectId; +import com.google.cloud.pubsublite.SubscriptionName; +import com.google.cloud.pubsublite.SubscriptionPath; +import com.google.cloud.pubsublite.TopicName; +import com.google.cloud.pubsublite.TopicPath; +import com.google.cloud.pubsublite.proto.PubSubMessage; +import com.google.cloud.pubsublite.proto.SequencedMessage; +import com.google.cloud.pubsublite.proto.Subscription; +import com.google.cloud.pubsublite.proto.Subscription.DeliveryConfig.DeliveryRequirement; +import com.google.cloud.pubsublite.proto.Topic; +import com.google.cloud.pubsublite.proto.Topic.PartitionConfig.Capacity; +import com.google.errorprone.annotations.concurrent.GuardedBy; +import com.google.protobuf.ByteString; +import java.util.ArrayDeque; +import java.util.ArrayList; +import java.util.Deque; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ThreadLocalRandom; +import java.util.stream.Collectors; +import java.util.stream.IntStream; +import org.apache.beam.sdk.Pipeline; +import org.apache.beam.sdk.extensions.gcp.options.GcpOptions; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.options.StreamingOptions; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.testing.TestPipelineOptions; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.FlatMapElements; +import org.apache.beam.sdk.transforms.MapElements; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.SimpleFunction; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.joda.time.Duration; +import org.junit.After; +import org.junit.Rule; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +@RunWith(JUnit4.class) +public class ReadWriteIT { + private static final Logger LOG = LoggerFactory.getLogger(ReadWriteIT.class); + private static final CloudZone ZONE = CloudZone.parse("us-central1-b"); + private static final int MESSAGE_COUNT = 90; + + @Rule public transient TestPipeline pipeline = TestPipeline.create(); + + private static ProjectId getProject(PipelineOptions options) { + return ProjectId.of(checkArgumentNotNull(options.as(GcpOptions.class).getProject())); + } + + private static String randomName() { + return "beam_it_resource_" + ThreadLocalRandom.current().nextLong(); + } + + private static AdminClient newAdminClient() { + return 
AdminClient.create(AdminClientSettings.newBuilder().setRegion(ZONE.region()).build()); + } + + private final Deque cleanupActions = new ArrayDeque<>(); + + private TopicPath createTopic(ProjectId id) throws Exception { + TopicPath toReturn = + TopicPath.newBuilder() + .setProject(id) + .setLocation(ZONE) + .setName(TopicName.of(randomName())) + .build(); + Topic.Builder topic = Topic.newBuilder().setName(toReturn.toString()); + topic + .getPartitionConfigBuilder() + .setCount(2) + .setCapacity(Capacity.newBuilder().setPublishMibPerSec(4).setSubscribeMibPerSec(4)); + topic.getRetentionConfigBuilder().setPerPartitionBytes(30 * (1L << 30)); + cleanupActions.addLast( + () -> { + try (AdminClient client = newAdminClient()) { + client.deleteTopic(toReturn).get(); + } catch (Throwable t) { + LOG.error("Failed to clean up topic.", t); + } + }); + try (AdminClient client = newAdminClient()) { + client.createTopic(topic.build()).get(); + } + return toReturn; + } + + private SubscriptionPath createSubscription(TopicPath topic) throws Exception { + SubscriptionPath toReturn = + SubscriptionPath.newBuilder() + .setProject(topic.project()) + .setLocation(ZONE) + .setName(SubscriptionName.of(randomName())) + .build(); + Subscription.Builder subscription = Subscription.newBuilder().setName(toReturn.toString()); + subscription + .getDeliveryConfigBuilder() + .setDeliveryRequirement(DeliveryRequirement.DELIVER_IMMEDIATELY); + subscription.setTopic(topic.toString()); + cleanupActions.addLast( + () -> { + try (AdminClient client = newAdminClient()) { + client.deleteSubscription(toReturn).get(); + } catch (Throwable t) { + LOG.error("Failed to clean up subscription.", t); + } + }); + try (AdminClient client = newAdminClient()) { + client.createSubscription(subscription.build(), BacklogLocation.BEGINNING).get(); + } + return toReturn; + } + + @After + public void tearDown() { + while (!cleanupActions.isEmpty()) { + cleanupActions.removeLast().run(); + } + } + + // Workaround for BEAM-12867 + // TODO(BEAM-12867): Remove this. + private static class CustomCreate extends PTransform, PCollection> { + @Override + public PCollection expand(PCollection input) { + return input.apply( + "createIndexes", + FlatMapElements.via( + new SimpleFunction>() { + @Override + public Iterable apply(Void input) { + return IntStream.range(0, MESSAGE_COUNT).boxed().collect(Collectors.toList()); + } + })); + } + } + + public static void writeMessages(TopicPath topicPath, Pipeline pipeline) { + PCollection trigger = pipeline.apply(Create.of((Void) null)); + PCollection indexes = trigger.apply("createIndexes", new CustomCreate()); + PCollection messages = + indexes.apply( + "createMessages", + MapElements.via( + new SimpleFunction( + index -> + Message.builder() + .setData(ByteString.copyFromUtf8(index.toString())) + .build() + .toProto()) {})); + // Add UUIDs to messages for later deduplication. + messages = messages.apply("addUuids", PubsubLiteIO.addUuids()); + messages.apply( + "writeMessages", + PubsubLiteIO.write(PublisherOptions.newBuilder().setTopicPath(topicPath).build())); + } + + public static PCollection readMessages( + SubscriptionPath subscriptionPath, Pipeline pipeline) { + PCollection messages = + pipeline.apply( + "readMessages", + PubsubLiteIO.read( + SubscriberOptions.newBuilder() + .setSubscriptionPath(subscriptionPath) + // setMinBundleTimeout INTENDED FOR TESTING ONLY + // This sacrifices efficiency to make tests run faster. Do not use this in a + // real pipeline! 
+ .setMinBundleTimeout(Duration.standardSeconds(5)) + .build())); + // Deduplicate messages based on the uuids added in PubsubLiteIO.addUuids() when writing. + return messages.apply( + "dedupeMessages", PubsubLiteIO.deduplicate(UuidDeduplicationOptions.newBuilder().build())); + } + + // This static out of band communication is needed to retain serializability. + @GuardedBy("ReadWriteIT.class") + private static final List received = new ArrayList<>(); + + private static synchronized void addMessageReceived(SequencedMessage message) { + received.add(message); + } + + private static synchronized List getTestQuickstartReceived() { + return ImmutableList.copyOf(received); + } + + private static PTransform, PCollection> + collectTestQuickstart() { + return MapElements.via( + new SimpleFunction() { + @Override + public Void apply(SequencedMessage input) { + addMessageReceived(input); + return null; + } + }); + } + + @Test + public void testReadWrite() throws Exception { + pipeline.getOptions().as(StreamingOptions.class).setStreaming(true); + pipeline.getOptions().as(TestPipelineOptions.class).setBlockOnRun(false); + + TopicPath topic = createTopic(getProject(pipeline.getOptions())); + SubscriptionPath subscription = createSubscription(topic); + + // Publish some messages + writeMessages(topic, pipeline); + + // Read some messages. They should be deduplicated by the time we see them, so there should be + // exactly numMessages, one for every index in [0,MESSAGE_COUNT). + PCollection messages = readMessages(subscription, pipeline); + messages.apply("messageReceiver", collectTestQuickstart()); + pipeline.run(); + LOG.info("Running!"); + for (int round = 0; round < 120; ++round) { + Thread.sleep(1000); + Map receivedCounts = new HashMap<>(); + for (SequencedMessage message : getTestQuickstartReceived()) { + int id = Integer.parseInt(message.getMessage().getData().toStringUtf8()); + receivedCounts.put(id, receivedCounts.getOrDefault(id, 0) + 1); + } + LOG.info("Performing comparison round {}.\n", round); + boolean done = true; + List missing = new ArrayList<>(); + for (int id = 0; id < MESSAGE_COUNT; id++) { + int idCount = receivedCounts.getOrDefault(id, 0); + if (idCount == 0) { + missing.add(id); + done = false; + } + if (idCount > 1) { + fail(String.format("Failed to deduplicate message with id %s.", id)); + } + } + LOG.info("Still messing messages: {}.\n", missing); + if (done) { + return; + } + } + fail( + String.format( + "Failed to receive all messages after 2 minutes. Received %s messages.", + getTestQuickstartReceived().size())); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscriptionPartitionLoaderTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscriptionPartitionLoaderTest.java new file mode 100644 index 000000000000..83f13da124f5 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscriptionPartitionLoaderTest.java @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.pubsublite; + +import static com.google.cloud.pubsublite.internal.testing.UnitTestExamples.example; +import static org.mockito.Mockito.when; +import static org.mockito.MockitoAnnotations.initMocks; + +import com.google.cloud.pubsublite.Partition; +import com.google.cloud.pubsublite.SubscriptionPath; +import com.google.cloud.pubsublite.TopicPath; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.values.PCollection; +import org.joda.time.Duration; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; +import org.mockito.Mock; + +@SuppressWarnings("initialization.fields.uninitialized") +@RunWith(JUnit4.class) +public class SubscriptionPartitionLoaderTest { + @Rule public final transient TestPipeline pipeline = TestPipeline.create(); + @Mock SerializableFunction getPartitionCount; + private SubscriptionPartitionLoader loader; + + @Before + public void setUp() { + initMocks(this); + FakeSerializable.Handle> handle = + FakeSerializable.put(getPartitionCount); + loader = + new SubscriptionPartitionLoader( + example(TopicPath.class), + example(SubscriptionPath.class), + topic -> handle.get().apply(topic), + Duration.millis(50), + true); + } + + @Test + public void singleResult() { + when(getPartitionCount.apply(example(TopicPath.class))).thenReturn(3); + PCollection output = pipeline.apply(loader); + PAssert.that(output) + .containsInAnyOrder( + SubscriptionPartition.of(example(SubscriptionPath.class), Partition.of(0)), + SubscriptionPartition.of(example(SubscriptionPath.class), Partition.of(1)), + SubscriptionPartition.of(example(SubscriptionPath.class), Partition.of(2))); + pipeline.run().waitUntilFinish(); + } + + @Test + public void addedResults() { + when(getPartitionCount.apply(example(TopicPath.class))).thenReturn(3).thenReturn(4); + PCollection output = pipeline.apply(loader); + PAssert.that(output) + .containsInAnyOrder( + SubscriptionPartition.of(example(SubscriptionPath.class), Partition.of(0)), + SubscriptionPartition.of(example(SubscriptionPath.class), Partition.of(1)), + SubscriptionPartition.of(example(SubscriptionPath.class), Partition.of(2)), + SubscriptionPartition.of(example(SubscriptionPath.class), Partition.of(3))); + pipeline.run().waitUntilFinish(); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscriptionPartitionProcessorImplTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscriptionPartitionProcessorImplTest.java new file mode 100644 index 000000000000..3d74375897a0 --- /dev/null +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/SubscriptionPartitionProcessorImplTest.java @@ -0,0 +1,206 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.pubsublite; + +import static com.google.cloud.pubsublite.internal.testing.UnitTestExamples.example; +import static org.apache.beam.sdk.io.gcp.pubsublite.SubscriberOptions.DEFAULT_FLOW_CONTROL; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertNotNull; +import static org.junit.Assert.assertThrows; +import static org.mockito.ArgumentMatchers.any; +import static org.mockito.Mockito.doThrow; +import static org.mockito.Mockito.inOrder; +import static org.mockito.Mockito.times; +import static org.mockito.Mockito.verify; +import static org.mockito.Mockito.when; +import static org.mockito.MockitoAnnotations.initMocks; + +import com.google.api.gax.rpc.ApiException; +import com.google.api.gax.rpc.StatusCode.Code; +import com.google.cloud.pubsublite.Offset; +import com.google.cloud.pubsublite.internal.CheckedApiException; +import com.google.cloud.pubsublite.internal.testing.FakeApiService; +import com.google.cloud.pubsublite.internal.wire.Subscriber; +import com.google.cloud.pubsublite.proto.Cursor; +import com.google.cloud.pubsublite.proto.FlowControlRequest; +import com.google.cloud.pubsublite.proto.SequencedMessage; +import com.google.protobuf.util.Timestamps; +import java.util.List; +import java.util.function.Consumer; +import java.util.function.Function; +import org.apache.beam.sdk.io.range.OffsetRange; +import org.apache.beam.sdk.transforms.DoFn.OutputReceiver; +import org.apache.beam.sdk.transforms.DoFn.ProcessContinuation; +import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.joda.time.Duration; +import org.joda.time.Instant; +import org.junit.Before; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; +import org.mockito.InOrder; +import org.mockito.Mock; +import org.mockito.Spy; + +@RunWith(JUnit4.class) +@SuppressWarnings("initialization.fields.uninitialized") +public class SubscriptionPartitionProcessorImplTest { + @Spy RestrictionTracker tracker; + @Mock OutputReceiver receiver; + @Mock Function>, Subscriber> subscriberFactory; + + abstract static class FakeSubscriber extends FakeApiService implements Subscriber {} + + @Spy FakeSubscriber subscriber; + + Consumer> leakedConsumer; + SubscriptionPartitionProcessor processor; + + private static SequencedMessage messageWithOffset(long offset) { + return SequencedMessage.newBuilder() + .setCursor(Cursor.newBuilder().setOffset(offset)) + .setPublishTime(Timestamps.fromMillis(10000 + offset)) + .setSizeBytes(1024) + .build(); + } + + private OffsetByteRange initialRange() { + return OffsetByteRange.of(new OffsetRange(example(Offset.class).value(), Long.MAX_VALUE)); + } + + @Before + 
public void setUp() { + initMocks(this); + when(subscriberFactory.apply(any())) + .then( + args -> { + leakedConsumer = args.getArgument(0); + return subscriber; + }); + processor = + new SubscriptionPartitionProcessorImpl( + tracker, receiver, subscriberFactory, DEFAULT_FLOW_CONTROL); + assertNotNull(leakedConsumer); + } + + @Test + public void lifecycle() throws Exception { + when(tracker.currentRestriction()).thenReturn(initialRange()); + processor.start(); + verify(subscriber).startAsync(); + verify(subscriber).awaitRunning(); + verify(subscriber) + .allowFlow( + FlowControlRequest.newBuilder() + .setAllowedBytes(DEFAULT_FLOW_CONTROL.bytesOutstanding()) + .setAllowedMessages(DEFAULT_FLOW_CONTROL.messagesOutstanding()) + .build()); + processor.close(); + verify(subscriber).stopAsync(); + verify(subscriber).awaitTerminated(); + } + + @Test + public void lifecycleFlowControlThrows() throws Exception { + when(tracker.currentRestriction()).thenReturn(initialRange()); + doThrow(new CheckedApiException(Code.OUT_OF_RANGE)).when(subscriber).allowFlow(any()); + assertThrows(CheckedApiException.class, () -> processor.start()); + } + + @Test + public void lifecycleSubscriberAwaitThrows() throws Exception { + when(tracker.currentRestriction()).thenReturn(initialRange()); + processor.start(); + doThrow(new CheckedApiException(Code.INTERNAL).underlying).when(subscriber).awaitTerminated(); + assertThrows(ApiException.class, () -> processor.close()); + verify(subscriber).stopAsync(); + verify(subscriber).awaitTerminated(); + } + + @Test + public void subscriberFailureFails() throws Exception { + when(tracker.currentRestriction()).thenReturn(initialRange()); + processor.start(); + subscriber.fail(new CheckedApiException(Code.OUT_OF_RANGE)); + ApiException e = + assertThrows( + // Longer wait is needed due to listener asynchrony. + ApiException.class, () -> processor.waitForCompletion(Duration.standardSeconds(1))); + assertEquals(Code.OUT_OF_RANGE, e.getStatusCode().getCode()); + } + + @Test + public void allowFlowFailureFails() throws Exception { + when(tracker.currentRestriction()).thenReturn(initialRange()); + processor.start(); + when(tracker.tryClaim(any())).thenReturn(true); + doThrow(new CheckedApiException(Code.OUT_OF_RANGE)).when(subscriber).allowFlow(any()); + leakedConsumer.accept(ImmutableList.of(messageWithOffset(1))); + ApiException e = + assertThrows(ApiException.class, () -> processor.waitForCompletion(Duration.ZERO)); + assertEquals(Code.OUT_OF_RANGE, e.getStatusCode().getCode()); + } + + @Test + public void timeoutReturnsResume() { + assertEquals(ProcessContinuation.resume(), processor.waitForCompletion(Duration.millis(10))); + assertFalse(processor.lastClaimed().isPresent()); + } + + @Test + public void failedClaimCausesStop() { + when(tracker.tryClaim(any())).thenReturn(false); + leakedConsumer.accept(ImmutableList.of(messageWithOffset(1))); + verify(tracker, times(1)).tryClaim(any()); + assertEquals(ProcessContinuation.stop(), processor.waitForCompletion(Duration.millis(10))); + assertFalse(processor.lastClaimed().isPresent()); + // Future calls to process don't try to claim. 
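+ // A subsequent message batch must not trigger another tryClaim call once a claim has failed.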
+ leakedConsumer.accept(ImmutableList.of(messageWithOffset(2))); + verify(tracker, times(1)).tryClaim(any()); + } + + @Test + public void successfulClaimThenTimeout() throws Exception { + when(tracker.tryClaim(any())).thenReturn(true); + SequencedMessage message1 = messageWithOffset(1); + SequencedMessage message3 = messageWithOffset(3); + leakedConsumer.accept(ImmutableList.of(message1, message3)); + InOrder order = inOrder(tracker, receiver, subscriber); + order + .verify(tracker) + .tryClaim( + OffsetByteProgress.of(Offset.of(3), message1.getSizeBytes() + message3.getSizeBytes())); + order + .verify(receiver) + .outputWithTimestamp(message1, new Instant(Timestamps.toMillis(message1.getPublishTime()))); + order + .verify(receiver) + .outputWithTimestamp(message3, new Instant(Timestamps.toMillis(message3.getPublishTime()))); + order + .verify(subscriber) + .allowFlow( + FlowControlRequest.newBuilder() + .setAllowedMessages(2) + .setAllowedBytes(message1.getSizeBytes() + message3.getSizeBytes()) + .build()); + assertEquals(ProcessContinuation.resume(), processor.waitForCompletion(Duration.millis(10))); + assertEquals(processor.lastClaimed().get(), Offset.of(3)); + } +} diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/TopicBacklogReaderImplTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/TopicBacklogReaderImplTest.java index 51c2379f745e..65b54fc353c0 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/TopicBacklogReaderImplTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsublite/TopicBacklogReaderImplTest.java @@ -17,26 +17,22 @@ */ package org.apache.beam.sdk.io.gcp.pubsublite; +import static com.google.cloud.pubsublite.internal.testing.UnitTestExamples.example; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertThrows; import static org.mockito.Mockito.when; +import static org.mockito.MockitoAnnotations.initMocks; -import com.google.api.core.ApiFuture; import com.google.api.core.ApiFutures; +import com.google.api.gax.rpc.ApiException; import com.google.api.gax.rpc.StatusCode.Code; -import com.google.cloud.pubsublite.CloudZone; import com.google.cloud.pubsublite.Offset; import com.google.cloud.pubsublite.Partition; -import com.google.cloud.pubsublite.ProjectNumber; -import com.google.cloud.pubsublite.TopicName; import com.google.cloud.pubsublite.TopicPath; import com.google.cloud.pubsublite.internal.CheckedApiException; -import com.google.cloud.pubsublite.internal.ExtractStatus; import com.google.cloud.pubsublite.internal.TopicStatsClient; import com.google.cloud.pubsublite.proto.ComputeMessageStatsResponse; import com.google.protobuf.Timestamp; -import java.util.concurrent.ExecutionException; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.junit.Before; import org.junit.Rule; import org.junit.Test; @@ -54,125 +50,50 @@ public final class TopicBacklogReaderImplTest { @Mock TopicStatsClient mockClient; - private TopicPath topicPath; private TopicBacklogReader reader; @Before public void setUp() { - this.topicPath = - TopicPath.newBuilder() - .setProject(ProjectNumber.of(4)) - .setName(TopicName.of("test")) - .setLocation(CloudZone.parse("us-central1-b")) - .build(); - this.reader = new TopicBacklogReaderImpl(mockClient, topicPath); + initMocks(this); + this.reader = + new TopicBacklogReaderImpl(mockClient, 
example(TopicPath.class), example(Partition.class)); } @SuppressWarnings("incompatible") @Test - public void computeMessageStats_partialFailure() { - ComputeMessageStatsResponse partition1 = ComputeMessageStatsResponse.getDefaultInstance(); - - when(mockClient.computeMessageStats( - topicPath, Partition.of(1), Offset.of(10), Offset.of(Integer.MAX_VALUE))) - .thenReturn(ApiFutures.immediateFuture(partition1)); + public void computeMessageStats_failure() { when(mockClient.computeMessageStats( - topicPath, Partition.of(2), Offset.of(20), Offset.of(Integer.MAX_VALUE))) - .thenReturn(ApiFutures.immediateFailedFuture(new CheckedApiException(Code.UNAVAILABLE))); - - ApiFuture future = - reader.computeMessageStats( - ImmutableMap.of(Partition.of(1), Offset.of(10), Partition.of(2), Offset.of(20))); - - ExecutionException ex = assertThrows(ExecutionException.class, future::get); - assertEquals(Code.UNAVAILABLE, ExtractStatus.extract(ex.getCause()).get().code()); + example(TopicPath.class), + example(Partition.class), + example(Offset.class), + Offset.of(Integer.MAX_VALUE))) + .thenReturn( + ApiFutures.immediateFailedFuture(new CheckedApiException(Code.UNAVAILABLE).underlying)); + + ApiException e = + assertThrows(ApiException.class, () -> reader.computeMessageStats(example(Offset.class))); + assertEquals(Code.UNAVAILABLE, e.getStatusCode().getCode()); } @Test - public void computeMessageStats_aggregatesEmptyMessages() throws Exception { - ComputeMessageStatsResponse partition1 = ComputeMessageStatsResponse.getDefaultInstance(); - ComputeMessageStatsResponse partition2 = ComputeMessageStatsResponse.getDefaultInstance(); - ComputeMessageStatsResponse aggregate = ComputeMessageStatsResponse.getDefaultInstance(); - - when(mockClient.computeMessageStats( - topicPath, Partition.of(1), Offset.of(10), Offset.of(Integer.MAX_VALUE))) - .thenReturn(ApiFutures.immediateFuture(partition1)); - when(mockClient.computeMessageStats( - topicPath, Partition.of(2), Offset.of(20), Offset.of(Integer.MAX_VALUE))) - .thenReturn(ApiFutures.immediateFuture(partition2)); - - ApiFuture future = - reader.computeMessageStats( - ImmutableMap.of(Partition.of(1), Offset.of(10), Partition.of(2), Offset.of(20))); - - assertEquals(future.get(), aggregate); - } - - @Test - public void computeMessageStats_timestampsAggregatedWhenPresent() throws Exception { + public void computeMessageStats_validResponseCached() { Timestamp minEventTime = Timestamp.newBuilder().setSeconds(1000).setNanos(10).build(); Timestamp minPublishTime = Timestamp.newBuilder().setSeconds(1001).setNanos(11).build(); - ComputeMessageStatsResponse partition1 = - ComputeMessageStatsResponse.newBuilder().setMinimumPublishTime(minPublishTime).build(); - ComputeMessageStatsResponse partition2 = - ComputeMessageStatsResponse.newBuilder().setMinimumEventTime(minEventTime).build(); - ComputeMessageStatsResponse aggregate = - ComputeMessageStatsResponse.newBuilder() - .setMinimumEventTime(minEventTime) - .setMinimumPublishTime(minPublishTime) - .build(); - - when(mockClient.computeMessageStats( - topicPath, Partition.of(1), Offset.of(10), Offset.of(Integer.MAX_VALUE))) - .thenReturn(ApiFutures.immediateFuture(partition1)); - when(mockClient.computeMessageStats( - topicPath, Partition.of(2), Offset.of(20), Offset.of(Integer.MAX_VALUE))) - .thenReturn(ApiFutures.immediateFuture(partition2)); - - ApiFuture future = - reader.computeMessageStats( - ImmutableMap.of(Partition.of(1), Offset.of(10), Partition.of(2), Offset.of(20))); - - assertEquals(future.get(), aggregate); - } - - 
@Test - public void computeMessageStats_resultsAggregated() throws Exception { - Timestamp minEventTime = Timestamp.newBuilder().setSeconds(1000).setNanos(10).build(); - Timestamp minPublishTime = Timestamp.newBuilder().setSeconds(1001).setNanos(11).build(); - ComputeMessageStatsResponse partition1 = + ComputeMessageStatsResponse response = ComputeMessageStatsResponse.newBuilder() .setMessageCount(10) .setMessageBytes(100) .setMinimumEventTime(minEventTime.toBuilder().setSeconds(1002).build()) .setMinimumPublishTime(minPublishTime) .build(); - ComputeMessageStatsResponse partition2 = - ComputeMessageStatsResponse.newBuilder() - .setMessageCount(20) - .setMessageBytes(200) - .setMinimumEventTime(minEventTime) - .setMinimumPublishTime(minPublishTime.toBuilder().setNanos(12).build()) - .build(); - ComputeMessageStatsResponse aggregate = - ComputeMessageStatsResponse.newBuilder() - .setMessageCount(30) - .setMessageBytes(300) - .setMinimumEventTime(minEventTime) - .setMinimumPublishTime(minPublishTime) - .build(); when(mockClient.computeMessageStats( - topicPath, Partition.of(1), Offset.of(10), Offset.of(Integer.MAX_VALUE))) - .thenReturn(ApiFutures.immediateFuture(partition1)); - when(mockClient.computeMessageStats( - topicPath, Partition.of(2), Offset.of(20), Offset.of(Integer.MAX_VALUE))) - .thenReturn(ApiFutures.immediateFuture(partition2)); - - ApiFuture future = - reader.computeMessageStats( - ImmutableMap.of(Partition.of(1), Offset.of(10), Partition.of(2), Offset.of(20))); + example(TopicPath.class), + example(Partition.class), + example(Offset.class), + Offset.of(Integer.MAX_VALUE))) + .thenReturn(ApiFutures.immediateFuture(response)); - assertEquals(future.get(), aggregate); + assertEquals(reader.computeMessageStats(example(Offset.class)), response); } } diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/FakeServiceFactory.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/FakeServiceFactory.java index cd300ee67328..8c417bdc34ec 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/FakeServiceFactory.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/FakeServiceFactory.java @@ -37,9 +37,6 @@ * A serialization friendly type service factory that maintains a mock {@link Spanner} and {@link * DatabaseClient}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) class FakeServiceFactory implements ServiceFactory, Serializable { // Marked as static so they could be returned by serviceFactory, which is serializable. diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/MutationKeyEncoderTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/MutationKeyEncoderTest.java index d301145892da..40b7ab8c2f90 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/MutationKeyEncoderTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/MutationKeyEncoderTest.java @@ -38,9 +38,6 @@ /** Tests for {@link MutationKeyEncoder}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class MutationKeyEncoderTest { @Rule public ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/MutationSizeEstimatorTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/MutationSizeEstimatorTest.java index 8e16f736eabb..40ffbdef7a0d 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/MutationSizeEstimatorTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/MutationSizeEstimatorTest.java @@ -17,23 +17,22 @@ */ package org.apache.beam.sdk.io.gcp.spanner; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.is; -import static org.junit.Assert.assertThat; import com.google.cloud.ByteArray; import com.google.cloud.Date; import com.google.cloud.Timestamp; import com.google.cloud.spanner.Mutation; +import java.math.BigDecimal; import java.util.Arrays; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.junit.Test; import org.junit.runner.RunWith; import org.junit.runners.JUnit4; /** A set of unit tests for {@link MutationSizeEstimator}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class MutationSizeEstimatorTest { @Test @@ -41,10 +40,16 @@ public void primitives() throws Exception { Mutation int64 = Mutation.newInsertOrUpdateBuilder("test").set("one").to(1).build(); Mutation float64 = Mutation.newInsertOrUpdateBuilder("test").set("one").to(2.9).build(); Mutation bool = Mutation.newInsertOrUpdateBuilder("test").set("one").to(false).build(); + Mutation numeric = + Mutation.newInsertOrUpdateBuilder("test") + .set("one") + .to(new BigDecimal("12345678901234567890.123456789")) + .build(); assertThat(MutationSizeEstimator.sizeOf(int64), is(8L)); assertThat(MutationSizeEstimator.sizeOf(float64), is(8L)); assertThat(MutationSizeEstimator.sizeOf(bool), is(1L)); + assertThat(MutationSizeEstimator.sizeOf(numeric), is(30L)); } @Test @@ -64,10 +69,21 @@ public void primitiveArrays() throws Exception { .set("one") .toBoolArray(new boolean[] {true, true, false, true}) .build(); + Mutation numeric = + Mutation.newInsertOrUpdateBuilder("test") + .set("one") + .toNumericArray( + ImmutableList.of( + new BigDecimal("12345678901234567890.123456789"), + new BigDecimal("12345678901234567890123.1234567890123"), + new BigDecimal("123456789012345678901234.1234567890123456"), + new BigDecimal("1234567890123456789012345.1234567890123456789"))) + .build(); assertThat(MutationSizeEstimator.sizeOf(int64), is(24L)); assertThat(MutationSizeEstimator.sizeOf(float64), is(16L)); assertThat(MutationSizeEstimator.sizeOf(bool), is(4L)); + assertThat(MutationSizeEstimator.sizeOf(numeric), is(153L)); } @Test @@ -81,10 +97,16 @@ public void nullPrimitiveArrays() throws Exception { .build(); Mutation bool = Mutation.newInsertOrUpdateBuilder("test").set("one").toBoolArray((boolean[]) null).build(); + Mutation numeric = + Mutation.newInsertOrUpdateBuilder("test") + .set("one") + .toNumericArray((Iterable) null) + .build(); assertThat(MutationSizeEstimator.sizeOf(int64), is(0L)); assertThat(MutationSizeEstimator.sizeOf(float64), is(0L)); assertThat(MutationSizeEstimator.sizeOf(bool), is(0L)); + 
assertThat(MutationSizeEstimator.sizeOf(numeric), is(0L)); } @Test diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/OrderedCodeTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/OrderedCodeTest.java index b4cab9aca6c6..522ff786cbad 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/OrderedCodeTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/OrderedCodeTest.java @@ -36,9 +36,6 @@ /** A set of unit tests to verify {@link OrderedCode}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class OrderedCodeTest { /** Data for a generic coding test case with known encoded outputs. */ abstract static class CodingTestCase { diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/ReadSpannerSchemaTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/ReadSpannerSchemaTest.java index 94a5cbcd7ed8..306039a29a9d 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/ReadSpannerSchemaTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/ReadSpannerSchemaTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.sdk.io.gcp.spanner; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.mockito.Matchers.argThat; import static org.mockito.Mockito.mock; import static org.mockito.Mockito.when; @@ -42,9 +42,6 @@ /** A test of {@link ReadSpannerSchemaTest}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ReadSpannerSchemaTest { @Rule public final transient ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerAccessorTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerAccessorTest.java index 8a43b65ffde8..8ce5d681b84d 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerAccessorTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerAccessorTest.java @@ -29,9 +29,6 @@ import org.junit.runners.JUnit4; @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SpannerAccessorTest { private FakeServiceFactory serviceFactory; diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOReadTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOReadTest.java index 9a96d2064160..5977c2ebd979 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOReadTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOReadTest.java @@ -54,9 +54,6 @@ /** Unit tests for {@link SpannerIO}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SpannerIOReadTest implements Serializable { @Rule diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOWriteTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOWriteTest.java index 798642493884..58ce51444087 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOWriteTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOWriteTest.java @@ -18,11 +18,11 @@ package org.apache.beam.sdk.io.gcp.spanner; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasDisplayItem; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.hasSize; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.mockito.ArgumentMatchers.anyLong; import static org.mockito.ArgumentMatchers.eq; @@ -93,9 +93,6 @@ * pipeline. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SpannerIOWriteTest implements Serializable { private static final long CELLS_PER_KEY = 7; diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java index 0507c79d6ade..cc2078133c04 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java @@ -54,9 +54,6 @@ /** End-to-end test of Cloud Spanner Source. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SpannerReadIT { private static final int MAX_DB_NAME_LENGTH = 30; diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerWriteIT.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerWriteIT.java index 1d6d86292817..21f24894ae15 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerWriteIT.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerWriteIT.java @@ -17,9 +17,9 @@ */ package org.apache.beam.sdk.io.gcp.spanner; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.is; -import static org.junit.Assert.assertThat; import com.google.api.gax.longrunning.OperationFuture; import com.google.cloud.spanner.Database; @@ -59,9 +59,6 @@ /** End-to-end test of Cloud Spanner Sink. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SpannerWriteIT { private static final int MAX_DB_NAME_LENGTH = 30; diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/StructUtilsTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/StructUtilsTest.java index 6b4a465b3634..577a470cdcdf 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/StructUtilsTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/StructUtilsTest.java @@ -19,9 +19,9 @@ import static java.nio.charset.StandardCharsets.UTF_8; import static org.apache.beam.sdk.io.gcp.spanner.StructUtils.beamTypeToSpannerType; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsString; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertThrows; import static org.junit.Assert.fail; diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/storage/GcsKmsKeyIT.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/storage/GcsKmsKeyIT.java index 678837387ff3..66fe8bf4ae9f 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/storage/GcsKmsKeyIT.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/storage/GcsKmsKeyIT.java @@ -54,9 +54,6 @@ /** Integration test for GCS CMEK support. */ @RunWith(JUnit4.class) @Category(UsesKms.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class GcsKmsKeyIT { private static final String INPUT_FILE = "gs://dataflow-samples/shakespeare/kinglear.txt"; diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/testing/BigqueryClientTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/testing/BigqueryClientTest.java index e6a8f615432f..da8f944f71cb 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/testing/BigqueryClientTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/testing/BigqueryClientTest.java @@ -42,9 +42,6 @@ /** Tests for {@link BigqueryClient}. */ @RunWith(PowerMockRunner.class) @PrepareForTest(BigqueryClient.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigqueryClientTest { private final String projectId = "test-project"; private final String query = "test-query"; diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/testing/BigqueryMatcherTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/testing/BigqueryMatcherTest.java index f6dab9e41a0f..508dfecc58a4 100644 --- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/testing/BigqueryMatcherTest.java +++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/testing/BigqueryMatcherTest.java @@ -41,9 +41,6 @@ /** Tests for {@link BigqueryMatcher}. 
*/ @RunWith(PowerMockRunner.class) @PrepareForTest(BigqueryClient.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigqueryMatcherTest { private final String appName = "test-app"; private final String projectId = "test-project"; diff --git a/sdks/java/io/hadoop-common/build.gradle b/sdks/java/io/hadoop-common/build.gradle index 534512e49d10..552073742edb 100644 --- a/sdks/java/io/hadoop-common/build.gradle +++ b/sdks/java/io/hadoop-common/build.gradle @@ -38,8 +38,6 @@ dependencies { provided library.java.hadoop_client provided library.java.hadoop_common provided library.java.hadoop_mapreduce_client_core - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library testCompile library.java.junit hadoopVersions.each {kv -> "hadoopVersion$kv.key" "org.apache.hadoop:hadoop-client:$kv.value" diff --git a/sdks/java/io/hadoop-common/src/main/java/org/apache/beam/sdk/io/hadoop/SerializableConfiguration.java b/sdks/java/io/hadoop-common/src/main/java/org/apache/beam/sdk/io/hadoop/SerializableConfiguration.java index 7650df2625cf..d222c08f3339 100644 --- a/sdks/java/io/hadoop-common/src/main/java/org/apache/beam/sdk/io/hadoop/SerializableConfiguration.java +++ b/sdks/java/io/hadoop-common/src/main/java/org/apache/beam/sdk/io/hadoop/SerializableConfiguration.java @@ -86,6 +86,17 @@ public static Job newJob(@Nullable SerializableConfiguration conf) throws IOExce } } + /** Returns a new configuration instance using provided flags. */ + public static SerializableConfiguration fromMap(Map entries) { + Configuration hadoopConfiguration = new Configuration(); + + for (Map.Entry entry : entries.entrySet()) { + hadoopConfiguration.set(entry.getKey(), entry.getValue()); + } + + return new SerializableConfiguration(hadoopConfiguration); + } + /** Returns new populated {@link Configuration} object. */ public static Configuration newConfiguration(@Nullable SerializableConfiguration conf) { if (conf == null) { diff --git a/sdks/java/io/hadoop-common/src/test/java/org/apache/beam/sdk/io/hadoop/SerializableConfigurationTest.java b/sdks/java/io/hadoop-common/src/test/java/org/apache/beam/sdk/io/hadoop/SerializableConfigurationTest.java index b694b46483ba..135b5ec349fa 100644 --- a/sdks/java/io/hadoop-common/src/test/java/org/apache/beam/sdk/io/hadoop/SerializableConfigurationTest.java +++ b/sdks/java/io/hadoop-common/src/test/java/org/apache/beam/sdk/io/hadoop/SerializableConfigurationTest.java @@ -21,6 +21,7 @@ import static org.junit.Assert.assertNotNull; import org.apache.beam.repackaged.core.org.apache.commons.lang3.SerializationUtils; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.mapreduce.Job; import org.junit.Rule; @@ -31,9 +32,6 @@ /** Tests for SerializableConfiguration. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SerializableConfigurationTest { @Rule public final ExpectedException thrown = ExpectedException.none(); private static final SerializableConfiguration DEFAULT_SERIALIZABLE_CONF = @@ -73,4 +71,12 @@ public void testCreateNewJob() throws Exception { Job job = SerializableConfiguration.newJob(DEFAULT_SERIALIZABLE_CONF); assertNotNull(job); } + + @Test + public void testFromMap() { + SerializableConfiguration testConf = + SerializableConfiguration.fromMap(ImmutableMap.of("hadoop.silly.test", "test-value")); + + assertEquals("test-value", testConf.get().get("hadoop.silly.test", "default-value")); + } } diff --git a/sdks/java/io/hadoop-common/src/test/java/org/apache/beam/sdk/io/hadoop/WritableCoderTest.java b/sdks/java/io/hadoop-common/src/test/java/org/apache/beam/sdk/io/hadoop/WritableCoderTest.java index 5b21553addee..738337e076ec 100644 --- a/sdks/java/io/hadoop-common/src/test/java/org/apache/beam/sdk/io/hadoop/WritableCoderTest.java +++ b/sdks/java/io/hadoop-common/src/test/java/org/apache/beam/sdk/io/hadoop/WritableCoderTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.io.hadoop; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.instanceOf; -import static org.junit.Assert.assertThat; import org.apache.beam.sdk.coders.CoderRegistry; import org.apache.beam.sdk.testing.CoderProperties; diff --git a/sdks/java/io/hadoop-file-system/build.gradle b/sdks/java/io/hadoop-file-system/build.gradle index 39ba9ba452e0..940dcb715c48 100644 --- a/sdks/java/io/hadoop-file-system/build.gradle +++ b/sdks/java/io/hadoop-file-system/build.gradle @@ -20,7 +20,6 @@ import java.util.stream.Collectors plugins { id 'org.apache.beam.module' } applyJavaNature( - automaticModuleName: 'org.apache.beam.sdk.io.hdfs') description = "Apache Beam :: SDKs :: Java :: IO :: Hadoop File System" @@ -45,8 +44,6 @@ dependencies { provided library.java.hadoop_common provided library.java.hadoop_mapreduce_client_core testCompile project(path: ":sdks:java:core", configuration: "shadowTest") - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library testCompile library.java.mockito_core testCompile library.java.junit testCompile library.java.hadoop_minicluster @@ -76,6 +73,13 @@ hadoopVersions.each {kv -> } } +// Hadoop dependencies require old version of Guava (BEAM-11626) +configurations.all { + resolutionStrategy { + force 'com.google.guava:guava:25.1-jre' + } +} + task hadoopVersionsTest(group: "Verification") { description = "Runs Hadoop file system tests with different Hadoop versions" def taskNames = hadoopVersions.keySet().stream() diff --git a/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystem.java b/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystem.java index a5897594e3a6..a813e318d3a2 100644 --- a/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystem.java +++ b/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystem.java @@ -37,6 +37,7 @@ import org.apache.beam.sdk.io.fs.MatchResult; import org.apache.beam.sdk.io.fs.MatchResult.Metadata; import org.apache.beam.sdk.io.fs.MatchResult.Status; +import org.apache.beam.sdk.io.fs.MoveOptions; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.hadoop.conf.Configuration; @@ -244,8 +245,13 @@ protected void copy(List srcResourceIds, List srcResourceIds, List destResourceIds) + List srcResourceIds, + List destResourceIds, + MoveOptions... moveOptions) throws IOException { + if (moveOptions.length > 0) { + throw new UnsupportedOperationException("Support for move options is not yet implemented."); + } for (int i = 0; i < srcResourceIds.size(); ++i) { final Path srcPath = srcResourceIds.get(i).toPath(); diff --git a/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptions.java b/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptions.java index 43a6e7e7d2ba..d9acf0b417b9 100644 --- a/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptions.java +++ b/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptions.java @@ -18,6 +18,7 @@ package org.apache.beam.sdk.io.hdfs; import java.io.File; +import java.nio.file.Paths; import java.util.List; import java.util.Map; import java.util.Set; @@ -112,9 +113,13 @@ private List readConfigurationFromHadoopYarnConfigDirs() { } } - // Load the configuration from paths found (if exists) + // Set used to dedup same config paths + Set confPaths = Sets.newHashSet(); + // Load the configuration from paths found (if exists and not loaded yet) for (String confDir : explodedConfDirs) { - if (new File(confDir).exists()) { + java.nio.file.Path path = Paths.get(confDir).normalize(); + if (new File(confDir).exists() && !confPaths.contains(path)) { + confPaths.add(path); Configuration conf = new Configuration(false); boolean confLoaded = false; for (String confName : Lists.newArrayList("core-site.xml", "hdfs-site.xml")) { diff --git a/sdks/java/io/hadoop-file-system/src/test/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemModuleTest.java b/sdks/java/io/hadoop-file-system/src/test/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemModuleTest.java index d1fc7c4c26d4..2f09e987fc32 100644 --- a/sdks/java/io/hadoop-file-system/src/test/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemModuleTest.java +++ b/sdks/java/io/hadoop-file-system/src/test/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemModuleTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.io.hdfs; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.hasItem; -import static org.junit.Assert.assertThat; import com.fasterxml.jackson.databind.Module; import com.fasterxml.jackson.databind.ObjectMapper; diff --git a/sdks/java/io/hadoop-file-system/src/test/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptionsRegistrarTest.java b/sdks/java/io/hadoop-file-system/src/test/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptionsRegistrarTest.java index 844a58a91b03..ecfc037bb14d 100644 --- a/sdks/java/io/hadoop-file-system/src/test/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptionsRegistrarTest.java +++ b/sdks/java/io/hadoop-file-system/src/test/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptionsRegistrarTest.java @@ -17,7 +17,7 @@ */ package org.apache.beam.sdk.io.hdfs; -import static org.junit.Assert.assertThat; +import static org.hamcrest.MatcherAssert.assertThat; import static org.junit.Assert.fail; import java.util.ServiceLoader; diff --git a/sdks/java/io/hadoop-file-system/src/test/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptionsTest.java 
b/sdks/java/io/hadoop-file-system/src/test/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptionsTest.java index f2f289fa452e..5e75b0293ebb 100644 --- a/sdks/java/io/hadoop-file-system/src/test/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptionsTest.java +++ b/sdks/java/io/hadoop-file-system/src/test/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptionsTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.sdk.io.hdfs; +import static org.hamcrest.MatcherAssert.assertThat; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNull; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.spy; import static org.mockito.Mockito.when; @@ -182,6 +182,26 @@ public void testDefaultSetYarnConfDirAndHadoopConfDirAndSameConfiguration() thro assertThat(configurationList.get(0).get("propertyB"), Matchers.equalTo("B")); } + @Test + public void testDefaultSetYarnConfDirAndHadoopConfDirAndSameDir() throws IOException { + Files.write( + createPropertyData("A"), tmpFolder.newFile("core-site.xml"), StandardCharsets.UTF_8); + Files.write( + createPropertyData("B"), tmpFolder.newFile("hdfs-site.xml"), StandardCharsets.UTF_8); + HadoopFileSystemOptions.ConfigurationLocator configurationLocator = + spy(new HadoopFileSystemOptions.ConfigurationLocator()); + Map environment = Maps.newHashMap(); + environment.put("HADOOP_CONF_DIR", tmpFolder.getRoot().getAbsolutePath()); + environment.put("YARN_CONF_DIR", tmpFolder.getRoot().getAbsolutePath() + "/"); + when(configurationLocator.getEnvironment()).thenReturn(environment); + + List configurationList = + configurationLocator.create(PipelineOptionsFactory.create()); + assertEquals(1, configurationList.size()); + assertThat(configurationList.get(0).get("propertyA"), Matchers.equalTo("A")); + assertThat(configurationList.get(0).get("propertyB"), Matchers.equalTo("B")); + } + @Test public void testDefaultSetYarnConfDirAndHadoopConfDirMultiPathAndSameConfiguration() throws IOException { diff --git a/sdks/java/io/hadoop-file-system/src/test/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemRegistrarTest.java b/sdks/java/io/hadoop-file-system/src/test/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemRegistrarTest.java index adfe745aa6ad..f8aaeb6ddc5b 100644 --- a/sdks/java/io/hadoop-file-system/src/test/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemRegistrarTest.java +++ b/sdks/java/io/hadoop-file-system/src/test/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemRegistrarTest.java @@ -40,9 +40,6 @@ /** Tests for {@link HadoopFileSystemRegistrar}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class HadoopFileSystemRegistrarTest { @Rule public TemporaryFolder tmpFolder = new TemporaryFolder(); diff --git a/sdks/java/io/hadoop-file-system/src/test/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemTest.java b/sdks/java/io/hadoop-file-system/src/test/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemTest.java index a9cdc7fcc25e..e39e1519c9fd 100644 --- a/sdks/java/io/hadoop-file-system/src/test/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemTest.java +++ b/sdks/java/io/hadoop-file-system/src/test/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemTest.java @@ -17,13 +17,13 @@ */ package org.apache.beam.sdk.io.hdfs; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.contains; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.hasSize; import static org.junit.Assert.assertArrayEquals; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import java.io.FileNotFoundException; import java.io.InputStream; @@ -66,9 +66,6 @@ /** Tests for {@link HadoopFileSystem}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class HadoopFileSystemTest { @Rule public TestPipeline p = TestPipeline.create(); diff --git a/sdks/java/io/hadoop-file-system/src/test/java/org/apache/beam/sdk/io/hdfs/HadoopResourceIdTest.java b/sdks/java/io/hadoop-file-system/src/test/java/org/apache/beam/sdk/io/hdfs/HadoopResourceIdTest.java index d9dcf9368eb2..1726a3e3aa59 100644 --- a/sdks/java/io/hadoop-file-system/src/test/java/org/apache/beam/sdk/io/hdfs/HadoopResourceIdTest.java +++ b/sdks/java/io/hadoop-file-system/src/test/java/org/apache/beam/sdk/io/hdfs/HadoopResourceIdTest.java @@ -35,9 +35,6 @@ import org.junit.rules.TemporaryFolder; /** Tests for {@link HadoopResourceId}. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class HadoopResourceIdTest { private MiniDFSCluster hdfsCluster; diff --git a/sdks/java/io/hadoop-format/build.gradle b/sdks/java/io/hadoop-format/build.gradle index 5ab4a55013a3..d9f2c985e989 100644 --- a/sdks/java/io/hadoop-format/build.gradle +++ b/sdks/java/io/hadoop-format/build.gradle @@ -38,8 +38,7 @@ def hadoopVersions = [ hadoopVersions.each {kv -> configurations.create("hadoopVersion$kv.key")} -def log4j_version = "2.6.2" -def elastic_search_version = "7.9.2" +def elastic_search_version = "7.12.0" configurations.create("sparkRunner") configurations.sparkRunner { @@ -60,6 +59,7 @@ dependencies { compile library.java.vendored_guava_26_0_jre compile library.java.slf4j_api compile project(":sdks:java:io:hadoop-common") + compile library.java.joda_time provided library.java.hadoop_common provided library.java.hadoop_hdfs provided library.java.hadoop_mapreduce_client_core @@ -68,6 +68,8 @@ dependencies { testCompile project(path: ":sdks:java:testing:test-utils", configuration: "testRuntime") testCompile project(":sdks:java:io:jdbc") testCompile project(path: ":examples:java", configuration: "testRuntime") + testCompile project(path: ":examples:java:twitter", configuration: "testRuntime") + testCompile "org.elasticsearch.plugin:transport-netty4-client:$elastic_search_version" testCompile library.java.testcontainers_elasticsearch testCompile "org.elasticsearch.client:elasticsearch-rest-high-level-client:$elastic_search_version" testCompile "org.elasticsearch:elasticsearch:$elastic_search_version" @@ -90,7 +92,6 @@ dependencies { testCompile library.java.cassandra_driver_mapping testCompile "org.apache.cassandra:cassandra-all:3.11.8" testCompile library.java.postgres - testCompile "org.apache.logging.log4j:log4j-core:$log4j_version" testCompile library.java.junit testCompile library.java.hamcrest_core testCompile library.java.hamcrest_library @@ -103,7 +104,8 @@ dependencies { delegate.add("sparkRunner", project(path: ":sdks:java:io:hadoop-format", configuration: "testRuntime")) sparkRunner project(path: ":examples:java", configuration: "testRuntime") - sparkRunner project(":runners:spark") + sparkRunner project(path: ":examples:java:twitter", configuration: "testRuntime") + sparkRunner project(":runners:spark:2") sparkRunner project(":sdks:java:io:hadoop-file-system") sparkRunner library.java.spark_streaming sparkRunner library.java.spark_core @@ -128,6 +130,13 @@ hadoopVersions.each {kv -> } } +// Hadoop dependencies require old version of Guava (BEAM-11626) +configurations.all { + resolutionStrategy { + force 'com.google.guava:guava:25.1-jre' + } +} + // The cassandra.yaml file currently assumes "target/..." exists. // TODO: Update cassandra.yaml to inject new properties representing // the root path. Also migrate cassandra.yaml to use any open ports diff --git a/sdks/java/io/hadoop-format/src/main/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIO.java b/sdks/java/io/hadoop-format/src/main/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIO.java index 215674fc9d75..ad8aeb83a2c5 100644 --- a/sdks/java/io/hadoop-format/src/main/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIO.java +++ b/sdks/java/io/hadoop-format/src/main/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIO.java @@ -199,6 +199,21 @@ * .withValueTranslation(myOutputValueType); * } * + *

<p>Hadoop formats typically work with Writable data structures, which are mutable and whose
+ * instances are reused by the input format reader. To avoid emitting elements whose value can
+ * change after they have been read, this IO clones each key and value read from the underlying
+ * Hadoop input format (unless the type is in the list of well known immutable types). However,
+ * when the input format does not reuse key/value instances, or when the translation functions
+ * already produce immutable types, this cloning is a needless penalty. In these cases the IO can
+ * be instructed to skip key/value cloning.
+ *
+ * <pre>{@code
+ * HadoopFormatIO.Read read = ...
+ * p.apply("read", read
+ *     .withSkipKeyClone(true)
+ *     .withSkipValueClone(true));
+ * }</pre>
+ *
 * <p>
    IMPORTANT! In case of using {@code DBInputFormat} to read data from RDBMS, Beam parallelizes * the process by using LIMIT and OFFSET clauses of SQL query to fetch different ranges of records * (as a split) by different workers. To guarantee the same order and proper split of results you @@ -331,7 +346,10 @@ public class HadoopFormatIO { * HadoopFormatIO.Read#withKeyTranslation}/ {@link HadoopFormatIO.Read#withValueTranslation}. */ public static Read read() { - return new AutoValue_HadoopFormatIO_Read.Builder().build(); + return new AutoValue_HadoopFormatIO_Read.Builder() + .setSkipKeyClone(false) + .setSkipValueClone(false) + .build(); } /** @@ -374,6 +392,10 @@ public abstract static class Read extends PTransform getValueCoder(); + public abstract @Nullable Boolean getSkipKeyClone(); + + public abstract @Nullable Boolean getSkipValueClone(); + public abstract @Nullable TypeDescriptor getinputFormatClass(); public abstract @Nullable TypeDescriptor getinputFormatKeyClass(); @@ -398,6 +420,10 @@ abstract static class Builder { abstract Builder setValueCoder(Coder valueCoder); + abstract Builder setSkipKeyClone(Boolean value); + + abstract Builder setSkipValueClone(Boolean value); + abstract Builder setInputFormatClass(TypeDescriptor inputFormatClass); abstract Builder setInputFormatKeyClass(TypeDescriptor inputFormatKeyClass); @@ -475,6 +501,16 @@ public Read withValueTranslation(SimpleFunction function, Coder c return withValueTranslation(function).toBuilder().setValueCoder(coder).build(); } + /** Determines if key clone should be skipped or not (default is 'false'). */ + public Read withSkipKeyClone(boolean value) { + return toBuilder().setSkipKeyClone(value).build(); + } + + /** Determines if value clone should be skipped or not (default is 'false'). 
*/ + public Read withSkipValueClone(boolean value) { + return toBuilder().setSkipValueClone(value).build(); + } + @Override public PCollection> expand(PBegin input) { validateTransform(); @@ -488,13 +524,16 @@ public PCollection> expand(PBegin input) { if (valueCoder == null) { valueCoder = getDefaultCoder(getValueTypeDescriptor(), coderRegistry); } + HadoopInputFormatBoundedSource source = new HadoopInputFormatBoundedSource<>( getConfiguration(), keyCoder, valueCoder, getKeyTranslationFunction(), - getValueTranslationFunction()); + getValueTranslationFunction(), + getSkipKeyClone(), + getSkipValueClone()); return input.getPipeline().apply(org.apache.beam.sdk.io.Read.from(source)); } @@ -582,6 +621,8 @@ public static class HadoopInputFormatBoundedSource extends BoundedSource keyTranslationFunction; private final @Nullable SimpleFunction valueTranslationFunction; private final SerializableSplit inputSplit; + private final boolean skipKeyClone; + private final boolean skipValueClone; private transient List inputSplits; private long boundedSourceEstimatedSize = 0; private transient InputFormat inputFormatObj; @@ -605,8 +646,18 @@ public static class HadoopInputFormatBoundedSource extends BoundedSource keyCoder, Coder valueCoder, @Nullable SimpleFunction keyTranslationFunction, - @Nullable SimpleFunction valueTranslationFunction) { - this(conf, keyCoder, valueCoder, keyTranslationFunction, valueTranslationFunction, null); + @Nullable SimpleFunction valueTranslationFunction, + boolean skipKeyClone, + boolean skipValueClone) { + this( + conf, + keyCoder, + valueCoder, + keyTranslationFunction, + valueTranslationFunction, + null, + skipKeyClone, + skipValueClone); } @SuppressWarnings("WeakerAccess") @@ -616,13 +667,17 @@ protected HadoopInputFormatBoundedSource( Coder valueCoder, @Nullable SimpleFunction keyTranslationFunction, @Nullable SimpleFunction valueTranslationFunction, - SerializableSplit inputSplit) { + SerializableSplit inputSplit, + boolean skipKeyClone, + boolean skipValueClone) { this.conf = conf; this.inputSplit = inputSplit; this.keyCoder = keyCoder; this.valueCoder = valueCoder; this.keyTranslationFunction = keyTranslationFunction; this.valueTranslationFunction = valueTranslationFunction; + this.skipKeyClone = skipKeyClone; + this.skipValueClone = skipValueClone; } @SuppressWarnings("WeakerAccess") @@ -678,7 +733,9 @@ public List>> split(long desiredBundleSizeBytes, Pipeline valueCoder, keyTranslationFunction, valueTranslationFunction, - serializableInputSplit)) + serializableInputSplit, + skipKeyClone, + skipValueClone)) .collect(Collectors.toList()); } @@ -881,11 +938,16 @@ public KV getCurrent() { V value; try { // Transform key if translation function is provided. - key = transformKeyOrValue(recordReader.getCurrentKey(), keyTranslationFunction, keyCoder); + key = + transformKeyOrValue( + recordReader.getCurrentKey(), keyTranslationFunction, keyCoder, skipKeyClone); // Transform value if translation function is provided. value = transformKeyOrValue( - recordReader.getCurrentValue(), valueTranslationFunction, valueCoder); + recordReader.getCurrentValue(), + valueTranslationFunction, + valueCoder, + skipValueClone); } catch (IOException | InterruptedException e) { LOG.error("Unable to read data: ", e); throw new IllegalStateException("Unable to read data: " + "{}", e); @@ -896,7 +958,10 @@ public KV getCurrent() { /** Returns the serialized output of transformed key or value object. 
*/ @SuppressWarnings("unchecked") private T3 transformKeyOrValue( - T input, @Nullable SimpleFunction simpleFunction, Coder coder) + T input, + @Nullable SimpleFunction simpleFunction, + Coder coder, + boolean skipClone) throws CoderException, ClassCastException { T3 output; if (null != simpleFunction) { @@ -904,7 +969,7 @@ private T3 transformKeyOrValue( } else { output = (T3) input; } - return cloneIfPossiblyMutable(output, coder); + return skipClone ? output : cloneIfPossiblyMutable(output, coder); } /** diff --git a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/ConfigurableEmployeeInputFormat.java b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/ConfigurableEmployeeInputFormat.java index ba43b6376471..5e5640becc38 100644 --- a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/ConfigurableEmployeeInputFormat.java +++ b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/ConfigurableEmployeeInputFormat.java @@ -37,9 +37,6 @@ * Configurable. This validates if setConf() method is called before getSplits(). Known InputFormats * which implement Configurable are DBInputFormat, TableInputFormat etc. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) class ConfigurableEmployeeInputFormat extends InputFormat implements Configurable { public boolean isConfSet = false; diff --git a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/Employee.java b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/Employee.java index 528c9967a7a6..fe2cda132520 100644 --- a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/Employee.java +++ b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/Employee.java @@ -27,9 +27,6 @@ * HadoopFormatIO} for different unit tests. */ @DefaultCoder(AvroCoder.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class Employee { private String empAddress; private String empName; diff --git a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/EmployeeInputFormat.java b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/EmployeeInputFormat.java index 66260a74e6b6..93679f9113f3 100644 --- a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/EmployeeInputFormat.java +++ b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/EmployeeInputFormat.java @@ -40,9 +40,6 @@ * whether the {@linkplain HadoopFormatIO } source returns immutable records in the scenario when * RecordReader creates new key and value objects every time it reads data. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) class EmployeeInputFormat extends InputFormat { public EmployeeInputFormat() {} diff --git a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/EmployeeOutputFormat.java b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/EmployeeOutputFormat.java index 2ede718d9133..83194deac5a6 100644 --- a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/EmployeeOutputFormat.java +++ b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/EmployeeOutputFormat.java @@ -33,9 +33,6 @@ * List}. 
{@linkplain EmployeeOutputFormat} is used to test the {@linkplain HadoopFormatIO } * sink. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class EmployeeOutputFormat extends OutputFormat { private static volatile List> output; private static OutputCommitter outputCommitter; diff --git a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HDFSSynchronizationTest.java b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HDFSSynchronizationTest.java index 075ebe86822c..2a9236b40821 100644 --- a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HDFSSynchronizationTest.java +++ b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HDFSSynchronizationTest.java @@ -39,9 +39,6 @@ /** Tests functionality of {@link HDFSSynchronization} class. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class HDFSSynchronizationTest { private static final String DEFAULT_JOB_ID = String.valueOf(1); @Rule public TemporaryFolder tmpFolder = new TemporaryFolder(); diff --git a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOCassandraIT.java b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOCassandraIT.java index ad218f28ae36..cd85b97061d6 100644 --- a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOCassandraIT.java +++ b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOCassandraIT.java @@ -63,9 +63,6 @@ * invocation pipeline options. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class HadoopFormatIOCassandraIT implements Serializable { private static final String CASSANDRA_KEYSPACE = "ycsb"; diff --git a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOCassandraTest.java b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOCassandraTest.java index 3bbc81c2d3e2..be367be88282 100644 --- a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOCassandraTest.java +++ b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOCassandraTest.java @@ -26,7 +26,7 @@ import java.io.File; import java.io.Serializable; import java.net.URI; -import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; import java.nio.file.Files; import java.nio.file.Path; import org.apache.beam.sdk.io.common.HashingFn; @@ -51,9 +51,6 @@ /** Tests to validate HadoopFormatIO for embedded Cassandra instance. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class HadoopFormatIOCassandraTest implements Serializable { private static final long serialVersionUID = 1L; private static final String CASSANDRA_KEYSPACE = "beamdb"; @@ -209,10 +206,10 @@ public static void beforeClass() throws Exception { private static void replacePortsInConfFile() throws Exception { URI uri = HadoopFormatIOCassandraTest.class.getResource("/cassandra.yaml").toURI(); Path cassandraYamlPath = new File(uri).toPath(); - String content = new String(Files.readAllBytes(cassandraYamlPath), Charset.defaultCharset()); + String content = new String(Files.readAllBytes(cassandraYamlPath), StandardCharsets.UTF_8); content = content.replaceAll("9042", String.valueOf(cassandraNativePort)); content = content.replaceAll("9061", String.valueOf(cassandraPort)); - Files.write(cassandraYamlPath, content.getBytes(Charset.defaultCharset())); + Files.write(cassandraYamlPath, content.getBytes(StandardCharsets.UTF_8)); } @AfterClass diff --git a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOElasticIT.java b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOElasticIT.java index f7721e3b4cd5..5676a7940da3 100644 --- a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOElasticIT.java +++ b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOElasticIT.java @@ -82,9 +82,6 @@ * invocation pipeline options. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class HadoopFormatIOElasticIT implements Serializable { private static final String TRUE = "true"; diff --git a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOIT.java b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOIT.java index a1b114adfc34..ed1e45ea5dd1 100644 --- a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOIT.java +++ b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOIT.java @@ -94,9 +94,6 @@ * performance testing framework. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class HadoopFormatIOIT { private static final String NAMESPACE = HadoopFormatIOIT.class.getName(); diff --git a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOReadTest.java b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOReadTest.java index 8a0965a8735d..8fa446fb10eb 100644 --- a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOReadTest.java +++ b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOReadTest.java @@ -22,10 +22,10 @@ import static org.apache.beam.sdk.io.hadoop.format.HadoopFormatIO.HadoopInputFormatBoundedSource; import static org.apache.beam.sdk.io.hadoop.format.HadoopFormatIO.SerializableSplit; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasDisplayItem; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.io.IOException; @@ -60,6 +60,7 @@ import org.apache.hadoop.mapreduce.JobContext; import org.apache.hadoop.mapreduce.TaskAttemptContext; import org.apache.hadoop.mapreduce.lib.db.DBInputFormat; +import org.junit.Assert; import org.junit.BeforeClass; import org.junit.Rule; import org.junit.Test; @@ -72,7 +73,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class HadoopFormatIOReadTest { private static SerializableConfiguration serConf; @@ -376,6 +376,16 @@ public void testReadValidationFailsMissingConfiguration() { read.validateTransform(); } + @Test + public void testReadObjectCreationWithSkipKeyValueClone() { + HadoopFormatIO.Read read = HadoopFormatIO.read(); + assertEquals(false, read.getSkipKeyClone()); + assertEquals(false, read.getSkipValueClone()); + + assertEquals(true, read.withSkipKeyClone(true).getSkipKeyClone()); + assertEquals(true, read.withSkipValueClone(true).getSkipValueClone()); + } + /** * This test validates functionality of {@link * HadoopFormatIO.Read#withConfiguration(Configuration) withConfiguration(Configuration)} function @@ -519,7 +529,9 @@ public void testReadDisplayData() { AvroCoder.of(Employee.class), null, // No key translation required. null, // No value translation required. - new SerializableSplit()); + new SerializableSplit(), + false, + false); DisplayData displayData = DisplayData.from(boundedSource); assertThat( displayData, @@ -550,7 +562,9 @@ public void testReadIfCreateRecordReaderFails() throws Exception { AvroCoder.of(Employee.class), null, // No key translation required. null, // No value translation required. - new SerializableSplit()); + new SerializableSplit(), + false, + false); boundedSource.setInputFormatObj(mockInputFormat); SourceTestUtils.readFromSource(boundedSource, p.getOptions()); } @@ -577,7 +591,9 @@ public void testReadWithNullCreateRecordReader() throws Exception { AvroCoder.of(Employee.class), null, // No key translation required. null, // No value translation required. 
- new SerializableSplit()); + new SerializableSplit(), + false, + false); boundedSource.setInputFormatObj(mockInputFormat); SourceTestUtils.readFromSource(boundedSource, p.getOptions()); } @@ -604,7 +620,9 @@ public void testReadersStartWhenZeroRecords() throws Exception { AvroCoder.of(Employee.class), null, // No key translation required. null, // No value translation required. - new SerializableSplit(mockInputSplit)); + new SerializableSplit(mockInputSplit), + false, + false); boundedSource.setInputFormatObj(mockInputFormat); BoundedReader> reader = boundedSource.createReader(p.getOptions()); assertFalse(reader.start()); @@ -625,7 +643,8 @@ public void testReadersGetFractionConsumed() throws Exception { Text.class, Employee.class, WritableCoder.of(Text.class), - AvroCoder.of(Employee.class)); + AvroCoder.of(Employee.class), + false); long estimatedSize = hifSource.getEstimatedSizeBytes(p.getOptions()); // Validate if estimated size is equal to the size of records. assertEquals(referenceRecords.size(), estimatedSize); @@ -687,7 +706,9 @@ public void testGetFractionConsumedForBadProgressValue() throws Exception { AvroCoder.of(Employee.class), null, // No key translation required. null, // No value translation required. - new SerializableSplit(mockInputSplit)); + new SerializableSplit(mockInputSplit), + false, + false); boundedSource.setInputFormatObj(mockInputFormat); BoundedReader> reader = boundedSource.createReader(p.getOptions()); assertEquals(Double.valueOf(0), reader.getFractionConsumed()); @@ -718,7 +739,9 @@ public void testReaderAndParentSourceReadsSameData() throws Exception { AvroCoder.of(Employee.class), null, // No key translation required. null, // No value translation required. - new SerializableSplit(mockInputSplit)); + new SerializableSplit(mockInputSplit), + false, + false); BoundedReader> reader = boundedSource.createReader(p.getOptions()); SourceTestUtils.assertUnstartedReaderReadsSameAsItsSource(reader, p.getOptions()); } @@ -738,7 +761,9 @@ public void testGetCurrentSourceFunction() throws Exception { AvroCoder.of(Employee.class), null, // No key translation required. null, // No value translation required. - split); + split, + false, + false); BoundedReader> hifReader = source.createReader(p.getOptions()); BoundedSource> hifSource = hifReader.getCurrentSource(); assertEquals(hifSource, source); @@ -757,7 +782,8 @@ public void testCreateReaderIfSplitNotCalled() throws Exception { Text.class, Employee.class, WritableCoder.of(Text.class), - AvroCoder.of(Employee.class)); + AvroCoder.of(Employee.class), + false); thrown.expect(IOException.class); thrown.expectMessage("Cannot create reader as source is not split yet."); hifSource.createReader(p.getOptions()); @@ -781,7 +807,9 @@ public void testComputeSplitsIfGetSplitsReturnsEmptyList() throws Exception { AvroCoder.of(Employee.class), null, // No key translation required. null, // No value translation required. - mockInputSplit); + mockInputSplit, + false, + false); thrown.expect(IOException.class); thrown.expectMessage("Error in computing splits, getSplits() returns a empty list"); hifSource.setInputFormatObj(mockInputFormat); @@ -806,7 +834,9 @@ public void testComputeSplitsIfGetSplitsReturnsNullValue() throws Exception { AvroCoder.of(Employee.class), null, // No key translation required. null, // No value translation required. 
- mockInputSplit); + mockInputSplit, + false, + false); thrown.expect(IOException.class); thrown.expectMessage("Error in computing splits, getSplits() returns null."); hifSource.setInputFormatObj(mockInputFormat); @@ -837,7 +867,9 @@ public void testComputeSplitsIfGetSplitsReturnsListHavingNullValues() throws Exc AvroCoder.of(Employee.class), null, // No key translation required. null, // No value translation required. - new SerializableSplit()); + new SerializableSplit(), + false, + false); thrown.expect(IOException.class); thrown.expectMessage( "Error in computing splits, split is null in InputSplits list populated " @@ -858,7 +890,8 @@ public void testImmutablityOfOutputOfReadIfRecordReaderObjectsAreMutable() throw Text.class, Employee.class, WritableCoder.of(Text.class), - AvroCoder.of(Employee.class)); + AvroCoder.of(Employee.class), + false); List> bundleRecords = new ArrayList<>(); for (BoundedSource> source : boundedSourceList) { List> elems = SourceTestUtils.readFromSource(source, p.getOptions()); @@ -880,7 +913,8 @@ public void testReadingWithConfigurableInputFormat() throws Exception { Text.class, Employee.class, WritableCoder.of(Text.class), - AvroCoder.of(Employee.class)); + AvroCoder.of(Employee.class), + false); for (BoundedSource> source : boundedSourceList) { // Cast to HadoopInputFormatBoundedSource to access getInputFormat(). HadoopInputFormatBoundedSource hifSource = @@ -905,7 +939,8 @@ public void testImmutablityOfOutputOfReadIfRecordReaderObjectsAreImmutable() thr Text.class, Employee.class, WritableCoder.of(Text.class), - AvroCoder.of(Employee.class)); + AvroCoder.of(Employee.class), + false); List> bundleRecords = new ArrayList<>(); for (BoundedSource> source : boundedSourceList) { List> elems = SourceTestUtils.readFromSource(source, p.getOptions()); @@ -915,6 +950,63 @@ public void testImmutablityOfOutputOfReadIfRecordReaderObjectsAreImmutable() thr assertThat(bundleRecords, containsInAnyOrder(referenceRecords.toArray())); } + /** + * This test validates that in case reader is instructed to not to clone key value records, then + * key value records are exactly the same as output from the source no mater if they are mutable + * or immutable. This override setting is useful to turn on when using key-value translation + * functions and avoid possibly unnecessary copy. 
+ */ + @Test + public void testSkipKeyValueClone() throws Exception { + + SerializableConfiguration serConf = + loadTestConfiguration(EmployeeInputFormat.class, Text.class, Employee.class); + + // with skip clone 'true' it should produce different instances of key/value + List>> sources = + new HadoopInputFormatBoundedSource<>( + serConf, + WritableCoder.of(Text.class), + AvroCoder.of(Employee.class), + new SingletonTextFn(), + new SingletonEmployeeFn(), + true, + true) + .split(0, p.getOptions()); + + for (BoundedSource> source : sources) { + List> elems = SourceTestUtils.readFromSource(source, p.getOptions()); + for (KV elem : elems) { + Assert.assertSame(SingletonTextFn.TEXT, elem.getKey()); + Assert.assertEquals(SingletonTextFn.TEXT, elem.getKey()); + Assert.assertSame(SingletonEmployeeFn.EMPLOYEE, elem.getValue()); + Assert.assertEquals(SingletonEmployeeFn.EMPLOYEE, elem.getValue()); + } + } + + // with skip clone 'false' it should produce different instances of value + sources = + new HadoopInputFormatBoundedSource<>( + serConf, + WritableCoder.of(Text.class), + AvroCoder.of(Employee.class), + new SingletonTextFn(), + new SingletonEmployeeFn(), + false, + false) + .split(0, p.getOptions()); + + for (BoundedSource> source : sources) { + List> elems = SourceTestUtils.readFromSource(source, p.getOptions()); + for (KV elem : elems) { + Assert.assertNotSame(SingletonTextFn.TEXT, elem.getKey()); + Assert.assertEquals(SingletonTextFn.TEXT, elem.getKey()); + Assert.assertNotSame(SingletonEmployeeFn.EMPLOYEE, elem.getValue()); + Assert.assertEquals(SingletonEmployeeFn.EMPLOYEE, elem.getValue()); + } + } + } + @Test public void testValidateConfigurationWithDBInputFormat() { Configuration conf = new Configuration(); @@ -943,7 +1035,8 @@ private HadoopInputFormatBoundedSource getTestHIFSource( Class inputFormatKeyClass, Class inputFormatValueClass, Coder keyCoder, - Coder valueCoder) { + Coder valueCoder, + boolean skipKeyValueClone) { SerializableConfiguration serConf = loadTestConfiguration(inputFormatClass, inputFormatKeyClass, inputFormatValueClass); return new HadoopInputFormatBoundedSource<>( @@ -951,7 +1044,9 @@ private HadoopInputFormatBoundedSource getTestHIFSource( keyCoder, valueCoder, null, // No key translation required. - null); // No value translation required. + null, // No value translation required. 
+ skipKeyValueClone, + skipKeyValueClone); } private List>> getBoundedSourceList( @@ -959,11 +1054,37 @@ private List>> getBoundedSourceList( Class inputFormatKeyClass, Class inputFormatValueClass, Coder keyCoder, - Coder valueCoder) + Coder valueCoder, + boolean skipKeyValueClone) throws Exception { HadoopInputFormatBoundedSource boundedSource = getTestHIFSource( - inputFormatClass, inputFormatKeyClass, inputFormatValueClass, keyCoder, valueCoder); + inputFormatClass, + inputFormatKeyClass, + inputFormatValueClass, + keyCoder, + valueCoder, + skipKeyValueClone); return boundedSource.split(0, p.getOptions()); } + + private static class SingletonEmployeeFn extends SimpleFunction { + + static final Employee EMPLOYEE = new Employee("Name", "Address"); + + @Override + public Employee apply(Employee input) { + return EMPLOYEE; + } + } + + private static class SingletonTextFn extends SimpleFunction { + + static final Text TEXT = new Text("content"); + + @Override + public Text apply(Text input) { + return TEXT; + } + } } diff --git a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOSequenceFileTest.java b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOSequenceFileTest.java index af0986d83a41..89a5c5bcd772 100644 --- a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOSequenceFileTest.java +++ b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOSequenceFileTest.java @@ -69,9 +69,6 @@ /** Tests {@link HadoopFormatIO} output with batch and stream pipeline. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class HadoopFormatIOSequenceFileTest { private static final Instant START_TIME = new Instant(0); diff --git a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOWriteTest.java b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOWriteTest.java index b9cf1fe57efe..9c41bcd0445f 100644 --- a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOWriteTest.java +++ b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOWriteTest.java @@ -51,9 +51,6 @@ /** Unit tests for {@link HadoopFormatIO.Write}. */ @RunWith(MockitoJUnitRunner.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class HadoopFormatIOWriteTest { private static final int REDUCERS_COUNT = 2; diff --git a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/IterableCombinerTest.java b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/IterableCombinerTest.java index 528575cc1e64..1f45f5fcdd6a 100644 --- a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/IterableCombinerTest.java +++ b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/IterableCombinerTest.java @@ -35,9 +35,6 @@ /** Tests Iterable combiner whether works correctly. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class IterableCombinerTest { private static final TypeDescriptor STRING_TYPE_DESCRIPTOR = TypeDescriptors.strings(); diff --git a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/ReuseObjectsEmployeeInputFormat.java b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/ReuseObjectsEmployeeInputFormat.java index 04e2e10b4ec5..2e20102e92b2 100644 --- a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/ReuseObjectsEmployeeInputFormat.java +++ b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/ReuseObjectsEmployeeInputFormat.java @@ -49,9 +49,6 @@ * HadoopFormatIO } source returns immutable records for a scenario when RecordReader returns the * same key and value objects with updating values every time it reads data. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) class ReuseObjectsEmployeeInputFormat extends InputFormat { public ReuseObjectsEmployeeInputFormat() {} diff --git a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/TestRowDBWritable.java b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/TestRowDBWritable.java index ab8b4df98361..f4e3677c2c77 100644 --- a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/TestRowDBWritable.java +++ b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/TestRowDBWritable.java @@ -35,9 +35,6 @@ * org.apache.hadoop.mapreduce.lib.db.DBInputFormat}. */ @DefaultCoder(AvroCoder.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) class TestRowDBWritable extends TestRow implements DBWritable, Writable { private Integer id; diff --git a/sdks/java/io/hbase/src/test/java/org/apache/beam/sdk/io/hbase/HBaseIOIT.java b/sdks/java/io/hbase/src/test/java/org/apache/beam/sdk/io/hbase/HBaseIOIT.java index 04b692986045..3b7787f72427 100644 --- a/sdks/java/io/hbase/src/test/java/org/apache/beam/sdk/io/hbase/HBaseIOIT.java +++ b/sdks/java/io/hbase/src/test/java/org/apache/beam/sdk/io/hbase/HBaseIOIT.java @@ -67,9 +67,6 @@ * */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class HBaseIOIT { /** HBaseIOIT options. */ diff --git a/sdks/java/io/hbase/src/test/java/org/apache/beam/sdk/io/hbase/HBaseIOTest.java b/sdks/java/io/hbase/src/test/java/org/apache/beam/sdk/io/hbase/HBaseIOTest.java index 4b4acdcac76f..6b096492437a 100644 --- a/sdks/java/io/hbase/src/test/java/org/apache/beam/sdk/io/hbase/HBaseIOTest.java +++ b/sdks/java/io/hbase/src/test/java/org/apache/beam/sdk/io/hbase/HBaseIOTest.java @@ -22,10 +22,10 @@ import static org.apache.beam.sdk.testing.SourceTestUtils.assertSplitAtFractionFails; import static org.apache.beam.sdk.testing.SourceTestUtils.assertSplitAtFractionSucceedsAndConsistent; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasDisplayItem; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.hasSize; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNotNull; -import static org.junit.Assert.assertThat; import java.nio.charset.StandardCharsets; import java.util.ArrayList; @@ -79,9 +79,6 @@ /** Test HBaseIO. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class HBaseIOTest { @Rule public final transient TestPipeline p = TestPipeline.create(); @Rule public ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/io/hcatalog/build.gradle b/sdks/java/io/hcatalog/build.gradle index d1c9e6fa9831..820418caf912 100644 --- a/sdks/java/io/hcatalog/build.gradle +++ b/sdks/java/io/hcatalog/build.gradle @@ -41,6 +41,17 @@ test { ignoreFailures true } +configurations.testRuntimeClasspath { + resolutionStrategy { + def log4j_version = "2.8.2" + // Beam's build system forces a uniform log4j version resolution for all modules, however for + // the HCatalog case the current version of log4j produces NoClassDefFoundError so we need to + // force an old version on the tests runtime classpath + force "org.apache.logging.log4j:log4j-api:${log4j_version}" + force "org.apache.logging.log4j:log4j-core:${log4j_version}" + } +} + /* * We need to rely on manually specifying these evaluationDependsOn to ensure that * the following projects are evaluated before we evaluate this project. This is because @@ -55,6 +66,7 @@ dependencies { compile project(path: ":sdks:java:core", configuration: "shadow") compile project(":sdks:java:io:hadoop-common") compile library.java.slf4j_api + compile library.java.joda_time // Hive bundles without repackaging Jackson which is why we redeclare it here so that it appears // on the compile/test/runtime classpath before Hive. provided library.java.jackson_annotations diff --git a/sdks/java/io/hcatalog/src/test/java/org/apache/beam/sdk/io/hcatalog/HCatalogBeamSchemaTest.java b/sdks/java/io/hcatalog/src/test/java/org/apache/beam/sdk/io/hcatalog/HCatalogBeamSchemaTest.java index 1156b081135e..18d25ae6346b 100644 --- a/sdks/java/io/hcatalog/src/test/java/org/apache/beam/sdk/io/hcatalog/HCatalogBeamSchemaTest.java +++ b/sdks/java/io/hcatalog/src/test/java/org/apache/beam/sdk/io/hcatalog/HCatalogBeamSchemaTest.java @@ -39,9 +39,6 @@ /** Unit tests for {@link HCatalogBeamSchema}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class HCatalogBeamSchemaTest implements Serializable { private static final String TEST_TABLE_PARTITIONED = TEST_TABLE + "_partitioned"; diff --git a/sdks/java/io/hcatalog/src/test/java/org/apache/beam/sdk/io/hcatalog/HCatalogIOIT.java b/sdks/java/io/hcatalog/src/test/java/org/apache/beam/sdk/io/hcatalog/HCatalogIOIT.java index 7ab5c9d20d89..60d925888f9c 100644 --- a/sdks/java/io/hcatalog/src/test/java/org/apache/beam/sdk/io/hcatalog/HCatalogIOIT.java +++ b/sdks/java/io/hcatalog/src/test/java/org/apache/beam/sdk/io/hcatalog/HCatalogIOIT.java @@ -65,9 +65,6 @@ * runner invocation pipeline options. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class HCatalogIOIT { /** PipelineOptions for testing {@link org.apache.beam.sdk.io.hcatalog.HCatalogIO}. 
*/ diff --git a/sdks/java/io/hcatalog/src/test/java/org/apache/beam/sdk/io/hcatalog/HCatalogIOTest.java b/sdks/java/io/hcatalog/src/test/java/org/apache/beam/sdk/io/hcatalog/HCatalogIOTest.java index 0554561225bd..2dd938e4824c 100644 --- a/sdks/java/io/hcatalog/src/test/java/org/apache/beam/sdk/io/hcatalog/HCatalogIOTest.java +++ b/sdks/java/io/hcatalog/src/test/java/org/apache/beam/sdk/io/hcatalog/HCatalogIOTest.java @@ -26,11 +26,11 @@ import static org.apache.beam.sdk.io.hcatalog.test.HCatalogIOTestUtils.getExpectedRecords; import static org.apache.beam.sdk.io.hcatalog.test.HCatalogIOTestUtils.getReaderContext; import static org.apache.beam.sdk.io.hcatalog.test.HCatalogIOTestUtils.insertTestData; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.containsString; import static org.hamcrest.Matchers.isA; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.io.IOException; @@ -83,9 +83,6 @@ /** Test for HCatalogIO. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class HCatalogIOTest implements Serializable { private static final PipelineOptions OPTIONS = PipelineOptionsFactory.create(); diff --git a/sdks/java/io/hcatalog/src/test/java/org/apache/beam/sdk/io/hcatalog/HiveDatabaseTestHelper.java b/sdks/java/io/hcatalog/src/test/java/org/apache/beam/sdk/io/hcatalog/HiveDatabaseTestHelper.java index 307e5193e2da..fb83c0060f49 100644 --- a/sdks/java/io/hcatalog/src/test/java/org/apache/beam/sdk/io/hcatalog/HiveDatabaseTestHelper.java +++ b/sdks/java/io/hcatalog/src/test/java/org/apache/beam/sdk/io/hcatalog/HiveDatabaseTestHelper.java @@ -24,9 +24,6 @@ import org.apache.beam.sdk.io.common.DatabaseTestHelper; /** Helper for creating connection and test tables on hive database via JDBC driver. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) class HiveDatabaseTestHelper { private static Connection con; private static Statement stmt; diff --git a/sdks/java/io/hcatalog/src/test/java/org/apache/beam/sdk/io/hcatalog/SchemaUtilsTest.java b/sdks/java/io/hcatalog/src/test/java/org/apache/beam/sdk/io/hcatalog/SchemaUtilsTest.java index dac97788e670..5b748da86773 100644 --- a/sdks/java/io/hcatalog/src/test/java/org/apache/beam/sdk/io/hcatalog/SchemaUtilsTest.java +++ b/sdks/java/io/hcatalog/src/test/java/org/apache/beam/sdk/io/hcatalog/SchemaUtilsTest.java @@ -24,9 +24,6 @@ import org.junit.Assert; import org.junit.Test; -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SchemaUtilsTest { @Test public void testParameterizedTypesToBeamTypes() { diff --git a/sdks/java/io/influxdb/build.gradle b/sdks/java/io/influxdb/build.gradle index 782c57218eae..4bbe170097d3 100644 --- a/sdks/java/io/influxdb/build.gradle +++ b/sdks/java/io/influxdb/build.gradle @@ -27,6 +27,9 @@ ext.summary = "IO to read and write on InfluxDB" dependencies { compile project(path: ":sdks:java:core", configuration: "shadow") compile library.java.influxdb_library + compile "com.squareup.okhttp3:okhttp:4.6.0" + compile library.java.vendored_guava_26_0_jre + compile library.java.slf4j_api testCompile library.java.junit testCompile library.java.powermock testCompile library.java.powermock_mockito diff --git a/sdks/java/io/jdbc/build.gradle b/sdks/java/io/jdbc/build.gradle index 549749c1d830..ca8616bd8c39 100644 --- a/sdks/java/io/jdbc/build.gradle +++ b/sdks/java/io/jdbc/build.gradle @@ -29,18 +29,19 @@ ext.summary = "IO to read and write on JDBC datasource." dependencies { compile library.java.vendored_guava_26_0_jre compile project(path: ":sdks:java:core", configuration: "shadow") - compile "org.apache.commons:commons-dbcp2:2.6.0" + compile "org.apache.commons:commons-dbcp2:2.8.0" + compile library.java.joda_time + compile "org.apache.commons:commons-pool2:2.8.1" + compile library.java.slf4j_api + testCompile "org.apache.derby:derby:10.14.2.0" + testCompile "org.apache.derby:derbyclient:10.14.2.0" + testCompile "org.apache.derby:derbynet:10.14.2.0" testCompile project(path: ":sdks:java:core", configuration: "shadowTest") testCompile project(path: ":sdks:java:io:common", configuration: "testRuntime") testCompile project(path: ":sdks:java:testing:test-utils", configuration: "testRuntime") testCompile library.java.junit - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library testCompile library.java.slf4j_api testCompile library.java.postgres - testCompile "org.apache.derby:derby:10.14.2.0" - testCompile "org.apache.derby:derbyclient:10.14.2.0" - testCompile "org.apache.derby:derbynet:10.14.2.0" testRuntimeOnly library.java.slf4j_jdk14 testRuntimeOnly project(path: ":runners:direct-java", configuration: "shadow") } diff --git a/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java b/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java index 93b0d393f3a0..c0e36f71c1a5 100644 --- a/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java +++ b/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java @@ -17,8 +17,11 @@ */ package org.apache.beam.sdk.io.jdbc; +import static java.lang.Integer.MAX_VALUE; import static org.apache.beam.sdk.io.jdbc.SchemaUtil.checkNullabilityForFields; import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; import com.google.auto.value.AutoValue; import edu.umd.cs.findbugs.annotations.SuppressFBWarnings; @@ -29,11 +32,16 @@ import java.sql.ResultSet; import java.sql.SQLException; import java.util.ArrayList; +import java.util.Arrays; import java.util.Collection; +import java.util.Collections; +import java.util.HashSet; import java.util.List; import java.util.Objects; import java.util.Optional; +import java.util.Set; import java.util.concurrent.ConcurrentHashMap; +import java.util.concurrent.TimeUnit; import java.util.function.Predicate; import java.util.stream.Collectors; import java.util.stream.IntStream; @@ -42,13 +50,19 @@ import org.apache.beam.sdk.annotations.Experimental.Kind; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.RowCoder; +import org.apache.beam.sdk.io.jdbc.JdbcUtil.PartitioningFn; +import org.apache.beam.sdk.io.jdbc.SchemaUtil.FieldWithIndex; +import org.apache.beam.sdk.metrics.Distribution; +import org.apache.beam.sdk.metrics.Metrics; import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider; import org.apache.beam.sdk.schemas.NoSuchSchemaException; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.schemas.SchemaRegistry; import org.apache.beam.sdk.transforms.Create; import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.Filter; +import org.apache.beam.sdk.transforms.GroupByKey; import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.transforms.ParDo; import org.apache.beam.sdk.transforms.Reshuffle; @@ -62,6 +76,7 @@ import org.apache.beam.sdk.util.BackOffUtils; import org.apache.beam.sdk.util.FluentBackoff; import org.apache.beam.sdk.util.Sleeper; +import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PBegin; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.PCollectionView; @@ -153,11 +168,11 @@ * case you should look into sharing a single instance of a {@link PoolingDataSource} across all the * execution threads. For example: * - *

    {@code
- * private static class MyDataSourceProviderFn implements SerializableFunction<Void, DataSource> {
+ *
+ * private static class MyDataSourceProviderFn implements{@literal SerializableFunction<Void, DataSource>} {
      *   private static transient DataSource dataSource;
      *
    - *   @Override
    + *  {@literal @Override}
      *   public synchronized DataSource apply(Void input) {
      *     if (dataSource == null) {
      *       dataSource = ... build data source ...
    @@ -165,11 +180,58 @@
      *     return dataSource;
      *   }
      * }
    - *
    + * {@literal
 * pipeline.apply(JdbcIO.<KV<Integer, String>>read()
      *   .withDataSourceProviderFn(new MyDataSourceProviderFn())
      *   // ...
      * );
    + * }
    + * + *
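A minimal sketch of the "... build data source ..." step in the example above, assuming commons-dbcp2's BasicDataSource (which appears on JdbcIO's dependency list later in this patch); the driver class, URL, and credentials below are placeholders, not values taken from this change.

    import javax.sql.DataSource;
    import org.apache.beam.sdk.transforms.SerializableFunction;
    import org.apache.commons.dbcp2.BasicDataSource;

    /** Illustrative pooled DataSource provider; all connection settings are placeholders. */
    class PooledDataSourceProviderFn implements SerializableFunction<Void, DataSource> {
      private static transient DataSource dataSource;

      @Override
      public synchronized DataSource apply(Void input) {
        if (dataSource == null) {
          BasicDataSource basic = new BasicDataSource();
          basic.setDriverClassName("org.postgresql.Driver");     // placeholder driver
          basic.setUrl("jdbc:postgresql://localhost:5432/mydb"); // placeholder URL
          basic.setUsername("username");
          basic.setPassword("password");
          basic.setMaxTotal(8); // bound the number of pooled connections per worker
          dataSource = basic;
        }
        return dataSource;
      }
    }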

+ * 3. Reading all data from a table in parallel with partitioning can be done with {@link
+ * ReadWithPartitions}:
+ *
+ *

    {@code
+ * pipeline.apply(JdbcIO.<KV<Integer, String>>readWithPartitions()
    + *  .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
    + *         "com.mysql.jdbc.Driver", "jdbc:mysql://hostname:3306/mydb")
    + *       .withUsername("username")
    + *       .withPassword("password"))
    + *  .withTable("Person")
    + *  .withPartitionColumn("id")
    + *  .withLowerBound(0)
    + *  .withUpperBound(1000)
    + *  .withNumPartitions(5)
    + *  .withCoder(KvCoder.of(BigEndianIntegerCoder.of(), StringUtf8Coder.of()))
+ *  .withRowMapper(new JdbcIO.RowMapper<KV<Integer, String>>() {
+ *    public KV<Integer, String> mapRow(ResultSet resultSet) throws Exception {
    + *      return KV.of(resultSet.getInt(1), resultSet.getString(2));
    + *    }
    + *  })
    + * );
    + * }
    + * + *
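The bounds and partition count in the example above translate into one range query per partition of the form "where id >= ? and id < ?". Below is a rough, standalone sketch of that arithmetic; the actual splitting is done by JdbcUtil.PartitioningFn, whose remainder handling is not shown in this patch, so the last-partition treatment here is an assumption.

    import java.util.ArrayList;
    import java.util.List;

    /** Standalone sketch of the stride arithmetic implied by lower/upper bounds and numPartitions. */
    class PartitionRangeSketch {
      static List<int[]> ranges(int lowerBound, int upperBound, int numPartitions) {
        int stride = (upperBound - lowerBound) / numPartitions;
        List<int[]> result = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
          int start = lowerBound + i * stride;
          // Assumption: the final range is stretched to the upper bound to absorb any remainder.
          int end = (i == numPartitions - 1) ? upperBound : start + stride;
          result.add(new int[] {start, end});
        }
        return result;
      }

      public static void main(String[] args) {
        // With the example values (0, 1000, 5) this prints [0,200) ... [800,1000),
        // i.e. five queries "select * from Person where id >= ? and id < ?".
        for (int[] r : ranges(0, 1000, 5)) {
          System.out.printf("id >= %d and id < %d%n", r[0], r[1]);
        }
      }
    }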

+ * Instead of a full table you could also use a subquery in parentheses. The subquery can be
+ * specified using the Table option instead, and partition columns can be qualified using the
+ * subquery alias provided as part of Table.
+ *
+ *

    {@code
+ * pipeline.apply(JdbcIO.<KV<Integer, String>>readWithPartitions()
    + *  .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
    + *         "com.mysql.jdbc.Driver", "jdbc:mysql://hostname:3306/mydb")
    + *       .withUsername("username")
    + *       .withPassword("password"))
    + *  .withTable("(select id, name from Person) as subq")
    + *  .withPartitionColumn("id")
    + *  .withLowerBound(0)
    + *  .withUpperBound(1000)
    + *  .withNumPartitions(5)
    + *  .withCoder(KvCoder.of(BigEndianIntegerCoder.of(), StringUtf8Coder.of()))
+ *  .withRowMapper(new JdbcIO.RowMapper<KV<Integer, String>>() {
+ *    public KV<Integer, String> mapRow(ResultSet resultSet) throws Exception {
    + *      return KV.of(resultSet.getInt(1), resultSet.getString(2));
    + *    }
    + *  })
    + * );
      * }
    * *

    Writing to JDBC datasource

    @@ -250,11 +312,29 @@ public static ReadAll readAll() { .build(); } + /** + * Like {@link #readAll}, but executes multiple instances of the query on the same table + * (subquery) using ranges. + * + * @param Type of the data to be read. + */ + public static ReadWithPartitions readWithPartitions() { + return new AutoValue_JdbcIO_ReadWithPartitions.Builder() + .setLowerBound(DEFAULT_LOWER_BOUND) + .setUpperBound(DEFAULT_UPPER_BOUND) + .setNumPartitions(DEFAULT_NUM_PARTITIONS) + .build(); + } + private static final long DEFAULT_BATCH_SIZE = 1000L; private static final int DEFAULT_FETCH_SIZE = 50_000; // Default values used from fluent backoff. private static final Duration DEFAULT_INITIAL_BACKOFF = Duration.standardSeconds(1); private static final Duration DEFAULT_MAX_CUMULATIVE_BACKOFF = Duration.standardDays(1000); + // Default values used for partitioning a table + private static final int DEFAULT_LOWER_BOUND = 0; + private static final int DEFAULT_UPPER_BOUND = MAX_VALUE; + private static final int DEFAULT_NUM_PARTITIONS = 200; /** * Write data to a JDBC datasource. @@ -262,7 +342,7 @@ public static ReadAll readAll() { * @param Type of the data to be written. */ public static Write write() { - return new Write(); + return new Write<>(); } public static WriteVoid writeVoid() { @@ -275,13 +355,17 @@ public static WriteVoid writeVoid() { /** * This is the default {@link Predicate} we use to detect DeadLock. It basically test if the - * {@link SQLException#getSQLState()} equals 40001. 40001 is the SQL State used by most of - * database to identify deadlock. + * {@link SQLException#getSQLState()} equals 40001 or 40P01. 40001 is the SQL State used by most + * of databases to identify deadlock, and 40P01 is specific to PostgreSQL (see PostgreSQL documentation). */ public static class DefaultRetryStrategy implements RetryStrategy { + private static final Set errorCodesToRetry = + new HashSet(Arrays.asList("40001", "40P01")); + @Override public boolean apply(SQLException e) { - return "40001".equals(e.getSQLState()); + return errorCodesToRetry.contains(e.getSQLState()); } } @@ -866,8 +950,182 @@ public void populateDisplayData(DisplayData.Builder builder) { } } + /** Implementation of {@link #readWithPartitions}. 
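Since DefaultRetryStrategy above now retries on SQL states 40001 and 40P01, a pipeline that needs to retry additional vendor-specific transient states can supply its own RetryStrategy through withRetryStrategy(). A minimal sketch follows; the extra state "57P03" is only an illustrative example, not something this change retries by default.

    import java.sql.SQLException;
    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;
    import org.apache.beam.sdk.io.jdbc.JdbcIO;

    /** Sketch of a custom RetryStrategy; "57P03" is merely an illustrative extra SQL state. */
    class RetryOnTransientStates implements JdbcIO.RetryStrategy {
      private static final Set<String> RETRYABLE =
          new HashSet<>(Arrays.asList("40001", "40P01", "57P03"));

      @Override
      public boolean apply(SQLException e) {
        return RETRYABLE.contains(e.getSQLState());
      }
    }

    // Usage sketch: JdbcIO.<T>write().withRetryStrategy(new RetryOnTransientStates())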
*/ + @AutoValue + public abstract static class ReadWithPartitions extends PTransform> { + + abstract @Nullable SerializableFunction getDataSourceProviderFn(); + + abstract @Nullable RowMapper getRowMapper(); + + abstract @Nullable Coder getCoder(); + + abstract int getNumPartitions(); + + abstract @Nullable String getPartitionColumn(); + + abstract int getLowerBound(); + + abstract int getUpperBound(); + + abstract @Nullable String getTable(); + + abstract Builder toBuilder(); + + @AutoValue.Builder + abstract static class Builder { + + abstract Builder setDataSourceProviderFn( + SerializableFunction dataSourceProviderFn); + + abstract Builder setRowMapper(RowMapper rowMapper); + + abstract Builder setCoder(Coder coder); + + abstract Builder setNumPartitions(int numPartitions); + + abstract Builder setPartitionColumn(String partitionColumn); + + abstract Builder setLowerBound(int lowerBound); + + abstract Builder setUpperBound(int upperBound); + + abstract Builder setTable(String tableName); + + abstract ReadWithPartitions build(); + } + + public ReadWithPartitions withDataSourceConfiguration(final DataSourceConfiguration config) { + return withDataSourceProviderFn(new DataSourceProviderFromDataSourceConfiguration(config)); + } + + public ReadWithPartitions withDataSourceProviderFn( + SerializableFunction dataSourceProviderFn) { + return toBuilder().setDataSourceProviderFn(dataSourceProviderFn).build(); + } + + public ReadWithPartitions withRowMapper(RowMapper rowMapper) { + checkNotNull(rowMapper, "rowMapper can not be null"); + return toBuilder().setRowMapper(rowMapper).build(); + } + + public ReadWithPartitions withCoder(Coder coder) { + checkNotNull(coder, "coder can not be null"); + return toBuilder().setCoder(coder).build(); + } + + /** + * The number of partitions. This, along with withLowerBound and withUpperBound, form partitions + * strides for generated WHERE clause expressions used to split the column withPartitionColumn + * evenly. When the input is less than 1, the number is set to 1. + */ + public ReadWithPartitions withNumPartitions(int numPartitions) { + checkArgument(numPartitions > 0, "numPartitions can not be less than 1"); + return toBuilder().setNumPartitions(numPartitions).build(); + } + + /** The name of a column of numeric type that will be used for partitioning. */ + public ReadWithPartitions withPartitionColumn(String partitionColumn) { + checkNotNull(partitionColumn, "partitionColumn can not be null"); + return toBuilder().setPartitionColumn(partitionColumn).build(); + } + + public ReadWithPartitions withLowerBound(int lowerBound) { + return toBuilder().setLowerBound(lowerBound).build(); + } + + public ReadWithPartitions withUpperBound(int upperBound) { + return toBuilder().setUpperBound(upperBound).build(); + } + + /** Name of the table in the external database. Can be used to pass a user-defined subqery. 
*/ + public ReadWithPartitions withTable(String tableName) { + checkNotNull(tableName, "table can not be null"); + return toBuilder().setTable(tableName).build(); + } + + @Override + public PCollection expand(PBegin input) { + checkNotNull(getRowMapper(), "withRowMapper() is required"); + checkNotNull(getCoder(), "withCoder() is required"); + checkNotNull( + getDataSourceProviderFn(), + "withDataSourceConfiguration() or withDataSourceProviderFn() is required"); + checkNotNull(getPartitionColumn(), "withPartitionColumn() is required"); + checkNotNull(getTable(), "withTable() is required"); + checkArgument( + getLowerBound() < getUpperBound(), + "The lower bound of partitioning column is larger or equal than the upper bound"); + checkArgument( + getUpperBound() - getLowerBound() >= getNumPartitions(), + "The specified number of partitions is more than the difference between upper bound and lower bound"); + + if (getUpperBound() == MAX_VALUE || getLowerBound() == 0) { + refineBounds(input); + } + + int stride = (getUpperBound() - getLowerBound()) / getNumPartitions(); + PCollection> params = + input.apply( + Create.of( + Collections.singletonList( + Arrays.asList(getLowerBound(), getUpperBound(), getNumPartitions())))); + PCollection>> ranges = + params + .apply("Partitioning", ParDo.of(new PartitioningFn())) + .apply("Group partitions", GroupByKey.create()); + + return ranges.apply( + "Read ranges", + JdbcIO.>, T>readAll() + .withDataSourceProviderFn(getDataSourceProviderFn()) + .withFetchSize(stride) + .withQuery( + String.format( + "select * from %1$s where %2$s >= ? and %2$s < ?", + getTable(), getPartitionColumn())) + .withCoder(getCoder()) + .withRowMapper(getRowMapper()) + .withParameterSetter( + (PreparedStatementSetter>>) + (element, preparedStatement) -> { + String[] range = element.getKey().split(",", -1); + preparedStatement.setInt(1, Integer.parseInt(range[0])); + preparedStatement.setInt(2, Integer.parseInt(range[1])); + }) + .withOutputParallelization(false)); + } + + private void refineBounds(PBegin input) { + Integer[] bounds = + JdbcUtil.getBounds(input, getTable(), getDataSourceProviderFn(), getPartitionColumn()); + if (getLowerBound() == 0) { + withLowerBound(bounds[0]); + } + if (getUpperBound() == MAX_VALUE) { + withUpperBound(bounds[1]); + } + } + + @Override + public void populateDisplayData(DisplayData.Builder builder) { + super.populateDisplayData(builder); + builder.add(DisplayData.item("rowMapper", getRowMapper().getClass().getName())); + builder.add(DisplayData.item("coder", getCoder().getClass().getName())); + builder.add(DisplayData.item("partitionColumn", getPartitionColumn())); + builder.add(DisplayData.item("table", getTable())); + builder.add(DisplayData.item("numPartitions", getNumPartitions())); + builder.add(DisplayData.item("lowerBound", getLowerBound())); + builder.add(DisplayData.item("upperBound", getUpperBound())); + if (getDataSourceProviderFn() instanceof HasDisplayData) { + ((HasDisplayData) getDataSourceProviderFn()).populateDisplayData(builder); + } + } + } + /** A {@link DoFn} executing the SQL query to read from the database. 
*/ private static class ReadFn extends DoFn { + private final SerializableFunction dataSourceProviderFn; private final ValueProvider query; private final PreparedStatementSetter parameterSetter; @@ -903,6 +1161,10 @@ public void processElement(ProcessContext context) throws Exception { if (connection == null) { connection = dataSource.getConnection(); } + // PostgreSQL requires autocommit to be disabled to enable cursor streaming + // see https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor + LOG.info("Autocommit has been disabled"); + connection.setAutoCommit(false); try (PreparedStatement statement = connection.prepareStatement( query.get(), ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) { @@ -1021,43 +1283,43 @@ public static class Write extends PTransform, PDone> { /** See {@link WriteVoid#withDataSourceConfiguration(DataSourceConfiguration)}. */ public Write withDataSourceConfiguration(DataSourceConfiguration config) { - return new Write(inner.withDataSourceConfiguration(config)); + return new Write<>(inner.withDataSourceConfiguration(config)); } /** See {@link WriteVoid#withDataSourceProviderFn(SerializableFunction)}. */ public Write withDataSourceProviderFn( SerializableFunction dataSourceProviderFn) { - return new Write(inner.withDataSourceProviderFn(dataSourceProviderFn)); + return new Write<>(inner.withDataSourceProviderFn(dataSourceProviderFn)); } /** See {@link WriteVoid#withStatement(String)}. */ public Write withStatement(String statement) { - return new Write(inner.withStatement(statement)); + return new Write<>(inner.withStatement(statement)); } /** See {@link WriteVoid#withPreparedStatementSetter(PreparedStatementSetter)}. */ public Write withPreparedStatementSetter(PreparedStatementSetter setter) { - return new Write(inner.withPreparedStatementSetter(setter)); + return new Write<>(inner.withPreparedStatementSetter(setter)); } /** See {@link WriteVoid#withBatchSize(long)}. */ public Write withBatchSize(long batchSize) { - return new Write(inner.withBatchSize(batchSize)); + return new Write<>(inner.withBatchSize(batchSize)); } /** See {@link WriteVoid#withRetryStrategy(RetryStrategy)}. */ public Write withRetryStrategy(RetryStrategy retryStrategy) { - return new Write(inner.withRetryStrategy(retryStrategy)); + return new Write<>(inner.withRetryStrategy(retryStrategy)); } /** See {@link WriteVoid#withRetryConfiguration(RetryConfiguration)}. */ public Write withRetryConfiguration(RetryConfiguration retryConfiguration) { - return new Write(inner.withRetryConfiguration(retryConfiguration)); + return new Write<>(inner.withRetryConfiguration(retryConfiguration)); } /** See {@link WriteVoid#withTable(String)}. */ public Write withTable(String table) { - return new Write(inner.withTable(table)); + return new Write<>(inner.withTable(table)); } /** @@ -1079,150 +1341,285 @@ public WriteVoid withResults() { return inner; } + /** + * Returns {@link WriteWithResults} transform that could return a specific result. + * + *

    See {@link WriteWithResults} + */ + public WriteWithResults withWriteResults( + RowMapper rowMapper) { + return new AutoValue_JdbcIO_WriteWithResults.Builder() + .setRowMapper(rowMapper) + .setRetryStrategy(inner.getRetryStrategy()) + .setRetryConfiguration(inner.getRetryConfiguration()) + .setDataSourceProviderFn(inner.getDataSourceProviderFn()) + .setPreparedStatementSetter(inner.getPreparedStatementSetter()) + .setStatement(inner.getStatement()) + .setTable(inner.getTable()) + .build(); + } + @Override public void populateDisplayData(DisplayData.Builder builder) { inner.populateDisplayData(builder); } - private boolean hasStatementAndSetter() { - return inner.getStatement() != null && inner.getPreparedStatementSetter() != null; - } - @Override public PDone expand(PCollection input) { - // fixme: validate invalid table input - if (input.hasSchema() && !hasStatementAndSetter()) { - checkArgument( - inner.getTable() != null, "table cannot be null if statement is not provided"); - Schema schema = input.getSchema(); - List fields = getFilteredFields(schema); - inner = - inner.withStatement( - JdbcUtil.generateStatement( - inner.getTable(), - fields.stream() - .map(SchemaUtil.FieldWithIndex::getField) - .collect(Collectors.toList()))); - inner = - inner.withPreparedStatementSetter( - new AutoGeneratedPreparedStatementSetter(fields, input.getToRowFunction())); - } - inner.expand(input); return PDone.in(input.getPipeline()); } + } - private List getFilteredFields(Schema schema) { - Schema tableSchema; + /** Interface implemented by functions that sets prepared statement data. */ + @FunctionalInterface + interface PreparedStatementSetCaller extends Serializable { + void set( + Row element, + PreparedStatement preparedStatement, + int prepareStatementIndex, + SchemaUtil.FieldWithIndex schemaFieldWithIndex) + throws SQLException; + } - try (Connection connection = inner.getDataSourceProviderFn().apply(null).getConnection(); - PreparedStatement statement = - connection.prepareStatement((String.format("SELECT * FROM %s", inner.getTable())))) { - tableSchema = SchemaUtil.toBeamSchema(statement.getMetaData()); - statement.close(); - } catch (SQLException e) { - throw new RuntimeException( - "Error while determining columns from table: " + inner.getTable(), e); - } + /** + * A {@link PTransform} to write to a JDBC datasource. Executes statements one by one. + * + *

    The INSERT, UPDATE, and DELETE commands sometimes have an optional RETURNING clause that + * supports obtaining data from modified rows while they are being manipulated. Output {@link + * PCollection} of this transform is a collection of such returning results mapped by {@link + * RowMapper}. + */ + @AutoValue + public abstract static class WriteWithResults + extends PTransform, PCollection> { + abstract @Nullable SerializableFunction getDataSourceProviderFn(); - if (tableSchema.getFieldCount() < schema.getFieldCount()) { - throw new RuntimeException("Input schema has more fields than actual table."); - } + abstract @Nullable ValueProvider getStatement(); - // filter out missing fields from output table - List missingFields = - tableSchema.getFields().stream() - .filter( - line -> - schema.getFields().stream() - .noneMatch(s -> s.getName().equalsIgnoreCase(line.getName()))) - .collect(Collectors.toList()); + abstract @Nullable PreparedStatementSetter getPreparedStatementSetter(); - // allow insert only if missing fields are nullable - if (checkNullabilityForFields(missingFields)) { - throw new RuntimeException("Non nullable fields are not allowed without schema."); - } + abstract @Nullable RetryStrategy getRetryStrategy(); - List tableFilteredFields = - tableSchema.getFields().stream() - .map( - (tableField) -> { - Optional optionalSchemaField = - schema.getFields().stream() - .filter((f) -> SchemaUtil.compareSchemaField(tableField, f)) - .findFirst(); - return (optionalSchemaField.isPresent()) - ? SchemaUtil.FieldWithIndex.of( - tableField, schema.getFields().indexOf(optionalSchemaField.get())) - : null; - }) - .filter(Objects::nonNull) - .collect(Collectors.toList()); + abstract @Nullable RetryConfiguration getRetryConfiguration(); - if (tableFilteredFields.size() != schema.getFieldCount()) { - throw new RuntimeException("Provided schema doesn't match with database schema."); - } + abstract @Nullable String getTable(); - return tableFilteredFields; + abstract @Nullable RowMapper getRowMapper(); + + abstract Builder toBuilder(); + + @AutoValue.Builder + abstract static class Builder { + abstract Builder setDataSourceProviderFn( + SerializableFunction dataSourceProviderFn); + + abstract Builder setStatement(ValueProvider statement); + + abstract Builder setPreparedStatementSetter(PreparedStatementSetter setter); + + abstract Builder setRetryStrategy(RetryStrategy deadlockPredicate); + + abstract Builder setRetryConfiguration(RetryConfiguration retryConfiguration); + + abstract Builder setTable(String table); + + abstract Builder setRowMapper(RowMapper rowMapper); + + abstract WriteWithResults build(); + } + + public WriteWithResults withDataSourceConfiguration(DataSourceConfiguration config) { + return withDataSourceProviderFn(new DataSourceProviderFromDataSourceConfiguration(config)); + } + + public WriteWithResults withDataSourceProviderFn( + SerializableFunction dataSourceProviderFn) { + return toBuilder().setDataSourceProviderFn(dataSourceProviderFn).build(); + } + + public WriteWithResults withStatement(String statement) { + return withStatement(ValueProvider.StaticValueProvider.of(statement)); + } + + public WriteWithResults withStatement(ValueProvider statement) { + return toBuilder().setStatement(statement).build(); + } + + public WriteWithResults withPreparedStatementSetter(PreparedStatementSetter setter) { + return toBuilder().setPreparedStatementSetter(setter).build(); } /** - * A {@link org.apache.beam.sdk.io.jdbc.JdbcIO.PreparedStatementSetter} implementation that - * 
calls related setters on prepared statement. + * When a SQL exception occurs, {@link Write} uses this {@link RetryStrategy} to determine if it + * will retry the statements. If {@link RetryStrategy#apply(SQLException)} returns {@code true}, + * then {@link Write} retries the statements. */ - private class AutoGeneratedPreparedStatementSetter implements PreparedStatementSetter { + public WriteWithResults withRetryStrategy(RetryStrategy retryStrategy) { + checkArgument(retryStrategy != null, "retryStrategy can not be null"); + return toBuilder().setRetryStrategy(retryStrategy).build(); + } + + /** + * When a SQL exception occurs, {@link Write} uses this {@link RetryConfiguration} to + * exponentially back off and retry the statements based on the {@link RetryConfiguration} + * mentioned. + * + *

    Usage of RetryConfiguration - + * + *

    {@code
    +     * pipeline.apply(JdbcIO.write())
+     *    .withWriteResults(...)
    +     *    .withDataSourceConfiguration(...)
    +     *    .withRetryStrategy(...)
    +     *    .withRetryConfiguration(JdbcIO.RetryConfiguration.
    +     *        create(5, Duration.standardSeconds(5), Duration.standardSeconds(1))
    +     *
    +     * }
    + * + * maxDuration and initialDuration are Nullable + * + *
    {@code
    +     * pipeline.apply(JdbcIO.write())
+     *    .withWriteResults(...)
    +     *    .withDataSourceConfiguration(...)
    +     *    .withRetryStrategy(...)
    +     *    .withRetryConfiguration(JdbcIO.RetryConfiguration.
    +     *        create(5, null, null)
    +     *
    +     * }
    + */ + public WriteWithResults withRetryConfiguration(RetryConfiguration retryConfiguration) { + checkArgument(retryConfiguration != null, "retryConfiguration can not be null"); + return toBuilder().setRetryConfiguration(retryConfiguration).build(); + } - private List fields; - private SerializableFunction toRowFn; - private List preparedStatementFieldSetterList = new ArrayList<>(); + public WriteWithResults withTable(String table) { + checkArgument(table != null, "table name can not be null"); + return toBuilder().setTable(table).build(); + } - AutoGeneratedPreparedStatementSetter( - List fieldsWithIndex, SerializableFunction toRowFn) { - this.fields = fieldsWithIndex; - this.toRowFn = toRowFn; - populatePreparedStatementFieldSetter(); + public WriteWithResults withRowMapper(RowMapper rowMapper) { + checkArgument(rowMapper != null, "result set getter can not be null"); + return toBuilder().setRowMapper(rowMapper).build(); + } + + @Override + public PCollection expand(PCollection input) { + checkArgument(getStatement() != null, "withStatement() is required"); + checkArgument( + getPreparedStatementSetter() != null, "withPreparedStatementSetter() is required"); + checkArgument( + (getDataSourceProviderFn() != null), + "withDataSourceConfiguration() or withDataSourceProviderFn() is required"); + + return input.apply(ParDo.of(new WriteWithResultsFn<>(this))); + } + + private static class WriteWithResultsFn extends DoFn { + + private final WriteWithResults spec; + private DataSource dataSource; + private Connection connection; + private PreparedStatement preparedStatement; + private static FluentBackoff retryBackOff; + + public WriteWithResultsFn(WriteWithResults spec) { + this.spec = spec; } - private void populatePreparedStatementFieldSetter() { - IntStream.range(0, fields.size()) - .forEach( - (index) -> { - Schema.FieldType fieldType = fields.get(index).getField().getType(); - preparedStatementFieldSetterList.add( - JdbcUtil.getPreparedStatementSetCaller(fieldType)); - }); + @Setup + public void setup() { + dataSource = spec.getDataSourceProviderFn().apply(null); + RetryConfiguration retryConfiguration = spec.getRetryConfiguration(); + + retryBackOff = + FluentBackoff.DEFAULT + .withInitialBackoff(retryConfiguration.getInitialDuration()) + .withMaxCumulativeBackoff(retryConfiguration.getMaxDuration()) + .withMaxRetries(retryConfiguration.getMaxAttempts()); + } + + @ProcessElement + public void processElement(ProcessContext context) throws Exception { + T record = context.element(); + + // Only acquire the connection if there is something to write. 
+ if (connection == null) { + connection = dataSource.getConnection(); + connection.setAutoCommit(false); + preparedStatement = connection.prepareStatement(spec.getStatement().get()); + } + Sleeper sleeper = Sleeper.DEFAULT; + BackOff backoff = retryBackOff.backoff(); + while (true) { + try (PreparedStatement preparedStatement = + connection.prepareStatement(spec.getStatement().get())) { + try { + + try { + spec.getPreparedStatementSetter().setParameters(record, preparedStatement); + } catch (Exception e) { + throw new RuntimeException(e); + } + + // execute the statement + preparedStatement.execute(); + // commit the changes + connection.commit(); + context.output(spec.getRowMapper().mapRow(preparedStatement.getResultSet())); + return; + } catch (SQLException exception) { + if (!spec.getRetryStrategy().apply(exception)) { + throw exception; + } + LOG.warn("Deadlock detected, retrying", exception); + connection.rollback(); + if (!BackOffUtils.next(sleeper, backoff)) { + // we tried the max number of times + throw exception; + } + } + } + } + } + + @FinishBundle + public void finishBundle() throws Exception { + cleanUpStatementAndConnection(); } @Override - public void setParameters(T element, PreparedStatement preparedStatement) throws Exception { - Row row = (element instanceof Row) ? (Row) element : toRowFn.apply(element); - IntStream.range(0, fields.size()) - .forEach( - (index) -> { - try { - preparedStatementFieldSetterList - .get(index) - .set(row, preparedStatement, index, fields.get(index)); - } catch (SQLException | NullPointerException e) { - throw new RuntimeException("Error while setting data to preparedStatement", e); - } - }); + protected void finalize() throws Throwable { + cleanUpStatementAndConnection(); } - } - } - /** Interface implemented by functions that sets prepared statement data. */ - @FunctionalInterface - interface PreparedStatementSetCaller extends Serializable { - void set( - Row element, - PreparedStatement preparedStatement, - int prepareStatementIndex, - SchemaUtil.FieldWithIndex schemaFieldWithIndex) - throws SQLException; + private void cleanUpStatementAndConnection() throws Exception { + try { + if (preparedStatement != null) { + try { + preparedStatement.close(); + } finally { + preparedStatement = null; + } + } + } finally { + if (connection != null) { + try { + connection.close(); + } finally { + connection = null; + } + } + } + } + } } - /** A {@link PTransform} to write to a JDBC datasource. */ + /** + * A {@link PTransform} to write to a JDBC datasource. Executes statements in a batch, and returns + * a trivial result. 
+ */ @AutoValue public abstract static class WriteVoid extends PTransform, PCollection> { @@ -1342,18 +1739,144 @@ public WriteVoid withTable(String table) { @Override public PCollection expand(PCollection input) { - checkArgument(getStatement() != null, "withStatement() is required"); - checkArgument( - getPreparedStatementSetter() != null, "withPreparedStatementSetter() is required"); + WriteVoid spec = this; checkArgument( - (getDataSourceProviderFn() != null), + (spec.getDataSourceProviderFn() != null), "withDataSourceConfiguration() or withDataSourceProviderFn() is required"); + // fixme: validate invalid table input + if (input.hasSchema() && !spec.hasStatementAndSetter()) { + checkArgument(spec.getTable() != null, "table cannot be null if statement is not provided"); + List fields = spec.getFilteredFields(input.getSchema()); + spec = + spec.toBuilder() + .setStatement(spec.generateStatement(fields)) + .setPreparedStatementSetter( + new AutoGeneratedPreparedStatementSetter(fields, input.getToRowFunction())) + .build(); + } else { + checkArgument(spec.getStatement() != null, "withStatement() is required"); + checkArgument( + spec.getPreparedStatementSetter() != null, "withPreparedStatementSetter() is required"); + } + return input.apply(ParDo.of(new WriteFn<>(spec))); + } + + private StaticValueProvider generateStatement(List fields) { + return StaticValueProvider.of( + JdbcUtil.generateStatement( + getTable(), + fields.stream().map(FieldWithIndex::getField).collect(Collectors.toList()))); + } + + private List getFilteredFields(Schema schema) { + Schema tableSchema; + + try (Connection connection = getDataSourceProviderFn().apply(null).getConnection(); + PreparedStatement statement = + connection.prepareStatement((String.format("SELECT * FROM %s", getTable())))) { + tableSchema = SchemaUtil.toBeamSchema(statement.getMetaData()); + statement.close(); + } catch (SQLException e) { + throw new RuntimeException("Error while determining columns from table: " + getTable(), e); + } + + checkState( + tableSchema.getFieldCount() >= schema.getFieldCount(), + "Input schema has more fields than actual table."); + + // filter out missing fields from output table + List missingFields = + tableSchema.getFields().stream() + .filter( + line -> + schema.getFields().stream() + .noneMatch(s -> s.getName().equalsIgnoreCase(line.getName()))) + .collect(Collectors.toList()); + + // allow insert only if missing fields are nullable + checkState( + !checkNullabilityForFields(missingFields), + "Non nullable fields are not allowed without schema."); + + List tableFilteredFields = + tableSchema.getFields().stream() + .map( + (tableField) -> { + Optional optionalSchemaField = + schema.getFields().stream() + .filter((f) -> SchemaUtil.compareSchemaField(tableField, f)) + .findFirst(); + return optionalSchemaField + .map( + field -> + FieldWithIndex.of(tableField, schema.getFields().indexOf(field))) + .orElse(null); + }) + .filter(Objects::nonNull) + .collect(Collectors.toList()); + + checkState( + tableFilteredFields.size() == schema.getFieldCount(), + "Provided schema doesn't match with database schema."); + + return tableFilteredFields; + } + + /** + * A {@link org.apache.beam.sdk.io.jdbc.JdbcIO.PreparedStatementSetter} implementation that + * calls related setters on prepared statement. 
+ */ + private class AutoGeneratedPreparedStatementSetter implements PreparedStatementSetter { + + private final List fields; + private final SerializableFunction toRowFn; + private final List preparedStatementFieldSetterList = + new ArrayList<>(); + + AutoGeneratedPreparedStatementSetter( + List fieldsWithIndex, SerializableFunction toRowFn) { + this.fields = fieldsWithIndex; + this.toRowFn = toRowFn; + populatePreparedStatementFieldSetter(); + } + + private void populatePreparedStatementFieldSetter() { + IntStream.range(0, fields.size()) + .forEach( + (index) -> { + Schema.FieldType fieldType = fields.get(index).getField().getType(); + preparedStatementFieldSetterList.add( + JdbcUtil.getPreparedStatementSetCaller(fieldType)); + }); + } - return input.apply(ParDo.of(new WriteFn<>(this))); + @Override + public void setParameters(T element, PreparedStatement preparedStatement) throws Exception { + Row row = (element instanceof Row) ? (Row) element : toRowFn.apply(element); + IntStream.range(0, fields.size()) + .forEach( + (index) -> { + try { + preparedStatementFieldSetterList + .get(index) + .set(row, preparedStatement, index, fields.get(index)); + } catch (SQLException | NullPointerException e) { + throw new RuntimeException("Error while setting data to preparedStatement", e); + } + }); + } + } + + private boolean hasStatementAndSetter() { + return getStatement() != null && getPreparedStatementSetter() != null; } private static class WriteFn extends DoFn { + private static final Distribution RECORDS_PER_BATCH = + Metrics.distribution(WriteFn.class, "records_per_jdbc_batch"); + private static final Distribution MS_PER_BATCH = + Metrics.distribution(WriteFn.class, "milliseconds_per_batch"); private final WriteVoid spec; private DataSource dataSource; private Connection connection; @@ -1377,6 +1900,17 @@ public void setup() { .withMaxRetries(retryConfiguration.getMaxAttempts()); } + @Override + public void populateDisplayData(DisplayData.Builder builder) { + spec.populateDisplayData(builder); + builder.add( + DisplayData.item( + "query", preparedStatement == null ? "null" : preparedStatement.toString())); + builder.add( + DisplayData.item("dataSource", dataSource == null ? "null" : dataSource.toString())); + builder.add(DisplayData.item("spec", spec == null ? "null" : spec.toString())); + } + @ProcessElement public void processElement(ProcessContext context) throws Exception { T record = context.element(); @@ -1433,6 +1967,7 @@ private void executeBatch() throws SQLException, IOException, InterruptedExcepti if (records.isEmpty()) { return; } + Long startTimeNs = System.nanoTime(); // Only acquire the connection if there is something to write. 
if (connection == null) { connection = dataSource.getConnection(); @@ -1453,6 +1988,8 @@ private void executeBatch() throws SQLException, IOException, InterruptedExcepti preparedStatement.executeBatch(); // commit the changes connection.commit(); + RECORDS_PER_BATCH.update(records.size()); + MS_PER_BATCH.update(TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startTimeNs)); break; } catch (SQLException exception) { if (!spec.getRetryStrategy().apply(exception)) { diff --git a/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcSchemaIOProvider.java b/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcSchemaIOProvider.java index 22913e324bf7..dac1e78548dd 100644 --- a/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcSchemaIOProvider.java +++ b/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcSchemaIOProvider.java @@ -139,7 +139,6 @@ public PTransform, PDone> buildWriter() { return new PTransform, PDone>() { @Override public PDone expand(PCollection input) { - // TODO: BEAM-10396 use writeRows() when it's available return input.apply( JdbcIO.write() .withDataSourceConfiguration(getDataSourceConfiguration()) diff --git a/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcUtil.java b/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcUtil.java index f6303b168849..dff05f4af1c6 100644 --- a/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcUtil.java +++ b/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcUtil.java @@ -23,15 +23,25 @@ import java.sql.SQLException; import java.sql.Time; import java.sql.Timestamp; +import java.util.Arrays; import java.util.Calendar; import java.util.EnumMap; import java.util.List; import java.util.Map; +import java.util.Objects; import java.util.TimeZone; import java.util.stream.Collectors; import java.util.stream.IntStream; +import javax.sql.DataSource; +import org.apache.beam.sdk.coders.StringUtf8Coder; import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PBegin; import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Splitter; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.joda.time.DateTime; @@ -252,4 +262,72 @@ private static Calendar withTimestampAndTimezone(DateTime dateTime) { return calendar; } + + /** Create partitions on a table. 
*/ + static class PartitioningFn extends DoFn, KV> { + @ProcessElement + public void processElement(ProcessContext c) { + List params = c.element(); + Integer lowerBound = params.get(0); + Integer upperBound = params.get(1); + Integer numPartitions = params.get(2); + if (lowerBound > upperBound) { + throw new RuntimeException( + String.format( + "Lower bound [%s] is higher than upper bound [%s]", lowerBound, upperBound)); + } + int stride = (upperBound - lowerBound) / numPartitions + 1; + for (int i = lowerBound; i < upperBound - stride; i += stride) { + String range = String.format("%s,%s", i, i + stride); + KV kvRange = KV.of(range, 1); + c.output(kvRange); + } + if (upperBound - lowerBound > stride * (numPartitions - 1)) { + int indexFrom = (numPartitions - 1) * stride; + int indexTo = upperBound + 1; + String range = String.format("%s,%s", indexFrom, indexTo); + KV kvRange = KV.of(range, 1); + c.output(kvRange); + } + } + } + + /** + * Select maximal and minimal value from a table by partitioning column. + * + * @return pair of integers corresponds to the upper and lower bounds. + */ + static Integer[] getBounds( + PBegin input, + String table, + SerializableFunction providerFunctionFn, + String partitionColumn) { + final Integer[] bounds = {0, 0}; + input + .apply( + String.format("Read min and max value by %s", partitionColumn), + JdbcIO.read() + .withDataSourceProviderFn(providerFunctionFn) + .withQuery( + String.format("select min(%1$s), max(%1$s) from %2$s", partitionColumn, table)) + .withRowMapper( + (JdbcIO.RowMapper) + resultSet -> + String.join( + ",", Arrays.asList(resultSet.getString(1), resultSet.getString(2)))) + .withOutputParallelization(false) + .withCoder(StringUtf8Coder.of())) + .apply( + ParDo.of( + new DoFn() { + @ProcessElement + public void processElement(ProcessContext context) { + List elements = Splitter.on(',').splitToList(context.element()); + bounds[0] = Integer.parseInt(Objects.requireNonNull(elements.get(0))); + bounds[1] = Integer.parseInt(Objects.requireNonNull(elements.get(1))); + context.output(context.element()); + } + })); + return bounds; + } } diff --git a/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcWriteResult.java b/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcWriteResult.java new file mode 100644 index 000000000000..3117c2459ba2 --- /dev/null +++ b/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcWriteResult.java @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.jdbc; + +import com.google.auto.value.AutoValue; + +/** The result of writing a row to JDBC datasource. 
*/ +@AutoValue +public abstract class JdbcWriteResult { + public static JdbcWriteResult create() { + return new AutoValue_JdbcWriteResult(); + } +} diff --git a/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/LogicalTypes.java b/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/LogicalTypes.java index 27d886527799..2e5872883805 100644 --- a/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/LogicalTypes.java +++ b/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/LogicalTypes.java @@ -265,7 +265,13 @@ private FixedPrecisionNumeric( @Override public BigDecimal toInputType(BigDecimal base) { - checkArgument(base == null || (base.precision() == precision && base.scale() == scale)); + checkArgument( + base == null || (base.precision() == precision && base.scale() == scale), + "Expected BigDecimal base to be null or have precision = %s (was %s), scale = %s (was %s)", + precision, + base.precision(), + scale, + base.scale()); return base; } } diff --git a/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/SchemaUtil.java b/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/SchemaUtil.java index 921fe82d9e65..0cc7c2693bac 100644 --- a/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/SchemaUtil.java +++ b/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/SchemaUtil.java @@ -53,6 +53,7 @@ import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.annotations.Experimental.Kind; import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.FieldType; import org.apache.beam.sdk.values.Row; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.joda.time.DateTime; @@ -197,10 +198,19 @@ private static BeamFieldConverter beamLogicalField( }; } - /** Converts numeric fields with specified precision and scale. */ + /** + * Converts numeric fields with specified precision and scale to {@link + * LogicalTypes.FixedPrecisionNumeric}. If a precision of numeric field is not specified, then + * converts such field to {@link FieldType#DECIMAL}. + */ private static BeamFieldConverter beamLogicalNumericField(String identifier) { return (index, md) -> { int precision = md.getPrecision(index); + if (precision == Integer.MAX_VALUE || precision == -1) { + // If a precision is not specified, the column stores values as given (e.g. 
in Oracle DB) + return Schema.Field.of(md.getColumnLabel(index), FieldType.DECIMAL) + .withNullable(md.isNullable(index) == ResultSetMetaData.columnNullable); + } int scale = md.getScale(index); Schema.FieldType fieldType = Schema.FieldType.logicalType( diff --git a/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcIOIT.java b/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcIOIT.java index 476c95321e51..45eb09418a02 100644 --- a/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcIOIT.java +++ b/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcIOIT.java @@ -17,11 +17,14 @@ */ package org.apache.beam.sdk.io.jdbc; +import static org.apache.beam.sdk.io.common.DatabaseTestHelper.assertRowCount; +import static org.apache.beam.sdk.io.common.DatabaseTestHelper.getTestDataToWrite; import static org.apache.beam.sdk.io.common.IOITHelper.executeWithRetry; import static org.apache.beam.sdk.io.common.IOITHelper.readIOTestPipelineOptions; import com.google.cloud.Timestamp; import java.sql.SQLException; +import java.util.ArrayList; import java.util.HashSet; import java.util.List; import java.util.Optional; @@ -44,8 +47,10 @@ import org.apache.beam.sdk.testutils.publishing.InfluxDBSettings; import org.apache.beam.sdk.transforms.Combine; import org.apache.beam.sdk.transforms.Count; +import org.apache.beam.sdk.transforms.Create; import org.apache.beam.sdk.transforms.ParDo; import org.apache.beam.sdk.transforms.Top; +import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollection; import org.junit.AfterClass; import org.junit.BeforeClass; @@ -77,11 +82,9 @@ * performance testing framework. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class JdbcIOIT { + private static final int EXPECTED_ROW_COUNT = 1000; private static final String NAMESPACE = JdbcIOIT.class.getName(); private static int numberOfRows; private static PGSimpleDataSource dataSource; @@ -258,4 +261,54 @@ private PipelineResult runRead() { return pipelineRead.run(); } + + @Test + public void testWriteWithWriteResults() throws Exception { + String firstTableName = DatabaseTestHelper.getTestTableName("UT_WRITE"); + DatabaseTestHelper.createTable(dataSource, firstTableName); + try { + ArrayList> data = getTestDataToWrite(EXPECTED_ROW_COUNT); + + PCollection> dataCollection = pipelineWrite.apply(Create.of(data)); + PCollection resultSetCollection = + dataCollection.apply( + getJdbcWriteWithReturning(firstTableName) + .withWriteResults( + (resultSet -> { + if (resultSet != null && resultSet.next()) { + return new JdbcTestHelper.TestDto(resultSet.getInt(1)); + } + return new JdbcTestHelper.TestDto(JdbcTestHelper.TestDto.EMPTY_RESULT); + }))); + resultSetCollection.setCoder(JdbcTestHelper.TEST_DTO_CODER); + + List expectedResult = new ArrayList<>(); + for (int id = 0; id < EXPECTED_ROW_COUNT; id++) { + expectedResult.add(new JdbcTestHelper.TestDto(id)); + } + + PAssert.that(resultSetCollection).containsInAnyOrder(expectedResult); + + pipelineWrite.run(); + + assertRowCount(dataSource, firstTableName, EXPECTED_ROW_COUNT); + } finally { + DatabaseTestHelper.deleteTable(dataSource, firstTableName); + } + } + + /** + * @return {@link JdbcIO.Write} transform that writes to {@param tableName} Postgres table and + * returns all fields of modified rows. 
+ */ + private static JdbcIO.Write> getJdbcWriteWithReturning(String tableName) { + return JdbcIO.>write() + .withDataSourceProviderFn(voidInput -> dataSource) + .withStatement(String.format("insert into %s values(?, ?) returning *", tableName)) + .withPreparedStatementSetter( + (element, statement) -> { + statement.setInt(1, element.getKey()); + statement.setString(2, element.getValue()); + }); + } } diff --git a/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcIOTest.java b/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcIOTest.java index d714d94c9023..31ec663573c1 100644 --- a/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcIOTest.java +++ b/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcIOTest.java @@ -17,12 +17,15 @@ */ package org.apache.beam.sdk.io.jdbc; +import static java.sql.JDBCType.NUMERIC; +import static org.apache.beam.sdk.io.common.DatabaseTestHelper.assertRowCount; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsString; import static org.hamcrest.Matchers.instanceOf; import static org.hamcrest.Matchers.not; import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertSame; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertThrows; import static org.junit.Assert.assertTrue; import static org.mockito.ArgumentMatchers.anyString; @@ -40,7 +43,6 @@ import java.sql.Date; import java.sql.JDBCType; import java.sql.PreparedStatement; -import java.sql.ResultSet; import java.sql.SQLException; import java.sql.Statement; import java.sql.Time; @@ -62,7 +64,9 @@ import org.apache.beam.sdk.io.common.TestRow; import org.apache.beam.sdk.io.jdbc.JdbcIO.DataSourceConfiguration; import org.apache.beam.sdk.io.jdbc.JdbcIO.PoolableDataSourceProvider; +import org.apache.beam.sdk.io.jdbc.LogicalTypes.FixedPrecisionNumeric; import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.FieldType; import org.apache.beam.sdk.schemas.transforms.Select; import org.apache.beam.sdk.testing.ExpectedLogs; import org.apache.beam.sdk.testing.PAssert; @@ -94,9 +98,6 @@ /** Test on the JdbcIO. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class JdbcIOTest implements Serializable { private static final Logger LOG = LoggerFactory.getLogger(JdbcIOTest.class); private static final DataSourceConfiguration DATA_SOURCE_CONFIGURATION = @@ -276,6 +277,38 @@ public void testReadRowsWithDataSourceConfiguration() { pipeline.run(); } + @Test + public void testReadRowsWithNumericFields() { + PCollection rows = + pipeline.apply( + JdbcIO.readRows() + .withDataSourceConfiguration(DATA_SOURCE_CONFIGURATION) + .withQuery( + String.format( + "SELECT CAST(1 AS NUMERIC(1, 0)) AS T1 FROM %s WHERE name = ?", + READ_TABLE_NAME)) + .withStatementPreparator( + preparedStatement -> + preparedStatement.setString(1, TestRow.getNameForSeed(1)))); + + Schema expectedSchema = + Schema.of( + Schema.Field.of( + "T1", + FieldType.logicalType(FixedPrecisionNumeric.of(NUMERIC.getName(), 1, 0)) + .withNullable(false))); + + assertEquals(expectedSchema, rows.getSchema()); + + PCollection output = rows.apply(Select.fieldNames("T1")); + PAssert.that(output) + .containsInAnyOrder( + ImmutableList.of( + Row.withSchema(expectedSchema).addValues(BigDecimal.valueOf(1)).build())); + + pipeline.run(); + } + @Test public void testReadRowsWithoutStatementPreparator() { SerializableFunction dataSourceProvider = ignored -> DATA_SOURCE; @@ -337,6 +370,94 @@ public void testReadWithSchema() { pipeline.run(); } + @Test + public void testReadWithPartitions() { + PCollection rows = + pipeline.apply( + JdbcIO.readWithPartitions() + .withDataSourceConfiguration(DATA_SOURCE_CONFIGURATION) + .withRowMapper(new JdbcTestHelper.CreateTestRowOfNameAndId()) + .withCoder(SerializableCoder.of(TestRow.class)) + .withTable(READ_TABLE_NAME) + .withNumPartitions(1) + .withPartitionColumn("id") + .withLowerBound(0) + .withUpperBound(1000)); + PAssert.thatSingleton(rows.apply("Count All", Count.globally())).isEqualTo(1000L); + pipeline.run(); + } + + @Test + public void testReadWithPartitionsBySubqery() { + PCollection rows = + pipeline.apply( + JdbcIO.readWithPartitions() + .withDataSourceConfiguration(DATA_SOURCE_CONFIGURATION) + .withRowMapper(new JdbcTestHelper.CreateTestRowOfNameAndId()) + .withCoder(SerializableCoder.of(TestRow.class)) + .withTable(String.format("(select * from %s) as subq", READ_TABLE_NAME)) + .withNumPartitions(10) + .withPartitionColumn("id") + .withLowerBound(0) + .withUpperBound(1000)); + PAssert.thatSingleton(rows.apply("Count All", Count.globally())).isEqualTo(1000L); + pipeline.run(); + } + + @Test + public void testIfNumPartitionsIsZero() { + thrown.expect(IllegalArgumentException.class); + thrown.expectMessage("numPartitions can not be less than 1"); + PCollection rows = + pipeline.apply( + JdbcIO.readWithPartitions() + .withDataSourceConfiguration(DATA_SOURCE_CONFIGURATION) + .withRowMapper(new JdbcTestHelper.CreateTestRowOfNameAndId()) + .withCoder(SerializableCoder.of(TestRow.class)) + .withTable(READ_TABLE_NAME) + .withNumPartitions(0) + .withPartitionColumn("id") + .withLowerBound(0) + .withUpperBound(1000)); + pipeline.run(); + } + + @Test + public void testNumPartitionsMoreThanTotalRows() { + thrown.expect(IllegalArgumentException.class); + thrown.expectMessage( + "The specified number of partitions is more than the difference between upper bound and lower bound"); + pipeline.apply( + JdbcIO.readWithPartitions() + .withDataSourceConfiguration(DATA_SOURCE_CONFIGURATION) + .withRowMapper(new 
JdbcTestHelper.CreateTestRowOfNameAndId()) + .withCoder(SerializableCoder.of(TestRow.class)) + .withTable(READ_TABLE_NAME) + .withNumPartitions(200) + .withPartitionColumn("id") + .withLowerBound(0) + .withUpperBound(100)); + pipeline.run(); + } + + @Test + public void testLowerBoundIsMoreThanUpperBound() { + thrown.expect(IllegalArgumentException.class); + thrown.expectMessage( + "The lower bound of partitioning column is larger or equal than the upper bound"); + pipeline.apply( + JdbcIO.readWithPartitions() + .withDataSourceConfiguration(DATA_SOURCE_CONFIGURATION) + .withRowMapper(new JdbcTestHelper.CreateTestRowOfNameAndId()) + .withCoder(SerializableCoder.of(TestRow.class)) + .withTable(READ_TABLE_NAME) + .withNumPartitions(5) + .withPartitionColumn("id") + .withLowerBound(100) + .withUpperBound(100)); + pipeline.run(); + } + @Test public void testWrite() throws Exception { String tableName = DatabaseTestHelper.getTestTableName("UT_WRITE"); @@ -347,12 +468,47 @@ public void testWrite() throws Exception { pipeline.run(); - assertRowCount(tableName, EXPECTED_ROW_COUNT); + assertRowCount(DATA_SOURCE, tableName, EXPECTED_ROW_COUNT); } finally { DatabaseTestHelper.deleteTable(DATA_SOURCE, tableName); } } + @Test + public void testWriteWithWriteResults() throws Exception { + String firstTableName = DatabaseTestHelper.getTestTableName("UT_WRITE"); + DatabaseTestHelper.createTable(DATA_SOURCE, firstTableName); + try { + ArrayList> data = getDataToWrite(EXPECTED_ROW_COUNT); + + PCollection> dataCollection = pipeline.apply(Create.of(data)); + PCollection resultSetCollection = + dataCollection.apply( + getJdbcWrite(firstTableName) + .withWriteResults( + (resultSet -> { + if (resultSet != null && resultSet.next()) { + return new JdbcTestHelper.TestDto(resultSet.getInt(1)); + } + return new JdbcTestHelper.TestDto(JdbcTestHelper.TestDto.EMPTY_RESULT); + }))); + resultSetCollection.setCoder(JdbcTestHelper.TEST_DTO_CODER); + + List expectedResult = new ArrayList<>(); + for (int i = 0; i < EXPECTED_ROW_COUNT; i++) { + expectedResult.add(new JdbcTestHelper.TestDto(JdbcTestHelper.TestDto.EMPTY_RESULT)); + } + + PAssert.that(resultSetCollection).containsInAnyOrder(expectedResult); + + pipeline.run(); + + assertRowCount(DATA_SOURCE, firstTableName, EXPECTED_ROW_COUNT); + } finally { + DatabaseTestHelper.deleteTable(DATA_SOURCE, firstTableName); + } + } + @Test public void testWriteWithResultsAndWaitOn() throws Exception { String firstTableName = DatabaseTestHelper.getTestTableName("UT_WRITE"); @@ -369,8 +525,8 @@ public void testWriteWithResultsAndWaitOn() throws Exception { pipeline.run(); - assertRowCount(firstTableName, EXPECTED_ROW_COUNT); - assertRowCount(secondTableName, EXPECTED_ROW_COUNT); + assertRowCount(DATA_SOURCE, firstTableName, EXPECTED_ROW_COUNT); + assertRowCount(DATA_SOURCE, secondTableName, EXPECTED_ROW_COUNT); } finally { DatabaseTestHelper.deleteTable(DATA_SOURCE, firstTableName); } @@ -388,6 +544,13 @@ private static JdbcIO.Write> getJdbcWrite(String tableName) }); } + private static JdbcIO.Write getJdbcWriteWithoutStatement(String tableName) { + return JdbcIO.write() + .withDataSourceConfiguration(DATA_SOURCE_CONFIGURATION) + .withBatchSize(10L) + .withTable(tableName); + } + private static ArrayList> getDataToWrite(long rowsToAdd) { ArrayList> data = new ArrayList<>(); for (int i = 0; i < rowsToAdd; i++) { @@ -397,18 +560,6 @@ private static ArrayList> getDataToWrite(long rowsToAdd) { return data; } - private static void assertRowCount(String tableName, int expectedRowCount) throws 
SQLException { - try (Connection connection = DATA_SOURCE.getConnection()) { - try (Statement statement = connection.createStatement()) { - try (ResultSet resultSet = statement.executeQuery("select count(*) from " + tableName)) { - resultSet.next(); - int count = resultSet.getInt(1); - assertEquals(expectedRowCount, count); - } - } - } - } - @Test public void testWriteWithBackoff() throws Exception { String tableName = DatabaseTestHelper.getTestTableName("UT_WRITE_BACKOFF"); @@ -462,10 +613,10 @@ public void testWriteWithBackoff() throws Exception { pipeline.run(); commitThread.join(); - // we verify the the backoff has been called thanks to the log message + // we verify that the backoff has been called thanks to the log message expectedLogs.verifyWarn("Deadlock detected, retrying"); - assertRowCount(tableName, 2); + assertRowCount(DATA_SOURCE, tableName, 2); } @Test @@ -517,7 +668,7 @@ public void testWriteWithoutPreparedStatement() throws Exception { .withBatchSize(10L) .withTable(tableName)); pipeline.run(); - assertRowCount(tableName, rowsToAdd); + assertRowCount(DATA_SOURCE, tableName, rowsToAdd); } finally { DatabaseTestHelper.deleteTable(DATA_SOURCE, tableName); } @@ -600,7 +751,7 @@ public void testWriteWithoutPreparedStatementAndNonRowType() throws Exception { .withBatchSize(10L) .withTable(tableName)); pipeline.run(); - assertRowCount(tableName, rowsToAdd); + assertRowCount(DATA_SOURCE, tableName, rowsToAdd); } finally { DatabaseTestHelper.deleteTable(DATA_SOURCE, tableName); } @@ -899,6 +1050,48 @@ protected boolean matchesSafely(Iterable logRecords) { }); // Since the pipeline was unable to write, only the row from insertStatement was written. - assertRowCount(tableName, 1); + assertRowCount(DATA_SOURCE, tableName, 1); + } + + @Test + public void testDefaultRetryStrategy() { + final JdbcIO.RetryStrategy strategy = new JdbcIO.DefaultRetryStrategy(); + assertTrue(strategy.apply(new SQLException("SQL deadlock", "40001"))); + assertTrue(strategy.apply(new SQLException("PostgreSQL deadlock", "40P01"))); + assertFalse(strategy.apply(new SQLException("Other code", "40X01"))); + } + + @Test + public void testWriteRowsResultsAndWaitOn() throws Exception { + String firstTableName = DatabaseTestHelper.getTestTableName("UT_WRITE_ROWS_PS"); + String secondTableName = DatabaseTestHelper.getTestTableName("UT_WRITE_ROWS_PS_AFTER_WAIT"); + DatabaseTestHelper.createTable(DATA_SOURCE, firstTableName); + DatabaseTestHelper.createTable(DATA_SOURCE, secondTableName); + + Schema.Builder schemaBuilder = Schema.builder(); + schemaBuilder.addField(Schema.Field.of("id", Schema.FieldType.INT32)); + schemaBuilder.addField(Schema.Field.of("name", Schema.FieldType.STRING)); + Schema schema = schemaBuilder.build(); + try { + ArrayList data = getRowsToWrite(EXPECTED_ROW_COUNT, schema); + + PCollection dataCollection = pipeline.apply(Create.of(data)); + PCollection rowsWritten = + dataCollection + .setRowSchema(schema) + .apply(getJdbcWriteWithoutStatement(firstTableName).withResults()); + dataCollection + .apply(Wait.on(rowsWritten)) + .setRowSchema(schema) // setRowSchema must be after .apply(Wait.on()) + .apply(getJdbcWriteWithoutStatement(secondTableName)); + + pipeline.run(); + + assertRowCount(DATA_SOURCE, firstTableName, EXPECTED_ROW_COUNT); + assertRowCount(DATA_SOURCE, secondTableName, EXPECTED_ROW_COUNT); + } finally { + DatabaseTestHelper.deleteTable(DATA_SOURCE, firstTableName); + DatabaseTestHelper.deleteTable(DATA_SOURCE, secondTableName); + } } } diff --git 
a/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcTestHelper.java b/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcTestHelper.java index f7f2363548ab..081f8af68b48 100644 --- a/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcTestHelper.java +++ b/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcTestHelper.java @@ -17,20 +17,74 @@ */ package org.apache.beam.sdk.io.jdbc; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.io.Serializable; import java.sql.PreparedStatement; import java.sql.ResultSet; import java.sql.SQLException; +import java.util.Objects; +import org.apache.beam.sdk.coders.BigEndianIntegerCoder; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.CoderException; +import org.apache.beam.sdk.coders.CustomCoder; import org.apache.beam.sdk.io.common.TestRow; /** * Contains Test helper methods used by both Integration and Unit Tests in {@link * org.apache.beam.sdk.io.jdbc.JdbcIO}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) class JdbcTestHelper { + public static class TestDto extends JdbcWriteResult implements Serializable { + + public static final int EMPTY_RESULT = 0; + + private int simpleField; + + public TestDto(int simpleField) { + this.simpleField = simpleField; + } + + @Override + public boolean equals(Object o) { + if (this == o) { + return true; + } + if (o == null || getClass() != o.getClass()) { + return false; + } + TestDto testDto = (TestDto) o; + return simpleField == testDto.simpleField; + } + + @Override + public int hashCode() { + return Objects.hash(simpleField); + } + } + + public static final Coder TEST_DTO_CODER = + new CustomCoder() { + @Override + public void encode(TestDto value, OutputStream outStream) + throws CoderException, IOException { + BigEndianIntegerCoder.of().encode(value.simpleField, outStream); + } + + @Override + public TestDto decode(InputStream inStream) throws CoderException, IOException { + int simpleField = BigEndianIntegerCoder.of().decode(inStream); + return new TestDto(simpleField); + } + + @Override + public Object structuralValue(TestDto v) { + return v; + } + }; + static class CreateTestRowOfNameAndId implements JdbcIO.RowMapper { @Override public TestRow mapRow(ResultSet resultSet) throws Exception { diff --git a/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/SchemaUtilTest.java b/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/SchemaUtilTest.java index dd3e36b76497..18acf049ee3c 100644 --- a/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/SchemaUtilTest.java +++ b/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/SchemaUtilTest.java @@ -37,6 +37,7 @@ import java.sql.Timestamp; import java.sql.Types; import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.utils.AvroUtils; import org.apache.beam.sdk.values.Row; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.joda.time.DateTime; @@ -48,9 +49,6 @@ /** Test SchemaUtils. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SchemaUtilTest { @Test public void testToBeamSchema() throws SQLException { @@ -226,6 +224,65 @@ public void testBeamRowMapperPrimitiveTypes() throws Exception { assertEquals(wantRow, haveRow); } + @Test + public void testJdbcLogicalTypesMapValidAvroSchemaIT() { + String expectedAvroSchema = + "{" + + " \"type\": \"record\"," + + " \"name\": \"topLevelRecord\"," + + " \"fields\": [{" + + " \"name\": \"longvarchar_col\"," + + " \"type\": {" + + " \"type\": \"string\"," + + " \"logicalType\": \"varchar\"," + + " \"maxLength\": 50" + + " }" + + " }, {" + + " \"name\": \"varchar_col\"," + + " \"type\": {" + + " \"type\": \"string\"," + + " \"logicalType\": \"varchar\"," + + " \"maxLength\": 15" + + " }" + + " }, {" + + " \"name\": \"fixedlength_char_col\"," + + " \"type\": {" + + " \"type\": \"string\"," + + " \"logicalType\": \"char\"," + + " \"maxLength\": 25" + + " }" + + " }, {" + + " \"name\": \"date_col\"," + + " \"type\": {" + + " \"type\": \"int\"," + + " \"logicalType\": \"date\"" + + " }" + + " }, {" + + " \"name\": \"time_col\"," + + " \"type\": {" + + " \"type\": \"int\"," + + " \"logicalType\": \"time-millis\"" + + " }" + + " }]" + + "}"; + + Schema jdbcRowSchema = + Schema.builder() + .addField( + "longvarchar_col", LogicalTypes.variableLengthString(JDBCType.LONGVARCHAR, 50)) + .addField("varchar_col", LogicalTypes.variableLengthString(JDBCType.VARCHAR, 15)) + .addField("fixedlength_char_col", LogicalTypes.fixedLengthString(JDBCType.CHAR, 25)) + .addField("date_col", LogicalTypes.JDBC_DATE_TYPE) + .addField("time_col", LogicalTypes.JDBC_TIME_TYPE) + .build(); + + System.out.println(AvroUtils.toAvroSchema(jdbcRowSchema)); + + assertEquals( + new org.apache.avro.Schema.Parser().parse(expectedAvroSchema), + AvroUtils.toAvroSchema(jdbcRowSchema)); + } + @Test public void testBeamRowMapperDateTime() throws Exception { long epochMilli = 1558719710000L; diff --git a/sdks/java/io/jms/build.gradle b/sdks/java/io/jms/build.gradle index 657bed432fae..3f9d0c719ec6 100644 --- a/sdks/java/io/jms/build.gradle +++ b/sdks/java/io/jms/build.gradle @@ -36,8 +36,7 @@ dependencies { testCompile library.java.activemq_kahadb_store testCompile library.java.activemq_client testCompile library.java.junit - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library + testCompile library.java.mockito_core testRuntimeOnly library.java.slf4j_jdk14 testRuntimeOnly project(path: ":runners:direct-java", configuration: "shadow") } diff --git a/sdks/java/io/jms/src/main/java/org/apache/beam/sdk/io/jms/AutoScaler.java b/sdks/java/io/jms/src/main/java/org/apache/beam/sdk/io/jms/AutoScaler.java new file mode 100644 index 000000000000..0e023d1aae09 --- /dev/null +++ b/sdks/java/io/jms/src/main/java/org/apache/beam/sdk/io/jms/AutoScaler.java @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.jms; + +import java.io.Serializable; +import org.apache.beam.sdk.io.UnboundedSource; + +/** + * Enables users to specify their own `JMS` backlog reporters enabling {@link JmsIO} to report + * {@link UnboundedSource.UnboundedReader#getTotalBacklogBytes()}. + */ +public interface AutoScaler extends Serializable { + + /** The {@link AutoScaler} is started when the {@link JmsIO.UnboundedJmsReader} is started. */ + void start(); + + /** + * Returns the size of the backlog of unread data in the underlying data source represented by all + * splits of this source. + */ + long getTotalBacklogBytes(); + + /** The {@link AutoScaler} is stopped when the {@link JmsIO.UnboundedJmsReader} is closed. */ + void stop(); +} diff --git a/sdks/java/io/jms/src/main/java/org/apache/beam/sdk/io/jms/DefaultAutoscaler.java b/sdks/java/io/jms/src/main/java/org/apache/beam/sdk/io/jms/DefaultAutoscaler.java new file mode 100644 index 000000000000..2b05cf630bf8 --- /dev/null +++ b/sdks/java/io/jms/src/main/java/org/apache/beam/sdk/io/jms/DefaultAutoscaler.java @@ -0,0 +1,37 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.jms; + +import static org.apache.beam.sdk.io.UnboundedSource.UnboundedReader.BACKLOG_UNKNOWN; + +/** + * Default implementation of {@link AutoScaler}. Returns {@link + * org.apache.beam.sdk.io.UnboundedSource.UnboundedReader#BACKLOG_UNKNOWN} as the default value. 
+ */ +public class DefaultAutoscaler implements AutoScaler { + @Override + public void start() {} + + @Override + public long getTotalBacklogBytes() { + return BACKLOG_UNKNOWN; + } + + @Override + public void stop() {} +} diff --git a/sdks/java/io/jms/src/main/java/org/apache/beam/sdk/io/jms/JmsIO.java b/sdks/java/io/jms/src/main/java/org/apache/beam/sdk/io/jms/JmsIO.java index 4999e10ee9e8..9fa4492cf235 100644 --- a/sdks/java/io/jms/src/main/java/org/apache/beam/sdk/io/jms/JmsIO.java +++ b/sdks/java/io/jms/src/main/java/org/apache/beam/sdk/io/jms/JmsIO.java @@ -196,6 +196,8 @@ public abstract static class Read extends PTransform> abstract @Nullable Coder getCoder(); + abstract @Nullable AutoScaler getAutoScaler(); + abstract Builder builder(); @AutoValue.Builder @@ -218,6 +220,8 @@ abstract static class Builder { abstract Builder setCoder(Coder coder); + abstract Builder setAutoScaler(AutoScaler autoScaler); + abstract Read build(); } @@ -344,6 +348,14 @@ public Read withCoder(Coder coder) { return builder().setCoder(coder).build(); } + /** + * Sets the {@link AutoScaler} to use for reporting backlog during the execution of this source. + */ + public Read withAutoScaler(AutoScaler autoScaler) { + checkArgument(autoScaler != null, "autoScaler can not be null"); + return builder().setAutoScaler(autoScaler).build(); + } + @Override public PCollection expand(PBegin input) { checkArgument(getConnectionFactory() != null, "withConnectionFactory() is required"); @@ -447,6 +459,7 @@ static class UnboundedJmsReader extends UnboundedReader { private Connection connection; private Session session; private MessageConsumer consumer; + private AutoScaler autoScaler; private T currentMessage; private Instant currentTimestamp; @@ -474,6 +487,12 @@ public boolean start() throws IOException { } connection.start(); this.connection = connection; + if (spec.getAutoScaler() == null) { + this.autoScaler = new DefaultAutoscaler(); + } else { + this.autoScaler = spec.getAutoScaler(); + } + this.autoScaler.start(); } catch (Exception e) { throw new IOException("Error connecting to JMS", e); } @@ -544,6 +563,11 @@ public CheckpointMark getCheckpointMark() { return checkpointMark; } + @Override + public long getTotalBacklogBytes() { + return this.autoScaler.getTotalBacklogBytes(); + } + @Override public UnboundedSource getCurrentSource() { return source; @@ -565,6 +589,10 @@ public void close() throws IOException { connection.close(); connection = null; } + if (autoScaler != null) { + autoScaler.stop(); + autoScaler = null; + } } catch (Exception e) { throw new IOException(e); } diff --git a/sdks/java/io/jms/src/test/java/org/apache/beam/sdk/io/jms/JmsIOTest.java b/sdks/java/io/jms/src/test/java/org/apache/beam/sdk/io/jms/JmsIOTest.java index 8b13f0966291..a9f3c3f004ef 100644 --- a/sdks/java/io/jms/src/test/java/org/apache/beam/sdk/io/jms/JmsIOTest.java +++ b/sdks/java/io/jms/src/test/java/org/apache/beam/sdk/io/jms/JmsIOTest.java @@ -17,12 +17,17 @@ */ package org.apache.beam.sdk.io.jms; +import static org.apache.beam.sdk.io.UnboundedSource.UnboundedReader.BACKLOG_UNKNOWN; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsString; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNull; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.junit.Assert.fail; +import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.times; +import static org.mockito.Mockito.verify; +import 
static org.mockito.Mockito.when; import java.io.IOException; import java.lang.reflect.Proxy; @@ -70,7 +75,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class JmsIOTest { @@ -422,6 +426,50 @@ public void testCheckpointMarkDefaultCoder() throws Exception { CoderProperties.coderDecodeEncodeEqual(coder, jmsCheckpointMark); } + @Test + public void testDefaultAutoscaler() throws IOException { + JmsIO.Read spec = + JmsIO.read() + .withConnectionFactory(connectionFactory) + .withUsername(USERNAME) + .withPassword(PASSWORD) + .withQueue(QUEUE); + JmsIO.UnboundedJmsSource source = new JmsIO.UnboundedJmsSource(spec); + JmsIO.UnboundedJmsReader reader = source.createReader(null, null); + + // start the reader and check getSplitBacklogBytes and getTotalBacklogBytes values + reader.start(); + assertEquals(BACKLOG_UNKNOWN, reader.getSplitBacklogBytes()); + assertEquals(BACKLOG_UNKNOWN, reader.getTotalBacklogBytes()); + reader.close(); + } + + @Test + public void testCustomAutoscaler() throws IOException { + long excpectedTotalBacklogBytes = 1111L; + + AutoScaler autoScaler = mock(DefaultAutoscaler.class); + when(autoScaler.getTotalBacklogBytes()).thenReturn(excpectedTotalBacklogBytes); + JmsIO.Read spec = + JmsIO.read() + .withConnectionFactory(connectionFactory) + .withUsername(USERNAME) + .withPassword(PASSWORD) + .withQueue(QUEUE) + .withAutoScaler(autoScaler); + + JmsIO.UnboundedJmsSource source = new JmsIO.UnboundedJmsSource(spec); + JmsIO.UnboundedJmsReader reader = source.createReader(null, null); + + // start the reader and check getSplitBacklogBytes and getTotalBacklogBytes values + reader.start(); + verify(autoScaler, times(1)).start(); + assertEquals(excpectedTotalBacklogBytes, reader.getTotalBacklogBytes()); + verify(autoScaler, times(1)).getTotalBacklogBytes(); + reader.close(); + verify(autoScaler, times(1)).stop(); + } + private int count(String queue) throws Exception { Connection connection = connectionFactory.createConnection(USERNAME, PASSWORD); connection.start(); diff --git a/sdks/java/io/kafka/build.gradle b/sdks/java/io/kafka/build.gradle index a500ddd49fb5..7a4ca21f1b06 100644 --- a/sdks/java/io/kafka/build.gradle +++ b/sdks/java/io/kafka/build.gradle @@ -48,7 +48,10 @@ kafkaVersions.each{k,v -> configurations.create("kafkaVersion$k")} dependencies { compile library.java.vendored_guava_26_0_jre compile project(path: ":sdks:java:core", configuration: "shadow") + compile project(":runners:core-construction-java") compile project(":sdks:java:expansion-service") + permitUnusedDeclared project(":sdks:java:expansion-service") // BEAM-11761 + compile library.java.avro // Get back to "provided" since 2.14 provided library.java.kafka_clients compile library.java.slf4j_api @@ -57,6 +60,13 @@ dependencies { compile library.java.jackson_databind compile "org.springframework:spring-expression:4.3.18.RELEASE" compile ("io.confluent:kafka-avro-serializer:5.3.2") { + // zookeeper depends on "spotbugs-annotations:3.1.9" which clashes with current + // "spotbugs-annotations:3.1.12" used in Beam. Not required. + exclude group: "org.apache.zookeeper", module: "zookeeper" + // "kafka-clients" has to be provided since user can use its own version. 
+ exclude group: "org.apache.kafka", module: "kafka-clients" + } + compile ("io.confluent:kafka-schema-registry-client:5.3.2") { // It depends on "spotbugs-annotations:3.1.9" which clashes with current // "spotbugs-annotations:3.1.12" used in Beam. Not required. exclude group: "org.apache.zookeeper", module: "zookeeper" @@ -71,8 +81,6 @@ dependencies { // For testing Cross-language transforms testCompile project(":runners:core-construction-java") testCompile library.java.avro - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library testCompile library.java.junit testCompile library.java.powermock testCompile library.java.powermock_mockito diff --git a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ConfluentSchemaRegistryDeserializerProvider.java b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ConfluentSchemaRegistryDeserializerProvider.java index 8c6d499838d1..05c80ed5b0a9 100644 --- a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ConfluentSchemaRegistryDeserializerProvider.java +++ b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ConfluentSchemaRegistryDeserializerProvider.java @@ -74,14 +74,24 @@ public class ConfluentSchemaRegistryDeserializerProvider implements Deseriali public static ConfluentSchemaRegistryDeserializerProvider of( String schemaRegistryUrl, String subject) { - return of(schemaRegistryUrl, subject, null); + return of(schemaRegistryUrl, subject, null, null); } public static ConfluentSchemaRegistryDeserializerProvider of( String schemaRegistryUrl, String subject, @Nullable Integer version) { + return of(schemaRegistryUrl, subject, version, null); + } + + public static ConfluentSchemaRegistryDeserializerProvider of( + String schemaRegistryUrl, + String subject, + @Nullable Integer version, + @Nullable Map schemaRegistryConfigs) { return new ConfluentSchemaRegistryDeserializerProvider( (SerializableFunction) - input -> new CachedSchemaRegistryClient(schemaRegistryUrl, Integer.MAX_VALUE), + input -> + new CachedSchemaRegistryClient( + schemaRegistryUrl, Integer.MAX_VALUE, schemaRegistryConfigs), schemaRegistryUrl, subject, version); diff --git a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ConsumerSpEL.java b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ConsumerSpEL.java index 03dd9288694d..3d38f90fdd68 100644 --- a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ConsumerSpEL.java +++ b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ConsumerSpEL.java @@ -43,32 +43,40 @@ * It auto detects the input type List/Collection/Varargs, to eliminate the method definition * differences. */ -@SuppressWarnings({ - "rawtypes" // TODO(https://issues.apache.org/jira/browse/BEAM-10556) -}) class ConsumerSpEL { private static final Logger LOG = LoggerFactory.getLogger(ConsumerSpEL.class); - private SpelParserConfiguration config = new SpelParserConfiguration(true, true); - private ExpressionParser parser = new SpelExpressionParser(config); + private static final SpelParserConfiguration config = new SpelParserConfiguration(true, true); + private static final ExpressionParser parser = new SpelExpressionParser(config); - private Expression seek2endExpression = parser.parseExpression("#consumer.seekToEnd(#tp)"); + // This method changed from accepting varargs to accepting a Collection. 
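
Separately, the `ConfluentSchemaRegistryDeserializerProvider.of` overload added above forwards an optional config map to `CachedSchemaRegistryClient`. A minimal sketch of calling it, assuming the map accepts standard Confluent client settings (the keys and values shown are illustrative and not taken from this change):

```java
// Illustrative only: the config keys/values below are assumptions, not part of this diff.
Map<String, ?> registryConfigs =
    ImmutableMap.of(
        "basic.auth.credentials.source", "USER_INFO",
        "basic.auth.user.info", "<api-key>:<api-secret>");

ConfluentSchemaRegistryDeserializerProvider<GenericRecord> valueProvider =
    ConfluentSchemaRegistryDeserializerProvider.of(
        "https://schema-registry.example.com",
        "my_topic-value",
        null, // use the latest schema version
        registryConfigs);
```

The provider can then be handed to `KafkaIO.Read#withValueDeserializer(...)` in the same way as the existing two- and three-argument overloads.
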
+ private static final Expression seek2endExpression = + parser.parseExpression("#consumer.seekToEnd(#tp)"); + // This method changed from accepting a List to accepting a Collection. + private static final Expression assignExpression = + parser.parseExpression("#consumer.assign(#tp)"); - private Expression assignExpression = parser.parseExpression("#consumer.assign(#tp)"); + private static boolean hasRecordTimestamp; + private static boolean hasHeaders; + private static boolean hasOffsetsForTimes; + private static boolean deserializerSupportsHeaders; - private Expression deserializeWithHeadersExpression = - parser.parseExpression("#deserializer.deserialize(#topic, #headers, #data)"); - - private boolean hasRecordTimestamp = false; - private boolean hasOffsetsForTimes = false; - private boolean deserializerSupportsHeaders = false; + static { + try { + // It is supported by Kafka Client 0.10.0.0 onwards. + hasRecordTimestamp = + ConsumerRecord.class + .getMethod("timestamp", (Class[]) null) + .getReturnType() + .equals(Long.TYPE); + } catch (NoSuchMethodException | SecurityException e) { + LOG.debug("Timestamp for Kafka message is not available."); + } - static boolean hasHeaders() { - boolean clientHasHeaders = false; try { // It is supported by Kafka Client 0.11.0.0 onwards. - clientHasHeaders = + hasHeaders = "org.apache.kafka.common.header.Headers" .equals( ConsumerRecord.class @@ -78,20 +86,6 @@ static boolean hasHeaders() { } catch (NoSuchMethodException | SecurityException e) { LOG.debug("Headers is not available"); } - return clientHasHeaders; - } - - public ConsumerSpEL() { - try { - // It is supported by Kafka Client 0.10.0.0 onwards. - hasRecordTimestamp = - ConsumerRecord.class - .getMethod("timestamp", (Class[]) null) - .getReturnType() - .equals(Long.TYPE); - } catch (NoSuchMethodException | SecurityException e) { - LOG.debug("Timestamp for Kafka message is not available."); - } try { // It is supported by Kafka Client 0.10.1.0 onwards. @@ -115,38 +109,30 @@ public ConsumerSpEL() { } } - public void evaluateSeek2End(Consumer consumer, TopicPartition topicPartition) { + public static void evaluateSeek2End(Consumer consumer, TopicPartition topicPartition) { StandardEvaluationContext mapContext = new StandardEvaluationContext(); mapContext.setVariable("consumer", consumer); mapContext.setVariable("tp", topicPartition); seek2endExpression.getValue(mapContext); } - public void evaluateAssign(Consumer consumer, Collection topicPartitions) { + public static void evaluateAssign( + Consumer consumer, Collection topicPartitions) { StandardEvaluationContext mapContext = new StandardEvaluationContext(); mapContext.setVariable("consumer", consumer); mapContext.setVariable("tp", topicPartitions); assignExpression.getValue(mapContext); } - public Object evaluateDeserializeWithHeaders( - Deserializer deserializer, ConsumerRecord rawRecord, Boolean isKey) { - StandardEvaluationContext mapContext = new StandardEvaluationContext(); - mapContext.setVariable("deserializer", deserializer); - mapContext.setVariable("topic", rawRecord.topic()); - mapContext.setVariable("headers", rawRecord.headers()); - mapContext.setVariable("data", isKey ? 
rawRecord.key() : rawRecord.value()); - return deserializeWithHeadersExpression.getValue(mapContext); - } - - public long getRecordTimestamp(ConsumerRecord rawRecord) { + public static long getRecordTimestamp(ConsumerRecord rawRecord) { if (hasRecordTimestamp) { return rawRecord.timestamp(); } return -1L; // This is the timestamp used in Kafka for older messages without timestamps. } - public KafkaTimestampType getRecordTimestampType(ConsumerRecord rawRecord) { + public static KafkaTimestampType getRecordTimestampType( + ConsumerRecord rawRecord) { if (hasRecordTimestamp) { return KafkaTimestampType.forOrdinal(rawRecord.timestampType().ordinal()); } else { @@ -154,29 +140,33 @@ public KafkaTimestampType getRecordTimestampType(ConsumerRecord } } - public boolean hasOffsetsForTimes() { + public static boolean hasOffsetsForTimes() { return hasOffsetsForTimes; } - public boolean deserializerSupportsHeaders() { + public static boolean hasHeaders() { + return hasHeaders; + } + + public static boolean deserializerSupportsHeaders() { return deserializerSupportsHeaders; } - public Object deserializeKey( - Deserializer deserializer, ConsumerRecord rawRecord) { + public static T deserializeKey( + Deserializer deserializer, ConsumerRecord rawRecord) { if (deserializerSupportsHeaders) { // Kafka API 2.1.0 onwards - return evaluateDeserializeWithHeaders(deserializer, rawRecord, true); + return deserializer.deserialize(rawRecord.topic(), rawRecord.headers(), rawRecord.key()); } else { return deserializer.deserialize(rawRecord.topic(), rawRecord.key()); } } - public Object deserializeValue( - Deserializer deserializer, ConsumerRecord rawRecord) { + public static T deserializeValue( + Deserializer deserializer, ConsumerRecord rawRecord) { if (deserializerSupportsHeaders) { // Kafka API 2.1.0 onwards - return evaluateDeserializeWithHeaders(deserializer, rawRecord, false); + return deserializer.deserialize(rawRecord.topic(), rawRecord.headers(), rawRecord.value()); } else { return deserializer.deserialize(rawRecord.topic(), rawRecord.value()); } @@ -187,7 +177,8 @@ public Object deserializeValue( * no messages later than timestamp or if this partition does not support timestamp based offset. 
*/ @SuppressWarnings("unchecked") - public long offsetForTime(Consumer consumer, TopicPartition topicPartition, Instant time) { + public static long offsetForTime( + Consumer consumer, TopicPartition topicPartition, Instant time) { checkArgument(hasOffsetsForTimes, "This Kafka Client must support Consumer.OffsetsForTimes()."); diff --git a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaCommitOffset.java b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaCommitOffset.java index 54f561094a28..78100d97b244 100644 --- a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaCommitOffset.java +++ b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaCommitOffset.java @@ -65,8 +65,6 @@ static class CommitOffsetDoFn extends DoFn, Void private final SerializableFunction, Consumer> consumerFactoryFn; - private transient ConsumerSpEL consumerSpEL = null; - CommitOffsetDoFn(KafkaIO.ReadSourceDescriptors readSourceDescriptors) { offsetConsumerConfig = readSourceDescriptors.getOffsetConsumerConfig(); consumerConfig = readSourceDescriptors.getConsumerConfig(); diff --git a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java index 60608e0aec8a..b7aad110448a 100644 --- a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java +++ b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java @@ -22,19 +22,26 @@ import com.google.auto.service.AutoService; import com.google.auto.value.AutoValue; +import edu.umd.cs.findbugs.annotations.SuppressFBWarnings; import io.confluent.kafka.serializers.KafkaAvroDeserializer; import java.io.InputStream; import java.io.OutputStream; import java.lang.reflect.Method; import java.util.ArrayList; +import java.util.Arrays; import java.util.Collections; import java.util.HashMap; +import java.util.HashSet; import java.util.List; import java.util.Map; import java.util.Optional; import java.util.Set; +import java.util.stream.Collectors; +import org.apache.beam.runners.core.construction.PTransformMatchers; +import org.apache.beam.runners.core.construction.ReplacementOutputs; import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.annotations.Experimental.Kind; +import org.apache.beam.sdk.annotations.Internal; import org.apache.beam.sdk.coders.AtomicCoder; import org.apache.beam.sdk.coders.AvroCoder; import org.apache.beam.sdk.coders.ByteArrayCoder; @@ -48,10 +55,17 @@ import org.apache.beam.sdk.io.Read.Unbounded; import org.apache.beam.sdk.io.UnboundedSource; import org.apache.beam.sdk.io.UnboundedSource.CheckpointMark; +import org.apache.beam.sdk.io.kafka.KafkaIO.Read.External; import org.apache.beam.sdk.options.ExperimentalOptions; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.runners.AppliedPTransform; +import org.apache.beam.sdk.runners.PTransformOverride; +import org.apache.beam.sdk.runners.PTransformOverrideFactory; +import org.apache.beam.sdk.schemas.JavaFieldSchema; import org.apache.beam.sdk.schemas.NoSuchSchemaException; +import org.apache.beam.sdk.schemas.annotations.DefaultSchema; +import org.apache.beam.sdk.schemas.annotations.SchemaCreate; import org.apache.beam.sdk.schemas.transforms.Convert; import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.ExternalTransformBuilder; @@ -72,7 +86,9 @@ import org.apache.beam.sdk.values.PCollection; 
import org.apache.beam.sdk.values.PDone; import org.apache.beam.sdk.values.Row; +import org.apache.beam.sdk.values.TupleTag; import org.apache.beam.sdk.values.TypeDescriptor; +import org.apache.beam.sdk.values.TypeDescriptors; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Joiner; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; @@ -141,6 +157,11 @@ * // offset consumed by the pipeline can be committed back. * .commitOffsetsInFinalize() * + * // Specified a serializable function which can determine whether to stop reading from given + * // TopicPartition during runtime. Note that only {@link ReadFromKafkaDoFn} respect the + * // signal. + * .withCheckStopReadingFn(new SerializedFunction() {}) + * * // finally, if you don't need Kafka metadata, you can drop it.g * .withoutMetadata() // PCollection> * ) @@ -158,6 +179,88 @@ * Read#withValueDeserializerAndCoder(Class, Coder)}. Note that Kafka messages are interpreted using * key and value deserializers. * + *

    Read From Kafka Dynamically

    + * + * For a given Kafka bootstrap_server, KafkaIO is also able to detect and read from available {@link + * TopicPartition} dynamically and stop reading from unavailable ones. KafkaIO uses {@link + * WatchKafkaTopicPartitionDoFn} to emit any newly added {@link TopicPartition} and uses {@link + * ReadFromKafkaDoFn} to read from each {@link KafkaSourceDescriptor}. Dynamic read can handle two + * scenarios: + * + *
      + *
    • Certain topic or partition is added/deleted. + *
    • Certain topic or partition is added, then removed but added back again + *
    + * + * By providing a {@code checkStopReadingFn}, there are two more cases that dynamic read can handle + * (see the sketch after this list): + * + *
      + *
    • Certain topic or partition is stopped + *
    • Certain topic or partition is added, then stopped but added back again + *
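The {@code checkStopReadingFn} referenced above is applied per {@link TopicPartition}; from the usage in this change (checkStopReadingFn.apply(topicPartition) used as a boolean), its inferred shape is a SerializableFunction from TopicPartition to Boolean. The snippet below is a minimal sketch of such a function; the class name and the hard-coded partition are illustrative only and not part of KafkaIO.

import java.util.HashSet;
import java.util.Set;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.kafka.common.TopicPartition;

class StopListedPartitionsFn implements SerializableFunction<TopicPartition, Boolean> {

  // Illustrative, hard-coded partitions that should no longer be read;
  // a real implementation might consult an external control source instead.
  private final Set<TopicPartition> stopped = new HashSet<>();

  StopListedPartitionsFn() {
    stopped.add(new TopicPartition("my_topic", 0));
  }

  @Override
  public Boolean apply(TopicPartition topicPartition) {
    // Returning true tells ReadFromKafkaDoFn to stop reading from this partition.
    return stopped.contains(topicPartition);
  }
}

Such a function would be plugged in via {@code .withCheckStopReadingFn(new StopListedPartitionsFn())}.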
    + * + * Race conditions may happen in two of the supported cases: + * + *
      + *
    • A TopicPartition is removed, but added back again + *
    • A TopicPartition is stopped, then the user wants to read from it again + *
    + * + * When a race condition happens, the stopped/removed TopicPartition may fail to be emitted to + * ReadFromKafkaDoFn again, or ReadFromKafkaDoFn may output duplicate records. The major cause of + * such a race condition is that both {@link WatchKafkaTopicPartitionDoFn} and {@link + * ReadFromKafkaDoFn} react to the signal from a removed/stopped {@link TopicPartition}, but there is + * no guarantee that both DoFns perform the related actions at the same time. + * + *

    Here is one example of failing to emit a newly added {@link TopicPartition}: + *

      + *
    • A {@link WatchKafkaTopicPartitionDoFn} is configured to update the current tracking set + * every hour. + *
    • TopicPartition A is tracked by the {@link WatchKafkaTopicPartitionDoFn} at 10:00AM, and + * {@link ReadFromKafkaDoFn} starts to read from TopicPartition A immediately. + *
    • At 10:30AM, the {@link ReadFromKafkaDoFn} notices that the {@link + * TopicPartition} has been stopped/removed, so it stops reading from it and returns {@code + * ProcessContinuation.stop()}. + *
    • At 10:45AM, the pipeline author wants to read from TopicPartition A again. + *
    • At 11:00AM, when {@link WatchKafkaTopicPartitionDoFn} is invoked by the firing timer, it doesn't + * know that TopicPartition A has been stopped/removed. All it knows is that TopicPartition A + * is still an active TopicPartition, so it will not emit TopicPartition A again. + *
    + * + * Another race condition example, which produces duplicate records: + * + *
      + *
    • At 10:00AM, {@link ReadFromKafkaDoFn} is processing TopicPartition A + *
    • At 10:05AM, {@link ReadFromKafkaDoFn} starts to process other TopicPartitions (an SDF-initiated + * or runner-issued checkpoint happens) + *
    • At 10:10AM, {@link WatchKafkaTopicPartitionDoFn} knows that TopicPartition A is + * stopped/removed + *
    • At 10:15AM, {@link WatchKafkaTopicPartitionDoFn} knows that TopicPartition A is added again + * and emits TopicPartition A again + *
    • At 10:20AM, {@link ReadFromKafkaDoFn} starts to process the resumed TopicPartition A, but at the + * same time {@link ReadFromKafkaDoFn} is also processing the newly emitted TopicPartition A. + *
    + * + * For more design details, please refer to + * https://docs.google.com/document/d/1FU3GxVRetHPLVizP3Mdv6mP5tpjZ3fd99qNjUI5DT5k/. To enable + * dynamic read, you can write a pipeline like: + * + *
    {@code
    + * pipeline
    + *   .apply(KafkaIO.read()
    + *      // Configure dynamic read with a 1-hour interval: the pipeline will check for available
    + *      // TopicPartitions and emit newly added ones every hour.
    + *      .withDynamicRead(Duration.standardHours(1))
    + *      .withCheckStopReadingFn(new SerializableFunction() {})
    + *      .withBootstrapServers("broker_1:9092,broker_2:9092")
    + *      .withKeyDeserializer(LongDeserializer.class)
    + *      .withValueDeserializer(StringDeserializer.class)
    + *   )
    + *   .apply(Values.create()) // PCollection
    + *    ...
    + * }
    + * *
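As the expansion logic later in this change notes, dynamic read is only taken through the splittable-DoFn path when the {@code beam_fn_api} experiment is enabled. Below is a minimal sketch of enabling that experiment programmatically using standard Beam pipeline options; the class name is illustrative and nothing here is specific to this change.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.ExperimentalOptions;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class DynamicReadPipelineOptionsExample {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
    // Equivalent to passing --experiments=beam_fn_api on the command line.
    ExperimentalOptions.addExperiment(options.as(ExperimentalOptions.class), "beam_fn_api");
    Pipeline pipeline = Pipeline.create(options);
    // ... apply the KafkaIO.read().withDynamicRead(...) pipeline shown above ...
    pipeline.run().waitUntilFinish();
  }
}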

    Partition Assignment and Checkpointing

    * * The Kafka partitions are evenly distributed among splits (workers). @@ -213,6 +316,27 @@ * ... * }
    * + *

    You can also pass properties to the schema registry client, allowing you to configure + * authentication: + * + *

    {@code
    + * ImmutableMap csrConfig =
    + *     ImmutableMap.builder()
    + *         .put(AbstractKafkaAvroSerDeConfig.BASIC_AUTH_CREDENTIALS_SOURCE,"USER_INFO")
    + *         .put(AbstractKafkaAvroSerDeConfig.USER_INFO_CONFIG,":")
    + *         .build();
    + *
    + * PCollection> input = pipeline
    + *   .apply(KafkaIO.read()
    + *      .withBootstrapServers("broker_1:9092,broker_2:9092")
    + *      .withTopic("my_topic")
    + *      .withKeyDeserializer(LongDeserializer.class)
    + *      // Use Confluent Schema Registry, specify schema registry URL, value subject and schema registry client configuration
    + *      .withValueDeserializer(
    + *          ConfluentSchemaRegistryDeserializerProvider.of("https://localhost:8081", "my_topic-value", null, csrConfig))
    + *    ...
    + * }
    + * *

    Read from Kafka as a {@link DoFn}

    * * {@link ReadSourceDescriptors} is the {@link PTransform} that takes a PCollection of {@link @@ -426,6 +550,7 @@ public static Read read() { .setConsumerConfig(KafkaIOUtils.DEFAULT_CONSUMER_PROPERTIES) .setMaxNumRecords(Long.MAX_VALUE) .setCommitOffsetsInFinalizeEnabled(false) + .setDynamicRead(false) .setTimestampPolicyFactory(TimestampPolicyFactory.withProcessingTime()) .build(); } @@ -485,9 +610,9 @@ public abstract static class Read extends PTransform>> { abstract Map getConsumerConfig(); - abstract List getTopics(); + abstract @Nullable List getTopics(); - abstract List getTopicPartitions(); + abstract @Nullable List getTopicPartitions(); abstract @Nullable Coder getKeyCoder(); @@ -506,6 +631,10 @@ public abstract static class Read abstract boolean isCommitOffsetsInFinalizeEnabled(); + abstract boolean isDynamicRead(); + + abstract @Nullable Duration getWatchTopicPartitionDuration(); + abstract TimestampPolicyFactory getTimestampPolicyFactory(); abstract @Nullable Map getOffsetConsumerConfig(); @@ -514,12 +643,13 @@ public abstract static class Read abstract @Nullable DeserializerProvider getValueDeserializerProvider(); + abstract @Nullable SerializableFunction getCheckStopReadingFn(); + abstract Builder toBuilder(); @Experimental(Kind.PORTABILITY) @AutoValue.Builder - abstract static class Builder - implements ExternalTransformBuilder>> { + abstract static class Builder { abstract Builder setConsumerConfig(Map config); abstract Builder setTopics(List topics); @@ -543,6 +673,10 @@ abstract Builder setConsumerFactoryFn( abstract Builder setCommitOffsetsInFinalizeEnabled(boolean commitOffsetInFinalize); + abstract Builder setDynamicRead(boolean dynamicRead); + + abstract Builder setWatchTopicPartitionDuration(Duration duration); + abstract Builder setTimestampPolicyFactory( TimestampPolicyFactory timestampPolicyFactory); @@ -553,46 +687,65 @@ abstract Builder setTimestampPolicyFactory( abstract Builder setValueDeserializerProvider( DeserializerProvider deserializerProvider); + abstract Builder setCheckStopReadingFn( + SerializableFunction checkStopReadingFn); + abstract Read build(); - @Override - public PTransform>> buildExternal( - External.Configuration config) { + static void setupExternalBuilder(Builder builder, External.Configuration config) { ImmutableList.Builder listBuilder = ImmutableList.builder(); for (String topic : config.topics) { listBuilder.add(topic); } - setTopics(listBuilder.build()); + builder.setTopics(listBuilder.build()); Class keyDeserializer = resolveClass(config.keyDeserializer); - setKeyDeserializerProvider(LocalDeserializerProvider.of(keyDeserializer)); - setKeyCoder(resolveCoder(keyDeserializer)); + builder.setKeyDeserializerProvider(LocalDeserializerProvider.of(keyDeserializer)); + builder.setKeyCoder(resolveCoder(keyDeserializer)); Class valueDeserializer = resolveClass(config.valueDeserializer); - setValueDeserializerProvider(LocalDeserializerProvider.of(valueDeserializer)); - setValueCoder(resolveCoder(valueDeserializer)); + builder.setValueDeserializerProvider(LocalDeserializerProvider.of(valueDeserializer)); + builder.setValueCoder(resolveCoder(valueDeserializer)); Map consumerConfig = new HashMap<>(config.consumerConfig); // Key and Value Deserializers always have to be in the config. 
consumerConfig.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, keyDeserializer.getName()); consumerConfig.put( ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, valueDeserializer.getName()); - setConsumerConfig(consumerConfig); + builder.setConsumerConfig(consumerConfig); // Set required defaults - setTopicPartitions(Collections.emptyList()); - setConsumerFactoryFn(KafkaIOUtils.KAFKA_CONSUMER_FACTORY_FN); + builder.setTopicPartitions(Collections.emptyList()); + builder.setConsumerFactoryFn(KafkaIOUtils.KAFKA_CONSUMER_FACTORY_FN); if (config.maxReadTime != null) { - setMaxReadTime(Duration.standardSeconds(config.maxReadTime)); + builder.setMaxReadTime(Duration.standardSeconds(config.maxReadTime)); + } + builder.setMaxNumRecords( + config.maxNumRecords == null ? Long.MAX_VALUE : config.maxNumRecords); + + // Set committing offset configuration. + builder.setCommitOffsetsInFinalizeEnabled(config.commitOffsetInFinalize); + + // Set timestamp policy with built-in types. + String timestampPolicy = config.timestampPolicy; + if (timestampPolicy.equals("ProcessingTime")) { + builder.setTimestampPolicyFactory(TimestampPolicyFactory.withProcessingTime()); + } else if (timestampPolicy.equals("CreateTime")) { + builder.setTimestampPolicyFactory(TimestampPolicyFactory.withCreateTime(Duration.ZERO)); + } else if (timestampPolicy.equals("LogAppendTime")) { + builder.setTimestampPolicyFactory(TimestampPolicyFactory.withLogAppendTime()); + } else { + throw new IllegalArgumentException( + "timestampPolicy should be one of (ProcessingTime, CreateTime, LogAppendTime)"); } - setMaxNumRecords(config.maxNumRecords == null ? Long.MAX_VALUE : config.maxNumRecords); - setCommitOffsetsInFinalizeEnabled(false); - setTimestampPolicyFactory(TimestampPolicyFactory.withProcessingTime()); + if (config.startReadTime != null) { - setStartReadTime(Instant.ofEpochMilli(config.startReadTime)); + builder.setStartReadTime(Instant.ofEpochMilli(config.startReadTime)); } - // We do not include Metadata until we can encode KafkaRecords cross-language - return build().withoutMetadata(); + + // We can expose dynamic read to external build when ReadFromKafkaDoFn is the default + // implementation. + builder.setDynamicRead(false); } private static Coder resolveCoder(Class deserializer) { @@ -625,14 +778,22 @@ private static Coder resolveCoder(Class deserializer) { @AutoService(ExternalTransformRegistrar.class) public static class External implements ExternalTransformRegistrar { - public static final String URN = "beam:external:java:kafka:read:v1"; + // Using the transform name in the URN so that the corresponding transform can be easily + // identified. + public static final String URN_WITH_METADATA = + "beam:external:java:kafkaio:externalwithmetadata:v1"; + public static final String URN_WITHOUT_METADATA = + "beam:external:java:kafkaio:typedwithoutmetadata:v1"; @Override public Map>> knownBuilders() { return ImmutableMap.of( - URN, + URN_WITH_METADATA, (Class>) - (Class) AutoValue_KafkaIO_Read.Builder.class); + (Class) RowsWithMetadata.Builder.class, + URN_WITHOUT_METADATA, + (Class>) + (Class) TypedWithoutMetadata.Builder.class); } /** Parameters class to expose the Read transform to an external SDK. 
*/ @@ -645,6 +806,8 @@ public static class Configuration { private Long startReadTime; private Long maxNumRecords; private Long maxReadTime; + private Boolean commitOffsetInFinalize; + private String timestampPolicy; public void setConsumerConfig(Map consumerConfig) { this.consumerConfig = consumerConfig; @@ -673,6 +836,14 @@ public void setMaxNumRecords(Long maxNumRecords) { public void setMaxReadTime(Long maxReadTime) { this.maxReadTime = maxReadTime; } + + public void setCommitOffsetInFinalize(Boolean commitOffsetInFinalize) { + this.commitOffsetInFinalize = commitOffsetInFinalize; + } + + public void setTimestampPolicy(String timestampPolicy) { + this.timestampPolicy = timestampPolicy; + } } } @@ -700,7 +871,8 @@ public Read withTopic(String topic) { */ public Read withTopics(List topics) { checkState( - getTopicPartitions().isEmpty(), "Only topics or topicPartitions can be set, not both"); + getTopicPartitions() == null || getTopicPartitions().isEmpty(), + "Only topics or topicPartitions can be set, not both"); return toBuilder().setTopics(ImmutableList.copyOf(topics)).build(); } @@ -712,7 +884,9 @@ public Read withTopics(List topics) { * partitions are distributed among the splits. */ public Read withTopicPartitions(List topicPartitions) { - checkState(getTopics().isEmpty(), "Only topics or topicPartitions can be set, not both"); + checkState( + getTopics() == null || getTopics().isEmpty(), + "Only topics or topicPartitions can be set, not both"); return toBuilder().setTopicPartitions(ImmutableList.copyOf(topicPartitions)).build(); } @@ -962,6 +1136,16 @@ public Read commitOffsetsInFinalize() { return toBuilder().setCommitOffsetsInFinalizeEnabled(true).build(); } + /** + * Configure the KafkaIO to use {@link WatchKafkaTopicPartitionDoFn} to detect and emit any new + * available {@link TopicPartition} for {@link ReadFromKafkaDoFn} to consume during pipeline + * execution time. The KafkaIO will regularly check the availability based on the given + * duration. If the duration is not specified as {@code null}, the default duration is 1 hour. + */ + public Read withDynamicRead(Duration duration) { + return toBuilder().setDynamicRead(true).setWatchTopicPartitionDuration(duration).build(); + } + /** * Set additional configuration for the backend offset consumer. It may be required for a * secured Kafka cluster, especially when you see similar WARN log message 'exception while @@ -998,24 +1182,45 @@ public Read withConsumerConfigUpdates(Map configUpdates) { return toBuilder().setConsumerConfig(config).build(); } + /** + * A custom {@link SerializableFunction} that determines whether the {@link ReadFromKafkaDoFn} + * should stop reading from the given {@link TopicPartition}. + */ + public Read withCheckStopReadingFn( + SerializableFunction checkStopReadingFn) { + return toBuilder().setCheckStopReadingFn(checkStopReadingFn).build(); + } + /** Returns a {@link PTransform} for PCollection of {@link KV}, dropping Kafka metatdata. 
*/ public PTransform>> withoutMetadata() { return new TypedWithoutMetadata<>(this); } + PTransform> externalWithMetadata() { + return new RowsWithMetadata<>(this); + } + @Override public PCollection> expand(PBegin input) { checkArgument( getConsumerConfig().get(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG) != null, "withBootstrapServers() is required"); - checkArgument( - getTopics().size() > 0 || getTopicPartitions().size() > 0, - "Either withTopic(), withTopics() or withTopicPartitions() is required"); + // With dynamic read, we no longer require providing topic/partition during pipeline + // construction time. But it requires enabling beam_fn_api. + if (!isDynamicRead()) { + checkArgument( + (getTopics() != null && getTopics().size() > 0) + || (getTopicPartitions() != null && getTopicPartitions().size() > 0), + "Either withTopic(), withTopics() or withTopicPartitions() is required"); + } else { + checkArgument( + ExperimentalOptions.hasExperiment(input.getPipeline().getOptions(), "beam_fn_api"), + "Kafka Dynamic Read requires enabling experiment beam_fn_api."); + } checkArgument(getKeyDeserializerProvider() != null, "withKeyDeserializer() is required"); checkArgument(getValueDeserializerProvider() != null, "withValueDeserializer() is required"); - ConsumerSpEL consumerSpEL = new ConsumerSpEL(); - if (!consumerSpEL.hasOffsetsForTimes()) { + if (!ConsumerSpEL.hasOffsetsForTimes()) { LOG.warn( "Kafka client version {} is too old. Versions before 0.10.1.0 are deprecated and " + "may not be supported in next release of Apache Beam. " @@ -1024,7 +1229,7 @@ public PCollection> expand(PBegin input) { } if (getStartReadTime() != null) { checkArgument( - consumerSpEL.hasOffsetsForTimes(), + ConsumerSpEL.hasOffsetsForTimes(), "Consumer.offsetsForTimes is only supported by Kafka Client 0.10.1.0 onwards, " + "current version of Kafka Client is " + AppInfoParser.getVersion() @@ -1051,45 +1256,170 @@ public PCollection> expand(PBegin input) { Coder keyCoder = getKeyCoder(coderRegistry); Coder valueCoder = getValueCoder(coderRegistry); - // The Read will be expanded into SDF transform when "beam_fn_api" is enabled and - // "beam_fn_api_use_deprecated_read" is not enabled. - if (!ExperimentalOptions.hasExperiment(input.getPipeline().getOptions(), "beam_fn_api") - || ExperimentalOptions.hasExperiment( + // For read from unbounded in a bounded manner, we actually are not going through Read or SDF. + if (ExperimentalOptions.hasExperiment( input.getPipeline().getOptions(), "beam_fn_api_use_deprecated_read") - || !ExperimentalOptions.hasExperiment( - input.getPipeline().getOptions(), "use_sdf_kafka_read")) { + || ExperimentalOptions.hasExperiment( + input.getPipeline().getOptions(), "use_deprecated_read") + || getMaxNumRecords() < Long.MAX_VALUE + || getMaxReadTime() != null + || runnerRequiresLegacyRead(input.getPipeline().getOptions())) { + return input.apply(new ReadFromKafkaViaUnbounded<>(this, keyCoder, valueCoder)); + } + return input.apply(new ReadFromKafkaViaSDF<>(this, keyCoder, valueCoder)); + } + + private boolean runnerRequiresLegacyRead(PipelineOptions options) { + // Only Dataflow runner requires sdf read at this moment. For other non-portable runners, if + // it doesn't specify use_sdf_read, it will use legacy read regarding to performance concern. + // TODO(BEAM-10670): Remove this special check when we address performance issue. 
+ if (ExperimentalOptions.hasExperiment(options, "use_sdf_read")) { + return false; + } + if (options.getRunner().getName().startsWith("org.apache.beam.runners.dataflow.")) { + return false; + } else if (ExperimentalOptions.hasExperiment(options, "beam_fn_api")) { + return false; + } + return true; + } + + /** + * A {@link PTransformOverride} for runners to swap {@link ReadFromKafkaViaSDF} to legacy Kafka + * read if runners doesn't have a good support on executing unbounded Splittable DoFn. + */ + @Internal + public static final PTransformOverride KAFKA_READ_OVERRIDE = + PTransformOverride.of( + PTransformMatchers.classEqualTo(ReadFromKafkaViaSDF.class), + new KafkaReadOverrideFactory<>()); + + private static class KafkaReadOverrideFactory + implements PTransformOverrideFactory< + PBegin, PCollection>, ReadFromKafkaViaSDF> { + + @Override + public PTransformReplacement>> getReplacementTransform( + AppliedPTransform>, ReadFromKafkaViaSDF> + transform) { + return PTransformReplacement.of( + transform.getPipeline().begin(), + new ReadFromKafkaViaUnbounded<>( + transform.getTransform().kafkaRead, + transform.getTransform().keyCoder, + transform.getTransform().valueCoder)); + } + + @Override + public Map, ReplacementOutput> mapOutputs( + Map, PCollection> outputs, PCollection> newOutput) { + return ReplacementOutputs.singleton(outputs, newOutput); + } + } + + private static class ReadFromKafkaViaUnbounded + extends PTransform>> { + Read kafkaRead; + Coder keyCoder; + Coder valueCoder; + + ReadFromKafkaViaUnbounded(Read kafkaRead, Coder keyCoder, Coder valueCoder) { + this.kafkaRead = kafkaRead; + this.keyCoder = keyCoder; + this.valueCoder = valueCoder; + } + + @Override + public PCollection> expand(PBegin input) { // Handles unbounded source to bounded conversion if maxNumRecords or maxReadTime is set. 
Unbounded> unbounded = org.apache.beam.sdk.io.Read.from( - toBuilder().setKeyCoder(keyCoder).setValueCoder(valueCoder).build().makeSource()); + kafkaRead + .toBuilder() + .setKeyCoder(keyCoder) + .setValueCoder(valueCoder) + .build() + .makeSource()); PTransform>> transform = unbounded; - if (getMaxNumRecords() < Long.MAX_VALUE || getMaxReadTime() != null) { + if (kafkaRead.getMaxNumRecords() < Long.MAX_VALUE || kafkaRead.getMaxReadTime() != null) { transform = - unbounded.withMaxReadTime(getMaxReadTime()).withMaxNumRecords(getMaxNumRecords()); + unbounded + .withMaxReadTime(kafkaRead.getMaxReadTime()) + .withMaxNumRecords(kafkaRead.getMaxNumRecords()); } return input.getPipeline().apply(transform); } - ReadSourceDescriptors readTransform = - ReadSourceDescriptors.read() - .withConsumerConfigOverrides(getConsumerConfig()) - .withOffsetConsumerConfigOverrides(getOffsetConsumerConfig()) - .withConsumerFactoryFn(getConsumerFactoryFn()) - .withKeyDeserializerProvider(getKeyDeserializerProvider()) - .withValueDeserializerProvider(getValueDeserializerProvider()) - .withManualWatermarkEstimator() - .withTimestampPolicyFactory(getTimestampPolicyFactory()); - if (isCommitOffsetsInFinalizeEnabled()) { - readTransform = readTransform.commitOffsets(); + } + + static class ReadFromKafkaViaSDF + extends PTransform>> { + Read kafkaRead; + Coder keyCoder; + Coder valueCoder; + + ReadFromKafkaViaSDF(Read kafkaRead, Coder keyCoder, Coder valueCoder) { + this.kafkaRead = kafkaRead; + this.keyCoder = keyCoder; + this.valueCoder = valueCoder; + } + + @Override + public PCollection> expand(PBegin input) { + ReadSourceDescriptors readTransform = + ReadSourceDescriptors.read() + .withConsumerConfigOverrides(kafkaRead.getConsumerConfig()) + .withOffsetConsumerConfigOverrides(kafkaRead.getOffsetConsumerConfig()) + .withConsumerFactoryFn(kafkaRead.getConsumerFactoryFn()) + .withKeyDeserializerProvider(kafkaRead.getKeyDeserializerProvider()) + .withValueDeserializerProvider(kafkaRead.getValueDeserializerProvider()) + .withManualWatermarkEstimator() + .withTimestampPolicyFactory(kafkaRead.getTimestampPolicyFactory()) + .withCheckStopReadingFn(kafkaRead.getCheckStopReadingFn()); + if (kafkaRead.isCommitOffsetsInFinalizeEnabled()) { + readTransform = readTransform.commitOffsets(); + } + PCollection output; + if (kafkaRead.isDynamicRead()) { + Set topics = new HashSet<>(); + if (kafkaRead.getTopics() != null && kafkaRead.getTopics().size() > 0) { + topics.addAll(kafkaRead.getTopics()); + } + if (kafkaRead.getTopicPartitions() != null && kafkaRead.getTopicPartitions().size() > 0) { + for (TopicPartition topicPartition : kafkaRead.getTopicPartitions()) { + topics.add(topicPartition.topic()); + } + } + output = + input + .getPipeline() + .apply(Impulse.create()) + .apply( + MapElements.into( + TypeDescriptors.kvs( + new TypeDescriptor() {}, new TypeDescriptor() {})) + .via(element -> KV.of(element, element))) + .apply( + ParDo.of( + new WatchKafkaTopicPartitionDoFn( + kafkaRead.getWatchTopicPartitionDuration(), + kafkaRead.getConsumerFactoryFn(), + kafkaRead.getCheckStopReadingFn(), + kafkaRead.getConsumerConfig(), + kafkaRead.getStartReadTime(), + topics.stream().collect(Collectors.toList())))); + + } else { + output = + input + .getPipeline() + .apply(Impulse.create()) + .apply(ParDo.of(new GenerateKafkaSourceDescriptor(kafkaRead))); + } + return output.apply(readTransform).setCoder(KafkaRecordCoder.of(keyCoder, valueCoder)); } - PCollection output = - input - .getPipeline() - .apply(Impulse.create()) - 
.apply(ParDo.of(new GenerateKafkaSourceDescriptor(this))); - return output.apply(readTransform).setCoder(KafkaRecordCoder.of(keyCoder, valueCoder)); } /** @@ -1205,6 +1535,20 @@ public static class TypedWithoutMetadata extends PTransform + implements ExternalTransformBuilder>> { + + @Override + public PTransform>> buildExternal( + External.Configuration config) { + Read.Builder readBuilder = new AutoValue_KafkaIO_Read.Builder(); + Read.Builder.setupExternalBuilder(readBuilder, config); + + return readBuilder.build().withoutMetadata(); + } + } + @Override public PCollection> expand(PBegin begin) { return begin @@ -1227,6 +1571,152 @@ public void populateDisplayData(DisplayData.Builder builder) { } } + @DefaultSchema(JavaFieldSchema.class) + @SuppressFBWarnings("URF_UNREAD_FIELD") + /** + * Represents a Kafka header. We define a new class so that we can add schema annotations for + * generating Rows. + */ + static class KafkaHeader { + + String key; + byte[] value; + + @SchemaCreate + public KafkaHeader(String key, byte[] value) { + this.key = key; + this.value = value; + } + } + + @DefaultSchema(JavaFieldSchema.class) + @SuppressFBWarnings("URF_UNREAD_FIELD") + /** + * Represents a Kafka record with metadata whey key and values are byte arrays. This class should + * only be used to represent a Kafka record for external transforms. TODO(BEAM-7345): use regular + * KafkaRecord class when Beam Schema inference supports generics. + */ + static class ByteArrayKafkaRecord { + + String topic; + int partition; + long offset; + long timestamp; + byte[] key; + byte[] value; + List headers; + int timestampTypeId; + String timestampTypeName; + + @SchemaCreate + public ByteArrayKafkaRecord( + String topic, + int partition, + long offset, + long timestamp, + byte[] key, + byte[] value, + @Nullable List headers, + int timestampTypeId, + String timestampTypeName) { + this.topic = topic; + this.partition = partition; + this.offset = offset; + this.timestamp = timestamp; + this.key = key; + this.value = value; + this.headers = headers; + this.timestampTypeId = timestampTypeId; + this.timestampTypeName = timestampTypeName; + } + } + + /** + * A {@link PTransform} to read from Kafka topics. Similar to {@link KafkaIO.Read}, but generates + * a {@link PCollection} of {@link Row}. This class is primarily used as a cross-language + * transform since {@link KafkaRecord} is not a type that can be easily encoded using Beam's + * standard coders. See {@link KafkaIO} for more information on usage and configuration of reader. 
+ */ + static class RowsWithMetadata extends PTransform> { + private final Read read; + + RowsWithMetadata(Read read) { + super("KafkaIO.RowsWithMetadata"); + this.read = read; + } + + @Experimental(Kind.PORTABILITY) + static class Builder + implements ExternalTransformBuilder> { + + @Override + public PTransform> buildExternal(External.Configuration config) { + Read.Builder readBuilder = new AutoValue_KafkaIO_Read.Builder(); + Read.Builder.setupExternalBuilder(readBuilder, config); + + Class keyDeserializer = resolveClass(config.keyDeserializer); + Coder keyCoder = Read.Builder.resolveCoder(keyDeserializer); + if (!(keyCoder instanceof ByteArrayCoder)) { + throw new RuntimeException( + "ExternalWithMetadata transform only supports keys of type byte[]"); + } + Class valueDeserializer = resolveClass(config.valueDeserializer); + Coder valueCoder = Read.Builder.resolveCoder(valueDeserializer); + if (!(valueCoder instanceof ByteArrayCoder)) { + throw new RuntimeException( + "ExternalWithMetadata transform only supports values of type byte[]"); + } + + return readBuilder.build().externalWithMetadata(); + } + } + + public static ByteArrayKafkaRecord toExternalKafkaRecord(KafkaRecord kafkaRecord) { + List headers = + (kafkaRecord.getHeaders() == null) + ? null + : Arrays.stream(kafkaRecord.getHeaders().toArray()) + .map(h -> new KafkaHeader(h.key(), h.value())) + .collect(Collectors.toList()); + ByteArrayKafkaRecord byteArrayKafkaRecord = + new ByteArrayKafkaRecord( + kafkaRecord.getTopic(), + kafkaRecord.getPartition(), + kafkaRecord.getOffset(), + kafkaRecord.getTimestamp(), + (byte[]) kafkaRecord.getKV().getKey(), + (byte[]) kafkaRecord.getKV().getValue(), + headers, + kafkaRecord.getTimestampType().id, + kafkaRecord.getTimestampType().name); + + return byteArrayKafkaRecord; + } + + @Override + public PCollection expand(PBegin begin) { + return begin + .apply(read) + .apply( + "Convert to ExternalKafkaRecord", + ParDo.of( + new DoFn, ByteArrayKafkaRecord>() { + @ProcessElement + public void processElement(ProcessContext ctx) { + KafkaRecord kafkRecord = ctx.element(); + ctx.output(toExternalKafkaRecord(kafkRecord)); + } + })) + .apply(Convert.toRows()); + } + + @Override + public void populateDisplayData(DisplayData.Builder builder) { + super.populateDisplayData(builder); + read.populateDisplayData(builder); + } + } + /** * A {@link PTransform} to read from {@link KafkaSourceDescriptor}. See {@link KafkaIO} for more * information on usage and configuration. 
See {@link ReadFromKafkaDoFn} for more implementation @@ -1267,6 +1757,8 @@ public abstract static class ReadSourceDescriptors abstract SerializableFunction, Consumer> getConsumerFactoryFn(); + abstract @Nullable SerializableFunction getCheckStopReadingFn(); + abstract @Nullable SerializableFunction, Instant> getExtractOutputTimestampFn(); @@ -1289,6 +1781,9 @@ abstract ReadSourceDescriptors.Builder setOffsetConsumerConfig( abstract ReadSourceDescriptors.Builder setConsumerFactoryFn( SerializableFunction, Consumer> consumerFactoryFn); + abstract ReadSourceDescriptors.Builder setCheckStopReadingFn( + SerializableFunction checkStopReadingFn); + abstract ReadSourceDescriptors.Builder setKeyDeserializerProvider( DeserializerProvider deserializerProvider); @@ -1402,6 +1897,15 @@ public ReadSourceDescriptors withConsumerFactoryFn( return toBuilder().setConsumerFactoryFn(consumerFactoryFn).build(); } + /** + * A custom {@link SerializableFunction} that determines whether the {@link ReadFromKafkaDoFn} + * should stop reading from the given {@link TopicPartition}. + */ + public ReadSourceDescriptors withCheckStopReadingFn( + SerializableFunction checkStopReadingFn) { + return toBuilder().setCheckStopReadingFn(checkStopReadingFn).build(); + } + /** * Updates configuration for the main consumer. This method merges updates from the provided map * with any prior updates using {@link KafkaIOUtils#DEFAULT_CONSUMER_PROPERTIES} as the starting @@ -1604,15 +2108,10 @@ ReadSourceDescriptors withTimestampPolicyFactory( @Override public PCollection> expand(PCollection input) { - checkArgument( - ExperimentalOptions.hasExperiment(input.getPipeline().getOptions(), "beam_fn_api"), - "The ReadSourceDescriptors can only used when beam_fn_api is enabled."); - checkArgument(getKeyDeserializerProvider() != null, "withKeyDeserializer() is required"); checkArgument(getValueDeserializerProvider() != null, "withValueDeserializer() is required"); - ConsumerSpEL consumerSpEL = new ConsumerSpEL(); - if (!consumerSpEL.hasOffsetsForTimes()) { + if (!ConsumerSpEL.hasOffsetsForTimes()) { LOG.warn( "Kafka client version {} is too old. Versions before 0.10.1.0 are deprecated and " + "may not be supported in next release of Apache Beam. " @@ -1908,10 +2407,8 @@ public WriteRecords withPublishTimestampFunction( * transform ties checkpointing semantics in compatible Beam runners and transactions in Kafka * (version 0.11+) to ensure a record is written only once. As the implementation relies on * runners checkpoint semantics, not all the runners are compatible. The sink throws an - * exception during initialization if the runner is not explicitly allowed. Flink runner is one - * of the runners whose checkpoint semantics are not compatible with current implementation - * (hope to provide a solution in near future). Dataflow runner and Spark runners are - * compatible. + * exception during initialization if the runner is not explicitly allowed. The Dataflow, Flink, + * and Spark runners are compatible. * *

    Note on performance: Exactly-once sink involves two shuffles of the records. In addition * to cost of shuffling the records among workers, the records go through 2 diff --git a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaRecord.java b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaRecord.java index 8e11a6f9143c..b66b895acf62 100644 --- a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaRecord.java +++ b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaRecord.java @@ -38,7 +38,7 @@ public class KafkaRecord { private final String topic; private final int partition; private final long offset; - private final Headers headers; + private final @Nullable Headers headers; private final KV kv; private final long timestamp; private final KafkaTimestampType timestampType; @@ -84,7 +84,7 @@ public long getOffset() { return offset; } - public Headers getHeaders() { + public @Nullable Headers getHeaders() { if (!ConsumerSpEL.hasHeaders()) { throw new RuntimeException( "The version kafka-clients does not support record headers, " diff --git a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedReader.java b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedReader.java index be32c47fbd9f..aaeb1b40c65d 100644 --- a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedReader.java +++ b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedReader.java @@ -82,7 +82,7 @@ class KafkaUnboundedReader extends UnboundedReader> { public boolean start() throws IOException { Read spec = source.getSpec(); consumer = spec.getConsumerFactoryFn().apply(spec.getConsumerConfig()); - consumerSpEL.evaluateAssign(consumer, spec.getTopicPartitions()); + ConsumerSpEL.evaluateAssign(consumer, spec.getTopicPartitions()); keyDeserializerInstance = spec.getKeyDeserializerProvider().getDeserializer(spec.getConsumerConfig(), true); @@ -130,7 +130,7 @@ public boolean start() throws IOException { name, spec.getOffsetConsumerConfig(), spec.getConsumerConfig()); offsetConsumer = spec.getConsumerFactoryFn().apply(offsetConsumerConfig); - consumerSpEL.evaluateAssign(offsetConsumer, spec.getTopicPartitions()); + ConsumerSpEL.evaluateAssign(offsetConsumer, spec.getTopicPartitions()); // Fetch offsets once before running periodically. updateLatestOffsets(); @@ -192,11 +192,11 @@ public boolean advance() throws IOException { rawRecord.topic(), rawRecord.partition(), rawRecord.offset(), - consumerSpEL.getRecordTimestamp(rawRecord), - consumerSpEL.getRecordTimestampType(rawRecord), + ConsumerSpEL.getRecordTimestamp(rawRecord), + ConsumerSpEL.getRecordTimestampType(rawRecord), ConsumerSpEL.hasHeaders() ? rawRecord.headers() : null, - (K) consumerSpEL.deserializeKey(keyDeserializerInstance, rawRecord), - (V) consumerSpEL.deserializeValue(valueDeserializerInstance, rawRecord)); + ConsumerSpEL.deserializeKey(keyDeserializerInstance, rawRecord), + ConsumerSpEL.deserializeValue(valueDeserializerInstance, rawRecord)); curTimestamp = pState.timestampPolicy.getTimestampForRecord(pState.mkTimestampPolicyContext(), record); @@ -357,9 +357,6 @@ public long getSplitBacklogBytes() { private static final long UNINITIALIZED_OFFSET = -1; - // Add SpEL instance to cover the interface difference of Kafka client - private transient ConsumerSpEL consumerSpEL; - /** watermark before any records have been read. 
*/ private static Instant initialWatermark = BoundedWindow.TIMESTAMP_MIN_VALUE; @@ -464,7 +461,6 @@ Instant updateAndGetWatermark() { KafkaUnboundedReader( KafkaUnboundedSource source, @Nullable KafkaCheckpointMark checkpointMark) { - this.consumerSpEL = new ConsumerSpEL(); this.source = source; this.name = "Reader-" + source.getId(); @@ -616,7 +612,7 @@ private void setupInitialOffset(PartitionState pState) { Instant startReadTime = spec.getStartReadTime(); if (startReadTime != null) { pState.nextOffset = - consumerSpEL.offsetForTime(consumer, pState.topicPartition, spec.getStartReadTime()); + ConsumerSpEL.offsetForTime(consumer, pState.topicPartition, spec.getStartReadTime()); consumer.seek(pState.topicPartition, pState.nextOffset); } else { pState.nextOffset = consumer.position(pState.topicPartition); @@ -630,7 +626,7 @@ private void updateLatestOffsets() { for (PartitionState p : partitionStates) { try { Instant fetchTime = Instant.now(); - consumerSpEL.evaluateSeek2End(offsetConsumer, p.topicPartition); + ConsumerSpEL.evaluateSeek2End(offsetConsumer, p.topicPartition); long offset = offsetConsumer.position(p.topicPartition); p.setLatestOffset(offset, fetchTime); } catch (Exception e) { diff --git a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedSource.java b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedSource.java index d0ed7d44d7e0..d1ecce0bdbee 100644 --- a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedSource.java +++ b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedSource.java @@ -66,7 +66,12 @@ public List> split(int desiredNumSplits, PipelineOpti if (partitions.isEmpty()) { try (Consumer consumer = spec.getConsumerFactoryFn().apply(spec.getConsumerConfig())) { for (String topic : spec.getTopics()) { - for (PartitionInfo p : consumer.partitionsFor(topic)) { + List partitionInfoList = consumer.partitionsFor(topic); + checkState( + partitionInfoList != null, + "Could not find any partitions info. Please check Kafka configuration and make sure " + + "that provided topics exist."); + for (PartitionInfo p : partitionInfoList) { partitions.add(new TopicPartition(p.topic(), p.partition())); } } @@ -83,6 +88,10 @@ public List> split(int desiredNumSplits, PipelineOpti "Could not find any partitions. 
Please check Kafka configuration and topic names"); int numSplits = Math.min(desiredNumSplits, partitions.size()); + // XXX make all splits have the same # of partitions + while (partitions.size() % numSplits > 0) { + ++numSplits; + } List> assignments = new ArrayList<>(numSplits); for (int i = 0; i < numSplits; i++) { diff --git a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaWriter.java b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaWriter.java index 7d5357db5709..fb5f9d04cfe4 100644 --- a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaWriter.java +++ b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaWriter.java @@ -70,7 +70,12 @@ public void processElement(ProcessContext ctx) throws Exception { producer.send( new ProducerRecord<>( - topicName, null, timestampMillis, record.key(), record.value(), record.headers()), + topicName, + record.partition(), + timestampMillis, + record.key(), + record.value(), + record.headers()), new SendCallback()); elementsWritten.inc(); diff --git a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java index 08a590be8b9d..b7f8e7a57b03 100644 --- a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java +++ b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java @@ -20,8 +20,11 @@ import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; import java.util.HashMap; +import java.util.HashSet; +import java.util.List; import java.util.Map; import java.util.Optional; +import java.util.Set; import java.util.concurrent.TimeUnit; import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.io.kafka.KafkaIO.ReadSourceDescriptors; @@ -52,6 +55,7 @@ import org.apache.kafka.clients.consumer.ConsumerConfig; import org.apache.kafka.clients.consumer.ConsumerRecord; import org.apache.kafka.clients.consumer.ConsumerRecords; +import org.apache.kafka.common.PartitionInfo; import org.apache.kafka.common.TopicPartition; import org.apache.kafka.common.serialization.Deserializer; import org.joda.time.Duration; @@ -116,6 +120,23 @@ * extractTimestampFn} and {@link * ReadSourceDescriptors#withMonotonicallyIncreasingWatermarkEstimator()} as the {@link * WatermarkEstimator}. + * + *

    Stop Reading from Removed {@link TopicPartition}

    + * + * {@link ReadFromKafkaDoFn} will stop reading from any removed {@link TopicPartition} automatically + * by querying Kafka {@link Consumer} APIs. Please note that reading may not stop as soon as + * the {@link TopicPartition} is removed. For example, the removal could happen at the same time + * as {@link ReadFromKafkaDoFn} performs a {@link Consumer#poll(java.time.Duration)}. In that + * case, {@link ReadFromKafkaDoFn} will still output the fetched records (a sketch of the + * availability check is shown below). + * + *
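The availability check itself is simple; the sketch below mirrors the check this change adds to {@code processElement} (build the set of currently reported TopicPartitions from {@code consumer.listTopics()} and stop if the current one is missing). The helper class and method names are illustrative only.

import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;

final class TopicPartitionAvailability {
  // Returns true if the given TopicPartition is still reported by the broker.
  static boolean isStillAvailable(Consumer<byte[], byte[]> consumer, TopicPartition tp) {
    Set<TopicPartition> existing = new HashSet<>();
    // listTopics() returns a map of topic name to partition metadata.
    for (List<PartitionInfo> infos : consumer.listTopics().values()) {
      for (PartitionInfo info : infos) {
        existing.add(new TopicPartition(info.topic(), info.partition()));
      }
    }
    return existing.contains(tp);
  }
}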

    Stop Reading from Stopped {@link TopicPartition}

    + * + * {@link ReadFromKafkaDoFn} will also stop reading from certain {@link TopicPartition} if it's a + * good time to do so by querying {@link ReadFromKafkaDoFn#checkStopReadingFn}. {@link + * ReadFromKafkaDoFn#checkStopReadingFn} is a customer-provided callback which is used to determine + * whether to stop reading from the given {@link TopicPartition}. Similar to the mechanism of + * stopping reading from removed {@link TopicPartition}, the stopping reading may not happens + * immediately. */ @UnboundedPerElement @SuppressWarnings({ @@ -134,12 +155,15 @@ class ReadFromKafkaDoFn this.extractOutputTimestampFn = transform.getExtractOutputTimestampFn(); this.createWatermarkEstimatorFn = transform.getCreateWatermarkEstimatorFn(); this.timestampPolicyFactory = transform.getTimestampPolicyFactory(); + this.checkStopReadingFn = transform.getCheckStopReadingFn(); } private static final Logger LOG = LoggerFactory.getLogger(ReadFromKafkaDoFn.class); private final Map offsetConsumerConfig; + private final SerializableFunction checkStopReadingFn; + private final SerializableFunction, Consumer> consumerFactoryFn; private final SerializableFunction, Instant> extractOutputTimestampFn; @@ -148,7 +172,6 @@ class ReadFromKafkaDoFn private final TimestampPolicyFactory timestampPolicyFactory; // Valid between bundle start and bundle finish. - private transient ConsumerSpEL consumerSpEL = null; private transient Deserializer keyDeserializerInstance = null; private transient Deserializer valueDeserializerInstance = null; @@ -169,22 +192,20 @@ private static class KafkaLatestOffsetEstimator private final Consumer offsetConsumer; private final TopicPartition topicPartition; - private final ConsumerSpEL consumerSpEL; private final Supplier memoizedBacklog; KafkaLatestOffsetEstimator( Consumer offsetConsumer, TopicPartition topicPartition) { this.offsetConsumer = offsetConsumer; this.topicPartition = topicPartition; - this.consumerSpEL = new ConsumerSpEL(); - this.consumerSpEL.evaluateAssign(this.offsetConsumer, ImmutableList.of(this.topicPartition)); + ConsumerSpEL.evaluateAssign(this.offsetConsumer, ImmutableList.of(this.topicPartition)); memoizedBacklog = Suppliers.memoizeWithExpiration( () -> { - consumerSpEL.evaluateSeek2End(offsetConsumer, topicPartition); + ConsumerSpEL.evaluateSeek2End(offsetConsumer, topicPartition); return offsetConsumer.position(topicPartition); }, - 5, + 1, TimeUnit.SECONDS); } @@ -211,14 +232,14 @@ public OffsetRange initialRestriction(@Element KafkaSourceDescriptor kafkaSource consumerFactoryFn.apply( KafkaIOUtils.getOffsetConsumerConfig( "initialOffset", offsetConsumerConfig, updatedConsumerConfig))) { - consumerSpEL.evaluateAssign( + ConsumerSpEL.evaluateAssign( offsetConsumer, ImmutableList.of(kafkaSourceDescriptor.getTopicPartition())); long startOffset; if (kafkaSourceDescriptor.getStartReadOffset() != null) { startOffset = kafkaSourceDescriptor.getStartReadOffset(); } else if (kafkaSourceDescriptor.getStartReadTime() != null) { startOffset = - consumerSpEL.offsetForTime( + ConsumerSpEL.offsetForTime( offsetConsumer, kafkaSourceDescriptor.getTopicPartition(), kafkaSourceDescriptor.getStartReadTime()); @@ -275,7 +296,11 @@ public ProcessContinuation processElement( RestrictionTracker tracker, WatermarkEstimator watermarkEstimator, OutputReceiver>> receiver) { - // If there is no future work, resume with max timeout and move to the next element. + // Stop processing current TopicPartition when it's time to stop. 
+ if (checkStopReadingFn != null + && checkStopReadingFn.apply(kafkaSourceDescriptor.getTopicPartition())) { + return ProcessContinuation.stop(); + } Map updatedConsumerConfig = overrideBootstrapServersConfig(consumerConfig, kafkaSourceDescriptor); // If there is a timestampPolicyFactory, create the TimestampPolicy for current @@ -288,7 +313,20 @@ public ProcessContinuation processElement( Optional.ofNullable(watermarkEstimator.currentWatermark())); } try (Consumer consumer = consumerFactoryFn.apply(updatedConsumerConfig)) { - consumerSpEL.evaluateAssign( + // Check whether current TopicPartition is still available to read. + Set existingTopicPartitions = new HashSet<>(); + for (List topicPartitionList : consumer.listTopics().values()) { + topicPartitionList.forEach( + partitionInfo -> { + existingTopicPartitions.add( + new TopicPartition(partitionInfo.topic(), partitionInfo.partition())); + }); + } + if (!existingTopicPartitions.contains(kafkaSourceDescriptor.getTopicPartition())) { + return ProcessContinuation.stop(); + } + + ConsumerSpEL.evaluateAssign( consumer, ImmutableList.of(kafkaSourceDescriptor.getTopicPartition())); long startOffset = tracker.currentRestriction().getFrom(); long expectedOffset = startOffset; @@ -311,11 +349,11 @@ public ProcessContinuation processElement( rawRecord.topic(), rawRecord.partition(), rawRecord.offset(), - consumerSpEL.getRecordTimestamp(rawRecord), - consumerSpEL.getRecordTimestampType(rawRecord), + ConsumerSpEL.getRecordTimestamp(rawRecord), + ConsumerSpEL.getRecordTimestampType(rawRecord), ConsumerSpEL.hasHeaders() ? rawRecord.headers() : null, - (K) consumerSpEL.deserializeKey(keyDeserializerInstance, rawRecord), - (V) consumerSpEL.deserializeValue(valueDeserializerInstance, rawRecord)); + ConsumerSpEL.deserializeKey(keyDeserializerInstance, rawRecord), + ConsumerSpEL.deserializeValue(valueDeserializerInstance, rawRecord)); int recordSize = (rawRecord.key() == null ? 0 : rawRecord.key().length) + (rawRecord.value() == null ? 0 : rawRecord.value().length); @@ -361,7 +399,6 @@ public AverageRecordSize load(TopicPartition topicPartition) throws Exception { return new AverageRecordSize(); } }); - consumerSpEL = new ConsumerSpEL(); keyDeserializerInstance = keyDeserializerProvider.getDeserializer(consumerConfig, true); valueDeserializerInstance = valueDeserializerProvider.getDeserializer(consumerConfig, false); } diff --git a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/TopicPartitionCoder.java b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/TopicPartitionCoder.java new file mode 100644 index 000000000000..4868dc2c9734 --- /dev/null +++ b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/TopicPartitionCoder.java @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.kafka; + +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.util.List; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.CoderException; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.coders.StructuredCoder; +import org.apache.beam.sdk.coders.VarIntCoder; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.kafka.common.TopicPartition; + +/** The {@link Coder} for encoding and decoding {@link TopicPartition} in Beam. */ +@SuppressWarnings({"nullness"}) +public class TopicPartitionCoder extends StructuredCoder { + + @Override + public void encode(TopicPartition value, OutputStream outStream) + throws CoderException, IOException { + StringUtf8Coder.of().encode(value.topic(), outStream); + VarIntCoder.of().encode(value.partition(), outStream); + } + + @Override + public TopicPartition decode(InputStream inStream) throws CoderException, IOException { + String topic = StringUtf8Coder.of().decode(inStream); + int partition = VarIntCoder.of().decode(inStream); + return new TopicPartition(topic, partition); + } + + @Override + public List> getCoderArguments() { + return ImmutableList.of(StringUtf8Coder.of(), VarIntCoder.of()); + } + + @Override + public void verifyDeterministic() throws NonDeterministicException {} +} diff --git a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/WatchKafkaTopicPartitionDoFn.java b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/WatchKafkaTopicPartitionDoFn.java new file mode 100644 index 000000000000..f021e360361f --- /dev/null +++ b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/WatchKafkaTopicPartitionDoFn.java @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.kafka; + +import java.util.HashSet; +import java.util.List; +import java.util.Map; +import java.util.Set; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.Metrics; +import org.apache.beam.sdk.state.BagState; +import org.apache.beam.sdk.state.StateSpec; +import org.apache.beam.sdk.state.StateSpecs; +import org.apache.beam.sdk.state.TimeDomain; +import org.apache.beam.sdk.state.Timer; +import org.apache.beam.sdk.state.TimerSpec; +import org.apache.beam.sdk.state.TimerSpecs; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Sets; +import org.apache.kafka.clients.consumer.Consumer; +import org.apache.kafka.common.PartitionInfo; +import org.apache.kafka.common.TopicPartition; +import org.joda.time.Duration; +import org.joda.time.Instant; + +/** + * A stateful {@linkl DoFn} that emits new available {@link TopicPartition} regularly. + * + *

    Please refer to + * https://docs.google.com/document/d/1FU3GxVRetHPLVizP3Mdv6mP5tpjZ3fd99qNjUI5DT5k/edit# for more + * details. + */ +@SuppressWarnings({"nullness"}) +@Experimental +class WatchKafkaTopicPartitionDoFn extends DoFn, KafkaSourceDescriptor> { + + private static final Duration DEFAULT_CHECK_DURATION = Duration.standardHours(1); + private static final String TIMER_ID = "watch_timer"; + private static final String STATE_ID = "topic_partition_set"; + private final Duration checkDuration; + + private final SerializableFunction, Consumer> + kafkaConsumerFactoryFn; + private final SerializableFunction checkStopReadingFn; + private final Map kafkaConsumerConfig; + private final Instant startReadTime; + + private static final String COUNTER_NAMESPACE = "watch_kafka_topic_partition"; + + private final List topics; + + WatchKafkaTopicPartitionDoFn( + Duration checkDuration, + SerializableFunction, Consumer> kafkaConsumerFactoryFn, + SerializableFunction checkStopReadingFn, + Map kafkaConsumerConfig, + Instant startReadTime, + List topics) { + this.checkDuration = checkDuration == null ? DEFAULT_CHECK_DURATION : checkDuration; + this.kafkaConsumerFactoryFn = kafkaConsumerFactoryFn; + this.checkStopReadingFn = checkStopReadingFn; + this.kafkaConsumerConfig = kafkaConsumerConfig; + this.startReadTime = startReadTime; + this.topics = topics; + } + + @TimerId(TIMER_ID) + private final TimerSpec timerSpec = TimerSpecs.timer(TimeDomain.PROCESSING_TIME); + + @StateId(STATE_ID) + private final StateSpec> bagStateSpec = + StateSpecs.bag(new TopicPartitionCoder()); + + @VisibleForTesting + Set getAllTopicPartitions() { + Set current = new HashSet<>(); + try (Consumer kafkaConsumer = + kafkaConsumerFactoryFn.apply(kafkaConsumerConfig)) { + if (topics != null && !topics.isEmpty()) { + for (String topic : topics) { + for (PartitionInfo partition : kafkaConsumer.partitionsFor(topic)) { + current.add(new TopicPartition(topic, partition.partition())); + } + } + + } else { + for (Map.Entry> topicInfo : + kafkaConsumer.listTopics().entrySet()) { + for (PartitionInfo partition : topicInfo.getValue()) { + current.add(new TopicPartition(topicInfo.getKey(), partition.partition())); + } + } + } + } + return current; + } + + @ProcessElement + public void processElement( + @TimerId(TIMER_ID) Timer timer, + @StateId(STATE_ID) BagState existingTopicPartitions, + OutputReceiver outputReceiver) { + // For the first time, we emit all available TopicPartition and write them into State. + Set current = getAllTopicPartitions(); + current.forEach( + topicPartition -> { + if (checkStopReadingFn == null || !checkStopReadingFn.apply(topicPartition)) { + Counter foundedTopicPartition = + Metrics.counter(COUNTER_NAMESPACE, topicPartition.toString()); + foundedTopicPartition.inc(); + existingTopicPartitions.add(topicPartition); + outputReceiver.output( + KafkaSourceDescriptor.of(topicPartition, null, startReadTime, null)); + } + }); + + timer.offset(checkDuration).setRelative(); + } + + @OnTimer(TIMER_ID) + public void onTimer( + @TimerId(TIMER_ID) Timer timer, + @StateId(STATE_ID) BagState existingTopicPartitions, + OutputReceiver outputReceiver) { + Set readingTopicPartitions = new HashSet<>(); + existingTopicPartitions + .read() + .forEach( + topicPartition -> { + readingTopicPartitions.add(topicPartition); + }); + existingTopicPartitions.clear(); + + Set currentAll = this.getAllTopicPartitions(); + + // Emit new added TopicPartitions. 
+ Set newAdded = Sets.difference(currentAll, readingTopicPartitions); + newAdded.forEach( + topicPartition -> { + if (checkStopReadingFn == null || !checkStopReadingFn.apply(topicPartition)) { + Counter foundedTopicPartition = + Metrics.counter(COUNTER_NAMESPACE, topicPartition.toString()); + foundedTopicPartition.inc(); + outputReceiver.output( + KafkaSourceDescriptor.of(topicPartition, null, startReadTime, null)); + } + }); + + // Update the State. + currentAll.forEach( + topicPartition -> { + if (checkStopReadingFn == null || !checkStopReadingFn.apply(topicPartition)) { + existingTopicPartitions.add(topicPartition); + } + }); + + // Reset the timer. + timer.set(Instant.now().plus(checkDuration.getMillis())); + } +} diff --git a/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/ConfluentSchemaRegistryDeserializerProviderTest.java b/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/ConfluentSchemaRegistryDeserializerProviderTest.java index db8ce7032cb2..9e44146e82d2 100644 --- a/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/ConfluentSchemaRegistryDeserializerProviderTest.java +++ b/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/ConfluentSchemaRegistryDeserializerProviderTest.java @@ -40,7 +40,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class ConfluentSchemaRegistryDeserializerProviderTest { @Test diff --git a/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/CustomTimestampPolicyWithLimitedDelayTest.java b/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/CustomTimestampPolicyWithLimitedDelayTest.java index 44474e2bf7e5..7cca7ca628af 100644 --- a/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/CustomTimestampPolicyWithLimitedDelayTest.java +++ b/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/CustomTimestampPolicyWithLimitedDelayTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.io.kafka; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.core.Is.is; -import static org.junit.Assert.assertThat; import static org.mockito.Mockito.mock; import static org.mockito.Mockito.when; @@ -36,9 +36,6 @@ /** Tests for {@link CustomTimestampPolicyWithLimitedDelay}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class CustomTimestampPolicyWithLimitedDelayTest { // Takes offsets of timestamps from now returns the results as offsets from 'now'. 
diff --git a/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/KafkaIOExternalTest.java b/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/KafkaIOExternalTest.java index 82c67eb1064d..928cd9a87e34 100644 --- a/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/KafkaIOExternalTest.java +++ b/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/KafkaIOExternalTest.java @@ -18,10 +18,12 @@ package org.apache.beam.sdk.io.kafka; import static org.hamcrest.MatcherAssert.assertThat; +import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNotNull; import java.io.ByteArrayOutputStream; import java.io.IOException; +import java.nio.charset.StandardCharsets; import java.util.List; import java.util.Map; import java.util.stream.Collectors; @@ -36,6 +38,9 @@ import org.apache.beam.sdk.coders.KvCoder; import org.apache.beam.sdk.coders.StringUtf8Coder; import org.apache.beam.sdk.expansion.service.ExpansionService; +import org.apache.beam.sdk.io.kafka.KafkaIO.ByteArrayKafkaRecord; +import org.apache.beam.sdk.io.kafka.KafkaIO.Read.External; +import org.apache.beam.sdk.io.kafka.KafkaIO.RowsWithMetadata; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.schemas.Schema.Field; import org.apache.beam.sdk.schemas.Schema.FieldType; @@ -46,14 +51,16 @@ import org.apache.beam.sdk.transforms.WithKeys; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.Row; -import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString; -import org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.StreamObserver; +import org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.ByteString; +import org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.StreamObserver; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; import org.apache.kafka.clients.consumer.ConsumerConfig; import org.apache.kafka.clients.producer.ProducerConfig; +import org.apache.kafka.common.header.internals.RecordHeaders; import org.hamcrest.Matchers; +import org.hamcrest.text.MatchesPattern; import org.junit.Test; import org.junit.runner.RunWith; import org.junit.runners.JUnit4; @@ -63,14 +70,33 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class KafkaIOExternalTest { + + private void verifyKafkaReadComposite( + RunnerApi.PTransform kafkaSDFReadComposite, ExpansionApi.ExpansionResponse result) + throws Exception { + assertThat( + kafkaSDFReadComposite.getSubtransformsList(), + Matchers.hasItem(MatchesPattern.matchesPattern(".*Impulse.*"))); + assertThat( + kafkaSDFReadComposite.getSubtransformsList(), + Matchers.hasItem(MatchesPattern.matchesPattern(".*GenerateKafkaSourceDescriptor.*"))); + assertThat( + kafkaSDFReadComposite.getSubtransformsList(), + Matchers.hasItem(MatchesPattern.matchesPattern(".*ReadSourceDescriptors.*"))); + RunnerApi.PTransform kafkaSdfParDo = + result.getComponents().getTransformsOrThrow(kafkaSDFReadComposite.getSubtransforms(2)); + RunnerApi.ParDoPayload parDoPayload = + RunnerApi.ParDoPayload.parseFrom(kafkaSdfParDo.getSpec().getPayload()); + assertNotNull(parDoPayload.getRestrictionCoderId()); + } + @Test public void testConstructKafkaRead() throws Exception { List topics = 
ImmutableList.of("topic1", "topic2"); String keyDeserializer = "org.apache.kafka.common.serialization.ByteArrayDeserializer"; - String valueDeserializer = "org.apache.kafka.common.serialization.LongDeserializer"; + String valueDeserializer = "org.apache.kafka.common.serialization.ByteArrayDeserializer"; ImmutableMap consumerConfig = ImmutableMap.builder() .put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "server1:port,server2:port") @@ -89,12 +115,16 @@ public void testConstructKafkaRead() throws Exception { "consumer_config", FieldType.map(FieldType.STRING, FieldType.STRING)), Field.of("key_deserializer", FieldType.STRING), Field.of("value_deserializer", FieldType.STRING), - Field.of("start_read_time", FieldType.INT64))) + Field.of("start_read_time", FieldType.INT64), + Field.of("commit_offset_in_finalize", FieldType.BOOLEAN), + Field.of("timestamp_policy", FieldType.STRING))) .withFieldValue("topics", topics) .withFieldValue("consumer_config", consumerConfig) .withFieldValue("key_deserializer", keyDeserializer) .withFieldValue("value_deserializer", valueDeserializer) .withFieldValue("start_read_time", startReadTime) + .withFieldValue("commit_offset_in_finalize", false) + .withFieldValue("timestamp_policy", "ProcessingTime") .build()); RunnerApi.Components defaultInstance = RunnerApi.Components.getDefaultInstance(); @@ -106,7 +136,7 @@ public void testConstructKafkaRead() throws Exception { .setUniqueName("test") .setSpec( RunnerApi.FunctionSpec.newBuilder() - .setUrn("beam:external:java:kafka:read:v1") + .setUrn(External.URN_WITH_METADATA) .setPayload(payload.toByteString()))) .setNamespace("test_namespace") .build(); @@ -117,22 +147,124 @@ public void testConstructKafkaRead() throws Exception { RunnerApi.PTransform transform = result.getTransform(); assertThat( transform.getSubtransformsList(), - Matchers.contains( - "test_namespacetest/KafkaIO.Read", "test_namespacetest/Remove Kafka Metadata")); + Matchers.hasItem(MatchesPattern.matchesPattern(".*KafkaIO-Read.*"))); + assertThat( + transform.getSubtransformsList(), + Matchers.hasItem(MatchesPattern.matchesPattern(".*Convert-to-ExternalKafkaRecord.*"))); + assertThat( + transform.getSubtransformsList(), + Matchers.hasItem(MatchesPattern.matchesPattern(".*Convert-ConvertTransform.*"))); assertThat(transform.getInputsCount(), Matchers.is(0)); assertThat(transform.getOutputsCount(), Matchers.is(1)); - RunnerApi.PTransform kafkaComposite = + RunnerApi.PTransform kafkaReadComposite = result.getComponents().getTransformsOrThrow(transform.getSubtransforms(0)); + + verifyKafkaReadComposite( + result.getComponents().getTransformsOrThrow(kafkaReadComposite.getSubtransforms(0)), + result); + } + + @Test + public void testKafkaRecordToExternalKafkaRecord() throws Exception { + RecordHeaders headers = new RecordHeaders(); + headers.add("dummyHeaderKey", "dummyHeaderVal".getBytes(StandardCharsets.UTF_8)); + KafkaRecord kafkaRecord = + new KafkaRecord( + "dummyTopic", + 111, + 222, + 12345, + KafkaTimestampType.LOG_APPEND_TIME, + headers, + "dummyKey".getBytes(StandardCharsets.UTF_8), + "dummyValue".getBytes(StandardCharsets.UTF_8)); + + ByteArrayKafkaRecord byteArrayKafkaRecord = RowsWithMetadata.toExternalKafkaRecord(kafkaRecord); + + assertEquals("dummyTopic", byteArrayKafkaRecord.topic); + assertEquals(111, byteArrayKafkaRecord.partition); + assertEquals(222, byteArrayKafkaRecord.offset); + assertEquals(12345, byteArrayKafkaRecord.timestamp); + assertEquals(KafkaTimestampType.LOG_APPEND_TIME.id, byteArrayKafkaRecord.timestampTypeId); + 
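// The external (cross-language) record carries both the numeric id and the name of Kafka's timestamp type. +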
assertEquals(KafkaTimestampType.LOG_APPEND_TIME.name, byteArrayKafkaRecord.timestampTypeName); + assertEquals("dummyKey", new String(byteArrayKafkaRecord.key, "UTF-8")); + assertEquals("dummyValue", new String(byteArrayKafkaRecord.value, "UTF-8")); + assertEquals(1, byteArrayKafkaRecord.headers.size()); + assertEquals("dummyHeaderKey", byteArrayKafkaRecord.headers.get(0).key); + assertEquals("dummyHeaderVal", new String(byteArrayKafkaRecord.headers.get(0).value, "UTF-8")); + } + + @Test + public void testConstructKafkaReadWithoutMetadata() throws Exception { + List topics = ImmutableList.of("topic1", "topic2"); + String keyDeserializer = "org.apache.kafka.common.serialization.ByteArrayDeserializer"; + String valueDeserializer = "org.apache.kafka.common.serialization.LongDeserializer"; + ImmutableMap consumerConfig = + ImmutableMap.builder() + .put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "server1:port,server2:port") + .put("key2", "value2") + .put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, keyDeserializer) + .put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, valueDeserializer) + .build(); + Long startReadTime = 100L; + + ExternalTransforms.ExternalConfigurationPayload payload = + encodeRow( + Row.withSchema( + Schema.of( + Field.of("topics", FieldType.array(FieldType.STRING)), + Field.of( + "consumer_config", FieldType.map(FieldType.STRING, FieldType.STRING)), + Field.of("key_deserializer", FieldType.STRING), + Field.of("value_deserializer", FieldType.STRING), + Field.of("start_read_time", FieldType.INT64), + Field.of("commit_offset_in_finalize", FieldType.BOOLEAN), + Field.of("timestamp_policy", FieldType.STRING))) + .withFieldValue("topics", topics) + .withFieldValue("consumer_config", consumerConfig) + .withFieldValue("key_deserializer", keyDeserializer) + .withFieldValue("value_deserializer", valueDeserializer) + .withFieldValue("start_read_time", startReadTime) + .withFieldValue("commit_offset_in_finalize", false) + .withFieldValue("timestamp_policy", "ProcessingTime") + .build()); + + RunnerApi.Components defaultInstance = RunnerApi.Components.getDefaultInstance(); + ExpansionApi.ExpansionRequest request = + ExpansionApi.ExpansionRequest.newBuilder() + .setComponents(defaultInstance) + .setTransform( + RunnerApi.PTransform.newBuilder() + .setUniqueName("test") + .setSpec( + RunnerApi.FunctionSpec.newBuilder() + .setUrn(External.URN_WITHOUT_METADATA) + .setPayload(payload.toByteString()))) + .setNamespace("test_namespace") + .build(); + ExpansionService expansionService = new ExpansionService(); + TestStreamObserver observer = new TestStreamObserver<>(); + expansionService.expand(request, observer); + ExpansionApi.ExpansionResponse result = observer.result; + RunnerApi.PTransform transform = result.getTransform(); + assertThat( + transform.getSubtransformsList(), + Matchers.hasItem(MatchesPattern.matchesPattern(".*KafkaIO-Read.*"))); + assertThat( + transform.getSubtransformsList(), + Matchers.hasItem(MatchesPattern.matchesPattern(".*Remove-Kafka-Metadata.*"))); + assertThat(transform.getInputsCount(), Matchers.is(0)); + assertThat(transform.getOutputsCount(), Matchers.is(1)); + RunnerApi.PTransform kafkaReadComposite = - result.getComponents().getTransformsOrThrow(kafkaComposite.getSubtransforms(0)); - RunnerApi.PTransform kafkaSdfComposite = - result.getComponents().getTransformsOrThrow(kafkaReadComposite.getSubtransforms(2)); - RunnerApi.PTransform kafkaSdfParDo = - result.getComponents().getTransformsOrThrow(kafkaSdfComposite.getSubtransforms(0)); - RunnerApi.ParDoPayload 
parDoPayload = - RunnerApi.ParDoPayload.parseFrom(kafkaSdfParDo.getSpec().getPayload()); - assertNotNull(parDoPayload.getRestrictionCoderId()); + result.getComponents().getTransformsOrThrow(transform.getSubtransforms(0)); + RunnerApi.PTransform kafkaComposite = + result.getComponents().getTransformsOrThrow(kafkaReadComposite.getSubtransforms(0)); + + verifyKafkaReadComposite( + result.getComponents().getTransformsOrThrow(kafkaReadComposite.getSubtransforms(0)), + result); } @Test @@ -192,8 +324,10 @@ public void testConstructKafkaWrite() throws Exception { RunnerApi.PTransform transform = result.getTransform(); assertThat( transform.getSubtransformsList(), - Matchers.contains( - "test_namespacetest/Kafka ProducerRecord", "test_namespacetest/KafkaIO.WriteRecords")); + Matchers.hasItem(MatchesPattern.matchesPattern(".*Kafka-ProducerRecord.*"))); + assertThat( + transform.getSubtransformsList(), + Matchers.hasItem(MatchesPattern.matchesPattern(".*KafkaIO-WriteRecords.*"))); assertThat(transform.getInputsCount(), Matchers.is(1)); assertThat(transform.getOutputsCount(), Matchers.is(0)); diff --git a/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/KafkaIOIT.java b/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/KafkaIOIT.java index b2772c3f09d7..09cefbf6c57a 100644 --- a/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/KafkaIOIT.java +++ b/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/KafkaIOIT.java @@ -77,9 +77,6 @@ * topic so that we could read them back after writing. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class KafkaIOIT { private static final String READ_TIME_METRIC_NAME = "read_time"; diff --git a/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/KafkaIOTest.java b/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/KafkaIOTest.java index 1f7d5902fb26..33ea0ae0cc92 100644 --- a/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/KafkaIOTest.java +++ b/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/KafkaIOTest.java @@ -20,15 +20,15 @@ import static org.apache.beam.sdk.io.kafka.ConfluentSchemaRegistryDeserializerProviderTest.mockDeserializerProvider; import static org.apache.beam.sdk.metrics.MetricResultsMatchers.attemptedMetricsResult; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasDisplayItem; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.containsString; import static org.hamcrest.Matchers.greaterThan; import static org.hamcrest.Matchers.hasItem; +import static org.hamcrest.Matchers.instanceOf; import static org.hamcrest.Matchers.isA; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertNotEquals; import static org.junit.Assert.assertNotNull; import static org.junit.Assert.assertNull; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import static org.junit.Assume.assumeTrue; import static org.junit.internal.matchers.ThrowableCauseMatcher.hasCause; @@ -41,7 +41,7 @@ import java.io.IOException; import java.lang.reflect.Method; import java.nio.ByteBuffer; -import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; import java.util.AbstractMap.SimpleEntry; import java.util.ArrayList; import java.util.Arrays; @@ -149,9 +149,6 @@ * specific Kafka version. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class KafkaIOTest { private static final Logger LOG = LoggerFactory.getLogger(KafkaIOTest.class); @@ -508,50 +505,34 @@ public void testReadAvroSpecificRecordsWithConfluentSchemaRegistry() { public static class IntegerDeserializerWithHeadersAssertor extends IntegerDeserializer implements Deserializer { - ConsumerSpEL consumerSpEL = null; @Override public Integer deserialize(String topic, byte[] data) { - StackTraceElement[] stackTraceElements = Thread.currentThread().getStackTrace(); - if (consumerSpEL == null) { - consumerSpEL = new ConsumerSpEL(); - } - if (consumerSpEL.deserializerSupportsHeaders()) { - // Assert we have the default deserializer with headers API in the stack trace for Kafka API - // 2.1.0 onwards - try { - assertEquals(Deserializer.class, Class.forName(stackTraceElements[3].getClassName())); - assertEquals("deserialize", stackTraceElements[3].getMethodName()); - } catch (ClassNotFoundException e) { - } - } else { - assertNotEquals("deserialize", stackTraceElements[3].getMethodName()); - } + assertEquals(false, ConsumerSpEL.deserializerSupportsHeaders()); + return super.deserialize(topic, data); + } + + @Override + public Integer deserialize(String topic, Headers headers, byte[] data) { + // Overriding the default should trigger header evaluation and this to be called. + assertEquals(true, ConsumerSpEL.deserializerSupportsHeaders()); return super.deserialize(topic, data); } } public static class LongDeserializerWithHeadersAssertor extends LongDeserializer implements Deserializer { - ConsumerSpEL consumerSpEL = null; @Override public Long deserialize(String topic, byte[] data) { - if (consumerSpEL == null) { - consumerSpEL = new ConsumerSpEL(); - } - StackTraceElement[] stackTraceElements = Thread.currentThread().getStackTrace(); - if (consumerSpEL.deserializerSupportsHeaders()) { - // Assert we have the default deserializer with headers API in the stack trace for Kafka API - // 2.1.0 onwards - try { - assertEquals(Deserializer.class, Class.forName(stackTraceElements[3].getClassName())); - assertEquals("deserialize", stackTraceElements[3].getMethodName()); - } catch (ClassNotFoundException e) { - } - } else { - assertNotEquals("deserialize", stackTraceElements[3].getMethodName()); - } + assertEquals(false, ConsumerSpEL.deserializerSupportsHeaders()); + return super.deserialize(topic, data); + } + + @Override + public Long deserialize(String topic, Headers headers, byte[] data) { + // Overriding the default should trigger header evaluation and this to be called. + assertEquals(true, ConsumerSpEL.deserializerSupportsHeaders()); return super.deserialize(topic, data); } } @@ -709,6 +690,31 @@ public void testUnboundedSourceWithExplicitPartitions() { p.run(); } + @Test + public void testUnboundedSourceWithWrongTopic() { + // Expect an exception when provided Kafka topic doesn't exist. + thrown.expect(PipelineExecutionException.class); + thrown.expectCause(instanceOf(IllegalStateException.class)); + thrown.expectMessage( + "Could not find any partitions info. 
Please check Kafka configuration and make sure that " + + "provided topics exist."); + + int numElements = 1000; + KafkaIO.Read reader = + KafkaIO.read() + .withBootstrapServers("none") + .withTopic("wrong_topic") // read from topic that doesn't exist + .withConsumerFactoryFn( + new ConsumerFactoryFn( + ImmutableList.of("my_topic"), 10, numElements, OffsetResetStrategy.EARLIEST)) + .withMaxNumRecords(numElements) + .withKeyDeserializer(IntegerDeserializer.class) + .withValueDeserializer(LongDeserializer.class); + + p.apply(reader.withoutMetadata()).apply(Values.create()); + p.run(); + } + private static class ElementValueDiff extends DoFn { @ProcessElement public void processElement(ProcessContext c) throws Exception { @@ -1408,7 +1414,7 @@ public void testKafkaWriteHeaders() throws Exception { assertEquals(1, headersArray.length); assertEquals(header.getKey(), headersArray[0].key()); assertEquals( - header.getValue(), new String(headersArray[0].value(), Charset.defaultCharset())); + header.getValue(), new String(headersArray[0].value(), StandardCharsets.UTF_8)); } } } @@ -1452,9 +1458,49 @@ public void testSinkProducerRecordsWithCustomTS() throws Exception { } } + @Test + public void testSinkProducerRecordsWithCustomPartition() throws Exception { + int numElements = 1000; + + try (MockProducerWrapper producerWrapper = new MockProducerWrapper()) { + + ProducerSendCompletionThread completionThread = + new ProducerSendCompletionThread(producerWrapper.mockProducer).start(); + + final String defaultTopic = "test"; + final Integer partition = 1; + + p.apply(mkKafkaReadTransform(numElements, new ValueAsTimestampFn()).withoutMetadata()) + .apply(ParDo.of(new KV2ProducerRecord(defaultTopic, partition))) + .setCoder(ProducerRecordCoder.of(VarIntCoder.of(), VarLongCoder.of())) + .apply( + KafkaIO.writeRecords() + .withBootstrapServers("none") + .withKeySerializer(IntegerSerializer.class) + .withValueSerializer(LongSerializer.class) + .withProducerFactoryFn(new ProducerFactoryFn(producerWrapper.producerKey))); + + p.run(); + + completionThread.shutdown(); + + // Verify that messages are written with user-defined timestamp + List> sent = producerWrapper.mockProducer.history(); + + for (int i = 0; i < numElements; i++) { + ProducerRecord record = sent.get(i); + assertEquals(defaultTopic, record.topic()); + assertEquals(partition, record.partition()); + assertEquals(i, record.key().intValue()); + assertEquals(i, record.value().longValue()); + } + } + } + private static class KV2ProducerRecord extends DoFn, ProducerRecord> { final String topic; + final Integer partition; final boolean isSingleTopic; final Long ts; final SimpleEntry header; @@ -1463,6 +1509,10 @@ private static class KV2ProducerRecord this(topic, true); } + KV2ProducerRecord(String topic, Integer partition) { + this(topic, true, null, null, partition); + } + KV2ProducerRecord(String topic, Long ts) { this(topic, true, ts); } @@ -1472,12 +1522,22 @@ private static class KV2ProducerRecord } KV2ProducerRecord(String topic, boolean isSingleTopic, Long ts) { - this(topic, isSingleTopic, ts, null); + this(topic, isSingleTopic, ts, null, null); } KV2ProducerRecord( String topic, boolean isSingleTopic, Long ts, SimpleEntry header) { + this(topic, isSingleTopic, ts, header, null); + } + + KV2ProducerRecord( + String topic, + boolean isSingleTopic, + Long ts, + SimpleEntry header, + Integer partition) { this.topic = topic; + this.partition = partition; this.isSingleTopic = isSingleTopic; this.ts = ts; this.header = header; @@ -1491,17 +1551,19 @@ 
public void processElement(ProcessContext ctx) { headers = Arrays.asList( new RecordHeader( - header.getKey(), header.getValue().getBytes(Charset.defaultCharset()))); + header.getKey(), header.getValue().getBytes(StandardCharsets.UTF_8))); } if (isSingleTopic) { - ctx.output(new ProducerRecord<>(topic, null, ts, kv.getKey(), kv.getValue(), headers)); + ctx.output(new ProducerRecord<>(topic, partition, ts, kv.getKey(), kv.getValue(), headers)); } else { if (kv.getKey() % 2 == 0) { ctx.output( - new ProducerRecord<>(topic + "_2", null, ts, kv.getKey(), kv.getValue(), headers)); + new ProducerRecord<>( + topic + "_2", partition, ts, kv.getKey(), kv.getValue(), headers)); } else { ctx.output( - new ProducerRecord<>(topic + "_1", null, ts, kv.getKey(), kv.getValue(), headers)); + new ProducerRecord<>( + topic + "_1", partition, ts, kv.getKey(), kv.getValue(), headers)); } } } @@ -1597,7 +1659,7 @@ public void testSinkWithSendErrors() throws Throwable { @Test public void testUnboundedSourceStartReadTime() { - assumeTrue(new ConsumerSpEL().hasOffsetsForTimes()); + assumeTrue(ConsumerSpEL.hasOffsetsForTimes()); int numElements = 1000; // In this MockConsumer, we let the elements of the time and offset equal and there are 20 @@ -1621,7 +1683,7 @@ public void testUnboundedSourceStartReadTime() { @Test public void testUnboundedSourceStartReadTimeException() { - assumeTrue(new ConsumerSpEL().hasOffsetsForTimes()); + assumeTrue(ConsumerSpEL.hasOffsetsForTimes()); noMessagesException.expect(RuntimeException.class); diff --git a/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/KafkaIOUtilsTest.java b/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/KafkaIOUtilsTest.java index 4392fcbba3fc..62083fa6ddcc 100644 --- a/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/KafkaIOUtilsTest.java +++ b/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/KafkaIOUtilsTest.java @@ -31,7 +31,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class KafkaIOUtilsTest { diff --git a/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/ProducerRecordCoderTest.java b/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/ProducerRecordCoderTest.java index 652ecdeea040..9ba997dfa183 100644 --- a/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/ProducerRecordCoderTest.java +++ b/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/ProducerRecordCoderTest.java @@ -44,7 +44,6 @@ @PrepareForTest(ConsumerSpEL.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class ProducerRecordCoderTest { @Test diff --git a/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFnTest.java b/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFnTest.java new file mode 100644 index 000000000000..7bcfe108ee89 --- /dev/null +++ b/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFnTest.java @@ -0,0 +1,343 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.kafka; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertTrue; + +import java.util.ArrayList; +import java.util.Collection; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; +import org.apache.beam.sdk.io.kafka.KafkaIO.ReadSourceDescriptors; +import org.apache.beam.sdk.io.range.OffsetRange; +import org.apache.beam.sdk.transforms.DoFn.OutputReceiver; +import org.apache.beam.sdk.transforms.DoFn.ProcessContinuation; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; +import org.apache.kafka.clients.consumer.Consumer; +import org.apache.kafka.clients.consumer.ConsumerRecord; +import org.apache.kafka.clients.consumer.ConsumerRecords; +import org.apache.kafka.clients.consumer.MockConsumer; +import org.apache.kafka.clients.consumer.OffsetAndTimestamp; +import org.apache.kafka.clients.consumer.OffsetResetStrategy; +import org.apache.kafka.common.PartitionInfo; +import org.apache.kafka.common.TopicPartition; +import org.apache.kafka.common.header.internals.RecordHeaders; +import org.apache.kafka.common.serialization.StringDeserializer; +import org.checkerframework.checker.initialization.qual.Initialized; +import org.checkerframework.checker.nullness.qual.NonNull; +import org.checkerframework.checker.nullness.qual.UnknownKeyFor; +import org.joda.time.Instant; +import org.junit.Before; +import org.junit.Test; +import org.testcontainers.shaded.com.google.common.collect.ImmutableMap; + +@SuppressWarnings({ + "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) +}) +public class ReadFromKafkaDoFnTest { + + private final TopicPartition topicPartition = new TopicPartition("topic", 0); + + private final SimpleMockKafkaConsumer consumer = + new SimpleMockKafkaConsumer(OffsetResetStrategy.NONE, topicPartition); + + private final ReadFromKafkaDoFn dofnInstance = + new ReadFromKafkaDoFn(makeReadSourceDescriptor(consumer)); + + private ReadSourceDescriptors makeReadSourceDescriptor( + Consumer kafkaMockConsumer) { + return ReadSourceDescriptors.read() + .withKeyDeserializer(StringDeserializer.class) + .withValueDeserializer(StringDeserializer.class) + .withConsumerFactoryFn( + new SerializableFunction, Consumer>() { + @Override + public Consumer apply(Map input) { + return kafkaMockConsumer; + } + }) + .withBootstrapServers("bootstrap_server"); + } + + private static class SimpleMockKafkaConsumer extends MockConsumer { + + private final TopicPartition topicPartition; + private boolean isRemoved = false; + private long currentPos = 0L; + private 
long startOffset = 0L; + private long startOffsetForTime = 0L; + private long numOfRecordsPerPoll; + + public SimpleMockKafkaConsumer( + OffsetResetStrategy offsetResetStrategy, TopicPartition topicPartition) { + super(offsetResetStrategy); + this.topicPartition = topicPartition; + } + + public void reset() { + this.isRemoved = false; + this.currentPos = 0L; + this.startOffset = 0L; + this.startOffsetForTime = 0L; + this.numOfRecordsPerPoll = 0L; + } + + public void setRemoved() { + this.isRemoved = true; + } + + public void setNumOfRecordsPerPoll(long num) { + this.numOfRecordsPerPoll = num; + } + + public void setCurrentPos(long pos) { + this.currentPos = pos; + } + + public void setStartOffsetForTime(long pos) { + this.startOffsetForTime = pos; + } + + @Override + public synchronized Map> listTopics() { + if (this.isRemoved) { + return ImmutableMap.of(); + } + return ImmutableMap.of( + topicPartition.topic(), + ImmutableList.of( + new PartitionInfo( + topicPartition.topic(), topicPartition.partition(), null, null, null))); + } + + @Override + public synchronized void assign(Collection partitions) { + assertTrue(Iterables.getOnlyElement(partitions).equals(this.topicPartition)); + } + + @Override + public synchronized void seek(TopicPartition partition, long offset) { + assertTrue(partition.equals(this.topicPartition)); + this.startOffset = offset; + } + + @Override + public synchronized ConsumerRecords poll(long timeout) { + if (topicPartition == null) { + return ConsumerRecords.empty(); + } + String key = "key"; + String value = "value"; + List> records = new ArrayList<>(); + for (long i = 0; i <= numOfRecordsPerPoll; i++) { + records.add( + new ConsumerRecord( + topicPartition.topic(), + topicPartition.partition(), + startOffset + i, + key.getBytes(Charsets.UTF_8), + value.getBytes(Charsets.UTF_8))); + } + if (records.isEmpty()) { + return ConsumerRecords.empty(); + } + return new ConsumerRecords(ImmutableMap.of(topicPartition, records)); + } + + @Override + public synchronized Map offsetsForTimes( + Map timestampsToSearch) { + assertTrue( + Iterables.getOnlyElement( + timestampsToSearch.keySet().stream().collect(Collectors.toList())) + .equals(this.topicPartition)); + return ImmutableMap.of( + topicPartition, + new OffsetAndTimestamp( + this.startOffsetForTime, Iterables.getOnlyElement(timestampsToSearch.values()))); + } + + @Override + public synchronized long position(TopicPartition partition) { + assertTrue(partition.equals(this.topicPartition)); + return this.currentPos; + } + } + + private static class MockOutputReceiver + implements OutputReceiver>> { + + private final List>> records = + new ArrayList<>(); + + @Override + public void output(KV> output) {} + + @Override + public void outputWithTimestamp( + KV> output, + @UnknownKeyFor @NonNull @Initialized Instant timestamp) { + records.add(output); + } + + public List>> getOutputs() { + return this.records; + } + } + + private List>> createExpectedRecords( + KafkaSourceDescriptor descriptor, + long startOffset, + int numRecords, + String key, + String value) { + List>> records = new ArrayList<>(); + for (int i = 0; i < numRecords; i++) { + records.add( + KV.of( + descriptor, + new KafkaRecord( + topicPartition.topic(), + topicPartition.partition(), + startOffset + i, + -1L, + KafkaTimestampType.NO_TIMESTAMP_TYPE, + new RecordHeaders(), + KV.of(key, value)))); + } + return records; + } + + @Before + public void setUp() throws Exception { + dofnInstance.setup(); + consumer.reset(); + } + + @Test + public void 
testInitialRestrictionWhenHasStartOffset() throws Exception { + long expectedStartOffset = 10L; + consumer.setStartOffsetForTime(15L); + consumer.setCurrentPos(5L); + OffsetRange result = + dofnInstance.initialRestriction( + KafkaSourceDescriptor.of( + topicPartition, expectedStartOffset, Instant.now(), ImmutableList.of())); + assertEquals(new OffsetRange(expectedStartOffset, Long.MAX_VALUE), result); + } + + @Test + public void testInitialRestrictionWhenHasStartTime() throws Exception { + long expectedStartOffset = 10L; + consumer.setStartOffsetForTime(expectedStartOffset); + consumer.setCurrentPos(5L); + OffsetRange result = + dofnInstance.initialRestriction( + KafkaSourceDescriptor.of(topicPartition, null, Instant.now(), ImmutableList.of())); + assertEquals(new OffsetRange(expectedStartOffset, Long.MAX_VALUE), result); + } + + @Test + public void testInitialRestrictionWithConsumerPosition() throws Exception { + long expectedStartOffset = 5L; + consumer.setCurrentPos(5L); + OffsetRange result = + dofnInstance.initialRestriction( + KafkaSourceDescriptor.of(topicPartition, null, null, ImmutableList.of())); + assertEquals(new OffsetRange(expectedStartOffset, Long.MAX_VALUE), result); + } + + @Test + public void testProcessElement() throws Exception { + MockOutputReceiver receiver = new MockOutputReceiver(); + consumer.setNumOfRecordsPerPoll(3L); + long startOffset = 5L; + OffsetRangeTracker tracker = + new OffsetRangeTracker(new OffsetRange(startOffset, startOffset + 3)); + KafkaSourceDescriptor descriptor = KafkaSourceDescriptor.of(topicPartition, null, null, null); + ProcessContinuation result = + dofnInstance.processElement(descriptor, tracker, null, (OutputReceiver) receiver); + assertEquals(ProcessContinuation.stop(), result); + assertEquals( + createExpectedRecords(descriptor, startOffset, 3, "key", "value"), receiver.getOutputs()); + } + + @Test + public void testProcessElementWithEmptyPoll() throws Exception { + MockOutputReceiver receiver = new MockOutputReceiver(); + consumer.setNumOfRecordsPerPoll(-1); + OffsetRangeTracker tracker = new OffsetRangeTracker(new OffsetRange(0L, Long.MAX_VALUE)); + ProcessContinuation result = + dofnInstance.processElement( + KafkaSourceDescriptor.of(topicPartition, null, null, null), + tracker, + null, + (OutputReceiver) receiver); + assertEquals(ProcessContinuation.resume(), result); + assertTrue(receiver.getOutputs().isEmpty()); + } + + @Test + public void testProcessElementWhenTopicPartitionIsRemoved() throws Exception { + MockOutputReceiver receiver = new MockOutputReceiver(); + consumer.setRemoved(); + consumer.setNumOfRecordsPerPoll(10); + OffsetRangeTracker tracker = new OffsetRangeTracker(new OffsetRange(0L, Long.MAX_VALUE)); + ProcessContinuation result = + dofnInstance.processElement( + KafkaSourceDescriptor.of(topicPartition, null, null, null), + tracker, + null, + (OutputReceiver) receiver); + assertEquals(ProcessContinuation.stop(), result); + } + + @Test + public void testProcessElementWhenTopicPartitionIsStopped() throws Exception { + MockOutputReceiver receiver = new MockOutputReceiver(); + ReadFromKafkaDoFn instance = + new ReadFromKafkaDoFn( + makeReadSourceDescriptor(consumer) + .toBuilder() + .setCheckStopReadingFn( + new SerializableFunction() { + @Override + public Boolean apply(TopicPartition input) { + assertTrue(input.equals(topicPartition)); + return true; + } + }) + .build()); + consumer.setNumOfRecordsPerPoll(10); + OffsetRangeTracker tracker = new OffsetRangeTracker(new OffsetRange(0L, Long.MAX_VALUE)); + 
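// checkStopReadingFn returns true for this TopicPartition, so processElement is expected to stop rather than continue polling. +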
ProcessContinuation result = + instance.processElement( + KafkaSourceDescriptor.of(topicPartition, null, null, null), + tracker, + null, + (OutputReceiver) receiver); + assertEquals(ProcessContinuation.stop(), result); + } +} diff --git a/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/TopicPartitionCoderTest.java b/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/TopicPartitionCoderTest.java new file mode 100644 index 000000000000..55cd957b0fe8 --- /dev/null +++ b/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/TopicPartitionCoderTest.java @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.kafka; + +import static org.junit.Assert.assertEquals; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import org.apache.kafka.common.TopicPartition; +import org.junit.Test; + +@SuppressWarnings({"nullness"}) +public class TopicPartitionCoderTest { + + @Test + public void testEncodeDecodeRoundTrip() throws Exception { + TopicPartitionCoder coder = new TopicPartitionCoder(); + TopicPartition topicPartition = new TopicPartition("topic", 1); + ByteArrayOutputStream outputStream = new ByteArrayOutputStream(); + coder.encode(topicPartition, outputStream); + assertEquals( + topicPartition, coder.decode(new ByteArrayInputStream(outputStream.toByteArray()))); + } + + @Test + public void testToString() throws Exception { + TopicPartitionCoder coder = new TopicPartitionCoder(); + assertEquals("TopicPartitionCoder(StringUtf8Coder,VarIntCoder)", coder.toString()); + } +} diff --git a/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/WatchKafkaTopicPartitionDoFnTest.java b/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/WatchKafkaTopicPartitionDoFnTest.java new file mode 100644 index 000000000000..1fcc5ded235c --- /dev/null +++ b/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/WatchKafkaTopicPartitionDoFnTest.java @@ -0,0 +1,445 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.kafka; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertTrue; +import static org.mockito.Mockito.never; +import static org.mockito.Mockito.times; +import static org.mockito.Mockito.verify; +import static org.mockito.Mockito.when; +import static org.powermock.api.mockito.PowerMockito.mockStatic; + +import java.util.ArrayList; +import java.util.HashSet; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.stream.Collectors; +import org.apache.beam.sdk.state.BagState; +import org.apache.beam.sdk.state.ReadableState; +import org.apache.beam.sdk.state.Timer; +import org.apache.beam.sdk.transforms.DoFn.OutputReceiver; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; +import org.apache.kafka.clients.consumer.Consumer; +import org.apache.kafka.common.PartitionInfo; +import org.apache.kafka.common.TopicPartition; +import org.joda.time.Duration; +import org.joda.time.Instant; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.mockito.Mock; +import org.powermock.core.classloader.annotations.PrepareForTest; +import org.powermock.modules.junit4.PowerMockRunner; + +@RunWith(PowerMockRunner.class) +@PrepareForTest({Instant.class}) +@SuppressWarnings({"nullness"}) +public class WatchKafkaTopicPartitionDoFnTest { + + @Mock Consumer mockConsumer; + @Mock Timer timer; + + private final SerializableFunction, Consumer> consumerFn = + new SerializableFunction, Consumer>() { + @Override + public Consumer apply(Map input) { + return mockConsumer; + } + }; + + @Test + public void testGetAllTopicPartitions() throws Exception { + when(mockConsumer.listTopics()) + .thenReturn( + ImmutableMap.of( + "topic1", + ImmutableList.of( + new PartitionInfo("topic1", 0, null, null, null), + new PartitionInfo("topic1", 1, null, null, null)), + "topic2", + ImmutableList.of( + new PartitionInfo("topic2", 0, null, null, null), + new PartitionInfo("topic2", 1, null, null, null)))); + WatchKafkaTopicPartitionDoFn dofnInstance = + new WatchKafkaTopicPartitionDoFn( + Duration.millis(1L), consumerFn, null, ImmutableMap.of(), null, null); + assertEquals( + ImmutableSet.of( + new TopicPartition("topic1", 0), + new TopicPartition("topic1", 1), + new TopicPartition("topic2", 0), + new TopicPartition("topic2", 1)), + dofnInstance.getAllTopicPartitions()); + } + + @Test + public void testGetAllTopicPartitionsWithGivenTopics() throws Exception { + List givenTopics = ImmutableList.of("topic1", "topic2"); + when(mockConsumer.partitionsFor("topic1")) + .thenReturn( + ImmutableList.of( + new PartitionInfo("topic1", 0, null, null, null), + new PartitionInfo("topic1", 1, null, null, null))); + when(mockConsumer.partitionsFor("topic2")) + .thenReturn( + ImmutableList.of( + new PartitionInfo("topic2", 0, null, null, null), + new PartitionInfo("topic2", 1, null, null, null))); + WatchKafkaTopicPartitionDoFn dofnInstance = + new WatchKafkaTopicPartitionDoFn( + Duration.millis(1L), consumerFn, null, ImmutableMap.of(), null, givenTopics); + verify(mockConsumer, never()).listTopics(); + assertEquals( + ImmutableSet.of( + new TopicPartition("topic1", 0), + 
new TopicPartition("topic1", 1), + new TopicPartition("topic2", 0), + new TopicPartition("topic2", 1)), + dofnInstance.getAllTopicPartitions()); + } + + @Test + public void testProcessElementWhenNoAvailableTopicPartition() throws Exception { + WatchKafkaTopicPartitionDoFn dofnInstance = + new WatchKafkaTopicPartitionDoFn( + Duration.millis(600L), consumerFn, null, ImmutableMap.of(), null, null); + MockOutputReceiver outputReceiver = new MockOutputReceiver(); + + when(mockConsumer.listTopics()).thenReturn(ImmutableMap.of()); + MockBagState bagState = new MockBagState(ImmutableList.of()); + + when(timer.offset(Duration.millis(600L))).thenReturn(timer); + dofnInstance.processElement(timer, bagState, outputReceiver); + verify(timer, times(1)).setRelative(); + assertTrue(outputReceiver.getOutputs().isEmpty()); + assertTrue(bagState.getCurrentStates().isEmpty()); + } + + @Test + public void testProcessElementWithAvailableTopicPartitions() throws Exception { + Instant startReadTime = Instant.ofEpochMilli(1L); + WatchKafkaTopicPartitionDoFn dofnInstance = + new WatchKafkaTopicPartitionDoFn( + Duration.millis(600L), consumerFn, null, ImmutableMap.of(), startReadTime, null); + MockOutputReceiver outputReceiver = new MockOutputReceiver(); + + when(mockConsumer.listTopics()) + .thenReturn( + ImmutableMap.of( + "topic1", + ImmutableList.of( + new PartitionInfo("topic1", 0, null, null, null), + new PartitionInfo("topic1", 1, null, null, null)), + "topic2", + ImmutableList.of( + new PartitionInfo("topic2", 0, null, null, null), + new PartitionInfo("topic2", 1, null, null, null)))); + MockBagState bagState = new MockBagState(ImmutableList.of()); + + when(timer.offset(Duration.millis(600L))).thenReturn(timer); + dofnInstance.processElement(timer, bagState, outputReceiver); + + verify(timer, times(1)).setRelative(); + Set expectedOutputTopicPartitions = + ImmutableSet.of( + new TopicPartition("topic1", 0), + new TopicPartition("topic1", 1), + new TopicPartition("topic2", 0), + new TopicPartition("topic2", 1)); + Set expectedOutputDescriptor = + generateDescriptorsFromTopicPartitions(expectedOutputTopicPartitions, startReadTime); + assertEquals(expectedOutputDescriptor, new HashSet<>(outputReceiver.getOutputs())); + assertEquals(expectedOutputTopicPartitions, bagState.getCurrentStates()); + } + + @Test + public void testProcessElementWithStoppingReadingTopicPartition() throws Exception { + Instant startReadTime = Instant.ofEpochMilli(1L); + SerializableFunction checkStopReadingFn = + new SerializableFunction() { + @Override + public Boolean apply(TopicPartition input) { + if (input.equals(new TopicPartition("topic1", 1))) { + return true; + } + return false; + } + }; + WatchKafkaTopicPartitionDoFn dofnInstance = + new WatchKafkaTopicPartitionDoFn( + Duration.millis(600L), + consumerFn, + checkStopReadingFn, + ImmutableMap.of(), + startReadTime, + null); + MockOutputReceiver outputReceiver = new MockOutputReceiver(); + + when(mockConsumer.listTopics()) + .thenReturn( + ImmutableMap.of( + "topic1", + ImmutableList.of( + new PartitionInfo("topic1", 0, null, null, null), + new PartitionInfo("topic1", 1, null, null, null)), + "topic2", + ImmutableList.of( + new PartitionInfo("topic2", 0, null, null, null), + new PartitionInfo("topic2", 1, null, null, null)))); + MockBagState bagState = new MockBagState(ImmutableList.of()); + + when(timer.offset(Duration.millis(600L))).thenReturn(timer); + dofnInstance.processElement(timer, bagState, outputReceiver); + verify(timer, times(1)).setRelative(); + + Set 
expectedOutputTopicPartitions = + ImmutableSet.of( + new TopicPartition("topic1", 0), + new TopicPartition("topic2", 0), + new TopicPartition("topic2", 1)); + Set expectedOutputDescriptor = + generateDescriptorsFromTopicPartitions(expectedOutputTopicPartitions, startReadTime); + assertEquals(expectedOutputDescriptor, new HashSet<>(outputReceiver.getOutputs())); + assertEquals(expectedOutputTopicPartitions, bagState.getCurrentStates()); + } + + @Test + public void testOnTimerWithNoAvailableTopicPartition() throws Exception { + WatchKafkaTopicPartitionDoFn dofnInstance = + new WatchKafkaTopicPartitionDoFn( + Duration.millis(600L), consumerFn, null, ImmutableMap.of(), null, null); + MockOutputReceiver outputReceiver = new MockOutputReceiver(); + + when(mockConsumer.listTopics()).thenReturn(ImmutableMap.of()); + MockBagState bagState = new MockBagState(ImmutableList.of(new TopicPartition("topic1", 0))); + Instant now = Instant.EPOCH; + mockStatic(Instant.class); + when(Instant.now()).thenReturn(now); + + dofnInstance.onTimer(timer, bagState, outputReceiver); + + verify(timer, times(1)).set(now.plus(600L)); + assertTrue(outputReceiver.getOutputs().isEmpty()); + assertTrue(bagState.getCurrentStates().isEmpty()); + } + + @Test + public void testOnTimerWithAdditionOnly() throws Exception { + Instant startReadTime = Instant.ofEpochMilli(1L); + WatchKafkaTopicPartitionDoFn dofnInstance = + new WatchKafkaTopicPartitionDoFn( + Duration.millis(600L), consumerFn, null, ImmutableMap.of(), startReadTime, null); + MockOutputReceiver outputReceiver = new MockOutputReceiver(); + + when(mockConsumer.listTopics()) + .thenReturn( + ImmutableMap.of( + "topic1", + ImmutableList.of( + new PartitionInfo("topic1", 0, null, null, null), + new PartitionInfo("topic1", 1, null, null, null)), + "topic2", + ImmutableList.of( + new PartitionInfo("topic2", 0, null, null, null), + new PartitionInfo("topic2", 1, null, null, null)))); + MockBagState bagState = + new MockBagState( + ImmutableList.of(new TopicPartition("topic1", 0), new TopicPartition("topic1", 1))); + Instant now = Instant.EPOCH; + mockStatic(Instant.class); + when(Instant.now()).thenReturn(now); + + dofnInstance.onTimer(timer, bagState, outputReceiver); + + verify(timer, times(1)).set(now.plus(600L)); + Set expectedOutputTopicPartitions = + ImmutableSet.of(new TopicPartition("topic2", 0), new TopicPartition("topic2", 1)); + Set expectedCurrentTopicPartitions = + ImmutableSet.of( + new TopicPartition("topic1", 0), + new TopicPartition("topic1", 1), + new TopicPartition("topic2", 0), + new TopicPartition("topic2", 1)); + Set expectedOutputDescriptor = + generateDescriptorsFromTopicPartitions(expectedOutputTopicPartitions, startReadTime); + assertEquals(expectedOutputDescriptor, new HashSet<>(outputReceiver.getOutputs())); + assertEquals(expectedCurrentTopicPartitions, bagState.getCurrentStates()); + } + + @Test + public void testOnTimerWithRemovalOnly() throws Exception { + Instant startReadTime = Instant.ofEpochMilli(1L); + WatchKafkaTopicPartitionDoFn dofnInstance = + new WatchKafkaTopicPartitionDoFn( + Duration.millis(600L), consumerFn, null, ImmutableMap.of(), startReadTime, null); + MockOutputReceiver outputReceiver = new MockOutputReceiver(); + + when(mockConsumer.listTopics()) + .thenReturn( + ImmutableMap.of( + "topic1", + ImmutableList.of(new PartitionInfo("topic1", 0, null, null, null)), + "topic2", + ImmutableList.of( + new PartitionInfo("topic2", 0, null, null, null), + new PartitionInfo("topic2", 1, null, null, null)))); + MockBagState bagState = + 
new MockBagState( + ImmutableList.of( + new TopicPartition("topic1", 0), + new TopicPartition("topic1", 1), + new TopicPartition("topic2", 0), + new TopicPartition("topic2", 1))); + Instant now = Instant.EPOCH; + mockStatic(Instant.class); + when(Instant.now()).thenReturn(now); + + dofnInstance.onTimer(timer, bagState, outputReceiver); + + verify(timer, times(1)).set(now.plus(600L)); + Set expectedCurrentTopicPartitions = + ImmutableSet.of( + new TopicPartition("topic1", 0), + new TopicPartition("topic2", 0), + new TopicPartition("topic2", 1)); + assertTrue(outputReceiver.getOutputs().isEmpty()); + assertEquals(expectedCurrentTopicPartitions, bagState.getCurrentStates()); + } + + @Test + public void testOnTimerWithStoppedTopicPartitions() throws Exception { + Instant startReadTime = Instant.ofEpochMilli(1L); + SerializableFunction checkStopReadingFn = + new SerializableFunction() { + @Override + public Boolean apply(TopicPartition input) { + if (input.equals(new TopicPartition("topic1", 1))) { + return true; + } + return false; + } + }; + WatchKafkaTopicPartitionDoFn dofnInstance = + new WatchKafkaTopicPartitionDoFn( + Duration.millis(600L), + consumerFn, + checkStopReadingFn, + ImmutableMap.of(), + startReadTime, + null); + MockOutputReceiver outputReceiver = new MockOutputReceiver(); + + when(mockConsumer.listTopics()) + .thenReturn( + ImmutableMap.of( + "topic1", + ImmutableList.of( + new PartitionInfo("topic1", 0, null, null, null), + new PartitionInfo("topic1", 1, null, null, null)), + "topic2", + ImmutableList.of( + new PartitionInfo("topic2", 0, null, null, null), + new PartitionInfo("topic2", 1, null, null, null)))); + MockBagState bagState = + new MockBagState( + ImmutableList.of( + new TopicPartition("topic1", 0), + new TopicPartition("topic2", 0), + new TopicPartition("topic2", 1))); + Instant now = Instant.EPOCH; + mockStatic(Instant.class); + when(Instant.now()).thenReturn(now); + + dofnInstance.onTimer(timer, bagState, outputReceiver); + + Set expectedCurrentTopicPartitions = + ImmutableSet.of( + new TopicPartition("topic1", 0), + new TopicPartition("topic2", 0), + new TopicPartition("topic2", 1)); + + verify(timer, times(1)).set(now.plus(600L)); + assertTrue(outputReceiver.getOutputs().isEmpty()); + assertEquals(expectedCurrentTopicPartitions, bagState.getCurrentStates()); + } + + private static class MockOutputReceiver implements OutputReceiver { + + private List outputs = new ArrayList<>(); + + @Override + public void output(KafkaSourceDescriptor output) { + outputs.add(output); + } + + @Override + public void outputWithTimestamp(KafkaSourceDescriptor output, Instant timestamp) {} + + public List getOutputs() { + return outputs; + } + } + + private static class MockBagState implements BagState { + private Set topicPartitions = new HashSet<>(); + + MockBagState(List readReturn) { + topicPartitions.addAll(readReturn); + } + + @Override + public Iterable read() { + return topicPartitions; + } + + @Override + public void add(TopicPartition value) { + topicPartitions.add(value); + } + + @Override + public ReadableState isEmpty() { + return null; + } + + @Override + public BagState readLater() { + return null; + } + + @Override + public void clear() { + topicPartitions.clear(); + } + + public Set getCurrentStates() { + return topicPartitions; + } + } + + private Set generateDescriptorsFromTopicPartitions( + Set topicPartitions, Instant startReadTime) { + return topicPartitions.stream() + .map(topicPartition -> KafkaSourceDescriptor.of(topicPartition, null, startReadTime, null)) 
+ .collect(Collectors.toSet()); + } +} diff --git a/sdks/java/io/kinesis/build.gradle b/sdks/java/io/kinesis/build.gradle index bb513d875c6f..ad7cc1c90559 100644 --- a/sdks/java/io/kinesis/build.gradle +++ b/sdks/java/io/kinesis/build.gradle @@ -31,22 +31,21 @@ test { dependencies { compile project(path: ":sdks:java:core", configuration: "shadow") - compile library.java.slf4j_api - compile library.java.joda_time - compile library.java.jackson_dataformat_cbor - compile library.java.guava compile library.java.aws_java_sdk_cloudwatch compile library.java.aws_java_sdk_core compile library.java.aws_java_sdk_kinesis - compile "com.amazonaws:amazon-kinesis-client:1.13.0" + compile library.java.commons_lang3 + compile library.java.guava + compile library.java.joda_time + compile library.java.slf4j_api + compile "com.amazonaws:amazon-kinesis-client:1.14.2" compile "com.amazonaws:amazon-kinesis-producer:0.14.1" compile "commons-lang:commons-lang:2.6" + compile library.java.vendored_guava_26_0_jre testCompile project(path: ":sdks:java:io:common", configuration: "testRuntime") testCompile library.java.junit testCompile library.java.mockito_core testCompile library.java.guava_testlib - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library testCompile library.java.powermock testCompile library.java.powermock_mockito testCompile library.java.testcontainers_localstack diff --git a/sdks/java/io/kinesis/expansion-service/build.gradle b/sdks/java/io/kinesis/expansion-service/build.gradle index ef3db42f0e2d..c186e6353523 100644 --- a/sdks/java/io/kinesis/expansion-service/build.gradle +++ b/sdks/java/io/kinesis/expansion-service/build.gradle @@ -21,7 +21,6 @@ apply plugin: 'application' mainClassName = "org.apache.beam.sdk.expansion.service.ExpansionService" applyJavaNature( - automaticModuleName: 'org.apache.beam.sdk.io.kinesis.expansion.service', exportJavadoc: false, validateShadowJar: false, @@ -33,6 +32,8 @@ ext.summary = "Expansion service serving KinesisIO" dependencies { compile project(":sdks:java:expansion-service") + permitUnusedDeclared project(":sdks:java:expansion-service") // BEAM-11761 compile project(":sdks:java:io:kinesis") + permitUnusedDeclared project(":sdks:java:io:kinesis") // BEAM-11761 runtime library.java.slf4j_jdk14 } diff --git a/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/DynamicCheckpointGenerator.java b/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/DynamicCheckpointGenerator.java index 06a96698d1ae..8ef1274947b5 100644 --- a/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/DynamicCheckpointGenerator.java +++ b/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/DynamicCheckpointGenerator.java @@ -17,10 +17,8 @@ */ package org.apache.beam.sdk.io.kinesis; -import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; - import com.amazonaws.services.kinesis.model.Shard; -import java.util.Set; +import java.util.List; import java.util.stream.Collectors; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -34,35 +32,22 @@ class DynamicCheckpointGenerator implements CheckpointGenerator { private static final Logger LOG = LoggerFactory.getLogger(DynamicCheckpointGenerator.class); private final String streamName; private final StartingPoint startingPoint; - private final StartingPointShardsFinder startingPointShardsFinder; public DynamicCheckpointGenerator(String streamName, StartingPoint startingPoint) { this.streamName = streamName; 
this.startingPoint = startingPoint; - this.startingPointShardsFinder = new StartingPointShardsFinder(); - } - - public DynamicCheckpointGenerator( - String streamName, - StartingPoint startingPoint, - StartingPointShardsFinder startingPointShardsFinder) { - this.streamName = checkNotNull(streamName, "streamName"); - this.startingPoint = checkNotNull(startingPoint, "startingPoint"); - this.startingPointShardsFinder = - checkNotNull(startingPointShardsFinder, "startingPointShardsFinder"); } @Override public KinesisReaderCheckpoint generate(SimplifiedKinesisClient kinesis) throws TransientKinesisException { - Set shardsAtStartingPoint = - startingPointShardsFinder.findShardsAtStartingPoint(kinesis, streamName, startingPoint); + List streamShards = kinesis.listShardsAtPoint(streamName, startingPoint); LOG.info( "Creating a checkpoint with following shards {} at {}", - shardsAtStartingPoint, + streamShards, startingPoint.getTimestamp()); return new KinesisReaderCheckpoint( - shardsAtStartingPoint.stream() + streamShards.stream() .map(shard -> new ShardCheckpoint(streamName, shard.getShardId(), startingPoint)) .collect(Collectors.toList())); } diff --git a/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisReaderCheckpoint.java b/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisReaderCheckpoint.java index 5829e353ef95..7faa2516fe97 100644 --- a/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisReaderCheckpoint.java +++ b/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisReaderCheckpoint.java @@ -61,6 +61,11 @@ private int divideAndRoundUp(int nominator, int denominator) { return (nominator + denominator - 1) / denominator; } + String getStreamName() { + Iterator iterator = iterator(); + return iterator.hasNext() ? iterator.next().getStreamName() : "[unknown]"; + } + @Override public void finalizeCheckpoint() throws IOException {} diff --git a/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/ShardReadersPool.java b/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/ShardReadersPool.java index 79c9478b4320..bf480e44eb93 100644 --- a/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/ShardReadersPool.java +++ b/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/ShardReadersPool.java @@ -19,6 +19,7 @@ import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import java.util.Collection; import java.util.Comparator; import java.util.List; import java.util.concurrent.ArrayBlockingQueue; @@ -33,6 +34,7 @@ import java.util.concurrent.atomic.AtomicReference; import java.util.function.Function; import java.util.stream.Collectors; +import java.util.stream.StreamSupport; import org.apache.beam.sdk.transforms.windowing.BoundedWindow; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; @@ -109,7 +111,8 @@ void start() throws TransientKinesisException { if (!shardIteratorsMap.get().isEmpty()) { recordsQueue = new ArrayBlockingQueue<>(queueCapacityPerShard * shardIteratorsMap.get().size()); - startReadingShards(shardIteratorsMap.get().values()); + String streamName = initialCheckpoint.getStreamName(); + startReadingShards(shardIteratorsMap.get().values(), streamName); } else { // There are no shards to handle when restoring from an empty checkpoint. 
Empty checkpoints // are generated when the last shard handled by this pool was closed @@ -119,7 +122,15 @@ void start() throws TransientKinesisException { // Note: readLoop() will log any Throwable raised so opt to ignore the future result @SuppressWarnings("FutureReturnValueIgnored") - void startReadingShards(Iterable shardRecordsIterators) { + void startReadingShards(Iterable shardRecordsIterators, String streamName) { + if (!shardRecordsIterators.iterator().hasNext()) { + LOG.info("Stream {} will not be read, no shard records iterators available", streamName); + return; + } + LOG.info( + "Starting to read {} stream from {} shards", + streamName, + getShardIdsFromRecordsIterators(shardRecordsIterators)); for (final ShardRecordsIterator recordsIterator : shardRecordsIterators) { numberOfRecordsInAQueueByShard.put(recordsIterator.getShardId(), new AtomicInteger()); executorService.submit( @@ -318,7 +329,36 @@ private void readFromSuccessiveShards(final ShardRecordsIterator closedShardIter current, closedShardIterator, successiveShardRecordIterators); } while (!shardIteratorsMap.compareAndSet(current, updated)); numberOfRecordsInAQueueByShard.remove(closedShardIterator.getShardId()); - startReadingShards(successiveShardRecordIterators); + + logSuccessiveShardsFromRecordsIterators(closedShardIterator, successiveShardRecordIterators); + + String streamName = closedShardIterator.getStreamName(); + startReadingShards(successiveShardRecordIterators, streamName); + } + + private static void logSuccessiveShardsFromRecordsIterators( + final ShardRecordsIterator closedShardIterator, + final Collection shardRecordsIterators) { + if (shardRecordsIterators.isEmpty()) { + LOG.info( + "Shard {} for {} stream is closed. Found no successive shards to read from " + + "as it was merged with another shard and this one is considered adjacent by merge operation", + closedShardIterator.getShardId(), + closedShardIterator.getStreamName()); + } else { + LOG.info( + "Shard {} for {} stream is closed, found successive shards to read from: {}", + closedShardIterator.getShardId(), + closedShardIterator.getStreamName(), + getShardIdsFromRecordsIterators(shardRecordsIterators)); + } + } + + private static List getShardIdsFromRecordsIterators( + final Iterable iterators) { + return StreamSupport.stream(iterators.spliterator(), false) + .map(ShardRecordsIterator::getShardId) + .collect(Collectors.toList()); } private ImmutableMap createMapWithSuccessiveShards( diff --git a/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/ShardRecordsIterator.java b/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/ShardRecordsIterator.java index 7c08c0df32b5..c80fe57ecdb3 100644 --- a/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/ShardRecordsIterator.java +++ b/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/ShardRecordsIterator.java @@ -130,8 +130,12 @@ String getShardId() { return shardId; } + String getStreamName() { + return streamName; + } + List findSuccessiveShardRecordIterators() throws TransientKinesisException { - List shards = kinesis.listShards(streamName); + List shards = kinesis.listShardsFollowingClosedShard(streamName, shardId); List successiveShardRecordIterators = new ArrayList<>(); for (Shard shard : shards) { if (shardId.equals(shard.getParentShardId())) { diff --git a/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/SimplifiedKinesisClient.java 
b/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/SimplifiedKinesisClient.java index 3253d90beee4..a152b64ecdbd 100644 --- a/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/SimplifiedKinesisClient.java +++ b/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/SimplifiedKinesisClient.java @@ -27,25 +27,33 @@ import com.amazonaws.services.cloudwatch.model.GetMetricStatisticsRequest; import com.amazonaws.services.cloudwatch.model.GetMetricStatisticsResult; import com.amazonaws.services.kinesis.AmazonKinesis; +import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream; import com.amazonaws.services.kinesis.clientlibrary.types.UserRecord; +import com.amazonaws.services.kinesis.model.DescribeStreamSummaryRequest; import com.amazonaws.services.kinesis.model.ExpiredIteratorException; import com.amazonaws.services.kinesis.model.GetRecordsRequest; import com.amazonaws.services.kinesis.model.GetRecordsResult; import com.amazonaws.services.kinesis.model.GetShardIteratorRequest; import com.amazonaws.services.kinesis.model.LimitExceededException; +import com.amazonaws.services.kinesis.model.ListShardsRequest; +import com.amazonaws.services.kinesis.model.ListShardsResult; import com.amazonaws.services.kinesis.model.ProvisionedThroughputExceededException; import com.amazonaws.services.kinesis.model.Shard; +import com.amazonaws.services.kinesis.model.ShardFilter; +import com.amazonaws.services.kinesis.model.ShardFilterType; import com.amazonaws.services.kinesis.model.ShardIteratorType; -import com.amazonaws.services.kinesis.model.StreamDescription; +import com.amazonaws.services.kinesis.model.StreamDescriptionSummary; +import java.io.IOException; import java.util.Collections; import java.util.Date; import java.util.List; import java.util.concurrent.Callable; +import java.util.function.Supplier; import org.apache.beam.sdk.util.BackOff; import org.apache.beam.sdk.util.BackOffUtils; import org.apache.beam.sdk.util.FluentBackoff; import org.apache.beam.sdk.util.Sleeper; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.joda.time.Duration; import org.joda.time.Instant; import org.joda.time.Minutes; @@ -61,18 +69,33 @@ class SimplifiedKinesisClient { private static final int PERIOD_GRANULARITY_IN_SECONDS = 60; private static final String SUM_STATISTIC = "Sum"; private static final String STREAM_NAME_DIMENSION = "StreamName"; - private static final int LIST_SHARDS_DESCRIBE_STREAM_MAX_ATTEMPTS = 10; - private static final Duration LIST_SHARDS_DESCRIBE_STREAM_INITIAL_BACKOFF = + private static final int LIST_SHARDS_MAX_RESULTS = 1_000; + private static final Duration + SPACING_FOR_TIMESTAMP_LIST_SHARDS_REQUEST_TO_NOT_EXCEED_TRIM_HORIZON = + Duration.standardMinutes(5); + private static final int DESCRIBE_STREAM_SUMMARY_MAX_ATTEMPTS = 10; + private static final Duration DESCRIBE_STREAM_SUMMARY_INITIAL_BACKOFF = Duration.standardSeconds(1); + private final AmazonKinesis kinesis; private final AmazonCloudWatch cloudWatch; private final Integer limit; + private final Supplier currentInstantSupplier; public SimplifiedKinesisClient( AmazonKinesis kinesis, AmazonCloudWatch cloudWatch, Integer limit) { + this(kinesis, cloudWatch, limit, Instant::now); + } + + SimplifiedKinesisClient( + AmazonKinesis kinesis, + AmazonCloudWatch cloudWatch, + Integer limit, + Supplier currentInstantSupplier) { this.kinesis = 
checkNotNull(kinesis, "kinesis"); this.cloudWatch = checkNotNull(cloudWatch, "cloudWatch"); this.limit = limit; + this.currentInstantSupplier = currentInstantSupplier; } public static SimplifiedKinesisClient from(AWSClientsProvider provider, Integer limit) { @@ -101,43 +124,116 @@ public String getShardIterator( .getShardIterator()); } - public List listShards(final String streamName) throws TransientKinesisException { + public List listShardsAtPoint(final String streamName, final StartingPoint startingPoint) + throws TransientKinesisException { + ShardFilter shardFilter = + wrapExceptions(() -> buildShardFilterForStartingPoint(streamName, startingPoint)); + return listShards(streamName, shardFilter); + } + + private ShardFilter buildShardFilterForStartingPoint( + String streamName, StartingPoint startingPoint) throws IOException, InterruptedException { + InitialPositionInStream position = startingPoint.getPosition(); + switch (position) { + case LATEST: + return new ShardFilter().withType(ShardFilterType.AT_LATEST); + case TRIM_HORIZON: + return new ShardFilter().withType(ShardFilterType.AT_TRIM_HORIZON); + case AT_TIMESTAMP: + return buildShardFilterForTimestamp(streamName, startingPoint.getTimestamp()); + default: + throw new IllegalArgumentException( + String.format("Unrecognized '%s' position to create shard filter with", position)); + } + } + + private ShardFilter buildShardFilterForTimestamp( + String streamName, Instant startingPointTimestamp) throws IOException, InterruptedException { + StreamDescriptionSummary streamDescription = describeStreamSummary(streamName); + + Instant streamCreationTimestamp = new Instant(streamDescription.getStreamCreationTimestamp()); + if (streamCreationTimestamp.isAfter(startingPointTimestamp)) { + return new ShardFilter().withType(ShardFilterType.AT_TRIM_HORIZON); + } + + Duration retentionPeriod = Duration.standardHours(streamDescription.getRetentionPeriodHours()); + + Instant streamTrimHorizonTimestamp = + currentInstantSupplier + .get() + .minus(retentionPeriod) + .plus(SPACING_FOR_TIMESTAMP_LIST_SHARDS_REQUEST_TO_NOT_EXCEED_TRIM_HORIZON); + if (startingPointTimestamp.isAfter(streamTrimHorizonTimestamp)) { + return new ShardFilter() + .withType(ShardFilterType.AT_TIMESTAMP) + .withTimestamp(startingPointTimestamp.toDate()); + } else { + return new ShardFilter().withType(ShardFilterType.AT_TRIM_HORIZON); + } + } + + private StreamDescriptionSummary describeStreamSummary(final String streamName) + throws IOException, InterruptedException { + // DescribeStreamSummary has limits that can be hit fairly easily if we are attempting + // to configure multiple KinesisIO inputs in the same account. Retry up to + // DESCRIBE_STREAM_SUMMARY_MAX_ATTEMPTS times if we end up hitting that limit. + // + // Only pass the wrapped exception up once that limit is reached. Use FluentBackoff + // to implement the retry policy. 
+ FluentBackoff retryBackoff = + FluentBackoff.DEFAULT + .withMaxRetries(DESCRIBE_STREAM_SUMMARY_MAX_ATTEMPTS) + .withInitialBackoff(DESCRIBE_STREAM_SUMMARY_INITIAL_BACKOFF); + BackOff backoff = retryBackoff.backoff(); + Sleeper sleeper = Sleeper.DEFAULT; + + DescribeStreamSummaryRequest request = new DescribeStreamSummaryRequest(); + request.setStreamName(streamName); + while (true) { + try { + return kinesis.describeStreamSummary(request).getStreamDescriptionSummary(); + } catch (LimitExceededException exc) { + if (!BackOffUtils.next(sleeper, backoff)) { + throw exc; + } + } + } + } + + public List listShardsFollowingClosedShard( + final String streamName, final String exclusiveStartShardId) + throws TransientKinesisException { + ShardFilter shardFilter = + new ShardFilter() + .withType(ShardFilterType.AFTER_SHARD_ID) + .withShardId(exclusiveStartShardId); + return listShards(streamName, shardFilter); + } + + private List listShards(final String streamName, final ShardFilter shardFilter) + throws TransientKinesisException { return wrapExceptions( () -> { - List shards = Lists.newArrayList(); - String lastShardId = null; + ImmutableList.Builder shardsBuilder = ImmutableList.builder(); - // DescribeStream has limits that can be hit fairly easily if we are attempting - // to configure multiple KinesisIO inputs in the same account. Retry up to - // LIST_SHARDS_DESCRIBE_STREAM_MAX_ATTEMPTS times if we end up hitting that limit. - // - // Only pass the wrapped exception up once that limit is reached. Use FluentBackoff - // to implement the retry policy. - FluentBackoff retryBackoff = - FluentBackoff.DEFAULT - .withMaxRetries(LIST_SHARDS_DESCRIBE_STREAM_MAX_ATTEMPTS) - .withInitialBackoff(LIST_SHARDS_DESCRIBE_STREAM_INITIAL_BACKOFF); - StreamDescription description = null; + String currentNextToken = null; do { - BackOff backoff = retryBackoff.backoff(); - Sleeper sleeper = Sleeper.DEFAULT; - while (true) { - try { - description = - kinesis.describeStream(streamName, lastShardId).getStreamDescription(); - break; - } catch (LimitExceededException exc) { - if (!BackOffUtils.next(sleeper, backoff)) { - throw exc; - } - } + ListShardsRequest request = new ListShardsRequest(); + request.setMaxResults(LIST_SHARDS_MAX_RESULTS); + if (currentNextToken != null) { + request.setNextToken(currentNextToken); + } else { + request.setStreamName(streamName); } + request.setShardFilter(shardFilter); - shards.addAll(description.getShards()); - lastShardId = shards.get(shards.size() - 1).getShardId(); - } while (description.getHasMoreShards()); + ListShardsResult response = kinesis.listShards(request); + List shards = response.getShards(); + shardsBuilder.addAll(shards); + currentNextToken = response.getNextToken(); + } while (currentNextToken != null); - return shards; + return shardsBuilder.build(); }); } diff --git a/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/StartingPoint.java b/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/StartingPoint.java index 24a3a003eef4..946e1e38cde5 100644 --- a/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/StartingPoint.java +++ b/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/StartingPoint.java @@ -20,7 +20,6 @@ import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream; -import com.amazonaws.services.kinesis.model.ShardIteratorType; import java.io.Serializable; import 
java.util.Objects; import org.checkerframework.checker.nullness.qual.Nullable; @@ -46,7 +45,7 @@ public StartingPoint(InitialPositionInStream position) { public StartingPoint(Instant timestamp) { this.timestamp = checkNotNull(timestamp, "timestamp"); - this.position = null; + this.position = InitialPositionInStream.AT_TIMESTAMP; } public InitialPositionInStream getPosition() { @@ -54,11 +53,11 @@ public InitialPositionInStream getPosition() { } public String getPositionName() { - return position != null ? position.name() : ShardIteratorType.AT_TIMESTAMP.name(); + return position.name(); } public Instant getTimestamp() { - return timestamp != null ? timestamp : null; + return timestamp; } @Override diff --git a/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/StartingPointShardsFinder.java b/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/StartingPointShardsFinder.java deleted file mode 100644 index 95ab85a3b387..000000000000 --- a/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/StartingPointShardsFinder.java +++ /dev/null @@ -1,205 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package org.apache.beam.sdk.io.kinesis; - -import com.amazonaws.services.kinesis.model.Shard; -import com.amazonaws.services.kinesis.model.ShardIteratorType; -import java.io.Serializable; -import java.util.HashSet; -import java.util.List; -import java.util.Objects; -import java.util.Set; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Sets; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -/** - * This class is responsible for establishing the initial set of shards that existed at the given - * starting point. - */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) -class StartingPointShardsFinder implements Serializable { - - private static final Logger LOG = LoggerFactory.getLogger(StartingPointShardsFinder.class); - - /** - * Finds all the shards at the given startingPoint. This method starts by gathering the oldest - * shards in the stream and considers them as initial shards set. Then it validates the shards by - * getting an iterator at the given starting point and trying to read some records. If shard - * passes the validation then it is added to the result shards set. If not then it is regarded as - * expired and its successors are taken into consideration. This step is repeated until all valid - * shards are found. - * - *

-   * The following diagram depicts sample split and merge operations on a stream with 3 initial
-   * shards. Let's consider what happens when T1, T2, T3 or T4 timestamps are passed as the
-   * startingPoint.
-   *
-   *   • T1 timestamp (or TRIM_HORIZON marker) - 0000, 0001 and 0002 shards are the oldest so they
-   *     are gathered as initial shards set. All of them are valid at T1 timestamp so they are all
-   *     returned from the method.
-   *   • T2 timestamp - 0000, 0001 and 0002 shards form the initial shards set.
-   *       • 0000 passes the validation at T2 timestamp so it is added to the result set
-   *       • 0001 does not pass the validation as it is already closed at T2 timestamp so its
-   *         successors 0003 and 0004 are considered. Both are valid at T2 timestamp so they are
-   *         added to the resulting set.
-   *       • 0002 also does not pass the validation so its successors 0005 and 0006 are
-   *         considered and both are valid.
-   *     Finally the resulting set contains 0000, 0003, 0004, 0005 and 0006 shards.
-   *   • T3 timestamp - the beginning is the same as in T2 case.
-   *       • 0000 is valid
-   *       • 0001 is already closed at T2 timestamp so its successors 0003 and 0004 are next.
-   *         0003 is valid but 0004 is already closed at T3 timestamp. It has one successor 0007
-   *         which is the result of merging 0004 and 0005 shards. 0007 has two parent shards
-   *         then stored in {@link Shard#parentShardId} and {@link Shard#adjacentParentShardId}
-   *         fields. Only one of them should follow the relation to its successor so it is
-   *         always the shard stored in parentShardId field. Let's assume that it was 0004 shard
-   *         and it's the one that considers 0007 its successor. 0007 is valid at T3 timestamp
-   *         and it's added to the result set.
-   *       • 0002 is closed at T3 timestamp so its successors 0005 and 0006 are next. 0005 is
-   *         also closed because it was merged with 0004 shard. Their successor is 0007 and it
-   *         was already considered by 0004 shard so no action here is needed. Shard 0006 is
-   *         valid.
-   *   • T4 timestamp (or LATEST marker) - following the same reasoning as in previous cases it
-   *     end's up with 0000, 0003, 0008 and 0010 shards.
-   *
    -   *      T1                T2          T3                      T4
    -   *      |                 |           |                       |
    -   * 0000-----------------------------------------------------------
    -   *
    -   *
    -   *             0003-----------------------------------------------
    -   *            /
    -   * 0001------+
    -   *            \
    -   *             0004-----------+             0008------------------
    -   *                             \           /
    -   *                              0007------+
    -   *                             /           \
    -   *                  0005------+             0009------+
    -   *                 /                                   \
    -   * 0002-----------+                                     0010------
    -   *                 \                                   /
    -   *                  0006------------------------------+
    -   * 
    - */ - Set findShardsAtStartingPoint( - SimplifiedKinesisClient kinesis, String streamName, StartingPoint startingPoint) - throws TransientKinesisException { - List allShards = kinesis.listShards(streamName); - Set initialShards = findInitialShardsWithoutParents(streamName, allShards); - - Set startingPointShards = new HashSet<>(); - Set expiredShards; - do { - Set validShards = validateShards(kinesis, initialShards, streamName, startingPoint); - startingPointShards.addAll(validShards); - expiredShards = Sets.difference(initialShards, validShards); - if (!expiredShards.isEmpty()) { - LOG.info( - "Following shards expired for {} stream at '{}' starting point: {}", - streamName, - startingPoint, - expiredShards); - } - initialShards = findNextShards(allShards, expiredShards); - } while (!expiredShards.isEmpty()); - return startingPointShards; - } - - private Set findNextShards(List allShards, Set expiredShards) { - Set nextShards = new HashSet<>(); - for (Shard expiredShard : expiredShards) { - boolean successorFound = false; - for (Shard shard : allShards) { - if (Objects.equals(expiredShard.getShardId(), shard.getParentShardId())) { - nextShards.add(shard); - successorFound = true; - } else if (Objects.equals(expiredShard.getShardId(), shard.getAdjacentParentShardId())) { - successorFound = true; - } - } - if (!successorFound) { - // This can potentially happen during split/merge operation. Newly created shards might be - // not listed in the allShards list and their predecessor is already considered expired. - // Retrying should solve the issue. - throw new IllegalStateException("No successors were found for shard: " + expiredShard); - } - } - return nextShards; - } - - /** - * Finds the initial set of shards (the oldest ones). These shards do not have their parents in - * the shard list. - */ - private Set findInitialShardsWithoutParents(String streamName, List allShards) { - Set shardIds = new HashSet<>(); - for (Shard shard : allShards) { - shardIds.add(shard.getShardId()); - } - LOG.info("Stream {} has following shards: {}", streamName, shardIds); - Set shardsWithoutParents = new HashSet<>(); - for (Shard shard : allShards) { - if (!shardIds.contains(shard.getParentShardId())) { - shardsWithoutParents.add(shard); - } - } - return shardsWithoutParents; - } - - /** - * Validates the shards at the given startingPoint. Validity is checked by getting an iterator at - * the startingPoint and then trying to read some records. This action does not affect the records - * at all. If the shard is valid then it will get read from exactly the same point and these - * records will be read again. 
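For orientation only, not part of the patch: the lineage traversal documented above is superseded in this change by the Kinesis ListShards API with a ShardFilter, as added to SimplifiedKinesisClient. Below is a minimal, self-contained sketch of that filter-plus-pagination approach; the class name, the standalone client construction via AmazonKinesisClientBuilder, and the hard-coded stream name are illustrative assumptions, not code from this PR.

import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.ListShardsRequest;
import com.amazonaws.services.kinesis.model.ListShardsResult;
import com.amazonaws.services.kinesis.model.Shard;
import com.amazonaws.services.kinesis.model.ShardFilter;
import com.amazonaws.services.kinesis.model.ShardFilterType;
import java.util.ArrayList;
import java.util.List;

/** Sketch (hypothetical helper, not in the PR): list shards visible at TRIM_HORIZON. */
class ListShardsAtTrimHorizonSketch {

  static List<Shard> listShardsAtTrimHorizon(AmazonKinesis kinesis, String streamName) {
    // The filter lets Kinesis return only shards relevant to the starting point,
    // replacing the client-side parent/adjacent-parent traversal of the removed class.
    ShardFilter filter = new ShardFilter().withType(ShardFilterType.AT_TRIM_HORIZON);
    List<Shard> shards = new ArrayList<>();
    String nextToken = null;
    do {
      ListShardsRequest request =
          new ListShardsRequest().withMaxResults(1_000).withShardFilter(filter);
      if (nextToken != null) {
        // Subsequent pages are addressed by token; the stream name must not be repeated.
        request.setNextToken(nextToken);
      } else {
        request.setStreamName(streamName);
      }
      ListShardsResult result = kinesis.listShards(request);
      shards.addAll(result.getShards());
      nextToken = result.getNextToken();
    } while (nextToken != null);
    return shards;
  }

  public static void main(String[] args) {
    // Standalone client construction for illustration; KinesisIO obtains its client
    // through AWSClientsProvider instead.
    AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();
    System.out.println(listShardsAtTrimHorizon(kinesis, "my-stream"));
  }
}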
- */ - private Set validateShards( - SimplifiedKinesisClient kinesis, - Iterable rootShards, - String streamName, - StartingPoint startingPoint) - throws TransientKinesisException { - Set validShards = new HashSet<>(); - ShardIteratorType shardIteratorType = - ShardIteratorType.fromValue(startingPoint.getPositionName()); - for (Shard shard : rootShards) { - String shardIterator = - kinesis.getShardIterator( - streamName, - shard.getShardId(), - shardIteratorType, - null, - startingPoint.getTimestamp()); - GetKinesisRecordsResult records = - kinesis.getRecords(shardIterator, streamName, shard.getShardId()); - if (records.getNextShardIterator() != null || !records.getRecords().isEmpty()) { - validShards.add(shard); - } - } - return validShards; - } -} diff --git a/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/AmazonKinesisMock.java b/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/AmazonKinesisMock.java index 56723c700ddf..7179af4b97af 100644 --- a/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/AmazonKinesisMock.java +++ b/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/AmazonKinesisMock.java @@ -85,7 +85,6 @@ import com.amazonaws.services.kinesis.model.StartStreamEncryptionResult; import com.amazonaws.services.kinesis.model.StopStreamEncryptionRequest; import com.amazonaws.services.kinesis.model.StopStreamEncryptionResult; -import com.amazonaws.services.kinesis.model.StreamDescription; import com.amazonaws.services.kinesis.model.UpdateShardCountRequest; import com.amazonaws.services.kinesis.model.UpdateShardCountResult; import com.amazonaws.services.kinesis.producer.IKinesisProducer; @@ -94,9 +93,9 @@ import java.io.Serializable; import java.nio.ByteBuffer; import java.nio.charset.StandardCharsets; -import java.util.ArrayList; import java.util.List; import java.util.stream.Collectors; +import java.util.stream.IntStream; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Splitter; import org.apache.commons.lang.builder.EqualsBuilder; import org.checkerframework.checker.nullness.qual.Nullable; @@ -104,9 +103,6 @@ import org.mockito.Mockito; /** Mock implemenation of {@link AmazonKinesis} for testing. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) class AmazonKinesisMock implements AmazonKinesis { static class TestData implements Serializable { @@ -166,31 +162,31 @@ static class Provider implements AWSClientsProvider { private final List> shardedData; private final int numberOfRecordsPerGet; - private int rateLimitDescribeStream = 0; + private boolean expectedListShardsLimitExceededException; public Provider(List> shardedData, int numberOfRecordsPerGet) { this.shardedData = shardedData; this.numberOfRecordsPerGet = numberOfRecordsPerGet; } - /** - * Simulate an initially rate limited DescribeStream. - * - * @param rateLimitDescribeStream The number of rate limited requests before success - */ - public Provider withRateLimitedDescribeStream(int rateLimitDescribeStream) { - this.rateLimitDescribeStream = rateLimitDescribeStream; + /** Simulate limit exceeded exception for ListShards. 
*/ + public Provider withExpectedListShardsLimitExceededException() { + expectedListShardsLimitExceededException = true; return this; } @Override public AmazonKinesis getKinesisClient() { - return new AmazonKinesisMock( + AmazonKinesisMock client = + new AmazonKinesisMock( shardedData.stream() .map(testDatas -> transform(testDatas, TestData::convertToRecord)) .collect(Collectors.toList()), - numberOfRecordsPerGet) - .withRateLimitedDescribeStream(rateLimitDescribeStream); + numberOfRecordsPerGet); + if (expectedListShardsLimitExceededException) { + client = client.withExpectedListShardsLimitExceededException(); + } + return client; } @Override @@ -207,15 +203,15 @@ public IKinesisProducer createKinesisProducer(KinesisProducerConfiguration confi private final List> shardedData; private final int numberOfRecordsPerGet; - private int rateLimitDescribeStream = 0; + private boolean expectedListShardsLimitExceededException; public AmazonKinesisMock(List> shardedData, int numberOfRecordsPerGet) { this.shardedData = shardedData; this.numberOfRecordsPerGet = numberOfRecordsPerGet; } - public AmazonKinesisMock withRateLimitedDescribeStream(int rateLimitDescribeStream) { - this.rateLimitDescribeStream = rateLimitDescribeStream; + public AmazonKinesisMock withExpectedListShardsLimitExceededException() { + this.expectedListShardsLimitExceededException = true; return this; } @@ -252,30 +248,7 @@ public GetShardIteratorResult getShardIterator(GetShardIteratorRequest getShardI @Override public DescribeStreamResult describeStream(String streamName, String exclusiveStartShardId) { - if (rateLimitDescribeStream-- > 0) { - throw new LimitExceededException("DescribeStream rate limit exceeded"); - } - int nextShardId = 0; - if (exclusiveStartShardId != null) { - nextShardId = parseInt(exclusiveStartShardId) + 1; - } - boolean hasMoreShards = nextShardId + 1 < shardedData.size(); - - List shards = new ArrayList<>(); - if (nextShardId < shardedData.size()) { - shards.add(new Shard().withShardId(Integer.toString(nextShardId))); - } - - HttpResponse response = new HttpResponse(null, null); - response.setStatusCode(200); - DescribeStreamResult result = new DescribeStreamResult(); - result.setSdkHttpMetadata(SdkHttpMetadata.from(response)); - result.withStreamDescription( - new StreamDescription() - .withHasMoreShards(hasMoreShards) - .withShards(shards) - .withStreamName(streamName)); - return result; + throw new RuntimeException("Not implemented"); } @Override @@ -386,7 +359,23 @@ public IncreaseStreamRetentionPeriodResult increaseStreamRetentionPeriod( @Override public ListShardsResult listShards(ListShardsRequest listShardsRequest) { - throw new RuntimeException("Not implemented"); + if (expectedListShardsLimitExceededException) { + throw new LimitExceededException("ListShards rate limit exceeded"); + } + + ListShardsResult result = new ListShardsResult(); + + List shards = + IntStream.range(0, shardedData.size()) + .boxed() + .map(i -> new Shard().withShardId(Integer.toString(i))) + .collect(Collectors.toList()); + result.setShards(shards); + + HttpResponse response = new HttpResponse(null, null); + response.setStatusCode(200); + result.setSdkHttpMetadata(SdkHttpMetadata.from(response)); + return result; } @Override diff --git a/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/DynamicCheckpointGeneratorTest.java b/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/DynamicCheckpointGeneratorTest.java index 64e263e93e9d..437fd8610149 100644 --- 
a/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/DynamicCheckpointGeneratorTest.java +++ b/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/DynamicCheckpointGeneratorTest.java @@ -22,8 +22,8 @@ import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream; import com.amazonaws.services.kinesis.model.Shard; -import java.util.Set; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Sets; +import java.util.List; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; import org.junit.Test; import org.junit.runner.RunWith; import org.mockito.Mock; @@ -31,13 +31,9 @@ /** * */ @RunWith(MockitoJUnitRunner.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DynamicCheckpointGeneratorTest { @Mock private SimplifiedKinesisClient kinesisClient; - @Mock private StartingPointShardsFinder startingPointShardsFinder; @Mock private Shard shard1, shard2, shard3; @Test @@ -45,37 +41,15 @@ public void shouldMapAllShardsToCheckpoints() throws Exception { when(shard1.getShardId()).thenReturn("shard-01"); when(shard2.getShardId()).thenReturn("shard-02"); when(shard3.getShardId()).thenReturn("shard-03"); - Set shards = Sets.newHashSet(shard1, shard2, shard3); + List shards = ImmutableList.of(shard1, shard2, shard3); + String streamName = "stream"; StartingPoint startingPoint = new StartingPoint(InitialPositionInStream.LATEST); - when(startingPointShardsFinder.findShardsAtStartingPoint( - kinesisClient, "stream", startingPoint)) - .thenReturn(shards); + when(kinesisClient.listShardsAtPoint(streamName, startingPoint)).thenReturn(shards); DynamicCheckpointGenerator underTest = - new DynamicCheckpointGenerator("stream", startingPoint, startingPointShardsFinder); + new DynamicCheckpointGenerator(streamName, startingPoint); KinesisReaderCheckpoint checkpoint = underTest.generate(kinesisClient); assertThat(checkpoint).hasSize(3); } - - @Test - public void shouldMapAllValidShardsToCheckpoints() throws Exception { - when(shard1.getShardId()).thenReturn("shard-01"); - when(shard2.getShardId()).thenReturn("shard-02"); - when(shard3.getShardId()).thenReturn("shard-03"); - String streamName = "stream"; - Set shards = Sets.newHashSet(shard1, shard2); - StartingPoint startingPoint = new StartingPoint(InitialPositionInStream.LATEST); - when(startingPointShardsFinder.findShardsAtStartingPoint( - kinesisClient, "stream", startingPoint)) - .thenReturn(shards); - - DynamicCheckpointGenerator underTest = - new DynamicCheckpointGenerator(streamName, startingPoint, startingPointShardsFinder); - - KinesisReaderCheckpoint checkpoint = underTest.generate(kinesisClient); - assertThat(checkpoint) - .hasSize(2) - .doesNotContain(new ShardCheckpoint(streamName, shard3.getShardId(), startingPoint)); - } } diff --git a/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/KinesisIOIT.java b/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/KinesisIOIT.java index f369668cd7c8..b1f3180ae128 100644 --- a/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/KinesisIOIT.java +++ b/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/KinesisIOIT.java @@ -57,11 +57,8 @@ * when no options are provided an instance of localstack is used. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class KinesisIOIT implements Serializable { - private static final String LOCALSTACK_VERSION = "0.11.3"; + private static final String LOCALSTACK_VERSION = "0.11.4"; @Rule public TestPipeline pipelineWrite = TestPipeline.create(); @Rule public TestPipeline pipelineRead = TestPipeline.create(); diff --git a/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/KinesisMockReadTest.java b/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/KinesisMockReadTest.java index b8965a6cbfa4..818f26b55022 100644 --- a/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/KinesisMockReadTest.java +++ b/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/KinesisMockReadTest.java @@ -49,21 +49,12 @@ public void readsDataFromMockKinesis() { verifyReadWithProvider(new AmazonKinesisMock.Provider(testData, 10), testData); } - @Test - public void readsDataFromMockKinesisWithDescribeStreamRateLimit() { - List> testData = defaultTestData(); - verifyReadWithProvider( - new AmazonKinesisMock.Provider(testData, 10).withRateLimitedDescribeStream(2), testData); - } - @Test(expected = PipelineExecutionException.class) - public void readsDataFromMockKinesisWithDescribeStreamRateLimitFailure() { + public void readsDataFromMockKinesisWithLimitFailure() { List> testData = defaultTestData(); - // Verify with a provider that will generate more LimitExceededExceptions then we - // will retry. Should result in generation of a TransientKinesisException and subsequently - // a PipelineExecutionException. verifyReadWithProvider( - new AmazonKinesisMock.Provider(testData, 10).withRateLimitedDescribeStream(11), testData); + new AmazonKinesisMock.Provider(testData, 10).withExpectedListShardsLimitExceededException(), + testData); } public void verifyReadWithProvider( diff --git a/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/KinesisMockWriteTest.java b/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/KinesisMockWriteTest.java index 46968bde9177..4e189a050b88 100644 --- a/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/KinesisMockWriteTest.java +++ b/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/KinesisMockWriteTest.java @@ -18,15 +18,11 @@ package org.apache.beam.sdk.io.kinesis; import static org.junit.Assert.assertEquals; -import static org.mockito.Matchers.any; import static org.mockito.Mockito.mock; -import static org.mockito.Mockito.when; -import com.amazonaws.http.SdkHttpMetadata; import com.amazonaws.services.cloudwatch.AmazonCloudWatch; import com.amazonaws.services.kinesis.AmazonKinesis; import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream; -import com.amazonaws.services.kinesis.model.DescribeStreamResult; import com.amazonaws.services.kinesis.producer.IKinesisProducer; import com.amazonaws.services.kinesis.producer.KinesisProducerConfiguration; import java.nio.charset.StandardCharsets; @@ -48,9 +44,6 @@ /** Tests for {@link KinesisIO.Write}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class KinesisMockWriteTest { private static final String STREAM = "BEAM"; private static final String PARTITION_KEY = "partitionKey"; @@ -235,15 +228,10 @@ public String getExplicitHashKey(byte[] value) { } private static final class FakeKinesisProvider implements AWSClientsProvider { - private boolean isExistingStream = true; private boolean isFailedFlush = false; public FakeKinesisProvider() {} - public FakeKinesisProvider(boolean isExistingStream) { - this.isExistingStream = isExistingStream; - } - public FakeKinesisProvider setFailedFlush(boolean failedFlush) { isFailedFlush = failedFlush; return this; @@ -251,7 +239,7 @@ public FakeKinesisProvider setFailedFlush(boolean failedFlush) { @Override public AmazonKinesis getKinesisClient() { - return getMockedAmazonKinesisClient(); + return mock(AmazonKinesis.class); } @Override @@ -263,19 +251,5 @@ public AmazonCloudWatch getCloudWatchClient() { public IKinesisProducer createKinesisProducer(KinesisProducerConfiguration config) { return new KinesisProducerMock(config, isFailedFlush); } - - private AmazonKinesis getMockedAmazonKinesisClient() { - int statusCode = isExistingStream ? 200 : 404; - SdkHttpMetadata httpMetadata = mock(SdkHttpMetadata.class); - when(httpMetadata.getHttpStatusCode()).thenReturn(statusCode); - - DescribeStreamResult streamResult = mock(DescribeStreamResult.class); - when(streamResult.getSdkHttpMetadata()).thenReturn(httpMetadata); - - AmazonKinesis client = mock(AmazonKinesis.class); - when(client.describeStream(any(String.class))).thenReturn(streamResult); - - return client; - } } } diff --git a/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/KinesisReaderCheckpointTest.java b/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/KinesisReaderCheckpointTest.java index 244d880a8ee4..1653daf958eb 100644 --- a/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/KinesisReaderCheckpointTest.java +++ b/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/KinesisReaderCheckpointTest.java @@ -31,9 +31,6 @@ /** * */ @RunWith(MockitoJUnitRunner.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class KinesisReaderCheckpointTest { @Mock private ShardCheckpoint a, b, c; diff --git a/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/KinesisReaderTest.java b/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/KinesisReaderTest.java index 41bc443ac220..64f0fe7c6538 100644 --- a/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/KinesisReaderTest.java +++ b/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/KinesisReaderTest.java @@ -38,9 +38,6 @@ /** Tests {@link KinesisReader}. 
*/ @RunWith(MockitoJUnitRunner.Silent.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class KinesisReaderTest { @Mock private SimplifiedKinesisClient kinesis; diff --git a/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/KinesisServiceMock.java b/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/KinesisServiceMock.java index aebb357b1314..0508b0570a7f 100644 --- a/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/KinesisServiceMock.java +++ b/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/KinesisServiceMock.java @@ -26,9 +26,6 @@ import org.joda.time.DateTime; /** Simple mock implementation of Kinesis service for testing, singletone. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class KinesisServiceMock { private static KinesisServiceMock instance; diff --git a/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/RecordFilterTest.java b/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/RecordFilterTest.java index 09908b5f4dd1..e17fa86c0ccb 100644 --- a/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/RecordFilterTest.java +++ b/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/RecordFilterTest.java @@ -30,9 +30,6 @@ /** * */ @RunWith(MockitoJUnitRunner.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class RecordFilterTest { @Mock private ShardCheckpoint checkpoint; diff --git a/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/ShardCheckpointTest.java b/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/ShardCheckpointTest.java index 61e65f5d685d..5abe60585fae 100644 --- a/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/ShardCheckpointTest.java +++ b/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/ShardCheckpointTest.java @@ -42,9 +42,6 @@ /** */ @RunWith(MockitoJUnitRunner.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ShardCheckpointTest { private static final String AT_SEQUENCE_SHARD_IT = "AT_SEQUENCE_SHARD_IT"; diff --git a/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/ShardReadersPoolTest.java b/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/ShardReadersPoolTest.java index 325a2b413f9d..5950ae455aea 100644 --- a/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/ShardReadersPoolTest.java +++ b/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/ShardReadersPoolTest.java @@ -47,9 +47,6 @@ /** Tests {@link ShardReadersPool}. 
*/ @RunWith(MockitoJUnitRunner.Silent.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ShardReadersPoolTest { private static final int TIMEOUT_IN_MILLIS = (int) TimeUnit.SECONDS.toMillis(10); @@ -72,8 +69,10 @@ public void setUp() throws TransientKinesisException { when(c.getShardId()).thenReturn("shard2"); when(d.getShardId()).thenReturn("shard2"); when(firstCheckpoint.getShardId()).thenReturn("shard1"); + when(firstCheckpoint.getStreamName()).thenReturn("testStream"); when(secondCheckpoint.getShardId()).thenReturn("shard2"); when(firstIterator.getShardId()).thenReturn("shard1"); + when(firstIterator.getStreamName()).thenReturn("testStream"); when(firstIterator.getCheckpoint()).thenReturn(firstCheckpoint); when(secondIterator.getShardId()).thenReturn("shard2"); when(secondIterator.getCheckpoint()).thenReturn(secondCheckpoint); @@ -276,8 +275,10 @@ public void shouldForgetClosedShardIterator() throws Exception { when(firstIterator.findSuccessiveShardRecordIterators()).thenReturn(emptyList); shardReadersPool.start(); - verify(shardReadersPool).startReadingShards(ImmutableList.of(firstIterator, secondIterator)); - verify(shardReadersPool, timeout(TIMEOUT_IN_MILLIS)).startReadingShards(emptyList); + verify(shardReadersPool) + .startReadingShards(ImmutableList.of(firstIterator, secondIterator), "testStream"); + verify(shardReadersPool, timeout(TIMEOUT_IN_MILLIS)) + .startReadingShards(emptyList, "testStream"); KinesisReaderCheckpoint checkpointMark = shardReadersPool.getCheckpointMark(); assertThat(checkpointMark.iterator()) diff --git a/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/ShardRecordsIteratorTest.java b/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/ShardRecordsIteratorTest.java index 168b8df0d87f..397dc9831a9a 100644 --- a/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/ShardRecordsIteratorTest.java +++ b/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/ShardRecordsIteratorTest.java @@ -39,9 +39,6 @@ /** Tests {@link ShardRecordsIterator}. 
*/ @RunWith(MockitoJUnitRunner.Silent.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ShardRecordsIteratorTest { private static final String INITIAL_ITERATOR = "INITIAL_ITERATOR"; diff --git a/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/SimplifiedKinesisClientTest.java b/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/SimplifiedKinesisClientTest.java index a92b47d1d20a..9c8ea2989e17 100644 --- a/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/SimplifiedKinesisClientTest.java +++ b/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/SimplifiedKinesisClientTest.java @@ -32,22 +32,30 @@ import com.amazonaws.services.cloudwatch.model.GetMetricStatisticsRequest; import com.amazonaws.services.cloudwatch.model.GetMetricStatisticsResult; import com.amazonaws.services.kinesis.AmazonKinesis; -import com.amazonaws.services.kinesis.model.DescribeStreamResult; +import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream; +import com.amazonaws.services.kinesis.model.DescribeStreamSummaryRequest; +import com.amazonaws.services.kinesis.model.DescribeStreamSummaryResult; import com.amazonaws.services.kinesis.model.ExpiredIteratorException; import com.amazonaws.services.kinesis.model.GetRecordsRequest; import com.amazonaws.services.kinesis.model.GetRecordsResult; import com.amazonaws.services.kinesis.model.GetShardIteratorRequest; import com.amazonaws.services.kinesis.model.GetShardIteratorResult; import com.amazonaws.services.kinesis.model.LimitExceededException; +import com.amazonaws.services.kinesis.model.ListShardsRequest; +import com.amazonaws.services.kinesis.model.ListShardsResult; import com.amazonaws.services.kinesis.model.ProvisionedThroughputExceededException; import com.amazonaws.services.kinesis.model.Record; import com.amazonaws.services.kinesis.model.Shard; +import com.amazonaws.services.kinesis.model.ShardFilter; +import com.amazonaws.services.kinesis.model.ShardFilterType; import com.amazonaws.services.kinesis.model.ShardIteratorType; -import com.amazonaws.services.kinesis.model.StreamDescription; +import com.amazonaws.services.kinesis.model.StreamDescriptionSummary; import java.nio.ByteBuffer; import java.util.ArrayList; import java.util.Arrays; import java.util.List; +import java.util.function.Supplier; +import org.joda.time.Duration; import org.joda.time.Instant; import org.joda.time.Minutes; import org.junit.Test; @@ -59,9 +67,6 @@ /** * */ @RunWith(MockitoJUnitRunner.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SimplifiedKinesisClientTest { private static final String STREAM = "stream"; @@ -70,9 +75,11 @@ public class SimplifiedKinesisClientTest { private static final String SHARD_3 = "shard-03"; private static final String SHARD_ITERATOR = "iterator"; private static final String SEQUENCE_NUMBER = "abc123"; + private static final Instant CURRENT_TIMESTAMP = Instant.parse("2000-01-01T15:00:00.000Z"); @Mock private AmazonKinesis kinesis; @Mock private AmazonCloudWatch cloudWatch; + @Mock private Supplier currentInstantSupplier; @InjectMocks private SimplifiedKinesisClient underTest; @Test @@ -166,22 +173,259 @@ private void shouldHandleGetShardIteratorError( } @Test - public void shouldListAllShards() throws Exception { + public void shouldListAllShardsForTrimHorizon() throws Exception { Shard shard1 = new Shard().withShardId(SHARD_1); Shard shard2 = new 
Shard().withShardId(SHARD_2); Shard shard3 = new Shard().withShardId(SHARD_3); - when(kinesis.describeStream(STREAM, null)) + + ShardFilter shardFilter = new ShardFilter().withType(ShardFilterType.AT_TRIM_HORIZON); + + when(kinesis.listShards( + new ListShardsRequest() + .withStreamName(STREAM) + .withShardFilter(shardFilter) + .withMaxResults(1_000))) + .thenReturn(new ListShardsResult().withShards(shard1, shard2, shard3).withNextToken(null)); + + List shards = + underTest.listShardsAtPoint( + STREAM, new StartingPoint(InitialPositionInStream.TRIM_HORIZON)); + + assertThat(shards).containsOnly(shard1, shard2, shard3); + } + + @Test + public void shouldListAllShardsForTrimHorizonWithPagedResults() throws Exception { + Shard shard1 = new Shard().withShardId(SHARD_1); + Shard shard2 = new Shard().withShardId(SHARD_2); + Shard shard3 = new Shard().withShardId(SHARD_3); + + ShardFilter shardFilter = new ShardFilter().withType(ShardFilterType.AT_TRIM_HORIZON); + + String nextListShardsToken = "testNextToken"; + when(kinesis.listShards( + new ListShardsRequest() + .withStreamName(STREAM) + .withShardFilter(shardFilter) + .withMaxResults(1_000))) + .thenReturn( + new ListShardsResult().withShards(shard1, shard2).withNextToken(nextListShardsToken)); + + when(kinesis.listShards( + new ListShardsRequest() + .withMaxResults(1_000) + .withShardFilter(shardFilter) + .withNextToken(nextListShardsToken))) + .thenReturn(new ListShardsResult().withShards(shard3).withNextToken(null)); + + List shards = + underTest.listShardsAtPoint( + STREAM, new StartingPoint(InitialPositionInStream.TRIM_HORIZON)); + + assertThat(shards).containsOnly(shard1, shard2, shard3); + } + + @Test + public void shouldListAllShardsForTimestampWithinStreamRetentionAfterStreamCreationTimestamp() + throws Exception { + Shard shard1 = new Shard().withShardId(SHARD_1); + Shard shard2 = new Shard().withShardId(SHARD_2); + Shard shard3 = new Shard().withShardId(SHARD_3); + + int hoursDifference = 1; + int retentionPeriodHours = hoursDifference * 3; + Instant streamCreationTimestamp = + CURRENT_TIMESTAMP.minus(Duration.standardHours(retentionPeriodHours)); + Instant startingPointTimestamp = + streamCreationTimestamp.plus(Duration.standardHours(hoursDifference)); + + when(currentInstantSupplier.get()).thenReturn(CURRENT_TIMESTAMP); + + when(kinesis.describeStreamSummary(new DescribeStreamSummaryRequest().withStreamName(STREAM))) + .thenReturn( + new DescribeStreamSummaryResult() + .withStreamDescriptionSummary( + new StreamDescriptionSummary() + .withRetentionPeriodHours(retentionPeriodHours) + .withStreamCreationTimestamp(streamCreationTimestamp.toDate()))); + + ShardFilter shardFilter = + new ShardFilter() + .withType(ShardFilterType.AT_TIMESTAMP) + .withTimestamp(startingPointTimestamp.toDate()); + + when(kinesis.listShards( + new ListShardsRequest() + .withStreamName(STREAM) + .withShardFilter(shardFilter) + .withMaxResults(1_000))) + .thenReturn(new ListShardsResult().withShards(shard1, shard2, shard3).withNextToken(null)); + + List shards = + underTest.listShardsAtPoint(STREAM, new StartingPoint(startingPointTimestamp)); + + assertThat(shards).containsOnly(shard1, shard2, shard3); + } + + @Test + public void + shouldListAllShardsForTimestampWithRetriedDescribeStreamSummaryCallAfterStreamCreationTimestamp() + throws TransientKinesisException { + Shard shard1 = new Shard().withShardId(SHARD_1); + Shard shard2 = new Shard().withShardId(SHARD_2); + Shard shard3 = new Shard().withShardId(SHARD_3); + + int hoursDifference = 1; + int 
retentionPeriodHours = hoursDifference * 3; + Instant streamCreationTimestamp = + CURRENT_TIMESTAMP.minus(Duration.standardHours(retentionPeriodHours)); + Instant startingPointTimestamp = + streamCreationTimestamp.plus(Duration.standardHours(hoursDifference)); + + when(currentInstantSupplier.get()).thenReturn(CURRENT_TIMESTAMP); + + when(kinesis.describeStreamSummary(new DescribeStreamSummaryRequest().withStreamName(STREAM))) + .thenThrow(new LimitExceededException("Fake Exception: Limit exceeded")) + .thenReturn( + new DescribeStreamSummaryResult() + .withStreamDescriptionSummary( + new StreamDescriptionSummary() + .withRetentionPeriodHours(retentionPeriodHours) + .withStreamCreationTimestamp(streamCreationTimestamp.toDate()))); + + ShardFilter shardFilter = + new ShardFilter() + .withType(ShardFilterType.AT_TIMESTAMP) + .withTimestamp(startingPointTimestamp.toDate()); + + when(kinesis.listShards( + new ListShardsRequest() + .withStreamName(STREAM) + .withShardFilter(shardFilter) + .withMaxResults(1_000))) + .thenReturn(new ListShardsResult().withShards(shard1, shard2, shard3).withNextToken(null)); + + List shards = + underTest.listShardsAtPoint(STREAM, new StartingPoint(startingPointTimestamp)); + + assertThat(shards).containsOnly(shard1, shard2, shard3); + } + + @Test + public void shouldListAllShardsForTimestampOutsideStreamRetentionAfterStreamCreationTimestamp() + throws Exception { + Shard shard1 = new Shard().withShardId(SHARD_1); + Shard shard2 = new Shard().withShardId(SHARD_2); + Shard shard3 = new Shard().withShardId(SHARD_3); + + int retentionPeriodHours = 3; + int startingPointHours = 5; + int hoursSinceStreamCreation = 6; + + Instant streamCreationTimestamp = + CURRENT_TIMESTAMP.minus(Duration.standardHours(hoursSinceStreamCreation)); + Instant startingPointTimestampAfterStreamRetentionTimestamp = + CURRENT_TIMESTAMP.minus(Duration.standardHours(startingPointHours)); + + when(currentInstantSupplier.get()).thenReturn(CURRENT_TIMESTAMP); + + DescribeStreamSummaryRequest describeStreamRequest = + new DescribeStreamSummaryRequest().withStreamName(STREAM); + when(kinesis.describeStreamSummary(describeStreamRequest)) .thenReturn( - new DescribeStreamResult() - .withStreamDescription( - new StreamDescription().withShards(shard1, shard2).withHasMoreShards(true))); - when(kinesis.describeStream(STREAM, SHARD_2)) + new DescribeStreamSummaryResult() + .withStreamDescriptionSummary( + new StreamDescriptionSummary() + .withRetentionPeriodHours(retentionPeriodHours) + .withStreamCreationTimestamp(streamCreationTimestamp.toDate()))); + + ShardFilter shardFilter = new ShardFilter().withType(ShardFilterType.AT_TRIM_HORIZON); + + when(kinesis.listShards( + new ListShardsRequest() + .withStreamName(STREAM) + .withShardFilter(shardFilter) + .withMaxResults(1_000))) + .thenReturn(new ListShardsResult().withShards(shard1, shard2, shard3).withNextToken(null)); + + List shards = + underTest.listShardsAtPoint( + STREAM, new StartingPoint(startingPointTimestampAfterStreamRetentionTimestamp)); + + assertThat(shards).containsOnly(shard1, shard2, shard3); + } + + @Test + public void shouldListAllShardsForTimestampBeforeStreamCreationTimestamp() throws Exception { + Shard shard1 = new Shard().withShardId(SHARD_1); + Shard shard2 = new Shard().withShardId(SHARD_2); + Shard shard3 = new Shard().withShardId(SHARD_3); + + Instant startingPointTimestamp = Instant.parse("2000-01-01T15:00:00.000Z"); + Instant streamCreationTimestamp = startingPointTimestamp.plus(Duration.standardHours(1)); + + 
DescribeStreamSummaryRequest describeStreamRequest = + new DescribeStreamSummaryRequest().withStreamName(STREAM); + when(kinesis.describeStreamSummary(describeStreamRequest)) .thenReturn( - new DescribeStreamResult() - .withStreamDescription( - new StreamDescription().withShards(shard3).withHasMoreShards(false))); + new DescribeStreamSummaryResult() + .withStreamDescriptionSummary( + new StreamDescriptionSummary() + .withStreamCreationTimestamp(streamCreationTimestamp.toDate()))); + + ShardFilter shardFilter = new ShardFilter().withType(ShardFilterType.AT_TRIM_HORIZON); + + when(kinesis.listShards( + new ListShardsRequest() + .withStreamName(STREAM) + .withShardFilter(shardFilter) + .withMaxResults(1_000))) + .thenReturn(new ListShardsResult().withShards(shard1, shard2, shard3).withNextToken(null)); + + List shards = + underTest.listShardsAtPoint(STREAM, new StartingPoint(startingPointTimestamp)); + + assertThat(shards).containsOnly(shard1, shard2, shard3); + } + + @Test + public void shouldListAllShardsForLatest() throws Exception { + Shard shard1 = new Shard().withShardId(SHARD_1); + Shard shard2 = new Shard().withShardId(SHARD_2); + Shard shard3 = new Shard().withShardId(SHARD_3); + + when(kinesis.listShards( + new ListShardsRequest() + .withStreamName(STREAM) + .withShardFilter(new ShardFilter().withType(ShardFilterType.AT_LATEST)) + .withMaxResults(1_000))) + .thenReturn(new ListShardsResult().withShards(shard1, shard2, shard3).withNextToken(null)); + + List shards = + underTest.listShardsAtPoint(STREAM, new StartingPoint(InitialPositionInStream.LATEST)); + + assertThat(shards).containsOnly(shard1, shard2, shard3); + } + + @Test + public void shouldListAllShardsForExclusiveStartShardId() throws Exception { + Shard shard1 = new Shard().withShardId(SHARD_1); + Shard shard2 = new Shard().withShardId(SHARD_2); + Shard shard3 = new Shard().withShardId(SHARD_3); + + String exclusiveStartShardId = "exclusiveStartShardId"; + + when(kinesis.listShards( + new ListShardsRequest() + .withStreamName(STREAM) + .withMaxResults(1_000) + .withShardFilter( + new ShardFilter() + .withType(ShardFilterType.AFTER_SHARD_ID) + .withShardId(exclusiveStartShardId)))) + .thenReturn(new ListShardsResult().withShards(shard1, shard2, shard3).withNextToken(null)); - List shards = underTest.listShards(STREAM); + List shards = underTest.listShardsFollowingClosedShard(STREAM, exclusiveStartShardId); assertThat(shards).containsOnly(shard1, shard2, shard3); } @@ -222,9 +466,9 @@ public void shouldHandleUnexpectedExceptionForShardListing() { private void shouldHandleShardListingError( Exception thrownException, Class expectedExceptionClass) { - when(kinesis.describeStream(STREAM, null)).thenThrow(thrownException); + when(kinesis.listShards(any(ListShardsRequest.class))).thenThrow(thrownException); try { - underTest.listShards(STREAM); + underTest.listShardsAtPoint(STREAM, new StartingPoint(InitialPositionInStream.TRIM_HORIZON)); failBecauseExceptionWasNotThrown(expectedExceptionClass); } catch (Exception e) { assertThat(e).isExactlyInstanceOf(expectedExceptionClass); diff --git a/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/StartingPointShardsFinderTest.java b/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/StartingPointShardsFinderTest.java deleted file mode 100644 index 0c2926b2e1d0..000000000000 --- a/sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/StartingPointShardsFinderTest.java +++ /dev/null @@ -1,284 +0,0 @@ -/* - * Licensed to the Apache Software 
Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package org.apache.beam.sdk.io.kinesis; - -import static org.assertj.core.api.Assertions.assertThat; -import static org.mockito.Mockito.mock; -import static org.mockito.Mockito.when; - -import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream; -import com.amazonaws.services.kinesis.clientlibrary.types.UserRecord; -import com.amazonaws.services.kinesis.model.Shard; -import com.amazonaws.services.kinesis.model.ShardIteratorType; -import java.util.Collections; -import java.util.List; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; -import org.joda.time.Instant; -import org.junit.Test; -import org.junit.runner.RunWith; -import org.junit.runners.JUnit4; - -/** Tests StartingPointShardsFinder. */ -@RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) -public class StartingPointShardsFinderTest { - - private static final String STREAM_NAME = "streamName"; - private SimplifiedKinesisClient kinesis = mock(SimplifiedKinesisClient.class); - - /* - * This test operates on shards hierarchy prepared upfront. 
- * Following diagram depicts shard split and merge operations: - * - * 0002------------------------------+ - * / \ - * 0000------+ 0009 - * \ / - * 0003------+ 0007------+ - * \ / - * 0006------+ - * / \ - * 0004------+ 0008------+ - * / \ - * 0001------+ 0010 - * \ / - * 0005------------------------------+ - * - */ - private final Shard shard00 = createClosedShard("0000"); - private final Shard shard01 = createClosedShard("0001"); - private final Shard shard02 = createClosedShard("0002").withParentShardId("0000"); - private final Shard shard03 = createClosedShard("0003").withParentShardId("0000"); - private final Shard shard04 = createClosedShard("0004").withParentShardId("0001"); - private final Shard shard05 = createClosedShard("0005").withParentShardId("0001"); - private final Shard shard06 = - createClosedShard("0006").withParentShardId("0003").withAdjacentParentShardId("0004"); - private final Shard shard07 = createClosedShard("0007").withParentShardId("0006"); - private final Shard shard08 = createClosedShard("0008").withParentShardId("0006"); - private final Shard shard09 = - createOpenShard("0009").withParentShardId("0002").withAdjacentParentShardId("0007"); - private final Shard shard10 = - createOpenShard("0010").withParentShardId("0008").withAdjacentParentShardId("0005"); - - private final List allShards = - ImmutableList.of( - shard00, shard01, shard02, shard03, shard04, shard05, shard06, shard07, shard08, shard09, - shard10); - - private StartingPointShardsFinder underTest = new StartingPointShardsFinder(); - - @Test - public void shouldFindFirstShardsWhenAllShardsAreValid() throws Exception { - // given - Instant timestampAtTheBeginning = new Instant(); - StartingPoint startingPointAtTheBeginning = new StartingPoint(timestampAtTheBeginning); - for (Shard shard : allShards) { - activeAtTimestamp(shard, timestampAtTheBeginning); - } - when(kinesis.listShards(STREAM_NAME)).thenReturn(allShards); - - // when - Iterable shardsAtStartingPoint = - underTest.findShardsAtStartingPoint(kinesis, STREAM_NAME, startingPointAtTheBeginning); - - // then - assertThat(shardsAtStartingPoint).containsExactlyInAnyOrder(shard00, shard01); - } - - @Test - public void shouldFind3StartingShardsInTheMiddle() throws Exception { - // given - Instant timestampAfterShards3And4Merge = new Instant(); - StartingPoint startingPointAfterFirstSplitsAndMerge = - new StartingPoint(timestampAfterShards3And4Merge); - - expiredAtTimestamp(shard00, timestampAfterShards3And4Merge); - expiredAtTimestamp(shard01, timestampAfterShards3And4Merge); - activeAtTimestamp(shard02, timestampAfterShards3And4Merge); - expiredAtTimestamp(shard03, timestampAfterShards3And4Merge); - expiredAtTimestamp(shard04, timestampAfterShards3And4Merge); - activeAtTimestamp(shard05, timestampAfterShards3And4Merge); - activeAtTimestamp(shard06, timestampAfterShards3And4Merge); - activeAtTimestamp(shard07, timestampAfterShards3And4Merge); - activeAtTimestamp(shard08, timestampAfterShards3And4Merge); - activeAtTimestamp(shard09, timestampAfterShards3And4Merge); - activeAtTimestamp(shard10, timestampAfterShards3And4Merge); - - when(kinesis.listShards(STREAM_NAME)).thenReturn(allShards); - - // when - Iterable shardsAtStartingPoint = - underTest.findShardsAtStartingPoint( - kinesis, STREAM_NAME, startingPointAfterFirstSplitsAndMerge); - - // then - assertThat(shardsAtStartingPoint).containsExactlyInAnyOrder(shard02, shard05, shard06); - } - - @Test - public void shouldFindLastShardWhenAllPreviousExpired() throws Exception { - // given - Instant 
timestampAtTheEnd = new Instant(); - StartingPoint startingPointAtTheEnd = new StartingPoint(timestampAtTheEnd); - - expiredAtTimestamp(shard00, timestampAtTheEnd); - expiredAtTimestamp(shard01, timestampAtTheEnd); - expiredAtTimestamp(shard02, timestampAtTheEnd); - expiredAtTimestamp(shard03, timestampAtTheEnd); - expiredAtTimestamp(shard04, timestampAtTheEnd); - expiredAtTimestamp(shard05, timestampAtTheEnd); - expiredAtTimestamp(shard06, timestampAtTheEnd); - expiredAtTimestamp(shard07, timestampAtTheEnd); - expiredAtTimestamp(shard08, timestampAtTheEnd); - activeAtTimestamp(shard09, timestampAtTheEnd); - activeAtTimestamp(shard10, timestampAtTheEnd); - - when(kinesis.listShards(STREAM_NAME)).thenReturn(allShards); - - // when - Iterable shardsAtStartingPoint = - underTest.findShardsAtStartingPoint(kinesis, STREAM_NAME, startingPointAtTheEnd); - - // then - assertThat(shardsAtStartingPoint).containsExactlyInAnyOrder(shard09, shard10); - } - - @Test - public void shouldFindLastShardsWhenLatestStartingPointRequested() throws Exception { - // given - StartingPoint latestStartingPoint = new StartingPoint(InitialPositionInStream.LATEST); - when(kinesis.listShards(STREAM_NAME)).thenReturn(allShards); - - // when - Iterable shardsAtStartingPoint = - underTest.findShardsAtStartingPoint(kinesis, STREAM_NAME, latestStartingPoint); - - // then - assertThat(shardsAtStartingPoint).containsExactlyInAnyOrder(shard09, shard10); - } - - @Test - public void shouldFindEarliestShardsWhenTrimHorizonStartingPointRequested() throws Exception { - // given - StartingPoint trimHorizonStartingPoint = - new StartingPoint(InitialPositionInStream.TRIM_HORIZON); - when(kinesis.listShards(STREAM_NAME)).thenReturn(allShards); - - // when - Iterable shardsAtStartingPoint = - underTest.findShardsAtStartingPoint(kinesis, STREAM_NAME, trimHorizonStartingPoint); - - // then - assertThat(shardsAtStartingPoint).containsExactlyInAnyOrder(shard00, shard01); - } - - @Test(expected = IllegalStateException.class) - public void shouldThrowExceptionWhenSuccessorsNotFoundForExpiredShard() throws Exception { - // given - StartingPoint latestStartingPoint = new StartingPoint(InitialPositionInStream.LATEST); - Shard closedShard10 = - createClosedShard("0010").withParentShardId("0008").withAdjacentParentShardId("0005"); - List shards = - ImmutableList.of( - shard00, - shard01, - shard02, - shard03, - shard04, - shard05, - shard06, - shard07, - shard08, - shard09, - closedShard10); - - when(kinesis.listShards(STREAM_NAME)).thenReturn(shards); - - // when - underTest.findShardsAtStartingPoint(kinesis, STREAM_NAME, latestStartingPoint); - } - - private Shard createClosedShard(String shardId) { - Shard shard = new Shard().withShardId(shardId); - activeAtPoint(shard, ShardIteratorType.TRIM_HORIZON); - expiredAtPoint(shard, ShardIteratorType.LATEST); - return shard; - } - - private Shard createOpenShard(String shardId) { - Shard shard = new Shard().withShardId(shardId); - activeAtPoint(shard, ShardIteratorType.TRIM_HORIZON); - activeAtPoint(shard, ShardIteratorType.LATEST); - return shard; - } - - private void expiredAtTimestamp(Shard shard, Instant startTimestamp) { - prepareShard(shard, null, ShardIteratorType.AT_TIMESTAMP, startTimestamp); - } - - private void expiredAtPoint(Shard shard, ShardIteratorType shardIteratorType) { - prepareShard(shard, null, shardIteratorType, null); - } - - private void activeAtTimestamp(Shard shard, Instant startTimestamp) { - prepareShard( - shard, - "timestampIterator-" + shard.getShardId(), - 
ShardIteratorType.AT_TIMESTAMP, - startTimestamp); - } - - private void activeAtPoint(Shard shard, ShardIteratorType shardIteratorType) { - prepareShard(shard, shardIteratorType.toString() + shard.getShardId(), shardIteratorType, null); - } - - private void prepareShard( - Shard shard, - String nextIterator, - ShardIteratorType shardIteratorType, - Instant startTimestamp) { - try { - String shardIterator = shardIteratorType + shard.getShardId() + "-current"; - if (shardIteratorType == ShardIteratorType.AT_TIMESTAMP) { - when(kinesis.getShardIterator( - STREAM_NAME, - shard.getShardId(), - ShardIteratorType.AT_TIMESTAMP, - null, - startTimestamp)) - .thenReturn(shardIterator); - } else { - when(kinesis.getShardIterator( - STREAM_NAME, shard.getShardId(), shardIteratorType, null, null)) - .thenReturn(shardIterator); - } - GetKinesisRecordsResult result = - new GetKinesisRecordsResult( - Collections.emptyList(), - nextIterator, - 0, - STREAM_NAME, - shard.getShardId()); - when(kinesis.getRecords(shardIterator, STREAM_NAME, shard.getShardId())).thenReturn(result); - } catch (TransientKinesisException e) { - throw new RuntimeException(e); - } - } -} diff --git a/sdks/java/io/kudu/build.gradle b/sdks/java/io/kudu/build.gradle index ffd776021dd4..7f28d5ecf9cf 100644 --- a/sdks/java/io/kudu/build.gradle +++ b/sdks/java/io/kudu/build.gradle @@ -50,8 +50,6 @@ dependencies { compile library.java.slf4j_api testCompile project(path: ":sdks:java:core", configuration: "shadowTest") testCompile project(path: ":sdks:java:io:common", configuration: "testRuntime") - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library testCompile library.java.junit testRuntimeOnly library.java.slf4j_jdk14 testRuntimeOnly project(path: ":runners:direct-java", configuration: "shadow") diff --git a/sdks/java/io/kudu/src/test/java/org/apache/beam/sdk/io/kudu/KuduIOIT.java b/sdks/java/io/kudu/src/test/java/org/apache/beam/sdk/io/kudu/KuduIOIT.java index 2948f7032f4d..a50209ddb526 100644 --- a/sdks/java/io/kudu/src/test/java/org/apache/beam/sdk/io/kudu/KuduIOIT.java +++ b/sdks/java/io/kudu/src/test/java/org/apache/beam/sdk/io/kudu/KuduIOIT.java @@ -23,6 +23,7 @@ import static org.apache.beam.sdk.io.kudu.KuduTestUtils.SCHEMA; import static org.apache.beam.sdk.io.kudu.KuduTestUtils.createTableOptions; import static org.apache.beam.sdk.io.kudu.KuduTestUtils.rowCount; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import java.util.Arrays; @@ -45,7 +46,6 @@ import org.apache.kudu.client.KuduTable; import org.apache.kudu.client.RowResult; import org.junit.AfterClass; -import org.junit.Assert; import org.junit.BeforeClass; import org.junit.Rule; import org.junit.Test; @@ -89,9 +89,6 @@ * */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class KuduIOIT { private static final Logger LOG = LoggerFactory.getLogger(KuduIOIT.class); @@ -228,7 +225,7 @@ private void runWrite() throws Exception { .withFormatFn(new GenerateUpsert())); writePipeline.run().waitUntilFinish(); - Assert.assertThat( + assertThat( "Wrong number of records in table", rowCount(kuduTable), equalTo(options.getNumberOfRecords())); diff --git a/sdks/java/io/kudu/src/test/java/org/apache/beam/sdk/io/kudu/KuduIOTest.java b/sdks/java/io/kudu/src/test/java/org/apache/beam/sdk/io/kudu/KuduIOTest.java index fb1d0bc9c1d4..86d60e0768ae 100644 --- a/sdks/java/io/kudu/src/test/java/org/apache/beam/sdk/io/kudu/KuduIOTest.java 
+++ b/sdks/java/io/kudu/src/test/java/org/apache/beam/sdk/io/kudu/KuduIOTest.java @@ -65,9 +65,6 @@ * carried out in {@link KuduIOIT}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class KuduIOTest { private static final Logger LOG = LoggerFactory.getLogger(KuduIOTest.class); diff --git a/sdks/java/io/mongodb/build.gradle b/sdks/java/io/mongodb/build.gradle index 4e6ef1c67391..e84cb77aee5f 100644 --- a/sdks/java/io/mongodb/build.gradle +++ b/sdks/java/io/mongodb/build.gradle @@ -25,14 +25,12 @@ description = "Apache Beam :: SDKs :: Java :: IO :: MongoDB" ext.summary = "IO to read and write on MongoDB." dependencies { - compile library.java.vendored_guava_26_0_jre compile project(path: ":sdks:java:core", configuration: "shadow") - compile library.java.slf4j_api compile library.java.joda_time - compile "org.mongodb:mongo-java-driver:3.12.7" + compile library.java.mongo_java_driver + compile library.java.slf4j_api + compile library.java.vendored_guava_26_0_jre testCompile library.java.junit - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library testCompile project(path: ":sdks:java:io:common", configuration: "testRuntime") testCompile project(path: ":sdks:java:testing:test-utils", configuration: "testRuntime") testCompile "de.flapdoodle.embed:de.flapdoodle.embed.mongo:2.2.0" diff --git a/sdks/java/io/mongodb/src/main/java/org/apache/beam/sdk/io/mongodb/MongoDbIO.java b/sdks/java/io/mongodb/src/main/java/org/apache/beam/sdk/io/mongodb/MongoDbIO.java index 2a8775997844..8c7f03b6b0a0 100644 --- a/sdks/java/io/mongodb/src/main/java/org/apache/beam/sdk/io/mongodb/MongoDbIO.java +++ b/sdks/java/io/mongodb/src/main/java/org/apache/beam/sdk/io/mongodb/MongoDbIO.java @@ -31,12 +31,17 @@ import com.mongodb.client.MongoCursor; import com.mongodb.client.MongoDatabase; import com.mongodb.client.model.Aggregates; +import com.mongodb.client.model.BulkWriteOptions; import com.mongodb.client.model.Filters; import com.mongodb.client.model.InsertManyOptions; +import com.mongodb.client.model.UpdateOneModel; +import com.mongodb.client.model.UpdateOptions; +import com.mongodb.client.model.WriteModel; import java.util.ArrayList; -import java.util.Arrays; import java.util.Collections; +import java.util.HashMap; import java.util.List; +import java.util.Map; import java.util.stream.Collectors; import javax.net.ssl.SSLContext; import org.apache.beam.sdk.annotations.Experimental; @@ -94,8 +99,8 @@ * *

    MongoDB sink supports writing of Document (as JSON String) in a MongoDB. * - *

    To configure a MongoDB sink, you must specify a connection {@code URI}, a {@code Database} - * name, a {@code Collection} name. For instance: + *

    To configure a MongoDB sink and insert/replace, you must specify a connection {@code URI}, a + * {@code Database} name, a {@code Collection} name. For instance: * *

    {@code
      * pipeline
    @@ -107,6 +112,27 @@
      *     .withNumSplits(30))
      *
      * }
+ * + * + * + *

To configure a MongoDB sink for updates, you must specify a connection {@code URI}, a {@code + * Database} name, and a {@code Collection} name. The value of the configured update key in each input document is matched against {@code _id} in the target collection. + * For instance: + * + *

    {@code
+ * pipeline
+ *   .apply(...)
+ *   .apply(MongoDbIO.write()
+ *     .withUri("mongodb://localhost:27017")
+ *     .withDatabase("my-database")
+ *     .withCollection("my-collection")
+ *     .withUpdateConfiguration(UpdateConfiguration.create().withUpdateKey("key1")
+ *       .withUpdateFields(UpdateField.fieldUpdate("$set", "source-field1", "dest-field1"),
+ *                         UpdateField.fieldUpdate("$set", "source-field2", "dest-field2"),
+ *                         // pushes the entire input document to the dest field
+ *                         UpdateField.fullUpdate("$push", "dest-field3"))));
+ *
    + * }
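+ *
+ * A minimal sketch, building on the configuration above: {@code withIsUpsert(true)} (also defined
+ * on {@code UpdateConfiguration} in this change) can be chained so that input documents whose
+ * update key has no matching {@code _id} are inserted instead of leaving the collection unchanged:
+ *
+ * {@code
+ * UpdateConfiguration.create()
+ *     .withUpdateKey("key1")
+ *     .withIsUpsert(true)
+ *     .withUpdateFields(UpdateField.fieldUpdate("$set", "source-field1", "dest-field1"))
+ * }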
    */ @Experimental(Kind.SOURCE_SINK) @SuppressWarnings({ @@ -265,42 +291,6 @@ public Read withCollection(String collection) { return builder().setCollection(collection).build(); } - /** - * Sets a filter on the documents in a collection. - * - * @deprecated Filtering manually is discouraged. Use {@link #withQueryFn(SerializableFunction) - * with {@link FindQuery#withFilters(Bson)} as an argument to set up the projection}. - */ - @Deprecated - public Read withFilter(String filter) { - checkArgument(filter != null, "filter can not be null"); - checkArgument( - this.queryFn().getClass() != FindQuery.class, - "withFilter is only supported for FindQuery API"); - FindQuery findQuery = (FindQuery) queryFn(); - FindQuery queryWithFilter = - findQuery.toBuilder().setFilters(bson2BsonDocument(Document.parse(filter))).build(); - return builder().setQueryFn(queryWithFilter).build(); - } - - /** - * Sets a projection on the documents in a collection. - * - * @deprecated Use {@link #withQueryFn(SerializableFunction) with {@link - * FindQuery#withProjection(List)} as an argument to set up the projection}. - */ - @Deprecated - public Read withProjection(final String... fieldNames) { - checkArgument(fieldNames.length > 0, "projection can not be null"); - checkArgument( - this.queryFn().getClass() != FindQuery.class, - "withFilter is only supported for FindQuery API"); - FindQuery findQuery = (FindQuery) queryFn(); - FindQuery queryWithProjection = - findQuery.toBuilder().setProjection(Arrays.asList(fieldNames)).build(); - return builder().setQueryFn(queryWithProjection).build(); - } - /** Sets the user defined number of splits. */ public Read withNumSplits(int numSplits) { checkArgument(numSplits >= 0, "invalid num_splits: must be >= 0, but was %s", numSplits); @@ -527,7 +517,6 @@ public List> split( .anyMatch(s -> s.keySet().contains("$limit"))) { return Collections.singletonList(this); } - splitKeys = buildAutoBuckets(mongoDatabase, spec); for (BsonDocument shardFilter : splitKeysToMatch(splitKeys)) { @@ -781,6 +770,8 @@ public abstract static class Write extends PTransform, PDo abstract long batchSize(); + abstract @Nullable UpdateConfiguration updateConfiguration(); + abstract Builder builder(); @AutoValue.Builder @@ -803,6 +794,8 @@ abstract static class Builder { abstract Builder setBatchSize(long batchSize); + abstract Builder setUpdateConfiguration(UpdateConfiguration updateConfiguration); + abstract Write build(); } @@ -893,6 +886,10 @@ public Write withBatchSize(long batchSize) { return builder().setBatchSize(batchSize).build(); } + public Write withUpdateConfiguration(UpdateConfiguration updateConfiguration) { + return builder().setUpdateConfiguration(updateConfiguration).build(); + } + @Override public PDone expand(PCollection input) { checkArgument(uri() != null, "withUri() is required"); @@ -947,6 +944,7 @@ public void startBundle() { public void processElement(ProcessContext ctx) { // Need to copy the document because mongoCollection.insertMany() will mutate it // before inserting (will assign an id). 
+ batch.add(new Document(ctx.element())); if (batch.size() >= spec.batchSize()) { flush(); @@ -964,6 +962,15 @@ private void flush() { } MongoDatabase mongoDatabase = client.getDatabase(spec.database()); MongoCollection mongoCollection = mongoDatabase.getCollection(spec.collection()); + if (spec.updateConfiguration() == null) { + insertDocuments(mongoCollection); + } else { + updateDocuments(mongoCollection); + } + batch.clear(); + } + + private void insertDocuments(MongoCollection mongoCollection) { try { mongoCollection.insertMany(batch, new InsertManyOptions().ordered(spec.ordered())); } catch (MongoBulkWriteException e) { @@ -971,8 +978,57 @@ private void flush() { throw e; } } + } - batch.clear(); + private void updateDocuments(MongoCollection mongoCollection) { + if (batch.isEmpty()) { + return; + } + List> actions = new ArrayList<>(); + @Nullable List updateFields = spec.updateConfiguration().updateFields(); + Map> operatorFieldsMap = getOperatorFieldsMap(updateFields); + try { + for (Document doc : batch) { + Document updateDocument = new Document(); + for (Map.Entry> entry : operatorFieldsMap.entrySet()) { + Document updateSubDocument = new Document(); + for (UpdateField field : entry.getValue()) { + updateSubDocument.append( + field.destField(), + field.sourceField() == null ? doc : doc.get(field.sourceField())); + } + updateDocument.append(entry.getKey(), updateSubDocument); + } + Document findCriteria = + new Document("_id", doc.get(spec.updateConfiguration().updateKey())); + UpdateOptions updateOptions = + new UpdateOptions().upsert(spec.updateConfiguration().isUpsert()); + actions.add(new UpdateOneModel<>(findCriteria, updateDocument, updateOptions)); + } + mongoCollection.bulkWrite(actions, new BulkWriteOptions().ordered(spec.ordered())); + } catch (MongoBulkWriteException e) { + if (spec.ordered()) { + throw e; + } + } + } + + private static Map> getOperatorFieldsMap( + List updateFields) { + Map> operatorFieldsMap = new HashMap<>(); + for (UpdateField field : updateFields) { + String updateOperator = field.updateOperator(); + if (operatorFieldsMap.containsKey(updateOperator)) { + List fields = operatorFieldsMap.get(updateOperator); + fields.add(field); + operatorFieldsMap.put(updateOperator, fields); + } else { + List fields = new ArrayList<>(); + fields.add(field); + operatorFieldsMap.put(updateOperator, fields); + } + } + return operatorFieldsMap; } @Teardown diff --git a/sdks/java/io/mongodb/src/main/java/org/apache/beam/sdk/io/mongodb/UpdateConfiguration.java b/sdks/java/io/mongodb/src/main/java/org/apache/beam/sdk/io/mongodb/UpdateConfiguration.java new file mode 100644 index 000000000000..cda037dac187 --- /dev/null +++ b/sdks/java/io/mongodb/src/main/java/org/apache/beam/sdk/io/mongodb/UpdateConfiguration.java @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.mongodb; + +import com.google.auto.value.AutoValue; +import java.io.Serializable; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; +import org.checkerframework.checker.nullness.qual.Nullable; + +/** Builds a MongoDB UpdateConfiguration object. */ +@Experimental(Kind.SOURCE_SINK) +@AutoValue +@SuppressWarnings({ + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) +public abstract class UpdateConfiguration implements Serializable { + + abstract @Nullable String updateKey(); + + abstract @Nullable List updateFields(); + + abstract boolean isUpsert(); + + private static Builder builder() { + return new AutoValue_UpdateConfiguration.Builder() + .setUpdateFields(Collections.emptyList()) + .setIsUpsert(false); + } + + abstract Builder toBuilder(); + + public static UpdateConfiguration create() { + return builder().build(); + } + + @AutoValue.Builder + abstract static class Builder { + abstract Builder setUpdateFields(@Nullable List updateFields); + + abstract Builder setUpdateKey(@Nullable String updateKey); + + abstract Builder setIsUpsert(boolean isUpsert); + + abstract UpdateConfiguration build(); + } + + /** + * Sets the configurations for multiple updates. Takes update operator, source field name and dest + * field name for each one + */ + public UpdateConfiguration withUpdateFields(UpdateField... updateFields) { + return toBuilder().setUpdateFields(Arrays.asList(updateFields)).build(); + } + + /** Sets the filters to find. */ + public UpdateConfiguration withUpdateKey(String updateKey) { + return toBuilder().setUpdateKey(updateKey).build(); + } + + public UpdateConfiguration withIsUpsert(boolean isUpsert) { + return toBuilder().setIsUpsert(isUpsert).build(); + } +} diff --git a/sdks/java/io/mongodb/src/main/java/org/apache/beam/sdk/io/mongodb/UpdateField.java b/sdks/java/io/mongodb/src/main/java/org/apache/beam/sdk/io/mongodb/UpdateField.java new file mode 100644 index 000000000000..a1c8423ea00c --- /dev/null +++ b/sdks/java/io/mongodb/src/main/java/org/apache/beam/sdk/io/mongodb/UpdateField.java @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.mongodb; + +import com.google.auto.value.AutoValue; +import java.io.Serializable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; +import org.checkerframework.checker.nullness.qual.Nullable; + +@Experimental(Kind.SOURCE_SINK) +@AutoValue +public abstract class UpdateField implements Serializable { + + abstract @Nullable String updateOperator(); + + abstract @Nullable String sourceField(); + + abstract @Nullable String destField(); + + private static Builder builder() { + return new AutoValue_UpdateField.Builder().setSourceField(null); + } + + abstract UpdateField.Builder toBuilder(); + + private static UpdateField create() { + return builder().build(); + } + + @AutoValue.Builder + abstract static class Builder { + abstract UpdateField.Builder setUpdateOperator(@Nullable String updateOperator); + + abstract UpdateField.Builder setSourceField(@Nullable String sourceField); + + abstract UpdateField.Builder setDestField(@Nullable String destField); + + abstract UpdateField build(); + } + + /** Sets the limit of documents to find. */ + public static UpdateField fullUpdate(String updateOperator, String destField) { + return create().toBuilder().setUpdateOperator(updateOperator).setDestField(destField).build(); + } + + public static UpdateField fieldUpdate( + String updateOperator, String sourceField, String destField) { + return create() + .toBuilder() + .setUpdateOperator(updateOperator) + .setSourceField(sourceField) + .setDestField(destField) + .build(); + } +} diff --git a/sdks/java/io/mongodb/src/test/java/org/apache/beam/sdk/io/mongodb/MongoDBGridFSIOTest.java b/sdks/java/io/mongodb/src/test/java/org/apache/beam/sdk/io/mongodb/MongoDBGridFSIOTest.java index df4b6e32d819..0ef57fbad180 100644 --- a/sdks/java/io/mongodb/src/test/java/org/apache/beam/sdk/io/mongodb/MongoDBGridFSIOTest.java +++ b/sdks/java/io/mongodb/src/test/java/org/apache/beam/sdk/io/mongodb/MongoDBGridFSIOTest.java @@ -80,9 +80,6 @@ /** Test on the MongoDbGridFSIO. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class MongoDBGridFSIOTest { private static final Logger LOG = LoggerFactory.getLogger(MongoDBGridFSIOTest.class); diff --git a/sdks/java/io/mongodb/src/test/java/org/apache/beam/sdk/io/mongodb/MongoDBIOIT.java b/sdks/java/io/mongodb/src/test/java/org/apache/beam/sdk/io/mongodb/MongoDBIOIT.java index 6f47cf75cf5d..086d9a6f6642 100644 --- a/sdks/java/io/mongodb/src/test/java/org/apache/beam/sdk/io/mongodb/MongoDBIOIT.java +++ b/sdks/java/io/mongodb/src/test/java/org/apache/beam/sdk/io/mongodb/MongoDBIOIT.java @@ -78,9 +78,6 @@ * performance testing framework. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class MongoDBIOIT { private static final String NAMESPACE = MongoDBIOIT.class.getName(); diff --git a/sdks/java/io/mongodb/src/test/java/org/apache/beam/sdk/io/mongodb/MongoDbIOTest.java b/sdks/java/io/mongodb/src/test/java/org/apache/beam/sdk/io/mongodb/MongoDbIOTest.java index 96fec1551c43..3e2b62c1a615 100644 --- a/sdks/java/io/mongodb/src/test/java/org/apache/beam/sdk/io/mongodb/MongoDbIOTest.java +++ b/sdks/java/io/mongodb/src/test/java/org/apache/beam/sdk/io/mongodb/MongoDbIOTest.java @@ -63,9 +63,6 @@ /** Test on the MongoDbIO. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class MongoDbIOTest { private static final Logger LOG = LoggerFactory.getLogger(MongoDbIOTest.class); @@ -106,7 +103,7 @@ public static void beforeClass() throws Exception { client = new MongoClient("localhost", port); LOG.info("Insert test data"); - List documents = createDocuments(1000); + List documents = createDocuments(1000, false); MongoCollection collection = getCollection(COLLECTION); collection.insertMany(documents); } @@ -332,7 +329,7 @@ public void testWrite() { final int numElements = 1000; pipeline - .apply(Create.of(createDocuments(numElements))) + .apply(Create.of(createDocuments(numElements, false))) .apply( MongoDbIO.write() .withUri("mongodb://localhost:" + port) @@ -364,7 +361,37 @@ public void testWriteUnordered() { assertEquals(1, countElements(collectionName)); } - private static List createDocuments(final int n) { + @Test + public void testUpdate() { + final String collectionName = "testUpdate"; + final int numElements = 100; + Document doc = Document.parse("{\"id\":1,\"scientist\":\"Updated\",\"country\":\"India\"}"); + + getCollection(collectionName).insertMany(createDocuments(numElements, true)); + assertEquals(numElements, countElements(collectionName)); + List docs = new ArrayList<>(); + docs.add(doc); + pipeline + .apply(Create.of(docs)) + .apply( + MongoDbIO.write() + .withUri("mongodb://localhost:" + port) + .withDatabase(DATABASE) + .withCollection(collectionName) + .withUpdateConfiguration( + UpdateConfiguration.create() + .withUpdateKey("id") + .withUpdateFields( + UpdateField.fieldUpdate("$set", "scientist", "scientist"), + UpdateField.fieldUpdate("$set", "country", "country")))); + pipeline.run(); + + Document out = getCollection(collectionName).find(new Document("_id", 1)).first(); + assertEquals("Updated", out.get("scientist")); + assertEquals("India", out.get("country")); + } + + private static List createDocuments(final int n, boolean addId) { final String[] scientists = new String[] { "Einstein", @@ -395,6 +422,9 @@ private static List createDocuments(final int n) { for (int i = 1; i <= n; i++) { int index = i % scientists.length; Document document = new Document(); + if (addId) { + document.append("_id", i); + } document.append("scientist", scientists[index]); document.append("country", country[index]); documents.add(document); diff --git a/sdks/java/io/mqtt/build.gradle b/sdks/java/io/mqtt/build.gradle index a384274de773..40a0825fcbaf 100644 --- a/sdks/java/io/mqtt/build.gradle +++ b/sdks/java/io/mqtt/build.gradle @@ -34,8 +34,6 @@ dependencies { testCompile library.java.activemq_mqtt testCompile library.java.activemq_kahadb_store testCompile library.java.junit - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library testRuntimeOnly library.java.slf4j_jdk14 testRuntimeOnly project(path: ":runners:direct-java", configuration: "shadow") } diff --git a/sdks/java/io/mqtt/src/main/java/org/apache/beam/sdk/io/mqtt/MqttIO.java b/sdks/java/io/mqtt/src/main/java/org/apache/beam/sdk/io/mqtt/MqttIO.java index e92cf3a55f83..16e1052d24a9 100644 --- a/sdks/java/io/mqtt/src/main/java/org/apache/beam/sdk/io/mqtt/MqttIO.java +++ b/sdks/java/io/mqtt/src/main/java/org/apache/beam/sdk/io/mqtt/MqttIO.java @@ -170,22 +170,6 @@ public static ConnectionConfiguration create(String serverUri, String topic) { .build(); } - /** - * Describe a connection configuration to the MQTT broker. 
- * - * @param serverUri The MQTT broker URI. - * @param topic The MQTT getTopic pattern. - * @param clientId A client ID prefix, used to construct a unique client ID. - * @return A connection configuration to the MQTT broker. - * @deprecated This constructor will be removed in a future version of Beam, please use - * #create(String, String)} and {@link #withClientId(String)} instead. - */ - @Deprecated - public static ConnectionConfiguration create(String serverUri, String topic, String clientId) { - checkArgument(clientId != null, "clientId can not be null"); - return create(serverUri, topic).withClientId(clientId); - } - /** Set up the MQTT broker URI. */ public ConnectionConfiguration withServerUri(String serverUri) { checkArgument(serverUri != null, "serverUri can not be null"); diff --git a/sdks/java/io/mqtt/src/test/java/org/apache/beam/sdk/io/mqtt/MqttIOTest.java b/sdks/java/io/mqtt/src/test/java/org/apache/beam/sdk/io/mqtt/MqttIOTest.java index e165f44ec8a6..2740493bd66a 100644 --- a/sdks/java/io/mqtt/src/test/java/org/apache/beam/sdk/io/mqtt/MqttIOTest.java +++ b/sdks/java/io/mqtt/src/test/java/org/apache/beam/sdk/io/mqtt/MqttIOTest.java @@ -56,9 +56,6 @@ /** Tests of {@link MqttIO}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class MqttIOTest { private static final Logger LOG = LoggerFactory.getLogger(MqttIOTest.class); diff --git a/sdks/java/io/parquet/build.gradle b/sdks/java/io/parquet/build.gradle index b3af3c6f0a43..8e9f679db971 100644 --- a/sdks/java/io/parquet/build.gradle +++ b/sdks/java/io/parquet/build.gradle @@ -35,7 +35,7 @@ def hadoopVersions = [ hadoopVersions.each {kv -> configurations.create("hadoopVersion$kv.key")} -def parquet_version = "1.10.0" +def parquet_version = "1.12.0" dependencies { compile library.java.vendored_guava_26_0_jre @@ -43,14 +43,13 @@ dependencies { compile project(":sdks:java:io:hadoop-common") compile library.java.slf4j_api compile "org.apache.parquet:parquet-avro:$parquet_version" + compile "org.apache.parquet:parquet-column:$parquet_version" compile "org.apache.parquet:parquet-common:$parquet_version" compile "org.apache.parquet:parquet-hadoop:$parquet_version" compile library.java.avro provided library.java.hadoop_client testCompile project(path: ":sdks:java:core", configuration: "shadowTest") testCompile library.java.junit - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library testRuntimeOnly library.java.slf4j_jdk14 testRuntimeOnly project(path: ":runners:direct-java", configuration: "shadow") hadoopVersions.each {kv -> diff --git a/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/GenericRecordReadConverter.java b/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/GenericRecordReadConverter.java deleted file mode 100644 index ad77ad7b8f12..000000000000 --- a/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/GenericRecordReadConverter.java +++ /dev/null @@ -1,64 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. 
You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package org.apache.beam.sdk.io.parquet; - -import com.google.auto.value.AutoValue; -import java.io.Serializable; -import org.apache.avro.generic.GenericRecord; -import org.apache.beam.sdk.schemas.Schema; -import org.apache.beam.sdk.schemas.utils.AvroUtils; -import org.apache.beam.sdk.transforms.DoFn; -import org.apache.beam.sdk.transforms.PTransform; -import org.apache.beam.sdk.transforms.ParDo; -import org.apache.beam.sdk.values.PCollection; -import org.apache.beam.sdk.values.Row; - -/** A {@link PTransform} to convert {@link GenericRecord} to {@link Row}. */ -@AutoValue -abstract class GenericRecordReadConverter - extends PTransform, PCollection> implements Serializable { - - public abstract Schema beamSchema(); - - public static Builder builder() { - return new AutoValue_GenericRecordReadConverter.Builder(); - } - - @Override - public PCollection expand(PCollection input) { - return input - .apply( - "GenericRecordsToRows", - ParDo.of( - new DoFn() { - @ProcessElement - public void processElement(ProcessContext c) { - Row row = AvroUtils.toBeamRowStrict(c.element(), beamSchema()); - c.output(row); - } - })) - .setRowSchema(beamSchema()); - } - - @AutoValue.Builder - abstract static class Builder { - public abstract Builder beamSchema(Schema beamSchema); - - public abstract GenericRecordReadConverter build(); - } -} diff --git a/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java b/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java index 5dd9a11d76d7..81f597869dce 100644 --- a/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java +++ b/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java @@ -18,6 +18,7 @@ package org.apache.beam.sdk.io.parquet; import static java.lang.String.format; +import static org.apache.parquet.Preconditions.checkArgument; import static org.apache.parquet.Preconditions.checkNotNull; import static org.apache.parquet.hadoop.ParquetFileWriter.Mode.OVERWRITE; @@ -30,6 +31,7 @@ import java.util.ArrayList; import java.util.List; import java.util.Map; +import java.util.Map.Entry; import javax.annotation.Nullable; import org.apache.avro.Schema; import org.apache.avro.generic.GenericData; @@ -38,21 +40,31 @@ import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.annotations.Experimental.Kind; import org.apache.beam.sdk.coders.AvroCoder; +import org.apache.beam.sdk.coders.CannotProvideCoderException; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.CoderRegistry; import org.apache.beam.sdk.coders.StringUtf8Coder; import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.io.FileIO.ReadableFile; import org.apache.beam.sdk.io.fs.ResourceId; import org.apache.beam.sdk.io.hadoop.SerializableConfiguration; +import org.apache.beam.sdk.io.parquet.ParquetIO.ReadFiles.ReadFn; +import org.apache.beam.sdk.io.parquet.ParquetIO.ReadFiles.SplitReadFn; import org.apache.beam.sdk.io.range.OffsetRange; import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.schemas.utils.AvroUtils; 
import org.apache.beam.sdk.transforms.Create; import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.SerializableFunction; import org.apache.beam.sdk.transforms.display.DisplayData; import org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker; import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker; import org.apache.beam.sdk.values.PBegin; import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.TypeDescriptors; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; import org.apache.hadoop.conf.Configuration; @@ -112,7 +124,7 @@ * to set the data model associated with the {@link AvroParquetReader} * *

    For more advanced use cases, like reading each file in a {@link PCollection} of {@link - * FileIO.ReadableFile}, use the {@link ReadFiles} transform. + * ReadableFile}, use the {@link ReadFiles} transform. * *

    For example: * @@ -147,9 +159,69 @@ * column in a row is stored interleaved. * *

    {@code
    - * * PCollection records = pipeline.apply(ParquetIO.read(SCHEMA).from("/foo/bar").withProjection(Projection_schema,Encoder_Schema));
    - * * ...
    - * *
    + * PCollection records =
    + *   pipeline
    + *     .apply(
    + *       ParquetIO.read(SCHEMA).from("/foo/bar").withProjection(Projection_schema,Encoder_Schema));
    + * }
    + * + *

    Reading records of an unknown schema

    + * + *

    To read records from files whose schema is unknown at pipeline construction time or differs + * between files, use {@link #parseGenericRecords(SerializableFunction)} - in this case, you will + * need to specify a parsing function for converting each {@link GenericRecord} into a value of your + * custom type. + * + *

    For example: + * + *

    {@code
    + * Pipeline p = ...;
    + *
    + * PCollection records =
    + *     p.apply(
    + *       ParquetIO.parseGenericRecords(
    + *           new SerializableFunction() {
    + *               public Foo apply(GenericRecord record) {
    + *                   // If needed, access the schema of the record using record.getSchema()
    + *                   return ...;
    + *               }
    + *           })
    + *           .from(...));
    + *
    + * // For reading from files
    + *  PCollection files = p.apply(...);
    + *
    + *  PCollection records =
    + *     files
    + *       .apply(
    + *           ParquetIO.parseFilesGenericRecords(
    + *               new SerializableFunction() {
    + *                   public Foo apply(GenericRecord record) {
    + *                       // If needed, access the schema of the record using record.getSchema()
    + *                       return ...;
    + *                   }
    + *           }));
    + * }
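+ *
+ * A minimal sketch of setting the output coder explicitly when it cannot be inferred from the
+ * parse function (assuming {@code parseFn} is the parsing function from the example above and
+ * {@code Foo} is {@code Serializable}):
+ *
+ * {@code
+ * PCollection<Foo> records =
+ *     p.apply(
+ *         ParquetIO.parseGenericRecords(parseFn)
+ *             .from(...)
+ *             .withCoder(SerializableCoder.of(Foo.class)));
+ * }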
    + * + *

    Inferring Beam schemas from Parquet files

    + * + *

If you want to use SQL or schema-based operations on a Parquet-based PCollection, you must + * configure the read transform to infer the Beam schema and automatically set up the Beam-related + * coders by doing: + * + *

    {@code
    + * PCollection parquetRecords =
    + *   p.apply(ParquetIO.read(...).from(...).withBeamSchemas(true));
    + * }
+ * + * You can also use it when reading a list of filenames from a {@code PCollection}: + * + *
    {@code
    + * PCollection filePatterns = p.apply(...);
    + *
    + * PCollection parquetRecords =
    + *   filePatterns
    + *     .apply(ParquetIO.readFiles(...).withBeamSchemas(true));
      * }
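+ *
+ * Because {@code withBeamSchemas(true)} attaches a Beam schema coder to the output, schema-aware
+ * transforms such as {@code org.apache.beam.sdk.schemas.transforms.Select} can then be applied
+ * directly. As a rough sketch (the field name below is illustrative):
+ *
+ * {@code
+ * parquetRecords.apply(Select.fieldNames("id"));
+ * }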
    * *

    Writing Parquet files

    @@ -187,22 +259,51 @@ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class ParquetIO { + private static final Logger LOG = LoggerFactory.getLogger(ParquetIO.class); /** * Reads {@link GenericRecord} from a Parquet file (or multiple Parquet files matching the * pattern). */ public static Read read(Schema schema) { - return new AutoValue_ParquetIO_Read.Builder().setSchema(schema).setSplittable(false).build(); + return new AutoValue_ParquetIO_Read.Builder() + .setSchema(schema) + .setInferBeamSchema(false) + .setSplittable(false) + .build(); } /** * Like {@link #read(Schema)}, but reads each file in a {@link PCollection} of {@link - * org.apache.beam.sdk.io.FileIO.ReadableFile}, which allows more flexible usage. + * ReadableFile}, which allows more flexible usage. */ public static ReadFiles readFiles(Schema schema) { return new AutoValue_ParquetIO_ReadFiles.Builder() + .setSplittable(false) + .setInferBeamSchema(false) .setSchema(schema) + .build(); + } + + /** + * Reads {@link GenericRecord} from a Parquet file (or multiple Parquet files matching the + * pattern) and converts to user defined type using provided parseFn. + */ + public static Parse parseGenericRecords(SerializableFunction parseFn) { + return new AutoValue_ParquetIO_Parse.Builder() + .setParseFn(parseFn) + .setSplittable(false) + .build(); + } + + /** + * Reads {@link GenericRecord} from Parquet files and converts to user defined type using provided + * {@code parseFn}. + */ + public static ParseFiles parseFilesGenericRecords( + SerializableFunction parseFn) { + return new AutoValue_ParquetIO_ParseFiles.Builder() + .setParseFn(parseFn) .setSplittable(false) .build(); } @@ -221,6 +322,10 @@ public abstract static class Read extends PTransform filepattern); @@ -240,6 +347,8 @@ abstract static class Builder { abstract Builder setAvroDataModel(GenericData model); + abstract Builder setConfiguration(SerializableConfiguration configuration); + abstract Read build(); } @@ -261,6 +370,23 @@ public Read withProjection(Schema projectionSchema, Schema encoderSchema) { .build(); } + /** Specify Hadoop configuration for ParquetReader. */ + public Read withConfiguration(Map configuration) { + checkArgument(configuration != null, "configuration can not be null"); + return toBuilder().setConfiguration(SerializableConfiguration.fromMap(configuration)).build(); + } + + /** Specify Hadoop configuration for ParquetReader. */ + public Read withConfiguration(Configuration configuration) { + checkArgument(configuration != null, "configuration can not be null"); + return toBuilder().setConfiguration(new SerializableConfiguration(configuration)).build(); + } + + @Experimental(Kind.SCHEMAS) + public Read withBeamSchemas(boolean inferBeamSchema) { + return toBuilder().setInferBeamSchema(inferBeamSchema).build(); + } + /** Enable the Splittable reading. 
*/ public Read withSplit() { return toBuilder().setSplittable(true).build(); @@ -276,34 +402,270 @@ public Read withAvroDataModel(GenericData model) { @Override public PCollection expand(PBegin input) { checkNotNull(getFilepattern(), "Filepattern cannot be null."); - PCollection inputFiles = + PCollection inputFiles = input .apply( "Create filepattern", Create.ofProvider(getFilepattern(), StringUtf8Coder.of())) .apply(FileIO.matchAll()) .apply(FileIO.readMatches()); + + ReadFiles readFiles = + readFiles(getSchema()) + .withBeamSchemas(getInferBeamSchema()) + .withAvroDataModel(getAvroDataModel()); if (isSplittable()) { - return inputFiles.apply( - readFiles(getSchema()) - .withSplit() - .withAvroDataModel(getAvroDataModel()) - .withProjection(getProjectionSchema(), getEncoderSchema())); + readFiles = readFiles.withSplit().withProjection(getProjectionSchema(), getEncoderSchema()); + } + if (getConfiguration() != null) { + readFiles = readFiles.withConfiguration(getConfiguration().get()); + } + + return inputFiles.apply(readFiles); + } + + @Override + public void populateDisplayData(DisplayData.Builder builder) { + super.populateDisplayData(builder); + builder + .addIfNotNull( + DisplayData.item("filePattern", getFilepattern()).withLabel("Input File Pattern")) + .addIfNotNull(DisplayData.item("schema", String.valueOf(getSchema()))) + .add( + DisplayData.item("inferBeamSchema", getInferBeamSchema()) + .withLabel("Infer Beam Schema")) + .add(DisplayData.item("splittable", isSplittable())) + .addIfNotNull(DisplayData.item("projectionSchema", String.valueOf(getProjectionSchema()))) + .addIfNotNull(DisplayData.item("avroDataModel", String.valueOf(getAvroDataModel()))); + if (this.getConfiguration() != null) { + Configuration configuration = this.getConfiguration().get(); + for (Entry entry : configuration) { + if (entry.getKey().startsWith("parquet")) { + builder.addIfNotNull(DisplayData.item(entry.getKey(), entry.getValue())); + } + } } - return inputFiles.apply(readFiles(getSchema()).withAvroDataModel(getAvroDataModel())); + } + } + + /** Implementation of {@link #parseGenericRecords(SerializableFunction)}. */ + @AutoValue + public abstract static class Parse extends PTransform> { + abstract @Nullable ValueProvider getFilepattern(); + + abstract SerializableFunction getParseFn(); + + abstract @Nullable Coder getCoder(); + + abstract @Nullable SerializableConfiguration getConfiguration(); + + abstract boolean isSplittable(); + + abstract Builder toBuilder(); + + @AutoValue.Builder + abstract static class Builder { + abstract Builder setFilepattern(ValueProvider inputFiles); + + abstract Builder setParseFn(SerializableFunction parseFn); + + abstract Builder setCoder(Coder coder); + + abstract Builder setConfiguration(SerializableConfiguration configuration); + + abstract Builder setSplittable(boolean splittable); + + abstract Parse build(); + } + + public Parse from(ValueProvider filepattern) { + return toBuilder().setFilepattern(filepattern).build(); + } + + public Parse from(String filepattern) { + return from(ValueProvider.StaticValueProvider.of(filepattern)); + } + + /** Specify the output coder to use for output of the {@code ParseFn}. */ + public Parse withCoder(Coder coder) { + return (coder == null) ? this : toBuilder().setCoder(coder).build(); + } + + /** Specify Hadoop configuration for ParquetReader. 
*/ + public Parse withConfiguration(Map configuration) { + checkArgument(configuration != null, "configuration can not be null"); + return toBuilder().setConfiguration(SerializableConfiguration.fromMap(configuration)).build(); + } + + /** Specify Hadoop configuration for ParquetReader. */ + public Parse withConfiguration(Configuration configuration) { + checkArgument(configuration != null, "configuration can not be null"); + return toBuilder().setConfiguration(new SerializableConfiguration(configuration)).build(); + } + + public Parse withSplit() { + return toBuilder().setSplittable(true).build(); + } + + @Override + public PCollection expand(PBegin input) { + checkNotNull(getFilepattern(), "Filepattern cannot be null."); + return input + .apply("Create filepattern", Create.ofProvider(getFilepattern(), StringUtf8Coder.of())) + .apply(FileIO.matchAll()) + .apply(FileIO.readMatches()) + .apply( + parseFilesGenericRecords(getParseFn()) + .toBuilder() + .setCoder(getCoder()) + .setSplittable(isSplittable()) + .build()); + } + + @Override + public void populateDisplayData(DisplayData.Builder builder) { + super.populateDisplayData(builder); + builder + .addIfNotNull( + DisplayData.item("filePattern", getFilepattern()).withLabel("Input File Pattern")) + .add(DisplayData.item("splittable", isSplittable())) + .add(DisplayData.item("parseFn", getParseFn().getClass()).withLabel("Parse function")); + if (this.getCoder() != null) { + builder.add(DisplayData.item("coder", getCoder().getClass())); + } + if (this.getConfiguration() != null) { + Configuration configuration = this.getConfiguration().get(); + for (Entry entry : configuration) { + if (entry.getKey().startsWith("parquet")) { + builder.addIfNotNull(DisplayData.item(entry.getKey(), entry.getValue())); + } + } + } + } + } + + /** Implementation of {@link #parseFilesGenericRecords(SerializableFunction)}. */ + @AutoValue + public abstract static class ParseFiles + extends PTransform, PCollection> { + + abstract SerializableFunction getParseFn(); + + abstract @Nullable Coder getCoder(); + + abstract @Nullable SerializableConfiguration getConfiguration(); + + abstract boolean isSplittable(); + + abstract Builder toBuilder(); + + @AutoValue.Builder + abstract static class Builder { + abstract Builder setParseFn(SerializableFunction parseFn); + + abstract Builder setCoder(Coder coder); + + abstract Builder setConfiguration(SerializableConfiguration configuration); + + abstract Builder setSplittable(boolean split); + + abstract ParseFiles build(); + } + + /** Specify the output coder to use for output of the {@code ParseFn}. */ + public ParseFiles withCoder(Coder coder) { + return (coder == null) ? this : toBuilder().setCoder(coder).build(); + } + + /** Specify Hadoop configuration for ParquetReader. */ + public ParseFiles withConfiguration(Map configuration) { + checkArgument(configuration != null, "configuration can not be null"); + return toBuilder().setConfiguration(SerializableConfiguration.fromMap(configuration)).build(); + } + + /** Specify Hadoop configuration for ParquetReader. 
*/ + public ParseFiles withConfiguration(Configuration configuration) { + checkArgument(configuration != null, "configuration can not be null"); + return toBuilder().setConfiguration(new SerializableConfiguration(configuration)).build(); + } + + public ParseFiles withSplit() { + return toBuilder().setSplittable(true).build(); + } + + @Override + public PCollection expand(PCollection input) { + checkArgument(!isGenericRecordOutput(), "Parse can't be used for reading as GenericRecord."); + + return input + .apply(ParDo.of(buildFileReadingFn())) + .setCoder(inferCoder(input.getPipeline().getCoderRegistry())); } @Override public void populateDisplayData(DisplayData.Builder builder) { super.populateDisplayData(builder); - builder.add( - DisplayData.item("filePattern", getFilepattern()).withLabel("Input File Pattern")); + builder + .add(DisplayData.item("splittable", isSplittable())) + .add(DisplayData.item("parseFn", getParseFn().getClass()).withLabel("Parse function")); + if (this.getCoder() != null) { + builder.add(DisplayData.item("coder", getCoder().getClass())); + } + if (this.getConfiguration() != null) { + Configuration configuration = this.getConfiguration().get(); + for (Entry entry : configuration) { + if (entry.getKey().startsWith("parquet")) { + builder.addIfNotNull(DisplayData.item(entry.getKey(), entry.getValue())); + } + } + } + } + + /** Returns Splittable or normal Parquet file reading DoFn. */ + private DoFn buildFileReadingFn() { + return isSplittable() + ? new SplitReadFn<>(null, null, getParseFn(), getConfiguration()) + : new ReadFn<>(null, getParseFn(), getConfiguration()); + } + + /** Returns true if expected output is {@code PCollection}. */ + private boolean isGenericRecordOutput() { + String outputType = TypeDescriptors.outputOf(getParseFn()).getType().getTypeName(); + return outputType.equals(GenericRecord.class.getTypeName()); + } + + /** + * Identifies the {@code Coder} to be used for the output PCollection. + * + *

    throws an exception if expected output is of type {@link GenericRecord}. + * + * @param coderRegistry the {@link org.apache.beam.sdk.Pipeline}'s CoderRegistry to identify + * Coder for expected output type of {@link #getParseFn()} + */ + private Coder inferCoder(CoderRegistry coderRegistry) { + if (isGenericRecordOutput()) { + throw new IllegalArgumentException("Parse can't be used for reading as GenericRecord."); + } + + // Use explicitly provided coder + if (getCoder() != null) { + return getCoder(); + } + + // If not GenericRecord infer it from ParseFn. + try { + return coderRegistry.getCoder(TypeDescriptors.outputOf(getParseFn())); + } catch (CannotProvideCoderException e) { + throw new IllegalArgumentException( + "Unable to infer coder for output of parseFn. Specify it explicitly using .withCoder().", + e); + } } } /** Implementation of {@link #readFiles(Schema)}. */ @AutoValue public abstract static class ReadFiles - extends PTransform, PCollection> { + extends PTransform, PCollection> { abstract @Nullable Schema getSchema(); @@ -313,6 +675,10 @@ public abstract static class ReadFiles abstract @Nullable Schema getProjectionSchema(); + abstract @Nullable SerializableConfiguration getConfiguration(); + + abstract boolean getInferBeamSchema(); + abstract boolean isSplittable(); abstract Builder toBuilder(); @@ -327,6 +693,10 @@ abstract static class Builder { abstract Builder setProjectionSchema(Schema schema); + abstract Builder setConfiguration(SerializableConfiguration configuration); + + abstract Builder setInferBeamSchema(boolean inferBeamSchema); + abstract Builder setSplittable(boolean split); abstract ReadFiles build(); @@ -346,49 +716,113 @@ public ReadFiles withProjection(Schema projectionSchema, Schema encoderSchema) { .setSplittable(true) .build(); } + + /** Specify Hadoop configuration for ParquetReader. */ + public ReadFiles withConfiguration(Map configuration) { + checkArgument(configuration != null, "configuration can not be null"); + return toBuilder().setConfiguration(SerializableConfiguration.fromMap(configuration)).build(); + } + + /** Specify Hadoop configuration for ParquetReader. */ + public ReadFiles withConfiguration(Configuration configuration) { + checkArgument(configuration != null, "configuration can not be null"); + return toBuilder().setConfiguration(new SerializableConfiguration(configuration)).build(); + } + + @Experimental(Kind.SCHEMAS) + public ReadFiles withBeamSchemas(boolean inferBeamSchema) { + return toBuilder().setInferBeamSchema(inferBeamSchema).build(); + } + /** Enable the Splittable reading. */ public ReadFiles withSplit() { return toBuilder().setSplittable(true).build(); } @Override - public PCollection expand(PCollection input) { + public PCollection expand(PCollection input) { checkNotNull(getSchema(), "Schema can not be null"); - if (isSplittable()) { - Schema coderSchema = getProjectionSchema() == null ? 
getSchema() : getEncoderSchema(); - return input - .apply(ParDo.of(new SplitReadFn(getAvroDataModel(), getProjectionSchema()))) - .setCoder(AvroCoder.of(coderSchema)); + return input.apply(ParDo.of(getReaderFn())).setCoder(getCollectionCoder()); + } + + @Override + public void populateDisplayData(DisplayData.Builder builder) { + super.populateDisplayData(builder); + builder + .addIfNotNull(DisplayData.item("schema", String.valueOf(getSchema()))) + .add( + DisplayData.item("inferBeamSchema", getInferBeamSchema()) + .withLabel("Infer Beam Schema")) + .add(DisplayData.item("splittable", isSplittable())) + .addIfNotNull(DisplayData.item("projectionSchema", String.valueOf(getProjectionSchema()))) + .addIfNotNull(DisplayData.item("avroDataModel", String.valueOf(getAvroDataModel()))); + if (this.getConfiguration() != null) { + Configuration configuration = this.getConfiguration().get(); + for (Entry entry : configuration) { + if (entry.getKey().startsWith("parquet")) { + builder.addIfNotNull(DisplayData.item(entry.getKey(), entry.getValue())); + } + } } - return input - .apply(ParDo.of(new ReadFn(getAvroDataModel()))) - .setCoder(AvroCoder.of(getSchema())); + } + + /** Returns Parquet file reading function based on {@link #isSplittable()}. */ + private DoFn getReaderFn() { + return isSplittable() + ? new SplitReadFn<>( + getAvroDataModel(), + getProjectionSchema(), + GenericRecordPassthroughFn.create(), + getConfiguration()) + : new ReadFn<>( + getAvroDataModel(), GenericRecordPassthroughFn.create(), getConfiguration()); + } + + /** + * Returns {@link org.apache.beam.sdk.schemas.SchemaCoder} when using Beam schemas, {@link + * AvroCoder} when not using Beam schema. + */ + @Experimental(Kind.SCHEMAS) + private Coder getCollectionCoder() { + Schema coderSchema = + getProjectionSchema() != null && isSplittable() ? getEncoderSchema() : getSchema(); + + return getInferBeamSchema() ? AvroUtils.schemaCoder(coderSchema) : AvroCoder.of(coderSchema); } @DoFn.BoundedPerElement - static class SplitReadFn extends DoFn { - private Class modelClass; - private static final Logger LOG = LoggerFactory.getLogger(SplitReadFn.class); - private String requestSchemaString; + static class SplitReadFn extends DoFn { + private final Class modelClass; + private final String requestSchemaString; // Default initial splitting the file into blocks of 64MB. Unit of SPLIT_LIMIT is byte. private static final long SPLIT_LIMIT = 64000000; - SplitReadFn(GenericData model, Schema requestSchema) { + private @Nullable final SerializableConfiguration configuration; + + private final SerializableFunction parseFn; + + SplitReadFn( + GenericData model, + Schema requestSchema, + SerializableFunction parseFn, + @Nullable SerializableConfiguration configuration) { this.modelClass = model != null ? model.getClass() : null; this.requestSchemaString = requestSchema != null ? 
requestSchema.toString() : null; + this.parseFn = checkNotNull(parseFn, "GenericRecord parse function can't be null"); + this.configuration = configuration; } - ParquetFileReader getParquetFileReader(FileIO.ReadableFile file) throws Exception { + private ParquetFileReader getParquetFileReader(ReadableFile file) throws Exception { ParquetReadOptions options = HadoopReadOptions.builder(getConfWithModelClass()).build(); return ParquetFileReader.open(new BeamParquetInputFile(file.openSeekable()), options); } @ProcessElement public void processElement( - @Element FileIO.ReadableFile file, + @Element ReadableFile file, RestrictionTracker tracker, - OutputReceiver outputReceiver) + OutputReceiver outputReceiver) throws Exception { LOG.debug( "start " @@ -400,98 +834,99 @@ public void processElement( if (modelClass != null) { model = (GenericData) modelClass.getMethod("get").invoke(null); } - AvroReadSupport readSupport = new AvroReadSupport(model); + AvroReadSupport readSupport = new AvroReadSupport<>(model); if (requestSchemaString != null) { AvroReadSupport.setRequestedProjection( conf, new Schema.Parser().parse(requestSchemaString)); } ParquetReadOptions options = HadoopReadOptions.builder(conf).build(); - ParquetFileReader reader = - ParquetFileReader.open(new BeamParquetInputFile(file.openSeekable()), options); - Filter filter = checkNotNull(options.getRecordFilter(), "filter"); - Configuration hadoopConf = ((HadoopReadOptions) options).getConf(); - FileMetaData parquetFileMetadata = reader.getFooter().getFileMetaData(); - MessageType fileSchema = parquetFileMetadata.getSchema(); - Map fileMetadata = parquetFileMetadata.getKeyValueMetaData(); - ReadSupport.ReadContext readContext = - readSupport.init( - new InitContext( - hadoopConf, Maps.transformValues(fileMetadata, ImmutableSet::of), fileSchema)); - ColumnIOFactory columnIOFactory = new ColumnIOFactory(parquetFileMetadata.getCreatedBy()); - - RecordMaterializer recordConverter = - readSupport.prepareForRead(hadoopConf, fileMetadata, fileSchema, readContext); - reader.setRequestedSchema(readContext.getRequestedSchema()); - MessageColumnIO columnIO = - columnIOFactory.getColumnIO(readContext.getRequestedSchema(), fileSchema, true); - long currentBlock = tracker.currentRestriction().getFrom(); - for (int i = 0; i < currentBlock; i++) { - reader.skipNextRowGroup(); - } - while (tracker.tryClaim(currentBlock)) { - PageReadStore pages = reader.readNextRowGroup(); - LOG.debug("block {} read in memory. row count = {}", currentBlock, pages.getRowCount()); - currentBlock += 1; - RecordReader recordReader = - columnIO.getRecordReader( - pages, recordConverter, options.useRecordFilter() ? 
filter : FilterCompat.NOOP); - long currentRow = 0; - long totalRows = pages.getRowCount(); - while (currentRow < totalRows) { - try { - GenericRecord record; - currentRow += 1; + try (ParquetFileReader reader = + ParquetFileReader.open(new BeamParquetInputFile(file.openSeekable()), options)) { + Filter filter = checkNotNull(options.getRecordFilter(), "filter"); + Configuration hadoopConf = ((HadoopReadOptions) options).getConf(); + FileMetaData parquetFileMetadata = reader.getFooter().getFileMetaData(); + MessageType fileSchema = parquetFileMetadata.getSchema(); + Map fileMetadata = parquetFileMetadata.getKeyValueMetaData(); + ReadSupport.ReadContext readContext = + readSupport.init( + new InitContext( + hadoopConf, + Maps.transformValues(fileMetadata, ImmutableSet::of), + fileSchema)); + ColumnIOFactory columnIOFactory = new ColumnIOFactory(parquetFileMetadata.getCreatedBy()); + + RecordMaterializer recordConverter = + readSupport.prepareForRead(hadoopConf, fileMetadata, fileSchema, readContext); + reader.setRequestedSchema(readContext.getRequestedSchema()); + MessageColumnIO columnIO = + columnIOFactory.getColumnIO(readContext.getRequestedSchema(), fileSchema, true); + long currentBlock = tracker.currentRestriction().getFrom(); + for (int i = 0; i < currentBlock; i++) { + reader.skipNextRowGroup(); + } + while (tracker.tryClaim(currentBlock)) { + PageReadStore pages = reader.readNextRowGroup(); + LOG.debug("block {} read in memory. row count = {}", currentBlock, pages.getRowCount()); + currentBlock += 1; + RecordReader recordReader = + columnIO.getRecordReader( + pages, recordConverter, options.useRecordFilter() ? filter : FilterCompat.NOOP); + long currentRow = 0; + long totalRows = pages.getRowCount(); + while (currentRow < totalRows) { try { - record = recordReader.read(); - } catch (RecordMaterializer.RecordMaterializationException e) { - LOG.warn( - "skipping a corrupt record at {} in block {} in file {}", - currentRow, - currentBlock, - file.toString()); - continue; - } - if (record == null) { - // only happens with FilteredRecordReader at end of block - LOG.debug( - "filtered record reader reached end of block in block {} in file {}", - currentBlock, - file.toString()); - break; - } - if (recordReader.shouldSkipCurrentRecord()) { - // this record is being filtered via the filter2 package - LOG.debug( - "skipping record at {} in block {} in file {}", - currentRow, - currentBlock, - file.toString()); - continue; + GenericRecord record; + currentRow += 1; + try { + record = recordReader.read(); + } catch (RecordMaterializer.RecordMaterializationException e) { + LOG.warn( + "skipping a corrupt record at {} in block {} in file {}", + currentRow, + currentBlock, + file.toString()); + continue; + } + if (record == null) { + // only happens with FilteredRecordReader at end of block + LOG.debug( + "filtered record reader reached end of block in block {} in file {}", + currentBlock, + file.toString()); + break; + } + if (recordReader.shouldSkipCurrentRecord()) { + // this record is being filtered via the filter2 package + LOG.debug( + "skipping record at {} in block {} in file {}", + currentRow, + currentBlock, + file.toString()); + continue; + } + outputReceiver.output(parseFn.apply(record)); + } catch (RuntimeException e) { + + throw new ParquetDecodingException( + format( + "Can not read value at %d in block %d in file %s", + currentRow, currentBlock, file.toString()), + e); } - outputReceiver.output(record); - } catch (RuntimeException e) { - - throw new ParquetDecodingException( - 
format( - "Can not read value at %d in block %d in file %s", - currentRow, currentBlock, file.toString()), - e); } + LOG.debug( + "Finish processing {} rows from block {} in file {}", + currentRow, + currentBlock - 1, + file.toString()); } - LOG.debug( - "Finish processing {} rows from block {} in file {}", - currentRow, - currentBlock - 1, - file.toString()); } } - public Configuration getConfWithModelClass() throws Exception { - Configuration conf = new Configuration(); - GenericData model = null; - if (modelClass != null) { - model = (GenericData) modelClass.getMethod("get").invoke(null); - } + public Configuration getConfWithModelClass() throws ReflectiveOperationException { + Configuration conf = SerializableConfiguration.newConfiguration(configuration); + GenericData model = buildModelObject(modelClass); + if (model != null && (model.getClass() == GenericData.class || model.getClass() == SpecificData.class)) { conf.setBoolean(AvroReadSupport.AVRO_COMPATIBILITY, true); @@ -502,29 +937,31 @@ public Configuration getConfWithModelClass() throws Exception { } @GetInitialRestriction - public OffsetRange getInitialRestriction(@Element FileIO.ReadableFile file) throws Exception { - ParquetFileReader reader = getParquetFileReader(file); - return new OffsetRange(0, reader.getRowGroups().size()); + public OffsetRange getInitialRestriction(@Element ReadableFile file) throws Exception { + try (ParquetFileReader reader = getParquetFileReader(file)) { + return new OffsetRange(0, reader.getRowGroups().size()); + } } @SplitRestriction public void split( @Restriction OffsetRange restriction, OutputReceiver out, - @Element FileIO.ReadableFile file) + @Element ReadableFile file) throws Exception { - ParquetFileReader reader = getParquetFileReader(file); - List rowGroups = reader.getRowGroups(); - for (OffsetRange offsetRange : - splitBlockWithLimit( - restriction.getFrom(), restriction.getTo(), rowGroups, SPLIT_LIMIT)) { - out.output(offsetRange); + try (ParquetFileReader reader = getParquetFileReader(file)) { + List rowGroups = reader.getRowGroups(); + for (OffsetRange offsetRange : + splitBlockWithLimit( + restriction.getFrom(), restriction.getTo(), rowGroups, SPLIT_LIMIT)) { + out.output(offsetRange); + } } } public ArrayList splitBlockWithLimit( long start, long end, List blockList, long limit) { - ArrayList offsetList = new ArrayList(); + ArrayList offsetList = new ArrayList<>(); long totalSize = 0; long rangeStart = start; for (long rangeEnd = start; rangeEnd < end; rangeEnd++) { @@ -543,8 +980,7 @@ public ArrayList splitBlockWithLimit( @NewTracker public RestrictionTracker newTracker( - @Restriction OffsetRange restriction, @Element FileIO.ReadableFile file) - throws Exception { + @Restriction OffsetRange restriction, @Element ReadableFile file) throws Exception { CountAndSize recordCountAndSize = getRecordCountAndSize(file, restriction); return new BlockTracker( restriction, @@ -558,23 +994,23 @@ public OffsetRange.Coder getRestrictionCoder() { } @GetSize - public double getSize(@Element FileIO.ReadableFile file, @Restriction OffsetRange restriction) + public double getSize(@Element ReadableFile file, @Restriction OffsetRange restriction) throws Exception { return getRecordCountAndSize(file, restriction).getSize(); } - private CountAndSize getRecordCountAndSize(FileIO.ReadableFile file, OffsetRange restriction) + private CountAndSize getRecordCountAndSize(ReadableFile file, OffsetRange restriction) throws Exception { - ParquetFileReader reader = getParquetFileReader(file); - double size = 0; - 
double recordCount = 0; - for (long i = restriction.getFrom(); i < restriction.getTo(); i++) { - BlockMetaData block = reader.getRowGroups().get((int) i); - recordCount += block.getRowCount(); - size += block.getTotalByteSize(); + try (ParquetFileReader reader = getParquetFileReader(file)) { + double size = 0; + double recordCount = 0; + for (long i = restriction.getFrom(); i < restriction.getTo(); i++) { + BlockMetaData block = reader.getRowGroups().get((int) i); + recordCount += block.getRowCount(); + size += block.getTotalByteSize(); + } + return CountAndSize.create(recordCount, size); } - CountAndSize countAndSize = CountAndSize.create(recordCount, size); - return countAndSize; } @AutoValue @@ -604,7 +1040,7 @@ public BlockTracker(OffsetRange range, long totalByteSize, long recordCount) { } } - public void makeProgress() throws Exception { + public void makeProgress() throws IOException { progress += approximateRecordSize; if (progress > totalWork) { throw new IOException("Making progress out of range"); @@ -618,18 +1054,26 @@ public Progress getProgress() { } } - static class ReadFn extends DoFn { + static class ReadFn extends DoFn { + + private final Class modelClass; + + private final SerializableFunction parseFn; - private Class modelClass; + private final SerializableConfiguration configuration; - ReadFn(GenericData model) { + ReadFn( + GenericData model, + SerializableFunction parseFn, + SerializableConfiguration configuration) { this.modelClass = model != null ? model.getClass() : null; + this.parseFn = checkNotNull(parseFn, "GenericRecord parse function is null"); + this.configuration = configuration; } @ProcessElement - public void processElement(ProcessContext processContext) throws Exception { - FileIO.ReadableFile file = processContext.element(); - + public void processElement(@Element ReadableFile file, OutputReceiver receiver) + throws Exception { if (!file.getMetadata().isReadSeekEfficient()) { ResourceId filename = file.getMetadata().resourceId(); throw new RuntimeException(String.format("File has to be seekable: %s", filename)); @@ -637,25 +1081,29 @@ public void processElement(ProcessContext processContext) throws Exception { SeekableByteChannel seekableByteChannel = file.openSeekable(); - AvroParquetReader.Builder builder = - AvroParquetReader.builder(new BeamParquetInputFile(seekableByteChannel)); + AvroParquetReader.Builder builder = + (AvroParquetReader.Builder) + AvroParquetReader.builder( + new BeamParquetInputFile(seekableByteChannel)) + .withConf(SerializableConfiguration.newConfiguration(configuration)); if (modelClass != null) { // all GenericData implementations have a static get method - builder = builder.withDataModel((GenericData) modelClass.getMethod("get").invoke(null)); + builder = builder.withDataModel(buildModelObject(modelClass)); } try (ParquetReader reader = builder.build()) { GenericRecord read; while ((read = reader.read()) != null) { - processContext.output(read); + receiver.output(parseFn.apply(read)); } } + + seekableByteChannel.close(); } } private static class BeamParquetInputFile implements InputFile { - - private SeekableByteChannel seekableByteChannel; + private final SeekableByteChannel seekableByteChannel; BeamParquetInputFile(SeekableByteChannel seekableByteChannel) { this.seekableByteChannel = seekableByteChannel; @@ -689,6 +1137,8 @@ public static Sink sink(Schema schema) { return new AutoValue_ParquetIO_Sink.Builder() .setJsonSchema(schema.toString()) .setCompressionCodec(CompressionCodecName.SNAPPY) + // This resembles the 
default value for ParquetWriter.rowGroupSize. + .setRowGroupSize(ParquetWriter.DEFAULT_BLOCK_SIZE) .build(); } @@ -702,6 +1152,10 @@ public abstract static class Sink implements FileIO.Sink { abstract @Nullable SerializableConfiguration getConfiguration(); + abstract int getRowGroupSize(); + + abstract @Nullable Class getAvroDataModelClass(); + abstract Builder toBuilder(); @AutoValue.Builder @@ -712,6 +1166,10 @@ abstract static class Builder { abstract Builder setConfiguration(SerializableConfiguration configuration); + abstract Builder setRowGroupSize(int rowGroupSize); + + abstract Builder setAvroDataModelClass(Class modelClass); + abstract Sink build(); } @@ -722,13 +1180,27 @@ public Sink withCompressionCodec(CompressionCodecName compressionCodecName) { /** Specifies configuration to be passed into the sink's writer. */ public Sink withConfiguration(Map configuration) { - Configuration hadoopConfiguration = new Configuration(); - for (Map.Entry entry : configuration.entrySet()) { - hadoopConfiguration.set(entry.getKey(), entry.getValue()); - } - return toBuilder() - .setConfiguration(new SerializableConfiguration(hadoopConfiguration)) - .build(); + checkArgument(configuration != null, "configuration can not be null"); + return toBuilder().setConfiguration(SerializableConfiguration.fromMap(configuration)).build(); + } + + /** Specify Hadoop configuration for ParquetReader. */ + public Sink withConfiguration(Configuration configuration) { + checkArgument(configuration != null, "configuration can not be null"); + return toBuilder().setConfiguration(new SerializableConfiguration(configuration)).build(); + } + + /** Specify row-group size; if not set or zero, a default is used by the underlying writer. */ + public Sink withRowGroupSize(int rowGroupSize) { + checkArgument(rowGroupSize > 0, "rowGroupSize must be positive"); + return toBuilder().setRowGroupSize(rowGroupSize).build(); + } + + /** + * Define the Avro data model; see {@link AvroParquetWriter.Builder#withDataModel(GenericData)}. 
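// A minimal usage sketch of the Sink options introduced above (explicit row-group size, Avro data
// model, compression codec) wired through FileIO.write(). The output path and the example class
// name are illustrative assumptions, not part of the Beam sources changed in this diff.
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.parquet.ParquetIO;
import org.apache.beam.sdk.values.PCollection;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

class ParquetSinkExample {
  static void writeRecords(PCollection<GenericRecord> records, Schema schema) {
    records.apply(
        FileIO.<GenericRecord>write()
            .via(
                ParquetIO.sink(schema)                    // Avro schema of the written records
                    .withRowGroupSize(64 * 1024 * 1024)   // row-group size in bytes
                    .withAvroDataModel(GenericData.get()) // data model used by the Parquet writer
                    .withCompressionCodec(CompressionCodecName.SNAPPY))
            .to("/tmp/parquet-output/"));                 // hypothetical destination directory
  }
}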
+ */ + public Sink withAvroDataModel(GenericData model) { + return toBuilder().setAvroDataModelClass(model.getClass()).build(); } private transient @Nullable ParquetWriter writer; @@ -738,6 +1210,7 @@ public void open(WritableByteChannel channel) throws IOException { checkNotNull(getJsonSchema(), "Schema cannot be null"); Schema schema = new Schema.Parser().parse(getJsonSchema()); + Class modelClass = getAvroDataModelClass(); BeamParquetOutputFile beamParquetOutputFile = new BeamParquetOutputFile(Channels.newOutputStream(channel)); @@ -746,12 +1219,17 @@ public void open(WritableByteChannel channel) throws IOException { AvroParquetWriter.builder(beamParquetOutputFile) .withSchema(schema) .withCompressionCodec(getCompressionCodec()) - .withWriteMode(OVERWRITE); - - if (getConfiguration() != null) { - builder = builder.withConf(getConfiguration().get()); + .withWriteMode(OVERWRITE) + .withConf(SerializableConfiguration.newConfiguration(getConfiguration())) + .withRowGroupSize(getRowGroupSize()); + if (modelClass != null) { + try { + builder.withDataModel(buildModelObject(modelClass)); + } catch (ReflectiveOperationException e) { + throw new IOException( + "Couldn't set the specified Avro data model " + modelClass.getName(), e); + } } - this.writer = builder.build(); } @@ -769,7 +1247,7 @@ public void flush() throws IOException { private static class BeamParquetOutputFile implements OutputFile { - private OutputStream outputStream; + private final OutputStream outputStream; BeamParquetOutputFile(OutputStream outputStream) { this.outputStream = outputStream; @@ -798,14 +1276,14 @@ public long defaultBlockSize() { private static class BeamOutputStream extends PositionOutputStream { private long position = 0; - private OutputStream outputStream; + private final OutputStream outputStream; private BeamOutputStream(OutputStream outputStream) { this.outputStream = outputStream; } @Override - public long getPos() throws IOException { + public long getPos() { return position; } @@ -838,6 +1316,34 @@ public void close() throws IOException { } } + /** Returns a model object created using provided modelClass or null. */ + private static GenericData buildModelObject(@Nullable Class modelClass) + throws ReflectiveOperationException { + return (modelClass == null) ? null : (GenericData) modelClass.getMethod("get").invoke(null); + } + + /** + * Passthrough function to provide seamless backward compatibility to ParquetIO's functionality. + */ + @VisibleForTesting + static final class GenericRecordPassthroughFn + implements SerializableFunction { + + private static final GenericRecordPassthroughFn singleton = new GenericRecordPassthroughFn(); + + static GenericRecordPassthroughFn create() { + return singleton; + } + + @Override + public GenericRecord apply(GenericRecord input) { + return input; + } + + /** Enforce singleton pattern, by disallowing construction with {@code new} operator. */ + private GenericRecordPassthroughFn() {} + } + /** Disallow construction of utility class. */ private ParquetIO() {} } diff --git a/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetSchemaIOProvider.java b/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetSchemaIOProvider.java deleted file mode 100644 index 59c03f9b456c..000000000000 --- a/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetSchemaIOProvider.java +++ /dev/null @@ -1,115 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. 
See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package org.apache.beam.sdk.io.parquet; - -import com.google.auto.service.AutoService; -import java.io.Serializable; -import org.apache.avro.generic.GenericRecord; -import org.apache.beam.sdk.annotations.Internal; -import org.apache.beam.sdk.schemas.Schema; -import org.apache.beam.sdk.schemas.io.SchemaIO; -import org.apache.beam.sdk.schemas.io.SchemaIOProvider; -import org.apache.beam.sdk.schemas.utils.AvroUtils; -import org.apache.beam.sdk.transforms.PTransform; -import org.apache.beam.sdk.values.PBegin; -import org.apache.beam.sdk.values.PCollection; -import org.apache.beam.sdk.values.POutput; -import org.apache.beam.sdk.values.Row; - -/** - * An implementation of {@link SchemaIOProvider} for reading and writing parquet files with {@link - * ParquetIO}. - */ -@Internal -@AutoService(SchemaIOProvider.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) -public class ParquetSchemaIOProvider implements SchemaIOProvider { - /** Returns an id that uniquely represents this IO. */ - @Override - public String identifier() { - return "parquet"; - } - - /** - * Returns the expected schema of the configuration object. Note this is distinct from the schema - * of the data source itself. No configuration expected for parquet. - */ - @Override - public Schema configurationSchema() { - return Schema.builder().build(); - } - - /** - * Produce a SchemaIO given a String representing the data's location, the schema of the data that - * resides there, and some IO-specific configuration object. - */ - @Override - public ParquetSchemaIO from(String location, Row configuration, Schema dataSchema) { - return new ParquetSchemaIO(location, dataSchema); - } - - @Override - public boolean requiresDataSchema() { - return true; - } - - @Override - public PCollection.IsBounded isBounded() { - return PCollection.IsBounded.BOUNDED; - } - - /** An abstraction to create schema aware IOs. 
*/ - private static class ParquetSchemaIO implements SchemaIO, Serializable { - protected final Schema dataSchema; - protected final String location; - - private ParquetSchemaIO(String location, Schema dataSchema) { - this.dataSchema = dataSchema; - this.location = location; - } - - @Override - public Schema schema() { - return dataSchema; - } - - @Override - public PTransform> buildReader() { - return new PTransform>() { - @Override - public PCollection expand(PBegin begin) { - PTransform, PCollection> readConverter = - GenericRecordReadConverter.builder().beamSchema(dataSchema).build(); - - return begin - .apply( - "ParquetIORead", - ParquetIO.read(AvroUtils.toAvroSchema(dataSchema)).from(location)) - .apply("GenericRecordToRow", readConverter); - } - }; - } - - @Override - public PTransform, POutput> buildWriter() { - throw new UnsupportedOperationException("Writing to a Parquet file is not supported"); - } - } -} diff --git a/sdks/java/io/parquet/src/test/java/org/apache/beam/sdk/io/parquet/GenericRecordToRowTest.java b/sdks/java/io/parquet/src/test/java/org/apache/beam/sdk/io/parquet/GenericRecordToRowTest.java deleted file mode 100644 index 020926866a17..000000000000 --- a/sdks/java/io/parquet/src/test/java/org/apache/beam/sdk/io/parquet/GenericRecordToRowTest.java +++ /dev/null @@ -1,79 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package org.apache.beam.sdk.io.parquet; - -import java.io.Serializable; -import org.apache.avro.Schema; -import org.apache.avro.generic.GenericData; -import org.apache.avro.generic.GenericRecord; -import org.apache.beam.sdk.coders.AvroCoder; -import org.apache.beam.sdk.testing.PAssert; -import org.apache.beam.sdk.testing.TestPipeline; -import org.apache.beam.sdk.transforms.Create; -import org.apache.beam.sdk.values.PCollection; -import org.apache.beam.sdk.values.Row; -import org.junit.Rule; -import org.junit.Test; - -/** Unit tests for {@link GenericRecordReadConverter}. 
*/ -public class GenericRecordToRowTest implements Serializable { - @Rule public transient TestPipeline pipeline = TestPipeline.create(); - - org.apache.beam.sdk.schemas.Schema payloadSchema = - org.apache.beam.sdk.schemas.Schema.builder() - .addField("name", org.apache.beam.sdk.schemas.Schema.FieldType.STRING) - .addField("favorite_number", org.apache.beam.sdk.schemas.Schema.FieldType.INT32) - .addField("favorite_color", org.apache.beam.sdk.schemas.Schema.FieldType.STRING) - .addField("price", org.apache.beam.sdk.schemas.Schema.FieldType.DOUBLE) - .build(); - - @Test - public void testConvertsGenericRecordToRow() { - String schemaString = - "{\"namespace\": \"example.avro\",\n" - + " \"type\": \"record\",\n" - + " \"name\": \"User\",\n" - + " \"fields\": [\n" - + " {\"name\": \"name\", \"type\": \"string\"},\n" - + " {\"name\": \"favorite_number\", \"type\": \"int\"},\n" - + " {\"name\": \"favorite_color\", \"type\": \"string\"},\n" - + " {\"name\": \"price\", \"type\": \"double\"}\n" - + " ]\n" - + "}"; - Schema schema = (new Schema.Parser()).parse(schemaString); - - GenericRecord before = new GenericData.Record(schema); - before.put("name", "Bob"); - before.put("favorite_number", 256); - before.put("favorite_color", "red"); - before.put("price", 2.4); - - AvroCoder coder = AvroCoder.of(schema); - - PCollection rows = - pipeline - .apply("create PCollection", Create.of(before).withCoder(coder)) - .apply( - "convert", GenericRecordReadConverter.builder().beamSchema(payloadSchema).build()); - - PAssert.that(rows) - .containsInAnyOrder( - Row.withSchema(payloadSchema).addValues("Bob", 256, "red", 2.4).build()); - pipeline.run(); - } -} diff --git a/sdks/java/io/parquet/src/test/java/org/apache/beam/sdk/io/parquet/ParquetIOTest.java b/sdks/java/io/parquet/src/test/java/org/apache/beam/sdk/io/parquet/ParquetIOTest.java index 684ff5f8be10..d2609b4fcd89 100644 --- a/sdks/java/io/parquet/src/test/java/org/apache/beam/sdk/io/parquet/ParquetIOTest.java +++ b/sdks/java/io/parquet/src/test/java/org/apache/beam/sdk/io/parquet/ParquetIOTest.java @@ -17,31 +17,48 @@ */ package org.apache.beam.sdk.io.parquet; +import static java.util.stream.Collectors.toList; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasDisplayItem; +import static org.hamcrest.MatcherAssert.assertThat; import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertThrows; import static org.junit.Assert.assertTrue; import static org.mockito.Mockito.mock; import static org.mockito.Mockito.when; +import java.io.ByteArrayOutputStream; +import java.io.IOException; import java.io.Serializable; import java.util.ArrayList; import java.util.List; import org.apache.avro.Schema; import org.apache.avro.generic.GenericData; +import org.apache.avro.generic.GenericDatumWriter; import org.apache.avro.generic.GenericRecord; import org.apache.avro.generic.GenericRecordBuilder; +import org.apache.avro.io.EncoderFactory; +import org.apache.avro.io.JsonEncoder; import org.apache.avro.reflect.ReflectData; import org.apache.beam.sdk.coders.AvroCoder; import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.io.parquet.ParquetIO.GenericRecordPassthroughFn; import org.apache.beam.sdk.io.range.OffsetRange; +import org.apache.beam.sdk.schemas.SchemaCoder; +import org.apache.beam.sdk.schemas.utils.AvroUtils; import org.apache.beam.sdk.testing.PAssert; import org.apache.beam.sdk.testing.TestPipeline; import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.SerializableFunction; 
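// A minimal usage sketch of the parse API exercised by the tests below: reading Parquet files with
// an unknown schema by mapping each GenericRecord to a String. The file pattern and class names are
// illustrative assumptions; a concrete SerializableFunction is used so the output coder can be
// inferred without an explicit withCoder(...).
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.parquet.ParquetIO;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.beam.sdk.values.PCollection;

class ParquetParseExample {
  /** Concrete parse function, so the output type (and hence the coder) can be inferred. */
  static class RecordToString implements SerializableFunction<GenericRecord, String> {
    @Override
    public String apply(GenericRecord record) {
      return record.toString(); // Avro renders the record as JSON-like text
    }
  }

  static PCollection<String> readAsStrings(Pipeline pipeline) {
    return pipeline.apply(
        ParquetIO.parseGenericRecords(new RecordToString())
            .from("/tmp/parquet-input/*.parquet") // hypothetical file pattern
            .withSplit());                        // use the splittable, row-group-level reader
  }
}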
import org.apache.beam.sdk.transforms.Values; import org.apache.beam.sdk.transforms.display.DisplayData; import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; +import org.apache.hadoop.conf.Configuration; +import org.apache.parquet.filter2.predicate.FilterApi; +import org.apache.parquet.filter2.predicate.FilterPredicate; +import org.apache.parquet.hadoop.ParquetInputFormat; import org.apache.parquet.hadoop.metadata.BlockMetaData; -import org.junit.Assert; +import org.apache.parquet.io.api.Binary; import org.junit.Rule; import org.junit.Test; import org.junit.rules.TemporaryFolder; @@ -50,9 +67,6 @@ /** Test on the {@link ParquetIO}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ParquetIOTest implements Serializable { @Rule public transient TestPipeline mainPipeline = TestPipeline.create(); @@ -123,7 +137,7 @@ public void testWriteAndReadWithProjection() { } @Test - public void testBlockTracker() throws Exception { + public void testBlockTracker() { OffsetRange range = new OffsetRange(0, 1); ParquetIO.ReadFiles.BlockTracker tracker = new ParquetIO.ReadFiles.BlockTracker(range, 7, 3); assertEquals(tracker.getProgress().getWorkRemaining(), 1.0, 0.01); @@ -136,8 +150,10 @@ public void testBlockTracker() throws Exception { @Test public void testSplitBlockWithLimit() { - ParquetIO.ReadFiles.SplitReadFn testFn = new ParquetIO.ReadFiles.SplitReadFn(null, null); - ArrayList blockList = new ArrayList(); + ParquetIO.ReadFiles.SplitReadFn testFn = + new ParquetIO.ReadFiles.SplitReadFn<>( + null, null, ParquetIO.GenericRecordPassthroughFn.create(), null); + ArrayList blockList = new ArrayList<>(); ArrayList rangeList; BlockMetaData testBlock = mock(BlockMetaData.class); when(testBlock.getTotalByteSize()).thenReturn((long) 60); @@ -173,6 +189,24 @@ public void testWriteAndRead() { readPipeline.run().waitUntilFinish(); } + @Test + public void testWriteWithRowGroupSizeAndRead() { + List records = generateGenericRecords(1000); + + mainPipeline + .apply(Create.of(records).withCoder(AvroCoder.of(SCHEMA))) + .apply( + FileIO.write() + .via(ParquetIO.sink(SCHEMA).withRowGroupSize(1500)) + .to(temporaryFolder.getRoot().getAbsolutePath())); + mainPipeline.run().waitUntilFinish(); + PCollection readBack = + readPipeline.apply( + ParquetIO.read(SCHEMA).from(temporaryFolder.getRoot().getAbsolutePath() + "/*")); + PAssert.that(readBack).containsInAnyOrder(records); + readPipeline.run().waitUntilFinish(); + } + @Test public void testWriteAndReadWithSplit() { List records = generateGenericRecords(1000); @@ -194,6 +228,51 @@ public void testWriteAndReadWithSplit() { readPipeline.run().waitUntilFinish(); } + @Test + public void testWriteAndReadWithBeamSchema() { + List records = generateGenericRecords(1000); + + mainPipeline + .apply(Create.of(records).withCoder(AvroCoder.of(SCHEMA))) + .apply( + FileIO.write() + .via(ParquetIO.sink(SCHEMA)) + .to(temporaryFolder.getRoot().getAbsolutePath())); + + mainPipeline.run().waitUntilFinish(); + + PCollection readBackRecords = + readPipeline.apply( + ParquetIO.read(SCHEMA) + .from(temporaryFolder.getRoot().getAbsolutePath() + "/*") + .withBeamSchemas(true)); + + PAssert.that(readBackRecords).containsInAnyOrder(records); + readPipeline.run().waitUntilFinish(); + } + + @Test + public void testWriteAndReadFilesAsJsonForWithSplitForUnknownSchema() { + List records = generateGenericRecords(1000); + + mainPipeline + 
.apply(Create.of(records).withCoder(AvroCoder.of(SCHEMA))) + .apply( + FileIO.write() + .via(ParquetIO.sink(SCHEMA)) + .to(temporaryFolder.getRoot().getAbsolutePath())); + mainPipeline.run().waitUntilFinish(); + + PCollection readBackAsJsonWithSplit = + readPipeline.apply( + ParquetIO.parseGenericRecords(ParseGenericRecordAsJsonFn.create()) + .from(temporaryFolder.getRoot().getAbsolutePath() + "/*") + .withSplit()); + + PAssert.that(readBackAsJsonWithSplit).containsInAnyOrder(convertRecordsToJson(records)); + readPipeline.run().waitUntilFinish(); + } + @Test public void testWriteAndReadFiles() { List records = generateGenericRecords(1000); @@ -216,8 +295,74 @@ public void testWriteAndReadFiles() { mainPipeline.run().waitUntilFinish(); } + @Test + public void testReadFilesAsJsonForUnknownSchemaFiles() { + List records = generateGenericRecords(1000); + List expectedJsonRecords = convertRecordsToJson(records); + + PCollection writeThenRead = + mainPipeline + .apply(Create.of(records).withCoder(AvroCoder.of(SCHEMA))) + .apply( + FileIO.write() + .via(ParquetIO.sink(SCHEMA)) + .to(temporaryFolder.getRoot().getAbsolutePath())) + .getPerDestinationOutputFilenames() + .apply(Values.create()) + .apply(FileIO.matchAll()) + .apply(FileIO.readMatches()) + .apply(ParquetIO.parseFilesGenericRecords(ParseGenericRecordAsJsonFn.create())); + + assertEquals(1000, expectedJsonRecords.size()); + PAssert.that(writeThenRead).containsInAnyOrder(expectedJsonRecords); + + mainPipeline.run().waitUntilFinish(); + } + + @Test + public void testReadFilesAsRowForUnknownSchemaFiles() { + List records = generateGenericRecords(1000); + List expectedRows = + records.stream().map(record -> AvroUtils.toBeamRowStrict(record, null)).collect(toList()); + + PCollection writeThenRead = + mainPipeline + .apply(Create.of(records).withCoder(AvroCoder.of(SCHEMA))) + .apply( + FileIO.write() + .via(ParquetIO.sink(SCHEMA)) + .to(temporaryFolder.getRoot().getAbsolutePath())) + .getPerDestinationOutputFilenames() + .apply(Values.create()) + .apply(FileIO.matchAll()) + .apply(FileIO.readMatches()) + .apply( + ParquetIO.parseFilesGenericRecords( + (SerializableFunction) + record -> AvroUtils.toBeamRowStrict(record, null)) + .withCoder(SchemaCoder.of(AvroUtils.toBeamSchema(SCHEMA)))); + + PAssert.that(writeThenRead).containsInAnyOrder(expectedRows); + + mainPipeline.run().waitUntilFinish(); + } + + @Test + @SuppressWarnings({"nullable", "ConstantConditions"} /* forced check. 
*/) + public void testReadFilesUnknownSchemaFilesForGenericRecordThrowException() { + IllegalArgumentException illegalArgumentException = + assertThrows( + IllegalArgumentException.class, + () -> + ParquetIO.parseFilesGenericRecords(GenericRecordPassthroughFn.create()) + .expand(null)); + + assertEquals( + "Parse can't be used for reading as GenericRecord.", illegalArgumentException.getMessage()); + } + private List generateGenericRecords(long count) { - ArrayList data = new ArrayList<>(); + List data = new ArrayList<>(); GenericRecordBuilder builder = new GenericRecordBuilder(SCHEMA); for (int i = 0; i < count; i++) { int index = i % SCIENTISTS.length; @@ -241,9 +386,24 @@ private List generateRequestedRecords(long count) { @Test public void testReadDisplayData() { - DisplayData displayData = DisplayData.from(ParquetIO.read(SCHEMA).from("foo.parquet")); - - Assert.assertThat(displayData, hasDisplayItem("filePattern", "foo.parquet")); + Configuration configuration = new Configuration(); + configuration.set("parquet.foo", "foo"); + DisplayData displayData = + DisplayData.from( + ParquetIO.read(SCHEMA) + .from("foo.parquet") + .withSplit() + .withProjection(REQUESTED_SCHEMA, SCHEMA) + .withAvroDataModel(GenericData.get()) + .withConfiguration(configuration)); + + assertThat(displayData, hasDisplayItem("filePattern", "foo.parquet")); + assertThat(displayData, hasDisplayItem("schema", SCHEMA.toString())); + assertThat(displayData, hasDisplayItem("inferBeamSchema", false)); + assertThat(displayData, hasDisplayItem("splittable", true)); + assertThat(displayData, hasDisplayItem("projectionSchema", REQUESTED_SCHEMA.toString())); + assertThat(displayData, hasDisplayItem("avroDataModel", GenericData.get().toString())); + assertThat(displayData, hasDisplayItem("parquet.foo", "foo")); } public static class TestRecord { @@ -345,4 +505,105 @@ public void testWriteAndReadwithSplitUsingReflectDataSchemaWithDataModel() { PAssert.that(readBack).containsInAnyOrder(records); readPipeline.run().waitUntilFinish(); } + + @Test + public void testWriteAndReadUsingGenericDataSchemaWithDataModel() { + Schema schema = new Schema.Parser().parse(SCHEMA_STRING); + + List records = generateGenericRecords(1000); + mainPipeline + .apply(Create.of(records).withCoder(AvroCoder.of(schema))) + .apply( + FileIO.write() + .via(ParquetIO.sink(schema).withAvroDataModel(GenericData.get())) + .to(temporaryFolder.getRoot().getAbsolutePath())); + mainPipeline.run().waitUntilFinish(); + + PCollection readBack = + readPipeline.apply( + ParquetIO.read(schema) + .withAvroDataModel(GenericData.get()) + .from(temporaryFolder.getRoot().getAbsolutePath() + "/*")); + + PAssert.that(readBack).containsInAnyOrder(records); + readPipeline.run().waitUntilFinish(); + } + + @Test + public void testWriteAndReadwithSplitUsingGenericDataSchemaWithDataModel() { + Schema schema = new Schema.Parser().parse(SCHEMA_STRING); + + List records = generateGenericRecords(1000); + mainPipeline + .apply(Create.of(records).withCoder(AvroCoder.of(schema))) + .apply( + FileIO.write() + .via(ParquetIO.sink(schema).withAvroDataModel(GenericData.get())) + .to(temporaryFolder.getRoot().getAbsolutePath())); + mainPipeline.run().waitUntilFinish(); + + PCollection readBack = + readPipeline.apply( + ParquetIO.read(schema) + .withSplit() + .withAvroDataModel(GenericData.get()) + .from(temporaryFolder.getRoot().getAbsolutePath() + "/*")); + + PAssert.that(readBack).containsInAnyOrder(records); + readPipeline.run().waitUntilFinish(); + } + + @Test + public void 
testWriteAndReadWithConfiguration() { + List records = generateGenericRecords(10); + List expectedRecords = generateGenericRecords(1); + + mainPipeline + .apply(Create.of(records).withCoder(AvroCoder.of(SCHEMA))) + .apply( + FileIO.write() + .via(ParquetIO.sink(SCHEMA)) + .to(temporaryFolder.getRoot().getAbsolutePath())); + mainPipeline.run().waitUntilFinish(); + + Configuration configuration = new Configuration(); + FilterPredicate filterPredicate = + FilterApi.eq(FilterApi.binaryColumn("id"), Binary.fromString("0")); + ParquetInputFormat.setFilterPredicate(configuration, filterPredicate); + PCollection readBack = + readPipeline.apply( + ParquetIO.read(SCHEMA) + .from(temporaryFolder.getRoot().getAbsolutePath() + "/*") + .withConfiguration(configuration)); + PAssert.that(readBack).containsInAnyOrder(expectedRecords); + readPipeline.run().waitUntilFinish(); + } + + /** Returns list of JSON representation of GenericRecords. */ + private static List convertRecordsToJson(List records) { + return records.stream().map(ParseGenericRecordAsJsonFn.create()::apply).collect(toList()); + } + + /** Sample Parse function that converts GenericRecord as JSON. for testing. */ + private static class ParseGenericRecordAsJsonFn + implements SerializableFunction { + + public static ParseGenericRecordAsJsonFn create() { + return new ParseGenericRecordAsJsonFn(); + } + + @Override + public String apply(GenericRecord input) { + ByteArrayOutputStream baos = new ByteArrayOutputStream(); + + try { + JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(input.getSchema(), baos, true); + new GenericDatumWriter(input.getSchema()).write(input, jsonEncoder); + jsonEncoder.flush(); + } catch (IOException ioException) { + throw new RuntimeException("error converting record to JSON", ioException); + } + return baos.toString(); + } + } } diff --git a/sdks/java/io/parquet/src/test/resources/log4j-test.properties b/sdks/java/io/parquet/src/test/resources/log4j-test.properties new file mode 100644 index 000000000000..e28055ab941b --- /dev/null +++ b/sdks/java/io/parquet/src/test/resources/log4j-test.properties @@ -0,0 +1,31 @@ +################################################################################ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +################################################################################ + +# Set root logger level to ERROR. +# set manually to INFO for debugging purposes. +log4j.rootLogger=ERROR, testlogger + +# ConsoleAppender. 
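// A minimal sketch of filter-predicate pushdown through the new withConfiguration(...) option,
// mirroring testWriteAndReadWithConfiguration above; the column name "id", the file pattern, and
// the example class name are illustrative assumptions.
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.parquet.ParquetIO;
import org.apache.beam.sdk.values.PCollection;
import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.filter2.predicate.FilterApi;
import org.apache.parquet.filter2.predicate.FilterPredicate;
import org.apache.parquet.hadoop.ParquetInputFormat;
import org.apache.parquet.io.api.Binary;

class ParquetFilteredReadExample {
  static PCollection<GenericRecord> readOnlyIdZero(Pipeline pipeline, Schema schema) {
    Configuration conf = new Configuration();
    FilterPredicate predicate =
        FilterApi.eq(FilterApi.binaryColumn("id"), Binary.fromString("0"));
    ParquetInputFormat.setFilterPredicate(conf, predicate); // pushed down to the Parquet reader

    return pipeline.apply(
        ParquetIO.read(schema)
            .from("/tmp/parquet-input/*.parquet") // hypothetical file pattern
            .withConfiguration(conf));            // pass the Hadoop Configuration to the reader
  }
}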
+log4j.appender.testlogger=org.apache.log4j.ConsoleAppender +log4j.appender.testlogger.target = System.err +log4j.appender.testlogger.layout=org.apache.log4j.PatternLayout +log4j.appender.testlogger.layout.ConversionPattern=%d [%t] %-5p %c %x - %m%n + +# Beam SDK logging +log4j.logger.org.apache.beam=INFO +#log4j.logger.org.apache.beam.sdk.io.parquet.ParquetIO=DEBUG diff --git a/sdks/java/io/rabbitmq/build.gradle b/sdks/java/io/rabbitmq/build.gradle index 2256bc3726ab..2b8008437e33 100644 --- a/sdks/java/io/rabbitmq/build.gradle +++ b/sdks/java/io/rabbitmq/build.gradle @@ -35,8 +35,6 @@ dependencies { testCompile "org.apache.qpid:qpid-broker-plugins-memory-store:$qpid_version" testCompile "org.apache.qpid:qpid-broker-plugins-amqp-0-8-protocol:$qpid_version" testCompile library.java.junit - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library testCompile library.java.slf4j_api testRuntimeOnly library.java.slf4j_jdk14 testRuntimeOnly project(path: ":runners:direct-java", configuration: "shadow") diff --git a/sdks/java/io/rabbitmq/src/main/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIO.java b/sdks/java/io/rabbitmq/src/main/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIO.java index 8aed86e7b81b..73d483223ab4 100644 --- a/sdks/java/io/rabbitmq/src/main/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIO.java +++ b/sdks/java/io/rabbitmq/src/main/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIO.java @@ -35,6 +35,7 @@ import java.util.Date; import java.util.List; import java.util.NoSuchElementException; +import java.util.concurrent.ConcurrentLinkedQueue; import java.util.concurrent.TimeoutException; import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.annotations.Experimental.Kind; @@ -416,7 +417,14 @@ private static class RabbitMQCheckpointMark implements UnboundedSource.CheckpointMark, Serializable { transient Channel channel; Instant latestTimestamp = Instant.now(); - final List sessionIds = new ArrayList<>(); + transient ConcurrentLinkedQueue sessionIds = new ConcurrentLinkedQueue<>(); + + // this method is called after deserialization on the deserialized object + private Object readResolve() { + // (re-)initialize transient fields as required + this.sessionIds = new ConcurrentLinkedQueue<>(); + return this; + } /** * Advances the watermark to the provided time, provided said time is after the current @@ -432,12 +440,13 @@ public void advanceWatermark(Instant time) { @Override public void finalizeCheckpoint() throws IOException { - for (Long sessionId : sessionIds) { + Long sessionId = sessionIds.poll(); + while (sessionId != null) { channel.basicAck(sessionId, false); + sessionId = sessionIds.poll(); } channel.txCommit(); latestTimestamp = Instant.now(); - sessionIds.clear(); } } diff --git a/sdks/java/io/rabbitmq/src/test/java/org/apache/beam/sdk/io/rabbitmq/ExchangeTestPlan.java b/sdks/java/io/rabbitmq/src/test/java/org/apache/beam/sdk/io/rabbitmq/ExchangeTestPlan.java index a72cddb75423..669cc3c43f2e 100644 --- a/sdks/java/io/rabbitmq/src/test/java/org/apache/beam/sdk/io/rabbitmq/ExchangeTestPlan.java +++ b/sdks/java/io/rabbitmq/src/test/java/org/apache/beam/sdk/io/rabbitmq/ExchangeTestPlan.java @@ -32,9 +32,6 @@ * that could be used, this class has been implemented to help represent the parameters of a test * oriented around reading messages published to an exchange. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) class ExchangeTestPlan { static final String DEFAULT_ROUTING_KEY = "someRoutingKey"; diff --git a/sdks/java/io/rabbitmq/src/test/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIOTest.java b/sdks/java/io/rabbitmq/src/test/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIOTest.java index 1055c4d5a11a..c7b3c0e67975 100644 --- a/sdks/java/io/rabbitmq/src/test/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIOTest.java +++ b/sdks/java/io/rabbitmq/src/test/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIOTest.java @@ -68,9 +68,6 @@ /** Test of {@link RabbitMqIO}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class RabbitMqIOTest implements Serializable { private static final Logger LOG = LoggerFactory.getLogger(RabbitMqIOTest.class); diff --git a/sdks/java/io/redis/build.gradle b/sdks/java/io/redis/build.gradle index 611053f10df4..49809d336cbc 100644 --- a/sdks/java/io/redis/build.gradle +++ b/sdks/java/io/redis/build.gradle @@ -25,11 +25,9 @@ ext.summary ="IO to read and write on a Redis keystore." dependencies { compile library.java.vendored_guava_26_0_jre compile project(path: ":sdks:java:core", configuration: "shadow") - compile "redis.clients:jedis:3.3.0" + compile "redis.clients:jedis:3.5.2" testCompile project(path: ":sdks:java:io:common", configuration: "testRuntime") testCompile library.java.junit - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library testCompile "com.github.kstyrc:embedded-redis:0.6" testRuntimeOnly library.java.slf4j_jdk14 testRuntimeOnly project(path: ":runners:direct-java", configuration: "shadow") diff --git a/sdks/java/io/redis/src/main/java/org/apache/beam/sdk/io/redis/RedisIO.java b/sdks/java/io/redis/src/main/java/org/apache/beam/sdk/io/redis/RedisIO.java index f4f19d4be6ba..a229b2f92411 100644 --- a/sdks/java/io/redis/src/main/java/org/apache/beam/sdk/io/redis/RedisIO.java +++ b/sdks/java/io/redis/src/main/java/org/apache/beam/sdk/io/redis/RedisIO.java @@ -122,22 +122,6 @@ public static Read read() { .build(); } - /** - * Like {@link #read()} but executes multiple instances of the Redis query substituting each - * element of a {@link PCollection} as key pattern. - * - * @deprecated This method is not consistent with the readAll pattern of other transforms and will - * be remove soon. Please update you code to use {@link #readKeyPatterns()} instead. - */ - @Deprecated - public static ReadAll readAll() { - return new AutoValue_RedisIO_ReadAll.Builder() - .setConnectionConfiguration(RedisConnectionConfiguration.create()) - .setBatchSize(1000) - .setOutputParallelization(true) - .build(); - } - /** * Like {@link #read()} but executes multiple instances of the Redis query substituting each * element of a {@link PCollection} as key pattern. @@ -253,91 +237,6 @@ public PCollection> expand(PBegin input) { } } - /** - * Implementation of {@link #readAll()}. - * - * @deprecated This class will be removed soon. Please update you code to depend on {@link - * ReadKeyPatterns} instead. 
- */ - @Deprecated - @AutoValue - public abstract static class ReadAll - extends PTransform, PCollection>> { - - abstract @Nullable RedisConnectionConfiguration connectionConfiguration(); - - abstract int batchSize(); - - abstract boolean outputParallelization(); - - abstract Builder toBuilder(); - - @AutoValue.Builder - abstract static class Builder { - - abstract @Nullable Builder setConnectionConfiguration( - RedisConnectionConfiguration connection); - - abstract Builder setBatchSize(int batchSize); - - abstract Builder setOutputParallelization(boolean outputParallelization); - - abstract ReadAll build(); - } - - public ReadAll withEndpoint(String host, int port) { - checkArgument(host != null, "host can not be null"); - checkArgument(port > 0, "port can not be negative or 0"); - return toBuilder() - .setConnectionConfiguration(connectionConfiguration().withHost(host).withPort(port)) - .build(); - } - - public ReadAll withAuth(String auth) { - checkArgument(auth != null, "auth can not be null"); - return toBuilder() - .setConnectionConfiguration(connectionConfiguration().withAuth(auth)) - .build(); - } - - public ReadAll withTimeout(int timeout) { - checkArgument(timeout >= 0, "timeout can not be negative"); - return toBuilder() - .setConnectionConfiguration(connectionConfiguration().withTimeout(timeout)) - .build(); - } - - public ReadAll withConnectionConfiguration(RedisConnectionConfiguration connection) { - checkArgument(connection != null, "connection can not be null"); - return toBuilder().setConnectionConfiguration(connection).build(); - } - - public ReadAll withBatchSize(int batchSize) { - return toBuilder().setBatchSize(batchSize).build(); - } - - /** - * Whether to reshuffle the resulting PCollection so results are distributed to all workers. The - * default is to parallelize and should only be changed if this is known to be unnecessary. - */ - public ReadAll withOutputParallelization(boolean outputParallelization) { - return toBuilder().setOutputParallelization(outputParallelization).build(); - } - - @Override - public PCollection> expand(PCollection input) { - checkArgument(connectionConfiguration() != null, "withConnectionConfiguration() is required"); - PCollection> output = - input - .apply(ParDo.of(new ReadFn(connectionConfiguration(), batchSize()))) - .setCoder(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of())); - if (outputParallelization()) { - output = output.apply(new Reparallelize()); - } - return output; - } - } - /** Implementation of {@link #readKeyPatterns()}. */ @AutoValue public abstract static class ReadKeyPatterns diff --git a/sdks/java/io/redis/src/test/java/org/apache/beam/sdk/io/redis/RedisIOTest.java b/sdks/java/io/redis/src/test/java/org/apache/beam/sdk/io/redis/RedisIOTest.java index 7f6da56d49a2..badf03979c0d 100644 --- a/sdks/java/io/redis/src/test/java/org/apache/beam/sdk/io/redis/RedisIOTest.java +++ b/sdks/java/io/redis/src/test/java/org/apache/beam/sdk/io/redis/RedisIOTest.java @@ -43,9 +43,6 @@ /** Test on the Redis IO. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class RedisIOTest { private static final String REDIS_HOST = "localhost"; diff --git a/sdks/java/io/snowflake/build.gradle b/sdks/java/io/snowflake/build.gradle index a6adc23ebba8..c3c5d06578d1 100644 --- a/sdks/java/io/snowflake/build.gradle +++ b/sdks/java/io/snowflake/build.gradle @@ -28,18 +28,18 @@ dependencies { compile library.java.vendored_guava_26_0_jre compile project(path: ":sdks:java:core", configuration: "shadow") compile project(path: ":sdks:java:extensions:google-cloud-platform-core") + permitUnusedDeclared project(path: ":sdks:java:extensions:google-cloud-platform-core") compile library.java.slf4j_api compile group: 'net.snowflake', name: 'snowflake-jdbc', version: '3.12.11' compile group: 'com.opencsv', name: 'opencsv', version: '5.0' compile 'net.snowflake:snowflake-ingest-sdk:0.9.9' + compile library.java.joda_time testCompile project(path: ":sdks:java:core", configuration: "shadowTest") testCompile project(path: ":sdks:java:io:common", configuration: "testRuntime") testCompile project(path: ":sdks:java:testing:test-utils", configuration: "testRuntime") testCompile 'com.google.cloud:google-cloud-storage:1.102.0' testCompile library.java.avro testCompile library.java.junit - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library testCompile library.java.slf4j_api testRuntimeOnly library.java.hadoop_client testRuntimeOnly library.java.slf4j_jdk14 diff --git a/sdks/java/io/snowflake/expansion-service/build.gradle b/sdks/java/io/snowflake/expansion-service/build.gradle index fe442b41af47..247a622fcd32 100644 --- a/sdks/java/io/snowflake/expansion-service/build.gradle +++ b/sdks/java/io/snowflake/expansion-service/build.gradle @@ -21,7 +21,6 @@ apply plugin: 'application' mainClassName = "org.apache.beam.sdk.expansion.service.ExpansionService" applyJavaNature( - automaticModuleName: 'org.apache.beam.sdk.io.expansion.service', exportJavadoc: false, validateShadowJar: false, @@ -33,7 +32,9 @@ description = "Apache Beam :: SDKs :: Java :: IO :: Snowflake ::Expansion Servic dependencies { compile project(":sdks:java:expansion-service") + permitUnusedDeclared project(":sdks:java:expansion-service") compile project(":sdks:java:io:snowflake") + permitUnusedDeclared project(":sdks:java:io:snowflake") runtime library.java.slf4j_jdk14 } diff --git a/sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/KeyPairUtils.java b/sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/KeyPairUtils.java index 437e96a07764..ca5753cc396d 100644 --- a/sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/KeyPairUtils.java +++ b/sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/KeyPairUtils.java @@ -18,7 +18,7 @@ package org.apache.beam.sdk.io.snowflake; import java.io.IOException; -import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; import java.nio.file.Files; import java.nio.file.Paths; import java.security.InvalidKeyException; @@ -59,7 +59,7 @@ public static PrivateKey preparePrivateKey(String privateKey, String privateKeyP public static String readPrivateKeyFile(String privateKeyPath) { try { byte[] keyBytes = Files.readAllBytes(Paths.get(privateKeyPath)); - return new String(keyBytes, Charset.defaultCharset()); + return new String(keyBytes, StandardCharsets.UTF_8); } catch (IOException e) { throw new RuntimeException("Can't read private key from 
provided path"); } diff --git a/sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/SnowflakeIO.java b/sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/SnowflakeIO.java index b53f95381d5c..729f01c234d4 100644 --- a/sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/SnowflakeIO.java +++ b/sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/SnowflakeIO.java @@ -1070,7 +1070,8 @@ private PCollection writeBatch(PCollection input, ValueProvider stagingB private PCollection writeBatchFiles( PCollection input, ValueProvider outputDirectory) { - return writeFiles(input, outputDirectory, DEFAULT_BATCH_SHARDS_NUMBER); + int shards = (getShardsNumber() > 0) ? getShardsNumber() : DEFAULT_BATCH_SHARDS_NUMBER; + return writeFiles(input, outputDirectory, shards); } private PCollection writeFiles( diff --git a/sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/crosslanguage/ReadBuilder.java b/sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/crosslanguage/ReadBuilder.java index ad275ed1b58a..47bd7ff06df0 100644 --- a/sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/crosslanguage/ReadBuilder.java +++ b/sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/crosslanguage/ReadBuilder.java @@ -18,7 +18,7 @@ package org.apache.beam.sdk.io.snowflake.crosslanguage; import java.io.Serializable; -import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; import org.apache.beam.sdk.annotations.Experimental; import org.apache.beam.sdk.annotations.Experimental.Kind; import org.apache.beam.sdk.coders.ByteArrayCoder; @@ -59,7 +59,7 @@ public static SnowflakeIO.CsvMapper getCsvMapper() { return parts -> { String partsCSV = String.join(",", parts); - return partsCSV.getBytes(Charset.defaultCharset()); + return partsCSV.getBytes(StandardCharsets.UTF_8); }; } } diff --git a/sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/services/SnowflakeBatchServiceImpl.java b/sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/services/SnowflakeBatchServiceImpl.java index bee0daa8c637..2927829e7b6c 100644 --- a/sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/services/SnowflakeBatchServiceImpl.java +++ b/sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/services/SnowflakeBatchServiceImpl.java @@ -20,7 +20,7 @@ import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; import java.math.BigInteger; -import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; import java.sql.Connection; import java.sql.PreparedStatement; import java.sql.ResultSet; @@ -98,7 +98,7 @@ private String copyIntoStage(SnowflakeBatchServiceConfig config) throws SQLExcep } private String getASCIICharRepresentation(String input) { - return String.format("0x%x", new BigInteger(1, input.getBytes(Charset.defaultCharset()))); + return String.format("0x%x", new BigInteger(1, input.getBytes(StandardCharsets.UTF_8))); } /** diff --git a/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/BatchSnowflakeIOIT.java b/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/BatchSnowflakeIOIT.java index cf2faaf62277..8503338678c6 100644 --- a/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/BatchSnowflakeIOIT.java +++ 
b/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/BatchSnowflakeIOIT.java @@ -78,9 +78,6 @@ * -DintegrationTestRunner=dataflow * */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BatchSnowflakeIOIT { private static final String tableName = "IOIT"; diff --git a/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/FakeSnowflakeBasicDataSource.java b/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/FakeSnowflakeBasicDataSource.java index 1ef2617dbc67..5fc694fb9d0c 100644 --- a/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/FakeSnowflakeBasicDataSource.java +++ b/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/FakeSnowflakeBasicDataSource.java @@ -41,9 +41,6 @@ /** * Fake implementation of {@link net.snowflake.client.jdbc.SnowflakeBasicDataSource} used in tests. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FakeSnowflakeBasicDataSource extends SnowflakeBasicDataSource implements Serializable { @Override public FakeConnection getConnection() throws SQLException { diff --git a/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/FakeSnowflakeBatchServiceImpl.java b/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/FakeSnowflakeBatchServiceImpl.java index b8254c018f3e..90ee4b9d3940 100644 --- a/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/FakeSnowflakeBatchServiceImpl.java +++ b/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/FakeSnowflakeBatchServiceImpl.java @@ -32,9 +32,6 @@ import org.apache.beam.sdk.io.snowflake.services.SnowflakeService; /** Fake implementation of {@link SnowflakeService} used in tests. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FakeSnowflakeBatchServiceImpl implements SnowflakeService { diff --git a/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/FakeSnowflakeDatabase.java b/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/FakeSnowflakeDatabase.java index f98e8e152a1a..32e494496bec 100644 --- a/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/FakeSnowflakeDatabase.java +++ b/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/FakeSnowflakeDatabase.java @@ -26,9 +26,6 @@ import net.snowflake.client.jdbc.SnowflakeSQLException; /** Fake implementation of Snowflake warehouse used in test code. 
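Note on the recurring `Charset.defaultCharset()` → `StandardCharsets.UTF_8` replacements in the Snowflake, Solr, and test-utility hunks in this part of the PR: decoding key files or building SQL literals with the platform-default charset makes the result depend on the JVM's `file.encoding`, whereas `StandardCharsets.UTF_8` is deterministic across machines. A minimal sketch of the pattern; the class name and path below are illustrative only and are not part of this PR:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class KeyFileReader {
  // Decode with an explicit charset so the result is identical on every
  // platform, regardless of the JVM's default file.encoding setting.
  static String read(String path) throws IOException {
    byte[] bytes = Files.readAllBytes(Paths.get(path));
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    System.out.println(read("/tmp/example_key.p8")); // hypothetical path
  }
}
```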
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FakeSnowflakeDatabase implements Serializable { private static Map> tables = new HashMap<>(); diff --git a/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/FakeSnowflakeStreamingServiceImpl.java b/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/FakeSnowflakeStreamingServiceImpl.java index 8df12afa06b7..8dc4694e72ac 100644 --- a/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/FakeSnowflakeStreamingServiceImpl.java +++ b/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/FakeSnowflakeStreamingServiceImpl.java @@ -21,7 +21,7 @@ import java.io.FileInputStream; import java.io.IOException; import java.io.InputStreamReader; -import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; import java.util.ArrayList; import java.util.List; import java.util.zip.GZIPInputStream; @@ -29,9 +29,6 @@ import org.apache.beam.sdk.io.snowflake.services.SnowflakeStreamingServiceConfig; /** Fake implementation of {@link SnowflakeService} used in tests. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FakeSnowflakeStreamingServiceImpl implements SnowflakeService { private FakeSnowflakeIngestManager snowflakeIngestManager; @@ -61,7 +58,7 @@ private List readGZIPFile(String file) { List lines = new ArrayList<>(); try { GZIPInputStream gzip = new GZIPInputStream(new FileInputStream(file)); - BufferedReader br = new BufferedReader(new InputStreamReader(gzip, Charset.defaultCharset())); + BufferedReader br = new BufferedReader(new InputStreamReader(gzip, StandardCharsets.UTF_8)); String line; while ((line = br.readLine()) != null) { diff --git a/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/StreamingSnowflakeIOIT.java b/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/StreamingSnowflakeIOIT.java index 0311d2070163..e2a408816015 100644 --- a/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/StreamingSnowflakeIOIT.java +++ b/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/StreamingSnowflakeIOIT.java @@ -70,9 +70,6 @@ * -DintegrationTestRunner=direct * */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class StreamingSnowflakeIOIT { private static final int TIMEOUT = 900000; private static final int INTERVAL = 30000; diff --git a/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/TestUtils.java b/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/TestUtils.java index 5f094bf22203..2dbceacef9f4 100644 --- a/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/TestUtils.java +++ b/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/TestUtils.java @@ -26,7 +26,7 @@ import java.io.FileInputStream; import java.io.IOException; import java.io.InputStreamReader; -import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; import java.nio.file.Files; import java.nio.file.Path; import java.nio.file.Paths; @@ -50,7 +50,6 @@ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class TestUtils { @@ -152,7 +151,7 @@ public static List readGZIPFile(String file) { 
List lines = new ArrayList<>(); try { GZIPInputStream gzip = new GZIPInputStream(new FileInputStream(file)); - BufferedReader br = new BufferedReader(new InputStreamReader(gzip, Charset.defaultCharset())); + BufferedReader br = new BufferedReader(new InputStreamReader(gzip, StandardCharsets.UTF_8)); String line; while ((line = br.readLine()) != null) { @@ -175,7 +174,7 @@ public static String getValidPrivateKeyPath(Class c) { public static String getRawValidPrivateKey(Class c) throws IOException { byte[] keyBytes = Files.readAllBytes(Paths.get(getValidPrivateKeyPath(c))); - return new String(keyBytes, Charset.defaultCharset()); + return new String(keyBytes, StandardCharsets.UTF_8); } public static String getPrivateKeyPassphrase() { diff --git a/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/unit/DataSourceConfigurationTest.java b/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/unit/DataSourceConfigurationTest.java index eac11a5c821e..6bac967c4894 100644 --- a/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/unit/DataSourceConfigurationTest.java +++ b/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/unit/DataSourceConfigurationTest.java @@ -29,9 +29,6 @@ import org.junit.Test; /** Unit tests for {@link org.apache.beam.sdk.io.snowflake.SnowflakeIO.DataSourceConfiguration}. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class DataSourceConfigurationTest { private SnowflakeIO.DataSourceConfiguration configuration; diff --git a/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/unit/read/SnowflakeIOReadTest.java b/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/unit/read/SnowflakeIOReadTest.java index 263227b4cbb2..024ef47aac2d 100644 --- a/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/unit/read/SnowflakeIOReadTest.java +++ b/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/unit/read/SnowflakeIOReadTest.java @@ -47,7 +47,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class SnowflakeIOReadTest implements Serializable { public static final String FAKE_TABLE = "FAKE_TABLE"; diff --git a/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/unit/write/CreateDispositionTest.java b/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/unit/write/CreateDispositionTest.java index 14678d32a146..f91fe31f5fe9 100644 --- a/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/unit/write/CreateDispositionTest.java +++ b/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/unit/write/CreateDispositionTest.java @@ -50,7 +50,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class CreateDispositionTest { private static final String FAKE_TABLE = "FAKE_TABLE"; diff --git a/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/unit/write/QueryDispositionLocationTest.java b/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/unit/write/QueryDispositionLocationTest.java index 4f415e47ff1b..1aa1ad0b1b97 100644 --- 
a/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/unit/write/QueryDispositionLocationTest.java +++ b/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/unit/write/QueryDispositionLocationTest.java @@ -47,7 +47,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class QueryDispositionLocationTest { private static final String FAKE_TABLE = "FAKE_TABLE"; diff --git a/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/unit/write/SchemaDispositionTest.java b/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/unit/write/SchemaDispositionTest.java index 7acddcfa6157..d150f440f5e7 100644 --- a/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/unit/write/SchemaDispositionTest.java +++ b/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/unit/write/SchemaDispositionTest.java @@ -56,7 +56,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class SchemaDispositionTest { private static final String FAKE_TABLE = "FAKE_TABLE"; diff --git a/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/unit/write/SnowflakeIOWriteTest.java b/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/unit/write/SnowflakeIOWriteTest.java index ac9c6687c3dd..185015169d97 100644 --- a/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/unit/write/SnowflakeIOWriteTest.java +++ b/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/unit/write/SnowflakeIOWriteTest.java @@ -49,7 +49,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class SnowflakeIOWriteTest { private static final String FAKE_TABLE = "FAKE_TABLE"; diff --git a/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/unit/write/StreamingWriteTest.java b/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/unit/write/StreamingWriteTest.java index d39b4b551d88..0ee2928fbca6 100644 --- a/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/unit/write/StreamingWriteTest.java +++ b/sdks/java/io/snowflake/src/test/java/org/apache/beam/sdk/io/snowflake/test/unit/write/StreamingWriteTest.java @@ -62,7 +62,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class StreamingWriteTest { private static final Logger LOG = LoggerFactory.getLogger(StreamingWriteTest.class); diff --git a/sdks/java/io/solr/build.gradle b/sdks/java/io/solr/build.gradle index 33a9b5cb69c7..efdfde26e043 100644 --- a/sdks/java/io/solr/build.gradle +++ b/sdks/java/io/solr/build.gradle @@ -30,12 +30,11 @@ dependencies { compile library.java.vendored_guava_26_0_jre compile project(path: ":sdks:java:core", configuration: "shadow") compile library.java.commons_compress + compile library.java.joda_time + compile library.java.slf4j_api compile "org.apache.solr:solr-solrj:$solrVersion" - compileOnly 
"org.apache.httpcomponents:httpclient:4.5.6" testCompile project(path: ":sdks:java:core", configuration: "shadowTest") testCompile project(":sdks:java:io:common") - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library testCompile library.java.junit testCompile library.java.slf4j_api testCompile "org.apache.solr:solr-test-framework:$solrVersion" diff --git a/sdks/java/io/solr/src/test/java/org/apache/beam/sdk/io/solr/JavaBinCodecCoderTest.java b/sdks/java/io/solr/src/test/java/org/apache/beam/sdk/io/solr/JavaBinCodecCoderTest.java index 19c6aeadf3b3..cfa734876716 100644 --- a/sdks/java/io/solr/src/test/java/org/apache/beam/sdk/io/solr/JavaBinCodecCoderTest.java +++ b/sdks/java/io/solr/src/test/java/org/apache/beam/sdk/io/solr/JavaBinCodecCoderTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.io.solr; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import java.util.ArrayList; import java.util.List; @@ -36,9 +36,6 @@ /** Test case for {@link JavaBinCodecCoder}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class JavaBinCodecCoderTest { private static final Coder TEST_CODER = JavaBinCodecCoder.of(SolrDocument.class); private static final List TEST_VALUES = new ArrayList<>(); diff --git a/sdks/java/io/solr/src/test/java/org/apache/beam/sdk/io/solr/SolrIOTest.java b/sdks/java/io/solr/src/test/java/org/apache/beam/sdk/io/solr/SolrIOTest.java index 16c4815db4c0..6e5d982a6d0b 100644 --- a/sdks/java/io/solr/src/test/java/org/apache/beam/sdk/io/solr/SolrIOTest.java +++ b/sdks/java/io/solr/src/test/java/org/apache/beam/sdk/io/solr/SolrIOTest.java @@ -24,7 +24,7 @@ import com.carrotsearch.randomizedtesting.RandomizedRunner; import com.carrotsearch.randomizedtesting.annotations.ThreadLeakScope; import java.io.IOException; -import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; import java.util.List; import java.util.Set; import org.apache.beam.sdk.Pipeline; @@ -68,7 +68,6 @@ @RunWith(RandomizedRunner.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class SolrIOTest extends SolrCloudTestCase { private static final Logger LOG = LoggerFactory.getLogger(SolrIOTest.class); @@ -112,7 +111,7 @@ public static void beforeClass() throws Exception { ZkStateReader zkStateReader = cluster.getSolrClient().getZkStateReader(); zkStateReader .getZkClient() - .setData("/security.json", securityJson.getBytes(Charset.defaultCharset()), true); + .setData("/security.json", securityJson.getBytes(StandardCharsets.UTF_8), true); String zkAddress = cluster.getZkServer().getZkAddress(); connectionConfiguration = SolrIO.ConnectionConfiguration.create(zkAddress).withBasicCredentials("solr", password); diff --git a/sdks/java/io/splunk/build.gradle b/sdks/java/io/splunk/build.gradle index a8a31b7354d6..0a29d1e8339f 100644 --- a/sdks/java/io/splunk/build.gradle +++ b/sdks/java/io/splunk/build.gradle @@ -26,11 +26,17 @@ ext.summary = "IO to write events to Splunk Http Event Collector (HEC)" dependencies { compile platform(library.java.google_cloud_platform_libraries_bom) - compile library.java.slf4j_api + compile project(path: ":sdks:java:core", configuration: "shadow") compile library.java.google_api_client + permitUnusedDeclared library.java.google_api_client // BEAM-11761 
compile library.java.google_http_client_apache_v2 compile library.java.google_code_gson - compile project(path: ":sdks:java:core", configuration: "shadow") + compile library.java.google_http_client + compile library.java.http_client + compile library.java.http_core + compile library.java.joda_time + compile library.java.slf4j_api + compile library.java.vendored_guava_26_0_jre testCompile library.java.junit testCompile group: 'org.mock-server', name: 'mockserver-junit-rule', version: '5.10.0' testCompile group: 'org.mock-server', name: 'mockserver-client-java', version: '5.10.0' diff --git a/sdks/java/io/splunk/src/test/java/org/apache/beam/sdk/io/splunk/SplunkEventWriterTest.java b/sdks/java/io/splunk/src/test/java/org/apache/beam/sdk/io/splunk/SplunkEventWriterTest.java index 60feb1e1c5ef..e31ec0ced09a 100644 --- a/sdks/java/io/splunk/src/test/java/org/apache/beam/sdk/io/splunk/SplunkEventWriterTest.java +++ b/sdks/java/io/splunk/src/test/java/org/apache/beam/sdk/io/splunk/SplunkEventWriterTest.java @@ -45,9 +45,6 @@ import org.mockserver.verify.VerificationTimes; /** Unit tests for {@link SplunkEventWriter} class. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SplunkEventWriterTest { private static final String EXPECTED_PATH = "/" + HttpEventPublisher.HEC_URL_PATH; diff --git a/sdks/java/io/splunk/src/test/java/org/apache/beam/sdk/io/splunk/SplunkIOTest.java b/sdks/java/io/splunk/src/test/java/org/apache/beam/sdk/io/splunk/SplunkIOTest.java index 240a5d9cfcc9..f34047ff331e 100644 --- a/sdks/java/io/splunk/src/test/java/org/apache/beam/sdk/io/splunk/SplunkIOTest.java +++ b/sdks/java/io/splunk/src/test/java/org/apache/beam/sdk/io/splunk/SplunkIOTest.java @@ -37,9 +37,6 @@ import org.mockserver.verify.VerificationTimes; /** Unit tests for {@link SplunkIO} class. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SplunkIOTest { private static final String EXPECTED_PATH = "/" + HttpEventPublisher.HEC_URL_PATH; diff --git a/sdks/java/io/synthetic/build.gradle b/sdks/java/io/synthetic/build.gradle index cb9cbf8db1d7..65d58ce54a40 100644 --- a/sdks/java/io/synthetic/build.gradle +++ b/sdks/java/io/synthetic/build.gradle @@ -31,11 +31,11 @@ dependencies { compile library.java.jackson_core compile library.java.jackson_annotations compile library.java.jackson_databind + compile library.java.slf4j_api + compile library.java.vendored_guava_26_0_jre compile project(path: ":sdks:java:core", configuration: "shadow") testCompile library.java.vendored_guava_26_0_jre testCompile library.java.junit - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library testRuntimeOnly project(path: ":runners:direct-java", configuration: "shadow") } diff --git a/sdks/java/io/synthetic/src/test/java/org/apache/beam/sdk/io/synthetic/BundleSplitterTest.java b/sdks/java/io/synthetic/src/test/java/org/apache/beam/sdk/io/synthetic/BundleSplitterTest.java index 9042cfb997ee..8c7ec44712b0 100644 --- a/sdks/java/io/synthetic/src/test/java/org/apache/beam/sdk/io/synthetic/BundleSplitterTest.java +++ b/sdks/java/io/synthetic/src/test/java/org/apache/beam/sdk/io/synthetic/BundleSplitterTest.java @@ -32,7 +32,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class BundleSplitterTest { diff --git a/sdks/java/io/synthetic/src/test/java/org/apache/beam/sdk/io/synthetic/SyntheticBoundedSourceTest.java b/sdks/java/io/synthetic/src/test/java/org/apache/beam/sdk/io/synthetic/SyntheticBoundedSourceTest.java index 478ce6184f27..c6c8df4d8a50 100644 --- a/sdks/java/io/synthetic/src/test/java/org/apache/beam/sdk/io/synthetic/SyntheticBoundedSourceTest.java +++ b/sdks/java/io/synthetic/src/test/java/org/apache/beam/sdk/io/synthetic/SyntheticBoundedSourceTest.java @@ -43,9 +43,6 @@ /** Unit tests for {@link SyntheticBoundedSource}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SyntheticBoundedSourceTest { @Rule public final ExpectedException thrown = ExpectedException.none(); diff --git a/sdks/java/io/synthetic/src/test/java/org/apache/beam/sdk/io/synthetic/SyntheticUnboundedSourceTest.java b/sdks/java/io/synthetic/src/test/java/org/apache/beam/sdk/io/synthetic/SyntheticUnboundedSourceTest.java index 8cba0be2ef65..8dfaefaad0ef 100644 --- a/sdks/java/io/synthetic/src/test/java/org/apache/beam/sdk/io/synthetic/SyntheticUnboundedSourceTest.java +++ b/sdks/java/io/synthetic/src/test/java/org/apache/beam/sdk/io/synthetic/SyntheticUnboundedSourceTest.java @@ -33,9 +33,6 @@ /** Unit tests for {@link SyntheticUnboundedSource}. 
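Note on the many hunks in this PR that delete `@SuppressWarnings({"nullness"})` (the BEAM-10402 cleanup): the evident intent is that these classes now pass the Checker Framework nullness analysis without a blanket suppression. The sketch below is illustrative only (the class and fields are hypothetical) and shows the kind of code the checker accepts once the suppression is gone:

```java
import org.checkerframework.checker.nullness.qual.Nullable;

class LazilyInitializedExample {
  // A field populated after construction must be declared @Nullable (or proven
  // non-null another way), otherwise the nullness checker rejects the class.
  private @Nullable String resource;

  void setUp() {
    resource = "connected";
  }

  int resourceLength() {
    // The checker requires a null guard before dereferencing a @Nullable field.
    if (resource == null) {
      throw new IllegalStateException("setUp() was not called");
    }
    return resource.length();
  }
}
```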
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class SyntheticUnboundedSourceTest { private SyntheticSourceOptions sourceOptions; diff --git a/sdks/java/io/thrift/build.gradle b/sdks/java/io/thrift/build.gradle index 90bec04697b8..e012c7ee8a61 100644 --- a/sdks/java/io/thrift/build.gradle +++ b/sdks/java/io/thrift/build.gradle @@ -20,7 +20,16 @@ plugins { id 'org.apache.beam.module' // id "org.jruyi.thrift" version "0.4.1" } -applyJavaNature(automaticModuleName: 'org.apache.beam.sdk.io.thrift') +applyJavaNature( + automaticModuleName: 'org.apache.beam.sdk.io.thrift', + generatedClassPatterns: [ + /^org\.apache\.beam\.sdk\.io\.thrift\.payloads.*/, + /^org\.apache\.beam\.sdk\.io\.thrift\.TestThriftStruct/, + /^org\.apache\.beam\.sdk\.io\.thrift\.TestThriftInnerStruct/, + /^org\.apache\.beam\.sdk\.io\.thrift\.TestThriftEnum/, + /^org\.apache\.beam\.sdk\.io\.thrift\.TestThriftUnion/, + ], +) description = "Apache Beam :: SDKs :: Java :: IO :: Thrift" ext.summary = "IO to read and write Thrift encoded data files" @@ -28,14 +37,9 @@ ext.summary = "IO to read and write Thrift encoded data files" dependencies { compile library.java.vendored_guava_26_0_jre compile library.java.slf4j_api - compile "org.apache.thrift:libthrift:0.13.0" + compile "org.apache.thrift:libthrift:0.14.1" compile project(path: ":sdks:java:core", configuration: "shadow") testCompile library.java.junit - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library - testCompile library.java.jackson_databind - testCompile "com.google.code.gson:gson:2.8.6" - testCompile "commons-io:commons-io:2.6" testRuntimeOnly library.java.slf4j_jdk14 testRuntimeOnly project(path: ":runners:direct-java", configuration: "shadow") } diff --git a/sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java b/sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java index 242d58137230..1a3f25fd2fc8 100644 --- a/sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java +++ b/sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java @@ -17,7 +17,6 @@ */ package org.apache.beam.sdk.io.thrift; -import java.io.ByteArrayOutputStream; import java.io.IOException; import java.io.InputStream; import java.io.OutputStream; @@ -27,6 +26,7 @@ import org.apache.thrift.protocol.TProtocol; import org.apache.thrift.protocol.TProtocolFactory; import org.apache.thrift.transport.TIOStreamTransport; +import org.apache.thrift.transport.TTransportException; /** * A {@link org.apache.beam.sdk.coders.Coder} using a Thrift {@link TProtocol} to @@ -37,7 +37,7 @@ @SuppressWarnings({ "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) -class ThriftCoder extends CustomCoder { +public class ThriftCoder extends CustomCoder { private final Class type; private final TProtocolFactory protocolFactory; @@ -57,7 +57,7 @@ protected ThriftCoder(Class type, TProtocolFactory protocolFactory) { * @return ThriftCoder initialize with class to be encoded/decoded and {@link TProtocolFactory} * used to encode/decode. */ - static ThriftCoder of(Class clazz, TProtocolFactory protocolFactory) { + public static ThriftCoder of(Class clazz, TProtocolFactory protocolFactory) { return new ThriftCoder<>(clazz, protocolFactory); } @@ -68,19 +68,18 @@ static ThriftCoder of(Class clazz, TProtocolFactory protocolFactory) { * @param value {@link org.apache.thrift.TBase} to encode. 
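Note ahead of the `ThriftCoder.encode` hunk that follows: with the libthrift upgrade to 0.14.1 (see the `sdks/java/io/thrift/build.gradle` change above), constructing a `TIOStreamTransport` can throw `TTransportException`, and the coder now writes through the transport directly to the coder's `OutputStream` instead of buffering in a `ByteArrayOutputStream`. A rough standalone sketch of that pattern, assuming a generated thrift type and an arbitrary protocol factory; this is not code from the PR:

```java
import java.io.IOException;
import java.io.OutputStream;
import org.apache.thrift.TBase;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.protocol.TProtocolFactory;
import org.apache.thrift.transport.TIOStreamTransport;
import org.apache.thrift.transport.TTransportException;

final class ThriftStreamWriter {
  // Serializes `value` straight onto `out`: no intermediate byte[] copy, and a
  // transport-construction failure (possible in libthrift 0.14+) is reported
  // separately from a serialization failure.
  static <T extends TBase<?, ?>> void write(T value, TProtocolFactory factory, OutputStream out)
      throws IOException {
    try {
      TProtocol protocol = factory.getProtocol(new TIOStreamTransport(out));
      value.write(protocol);
    } catch (TTransportException e) {
      throw new IOException("Could not open thrift transport", e);
    } catch (Exception e) {
      throw new IOException("Could not write thrift value", e);
    }
  }
}
```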
* @param outStream stream to output encoded value to. * @throws IOException if writing to the {@code OutputStream} fails for some reason - * @throws CoderException if the value could not be encoded for some reason */ @Override public void encode(T value, OutputStream outStream) throws CoderException, IOException { - ByteArrayOutputStream baos = new ByteArrayOutputStream(); - TProtocol protocol = protocolFactory.getProtocol(new TIOStreamTransport(baos)); try { + TProtocol protocol = protocolFactory.getProtocol(new TIOStreamTransport(outStream)); TBase tBase = (TBase) value; tBase.write(protocol); + } catch (TTransportException tte) { + throw new CoderException("Could not transport value. Error: " + tte.getMessage()); } catch (Exception te) { throw new CoderException("Could not write value. Error: " + te.getMessage()); } - outStream.write(baos.toByteArray()); } /** diff --git a/sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java b/sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java index 7f85605c03c4..d9e47dc3df23 100644 --- a/sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java +++ b/sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java @@ -37,12 +37,14 @@ import org.apache.beam.sdk.transforms.display.DisplayData; import org.apache.beam.sdk.values.PCollection; import org.apache.thrift.TBase; +import org.apache.thrift.TConfiguration; import org.apache.thrift.TException; import org.apache.thrift.protocol.TProtocol; import org.apache.thrift.protocol.TProtocolFactory; import org.apache.thrift.protocol.TSimpleJSONProtocol; import org.apache.thrift.transport.AutoExpandingBufferReadTransport; import org.apache.thrift.transport.TIOStreamTransport; +import org.apache.thrift.transport.TTransportException; import org.checkerframework.checker.nullness.qual.Nullable; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -181,7 +183,7 @@ public void processElement(@Element FileIO.ReadableFile file, OutputReceiver TIOStreamTransport streamTransport = new TIOStreamTransport(new BufferedInputStream(inputStream)); AutoExpandingBufferReadTransport readTransport = - new AutoExpandingBufferReadTransport(262_144_000); + new AutoExpandingBufferReadTransport(new TConfiguration(), 262_144_000); readTransport.fill(streamTransport, inputStream.available()); TProtocol protocol = tProtocol.getProtocol(readTransport); while (protocol.getTransport().getBytesRemainingInBuffer() > 0) { @@ -264,10 +266,13 @@ protected static class ThriftWriter> { public void write(T element) throws IOException { ByteArrayOutputStream baos = new ByteArrayOutputStream(); - TProtocol protocol = protocolFactory.getProtocol(new TIOStreamTransport(baos)); try { + TProtocol protocol = protocolFactory.getProtocol(new TIOStreamTransport(baos)); element.write(protocol); + } catch (TTransportException tte) { + LOG.error("Error in transport to TIOStreamTransport: " + tte); + throw new RuntimeException(tte); } catch (TException te) { LOG.error("Error in writing element to TProtocol: " + te); throw new RuntimeException(te); diff --git a/sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftPayloadSerializerProvider.java b/sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftPayloadSerializerProvider.java new file mode 100644 index 000000000000..96b61715224a --- /dev/null +++ b/sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftPayloadSerializerProvider.java @@ -0,0 +1,107 @@ +/* + * Licensed to the 
Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.thrift; + +import static org.apache.beam.sdk.schemas.transforms.Cast.castRow; +import static org.apache.beam.sdk.util.Preconditions.checkArgumentNotNull; + +import com.google.auto.service.AutoService; +import java.util.Map; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; +import org.apache.beam.sdk.annotations.Internal; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.schemas.RowMessages; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.io.payloads.PayloadSerializer; +import org.apache.beam.sdk.schemas.io.payloads.PayloadSerializerProvider; +import org.apache.beam.sdk.transforms.SimpleFunction; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.sdk.values.TypeDescriptor; +import org.apache.thrift.TBase; +import org.apache.thrift.protocol.TProtocolFactory; + +@Internal +@Experimental(Kind.SCHEMAS) +@SuppressWarnings("rawtypes") +@AutoService(PayloadSerializerProvider.class) +public class ThriftPayloadSerializerProvider implements PayloadSerializerProvider { + @Override + public String identifier() { + return "thrift"; + } + + private static Class getMessageClass(Map tableParams) { + String thriftClassName = checkArgumentNotNull(tableParams.get("thriftClass")).toString(); + try { + Class thriftClass = Class.forName(thriftClassName); + return thriftClass.asSubclass(TBase.class); + } catch (ClassNotFoundException e) { + throw new IllegalArgumentException("Incorrect thrift class provided: " + thriftClassName, e); + } + } + + private static TProtocolFactory getProtocolFactory(Map tableParams) { + String thriftFactoryClassName = + checkArgumentNotNull(tableParams.get("thriftProtocolFactoryClass")).toString(); + try { + Class thriftClass = Class.forName(thriftFactoryClassName); + return thriftClass.asSubclass(TProtocolFactory.class).getDeclaredConstructor().newInstance(); + } catch (ReflectiveOperationException e) { + throw new IllegalArgumentException( + "Incorrect thrift protocol factory class provided: " + thriftFactoryClassName, e); + } + } + + private static void inferAndVerifySchema(Class thriftClass, Schema requiredSchema) { + TypeDescriptor typeDescriptor = TypeDescriptor.of(thriftClass); + Schema schema = checkArgumentNotNull(ThriftSchema.provider().schemaFor(typeDescriptor)); + if (!schema.assignableTo(requiredSchema)) { + throw new IllegalArgumentException( + String.format( + "Given message schema: '%s'%n" + + "does not match schema inferred from thrift class.%n" + + "Thrift class: '%s'%n" + + "Inferred schema: '%s'", + requiredSchema, thriftClass.getName(), schema)); + } + } + + /** A helper needed to fix the type `T` of thriftClass to satisfy 
RowMessages constraints. */ + private static PayloadSerializer getPayloadSerializer( + Schema schema, TProtocolFactory protocolFactory, Class thriftClass) { + Coder coder = ThriftCoder.of(thriftClass, protocolFactory); + TypeDescriptor descriptor = TypeDescriptor.of(thriftClass); + SimpleFunction toRowFn = + RowMessages.bytesToRowFn(ThriftSchema.provider(), descriptor, coder); + return PayloadSerializer.of( + RowMessages.rowToBytesFn(ThriftSchema.provider(), descriptor, coder), + bytes -> { + Row rawRow = toRowFn.apply(bytes); + return castRow(rawRow, rawRow.getSchema(), schema); + }); + } + + @Override + public PayloadSerializer getSerializer(Schema schema, Map tableParams) { + Class thriftClass = getMessageClass(tableParams); + TProtocolFactory protocolFactory = getProtocolFactory(tableParams); + inferAndVerifySchema(thriftClass, schema); + return getPayloadSerializer(schema, protocolFactory, thriftClass); + } +} diff --git a/sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftSchema.java b/sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftSchema.java new file mode 100644 index 000000000000..0bf47c671d35 --- /dev/null +++ b/sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftSchema.java @@ -0,0 +1,408 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
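Note on the `ThriftPayloadSerializerProvider` added above (placed here, before the new `ThriftSchema` file continues): it is registered via `@AutoService` under the identifier `"thrift"` and resolves the generated class and protocol factory from the `thriftClass` and `thriftProtocolFactoryClass` table parameters. A hedged usage sketch; `com.example.MyRecord`, the schema, and the row are hypothetical, and the parameter names are taken from the provider code above:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.beam.sdk.io.thrift.ThriftPayloadSerializerProvider;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.schemas.io.payloads.PayloadSerializer;
import org.apache.beam.sdk.values.Row;

final class ThriftPayloadSerializerExample {
  static byte[] toBytes(Schema schema, Row row) {
    Map<String, Object> params = new HashMap<>();
    params.put("thriftClass", "com.example.MyRecord"); // hypothetical generated thrift class
    // Inner factory class, hence the '$' in the binary name resolved via Class.forName.
    params.put("thriftProtocolFactoryClass", "org.apache.thrift.protocol.TBinaryProtocol$Factory");

    PayloadSerializer serializer =
        new ThriftPayloadSerializerProvider().getSerializer(schema, params);
    return serializer.serialize(row); // serializer.deserialize(bytes) goes the other way
  }
}
```

As the provider code shows, the supplied schema must be compatible with the one `ThriftSchema` infers from the generated class; for container typedefs whose element types are not preserved in the thrift metadata, `ThriftSchema.custom()` allows registering the corresponding beam type by typedef name.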
+ */ +package org.apache.beam.sdk.io.thrift; + +import static java.util.Collections.unmodifiableMap; + +import java.lang.reflect.Method; +import java.lang.reflect.Modifier; +import java.util.Arrays; +import java.util.Collections; +import java.util.HashMap; +import java.util.Iterator; +import java.util.LinkedHashMap; +import java.util.List; +import java.util.Map; +import java.util.Map.Entry; +import java.util.function.Function; +import java.util.stream.Collectors; +import java.util.stream.Stream; +import java.util.stream.StreamSupport; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.schemas.FieldValueGetter; +import org.apache.beam.sdk.schemas.FieldValueTypeInformation; +import org.apache.beam.sdk.schemas.GetterBasedSchemaProvider; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.Builder; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.schemas.SchemaProvider; +import org.apache.beam.sdk.schemas.SchemaUserTypeCreator; +import org.apache.beam.sdk.schemas.logicaltypes.EnumerationType; +import org.apache.beam.sdk.schemas.logicaltypes.OneOfType; +import org.apache.beam.sdk.values.TypeDescriptor; +import org.apache.thrift.TBase; +import org.apache.thrift.TEnum; +import org.apache.thrift.TFieldIdEnum; +import org.apache.thrift.TFieldRequirementType; +import org.apache.thrift.TUnion; +import org.apache.thrift.meta_data.EnumMetaData; +import org.apache.thrift.meta_data.FieldMetaData; +import org.apache.thrift.meta_data.FieldValueMetaData; +import org.apache.thrift.meta_data.ListMetaData; +import org.apache.thrift.meta_data.MapMetaData; +import org.apache.thrift.meta_data.SetMetaData; +import org.apache.thrift.meta_data.StructMetaData; +import org.apache.thrift.protocol.TType; +import org.checkerframework.checker.nullness.qual.NonNull; +import org.checkerframework.checker.nullness.qual.Nullable; + +/** + * Schema provider for generated thrift types. + * + *

+ * <ul>
+ *   <li>Primitive type mapping is straight-forward (e.g. {@link TType#I32} -> {@link
+ *       FieldType#INT32}).
+ *   <li>{@link TType#STRING} gets mapped as either {@link FieldType#STRING} or {@link
+ *       FieldType#BYTES}, depending on whether the {@link FieldValueMetaData#isBinary()} flag is
+ *       set.
+ *   <li>{@link TType#MAP} becomes {@link FieldType#map(FieldType, FieldType) a beam map} passing
+ *       the key and value types recursively.
+ *   <li>{@link TType#SET} gets translated into a beam {@link FieldType#iterable(FieldType)
+ *       iterable}, passing the corresponding element type.
+ *   <li>{@link TType#LIST} becomes an {@link FieldType#array(FieldType) array} of the
+ *       corresponding element type.
+ *   <li>{@link TType#ENUM thrift enums} are converted into {@link EnumerationType beam
+ *       enumeration types}.
+ *   <li>{@link TUnion thrift union} types get mapped to {@link OneOfType beam one-of} types.
+ * </ul>
+ *
+ * <p>The mapping logic relies on the available {@link FieldMetaData thrift metadata} introspection
+ * and tries to make as few assumptions about the generated code as possible (i.e. does not rely on
+ * accessor naming convention, as the thrift compiler supports options such as "beans" or
+ * "fullcamel"/"nocamel".
+ *
+ * <p>However, the following strong assumptions are made by this class:
+ *
+ * <ul>
+ *   <li>All thrift generated classes implement {@link TBase}, except for enums which become {@link
+ *       Enum java enums} implementing {@link TEnum}.
+ *   <li>All {@link TUnion} types provide static factory methods for each of the supported field
+ *       types, with the same name as the field itself and only one such method taking a single
+ *       parameter exists.
+ *   <li>All non-union types have a corresponding java field with the same name for every field in
+ *       the original thrift source file.
+ * </ul>
+ *
+ * <p>Thrift typedefs for container types (and possibly others) do not preserve the full type
+ * information. For this reason, this class allows for {@link #custom() manual registration} of such
+ * "lossy" typedefs with their corresponding beam types.
+ *
+ * <p>
    Note: Thrift encoding and decoding are not fully symmetrical, i.e. the {@link + * TBase#isSet(TFieldIdEnum) isSet} flag may not be preserved upon converting a thrift object to a + * beam row and back. On encoding, we extract all thrift values, no matter if the fields are set or + * not. On decoding, we set all non-{@code null} beam row values to the corresponding thrift fields, + * leaving the rest unset. + */ +@Experimental(Experimental.Kind.SCHEMAS) +public final class ThriftSchema extends GetterBasedSchemaProvider { + private static final ThriftSchema defaultProvider = new ThriftSchema(Collections.emptyMap()); + + private final Map typedefs; + + private ThriftSchema(Map typedefs) { + this.typedefs = typedefs; + } + + /** + * Schema provider that maps any thrift type to a Beam schema, assuming that any typedefs that + * might have been used in the thrift definitions will preserve all required metadata to infer the + * beam type (which is the case for any primitive typedefs and alike). + * + * @see #custom() for how to manually pass the beam type for container typedefs + */ + public static @NonNull SchemaProvider provider() { + return defaultProvider; + } + + /** + * Builds a schema provider that maps any thrift type to a Beam schema, allowing for custom thrift + * typedef entries (which cannot be resolved using the available metadata) to be manually + * registered with their corresponding beam types. + * + *

    E.g. {@code typedef set StringSet} will not carry the element type information and + * needs to be manually mapped here as {@code .custom().withTypedef("StringSet", + * FieldType.iterable(FieldType.STRING)).provider()}. + */ + public static @NonNull Customizer custom() { + return new Customizer(); + } + + public static final class Customizer { + private final Map typedefs = new HashMap<>(); + + private Customizer() {} + + public @NonNull Customizer typedef( + @NonNull String thriftTypedefName, @NonNull FieldType beamType) { + typedefs.put(thriftTypedefName, beamType); + return this; + } + + public @NonNull SchemaProvider provider() { + if (typedefs.isEmpty()) { + return defaultProvider; + } else { + return new ThriftSchema(unmodifiableMap(new HashMap<>(typedefs))); + } + } + } + + @Override + public @NonNull Schema schemaFor(TypeDescriptor typeDescriptor) { + return schemaFor(typeDescriptor.getRawType()); + } + + private Schema schemaFor(Class targetClass) { + if (!TBase.class.isAssignableFrom(targetClass)) { + throw new IllegalArgumentException("Expected thrift class but got: " + targetClass); + } + final Stream fields = + thriftFieldDescriptors(targetClass).values().stream().map(this::beamField); + if (TUnion.class.isAssignableFrom(targetClass)) { + return OneOfType.create(fields.collect(Collectors.toList())).getOneOfSchema(); + } else { + return fields + .reduce(Schema.builder(), Builder::addField, ThriftSchema::throwingCombiner) + .build(); + } + } + + private static X throwingCombiner(X lhs, X rhs) { + throw new IllegalStateException(); + } + + private Schema.Field beamField(FieldMetaData fieldDescriptor) { + try { + final FieldType type = beamType(fieldDescriptor.valueMetaData); + switch (fieldDescriptor.requirementType) { + case TFieldRequirementType.REQUIRED: + return Schema.Field.of(fieldDescriptor.fieldName, type); + case TFieldRequirementType.DEFAULT: + // aka "opt-in, req-out", so it's safest to fall through to nullable + case TFieldRequirementType.OPTIONAL: + default: + return Schema.Field.nullable(fieldDescriptor.fieldName, type); + } + } catch (Exception e) { + throw new IllegalStateException( + "Could not infer beam type for thrift field: " + fieldDescriptor.fieldName, e); + } + } + + @SuppressWarnings("rawtypes") + @Override + public @NonNull List fieldValueGetters( + @NonNull Class targetClass, @NonNull Schema schema) { + return schemaFieldDescriptors(targetClass, schema).keySet().stream() + .map(FieldExtractor::new) + .collect(Collectors.toList()); + } + + @Override + public @NonNull List fieldValueTypeInformations( + @NonNull Class targetClass, @NonNull Schema schema) { + return schemaFieldDescriptors(targetClass, schema).values().stream() + .map(descriptor -> fieldValueTypeInfo(targetClass, descriptor.fieldName)) + .collect(Collectors.toList()); + } + + @SuppressWarnings("unchecked") + private static > + Map thriftFieldDescriptors(Class targetClass) { + return (Map) FieldMetaData.getStructMetaDataMap((Class) targetClass); + } + + private FieldValueTypeInformation fieldValueTypeInfo(Class type, String fieldName) { + if (TUnion.class.isAssignableFrom(type)) { + final List factoryMethods = + Stream.of(type.getDeclaredMethods()) + .filter(m -> m.getName().equals(fieldName)) + .filter(m -> m.getModifiers() == (Modifier.PUBLIC | Modifier.STATIC)) + .filter(m -> m.getParameterCount() == 1) + .filter(m -> m.getReturnType() == type) + .collect(Collectors.toList()); + if (factoryMethods.isEmpty()) { + throw new IllegalArgumentException( + String.format( + "No suitable static 
factory method: %s.%s(...)", type.getName(), fieldName)); + } + if (factoryMethods.size() > 1) { + throw new IllegalStateException("Overloaded factory methods: " + factoryMethods); + } + return FieldValueTypeInformation.forSetter(factoryMethods.get(0), ""); + } else { + try { + return FieldValueTypeInformation.forField(type.getDeclaredField(fieldName)); + } catch (NoSuchFieldException e) { + throw new IllegalArgumentException(e); + } + } + } + + @Override + public @NonNull SchemaUserTypeCreator schemaTypeCreator( + @NonNull Class targetClass, @NonNull Schema schema) { + final Map fieldDescriptors = + schemaFieldDescriptors(targetClass, schema); + return params -> restoreThriftObject(targetClass, fieldDescriptors, params); + } + + @SuppressWarnings("nullness") + private Map schemaFieldDescriptors( + Class targetClass, Schema schema) { + final Map fieldDescriptors = thriftFieldDescriptors(targetClass); + final Map fields = + fieldDescriptors.keySet().stream() + .collect(Collectors.toMap(TFieldIdEnum::getFieldName, Function.identity())); + + return schema.getFields().stream() + .map(Schema.Field::getName) + .map(fields::get) + .collect( + Collectors.toMap( + Function.identity(), + fieldDescriptors::get, + ThriftSchema::throwingCombiner, + LinkedHashMap::new)); + } + + private > T restoreThriftObject( + Class targetClass, Map fields, Object[] params) { + if (params.length != fields.size()) { + throw new IllegalArgumentException( + String.format( + "The parameter list: %s does not match the expected fields: %s", + Arrays.toString(params), fields.keySet())); + } + try { + @SuppressWarnings("unchecked") + final T thrift = (T) targetClass.getDeclaredConstructor().newInstance(); + final Iterator> iter = fields.entrySet().iterator(); + Stream.of(params).forEach(param -> setThriftField(thrift, iter.next(), param)); + return thrift; + } catch (Exception e) { + throw new IllegalStateException(e); + } + } + + private > void setThriftField( + T thrift, Entry fieldDescriptor, Object value) { + final FieldT field = fieldDescriptor.getKey(); + final FieldMetaData descriptor = fieldDescriptor.getValue(); + if (value != null) { + final Object actualValue; + switch (descriptor.valueMetaData.type) { + case TType.SET: + actualValue = + StreamSupport.stream(((Iterable) value).spliterator(), false) + .collect(Collectors.toSet()); + break; + case TType.ENUM: + final Class enumClass = + ((EnumMetaData) descriptor.valueMetaData).enumClass; + @SuppressWarnings("nullness") // it's either "nullness" or "unsafe", apparently + final TEnum @NonNull [] enumConstants = enumClass.getEnumConstants(); + actualValue = enumConstants[(Integer) value]; + break; + default: + actualValue = value; + } + thrift.setFieldValue(field, actualValue); + } + } + + private & TEnum> FieldType beamType(FieldValueMetaData metadata) { + if (metadata.isTypedef()) { + final FieldType beamType = typedefs.get(metadata.getTypedefName()); + if (beamType != null) { + return beamType; + } + } + switch (metadata.type) { + case TType.BOOL: + return FieldType.BOOLEAN; + case TType.BYTE: + return FieldType.BYTE; + case TType.I16: + return FieldType.INT16; + case TType.I32: + return FieldType.INT32; + case TType.I64: + return FieldType.INT64; + case TType.DOUBLE: + return FieldType.DOUBLE; + case TType.STRING: + return metadata.isBinary() ? 
FieldType.BYTES : FieldType.STRING; + case TType.SET: + final FieldValueMetaData setElemMetadata = ((SetMetaData) metadata).elemMetaData; + return FieldType.iterable(beamType(setElemMetadata)); + case TType.LIST: + final FieldValueMetaData listElemMetadata = ((ListMetaData) metadata).elemMetaData; + return FieldType.array(beamType(listElemMetadata)); + case TType.MAP: + final MapMetaData mapMetadata = ((MapMetaData) metadata); + return FieldType.map( + beamType(mapMetadata.keyMetaData), beamType(mapMetadata.valueMetaData)); + case TType.STRUCT: + final StructMetaData structMetadata = ((StructMetaData) metadata); + return FieldType.row(schemaFor(structMetadata.structClass)); + case TType.ENUM: + @SuppressWarnings("unchecked") + final Class enumClass = (Class) ((EnumMetaData) metadata).enumClass; + @SuppressWarnings("nullness") // it's either "nullness" or "unsafe", apparently + final EnumT @NonNull [] enumConstants = enumClass.getEnumConstants(); + final String[] enumValues = + Stream.of(enumConstants).map(EnumT::name).toArray(String[]::new); + return FieldType.logicalType(EnumerationType.create(enumValues)); + default: + throw new IllegalArgumentException("Unsupported thrift type code: " + metadata.type); + } + } + + private static class FieldExtractor> + implements FieldValueGetter { + private final FieldT field; + + private FieldExtractor(FieldT field) { + this.field = field; + } + + @Override + public @Nullable Object get(T thrift) { + if (!(thrift instanceof TUnion) || thrift.isSet(field)) { + final Object value = thrift.getFieldValue(field); + if (value instanceof Enum) { + return ((Enum) value).ordinal(); + } else { + return value; + } + } else { + return null; + } + } + + @Override + public @NonNull String name() { + return field.getFieldName(); + } + + @Override + public String toString() { + return name(); + } + } +} diff --git a/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftEnum.java b/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftEnum.java new file mode 100644 index 000000000000..dbb434a8f8f5 --- /dev/null +++ b/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftEnum.java @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.thrift; + +@javax.annotation.Generated( + value = "Autogenerated by Thrift Compiler (0.13.0)", + date = "2020-12-12") +public enum TestThriftEnum implements org.apache.thrift.TEnum { + C1(0), + C2(1); + + private final int value; + + private TestThriftEnum(int value) { + this.value = value; + } + + /** Get the integer value of this enum value, as defined in the Thrift IDL. 
*/ + public int getValue() { + return value; + } + + /** + * Find a the enum type by its integer value, as defined in the Thrift IDL. + * + * @return null if the value is not found. + */ + @org.apache.thrift.annotation.Nullable + public static TestThriftEnum findByValue(int value) { + switch (value) { + case 0: + return C1; + case 1: + return C2; + default: + return null; + } + } +} diff --git a/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftInnerStruct.java b/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftInnerStruct.java new file mode 100644 index 000000000000..26e5ac7ba61c --- /dev/null +++ b/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftInnerStruct.java @@ -0,0 +1,526 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.thrift; + +@SuppressWarnings({"cast", "rawtypes", "serial", "unchecked", "unused"}) +@javax.annotation.Generated( + value = "Autogenerated by Thrift Compiler (0.13.0)", + date = "2020-12-12") +public class TestThriftInnerStruct + implements org.apache.thrift.TBase, + java.io.Serializable, + Cloneable, + Comparable { + private static final org.apache.thrift.protocol.TStruct STRUCT_DESC = + new org.apache.thrift.protocol.TStruct("TestThriftInnerStruct"); + + private static final org.apache.thrift.protocol.TField TEST_NAME_TYPEDEF_FIELD_DESC = + new org.apache.thrift.protocol.TField( + "testNameTypedef", org.apache.thrift.protocol.TType.STRING, (short) 1); + private static final org.apache.thrift.protocol.TField TEST_AGE_TYPEDEF_FIELD_DESC = + new org.apache.thrift.protocol.TField( + "testAgeTypedef", org.apache.thrift.protocol.TType.I16, (short) 2); + + private static final org.apache.thrift.scheme.SchemeFactory STANDARD_SCHEME_FACTORY = + new TestThriftInnerStructStandardSchemeFactory(); + private static final org.apache.thrift.scheme.SchemeFactory TUPLE_SCHEME_FACTORY = + new TestThriftInnerStructTupleSchemeFactory(); + + private @org.apache.thrift.annotation.Nullable java.lang.String testNameTypedef; // required + private short testAgeTypedef; // required + + /** + * The set of fields this struct contains, along with convenience methods for finding and + * manipulating them. + */ + public enum _Fields implements org.apache.thrift.TFieldIdEnum { + TEST_NAME_TYPEDEF((short) 1, "testNameTypedef"), + TEST_AGE_TYPEDEF((short) 2, "testAgeTypedef"); + + private static final java.util.Map byName = + new java.util.HashMap(); + + static { + for (_Fields field : java.util.EnumSet.allOf(_Fields.class)) { + byName.put(field.getFieldName(), field); + } + } + + /** Find the _Fields constant that matches fieldId, or null if its not found. 
*/ + @org.apache.thrift.annotation.Nullable + public static _Fields findByThriftId(int fieldId) { + switch (fieldId) { + case 1: // TEST_NAME_TYPEDEF + return TEST_NAME_TYPEDEF; + case 2: // TEST_AGE_TYPEDEF + return TEST_AGE_TYPEDEF; + default: + return null; + } + } + + /** Find the _Fields constant that matches fieldId, throwing an exception if it is not found. */ + public static _Fields findByThriftIdOrThrow(int fieldId) { + _Fields fields = findByThriftId(fieldId); + if (fields == null) + throw new java.lang.IllegalArgumentException("Field " + fieldId + " doesn't exist!"); + return fields; + } + + /** Find the _Fields constant that matches name, or null if its not found. */ + @org.apache.thrift.annotation.Nullable + public static _Fields findByName(java.lang.String name) { + return byName.get(name); + } + + private final short _thriftId; + private final java.lang.String _fieldName; + + _Fields(short thriftId, java.lang.String fieldName) { + _thriftId = thriftId; + _fieldName = fieldName; + } + + public short getThriftFieldId() { + return _thriftId; + } + + public java.lang.String getFieldName() { + return _fieldName; + } + } + + // isset id assignments + private static final int __TESTAGETYPEDEF_ISSET_ID = 0; + private byte __isset_bitfield = 0; + public static final java.util.Map<_Fields, org.apache.thrift.meta_data.FieldMetaData> metaDataMap; + + static { + java.util.Map<_Fields, org.apache.thrift.meta_data.FieldMetaData> tmpMap = + new java.util.EnumMap<_Fields, org.apache.thrift.meta_data.FieldMetaData>(_Fields.class); + tmpMap.put( + _Fields.TEST_NAME_TYPEDEF, + new org.apache.thrift.meta_data.FieldMetaData( + "testNameTypedef", + org.apache.thrift.TFieldRequirementType.DEFAULT, + new org.apache.thrift.meta_data.FieldValueMetaData( + org.apache.thrift.protocol.TType.STRING, "Name"))); + tmpMap.put( + _Fields.TEST_AGE_TYPEDEF, + new org.apache.thrift.meta_data.FieldMetaData( + "testAgeTypedef", + org.apache.thrift.TFieldRequirementType.DEFAULT, + new org.apache.thrift.meta_data.FieldValueMetaData( + org.apache.thrift.protocol.TType.I16, "Age"))); + metaDataMap = java.util.Collections.unmodifiableMap(tmpMap); + org.apache.thrift.meta_data.FieldMetaData.addStructMetaDataMap( + TestThriftInnerStruct.class, metaDataMap); + } + + public TestThriftInnerStruct() { + this.testNameTypedef = "kid"; + + this.testAgeTypedef = (short) 12; + } + + public TestThriftInnerStruct(java.lang.String testNameTypedef, short testAgeTypedef) { + this(); + this.testNameTypedef = testNameTypedef; + this.testAgeTypedef = testAgeTypedef; + setTestAgeTypedefIsSet(true); + } + + /** Performs a deep copy on other. 
*/ + public TestThriftInnerStruct(TestThriftInnerStruct other) { + __isset_bitfield = other.__isset_bitfield; + if (other.isSetTestNameTypedef()) { + this.testNameTypedef = other.testNameTypedef; + } + this.testAgeTypedef = other.testAgeTypedef; + } + + public TestThriftInnerStruct deepCopy() { + return new TestThriftInnerStruct(this); + } + + @Override + public void clear() { + this.testNameTypedef = "kid"; + + this.testAgeTypedef = (short) 12; + } + + @org.apache.thrift.annotation.Nullable + public java.lang.String getTestNameTypedef() { + return this.testNameTypedef; + } + + public void setTestNameTypedef( + @org.apache.thrift.annotation.Nullable java.lang.String testNameTypedef) { + this.testNameTypedef = testNameTypedef; + } + + public void unsetTestNameTypedef() { + this.testNameTypedef = null; + } + + /** + * Returns true if field testNameTypedef is set (has been assigned a value) and false otherwise + */ + public boolean isSetTestNameTypedef() { + return this.testNameTypedef != null; + } + + public void setTestNameTypedefIsSet(boolean value) { + if (!value) { + this.testNameTypedef = null; + } + } + + public short getTestAgeTypedef() { + return this.testAgeTypedef; + } + + public void setTestAgeTypedef(short testAgeTypedef) { + this.testAgeTypedef = testAgeTypedef; + setTestAgeTypedefIsSet(true); + } + + public void unsetTestAgeTypedef() { + __isset_bitfield = + org.apache.thrift.EncodingUtils.clearBit(__isset_bitfield, __TESTAGETYPEDEF_ISSET_ID); + } + + /** Returns true if field testAgeTypedef is set (has been assigned a value) and false otherwise */ + public boolean isSetTestAgeTypedef() { + return org.apache.thrift.EncodingUtils.testBit(__isset_bitfield, __TESTAGETYPEDEF_ISSET_ID); + } + + public void setTestAgeTypedefIsSet(boolean value) { + __isset_bitfield = + org.apache.thrift.EncodingUtils.setBit(__isset_bitfield, __TESTAGETYPEDEF_ISSET_ID, value); + } + + public void setFieldValue( + _Fields field, @org.apache.thrift.annotation.Nullable java.lang.Object value) { + switch (field) { + case TEST_NAME_TYPEDEF: + if (value == null) { + unsetTestNameTypedef(); + } else { + setTestNameTypedef((java.lang.String) value); + } + break; + + case TEST_AGE_TYPEDEF: + if (value == null) { + unsetTestAgeTypedef(); + } else { + setTestAgeTypedef((java.lang.Short) value); + } + break; + } + } + + @org.apache.thrift.annotation.Nullable + public java.lang.Object getFieldValue(_Fields field) { + switch (field) { + case TEST_NAME_TYPEDEF: + return getTestNameTypedef(); + + case TEST_AGE_TYPEDEF: + return getTestAgeTypedef(); + } + throw new java.lang.IllegalStateException(); + } + + /** + * Returns true if field corresponding to fieldID is set (has been assigned a value) and false + * otherwise + */ + public boolean isSet(_Fields field) { + if (field == null) { + throw new java.lang.IllegalArgumentException(); + } + + switch (field) { + case TEST_NAME_TYPEDEF: + return isSetTestNameTypedef(); + case TEST_AGE_TYPEDEF: + return isSetTestAgeTypedef(); + } + throw new java.lang.IllegalStateException(); + } + + @Override + public boolean equals(java.lang.Object that) { + if (that == null) return false; + if (that instanceof TestThriftInnerStruct) return this.equals((TestThriftInnerStruct) that); + return false; + } + + public boolean equals(TestThriftInnerStruct that) { + if (that == null) return false; + if (this == that) return true; + + boolean this_present_testNameTypedef = true && this.isSetTestNameTypedef(); + boolean that_present_testNameTypedef = true && that.isSetTestNameTypedef(); + if 
(this_present_testNameTypedef || that_present_testNameTypedef) { + if (!(this_present_testNameTypedef && that_present_testNameTypedef)) return false; + if (!this.testNameTypedef.equals(that.testNameTypedef)) return false; + } + + boolean this_present_testAgeTypedef = true; + boolean that_present_testAgeTypedef = true; + if (this_present_testAgeTypedef || that_present_testAgeTypedef) { + if (!(this_present_testAgeTypedef && that_present_testAgeTypedef)) return false; + if (this.testAgeTypedef != that.testAgeTypedef) return false; + } + + return true; + } + + @Override + public int hashCode() { + int hashCode = 1; + + hashCode = hashCode * 8191 + ((isSetTestNameTypedef()) ? 131071 : 524287); + if (isSetTestNameTypedef()) hashCode = hashCode * 8191 + testNameTypedef.hashCode(); + + hashCode = hashCode * 8191 + testAgeTypedef; + + return hashCode; + } + + @Override + public int compareTo(TestThriftInnerStruct other) { + if (!getClass().equals(other.getClass())) { + return getClass().getName().compareTo(other.getClass().getName()); + } + + int lastComparison = 0; + + lastComparison = + java.lang.Boolean.valueOf(isSetTestNameTypedef()).compareTo(other.isSetTestNameTypedef()); + if (lastComparison != 0) { + return lastComparison; + } + if (isSetTestNameTypedef()) { + lastComparison = + org.apache.thrift.TBaseHelper.compareTo(this.testNameTypedef, other.testNameTypedef); + if (lastComparison != 0) { + return lastComparison; + } + } + lastComparison = + java.lang.Boolean.valueOf(isSetTestAgeTypedef()).compareTo(other.isSetTestAgeTypedef()); + if (lastComparison != 0) { + return lastComparison; + } + if (isSetTestAgeTypedef()) { + lastComparison = + org.apache.thrift.TBaseHelper.compareTo(this.testAgeTypedef, other.testAgeTypedef); + if (lastComparison != 0) { + return lastComparison; + } + } + return 0; + } + + @org.apache.thrift.annotation.Nullable + public _Fields fieldForId(int fieldId) { + return _Fields.findByThriftId(fieldId); + } + + public void read(org.apache.thrift.protocol.TProtocol iprot) throws org.apache.thrift.TException { + scheme(iprot).read(iprot, this); + } + + public void write(org.apache.thrift.protocol.TProtocol oprot) + throws org.apache.thrift.TException { + scheme(oprot).write(oprot, this); + } + + @Override + public java.lang.String toString() { + java.lang.StringBuilder sb = new java.lang.StringBuilder("TestThriftInnerStruct("); + boolean first = true; + + sb.append("testNameTypedef:"); + if (this.testNameTypedef == null) { + sb.append("null"); + } else { + sb.append(this.testNameTypedef); + } + first = false; + if (!first) sb.append(", "); + sb.append("testAgeTypedef:"); + sb.append(this.testAgeTypedef); + first = false; + sb.append(")"); + return sb.toString(); + } + + public void validate() throws org.apache.thrift.TException { + // check for required fields + // check for sub-struct validity + } + + private void writeObject(java.io.ObjectOutputStream out) throws java.io.IOException { + try { + write( + new org.apache.thrift.protocol.TCompactProtocol( + new org.apache.thrift.transport.TIOStreamTransport(out))); + } catch (org.apache.thrift.TException te) { + throw new java.io.IOException(te); + } + } + + private void readObject(java.io.ObjectInputStream in) + throws java.io.IOException, java.lang.ClassNotFoundException { + try { + // it doesn't seem like you should have to do this, but java serialization is wacky, and + // doesn't call the default constructor. 
+ __isset_bitfield = 0; + read( + new org.apache.thrift.protocol.TCompactProtocol( + new org.apache.thrift.transport.TIOStreamTransport(in))); + } catch (org.apache.thrift.TException te) { + throw new java.io.IOException(te); + } + } + + private static class TestThriftInnerStructStandardSchemeFactory + implements org.apache.thrift.scheme.SchemeFactory { + public TestThriftInnerStructStandardScheme getScheme() { + return new TestThriftInnerStructStandardScheme(); + } + } + + private static class TestThriftInnerStructStandardScheme + extends org.apache.thrift.scheme.StandardScheme { + + public void read(org.apache.thrift.protocol.TProtocol iprot, TestThriftInnerStruct struct) + throws org.apache.thrift.TException { + org.apache.thrift.protocol.TField schemeField; + iprot.readStructBegin(); + while (true) { + schemeField = iprot.readFieldBegin(); + if (schemeField.type == org.apache.thrift.protocol.TType.STOP) { + break; + } + switch (schemeField.id) { + case 1: // TEST_NAME_TYPEDEF + if (schemeField.type == org.apache.thrift.protocol.TType.STRING) { + struct.testNameTypedef = iprot.readString(); + struct.setTestNameTypedefIsSet(true); + } else { + org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type); + } + break; + case 2: // TEST_AGE_TYPEDEF + if (schemeField.type == org.apache.thrift.protocol.TType.I16) { + struct.testAgeTypedef = iprot.readI16(); + struct.setTestAgeTypedefIsSet(true); + } else { + org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type); + } + break; + default: + org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type); + } + iprot.readFieldEnd(); + } + iprot.readStructEnd(); + struct.validate(); + } + + public void write(org.apache.thrift.protocol.TProtocol oprot, TestThriftInnerStruct struct) + throws org.apache.thrift.TException { + struct.validate(); + + oprot.writeStructBegin(STRUCT_DESC); + if (struct.testNameTypedef != null) { + oprot.writeFieldBegin(TEST_NAME_TYPEDEF_FIELD_DESC); + oprot.writeString(struct.testNameTypedef); + oprot.writeFieldEnd(); + } + oprot.writeFieldBegin(TEST_AGE_TYPEDEF_FIELD_DESC); + oprot.writeI16(struct.testAgeTypedef); + oprot.writeFieldEnd(); + oprot.writeFieldStop(); + oprot.writeStructEnd(); + } + } + + private static class TestThriftInnerStructTupleSchemeFactory + implements org.apache.thrift.scheme.SchemeFactory { + public TestThriftInnerStructTupleScheme getScheme() { + return new TestThriftInnerStructTupleScheme(); + } + } + + private static class TestThriftInnerStructTupleScheme + extends org.apache.thrift.scheme.TupleScheme { + + @Override + public void write(org.apache.thrift.protocol.TProtocol prot, TestThriftInnerStruct struct) + throws org.apache.thrift.TException { + org.apache.thrift.protocol.TTupleProtocol oprot = + (org.apache.thrift.protocol.TTupleProtocol) prot; + java.util.BitSet optionals = new java.util.BitSet(); + if (struct.isSetTestNameTypedef()) { + optionals.set(0); + } + if (struct.isSetTestAgeTypedef()) { + optionals.set(1); + } + oprot.writeBitSet(optionals, 2); + if (struct.isSetTestNameTypedef()) { + oprot.writeString(struct.testNameTypedef); + } + if (struct.isSetTestAgeTypedef()) { + oprot.writeI16(struct.testAgeTypedef); + } + } + + @Override + public void read(org.apache.thrift.protocol.TProtocol prot, TestThriftInnerStruct struct) + throws org.apache.thrift.TException { + org.apache.thrift.protocol.TTupleProtocol iprot = + (org.apache.thrift.protocol.TTupleProtocol) prot; + java.util.BitSet incoming = iprot.readBitSet(2); + if (incoming.get(0)) { + 
struct.testNameTypedef = iprot.readString(); + struct.setTestNameTypedefIsSet(true); + } + if (incoming.get(1)) { + struct.testAgeTypedef = iprot.readI16(); + struct.setTestAgeTypedefIsSet(true); + } + } + } + + private static S scheme( + org.apache.thrift.protocol.TProtocol proto) { + return (org.apache.thrift.scheme.StandardScheme.class.equals(proto.getScheme()) + ? STANDARD_SCHEME_FACTORY + : TUPLE_SCHEME_FACTORY) + .getScheme(); + } +} diff --git a/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftStruct.java b/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftStruct.java index 0e4850d9bbed..6d6487f25d78 100644 --- a/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftStruct.java +++ b/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftStruct.java @@ -17,17 +17,10 @@ */ package org.apache.beam.sdk.io.thrift; -@SuppressWarnings({ - "cast", - "rawtypes", - "serial", - "unchecked", - "unused", - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) +@SuppressWarnings({"cast", "rawtypes", "serial", "unchecked", "unused"}) @javax.annotation.Generated( value = "Autogenerated by Thrift Compiler (0.13.0)", - date = "2019-12-21") + date = "2020-12-12") public class TestThriftStruct implements org.apache.thrift.TBase, java.io.Serializable, @@ -60,21 +53,43 @@ public class TestThriftStruct private static final org.apache.thrift.protocol.TField TEST_BOOL_FIELD_DESC = new org.apache.thrift.protocol.TField( "testBool", org.apache.thrift.protocol.TType.BOOL, (short) 8); + private static final org.apache.thrift.protocol.TField TEST_LIST_FIELD_DESC = + new org.apache.thrift.protocol.TField( + "testList", org.apache.thrift.protocol.TType.LIST, (short) 9); + private static final org.apache.thrift.protocol.TField TEST_STRING_SET_TYPEDEF_FIELD_DESC = + new org.apache.thrift.protocol.TField( + "testStringSetTypedef", org.apache.thrift.protocol.TType.SET, (short) 10); + private static final org.apache.thrift.protocol.TField TEST_ENUM_FIELD_DESC = + new org.apache.thrift.protocol.TField( + "testEnum", org.apache.thrift.protocol.TType.I32, (short) 11); + private static final org.apache.thrift.protocol.TField TEST_NESTED_FIELD_DESC = + new org.apache.thrift.protocol.TField( + "testNested", org.apache.thrift.protocol.TType.STRUCT, (short) 12); + private static final org.apache.thrift.protocol.TField TEST_UNION_FIELD_DESC = + new org.apache.thrift.protocol.TField( + "testUnion", org.apache.thrift.protocol.TType.STRUCT, (short) 13); private static final org.apache.thrift.scheme.SchemeFactory STANDARD_SCHEME_FACTORY = new TestThriftStructStandardSchemeFactory(); private static final org.apache.thrift.scheme.SchemeFactory TUPLE_SCHEME_FACTORY = new TestThriftStructTupleSchemeFactory(); - public byte testByte; // required - public short testShort; // required - public int testInt; // required - public long testLong; // required - public double testDouble; // required - public @org.apache.thrift.annotation.Nullable java.util.Map + private byte testByte; // required + private short testShort; // required + private int testInt; // required + private long testLong; // required + private double testDouble; // required + private @org.apache.thrift.annotation.Nullable java.util.Map stringIntMap; // required - public @org.apache.thrift.annotation.Nullable java.nio.ByteBuffer testBinary; // required - public boolean testBool; // required + private @org.apache.thrift.annotation.Nullable java.nio.ByteBuffer 
testBinary; // required + private boolean testBool; // required + private @org.apache.thrift.annotation.Nullable java.util.List + testList; // required + private @org.apache.thrift.annotation.Nullable java.util.Set + testStringSetTypedef; // required + private @org.apache.thrift.annotation.Nullable TestThriftEnum testEnum; // required + private @org.apache.thrift.annotation.Nullable TestThriftInnerStruct testNested; // required + private @org.apache.thrift.annotation.Nullable TestThriftUnion testUnion; // required /** * The set of fields this struct contains, along with convenience methods for finding and @@ -88,7 +103,13 @@ public enum _Fields implements org.apache.thrift.TFieldIdEnum { TEST_DOUBLE((short) 5, "testDouble"), STRING_INT_MAP((short) 6, "stringIntMap"), TEST_BINARY((short) 7, "testBinary"), - TEST_BOOL((short) 8, "testBool"); + TEST_BOOL((short) 8, "testBool"), + TEST_LIST((short) 9, "testList"), + TEST_STRING_SET_TYPEDEF((short) 10, "testStringSetTypedef"), + /** @see TestThriftEnum */ + TEST_ENUM((short) 11, "testEnum"), + TEST_NESTED((short) 12, "testNested"), + TEST_UNION((short) 13, "testUnion"); private static final java.util.Map byName = new java.util.HashMap(); @@ -119,6 +140,16 @@ public static _Fields findByThriftId(int fieldId) { return TEST_BINARY; case 8: // TEST_BOOL return TEST_BOOL; + case 9: // TEST_LIST + return TEST_LIST; + case 10: // TEST_STRING_SET_TYPEDEF + return TEST_STRING_SET_TYPEDEF; + case 11: // TEST_ENUM + return TEST_ENUM; + case 12: // TEST_NESTED + return TEST_NESTED; + case 13: // TEST_UNION + return TEST_UNION; default: return null; } @@ -228,6 +259,43 @@ public java.lang.String getFieldName() { org.apache.thrift.TFieldRequirementType.DEFAULT, new org.apache.thrift.meta_data.FieldValueMetaData( org.apache.thrift.protocol.TType.BOOL))); + tmpMap.put( + _Fields.TEST_LIST, + new org.apache.thrift.meta_data.FieldMetaData( + "testList", + org.apache.thrift.TFieldRequirementType.DEFAULT, + new org.apache.thrift.meta_data.ListMetaData( + org.apache.thrift.protocol.TType.LIST, + new org.apache.thrift.meta_data.FieldValueMetaData( + org.apache.thrift.protocol.TType.I32)))); + tmpMap.put( + _Fields.TEST_STRING_SET_TYPEDEF, + new org.apache.thrift.meta_data.FieldMetaData( + "testStringSetTypedef", + org.apache.thrift.TFieldRequirementType.DEFAULT, + new org.apache.thrift.meta_data.FieldValueMetaData( + org.apache.thrift.protocol.TType.SET, "StringSet"))); + tmpMap.put( + _Fields.TEST_ENUM, + new org.apache.thrift.meta_data.FieldMetaData( + "testEnum", + org.apache.thrift.TFieldRequirementType.DEFAULT, + new org.apache.thrift.meta_data.EnumMetaData( + org.apache.thrift.protocol.TType.ENUM, TestThriftEnum.class))); + tmpMap.put( + _Fields.TEST_NESTED, + new org.apache.thrift.meta_data.FieldMetaData( + "testNested", + org.apache.thrift.TFieldRequirementType.DEFAULT, + new org.apache.thrift.meta_data.StructMetaData( + org.apache.thrift.protocol.TType.STRUCT, TestThriftInnerStruct.class))); + tmpMap.put( + _Fields.TEST_UNION, + new org.apache.thrift.meta_data.FieldMetaData( + "testUnion", + org.apache.thrift.TFieldRequirementType.DEFAULT, + new org.apache.thrift.meta_data.StructMetaData( + org.apache.thrift.protocol.TType.STRUCT, TestThriftUnion.class))); metaDataMap = java.util.Collections.unmodifiableMap(tmpMap); org.apache.thrift.meta_data.FieldMetaData.addStructMetaDataMap( TestThriftStruct.class, metaDataMap); @@ -243,7 +311,12 @@ public TestThriftStruct( double testDouble, java.util.Map stringIntMap, java.nio.ByteBuffer testBinary, - boolean 
testBool) { + boolean testBool, + java.util.List testList, + java.util.Set testStringSetTypedef, + TestThriftEnum testEnum, + TestThriftInnerStruct testNested, + TestThriftUnion testUnion) { this(); this.testByte = testByte; setTestByteIsSet(true); @@ -259,6 +332,11 @@ public TestThriftStruct( this.testBinary = org.apache.thrift.TBaseHelper.copyBinary(testBinary); this.testBool = testBool; setTestBoolIsSet(true); + this.testList = testList; + this.testStringSetTypedef = testStringSetTypedef; + this.testEnum = testEnum; + this.testNested = testNested; + this.testUnion = testUnion; } /** Performs a deep copy on other. */ @@ -278,6 +356,25 @@ public TestThriftStruct(TestThriftStruct other) { this.testBinary = org.apache.thrift.TBaseHelper.copyBinary(other.testBinary); } this.testBool = other.testBool; + if (other.isSetTestList()) { + java.util.List __this__testList = + new java.util.ArrayList(other.testList); + this.testList = __this__testList; + } + if (other.isSetTestStringSetTypedef()) { + java.util.Set __this__testStringSetTypedef = + new java.util.HashSet(other.testStringSetTypedef); + this.testStringSetTypedef = __this__testStringSetTypedef; + } + if (other.isSetTestEnum()) { + this.testEnum = other.testEnum; + } + if (other.isSetTestNested()) { + this.testNested = new TestThriftInnerStruct(other.testNested); + } + if (other.isSetTestUnion()) { + this.testUnion = new TestThriftUnion(other.testUnion); + } } public TestThriftStruct deepCopy() { @@ -300,16 +397,20 @@ public void clear() { this.testBinary = null; setTestBoolIsSet(false); this.testBool = false; + this.testList = null; + this.testStringSetTypedef = null; + this.testEnum = null; + this.testNested = null; + this.testUnion = null; } public byte getTestByte() { return this.testByte; } - public TestThriftStruct setTestByte(byte testByte) { + public void setTestByte(byte testByte) { this.testByte = testByte; setTestByteIsSet(true); - return this; } public void unsetTestByte() { @@ -331,10 +432,9 @@ public short getTestShort() { return this.testShort; } - public TestThriftStruct setTestShort(short testShort) { + public void setTestShort(short testShort) { this.testShort = testShort; setTestShortIsSet(true); - return this; } public void unsetTestShort() { @@ -356,10 +456,9 @@ public int getTestInt() { return this.testInt; } - public TestThriftStruct setTestInt(int testInt) { + public void setTestInt(int testInt) { this.testInt = testInt; setTestIntIsSet(true); - return this; } public void unsetTestInt() { @@ -381,10 +480,9 @@ public long getTestLong() { return this.testLong; } - public TestThriftStruct setTestLong(long testLong) { + public void setTestLong(long testLong) { this.testLong = testLong; setTestLongIsSet(true); - return this; } public void unsetTestLong() { @@ -406,10 +504,9 @@ public double getTestDouble() { return this.testDouble; } - public TestThriftStruct setTestDouble(double testDouble) { + public void setTestDouble(double testDouble) { this.testDouble = testDouble; setTestDoubleIsSet(true); - return this; } public void unsetTestDouble() { @@ -443,11 +540,10 @@ public java.util.Map getStringIntMap() { return this.stringIntMap; } - public TestThriftStruct setStringIntMap( + public void setStringIntMap( @org.apache.thrift.annotation.Nullable java.util.Map stringIntMap) { this.stringIntMap = stringIntMap; - return this; } public void unsetStringIntMap() { @@ -474,18 +570,15 @@ public java.nio.ByteBuffer bufferForTestBinary() { return org.apache.thrift.TBaseHelper.copyBinary(testBinary); } - public TestThriftStruct 
setTestBinary(byte[] testBinary) { + public void setTestBinary(byte[] testBinary) { this.testBinary = testBinary == null ? (java.nio.ByteBuffer) null : java.nio.ByteBuffer.wrap(testBinary.clone()); - return this; } - public TestThriftStruct setTestBinary( - @org.apache.thrift.annotation.Nullable java.nio.ByteBuffer testBinary) { + public void setTestBinary(@org.apache.thrift.annotation.Nullable java.nio.ByteBuffer testBinary) { this.testBinary = org.apache.thrift.TBaseHelper.copyBinary(testBinary); - return this; } public void unsetTestBinary() { @@ -507,10 +600,9 @@ public boolean isTestBool() { return this.testBool; } - public TestThriftStruct setTestBool(boolean testBool) { + public void setTestBool(boolean testBool) { this.testBool = testBool; setTestBoolIsSet(true); - return this; } public void unsetTestBool() { @@ -528,6 +620,166 @@ public void setTestBoolIsSet(boolean value) { org.apache.thrift.EncodingUtils.setBit(__isset_bitfield, __TESTBOOL_ISSET_ID, value); } + public int getTestListSize() { + return (this.testList == null) ? 0 : this.testList.size(); + } + + @org.apache.thrift.annotation.Nullable + public java.util.Iterator getTestListIterator() { + return (this.testList == null) ? null : this.testList.iterator(); + } + + public void addToTestList(int elem) { + if (this.testList == null) { + this.testList = new java.util.ArrayList(); + } + this.testList.add(elem); + } + + @org.apache.thrift.annotation.Nullable + public java.util.List getTestList() { + return this.testList; + } + + public void setTestList( + @org.apache.thrift.annotation.Nullable java.util.List testList) { + this.testList = testList; + } + + public void unsetTestList() { + this.testList = null; + } + + /** Returns true if field testList is set (has been assigned a value) and false otherwise */ + public boolean isSetTestList() { + return this.testList != null; + } + + public void setTestListIsSet(boolean value) { + if (!value) { + this.testList = null; + } + } + + public int getTestStringSetTypedefSize() { + return (this.testStringSetTypedef == null) ? 0 : this.testStringSetTypedef.size(); + } + + @org.apache.thrift.annotation.Nullable + public java.util.Iterator getTestStringSetTypedefIterator() { + return (this.testStringSetTypedef == null) ? 
null : this.testStringSetTypedef.iterator(); + } + + public void addToTestStringSetTypedef(java.lang.String elem) { + if (this.testStringSetTypedef == null) { + this.testStringSetTypedef = new java.util.HashSet(); + } + this.testStringSetTypedef.add(elem); + } + + @org.apache.thrift.annotation.Nullable + public java.util.Set getTestStringSetTypedef() { + return this.testStringSetTypedef; + } + + public void setTestStringSetTypedef( + @org.apache.thrift.annotation.Nullable java.util.Set testStringSetTypedef) { + this.testStringSetTypedef = testStringSetTypedef; + } + + public void unsetTestStringSetTypedef() { + this.testStringSetTypedef = null; + } + + /** + * Returns true if field testStringSetTypedef is set (has been assigned a value) and false + * otherwise + */ + public boolean isSetTestStringSetTypedef() { + return this.testStringSetTypedef != null; + } + + public void setTestStringSetTypedefIsSet(boolean value) { + if (!value) { + this.testStringSetTypedef = null; + } + } + + /** @see TestThriftEnum */ + @org.apache.thrift.annotation.Nullable + public TestThriftEnum getTestEnum() { + return this.testEnum; + } + + /** @see TestThriftEnum */ + public void setTestEnum(@org.apache.thrift.annotation.Nullable TestThriftEnum testEnum) { + this.testEnum = testEnum; + } + + public void unsetTestEnum() { + this.testEnum = null; + } + + /** Returns true if field testEnum is set (has been assigned a value) and false otherwise */ + public boolean isSetTestEnum() { + return this.testEnum != null; + } + + public void setTestEnumIsSet(boolean value) { + if (!value) { + this.testEnum = null; + } + } + + @org.apache.thrift.annotation.Nullable + public TestThriftInnerStruct getTestNested() { + return this.testNested; + } + + public void setTestNested( + @org.apache.thrift.annotation.Nullable TestThriftInnerStruct testNested) { + this.testNested = testNested; + } + + public void unsetTestNested() { + this.testNested = null; + } + + /** Returns true if field testNested is set (has been assigned a value) and false otherwise */ + public boolean isSetTestNested() { + return this.testNested != null; + } + + public void setTestNestedIsSet(boolean value) { + if (!value) { + this.testNested = null; + } + } + + @org.apache.thrift.annotation.Nullable + public TestThriftUnion getTestUnion() { + return this.testUnion; + } + + public void setTestUnion(@org.apache.thrift.annotation.Nullable TestThriftUnion testUnion) { + this.testUnion = testUnion; + } + + public void unsetTestUnion() { + this.testUnion = null; + } + + /** Returns true if field testUnion is set (has been assigned a value) and false otherwise */ + public boolean isSetTestUnion() { + return this.testUnion != null; + } + + public void setTestUnionIsSet(boolean value) { + if (!value) { + this.testUnion = null; + } + } + public void setFieldValue( _Fields field, @org.apache.thrift.annotation.Nullable java.lang.Object value) { switch (field) { @@ -598,6 +850,46 @@ public void setFieldValue( setTestBool((java.lang.Boolean) value); } break; + + case TEST_LIST: + if (value == null) { + unsetTestList(); + } else { + setTestList((java.util.List) value); + } + break; + + case TEST_STRING_SET_TYPEDEF: + if (value == null) { + unsetTestStringSetTypedef(); + } else { + setTestStringSetTypedef((java.util.Set) value); + } + break; + + case TEST_ENUM: + if (value == null) { + unsetTestEnum(); + } else { + setTestEnum((TestThriftEnum) value); + } + break; + + case TEST_NESTED: + if (value == null) { + unsetTestNested(); + } else { + 
setTestNested((TestThriftInnerStruct) value); + } + break; + + case TEST_UNION: + if (value == null) { + unsetTestUnion(); + } else { + setTestUnion((TestThriftUnion) value); + } + break; } } @@ -627,6 +919,21 @@ public java.lang.Object getFieldValue(_Fields field) { case TEST_BOOL: return isTestBool(); + + case TEST_LIST: + return getTestList(); + + case TEST_STRING_SET_TYPEDEF: + return getTestStringSetTypedef(); + + case TEST_ENUM: + return getTestEnum(); + + case TEST_NESTED: + return getTestNested(); + + case TEST_UNION: + return getTestUnion(); } throw new java.lang.IllegalStateException(); } @@ -657,6 +964,16 @@ public boolean isSet(_Fields field) { return isSetTestBinary(); case TEST_BOOL: return isSetTestBool(); + case TEST_LIST: + return isSetTestList(); + case TEST_STRING_SET_TYPEDEF: + return isSetTestStringSetTypedef(); + case TEST_ENUM: + return isSetTestEnum(); + case TEST_NESTED: + return isSetTestNested(); + case TEST_UNION: + return isSetTestUnion(); } throw new java.lang.IllegalStateException(); } @@ -728,6 +1045,41 @@ public boolean equals(TestThriftStruct that) { if (this.testBool != that.testBool) return false; } + boolean this_present_testList = true && this.isSetTestList(); + boolean that_present_testList = true && that.isSetTestList(); + if (this_present_testList || that_present_testList) { + if (!(this_present_testList && that_present_testList)) return false; + if (!this.testList.equals(that.testList)) return false; + } + + boolean this_present_testStringSetTypedef = true && this.isSetTestStringSetTypedef(); + boolean that_present_testStringSetTypedef = true && that.isSetTestStringSetTypedef(); + if (this_present_testStringSetTypedef || that_present_testStringSetTypedef) { + if (!(this_present_testStringSetTypedef && that_present_testStringSetTypedef)) return false; + if (!this.testStringSetTypedef.equals(that.testStringSetTypedef)) return false; + } + + boolean this_present_testEnum = true && this.isSetTestEnum(); + boolean that_present_testEnum = true && that.isSetTestEnum(); + if (this_present_testEnum || that_present_testEnum) { + if (!(this_present_testEnum && that_present_testEnum)) return false; + if (!this.testEnum.equals(that.testEnum)) return false; + } + + boolean this_present_testNested = true && this.isSetTestNested(); + boolean that_present_testNested = true && that.isSetTestNested(); + if (this_present_testNested || that_present_testNested) { + if (!(this_present_testNested && that_present_testNested)) return false; + if (!this.testNested.equals(that.testNested)) return false; + } + + boolean this_present_testUnion = true && this.isSetTestUnion(); + boolean that_present_testUnion = true && that.isSetTestUnion(); + if (this_present_testUnion || that_present_testUnion) { + if (!(this_present_testUnion && that_present_testUnion)) return false; + if (!this.testUnion.equals(that.testUnion)) return false; + } + return true; } @@ -753,6 +1105,21 @@ public int hashCode() { hashCode = hashCode * 8191 + ((testBool) ? 131071 : 524287); + hashCode = hashCode * 8191 + ((isSetTestList()) ? 131071 : 524287); + if (isSetTestList()) hashCode = hashCode * 8191 + testList.hashCode(); + + hashCode = hashCode * 8191 + ((isSetTestStringSetTypedef()) ? 131071 : 524287); + if (isSetTestStringSetTypedef()) hashCode = hashCode * 8191 + testStringSetTypedef.hashCode(); + + hashCode = hashCode * 8191 + ((isSetTestEnum()) ? 131071 : 524287); + if (isSetTestEnum()) hashCode = hashCode * 8191 + testEnum.getValue(); + + hashCode = hashCode * 8191 + ((isSetTestNested()) ? 
131071 : 524287); + if (isSetTestNested()) hashCode = hashCode * 8191 + testNested.hashCode(); + + hashCode = hashCode * 8191 + ((isSetTestUnion()) ? 131071 : 524287); + if (isSetTestUnion()) hashCode = hashCode * 8191 + testUnion.hashCode(); + return hashCode; } @@ -848,6 +1215,61 @@ public int compareTo(TestThriftStruct other) { return lastComparison; } } + lastComparison = java.lang.Boolean.valueOf(isSetTestList()).compareTo(other.isSetTestList()); + if (lastComparison != 0) { + return lastComparison; + } + if (isSetTestList()) { + lastComparison = org.apache.thrift.TBaseHelper.compareTo(this.testList, other.testList); + if (lastComparison != 0) { + return lastComparison; + } + } + lastComparison = + java.lang.Boolean.valueOf(isSetTestStringSetTypedef()) + .compareTo(other.isSetTestStringSetTypedef()); + if (lastComparison != 0) { + return lastComparison; + } + if (isSetTestStringSetTypedef()) { + lastComparison = + org.apache.thrift.TBaseHelper.compareTo( + this.testStringSetTypedef, other.testStringSetTypedef); + if (lastComparison != 0) { + return lastComparison; + } + } + lastComparison = java.lang.Boolean.valueOf(isSetTestEnum()).compareTo(other.isSetTestEnum()); + if (lastComparison != 0) { + return lastComparison; + } + if (isSetTestEnum()) { + lastComparison = org.apache.thrift.TBaseHelper.compareTo(this.testEnum, other.testEnum); + if (lastComparison != 0) { + return lastComparison; + } + } + lastComparison = + java.lang.Boolean.valueOf(isSetTestNested()).compareTo(other.isSetTestNested()); + if (lastComparison != 0) { + return lastComparison; + } + if (isSetTestNested()) { + lastComparison = org.apache.thrift.TBaseHelper.compareTo(this.testNested, other.testNested); + if (lastComparison != 0) { + return lastComparison; + } + } + lastComparison = java.lang.Boolean.valueOf(isSetTestUnion()).compareTo(other.isSetTestUnion()); + if (lastComparison != 0) { + return lastComparison; + } + if (isSetTestUnion()) { + lastComparison = org.apache.thrift.TBaseHelper.compareTo(this.testUnion, other.testUnion); + if (lastComparison != 0) { + return lastComparison; + } + } return 0; } @@ -909,6 +1331,46 @@ public java.lang.String toString() { sb.append("testBool:"); sb.append(this.testBool); first = false; + if (!first) sb.append(", "); + sb.append("testList:"); + if (this.testList == null) { + sb.append("null"); + } else { + sb.append(this.testList); + } + first = false; + if (!first) sb.append(", "); + sb.append("testStringSetTypedef:"); + if (this.testStringSetTypedef == null) { + sb.append("null"); + } else { + sb.append(this.testStringSetTypedef); + } + first = false; + if (!first) sb.append(", "); + sb.append("testEnum:"); + if (this.testEnum == null) { + sb.append("null"); + } else { + sb.append(this.testEnum); + } + first = false; + if (!first) sb.append(", "); + sb.append("testNested:"); + if (this.testNested == null) { + sb.append("null"); + } else { + sb.append(this.testNested); + } + first = false; + if (!first) sb.append(", "); + sb.append("testUnion:"); + if (this.testUnion == null) { + sb.append("null"); + } else { + sb.append(this.testUnion); + } + first = false; sb.append(")"); return sb.toString(); } @@ -916,6 +1378,9 @@ public java.lang.String toString() { public void validate() throws org.apache.thrift.TException { // check for required fields // check for sub-struct validity + if (testNested != null) { + testNested.validate(); + } } private void writeObject(java.io.ObjectOutputStream out) throws java.io.IOException { @@ -1038,14 +1503,74 @@ public void 
read(org.apache.thrift.protocol.TProtocol iprot, TestThriftStruct st org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type); } break; + case 9: // TEST_LIST + if (schemeField.type == org.apache.thrift.protocol.TType.LIST) { + { + org.apache.thrift.protocol.TList _list4 = iprot.readListBegin(); + struct.testList = new java.util.ArrayList(_list4.size); + int _elem5; + for (int _i6 = 0; _i6 < _list4.size; ++_i6) { + _elem5 = iprot.readI32(); + struct.testList.add(_elem5); + } + iprot.readListEnd(); + } + struct.setTestListIsSet(true); + } else { + org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type); + } + break; + case 10: // TEST_STRING_SET_TYPEDEF + if (schemeField.type == org.apache.thrift.protocol.TType.SET) { + { + org.apache.thrift.protocol.TSet _set7 = iprot.readSetBegin(); + struct.testStringSetTypedef = + new java.util.HashSet(2 * _set7.size); + @org.apache.thrift.annotation.Nullable java.lang.String _elem8; + for (int _i9 = 0; _i9 < _set7.size; ++_i9) { + _elem8 = iprot.readString(); + struct.testStringSetTypedef.add(_elem8); + } + iprot.readSetEnd(); + } + struct.setTestStringSetTypedefIsSet(true); + } else { + org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type); + } + break; + case 11: // TEST_ENUM + if (schemeField.type == org.apache.thrift.protocol.TType.I32) { + struct.testEnum = + org.apache.beam.sdk.io.thrift.TestThriftEnum.findByValue(iprot.readI32()); + struct.setTestEnumIsSet(true); + } else { + org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type); + } + break; + case 12: // TEST_NESTED + if (schemeField.type == org.apache.thrift.protocol.TType.STRUCT) { + struct.testNested = new TestThriftInnerStruct(); + struct.testNested.read(iprot); + struct.setTestNestedIsSet(true); + } else { + org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type); + } + break; + case 13: // TEST_UNION + if (schemeField.type == org.apache.thrift.protocol.TType.STRUCT) { + struct.testUnion = new TestThriftUnion(); + struct.testUnion.read(iprot); + struct.setTestUnionIsSet(true); + } else { + org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type); + } + break; default: org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type); } iprot.readFieldEnd(); } iprot.readStructEnd(); - - // check for required fields of primitive type, which can't be checked in the validate method struct.validate(); } @@ -1077,10 +1602,10 @@ public void write(org.apache.thrift.protocol.TProtocol oprot, TestThriftStruct s org.apache.thrift.protocol.TType.STRING, org.apache.thrift.protocol.TType.I16, struct.stringIntMap.size())); - for (java.util.Map.Entry _iter4 : + for (java.util.Map.Entry _iter10 : struct.stringIntMap.entrySet()) { - oprot.writeString(_iter4.getKey()); - oprot.writeI16(_iter4.getValue()); + oprot.writeString(_iter10.getKey()); + oprot.writeI16(_iter10.getValue()); } oprot.writeMapEnd(); } @@ -1094,6 +1619,47 @@ public void write(org.apache.thrift.protocol.TProtocol oprot, TestThriftStruct s oprot.writeFieldBegin(TEST_BOOL_FIELD_DESC); oprot.writeBool(struct.testBool); oprot.writeFieldEnd(); + if (struct.testList != null) { + oprot.writeFieldBegin(TEST_LIST_FIELD_DESC); + { + oprot.writeListBegin( + new org.apache.thrift.protocol.TList( + org.apache.thrift.protocol.TType.I32, struct.testList.size())); + for (int _iter11 : struct.testList) { + oprot.writeI32(_iter11); + } + oprot.writeListEnd(); + } + oprot.writeFieldEnd(); + } + if (struct.testStringSetTypedef != null) { + 
oprot.writeFieldBegin(TEST_STRING_SET_TYPEDEF_FIELD_DESC); + { + oprot.writeSetBegin( + new org.apache.thrift.protocol.TSet( + org.apache.thrift.protocol.TType.STRING, struct.testStringSetTypedef.size())); + for (java.lang.String _iter12 : struct.testStringSetTypedef) { + oprot.writeString(_iter12); + } + oprot.writeSetEnd(); + } + oprot.writeFieldEnd(); + } + if (struct.testEnum != null) { + oprot.writeFieldBegin(TEST_ENUM_FIELD_DESC); + oprot.writeI32(struct.testEnum.getValue()); + oprot.writeFieldEnd(); + } + if (struct.testNested != null) { + oprot.writeFieldBegin(TEST_NESTED_FIELD_DESC); + struct.testNested.write(oprot); + oprot.writeFieldEnd(); + } + if (struct.testUnion != null) { + oprot.writeFieldBegin(TEST_UNION_FIELD_DESC); + struct.testUnion.write(oprot); + oprot.writeFieldEnd(); + } oprot.writeFieldStop(); oprot.writeStructEnd(); } @@ -1139,7 +1705,22 @@ public void write(org.apache.thrift.protocol.TProtocol prot, TestThriftStruct st if (struct.isSetTestBool()) { optionals.set(7); } - oprot.writeBitSet(optionals, 8); + if (struct.isSetTestList()) { + optionals.set(8); + } + if (struct.isSetTestStringSetTypedef()) { + optionals.set(9); + } + if (struct.isSetTestEnum()) { + optionals.set(10); + } + if (struct.isSetTestNested()) { + optionals.set(11); + } + if (struct.isSetTestUnion()) { + optionals.set(12); + } + oprot.writeBitSet(optionals, 13); if (struct.isSetTestByte()) { oprot.writeByte(struct.testByte); } @@ -1158,10 +1739,10 @@ public void write(org.apache.thrift.protocol.TProtocol prot, TestThriftStruct st if (struct.isSetStringIntMap()) { { oprot.writeI32(struct.stringIntMap.size()); - for (java.util.Map.Entry _iter5 : + for (java.util.Map.Entry _iter13 : struct.stringIntMap.entrySet()) { - oprot.writeString(_iter5.getKey()); - oprot.writeI16(_iter5.getValue()); + oprot.writeString(_iter13.getKey()); + oprot.writeI16(_iter13.getValue()); } } } @@ -1171,6 +1752,31 @@ public void write(org.apache.thrift.protocol.TProtocol prot, TestThriftStruct st if (struct.isSetTestBool()) { oprot.writeBool(struct.testBool); } + if (struct.isSetTestList()) { + { + oprot.writeI32(struct.testList.size()); + for (int _iter14 : struct.testList) { + oprot.writeI32(_iter14); + } + } + } + if (struct.isSetTestStringSetTypedef()) { + { + oprot.writeI32(struct.testStringSetTypedef.size()); + for (java.lang.String _iter15 : struct.testStringSetTypedef) { + oprot.writeString(_iter15); + } + } + } + if (struct.isSetTestEnum()) { + oprot.writeI32(struct.testEnum.getValue()); + } + if (struct.isSetTestNested()) { + struct.testNested.write(oprot); + } + if (struct.isSetTestUnion()) { + struct.testUnion.write(oprot); + } } @Override @@ -1178,7 +1784,7 @@ public void read(org.apache.thrift.protocol.TProtocol prot, TestThriftStruct str throws org.apache.thrift.TException { org.apache.thrift.protocol.TTupleProtocol iprot = (org.apache.thrift.protocol.TTupleProtocol) prot; - java.util.BitSet incoming = iprot.readBitSet(8); + java.util.BitSet incoming = iprot.readBitSet(13); if (incoming.get(0)) { struct.testByte = iprot.readByte(); struct.setTestByteIsSet(true); @@ -1201,19 +1807,19 @@ public void read(org.apache.thrift.protocol.TProtocol prot, TestThriftStruct str } if (incoming.get(5)) { { - org.apache.thrift.protocol.TMap _map6 = + org.apache.thrift.protocol.TMap _map16 = new org.apache.thrift.protocol.TMap( org.apache.thrift.protocol.TType.STRING, org.apache.thrift.protocol.TType.I16, iprot.readI32()); struct.stringIntMap = - new java.util.HashMap(2 * _map6.size); - 
@org.apache.thrift.annotation.Nullable java.lang.String _key7; - short _val8; - for (int _i9 = 0; _i9 < _map6.size; ++_i9) { - _key7 = iprot.readString(); - _val8 = iprot.readI16(); - struct.stringIntMap.put(_key7, _val8); + new java.util.HashMap(2 * _map16.size); + @org.apache.thrift.annotation.Nullable java.lang.String _key17; + short _val18; + for (int _i19 = 0; _i19 < _map16.size; ++_i19) { + _key17 = iprot.readString(); + _val18 = iprot.readI16(); + struct.stringIntMap.put(_key17, _val18); } } struct.setStringIntMapIsSet(true); @@ -1226,6 +1832,48 @@ public void read(org.apache.thrift.protocol.TProtocol prot, TestThriftStruct str struct.testBool = iprot.readBool(); struct.setTestBoolIsSet(true); } + if (incoming.get(8)) { + { + org.apache.thrift.protocol.TList _list20 = + new org.apache.thrift.protocol.TList( + org.apache.thrift.protocol.TType.I32, iprot.readI32()); + struct.testList = new java.util.ArrayList(_list20.size); + int _elem21; + for (int _i22 = 0; _i22 < _list20.size; ++_i22) { + _elem21 = iprot.readI32(); + struct.testList.add(_elem21); + } + } + struct.setTestListIsSet(true); + } + if (incoming.get(9)) { + { + org.apache.thrift.protocol.TSet _set23 = + new org.apache.thrift.protocol.TSet( + org.apache.thrift.protocol.TType.STRING, iprot.readI32()); + struct.testStringSetTypedef = new java.util.HashSet(2 * _set23.size); + @org.apache.thrift.annotation.Nullable java.lang.String _elem24; + for (int _i25 = 0; _i25 < _set23.size; ++_i25) { + _elem24 = iprot.readString(); + struct.testStringSetTypedef.add(_elem24); + } + } + struct.setTestStringSetTypedefIsSet(true); + } + if (incoming.get(10)) { + struct.testEnum = org.apache.beam.sdk.io.thrift.TestThriftEnum.findByValue(iprot.readI32()); + struct.setTestEnumIsSet(true); + } + if (incoming.get(11)) { + struct.testNested = new TestThriftInnerStruct(); + struct.testNested.read(iprot); + struct.setTestNestedIsSet(true); + } + if (incoming.get(12)) { + struct.testUnion = new TestThriftUnion(); + struct.testUnion.read(iprot); + struct.setTestUnionIsSet(true); + } } } diff --git a/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftUnion.java b/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftUnion.java new file mode 100644 index 000000000000..5cac062c9d67 --- /dev/null +++ b/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftUnion.java @@ -0,0 +1,401 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +@SuppressWarnings({"cast", "rawtypes", "serial", "unchecked", "unused"}) +@javax.annotation.Generated( + value = "Autogenerated by Thrift Compiler (0.13.0)", + date = "2020-12-12") +public class TestThriftUnion + extends org.apache.thrift.TUnion { + private static final org.apache.thrift.protocol.TStruct STRUCT_DESC = + new org.apache.thrift.protocol.TStruct("TestThriftUnion"); + private static final org.apache.thrift.protocol.TField SNAKE_CASE_NESTED_STRUCT_FIELD_DESC = + new org.apache.thrift.protocol.TField( + "snake_case_nested_struct", org.apache.thrift.protocol.TType.STRUCT, (short) 1); + private static final org.apache.thrift.protocol.TField CAMEL_CASE_ENUM_FIELD_DESC = + new org.apache.thrift.protocol.TField( + "camelCaseEnum", org.apache.thrift.protocol.TType.I32, (short) 2); + + /** + * The set of fields this struct contains, along with convenience methods for finding and + * manipulating them. + */ + public enum _Fields implements org.apache.thrift.TFieldIdEnum { + SNAKE_CASE_NESTED_STRUCT((short) 1, "snake_case_nested_struct"), + /** @see TestThriftEnum */ + CAMEL_CASE_ENUM((short) 2, "camelCaseEnum"); + + private static final java.util.Map byName = + new java.util.HashMap(); + + static { + for (_Fields field : java.util.EnumSet.allOf(_Fields.class)) { + byName.put(field.getFieldName(), field); + } + } + + /** Find the _Fields constant that matches fieldId, or null if its not found. */ + @org.apache.thrift.annotation.Nullable + public static _Fields findByThriftId(int fieldId) { + switch (fieldId) { + case 1: // SNAKE_CASE_NESTED_STRUCT + return SNAKE_CASE_NESTED_STRUCT; + case 2: // CAMEL_CASE_ENUM + return CAMEL_CASE_ENUM; + default: + return null; + } + } + + /** Find the _Fields constant that matches fieldId, throwing an exception if it is not found. */ + public static _Fields findByThriftIdOrThrow(int fieldId) { + _Fields fields = findByThriftId(fieldId); + if (fields == null) + throw new java.lang.IllegalArgumentException("Field " + fieldId + " doesn't exist!"); + return fields; + } + + /** Find the _Fields constant that matches name, or null if its not found. 
*/ + @org.apache.thrift.annotation.Nullable + public static _Fields findByName(java.lang.String name) { + return byName.get(name); + } + + private final short _thriftId; + private final java.lang.String _fieldName; + + _Fields(short thriftId, java.lang.String fieldName) { + _thriftId = thriftId; + _fieldName = fieldName; + } + + public short getThriftFieldId() { + return _thriftId; + } + + public java.lang.String getFieldName() { + return _fieldName; + } + } + + public static final java.util.Map<_Fields, org.apache.thrift.meta_data.FieldMetaData> metaDataMap; + + static { + java.util.Map<_Fields, org.apache.thrift.meta_data.FieldMetaData> tmpMap = + new java.util.EnumMap<_Fields, org.apache.thrift.meta_data.FieldMetaData>(_Fields.class); + tmpMap.put( + _Fields.SNAKE_CASE_NESTED_STRUCT, + new org.apache.thrift.meta_data.FieldMetaData( + "snake_case_nested_struct", + org.apache.thrift.TFieldRequirementType.OPTIONAL, + new org.apache.thrift.meta_data.StructMetaData( + org.apache.thrift.protocol.TType.STRUCT, TestThriftInnerStruct.class))); + tmpMap.put( + _Fields.CAMEL_CASE_ENUM, + new org.apache.thrift.meta_data.FieldMetaData( + "camelCaseEnum", + org.apache.thrift.TFieldRequirementType.OPTIONAL, + new org.apache.thrift.meta_data.EnumMetaData( + org.apache.thrift.protocol.TType.ENUM, TestThriftEnum.class))); + metaDataMap = java.util.Collections.unmodifiableMap(tmpMap); + org.apache.thrift.meta_data.FieldMetaData.addStructMetaDataMap( + TestThriftUnion.class, metaDataMap); + } + + public TestThriftUnion() { + super(); + } + + public TestThriftUnion(_Fields setField, java.lang.Object value) { + super(setField, value); + } + + public TestThriftUnion(TestThriftUnion other) { + super(other); + } + + public TestThriftUnion deepCopy() { + return new TestThriftUnion(this); + } + + public static TestThriftUnion snake_case_nested_struct(TestThriftInnerStruct value) { + TestThriftUnion x = new TestThriftUnion(); + x.setSnake_case_nested_struct(value); + return x; + } + + public static TestThriftUnion camelCaseEnum(TestThriftEnum value) { + TestThriftUnion x = new TestThriftUnion(); + x.setCamelCaseEnum(value); + return x; + } + + @Override + protected void checkType(_Fields setField, java.lang.Object value) + throws java.lang.ClassCastException { + switch (setField) { + case SNAKE_CASE_NESTED_STRUCT: + if (value instanceof TestThriftInnerStruct) { + break; + } + throw new java.lang.ClassCastException( + "Was expecting value of type TestThriftInnerStruct for field 'snake_case_nested_struct', but got " + + value.getClass().getSimpleName()); + case CAMEL_CASE_ENUM: + if (value instanceof TestThriftEnum) { + break; + } + throw new java.lang.ClassCastException( + "Was expecting value of type TestThriftEnum for field 'camelCaseEnum', but got " + + value.getClass().getSimpleName()); + default: + throw new java.lang.IllegalArgumentException("Unknown field id " + setField); + } + } + + @Override + protected java.lang.Object standardSchemeReadValue( + org.apache.thrift.protocol.TProtocol iprot, org.apache.thrift.protocol.TField field) + throws org.apache.thrift.TException { + _Fields setField = _Fields.findByThriftId(field.id); + if (setField != null) { + switch (setField) { + case SNAKE_CASE_NESTED_STRUCT: + if (field.type == SNAKE_CASE_NESTED_STRUCT_FIELD_DESC.type) { + TestThriftInnerStruct snake_case_nested_struct; + snake_case_nested_struct = new TestThriftInnerStruct(); + snake_case_nested_struct.read(iprot); + return snake_case_nested_struct; + } else { + 
org.apache.thrift.protocol.TProtocolUtil.skip(iprot, field.type); + return null; + } + case CAMEL_CASE_ENUM: + if (field.type == CAMEL_CASE_ENUM_FIELD_DESC.type) { + TestThriftEnum camelCaseEnum; + camelCaseEnum = + org.apache.beam.sdk.io.thrift.TestThriftEnum.findByValue(iprot.readI32()); + return camelCaseEnum; + } else { + org.apache.thrift.protocol.TProtocolUtil.skip(iprot, field.type); + return null; + } + default: + throw new java.lang.IllegalStateException( + "setField wasn't null, but didn't match any of the case statements!"); + } + } else { + org.apache.thrift.protocol.TProtocolUtil.skip(iprot, field.type); + return null; + } + } + + @Override + protected void standardSchemeWriteValue(org.apache.thrift.protocol.TProtocol oprot) + throws org.apache.thrift.TException { + switch (setField_) { + case SNAKE_CASE_NESTED_STRUCT: + TestThriftInnerStruct snake_case_nested_struct = (TestThriftInnerStruct) value_; + snake_case_nested_struct.write(oprot); + return; + case CAMEL_CASE_ENUM: + TestThriftEnum camelCaseEnum = (TestThriftEnum) value_; + oprot.writeI32(camelCaseEnum.getValue()); + return; + default: + throw new java.lang.IllegalStateException( + "Cannot write union with unknown field " + setField_); + } + } + + @Override + protected java.lang.Object tupleSchemeReadValue( + org.apache.thrift.protocol.TProtocol iprot, short fieldID) + throws org.apache.thrift.TException { + _Fields setField = _Fields.findByThriftId(fieldID); + if (setField != null) { + switch (setField) { + case SNAKE_CASE_NESTED_STRUCT: + TestThriftInnerStruct snake_case_nested_struct; + snake_case_nested_struct = new TestThriftInnerStruct(); + snake_case_nested_struct.read(iprot); + return snake_case_nested_struct; + case CAMEL_CASE_ENUM: + TestThriftEnum camelCaseEnum; + camelCaseEnum = org.apache.beam.sdk.io.thrift.TestThriftEnum.findByValue(iprot.readI32()); + return camelCaseEnum; + default: + throw new java.lang.IllegalStateException( + "setField wasn't null, but didn't match any of the case statements!"); + } + } else { + throw new org.apache.thrift.protocol.TProtocolException( + "Couldn't find a field with field id " + fieldID); + } + } + + @Override + protected void tupleSchemeWriteValue(org.apache.thrift.protocol.TProtocol oprot) + throws org.apache.thrift.TException { + switch (setField_) { + case SNAKE_CASE_NESTED_STRUCT: + TestThriftInnerStruct snake_case_nested_struct = (TestThriftInnerStruct) value_; + snake_case_nested_struct.write(oprot); + return; + case CAMEL_CASE_ENUM: + TestThriftEnum camelCaseEnum = (TestThriftEnum) value_; + oprot.writeI32(camelCaseEnum.getValue()); + return; + default: + throw new java.lang.IllegalStateException( + "Cannot write union with unknown field " + setField_); + } + } + + @Override + protected org.apache.thrift.protocol.TField getFieldDesc(_Fields setField) { + switch (setField) { + case SNAKE_CASE_NESTED_STRUCT: + return SNAKE_CASE_NESTED_STRUCT_FIELD_DESC; + case CAMEL_CASE_ENUM: + return CAMEL_CASE_ENUM_FIELD_DESC; + default: + throw new java.lang.IllegalArgumentException("Unknown field id " + setField); + } + } + + @Override + protected org.apache.thrift.protocol.TStruct getStructDesc() { + return STRUCT_DESC; + } + + @Override + protected _Fields enumForId(short id) { + return _Fields.findByThriftIdOrThrow(id); + } + + @org.apache.thrift.annotation.Nullable + public _Fields fieldForId(int fieldId) { + return _Fields.findByThriftId(fieldId); + } + + public TestThriftInnerStruct getSnake_case_nested_struct() { + if (getSetField() == 
_Fields.SNAKE_CASE_NESTED_STRUCT) { + return (TestThriftInnerStruct) getFieldValue(); + } else { + throw new java.lang.RuntimeException( + "Cannot get field 'snake_case_nested_struct' because union is currently set to " + + getFieldDesc(getSetField()).name); + } + } + + public void setSnake_case_nested_struct(TestThriftInnerStruct value) { + if (value == null) throw new java.lang.NullPointerException(); + setField_ = _Fields.SNAKE_CASE_NESTED_STRUCT; + value_ = value; + } + + /** @see TestThriftEnum */ + public TestThriftEnum getCamelCaseEnum() { + if (getSetField() == _Fields.CAMEL_CASE_ENUM) { + return (TestThriftEnum) getFieldValue(); + } else { + throw new java.lang.RuntimeException( + "Cannot get field 'camelCaseEnum' because union is currently set to " + + getFieldDesc(getSetField()).name); + } + } + + /** @see TestThriftEnum */ + public void setCamelCaseEnum(TestThriftEnum value) { + if (value == null) throw new java.lang.NullPointerException(); + setField_ = _Fields.CAMEL_CASE_ENUM; + value_ = value; + } + + public boolean isSetSnake_case_nested_struct() { + return setField_ == _Fields.SNAKE_CASE_NESTED_STRUCT; + } + + public boolean isSetCamelCaseEnum() { + return setField_ == _Fields.CAMEL_CASE_ENUM; + } + + public boolean equals(java.lang.Object other) { + if (other instanceof TestThriftUnion) { + return equals((TestThriftUnion) other); + } else { + return false; + } + } + + public boolean equals(TestThriftUnion other) { + return other != null + && getSetField() == other.getSetField() + && getFieldValue().equals(other.getFieldValue()); + } + + @Override + public int compareTo(TestThriftUnion other) { + int lastComparison = + org.apache.thrift.TBaseHelper.compareTo(getSetField(), other.getSetField()); + if (lastComparison == 0) { + return org.apache.thrift.TBaseHelper.compareTo(getFieldValue(), other.getFieldValue()); + } + return lastComparison; + } + + @Override + public int hashCode() { + java.util.List list = new java.util.ArrayList(); + list.add(this.getClass().getName()); + org.apache.thrift.TFieldIdEnum setField = getSetField(); + if (setField != null) { + list.add(setField.getThriftFieldId()); + java.lang.Object value = getFieldValue(); + if (value instanceof org.apache.thrift.TEnum) { + list.add(((org.apache.thrift.TEnum) getFieldValue()).getValue()); + } else { + list.add(value); + } + } + return list.hashCode(); + } + + private void writeObject(java.io.ObjectOutputStream out) throws java.io.IOException { + try { + write( + new org.apache.thrift.protocol.TCompactProtocol( + new org.apache.thrift.transport.TIOStreamTransport(out))); + } catch (org.apache.thrift.TException te) { + throw new java.io.IOException(te); + } + } + + private void readObject(java.io.ObjectInputStream in) + throws java.io.IOException, java.lang.ClassNotFoundException { + try { + read( + new org.apache.thrift.protocol.TCompactProtocol( + new org.apache.thrift.transport.TIOStreamTransport(in))); + } catch (org.apache.thrift.TException te) { + throw new java.io.IOException(te); + } + } +} diff --git a/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/ThriftIOTest.java b/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/ThriftIOTest.java index 5400b0ffb1b1..afdc9c376846 100644 --- a/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/ThriftIOTest.java +++ b/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/ThriftIOTest.java @@ -46,9 +46,6 @@ /** Tests for {@link ThriftIO}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class ThriftIOTest implements Serializable { private static final String RESOURCE_DIR = "ThriftIOTest/"; @@ -72,16 +69,16 @@ public void setUp() throws Exception { byte[] bytes = new byte[10]; ByteBuffer buffer = ByteBuffer.wrap(bytes); - TEST_THRIFT_STRUCT.testByte = 100; - TEST_THRIFT_STRUCT.testShort = 200; - TEST_THRIFT_STRUCT.testInt = 2500; - TEST_THRIFT_STRUCT.testLong = 79303L; - TEST_THRIFT_STRUCT.testDouble = 25.007; - TEST_THRIFT_STRUCT.testBool = true; - TEST_THRIFT_STRUCT.stringIntMap = new HashMap<>(); - TEST_THRIFT_STRUCT.stringIntMap.put("first", (short) 1); - TEST_THRIFT_STRUCT.stringIntMap.put("second", (short) 2); - TEST_THRIFT_STRUCT.testBinary = buffer; + TEST_THRIFT_STRUCT.setTestByte((byte) 100); + TEST_THRIFT_STRUCT.setTestShort((short) 200); + TEST_THRIFT_STRUCT.setTestInt(2500); + TEST_THRIFT_STRUCT.setTestLong(79303L); + TEST_THRIFT_STRUCT.setTestDouble(25.007); + TEST_THRIFT_STRUCT.setTestBool(true); + TEST_THRIFT_STRUCT.setStringIntMap(new HashMap<>()); + TEST_THRIFT_STRUCT.getStringIntMap().put("first", (short) 1); + TEST_THRIFT_STRUCT.getStringIntMap().put("second", (short) 2); + TEST_THRIFT_STRUCT.setTestBinary(buffer); testThriftStructs = ImmutableList.copyOf(generateTestObjects(1000L)); } @@ -240,15 +237,15 @@ private List generateTestObjects(long count) { // Generate random string String randomString = RandomStringUtils.random(10, true, false); short s = (short) RandomUtils.nextInt(0, Short.MAX_VALUE + 1); - temp.stringIntMap = new HashMap<>(); - temp.stringIntMap.put(randomString, s); - temp.testShort = s; - temp.testBinary = buffer; - temp.testBool = RandomUtils.nextBoolean(); - temp.testByte = (byte) RandomUtils.nextInt(0, Byte.MAX_VALUE + 1); - temp.testDouble = RandomUtils.nextDouble(); - temp.testInt = RandomUtils.nextInt(); - temp.testLong = RandomUtils.nextLong(); + temp.setStringIntMap(new HashMap<>()); + temp.getStringIntMap().put(randomString, s); + temp.setTestShort(s); + temp.setTestBinary(buffer); + temp.setTestBool(RandomUtils.nextBoolean()); + temp.setTestByte((byte) RandomUtils.nextInt(0, Byte.MAX_VALUE + 1)); + temp.setTestDouble(RandomUtils.nextDouble()); + temp.setTestInt(RandomUtils.nextInt()); + temp.setTestLong(RandomUtils.nextLong()); testThriftStructList.add(temp); } diff --git a/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/ThriftPayloadSerializerProviderTest.java b/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/ThriftPayloadSerializerProviderTest.java new file mode 100644 index 000000000000..c19f15cbaea3 --- /dev/null +++ b/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/ThriftPayloadSerializerProviderTest.java @@ -0,0 +1,154 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.thrift; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertThrows; + +import org.apache.beam.sdk.io.thrift.payloads.TestThriftMessage; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.io.payloads.PayloadSerializerProvider; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.apache.thrift.TDeserializer; +import org.apache.thrift.TSerializer; +import org.apache.thrift.protocol.TCompactProtocol; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +@RunWith(JUnit4.class) +public class ThriftPayloadSerializerProviderTest { + private static final Schema SHUFFLED_SCHEMA = + Schema.builder() + .addStringField("f_string") + .addInt32Field("f_int") + .addArrayField("f_double_array", Schema.FieldType.DOUBLE) + .addDoubleField("f_double") + .addInt64Field("f_long") + .build(); + private static final Row ROW = + Row.withSchema(SHUFFLED_SCHEMA) + .withFieldValue("f_string", "string") + .withFieldValue("f_int", 123) + .withFieldValue("f_double_array", ImmutableList.of(8.0)) + .withFieldValue("f_double", 9.0) + .withFieldValue("f_long", 456L) + .build(); + private static final TestThriftMessage MESSAGE = + new TestThriftMessage() + .setFLong(456L) + .setFInt(123) + .setFDouble(9.0) + .setFString("string") + .setFDoubleArray(ImmutableList.of(8.0)); + + private final PayloadSerializerProvider provider = new ThriftPayloadSerializerProvider(); + + @Test + public void invalidArgs() { + assertThrows( + IllegalArgumentException.class, + () -> provider.getSerializer(SHUFFLED_SCHEMA, ImmutableMap.of())); + assertThrows( + IllegalArgumentException.class, + () -> + provider.getSerializer( + SHUFFLED_SCHEMA, + ImmutableMap.of("thriftClass", "", "thriftProtocolFactoryClass", ""))); + assertThrows( + IllegalArgumentException.class, + () -> + provider.getSerializer( + SHUFFLED_SCHEMA, + ImmutableMap.of( + "thriftClass", + "", + "thriftProtocolFactoryClass", + TCompactProtocol.Factory.class.getName()))); + assertThrows( + IllegalArgumentException.class, + () -> + provider.getSerializer( + SHUFFLED_SCHEMA, + ImmutableMap.of( + "thriftClass", + TestThriftMessage.class.getName(), + "thriftProtocolFactoryClass", + ""))); + assertThrows( + ClassCastException.class, + () -> + provider.getSerializer( + SHUFFLED_SCHEMA, + ImmutableMap.of( + "thriftClass", ImmutableList.class.getName(), + "thriftProtocolFactoryClass", TCompactProtocol.Factory.class.getName()))); + assertThrows( + ClassCastException.class, + () -> + provider.getSerializer( + SHUFFLED_SCHEMA, + ImmutableMap.of( + "thriftClass", TestThriftMessage.class.getName(), + "thriftProtocolFactoryClass", ImmutableList.class.getName()))); + assertThrows( + IllegalArgumentException.class, + () -> + provider.getSerializer( + Schema.builder() + .addStringField("f_NOTACTUALLYINMESSAGE") + .addInt32Field("f_int") + .addArrayField("f_double_array", Schema.FieldType.DOUBLE) + .addDoubleField("f_double") + .addInt64Field("f_long") + .build(), + ImmutableMap.of( + "thriftClass", TestThriftMessage.class.getName(), + "thriftProtocolFactoryClass", TCompactProtocol.Factory.class.getName()))); + } + + @Test + public void serialize() throws Exception { + byte[] bytes = + 
provider + .getSerializer( + SHUFFLED_SCHEMA, + ImmutableMap.of( + "thriftClass", TestThriftMessage.class.getName(), + "thriftProtocolFactoryClass", TCompactProtocol.Factory.class.getName())) + .serialize(ROW); + TestThriftMessage result = new TestThriftMessage(); + new TDeserializer(new TCompactProtocol.Factory()).deserialize(result, bytes); + assertEquals(MESSAGE, result); + } + + @Test + public void deserialize() throws Exception { + Row row = + provider + .getSerializer( + SHUFFLED_SCHEMA, + ImmutableMap.of( + "thriftClass", TestThriftMessage.class.getName(), + "thriftProtocolFactoryClass", TCompactProtocol.Factory.class.getName())) + .deserialize(new TSerializer(new TCompactProtocol.Factory()).serialize(MESSAGE)); + assertEquals(ROW, row); + } +} diff --git a/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/ThriftSchemaTest.java b/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/ThriftSchemaTest.java new file mode 100644 index 000000000000..c269f5a18421 --- /dev/null +++ b/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/ThriftSchemaTest.java @@ -0,0 +1,251 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertNotNull; + +import java.nio.charset.StandardCharsets; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.function.Function; +import java.util.stream.Collectors; +import java.util.stream.Stream; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.Field; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.schemas.Schema.TypeName; +import org.apache.beam.sdk.schemas.SchemaProvider; +import org.apache.beam.sdk.schemas.logicaltypes.EnumerationType; +import org.apache.beam.sdk.schemas.transforms.Convert; +import org.apache.beam.sdk.schemas.transforms.Group; +import org.apache.beam.sdk.schemas.transforms.Select; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.transforms.Count; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.Distinct; +import org.apache.beam.sdk.transforms.FlatMapElements; +import org.apache.beam.sdk.transforms.MapElements; +import org.apache.beam.sdk.transforms.Sum; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.sdk.values.TypeDescriptor; +import org.apache.beam.sdk.values.TypeDescriptors; +import org.junit.Rule; +import org.junit.Test; + +@SuppressWarnings({ + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-11436) +}) +public class ThriftSchemaTest { + private static final SchemaProvider defaultSchemaProvider = ThriftSchema.provider(); + private static final SchemaProvider customSchemaProvider = + ThriftSchema.custom().typedef("StringSet", FieldType.iterable(FieldType.STRING)).provider(); + + @Rule public transient TestPipeline testPipeline = TestPipeline.create(); + + @Test(expected = IllegalArgumentException.class) + public void testThriftSchemaOnlyAllowsThriftClasses() { + defaultSchemaProvider.schemaFor(TypeDescriptor.of(String.class)); + } + + @Test + public void testInnerStructSchemaWithSimpleTypedefs() { + // primitive typedefs don't need any special handling + final Schema schema = + defaultSchemaProvider.schemaFor(TypeDescriptor.of(TestThriftInnerStruct.class)); + assertNotNull(schema); + assertEquals(TypeName.STRING, schema.getField("testNameTypedef").getType().getTypeName()); + assertEquals(TypeName.INT16, schema.getField("testAgeTypedef").getType().getTypeName()); + } + + @Test + public void testUnionSchema() { + final Schema schema = defaultSchemaProvider.schemaFor(TypeDescriptor.of(TestThriftUnion.class)); + assertNotNull(schema); + assertEquals(TypeName.LOGICAL_TYPE, schema.getField("camelCaseEnum").getType().getTypeName()); + assertEquals( + EnumerationType.IDENTIFIER, + schema.getField("camelCaseEnum").getType().getLogicalType().getIdentifier()); + assertEquals(TypeName.ROW, schema.getField("snake_case_nested_struct").getType().getTypeName()); + } + + @Test(expected = IllegalStateException.class) + public void testMainStructSchemaWithoutTypedefRegistration() { + // container typedefs like set cannot be inferred based on available metadata + defaultSchemaProvider.schemaFor(TypeDescriptor.of(TestThriftStruct.class)); + } + + @Test + public void testMainStructSchemaWithContainerTypedefRegistered() { + final Schema schema = customSchemaProvider.schemaFor(TypeDescriptor.of(TestThriftStruct.class)); + 
assertNotNull(schema); + assertEquals(TypeName.BOOLEAN, schema.getField("testBool").getType().getTypeName()); + assertEquals(TypeName.BYTE, schema.getField("testByte").getType().getTypeName()); + assertEquals(TypeName.INT16, schema.getField("testShort").getType().getTypeName()); + assertEquals(TypeName.INT32, schema.getField("testInt").getType().getTypeName()); + assertEquals(TypeName.INT64, schema.getField("testLong").getType().getTypeName()); + assertEquals(TypeName.DOUBLE, schema.getField("testDouble").getType().getTypeName()); + assertEquals(TypeName.BYTES, schema.getField("testBinary").getType().getTypeName()); + assertEquals(TypeName.MAP, schema.getField("stringIntMap").getType().getTypeName()); + assertEquals(TypeName.LOGICAL_TYPE, schema.getField("testEnum").getType().getTypeName()); + assertEquals( + EnumerationType.IDENTIFIER, + schema.getField("testEnum").getType().getLogicalType().getIdentifier()); + assertEquals(TypeName.ARRAY, schema.getField("testList").getType().getTypeName()); + assertEquals(TypeName.ROW, schema.getField("testNested").getType().getTypeName()); + assertEquals( + TypeName.ITERABLE, schema.getField("testStringSetTypedef").getType().getTypeName()); + } + + @Test + public void testSchemaUsage() { + final List thriftData = + Arrays.asList( + thriftObj(1, 0.5, "k1", "k2"), + thriftObj(2, 1.5, "k1", "k2"), + thriftObj(1, 2.5, "k2", "k3"), + thriftObj(2, 3.5, "k2", "k3")); + + testPipeline + .getSchemaRegistry() + .registerSchemaProvider(TestThriftStruct.class, customSchemaProvider); + final PCollection rows = + testPipeline.apply(Create.of(thriftData)).apply("toRows", Convert.toRows()); + + playWithVariousDataTypes(rows); + + final PCollection restored = + rows.apply("backToThrift", Convert.fromRows(TypeDescriptor.of(TestThriftStruct.class))); + PAssert.that(restored).containsInAnyOrder(thriftData); + + testPipeline.run(); + } + + private void playWithVariousDataTypes(PCollection rows) { + final PCollection sumByKey = + rows.apply( + Group.byFieldNames("testLong") + .aggregateField("testDouble", Sum.ofDoubles(), "total")); + final Schema keySchema = Schema.of(Field.of("key", FieldType.INT64)); + final Schema valueSchema = Schema.of(Field.of("value", FieldType.DOUBLE)); + PAssert.that(sumByKey) + .containsInAnyOrder( + Row.withSchema(sumByKey.getSchema()) + .addValues( + Row.withSchema(keySchema).addValues(1L).build(), + Row.withSchema(valueSchema).addValues(3.0).build()) + .build(), + Row.withSchema(sumByKey.getSchema()) + .addValues( + Row.withSchema(keySchema).addValues(2L).build(), + Row.withSchema(valueSchema).addValues(5.0).build()) + .build()); + + final PCollection count = + rows.apply("bin", Select.fieldNames("testBinary")) + .apply("distinctBin", Distinct.create()) + .apply(Count.globally()); + PAssert.that(count).containsInAnyOrder(2L); + + final PCollection mapEntries = + rows.apply( + MapElements.into(TypeDescriptors.strings()) + .via( + row -> + row.getMap("stringIntMap").keySet().stream() + .collect(Collectors.joining()))) + .apply("distinctMapEntries", Distinct.create()); + PAssert.that(mapEntries).containsInAnyOrder("k1k2", "k3k2"); + + final PCollection tags = + rows.apply( + FlatMapElements.into(TypeDescriptor.of(String.class)) + .via(row -> row.getIterable("testStringSetTypedef"))) + .apply("distinctTags", Distinct.create()); + PAssert.that(tags).containsInAnyOrder("tag"); + + final PCollection enumerated = + rows.apply("enumSelect", Select.fieldNames("testEnum")) + .apply("distinctEnumValues", Distinct.create()); + + final Schema enumSchema = + 
Schema.of(Field.of("testEnum", FieldType.logicalType(EnumerationType.create("C1", "C2")))); + PAssert.that(enumerated) + .containsInAnyOrder( + Row.withSchema(enumSchema).addValues(new EnumerationType.Value(0)).build(), + Row.withSchema(enumSchema).addValues(new EnumerationType.Value(1)).build()); + + final PCollection unionNestedStructNames = + rows.apply( + "unionSelectStruct", + Select.fieldNames("testUnion.snake_case_nested_struct.testNameTypedef")) + .apply("distinctUnionNames", Distinct.create()); + + final Schema unionNameSchema = Schema.of(Field.nullable("name", FieldType.STRING)); + PAssert.that(unionNestedStructNames) + .containsInAnyOrder( + Row.withSchema(unionNameSchema).addValues("kid").build(), + Row.withSchema(unionNameSchema).addValues((String) null).build()); + + final PCollection unionNestedEnumValues = + rows.apply("unionSelectEnum", Select.fieldNames("testUnion.camelCaseEnum")) + .apply("distinctUnionEnum", Distinct.create()); + + final Schema unionNestedEnumSchema = + Schema.of( + Field.nullable("testEnum", FieldType.logicalType(EnumerationType.create("C1", "C2")))); + PAssert.that(unionNestedEnumValues) + .containsInAnyOrder( + Row.withSchema(unionNestedEnumSchema).addValues(new EnumerationType.Value(0)).build(), + Row.withSchema(unionNestedEnumSchema).addValues((EnumerationType.Value) null).build()); + + final Schema nameSchema = Schema.of(Field.of("name", FieldType.STRING)); + final PCollection names = + rows.apply("names", Select.fieldNames("testNested.testNameTypedef")) + .apply("distinctNames", Distinct.create()); + PAssert.that(names) + .containsInAnyOrder(Row.withSchema(nameSchema).addValues("Maradona").build()); + } + + private TestThriftStruct thriftObj(int index, double doubleValue, String... mapKeys) { + final TestThriftStruct thrift = new TestThriftStruct(); + thrift.setTestLong(index); + thrift.setTestInt(index); + thrift.setTestDouble(doubleValue); + final Map map = + Stream.of(mapKeys).collect(Collectors.toMap(Function.identity(), k -> (short) k.length())); + thrift.setStringIntMap(map); + thrift.setTestBinary(String.join("", mapKeys).getBytes(StandardCharsets.UTF_8)); + thrift.setTestStringSetTypedef(Collections.singleton("tag")); + thrift.setTestList(Arrays.asList(1, 2, 3)); + final TestThriftInnerStruct nested = new TestThriftInnerStruct("Maradona", (short) 60); + thrift.setTestNested(nested); + if (index % 2 == 0) { + thrift.setTestEnum(TestThriftEnum.C1); + thrift.setTestUnion(TestThriftUnion.snake_case_nested_struct(new TestThriftInnerStruct())); + } else { + thrift.setTestEnum(TestThriftEnum.C2); + thrift.setTestUnion(TestThriftUnion.camelCaseEnum(TestThriftEnum.C1)); + } + return thrift; + } +} diff --git a/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/payloads/ItThriftMessage.java b/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/payloads/ItThriftMessage.java new file mode 100644 index 000000000000..c977be38470a --- /dev/null +++ b/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/payloads/ItThriftMessage.java @@ -0,0 +1,610 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.thrift.payloads; + +@SuppressWarnings({"cast", "rawtypes", "serial", "unchecked", "unused"}) +@javax.annotation.Generated( + value = "Autogenerated by Thrift Compiler (0.13.0)", + date = "2020-12-20") +public class ItThriftMessage + implements org.apache.thrift.TBase, + java.io.Serializable, + Cloneable, + Comparable { + private static final org.apache.thrift.protocol.TStruct STRUCT_DESC = + new org.apache.thrift.protocol.TStruct("ItThriftMessage"); + + private static final org.apache.thrift.protocol.TField F_LONG_FIELD_DESC = + new org.apache.thrift.protocol.TField( + "f_long", org.apache.thrift.protocol.TType.I64, (short) 1); + private static final org.apache.thrift.protocol.TField F_INT_FIELD_DESC = + new org.apache.thrift.protocol.TField( + "f_int", org.apache.thrift.protocol.TType.I32, (short) 2); + private static final org.apache.thrift.protocol.TField F_STRING_FIELD_DESC = + new org.apache.thrift.protocol.TField( + "f_string", org.apache.thrift.protocol.TType.STRING, (short) 3); + + private static final org.apache.thrift.scheme.SchemeFactory STANDARD_SCHEME_FACTORY = + new ItThriftMessageStandardSchemeFactory(); + private static final org.apache.thrift.scheme.SchemeFactory TUPLE_SCHEME_FACTORY = + new ItThriftMessageTupleSchemeFactory(); + + private long f_long; // required + private int f_int; // required + private @org.apache.thrift.annotation.Nullable String f_string; // required + + /** + * The set of fields this struct contains, along with convenience methods for finding and + * manipulating them. + */ + public enum _Fields implements org.apache.thrift.TFieldIdEnum { + F_LONG((short) 1, "f_long"), + F_INT((short) 2, "f_int"), + F_STRING((short) 3, "f_string"); + + private static final java.util.Map byName = + new java.util.HashMap(); + + static { + for (_Fields field : java.util.EnumSet.allOf(_Fields.class)) { + byName.put(field.getFieldName(), field); + } + } + + /** Find the _Fields constant that matches fieldId, or null if its not found. */ + @org.apache.thrift.annotation.Nullable + public static _Fields findByThriftId(int fieldId) { + switch (fieldId) { + case 1: // F_LONG + return F_LONG; + case 2: // F_INT + return F_INT; + case 3: // F_STRING + return F_STRING; + default: + return null; + } + } + + /** Find the _Fields constant that matches fieldId, throwing an exception if it is not found. */ + public static _Fields findByThriftIdOrThrow(int fieldId) { + _Fields fields = findByThriftId(fieldId); + if (fields == null) + throw new IllegalArgumentException("Field " + fieldId + " doesn't exist!"); + return fields; + } + + /** Find the _Fields constant that matches name, or null if its not found. 
*/ + @org.apache.thrift.annotation.Nullable + public static _Fields findByName(String name) { + return byName.get(name); + } + + private final short _thriftId; + private final String _fieldName; + + _Fields(short thriftId, String fieldName) { + _thriftId = thriftId; + _fieldName = fieldName; + } + + public short getThriftFieldId() { + return _thriftId; + } + + public String getFieldName() { + return _fieldName; + } + } + + // isset id assignments + private static final int __F_LONG_ISSET_ID = 0; + private static final int __F_INT_ISSET_ID = 1; + private byte __isset_bitfield = 0; + public static final java.util.Map<_Fields, org.apache.thrift.meta_data.FieldMetaData> metaDataMap; + + static { + java.util.Map<_Fields, org.apache.thrift.meta_data.FieldMetaData> tmpMap = + new java.util.EnumMap<_Fields, org.apache.thrift.meta_data.FieldMetaData>(_Fields.class); + tmpMap.put( + _Fields.F_LONG, + new org.apache.thrift.meta_data.FieldMetaData( + "f_long", + org.apache.thrift.TFieldRequirementType.REQUIRED, + new org.apache.thrift.meta_data.FieldValueMetaData( + org.apache.thrift.protocol.TType.I64))); + tmpMap.put( + _Fields.F_INT, + new org.apache.thrift.meta_data.FieldMetaData( + "f_int", + org.apache.thrift.TFieldRequirementType.REQUIRED, + new org.apache.thrift.meta_data.FieldValueMetaData( + org.apache.thrift.protocol.TType.I32))); + tmpMap.put( + _Fields.F_STRING, + new org.apache.thrift.meta_data.FieldMetaData( + "f_string", + org.apache.thrift.TFieldRequirementType.REQUIRED, + new org.apache.thrift.meta_data.FieldValueMetaData( + org.apache.thrift.protocol.TType.STRING))); + metaDataMap = java.util.Collections.unmodifiableMap(tmpMap); + org.apache.thrift.meta_data.FieldMetaData.addStructMetaDataMap( + ItThriftMessage.class, metaDataMap); + } + + public ItThriftMessage() {} + + public ItThriftMessage(long f_long, int f_int, String f_string) { + this(); + this.f_long = f_long; + setFLongIsSet(true); + this.f_int = f_int; + setFIntIsSet(true); + this.f_string = f_string; + } + + /** Performs a deep copy on other. 
*/ + public ItThriftMessage(ItThriftMessage other) { + __isset_bitfield = other.__isset_bitfield; + this.f_long = other.f_long; + this.f_int = other.f_int; + if (other.isSetFString()) { + this.f_string = other.f_string; + } + } + + public ItThriftMessage deepCopy() { + return new ItThriftMessage(this); + } + + @Override + public void clear() { + setFLongIsSet(false); + this.f_long = 0; + setFIntIsSet(false); + this.f_int = 0; + this.f_string = null; + } + + public long getFLong() { + return this.f_long; + } + + public ItThriftMessage setFLong(long f_long) { + this.f_long = f_long; + setFLongIsSet(true); + return this; + } + + public void unsetFLong() { + __isset_bitfield = + org.apache.thrift.EncodingUtils.clearBit(__isset_bitfield, __F_LONG_ISSET_ID); + } + + /** Returns true if field f_long is set (has been assigned a value) and false otherwise */ + public boolean isSetFLong() { + return org.apache.thrift.EncodingUtils.testBit(__isset_bitfield, __F_LONG_ISSET_ID); + } + + public void setFLongIsSet(boolean value) { + __isset_bitfield = + org.apache.thrift.EncodingUtils.setBit(__isset_bitfield, __F_LONG_ISSET_ID, value); + } + + public int getFInt() { + return this.f_int; + } + + public ItThriftMessage setFInt(int f_int) { + this.f_int = f_int; + setFIntIsSet(true); + return this; + } + + public void unsetFInt() { + __isset_bitfield = org.apache.thrift.EncodingUtils.clearBit(__isset_bitfield, __F_INT_ISSET_ID); + } + + /** Returns true if field f_int is set (has been assigned a value) and false otherwise */ + public boolean isSetFInt() { + return org.apache.thrift.EncodingUtils.testBit(__isset_bitfield, __F_INT_ISSET_ID); + } + + public void setFIntIsSet(boolean value) { + __isset_bitfield = + org.apache.thrift.EncodingUtils.setBit(__isset_bitfield, __F_INT_ISSET_ID, value); + } + + @org.apache.thrift.annotation.Nullable + public String getFString() { + return this.f_string; + } + + public ItThriftMessage setFString(@org.apache.thrift.annotation.Nullable String f_string) { + this.f_string = f_string; + return this; + } + + public void unsetFString() { + this.f_string = null; + } + + /** Returns true if field f_string is set (has been assigned a value) and false otherwise */ + public boolean isSetFString() { + return this.f_string != null; + } + + public void setFStringIsSet(boolean value) { + if (!value) { + this.f_string = null; + } + } + + public void setFieldValue(_Fields field, @org.apache.thrift.annotation.Nullable Object value) { + switch (field) { + case F_LONG: + if (value == null) { + unsetFLong(); + } else { + setFLong((Long) value); + } + break; + + case F_INT: + if (value == null) { + unsetFInt(); + } else { + setFInt((Integer) value); + } + break; + + case F_STRING: + if (value == null) { + unsetFString(); + } else { + setFString((String) value); + } + break; + } + } + + @org.apache.thrift.annotation.Nullable + public Object getFieldValue(_Fields field) { + switch (field) { + case F_LONG: + return getFLong(); + + case F_INT: + return getFInt(); + + case F_STRING: + return getFString(); + } + throw new IllegalStateException(); + } + + /** + * Returns true if field corresponding to fieldID is set (has been assigned a value) and false + * otherwise + */ + public boolean isSet(_Fields field) { + if (field == null) { + throw new IllegalArgumentException(); + } + + switch (field) { + case F_LONG: + return isSetFLong(); + case F_INT: + return isSetFInt(); + case F_STRING: + return isSetFString(); + } + throw new IllegalStateException(); + } + + @Override + public boolean 
equals(Object that) { + if (that == null) return false; + if (that instanceof ItThriftMessage) return this.equals((ItThriftMessage) that); + return false; + } + + public boolean equals(ItThriftMessage that) { + if (that == null) return false; + if (this == that) return true; + + boolean this_present_f_long = true; + boolean that_present_f_long = true; + if (this_present_f_long || that_present_f_long) { + if (!(this_present_f_long && that_present_f_long)) return false; + if (this.f_long != that.f_long) return false; + } + + boolean this_present_f_int = true; + boolean that_present_f_int = true; + if (this_present_f_int || that_present_f_int) { + if (!(this_present_f_int && that_present_f_int)) return false; + if (this.f_int != that.f_int) return false; + } + + boolean this_present_f_string = true && this.isSetFString(); + boolean that_present_f_string = true && that.isSetFString(); + if (this_present_f_string || that_present_f_string) { + if (!(this_present_f_string && that_present_f_string)) return false; + if (!this.f_string.equals(that.f_string)) return false; + } + + return true; + } + + @Override + public int hashCode() { + int hashCode = 1; + + hashCode = hashCode * 8191 + org.apache.thrift.TBaseHelper.hashCode(f_long); + + hashCode = hashCode * 8191 + f_int; + + hashCode = hashCode * 8191 + ((isSetFString()) ? 131071 : 524287); + if (isSetFString()) hashCode = hashCode * 8191 + f_string.hashCode(); + + return hashCode; + } + + @Override + public int compareTo(ItThriftMessage other) { + if (!getClass().equals(other.getClass())) { + return getClass().getName().compareTo(other.getClass().getName()); + } + + int lastComparison = 0; + + lastComparison = Boolean.valueOf(isSetFLong()).compareTo(other.isSetFLong()); + if (lastComparison != 0) { + return lastComparison; + } + if (isSetFLong()) { + lastComparison = org.apache.thrift.TBaseHelper.compareTo(this.f_long, other.f_long); + if (lastComparison != 0) { + return lastComparison; + } + } + lastComparison = Boolean.valueOf(isSetFInt()).compareTo(other.isSetFInt()); + if (lastComparison != 0) { + return lastComparison; + } + if (isSetFInt()) { + lastComparison = org.apache.thrift.TBaseHelper.compareTo(this.f_int, other.f_int); + if (lastComparison != 0) { + return lastComparison; + } + } + lastComparison = Boolean.valueOf(isSetFString()).compareTo(other.isSetFString()); + if (lastComparison != 0) { + return lastComparison; + } + if (isSetFString()) { + lastComparison = org.apache.thrift.TBaseHelper.compareTo(this.f_string, other.f_string); + if (lastComparison != 0) { + return lastComparison; + } + } + return 0; + } + + @org.apache.thrift.annotation.Nullable + public _Fields fieldForId(int fieldId) { + return _Fields.findByThriftId(fieldId); + } + + public void read(org.apache.thrift.protocol.TProtocol iprot) throws org.apache.thrift.TException { + scheme(iprot).read(iprot, this); + } + + public void write(org.apache.thrift.protocol.TProtocol oprot) + throws org.apache.thrift.TException { + scheme(oprot).write(oprot, this); + } + + @Override + public String toString() { + StringBuilder sb = new StringBuilder("ItThriftMessage("); + boolean first = true; + + sb.append("f_long:"); + sb.append(this.f_long); + first = false; + if (!first) sb.append(", "); + sb.append("f_int:"); + sb.append(this.f_int); + first = false; + if (!first) sb.append(", "); + sb.append("f_string:"); + if (this.f_string == null) { + sb.append("null"); + } else { + sb.append(this.f_string); + } + first = false; + sb.append(")"); + return sb.toString(); + } + + public void 
validate() throws org.apache.thrift.TException { + // check for required fields + // alas, we cannot check 'f_long' because it's a primitive and you chose the non-beans + // generator. + // alas, we cannot check 'f_int' because it's a primitive and you chose the non-beans generator. + if (f_string == null) { + throw new org.apache.thrift.protocol.TProtocolException( + "Required field 'f_string' was not present! Struct: " + toString()); + } + // check for sub-struct validity + } + + private void writeObject(java.io.ObjectOutputStream out) throws java.io.IOException { + try { + write( + new org.apache.thrift.protocol.TCompactProtocol( + new org.apache.thrift.transport.TIOStreamTransport(out))); + } catch (org.apache.thrift.TException te) { + throw new java.io.IOException(te); + } + } + + private void readObject(java.io.ObjectInputStream in) + throws java.io.IOException, ClassNotFoundException { + try { + // it doesn't seem like you should have to do this, but java serialization is wacky, and + // doesn't call the default constructor. + __isset_bitfield = 0; + read( + new org.apache.thrift.protocol.TCompactProtocol( + new org.apache.thrift.transport.TIOStreamTransport(in))); + } catch (org.apache.thrift.TException te) { + throw new java.io.IOException(te); + } + } + + private static class ItThriftMessageStandardSchemeFactory + implements org.apache.thrift.scheme.SchemeFactory { + public ItThriftMessageStandardScheme getScheme() { + return new ItThriftMessageStandardScheme(); + } + } + + private static class ItThriftMessageStandardScheme + extends org.apache.thrift.scheme.StandardScheme { + + public void read(org.apache.thrift.protocol.TProtocol iprot, ItThriftMessage struct) + throws org.apache.thrift.TException { + org.apache.thrift.protocol.TField schemeField; + iprot.readStructBegin(); + while (true) { + schemeField = iprot.readFieldBegin(); + if (schemeField.type == org.apache.thrift.protocol.TType.STOP) { + break; + } + switch (schemeField.id) { + case 1: // F_LONG + if (schemeField.type == org.apache.thrift.protocol.TType.I64) { + struct.f_long = iprot.readI64(); + struct.setFLongIsSet(true); + } else { + org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type); + } + break; + case 2: // F_INT + if (schemeField.type == org.apache.thrift.protocol.TType.I32) { + struct.f_int = iprot.readI32(); + struct.setFIntIsSet(true); + } else { + org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type); + } + break; + case 3: // F_STRING + if (schemeField.type == org.apache.thrift.protocol.TType.STRING) { + struct.f_string = iprot.readString(); + struct.setFStringIsSet(true); + } else { + org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type); + } + break; + default: + org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type); + } + iprot.readFieldEnd(); + } + iprot.readStructEnd(); + + // check for required fields of primitive type, which can't be checked in the validate method + if (!struct.isSetFLong()) { + throw new org.apache.thrift.protocol.TProtocolException( + "Required field 'f_long' was not found in serialized data! Struct: " + toString()); + } + if (!struct.isSetFInt()) { + throw new org.apache.thrift.protocol.TProtocolException( + "Required field 'f_int' was not found in serialized data! 
Struct: " + toString()); + } + struct.validate(); + } + + public void write(org.apache.thrift.protocol.TProtocol oprot, ItThriftMessage struct) + throws org.apache.thrift.TException { + struct.validate(); + + oprot.writeStructBegin(STRUCT_DESC); + oprot.writeFieldBegin(F_LONG_FIELD_DESC); + oprot.writeI64(struct.f_long); + oprot.writeFieldEnd(); + oprot.writeFieldBegin(F_INT_FIELD_DESC); + oprot.writeI32(struct.f_int); + oprot.writeFieldEnd(); + if (struct.f_string != null) { + oprot.writeFieldBegin(F_STRING_FIELD_DESC); + oprot.writeString(struct.f_string); + oprot.writeFieldEnd(); + } + oprot.writeFieldStop(); + oprot.writeStructEnd(); + } + } + + private static class ItThriftMessageTupleSchemeFactory + implements org.apache.thrift.scheme.SchemeFactory { + public ItThriftMessageTupleScheme getScheme() { + return new ItThriftMessageTupleScheme(); + } + } + + private static class ItThriftMessageTupleScheme + extends org.apache.thrift.scheme.TupleScheme { + + @Override + public void write(org.apache.thrift.protocol.TProtocol prot, ItThriftMessage struct) + throws org.apache.thrift.TException { + org.apache.thrift.protocol.TTupleProtocol oprot = + (org.apache.thrift.protocol.TTupleProtocol) prot; + oprot.writeI64(struct.f_long); + oprot.writeI32(struct.f_int); + oprot.writeString(struct.f_string); + } + + @Override + public void read(org.apache.thrift.protocol.TProtocol prot, ItThriftMessage struct) + throws org.apache.thrift.TException { + org.apache.thrift.protocol.TTupleProtocol iprot = + (org.apache.thrift.protocol.TTupleProtocol) prot; + struct.f_long = iprot.readI64(); + struct.setFLongIsSet(true); + struct.f_int = iprot.readI32(); + struct.setFIntIsSet(true); + struct.f_string = iprot.readString(); + struct.setFStringIsSet(true); + } + } + + private static S scheme( + org.apache.thrift.protocol.TProtocol proto) { + return (org.apache.thrift.scheme.StandardScheme.class.equals(proto.getScheme()) + ? STANDARD_SCHEME_FACTORY + : TUPLE_SCHEME_FACTORY) + .getScheme(); + } +} diff --git a/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/payloads/SimpleThriftMessage.java b/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/payloads/SimpleThriftMessage.java new file mode 100644 index 000000000000..48cf6652c090 --- /dev/null +++ b/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/payloads/SimpleThriftMessage.java @@ -0,0 +1,508 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift.payloads; + +@SuppressWarnings({"cast", "rawtypes", "serial", "unchecked", "unused"}) +@javax.annotation.Generated( + value = "Autogenerated by Thrift Compiler (0.13.0)", + date = "2020-12-20") +public class SimpleThriftMessage + implements org.apache.thrift.TBase, + java.io.Serializable, + Cloneable, + Comparable { + private static final org.apache.thrift.protocol.TStruct STRUCT_DESC = + new org.apache.thrift.protocol.TStruct("SimpleThriftMessage"); + + private static final org.apache.thrift.protocol.TField ID_FIELD_DESC = + new org.apache.thrift.protocol.TField("id", org.apache.thrift.protocol.TType.I32, (short) 1); + private static final org.apache.thrift.protocol.TField NAME_FIELD_DESC = + new org.apache.thrift.protocol.TField( + "name", org.apache.thrift.protocol.TType.STRING, (short) 2); + + private static final org.apache.thrift.scheme.SchemeFactory STANDARD_SCHEME_FACTORY = + new SimpleThriftMessageStandardSchemeFactory(); + private static final org.apache.thrift.scheme.SchemeFactory TUPLE_SCHEME_FACTORY = + new SimpleThriftMessageTupleSchemeFactory(); + + private int id; // required + private @org.apache.thrift.annotation.Nullable String name; // required + + /** + * The set of fields this struct contains, along with convenience methods for finding and + * manipulating them. + */ + public enum _Fields implements org.apache.thrift.TFieldIdEnum { + ID((short) 1, "id"), + NAME((short) 2, "name"); + + private static final java.util.Map byName = + new java.util.HashMap(); + + static { + for (_Fields field : java.util.EnumSet.allOf(_Fields.class)) { + byName.put(field.getFieldName(), field); + } + } + + /** Find the _Fields constant that matches fieldId, or null if its not found. */ + @org.apache.thrift.annotation.Nullable + public static _Fields findByThriftId(int fieldId) { + switch (fieldId) { + case 1: // ID + return ID; + case 2: // NAME + return NAME; + default: + return null; + } + } + + /** Find the _Fields constant that matches fieldId, throwing an exception if it is not found. */ + public static _Fields findByThriftIdOrThrow(int fieldId) { + _Fields fields = findByThriftId(fieldId); + if (fields == null) + throw new IllegalArgumentException("Field " + fieldId + " doesn't exist!"); + return fields; + } + + /** Find the _Fields constant that matches name, or null if its not found. 
*/ + @org.apache.thrift.annotation.Nullable + public static _Fields findByName(String name) { + return byName.get(name); + } + + private final short _thriftId; + private final String _fieldName; + + _Fields(short thriftId, String fieldName) { + _thriftId = thriftId; + _fieldName = fieldName; + } + + public short getThriftFieldId() { + return _thriftId; + } + + public String getFieldName() { + return _fieldName; + } + } + + // isset id assignments + private static final int __ID_ISSET_ID = 0; + private byte __isset_bitfield = 0; + public static final java.util.Map<_Fields, org.apache.thrift.meta_data.FieldMetaData> metaDataMap; + + static { + java.util.Map<_Fields, org.apache.thrift.meta_data.FieldMetaData> tmpMap = + new java.util.EnumMap<_Fields, org.apache.thrift.meta_data.FieldMetaData>(_Fields.class); + tmpMap.put( + _Fields.ID, + new org.apache.thrift.meta_data.FieldMetaData( + "id", + org.apache.thrift.TFieldRequirementType.REQUIRED, + new org.apache.thrift.meta_data.FieldValueMetaData( + org.apache.thrift.protocol.TType.I32))); + tmpMap.put( + _Fields.NAME, + new org.apache.thrift.meta_data.FieldMetaData( + "name", + org.apache.thrift.TFieldRequirementType.REQUIRED, + new org.apache.thrift.meta_data.FieldValueMetaData( + org.apache.thrift.protocol.TType.STRING))); + metaDataMap = java.util.Collections.unmodifiableMap(tmpMap); + org.apache.thrift.meta_data.FieldMetaData.addStructMetaDataMap( + SimpleThriftMessage.class, metaDataMap); + } + + public SimpleThriftMessage() {} + + public SimpleThriftMessage(int id, String name) { + this(); + this.id = id; + setIdIsSet(true); + this.name = name; + } + + /** Performs a deep copy on other. */ + public SimpleThriftMessage(SimpleThriftMessage other) { + __isset_bitfield = other.__isset_bitfield; + this.id = other.id; + if (other.isSetName()) { + this.name = other.name; + } + } + + public SimpleThriftMessage deepCopy() { + return new SimpleThriftMessage(this); + } + + @Override + public void clear() { + setIdIsSet(false); + this.id = 0; + this.name = null; + } + + public int getId() { + return this.id; + } + + public SimpleThriftMessage setId(int id) { + this.id = id; + setIdIsSet(true); + return this; + } + + public void unsetId() { + __isset_bitfield = org.apache.thrift.EncodingUtils.clearBit(__isset_bitfield, __ID_ISSET_ID); + } + + /** Returns true if field id is set (has been assigned a value) and false otherwise */ + public boolean isSetId() { + return org.apache.thrift.EncodingUtils.testBit(__isset_bitfield, __ID_ISSET_ID); + } + + public void setIdIsSet(boolean value) { + __isset_bitfield = + org.apache.thrift.EncodingUtils.setBit(__isset_bitfield, __ID_ISSET_ID, value); + } + + @org.apache.thrift.annotation.Nullable + public String getName() { + return this.name; + } + + public SimpleThriftMessage setName(@org.apache.thrift.annotation.Nullable String name) { + this.name = name; + return this; + } + + public void unsetName() { + this.name = null; + } + + /** Returns true if field name is set (has been assigned a value) and false otherwise */ + public boolean isSetName() { + return this.name != null; + } + + public void setNameIsSet(boolean value) { + if (!value) { + this.name = null; + } + } + + public void setFieldValue(_Fields field, @org.apache.thrift.annotation.Nullable Object value) { + switch (field) { + case ID: + if (value == null) { + unsetId(); + } else { + setId((Integer) value); + } + break; + + case NAME: + if (value == null) { + unsetName(); + } else { + setName((String) value); + } + break; + } + } + + 
@org.apache.thrift.annotation.Nullable + public Object getFieldValue(_Fields field) { + switch (field) { + case ID: + return getId(); + + case NAME: + return getName(); + } + throw new IllegalStateException(); + } + + /** + * Returns true if field corresponding to fieldID is set (has been assigned a value) and false + * otherwise + */ + public boolean isSet(_Fields field) { + if (field == null) { + throw new IllegalArgumentException(); + } + + switch (field) { + case ID: + return isSetId(); + case NAME: + return isSetName(); + } + throw new IllegalStateException(); + } + + @Override + public boolean equals(Object that) { + if (that == null) return false; + if (that instanceof SimpleThriftMessage) return this.equals((SimpleThriftMessage) that); + return false; + } + + public boolean equals(SimpleThriftMessage that) { + if (that == null) return false; + if (this == that) return true; + + boolean this_present_id = true; + boolean that_present_id = true; + if (this_present_id || that_present_id) { + if (!(this_present_id && that_present_id)) return false; + if (this.id != that.id) return false; + } + + boolean this_present_name = true && this.isSetName(); + boolean that_present_name = true && that.isSetName(); + if (this_present_name || that_present_name) { + if (!(this_present_name && that_present_name)) return false; + if (!this.name.equals(that.name)) return false; + } + + return true; + } + + @Override + public int hashCode() { + int hashCode = 1; + + hashCode = hashCode * 8191 + id; + + hashCode = hashCode * 8191 + ((isSetName()) ? 131071 : 524287); + if (isSetName()) hashCode = hashCode * 8191 + name.hashCode(); + + return hashCode; + } + + @Override + public int compareTo(SimpleThriftMessage other) { + if (!getClass().equals(other.getClass())) { + return getClass().getName().compareTo(other.getClass().getName()); + } + + int lastComparison = 0; + + lastComparison = Boolean.valueOf(isSetId()).compareTo(other.isSetId()); + if (lastComparison != 0) { + return lastComparison; + } + if (isSetId()) { + lastComparison = org.apache.thrift.TBaseHelper.compareTo(this.id, other.id); + if (lastComparison != 0) { + return lastComparison; + } + } + lastComparison = Boolean.valueOf(isSetName()).compareTo(other.isSetName()); + if (lastComparison != 0) { + return lastComparison; + } + if (isSetName()) { + lastComparison = org.apache.thrift.TBaseHelper.compareTo(this.name, other.name); + if (lastComparison != 0) { + return lastComparison; + } + } + return 0; + } + + @org.apache.thrift.annotation.Nullable + public _Fields fieldForId(int fieldId) { + return _Fields.findByThriftId(fieldId); + } + + public void read(org.apache.thrift.protocol.TProtocol iprot) throws org.apache.thrift.TException { + scheme(iprot).read(iprot, this); + } + + public void write(org.apache.thrift.protocol.TProtocol oprot) + throws org.apache.thrift.TException { + scheme(oprot).write(oprot, this); + } + + @Override + public String toString() { + StringBuilder sb = new StringBuilder("SimpleThriftMessage("); + boolean first = true; + + sb.append("id:"); + sb.append(this.id); + first = false; + if (!first) sb.append(", "); + sb.append("name:"); + if (this.name == null) { + sb.append("null"); + } else { + sb.append(this.name); + } + first = false; + sb.append(")"); + return sb.toString(); + } + + public void validate() throws org.apache.thrift.TException { + // check for required fields + // alas, we cannot check 'id' because it's a primitive and you chose the non-beans generator. 
+ if (name == null) { + throw new org.apache.thrift.protocol.TProtocolException( + "Required field 'name' was not present! Struct: " + toString()); + } + // check for sub-struct validity + } + + private void writeObject(java.io.ObjectOutputStream out) throws java.io.IOException { + try { + write( + new org.apache.thrift.protocol.TCompactProtocol( + new org.apache.thrift.transport.TIOStreamTransport(out))); + } catch (org.apache.thrift.TException te) { + throw new java.io.IOException(te); + } + } + + private void readObject(java.io.ObjectInputStream in) + throws java.io.IOException, ClassNotFoundException { + try { + // it doesn't seem like you should have to do this, but java serialization is wacky, and + // doesn't call the default constructor. + __isset_bitfield = 0; + read( + new org.apache.thrift.protocol.TCompactProtocol( + new org.apache.thrift.transport.TIOStreamTransport(in))); + } catch (org.apache.thrift.TException te) { + throw new java.io.IOException(te); + } + } + + private static class SimpleThriftMessageStandardSchemeFactory + implements org.apache.thrift.scheme.SchemeFactory { + public SimpleThriftMessageStandardScheme getScheme() { + return new SimpleThriftMessageStandardScheme(); + } + } + + private static class SimpleThriftMessageStandardScheme + extends org.apache.thrift.scheme.StandardScheme { + + public void read(org.apache.thrift.protocol.TProtocol iprot, SimpleThriftMessage struct) + throws org.apache.thrift.TException { + org.apache.thrift.protocol.TField schemeField; + iprot.readStructBegin(); + while (true) { + schemeField = iprot.readFieldBegin(); + if (schemeField.type == org.apache.thrift.protocol.TType.STOP) { + break; + } + switch (schemeField.id) { + case 1: // ID + if (schemeField.type == org.apache.thrift.protocol.TType.I32) { + struct.id = iprot.readI32(); + struct.setIdIsSet(true); + } else { + org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type); + } + break; + case 2: // NAME + if (schemeField.type == org.apache.thrift.protocol.TType.STRING) { + struct.name = iprot.readString(); + struct.setNameIsSet(true); + } else { + org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type); + } + break; + default: + org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type); + } + iprot.readFieldEnd(); + } + iprot.readStructEnd(); + + // check for required fields of primitive type, which can't be checked in the validate method + if (!struct.isSetId()) { + throw new org.apache.thrift.protocol.TProtocolException( + "Required field 'id' was not found in serialized data! 
Struct: " + toString()); + } + struct.validate(); + } + + public void write(org.apache.thrift.protocol.TProtocol oprot, SimpleThriftMessage struct) + throws org.apache.thrift.TException { + struct.validate(); + + oprot.writeStructBegin(STRUCT_DESC); + oprot.writeFieldBegin(ID_FIELD_DESC); + oprot.writeI32(struct.id); + oprot.writeFieldEnd(); + if (struct.name != null) { + oprot.writeFieldBegin(NAME_FIELD_DESC); + oprot.writeString(struct.name); + oprot.writeFieldEnd(); + } + oprot.writeFieldStop(); + oprot.writeStructEnd(); + } + } + + private static class SimpleThriftMessageTupleSchemeFactory + implements org.apache.thrift.scheme.SchemeFactory { + public SimpleThriftMessageTupleScheme getScheme() { + return new SimpleThriftMessageTupleScheme(); + } + } + + private static class SimpleThriftMessageTupleScheme + extends org.apache.thrift.scheme.TupleScheme { + + @Override + public void write(org.apache.thrift.protocol.TProtocol prot, SimpleThriftMessage struct) + throws org.apache.thrift.TException { + org.apache.thrift.protocol.TTupleProtocol oprot = + (org.apache.thrift.protocol.TTupleProtocol) prot; + oprot.writeI32(struct.id); + oprot.writeString(struct.name); + } + + @Override + public void read(org.apache.thrift.protocol.TProtocol prot, SimpleThriftMessage struct) + throws org.apache.thrift.TException { + org.apache.thrift.protocol.TTupleProtocol iprot = + (org.apache.thrift.protocol.TTupleProtocol) prot; + struct.id = iprot.readI32(); + struct.setIdIsSet(true); + struct.name = iprot.readString(); + struct.setNameIsSet(true); + } + } + + private static S scheme( + org.apache.thrift.protocol.TProtocol proto) { + return (org.apache.thrift.scheme.StandardScheme.class.equals(proto.getScheme()) + ? STANDARD_SCHEME_FACTORY + : TUPLE_SCHEME_FACTORY) + .getScheme(); + } +} diff --git a/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/payloads/TestThriftMessage.java b/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/payloads/TestThriftMessage.java new file mode 100644 index 000000000000..c68d25480237 --- /dev/null +++ b/sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/payloads/TestThriftMessage.java @@ -0,0 +1,877 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift.payloads; + +@SuppressWarnings({"cast", "rawtypes", "serial", "unchecked", "unused"}) +@javax.annotation.Generated( + value = "Autogenerated by Thrift Compiler (0.13.0)", + date = "2020-12-20") +public class TestThriftMessage + implements org.apache.thrift.TBase, + java.io.Serializable, + Cloneable, + Comparable { + private static final org.apache.thrift.protocol.TStruct STRUCT_DESC = + new org.apache.thrift.protocol.TStruct("TestThriftMessage"); + + private static final org.apache.thrift.protocol.TField F_LONG_FIELD_DESC = + new org.apache.thrift.protocol.TField( + "f_long", org.apache.thrift.protocol.TType.I64, (short) 1); + private static final org.apache.thrift.protocol.TField F_INT_FIELD_DESC = + new org.apache.thrift.protocol.TField( + "f_int", org.apache.thrift.protocol.TType.I32, (short) 2); + private static final org.apache.thrift.protocol.TField F_DOUBLE_FIELD_DESC = + new org.apache.thrift.protocol.TField( + "f_double", org.apache.thrift.protocol.TType.DOUBLE, (short) 3); + private static final org.apache.thrift.protocol.TField F_STRING_FIELD_DESC = + new org.apache.thrift.protocol.TField( + "f_string", org.apache.thrift.protocol.TType.STRING, (short) 4); + private static final org.apache.thrift.protocol.TField F_DOUBLE_ARRAY_FIELD_DESC = + new org.apache.thrift.protocol.TField( + "f_double_array", org.apache.thrift.protocol.TType.LIST, (short) 5); + + private static final org.apache.thrift.scheme.SchemeFactory STANDARD_SCHEME_FACTORY = + new TestThriftMessageStandardSchemeFactory(); + private static final org.apache.thrift.scheme.SchemeFactory TUPLE_SCHEME_FACTORY = + new TestThriftMessageTupleSchemeFactory(); + + private long f_long; // required + private int f_int; // required + private double f_double; // required + private @org.apache.thrift.annotation.Nullable String f_string; // required + private @org.apache.thrift.annotation.Nullable java.util.List f_double_array; // required + + /** + * The set of fields this struct contains, along with convenience methods for finding and + * manipulating them. + */ + public enum _Fields implements org.apache.thrift.TFieldIdEnum { + F_LONG((short) 1, "f_long"), + F_INT((short) 2, "f_int"), + F_DOUBLE((short) 3, "f_double"), + F_STRING((short) 4, "f_string"), + F_DOUBLE_ARRAY((short) 5, "f_double_array"); + + private static final java.util.Map byName = + new java.util.HashMap(); + + static { + for (_Fields field : java.util.EnumSet.allOf(_Fields.class)) { + byName.put(field.getFieldName(), field); + } + } + + /** Find the _Fields constant that matches fieldId, or null if its not found. */ + @org.apache.thrift.annotation.Nullable + public static _Fields findByThriftId(int fieldId) { + switch (fieldId) { + case 1: // F_LONG + return F_LONG; + case 2: // F_INT + return F_INT; + case 3: // F_DOUBLE + return F_DOUBLE; + case 4: // F_STRING + return F_STRING; + case 5: // F_DOUBLE_ARRAY + return F_DOUBLE_ARRAY; + default: + return null; + } + } + + /** Find the _Fields constant that matches fieldId, throwing an exception if it is not found. */ + public static _Fields findByThriftIdOrThrow(int fieldId) { + _Fields fields = findByThriftId(fieldId); + if (fields == null) + throw new IllegalArgumentException("Field " + fieldId + " doesn't exist!"); + return fields; + } + + /** Find the _Fields constant that matches name, or null if its not found. 
*/ + @org.apache.thrift.annotation.Nullable + public static _Fields findByName(String name) { + return byName.get(name); + } + + private final short _thriftId; + private final String _fieldName; + + _Fields(short thriftId, String fieldName) { + _thriftId = thriftId; + _fieldName = fieldName; + } + + public short getThriftFieldId() { + return _thriftId; + } + + public String getFieldName() { + return _fieldName; + } + } + + // isset id assignments + private static final int __F_LONG_ISSET_ID = 0; + private static final int __F_INT_ISSET_ID = 1; + private static final int __F_DOUBLE_ISSET_ID = 2; + private byte __isset_bitfield = 0; + public static final java.util.Map<_Fields, org.apache.thrift.meta_data.FieldMetaData> metaDataMap; + + static { + java.util.Map<_Fields, org.apache.thrift.meta_data.FieldMetaData> tmpMap = + new java.util.EnumMap<_Fields, org.apache.thrift.meta_data.FieldMetaData>(_Fields.class); + tmpMap.put( + _Fields.F_LONG, + new org.apache.thrift.meta_data.FieldMetaData( + "f_long", + org.apache.thrift.TFieldRequirementType.REQUIRED, + new org.apache.thrift.meta_data.FieldValueMetaData( + org.apache.thrift.protocol.TType.I64))); + tmpMap.put( + _Fields.F_INT, + new org.apache.thrift.meta_data.FieldMetaData( + "f_int", + org.apache.thrift.TFieldRequirementType.REQUIRED, + new org.apache.thrift.meta_data.FieldValueMetaData( + org.apache.thrift.protocol.TType.I32))); + tmpMap.put( + _Fields.F_DOUBLE, + new org.apache.thrift.meta_data.FieldMetaData( + "f_double", + org.apache.thrift.TFieldRequirementType.REQUIRED, + new org.apache.thrift.meta_data.FieldValueMetaData( + org.apache.thrift.protocol.TType.DOUBLE))); + tmpMap.put( + _Fields.F_STRING, + new org.apache.thrift.meta_data.FieldMetaData( + "f_string", + org.apache.thrift.TFieldRequirementType.REQUIRED, + new org.apache.thrift.meta_data.FieldValueMetaData( + org.apache.thrift.protocol.TType.STRING))); + tmpMap.put( + _Fields.F_DOUBLE_ARRAY, + new org.apache.thrift.meta_data.FieldMetaData( + "f_double_array", + org.apache.thrift.TFieldRequirementType.REQUIRED, + new org.apache.thrift.meta_data.ListMetaData( + org.apache.thrift.protocol.TType.LIST, + new org.apache.thrift.meta_data.FieldValueMetaData( + org.apache.thrift.protocol.TType.DOUBLE)))); + metaDataMap = java.util.Collections.unmodifiableMap(tmpMap); + org.apache.thrift.meta_data.FieldMetaData.addStructMetaDataMap( + TestThriftMessage.class, metaDataMap); + } + + public TestThriftMessage() { + this.f_double_array = new java.util.ArrayList(); + } + + public TestThriftMessage( + long f_long, + int f_int, + double f_double, + String f_string, + java.util.List f_double_array) { + this(); + this.f_long = f_long; + setFLongIsSet(true); + this.f_int = f_int; + setFIntIsSet(true); + this.f_double = f_double; + setFDoubleIsSet(true); + this.f_string = f_string; + this.f_double_array = f_double_array; + } + + /** Performs a deep copy on other. 
*/ + public TestThriftMessage(TestThriftMessage other) { + __isset_bitfield = other.__isset_bitfield; + this.f_long = other.f_long; + this.f_int = other.f_int; + this.f_double = other.f_double; + if (other.isSetFString()) { + this.f_string = other.f_string; + } + if (other.isSetFDoubleArray()) { + java.util.List __this__f_double_array = + new java.util.ArrayList(other.f_double_array); + this.f_double_array = __this__f_double_array; + } + } + + public TestThriftMessage deepCopy() { + return new TestThriftMessage(this); + } + + @Override + public void clear() { + setFLongIsSet(false); + this.f_long = 0; + setFIntIsSet(false); + this.f_int = 0; + setFDoubleIsSet(false); + this.f_double = 0.0; + this.f_string = null; + this.f_double_array = new java.util.ArrayList(); + } + + public long getFLong() { + return this.f_long; + } + + public TestThriftMessage setFLong(long f_long) { + this.f_long = f_long; + setFLongIsSet(true); + return this; + } + + public void unsetFLong() { + __isset_bitfield = + org.apache.thrift.EncodingUtils.clearBit(__isset_bitfield, __F_LONG_ISSET_ID); + } + + /** Returns true if field f_long is set (has been assigned a value) and false otherwise */ + public boolean isSetFLong() { + return org.apache.thrift.EncodingUtils.testBit(__isset_bitfield, __F_LONG_ISSET_ID); + } + + public void setFLongIsSet(boolean value) { + __isset_bitfield = + org.apache.thrift.EncodingUtils.setBit(__isset_bitfield, __F_LONG_ISSET_ID, value); + } + + public int getFInt() { + return this.f_int; + } + + public TestThriftMessage setFInt(int f_int) { + this.f_int = f_int; + setFIntIsSet(true); + return this; + } + + public void unsetFInt() { + __isset_bitfield = org.apache.thrift.EncodingUtils.clearBit(__isset_bitfield, __F_INT_ISSET_ID); + } + + /** Returns true if field f_int is set (has been assigned a value) and false otherwise */ + public boolean isSetFInt() { + return org.apache.thrift.EncodingUtils.testBit(__isset_bitfield, __F_INT_ISSET_ID); + } + + public void setFIntIsSet(boolean value) { + __isset_bitfield = + org.apache.thrift.EncodingUtils.setBit(__isset_bitfield, __F_INT_ISSET_ID, value); + } + + public double getFDouble() { + return this.f_double; + } + + public TestThriftMessage setFDouble(double f_double) { + this.f_double = f_double; + setFDoubleIsSet(true); + return this; + } + + public void unsetFDouble() { + __isset_bitfield = + org.apache.thrift.EncodingUtils.clearBit(__isset_bitfield, __F_DOUBLE_ISSET_ID); + } + + /** Returns true if field f_double is set (has been assigned a value) and false otherwise */ + public boolean isSetFDouble() { + return org.apache.thrift.EncodingUtils.testBit(__isset_bitfield, __F_DOUBLE_ISSET_ID); + } + + public void setFDoubleIsSet(boolean value) { + __isset_bitfield = + org.apache.thrift.EncodingUtils.setBit(__isset_bitfield, __F_DOUBLE_ISSET_ID, value); + } + + @org.apache.thrift.annotation.Nullable + public String getFString() { + return this.f_string; + } + + public TestThriftMessage setFString(@org.apache.thrift.annotation.Nullable String f_string) { + this.f_string = f_string; + return this; + } + + public void unsetFString() { + this.f_string = null; + } + + /** Returns true if field f_string is set (has been assigned a value) and false otherwise */ + public boolean isSetFString() { + return this.f_string != null; + } + + public void setFStringIsSet(boolean value) { + if (!value) { + this.f_string = null; + } + } + + public int getFDoubleArraySize() { + return (this.f_double_array == null) ? 
0 : this.f_double_array.size(); + } + + @org.apache.thrift.annotation.Nullable + public java.util.Iterator getFDoubleArrayIterator() { + return (this.f_double_array == null) ? null : this.f_double_array.iterator(); + } + + public void addToFDoubleArray(double elem) { + if (this.f_double_array == null) { + this.f_double_array = new java.util.ArrayList(); + } + this.f_double_array.add(elem); + } + + @org.apache.thrift.annotation.Nullable + public java.util.List getFDoubleArray() { + return this.f_double_array; + } + + public TestThriftMessage setFDoubleArray( + @org.apache.thrift.annotation.Nullable java.util.List f_double_array) { + this.f_double_array = f_double_array; + return this; + } + + public void unsetFDoubleArray() { + this.f_double_array = null; + } + + /** Returns true if field f_double_array is set (has been assigned a value) and false otherwise */ + public boolean isSetFDoubleArray() { + return this.f_double_array != null; + } + + public void setFDoubleArrayIsSet(boolean value) { + if (!value) { + this.f_double_array = null; + } + } + + public void setFieldValue(_Fields field, @org.apache.thrift.annotation.Nullable Object value) { + switch (field) { + case F_LONG: + if (value == null) { + unsetFLong(); + } else { + setFLong((Long) value); + } + break; + + case F_INT: + if (value == null) { + unsetFInt(); + } else { + setFInt((Integer) value); + } + break; + + case F_DOUBLE: + if (value == null) { + unsetFDouble(); + } else { + setFDouble((Double) value); + } + break; + + case F_STRING: + if (value == null) { + unsetFString(); + } else { + setFString((String) value); + } + break; + + case F_DOUBLE_ARRAY: + if (value == null) { + unsetFDoubleArray(); + } else { + setFDoubleArray((java.util.List) value); + } + break; + } + } + + @org.apache.thrift.annotation.Nullable + public Object getFieldValue(_Fields field) { + switch (field) { + case F_LONG: + return getFLong(); + + case F_INT: + return getFInt(); + + case F_DOUBLE: + return getFDouble(); + + case F_STRING: + return getFString(); + + case F_DOUBLE_ARRAY: + return getFDoubleArray(); + } + throw new IllegalStateException(); + } + + /** + * Returns true if field corresponding to fieldID is set (has been assigned a value) and false + * otherwise + */ + public boolean isSet(_Fields field) { + if (field == null) { + throw new IllegalArgumentException(); + } + + switch (field) { + case F_LONG: + return isSetFLong(); + case F_INT: + return isSetFInt(); + case F_DOUBLE: + return isSetFDouble(); + case F_STRING: + return isSetFString(); + case F_DOUBLE_ARRAY: + return isSetFDoubleArray(); + } + throw new IllegalStateException(); + } + + @Override + public boolean equals(Object that) { + if (that == null) return false; + if (that instanceof TestThriftMessage) return this.equals((TestThriftMessage) that); + return false; + } + + public boolean equals(TestThriftMessage that) { + if (that == null) return false; + if (this == that) return true; + + boolean this_present_f_long = true; + boolean that_present_f_long = true; + if (this_present_f_long || that_present_f_long) { + if (!(this_present_f_long && that_present_f_long)) return false; + if (this.f_long != that.f_long) return false; + } + + boolean this_present_f_int = true; + boolean that_present_f_int = true; + if (this_present_f_int || that_present_f_int) { + if (!(this_present_f_int && that_present_f_int)) return false; + if (this.f_int != that.f_int) return false; + } + + boolean this_present_f_double = true; + boolean that_present_f_double = true; + if (this_present_f_double || 
that_present_f_double) { + if (!(this_present_f_double && that_present_f_double)) return false; + if (this.f_double != that.f_double) return false; + } + + boolean this_present_f_string = true && this.isSetFString(); + boolean that_present_f_string = true && that.isSetFString(); + if (this_present_f_string || that_present_f_string) { + if (!(this_present_f_string && that_present_f_string)) return false; + if (!this.f_string.equals(that.f_string)) return false; + } + + boolean this_present_f_double_array = true && this.isSetFDoubleArray(); + boolean that_present_f_double_array = true && that.isSetFDoubleArray(); + if (this_present_f_double_array || that_present_f_double_array) { + if (!(this_present_f_double_array && that_present_f_double_array)) return false; + if (!this.f_double_array.equals(that.f_double_array)) return false; + } + + return true; + } + + @Override + public int hashCode() { + int hashCode = 1; + + hashCode = hashCode * 8191 + org.apache.thrift.TBaseHelper.hashCode(f_long); + + hashCode = hashCode * 8191 + f_int; + + hashCode = hashCode * 8191 + org.apache.thrift.TBaseHelper.hashCode(f_double); + + hashCode = hashCode * 8191 + ((isSetFString()) ? 131071 : 524287); + if (isSetFString()) hashCode = hashCode * 8191 + f_string.hashCode(); + + hashCode = hashCode * 8191 + ((isSetFDoubleArray()) ? 131071 : 524287); + if (isSetFDoubleArray()) hashCode = hashCode * 8191 + f_double_array.hashCode(); + + return hashCode; + } + + @Override + public int compareTo(TestThriftMessage other) { + if (!getClass().equals(other.getClass())) { + return getClass().getName().compareTo(other.getClass().getName()); + } + + int lastComparison = 0; + + lastComparison = Boolean.valueOf(isSetFLong()).compareTo(other.isSetFLong()); + if (lastComparison != 0) { + return lastComparison; + } + if (isSetFLong()) { + lastComparison = org.apache.thrift.TBaseHelper.compareTo(this.f_long, other.f_long); + if (lastComparison != 0) { + return lastComparison; + } + } + lastComparison = Boolean.valueOf(isSetFInt()).compareTo(other.isSetFInt()); + if (lastComparison != 0) { + return lastComparison; + } + if (isSetFInt()) { + lastComparison = org.apache.thrift.TBaseHelper.compareTo(this.f_int, other.f_int); + if (lastComparison != 0) { + return lastComparison; + } + } + lastComparison = Boolean.valueOf(isSetFDouble()).compareTo(other.isSetFDouble()); + if (lastComparison != 0) { + return lastComparison; + } + if (isSetFDouble()) { + lastComparison = org.apache.thrift.TBaseHelper.compareTo(this.f_double, other.f_double); + if (lastComparison != 0) { + return lastComparison; + } + } + lastComparison = Boolean.valueOf(isSetFString()).compareTo(other.isSetFString()); + if (lastComparison != 0) { + return lastComparison; + } + if (isSetFString()) { + lastComparison = org.apache.thrift.TBaseHelper.compareTo(this.f_string, other.f_string); + if (lastComparison != 0) { + return lastComparison; + } + } + lastComparison = Boolean.valueOf(isSetFDoubleArray()).compareTo(other.isSetFDoubleArray()); + if (lastComparison != 0) { + return lastComparison; + } + if (isSetFDoubleArray()) { + lastComparison = + org.apache.thrift.TBaseHelper.compareTo(this.f_double_array, other.f_double_array); + if (lastComparison != 0) { + return lastComparison; + } + } + return 0; + } + + @org.apache.thrift.annotation.Nullable + public _Fields fieldForId(int fieldId) { + return _Fields.findByThriftId(fieldId); + } + + public void read(org.apache.thrift.protocol.TProtocol iprot) throws org.apache.thrift.TException { + scheme(iprot).read(iprot, this); 
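The generated read()/write() methods above hand off to either the standard or the tuple scheme depending on the protocol in use. For orientation only, here is a minimal round-trip sketch (not part of this change) that exercises the generated struct with the compact protocol; the class name and field values are illustrative, and it assumes the generated TestThriftMessage lives in the org.apache.beam.sdk.io.thrift.payloads package declared by payload.thrift.

```java
package org.apache.beam.sdk.io.thrift.payloads;

import org.apache.thrift.TDeserializer;
import org.apache.thrift.TException;
import org.apache.thrift.TSerializer;
import org.apache.thrift.protocol.TCompactProtocol;

/** Illustrative round trip of the generated struct; not part of the change. */
public class TestThriftMessageRoundTrip {
  public static void main(String[] args) throws TException {
    // The fluent setters also flip the corresponding isset bits for the primitive fields.
    TestThriftMessage original =
        new TestThriftMessage()
            .setFLong(42L)
            .setFInt(7)
            .setFDouble(3.5)
            .setFString("beam");
    original.addToFDoubleArray(1.0);
    original.addToFDoubleArray(2.0);

    // Serialize with the compact protocol, then read the bytes back into a fresh instance.
    byte[] bytes = new TSerializer(new TCompactProtocol.Factory()).serialize(original);
    TestThriftMessage copy = new TestThriftMessage();
    new TDeserializer(new TCompactProtocol.Factory()).deserialize(copy, bytes);

    // The generated equals() compares field by field, so the copy matches the original.
    System.out.println("round trip equal: " + original.equals(copy));
  }
}
```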
+ } + + public void write(org.apache.thrift.protocol.TProtocol oprot) + throws org.apache.thrift.TException { + scheme(oprot).write(oprot, this); + } + + @Override + public String toString() { + StringBuilder sb = new StringBuilder("TestThriftMessage("); + boolean first = true; + + sb.append("f_long:"); + sb.append(this.f_long); + first = false; + if (!first) sb.append(", "); + sb.append("f_int:"); + sb.append(this.f_int); + first = false; + if (!first) sb.append(", "); + sb.append("f_double:"); + sb.append(this.f_double); + first = false; + if (!first) sb.append(", "); + sb.append("f_string:"); + if (this.f_string == null) { + sb.append("null"); + } else { + sb.append(this.f_string); + } + first = false; + if (!first) sb.append(", "); + sb.append("f_double_array:"); + if (this.f_double_array == null) { + sb.append("null"); + } else { + sb.append(this.f_double_array); + } + first = false; + sb.append(")"); + return sb.toString(); + } + + public void validate() throws org.apache.thrift.TException { + // check for required fields + // alas, we cannot check 'f_long' because it's a primitive and you chose the non-beans + // generator. + // alas, we cannot check 'f_int' because it's a primitive and you chose the non-beans generator. + // alas, we cannot check 'f_double' because it's a primitive and you chose the non-beans + // generator. + if (f_string == null) { + throw new org.apache.thrift.protocol.TProtocolException( + "Required field 'f_string' was not present! Struct: " + toString()); + } + if (f_double_array == null) { + throw new org.apache.thrift.protocol.TProtocolException( + "Required field 'f_double_array' was not present! Struct: " + toString()); + } + // check for sub-struct validity + } + + private void writeObject(java.io.ObjectOutputStream out) throws java.io.IOException { + try { + write( + new org.apache.thrift.protocol.TCompactProtocol( + new org.apache.thrift.transport.TIOStreamTransport(out))); + } catch (org.apache.thrift.TException te) { + throw new java.io.IOException(te); + } + } + + private void readObject(java.io.ObjectInputStream in) + throws java.io.IOException, ClassNotFoundException { + try { + // it doesn't seem like you should have to do this, but java serialization is wacky, and + // doesn't call the default constructor. 
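Note that the generated validate() only guards the object-typed required fields (f_string and f_double_array); the required primitives are instead enforced via the isset bits during deserialization, as the scheme code further below shows. A small illustrative sketch (again not part of the change, with hypothetical class name) of validate() rejecting a message whose f_string was never set:

```java
package org.apache.beam.sdk.io.thrift.payloads;

import org.apache.thrift.TException;
import org.apache.thrift.protocol.TProtocolException;

/** Illustrative check of the generated validate(); not part of the change. */
public class ValidateRequiredFieldsExample {
  public static void main(String[] args) {
    // All required primitives are set, but the required f_string is deliberately left null.
    TestThriftMessage message = new TestThriftMessage().setFLong(1L).setFInt(2).setFDouble(3.0);
    message.addToFDoubleArray(4.0);

    try {
      message.validate();
      throw new AssertionError("validate() should reject a message without f_string");
    } catch (TProtocolException expected) {
      // Prints the "Required field 'f_string' was not present!" message built by validate().
      System.out.println(expected.getMessage());
    } catch (TException other) {
      throw new AssertionError(other);
    }
  }
}
```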
+ __isset_bitfield = 0; + read( + new org.apache.thrift.protocol.TCompactProtocol( + new org.apache.thrift.transport.TIOStreamTransport(in))); + } catch (org.apache.thrift.TException te) { + throw new java.io.IOException(te); + } + } + + private static class TestThriftMessageStandardSchemeFactory + implements org.apache.thrift.scheme.SchemeFactory { + public TestThriftMessageStandardScheme getScheme() { + return new TestThriftMessageStandardScheme(); + } + } + + private static class TestThriftMessageStandardScheme + extends org.apache.thrift.scheme.StandardScheme { + + public void read(org.apache.thrift.protocol.TProtocol iprot, TestThriftMessage struct) + throws org.apache.thrift.TException { + org.apache.thrift.protocol.TField schemeField; + iprot.readStructBegin(); + while (true) { + schemeField = iprot.readFieldBegin(); + if (schemeField.type == org.apache.thrift.protocol.TType.STOP) { + break; + } + switch (schemeField.id) { + case 1: // F_LONG + if (schemeField.type == org.apache.thrift.protocol.TType.I64) { + struct.f_long = iprot.readI64(); + struct.setFLongIsSet(true); + } else { + org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type); + } + break; + case 2: // F_INT + if (schemeField.type == org.apache.thrift.protocol.TType.I32) { + struct.f_int = iprot.readI32(); + struct.setFIntIsSet(true); + } else { + org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type); + } + break; + case 3: // F_DOUBLE + if (schemeField.type == org.apache.thrift.protocol.TType.DOUBLE) { + struct.f_double = iprot.readDouble(); + struct.setFDoubleIsSet(true); + } else { + org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type); + } + break; + case 4: // F_STRING + if (schemeField.type == org.apache.thrift.protocol.TType.STRING) { + struct.f_string = iprot.readString(); + struct.setFStringIsSet(true); + } else { + org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type); + } + break; + case 5: // F_DOUBLE_ARRAY + if (schemeField.type == org.apache.thrift.protocol.TType.LIST) { + { + org.apache.thrift.protocol.TList _list0 = iprot.readListBegin(); + struct.f_double_array = new java.util.ArrayList(_list0.size); + double _elem1; + for (int _i2 = 0; _i2 < _list0.size; ++_i2) { + _elem1 = iprot.readDouble(); + struct.f_double_array.add(_elem1); + } + iprot.readListEnd(); + } + struct.setFDoubleArrayIsSet(true); + } else { + org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type); + } + break; + default: + org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type); + } + iprot.readFieldEnd(); + } + iprot.readStructEnd(); + + // check for required fields of primitive type, which can't be checked in the validate method + if (!struct.isSetFLong()) { + throw new org.apache.thrift.protocol.TProtocolException( + "Required field 'f_long' was not found in serialized data! Struct: " + toString()); + } + if (!struct.isSetFInt()) { + throw new org.apache.thrift.protocol.TProtocolException( + "Required field 'f_int' was not found in serialized data! Struct: " + toString()); + } + if (!struct.isSetFDouble()) { + throw new org.apache.thrift.protocol.TProtocolException( + "Required field 'f_double' was not found in serialized data! 
Struct: " + toString()); + } + struct.validate(); + } + + public void write(org.apache.thrift.protocol.TProtocol oprot, TestThriftMessage struct) + throws org.apache.thrift.TException { + struct.validate(); + + oprot.writeStructBegin(STRUCT_DESC); + oprot.writeFieldBegin(F_LONG_FIELD_DESC); + oprot.writeI64(struct.f_long); + oprot.writeFieldEnd(); + oprot.writeFieldBegin(F_INT_FIELD_DESC); + oprot.writeI32(struct.f_int); + oprot.writeFieldEnd(); + oprot.writeFieldBegin(F_DOUBLE_FIELD_DESC); + oprot.writeDouble(struct.f_double); + oprot.writeFieldEnd(); + if (struct.f_string != null) { + oprot.writeFieldBegin(F_STRING_FIELD_DESC); + oprot.writeString(struct.f_string); + oprot.writeFieldEnd(); + } + if (struct.f_double_array != null) { + oprot.writeFieldBegin(F_DOUBLE_ARRAY_FIELD_DESC); + { + oprot.writeListBegin( + new org.apache.thrift.protocol.TList( + org.apache.thrift.protocol.TType.DOUBLE, struct.f_double_array.size())); + for (double _iter3 : struct.f_double_array) { + oprot.writeDouble(_iter3); + } + oprot.writeListEnd(); + } + oprot.writeFieldEnd(); + } + oprot.writeFieldStop(); + oprot.writeStructEnd(); + } + } + + private static class TestThriftMessageTupleSchemeFactory + implements org.apache.thrift.scheme.SchemeFactory { + public TestThriftMessageTupleScheme getScheme() { + return new TestThriftMessageTupleScheme(); + } + } + + private static class TestThriftMessageTupleScheme + extends org.apache.thrift.scheme.TupleScheme { + + @Override + public void write(org.apache.thrift.protocol.TProtocol prot, TestThriftMessage struct) + throws org.apache.thrift.TException { + org.apache.thrift.protocol.TTupleProtocol oprot = + (org.apache.thrift.protocol.TTupleProtocol) prot; + oprot.writeI64(struct.f_long); + oprot.writeI32(struct.f_int); + oprot.writeDouble(struct.f_double); + oprot.writeString(struct.f_string); + { + oprot.writeI32(struct.f_double_array.size()); + for (double _iter4 : struct.f_double_array) { + oprot.writeDouble(_iter4); + } + } + } + + @Override + public void read(org.apache.thrift.protocol.TProtocol prot, TestThriftMessage struct) + throws org.apache.thrift.TException { + org.apache.thrift.protocol.TTupleProtocol iprot = + (org.apache.thrift.protocol.TTupleProtocol) prot; + struct.f_long = iprot.readI64(); + struct.setFLongIsSet(true); + struct.f_int = iprot.readI32(); + struct.setFIntIsSet(true); + struct.f_double = iprot.readDouble(); + struct.setFDoubleIsSet(true); + struct.f_string = iprot.readString(); + struct.setFStringIsSet(true); + { + org.apache.thrift.protocol.TList _list5 = + new org.apache.thrift.protocol.TList( + org.apache.thrift.protocol.TType.DOUBLE, iprot.readI32()); + struct.f_double_array = new java.util.ArrayList(_list5.size); + double _elem6; + for (int _i7 = 0; _i7 < _list5.size; ++_i7) { + _elem6 = iprot.readDouble(); + struct.f_double_array.add(_elem6); + } + } + struct.setFDoubleArrayIsSet(true); + } + } + + private static S scheme( + org.apache.thrift.protocol.TProtocol proto) { + return (org.apache.thrift.scheme.StandardScheme.class.equals(proto.getScheme()) + ? 
STANDARD_SCHEME_FACTORY + : TUPLE_SCHEME_FACTORY) + .getScheme(); + } +} diff --git a/sdks/java/io/thrift/src/test/resources/org/apache/beam/sdk/io/thrift/payloads/thrift_test.thrift b/sdks/java/io/thrift/src/test/resources/org/apache/beam/sdk/io/thrift/payloads/thrift_test.thrift new file mode 100644 index 000000000000..df3a0b625b6d --- /dev/null +++ b/sdks/java/io/thrift/src/test/resources/org/apache/beam/sdk/io/thrift/payloads/thrift_test.thrift @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* +This thrift file is used to generate the TestThrift* classes: + +thrift --gen java:beans \ + -out sdks/java/io/thrift/src/test/java/ \ + sdks/java/io/thrift/src/test/resources/thrift/thrift_test.thrift + +./gradlew :sdks:java:io:thrift:spotlessApply +*/ + +namespace java org.apache.beam.sdk.io.thrift + +typedef string Name +typedef i16 Age + +struct TestThriftInnerStruct { + 1: Name testNameTypedef = "kid" + 2: Age testAgeTypedef = 12 +} + +enum TestThriftEnum { C1, C2 } + +union TestThriftUnion { + 1: TestThriftInnerStruct snake_case_nested_struct; + 2: TestThriftEnum camelCaseEnum; +} + +typedef set StringSet + +struct TestThriftStruct { + 1: i8 testByte + 2: i16 testShort + 3: i32 testInt + 4: i64 testLong + 5: double testDouble + 6: map stringIntMap + 7: binary testBinary + 8: bool testBool + 9: list testList + 10: StringSet testStringSetTypedef + 11: TestThriftEnum testEnum + 12: TestThriftInnerStruct testNested + 13: TestThriftUnion testUnion +} diff --git a/sdks/java/io/thrift/src/test/thrift/payload.thrift b/sdks/java/io/thrift/src/test/thrift/payload.thrift new file mode 100644 index 000000000000..7544d102b83e --- /dev/null +++ b/sdks/java/io/thrift/src/test/thrift/payload.thrift @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +/* +thrift --gen java:private-members,fullcamel \ + -out sdks/java/io/thrift/src/test/java/ \ + sdks/java/io/thrift/src/test/thrift/payload.thrift + +./gradlew :sdks:java:extensions:sql:spotlessApply +*/ + +namespace java org.apache.beam.sdk.io.thrift.payloads + +struct TestThriftMessage { + 1: required i64 f_long + 2: required i32 f_int + 3: required double f_double + 4: required string f_string + 5: required list f_double_array = 5 +} + +struct SimpleThriftMessage { + 1: required i32 id + 2: required string name +} + +struct ItThriftMessage { + 1: required i64 f_long + 2: required i32 f_int + 3: required string f_string +} diff --git a/sdks/java/io/tika/build.gradle b/sdks/java/io/tika/build.gradle index 28c6c622a31f..0b2a9e718ec9 100644 --- a/sdks/java/io/tika/build.gradle +++ b/sdks/java/io/tika/build.gradle @@ -22,18 +22,15 @@ applyJavaNature(automaticModuleName: 'org.apache.beam.sdk.io.tika') description = "Apache Beam :: SDKs :: Java :: IO :: Tika" ext.summary = "Tika Input to parse files." -def tika_version = "1.24.1" -def bndlib_version = "1.43.0" +def tika_version = "1.26" +def bndlib_version = "1.50.0" dependencies { compile library.java.vendored_guava_26_0_jre - compileOnly "biz.aQute:bndlib:$bndlib_version" compile project(path: ":sdks:java:core", configuration: "shadow") compile "org.apache.tika:tika-core:$tika_version" testCompile project(path: ":sdks:java:core", configuration: "shadowTest") testCompile library.java.junit - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library testCompile "org.apache.tika:tika-parsers:$tika_version" testCompileOnly "biz.aQute:bndlib:$bndlib_version" testRuntimeOnly project(path: ":runners:direct-java", configuration: "shadow") diff --git a/sdks/java/io/tika/src/test/java/org/apache/beam/sdk/io/tika/ParseResultTest.java b/sdks/java/io/tika/src/test/java/org/apache/beam/sdk/io/tika/ParseResultTest.java index 9b324db2b36d..05de20f71e61 100644 --- a/sdks/java/io/tika/src/test/java/org/apache/beam/sdk/io/tika/ParseResultTest.java +++ b/sdks/java/io/tika/src/test/java/org/apache/beam/sdk/io/tika/ParseResultTest.java @@ -19,8 +19,8 @@ import static org.hamcrest.CoreMatchers.equalTo; import static org.hamcrest.CoreMatchers.not; +import static org.hamcrest.MatcherAssert.assertThat; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import org.apache.tika.metadata.Metadata; import org.junit.Test; diff --git a/sdks/java/io/tika/src/test/java/org/apache/beam/sdk/io/tika/TikaIOTest.java b/sdks/java/io/tika/src/test/java/org/apache/beam/sdk/io/tika/TikaIOTest.java index e254fd82fd98..93c0446dfe67 100644 --- a/sdks/java/io/tika/src/test/java/org/apache/beam/sdk/io/tika/TikaIOTest.java +++ b/sdks/java/io/tika/src/test/java/org/apache/beam/sdk/io/tika/TikaIOTest.java @@ -18,9 +18,9 @@ package org.apache.beam.sdk.io.tika; import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasDisplayItem; +import static org.hamcrest.MatcherAssert.assertThat; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.io.IOException; @@ -46,9 +46,6 @@ /** Tests for {@link TikaIO}. 
*/ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class TikaIOTest implements Serializable { private static final String PDF_ZIP_FILE = "\n\n\n\n\n\n\n\napache-beam-tika.pdf\n\n\nCombining\n\n\nApache Beam\n\n\n" diff --git a/sdks/java/io/xml/build.gradle b/sdks/java/io/xml/build.gradle index d6c8ba2761e3..3b7698db7b71 100644 --- a/sdks/java/io/xml/build.gradle +++ b/sdks/java/io/xml/build.gradle @@ -25,14 +25,15 @@ ext.summary = "IO to read and write XML files." dependencies { compile library.java.jaxb_api compile library.java.jaxb_impl + permitUnusedDeclared library.java.jaxb_impl // BEAM-11761 compile library.java.vendored_guava_26_0_jre compile project(path: ":sdks:java:core", configuration: "shadow") compile library.java.stax2_api + compile "javax.xml.stream:stax-api:1.0-2" compile library.java.woodstox_core_asl + permitUnusedDeclared library.java.woodstox_core_asl // BEAM-11761 testCompile project(path: ":sdks:java:core", configuration: "shadowTest") testCompile library.java.junit - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library testRuntimeOnly library.java.slf4j_jdk14 testRuntimeOnly project(path: ":runners:direct-java", configuration: "shadow") } diff --git a/sdks/java/io/xml/src/main/java/org/apache/beam/sdk/io/xml/XmlSource.java b/sdks/java/io/xml/src/main/java/org/apache/beam/sdk/io/xml/XmlSource.java index a7bd320b5fea..04bb74a37cd1 100644 --- a/sdks/java/io/xml/src/main/java/org/apache/beam/sdk/io/xml/XmlSource.java +++ b/sdks/java/io/xml/src/main/java/org/apache/beam/sdk/io/xml/XmlSource.java @@ -233,6 +233,8 @@ private long getFirstOccurenceOfRecordElement( int charBytesFound = 0; ByteBuffer buf = ByteBuffer.allocate(BUF_SIZE); + boolean bufSizeChanged = false; + byte[] recordStartBytes = ("<" + getCurrentSource().configuration.getRecordElement()) .getBytes(StandardCharsets.UTF_8); @@ -281,7 +283,16 @@ private long getFirstOccurenceOfRecordElement( break outer; } else { // Matching was unsuccessful. Reset the buffer to include bytes read for the char. - ByteBuffer newbuf = ByteBuffer.allocate(BUF_SIZE); + int bytesToWrite = buf.remaining() + charBytes.length; + ByteBuffer newbuf; + if (bytesToWrite > BUF_SIZE) { + // Avoiding buffer overflow. The number of bytes to push to the buffer might be + // larger than BUF_SIZE due to additional 'charBytes'. + newbuf = ByteBuffer.allocate(bytesToWrite); + bufSizeChanged = true; + } else { + newbuf = ByteBuffer.allocate(BUF_SIZE); + } newbuf.put(charBytes); offsetInFileOfCurrentByte -= charBytes.length; while (buf.hasRemaining()) { @@ -320,7 +331,14 @@ private long getFirstOccurenceOfRecordElement( recordStartBytesMatched = true; } } - buf.clear(); + if (bufSizeChanged) { + // We have to reset the size of the buffer to 'BUF_SIZE' + // to prevent it from infinitely increasing. 
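The XmlSource change here grows the scratch buffer only when the leftover bytes from a partial match plus the bytes of the just-read character exceed BUF_SIZE, and then (immediately below) falls back to a plain BUF_SIZE buffer on the next pass so the buffer cannot grow without bound. A standalone sketch of that grow-then-reset pattern, with illustrative names and sizes rather than the actual XmlSource fields:

```java
import java.nio.ByteBuffer;

/** Standalone sketch of the grow-then-reset pattern; names and sizes are illustrative. */
public class GrowThenResetBufferExample {
  private static final int BUF_SIZE = 8;

  // Re-buffers the unconsumed bytes plus the bytes of the character that failed to match.
  // Growing past BUF_SIZE only when strictly necessary avoids a BufferOverflowException,
  // and the caller goes back to a plain BUF_SIZE buffer on the next iteration.
  static ByteBuffer rebuffer(ByteBuffer pending, byte[] charBytes) {
    int bytesToWrite = pending.remaining() + charBytes.length;
    ByteBuffer newbuf = ByteBuffer.allocate(Math.max(bytesToWrite, BUF_SIZE));
    newbuf.put(charBytes);
    while (pending.hasRemaining()) {
      newbuf.put(pending.get());
    }
    return newbuf;
  }

  public static void main(String[] args) {
    ByteBuffer pending = ByteBuffer.allocate(BUF_SIZE);
    pending.put(new byte[] {'<', 't', 'r', 'a', 'i', 'n'});
    pending.flip(); // Six bytes of a partial match are waiting to be replayed.

    // A three-byte UTF-8 character arrives: 6 + 3 > BUF_SIZE, so the temporary buffer must grow.
    ByteBuffer grown = rebuffer(pending, new byte[] {(byte) 0xE2, (byte) 0x82, (byte) 0xAC});
    System.out.println("capacity=" + grown.capacity()); // 9, larger than BUF_SIZE
  }
}
```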
+ buf = ByteBuffer.allocate(BUF_SIZE); + bufSizeChanged = false; + } else { + buf.clear(); + } } if (!fullyMatched) { diff --git a/sdks/java/io/xml/src/test/java/org/apache/beam/sdk/io/xml/JAXBCoderTest.java b/sdks/java/io/xml/src/test/java/org/apache/beam/sdk/io/xml/JAXBCoderTest.java index 8d995edd049b..d3444094cdf3 100644 --- a/sdks/java/io/xml/src/test/java/org/apache/beam/sdk/io/xml/JAXBCoderTest.java +++ b/sdks/java/io/xml/src/test/java/org/apache/beam/sdk/io/xml/JAXBCoderTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.sdk.io.xml; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import java.io.IOException; import java.io.InputStream; @@ -46,9 +46,6 @@ /** Unit tests for {@link JAXBCoder}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class JAXBCoderTest { @XmlRootElement diff --git a/sdks/java/io/xml/src/test/java/org/apache/beam/sdk/io/xml/XmlIOTest.java b/sdks/java/io/xml/src/test/java/org/apache/beam/sdk/io/xml/XmlIOTest.java index ac4543244317..e0a67543b7cf 100644 --- a/sdks/java/io/xml/src/test/java/org/apache/beam/sdk/io/xml/XmlIOTest.java +++ b/sdks/java/io/xml/src/test/java/org/apache/beam/sdk/io/xml/XmlIOTest.java @@ -37,7 +37,6 @@ import org.apache.beam.sdk.values.PCollection; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; import org.checkerframework.checker.nullness.qual.Nullable; -import org.junit.Assert; import org.junit.Rule; import org.junit.Test; import org.junit.rules.ExpectedException; @@ -47,9 +46,6 @@ /** Tests for {@link XmlIO}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class XmlIOTest { @Rule public TemporaryFolder tmpFolder = new TemporaryFolder(); @@ -187,11 +183,11 @@ public void testDisplayData() { .withMinBundleSize(1234) .withRecordClass(Integer.class)); - Assert.assertThat(displayData, hasDisplayItem("filePattern", "foo.xml")); - Assert.assertThat(displayData, hasDisplayItem("rootElement", "bird")); - Assert.assertThat(displayData, hasDisplayItem("recordElement", "cat")); - Assert.assertThat(displayData, hasDisplayItem("recordClass", Integer.class)); - Assert.assertThat(displayData, hasDisplayItem("minBundleSize", 1234)); + assertThat(displayData, hasDisplayItem("filePattern", "foo.xml")); + assertThat(displayData, hasDisplayItem("rootElement", "bird")); + assertThat(displayData, hasDisplayItem("recordElement", "cat")); + assertThat(displayData, hasDisplayItem("recordClass", Integer.class)); + assertThat(displayData, hasDisplayItem("minBundleSize", 1234)); } @Test diff --git a/sdks/java/io/xml/src/test/java/org/apache/beam/sdk/io/xml/XmlSourceTest.java b/sdks/java/io/xml/src/test/java/org/apache/beam/sdk/io/xml/XmlSourceTest.java index f84f8f404a14..d45942eebc0c 100644 --- a/sdks/java/io/xml/src/test/java/org/apache/beam/sdk/io/xml/XmlSourceTest.java +++ b/sdks/java/io/xml/src/test/java/org/apache/beam/sdk/io/xml/XmlSourceTest.java @@ -20,9 +20,9 @@ import static org.apache.beam.sdk.testing.SourceTestUtils.assertSplitAtFractionExhaustive; import static org.apache.beam.sdk.testing.SourceTestUtils.assertSplitAtFractionFails; import static org.apache.beam.sdk.testing.SourceTestUtils.assertSplitAtFractionSucceedsAndConsistent; +import static org.hamcrest.MatcherAssert.assertThat; import static 
org.hamcrest.Matchers.containsInAnyOrder; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; import java.io.BufferedWriter; @@ -58,9 +58,6 @@ * XmlIOTest}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class XmlSourceTest { @Rule public TestPipeline p = TestPipeline.create(); @@ -158,6 +155,12 @@ public class XmlSourceTest { + "Cédric7blue" + ""; + private String trainXMLWithTrainTagsTemplate = + "" + + "%trainTags%Thomas1blue" + + "%trainTags%Henry3green" + + ""; + @XmlRootElement static class TinyTrain { TinyTrain(String name) { @@ -873,6 +876,42 @@ public void testSplitAtFractionExhaustiveSingleByte() throws Exception { assertSplitAtFractionExhaustive(source, options); } + @Test + public void testNoBufferOverflowThrown() throws IOException { + // The magicNumber was found imperatively and will be different for different xml content. + // Test with the current setup causes BufferOverflow in + // XMLReader#getFirstOccurenceOfRecordElement method, + // if the specific corner case is not handled + final int magicNumber = 51; + StringBuilder tagsSb = new StringBuilder(); + for (int j = 0; j < magicNumber; j++) { + // tags which start the same way as the record element, trigger + // a special flow, which could end up with BufferOverflow + // exception + tagsSb.append("").append(j).append(""); + } + File file = tempFolder.newFile("trainXMLWithTags"); + + String xmlWithNoise = trainXMLWithTrainTagsTemplate.replace("%trainTags%", tagsSb.toString()); + Files.write(file.toPath(), xmlWithNoise.getBytes(StandardCharsets.UTF_8)); + + PCollection output = + p.apply( + "ReadFileData", + XmlIO.read() + .from(file.toPath().toString()) + .withRootElement("trains") + .withRecordElement("train") + .withRecordClass(Train.class) + .withMinBundleSize(1024)); + + List expectedResults = + ImmutableList.of( + new Train("Thomas", 1, "blue", null), new Train("Henry", 3, "green", null)); + PAssert.that(output).containsInAnyOrder(expectedResults); + p.run(); + } + @Test @Ignore( "Multi-byte characters in XML are not supported because the parser " diff --git a/sdks/java/javadoc/build.gradle b/sdks/java/javadoc/build.gradle index fbdc03dfe73d..62d98f70ac49 100644 --- a/sdks/java/javadoc/build.gradle +++ b/sdks/java/javadoc/build.gradle @@ -28,7 +28,7 @@ applyJavaNature(publish: false) description = "Apache Beam :: SDKs :: Java :: Aggregated Javadoc" for (p in rootProject.subprojects) { - if (!p.path.equals(project.path) && !p.path.equals(':sdks:java:bom')) { + if (!p.path.equals(project.path) && !p.path.startsWith(':sdks:java:bom')) { evaluationDependsOn(p.path) } } @@ -36,7 +36,7 @@ for (p in rootProject.subprojects) { ext.getExportedJavadocProjects = { def exportedJavadocProjects = new ArrayList<>(); for (p in rootProject.subprojects) { - if (!p.path.equals(project.path) && !p.path.equals(':sdks:java:bom')) { + if (!p.path.equals(project.path) && !p.path.startsWith(':sdks:java:bom')) { def subproject = p // project(':' + p.name) if (subproject.ext.properties.containsKey('exportJavadoc') && subproject.ext.properties.exportJavadoc) { diff --git a/sdks/java/maven-archetypes/examples/build.gradle b/sdks/java/maven-archetypes/examples/build.gradle index 43f3021e20d4..8f0773160d66 100644 --- a/sdks/java/maven-archetypes/examples/build.gradle +++ b/sdks/java/maven-archetypes/examples/build.gradle @@ -45,7 +45,7 @@ processResources { 'maven-jar-plugin.version': 
dependencies.create(project.library.maven.maven_jar_plugin).getVersion(), 'maven-shade-plugin.version': dependencies.create(project.library.maven.maven_shade_plugin).getVersion(), 'maven-surefire-plugin.version': dependencies.create(project.library.maven.maven_surefire_plugin).getVersion(), - 'flink.artifact.name': 'beam-runners-flink-'.concat(project(":runners:flink:1.10").getName()), + 'flink.artifact.name': 'beam-runners-flink-'.concat(project(":runners:flink:${project.ext.latestFlinkVersion}").getName()), ] } @@ -75,7 +75,3 @@ sourceSets { output.dir('src', builtBy: 'generateSources') } } - -dependencies { - compile project(":examples:java") -} diff --git a/sdks/java/maven-archetypes/examples/generate-sources.sh b/sdks/java/maven-archetypes/examples/generate-sources.sh index c2b686cf9a32..5df4c1f77616 100755 --- a/sdks/java/maven-archetypes/examples/generate-sources.sh +++ b/sdks/java/maven-archetypes/examples/generate-sources.sh @@ -21,7 +21,7 @@ # Usage: Invoke with no arguments from any working directory. # The directory of this script. Assumes root of the maven-archetypes module. -HERE="$( dirname "$0" )" +HERE=${HERE-"$( dirname "$0" )"} # The directory of the examples-java module EXAMPLES_ROOT="${HERE}/../../../../examples/java" @@ -29,11 +29,13 @@ EXAMPLES_ROOT="${HERE}/../../../../examples/java" # The root of the examples archetype ARCHETYPE_ROOT="${HERE}/src/main/resources/archetype-resources" +SAMPLE_FILE="${HERE}/sample.txt" + mkdir -p "${ARCHETYPE_ROOT}/src/main/java" mkdir -p "${ARCHETYPE_ROOT}/src/test/java" # -# Copy the Java subset of the examples project verbatim. +# Copy the Java subset of the examples project verbatim. # rsync -a --exclude cookbook --exclude complete --exclude snippets \ "${EXAMPLES_ROOT}"/src/main/java/org/apache/beam/examples/ \ @@ -45,6 +47,15 @@ rsync -a --exclude cookbook --exclude complete --exclude snippets --exclude '*IT "${ARCHETYPE_ROOT}/src/test/java" \ --delete +# +# Copy sample.txt from this dir to archetype root. +# +if [ -f "$SAMPLE_FILE" ]; then + rsync -a \ + "${SAMPLE_FILE}" \ + "${ARCHETYPE_ROOT}" +fi + mkdir -p "${ARCHETYPE_ROOT}/src/main/java/complete/game" mkdir -p "${ARCHETYPE_ROOT}/src/test/java/complete/game" @@ -78,7 +89,7 @@ find "${ARCHETYPE_ROOT}/src/test/java" -name '*.java' -print0 \ | xargs -0 sed -i.bak 's/^import org\.apache\.beam\.examples/import ${package}/g' # -# The use of -i.bak is necessary for the above to work with both GNU and BSD sed. +# The use of -i.bak is necessary for the above to work with both GNU and BSD sed. # Delete the files now. # find "${ARCHETYPE_ROOT}/src" -name '*.bak' -delete diff --git a/sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources/pom.xml b/sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources/pom.xml index 95c906f7360f..4560e89d9c0c 100644 --- a/sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources/pom.xml +++ b/sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources/pom.xml @@ -132,6 +132,9 @@ + + reference.conf + diff --git a/sdks/java/maven-archetypes/gcp-bom-examples/build.gradle b/sdks/java/maven-archetypes/gcp-bom-examples/build.gradle new file mode 100644 index 000000000000..ad3328bfcabe --- /dev/null +++ b/sdks/java/maven-archetypes/gcp-bom-examples/build.gradle @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +plugins { id 'org.apache.beam.module' } +applyJavaNature(exportJavadoc: false, automaticModuleName: 'org.apache.beam.maven.archetypes.gcp.bom.examples') + +// Based off of :sdks:java:maven-archetypes:examples project +description = "Apache Beam :: SDKs :: Java :: Maven Archetypes :: Google Cloud Platform BOM Examples" +ext.summary = """A Maven Archetype to create a project +using the Beam Google Cloud Platform BOM""" + +processResources { + filter org.apache.tools.ant.filters.ReplaceTokens, tokens: [ + 'project.version': version, + 'bigquery.version': dependencies.create(project.library.java.google_api_services_bigquery).getVersion(), + 'google-api-client.version': dependencies.create(project.library.java.google_api_client).getVersion(), + 'hamcrest.version': dependencies.create(project.library.java.hamcrest_library).getVersion(), + 'jackson.version': dependencies.create(project.library.java.jackson_core).getVersion(), + 'joda.version': dependencies.create(project.library.java.joda_time).getVersion(), + 'junit.version': dependencies.create(project.library.java.junit).getVersion(), + 'pubsub.version': dependencies.create(project.library.java.google_api_services_pubsub).getVersion(), + 'slf4j.version': dependencies.create(project.library.java.slf4j_api).getVersion(), + 'spark.version': dependencies.create(project.library.java.spark_core).getVersion(), + 'nemo.version': dependencies.create(project.library.java.nemo_compiler_frontend_beam).getVersion(), + 'hadoop.version': dependencies.create(project.library.java.hadoop_client).getVersion(), + 'mockito.version': dependencies.create(project.library.java.mockito_core).getVersion(), + 'maven-compiler-plugin.version': dependencies.create(project.library.maven.maven_compiler_plugin).getVersion(), + 'maven-exec-plugin.version': dependencies.create(project.library.maven.maven_exec_plugin).getVersion(), + 'maven-jar-plugin.version': dependencies.create(project.library.maven.maven_jar_plugin).getVersion(), + 'maven-shade-plugin.version': dependencies.create(project.library.maven.maven_shade_plugin).getVersion(), + 'maven-surefire-plugin.version': dependencies.create(project.library.maven.maven_surefire_plugin).getVersion(), + 'flink.artifact.name': 'beam-runners-flink-'.concat(project(":runners:flink:${project.ext.latestFlinkVersion}").getName()), + ] +} + +/* + * We need to rely on manually specifying these evaluationDependsOn to ensure that + * the following projects are evaluated before we evaluate this project. This is because + * we are attempting to reference the "sourceSets.{main|test}.allSource" directly. 
+ */ +evaluationDependsOn(':examples:java') + +task generateSources(type: Exec) { + inputs.file('../examples/generate-sources.sh') + .withPropertyName('generate-sources.sh') + .withPathSensitivity(PathSensitivity.RELATIVE) + inputs.files(project(':examples:java').sourceSets.main.allSource) + .withPropertyName('sourcesMain') + .withPathSensitivity(PathSensitivity.RELATIVE) + inputs.files(project(':examples:java').sourceSets.test.allSource) + .withPropertyName('sourcesTest') + .withPathSensitivity(PathSensitivity.RELATIVE) + outputs.dir('src/main/resources/archetype-resources/src') + environment "HERE", "." + commandLine '../examples/generate-sources.sh' +} + +sourceSets { + main { + output.dir('src', builtBy: 'generateSources') + } +} diff --git a/sdks/java/maven-archetypes/gcp-bom-examples/pom.xml b/sdks/java/maven-archetypes/gcp-bom-examples/pom.xml new file mode 100644 index 000000000000..b7447e3822a6 --- /dev/null +++ b/sdks/java/maven-archetypes/gcp-bom-examples/pom.xml @@ -0,0 +1,136 @@ + + + + + 4.0.0 + + org.apache.beam + beam-sdks-java-maven-archetypes-gcp-bom-examples + 2.6.0-SNAPSHOT + Apache Beam :: SDKs :: Java :: Maven Archetypes :: Google Cloud Platform BOM Examples + A Maven Archetype to create a project + using the Beam Google Cloud Platform BOM. + + + maven-archetype + + + + + org.apache.maven.archetype + archetype-packaging + 2.4 + + + + + + + maven-archetype-plugin + 2.4 + + + org.apache.maven.shared + maven-invoker + 2.2 + + + + + + default-integration-test + install + + integration-test + + + + + + + org.eclipse.m2e + lifecycle-mapping + 1.0.0 + + + + + + org.codehaus.mojo + exec-maven-plugin + [1.5.0,) + + exec + + + + + false + + + + + + + + + + + + + exec-maven-plugin + org.codehaus.mojo + 1.5.0 + + + generate-archetype-contents + generate-sources + + exec + + + ${project.basedir}/generate-sources.sh + + + + + + org.apache.maven.plugins + maven-clean-plugin + 3.1.0 + + + + src/main/resources/archetype-resources + + src/**/* + src + + false + + + + + + + diff --git a/sdks/java/maven-archetypes/gcp-bom-examples/src/main/resources/META-INF/maven/archetype-metadata.xml b/sdks/java/maven-archetypes/gcp-bom-examples/src/main/resources/META-INF/maven/archetype-metadata.xml new file mode 100644 index 000000000000..b13328d10d25 --- /dev/null +++ b/sdks/java/maven-archetypes/gcp-bom-examples/src/main/resources/META-INF/maven/archetype-metadata.xml @@ -0,0 +1,45 @@ + + + + + + + 1.8 + + + + + + src/main/java + + **/*.java + + + + + src/test/java + + **/*.java + + + + diff --git a/sdks/java/maven-archetypes/gcp-bom-examples/src/main/resources/archetype-resources/pom.xml b/sdks/java/maven-archetypes/gcp-bom-examples/src/main/resources/archetype-resources/pom.xml new file mode 100644 index 000000000000..e85511b6aed5 --- /dev/null +++ b/sdks/java/maven-archetypes/gcp-bom-examples/src/main/resources/archetype-resources/pom.xml @@ -0,0 +1,463 @@ + + + + 4.0.0 + + ${groupId} + ${artifactId} + ${version} + + jar + + + @project.version@ + + @bigquery.version@ + @google-api-client.version@ + @hamcrest.version@ + @jackson.version@ + @joda.version@ + @junit.version@ + @maven-compiler-plugin.version@ + @maven-exec-plugin.version@ + @maven-jar-plugin.version@ + @maven-shade-plugin.version@ + @mockito.version@ + @pubsub.version@ + @slf4j.version@ + @spark.version@ + @hadoop.version@ + @maven-surefire-plugin.version@ + @nemo.version@ + @flink.artifact.name@ + + + + + apache.snapshots + Apache Development Snapshot Repository + https://repository.apache.org/content/repositories/snapshots/ + + 
false + + + true + + + + + + + + org.apache.maven.plugins + maven-compiler-plugin + ${maven-compiler-plugin.version} + + ${targetPlatform} + ${targetPlatform} + + + + + org.apache.maven.plugins + maven-surefire-plugin + ${maven-surefire-plugin.version} + + all + 4 + true + + + + org.apache.maven.surefire + surefire-junit47 + ${maven-surefire-plugin.version} + + + + + + + org.apache.maven.plugins + maven-jar-plugin + ${maven-jar-plugin.version} + + + + + org.apache.maven.plugins + maven-shade-plugin + ${maven-shade-plugin.version} + + + package + + shade + + + ${project.artifactId}-bundled-${project.version} + + + *:* + + META-INF/LICENSE + META-INF/*.SF + META-INF/*.DSA + META-INF/*.RSA + + + + + + + reference.conf + + + + + + + + + + + + org.codehaus.mojo + exec-maven-plugin + ${maven-exec-plugin.version} + + false + + + + + + + + + direct-runner + + true + + + + + org.apache.beam + beam-runners-direct-java + runtime + + + + + + portable-runner + + true + + + + + org.apache.beam + beam-runners-portability-java + runtime + + + + + + dataflow-runner + + + + org.apache.beam + beam-runners-google-cloud-dataflow-java + runtime + + + + + + flink-runner + + + + org.apache.beam + + ${flink.artifact.name} + runtime + + + + + + spark-runner + + + 4.1.17.Final + + + + org.apache.beam + beam-runners-spark + runtime + + + org.apache.beam + beam-sdks-java-io-hadoop-file-system + runtime + + + org.apache.spark + spark-streaming_2.11 + runtime + + + org.slf4j + jul-to-slf4j + + + + + com.fasterxml.jackson.module + jackson-module-scala_2.11 + ${jackson.version} + runtime + + + + org.apache.beam + beam-sdks-java-io-google-cloud-platform + + + io.grpc + grpc-netty + + + io.netty + netty-handler + + + + + + + samza-runner + + + org.apache.beam + beam-runners-samza + runtime + + + + + twister2-runner + + + org.apache.beam + beam-runners-twister2 + runtime + + + + + nemo-runner + + + org.apache.nemo + nemo-compiler-frontend-beam + ${nemo.version} + + + org.apache.hadoop + hadoop-common + ${hadoop.version} + + + org.slf4j + slf4j-api + + + org.slf4j + slf4j-log4j12 + + + + + + + + jet-runner + + + org.apache.beam + beam-runners-jet + runtime + + + + + + + + + + org.apache.beam + beam-sdks-java-core + + + + + org.apache.beam + beam-sdks-java-io-google-cloud-platform + + + + + com.google.api-client + google-api-client + ${google-api-client.version} + + + + com.google.guava + guava-jdk5 + + + + + + com.google.apis + google-api-services-bigquery + ${bigquery.version} + + + + com.google.guava + guava-jdk5 + + + + + + com.google.http-client + google-http-client + + + + com.google.guava + guava-jdk5 + + + + + + com.google.apis + google-api-services-pubsub + ${pubsub.version} + + + + com.google.guava + guava-jdk5 + + + + + + joda-time + joda-time + ${joda.version} + + + + + org.slf4j + slf4j-api + ${slf4j.version} + + + + org.slf4j + slf4j-jdk14 + ${slf4j.version} + + runtime + + + + + org.hamcrest + hamcrest-core + ${hamcrest.version} + + + + org.hamcrest + hamcrest-library + ${hamcrest.version} + + + + junit + junit + ${junit.version} + + + + + org.apache.beam + beam-runners-direct-java + test + + + + org.mockito + mockito-core + ${mockito.version} + test + + + + + + + + org.apache.beam + beam-sdks-java-google-cloud-platform-bom + ${beam.version} + pom + import + + + + diff --git a/sdks/java/maven-archetypes/starter/build.gradle b/sdks/java/maven-archetypes/starter/build.gradle index 850c661b2134..7c8c02bd5593 100644 --- a/sdks/java/maven-archetypes/starter/build.gradle +++ 
b/sdks/java/maven-archetypes/starter/build.gradle @@ -31,8 +31,3 @@ processResources { 'maven-exec-plugin.version': dependencies.create(project.library.maven.maven_exec_plugin).getVersion(), ] } - -dependencies { - compile project(path: ":runners:direct-java", configuration: "shadow") - compile project(path: ":sdks:java:core", configuration: "shadow") -} diff --git a/sdks/java/maven-archetypes/starter/src/main/resources/archetype-resources/src/main/java/StarterPipeline.java b/sdks/java/maven-archetypes/starter/src/main/resources/archetype-resources/src/main/java/StarterPipeline.java index 051c7db38cbe..d6afdecf11db 100644 --- a/sdks/java/maven-archetypes/starter/src/main/resources/archetype-resources/src/main/java/StarterPipeline.java +++ b/sdks/java/maven-archetypes/starter/src/main/resources/archetype-resources/src/main/java/StarterPipeline.java @@ -15,7 +15,7 @@ * See the License for the specific language governing permissions and * limitations under the License. */ -package $ +package ${package}; import org.apache.beam.sdk.Pipeline; import org.apache.beam.sdk.options.PipelineOptionsFactory; @@ -27,8 +27,6 @@ import org.slf4j.Logger; import org.slf4j.LoggerFactory; -{package}; - /** * A starter example for writing Beam programs. * diff --git a/sdks/java/testing/expansion-service/build.gradle b/sdks/java/testing/expansion-service/build.gradle index 53491e2f09dd..45ca023f8523 100644 --- a/sdks/java/testing/expansion-service/build.gradle +++ b/sdks/java/testing/expansion-service/build.gradle @@ -27,10 +27,8 @@ ext.summary = """Testing Expansion Service used for executing cross-language tra dependencies { - compile project(path: ":runners:core-construction-java") - compile project(path: ":sdks:java:io:parquet") - compile project(path: ":sdks:java:core", configuration: "shadow") - compile project(":sdks:java:expansion-service") + testCompile project(path: ":sdks:java:io:parquet") + testCompile project(":sdks:java:expansion-service") testRuntime library.java.hadoop_client } diff --git a/sdks/java/testing/expansion-service/src/test/java/org/apache/beam/sdk/testing/expansion/TestExpansionService.java b/sdks/java/testing/expansion-service/src/test/java/org/apache/beam/sdk/testing/expansion/TestExpansionService.java index eb26526a22b2..824624e112b8 100644 --- a/sdks/java/testing/expansion-service/src/test/java/org/apache/beam/sdk/testing/expansion/TestExpansionService.java +++ b/sdks/java/testing/expansion-service/src/test/java/org/apache/beam/sdk/testing/expansion/TestExpansionService.java @@ -30,6 +30,7 @@ import org.apache.beam.sdk.expansion.ExternalTransformRegistrar; import org.apache.beam.sdk.expansion.service.ExpansionService; import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.io.TextIO; import org.apache.beam.sdk.io.parquet.ParquetIO; import org.apache.beam.sdk.transforms.Count; import org.apache.beam.sdk.transforms.DoFn; @@ -63,7 +64,6 @@ */ @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class TestExpansionService { @@ -80,6 +80,7 @@ public class TestExpansionService { private static final String TEST_COUNT_URN = "beam:transforms:xlang:count"; private static final String TEST_FILTER_URN = "beam:transforms:xlang:filter_less_than_eq"; private static final String TEST_PARQUET_READ_URN = "beam:transforms:xlang:parquet_read"; + private static final String TEST_TEXTIO_READ_URN = "beam:transforms:xlang:textio_read"; 
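The new beam:transforms:xlang:textio_read transform registered below simply wraps TextIO.read() behind an ExternalTransformBuilder so other SDKs can expand it. For context, this is the primitive it expands to; the snippet is an illustrative standalone pipeline (hypothetical class name and input path, a runner such as the direct runner assumed on the classpath), not part of the change:

```java
package org.apache.beam.sdk.testing.expansion;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.values.PCollection;

/** Illustrative standalone pipeline; the class name and input path are placeholders. */
public class TextIoReadSketch {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline pipeline = Pipeline.create(options);

    // The same primitive the beam:transforms:xlang:textio_read URN expands to:
    // read a text file line by line into a PCollection<String>.
    PCollection<String> lines =
        pipeline.apply("ReadLines", TextIO.read().from("/tmp/input.txt"));

    // Any downstream transform works; counting keeps the example self-contained.
    PCollection<Long> lineCount = lines.apply("CountLines", Count.<String>globally());

    pipeline.run().waitUntilFinish();
  }
}
```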
@AutoService(ExpansionService.ExpansionServiceRegistrar.class) public static class TestServiceRegistrar implements ExpansionService.ExpansionServiceRegistrar { @@ -181,6 +182,7 @@ public static class TestTransformRegistrar implements ExternalTransformRegistrar builder.put(TEST_PARTITION_URN, PartitionBuilder.class); builder.put(TEST_PARQUET_WRITE_URN, ParquetWriteBuilder.class); builder.put(TEST_PARQUET_READ_URN, ParquetReadBuilder.class); + builder.put(TEST_TEXTIO_READ_URN, TextIOReadBuilder.class); return builder.build(); } @@ -345,5 +347,19 @@ public PCollection expand(PBegin input) { }; } } + + public static class TextIOReadBuilder + implements ExternalTransformBuilder> { + @Override + public PTransform> buildExternal( + StringConfiguration configuration) { + return new PTransform>() { + @Override + public PCollection expand(PBegin input) { + return input.apply(TextIO.read().from(configuration.data)); + } + }; + } + } } } diff --git a/sdks/java/testing/jpms-tests/build.gradle b/sdks/java/testing/jpms-tests/build.gradle index a292aa21d994..d31e00df830d 100644 --- a/sdks/java/testing/jpms-tests/build.gradle +++ b/sdks/java/testing/jpms-tests/build.gradle @@ -23,7 +23,6 @@ plugins { } javaVersion="1.11" applyJavaNature( - exportJavadoc: false, publish: false, disableLintWarnings: ['requires-transitive-automatic', 'requires-automatic'] @@ -41,6 +40,7 @@ def testRunnerClass = [ directRunnerIntegrationTest: "org.apache.beam.runners.direct.DirectRunner", flinkRunnerIntegrationTest: "org.apache.beam.runners.flink.TestFlinkRunner", dataflowRunnerIntegrationTest: "org.apache.beam.runners.dataflow.TestDataflowRunner", + sparkRunnerIntegrationTest: "org.apache.beam.runners.spark.TestSparkRunner", ] configurations { @@ -48,8 +48,12 @@ configurations { directRunnerIntegrationTest.extendsFrom(baseIntegrationTest) flinkRunnerIntegrationTest.extendsFrom(baseIntegrationTest) dataflowRunnerIntegrationTest.extendsFrom(baseIntegrationTest) + sparkRunnerIntegrationTest.extendsFrom(baseIntegrationTest) } +def spark_version = '3.1.1' +def spark_scala_version = '2.12' + dependencies { compile project(path: ":sdks:java:core", configuration: "shadow") compile project(path: ":sdks:java:extensions:google-cloud-platform-core") @@ -59,8 +63,11 @@ dependencies { baseIntegrationTest project(path: ":sdks:java:testing:jpms-tests", configuration: "testRuntime") directRunnerIntegrationTest project(":runners:direct-java") - flinkRunnerIntegrationTest project(":runners:flink:1.10") + flinkRunnerIntegrationTest project(":runners:flink:${project.ext.latestFlinkVersion}") dataflowRunnerIntegrationTest project(":runners:google-cloud-dataflow-java") + sparkRunnerIntegrationTest project(":runners:spark:3") + sparkRunnerIntegrationTest "org.apache.spark:spark-sql_$spark_scala_version:$spark_version" + sparkRunnerIntegrationTest "org.apache.spark:spark-streaming_$spark_scala_version:$spark_version" } /* diff --git a/sdks/java/testing/load-tests/build.gradle b/sdks/java/testing/load-tests/build.gradle index 6568cf2f00b0..04b8bde792ab 100644 --- a/sdks/java/testing/load-tests/build.gradle +++ b/sdks/java/testing/load-tests/build.gradle @@ -38,9 +38,11 @@ def runnerProperty = "runner" def runnerDependency = (project.hasProperty(runnerProperty) ? 
project.getProperty(runnerProperty) : ":runners:direct-java") - -def shouldProvideSpark = ":runners:spark".equals(runnerDependency) +def loadTestRunnerVersionProperty = "runner.version" +def loadTestRunnerVersion = project.findProperty(loadTestRunnerVersionProperty) +def shouldProvideSpark = ":runners:spark:2".equals(runnerDependency) def isDataflowRunner = ":runners:google-cloud-dataflow-java".equals(runnerDependency) +def isDataflowRunnerV2 = isDataflowRunner && "V2".equals(loadTestRunnerVersion) def runnerConfiguration = ":runners:direct-java".equals(runnerDependency) ? "shadow" : null if (isDataflowRunner) { @@ -49,7 +51,11 @@ if (isDataflowRunner) { * the following projects are evaluated before we evaluate this project. This is because * we are attempting to reference a property from the project directly. */ - evaluationDependsOn(":runners:google-cloud-dataflow-java:worker:legacy-worker") + if (isDataflowRunnerV2) { + evaluationDependsOn(":runners:google-cloud-dataflow-java") + } else { + evaluationDependsOn(":runners:google-cloud-dataflow-java:worker:legacy-worker") + } } configurations { @@ -61,14 +67,17 @@ configurations { dependencies { compile library.java.kafka_clients - compile project(path: ":sdks:java:core", configuration: "shadow") - compile project(path: ":runners:direct-java", configuration: "shadow") compile project(":sdks:java:io:synthetic") compile project(":sdks:java:testing:test-utils") compile project(":sdks:java:io:google-cloud-platform") compile project(":sdks:java:io:kafka") compile project(":sdks:java:io:kinesis") + compile library.java.aws_java_sdk_core + compile library.java.google_cloud_core + compile library.java.joda_time + compile library.java.vendored_guava_26_0_jre + compile library.java.slf4j_api gradleRun project(project.path) gradleRun project(path: runnerDependency, configuration: runnerConfiguration) @@ -92,23 +101,39 @@ if (shouldProvideSpark) { } } -task run(type: JavaExec) { +def getLoadTestArgs = { def loadTestArgs = project.findProperty(loadTestArgsProperty) ?: "" + def loadTestArgsList = new ArrayList() + Collections.addAll(loadTestArgsList as Collection, loadTestArgs.split()) if (isDataflowRunner) { - dependsOn ":runners:google-cloud-dataflow-java:worker:legacy-worker:shadowJar" - - def dataflowWorkerJar = project.findProperty('dataflowWorkerJar') ?: project(":runners:google-cloud-dataflow-java:worker:legacy-worker").shadowJar.archivePath - // Provide job with a customizable worker jar. - // With legacy worker jar, containerImage is set to empty (i.e. to use the internal build). - // More context and discussions can be found in PR#6694. - loadTestArgs = loadTestArgs + - " --dataflowWorkerJar=${dataflowWorkerJar} " + - " --workerHarnessContainerImage=" + if (isDataflowRunnerV2) { + loadTestArgsList.add("--experiments=beam_fn_api,use_unified_worker,use_runner_v2,shuffle_mode=service") + def sdkContainerImage = project.findProperty('sdkContainerImage') ?: project(":runners:google-cloud-dataflow-java").dockerImageName + loadTestArgsList.add("--sdkContainerImage=${sdkContainerImage}") + } else { + def dataflowWorkerJar = project.findProperty('dataflowWorkerJar') ?: project(":runners:google-cloud-dataflow-java:worker:legacy-worker").shadowJar.archivePath + // Provide job with a customizable worker jar. + // With legacy worker jar, containerImage is set to empty (i.e. to use the internal build). + // More context and discussions can be found in PR#6694. 
+ loadTestArgsList.add("--dataflowWorkerJar=${dataflowWorkerJar}") + loadTestArgsList.add("--workerHarnessContainerImage=") + } } + return loadTestArgsList +} +task run(type: JavaExec) { + def loadTestArgsList = getLoadTestArgs() + if (isDataflowRunner) { + if (isDataflowRunnerV2){ + dependsOn ":runners:google-cloud-dataflow-java:buildAndPushDockerContainer" + finalizedBy ":runners:google-cloud-dataflow-java:cleanUpDockerImages" + } else { + dependsOn ":runners:google-cloud-dataflow-java:worker:legacy-worker:shadowJar" + } + } main = mainClass classpath = configurations.gradleRun - args loadTestArgs.split() + args loadTestArgsList.toArray() } - diff --git a/sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/LoadTest.java b/sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/LoadTest.java index 3813ed122456..f5cdf31d8ef5 100644 --- a/sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/LoadTest.java +++ b/sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/LoadTest.java @@ -24,6 +24,7 @@ import java.io.IOException; import java.util.Arrays; import java.util.List; +import java.util.Map; import java.util.Optional; import java.util.UUID; import java.util.regex.Matcher; @@ -81,12 +82,15 @@ abstract class LoadTest { private final InfluxDBSettings settings; + private final Map influxTags; + LoadTest(String[] args, Class testOptions, String metricsNamespace) throws IOException { this.metricsNamespace = metricsNamespace; this.runtimeMonitor = new TimeMonitor<>(metricsNamespace, "runtime"); this.options = LoadTestOptions.readFromArgs(args, testOptions); this.sourceOptions = fromJsonString(options.getSourceOptions(), SyntheticSourceOptions.class); this.pipeline = Pipeline.create(options); + this.influxTags = options.getInfluxTags(); this.runner = getRunnerName(options.getRunner().getName()); settings = InfluxDBSettings.builder() @@ -147,6 +151,19 @@ public PipelineResult run() throws IOException { return pipelineResult; } + private String buildMetric(String suffix) { + StringBuilder metricBuilder = new StringBuilder(runner); + if (influxTags != null && !influxTags.isEmpty()) { + influxTags.entrySet().stream() + .forEach( + entry -> { + metricBuilder.append(entry.getValue()).append("_"); + }); + } + metricBuilder.append(suffix); + return metricBuilder.toString(); + } + private List readMetrics( Timestamp timestamp, PipelineResult result, String testId) { MetricsReader reader = new MetricsReader(result, metricsNamespace); @@ -155,14 +172,14 @@ private List readMetrics( NamedTestResult.create( testId, timestamp.toString(), - runner + "runtime_sec", + buildMetric("runtime_sec"), (reader.getEndTimeMetric("runtime") - reader.getStartTimeMetric("runtime")) / 1000D); NamedTestResult totalBytes = NamedTestResult.create( testId, timestamp.toString(), - runner + "total_bytes_count", + buildMetric("total_bytes_count"), reader.getCounterMetric("totalBytes.count")); return Arrays.asList(runtime, totalBytes); diff --git a/sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/LoadTestOptions.java b/sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/LoadTestOptions.java index be4dceaca1e6..90ad3ee89130 100644 --- a/sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/LoadTestOptions.java +++ b/sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/LoadTestOptions.java @@ -17,6 +17,7 @@ */ package org.apache.beam.sdk.loadtests; +import java.util.Map; import 
org.apache.beam.sdk.options.ApplicationNameOptions; import org.apache.beam.sdk.options.Default; import org.apache.beam.sdk.options.Description; @@ -92,6 +93,12 @@ public interface LoadTestOptions extends PipelineOptions, ApplicationNameOptions void setPublishToInfluxDB(Boolean publishToInfluxDB); + @Description("Additional tags for Influx data") + @Nullable + Map getInfluxTags(); + + void setInfluxTags(Map influxTags); + static T readFromArgs(String[] args, Class optionsClass) { return PipelineOptionsFactory.fromArgs(args).withValidation().as(optionsClass); } diff --git a/sdks/java/testing/nexmark/build.gradle b/sdks/java/testing/nexmark/build.gradle index e3f7ae828ba8..3345b2de8a3d 100644 --- a/sdks/java/testing/nexmark/build.gradle +++ b/sdks/java/testing/nexmark/build.gradle @@ -28,13 +28,19 @@ description = "Apache Beam :: SDKs :: Java :: Nexmark" // When running via Gradle, this property can be used to pass commandline arguments // to the nexmark launch def nexmarkArgsProperty = "nexmark.args" +// When running via Gradle, this property can be set to "true" to enable profiling for +// the nexmark pipeline. Currently only works for the Dataflow runner. +def nexmarkProfilingProperty = "nexmark.profile" // When running via Gradle, this property sets the runner dependency def nexmarkRunnerProperty = "nexmark.runner" def nexmarkRunnerDependency = project.findProperty(nexmarkRunnerProperty) ?: ":runners:direct-java" -def shouldProvideSpark = ":runners:spark".equals(nexmarkRunnerDependency) +def nexmarkRunnerVersionProperty = "nexmark.runner.version" +def nexmarkRunnerVersion = project.findProperty(nexmarkRunnerVersionProperty) +def shouldProvideSpark = ":runners:spark:2".equals(nexmarkRunnerDependency) def isDataflowRunner = ":runners:google-cloud-dataflow-java".equals(nexmarkRunnerDependency) +def isDataflowRunnerV2 = isDataflowRunner && "V2".equals(nexmarkRunnerVersion) def runnerConfiguration = ":runners:direct-java".equals(nexmarkRunnerDependency) ? "shadow" : null if (isDataflowRunner) { @@ -43,7 +49,11 @@ if (isDataflowRunner) { * the following projects are evaluated before we evaluate this project. This is because * we are attempting to reference a property from the project directly. 
*/ - evaluationDependsOn(":runners:google-cloud-dataflow-java:worker:legacy-worker") + if (isDataflowRunnerV2) { + evaluationDependsOn(":runners:google-cloud-dataflow-java") + } else { + evaluationDependsOn(":runners:google-cloud-dataflow-java:worker:legacy-worker") + } } configurations { @@ -62,22 +72,21 @@ dependencies { compile project(":sdks:java:extensions:sql:zetasql") compile project(":sdks:java:io:kafka") compile project(":sdks:java:testing:test-utils") + compile library.java.google_api_client + compile library.java.junit + compile "org.hamcrest:hamcrest:2.1" compile library.java.google_api_services_bigquery compile library.java.jackson_core compile library.java.jackson_annotations compile library.java.jackson_databind + compile library.java.jackson_datatype_joda compile library.java.avro compile library.java.joda_time compile library.java.slf4j_api compile library.java.kafka_clients - provided library.java.junit - provided library.java.hamcrest_core testRuntimeClasspath library.java.slf4j_jdk14 testCompile project(path: ":sdks:java:io:google-cloud-platform", configuration: "testRuntime") testCompile project(path: ":sdks:java:testing:test-utils", configuration: "testRuntime") - testCompile library.java.hamcrest_core - testCompile library.java.hamcrest_library - gradleRun project(project.path) gradleRun project(path: nexmarkRunnerDependency, configuration: runnerConfiguration) @@ -100,31 +109,60 @@ if (shouldProvideSpark) { } } +def getNexmarkArgs = { + def nexmarkArgsStr = project.findProperty(nexmarkArgsProperty) ?: "" + def nexmarkArgsList = new ArrayList() + Collections.addAll(nexmarkArgsList, nexmarkArgsStr.split()) + + if (isDataflowRunner) { + if (isDataflowRunnerV2) { + nexmarkArgsList.add("--experiments=beam_fn_api,use_unified_worker,use_runner_v2,shuffle_mode=service") + def sdkContainerImage = project.findProperty('sdkContainerImage') ?: project(":runners:google-cloud-dataflow-java").dockerImageName + nexmarkArgsList.add("--sdkContainerImage=${sdkContainerImage}") + + // TODO(BEAM-12295) enable all queries once issues with runner V2 is fixed. + if (nexmarkArgsList.contains("--streaming=true")) { + nexmarkArgsList.add("--skipQueries=AVERAGE_PRICE_FOR_CATEGORY,AVERAGE_SELLING_PRICE_BY_SELLER,WINNING_BIDS,BOUNDED_SIDE_INPUT_JOIN,SESSION_SIDE_INPUT_JOIN,PORTABILITY_BATCH") // 4, 6, 9, 13, 14, 15 + } else { + nexmarkArgsList.add("--skipQueries=LOCAL_ITEM_SUGGESTION,AVERAGE_PRICE_FOR_CATEGORY,AVERAGE_SELLING_PRICE_BY_SELLER,HIGHEST_BID,WINNING_BIDS,SESSION_SIDE_INPUT_JOIN,BOUNDED_SIDE_INPUT_JOIN") // 3, 4, 6, 7, 9, 13, 14, 15 + } + } else { + def dataflowWorkerJar = project.findProperty('dataflowWorkerJar') ?: project(":runners:google-cloud-dataflow-java:worker:legacy-worker").shadowJar.archivePath + // Provide job with a customizable worker jar. + // With legacy worker jar, containerImage is set to empty (i.e. to use the internal build). + // More context and discussions can be found in PR#6694. + nexmarkArgsList.add("--dataflowWorkerJar=${dataflowWorkerJar}".toString()) + nexmarkArgsList.add('--workerHarnessContainerImage=') + + def nexmarkProfile = project.findProperty(nexmarkProfilingProperty) ?: "" + if (nexmarkProfile.equals("true")) { + nexmarkArgsList.add('--profilingAgentConfiguration={ "APICurated": true }') + } + } + } + return nexmarkArgsList +} + // Execute the Nexmark queries or suites via Gradle. 
// // Parameters: // -Pnexmark.runner -// Specify a runner subproject, such as ":runners:spark" or ":runners:flink:1.10" +// Specify a runner subproject, such as ":runners:spark:2" or ":runners:flink:1.13" // Defaults to ":runners:direct-java" // // -Pnexmark.args // Specify the command line for invoking org.apache.beam.sdk.nexmark.Main task run(type: JavaExec) { - def nexmarkArgsStr = project.findProperty(nexmarkArgsProperty) ?: "" - + def nexmarkArgsList = getNexmarkArgs() if (isDataflowRunner) { - dependsOn ":runners:google-cloud-dataflow-java:worker:legacy-worker:shadowJar" - - def dataflowWorkerJar = project.findProperty('dataflowWorkerJar') ?: project(":runners:google-cloud-dataflow-java:worker:legacy-worker").shadowJar.archivePath - // Provide job with a customizable worker jar. - // With legacy worker jar, containerImage is set to empty (i.e. to use the internal build). - // More context and discussions can be found in PR#6694. - nexmarkArgsStr = nexmarkArgsStr + - " --dataflowWorkerJar=${dataflowWorkerJar} " + - " --workerHarnessContainerImage=" + if (isDataflowRunnerV2) { + dependsOn ":runners:google-cloud-dataflow-java:buildAndPushDockerContainer" + finalizedBy ":runners:google-cloud-dataflow-java:cleanUpDockerImages" + } else { + dependsOn ":runners:google-cloud-dataflow-java:worker:legacy-worker:shadowJar" + } } - main = "org.apache.beam.sdk.nexmark.Main" classpath = configurations.gradleRun - args nexmarkArgsStr.split() -} + args nexmarkArgsList.toArray() +} \ No newline at end of file diff --git a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/Main.java b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/Main.java index 1c11c0c2f6e5..892ba2837473 100644 --- a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/Main.java +++ b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/Main.java @@ -210,6 +210,7 @@ private static void savePerfsToInfluxDB( final Map results, final long timestamp) { final InfluxDBSettings settings = getInfluxSettings(options); + final Map tags = options.getInfluxTags(); final String runner = options.getRunner().getSimpleName(); final List> schemaResults = results.entrySet().stream() @@ -222,7 +223,7 @@ private static void savePerfsToInfluxDB( runner, produceMeasurement(options, entry))) .collect(toList()); - InfluxDBPublisher.publishNexmarkResults(schemaResults, settings); + InfluxDBPublisher.publishNexmarkResults(schemaResults, settings, tags); } private static InfluxDBSettings getInfluxSettings(final NexmarkOptions options) { diff --git a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/Monitor.java b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/Monitor.java index b4cc5ae7bdad..e8e5fa7b8e79 100644 --- a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/Monitor.java +++ b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/Monitor.java @@ -39,20 +39,15 @@ public class Monitor implements Serializable { private class MonitorDoFn extends DoFn { final Counter elementCounter = Metrics.counter(name, prefix + ".elements"); final Counter bytesCounter = Metrics.counter(name, prefix + ".bytes"); - final Distribution startTime = Metrics.distribution(name, prefix + ".startTime"); - final Distribution endTime = Metrics.distribution(name, prefix + ".endTime"); - final Distribution startTimestamp = Metrics.distribution(name, prefix + ".startTimestamp"); - final Distribution endTimestamp = Metrics.distribution(name, prefix 
+ ".endTimestamp"); + final Distribution processingTime = Metrics.distribution(name, prefix + ".processingTime"); + final Distribution eventTimestamp = Metrics.distribution(name, prefix + ".eventTimestamp"); @ProcessElement public void processElement(ProcessContext c) { elementCounter.inc(); bytesCounter.inc(c.element().sizeInBytes()); - long now = System.currentTimeMillis(); - startTime.update(now); - endTime.update(now); - startTimestamp.update(c.timestamp().getMillis()); - endTimestamp.update(c.timestamp().getMillis()); + processingTime.update(System.currentTimeMillis()); + eventTimestamp.update(c.timestamp().getMillis()); c.output(c.element()); } } diff --git a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkConfiguration.java b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkConfiguration.java index 1108eb66a1e8..bd523f06c2aa 100644 --- a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkConfiguration.java +++ b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkConfiguration.java @@ -211,6 +211,19 @@ public class NexmarkConfiguration implements Serializable { */ @JsonProperty public long outOfOrderGroupSize = 1; + /** + * Used by Query 13. This specifies number of random keys to generate. Generated events will be + * assigned to these keys randomly and then goes through a GBK. + */ + @JsonProperty public int numKeyBuckets = 20000; + + /** + * Used by Query 13. This specifies CPU usage factor that the ParDo in query 13 should use. 1.0 + * means ParDo should be the CPU expensive operation on every element and 0.0 means ParDo should + * pass through all elements directly. + */ + @JsonProperty public double pardoCPUFactor = 1.0; + /** Replace any properties of this configuration which have been supplied by the command line. 
*/ public void overrideFromOptions(NexmarkOptions options) { if (options.getDebug() != null) { diff --git a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkLauncher.java b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkLauncher.java index fd2044f652e9..d9048bcfa0d8 100644 --- a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkLauncher.java +++ b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkLauncher.java @@ -17,6 +17,7 @@ */ package org.apache.beam.sdk.nexmark; +import static org.apache.beam.sdk.nexmark.NexmarkQueryName.PORTABILITY_BATCH; import static org.apache.beam.sdk.nexmark.NexmarkUtils.PubSubMode.COMBINED; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; @@ -60,6 +61,7 @@ import org.apache.beam.sdk.nexmark.queries.Query10; import org.apache.beam.sdk.nexmark.queries.Query11; import org.apache.beam.sdk.nexmark.queries.Query12; +import org.apache.beam.sdk.nexmark.queries.Query13; import org.apache.beam.sdk.nexmark.queries.Query1Model; import org.apache.beam.sdk.nexmark.queries.Query2; import org.apache.beam.sdk.nexmark.queries.Query2Model; @@ -104,9 +106,9 @@ import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps; +import org.apache.kafka.common.TopicPartition; import org.apache.kafka.common.serialization.ByteArrayDeserializer; import org.apache.kafka.common.serialization.ByteArraySerializer; -import org.apache.kafka.common.serialization.LongDeserializer; import org.apache.kafka.common.serialization.StringSerializer; import org.checkerframework.checker.nullness.qual.Nullable; import org.joda.time.Duration; @@ -275,18 +277,18 @@ private NexmarkPerf currentPerf( long numEvents = eventMetrics.getCounterMetric(eventMonitor.prefix + ".elements"); long numEventBytes = eventMetrics.getCounterMetric(eventMonitor.prefix + ".bytes"); - long eventStart = eventMetrics.getStartTimeMetric(eventMonitor.prefix + ".startTime"); - long eventEnd = eventMetrics.getEndTimeMetric(eventMonitor.prefix + ".endTime"); + long eventStart = eventMetrics.getStartTimeMetric(eventMonitor.prefix + ".processingTime"); + long eventEnd = eventMetrics.getEndTimeMetric(eventMonitor.prefix + ".processingTime"); MetricsReader resultMetrics = new MetricsReader(result, resultMonitor.name); long numResults = resultMetrics.getCounterMetric(resultMonitor.prefix + ".elements"); long numResultBytes = resultMetrics.getCounterMetric(resultMonitor.prefix + ".bytes"); - long resultStart = resultMetrics.getStartTimeMetric(resultMonitor.prefix + ".startTime"); - long resultEnd = resultMetrics.getEndTimeMetric(resultMonitor.prefix + ".endTime"); + long resultStart = resultMetrics.getStartTimeMetric(resultMonitor.prefix + ".processingTime"); + long resultEnd = resultMetrics.getEndTimeMetric(resultMonitor.prefix + ".processingTime"); long timestampStart = - resultMetrics.getStartTimeMetric(resultMonitor.prefix + ".startTimestamp"); - long timestampEnd = resultMetrics.getEndTimeMetric(resultMonitor.prefix + ".endTimestamp"); + resultMetrics.getStartTimeMetric(resultMonitor.prefix + ".eventTimestamp"); + long timestampEnd = resultMetrics.getEndTimeMetric(resultMonitor.prefix + ".eventTimestamp"); 
long effectiveEnd = -1; if (eventEnd >= 0 && resultEnd >= 0) { @@ -717,11 +719,12 @@ private void sinkEventsToKafka(PCollection events) { .withBootstrapServers(options.getBootstrapServers()) .withTopic(options.getKafkaTopic()) .withValueSerializer(ByteArraySerializer.class) + .withInputTimestamp() .values()); } - static final DoFn, Event> BYTEARRAY_TO_EVENT = - new DoFn, Event>() { + static final DoFn, Event> BYTEARRAY_TO_EVENT = + new DoFn, Event>() { @ProcessElement public void processElement(ProcessContext c) throws IOException { byte[] encodedEvent = c.element().getValue(); @@ -731,20 +734,35 @@ public void processElement(ProcessContext c) throws IOException { }; /** Return source of events from Kafka. */ - private PCollection sourceEventsFromKafka(Pipeline p, final Instant now) { + private PCollection sourceEventsFromKafka(Pipeline p, final Instant start) { checkArgument((options.getBootstrapServers() != null), "Missing --bootstrapServers"); NexmarkUtils.console("Reading events from Kafka Topic %s", options.getKafkaTopic()); - KafkaIO.Read read = - KafkaIO.read() + KafkaIO.Read read = + KafkaIO.read() .withBootstrapServers(options.getBootstrapServers()) - .withTopic(options.getKafkaTopic()) - .withKeyDeserializer(LongDeserializer.class) + .withKeyDeserializer(ByteArrayDeserializer.class) .withValueDeserializer(ByteArrayDeserializer.class) - .withStartReadTime(now) + .withStartReadTime(start) .withMaxNumRecords( options.getNumEvents() != null ? options.getNumEvents() : Long.MAX_VALUE); + if (options.getKafkaTopicCreateTimeMaxDelaySec() >= 0) { + read = + read.withCreateTime( + Duration.standardSeconds(options.getKafkaTopicCreateTimeMaxDelaySec())); + } + + if (options.getNumKafkaTopicPartitions() > 0) { + ArrayList partitionArrayList = new ArrayList<>(); + for (int i = 0; i < options.getNumKafkaTopicPartitions(); ++i) { + partitionArrayList.add(new TopicPartition(options.getKafkaTopic(), i)); + } + read = read.withTopicPartitions(partitionArrayList); + } else { + read = read.withTopic(options.getKafkaTopic()); + } + return p.apply(queryName + ".ReadKafkaEvents", read.withoutMetadata()) .apply(queryName + ".KafkaToEvents", ParDo.of(BYTEARRAY_TO_EVENT)); } @@ -802,6 +820,7 @@ private void sinkResultsToKafka(PCollection formattedResults) { .withBootstrapServers(options.getBootstrapServers()) .withTopic(options.getKafkaResultsTopic()) .withValueSerializer(StringSerializer.class) + .withInputTimestamp() .values()); } @@ -1012,7 +1031,8 @@ private PCollection createSource(Pipeline p, final Instant now) throws IO // finished. In other case. when pubSubMode=SUBSCRIBE_ONLY, now should be null and // it will be ignored. source = - sourceEventsFromKafka(p, configuration.pubSubMode == COMBINED ? now : null); + sourceEventsFromKafka( + p, configuration.pubSubMode == COMBINED ? 
now : Instant.EPOCH); } else { source = sourceEventsFromPubsub(p); } @@ -1143,6 +1163,11 @@ private void modelResultRates(NexmarkQueryModel model) { return null; } + if (configuration.query == PORTABILITY_BATCH && options.isStreaming()) { + NexmarkUtils.console("skipping PORTABILITY_BATCH since it does not support streaming mode"); + return null; + } + queryName = query.getName(); // Append queryName to temp location @@ -1389,6 +1414,9 @@ private Map createJavaQueries() { .put( NexmarkQueryName.SESSION_SIDE_INPUT_JOIN, new NexmarkQuery(configuration, new SessionSideInputJoin(configuration))) + .put( + NexmarkQueryName.PORTABILITY_BATCH, + new NexmarkQuery(configuration, new Query13(configuration))) .build(); } diff --git a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkOptions.java b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkOptions.java index 31a459edf736..70e3f7168b5a 100644 --- a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkOptions.java +++ b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkOptions.java @@ -17,6 +17,7 @@ */ package org.apache.beam.sdk.nexmark; +import java.util.Map; import org.apache.beam.sdk.extensions.gcp.options.GcpOptions; import org.apache.beam.sdk.io.gcp.pubsub.PubsubOptions; import org.apache.beam.sdk.options.ApplicationNameOptions; @@ -429,6 +430,20 @@ void setPubsubMessageSerializationMethod( void setKafkaTopic(String value); + @Description( + "Number of partitions for Kafka topic in streaming mode. If unspecified, the broker will be queried for all partitions.") + int getNumKafkaTopicPartitions(); + + void setNumKafkaTopicPartitions(int value); + + @Description( + "If non-negative, events from the Kafka topic will get their timestamps from the Kafka createtime, with the maximum delay for" + + "disorder as specified.") + @Default.Integer(60) + int getKafkaTopicCreateTimeMaxDelaySec(); + + void setKafkaTopicCreateTimeMaxDelaySec(int value); + @Description("Base name of Kafka results topic in streaming mode.") @Default.String("nexmark-results") @Nullable @@ -488,4 +503,10 @@ void setPubsubMessageSerializationMethod( String getInfluxRetentionPolicy(); void setInfluxRetentionPolicy(String influxRetentionPolicy); + + @Description("Additional tags for Influx data") + @Nullable + Map getInfluxTags(); + + void setInfluxTags(Map influxTags); } diff --git a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkQueryName.java b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkQueryName.java index 24c7e3d4e174..afb893c12fb1 100644 --- a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkQueryName.java +++ b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkQueryName.java @@ -43,6 +43,7 @@ public enum NexmarkQueryName { LOG_TO_SHARDED_FILES(10), // Query "10" USER_SESSIONS(11), // Query "11" PROCESSING_TIME_WINDOWS(12), // Query "12" + PORTABILITY_BATCH(15), // Query "13" // Other non-numbered queries BOUNDED_SIDE_INPUT_JOIN(13), diff --git a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkUtils.java b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkUtils.java index 7bc00a8d00c5..79d38bb26dba 100644 --- a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkUtils.java +++ b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkUtils.java @@ -20,6 +20,7 @@ import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; import com.fasterxml.jackson.databind.ObjectMapper; +import com.fasterxml.jackson.datatype.joda.JodaModule; import java.io.IOException; import java.io.InputStream; import java.io.OutputStream; @@ -94,7 +95,7 @@ public class NexmarkUtils { private static final Logger LOG = LoggerFactory.getLogger(NexmarkUtils.class); /** Mapper for (de)serializing JSON. */ - public static final ObjectMapper MAPPER = new ObjectMapper(); + public static final ObjectMapper MAPPER = new ObjectMapper().registerModule(new JodaModule()); /** Possible sources for events. */ public enum SourceType { diff --git a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/NexmarkQuery.java b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/NexmarkQuery.java index 7ae1523b1206..13947e9a25e8 100644 --- a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/NexmarkQuery.java +++ b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/NexmarkQuery.java @@ -39,7 +39,6 @@ public final class NexmarkQuery final NexmarkConfiguration configuration; public final Monitor eventMonitor; public final Monitor resultMonitor; - private final Monitor endOfStreamMonitor; private final Counter fatalCounter; private final NexmarkQueryTransform transform; private transient PCollection> sideInput = null; @@ -51,12 +50,10 @@ public NexmarkQuery(NexmarkConfiguration configuration, NexmarkQueryTransform if (configuration.debug) { eventMonitor = new Monitor<>(name + ".Events", "event"); resultMonitor = new Monitor<>(name + ".Results", "result"); - endOfStreamMonitor = new Monitor<>(name + ".EndOfStream", "end"); fatalCounter = Metrics.counter(name, "fatal"); } else { eventMonitor = null; resultMonitor = null; - endOfStreamMonitor = null; fatalCounter = null; } } diff --git a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/NexmarkQueryModel.java b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/NexmarkQueryModel.java index 3b8479d4626a..8bb911fdc0b9 100644 --- a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/NexmarkQueryModel.java +++ b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/NexmarkQueryModel.java @@ -17,6 +17,8 @@ */ package org.apache.beam.sdk.nexmark.queries; +import static org.hamcrest.MatcherAssert.assertThat; + import java.io.Serializable; import java.util.Collection; import java.util.HashSet; @@ -99,7 +101,7 @@ public SerializableFunction>, Void> assertionFor() @Override public @Nullable Void apply(Iterable> actual) { Collection actualStrings = toCollection(relevantResults(actual).iterator()); - Assert.assertThat("wrong pipeline output", actualStrings, IsEqual.equalTo(expectedStrings)); + assertThat("wrong pipeline output", actualStrings, IsEqual.equalTo(expectedStrings)); return null; } }; diff --git a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/Query10.java b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/Query10.java index c8b58f9a5d6b..0c5e8b901bf6 100644 --- a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/Query10.java +++ b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/Query10.java @@ -188,7 +188,7 @@ public PCollection expand(PCollection events) { public void processElement(ProcessContext c) { if 
(c.element().hasAnnotation("LATE")) { lateCounter.inc(); - LOG.info("Observed late: {}", c.element()); + LOG.debug("Observed late: {}", c.element()); } else { onTimeCounter.inc(); } @@ -240,7 +240,7 @@ public void processElement(ProcessContext c, BoundedWindow window) { } } String shard = c.element().getKey(); - LOG.info( + LOG.debug( String.format( "%s with timestamp %s has %d actually late and %d on-time " + "elements in pane %s for window %s", @@ -289,7 +289,7 @@ public void processElement(ProcessContext c, BoundedWindow window) String shard = c.element().getKey(); GcsOptions options = c.getPipelineOptions().as(GcsOptions.class); OutputFile outputFile = outputFileFor(window, shard, c.pane()); - LOG.info( + LOG.debug( String.format( "Writing %s with record timestamp %s, window timestamp %s, pane %s", shard, c.timestamp(), window.maxTimestamp(), c.pane())); @@ -350,7 +350,7 @@ public void processElement(ProcessContext c, BoundedWindow window) LOG.error("ERROR! Unexpected ON_TIME pane index: {}", c.pane()); } else { GcsOptions options = c.getPipelineOptions().as(GcsOptions.class); - LOG.info( + LOG.debug( "Index with record timestamp {}, window timestamp {}, pane {}", c.timestamp(), window.maxTimestamp(), diff --git a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/Query13.java b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/Query13.java new file mode 100644 index 000000000000..9a4fec1c965d --- /dev/null +++ b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/Query13.java @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.nexmark.queries; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.util.Random; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.CoderException; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.Metrics; +import org.apache.beam.sdk.nexmark.NexmarkConfiguration; +import org.apache.beam.sdk.nexmark.model.Event; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.GroupByKey; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.Reshuffle.AssignShardFn; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; + +/** + * Query "13" PORTABILITY_BATCH (not in original suite). + * + *

    This benchmark is created to stress the boundary of runner and SDK in portability world. The + * basic shape of this benchmark is source + GBK + ParDo, in which the GBK read + ParDo will require + * that runner reads from shuffle and connects with SDK to do CPU intensive computation. + */ +@SuppressWarnings({ + "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) +}) +public class Query13 extends NexmarkQueryTransform { + private final NexmarkConfiguration configuration; + + public Query13(NexmarkConfiguration configuration) { + super("Query13"); + this.configuration = configuration; + } + + @Override + public PCollection expand(PCollection events) { + final Coder coder = events.getCoder(); + return events + .apply("Pair with random key", ParDo.of(new AssignShardFn<>(configuration.numKeyBuckets))) + .apply(GroupByKey.create()) + .apply( + "ExpandIterable", + ParDo.of( + new DoFn>, Event>() { + @ProcessElement + public void processElement( + @Element KV> element, OutputReceiver r) { + for (Event value : element.getValue()) { + r.output(value); + } + } + })) + // Force round trip through coder. + .apply( + name + ".Serialize", + ParDo.of( + new DoFn() { + private final Counter bytesMetric = Metrics.counter(name, "serde-bytes"); + private final Random random = new Random(); + private double pardoCPUFactor = + (configuration.pardoCPUFactor >= 0.0 && configuration.pardoCPUFactor <= 1.0) + ? configuration.pardoCPUFactor + : 1.0; + + @ProcessElement + public void processElement(ProcessContext c) throws CoderException, IOException { + Event event; + if (random.nextDouble() <= pardoCPUFactor) { + event = encodeDecode(coder, c.element(), bytesMetric); + } else { + event = c.element(); + } + c.output(event); + } + })); + } + + private static Event encodeDecode(Coder coder, Event e, Counter bytesMetric) + throws IOException { + ByteArrayOutputStream outStream = new ByteArrayOutputStream(); + coder.encode(e, outStream, Coder.Context.OUTER); + byte[] byteArray = outStream.toByteArray(); + bytesMetric.inc((long) byteArray.length); + ByteArrayInputStream inStream = new ByteArrayInputStream(byteArray); + return coder.decode(inStream, Coder.Context.OUTER); + } +} diff --git a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/Query3.java b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/Query3.java index 0d23d5fa422d..11ca3d7aa792 100644 --- a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/Query3.java +++ b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/Query3.java @@ -17,9 +17,6 @@ */ package org.apache.beam.sdk.nexmark.queries; -import java.util.ArrayList; -import java.util.List; -import org.apache.beam.sdk.coders.ListCoder; import org.apache.beam.sdk.metrics.Counter; import org.apache.beam.sdk.metrics.Metrics; import org.apache.beam.sdk.nexmark.NexmarkConfiguration; @@ -27,6 +24,7 @@ import org.apache.beam.sdk.nexmark.model.Event; import org.apache.beam.sdk.nexmark.model.NameCityStateId; import org.apache.beam.sdk.nexmark.model.Person; +import org.apache.beam.sdk.state.BagState; import org.apache.beam.sdk.state.StateSpec; import org.apache.beam.sdk.state.StateSpecs; import org.apache.beam.sdk.state.TimeDomain; @@ -36,16 +34,11 @@ import org.apache.beam.sdk.state.ValueState; import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.Filter; +import org.apache.beam.sdk.transforms.Flatten; import org.apache.beam.sdk.transforms.ParDo; -import 
org.apache.beam.sdk.transforms.join.CoGbkResult; -import org.apache.beam.sdk.transforms.join.CoGroupByKey; -import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple; -import org.apache.beam.sdk.transforms.windowing.AfterPane; -import org.apache.beam.sdk.transforms.windowing.GlobalWindows; -import org.apache.beam.sdk.transforms.windowing.Repeatedly; -import org.apache.beam.sdk.transforms.windowing.Window; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionList; import org.joda.time.Duration; import org.joda.time.Instant; import org.slf4j.Logger; @@ -83,27 +76,29 @@ public Query3(NexmarkConfiguration configuration) { @Override public PCollection expand(PCollection events) { - int numEventsInPane = 30; - - PCollection eventsWindowed = - events.apply( - Window.into(new GlobalWindows()) - .triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(numEventsInPane))) - .discardingFiredPanes() - .withAllowedLateness(Duration.ZERO)); - PCollection> auctionsBySellerId = - eventsWindowed + PCollection> auctionsBySellerId = + events // Only want the new auction events. .apply(NexmarkQueryUtil.JUST_NEW_AUCTIONS) // We only want auctions in category 10. .apply(name + ".InCategory", Filter.by(auction -> auction.category == 10)) - // Key auctions by their seller id. - .apply("AuctionBySeller", NexmarkQueryUtil.AUCTION_BY_SELLER); + // Key auctions by their seller id and move to union Event type. + .apply( + "EventByAuctionSeller", + ParDo.of( + new DoFn>() { + @ProcessElement + public void processElement(ProcessContext c) { + Event e = new Event(); + e.newAuction = c.element(); + c.output(KV.of(c.element().seller, e)); + } + })); - PCollection> personsById = - eventsWindowed + PCollection> personsById = + events // Only want the new people events. .apply(NexmarkQueryUtil.JUST_NEW_PERSONS) @@ -116,18 +111,24 @@ public PCollection expand(PCollection events) { || "ID".equals(person.state) || "CA".equals(person.state))) - // Key people by their id. - .apply("PersonById", NexmarkQueryUtil.PERSON_BY_ID); + // Key persons by their id and move to the union event type. + .apply( + "EventByPersonId", + ParDo.of( + new DoFn>() { + @ProcessElement + public void processElement(ProcessContext c) { + Event e = new Event(); + e.newPerson = c.element(); + c.output(KV.of(c.element().id, e)); + } + })); - return // Join auctions and people. - // concatenate KeyedPCollections - KeyedPCollectionTuple.of(NexmarkQueryUtil.AUCTION_TAG, auctionsBySellerId) - .and(NexmarkQueryUtil.PERSON_TAG, personsById) - // group auctions and persons by personId - .apply(CoGroupByKey.create()) + return PCollectionList.of(auctionsBySellerId) + .and(personsById) + .apply(Flatten.pCollections()) .apply(name + ".Join", ParDo.of(joinDoFn)) - // Project what we want. .apply( name + ".Project", @@ -154,8 +155,11 @@ public void processElement(ProcessContext c) { *

    However we know that each auction is associated with at most one person, so only need to * store auction records in persistent state until we have seen the corresponding person record. * And of course may have already seen that record. + * + *

    To prevent state from accumulating over time, we cleanup buffered people or auctions after a + * max waiting time. */ - private static class JoinDoFn extends DoFn, KV> { + private static class JoinDoFn extends DoFn, KV> { private final int maxAuctionsWaitingTime; private static final String AUCTIONS = "auctions"; @@ -164,13 +168,12 @@ private static class JoinDoFn extends DoFn, KV> personSpec = StateSpecs.value(Person.CODER); - private static final String PERSON_STATE_EXPIRING = "personStateExpiring"; + private static final String STATE_EXPIRING = "stateExpiring"; @StateId(AUCTIONS) - private final StateSpec>> auctionsSpec = - StateSpecs.value(ListCoder.of(Auction.CODER)); + private final StateSpec> auctionsSpec = StateSpecs.bag(Auction.CODER); - @TimerId(PERSON_STATE_EXPIRING) + @TimerId(STATE_EXPIRING) private final TimerSpec timerSpec = TimerSpecs.timer(TimeDomain.EVENT_TIME); // Used to refer the metrics namespace @@ -178,7 +181,6 @@ private static class JoinDoFn extends DoFn, KV personState, - @StateId(AUCTIONS) ValueState> auctionsState) { + @Element KV element, + OutputReceiver> output, + @TimerId(STATE_EXPIRING) Timer timer, + @StateId(PERSON) @AlwaysFetched ValueState personState, + @StateId(AUCTIONS) BagState auctionsState) { // We would *almost* implement this by rewindowing into the global window and // running a combiner over the result. The combiner's accumulator would be the // state we use below. However, combiners cannot emit intermediate results, thus - // we need to wait for the pending ReduceFn API. + // we need to wait for the pending ReduceFn API Person existingPerson = personState.read(); - if (existingPerson != null) { - // We've already seen the new person event for this person id. - // We can join with any new auctions on-the-fly without needing any - // additional persistent state. - for (Auction newAuction : c.element().getValue().getAll(NexmarkQueryUtil.AUCTION_TAG)) { - newAuctionCounter.inc(); - newOldOutputCounter.inc(); - c.output(KV.of(newAuction, existingPerson)); - } - return; - } - - Person theNewPerson = null; - for (Person newPerson : c.element().getValue().getAll(NexmarkQueryUtil.PERSON_TAG)) { - if (theNewPerson == null) { - theNewPerson = newPerson; + Event event = element.getValue(); + Instant eventTime = null; + // Event is a union object, handle a new person or auction. + if (event.newPerson != null) { + Person person = event.newPerson; + eventTime = person.dateTime; + if (existingPerson == null) { + newPersonCounter.inc(); + personState.write(person); + // We've now seen the person for this person id so can flush any + // pending auctions for the same seller id (an auction is done by only one seller). + Iterable pendingAuctions = auctionsState.read(); + if (pendingAuctions != null) { + for (Auction pendingAuction : pendingAuctions) { + oldNewOutputCounter.inc(); + output.output(KV.of(pendingAuction, person)); + } + auctionsState.clear(); + } } else { - if (theNewPerson.equals(newPerson)) { - LOG.error("Duplicate person {}", theNewPerson); + if (person.equals(existingPerson)) { + LOG.error("Duplicate person {}", person); } else { - LOG.error("Conflicting persons {} and {}", theNewPerson, newPerson); + LOG.error("Conflicting persons {} and {}", existingPerson, person); } fatalCounter.inc(); - continue; } - newPersonCounter.inc(); - // We've now seen the person for this person id so can flush any - // pending auctions for the same seller id (an auction is done by only one seller). 
- List pendingAuctions = auctionsState.read(); - if (pendingAuctions != null) { - for (Auction pendingAuction : pendingAuctions) { - oldNewOutputCounter.inc(); - c.output(KV.of(pendingAuction, newPerson)); - } - auctionsState.clear(); - } - // Also deal with any new auctions. - for (Auction newAuction : c.element().getValue().getAll(NexmarkQueryUtil.AUCTION_TAG)) { + } else if (event.newAuction != null) { + Auction auction = event.newAuction; + eventTime = auction.dateTime; + newAuctionCounter.inc(); + if (existingPerson == null) { + auctionsState.add(auction); + } else { newAuctionCounter.inc(); - newNewOutputCounter.inc(); - c.output(KV.of(newAuction, newPerson)); + newOldOutputCounter.inc(); + output.output(KV.of(auction, existingPerson)); } - // Remember this person for any future auctions. - personState.write(newPerson); - // set a time out to clear this state + } else { + LOG.error("Only expecting people or auctions but received {}", event); + fatalCounter.inc(); + } + if (eventTime != null) { + // Set or reset the cleanup timer to clear the state. Instant firingTime = - new Instant(newPerson.dateTime).plus(Duration.standardSeconds(maxAuctionsWaitingTime)); + new Instant(eventTime).plus(Duration.standardSeconds(maxAuctionsWaitingTime)); timer.set(firingTime); } - if (theNewPerson != null) { - return; - } - - // We'll need to remember the auctions until we see the corresponding - // new person event. - List pendingAuctions = auctionsState.read(); - if (pendingAuctions == null) { - pendingAuctions = new ArrayList<>(); - } - for (Auction newAuction : c.element().getValue().getAll(NexmarkQueryUtil.AUCTION_TAG)) { - newAuctionCounter.inc(); - pendingAuctions.add(newAuction); - } - auctionsState.write(pendingAuctions); } - @OnTimer(PERSON_STATE_EXPIRING) + @OnTimer(STATE_EXPIRING) public void onTimerCallback( - OnTimerContext context, @StateId(PERSON) ValueState personState) { + OnTimerContext context, + @StateId(PERSON) ValueState personState, + @StateId(AUCTIONS) BagState auctionState) { personState.clear(); + auctionState.clear(); } } } diff --git a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/Query5.java b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/Query5.java index 76d9e64d3245..77d64ab0b07b 100644 --- a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/Query5.java +++ b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/Query5.java @@ -18,12 +18,18 @@ package org.apache.beam.sdk.nexmark.queries; import java.util.ArrayList; -import java.util.Collections; import java.util.List; +import java.util.Objects; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.CoderRegistry; import org.apache.beam.sdk.nexmark.NexmarkConfiguration; import org.apache.beam.sdk.nexmark.model.AuctionCount; import org.apache.beam.sdk.nexmark.model.Event; +import org.apache.beam.sdk.nexmark.queries.Query5.TopCombineFn.Accum; +import org.apache.beam.sdk.schemas.JavaFieldSchema; +import org.apache.beam.sdk.schemas.SchemaCoder; import org.apache.beam.sdk.transforms.Combine; +import org.apache.beam.sdk.transforms.Combine.AccumulatingCombineFn; import org.apache.beam.sdk.transforms.Count; import org.apache.beam.sdk.transforms.DoFn; import org.apache.beam.sdk.transforms.ParDo; @@ -31,6 +37,10 @@ import org.apache.beam.sdk.transforms.windowing.Window; import org.apache.beam.sdk.values.KV; import org.apache.beam.sdk.values.PCollection; +import 
org.apache.beam.sdk.values.TypeDescriptor; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; +import org.checkerframework.checker.nullness.qual.NonNull; +import org.checkerframework.checker.nullness.qual.Nullable; import org.joda.time.Duration; /** @@ -53,6 +63,80 @@ public class Query5 extends NexmarkQueryTransform { private final NexmarkConfiguration configuration; + /** CombineFn that takes bidders with counts and keeps all bidders with the top count. */ + public static class TopCombineFn + extends AccumulatingCombineFn, Accum, KV>> { + @Override + public Accum createAccumulator() { + return new Accum(); + } + + @Override + public Coder getAccumulatorCoder( + @NonNull CoderRegistry registry, @NonNull Coder> inputCoder) { + JavaFieldSchema provider = new JavaFieldSchema(); + TypeDescriptor typeDescriptor = new TypeDescriptor() {}; + return SchemaCoder.of( + provider.schemaFor(typeDescriptor), + typeDescriptor, + provider.toRowFunction(typeDescriptor), + provider.fromRowFunction(typeDescriptor)); + } + + /** Accumulator that takes bidders with counts and keeps all bidders with the top count. */ + public static class Accum + implements AccumulatingCombineFn.Accumulator, Accum, KV>> { + + public ArrayList auctions = new ArrayList<>(); + public long count = 0; + + @Override + public void addInput(KV input) { + if (input.getValue() > count) { + count = input.getValue(); + auctions.clear(); + auctions.add(input.getKey()); + } else if (input.getValue() == count) { + auctions.add(input.getKey()); + } + } + + @Override + public void mergeAccumulator(Accum other) { + if (other.count > this.count) { + this.count = other.count; + this.auctions.clear(); + this.auctions.addAll(other.auctions); + } else if (other.count == this.count) { + this.auctions.addAll(other.auctions); + } + } + + @Override + public KV> extractOutput() { + return KV.of(count, auctions); + } + + @Override + public boolean equals(@Nullable Object o) { + if (this == o) { + return true; + } + if (o == null || getClass() != o.getClass()) { + return false; + } + + Accum other = (Accum) o; + return this.count == other.count && Iterables.elementsEqual(this.auctions, other.auctions); + } + + @Override + public int hashCode() { + return Objects.hash(count, auctions); + } + } + } + public Query5(NexmarkConfiguration configuration) { super("Query5"); this.configuration = configuration; @@ -74,58 +158,19 @@ public PCollection expand(PCollection events) { // Count the number of bids per auction id. .apply(Count.perElement()) - // We'll want to keep all auctions with the maximal number of bids. - // Start by lifting each into a singleton list. - // need to do so because bellow combine returns a list of auctions in the key in case of - // equal number of bids. Combine needs to have same input type and return type. - .apply( - name + ".ToSingletons", - ParDo.of( - new DoFn, KV, Long>>() { - @ProcessElement - public void processElement(ProcessContext c) { - c.output( - KV.of( - Collections.singletonList(c.element().getKey()), - c.element().getValue())); - } - })) - // Keep only the auction ids with the most bids. 
.apply( - Combine.globally( - new Combine.BinaryCombineFn, Long>>() { - @Override - public KV, Long> apply( - KV, Long> left, KV, Long> right) { - List leftBestAuctions = left.getKey(); - long leftCount = left.getValue(); - List rightBestAuctions = right.getKey(); - long rightCount = right.getValue(); - if (leftCount > rightCount) { - return left; - } else if (leftCount < rightCount) { - return right; - } else { - List newBestAuctions = new ArrayList<>(); - newBestAuctions.addAll(leftBestAuctions); - newBestAuctions.addAll(rightBestAuctions); - return KV.of(newBestAuctions, leftCount); - } - } - }) - .withoutDefaults() - .withFanout(configuration.fanout)) + Combine.globally(new TopCombineFn()).withoutDefaults().withFanout(configuration.fanout)) // Project into result. .apply( name + ".Select", ParDo.of( - new DoFn, Long>, AuctionCount>() { + new DoFn>, AuctionCount>() { @ProcessElement public void processElement(ProcessContext c) { - long count = c.element().getValue(); - for (long auction : c.element().getKey()) { + long count = c.element().getKey(); + for (long auction : c.element().getValue()) { c.output(new AuctionCount(auction, count)); } } diff --git a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/WinningBids.java b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/WinningBids.java index 4fb34764a720..684924458e4c 100644 --- a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/WinningBids.java +++ b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/WinningBids.java @@ -283,29 +283,6 @@ public Coder windowCoder() { public WindowMappingFn getDefaultWindowMappingFn() { throw new UnsupportedOperationException("AuctionWindowFn not supported for side inputs"); } - - /** - * Below we will GBK auctions and bids on their auction ids. Then we will reduce those per id to - * emit {@code (auction, winning bid)} pairs for auctions which have expired with at least one - * valid bid. We would like those output pairs to have a timestamp of the auction's expiry - * (since that's the earliest we know for sure we have the correct winner). We would also like - * to make that winning results are available to following stages at the auction's expiry. - * - *

    Each result of the GBK will have a timestamp of the min of the result of this object's - * assignOutputTime over all records which end up in one of its iterables. Thus we get the - * desired behavior if we ignore each record's timestamp and always return the auction window's - * 'maxTimestamp', which will correspond to the auction's expiry. - * - *

    In contrast, if this object's assignOutputTime were to return 'inputTimestamp' (the usual - * implementation), then each GBK record will take as its timestamp the minimum of the - * timestamps of all bids and auctions within it, which will always be the auction's timestamp. - * An auction which expires well into the future would thus hold up the watermark of the GBK - * results until that auction expired. That in turn would hold up all winning pairs. - */ - @Override - public Instant getOutputTime(Instant inputTimestamp, AuctionOrBidWindow window) { - return window.maxTimestamp(); - } } private final AuctionOrBidWindowFn auctionOrBidWindowFn; diff --git a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/sources/UnboundedEventSource.java b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/sources/UnboundedEventSource.java index ec328d8a914f..0a48131daabb 100644 --- a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/sources/UnboundedEventSource.java +++ b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/sources/UnboundedEventSource.java @@ -98,7 +98,7 @@ private class EventReader extends UnboundedReader { * Current backlog, as estimated number of event bytes we are behind, or null if unknown. * Reported to callers. */ - private @Nullable Long backlogBytes; + private long backlogBytes; /** Wallclock time (ms since epoch) we last reported the backlog, or -1 if never reported. */ private long lastReportedBacklogWallclock; @@ -127,6 +127,7 @@ public EventReader(Generator generator) { lastReportedBacklogWallclock = -1; pendingEventWallclockTime = -1; timestampAtLastReportedBacklogMs = -1; + updateBacklog(System.currentTimeMillis(), 0); } public EventReader(GeneratorConfig config) { @@ -146,9 +147,7 @@ public boolean advance() { while (pendingEvent == null) { if (!generator.hasNext() && heldBackEvents.isEmpty()) { // No more events, EVER. - if (isRateLimited) { - updateBacklog(System.currentTimeMillis(), 0); - } + updateBacklog(System.currentTimeMillis(), 0); if (watermark < BoundedWindow.TIMESTAMP_MAX_VALUE.getMillis()) { watermark = BoundedWindow.TIMESTAMP_MAX_VALUE.getMillis(); LOG.trace("stopped unbounded generator {}", generator); @@ -177,9 +176,7 @@ public boolean advance() { } } else { // Waiting for held-back event to fire. - if (isRateLimited) { - updateBacklog(now, 0); - } + updateBacklog(now, 0); return false; } @@ -199,6 +196,8 @@ public boolean advance() { return false; } updateBacklog(now, now - pendingEventWallclockTime); + } else { + updateBacklog(now, 0); } // This event is ready to fire. 
@@ -210,20 +209,26 @@ public boolean advance() { private void updateBacklog(long now, long newBacklogDurationMs) { backlogDurationMs = newBacklogDurationMs; long interEventDelayUs = generator.currentInterEventDelayUs(); - if (interEventDelayUs != 0) { + if (isRateLimited && interEventDelayUs > 0) { long backlogEvents = (backlogDurationMs * 1000 + interEventDelayUs - 1) / interEventDelayUs; backlogBytes = generator.getCurrentConfig().estimatedBytesForEvents(backlogEvents); + } else { + double fractionRemaining = 1.0 - generator.getFractionConsumed(); + backlogBytes = + Math.max( + 0L, + (long) (generator.getCurrentConfig().getEstimatedSizeBytes() * fractionRemaining)); } if (lastReportedBacklogWallclock < 0 || now - lastReportedBacklogWallclock > BACKLOG_PERIOD.getMillis()) { - double timeDialation = Double.NaN; + double timeDilation = Double.NaN; if (pendingEvent != null && lastReportedBacklogWallclock >= 0 && timestampAtLastReportedBacklogMs >= 0) { long wallclockProgressionMs = now - lastReportedBacklogWallclock; long eventTimeProgressionMs = pendingEvent.getTimestamp().getMillis() - timestampAtLastReportedBacklogMs; - timeDialation = (double) eventTimeProgressionMs / (double) wallclockProgressionMs; + timeDilation = (double) eventTimeProgressionMs / (double) wallclockProgressionMs; } LOG.debug( "unbounded generator backlog now {}ms ({} bytes) at {}us interEventDelay " @@ -231,7 +236,7 @@ private void updateBacklog(long now, long newBacklogDurationMs) { backlogDurationMs, backlogBytes, interEventDelayUs, - timeDialation); + timeDilation); lastReportedBacklogWallclock = now; if (pendingEvent != null) { timestampAtLastReportedBacklogMs = pendingEvent.getTimestamp().getMillis(); @@ -277,7 +282,7 @@ public GeneratorCheckpoint getCheckpointMark() { @Override public long getSplitBacklogBytes() { - return backlogBytes == null ? 
BACKLOG_UNKNOWN : backlogBytes; + return backlogBytes; } @Override diff --git a/sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/NexmarkConfigurationTest.java b/sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/NexmarkConfigurationTest.java index de447b03c47a..c621bd40469c 100644 --- a/sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/NexmarkConfigurationTest.java +++ b/sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/NexmarkConfigurationTest.java @@ -17,8 +17,8 @@ */ package org.apache.beam.sdk.nexmark; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; -import static org.junit.Assert.assertThat; import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.junit.Test; diff --git a/sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/NexmarkUtilsTest.java b/sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/NexmarkUtilsTest.java index 5c03d0f78a41..d3f43105d314 100644 --- a/sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/NexmarkUtilsTest.java +++ b/sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/NexmarkUtilsTest.java @@ -45,7 +45,6 @@ @RunWith(JUnit4.class) @SuppressWarnings({ "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556) - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) }) public class NexmarkUtilsTest { diff --git a/sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/PerfsToBigQueryTest.java b/sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/PerfsToBigQueryTest.java index c60c6e70d895..e8092ac49764 100644 --- a/sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/PerfsToBigQueryTest.java +++ b/sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/PerfsToBigQueryTest.java @@ -18,7 +18,7 @@ package org.apache.beam.sdk.nexmark; import static org.hamcrest.CoreMatchers.hasItems; -import static org.junit.Assert.assertThat; +import static org.hamcrest.MatcherAssert.assertThat; import java.util.HashMap; import java.util.List; @@ -31,9 +31,6 @@ import org.junit.Test; /** Test class for BigQuery sinks. 
*/ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class PerfsToBigQueryTest { private static final NexmarkQueryName QUERY = NexmarkQueryName.CURRENCY_CONVERSION; diff --git a/sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/queries/BoundedSideInputJoinTest.java b/sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/queries/BoundedSideInputJoinTest.java index f8f65d253a7d..7435b894cb48 100644 --- a/sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/queries/BoundedSideInputJoinTest.java +++ b/sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/queries/BoundedSideInputJoinTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.sdk.nexmark.queries; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.greaterThan; -import static org.junit.Assert.assertThat; import java.util.Random; import org.apache.beam.sdk.PipelineResult; diff --git a/sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/queries/SessionSideInputJoinTest.java b/sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/queries/SessionSideInputJoinTest.java index efb904b25c9b..c139e957d111 100644 --- a/sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/queries/SessionSideInputJoinTest.java +++ b/sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/queries/SessionSideInputJoinTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.sdk.nexmark.queries; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.greaterThan; -import static org.junit.Assert.assertThat; import java.util.Random; import org.apache.beam.sdk.PipelineResult; diff --git a/sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/queries/sql/SqlBoundedSideInputJoinTest.java b/sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/queries/sql/SqlBoundedSideInputJoinTest.java index 577ee70deed0..3810eecc4f57 100644 --- a/sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/queries/sql/SqlBoundedSideInputJoinTest.java +++ b/sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/queries/sql/SqlBoundedSideInputJoinTest.java @@ -17,9 +17,9 @@ */ package org.apache.beam.sdk.nexmark.queries.sql; +import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import static org.hamcrest.Matchers.greaterThan; -import static org.junit.Assert.assertThat; import java.util.Random; import org.apache.beam.sdk.PipelineResult; diff --git a/sdks/java/testing/test-utils/build.gradle b/sdks/java/testing/test-utils/build.gradle index d95af02acf67..e21c54696c58 100644 --- a/sdks/java/testing/test-utils/build.gradle +++ b/sdks/java/testing/test-utils/build.gradle @@ -30,12 +30,15 @@ dependencies { compile project(path: ":sdks:java:core", configuration: "shadow") compile library.java.vendored_guava_26_0_jre compile library.java.google_cloud_bigquery - compile project(":sdks:java:extensions:google-cloud-platform-core") + compile library.java.google_code_gson + compile library.java.joda_time + compile library.java.commons_compress + compile library.java.commons_lang3 + compile library.java.http_client + compile library.java.http_core + compile library.java.slf4j_api testCompile library.java.junit - testCompile library.java.mockito_core - testCompile 
library.java.hamcrest_core - testCompile library.java.hamcrest_library testRuntimeOnly project(path: ":runners:direct-java", configuration: "shadowTest") } diff --git a/sdks/java/testing/test-utils/src/main/java/org/apache/beam/sdk/testutils/publishing/InfluxDBPublisher.java b/sdks/java/testing/test-utils/src/main/java/org/apache/beam/sdk/testutils/publishing/InfluxDBPublisher.java index 3507720be768..42c4d8b4ec1e 100644 --- a/sdks/java/testing/test-utils/src/main/java/org/apache/beam/sdk/testutils/publishing/InfluxDBPublisher.java +++ b/sdks/java/testing/test-utils/src/main/java/org/apache/beam/sdk/testutils/publishing/InfluxDBPublisher.java @@ -58,8 +58,10 @@ public final class InfluxDBPublisher { private InfluxDBPublisher() {} public static void publishNexmarkResults( - final Collection> results, final InfluxDBSettings settings) { - publishWithCheck(settings, () -> publishNexmark(results, settings)); + final Collection> results, + final InfluxDBSettings settings, + final Map tags) { + publishWithCheck(settings, () -> publishNexmark(results, settings, tags)); } public static void publishWithSettings( @@ -82,25 +84,38 @@ private static void publishWithCheck( } private static void publishNexmark( - final Collection> results, final InfluxDBSettings settings) + final Collection> results, + final InfluxDBSettings settings, + final Map tags) throws Exception { final HttpClientBuilder builder = provideHttpBuilder(settings); final HttpPost postRequest = providePOSTRequest(settings); final StringBuilder metricBuilder = new StringBuilder(); + results.forEach( - map -> - metricBuilder - .append(map.get("measurement")) - .append(",") - .append(getKV(map, "runner")) - .append(" ") - .append(getKV(map, "runtimeMs")) - .append(",") - .append(getKV(map, "numResults")) - .append(" ") - .append(map.get("timestamp")) - .append('\n')); + map -> { + metricBuilder.append(map.get("measurement")).append(",").append(getKV(map, "runner")); + if (tags != null && !tags.isEmpty()) { + tags.entrySet().stream() + .forEach( + entry -> { + metricBuilder + .append(",") + .append(entry.getKey()) + .append("=") + .append(entry.getValue()); + }); + } + metricBuilder + .append(" ") + .append(getKV(map, "runtimeMs")) + .append(",") + .append(getKV(map, "numResults")) + .append(" ") + .append(map.get("timestamp")) + .append('\n'); + }); postRequest.setEntity( new GzipCompressingEntity(new ByteArrayEntity(metricBuilder.toString().getBytes(UTF_8)))); diff --git a/sdks/java/testing/test-utils/src/test/java/org/apache/beam/sdk/testutils/fakes/FakeBigQueryClient.java b/sdks/java/testing/test-utils/src/test/java/org/apache/beam/sdk/testutils/fakes/FakeBigQueryClient.java index e8e40b2888ad..2bcbe4ef90cd 100644 --- a/sdks/java/testing/test-utils/src/test/java/org/apache/beam/sdk/testutils/fakes/FakeBigQueryClient.java +++ b/sdks/java/testing/test-utils/src/test/java/org/apache/beam/sdk/testutils/fakes/FakeBigQueryClient.java @@ -29,9 +29,6 @@ * * @see BigQueryClient */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FakeBigQueryClient extends BigQueryClient { private Map>> rowsPerTable; diff --git a/sdks/java/testing/test-utils/src/test/java/org/apache/beam/sdk/testutils/fakes/FakeBigQueryResultsPublisher.java b/sdks/java/testing/test-utils/src/test/java/org/apache/beam/sdk/testutils/fakes/FakeBigQueryResultsPublisher.java index 7fee64331b1e..e8adfb900f38 100644 --- 
a/sdks/java/testing/test-utils/src/test/java/org/apache/beam/sdk/testutils/fakes/FakeBigQueryResultsPublisher.java +++ b/sdks/java/testing/test-utils/src/test/java/org/apache/beam/sdk/testutils/fakes/FakeBigQueryResultsPublisher.java @@ -25,9 +25,6 @@ import org.apache.beam.sdk.testutils.publishing.BigQueryResultsPublisher; /** A fake implementation of {@link BigQueryResultsPublisher} for testing purposes only. */ -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class FakeBigQueryResultsPublisher extends BigQueryResultsPublisher { private Map> recordsPerTable; diff --git a/sdks/java/testing/test-utils/src/test/java/org/apache/beam/sdk/testutils/jvmverification/JvmVerification.java b/sdks/java/testing/test-utils/src/test/java/org/apache/beam/sdk/testutils/jvmverification/JvmVerification.java index 719cde8f7265..1cc0ed0cecbe 100644 --- a/sdks/java/testing/test-utils/src/test/java/org/apache/beam/sdk/testutils/jvmverification/JvmVerification.java +++ b/sdks/java/testing/test-utils/src/test/java/org/apache/beam/sdk/testutils/jvmverification/JvmVerification.java @@ -30,9 +30,6 @@ import org.apache.commons.codec.binary.Hex; import org.junit.Test; -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class JvmVerification { private static final Map versionMapping = new HashMap<>(); diff --git a/sdks/java/testing/test-utils/src/test/java/org/apache/beam/sdk/testutils/publishing/BigQueryResultsPublisherTest.java b/sdks/java/testing/test-utils/src/test/java/org/apache/beam/sdk/testutils/publishing/BigQueryResultsPublisherTest.java index 7ee9bdf3a8bb..07a40352e49b 100644 --- a/sdks/java/testing/test-utils/src/test/java/org/apache/beam/sdk/testutils/publishing/BigQueryResultsPublisherTest.java +++ b/sdks/java/testing/test-utils/src/test/java/org/apache/beam/sdk/testutils/publishing/BigQueryResultsPublisherTest.java @@ -33,9 +33,6 @@ /** Tests for {@link BigQueryResultsPublisher}. */ @RunWith(JUnit4.class) -@SuppressWarnings({ - "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402) -}) public class BigQueryResultsPublisherTest { private static final String TABLE_NAME = "table"; diff --git a/sdks/java/testing/tpcds/README.md b/sdks/java/testing/tpcds/README.md new file mode 100644 index 000000000000..247b5cbe9300 --- /dev/null +++ b/sdks/java/testing/tpcds/README.md @@ -0,0 +1,68 @@ + + +# TPC-DS Benchmark + +## Google Dataflow Runner + +To execute TPC-DS benchmark for 1Gb dataset on Google Dataflow, run the following example command from the command line: + +```bash +./gradlew :sdks:java:testing:tpcds:run -Ptpcds.args="--dataSize=1G \ + --runner=DataflowRunner \ + --queries=3,26,55 \ + --tpcParallel=2 \ + --dataDirectory=/path/to/tpcds_data/ \ + --project=apache-beam-testing \ + --stagingLocation=gs://beamsql_tpcds_1/staging \ + --tempLocation=gs://beamsql_tpcds_2/temp \ + --dataDirectory=/path/to/tpcds_data/ \ + --region=us-west1 \ + --maxNumWorkers=10" +``` + +To run a query using ZetaSQL planner (currently Query96 can be run using ZetaSQL), set the plannerName as below. If not specified, the default planner is Calcite. 
+ +```bash +./gradlew :sdks:java:testing:tpcds:run -Ptpcds.args="--dataSize=1G \ + --runner=DataflowRunner \ + --queries=96 \ + --tpcParallel=2 \ + --dataDirectory=/path/to/tpcds_data/ \ + --plannerName=org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner \ + --project=apache-beam-testing \ + --stagingLocation=gs://beamsql_tpcds_1/staging \ + --tempLocation=gs://beamsql_tpcds_2/temp \ + --region=us-west1 \ + --maxNumWorkers=10" +``` + +## Spark Runner + +To execute TPC-DS benchmark with Query3 for 1Gb dataset on Apache Spark 2.x, run the following example command from the command line: + +```bash +./gradlew :sdks:java:testing:tpcds:run -Ptpcds.runner=":runners:spark:2" -Ptpcds.args=" \ + --runner=SparkRunner \ + --queries=3 \ + --tpcParallel=1 \ + --dataDirectory=/path/to/tpcds_data/ \ + --dataSize=1G \ + --resultsDirectory=/path/to/tpcds_results/" +``` diff --git a/sdks/java/testing/tpcds/build.gradle b/sdks/java/testing/tpcds/build.gradle index c52ec7adb31e..931b7582ae18 100644 --- a/sdks/java/testing/tpcds/build.gradle +++ b/sdks/java/testing/tpcds/build.gradle @@ -16,38 +16,108 @@ * limitations under the License. */ -plugins { - id 'java' -} +plugins { id 'org.apache.beam.module' } +applyJavaNature( + automaticModuleName: 'org.apache.beam.sdk.tpcds', + exportJavadoc: false, + archivesBaseName: 'beam-sdks-java-tpcds', +) -description = "Apache Beam :: SDKs :: Java :: TPC-DS Benchark" +description = "Apache Beam :: SDKs :: Java :: TPC-DS" -version '2.24.0-SNAPSHOT' +// When running via Gradle, this property can be used to pass commandline arguments +// to the TPD-DS run +def tpcdsArgsProperty = "tpcds.args" -sourceCompatibility = 1.8 +// When running via Gradle, this property sets the runner dependency +def tpcdsRunnerProperty = "tpcds.runner" +def tpcdsRunnerDependency = project.findProperty(tpcdsRunnerProperty) + ?: ":runners:direct-java" +def shouldProvideSpark = ":runners:spark:2".equals(tpcdsRunnerDependency) +def isDataflowRunner = ":runners:google-cloud-dataflow-java".equals(tpcdsRunnerDependency) +def runnerConfiguration = ":runners:direct-java".equals(tpcdsRunnerDependency) ? "shadow" : null + +if (isDataflowRunner) { + /* + * We need to rely on manually specifying these evaluationDependsOn to ensure that + * the following projects are evaluated before we evaluate this project. This is because + * we are attempting to reference a property from the project directly. 
+ */ + evaluationDependsOn(":runners:google-cloud-dataflow-java:worker:legacy-worker") +} -repositories { - mavenCentral() +configurations { + // A configuration for running the TPC-DS launcher directly from Gradle, which + // uses Gradle to put the appropriate dependencies on the Classpath rather than + // bundling them into a fat jar + gradleRun } dependencies { - compile 'com.googlecode.json-simple:json-simple:1.1.1' - compile project(path: ":sdks:java:core", configuration: "shadow") - compile project(path: ":runners:google-cloud-dataflow-java") - compile project(":sdks:java:io:google-cloud-platform") + compile library.java.avro + compile library.java.vendored_guava_26_0_jre + compile library.java.vendored_calcite_1_26_0 + compile library.java.commons_csv + compile library.java.slf4j_api + compile "com.googlecode.json-simple:json-simple:1.1.1" + compile "com.alibaba:fastjson:1.2.69" compile project(":sdks:java:extensions:sql") compile project(":sdks:java:extensions:sql:zetasql") - compile group: 'com.google.auto.service', name: 'auto-service', version: '1.0-rc1' - testCompile group: 'junit', name: 'junit', version: '4.12' + compile project(":sdks:java:io:parquet") + compile project(path: ":runners:google-cloud-dataflow-java") + compile project(path: ":sdks:java:core", configuration: "shadow") + testRuntimeClasspath library.java.slf4j_jdk14 + testCompile project(path: ":sdks:java:io:google-cloud-platform", configuration: "testRuntime") + gradleRun project(project.path) + gradleRun project(path: tpcdsRunnerDependency, configuration: runnerConfiguration) + + // The Spark runner requires the user to provide a Spark dependency. For self-contained + // runs with the Spark runner, we can provide such a dependency. This is deliberately phrased + // to not hardcode any runner other than :runners:direct-java + if (shouldProvideSpark) { + gradleRun library.java.spark_core, { + exclude group:"org.slf4j", module:"jul-to-slf4j" + } + gradleRun library.java.spark_sql + gradleRun library.java.spark_streaming + } } -// When running via Gradle, this property can be used to pass commandline arguments -// to the tpcds run -def tpcdsArgsProperty = "tpcds.args" +if (shouldProvideSpark) { + configurations.gradleRun { + // Using Spark runner causes a StackOverflowError if slf4j-jdk14 is on the classpath + exclude group: "org.slf4j", module: "slf4j-jdk14" + } +} +// Execute the TPC-DS queries or suites via Gradle. +// +// Parameters: +// -Ptpcds.runner +// Specify a runner subproject, such as ":runners:spark:2" or ":runners:flink:1.13" +// Defaults to ":runners:direct-java" +// +// -Ptpcds.args +// Specify the command line for invoking org.apache.beam.sdk.tpcds.BeamTpcds task run(type: JavaExec) { - main = "org.apache.beam.sdk.tpcds.BeamTpcds" - classpath = sourceSets.main.runtimeClasspath def tpcdsArgsStr = project.findProperty(tpcdsArgsProperty) ?: "" - args tpcdsArgsStr.split() + def tpcdsArgsList = new ArrayList() + Collections.addAll(tpcdsArgsList, tpcdsArgsStr.split()) + + if (isDataflowRunner) { + dependsOn ":runners:google-cloud-dataflow-java:worker:legacy-worker:shadowJar" + + def dataflowWorkerJar = project.findProperty('dataflowWorkerJar') ?: + project(":runners:google-cloud-dataflow-java:worker:legacy-worker") + .shadowJar.archivePath + // Provide job with a customizable worker jar. + // With legacy worker jar, containerImage is set to empty (i.e. to use the internal build). + // More context and discussions can be found in PR#6694. 
+ tpcdsArgsList.add("--dataflowWorkerJar=${dataflowWorkerJar}".toString()) + tpcdsArgsList.add('--workerHarnessContainerImage=') + } + + main = "org.apache.beam.sdk.tpcds.BeamTpcds" + classpath = configurations.gradleRun + args tpcdsArgsList.toArray() } diff --git a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/BeamSqlEnvRunner.java b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/BeamSqlEnvRunner.java index f94b748bc355..504b2ad14f6f 100644 --- a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/BeamSqlEnvRunner.java +++ b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/BeamSqlEnvRunner.java @@ -17,10 +17,19 @@ */ package org.apache.beam.sdk.tpcds; +import static org.apache.beam.sdk.util.Preconditions.checkArgumentNotNull; + import com.alibaba.fastjson.JSONObject; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.List; +import java.util.Map; +import java.util.concurrent.CompletionService; +import java.util.concurrent.ExecutorCompletionService; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Executors; import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions; import org.apache.beam.sdk.Pipeline; -import org.apache.beam.sdk.extensions.sql.SqlTransform; import org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv; import org.apache.beam.sdk.extensions.sql.impl.BeamSqlPipelineOptions; import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils; @@ -34,155 +43,194 @@ import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.Row; import org.apache.beam.sdk.values.TypeDescriptors; - -import java.util.ArrayList; -import java.util.Arrays; -import java.util.List; -import java.util.Map; -import java.util.concurrent.CompletionService; -import java.util.concurrent.ExecutorCompletionService; -import java.util.concurrent.ExecutorService; -import java.util.concurrent.Executors; import org.slf4j.Logger; import org.slf4j.LoggerFactory; /** - * This class executes jobs using BeamSqlEnv, it uses BeamSqlEnv.executeDdl and BeamSqlEnv.parseQuery to run queries. + * This class executes jobs using BeamSqlEnv, it uses BeamSqlEnv.executeDdl and + * BeamSqlEnv.parseQuery to run queries. 
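In practice the flow is: register the TPC-DS tables (via DDL or an InMemoryMetaStore), parse each query with BeamSqlEnv.parseQuery, and hand the resulting plan to BeamSqlRelUtils.toPCollection. The sketch below condenses that flow for a single hypothetical table and query; it only reuses calls that appear in this class, while the table definition, paths, query text, and the import locations of InMemoryMetaStore and TextTableProvider are assumptions for illustration. As the class notes, the DDL-based registration path is not currently supported by the ZetaSQL planner.

```java
// A condensed sketch of the BeamSqlEnv flow used in this class. The table, schema,
// paths and query below are placeholders, and the import locations of
// InMemoryMetaStore/TextTableProvider are assumed rather than copied from this change.
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv;
import org.apache.beam.sdk.extensions.sql.impl.BeamSqlPipelineOptions;
import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils;
import org.apache.beam.sdk.extensions.sql.meta.provider.text.TextTableProvider;
import org.apache.beam.sdk.extensions.sql.meta.store.InMemoryMetaStore;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

public class BeamSqlEnvFlowSketch {
  public static void main(String[] args) {
    BeamSqlPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(BeamSqlPipelineOptions.class);
    Pipeline pipeline = Pipeline.create(options);

    // Text-table-backed metastore, as registered in runUsingBeamSqlEnv.
    InMemoryMetaStore metaStore = new InMemoryMetaStore();
    metaStore.registerProvider(new TextTableProvider());

    BeamSqlEnv env =
        BeamSqlEnv.builder(metaStore)
            .setPipelineOptions(options)
            .setQueryPlannerClassName(options.getPlannerName())
            .build();

    // One table registered via DDL, using the same CSV/InformixUnload template this class builds.
    env.executeDdl(
        "CREATE EXTERNAL TABLE item (i_item_sk BIGINT, i_item_id VARCHAR) "
            + "TYPE text LOCATION '/path/to/tpcds_data/1G/item.dat' "
            + "TBLPROPERTIES '{\"format\":\"csv\", \"csvformat\": \"InformixUnload\"}'");

    // Parse the query and turn the relational plan into a PCollection<Row>;
    // from here the rows can be written out, e.g. with TextIO, as the runner does.
    PCollection<Row> rows =
        BeamSqlRelUtils.toPCollection(pipeline, env.parseQuery("SELECT COUNT(*) FROM item"));

    pipeline.run().waitUntilFinish();
  }
}
```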
*/ public class BeamSqlEnvRunner { - private static final String DATA_DIRECTORY = "gs://beamsql_tpcds_1/data"; - private static final String RESULT_DIRECTORY = "gs://beamsql_tpcds_1/tpcds_results"; - private static final String SUMMARY_START = "\n" + "TPC-DS Query Execution Summary:"; - private static final List SUMMARY_HEADERS_LIST = Arrays.asList("Query Name", "Job Name", "Data Size", "Dialect", "Status", "Start Time", "End Time", "Elapsed Time(sec)"); - - private static final Logger Log = LoggerFactory.getLogger(BeamSqlEnvRunner.class); - - private static String buildTableCreateStatement(String tableName) { - String createStatement = "CREATE EXTERNAL TABLE " + tableName + " (%s) TYPE text LOCATION '%s' TBLPROPERTIES '{\"format\":\"csv\", \"csvformat\": \"InformixUnload\"}'"; - return createStatement; + private static final String DATA_DIRECTORY = "gs://beamsql_tpcds_1/data"; + private static final String RESULT_DIRECTORY = "gs://beamsql_tpcds_1/tpcds_results"; + private static final String SUMMARY_START = "\n" + "TPC-DS Query Execution Summary:"; + private static final List SUMMARY_HEADERS_LIST = + Arrays.asList( + "Query Name", + "Job Name", + "Data Size", + "Dialect", + "Status", + "Start Time", + "End Time", + "Elapsed Time(sec)"); + + private static final Logger LOG = LoggerFactory.getLogger(BeamSqlEnvRunner.class); + + private static String buildTableCreateStatement(String tableName) { + return "CREATE EXTERNAL TABLE " + + tableName + + " (%s) TYPE text LOCATION '%s' TBLPROPERTIES '{\"format\":\"csv\", \"csvformat\": \"InformixUnload\"}'"; + } + + private static String buildDataLocation(String dataSize, String tableName) { + return DATA_DIRECTORY + "/" + dataSize + "/" + tableName + ".dat"; + } + + /** + * Register all tables into BeamSqlEnv, set their schemas, and set the locations where their + * corresponding data are stored. Currently this method is not supported by ZetaSQL planner. + */ + private static void registerAllTablesByBeamSqlEnv(BeamSqlEnv env, String dataSize) + throws Exception { + List tableNames = TableSchemaJSONLoader.getAllTableNames(); + for (String tableName : tableNames) { + String createStatement = buildTableCreateStatement(tableName); + String tableSchema = TableSchemaJSONLoader.parseTableSchema(tableName); + String dataLocation = buildDataLocation(dataSize, tableName); + env.executeDdl(String.format(createStatement, tableSchema, dataLocation)); } - - private static String buildDataLocation(String dataSize, String tableName) { - String dataLocation = DATA_DIRECTORY + "/" + dataSize + "/" + tableName + ".dat"; - return dataLocation; + } + + /** + * Register all tables into InMemoryMetaStore, set their schemas, and set the locations where + * their corresponding data are stored. 
+ */ + private static void registerAllTablesByInMemoryMetaStore( + InMemoryMetaStore inMemoryMetaStore, String dataSize) throws Exception { + JSONObject properties = new JSONObject(); + properties.put("csvformat", "InformixUnload"); + + Map schemaMap = TpcdsSchemas.getTpcdsSchemas(); + for (Map.Entry entry : schemaMap.entrySet()) { + String tableName = entry.getKey(); + String dataLocation = DATA_DIRECTORY + "/" + dataSize + "/" + tableName + ".dat"; + Schema tableSchema = schemaMap.get(tableName); + checkArgumentNotNull(tableSchema, "Table schema can't be null for table: " + tableName); + Table table = + Table.builder() + .name(tableName) + .schema(tableSchema) + .location(dataLocation) + .properties(properties) + .type("text") + .build(); + inMemoryMetaStore.createTable(table); } - - /** Register all tables into BeamSqlEnv, set their schemas, and set the locations where their corresponding data are stored. - * Currently this method is not supported by ZetaSQL planner. */ - private static void registerAllTablesByBeamSqlEnv(BeamSqlEnv env, String dataSize) throws Exception { - List tableNames = TableSchemaJSONLoader.getAllTableNames(); - for (String tableName : tableNames) { - String createStatement = buildTableCreateStatement(tableName); - String tableSchema = TableSchemaJSONLoader.parseTableSchema(tableName); - String dataLocation = buildDataLocation(dataSize, tableName); - env.executeDdl(String.format(createStatement, tableSchema, dataLocation)); - } + } + + /** + * Print the summary table after all jobs are finished. + * + * @param completion A collection of all TpcdsRunResult that are from finished jobs. + * @param numOfResults The number of results in the collection. + * @throws Exception + */ + private static void printExecutionSummary( + CompletionService completion, int numOfResults) throws Exception { + List> summaryRowsList = new ArrayList<>(); + for (int i = 0; i < numOfResults; i++) { + TpcdsRunResult tpcdsRunResult = completion.take().get(); + List list = new ArrayList<>(); + list.add(tpcdsRunResult.getQueryName()); + list.add(tpcdsRunResult.getJobName()); + list.add(tpcdsRunResult.getDataSize()); + list.add(tpcdsRunResult.getDialect()); + list.add(tpcdsRunResult.getIsSuccessful() ? "Successful" : "Failed"); + list.add(tpcdsRunResult.getIsSuccessful() ? tpcdsRunResult.getStartDate().toString() : ""); + list.add(tpcdsRunResult.getIsSuccessful() ? tpcdsRunResult.getEndDate().toString() : ""); + list.add( + tpcdsRunResult.getIsSuccessful() ? Double.toString(tpcdsRunResult.getElapsedTime()) : ""); + summaryRowsList.add(list); } - /** Register all tables into InMemoryMetaStore, set their schemas, and set the locations where their corresponding data are stored. 
*/ - private static void registerAllTablesByInMemoryMetaStore(InMemoryMetaStore inMemoryMetaStore, String dataSize) throws Exception { - JSONObject properties = new JSONObject(); - properties.put("csvformat", "InformixUnload"); - - Map schemaMap = TpcdsSchemas.getTpcdsSchemas(); - for (String tableName : schemaMap.keySet()) { - String dataLocation = DATA_DIRECTORY + "/" + dataSize + "/" + tableName + ".dat"; - Schema tableSchema = schemaMap.get(tableName); - Table table = Table.builder().name(tableName).schema(tableSchema).location(dataLocation).properties(properties).type("text").build(); - inMemoryMetaStore.createTable(table); - } + System.out.println(SUMMARY_START); + System.out.println(SummaryGenerator.generateTable(SUMMARY_HEADERS_LIST, summaryRowsList)); + } + + /** + * This is the alternative method in BeamTpcds.main method. Run job using BeamSqlEnv.parseQuery() + * method. (Doesn't perform well when running query96). + * + * @param args Command line arguments + * @throws Exception + */ + public static void runUsingBeamSqlEnv(String[] args) throws Exception { + InMemoryMetaStore inMemoryMetaStore = new InMemoryMetaStore(); + inMemoryMetaStore.registerProvider(new TextTableProvider()); + + TpcdsOptions tpcdsOptions = + PipelineOptionsFactory.fromArgs(args).withValidation().as(TpcdsOptions.class); + + String dataSize = TpcdsParametersReader.getAndCheckDataSize(tpcdsOptions); + String[] queryNames = TpcdsParametersReader.getAndCheckQueryNames(tpcdsOptions); + int nThreads = TpcdsParametersReader.getAndCheckTpcParallel(tpcdsOptions); + + // Using ExecutorService and CompletionService to fulfill multi-threading functionality + ExecutorService executor = Executors.newFixedThreadPool(nThreads); + CompletionService completion = new ExecutorCompletionService<>(executor); + + // Directly create all tables and register them into inMemoryMetaStore before creating + // BeamSqlEnv object. + registerAllTablesByInMemoryMetaStore(inMemoryMetaStore, dataSize); + + BeamSqlPipelineOptions beamSqlPipelineOptions = tpcdsOptions.as(BeamSqlPipelineOptions.class); + BeamSqlEnv env = + BeamSqlEnv.builder(inMemoryMetaStore) + .setPipelineOptions(beamSqlPipelineOptions) + .setQueryPlannerClassName(beamSqlPipelineOptions.getPlannerName()) + .build(); + + // Make an array of pipelines, each pipeline is responsible for running a corresponding query. + Pipeline[] pipelines = new Pipeline[queryNames.length]; + + // Execute all queries, transform the each result into a PCollection, write them into + // the txt file and store in a GCP directory. + for (int i = 0; i < queryNames.length; i++) { + // For each query, get a copy of pipelineOptions from command line arguments, cast + // tpcdsOptions as a DataflowPipelineOptions object to read and set required parameters for + // pipeline execution. + TpcdsOptions tpcdsOptionsCopy = + PipelineOptionsFactory.fromArgs(args).withValidation().as(TpcdsOptions.class); + DataflowPipelineOptions dataflowPipelineOptionsCopy = + tpcdsOptionsCopy.as(DataflowPipelineOptions.class); + + // Set a unique job name using the time stamp so that multiple different pipelines can run + // together. 
+ dataflowPipelineOptionsCopy.setJobName(queryNames[i] + "result" + System.currentTimeMillis()); + + pipelines[i] = Pipeline.create(dataflowPipelineOptionsCopy); + String queryString = QueryReader.readQuery(queryNames[i]); + + try { + // Query execution + PCollection rows = + BeamSqlRelUtils.toPCollection(pipelines[i], env.parseQuery(queryString)); + + // Transform the result from PCollection into PCollection, and write it to the + // location where results are stored. + PCollection rowStrings = + rows.apply(MapElements.into(TypeDescriptors.strings()).via(Row::toString)); + rowStrings.apply( + TextIO.write() + .to( + RESULT_DIRECTORY + + "/" + + dataSize + + "/" + + pipelines[i].getOptions().getJobName()) + .withSuffix(".txt") + .withNumShards(1)); + } catch (Exception e) { + LOG.error("{} failed to execute", queryNames[i]); + e.printStackTrace(); + } + + completion.submit(new TpcdsRun(pipelines[i])); } - /** - * Print the summary table after all jobs are finished. - * @param completion A collection of all TpcdsRunResult that are from finished jobs. - * @param numOfResults The number of results in the collection. - * @throws Exception - */ - private static void printExecutionSummary(CompletionService completion, int numOfResults) throws Exception { - List> summaryRowsList = new ArrayList<>(); - for (int i = 0; i < numOfResults; i++) { - TpcdsRunResult tpcdsRunResult = completion.take().get(); - List list = new ArrayList<>(); - list.add(tpcdsRunResult.getQueryName()); - list.add(tpcdsRunResult.getJobName()); - list.add(tpcdsRunResult.getDataSize()); - list.add(tpcdsRunResult.getDialect()); - list.add(tpcdsRunResult.getIsSuccessful() ? "Successful" : "Failed"); - list.add(tpcdsRunResult.getIsSuccessful() ? tpcdsRunResult.getStartDate().toString() : ""); - list.add(tpcdsRunResult.getIsSuccessful() ? tpcdsRunResult.getEndDate().toString(): ""); - list.add(tpcdsRunResult.getIsSuccessful() ? Double.toString(tpcdsRunResult.getElapsedTime()) : ""); - summaryRowsList.add(list); - } - - System.out.println(SUMMARY_START); - System.out.println(SummaryGenerator.generateTable(SUMMARY_HEADERS_LIST, summaryRowsList)); - } + executor.shutdown(); - /** - * This is the alternative method in BeamTpcds.main method. Run job using BeamSqlEnv.parseQuery() method. (Doesn't perform well when running query96). - * @param args Command line arguments - * @throws Exception - */ - public static void runUsingBeamSqlEnv(String[] args) throws Exception { - InMemoryMetaStore inMemoryMetaStore = new InMemoryMetaStore(); - inMemoryMetaStore.registerProvider(new TextTableProvider()); - - TpcdsOptions tpcdsOptions = PipelineOptionsFactory.fromArgs(args).withValidation().as(TpcdsOptions.class); - - String dataSize = TpcdsParametersReader.getAndCheckDataSize(tpcdsOptions); - String[] queryNameArr = TpcdsParametersReader.getAndCheckQueryNameArray(tpcdsOptions); - int nThreads = TpcdsParametersReader.getAndCheckTpcParallel(tpcdsOptions); - - // Using ExecutorService and CompletionService to fulfill multi-threading functionality - ExecutorService executor = Executors.newFixedThreadPool(nThreads); - CompletionService completion = new ExecutorCompletionService<>(executor); - - // Directly create all tables and register them into inMemoryMetaStore before creating BeamSqlEnv object. 
- registerAllTablesByInMemoryMetaStore(inMemoryMetaStore, dataSize); - - BeamSqlPipelineOptions beamSqlPipelineOptions = tpcdsOptions.as(BeamSqlPipelineOptions.class); - BeamSqlEnv env = - BeamSqlEnv - .builder(inMemoryMetaStore) - .setPipelineOptions(beamSqlPipelineOptions) - .setQueryPlannerClassName(beamSqlPipelineOptions.getPlannerName()) - .build(); - - // Make an array of pipelines, each pipeline is responsible for running a corresponding query. - Pipeline[] pipelines = new Pipeline[queryNameArr.length]; - - // Execute all queries, transform the each result into a PCollection, write them into the txt file and store in a GCP directory. - for (int i = 0; i < queryNameArr.length; i++) { - // For each query, get a copy of pipelineOptions from command line arguments, cast tpcdsOptions as a DataflowPipelineOptions object to read and set required parameters for pipeline execution. - TpcdsOptions tpcdsOptionsCopy = PipelineOptionsFactory.fromArgs(args).withValidation().as(TpcdsOptions.class); - DataflowPipelineOptions dataflowPipelineOptionsCopy = tpcdsOptionsCopy.as(DataflowPipelineOptions.class); - - // Set a unique job name using the time stamp so that multiple different pipelines can run together. - dataflowPipelineOptionsCopy.setJobName(queryNameArr[i] + "result" + System.currentTimeMillis()); - - pipelines[i] = Pipeline.create(dataflowPipelineOptionsCopy); - String queryString = QueryReader.readQuery(queryNameArr[i]); - - try { - // Query execution - PCollection rows = BeamSqlRelUtils.toPCollection(pipelines[i], env.parseQuery(queryString)); - - // Transform the result from PCollection into PCollection, and write it to the location where results are stored. - PCollection rowStrings = rows.apply(MapElements - .into(TypeDescriptors.strings()) - .via((Row row) -> row.toString())); - rowStrings.apply(TextIO.write().to(RESULT_DIRECTORY + "/" + dataSize + "/" + pipelines[i].getOptions().getJobName()).withSuffix(".txt").withNumShards(1)); - } catch (Exception e) { - Log.error("{} failed to execute", queryNameArr[i]); - e.printStackTrace(); - } - - completion.submit(new TpcdsRun(pipelines[i])); - } - - executor.shutdown(); - - printExecutionSummary(completion, queryNameArr.length); - } + printExecutionSummary(completion, queryNames.length); + } } diff --git a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/BeamTpcds.java b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/BeamTpcds.java index a7f67dea0acf..3361453130ef 100644 --- a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/BeamTpcds.java +++ b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/BeamTpcds.java @@ -17,43 +17,17 @@ */ package org.apache.beam.sdk.tpcds; - -/** - * To execute this main() method, run the following example command from the command line. - * - * ./gradlew :sdks:java:testing:tpcds:run -Ptpcds.args="--dataSize=1G \ - * --queries=3,26,55 \ - * --tpcParallel=2 \ - * --project=apache-beam-testing \ - * --stagingLocation=gs://beamsql_tpcds_1/staging \ - * --tempLocation=gs://beamsql_tpcds_2/temp \ - * --runner=DataflowRunner \ - * --region=us-west1 \ - * --maxNumWorkers=10" - * - * - * To run query using ZetaSQL planner (currently query96 can be run using ZetaSQL), set the plannerName as below. If not specified, the default planner is Calcite. 
- * - * ./gradlew :sdks:java:testing:tpcds:run -Ptpcds.args="--dataSize=1G \ - * --queries=96 \ - * --tpcParallel=2 \ - * --plannerName=org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner \ - * --project=apache-beam-testing \ - * --stagingLocation=gs://beamsql_tpcds_1/staging \ - * --tempLocation=gs://beamsql_tpcds_2/temp \ - * --runner=DataflowRunner \ - * --region=us-west1 \ - * --maxNumWorkers=10" - */ +/** Main driver program to run TPC-DS benchmark. */ public class BeamTpcds { - /** - * The main method can choose to run either SqlTransformRunner.runUsingSqlTransform() or BeamSqlEnvRunner.runUsingBeamSqlEnv() - * Currently the former has better performance so it is chosen. - * - * @param args Command line arguments - * @throws Exception - */ - public static void main(String[] args) throws Exception { - SqlTransformRunner.runUsingSqlTransform(args); - } + /** + * The main method can choose to run either SqlTransformRunner.runUsingSqlTransform() or + * BeamSqlEnvRunner.runUsingBeamSqlEnv() Currently the former has better performance so it is + * chosen. + * + * @param args Command line arguments + * @throws Exception + */ + public static void main(String[] args) throws Exception { + SqlTransformRunner.runUsingSqlTransform(args); + } } diff --git a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/CsvToRow.java b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/CsvToRow.java index 22519b2e1d41..d6c8ed8ee2f7 100644 --- a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/CsvToRow.java +++ b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/CsvToRow.java @@ -17,6 +17,9 @@ */ package org.apache.beam.sdk.tpcds; +import static org.apache.beam.sdk.extensions.sql.impl.schema.BeamTableUtils.csvLines2BeamRows; + +import java.io.Serializable; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.transforms.FlatMapElements; import org.apache.beam.sdk.transforms.PTransform; @@ -25,34 +28,30 @@ import org.apache.beam.sdk.values.TypeDescriptors; import org.apache.commons.csv.CSVFormat; -import java.io.Serializable; - -import static org.apache.beam.sdk.extensions.sql.impl.schema.BeamTableUtils.csvLines2BeamRows; - /** - * A readConverter class for TextTable that can read csv file and transform it to PCollection + * A readConverter class for TextTable that can read csv file and transform it to PCollection. 
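This converter (together with RowToCsv below) is what getTableCSV in SqlTransformRunner plugs into a TextTable. A minimal sketch of that wiring, assuming the same pipe-delimited CSVFormat used elsewhere in this change; the two-column schema, table name, and file path are placeholders.

```java
// A minimal sketch of the TextTable wiring used by getTableCSV in SqlTransformRunner.
// The schema, table name and file path are placeholders.
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.sql.meta.provider.text.TextTable;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.schemas.SchemaCoder;
import org.apache.beam.sdk.tpcds.CsvToRow;
import org.apache.beam.sdk.tpcds.RowToCsv;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;
import org.apache.commons.csv.CSVFormat;

public class CsvTableSketch {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // TPC-DS .dat files are pipe-delimited, with empty strings standing in for NULL.
    CSVFormat csvFormat = CSVFormat.MYSQL.withDelimiter('|').withNullString("");

    Schema itemSchema =
        Schema.builder().addInt64Field("i_item_sk").addStringField("i_item_id").build();

    // CsvToRow turns each text line into a Row; RowToCsv is the reverse converter.
    PCollection<Row> item =
        new TextTable(
                itemSchema,
                "/path/to/tpcds_data/1G/item.dat",
                new CsvToRow(itemSchema, csvFormat),
                new RowToCsv(csvFormat))
            .buildIOReader(pipeline.begin())
            .setCoder(SchemaCoder.of(itemSchema));

    pipeline.run().waitUntilFinish();
  }
}
```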
*/ public class CsvToRow extends PTransform, PCollection> - implements Serializable { - private Schema schema; - private CSVFormat csvFormat; + implements Serializable { + private final Schema schema; + private final CSVFormat csvFormat; - public CSVFormat getCsvFormat() { - return csvFormat; - } + public CSVFormat getCsvFormat() { + return csvFormat; + } - public CsvToRow(Schema schema, CSVFormat csvFormat) { - this.schema = schema; - this.csvFormat = csvFormat; - } + public CsvToRow(Schema schema, CSVFormat csvFormat) { + this.schema = schema; + this.csvFormat = csvFormat; + } - @Override - public PCollection expand(PCollection input) { - return input - .apply( - "csvToRow", - FlatMapElements.into(TypeDescriptors.rows()) - .via(s -> csvLines2BeamRows(csvFormat, s, schema))) - .setRowSchema(schema); - } + @Override + public PCollection expand(PCollection input) { + return input + .apply( + "csvToRow", + FlatMapElements.into(TypeDescriptors.rows()) + .via(s -> csvLines2BeamRows(csvFormat, s, schema))) + .setRowSchema(schema); + } } diff --git a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/QueryReader.java b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/QueryReader.java index 1666c78fba2f..3400a8da1c04 100644 --- a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/QueryReader.java +++ b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/QueryReader.java @@ -17,43 +17,44 @@ */ package org.apache.beam.sdk.tpcds; -import java.io.BufferedReader; -import java.io.File; -import java.io.FileReader; -import java.util.Objects; +import java.util.Set; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlNode; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.parser.SqlParseException; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.parser.SqlParser; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.Resources; /** - * The QueryReader reads query file (the file's extension is '.sql' and content doesn't end with a ';'), write the query as a string and return it. + * The QueryReader reads query file (the file's extension is '.sql' and content doesn't end with a + * ';'), write the query as a string and return it. */ public class QueryReader { - /** - * Reads a query file (.sql), return the query as a string. - * @param queryFileName The name of the query file (such as "query1, query5...") which is stored in resource/queries directory - * @return The query string stored in this file. - * @throws Exception - */ - public static String readQuery(String queryFileName) throws Exception { - // Prepare the file reader. - String queryFilePath = Objects.requireNonNull(QueryReader.class.getClassLoader().getResource("queries/" + queryFileName + ".sql")).getPath(); - File queryFile = new File(queryFilePath); - FileReader fileReader = new FileReader(queryFile); - BufferedReader reader = new BufferedReader(fileReader); + /** + * Reads a query file (.sql), return the query as a string. + * + * @param queryFileName The name of the query file (such as "query1, query5...") which is stored + * in resource/queries directory + * @return The query string stored in this file. 
+ * @throws Exception + */ + public static String readQuery(String queryFileName) throws Exception { + String path = "queries/" + queryFileName + ".sql"; + return Resources.toString(Resources.getResource(path), Charsets.UTF_8); + } - // Read the file into stringBuilder. - StringBuilder stringBuilder = new StringBuilder(); - String line; - String ls = System.getProperty("line.separator"); - while ((line = reader.readLine()) != null) { - stringBuilder.append(line); - stringBuilder.append(ls); - } - - // Delete the last new line separator. - stringBuilder.deleteCharAt(stringBuilder.length() - 1); - reader.close(); - - String queryString = stringBuilder.toString(); - - return queryString; - } + /** + * Parse query and get all its identifiers. + * + * @param queryString + * @return Set of SQL query identifiers as strings. + * @throws SqlParseException + */ + public static Set getQueryIdentifiers(String queryString) throws SqlParseException { + SqlParser parser = SqlParser.create(queryString); + SqlNode parsedQuery = parser.parseQuery(); + SqlTransformRunner.SqlIdentifierVisitor sqlVisitor = + new SqlTransformRunner.SqlIdentifierVisitor(); + parsedQuery.accept(sqlVisitor); + return sqlVisitor.getIdentifiers(); + } } diff --git a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/RowToCsv.java b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/RowToCsv.java index 3bae75d02a4a..a0879484356c 100644 --- a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/RowToCsv.java +++ b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/RowToCsv.java @@ -17,6 +17,9 @@ */ package org.apache.beam.sdk.tpcds; +import static org.apache.beam.sdk.extensions.sql.impl.schema.BeamTableUtils.beamRow2CsvLine; + +import java.io.Serializable; import org.apache.beam.sdk.transforms.MapElements; import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.values.PCollection; @@ -24,29 +27,26 @@ import org.apache.beam.sdk.values.TypeDescriptors; import org.apache.commons.csv.CSVFormat; -import java.io.Serializable; - -import static org.apache.beam.sdk.extensions.sql.impl.schema.BeamTableUtils.beamRow2CsvLine; - /** - * A writeConverter class for TextTable that can transform PCollection into PCollection, the format of string is determined by CSVFormat + * A writeConverter class for TextTable that can transform PCollection into + * PCollection, the format of string is determined by CSVFormat. 
*/ public class RowToCsv extends PTransform, PCollection> - implements Serializable { - private CSVFormat csvFormat; + implements Serializable { + private final CSVFormat csvFormat; - public RowToCsv(CSVFormat csvFormat) { - this.csvFormat = csvFormat; - } + public RowToCsv(CSVFormat csvFormat) { + this.csvFormat = csvFormat; + } - public CSVFormat getCsvFormat() { - return csvFormat; - } + public CSVFormat getCsvFormat() { + return csvFormat; + } - @Override - public PCollection expand(PCollection input) { - return input.apply( - "rowToCsv", - MapElements.into(TypeDescriptors.strings()).via(row -> beamRow2CsvLine(row, csvFormat))); - } + @Override + public PCollection expand(PCollection input) { + return input.apply( + "rowToCsv", + MapElements.into(TypeDescriptors.strings()).via(row -> beamRow2CsvLine(row, csvFormat))); + } } diff --git a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/SqlTransformRunner.java b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/SqlTransformRunner.java index a40cd12e9a3a..9d25f6ad46b7 100644 --- a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/SqlTransformRunner.java +++ b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/SqlTransformRunner.java @@ -17,165 +17,295 @@ */ package org.apache.beam.sdk.tpcds; -import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions; +import java.io.IOException; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.HashSet; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.concurrent.CompletionService; +import java.util.concurrent.ExecutorCompletionService; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Executors; +import org.apache.avro.generic.GenericRecord; import org.apache.beam.sdk.Pipeline; import org.apache.beam.sdk.extensions.sql.SqlTransform; -import org.apache.beam.sdk.extensions.sql.impl.BeamSqlPipelineOptions; import org.apache.beam.sdk.extensions.sql.meta.provider.text.TextTable; import org.apache.beam.sdk.io.TextIO; +import org.apache.beam.sdk.io.parquet.ParquetIO; import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.schemas.SchemaCoder; import org.apache.beam.sdk.transforms.MapElements; -import org.apache.beam.sdk.values.*; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.sdk.values.TupleTag; +import org.apache.beam.sdk.values.TypeDescriptors; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.SqlIdentifier; +import org.apache.beam.vendor.calcite.v1_26_0.org.apache.calcite.sql.util.SqlBasicVisitor; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.Resources; import org.apache.commons.csv.CSVFormat; - -import java.util.ArrayList; -import java.util.Arrays; -import java.util.List; -import java.util.Map; -import java.util.concurrent.CompletionService; -import java.util.concurrent.ExecutorCompletionService; -import java.util.concurrent.ExecutorService; -import java.util.concurrent.Executors; import org.slf4j.Logger; import org.slf4j.LoggerFactory; /** - * This class executes jobs using PCollection and SqlTransform, it uses SqlTransform.query to run queries. 
+ * This class executes jobs using PCollection and SqlTransform; it uses SqlTransform.query to run + * queries. + * + *

    TODO: Add tests. */ public class SqlTransformRunner { - private static final String DATA_DIRECTORY = "gs://beamsql_tpcds_1/data"; - private static final String RESULT_DIRECTORY = "gs://beamsql_tpcds_1/tpcds_results"; - private static final String SUMMARY_START = "\n" + "TPC-DS Query Execution Summary:"; - private static final List SUMMARY_HEADERS_LIST = Arrays.asList("Query Name", "Job Name", "Data Size", "Dialect", "Status", "Start Time", "End Time", "Elapsed Time(sec)"); - - private static final Logger Log = LoggerFactory.getLogger(SqlTransform.class); - - /** - * Get all tables (in the form of TextTable) needed for a specific query execution - * @param pipeline The pipeline that will be run to execute the query - * @param csvFormat The csvFormat to construct readConverter (CsvToRow) and writeConverter (RowToCsv) - * @param queryName The name of the query which will be executed (for example: query3, query55, query96) - * @return A PCollectionTuple which is constructed by all tables needed for running query. - * @throws Exception - */ - private static PCollectionTuple getTables(Pipeline pipeline, CSVFormat csvFormat, String queryName) throws Exception { - Map schemaMap = TpcdsSchemas.getTpcdsSchemas(); - TpcdsOptions tpcdsOptions = pipeline.getOptions().as(TpcdsOptions.class); - String dataSize = TpcdsParametersReader.getAndCheckDataSize(tpcdsOptions); - String queryString = QueryReader.readQuery(queryName); - - PCollectionTuple tables = PCollectionTuple.empty(pipeline); - for (Map.Entry tableSchema : schemaMap.entrySet()) { - String tableName = tableSchema.getKey(); - - // Only when queryString contains tableName, the table is relevant to this query and will be added. This can avoid reading unnecessary data files. - if (queryString.contains(tableName)) { - // This is location path where the data are stored - String filePattern = DATA_DIRECTORY + "/" + dataSize + "/" + tableName + ".dat"; - - PCollection table = - new TextTable( - tableSchema.getValue(), - filePattern, - new CsvToRow(tableSchema.getValue(), csvFormat), - new RowToCsv(csvFormat)) - .buildIOReader(pipeline.begin()) - .setCoder(SchemaCoder.of(tableSchema.getValue())) - .setName(tableSchema.getKey()); - - tables = tables.and(new TupleTag<>(tableName), table); + private static final String SUMMARY_START = "\n" + "TPC-DS Query Execution Summary:"; + private static final List SUMMARY_HEADERS_LIST = + Arrays.asList( + "Query Name", + "Job Name", + "Data Size", + "Dialect", + "Status", + "Start Time", + "End Time", + "Elapsed Time(sec)"); + + private static final Logger LOG = LoggerFactory.getLogger(SqlTransformRunner.class); + + /** This class is used to extract all SQL query identifiers. */ + static class SqlIdentifierVisitor extends SqlBasicVisitor { + private final Set identifiers = new HashSet<>(); + + public Set getIdentifiers() { + return identifiers; + } + + @Override + public Void visit(SqlIdentifier id) { + identifiers.addAll(id.names); + return null; + } + } + + /** + * Get all tables (in the form of TextTable) needed for a specific query execution. + * + * @param pipeline The pipeline that will be run to execute the query + * @param csvFormat The csvFormat to construct readConverter (CsvToRow) and writeConverter + * (RowToCsv) + * @param queryName The name of the query which will be executed (for example: query3, query55, + * query96) + * @return A PCollectionTuple which is constructed by all tables needed for running query. 
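The pruning behind this return value comes from QueryReader.getQueryIdentifiers: the query is parsed with the vendored Calcite SqlParser, every SqlIdentifier is collected, and a table is loaded only if its upper-cased name appears in that set (the same identifiers also narrow the Parquet column projection). A minimal sketch of the check follows; the query name is hypothetical and the tpcds classes are assumed to be accessible from the caller.

```java
// A minimal sketch of the identifier-based pruning used by getTables. "query3" is just
// an example; TpcdsSchemas and QueryReader are assumed to be visible to the caller.
import java.util.Map;
import java.util.Set;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.tpcds.QueryReader;
import org.apache.beam.sdk.tpcds.TpcdsSchemas;

public class QueryIdentifierSketch {
  public static void main(String[] args) throws Exception {
    String queryString = QueryReader.readQuery("query3");
    Set<String> identifiers = QueryReader.getQueryIdentifiers(queryString);

    Map<String, Schema> schemaMap = TpcdsSchemas.getTpcdsSchemas();
    for (String tableName : schemaMap.keySet()) {
      // Same comparison getTables uses to decide which data files to read at all.
      if (identifiers.contains(tableName.toUpperCase())) {
        System.out.println("Query references table: " + tableName);
      }
    }
  }
}
```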
+ * @throws Exception + */ + private static PCollectionTuple getTables( + Pipeline pipeline, CSVFormat csvFormat, String queryName) throws Exception { + Map schemaMap = TpcdsSchemas.getTpcdsSchemas(); + TpcdsOptions tpcdsOptions = pipeline.getOptions().as(TpcdsOptions.class); + String dataSize = TpcdsParametersReader.getAndCheckDataSize(tpcdsOptions); + Set identifiers = QueryReader.getQueryIdentifiers(QueryReader.readQuery(queryName)); + + PCollectionTuple tables = PCollectionTuple.empty(pipeline); + for (Map.Entry tableSchema : schemaMap.entrySet()) { + String tableName = tableSchema.getKey(); + + // Only when query identifiers contain tableName, the table is relevant to this query and will + // be added. This can avoid reading unnecessary data files. + if (identifiers.contains(tableName.toUpperCase())) { + Set tableColumns = getTableColumns(identifiers, tableSchema); + + switch (tpcdsOptions.getSourceType()) { + case CSV: + { + PCollection table = + getTableCSV(pipeline, csvFormat, tpcdsOptions, dataSize, tableSchema, tableName); + tables = tables.and(new TupleTag<>(tableName), table); + break; } + case PARQUET: + { + PCollection table = + getTableParquet(pipeline, tpcdsOptions, dataSize, tableName, tableColumns); + tables = tables.and(new TupleTag<>(tableName), table); + break; + } + default: + throw new IllegalStateException( + "Unexpected source type: " + tpcdsOptions.getSourceType()); } - return tables; + } } + return tables; + } - /** - * Print the summary table after all jobs are finished. - * @param completion A collection of all TpcdsRunResult that are from finished jobs. - * @param numOfResults The number of results in the collection. - * @throws Exception - */ - private static void printExecutionSummary(CompletionService completion, int numOfResults) throws Exception { - List> summaryRowsList = new ArrayList<>(); - for (int i = 0; i < numOfResults; i++) { - TpcdsRunResult tpcdsRunResult = completion.take().get(); - List list = new ArrayList<>(); - list.add(tpcdsRunResult.getQueryName()); - list.add(tpcdsRunResult.getJobName()); - list.add(tpcdsRunResult.getDataSize()); - list.add(tpcdsRunResult.getDialect()); - // If the job is not successful, leave the run time related field blank - list.add(tpcdsRunResult.getIsSuccessful() ? "Successful" : "Failed"); - list.add(tpcdsRunResult.getIsSuccessful() ? tpcdsRunResult.getStartDate().toString() : ""); - list.add(tpcdsRunResult.getIsSuccessful() ? tpcdsRunResult.getEndDate().toString(): ""); - list.add(tpcdsRunResult.getIsSuccessful() ? 
Double.toString(tpcdsRunResult.getElapsedTime()) : ""); - summaryRowsList.add(list); - } + private static Set getTableColumns( + Set identifiers, Map.Entry tableSchema) { + Set tableColumns = new HashSet<>(); + List fields = tableSchema.getValue().getFields(); + for (Schema.Field field : fields) { + String fieldName = field.getName(); + if (identifiers.contains(fieldName.toUpperCase())) { + tableColumns.add(fieldName); + } + } + return tableColumns; + } + + private static PCollection getTableParquet( + Pipeline pipeline, + TpcdsOptions tpcdsOptions, + String dataSize, + String tableName, + Set tableColumns) + throws IOException { + org.apache.avro.Schema schema = getAvroSchema(tableName); + org.apache.avro.Schema schemaProjected = getProjectedSchema(tableColumns, schema); + + String filepattern = + tpcdsOptions.getDataDirectory() + "/" + dataSize + "/" + tableName + "/*.parquet"; + + return pipeline.apply( + "Read " + tableName + " (parquet)", + ParquetIO.read(schema) + .from(filepattern) + .withSplit() + .withProjection(schemaProjected, schemaProjected) + .withBeamSchemas(true)); + } + + private static PCollection getTableCSV( + Pipeline pipeline, + CSVFormat csvFormat, + TpcdsOptions tpcdsOptions, + String dataSize, + Map.Entry tableSchema, + String tableName) { + // This is location path where the data are stored + String filePattern = + tpcdsOptions.getDataDirectory() + "/" + dataSize + "/" + tableName + ".dat"; + + return new TextTable( + tableSchema.getValue(), + filePattern, + new CsvToRow(tableSchema.getValue(), csvFormat), + new RowToCsv(csvFormat)) + .buildIOReader(pipeline.begin()) + .setCoder(SchemaCoder.of(tableSchema.getValue())) + .setName(tableSchema.getKey()); + } + + private static org.apache.avro.Schema getAvroSchema(String tableName) throws IOException { + String path = "schemas_avro/" + tableName + ".json"; + return new org.apache.avro.Schema.Parser() + .parse(Resources.toString(Resources.getResource(path), Charsets.UTF_8)); + } - System.out.println(SUMMARY_START); - System.out.println(SummaryGenerator.generateTable(SUMMARY_HEADERS_LIST, summaryRowsList)); + static org.apache.avro.Schema getProjectedSchema( + Set projectedFieldNames, org.apache.avro.Schema schema) { + List projectedFields = new ArrayList<>(); + for (org.apache.avro.Schema.Field f : schema.getFields()) { + if (projectedFieldNames.contains(f.name())) { + projectedFields.add( + new org.apache.avro.Schema.Field(f.name(), f.schema(), f.doc(), f.defaultVal())); + } } + org.apache.avro.Schema schemaProjected = + org.apache.avro.Schema.createRecord(schema.getName() + "_projected", "", "", false); + schemaProjected.setFields(projectedFields); + return schemaProjected; + } - /** - * This is the default method in BeamTpcds.main method. Run job using SqlTranform.query() method. 
- * @param args Command line arguments - * @throws Exception - */ - public static void runUsingSqlTransform(String[] args) throws Exception { - TpcdsOptions tpcdsOptions = PipelineOptionsFactory.fromArgs(args).withValidation().as(TpcdsOptions.class); - - String dataSize = TpcdsParametersReader.getAndCheckDataSize(tpcdsOptions); - String[] queryNameArr = TpcdsParametersReader.getAndCheckQueryNameArray(tpcdsOptions); - int nThreads = TpcdsParametersReader.getAndCheckTpcParallel(tpcdsOptions); - - // Using ExecutorService and CompletionService to fulfill multi-threading functionality - ExecutorService executor = Executors.newFixedThreadPool(nThreads); - CompletionService completion = new ExecutorCompletionService<>(executor); - - // Make an array of pipelines, each pipeline is responsible for running a corresponding query. - Pipeline[] pipelines = new Pipeline[queryNameArr.length]; - CSVFormat csvFormat = CSVFormat.MYSQL.withDelimiter('|').withNullString(""); - - // Execute all queries, transform the each result into a PCollection, write them into the txt file and store in a GCP directory. - for (int i = 0; i < queryNameArr.length; i++) { - // For each query, get a copy of pipelineOptions from command line arguments. - TpcdsOptions tpcdsOptionsCopy = PipelineOptionsFactory.fromArgs(args).withValidation().as(TpcdsOptions.class); - - // Cast tpcdsOptions as a BeamSqlPipelineOptions object to read and set queryPlanner (the default one is Calcite, can change to ZetaSQL). - BeamSqlPipelineOptions beamSqlPipelineOptionsCopy = tpcdsOptionsCopy.as(BeamSqlPipelineOptions.class); - - // Finally, cast BeamSqlPipelineOptions as a DataflowPipelineOptions object to read and set other required pipeline optionsparameters . - DataflowPipelineOptions dataflowPipelineOptionsCopy = beamSqlPipelineOptionsCopy.as(DataflowPipelineOptions.class); - - // Set a unique job name using the time stamp so that multiple different pipelines can run together. - dataflowPipelineOptionsCopy.setJobName(queryNameArr[i] + "result" + System.currentTimeMillis()); - - pipelines[i] = Pipeline.create(dataflowPipelineOptionsCopy); - String queryString = QueryReader.readQuery(queryNameArr[i]); - PCollectionTuple tables = getTables(pipelines[i], csvFormat, queryNameArr[i]); - - try { - tables - .apply( - SqlTransform.query(queryString)) - .apply( - MapElements.into(TypeDescriptors.strings()).via((Row row) -> row.toString())) - .apply(TextIO.write() - .to(RESULT_DIRECTORY + "/" + dataSize + "/" + pipelines[i].getOptions().getJobName()) - .withSuffix(".txt") - .withNumShards(1)); - } catch (Exception e) { - Log.error("{} failed to execute", queryNameArr[i]); - e.printStackTrace(); - } + /** + * Print the summary table after all jobs are finished. + * + * @param completion A collection of all TpcdsRunResult that are from finished jobs. + * @param numOfResults The number of results in the collection. + * @throws Exception + */ + private static void printExecutionSummary( + CompletionService completion, int numOfResults) throws Exception { + List> summaryRowsList = new ArrayList<>(); + for (int i = 0; i < numOfResults; i++) { + TpcdsRunResult tpcdsRunResult = completion.take().get(); + List list = new ArrayList<>(); + list.add(tpcdsRunResult.getQueryName()); + list.add(tpcdsRunResult.getJobName()); + list.add(tpcdsRunResult.getDataSize()); + list.add(tpcdsRunResult.getDialect()); + // If the job is not successful, leave the run time related field blank + list.add(tpcdsRunResult.getIsSuccessful() ? 
"Successful" : "Failed"); + list.add(tpcdsRunResult.getIsSuccessful() ? tpcdsRunResult.getStartDate().toString() : ""); + list.add(tpcdsRunResult.getIsSuccessful() ? tpcdsRunResult.getEndDate().toString() : ""); + list.add( + tpcdsRunResult.getIsSuccessful() ? Double.toString(tpcdsRunResult.getElapsedTime()) : ""); + summaryRowsList.add(list); + } - completion.submit(new TpcdsRun(pipelines[i])); - } + System.out.println(SUMMARY_START); + System.out.println(SummaryGenerator.generateTable(SUMMARY_HEADERS_LIST, summaryRowsList)); + } + + /** + * This is the default method in BeamTpcds.main method. Run job using SqlTranform.query() method. + * + * @param args Command line arguments + * @throws Exception + */ + public static void runUsingSqlTransform(String[] args) throws Exception { + TpcdsOptions tpcdsOptions = + PipelineOptionsFactory.fromArgs(args).withValidation().as(TpcdsOptions.class); + + String dataSize = TpcdsParametersReader.getAndCheckDataSize(tpcdsOptions); + String[] queryNames = TpcdsParametersReader.getAndCheckQueryNames(tpcdsOptions); + int nThreads = TpcdsParametersReader.getAndCheckTpcParallel(tpcdsOptions); + + // Using ExecutorService and CompletionService to fulfill multi-threading functionality + ExecutorService executor = Executors.newFixedThreadPool(nThreads); + CompletionService completion = new ExecutorCompletionService<>(executor); - executor.shutdown(); + // Make an array of pipelines, each pipeline is responsible for running a corresponding query. + Pipeline[] pipelines = new Pipeline[queryNames.length]; + CSVFormat csvFormat = CSVFormat.MYSQL.withDelimiter('|').withNullString(""); - printExecutionSummary(completion, queryNameArr.length); + // Execute all queries, transform each result into a PCollection, write them into + // the txt file and store in a GCP directory. + for (int i = 0; i < queryNames.length; i++) { + // For each query, get a copy of pipelineOptions from command line arguments. + TpcdsOptions tpcdsOptionsCopy = + PipelineOptionsFactory.fromArgs(args).withValidation().as(TpcdsOptions.class); + + // Set a unique job name using the time stamp so that multiple different pipelines can run + // together. 
+ tpcdsOptionsCopy.setJobName(queryNames[i] + "result" + System.currentTimeMillis()); + + pipelines[i] = Pipeline.create(tpcdsOptionsCopy); + String queryString = QueryReader.readQuery(queryNames[i]); + PCollectionTuple tables = getTables(pipelines[i], csvFormat, queryNames[i]); + + try { + tables + .apply(SqlTransform.query(queryString)) + .apply(MapElements.into(TypeDescriptors.strings()).via(Row::toString)) + .apply( + TextIO.write() + .to( + tpcdsOptions.getResultsDirectory() + + "/" + + dataSize + + "/" + + pipelines[i].getOptions().getJobName()) + .withSuffix(".txt") + .withNumShards(1)); + } catch (Exception e) { + LOG.error("{} failed to execute", queryNames[i]); + e.printStackTrace(); + } + + completion.submit(new TpcdsRun(pipelines[i])); } + + executor.shutdown(); + + printExecutionSummary(completion, queryNames.length); + } } diff --git a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/SummaryGenerator.java b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/SummaryGenerator.java index bddb6a8fa6fc..36f7a283b29d 100644 --- a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/SummaryGenerator.java +++ b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/SummaryGenerator.java @@ -20,134 +20,157 @@ import java.util.HashMap; import java.util.List; import java.util.Map; +import java.util.Optional; -/** - * Generate the tpcds queries execution summary on the command line after finishing all jobs. - */ +/** Generate the tpcds queries execution summary on the command line after finishing all jobs. */ public class SummaryGenerator { - private static final int PADDING_SIZE = 2; - private static final String NEW_LINE = "\n"; - private static final String TABLE_JOINT_SYMBOL = "+"; - private static final String TABLE_V_SPLIT_SYMBOL = "|"; - private static final String TABLE_H_SPLIT_SYMBOL = "-"; + private static final int PADDING_SIZE = 2; + private static final String NEW_LINE = "\n"; + private static final String TABLE_JOINT_SYMBOL = "+"; + private static final String TABLE_V_SPLIT_SYMBOL = "|"; + private static final String TABLE_H_SPLIT_SYMBOL = "-"; - public static String generateTable(List headersList, List> rowsList,int... overRiddenHeaderHeight) { - StringBuilder stringBuilder = new StringBuilder(); + public static String generateTable( + List headersList, List> rowsList, int... overRiddenHeaderHeight) { + StringBuilder stringBuilder = new StringBuilder(); - int rowHeight = overRiddenHeaderHeight.length > 0 ? overRiddenHeaderHeight[0] : 1; + int rowHeight = overRiddenHeaderHeight.length > 0 ? 
overRiddenHeaderHeight[0] : 1; - Map columnMaxWidthMapping = getMaximumWidthofTable(headersList, rowsList); + Map columnMaxWidthMapping = getMaximumWidthofTable(headersList, rowsList); - stringBuilder.append(NEW_LINE); - stringBuilder.append(NEW_LINE); - createRowLine(stringBuilder, headersList.size(), columnMaxWidthMapping); - stringBuilder.append(NEW_LINE); + stringBuilder.append(NEW_LINE); + stringBuilder.append(NEW_LINE); + createRowLine(stringBuilder, headersList.size(), columnMaxWidthMapping); + stringBuilder.append(NEW_LINE); - for (int headerIndex = 0; headerIndex < headersList.size(); headerIndex++) { - fillCell(stringBuilder, headersList.get(headerIndex), headerIndex, columnMaxWidthMapping); - } - - stringBuilder.append(NEW_LINE); + for (int headerIndex = 0; headerIndex < headersList.size(); headerIndex++) { + fillCell(stringBuilder, headersList.get(headerIndex), headerIndex, columnMaxWidthMapping); + } - createRowLine(stringBuilder, headersList.size(), columnMaxWidthMapping); + stringBuilder.append(NEW_LINE); - for (List row : rowsList) { - for (int i = 0; i < rowHeight; i++) { - stringBuilder.append(NEW_LINE); - } - for (int cellIndex = 0; cellIndex < row.size(); cellIndex++) { - fillCell(stringBuilder, row.get(cellIndex), cellIndex, columnMaxWidthMapping); - } - } + createRowLine(stringBuilder, headersList.size(), columnMaxWidthMapping); + for (List row : rowsList) { + for (int i = 0; i < rowHeight; i++) { stringBuilder.append(NEW_LINE); - createRowLine(stringBuilder, headersList.size(), columnMaxWidthMapping); - stringBuilder.append(NEW_LINE); - stringBuilder.append(NEW_LINE); - - return stringBuilder.toString(); + } + for (int cellIndex = 0; cellIndex < row.size(); cellIndex++) { + fillCell(stringBuilder, row.get(cellIndex), cellIndex, columnMaxWidthMapping); + } } - private static void fillSpace(StringBuilder stringBuilder, int length) { - for (int i = 0; i < length; i++) { - stringBuilder.append(" "); - } - } + stringBuilder.append(NEW_LINE); + createRowLine(stringBuilder, headersList.size(), columnMaxWidthMapping); + stringBuilder.append(NEW_LINE); + stringBuilder.append(NEW_LINE); - /** Add a rowLine at the beginning, the middle between headersList and rowLists, the end of the summary table. */ - private static void createRowLine(StringBuilder stringBuilder,int headersListSize, Map columnMaxWidthMapping) { - for (int i = 0; i < headersListSize; i++) { - if(i == 0) { - stringBuilder.append(TABLE_JOINT_SYMBOL); - } - - for (int j = 0; j < columnMaxWidthMapping.get(i) + PADDING_SIZE * 2 ; j++) { - stringBuilder.append(TABLE_H_SPLIT_SYMBOL); - } - stringBuilder.append(TABLE_JOINT_SYMBOL); - } - } + return stringBuilder.toString(); + } - /** Get the width of the summary table. */ - private static Map getMaximumWidthofTable(List headersList, List> rowsList) { - Map columnMaxWidthMapping = new HashMap<>(); + private static void fillSpace(StringBuilder stringBuilder, int length) { + for (int i = 0; i < length; i++) { + stringBuilder.append(" "); + } + } + + /** + * Add a rowLine at the beginning, the middle between headersList and rowLists, the end of the + * summary table. 
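generateTable renders a header list plus data rows into a bordered, padded ASCII table. A small illustrative call, with made-up header and row values (not the real SUMMARY_HEADERS_LIST) and assuming the sketch lives in the same package, might look like:

```java
package org.apache.beam.sdk.tpcds;

import java.util.Arrays;
import java.util.List;

/** Illustrative only; header and row values are made up. */
public class SummaryTableSketch {
  public static void main(String[] args) {
    List<String> headers = Arrays.asList("Query", "Job Name", "Data Size", "Dialect", "Status");
    List<List<String>> rows =
        Arrays.asList(
            Arrays.asList("query3", "query3result1617", "1G", "Calcite", "Successful"),
            Arrays.asList("query7", "query7result1618", "1G", "Calcite", "Failed"));

    // Column widths are derived from the longest cell in each column (see
    // getMaximumWidthofTable), then each cell is centered with at least PADDING_SIZE spaces.
    System.out.println(SummaryGenerator.generateTable(headers, rows));
  }
}
```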
+ */ + private static void createRowLine( + StringBuilder stringBuilder, + int headersListSize, + Map columnMaxWidthMapping) { + for (int i = 0; i < headersListSize; i++) { + if (i == 0) { + stringBuilder.append(TABLE_JOINT_SYMBOL); + } + + int columnMaxWidth = Optional.ofNullable(columnMaxWidthMapping.get(i)).orElse(0); + for (int j = 0; j < columnMaxWidth + PADDING_SIZE * 2; j++) { + stringBuilder.append(TABLE_H_SPLIT_SYMBOL); + } + stringBuilder.append(TABLE_JOINT_SYMBOL); + } + } - for (int columnIndex = 0; columnIndex < headersList.size(); columnIndex++) { - columnMaxWidthMapping.put(columnIndex, 0); - } + /** Get the width of the summary table. */ + private static Map getMaximumWidthofTable( + List headersList, List> rowsList) { + Map columnMaxWidthMapping = new HashMap<>(); - for (int columnIndex = 0; columnIndex < headersList.size(); columnIndex++) { - if(headersList.get(columnIndex).length() > columnMaxWidthMapping.get(columnIndex)) { - columnMaxWidthMapping.put(columnIndex, headersList.get(columnIndex).length()); - } - } + for (int columnIndex = 0; columnIndex < headersList.size(); columnIndex++) { + columnMaxWidthMapping.put(columnIndex, 0); + } - for (List row : rowsList) { - for (int columnIndex = 0; columnIndex < row.size(); columnIndex++) { - if(row.get(columnIndex).length() > columnMaxWidthMapping.get(columnIndex)) { - columnMaxWidthMapping.put(columnIndex, row.get(columnIndex).length()); - } - } - } + for (int columnIndex = 0; columnIndex < headersList.size(); columnIndex++) { + Integer columnMaxWidth = + Optional.ofNullable(columnMaxWidthMapping.get(columnIndex)).orElse(0); + if (headersList.get(columnIndex).length() > columnMaxWidth) { + columnMaxWidthMapping.put(columnIndex, headersList.get(columnIndex).length()); + } + } - for (int columnIndex = 0; columnIndex < headersList.size(); columnIndex++) { - if(columnMaxWidthMapping.get(columnIndex) % 2 != 0) { - columnMaxWidthMapping.put(columnIndex, columnMaxWidthMapping.get(columnIndex) + 1); - } + for (List row : rowsList) { + for (int columnIndex = 0; columnIndex < row.size(); columnIndex++) { + Integer columnMaxWidth = + Optional.ofNullable(columnMaxWidthMapping.get(columnIndex)).orElse(0); + if (row.get(columnIndex).length() > columnMaxWidth) { + columnMaxWidthMapping.put(columnIndex, row.get(columnIndex).length()); } - - return columnMaxWidthMapping; + } } - private static int getOptimumCellPadding(int cellIndex,int datalength,Map columnMaxWidthMapping,int cellPaddingSize) { - if(datalength % 2 != 0) { - datalength++; - } + for (int columnIndex = 0; columnIndex < headersList.size(); columnIndex++) { + int columnMaxWidth = Optional.ofNullable(columnMaxWidthMapping.get(columnIndex)).orElse(0); + if (columnMaxWidth % 2 != 0) { + columnMaxWidthMapping.put(columnIndex, columnMaxWidth + 1); + } + } - if(datalength < columnMaxWidthMapping.get(cellIndex)) { - cellPaddingSize = cellPaddingSize + (columnMaxWidthMapping.get(cellIndex) - datalength) / 2; - } + return columnMaxWidthMapping; + } - return cellPaddingSize; + private static int getOptimumCellPadding( + int cellIndex, + int datalength, + Map columnMaxWidthMapping, + int cellPaddingSize) { + if (datalength % 2 != 0) { + datalength++; } - /** Use space to fill a single cell with optimum cell padding size. 
*/ - private static void fillCell(StringBuilder stringBuilder,String cell,int cellIndex,Map columnMaxWidthMapping) { + int columnMaxWidth = Optional.ofNullable(columnMaxWidthMapping.get(cellIndex)).orElse(0); + if (datalength < columnMaxWidth) { + cellPaddingSize = cellPaddingSize + (columnMaxWidth - datalength) / 2; + } - int cellPaddingSize = getOptimumCellPadding(cellIndex, cell.length(), columnMaxWidthMapping, PADDING_SIZE); + return cellPaddingSize; + } - if(cellIndex == 0) { - stringBuilder.append(TABLE_V_SPLIT_SYMBOL); - } + /** Use space to fill a single cell with optimum cell padding size. */ + private static void fillCell( + StringBuilder stringBuilder, + String cell, + int cellIndex, + Map columnMaxWidthMapping) { - fillSpace(stringBuilder, cellPaddingSize); - stringBuilder.append(cell); - if(cell.length() % 2 != 0) { - stringBuilder.append(" "); - } + int cellPaddingSize = + getOptimumCellPadding(cellIndex, cell.length(), columnMaxWidthMapping, PADDING_SIZE); - fillSpace(stringBuilder, cellPaddingSize); + if (cellIndex == 0) { + stringBuilder.append(TABLE_V_SPLIT_SYMBOL); + } - stringBuilder.append(TABLE_V_SPLIT_SYMBOL); + fillSpace(stringBuilder, cellPaddingSize); + stringBuilder.append(cell); + if (cell.length() % 2 != 0) { + stringBuilder.append(" "); } + + fillSpace(stringBuilder, cellPaddingSize); + + stringBuilder.append(TABLE_V_SPLIT_SYMBOL); + } } diff --git a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TableSchemaJSONLoader.java b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TableSchemaJSONLoader.java index 420386cc3af8..3e8cb3cc4694 100644 --- a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TableSchemaJSONLoader.java +++ b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TableSchemaJSONLoader.java @@ -17,96 +17,111 @@ */ package org.apache.beam.sdk.tpcds; -import org.apache.beam.repackaged.core.org.apache.commons.compress.utils.FileNameUtils; -import org.json.simple.JSONArray; -import org.json.simple.JSONObject; -import org.json.simple.parser.JSONParser; - -import java.io.File; -import java.io.FileReader; +import java.io.IOException; +import java.util.ArrayList; import java.util.Iterator; import java.util.List; import java.util.Map; -import java.util.Objects; -import java.util.ArrayList; - +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.Resources; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.reflect.ClassPath; +import org.json.simple.JSONArray; +import org.json.simple.JSONObject; +import org.json.simple.parser.JSONParser; /** - * TableSchemaJSONLoader can get all table's names from resource/schemas directory and parse a table's schema into a string. + * TableSchemaJSONLoader can get all table's names from resource/schemas directory and parse a + * table's schema into a string. */ public class TableSchemaJSONLoader { - /** - * Read a table schema json file from resource/schemas directory, parse the file into a string which can be utilized by BeamSqlEnv.executeDdl method. - * @param tableName The name of the json file to be read (fo example: item, store_sales). 
- * @return A string that matches the format in BeamSqlEnv.executeDdl method, such as "d_date_sk bigint, d_date_id varchar" - * @throws Exception - */ - public static String parseTableSchema(String tableName) throws Exception { - String tableFilePath = Objects.requireNonNull(TableSchemaJSONLoader.class.getClassLoader().getResource("schemas/" + tableName +".json")).getPath(); + /** + * Read a table schema json file from resource/schemas directory, parse the file into a string + * which can be utilized by BeamSqlEnv.executeDdl method. + * + * @param tableName The name of the json file to be read (fo example: item, store_sales). + * @return A string that matches the format in BeamSqlEnv.executeDdl method, such as "d_date_sk + * bigint, d_date_id varchar" + * @throws Exception + */ + // TODO(BEAM-12160): Fix the warning. + @SuppressWarnings({"rawtypes", "DefaultCharset"}) + public static String parseTableSchema(String tableName) throws Exception { + String path = "schemas/" + tableName + ".json"; + String schema = Resources.toString(Resources.getResource(path), Charsets.UTF_8); - JSONObject jsonObject = (JSONObject) new JSONParser().parse(new FileReader(new File(tableFilePath))); - JSONArray jsonArray = (JSONArray) jsonObject.get("schema"); - - // Iterate each element in jsonArray to construct the schema string - StringBuilder schemaStringBuilder = new StringBuilder(); + JSONObject jsonObject = (JSONObject) new JSONParser().parse(schema); + JSONArray jsonArray = (JSONArray) jsonObject.get("schema"); + if (jsonArray == null) { + throw new RuntimeException("Can't get Json array for \"schema\" key."); + } - Iterator jsonArrIterator = jsonArray.iterator(); - Iterator recordIterator; - while (jsonArrIterator.hasNext()) { - recordIterator = ((Map) jsonArrIterator.next()).entrySet().iterator(); - while (recordIterator.hasNext()) { - Map.Entry pair = recordIterator.next(); + // Iterate each element in jsonArray to construct the schema string + StringBuilder schemaStringBuilder = new StringBuilder(); - if (pair.getKey().equals("type")) { - // If the key of the pair is "type", make some modification before appending it to the schemaStringBuilder, then append a comma. - String typeName = (String) pair.getValue(); - if (typeName.toLowerCase().equals("identifier") || typeName.toLowerCase().equals("integer")) { - // Use long type to represent int, prevent overflow - schemaStringBuilder.append("bigint"); - } else if (typeName.contains("decimal")) { - // Currently Beam SQL doesn't handle "decimal" type properly, use "double" to replace it for now. - schemaStringBuilder.append("double"); - } else { - // Currently Beam SQL doesn't handle "date" type properly, use "varchar" replace it for now. - schemaStringBuilder.append("varchar"); - } - schemaStringBuilder.append(','); - } else { - // If the key of the pair is "name", directly append it to the StringBuilder, then append a space. 
- schemaStringBuilder.append((pair.getValue())); - schemaStringBuilder.append(' '); - } - } - } + Iterator jsonArrIterator = jsonArray.iterator(); + Iterator recordIterator; + while (jsonArrIterator.hasNext()) { + recordIterator = ((Map) jsonArrIterator.next()).entrySet().iterator(); + while (recordIterator.hasNext()) { + Map.Entry pair = (Map.Entry) recordIterator.next(); - // Delete the last ',' in schema string - if (schemaStringBuilder.length() > 0) { - schemaStringBuilder.deleteCharAt(schemaStringBuilder.length() - 1); + if (pair.getKey().equals("type")) { + // If the key of the pair is "type", make some modification before appending it to the + // schemaStringBuilder, then append a comma. + String typeName = (String) pair.getValue(); + if (typeName.equalsIgnoreCase("identifier") || typeName.equalsIgnoreCase("integer")) { + // Use long type to represent int, prevent overflow + schemaStringBuilder.append("bigint"); + } else if (typeName.contains("decimal")) { + // Currently Beam SQL doesn't handle "decimal" type properly, use "double" to replace it + // for now. + schemaStringBuilder.append("double"); + } else { + // Currently Beam SQL doesn't handle "date" type properly, use "varchar" replace it for + // now. + schemaStringBuilder.append("varchar"); + } + schemaStringBuilder.append(','); + } else { + // If the key of the pair is "name", directly append it to the StringBuilder, then append + // a space. + schemaStringBuilder.append((pair.getValue())); + schemaStringBuilder.append(' '); } - - String schemaString = schemaStringBuilder.toString(); - - return schemaString; + } } - /** - * Get all tables' names. Tables are stored in resource/schemas directory in the form of json files, such as "item.json", "store_sales.json", they'll be converted to "item", "store_sales". - * @return The list of names of all tables. - */ - public static List getAllTableNames() { - String tableDirPath = Objects.requireNonNull(TableSchemaJSONLoader.class.getClassLoader().getResource("schemas")).getPath(); - File tableDir = new File(tableDirPath); - File[] tableDirListing = tableDir.listFiles(); + // Delete the last ',' in schema string + if (schemaStringBuilder.length() > 0) { + schemaStringBuilder.deleteCharAt(schemaStringBuilder.length() - 1); + } - List tableNames = new ArrayList<>(); + return schemaStringBuilder.toString(); + } - if (tableDirListing != null) { - for (File file : tableDirListing) { - // Remove the .json extension in file name - tableNames.add(FileNameUtils.getBaseName((file.getName()))); - } - } + /** + * Get all tables' names. Tables are stored in resource/schemas directory in the form of json + * files, such as "item.json", "store_sales.json", they'll be converted to "item", "store_sales". + * + * @return The list of names of all tables. 
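To make the javadoc's contract concrete, the sketch below applies parseTableSchema to a hypothetical, abbreviated schemas/date_dim.json; the JSON shape and the expected output string simply follow the mapping rules in the method above.

```java
package org.apache.beam.sdk.tpcds;

/** Sketch only; assumes a resource file schemas/date_dim.json shaped as in the comment. */
public class TableSchemaSketch {
  public static void main(String[] args) throws Exception {
    // Abbreviated, hypothetical contents of schemas/date_dim.json:
    //   {"schema": [{"name": "d_date_sk", "type": "identifier"},
    //               {"name": "d_date_id", "type": "char(16)"}]}
    String ddlColumns = TableSchemaJSONLoader.parseTableSchema("date_dim");

    // "identifier"/"integer" map to bigint, types containing "decimal" map to double,
    // and everything else maps to varchar, so per the javadoc the expected output is
    // a string such as "d_date_sk bigint,d_date_id varchar".
    System.out.println(ddlColumns);
  }
}
```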
+ */ + public static List getAllTableNames() throws IOException { + ClassLoader classLoader = TableSchemaJSONLoader.class.getClassLoader(); + if (classLoader == null) { + throw new RuntimeException("Can't get classloader from TableSchemaJSONLoader."); + } + ClassPath classPath = ClassPath.from(classLoader); - return tableNames; + List tableNames = new ArrayList<>(); + for (ClassPath.ResourceInfo resourceInfo : classPath.getResources()) { + String resourceName = resourceInfo.getResourceName(); + if (resourceName.startsWith("schemas/")) { + String tableName = + resourceName.substring("schemas/".length(), resourceName.length() - ".json".length()); + tableNames.add(tableName); + } } + + return tableNames; + } } diff --git a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TpcdsOptions.java b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TpcdsOptions.java index 1c567ddf339d..30159991db7a 100644 --- a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TpcdsOptions.java +++ b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TpcdsOptions.java @@ -17,26 +17,48 @@ */ package org.apache.beam.sdk.tpcds; +import org.apache.beam.sdk.extensions.sql.impl.BeamSqlPipelineOptions; import org.apache.beam.sdk.options.Default; import org.apache.beam.sdk.options.Description; -import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.options.Validation; -/** Options used to configure TPC-DS test */ -public interface TpcdsOptions extends PipelineOptions { - @Description("The size of TPC-DS data to run query on, user input should contain the unit, such as '1G', '10G'") - String getDataSize(); +/** Options used to configure TPC-DS test. */ +public interface TpcdsOptions extends BeamSqlPipelineOptions { + @Description( + "The size of TPC-DS data to run query on, user input should contain the unit, such as '1G', '10G'") + @Validation.Required + String getDataSize(); - void setDataSize(String dataSize); + void setDataSize(String dataSize); - // Set the return type to be String since reading from the command line (user input will be like "1,2,55" which represent TPC-DS query1, query3, query55) - @Description("The queries numbers, read user input as string, numbers separated by commas") - String getQueries(); + // Set the return type to be String since reading from the command line (user input will be like + // "1,2,55" which represent TPC-DS query1, query3, query55) + @Description("The queries numbers, read user input as string, numbers separated by commas") + String getQueries(); - void setQueries(String queries); + void setQueries(String queries); - @Description("The number of queries to run in parallel") - @Default.Integer(1) - Integer getTpcParallel(); + @Description("The number of queries to run in parallel") + @Default.Integer(1) + Integer getTpcParallel(); - void setTpcParallel(Integer parallelism); + void setTpcParallel(Integer parallelism); + + @Description("The path to input data directory") + @Validation.Required + String getDataDirectory(); + + void setDataDirectory(String path); + + @Description("The path to directory with results") + @Validation.Required + String getResultsDirectory(); + + void setResultsDirectory(String path); + + @Description("Where the data comes from.") + @Default.Enum("CSV") + TpcdsUtils.SourceType getSourceType(); + + void setSourceType(TpcdsUtils.SourceType sourceType); } diff --git a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TpcdsOptionsRegistrar.java 
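Since dataSize, dataDirectory and resultsDirectory are now @Validation.Required, a run must supply them on the command line. A hedged sketch of building TpcdsOptions from such arguments (all values and gs:// paths are placeholders) is:

```java
package org.apache.beam.sdk.tpcds;

import org.apache.beam.sdk.options.PipelineOptionsFactory;

/** Sketch only; the argument values and gs:// paths are placeholders. */
public class TpcdsOptionsSketch {
  public static void main(String[] args) {
    String[] sampleArgs = {
      "--dataSize=1G",
      "--queries=1,2,55",
      "--tpcParallel=2",
      "--dataDirectory=gs://example-bucket/tpcds/data",
      "--resultsDirectory=gs://example-bucket/tpcds/results"
    };

    // withValidation() enforces the @Validation.Required options declared above.
    TpcdsOptions options =
        PipelineOptionsFactory.fromArgs(sampleArgs).withValidation().as(TpcdsOptions.class);

    System.out.println(options.getDataSize() + " -> " + options.getResultsDirectory());
  }
}
```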
b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TpcdsOptionsRegistrar.java index d1ddc9ddb794..fc596677d268 100644 --- a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TpcdsOptionsRegistrar.java +++ b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TpcdsOptionsRegistrar.java @@ -20,14 +20,14 @@ import com.google.auto.service.AutoService; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.options.PipelineOptionsRegistrar; -import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; /** {@link AutoService} registrar for {@link TpcdsOptions}. */ @AutoService(PipelineOptionsRegistrar.class) -public class TpcdsOptionsRegistrar implements PipelineOptionsRegistrar{ +public class TpcdsOptionsRegistrar implements PipelineOptionsRegistrar { - @Override - public Iterable> getPipelineOptions() { - return ImmutableList.of(TpcdsOptions.class); - } + @Override + public Iterable> getPipelineOptions() { + return ImmutableList.of(TpcdsOptions.class); + } } diff --git a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TpcdsParametersReader.java b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TpcdsParametersReader.java index c2813416d74d..4a2ed544c96a 100644 --- a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TpcdsParametersReader.java +++ b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TpcdsParametersReader.java @@ -17,91 +17,107 @@ */ package org.apache.beam.sdk.tpcds; +import java.util.ArrayList; +import java.util.Arrays; import java.util.HashSet; +import java.util.List; import java.util.Set; import java.util.stream.Collectors; import java.util.stream.Stream; -/** - * Get and check the TpcdsOptions' parameters, throw exceptions when user input is invalid - */ +/** Get and check the TpcdsOptions' parameters, throw exceptions when user input is invalid. */ public class TpcdsParametersReader { - /** The data sizes that have been supported. */ - private static final Set supportedDataSizes = Stream.of("1G", "10G", "100G").collect(Collectors.toCollection(HashSet::new)); - /** - * Get and check dataSize entered by user. This dataSize has to have been supported. - * - * @param tpcdsOptions TpcdsOptions object constructed from user input - * @return The dateSize user entered, if it is contained in supportedDataSizes set. - * @throws Exception - */ - public static String getAndCheckDataSize(TpcdsOptions tpcdsOptions) throws Exception { - String dataSize = tpcdsOptions.getDataSize(); + /** The data sizes that have been supported. */ + private static final Set supportedDataSizes = + Stream.of("1G", "10G", "100G").collect(Collectors.toCollection(HashSet::new)); - if (!supportedDataSizes.contains(dataSize)) { - throw new Exception("The data size you entered has not been supported."); - } + private static final String QUERY_PREFIX = "query"; - return dataSize; - } + public static final List ALL_QUERY_NAMES = getAllQueryNames(); - /** - * Get and check queries entered by user. This has to be a string of numbers separated by commas or "all" which means run all 99 queiries. - * All query numbers have to be between 1 and 99. 
- * - * @param tpcdsOptions TpcdsOptions object constructed from user input - * @return An array of query names, for example "1,2,7" will be output as "query1,query2,query7" - * @throws Exception - */ - public static String[] getAndCheckQueryNameArray(TpcdsOptions tpcdsOptions) throws Exception { - String queryNums = tpcdsOptions.getQueries(); + /** + * Get and check dataSize entered by user. This dataSize has to have been supported. + * + * @param tpcdsOptions TpcdsOptions object constructed from user input + * @return The dateSize user entered, if it is contained in supportedDataSizes set. + * @throws Exception + */ + public static String getAndCheckDataSize(TpcdsOptions tpcdsOptions) throws Exception { + String dataSize = tpcdsOptions.getDataSize(); + + if (!supportedDataSizes.contains(dataSize)) { + throw new Exception("The data size you entered has not been supported."); + } - String[] queryNumArr; - if (queryNums.toLowerCase().equals("all")) { - // All 99 TPC-DS queries need to be executed. - queryNumArr = new String[99]; - for (int i = 0; i < 99; i++) { - queryNumArr[i] = Integer.toString(i + 1); - } - } else { - // Split user input queryNums by spaces and commas, get an array of all query numbers. - queryNumArr = queryNums.split("[\\s,]+"); + return dataSize; + } - for (String queryNumStr : queryNumArr) { - try { - int queryNum = Integer.parseInt(queryNumStr); - if (queryNum < 1 || queryNum > 99) { - throw new Exception("The queries you entered contains invalid query number, please provide integers between 1 and 99."); - } - } catch (NumberFormatException e) { - System.out.println("The queries you entered should be integers, please provide integers between 1 and 99."); + /** + * Get and check queries entered by user. This has to be a string of numbers separated by commas + * or "all" which means run all 99 queries. All query numbers have to be between 1 and 99. Some + * queries (14, 23, 24 and 39) may contain suffixes "a" or "b", e.g. "14a" and "14b". + * + * @param tpcdsOptions TpcdsOptions object constructed from user input + * @return An array of query names, for example "1,2,14b" will be output as + * "query1,query2,query14b" + * @throws Exception + */ + public static String[] getAndCheckQueryNames(TpcdsOptions tpcdsOptions) { + if (tpcdsOptions.getQueries().equalsIgnoreCase("all")) { + // All TPC-DS queries need to be executed. + return ALL_QUERY_NAMES.toArray(new String[0]); + } else { + List queries = new ArrayList<>(); + Arrays.stream(tpcdsOptions.getQueries().split("[\\s,]+", -1)) + .map(s -> QUERY_PREFIX + s) + .forEach( + query -> { + if (!ALL_QUERY_NAMES.contains(query)) { + throw new IllegalArgumentException( + "The query \"" + query + "\" is not supported."); } - } - } + queries.add(query); + }); + return queries.toArray(new String[0]); + } + } - String[] queryNameArr = new String[queryNumArr.length]; - for (int i = 0; i < queryNumArr.length; i++) { - queryNameArr[i] = "query" + queryNumArr[i]; - } + private static List getAllQueryNames() { + List queries = new ArrayList<>(); - return queryNameArr; + for (int i = 1; i <= 99; i++) { + switch (i) { + case 14: + case 23: + case 24: + case 39: + queries.add(QUERY_PREFIX + i + "a"); + queries.add(QUERY_PREFIX + i + "b"); + break; + default: + queries.add(QUERY_PREFIX + i); + break; + } } + return queries; + } - /** - * Get and check TpcParallel entered by user. This has to be an integer between 1 and 99. - * - * @param tpcdsOptions TpcdsOptions object constructed from user input. - * @return The TpcParallel user entered. 
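A quick illustration of the query-name expansion above; the option values are placeholders, and the point is only that "1,2,14b" expands to query1, query2 and query14b while an unsupported number is rejected:

```java
package org.apache.beam.sdk.tpcds;

import java.util.Arrays;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

/** Sketch only; option values are placeholders. */
public class QueryNamesSketch {
  public static void main(String[] args) {
    TpcdsOptions options =
        PipelineOptionsFactory.fromArgs(
                "--dataSize=1G",
                "--queries=1,2,14b",
                "--dataDirectory=/tmp/tpcds/data",
                "--resultsDirectory=/tmp/tpcds/results")
            .withValidation()
            .as(TpcdsOptions.class);

    // Prints [query1, query2, query14b]; a number outside the supported set
    // (for example "100") would throw IllegalArgumentException instead.
    String[] queryNames = TpcdsParametersReader.getAndCheckQueryNames(options);
    System.out.println(Arrays.toString(queryNames));
  }
}
```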
- * @throws Exception - */ - public static int getAndCheckTpcParallel(TpcdsOptions tpcdsOptions) throws Exception { - int nThreads = tpcdsOptions.getTpcParallel(); - - if (nThreads < 1 || nThreads > 99) { - throw new Exception("The TpcParallel your entered is invalid, please provide an integer between 1 and 99."); - } + /** + * Get and check TpcParallel entered by user. This has to be an integer between 1 and 99. + * + * @param tpcdsOptions TpcdsOptions object constructed from user input. + * @return The TpcParallel user entered. + * @throws Exception + */ + public static int getAndCheckTpcParallel(TpcdsOptions tpcdsOptions) throws Exception { + int nThreads = tpcdsOptions.getTpcParallel(); - return nThreads; + if (nThreads < 1 || nThreads > 99) { + throw new Exception( + "The TpcParallel your entered is invalid, please provide an integer between 1 and 99."); } + + return nThreads; + } } diff --git a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TpcdsRun.java b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TpcdsRun.java index 1070a8891921..4b80aa80aec0 100644 --- a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TpcdsRun.java +++ b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TpcdsRun.java @@ -17,40 +17,42 @@ */ package org.apache.beam.sdk.tpcds; +import java.util.concurrent.Callable; import org.apache.beam.sdk.Pipeline; import org.apache.beam.sdk.PipelineResult; import org.apache.beam.sdk.PipelineResult.State; -import java.util.concurrent.Callable; -/** - * To fulfill multi-threaded execution - */ +/** To fulfill multi-threaded execution. */ public class TpcdsRun implements Callable { - private final Pipeline pipeline; + private final Pipeline pipeline; - public TpcdsRun (Pipeline pipeline) { - this.pipeline = pipeline; - } - - @Override - public TpcdsRunResult call() { - TpcdsRunResult tpcdsRunResult; + public TpcdsRun(Pipeline pipeline) { + this.pipeline = pipeline; + } - try { - PipelineResult pipelineResult = pipeline.run(); - long startTimeStamp = System.currentTimeMillis(); - State state = pipelineResult.waitUntilFinish(); - long endTimeStamp = System.currentTimeMillis(); + @Override + public TpcdsRunResult call() { + TpcdsRunResult tpcdsRunResult; - // Make sure to set the job status to be successful only when pipelineResult's final state is DONE. - boolean isSuccessful = state == State.DONE; - tpcdsRunResult = new TpcdsRunResult(isSuccessful, startTimeStamp, endTimeStamp, pipeline.getOptions(), pipelineResult); - } catch (Exception e) { - // If the pipeline execution failed, return a result with failed status but don't interrupt other threads. - e.printStackTrace(); - tpcdsRunResult = new TpcdsRunResult(false, 0, 0, pipeline.getOptions(), null); - } + try { + PipelineResult pipelineResult = pipeline.run(); + long startTimeStamp = System.currentTimeMillis(); + State state = pipelineResult.waitUntilFinish(); + long endTimeStamp = System.currentTimeMillis(); - return tpcdsRunResult; + // Make sure to set the job status to be successful only when pipelineResult's final state is + // DONE. + boolean isSuccessful = state == State.DONE; + tpcdsRunResult = + new TpcdsRunResult( + isSuccessful, startTimeStamp, endTimeStamp, pipeline.getOptions(), pipelineResult); + } catch (Exception e) { + // If the pipeline execution failed, return a result with failed status but don't interrupt + // other threads. 
+ e.printStackTrace(); + tpcdsRunResult = new TpcdsRunResult(false, 0, 0, pipeline.getOptions(), null); } + + return tpcdsRunResult; + } } diff --git a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TpcdsRunResult.java b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TpcdsRunResult.java index 0e22dced6603..55fdac1e177d 100644 --- a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TpcdsRunResult.java +++ b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TpcdsRunResult.java @@ -17,76 +17,86 @@ */ package org.apache.beam.sdk.tpcds; +import java.sql.Timestamp; +import java.util.Date; import org.apache.beam.sdk.PipelineResult; import org.apache.beam.sdk.extensions.sql.impl.BeamSqlPipelineOptions; import org.apache.beam.sdk.options.PipelineOptions; -import java.sql.Timestamp; -import java.util.Date; - +import org.checkerframework.checker.nullness.qual.Nullable; public class TpcdsRunResult { - private boolean isSuccessful; - private long startTime; - private long endTime; - private PipelineOptions pipelineOptions; - private PipelineResult pipelineResult; + private final boolean isSuccessful; + private final long startTime; + private final long endTime; + private final PipelineOptions pipelineOptions; + private final @Nullable PipelineResult pipelineResult; - public TpcdsRunResult(boolean isSuccessful, long startTime, long endTime, PipelineOptions pipelineOptions, PipelineResult pipelineResult) { - this.isSuccessful = isSuccessful; - this.startTime = startTime; - this.endTime = endTime; - this.pipelineOptions = pipelineOptions; - this.pipelineResult = pipelineResult; - } + public TpcdsRunResult( + boolean isSuccessful, + long startTime, + long endTime, + PipelineOptions pipelineOptions, + @Nullable PipelineResult pipelineResult) { + this.isSuccessful = isSuccessful; + this.startTime = startTime; + this.endTime = endTime; + this.pipelineOptions = pipelineOptions; + this.pipelineResult = pipelineResult; + } - public boolean getIsSuccessful() { return isSuccessful; } + public boolean getIsSuccessful() { + return isSuccessful; + } - public Date getStartDate() { - Timestamp startTimeStamp = new Timestamp(startTime); - Date startDate = new Date(startTimeStamp.getTime()); - return startDate; - } + public Date getStartDate() { + Timestamp startTimeStamp = new Timestamp(startTime); + return new Date(startTimeStamp.getTime()); + } - public Date getEndDate() { - Timestamp endTimeStamp = new Timestamp(endTime); - Date endDate = new Date(endTimeStamp.getTime()); - return endDate; - } + public Date getEndDate() { + Timestamp endTimeStamp = new Timestamp(endTime); + return new Date(endTimeStamp.getTime()); + } - public double getElapsedTime() { - return (endTime - startTime) / 1000.0; - } + public double getElapsedTime() { + return (endTime - startTime) / 1000.0; + } - public PipelineOptions getPipelineOptions() { return pipelineOptions; } + public PipelineOptions getPipelineOptions() { + return pipelineOptions; + } - public PipelineResult getPipelineResult() { return pipelineResult; } + public @Nullable PipelineResult getPipelineResult() { + return pipelineResult; + } - public String getJobName() { - PipelineOptions pipelineOptions = getPipelineOptions(); - return pipelineOptions.getJobName(); - } + public String getJobName() { + PipelineOptions pipelineOptions = getPipelineOptions(); + return pipelineOptions.getJobName(); + } - public String getQueryName() { - String jobName = getJobName(); - int endIndex = jobName.indexOf("result"); - String 
queryName = jobName.substring(0, endIndex); - return queryName; - } + public String getQueryName() { + String jobName = getJobName(); + int endIndex = jobName.indexOf("result"); + return jobName.substring(0, endIndex); + } - public String getDataSize() throws Exception { - PipelineOptions pipelineOptions = getPipelineOptions(); - return TpcdsParametersReader.getAndCheckDataSize(pipelineOptions.as(TpcdsOptions.class)); - } + public String getDataSize() throws Exception { + PipelineOptions pipelineOptions = getPipelineOptions(); + return TpcdsParametersReader.getAndCheckDataSize(pipelineOptions.as(TpcdsOptions.class)); + } - public String getDialect() throws Exception { - PipelineOptions pipelineOptions = getPipelineOptions(); - String queryPlannerClassName = pipelineOptions.as(BeamSqlPipelineOptions.class).getPlannerName(); - String dialect; - if (queryPlannerClassName.equals("org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner")) { - dialect = "ZetaSQL"; - } else { - dialect = "Calcite"; - } - return dialect; + public String getDialect() { + PipelineOptions pipelineOptions = getPipelineOptions(); + String queryPlannerClassName = + pipelineOptions.as(BeamSqlPipelineOptions.class).getPlannerName(); + String dialect; + if (queryPlannerClassName.equals( + "org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner")) { + dialect = "ZetaSQL"; + } else { + dialect = "Calcite"; } + return dialect; + } } diff --git a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TpcdsSchemas.java b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TpcdsSchemas.java index b776551b3379..6dc00af1f169 100644 --- a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TpcdsSchemas.java +++ b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TpcdsSchemas.java @@ -17,656 +17,705 @@ */ package org.apache.beam.sdk.tpcds; -import org.apache.beam.sdk.schemas.Schema; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; - import java.util.HashMap; import java.util.List; import java.util.Map; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; public class TpcdsSchemas { - /** - * Get all tpcds table schemas automatically by reading json files. - * In this case all field will be nullable, this is a bit different from the tpcds specification, but doesn't affect query execution. - * - * @return A map of all tpcds table schemas with their table names as keys. 
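Because each job is named queryN + "result" + timestamp, getQueryName and getElapsedTime can be exercised without running any pipeline; a tiny sketch with made-up timestamps (the accessors used by the summary table need only the options and the two timestamps) is:

```java
package org.apache.beam.sdk.tpcds;

import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

/** Sketch only; the job name and timestamps are made up and no pipeline is run. */
public class RunResultSketch {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.create();
    options.setJobName("query3result1617000000000");

    // A null PipelineResult is allowed by the constructor; it is not needed here.
    TpcdsRunResult result =
        new TpcdsRunResult(true, 1_617_000_000_000L, 1_617_000_030_000L, options, null);

    System.out.println(result.getQueryName()); // query3
    System.out.println(result.getElapsedTime()); // 30.0 (seconds)
  }
}
```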
- * @throws Exception - */ - public static Map getTpcdsSchemas() throws Exception { - Map schemaMap = new HashMap<>(); - - List tableNames = TableSchemaJSONLoader.getAllTableNames(); - for (String tableName : tableNames) { - Schema.Builder schemaBuilder = Schema.builder(); - - String tableSchemaString = TableSchemaJSONLoader.parseTableSchema(tableName); - String[] nameTypePairs = tableSchemaString.split(","); - - for (String nameTypePairString : nameTypePairs) { - String[] nameTypePair = nameTypePairString.split("\\s+"); - String name = nameTypePair[0]; - String type = nameTypePair[1]; - - Schema.FieldType fieldType; - if (type.equals("bigint")) { - fieldType = Schema.FieldType.INT64; - } else if (type.equals("double")) { - fieldType = Schema.FieldType.DOUBLE; - } else { - fieldType = Schema.FieldType.STRING; - } - - schemaBuilder.addNullableField(name, fieldType); - } - - Schema tableSchema = schemaBuilder.build(); - schemaMap.put(tableName,tableSchema); + /** + * Get all tpcds table schemas automatically by reading json files. In this case all field will be + * nullable, this is a bit different from the tpcds specification, but doesn't affect query + * execution. + * + * @return A map of all tpcds table schemas with their table names as keys. + * @throws Exception + */ + // TODO(BEAM-12160): Fix the warning. + @SuppressWarnings("StringSplitter") + public static Map getTpcdsSchemas() throws Exception { + Map schemaMap = new HashMap<>(); + + List tableNames = TableSchemaJSONLoader.getAllTableNames(); + for (String tableName : tableNames) { + Schema.Builder schemaBuilder = Schema.builder(); + + String tableSchemaString = TableSchemaJSONLoader.parseTableSchema(tableName); + String[] nameTypePairs = tableSchemaString.split(","); + + for (String nameTypePairString : nameTypePairs) { + String[] nameTypePair = nameTypePairString.split("\\s+"); + String name = nameTypePair[0]; + String type = nameTypePair[1]; + + Schema.FieldType fieldType; + if (type.equals("bigint")) { + fieldType = Schema.FieldType.INT64; + } else if (type.equals("double")) { + fieldType = Schema.FieldType.DOUBLE; + } else { + fieldType = Schema.FieldType.STRING; } - return schemaMap; - } - /** - * Get all tpcds table schemas according to tpcds official specification. Some fields are set to be not nullable. - * - * @return A map of all tpcds table schemas with their table names as keys. - */ - public static Map getTpcdsSchemasImmutableMap() { - ImmutableMap immutableSchemaMap = - ImmutableMap. 
builder() - .put("call_center", callCenterSchema) - .put("catalog_page", catalogPageSchema) - .put("catalog_returns", catalogReturnsSchema) - .put("catalog_sales", catalogSalesSchema) - .put("customer", customerSchema) - .put("customer_address", customerAddressSchema) - .put("customer_demographics", customerDemographicsSchema) - .put("date_dim", dateDimSchema) - .put("household_demographics", householdDemographicsSchema) - .put("income_band", incomeBandSchema) - .put("inventory", inventorySchema) - .put("item", itemSchema) - .put("promotion", promotionSchema) - .put("reason", reasonSchema) - .put("ship_mode", shipModeSchema) - .put("store", storeSchema) - .put("store_returns", storeReturnsSchema) - .put("store_sales", storeSalesSchema) - .put("time_dim", timeDimSchema) - .put("warehouse", warehouseSchema) - .put("web_page", webPageSchema) - .put("web_returns", webReturnsSchema) - .put("web_sales", webSalesSchema) - .put("web_site", webSiteSchema) - .build(); - return immutableSchemaMap; - } + schemaBuilder.addNullableField(name, fieldType); + } - public static Schema getCallCenterSchema() { return callCenterSchema; } - - public static Schema getCatalogPageSchema() { return catalogPageSchema; } - - public static Schema getCatalogReturnsSchema() { return catalogReturnsSchema; } - - public static Schema getCatalogSalesSchema() { return catalogSalesSchema; } - - public static Schema getCustomerSchema() { return customerSchema; } - - public static Schema getCustomerAddressSchema() { return customerAddressSchema; } - - public static Schema getCustomerDemographicsSchema() { return customerDemographicsSchema; } - - public static Schema getDateDimSchema() { return dateDimSchema; } - - public static Schema getHouseholdDemographicsSchema() { return householdDemographicsSchema; } - - public static Schema getIncomeBandSchema() { return incomeBandSchema; } - - public static Schema getInventorySchema() { return inventorySchema; } - - public static Schema getItemSchema() { return itemSchema; } - - public static Schema getPromotionSchema() { return promotionSchema; } - - public static Schema getReasonSchema() { return reasonSchema; } - - public static Schema getShipModeSchema() { return shipModeSchema; } - - public static Schema getStoreSchema() { return storeSchema; } - - public static Schema getStoreReturnsSchema() { return storeReturnsSchema; } - - public static Schema getStoreSalesSchema() { return storeSalesSchema; } - - public static Schema getTimeDimSchema() { return timeDimSchema; } - - public static Schema getWarehouseSchema() { return warehouseSchema; } - - public static Schema getWebpageSchema() { return webPageSchema; } - - public static Schema getWebReturnsSchema() { return webReturnsSchema; } - - public static Schema getWebSalesSchema() { return webSalesSchema; } - - public static Schema getWebSiteSchema() { return webSiteSchema; } - - private static Schema callCenterSchema = - Schema.builder() - .addField("cc_call_center_sk", Schema.FieldType.INT64) - .addField("cc_call_center_id", Schema.FieldType.STRING) - .addNullableField("cc_rec_start_date", Schema.FieldType.STRING) - .addNullableField("cc_rec_end_date", Schema.FieldType.STRING) - .addNullableField("cc_closed_date_sk", Schema.FieldType.INT64) - .addNullableField("cc_open_date_sk", Schema.FieldType.INT64) - .addNullableField("cc_name", Schema.FieldType.STRING) - .addNullableField("cc_class", Schema.FieldType.STRING) - .addNullableField("cc_employees", Schema.FieldType.INT64) - .addNullableField("cc_sq_ft", Schema.FieldType.INT64) - 
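The spec-aligned map built above can be consulted directly; a hypothetical lookup of the store_sales schema (field count and nullability stated per the definitions further below) looks like:

```java
package org.apache.beam.sdk.tpcds;

import java.util.Map;
import org.apache.beam.sdk.schemas.Schema;

/** Sketch only; demonstrates looking up one of the hand-written schemas below. */
public class SchemasSketch {
  public static void main(String[] args) {
    Map<String, Schema> schemas = TpcdsSchemas.getTpcdsSchemasImmutableMap();

    // store_sales is defined further down with 23 fields; ss_item_sk and
    // ss_ticket_number are the only non-nullable ones, matching the TPC-DS spec.
    Schema storeSales = schemas.get("store_sales");
    System.out.println(storeSales.getFieldCount());
    System.out.println(storeSales.getField("ss_item_sk").getType().getNullable());
  }
}
```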
.addNullableField("cc_hours", Schema.FieldType.STRING) - .addNullableField("cc_manager", Schema.FieldType.STRING) - .addNullableField("cc_mkt_id", Schema.FieldType.INT64) - .addNullableField("cc_mkt_class", Schema.FieldType.STRING) - .addNullableField("cc_mkt_desc", Schema.FieldType.STRING) - .addNullableField("cc_market_manager", Schema.FieldType.STRING) - .addNullableField("cc_division", Schema.FieldType.INT64) - .addNullableField("cc_division_name", Schema.FieldType.STRING) - .addNullableField("cc_company", Schema.FieldType.INT64) - .addNullableField("cc_company_name", Schema.FieldType.STRING) - .addNullableField("cc_street_number", Schema.FieldType.STRING) - .addNullableField("cc_street_name", Schema.FieldType.STRING) - .addNullableField("cc_street_type", Schema.FieldType.STRING) - .addNullableField("cc_suite_number", Schema.FieldType.STRING) - .addNullableField("cc_city", Schema.FieldType.STRING) - .addNullableField("cc_county", Schema.FieldType.STRING) - .addNullableField("cc_state", Schema.FieldType.STRING) - .addNullableField("cc_zip", Schema.FieldType.STRING) - .addNullableField("cc_country", Schema.FieldType.STRING) - .addNullableField("cc_gmt_offset", Schema.FieldType.DOUBLE) - .addNullableField("cc_tax_percentage", Schema.FieldType.DOUBLE) - .build(); - - private static Schema catalogPageSchema = - Schema.builder() - .addField("cp_catalog_page_sk", Schema.FieldType.INT64) - .addField("cp_catalog_page_id", Schema.FieldType.STRING) - .addNullableField("cp_start_date_sk", Schema.FieldType.INT64) - .addNullableField("cp_end_date_sk", Schema.FieldType.INT64) - .addNullableField("cp_department", Schema.FieldType.STRING) - .addNullableField("cp_catalog_number", Schema.FieldType.INT64) - .addNullableField("cp_catalog_page_number", Schema.FieldType.INT64) - .addNullableField("cp_description", Schema.FieldType.STRING) - .addNullableField("cp_type", Schema.FieldType.STRING) - .build(); - - private static Schema catalogReturnsSchema = - Schema.builder() - .addNullableField("cr_returned_date_sk", Schema.FieldType.INT64) - .addNullableField("cr_returned_time_sk", Schema.FieldType.INT64) - .addField("cr_item_sk", Schema.FieldType.INT64) - .addNullableField("cr_refunded_customer_sk", Schema.FieldType.INT64) - .addNullableField("cr_refunded_cdemo_sk", Schema.FieldType.INT64) - .addNullableField("cr_refunded_hdemo_sk", Schema.FieldType.INT64) - .addNullableField("cr_refunded_addr_sk", Schema.FieldType.INT64) - .addNullableField("cr_returning_customer_sk", Schema.FieldType.INT64) - .addNullableField("cr_returning_cdemo_sk", Schema.FieldType.INT64) - .addNullableField("cr_returning_hdemo_sk", Schema.FieldType.INT64) - .addNullableField("cr_returning_addr_sk", Schema.FieldType.INT64) - .addNullableField("cr_call_center_sk", Schema.FieldType.INT64) - .addNullableField("cr_catalog_page_sk", Schema.FieldType.INT64) - .addNullableField("cr_ship_mode_sk", Schema.FieldType.INT64) - .addNullableField("cr_warehouse_sk", Schema.FieldType.INT64) - .addNullableField("cr_reason_sk", Schema.FieldType.INT64) - .addField("cr_order_number", Schema.FieldType.INT64) - .addNullableField("cr_return_quantity", Schema.FieldType.INT64) - .addNullableField("cr_return_amount", Schema.FieldType.DOUBLE) - .addNullableField("cr_return_tax", Schema.FieldType.DOUBLE) - .addNullableField("cr_return_amt_inc_tax", Schema.FieldType.DOUBLE) - .addNullableField("cr_fee", Schema.FieldType.DOUBLE) - .addNullableField("cr_return_ship_cost", Schema.FieldType.DOUBLE) - .addNullableField("cr_refunded_cash", Schema.FieldType.DOUBLE) - 
.addNullableField("cr_reversed_charge", Schema.FieldType.DOUBLE) - .addNullableField("cr_store_credit", Schema.FieldType.DOUBLE) - .addNullableField("cr_net_loss", Schema.FieldType.DOUBLE) - .build(); - - private static Schema catalogSalesSchema = - Schema.builder() - .addNullableField("cs_sold_date_sk", Schema.FieldType.INT64) - .addNullableField("cs_sold_time_sk", Schema.FieldType.INT64) - .addNullableField("cs_ship_date_sk", Schema.FieldType.INT64) - .addNullableField("cs_bill_customer_sk", Schema.FieldType.INT64) - .addNullableField("cs_bill_cdemo_sk", Schema.FieldType.INT64) - .addNullableField("cs_bill_hdemo_sk", Schema.FieldType.INT64) - .addNullableField("cs_bill_addr_sk", Schema.FieldType.INT64) - .addNullableField("cs_ship_customer_sk", Schema.FieldType.INT64) - .addNullableField("cs_ship_cdemo_sk", Schema.FieldType.INT64) - .addNullableField("cs_ship_hdemo_sk", Schema.FieldType.INT64) - .addNullableField("cs_ship_addr_sk", Schema.FieldType.INT64) - .addNullableField("cs_call_center_sk", Schema.FieldType.INT64) - .addNullableField("cs_catalog_page_sk", Schema.FieldType.INT64) - .addNullableField("cs_ship_mode_sk", Schema.FieldType.INT64) - .addNullableField("cs_warehouse_sk", Schema.FieldType.INT64) - .addField("cs_item_sk", Schema.FieldType.INT64) - .addNullableField("cs_promo_sk", Schema.FieldType.INT64) - .addField("cs_order_number", Schema.FieldType.INT64) - .addNullableField("cs_quantity", Schema.FieldType.DOUBLE) - .addNullableField("cs_wholesale_cost", Schema.FieldType.DOUBLE) - .addNullableField("cs_list_price", Schema.FieldType.DOUBLE) - .addNullableField("cs_sales_price", Schema.FieldType.DOUBLE) - .addNullableField("cs_ext_discount_amt", Schema.FieldType.DOUBLE) - .addNullableField("cs_ext_sales_price", Schema.FieldType.DOUBLE) - .addNullableField("cs_ext_wholesale_cost", Schema.FieldType.DOUBLE) - .addNullableField("cs_ext_list_price", Schema.FieldType.DOUBLE) - .addNullableField("cs_ext_tax", Schema.FieldType.DOUBLE) - .addNullableField("cs_coupon_amt", Schema.FieldType.DOUBLE) - .addNullableField("cs_ext_ship_cost", Schema.FieldType.DOUBLE) - .addNullableField("cs_net_paid", Schema.FieldType.DOUBLE) - .addNullableField("cs_net_paid_inc_tax", Schema.FieldType.DOUBLE) - .addNullableField("cs_net_paid_inc_ship", Schema.FieldType.DOUBLE) - .addNullableField("cs_net_paid_inc_ship_tax", Schema.FieldType.DOUBLE) - .addNullableField("cs_net_profit", Schema.FieldType.DOUBLE) - .build(); - - private static Schema customerSchema = - Schema.builder() - .addField("c_customer_sk", Schema.FieldType.INT64) - .addField("c_customer_id", Schema.FieldType.STRING) - .addNullableField("c_current_cdemo_sk", Schema.FieldType.INT64) - .addNullableField("c_current_hdemo_sk", Schema.FieldType.INT64) - .addNullableField("c_current_addr_sk", Schema.FieldType.INT64) - .addNullableField("c_first_shipto_date_sk", Schema.FieldType.INT64) - .addNullableField("c_first_sales_date_sk", Schema.FieldType.INT64) - .addNullableField("c_salutation", Schema.FieldType.STRING) - .addNullableField("c_first_name", Schema.FieldType.STRING) - .addNullableField("c_last_name", Schema.FieldType.STRING) - .addNullableField("c_preferred_cust_flag", Schema.FieldType.STRING) - .addNullableField("c_birth_day", Schema.FieldType.INT64) - .addNullableField("c_birth_month", Schema.FieldType.INT64) - .addNullableField("c_birth_year", Schema.FieldType.INT64) - .addNullableField("c_birth_country", Schema.FieldType.STRING) - .addNullableField("c_login", Schema.FieldType.STRING) - .addNullableField("c_email_address", 
Schema.FieldType.STRING) - .addNullableField("c_last_review_date_sk", Schema.FieldType.INT64) - .build(); - - private static Schema customerAddressSchema = - Schema.builder() - .addField("ca_address_sk", Schema.FieldType.INT64) - .addField("ca_address_id", Schema.FieldType.STRING) - .addNullableField("ca_street_number", Schema.FieldType.STRING) - .addNullableField("ca_street_name", Schema.FieldType.STRING) - .addNullableField("ca_street_type", Schema.FieldType.STRING) - .addNullableField("ca_suite_number", Schema.FieldType.STRING) - .addNullableField("ca_city", Schema.FieldType.STRING) - .addNullableField("ca_county", Schema.FieldType.STRING) - .addNullableField("ca_state", Schema.FieldType.STRING) - .addNullableField("ca_zip", Schema.FieldType.STRING) - .addNullableField("ca_country", Schema.FieldType.STRING) - .addNullableField("ca_gmt_offset", Schema.FieldType.DOUBLE) - .addNullableField("ca_location_type", Schema.FieldType.STRING) - .build(); - - private static Schema customerDemographicsSchema = - Schema.builder() - .addField("cd_demo_sk", Schema.FieldType.INT64) - .addNullableField("cd_gender", Schema.FieldType.STRING) - .addNullableField("cd_marital_status", Schema.FieldType.STRING) - .addNullableField("cd_education_status", Schema.FieldType.STRING) - .addNullableField("cd_purchase_estimate", Schema.FieldType.INT64) - .addNullableField("cd_credit_rating", Schema.FieldType.STRING) - .addNullableField("cd_dep_count", Schema.FieldType.INT64) - .addNullableField("cd_dep_employed_count", Schema.FieldType.INT64) - .addNullableField("cd_dep_college_count", Schema.FieldType.INT64) - .build(); - - private static Schema dateDimSchema = - Schema.builder() - .addField("d_date_sk", Schema.FieldType.INT64) - .addField("d_date_id", Schema.FieldType.STRING) - .addNullableField("d_date", Schema.FieldType.STRING) - .addNullableField("d_month_seq", Schema.FieldType.INT64) - .addNullableField("d_week_seq", Schema.FieldType.INT64) - .addNullableField("d_quarter_seq", Schema.FieldType.INT64) - .addNullableField("d_year", Schema.FieldType.INT64) - .addNullableField("d_dow", Schema.FieldType.INT64) - .addNullableField("d_moy", Schema.FieldType.INT64) - .addNullableField("d_dom", Schema.FieldType.INT64) - .addNullableField("d_qoy", Schema.FieldType.INT64) - .addNullableField("d_fy_year", Schema.FieldType.INT64) - .addNullableField("d_fy_quarter_seq", Schema.FieldType.INT64) - .addNullableField("d_fy_week_seq", Schema.FieldType.INT64) - .addNullableField("d_day_name", Schema.FieldType.STRING) - .addNullableField("d_quarter_name", Schema.FieldType.STRING) - .addNullableField("d_holiday", Schema.FieldType.STRING) - .addNullableField("d_weekend", Schema.FieldType.STRING) - .addNullableField("d_following_holiday", Schema.FieldType.STRING) - .addNullableField("d_first_dom", Schema.FieldType.INT64) - .addNullableField("d_last_dom", Schema.FieldType.INT64) - .addNullableField("d_same_day_ly", Schema.FieldType.INT64) - .addNullableField("d_same_day_lq", Schema.FieldType.INT64) - .addNullableField("d_current_day", Schema.FieldType.STRING) - .addNullableField("d_current_week", Schema.FieldType.STRING) - .addNullableField("d_current_month", Schema.FieldType.STRING) - .addNullableField("d_current_quarter", Schema.FieldType.STRING) - .addNullableField("d_current_year", Schema.FieldType.STRING) - .build(); - - private static Schema householdDemographicsSchema = - Schema.builder() - .addField("hd_demo_sk", Schema.FieldType.INT64) - .addNullableField("hd_income_band_sk", Schema.FieldType.INT64) - 
.addNullableField("hd_buy_potential", Schema.FieldType.STRING) - .addNullableField("hd_dep_count", Schema.FieldType.INT64) - .addNullableField("hd_vehicle_count", Schema.FieldType.INT64) - .build(); - - private static Schema incomeBandSchema = - Schema.builder() - .addField("ib_income_band_sk", Schema.FieldType.INT64) - .addNullableField("ib_lower_bound", Schema.FieldType.INT64) - .addNullableField("ib_upper_bound", Schema.FieldType.INT64) - .build(); - - private static Schema inventorySchema = - Schema.builder() - .addField("inv_date_sk", Schema.FieldType.INT32) - .addField("inv_item_sk", Schema.FieldType.INT32) - .addField("inv_warehouse_sk", Schema.FieldType.INT32) - .addNullableField("inv_quantity_on_hand", Schema.FieldType.INT32) - .build(); - - private static Schema itemSchema = - Schema.builder() - .addField("i_item_sk", Schema.FieldType.INT64) - .addField("i_item_id", Schema.FieldType.STRING) - .addNullableField("i_rec_start_date", Schema.FieldType.STRING) - .addNullableField("i_rec_end_date", Schema.FieldType.STRING) - .addNullableField("i_item_desc", Schema.FieldType.STRING) - .addNullableField("i_current_price", Schema.FieldType.DOUBLE) - .addNullableField("i_wholesale_cost", Schema.FieldType.DOUBLE) - .addNullableField("i_brand_id", Schema.FieldType.INT64) - .addNullableField("i_brand", Schema.FieldType.STRING) - .addNullableField("i_class_id", Schema.FieldType.INT64) - .addNullableField("i_class", Schema.FieldType.STRING) - .addNullableField("i_category_id", Schema.FieldType.INT64) - .addNullableField("i_category", Schema.FieldType.STRING) - .addNullableField("i_manufact_id", Schema.FieldType.INT64) - .addNullableField("i_manufact", Schema.FieldType.STRING) - .addNullableField("i_size", Schema.FieldType.STRING) - .addNullableField("i_formulation", Schema.FieldType.STRING) - .addNullableField("i_color", Schema.FieldType.STRING) - .addNullableField("i_units", Schema.FieldType.STRING) - .addNullableField("i_container", Schema.FieldType.STRING) - .addNullableField("i_manager_id", Schema.FieldType.INT64) - .addNullableField("i_product_name", Schema.FieldType.STRING) - .build(); - - private static Schema promotionSchema = - Schema.builder() - .addField("p_promo_sk", Schema.FieldType.INT64) - .addField("p_promo_id", Schema.FieldType.STRING) - .addNullableField("p_start_date_sk", Schema.FieldType.INT64) - .addNullableField("p_end_date_sk", Schema.FieldType.INT64) - .addNullableField("p_item_sk", Schema.FieldType.INT64) - .addNullableField("p_cost", Schema.FieldType.DOUBLE) - .addNullableField("p_response_target", Schema.FieldType.INT64) - .addNullableField("p_promo_name", Schema.FieldType.STRING) - .addNullableField("p_channel_dmail", Schema.FieldType.STRING) - .addNullableField("p_channel_email", Schema.FieldType.STRING) - .addNullableField("p_channel_catalog", Schema.FieldType.STRING) - .addNullableField("p_channel_tv", Schema.FieldType.STRING) - .addNullableField("p_channel_radio", Schema.FieldType.STRING) - .addNullableField("p_channel_press", Schema.FieldType.STRING) - .addNullableField("p_channel_event", Schema.FieldType.STRING) - .addNullableField("p_channel_demo", Schema.FieldType.STRING) - .addNullableField("p_channel_details", Schema.FieldType.STRING) - .addNullableField("p_purpose", Schema.FieldType.STRING) - .addNullableField("p_discount_active", Schema.FieldType.STRING) - .build(); - - private static Schema reasonSchema = - Schema.builder() - .addField("r_reason_sk", Schema.FieldType.INT64) - .addField("r_reason_id", Schema.FieldType.STRING) - 
.addNullableField("r_reason_desc", Schema.FieldType.STRING) - .build(); - - private static Schema shipModeSchema = - Schema.builder() - .addField("sm_ship_mode_sk", Schema.FieldType.INT64) - .addField("sm_ship_mode_id", Schema.FieldType.STRING) - .addNullableField("sm_type", Schema.FieldType.STRING) - .addNullableField("sm_code", Schema.FieldType.STRING) - .addNullableField("sm_carrier", Schema.FieldType.STRING) - .addNullableField("sm_contract", Schema.FieldType.STRING) - .build(); - - private static Schema storeSchema = - Schema.builder() - .addField("s_store_sk", Schema.FieldType.INT64) - .addField("s_store_id", Schema.FieldType.STRING) - .addNullableField("s_rec_start_date", Schema.FieldType.STRING) - .addNullableField("s_rec_end_date", Schema.FieldType.STRING) - .addNullableField("s_closed_date_sk", Schema.FieldType.INT64) - .addNullableField("s_store_name", Schema.FieldType.STRING) - .addNullableField("s_number_employees", Schema.FieldType.INT64) - .addNullableField("s_floor_space", Schema.FieldType.INT64) - .addNullableField("s_hours", Schema.FieldType.STRING) - .addNullableField("S_manager", Schema.FieldType.STRING) - .addNullableField("S_market_id", Schema.FieldType.INT64) - .addNullableField("S_geography_class", Schema.FieldType.STRING) - .addNullableField("S_market_desc", Schema.FieldType.STRING) - .addNullableField("s_market_manager", Schema.FieldType.STRING) - .addNullableField("s_division_id", Schema.FieldType.INT64) - .addNullableField("s_division_name", Schema.FieldType.STRING) - .addNullableField("s_company_id", Schema.FieldType.INT64) - .addNullableField("s_company_name", Schema.FieldType.STRING) - .addNullableField("s_street_number", Schema.FieldType.STRING) - .addNullableField("s_street_name", Schema.FieldType.STRING) - .addNullableField("s_street_type", Schema.FieldType.STRING) - .addNullableField("s_suite_number", Schema.FieldType.STRING) - .addNullableField("s_city", Schema.FieldType.STRING) - .addNullableField("s_county", Schema.FieldType.STRING) - .addNullableField("s_state", Schema.FieldType.STRING) - .addNullableField("s_zip", Schema.FieldType.STRING) - .addNullableField("s_country", Schema.FieldType.STRING) - .addNullableField("s_gmt_offset", Schema.FieldType.DOUBLE) - .addNullableField("s_tax_percentage", Schema.FieldType.DOUBLE) - .build(); - - private static Schema storeReturnsSchema = - Schema.builder() - .addNullableField("sr_returned_date_sk", Schema.FieldType.INT64) - .addNullableField("sr_return_time_sk", Schema.FieldType.INT64) - .addField("sr_item_sk", Schema.FieldType.INT64) - .addNullableField("sr_customer_sk", Schema.FieldType.INT64) - .addNullableField("sr_cdemo_sk", Schema.FieldType.INT64) - .addNullableField("sr_hdemo_sk", Schema.FieldType.INT64) - .addNullableField("sr_addr_sk", Schema.FieldType.INT64) - .addNullableField("sr_store_sk", Schema.FieldType.INT64) - .addNullableField("sr_reason_sk", Schema.FieldType.INT64) - .addField("sr_ticket_number", Schema.FieldType.INT64) - .addNullableField("sr_return_quantity", Schema.FieldType.INT64) - .addNullableField("sr_return_amt", Schema.FieldType.DOUBLE) - .addNullableField("sr_return_tax", Schema.FieldType.DOUBLE) - .addNullableField("sr_return_amt_inc_tax", Schema.FieldType.DOUBLE) - .addNullableField("sr_fee", Schema.FieldType.DOUBLE) - .addNullableField("sr_return_ship_cost", Schema.FieldType.DOUBLE) - .addNullableField("sr_refunded_cash", Schema.FieldType.DOUBLE) - .addNullableField("sr_reversed_charge", Schema.FieldType.DOUBLE) - .addNullableField("sr_store_credit", Schema.FieldType.DOUBLE) - 
.addNullableField("sr_net_loss", Schema.FieldType.DOUBLE) - .build(); - - private static Schema storeSalesSchema = - Schema.builder() - .addNullableField("ss_sold_date_sk", Schema.FieldType.INT64) - .addNullableField("ss_sold_time_sk", Schema.FieldType.INT64) - .addField("ss_item_sk", Schema.FieldType.INT64) - .addNullableField("ss_customer_sk", Schema.FieldType.INT64) - .addNullableField("ss_cdemo_sk", Schema.FieldType.INT64) - .addNullableField("ss_hdemo_sk", Schema.FieldType.INT64) - .addNullableField("ss_addr_sk", Schema.FieldType.INT64) - .addNullableField("ss_store_sk", Schema.FieldType.INT64) - .addNullableField("ss_promo_sk", Schema.FieldType.INT64) - .addField("ss_ticket_number", Schema.FieldType.INT64) - .addNullableField("ss_quantity", Schema.FieldType.INT64) - .addNullableField("ss_wholesale_cost", Schema.FieldType.DOUBLE) - .addNullableField("ss_list_price", Schema.FieldType.DOUBLE) - .addNullableField("ss_sales_price", Schema.FieldType.DOUBLE) - .addNullableField("ss_ext_discount_amt", Schema.FieldType.DOUBLE) - .addNullableField("ss_ext_sales_price", Schema.FieldType.DOUBLE) - .addNullableField("ss_ext_wholesale_cost", Schema.FieldType.DOUBLE) - .addNullableField("ss_ext_list_price", Schema.FieldType.DOUBLE) - .addNullableField("ss_ext_tax", Schema.FieldType.DOUBLE) - .addNullableField("ss_coupon_amt", Schema.FieldType.DOUBLE) - .addNullableField("ss_net_paid", Schema.FieldType.DOUBLE) - .addNullableField("ss_net_paid_inc_tax", Schema.FieldType.DOUBLE) - .addNullableField("ss_net_profit", Schema.FieldType.DOUBLE) - .build(); - - private static Schema timeDimSchema = - Schema.builder() - .addField("t_time_sk", Schema.FieldType.INT64) - .addField("t_time_id", Schema.FieldType.STRING) - .addNullableField("t_time", Schema.FieldType.INT64) - .addNullableField("t_hour", Schema.FieldType.INT64) - .addNullableField("t_minute", Schema.FieldType.INT64) - .addNullableField("t_second", Schema.FieldType.INT64) - .addNullableField("t_am_pm", Schema.FieldType.STRING) - .addNullableField("t_shift", Schema.FieldType.STRING) - .addNullableField("t_sub_shift", Schema.FieldType.STRING) - .addNullableField("t_meal_time", Schema.FieldType.STRING) - .build(); - - private static Schema warehouseSchema = - Schema.builder() - .addField("w_warehouse_sk", Schema.FieldType.INT64) - .addField("w_warehouse_id", Schema.FieldType.STRING) - .addNullableField("w_warehouse_name", Schema.FieldType.STRING) - .addNullableField("w_warehouse_sq_ft", Schema.FieldType.INT64) - .addNullableField("w_street_number", Schema.FieldType.STRING) - .addNullableField("w_street_name", Schema.FieldType.STRING) - .addNullableField("w_street_type", Schema.FieldType.STRING) - .addNullableField("w_suite_number", Schema.FieldType.STRING) - .addNullableField("w_city", Schema.FieldType.STRING) - .addNullableField("w_county", Schema.FieldType.STRING) - .addNullableField("w_state", Schema.FieldType.STRING) - .addNullableField("w_zip", Schema.FieldType.STRING) - .addNullableField("w_country", Schema.FieldType.STRING) - .addNullableField("w_gmt_offset", Schema.FieldType.DOUBLE) - .build(); - - private static Schema webPageSchema = - Schema.builder() - .addField("wp_web_page_sk", Schema.FieldType.INT64) - .addField("wp_web_page_id", Schema.FieldType.STRING) - .addNullableField("wp_rec_start_date", Schema.FieldType.STRING) - .addNullableField("wp_rec_end_date", Schema.FieldType.STRING) - .addNullableField("wp_creation_date_sk", Schema.FieldType.INT64) - .addNullableField("wp_access_date_sk", Schema.FieldType.INT64) - 
.addNullableField("wp_autogen_flag", Schema.FieldType.STRING) - .addNullableField("wp_customer_sk", Schema.FieldType.INT64) - .addNullableField("wp_url", Schema.FieldType.STRING) - .addNullableField("wp_type", Schema.FieldType.STRING) - .addNullableField("wp_char_count", Schema.FieldType.INT64) - .addNullableField("wp_link_count", Schema.FieldType.INT64) - .addNullableField("wp_image_count", Schema.FieldType.INT64) - .addNullableField("wp_max_ad_count", Schema.FieldType.INT64) - .build(); - - private static Schema webReturnsSchema = - Schema.builder() - .addNullableField("wr_returned_date_sk", Schema.FieldType.INT64) - .addNullableField("wr_returned_time_sk", Schema.FieldType.INT64) - .addField("wr_item_sk", Schema.FieldType.INT64) - .addNullableField("wr_refunded_customer_sk", Schema.FieldType.INT64) - .addNullableField("wr_refunded_cdemo_sk", Schema.FieldType.INT64) - .addNullableField("wr_refunded_hdemo_sk", Schema.FieldType.INT64) - .addNullableField("wr_refunded_addr_sk", Schema.FieldType.INT64) - .addNullableField("wr_returning_customer_sk", Schema.FieldType.INT64) - .addNullableField("wr_returning_cdemo_sk", Schema.FieldType.INT64) - .addNullableField("wr_returning_hdemo_sk", Schema.FieldType.INT64) - .addNullableField("wr_returning_addr_sk", Schema.FieldType.INT64) - .addNullableField("wr_web_page_sk", Schema.FieldType.INT64) - .addNullableField("wr_reason_sk", Schema.FieldType.INT64) - .addField("wr_order_number", Schema.FieldType.INT64) - .addNullableField("wr_return_quantity", Schema.FieldType.INT64) - .addNullableField("wr_return_amt", Schema.FieldType.DOUBLE) - .addNullableField("wr_return_tax", Schema.FieldType.DOUBLE) - .addNullableField("wr_return_amt_inc_tax", Schema.FieldType.DOUBLE) - .addNullableField("wr_fee", Schema.FieldType.DOUBLE) - .addNullableField("wr_return_ship_cost", Schema.FieldType.DOUBLE) - .addNullableField("wr_refunded_cash", Schema.FieldType.DOUBLE) - .addNullableField("wr_reversed_charge", Schema.FieldType.DOUBLE) - .addNullableField("wr_account_credit", Schema.FieldType.DOUBLE) - .addNullableField("wr_net_loss", Schema.FieldType.DOUBLE) - .build(); - - private static Schema webSalesSchema = - Schema.builder() - .addNullableField("ws_sold_date_sk", Schema.FieldType.INT32) - .addNullableField("ws_sold_time_sk", Schema.FieldType.INT32) - .addNullableField("ws_ship_date_sk", Schema.FieldType.INT32) - .addField("ws_item_sk", Schema.FieldType.INT32) - .addNullableField("ws_bill_customer_sk", Schema.FieldType.INT32) - .addNullableField("ws_bill_cdemo_sk", Schema.FieldType.INT32) - .addNullableField("ws_bill_hdemo_sk", Schema.FieldType.INT32) - .addNullableField("ws_bill_addr_sk", Schema.FieldType.INT32) - .addNullableField("ws_ship_customer_sk", Schema.FieldType.INT32) - .addNullableField("ws_ship_cdemo_sk", Schema.FieldType.INT32) - .addNullableField("ws_ship_hdemo_sk", Schema.FieldType.INT32) - .addNullableField("ws_ship_addr_sk", Schema.FieldType.INT32) - .addNullableField("ws_web_page_sk", Schema.FieldType.INT32) - .addNullableField("ws_web_site_sk", Schema.FieldType.INT32) - .addNullableField("ws_ship_mode_sk", Schema.FieldType.INT32) - .addNullableField("ws_warehouse_sk", Schema.FieldType.INT32) - .addNullableField("ws_promo_sk", Schema.FieldType.INT32) - .addField("ws_order_number", Schema.FieldType.INT64) - .addNullableField("ws_quantity", Schema.FieldType.INT32) - .addNullableField("ws_wholesale_cost", Schema.FieldType.DOUBLE) - .addNullableField("ws_list_price", Schema.FieldType.DOUBLE) - .addNullableField("ws_sales_price", Schema.FieldType.DOUBLE) 
- .addNullableField("ws_ext_discount_amt", Schema.FieldType.DOUBLE) - .addNullableField("ws_ext_sales_price", Schema.FieldType.DOUBLE) - .addNullableField("ws_ext_wholesale_cost", Schema.FieldType.DOUBLE) - .addNullableField("ws_ext_list_price", Schema.FieldType.DOUBLE) - .addNullableField("ws_ext_tax", Schema.FieldType.DOUBLE) - .addNullableField("ws_coupon_amt", Schema.FieldType.DOUBLE) - .addNullableField("ws_ext_ship_cost", Schema.FieldType.DOUBLE) - .addNullableField("ws_net_paid", Schema.FieldType.DOUBLE) - .addNullableField("ws_net_paid_inc_tax", Schema.FieldType.DOUBLE) - .addNullableField("ws_net_paid_inc_ship", Schema.FieldType.DOUBLE) - .addNullableField("ws_net_paid_inc_ship_tax", Schema.FieldType.DOUBLE) - .addNullableField("ws_net_profit", Schema.FieldType.DOUBLE) - .build(); - - private static Schema webSiteSchema = - Schema.builder() - .addField("web_site_sk", Schema.FieldType.STRING) - .addField("web_site_id", Schema.FieldType.STRING) - .addNullableField("web_rec_start_date", Schema.FieldType.STRING) - .addNullableField("web_rec_end_date", Schema.FieldType.STRING) - .addNullableField("web_name", Schema.FieldType.STRING) - .addNullableField("web_open_date_sk", Schema.FieldType.INT32) - .addNullableField("web_close_date_sk", Schema.FieldType.INT32) - .addNullableField("web_class", Schema.FieldType.STRING) - .addNullableField("web_manager", Schema.FieldType.STRING) - .addNullableField("web_mkt_id", Schema.FieldType.INT32) - .addNullableField("web_mkt_class", Schema.FieldType.STRING) - .addNullableField("web_mkt_desc", Schema.FieldType.STRING) - .addNullableField("web_market_manager", Schema.FieldType.STRING) - .addNullableField("web_company_id", Schema.FieldType.INT32) - .addNullableField("web_company_name", Schema.FieldType.STRING) - .addNullableField("web_street_number", Schema.FieldType.STRING) - .addNullableField("web_street_name", Schema.FieldType.STRING) - .addNullableField("web_street_type", Schema.FieldType.STRING) - .addNullableField("web_suite_number", Schema.FieldType.STRING) - .addNullableField("web_city", Schema.FieldType.STRING) - .addNullableField("web_county", Schema.FieldType.STRING) - .addNullableField("web_state", Schema.FieldType.STRING) - .addNullableField("web_zip", Schema.FieldType.STRING) - .addNullableField("web_country", Schema.FieldType.STRING) - .addNullableField("web_gmt_offset", Schema.FieldType.DOUBLE) - .addNullableField("web_tax_percentage", Schema.FieldType.DOUBLE) - .build(); + Schema tableSchema = schemaBuilder.build(); + schemaMap.put(tableName, tableSchema); + } + return schemaMap; + } + + /** + * Get all tpcds table schemas according to tpcds official specification. Some fields are set to + * be not nullable. + * + * @return A map of all tpcds table schemas with their table names as keys. 
+ */ + public static Map<String, Schema> getTpcdsSchemasImmutableMap() { + return ImmutableMap.<String, Schema>builder() + .put("call_center", callCenterSchema) + .put("catalog_page", catalogPageSchema) + .put("catalog_returns", catalogReturnsSchema) + .put("catalog_sales", catalogSalesSchema) + .put("customer", customerSchema) + .put("customer_address", customerAddressSchema) + .put("customer_demographics", customerDemographicsSchema) + .put("date_dim", dateDimSchema) + .put("household_demographics", householdDemographicsSchema) + .put("income_band", incomeBandSchema) + .put("inventory", inventorySchema) + .put("item", itemSchema) + .put("promotion", promotionSchema) + .put("reason", reasonSchema) + .put("ship_mode", shipModeSchema) + .put("store", storeSchema) + .put("store_returns", storeReturnsSchema) + .put("store_sales", storeSalesSchema) + .put("time_dim", timeDimSchema) + .put("warehouse", warehouseSchema) + .put("web_page", webPageSchema) + .put("web_returns", webReturnsSchema) + .put("web_sales", webSalesSchema) + .put("web_site", webSiteSchema) + .build(); + } + + public static Schema getCallCenterSchema() { + return callCenterSchema; + } + + public static Schema getCatalogPageSchema() { + return catalogPageSchema; + } + + public static Schema getCatalogReturnsSchema() { + return catalogReturnsSchema; + } + + public static Schema getCatalogSalesSchema() { + return catalogSalesSchema; + } + + public static Schema getCustomerSchema() { + return customerSchema; + } + + public static Schema getCustomerAddressSchema() { + return customerAddressSchema; + } + + public static Schema getCustomerDemographicsSchema() { + return customerDemographicsSchema; + } + + public static Schema getDateDimSchema() { + return dateDimSchema; + } + + public static Schema getHouseholdDemographicsSchema() { + return householdDemographicsSchema; + } + + public static Schema getIncomeBandSchema() { + return incomeBandSchema; + } + + public static Schema getInventorySchema() { + return inventorySchema; + } + + public static Schema getItemSchema() { + return itemSchema; + } + + public static Schema getPromotionSchema() { + return promotionSchema; + } + + public static Schema getReasonSchema() { + return reasonSchema; + } + + public static Schema getShipModeSchema() { + return shipModeSchema; + } + + public static Schema getStoreSchema() { + return storeSchema; + } + + public static Schema getStoreReturnsSchema() { + return storeReturnsSchema; + } + + public static Schema getStoreSalesSchema() { + return storeSalesSchema; + } + + public static Schema getTimeDimSchema() { + return timeDimSchema; + } + + public static Schema getWarehouseSchema() { + return warehouseSchema; + } + + public static Schema getWebpageSchema() { + return webPageSchema; + } + + public static Schema getWebReturnsSchema() { + return webReturnsSchema; + } + + public static Schema getWebSalesSchema() { + return webSalesSchema; + } + + public static Schema getWebSiteSchema() { + return webSiteSchema; + } + + private static final Schema callCenterSchema = + Schema.builder() + .addField("cc_call_center_sk", Schema.FieldType.INT64) + .addField("cc_call_center_id", Schema.FieldType.STRING) + .addNullableField("cc_rec_start_date", Schema.FieldType.STRING) + .addNullableField("cc_rec_end_date", Schema.FieldType.STRING) + .addNullableField("cc_closed_date_sk", Schema.FieldType.INT64) + .addNullableField("cc_open_date_sk", Schema.FieldType.INT64) + .addNullableField("cc_name", Schema.FieldType.STRING) + .addNullableField("cc_class", Schema.FieldType.STRING) + 
.addNullableField("cc_employees", Schema.FieldType.INT64) + .addNullableField("cc_sq_ft", Schema.FieldType.INT64) + .addNullableField("cc_hours", Schema.FieldType.STRING) + .addNullableField("cc_manager", Schema.FieldType.STRING) + .addNullableField("cc_mkt_id", Schema.FieldType.INT64) + .addNullableField("cc_mkt_class", Schema.FieldType.STRING) + .addNullableField("cc_mkt_desc", Schema.FieldType.STRING) + .addNullableField("cc_market_manager", Schema.FieldType.STRING) + .addNullableField("cc_division", Schema.FieldType.INT64) + .addNullableField("cc_division_name", Schema.FieldType.STRING) + .addNullableField("cc_company", Schema.FieldType.INT64) + .addNullableField("cc_company_name", Schema.FieldType.STRING) + .addNullableField("cc_street_number", Schema.FieldType.STRING) + .addNullableField("cc_street_name", Schema.FieldType.STRING) + .addNullableField("cc_street_type", Schema.FieldType.STRING) + .addNullableField("cc_suite_number", Schema.FieldType.STRING) + .addNullableField("cc_city", Schema.FieldType.STRING) + .addNullableField("cc_county", Schema.FieldType.STRING) + .addNullableField("cc_state", Schema.FieldType.STRING) + .addNullableField("cc_zip", Schema.FieldType.STRING) + .addNullableField("cc_country", Schema.FieldType.STRING) + .addNullableField("cc_gmt_offset", Schema.FieldType.DOUBLE) + .addNullableField("cc_tax_percentage", Schema.FieldType.DOUBLE) + .build(); + + private static final Schema catalogPageSchema = + Schema.builder() + .addField("cp_catalog_page_sk", Schema.FieldType.INT64) + .addField("cp_catalog_page_id", Schema.FieldType.STRING) + .addNullableField("cp_start_date_sk", Schema.FieldType.INT64) + .addNullableField("cp_end_date_sk", Schema.FieldType.INT64) + .addNullableField("cp_department", Schema.FieldType.STRING) + .addNullableField("cp_catalog_number", Schema.FieldType.INT64) + .addNullableField("cp_catalog_page_number", Schema.FieldType.INT64) + .addNullableField("cp_description", Schema.FieldType.STRING) + .addNullableField("cp_type", Schema.FieldType.STRING) + .build(); + + private static final Schema catalogReturnsSchema = + Schema.builder() + .addNullableField("cr_returned_date_sk", Schema.FieldType.INT64) + .addNullableField("cr_returned_time_sk", Schema.FieldType.INT64) + .addField("cr_item_sk", Schema.FieldType.INT64) + .addNullableField("cr_refunded_customer_sk", Schema.FieldType.INT64) + .addNullableField("cr_refunded_cdemo_sk", Schema.FieldType.INT64) + .addNullableField("cr_refunded_hdemo_sk", Schema.FieldType.INT64) + .addNullableField("cr_refunded_addr_sk", Schema.FieldType.INT64) + .addNullableField("cr_returning_customer_sk", Schema.FieldType.INT64) + .addNullableField("cr_returning_cdemo_sk", Schema.FieldType.INT64) + .addNullableField("cr_returning_hdemo_sk", Schema.FieldType.INT64) + .addNullableField("cr_returning_addr_sk", Schema.FieldType.INT64) + .addNullableField("cr_call_center_sk", Schema.FieldType.INT64) + .addNullableField("cr_catalog_page_sk", Schema.FieldType.INT64) + .addNullableField("cr_ship_mode_sk", Schema.FieldType.INT64) + .addNullableField("cr_warehouse_sk", Schema.FieldType.INT64) + .addNullableField("cr_reason_sk", Schema.FieldType.INT64) + .addField("cr_order_number", Schema.FieldType.INT64) + .addNullableField("cr_return_quantity", Schema.FieldType.INT64) + .addNullableField("cr_return_amount", Schema.FieldType.DOUBLE) + .addNullableField("cr_return_tax", Schema.FieldType.DOUBLE) + .addNullableField("cr_return_amt_inc_tax", Schema.FieldType.DOUBLE) + .addNullableField("cr_fee", Schema.FieldType.DOUBLE) + 
.addNullableField("cr_return_ship_cost", Schema.FieldType.DOUBLE) + .addNullableField("cr_refunded_cash", Schema.FieldType.DOUBLE) + .addNullableField("cr_reversed_charge", Schema.FieldType.DOUBLE) + .addNullableField("cr_store_credit", Schema.FieldType.DOUBLE) + .addNullableField("cr_net_loss", Schema.FieldType.DOUBLE) + .build(); + + private static final Schema catalogSalesSchema = + Schema.builder() + .addNullableField("cs_sold_date_sk", Schema.FieldType.INT64) + .addNullableField("cs_sold_time_sk", Schema.FieldType.INT64) + .addNullableField("cs_ship_date_sk", Schema.FieldType.INT64) + .addNullableField("cs_bill_customer_sk", Schema.FieldType.INT64) + .addNullableField("cs_bill_cdemo_sk", Schema.FieldType.INT64) + .addNullableField("cs_bill_hdemo_sk", Schema.FieldType.INT64) + .addNullableField("cs_bill_addr_sk", Schema.FieldType.INT64) + .addNullableField("cs_ship_customer_sk", Schema.FieldType.INT64) + .addNullableField("cs_ship_cdemo_sk", Schema.FieldType.INT64) + .addNullableField("cs_ship_hdemo_sk", Schema.FieldType.INT64) + .addNullableField("cs_ship_addr_sk", Schema.FieldType.INT64) + .addNullableField("cs_call_center_sk", Schema.FieldType.INT64) + .addNullableField("cs_catalog_page_sk", Schema.FieldType.INT64) + .addNullableField("cs_ship_mode_sk", Schema.FieldType.INT64) + .addNullableField("cs_warehouse_sk", Schema.FieldType.INT64) + .addField("cs_item_sk", Schema.FieldType.INT64) + .addNullableField("cs_promo_sk", Schema.FieldType.INT64) + .addField("cs_order_number", Schema.FieldType.INT64) + .addNullableField("cs_quantity", Schema.FieldType.DOUBLE) + .addNullableField("cs_wholesale_cost", Schema.FieldType.DOUBLE) + .addNullableField("cs_list_price", Schema.FieldType.DOUBLE) + .addNullableField("cs_sales_price", Schema.FieldType.DOUBLE) + .addNullableField("cs_ext_discount_amt", Schema.FieldType.DOUBLE) + .addNullableField("cs_ext_sales_price", Schema.FieldType.DOUBLE) + .addNullableField("cs_ext_wholesale_cost", Schema.FieldType.DOUBLE) + .addNullableField("cs_ext_list_price", Schema.FieldType.DOUBLE) + .addNullableField("cs_ext_tax", Schema.FieldType.DOUBLE) + .addNullableField("cs_coupon_amt", Schema.FieldType.DOUBLE) + .addNullableField("cs_ext_ship_cost", Schema.FieldType.DOUBLE) + .addNullableField("cs_net_paid", Schema.FieldType.DOUBLE) + .addNullableField("cs_net_paid_inc_tax", Schema.FieldType.DOUBLE) + .addNullableField("cs_net_paid_inc_ship", Schema.FieldType.DOUBLE) + .addNullableField("cs_net_paid_inc_ship_tax", Schema.FieldType.DOUBLE) + .addNullableField("cs_net_profit", Schema.FieldType.DOUBLE) + .build(); + + private static final Schema customerSchema = + Schema.builder() + .addField("c_customer_sk", Schema.FieldType.INT64) + .addField("c_customer_id", Schema.FieldType.STRING) + .addNullableField("c_current_cdemo_sk", Schema.FieldType.INT64) + .addNullableField("c_current_hdemo_sk", Schema.FieldType.INT64) + .addNullableField("c_current_addr_sk", Schema.FieldType.INT64) + .addNullableField("c_first_shipto_date_sk", Schema.FieldType.INT64) + .addNullableField("c_first_sales_date_sk", Schema.FieldType.INT64) + .addNullableField("c_salutation", Schema.FieldType.STRING) + .addNullableField("c_first_name", Schema.FieldType.STRING) + .addNullableField("c_last_name", Schema.FieldType.STRING) + .addNullableField("c_preferred_cust_flag", Schema.FieldType.STRING) + .addNullableField("c_birth_day", Schema.FieldType.INT64) + .addNullableField("c_birth_month", Schema.FieldType.INT64) + .addNullableField("c_birth_year", Schema.FieldType.INT64) + 
.addNullableField("c_birth_country", Schema.FieldType.STRING) + .addNullableField("c_login", Schema.FieldType.STRING) + .addNullableField("c_email_address", Schema.FieldType.STRING) + .addNullableField("c_last_review_date_sk", Schema.FieldType.INT64) + .build(); + + private static final Schema customerAddressSchema = + Schema.builder() + .addField("ca_address_sk", Schema.FieldType.INT64) + .addField("ca_address_id", Schema.FieldType.STRING) + .addNullableField("ca_street_number", Schema.FieldType.STRING) + .addNullableField("ca_street_name", Schema.FieldType.STRING) + .addNullableField("ca_street_type", Schema.FieldType.STRING) + .addNullableField("ca_suite_number", Schema.FieldType.STRING) + .addNullableField("ca_city", Schema.FieldType.STRING) + .addNullableField("ca_county", Schema.FieldType.STRING) + .addNullableField("ca_state", Schema.FieldType.STRING) + .addNullableField("ca_zip", Schema.FieldType.STRING) + .addNullableField("ca_country", Schema.FieldType.STRING) + .addNullableField("ca_gmt_offset", Schema.FieldType.DOUBLE) + .addNullableField("ca_location_type", Schema.FieldType.STRING) + .build(); + + private static final Schema customerDemographicsSchema = + Schema.builder() + .addField("cd_demo_sk", Schema.FieldType.INT64) + .addNullableField("cd_gender", Schema.FieldType.STRING) + .addNullableField("cd_marital_status", Schema.FieldType.STRING) + .addNullableField("cd_education_status", Schema.FieldType.STRING) + .addNullableField("cd_purchase_estimate", Schema.FieldType.INT64) + .addNullableField("cd_credit_rating", Schema.FieldType.STRING) + .addNullableField("cd_dep_count", Schema.FieldType.INT64) + .addNullableField("cd_dep_employed_count", Schema.FieldType.INT64) + .addNullableField("cd_dep_college_count", Schema.FieldType.INT64) + .build(); + + private static final Schema dateDimSchema = + Schema.builder() + .addField("d_date_sk", Schema.FieldType.INT64) + .addField("d_date_id", Schema.FieldType.STRING) + .addNullableField("d_date", Schema.FieldType.STRING) + .addNullableField("d_month_seq", Schema.FieldType.INT64) + .addNullableField("d_week_seq", Schema.FieldType.INT64) + .addNullableField("d_quarter_seq", Schema.FieldType.INT64) + .addNullableField("d_year", Schema.FieldType.INT64) + .addNullableField("d_dow", Schema.FieldType.INT64) + .addNullableField("d_moy", Schema.FieldType.INT64) + .addNullableField("d_dom", Schema.FieldType.INT64) + .addNullableField("d_qoy", Schema.FieldType.INT64) + .addNullableField("d_fy_year", Schema.FieldType.INT64) + .addNullableField("d_fy_quarter_seq", Schema.FieldType.INT64) + .addNullableField("d_fy_week_seq", Schema.FieldType.INT64) + .addNullableField("d_day_name", Schema.FieldType.STRING) + .addNullableField("d_quarter_name", Schema.FieldType.STRING) + .addNullableField("d_holiday", Schema.FieldType.STRING) + .addNullableField("d_weekend", Schema.FieldType.STRING) + .addNullableField("d_following_holiday", Schema.FieldType.STRING) + .addNullableField("d_first_dom", Schema.FieldType.INT64) + .addNullableField("d_last_dom", Schema.FieldType.INT64) + .addNullableField("d_same_day_ly", Schema.FieldType.INT64) + .addNullableField("d_same_day_lq", Schema.FieldType.INT64) + .addNullableField("d_current_day", Schema.FieldType.STRING) + .addNullableField("d_current_week", Schema.FieldType.STRING) + .addNullableField("d_current_month", Schema.FieldType.STRING) + .addNullableField("d_current_quarter", Schema.FieldType.STRING) + .addNullableField("d_current_year", Schema.FieldType.STRING) + .build(); + + private static final Schema 
householdDemographicsSchema = + Schema.builder() + .addField("hd_demo_sk", Schema.FieldType.INT64) + .addNullableField("hd_income_band_sk", Schema.FieldType.INT64) + .addNullableField("hd_buy_potential", Schema.FieldType.STRING) + .addNullableField("hd_dep_count", Schema.FieldType.INT64) + .addNullableField("hd_vehicle_count", Schema.FieldType.INT64) + .build(); + + private static final Schema incomeBandSchema = + Schema.builder() + .addField("ib_income_band_sk", Schema.FieldType.INT64) + .addNullableField("ib_lower_bound", Schema.FieldType.INT64) + .addNullableField("ib_upper_bound", Schema.FieldType.INT64) + .build(); + + private static final Schema inventorySchema = + Schema.builder() + .addField("inv_date_sk", Schema.FieldType.INT32) + .addField("inv_item_sk", Schema.FieldType.INT32) + .addField("inv_warehouse_sk", Schema.FieldType.INT32) + .addNullableField("inv_quantity_on_hand", Schema.FieldType.INT32) + .build(); + + private static final Schema itemSchema = + Schema.builder() + .addField("i_item_sk", Schema.FieldType.INT64) + .addField("i_item_id", Schema.FieldType.STRING) + .addNullableField("i_rec_start_date", Schema.FieldType.STRING) + .addNullableField("i_rec_end_date", Schema.FieldType.STRING) + .addNullableField("i_item_desc", Schema.FieldType.STRING) + .addNullableField("i_current_price", Schema.FieldType.DOUBLE) + .addNullableField("i_wholesale_cost", Schema.FieldType.DOUBLE) + .addNullableField("i_brand_id", Schema.FieldType.INT64) + .addNullableField("i_brand", Schema.FieldType.STRING) + .addNullableField("i_class_id", Schema.FieldType.INT64) + .addNullableField("i_class", Schema.FieldType.STRING) + .addNullableField("i_category_id", Schema.FieldType.INT64) + .addNullableField("i_category", Schema.FieldType.STRING) + .addNullableField("i_manufact_id", Schema.FieldType.INT64) + .addNullableField("i_manufact", Schema.FieldType.STRING) + .addNullableField("i_size", Schema.FieldType.STRING) + .addNullableField("i_formulation", Schema.FieldType.STRING) + .addNullableField("i_color", Schema.FieldType.STRING) + .addNullableField("i_units", Schema.FieldType.STRING) + .addNullableField("i_container", Schema.FieldType.STRING) + .addNullableField("i_manager_id", Schema.FieldType.INT64) + .addNullableField("i_product_name", Schema.FieldType.STRING) + .build(); + + private static final Schema promotionSchema = + Schema.builder() + .addField("p_promo_sk", Schema.FieldType.INT64) + .addField("p_promo_id", Schema.FieldType.STRING) + .addNullableField("p_start_date_sk", Schema.FieldType.INT64) + .addNullableField("p_end_date_sk", Schema.FieldType.INT64) + .addNullableField("p_item_sk", Schema.FieldType.INT64) + .addNullableField("p_cost", Schema.FieldType.DOUBLE) + .addNullableField("p_response_target", Schema.FieldType.INT64) + .addNullableField("p_promo_name", Schema.FieldType.STRING) + .addNullableField("p_channel_dmail", Schema.FieldType.STRING) + .addNullableField("p_channel_email", Schema.FieldType.STRING) + .addNullableField("p_channel_catalog", Schema.FieldType.STRING) + .addNullableField("p_channel_tv", Schema.FieldType.STRING) + .addNullableField("p_channel_radio", Schema.FieldType.STRING) + .addNullableField("p_channel_press", Schema.FieldType.STRING) + .addNullableField("p_channel_event", Schema.FieldType.STRING) + .addNullableField("p_channel_demo", Schema.FieldType.STRING) + .addNullableField("p_channel_details", Schema.FieldType.STRING) + .addNullableField("p_purpose", Schema.FieldType.STRING) + .addNullableField("p_discount_active", Schema.FieldType.STRING) + .build(); + + 
private static final Schema reasonSchema = + Schema.builder() + .addField("r_reason_sk", Schema.FieldType.INT64) + .addField("r_reason_id", Schema.FieldType.STRING) + .addNullableField("r_reason_desc", Schema.FieldType.STRING) + .build(); + + private static final Schema shipModeSchema = + Schema.builder() + .addField("sm_ship_mode_sk", Schema.FieldType.INT64) + .addField("sm_ship_mode_id", Schema.FieldType.STRING) + .addNullableField("sm_type", Schema.FieldType.STRING) + .addNullableField("sm_code", Schema.FieldType.STRING) + .addNullableField("sm_carrier", Schema.FieldType.STRING) + .addNullableField("sm_contract", Schema.FieldType.STRING) + .build(); + + private static final Schema storeSchema = + Schema.builder() + .addField("s_store_sk", Schema.FieldType.INT64) + .addField("s_store_id", Schema.FieldType.STRING) + .addNullableField("s_rec_start_date", Schema.FieldType.STRING) + .addNullableField("s_rec_end_date", Schema.FieldType.STRING) + .addNullableField("s_closed_date_sk", Schema.FieldType.INT64) + .addNullableField("s_store_name", Schema.FieldType.STRING) + .addNullableField("s_number_employees", Schema.FieldType.INT64) + .addNullableField("s_floor_space", Schema.FieldType.INT64) + .addNullableField("s_hours", Schema.FieldType.STRING) + .addNullableField("S_manager", Schema.FieldType.STRING) + .addNullableField("S_market_id", Schema.FieldType.INT64) + .addNullableField("S_geography_class", Schema.FieldType.STRING) + .addNullableField("S_market_desc", Schema.FieldType.STRING) + .addNullableField("s_market_manager", Schema.FieldType.STRING) + .addNullableField("s_division_id", Schema.FieldType.INT64) + .addNullableField("s_division_name", Schema.FieldType.STRING) + .addNullableField("s_company_id", Schema.FieldType.INT64) + .addNullableField("s_company_name", Schema.FieldType.STRING) + .addNullableField("s_street_number", Schema.FieldType.STRING) + .addNullableField("s_street_name", Schema.FieldType.STRING) + .addNullableField("s_street_type", Schema.FieldType.STRING) + .addNullableField("s_suite_number", Schema.FieldType.STRING) + .addNullableField("s_city", Schema.FieldType.STRING) + .addNullableField("s_county", Schema.FieldType.STRING) + .addNullableField("s_state", Schema.FieldType.STRING) + .addNullableField("s_zip", Schema.FieldType.STRING) + .addNullableField("s_country", Schema.FieldType.STRING) + .addNullableField("s_gmt_offset", Schema.FieldType.DOUBLE) + .addNullableField("s_tax_percentage", Schema.FieldType.DOUBLE) + .build(); + + private static final Schema storeReturnsSchema = + Schema.builder() + .addNullableField("sr_returned_date_sk", Schema.FieldType.INT64) + .addNullableField("sr_return_time_sk", Schema.FieldType.INT64) + .addField("sr_item_sk", Schema.FieldType.INT64) + .addNullableField("sr_customer_sk", Schema.FieldType.INT64) + .addNullableField("sr_cdemo_sk", Schema.FieldType.INT64) + .addNullableField("sr_hdemo_sk", Schema.FieldType.INT64) + .addNullableField("sr_addr_sk", Schema.FieldType.INT64) + .addNullableField("sr_store_sk", Schema.FieldType.INT64) + .addNullableField("sr_reason_sk", Schema.FieldType.INT64) + .addField("sr_ticket_number", Schema.FieldType.INT64) + .addNullableField("sr_return_quantity", Schema.FieldType.INT64) + .addNullableField("sr_return_amt", Schema.FieldType.DOUBLE) + .addNullableField("sr_return_tax", Schema.FieldType.DOUBLE) + .addNullableField("sr_return_amt_inc_tax", Schema.FieldType.DOUBLE) + .addNullableField("sr_fee", Schema.FieldType.DOUBLE) + .addNullableField("sr_return_ship_cost", Schema.FieldType.DOUBLE) + 
.addNullableField("sr_refunded_cash", Schema.FieldType.DOUBLE) + .addNullableField("sr_reversed_charge", Schema.FieldType.DOUBLE) + .addNullableField("sr_store_credit", Schema.FieldType.DOUBLE) + .addNullableField("sr_net_loss", Schema.FieldType.DOUBLE) + .build(); + + private static final Schema storeSalesSchema = + Schema.builder() + .addNullableField("ss_sold_date_sk", Schema.FieldType.INT64) + .addNullableField("ss_sold_time_sk", Schema.FieldType.INT64) + .addField("ss_item_sk", Schema.FieldType.INT64) + .addNullableField("ss_customer_sk", Schema.FieldType.INT64) + .addNullableField("ss_cdemo_sk", Schema.FieldType.INT64) + .addNullableField("ss_hdemo_sk", Schema.FieldType.INT64) + .addNullableField("ss_addr_sk", Schema.FieldType.INT64) + .addNullableField("ss_store_sk", Schema.FieldType.INT64) + .addNullableField("ss_promo_sk", Schema.FieldType.INT64) + .addField("ss_ticket_number", Schema.FieldType.INT64) + .addNullableField("ss_quantity", Schema.FieldType.INT64) + .addNullableField("ss_wholesale_cost", Schema.FieldType.DOUBLE) + .addNullableField("ss_list_price", Schema.FieldType.DOUBLE) + .addNullableField("ss_sales_price", Schema.FieldType.DOUBLE) + .addNullableField("ss_ext_discount_amt", Schema.FieldType.DOUBLE) + .addNullableField("ss_ext_sales_price", Schema.FieldType.DOUBLE) + .addNullableField("ss_ext_wholesale_cost", Schema.FieldType.DOUBLE) + .addNullableField("ss_ext_list_price", Schema.FieldType.DOUBLE) + .addNullableField("ss_ext_tax", Schema.FieldType.DOUBLE) + .addNullableField("ss_coupon_amt", Schema.FieldType.DOUBLE) + .addNullableField("ss_net_paid", Schema.FieldType.DOUBLE) + .addNullableField("ss_net_paid_inc_tax", Schema.FieldType.DOUBLE) + .addNullableField("ss_net_profit", Schema.FieldType.DOUBLE) + .build(); + + private static final Schema timeDimSchema = + Schema.builder() + .addField("t_time_sk", Schema.FieldType.INT64) + .addField("t_time_id", Schema.FieldType.STRING) + .addNullableField("t_time", Schema.FieldType.INT64) + .addNullableField("t_hour", Schema.FieldType.INT64) + .addNullableField("t_minute", Schema.FieldType.INT64) + .addNullableField("t_second", Schema.FieldType.INT64) + .addNullableField("t_am_pm", Schema.FieldType.STRING) + .addNullableField("t_shift", Schema.FieldType.STRING) + .addNullableField("t_sub_shift", Schema.FieldType.STRING) + .addNullableField("t_meal_time", Schema.FieldType.STRING) + .build(); + + private static final Schema warehouseSchema = + Schema.builder() + .addField("w_warehouse_sk", Schema.FieldType.INT64) + .addField("w_warehouse_id", Schema.FieldType.STRING) + .addNullableField("w_warehouse_name", Schema.FieldType.STRING) + .addNullableField("w_warehouse_sq_ft", Schema.FieldType.INT64) + .addNullableField("w_street_number", Schema.FieldType.STRING) + .addNullableField("w_street_name", Schema.FieldType.STRING) + .addNullableField("w_street_type", Schema.FieldType.STRING) + .addNullableField("w_suite_number", Schema.FieldType.STRING) + .addNullableField("w_city", Schema.FieldType.STRING) + .addNullableField("w_county", Schema.FieldType.STRING) + .addNullableField("w_state", Schema.FieldType.STRING) + .addNullableField("w_zip", Schema.FieldType.STRING) + .addNullableField("w_country", Schema.FieldType.STRING) + .addNullableField("w_gmt_offset", Schema.FieldType.DOUBLE) + .build(); + + private static final Schema webPageSchema = + Schema.builder() + .addField("wp_web_page_sk", Schema.FieldType.INT64) + .addField("wp_web_page_id", Schema.FieldType.STRING) + .addNullableField("wp_rec_start_date", Schema.FieldType.STRING) + 
.addNullableField("wp_rec_end_date", Schema.FieldType.STRING) + .addNullableField("wp_creation_date_sk", Schema.FieldType.INT64) + .addNullableField("wp_access_date_sk", Schema.FieldType.INT64) + .addNullableField("wp_autogen_flag", Schema.FieldType.STRING) + .addNullableField("wp_customer_sk", Schema.FieldType.INT64) + .addNullableField("wp_url", Schema.FieldType.STRING) + .addNullableField("wp_type", Schema.FieldType.STRING) + .addNullableField("wp_char_count", Schema.FieldType.INT64) + .addNullableField("wp_link_count", Schema.FieldType.INT64) + .addNullableField("wp_image_count", Schema.FieldType.INT64) + .addNullableField("wp_max_ad_count", Schema.FieldType.INT64) + .build(); + + private static final Schema webReturnsSchema = + Schema.builder() + .addNullableField("wr_returned_date_sk", Schema.FieldType.INT64) + .addNullableField("wr_returned_time_sk", Schema.FieldType.INT64) + .addField("wr_item_sk", Schema.FieldType.INT64) + .addNullableField("wr_refunded_customer_sk", Schema.FieldType.INT64) + .addNullableField("wr_refunded_cdemo_sk", Schema.FieldType.INT64) + .addNullableField("wr_refunded_hdemo_sk", Schema.FieldType.INT64) + .addNullableField("wr_refunded_addr_sk", Schema.FieldType.INT64) + .addNullableField("wr_returning_customer_sk", Schema.FieldType.INT64) + .addNullableField("wr_returning_cdemo_sk", Schema.FieldType.INT64) + .addNullableField("wr_returning_hdemo_sk", Schema.FieldType.INT64) + .addNullableField("wr_returning_addr_sk", Schema.FieldType.INT64) + .addNullableField("wr_web_page_sk", Schema.FieldType.INT64) + .addNullableField("wr_reason_sk", Schema.FieldType.INT64) + .addField("wr_order_number", Schema.FieldType.INT64) + .addNullableField("wr_return_quantity", Schema.FieldType.INT64) + .addNullableField("wr_return_amt", Schema.FieldType.DOUBLE) + .addNullableField("wr_return_tax", Schema.FieldType.DOUBLE) + .addNullableField("wr_return_amt_inc_tax", Schema.FieldType.DOUBLE) + .addNullableField("wr_fee", Schema.FieldType.DOUBLE) + .addNullableField("wr_return_ship_cost", Schema.FieldType.DOUBLE) + .addNullableField("wr_refunded_cash", Schema.FieldType.DOUBLE) + .addNullableField("wr_reversed_charge", Schema.FieldType.DOUBLE) + .addNullableField("wr_account_credit", Schema.FieldType.DOUBLE) + .addNullableField("wr_net_loss", Schema.FieldType.DOUBLE) + .build(); + + private static final Schema webSalesSchema = + Schema.builder() + .addNullableField("ws_sold_date_sk", Schema.FieldType.INT32) + .addNullableField("ws_sold_time_sk", Schema.FieldType.INT32) + .addNullableField("ws_ship_date_sk", Schema.FieldType.INT32) + .addField("ws_item_sk", Schema.FieldType.INT32) + .addNullableField("ws_bill_customer_sk", Schema.FieldType.INT32) + .addNullableField("ws_bill_cdemo_sk", Schema.FieldType.INT32) + .addNullableField("ws_bill_hdemo_sk", Schema.FieldType.INT32) + .addNullableField("ws_bill_addr_sk", Schema.FieldType.INT32) + .addNullableField("ws_ship_customer_sk", Schema.FieldType.INT32) + .addNullableField("ws_ship_cdemo_sk", Schema.FieldType.INT32) + .addNullableField("ws_ship_hdemo_sk", Schema.FieldType.INT32) + .addNullableField("ws_ship_addr_sk", Schema.FieldType.INT32) + .addNullableField("ws_web_page_sk", Schema.FieldType.INT32) + .addNullableField("ws_web_site_sk", Schema.FieldType.INT32) + .addNullableField("ws_ship_mode_sk", Schema.FieldType.INT32) + .addNullableField("ws_warehouse_sk", Schema.FieldType.INT32) + .addNullableField("ws_promo_sk", Schema.FieldType.INT32) + .addField("ws_order_number", Schema.FieldType.INT64) + .addNullableField("ws_quantity", 
Schema.FieldType.INT32) + .addNullableField("ws_wholesale_cost", Schema.FieldType.DOUBLE) + .addNullableField("ws_list_price", Schema.FieldType.DOUBLE) + .addNullableField("ws_sales_price", Schema.FieldType.DOUBLE) + .addNullableField("ws_ext_discount_amt", Schema.FieldType.DOUBLE) + .addNullableField("ws_ext_sales_price", Schema.FieldType.DOUBLE) + .addNullableField("ws_ext_wholesale_cost", Schema.FieldType.DOUBLE) + .addNullableField("ws_ext_list_price", Schema.FieldType.DOUBLE) + .addNullableField("ws_ext_tax", Schema.FieldType.DOUBLE) + .addNullableField("ws_coupon_amt", Schema.FieldType.DOUBLE) + .addNullableField("ws_ext_ship_cost", Schema.FieldType.DOUBLE) + .addNullableField("ws_net_paid", Schema.FieldType.DOUBLE) + .addNullableField("ws_net_paid_inc_tax", Schema.FieldType.DOUBLE) + .addNullableField("ws_net_paid_inc_ship", Schema.FieldType.DOUBLE) + .addNullableField("ws_net_paid_inc_ship_tax", Schema.FieldType.DOUBLE) + .addNullableField("ws_net_profit", Schema.FieldType.DOUBLE) + .build(); + + private static final Schema webSiteSchema = + Schema.builder() + .addField("web_site_sk", Schema.FieldType.STRING) + .addField("web_site_id", Schema.FieldType.STRING) + .addNullableField("web_rec_start_date", Schema.FieldType.STRING) + .addNullableField("web_rec_end_date", Schema.FieldType.STRING) + .addNullableField("web_name", Schema.FieldType.STRING) + .addNullableField("web_open_date_sk", Schema.FieldType.INT32) + .addNullableField("web_close_date_sk", Schema.FieldType.INT32) + .addNullableField("web_class", Schema.FieldType.STRING) + .addNullableField("web_manager", Schema.FieldType.STRING) + .addNullableField("web_mkt_id", Schema.FieldType.INT32) + .addNullableField("web_mkt_class", Schema.FieldType.STRING) + .addNullableField("web_mkt_desc", Schema.FieldType.STRING) + .addNullableField("web_market_manager", Schema.FieldType.STRING) + .addNullableField("web_company_id", Schema.FieldType.INT32) + .addNullableField("web_company_name", Schema.FieldType.STRING) + .addNullableField("web_street_number", Schema.FieldType.STRING) + .addNullableField("web_street_name", Schema.FieldType.STRING) + .addNullableField("web_street_type", Schema.FieldType.STRING) + .addNullableField("web_suite_number", Schema.FieldType.STRING) + .addNullableField("web_city", Schema.FieldType.STRING) + .addNullableField("web_county", Schema.FieldType.STRING) + .addNullableField("web_state", Schema.FieldType.STRING) + .addNullableField("web_zip", Schema.FieldType.STRING) + .addNullableField("web_country", Schema.FieldType.STRING) + .addNullableField("web_gmt_offset", Schema.FieldType.DOUBLE) + .addNullableField("web_tax_percentage", Schema.FieldType.DOUBLE) + .build(); } diff --git a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TpcdsUtils.java b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TpcdsUtils.java new file mode 100644 index 000000000000..8ef0f4723731 --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/TpcdsUtils.java @@ -0,0 +1,30 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.tpcds; + +/** Odd's 'n Ends used throughout queries and driver. */ +public class TpcdsUtils { + + /** Possible sources for events. */ + public enum SourceType { + /** Read events from CSV files. */ + CSV, + /** Read events from Parquet files. */ + PARQUET + } +} diff --git a/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/package-info.java b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/package-info.java new file mode 100644 index 000000000000..8c2f339270d3 --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/java/org/apache/beam/sdk/tpcds/package-info.java @@ -0,0 +1,19 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +/** TPC-DS test suite. */ +package org.apache.beam.sdk.tpcds; diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query1.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query1.sql index 3cdf4ca20b56..0e12b2799f42 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query1.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query1.sql @@ -17,7 +17,7 @@ with customer_total_return as (select sr_customer_sk as ctr_customer_sk ,sr_store_sk as ctr_store_sk -,sum(SR_FEE) as ctr_total_return +,sum(sr_fee) as ctr_total_return from store_returns ,date_dim where sr_returned_date_sk = d_date_sk diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query12.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query12.sql index c015bff87ddd..a435044a3ca2 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query12.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query12.sql @@ -14,34 +14,34 @@ -- See the License for the specific language governing permissions and -- limitations under the License. 
-select i_item_id - ,i_item_desc - ,i_category - ,i_class +select i_item_desc + ,i_category + ,i_class ,i_current_price - ,sum(ws_ext_sales_price) as itemrevenue + ,i_item_id + ,sum(ws_ext_sales_price) as itemrevenue ,sum(ws_ext_sales_price)*100/sum(sum(ws_ext_sales_price)) over (partition by i_class) as revenueratio -from +from web_sales - ,item + ,item ,date_dim -where - ws_item_sk = i_item_sk +where + ws_item_sk = i_item_sk and i_category in ('Jewelry', 'Sports', 'Books') and ws_sold_date_sk = d_date_sk - and d_date between cast('2001-01-12' as date) - and (cast('2001-01-12' as date) + 30 days) -group by + and d_date between cast('2001-01-12' as date) + and (cast('2001-01-12' as date) + interval '30' day) +group by i_item_id - ,i_item_desc + ,i_item_desc ,i_category ,i_class ,i_current_price -order by +order by i_category ,i_class ,i_item_id ,i_item_desc ,revenueratio -limit 100 +limit 100 \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query14.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query14.sql deleted file mode 100644 index 8d9de3cd5a0e..000000000000 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query14.sql +++ /dev/null @@ -1,223 +0,0 @@ --- Licensed to the Apache Software Foundation (ASF) under one --- or more contributor license agreements. See the NOTICE file --- distributed with this work for additional information --- regarding copyright ownership. The ASF licenses this file --- to you under the Apache License, Version 2.0 (the --- "License"); you may not use this file except in compliance --- with the License. You may obtain a copy of the License at --- --- http://www.apache.org/licenses/LICENSE-2.0 --- --- Unless required by applicable law or agreed to in writing, software --- distributed under the License is distributed on an "AS IS" BASIS, --- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. --- See the License for the specific language governing permissions and --- limitations under the License. 
- -with cross_items as - (select i_item_sk ss_item_sk - from item, - (select iss.i_brand_id brand_id - ,iss.i_class_id class_id - ,iss.i_category_id category_id - from store_sales - ,item iss - ,date_dim d1 - where ss_item_sk = iss.i_item_sk - and ss_sold_date_sk = d1.d_date_sk - and d1.d_year between 1998 AND 1998 + 2 - intersect - select ics.i_brand_id - ,ics.i_class_id - ,ics.i_category_id - from catalog_sales - ,item ics - ,date_dim d2 - where cs_item_sk = ics.i_item_sk - and cs_sold_date_sk = d2.d_date_sk - and d2.d_year between 1998 AND 1998 + 2 - intersect - select iws.i_brand_id - ,iws.i_class_id - ,iws.i_category_id - from web_sales - ,item iws - ,date_dim d3 - where ws_item_sk = iws.i_item_sk - and ws_sold_date_sk = d3.d_date_sk - and d3.d_year between 1998 AND 1998 + 2) - where i_brand_id = brand_id - and i_class_id = class_id - and i_category_id = category_id -), - avg_sales as - (select avg(quantity*list_price) average_sales - from (select ss_quantity quantity - ,ss_list_price list_price - from store_sales - ,date_dim - where ss_sold_date_sk = d_date_sk - and d_year between 1998 and 1998 + 2 - union all - select cs_quantity quantity - ,cs_list_price list_price - from catalog_sales - ,date_dim - where cs_sold_date_sk = d_date_sk - and d_year between 1998 and 1998 + 2 - union all - select ws_quantity quantity - ,ws_list_price list_price - from web_sales - ,date_dim - where ws_sold_date_sk = d_date_sk - and d_year between 1998 and 1998 + 2) x) - select channel, i_brand_id,i_class_id,i_category_id,sum(sales), sum(number_sales) - from( - select 'store' channel, i_brand_id,i_class_id - ,i_category_id,sum(ss_quantity*ss_list_price) sales - , count(*) number_sales - from store_sales - ,item - ,date_dim - where ss_item_sk in (select ss_item_sk from cross_items) - and ss_item_sk = i_item_sk - and ss_sold_date_sk = d_date_sk - and d_year = 1998+2 - and d_moy = 11 - group by i_brand_id,i_class_id,i_category_id - having sum(ss_quantity*ss_list_price) > (select average_sales from avg_sales) - union all - select 'catalog' channel, i_brand_id,i_class_id,i_category_id, sum(cs_quantity*cs_list_price) sales, count(*) number_sales - from catalog_sales - ,item - ,date_dim - where cs_item_sk in (select ss_item_sk from cross_items) - and cs_item_sk = i_item_sk - and cs_sold_date_sk = d_date_sk - and d_year = 1998+2 - and d_moy = 11 - group by i_brand_id,i_class_id,i_category_id - having sum(cs_quantity*cs_list_price) > (select average_sales from avg_sales) - union all - select 'web' channel, i_brand_id,i_class_id,i_category_id, sum(ws_quantity*ws_list_price) sales , count(*) number_sales - from web_sales - ,item - ,date_dim - where ws_item_sk in (select ss_item_sk from cross_items) - and ws_item_sk = i_item_sk - and ws_sold_date_sk = d_date_sk - and d_year = 1998+2 - and d_moy = 11 - group by i_brand_id,i_class_id,i_category_id - having sum(ws_quantity*ws_list_price) > (select average_sales from avg_sales) - ) y - group by rollup (channel, i_brand_id,i_class_id,i_category_id) - order by channel,i_brand_id,i_class_id,i_category_id - limit 100; -with cross_items as - (select i_item_sk ss_item_sk - from item, - (select iss.i_brand_id brand_id - ,iss.i_class_id class_id - ,iss.i_category_id category_id - from store_sales - ,item iss - ,date_dim d1 - where ss_item_sk = iss.i_item_sk - and ss_sold_date_sk = d1.d_date_sk - and d1.d_year between 1998 AND 1998 + 2 - intersect - select ics.i_brand_id - ,ics.i_class_id - ,ics.i_category_id - from catalog_sales - ,item ics - ,date_dim d2 - where cs_item_sk = 
ics.i_item_sk - and cs_sold_date_sk = d2.d_date_sk - and d2.d_year between 1998 AND 1998 + 2 - intersect - select iws.i_brand_id - ,iws.i_class_id - ,iws.i_category_id - from web_sales - ,item iws - ,date_dim d3 - where ws_item_sk = iws.i_item_sk - and ws_sold_date_sk = d3.d_date_sk - and d3.d_year between 1998 AND 1998 + 2) x - where i_brand_id = brand_id - and i_class_id = class_id - and i_category_id = category_id -), - avg_sales as -(select avg(quantity*list_price) average_sales - from (select ss_quantity quantity - ,ss_list_price list_price - from store_sales - ,date_dim - where ss_sold_date_sk = d_date_sk - and d_year between 1998 and 1998 + 2 - union all - select cs_quantity quantity - ,cs_list_price list_price - from catalog_sales - ,date_dim - where cs_sold_date_sk = d_date_sk - and d_year between 1998 and 1998 + 2 - union all - select ws_quantity quantity - ,ws_list_price list_price - from web_sales - ,date_dim - where ws_sold_date_sk = d_date_sk - and d_year between 1998 and 1998 + 2) x) - select this_year.channel ty_channel - ,this_year.i_brand_id ty_brand - ,this_year.i_class_id ty_class - ,this_year.i_category_id ty_category - ,this_year.sales ty_sales - ,this_year.number_sales ty_number_sales - ,last_year.channel ly_channel - ,last_year.i_brand_id ly_brand - ,last_year.i_class_id ly_class - ,last_year.i_category_id ly_category - ,last_year.sales ly_sales - ,last_year.number_sales ly_number_sales - from - (select 'store' channel, i_brand_id,i_class_id,i_category_id - ,sum(ss_quantity*ss_list_price) sales, count(*) number_sales - from store_sales - ,item - ,date_dim - where ss_item_sk in (select ss_item_sk from cross_items) - and ss_item_sk = i_item_sk - and ss_sold_date_sk = d_date_sk - and d_week_seq = (select d_week_seq - from date_dim - where d_year = 1998 + 1 - and d_moy = 12 - and d_dom = 16) - group by i_brand_id,i_class_id,i_category_id - having sum(ss_quantity*ss_list_price) > (select average_sales from avg_sales)) this_year, - (select 'store' channel, i_brand_id,i_class_id - ,i_category_id, sum(ss_quantity*ss_list_price) sales, count(*) number_sales - from store_sales - ,item - ,date_dim - where ss_item_sk in (select ss_item_sk from cross_items) - and ss_item_sk = i_item_sk - and ss_sold_date_sk = d_date_sk - and d_week_seq = (select d_week_seq - from date_dim - where d_year = 1998 - and d_moy = 12 - and d_dom = 16) - group by i_brand_id,i_class_id,i_category_id - having sum(ss_quantity*ss_list_price) > (select average_sales from avg_sales)) last_year - where this_year.i_brand_id= last_year.i_brand_id - and this_year.i_class_id = last_year.i_class_id - and this_year.i_category_id = last_year.i_category_id - order by this_year.channel, this_year.i_brand_id, this_year.i_class_id, this_year.i_category_id - limit 100 diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query14a.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query14a.sql new file mode 100644 index 000000000000..cb2729f18894 --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query14a.sql @@ -0,0 +1,95 @@ +-- Licensed to the Apache Software Foundation (ASF) under one +-- or more contributor license agreements. See the NOTICE file +-- distributed with this work for additional information +-- regarding copyright ownership. The ASF licenses this file +-- to you under the Apache License, Version 2.0 (the +-- "License"); you may not use this file except in compliance +-- with the License. 
You may obtain a copy of the License at +-- +-- http://www.apache.org/licenses/LICENSE-2.0 +-- +-- Unless required by applicable law or agreed to in writing, software +-- distributed under the License is distributed on an "AS IS" BASIS, +-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +-- See the License for the specific language governing permissions and +-- limitations under the License. + +with cross_items as + (select i_item_sk ss_item_sk + from item, + (select iss.i_brand_id brand_id, iss.i_class_id class_id, iss.i_category_id category_id + from store_sales, item iss, date_dim d1 + where ss_item_sk = iss.i_item_sk + and ss_sold_date_sk = d1.d_date_sk + and d1.d_year between 1999 AND 1999 + 2 + intersect + select ics.i_brand_id, ics.i_class_id, ics.i_category_id + from catalog_sales, item ics, date_dim d2 + where cs_item_sk = ics.i_item_sk + and cs_sold_date_sk = d2.d_date_sk + and d2.d_year between 1999 AND 1999 + 2 + intersect + select iws.i_brand_id, iws.i_class_id, iws.i_category_id + from web_sales, item iws, date_dim d3 + where ws_item_sk = iws.i_item_sk + and ws_sold_date_sk = d3.d_date_sk + and d3.d_year between 1999 AND 1999 + 2) x + where i_brand_id = brand_id + and i_class_id = class_id + and i_category_id = category_id +), + avg_sales as + (select avg(quantity*list_price) average_sales + from ( + select ss_quantity quantity, ss_list_price list_price + from store_sales, date_dim + where ss_sold_date_sk = d_date_sk + and d_year between 1999 and 2001 + union all + select cs_quantity quantity, cs_list_price list_price + from catalog_sales, date_dim + where cs_sold_date_sk = d_date_sk + and d_year between 1999 and 1999 + 2 + union all + select ws_quantity quantity, ws_list_price list_price + from web_sales, date_dim + where ws_sold_date_sk = d_date_sk + and d_year between 1999 and 1999 + 2) x) + select channel, i_brand_id,i_class_id,i_category_id,sum(sales), sum(number_sales) + from( + select 'store' channel, i_brand_id,i_class_id + ,i_category_id,sum(ss_quantity*ss_list_price) sales + , count(*) number_sales + from store_sales, item, date_dim + where ss_item_sk in (select ss_item_sk from cross_items) + and ss_item_sk = i_item_sk + and ss_sold_date_sk = d_date_sk + and d_year = 1999+2 + and d_moy = 11 + group by i_brand_id,i_class_id,i_category_id + having sum(ss_quantity*ss_list_price) > (select average_sales from avg_sales) + union all + select 'catalog' channel, i_brand_id,i_class_id,i_category_id, sum(cs_quantity*cs_list_price) sales, count(*) number_sales + from catalog_sales, item, date_dim + where cs_item_sk in (select ss_item_sk from cross_items) + and cs_item_sk = i_item_sk + and cs_sold_date_sk = d_date_sk + and d_year = 1999+2 + and d_moy = 11 + group by i_brand_id,i_class_id,i_category_id + having sum(cs_quantity*cs_list_price) > (select average_sales from avg_sales) + union all + select 'web' channel, i_brand_id,i_class_id,i_category_id, sum(ws_quantity*ws_list_price) sales , count(*) number_sales + from web_sales, item, date_dim + where ws_item_sk in (select ss_item_sk from cross_items) + and ws_item_sk = i_item_sk + and ws_sold_date_sk = d_date_sk + and d_year = 1999+2 + and d_moy = 11 + group by i_brand_id,i_class_id,i_category_id + having sum(ws_quantity*ws_list_price) > (select average_sales from avg_sales) + ) y + group by rollup (channel, i_brand_id,i_class_id,i_category_id) + order by channel,i_brand_id,i_class_id,i_category_id + limit 100 + diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query14b.sql 
b/sdks/java/testing/tpcds/src/main/resources/queries/query14b.sql new file mode 100644 index 000000000000..060e0e87df1b --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query14b.sql @@ -0,0 +1,110 @@ +-- Licensed to the Apache Software Foundation (ASF) under one +-- or more contributor license agreements. See the NOTICE file +-- distributed with this work for additional information +-- regarding copyright ownership. The ASF licenses this file +-- to you under the Apache License, Version 2.0 (the +-- "License"); you may not use this file except in compliance +-- with the License. You may obtain a copy of the License at +-- +-- http://www.apache.org/licenses/LICENSE-2.0 +-- +-- Unless required by applicable law or agreed to in writing, software +-- distributed under the License is distributed on an "AS IS" BASIS, +-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +-- See the License for the specific language governing permissions and +-- limitations under the License. + +with cross_items as + (select i_item_sk ss_item_sk + from item, + (select iss.i_brand_id brand_id + ,iss.i_class_id class_id + ,iss.i_category_id category_id + from store_sales + ,item iss + ,date_dim d1 + where ss_item_sk = iss.i_item_sk + and ss_sold_date_sk = d1.d_date_sk + and d1.d_year between 1999 AND 1999 + 2 + intersect + select ics.i_brand_id + ,ics.i_class_id + ,ics.i_category_id + from catalog_sales + ,item ics + ,date_dim d2 + where cs_item_sk = ics.i_item_sk + and cs_sold_date_sk = d2.d_date_sk + and d2.d_year between 1999 AND 1999 + 2 + intersect + select iws.i_brand_id + ,iws.i_class_id + ,iws.i_category_id + from web_sales + ,item iws + ,date_dim d3 + where ws_item_sk = iws.i_item_sk + and ws_sold_date_sk = d3.d_date_sk + and d3.d_year between 1999 AND 1999 + 2) x + where i_brand_id = brand_id + and i_class_id = class_id + and i_category_id = category_id +), + avg_sales as +(select avg(quantity*list_price) average_sales + from (select ss_quantity quantity + ,ss_list_price list_price + from store_sales + ,date_dim + where ss_sold_date_sk = d_date_sk + and d_year between 1998 and 1998 + 2 + union all + select cs_quantity quantity + ,cs_list_price list_price + from catalog_sales + ,date_dim + where cs_sold_date_sk = d_date_sk + and d_year between 1998 and 1998 + 2 + union all + select ws_quantity quantity + ,ws_list_price list_price + from web_sales + ,date_dim + where ws_sold_date_sk = d_date_sk + and d_year between 1998 and 1998 + 2) x) + select * from + (select 'store' channel, i_brand_id,i_class_id,i_category_id + ,sum(ss_quantity*ss_list_price) sales, count(*) number_sales + from store_sales + ,item + ,date_dim + where ss_item_sk in (select ss_item_sk from cross_items) + and ss_item_sk = i_item_sk + and ss_sold_date_sk = d_date_sk + and d_week_seq = (select d_week_seq + from date_dim + where d_year = 1998 + 1 + and d_moy = 12 + and d_dom = 16) + group by i_brand_id,i_class_id,i_category_id + having sum(ss_quantity*ss_list_price) > (select average_sales from avg_sales)) this_year, + (select 'store' channel, i_brand_id,i_class_id + ,i_category_id, sum(ss_quantity*ss_list_price) sales, count(*) number_sales + from store_sales + ,item + ,date_dim + where ss_item_sk in (select ss_item_sk from cross_items) + and ss_item_sk = i_item_sk + and ss_sold_date_sk = d_date_sk + and d_week_seq = (select d_week_seq + from date_dim + where d_year = 1998 + and d_moy = 12 + and d_dom = 16) + group by i_brand_id,i_class_id,i_category_id + having sum(ss_quantity*ss_list_price) > 
(select average_sales from avg_sales)) last_year + where this_year.i_brand_id= last_year.i_brand_id + and this_year.i_class_id = last_year.i_class_id + and this_year.i_category_id = last_year.i_category_id + order by this_year.channel, this_year.i_brand_id, this_year.i_class_id, this_year.i_category_id + limit 100 \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query15.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query15.sql index 1ae0c374a8b8..ca65cdca5613 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query15.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query15.sql @@ -22,7 +22,7 @@ select ca_zip ,date_dim where cs_bill_customer_sk = c_customer_sk and c_current_addr_sk = ca_address_sk - and ( substr(ca_zip,1,5) in ('85669', '86197','88274','83405','86475', + and ( substring(ca_zip,1,5) in ('85669', '86197','88274','83405','86475', '85392', '85460', '80348', '81792') or ca_state in ('CA','WA','GA') or cs_sales_price > 500) diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query16.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query16.sql index 54b7164fdbca..54e866fa0f8f 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query16.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query16.sql @@ -15,23 +15,23 @@ -- limitations under the License. select - count(distinct cs_order_number) as "order count" - ,sum(cs_ext_ship_cost) as "total shipping cost" - ,sum(cs_net_profit) as "total net profit" + count(distinct cs_order_number) as `order count` + ,sum(cs_ext_ship_cost) as `total shipping cost` + ,sum(cs_net_profit) as `total net profit` from catalog_sales cs1 ,date_dim ,customer_address ,call_center where - d_date between '1999-2-01' and - (cast('1999-2-01' as date) + 60 days) + d_date between '2001-4-01' and + (cast('2001-4-01' as date) + interval '60' day) and cs1.cs_ship_date_sk = d_date_sk and cs1.cs_ship_addr_sk = ca_address_sk -and ca_state = 'IL' +and ca_state = 'NY' and cs1.cs_call_center_sk = cc_call_center_sk -and cc_county in ('Williamson County','Williamson County','Williamson County','Williamson County', - 'Williamson County' +and cc_county in ('Ziebach County','Levy County','Huron County','Franklin Parish', + 'Daviess County' ) and exists (select * from catalog_sales cs2 @@ -41,4 +41,4 @@ and not exists(select * from catalog_returns cr1 where cs1.cs_order_number = cr1.cr_order_number) order by count(distinct cs_order_number) -limit 100 +limit 100 \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query19.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query19.sql index 7a85a1065d42..9b223ec2e774 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query19.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query19.sql @@ -24,7 +24,7 @@ select i_brand_id brand_id, i_brand brand, i_manufact_id, i_manufact, and d_year=1999 and ss_customer_sk = c_customer_sk and c_current_addr_sk = ca_address_sk - and substr(ca_zip,1,5) <> substr(s_zip,1,5) + and substring(ca_zip,1,5) <> substring(s_zip,1,5) and ss_store_sk = s_store_sk group by i_brand ,i_brand_id diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query20.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query20.sql index 95e960b18016..df0d2b79fcac 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query20.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query20.sql @@ -14,24 +14,23 
@@ -- See the License for the specific language governing permissions and -- limitations under the License. -select i_item_id - ,i_item_desc - ,i_category - ,i_class +select i_item_desc + ,i_category + ,i_class ,i_current_price - ,sum(cs_ext_sales_price) as itemrevenue + ,sum(cs_ext_sales_price) as itemrevenue ,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over (partition by i_class) as revenueratio - from catalog_sales - ,item + from catalog_sales + ,item ,date_dim - where cs_item_sk = i_item_sk + where cs_item_sk = i_item_sk and i_category in ('Jewelry', 'Sports', 'Books') and cs_sold_date_sk = d_date_sk - and d_date between cast('2001-01-12' as date) - and (cast('2001-01-12' as date) + 30 days) + and d_date between cast('2001-01-12' as date) + and (cast('2001-01-12' as date) + interval '30' day) group by i_item_id - ,i_item_desc + ,i_item_desc ,i_category ,i_class ,i_current_price @@ -40,4 +39,4 @@ select i_item_id ,i_item_id ,i_item_desc ,revenueratio -limit 100 +limit 100 \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query21.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query21.sql index 3ba811b6a76e..412a6cd761ea 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query21.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query21.sql @@ -18,10 +18,10 @@ select * from(select w_warehouse_name ,i_item_id ,sum(case when (cast(d_date as date) < cast ('1998-04-08' as date)) - then inv_quantity_on_hand + then inv_quantity_on_hand else 0 end) as inv_before ,sum(case when (cast(d_date as date) >= cast ('1998-04-08' as date)) - then inv_quantity_on_hand + then inv_quantity_on_hand else 0 end) as inv_after from inventory ,warehouse @@ -31,13 +31,13 @@ select * and i_item_sk = inv_item_sk and inv_warehouse_sk = w_warehouse_sk and inv_date_sk = d_date_sk - and d_date between (cast ('1998-04-08' as date) - 30 days) - and (cast ('1998-04-08' as date) + 30 days) + and d_date between (cast ('1998-04-08' as date) - interval '30' day) + and (cast ('1998-04-08' as date) + interval '30' day) group by w_warehouse_name, i_item_id) x - where (case when inv_before > 0 - then inv_after / inv_before + where (case when inv_before > 0 + then inv_after / inv_before else null end) between 2.0/3.0 and 3.0/2.0 order by w_warehouse_name ,i_item_id - limit 100 + limit 100 \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query23.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query23.sql deleted file mode 100644 index 0ee1dab637d9..000000000000 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query23.sql +++ /dev/null @@ -1,120 +0,0 @@ --- Licensed to the Apache Software Foundation (ASF) under one --- or more contributor license agreements. See the NOTICE file --- distributed with this work for additional information --- regarding copyright ownership. The ASF licenses this file --- to you under the Apache License, Version 2.0 (the --- "License"); you may not use this file except in compliance --- with the License. You may obtain a copy of the License at --- --- http://www.apache.org/licenses/LICENSE-2.0 --- --- Unless required by applicable law or agreed to in writing, software --- distributed under the License is distributed on an "AS IS" BASIS, --- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. --- See the License for the specific language governing permissions and --- limitations under the License. 
- -with frequent_ss_items as - (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date solddate,count(*) cnt - from store_sales - ,date_dim - ,item - where ss_sold_date_sk = d_date_sk - and ss_item_sk = i_item_sk - and d_year in (1999,1999+1,1999+2,1999+3) - group by substr(i_item_desc,1,30),i_item_sk,d_date - having count(*) >4), - max_store_sales as - (select max(csales) tpcds_cmax - from (select c_customer_sk,sum(ss_quantity*ss_sales_price) csales - from store_sales - ,customer - ,date_dim - where ss_customer_sk = c_customer_sk - and ss_sold_date_sk = d_date_sk - and d_year in (1999,1999+1,1999+2,1999+3) - group by c_customer_sk)), - best_ss_customer as - (select c_customer_sk,sum(ss_quantity*ss_sales_price) ssales - from store_sales - ,customer - where ss_customer_sk = c_customer_sk - group by c_customer_sk - having sum(ss_quantity*ss_sales_price) > (95/100.0) * (select - * -from - max_store_sales)) - select sum(sales) - from (select cs_quantity*cs_list_price sales - from catalog_sales - ,date_dim - where d_year = 1999 - and d_moy = 1 - and cs_sold_date_sk = d_date_sk - and cs_item_sk in (select item_sk from frequent_ss_items) - and cs_bill_customer_sk in (select c_customer_sk from best_ss_customer) - union all - select ws_quantity*ws_list_price sales - from web_sales - ,date_dim - where d_year = 1999 - and d_moy = 1 - and ws_sold_date_sk = d_date_sk - and ws_item_sk in (select item_sk from frequent_ss_items) - and ws_bill_customer_sk in (select c_customer_sk from best_ss_customer)) - limit 100; -with frequent_ss_items as - (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date solddate,count(*) cnt - from store_sales - ,date_dim - ,item - where ss_sold_date_sk = d_date_sk - and ss_item_sk = i_item_sk - and d_year in (1999,1999 + 1,1999 + 2,1999 + 3) - group by substr(i_item_desc,1,30),i_item_sk,d_date - having count(*) >4), - max_store_sales as - (select max(csales) tpcds_cmax - from (select c_customer_sk,sum(ss_quantity*ss_sales_price) csales - from store_sales - ,customer - ,date_dim - where ss_customer_sk = c_customer_sk - and ss_sold_date_sk = d_date_sk - and d_year in (1999,1999+1,1999+2,1999+3) - group by c_customer_sk)), - best_ss_customer as - (select c_customer_sk,sum(ss_quantity*ss_sales_price) ssales - from store_sales - ,customer - where ss_customer_sk = c_customer_sk - group by c_customer_sk - having sum(ss_quantity*ss_sales_price) > (95/100.0) * (select - * - from max_store_sales)) - select c_last_name,c_first_name,sales - from (select c_last_name,c_first_name,sum(cs_quantity*cs_list_price) sales - from catalog_sales - ,customer - ,date_dim - where d_year = 1999 - and d_moy = 1 - and cs_sold_date_sk = d_date_sk - and cs_item_sk in (select item_sk from frequent_ss_items) - and cs_bill_customer_sk in (select c_customer_sk from best_ss_customer) - and cs_bill_customer_sk = c_customer_sk - group by c_last_name,c_first_name - union all - select c_last_name,c_first_name,sum(ws_quantity*ws_list_price) sales - from web_sales - ,customer - ,date_dim - where d_year = 1999 - and d_moy = 1 - and ws_sold_date_sk = d_date_sk - and ws_item_sk in (select item_sk from frequent_ss_items) - and ws_bill_customer_sk in (select c_customer_sk from best_ss_customer) - and ws_bill_customer_sk = c_customer_sk - group by c_last_name,c_first_name) - order by c_last_name,c_first_name,sales - limit 100 diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query23a.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query23a.sql new file mode 100644 index 
000000000000..bf200faaaab4 --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query23a.sql @@ -0,0 +1,65 @@ +-- Licensed to the Apache Software Foundation (ASF) under one +-- or more contributor license agreements. See the NOTICE file +-- distributed with this work for additional information +-- regarding copyright ownership. The ASF licenses this file +-- to you under the Apache License, Version 2.0 (the +-- "License"); you may not use this file except in compliance +-- with the License. You may obtain a copy of the License at +-- +-- http://www.apache.org/licenses/LICENSE-2.0 +-- +-- Unless required by applicable law or agreed to in writing, software +-- distributed under the License is distributed on an "AS IS" BASIS, +-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +-- See the License for the specific language governing permissions and +-- limitations under the License. + +with frequent_ss_items as + (select substring(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date solddate,count(*) cnt + from store_sales + ,date_dim + ,item + where ss_sold_date_sk = d_date_sk + and ss_item_sk = i_item_sk + and d_year in (1999,1999+1,1999+2,1999+3) + group by substring(i_item_desc,1,30),i_item_sk,d_date + having count(*) >4), + max_store_sales as + (select max(csales) tpcds_cmax + from (select c_customer_sk,sum(ss_quantity*ss_sales_price) csales + from store_sales + ,customer + ,date_dim + where ss_customer_sk = c_customer_sk + and ss_sold_date_sk = d_date_sk + and d_year in (1999,1999+1,1999+2,1999+3) + group by c_customer_sk) x), + best_ss_customer as + (select c_customer_sk,sum(ss_quantity*ss_sales_price) ssales + from store_sales + ,customer + where ss_customer_sk = c_customer_sk + group by c_customer_sk + having sum(ss_quantity*ss_sales_price) > (95/100.0) * (select + * +from + max_store_sales)) + select sum(sales) + from ((select cs_quantity*cs_list_price sales + from catalog_sales + ,date_dim + where d_year = 1999 + and d_moy = 1 + and cs_sold_date_sk = d_date_sk + and cs_item_sk in (select item_sk from frequent_ss_items) + and cs_bill_customer_sk in (select c_customer_sk from best_ss_customer)) + union all + (select ws_quantity*ws_list_price sales + from web_sales + ,date_dim + where d_year = 1999 + and d_moy = 1 + and ws_sold_date_sk = d_date_sk + and ws_item_sk in (select item_sk from frequent_ss_items) + and ws_bill_customer_sk in (select c_customer_sk from best_ss_customer))) y + limit 100 \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query23b.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query23b.sql new file mode 100644 index 000000000000..23a65398eac7 --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query23b.sql @@ -0,0 +1,71 @@ +-- Licensed to the Apache Software Foundation (ASF) under one +-- or more contributor license agreements. See the NOTICE file +-- distributed with this work for additional information +-- regarding copyright ownership. The ASF licenses this file +-- to you under the Apache License, Version 2.0 (the +-- "License"); you may not use this file except in compliance +-- with the License. You may obtain a copy of the License at +-- +-- http://www.apache.org/licenses/LICENSE-2.0 +-- +-- Unless required by applicable law or agreed to in writing, software +-- distributed under the License is distributed on an "AS IS" BASIS, +-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+-- See the License for the specific language governing permissions and +-- limitations under the License. + +with frequent_ss_items as + (select substring(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date solddate,count(*) cnt + from store_sales + ,date_dim + ,item + where ss_sold_date_sk = d_date_sk + and ss_item_sk = i_item_sk + and d_year in (1999,1999 + 1,1999 + 2,1999 + 3) + group by substring(i_item_desc,1,30),i_item_sk,d_date + having count(*) >4), + max_store_sales as + (select max(csales) tpcds_cmax + from (select c_customer_sk,sum(ss_quantity*ss_sales_price) csales + from store_sales + ,customer + ,date_dim + where ss_customer_sk = c_customer_sk + and ss_sold_date_sk = d_date_sk + and d_year in (1999,1999+1,1999+2,1999+3) + group by c_customer_sk) x), + best_ss_customer as + (select c_customer_sk,sum(ss_quantity*ss_sales_price) ssales + from store_sales + ,customer + where ss_customer_sk = c_customer_sk + group by c_customer_sk + having sum(ss_quantity*ss_sales_price) > (95/100.0) * (select + * + from max_store_sales)) + select c_last_name,c_first_name,sales + from ((select c_last_name,c_first_name,sum(cs_quantity*cs_list_price) sales + from catalog_sales + ,customer + ,date_dim + where d_year = 1999 + and d_moy = 1 + and cs_sold_date_sk = d_date_sk + and cs_item_sk in (select item_sk from frequent_ss_items) + and cs_bill_customer_sk in (select c_customer_sk from best_ss_customer) + and cs_bill_customer_sk = c_customer_sk + group by c_last_name,c_first_name) + union all + (select c_last_name,c_first_name,sum(ws_quantity*ws_list_price) sales + from web_sales + ,customer + ,date_dim + where d_year = 1999 + and d_moy = 1 + and ws_sold_date_sk = d_date_sk + and ws_item_sk in (select item_sk from frequent_ss_items) + and ws_bill_customer_sk in (select c_customer_sk from best_ss_customer) + and ws_bill_customer_sk = c_customer_sk + group by c_last_name,c_first_name)) y + order by c_last_name,c_first_name,sales + limit 100 \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query24.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query24.sql deleted file mode 100644 index 3f45c4f0c3c6..000000000000 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query24.sql +++ /dev/null @@ -1,119 +0,0 @@ --- Licensed to the Apache Software Foundation (ASF) under one --- or more contributor license agreements. See the NOTICE file --- distributed with this work for additional information --- regarding copyright ownership. The ASF licenses this file --- to you under the Apache License, Version 2.0 (the --- "License"); you may not use this file except in compliance --- with the License. You may obtain a copy of the License at --- --- http://www.apache.org/licenses/LICENSE-2.0 --- --- Unless required by applicable law or agreed to in writing, software --- distributed under the License is distributed on an "AS IS" BASIS, --- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. --- See the License for the specific language governing permissions and --- limitations under the License. 
- -with ssales as -(select c_last_name - ,c_first_name - ,s_store_name - ,ca_state - ,s_state - ,i_color - ,i_current_price - ,i_manager_id - ,i_units - ,i_size - ,sum(ss_sales_price) netpaid -from store_sales - ,store_returns - ,store - ,item - ,customer - ,customer_address -where ss_ticket_number = sr_ticket_number - and ss_item_sk = sr_item_sk - and ss_customer_sk = c_customer_sk - and ss_item_sk = i_item_sk - and ss_store_sk = s_store_sk - and c_current_addr_sk = ca_address_sk - and c_birth_country <> upper(ca_country) - and s_zip = ca_zip -and s_market_id=7 -group by c_last_name - ,c_first_name - ,s_store_name - ,ca_state - ,s_state - ,i_color - ,i_current_price - ,i_manager_id - ,i_units - ,i_size) -select c_last_name - ,c_first_name - ,s_store_name - ,sum(netpaid) paid -from ssales -where i_color = 'orchid' -group by c_last_name - ,c_first_name - ,s_store_name -having sum(netpaid) > (select 0.05*avg(netpaid) - from ssales) -order by c_last_name - ,c_first_name - ,s_store_name -; -with ssales as -(select c_last_name - ,c_first_name - ,s_store_name - ,ca_state - ,s_state - ,i_color - ,i_current_price - ,i_manager_id - ,i_units - ,i_size - ,sum(ss_sales_price) netpaid -from store_sales - ,store_returns - ,store - ,item - ,customer - ,customer_address -where ss_ticket_number = sr_ticket_number - and ss_item_sk = sr_item_sk - and ss_customer_sk = c_customer_sk - and ss_item_sk = i_item_sk - and ss_store_sk = s_store_sk - and c_current_addr_sk = ca_address_sk - and c_birth_country <> upper(ca_country) - and s_zip = ca_zip - and s_market_id = 7 -group by c_last_name - ,c_first_name - ,s_store_name - ,ca_state - ,s_state - ,i_color - ,i_current_price - ,i_manager_id - ,i_units - ,i_size) -select c_last_name - ,c_first_name - ,s_store_name - ,sum(netpaid) paid -from ssales -where i_color = 'chiffon' -group by c_last_name - ,c_first_name - ,s_store_name -having sum(netpaid) > (select 0.05*avg(netpaid) - from ssales) -order by c_last_name - ,c_first_name - ,s_store_name diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query24a.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query24a.sql new file mode 100644 index 000000000000..1e55e449b94c --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query24a.sql @@ -0,0 +1,35 @@ +-- Licensed to the Apache Software Foundation (ASF) under one +-- or more contributor license agreements. See the NOTICE file +-- distributed with this work for additional information +-- regarding copyright ownership. The ASF licenses this file +-- to you under the Apache License, Version 2.0 (the +-- "License"); you may not use this file except in compliance +-- with the License. You may obtain a copy of the License at +-- +-- http://www.apache.org/licenses/LICENSE-2.0 +-- +-- Unless required by applicable law or agreed to in writing, software +-- distributed under the License is distributed on an "AS IS" BASIS, +-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +-- See the License for the specific language governing permissions and +-- limitations under the License. 
+ + with ssales as + (select c_last_name, c_first_name, s_store_name, ca_state, s_state, i_color, + i_current_price, i_manager_id, i_units, i_size, sum(ss_net_paid) netpaid + from store_sales, store_returns, store, item, customer, customer_address + where ss_ticket_number = sr_ticket_number + and ss_item_sk = sr_item_sk + and ss_customer_sk = c_customer_sk + and ss_item_sk = i_item_sk + and ss_store_sk = s_store_sk + and c_birth_country = upper(ca_country) + and s_zip = ca_zip + and s_market_id = 8 + group by c_last_name, c_first_name, s_store_name, ca_state, s_state, i_color, + i_current_price, i_manager_id, i_units, i_size) + select c_last_name, c_first_name, s_store_name, sum(netpaid) paid + from ssales + where i_color = 'pale' + group by c_last_name, c_first_name, s_store_name + having sum(netpaid) > (select 0.05*avg(netpaid) from ssales) \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query24b.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query24b.sql new file mode 100644 index 000000000000..2ade5f8b1926 --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query24b.sql @@ -0,0 +1,35 @@ +-- Licensed to the Apache Software Foundation (ASF) under one +-- or more contributor license agreements. See the NOTICE file +-- distributed with this work for additional information +-- regarding copyright ownership. The ASF licenses this file +-- to you under the Apache License, Version 2.0 (the +-- "License"); you may not use this file except in compliance +-- with the License. You may obtain a copy of the License at +-- +-- http://www.apache.org/licenses/LICENSE-2.0 +-- +-- Unless required by applicable law or agreed to in writing, software +-- distributed under the License is distributed on an "AS IS" BASIS, +-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +-- See the License for the specific language governing permissions and +-- limitations under the License. + + with ssales as + (select c_last_name, c_first_name, s_store_name, ca_state, s_state, i_color, + i_current_price, i_manager_id, i_units, i_size, sum(ss_net_paid) netpaid + from store_sales, store_returns, store, item, customer, customer_address + where ss_ticket_number = sr_ticket_number + and ss_item_sk = sr_item_sk + and ss_customer_sk = c_customer_sk + and ss_item_sk = i_item_sk + and ss_store_sk = s_store_sk + and c_birth_country = upper(ca_country) + and s_zip = ca_zip + and s_market_id = 8 + group by c_last_name, c_first_name, s_store_name, ca_state, s_state, + i_color, i_current_price, i_manager_id, i_units, i_size) + select c_last_name, c_first_name, s_store_name, sum(netpaid) paid + from ssales + where i_color = 'chiffon' + group by c_last_name, c_first_name, s_store_name + having sum(netpaid) > (select 0.05*avg(netpaid) from ssales) \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query32.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query32.sql index 70eb50860f43..0828df10de3f 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query32.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query32.sql @@ -14,28 +14,28 @@ -- See the License for the specific language governing permissions and -- limitations under the License. 
-select sum(cs_ext_discount_amt) as "excess discount amount" -from - catalog_sales - ,item +select sum(cs_ext_discount_amt) as `excess discount amount` +from + catalog_sales + ,item ,date_dim where i_manufact_id = 269 -and i_item_sk = cs_item_sk -and d_date between '1998-03-18' and - (cast('1998-03-18' as date) + 90 days) -and d_date_sk = cs_sold_date_sk -and cs_ext_discount_amt - > ( - select - 1.3 * avg(cs_ext_discount_amt) - from - catalog_sales +and i_item_sk = cs_item_sk +and d_date between '1998-03-18' and + (cast('1998-03-18' as date) + interval '90' day) +and d_date_sk = cs_sold_date_sk +and cs_ext_discount_amt + > ( + select + 1.3 * avg(cs_ext_discount_amt) + from + catalog_sales ,date_dim - where - cs_item_sk = i_item_sk + where + cs_item_sk = i_item_sk and d_date between '1998-03-18' and - (cast('1998-03-18' as date) + 90 days) - and d_date_sk = cs_sold_date_sk - ) -limit 100 + (cast('1998-03-18' as date) + interval '90' day) + and d_date_sk = cs_sold_date_sk + ) +limit 100 \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query37.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query37.sql index 24237b7fad3e..57791d4124cc 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query37.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query37.sql @@ -21,7 +21,7 @@ select i_item_id where i_current_price between 22 and 22 + 30 and inv_item_sk = i_item_sk and d_date_sk=inv_date_sk - and d_date between cast('2001-06-02' as date) and (cast('2001-06-02' as date) + 60 days) + and d_date between cast('2001-06-02' as date) and (cast('2001-06-02' as date) + interval '60' day) and i_manufact_id in (678,964,918,849) and inv_quantity_on_hand between 100 and 500 and cs_item_sk = i_item_sk diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query39a.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query39a.sql new file mode 100644 index 000000000000..e458ac292cd4 --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query39a.sql @@ -0,0 +1,40 @@ +-- Licensed to the Apache Software Foundation (ASF) under one +-- or more contributor license agreements. See the NOTICE file +-- distributed with this work for additional information +-- regarding copyright ownership. The ASF licenses this file +-- to you under the Apache License, Version 2.0 (the +-- "License"); you may not use this file except in compliance +-- with the License. You may obtain a copy of the License at +-- +-- http://www.apache.org/licenses/LICENSE-2.0 +-- +-- Unless required by applicable law or agreed to in writing, software +-- distributed under the License is distributed on an "AS IS" BASIS, +-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +-- See the License for the specific language governing permissions and +-- limitations under the License. 
+ +with inv as +(select w_warehouse_name,w_warehouse_sk,i_item_sk,d_moy + ,stdev,mean, case mean when 0 then null else stdev/mean end cov + from(select w_warehouse_name,w_warehouse_sk,i_item_sk,d_moy + ,stddev_samp(inv_quantity_on_hand) stdev,avg(inv_quantity_on_hand) mean + from inventory + ,item + ,warehouse + ,date_dim + where inv_item_sk = i_item_sk + and inv_warehouse_sk = w_warehouse_sk + and inv_date_sk = d_date_sk + and d_year =1999 + group by w_warehouse_name,w_warehouse_sk,i_item_sk,d_moy) foo + where case mean when 0 then 0 else stdev/mean end > 1) +select inv1.w_warehouse_sk,inv1.i_item_sk,inv1.d_moy,inv1.mean, inv1.cov + ,inv2.w_warehouse_sk,inv2.i_item_sk,inv2.d_moy,inv2.mean, inv2.cov +from inv inv1,inv inv2 +where inv1.i_item_sk = inv2.i_item_sk + and inv1.w_warehouse_sk = inv2.w_warehouse_sk + and inv1.d_moy=4 + and inv2.d_moy=4+1 +order by inv1.w_warehouse_sk,inv1.i_item_sk,inv1.d_moy,inv1.mean,inv1.cov + ,inv2.d_moy,inv2.mean, inv2.cov \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query39.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query39b.sql similarity index 61% rename from sdks/java/testing/tpcds/src/main/resources/queries/query39.sql rename to sdks/java/testing/tpcds/src/main/resources/queries/query39b.sql index aaed22a80238..bb0e9fb94f44 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query39.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query39b.sql @@ -26,32 +26,7 @@ with inv as where inv_item_sk = i_item_sk and inv_warehouse_sk = w_warehouse_sk and inv_date_sk = d_date_sk - and d_year =1998 - group by w_warehouse_name,w_warehouse_sk,i_item_sk,d_moy) foo - where case mean when 0 then 0 else stdev/mean end > 1) -select inv1.w_warehouse_sk,inv1.i_item_sk,inv1.d_moy,inv1.mean, inv1.cov - ,inv2.w_warehouse_sk,inv2.i_item_sk,inv2.d_moy,inv2.mean, inv2.cov -from inv inv1,inv inv2 -where inv1.i_item_sk = inv2.i_item_sk - and inv1.w_warehouse_sk = inv2.w_warehouse_sk - and inv1.d_moy=4 - and inv2.d_moy=4+1 -order by inv1.w_warehouse_sk,inv1.i_item_sk,inv1.d_moy,inv1.mean,inv1.cov - ,inv2.d_moy,inv2.mean, inv2.cov -; -with inv as -(select w_warehouse_name,w_warehouse_sk,i_item_sk,d_moy - ,stdev,mean, case mean when 0 then null else stdev/mean end cov - from(select w_warehouse_name,w_warehouse_sk,i_item_sk,d_moy - ,stddev_samp(inv_quantity_on_hand) stdev,avg(inv_quantity_on_hand) mean - from inventory - ,item - ,warehouse - ,date_dim - where inv_item_sk = i_item_sk - and inv_warehouse_sk = w_warehouse_sk - and inv_date_sk = d_date_sk - and d_year =1998 + and d_year =1999 group by w_warehouse_name,w_warehouse_sk,i_item_sk,d_moy) foo where case mean when 0 then 0 else stdev/mean end > 1) select inv1.w_warehouse_sk,inv1.i_item_sk,inv1.d_moy,inv1.mean, inv1.cov @@ -63,4 +38,4 @@ where inv1.i_item_sk = inv2.i_item_sk and inv2.d_moy=4+1 and inv1.cov > 1.5 order by inv1.w_warehouse_sk,inv1.i_item_sk,inv1.d_moy,inv1.mean,inv1.cov - ,inv2.d_moy,inv2.mean, inv2.cov + ,inv2.d_moy,inv2.mean, inv2.cov \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query40.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query40.sql index 41a8cbafdfba..fe28cd38c613 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query40.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query40.sql @@ -17,25 +17,25 @@ select w_state ,i_item_id - ,sum(case when (cast(d_date as date) < cast ('1998-04-08' as date)) + ,sum(case when (cast(d_date as date) < 
cast ('1998-04-08' as date)) then cs_sales_price - coalesce(cr_refunded_cash,0) else 0 end) as sales_before - ,sum(case when (cast(d_date as date) >= cast ('1998-04-08' as date)) + ,sum(case when (cast(d_date as date) >= cast ('1998-04-08' as date)) then cs_sales_price - coalesce(cr_refunded_cash,0) else 0 end) as sales_after from catalog_sales left outer join catalog_returns on - (cs_order_number = cr_order_number + (cs_order_number = cr_order_number and cs_item_sk = cr_item_sk) - ,warehouse + ,warehouse ,item ,date_dim where i_current_price between 0.99 and 1.49 and i_item_sk = cs_item_sk - and cs_warehouse_sk = w_warehouse_sk + and cs_warehouse_sk = w_warehouse_sk and cs_sold_date_sk = d_date_sk - and d_date between (cast ('1998-04-08' as date) - 30 days) - and (cast ('1998-04-08' as date) + 30 days) + and d_date between (cast ('1998-04-08' as date) - interval '30' day) + and (cast ('1998-04-08' as date) + interval '30' day) group by w_state,i_item_id order by w_state,i_item_id -limit 100 +limit 100 \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query45.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query45.sql index 765456af9956..99dcac0be8d9 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query45.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query45.sql @@ -19,7 +19,7 @@ select ca_zip, ca_county, sum(ws_sales_price) where ws_bill_customer_sk = c_customer_sk and c_current_addr_sk = ca_address_sk and ws_item_sk = i_item_sk - and ( substr(ca_zip,1,5) in ('85669', '86197','88274','83405','86475', '85392', '85460', '80348', '81792') + and ( substring(ca_zip,1,5) in ('85669', '86197','88274','83405','86475', '85392', '85460', '80348', '81792') or i_item_id in (select i_item_id from item diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query47.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query47.sql index 9d2e4ca6ff4e..9fcf30b4f2c6 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query47.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query47.sql @@ -40,7 +40,7 @@ with v1 as( s_store_name, s_company_name, d_year, d_moy), v2 as( - select v1.i_category, v1.i_brand + select v1.i_category ,v1.d_year, v1.d_moy ,v1.avg_monthly_sales ,v1.sum_sales, v1_lag.sum_sales psum, v1_lead.sum_sales nsum @@ -57,8 +57,8 @@ with v1 as( v1.rn = v1_lead.rn - 1) select * from v2 - where d_year = 2000 and + where d_year = 2000 and avg_monthly_sales > 0 and case when avg_monthly_sales > 0 then abs(sum_sales - avg_monthly_sales) / avg_monthly_sales else null end > 0.1 - order by sum_sales - avg_monthly_sales, nsum - limit 100 + order by sum_sales - avg_monthly_sales, 3 + limit 100 \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query5.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query5.sql index da2e30a6b06a..9ab9bce781ef 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query5.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query5.sql @@ -18,7 +18,7 @@ with ssr as (select s_store_id, sum(sales_price) as sales, sum(profit) as profit, - sum(return_amt) as returns, + sum(return_amt) as `returns`, sum(net_loss) as profit_loss from ( select ss_store_sk as store_sk, @@ -40,8 +40,8 @@ with ssr as date_dim, store where date_sk = d_date_sk - and d_date between cast('1998-08-04' as date) - and (cast('1998-08-04' as date) + 14 days) + and d_date between cast('1998-08-04' as date) + and (cast('1998-08-04' as 
date) + interval '14' day) and store_sk = s_store_sk group by s_store_id) , @@ -49,7 +49,7 @@ with ssr as (select cp_catalog_page_id, sum(sales_price) as sales, sum(profit) as profit, - sum(return_amt) as returns, + sum(return_amt) as `returns`, sum(net_loss) as profit_loss from ( select cs_catalog_page_sk as page_sk, @@ -72,7 +72,7 @@ with ssr as catalog_page where date_sk = d_date_sk and d_date between cast('1998-08-04' as date) - and (cast('1998-08-04' as date) + 14 days) + and (cast('1998-08-04' as date) + interval '14' day) and page_sk = cp_catalog_page_sk group by cp_catalog_page_id) , @@ -80,7 +80,7 @@ with ssr as (select web_site_id, sum(sales_price) as sales, sum(profit) as profit, - sum(return_amt) as returns, + sum(return_amt) as `returns`, sum(net_loss) as profit_loss from ( select ws_web_site_sk as wsr_web_site_sk, @@ -105,37 +105,37 @@ with ssr as web_site where date_sk = d_date_sk and d_date between cast('1998-08-04' as date) - and (cast('1998-08-04' as date) + 14 days) + and (cast('1998-08-04' as date) + interval '14' day) and wsr_web_site_sk = web_site_sk group by web_site_id) select channel , id , sum(sales) as sales - , sum(returns) as returns + , sum(`returns`) as `returns` , sum(profit) as profit - from + from (select 'store channel' as channel , 'store' || s_store_id as id , sales - , returns + , `returns` , (profit - profit_loss) as profit from ssr union all select 'catalog channel' as channel , 'catalog_page' || cp_catalog_page_id as id , sales - , returns + , `returns` , (profit - profit_loss) as profit from csr union all select 'web channel' as channel , 'web_site' || web_site_id as id , sales - , returns + , `returns` , (profit - profit_loss) as profit from wsr ) x group by rollup (channel, id) order by channel ,id - limit 100 + limit 100 \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query50.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query50.sql index fc37addaae9d..38bb0a3596fc 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query50.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query50.sql @@ -25,14 +25,14 @@ select ,s_county ,s_state ,s_zip - ,sum(case when (sr_returned_date_sk - ss_sold_date_sk <= 30 ) then 1 else 0 end) as "30 days" - ,sum(case when (sr_returned_date_sk - ss_sold_date_sk > 30) and - (sr_returned_date_sk - ss_sold_date_sk <= 60) then 1 else 0 end ) as "31-60 days" - ,sum(case when (sr_returned_date_sk - ss_sold_date_sk > 60) and - (sr_returned_date_sk - ss_sold_date_sk <= 90) then 1 else 0 end) as "61-90 days" + ,sum(case when (sr_returned_date_sk - ss_sold_date_sk <= 30 ) then 1 else 0 end) as `30 days` + ,sum(case when (sr_returned_date_sk - ss_sold_date_sk > 30) and + (sr_returned_date_sk - ss_sold_date_sk <= 60) then 1 else 0 end ) as `31-60 days` + ,sum(case when (sr_returned_date_sk - ss_sold_date_sk > 60) and + (sr_returned_date_sk - ss_sold_date_sk <= 90) then 1 else 0 end) as `61-90 days` ,sum(case when (sr_returned_date_sk - ss_sold_date_sk > 90) and - (sr_returned_date_sk - ss_sold_date_sk <= 120) then 1 else 0 end) as "91-120 days" - ,sum(case when (sr_returned_date_sk - ss_sold_date_sk > 120) then 1 else 0 end) as ">120 days" + (sr_returned_date_sk - ss_sold_date_sk <= 120) then 1 else 0 end) as `91-120 days` + ,sum(case when (sr_returned_date_sk - ss_sold_date_sk > 120) then 1 else 0 end) as `>120 days` from store_sales ,store_returns @@ -69,4 +69,4 @@ order by s_store_name ,s_county ,s_state ,s_zip -limit 100 +limit 100 \ No newline at end 
of file diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query58.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query58.sql index 42366e689826..87c1e7da8315 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query58.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query58.sql @@ -16,14 +16,14 @@ with ss_items as (select i_item_id item_id - ,sum(ss_ext_sales_price) ss_item_rev + ,sum(ss_ext_sales_price) ss_item_rev from store_sales ,item ,date_dim where ss_item_sk = i_item_sk and d_date in (select d_date from date_dim - where d_week_seq = (select d_week_seq + where d_week_seq = (select d_week_seq from date_dim where d_date = '1998-02-19')) and ss_sold_date_sk = d_date_sk @@ -37,7 +37,7 @@ with ss_items as where cs_item_sk = i_item_sk and d_date in (select d_date from date_dim - where d_week_seq = (select d_week_seq + where d_week_seq = (select d_week_seq from date_dim where d_date = '1998-02-19')) and cs_sold_date_sk = d_date_sk @@ -51,22 +51,22 @@ with ss_items as where ws_item_sk = i_item_sk and d_date in (select d_date from date_dim - where d_week_seq =(select d_week_seq + where d_week_seq =(select d_week_seq from date_dim where d_date = '1998-02-19')) and ws_sold_date_sk = d_date_sk group by i_item_id) select ss_items.item_id ,ss_item_rev - ,ss_item_rev/((ss_item_rev+cs_item_rev+ws_item_rev)/3) * 100 ss_dev + ,ss_item_rev/(ss_item_rev+cs_item_rev+ws_item_rev)/3 * 100 ss_dev ,cs_item_rev - ,cs_item_rev/((ss_item_rev+cs_item_rev+ws_item_rev)/3) * 100 cs_dev + ,cs_item_rev/(ss_item_rev+cs_item_rev+ws_item_rev)/3 * 100 cs_dev ,ws_item_rev - ,ws_item_rev/((ss_item_rev+cs_item_rev+ws_item_rev)/3) * 100 ws_dev + ,ws_item_rev/(ss_item_rev+cs_item_rev+ws_item_rev)/3 * 100 ws_dev ,(ss_item_rev+cs_item_rev+ws_item_rev)/3 average from ss_items,cs_items,ws_items where ss_items.item_id=cs_items.item_id - and ss_items.item_id=ws_items.item_id + and ss_items.item_id=ws_items.item_id and ss_item_rev between 0.9 * cs_item_rev and 1.1 * cs_item_rev and ss_item_rev between 0.9 * ws_item_rev and 1.1 * ws_item_rev and cs_item_rev between 0.9 * ss_item_rev and 1.1 * ss_item_rev @@ -75,4 +75,4 @@ with ss_items as and ws_item_rev between 0.9 * cs_item_rev and 1.1 * cs_item_rev order by item_id ,ss_item_rev - limit 100 + limit 100 \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query59.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query59.sql index 462ef96ebd68..21e4cc96c0f4 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query59.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query59.sql @@ -30,7 +30,7 @@ with wss as ) select s_store_name1,s_store_id1,d_week_seq1 ,sun_sales1/sun_sales2,mon_sales1/mon_sales2 - ,tue_sales1/tue_sales2,wed_sales1/wed_sales2,thu_sales1/thu_sales2 + ,tue_sales1/tue_sales1,wed_sales1/wed_sales2,thu_sales1/thu_sales2 ,fri_sales1/fri_sales2,sat_sales1/sat_sales2 from (select s_store_name s_store_name1,wss.d_week_seq d_week_seq1 @@ -40,7 +40,7 @@ with wss as ,fri_sales fri_sales1,sat_sales sat_sales1 from wss,store,date_dim d where d.d_week_seq = wss.d_week_seq and - ss_store_sk = s_store_sk and + ss_store_sk = s_store_sk and d_month_seq between 1185 and 1185 + 11) y, (select s_store_name s_store_name2,wss.d_week_seq d_week_seq2 ,s_store_id s_store_id2,sun_sales sun_sales2 @@ -49,9 +49,9 @@ with wss as ,fri_sales fri_sales2,sat_sales sat_sales2 from wss,store,date_dim d where d.d_week_seq = wss.d_week_seq and - ss_store_sk = s_store_sk and + ss_store_sk = 
s_store_sk and d_month_seq between 1185+ 12 and 1185 + 23) x where s_store_id1=s_store_id2 and d_week_seq1=d_week_seq2-52 order by s_store_name1,s_store_id1,d_week_seq1 -limit 100 +limit 100 \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query62.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query62.sql index f3f383e13b44..e09d69bc9abe 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query62.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query62.sql @@ -15,17 +15,17 @@ -- limitations under the License. select - substr(w_warehouse_name,1,20) + substring(w_warehouse_name,1,20) ,sm_type ,web_name - ,sum(case when (ws_ship_date_sk - ws_sold_date_sk <= 30 ) then 1 else 0 end) as "30 days" - ,sum(case when (ws_ship_date_sk - ws_sold_date_sk > 30) and - (ws_ship_date_sk - ws_sold_date_sk <= 60) then 1 else 0 end ) as "31-60 days" - ,sum(case when (ws_ship_date_sk - ws_sold_date_sk > 60) and - (ws_ship_date_sk - ws_sold_date_sk <= 90) then 1 else 0 end) as "61-90 days" + ,sum(case when (ws_ship_date_sk - ws_sold_date_sk <= 30 ) then 1 else 0 end) as `30 days` + ,sum(case when (ws_ship_date_sk - ws_sold_date_sk > 30) and + (ws_ship_date_sk - ws_sold_date_sk <= 60) then 1 else 0 end ) as `31-60 days` + ,sum(case when (ws_ship_date_sk - ws_sold_date_sk > 60) and + (ws_ship_date_sk - ws_sold_date_sk <= 90) then 1 else 0 end) as `61-90 days` ,sum(case when (ws_ship_date_sk - ws_sold_date_sk > 90) and - (ws_ship_date_sk - ws_sold_date_sk <= 120) then 1 else 0 end) as "91-120 days" - ,sum(case when (ws_ship_date_sk - ws_sold_date_sk > 120) then 1 else 0 end) as ">120 days" + (ws_ship_date_sk - ws_sold_date_sk <= 120) then 1 else 0 end) as `91-120 days` + ,sum(case when (ws_ship_date_sk - ws_sold_date_sk > 120) then 1 else 0 end) as `>120 days` from web_sales ,warehouse @@ -39,10 +39,10 @@ and ws_warehouse_sk = w_warehouse_sk and ws_ship_mode_sk = sm_ship_mode_sk and ws_web_site_sk = web_site_sk group by - substr(w_warehouse_name,1,20) + substring(w_warehouse_name,1,20) ,sm_type ,web_name -order by substr(w_warehouse_name,1,20) +order by substring(w_warehouse_name,1,20) ,sm_type ,web_name -limit 100 +limit 100 \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query66.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query66.sql index f99b53b1030a..a91f99930328 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query66.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query66.sql @@ -22,7 +22,7 @@ select ,w_state ,w_country ,ship_carriers - ,year + ,`year` ,sum(jan_sales) as jan_sales ,sum(feb_sales) as feb_sales ,sum(mar_sales) as mar_sales @@ -60,7 +60,7 @@ select ,sum(nov_net) as nov_net ,sum(dec_net) as dec_net from ( - select + (select w_warehouse_name ,w_warehouse_sq_ft ,w_city @@ -68,50 +68,50 @@ select ,w_state ,w_country ,'DIAMOND' || ',' || 'AIRBORNE' as ship_carriers - ,d_year as year - ,sum(case when d_moy = 1 + ,d_year as `year` + ,sum(case when d_moy = 1 then ws_sales_price* ws_quantity else 0 end) as jan_sales - ,sum(case when d_moy = 2 + ,sum(case when d_moy = 2 then ws_sales_price* ws_quantity else 0 end) as feb_sales - ,sum(case when d_moy = 3 + ,sum(case when d_moy = 3 then ws_sales_price* ws_quantity else 0 end) as mar_sales - ,sum(case when d_moy = 4 + ,sum(case when d_moy = 4 then ws_sales_price* ws_quantity else 0 end) as apr_sales - ,sum(case when d_moy = 5 + ,sum(case when d_moy = 5 then ws_sales_price* ws_quantity else 0 end) 
as may_sales - ,sum(case when d_moy = 6 + ,sum(case when d_moy = 6 then ws_sales_price* ws_quantity else 0 end) as jun_sales - ,sum(case when d_moy = 7 + ,sum(case when d_moy = 7 then ws_sales_price* ws_quantity else 0 end) as jul_sales - ,sum(case when d_moy = 8 + ,sum(case when d_moy = 8 then ws_sales_price* ws_quantity else 0 end) as aug_sales - ,sum(case when d_moy = 9 + ,sum(case when d_moy = 9 then ws_sales_price* ws_quantity else 0 end) as sep_sales - ,sum(case when d_moy = 10 + ,sum(case when d_moy = 10 then ws_sales_price* ws_quantity else 0 end) as oct_sales ,sum(case when d_moy = 11 then ws_sales_price* ws_quantity else 0 end) as nov_sales ,sum(case when d_moy = 12 then ws_sales_price* ws_quantity else 0 end) as dec_sales - ,sum(case when d_moy = 1 + ,sum(case when d_moy = 1 then ws_net_paid_inc_tax * ws_quantity else 0 end) as jan_net ,sum(case when d_moy = 2 then ws_net_paid_inc_tax * ws_quantity else 0 end) as feb_net - ,sum(case when d_moy = 3 + ,sum(case when d_moy = 3 then ws_net_paid_inc_tax * ws_quantity else 0 end) as mar_net - ,sum(case when d_moy = 4 + ,sum(case when d_moy = 4 then ws_net_paid_inc_tax * ws_quantity else 0 end) as apr_net - ,sum(case when d_moy = 5 + ,sum(case when d_moy = 5 then ws_net_paid_inc_tax * ws_quantity else 0 end) as may_net - ,sum(case when d_moy = 6 + ,sum(case when d_moy = 6 then ws_net_paid_inc_tax * ws_quantity else 0 end) as jun_net - ,sum(case when d_moy = 7 + ,sum(case when d_moy = 7 then ws_net_paid_inc_tax * ws_quantity else 0 end) as jul_net - ,sum(case when d_moy = 8 + ,sum(case when d_moy = 8 then ws_net_paid_inc_tax * ws_quantity else 0 end) as aug_net - ,sum(case when d_moy = 9 + ,sum(case when d_moy = 9 then ws_net_paid_inc_tax * ws_quantity else 0 end) as sep_net - ,sum(case when d_moy = 10 + ,sum(case when d_moy = 10 then ws_net_paid_inc_tax * ws_quantity else 0 end) as oct_net ,sum(case when d_moy = 11 then ws_net_paid_inc_tax * ws_quantity else 0 end) as nov_net @@ -129,9 +129,9 @@ select and ws_sold_time_sk = t_time_sk and ws_ship_mode_sk = sm_ship_mode_sk and d_year = 2002 - and t_time between 49530 and 49530+28800 + and t_time between 49530 and 49530+28800 and sm_carrier in ('DIAMOND','AIRBORNE') - group by + group by w_warehouse_name ,w_warehouse_sq_ft ,w_city @@ -139,8 +139,9 @@ select ,w_state ,w_country ,d_year + ) union all - select + (select w_warehouse_name ,w_warehouse_sq_ft ,w_city @@ -148,50 +149,50 @@ select ,w_state ,w_country ,'DIAMOND' || ',' || 'AIRBORNE' as ship_carriers - ,d_year as year - ,sum(case when d_moy = 1 + ,d_year as `year` + ,sum(case when d_moy = 1 then cs_ext_sales_price* cs_quantity else 0 end) as jan_sales - ,sum(case when d_moy = 2 + ,sum(case when d_moy = 2 then cs_ext_sales_price* cs_quantity else 0 end) as feb_sales - ,sum(case when d_moy = 3 + ,sum(case when d_moy = 3 then cs_ext_sales_price* cs_quantity else 0 end) as mar_sales - ,sum(case when d_moy = 4 + ,sum(case when d_moy = 4 then cs_ext_sales_price* cs_quantity else 0 end) as apr_sales - ,sum(case when d_moy = 5 + ,sum(case when d_moy = 5 then cs_ext_sales_price* cs_quantity else 0 end) as may_sales - ,sum(case when d_moy = 6 + ,sum(case when d_moy = 6 then cs_ext_sales_price* cs_quantity else 0 end) as jun_sales - ,sum(case when d_moy = 7 + ,sum(case when d_moy = 7 then cs_ext_sales_price* cs_quantity else 0 end) as jul_sales - ,sum(case when d_moy = 8 + ,sum(case when d_moy = 8 then cs_ext_sales_price* cs_quantity else 0 end) as aug_sales - ,sum(case when d_moy = 9 + ,sum(case when d_moy = 9 then cs_ext_sales_price* 
cs_quantity else 0 end) as sep_sales - ,sum(case when d_moy = 10 + ,sum(case when d_moy = 10 then cs_ext_sales_price* cs_quantity else 0 end) as oct_sales ,sum(case when d_moy = 11 then cs_ext_sales_price* cs_quantity else 0 end) as nov_sales ,sum(case when d_moy = 12 then cs_ext_sales_price* cs_quantity else 0 end) as dec_sales - ,sum(case when d_moy = 1 + ,sum(case when d_moy = 1 then cs_net_paid_inc_ship_tax * cs_quantity else 0 end) as jan_net - ,sum(case when d_moy = 2 + ,sum(case when d_moy = 2 then cs_net_paid_inc_ship_tax * cs_quantity else 0 end) as feb_net - ,sum(case when d_moy = 3 + ,sum(case when d_moy = 3 then cs_net_paid_inc_ship_tax * cs_quantity else 0 end) as mar_net - ,sum(case when d_moy = 4 + ,sum(case when d_moy = 4 then cs_net_paid_inc_ship_tax * cs_quantity else 0 end) as apr_net - ,sum(case when d_moy = 5 + ,sum(case when d_moy = 5 then cs_net_paid_inc_ship_tax * cs_quantity else 0 end) as may_net - ,sum(case when d_moy = 6 + ,sum(case when d_moy = 6 then cs_net_paid_inc_ship_tax * cs_quantity else 0 end) as jun_net - ,sum(case when d_moy = 7 + ,sum(case when d_moy = 7 then cs_net_paid_inc_ship_tax * cs_quantity else 0 end) as jul_net - ,sum(case when d_moy = 8 + ,sum(case when d_moy = 8 then cs_net_paid_inc_ship_tax * cs_quantity else 0 end) as aug_net - ,sum(case when d_moy = 9 + ,sum(case when d_moy = 9 then cs_net_paid_inc_ship_tax * cs_quantity else 0 end) as sep_net - ,sum(case when d_moy = 10 + ,sum(case when d_moy = 10 then cs_net_paid_inc_ship_tax * cs_quantity else 0 end) as oct_net ,sum(case when d_moy = 11 then cs_net_paid_inc_ship_tax * cs_quantity else 0 end) as nov_net @@ -209,9 +210,9 @@ select and cs_sold_time_sk = t_time_sk and cs_ship_mode_sk = sm_ship_mode_sk and d_year = 2002 - and t_time between 49530 AND 49530+28800 + and t_time between 49530 AND 49530+28800 and sm_carrier in ('DIAMOND','AIRBORNE') - group by + group by w_warehouse_name ,w_warehouse_sq_ft ,w_city @@ -219,8 +220,9 @@ select ,w_state ,w_country ,d_year + ) ) x - group by + group by w_warehouse_name ,w_warehouse_sq_ft ,w_city @@ -228,6 +230,6 @@ select ,w_state ,w_country ,ship_carriers - ,year + ,`year` order by w_warehouse_name - limit 100 + limit 100 \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query72.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query72.sql index a1173dcc0c81..b0db77c43a75 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query72.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query72.sql @@ -17,8 +17,8 @@ select i_item_desc ,w_warehouse_name ,d1.d_week_seq - ,sum(case when p_promo_sk is null then 1 else 0 end) no_promo - ,sum(case when p_promo_sk is not null then 1 else 0 end) promo + ,count(case when p_promo_sk is null then 1 else 0 end) no_promo + ,count(case when p_promo_sk is not null then 1 else 0 end) promo ,count(*) total_cnt from catalog_sales join inventory on (cs_item_sk = inv_item_sk) @@ -32,11 +32,13 @@ join date_dim d3 on (cs_ship_date_sk = d3.d_date_sk) left outer join promotion on (cs_promo_sk=p_promo_sk) left outer join catalog_returns on (cr_item_sk = cs_item_sk and cr_order_number = cs_order_number) where d1.d_week_seq = d2.d_week_seq - and inv_quantity_on_hand < cs_quantity - and d3.d_date > d1.d_date + 5 + and inv_quantity_on_hand < cs_quantity + and d3.d_date > d1.d_date + interval '5' day and hd_buy_potential = '1001-5000' and d1.d_year = 2001 + and hd_buy_potential = '1001-5000' and cd_marital_status = 'M' + and d1.d_year = 2001 group by 
i_item_desc,w_warehouse_name,d1.d_week_seq order by total_cnt desc, i_item_desc, w_warehouse_name, d_week_seq limit 100 diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query74.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query74.sql index 384d1e7214be..717130376e91 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query74.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query74.sql @@ -18,7 +18,7 @@ with year_total as ( select c_customer_id customer_id ,c_first_name customer_first_name ,c_last_name customer_last_name - ,d_year as year + ,d_year as `year` ,max(ss_net_paid) year_total ,'s' sale_type from customer @@ -35,7 +35,7 @@ with year_total as ( select c_customer_id customer_id ,c_first_name customer_first_name ,c_last_name customer_last_name - ,d_year as year + ,d_year as `year` ,max(ws_net_paid) year_total ,'w' sale_type from customer @@ -49,7 +49,7 @@ with year_total as ( ,c_last_name ,d_year ) - select + select t_s_secyear.customer_id, t_s_secyear.customer_first_name, t_s_secyear.customer_last_name from year_total t_s_firstyear ,year_total t_s_secyear @@ -62,13 +62,13 @@ with year_total as ( and t_w_firstyear.sale_type = 'w' and t_s_secyear.sale_type = 's' and t_w_secyear.sale_type = 'w' - and t_s_firstyear.year = 2001 - and t_s_secyear.year = 2001+1 - and t_w_firstyear.year = 2001 - and t_w_secyear.year = 2001+1 + and t_s_firstyear.`year` = 2001 + and t_s_secyear.`year` = 2001+1 + and t_w_firstyear.`year` = 2001 + and t_w_secyear.`year` = 2001+1 and t_s_firstyear.year_total > 0 and t_w_firstyear.year_total > 0 and case when t_w_firstyear.year_total > 0 then t_w_secyear.year_total / t_w_firstyear.year_total else null end > case when t_s_firstyear.year_total > 0 then t_s_secyear.year_total / t_s_firstyear.year_total else null end order by 2,1,3 -limit 100 +limit 100 \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query75.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query75.sql index 6d9c689d1fba..74b481532e27 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query75.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query75.sql @@ -14,55 +14,55 @@ -- See the License for the specific language governing permissions and -- limitations under the License. 
-WITH all_sales AS ( - SELECT d_year +with all_sales as ( + select d_year ,i_brand_id ,i_class_id ,i_category_id ,i_manufact_id - ,SUM(sales_cnt) AS sales_cnt - ,SUM(sales_amt) AS sales_amt - FROM (SELECT d_year + ,sum(sales_cnt) as sales_cnt + ,sum(sales_amt) as sales_amt + from (select d_year ,i_brand_id ,i_class_id ,i_category_id ,i_manufact_id - ,cs_quantity - COALESCE(cr_return_quantity,0) AS sales_cnt - ,cs_ext_sales_price - COALESCE(cr_return_amount,0.0) AS sales_amt - FROM catalog_sales JOIN item ON i_item_sk=cs_item_sk - JOIN date_dim ON d_date_sk=cs_sold_date_sk - LEFT JOIN catalog_returns ON (cs_order_number=cr_order_number - AND cs_item_sk=cr_item_sk) - WHERE i_category='Sports' - UNION - SELECT d_year + ,cs_quantity - coalesce(cr_return_quantity,0) as sales_cnt + ,cs_ext_sales_price - coalesce(cr_return_amount,0.0) as sales_amt + from catalog_sales join item on i_item_sk=cs_item_sk + join date_dim on d_date_sk=cs_sold_date_sk + left join catalog_returns on (cs_order_number=cr_order_number + and cs_item_sk=cr_item_sk) + where i_category='Sports' + union + select d_year ,i_brand_id ,i_class_id ,i_category_id ,i_manufact_id - ,ss_quantity - COALESCE(sr_return_quantity,0) AS sales_cnt - ,ss_ext_sales_price - COALESCE(sr_return_amt,0.0) AS sales_amt - FROM store_sales JOIN item ON i_item_sk=ss_item_sk - JOIN date_dim ON d_date_sk=ss_sold_date_sk - LEFT JOIN store_returns ON (ss_ticket_number=sr_ticket_number - AND ss_item_sk=sr_item_sk) - WHERE i_category='Sports' - UNION - SELECT d_year + ,ss_quantity - coalesce(sr_return_quantity,0) as sales_cnt + ,ss_ext_sales_price - coalesce(sr_return_amt,0.0) as sales_amt + from store_sales join item on i_item_sk=ss_item_sk + join date_dim on d_date_sk=ss_sold_date_sk + left join store_returns on (ss_ticket_number=sr_ticket_number + and ss_item_sk=sr_item_sk) + where i_category='Sports' + union + select d_year ,i_brand_id ,i_class_id ,i_category_id ,i_manufact_id - ,ws_quantity - COALESCE(wr_return_quantity,0) AS sales_cnt - ,ws_ext_sales_price - COALESCE(wr_return_amt,0.0) AS sales_amt - FROM web_sales JOIN item ON i_item_sk=ws_item_sk - JOIN date_dim ON d_date_sk=ws_sold_date_sk - LEFT JOIN web_returns ON (ws_order_number=wr_order_number - AND ws_item_sk=wr_item_sk) - WHERE i_category='Sports') sales_detail - GROUP BY d_year, i_brand_id, i_class_id, i_category_id, i_manufact_id) - SELECT prev_yr.d_year AS prev_year - ,curr_yr.d_year AS year + ,ws_quantity - coalesce(wr_return_quantity,0) as sales_cnt + ,ws_ext_sales_price - coalesce(wr_return_amt,0.0) as sales_amt + from web_sales join item on i_item_sk=ws_item_sk + join date_dim on d_date_sk=ws_sold_date_sk + left join web_returns on (ws_order_number=wr_order_number + and ws_item_sk=wr_item_sk) + where i_category='Sports') sales_detail + group by d_year, i_brand_id, i_class_id, i_category_id, i_manufact_id) + select prev_yr.d_year as prev_year + ,curr_yr.d_year as`year` ,curr_yr.i_brand_id ,curr_yr.i_class_id ,curr_yr.i_category_id @@ -72,12 +72,12 @@ WITH all_sales AS ( ,curr_yr.sales_cnt-prev_yr.sales_cnt AS sales_cnt_diff ,curr_yr.sales_amt-prev_yr.sales_amt AS sales_amt_diff FROM all_sales curr_yr, all_sales prev_yr - WHERE curr_yr.i_brand_id=prev_yr.i_brand_id - AND curr_yr.i_class_id=prev_yr.i_class_id - AND curr_yr.i_category_id=prev_yr.i_category_id - AND curr_yr.i_manufact_id=prev_yr.i_manufact_id - AND curr_yr.d_year=2002 - AND prev_yr.d_year=2002-1 - AND CAST(curr_yr.sales_cnt AS DECIMAL(17,2))/CAST(prev_yr.sales_cnt AS DECIMAL(17,2))<0.9 - ORDER BY sales_cnt_diff,sales_amt_diff 
- limit 100 + where curr_yr.i_brand_id=prev_yr.i_brand_id + and curr_yr.i_class_id=prev_yr.i_class_id + and curr_yr.i_category_id=prev_yr.i_category_id + and curr_yr.i_manufact_id=prev_yr.i_manufact_id + and curr_yr.d_year=2002 + and prev_yr.d_year=2002-1 + and cast(curr_yr.sales_cnt as decimal(17,2))/cast(prev_yr.sales_cnt as decimal(17,2))<0.9 + order by sales_cnt_diff + limit 100 \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query77.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query77.sql index 725717579dd0..04162a42189a 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query77.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query77.sql @@ -22,23 +22,23 @@ with ss as date_dim, store where ss_sold_date_sk = d_date_sk - and d_date between cast('1998-08-04' as date) - and (cast('1998-08-04' as date) + 30 days) + and d_date between cast('1998-08-04' as date) + and (cast('1998-08-04' as date) + interval '30' day) and ss_store_sk = s_store_sk group by s_store_sk) , sr as (select s_store_sk, - sum(sr_return_amt) as returns, + sum(sr_return_amt) as `returns`, sum(sr_net_loss) as profit_loss from store_returns, date_dim, store where sr_returned_date_sk = d_date_sk and d_date between cast('1998-08-04' as date) - and (cast('1998-08-04' as date) + 30 days) + and (cast('1998-08-04' as date) + interval '30' day) and sr_store_sk = s_store_sk - group by s_store_sk), + group by s_store_sk), cs as (select cs_call_center_sk, sum(cs_ext_sales_price) as sales, @@ -47,20 +47,19 @@ with ss as date_dim where cs_sold_date_sk = d_date_sk and d_date between cast('1998-08-04' as date) - and (cast('1998-08-04' as date) + 30 days) - group by cs_call_center_sk - ), + and (cast('1998-08-04' as date) + interval '30' day) + group by cs_call_center_sk + ), cr as - (select cr_call_center_sk, - sum(cr_return_amount) as returns, - sum(cr_net_loss) as profit_loss + (select + sum(cr_return_amount) as `returns`, + sum(cr_net_loss) as profit_loss from catalog_returns, date_dim where cr_returned_date_sk = d_date_sk and d_date between cast('1998-08-04' as date) - and (cast('1998-08-04' as date) + 30 days) - group by cr_call_center_sk - ), + and (cast('1998-08-04' as date) + interval '30' day) + ), ws as ( select wp_web_page_sk, sum(ws_ext_sales_price) as sales, @@ -70,31 +69,31 @@ with ss as web_page where ws_sold_date_sk = d_date_sk and d_date between cast('1998-08-04' as date) - and (cast('1998-08-04' as date) + 30 days) + and (cast('1998-08-04' as date) + interval '30' day) and ws_web_page_sk = wp_web_page_sk - group by wp_web_page_sk), + group by wp_web_page_sk), wr as (select wp_web_page_sk, - sum(wr_return_amt) as returns, + sum(wr_return_amt) as `returns`, sum(wr_net_loss) as profit_loss from web_returns, date_dim, web_page where wr_returned_date_sk = d_date_sk and d_date between cast('1998-08-04' as date) - and (cast('1998-08-04' as date) + 30 days) + and (cast('1998-08-04' as date) + interval '30' day) and wr_web_page_sk = wp_web_page_sk group by wp_web_page_sk) select channel , id , sum(sales) as sales - , sum(returns) as returns + , sum(`returns`) as `returns` , sum(profit) as profit - from + from (select 'store channel' as channel , ss.s_store_sk as id , sales - , coalesce(returns, 0) as returns + , coalesce(`returns`, 0) as `returns` , (profit - coalesce(profit_loss,0)) as profit from ss left join sr on ss.s_store_sk = sr.s_store_sk @@ -102,7 +101,7 @@ with ss as select 'catalog channel' as channel , cs_call_center_sk as id , sales - 
, returns + , `returns` , (profit - profit_loss) as profit from cs , cr @@ -110,7 +109,7 @@ with ss as select 'web channel' as channel , ws.wp_web_page_sk as id , sales - , coalesce(returns, 0) returns + , coalesce(`returns`, 0) `returns` , (profit - coalesce(profit_loss,0)) as profit from ws left join wr on ws.wp_web_page_sk = wr.wp_web_page_sk @@ -118,4 +117,4 @@ with ss as group by rollup (channel, id) order by channel ,id - limit 100 + limit 100 \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query79.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query79.sql index 072822628a4c..bd83c45a2569 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query79.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query79.sql @@ -15,7 +15,7 @@ -- limitations under the License. select - c_last_name,c_first_name,substr(s_city,1,30),ss_ticket_number,amt,profit + c_last_name,c_first_name,substring(s_city,1,30),ss_ticket_number,amt,profit from (select ss_ticket_number ,ss_customer_sk @@ -32,5 +32,5 @@ select and store.s_number_employees between 200 and 295 group by ss_ticket_number,ss_customer_sk,ss_addr_sk,store.s_city) ms,customer where ss_customer_sk = c_customer_sk - order by c_last_name,c_first_name,substr(s_city,1,30), profit + order by c_last_name,c_first_name,substring(s_city,1,30), profit limit 100 diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query8.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query8.sql index 250c1185ce13..34f6e0dea1d1 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query8.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query8.sql @@ -21,9 +21,9 @@ select s_store_name ,store, (select ca_zip from ( - SELECT substr(ca_zip,1,5) ca_zip + SELECT substring(ca_zip,1,5) ca_zip FROM customer_address - WHERE substr(ca_zip,1,5) IN ( + WHERE substring(ca_zip,1,5) IN ( '89436','30868','65085','22977','83927','77557', '58429','40697','80614','10502','32779', '91137','61265','98294','17921','18427', @@ -106,7 +106,7 @@ select s_store_name '32961','18586','79307','15492') intersect select ca_zip - from (SELECT substr(ca_zip,1,5) ca_zip,count(*) cnt + from (SELECT substring(ca_zip,1,5) ca_zip,count(*) cnt FROM customer_address, customer WHERE ca_address_sk = c_current_addr_sk and c_preferred_cust_flag='Y' @@ -115,7 +115,7 @@ select s_store_name where ss_store_sk = s_store_sk and ss_sold_date_sk = d_date_sk and d_qoy = 1 and d_year = 2002 - and (substr(s_zip,1,2) = substr(V1.ca_zip,1,2)) + and (substring(s_zip,1,2) = substring(V1.ca_zip,1,2)) group by s_store_name order by s_store_name limit 100 diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query80.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query80.sql index 9c6e177b228a..85439344962d 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query80.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query80.sql @@ -17,7 +17,7 @@ with ssr as (select s_store_id as store_id, sum(ss_ext_sales_price) as sales, - sum(coalesce(sr_return_amt, 0)) as returns, + sum(coalesce(sr_return_amt, 0)) as `returns`, sum(ss_net_profit - coalesce(sr_net_loss, 0)) as profit from store_sales left outer join store_returns on (ss_item_sk = sr_item_sk and ss_ticket_number = sr_ticket_number), @@ -26,8 +26,8 @@ with ssr as item, promotion where ss_sold_date_sk = d_date_sk - and d_date between cast('1998-08-04' as date) - and (cast('1998-08-04' as date) + 30 days) + and d_date between 
cast('1998-08-04' as date) + and (cast('1998-08-04' as date) + interval '30' day) and ss_store_sk = s_store_sk and ss_item_sk = i_item_sk and i_current_price > 50 @@ -38,7 +38,7 @@ with ssr as csr as (select cp_catalog_page_id as catalog_page_id, sum(cs_ext_sales_price) as sales, - sum(coalesce(cr_return_amount, 0)) as returns, + sum(coalesce(cr_return_amount, 0)) as `returns`, sum(cs_net_profit - coalesce(cr_net_loss, 0)) as profit from catalog_sales left outer join catalog_returns on (cs_item_sk = cr_item_sk and cs_order_number = cr_order_number), @@ -48,7 +48,7 @@ with ssr as promotion where cs_sold_date_sk = d_date_sk and d_date between cast('1998-08-04' as date) - and (cast('1998-08-04' as date) + 30 days) + and (cast('1998-08-04' as date) + interval '30' day) and cs_catalog_page_sk = cp_catalog_page_sk and cs_item_sk = i_item_sk and i_current_price > 50 @@ -59,7 +59,7 @@ group by cp_catalog_page_id) wsr as (select web_site_id, sum(ws_ext_sales_price) as sales, - sum(coalesce(wr_return_amt, 0)) as returns, + sum(coalesce(wr_return_amt, 0)) as `returns`, sum(ws_net_profit - coalesce(wr_net_loss, 0)) as profit from web_sales left outer join web_returns on (ws_item_sk = wr_item_sk and ws_order_number = wr_order_number), @@ -69,7 +69,7 @@ group by cp_catalog_page_id) promotion where ws_sold_date_sk = d_date_sk and d_date between cast('1998-08-04' as date) - and (cast('1998-08-04' as date) + 30 days) + and (cast('1998-08-04' as date) + interval '30' day) and ws_web_site_sk = web_site_sk and ws_item_sk = i_item_sk and i_current_price > 50 @@ -79,31 +79,31 @@ group by web_site_id) select channel , id , sum(sales) as sales - , sum(returns) as returns + , sum(`returns`) as `returns` , sum(profit) as profit - from + from (select 'store channel' as channel , 'store' || store_id as id , sales - , returns + , `returns` , profit from ssr union all select 'catalog channel' as channel , 'catalog_page' || catalog_page_id as id , sales - , returns + , `returns` , profit from csr union all select 'web channel' as channel , 'web_site' || web_site_id as id , sales - , returns + , `returns` , profit from wsr ) x group by rollup (channel, id) order by channel ,id - limit 100 + limit 100 \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query82.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query82.sql index f08cc170c540..7b80b181c1d5 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query82.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query82.sql @@ -21,7 +21,7 @@ select i_item_id where i_current_price between 30 and 30+30 and inv_item_sk = i_item_sk and d_date_sk=inv_date_sk - and d_date between cast('2002-05-30' as date) and (cast('2002-05-30' as date) + 60 days) + and d_date between cast('2002-05-30' as date) and (cast('2002-05-30' as date) + interval '60' day) and i_manufact_id in (437,129,727,663) and inv_quantity_on_hand between 100 and 500 and ss_item_sk = i_item_sk diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query85.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query85.sql index dea9927a31ac..09746df221df 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query85.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query85.sql @@ -14,7 +14,7 @@ -- See the License for the specific language governing permissions and -- limitations under the License. 
-select substr(r_reason_desc,1,20) +select substring(r_reason_desc,1,20) ,avg(ws_quantity) ,avg(wr_refunded_cash) ,avg(wr_fee) @@ -90,7 +90,7 @@ select substr(r_reason_desc,1,20) ) ) group by r_reason_desc -order by substr(r_reason_desc,1,20) +order by substring(r_reason_desc,1,20) ,avg(ws_quantity) ,avg(wr_refunded_cash) ,avg(wr_fee) diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query90.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query90.sql index 2dfa02a28b68..773bcda85129 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query90.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query90.sql @@ -22,7 +22,7 @@ select cast(amc as decimal(15,4))/cast(pmc as decimal(15,4)) am_pm_ratio and ws_web_page_sk = web_page.wp_web_page_sk and time_dim.t_hour between 6 and 6+1 and household_demographics.hd_dep_count = 8 - and web_page.wp_char_count between 5000 and 5200) at, + and web_page.wp_char_count between 5000 and 5200) `at`, ( select count(*) pmc from web_sales, household_demographics , time_dim, web_page where ws_sold_time_sk = time_dim.t_time_sk @@ -32,4 +32,4 @@ select cast(amc as decimal(15,4))/cast(pmc as decimal(15,4)) am_pm_ratio and household_demographics.hd_dep_count = 8 and web_page.wp_char_count between 5000 and 5200) pt order by am_pm_ratio - limit 100 + limit 100 \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query92.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query92.sql index a7ce3a3e8266..9e2626ef4b27 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query92.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query92.sql @@ -15,29 +15,29 @@ -- limitations under the License. select - sum(ws_ext_discount_amt) as "Excess Discount Amount" -from - web_sales - ,item + sum(ws_ext_discount_amt) as `Excess Discount Amount` +from + web_sales + ,item ,date_dim where i_manufact_id = 269 -and i_item_sk = ws_item_sk -and d_date between '1998-03-18' and - (cast('1998-03-18' as date) + 90 days) -and d_date_sk = ws_sold_date_sk -and ws_ext_discount_amt - > ( - SELECT - 1.3 * avg(ws_ext_discount_amt) - FROM - web_sales +and i_item_sk = ws_item_sk +and d_date between '1998-03-18' and + (cast('1998-03-18' as date) + interval '90' day) +and d_date_sk = ws_sold_date_sk +and ws_ext_discount_amt + > ( + SELECT + 1.3 * avg(ws_ext_discount_amt) + FROM + web_sales ,date_dim - WHERE - ws_item_sk = i_item_sk + WHERE + ws_item_sk = i_item_sk and d_date between '1998-03-18' and - (cast('1998-03-18' as date) + 90 days) - and d_date_sk = ws_sold_date_sk - ) + (cast('1998-03-18' as date) + interval '90' day) + and d_date_sk = ws_sold_date_sk + ) order by sum(ws_ext_discount_amt) -limit 100 +limit 100 \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query94.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query94.sql index dab63beda478..c69a8cc262a4 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query94.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query94.sql @@ -15,17 +15,17 @@ -- limitations under the License. 
select - count(distinct ws_order_number) as "order count" - ,sum(ws_ext_ship_cost) as "total shipping cost" - ,sum(ws_net_profit) as "total net profit" + count(distinct ws_order_number) as `order count` + ,sum(ws_ext_ship_cost) as `total shipping cost` + ,sum(ws_net_profit) as `total net profit` from web_sales ws1 ,date_dim ,customer_address ,web_site where - d_date between '1999-5-01' and - (cast('1999-5-01' as date) + 60 days) + d_date between '1999-5-01' and + (cast('1999-5-01' as date) + interval '60' day) and ws1.ws_ship_date_sk = d_date_sk and ws1.ws_ship_addr_sk = ca_address_sk and ca_state = 'TX' @@ -39,4 +39,4 @@ and not exists(select * from web_returns wr1 where ws1.ws_order_number = wr1.wr_order_number) order by count(distinct ws_order_number) -limit 100 +limit 100 \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query95.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query95.sql index b0828261523b..543d2b259ec5 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query95.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query95.sql @@ -19,18 +19,18 @@ with ws_wh as from web_sales ws1,web_sales ws2 where ws1.ws_order_number = ws2.ws_order_number and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk) - select - count(distinct ws_order_number) as "order count" - ,sum(ws_ext_ship_cost) as "total shipping cost" - ,sum(ws_net_profit) as "total net profit" + select + count(distinct ws_order_number) as `order count` + ,sum(ws_ext_ship_cost) as `total shipping cost` + ,sum(ws_net_profit) as `total net profit` from web_sales ws1 ,date_dim ,customer_address ,web_site where - d_date between '1999-5-01' and - (cast('1999-5-01' as date) + 60 days) + d_date between '1999-5-01' and + (cast('1999-5-01' as date) + interval '60' day) and ws1.ws_ship_date_sk = d_date_sk and ws1.ws_ship_addr_sk = ca_address_sk and ca_state = 'TX' @@ -42,4 +42,4 @@ and ws1.ws_order_number in (select wr_order_number from web_returns,ws_wh where wr_order_number = ws_wh.ws_order_number) order by count(distinct ws_order_number) -limit 100 +limit 100 \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query98.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query98.sql index 29d5757fa076..80e9f2066af7 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query98.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query98.sql @@ -14,33 +14,32 @@ -- See the License for the specific language governing permissions and -- limitations under the License. 
-select i_item_id - ,i_item_desc - ,i_category - ,i_class +select i_item_desc + ,i_category + ,i_class ,i_current_price - ,sum(ss_ext_sales_price) as itemrevenue + ,sum(ss_ext_sales_price) as itemrevenue ,sum(ss_ext_sales_price)*100/sum(sum(ss_ext_sales_price)) over (partition by i_class) as revenueratio -from +from store_sales - ,item + ,item ,date_dim -where - ss_item_sk = i_item_sk +where + ss_item_sk = i_item_sk and i_category in ('Jewelry', 'Sports', 'Books') and ss_sold_date_sk = d_date_sk - and d_date between cast('2001-01-12' as date) - and (cast('2001-01-12' as date) + 30 days) -group by + and d_date between cast('2001-01-12' as date) + and (cast('2001-01-12' as date) + interval '30' day) +group by i_item_id - ,i_item_desc + ,i_item_desc ,i_category ,i_class ,i_current_price -order by +order by i_category ,i_class ,i_item_id ,i_item_desc - ,revenueratio + ,revenueratio \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/queries/query99.sql b/sdks/java/testing/tpcds/src/main/resources/queries/query99.sql index de8e8ca2e58a..35fd9534e76c 100644 --- a/sdks/java/testing/tpcds/src/main/resources/queries/query99.sql +++ b/sdks/java/testing/tpcds/src/main/resources/queries/query99.sql @@ -15,17 +15,17 @@ -- limitations under the License. select - substr(w_warehouse_name,1,20) + substring(w_warehouse_name,1,20) ,sm_type ,cc_name - ,sum(case when (cs_ship_date_sk - cs_sold_date_sk <= 30 ) then 1 else 0 end) as "30 days" - ,sum(case when (cs_ship_date_sk - cs_sold_date_sk > 30) and - (cs_ship_date_sk - cs_sold_date_sk <= 60) then 1 else 0 end ) as "31-60 days" - ,sum(case when (cs_ship_date_sk - cs_sold_date_sk > 60) and - (cs_ship_date_sk - cs_sold_date_sk <= 90) then 1 else 0 end) as "61-90 days" + ,sum(case when (cs_ship_date_sk - cs_sold_date_sk <= 30 ) then 1 else 0 end) as `30 days` + ,sum(case when (cs_ship_date_sk - cs_sold_date_sk > 30) and + (cs_ship_date_sk - cs_sold_date_sk <= 60) then 1 else 0 end ) as `31-60 days` + ,sum(case when (cs_ship_date_sk - cs_sold_date_sk > 60) and + (cs_ship_date_sk - cs_sold_date_sk <= 90) then 1 else 0 end) as `61-90 days` ,sum(case when (cs_ship_date_sk - cs_sold_date_sk > 90) and - (cs_ship_date_sk - cs_sold_date_sk <= 120) then 1 else 0 end) as "91-120 days" - ,sum(case when (cs_ship_date_sk - cs_sold_date_sk > 120) then 1 else 0 end) as ">120 days" + (cs_ship_date_sk - cs_sold_date_sk <= 120) then 1 else 0 end) as `91-120 days` + ,sum(case when (cs_ship_date_sk - cs_sold_date_sk > 120) then 1 else 0 end) as `>120 days` from catalog_sales ,warehouse @@ -39,10 +39,10 @@ and cs_warehouse_sk = w_warehouse_sk and cs_ship_mode_sk = sm_ship_mode_sk and cs_call_center_sk = cc_call_center_sk group by - substr(w_warehouse_name,1,20) + substring(w_warehouse_name,1,20) ,sm_type ,cc_name -order by substr(w_warehouse_name,1,20) +order by substring(w_warehouse_name,1,20) ,sm_type ,cc_name -limit 100 +limit 100 \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/schemas/store.json b/sdks/java/testing/tpcds/src/main/resources/schemas/store.json index 3df84659050d..4f48a810c7e2 100644 --- a/sdks/java/testing/tpcds/src/main/resources/schemas/store.json +++ b/sdks/java/testing/tpcds/src/main/resources/schemas/store.json @@ -8,10 +8,10 @@ {"name":"s_number_employees","type":"integer"}, {"name":"s_floor_space","type":"integer"}, {"name":"s_hours","type":"char(20)"}, - {"name":"S_manager","type":"varchar(40)"}, - {"name":"S_market_id","type":"integer"}, - 
{"name":"S_geography_class","type":"varchar(100)"}, - {"name":"S_market_desc","type":"varchar(100)"}, + {"name":"s_manager","type":"varchar(40)"}, + {"name":"s_market_id","type":"integer"}, + {"name":"s_geography_class","type":"varchar(100)"}, + {"name":"s_market_desc","type":"varchar(100)"}, {"name":"s_market_manager","type":"varchar(40)"}, {"name":"s_division_id","type":"integer"}, {"name":"s_division_name","type":"varchar(50)"}, diff --git a/sdks/java/testing/tpcds/src/main/resources/schemas_avro/call_center.json b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/call_center.json new file mode 100644 index 000000000000..eeaf81db764d --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/call_center.json @@ -0,0 +1,260 @@ +{ + "type": "record", + "name": "spark_schema", + "fields": [ + { + "name": "cc_call_center_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cc_call_center_id", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "cc_rec_start_date", + "type": [ + "null", + { + "type": "int", + "logicalType": "date" + } + ], + "default": null + }, + { + "name": "cc_rec_end_date", + "type": [ + "null", + { + "type": "int", + "logicalType": "date" + } + ], + "default": null + }, + { + "name": "cc_closed_date_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cc_open_date_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cc_name", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "cc_class", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "cc_employees", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cc_sq_ft", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cc_hours", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "cc_manager", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "cc_mkt_id", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cc_mkt_class", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "cc_mkt_desc", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "cc_market_manager", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "cc_division", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cc_division_name", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "cc_company", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cc_company_name", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "cc_street_number", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "cc_street_name", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "cc_street_type", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "cc_suite_number", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "cc_city", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "cc_county", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "cc_state", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "cc_zip", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "cc_country", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "cc_gmt_offset", + 
"type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cc_tax_percentage", + "type": [ + "null", + "int" + ], + "default": null + } + ] +} \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/schemas_avro/catalog_page.json b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/catalog_page.json new file mode 100644 index 000000000000..8b93471a2948 --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/catalog_page.json @@ -0,0 +1,78 @@ +{ + "type": "record", + "name": "spark_schema", + "fields": [ + { + "name": "cp_catalog_page_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cp_catalog_page_id", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "cp_start_date_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cp_end_date_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cp_department", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "cp_catalog_number", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cp_catalog_page_number", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cp_description", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "cp_type", + "type": [ + "null", + "string" + ], + "default": null + } + ] +} \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/schemas_avro/catalog_returns.json b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/catalog_returns.json new file mode 100644 index 000000000000..525e97a6630b --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/catalog_returns.json @@ -0,0 +1,222 @@ +{ + "type": "record", + "name": "spark_schema", + "fields": [ + { + "name": "cr_returned_date_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cr_returned_time_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cr_item_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cr_refunded_customer_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cr_refunded_cdemo_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cr_refunded_hdemo_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cr_refunded_addr_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cr_returning_customer_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cr_returning_cdemo_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cr_returning_hdemo_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cr_returning_addr_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cr_call_center_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cr_catalog_page_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cr_ship_mode_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cr_warehouse_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cr_reason_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cr_order_number", + "type": [ + "null", + "long" + ], + "default": null + }, + { + "name": "cr_return_quantity", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": 
"cr_return_amount", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cr_return_tax", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cr_return_amt_inc_tax", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cr_fee", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cr_return_ship_cost", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cr_refunded_cash", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cr_reversed_charge", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cr_store_credit", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cr_net_loss", + "type": [ + "null", + "int" + ], + "default": null + } + ] +} \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/schemas_avro/catalog_sales.json b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/catalog_sales.json new file mode 100644 index 000000000000..ec86768c9424 --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/catalog_sales.json @@ -0,0 +1,278 @@ +{ + "type": "record", + "name": "spark_schema", + "fields": [ + { + "name": "cs_sold_date_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_sold_time_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_ship_date_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_bill_customer_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_bill_cdemo_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_bill_hdemo_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_bill_addr_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_ship_customer_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_ship_cdemo_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_ship_hdemo_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_ship_addr_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_call_center_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_catalog_page_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_ship_mode_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_warehouse_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_item_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_promo_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_order_number", + "type": [ + "null", + "long" + ], + "default": null + }, + { + "name": "cs_quantity", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_wholesale_cost", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_list_price", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_sales_price", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_ext_discount_amt", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_ext_sales_price", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_ext_wholesale_cost", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": 
"cs_ext_list_price", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_ext_tax", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_coupon_amt", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_ext_ship_cost", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_net_paid", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_net_paid_inc_tax", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_net_paid_inc_ship", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_net_paid_inc_ship_tax", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cs_net_profit", + "type": [ + "null", + "int" + ], + "default": null + } + ] +} \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/schemas_avro/customer.json b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/customer.json new file mode 100644 index 000000000000..3736c7a2b63f --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/customer.json @@ -0,0 +1,150 @@ +{ + "type": "record", + "name": "spark_schema", + "fields": [ + { + "name": "c_customer_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "c_customer_id", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "c_current_cdemo_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "c_current_hdemo_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "c_current_addr_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "c_first_shipto_date_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "c_first_sales_date_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "c_salutation", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "c_first_name", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "c_last_name", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "c_preferred_cust_flag", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "c_birth_day", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "c_birth_month", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "c_birth_year", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "c_birth_country", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "c_login", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "c_email_address", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "c_last_review_date", + "type": [ + "null", + "string" + ], + "default": null + } + ] +} \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/schemas_avro/customer_address.json b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/customer_address.json new file mode 100644 index 000000000000..1100dec1e7ea --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/customer_address.json @@ -0,0 +1,110 @@ +{ + "type": "record", + "name": "spark_schema", + "fields": [ + { + "name": "ca_address_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ca_address_id", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "ca_street_number", + "type": 
[ + "null", + "string" + ], + "default": null + }, + { + "name": "ca_street_name", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "ca_street_type", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "ca_suite_number", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "ca_city", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "ca_county", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "ca_state", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "ca_zip", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "ca_country", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "ca_gmt_offset", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ca_location_type", + "type": [ + "null", + "string" + ], + "default": null + } + ] +} \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/schemas_avro/customer_demographics.json b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/customer_demographics.json new file mode 100644 index 000000000000..c65348892744 --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/customer_demographics.json @@ -0,0 +1,78 @@ +{ + "type": "record", + "name": "spark_schema", + "fields": [ + { + "name": "cd_demo_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cd_gender", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "cd_marital_status", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "cd_education_status", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "cd_purchase_estimate", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cd_credit_rating", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "cd_dep_count", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cd_dep_employed_count", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "cd_dep_college_count", + "type": [ + "null", + "int" + ], + "default": null + } + ] +} \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/schemas_avro/date_dim.json b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/date_dim.json new file mode 100644 index 000000000000..6ee514124955 --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/date_dim.json @@ -0,0 +1,233 @@ +{ + "type": "record", + "name": "spark_schema", + "fields": [ + { + "name": "d_date_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "d_date_id", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "d_date", + "type": [ + "null", + { + "type": "int", + "logicalType": "date" + } + ], + "default": null + }, + { + "name": "d_month_seq", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "d_week_seq", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "d_quarter_seq", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "d_year", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "d_dow", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "d_moy", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "d_dom", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": 
"d_qoy", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "d_fy_year", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "d_fy_quarter_seq", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "d_fy_week_seq", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "d_day_name", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "d_quarter_name", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "d_holiday", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "d_weekend", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "d_following_holiday", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "d_first_dom", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "d_last_dom", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "d_same_day_ly", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "d_same_day_lq", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "d_current_day", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "d_current_week", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "d_current_month", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "d_current_quarter", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "d_current_year", + "type": [ + "null", + "string" + ], + "default": null + } + ] +} \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/schemas_avro/household_demographics.json b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/household_demographics.json new file mode 100644 index 000000000000..04f32dd0483a --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/household_demographics.json @@ -0,0 +1,46 @@ +{ + "type": "record", + "name": "spark_schema", + "fields": [ + { + "name": "hd_demo_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "hd_income_band_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "hd_buy_potential", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "hd_dep_count", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "hd_vehicle_count", + "type": [ + "null", + "int" + ], + "default": null + } + ] +} \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/schemas_avro/income_band.json b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/income_band.json new file mode 100644 index 000000000000..dc711b081f0a --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/income_band.json @@ -0,0 +1,30 @@ +{ + "type": "record", + "name": "spark_schema", + "fields": [ + { + "name": "ib_income_band_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ib_lower_bound", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ib_upper_bound", + "type": [ + "null", + "int" + ], + "default": null + } + ] +} \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/schemas_avro/inventory.json b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/inventory.json new file mode 100644 index 000000000000..675d7bad4729 --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/inventory.json @@ 
-0,0 +1,38 @@ +{ + "type": "record", + "name": "spark_schema", + "fields": [ + { + "name": "inv_date_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "inv_item_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "inv_warehouse_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "inv_quantity_on_hand", + "type": [ + "null", + "int" + ], + "default": null + } + ] +} \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/schemas_avro/item.json b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/item.json new file mode 100644 index 000000000000..e9e642ea9202 --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/item.json @@ -0,0 +1,188 @@ +{ + "type": "record", + "name": "spark_schema", + "fields": [ + { + "name": "i_item_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "i_item_id", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "i_rec_start_date", + "type": [ + "null", + { + "type": "int", + "logicalType": "date" + } + ], + "default": null + }, + { + "name": "i_rec_end_date", + "type": [ + "null", + { + "type": "int", + "logicalType": "date" + } + ], + "default": null + }, + { + "name": "i_item_desc", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "i_current_price", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "i_wholesale_cost", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "i_brand_id", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "i_brand", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "i_class_id", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "i_class", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "i_category_id", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "i_category", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "i_manufact_id", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "i_manufact", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "i_size", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "i_formulation", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "i_color", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "i_units", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "i_container", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "i_manager_id", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "i_product_name", + "type": [ + "null", + "string" + ], + "default": null + } + ] +} \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/schemas_avro/promotion.json b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/promotion.json new file mode 100644 index 000000000000..0b1d1ad7d4cd --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/promotion.json @@ -0,0 +1,158 @@ +{ + "type": "record", + "name": "spark_schema", + "fields": [ + { + "name": "p_promo_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "p_promo_id", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "p_start_date_sk", + "type": [ + "null", + "int" + ], + "default": null + }, 
+ { + "name": "p_end_date_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "p_item_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "p_cost", + "type": [ + "null", + "long" + ], + "default": null + }, + { + "name": "p_response_target", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "p_promo_name", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "p_channel_dmail", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "p_channel_email", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "p_channel_catalog", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "p_channel_tv", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "p_channel_radio", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "p_channel_press", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "p_channel_event", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "p_channel_demo", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "p_channel_details", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "p_purpose", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "p_discount_active", + "type": [ + "null", + "string" + ], + "default": null + } + ] +} \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/schemas_avro/reason.json b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/reason.json new file mode 100644 index 000000000000..376baf1d1234 --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/reason.json @@ -0,0 +1,30 @@ +{ + "type": "record", + "name": "spark_schema", + "fields": [ + { + "name": "r_reason_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "r_reason_id", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "r_reason_desc", + "type": [ + "null", + "string" + ], + "default": null + } + ] +} \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/schemas_avro/ship_mode.json b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/ship_mode.json new file mode 100644 index 000000000000..2dc27aa99cef --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/ship_mode.json @@ -0,0 +1,54 @@ +{ + "type": "record", + "name": "spark_schema", + "fields": [ + { + "name": "sm_ship_mode_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "sm_ship_mode_id", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "sm_type", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "sm_code", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "sm_carrier", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "sm_contract", + "type": [ + "null", + "string" + ], + "default": null + } + ] +} \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/schemas_avro/store.json b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/store.json new file mode 100644 index 000000000000..69cccde1c370 --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/store.json @@ -0,0 +1,244 @@ +{ + "type": "record", + "name": "spark_schema", + "fields": [ + { + "name": "s_store_sk", + "type": [ + "null", 
+ "int" + ], + "default": null + }, + { + "name": "s_store_id", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "s_rec_start_date", + "type": [ + "null", + { + "type": "int", + "logicalType": "date" + } + ], + "default": null + }, + { + "name": "s_rec_end_date", + "type": [ + "null", + { + "type": "int", + "logicalType": "date" + } + ], + "default": null + }, + { + "name": "s_closed_date_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "s_store_name", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "s_number_employees", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "s_floor_space", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "s_hours", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "s_manager", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "s_market_id", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "s_geography_class", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "s_market_desc", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "s_market_manager", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "s_division_id", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "s_division_name", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "s_company_id", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "s_company_name", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "s_street_number", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "s_street_name", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "s_street_type", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "s_suite_number", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "s_city", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "s_county", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "s_state", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "s_zip", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "s_country", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "s_gmt_offset", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "s_tax_precentage", + "type": [ + "null", + "int" + ], + "default": null + } + ] +} \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/schemas_avro/store_returns.json b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/store_returns.json new file mode 100644 index 000000000000..a004d22c960c --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/store_returns.json @@ -0,0 +1,166 @@ +{ + "type": "record", + "name": "spark_schema", + "fields": [ + { + "name": "sr_returned_date_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "sr_return_time_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "sr_item_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "sr_customer_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "sr_cdemo_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "sr_hdemo_sk", + 
"type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "sr_addr_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "sr_store_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "sr_reason_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "sr_ticket_number", + "type": [ + "null", + "long" + ], + "default": null + }, + { + "name": "sr_return_quantity", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "sr_return_amt", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "sr_return_tax", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "sr_return_amt_inc_tax", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "sr_fee", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "sr_return_ship_cost", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "sr_refunded_cash", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "sr_reversed_charge", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "sr_store_credit", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "sr_net_loss", + "type": [ + "null", + "int" + ], + "default": null + } + ] +} \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/schemas_avro/store_sales.json b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/store_sales.json new file mode 100644 index 000000000000..97e92ebd7daf --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/store_sales.json @@ -0,0 +1,190 @@ +{ + "type": "record", + "name": "spark_schema", + "fields": [ + { + "name": "ss_sold_date_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ss_sold_time_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ss_item_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ss_customer_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ss_cdemo_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ss_hdemo_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ss_addr_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ss_store_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ss_promo_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ss_ticket_number", + "type": [ + "null", + "long" + ], + "default": null + }, + { + "name": "ss_quantity", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ss_wholesale_cost", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ss_list_price", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ss_sales_price", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ss_ext_discount_amt", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ss_ext_sales_price", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ss_ext_wholesale_cost", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ss_ext_list_price", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ss_ext_tax", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ss_coupon_amt", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": 
"ss_net_paid", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ss_net_paid_inc_tax", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ss_net_profit", + "type": [ + "null", + "int" + ], + "default": null + } + ] +} \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/schemas_avro/time_dim.json b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/time_dim.json new file mode 100644 index 000000000000..fc69edda2f55 --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/time_dim.json @@ -0,0 +1,86 @@ +{ + "type": "record", + "name": "spark_schema", + "fields": [ + { + "name": "t_time_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "t_time_id", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "t_time", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "t_hour", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "t_minute", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "t_second", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "t_am_pm", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "t_shift", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "t_sub_shift", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "t_meal_time", + "type": [ + "null", + "string" + ], + "default": null + } + ] +} \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/schemas_avro/warehouse.json b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/warehouse.json new file mode 100644 index 000000000000..872b45d36502 --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/warehouse.json @@ -0,0 +1,118 @@ +{ + "type": "record", + "name": "spark_schema", + "fields": [ + { + "name": "w_warehouse_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "w_warehouse_id", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "w_warehouse_name", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "w_warehouse_sq_ft", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "w_street_number", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "w_street_name", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "w_street_type", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "w_suite_number", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "w_city", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "w_county", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "w_state", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "w_zip", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "w_country", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "w_gmt_offset", + "type": [ + "null", + "int" + ], + "default": null + } + ] +} \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/schemas_avro/web_page.json b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/web_page.json new file mode 100644 index 000000000000..dcd90565efc3 --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/web_page.json @@ -0,0 +1,124 @@ 
+{ + "type": "record", + "name": "spark_schema", + "fields": [ + { + "name": "wp_web_page_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "wp_web_page_id", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "wp_rec_start_date", + "type": [ + "null", + { + "type": "int", + "logicalType": "date" + } + ], + "default": null + }, + { + "name": "wp_rec_end_date", + "type": [ + "null", + { + "type": "int", + "logicalType": "date" + } + ], + "default": null + }, + { + "name": "wp_creation_date_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "wp_access_date_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "wp_autogen_flag", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "wp_customer_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "wp_url", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "wp_type", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "wp_char_count", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "wp_link_count", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "wp_image_count", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "wp_max_ad_count", + "type": [ + "null", + "int" + ], + "default": null + } + ] +} \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/schemas_avro/web_returns.json b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/web_returns.json new file mode 100644 index 000000000000..4579457617f0 --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/web_returns.json @@ -0,0 +1,198 @@ +{ + "type": "record", + "name": "spark_schema", + "fields": [ + { + "name": "wr_returned_date_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "wr_returned_time_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "wr_item_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "wr_refunded_customer_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "wr_refunded_cdemo_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "wr_refunded_hdemo_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "wr_refunded_addr_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "wr_returning_customer_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "wr_returning_cdemo_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "wr_returning_hdemo_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "wr_returning_addr_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "wr_web_page_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "wr_reason_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "wr_order_number", + "type": [ + "null", + "long" + ], + "default": null + }, + { + "name": "wr_return_quantity", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "wr_return_amt", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "wr_return_tax", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "wr_return_amt_inc_tax", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "wr_fee", + 
"type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "wr_return_ship_cost", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "wr_refunded_cash", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "wr_reversed_charge", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "wr_account_credit", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "wr_net_loss", + "type": [ + "null", + "int" + ], + "default": null + } + ] +} \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/schemas_avro/web_sales.json b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/web_sales.json new file mode 100644 index 000000000000..9b87b764540e --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/web_sales.json @@ -0,0 +1,278 @@ +{ + "type": "record", + "name": "spark_schema", + "fields": [ + { + "name": "ws_sold_date_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_sold_time_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_ship_date_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_item_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_bill_customer_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_bill_cdemo_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_bill_hdemo_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_bill_addr_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_ship_customer_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_ship_cdemo_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_ship_hdemo_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_ship_addr_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_web_page_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_web_site_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_ship_mode_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_warehouse_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_promo_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_order_number", + "type": [ + "null", + "long" + ], + "default": null + }, + { + "name": "ws_quantity", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_wholesale_cost", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_list_price", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_sales_price", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_ext_discount_amt", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_ext_sales_price", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_ext_wholesale_cost", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_ext_list_price", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_ext_tax", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_coupon_amt", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_ext_ship_cost", + "type": [ + "null", + "int" + 
], + "default": null + }, + { + "name": "ws_net_paid", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_net_paid_inc_tax", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_net_paid_inc_ship", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_net_paid_inc_ship_tax", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "ws_net_profit", + "type": [ + "null", + "int" + ], + "default": null + } + ] +} \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/main/resources/schemas_avro/web_site.json b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/web_site.json new file mode 100644 index 000000000000..a15c002caa37 --- /dev/null +++ b/sdks/java/testing/tpcds/src/main/resources/schemas_avro/web_site.json @@ -0,0 +1,220 @@ +{ + "type": "record", + "name": "spark_schema", + "fields": [ + { + "name": "web_site_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "web_site_id", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "web_rec_start_date", + "type": [ + "null", + { + "type": "int", + "logicalType": "date" + } + ], + "default": null + }, + { + "name": "web_rec_end_date", + "type": [ + "null", + { + "type": "int", + "logicalType": "date" + } + ], + "default": null + }, + { + "name": "web_name", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "web_open_date_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "web_close_date_sk", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "web_class", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "web_manager", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "web_mkt_id", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "web_mkt_class", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "web_mkt_desc", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "web_market_manager", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "web_company_id", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "web_company_name", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "web_street_number", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "web_street_name", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "web_street_type", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "web_suite_number", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "web_city", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "web_county", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "web_state", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "web_zip", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "web_country", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "web_gmt_offset", + "type": [ + "null", + "int" + ], + "default": null + }, + { + "name": "web_tax_percentage", + "type": [ + "null", + "int" + ], + "default": null + } + ] +} \ No newline at end of file diff --git a/sdks/java/testing/tpcds/src/test/java/org/apache/beam/sdk/tpcds/QueryReaderTest.java 
b/sdks/java/testing/tpcds/src/test/java/org/apache/beam/sdk/tpcds/QueryReaderTest.java index 56964103ab27..b21cdfaefb32 100644 --- a/sdks/java/testing/tpcds/src/test/java/org/apache/beam/sdk/tpcds/QueryReaderTest.java +++ b/sdks/java/testing/tpcds/src/test/java/org/apache/beam/sdk/tpcds/QueryReaderTest.java @@ -18,188 +18,221 @@ package org.apache.beam.sdk.tpcds; import static org.junit.Assert.assertEquals; + +import java.util.HashSet; +import java.util.Set; import org.junit.Test; public class QueryReaderTest { - private final String headers = "-- Licensed to the Apache Software Foundation (ASF) under one\n" + - "-- or more contributor license agreements. See the NOTICE file\n" + - "-- distributed with this work for additional information\n" + - "-- regarding copyright ownership. The ASF licenses this file\n" + - "-- to you under the Apache License, Version 2.0 (the\n" + - "-- \"License\"); you may not use this file except in compliance\n" + - "-- with the License. You may obtain a copy of the License at\n" + - "--\n" + - "-- http://www.apache.org/licenses/LICENSE-2.0\n" + - "--\n" + - "-- Unless required by applicable law or agreed to in writing, software\n" + - "-- distributed under the License is distributed on an \"AS IS\" BASIS,\n" + - "-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n" + - "-- See the License for the specific language governing permissions and\n" + - "-- limitations under the License.\n"; + private final String headers = + "-- Licensed to the Apache Software Foundation (ASF) under one\n" + + "-- or more contributor license agreements. See the NOTICE file\n" + + "-- distributed with this work for additional information\n" + + "-- regarding copyright ownership. The ASF licenses this file\n" + + "-- to you under the Apache License, Version 2.0 (the\n" + + "-- \"License\"); you may not use this file except in compliance\n" + + "-- with the License. 
You may obtain a copy of the License at\n" + + "--\n" + + "-- http://www.apache.org/licenses/LICENSE-2.0\n" + + "--\n" + + "-- Unless required by applicable law or agreed to in writing, software\n" + + "-- distributed under the License is distributed on an \"AS IS\" BASIS,\n" + + "-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n" + + "-- See the License for the specific language governing permissions and\n" + + "-- limitations under the License.\n"; + + @Test + public void testQuery3String() throws Exception { + String query3String = QueryReader.readQuery("query3"); + String expected = + "select dt.d_year \n" + + " ,item.i_brand_id brand_id \n" + + " ,item.i_brand brand\n" + + " ,sum(ss_ext_sales_price) sum_agg\n" + + " from date_dim dt \n" + + " ,store_sales\n" + + " ,item\n" + + " where dt.d_date_sk = store_sales.ss_sold_date_sk\n" + + " and store_sales.ss_item_sk = item.i_item_sk\n" + + " and item.i_manufact_id = 436\n" + + " and dt.d_moy=12\n" + + " group by dt.d_year\n" + + " ,item.i_brand\n" + + " ,item.i_brand_id\n" + + " order by dt.d_year\n" + + " ,sum_agg desc\n" + + " ,brand_id\n" + + " limit 100"; + String query3StringNoSpaces = query3String.replaceAll("\\s+", ""); + String expectedNoSpaces = (headers + expected).replaceAll("\\s+", ""); + assertEquals(expectedNoSpaces, query3StringNoSpaces); + } + + @Test + public void testQuery3Identifiers() throws Exception { + Set<String> expected = new HashSet<>(); + expected.add("BRAND"); + expected.add("BRAND_ID"); + expected.add("D_DATE_SK"); + expected.add("D_MOY"); + expected.add("D_YEAR"); + expected.add("DATE_DIM"); + expected.add("DT"); + expected.add("I_BRAND"); + expected.add("I_BRAND_ID"); + expected.add("I_ITEM_SK"); + expected.add("I_MANUFACT_ID"); + expected.add("ITEM"); + expected.add("SS_EXT_SALES_PRICE"); + expected.add("SS_ITEM_SK"); + expected.add("SS_SOLD_DATE_SK"); + expected.add("STORE_SALES"); + expected.add("SUM_AGG"); - @Test - public void testQuery3String() throws Exception { - String query3String = QueryReader.readQuery("query3"); - String expected = "select dt.d_year \n" + - " ,item.i_brand_id brand_id \n" + - " ,item.i_brand brand\n" + - " ,sum(ss_ext_sales_price) sum_agg\n" + - " from date_dim dt \n" + - " ,store_sales\n" + - " ,item\n" + - " where dt.d_date_sk = store_sales.ss_sold_date_sk\n" + - " and store_sales.ss_item_sk = item.i_item_sk\n" + - " and item.i_manufact_id = 436\n" + - " and dt.d_moy=12\n" + - " group by dt.d_year\n" + - " ,item.i_brand\n" + - " ,item.i_brand_id\n" + - " order by dt.d_year\n" + - " ,sum_agg desc\n" + - " ,brand_id\n" + - " limit 100"; - String query3StringNoSpaces = query3String.replaceAll("\\s+", ""); - String expectedNoSpaces = (headers + expected).replaceAll("\\s+", ""); - assertEquals(expectedNoSpaces, query3StringNoSpaces); - } + String query3String = QueryReader.readQuery("query3"); + Set<String> identifiers = QueryReader.getQueryIdentifiers(query3String); + assertEquals(expected, identifiers); + } - @Test - public void testQuery4String() throws Exception { - String query4String = QueryReader.readQuery("query4"); - String expected = "with year_total as (\n" + - " select c_customer_id customer_id\n" + - " ,c_first_name customer_first_name\n" + - " ,c_last_name customer_last_name\n" + - " ,c_preferred_cust_flag customer_preferred_cust_flag\n" + - " ,c_birth_country customer_birth_country\n" + - " ,c_login customer_login\n" + - " ,c_email_address customer_email_address\n" + - " ,d_year dyear\n" + - " 
,sum(((ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt)+ss_ext_sales_price)/2) year_total\n" + - " ,'s' sale_type\n" + - " from customer\n" + - " ,store_sales\n" + - " ,date_dim\n" + - " where c_customer_sk = ss_customer_sk\n" + - " and ss_sold_date_sk = d_date_sk\n" + - " group by c_customer_id\n" + - " ,c_first_name\n" + - " ,c_last_name\n" + - " ,c_preferred_cust_flag\n" + - " ,c_birth_country\n" + - " ,c_login\n" + - " ,c_email_address\n" + - " ,d_year\n" + - " union all\n" + - " select c_customer_id customer_id\n" + - " ,c_first_name customer_first_name\n" + - " ,c_last_name customer_last_name\n" + - " ,c_preferred_cust_flag customer_preferred_cust_flag\n" + - " ,c_birth_country customer_birth_country\n" + - " ,c_login customer_login\n" + - " ,c_email_address customer_email_address\n" + - " ,d_year dyear\n" + - " ,sum((((cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt)+cs_ext_sales_price)/2) ) year_total\n" + - " ,'c' sale_type\n" + - " from customer\n" + - " ,catalog_sales\n" + - " ,date_dim\n" + - " where c_customer_sk = cs_bill_customer_sk\n" + - " and cs_sold_date_sk = d_date_sk\n" + - " group by c_customer_id\n" + - " ,c_first_name\n" + - " ,c_last_name\n" + - " ,c_preferred_cust_flag\n" + - " ,c_birth_country\n" + - " ,c_login\n" + - " ,c_email_address\n" + - " ,d_year\n" + - "union all\n" + - " select c_customer_id customer_id\n" + - " ,c_first_name customer_first_name\n" + - " ,c_last_name customer_last_name\n" + - " ,c_preferred_cust_flag customer_preferred_cust_flag\n" + - " ,c_birth_country customer_birth_country\n" + - " ,c_login customer_login\n" + - " ,c_email_address customer_email_address\n" + - " ,d_year dyear\n" + - " ,sum((((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2) ) year_total\n" + - " ,'w' sale_type\n" + - " from customer\n" + - " ,web_sales\n" + - " ,date_dim\n" + - " where c_customer_sk = ws_bill_customer_sk\n" + - " and ws_sold_date_sk = d_date_sk\n" + - " group by c_customer_id\n" + - " ,c_first_name\n" + - " ,c_last_name\n" + - " ,c_preferred_cust_flag\n" + - " ,c_birth_country\n" + - " ,c_login\n" + - " ,c_email_address\n" + - " ,d_year\n" + - " )\n" + - " select \n" + - " t_s_secyear.customer_id\n" + - " ,t_s_secyear.customer_first_name\n" + - " ,t_s_secyear.customer_last_name\n" + - " ,t_s_secyear.customer_email_address\n" + - " from year_total t_s_firstyear\n" + - " ,year_total t_s_secyear\n" + - " ,year_total t_c_firstyear\n" + - " ,year_total t_c_secyear\n" + - " ,year_total t_w_firstyear\n" + - " ,year_total t_w_secyear\n" + - " where t_s_secyear.customer_id = t_s_firstyear.customer_id\n" + - " and t_s_firstyear.customer_id = t_c_secyear.customer_id\n" + - " and t_s_firstyear.customer_id = t_c_firstyear.customer_id\n" + - " and t_s_firstyear.customer_id = t_w_firstyear.customer_id\n" + - " and t_s_firstyear.customer_id = t_w_secyear.customer_id\n" + - " and t_s_firstyear.sale_type = 's'\n" + - " and t_c_firstyear.sale_type = 'c'\n" + - " and t_w_firstyear.sale_type = 'w'\n" + - " and t_s_secyear.sale_type = 's'\n" + - " and t_c_secyear.sale_type = 'c'\n" + - " and t_w_secyear.sale_type = 'w'\n" + - " and t_s_firstyear.dyear = 2001\n" + - " and t_s_secyear.dyear = 2001+1\n" + - " and t_c_firstyear.dyear = 2001\n" + - " and t_c_secyear.dyear = 2001+1\n" + - " and t_w_firstyear.dyear = 2001\n" + - " and t_w_secyear.dyear = 2001+1\n" + - " and t_s_firstyear.year_total > 0\n" + - " and t_c_firstyear.year_total > 0\n" + - " and t_w_firstyear.year_total > 0\n" + - " and case when 
t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end\n" + - " > case when t_s_firstyear.year_total > 0 then t_s_secyear.year_total / t_s_firstyear.year_total else null end\n" + - " and case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end\n" + - " > case when t_w_firstyear.year_total > 0 then t_w_secyear.year_total / t_w_firstyear.year_total else null end\n" + - " order by t_s_secyear.customer_id\n" + - " ,t_s_secyear.customer_first_name\n" + - " ,t_s_secyear.customer_last_name\n" + - " ,t_s_secyear.customer_email_address\n" + - "limit 100"; - String query4StringNoSpaces = query4String.replaceAll("\\s+", ""); - String expectedNoSpaces = (headers + expected).replaceAll("\\s+", ""); - assertEquals(expectedNoSpaces, query4StringNoSpaces); - } + @Test + public void testQuery4String() throws Exception { + String query4String = QueryReader.readQuery("query4"); + String expected = + "with year_total as (\n" + + " select c_customer_id customer_id\n" + + " ,c_first_name customer_first_name\n" + + " ,c_last_name customer_last_name\n" + + " ,c_preferred_cust_flag customer_preferred_cust_flag\n" + + " ,c_birth_country customer_birth_country\n" + + " ,c_login customer_login\n" + + " ,c_email_address customer_email_address\n" + + " ,d_year dyear\n" + + " ,sum(((ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt)+ss_ext_sales_price)/2) year_total\n" + + " ,'s' sale_type\n" + + " from customer\n" + + " ,store_sales\n" + + " ,date_dim\n" + + " where c_customer_sk = ss_customer_sk\n" + + " and ss_sold_date_sk = d_date_sk\n" + + " group by c_customer_id\n" + + " ,c_first_name\n" + + " ,c_last_name\n" + + " ,c_preferred_cust_flag\n" + + " ,c_birth_country\n" + + " ,c_login\n" + + " ,c_email_address\n" + + " ,d_year\n" + + " union all\n" + + " select c_customer_id customer_id\n" + + " ,c_first_name customer_first_name\n" + + " ,c_last_name customer_last_name\n" + + " ,c_preferred_cust_flag customer_preferred_cust_flag\n" + + " ,c_birth_country customer_birth_country\n" + + " ,c_login customer_login\n" + + " ,c_email_address customer_email_address\n" + + " ,d_year dyear\n" + + " ,sum((((cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt)+cs_ext_sales_price)/2) ) year_total\n" + + " ,'c' sale_type\n" + + " from customer\n" + + " ,catalog_sales\n" + + " ,date_dim\n" + + " where c_customer_sk = cs_bill_customer_sk\n" + + " and cs_sold_date_sk = d_date_sk\n" + + " group by c_customer_id\n" + + " ,c_first_name\n" + + " ,c_last_name\n" + + " ,c_preferred_cust_flag\n" + + " ,c_birth_country\n" + + " ,c_login\n" + + " ,c_email_address\n" + + " ,d_year\n" + + "union all\n" + + " select c_customer_id customer_id\n" + + " ,c_first_name customer_first_name\n" + + " ,c_last_name customer_last_name\n" + + " ,c_preferred_cust_flag customer_preferred_cust_flag\n" + + " ,c_birth_country customer_birth_country\n" + + " ,c_login customer_login\n" + + " ,c_email_address customer_email_address\n" + + " ,d_year dyear\n" + + " ,sum((((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2) ) year_total\n" + + " ,'w' sale_type\n" + + " from customer\n" + + " ,web_sales\n" + + " ,date_dim\n" + + " where c_customer_sk = ws_bill_customer_sk\n" + + " and ws_sold_date_sk = d_date_sk\n" + + " group by c_customer_id\n" + + " ,c_first_name\n" + + " ,c_last_name\n" + + " ,c_preferred_cust_flag\n" + + " ,c_birth_country\n" + + " ,c_login\n" + + " ,c_email_address\n" + + " ,d_year\n" + + " )\n" + + " 
select \n" + + " t_s_secyear.customer_id\n" + + " ,t_s_secyear.customer_first_name\n" + + " ,t_s_secyear.customer_last_name\n" + + " ,t_s_secyear.customer_email_address\n" + + " from year_total t_s_firstyear\n" + + " ,year_total t_s_secyear\n" + + " ,year_total t_c_firstyear\n" + + " ,year_total t_c_secyear\n" + + " ,year_total t_w_firstyear\n" + + " ,year_total t_w_secyear\n" + + " where t_s_secyear.customer_id = t_s_firstyear.customer_id\n" + + " and t_s_firstyear.customer_id = t_c_secyear.customer_id\n" + + " and t_s_firstyear.customer_id = t_c_firstyear.customer_id\n" + + " and t_s_firstyear.customer_id = t_w_firstyear.customer_id\n" + + " and t_s_firstyear.customer_id = t_w_secyear.customer_id\n" + + " and t_s_firstyear.sale_type = 's'\n" + + " and t_c_firstyear.sale_type = 'c'\n" + + " and t_w_firstyear.sale_type = 'w'\n" + + " and t_s_secyear.sale_type = 's'\n" + + " and t_c_secyear.sale_type = 'c'\n" + + " and t_w_secyear.sale_type = 'w'\n" + + " and t_s_firstyear.dyear = 2001\n" + + " and t_s_secyear.dyear = 2001+1\n" + + " and t_c_firstyear.dyear = 2001\n" + + " and t_c_secyear.dyear = 2001+1\n" + + " and t_w_firstyear.dyear = 2001\n" + + " and t_w_secyear.dyear = 2001+1\n" + + " and t_s_firstyear.year_total > 0\n" + + " and t_c_firstyear.year_total > 0\n" + + " and t_w_firstyear.year_total > 0\n" + + " and case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end\n" + + " > case when t_s_firstyear.year_total > 0 then t_s_secyear.year_total / t_s_firstyear.year_total else null end\n" + + " and case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end\n" + + " > case when t_w_firstyear.year_total > 0 then t_w_secyear.year_total / t_w_firstyear.year_total else null end\n" + + " order by t_s_secyear.customer_id\n" + + " ,t_s_secyear.customer_first_name\n" + + " ,t_s_secyear.customer_last_name\n" + + " ,t_s_secyear.customer_email_address\n" + + "limit 100"; + String query4StringNoSpaces = query4String.replaceAll("\\s+", ""); + String expectedNoSpaces = (headers + expected).replaceAll("\\s+", ""); + assertEquals(expectedNoSpaces, query4StringNoSpaces); + } - @Test - public void testQuery55String() throws Exception { - String query55String = QueryReader.readQuery("query55"); - String expected = "select i_brand_id brand_id, i_brand brand,\n" + - " \tsum(ss_ext_sales_price) ext_price\n" + - " from date_dim, store_sales, item\n" + - " where d_date_sk = ss_sold_date_sk\n" + - " \tand ss_item_sk = i_item_sk\n" + - " \tand i_manager_id=36\n" + - " \tand d_moy=12\n" + - " \tand d_year=2001\n" + - " group by i_brand, i_brand_id\n" + - " order by ext_price desc, i_brand_id\n" + - "limit 100"; - String query55StringNoSpaces = query55String.replaceAll("\\s+", ""); - String expectedNoSpaces = (headers + expected).replaceAll("\\s+", ""); - assertEquals(expectedNoSpaces, query55StringNoSpaces); - } + @Test + public void testQuery55String() throws Exception { + String query55String = QueryReader.readQuery("query55"); + String expected = + "select i_brand_id brand_id, i_brand brand,\n" + + " \tsum(ss_ext_sales_price) ext_price\n" + + " from date_dim, store_sales, item\n" + + " where d_date_sk = ss_sold_date_sk\n" + + " \tand ss_item_sk = i_item_sk\n" + + " \tand i_manager_id=36\n" + + " \tand d_moy=12\n" + + " \tand d_year=2001\n" + + " group by i_brand, i_brand_id\n" + + " order by ext_price desc, i_brand_id\n" + + "limit 100"; + String query55StringNoSpaces = query55String.replaceAll("\\s+", ""); + 
String expectedNoSpaces = (headers + expected).replaceAll("\\s+", ""); + assertEquals(expectedNoSpaces, query55StringNoSpaces); + } } diff --git a/sdks/java/testing/tpcds/src/test/java/org/apache/beam/sdk/tpcds/TableSchemaJSONLoaderTest.java b/sdks/java/testing/tpcds/src/test/java/org/apache/beam/sdk/tpcds/TableSchemaJSONLoaderTest.java index 7748bee4e82f..45f6f91aac7b 100644 --- a/sdks/java/testing/tpcds/src/test/java/org/apache/beam/sdk/tpcds/TableSchemaJSONLoaderTest.java +++ b/sdks/java/testing/tpcds/src/test/java/org/apache/beam/sdk/tpcds/TableSchemaJSONLoaderTest.java @@ -18,134 +18,161 @@ package org.apache.beam.sdk.tpcds; import static org.junit.Assert.assertEquals; -import org.junit.Test; +import java.io.IOException; import java.util.Arrays; import java.util.Collections; import java.util.List; - +import org.junit.Test; public class TableSchemaJSONLoaderTest { - @Test - public void testStoreReturnsTable() throws Exception { - String storeReturnsSchemaString = TableSchemaJSONLoader.parseTableSchema("store_returns"); - String expected = "sr_returned_date_sk bigint," - + "sr_return_time_sk bigint," - + "sr_item_sk bigint," - + "sr_customer_sk bigint," - + "sr_cdemo_sk bigint," - + "sr_hdemo_sk bigint," - + "sr_addr_sk bigint," - + "sr_store_sk bigint," - + "sr_reason_sk bigint," - + "sr_ticket_number bigint," - + "sr_return_quantity bigint," - + "sr_return_amt double," - + "sr_return_tax double," - + "sr_return_amt_inc_tax double," - + "sr_fee double," - + "sr_return_ship_cost double," - + "sr_refunded_cash double," - + "sr_reversed_charge double," - + "sr_store_credit double," - + "sr_net_loss double"; - assertEquals(expected, storeReturnsSchemaString); - } + @Test + public void testStoreReturnsTable() throws Exception { + String storeReturnsSchemaString = TableSchemaJSONLoader.parseTableSchema("store_returns"); + String expected = + "sr_returned_date_sk bigint," + + "sr_return_time_sk bigint," + + "sr_item_sk bigint," + + "sr_customer_sk bigint," + + "sr_cdemo_sk bigint," + + "sr_hdemo_sk bigint," + + "sr_addr_sk bigint," + + "sr_store_sk bigint," + + "sr_reason_sk bigint," + + "sr_ticket_number bigint," + + "sr_return_quantity bigint," + + "sr_return_amt double," + + "sr_return_tax double," + + "sr_return_amt_inc_tax double," + + "sr_fee double," + + "sr_return_ship_cost double," + + "sr_refunded_cash double," + + "sr_reversed_charge double," + + "sr_store_credit double," + + "sr_net_loss double"; + assertEquals(expected, storeReturnsSchemaString); + } - @Test - public void testItemTable() throws Exception { - String itemSchemaString = TableSchemaJSONLoader.parseTableSchema("item"); - String expected = "i_item_sk bigint," - + "i_item_id varchar," - + "i_rec_start_date varchar," - + "i_rec_end_date varchar," - + "i_item_desc varchar," - + "i_current_price double," - + "i_wholesale_cost double," - + "i_brand_id bigint," - + "i_brand varchar," - + "i_class_id bigint," - + "i_class varchar," - + "i_category_id bigint," - + "i_category varchar," - + "i_manufact_id bigint," - + "i_manufact varchar," - + "i_size varchar," - + "i_formulation varchar," - + "i_color varchar," - + "i_units varchar," - + "i_container varchar," - + "i_manager_id bigint," - + "i_product_name varchar"; - assertEquals(expected, itemSchemaString); - } + @Test + public void testItemTable() throws Exception { + String itemSchemaString = TableSchemaJSONLoader.parseTableSchema("item"); + String expected = + "i_item_sk bigint," + + "i_item_id varchar," + + "i_rec_start_date varchar," + + "i_rec_end_date varchar," + + 
"i_item_desc varchar," + + "i_current_price double," + + "i_wholesale_cost double," + + "i_brand_id bigint," + + "i_brand varchar," + + "i_class_id bigint," + + "i_class varchar," + + "i_category_id bigint," + + "i_category varchar," + + "i_manufact_id bigint," + + "i_manufact varchar," + + "i_size varchar," + + "i_formulation varchar," + + "i_color varchar," + + "i_units varchar," + + "i_container varchar," + + "i_manager_id bigint," + + "i_product_name varchar"; + assertEquals(expected, itemSchemaString); + } - @Test - public void testDateDimTable() throws Exception { - String dateDimSchemaString = TableSchemaJSONLoader.parseTableSchema("date_dim"); - String expected = "d_date_sk bigint," - + "d_date_id varchar," - + "d_date varchar," - + "d_month_seq bigint," - + "d_week_seq bigint," - + "d_quarter_seq bigint," - + "d_year bigint," - + "d_dow bigint," - + "d_moy bigint," - + "d_dom bigint," - + "d_qoy bigint," - + "d_fy_year bigint," - + "d_fy_quarter_seq bigint," - + "d_fy_week_seq bigint," - + "d_day_name varchar," - + "d_quarter_name varchar," - + "d_holiday varchar," - + "d_weekend varchar," - + "d_following_holiday varchar," - + "d_first_dom bigint," - + "d_last_dom bigint," - + "d_same_day_ly bigint," - + "d_same_day_lq bigint," - + "d_current_day varchar," - + "d_current_week varchar," - + "d_current_month varchar," - + "d_current_quarter varchar," - + "d_current_year varchar"; - assertEquals(expected, dateDimSchemaString); - } + @Test + public void testDateDimTable() throws Exception { + String dateDimSchemaString = TableSchemaJSONLoader.parseTableSchema("date_dim"); + String expected = + "d_date_sk bigint," + + "d_date_id varchar," + + "d_date varchar," + + "d_month_seq bigint," + + "d_week_seq bigint," + + "d_quarter_seq bigint," + + "d_year bigint," + + "d_dow bigint," + + "d_moy bigint," + + "d_dom bigint," + + "d_qoy bigint," + + "d_fy_year bigint," + + "d_fy_quarter_seq bigint," + + "d_fy_week_seq bigint," + + "d_day_name varchar," + + "d_quarter_name varchar," + + "d_holiday varchar," + + "d_weekend varchar," + + "d_following_holiday varchar," + + "d_first_dom bigint," + + "d_last_dom bigint," + + "d_same_day_ly bigint," + + "d_same_day_lq bigint," + + "d_current_day varchar," + + "d_current_week varchar," + + "d_current_month varchar," + + "d_current_quarter varchar," + + "d_current_year varchar"; + assertEquals(expected, dateDimSchemaString); + } - @Test - public void testWarehouseTable() throws Exception { - String warehouseSchemaString = TableSchemaJSONLoader.parseTableSchema("warehouse"); - String expected = "w_warehouse_sk bigint," - + "w_warehouse_id varchar," - + "w_warehouse_name varchar," - + "w_warehouse_sq_ft bigint," - + "w_street_number varchar," - + "w_street_name varchar," - + "w_street_type varchar," - + "w_suite_number varchar," - + "w_city varchar," - + "w_county varchar," - + "w_state varchar," - + "w_zip varchar," - + "w_country varchar," - + "w_gmt_offset double"; - assertEquals(expected, warehouseSchemaString); - } + @Test + public void testWarehouseTable() throws Exception { + String warehouseSchemaString = TableSchemaJSONLoader.parseTableSchema("warehouse"); + String expected = + "w_warehouse_sk bigint," + + "w_warehouse_id varchar," + + "w_warehouse_name varchar," + + "w_warehouse_sq_ft bigint," + + "w_street_number varchar," + + "w_street_name varchar," + + "w_street_type varchar," + + "w_suite_number varchar," + + "w_city varchar," + + "w_county varchar," + + "w_state varchar," + + "w_zip varchar," + + "w_country varchar," + + "w_gmt_offset 
double"; + assertEquals(expected, warehouseSchemaString); + } - @Test - public void testGetAllTableNames() { - List tableNames = TableSchemaJSONLoader.getAllTableNames(); - Collections.sort(tableNames); - List expectedTableNames = Arrays.asList("call_center", "catalog_page", "catalog_returns", "catalog_sales", "customer", "customer_address", "customer_demographics", - "date_dim", "household_demographics", "income_band", "inventory", "item", "promotion", "reason", "ship_mode", "store", "store_returns", "store_sales", "time_dim", - "warehouse", "web_page", "web_returns", "web_sales", "web_site"); + @Test + public void testGetAllTableNames() throws IOException { + List tableNames = TableSchemaJSONLoader.getAllTableNames(); + Collections.sort(tableNames); + List expectedTableNames = + Arrays.asList( + "call_center", + "catalog_page", + "catalog_returns", + "catalog_sales", + "customer", + "customer_address", + "customer_demographics", + "date_dim", + "household_demographics", + "income_band", + "inventory", + "item", + "promotion", + "reason", + "ship_mode", + "store", + "store_returns", + "store_sales", + "time_dim", + "warehouse", + "web_page", + "web_returns", + "web_sales", + "web_site"); - assertEquals(expectedTableNames.size(), tableNames.size()); + assertEquals(expectedTableNames.size(), tableNames.size()); - for (int i = 0; i < tableNames.size(); i++) { - assertEquals(expectedTableNames.get(i), tableNames.get(i)); - } + for (int i = 0; i < tableNames.size(); i++) { + assertEquals(expectedTableNames.get(i), tableNames.get(i)); } + } } diff --git a/sdks/java/testing/tpcds/src/test/java/org/apache/beam/sdk/tpcds/TpcdsParametersReaderTest.java b/sdks/java/testing/tpcds/src/test/java/org/apache/beam/sdk/tpcds/TpcdsParametersReaderTest.java index 3f8c9516a170..9b9c352541ee 100644 --- a/sdks/java/testing/tpcds/src/test/java/org/apache/beam/sdk/tpcds/TpcdsParametersReaderTest.java +++ b/sdks/java/testing/tpcds/src/test/java/org/apache/beam/sdk/tpcds/TpcdsParametersReaderTest.java @@ -17,76 +17,74 @@ */ package org.apache.beam.sdk.tpcds; +import static org.junit.Assert.assertEquals; + +import java.util.List; import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.junit.Assert; import org.junit.Before; import org.junit.Test; -import static org.junit.Assert.assertEquals; - public class TpcdsParametersReaderTest { - private TpcdsOptions tpcdsOptions; - private TpcdsOptions tpcdsOptionsError; + private TpcdsOptions tpcdsOptions; + private TpcdsOptions tpcdsOptionsError; - @Before - public void initializeTpcdsOptions() { - tpcdsOptions = PipelineOptionsFactory.as(TpcdsOptions.class); - tpcdsOptionsError = PipelineOptionsFactory.as(TpcdsOptions.class); + @Before + public void initializeTpcdsOptions() { + tpcdsOptions = PipelineOptionsFactory.as(TpcdsOptions.class); + tpcdsOptionsError = PipelineOptionsFactory.as(TpcdsOptions.class); - tpcdsOptions.setDataSize("1G"); - tpcdsOptions.setQueries("1,2,3"); - tpcdsOptions.setTpcParallel(2); + tpcdsOptions.setDataSize("1G"); + tpcdsOptions.setQueries("1,2,3,14a"); + tpcdsOptions.setTpcParallel(2); - tpcdsOptionsError.setDataSize("5G"); - tpcdsOptionsError.setQueries("0,100"); - tpcdsOptionsError.setTpcParallel(0); - } + tpcdsOptionsError.setDataSize("5G"); + tpcdsOptionsError.setQueries("0,1b,100"); + tpcdsOptionsError.setTpcParallel(0); + } - @Test - public void testGetAndCheckDataSize() throws Exception { - String dataSize = TpcdsParametersReader.getAndCheckDataSize(tpcdsOptions); - String expected = "1G"; - assertEquals(expected, dataSize); 
- } + @Test + public void testGetAndCheckDataSize() throws Exception { + String dataSize = TpcdsParametersReader.getAndCheckDataSize(tpcdsOptions); + String expected = "1G"; + assertEquals(expected, dataSize); + } - @Test( expected = Exception.class) - public void testGetAndCheckDataSizeException() throws Exception { - TpcdsParametersReader.getAndCheckDataSize(tpcdsOptionsError); - } + @Test(expected = Exception.class) + public void testGetAndCheckDataSizeException() throws Exception { + TpcdsParametersReader.getAndCheckDataSize(tpcdsOptionsError); + } - @Test - public void testGetAndCheckQueries() throws Exception { - TpcdsOptions tpcdsOptionsAll = PipelineOptionsFactory.as(TpcdsOptions.class); - tpcdsOptionsAll.setQueries("all"); - String[] queryNameArray = TpcdsParametersReader.getAndCheckQueryNameArray(tpcdsOptionsAll); - String[] expected = new String[99]; - for (int i = 0; i < 99; i++) { - expected[i] = "query" + (i + 1); - } - Assert.assertArrayEquals(expected, queryNameArray); - } + @Test + public void testGetAndCheckAllQueries() throws Exception { + TpcdsOptions tpcdsOptionsAll = PipelineOptionsFactory.as(TpcdsOptions.class); + tpcdsOptionsAll.setQueries("all"); + String[] queryNames = TpcdsParametersReader.getAndCheckQueryNames(tpcdsOptionsAll); + List<String> expected = TpcdsParametersReader.ALL_QUERY_NAMES; + Assert.assertArrayEquals(expected.toArray(new String[0]), queryNames); + } - @Test - public void testGetAndCheckAllQueries() throws Exception { - String[] queryNameArray = TpcdsParametersReader.getAndCheckQueryNameArray(tpcdsOptions); - String[] expected = {"query1", "query2", "query3"}; - Assert.assertArrayEquals(expected, queryNameArray); - } + @Test + public void testGetAndCheckSpecifiedQueries() throws Exception { + String[] queryNames = TpcdsParametersReader.getAndCheckQueryNames(tpcdsOptions); + String[] expected = {"query1", "query2", "query3", "query14a"}; + Assert.assertArrayEquals(expected, queryNames); + } - @Test( expected = Exception.class) - public void testGetAndCheckQueriesException() throws Exception { - TpcdsParametersReader.getAndCheckQueryNameArray(tpcdsOptionsError); - } + @Test(expected = Exception.class) + public void testGetAndCheckQueriesException() throws Exception { + TpcdsParametersReader.getAndCheckQueryNames(tpcdsOptionsError); + } - @Test - public void testGetAndCheckTpcParallel() throws Exception { - int nThreads = TpcdsParametersReader.getAndCheckTpcParallel(tpcdsOptions); - int expected = 2; - assertEquals(expected, nThreads); - } + @Test + public void testGetAndCheckTpcParallel() throws Exception { + int nThreads = TpcdsParametersReader.getAndCheckTpcParallel(tpcdsOptions); + int expected = 2; + assertEquals(expected, nThreads); + } - @Test( expected = Exception.class) - public void ttestGetAndCheckTpcParallelException() throws Exception { - TpcdsParametersReader.getAndCheckTpcParallel(tpcdsOptionsError); - } + @Test(expected = Exception.class) + public void ttestGetAndCheckTpcParallelException() throws Exception { + TpcdsParametersReader.getAndCheckTpcParallel(tpcdsOptionsError); + } } diff --git a/sdks/java/testing/tpcds/src/test/java/org/apache/beam/sdk/tpcds/TpcdsSchemasTest.java b/sdks/java/testing/tpcds/src/test/java/org/apache/beam/sdk/tpcds/TpcdsSchemasTest.java index 05402f2c30ec..9f2397cf945c 100644 --- a/sdks/java/testing/tpcds/src/test/java/org/apache/beam/sdk/tpcds/TpcdsSchemasTest.java +++ b/sdks/java/testing/tpcds/src/test/java/org/apache/beam/sdk/tpcds/TpcdsSchemasTest.java @@ -17,108 +17,107 @@ */ package 
org.apache.beam.sdk.tpcds; -import org.apache.beam.sdk.schemas.Schema; -import org.junit.Before; -import org.junit.Test; - -import java.util.Map; - import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNotEquals; +import java.util.Map; +import org.apache.beam.sdk.schemas.Schema; +import org.junit.Before; +import org.junit.Test; public class TpcdsSchemasTest { - private Map<String, Schema> schemaMap; - private Map<String, Schema> immutableSchemaMap; + private Map<String, Schema> schemaMap; + private Map<String, Schema> immutableSchemaMap; - @Before - public void initializeMaps() throws Exception { - schemaMap = TpcdsSchemas.getTpcdsSchemas(); - immutableSchemaMap = TpcdsSchemas.getTpcdsSchemasImmutableMap(); - } + @Before + public void initializeMaps() throws Exception { + schemaMap = TpcdsSchemas.getTpcdsSchemas(); + immutableSchemaMap = TpcdsSchemas.getTpcdsSchemasImmutableMap(); + } - @Test - public void testCallCenterSchema() throws Exception { - Schema callCenterSchema = - Schema.builder() - .addField("cc_call_center_sk", Schema.FieldType.INT64) - .addField("cc_call_center_id", Schema.FieldType.STRING) - .addNullableField("cc_rec_start_date", Schema.FieldType.STRING) - .addNullableField("cc_rec_end_date", Schema.FieldType.STRING) - .addNullableField("cc_closed_date_sk", Schema.FieldType.INT64) - .addNullableField("cc_open_date_sk", Schema.FieldType.INT64) - .addNullableField("cc_name", Schema.FieldType.STRING) - .addNullableField("cc_class", Schema.FieldType.STRING) - .addNullableField("cc_employees", Schema.FieldType.INT64) - .addNullableField("cc_sq_ft", Schema.FieldType.INT64) - .addNullableField("cc_hours", Schema.FieldType.STRING) - .addNullableField("cc_manager", Schema.FieldType.STRING) - .addNullableField("cc_mkt_id", Schema.FieldType.INT64) - .addNullableField("cc_mkt_class", Schema.FieldType.STRING) - .addNullableField("cc_mkt_desc", Schema.FieldType.STRING) - .addNullableField("cc_market_manager", Schema.FieldType.STRING) - .addNullableField("cc_division", Schema.FieldType.INT64) - .addNullableField("cc_division_name", Schema.FieldType.STRING) - .addNullableField("cc_company", Schema.FieldType.INT64) - .addNullableField("cc_company_name", Schema.FieldType.STRING) - .addNullableField("cc_street_number", Schema.FieldType.STRING) - .addNullableField("cc_street_name", Schema.FieldType.STRING) - .addNullableField("cc_street_type", Schema.FieldType.STRING) - .addNullableField("cc_suite_number", Schema.FieldType.STRING) - .addNullableField("cc_city", Schema.FieldType.STRING) - .addNullableField("cc_county", Schema.FieldType.STRING) - .addNullableField("cc_state", Schema.FieldType.STRING) - .addNullableField("cc_zip", Schema.FieldType.STRING) - .addNullableField("cc_country", Schema.FieldType.STRING) - .addNullableField("cc_gmt_offset", Schema.FieldType.DOUBLE) - .addNullableField("cc_tax_percentage", Schema.FieldType.DOUBLE) - .build(); + @Test + public void testCallCenterSchema() { + Schema callCenterSchema = + Schema.builder() + .addField("cc_call_center_sk", Schema.FieldType.INT64) + .addField("cc_call_center_id", Schema.FieldType.STRING) + .addNullableField("cc_rec_start_date", Schema.FieldType.STRING) + .addNullableField("cc_rec_end_date", Schema.FieldType.STRING) + .addNullableField("cc_closed_date_sk", Schema.FieldType.INT64) + .addNullableField("cc_open_date_sk", Schema.FieldType.INT64) + .addNullableField("cc_name", Schema.FieldType.STRING) + .addNullableField("cc_class", Schema.FieldType.STRING) + .addNullableField("cc_employees", Schema.FieldType.INT64) + .addNullableField("cc_sq_ft", 
Schema.FieldType.INT64) + .addNullableField("cc_hours", Schema.FieldType.STRING) + .addNullableField("cc_manager", Schema.FieldType.STRING) + .addNullableField("cc_mkt_id", Schema.FieldType.INT64) + .addNullableField("cc_mkt_class", Schema.FieldType.STRING) + .addNullableField("cc_mkt_desc", Schema.FieldType.STRING) + .addNullableField("cc_market_manager", Schema.FieldType.STRING) + .addNullableField("cc_division", Schema.FieldType.INT64) + .addNullableField("cc_division_name", Schema.FieldType.STRING) + .addNullableField("cc_company", Schema.FieldType.INT64) + .addNullableField("cc_company_name", Schema.FieldType.STRING) + .addNullableField("cc_street_number", Schema.FieldType.STRING) + .addNullableField("cc_street_name", Schema.FieldType.STRING) + .addNullableField("cc_street_type", Schema.FieldType.STRING) + .addNullableField("cc_suite_number", Schema.FieldType.STRING) + .addNullableField("cc_city", Schema.FieldType.STRING) + .addNullableField("cc_county", Schema.FieldType.STRING) + .addNullableField("cc_state", Schema.FieldType.STRING) + .addNullableField("cc_zip", Schema.FieldType.STRING) + .addNullableField("cc_country", Schema.FieldType.STRING) + .addNullableField("cc_gmt_offset", Schema.FieldType.DOUBLE) + .addNullableField("cc_tax_percentage", Schema.FieldType.DOUBLE) + .build(); - assertNotEquals(schemaMap.get("call_center"), callCenterSchema); - assertEquals(immutableSchemaMap.get("call_center"), callCenterSchema); - } + assertNotEquals(schemaMap.get("call_center"), callCenterSchema); + assertEquals(immutableSchemaMap.get("call_center"), callCenterSchema); + } - @Test - public void testCatalogPageSchemaNullable() throws Exception { - Schema catalogPageSchemaNullable = - Schema.builder() - .addNullableField("cp_catalog_page_sk", Schema.FieldType.INT64) - .addNullableField("cp_catalog_page_id", Schema.FieldType.STRING) - .addNullableField("cp_start_date_sk", Schema.FieldType.INT64) - .addNullableField("cp_end_date_sk", Schema.FieldType.INT64) - .addNullableField("cp_department", Schema.FieldType.STRING) - .addNullableField("cp_catalog_number", Schema.FieldType.INT64) - .addNullableField("cp_catalog_page_number", Schema.FieldType.INT64) - .addNullableField("cp_description", Schema.FieldType.STRING) - .addNullableField("cp_type", Schema.FieldType.STRING) - .build(); + @Test + public void testCatalogPageSchemaNullable() { + Schema catalogPageSchemaNullable = + Schema.builder() + .addNullableField("cp_catalog_page_sk", Schema.FieldType.INT64) + .addNullableField("cp_catalog_page_id", Schema.FieldType.STRING) + .addNullableField("cp_start_date_sk", Schema.FieldType.INT64) + .addNullableField("cp_end_date_sk", Schema.FieldType.INT64) + .addNullableField("cp_department", Schema.FieldType.STRING) + .addNullableField("cp_catalog_number", Schema.FieldType.INT64) + .addNullableField("cp_catalog_page_number", Schema.FieldType.INT64) + .addNullableField("cp_description", Schema.FieldType.STRING) + .addNullableField("cp_type", Schema.FieldType.STRING) + .build(); - assertEquals(schemaMap.get("catalog_page"), catalogPageSchemaNullable); - assertNotEquals(schemaMap.get("catalog_page"), TpcdsSchemas.getCatalogPageSchema()); - assertEquals(immutableSchemaMap.get("catalog_page"), TpcdsSchemas.getCatalogPageSchema()); - } + assertEquals(schemaMap.get("catalog_page"), catalogPageSchemaNullable); + assertNotEquals(schemaMap.get("catalog_page"), TpcdsSchemas.getCatalogPageSchema()); + assertEquals(immutableSchemaMap.get("catalog_page"), TpcdsSchemas.getCatalogPageSchema()); + } - @Test - public void 
testCustomerAddressSchemaNullable() throws Exception { - Schema customerAddressSchemaNullable = - Schema.builder() - .addNullableField("ca_address_sk", Schema.FieldType.INT64) - .addNullableField("ca_address_id", Schema.FieldType.STRING) - .addNullableField("ca_street_number", Schema.FieldType.STRING) - .addNullableField("ca_street_name", Schema.FieldType.STRING) - .addNullableField("ca_street_type", Schema.FieldType.STRING) - .addNullableField("ca_suite_number", Schema.FieldType.STRING) - .addNullableField("ca_city", Schema.FieldType.STRING) - .addNullableField("ca_county", Schema.FieldType.STRING) - .addNullableField("ca_state", Schema.FieldType.STRING) - .addNullableField("ca_zip", Schema.FieldType.STRING) - .addNullableField("ca_country", Schema.FieldType.STRING) - .addNullableField("ca_gmt_offset", Schema.FieldType.DOUBLE) - .addNullableField("ca_location_type", Schema.FieldType.STRING) - .build(); + @Test + public void testCustomerAddressSchemaNullable() { + Schema customerAddressSchemaNullable = + Schema.builder() + .addNullableField("ca_address_sk", Schema.FieldType.INT64) + .addNullableField("ca_address_id", Schema.FieldType.STRING) + .addNullableField("ca_street_number", Schema.FieldType.STRING) + .addNullableField("ca_street_name", Schema.FieldType.STRING) + .addNullableField("ca_street_type", Schema.FieldType.STRING) + .addNullableField("ca_suite_number", Schema.FieldType.STRING) + .addNullableField("ca_city", Schema.FieldType.STRING) + .addNullableField("ca_county", Schema.FieldType.STRING) + .addNullableField("ca_state", Schema.FieldType.STRING) + .addNullableField("ca_zip", Schema.FieldType.STRING) + .addNullableField("ca_country", Schema.FieldType.STRING) + .addNullableField("ca_gmt_offset", Schema.FieldType.DOUBLE) + .addNullableField("ca_location_type", Schema.FieldType.STRING) + .build(); - assertEquals(schemaMap.get("customer_address"), customerAddressSchemaNullable); - assertNotEquals(schemaMap.get("customer_address"), TpcdsSchemas.getCustomerAddressSchema()); - assertEquals(immutableSchemaMap.get("customer_address"), TpcdsSchemas.getCustomerAddressSchema()); - } + assertEquals(schemaMap.get("customer_address"), customerAddressSchemaNullable); + assertNotEquals(schemaMap.get("customer_address"), TpcdsSchemas.getCustomerAddressSchema()); + assertEquals( + immutableSchemaMap.get("customer_address"), TpcdsSchemas.getCustomerAddressSchema()); + } } diff --git a/sdks/java/testing/watermarks/build.gradle b/sdks/java/testing/watermarks/build.gradle new file mode 100644 index 000000000000..5095bb21e289 --- /dev/null +++ b/sdks/java/testing/watermarks/build.gradle @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +plugins { id 'org.apache.beam.module' } +applyJavaNature( + publish: false, + archivesBaseName: 'beam-sdks-java-watermark-latency', + exportJavadoc: false +) + +description = "Apache Beam :: SDKs :: Java :: Watermark Latency Benchmark" + + +def mainClassProperty = "loadTest.mainClass" +def mainClass = project.findProperty(mainClassProperty) + +// When running via Gradle, this property can be used to pass commandline arguments +// to the load-tests launch +def loadTestArgsProperty = "loadTest.args" + +// When running via Gradle, this property sets the runner dependency +def runnerProperty = "runner" +def runnerDependency = (project.hasProperty(runnerProperty) + ? project.getProperty(runnerProperty) + : ":runners:direct-java") + +def shouldProvideSpark = ":runners:spark".equals(runnerDependency) +def isDataflowRunner = ":runners:google-cloud-dataflow-java".equals(runnerDependency) +def runnerConfiguration = ":runners:direct-java".equals(runnerDependency) ? "shadow" : null + +if (isDataflowRunner) { + /* + * We need to rely on manually specifying these evaluationDependsOn to ensure that + * the following projects are evaluated before we evaluate this project. This is because + * we are attempting to reference a property from the project directly. + */ + evaluationDependsOn(":runners:google-cloud-dataflow-java:worker:legacy-worker") +} + +configurations { + // A configuration for running the Load testlauncher directly from Gradle, which + // uses Gradle to put the appropriate dependencies on the Classpath rather than + // bundling them into a fat jar + gradleRun +} + +dependencies { + compile library.java.joda_time + runtimeOnly library.java.kafka_clients + compile library.java.slf4j_api + + compile project(path: ":sdks:java:core", configuration: "shadow") + runtimeOnly project(path: ":runners:direct-java", configuration: "shadow") + runtimeOnly project(":sdks:java:io:synthetic") + runtimeOnly project(":sdks:java:testing:test-utils") + runtimeOnly project(":sdks:java:io:google-cloud-platform") + runtimeOnly project(":sdks:java:io:kafka") + runtimeOnly project(":sdks:java:io:kinesis") + + gradleRun project(project.path) + gradleRun project(path: runnerDependency, configuration: runnerConfiguration) + + // The Spark runner requires the user to provide a Spark dependency. For self-contained + // runs with the Spark runner, we can provide such a dependency. This is deliberately phrased + // to not hardcode any runner other than :runners:direct-java + if (shouldProvideSpark) { + gradleRun library.java.spark_streaming + gradleRun library.java.spark_core, { + exclude group:"org.slf4j", module:"jul-to-slf4j" + } + gradleRun library.java.spark_sql + } +} + +if (shouldProvideSpark) { + configurations.gradleRun { + // Using Spark runner causes a StackOverflowError if slf4j-jdk14 is on the classpath + exclude group: "org.slf4j", module: "slf4j-jdk14" + } +} + +task run(type: JavaExec) { + def loadTestArgs = project.findProperty(loadTestArgsProperty) ?: "" + + if (isDataflowRunner) { + dependsOn ":runners:google-cloud-dataflow-java:worker:legacy-worker:shadowJar" + + def dataflowWorkerJar = project.findProperty('dataflowWorkerJar') ?: project(":runners:google-cloud-dataflow-java:worker:legacy-worker").shadowJar.archivePath + // Provide job with a customizable worker jar. + // With legacy worker jar, containerImage is set to empty (i.e. to use the internal build). + // More context and discussions can be found in PR#6694. 
+ loadTestArgs = loadTestArgs + + " --dataflowWorkerJar=${dataflowWorkerJar} " + + " --workerHarnessContainerImage=" + } + + main = mainClass + classpath = configurations.gradleRun + args loadTestArgs.split() +} + diff --git a/sdks/java/testing/watermarks/src/main/java/org/apache/beam/sdk/testing/watermarks/WatermarkLatency.java b/sdks/java/testing/watermarks/src/main/java/org/apache/beam/sdk/testing/watermarks/WatermarkLatency.java new file mode 100644 index 000000000000..a56dfe79a541 --- /dev/null +++ b/sdks/java/testing/watermarks/src/main/java/org/apache/beam/sdk/testing/watermarks/WatermarkLatency.java @@ -0,0 +1,241 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.testing.watermarks; + +import java.util.ArrayList; +import java.util.Collections; +import org.apache.beam.sdk.Pipeline; +import org.apache.beam.sdk.io.GenerateSequence; +import org.apache.beam.sdk.options.Default; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.options.PipelineOptionsFactory; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.GroupByKey; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.windowing.AfterWatermark; +import org.apache.beam.sdk.transforms.windowing.FixedWindows; +import org.apache.beam.sdk.transforms.windowing.Window; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionList; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.TupleTag; +import org.apache.beam.sdk.values.TupleTagList; +import org.joda.time.Duration; +import org.joda.time.Instant; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * This class defines the Apache Beam pipeline that was used to benchmark the watermark subsystems + * of Google Cloud Dataflow and Apache Flink for the paper "Watermarks in Stream Processing Systems: + * Semantics and Comparative Analysis of Apache Flink and Google Cloud Dataflow", submitted for the + * Industrial Track of VLDB 2021 by Tyler Akidau, Edmon Begoli, Slava Chernyak, Fabian Hueske, + * Kathryn Knight, Kenneth Knowles, Daniel Mills, and Dan Sotolongo. 
+ */
+@SuppressWarnings({
+  "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402)
+})
+public class WatermarkLatency {
+  private static final TupleTag<KV<Long, Instant>> output = new TupleTag<KV<Long, Instant>>() {};
+  private static final TupleTag<KV<String, Duration>> latencyResult =
+      new TupleTag<KV<String, Duration>>() {};
+  private static final Logger LOG = LoggerFactory.getLogger(WatermarkLatency.class);
+
+  public interface WatermarkLatencyOptions extends PipelineOptions {
+    @Default.Integer(3)
+    Integer getNumShuffles();
+
+    void setNumShuffles(Integer value);
+
+    @Default.Integer(1000)
+    Integer getInputRatePerSec();
+
+    void setInputRatePerSec(Integer value);
+
+    @Default.Integer(1000)
+    Integer getNumKeys();
+
+    void setNumKeys(Integer value);
+
+    @Default.String("Default")
+    String getConfigName();
+
+    void setConfigName(String value);
+
+    @Default.String("")
+    String getOutputPath();
+
+    void setOutputPath(String value);
+  }
+
+  static void run(WatermarkLatencyOptions options) {
+    Pipeline p = Pipeline.create(options);
+
+    Duration period = Duration.standardSeconds(1);
+    final int numKeys = options.getNumKeys();
+    final String configName = options.getConfigName();
+
+    PCollection<KV<Long, Instant>> input =
+        p.apply("Generate", GenerateSequence.from(0).withRate(options.getInputRatePerSec(), period))
+            .apply(
+                ParDo.of(
+                    new DoFn<Long, KV<Long, Instant>>() {
+                      @ProcessElement
+                      public void process(ProcessContext c) {
+                        Instant now = Instant.now();
+                        c.output(KV.of(c.element() % numKeys, now));
+                      }
+                    }))
+            .apply(
+                Window.<KV<Long, Instant>>into(FixedWindows.of(Duration.standardSeconds(1)))
+                    .triggering(AfterWatermark.pastEndOfWindow())
+                    .discardingFiredPanes()
+                    .withAllowedLateness(Duration.ZERO));
+
+    PCollectionList<KV<String, Duration>> latencyList =
+        PCollectionList.<KV<String, Duration>>empty(p);
+    for (int i = 0; i < options.getNumShuffles(); i++) {
+      final int idx = i;
+      PCollectionTuple tup =
+          input
+              .apply(GroupByKey.create())
+              .apply(
+                  ParDo.of(
+                          new DoFn<KV<Long, Iterable<Instant>>, KV<Long, Instant>>() {
+                            @ProcessElement
+                            public void process(ProcessContext c, @Timestamp Instant ts) {
+                              Instant now = Instant.now();
+                              Instant lastGBKTs = Instant.ofEpochMilli(0L);
+                              // forward records to next window
+                              for (Instant v : c.element().getValue()) {
+                                lastGBKTs = v;
+                                // enforce re-shuffling by changing keys
+                                c.output(KV.of(c.element().getKey() + 1, now));
+                              }
+                              if (idx > 0) {
+                                // compute delay since last shuffle and emit result to side output
+                                Duration sessionDelay = new Duration(lastGBKTs, now);
+                                c.output(
+                                    latencyResult,
+                                    KV.of(
+                                        String.format("GBK%d-GBK%d", idx - 1, idx), sessionDelay));
+                              }
+                            }
+                          })
+                      .withOutputTags(output, TupleTagList.of(latencyResult)));
+
+      latencyList = latencyList.and(tup.get(latencyResult));
+      input = tup.get(output);
+    }
+
+    PCollectionList<String> collectionList = PCollectionList.empty(p);
+    for (PCollection<KV<String, Duration>> latency : latencyList.getAll()) {
+      collectionList =
+          collectionList.and(
+              latency
+                  .apply(
+                      Window.<KV<String, Duration>>into(
+                              FixedWindows.of(Duration.standardMinutes(1)))
+                          .triggering(AfterWatermark.pastEndOfWindow())
+                          .discardingFiredPanes()
+                          .withAllowedLateness(Duration.ZERO))
+                  .apply(GroupByKey.create())
+                  .apply(
+                      ParDo.of(
+                          new DoFn<KV<String, Iterable<Duration>>, String>() {
+
+                            Duration median = null;
+                            Duration p75 = null;
+                            Duration p95 = null;
+                            Duration p99 = null;
+                            int numElements = -1;
+
+                            @ProcessElement
+                            public void process(ProcessContext c) {
+
+                              computePercentiles(c.element().getValue());
+
+                              if (numElements < 0) {
+                                return;
+                              }
+
+                              String out =
+                                  String.format(
+                                      "%s, %s, %d, %d, %d, %d, %d",
+                                      configName,
+                                      c.element().getKey(),
+                                      median.getMillis(),
+                                      p75.getMillis(),
+                                      p95.getMillis(),
+                                      p99.getMillis(),
+                                      numElements);
+                              LOG.info(out);
+
} + + private void computePercentiles(Iterable vals) { + numElements = -1; + ArrayList accumulator = new ArrayList<>(6000); + + for (Duration v : vals) { + accumulator.add(v); + } + if (accumulator.isEmpty()) { + return; + } + + // Compute the median of the available points. + int medianIndex = (int) Math.floor(accumulator.size() * 0.5); + int p75Index = (int) Math.floor(accumulator.size() * 0.75); + int p95Index = (int) Math.floor(accumulator.size() * 0.95); + int p99Index = (int) Math.floor(accumulator.size() * 0.99); + + if (medianIndex < 0 + || medianIndex >= accumulator.size() + || p75Index < 0 + || p75Index >= accumulator.size() + || p95Index < 0 + || p95Index >= accumulator.size() + || p99Index < 0 + || p99Index >= accumulator.size()) { + LOG.info("Computed bogus index"); + return; + } + + Collections.sort(accumulator); + + median = accumulator.get(medianIndex); + p75 = accumulator.get(p75Index); + p95 = accumulator.get(p95Index); + p99 = accumulator.get(p99Index); + numElements = accumulator.size(); + } + }))); + } + + // Run pipeline + p.run().waitUntilFinish(); + } + + public static void main(String[] args) { + WatermarkLatencyOptions options = + PipelineOptionsFactory.fromArgs(args).withValidation().as(WatermarkLatencyOptions.class); + + run(options); + } +} diff --git a/sdks/java/testing/watermarks/src/main/java/org/apache/beam/sdk/testing/watermarks/package-info.java b/sdks/java/testing/watermarks/src/main/java/org/apache/beam/sdk/testing/watermarks/package-info.java new file mode 100644 index 000000000000..dce1fe73b6ff --- /dev/null +++ b/sdks/java/testing/watermarks/src/main/java/org/apache/beam/sdk/testing/watermarks/package-info.java @@ -0,0 +1,19 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +/** Watermark latency benchmark. */ +package org.apache.beam.sdk.testing.watermarks; diff --git a/sdks/python/MANIFEST.in b/sdks/python/MANIFEST.in index c97e57aa1a16..60b0989d9af7 100644 --- a/sdks/python/MANIFEST.in +++ b/sdks/python/MANIFEST.in @@ -19,3 +19,4 @@ include gen_protos.py include README.md include NOTICE include LICENSE +include LICENSE.python diff --git a/sdks/python/apache_beam/__init__.py b/sdks/python/apache_beam/__init__.py index 24c3aaf89c68..2e5254980174 100644 --- a/sdks/python/apache_beam/__init__.py +++ b/sdks/python/apache_beam/__init__.py @@ -71,9 +71,6 @@ """ -from __future__ import absolute_import - -import os import sys import warnings diff --git a/sdks/python/apache_beam/coders/__init__.py b/sdks/python/apache_beam/coders/__init__.py index 680f1c725cbb..80630994ef9e 100644 --- a/sdks/python/apache_beam/coders/__init__.py +++ b/sdks/python/apache_beam/coders/__init__.py @@ -14,8 +14,6 @@ # See the License for the specific language governing permissions and # limitations under the License. 
# -from __future__ import absolute_import - from apache_beam.coders.coders import * from apache_beam.coders.row_coder import * from apache_beam.coders.typecoders import registry diff --git a/sdks/python/apache_beam/coders/avro_record.py b/sdks/python/apache_beam/coders/avro_record.py index efbaca81b8f4..c9ed26d34eb7 100644 --- a/sdks/python/apache_beam/coders/avro_record.py +++ b/sdks/python/apache_beam/coders/avro_record.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - __all__ = ['AvroRecord'] diff --git a/sdks/python/apache_beam/coders/coder_impl.pxd b/sdks/python/apache_beam/coders/coder_impl.pxd index 66590789b32d..a07672559119 100644 --- a/sdks/python/apache_beam/coders/coder_impl.pxd +++ b/sdks/python/apache_beam/coders/coder_impl.pxd @@ -36,6 +36,7 @@ cdef object loads, dumps, create_InputStream, create_OutputStream, ByteCountingO # Temporarily untyped to allow monkeypatching on failed import. #cdef type WindowedValue +cdef bint is_compiled cdef class CoderImpl(object): cpdef encode_to_stream(self, value, OutputStream stream, bint nested) @@ -65,28 +66,31 @@ cdef class CallbackCoderImpl(CoderImpl): cdef object _size_estimator -cdef class DeterministicFastPrimitivesCoderImpl(CoderImpl): - cdef CoderImpl _underlying_coder - cdef object _step_label - cdef bint _check_safe(self, value) except -1 - - cdef object NoneType cdef unsigned char UNKNOWN_TYPE, NONE_TYPE, INT_TYPE, FLOAT_TYPE, BOOL_TYPE cdef unsigned char BYTES_TYPE, UNICODE_TYPE, LIST_TYPE, TUPLE_TYPE, DICT_TYPE cdef unsigned char SET_TYPE, ITERABLE_LIKE_TYPE +cdef unsigned char PROTO_TYPE, DATACLASS_TYPE, NAMED_TUPLE_TYPE + cdef set _ITERABLE_LIKE_TYPES cdef class FastPrimitivesCoderImpl(StreamCoderImpl): cdef CoderImpl fallback_coder_impl cdef CoderImpl iterable_coder_impl + cdef object requires_deterministic_step_label + cdef bint warn_deterministic_fallback + @cython.locals(dict_value=dict, int_value=libc.stdint.int64_t, unicode_value=unicode) cpdef encode_to_stream(self, value, OutputStream stream, bint nested) @cython.locals(t=int) cpdef decode_from_stream(self, InputStream stream, bint nested) + cdef encode_special_deterministic(self, value, OutputStream stream) + cdef encode_type(self, t, OutputStream stream) + cdef decode_type(self, InputStream stream) +cdef dict _unpickled_types cdef class BytesCoderImpl(CoderImpl): pass diff --git a/sdks/python/apache_beam/coders/coder_impl.py b/sdks/python/apache_beam/coders/coder_impl.py index 6abebd07bc03..618ee5544fca 100644 --- a/sdks/python/apache_beam/coders/coder_impl.py +++ b/sdks/python/apache_beam/coders/coder_impl.py @@ -26,23 +26,19 @@ This module may be optionally compiled with Cython, using the corresponding coder_impl.pxd file for type hints. -Py2/3 porting: Native range is used on both python versions instead of -future.builtins.range to avoid performance regression in Cython compiled code. - For internal use only; no backwards-compatibility guarantees. 
""" # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - +import enum import json -from builtins import chr -from builtins import object +import logging +import pickle from io import BytesIO from typing import TYPE_CHECKING from typing import Any from typing import Callable +from typing import Dict from typing import Iterable from typing import Iterator from typing import List @@ -52,20 +48,25 @@ from typing import Tuple from typing import Type +import dill from fastavro import parse_schema from fastavro import schemaless_reader from fastavro import schemaless_writer -from past.builtins import unicode as past_unicode -from past.builtins import long from apache_beam.coders import observable from apache_beam.coders.avro_record import AvroRecord +from apache_beam.utils import proto_utils from apache_beam.utils import windowed_value from apache_beam.utils.sharded_key import ShardedKey from apache_beam.utils.timestamp import MAX_TIMESTAMP from apache_beam.utils.timestamp import MIN_TIMESTAMP from apache_beam.utils.timestamp import Timestamp +try: + import dataclasses +except ImportError: + dataclasses = None # type: ignore + if TYPE_CHECKING: from apache_beam.transforms import userstate from apache_beam.transforms.window import IntervalWindow @@ -77,18 +78,20 @@ else: SLOW_STREAM = False +is_compiled = False +fits_in_64_bits = lambda x: -(1 << 63) <= x <= (1 << 63) - 1 + if TYPE_CHECKING or SLOW_STREAM: from .slow_stream import InputStream as create_InputStream from .slow_stream import OutputStream as create_OutputStream from .slow_stream import ByteCountingOutputStream from .slow_stream import get_varint_size - if False: # pylint: disable=using-constant-test - # This clause is interpreted by the compiler. - from cython import compiled as is_compiled - else: - is_compiled = False - fits_in_64_bits = lambda x: -(1 << 63) <= x <= (1 << 63) - 1 + try: + import cython + is_compiled = cython.compiled + except ImportError: + pass else: # pylint: disable=wrong-import-order, wrong-import-position, ungrouped-imports @@ -103,6 +106,8 @@ globals()['ByteCountingOutputStream'] = ByteCountingOutputStream # pylint: enable=wrong-import-order, wrong-import-position, ungrouped-imports +_LOGGER = logging.getLogger(__name__) + _TIME_SHIFT = 1 << 63 MIN_TIMESTAMP_micros = MIN_TIMESTAMP.micros MAX_TIMESTAMP_micros = MAX_TIMESTAMP.micros @@ -284,70 +289,24 @@ def __repr__(self): self._encoder, self._decoder) -class DeterministicFastPrimitivesCoderImpl(CoderImpl): - """For internal use only; no backwards-compatibility guarantees.""" - def __init__(self, coder, step_label): - self._underlying_coder = coder - self._step_label = step_label - - def _check_safe(self, value): - if isinstance(value, (bytes, past_unicode, long, int, float)): - pass - elif value is None: - pass - elif isinstance(value, (tuple, list)): - for x in value: - self._check_safe(x) - else: - raise TypeError( - "Unable to deterministically code '%s' of type '%s', " - "please provide a type hint for the input of '%s'" % - (value, type(value), self._step_label)) - - def encode_to_stream(self, value, stream, nested): - # type: (Any, create_OutputStream, bool) -> None - self._check_safe(value) - return self._underlying_coder.encode_to_stream(value, stream, nested) - - def decode_from_stream(self, stream, nested): - # type: (create_InputStream, bool) -> Any - return self._underlying_coder.decode_from_stream(stream, nested) - - def encode(self, value): - self._check_safe(value) - return 
self._underlying_coder.encode(value) - - def decode(self, encoded): - return self._underlying_coder.decode(encoded) - - def estimate_size(self, value, nested=False): - # type: (Any, bool) -> int - return self._underlying_coder.estimate_size(value, nested) - - def get_estimated_size_and_observables(self, value, nested=False): - # type: (Any, bool) -> Tuple[int, Observables] - return self._underlying_coder.get_estimated_size_and_observables( - value, nested) - - class ProtoCoderImpl(SimpleCoderImpl): """For internal use only; no backwards-compatibility guarantees.""" def __init__(self, proto_message_type): self.proto_message_type = proto_message_type def encode(self, value): - return value.SerializeToString() + return value.SerializePartialToString() def decode(self, encoded): proto_message = self.proto_message_type() - proto_message.ParseFromString(encoded) + proto_message.ParseFromString(encoded) # This is in effect "ParsePartial". return proto_message class DeterministicProtoCoderImpl(ProtoCoderImpl): """For internal use only; no backwards-compatibility guarantees.""" def encode(self, value): - return value.SerializeToString(deterministic=True) + return value.SerializePartialToString(deterministic=True) UNKNOWN_TYPE = 0xFF @@ -363,6 +322,12 @@ def encode(self, value): SET_TYPE = 8 ITERABLE_LIKE_TYPE = 10 +PROTO_TYPE = 100 +DATACLASS_TYPE = 101 +NAMED_TUPLE_TYPE = 102 +ENUM_TYPE = 103 +NESTED_STATE_TYPE = 104 + # Types that can be encoded as iterables, but are not literally # lists, etc. due to being lazy. The actual type is not preserved # through encoding, only the elements. This is particularly useful @@ -372,9 +337,12 @@ def encode(self, value): class FastPrimitivesCoderImpl(StreamCoderImpl): """For internal use only; no backwards-compatibility guarantees.""" - def __init__(self, fallback_coder_impl): + def __init__( + self, fallback_coder_impl, requires_deterministic_step_label=None): self.fallback_coder_impl = fallback_coder_impl self.iterable_coder_impl = IterableCoderImpl(self) + self.requires_deterministic_step_label = requires_deterministic_step_label + self.warn_deterministic_fallback = True @staticmethod def register_iterable_like_type(t): @@ -418,16 +386,25 @@ def encode_to_stream(self, value, stream, nested): elif t is bytes: stream.write_byte(BYTES_TYPE) stream.write(value, nested) - elif t is past_unicode: + elif t is str: unicode_value = value # for typing stream.write_byte(UNICODE_TYPE) stream.write(unicode_value.encode('utf-8'), nested) - elif t is list or t is tuple or t is set: - stream.write_byte( - LIST_TYPE if t is list else TUPLE_TYPE if t is tuple else SET_TYPE) + elif t is list or t is tuple: + stream.write_byte(LIST_TYPE if t is list else TUPLE_TYPE) stream.write_var_int64(len(value)) for e in value: self.encode_to_stream(e, stream, True) + elif t is bool: + stream.write_byte(BOOL_TYPE) + stream.write_byte(value) + elif t in _ITERABLE_LIKE_TYPES: + stream.write_byte(ITERABLE_LIKE_TYPE) + self.iterable_coder_impl.encode_to_stream(value, stream, nested) + # All deterministic encodings should be above this clause, + # all non-deterministic ones below. 
+ elif self.requires_deterministic_step_label is not None: + self.encode_special_deterministic(value, stream) elif t is dict: dict_value = value # for typing stream.write_byte(DICT_TYPE) @@ -435,16 +412,85 @@ def encode_to_stream(self, value, stream, nested): for k, v in dict_value.items(): self.encode_to_stream(k, stream, True) self.encode_to_stream(v, stream, True) - elif t is bool: - stream.write_byte(BOOL_TYPE) - stream.write_byte(value) - elif t in _ITERABLE_LIKE_TYPES: - stream.write_byte(ITERABLE_LIKE_TYPE) - self.iterable_coder_impl.encode_to_stream(value, stream, nested) + elif t is set: + stream.write_byte(SET_TYPE) + stream.write_var_int64(len(value)) + for e in value: + self.encode_to_stream(e, stream, True) else: stream.write_byte(UNKNOWN_TYPE) self.fallback_coder_impl.encode_to_stream(value, stream, nested) + def encode_special_deterministic(self, value, stream): + if self.warn_deterministic_fallback: + _LOGGER.warning( + "Using fallback deterministic coder for type '%s' in '%s'. ", + type(value), + self.requires_deterministic_step_label) + self.warn_deterministic_fallback = False + if isinstance(value, proto_utils.message_types): + stream.write_byte(PROTO_TYPE) + self.encode_type(type(value), stream) + stream.write(value.SerializePartialToString(deterministic=True), True) + elif dataclasses and dataclasses.is_dataclass(value): + stream.write_byte(DATACLASS_TYPE) + if not type(value).__dataclass_params__.frozen: + raise TypeError( + "Unable to deterministically encode non-frozen '%s' of type '%s' " + "for the input of '%s'" % + (value, type(value), self.requires_deterministic_step_label)) + self.encode_type(type(value), stream) + values = [ + getattr(value, field.name) for field in dataclasses.fields(value) + ] + try: + self.iterable_coder_impl.encode_to_stream(values, stream, True) + except Exception as e: + raise TypeError(self._deterministic_encoding_error_msg(value)) from e + elif isinstance(value, tuple) and hasattr(type(value), '_fields'): + stream.write_byte(NAMED_TUPLE_TYPE) + self.encode_type(type(value), stream) + try: + self.iterable_coder_impl.encode_to_stream(value, stream, True) + except Exception as e: + raise TypeError(self._deterministic_encoding_error_msg(value)) from e + elif isinstance(value, enum.Enum): + stream.write_byte(ENUM_TYPE) + self.encode_type(type(value), stream) + # Enum values can be of any type. + try: + self.encode_to_stream(value.value, stream, True) + except Exception as e: + raise TypeError(self._deterministic_encoding_error_msg(value)) from e + elif hasattr(value, "__getstate__"): + if not hasattr(value, "__setstate__"): + raise TypeError( + "Unable to deterministically encode '%s' of type '%s', " + "for the input of '%s'. The object defines __getstate__ but not " + "__setstate__." 
% + (value, type(value), self.requires_deterministic_step_label)) + stream.write_byte(NESTED_STATE_TYPE) + self.encode_type(type(value), stream) + state_value = value.__getstate__() + try: + self.encode_to_stream(state_value, stream, True) + except Exception as e: + raise TypeError(self._deterministic_encoding_error_msg(value)) from e + else: + raise TypeError(self._deterministic_encoding_error_msg(value)) + + def _deterministic_encoding_error_msg(self, value): + return ( + "Unable to deterministically encode '%s' of type '%s', " + "please provide a type hint for the input of '%s'" % + (value, type(value), self.requires_deterministic_step_label)) + + def encode_type(self, t, stream): + stream.write(dill.dumps(t), True) + + def decode_type(self, stream): + return _unpickle_type(stream.read_all(True)) + def decode_from_stream(self, stream, nested): # type: (create_InputStream, bool) -> Any t = stream.read_byte() @@ -477,12 +523,49 @@ def decode_from_stream(self, stream, nested): return not not stream.read_byte() elif t == ITERABLE_LIKE_TYPE: return self.iterable_coder_impl.decode_from_stream(stream, nested) + elif t == PROTO_TYPE: + cls = self.decode_type(stream) + msg = cls() + msg.ParseFromString(stream.read_all(True)) + return msg + elif t == DATACLASS_TYPE or t == NAMED_TUPLE_TYPE: + cls = self.decode_type(stream) + return cls(*self.iterable_coder_impl.decode_from_stream(stream, True)) + elif t == ENUM_TYPE: + cls = self.decode_type(stream) + return cls(self.decode_from_stream(stream, True)) + elif t == NESTED_STATE_TYPE: + cls = self.decode_type(stream) + state = self.decode_from_stream(stream, True) + value = cls.__new__(cls) + value.__setstate__(state) + return value elif t == UNKNOWN_TYPE: return self.fallback_coder_impl.decode_from_stream(stream, nested) else: raise ValueError('Unknown type tag %x' % t) +_unpickled_types = {} # type: Dict[bytes, type] + + +def _unpickle_type(bs): + t = _unpickled_types.get(bs, None) + if t is None: + t = _unpickled_types[bs] = dill.loads(bs) + # Fix unpicklable anonymous named tuples for Python 3.6. + if t.__base__ is tuple and hasattr(t, '_fields'): + try: + pickle.loads(pickle.dumps(t)) + except pickle.PicklingError: + t.__reduce__ = lambda self: (_unpickle_named_tuple, (bs, tuple(self))) + return t + + +def _unpickle_named_tuple(bs, items): + return _unpickle_type(bs)(*items) + + class BytesCoderImpl(CoderImpl): """For internal use only; no backwards-compatibility guarantees. @@ -776,7 +859,7 @@ def decode_from_stream(self, in_stream, nested): class VarIntCoderImpl(StreamCoderImpl): """For internal use only; no backwards-compatibility guarantees. - A coder for long/int objects.""" + A coder for int objects.""" def encode_to_stream(self, value, out, nested): # type: (int, create_OutputStream, bool) -> None out.write_var_int64(value) @@ -908,7 +991,7 @@ def decode(self, encoded): class TupleCoderImpl(AbstractComponentCoderImpl): """A coder for tuple objects.""" def _extract_components(self, value): - return value + return tuple(value) def _construct_from_components(self, components): return tuple(components) @@ -1400,3 +1483,31 @@ def estimate_size(self, value, nested=False): estimated_size += ( self._key_coder_impl.estimate_size(value.key, nested=True)) return estimated_size + + +class TimestampPrefixingWindowCoderImpl(StreamCoderImpl): + """For internal use only; no backwards-compatibility guarantees. + + A coder for custom window types, which prefix required max_timestamp to + encoded original window. 
+ + The coder encodes and decodes custom window types with following format: + window's max_timestamp() + encoded window using it's own coder. + """ + def __init__(self, window_coder_impl: CoderImpl) -> None: + self._window_coder_impl = window_coder_impl + + def encode_to_stream(self, value, stream, nested): + TimestampCoderImpl().encode_to_stream(value.max_timestamp(), stream, nested) + self._window_coder_impl.encode_to_stream(value, stream, nested) + + def decode_from_stream(self, stream, nested): + TimestampCoderImpl().decode_from_stream(stream, nested) + return self._window_coder_impl.decode_from_stream(stream, nested) + + def estimate_size(self, value: Any, nested: bool = False) -> int: + estimated_size = 0 + estimated_size += TimestampCoderImpl().estimate_size(value) + estimated_size += self._window_coder_impl.estimate_size(value, nested) + return estimated_size diff --git a/sdks/python/apache_beam/coders/coders.py b/sdks/python/apache_beam/coders/coders.py index 6058bf1f4b04..05f7a9d27377 100644 --- a/sdks/python/apache_beam/coders/coders.py +++ b/sdks/python/apache_beam/coders/coders.py @@ -18,14 +18,27 @@ """Collection of useful coders. Only those coders listed in __all__ are part of the public API of this module. + +## On usage of `pickle`, `dill` and `pickler` in Beam + +In Beam, we generally we use `pickle` for pipeline elements and `dill` for +more complex types, like user functions. + +`pickler` is Beam's own wrapping of dill + compression + error handling. +It serves also as an API to mask the actual encoding layer (so we can +change it from `dill` if necessary). + +We created `_MemoizingPickleCoder` to improve performance when serializing +complex user types for the execution of SDF. Specifically to address +BEAM-12781, where many identical `BoundedSource` instances are being +encoded. + """ # pytype: skip-file -from __future__ import absolute_import - import base64 -import sys -from builtins import object +import pickle +from functools import lru_cache from typing import TYPE_CHECKING from typing import Any from typing import Callable @@ -37,12 +50,9 @@ from typing import Tuple from typing import Type from typing import TypeVar -from typing import Union from typing import overload import google.protobuf.wrappers_pb2 -from future.moves import pickle -from past.builtins import unicode from apache_beam.coders import coder_impl from apache_beam.coders.avro_record import AvroRecord @@ -84,12 +94,13 @@ 'FastPrimitivesCoder', 'FloatCoder', 'IterableCoder', + 'ListCoder', 'MapCoder', 'NullableCoder', 'PickleCoder', 'ProtoCoder', - 'SingletonCoder', 'ShardedKeyCoder', + 'SingletonCoder', 'StrUtf8Coder', 'TimestampCoder', 'TupleCoder', @@ -167,7 +178,9 @@ def as_deterministic_coder(self, step_label, error_message=None): if self.is_deterministic(): return self else: - raise ValueError(error_message or "'%s' cannot be made deterministic.") + raise ValueError( + error_message or + "%s cannot be made deterministic for '%s'." % (self, step_label)) def estimate_size(self, value): """Estimates the encoded size of the given value, in bytes. @@ -223,7 +236,7 @@ def _dict_without_impl(self): return self.__dict__ def to_type_hint(self): - raise NotImplementedError('BEAM-2717') + raise NotImplementedError('BEAM-2717: %s' % self.__class__.__name__) @classmethod def from_type_hint(cls, unused_typehint, unused_registry): @@ -293,10 +306,6 @@ def __eq__(self, other): # pylint: enable=protected-access - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. 
- return not self == other - def __hash__(self): return hash(type(self)) @@ -359,26 +368,17 @@ def to_runner_api(self, context): @classmethod def from_runner_api(cls, coder_proto, context): - # type: (Type[CoderT], beam_runner_api_pb2.Coder, PipelineContext) -> Union[CoderT, ExternalCoder] + # type: (Type[CoderT], beam_runner_api_pb2.Coder, PipelineContext) -> CoderT """Converts from an FunctionSpec to a Fn object. Prefer registering a urn with its parameter type and constructor. """ - if (context.allow_proto_holders and - coder_proto.spec.urn not in cls._known_urns): - # We hold this in proto form since there's no coder available in Python - # SDK. - # This is potentially a coder that is only available in an external SDK. - return ExternalCoder(coder_proto) - else: - parameter_type, constructor = cls._known_urns[coder_proto.spec.urn] - return constructor( - proto_utils.parse_Bytes(coder_proto.spec.payload, parameter_type), [ - context.coders.get_by_id(c) - for c in coder_proto.component_coder_ids - ], - context) + parameter_type, constructor = cls._known_urns[coder_proto.spec.urn] + return constructor( + proto_utils.parse_Bytes(coder_proto.spec.payload, parameter_type), + [context.coders.get_by_id(c) for c in coder_proto.component_coder_ids], + context) def to_runner_api_parameter(self, context): # type: (Optional[PipelineContext]) -> Tuple[str, Any, Sequence[Coder]] @@ -428,7 +428,7 @@ def is_deterministic(self): return True def to_type_hint(self): - return unicode + return str Coder.register_structured_urn(common_urns.coders.STRING_UTF8.urn, StrUtf8Coder) @@ -436,19 +436,8 @@ def to_type_hint(self): class ToBytesCoder(Coder): """A default string coder used if no sink coder is specified.""" - - if sys.version_info.major == 2: - - def encode(self, value): - # pylint: disable=unicode-builtin - return ( - value.encode('utf-8') if isinstance(value, unicode) # noqa: F821 - else str(value)) - - else: - - def encode(self, value): - return value if isinstance(value, bytes) else str(value).encode('utf-8') + def encode(self, value): + return value if isinstance(value, bytes) else str(value).encode('utf-8') def decode(self, _): raise NotImplementedError('ToBytesCoder cannot be used for decoding.') @@ -542,6 +531,13 @@ def _create_impl(self): return coder_impl.MapCoderImpl( self._key_coder.get_impl(), self._value_coder.get_impl()) + @classmethod + def from_type_hint(cls, typehint, registry): + # type: (typehints.DictConstraint, CoderRegistry) -> MapCoder + return cls( + registry.get_coder(typehint.key_type), + registry.get_coder(typehint.value_type)) + def to_type_hint(self): return typehints.Dict[self._key_coder.to_type_hint(), self._value_coder.to_type_hint()] @@ -559,6 +555,9 @@ def __eq__(self, other): def __hash__(self): return hash(type(self)) + hash(self._key_coder) + hash(self._value_coder) + def __repr__(self): + return 'MapCoder[%s, %s]' % (self._key_coder, self._value_coder) + class NullableCoder(FastCoder): def __init__(self, value_coder): @@ -765,6 +764,33 @@ def __hash__(self): return hash(type(self)) +class _MemoizingPickleCoder(_PickleCoderBase): + """Coder using Python's pickle functionality with memoization.""" + def __init__(self, cache_size=16): + super(_MemoizingPickleCoder, self).__init__() + self.cache_size = cache_size + + def _create_impl(self): + from apache_beam.internal import pickler + dumps = pickler.dumps + + mdumps = lru_cache(maxsize=self.cache_size, typed=True)(dumps) + + def _nonhashable_dumps(x): + try: + return mdumps(x) + except TypeError: + return dumps(x) + + 
return coder_impl.CallbackCoderImpl(_nonhashable_dumps, pickler.loads) + + def as_deterministic_coder(self, step_label, error_message=None): + return FastPrimitivesCoder(self, requires_deterministic=step_label) + + def to_type_hint(self): + return Any + + class PickleCoder(_PickleCoderBase): """Coder using Python's pickle functionality.""" def _create_impl(self): @@ -774,7 +800,7 @@ def _create_impl(self): lambda x: dumps(x, protocol), pickle.loads) def as_deterministic_coder(self, step_label, error_message=None): - return DeterministicFastPrimitivesCoder(self, step_label) + return FastPrimitivesCoder(self, requires_deterministic=step_label) def to_type_hint(self): return Any @@ -793,8 +819,9 @@ def __init__(self, coder, step_label): self._step_label = step_label def _create_impl(self): - return coder_impl.DeterministicFastPrimitivesCoderImpl( - self._underlying_coder.get_impl(), self._step_label) + return coder_impl.FastPrimitivesCoderImpl( + self._underlying_coder.get_impl(), + requires_deterministic_step_label=self._step_label) def is_deterministic(self): # type: () -> bool @@ -874,6 +901,16 @@ def __hash__(self): return hash(type(self)) +class FakeDeterministicFastPrimitivesCoder(FastPrimitivesCoder): + """A FastPrimitivesCoder that claims to be deterministic. + + This can be registered as a fallback coder to go back to the behavior before + deterministic encoding was enforced (BEAM-11719). + """ + def is_deterministic(self): + return True + + class Base64PickleCoder(Coder): """Coder of objects by Python pickle, then base64 encoding.""" @@ -945,10 +982,10 @@ def __eq__(self, other): def __hash__(self): return hash(self.proto_message_type) - @staticmethod - def from_type_hint(typehint, unused_registry): - if issubclass(typehint, google.protobuf.message.Message): - return ProtoCoder(typehint) + @classmethod + def from_type_hint(cls, typehint, unused_registry): + if issubclass(typehint, proto_utils.message_types): + return cls(typehint) else: raise ValueError(( 'Expected a subclass of google.protobuf.message.Message' @@ -1035,10 +1072,10 @@ def as_deterministic_coder(self, step_label, error_message=None): def to_type_hint(self): return typehints.Tuple[tuple(c.to_type_hint() for c in self._coders)] - @staticmethod - def from_type_hint(typehint, registry): + @classmethod + def from_type_hint(cls, typehint, registry): # type: (typehints.TupleConstraint, CoderRegistry) -> TupleCoder - return TupleCoder([registry.get_coder(t) for t in typehint.tuple_types]) + return cls([registry.get_coder(t) for t in typehint.tuple_types]) def as_cloud_object(self, coders_context=None): if self.is_kv_coder(): @@ -1122,10 +1159,10 @@ def as_deterministic_coder(self, step_label, error_message=None): return TupleSequenceCoder( self._elem_coder.as_deterministic_coder(step_label, error_message)) - @staticmethod - def from_type_hint(typehint, registry): + @classmethod + def from_type_hint(cls, typehint, registry): # type: (Any, CoderRegistry) -> TupleSequenceCoder - return TupleSequenceCoder(registry.get_coder(typehint.inner_type)) + return cls(registry.get_coder(typehint.inner_type)) def _get_component_coders(self): # type: () -> Tuple[Coder, ...] 
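To make the effect of the deterministic-coder changes above concrete, here is a minimal sketch (not part of this diff; the step label 'MyStep' and the Point tuple are invented for illustration). PickleCoder.as_deterministic_coder now returns a FastPrimitivesCoder in deterministic mode, which round-trips named tuples, frozen dataclasses and enums, and still rejects plain dicts, mirroring the updated coders_test_common.py tests:

import collections

from apache_beam.coders import coders

Point = collections.namedtuple('Point', ['x', 'y'])

# PickleCoder.as_deterministic_coder(...) now yields a FastPrimitivesCoder whose
# impl falls through to encode_special_deterministic for otherwise-unknown types.
det = coders.PickleCoder().as_deterministic_coder('MyStep')
assert det.decode(det.encode(Point(1, 2))) == Point(1, 2)  # named tuples now encode

try:
  det.encode({'a': 'b'})  # unordered dicts are still rejected
except TypeError as exc:
  print(exc)

# The memoizing variant caches pickled payloads for repeated identical values.
memo = coders._MemoizingPickleCoder()
assert memo.decode(memo.encode(Point(1, 2))) == Point(1, 2)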
@@ -1142,7 +1179,7 @@ def __hash__(self): return hash((type(self), self._elem_coder)) -class IterableCoder(FastCoder): +class ListLikeCoder(FastCoder): """Coder of iterables of homogeneous objects.""" def __init__(self, elem_coder): # type: (Coder) -> None @@ -1159,7 +1196,7 @@ def as_deterministic_coder(self, step_label, error_message=None): if self.is_deterministic(): return self else: - return IterableCoder( + return type(self)( self._elem_coder.as_deterministic_coder(step_label, error_message)) def as_cloud_object(self, coders_context=None): @@ -1174,20 +1211,17 @@ def as_cloud_object(self, coders_context=None): def value_coder(self): return self._elem_coder - def to_type_hint(self): - return typehints.Iterable[self._elem_coder.to_type_hint()] - - @staticmethod - def from_type_hint(typehint, registry): - # type: (Any, CoderRegistry) -> IterableCoder - return IterableCoder(registry.get_coder(typehint.inner_type)) + @classmethod + def from_type_hint(cls, typehint, registry): + # type: (Any, CoderRegistry) -> ListLikeCoder + return cls(registry.get_coder(typehint.inner_type)) def _get_component_coders(self): # type: () -> Tuple[Coder, ...] return (self._elem_coder, ) def __repr__(self): - return 'IterableCoder[%r]' % self._elem_coder + return '%s[%r]' % (self.__class__.__name__, self._elem_coder) def __eq__(self, other): return ( @@ -1197,9 +1231,21 @@ def __hash__(self): return hash((type(self), self._elem_coder)) +class IterableCoder(ListLikeCoder): + """Coder of iterables of homogeneous objects.""" + def to_type_hint(self): + return typehints.Iterable[self._elem_coder.to_type_hint()] + + Coder.register_structured_urn(common_urns.coders.ITERABLE.urn, IterableCoder) +class ListCoder(ListLikeCoder): + """Coder of Python lists.""" + def to_type_hint(self): + return typehints.List[self._elem_coder.to_type_hint()] + + class GlobalWindowCoder(SingletonCoder): """Coder for global windows.""" def __init__(self): @@ -1491,11 +1537,11 @@ def to_type_hint(self): return sharded_key_type.ShardedKeyTypeConstraint( self._key_coder.to_type_hint()) - @staticmethod - def from_type_hint(typehint, registry): + @classmethod + def from_type_hint(cls, typehint, registry): from apache_beam.typehints import sharded_key_type if isinstance(typehint, sharded_key_type.ShardedKeyTypeConstraint): - return ShardedKeyCoder(registry.get_coder(typehint.key_type)) + return cls(registry.get_coder(typehint.key_type)) else: raise ValueError(( 'Expected an instance of ShardedKeyTypeConstraint' @@ -1507,53 +1553,53 @@ def __eq__(self, other): def __hash__(self): return hash(type(self)) + hash(self._key_coder) + def __repr__(self): + return 'ShardedKeyCoder[%s]' % self._key_coder + Coder.register_structured_urn( common_urns.coders.SHARDED_KEY.urn, ShardedKeyCoder) -class CoderElementType(typehints.TypeConstraint): - """An element type that just holds a coder.""" - def __init__(self, coder): - self.coder = coder +class TimestampPrefixingWindowCoder(FastCoder): + """For internal use only; no backwards-compatibility guarantees. + + Coder which prefixes the max timestamp of arbitrary window to its encoded + form.""" + def __init__(self, window_coder: Coder) -> None: + self._window_coder = window_coder + + def _create_impl(self): + return coder_impl.TimestampPrefixingWindowCoderImpl( + self._window_coder.get_impl()) + def to_type_hint(self): + return self._window_coder.to_type_hint() -class ExternalCoder(Coder): - """A `Coder` that holds a runner API `Coder` proto. 
+ def _get_component_coders(self) -> List[Coder]: + return [self._window_coder] - This is used for coders for which corresponding objects cannot be - initialized in Python SDK. For example, coders for remote SDKs that may - be available in Python SDK transform graph when expanding a cross-language - transform. - """ - def __init__(self, coder_proto): - self._coder_proto = coder_proto + def is_deterministic(self) -> bool: + return self._window_coder.is_deterministic() def as_cloud_object(self, coders_context=None): - if not coders_context: - raise Exception( - 'coders_context must be specified to correctly encode external coders' - ) - coder_id = coders_context.get_by_proto(self._coder_proto, deduplicate=True) - - # 'kind:external' is just a placeholder kind. Dataflow will get the actual - # coder from pipeline proto using the pipeline_proto_coder_id property. - return {'@type': 'kind:external', 'pipeline_proto_coder_id': coder_id} + return { + '@type': 'kind:custom_window', + 'component_encodings': [ + self._window_coder.as_cloud_object(coders_context) + ], + } - @staticmethod - def from_type_hint(typehint, unused_registry): - if isinstance(typehint, CoderElementType): - return typehint.coder - else: - raise ValueError(( - 'Expected an instance of CoderElementType' - ', but got a %s' % typehint)) + def __repr__(self): + return 'TimestampPrefixingWindowCoder[%r]' % self._window_coder - def to_runner_api_parameter(self, context): + def __eq__(self, other): return ( - self._coder_proto.spec.urn, - self._coder_proto.spec.payload, - self._coder_proto.component_coder_ids) + type(self) == type(other) and self._window_coder == other._window_coder) - def to_type_hint(self): - return CoderElementType(self) + def __hash__(self): + return hash((type(self), self._window_coder)) + + +Coder.register_structured_urn( + common_urns.coders.CUSTOM_WINDOW.urn, TimestampPrefixingWindowCoder) diff --git a/sdks/python/apache_beam/coders/coders_test.py b/sdks/python/apache_beam/coders/coders_test.py index 084e0d17e894..42fb3a3e5e8c 100644 --- a/sdks/python/apache_beam/coders/coders_test.py +++ b/sdks/python/apache_beam/coders/coders_test.py @@ -16,12 +16,9 @@ # # pytype: skip-file -from __future__ import absolute_import - import base64 import logging import unittest -from builtins import object from apache_beam.coders import proto2_coder_test_messages_pb2 as test_message from apache_beam.coders import coders @@ -167,10 +164,6 @@ def __eq__(self, other): return True return False - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - def __hash__(self): return hash(type(self)) diff --git a/sdks/python/apache_beam/coders/coders_test_common.py b/sdks/python/apache_beam/coders/coders_test_common.py index 7ff026e4aa14..44ba7493e4b7 100644 --- a/sdks/python/apache_beam/coders/coders_test_common.py +++ b/sdks/python/apache_beam/coders/coders_test_common.py @@ -18,13 +18,14 @@ """Tests common to all coder implementations.""" # pytype: skip-file -from __future__ import absolute_import - +import collections +import enum import logging import math -import sys import unittest -from builtins import range +from typing import Any +from typing import List +from typing import NamedTuple import pytest @@ -45,6 +46,41 @@ from . 
import observable +try: + import dataclasses +except ImportError: + dataclasses = None # type: ignore + +MyNamedTuple = collections.namedtuple('A', ['x', 'y']) +MyTypedNamedTuple = NamedTuple('MyTypedNamedTuple', [('f1', int), ('f2', str)]) + + +class MyEnum(enum.Enum): + E1 = 5 + E2 = enum.auto() + E3 = 'abc' + + +MyIntEnum = enum.IntEnum('MyIntEnum', 'I1 I2 I3') +MyIntFlag = enum.IntFlag('MyIntFlag', 'F1 F2 F3') +MyFlag = enum.Flag('MyFlag', 'F1 F2 F3') # pylint: disable=too-many-function-args + + +class DefinesGetState: + def __init__(self, value): + self.value = value + + def __getstate__(self): + return self.value + + def __eq__(self, other): + return type(other) is type(self) and other.value == self.value + + +class DefinesGetAndSetState(DefinesGetState): + def __setstate__(self, value): + self.value = value + # Defined out of line for picklability. class CustomCoder(coders.Coder): @@ -55,6 +91,19 @@ def decode(self, encoded): return int(encoded) - 1 +if dataclasses is not None: + + @dataclasses.dataclass(frozen=True) + class FrozenDataClass: + a: Any + b: int + + @dataclasses.dataclass + class UnFrozenDataClass: + x: int + y: int + + # These tests need to all be run in the same process due to the asserts # in tearDownClass. @pytest.mark.no_xdist @@ -63,13 +112,38 @@ class CodersTest(unittest.TestCase): # These class methods ensure that we test each defined coder in both # nested and unnested context. + # Common test values representing Python's built-in types. + test_values_deterministic: List[Any] = [ + None, + 1, + -1, + 1.5, + b'str\0str', + u'unicode\0\u0101', + (), + (1, 2, 3), + [], + [1, 2, 3], + True, + False, + ] + test_values = test_values_deterministic + [ + dict(), + { + 'a': 'b' + }, + { + 0: dict(), 1: len + }, + set(), + {'a', 'b'}, + len, + ] + @classmethod def setUpClass(cls): cls.seen = set() cls.seen_nested = set() - # Method has been renamed in Python 3 - if sys.version_info[0] < 3: - cls.assertCountEqual = cls.assertItemsEqual @classmethod def tearDownClass(cls): @@ -80,8 +154,8 @@ def tearDownClass(cls): coders.Coder, coders.AvroGenericCoder, coders.DeterministicProtoCoder, - coders.ExternalCoder, coders.FastCoder, + coders.ListLikeCoder, coders.ProtoCoder, coders.ToBytesCoder ]) @@ -130,12 +204,24 @@ def test_custom_coder(self): (-10, b'b'), (5, b'c')) def test_pickle_coder(self): - self.check_coder(coders.PickleCoder(), 'a', 1, 1.5, (1, 2, 3)) + coder = coders.PickleCoder() + self.check_coder(coder, *self.test_values) + + def test_memoizing_pickle_coder(self): + coder = coders._MemoizingPickleCoder() + self.check_coder(coder, *self.test_values) def test_deterministic_coder(self): coder = coders.FastPrimitivesCoder() deterministic_coder = coders.DeterministicFastPrimitivesCoder(coder, 'step') - self.check_coder(deterministic_coder, 'a', 1, 1.5, (1, 2, 3)) + self.check_coder(deterministic_coder, *self.test_values_deterministic) + for v in self.test_values_deterministic: + self.check_coder(coders.TupleCoder((deterministic_coder, )), (v, )) + self.check_coder( + coders.TupleCoder( + (deterministic_coder, ) * len(self.test_values_deterministic)), + tuple(self.test_values_deterministic)) + with self.assertRaises(TypeError): self.check_coder(deterministic_coder, dict()) with self.assertRaises(TypeError): @@ -145,6 +231,37 @@ def test_deterministic_coder(self): coders.TupleCoder((deterministic_coder, coder)), (1, dict()), ('a', [dict()])) + self.check_coder(deterministic_coder, test_message.MessageA(field1='value')) + + self.check_coder( + deterministic_coder, 
[MyNamedTuple(1, 2), MyTypedNamedTuple(1, 'a')]) + + if dataclasses is not None: + self.check_coder(deterministic_coder, FrozenDataClass(1, 2)) + + with self.assertRaises(TypeError): + self.check_coder(deterministic_coder, UnFrozenDataClass(1, 2)) + with self.assertRaises(TypeError): + self.check_coder( + deterministic_coder, FrozenDataClass(UnFrozenDataClass(1, 2), 3)) + with self.assertRaises(TypeError): + self.check_coder( + deterministic_coder, MyNamedTuple(UnFrozenDataClass(1, 2), 3)) + + self.check_coder(deterministic_coder, list(MyEnum)) + self.check_coder(deterministic_coder, list(MyIntEnum)) + self.check_coder(deterministic_coder, list(MyIntFlag)) + self.check_coder(deterministic_coder, list(MyFlag)) + + self.check_coder( + deterministic_coder, + [DefinesGetAndSetState(1), DefinesGetAndSetState((1, 2, 3))]) + + with self.assertRaises(TypeError): + self.check_coder(deterministic_coder, DefinesGetState(1)) + with self.assertRaises(TypeError): + self.check_coder(deterministic_coder, DefinesGetAndSetState(dict())) + def test_dill_coder(self): cell_value = (lambda x: lambda: x)(0).__closure__[0] self.check_coder(coders.DillCoder(), 'a', 1, cell_value) @@ -154,19 +271,20 @@ def test_dill_coder(self): def test_fast_primitives_coder(self): coder = coders.FastPrimitivesCoder(coders.SingletonCoder(len)) - self.check_coder(coder, None, 1, -1, 1.5, b'str\0str', u'unicode\0\u0101') - self.check_coder(coder, (), (1, 2, 3)) - self.check_coder(coder, [], [1, 2, 3]) - self.check_coder(coder, dict(), {'a': 'b'}, {0: dict(), 1: len}) - self.check_coder(coder, set(), {'a', 'b'}) - self.check_coder(coder, True, False) - self.check_coder(coder, len) - self.check_coder(coders.TupleCoder((coder, )), ('a', ), (1, )) + self.check_coder(coder, *self.test_values) + for v in self.test_values: + self.check_coder(coders.TupleCoder((coder, )), (v, )) def test_fast_primitives_coder_large_int(self): coder = coders.FastPrimitivesCoder() self.check_coder(coder, 10**100) + def test_fake_deterministic_fast_primitives_coder(self): + coder = coders.FakeDeterministicFastPrimitivesCoder(coders.PickleCoder()) + self.check_coder(coder, *self.test_values) + for v in self.test_values: + self.check_coder(coders.TupleCoder((coder, )), (v, )) + def test_bytes_coder(self): self.check_coder(coders.BytesCoder(), b'a', b'\0', b'z' * 1000) @@ -329,6 +447,14 @@ def iter_generator(count): list(iter_generator(count)), iterable_coder.decode(iterable_coder.encode(iter_generator(count)))) + def test_list_coder(self): + list_coder = coders.ListCoder(coders.VarIntCoder()) + # Test unnested + self.check_coder(list_coder, [1], [-1, 0, 100]) + # Test nested + self.check_coder( + coders.TupleCoder((coders.VarIntCoder(), list_coder)), (1, [1, 2, 3])) + def test_windowedvalue_coder_paneinfo(self): coder = coders.WindowedValueCoder( coders.VarIntCoder(), coders.GlobalWindowCoder()) @@ -591,6 +717,10 @@ def test_sharded_key_coder(self): 'component_encodings': [key_coder.as_cloud_object()] }, coder.as_cloud_object()) + + # Test str repr + self.assertEqual('%s' % coder, 'ShardedKeyCoder[%s]' % key_coder) + self.assertEqual(b'\x00' + bytes_repr, coder.encode(ShardedKey(key, b''))) self.assertEqual( b'\x03123' + bytes_repr, coder.encode(ShardedKey(key, b'123'))) @@ -623,6 +753,19 @@ def test_sharded_key_coder(self): coders.TupleCoder((coder, other_coder)), (ShardedKey(key, b'123'), ShardedKey(other_key, b''))) + def test_timestamp_prefixing_window_coder(self): + self.check_coder( + coders.TimestampPrefixingWindowCoder(coders.IntervalWindowCoder()), + *[ 
+ window.IntervalWindow(x, y) for x in [-2**52, 0, 2**52] + for y in range(-100, 100) + ]) + self.check_coder( + coders.TupleCoder(( + coders.TimestampPrefixingWindowCoder( + coders.IntervalWindowCoder()), )), + (window.IntervalWindow(0, 10), )) + if __name__ == '__main__': logging.getLogger().setLevel(logging.INFO) diff --git a/sdks/python/apache_beam/coders/fast_coders_test.py b/sdks/python/apache_beam/coders/fast_coders_test.py index 5a3f4e681ceb..c7112e0e4842 100644 --- a/sdks/python/apache_beam/coders/fast_coders_test.py +++ b/sdks/python/apache_beam/coders/fast_coders_test.py @@ -18,8 +18,6 @@ """Unit tests for compiled implementation of coder impls.""" # pytype: skip-file -from __future__ import absolute_import - import logging import unittest diff --git a/sdks/python/apache_beam/coders/observable.py b/sdks/python/apache_beam/coders/observable.py index e0edf12c63b4..8cdc3227ec85 100644 --- a/sdks/python/apache_beam/coders/observable.py +++ b/sdks/python/apache_beam/coders/observable.py @@ -21,10 +21,6 @@ """ # pytype: skip-file -from __future__ import absolute_import - -from builtins import object - class ObservableMixin(object): """For internal use only; no backwards-compatibility guarantees. diff --git a/sdks/python/apache_beam/coders/observable_test.py b/sdks/python/apache_beam/coders/observable_test.py index 2ef0fe15f5ad..46f5186ba533 100644 --- a/sdks/python/apache_beam/coders/observable_test.py +++ b/sdks/python/apache_beam/coders/observable_test.py @@ -18,8 +18,6 @@ """Tests for the Observable mixin class.""" # pytype: skip-file -from __future__ import absolute_import - import logging import unittest from typing import List diff --git a/sdks/python/apache_beam/coders/row_coder.py b/sdks/python/apache_beam/coders/row_coder.py index ecb2bf3b6eea..4d67f8bf426f 100644 --- a/sdks/python/apache_beam/coders/row_coder.py +++ b/sdks/python/apache_beam/coders/row_coder.py @@ -17,8 +17,6 @@ # pytype: skip-file -from __future__ import absolute_import - import itertools from array import array @@ -89,10 +87,10 @@ def to_runner_api_parameter(self, unused_context): def from_runner_api_parameter(schema, components, unused_context): return RowCoder(schema) - @staticmethod - def from_type_hint(type_hint, registry): + @classmethod + def from_type_hint(cls, type_hint, registry): schema = schema_from_element_type(type_hint) - return RowCoder(schema) + return cls(schema) @staticmethod def from_payload(payload): @@ -179,7 +177,7 @@ def encode_to_stream(self, value, out, nested): for i, is_null in enumerate(nulls): words[i // 8] |= is_null << (i % 8) - self.NULL_MARKER_CODER.encode_to_stream(words.tostring(), out, True) + self.NULL_MARKER_CODER.encode_to_stream(words.tobytes(), out, True) for c, field, attr in zip(self.components, self.schema.fields, attrs): if attr is None: @@ -193,7 +191,7 @@ def encode_to_stream(self, value, out, nested): def decode_from_stream(self, in_stream, nested): nvals = self.SIZE_CODER.decode_from_stream(in_stream, True) words = array('B') - words.fromstring(self.NULL_MARKER_CODER.decode_from_stream(in_stream, True)) + words.frombytes(self.NULL_MARKER_CODER.decode_from_stream(in_stream, True)) if words: nulls = ((words[i // 8] >> (i % 8)) & 0x01 for i in range(nvals)) diff --git a/sdks/python/apache_beam/coders/row_coder_test.py b/sdks/python/apache_beam/coders/row_coder_test.py index 79d6e7815a42..b36aee78f270 100644 --- a/sdks/python/apache_beam/coders/row_coder_test.py +++ b/sdks/python/apache_beam/coders/row_coder_test.py @@ -16,15 +16,12 @@ # # pytype: skip-file 
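Before the updated row-coder tests below, a brief sketch (not part of this diff; Point is an invented name) of deriving a RowCoder from a NamedTuple hint, using the same RowCoder.from_type_hint call the tests use; the null-marker bytes handled in encode_to_stream above make the optional field nullable:

import typing

from apache_beam.coders.row_coder import RowCoder

Point = typing.NamedTuple('Point', [('x', int), ('y', typing.Optional[str])])

coder = RowCoder.from_type_hint(Point, None)
assert coder.decode(coder.encode(Point(x=1, y=None))) == Point(x=1, y=None)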
-from __future__ import absolute_import - import logging import typing import unittest from itertools import chain import numpy as np -from past.builtins import unicode import apache_beam as beam from apache_beam.coders import RowCoder @@ -40,14 +37,13 @@ Person = typing.NamedTuple( "Person", [ - ("name", unicode), + ("name", str), ("age", np.int32), - ("address", typing.Optional[unicode]), - ("aliases", typing.List[unicode]), + ("address", typing.Optional[str]), + ("aliases", typing.List[str]), ("knows_javascript", bool), - # TODO(BEAM-7372): Use bytes instead of ByteString - ("payload", typing.Optional[typing.ByteString]), - ("custom_metadata", typing.Mapping[unicode, int]), + ("payload", typing.Optional[bytes]), + ("custom_metadata", typing.Mapping[str, int]), ("favorite_time", Timestamp), ]) @@ -57,7 +53,7 @@ class RowCoderTest(unittest.TestCase): JON_SNOW = Person( name="Jon Snow", - age=23, + age=np.int32(23), address=None, aliases=["crow", "wildling"], knows_javascript=False, @@ -69,7 +65,7 @@ class RowCoderTest(unittest.TestCase): JON_SNOW, Person( "Daenerys Targaryen", - 25, + np.int32(25), "Westeros", ["Mother of Dragons"], False, @@ -79,7 +75,7 @@ class RowCoderTest(unittest.TestCase): ), Person( "Michael Bluth", - 30, + np.int32(30), None, [], True, b"I've made a huge mistake", {}, @@ -193,13 +189,13 @@ def test_overflows(self): self.assertRaises(OverflowError, lambda: c.encode(case)) def test_none_in_non_nullable_field_throws(self): - Test = typing.NamedTuple('Test', [('foo', unicode)]) + Test = typing.NamedTuple('Test', [('foo', str)]) c = RowCoder.from_type_hint(Test, None) self.assertRaises(ValueError, lambda: c.encode(Test(foo=None))) def test_schema_remove_column(self): - fields = [("field1", unicode), ("field2", unicode)] + fields = [("field1", str), ("field2", str)] # new schema is missing one field that was in the old schema Old = typing.NamedTuple('Old', fields) New = typing.NamedTuple('New', fields[:-1]) @@ -211,7 +207,7 @@ def test_schema_remove_column(self): New("foo"), new_coder.decode(old_coder.encode(Old("foo", "bar")))) def test_schema_add_column(self): - fields = [("field1", unicode), ("field2", typing.Optional[unicode])] + fields = [("field1", str), ("field2", typing.Optional[str])] # new schema has one (optional) field that didn't exist in the old schema Old = typing.NamedTuple('Old', fields[:-1]) New = typing.NamedTuple('New', fields) @@ -223,8 +219,8 @@ def test_schema_add_column(self): New("bar", None), new_coder.decode(old_coder.encode(Old("bar")))) def test_schema_add_column_with_null_value(self): - fields = [("field1", typing.Optional[unicode]), ("field2", unicode), - ("field3", typing.Optional[unicode])] + fields = [("field1", typing.Optional[str]), ("field2", str), + ("field3", typing.Optional[str])] # new schema has one (optional) field that didn't exist in the old schema Old = typing.NamedTuple('Old', fields[:-1]) New = typing.NamedTuple('New', fields) diff --git a/sdks/python/apache_beam/coders/slow_coders_test.py b/sdks/python/apache_beam/coders/slow_coders_test.py index 348205fcdd29..fe1c707a62e5 100644 --- a/sdks/python/apache_beam/coders/slow_coders_test.py +++ b/sdks/python/apache_beam/coders/slow_coders_test.py @@ -18,8 +18,6 @@ """Unit tests for uncompiled implementation of coder impls.""" # pytype: skip-file -from __future__ import absolute_import - import logging import unittest diff --git a/sdks/python/apache_beam/coders/slow_stream.py b/sdks/python/apache_beam/coders/slow_stream.py index a41c578de899..23dc0ee24b01 100644 --- 
a/sdks/python/apache_beam/coders/slow_stream.py +++ b/sdks/python/apache_beam/coders/slow_stream.py @@ -21,12 +21,7 @@ """ # pytype: skip-file -from __future__ import absolute_import - import struct -import sys -from builtins import chr -from builtins import object from typing import List @@ -129,19 +124,6 @@ def __init__(self, data): self.data = data self.pos = 0 - # The behavior of looping over a byte-string and obtaining byte characters - # has been changed between python 2 and 3. - # b = b'\xff\x01' - # Python 2: - # b[0] = '\xff' - # ord(b[0]) = 255 - # Python 3: - # b[0] = 255 - if sys.version_info[0] >= 3: - self.read_byte = self.read_byte_py3 - else: - self.read_byte = self.read_byte_py2 - def size(self): return len(self.data) - self.pos @@ -154,13 +136,7 @@ def read_all(self, nested): # type: (bool) -> bytes return self.read(self.read_var_int64() if nested else self.size()) - def read_byte_py2(self): - # type: () -> int - self.pos += 1 - # mypy tests against python 3.x, where this is an error: - return ord(self.data[self.pos - 1]) # type: ignore[arg-type] - - def read_byte_py3(self): + def read_byte(self): # type: () -> int self.pos += 1 return self.data[self.pos - 1] diff --git a/sdks/python/apache_beam/coders/standard_coders_test.py b/sdks/python/apache_beam/coders/standard_coders_test.py index 87320c8d6473..454939f5211c 100644 --- a/sdks/python/apache_beam/coders/standard_coders_test.py +++ b/sdks/python/apache_beam/coders/standard_coders_test.py @@ -19,16 +19,12 @@ """ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import json import logging import math import os.path import sys import unittest -from builtins import map from copy import deepcopy from typing import Dict from typing import Tuple @@ -193,7 +189,9 @@ class StandardCodersTest(unittest.TestCase): 'beam:coder:double:v1': parse_float, 'beam:coder:sharded_key:v1': lambda x, value_parser: ShardedKey( - key=value_parser(x['key']), shard_id=x['shardId'].encode('utf-8')) + key=value_parser(x['key']), shard_id=x['shardId'].encode('utf-8')), + 'beam:coder:custom_window:v1': lambda x, + window_parser: window_parser(x['window']) } def test_standard_coders(self): diff --git a/sdks/python/apache_beam/coders/stream.pyx b/sdks/python/apache_beam/coders/stream.pyx index d3675b403846..7fa1854c57b1 100644 --- a/sdks/python/apache_beam/coders/stream.pyx +++ b/sdks/python/apache_beam/coders/stream.pyx @@ -15,6 +15,8 @@ # limitations under the License. # +# cython: language_level=3 + """Compiled version of the Stream objects used by CoderImpl. For internal use only; no backwards-compatibility guarantees. 
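The `slow_stream.py` hunk above can drop the Python 2 `read_byte_py2` branch because, on Python 3, indexing a `bytes` object already yields an `int`, so no `ord()` call is needed. A minimal standalone sketch (not Beam code) illustrating the behavior the removed comment described:

```python
# In Python 3, indexing bytes returns ints, so a stream's read_byte can
# simply return data[pos] without wrapping it in ord().
data = b'\xff\x01'

assert data[0] == 255          # already an int, not a 1-character string
assert data[1] == 1
assert list(data) == [255, 1]  # iteration also yields ints
```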
diff --git a/sdks/python/apache_beam/coders/stream_test.py b/sdks/python/apache_beam/coders/stream_test.py index b84d338bca82..7a9c8711e7ba 100644 --- a/sdks/python/apache_beam/coders/stream_test.py +++ b/sdks/python/apache_beam/coders/stream_test.py @@ -18,13 +18,9 @@ """Tests for the stream implementations.""" # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import logging import math import unittest -from builtins import range from apache_beam.coders import slow_stream diff --git a/sdks/python/apache_beam/coders/typecoders.py b/sdks/python/apache_beam/coders/typecoders.py index 52159a25dc2f..03b6cee417d4 100644 --- a/sdks/python/apache_beam/coders/typecoders.py +++ b/sdks/python/apache_beam/coders/typecoders.py @@ -66,20 +66,13 @@ def MakeXyzs(v): # pytype: skip-file -from __future__ import absolute_import - -from builtins import object from typing import Any from typing import Dict from typing import Iterable from typing import List from typing import Type -from past.builtins import unicode - from apache_beam.coders import coders -from apache_beam.coders.coders import CoderElementType -from apache_beam.coders.coders import ExternalCoder from apache_beam.typehints import typehints __all__ = ['registry'] @@ -94,18 +87,22 @@ def __init__(self, fallback_coder=None): def register_standard_coders(self, fallback_coder): """Register coders for all basic and composite types.""" + # Coders without subclasses. self._register_coder_internal(int, coders.VarIntCoder) self._register_coder_internal(float, coders.FloatCoder) self._register_coder_internal(bytes, coders.BytesCoder) self._register_coder_internal(bool, coders.BooleanCoder) - self._register_coder_internal(unicode, coders.StrUtf8Coder) + self._register_coder_internal(str, coders.StrUtf8Coder) self._register_coder_internal(typehints.TupleConstraint, coders.TupleCoder) - self._register_coder_internal(CoderElementType, ExternalCoder) + self._register_coder_internal(typehints.DictConstraint, coders.MapCoder) # Default fallback coders applied in that order until the first matching # coder found. default_fallback_coders = [coders.ProtoCoder, coders.FastPrimitivesCoder] self._fallback_coder = fallback_coder or FirstOf(default_fallback_coders) + def register_fallback_coder(self, fallback_coder): + self._fallback_coder = FirstOf([fallback_coder, self._fallback_coder]) + def _register_coder_internal(self, typehint_type, typehint_coder_class): # type: (Any, Type[coders.Coder]) -> None self._coders[typehint_type] = typehint_coder_class @@ -135,10 +132,10 @@ def get_coder(self, typehint): raise RuntimeError( 'Coder registry has no fallback coder. This can happen if the ' 'fast_coders module could not be imported.') - if isinstance( - typehint, - (typehints.IterableTypeConstraint, typehints.ListConstraint)): + if isinstance(typehint, typehints.IterableTypeConstraint): return coders.IterableCoder.from_type_hint(typehint, self) + elif isinstance(typehint, typehints.ListConstraint): + return coders.ListCoder.from_type_hint(typehint, self) elif typehint is None: # In some old code, None is used for Any. # TODO(robertwb): Clean this up. 
@@ -187,7 +184,7 @@ def from_type_hint(self, typehint, registry): messages = [] for coder in self._coders: try: - return coder.from_type_hint(typehint, self) + return coder.from_type_hint(typehint, registry) except Exception as e: msg = ( '%s could not provide a Coder for type %s: %s' % diff --git a/sdks/python/apache_beam/coders/typecoders_test.py b/sdks/python/apache_beam/coders/typecoders_test.py index dec478f82ec3..02f4565c5e2d 100644 --- a/sdks/python/apache_beam/coders/typecoders_test.py +++ b/sdks/python/apache_beam/coders/typecoders_test.py @@ -18,10 +18,7 @@ """Unit tests for the typecoders module.""" # pytype: skip-file -from __future__ import absolute_import - import unittest -from builtins import object from apache_beam.coders import coders from apache_beam.coders import typecoders @@ -37,10 +34,6 @@ def __init__(self, n): def __eq__(self, other): return self.number == other.number - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - def __hash__(self): return self.number diff --git a/sdks/python/apache_beam/dataframe/__init__.py b/sdks/python/apache_beam/dataframe/__init__.py index 9071a88193de..00ce191b26e5 100644 --- a/sdks/python/apache_beam/dataframe/__init__.py +++ b/sdks/python/apache_beam/dataframe/__init__.py @@ -14,6 +14,19 @@ # See the License for the specific language governing permissions and # limitations under the License. -from __future__ import absolute_import +"""Beam DataFrame API + +- For high-level documentation see + https://beam.apache.org/documentation/dsls/dataframes/overview/ +- :mod:`apache_beam.dataframe.io`: DataFrame I/Os +- :mod:`apache_beam.dataframe.frames`: DataFrame operations +- :mod:`apache_beam.dataframe.convert`: Conversion between + :class:`~apache_beam.pvalue.PCollection` and + :class:`~apache_beam.dataframe.frames.DeferredDataFrame`. +- :mod:`apache_beam.dataframe.schemas`: Mapping from Beam Schemas to pandas + dtypes. +- :mod:`apache_beam.dataframe.transforms`: Use DataFrame operations within a + Beam pipeline with `DataframeTransform`. +""" from apache_beam.dataframe.expressions import allow_non_parallel_operations diff --git a/sdks/python/apache_beam/dataframe/convert.py b/sdks/python/apache_beam/dataframe/convert.py index e7ec99eb303a..2b207dda7270 100644 --- a/sdks/python/apache_beam/dataframe/convert.py +++ b/sdks/python/apache_beam/dataframe/convert.py @@ -14,8 +14,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -from __future__ import absolute_import - import inspect import weakref from typing import TYPE_CHECKING @@ -24,6 +22,9 @@ from typing import Tuple from typing import Union +import pandas as pd + +import apache_beam as beam from apache_beam import pvalue from apache_beam.dataframe import expressions from apache_beam.dataframe import frame_base @@ -33,13 +34,12 @@ if TYPE_CHECKING: # pylint: disable=ungrouped-imports from typing import Optional - import pandas # TODO: Or should this be called as_dataframe? def to_dataframe( pcoll, # type: pvalue.PCollection - proxy=None, # type: Optional[pandas.core.generic.NDFrame] + proxy=None, # type: Optional[pd.core.generic.NDFrame] label=None, # type: Optional[str] ): # type: (...) -> frame_base.DeferredFrame @@ -106,10 +106,12 @@ def _make_unbatched_pcoll( def to_pcollection( - *dataframes, # type: frame_base.DeferredFrame - **kwargs): - # type: (...) 
-> Union[pvalue.PCollection, Tuple[pvalue.PCollection, ...]] - + *dataframes, # type: Union[frame_base.DeferredFrame, pd.DataFrame, pd.Series] + label=None, + always_return_tuple=False, + yield_elements='schemas', + include_indexes=False, + pipeline=None) -> Union[pvalue.PCollection, Tuple[pvalue.PCollection, ...]]: """Converts one or more deferred dataframe-like objects back to a PCollection. This method creates and applies the actual Beam operations that compute @@ -119,10 +121,17 @@ def to_pcollection( behavior can be modified with the `yield_elements` and `include_indexes` arguments. + Also accepts non-deferred pandas dataframes, which are converted to deferred, + schema'd PCollections. In this case the contents of the entire dataframe are + serialized into the graph, so for large amounts of data it is preferable to + write them to disk and read them with one of the read methods. + If more than one (related) result is desired, it can be more efficient to pass them all at the same time to this method. Args: + label: (optional, default "ToPCollection(...)"") the label to use for the + conversion transform. always_return_tuple: (optional, default: False) If true, always return a tuple of PCollections, even if there's only one output. yield_elements: (optional, default: "schemas") If set to "pandas", return @@ -136,21 +145,37 @@ def to_pcollection( schema for expanded DataFrames. Raises an error if any of the index levels are unnamed (name=None), or if any of the names are not unique among all column and index names. + pipeline: (optional, unless non-deferred dataframes are passed) Used when + creating a PCollection from a non-deferred dataframe. """ - label = kwargs.pop('label', None) - always_return_tuple = kwargs.pop('always_return_tuple', False) - yield_elements = kwargs.pop('yield_elements', 'schemas') if not yield_elements in ("pandas", "schemas"): raise ValueError( "Invalid value for yield_elements argument, '%s'. " "Allowed values are 'pandas' and 'schemas'" % yield_elements) - include_indexes = kwargs.pop('include_indexes', False) - assert not kwargs # TODO(BEAM-7372): Use PEP 3102 if label is None: # Attempt to come up with a reasonable, stable label by retrieving the name # of these variables in the calling context. label = 'ToPCollection(%s)' % ', '.join(_var_name(e, 3) for e in dataframes) + # Support for non-deferred dataframes. + deferred_dataframes = [] + for ix, df in enumerate(dataframes): + if isinstance(df, frame_base.DeferredBase): + # TODO(robertwb): Maybe extract pipeline object? 
+ deferred_dataframes.append(df) + elif isinstance(df, (pd.Series, pd.DataFrame)): + if pipeline is None: + raise ValueError( + 'Pipeline keyword required for non-deferred dataframe conversion.') + deferred = pipeline | '%s_Defer%s' % (label, ix) >> beam.Create([df]) + deferred_dataframes.append( + frame_base.DeferredFrame.wrap( + expressions.PlaceholderExpression(df.iloc[:0], deferred))) + else: + raise TypeError( + 'Unable to convert objects of type %s to a PCollection' % type(df)) + dataframes = tuple(deferred_dataframes) + def extract_input(placeholder): if not isinstance(placeholder._reference, pvalue.PCollection): raise TypeError( diff --git a/sdks/python/apache_beam/dataframe/convert_test.py b/sdks/python/apache_beam/dataframe/convert_test.py index 7d5cb03c5088..06a1001697f7 100644 --- a/sdks/python/apache_beam/dataframe/convert_test.py +++ b/sdks/python/apache_beam/dataframe/convert_test.py @@ -14,8 +14,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -from __future__ import absolute_import - import unittest import pandas as pd @@ -92,6 +90,15 @@ def test_convert_scalar(self): pc_sum = convert.to_pcollection(s.sum()) assert_that(pc_sum, equal_to([6])) + def test_convert_non_deferred(self): + with beam.Pipeline() as p: + s1 = pd.Series([1, 2, 3]) + s2 = convert.to_dataframe(p | beam.Create([100, 200, 300])) + + pc1, pc2 = convert.to_pcollection(s1, s2, pipeline=p) + assert_that(pc1, equal_to([1, 2, 3]), label='CheckNonDeferred') + assert_that(pc2, equal_to([100, 200, 300]), label='CheckDeferred') + def test_convert_memoization(self): with beam.Pipeline() as p: a = pd.Series([1, 2, 3]) diff --git a/sdks/python/apache_beam/dataframe/doctests.py b/sdks/python/apache_beam/dataframe/doctests.py index 9f89c38d6924..45171db4da1e 100644 --- a/sdks/python/apache_beam/dataframe/doctests.py +++ b/sdks/python/apache_beam/dataframe/doctests.py @@ -37,10 +37,6 @@ values. """ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import collections import contextlib import doctest @@ -73,7 +69,7 @@ def __init__(self, pandas_obj, test_env): def __call__(self, *args, **kwargs): result = self._pandas_obj(*args, **kwargs) if type(result) in DeferredBase._pandas_type_map.keys(): - placeholder = expressions.PlaceholderExpression(result[0:0]) + placeholder = expressions.PlaceholderExpression(result.iloc[0:0]) self._test_env._inputs[placeholder] = result return DeferredBase.wrap(placeholder) else: @@ -193,7 +189,7 @@ def reset(self): def compute_using_session(self, to_compute): session = expressions.PartitioningSession(self._env._inputs) return { - name: frame._expr.evaluate_at(session) + name: session.evaluate(frame._expr) for name, frame in to_compute.items() } @@ -238,6 +234,50 @@ def fix(self, want, got): for name, frame in computed.items(): got = got.replace(name, repr(frame)) + # If a multiindex is used, compensate for it + if any(isinstance(frame, pd.core.generic.NDFrame) and + frame.index.nlevels > 1 for frame in computed.values()): + + def fill_multiindex(text): + """An awful hack to work around the fact that pandas omits repeated + elements in a multi-index. + For example: + + Series name Row ID + s1 0 a + 1 b + s2 0 c + 1 d + dtype: object + + The s1 and s2 are implied for the 2nd and 4th rows. 
However if we + re-order this Series it might be printed this way: + + Series name Row ID + s1 0 a + s2 1 d + s2 0 c + s1 1 b + dtype: object + + In our model these are equivalent, but when we sort the lines and + check equality they are not. This method fills in any omitted + multiindex values, so that we can successfully sort and compare.""" + lines = [list(line) for line in text.split('\n')] + for prev, line in zip(lines[:-1], lines[1:]): + if all(l == ' ' for l in line): + continue + + for i, l in enumerate(line): + if l != ' ': + break + line[i] = prev[i] + + return '\n'.join(''.join(line) for line in lines) + + got = fill_multiindex(got) + want = fill_multiindex(want) + def sort_and_normalize(text): return '\n'.join( sorted( @@ -549,7 +589,7 @@ def is_example_line(line): if output: # Strip the prompt. # TODO(robertwb): Figure out how to suppress this. - output = re.sub(r'^Out\[\d+\]:\s*', '', output) + output = re.sub(r'^Out\[\d+\]:[ \t]*\n?', '', output) examples.append(doctest.Example(src, output, lineno=lineno)) finally: @@ -619,14 +659,6 @@ def teststrings(texts, report=False, **runner_kwargs): return runner.summary().result() -def testfile(*args, **kwargs): - return _run_patched(doctest.testfile, *args, **kwargs) - - -def testmod(*args, **kwargs): - return _run_patched(doctest.testmod, *args, **kwargs) - - def set_pandas_options(): # See # https://github.com/pandas-dev/pandas/blob/a00202d12d399662b8045a8dd3fdac04f18e1e55/doc/source/conf.py#L319 @@ -666,3 +698,55 @@ def _run_patched(func, *args, **kwargs): *args, extraglobs=extraglobs, optionflags=optionflags, **kwargs) finally: doctest.DocTestRunner = original_doc_test_runner + + +def with_run_patched_docstring(target=None): + assert target is not None + + def wrapper(fn): + fn.__doc__ = f"""Run all pandas doctests in the specified {target}. + + Arguments `skip`, `wont_implement_ok`, `not_implemented_ok` are all in the + format:: + + {{ + "module.Class.method": ['*'], + "module.Class.other_method": [ + 'instance.other_method(bad_input)', + 'observe_result_of_bad_input()', + ], + }} + + `'*'` indicates all examples should be matched, otherwise the list is a list + of specific input strings that should be matched. + + All arguments are kwargs. + + Args: + optionflags (int): Passed through to doctests. + extraglobs (Dict[str,Any]): Passed through to doctests. + use_beam (bool): If true, run a Beam pipeline with partitioned input to + verify the examples, else use PartitioningSession to simulate + distributed execution. + skip (Dict[str,str]): A set of examples to skip entirely. + wont_implement_ok (Dict[str,str]): A set of examples that are allowed to + raise WontImplementError. + not_implemented_ok (Dict[str,str]): A set of examples that are allowed to + raise NotImplementedError. + + Returns: + ~doctest.TestResults: A doctest result describing the passed/failed tests. 
+ """ + return fn + + return wrapper + + +@with_run_patched_docstring(target="file") +def testfile(*args, **kwargs): + return _run_patched(doctest.testfile, *args, **kwargs) + + +@with_run_patched_docstring(target="module") +def testmod(*args, **kwargs): + return _run_patched(doctest.testmod, *args, **kwargs) diff --git a/sdks/python/apache_beam/dataframe/doctests_test.py b/sdks/python/apache_beam/dataframe/doctests_test.py index 52d70fe57c4a..1adff65c0d0f 100644 --- a/sdks/python/apache_beam/dataframe/doctests_test.py +++ b/sdks/python/apache_beam/dataframe/doctests_test.py @@ -14,11 +14,8 @@ # See the License for the specific language governing permissions and # limitations under the License. -from __future__ import absolute_import - import doctest import os -import sys import tempfile import unittest @@ -146,7 +143,6 @@ def foo(x): ''' -@unittest.skipIf(sys.version_info <= (3, ), 'Requires contextlib.ExitStack.') class DoctestTest(unittest.TestCase): def test_good(self): result = doctests.teststring(SAMPLE_DOCTEST, report=False) diff --git a/sdks/python/apache_beam/dataframe/expressions.py b/sdks/python/apache_beam/dataframe/expressions.py index e275795665c6..c8960c395cb3 100644 --- a/sdks/python/apache_beam/dataframe/expressions.py +++ b/sdks/python/apache_beam/dataframe/expressions.py @@ -14,13 +14,12 @@ # See the License for the specific language governing permissions and # limitations under the License. -from __future__ import absolute_import - import contextlib import random import threading from typing import Any from typing import Callable +from typing import Generic from typing import Iterable from typing import Optional from typing import TypeVar @@ -92,9 +91,8 @@ def evaluate_with(input_partitioning): if not is_scalar(arg)): results.append(session.evaluate(expr)) - expected_output_partitioning = expr.preserves_partition_by( - ) if input_partitioning.is_subpartitioning_of( - expr.preserves_partition_by()) else input_partitioning + expected_output_partitioning = output_partitioning( + expr, input_partitioning) if not expected_output_partitioning.check(results): raise AssertionError( @@ -116,19 +114,24 @@ def evaluate_with(input_partitioning): # the expression is part of a test that relies on the random seed. random_state = random.getstate() - for input_partitioning in set([expr.requires_partition_by(), - partitionings.Nothing(), - partitionings.Index(), - partitionings.Singleton()]): - if not input_partitioning.is_subpartitioning_of( - expr.requires_partition_by()): + result = None + # Run with all supported partitionings s.t. the smallest subpartitioning + # is used last. This way the final result is computed with the most + # challenging partitioning. Avoids heisenbugs where sometimes the result + # is computed trivially with Singleton partitioning and passes. 
+ for input_partitioning in sorted(set([expr.requires_partition_by(), + partitionings.Arbitrary(), + partitionings.Index(), + partitionings.Singleton()])): + if not expr.requires_partition_by().is_subpartitioning_of( + input_partitioning): continue random.setstate(random_state) - # TODO(BEAM-11324): Consider verifying result is always the same result = evaluate_with(input_partitioning) + assert result is not None self._bindings[expr] = result return self._bindings[expr] @@ -137,24 +140,77 @@ T = TypeVar('T') -class Expression(object): +def output_partitioning(expr, input_partitioning): + """ Return the expected output partitioning for `expr` when its input is + partitioned by `input_partitioning`. + + For internal use only; no backwards compatibility guarantees. """ + assert expr.requires_partition_by().is_subpartitioning_of(input_partitioning) + + if expr.preserves_partition_by().is_subpartitioning_of(input_partitioning): + return min(input_partitioning, expr.preserves_partition_by()) + else: + return partitionings.Arbitrary() + + +class Expression(Generic[T]): """An expression is an operation bound to a set of arguments. An expression represents a deferred tree of operations, which can be evaluated at a specific bindings of root expressions to values. + + requires_partition_by indicates the upper bound of a set of partitionings that + are acceptable inputs to this expression. The expression should be able to + produce the correct result when given input(s) partitioned by its + requires_partition_by attribute, or by any partitioning that is _not_ + a subpartitioning of it. + + preserves_partition_by indicates the upper bound of a set of partitionings + that can be preserved by this expression. When the input(s) to this expression + are partitioned by preserves_partition_by, or by any partitioning that is + _not_ a subpartitioning of it, this expression should produce output(s) + partitioned by the same partitioning. + + However, if the partitioning of an expression's input is a subpartitioning of + the partitioning that it preserves, the output is presumed to have no + particular partitioning (i.e. Arbitrary()). + + For example, let's look at an "element-wise operation", that has no + partitioning requirement, and preserves any partitioning given to it:: + + requires_partition_by = Arbitrary() -----------------------------+ + | + +-----------+-------------+---------- ... ----+---------| + | | | | | + Singleton() < Index([i]) < Index([i, j]) < ... < Index() < Arbitrary() + | | | | | + +-----------+-------------+---------- ... ----+---------| + | + preserves_partition_by = Arbitrary() ----------------------------+ + + As a more interesting example, consider this expression, which requires Index + partitioning, and preserves just Singleton partitioning:: + + requires_partition_by = Index() -----------------------+ + | + +-----------+-------------+---------- ... ----| + | | | | + Singleton() < Index([i]) < Index([i, j]) < ... < Index() < Arbitrary() + | + | + preserves_partition_by = Singleton() + + Note that any non-Arbitrary partitioning is an acceptable input for this + expression. However, unless the inputs are Singleton-partitioned, the + expression makes no guarantees about the partitioning of the output.
""" - def __init__( - self, - name, # type: str - proxy, # type: T - _id=None # type: Optional[str] - ): + def __init__(self, name: str, proxy: T, _id: Optional[str] = None): self._name = name self._proxy = proxy # Store for preservation through pickling. self._id = _id or '%s_%s_%s' % (name, type(proxy).__name__, id(self)) - def proxy(self): # type: () -> T + def proxy(self) -> T: return self._proxy def __hash__(self): @@ -163,9 +219,6 @@ def __hash__(self): def __eq__(self, other): return self._id == other._id - def __ne__(self, other): - return not self == other - def __repr__(self): return '%s[%s]' % (self.__class__.__name__, self._id) @@ -173,18 +226,18 @@ def placeholders(self): """Returns all the placeholders that self depends on.""" raise NotImplementedError(type(self)) - def evaluate_at(self, session): # type: (Session) -> T + def evaluate_at(self, session: Session) -> T: """Returns the result of self with the bindings given in session.""" raise NotImplementedError(type(self)) - def requires_partition_by(self): # type: () -> partitionings.Partitioning + def requires_partition_by(self) -> partitionings.Partitioning: """Returns the partitioning, if any, require to evaluate this expression. - Returns partitioning.Nothing() to require no partitioning is required. + Returns partitioning.Arbitrary() to require no partitioning is required. """ raise NotImplementedError(type(self)) - def preserves_partition_by(self): # type: () -> partitionings.Partitioning + def preserves_partition_by(self) -> partitionings.Partitioning: """Returns the partitioning, if any, preserved by this expression. This gives an upper bound on the partitioning of its ouput. The actual @@ -220,10 +273,10 @@ def evaluate_at(self, session): return session.lookup(self) def requires_partition_by(self): - return partitionings.Nothing() + return partitionings.Arbitrary() def preserves_partition_by(self): - return partitionings.Nothing() + return partitionings.Index() class ConstantExpression(Expression): @@ -256,10 +309,10 @@ def evaluate_at(self, session): return self._value def requires_partition_by(self): - return partitionings.Nothing() + return partitionings.Arbitrary() def preserves_partition_by(self): - return partitionings.Nothing() + return partitionings.Arbitrary() class ComputedExpression(Expression): @@ -272,7 +325,7 @@ def __init__( proxy=None, # type: Optional[T] _id=None, # type: Optional[str] requires_partition_by=partitionings.Index(), # type: partitionings.Partitioning - preserves_partition_by=partitionings.Nothing(), # type: partitionings.Partitioning + preserves_partition_by=partitionings.Singleton(), # type: partitionings.Partitioning ): """Initialize a computed expression. @@ -291,10 +344,16 @@ def __init__( preserves_partition_by: The level of partitioning preserved. """ if (not _get_allow_non_parallel() and - requires_partition_by == partitionings.Singleton()): + isinstance(requires_partition_by, partitionings.Singleton)): + reason = requires_partition_by.reason or ( + f"Encountered non-parallelizable form of {name!r}.") + raise NonParallelOperation( - "Using non-parallel form of %s " - "outside of allow_non_parallel_operations block." % name) + f"{reason}\n" + "Consider using an allow_non_parallel_operations block if you're " + "sure you want to do this. 
See " + "https://s.apache.org/dataframe-non-parallel-operations for more " + "information.") args = tuple(args) if proxy is None: proxy = func(*(arg.proxy() for arg in args)) @@ -326,8 +385,8 @@ def elementwise_expression(name, func, args): name, func, args, - requires_partition_by=partitionings.Nothing(), - preserves_partition_by=partitionings.Singleton()) + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Arbitrary()) _ALLOW_NON_PARALLEL = threading.local() @@ -349,4 +408,6 @@ def allow_non_parallel_operations(allow=True): class NonParallelOperation(Exception): - pass + def __init__(self, msg): + super(NonParallelOperation, self).__init__(self, msg) + self.msg = msg diff --git a/sdks/python/apache_beam/dataframe/expressions_test.py b/sdks/python/apache_beam/dataframe/expressions_test.py index fbbf0250608a..2c5c716f5beb 100644 --- a/sdks/python/apache_beam/dataframe/expressions_test.py +++ b/sdks/python/apache_beam/dataframe/expressions_test.py @@ -14,11 +14,12 @@ # See the License for the specific language governing permissions and # limitations under the License. -from __future__ import absolute_import - import unittest +import pandas as pd + from apache_beam.dataframe import expressions +from apache_beam.dataframe import partitionings class ExpressionTest(unittest.TestCase): @@ -53,6 +54,71 @@ def test_expression_proxy_error(self): with self.assertRaises(TypeError): expressions.ComputedExpression('add', lambda a, b: a + b, [a, b]) + def test_preserves_singleton_output_partitioning(self): + # Empty DataFrame with one column and two index levels + input_expr = expressions.ConstantExpression( + pd.DataFrame(columns=["column"], index=[[], []])) + + preserves_only_singleton = expressions.ComputedExpression( + 'preserves_only_singleton', + # index is replaced with an entirely new one, so + # if we were partitioned by Index we're not anymore. + lambda df: df.set_index('column'), + [input_expr], + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Singleton()) + + for partitioning in (partitionings.Singleton(), ): + self.assertEqual( + expressions.output_partitioning( + preserves_only_singleton, partitioning), + partitioning, + f"Should preserve {partitioning}") + + for partitioning in (partitionings.Index([0]), + partitionings.Index(), + partitionings.Arbitrary()): + self.assertEqual( + expressions.output_partitioning( + preserves_only_singleton, partitioning), + partitionings.Arbitrary(), + f"Should NOT preserve {partitioning}") + + def test_preserves_index_output_partitioning(self): + # Empty DataFrame with two columns and two index levels + input_expr = expressions.ConstantExpression( + pd.DataFrame(columns=["foo", "bar"], index=[[], []])) + + preserves_partial_index = expressions.ComputedExpression( + 'preserves_partial_index', + # This adds an additional index level, so we'd only preserve + # partitioning on the two index levels that existed before. 
+ lambda df: df.set_index('foo', append=True), + [input_expr], + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Index([0, 1])) + + for partitioning in ( + partitionings.Singleton(), + partitionings.Index([0]), + partitionings.Index([1]), + partitionings.Index([0, 1]), + ): + self.assertEqual( + expressions.output_partitioning( + preserves_partial_index, partitioning), + partitioning, + f"Should preserve {partitioning}") + + for partitioning in (partitionings.Index([0, 1, 2]), + partitionings.Index(), + partitionings.Arbitrary()): + self.assertEqual( + expressions.output_partitioning( + preserves_partial_index, partitioning), + partitionings.Arbitrary(), + f"Should NOT preserve {partitioning}") + if __name__ == '__main__': unittest.main() diff --git a/sdks/python/apache_beam/dataframe/frame_base.py b/sdks/python/apache_beam/dataframe/frame_base.py index ff4ad14b924c..4bb9ddf807f3 100644 --- a/sdks/python/apache_beam/dataframe/frame_base.py +++ b/sdks/python/apache_beam/dataframe/frame_base.py @@ -14,16 +14,19 @@ # See the License for the specific language governing permissions and # limitations under the License. -from __future__ import absolute_import - import functools -import inspect -import sys +import re +from inspect import cleandoc +from inspect import getfullargspec +from inspect import isclass +from inspect import ismodule +from inspect import unwrap from typing import Any from typing import Callable from typing import Dict from typing import List from typing import Optional +from typing import Tuple from typing import Union import pandas as pd @@ -31,18 +34,6 @@ from apache_beam.dataframe import expressions from apache_beam.dataframe import partitionings -# pylint: disable=deprecated-method -if sys.version_info < (3, ): - _getargspec = inspect.getargspec - - def _unwrap(func): - while hasattr(func, '__wrapped__'): - func = func.__wrapped__ - return func -else: - _getargspec = inspect.getfullargspec - _unwrap = inspect.unwrap - class DeferredBase(object): @@ -70,7 +61,7 @@ def get(ix): 'get_%d' % ix, lambda t: t[ix], [expr], - requires_partition_by=partitionings.Nothing(), + requires_partition_by=partitionings.Arbitrary(), preserves_partition_by=partitionings.Singleton()) return tuple([cls.wrap(get(ix)) for ix in range(len(expr.proxy()))]) @@ -84,8 +75,11 @@ def get(ix): wrapper_type = _DeferredScalar return wrapper_type(expr) - def _elementwise(self, func, name=None, other_args=(), inplace=False): - return _elementwise_function(func, name, inplace=inplace)(self, *other_args) + def _elementwise( + self, func, name=None, other_args=(), other_kwargs=None, inplace=False): + other_kwargs = other_kwargs or {} + return _elementwise_function( + func, name, inplace=inplace)(self, *other_args, **other_kwargs) def __reduce__(self): return UnusableUnpickledDeferredBase, (str(self), ) @@ -106,9 +100,7 @@ def __repr__(self): class DeferredFrame(DeferredBase): - @property - def dtypes(self): - return self._expr.proxy().dtypes + pass class _DeferredScalar(DeferredBase): @@ -123,25 +115,42 @@ def apply(self, func, name=None, args=()): func, [self._expr] + [arg._expr for arg in args], requires_partition_by=partitionings.Singleton())) + def __repr__(self): + return f"DeferredScalar[type={type(self._expr.proxy())}]" + + def __bool__(self): + # TODO(BEAM-11951): Link to documentation + raise TypeError( + "Testing the truth value of a deferred scalar is not " + "allowed. 
It's not possible to branch on the result of " + "deferred operations.") + DeferredBase._pandas_type_map[None] = _DeferredScalar -def name_and_func(method): +def name_and_func(method: Union[str, Callable]) -> Tuple[str, Callable]: + """For the given method name or method, return the method name and the method + itself. + + For internal use only. No backwards compatibility guarantees.""" if isinstance(method, str): - return method, lambda df, *args, **kwargs: getattr(df, method)(*args, ** - kwargs) + method_str = method + func = lambda df, *args, **kwargs: getattr(df, method_str)(*args, **kwargs) + return method, func else: return method.__name__, method -def _elementwise_method(func, name=None, restrictions=None, inplace=False): +def _elementwise_method( + func, name=None, restrictions=None, inplace=False, base=None): return _proxy_method( func, name, restrictions, inplace, - requires_partition_by=partitionings.Nothing(), + base, + requires_partition_by=partitionings.Arbitrary(), preserves_partition_by=partitionings.Singleton()) @@ -150,38 +159,45 @@ def _proxy_method( name=None, restrictions=None, inplace=False, + base=None, requires_partition_by=partitionings.Singleton(), - preserves_partition_by=partitionings.Nothing()): + preserves_partition_by=partitionings.Arbitrary()): if name is None: name, func = name_and_func(func) - if restrictions is None: - restrictions = {} + if base is None: + raise ValueError("base is required for _proxy_method") return _proxy_function( func, name, restrictions, inplace, + base, requires_partition_by, preserves_partition_by) -def _elementwise_function(func, name=None, restrictions=None, inplace=False): +def _elementwise_function( + func, name=None, restrictions=None, inplace=False, base=None): return _proxy_function( func, name, restrictions, inplace, - requires_partition_by=partitionings.Nothing(), + base, + requires_partition_by=partitionings.Arbitrary(), preserves_partition_by=partitionings.Singleton()) def _proxy_function( - func, # type: Union[Callable, str] - name=None, # type: Optional[str] - restrictions=None, # type: Optional[Dict[str, Union[Any, List[Any], Callable[[Any], bool]]]] - inplace=False, # type: bool - requires_partition_by=partitionings.Singleton(), # type: partitionings.Partitioning - preserves_partition_by=partitionings.Nothing(), # type: partitionings.Partitioning + func, # type: Union[Callable, str] + name=None, # type: Optional[str] + restrictions=None, # type: Optional[Dict[str, Union[Any, List[Any]]]] + inplace=False, # type: bool + base=None, # type: Optional[type] + requires_partition_by=partitionings.Singleton( + ), # type: partitionings.Partitioning + preserves_partition_by=partitionings.Arbitrary( + ), # type: partitionings.Partitioning ): if name is None: @@ -198,7 +214,7 @@ def wrapper(*args, **kwargs): value = kwargs[key] else: try: - ix = _getargspec(func).args.index(key) + ix = getfullargspec(func).args.index(key) except ValueError: # TODO: fix for delegation? 
continue @@ -234,7 +250,7 @@ def wrapper(*args, **kwargs): lambda ix: ix.index.to_series(), # yapf break [arg._frame._expr], preserves_partition_by=partitionings.Singleton(), - requires_partition_by=partitionings.Nothing())) + requires_partition_by=partitionings.Arbitrary())) elif isinstance(arg, pd.core.generic.NDFrame): deferred_arg_indices.append(ix) deferred_arg_exprs.append(expressions.ConstantExpression(arg, arg[0:0])) @@ -258,7 +274,7 @@ def wrapper(*args, **kwargs): deferred_exprs = deferred_arg_exprs + deferred_kwarg_exprs if inplace: - actual_func = copy_and_mutate(func) + actual_func = _copy_and_mutate(func) else: actual_func = func @@ -276,9 +292,9 @@ def apply(*actual_args): return actual_func(*full_args, **full_kwargs) - if (not requires_partition_by.is_subpartitioning_of(partitionings.Index()) - and sum(isinstance(arg.proxy(), pd.core.generic.NDFrame) - for arg in deferred_exprs) > 1): + if (requires_partition_by.is_subpartitioning_of(partitionings.Index()) and + sum(isinstance(arg.proxy(), pd.core.generic.NDFrame) + for arg in deferred_exprs) > 1): # Implicit join on index if there is more than one indexed input. actual_requires_partition_by = partitionings.Index() else: @@ -296,31 +312,99 @@ def apply(*actual_args): else: return DeferredFrame.wrap(result_expr) - return wrapper + wrapper.__name__ = name + if restrictions: + wrapper.__doc__ = "\n".join( + f"Only {kw}={value!r} is supported" + for (kw, value) in restrictions.items()) + + if base is not None: + return with_docs_from(base)(wrapper) + else: + return wrapper -def _agg_method(func): - def wrapper(self, *args, **kwargs): - return self.agg(func, *args, **kwargs) +def _prettify_pandas_type(pandas_type): + if pandas_type in (pd.DataFrame, pd.Series): + return f'pandas.{pandas_type.__name__}' + elif isclass(pandas_type): + return f'{pandas_type.__module__}.{pandas_type.__name__}' + elif ismodule(pandas_type): + return pandas_type.__name__ + else: + raise TypeError(pandas_type) - return wrapper +def wont_implement_method(base_type, name, reason=None, explanation=None): + """Generate a stub method that raises WontImplementError. + + Note either reason or explanation must be specified. If both are specified, + explanation is ignored. + + Args: + base_type: The pandas type of the method that this is trying to replicate. + name: The name of the method that this is aiming to replicate. + reason: If specified, use data from the corresponding entry in + ``_WONT_IMPLEMENT_REASONS`` to generate a helpful exception message + and docstring for the method. + explanation: If specified, use this string as an explanation for why + this operation is not supported when generating an exception message + and docstring. 
+ """ + if reason is not None: + if reason not in _WONT_IMPLEMENT_REASONS: + raise AssertionError( + f"reason must be one of {list(_WONT_IMPLEMENT_REASONS.keys())}, " + f"got {reason!r}") + reason_data = _WONT_IMPLEMENT_REASONS[reason] + elif explanation is not None: + reason_data = {'explanation': explanation} + else: + raise ValueError("One of (reason, explanation) must be specified") -def wont_implement_method(msg): def wrapper(*args, **kwargs): - raise WontImplementError(msg) + raise WontImplementError( + f"'{name}' is not yet supported {reason_data['explanation']}", + reason=reason) + + wrapper.__name__ = name + wrapper.__doc__ = ( + f":meth:`{_prettify_pandas_type(base_type)}.{name}` is not yet supported " + f"in the Beam DataFrame API {reason_data['explanation']}") + + if 'url' in reason_data: + wrapper.__doc__ += f"\n\n For more information see {reason_data['url']}." return wrapper -def not_implemented_method(op, jira='BEAM-9547'): +def not_implemented_method(op, jira='BEAM-9547', base_type=None): + """Generate a stub method for ``op`` that simply raises a NotImplementedError. + + For internal use only. No backwards compatibility guarantees.""" + assert base_type is not None, "base_type must be specified" + def wrapper(*args, **kwargs): - raise NotImplementedError("'%s' is not yet supported (%s)" % (op, jira)) + raise NotImplementedError( + f"{op!r} is not implemented yet. " + f"If support for {op!r} is important to you, please let the Beam " + "community know by writing to user@beam.apache.org " + "(see https://beam.apache.org/community/contact-us/) or commenting on " + f"https://issues.apache.org/jira/{jira}.") + + wrapper.__name__ = op + wrapper.__doc__ = ( + f":meth:`{_prettify_pandas_type(base_type)}.{op}` is not implemented yet " + "in the Beam DataFrame API.\n\n" + f"If support for {op!r} is important to you, please let the Beam " + "community know by `writing to user@beam.apache.org " + "`_ or commenting on " + f"`{jira} `_.") return wrapper -def copy_and_mutate(func): +def _copy_and_mutate(func): def wrapper(self, *args, **kwargs): copy = self.copy() func(copy, *args, **kwargs) @@ -330,6 +414,17 @@ def wrapper(self, *args, **kwargs): def maybe_inplace(func): + """Handles the inplace= kwarg available in many pandas operations. + + This decorator produces a new function handles the inplace kwarg. When + `inplace=False`, the new function simply yields the result of `func` + directly. + + When `inplace=True`, the output of `func` is used to replace this instances + expression. The result is that any operations applied to this instance after + the inplace operation will refernce the updated expression. + + For internal use only. No backwards compatibility guarantees.""" @functools.wraps(func) def wrapper(self, inplace=False, **kwargs): result = func(self, **kwargs) @@ -342,8 +437,17 @@ def wrapper(self, inplace=False, **kwargs): def args_to_kwargs(base_type): + """Convert all args to kwargs before calling the decorated function. + + When applied to a function, this decorator creates a new function + that always calls the wrapped function with *only* keyword arguments. It + inspects the argspec for the identically-named method on `base_type` to + determine the name to use for arguments that are converted to keyword + arguments. + + For internal use only. 
No backwards compatibility guarantees.""" def wrap(func): - arg_names = _getargspec(_unwrap(getattr(base_type, func.__name__))).args + arg_names = getfullargspec(unwrap(getattr(base_type, func.__name__))).args @functools.wraps(func) def wrapper(*args, **kwargs): @@ -360,9 +464,101 @@ def wrapper(*args, **kwargs): return wrap +BEAM_SPECIFIC = "Differences from pandas" + +SECTION_ORDER = [ + 'Parameters', + 'Returns', + 'Raises', + BEAM_SPECIFIC, + 'See Also', + 'Notes', + 'Examples' +] + +EXAMPLES_DISCLAIMER = ( + "**NOTE:** These examples are pulled directly from the pandas " + "documentation for convenience. Usage of the Beam DataFrame API will look " + "different because it is a deferred API.") +EXAMPLES_DIFFERENCES = EXAMPLES_DISCLAIMER + ( + " In addition, some arguments shown here may not be supported, see " + f"**{BEAM_SPECIFIC!r}** for details.") + + +def with_docs_from(base_type, name=None): + """Decorator that updates the documentation from the wrapped function to + duplicate the documentation from the identically-named method in `base_type`. + + Any docstring on the original function will be included in the new function + under a "Differences from pandas" heading. + """ + def wrap(func): + fn_name = name or func.__name__ + orig_doc = getattr(base_type, fn_name).__doc__ + if orig_doc is None: + return func + + orig_doc = cleandoc(orig_doc) + + section_splits = re.split(r'^(.*)$\n^-+$\n', orig_doc, flags=re.MULTILINE) + intro = section_splits[0].strip() + sections = dict(zip(section_splits[1::2], section_splits[2::2])) + + beam_has_differences = bool(func.__doc__) + + for header, content in sections.items(): + content = content.strip() + + # Replace references to version numbers so its clear they reference + # *pandas* versions + content = re.sub(r'([Vv]ersion\s+[\d\.]+)', r'pandas \1', content) + + if header == "Examples": + content = '\n\n'.join([ + ( + EXAMPLES_DIFFERENCES + if beam_has_differences else EXAMPLES_DISCLAIMER), + # Indent the examples under a doctest heading, + # add skipif option. This makes sure our doctest + # framework doesn't run these pandas tests. + (".. doctest::\n" + " :skipif: True"), + re.sub(r"^", " ", content, flags=re.MULTILINE), + ]) + else: + content = content.replace('DataFrame', 'DeferredDataFrame').replace( + 'Series', 'DeferredSeries') + sections[header] = content + + if beam_has_differences: + sections[BEAM_SPECIFIC] = cleandoc(func.__doc__) + else: + sections[BEAM_SPECIFIC] = ( + "This operation has no known divergences from the " + "pandas API.") + + def format_section(header): + return '\n'.join([header, ''.join('-' for _ in header), sections[header]]) + + func.__doc__ = '\n\n'.join([intro] + [ + format_section(header) for header in SECTION_ORDER if header in sections + ]) + + return func + + return wrap + + def populate_defaults(base_type): + """Populate default values for keyword arguments in decorated function. + + When applied to a function, this decorator creates a new function + with default values for all keyword arguments, based on the default values + for the identically-named method on `base_type`. + + For internal use only. 
No backwards compatibility guarantees.""" def wrap(func): - base_argspec = _getargspec(_unwrap(getattr(base_type, func.__name__))) + base_argspec = getfullargspec(unwrap(getattr(base_type, func.__name__))) if not base_argspec.defaults: return func @@ -371,9 +567,9 @@ def wrap(func): base_argspec.args[-len(base_argspec.defaults):], base_argspec.defaults)) - unwrapped_func = _unwrap(func) + unwrapped_func = unwrap(func) # args that do not have defaults in func, but do have defaults in base - func_argspec = _getargspec(unwrapped_func) + func_argspec = getfullargspec(unwrapped_func) num_non_defaults = len(func_argspec.args) - len(func_argspec.defaults or ()) defaults_to_populate = set( func_argspec.args[:num_non_defaults]).intersection( @@ -391,11 +587,58 @@ def wrapper(**kwargs): return wrap +_WONT_IMPLEMENT_REASONS = { + 'order-sensitive': { + 'explanation': "because it is sensitive to the order of the data.", + 'url': 'https://s.apache.org/dataframe-order-sensitive-operations', + }, + 'non-deferred-columns': { + 'explanation': ( + "because the columns in the output DataFrame depend " + "on the data."), + 'url': 'https://s.apache.org/dataframe-non-deferred-columns', + }, + 'non-deferred-result': { + 'explanation': ( + "because it produces an output type that is not " + "deferred."), + 'url': 'https://s.apache.org/dataframe-non-deferred-result', + }, + 'plotting-tools': { + 'explanation': "because it is a plotting tool.", + 'url': 'https://s.apache.org/dataframe-plotting-tools', + }, + 'event-time-semantics': { + 'explanation': ( + "because implementing it would require integrating with Beam " + "event-time semantics"), + 'url': 'https://s.apache.org/dataframe-event-time-semantics', + }, + 'deprecated': { + 'explanation': "because it is deprecated in pandas.", + }, + 'experimental': { + 'explanation': "because it is experimental in pandas.", + }, +} + + class WontImplementError(NotImplementedError): """An subclass of NotImplementedError to raise indicating that implementing - the given method is infeasible. + the given method is not planned. Raising this error will also prevent this doctests from being validated when run with the beam dataframe validation doctest runner. """ - pass + def __init__(self, msg, reason=None): + if reason is not None: + if reason not in _WONT_IMPLEMENT_REASONS: + raise AssertionError( + f"reason must be one of {list(_WONT_IMPLEMENT_REASONS.keys())}, " + f"got {reason!r}") + + reason_data = _WONT_IMPLEMENT_REASONS[reason] + if 'url' in reason_data: + msg = f"{msg}\nFor more information see {reason_data['url']}." + + super(WontImplementError, self).__init__(msg) diff --git a/sdks/python/apache_beam/dataframe/frame_base_test.py b/sdks/python/apache_beam/dataframe/frame_base_test.py index 557fedbf1d39..82d5b65e1a49 100644 --- a/sdks/python/apache_beam/dataframe/frame_base_test.py +++ b/sdks/python/apache_beam/dataframe/frame_base_test.py @@ -14,8 +14,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -from __future__ import absolute_import - import unittest import pandas as pd diff --git a/sdks/python/apache_beam/dataframe/frames.py b/sdks/python/apache_beam/dataframe/frames.py index 442768b304ca..b834d9cea462 100644 --- a/sdks/python/apache_beam/dataframe/frames.py +++ b/sdks/python/apache_beam/dataframe/frames.py @@ -14,21 +14,47 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-from __future__ import absolute_import +"""Analogs for :class:`pandas.DataFrame` and :class:`pandas.Series`: +:class:`DeferredDataFrame` and :class:`DeferredSeries`. + +These classes are effectively wrappers around a `schema-aware`_ +:class:`~apache_beam.pvalue.PCollection` that provide a set of operations +compatible with the `pandas`_ API. + +Note that we aim for the Beam DataFrame API to be completely compatible with +the pandas API, but there are some features that are currently unimplemented +for various reasons. Pay particular attention to the **'Differences from +pandas'** section for each operation to understand where we diverge. + +.. _schema-aware: + https://beam.apache.org/documentation/programming-guide/#what-is-a-schema +.. _pandas: + https://pandas.pydata.org/ +""" import collections import inspect +import itertools import math import re +import warnings +from typing import List +from typing import Optional import numpy as np import pandas as pd +from pandas.core.groupby.generic import DataFrameGroupBy from apache_beam.dataframe import expressions from apache_beam.dataframe import frame_base from apache_beam.dataframe import io from apache_beam.dataframe import partitionings +__all__ = [ + 'DeferredSeries', + 'DeferredDataFrame', +] + def populate_not_implemented(pd_type): def wrapper(deferred_type): @@ -44,19 +70,150 @@ def wrapper(deferred_type): setattr( deferred_type, attr, - property(frame_base.not_implemented_method(attr))) + property( + frame_base.not_implemented_method(attr, base_type=pd_type))) elif callable(pd_value): - setattr(deferred_type, attr, frame_base.not_implemented_method(attr)) + setattr( + deferred_type, + attr, + frame_base.not_implemented_method(attr, base_type=pd_type)) return deferred_type return wrapper +def _fillna_alias(method): + def wrapper(self, *args, **kwargs): + return self.fillna(*args, method=method, **kwargs) + + wrapper.__name__ = method + wrapper.__doc__ = ( + f'{method} is only supported for axis="columns". ' + 'axis="index" is order-sensitive.') + + return frame_base.with_docs_from(pd.DataFrame)( + frame_base.args_to_kwargs(pd.DataFrame)( + frame_base.populate_defaults(pd.DataFrame)(wrapper))) + + +LIFTABLE_AGGREGATIONS = ['all', 'any', 'max', 'min', 'prod', 'sum'] +LIFTABLE_WITH_SUM_AGGREGATIONS = ['size', 'count'] +UNLIFTABLE_AGGREGATIONS = [ + 'mean', + 'median', + 'quantile', + 'describe', + 'sem', + 'mad', + 'skew', + 'kurt', + 'kurtosis', + # TODO: The below all have specialized distributed + # implementations, but they require tracking + # multiple intermediate series, which is difficult + # to lift in groupby + 'std', + 'var', + 'corr', + 'cov', + 'nunique', +] +ALL_AGGREGATIONS = ( + LIFTABLE_AGGREGATIONS + LIFTABLE_WITH_SUM_AGGREGATIONS + + UNLIFTABLE_AGGREGATIONS) + + +def _agg_method(base, func): + def wrapper(self, *args, **kwargs): + return self.agg(func, *args, **kwargs) + + if func in UNLIFTABLE_AGGREGATIONS: + wrapper.__doc__ = ( + f"``{func}`` cannot currently be parallelized. It will " + "require collecting all data on a single node.") + wrapper.__name__ = func + + return frame_base.with_docs_from(base)(wrapper) + + +# Docstring to use for head and tail (commonly used to peek at datasets) +_PEEK_METHOD_EXPLANATION = ( + "because it is `order-sensitive " + "`_.\n\n" + "If you want to peek at a large dataset consider using interactive Beam's " + ":func:`ib.collect " + "` " + "with ``n`` specified, or :meth:`sample`. 
If you want to find the " + "N largest elements, consider using :meth:`DeferredDataFrame.nlargest`.") + + class DeferredDataFrameOrSeries(frame_base.DeferredFrame): - def __array__(self, dtype=None): - raise frame_base.WontImplementError( - 'Conversion to a non-deferred a numpy array.') + def _render_indexes(self): + if self.index.nlevels == 1: + return 'index=' + ( + '' if self.index.name is None else repr(self.index.name)) + else: + return 'indexes=[' + ', '.join( + '' if ix is None else repr(ix) + for ix in self.index.names) + ']' + + __array__ = frame_base.wont_implement_method( + pd.Series, '__array__', reason="non-deferred-result") + @frame_base.with_docs_from(pd.DataFrame) + @frame_base.args_to_kwargs(pd.DataFrame) + @frame_base.populate_defaults(pd.DataFrame) + @frame_base.maybe_inplace + def drop(self, labels, axis, index, columns, errors, **kwargs): + """drop is not parallelizable when dropping from the index and + ``errors="raise"`` is specified. It requires collecting all data on a single + node in order to detect if one of the index values is missing.""" + if labels is not None: + if index is not None or columns is not None: + raise ValueError("Cannot specify both 'labels' and 'index'/'columns'") + if axis in (0, 'index'): + index = labels + columns = None + elif axis in (1, 'columns'): + index = None + columns = labels + else: + raise ValueError( + "axis must be one of (0, 1, 'index', 'columns'), " + "got '%s'" % axis) + + if columns is not None: + # Compute the proxy based on just the columns that are dropped. + proxy = self._expr.proxy().drop(columns=columns, errors=errors) + else: + proxy = self._expr.proxy() + + if index is not None and errors == 'raise': + # In order to raise an error about missing index values, we'll + # need to collect the entire dataframe. + # TODO: This could be parallelized by putting index values in a + # ConstantExpression and partitioning by index. + requires = partitionings.Singleton( + reason=( + "drop(errors='raise', axis='index') is not currently " + "parallelizable. 
This requires collecting all data on a single " + f"node in order to detect if one of {index!r} is missing.")) + else: + requires = partitionings.Arbitrary() + + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'drop', + lambda df: df.drop( + axis=axis, + index=index, + columns=columns, + errors=errors, + **kwargs), [self._expr], + proxy=proxy, + requires_partition_by=requires)) + + @frame_base.with_docs_from(pd.DataFrame) @frame_base.args_to_kwargs(pd.DataFrame) @frame_base.populate_defaults(pd.DataFrame) def droplevel(self, level, axis): @@ -64,40 +221,124 @@ expressions.ComputedExpression( 'droplevel', lambda df: df.droplevel(level, axis=axis), [self._expr], - requires_partition_by=partitionings.Nothing(), - preserves_partition_by=partitionings.Index() - if axis in (1, 'column') else partitionings.Nothing())) + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Arbitrary() + if axis in (1, 'column') else partitionings.Singleton())) + @frame_base.with_docs_from(pd.DataFrame) @frame_base.args_to_kwargs(pd.DataFrame) @frame_base.populate_defaults(pd.DataFrame) @frame_base.maybe_inplace - def fillna(self, value, method, axis, **kwargs): - if method is not None and axis in (0, 'index'): - raise frame_base.WontImplementError('order-sensitive') - if isinstance(value, frame_base.DeferredBase): + def fillna(self, value, method, axis, limit, **kwargs): + """When ``axis="index"``, both ``method`` and ``limit`` must be ``None``. + Otherwise this operation is order-sensitive.""" + # Default value is None, but is overridden with index. + axis = axis or 'index' + + if axis in (0, 'index'): + if method is not None: + raise frame_base.WontImplementError( + f"fillna(method={method!r}, axis={axis!r}) is not supported " + "because it is order-sensitive. Only fillna(method=None) is " + f"supported with axis={axis!r}.", + reason="order-sensitive") + if limit is not None: + raise frame_base.WontImplementError( + f"fillna(limit={limit!r}, axis={axis!r}) is not supported because " + "it is order-sensitive. Only fillna(limit=None) is supported with " + f"axis={axis!r}.", + reason="order-sensitive") + + if isinstance(self, DeferredDataFrame) and isinstance(value, + DeferredSeries): + # If self is a DataFrame and value is a Series we want to broadcast value + # to all partitions of self. + # This is OK, as its index must be the same size as the columns set of + # self, so cannot be too large. 
+ class AsScalar(object): + def __init__(self, value): + self.value = value + + with expressions.allow_non_parallel_operations(): + value_expr = expressions.ComputedExpression( + 'as_scalar', + lambda df: AsScalar(df), [value._expr], + requires_partition_by=partitionings.Singleton()) + + get_value = lambda x: x.value + requires = partitionings.Arbitrary() + elif isinstance(value, frame_base.DeferredBase): + # For other DeferredBase combinations, use Index partitioning to + # co-locate on the Index value_expr = value._expr + get_value = lambda x: x + requires = partitionings.Index() else: + # Default case, pass value through as a constant, no particular + # partitioning requirement value_expr = expressions.ConstantExpression(value) + get_value = lambda x: x + requires = partitionings.Arbitrary() + return frame_base.DeferredFrame.wrap( # yapf: disable expressions.ComputedExpression( 'fillna', lambda df, - value: df.fillna(value, method=method, axis=axis, **kwargs), - [self._expr, value_expr], - preserves_partition_by=partitionings.Singleton(), - requires_partition_by=partitionings.Nothing())) - - @frame_base.args_to_kwargs(pd.DataFrame) - @frame_base.populate_defaults(pd.DataFrame) - def ffill(self, **kwargs): - return self.fillna(method='ffill', **kwargs) + value: df.fillna( + get_value(value), + method=method, + axis=axis, + limit=limit, + **kwargs), [self._expr, value_expr], + preserves_partition_by=partitionings.Arbitrary(), + requires_partition_by=requires)) + + ffill = _fillna_alias('ffill') + bfill = _fillna_alias('bfill') + backfill = _fillna_alias('backfill') + pad = _fillna_alias('pad') + + @frame_base.with_docs_from(pd.DataFrame) + def first(self, offset): + per_partition = expressions.ComputedExpression( + 'first-per-partition', + lambda df: df.sort_index().first(offset=offset), [self._expr], + preserves_partition_by=partitionings.Arbitrary(), + requires_partition_by=partitionings.Arbitrary()) + with expressions.allow_non_parallel_operations(True): + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'first', + lambda df: df.sort_index().first(offset=offset), [per_partition], + preserves_partition_by=partitionings.Arbitrary(), + requires_partition_by=partitionings.Singleton())) - pad = ffill + @frame_base.with_docs_from(pd.DataFrame) + def last(self, offset): + per_partition = expressions.ComputedExpression( + 'last-per-partition', + lambda df: df.sort_index().last(offset=offset), [self._expr], + preserves_partition_by=partitionings.Arbitrary(), + requires_partition_by=partitionings.Arbitrary()) + with expressions.allow_non_parallel_operations(True): + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'last', + lambda df: df.sort_index().last(offset=offset), [per_partition], + preserves_partition_by=partitionings.Arbitrary(), + requires_partition_by=partitionings.Singleton())) + @frame_base.with_docs_from(pd.DataFrame) @frame_base.args_to_kwargs(pd.DataFrame) @frame_base.populate_defaults(pd.DataFrame) def groupby(self, by, level, axis, as_index, group_keys, **kwargs): + """``as_index`` and ``group_keys`` must both be ``True``. + + Aggregations grouping by a categorical column with ``observed=False`` set + are not currently parallelizable + (`BEAM-11190 <https://issues.apache.org/jira/browse/BEAM-11190>`_).
+ """ if not as_index: raise NotImplementedError('groupby(as_index=False)') if not group_keys: @@ -108,24 +349,47 @@ def groupby(self, by, level, axis, as_index, group_keys, **kwargs): expressions.ComputedExpression( 'groupbycols', lambda df: df.groupby(by, axis=axis, **kwargs), [self._expr], - requires_partition_by=partitionings.Nothing(), - preserves_partition_by=partitionings.Index())) + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Arbitrary())) if level is None and by is None: raise TypeError("You have to supply one of 'by' and 'level'") elif level is not None: if isinstance(level, (list, tuple)): - levels = level - else: - levels = [level] - all_levels = self._expr.proxy().index.names - levels = [all_levels[i] if isinstance(i, int) else i for i in levels] - levels_to_drop = self._expr.proxy().index.names.difference(levels) - if levels_to_drop: - to_group = self.droplevel(levels_to_drop)._expr + grouping_indexes = level else: + grouping_indexes = [level] + + grouping_columns = [] + + index = self._expr.proxy().index + + # Translate to level numbers only + grouping_indexes = [ + l if isinstance(l, int) else index.names.index(l) + for l in grouping_indexes + ] + + if index.nlevels == 1: + to_group_with_index = self._expr to_group = self._expr + else: + levels_to_drop = [ + i for i in range(index.nlevels) if i not in grouping_indexes + ] + + # Reorder so the grouped indexes are first + to_group_with_index = self.reorder_levels( + grouping_indexes + levels_to_drop) + + grouping_indexes = list(range(len(grouping_indexes))) + levels_to_drop = list(range(len(grouping_indexes), index.nlevels)) + if levels_to_drop: + to_group = to_group_with_index.droplevel(levels_to_drop)._expr + else: + to_group = to_group_with_index._expr + to_group_with_index = to_group_with_index._expr elif callable(by): @@ -137,33 +401,117 @@ def map_index(df): to_group = expressions.ComputedExpression( 'map_index', map_index, [self._expr], - requires_partition_by=partitionings.Nothing(), - preserves_partition_by=partitionings.Nothing()) + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Singleton()) + + orig_nlevels = self._expr.proxy().index.nlevels + + def prepend_mapped_index(df): + df = df.copy() + + index = df.index.to_frame() + index.insert(0, None, df.index.map(by)) + + df.index = pd.MultiIndex.from_frame( + index, names=[None] + list(df.index.names)) + return df + + to_group_with_index = expressions.ComputedExpression( + 'map_index_keep_orig', + prepend_mapped_index, + [self._expr], + requires_partition_by=partitionings.Arbitrary(), + # Partitioning by the original indexes is preserved + preserves_partition_by=partitionings.Index( + list(range(1, orig_nlevels + 1)))) + + grouping_columns = [] + # The index we need to group by is the last one + grouping_indexes = [0] elif isinstance(by, DeferredSeries): + if isinstance(self, DeferredSeries): - raise NotImplementedError( - "grouping by a Series is not yet implemented. 
You can group by a " - "DataFrame column by specifying its name.") + def set_index(s, by): + df = pd.DataFrame(s) + df, by = df.align(by, axis=0, join='inner') + return df.set_index(by).iloc[:, 0] + + def prepend_index(s, by): + df = pd.DataFrame(s) + df, by = df.align(by, axis=0, join='inner') + return df.set_index([by, df.index]).iloc[:, 0] + + else: + + def set_index(df, by): # type: ignore + df, by = df.align(by, axis=0, join='inner') + return df.set_index(by) + + def prepend_index(df, by): # type: ignore + df, by = df.align(by, axis=0, join='inner') + return df.set_index([by, df.index]) + + to_group = expressions.ComputedExpression( + 'set_index', + set_index, [self._expr, by._expr], + requires_partition_by=partitionings.Index(), + preserves_partition_by=partitionings.Singleton()) + + orig_nlevels = self._expr.proxy().index.nlevels + to_group_with_index = expressions.ComputedExpression( + 'prependindex', + prepend_index, [self._expr, by._expr], + requires_partition_by=partitionings.Index(), + preserves_partition_by=partitionings.Index( + list(range(1, orig_nlevels + 1)))) + + grouping_columns = [] + grouping_indexes = [0] elif isinstance(by, np.ndarray): - raise frame_base.WontImplementError('order sensitive') + raise frame_base.WontImplementError( + "Grouping by a concrete ndarray is order sensitive.", + reason="order-sensitive") elif isinstance(self, DeferredDataFrame): if not isinstance(by, list): by = [by] + # Find the columns that we need to move into the index so we can group by + # them + column_names = self._expr.proxy().columns + grouping_columns = list(set(by).intersection(column_names)) index_names = self._expr.proxy().index.names - index_names_in_by = list(set(by).intersection(index_names)) - if index_names_in_by: + for label in by: + if label not in index_names and label not in self._expr.proxy().columns: + raise KeyError(label) + grouping_indexes = list(set(by).intersection(index_names)) + + if grouping_indexes: if set(by) == set(index_names): to_group = self._expr elif set(by).issubset(index_names): to_group = self.droplevel(index_names.difference(by))._expr else: - to_group = self.reset_index(index_names_in_by).set_index(by)._expr + to_group = self.reset_index(grouping_indexes).set_index(by)._expr else: to_group = self.set_index(by)._expr + if grouping_columns: + # TODO(BEAM-11711): It should be possible to do this without creating an + # expression manually, by using DeferredDataFrame.set_index, i.e.: + # to_group_with_index = self.set_index([self.index] + + # grouping_columns)._expr + to_group_with_index = expressions.ComputedExpression( + 'move_grouped_columns_to_index', + lambda df: df.set_index([df.index] + grouping_columns, drop=False), + [self._expr], + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Index( + list(range(self._expr.proxy().index.nlevels)))) + else: + to_group_with_index = self._expr + else: raise NotImplementedError(by) @@ -173,36 +521,593 @@ def map_index(df): lambda df: df.groupby( level=list(range(df.index.nlevels)), **kwargs), [to_group], requires_partition_by=partitionings.Index(), - preserves_partition_by=partitionings.Singleton()), - kwargs) + preserves_partition_by=partitionings.Arbitrary()), + kwargs, + to_group, + to_group_with_index, + grouping_columns=grouping_columns, + grouping_indexes=grouping_indexes) + + @property # type: ignore + @frame_base.with_docs_from(pd.DataFrame) + def loc(self): + return _DeferredLoc(self) + + @property # type: ignore + @frame_base.with_docs_from(pd.DataFrame) + 
def iloc(self): + """Position-based indexing with `iloc` is order-sensitive in almost every + case. Beam DataFrame users should prefer label-based indexing with `loc`. + """ + return _DeferredILoc(self) + + @frame_base.with_docs_from(pd.DataFrame) + @frame_base.args_to_kwargs(pd.DataFrame) + @frame_base.populate_defaults(pd.DataFrame) + @frame_base.maybe_inplace + def reset_index(self, level=None, **kwargs): + """Dropping the entire index (e.g. with ``reset_index(level=None)``) is + not parallelizable. It is also only guaranteed that the newly generated + index values will be unique. The Beam DataFrame API makes no guarantee + that the same index values as the equivalent pandas operation will be + generated, because that implementation is order-sensitive.""" + if level is not None and not isinstance(level, (tuple, list)): + level = [level] + if level is None or len(level) == self._expr.proxy().index.nlevels: + # TODO(BEAM-12182): Could do distributed re-index with offsets. + requires_partition_by = partitionings.Singleton( + reason=( + f"reset_index(level={level!r}) drops the entire index and " + "creates a new one, so it cannot currently be parallelized " + "(BEAM-12182).")) + else: + requires_partition_by = partitionings.Arbitrary() + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'reset_index', + lambda df: df.reset_index(level=level, **kwargs), [self._expr], + preserves_partition_by=partitionings.Singleton(), + requires_partition_by=requires_partition_by)) + + abs = frame_base._elementwise_method('abs', base=pd.core.generic.NDFrame) + + @frame_base.with_docs_from(pd.core.generic.NDFrame) + @frame_base.args_to_kwargs(pd.core.generic.NDFrame) + @frame_base.populate_defaults(pd.core.generic.NDFrame) + def astype(self, dtype, copy, errors): + """astype is not parallelizable when ``errors="ignore"`` is specified. + + ``copy=False`` is not supported because it relies on memory-sharing + semantics. + + ``dtype="category"`` is not supported because the type of the output column + depends on the data. Please use ``pd.CategoricalDtype`` with explicit + categories instead. + """ + requires = partitionings.Arbitrary() + + if errors == "ignore": + # We need all data in order to ignore errors and propagate the original + # data. + requires = partitionings.Singleton( + reason=( + f"astype(errors={errors!r}) is currently not parallelizable, " + "because all data must be collected on one node to determine if " + "the original data should be propagated instead.")) + + if not copy: + raise frame_base.WontImplementError( + f"astype(copy={copy!r}) is not supported because it relies on " + "memory-sharing semantics that are not compatible with the Beam " + "model.") + + if dtype == 'category': + raise frame_base.WontImplementError( + "astype(dtype='category') is not supported because the type of the " + "output column depends on the data.
Please use pd.CategoricalDtype " + "with explicit categories instead.", + reason="non-deferred-columns") + + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'astype', + lambda df: df.astype(dtype=dtype, copy=copy, errors=errors), + [self._expr], + requires_partition_by=requires, + preserves_partition_by=partitionings.Arbitrary())) + + at_time = frame_base._elementwise_method( + 'at_time', base=pd.core.generic.NDFrame) + between_time = frame_base._elementwise_method( + 'between_time', base=pd.core.generic.NDFrame) + copy = frame_base._elementwise_method('copy', base=pd.core.generic.NDFrame) + + @frame_base.with_docs_from(pd.DataFrame) + @frame_base.args_to_kwargs(pd.DataFrame) + @frame_base.populate_defaults(pd.DataFrame) + @frame_base.maybe_inplace + def replace(self, to_replace, value, limit, method, **kwargs): + """``method`` is not supported in the Beam DataFrame API because it is + order-sensitive. It cannot be specified. + + If ``limit`` is specified this operation is not parallelizable.""" + if method is not None and not isinstance(to_replace, + dict) and value is None: + # pandas only relies on method if to_replace is not a dictionary, and + # value is None + raise frame_base.WontImplementError( + f"replace(method={method!r}) is not supported because it is " + "order sensitive. Only replace(method=None) is supported.", + reason="order-sensitive") + + if limit is None: + requires_partition_by = partitionings.Arbitrary() + else: + requires_partition_by = partitionings.Singleton( + reason=( + f"replace(limit={limit!r}) cannot currently be parallelized. It " + "requires collecting all data on a single node.")) + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'replace', + lambda df: df.replace( + to_replace=to_replace, + value=value, + limit=limit, + method=method, + **kwargs), [self._expr], + preserves_partition_by=partitionings.Arbitrary(), + requires_partition_by=requires_partition_by)) + + @frame_base.with_docs_from(pd.DataFrame) + @frame_base.args_to_kwargs(pd.DataFrame) + @frame_base.populate_defaults(pd.DataFrame) + def tz_localize(self, ambiguous, **kwargs): + """``ambiguous`` cannot be set to ``"infer"`` as its semantics are + order-sensitive. Similarly, specifying ``ambiguous`` as an + :class:`~numpy.ndarray` is order-sensitive, but you can achieve similar + functionality by specifying ``ambiguous`` as a Series.""" + if isinstance(ambiguous, np.ndarray): + raise frame_base.WontImplementError( + "tz_localize(ambiguous=ndarray) is not supported because it makes " + "this operation sensitive to the order of the data. 
Please use a " + "DeferredSeries instead.", + reason="order-sensitive") + elif isinstance(ambiguous, frame_base.DeferredFrame): + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'tz_localize', + lambda df, + ambiguous: df.tz_localize(ambiguous=ambiguous, **kwargs), + [self._expr, ambiguous._expr], + requires_partition_by=partitionings.Index(), + preserves_partition_by=partitionings.Singleton())) + elif ambiguous == 'infer': + # infer attempts to infer based on the order of the timestamps + raise frame_base.WontImplementError( + f"tz_localize(ambiguous={ambiguous!r}) is not allowed because it " + "makes this operation sensitive to the order of the data.", + reason="order-sensitive") + + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'tz_localize', + lambda df: df.tz_localize(ambiguous=ambiguous, **kwargs), + [self._expr], + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Singleton())) + + @property # type: ignore + @frame_base.with_docs_from(pd.DataFrame) + def size(self): + sizes = expressions.ComputedExpression( + 'get_sizes', + # Wrap scalar results in a Series for easier concatenation later + lambda df: pd.Series(df.size), + [self._expr], + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Singleton()) + + with expressions.allow_non_parallel_operations(True): + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'sum_sizes', + lambda sizes: sizes.sum(), [sizes], + requires_partition_by=partitionings.Singleton(), + preserves_partition_by=partitionings.Singleton())) + + def length(self): + """Alternative to ``len(df)`` which returns a deferred result that can be + used in arithmetic with :class:`DeferredSeries` or + :class:`DeferredDataFrame` instances.""" + lengths = expressions.ComputedExpression( + 'get_lengths', + # Wrap scalar results in a Series for easier concatenation later + lambda df: pd.Series(len(df)), + [self._expr], + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Singleton()) + + with expressions.allow_non_parallel_operations(True): + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'sum_lengths', + lambda lengths: lengths.sum(), [lengths], + requires_partition_by=partitionings.Singleton(), + preserves_partition_by=partitionings.Singleton())) + + def __len__(self): + raise frame_base.WontImplementError( + "len(df) is not currently supported because it produces a non-deferred " + "result. 
Consider using df.length() instead.", + reason="non-deferred-result") + + @property # type: ignore + @frame_base.with_docs_from(pd.DataFrame) + def empty(self): + empties = expressions.ComputedExpression( + 'get_empties', + # Wrap scalar results in a Series for easier concatenation later + lambda df: pd.Series(df.empty), + [self._expr], + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Singleton()) + + with expressions.allow_non_parallel_operations(True): + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'check_all_empty', + lambda empties: empties.all(), [empties], + requires_partition_by=partitionings.Singleton(), + preserves_partition_by=partitionings.Singleton())) + + @frame_base.with_docs_from(pd.DataFrame) + def bool(self): + # TODO: Documentation about DeferredScalar + # Will throw if any partition has >1 element + bools = expressions.ComputedExpression( + 'get_bools', + # Wrap scalar results in a Series for easier concatenation later + lambda df: pd.Series([], dtype=bool) + if df.empty else pd.Series([df.bool()]), + [self._expr], + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Singleton()) + + with expressions.allow_non_parallel_operations(True): + # Will throw if overall dataset has != 1 element + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'combine_all_bools', + lambda bools: bools.bool(), [bools], + proxy=bool(), + requires_partition_by=partitionings.Singleton(), + preserves_partition_by=partitionings.Singleton())) + + @frame_base.with_docs_from(pd.DataFrame) + def equals(self, other): + intermediate = expressions.ComputedExpression( + 'equals_partitioned', + # Wrap scalar results in a Series for easier concatenation later + lambda df, + other: pd.Series(df.equals(other)), + [self._expr, other._expr], + requires_partition_by=partitionings.Index(), + preserves_partition_by=partitionings.Singleton()) + + with expressions.allow_non_parallel_operations(True): + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'aggregate_equals', + lambda df: df.all(), [intermediate], + requires_partition_by=partitionings.Singleton(), + preserves_partition_by=partitionings.Singleton())) + + @frame_base.args_to_kwargs(pd.DataFrame) + @frame_base.populate_defaults(pd.DataFrame) + def sort_values(self, axis, **kwargs): + """``sort_values`` is not implemented. + + It is not implemented for ``axis=index`` because it imposes an ordering on + the dataset, and it likely will not be maintained (see + https://s.apache.org/dataframe-order-sensitive-operations). 
+ + It is not implemented for ``axis=columns`` because it makes the order of + the columns depend on the data (see + https://s.apache.org/dataframe-non-deferred-columns).""" + if axis in (0, 'index'): + # axis=index imposes an ordering on the DataFrame rows which we do not + # support + raise frame_base.WontImplementError( + "sort_values(axis=index) is not supported because it imposes an " + "ordering on the dataset which likely will not be preserved.", + reason="order-sensitive") + else: + # axis=columns will reorder the columns based on the data + raise frame_base.WontImplementError( + "sort_values(axis=columns) is not supported because the order of the " + "columns in the result depends on the data.", + reason="non-deferred-columns") + + @frame_base.with_docs_from(pd.DataFrame) + @frame_base.args_to_kwargs(pd.DataFrame) + @frame_base.populate_defaults(pd.DataFrame) + @frame_base.maybe_inplace + def sort_index(self, axis, **kwargs): + """``axis=index`` is not allowed because it imposes an ordering on the + dataset, and we cannot guarantee it will be maintained (see + https://s.apache.org/dataframe-order-sensitive-operations). Only + ``axis=columns`` is allowed.""" + if axis in (0, 'index'): + # axis=rows imposes an ordering on the DataFrame which we do not support + raise frame_base.WontImplementError( + "sort_index(axis=index) is not supported because it imposes an " + "ordering on the dataset which we cannot guarantee will be " + "preserved.", + reason="order-sensitive") + + # axis=columns reorders the columns by name + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'sort_index', + lambda df: df.sort_index(axis, **kwargs), + [self._expr], + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Arbitrary(), + )) + + @frame_base.with_docs_from(pd.DataFrame) + @frame_base.args_to_kwargs(pd.DataFrame) + @frame_base.populate_defaults(pd.DataFrame) + @frame_base.maybe_inplace + def where(self, cond, other, errors, **kwargs): + """where is not parallelizable when ``errors="ignore"`` is specified.""" + requires = partitionings.Arbitrary() + deferred_args = {} + actual_args = {} + + # TODO(bhulette): This is very similar to the logic in + # frame_base.elementwise_method, can we unify it? + if isinstance(cond, frame_base.DeferredFrame): + deferred_args['cond'] = cond + requires = partitionings.Index() + else: + actual_args['cond'] = cond - abs = frame_base._elementwise_method('abs') - astype = frame_base._elementwise_method('astype') - copy = frame_base._elementwise_method('copy') + if isinstance(other, frame_base.DeferredFrame): + deferred_args['other'] = other + requires = partitionings.Index() + else: + actual_args['other'] = other + + if errors == "ignore": + # We need all data in order to ignore errors and propagate the original + # data. 
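# Rough, pandas-level sketch of the co-location where() relies on (the example
# data is assumed, purely illustrative): deferred ``cond``/``other`` inputs
# have to live on the same partitions as the corresponding rows of ``self``,
# which is why they force Index partitioning above, while errors="ignore"
# additionally needs a global view of the data.
#
#   df = pd.DataFrame({'a': [1, -2, 3]})
#   other = pd.DataFrame({'a': [10, 20, 30]})
#   df.where(df > 0, other)  # keeps 1 and 3, takes 20 from `other` for row 1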
+ requires = partitionings.Singleton( + reason=( + f"where(errors={errors!r}) is currently not parallelizable, " + "because all data must be collected on one node to determine if " + "the original data should be propagated instead.")) + + actual_args['errors'] = errors + + def where_execution(df, *args): + runtime_values = { + name: value + for (name, value) in zip(deferred_args.keys(), args) + } + return df.where(**runtime_values, **actual_args, **kwargs) + + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + "where", + where_execution, + [self._expr] + [df._expr for df in deferred_args.values()], + requires_partition_by=requires, + preserves_partition_by=partitionings.Index(), + )) + + @frame_base.with_docs_from(pd.DataFrame) + @frame_base.args_to_kwargs(pd.DataFrame) + @frame_base.populate_defaults(pd.DataFrame) + @frame_base.maybe_inplace + def mask(self, cond, **kwargs): + """mask is not parallelizable when ``errors="ignore"`` is specified.""" + return self.where(~cond, **kwargs) + + @frame_base.with_docs_from(pd.DataFrame) + @frame_base.args_to_kwargs(pd.DataFrame) + @frame_base.populate_defaults(pd.DataFrame) + def xs(self, key, axis, level, **kwargs): + """Note that ``xs(axis='index')`` will raise a ``KeyError`` at execution + time if the key does not exist in the index.""" + + if axis in ('columns', 1): + # Special case for axis=columns. This is a simple project that raises a + # KeyError at construction time for missing columns. + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'xs', + lambda df: df.xs(key, axis=axis, **kwargs), [self._expr], + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Arbitrary())) + elif axis not in ('index', 0): + # Make sure that user's axis is valid + raise ValueError( + "axis must be one of ('index', 0, 'columns', 1). " + f"got {axis!r}.") + + if not isinstance(key, tuple): + key = (key, ) + + key_size = len(key) + key_series = pd.Series([key], pd.MultiIndex.from_tuples([key])) + key_expr = expressions.ConstantExpression( + key_series, proxy=key_series.iloc[:0]) + + if level is None: + reindexed = self + else: + if not isinstance(level, list): + level = [level] + + # If user specifed levels, reindex so those levels are at the beginning. + # Keep the others and preserve their order. + level = [ + l if isinstance(l, int) else list(self.index.names).index(l) + for l in level + ] + + reindexed = self.reorder_levels( + level + [i for i in range(self.index.nlevels) if i not in level]) + + def xs_partitioned(frame, key): + if not len(key): + # key is not in this partition, return empty dataframe + return frame.iloc[:0].droplevel(list(range(key_size))) + + # key should be in this partition, call xs. Will raise KeyError if not + # present. 
+ return frame.xs(key.item()) + + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'xs', + xs_partitioned, + [reindexed._expr, key_expr], + requires_partition_by=partitionings.Index(list(range(key_size))), + # Drops index levels, so partitioning is not preserved + preserves_partition_by=partitionings.Singleton())) @property def dtype(self): return self._expr.proxy().dtype - dtypes = dtype + isin = frame_base._elementwise_method('isin', base=pd.DataFrame) + combine_first = frame_base._elementwise_method( + 'combine_first', base=pd.DataFrame) + + combine = frame_base._proxy_method( + 'combine', + base=pd.DataFrame, + requires_partition_by=expressions.partitionings.Singleton( + reason="combine() is not parallelizable because func might operate " + "on the full dataset."), + preserves_partition_by=expressions.partitionings.Singleton()) + + @property # type: ignore + @frame_base.with_docs_from(pd.DataFrame) + def ndim(self): + return self._expr.proxy().ndim - def _get_index(self): + @property # type: ignore + @frame_base.with_docs_from(pd.DataFrame) + def index(self): return _DeferredIndex(self) - index = property( - _get_index, frame_base.not_implemented_method('index (setter)')) + @index.setter + def _set_index(self, value): + # TODO: assigning the index is generally order-sensitive, but we could + # support it in some rare cases, e.g. when assigning the index from one + # of a DataFrame's columns + raise NotImplementedError( + "Assigning an index is not yet supported. " + "Consider using set_index() instead.") + + reindex = frame_base.wont_implement_method( + pd.DataFrame, 'reindex', reason="order-sensitive") + + hist = frame_base.wont_implement_method( + pd.DataFrame, 'hist', reason="plotting-tools") + + attrs = property( + frame_base.wont_implement_method( + pd.DataFrame, 'attrs', reason='experimental')) + + reorder_levels = frame_base._proxy_method( + 'reorder_levels', + base=pd.DataFrame, + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Singleton()) + + resample = frame_base.wont_implement_method( + pd.DataFrame, 'resample', reason='event-time-semantics') + + rolling = frame_base.wont_implement_method( + pd.DataFrame, 'rolling', reason='event-time-semantics') + + to_xarray = frame_base.wont_implement_method( + pd.DataFrame, 'to_xarray', reason='non-deferred-result') + to_clipboard = frame_base.wont_implement_method( + pd.DataFrame, 'to_clipboard', reason="non-deferred-result") + + swapaxes = frame_base.wont_implement_method( + pd.Series, 'swapaxes', reason="non-deferred-columns") + infer_object = frame_base.wont_implement_method( + pd.Series, 'infer_objects', reason="non-deferred-columns") + + ewm = frame_base.wont_implement_method( + pd.Series, 'ewm', reason="event-time-semantics") + expanding = frame_base.wont_implement_method( + pd.Series, 'expanding', reason="event-time-semantics") + + sparse = property( + frame_base.not_implemented_method( + 'sparse', 'BEAM-12425', base_type=pd.DataFrame)) + + transform = frame_base._elementwise_method('transform', base=pd.DataFrame) + + tz_convert = frame_base._proxy_method( + 'tz_convert', + base=pd.DataFrame, + requires_partition_by=partitionings.Arbitrary(), + # Manipulates index, partitioning is not preserved + preserves_partition_by=partitionings.Singleton()) @populate_not_implemented(pd.Series) @frame_base.DeferredFrame._register_for(pd.Series) class DeferredSeries(DeferredDataFrameOrSeries): + def __repr__(self): + return ( + f'DeferredSeries(name={self.name!r}, 
dtype={self.dtype}, ' + f'{self._render_indexes()})') + + @property # type: ignore + @frame_base.with_docs_from(pd.Series) + def name(self): + return self._expr.proxy().name + + @name.setter + def name(self, value): + def fn(s): + s = s.copy() + s.name = value + return s + + self._expr = expressions.ComputedExpression( + 'series_set_name', + fn, [self._expr], + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Arbitrary()) + + @property # type: ignore + @frame_base.with_docs_from(pd.Series) + def dtype(self): + return self._expr.proxy().dtype + + dtypes = dtype + def __getitem__(self, key): if _is_null_slice(key) or key is Ellipsis: return self elif (isinstance(key, int) or _is_integer_slice(key) ) and self._expr.proxy().index._should_fallback_to_positional(): - raise frame_base.WontImplementError('order sensitive') + raise frame_base.WontImplementError( + "Accessing an item by an integer key is order sensitive for this " + "Series.", + reason="order-sensitive") elif isinstance(key, slice) or callable(key): return frame_base.DeferredFrame.wrap( @@ -211,8 +1116,8 @@ def __getitem__(self, key): 'getitem', lambda df: df[key], [self._expr], - requires_partition_by=partitionings.Nothing(), - preserves_partition_by=partitionings.Singleton())) + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Arbitrary())) elif isinstance(key, DeferredSeries) and key._expr.proxy().dtype == bool: return frame_base.DeferredFrame.wrap( @@ -223,23 +1128,83 @@ def __getitem__(self, key): indexer: df[indexer], [self._expr, key._expr], requires_partition_by=partitionings.Index(), - preserves_partition_by=partitionings.Singleton())) + preserves_partition_by=partitionings.Arbitrary())) elif pd.core.series.is_iterator(key) or pd.core.common.is_bool_indexer(key): - raise frame_base.WontImplementError('order sensitive') + raise frame_base.WontImplementError( + "Accessing a DeferredSeries with an iterator is sensitive to the " + "order of the data.", + reason="order-sensitive") else: # We could consider returning a deferred scalar, but that might # be more surprising than a clear error. - raise frame_base.WontImplementError('non-deferred') + raise frame_base.WontImplementError( + f"Indexing a series with key of type {type(key)} is not supported " + "because it produces a non-deferred result.", + reason="non-deferred-result") + + @frame_base.with_docs_from(pd.Series) + def keys(self): + return self.index + + # Series.T == transpose. 
Both are a no-op + T = frame_base._elementwise_method('T', base=pd.Series) + transpose = frame_base._elementwise_method('transpose', base=pd.Series) + shape = property( + frame_base.wont_implement_method( + pd.Series, 'shape', reason="non-deferred-result")) + + @frame_base.with_docs_from(pd.Series) + @frame_base.args_to_kwargs(pd.Series) + @frame_base.populate_defaults(pd.Series) + def append(self, to_append, ignore_index, verify_integrity, **kwargs): + """``ignore_index=True`` is not supported, because it requires generating an + order-sensitive index.""" + if not isinstance(to_append, DeferredSeries): + raise frame_base.WontImplementError( + "append() only accepts DeferredSeries instances, received " + + str(type(to_append))) + if ignore_index: + raise frame_base.WontImplementError( + "append(ignore_index=True) is order sensitive because it requires " + "generating a new index based on the order of the data.", + reason="order-sensitive") + + if verify_integrity: + # We can verify the index is non-unique within index partitioned data. + requires = partitionings.Index() + else: + requires = partitionings.Arbitrary() + + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'append', + lambda s, + to_append: s.append( + to_append, verify_integrity=verify_integrity, **kwargs), + [self._expr, to_append._expr], + requires_partition_by=requires, + preserves_partition_by=partitionings.Arbitrary())) + @frame_base.with_docs_from(pd.Series) @frame_base.args_to_kwargs(pd.Series) @frame_base.populate_defaults(pd.Series) def align(self, other, join, axis, level, method, **kwargs): + """Aligning per-level is not yet supported. Only the default, + ``level=None``, is allowed. + + Filling NaN values via ``method`` is not supported, because it is + `order-sensitive + `_. + Only the default, ``method=None``, is allowed.""" if level is not None: raise NotImplementedError('per-level align') if method is not None: - raise frame_base.WontImplementError('order-sensitive') + raise frame_base.WontImplementError( + f"align(method={method!r}) is not supported because it is " + "order sensitive. Only align(method=None) is supported.", + reason="order-sensitive") # We're using pd.concat here as expressions don't yet support # multiple return values. 
aligned = frame_base.DeferredFrame.wrap( @@ -249,28 +1214,82 @@ def align(self, other, join, axis, level, method, **kwargs): y: pd.concat([x, y], axis=1, join='inner'), [self._expr, other._expr], requires_partition_by=partitionings.Index(), - preserves_partition_by=partitionings.Index())) + preserves_partition_by=partitionings.Arbitrary())) return aligned.iloc[:, 0], aligned.iloc[:, 1] - array = property(frame_base.wont_implement_method('non-deferred value')) - - rename = frame_base._elementwise_method('rename') - between = frame_base._elementwise_method('between') + argsort = frame_base.wont_implement_method( + pd.Series, 'argsort', reason="order-sensitive") + + array = property( + frame_base.wont_implement_method( + pd.Series, 'array', reason="non-deferred-result")) + + # We can't reliably predict the output type, it depends on whether `key` is: + # - not in the index (default_value) + # - in the index once (constant) + # - in the index multiple times (Series) + get = frame_base.wont_implement_method( + pd.Series, 'get', reason="non-deferred-columns") + + ravel = frame_base.wont_implement_method( + pd.Series, 'ravel', reason="non-deferred-result") + + slice_shift = frame_base.wont_implement_method( + pd.Series, 'slice_shift', reason="deprecated") + tshift = frame_base.wont_implement_method( + pd.Series, 'tshift', reason="deprecated") + + rename = frame_base._elementwise_method('rename', base=pd.Series) + between = frame_base._elementwise_method('between', base=pd.Series) + + add_suffix = frame_base._proxy_method( + 'add_suffix', + base=pd.DataFrame, + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Singleton()) + add_prefix = frame_base._proxy_method( + 'add_prefix', + base=pd.DataFrame, + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Singleton()) + + @frame_base.with_docs_from(pd.Series) + @frame_base.args_to_kwargs(pd.Series) + @frame_base.populate_defaults(pd.Series) + def explode(self, ignore_index): + # ignoring the index will not preserve it + preserves = ( + partitionings.Singleton() if ignore_index else partitionings.Index()) + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'explode', + lambda s: s.explode(ignore_index), [self._expr], + preserves_partition_by=preserves, + requires_partition_by=partitionings.Arbitrary())) + @frame_base.with_docs_from(pd.DataFrame) def dot(self, other): + """``other`` must be a :class:`DeferredDataFrame` or :class:`DeferredSeries` + instance. Computing the dot product with an array-like is not supported + because it is order-sensitive.""" left = self._expr if isinstance(other, DeferredSeries): right = expressions.ComputedExpression( 'to_dataframe', pd.DataFrame, [other._expr], - requires_partition_by=partitionings.Nothing(), - preserves_partition_by=partitionings.Index()) + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Arbitrary()) right_is_series = True elif isinstance(other, DeferredDataFrame): right = other._expr right_is_series = False else: - raise frame_base.WontImplementError('non-deferred result') + raise frame_base.WontImplementError( + "other must be a DeferredDataFrame or DeferredSeries instance. " + "Passing a concrete list or numpy array is not supported. 
Those " + "types have no index and must be joined based on the order of the " + "data.", + reason="order-sensitive") dots = expressions.ComputedExpression( 'dot', @@ -296,9 +1315,46 @@ def dot(self, other): __matmul__ = dot + @frame_base.with_docs_from(pd.Series) @frame_base.args_to_kwargs(pd.Series) @frame_base.populate_defaults(pd.Series) - def std(self, axis, skipna, level, ddof, **kwargs): + def nunique(self, **kwargs): + return self.drop_duplicates(keep="any").size + + @frame_base.with_docs_from(pd.Series) + @frame_base.args_to_kwargs(pd.Series) + @frame_base.populate_defaults(pd.Series) + def quantile(self, q, **kwargs): + """quantile is not parallelizable. See + `BEAM-12167 `_ tracking + the possible addition of an approximate, parallelizable implementation of + quantile.""" + # TODO(BEAM-12167): Provide an option for approximate distributed + # quantiles + requires = partitionings.Singleton( + reason=( + "Computing quantiles across index cannot currently be " + "parallelized. See BEAM-12167 tracking the possible addition of an " + "approximate, parallelizable implementation of quantile.")) + + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'quantile', + lambda df: df.quantile(q=q, **kwargs), [self._expr], + requires_partition_by=requires, + preserves_partition_by=partitionings.Singleton())) + + @frame_base.with_docs_from(pd.Series) + def std(self, *args, **kwargs): + # Compute variance (deferred scalar) with same args, then sqrt it + return self.var(*args, **kwargs).apply(lambda var: math.sqrt(var)) + + @frame_base.with_docs_from(pd.Series) + @frame_base.args_to_kwargs(pd.Series) + @frame_base.populate_defaults(pd.Series) + def var(self, axis, skipna, level, ddof, **kwargs): + """Per-level aggregation is not yet supported (BEAM-11777). Only the + default, ``level=None``, is allowed.""" if level is not None: raise NotImplementedError("per-level aggregation") if skipna is None or skipna: @@ -329,12 +1385,12 @@ def combine_moments(data): if n <= ddof: return float('nan') else: - return math.sqrt(m / (n - ddof)) + return m / (n - ddof) moments = expressions.ComputedExpression( 'compute_moments', compute_moments, [self._expr], - requires_partition_by=partitionings.Nothing()) + requires_partition_by=partitionings.Arbitrary()) with expressions.allow_non_parallel_operations(True): return frame_base.DeferredFrame.wrap( expressions.ComputedExpression( @@ -342,14 +1398,20 @@ def combine_moments(data): combine_moments, [moments], requires_partition_by=partitionings.Singleton())) + @frame_base.with_docs_from(pd.Series) @frame_base.args_to_kwargs(pd.Series) @frame_base.populate_defaults(pd.Series) def corr(self, other, method, min_periods): + """Only ``method='pearson'`` is currently parallelizable.""" if method == 'pearson': # Note that this is the default. x, y = self.dropna().align(other.dropna(), 'inner') return x._corr_aligned(y, min_periods) else: + reason = ( + f"Encountered corr(method={method!r}) which cannot be " + "parallelized. Only corr(method='pearson') is currently " + "parallelizable.") # The rank-based correlations are not obviously parallelizable, though # perhaps an approximation could be done with a knowledge of quantiles # and custom partitioning. 
@@ -359,7 +1421,7 @@ def corr(self, other, method, min_periods): lambda df, other: df.corr(other, method=method, min_periods=min_periods), [self._expr, other._expr], - requires_partition_by=partitionings.Singleton())) + requires_partition_by=partitionings.Singleton(reason=reason))) def _corr_aligned(self, other, min_periods): std_x = self.std() @@ -368,6 +1430,7 @@ def _corr_aligned(self, other, min_periods): return cov.apply( lambda cov, std_x, std_y: cov / (std_x * std_y), args=[std_x, std_y]) + @frame_base.with_docs_from(pd.Series) @frame_base.args_to_kwargs(pd.Series) @frame_base.populate_defaults(pd.Series) def cov(self, other, min_periods, ddof): @@ -418,6 +1481,7 @@ def combine_co_moments(data): combine_co_moments, [moments], requires_partition_by=partitionings.Singleton())) + @frame_base.with_docs_from(pd.Series) @frame_base.args_to_kwargs(pd.Series) @frame_base.populate_defaults(pd.Series) @frame_base.maybe_inplace @@ -426,197 +1490,552 @@ def dropna(self, **kwargs): expressions.ComputedExpression( 'dropna', lambda df: df.dropna(**kwargs), [self._expr], - preserves_partition_by=partitionings.Singleton(), - requires_partition_by=partitionings.Nothing())) - - items = iteritems = frame_base.wont_implement_method('non-lazy') + preserves_partition_by=partitionings.Arbitrary(), + requires_partition_by=partitionings.Arbitrary())) + + isnull = isna = frame_base._elementwise_method('isna', base=pd.Series) + notnull = notna = frame_base._elementwise_method('notna', base=pd.Series) + + items = frame_base.wont_implement_method( + pd.Series, 'items', reason="non-deferred-result") + iteritems = frame_base.wont_implement_method( + pd.Series, 'iteritems', reason="non-deferred-result") + tolist = frame_base.wont_implement_method( + pd.Series, 'tolist', reason="non-deferred-result") + to_numpy = frame_base.wont_implement_method( + pd.Series, 'to_numpy', reason="non-deferred-result") + to_string = frame_base.wont_implement_method( + pd.Series, 'to_string', reason="non-deferred-result") + + def _wrap_in_df(self): + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'wrap_in_df', + lambda s: pd.DataFrame(s), + [self._expr], + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Arbitrary(), + )) - isin = frame_base._elementwise_method('isin') + @frame_base.with_docs_from(pd.Series) + @frame_base.args_to_kwargs(pd.Series) + @frame_base.populate_defaults(pd.Series) + @frame_base.maybe_inplace + def duplicated(self, keep): + """Only ``keep=False`` and ``keep="any"`` are supported. Other values of + ``keep`` make this an order-sensitive operation. Note ``keep="any"`` is + a Beam-specific option that guarantees only one duplicate will be kept, but + unlike ``"first"`` and ``"last"`` it makes no guarantees about _which_ + duplicate element is kept.""" + # Re-use the DataFrame based duplicated, extract the series back out + df = self._wrap_in_df() + + return df.duplicated(keep=keep)[df.columns[0]] + + @frame_base.with_docs_from(pd.Series) + @frame_base.args_to_kwargs(pd.Series) + @frame_base.populate_defaults(pd.Series) + @frame_base.maybe_inplace + def drop_duplicates(self, keep): + """Only ``keep=False`` and ``keep="any"`` are supported. Other values of + ``keep`` make this an order-sensitive operation.
Note ``keep="any"`` is + a Beam-specific option that guarantees only one duplicate will be kept, but + unlike ``"first"`` and ``"last"`` it makes no guarantees about _which_ + duplicate element is kept.""" + # Re-use the DataFrame based drop_duplicates, extract the series back out + df = self._wrap_in_df() + + return df.drop_duplicates(keep=keep)[df.columns[0]] + + @frame_base.with_docs_from(pd.Series) + @frame_base.args_to_kwargs(pd.Series) + @frame_base.populate_defaults(pd.Series) + @frame_base.maybe_inplace + def sample(self, **kwargs): + """Only ``n`` and/or ``weights`` may be specified. ``frac``, + ``random_state``, and ``replace=True`` are not yet supported. + See `BEAM-12476 `_. - isna = frame_base._elementwise_method('isna') - notnull = notna = frame_base._elementwise_method('notna') + Note that pandas will raise an error if ``n`` is larger than the length + of the dataset, while the Beam DataFrame API will simply return the full + dataset in that case.""" - to_numpy = to_string = frame_base.wont_implement_method('non-deferred value') + # Re-use the DataFrame based sample, extract the series back out + df = self._wrap_in_df() - transform = frame_base._elementwise_method( - 'transform', restrictions={'axis': 0}) + return df.sample(**kwargs)[df.columns[0]] - def aggregate(self, func, axis=0, *args, **kwargs): + @frame_base.with_docs_from(pd.Series) + @frame_base.args_to_kwargs(pd.Series) + @frame_base.populate_defaults(pd.Series) + def aggregate(self, func, axis, *args, **kwargs): + """Some aggregation methods cannot be parallelized, and computing + them will require collecting all data on a single machine.""" + if kwargs.get('skipna', False): + # Eagerly generate a proxy to make sure skipna is a valid argument + # for this aggregation method + _ = self._expr.proxy().aggregate(func, axis, *args, **kwargs) + kwargs.pop('skipna') + return self.dropna().aggregate(func, axis, *args, **kwargs) if isinstance(func, list) and len(func) > 1: - # Aggregate each column separately, then stick them all together. + # level arg is ignored for multiple aggregations + _ = kwargs.pop('level', None) + + # Aggregate with each method separately, then stick them all together. rows = [self.agg([f], *args, **kwargs) for f in func] return frame_base.DeferredFrame.wrap( expressions.ComputedExpression( 'join_aggregate', lambda *rows: pd.concat(rows), [row._expr for row in rows])) else: - # We're only handling a single column. + # We're only handling a single column. It could be 'func' or ['func'], + # which produce different results. 'func' produces a scalar, ['func'] + # produces a single element Series. base_func = func[0] if isinstance(func, list) else func - if _is_associative(base_func) and not args and not kwargs: + + if (_is_numeric(base_func) and + not pd.core.dtypes.common.is_numeric_dtype(self.dtype)): + warnings.warn( + f"Performing a numeric aggregation, {base_func!r}, on " + f"Series {self._expr.proxy().name!r} with non-numeric type " + f"{self.dtype!r}. 
This can result in runtime errors or surprising " + "results.") + + if 'level' in kwargs: + # Defer to groupby.agg for level= mode + return self.groupby( + level=kwargs.pop('level'), axis=axis).agg(func, *args, **kwargs) + + singleton_reason = None + if 'min_count' in kwargs: + # Eagerly generate a proxy to make sure min_count is a valid argument + # for this aggregation method + _ = self._expr.proxy().agg(func, axis, *args, **kwargs) + + singleton_reason = ( + "Aggregation with min_count= requires collecting all data on a " + "single node.") + + # We have specialized distributed implementations for these + if base_func in ('quantile', 'std', 'var', 'nunique', 'corr', 'cov'): + result = getattr(self, base_func)(*args, **kwargs) + if isinstance(func, list): + with expressions.allow_non_parallel_operations(True): + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'wrap_aggregate', + lambda x: pd.Series(x, index=[base_func]), [result._expr], + requires_partition_by=partitionings.Singleton(), + preserves_partition_by=partitionings.Singleton())) + else: + return result + + agg_kwargs = kwargs.copy() + if ((_is_associative(base_func) or _is_liftable_with_sum(base_func)) and + singleton_reason is None): intermediate = expressions.ComputedExpression( 'pre_aggregate', - lambda s: s.agg([base_func], *args, **kwargs), [self._expr], - requires_partition_by=partitionings.Nothing(), - preserves_partition_by=partitionings.Nothing()) + # Coerce to a Series, if the result is scalar we still want a Series + # so we can combine and do the final aggregation next. + lambda s: pd.Series(s.agg(func, *args, **kwargs)), + [self._expr], + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Singleton()) allow_nonparallel_final = True + if _is_associative(base_func): + agg_func = func + else: + agg_func = ['sum'] if isinstance(func, list) else 'sum' else: intermediate = self._expr allow_nonparallel_final = None # i.e. don't change the value + agg_func = func + singleton_reason = ( + f"Aggregation function {func!r} cannot currently be " + "parallelized. 
It requires collecting all data for " + "this Series on a single node.") with expressions.allow_non_parallel_operations(allow_nonparallel_final): return frame_base.DeferredFrame.wrap( expressions.ComputedExpression( 'aggregate', - lambda s: s.agg(func, *args, **kwargs), [intermediate], + lambda s: s.agg(agg_func, *args, **agg_kwargs), [intermediate], preserves_partition_by=partitionings.Singleton(), - requires_partition_by=partitionings.Singleton())) + requires_partition_by=partitionings.Singleton( + reason=singleton_reason))) agg = aggregate - @property + @property # type: ignore + @frame_base.with_docs_from(pd.Series) def axes(self): return [self.index] - clip = frame_base._elementwise_method('clip') - - all = frame_base._agg_method('all') - any = frame_base._agg_method('any') - min = frame_base._agg_method('min') - max = frame_base._agg_method('max') - prod = product = frame_base._agg_method('prod') - sum = frame_base._agg_method('sum') - - cummax = cummin = cumsum = cumprod = frame_base.wont_implement_method( - 'order-sensitive') - diff = frame_base.wont_implement_method('order-sensitive') - - head = tail = frame_base.wont_implement_method('order-sensitive') - - filter = frame_base._elementwise_method('filter') - - memory_usage = frame_base.wont_implement_method('non-deferred value') + clip = frame_base._elementwise_method('clip', base=pd.Series) + + all = _agg_method(pd.Series, 'all') + any = _agg_method(pd.Series, 'any') + # TODO(BEAM-12074): Document that Series.count(level=) will drop NaN's + count = _agg_method(pd.Series, 'count') + describe = _agg_method(pd.Series, 'describe') + min = _agg_method(pd.Series, 'min') + max = _agg_method(pd.Series, 'max') + prod = product = _agg_method(pd.Series, 'prod') + sum = _agg_method(pd.Series, 'sum') + mean = _agg_method(pd.Series, 'mean') + median = _agg_method(pd.Series, 'median') + sem = _agg_method(pd.Series, 'sem') + mad = _agg_method(pd.Series, 'mad') + skew = _agg_method(pd.Series, 'skew') + kurt = _agg_method(pd.Series, 'kurt') + kurtosis = _agg_method(pd.Series, 'kurtosis') + + argmax = frame_base.wont_implement_method( + pd.Series, 'argmax', reason='order-sensitive') + argmin = frame_base.wont_implement_method( + pd.Series, 'argmin', reason='order-sensitive') + cummax = frame_base.wont_implement_method( + pd.Series, 'cummax', reason='order-sensitive') + cummin = frame_base.wont_implement_method( + pd.Series, 'cummin', reason='order-sensitive') + cumprod = frame_base.wont_implement_method( + pd.Series, 'cumprod', reason='order-sensitive') + cumsum = frame_base.wont_implement_method( + pd.Series, 'cumsum', reason='order-sensitive') + diff = frame_base.wont_implement_method( + pd.Series, 'diff', reason='order-sensitive') + interpolate = frame_base.wont_implement_method( + pd.Series, 'interpolate', reason='order-sensitive') + searchsorted = frame_base.wont_implement_method( + pd.Series, 'searchsorted', reason='order-sensitive') + shift = frame_base.wont_implement_method( + pd.Series, 'shift', reason='order-sensitive') + pct_change = frame_base.wont_implement_method( + pd.Series, 'pct_change', reason='order-sensitive') + is_monotonic = frame_base.wont_implement_method( + pd.Series, 'is_monotonic', reason='order-sensitive') + is_monotonic_increasing = frame_base.wont_implement_method( + pd.Series, 'is_monotonic_increasing', reason='order-sensitive') + is_monotonic_decreasing = frame_base.wont_implement_method( + pd.Series, 'is_monotonic_decreasing', reason='order-sensitive') + asof = frame_base.wont_implement_method( + pd.Series, 'asof', 
reason='order-sensitive') + first_valid_index = frame_base.wont_implement_method( + pd.Series, 'first_valid_index', reason='order-sensitive') + last_valid_index = frame_base.wont_implement_method( + pd.Series, 'last_valid_index', reason='order-sensitive') + autocorr = frame_base.wont_implement_method( + pd.Series, 'autocorr', reason='order-sensitive') + iat = property( + frame_base.wont_implement_method( + pd.Series, 'iat', reason='order-sensitive')) + + head = frame_base.wont_implement_method( + pd.Series, 'head', explanation=_PEEK_METHOD_EXPLANATION) + tail = frame_base.wont_implement_method( + pd.Series, 'tail', explanation=_PEEK_METHOD_EXPLANATION) + + filter = frame_base._elementwise_method('filter', base=pd.Series) + + memory_usage = frame_base.wont_implement_method( + pd.Series, 'memory_usage', reason="non-deferred-result") + nbytes = frame_base.wont_implement_method( + pd.Series, 'nbytes', reason="non-deferred-result") + to_list = frame_base.wont_implement_method( + pd.Series, 'to_list', reason="non-deferred-result") + + factorize = frame_base.wont_implement_method( + pd.Series, 'factorize', reason="non-deferred-columns") # In Series __contains__ checks the index - __contains__ = frame_base.wont_implement_method('non-deferred value') + __contains__ = frame_base.wont_implement_method( + pd.Series, '__contains__', reason="non-deferred-result") + @frame_base.with_docs_from(pd.Series) @frame_base.args_to_kwargs(pd.Series) @frame_base.populate_defaults(pd.Series) def nlargest(self, keep, **kwargs): + """Only ``keep=False`` and ``keep="any"`` are supported. Other values of + ``keep`` make this an order-sensitive operation. Note ``keep="any"`` is + a Beam-specific option that guarantees only one duplicate will be kept, but + unlike ``"first"`` and ``"last"`` it makes no guarantees about _which_ + duplicate element is kept.""" # TODO(robertwb): Document 'any' option. # TODO(robertwb): Consider (conditionally) defaulting to 'any' if no # explicit keep parameter is requested. if keep == 'any': keep = 'first' elif keep != 'all': - raise frame_base.WontImplementError('order-sensitive') + raise frame_base.WontImplementError( + f"nlargest(keep={keep!r}) is not supported because it is " + "order sensitive. Only keep=\"all\" is supported.", + reason="order-sensitive") kwargs['keep'] = keep per_partition = expressions.ComputedExpression( 'nlargest-per-partition', lambda df: df.nlargest(**kwargs), [self._expr], - preserves_partition_by=partitionings.Singleton(), - requires_partition_by=partitionings.Nothing()) + preserves_partition_by=partitionings.Arbitrary(), + requires_partition_by=partitionings.Arbitrary()) with expressions.allow_non_parallel_operations(True): return frame_base.DeferredFrame.wrap( expressions.ComputedExpression( 'nlargest', lambda df: df.nlargest(**kwargs), [per_partition], - preserves_partition_by=partitionings.Singleton(), + preserves_partition_by=partitionings.Arbitrary(), requires_partition_by=partitionings.Singleton())) + @frame_base.with_docs_from(pd.Series) @frame_base.args_to_kwargs(pd.Series) @frame_base.populate_defaults(pd.Series) def nsmallest(self, keep, **kwargs): + """Only ``keep=False`` and ``keep="any"`` are supported. Other values of + ``keep`` make this an order-sensitive operation. 
Note ``keep="any"`` is + a Beam-specific option that guarantees only one duplicate will be kept, but + unlike ``"first"`` and ``"last"`` it makes no guarantees about _which_ + duplicate element is kept.""" if keep == 'any': keep = 'first' elif keep != 'all': - raise frame_base.WontImplementError('order-sensitive') + raise frame_base.WontImplementError( + f"nsmallest(keep={keep!r}) is not supported because it is " + "order sensitive. Only keep=\"all\" is supported.", + reason="order-sensitive") kwargs['keep'] = keep per_partition = expressions.ComputedExpression( 'nsmallest-per-partition', lambda df: df.nsmallest(**kwargs), [self._expr], - preserves_partition_by=partitionings.Singleton(), - requires_partition_by=partitionings.Nothing()) + preserves_partition_by=partitionings.Arbitrary(), + requires_partition_by=partitionings.Arbitrary()) with expressions.allow_non_parallel_operations(True): return frame_base.DeferredFrame.wrap( expressions.ComputedExpression( 'nsmallest', lambda df: df.nsmallest(**kwargs), [per_partition], - preserves_partition_by=partitionings.Singleton(), + preserves_partition_by=partitionings.Arbitrary(), requires_partition_by=partitionings.Singleton())) - plot = property(frame_base.wont_implement_method('plot')) - pop = frame_base.wont_implement_method('non-lazy') + @property # type: ignore + @frame_base.with_docs_from(pd.Series) + def is_unique(self): + def set_index(s): + s = s[:] + s.index = s + return s + + self_index = expressions.ComputedExpression( + 'set_index', + set_index, [self._expr], + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Singleton()) - rename_axis = frame_base._elementwise_method('rename_axis') + is_unique_distributed = expressions.ComputedExpression( + 'is_unique_distributed', + lambda s: pd.Series(s.is_unique), [self_index], + requires_partition_by=partitionings.Index(), + preserves_partition_by=partitionings.Singleton()) - @frame_base.args_to_kwargs(pd.Series) - @frame_base.populate_defaults(pd.Series) - @frame_base.maybe_inplace - def replace(self, limit, **kwargs): - if limit is None: - requires_partition_by = partitionings.Nothing() - else: - requires_partition_by = partitionings.Singleton() - return frame_base.DeferredFrame.wrap( - expressions.ComputedExpression( - 'replace', - lambda df: df.replace(limit=limit, **kwargs), [self._expr], - preserves_partition_by=partitionings.Singleton(), - requires_partition_by=requires_partition_by)) + with expressions.allow_non_parallel_operations(): + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'combine', + lambda s: s.all(), [is_unique_distributed], + requires_partition_by=partitionings.Singleton(), + preserves_partition_by=partitionings.Singleton())) - round = frame_base._elementwise_method('round') + plot = frame_base.wont_implement_method( + pd.Series, 'plot', reason="plotting-tools") + pop = frame_base.wont_implement_method( + pd.Series, 'pop', reason="non-deferred-result") - searchsorted = frame_base.wont_implement_method('order-sensitive') + rename_axis = frame_base._elementwise_method('rename_axis', base=pd.Series) - shift = frame_base.wont_implement_method('order-sensitive') + round = frame_base._elementwise_method('round', base=pd.Series) - take = frame_base.wont_implement_method('deprecated') + take = frame_base.wont_implement_method( + pd.Series, 'take', reason='deprecated') - to_dict = frame_base.wont_implement_method('non-deferred') + to_dict = frame_base.wont_implement_method( + pd.Series, 'to_dict', 
reason="non-deferred-result") - to_frame = frame_base._elementwise_method('to_frame') + to_frame = frame_base._elementwise_method('to_frame', base=pd.Series) + @frame_base.with_docs_from(pd.Series) def unique(self, as_series=False): + """unique is not supported by default because it produces a + non-deferred result: an :class:`~numpy.ndarray`. You can use the + Beam-specific argument ``unique(as_series=True)`` to get the result as + a :class:`DeferredSeries`""" + if not as_series: raise frame_base.WontImplementError( - 'pass as_series=True to get the result as a (deferred) Series') + "unique() is not supported by default because it produces a " + "non-deferred result: a numpy array. You can use the Beam-specific " + "argument unique(as_series=True) to get the result as a " + "DeferredSeries", + reason="non-deferred-result") return frame_base.DeferredFrame.wrap( expressions.ComputedExpression( 'unique', lambda df: pd.Series(df.unique()), [self._expr], preserves_partition_by=partitionings.Singleton(), - requires_partition_by=partitionings.Singleton())) + requires_partition_by=partitionings.Singleton( + reason="unique() cannot currently be parallelized."))) + @frame_base.with_docs_from(pd.Series) def update(self, other): self._expr = expressions.ComputedExpression( 'update', lambda df, other: df.update(other) or df, [self._expr, other._expr], - preserves_partition_by=partitionings.Singleton(), + preserves_partition_by=partitionings.Arbitrary(), requires_partition_by=partitionings.Index()) - unstack = frame_base.wont_implement_method('non-deferred column values') + unstack = frame_base.wont_implement_method( + pd.Series, 'unstack', reason='non-deferred-columns') - values = property(frame_base.wont_implement_method('non-deferred')) + @frame_base.with_docs_from(pd.Series) + def value_counts( + self, + sort=False, + normalize=False, + ascending=False, + bins=None, + dropna=True): + """``sort`` is ``False`` by default, and ``sort=True`` is not supported + because it imposes an ordering on the dataset which likely will not be + preserved. + + When ``bin`` is specified this operation is not parallelizable. 
See + [BEAM-12441](https://issues.apache.org/jira/browse/BEAM-12441) tracking the + possible addition of a distributed implementation.""" + + if sort: + raise frame_base.WontImplementError( + "value_counts(sort=True) is not supported because it imposes an " + "ordering on the dataset which likely will not be preserved.", + reason="order-sensitive") - view = frame_base.wont_implement_method('memory sharing semantics') + if bins is not None: + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'value_counts', + lambda s: s.value_counts( + normalize=normalize, bins=bins, dropna=dropna)[self._expr], + requires_partition_by=partitionings.Singleton( + reason=( + "value_counts with bin specified requires collecting " + "the entire dataset to identify the range.")), + preserves_partition_by=partitionings.Singleton(), + )) - @property + if dropna: + column = self.dropna() + else: + column = self + + result = column.groupby(column).size() + + # groupby.size() names the index, which we don't need + result.index.name = None + + if normalize: + return result / column.length() + else: + return result + + values = property( + frame_base.wont_implement_method( + pd.Series, 'values', reason="non-deferred-result")) + + view = frame_base.wont_implement_method( + pd.Series, + 'view', + explanation=( + "because it relies on memory-sharing semantics that are " + "not compatible with the Beam model.")) + + @property # type: ignore + @frame_base.with_docs_from(pd.Series) def str(self): return _DeferredStringMethods(self._expr) + @property # type: ignore + @frame_base.with_docs_from(pd.Series) + def cat(self): + return _DeferredCategoricalMethods(self._expr) + + @property # type: ignore + @frame_base.with_docs_from(pd.Series) + def dt(self): + return _DeferredDatetimeMethods(self._expr) + + @frame_base.with_docs_from(pd.Series) + def mode(self, *args, **kwargs): + """mode is not currently parallelizable. An approximate, + parallelizable implementation of mode may be added in the future + (`BEAM-12181 `_).""" + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'mode', + lambda df: df.mode(*args, **kwargs), + [self._expr], + #TODO(BEAM-12181): Can we add an approximate implementation? + requires_partition_by=partitionings.Singleton( + reason=( + "mode cannot currently be parallelized. See " + "BEAM-12181 tracking the possble addition of " + "an approximate, parallelizable implementation of mode.")), + preserves_partition_by=partitionings.Singleton())) + + apply = frame_base._elementwise_method('apply', base=pd.Series) + map = frame_base._elementwise_method('map', base=pd.Series) + # TODO(BEAM-11636): Implement transform using type inference to determine the + # proxy + #transform = frame_base._elementwise_method('transform', base=pd.Series) -for name in ['apply', 'map', 'transform']: - setattr(DeferredSeries, name, frame_base._elementwise_method(name)) + @frame_base.with_docs_from(pd.Series) + @frame_base.args_to_kwargs(pd.Series) + @frame_base.populate_defaults(pd.Series) + def repeat(self, repeats, axis): + """``repeats`` must be an ``int`` or a :class:`DeferredSeries`. 
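The deferred ``value_counts`` above is assembled from order-insensitive pieces: an optional ``dropna``, a group-by-own-values, and a per-group ``size``. A plain-pandas sketch (not from the patch) of the same decomposition:

```python
import pandas as pd

s = pd.Series(['a', 'b', 'a', 'a'])

# Group the Series by its own values and take each group's size; this matches
# value_counts(sort=False) up to the order of the rows.
counts = s.groupby(s).size()   # a -> 3, b -> 1
frac = counts / len(s)         # the normalize=True variant
```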
Lists are + not supported because they make this operation order-sensitive.""" + if isinstance(repeats, int): + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'repeat', + lambda series: series.repeat(repeats), [self._expr], + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Arbitrary())) + elif isinstance(repeats, frame_base.DeferredBase): + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'repeat', + lambda series, + repeats_series: series.repeat(repeats_series), + [self._expr, repeats._expr], + requires_partition_by=partitionings.Index(), + preserves_partition_by=partitionings.Arbitrary())) + elif isinstance(repeats, list): + raise frame_base.WontImplementError( + "repeat(repeats=) repeats must be an int or a DeferredSeries. " + "Lists are not supported because they make this operation sensitive " + "to the order of the data.", + reason="order-sensitive") + else: + raise TypeError( + "repeat(repeats=) value must be an int or a " + f"DeferredSeries (encountered {type(repeats)}).") @populate_not_implemented(pd.DataFrame) @frame_base.DeferredFrame._register_for(pd.DataFrame) class DeferredDataFrame(DeferredDataFrameOrSeries): - @property - def T(self): - return self.transpose() + def __repr__(self): + return ( + f'DeferredDataFrame(columns={list(self.columns)}, ' + f'{self._render_indexes()})') - @property + @property # type: ignore + @frame_base.with_docs_from(pd.DataFrame) def columns(self): return self._expr.proxy().columns @@ -631,8 +2050,12 @@ def set_columns(df): expressions.ComputedExpression( 'set_columns', set_columns, [self._expr], - requires_partition_by=partitionings.Nothing(), - preserves_partition_by=partitionings.Singleton())) + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Arbitrary())) + + @frame_base.with_docs_from(pd.DataFrame) + def keys(self): + return self.columns def __getattr__(self, name): # Column attribute access. @@ -659,7 +2082,8 @@ def __getitem__(self, key): elif _is_integer_slice(key): # This depends on the contents of the index. raise frame_base.WontImplementError( - 'Use iloc or loc with integer slices.') + "Integer slices are not supported as they are ambiguous. Please " + "use iloc or loc with integer slices.") else: return self.loc[key] @@ -691,21 +2115,40 @@ def __setitem__(self, key, value): else: raise NotImplementedError(key) + @frame_base.with_docs_from(pd.DataFrame) @frame_base.args_to_kwargs(pd.DataFrame) @frame_base.populate_defaults(pd.DataFrame) def align(self, other, join, axis, copy, level, method, **kwargs): + """Aligning per level is not yet supported. Only the default, + ``level=None``, is allowed. + + Filling NaN values via ``method`` is not supported, because it is + `order-sensitive + `_. Only the + default, ``method=None``, is allowed. + + ``copy=False`` is not supported because its behavior (whether or not it is + an inplace operation) depends on the data.""" if not copy: - raise frame_base.WontImplementError('align(copy=False)') + raise frame_base.WontImplementError( + "align(copy=False) is not supported because it might be an inplace " + "operation depending on the data. Please prefer the default " + "align(copy=True).") if method is not None: - raise frame_base.WontImplementError('order-sensitive') + raise frame_base.WontImplementError( + f"align(method={method!r}) is not supported because it is " + "order sensitive. 
Only align(method=None) is supported.", + reason="order-sensitive") if kwargs: raise NotImplementedError('align(%s)' % ', '.join(kwargs.keys())) if level is not None: # Could probably get by partitioning on the used levels. - requires_partition_by = partitionings.Singleton() + requires_partition_by = partitionings.Singleton(reason=( + f"align(level={level}) is not currently parallelizable. Only " + "align(level=None) can be parallelized.")) elif axis in ('columns', 1): - requires_partition_by = partitionings.Nothing() + requires_partition_by = partitionings.Arbitrary() else: requires_partition_by = partitionings.Index() return frame_base.DeferredFrame.wrap( @@ -714,103 +2157,250 @@ def align(self, other, join, axis, copy, level, method, **kwargs): lambda df, other: df.align(other, join=join, axis=axis), [self._expr, other._expr], requires_partition_by=requires_partition_by, - preserves_partition_by=partitionings.Index())) + preserves_partition_by=partitionings.Arbitrary())) + + @frame_base.with_docs_from(pd.DataFrame) + @frame_base.args_to_kwargs(pd.DataFrame) + @frame_base.populate_defaults(pd.DataFrame) + def append(self, other, ignore_index, verify_integrity, sort, **kwargs): + """``ignore_index=True`` is not supported, because it requires generating an + order-sensitive index.""" + if not isinstance(other, DeferredDataFrame): + raise frame_base.WontImplementError( + "append() only accepts DeferredDataFrame instances, received " + + str(type(other))) + if ignore_index: + raise frame_base.WontImplementError( + "append(ignore_index=True) is order sensitive because it requires " + "generating a new index based on the order of the data.", + reason="order-sensitive") + + if verify_integrity: + # We can verify the index is non-unique within index partitioned data. + requires = partitionings.Index() + else: + requires = partitionings.Arbitrary() + + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'append', + lambda s, other: s.append(other, sort=sort, + verify_integrity=verify_integrity, + **kwargs), + [self._expr, other._expr], + requires_partition_by=requires, + preserves_partition_by=partitionings.Arbitrary() + ) + ) + + # If column name exists this is a simple project, otherwise it is a constant + # (default_value) + @frame_base.with_docs_from(pd.DataFrame) + def get(self, key, default_value=None): + if key in self.columns: + return self[key] + else: + return default_value + @frame_base.with_docs_from(pd.DataFrame) @frame_base.args_to_kwargs(pd.DataFrame) @frame_base.populate_defaults(pd.DataFrame) @frame_base.maybe_inplace def set_index(self, keys, **kwargs): + """``keys`` must be a ``str`` or ``List[str]``. 
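To see why the deferred ``append`` above rejects ``ignore_index=True``, compare the two behaviors in plain pandas (``pd.concat`` is used here only as a stand-in for the concatenation ``append`` performs; the frames are made up):

```python
import pandas as pd

a = pd.DataFrame({'x': [1, 2]}, index=[10, 11])
b = pd.DataFrame({'x': [3]}, index=[12])

# Keeping the existing labels gives the same result regardless of the order
# in which the pieces arrive:
pd.concat([a, b])                      # index [10, 11, 12]
pd.concat([b, a])                      # same rows, same labels

# ignore_index=True renumbers rows 0..n-1 by arrival order, which is exactly
# the information a distributed dataset does not have.
pd.concat([a, b], ignore_index=True)
```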
Passing an Index or Series + is not yet supported (`BEAM-11711 + <https://issues.apache.org/jira/browse/BEAM-11711>`_).""" if isinstance(keys, str): keys = [keys] - if not set(keys).issubset(self._expr.proxy().columns): - raise NotImplementedError(keys) + + if any(isinstance(k, (_DeferredIndex, frame_base.DeferredFrame)) + for k in keys): + raise NotImplementedError("set_index with Index or Series instances is " + "not yet supported (BEAM-11711).") + return frame_base.DeferredFrame.wrap( expressions.ComputedExpression( 'set_index', lambda df: df.set_index(keys, **kwargs), [self._expr], - requires_partition_by=partitionings.Nothing(), - preserves_partition_by=partitionings.Nothing())) - - @property - def loc(self): - return _DeferredLoc(self) - - @property - def iloc(self): - return _DeferredILoc(self) + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Singleton())) - @property + @property # type: ignore + @frame_base.with_docs_from(pd.DataFrame) def axes(self): return (self.index, self.columns) - @property + @property # type: ignore + @frame_base.with_docs_from(pd.DataFrame) def dtypes(self): return self._expr.proxy().dtypes + @frame_base.with_docs_from(pd.DataFrame) def assign(self, **kwargs): + """``value`` must be a ``callable`` or :class:`DeferredSeries`. Other types + make this operation order-sensitive.""" for name, value in kwargs.items(): if not callable(value) and not isinstance(value, DeferredSeries): - raise frame_base.WontImplementError("Unsupported value for new " - f"column '{name}': '{value}'. " - "Only callables and Series " - "instances are supported.") - return frame_base._elementwise_method('assign')(self, **kwargs) - + raise frame_base.WontImplementError( + f"Unsupported value for new column '{name}': '{value}'. Only " + "callables and DeferredSeries instances are supported. 
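A plain-pandas sketch (not from the patch) of why ``assign`` above only accepts callables and Series: both resolve against the frame or its index, so the outcome does not depend on row order, unlike a positional list:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3]}, index=['a', 'b', 'c'])

# Deterministic regardless of how rows are ordered or partitioned:
df.assign(y=lambda d: d.x * 10)
df.assign(y=pd.Series([10, 20, 30], index=['a', 'b', 'c']))

# A plain list is matched to rows by position, so the result would depend on
# row order; that is the case rejected above.
# df.assign(y=[10, 20, 30])
```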
Other types " + "make this operation sensitive to the order of the data", + reason="order-sensitive") + return self._elementwise( + lambda df, *args, **kwargs: df.assign(*args, **kwargs), + 'assign', + other_kwargs=kwargs) + + @frame_base.with_docs_from(pd.DataFrame) @frame_base.args_to_kwargs(pd.DataFrame) @frame_base.populate_defaults(pd.DataFrame) def explode(self, column, ignore_index): # ignoring the index will not preserve it - preserves = (partitionings.Nothing() if ignore_index - else partitionings.Singleton()) + preserves = (partitionings.Singleton() if ignore_index + else partitionings.Index()) return frame_base.DeferredFrame.wrap( expressions.ComputedExpression( 'explode', lambda df: df.explode(column, ignore_index), [self._expr], preserves_partition_by=preserves, - requires_partition_by=partitionings.Nothing())) + requires_partition_by=partitionings.Arbitrary())) + + @frame_base.with_docs_from(pd.DataFrame) + @frame_base.args_to_kwargs(pd.DataFrame) + @frame_base.populate_defaults(pd.DataFrame) + def insert(self, value, **kwargs): + """``value`` cannot be a ``List`` because aligning it with this + DeferredDataFrame is order-sensitive.""" + if isinstance(value, list): + raise frame_base.WontImplementError( + "insert(value=list) is not supported because it joins the input " + "list to the deferred DataFrame based on the order of the data.", + reason="order-sensitive") + + if isinstance(value, pd.core.generic.NDFrame): + value = frame_base.DeferredFrame.wrap( + expressions.ConstantExpression(value)) + if isinstance(value, frame_base.DeferredFrame): + def func_zip(df, value): + df = df.copy() + df.insert(value=value, **kwargs) + return df + + inserted = frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'insert', + func_zip, + [self._expr, value._expr], + requires_partition_by=partitionings.Index(), + preserves_partition_by=partitionings.Arbitrary())) + else: + def func_elementwise(df): + df = df.copy() + df.insert(value=value, **kwargs) + return df + inserted = frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'insert', + func_elementwise, + [self._expr], + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Arbitrary())) + + self._expr = inserted._expr + + @staticmethod + @frame_base.with_docs_from(pd.DataFrame) + def from_dict(*args, **kwargs): + return frame_base.DeferredFrame.wrap( + expressions.ConstantExpression(pd.DataFrame.from_dict(*args, **kwargs))) + + @staticmethod + @frame_base.with_docs_from(pd.DataFrame) + def from_records(*args, **kwargs): + return frame_base.DeferredFrame.wrap( + expressions.ConstantExpression(pd.DataFrame.from_records(*args, + **kwargs))) + @frame_base.with_docs_from(pd.DataFrame) @frame_base.args_to_kwargs(pd.DataFrame) @frame_base.populate_defaults(pd.DataFrame) @frame_base.maybe_inplace - def drop(self, labels, axis, index, columns, errors, **kwargs): - if labels is not None: - if index is not None or columns is not None: - raise ValueError("Cannot specify both 'labels' and 'index'/'columns'") - if axis in (0, 'index'): - index = labels - columns = None - elif axis in (1, 'columns'): - index = None - columns = labels - else: - raise ValueError("axis must be one of (0, 1, 'index', 'columns'), " - "got '%s'" % axis) + def duplicated(self, keep, subset): + """Only ``keep=False`` and ``keep="any"`` are supported. Other values of + ``keep`` make this an order-sensitive operation. 
Note ``keep="any"`` is + a Beam-specific option that guarantees only one duplicate will be kept, but + unlike ``"first"`` and ``"last"`` it makes no guarantees about _which_ + duplicate element is kept.""" + # TODO(BEAM-12074): Document keep="any" + if keep == 'any': + keep = 'first' + elif keep is not False: + raise frame_base.WontImplementError( + f"duplicated(keep={keep!r}) is not supported because it is " + "sensitive to the order of the data. Only keep=False and " + "keep=\"any\" are supported.", + reason="order-sensitive") - if columns is not None: - # Compute the proxy based on just the columns that are dropped. - proxy = self._expr.proxy().drop(columns=columns, errors=errors) - else: - proxy = self._expr.proxy() + by = subset or list(self.columns) - if index is not None and errors == 'raise': - # In order to raise an error about missing index values, we'll - # need to collect the entire dataframe. - requires = partitionings.Singleton() - else: - requires = partitionings.Nothing() + # Workaround a bug where groupby.apply() that returns a single-element + # Series moves index label to column + return self.groupby(by).apply( + lambda df: pd.DataFrame(df.duplicated(keep=keep, subset=subset), + columns=[None]))[None] - return frame_base.DeferredFrame.wrap(expressions.ComputedExpression( - 'drop', - lambda df: df.drop(axis=axis, index=index, columns=columns, - errors=errors, **kwargs), - [self._expr], - proxy=proxy, - requires_partition_by=requires)) + @frame_base.with_docs_from(pd.DataFrame) + @frame_base.args_to_kwargs(pd.DataFrame) + @frame_base.populate_defaults(pd.DataFrame) + @frame_base.maybe_inplace + def drop_duplicates(self, keep, subset, ignore_index): + """Only ``keep=False`` and ``keep="any"`` are supported. Other values of + ``keep`` make this an order-sensitive operation. Note ``keep="any"`` is + a Beam-specific option that guarantees only one duplicate will be kept, but + unlike ``"first"`` and ``"last"`` it makes no guarantees about _which_ + duplicate element is kept.""" + # TODO(BEAM-12074): Document keep="any" + if keep == 'any': + keep = 'first' + elif keep is not False: + raise frame_base.WontImplementError( + f"drop_duplicates(keep={keep!r}) is not supported because it is " + "sensitive to the order of the data. Only keep=False and " + "keep=\"any\" are supported.", + reason="order-sensitive") + + if ignore_index is not False: + raise frame_base.WontImplementError( + "drop_duplicates(ignore_index=False) is not supported because it " + "requires generating a new index that is sensitive to the order of " + "the data.", + reason="order-sensitive") + + by = subset or list(self.columns) + + return self.groupby(by).apply( + lambda df: df.drop_duplicates(keep=keep, subset=subset)).droplevel(by) + + @frame_base.with_docs_from(pd.DataFrame) + @frame_base.args_to_kwargs(pd.DataFrame) + @frame_base.populate_defaults(pd.DataFrame) + def aggregate(self, func, axis, *args, **kwargs): + # We have specialized implementations for these. + if func in ('quantile',): + return getattr(self, func)(*args, axis=axis, **kwargs) + + # Maps to a property, args are ignored + if func in ('size',): + return getattr(self, func) + + # We also have specialized distributed implementations for these. They only + # support axis=0 (implicitly) though. 
axis=1 should fall through + if func in ('corr', 'cov') and axis in (0, 'index'): + return getattr(self, func)(*args, **kwargs) - def aggregate(self, func, axis=0, *args, **kwargs): if axis is None: # Aggregate across all elements by first aggregating across columns, # then across rows. @@ -823,9 +2413,9 @@ def aggregate(self, func, axis=0, *args, **kwargs): 'aggregate', lambda df: df.agg(func, axis=1, *args, **kwargs), [self._expr], - requires_partition_by=partitionings.Nothing())) - elif len(self._expr.proxy().columns) == 0 or args or kwargs: - # For these corner cases, just colocate everything. + requires_partition_by=partitionings.Arbitrary())) + elif len(self._expr.proxy().columns) == 0: + # For this corner case, just colocate everything. return frame_base.DeferredFrame.wrap( expressions.ComputedExpression( 'aggregate', @@ -833,23 +2423,71 @@ def aggregate(self, func, axis=0, *args, **kwargs): [self._expr], requires_partition_by=partitionings.Singleton())) else: - # In the general case, compute the aggregation of each column separately, - # then recombine. + # In the general case, we will compute the aggregation of each column + # separately, then recombine. + + # First, handle any kwargs that cause a projection, by eagerly generating + # the proxy, and only including the columns that are in the output. + PROJECT_KWARGS = ('numeric_only', 'bool_only', 'include', 'exclude') + proxy = self._expr.proxy().agg(func, axis, *args, **kwargs) + + if isinstance(proxy, pd.DataFrame): + projected = self[list(proxy.columns)] + elif isinstance(proxy, pd.Series): + projected = self[list(proxy.index)] + else: + projected = self + + nonnumeric_columns = [name for (name, dtype) in projected.dtypes.items() + if not + pd.core.dtypes.common.is_numeric_dtype(dtype)] + + if _is_numeric(func) and nonnumeric_columns: + if 'numeric_only' in kwargs and kwargs['numeric_only'] is False: + # User has opted in to execution with non-numeric columns, they + # will accept runtime errors + pass + else: + raise frame_base.WontImplementError( + f"Numeric aggregation ({func!r}) on a DataFrame containing " + f"non-numeric columns ({*nonnumeric_columns,!r} is not " + "supported, unless `numeric_only=` is specified.\n" + "Use `numeric_only=True` to only aggregate over numeric " + "columns.\nUse `numeric_only=False` to aggregate over all " + "columns. Note this is not recommended, as it could result in " + "execution time errors.") + + for key in PROJECT_KWARGS: + if key in kwargs: + kwargs.pop(key) + if not isinstance(func, dict): - col_names = list(self._expr.proxy().columns) - func = {col: func for col in col_names} + col_names = list(projected._expr.proxy().columns) + func_by_col = {col: func for col in col_names} else: + func_by_col = func col_names = list(func.keys()) aggregated_cols = [] + has_lists = any(isinstance(f, list) for f in func_by_col.values()) for col in col_names: - funcs = func[col] - if not isinstance(funcs, list): + funcs = func_by_col[col] + if has_lists and not isinstance(funcs, list): + # If any of the columns do multiple aggregations, they all must use + # "list" style output funcs = [funcs] - aggregated_cols.append(self[col].agg(funcs, *args, **kwargs)) + aggregated_cols.append(projected[col].agg(funcs, *args, **kwargs)) # The final shape is different depending on whether any of the columns # were aggregated by a list of aggregators. 
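The numeric-only check above exists because pandas' behavior on mixed-dtype frames is surprising. A plain-pandas sketch (not from the patch):

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2], 'label': ['a', 'b']})

# Restricting to numeric columns keeps the aggregation well defined:
df.sum(numeric_only=True)     # x    3

# Aggregating every column makes pandas improvise for 'label' (string
# concatenation here; errors for other dtypes), hence the explicit opt-in.
df.sum(numeric_only=False)    # x: 3, label: 'ab'
```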
with expressions.allow_non_parallel_operations(): - if any(isinstance(funcs, list) for funcs in func.values()): + if isinstance(proxy, pd.Series): + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'join_aggregate', + lambda *cols: pd.Series( + {col: value for col, value in zip(col_names, cols)}), + [col._expr for col in aggregated_cols], + requires_partition_by=partitionings.Singleton())) + elif isinstance(proxy, pd.DataFrame): return frame_base.DeferredFrame.wrap( expressions.ComputedExpression( 'join_aggregate', @@ -858,43 +2496,58 @@ def aggregate(self, func, axis=0, *args, **kwargs): [col._expr for col in aggregated_cols], requires_partition_by=partitionings.Singleton())) else: - return frame_base.DeferredFrame.wrap( - expressions.ComputedExpression( - 'join_aggregate', - lambda *cols: pd.Series( - {col: value[0] for col, value in zip(col_names, cols)}), - [col._expr for col in aggregated_cols], - requires_partition_by=partitionings.Singleton(), - proxy=self._expr.proxy().agg(func, *args, **kwargs))) + raise AssertionError("Unexpected proxy type for " + f"DataFrame.aggregate!: proxy={proxy!r}, " + f"type(proxy)={type(proxy)!r}") agg = aggregate - applymap = frame_base._elementwise_method('applymap') + applymap = frame_base._elementwise_method('applymap', base=pd.DataFrame) + add_prefix = frame_base._elementwise_method('add_prefix', base=pd.DataFrame) + add_suffix = frame_base._elementwise_method('add_suffix', base=pd.DataFrame) - memory_usage = frame_base.wont_implement_method('non-deferred value') - info = frame_base.wont_implement_method('non-deferred value') + memory_usage = frame_base.wont_implement_method( + pd.DataFrame, 'memory_usage', reason="non-deferred-result") + info = frame_base.wont_implement_method( + pd.DataFrame, 'info', reason="non-deferred-result") - all = frame_base._agg_method('all') - any = frame_base._agg_method('any') - clip = frame_base._elementwise_method( - 'clip', restrictions={'axis': lambda axis: axis in (0, 'index')}) + @frame_base.with_docs_from(pd.DataFrame) + @frame_base.args_to_kwargs(pd.DataFrame) + @frame_base.populate_defaults(pd.DataFrame) + @frame_base.maybe_inplace + def clip(self, axis, **kwargs): + """``lower`` and ``upper`` must be :class:`DeferredSeries` instances, or + constants. Array-like arguments are not supported because they are + order-sensitive.""" + + if any(isinstance(kwargs.get(arg, None), frame_base.DeferredFrame) + for arg in ('upper', 'lower')) and axis not in (0, 'index'): + raise frame_base.WontImplementError( + "axis must be 'index' when upper and/or lower are a DeferredFrame", + reason='order-sensitive') + + return frame_base._elementwise_method('clip', base=pd.DataFrame)(self, + axis=axis, + **kwargs) + @frame_base.with_docs_from(pd.DataFrame) @frame_base.args_to_kwargs(pd.DataFrame) @frame_base.populate_defaults(pd.DataFrame) def corr(self, method, min_periods): + """Only ``method="pearson"`` can be parallelized. Other methods require + collecting all data on a single worker (see + https://s.apache.org/dataframe-non-parallel-operations for details). + """ if method == 'pearson': proxy = self._expr.proxy().corr() columns = list(proxy.columns) args = [] arg_indices = [] - for ix, col1 in enumerate(columns): - for col2 in columns[ix+1:]: - arg_indices.append((col1, col2)) - # Note that this set may be different for each pair. 
- no_na = self.loc[self[col1].notna() & self[col2].notna()] - args.append( - no_na[col1]._corr_aligned(no_na[col2], min_periods)) + for col1, col2 in itertools.combinations(columns, 2): + arg_indices.append((col1, col2)) + args.append(self[col1].corr(self[col2], method=method, + min_periods=min_periods)) def fill_matrix(*args): data = collections.defaultdict(dict) for col in columns: @@ -912,13 +2565,17 @@ def fill_matrix(*args): proxy=proxy)) else: + reason = (f"Encountered corr(method={method!r}) which cannot be " + "parallelized. Only corr(method='pearson') is currently " + "parallelizable.") return frame_base.DeferredFrame.wrap( expressions.ComputedExpression( 'corr', lambda df: df.corr(method=method, min_periods=min_periods), [self._expr], - requires_partition_by=partitionings.Singleton())) + requires_partition_by=partitionings.Singleton(reason=reason))) + @frame_base.with_docs_from(pd.DataFrame) @frame_base.args_to_kwargs(pd.DataFrame) @frame_base.populate_defaults(pd.DataFrame) def cov(self, min_periods, ddof): @@ -950,23 +2607,31 @@ def fill_matrix(*args): requires_partition_by=partitionings.Singleton(), proxy=proxy)) + @frame_base.with_docs_from(pd.DataFrame) @frame_base.args_to_kwargs(pd.DataFrame) @frame_base.populate_defaults(pd.DataFrame) def corrwith(self, other, axis, drop, method): - if axis not in (0, 'index'): - raise NotImplementedError('corrwith(axis=%r)' % axis) + if axis in (1, 'columns'): + return self._elementwise( + lambda df, other: df.corrwith(other, axis=axis, drop=drop, + method=method), + 'corrwith', + other_args=(other,)) + + if not isinstance(other, frame_base.DeferredFrame): other = frame_base.DeferredFrame.wrap( expressions.ConstantExpression(other)) if isinstance(other, DeferredSeries): - proxy = self._expr.proxy().corrwith(other._expr.proxy(), method=method) + proxy = self._expr.proxy().corrwith(other._expr.proxy(), axis=axis, + drop=drop, method=method) self, other = self.align(other, axis=0, join='inner') col_names = proxy.index other_cols = [other] * len(col_names) elif isinstance(other, DeferredDataFrame): proxy = self._expr.proxy().corrwith( - other._expr.proxy(), method=method, drop=drop) + other._expr.proxy(), axis=axis, method=method, drop=drop) self, other = self.align(other, axis=0, join='inner') col_names = list( set(self.columns) @@ -975,7 +2640,9 @@ def corrwith(self, other, axis, drop, method): other_cols = [other[col_name] for col_name in col_names] else: # Raise the right error. - self._expr.proxy().corrwith(other._expr.proxy()) + self._expr.proxy().corrwith(other._expr.proxy(), axis=axis, drop=drop, + method=method) + # Just in case something else becomes valid. 
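The ``method='pearson'`` branch above only ever computes one correlation per unordered column pair; the full matrix is then recovered from symmetry and a diagonal of ones. A plain-pandas sketch (not from the patch) of that decomposition:

```python
import itertools
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0, 3.0],
                   'b': [2.0, 1.0, 4.0],
                   'c': [0.5, 0.2, 0.9]})

# One scalar per pair; fill_matrix then mirrors them into place.
pairs = {(c1, c2): df[c1].corr(df[c2])
         for c1, c2 in itertools.combinations(df.columns, 2)}
```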
raise NotImplementedError('corrwith(%s)' % type(other._expr.proxy)) @@ -999,12 +2666,110 @@ def fill_dataframe(*args): requires_partition_by=partitionings.Singleton(), proxy=proxy)) + cummax = frame_base.wont_implement_method(pd.DataFrame, 'cummax', + reason='order-sensitive') + cummin = frame_base.wont_implement_method(pd.DataFrame, 'cummin', + reason='order-sensitive') + cumprod = frame_base.wont_implement_method(pd.DataFrame, 'cumprod', + reason='order-sensitive') + cumsum = frame_base.wont_implement_method(pd.DataFrame, 'cumsum', + reason='order-sensitive') + # TODO(BEAM-12071): Consider adding an order-insensitive implementation for + # diff that relies on the index + diff = frame_base.wont_implement_method(pd.DataFrame, 'diff', + reason='order-sensitive') + interpolate = frame_base.wont_implement_method(pd.DataFrame, 'interpolate', + reason='order-sensitive') + + pct_change = frame_base.wont_implement_method( + pd.DataFrame, 'pct_change', reason='order-sensitive') + asof = frame_base.wont_implement_method( + pd.DataFrame, 'asof', reason='order-sensitive') + first_valid_index = frame_base.wont_implement_method( + pd.DataFrame, 'first_valid_index', reason='order-sensitive') + last_valid_index = frame_base.wont_implement_method( + pd.DataFrame, 'last_valid_index', reason='order-sensitive') + iat = property(frame_base.wont_implement_method( + pd.DataFrame, 'iat', reason='order-sensitive')) + + lookup = frame_base.wont_implement_method( + pd.DataFrame, 'lookup', reason='deprecated') + + head = frame_base.wont_implement_method(pd.DataFrame, 'head', + explanation=_PEEK_METHOD_EXPLANATION) + tail = frame_base.wont_implement_method(pd.DataFrame, 'tail', + explanation=_PEEK_METHOD_EXPLANATION) + + @frame_base.with_docs_from(pd.DataFrame) + @frame_base.args_to_kwargs(pd.DataFrame) + @frame_base.populate_defaults(pd.DataFrame) + def sample(self, n, frac, replace, weights, random_state, axis): + """When ``axis='index'``, only ``n`` and/or ``weights`` may be specified. + ``frac``, ``random_state``, and ``replace=True`` are not yet supported. + See `BEAM-12476 `_. + + Note that pandas will raise an error if ``n`` is larger than the length + of the dataset, while the Beam DataFrame API will simply return the full + dataset in that case. + sample is fully supported for axis='columns'.""" + if axis in (1, 'columns'): + # Sampling on axis=columns just means projecting random columns + # Eagerly generate proxy to determine the set of columns at construction + # time + proxy = self._expr.proxy().sample(n=n, frac=frac, replace=replace, + weights=weights, + random_state=random_state, axis=axis) + # Then do the projection + return self[list(proxy.columns)] + + # axis='index' + if frac is not None or random_state is not None or replace: + raise NotImplementedError( + f"When axis={axis!r}, only n and/or weights may be specified. " + "frac, random_state, and replace=True are not yet supported " + f"(got frac={frac!r}, random_state={random_state!r}, " + f"replace={replace!r}). 
See BEAM-12476.") + + if n is None: + n = 1 + + if isinstance(weights, str): + weights = self[weights] + + tmp_weight_column_name = "___Beam_DataFrame_weights___" + + if weights is None: + self_with_randomized_weights = frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'randomized_weights', + lambda df: df.assign(**{tmp_weight_column_name: + np.random.rand(len(df))}), + [self._expr], + requires_partition_by=partitionings.Index(), + preserves_partition_by=partitionings.Arbitrary())) + else: + # See "Fast Parallel Weighted Random Sampling" by Efraimidis and Spirakis + # https://www.cti.gr/images_gr/reports/99-06-02.ps + def assign_randomized_weights(df, weights): + non_zero_weights = (weights > 0) | pd.Series(dtype=bool, index=df.index) + df = df.loc[non_zero_weights] + weights = weights.loc[non_zero_weights] + random_weights = np.log(np.random.rand(len(weights))) / weights + return df.assign(**{tmp_weight_column_name: random_weights}) + self_with_randomized_weights = frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'randomized_weights', + assign_randomized_weights, + [self._expr, weights._expr], + requires_partition_by=partitionings.Index(), + preserves_partition_by=partitionings.Arbitrary())) - cummax = cummin = cumsum = cumprod = frame_base.wont_implement_method( - 'order-sensitive') - diff = frame_base.wont_implement_method('order-sensitive') + return self_with_randomized_weights.nlargest( + n=n, columns=tmp_weight_column_name, keep='any').drop( + tmp_weight_column_name, axis=1) + @frame_base.with_docs_from(pd.DataFrame) def dot(self, other): # We want to broadcast the right hand side to all partitions of the left. # This is OK, as its index must be the same size as the columns set of self, @@ -1030,46 +2795,62 @@ def __init__(self, value): 'dot', lambda left, right: left @ right.value, [self._expr, side], - requires_partition_by=partitionings.Nothing(), - preserves_partition_by=partitionings.Index(), + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Arbitrary(), proxy=proxy)) __matmul__ = dot - head = tail = frame_base.wont_implement_method('order-sensitive') + @frame_base.with_docs_from(pd.DataFrame) + def mode(self, axis=0, *args, **kwargs): + """mode with axis="columns" is not implemented because it produces + non-deferred columns. - max = frame_base._agg_method('max') - min = frame_base._agg_method('min') + mode with axis="index" is not currently parallelizable. An approximate, + parallelizable implementation of mode may be added in the future + (`BEAM-12181 `_).""" - def mode(self, axis=0, *args, **kwargs): if axis == 1 or axis == 'columns': # Number of columns is max(number mode values for each row), so we can't # determine how many there will be before looking at the data. - raise frame_base.WontImplementError('non-deferred column values') + raise frame_base.WontImplementError( + "mode(axis=columns) is not supported because it produces a variable " + "number of columns depending on the data.", + reason="non-deferred-columns") return frame_base.DeferredFrame.wrap( expressions.ComputedExpression( 'mode', lambda df: df.mode(*args, **kwargs), [self._expr], - #TODO(robertwb): Approximate? - requires_partition_by=partitionings.Singleton(), + #TODO(BEAM-12181): Can we add an approximate implementation? + requires_partition_by=partitionings.Singleton(reason=( + "mode(axis='index') cannot currently be parallelized. 
See " + "BEAM-12181 tracking the possble addition of an approximate, " + "parallelizable implementation of mode." + )), preserves_partition_by=partitionings.Singleton())) + @frame_base.with_docs_from(pd.DataFrame) @frame_base.args_to_kwargs(pd.DataFrame) @frame_base.populate_defaults(pd.DataFrame) @frame_base.maybe_inplace def dropna(self, axis, **kwargs): + """dropna with axis="columns" specified cannot be parallelized.""" # TODO(robertwb): This is a common pattern. Generalize? - if axis == 1 or axis == 'columns': - requires_partition_by = partitionings.Singleton() + if axis in (1, 'columns'): + requires_partition_by = partitionings.Singleton(reason=( + "dropna(axis=1) cannot currently be parallelized. It requires " + "checking all values in each column for NaN values, to determine " + "if that column should be dropped." + )) else: - requires_partition_by = partitionings.Nothing() + requires_partition_by = partitionings.Arbitrary() return frame_base.DeferredFrame.wrap( expressions.ComputedExpression( 'dropna', lambda df: df.dropna(axis=axis, **kwargs), [self._expr], - preserves_partition_by=partitionings.Singleton(), + preserves_partition_by=partitionings.Arbitrary(), requires_partition_by=requires_partition_by)) def _eval_or_query(self, name, expr, inplace, **kwargs): @@ -1086,8 +2867,8 @@ def _eval_or_query(self, name, expr, inplace, **kwargs): name, lambda df: getattr(df, name)(expr, **kwargs), [self._expr], - requires_partition_by=partitionings.Nothing(), - preserves_partition_by=partitionings.Singleton()) + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Arbitrary()) if inplace: self._expr = result_expr @@ -1095,21 +2876,39 @@ def _eval_or_query(self, name, expr, inplace, **kwargs): return frame_base.DeferredFrame.wrap(result_expr) + @frame_base.with_docs_from(pd.DataFrame) @frame_base.args_to_kwargs(pd.DataFrame) @frame_base.populate_defaults(pd.DataFrame) def eval(self, expr, inplace, **kwargs): + """Accessing local variables with ``@`` is not yet supported + (`BEAM-11202 `_). + + Arguments ``local_dict``, ``global_dict``, ``level``, ``target``, and + ``resolvers`` are not yet supported.""" return self._eval_or_query('eval', expr, inplace, **kwargs) + @frame_base.with_docs_from(pd.DataFrame) @frame_base.args_to_kwargs(pd.DataFrame) @frame_base.populate_defaults(pd.DataFrame) def query(self, expr, inplace, **kwargs): + """Accessing local variables with ``@`` is not yet supported + (`BEAM-11202 `_). 
+ + Arguments ``local_dict``, ``global_dict``, ``level``, ``target``, and + ``resolvers`` are not yet supported.""" return self._eval_or_query('query', expr, inplace, **kwargs) - isna = frame_base._elementwise_method('isna') - notnull = notna = frame_base._elementwise_method('notna') + isnull = isna = frame_base._elementwise_method('isna', base=pd.DataFrame) + notnull = notna = frame_base._elementwise_method('notna', base=pd.DataFrame) - items = itertuples = iterrows = iteritems = frame_base.wont_implement_method( - 'non-lazy') + items = frame_base.wont_implement_method(pd.DataFrame, 'items', + reason="non-deferred-result") + itertuples = frame_base.wont_implement_method(pd.DataFrame, 'itertuples', + reason="non-deferred-result") + iterrows = frame_base.wont_implement_method(pd.DataFrame, 'iterrows', + reason="non-deferred-result") + iteritems = frame_base.wont_implement_method(pd.DataFrame, 'iteritems', + reason="non-deferred-result") def _cols_as_temporary_index(self, cols, suffix=''): original_index_names = list(self._expr.proxy().index.names) @@ -1124,8 +2923,8 @@ def reindex(df): df.rename_axis(index=new_index_names, copy=False) .reset_index().set_index(cols), [df._expr], - preserves_partition_by=partitionings.Nothing(), - requires_partition_by=partitionings.Nothing())) + preserves_partition_by=partitionings.Singleton(), + requires_partition_by=partitionings.Arbitrary())) def revert(df): return frame_base.DeferredFrame.wrap( expressions.ComputedExpression( @@ -1134,10 +2933,11 @@ def revert(df): df.reset_index().set_index(new_index_names) .rename_axis(index=original_index_names, copy=False), [df._expr], - preserves_partition_by=partitionings.Nothing(), - requires_partition_by=partitionings.Nothing())) + preserves_partition_by=partitionings.Singleton(), + requires_partition_by=partitionings.Arbitrary())) return reindex, revert + @frame_base.with_docs_from(pd.DataFrame) @frame_base.args_to_kwargs(pd.DataFrame) @frame_base.populate_defaults(pd.DataFrame) def join(self, other, on, **kwargs): @@ -1169,9 +2969,10 @@ def fill_placeholders(values): lambda df, *deferred_others: df.join( fill_placeholders(deferred_others), **kwargs), [self._expr] + other_exprs, - preserves_partition_by=partitionings.Singleton(), + preserves_partition_by=partitionings.Arbitrary(), requires_partition_by=partitionings.Index())) + @frame_base.with_docs_from(pd.DataFrame) @frame_base.args_to_kwargs(pd.DataFrame) @frame_base.populate_defaults(pd.DataFrame) def merge( @@ -1182,7 +2983,17 @@ def merge( right_on, left_index, right_index, + suffixes, **kwargs): + """merge is not parallelizable unless ``left_index`` or ``right_index`` is + ``True`, because it requires generating an entirely new unique index. + See notes on :meth:`DeferredDataFrame.reset_index`. It is recommended to + move the join key for one of your columns to the index to avoid this issue. + For an example see the enrich pipeline in + :mod:`apache_beam.examples.dataframe.taxiride`. + + ``how="cross"`` is not yet supported. + """ self_proxy = self._expr.proxy() right_proxy = right._expr.proxy() # Validate with a pandas call. 
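A small sketch of the recommendation in the ``merge`` docstring above (plain pandas here, but the same pattern applies to the deferred frames; ``left``, ``right``, and ``'key'`` are made-up names):

```python
import pandas as pd

left = pd.DataFrame({'key': [1, 2], 'a': ['x', 'y']})
right = pd.DataFrame({'key': [1, 2], 'b': [10, 20]})

# Joining on the index keeps rows with equal keys co-located under Index
# partitioning, so the deferred merge can stay parallel; merge(on='key')
# would need a brand-new global index afterwards, which is the non-parallel
# step the docstring warns about.
joined = left.set_index('key').merge(
    right.set_index('key'), left_index=True, right_index=True)
```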
@@ -1194,15 +3005,17 @@ def merge( left_index=left_index, right_index=right_index, **kwargs) + if kwargs.get('how', None) == 'cross': + raise NotImplementedError("cross join is not yet implemented (BEAM-9547)") if not any([on, left_on, right_on, left_index, right_index]): - on = [col for col in self_proxy.columns() if col in right_proxy.columns()] + on = [col for col in self_proxy.columns if col in right_proxy.columns] if not left_on: left_on = on - elif not isinstance(left_on, list): + if left_on and not isinstance(left_on, list): left_on = [left_on] if not right_on: right_on = on - elif not isinstance(right_on, list): + if right_on and not isinstance(right_on, list): right_on = [right_on] if left_index: @@ -1215,13 +3028,27 @@ def merge( else: indexed_right = right.set_index(right_on, drop=False) + if left_on and right_on: + common_cols = set(left_on).intersection(right_on) + if len(common_cols): + # When merging on the same column name from both dfs, we need to make + # sure only one df has the column. Otherwise we end up with + # two duplicate columns, one with lsuffix and one with rsuffix. + # It's safe to drop from either because the data has already been duped + # to the index. + indexed_right = indexed_right.drop(columns=common_cols) + + merged = frame_base.DeferredFrame.wrap( expressions.ComputedExpression( 'merge', - lambda left, right: left.merge( - right, left_index=True, right_index=True, **kwargs), + lambda left, right: left.merge(right, + left_index=True, + right_index=True, + suffixes=suffixes, + **kwargs), [indexed_left._expr, indexed_right._expr], - preserves_partition_by=partitionings.Singleton(), + preserves_partition_by=partitionings.Arbitrary(), requires_partition_by=partitionings.Index())) if left_index or right_index: @@ -1229,20 +3056,29 @@ def merge( else: return merged.reset_index(drop=True) + @frame_base.with_docs_from(pd.DataFrame) @frame_base.args_to_kwargs(pd.DataFrame) @frame_base.populate_defaults(pd.DataFrame) def nlargest(self, keep, **kwargs): + """Only ``keep=False`` and ``keep="any"`` are supported. Other values of + ``keep`` make this an order-sensitive operation. Note ``keep="any"`` is + a Beam-specific option that guarantees only one duplicate will be kept, but + unlike ``"first"`` and ``"last"`` it makes no guarantees about _which_ + duplicate element is kept.""" if keep == 'any': keep = 'first' elif keep != 'all': - raise frame_base.WontImplementError('order-sensitive') + raise frame_base.WontImplementError( + f"nlargest(keep={keep!r}) is not supported because it is " + "order sensitive. Only keep=\"all\" is supported.", + reason="order-sensitive") kwargs['keep'] = keep per_partition = expressions.ComputedExpression( 'nlargest-per-partition', lambda df: df.nlargest(**kwargs), [self._expr], - preserves_partition_by=partitionings.Singleton(), - requires_partition_by=partitionings.Nothing()) + preserves_partition_by=partitionings.Arbitrary(), + requires_partition_by=partitionings.Arbitrary()) with expressions.allow_non_parallel_operations(True): return frame_base.DeferredFrame.wrap( expressions.ComputedExpression( @@ -1252,20 +3088,29 @@ def nlargest(self, keep, **kwargs): preserves_partition_by=partitionings.Singleton(), requires_partition_by=partitionings.Singleton())) + @frame_base.with_docs_from(pd.DataFrame) @frame_base.args_to_kwargs(pd.DataFrame) @frame_base.populate_defaults(pd.DataFrame) def nsmallest(self, keep, **kwargs): + """Only ``keep=False`` and ``keep="any"`` are supported. 
Other values of + ``keep`` make this an order-sensitive operation. Note ``keep="any"`` is + a Beam-specific option that guarantees only one duplicate will be kept, but + unlike ``"first"`` and ``"last"`` it makes no guarantees about _which_ + duplicate element is kept.""" if keep == 'any': keep = 'first' elif keep != 'all': - raise frame_base.WontImplementError('order-sensitive') + raise frame_base.WontImplementError( + f"nsmallest(keep={keep!r}) is not supported because it is " + "order sensitive. Only keep=\"all\" is supported.", + reason="order-sensitive") kwargs['keep'] = keep per_partition = expressions.ComputedExpression( 'nsmallest-per-partition', lambda df: df.nsmallest(**kwargs), [self._expr], - preserves_partition_by=partitionings.Singleton(), - requires_partition_by=partitionings.Nothing()) + preserves_partition_by=partitionings.Arbitrary(), + requires_partition_by=partitionings.Arbitrary()) with expressions.allow_non_parallel_operations(True): return frame_base.DeferredFrame.wrap( expressions.ComputedExpression( @@ -1275,51 +3120,66 @@ def nsmallest(self, keep, **kwargs): preserves_partition_by=partitionings.Singleton(), requires_partition_by=partitionings.Singleton())) - @frame_base.args_to_kwargs(pd.DataFrame) - def nunique(self, **kwargs): - if kwargs.get('axis', None) in (1, 'columns'): - requires_partition_by = partitionings.Nothing() - else: - requires_partition_by = partitionings.Singleton() - return frame_base.DeferredFrame.wrap( - expressions.ComputedExpression( - 'nunique', - lambda df: df.nunique(**kwargs), - [self._expr], - preserves_partition_by=partitionings.Singleton(), - requires_partition_by=requires_partition_by)) - - plot = property(frame_base.wont_implement_method('plot')) + plot = frame_base.wont_implement_method(pd.DataFrame, 'plot', + reason="plotting-tools") + @frame_base.with_docs_from(pd.DataFrame) def pop(self, item): result = self[item] + self._expr = expressions.ComputedExpression( 'popped', - lambda df: (df.pop(item), df)[-1], + lambda df: df.drop(columns=[item]), [self._expr], - preserves_partition_by=partitionings.Singleton(), - requires_partition_by=partitionings.Nothing()) + preserves_partition_by=partitionings.Arbitrary(), + requires_partition_by=partitionings.Arbitrary()) return result - prod = product = frame_base._agg_method('prod') - + @frame_base.with_docs_from(pd.DataFrame) @frame_base.args_to_kwargs(pd.DataFrame) @frame_base.populate_defaults(pd.DataFrame) - def quantile(self, axis, **kwargs): - if axis == 1 or axis == 'columns': - raise frame_base.WontImplementError('non-deferred column values') + def quantile(self, q, axis, **kwargs): + """``quantile(axis="index")`` is not parallelizable. See + `BEAM-12167 `_ tracking + the possible addition of an approximate, parallelizable implementation of + quantile. + + When using quantile with ``axis="columns"`` only a single ``q`` value can be + specified.""" + if axis in (1, 'columns'): + if isinstance(q, list): + raise frame_base.WontImplementError( + "quantile(axis=columns) with multiple q values is not supported " + "because it transposes the input DataFrame. Note computing " + "an individual quantile across columns (e.g. " + f"df.quantile(q={q[0]!r}, axis={axis!r}) is supported.", + reason="non-deferred-columns") + else: + requires = partitionings.Arbitrary() + else: # axis='index' + # TODO(BEAM-12167): Provide an option for approximate distributed + # quantiles + requires = partitionings.Singleton(reason=( + "Computing quantiles across index cannot currently be parallelized. 
" + "See BEAM-12167 tracking the possible addition of an approximate, " + "parallelizable implementation of quantile." + )) + return frame_base.DeferredFrame.wrap( expressions.ComputedExpression( 'quantile', - lambda df: df.quantile(axis=axis, **kwargs), + lambda df: df.quantile(q=q, axis=axis, **kwargs), [self._expr], - #TODO(robertwb): Approximate quantiles? - requires_partition_by=partitionings.Singleton(), + requires_partition_by=requires, preserves_partition_by=partitionings.Singleton())) + @frame_base.with_docs_from(pd.DataFrame) @frame_base.args_to_kwargs(pd.DataFrame) @frame_base.maybe_inplace def rename(self, **kwargs): + """rename is not parallelizable when ``axis="index"`` and + ``errors="raise"``. It requires collecting all data on a single + node in order to detect if one of the index values is missing.""" rename_index = ( 'index' in kwargs or kwargs.get('axis', None) in (0, 'index') @@ -1331,15 +3191,22 @@ def rename(self, **kwargs): if rename_index: # Technically, it's still partitioned by index, but it's no longer # partitioned by the hash of the index. - preserves_partition_by = partitionings.Nothing() - else: preserves_partition_by = partitionings.Singleton() + else: + preserves_partition_by = partitionings.Index() if kwargs.get('errors', None) == 'raise' and rename_index: - # Renaming index with checking requires global index. - requires_partition_by = partitionings.Singleton() + # TODO: We could do this in parallel by creating a ConstantExpression + # with a series created from the mapper dict. Then Index() partitioning + # would co-locate the necessary index values and we could raise + # individually within each partition. Execution time errors are + # discouraged anyway so probably not worth the effort. + requires_partition_by = partitionings.Singleton(reason=( + "rename(errors='raise', axis='index') requires collecting all " + "data on a single node in order to detect missing index values." + )) else: - requires_partition_by = partitionings.Nothing() + requires_partition_by = partitionings.Arbitrary() proxy = None if rename_index: @@ -1363,100 +3230,121 @@ def rename(self, **kwargs): preserves_partition_by=preserves_partition_by, requires_partition_by=requires_partition_by)) - rename_axis = frame_base._elementwise_method('rename_axis') - - @frame_base.args_to_kwargs(pd.DataFrame) - @frame_base.populate_defaults(pd.DataFrame) - @frame_base.maybe_inplace - def replace(self, limit, **kwargs): - if limit is None: - requires_partition_by = partitionings.Nothing() - else: - requires_partition_by = partitionings.Singleton() - return frame_base.DeferredFrame.wrap( - expressions.ComputedExpression( - 'replace', - lambda df: df.replace(limit=limit, **kwargs), - [self._expr], - preserves_partition_by=partitionings.Singleton(), - requires_partition_by=requires_partition_by)) + rename_axis = frame_base._elementwise_method('rename_axis', base=pd.DataFrame) + @frame_base.with_docs_from(pd.DataFrame) @frame_base.args_to_kwargs(pd.DataFrame) @frame_base.populate_defaults(pd.DataFrame) - @frame_base.maybe_inplace - def reset_index(self, level=None, **kwargs): - if level is not None and not isinstance(level, (tuple, list)): - level = [level] - if level is None or len(level) == self._expr.proxy().index.nlevels: - # TODO: Could do distributed re-index with offsets. 
- requires_partition_by = partitionings.Singleton() - else: - requires_partition_by = partitionings.Nothing() - return frame_base.DeferredFrame.wrap( - expressions.ComputedExpression( - 'reset_index', - lambda df: df.reset_index(level=level, **kwargs), - [self._expr], - preserves_partition_by=partitionings.Nothing(), - requires_partition_by=requires_partition_by)) + def round(self, decimals, *args, **kwargs): - round = frame_base._elementwise_method('round') - select_dtypes = frame_base._elementwise_method('select_dtypes') + if isinstance(decimals, frame_base.DeferredFrame): + # Disallow passing a deferred Series in, our current partitioning model + # prevents us from using it correctly. + raise NotImplementedError("Passing a deferred series to round() is not " + "supported, please use a concrete pd.Series " + "instance or a dictionary") - @frame_base.args_to_kwargs(pd.DataFrame) - @frame_base.populate_defaults(pd.DataFrame) - def shift(self, axis, **kwargs): - if 'freq' in kwargs: - raise frame_base.WontImplementError('data-dependent') - if axis == 1 or axis == 'columns': - requires_partition_by = partitionings.Nothing() - else: - requires_partition_by = partitionings.Singleton() return frame_base.DeferredFrame.wrap( expressions.ComputedExpression( - 'shift', - lambda df: df.shift(axis=axis, **kwargs), + 'round', + lambda df: df.round(decimals, *args, **kwargs), [self._expr], - preserves_partition_by=partitionings.Singleton(), - requires_partition_by=requires_partition_by)) + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Index() + ) + ) - @property - def shape(self): - raise frame_base.WontImplementError('scalar value') + select_dtypes = frame_base._elementwise_method('select_dtypes', + base=pd.DataFrame) + @frame_base.with_docs_from(pd.DataFrame) @frame_base.args_to_kwargs(pd.DataFrame) @frame_base.populate_defaults(pd.DataFrame) - @frame_base.maybe_inplace - def sort_values(self, axis, **kwargs): - if axis == 1 or axis == 'columns': - requires_partition_by = partitionings.Nothing() + def shift(self, axis, freq, **kwargs): + """shift with ``axis="index" is only supported with ``freq`` specified and + ``fill_value`` undefined. Other configurations make this operation + order-sensitive.""" + if axis in (1, 'columns'): + preserves = partitionings.Arbitrary() + proxy = None else: - requires_partition_by = partitionings.Singleton() + if freq is None or 'fill_value' in kwargs: + fill_value = kwargs.get('fill_value', 'NOT SET') + raise frame_base.WontImplementError( + f"shift(axis={axis!r}) is only supported with freq defined, and " + f"fill_value undefined (got freq={freq!r}," + f"fill_value={fill_value!r}). Other configurations are sensitive " + "to the order of the data because they require populating shifted " + "rows with `fill_value`.", + reason="order-sensitive") + # proxy generation fails in pandas <1.2 + # Seems due to https://github.com/pandas-dev/pandas/issues/14811, + # bug with shift on empty indexes. + # Fortunately the proxy should be identical to the input. + proxy = self._expr.proxy().copy() + + # index is modified, so no partitioning is preserved. 
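A plain-pandas sketch (not from the patch) of the distinction the deferred ``shift`` above draws between the ``freq`` form and the ``fill_value`` form:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3]},
                  index=pd.date_range('2021-01-01', periods=3, freq='D'))

# With freq, shift moves the index labels rather than sliding values past
# their neighbors, so each row is computed from that row alone:
df.shift(freq='D')   # same values, every timestamp moved forward one day

# Without freq, values slide relative to the index and rows must be filled
# from neighboring rows, which is the order-sensitive case rejected above.
df.shift(1)          # first row becomes NaN
```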
+ preserves = partitionings.Singleton() + return frame_base.DeferredFrame.wrap( expressions.ComputedExpression( - 'sort_values', - lambda df: df.sort_values(axis=axis, **kwargs), + 'shift', + lambda df: df.shift(axis=axis, freq=freq, **kwargs), [self._expr], - preserves_partition_by=partitionings.Singleton(), - requires_partition_by=requires_partition_by)) - - stack = frame_base._elementwise_method('stack') - - sum = frame_base._agg_method('sum') - - take = frame_base.wont_implement_method('deprecated') - - to_records = to_dict = to_numpy = to_string = ( - frame_base.wont_implement_method('non-deferred value')) - - to_sparse = to_string # frame_base._elementwise_method('to_sparse') - - transform = frame_base._elementwise_method( - 'transform', restrictions={'axis': 0}) - - transpose = frame_base.wont_implement_method('non-deferred column values') - + proxy=proxy, + preserves_partition_by=preserves, + requires_partition_by=partitionings.Arbitrary())) + + shape = property(frame_base.wont_implement_method( + pd.DataFrame, 'shape', reason="non-deferred-result")) + + stack = frame_base._elementwise_method('stack', base=pd.DataFrame) + + all = _agg_method(pd.DataFrame, 'all') + any = _agg_method(pd.DataFrame, 'any') + count = _agg_method(pd.DataFrame, 'count') + describe = _agg_method(pd.DataFrame, 'describe') + max = _agg_method(pd.DataFrame, 'max') + min = _agg_method(pd.DataFrame, 'min') + prod = product = _agg_method(pd.DataFrame, 'prod') + sum = _agg_method(pd.DataFrame, 'sum') + mean = _agg_method(pd.DataFrame, 'mean') + median = _agg_method(pd.DataFrame, 'median') + nunique = _agg_method(pd.DataFrame, 'nunique') + std = _agg_method(pd.DataFrame, 'std') + var = _agg_method(pd.DataFrame, 'var') + sem = _agg_method(pd.DataFrame, 'sem') + mad = _agg_method(pd.DataFrame, 'mad') + skew = _agg_method(pd.DataFrame, 'skew') + kurt = _agg_method(pd.DataFrame, 'kurt') + kurtosis = _agg_method(pd.DataFrame, 'kurtosis') + + take = frame_base.wont_implement_method(pd.DataFrame, 'take', + reason='deprecated') + + to_records = frame_base.wont_implement_method(pd.DataFrame, 'to_records', + reason="non-deferred-result") + to_dict = frame_base.wont_implement_method(pd.DataFrame, 'to_dict', + reason="non-deferred-result") + to_numpy = frame_base.wont_implement_method(pd.DataFrame, 'to_numpy', + reason="non-deferred-result") + to_string = frame_base.wont_implement_method(pd.DataFrame, 'to_string', + reason="non-deferred-result") + + to_sparse = frame_base.wont_implement_method(pd.DataFrame, 'to_sparse', + reason="non-deferred-result") + + transpose = frame_base.wont_implement_method( + pd.DataFrame, 'transpose', reason='non-deferred-columns') + T = property(frame_base.wont_implement_method( + pd.DataFrame, 'T', reason='non-deferred-columns')) + + + @frame_base.with_docs_from(pd.DataFrame) def unstack(self, *args, **kwargs): + """unstack cannot be used on :class:`DeferredDataFrame` instances with + multiple index levels, because the columns in the output depend on the + data.""" if self._expr.proxy().index.nlevels == 1: return frame_base.DeferredFrame.wrap( expressions.ComputedExpression( @@ -1465,196 +3353,578 @@ def unstack(self, *args, **kwargs): [self._expr], requires_partition_by=partitionings.Index())) else: - raise frame_base.WontImplementError('non-deferred column values') + raise frame_base.WontImplementError( + "unstack() is not supported on DataFrames with a multiple indexes, " + "because the columns in the output depend on the input data.", + reason="non-deferred-columns") update = 
frame_base._proxy_method( 'update', inplace=True, + base=pd.DataFrame, requires_partition_by=partitionings.Index(), - preserves_partition_by=partitionings.Index()) + preserves_partition_by=partitionings.Arbitrary()) + + values = property(frame_base.wont_implement_method( + pd.DataFrame, 'values', reason="non-deferred-result")) + + style = property(frame_base.wont_implement_method( + pd.DataFrame, 'style', reason="non-deferred-result")) - values = property(frame_base.wont_implement_method('non-deferred value')) + @frame_base.with_docs_from(pd.DataFrame) + @frame_base.args_to_kwargs(pd.DataFrame) + @frame_base.populate_defaults(pd.DataFrame) + def melt(self, ignore_index, **kwargs): + """``ignore_index=True`` is not supported, because it requires generating an + order-sensitive index.""" + if ignore_index: + raise frame_base.WontImplementError( + "melt(ignore_index=True) is order sensitive because it requires " + "generating a new index based on the order of the data.", + reason="order-sensitive") + + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'melt', + lambda df: df.melt(ignore_index=False, **kwargs), [self._expr], + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Singleton())) + + @frame_base.with_docs_from(pd.DataFrame) + def value_counts(self, subset=None, sort=False, normalize=False, + ascending=False): + """``sort`` is ``False`` by default, and ``sort=True`` is not supported + because it imposes an ordering on the dataset which likely will not be + preserved.""" + + if sort: + raise frame_base.WontImplementError( + "value_counts(sort=True) is not supported because it imposes an " + "ordering on the dataset which likely will not be preserved.", + reason="order-sensitive") + columns = subset or list(self.columns) + result = self.groupby(columns).size() + + if normalize: + return result/self.dropna().length() + else: + return result for io_func in dir(io): if io_func.startswith('to_'): setattr(DeferredDataFrame, io_func, getattr(io, io_func)) + setattr(DeferredSeries, io_func, getattr(io, io_func)) for meth in ('filter', ): - setattr(DeferredDataFrame, meth, frame_base._elementwise_method(meth)) + setattr(DeferredDataFrame, meth, + frame_base._elementwise_method(meth, base=pd.DataFrame)) -@populate_not_implemented(pd.core.groupby.generic.DataFrameGroupBy) +@populate_not_implemented(DataFrameGroupBy) class DeferredGroupBy(frame_base.DeferredFrame): - def __init__(self, expr, kwargs): + def __init__(self, expr, kwargs, + ungrouped: expressions.Expression[pd.core.generic.NDFrame], + ungrouped_with_index: expressions.Expression[pd.core.generic.NDFrame], # pylint: disable=line-too-long + grouping_columns, + grouping_indexes, + projection=None): + """This object represents the result of:: + + ungrouped.groupby(level=[grouping_indexes + grouping_columns], + **kwargs)[projection] + + :param expr: An expression to compute a pandas GroupBy object. Convenient + for unliftable aggregations. + :param ungrouped: An expression to compute the DataFrame pre-grouping, the + (Multi)Index contains only the grouping columns/indexes. + :param ungrouped_with_index: Same as ungrouped, except the index includes + all of the original indexes as well as any grouping columns. This is + important for operations that expose the original index, e.g. .apply(), + but we only use it when necessary to avoid unnessary data transfer and + GBKs. + :param grouping_columns: list of column labels that were in the original + groupby(..) ``by`` parameter. 
Only relevant for grouped DataFrames. + :param grouping_indexes: list of index names (or index level numbers) to be + grouped. + :param kwargs: Keywords args passed to the original groupby(..) call.""" super(DeferredGroupBy, self).__init__(expr) + self._ungrouped = ungrouped + self._ungrouped_with_index = ungrouped_with_index + self._projection = projection + self._grouping_columns = grouping_columns + self._grouping_indexes = grouping_indexes self._kwargs = kwargs - def agg(self, fn): + if (self._kwargs.get('dropna', True) is False and + self._ungrouped.proxy().index.nlevels > 1): + raise NotImplementedError( + "dropna=False does not work as intended in the Beam DataFrame API " + "when grouping on multiple columns or indexes (See BEAM-12495).") + + def __getattr__(self, name): + return DeferredGroupBy( + expressions.ComputedExpression( + 'groupby_project', + lambda gb: getattr(gb, name), [self._expr], + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Arbitrary()), + self._kwargs, + self._ungrouped, + self._ungrouped_with_index, + self._grouping_columns, + self._grouping_indexes, + projection=name) + + def __getitem__(self, name): + return DeferredGroupBy( + expressions.ComputedExpression( + 'groupby_project', + lambda gb: gb[name], [self._expr], + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Arbitrary()), + self._kwargs, + self._ungrouped, + self._ungrouped_with_index, + self._grouping_columns, + self._grouping_indexes, + projection=name) + + @frame_base.with_docs_from(DataFrameGroupBy) + def agg(self, fn, *args, **kwargs): + if _is_associative(fn): + return _liftable_agg(fn)(self, *args, **kwargs) + elif _is_liftable_with_sum(fn): + return _liftable_agg(fn, postagg_meth='sum')(self, *args, **kwargs) + elif _is_unliftable(fn): + return _unliftable_agg(fn)(self, *args, **kwargs) + elif callable(fn): + return DeferredDataFrame( + expressions.ComputedExpression( + 'agg', + lambda gb: gb.agg(fn, *args, **kwargs), [self._expr], + requires_partition_by=partitionings.Index(), + preserves_partition_by=partitionings.Singleton())) + else: + raise NotImplementedError(f"GroupBy.agg(func={fn!r})") + + @property + def ndim(self): + return self._expr.proxy().ndim + + @frame_base.with_docs_from(DataFrameGroupBy) + def apply(self, func, *args, **kwargs): + """Note that ``func`` will be called once during pipeline construction time + with an empty pandas object, so take care if ``func`` has a side effect. + + When called with an empty pandas object, ``func`` is expected to return an + object of the same type as what will be returned when the pipeline is + processing actual data. If the result is a pandas object it should have the + same type and name (for a Series) or column types and names (for + a DataFrame) as the actual results.""" + project = _maybe_project_func(self._projection) + grouping_indexes = self._grouping_indexes + grouping_columns = self._grouping_columns + + # Unfortunately pandas does not execute func to determine the right proxy. + # We run user func on a proxy here to detect the return type and generate + # the proxy. 
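+    # The proxy is empty but has the same columns and index structure as each
+    # per-group frame, so func only ever sees an empty input of the expected
+    # shape at construction time.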
+ fn_input = project(self._ungrouped_with_index.proxy().reset_index( + grouping_columns, drop=True)) + result = func(fn_input) + if isinstance(result, pd.core.generic.NDFrame): + if result.index is fn_input.index: + proxy = result + else: + proxy = result[:0] + + def index_to_arrays(index): + return [index.get_level_values(level) + for level in range(index.nlevels)] + + # The final result will have the grouped indexes + the indexes from the + # result + proxy.index = pd.MultiIndex.from_arrays( + index_to_arrays(self._ungrouped.proxy().index) + + index_to_arrays(proxy.index), + names=self._ungrouped.proxy().index.names + proxy.index.names) + else: + # The user fn returns some non-pandas type. The expected result is a + # Series where each element is the result of one user fn call. + dtype = pd.Series([result]).dtype + proxy = pd.Series([], dtype=dtype, index=self._ungrouped.proxy().index) + + + def do_partition_apply(df): + # Remove columns from index, we only needed them there for partitioning + df = df.reset_index(grouping_columns, drop=True) + + gb = df.groupby(level=grouping_indexes or None, + by=grouping_columns or None) + + gb = project(gb) + return gb.apply(func, *args, **kwargs) + + return DeferredDataFrame( + expressions.ComputedExpression( + 'apply', + do_partition_apply, + [self._ungrouped_with_index], + proxy=proxy, + requires_partition_by=partitionings.Index(grouping_indexes + + grouping_columns), + preserves_partition_by=partitionings.Index(grouping_indexes))) + + + @frame_base.with_docs_from(DataFrameGroupBy) + def transform(self, fn, *args, **kwargs): + """Note that ``func`` will be called once during pipeline construction time + with an empty pandas object, so take care if ``func`` has a side effect. + + When called with an empty pandas object, ``func`` is expected to return an + object of the same type as what will be returned when the pipeline is + processing actual data. The result should have the same type and name (for + a Series) or column types and names (for a DataFrame) as the actual + results.""" if not callable(fn): - # TODO: Add support for strings in (UN)LIFTABLE_AGGREGATIONS. Test by - # running doctests for pandas.core.groupby.generic - raise NotImplementedError('GroupBy.agg currently only supports callable ' - 'arguments') + raise NotImplementedError( + "String functions are not yet supported in transform.") + + if self._grouping_columns and not self._projection: + grouping_columns = self._grouping_columns + def fn_wrapper(x, *args, **kwargs): + x = x.droplevel(grouping_columns) + return fn(x, *args, **kwargs) + else: + fn_wrapper = fn + + project = _maybe_project_func(self._projection) + + # pandas cannot execute fn to determine the right proxy. + # We run user fn on a proxy here to detect the return type and generate the + # proxy. + result = fn_wrapper(project(self._ungrouped_with_index.proxy())) + parent_frame = self._ungrouped.args()[0].proxy() + if isinstance(result, pd.core.generic.NDFrame): + proxy = result[:0] + + else: + # The user fn returns some non-pandas type. The expected result is a + # Series where each element is the result of one user fn call. 
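+      # Wrap the sample result in a Series so pandas can infer the element
+      # dtype for the proxy.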
+ dtype = pd.Series([result]).dtype + proxy = pd.Series([], dtype=dtype, name=project(parent_frame).name) + + if not isinstance(self._projection, list): + proxy.name = self._projection + + # The final result will have the original indexes + proxy.index = parent_frame.index + + levels = self._grouping_indexes + self._grouping_columns + return DeferredDataFrame( expressions.ComputedExpression( - 'agg', - lambda df: df.agg(fn), [self._expr], - requires_partition_by=partitionings.Index(), - preserves_partition_by=partitionings.Singleton())) + 'transform', + lambda df: project(df.groupby(level=levels)).transform( + fn_wrapper, + *args, + **kwargs).droplevel(self._grouping_columns), + [self._ungrouped_with_index], + proxy=proxy, + requires_partition_by=partitionings.Index(levels), + preserves_partition_by=partitionings.Index(self._grouping_indexes))) - aggregate = agg + @frame_base.with_docs_from(DataFrameGroupBy) + def filter(self, func=None, dropna=True): + if func is None or not callable(func): + raise TypeError("func must be specified and it must be callable") - first = last = head = tail = frame_base.wont_implement_method( - 'order sensitive') + def apply_fn(df): + if func(df): + return df + elif not dropna: + result = df.copy() + result.iloc[:, :] = np.nan + return result + else: + return df.iloc[:0] + + return self.apply(apply_fn).droplevel(self._grouping_columns) + + @property # type: ignore + @frame_base.with_docs_from(DataFrameGroupBy) + def dtypes(self): + grouping_columns = self._grouping_columns + return self.apply(lambda df: df.drop(grouping_columns, axis=1).dtypes) + + fillna = frame_base.wont_implement_method( + DataFrameGroupBy, 'fillna', explanation=( + "df.fillna() should be used instead. Only method=None is supported " + "because other methods are order-sensitive. df.groupby(..).fillna() " + "without a method is equivalent to df.fillna().")) + + ffill = frame_base.wont_implement_method(DataFrameGroupBy, 'ffill', + reason="order-sensitive") + bfill = frame_base.wont_implement_method(DataFrameGroupBy, 'bfill', + reason="order-sensitive") + pad = frame_base.wont_implement_method(DataFrameGroupBy, 'pad', + reason="order-sensitive") + backfill = frame_base.wont_implement_method(DataFrameGroupBy, 'backfill', + reason="order-sensitive") + + aggregate = agg - # TODO(robertwb): Consider allowing this for categorical keys. 
- __len__ = frame_base.wont_implement_method('non-deferred') - __getitem__ = frame_base.not_implemented_method('__getitem__') - groups = property(frame_base.wont_implement_method('non-deferred')) + hist = frame_base.wont_implement_method(DataFrameGroupBy, 'hist', + reason="plotting-tools") + plot = frame_base.wont_implement_method(DataFrameGroupBy, 'plot', + reason="plotting-tools") + boxplot = frame_base.wont_implement_method(DataFrameGroupBy, 'boxplot', + reason="plotting-tools") + + head = frame_base.wont_implement_method( + DataFrameGroupBy, 'head', explanation=_PEEK_METHOD_EXPLANATION) + tail = frame_base.wont_implement_method( + DataFrameGroupBy, 'tail', explanation=_PEEK_METHOD_EXPLANATION) + + first = frame_base.not_implemented_method('first', base_type=DataFrameGroupBy) + last = frame_base.not_implemented_method('last', base_type=DataFrameGroupBy) + nth = frame_base.wont_implement_method( + DataFrameGroupBy, 'nth', reason='order-sensitive') + cumcount = frame_base.wont_implement_method( + DataFrameGroupBy, 'cumcount', reason='order-sensitive') + cummax = frame_base.wont_implement_method( + DataFrameGroupBy, 'cummax', reason='order-sensitive') + cummin = frame_base.wont_implement_method( + DataFrameGroupBy, 'cummin', reason='order-sensitive') + cumsum = frame_base.wont_implement_method( + DataFrameGroupBy, 'cumsum', reason='order-sensitive') + cumprod = frame_base.wont_implement_method( + DataFrameGroupBy, 'cumprod', reason='order-sensitive') + diff = frame_base.wont_implement_method(DataFrameGroupBy, 'diff', + reason='order-sensitive') + shift = frame_base.wont_implement_method(DataFrameGroupBy, 'shift', + reason='order-sensitive') + + pct_change = frame_base.wont_implement_method(DataFrameGroupBy, 'pct_change', + reason='order-sensitive') + ohlc = frame_base.wont_implement_method(DataFrameGroupBy, 'ohlc', + reason='order-sensitive') + + # TODO(BEAM-12169): Consider allowing this for categorical keys. + __len__ = frame_base.wont_implement_method( + DataFrameGroupBy, '__len__', reason="non-deferred-result") + groups = property(frame_base.wont_implement_method( + DataFrameGroupBy, 'groups', reason="non-deferred-result")) + indices = property(frame_base.wont_implement_method( + DataFrameGroupBy, 'indices', reason="non-deferred-result")) + + resample = frame_base.wont_implement_method( + DataFrameGroupBy, 'resample', reason='event-time-semantics') + rolling = frame_base.wont_implement_method( + DataFrameGroupBy, 'rolling', reason='event-time-semantics') + ewm = frame_base.wont_implement_method( + DataFrameGroupBy, 'ewm', reason="event-time-semantics") + expanding = frame_base.wont_implement_method( + DataFrameGroupBy, 'expanding', reason="event-time-semantics") + + tshift = frame_base.wont_implement_method( + DataFrameGroupBy, 'tshift', reason="deprecated") + +def _maybe_project_func(projection: Optional[List[str]]): + """ Returns identity func if projection is empty or None, else returns + a function that projects the specified columns. 
""" + if projection: + return lambda df: df[projection] + else: + return lambda x: x def _liftable_agg(meth, postagg_meth=None): - name, func = frame_base.name_and_func(meth) + agg_name, _ = frame_base.name_and_func(meth) if postagg_meth is None: - post_agg_name, post_agg_func = name, func + post_agg_name = agg_name else: - post_agg_name, post_agg_func = frame_base.name_and_func(postagg_meth) + post_agg_name, _ = frame_base.name_and_func(postagg_meth) + @frame_base.with_docs_from(DataFrameGroupBy, name=agg_name) def wrapper(self, *args, **kwargs): assert isinstance(self, DeferredGroupBy) - ungrouped = self._expr.args()[0] - to_group = ungrouped.proxy().index + if 'min_count' in kwargs: + return _unliftable_agg(meth)(self, *args, **kwargs) + + to_group = self._ungrouped.proxy().index is_categorical_grouping = any(to_group.get_level_values(i).is_categorical() - for i in range(to_group.nlevels)) + for i in self._grouping_indexes) groupby_kwargs = self._kwargs # Don't include un-observed categorical values in the preagg preagg_groupby_kwargs = groupby_kwargs.copy() preagg_groupby_kwargs['observed'] = True + + project = _maybe_project_func(self._projection) pre_agg = expressions.ComputedExpression( - 'pre_combine_' + name, - lambda df: func( - df.groupby(level=list(range(df.index.nlevels)), - **preagg_groupby_kwargs), - **kwargs), - [ungrouped], - requires_partition_by=partitionings.Nothing(), - preserves_partition_by=partitionings.Singleton()) + 'pre_combine_' + agg_name, + lambda df: getattr( + project( + df.groupby(level=list(range(df.index.nlevels)), + **preagg_groupby_kwargs) + ), + agg_name)(**kwargs), + [self._ungrouped], + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Arbitrary()) + post_agg = expressions.ComputedExpression( 'post_combine_' + post_agg_name, - lambda df: post_agg_func( - df.groupby(level=list(range(df.index.nlevels)), **groupby_kwargs), - **kwargs), + lambda df: getattr( + df.groupby(level=list(range(df.index.nlevels)), + **groupby_kwargs), + post_agg_name)(**kwargs), [pre_agg], - requires_partition_by=(partitionings.Singleton() + requires_partition_by=(partitionings.Singleton(reason=( + "Aggregations grouped by a categorical column are not currently " + "parallelizable (BEAM-11190)." 
+ )) if is_categorical_grouping else partitionings.Index()), - preserves_partition_by=partitionings.Singleton()) + preserves_partition_by=partitionings.Arbitrary()) return frame_base.DeferredFrame.wrap(post_agg) return wrapper def _unliftable_agg(meth): - name, func = frame_base.name_and_func(meth) + agg_name, _ = frame_base.name_and_func(meth) + @frame_base.with_docs_from(DataFrameGroupBy, name=agg_name) def wrapper(self, *args, **kwargs): assert isinstance(self, DeferredGroupBy) - ungrouped = self._expr.args()[0] - to_group = ungrouped.proxy().index + to_group = self._ungrouped.proxy().index is_categorical_grouping = any(to_group.get_level_values(i).is_categorical() - for i in range(to_group.nlevels)) + for i in self._grouping_indexes) groupby_kwargs = self._kwargs + project = _maybe_project_func(self._projection) post_agg = expressions.ComputedExpression( - name, - lambda df: func( - df.groupby(level=list(range(df.index.nlevels)), **groupby_kwargs), - **kwargs), - [ungrouped], - requires_partition_by=(partitionings.Singleton() + agg_name, + lambda df: getattr(project( + df.groupby(level=list(range(df.index.nlevels)), + **groupby_kwargs), + ), agg_name)(**kwargs), + [self._ungrouped], + requires_partition_by=(partitionings.Singleton(reason=( + "Aggregations grouped by a categorical column are not currently " + "parallelizable (BEAM-11190)." + )) if is_categorical_grouping else partitionings.Index()), - preserves_partition_by=partitionings.Singleton()) + # Some aggregation methods (e.g. corr/cov) add additional index levels. + # We only preserve the ones that existed _before_ the groupby. + preserves_partition_by=partitionings.Index( + list(range(self._ungrouped.proxy().index.nlevels)))) return frame_base.DeferredFrame.wrap(post_agg) return wrapper -LIFTABLE_AGGREGATIONS = ['all', 'any', 'max', 'min', 'prod', 'sum'] -LIFTABLE_WITH_SUM_AGGREGATIONS = ['size', 'count'] -UNLIFTABLE_AGGREGATIONS = ['mean', 'median', 'std', 'var'] - for meth in LIFTABLE_AGGREGATIONS: setattr(DeferredGroupBy, meth, _liftable_agg(meth)) for meth in LIFTABLE_WITH_SUM_AGGREGATIONS: setattr(DeferredGroupBy, meth, _liftable_agg(meth, postagg_meth='sum')) for meth in UNLIFTABLE_AGGREGATIONS: + if meth in ('kurt', 'kurtosis'): + # pandas doesn't currently allow kurtosis on GroupBy: + # https://github.com/pandas-dev/pandas/issues/40139 + continue setattr(DeferredGroupBy, meth, _unliftable_agg(meth)) +def _check_str_or_np_builtin(agg_func, func_list): + return agg_func in func_list or ( + getattr(agg_func, '__name__', None) in func_list + and agg_func.__module__ in ('numpy', 'builtins')) + + +def _is_associative(agg_func): + return _check_str_or_np_builtin(agg_func, LIFTABLE_AGGREGATIONS) -def _is_associative(func): - return func in LIFTABLE_AGGREGATIONS or ( - getattr(func, '__name__', None) in LIFTABLE_AGGREGATIONS - and func.__module__ in ('numpy', 'builtins')) +def _is_liftable_with_sum(agg_func): + return _check_str_or_np_builtin(agg_func, LIFTABLE_WITH_SUM_AGGREGATIONS) +def _is_unliftable(agg_func): + return _check_str_or_np_builtin(agg_func, UNLIFTABLE_AGGREGATIONS) +NUMERIC_AGGREGATIONS = ['max', 'min', 'prod', 'sum', 'mean', 'median', 'std', + 'var', 'sem', 'mad', 'skew', 'kurt', 'kurtosis'] -@populate_not_implemented(pd.core.groupby.generic.DataFrameGroupBy) +def _is_numeric(agg_func): + return _check_str_or_np_builtin(agg_func, NUMERIC_AGGREGATIONS) + + +@populate_not_implemented(DataFrameGroupBy) class _DeferredGroupByCols(frame_base.DeferredFrame): # It's not clear that all of these make sense in Pandas 
either... - agg = aggregate = frame_base._elementwise_method('agg') - any = frame_base._elementwise_method('any') - all = frame_base._elementwise_method('all') - boxplot = frame_base.wont_implement_method('plot') - describe = frame_base.wont_implement_method('describe') - diff = frame_base._elementwise_method('diff') - fillna = frame_base._elementwise_method('fillna') - filter = frame_base._elementwise_method('filter') - first = frame_base.wont_implement_method('order sensitive') - get_group = frame_base._elementwise_method('group') - head = frame_base.wont_implement_method('order sensitive') - hist = frame_base.wont_implement_method('plot') - idxmax = frame_base._elementwise_method('idxmax') - idxmin = frame_base._elementwise_method('idxmin') - last = frame_base.wont_implement_method('order sensitive') - mad = frame_base._elementwise_method('mad') - max = frame_base._elementwise_method('max') - mean = frame_base._elementwise_method('mean') - median = frame_base._elementwise_method('median') - min = frame_base._elementwise_method('min') - nunique = frame_base._elementwise_method('nunique') - plot = frame_base.wont_implement_method('plot') - prod = frame_base._elementwise_method('prod') - quantile = frame_base._elementwise_method('quantile') - shift = frame_base._elementwise_method('shift') - size = frame_base._elementwise_method('size') - skew = frame_base._elementwise_method('skew') - std = frame_base._elementwise_method('std') - sum = frame_base._elementwise_method('sum') - tail = frame_base.wont_implement_method('order sensitive') - take = frame_base.wont_implement_method('deprectated') - tshift = frame_base._elementwise_method('tshift') - var = frame_base._elementwise_method('var') - - @property + agg = aggregate = frame_base._elementwise_method('agg', base=DataFrameGroupBy) + any = frame_base._elementwise_method('any', base=DataFrameGroupBy) + all = frame_base._elementwise_method('all', base=DataFrameGroupBy) + boxplot = frame_base.wont_implement_method( + DataFrameGroupBy, 'boxplot', reason="plotting-tools") + describe = frame_base.not_implemented_method('describe', + base_type=DataFrameGroupBy) + diff = frame_base._elementwise_method('diff', base=DataFrameGroupBy) + fillna = frame_base._elementwise_method('fillna', base=DataFrameGroupBy) + filter = frame_base._elementwise_method('filter', base=DataFrameGroupBy) + first = frame_base._elementwise_method('first', base=DataFrameGroupBy) + get_group = frame_base._elementwise_method('get_group', base=DataFrameGroupBy) + head = frame_base.wont_implement_method( + DataFrameGroupBy, 'head', explanation=_PEEK_METHOD_EXPLANATION) + hist = frame_base.wont_implement_method( + DataFrameGroupBy, 'hist', reason="plotting-tools") + idxmax = frame_base._elementwise_method('idxmax', base=DataFrameGroupBy) + idxmin = frame_base._elementwise_method('idxmin', base=DataFrameGroupBy) + last = frame_base._elementwise_method('last', base=DataFrameGroupBy) + mad = frame_base._elementwise_method('mad', base=DataFrameGroupBy) + max = frame_base._elementwise_method('max', base=DataFrameGroupBy) + mean = frame_base._elementwise_method('mean', base=DataFrameGroupBy) + median = frame_base._elementwise_method('median', base=DataFrameGroupBy) + min = frame_base._elementwise_method('min', base=DataFrameGroupBy) + nunique = frame_base._elementwise_method('nunique', base=DataFrameGroupBy) + plot = frame_base.wont_implement_method( + DataFrameGroupBy, 'plot', reason="plotting-tools") + prod = frame_base._elementwise_method('prod', base=DataFrameGroupBy) + quantile = 
frame_base._elementwise_method('quantile', base=DataFrameGroupBy) + shift = frame_base._elementwise_method('shift', base=DataFrameGroupBy) + size = frame_base._elementwise_method('size', base=DataFrameGroupBy) + skew = frame_base._elementwise_method('skew', base=DataFrameGroupBy) + std = frame_base._elementwise_method('std', base=DataFrameGroupBy) + sum = frame_base._elementwise_method('sum', base=DataFrameGroupBy) + tail = frame_base.wont_implement_method( + DataFrameGroupBy, 'tail', explanation=_PEEK_METHOD_EXPLANATION) + take = frame_base.wont_implement_method( + DataFrameGroupBy, 'take', reason='deprecated') + tshift = frame_base._elementwise_method('tshift', base=DataFrameGroupBy) + var = frame_base._elementwise_method('var', base=DataFrameGroupBy) + + @property # type: ignore + @frame_base.with_docs_from(DataFrameGroupBy) def groups(self): return self._expr.proxy().groups - @property + @property # type: ignore + @frame_base.with_docs_from(DataFrameGroupBy) def indices(self): return self._expr.proxy().indices - @property + @property # type: ignore + @frame_base.with_docs_from(DataFrameGroupBy) def ndim(self): return self._expr.proxy().ndim - @property + @property # type: ignore + @frame_base.with_docs_from(DataFrameGroupBy) def ngroups(self): return self._expr.proxy().ngroups @@ -1668,10 +3938,36 @@ def __init__(self, frame): def names(self): return self._frame._expr.proxy().index.names + @names.setter + def names(self, value): + def set_index_names(df): + df = df.copy() + df.index.names = value + return df + + self._frame._expr = expressions.ComputedExpression( + 'set_index_names', + set_index_names, + [self._frame._expr], + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Arbitrary()) + + @property + def name(self): + return self._frame._expr.proxy().index.name + + @name.setter + def name(self, value): + self.names = [value] + @property def ndim(self): return self._frame._expr.proxy().index.ndim + @property + def dtype(self): + return self._frame._expr.proxy().index.dtype + @property def nlevels(self): return self._frame._expr.proxy().index.nlevels @@ -1685,26 +3981,44 @@ class _DeferredLoc(object): def __init__(self, frame): self._frame = frame - def __getitem__(self, index): - if isinstance(index, tuple): - rows, cols = index + def __getitem__(self, key): + if isinstance(key, tuple): + rows, cols = key return self[rows][cols] - elif isinstance(index, list) and index and isinstance(index[0], bool): - # Aligned by numerical index. - raise NotImplementedError(type(index)) - elif isinstance(index, list): + elif isinstance(key, list) and key and isinstance(key[0], bool): + # Aligned by numerical key. + raise NotImplementedError(type(key)) + elif isinstance(key, list): # Select rows, but behaves poorly on missing values. 
-      raise NotImplementedError(type(index))
-    elif isinstance(index, slice):
+      raise NotImplementedError(type(key))
+    elif isinstance(key, slice):
      args = [self._frame._expr]
-      func = lambda df: df.loc[index]
-    elif isinstance(index, frame_base.DeferredFrame):
-      args = [self._frame._expr, index._expr]
-      func = lambda df, index: df.loc[index]
-    elif callable(index):
-
-      def checked_callable_index(df):
-        computed_index = index(df)
+      func = lambda df: df.loc[key]
+    elif isinstance(key, frame_base.DeferredFrame):
+      func = lambda df, key: df.loc[key]
+      if pd.core.dtypes.common.is_bool_dtype(key._expr.proxy()):
+        # Boolean indexer, just pass it in as-is
+        args = [self._frame._expr, key._expr]
+      else:
+        # Likely a DeferredSeries of labels; overwrite the key's index with its
+        # values so we can colocate them with the labels they're selecting.
+        def data_to_index(s):
+          s = s.copy()
+          s.index = s
+          return s
+
+        reindexed_expr = expressions.ComputedExpression(
+            'data_to_index',
+            data_to_index,
+            [key._expr],
+            requires_partition_by=partitionings.Arbitrary(),
+            preserves_partition_by=partitionings.Singleton(),
+        )
+        args = [self._frame._expr, reindexed_expr]
+    elif callable(key):
+
+      def checked_callable_key(df):
+        computed_index = key(df)
        if isinstance(computed_index, tuple):
          row_index, _ = computed_index
        else:
@@ -1717,9 +4031,9 @@ def checked_callable_index(df):
        return computed_index
      args = [self._frame._expr]
-      func = lambda df: df.loc[checked_callable_index]
+      func = lambda df: df.loc[checked_callable_key]
    else:
-      raise NotImplementedError(type(index))
+      raise NotImplementedError(type(key))
    return frame_base.DeferredFrame.wrap(
        expressions.ComputedExpression(
@@ -1729,10 +4043,11 @@ def checked_callable_index(df):
        requires_partition_by=(
            partitionings.Index()
            if len(args) > 1
-            else partitionings.Nothing()),
-        preserves_partition_by=partitionings.Singleton()))
+            else partitionings.Arbitrary()),
+        preserves_partition_by=partitionings.Arbitrary()))
-  __setitem__ = frame_base.not_implemented_method('loc.setitem')
+  __setitem__ = frame_base.not_implemented_method(
+      'loc.setitem', base_type=pd.core.indexing._LocIndexer)
@populate_not_implemented(pd.core.indexing._iLocIndexer)
class _DeferredILoc(object):
@@ -1743,37 +4058,49 @@ def __getitem__(self, index):
    if isinstance(index, tuple):
      rows, _ = index
      if rows != slice(None, None, None):
-        raise frame_base.WontImplementError('order-sensitive')
+        raise frame_base.WontImplementError(
+            "Using iloc to select rows is not supported because its "
+            "position-based indexing is sensitive to the order of the data.",
+            reason="order-sensitive")
      return frame_base.DeferredFrame.wrap(
          expressions.ComputedExpression(
              'iloc',
              lambda df: df.iloc[index],
              [self._frame._expr],
-              requires_partition_by=partitionings.Nothing(),
-              preserves_partition_by=partitionings.Singleton()))
+              requires_partition_by=partitionings.Arbitrary(),
+              preserves_partition_by=partitionings.Arbitrary()))
    else:
-      raise frame_base.WontImplementError('order-sensitive')
+      raise frame_base.WontImplementError(
+          "Using iloc to select rows is not supported because its "
+          "position-based indexing is sensitive to the order of the data.",
+          reason="order-sensitive")
-  __setitem__ = frame_base.wont_implement_method('iloc.setitem')
+  def __setitem__(self, index, value):
+    raise frame_base.WontImplementError(
+        "Using iloc to mutate a frame is not supported because its "
+        "position-based indexing is sensitive to the order of the data.",
+        reason="order-sensitive")
class
_DeferredStringMethods(frame_base.DeferredBase): + @frame_base.with_docs_from(pd.core.strings.StringMethods) @frame_base.args_to_kwargs(pd.core.strings.StringMethods) @frame_base.populate_defaults(pd.core.strings.StringMethods) def cat(self, others, join, **kwargs): + """If defined, ``others`` must be a :class:`DeferredSeries` or a ``list`` of + ``DeferredSeries``.""" if others is None: # Concatenate series into a single String - requires = partitionings.Singleton() + requires = partitionings.Singleton(reason=( + "cat(others=None) concatenates all data in a Series into a single " + "string, so it requires collecting all data on a single node." + )) func = lambda df: df.str.cat(join=join, **kwargs) args = [self._expr] elif (isinstance(others, frame_base.DeferredBase) or (isinstance(others, list) and all(isinstance(other, frame_base.DeferredBase) for other in others))): - if join is None: - raise frame_base.WontImplementError("cat with others=Series or " - "others=List[Series] requires " - "join to be specified.") if isinstance(others, frame_base.DeferredBase): others = [others] @@ -1784,9 +4111,11 @@ def func(*args): args = [self._expr] + [other._expr for other in others] else: - raise frame_base.WontImplementError("others must be None, Series, or " - "List[Series]. List[str] is not " - "supported.") + raise frame_base.WontImplementError( + "others must be None, DeferredSeries, or List[DeferredSeries] " + f"(encountered {type(others)}). Other types are not supported " + "because they make this operation sensitive to the order of the " + "data.", reason="order-sensitive") return frame_base.DeferredFrame.wrap( expressions.ComputedExpression( @@ -1794,10 +4123,13 @@ def func(*args): func, args, requires_partition_by=requires, - preserves_partition_by=partitionings.Singleton())) + preserves_partition_by=partitionings.Arbitrary())) + @frame_base.with_docs_from(pd.core.strings.StringMethods) @frame_base.args_to_kwargs(pd.core.strings.StringMethods) def repeat(self, repeats): + """``repeats`` must be an ``int`` or a :class:`DeferredSeries`. Lists are + not supported because they make this operation order-sensitive.""" if isinstance(repeats, int): return frame_base.DeferredFrame.wrap( expressions.ComputedExpression( @@ -1808,8 +4140,8 @@ def repeat(self, repeats): # Currently it incorrectly infers dtype bool, may require upstream # fix. proxy=self._expr.proxy(), - requires_partition_by=partitionings.Nothing(), - preserves_partition_by=partitionings.Singleton())) + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Arbitrary())) elif isinstance(repeats, frame_base.DeferredBase): return frame_base.DeferredFrame.wrap( expressions.ComputedExpression( @@ -1821,10 +4153,27 @@ def repeat(self, repeats): # fix. proxy=self._expr.proxy(), requires_partition_by=partitionings.Index(), - preserves_partition_by=partitionings.Singleton())) + preserves_partition_by=partitionings.Arbitrary())) elif isinstance(repeats, list): - raise frame_base.WontImplementError("repeats must be an integer or a " - "Series.") + raise frame_base.WontImplementError( + "str.repeat(repeats=) repeats must be an int or a DeferredSeries. 
" + "Lists are not supported because they make this operation sensitive " + "to the order of the data.", reason="order-sensitive") + else: + raise TypeError("str.repeat(repeats=) value must be an int or a " + f"DeferredSeries (encountered {type(repeats)}).") + + get_dummies = frame_base.wont_implement_method( + pd.core.strings.StringMethods, 'get_dummies', + reason='non-deferred-columns') + + split = frame_base.wont_implement_method( + pd.core.strings.StringMethods, 'split', + reason='non-deferred-columns') + + rsplit = frame_base.wont_implement_method( + pd.core.strings.StringMethods, 'rsplit', + reason='non-deferred-columns') ELEMENTWISE_STRING_METHODS = [ @@ -1838,7 +4187,6 @@ def repeat(self, repeats): 'findall', 'fullmatch', 'get', - 'get_dummies', 'isalnum', 'isalpha', 'isdecimal', @@ -1857,11 +4205,9 @@ def repeat(self, repeats): 'partition', 'replace', 'rpartition', - 'rsplit', 'rstrip', 'slice', 'slice_replace', - 'split', 'startswith', 'strip', 'swapcase', @@ -1874,13 +4220,212 @@ def repeat(self, repeats): def make_str_func(method): def func(df, *args, **kwargs): - return getattr(df.str, method)(*args, **kwargs) + try: + df_str = df.str + except AttributeError: + # If there's a non-string value in a Series passed to .str method, pandas + # will generally just replace it with NaN in the result. However if + # there are _only_ non-string values, pandas will raise: + # + # AttributeError: Can only use .str accessor with string values! + # + # This can happen to us at execution time if we split a partition that is + # only non-strings. This branch just replaces all those values with NaN + # in that case. + return df.map(lambda _: np.nan) + else: + return getattr(df_str, method)(*args, **kwargs) + return func for method in ELEMENTWISE_STRING_METHODS: setattr(_DeferredStringMethods, method, - frame_base._elementwise_method(make_str_func(method))) + frame_base._elementwise_method(make_str_func(method), + name=method, + base=pd.core.strings.StringMethods)) + + +def make_cat_func(method): + def func(df, *args, **kwargs): + return getattr(df.cat, method)(*args, **kwargs) + + return func + + +class _DeferredCategoricalMethods(frame_base.DeferredBase): + @property # type: ignore + @frame_base.with_docs_from(pd.core.arrays.categorical.CategoricalAccessor) + def categories(self): + return self._expr.proxy().cat.categories + + @property # type: ignore + @frame_base.with_docs_from(pd.core.arrays.categorical.CategoricalAccessor) + def ordered(self): + return self._expr.proxy().cat.ordered + + @property # type: ignore + @frame_base.with_docs_from(pd.core.arrays.categorical.CategoricalAccessor) + def codes(self): + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'codes', + lambda s: s.cat.codes, + [self._expr], + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Arbitrary(), + ) + ) + + remove_unused_categories = frame_base.wont_implement_method( + pd.core.arrays.categorical.CategoricalAccessor, + 'remove_unused_categories', reason="non-deferred-columns") + +ELEMENTWISE_CATEGORICAL_METHODS = [ + 'add_categories', + 'as_ordered', + 'as_unordered', + 'remove_categories', + 'rename_categories', + 'reorder_categories', + 'set_categories', +] + +for method in ELEMENTWISE_CATEGORICAL_METHODS: + setattr(_DeferredCategoricalMethods, + method, + frame_base._elementwise_method( + make_cat_func(method), name=method, + base=pd.core.arrays.categorical.CategoricalAccessor)) + +class _DeferredDatetimeMethods(frame_base.DeferredBase): + @property # 
type: ignore + @frame_base.with_docs_from(pd.core.indexes.accessors.DatetimeProperties) + def tz(self): + return self._expr.proxy().dt.tz + + @property # type: ignore + @frame_base.with_docs_from(pd.core.indexes.accessors.DatetimeProperties) + def freq(self): + return self._expr.proxy().dt.freq + + @frame_base.with_docs_from(pd.core.indexes.accessors.DatetimeProperties) + def tz_localize(self, *args, ambiguous='infer', **kwargs): + """``ambiguous`` cannot be set to ``"infer"`` as its semantics are + order-sensitive. Similarly, specifying ``ambiguous`` as an + :class:`~numpy.ndarray` is order-sensitive, but you can achieve similar + functionality by specifying ``ambiguous`` as a Series.""" + if isinstance(ambiguous, np.ndarray): + raise frame_base.WontImplementError( + "tz_localize(ambiguous=ndarray) is not supported because it makes " + "this operation sensitive to the order of the data. Please use a " + "DeferredSeries instead.", + reason="order-sensitive") + elif isinstance(ambiguous, frame_base.DeferredFrame): + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'tz_localize', + lambda s, + ambiguous: s.dt.tz_localize(*args, ambiguous=ambiguous, **kwargs), + [self._expr, ambiguous._expr], + requires_partition_by=partitionings.Index(), + preserves_partition_by=partitionings.Arbitrary())) + elif ambiguous == 'infer': + # infer attempts to infer based on the order of the timestamps + raise frame_base.WontImplementError( + f"tz_localize(ambiguous={ambiguous!r}) is not allowed because it " + "makes this operation sensitive to the order of the data.", + reason="order-sensitive") + + return frame_base.DeferredFrame.wrap( + expressions.ComputedExpression( + 'tz_localize', + lambda s: s.dt.tz_localize(*args, ambiguous=ambiguous, **kwargs), + [self._expr], + requires_partition_by=partitionings.Arbitrary(), + preserves_partition_by=partitionings.Arbitrary())) + + + to_period = frame_base.wont_implement_method( + pd.core.indexes.accessors.DatetimeProperties, 'to_period', + reason="event-time-semantics") + to_pydatetime = frame_base.wont_implement_method( + pd.core.indexes.accessors.DatetimeProperties, 'to_pydatetime', + reason="non-deferred-result") + to_pytimedelta = frame_base.wont_implement_method( + pd.core.indexes.accessors.DatetimeProperties, 'to_pytimedelta', + reason="non-deferred-result") + +def make_dt_property(method): + def func(df): + return getattr(df.dt, method) + + return func + +def make_dt_func(method): + def func(df, *args, **kwargs): + return getattr(df.dt, method)(*args, **kwargs) + + return func + + +ELEMENTWISE_DATETIME_METHODS = [ + 'ceil', + 'day_name', + 'month_name', + 'floor', + 'isocalendar', + 'round', + 'normalize', + 'strftime', + 'tz_convert', +] + +for method in ELEMENTWISE_DATETIME_METHODS: + setattr(_DeferredDatetimeMethods, + method, + frame_base._elementwise_method( + make_dt_func(method), + name=method, + base=pd.core.indexes.accessors.DatetimeProperties)) + +ELEMENTWISE_DATETIME_PROPERTIES = [ + 'date', + 'day', + 'dayofweek', + 'dayofyear', + 'days_in_month', + 'daysinmonth', + 'hour', + 'is_leap_year', + 'is_month_end', + 'is_month_start', + 'is_quarter_end', + 'is_quarter_start', + 'is_year_end', + 'is_year_start', + 'microsecond', + 'minute', + 'month', + 'nanosecond', + 'quarter', + 'second', + 'time', + 'timetz', + 'week', + 'weekday', + 'weekofyear', + 'year', +] + +for method in ELEMENTWISE_DATETIME_PROPERTIES: + setattr(_DeferredDatetimeMethods, + method, + property(frame_base._elementwise_method( + make_dt_property(method), + 
name=method, + base=pd.core.indexes.accessors.DatetimeProperties))) + for base in ['add', 'sub', @@ -1896,36 +4441,53 @@ def func(df, *args, **kwargs): for p in ['%s', 'r%s', '__%s__', '__r%s__']: # TODO: non-trivial level? name = p % base + if hasattr(pd.Series, name): + setattr( + DeferredSeries, + name, + frame_base._elementwise_method(name, restrictions={'level': None}, + base=pd.Series)) + if hasattr(pd.DataFrame, name): + setattr( + DeferredDataFrame, + name, + frame_base._elementwise_method(name, restrictions={'level': None}, + base=pd.DataFrame)) + inplace_name = '__i%s__' % base + if hasattr(pd.Series, inplace_name): setattr( DeferredSeries, - name, - frame_base._elementwise_method(name, restrictions={'level': None})) + inplace_name, + frame_base._elementwise_method(inplace_name, inplace=True, + base=pd.Series)) + if hasattr(pd.DataFrame, inplace_name): setattr( DeferredDataFrame, - name, - frame_base._elementwise_method(name, restrictions={'level': None})) - setattr( - DeferredSeries, - '__i%s__' % base, - frame_base._elementwise_method('__i%s__' % base, inplace=True)) - setattr( - DeferredDataFrame, - '__i%s__' % base, - frame_base._elementwise_method('__i%s__' % base, inplace=True)) + inplace_name, + frame_base._elementwise_method(inplace_name, inplace=True, + base=pd.DataFrame)) for name in ['lt', 'le', 'gt', 'ge', 'eq', 'ne']: for p in '%s', '__%s__': # Note that non-underscore name is used for both as the __xxx__ methods are # order-sensitive. - setattr(DeferredSeries, p % name, frame_base._elementwise_method(name)) - setattr(DeferredDataFrame, p % name, frame_base._elementwise_method(name)) + setattr(DeferredSeries, p % name, + frame_base._elementwise_method(name, base=pd.Series)) + setattr(DeferredDataFrame, p % name, + frame_base._elementwise_method(name, base=pd.DataFrame)) for name in ['__neg__', '__pos__', '__invert__']: - setattr(DeferredSeries, name, frame_base._elementwise_method(name)) - setattr(DeferredDataFrame, name, frame_base._elementwise_method(name)) + setattr(DeferredSeries, name, + frame_base._elementwise_method(name, base=pd.Series)) + setattr(DeferredDataFrame, name, + frame_base._elementwise_method(name, base=pd.DataFrame)) DeferredSeries.multiply = DeferredSeries.mul # type: ignore DeferredDataFrame.multiply = DeferredDataFrame.mul # type: ignore +DeferredSeries.subtract = DeferredSeries.sub # type: ignore +DeferredDataFrame.subtract = DeferredDataFrame.sub # type: ignore +DeferredSeries.divide = DeferredSeries.div # type: ignore +DeferredDataFrame.divide = DeferredDataFrame.div # type: ignore def _slice_parts(s): diff --git a/sdks/python/apache_beam/dataframe/frames_test.py b/sdks/python/apache_beam/dataframe/frames_test.py index 18799cb5965a..c3972ad7528f 100644 --- a/sdks/python/apache_beam/dataframe/frames_test.py +++ b/sdks/python/apache_beam/dataframe/frames_test.py @@ -14,49 +14,226 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-from __future__ import absolute_import
-
-import math
-import sys
import unittest
import numpy as np
import pandas as pd
+from parameterized import parameterized
import apache_beam as beam
from apache_beam.dataframe import expressions
from apache_beam.dataframe import frame_base
-from apache_beam.dataframe import frames  # pylint: disable=unused-import
+from apache_beam.dataframe import frames
+PD_VERSION = tuple(map(int, pd.__version__.split('.')))
-class DeferredFrameTest(unittest.TestCase):
-  def _run_test(self, func, *args, distributed=True):
-    deferred_args = [
-        frame_base.DeferredFrame.wrap(
-            expressions.ConstantExpression(arg, arg[0:0])) for arg in args
-    ]
+GROUPBY_DF = pd.DataFrame({
+    'group': ['a' if i % 5 == 0 or i % 3 == 0 else 'b' for i in range(100)],
+    'foo': [None if i % 11 == 0 else i for i in range(100)],
+    'bar': [None if i % 7 == 0 else 99 - i for i in range(100)],
+    'baz': [None if i % 13 == 0 else i * 2 for i in range(100)],
+    'bool': [i % 17 == 0 for i in range(100)],
+    'str': [str(i) for i in range(100)],
+})
+
+
+def _get_deferred_args(*args):
+  return [
+      frame_base.DeferredFrame.wrap(
+          expressions.ConstantExpression(arg, arg[0:0])) for arg in args
+  ]
+
+
+class _AbstractFrameTest(unittest.TestCase):
+  """Test sub-class with utilities for verifying DataFrame operations."""
+  def _run_error_test(
+      self, func, *args, construction_time=True, distributed=True):
+    """Verify that func(*args) raises the same exception in pandas and in Beam.
+
+    Note that by default this only checks for exceptions that the Beam DataFrame
+    API raises during expression generation (i.e. construction time).
+    Exceptions raised while the pipeline is executing are less helpful, but
+    are sometimes unavoidable (e.g. data validation exceptions). To check for
+    these exceptions, use construction_time=False."""
+    deferred_args = _get_deferred_args(*args)
+
+    # Get expected error
+    try:
+      expected = func(*args)
+    except Exception as e:
+      expected_error = e
+    else:
+      raise AssertionError(
+          "Expected an error, but executing with pandas successfully "
+          f"returned:\n{expected}")
+
+    # Get actual error
+    if construction_time:
+      try:
+        _ = func(*deferred_args)._expr
+      except Exception as e:
+        actual = e
+      else:
+        raise AssertionError(
+            f"Expected an error:\n{expected_error}\nbut Beam successfully "
+            f"generated an expression.")
+    else:  # not construction_time
+      # Check for an error raised during pipeline execution
+      expr = func(*deferred_args)._expr
+      session_type = (
+          expressions.PartitioningSession
+          if distributed else expressions.Session)
+      try:
+        result = session_type({}).evaluate(expr)
+      except Exception as e:
+        actual = e
+      else:
+        raise AssertionError(
+            f"Expected an error:\n{expected_error}\nbut Beam successfully "
+            f"computed the result:\n{result}.")
+
+    # Verify
+    if (not isinstance(actual, type(expected_error)) or
+        str(expected_error) not in str(actual)):
+      raise AssertionError(
+          f'Expected {expected_error!r} to be raised, but got {actual!r}'
+      ) from actual
+
+  def _run_inplace_test(self, func, arg, **kwargs):
+    """Verify an inplace operation performed by func.
+
+    Checks that func performs the same inplace operation on arg, in pandas and
+    in Beam."""
+    def wrapper(df):
+      df = df.copy()
+      func(df)
+      return df
+
+    self._run_test(wrapper, arg, **kwargs)
+
+  def _run_test(
+      self, func, *args, distributed=True, nonparallel=False, check_proxy=True):
+    """Verify that func(*args) produces the same result in pandas and in Beam.
+ + Args: + distributed (bool): Whether or not to use PartitioningSession to + simulate parallel execution. + nonparallel (bool): Whether or not this function contains a + non-parallelizable operation. If True, the expression will be + generated twice, once outside of an allow_non_parallel_operations + block (to verify NonParallelOperation is raised), and again inside + of an allow_non_parallel_operations block to actually generate an + expression to verify. + check_proxy (bool): Whether or not to check that the proxy of the + generated expression matches the actual result, defaults to True. + This option should NOT be set to False in tests added for new + operations if at all possible. Instead make sure the new operation + produces the correct proxy. This flag only exists as an escape hatch + until existing failures can be addressed (BEAM-12379).""" + # Compute expected value expected = func(*args) + + # Compute actual value + deferred_args = _get_deferred_args(*args) + if nonparallel: + # First run outside a nonparallel block to confirm this raises as expected + with self.assertRaises(expressions.NonParallelOperation) as raised: + func(*deferred_args) + + if raised.exception.msg.startswith( + "Encountered non-parallelizable form of"): + raise AssertionError( + "Default NonParallelOperation raised, please specify a reason in " + "the Singleton() partitioning requirement for this operation." + ) from raised.exception + + # Re-run in an allow non parallel block to get an expression to verify + with beam.dataframe.allow_non_parallel_operations(): + expr = func(*deferred_args)._expr + else: + expr = func(*deferred_args)._expr + + # Compute the result of the generated expression session_type = ( expressions.PartitioningSession if distributed else expressions.Session) - actual = session_type({}).evaluate(func(*deferred_args)._expr) - if hasattr(expected, 'equals'): + + actual = session_type({}).evaluate(expr) + + # Verify + if isinstance(expected, pd.core.generic.NDFrame): if distributed: - cmp = lambda df: expected.sort_index().equals(df.sort_index()) + if expected.index.is_unique: + expected = expected.sort_index() + actual = actual.sort_index() + else: + expected = expected.sort_values(list(expected.columns)) + actual = actual.sort_values(list(actual.columns)) + + if isinstance(expected, pd.Series): + pd.testing.assert_series_equal(expected, actual) + elif isinstance(expected, pd.DataFrame): + pd.testing.assert_frame_equal(expected, actual) else: - cmp = expected.equals - elif isinstance(expected, float): - cmp = lambda x: (math.isnan(x) and math.isnan(expected) - ) or x == expected == 0 or abs(expected - x) / ( - abs(expected) + abs(x)) < 1e-8 + raise ValueError( + f"Expected value is a {type(expected)}," + "not a Series or DataFrame.") + else: - cmp = expected.__eq__ - self.assertTrue( - cmp(actual), 'Expected:\n\n%r\n\nActual:\n\n%r' % (expected, actual)) + # Expectation is not a pandas object + if isinstance(expected, float): + if np.isnan(expected): + cmp = np.isnan + else: + cmp = lambda x: np.isclose(expected, x) + else: + cmp = expected.__eq__ + self.assertTrue( + cmp(actual), 'Expected:\n\n%r\n\nActual:\n\n%r' % (expected, actual)) + + if check_proxy: + # Verify that the actual result agrees with the proxy + proxy = expr.proxy() + + if type(actual) in (np.float32, np.float64): + self.assertTrue(type(actual) == type(proxy) or np.isnan(proxy)) + else: + self.assertEqual(type(actual), type(proxy)) + + if isinstance(expected, pd.core.generic.NDFrame): + if isinstance(expected, 
pd.Series):
+          self.assertEqual(actual.dtype, proxy.dtype)
+          self.assertEqual(actual.name, proxy.name)
+        elif isinstance(expected, pd.DataFrame):
+          pd.testing.assert_series_equal(actual.dtypes, proxy.dtypes)
+
+        else:
+          raise ValueError(
+              f"Expected value is a {type(expected)}, "
+              "not a Series or DataFrame.")
+
+        self.assertEqual(actual.index.names, proxy.index.names)
+        for i in range(actual.index.nlevels):
+          self.assertEqual(
+              actual.index.get_level_values(i).dtype,
+              proxy.index.get_level_values(i).dtype)
+
+class DeferredFrameTest(_AbstractFrameTest):
+  """Miscellaneous tests for DataFrame operations."""
  def test_series_arithmetic(self):
    a = pd.Series([1, 2, 3])
    b = pd.Series([100, 200, 300])
+    self._run_test(lambda a, b: a - 2 * b, a, b)
+    self._run_test(lambda a, b: a.subtract(2).multiply(b).divide(a), a, b)
+
+  def test_dataframe_arithmetic(self):
+    df = pd.DataFrame({'a': [1, 2, 3], 'b': [100, 200, 300]})
+    df2 = pd.DataFrame({'a': [3000, 1000, 2000], 'b': [7, 11, 13]})
+
+    self._run_test(lambda df, df2: df - 2 * df2, df, df2)
+    self._run_test(
+        lambda df, df2: df.subtract(2).multiply(df2).divide(df), df, df2)
  def test_get_column(self):
    df = pd.DataFrame({
@@ -65,66 +242,215 @@ def test_get_column(self):
    })
    self._run_test(lambda df: df['Animal'], df)
    self._run_test(lambda df: df.Speed, df)
+    self._run_test(lambda df: df.get('Animal'), df)
+    self._run_test(lambda df: df.get('FOO', df.Animal), df)
+
+  def test_series_xs(self):
+    # pandas doctests only verify DataFrame.xs; here we verify Series.xs as well
+    d = {
+        'num_legs': [4, 4, 2, 2],
+        'num_wings': [0, 0, 2, 2],
+        'class': ['mammal', 'mammal', 'mammal', 'bird'],
+        'animal': ['cat', 'dog', 'bat', 'penguin'],
+        'locomotion': ['walks', 'walks', 'flies', 'walks']
+    }
+    df = pd.DataFrame(data=d)
+    df = df.set_index(['class', 'animal', 'locomotion'])
+
+    self._run_test(lambda df: df.num_legs.xs('mammal'), df)
+    self._run_test(lambda df: df.num_legs.xs(('mammal', 'dog')), df)
+    self._run_test(lambda df: df.num_legs.xs('cat', level=1), df)
+    self._run_test(
+        lambda df: df.num_legs.xs(('bird', 'walks'), level=[0, 'locomotion']),
+        df)
  def test_set_column(self):
    def new_column(df):
      df['NewCol'] = df['Speed']
-      return df
    df = pd.DataFrame({
        'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
        'Speed': [380., 370., 24., 26.]
}) - self._run_test(new_column, df) + self._run_inplace_test(new_column, df) + + def test_tz_localize_ambiguous_series(self): + # This replicates a tz_localize doctest: + # s.tz_localize('CET', ambiguous=np.array([True, True, False])) + # But using a DeferredSeries instead of a np array + + s = pd.Series( + range(3), + index=pd.DatetimeIndex([ + '2018-10-28 01:20:00', '2018-10-28 02:36:00', '2018-10-28 03:46:00' + ])) + ambiguous = pd.Series([True, True, False], index=s.index) + + self._run_test( + lambda s, + ambiguous: s.tz_localize('CET', ambiguous=ambiguous), + s, + ambiguous) + + def test_tz_convert(self): + # This replicates a tz_localize doctest: + # s.tz_localize('CET', ambiguous=np.array([True, True, False])) + # But using a DeferredSeries instead of a np array + + s = pd.Series( + range(3), + index=pd.DatetimeIndex([ + '2018-10-27 01:20:00', '2018-10-27 02:36:00', '2018-10-27 03:46:00' + ], + tz='Europe/Berlin')) + + self._run_test(lambda s: s.tz_convert('America/Los_Angeles'), s) - def test_groupby(self): + def test_sort_index_columns(self): df = pd.DataFrame({ - 'group': ['a' if i % 5 == 0 or i % 3 == 0 else 'b' for i in range(100)], - 'value': [None if i % 11 == 0 else i for i in range(100)] + 'c': range(10), + 'a': range(10), + 'b': range(10), + np.nan: range(10), }) - self._run_test(lambda df: df.groupby('group').agg(sum), df) - self._run_test(lambda df: df.groupby('group').sum(), df) - self._run_test(lambda df: df.groupby('group').median(), df) - self._run_test(lambda df: df.groupby('group').size(), df) - self._run_test(lambda df: df.groupby('group').count(), df) - self._run_test(lambda df: df.groupby('group').max(), df) - self._run_test(lambda df: df.groupby('group').min(), df) - self._run_test(lambda df: df.groupby('group').mean(), df) - - self._run_test(lambda df: df[df.value > 30].groupby('group').sum(), df) - self._run_test(lambda df: df[df.value > 30].groupby('group').mean(), df) - self._run_test(lambda df: df[df.value > 30].groupby('group').size(), df) - - # Grouping by a series is not currently supported - #self._run_test(lambda df: df[df.value > 40].groupby(df.group).sum(), df) - #self._run_test(lambda df: df[df.value > 40].groupby(df.group).mean(), df) - #self._run_test(lambda df: df[df.value > 40].groupby(df.group).size(), df) - # Example from https://pandas.pydata.org/docs/user_guide/groupby.html - arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'], - ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']] + self._run_test(lambda df: df.sort_index(axis=1), df) + self._run_test(lambda df: df.sort_index(axis=1, ascending=False), df) + self._run_test(lambda df: df.sort_index(axis=1, na_position='first'), df) - index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second']) + def test_where_callable_args(self): + df = pd.DataFrame( + np.arange(10, dtype=np.int64).reshape(-1, 2), columns=['A', 'B']) + + self._run_test( + lambda df: df.where(lambda df: df % 2 == 0, lambda df: df * 10), df) + def test_where_concrete_args(self): + df = pd.DataFrame( + np.arange(10, dtype=np.int64).reshape(-1, 2), columns=['A', 'B']) + + self._run_test( + lambda df: df.where( + df % 2 == 0, pd.Series({ + 'A': 123, 'B': 456 + }), axis=1), + df) + + def test_combine_dataframe(self): + df = pd.DataFrame({'A': [0, 0], 'B': [4, 4]}) + df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]}) + take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2 + self._run_test( + lambda df, + df2: df.combine(df2, take_smaller), + df, + df2, + nonparallel=True) + + def 
test_combine_dataframe_fill(self): + df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4]}) + df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]}) + take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2 + self._run_test( + lambda df1, + df2: df1.combine(df2, take_smaller, fill_value=-5), + df1, + df2, + nonparallel=True) + + def test_combine_Series(self): + with expressions.allow_non_parallel_operations(): + s1 = pd.Series({'falcon': 330.0, 'eagle': 160.0}) + s2 = pd.Series({'falcon': 345.0, 'eagle': 200.0, 'duck': 30.0}) + self._run_test(lambda s1, s2: s1.combine(s2, max), s1, s2) + + def test_combine_first_dataframe(self): + df1 = pd.DataFrame({'A': [None, 0], 'B': [None, 4]}) + df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]}) + + self._run_test(lambda df1, df2: df1.combine_first(df2), df1, df2) + + def test_combine_first_series(self): + s1 = pd.Series([1, np.nan]) + s2 = pd.Series([3, 4]) + + self._run_test(lambda s1, s2: s1.combine_first(s2), s1, s2) + + def test_add_prefix(self): + df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]}) + s = pd.Series([1, 2, 3, 4]) + + self._run_test(lambda df: df.add_prefix('col_'), df) + self._run_test(lambda s: s.add_prefix('col_'), s) + + def test_add_suffix(self): + df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]}) + s = pd.Series([1, 2, 3, 4]) + + self._run_test(lambda df: df.add_suffix('_col'), df) + self._run_test(lambda s: s.add_suffix('_col'), s) + + def test_set_index(self): df = pd.DataFrame({ - 'A': [1, 1, 1, 1, 2, 2, 3, 3], 'B': np.arange(8) - }, - index=index) + # [19, 18, ..] + 'index1': reversed(range(20)), # [15, 16, .., 0, 1, .., 13, 14] + 'index2': np.roll(range(20), 5), # ['', 'a', 'bb', ...] + 'values': [chr(ord('a') + i) * i for i in range(20)], + }) - self._run_test(lambda df: df.groupby(['second', 'A']).sum(), df) + self._run_test(lambda df: df.set_index(['index1', 'index2']), df) + self._run_test(lambda df: df.set_index(['index1', 'index2'], drop=True), df) + self._run_test(lambda df: df.set_index('values'), df) + + self._run_error_test(lambda df: df.set_index('bad'), df) + self._run_error_test( + lambda df: df.set_index(['index2', 'bad', 'really_bad']), df) + + def test_series_drop_ignore_errors(self): + midx = pd.MultiIndex( + levels=[['lama', 'cow', 'falcon'], ['speed', 'weight', 'length']], + codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]]) + s = pd.Series([45, 200, 1.2, 30, 250, 1.5, 320, 1, 0.3], index=midx) + + # drop() requires singleton partitioning unless errors are ignored + # Add some additional tests here to make sure the implementation works in + # non-singleton partitioning. + self._run_test(lambda s: s.drop('lama', level=0, errors='ignore'), s) + self._run_test(lambda s: s.drop(('cow', 'speed'), errors='ignore'), s) + self._run_test(lambda s: s.drop('falcon', level=0, errors='ignore'), s) + + def test_dataframe_drop_ignore_errors(self): + midx = pd.MultiIndex( + levels=[['lama', 'cow', 'falcon'], ['speed', 'weight', 'length']], + codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]]) + df = pd.DataFrame( + index=midx, + columns=['big', 'small'], + data=[[45, 30], [200, 100], [1.5, 1], [30, 20], [250, 150], [1.5, 0.8], + [320, 250], [1, 0.8], [0.3, 0.2]]) + + # drop() requires singleton partitioning unless errors are ignored + # Add some additional tests here to make sure the implementation works in + # non-singleton partitioning.
+ self._run_test( + lambda df: df.drop(index='lama', level=0, errors='ignore'), df) + self._run_test( + lambda df: df.drop(index=('cow', 'speed'), errors='ignore'), df) + self._run_test( + lambda df: df.drop(index='falcon', level=0, errors='ignore'), df) + self._run_test( + lambda df: df.drop(index='cow', columns='small', errors='ignore'), df) - @unittest.skipIf(sys.version_info <= (3, ), 'differing signature') def test_merge(self): # This is from the pandas doctests, but fails due to re-indexing being # order-sensitive. @@ -134,23 +460,150 @@ def test_merge(self): df2 = pd.DataFrame({ 'rkey': ['foo', 'bar', 'baz', 'foo'], 'value': [5, 6, 7, 8] }) - with beam.dataframe.allow_non_parallel_operations(): - self._run_test( - lambda df1, - df2: df1.merge(df2, left_on='lkey', right_on='rkey').rename( - index=lambda x: '*').sort_values(['value_x', 'value_y']), - df1, - df2) - self._run_test( - lambda df1, - df2: df1.merge( - df2, - left_on='lkey', - right_on='rkey', - suffixes=('_left', '_right')).rename(index=lambda x: '*'). - sort_values(['value_left', 'value_right']), - df1, - df2) + self._run_test( + lambda df1, + df2: df1.merge(df2, left_on='lkey', right_on='rkey').rename( + index=lambda x: '*'), + df1, + df2, + nonparallel=True, + check_proxy=False) + self._run_test( + lambda df1, + df2: df1.merge( + df2, left_on='lkey', right_on='rkey', suffixes=('_left', '_right')). + rename(index=lambda x: '*'), + df1, + df2, + nonparallel=True, + check_proxy=False) + + def test_merge_left_join(self): + # This is from the pandas doctests, but fails due to re-indexing being + # order-sensitive. + df1 = pd.DataFrame({'a': ['foo', 'bar'], 'b': [1, 2]}) + df2 = pd.DataFrame({'a': ['foo', 'baz'], 'c': [3, 4]}) + + self._run_test( + lambda df1, + df2: df1.merge(df2, how='left', on='a').rename(index=lambda x: '*'), + df1, + df2, + nonparallel=True, + check_proxy=False) + + def test_merge_on_index(self): + # This is from the pandas doctests, but fails due to re-indexing being + # order-sensitive. 
+ df1 = pd.DataFrame({ + 'lkey': ['foo', 'bar', 'baz', 'foo'], 'value': [1, 2, 3, 5] + }).set_index('lkey') + df2 = pd.DataFrame({ + 'rkey': ['foo', 'bar', 'baz', 'foo'], 'value': [5, 6, 7, 8] + }).set_index('rkey') + + self._run_test( + lambda df1, + df2: df1.merge(df2, left_index=True, right_index=True), + df1, + df2, + check_proxy=False) + + def test_merge_same_key(self): + df1 = pd.DataFrame({ + 'key': ['foo', 'bar', 'baz', 'foo'], 'value': [1, 2, 3, 5] + }) + df2 = pd.DataFrame({ + 'key': ['foo', 'bar', 'baz', 'foo'], 'value': [5, 6, 7, 8] + }) + self._run_test( + lambda df1, + df2: df1.merge(df2, on='key').rename(index=lambda x: '*'), + df1, + df2, + nonparallel=True, + check_proxy=False) + self._run_test( + lambda df1, + df2: df1.merge(df2, on='key', suffixes=('_left', '_right')).rename( + index=lambda x: '*'), + df1, + df2, + nonparallel=True, + check_proxy=False) + + def test_merge_same_key_doctest(self): + df1 = pd.DataFrame({'a': ['foo', 'bar'], 'b': [1, 2]}) + df2 = pd.DataFrame({'a': ['foo', 'baz'], 'c': [3, 4]}) + + self._run_test( + lambda df1, + df2: df1.merge(df2, how='left', on='a').rename(index=lambda x: '*'), + df1, + df2, + nonparallel=True, + check_proxy=False) + # Test without specifying 'on' + self._run_test( + lambda df1, + df2: df1.merge(df2, how='left').rename(index=lambda x: '*'), + df1, + df2, + nonparallel=True, + check_proxy=False) + + def test_merge_same_key_suffix_collision(self): + df1 = pd.DataFrame({'a': ['foo', 'bar'], 'b': [1, 2], 'a_lsuffix': [5, 6]}) + df2 = pd.DataFrame({'a': ['foo', 'baz'], 'c': [3, 4], 'a_rsuffix': [7, 8]}) + + self._run_test( + lambda df1, + df2: df1.merge( + df2, how='left', on='a', suffixes=('_lsuffix', '_rsuffix')).rename( + index=lambda x: '*'), + df1, + df2, + nonparallel=True, + check_proxy=False) + # Test without specifying 'on' + self._run_test( + lambda df1, + df2: df1.merge(df2, how='left', suffixes=('_lsuffix', '_rsuffix')). + rename(index=lambda x: '*'), + df1, + df2, + nonparallel=True, + check_proxy=False) + + def test_value_counts_with_nans(self): + # similar to doctests that verify value_counts, but include nan values to + # make sure we handle them correctly. 
+ df = pd.DataFrame({ + 'num_legs': [2, 4, 4, 6, np.nan, np.nan], + 'num_wings': [2, 0, 0, 0, np.nan, 2] + }, + index=['falcon', 'dog', 'cat', 'ant', 'car', 'plane']) + + self._run_test(lambda df: df.value_counts(), df) + self._run_test(lambda df: df.value_counts(normalize=True), df) + + self._run_test(lambda df: df.num_wings.value_counts(), df) + self._run_test(lambda df: df.num_wings.value_counts(normalize=True), df) + + def test_value_counts_does_not_support_sort(self): + df = pd.DataFrame({ + 'num_legs': [2, 4, 4, 6, np.nan, np.nan], + 'num_wings': [2, 0, 0, 0, np.nan, 2] + }, + index=['falcon', 'dog', 'cat', 'ant', 'car', 'plane']) + + with self.assertRaisesRegex(frame_base.WontImplementError, + r"value_counts\(sort\=True\)"): + self._run_test(lambda df: df.value_counts(sort=True), df) + + with self.assertRaisesRegex(frame_base.WontImplementError, + r"value_counts\(sort\=True\)"): + self._run_test(lambda df: df.num_wings.value_counts(sort=True), df) def test_series_getitem(self): s = pd.Series([x**2 for x in range(10)]) @@ -162,6 +615,16 @@ def test_series_getitem(self): s.index = s.index.map(float) self._run_test(lambda s: s[1.5:6], s) + @parameterized.expand([ + (pd.Series(range(10)), ), # unique + (pd.Series(list(range(100)) + [0]), ), # non-unique int + (pd.Series(list(range(100)) + [0]) / 100, ), # non-unique flt + (pd.Series(['a', 'b', 'c', 'd']), ), # unique str + (pd.Series(['a', 'b', 'a', 'c', 'd']), ), # non-unique str + ]) + def test_series_is_unique(self, series): + self._run_test(lambda s: s.is_unique, series) + def test_dataframe_getitem(self): df = pd.DataFrame({'A': [x**2 for x in range(6)], 'B': list('abcdef')}) self._run_test(lambda df: df['A'], df) @@ -175,6 +638,9 @@ def test_dataframe_getitem(self): def test_loc(self): dates = pd.date_range('1/1/2000', periods=8) + # TODO(BEAM-11757): We do not preserve the freq attribute on a DateTime + # index + dates.freq = None df = pd.DataFrame( np.arange(32).reshape((8, 4)), index=dates, @@ -184,28 +650,24 @@ def test_loc(self): self._run_test(lambda df: df.loc[:dates[3]], df) self._run_test(lambda df: df.loc[df.A > 10], df) self._run_test(lambda df: df.loc[lambda df: df.A > 10], df) + self._run_test(lambda df: df.C.loc[df.A > 10], df) + self._run_test(lambda df, s: df.loc[s.loc[1:3]], df, pd.Series(dates)) - def test_series_agg(self): - s = pd.Series(list(range(16))) - self._run_test(lambda s: s.agg('sum'), s) - self._run_test(lambda s: s.agg(['sum']), s) - with beam.dataframe.allow_non_parallel_operations(): - self._run_test(lambda s: s.agg(['sum', 'mean']), s) - self._run_test(lambda s: s.agg(['mean']), s) - self._run_test(lambda s: s.agg('mean'), s) + def test_append_sort(self): + # yapf: disable + df1 = pd.DataFrame({'int': [1, 2, 3], 'str': ['a', 'b', 'c']}, + columns=['int', 'str'], + index=[1, 3, 5]) + df2 = pd.DataFrame({'int': [4, 5, 6], 'str': ['d', 'e', 'f']}, + columns=['str', 'int'], + index=[2, 4, 6]) + # yapf: enable - @unittest.skipIf(sys.version_info < (3, 6), 'Nondeterministic dict ordering.') - def test_dataframe_agg(self): - df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 3, 5, 7]}) - self._run_test(lambda df: df.agg('sum'), df) - with beam.dataframe.allow_non_parallel_operations(): - self._run_test(lambda df: df.agg(['sum', 'mean']), df) - self._run_test(lambda df: df.agg({'A': 'sum', 'B': 'sum'}), df) - self._run_test(lambda df: df.agg({'A': 'sum', 'B': 'mean'}), df) - self._run_test(lambda df: df.agg({'A': ['sum', 'mean']}), df) - self._run_test(lambda df: df.agg({'A': ['sum', 'mean'], 'B': 'min'}), 
df) + self._run_test(lambda df1, df2: df1.append(df2, sort=True), df1, df2) + self._run_test(lambda df1, df2: df1.append(df2, sort=False), df1, df2) + self._run_test(lambda df1, df2: df2.append(df1, sort=True), df1, df2) + self._run_test(lambda df1, df2: df2.append(df1, sort=False), df1, df2) - @unittest.skipIf(sys.version_info < (3, 6), 'Nondeterministic dict ordering.') def test_smallest_largest(self): df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [2, 3, 5, 7]}) self._run_test(lambda df: df.nlargest(1, 'A', keep='all'), df) @@ -217,6 +679,7 @@ def test_series_cov_corr(self): pd.Series(range(100)), pd.Series([x**3 for x in range(-50, 50)])]: self._run_test(lambda s: s.std(), s) + self._run_test(lambda s: s.var(), s) self._run_test(lambda s: s.corr(s), s) self._run_test(lambda s: s.corr(s + 1), s) self._run_test(lambda s: s.corr(s * s), s) @@ -226,34 +689,39 @@ def test_dataframe_cov_corr(self): df = pd.DataFrame(np.random.randn(20, 3), columns=['a', 'b', 'c']) df.loc[df.index[:5], 'a'] = np.nan df.loc[df.index[5:10], 'b'] = np.nan - self._run_test(lambda df: df.corr().round(8), df) - self._run_test(lambda df: df.cov().round(8), df) - self._run_test(lambda df: df.corr(min_periods=12).round(8), df) - self._run_test(lambda df: df.cov(min_periods=12).round(8), df) - self._run_test(lambda df: df.corrwith(df.a).round(8), df) + self._run_test(lambda df: df.corr(), df) + self._run_test(lambda df: df.cov(), df) + self._run_test(lambda df: df.corr(min_periods=12), df) + self._run_test(lambda df: df.cov(min_periods=12), df) + self._run_test(lambda df: df.corrwith(df.a), df) + self._run_test(lambda df: df[['a', 'b']].corrwith(df[['b', 'c']]), df) + + df2 = pd.DataFrame(np.random.randn(20, 3), columns=['a', 'b', 'c']) self._run_test( - lambda df: df[['a', 'b']].corrwith(df[['b', 'c']]).round(8), df) + lambda df, df2: df.corrwith(df2, axis=1), df, df2, check_proxy=False) - def test_categorical_groupby(self): - df = pd.DataFrame({'A': np.arange(6), 'B': list('aabbca')}) - df['B'] = df['B'].astype(pd.CategoricalDtype(list('cab'))) - df = df.set_index('B') - # TODO(BEAM-11190): These aggregations can be done in index partitions, but - # it will require a little more complex logic - with beam.dataframe.allow_non_parallel_operations(): - self._run_test(lambda df: df.groupby(level=0).sum(), df) - self._run_test(lambda df: df.groupby(level=0).mean(), df) + def test_corrwith_bad_axis(self): + df = pd.DataFrame({'a': range(3), 'b': range(3, 6), 'c': range(6, 9)}) + self._run_error_test(lambda df: df.corrwith(df.a, axis=2), df) + self._run_error_test(lambda df: df.corrwith(df, axis=5), df) + + @unittest.skipIf(PD_VERSION < (1, 2), "na_action added in pandas 1.2.0") + def test_applymap_na_action(self): + # Replicates a doctest for na_action which is incompatible with + # doctest framework + df = pd.DataFrame([[pd.NA, 2.12], [3.356, 4.567]]) + self._run_test( + lambda df: df.applymap(lambda x: len(str(x)), na_action='ignore'), + df, + # TODO: generate proxy using naive type inference on fn + check_proxy=False) def test_dataframe_eval_query(self): df = pd.DataFrame(np.random.randn(20, 3), columns=['a', 'b', 'c']) self._run_test(lambda df: df.eval('foo = a + b - c'), df) self._run_test(lambda df: df.query('a > b + c'), df) - def eval_inplace(df): - df.eval('foo = a + b - c', inplace=True) - return df.foo - - self._run_test(eval_inplace, df) + self._run_inplace_test(lambda df: df.eval('foo = a + b - c'), df) # Verify that attempting to access locals raises a useful error deferred_df = frame_base.DeferredFrame.wrap( @@ 
-263,35 +731,1537 @@ def eval_inplace(df): self.assertRaises( NotImplementedError, lambda: deferred_df.query('a > @b + c')) + def test_index_name_assignment(self): + df = pd.DataFrame({'a': ['foo', 'bar'], 'b': [1, 2]}) + df = df.set_index(['a', 'b'], drop=False) -class AllowNonParallelTest(unittest.TestCase): - def _use_non_parallel_operation(self): - _ = frame_base.DeferredFrame.wrap( - expressions.PlaceholderExpression(pd.Series([1, 2, 3]))).replace( - 'a', 'b', limit=1) + def change_index_names(df): + df.index.names = ['A', None] - def test_disallow_non_parallel(self): - with self.assertRaises(expressions.NonParallelOperation): - self._use_non_parallel_operation() + self._run_inplace_test(change_index_names, df) - def test_allow_non_parallel_in_context(self): - with beam.dataframe.allow_non_parallel_operations(): - self._use_non_parallel_operation() + def test_quantile(self): + df = pd.DataFrame( + np.array([[1, 1], [2, 10], [3, 100], [4, 100]]), columns=['a', 'b']) - def test_allow_non_parallel_nesting(self): - # disallowed - with beam.dataframe.allow_non_parallel_operations(): - # allowed - self._use_non_parallel_operation() - with beam.dataframe.allow_non_parallel_operations(False): - # disallowed again - with self.assertRaises(expressions.NonParallelOperation): - self._use_non_parallel_operation() - # allowed - self._use_non_parallel_operation() - # disallowed - with self.assertRaises(expressions.NonParallelOperation): - self._use_non_parallel_operation() + self._run_test( + lambda df: df.quantile(0.1, axis='columns'), df, check_proxy=False) + + self._run_test( + lambda df: df.quantile(0.1, axis='columns'), df, check_proxy=False) + with self.assertRaisesRegex(frame_base.WontImplementError, + r"df\.quantile\(q=0\.1, axis='columns'\)"): + self._run_test(lambda df: df.quantile([0.1, 0.5], axis='columns'), df) + + def test_dataframe_melt(self): + + df = pd.DataFrame({ + 'A': { + 0: 'a', 1: 'b', 2: 'c' + }, + 'B': { + 0: 1, 1: 3, 2: 5 + }, + 'C': { + 0: 2, 1: 4, 2: 6 + } + }) + + self._run_test( + lambda df: df.melt(id_vars=['A'], value_vars=['B'], ignore_index=False), + df) + self._run_test( + lambda df: df.melt( + id_vars=['A'], value_vars=['B', 'C'], ignore_index=False), + df) + self._run_test( + lambda df: df.melt( + id_vars=['A'], + value_vars=['B'], + var_name='myVarname', + value_name='myValname', + ignore_index=False), + df) + self._run_test( + lambda df: df.melt( + id_vars=['A'], value_vars=['B', 'C'], ignore_index=False), + df) + + df.columns = [list('ABC'), list('DEF')] + self._run_test( + lambda df: df.melt( + col_level=0, id_vars=['A'], value_vars=['B'], ignore_index=False), + df) + self._run_test( + lambda df: df.melt( + id_vars=[('A', 'D')], value_vars=[('B', 'E')], ignore_index=False), + df) + + def test_fillna_columns(self): + df = pd.DataFrame( + [[np.nan, 2, np.nan, 0], [3, 4, np.nan, 1], [np.nan, np.nan, np.nan, 5], + [np.nan, 3, np.nan, 4], [3, np.nan, np.nan, 4]], + columns=list('ABCD')) + + self._run_test(lambda df: df.fillna(method='ffill', axis='columns'), df) + self._run_test( + lambda df: df.fillna(method='ffill', axis='columns', limit=1), df) + self._run_test( + lambda df: df.fillna(method='bfill', axis='columns', limit=1), df) + + # Intended behavior is unclear here. 
See + # https://github.com/pandas-dev/pandas/issues/40989 + # self._run_test(lambda df: df.fillna(axis='columns', value=100, + # limit=2), df) + + def test_dataframe_fillna_dataframe_as_value(self): + df = pd.DataFrame([[np.nan, 2, np.nan, 0], [3, 4, np.nan, 1], + [np.nan, np.nan, np.nan, 5], [np.nan, 3, np.nan, 4]], + columns=list("ABCD")) + df2 = pd.DataFrame(np.zeros((4, 4)), columns=list("ABCE")) + + self._run_test(lambda df, df2: df.fillna(df2), df, df2) + + def test_dataframe_fillna_series_as_value(self): + df = pd.DataFrame([[np.nan, 2, np.nan, 0], [3, 4, np.nan, 1], + [np.nan, np.nan, np.nan, 5], [np.nan, 3, np.nan, 4]], + columns=list("ABCD")) + s = pd.Series(range(4), index=list("ABCE")) + + self._run_test(lambda df, s: df.fillna(s), df, s) + + def test_series_fillna_series_as_value(self): + df = pd.DataFrame([[np.nan, 2, np.nan, 0], [3, 4, np.nan, 1], + [np.nan, np.nan, np.nan, 5], [np.nan, 3, np.nan, 4]], + columns=list("ABCD")) + df2 = pd.DataFrame(np.zeros((4, 4)), columns=list("ABCE")) + + self._run_test(lambda df, df2: df.A.fillna(df2.A), df, df2) + + def test_append_verify_integrity(self): + df1 = pd.DataFrame({'A': range(10), 'B': range(10)}, index=range(10)) + df2 = pd.DataFrame({'A': range(10), 'B': range(10)}, index=range(9, 19)) + + self._run_error_test( + lambda s1, + s2: s1.append(s2, verify_integrity=True), + df1['A'], + df2['A'], + construction_time=False) + self._run_error_test( + lambda df1, + df2: df1.append(df2, verify_integrity=True), + df1, + df2, + construction_time=False) + + def test_categorical_groupby(self): + df = pd.DataFrame({'A': np.arange(6), 'B': list('aabbca')}) + df['B'] = df['B'].astype(pd.CategoricalDtype(list('cab'))) + df = df.set_index('B') + # TODO(BEAM-11190): These aggregations can be done in index partitions, but + # it will require a little more complex logic + self._run_test(lambda df: df.groupby(level=0).sum(), df, nonparallel=True) + self._run_test(lambda df: df.groupby(level=0).mean(), df, nonparallel=True) + + def test_dataframe_sum_nonnumeric_raises(self): + # Attempting a numeric aggregation with the str column present should + # raise, and suggest the numeric_only argument + with self.assertRaisesRegex(frame_base.WontImplementError, 'numeric_only'): + self._run_test(lambda df: df.sum(), GROUPBY_DF) + + # numeric_only=True should work + self._run_test(lambda df: df.sum(numeric_only=True), GROUPBY_DF) + # projecting only numeric columns should too + self._run_test(lambda df: df[['foo', 'bar']].sum(), GROUPBY_DF) + + def test_insert(self): + df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}) + + self._run_inplace_test(lambda df: df.insert(1, 'C', df.A * 2), df) + self._run_inplace_test( + lambda df: df.insert(0, 'foo', pd.Series([8], index=[1])), + df, + check_proxy=False) + self._run_inplace_test(lambda df: df.insert(2, 'bar', value='q'), df) + + def test_insert_does_not_support_list_value(self): + df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}) + + with self.assertRaisesRegex(frame_base.WontImplementError, + r"insert\(value=list\)"): + self._run_inplace_test(lambda df: df.insert(1, 'C', [7, 8, 9]), df) + + def test_drop_duplicates(self): + df = pd.DataFrame({ + 'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'], + 'style': ['cup', 'cup', 'cup', 'pack', 'pack'], + 'rating': [4, 4, 3.5, 15, 5] + }) + + self._run_test(lambda df: df.drop_duplicates(keep=False), df) + self._run_test( + lambda df: df.drop_duplicates(subset=['brand'], keep=False), df) + self._run_test( + lambda df: df.drop_duplicates(subset=['brand', 
'style'], keep=False), + df) + + @parameterized.expand([ + ( + lambda base: base.from_dict({ + 'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd'] + }), ), + ( + lambda base: base.from_dict({ + 'row_1': [3, 2, 1, 0], 'row_2': ['a', 'b', 'c', 'd'] + }, + orient='index'), ), + ( + lambda base: base.from_records( + np.array([(3, 'a'), (2, 'b'), (1, 'c'), (0, 'd')], + dtype=[('col_1', 'i4'), ('col_2', 'U1')])), ), + ]) + def test_create_methods(self, func): + expected = func(pd.DataFrame) + + deferred_df = func(frames.DeferredDataFrame) + actual = expressions.Session({}).evaluate(deferred_df._expr) + + pd.testing.assert_frame_equal(actual, expected) + + def test_replace(self): + # verify a replace() doctest case that doesn't quite work in Beam as it uses + # the default method='pad' + df = pd.DataFrame({'A': ['bat', 'foo', 'bait'], 'B': ['abc', 'bar', 'xyz']}) + + self._run_test( + lambda df: df.replace( + regex={ + r'^ba.$': 'new', 'foo': 'xyz' + }, method=None), + df) + + def test_sample_columns(self): + df = pd.DataFrame({ + 'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'], + 'style': ['cup', 'cup', 'cup', 'pack', 'pack'], + 'rating': [4, 4, 3.5, 15, 5] + }) + + self._run_test(lambda df: df.sample(axis=1, n=2, random_state=1), df) + self._run_error_test(lambda df: df.sample(axis=1, n=10, random_state=2), df) + self._run_test( + lambda df: df.sample(axis=1, n=10, random_state=3, replace=True), df) + + def test_cat(self): + # Replicate the doctests from CategoricalAccessor + # These tests don't translate into pandas_doctests_test.py because it + # tries to use astype("category") in Beam, which makes a non-deferred + # column type. + s = pd.Series(list("abbccc")).astype("category") + + self._run_test(lambda s: s.cat.rename_categories(list("cba")), s) + self._run_test(lambda s: s.cat.reorder_categories(list("cba")), s) + self._run_test(lambda s: s.cat.add_categories(["d", "e"]), s) + self._run_test(lambda s: s.cat.remove_categories(["a", "c"]), s) + self._run_test(lambda s: s.cat.set_categories(list("abcde")), s) + self._run_test(lambda s: s.cat.as_ordered(), s) + self._run_test(lambda s: s.cat.as_unordered(), s) + self._run_test(lambda s: s.cat.codes, s) + + @parameterized.expand(frames.ELEMENTWISE_DATETIME_PROPERTIES) + def test_dt_property(self, prop_name): + # Generate a series with a lot of unique timestamps + s = pd.Series( + pd.date_range('1/1/2000', periods=100, freq='m') + + pd.timedelta_range(start='0 days', end='70 days', periods=100)) + self._run_test(lambda s: getattr(s.dt, prop_name), s) + + @parameterized.expand([ + ('month_name', {}), + ('day_name', {}), + ('normalize', {}), + ( + 'strftime', + { + 'date_format': '%B %d, %Y, %r' + }, + ), + ('tz_convert', { + 'tz': 'Europe/Berlin' + }), + ]) + def test_dt_method(self, op, kwargs): + # Generate a series with a lot of unique timestamps + s = pd.Series( + pd.date_range( + '1/1/2000', periods=100, freq='m', tz='America/Los_Angeles') + + pd.timedelta_range(start='0 days', end='70 days', periods=100)) + + self._run_test(lambda s: getattr(s.dt, op)(**kwargs), s) + + def test_dt_tz_localize_ambiguous_series(self): + # This replicates a dt.tz_localize doctest: + # s.tz_localize('CET', ambiguous=np.array([True, True, False])) + # But using a DeferredSeries instead of a np array + + s = pd.to_datetime( + pd.Series([ + '2018-10-28 01:20:00', '2018-10-28 02:36:00', '2018-10-28 03:46:00' + ])) + ambiguous = pd.Series([True, True, False], index=s.index) + + self._run_test( + lambda s, + ambiguous: s.dt.tz_localize('CET',
ambiguous=ambiguous), + s, + ambiguous) + + def test_dt_tz_localize_nonexistent(self): + # This replicates dt.tz_localize doctests that exercise `nonexistent`. + # However they specify ambiguous='NaT' because the default, + # ambiguous='infer', is not supported. + s = pd.to_datetime( + pd.Series(['2015-03-29 02:30:00', '2015-03-29 03:30:00'])) + + self._run_test( + lambda s: s.dt.tz_localize( + 'Europe/Warsaw', ambiguous='NaT', nonexistent='shift_forward'), + s) + self._run_test( + lambda s: s.dt.tz_localize( + 'Europe/Warsaw', ambiguous='NaT', nonexistent='shift_backward'), + s) + self._run_test( + lambda s: s.dt.tz_localize( + 'Europe/Warsaw', ambiguous='NaT', nonexistent=pd.Timedelta('1H')), + s) + + +# pandas doesn't support kurtosis on GroupBys: +# https://github.com/pandas-dev/pandas/issues/40139 +ALL_GROUPING_AGGREGATIONS = sorted( + set(frames.ALL_AGGREGATIONS) - set(('kurt', 'kurtosis'))) + + +class GroupByTest(_AbstractFrameTest): + """Tests for DataFrame/Series GroupBy operations.""" + @parameterized.expand(ALL_GROUPING_AGGREGATIONS) + def test_groupby_agg(self, agg_type): + if agg_type == 'describe' and PD_VERSION < (1, 2): + self.skipTest( + "BEAM-12366: proxy generation of DataFrameGroupBy.describe " + "fails in pandas < 1.2") + self._run_test( + lambda df: df.groupby('group').agg(agg_type), + GROUPBY_DF, + check_proxy=False) + + @parameterized.expand(ALL_GROUPING_AGGREGATIONS) + def test_groupby_with_filter(self, agg_type): + if agg_type == 'describe' and PD_VERSION < (1, 2): + self.skipTest( + "BEAM-12366: proxy generation of DataFrameGroupBy.describe " + "fails in pandas < 1.2") + self._run_test( + lambda df: getattr(df[df.foo > 30].groupby('group'), agg_type)(), + GROUPBY_DF, + check_proxy=False) + + @parameterized.expand(ALL_GROUPING_AGGREGATIONS) + def test_groupby(self, agg_type): + if agg_type == 'describe' and PD_VERSION < (1, 2): + self.skipTest( + "BEAM-12366: proxy generation of DataFrameGroupBy.describe " + "fails in pandas < 1.2") + + self._run_test( + lambda df: getattr(df.groupby('group'), agg_type)(), + GROUPBY_DF, + check_proxy=False) + + @parameterized.expand(ALL_GROUPING_AGGREGATIONS) + def test_groupby_series(self, agg_type): + if agg_type == 'describe' and PD_VERSION < (1, 2): + self.skipTest( + "BEAM-12366: proxy generation of DataFrameGroupBy.describe " + "fails in pandas < 1.2") + + self._run_test( + lambda df: getattr(df[df.foo > 40].groupby(df.group), agg_type)(), + GROUPBY_DF, + check_proxy=False) + + def test_groupby_user_guide(self): + # Example from https://pandas.pydata.org/docs/user_guide/groupby.html + arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'], + ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']] + + index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second']) + + df = pd.DataFrame({ + 'A': [1, 1, 1, 1, 2, 2, 3, 3], 'B': np.arange(8) + }, + index=index) + + self._run_test(lambda df: df.groupby(['second', 'A']).sum(), df) + + @parameterized.expand(ALL_GROUPING_AGGREGATIONS) + def test_groupby_project_series(self, agg_type): + df = GROUPBY_DF + + if agg_type == 'describe': + self.skipTest( + "BEAM-12366: proxy generation of SeriesGroupBy.describe " + "fails") + if agg_type in ('corr', 'cov'): + self.skipTest( + "BEAM-12367: SeriesGroupBy.{corr, cov} do not raise the " + "expected error.") + + self._run_test(lambda df: getattr(df.groupby('group').foo, agg_type)(), df) + self._run_test(lambda df: getattr(df.groupby('group').bar, agg_type)(), df) + self._run_test( + lambda df: 
getattr(df.groupby('group')['foo'], agg_type)(), df) + self._run_test( + lambda df: getattr(df.groupby('group')['bar'], agg_type)(), df) + + @parameterized.expand(ALL_GROUPING_AGGREGATIONS) + def test_groupby_project_dataframe(self, agg_type): + if agg_type == 'describe' and PD_VERSION < (1, 2): + self.skipTest( + "BEAM-12366: proxy generation of DataFrameGroupBy.describe " + "fails in pandas < 1.2") + self._run_test( + lambda df: getattr(df.groupby('group')[['bar', 'baz']], agg_type)(), + GROUPBY_DF, + check_proxy=False) + + def test_groupby_errors_bad_projection(self): + df = GROUPBY_DF + + # non-existent projection column + self._run_error_test( + lambda df: df.groupby('group')[['bar', 'baz']].bar.median(), df) + self._run_error_test(lambda df: df.groupby('group')[['bad']].median(), df) + + self._run_error_test(lambda df: df.groupby('group').bad.median(), df) + + self._run_error_test( + lambda df: df.groupby('group')[['bar', 'baz']].bar.sum(), df) + self._run_error_test(lambda df: df.groupby('group')[['bat']].sum(), df) + self._run_error_test(lambda df: df.groupby('group').bat.sum(), df) + + def test_groupby_errors_non_existent_label(self): + df = GROUPBY_DF + + # non-existent grouping label + self._run_error_test( + lambda df: df.groupby(['really_bad', 'foo', 'bad']).foo.sum(), df) + self._run_error_test(lambda df: df.groupby('bad').foo.sum(), df) + + def test_groupby_callable(self): + df = GROUPBY_DF + + self._run_test(lambda df: df.groupby(lambda x: x % 2).foo.sum(), df) + self._run_test(lambda df: df.groupby(lambda x: x % 5).median(), df) + + def test_groupby_apply(self): + df = GROUPBY_DF + + def median_sum_fn(x): + return (x.foo + x.bar).median() + + # Note this is the same as DataFrameGroupBy.describe. Using it here is + # just a convenient way to test apply() with a user fn that returns a Series + describe = lambda df: df.describe() + + self._run_test(lambda df: df.groupby('group').foo.apply(describe), df) + self._run_test( + lambda df: df.groupby('group')[['foo', 'bar']].apply(describe), df) + self._run_test(lambda df: df.groupby('group').apply(median_sum_fn), df) + self._run_test( + lambda df: df.set_index('group').foo.groupby(level=0).apply(describe), + df) + self._run_test(lambda df: df.groupby(level=0).apply(median_sum_fn), df) + self._run_test(lambda df: df.groupby(lambda x: x % 3).apply(describe), df) + self._run_test( + lambda df: df.bar.groupby(lambda x: x % 3).apply(describe), df) + self._run_test( + lambda df: df.set_index(['str', 'group', 'bool']).groupby( + level='group').apply(median_sum_fn), + df) + + def test_groupby_apply_preserves_column_order(self): + df = GROUPBY_DF + + self._run_test( + lambda df: df[['foo', 'group', 'bar']].groupby('group').apply( + lambda x: x), + df) + + def test_groupby_transform(self): + df = pd.DataFrame({ + "Date": [ + "2015-05-08", + "2015-05-07", + "2015-05-06", + "2015-05-05", + "2015-05-08", + "2015-05-07", + "2015-05-06", + "2015-05-05" + ], + "Data": [5, 8, 6, 1, 50, 100, 60, 120], + }) + + self._run_test(lambda df: df.groupby('Date')['Data'].transform(np.sum), df) + self._run_test( + lambda df: df.groupby('Date')['Data'].transform( + lambda x: (x - x.mean()) / x.std()), + df) + + def test_groupby_apply_modified_index(self): + df = GROUPBY_DF + + # If apply fn modifies the index then the output will include the grouped + # index + self._run_test( + lambda df: df.groupby('group').apply( + lambda x: x[x.foo > x.foo.median()]), + df) + + @unittest.skip('BEAM-11710') + def test_groupby_aggregate_grouped_column(self): + df = 
pd.DataFrame({ + 'group': ['a' if i % 5 == 0 or i % 3 == 0 else 'b' for i in range(100)], + 'foo': [None if i % 11 == 0 else i for i in range(100)], + 'bar': [None if i % 7 == 0 else 99 - i for i in range(100)], + 'baz': [None if i % 13 == 0 else i * 2 for i in range(100)], + }) + + self._run_test(lambda df: df.groupby('group').group.count(), df) + self._run_test(lambda df: df.groupby('group')[['group', 'bar']].count(), df) + self._run_test( + lambda df: df.groupby('group')[['group', 'bar']].apply( + lambda x: x.describe()), + df) + + @parameterized.expand((x, ) for x in [ + 0, + [1], + 3, + [0, 3], + [2, 1], + ['foo', 0], + [1, 'str'], + [3, 0, 2, 1], + ]) + def test_groupby_level_agg(self, level): + df = GROUPBY_DF.set_index(['group', 'foo', 'bar', 'str'], drop=False) + self._run_test(lambda df: df.groupby(level=level).bar.max(), df) + self._run_test( + lambda df: df.groupby(level=level).sum(numeric_only=True), df) + self._run_test( + lambda df: df.groupby(level=level).apply( + lambda x: (x.foo + x.bar).median()), + df) + + @unittest.skipIf(PD_VERSION < (1, 1), "drop_na added in pandas 1.1.0") + def test_groupby_count_na(self): + # Verify we can do a groupby.count() that doesn't drop NaN values + self._run_test( + lambda df: df.groupby('foo', dropna=True).bar.count(), GROUPBY_DF) + self._run_test( + lambda df: df.groupby('foo', dropna=False).bar.count(), GROUPBY_DF) + + def test_groupby_sum_min_count(self): + df = pd.DataFrame({ + 'good': [1, 2, 3, np.nan], + 'bad': [np.nan, np.nan, np.nan, 4], + 'group': ['a', 'b', 'a', 'b'] + }) + + self._run_test(lambda df: df.groupby('group').sum(min_count=2), df) + + def test_groupby_dtypes(self): + self._run_test( + lambda df: df.groupby('group').dtypes, GROUPBY_DF, check_proxy=False) + self._run_test( + lambda df: df.groupby(level=0).dtypes, GROUPBY_DF, check_proxy=False) + + @parameterized.expand(ALL_GROUPING_AGGREGATIONS) + def test_dataframe_groupby_series(self, agg_type): + if agg_type == 'describe' and PD_VERSION < (1, 2): + self.skipTest( + "BEAM-12366: proxy generation of DataFrameGroupBy.describe " + "fails in pandas < 1.2") + self._run_test( + lambda df: df[df.foo > 40].groupby(df.group).agg(agg_type), + GROUPBY_DF, + check_proxy=False) + self._run_test( + lambda df: df[df.foo > 40].groupby(df.foo % 3).agg(agg_type), + GROUPBY_DF, + check_proxy=False) + + @parameterized.expand(ALL_GROUPING_AGGREGATIONS) + def test_series_groupby_series(self, agg_type): + if agg_type == 'describe': + self.skipTest( + "BEAM-12366: proxy generation of SeriesGroupBy.describe " + "fails") + if agg_type in ('corr', 'cov'): + self.skipTest( + "BEAM-12367: SeriesGroupBy.{corr, cov} do not raise the " + "expected error.") + self._run_test( + lambda df: df[df.foo < 40].bar.groupby(df.group).agg(agg_type), + GROUPBY_DF) + self._run_test( + lambda df: df[df.foo < 40].bar.groupby(df.foo % 3).agg(agg_type), + GROUPBY_DF) + + def test_groupby_series_apply(self): + df = GROUPBY_DF + + def median_sum_fn(x): + return (x.foo + x.bar).median() + + # Note this is the same as DataFrameGroupBy.describe. 
Using it here is + # just a convenient way to test apply() with a user fn that returns a Series + describe = lambda df: df.describe() + + self._run_test(lambda df: df.groupby(df.group).foo.apply(describe), df) + self._run_test( + lambda df: df.groupby(df.group)[['foo', 'bar']].apply(describe), df) + self._run_test(lambda df: df.groupby(df.group).apply(median_sum_fn), df) + + def test_groupby_multiindex_keep_nans(self): + # Due to https://github.com/pandas-dev/pandas/issues/36470 + # groupby(dropna=False) doesn't work with multiple columns + with self.assertRaisesRegex(NotImplementedError, "BEAM-12495"): + self._run_test( + lambda df: df.groupby(['foo', 'bar'], dropna=False).sum(), GROUPBY_DF) + + +class AggregationTest(_AbstractFrameTest): + """Tests for global aggregation methods on DataFrame/Series.""" + + # corr, cov on Series require an other argument + @parameterized.expand( + sorted(set(frames.ALL_AGGREGATIONS) - set(['corr', 'cov']))) + def test_series_agg(self, agg_method): + s = pd.Series(list(range(16))) + + nonparallel = agg_method in ( + 'quantile', + 'mean', + 'describe', + 'median', + 'sem', + 'mad', + 'skew', + 'kurtosis', + 'kurt') + + # TODO(BEAM-12379): max and min produce the wrong proxy + check_proxy = agg_method not in ('max', 'min') + + self._run_test( + lambda s: s.agg(agg_method), + s, + nonparallel=nonparallel, + check_proxy=check_proxy) + + # corr, cov on Series require an other argument + # Series.size is a property + @parameterized.expand( + sorted(set(frames.ALL_AGGREGATIONS) - set(['corr', 'cov', 'size']))) + def test_series_agg_method(self, agg_method): + s = pd.Series(list(range(16))) + + nonparallel = agg_method in ( + 'quantile', + 'mean', + 'describe', + 'median', + 'sem', + 'mad', + 'skew', + 'kurtosis', + 'kurt') + + # TODO(BEAM-12379): max and min produce the wrong proxy + check_proxy = agg_method not in ('max', 'min') + + self._run_test( + lambda s: getattr(s, agg_method)(), + s, + nonparallel=nonparallel, + check_proxy=check_proxy) + + @parameterized.expand(frames.ALL_AGGREGATIONS) + def test_dataframe_agg(self, agg_method): + df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 3, 5, 7]}) + + nonparallel = agg_method in ( + 'quantile', + 'mean', + 'describe', + 'median', + 'sem', + 'mad', + 'skew', + 'kurtosis', + 'kurt') + + # TODO(BEAM-12379): max and min produce the wrong proxy + check_proxy = agg_method not in ('max', 'min') + + self._run_test( + lambda df: df.agg(agg_method), + df, + nonparallel=nonparallel, + check_proxy=check_proxy) + + # DataFrame.size is a property + @parameterized.expand(sorted(set(frames.ALL_AGGREGATIONS) - set(['size']))) + def test_dataframe_agg_method(self, agg_method): + df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 3, 5, 7]}) + + nonparallel = agg_method in ( + 'quantile', + 'mean', + 'describe', + 'median', + 'sem', + 'mad', + 'skew', + 'kurtosis', + 'kurt') + + # TODO(BEAM-12379): max and min produce the wrong proxy + check_proxy = agg_method not in ('max', 'min') + + self._run_test( + lambda df: getattr(df, agg_method)(), + df, + nonparallel=nonparallel, + check_proxy=check_proxy) + + def test_series_agg_modes(self): + s = pd.Series(list(range(16))) + self._run_test(lambda s: s.agg('sum'), s) + self._run_test(lambda s: s.agg(['sum']), s) + self._run_test(lambda s: s.agg(['sum', 'mean']), s, nonparallel=True) + self._run_test(lambda s: s.agg(['mean']), s, nonparallel=True) + self._run_test(lambda s: s.agg('mean'), s, nonparallel=True) + + def test_dataframe_agg_modes(self): + df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': 
[2, 3, 5, 7]}) + self._run_test(lambda df: df.agg('sum'), df) + self._run_test(lambda df: df.agg(['sum', 'mean']), df, nonparallel=True) + self._run_test(lambda df: df.agg({'A': 'sum', 'B': 'sum'}), df) + self._run_test( + lambda df: df.agg({ + 'A': 'sum', 'B': 'mean' + }), df, nonparallel=True) + self._run_test( + lambda df: df.agg({'A': ['sum', 'mean']}), df, nonparallel=True) + self._run_test( + lambda df: df.agg({ + 'A': ['sum', 'mean'], 'B': 'min' + }), + df, + nonparallel=True) + + def test_series_agg_level(self): + self._run_test( + lambda df: df.set_index(['group', 'foo']).bar.count(level=0), + GROUPBY_DF) + self._run_test( + lambda df: df.set_index(['group', 'foo']).bar.max(level=0), GROUPBY_DF) + + self._run_test( + lambda df: df.set_index(['group', 'foo']).bar.median(level=0), + GROUPBY_DF) + + self._run_test( + lambda df: df.set_index(['foo', 'group']).bar.count(level=1), + GROUPBY_DF) + self._run_test( + lambda df: df.set_index(['group', 'foo']).bar.max(level=1), GROUPBY_DF) + self._run_test( + lambda df: df.set_index(['group', 'foo']).bar.max(level='foo'), + GROUPBY_DF) + self._run_test( + lambda df: df.set_index(['group', 'foo']).bar.median(level=1), + GROUPBY_DF) + + def test_dataframe_agg_level(self): + self._run_test( + lambda df: df.set_index(['group', 'foo']).count(level=0), GROUPBY_DF) + self._run_test( + lambda df: df.set_index(['group', 'foo']).max( + level=0, numeric_only=False), + GROUPBY_DF, + check_proxy=False) + # pandas implementation doesn't respect numeric_only argument here + # (https://github.com/pandas-dev/pandas/issues/40788), it + # always acts as if numeric_only=True. Our implementation respects it so we + # need to make it explicit. + self._run_test( + lambda df: df.set_index(['group', 'foo']).sum( + level=0, numeric_only=True), + GROUPBY_DF) + + self._run_test( + lambda df: df.set_index(['group', 'foo'])[['bar']].count(level=1), + GROUPBY_DF) + self._run_test( + lambda df: df.set_index(['group', 'foo']).count(level=1), GROUPBY_DF) + self._run_test( + lambda df: df.set_index(['group', 'foo']).max( + level=1, numeric_only=False), + GROUPBY_DF, + check_proxy=False) + # sum with str columns is order-sensitive + self._run_test( + lambda df: df.set_index(['group', 'foo']).sum( + level=1, numeric_only=True), + GROUPBY_DF) + + self._run_test( + lambda df: df.set_index(['group', 'foo']).median( + level=0, numeric_only=True), + GROUPBY_DF) + self._run_test( + lambda df: df.drop('str', axis=1).set_index(['foo', 'group']).median( + level=1, numeric_only=True), + GROUPBY_DF) + + def test_series_agg_multifunc_level(self): + # level= is ignored for multiple agg fns + self._run_test( + lambda df: df.set_index(['group', 'foo']).bar.agg(['min', 'max'], + level=0), + GROUPBY_DF) + + def test_dataframe_agg_multifunc_level(self): + # level= is ignored for multiple agg fns + self._run_test( + lambda df: df.set_index(['group', 'foo']).agg(['min', 'max'], level=0), + GROUPBY_DF, + check_proxy=False) + + @parameterized.expand([(True, ), (False, )]) + @unittest.skipIf( + PD_VERSION < (1, 2), + "pandas 1.1.0 produces different dtypes for these examples") + def test_dataframe_agg_numeric_only(self, numeric_only): + # Note other aggregation functions can fail on this input with + # numeric_only={False,None}. These are the only ones that actually work for + # the string inputs.
+ self._run_test( + lambda df: df.max(numeric_only=numeric_only), + GROUPBY_DF, + check_proxy=False) + self._run_test( + lambda df: df.min(numeric_only=numeric_only), + GROUPBY_DF, + check_proxy=False) + + @unittest.skip( + "pandas implementation doesn't respect numeric_only= with " + "level= (https://github.com/pandas-dev/pandas/issues/40788)") + def test_dataframe_agg_level_numeric_only(self): + self._run_test( + lambda df: df.set_index('foo').sum(level=0, numeric_only=True), + GROUPBY_DF) + self._run_test( + lambda df: df.set_index('foo').max(level=0, numeric_only=True), + GROUPBY_DF) + self._run_test( + lambda df: df.set_index('foo').mean(level=0, numeric_only=True), + GROUPBY_DF) + self._run_test( + lambda df: df.set_index('foo').median(level=0, numeric_only=True), + GROUPBY_DF) + + def test_dataframe_agg_bool_only(self): + df = pd.DataFrame({ + 'all': [True for i in range(10)], + 'any': [i % 3 == 0 for i in range(10)], + 'int': range(10) + }) + + self._run_test(lambda df: df.all(), df) + self._run_test(lambda df: df.any(), df) + self._run_test(lambda df: df.all(bool_only=True), df) + self._run_test(lambda df: df.any(bool_only=True), df) + + @unittest.skip( + "pandas doesn't implement bool_only= with level= " + "(https://github.com/pandas-dev/pandas/blob/" + "v1.2.3/pandas/core/generic.py#L10573)") + def test_dataframe_agg_level_bool_only(self): + df = pd.DataFrame({ + 'all': [True for i in range(10)], + 'any': [i % 3 == 0 for i in range(10)], + 'int': range(10) + }) + + self._run_test(lambda df: df.set_index('int', drop=False).all(level=0), df) + self._run_test(lambda df: df.set_index('int', drop=False).any(level=0), df) + self._run_test( + lambda df: df.set_index('int', drop=False).all(level=0, bool_only=True), + df) + self._run_test( + lambda df: df.set_index('int', drop=False).any(level=0, bool_only=True), + df) + + def test_series_agg_np_size(self): + self._run_test( + lambda df: df.set_index(['group', 'foo']).agg(np.size), + GROUPBY_DF, + check_proxy=False) + + def test_df_agg_invalid_kwarg_raises(self): + self._run_error_test(lambda df: df.agg('mean', bool_only=True), GROUPBY_DF) + self._run_error_test( + lambda df: df.agg('any', numeric_only=True), GROUPBY_DF) + self._run_error_test( + lambda df: df.agg('median', min_count=3, numeric_only=True), GROUPBY_DF) + + def test_series_agg_method_invalid_kwarg_raises(self): + self._run_error_test(lambda df: df.foo.median(min_count=3), GROUPBY_DF) + self._run_error_test( + lambda df: df.foo.agg('median', min_count=3), GROUPBY_DF) + + @unittest.skipIf( + PD_VERSION < (1, 3), + ( + "DataFrame.agg raises a different exception from the " + "aggregation methods. 
Fixed in " + "https://github.com/pandas-dev/pandas/pull/40543.")) + def test_df_agg_method_invalid_kwarg_raises(self): + self._run_error_test(lambda df: df.mean(bool_only=True), GROUPBY_DF) + self._run_error_test(lambda df: df.any(numeric_only=True), GROUPBY_DF) + self._run_error_test( + lambda df: df.median(min_count=3, numeric_only=True), GROUPBY_DF) + + def test_agg_min_count(self): + df = pd.DataFrame({ + 'good': [1, 2, 3, np.nan], + 'bad': [np.nan, np.nan, np.nan, 4], + }, + index=['a', 'b', 'a', 'b']) + + self._run_test(lambda df: df.sum(level=0, min_count=2), df) + + self._run_test(lambda df: df.sum(min_count=3), df, nonparallel=True) + self._run_test(lambda df: df.sum(min_count=1), df, nonparallel=True) + self._run_test(lambda df: df.good.sum(min_count=2), df, nonparallel=True) + self._run_test(lambda df: df.bad.sum(min_count=2), df, nonparallel=True) + + def test_series_agg_std(self): + s = pd.Series(range(10)) + + self._run_test(lambda s: s.agg('std'), s) + self._run_test(lambda s: s.agg('var'), s) + self._run_test(lambda s: s.agg(['std', 'sum']), s) + self._run_test(lambda s: s.agg(['var']), s) + + def test_std_all_na(self): + s = pd.Series([np.nan] * 10) + + self._run_test(lambda s: s.agg('std'), s) + self._run_test(lambda s: s.std(), s) + + def test_std_mostly_na_with_ddof(self): + df = pd.DataFrame({ + 'one': [i if i % 8 == 0 else np.nan for i in range(8)], + 'two': [i if i % 4 == 0 else np.nan for i in range(8)], + 'three': [i if i % 2 == 0 else np.nan for i in range(8)], + }, + index=pd.MultiIndex.from_arrays( + [list(range(8)), list(reversed(range(8)))], + names=['forward', None])) + + self._run_test(lambda df: df.std(), df) # ddof=1 + self._run_test(lambda df: df.std(ddof=0), df) + self._run_test(lambda df: df.std(ddof=2), df) + self._run_test(lambda df: df.std(ddof=3), df) + self._run_test(lambda df: df.std(ddof=4), df) + + def test_dataframe_std(self): + self._run_test(lambda df: df.std(numeric_only=True), GROUPBY_DF) + self._run_test(lambda df: df.var(numeric_only=True), GROUPBY_DF) + + def test_dataframe_mode(self): + self._run_test( + lambda df: df.mode(), GROUPBY_DF, nonparallel=True, check_proxy=False) + self._run_test( + lambda df: df.mode(numeric_only=True), + GROUPBY_DF, + nonparallel=True, + check_proxy=False) + self._run_test( + lambda df: df.mode(dropna=True, numeric_only=True), + GROUPBY_DF, + nonparallel=True, + check_proxy=False) + + def test_series_mode(self): + self._run_test(lambda df: df.foo.mode(), GROUPBY_DF, nonparallel=True) + self._run_test( + lambda df: df.baz.mode(dropna=True), GROUPBY_DF, nonparallel=True) + + +class BeamSpecificTest(unittest.TestCase): + """Tests for functionality that's specific to the Beam DataFrame API. 
+ + These features don't exist in pandas so we must verify them independently.""" + def assert_frame_data_equivalent(self, actual, expected): + """Verify that actual is the same as expected, ignoring the index and order + of the data.""" + def sort_and_drop_index(df): + if isinstance(df, pd.Series): + df = df.sort_values() + elif isinstance(df, pd.DataFrame): + df = df.sort_values(by=list(df.columns)) + + return df.reset_index(drop=True) + + actual = sort_and_drop_index(actual) + expected = sort_and_drop_index(expected) + + if isinstance(expected, pd.Series): + pd.testing.assert_series_equal(actual, expected) + elif isinstance(expected, pd.DataFrame): + pd.testing.assert_frame_equal(actual, expected) + + def _evaluate(self, func, *args, distributed=True): + deferred_args = [ + frame_base.DeferredFrame.wrap( + expressions.ConstantExpression(arg, arg[0:0])) for arg in args + ] + + session_type = ( + expressions.PartitioningSession if distributed else expressions.Session) + + return session_type({}).evaluate(func(*deferred_args)._expr) + + def test_drop_duplicates_keep_any(self): + df = pd.DataFrame({ + 'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'], + 'style': ['cup', 'cup', 'cup', 'pack', 'pack'], + 'rating': [4, 4, 3.5, 15, 5] + }) + + result = self._evaluate(lambda df: df.drop_duplicates(keep='any'), df) + + # Verify that the result is the same as conventional drop_duplicates + self.assert_frame_data_equivalent(result, df.drop_duplicates()) + + def test_drop_duplicates_keep_any_subset(self): + df = pd.DataFrame({ + 'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'], + 'style': ['cup', 'cup', 'cup', 'pack', 'pack'], + 'rating': [4, 4, 3.5, 15, 5] + }) + + result = self._evaluate( + lambda df: df.drop_duplicates(keep='any', subset=['brand']), df) + + self.assertTrue(result.brand.unique) + self.assert_frame_data_equivalent( + result.brand, df.drop_duplicates(subset=['brand']).brand) + + def test_series_drop_duplicates_keep_any(self): + df = pd.DataFrame({ + 'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'], + 'style': ['cup', 'cup', 'cup', 'pack', 'pack'], + 'rating': [4, 4, 3.5, 15, 5] + }) + + result = self._evaluate(lambda df: df.brand.drop_duplicates(keep='any'), df) + + self.assert_frame_data_equivalent(result, df.brand.drop_duplicates()) + + def test_duplicated_keep_any(self): + df = pd.DataFrame({ + 'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'], + 'style': ['cup', 'cup', 'cup', 'pack', 'pack'], + 'rating': [4, 4, 3.5, 15, 5] + }) + + result = self._evaluate(lambda df: df.duplicated(keep='any'), df) + + # Verify that the result is the same as conventional duplicated + self.assert_frame_data_equivalent(result, df.duplicated()) + + def test_nsmallest_any(self): + df = pd.DataFrame({ + 'population': [ + 59000000, + 65000000, + 434000, + 434000, + 434000, + 337000, + 337000, + 11300, + 11300 + ], + 'GDP': [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311], + 'alpha-2': ["IT", "FR", "MT", "MV", "BN", "IS", "NR", "TV", "AI"] + }, + index=[ + "Italy", + "France", + "Malta", + "Maldives", + "Brunei", + "Iceland", + "Nauru", + "Tuvalu", + "Anguilla" + ]) + + result = self._evaluate( + lambda df: df.population.nsmallest(3, keep='any'), df) + + # keep='any' should produce the same result as keep='first', + # but not necessarily with the same index + self.assert_frame_data_equivalent(result, df.population.nsmallest(3)) + + def test_nlargest_any(self): + df = pd.DataFrame({ + 'population': [ + 59000000, + 65000000, + 434000, 
+ 434000, + 434000, + 337000, + 337000, + 11300, + 11300 + ], + 'GDP': [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311], + 'alpha-2': ["IT", "FR", "MT", "MV", "BN", "IS", "NR", "TV", "AI"] + }, + index=[ + "Italy", + "France", + "Malta", + "Maldives", + "Brunei", + "Iceland", + "Nauru", + "Tuvalu", + "Anguilla" + ]) + + result = self._evaluate( + lambda df: df.population.nlargest(3, keep='any'), df) + + # keep='any' should produce the same result as keep='first', + # but not necessarily with the same index + self.assert_frame_data_equivalent(result, df.population.nlargest(3)) + + def test_sample(self): + df = pd.DataFrame({ + 'population': [ + 59000000, + 65000000, + 434000, + 434000, + 434000, + 337000, + 337000, + 11300, + 11300 + ], + 'GDP': [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311], + 'alpha-2': ["IT", "FR", "MT", "MV", "BN", "IS", "NR", "TV", "AI"] + }, + index=[ + "Italy", + "France", + "Malta", + "Maldives", + "Brunei", + "Iceland", + "Nauru", + "Tuvalu", + "Anguilla" + ]) + + result = self._evaluate(lambda df: df.sample(n=3), df) + + self.assertEqual(len(result), 3) + + series_result = self._evaluate(lambda df: df.GDP.sample(n=3), df) + self.assertEqual(len(series_result), 3) + self.assertEqual(series_result.name, "GDP") + + def test_sample_with_weights(self): + df = pd.DataFrame({ + 'population': [ + 59000000, + 65000000, + 434000, + 434000, + 434000, + 337000, + 337000, + 11300, + 11300 + ], + 'GDP': [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311], + 'alpha-2': ["IT", "FR", "MT", "MV", "BN", "IS", "NR", "TV", "AI"] + }, + index=[ + "Italy", + "France", + "Malta", + "Maldives", + "Brunei", + "Iceland", + "Nauru", + "Tuvalu", + "Anguilla" + ]) + + weights = pd.Series([0, 0, 0, 0, 0, 0, 0, 1, 1], index=df.index) + + result = self._evaluate( + lambda df, weights: df.sample(n=2, weights=weights), df, weights) + + self.assertEqual(len(result), 2) + self.assertEqual(set(result.index), set(["Tuvalu", "Anguilla"])) + + series_result = self._evaluate( + lambda df, weights: df.GDP.sample(n=2, weights=weights), df, weights) + self.assertEqual(len(series_result), 2) + self.assertEqual(series_result.name, "GDP") + self.assertEqual(set(series_result.index), set(["Tuvalu", "Anguilla"])) + + def test_sample_with_missing_weights(self): + df = pd.DataFrame({ + 'population': [ + 59000000, + 65000000, + 434000, + 434000, + 434000, + 337000, + 337000, + 11300, + 11300 + ], + 'GDP': [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311], + 'alpha-2': ["IT", "FR", "MT", "MV", "BN", "IS", "NR", "TV", "AI"] + }, + index=[ + "Italy", + "France", + "Malta", + "Maldives", + "Brunei", + "Iceland", + "Nauru", + "Tuvalu", + "Anguilla" + ]) + + # Missing weights are treated as 0 + weights = pd.Series([.1, .01, np.nan, 0], + index=["Nauru", "Iceland", "Anguilla", "Italy"]) + + result = self._evaluate( + lambda df, weights: df.sample(n=2, weights=weights), df, weights) + + self.assertEqual(len(result), 2) + self.assertEqual(set(result.index), set(["Nauru", "Iceland"])) + + series_result = self._evaluate( + lambda df, weights: df.GDP.sample(n=2, weights=weights), df, weights) + + self.assertEqual(len(series_result), 2) + self.assertEqual(series_result.name, "GDP") + self.assertEqual(set(series_result.index), set(["Nauru", "Iceland"])) + + def test_sample_with_weights_distribution(self): + target_prob = 0.25 + num_samples = 100 + num_targets = 200 + num_other_elements = 10000 + + target_weight = target_prob / num_targets + other_weight = (1 - target_prob) / 
num_other_elements + self.assertTrue(target_weight > other_weight * 10, "weights too close") + + result = self._evaluate( + lambda s, + weights: s.sample(n=num_samples, weights=weights).sum(), + # The first elements are 1, the rest are all 0. This means that when + # we sum all the sampled elements (above), the result should be the + # number of times the first elements (aka targets) were sampled. + pd.Series([1] * num_targets + [0] * num_other_elements), + pd.Series([target_weight] * num_targets + + [other_weight] * num_other_elements)) + + # With the above constants, the probability of violating this invariant + # (as computed using the Bernoulli distribution) is about 0.0012%. + expected = num_samples * target_prob + self.assertTrue(expected / 3 < result < expected * 2, (expected, result)) + + +class AllowNonParallelTest(unittest.TestCase): + def _use_non_parallel_operation(self): + _ = frame_base.DeferredFrame.wrap( + expressions.PlaceholderExpression(pd.Series([1, 2, 3]))).replace( + 'a', 'b', limit=1) + + def test_disallow_non_parallel(self): + with self.assertRaises(expressions.NonParallelOperation): + self._use_non_parallel_operation() + + def test_allow_non_parallel_in_context(self): + with beam.dataframe.allow_non_parallel_operations(): + self._use_non_parallel_operation() + + def test_allow_non_parallel_nesting(self): + # disallowed + with beam.dataframe.allow_non_parallel_operations(): + # allowed + self._use_non_parallel_operation() + with beam.dataframe.allow_non_parallel_operations(False): + # disallowed again + with self.assertRaises(expressions.NonParallelOperation): + self._use_non_parallel_operation() + # allowed + self._use_non_parallel_operation() + # disallowed + with self.assertRaises(expressions.NonParallelOperation): + self._use_non_parallel_operation() + + +class ConstructionTimeTest(unittest.TestCase): + """Tests for operations that can be executed eagerly.""" + DF = pd.DataFrame({ + 'str_col': ['foo', 'bar'] * 3, + 'int_col': [1, 2] * 3, + 'flt_col': [1.1, 2.2] * 3, + 'cat_col': pd.Series(list('aabbca'), dtype="category"), + 'datetime_col': pd.Series( + pd.date_range( + '1/1/2000', periods=6, freq='m', tz='America/Los_Angeles')) + }) + DEFERRED_DF = frame_base.DeferredFrame.wrap( + expressions.PlaceholderExpression(DF.iloc[:0])) + + def _run_test(self, fn): + expected = fn(self.DF) + actual = fn(self.DEFERRED_DF) + + if isinstance(expected, pd.Index): + pd.testing.assert_index_equal(expected, actual) + elif isinstance(expected, pd.Series): + pd.testing.assert_series_equal(expected, actual) + elif isinstance(expected, pd.DataFrame): + pd.testing.assert_frame_equal(expected, actual) + else: + self.assertEqual(expected, actual) + + @parameterized.expand(DF.columns) + def test_series_name(self, col_name): + self._run_test(lambda df: df[col_name].name) + + @parameterized.expand(DF.columns) + def test_series_dtype(self, col_name): + self._run_test(lambda df: df[col_name].dtype) + self._run_test(lambda df: df[col_name].dtypes) + + def test_dataframe_columns(self): + self._run_test(lambda df: list(df.columns)) + + def test_dataframe_dtypes(self): + self._run_test(lambda df: list(df.dtypes)) + + def test_categories(self): + self._run_test(lambda df: df.cat_col.cat.categories) + + def test_categorical_ordered(self): + self._run_test(lambda df: df.cat_col.cat.ordered) + + def test_groupby_ndim(self): + self._run_test(lambda df: df.groupby('int_col').ndim) + + def test_groupby_project_ndim(self): + self._run_test(lambda df: df.groupby('int_col').flt_col.ndim) + 
self._run_test( + lambda df: df.groupby('int_col')[['flt_col', 'str_col']].ndim) + + def test_get_column_default_None(self): + # .get just returns default_value=None at construction time if the column + # doesn't exist + self._run_test(lambda df: df.get('FOO')) + + def test_datetime_tz(self): + self._run_test(lambda df: df.datetime_col.dt.tz) + + +class DocstringTest(unittest.TestCase): + @parameterized.expand([ + (frames.DeferredDataFrame, pd.DataFrame), + (frames.DeferredSeries, pd.Series), + #(frames._DeferredIndex, pd.Index), + (frames._DeferredStringMethods, pd.core.strings.StringMethods), + ( + frames._DeferredCategoricalMethods, + pd.core.arrays.categorical.CategoricalAccessor), + (frames.DeferredGroupBy, pd.core.groupby.generic.DataFrameGroupBy), + (frames._DeferredGroupByCols, pd.core.groupby.generic.DataFrameGroupBy), + ( + frames._DeferredDatetimeMethods, + pd.core.indexes.accessors.DatetimeProperties), + ]) + def test_docs_defined(self, beam_type, pd_type): + beam_attrs = set(dir(beam_type)) + pd_attrs = set(dir(pd_type)) + + docstring_required = sorted([ + attr for attr in beam_attrs.intersection(pd_attrs) + if getattr(pd_type, attr).__doc__ and not attr.startswith('_') + ]) + + docstring_missing = [ + attr for attr in docstring_required + if not getattr(beam_type, attr).__doc__ + ] + + self.assertTrue( + len(docstring_missing) == 0, + f'{beam_type.__name__} is missing a docstring for ' + f'{len(docstring_missing)}/{len(docstring_required)} ' + f'({len(docstring_missing)/len(docstring_required):%}) ' + f'operations:\n{docstring_missing}') + + +class ReprTest(unittest.TestCase): + def test_basic_dataframe(self): + df = frame_base.DeferredFrame.wrap( + expressions.ConstantExpression(GROUPBY_DF)) + self.assertEqual( + repr(df), + ( + "DeferredDataFrame(columns=['group', 'foo', 'bar', 'baz', 'bool', " + "'str'], index=)")) + + def test_dataframe_with_named_index(self): + df = frame_base.DeferredFrame.wrap( + expressions.ConstantExpression(GROUPBY_DF.set_index('group'))) + self.assertEqual( + repr(df), + ( + "DeferredDataFrame(columns=['foo', 'bar', 'baz', 'bool', 'str'], " + "index='group')")) + + def test_dataframe_with_partial_named_index(self): + df = frame_base.DeferredFrame.wrap( + expressions.ConstantExpression( + GROUPBY_DF.set_index([GROUPBY_DF.index, 'group']))) + self.assertEqual( + repr(df), + ( + "DeferredDataFrame(columns=['foo', 'bar', 'baz', 'bool', 'str'], " + "indexes=[, 'group'])")) + + def test_dataframe_with_named_multi_index(self): + df = frame_base.DeferredFrame.wrap( + expressions.ConstantExpression(GROUPBY_DF.set_index(['str', 'group']))) + self.assertEqual( + repr(df), + ( + "DeferredDataFrame(columns=['foo', 'bar', 'baz', 'bool'], " + "indexes=['str', 'group'])")) + + def test_dataframe_with_multiple_column_levels(self): + df = pd.DataFrame({ + 'foofoofoo': ['one', 'one', 'one', 'two', 'two', 'two'], + 'barbar': ['A', 'B', 'C', 'A', 'B', 'C'], + 'bazzy': [1, 2, 3, 4, 5, 6], + 'zoop': ['x', 'y', 'z', 'q', 'w', 't'] + }) + + df = df.pivot(index='foofoofoo', columns='barbar') + df = frame_base.DeferredFrame.wrap(expressions.ConstantExpression(df)) + self.assertEqual( + repr(df), + ( + "DeferredDataFrame(columns=[('bazzy', 'A'), ('bazzy', 'B'), " + "('bazzy', 'C'), ('zoop', 'A'), ('zoop', 'B'), ('zoop', 'C')], " + "index='foofoofoo')")) + + def test_dataframe_with_multiple_column_and_multiple_index_levels(self): + df = pd.DataFrame({ + 'foofoofoo': ['one', 'one', 'one', 'two', 'two', 'two'], + 'barbar': ['A', 'B', 'C', 'A', 'B', 'C'], + 'bazzy': [1, 2, 3, 4, 
5, 6], + 'zoop': ['x', 'y', 'z', 'q', 'w', 't'] + }) + + df = df.pivot(index='foofoofoo', columns='barbar') + df.index = [['a', 'b'], df.index] + + # pandas repr displays this: + # bazzy zoop + # barbar A B C A B C + # foofoofoo + # a one 1 2 3 x y z + # b two 4 5 6 q w t + df = frame_base.DeferredFrame.wrap(expressions.ConstantExpression(df)) + self.assertEqual( + repr(df), + ( + "DeferredDataFrame(columns=[('bazzy', 'A'), ('bazzy', 'B'), " + "('bazzy', 'C'), ('zoop', 'A'), ('zoop', 'B'), ('zoop', 'C')], " + "indexes=[, 'foofoofoo'])")) + + def test_basic_series(self): + df = frame_base.DeferredFrame.wrap( + expressions.ConstantExpression(GROUPBY_DF['bool'])) + self.assertEqual( + repr(df), "DeferredSeries(name='bool', dtype=bool, index=)") + + def test_series_with_named_index(self): + df = frame_base.DeferredFrame.wrap( + expressions.ConstantExpression(GROUPBY_DF.set_index('group')['str'])) + self.assertEqual( + repr(df), "DeferredSeries(name='str', dtype=object, index='group')") + + def test_series_with_partial_named_index(self): + df = frame_base.DeferredFrame.wrap( + expressions.ConstantExpression( + GROUPBY_DF.set_index([GROUPBY_DF.index, 'group'])['bar'])) + self.assertEqual( + repr(df), + ( + "DeferredSeries(name='bar', dtype=float64, " + "indexes=[, 'group'])")) + + def test_series_with_named_multi_index(self): + df = frame_base.DeferredFrame.wrap( + expressions.ConstantExpression( + GROUPBY_DF.set_index(['str', 'group'])['baz'])) + self.assertEqual( + repr(df), + "DeferredSeries(name='baz', dtype=float64, indexes=['str', 'group'])") if __name__ == '__main__': diff --git a/sdks/python/apache_beam/dataframe/io.py b/sdks/python/apache_beam/dataframe/io.py index 2f6404d1e70d..5046acc1018e 100644 --- a/sdks/python/apache_beam/dataframe/io.py +++ b/sdks/python/apache_beam/dataframe/io.py @@ -14,8 +14,34 @@ # See the License for the specific language governing permissions and # limitations under the License. -from __future__ import absolute_import - +"""Sources and sinks for the Beam DataFrame API. + +Sources +####### +This module provides analogs for pandas ``read`` methods, like +:func:`pandas.read_csv`. However Beam sources like :func:`read_csv` +create a Beam :class:`~apache_beam.PTransform`, and return a +:class:`~apache_beam.dataframe.frames.DeferredDataFrame` or +:class:`~apache_beam.dataframe.frames.DeferredSeries` representing the contents +of the referenced file(s) or data source. + +The result of these methods must be applied to a :class:`~apache_beam.Pipeline` +object, for example:: + + df = p | beam.dataframe.io.read_csv(...) + +Sinks +##### +This module also defines analogs for pandas sink, or ``to``, methods that +generate a Beam :class:`~apache_beam.PTransform`. Users should prefer calling +these operations from :class:`~apache_beam.dataframe.frames.DeferredDataFrame` +instances (for example with +:meth:`DeferredDataFrame.to_csv +`). +""" + +import itertools +import re from io import BytesIO from io import StringIO from io import TextIOWrapper @@ -31,16 +57,21 @@ _DEFAULT_BYTES_CHUNKSIZE = 1 << 20 -def read_csv(path, *args, **kwargs): - """Emulates `pd.read_csv` from Pandas, but as a Beam PTransform. - - Use this as - - df = p | beam.dataframe.io.read_csv(...) - - to get a deferred Beam dataframe representing the contents of the file. 
- """ - return _ReadFromPandas(pd.read_csv, path, args, kwargs, incremental=True) +@frame_base.with_docs_from(pd) +def read_csv(path, *args, splittable=False, **kwargs): + """If your files are large and records do not contain quoted newlines, you may + pass the extra argument ``splittable=True`` to enable dynamic splitting for + this read on newlines. Using this option for records that do contain quoted + newlines may result in partial records and data corruption.""" + if 'nrows' in kwargs: + raise ValueError('nrows not yet supported') + return _ReadFromPandas( + pd.read_csv, + path, + args, + kwargs, + incremental=True, + splitter=_CsvSplitter(args, kwargs) if splittable else None) def _as_pc(df): @@ -49,26 +80,37 @@ def _as_pc(df): return convert.to_pcollection(df, yield_elements='pandas') +@frame_base.with_docs_from(pd.DataFrame) def to_csv(df, path, *args, **kwargs): + return _as_pc(df) | _WriteToPandas( 'to_csv', path, args, kwargs, incremental=True, binary=False) +@frame_base.with_docs_from(pd) def read_fwf(path, *args, **kwargs): return _ReadFromPandas(pd.read_fwf, path, args, kwargs, incremental=True) +@frame_base.with_docs_from(pd) def read_json(path, *args, **kwargs): + if 'nrows' in kwargs: + raise NotImplementedError('nrows not yet supported') + elif kwargs.get('lines', False): + # Work around https://github.com/pandas-dev/pandas/issues/34548. + kwargs = dict(kwargs, nrows=1 << 63) return _ReadFromPandas( pd.read_json, path, args, kwargs, incremental=kwargs.get('lines', False), - splittable=kwargs.get('lines', False), + splitter=_DelimSplitter(b'\n', _DEFAULT_BYTES_CHUNKSIZE) if kwargs.get( + 'lines', False) else None, binary=False) +@frame_base.with_docs_from(pd.DataFrame) def to_json(df, path, orient=None, *args, **kwargs): if orient is None: if isinstance(df._expr.proxy(), pd.DataFrame): @@ -87,6 +129,7 @@ def to_json(df, path, orient=None, *args, **kwargs): binary=False) +@frame_base.with_docs_from(pd) def read_html(path, *args, **kwargs): return _ReadFromPandas( lambda *args, @@ -96,6 +139,7 @@ def read_html(path, *args, **kwargs): kwargs) +@frame_base.with_docs_from(pd.DataFrame) def to_html(df, path, *args, **kwargs): return _as_pc(df) | _WriteToPandas( 'to_html', @@ -110,32 +154,52 @@ def to_html(df, path, *args, **kwargs): def _binary_reader(format): func = getattr(pd, 'read_%s' % format) - return lambda path, *args, **kwargs: _ReadFromPandas(func, path, args, kwargs) + result = lambda path, *args, **kwargs: _ReadFromPandas(func, path, args, + kwargs) + result.__name__ = f'read_{format}' + + return result def _binary_writer(format): - return ( + result = ( lambda df, path, *args, **kwargs: _as_pc(df) | _WriteToPandas(f'to_{format}', path, args, kwargs)) + result.__name__ = f'to_{format}' + return result for format in ('excel', 'feather', 'parquet', 'stata'): - globals()['read_%s' % format] = _binary_reader(format) - globals()['to_%s' % format] = _binary_writer(format) + globals()['read_%s' % format] = frame_base.with_docs_from(pd)( + _binary_reader(format)) + globals()['to_%s' % format] = frame_base.with_docs_from(pd.DataFrame)( + _binary_writer(format)) for format in ('sas', 'spss'): if hasattr(pd, 'read_%s' % format): # Depends on pandas version. 
- globals()['read_%s' % format] = _binary_reader(format) - -read_clipboard = to_clipboard = frame_base.wont_implement_method('clipboard') -read_msgpack = to_msgpack = frame_base.wont_implement_method('deprecated') -read_hdf = to_hdf = frame_base.wont_implement_method('random access files') + globals()['read_%s' % format] = frame_base.with_docs_from(pd)( + _binary_reader(format)) + +read_clipboard = frame_base.not_implemented_method( + 'read_clipboard', base_type=pd) +to_clipboard = frame_base.not_implemented_method( + 'to_clipboard', base_type=pd.DataFrame) +read_msgpack = frame_base.wont_implement_method( + pd, 'read_msgpack', reason="deprecated") +to_msgpack = frame_base.wont_implement_method( + pd.DataFrame, 'to_msgpack', reason="deprecated") +read_hdf = frame_base.wont_implement_method( + pd, 'read_hdf', explanation="because HDF5 is a random access file format") +to_hdf = frame_base.wont_implement_method( + pd.DataFrame, + 'to_hdf', + explanation="because HDF5 is a random access file format") for name in dir(pd): if name.startswith('read_') and name not in globals(): - globals()[name] = frame_base.not_implemented_method(name) + globals()[name] = frame_base.not_implemented_method(name, base_type=pd) def _prefix_range_index_with(prefix, df): @@ -152,9 +216,9 @@ def __init__( path, args, kwargs, + binary=True, incremental=False, - splittable=False, - binary=True): + splitter=False): if 'compression' in kwargs: raise NotImplementedError('compression') if not isinstance(path, str): @@ -163,17 +227,19 @@ def __init__( self.path = path self.args = args self.kwargs = kwargs - self.incremental = incremental - self.splittable = splittable self.binary = binary + self.incremental = incremental + self.splitter = splitter def expand(self, root): - # TODO(robertwb): Handle streaming (with explicit schema). paths_pcoll = root | beam.Create([self.path]) - first = io.filesystems.FileSystems.match([self.path], - limits=[1 - ])[0].metadata_list[0].path - with io.filesystems.FileSystems.open(first) as handle: + match = io.filesystems.FileSystems.match([self.path], limits=[1])[0] + if not match.metadata_list: + # TODO(BEAM-12031): This should be allowed for streaming pipelines if + # user provides an explicit schema. + raise FileNotFoundError(f"Found no files that match {self.path!r}") + first_path = match.metadata_list[0].path + with io.filesystems.FileSystems.open(first_path) as handle: if not self.binary: handle = TextIOWrapper(handle) if self.incremental: @@ -192,14 +258,155 @@ def expand(self, root): self.reader, self.args, self.kwargs, + self.binary, self.incremental, - self.splittable, - self.binary))) + self.splitter))) from apache_beam.dataframe import convert return convert.to_dataframe( pcoll, proxy=_prefix_range_index_with(':', sample[:0])) +class _Splitter: + def empty_buffer(self): + """Returns an empty buffer of the right type (string or bytes). + """ + raise NotImplementedError(self) + + def read_header(self, handle): + """Reads the header from handle, which points to the start of the file. + + Returns the pair (header, buffer) where buffer contains any part of the + file that was "overread" from handle while seeking the end of header. + """ + raise NotImplementedError(self) + + def read_to_record_boundary(self, buffered, handle): + """Reads the given handle up to the end of the current record. + + The buffer argument represents bytes that were read previously; logically + it's as if these were pushed back into handle for reading. 
If the
+    record end is within buffered, it's possible that no more bytes will be read
+    from handle at all.
+
+    Returns the pair (remaining_record_bytes, buffer) where buffer contains
+    any part of the file that was "overread" from handle while seeking the end
+    of the record.
+    """
+    raise NotImplementedError(self)
+
+
+class _DelimSplitter(_Splitter):
+  """A _Splitter that splits on delimiters between records.
+
+  This delimiter is assumed to never occur within a record.
+  """
+  def __init__(self, delim, read_chunk_size=_DEFAULT_BYTES_CHUNKSIZE):
+    # Multi-char delimiters would require more care across chunk boundaries.
+    assert len(delim) == 1
+    self._delim = delim
+    self._empty = delim[:0]
+    self._read_chunk_size = read_chunk_size
+
+  def empty_buffer(self):
+    return self._empty
+
+  def read_header(self, handle):
+    return self._empty, self._empty
+
+  def read_to_record_boundary(self, buffered, handle):
+    if self._delim in buffered:
+      ix = buffered.index(self._delim) + len(self._delim)
+      return buffered[:ix], buffered[ix:]
+    else:
+      while True:
+        chunk = handle.read(self._read_chunk_size)
+        if self._delim in chunk:
+          ix = chunk.index(self._delim) + len(self._delim)
+          return buffered + chunk[:ix], chunk[ix:]
+        elif not chunk:
+          return buffered, self._empty
+        else:
+          buffered += chunk
+
+
+def _maybe_encode(str_or_bytes):
+  if isinstance(str_or_bytes, str):
+    return str_or_bytes.encode('utf-8')
+  else:
+    return str_or_bytes
+
+
+class _CsvSplitter(_DelimSplitter):
+  """Splitter for dynamically sharding CSV files at newline record boundaries.
+
+  Currently does not handle quoted newlines, so is off by default, but such
+  support could be added in the future.
+  """
+  def __init__(self, args, kwargs, read_chunk_size=_DEFAULT_BYTES_CHUNKSIZE):
+    if args:
+      # TODO(robertwb): Automatically populate kwargs as we do for df methods.
+ raise ValueError( + 'Non-path arguments must be passed by keyword ' + 'for splittable csv reads.') + if kwargs.get('skipfooter', 0): + raise ValueError('Splittablility incompatible with skipping footers.') + super(_CsvSplitter, self).__init__( + _maybe_encode(kwargs.get('lineterminator', b'\n')), + _DEFAULT_BYTES_CHUNKSIZE) + self._kwargs = kwargs + + def read_header(self, handle): + if self._kwargs.get('header', 'infer') == 'infer': + if 'names' in self._kwargs: + header = None + else: + header = 0 + else: + header = self._kwargs['header'] + + if header is None: + return self._empty, self._empty + + if isinstance(header, int): + max_header = header + else: + max_header = max(header) + + skiprows = self._kwargs.get('skiprows', 0) + if isinstance(skiprows, int): + is_skiprow = lambda ix: ix < skiprows + elif callable(skiprows): + is_skiprow = skiprows + elif skiprows is None: + is_skiprow = lambda ix: False + else: + is_skiprow = lambda ix: ix in skiprows + + comment = _maybe_encode(self._kwargs.get('comment', None)) + if comment: + is_comment = lambda line: line.startswith(comment) + else: + is_comment = lambda line: False + + skip_blank_lines = self._kwargs.get('skip_blank_lines', True) + if skip_blank_lines: + is_blank = lambda line: re.match(rb'^\s*$', line) + else: + is_blank = lambda line: False + + text_header = b'' + rest = b'' + skipped = 0 + for ix in itertools.count(): + line, rest = self.read_to_record_boundary(rest, handle) + text_header += line + if is_skiprow(ix) or is_blank(line) or is_comment(line): + skipped += 1 + continue + if ix - skipped == max_header: + return text_header, rest + + class _TruncatingFileHandle(object): """A wrapper of a file-like object representing the restriction of the underling handle according to the given SDF restriction tracker, breaking @@ -212,29 +419,28 @@ class _TruncatingFileHandle(object): As with all SDF trackers, the endpoint may change dynamically during reading. """ - def __init__( - self, - underlying, - tracker, - delim=b'\n', - chunk_size=_DEFAULT_BYTES_CHUNKSIZE): + def __init__(self, underlying, tracker, splitter): self._underlying = underlying self._tracker = tracker - self._buffer_start_pos = self._tracker.current_restriction().start - self._delim = delim - self._chunk_size = chunk_size + self._splitter = splitter - self._buffer = self._empty = self._delim[:0] + self._empty = self._splitter.empty_buffer() self._done = False - if self._buffer_start_pos > 0: - # Seek to first delimiter after the start position. - self._underlying.seek(self._buffer_start_pos) - if self.buffer_to_delim(): - line_start = self._buffer.index(self._delim) + len(self._delim) - self._buffer_start_pos += line_start - self._buffer = self._buffer[line_start:] + self._header, self._buffer = self._splitter.read_header(self._underlying) + self._buffer_start_pos = len(self._header) + start = self._tracker.current_restriction().start + # Seek to first delimiter after the start position. + if start > len(self._header): + if start > len(self._header) + len(self._buffer): + self._buffer_start_pos = start + self._buffer = self._empty + self._underlying.seek(start) else: - self._done = True + self._buffer_start_pos = start + self._buffer = self._buffer[start - len(self._header):] + skip, self._buffer = self._splitter.read_to_record_boundary( + self._buffer, self._underlying) + self._buffer_start_pos += len(skip) def readable(self): return True @@ -253,54 +459,49 @@ def __iter__(self): # For pandas is_file_like. 
raise NotImplementedError() - def buffer_to_delim(self, offset=0): - """Read enough of the file such that the buffer contains the delimiter, or - end-of-file is reached. - """ - if self._delim in self._buffer[offset:]: - return True - while True: - chunk = self._underlying.read(self._chunk_size) - self._buffer += chunk - if self._delim in chunk: - return True - elif not chunk: - return False - def read(self, size=-1): - if self._done: + if self._header: + res = self._header + self._header = None + return res + elif self._done: return self._empty elif size == -1: self._buffer += self._underlying.read() elif not self._buffer: self._buffer = self._underlying.read(size) + if not self._buffer: + self._done = True + return self._empty + if self._tracker.try_claim(self._buffer_start_pos + len(self._buffer)): res = self._buffer self._buffer = self._empty self._buffer_start_pos += len(res) else: offset = self._tracker.current_restriction().stop - self._buffer_start_pos - if self.buffer_to_delim(offset): - end_of_line = self._buffer.index(self._delim, offset) - res = self._buffer[:end_of_line + len(self._delim)] + if offset <= 0: + res = self._empty else: - res = self._buffer + rest, _ = self._splitter.read_to_record_boundary( + self._buffer[offset:], self._underlying) + res = self._buffer[:offset] + rest self._done = True return res class _ReadFromPandasDoFn(beam.DoFn, beam.RestrictionProvider): - def __init__(self, reader, args, kwargs, incremental, splittable, binary): + def __init__(self, reader, args, kwargs, binary, incremental, splitter): # avoid pickling issues if reader.__module__.startswith('pandas.'): reader = reader.__name__ self.reader = reader self.args = args self.kwargs = kwargs - self.incremental = incremental - self.splittable = splittable self.binary = binary + self.incremental = incremental + self.splitter = splitter def initial_restriction(self, readable_file): return beam.io.restriction_trackers.OffsetRange( @@ -311,7 +512,7 @@ def restriction_size(self, readable_file, restriction): def create_tracker(self, restriction): tracker = beam.io.restriction_trackers.OffsetRestrictionTracker(restriction) - if self.splittable: + if self.splitter: return tracker else: return beam.io.restriction_trackers.UnsplittableRestrictionTracker( @@ -323,13 +524,16 @@ def process(self, readable_file, tracker=beam.DoFn.RestrictionParam()): reader = getattr(pd, self.reader) with readable_file.open() as handle: if self.incremental: - # We can get progress even if we can't split. # TODO(robertwb): We could consider trying to get progress for # non-incremental sources that are read linearly, as long as they # don't try to seek. This could be deceptive as progress would # advance to 100% the instant the (large) read was done, discounting # any downstream processing. 
- handle = _TruncatingFileHandle(handle, tracker) + handle = _TruncatingFileHandle( + handle, + tracker, + splitter=self.splitter or + _DelimSplitter(b'\n', _DEFAULT_BYTES_CHUNKSIZE)) if not self.binary: handle = TextIOWrapper(handle) if self.incremental: @@ -362,7 +566,7 @@ def expand(self, pcoll): return pcoll | fileio.WriteToFiles( path=dir, file_naming=fileio.default_file_naming(name), - sink=_WriteToPandasFileSink( + sink=lambda _: _WriteToPandasFileSink( self.writer, self.args, self.kwargs, self.incremental, self.binary)) diff --git a/sdks/python/apache_beam/dataframe/io_test.py b/sdks/python/apache_beam/dataframe/io_test.py index de67effadaa5..374eb0c03f7d 100644 --- a/sdks/python/apache_beam/dataframe/io_test.py +++ b/sdks/python/apache_beam/dataframe/io_test.py @@ -14,21 +14,22 @@ # See the License for the specific language governing permissions and # limitations under the License. -from __future__ import absolute_import -from __future__ import print_function - import glob import importlib import math import os import platform import shutil -import sys import tempfile +import typing import unittest +from datetime import datetime +from io import BytesIO from io import StringIO import pandas as pd +import pandas.testing +import pytest from pandas.testing import assert_frame_equal from parameterized import parameterized @@ -39,6 +40,11 @@ from apache_beam.testing.util import assert_that +class MyRow(typing.NamedTuple): + timestamp: int + value: int + + @unittest.skipIf(platform.system() == 'Windows', 'BEAM-10929') class IOTest(unittest.TestCase): def setUp(self): @@ -57,14 +63,15 @@ def temp_dir(self, files=None): fout.write(contents) return dir + os.path.sep - def read_all_lines(self, pattern): + def read_all_lines(self, pattern, delete=False): for path in glob.glob(pattern): with open(path) as fin: # TODO(Py3): yield from for line in fin: yield line.rstrip('\n') + if delete: + os.remove(path) - @unittest.skipIf(sys.version_info[0] < 3, 'unicode issues') def test_read_write_csv(self): input = self.temp_dir({'1.csv': 'a,b\n1,2\n', '2.csv': 'a,b\n3,4\n'}) output = self.temp_dir() @@ -75,8 +82,14 @@ def test_read_write_csv(self): self.assertCountEqual(['a,b,c', '1,2,3', '3,4,7'], set(self.read_all_lines(output + 'out.csv*'))) + @pytest.mark.uses_pyarrow + def test_read_write_parquet(self): + self._run_read_write_test( + 'parquet', {}, {}, dict(check_index=False), ['pyarrow']) + @parameterized.expand([ ('csv', dict(index_col=0)), + ('csv', dict(index_col=0, splittable=True)), ('json', dict(orient='index'), dict(orient='index')), ('json', dict(orient='columns'), dict(orient='columns')), ('json', dict(orient='split'), dict(orient='split')), @@ -97,7 +110,6 @@ def test_read_write_csv(self): dict(check_index=False)), ('html', dict(index_col=0), {}, {}, ['lxml']), ('excel', dict(index_col=0), {}, {}, ['openpyxl', 'xlrd']), - ('parquet', {}, {}, dict(check_index=False), ['pyarrow']), ]) # pylint: disable=dangerous-default-value def test_read_write( @@ -107,6 +119,18 @@ def test_read_write( write_kwargs={}, check_options={}, requires=()): + self._run_read_write_test( + format, read_kwargs, write_kwargs, check_options, requires) + + # pylint: disable=dangerous-default-value + def _run_read_write_test( + self, + format, + read_kwargs={}, + write_kwargs={}, + check_options={}, + requires=()): + for module in requires: try: importlib.import_module(module) @@ -169,7 +193,7 @@ def _run_truncating_file_handle_test( for split in list(splits) + [None]: tracker = 
restriction_trackers.OffsetRestrictionTracker(next_range) handle = io._TruncatingFileHandle( - StringIO(s), tracker, delim=delim, chunk_size=chunk_size) + StringIO(s), tracker, splitter=io._DelimSplitter(delim, chunk_size)) data = '' chunk = handle.read(1) if split is not None: @@ -201,6 +225,124 @@ def test_truncating_filehandle(self): self.assertGreater( min(len(s) for s in splits), len(numbers) * 0.9**20 * 0.1) + @parameterized.expand([ + ('defaults', dict()), + ('header', dict(header=1)), + ('multi_header', dict(header=[0, 1])), + ('multi_header', dict(header=[0, 1, 4])), + ('names', dict(names=('m', 'n', 'o'))), + ('names_and_header', dict(names=('m', 'n', 'o'), header=0)), + ('skip_blank_lines', dict(header=4, skip_blank_lines=True)), + ('skip_blank_lines', dict(header=4, skip_blank_lines=False)), + ('comment', dict(comment='X', header=4)), + ('comment', dict(comment='X', header=[0, 3])), + ('skiprows', dict(skiprows=0, header=[0, 1])), + ('skiprows', dict(skiprows=[1], header=[0, 3], skip_blank_lines=False)), + ('skiprows', dict(skiprows=[0, 1], header=[0, 1], comment='X')), + ]) + def test_csv_splitter(self, name, kwargs): + def assert_frame_equal(expected, actual): + try: + pandas.testing.assert_frame_equal(expected, actual) + except AssertionError: + print("Expected:\n", expected) + print("Actual:\n", actual) + raise + + def read_truncated_csv(start, stop): + return pd.read_csv( + io._TruncatingFileHandle( + BytesIO(contents.encode('ascii')), + restriction_trackers.OffsetRestrictionTracker( + restriction_trackers.OffsetRange(start, stop)), + splitter=io._CsvSplitter((), kwargs, read_chunk_size=7)), + index_col=0, + **kwargs) + + contents = ''' + a0, a1, a2 + b0, b1, b2 + +X , c1, c2 + e0, e1, e2 + f0, f1, f2 + w, daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaata, w + x, daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaata, x + y, daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaata, y + z, daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaata, z + '''.strip() + expected = pd.read_csv(StringIO(contents), index_col=0, **kwargs) + + one_shard = read_truncated_csv(0, len(contents)) + assert_frame_equal(expected, one_shard) + + equal_shards = pd.concat([ + read_truncated_csv(0, len(contents) // 2), + read_truncated_csv(len(contents) // 2, len(contents)), + ]) + assert_frame_equal(expected, equal_shards) + + three_shards = pd.concat([ + read_truncated_csv(0, len(contents) // 3), + read_truncated_csv(len(contents) // 3, len(contents) * 2 // 3), + read_truncated_csv(len(contents) * 2 // 3, len(contents)), + ]) + assert_frame_equal(expected, three_shards) + + # https://github.com/pandas-dev/pandas/issues/38292 + if not isinstance(kwargs.get('header'), list): + split_in_header = pd.concat([ + read_truncated_csv(0, 1), + read_truncated_csv(1, len(contents)), + ]) + assert_frame_equal(expected, split_in_header) + + if not kwargs: + # Make sure we're correct as we cross the header boundary. + # We don't need to do this for every permutation. 
+ header_end = contents.index('a2') + 3 + for split in range(header_end - 2, header_end + 2): + split_at_header = pd.concat([ + read_truncated_csv(0, split), + read_truncated_csv(split, len(contents)), + ]) + assert_frame_equal(expected, split_at_header) + + def test_file_not_found(self): + with self.assertRaisesRegex(FileNotFoundError, r'/tmp/fake_dir/\*\*'): + with beam.Pipeline() as p: + _ = p | io.read_csv('/tmp/fake_dir/**') + + def test_windowed_write(self): + output = self.temp_dir() + with beam.Pipeline() as p: + pc = ( + p | beam.Create([MyRow(timestamp=i, value=i % 3) for i in range(20)]) + | beam.Map(lambda v: beam.window.TimestampedValue(v, v.timestamp)). + with_output_types(MyRow) + | beam.WindowInto( + beam.window.FixedWindows(10)).with_output_types(MyRow)) + + deferred_df = convert.to_dataframe(pc) + deferred_df.to_csv(output + 'out.csv', index=False) + + first_window_files = ( + f'{output}out.csv-' + f'{datetime.utcfromtimestamp(0).isoformat()}*') + self.assertCountEqual( + ['timestamp,value'] + [f'{i},{i%3}' for i in range(10)], + set(self.read_all_lines(first_window_files, delete=True))) + + second_window_files = ( + f'{output}out.csv-' + f'{datetime.utcfromtimestamp(10).isoformat()}*') + self.assertCountEqual( + ['timestamp,value'] + [f'{i},{i%3}' for i in range(10, 20)], + set(self.read_all_lines(second_window_files, delete=True))) + + # Check that we've read (and removed) every output file + self.assertEqual(len(glob.glob(f'{output}out.csv*')), 0) + if __name__ == '__main__': unittest.main() diff --git a/sdks/python/apache_beam/dataframe/pandas_docs_test.py b/sdks/python/apache_beam/dataframe/pandas_docs_test.py index 449656ee8661..d52773c955f1 100644 --- a/sdks/python/apache_beam/dataframe/pandas_docs_test.py +++ b/sdks/python/apache_beam/dataframe/pandas_docs_test.py @@ -20,9 +20,6 @@ Run as python -m apache_beam.dataframe.pandas_docs_test [getting_started ...] """ -from __future__ import absolute_import -from __future__ import print_function - import argparse import contextlib import io diff --git a/sdks/python/apache_beam/dataframe/pandas_doctests_test.py b/sdks/python/apache_beam/dataframe/pandas_doctests_test.py index 526ea7e6898c..edc42f1b7cfd 100644 --- a/sdks/python/apache_beam/dataframe/pandas_doctests_test.py +++ b/sdks/python/apache_beam/dataframe/pandas_doctests_test.py @@ -14,8 +14,6 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-from __future__ import absolute_import - import sys import unittest @@ -25,10 +23,134 @@ from apache_beam.dataframe.pandas_top_level_functions import _is_top_level_function -@unittest.skipIf(sys.version_info <= (3, ), 'Requires contextlib.ExitStack.') -@unittest.skipIf(sys.version_info < (3, 6), 'Nondeterministic dict ordering.') @unittest.skipIf(sys.platform == 'win32', '[BEAM-10626]') class DoctestTest(unittest.TestCase): + def test_ndframe_tests(self): + # IO methods are tested in io_test.py + skip_writes = { + f'pandas.core.generic.NDFrame.{name}': ['*'] + for name in dir(pd.core.generic.NDFrame) if name.startswith('to_') + } + + result = doctests.testmod( + pd.core.generic, + use_beam=False, + report=True, + wont_implement_ok={ + 'pandas.core.generic.NDFrame.head': ['*'], + 'pandas.core.generic.NDFrame.shift': [ + 'df.shift(periods=3)', + 'df.shift(periods=3, fill_value=0)', + ], + 'pandas.core.generic.NDFrame.tail': ['*'], + 'pandas.core.generic.NDFrame.take': ['*'], + 'pandas.core.generic.NDFrame.values': ['*'], + 'pandas.core.generic.NDFrame.tz_localize': [ + "s.tz_localize('CET', ambiguous='infer')", + # np.array is not a deferred object. This use-case is possible + # with a deferred Series though, which is tested in + # frames_test.py + "s.tz_localize('CET', ambiguous=np.array([True, True, False]))", + ], + 'pandas.core.generic.NDFrame.truncate': [ + # These inputs rely on tail (wont implement, order + # sensitive) for verification + "df.tail()", + "df.loc['2016-01-05':'2016-01-10', :].tail()", + ], + 'pandas.core.generic.NDFrame.replace': [ + "s.replace([1, 2], method='bfill')", + # Relies on method='pad' + "s.replace('a', None)", + # Implicitly uses method='pad', but output doesn't rely on that + # behavior. Verified indepently in + # frames_test.py::DeferredFrameTest::test_replace + "df.replace(regex={r'^ba.$': 'new', 'foo': 'xyz'})" + ], + 'pandas.core.generic.NDFrame.fillna': [ + "df.fillna(method='ffill')", + 'df.fillna(value=values, limit=1)', + ], + 'pandas.core.generic.NDFrame.sort_values': ['*'], + 'pandas.core.generic.NDFrame.mask': [ + 'df.where(m, -df) == np.where(m, df, -df)' + ], + 'pandas.core.generic.NDFrame.where': [ + 'df.where(m, -df) == np.where(m, df, -df)' + ], + 'pandas.core.generic.NDFrame.interpolate': ['*'], + 'pandas.core.generic.NDFrame.resample': ['*'], + 'pandas.core.generic.NDFrame.rolling': ['*'], + # argsort wont implement + 'pandas.core.generic.NDFrame.abs': [ + 'df.loc[(df.c - 43).abs().argsort()]', + ], + 'pandas.core.generic.NDFrame.reindex': ['*'], + 'pandas.core.generic.NDFrame.pct_change': ['*'], + 'pandas.core.generic.NDFrame.asof': ['*'], + 'pandas.core.generic.NDFrame.infer_objects': ['*'], + 'pandas.core.generic.NDFrame.ewm': ['*'], + 'pandas.core.generic.NDFrame.expanding': ['*'], + }, + not_implemented_ok={ + 'pandas.core.generic.NDFrame.asof': ['*'], + 'pandas.core.generic.NDFrame.at_time': ['*'], + 'pandas.core.generic.NDFrame.between_time': ['*'], + 'pandas.core.generic.NDFrame.ewm': ['*'], + 'pandas.core.generic.NDFrame.expanding': ['*'], + 'pandas.core.generic.NDFrame.flags': ['*'], + 'pandas.core.generic.NDFrame.rank': ['*'], + 'pandas.core.generic.NDFrame.reindex_like': ['*'], + 'pandas.core.generic.NDFrame.replace': ['*'], + 'pandas.core.generic.NDFrame.sample': ['*'], + 'pandas.core.generic.NDFrame.set_flags': ['*'], + 'pandas.core.generic.NDFrame.squeeze': ['*'], + 'pandas.core.generic.NDFrame.truncate': ['*'], + }, + skip={ + # Internal test + 'pandas.core.generic.NDFrame._set_axis_name': ['*'], + # Fails to construct 
test series. asfreq is not implemented anyway. + 'pandas.core.generic.NDFrame.asfreq': ['*'], + 'pandas.core.generic.NDFrame.astype': ['*'], + 'pandas.core.generic.NDFrame.convert_dtypes': ['*'], + 'pandas.core.generic.NDFrame.copy': ['*'], + 'pandas.core.generic.NDFrame.droplevel': ['*'], + 'pandas.core.generic.NDFrame.rank': [ + # Modified dataframe + 'df' + ], + 'pandas.core.generic.NDFrame.rename': [ + # Seems to be an upstream bug. The actual error has a different + # message: + # TypeError: Index(...) must be called with a collection of + # some kind, 2 was passed + # pandas doctests only verify the type of exception + 'df.rename(2)' + ], + # Tests rely on setting index + 'pandas.core.generic.NDFrame.rename_axis': ['*'], + # Raises right exception, but testing framework has matching issues. + 'pandas.core.generic.NDFrame.replace': [ + "df.replace({'a string': 'new value', True: False}) # raises" + ], + 'pandas.core.generic.NDFrame.squeeze': ['*'], + + # NameError + 'pandas.core.generic.NDFrame.resample': ['df'], + + # Skipped so we don't need to install natsort + 'pandas.core.generic.NDFrame.sort_values': [ + 'from natsort import index_natsorted', + 'df.sort_values(\n' + ' by="time",\n' + ' key=lambda x: np.argsort(index_natsorted(df["time"]))\n' + ')' + ], + **skip_writes + }) + self.assertEqual(result.failed, 0) + def test_dataframe_tests(self): result = doctests.testmod( pd.core.frame, @@ -41,6 +163,10 @@ def test_dataframe_tests(self): 'pandas.core.frame.DataFrame.cumsum': ['*'], 'pandas.core.frame.DataFrame.cumprod': ['*'], 'pandas.core.frame.DataFrame.diff': ['*'], + 'pandas.core.frame.DataFrame.fillna': [ + "df.fillna(method='ffill')", + 'df.fillna(value=values, limit=1)', + ], 'pandas.core.frame.DataFrame.items': ['*'], 'pandas.core.frame.DataFrame.itertuples': ['*'], 'pandas.core.frame.DataFrame.iterrows': ['*'], @@ -56,7 +182,15 @@ def test_dataframe_tests(self): "df.nsmallest(3, ['population', 'GDP'])", "df.nsmallest(3, 'population', keep='last')", ], - 'pandas.core.frame.DataFrame.nunique': ['*'], + 'pandas.core.frame.DataFrame.replace': [ + "s.replace([1, 2], method='bfill')", + # Relies on method='pad' + "s.replace('a', None)", + # Implicitly uses method='pad', but output doesn't rely on that + # behavior. 
Verified indepently in + # frames_test.py::DeferredFrameTest::test_replace + "df.replace(regex={r'^ba.$': 'new', 'foo': 'xyz'})" + ], 'pandas.core.frame.DataFrame.to_records': ['*'], 'pandas.core.frame.DataFrame.to_dict': ['*'], 'pandas.core.frame.DataFrame.to_numpy': ['*'], @@ -64,8 +198,8 @@ def test_dataframe_tests(self): 'pandas.core.frame.DataFrame.transpose': ['*'], 'pandas.core.frame.DataFrame.shape': ['*'], 'pandas.core.frame.DataFrame.shift': [ - 'df.shift(periods=3, freq="D")', - 'df.shift(periods=3, freq="infer")' + 'df.shift(periods=3)', + 'df.shift(periods=3, fill_value=0)', ], 'pandas.core.frame.DataFrame.unstack': ['*'], 'pandas.core.frame.DataFrame.memory_usage': ['*'], @@ -76,42 +210,71 @@ def test_dataframe_tests(self): 'pandas.core.frame.DataFrame.mode': [ "df.mode(axis='columns', numeric_only=True)" ], + 'pandas.core.frame.DataFrame.append': [ + 'df.append(df2, ignore_index=True)', + "for i in range(5):\n" + + " df = df.append({'A': i}, ignore_index=True)", + ], + 'pandas.core.frame.DataFrame.sort_index': ['*'], + 'pandas.core.frame.DataFrame.sort_values': ['*'], + 'pandas.core.frame.DataFrame.melt': [ + "df.melt(id_vars=['A'], value_vars=['B'])", + "df.melt(id_vars=['A'], value_vars=['B', 'C'])", + "df.melt(col_level=0, id_vars=['A'], value_vars=['B'])", + "df.melt(id_vars=[('A', 'D')], value_vars=[('B', 'E')])", + "df.melt(id_vars=['A'], value_vars=['B'],\n" + + " var_name='myVarname', value_name='myValname')" + ], + # Most keep= options are order-sensitive + 'pandas.core.frame.DataFrame.drop_duplicates': ['*'], + 'pandas.core.frame.DataFrame.duplicated': [ + 'df.duplicated()', + "df.duplicated(keep='last')", + "df.duplicated(subset=['brand'])", + ], + 'pandas.core.frame.DataFrame.reindex': ['*'], + 'pandas.core.frame.DataFrame.dot': [ + # reindex not supported + 's2 = s.reindex([1, 0, 2, 3])', + ], }, not_implemented_ok={ - 'pandas.core.frame.DataFrame.isin': ['*'], - 'pandas.core.frame.DataFrame.melt': ['*'], - 'pandas.core.frame.DataFrame.count': ['*'], - 'pandas.core.frame.DataFrame.reindex': ['*'], + 'pandas.core.frame.DataFrame.transform': [ + # str arg not supported. Tested with np.sum in + # frames_test.py::DeferredFrameTest::test_groupby_transform_sum + "df.groupby('Date')['Data'].transform('sum')", + ], 'pandas.core.frame.DataFrame.reindex_axis': ['*'], + 'pandas.core.frame.DataFrame.round': [ + 'df.round(decimals)', + ], # We should be able to support pivot and pivot_table for categorical # columns 'pandas.core.frame.DataFrame.pivot': ['*'], - # We can implement this as a zipping operator, but it won't have the - # same capability. The doctest includes an example that branches on - # a deferred result. - 'pandas.core.frame.DataFrame.combine': ['*'], - - # Can be implemented as a zipping operator - 'pandas.core.frame.DataFrame.combine_first': ['*'], - - # Difficult to parallelize but should be possible? - 'pandas.core.frame.DataFrame.dot': [ - # reindex not supported - 's2 = s.reindex([1, 0, 2, 3])', - 'df.dot(s2)', - ], - # Trivially elementwise for axis=columns. Relies on global indexing # for axis=rows. # Difficult to determine proxy, need to inspect function 'pandas.core.frame.DataFrame.apply': ['*'], - # In theory this is possible for bounded inputs? 
- 'pandas.core.frame.DataFrame.append': ['*'], + # Cross-join not implemented + 'pandas.core.frame.DataFrame.merge': [ + "df1.merge(df2, how='cross')" + ], + + # TODO(BEAM-11711) + 'pandas.core.frame.DataFrame.set_index': [ + "df.set_index([s, s**2])", + ], }, skip={ + # s2 created with reindex + 'pandas.core.frame.DataFrame.dot': [ + 'df.dot(s2)', + ], + + # Throws NotImplementedError when modifying df 'pandas.core.frame.DataFrame.axes': [ # Returns deferred index. 'df.axes', @@ -123,8 +286,6 @@ def test_dataframe_tests(self): "df.loc[df.index[5:10], 'b'] = np.nan", 'df.cov(min_periods=12)', ], - 'pandas.core.frame.DataFrame.drop_duplicates': ['*'], - 'pandas.core.frame.DataFrame.duplicated': ['*'], 'pandas.core.frame.DataFrame.idxmax': ['*'], 'pandas.core.frame.DataFrame.idxmin': ['*'], 'pandas.core.frame.DataFrame.rename': [ @@ -132,11 +293,14 @@ def test_dataframe_tests(self): 'df.index', 'df.rename(index=str).index', ], + 'pandas.core.frame.DataFrame.set_index': [ + # TODO(BEAM-11711): This could pass in the index as + # a DeferredIndex, and we should fail it as order-sensitive. + "df.set_index([pd.Index([1, 2, 3, 4]), 'year'])", + ], 'pandas.core.frame.DataFrame.set_axis': ['*'], - 'pandas.core.frame.DataFrame.sort_index': ['*'], 'pandas.core.frame.DataFrame.to_markdown': ['*'], 'pandas.core.frame.DataFrame.to_parquet': ['*'], - 'pandas.core.frame.DataFrame.value_counts': ['*'], 'pandas.core.frame.DataFrame.to_records': [ 'df.index = df.index.rename("I")', @@ -166,16 +330,12 @@ def test_dataframe_tests(self): "df1.merge(df2, left_on='lkey', right_on='rkey')", "df1.merge(df2, left_on='lkey', right_on='rkey',\n" " suffixes=('_left', '_right'))", + "df1.merge(df2, how='left', on='a')", ], # Raises right exception, but testing framework has matching issues. 'pandas.core.frame.DataFrame.replace': [ "df.replace({'a string': 'new value', True: False}) # raises" ], - # Should raise WontImplement order-sensitive - 'pandas.core.frame.DataFrame.set_index': [ - "df.set_index([pd.Index([1, 2, 3, 4]), 'year'])", - "df.set_index([s, s**2])", - ], 'pandas.core.frame.DataFrame.to_sparse': ['type(df)'], # Skipped because "seen_wont_implement" is reset before getting to @@ -186,6 +346,25 @@ def test_dataframe_tests(self): 'pandas.core.frame.DataFrame.transpose': [ 'df1_transposed.dtypes', 'df2_transposed.dtypes' ], + # Skipped because the relies on iloc to set a cell to NA. Test is + # replicated in frames_test::DeferredFrameTest::test_applymap. 
+ 'pandas.core.frame.DataFrame.applymap': [ + 'df_copy.iloc[0, 0] = pd.NA', + "df_copy.applymap(lambda x: len(str(x)), na_action='ignore')", + ], + # Skipped so we don't need to install natsort + 'pandas.core.frame.DataFrame.sort_values': [ + 'from natsort import index_natsorted', + 'df.sort_values(\n' + ' by="time",\n' + ' key=lambda x: np.argsort(index_natsorted(df["time"]))\n' + ')' + ], + # Mode that we don't yet support, documentation added in pandas + # 1.2.0 (https://github.com/pandas-dev/pandas/issues/35912) + 'pandas.core.frame.DataFrame.aggregate': [ + "df.agg(x=('A', max), y=('B', 'min'), z=('C', np.mean))" + ], }) self.assertEqual(result.failed, 0) @@ -205,6 +384,10 @@ def test_series_tests(self): 'pandas.core.series.Series.dot': [ 's.dot(arr)', # non-deferred result ], + 'pandas.core.series.Series.fillna': [ + "df.fillna(method='ffill')", + 'df.fillna(value=values, limit=1)', + ], 'pandas.core.series.Series.items': ['*'], 'pandas.core.series.Series.iteritems': ['*'], # default keep is 'first' @@ -221,57 +404,86 @@ def test_series_tests(self): ], 'pandas.core.series.Series.pop': ['*'], 'pandas.core.series.Series.searchsorted': ['*'], - 'pandas.core.series.Series.shift': ['*'], + 'pandas.core.series.Series.shift': [ + 'df.shift(periods=3)', + 'df.shift(periods=3, fill_value=0)', + ], 'pandas.core.series.Series.take': ['*'], 'pandas.core.series.Series.to_dict': ['*'], 'pandas.core.series.Series.unique': ['*'], 'pandas.core.series.Series.unstack': ['*'], 'pandas.core.series.Series.values': ['*'], 'pandas.core.series.Series.view': ['*'], + 'pandas.core.series.Series.append': [ + 's1.append(s2, ignore_index=True)', + ], + 'pandas.core.series.Series.replace': [ + "s.replace([1, 2], method='bfill')", + # Relies on method='pad' + "s.replace('a', None)", + # Implicitly uses method='pad', but output doesn't rely on that + # behavior. Verified indepently in + # frames_test.py::DeferredFrameTest::test_replace + "df.replace(regex={r'^ba.$': 'new', 'foo': 'xyz'})" + ], + 'pandas.core.series.Series.sort_index': ['*'], + 'pandas.core.series.Series.sort_values': ['*'], + 'pandas.core.series.Series.argmax': ['*'], + 'pandas.core.series.Series.argmin': ['*'], + 'pandas.core.series.Series.drop_duplicates': [ + 's.drop_duplicates()', + "s.drop_duplicates(keep='last')", + ], + 'pandas.core.series.Series.repeat': [ + 's.repeat([1, 2, 3])' + ], + 'pandas.core.series.Series.reindex': ['*'], + 'pandas.core.series.Series.autocorr': ['*'], }, not_implemented_ok={ + 'pandas.core.series.Series.transform': [ + # str arg not supported. Tested with np.sum in + # frames_test.py::DeferredFrameTest::test_groupby_transform_sum + "df.groupby('Date')['Data'].transform('sum')", + ], 'pandas.core.series.Series.groupby': [ 'ser.groupby(["a", "b", "a", "b"]).mean()', 'ser.groupby(["a", "b", "a", np.nan]).mean()', 'ser.groupby(["a", "b", "a", np.nan], dropna=False).mean()', - # Grouping by a series is not supported - 'ser.groupby(ser > 100).mean()', ], - 'pandas.core.series.Series.reindex': ['*'], }, skip={ - 'pandas.core.series.Series.append': ['*'], - 'pandas.core.series.Series.argmax': ['*'], - 'pandas.core.series.Series.argmin': ['*'], - 'pandas.core.series.Series.autocorr': ['*'], - 'pandas.core.series.Series.combine': ['*'], - 'pandas.core.series.Series.combine_first': ['*'], + 'pandas.core.series.Series.groupby': [ + # TODO(BEAM-11393): This example requires aligning two series + # with non-unique indexes. 
It only works in pandas because + # pandas can recognize the indexes are identical and elide the + # alignment. + 'ser.groupby(ser > 100).mean()', + ], + # error formatting + 'pandas.core.series.Series.append': [ + 's1.append(s2, verify_integrity=True)', + ], + # Throws NotImplementedError when modifying df 'pandas.core.series.Series.compare': ['*'], - 'pandas.core.series.Series.count': ['*'], 'pandas.core.series.Series.cov': [ # Differs in LSB on jenkins. "s1.cov(s2)", ], - 'pandas.core.series.Series.drop': ['*'], - 'pandas.core.series.Series.drop_duplicates': ['*'], 'pandas.core.series.Series.duplicated': ['*'], - 'pandas.core.series.Series.explode': ['*'], 'pandas.core.series.Series.idxmax': ['*'], 'pandas.core.series.Series.idxmin': ['*'], - 'pandas.core.series.Series.name': ['*'], 'pandas.core.series.Series.nonzero': ['*'], - 'pandas.core.series.Series.quantile': ['*'], 'pandas.core.series.Series.pop': ['ser'], # testing side effect - 'pandas.core.series.Series.repeat': ['*'], - 'pandas.core.series.Series.replace': ['*'], - 'pandas.core.series.Series.reset_index': ['*'], + # Raises right exception, but testing framework has matching issues. + 'pandas.core.series.Series.replace': [ + "df.replace({'a string': 'new value', True: False}) # raises" + ], 'pandas.core.series.Series.searchsorted': [ # This doctest seems to be incorrectly parsed. "x = pd.Categorical(['apple', 'bread', 'bread'," ], 'pandas.core.series.Series.set_axis': ['*'], - 'pandas.core.series.Series.sort_index': ['*'], - 'pandas.core.series.Series.sort_values': ['*'], 'pandas.core.series.Series.to_csv': ['*'], 'pandas.core.series.Series.to_markdown': ['*'], 'pandas.core.series.Series.update': ['*'], @@ -283,73 +495,100 @@ def test_series_tests(self): self.assertEqual(result.failed, 0) def test_string_tests(self): + PD_VERSION = tuple(int(v) for v in pd.__version__.split('.')) + if PD_VERSION < (1, 2, 0): + module = pd.core.strings + else: + # Definitions were moved to accessor in pandas 1.2.0 + module = pd.core.strings.accessor + + module_name = module.__name__ + result = doctests.testmod( - pd.core.strings, + module, use_beam=False, wont_implement_ok={ # These methods can accept deferred series objects, but not lists - 'pandas.core.strings.StringMethods.cat': [ + f'{module_name}.StringMethods.cat': [ "s.str.cat(['A', 'B', 'C', 'D'], sep=',')", "s.str.cat(['A', 'B', 'C', 'D'], sep=',', na_rep='-')", "s.str.cat(['A', 'B', 'C', 'D'], na_rep='-')" ], - 'pandas.core.strings.StringMethods.repeat': [ - 's.str.repeat(repeats=[1, 2, 3])' - ], - 'pandas.core.strings.str_repeat': [ + f'{module_name}.StringMethods.repeat': [ 's.str.repeat(repeats=[1, 2, 3])' ], + f'{module_name}.str_repeat': ['s.str.repeat(repeats=[1, 2, 3])'], + f'{module_name}.StringMethods.get_dummies': ['*'], + f'{module_name}.str_get_dummies': ['*'], + f'{module_name}.StringMethods': ['s.str.split("_")'], + f'{module_name}.StringMethods.rsplit': ['*'], + f'{module_name}.StringMethods.split': ['*'], }, skip={ - # Bad test strings - 'pandas.core.strings.str_replace': [ + # count() on Series with a NaN produces mismatched type if we + # have a NaN-only partition. 
+ f'{module_name}.StringMethods.count': ["s.str.count('a')"], + f'{module_name}.str_count': ["s.str.count('a')"], + + # Bad test strings in pandas 1.1.x + f'{module_name}.str_replace': [ "pd.Series(['foo', 'fuz', np.nan]).str.replace('f', repr)" ], - 'pandas.core.strings.StringMethods.replace': [ + f'{module_name}.StringMethods.replace': [ "pd.Series(['foo', 'fuz', np.nan]).str.replace('f', repr)" ], + + # output has incorrect formatting in 1.2.x + f'{module_name}.StringMethods.extractall': ['*'] }) self.assertEqual(result.failed, 0) def test_datetime_tests(self): # TODO(BEAM-10721) - datetimelike_result = doctests.testmod( - pd.core.arrays.datetimelike, + indexes_accessors_result = doctests.testmod( + pd.core.indexes.accessors, use_beam=False, skip={ - 'pandas.core.arrays.datetimelike.AttributesMixin._unbox_scalar': [ + 'pandas.core.indexes.accessors.TimedeltaProperties': [ + # Seems like an upstream bug. The property is 'second' + 'seconds_series.dt.seconds' + ], + + # TODO(BEAM-12530): Test data creation fails for these + # s = pd.Series(pd.to_timedelta(np.arange(5), unit="d")) + # pylint: disable=line-too-long + 'pandas.core.indexes.accessors.DatetimeProperties.to_pydatetime': [ + '*' + ], + 'pandas.core.indexes.accessors.TimedeltaProperties.components': [ '*' ], - 'pandas.core.arrays.datetimelike.TimelikeOps.ceil': ['*'], - 'pandas.core.arrays.datetimelike.TimelikeOps.floor': ['*'], - 'pandas.core.arrays.datetimelike.TimelikeOps.round': ['*'], + 'pandas.core.indexes.accessors.TimedeltaProperties.to_pytimedelta': [ + '*' + ], + # pylint: enable=line-too-long }) + datetimelike_result = doctests.testmod( + pd.core.arrays.datetimelike, use_beam=False) datetime_result = doctests.testmod( pd.core.arrays.datetimes, use_beam=False, - skip={ - 'pandas.core.arrays.datetimes.DatetimeArray.day': ['*'], - 'pandas.core.arrays.datetimes.DatetimeArray.hour': ['*'], - 'pandas.core.arrays.datetimes.DatetimeArray.microsecond': ['*'], - 'pandas.core.arrays.datetimes.DatetimeArray.minute': ['*'], - 'pandas.core.arrays.datetimes.DatetimeArray.month': ['*'], - 'pandas.core.arrays.datetimes.DatetimeArray.nanosecond': ['*'], - 'pandas.core.arrays.datetimes.DatetimeArray.second': ['*'], - 'pandas.core.arrays.datetimes.DatetimeArray.year': ['*'], - 'pandas.core.arrays.datetimes.DatetimeArray.is_leap_year': ['*'], - 'pandas.core.arrays.datetimes.DatetimeArray.is_month_end': ['*'], - 'pandas.core.arrays.datetimes.DatetimeArray.is_month_start': ['*'], - 'pandas.core.arrays.datetimes.DatetimeArray.is_quarter_end': ['*'], - 'pandas.core.arrays.datetimes.DatetimeArray.is_quarter_start': [ - '*' - ], - 'pandas.core.arrays.datetimes.DatetimeArray.is_year_end': ['*'], - 'pandas.core.arrays.datetimes.DatetimeArray.is_year_start': ['*'], + wont_implement_ok={ 'pandas.core.arrays.datetimes.DatetimeArray.to_period': ['*'], + # All tz_localize tests use unsupported values for ambiguous= + # Verified seperately in + # frames_test.py::DeferredFrameTest::test_dt_tz_localize_* 'pandas.core.arrays.datetimes.DatetimeArray.tz_localize': ['*'], + }, + not_implemented_ok={ + # Verifies index version of this method + 'pandas.core.arrays.datetimes.DatetimeArray.to_period': [ + 'df.index.to_period("M")' + ], }) + self.assertEqual(indexes_accessors_result.failed, 0) self.assertEqual(datetimelike_result.failed, 0) self.assertEqual(datetime_result.failed, 0) @@ -370,6 +609,98 @@ def test_indexing_tests(self): }) self.assertEqual(result.failed, 0) + def test_groupby_tests(self): + result = doctests.testmod( + pd.core.groupby.groupby, + 
use_beam=False, + wont_implement_ok={ + 'pandas.core.groupby.groupby.GroupBy.head': ['*'], + 'pandas.core.groupby.groupby.GroupBy.tail': ['*'], + 'pandas.core.groupby.groupby.GroupBy.nth': ['*'], + 'pandas.core.groupby.groupby.GroupBy.cumcount': ['*'], + 'pandas.core.groupby.groupby.GroupBy.resample': ['*'], + }, + not_implemented_ok={ + 'pandas.core.groupby.groupby.GroupBy.ngroup': ['*'], + 'pandas.core.groupby.groupby.GroupBy.sample': ['*'], + 'pandas.core.groupby.groupby.BaseGroupBy.pipe': ['*'], + # pipe tests are in a different location in pandas 1.1.x + 'pandas.core.groupby.groupby._GroupBy.pipe': ['*'], + 'pandas.core.groupby.groupby.GroupBy.nth': [ + "df.groupby('A', as_index=False).nth(1)", + ], + }, + skip={ + # Uses iloc to mutate a DataFrame + 'pandas.core.groupby.groupby.GroupBy.resample': [ + 'df.iloc[2, 0] = 5', + 'df', + ], + # TODO: Raise wont implement for list passed as a grouping column + # Currently raises unhashable type: list + 'pandas.core.groupby.groupby.GroupBy.ngroup': [ + 'df.groupby(["A", [1,1,2,3,2,1]]).ngroup()' + ], + }) + self.assertEqual(result.failed, 0) + + result = doctests.testmod( + pd.core.groupby.generic, + use_beam=False, + wont_implement_ok={ + # Returns an array by default, not a Series. WontImplement + # (non-deferred) + 'pandas.core.groupby.generic.SeriesGroupBy.unique': ['*'], + # TODO: Is take actually deprecated? + 'pandas.core.groupby.generic.DataFrameGroupBy.take': ['*'], + 'pandas.core.groupby.generic.SeriesGroupBy.take': ['*'], + 'pandas.core.groupby.generic.SeriesGroupBy.nsmallest': [ + "s.nsmallest(3, keep='last')", + "s.nsmallest(3)", + "s.nsmallest()", + ], + 'pandas.core.groupby.generic.SeriesGroupBy.nlargest': [ + "s.nlargest(3, keep='last')", + "s.nlargest(3)", + "s.nlargest()", + ], + 'pandas.core.groupby.generic.DataFrameGroupBy.diff': ['*'], + 'pandas.core.groupby.generic.SeriesGroupBy.diff': ['*'], + 'pandas.core.groupby.generic.DataFrameGroupBy.hist': ['*'], + 'pandas.core.groupby.generic.DataFrameGroupBy.fillna': [ + "df.fillna(method='ffill')", + 'df.fillna(value=values, limit=1)', + ], + 'pandas.core.groupby.generic.SeriesGroupBy.fillna': [ + "df.fillna(method='ffill')", + 'df.fillna(value=values, limit=1)', + ], + }, + not_implemented_ok={ + 'pandas.core.groupby.generic.DataFrameGroupBy.idxmax': ['*'], + 'pandas.core.groupby.generic.DataFrameGroupBy.idxmin': ['*'], + 'pandas.core.groupby.generic.SeriesGroupBy.transform': ['*'], + 'pandas.core.groupby.generic.SeriesGroupBy.idxmax': ['*'], + 'pandas.core.groupby.generic.SeriesGroupBy.idxmin': ['*'], + }, + skip={ + 'pandas.core.groupby.generic.SeriesGroupBy.cov': [ + # Floating point comparison fails + 's1.cov(s2)', + ], + 'pandas.core.groupby.generic.DataFrameGroupBy.cov': [ + # Mutates input DataFrame with loc + # TODO: Replicate in frames_test.py + "df.loc[df.index[:5], 'a'] = np.nan", + "df.loc[df.index[5:10], 'b'] = np.nan", + "df.cov(min_periods=12)", + ], + # These examples rely on grouping by a list + 'pandas.core.groupby.generic.SeriesGroupBy.aggregate': ['*'], + 'pandas.core.groupby.generic.DataFrameGroupBy.aggregate': ['*'], + }) + self.assertEqual(result.failed, 0) + def test_top_level(self): tests = { name: func.__doc__ @@ -377,6 +708,7 @@ def test_top_level(self): if _is_top_level_function(func) and getattr(func, '__doc__', None) } + # IO methods are tested in io_test.py skip_reads = {name: ['*'] for name in dir(pd) if name.startswith('read_')} result = doctests.teststrings( @@ -388,11 +720,11 @@ def test_top_level(self): 'crosstab': ['*'], 'cut': ['*'], 
'eval': ['*'], - 'factorize': ['*'], 'get_dummies': ['*'], 'infer_freq': ['*'], 'lreshape': ['*'], 'melt': ['*'], + 'merge': ["df1.merge(df2, how='cross')"], 'merge_asof': ['*'], 'pivot': ['*'], 'pivot_table': ['*'], @@ -400,16 +732,23 @@ def test_top_level(self): 'reset_option': ['*'], 'set_eng_float_format': ['*'], 'set_option': ['*'], - 'to_datetime': ['*'], 'to_numeric': ['*'], 'to_timedelta': ['*'], 'unique': ['*'], - 'value_counts': ['*'], 'wide_to_long': ['*'], }, wont_implement_ok={ + 'factorize': ['*'], 'to_datetime': ['s.head()'], 'to_pickle': ['*'], + 'melt': [ + "pd.melt(df, id_vars=['A'], value_vars=['B'])", + "pd.melt(df, id_vars=['A'], value_vars=['B', 'C'])", + "pd.melt(df, col_level=0, id_vars=['A'], value_vars=['B'])", + "pd.melt(df, id_vars=[('A', 'D')], value_vars=[('B', 'E')])", + "pd.melt(df, id_vars=['A'], value_vars=['B'],\n" + + " var_name='myVarname', value_name='myValname')" + ], }, skip={ # error formatting @@ -420,7 +759,8 @@ def test_top_level(self): 'merge': [ "df1.merge(df2, left_on='lkey', right_on='rkey')", "df1.merge(df2, left_on='lkey', right_on='rkey',\n" - " suffixes=('_left', '_right'))" + " suffixes=('_left', '_right'))", + "df1.merge(df2, how='left', on='a')", ], # Not an actual test. 'option_context': ['*'], diff --git a/sdks/python/apache_beam/dataframe/pandas_top_level_functions.py b/sdks/python/apache_beam/dataframe/pandas_top_level_functions.py index 0e9154f5b267..443843e629b6 100644 --- a/sdks/python/apache_beam/dataframe/pandas_top_level_functions.py +++ b/sdks/python/apache_beam/dataframe/pandas_top_level_functions.py @@ -37,14 +37,36 @@ def wrapper(target, *args, **kwargs): return staticmethod(wrapper) +def _maybe_wrap_constant_expr(res): + if type(res) in frame_base.DeferredBase._pandas_type_map.keys(): + return frame_base.DeferredBase.wrap( + expressions.ConstantExpression(res, res[0:0])) + else: + return res + + def _defer_to_pandas(name): + func = getattr(pd, name) + def wrapper(*args, **kwargs): - res = getattr(pd, name)(*args, **kwargs) - if type(res) in frame_base.DeferredBase._pandas_type_map.keys(): - return frame_base.DeferredBase.wrap( - expressions.ConstantExpression(res, res[0:0])) - else: - return res + res = func(*args, **kwargs) + return _maybe_wrap_constant_expr(res) + + return staticmethod(wrapper) + + +def _defer_to_pandas_maybe_elementwise(name): + """ Same as _defer_to_pandas, except it handles DeferredBase args, assuming + the function can be processed elementwise. """ + func = getattr(pd, name) + + def wrapper(*args, **kwargs): + if any(isinstance(arg, frame_base.DeferredBase) + for arg in args + tuple(kwargs.values())): + return frame_base._elementwise_function(func, name)(*args, **kwargs) + + res = func(*args, **kwargs) + return _maybe_wrap_constant_expr(res) return staticmethod(wrapper) @@ -85,6 +107,16 @@ def concat( objs = [objs[k] for k in keys] else: objs = list(objs) + + if keys is None: + preserves_partitioning = partitionings.Arbitrary() + else: + # Index 0 will be a new index for keys, only partitioning by the original + # indexes (1 to N) will be preserved. 
+ nlevels = min(o._expr.proxy().index.nlevels for o in objs) + preserves_partitioning = partitionings.Index( + [i for i in range(1, nlevels + 1)]) + deferred_none = expressions.ConstantExpression(None) exprs = [deferred_none if o is None else o._expr for o in objs] @@ -93,7 +125,7 @@ def concat( elif verify_integrity: required_partitioning = partitionings.Index() else: - required_partitioning = partitionings.Nothing() + required_partitioning = partitionings.Arbitrary() return frame_base.DeferredBase.wrap( expressions.ComputedExpression( @@ -109,7 +141,7 @@ def concat( verify_integrity=verify_integrity), # yapf break exprs, requires_partition_by=required_partitioning, - preserves_partition_by=partitionings.Index())) + preserves_partition_by=preserves_partitioning)) date_range = _defer_to_pandas('date_range') describe_option = _defer_to_pandas('describe_option') @@ -122,7 +154,8 @@ def concat( melt = _call_on_first_arg('melt') merge = _call_on_first_arg('merge') melt = _call_on_first_arg('melt') - merge_ordered = frame_base.wont_implement_method('order-sensitive') + merge_ordered = frame_base.wont_implement_method( + pd, 'merge_ordered', reason='order-sensitive') notna = _call_on_first_arg('notna') notnull = _call_on_first_arg('notnull') option_context = _defer_to_pandas('option_context') @@ -130,18 +163,27 @@ def concat( pivot = _call_on_first_arg('pivot') pivot_table = _call_on_first_arg('pivot_table') show_versions = _defer_to_pandas('show_versions') - test = frame_base.wont_implement_method('test') + test = frame_base.wont_implement_method( + pd, + 'test', + explanation="because it is an internal pandas testing utility.") timedelta_range = _defer_to_pandas('timedelta_range') - to_pickle = frame_base.wont_implement_method('order-sensitive') + to_pickle = frame_base.wont_implement_method( + pd, 'to_pickle', reason='order-sensitive') + to_datetime = _defer_to_pandas_maybe_elementwise('to_datetime') notna = _call_on_first_arg('notna') def __getattr__(self, name): if name.startswith('read_'): - return frame_base.wont_implement_method( - 'Use p | apache_beam.dataframe.io.%s' % name) + + def func(*args, **kwargs): + raise frame_base.WontImplementError( + 'Use p | apache_beam.dataframe.io.%s' % name) + + return func res = getattr(pd, name) if _is_top_level_function(res): - return frame_base.not_implemented_method(name) + return frame_base.not_implemented_method(name, base_type=pd) else: return res diff --git a/sdks/python/apache_beam/dataframe/partitionings.py b/sdks/python/apache_beam/dataframe/partitionings.py index 9baf9c9a1cd3..9891e71ef4f9 100644 --- a/sdks/python/apache_beam/dataframe/partitionings.py +++ b/sdks/python/apache_beam/dataframe/partitionings.py @@ -14,14 +14,13 @@ # See the License for the specific language governing permissions and # limitations under the License. -from __future__ import absolute_import - import random from typing import Any from typing import Iterable from typing import Tuple from typing import TypeVar +import numpy as np import pandas as pd Frame = TypeVar('Frame', bound=pd.core.generic.NDFrame) @@ -43,6 +42,12 @@ def is_subpartitioning_of(self, other): """ raise NotImplementedError + def __lt__(self, other): + return self != other and self <= other + + def __le__(self, other): + return not self.is_subpartitioning_of(other) + def partition_fn(self, df, num_partitions): # type: (Frame, int) -> Iterable[Tuple[Any, Frame]] @@ -65,7 +70,7 @@ class Index(Partitioning): These form a partial order, given by - Nothing() < Index([i]) < Index([i, j]) < ... 
< Index() < Singleton() + Singleton() < Index([i]) < Index([i, j]) < ... < Index() < Arbitrary() The ordering is implemented via the is_subpartitioning_of method, where the examples on the right are subpartitionings of the examples on the left above. @@ -82,9 +87,6 @@ def __repr__(self): def __eq__(self, other): return type(self) == type(other) and self._levels == other._levels - def __ne__(self, other): - return not self == other - def __hash__(self): if self._levels: return hash(tuple(sorted(self._levels))) @@ -92,7 +94,7 @@ def __hash__(self): return hash(type(self)) def is_subpartitioning_of(self, other): - if isinstance(other, Nothing): + if isinstance(other, Singleton): return True elif isinstance(other, Index): if self._levels is None: @@ -100,37 +102,48 @@ def is_subpartitioning_of(self, other): elif other._levels is None: return False else: - return all(level in other._levels for level in self._levels) - else: + return all(level in self._levels for level in other._levels) + elif isinstance(other, Arbitrary): return False + else: + raise ValueError(f"Encountered unknown type {other!r}") - def partition_fn(self, df, num_partitions): + def _hash_index(self, df): if self._levels is None: levels = list(range(df.index.nlevels)) else: levels = self._levels - hashes = sum( - pd.util.hash_array(df.index.get_level_values(level)) + return sum( + pd.util.hash_array(np.asarray(df.index.get_level_values(level))) for level in levels) + + def partition_fn(self, df, num_partitions): + hashes = self._hash_index(df) for key in range(num_partitions): yield key, df[hashes % num_partitions == key] def check(self, dfs): - # TODO(BEAM-11324): This check should be stronger, it should verify that - # running partition_fn on the concatenation of dfs yields the same - # partitions. - if self._levels is None: + # Drop empty DataFrames + dfs = [df for df in dfs if len(df)] - def get_index_set(df): - return set(df.index) - else: - - def get_index_set(df): - return set(zip(df.index.level[level] for level in self._levels)) + if not len(dfs): + return True - index_sets = [get_index_set(df) for df in dfs] - for i, index_set in enumerate(index_sets[:-1]): - if not index_set.isdisjoint(set.union(*index_sets[i + 1:])): + def apply_consistent_order(dfs): + # Apply consistent order between dataframes by using sum of the index's + # hash. + # Apply consistent order within dataframe with sort_index() + # Also drops any empty dataframes. + return sorted((df.sort_index() for df in dfs if len(df)), + key=lambda df: sum(self._hash_index(df))) + + dfs = apply_consistent_order(dfs) + repartitioned_dfs = apply_consistent_order( + df for _, df in self.test_partition_fn(pd.concat(dfs))) + + # Assert that each index is identical + for df, repartitioned_df in zip(dfs, repartitioned_dfs): + if not df.index.equals(repartitioned_df.index): return False return True @@ -139,17 +152,21 @@ def get_index_set(df): class Singleton(Partitioning): """A partitioning of all the data into a single partition. 
""" + def __init__(self, reason=None): + self._reason = reason + + @property + def reason(self): + return self._reason + def __eq__(self, other): return type(self) == type(other) - def __ne__(self, other): - return not self == other - def __hash__(self): return hash(type(self)) def is_subpartitioning_of(self, other): - return True + return isinstance(other, Singleton) def partition_fn(self, df, num_partitions): yield None, df @@ -158,23 +175,20 @@ def check(self, dfs): return len(dfs) <= 1 -class Nothing(Partitioning): +class Arbitrary(Partitioning): """A partitioning imposing no constraints on the actual partitioning. """ def __eq__(self, other): return type(self) == type(other) - def __ne__(self, other): - return not self == other - def __hash__(self): return hash(type(self)) def is_subpartitioning_of(self, other): - return isinstance(other, Nothing) + return True def test_partition_fn(self, df): - num_partitions = max(min(df.size, 10), 1) + num_partitions = 10 def shuffled(seq): seq = list(seq) diff --git a/sdks/python/apache_beam/dataframe/partitionings_test.py b/sdks/python/apache_beam/dataframe/partitionings_test.py index 70a5e99c5f94..b60aa6707eea 100644 --- a/sdks/python/apache_beam/dataframe/partitionings_test.py +++ b/sdks/python/apache_beam/dataframe/partitionings_test.py @@ -14,14 +14,12 @@ # See the License for the specific language governing permissions and # limitations under the License. -from __future__ import absolute_import - import unittest import pandas as pd +from apache_beam.dataframe.partitionings import Arbitrary from apache_beam.dataframe.partitionings import Index -from apache_beam.dataframe.partitionings import Nothing from apache_beam.dataframe.partitionings import Singleton @@ -36,8 +34,10 @@ class PartitioningsTest(unittest.TestCase): }).set_index(['shape', 'color', 'size']) def test_index_is_subpartition(self): - ordered_list = [Nothing(), Index([3]), Index([1, 3]), Index(), Singleton()] - for loose, strict in zip(ordered_list[:1], ordered_list[1:]): + ordered_list = [ + Singleton(), Index([3]), Index([1, 3]), Index(), Arbitrary() + ] + for loose, strict in zip(ordered_list[:-1], ordered_list[1:]): self.assertTrue(strict.is_subpartitioning_of(loose), (strict, loose)) self.assertFalse(loose.is_subpartitioning_of(strict), (loose, strict)) # Incomparable. 
@@ -65,13 +65,13 @@ def test_index_partition(self): self._check_partition(Index(), 7, 24) def test_nothing_subpartition(self): - self.assertTrue(Nothing().is_subpartitioning_of(Nothing())) for p in [Index([1]), Index([1, 2]), Index(), Singleton()]: - self.assertFalse(Nothing().is_subpartitioning_of(p), p) + self.assertTrue(Arbitrary().is_subpartitioning_of(p), p) def test_singleton_subpartition(self): - for p in [Nothing(), Index([1]), Index([1, 2]), Index(), Singleton()]: - self.assertTrue(Singleton().is_subpartitioning_of(p), p) + self.assertTrue(Singleton().is_subpartitioning_of(Singleton())) + for p in [Arbitrary(), Index([1]), Index([1, 2]), Index()]: + self.assertFalse(Singleton().is_subpartitioning_of(p), p) def test_singleton_partition(self): parts = list(Singleton().partition_fn(pd.Series(range(10)), 1000)) diff --git a/sdks/python/apache_beam/dataframe/schemas.py b/sdks/python/apache_beam/dataframe/schemas.py index f9ff55bbe7ed..bd889daeadc3 100644 --- a/sdks/python/apache_beam/dataframe/schemas.py +++ b/sdks/python/apache_beam/dataframe/schemas.py @@ -27,38 +27,29 @@ \--- np.float{32,64} Not supported <------ Optional[bytes] np.bool <-----> np.bool - - * int, float, bool are treated the same as np.int64, np.float64, np.bool - -Any unknown or unsupported types are treated as :code:`Any` and shunted to -:code:`np.object`:: - + np.dtype('S') <-----> bytes + pd.BooleanDType() <-----> Optional[bool] + pd.StringDType() <-----> Optional[str] + \--- str np.object <-----> Any -bytes, unicode strings and nullable Booleans are handled differently when using -pandas 0.x vs. 1.x. pandas 0.x has no mapping for these types, so they are -shunted to :code:`np.object`. + * int, float, bool are treated the same as np.int64, np.float64, np.bool -pandas 1.x Only:: +Note that when converting to pandas dtypes, any types not specified here are +shunted to ``np.object``. - np.dtype('S') <-----> bytes - pd.BooleanDType() <-----> Optional[bool] - pd.StringDType() <-----> Optional[str] - \--- str +Similarly when converting from pandas to Python types, types that aren't +otherwise specified here are shunted to ``Any``. Notably, this includes +``np.datetime64``. Pandas does not support hierarchical data natively. Currently, all structured -types (:code:`Sequence`, :code:`Mapping`, nested :code:`NamedTuple` types), are -shunted to :code:`np.object` like all other unknown types. In the future these +types (``Sequence``, ``Mapping``, nested ``NamedTuple`` types), are +shunted to ``np.object`` like all other unknown types. In the future these types may be given special consideration. 
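For example, a minimal sketch of the mapping above (the ``Animal`` type here is
purely illustrative and mirrors the accompanying ``schemas_test.py``)::

    import typing

    from apache_beam.dataframe import schemas

    Animal = typing.NamedTuple(
        'Animal', [('animal', str), ('max_speed', typing.Optional[float])])

    # 'animal' (str) is mapped to the nullable pd.StringDtype();
    # 'max_speed' (Optional[float]) is mapped to np.float64.
    proxy = schemas.generate_proxy(Animal)
    print(proxy.dtypes)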
""" # pytype: skip-file -#TODO: Mapping for date/time types -#https://pandas.pydata.org/docs/user_guide/timeseries.html#overview - -from __future__ import absolute_import - from typing import Any from typing import NamedTuple from typing import Optional @@ -87,11 +78,9 @@ T = TypeVar('T', bound=NamedTuple) -PD_MAJOR = int(pd.__version__.split('.')[0]) - # Generate type map (presented visually in the docstring) _BIDIRECTIONAL = [ - (np.bool, np.bool), + (bool, bool), (np.int8, np.int8), (np.int16, np.int16), (np.int32, np.int32), @@ -102,15 +91,11 @@ (pd.Int64Dtype(), Optional[np.int64]), (np.float32, Optional[np.float32]), (np.float64, Optional[np.float64]), - (np.object, Any), + (object, Any), + (pd.StringDtype(), Optional[str]), + (pd.BooleanDtype(), Optional[bool]), ] -if PD_MAJOR >= 1: - _BIDIRECTIONAL.extend([ - (pd.StringDtype(), Optional[str]), - (pd.BooleanDtype(), Optional[np.bool]), - ]) - PANDAS_TO_BEAM = { pd.Series([], dtype=dtype).dtype: fieldtype for dtype, @@ -121,10 +106,7 @@ # Shunt non-nullable Beam types to the same pandas types as their non-nullable # equivalents for FLOATs, DOUBLEs, and STRINGs. pandas has no non-nullable dtype # for these. -OPTIONAL_SHUNTS = [np.float32, np.float64] - -if PD_MAJOR >= 1: - OPTIONAL_SHUNTS.append(str) +OPTIONAL_SHUNTS = [np.float32, np.float64, str] for typehint in OPTIONAL_SHUNTS: BEAM_TO_PANDAS[typehint] = BEAM_TO_PANDAS[Optional[typehint]] @@ -182,7 +164,7 @@ def generate_proxy(element_type): for name, typehint in fields: # Default to np.object. This is lossy, we won't be able to recover # the type at the output. - dtype = BEAM_TO_PANDAS.get(typehint, np.object) + dtype = BEAM_TO_PANDAS.get(typehint, object) proxy[name] = proxy[name].astype(dtype) return proxy @@ -292,7 +274,8 @@ def _unbatch_transform(proxy, include_indexes): ctor = element_type_from_dataframe(proxy, include_indexes=include_indexes) return beam.ParDo( - _UnbatchWithIndex(ctor) if include_indexes else _UnbatchNoIndex(ctor)) + _UnbatchWithIndex(ctor) if include_indexes else _UnbatchNoIndex(ctor) + ).with_output_types(ctor) elif isinstance(proxy, pd.Series): # Raise a TypeError if proxy has an unknown type output_type = _dtype_to_fieldtype(proxy.dtype) @@ -322,7 +305,7 @@ def _dtype_to_fieldtype(dtype): elif dtype.kind == 'S': return bytes else: - raise TypeError("Unsupported dtype in proxy: '%s'" % dtype) + return Any @typehints.with_input_types(Union[pd.DataFrame, pd.Series]) diff --git a/sdks/python/apache_beam/dataframe/schemas_test.py b/sdks/python/apache_beam/dataframe/schemas_test.py index f9576a4a9af0..3fbb2d834b57 100644 --- a/sdks/python/apache_beam/dataframe/schemas_test.py +++ b/sdks/python/apache_beam/dataframe/schemas_test.py @@ -19,17 +19,12 @@ # pytype: skip-file -from __future__ import absolute_import - import typing import unittest -import future.tests.base # pylint: disable=unused-import import numpy as np -# patches unittest.testcase to be python3 compatible import pandas as pd from parameterized import parameterized -from past.builtins import unicode import apache_beam as beam from apache_beam.coders import RowCoder @@ -39,12 +34,14 @@ from apache_beam.testing.test_pipeline import TestPipeline from apache_beam.testing.util import assert_that from apache_beam.testing.util import equal_to +from apache_beam.typehints import typehints +from apache_beam.typehints.native_type_compatibility import match_is_named_tuple Simple = typing.NamedTuple( - 'Simple', [('name', unicode), ('id', int), ('height', float)]) + 'Simple', [('name', str), ('id', int), 
('height', float)]) coders_registry.register_coder(Simple, RowCoder) Animal = typing.NamedTuple( - 'Animal', [('animal', unicode), ('max_speed', typing.Optional[float])]) + 'Animal', [('animal', str), ('max_speed', typing.Optional[float])]) coders_registry.register_coder(Animal, RowCoder) @@ -68,51 +65,67 @@ def check_df_pcoll_equal(actual): # dtype. For example: # pd.Series([b'abc'], dtype=bytes).dtype != 'S' # pd.Series([b'abc'], dtype=bytes).astype(bytes).dtype == 'S' +# (test data, pandas_type, column_name, beam_type) COLUMNS = [ - ([375, 24, 0, 10, 16], np.int32, 'i32'), - ([375, 24, 0, 10, 16], np.int64, 'i64'), - ([375, 24, None, 10, 16], pd.Int32Dtype(), 'i32_nullable'), - ([375, 24, None, 10, 16], pd.Int64Dtype(), 'i64_nullable'), - ([375., 24., None, 10., 16.], np.float64, 'f64'), - ([375., 24., None, 10., 16.], np.float32, 'f32'), - ([True, False, True, True, False], np.bool, 'bool'), - (['Falcon', 'Ostrich', None, 3.14, 0], np.object, 'any'), -] # type: typing.List[typing.Tuple[typing.List[typing.Any], typing.Any, str]] - -if schemas.PD_MAJOR >= 1: - COLUMNS.extend([ - ([True, False, True, None, False], pd.BooleanDtype(), 'bool_nullable'), - (['Falcon', 'Ostrich', None, 'Aardvark', 'Elephant'], - pd.StringDtype(), - 'strdtype'), - ]) - -NICE_TYPES_DF = pd.DataFrame(columns=[name for _, _, name in COLUMNS]) -for arr, dtype, name in COLUMNS: + ([375, 24, 0, 10, 16], np.int32, 'i32', np.int32), + ([375, 24, 0, 10, 16], np.int64, 'i64', np.int64), + ([375, 24, None, 10, 16], + pd.Int32Dtype(), + 'i32_nullable', + typing.Optional[np.int32]), + ([375, 24, None, 10, 16], + pd.Int64Dtype(), + 'i64_nullable', + typing.Optional[np.int64]), + ([375., 24., None, 10., 16.], + np.float64, + 'f64', + typing.Optional[np.float64]), + ([375., 24., None, 10., 16.], + np.float32, + 'f32', + typing.Optional[np.float32]), + ([True, False, True, True, False], bool, 'bool', bool), + (['Falcon', 'Ostrich', None, 3.14, 0], object, 'any', typing.Any), + ([True, False, True, None, False], + pd.BooleanDtype(), + 'bool_nullable', + typing.Optional[bool]), + (['Falcon', 'Ostrich', None, 'Aardvark', 'Elephant'], + pd.StringDtype(), + 'strdtype', + typing.Optional[str]), +] # type: typing.List[typing.Tuple[typing.List[typing.Any], typing.Any, str, typing.Any]] + +NICE_TYPES_DF = pd.DataFrame(columns=[name for _, _, name, _ in COLUMNS]) +for arr, dtype, name, _ in COLUMNS: NICE_TYPES_DF[name] = pd.Series(arr, dtype=dtype, name=name).astype(dtype) NICE_TYPES_PROXY = NICE_TYPES_DF[:0] -SERIES_TESTS = [(pd.Series(arr, dtype=dtype, name=name), arr) for arr, - dtype, - name in COLUMNS] +SERIES_TESTS = [(pd.Series(arr, dtype=dtype, name=name), arr, beam_type) + for (arr, dtype, name, beam_type) in COLUMNS] _TEST_ARRAYS = [ - arr for arr, _, _ in COLUMNS + arr for (arr, _, _, _) in COLUMNS ] # type: typing.List[typing.List[typing.Any]] DF_RESULT = list(zip(*_TEST_ARRAYS)) -INDEX_DF_TESTS = [ - (NICE_TYPES_DF.set_index([name for _, _, name in COLUMNS[:i]]), DF_RESULT) - for i in range(1, len(COLUMNS) + 1) -] +BEAM_SCHEMA = typing.NamedTuple( # type: ignore + 'BEAM_SCHEMA', [(name, beam_type) for _, _, name, beam_type in COLUMNS]) +INDEX_DF_TESTS = [( + NICE_TYPES_DF.set_index([name for _, _, name, _ in COLUMNS[:i]]), + DF_RESULT, + BEAM_SCHEMA) for i in range(1, len(COLUMNS) + 1)] + +NOINDEX_DF_TESTS = [(NICE_TYPES_DF, DF_RESULT, BEAM_SCHEMA)] -NOINDEX_DF_TESTS = [(NICE_TYPES_DF, DF_RESULT)] +PD_VERSION = tuple(int(n) for n in pd.__version__.split('.')) class SchemasTest(unittest.TestCase): def test_simple_df(self): 
expected = pd.DataFrame({ - 'name': list(unicode(i) for i in range(5)), + 'name': list(str(i) for i in range(5)), 'id': list(range(5)), 'height': list(float(i) for i in range(5)) }, @@ -121,16 +134,33 @@ def test_simple_df(self): with TestPipeline() as p: res = ( p - | beam.Create([ - Simple(name=unicode(i), id=i, height=float(i)) for i in range(5) - ]) + | beam.Create( + [Simple(name=str(i), id=i, height=float(i)) for i in range(5)]) + | schemas.BatchRowsAsDataFrame(min_batch_size=10, max_batch_size=10)) + assert_that(res, matches_df(expected)) + + def test_simple_df_with_beam_row(self): + expected = pd.DataFrame({ + 'name': list(str(i) for i in range(5)), + 'id': list(range(5)), + 'height': list(float(i) for i in range(5)) + }, + columns=['name', 'id', 'height']) + + with TestPipeline() as p: + res = ( + p + | beam.Create([(str(i), i, float(i)) for i in range(5)]) + | beam.Select( + name=lambda r: str(r[0]), + id=lambda r: int(r[1]), + height=lambda r: float(r[2])) | schemas.BatchRowsAsDataFrame(min_batch_size=10, max_batch_size=10)) assert_that(res, matches_df(expected)) def test_generate_proxy(self): expected = pd.DataFrame({ - 'animal': pd.Series( - dtype=np.object if schemas.PD_MAJOR < 1 else pd.StringDtype()), + 'animal': pd.Series(dtype=pd.StringDtype()), 'max_speed': pd.Series(dtype=np.float64) }) @@ -141,7 +171,10 @@ def test_nice_types_proxy_roundtrip(self): schemas.element_type_from_dataframe(NICE_TYPES_PROXY)) self.assertTrue(roundtripped.equals(NICE_TYPES_PROXY)) - @unittest.skipIf(schemas.PD_MAJOR < 1, "bytes not supported with 0.x") + @unittest.skipIf( + PD_VERSION == (1, 2, 1), + "Can't roundtrip bytes in pandas 1.2.1" + "https://github.com/pandas-dev/pandas/issues/39474") def test_bytes_proxy_roundtrip(self): proxy = pd.DataFrame({'bytes': []}) proxy.bytes = proxy.bytes.astype(bytes) @@ -187,8 +220,18 @@ def test_batch_with_df_transform(self): proxy=schemas.generate_proxy(Animal))) assert_that(res, equal_to([('Falcon', 375.), ('Parrot', 25.)])) + def assert_typehints_equal(self, left, right): + left = typehints.normalize(left) + right = typehints.normalize(right) + + if match_is_named_tuple(left): + self.assertTrue(match_is_named_tuple(right)) + self.assertEqual(left.__annotations__, right.__annotations__) + else: + self.assertEqual(left, right) + @parameterized.expand(SERIES_TESTS + NOINDEX_DF_TESTS) - def test_unbatch_no_index(self, df_or_series, rows): + def test_unbatch_no_index(self, df_or_series, rows, beam_type): proxy = df_or_series[:0] with TestPipeline() as p: @@ -196,10 +239,15 @@ def test_unbatch_no_index(self, df_or_series, rows): p | beam.Create([df_or_series[::2], df_or_series[1::2]]) | schemas.UnbatchPandas(proxy)) + # Verify that the unbatched PCollection has the expected typehint + # TODO(BEAM-8538): typehints should support NamedTuple so we can use + # typehints.is_consistent_with here instead + self.assert_typehints_equal(res.element_type, beam_type) + assert_that(res, equal_to(rows)) @parameterized.expand(SERIES_TESTS + INDEX_DF_TESTS) - def test_unbatch_with_index(self, df_or_series, rows): + def test_unbatch_with_index(self, df_or_series, rows, _): proxy = df_or_series[:0] with TestPipeline() as p: @@ -242,6 +290,20 @@ def test_unbatch_include_index_column_conflict_raises(self): with self.assertRaisesRegex(ValueError, 'foo'): _ = pc | schemas.UnbatchPandas(proxy, include_indexes=True) + def test_unbatch_datetime(self): + + s = pd.Series( + pd.date_range( + '1/1/2000', periods=100, freq='m', tz='America/Los_Angeles')) + proxy = s[:0] + + with 
TestPipeline() as p: + res = ( + p | beam.Create([s[::2], s[1::2]]) + | schemas.UnbatchPandas(proxy, include_indexes=True)) + + assert_that(res, equal_to(list(s))) + if __name__ == '__main__': unittest.main() diff --git a/sdks/python/apache_beam/dataframe/transforms.py b/sdks/python/apache_beam/dataframe/transforms.py index b403f32befef..b698448b3bbd 100644 --- a/sdks/python/apache_beam/dataframe/transforms.py +++ b/sdks/python/apache_beam/dataframe/transforms.py @@ -14,8 +14,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -from __future__ import absolute_import - import collections from typing import TYPE_CHECKING from typing import Any @@ -35,6 +33,10 @@ from apache_beam.dataframe import partitionings from apache_beam.utils import windowed_value +__all__ = [ + 'DataframeTransform', +] + if TYPE_CHECKING: # pylint: disable=ungrouped-imports from apache_beam.pvalue import PCollection @@ -52,15 +54,15 @@ class DataframeTransform(transforms.PTransform): """A PTransform for applying function that takes and returns dataframes to one or more PCollections. - DataframeTransform will accept a PCollection with a schema and batch it - into dataframes if necessary. In this case the proxy can be omitted: + :class:`DataframeTransform` will accept a PCollection with a `schema`_ and + batch it into :class:`~pandas.DataFrame` instances if necessary:: - (pcoll | beam.Row(key=..., foo=..., bar=...) + (pcoll | beam.Select(key=..., foo=..., bar=...) | DataframeTransform(lambda df: df.group_by('key').sum())) - It is also possible to process a PCollection of dataframes directly, in this - case a proxy must be provided. For example, if pcoll is a PCollection of - dataframes, one could write:: + It is also possible to process a PCollection of :class:`~pandas.DataFrame` + instances directly, in this case a "proxy" must be provided. For example, if + ``pcoll`` is a PCollection of DataFrames, one could write:: pcoll | DataframeTransform(lambda df: df.group_by('key').sum(), proxy=...) @@ -69,17 +71,28 @@ class DataframeTransform(transforms.PTransform): PCollections, in which case they will be passed as keyword arguments. Args: - yield_elements: (optional, default: "schemas") If set to "pandas", return - PCollections containing the raw Pandas objects (DataFrames or Series), - if set to "schemas", return an element-wise PCollection, where DataFrame - and Series instances are expanded to one element per row. DataFrames are - converted to schema-aware PCollections, where column values can be - accessed by attribute. - include_indexes: (optional, default: False) When yield_elements="schemas", - if include_indexes=True, attempt to include index columns in the output - schema for expanded DataFrames. Raises an error if any of the index - levels are unnamed (name=None), or if any of the names are not unique - among all column and index names. + yield_elements: (optional, default: "schemas") If set to ``"pandas"``, + return PCollection(s) containing the raw Pandas objects + (:class:`~pandas.DataFrame` or :class:`~pandas.Series` as appropriate). + If set to ``"schemas"``, return an element-wise PCollection, where + DataFrame and Series instances are expanded to one element per row. + DataFrames are converted to `schema-aware`_ PCollections, where column + values can be accessed by attribute. 
+ include_indexes: (optional, default: False) When + ``yield_elements="schemas"``, if ``include_indexes=True``, attempt to + include index columns in the output schema for expanded DataFrames. + Raises an error if any of the index levels are unnamed (name=None), or if + any of the names are not unique among all column and index names. + proxy: (optional) An empty :class:`~pandas.DataFrame` or + :class:`~pandas.Series` instance with the same ``dtype`` and ``name`` + as the elements of the input PCollection. Required when input + PCollection :class:`~pandas.DataFrame` or :class:`~pandas.Series` + elements. Ignored when input PCollection has a `schema`_. + + .. _schema: + https://beam.apache.org/documentation/programming-guide/#what-is-a-schema + .. _schema-aware: + https://beam.apache.org/documentation/programming-guide/#what-is-a-schema """ def __init__( self, func, proxy=None, yield_elements="schemas", include_indexes=False): @@ -173,7 +186,7 @@ def expand(self, pcolls): if len(tabular_inputs) == 0: partitioned_pcoll = next(pcolls.values()).pipeline | beam.Create([{}]) - elif self.stage.partitioning != partitionings.Nothing(): + elif self.stage.partitioning != partitionings.Arbitrary(): # Partitioning required for these operations. # Compute the number of partitions to use for the inputs based on # the estimated size of the inputs. @@ -257,7 +270,7 @@ class Stage(object): """ def __init__(self, inputs, partitioning): self.inputs = set(inputs) - if len(self.inputs) > 1 and partitioning == partitionings.Nothing(): + if len(self.inputs) > 1 and partitioning == partitionings.Arbitrary(): # We have to shuffle to co-locate, might as well partition. self.partitioning = partitionings.Index() else: @@ -282,30 +295,42 @@ def __repr__(self, indent=0): self.outputs)) # First define some helper functions. - def output_is_partitioned_by(expr, stage, partitioning): - if partitioning == partitionings.Nothing(): - # Always satisfied. - return True - elif stage.partitioning == partitionings.Singleton(): - # Within a stage, the singleton partitioning is trivially preserved. - return True - elif expr in stage.inputs: + def output_partitioning_in_stage(expr, stage): + """Return the output partitioning of expr when computed in stage, + or returns None if the expression cannot be computed in this stage. + """ + if expr in stage.inputs or expr in inputs: # Inputs are all partitioned by stage.partitioning. - return stage.partitioning.is_subpartitioning_of(partitioning) - elif expr.preserves_partition_by().is_subpartitioning_of(partitioning): - # Here expr preserves at least the requested partitioning; its outputs - # will also have this partitioning iff its inputs do. - if expr.requires_partition_by().is_subpartitioning_of(partitioning): - # If expr requires at least this partitioning, we will arrange such - # that its inputs satisfy this. - return True - else: - # Otherwise, recursively check all the inputs. - return all( - output_is_partitioned_by(arg, stage, partitioning) - for arg in expr.args()) - else: - return False + return stage.partitioning + + # Anything that's not an input must have arguments + assert len(expr.args()) + + arg_partitionings = set( + output_partitioning_in_stage(arg, stage) for arg in expr.args() + if not is_scalar(arg)) + + if len(arg_partitionings) == 0: + # All inputs are scalars, output partitioning isn't dependent on the + # input. 
+ return expr.preserves_partition_by() + + if len(arg_partitionings) > 1: + # Arguments must be identically partitioned, can't compute this + # expression here. + return None + + arg_partitioning = arg_partitionings.pop() + + if not expr.requires_partition_by().is_subpartitioning_of( + arg_partitioning): + # Arguments aren't partitioned sufficiently for this expression + return None + + return expressions.output_partitioning(expr, arg_partitioning) + + def is_computable_in_stage(expr, stage): + return output_partitioning_in_stage(expr, stage) is not None def common_stages(stage_lists): # Set intersection, with a preference for earlier items in the list. @@ -314,11 +339,11 @@ def common_stages(stage_lists): if all(stage in other for other in stage_lists[1:]): yield stage - @memoize + @_memoize def is_scalar(expr): return not isinstance(expr.proxy(), pd.core.generic.NDFrame) - @memoize + @_memoize def expr_to_stages(expr): assert expr not in inputs # First attempt to compute this expression as part of an existing stage, @@ -331,8 +356,7 @@ def expr_to_stages(expr): required_partitioning = expr.requires_partition_by() for stage in common_stages([expr_to_stages(arg) for arg in expr.args() if arg not in inputs]): - if all(output_is_partitioned_by(arg, stage, required_partitioning) - for arg in expr.args() if not is_scalar(arg)): + if is_computable_in_stage(expr, stage): break else: # Otherwise, compute this expression as part of a new stage. @@ -362,19 +386,19 @@ def expr_to_stage(expr): if expr not in inputs: expr_to_stage(expr).outputs.add(expr) - @memoize + @_memoize def stage_to_result(stage): return {expr._id: expr_to_pcoll(expr) for expr in stage.inputs} | ComputeStage(stage) - @memoize + @_memoize def expr_to_pcoll(expr): if expr in inputs: return inputs[expr] else: return stage_to_result(expr_to_stage(expr))[expr._id] - @memoize + @_memoize def estimate_size(expr, same_stage_ok): # Returns a pcollection of ints whose sum is the estimated size of the # given expression. @@ -391,7 +415,7 @@ def estimate_size(expr, same_stage_ok): expr_stage = expr_to_stage(expr) # If the stage doesn't start with a shuffle, it's not safe to fuse # the computation into its parent either. - has_shuffle = expr_stage.partitioning != partitionings.Nothing() + has_shuffle = expr_stage.partitioning != partitionings.Arbitrary() # We assume the size of an expression is the sum of the size of its # inputs, which may be off by quite a bit, but the goal is to get # within an order of magnitude or two. @@ -493,7 +517,7 @@ def finish_bundle(self): self.start_bundle() -def memoize(f): +def _memoize(f): cache = {} def wrapper(*args, **kwargs): diff --git a/sdks/python/apache_beam/dataframe/transforms_test.py b/sdks/python/apache_beam/dataframe/transforms_test.py index 20815c2b408a..c9bd972fcd4f 100644 --- a/sdks/python/apache_beam/dataframe/transforms_test.py +++ b/sdks/python/apache_beam/dataframe/transforms_test.py @@ -14,14 +14,10 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-from __future__ import absolute_import -from __future__ import division - import typing import unittest import pandas as pd -from past.builtins import unicode import apache_beam as beam from apache_beam import coders @@ -33,24 +29,21 @@ from apache_beam.testing.util import equal_to -def sort_by_value_and_drop_index(df): - if isinstance(df, pd.DataFrame): - sorted_df = df.sort_values(by=list(df.columns)) - else: - sorted_df = df.sort_values() - return sorted_df.reset_index(drop=True) - - -def check_correct(expected, actual, check_index=False): +def check_correct(expected, actual): if actual is None: raise AssertionError('Empty frame but expected: \n\n%s' % (expected)) if isinstance(expected, pd.core.generic.NDFrame): - sorted_actual = sort_by_value_and_drop_index(actual) - sorted_expected = sort_by_value_and_drop_index(expected) - if not sorted_actual.equals(sorted_expected): - raise AssertionError( - 'Dataframes not equal: \n\n%s\n\n%s' % - (sorted_actual, sorted_expected)) + expected = expected.sort_index() + actual = actual.sort_index() + + if isinstance(expected, pd.Series): + pd.testing.assert_series_equal(expected, actual) + elif isinstance(expected, pd.DataFrame): + pd.testing.assert_frame_equal(expected, actual) + else: + raise ValueError( + f"Expected value is a {type(expected)}," + "not a Series or DataFrame.") else: if actual != expected: raise AssertionError('Scalars not equal: %s != %s' % (actual, expected)) @@ -70,7 +63,7 @@ def df_equal_to(expected): AnimalSpeed = typing.NamedTuple( - 'AnimalSpeed', [('Animal', unicode), ('Speed', int)]) + 'AnimalSpeed', [('Animal', str), ('Speed', int)]) coders.registry.register_coder(AnimalSpeed, coders.RowCoder) Nested = typing.NamedTuple( 'Nested', [('id', int), ('animal_speed', AnimalSpeed)]) @@ -128,6 +121,30 @@ def test_groupby_sum_mean(self): self.run_scenario( df, lambda df: df.loc[df.Speed > 25].groupby('Animal').sum()) + def test_groupby_apply(self): + df = pd.DataFrame({ + 'group': ['a' if i % 5 == 0 or i % 3 == 0 else 'b' for i in range(100)], + 'foo': [None if i % 11 == 0 else i for i in range(100)], + 'bar': [None if i % 7 == 0 else 99 - i for i in range(100)], + 'baz': [None if i % 13 == 0 else i * 2 for i in range(100)], + }) + + def median_sum_fn(x): + return (x.foo + x.bar).median() + + describe = lambda df: df.describe() + + self.run_scenario(df, lambda df: df.groupby('group').foo.apply(describe)) + self.run_scenario( + df, lambda df: df.groupby('group')[['foo', 'bar']].apply(describe)) + self.run_scenario(df, lambda df: df.groupby('group').apply(median_sum_fn)) + self.run_scenario( + df, + lambda df: df.set_index('group').foo.groupby(level=0).apply(describe)) + self.run_scenario(df, lambda df: df.groupby(level=0).apply(median_sum_fn)) + self.run_scenario( + df, lambda df: df.groupby(lambda x: x % 3).apply(describe)) + def test_filter(self): df = pd.DataFrame({ 'Animal': ['Aardvark', 'Ant', 'Elephant', 'Zebra'], diff --git a/sdks/python/apache_beam/error.py b/sdks/python/apache_beam/error.py index 3165842e481a..18f1d60db6d3 100644 --- a/sdks/python/apache_beam/error.py +++ b/sdks/python/apache_beam/error.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - class BeamError(Exception): """Base class for all Beam errors.""" diff --git a/sdks/python/apache_beam/examples/__init__.py b/sdks/python/apache_beam/examples/__init__.py index 6569e3fe5de4..cce3acad34a4 100644 --- a/sdks/python/apache_beam/examples/__init__.py +++ b/sdks/python/apache_beam/examples/__init__.py @@ -14,5 +14,3 @@ # See the 
License for the specific language governing permissions and # limitations under the License. # - -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/examples/avro_bitcoin.py b/sdks/python/apache_beam/examples/avro_bitcoin.py index f877aafebceb..df7fad657d66 100644 --- a/sdks/python/apache_beam/examples/avro_bitcoin.py +++ b/sdks/python/apache_beam/examples/avro_bitcoin.py @@ -26,11 +26,11 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import logging +from avro.schema import Parse + import apache_beam as beam from apache_beam.io.avroio import ReadFromAvro from apache_beam.io.avroio import WriteToAvro @@ -38,13 +38,6 @@ from apache_beam.options.pipeline_options import PipelineOptions from apache_beam.options.pipeline_options import SetupOptions -# pylint: disable=wrong-import-order, wrong-import-position -try: - from avro.schema import Parse # avro-python3 library for python3 -except ImportError: - from avro.schema import parse as Parse # avro library for python2 -# pylint: enable=wrong-import-order, wrong-import-position - class BitcoinTxnCountDoFn(beam.DoFn): """Count inputs and outputs per transaction""" diff --git a/sdks/python/apache_beam/examples/complete/__init__.py b/sdks/python/apache_beam/examples/complete/__init__.py index 6569e3fe5de4..cce3acad34a4 100644 --- a/sdks/python/apache_beam/examples/complete/__init__.py +++ b/sdks/python/apache_beam/examples/complete/__init__.py @@ -14,5 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. # - -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/examples/complete/autocomplete.py b/sdks/python/apache_beam/examples/complete/autocomplete.py index 54a0a100ab4a..3de8ff7cf423 100644 --- a/sdks/python/apache_beam/examples/complete/autocomplete.py +++ b/sdks/python/apache_beam/examples/complete/autocomplete.py @@ -19,12 +19,9 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import logging import re -from builtins import range import apache_beam as beam from apache_beam.io import ReadFromText diff --git a/sdks/python/apache_beam/examples/complete/autocomplete_test.py b/sdks/python/apache_beam/examples/complete/autocomplete_test.py index 00f3b657a6ed..2048742ef62b 100644 --- a/sdks/python/apache_beam/examples/complete/autocomplete_test.py +++ b/sdks/python/apache_beam/examples/complete/autocomplete_test.py @@ -19,11 +19,9 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest -from nose.plugins.attrib import attr +import pytest import apache_beam as beam from apache_beam.examples.complete import autocomplete @@ -57,7 +55,7 @@ def test_top_prefixes(self): ('that', ((1, 'that'), )), ])) - @attr('IT') + @pytest.mark.it_postcommit def test_autocomplete_it(self): with TestPipeline(is_integration_test=True) as p: words = p | beam.io.ReadFromText(self.KINGLEAR_INPUT) diff --git a/sdks/python/apache_beam/examples/complete/distribopt.py b/sdks/python/apache_beam/examples/complete/distribopt.py index 30324ab97863..fb797a05dda3 100644 --- a/sdks/python/apache_beam/examples/complete/distribopt.py +++ b/sdks/python/apache_beam/examples/complete/distribopt.py @@ -51,9 +51,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import argparse import logging import string diff --git a/sdks/python/apache_beam/examples/complete/distribopt_test.py b/sdks/python/apache_beam/examples/complete/distribopt_test.py index 
bfd13ca7cf6d..067f326cd409 100644 --- a/sdks/python/apache_beam/examples/complete/distribopt_test.py +++ b/sdks/python/apache_beam/examples/complete/distribopt_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import os import tempfile diff --git a/sdks/python/apache_beam/examples/complete/estimate_pi.py b/sdks/python/apache_beam/examples/complete/estimate_pi.py index 167be40f640b..7f95989feba4 100644 --- a/sdks/python/apache_beam/examples/complete/estimate_pi.py +++ b/sdks/python/apache_beam/examples/complete/estimate_pi.py @@ -26,15 +26,10 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import argparse import json import logging import random -from builtins import object -from builtins import range from typing import Any from typing import Iterable from typing import Tuple diff --git a/sdks/python/apache_beam/examples/complete/estimate_pi_test.py b/sdks/python/apache_beam/examples/complete/estimate_pi_test.py index 85aa772f0789..ff224f4a9cab 100644 --- a/sdks/python/apache_beam/examples/complete/estimate_pi_test.py +++ b/sdks/python/apache_beam/examples/complete/estimate_pi_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest diff --git a/sdks/python/apache_beam/examples/complete/game/__init__.py b/sdks/python/apache_beam/examples/complete/game/__init__.py index 6569e3fe5de4..cce3acad34a4 100644 --- a/sdks/python/apache_beam/examples/complete/game/__init__.py +++ b/sdks/python/apache_beam/examples/complete/game/__init__.py @@ -14,5 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. # - -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/examples/complete/game/game_stats.py b/sdks/python/apache_beam/examples/complete/game/game_stats.py index 7725f5e1e32c..e6b6e3528422 100644 --- a/sdks/python/apache_beam/examples/complete/game/game_stats.py +++ b/sdks/python/apache_beam/examples/complete/game/game_stats.py @@ -73,10 +73,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import argparse import csv import logging diff --git a/sdks/python/apache_beam/examples/complete/game/game_stats_it_test.py b/sdks/python/apache_beam/examples/complete/game/game_stats_it_test.py index b2e6042db4e0..ea7ec28f4412 100644 --- a/sdks/python/apache_beam/examples/complete/game/game_stats_it_test.py +++ b/sdks/python/apache_beam/examples/complete/game/game_stats_it_test.py @@ -20,7 +20,7 @@ Code: beam/sdks/python/apache_beam/examples/complete/game/game_stats.py Usage: - python setup.py nosetests --test-pipeline-options=" \ + pytest --test-pipeline-options=" \ --runner=TestDataflowRunner \ --project=... \ --region=... 
\ @@ -33,15 +33,13 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import time import unittest import uuid +import pytest from hamcrest.core.core.allof import all_of -from nose.plugins.attrib import attr from apache_beam.examples.complete.game import game_stats from apache_beam.io.gcp.tests import utils @@ -105,7 +103,7 @@ def _cleanup_pubsub(self): test_utils.cleanup_subscriptions(self.sub_client, [self.input_sub]) test_utils.cleanup_topics(self.pub_client, [self.input_topic]) - @attr('IT') + @pytest.mark.it_postcommit def test_game_stats_it(self): state_verifier = PipelineStateMatcher(PipelineState.RUNNING) diff --git a/sdks/python/apache_beam/examples/complete/game/game_stats_test.py b/sdks/python/apache_beam/examples/complete/game/game_stats_test.py index fa5d785602b9..d085b90957d8 100644 --- a/sdks/python/apache_beam/examples/complete/game/game_stats_test.py +++ b/sdks/python/apache_beam/examples/complete/game/game_stats_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest diff --git a/sdks/python/apache_beam/examples/complete/game/hourly_team_score.py b/sdks/python/apache_beam/examples/complete/game/hourly_team_score.py index 6913fbf9a9c5..621a88c890a2 100644 --- a/sdks/python/apache_beam/examples/complete/game/hourly_team_score.py +++ b/sdks/python/apache_beam/examples/complete/game/hourly_team_score.py @@ -67,10 +67,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import argparse import csv import logging diff --git a/sdks/python/apache_beam/examples/complete/game/hourly_team_score_it_test.py b/sdks/python/apache_beam/examples/complete/game/hourly_team_score_it_test.py index 39ed32f6e32d..a9729bcdb1ca 100644 --- a/sdks/python/apache_beam/examples/complete/game/hourly_team_score_it_test.py +++ b/sdks/python/apache_beam/examples/complete/game/hourly_team_score_it_test.py @@ -20,7 +20,7 @@ Code: beam/sdks/python/apache_beam/examples/complete/game/hourly_team_score.py Usage: - python setup.py nosetests --test-pipeline-options=" \ + pytest --test-pipeline-options=" \ --runner=TestDataflowRunner \ --project=... \ --region=... 
\ @@ -33,13 +33,11 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest +import pytest from hamcrest.core.core.allof import all_of -from nose.plugins.attrib import attr from apache_beam.examples.complete.game import hourly_team_score from apache_beam.io.gcp.tests import utils @@ -65,7 +63,7 @@ def setUp(self): self.dataset_ref = utils.create_bq_dataset( self.project, self.OUTPUT_DATASET) - @attr('IT') + @pytest.mark.it_postcommit def test_hourly_team_score_it(self): state_verifier = PipelineStateMatcher(PipelineState.DONE) query = ( diff --git a/sdks/python/apache_beam/examples/complete/game/hourly_team_score_test.py b/sdks/python/apache_beam/examples/complete/game/hourly_team_score_test.py index a3d3d629b23a..804cc84ed2a0 100644 --- a/sdks/python/apache_beam/examples/complete/game/hourly_team_score_test.py +++ b/sdks/python/apache_beam/examples/complete/game/hourly_team_score_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest diff --git a/sdks/python/apache_beam/examples/complete/game/leader_board.py b/sdks/python/apache_beam/examples/complete/game/leader_board.py index 406d53e74637..5e04d6042745 100644 --- a/sdks/python/apache_beam/examples/complete/game/leader_board.py +++ b/sdks/python/apache_beam/examples/complete/game/leader_board.py @@ -81,10 +81,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import argparse import csv import logging diff --git a/sdks/python/apache_beam/examples/complete/game/leader_board_it_test.py b/sdks/python/apache_beam/examples/complete/game/leader_board_it_test.py index 970fa1b0fbf7..8b82c6468798 100644 --- a/sdks/python/apache_beam/examples/complete/game/leader_board_it_test.py +++ b/sdks/python/apache_beam/examples/complete/game/leader_board_it_test.py @@ -20,7 +20,7 @@ Code: beam/sdks/python/apache_beam/examples/complete/game/leader_board.py Usage: - python setup.py nosetests --test-pipeline-options=" \ + pytest --test-pipeline-options=" \ --runner=TestDataflowRunner \ --project=... \ --region=... 
\ @@ -33,16 +33,13 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import time import unittest import uuid -from builtins import range +import pytest from hamcrest.core.core.allof import all_of -from nose.plugins.attrib import attr from apache_beam.examples.complete.game import leader_board from apache_beam.io.gcp.tests import utils @@ -107,7 +104,7 @@ def _cleanup_pubsub(self): test_utils.cleanup_subscriptions(self.sub_client, [self.input_sub]) test_utils.cleanup_topics(self.pub_client, [self.input_topic]) - @attr('IT') + @pytest.mark.it_postcommit def test_leader_board_it(self): state_verifier = PipelineStateMatcher(PipelineState.RUNNING) @@ -133,6 +130,7 @@ def test_leader_board_it(self): self.project, teams_query, self.DEFAULT_EXPECTED_CHECKSUM) extra_opts = { + 'allow_unsafe_triggers': True, 'subscription': self.input_sub.name, 'dataset': self.dataset_ref.dataset_id, 'topic': self.input_topic.name, diff --git a/sdks/python/apache_beam/examples/complete/game/leader_board_test.py b/sdks/python/apache_beam/examples/complete/game/leader_board_test.py index b527208bfb46..1c1cd6548923 100644 --- a/sdks/python/apache_beam/examples/complete/game/leader_board_test.py +++ b/sdks/python/apache_beam/examples/complete/game/leader_board_test.py @@ -19,13 +19,12 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest import apache_beam as beam from apache_beam.examples.complete.game import leader_board +from apache_beam.options.pipeline_options import PipelineOptions from apache_beam.testing.test_pipeline import TestPipeline from apache_beam.testing.util import assert_that from apache_beam.testing.util import equal_to @@ -61,7 +60,8 @@ def test_leader_board_teams(self): ('team3', 13)])) def test_leader_board_users(self): - with TestPipeline() as p: + test_options = PipelineOptions(flags=['--allow_unsafe_triggers']) + with TestPipeline(options=test_options) as p: result = ( self.create_data(p) | leader_board.CalculateUserScores(allowed_lateness=120)) diff --git a/sdks/python/apache_beam/examples/complete/game/user_score.py b/sdks/python/apache_beam/examples/complete/game/user_score.py index 5d46b2552425..a87f2213ecc4 100644 --- a/sdks/python/apache_beam/examples/complete/game/user_score.py +++ b/sdks/python/apache_beam/examples/complete/game/user_score.py @@ -56,9 +56,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import argparse import csv import logging diff --git a/sdks/python/apache_beam/examples/complete/game/user_score_it_test.py b/sdks/python/apache_beam/examples/complete/game/user_score_it_test.py index 8d8dcab4817b..d26565ee1fe3 100644 --- a/sdks/python/apache_beam/examples/complete/game/user_score_it_test.py +++ b/sdks/python/apache_beam/examples/complete/game/user_score_it_test.py @@ -20,7 +20,7 @@ Code: beam/sdks/python/apache_beam/examples/complete/game/user_score.py Usage: - python setup.py nosetests --test-pipeline-options=" \ + pytest --test-pipeline-options=" \ --runner=TestDataflowRunner \ --project=... \ --region=... 
\ @@ -33,14 +33,12 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest import uuid +import pytest from hamcrest.core.core.allof import all_of -from nose.plugins.attrib import attr from apache_beam.examples.complete.game import user_score from apache_beam.runners.runner import PipelineState @@ -62,7 +60,7 @@ def setUp(self): self.output = '/'.join( [self.test_pipeline.get_option('output'), self.uuid, 'results']) - @attr('IT') + @pytest.mark.it_postcommit def test_user_score_it(self): state_verifier = PipelineStateMatcher(PipelineState.DONE) diff --git a/sdks/python/apache_beam/examples/complete/game/user_score_test.py b/sdks/python/apache_beam/examples/complete/game/user_score_test.py index f72daaceac15..2283e56a4dc8 100644 --- a/sdks/python/apache_beam/examples/complete/game/user_score_test.py +++ b/sdks/python/apache_beam/examples/complete/game/user_score_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest diff --git a/sdks/python/apache_beam/examples/complete/juliaset/__init__.py b/sdks/python/apache_beam/examples/complete/juliaset/__init__.py index 6569e3fe5de4..cce3acad34a4 100644 --- a/sdks/python/apache_beam/examples/complete/juliaset/__init__.py +++ b/sdks/python/apache_beam/examples/complete/juliaset/__init__.py @@ -14,5 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. # - -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/examples/complete/juliaset/juliaset/__init__.py b/sdks/python/apache_beam/examples/complete/juliaset/juliaset/__init__.py index 6569e3fe5de4..cce3acad34a4 100644 --- a/sdks/python/apache_beam/examples/complete/juliaset/juliaset/__init__.py +++ b/sdks/python/apache_beam/examples/complete/juliaset/juliaset/__init__.py @@ -14,5 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. 
# - -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/examples/complete/juliaset/juliaset/juliaset.py b/sdks/python/apache_beam/examples/complete/juliaset/juliaset/juliaset.py index dd062d1aa8ad..4f98105c66f1 100644 --- a/sdks/python/apache_beam/examples/complete/juliaset/juliaset/juliaset.py +++ b/sdks/python/apache_beam/examples/complete/juliaset/juliaset/juliaset.py @@ -22,11 +22,7 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import argparse -from builtins import range import apache_beam as beam from apache_beam.io import WriteToText diff --git a/sdks/python/apache_beam/examples/complete/juliaset/juliaset/juliaset_test.py b/sdks/python/apache_beam/examples/complete/juliaset/juliaset/juliaset_test.py index a50280d024e7..00a7c83445ba 100644 --- a/sdks/python/apache_beam/examples/complete/juliaset/juliaset/juliaset_test.py +++ b/sdks/python/apache_beam/examples/complete/juliaset/juliaset/juliaset_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import os import re diff --git a/sdks/python/apache_beam/examples/complete/juliaset/juliaset/juliaset_test_it.py b/sdks/python/apache_beam/examples/complete/juliaset/juliaset/juliaset_test_it.py index a5ada360d7c7..a2a3262a1fb6 100644 --- a/sdks/python/apache_beam/examples/complete/juliaset/juliaset/juliaset_test_it.py +++ b/sdks/python/apache_beam/examples/complete/juliaset/juliaset/juliaset_test_it.py @@ -19,15 +19,13 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import os import unittest import uuid +import pytest from hamcrest.core.core.allof import all_of -from nose.plugins.attrib import attr from apache_beam.examples.complete.juliaset.juliaset import juliaset from apache_beam.io.filesystems import FileSystems @@ -36,7 +34,7 @@ from apache_beam.testing.test_pipeline import TestPipeline -@attr('IT') +@pytest.mark.it_postcommit class JuliaSetTestIT(unittest.TestCase): GRID_SIZE = 1000 diff --git a/sdks/python/apache_beam/examples/complete/juliaset/juliaset_main.py b/sdks/python/apache_beam/examples/complete/juliaset/juliaset_main.py index 4b146c37a6c1..fb64c2702fd2 100644 --- a/sdks/python/apache_beam/examples/complete/juliaset/juliaset_main.py +++ b/sdks/python/apache_beam/examples/complete/juliaset/juliaset_main.py @@ -50,8 +50,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging from apache_beam.examples.complete.juliaset.juliaset import juliaset diff --git a/sdks/python/apache_beam/examples/complete/juliaset/setup.py b/sdks/python/apache_beam/examples/complete/juliaset/setup.py index 8d06c1f2cfaa..79e24b95c85f 100644 --- a/sdks/python/apache_beam/examples/complete/juliaset/setup.py +++ b/sdks/python/apache_beam/examples/complete/juliaset/setup.py @@ -27,9 +27,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import subprocess from distutils.command.build import build as _build # type: ignore diff --git a/sdks/python/apache_beam/examples/complete/tfidf.py b/sdks/python/apache_beam/examples/complete/tfidf.py index 29409f809f48..16ce2b8471a7 100644 --- a/sdks/python/apache_beam/examples/complete/tfidf.py +++ b/sdks/python/apache_beam/examples/complete/tfidf.py @@ -23,9 +23,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import argparse import glob import math diff --git a/sdks/python/apache_beam/examples/complete/tfidf_test.py 
b/sdks/python/apache_beam/examples/complete/tfidf_test.py index ec886d4a936c..1613ea52735c 100644 --- a/sdks/python/apache_beam/examples/complete/tfidf_test.py +++ b/sdks/python/apache_beam/examples/complete/tfidf_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import os import re diff --git a/sdks/python/apache_beam/examples/complete/top_wikipedia_sessions.py b/sdks/python/apache_beam/examples/complete/top_wikipedia_sessions.py index 20fdac02fd90..572775871b4a 100644 --- a/sdks/python/apache_beam/examples/complete/top_wikipedia_sessions.py +++ b/sdks/python/apache_beam/examples/complete/top_wikipedia_sessions.py @@ -42,8 +42,6 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import json import logging @@ -96,8 +94,8 @@ def expand(self, pcoll): FixedWindows(size=THIRTY_DAYS_IN_SECONDS)) | 'Top' >> combiners.core.CombineGlobally( combiners.TopCombineFn( - 10, - lambda first, second: first[1] < second[1])).without_defaults()) + n=10, key=lambda sessions_count: sessions_count[1])). + without_defaults()) class SessionsToStringsDoFn(beam.DoFn): diff --git a/sdks/python/apache_beam/examples/complete/top_wikipedia_sessions_test.py b/sdks/python/apache_beam/examples/complete/top_wikipedia_sessions_test.py index 40540cb08df5..3c171664e45d 100644 --- a/sdks/python/apache_beam/examples/complete/top_wikipedia_sessions_test.py +++ b/sdks/python/apache_beam/examples/complete/top_wikipedia_sessions_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import json import unittest diff --git a/sdks/python/apache_beam/examples/cookbook/__init__.py b/sdks/python/apache_beam/examples/cookbook/__init__.py index 6569e3fe5de4..cce3acad34a4 100644 --- a/sdks/python/apache_beam/examples/cookbook/__init__.py +++ b/sdks/python/apache_beam/examples/cookbook/__init__.py @@ -14,5 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. 
# - -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/examples/cookbook/bigquery_schema.py b/sdks/python/apache_beam/examples/cookbook/bigquery_schema.py index 74bbcc4ea020..bf06d8266868 100644 --- a/sdks/python/apache_beam/examples/cookbook/bigquery_schema.py +++ b/sdks/python/apache_beam/examples/cookbook/bigquery_schema.py @@ -24,8 +24,6 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import logging diff --git a/sdks/python/apache_beam/examples/cookbook/bigquery_side_input.py b/sdks/python/apache_beam/examples/cookbook/bigquery_side_input.py index 1b909a34d7bf..6f40b14226b5 100644 --- a/sdks/python/apache_beam/examples/cookbook/bigquery_side_input.py +++ b/sdks/python/apache_beam/examples/cookbook/bigquery_side_input.py @@ -29,11 +29,8 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import logging -from builtins import range from random import randrange import apache_beam as beam @@ -99,10 +96,9 @@ def run(argv=None): ignore_corpus = known_args.ignore_corpus ignore_word = known_args.ignore_word - pcoll_corpus = p | 'read corpus' >> beam.io.Read( - beam.io.BigQuerySource(query=query_corpus)) - pcoll_word = p | 'read_words' >> beam.io.Read( - beam.io.BigQuerySource(query=query_word)) + pcoll_corpus = p | 'read corpus' >> beam.io.ReadFromBigQuery( + query=query_corpus) + pcoll_word = p | 'read_words' >> beam.io.ReadFromBigQuery(query=query_word) pcoll_ignore_corpus = p | 'create_ignore_corpus' >> beam.Create( [ignore_corpus]) pcoll_ignore_word = p | 'create_ignore_word' >> beam.Create([ignore_word]) diff --git a/sdks/python/apache_beam/examples/cookbook/bigquery_side_input_test.py b/sdks/python/apache_beam/examples/cookbook/bigquery_side_input_test.py index d4320d4be2ed..ba9b61e342bd 100644 --- a/sdks/python/apache_beam/examples/cookbook/bigquery_side_input_test.py +++ b/sdks/python/apache_beam/examples/cookbook/bigquery_side_input_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest diff --git a/sdks/python/apache_beam/examples/cookbook/bigquery_tornadoes.py b/sdks/python/apache_beam/examples/cookbook/bigquery_tornadoes.py index 81d5ea8d2103..224a2ad586c1 100644 --- a/sdks/python/apache_beam/examples/cookbook/bigquery_tornadoes.py +++ b/sdks/python/apache_beam/examples/cookbook/bigquery_tornadoes.py @@ -34,8 +34,6 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import logging diff --git a/sdks/python/apache_beam/examples/cookbook/bigquery_tornadoes_it_test.py b/sdks/python/apache_beam/examples/cookbook/bigquery_tornadoes_it_test.py index 70ff75737b54..8d4461560920 100644 --- a/sdks/python/apache_beam/examples/cookbook/bigquery_tornadoes_it_test.py +++ b/sdks/python/apache_beam/examples/cookbook/bigquery_tornadoes_it_test.py @@ -19,15 +19,12 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import time import unittest -from builtins import round +import pytest from hamcrest.core.core.allof import all_of -from nose.plugins.attrib import attr from apache_beam.examples.cookbook import bigquery_tornadoes from apache_beam.io.gcp.tests import utils @@ -38,14 +35,11 @@ class BigqueryTornadoesIT(unittest.TestCase): - # Enable nose tests running in parallel - _multiprocess_can_split_ = True - # The default checksum is a SHA-1 hash generated from sorted rows reading # from expected Bigquery table. 
DEFAULT_CHECKSUM = 'd860e636050c559a16a791aff40d6ad809d4daf0' - @attr('IT') + @pytest.mark.it_postcommit def test_bigquery_tornadoes_it(self): test_pipeline = TestPipeline(is_integration_test=True) diff --git a/sdks/python/apache_beam/examples/cookbook/bigquery_tornadoes_test.py b/sdks/python/apache_beam/examples/cookbook/bigquery_tornadoes_test.py index 10d6d40cfeaf..e9f3f679ad80 100644 --- a/sdks/python/apache_beam/examples/cookbook/bigquery_tornadoes_test.py +++ b/sdks/python/apache_beam/examples/cookbook/bigquery_tornadoes_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest diff --git a/sdks/python/apache_beam/examples/cookbook/bigtableio_it_test.py b/sdks/python/apache_beam/examples/cookbook/bigtableio_it_test.py index 666155bd7fc1..f12c9457d53f 100644 --- a/sdks/python/apache_beam/examples/cookbook/bigtableio_it_test.py +++ b/sdks/python/apache_beam/examples/cookbook/bigtableio_it_test.py @@ -18,8 +18,6 @@ """Unittest for GCP Bigtable testing.""" # pytype: skip-file -from __future__ import absolute_import - import datetime import logging import random diff --git a/sdks/python/apache_beam/examples/cookbook/coders.py b/sdks/python/apache_beam/examples/cookbook/coders.py index c85110ecac48..cb8d4c29cdae 100644 --- a/sdks/python/apache_beam/examples/cookbook/coders.py +++ b/sdks/python/apache_beam/examples/cookbook/coders.py @@ -30,12 +30,9 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import json import logging -from builtins import object import apache_beam as beam from apache_beam.io import ReadFromText @@ -47,7 +44,7 @@ class JsonCoder(object): """A JSON coder interpreting each line as a JSON string.""" def encode(self, x): - return json.dumps(x) + return json.dumps(x).encode('utf-8') def decode(self, x): return json.loads(x) diff --git a/sdks/python/apache_beam/examples/cookbook/coders_test.py b/sdks/python/apache_beam/examples/cookbook/coders_test.py index 61a928000aab..c2325eac51f0 100644 --- a/sdks/python/apache_beam/examples/cookbook/coders_test.py +++ b/sdks/python/apache_beam/examples/cookbook/coders_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest diff --git a/sdks/python/apache_beam/examples/cookbook/combiners_test.py b/sdks/python/apache_beam/examples/cookbook/combiners_test.py index ead360f248c3..c24f60b5dcfe 100644 --- a/sdks/python/apache_beam/examples/cookbook/combiners_test.py +++ b/sdks/python/apache_beam/examples/cookbook/combiners_test.py @@ -25,8 +25,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest diff --git a/sdks/python/apache_beam/examples/cookbook/custom_ptransform.py b/sdks/python/apache_beam/examples/cookbook/custom_ptransform.py index d596e4b77fd3..4beab18aab1e 100644 --- a/sdks/python/apache_beam/examples/cookbook/custom_ptransform.py +++ b/sdks/python/apache_beam/examples/cookbook/custom_ptransform.py @@ -22,8 +22,6 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import logging diff --git a/sdks/python/apache_beam/examples/cookbook/custom_ptransform_test.py b/sdks/python/apache_beam/examples/cookbook/custom_ptransform_test.py index 21afc299e3ab..eb464d0b83c5 100644 --- a/sdks/python/apache_beam/examples/cookbook/custom_ptransform_test.py +++ b/sdks/python/apache_beam/examples/cookbook/custom_ptransform_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging 
import unittest diff --git a/sdks/python/apache_beam/examples/cookbook/datastore_wordcount.py b/sdks/python/apache_beam/examples/cookbook/datastore_wordcount.py index 39c76ca80ee8..c893a2cd8b2d 100644 --- a/sdks/python/apache_beam/examples/cookbook/datastore_wordcount.py +++ b/sdks/python/apache_beam/examples/cookbook/datastore_wordcount.py @@ -55,9 +55,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import argparse import logging import re @@ -66,7 +63,6 @@ from typing import Optional from typing import Text import uuid -from builtins import object import apache_beam as beam from apache_beam.io import ReadFromText diff --git a/sdks/python/apache_beam/examples/cookbook/datastore_wordcount_it_test.py b/sdks/python/apache_beam/examples/cookbook/datastore_wordcount_it_test.py index 0cff4c40d481..388caf6cb58b 100644 --- a/sdks/python/apache_beam/examples/cookbook/datastore_wordcount_it_test.py +++ b/sdks/python/apache_beam/examples/cookbook/datastore_wordcount_it_test.py @@ -19,14 +19,12 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import time import unittest +import pytest from hamcrest.core.core.allof import all_of -from nose.plugins.attrib import attr from apache_beam.testing.pipeline_verifiers import FileChecksumMatcher from apache_beam.testing.pipeline_verifiers import PipelineStateMatcher @@ -44,7 +42,7 @@ class DatastoreWordCountIT(unittest.TestCase): DATASTORE_WORDCOUNT_KIND = "DatastoreWordCount" EXPECTED_CHECKSUM = '826f69ed0275858c2e098f1e8407d4e3ba5a4b3f' - @attr('IT') + @pytest.mark.it_postcommit def test_datastore_wordcount_it(self): test_pipeline = TestPipeline(is_integration_test=True) kind = self.DATASTORE_WORDCOUNT_KIND diff --git a/sdks/python/apache_beam/examples/cookbook/filters.py b/sdks/python/apache_beam/examples/cookbook/filters.py index 95f32f9babcc..fda07064fa0c 100644 --- a/sdks/python/apache_beam/examples/cookbook/filters.py +++ b/sdks/python/apache_beam/examples/cookbook/filters.py @@ -26,8 +26,6 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import logging @@ -90,7 +88,7 @@ def run(argv=None): with beam.Pipeline(argv=pipeline_args) as p: - input_data = p | beam.io.Read(beam.io.BigQuerySource(known_args.input)) + input_data = p | beam.io.ReadFromBigQuery(table=known_args.input) # pylint: disable=expression-not-assigned ( diff --git a/sdks/python/apache_beam/examples/cookbook/filters_test.py b/sdks/python/apache_beam/examples/cookbook/filters_test.py index 477fc7e5fb32..a75253e3e575 100644 --- a/sdks/python/apache_beam/examples/cookbook/filters_test.py +++ b/sdks/python/apache_beam/examples/cookbook/filters_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest diff --git a/sdks/python/apache_beam/examples/cookbook/group_with_coder.py b/sdks/python/apache_beam/examples/cookbook/group_with_coder.py index 2249ec668dea..3ce7836b491a 100644 --- a/sdks/python/apache_beam/examples/cookbook/group_with_coder.py +++ b/sdks/python/apache_beam/examples/cookbook/group_with_coder.py @@ -27,13 +27,10 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import logging import sys import typing -from builtins import object import apache_beam as beam from apache_beam import coders diff --git a/sdks/python/apache_beam/examples/cookbook/group_with_coder_test.py b/sdks/python/apache_beam/examples/cookbook/group_with_coder_test.py index d896f48a9935..e29807fb68f4 100644 
--- a/sdks/python/apache_beam/examples/cookbook/group_with_coder_test.py +++ b/sdks/python/apache_beam/examples/cookbook/group_with_coder_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import tempfile import unittest @@ -80,22 +78,17 @@ def test_basics_without_type_check(self): # therefore any custom coders will not be used. The default coder (pickler) # will be used instead. temp_path = self.create_temp_file(self.SAMPLE_RECORDS) - group_with_coder.run([ - '--no_pipeline_type_check', - '--input=%s*' % temp_path, - '--output=%s.result' % temp_path - ], - save_main_session=False) - # Parse result file and compare. - results = [] - with open_shards(temp_path + '.result-*-of-*') as result_file: - for line in result_file: - name, points = line.split(',') - results.append((name, int(points))) - logging.info('result: %s', results) - self.assertEqual( - sorted(results), - sorted([('ann', 15), ('fred', 9), ('joe', 60), ('mary', 8)])) + with self.assertRaises(Exception) as context: + # yapf: disable + group_with_coder.run( + [ + '--no_pipeline_type_check', + '--input=%s*' % temp_path, + '--output=%s.result' % temp_path + ], + save_main_session=False) + self.assertIn('Unable to deterministically encode', str(context.exception)) + self.assertIn('CombinePerKey(sum)/GroupByKey', str(context.exception)) if __name__ == '__main__': diff --git a/sdks/python/apache_beam/examples/cookbook/mergecontacts.py b/sdks/python/apache_beam/examples/cookbook/mergecontacts.py index 2c0780cb0f14..81df530102e3 100644 --- a/sdks/python/apache_beam/examples/cookbook/mergecontacts.py +++ b/sdks/python/apache_beam/examples/cookbook/mergecontacts.py @@ -31,12 +31,9 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import logging import re -from builtins import next import apache_beam as beam from apache_beam.io import ReadFromText diff --git a/sdks/python/apache_beam/examples/cookbook/mergecontacts_test.py b/sdks/python/apache_beam/examples/cookbook/mergecontacts_test.py index adb81e75111e..72291fce0751 100644 --- a/sdks/python/apache_beam/examples/cookbook/mergecontacts_test.py +++ b/sdks/python/apache_beam/examples/cookbook/mergecontacts_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import tempfile import unittest diff --git a/sdks/python/apache_beam/examples/cookbook/multiple_output_pardo.py b/sdks/python/apache_beam/examples/cookbook/multiple_output_pardo.py index 5f4e55734aa8..1dde20a49d09 100644 --- a/sdks/python/apache_beam/examples/cookbook/multiple_output_pardo.py +++ b/sdks/python/apache_beam/examples/cookbook/multiple_output_pardo.py @@ -51,8 +51,6 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import logging import re diff --git a/sdks/python/apache_beam/examples/cookbook/multiple_output_pardo_test.py b/sdks/python/apache_beam/examples/cookbook/multiple_output_pardo_test.py index 676050eb290a..e811aacc577e 100644 --- a/sdks/python/apache_beam/examples/cookbook/multiple_output_pardo_test.py +++ b/sdks/python/apache_beam/examples/cookbook/multiple_output_pardo_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import re import tempfile diff --git a/sdks/python/apache_beam/examples/dataframe/README.md b/sdks/python/apache_beam/examples/dataframe/README.md new file mode 100644 index 000000000000..25fca9ef7335 --- /dev/null +++ b/sdks/python/apache_beam/examples/dataframe/README.md @@ -0,0 
+1,182 @@ + + +# Example DataFrame API Pipelines + +This module contains example pipelines that use the [Beam DataFrame +API](https://beam.apache.org/documentation/dsls/dataframes/overview/). + +## Pre-requisites + +You must have `apache-beam>=2.30.0` installed in order to run these pipelines, +because the `apache_beam.examples.dataframe` module was added in that release. +Additionally using the DataFrame API requires `pandas>=1.0.0` to be installed +in your local Python session. The _same_ version should be installed on workers +when executing DataFrame API pipelines on distributed runners. Reference +[`base_image_requirements.txt`](../../../container/base_image_requirements.txt) +for the Beam release you are using to see what version of pandas will be used +by default on distributed workers. + +## Wordcount Pipeline + +Wordcount is the "Hello World" of data analytic systems, so of course we +had to implement it for the Beam DataFrame API! See [`wordcount.py`](./wordcount.py) for the +implementation. Note it demonstrates how to integrate the DataFrame API with +a larger Beam pipeline by using [Beam +Schemas](https://beam.apache.org/documentation/programming-guide/#what-is-a-schema) +in conjunction with +[to_dataframe](https://beam.apache.org/releases/pydoc/current/apache_beam.dataframe.convert.html#apache_beam.dataframe.convert.to_dataframe) +and +[to_pcollection](https://beam.apache.org/releases/pydoc/current/apache_beam.dataframe.convert.html#apache_beam.dataframe.convert.to_pcollection). + +### Running the pipeline + +To run the pipeline locally: + +```sh +python -m apache_beam.examples.dataframe.wordcount \ + --input gs://dataflow-samples/shakespeare/kinglear.txt \ + --output counts +``` + +This will produce files like `counts-XXXXX-of-YYYYY` with contents like: +``` +KING: 243 +LEAR: 236 +DRAMATIS: 1 +PERSONAE: 1 +king: 65 +of: 447 +Britain: 2 +OF: 15 +FRANCE: 10 +DUKE: 3 +... +``` + +## Taxi Ride Example Pipelines + +[`taxiride.py`](./taxiride.py) contains implementations for two DataFrame pipelines that +process the well-known [NYC Taxi +dataset](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page). These +pipelines don't use any Beam primitives. Instead they build end-to-end pipelines +using the DataFrame API, by leveraging [DataFrame +IOs](https://beam.apache.org/releases/pydoc/current/apache_beam.dataframe.io.html). + +The module defines two pipelines. The `location_id_agg` pipeline does a grouped +aggregation on the drop-off location ID. The `borough_enrich` pipeline extends +this example by joining the zone lookup table to find the borough where each +drop off occurred, and aggregate per borough. + +### Data +Some snapshots of NYC taxi data have been staged in +`gs://apache-beam-samples` for use with these example pipelines: + +- `gs://apache-beam-samples/nyc_taxi/2017/yellow_tripdata_2017-*.csv`: CSV files + containing taxi ride data for each month of 2017 (similar directories exist + for 2018 and 2019). +- `gs://apache-beam-samples/nyc_taxi/misc/sample.csv`: A sample of 1 million + records from the beginning of 2019. At ~85 MiB this is a manageable size for + processing locally. +- `gs://apache-beam-samples/nyc_taxi/misc/taxi+_zone_lookup.csv`: Lookup table + with information about Zone IDs. Used by the `borough_enrich` pipeline. 
+ +### Running `location_id_agg` +To run the aggregation pipeline locally, use the following command: +```sh +python -m apache_beam.examples.dataframe.taxiride \ + --pipeline location_id_agg \ + --input gs://apache-beam-samples/nyc_taxi/misc/sample.csv \ + --output aggregation.csv +``` + +This will write the output to files like `aggregation.csv-XXXXX-of-YYYYY` with +contents like: +``` +DOLocationID,passenger_count +1,3852 +3,130 +4,7725 +5,24 +6,37 +7,7429 +8,24 +9,180 +10,938 +... +``` + +### Running `borough_enrich` +To run the enrich pipeline locally, use the command: +```sh +python -m apache_beam.examples.dataframe.taxiride \ + --pipeline borough_enrich \ + --input gs://apache-beam-samples/nyc_taxi/misc/sample.csv \ + --output enrich.csv +``` + +This will write the output to files like `enrich.csv-XXXXX-of-YYYYY` with +contents like: +``` +Borough,passenger_count +Bronx,13645 +Brooklyn,70654 +EWR,3852 +Manhattan,1417124 +Queens,81138 +Staten Island,531 +Unknown,28527 +``` + +## Flight Delay pipeline (added in 2.31.0) +[`flight_delays.py`](./flight_delays.py) contains an implementation of +a pipeline that processes the flight ontime data from +`bigquery-samples.airline_ontime_data.flights`. It uses a conventional Beam +pipeline to read from BigQuery, apply a 24-hour rolling window, and define a +Beam schema for the data. Then it converts to DataFrames in order to perform +a complex aggregation using `GroupBy.apply`, and write the result out with +`to_csv`. Note that the DataFrame computation respects the 24-hour window +applied above, and results are partitioned into separate files per day. + +### Running the pipeline +To run the pipeline locally: + +```sh +python -m apache_beam.examples.dataframe.flight_delays \ + --start_date 2012-12-24 \ + --end_date 2012-12-25 \ + --output gs:///
    /delays.csv \ + --project \ + --temp_location gs:/// +``` + +Note a GCP `project` and `temp_location` are required for reading from BigQuery. + +This will produce files like +`gs:////delays.csv-2012-12-24T00:00:00-2012-12-25T00:00:00-XXXXX-of-YYYYY` +with contents tracking average delays per airline on that day, for example: +``` +airline,departure_delay,arrival_delay +EV,10.01901901901902,4.431431431431432 +HA,-1.0829015544041452,0.010362694300518135 +UA,19.142555438225976,11.07180570221753 +VX,62.755102040816325,62.61224489795919 +WN,12.074298711144806,6.717968157695224 +... +``` diff --git a/sdks/python/apache_beam/examples/dataframe/__init__.py b/sdks/python/apache_beam/examples/dataframe/__init__.py new file mode 100644 index 000000000000..cce3acad34a4 --- /dev/null +++ b/sdks/python/apache_beam/examples/dataframe/__init__.py @@ -0,0 +1,16 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# diff --git a/sdks/python/apache_beam/examples/dataframe/data/taxiride_2018_aggregation_truth.csv b/sdks/python/apache_beam/examples/dataframe/data/taxiride_2018_aggregation_truth.csv new file mode 100644 index 000000000000..a6f7cc18c4bc --- /dev/null +++ b/sdks/python/apache_beam/examples/dataframe/data/taxiride_2018_aggregation_truth.csv @@ -0,0 +1,283 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# +# Expected output for apache_beam.examples.dataframe.taxiride aggregation +# pipeline with input gs://apache-beam-samples/nyc_taxi/2018/*.csv +DOLocationID,passenger_count +1,322785 +2,126 +3,11462 +4,765196 +5,1129 +6,3233 +7,690775 +8,2745 +9,12449 +10,71098 +11,12153 +12,128193 +13,1554566 +14,121699 +15,15656 +16,24503 +17,216383 +18,24966 +19,10986 +20,15848 +21,21407 +22,31214 +23,4207 +24,543676 +25,309075 +26,38391 +27,1863 +28,41136 +29,16975 +30,574 +31,8262 +32,15028 +33,471728 +34,18156 +35,27688 +36,130194 +37,204445 +38,13124 +39,37766 +40,181880 +41,1082726 +42,724974 +43,2138199 +44,879 +45,476163 +46,2665 +47,16868 +48,4565446 +49,275157 +50,1918576 +51,22594 +52,133087 +53,16560 +54,43805 +55,21054 +56,44920 +57,8374 +58,3600 +59,1124 +60,11858 +61,258716 +62,68486 +63,17464 +64,13871 +65,308587 +66,253356 +67,26034 +68,3649382 +69,72461 +70,69129 +71,30635 +72,29204 +73,9531 +74,1290532 +75,1944536 +76,48657 +77,13536 +78,17909 +79,3793408 +80,266865 +81,11246 +82,140149 +83,63732 +84,1367 +85,25466 +86,10757 +87,1332524 +88,675278 +89,137834 +90,2395749 +91,32698 +92,70710 +93,29590 +94,12862 +95,192248 +96,2807 +97,268138 +98,20011 +99,89 +100,2551666 +101,10769 +102,21921 +104,1 +105,177 +106,81012 +107,3424400 +108,11452 +109,1699 +110,9 +111,1995 +112,365376 +113,2205186 +114,1741462 +115,2657 +116,459021 +117,9686 +118,3279 +119,35594 +120,6613 +121,26243 +122,8021 +123,24449 +124,17708 +125,841654 +126,20923 +127,120944 +128,9948 +129,282285 +130,49061 +131,22778 +132,1469797 +133,48876 +134,38954 +135,36338 +136,22307 +137,2295038 +138,1840138 +139,14768 +140,2906002 +141,3920899 +142,4382733 +143,2137720 +144,1518062 +145,497247 +146,210157 +147,18147 +148,1782752 +149,19561 +150,10083 +151,1381739 +152,276888 +153,6846 +154,1672 +155,22919 +156,1884 +157,49558 +158,1664306 +159,52129 +160,46973 +161,6176778 +162,5007447 +163,3953647 +164,3561583 +165,35265 +166,1230580 +167,26377 +168,128775 +169,31687 +170,5161294 +171,24697 +172,1966 +173,49021 +174,22149 +175,13123 +176,1184 +177,34208 +178,12609 +179,218209 +180,12416 +181,529150 +182,23097 +183,8748 +184,1176 +185,19379 +186,4350212 +187,962 +188,118354 +189,162667 +190,30577 +191,21350 +192,12914 +193,120494 +194,23144 +195,65935 +196,57684 +197,42170 +198,84035 +199,27 +200,49920 +201,7460 +202,72940 +203,16684 +204,897 +205,26487 +206,3422 +207,8605 +208,23031 +209,440472 +210,22758 +211,1365091 +212,21669 +213,29142 +214,2912 +215,20573 +216,65392 +217,53220 +218,25723 +219,35182 +220,62297 +221,3560 +222,8543 +223,308105 +224,663996 +225,157867 +226,389249 +227,36794 +228,98122 +229,2942385 +230,5261211 +231,2669490 +232,757350 +233,2418181 +234,4380823 +235,33308 +236,6295709 +237,5843189 +238,3405670 +239,4082449 +240,4699 +241,26449 +242,19209 +243,296255 +244,560156 +245,1880 +246,2935385 +247,72010 +248,19444 +249,2627767 +250,16423 +251,2444 +252,23272 +253,1195 +254,20046 +255,448683 +256,393910 +257,78543 +258,29391 +259,17508 +260,181074 +261,820459 +262,2095536 +263,3116716 +264,2036397 +265,371132 diff --git a/sdks/python/apache_beam/examples/dataframe/data/taxiride_2018_enrich_truth.csv b/sdks/python/apache_beam/examples/dataframe/data/taxiride_2018_enrich_truth.csv new file mode 100644 index 000000000000..a89357b46625 --- /dev/null +++ b/sdks/python/apache_beam/examples/dataframe/data/taxiride_2018_enrich_truth.csv @@ -0,0 +1,26 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. 
See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# Expected output for apache_beam.examples.dataframe.taxiride enrich +# pipeline with input gs://apache-beam-samples/nyc_taxi/2018/*.csv +Borough,passenger_count +Manhattan,145129096 +Queens,8024331 +Brooklyn,6972533 +Bronx,1074423 +EWR,322785 +Unknown,2407529 +Staten Island,39659 diff --git a/sdks/python/apache_beam/examples/dataframe/flight_delays.py b/sdks/python/apache_beam/examples/dataframe/flight_delays.py new file mode 100644 index 000000000000..2048376b406b --- /dev/null +++ b/sdks/python/apache_beam/examples/dataframe/flight_delays.py @@ -0,0 +1,137 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""A pipeline using dataframes to compute typical flight delay times.""" + +# pytype: skip-file + +from __future__ import absolute_import + +import argparse +import logging + +import apache_beam as beam +from apache_beam.dataframe.convert import to_dataframe +from apache_beam.options.pipeline_options import PipelineOptions + + +def get_mean_delay_at_top_airports(airline_df): + arr = airline_df.rename(columns={ + 'arrival_airport': 'airport' + }).airport.value_counts() + dep = airline_df.rename(columns={ + 'departure_airport': 'airport' + }).airport.value_counts() + total = arr + dep + # Note we keep all to include duplicates. 
+ # This ensures the result is deterministic + top_airports = total.nlargest(10, keep='all') + at_top_airports = airline_df['arrival_airport'].isin( + top_airports.index.values) + return airline_df[at_top_airports].mean() + + +def input_date(date): + import datetime + parsed = datetime.datetime.strptime(date, '%Y-%m-%d') + if (parsed > datetime.datetime(2012, 12, 31) or + parsed < datetime.datetime(2002, 1, 1)): + raise ValueError("There's only data from 2002-01-01 to 2012-12-31") + return date + + +def run_flight_delay_pipeline( + pipeline, start_date=None, end_date=None, output=None): + query = f""" + SELECT + date, + airline, + departure_airport, + arrival_airport, + departure_delay, + arrival_delay + FROM `bigquery-samples.airline_ontime_data.flights` + WHERE date >= '{start_date}' AND date <= '{end_date}' + """ + + # Import this here to avoid pickling the main session. + import time + import datetime + from apache_beam import window + + def to_unixtime(s): + return time.mktime(datetime.datetime.strptime(s, "%Y-%m-%d").timetuple()) + + # The pipeline will be run on exiting the with block. + with pipeline as p: + tbl = ( + p + | 'read table' >> beam.io.ReadFromBigQuery( + query=query, use_standard_sql=True) + | 'assign timestamp' >> + beam.Map(lambda x: window.TimestampedValue(x, to_unixtime(x['date']))) + # Use beam.Select to make sure data has a schema + # The casts in lambdas ensure data types are properly inferred + | 'set schema' >> beam.Select( + date=lambda x: str(x['date']), + airline=lambda x: str(x['airline']), + departure_airport=lambda x: str(x['departure_airport']), + arrival_airport=lambda x: str(x['arrival_airport']), + departure_delay=lambda x: float(x['departure_delay']), + arrival_delay=lambda x: float(x['arrival_delay']))) + + daily = tbl | 'daily windows' >> beam.WindowInto( + beam.window.FixedWindows(60 * 60 * 24)) + + # group the flights data by carrier + df = to_dataframe(daily) + result = df.groupby('airline').apply(get_mean_delay_at_top_airports) + result.to_csv(output) + + +def run(argv=None): + """Main entry point; defines and runs the flight delay pipeline.""" + parser = argparse.ArgumentParser() + parser.add_argument( + '--start_date', + dest='start_date', + type=input_date, + default='2012-12-22', + help='YYYY-MM-DD lower bound (inclusive) for input dataset.') + parser.add_argument( + '--end_date', + dest='end_date', + type=input_date, + default='2012-12-26', + help='YYYY-MM-DD upper bound (inclusive) for input dataset.') + parser.add_argument( + '--output', + dest='output', + required=True, + help='Location to write the output.') + known_args, pipeline_args = parser.parse_known_args(argv) + + run_flight_delay_pipeline( + beam.Pipeline(options=PipelineOptions(pipeline_args)), + start_date=known_args.start_date, + end_date=known_args.end_date, + output=known_args.output) + + +if __name__ == '__main__': + logging.getLogger().setLevel(logging.INFO) + run() diff --git a/sdks/python/apache_beam/examples/dataframe/flight_delays_it_test.py b/sdks/python/apache_beam/examples/dataframe/flight_delays_it_test.py new file mode 100644 index 000000000000..6ed376e247f0 --- /dev/null +++ b/sdks/python/apache_beam/examples/dataframe/flight_delays_it_test.py @@ -0,0 +1,138 @@ +# -*- coding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. 
+# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Test for the flight delay example.""" + +# pytype: skip-file + +from __future__ import absolute_import + +import logging +import os +import unittest +import uuid + +import pandas as pd +import pytest + +from apache_beam.examples.dataframe import flight_delays +from apache_beam.io.filesystems import FileSystems +from apache_beam.testing.test_pipeline import TestPipeline + + +class FlightDelaysTest(unittest.TestCase): + EXPECTED = { + '2012-12-23': [ + ('AA', 20.082559339525282, 12.825593395252838), + ('EV', 10.01901901901902, 4.431431431431432), + ('HA', -1.0829015544041452, 0.010362694300518135), + ('UA', 19.142555438225976, 11.07180570221753), + ('MQ', 8.902255639097744, 3.6676691729323307), + ('OO', 31.148883374689827, 31.90818858560794), + ('US', 3.092541436464088, -2.350828729281768), + ('WN', 12.074298711144806, 6.717968157695224), + ('AS', 5.0456273764258555, 1.0722433460076046), + ('B6', 20.646569646569645, 16.405405405405407), + ('DL', 5.2559923298178335, -3.214765100671141), + ('F9', 23.823529411764707, 25.455882352941178), + ('FL', 4.492877492877493, -0.8005698005698005), + ('VX', 62.755102040816325, 62.61224489795919), + ('YV', 16.155844155844157, 13.376623376623376), + ], + '2012-12-24': [ + ('AS', 0.5917602996254682, -2.2659176029962547), + ('B6', 8.070993914807302, 2.73630831643002), + ('DL', 3.7171824973319105, -2.2358591248665953), + ('F9', 14.111940298507463, 15.888059701492537), + ('FL', 2.4210526315789473, 2.242690058479532), + ('VX', 3.841666666666667, -2.4166666666666665), + ('YV', 0.32, 0.78), + ('MQ', 15.869642857142857, 9.992857142857142), + ('OO', 11.048517520215633, 10.138814016172507), + ('US', 1.369281045751634, -1.4101307189542485), + ('WN', 7.515952597994531, 0.7028258887876025), + ('AA', 7.049086757990867, -1.5970319634703196), + ('EV', 7.297101449275362, 2.2693236714975846), + ('HA', -2.6785714285714284, -2.4744897959183674), + ('UA', 10.935406698564593, -1.3337320574162679), + ], + '2012-12-25': [ + ('AS', 3.4816326530612245, 0.27346938775510204), + ('B6', 9.10590631364562, 3.989816700610998), + ('DL', 2.3022170361726952, -3.6709451575262544), + ('F9', 19.38255033557047, 21.845637583892618), + ('FL', 1.3982300884955752, 0.9380530973451328), + ('VX', 23.62878787878788, 23.636363636363637), + ('YV', 11.256302521008404, 11.659663865546218), + ('MQ', 32.6, 44.28666666666667), + ('OO', 16.2275960170697, 17.11948790896159), + ('US', 2.7953216374269005, 0.2236842105263158), + ('WN', 14.405783582089553, 10.111940298507463), + ('AA', 23.551581843191197, 35.62585969738652), + ('EV', 17.368638239339752, 16.43191196698762), + ('HA', -4.725806451612903, -3.9946236559139785), + ('UA', 16.663145539906104, 10.772300469483568), + ], + } + + def setUp(self): + self.test_pipeline = TestPipeline(is_integration_test=True) + self.outdir = ( + self.test_pipeline.get_option('temp_location') + '/flight_delays_it-' + + str(uuid.uuid4())) + self.output_path = os.path.join(self.outdir, 'output.csv') + + def 
tearDown(self): + FileSystems.delete([self.outdir + '/']) + + @pytest.mark.it_postcommit + def test_flight_delays(self): + flight_delays.run_flight_delay_pipeline( + self.test_pipeline, + start_date='2012-12-23', + end_date='2012-12-25', + output=self.output_path) + + def read_csv(path): + with FileSystems.open(path) as fp: + return pd.read_csv(fp) + + # Parse result file and compare. + for date, expectation in self.EXPECTED.items(): + result_df = pd.concat( + read_csv(metadata.path) for metadata in FileSystems.match( + [f'{self.output_path}-{date}*'])[0].metadata_list) + result_df = result_df.sort_values('airline').reset_index(drop=True) + + expected_df = pd.DataFrame( + expectation, columns=['airline', 'departure_delay', 'arrival_delay']) + expected_df = expected_df.sort_values('airline').reset_index(drop=True) + + try: + pd.testing.assert_frame_equal(result_df, expected_df) + except AssertionError as e: + raise AssertionError( + f"date={date!r} result DataFrame:\n\n" + f"{result_df}\n\n" + "Differs from Expectation:\n\n" + f"{expected_df}") from e + + +if __name__ == '__main__': + logging.getLogger().setLevel(logging.INFO) + unittest.main() diff --git a/sdks/python/apache_beam/examples/dataframe/taxiride.py b/sdks/python/apache_beam/examples/dataframe/taxiride.py new file mode 100644 index 000000000000..5278235ed963 --- /dev/null +++ b/sdks/python/apache_beam/examples/dataframe/taxiride.py @@ -0,0 +1,122 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Pipelines that use the DataFrame API to process NYC taxiride CSV data.""" + +# pytype: skip-file + +from __future__ import absolute_import + +import argparse +import logging + +import apache_beam as beam +from apache_beam.dataframe.io import read_csv +from apache_beam.options.pipeline_options import PipelineOptions + +ZONE_LOOKUP_PATH = ( + "gs://apache-beam-samples/nyc_taxi/misc/taxi+_zone_lookup.csv") + + +def run_aggregation_pipeline(pipeline, input_path, output_path): + # The pipeline will be run on exiting the with block. + with pipeline as p: + rides = p | read_csv(input_path) + + # Count the number of passengers dropped off per LocationID + agg = rides.groupby('DOLocationID').passenger_count.sum() + agg.to_csv(output_path) + + +def run_enrich_pipeline( + pipeline, input_path, output_path, zone_lookup_path=ZONE_LOOKUP_PATH): + """Enrich taxi ride data with zone lookup table and perform a grouped + aggregation.""" + # The pipeline will be run on exiting the with block. + with pipeline as p: + rides = p | "Read taxi rides" >> read_csv(input_path) + zones = p | "Read zone lookup" >> read_csv(zone_lookup_path) + + # Enrich taxi ride data with boroughs from zone lookup table + # Joins on zones.LocationID and rides.DOLocationID, by first making the + # former the index for zones. 
+ rides = rides.merge( + zones.set_index('LocationID').Borough, + right_index=True, + left_on='DOLocationID', + how='left') + + # Sum passengers dropped off per Borough + agg = rides.groupby('Borough').passenger_count.sum() + agg.to_csv(output_path) + + # A more intuitive alternative to the above merge call, but this option + # doesn't preserve index, thus requires non-parallel execution. + #rides = rides.merge(zones[['LocationID','Borough']], + # how="left", + # left_on='DOLocationID', + # right_on='LocationID') + + +def run(argv=None): + """Main entry point.""" + parser = argparse.ArgumentParser( + formatter_class=argparse.ArgumentDefaultsHelpFormatter) + parser.add_argument( + '--input', + dest='input', + default='gs://apache-beam-samples/nyc_taxi/misc/sample.csv', + help='Input file to process.') + parser.add_argument( + '--output', + dest='output', + required=True, + help='Output file to write results to.') + parser.add_argument( + '--zone_lookup', + dest='zone_lookup_path', + default=ZONE_LOOKUP_PATH, + help='Location for taxi zone lookup CSV.') + parser.add_argument( + '--pipeline', + dest='pipeline', + default='location_id_agg', + help=( + "Choice of pipeline to run. Must be one of " + "(location_id_agg, borough_enrich).")) + + known_args, pipeline_args = parser.parse_known_args(argv) + + pipeline = beam.Pipeline(options=PipelineOptions(pipeline_args)) + + if known_args.pipeline == 'location_id_agg': + run_aggregation_pipeline(pipeline, known_args.input, known_args.output) + elif known_args.pipeline == 'borough_enrich': + run_enrich_pipeline( + pipeline, + known_args.input, + known_args.output, + known_args.zone_lookup_path) + else: + raise ValueError( + f"Unrecognized value for --pipeline: {known_args.pipeline!r}. " + "Must be one of ('location_id_agg', 'borough_enrich')") + + +if __name__ == '__main__': + logging.getLogger().setLevel(logging.INFO) + run() diff --git a/sdks/python/apache_beam/examples/dataframe/taxiride_it_test.py b/sdks/python/apache_beam/examples/dataframe/taxiride_it_test.py new file mode 100644 index 000000000000..f81b7d8cfa60 --- /dev/null +++ b/sdks/python/apache_beam/examples/dataframe/taxiride_it_test.py @@ -0,0 +1,107 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +"""End-to-end tests for the taxiride examples.""" + +# pytype: skip-file + +import logging +import os +import unittest +import uuid + +import pandas as pd +import pytest + +from apache_beam.examples.dataframe import taxiride +from apache_beam.io.filesystems import FileSystems +from apache_beam.options.pipeline_options import WorkerOptions +from apache_beam.testing.test_pipeline import TestPipeline + + +class TaxirideIT(unittest.TestCase): + def setUp(self): + self.test_pipeline = TestPipeline(is_integration_test=True) + self.outdir = ( + self.test_pipeline.get_option('temp_location') + '/taxiride_it-' + + str(uuid.uuid4())) + self.output_path = os.path.join(self.outdir, 'output.csv') + + def tearDown(self): + FileSystems.delete([self.outdir + '/']) + + @pytest.mark.it_postcommit + def test_aggregation(self): + taxiride.run_aggregation_pipeline( + self.test_pipeline, + 'gs://apache-beam-samples/nyc_taxi/2018/*.csv', + self.output_path) + + # Verify + expected = pd.read_csv( + os.path.join( + os.path.dirname(__file__), + 'data', + 'taxiride_2018_aggregation_truth.csv'), + comment='#') + expected = expected.sort_values('DOLocationID').reset_index(drop=True) + + def read_csv(path): + with FileSystems.open(path) as fp: + return pd.read_csv(fp) + + result = pd.concat( + read_csv(metadata.path) for metadata in FileSystems.match( + [f'{self.output_path}*'])[0].metadata_list) + result = result.sort_values('DOLocationID').reset_index(drop=True) + + pd.testing.assert_frame_equal(expected, result) + + @pytest.mark.it_postcommit + def test_enrich(self): + # Standard workers OOM with the enrich pipeline + self.test_pipeline.get_pipeline_options().view_as( + WorkerOptions).machine_type = 'e2-highmem-2' + + taxiride.run_enrich_pipeline( + self.test_pipeline, + 'gs://apache-beam-samples/nyc_taxi/2018/*.csv', + self.output_path) + + # Verify + expected = pd.read_csv( + os.path.join( + os.path.dirname(__file__), 'data', + 'taxiride_2018_enrich_truth.csv'), + comment='#') + expected = expected.sort_values('Borough').reset_index(drop=True) + + def read_csv(path): + with FileSystems.open(path) as fp: + return pd.read_csv(fp) + + result = pd.concat( + read_csv(metadata.path) for metadata in FileSystems.match( + [f'{self.output_path}*'])[0].metadata_list) + result = result.sort_values('Borough').reset_index(drop=True) + + pd.testing.assert_frame_equal(expected, result) + + +if __name__ == '__main__': + logging.getLogger().setLevel(logging.DEBUG) + unittest.main() diff --git a/sdks/python/apache_beam/examples/dataframe/taxiride_test.py b/sdks/python/apache_beam/examples/dataframe/taxiride_test.py new file mode 100644 index 000000000000..46b2ca509df8 --- /dev/null +++ b/sdks/python/apache_beam/examples/dataframe/taxiride_test.py @@ -0,0 +1,128 @@ +# -*- coding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Unit tests for the taxiride example pipelines.""" + +# pytype: skip-file + +from __future__ import absolute_import + +import logging +import os +import re +import tempfile +import unittest + +import pandas as pd + +import apache_beam as beam +from apache_beam.examples.dataframe import taxiride +from apache_beam.testing.util import open_shards + + +class TaxiRideExampleTest(unittest.TestCase): + + # First 10 lines from gs://apache-beam-samples/nyc_taxi/misc/sample.csv + # pylint: disable=line-too-long + SAMPLE_RIDES = """VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge + 1,2019-01-01 00:46:40,2019-01-01 00:53:20,1,1.50,1,N,151,239,1,7,0.5,0.5,1.65,0,0.3,9.95, + 1,2019-01-01 00:59:47,2019-01-01 01:18:59,1,2.60,1,N,239,246,1,14,0.5,0.5,1,0,0.3,16.3, + 2,2018-12-21 13:48:30,2018-12-21 13:52:40,3,.00,1,N,236,236,1,4.5,0.5,0.5,0,0,0.3,5.8, + 2,2018-11-28 15:52:25,2018-11-28 15:55:45,5,.00,1,N,193,193,2,3.5,0.5,0.5,0,0,0.3,7.55, + 2,2018-11-28 15:56:57,2018-11-28 15:58:33,5,.00,2,N,193,193,2,52,0,0.5,0,0,0.3,55.55, + 2,2018-11-28 16:25:49,2018-11-28 16:28:26,5,.00,1,N,193,193,2,3.5,0.5,0.5,0,5.76,0.3,13.31, + 2,2018-11-28 16:29:37,2018-11-28 16:33:43,5,.00,2,N,193,193,2,52,0,0.5,0,0,0.3,55.55, + 1,2019-01-01 00:21:28,2019-01-01 00:28:37,1,1.30,1,N,163,229,1,6.5,0.5,0.5,1.25,0,0.3,9.05, + 1,2019-01-01 00:32:01,2019-01-01 00:45:39,1,3.70,1,N,229,7,1,13.5,0.5,0.5,3.7,0,0.3,18.5 + """ + # pylint: enable=line-too-long + + SAMPLE_ZONE_LOOKUP = """"LocationID","Borough","Zone","service_zone" + 7,"Queens","Astoria","Boro Zone" + 193,"Queens","Queensbridge/Ravenswood","Boro Zone" + 229,"Manhattan","Sutton Place/Turtle Bay North","Yellow Zone" + 236,"Manhattan","Upper East Side North","Yellow Zone" + 239,"Manhattan","Upper West Side South","Yellow Zone" + 246,"Manhattan","West Chelsea/Hudson Yards","Yellow Zone" + """ + + def setUp(self): + self.tmpdir = tempfile.TemporaryDirectory() + self.input_path = os.path.join(self.tmpdir.name, 'rides.csv') + self.lookup_path = os.path.join(self.tmpdir.name, 'lookup.csv') + self.output_path = os.path.join(self.tmpdir.name, 'output.csv') + + with open(self.input_path, 'w') as fp: + fp.write(self.SAMPLE_RIDES) + + with open(self.lookup_path, 'w') as fp: + fp.write(self.SAMPLE_ZONE_LOOKUP) + + def tearDown(self): + self.tmpdir.cleanup() + + def test_aggregation(self): + # Compute expected result + rides = pd.read_csv(self.input_path) + expected_counts = rides.groupby('DOLocationID').passenger_count.sum() + + taxiride.run_aggregation_pipeline( + beam.Pipeline(), self.input_path, self.output_path) + + # Parse result file and compare. 
+ # TODO(BEAM-12379): taxiride examples should produce int sums, not floats + results = [] + with open_shards(f'{self.output_path}-*') as result_file: + for line in result_file: + match = re.search(r'(\S+),([0-9\.]+)', line) + if match is not None: + results.append((int(match.group(1)), int(float(match.group(2))))) + elif line.strip(): + self.assertEqual(line.strip(), 'DOLocationID,passenger_count') + self.assertEqual(sorted(results), sorted(expected_counts.items())) + + def test_enrich(self): + # Compute expected result + rides = pd.read_csv(self.input_path) + zones = pd.read_csv(self.lookup_path) + rides = rides.merge( + zones.set_index('LocationID').Borough, + right_index=True, + left_on='DOLocationID', + how='left') + expected_counts = rides.groupby('Borough').passenger_count.sum() + + taxiride.run_enrich_pipeline( + beam.Pipeline(), self.input_path, self.output_path, self.lookup_path) + + # Parse result file and compare. + # TODO(BEAM-XXXX): taxiride examples should produce int sums, not floats + results = [] + with open_shards(f'{self.output_path}-*') as result_file: + for line in result_file: + match = re.search(r'(\S+),([0-9\.]+)', line) + if match is not None: + results.append((match.group(1), int(float(match.group(2))))) + elif line.strip(): + self.assertEqual(line.strip(), 'Borough,passenger_count') + self.assertEqual(sorted(results), sorted(expected_counts.items())) + + +if __name__ == '__main__': + logging.getLogger().setLevel(logging.INFO) + unittest.main() diff --git a/sdks/python/apache_beam/examples/dataframe/wordcount.py b/sdks/python/apache_beam/examples/dataframe/wordcount.py new file mode 100644 index 000000000000..61efd0f7a073 --- /dev/null +++ b/sdks/python/apache_beam/examples/dataframe/wordcount.py @@ -0,0 +1,83 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""A word-counting workflow using the DataFrame API.""" + +# pytype: skip-file + +import argparse +import logging + +import apache_beam as beam +from apache_beam.dataframe.convert import to_dataframe +from apache_beam.dataframe.convert import to_pcollection +from apache_beam.io import ReadFromText +from apache_beam.options.pipeline_options import PipelineOptions + + +def run(argv=None): + """Main entry point; defines and runs the wordcount pipeline.""" + parser = argparse.ArgumentParser( + formatter_class=argparse.ArgumentDefaultsHelpFormatter) + parser.add_argument( + '--input', + dest='input', + default='gs://dataflow-samples/shakespeare/kinglear.txt', + help='Input file to process.') + parser.add_argument( + '--output', + dest='output', + required=True, + help='Output file to write results to.') + known_args, pipeline_args = parser.parse_known_args(argv) + + # Import this here to avoid pickling the main session. 
+ import re + + # The pipeline will be run on exiting the with block. + with beam.Pipeline(options=PipelineOptions(pipeline_args)) as p: + + # Read the text file[pattern] into a PCollection. + lines = p | 'Read' >> ReadFromText(known_args.input) + + words = ( + lines + | 'Split' >> beam.FlatMap( + lambda line: re.findall(r'[\w]+', line)).with_output_types(str) + # Map to Row objects to generate a schema suitable for conversion + # to a dataframe. + | 'ToRows' >> beam.Map(lambda word: beam.Row(word=word))) + + df = to_dataframe(words) + df['count'] = 1 + counted = df.groupby('word').sum() + counted.to_csv(known_args.output) + + # Deferred DataFrames can also be converted back to schema'd PCollections + counted_pc = to_pcollection(counted, include_indexes=True) + + # Print out every word that occurred >50 times + _ = ( + counted_pc + | beam.Filter(lambda row: row.count > 50) + | beam.Map(lambda row: f'{row.word}: {row.count}') + | beam.Map(print)) + + +if __name__ == '__main__': + logging.getLogger().setLevel(logging.INFO) + run() diff --git a/sdks/python/apache_beam/examples/wordcount_dataframe_test.py b/sdks/python/apache_beam/examples/dataframe/wordcount_test.py similarity index 91% rename from sdks/python/apache_beam/examples/wordcount_dataframe_test.py rename to sdks/python/apache_beam/examples/dataframe/wordcount_test.py index cffaf9ad628d..e05e0d67f364 100644 --- a/sdks/python/apache_beam/examples/wordcount_dataframe_test.py +++ b/sdks/python/apache_beam/examples/dataframe/wordcount_test.py @@ -20,15 +20,13 @@ # pytype: skip-file -from __future__ import absolute_import - import collections import logging import re import tempfile import unittest -from apache_beam.examples import wordcount_dataframe +from apache_beam.examples.dataframe import wordcount from apache_beam.testing.util import open_shards @@ -51,8 +49,7 @@ def test_basics(self): expected_words = collections.defaultdict(int) for word in re.findall(r'[\w]+', self.SAMPLE_TEXT): expected_words[word] += 1 - wordcount_dataframe.run( - ['--input=%s*' % temp_path, '--output=%s.result' % temp_path]) + wordcount.run(['--input=%s*' % temp_path, '--output=%s.result' % temp_path]) # Parse result file and compare. results = [] with open_shards(temp_path + '.result-*') as result_file: diff --git a/sdks/python/apache_beam/examples/fastavro_it_test.py b/sdks/python/apache_beam/examples/fastavro_it_test.py index d4aca2ce6543..c9bb9881a13e 100644 --- a/sdks/python/apache_beam/examples/fastavro_it_test.py +++ b/sdks/python/apache_beam/examples/fastavro_it_test.py @@ -24,7 +24,7 @@ Usage: DataFlowRunner: - python setup.py nosetests --tests apache_beam.examples.fastavro_it_test \ + pytest apache_beam/examples/fastavro_it_test.py \ --test-pipeline-options=" --runner=TestDataflowRunner --project=... 
@@ -36,7 +36,7 @@ " DirectRunner: - python setup.py nosetests --tests apache_beam.examples.fastavro_it_test \ + pytest apache_beam/examples/fastavro_it_test.py \ --test-pipeline-options=" --output=/tmp --records=5000 @@ -45,16 +45,14 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import json import logging import unittest import uuid +import pytest +from avro.schema import Parse from fastavro import parse_schema -from nose.plugins.attrib import attr from apache_beam.io.avroio import ReadAllFromAvro from apache_beam.io.avroio import WriteToAvro @@ -67,13 +65,6 @@ from apache_beam.transforms.core import Map from apache_beam.transforms.util import CoGroupByKey -# pylint: disable=wrong-import-order, wrong-import-position -try: - from avro.schema import Parse # avro-python3 library for python3 -except ImportError: - from avro.schema import parse as Parse # avro library for python2 -# pylint: enable=wrong-import-order, wrong-import-position - LABELS = ['abc', 'def', 'ghi', 'jkl', 'mno', 'pqr', 'stu', 'vwx'] COLORS = ['RED', 'ORANGE', 'YELLOW', 'GREEN', 'BLUE', 'PURPLE', None] @@ -107,7 +98,7 @@ def setUp(self): self.uuid = str(uuid.uuid4()) self.output = '/'.join([self.test_pipeline.get_option('output'), self.uuid]) - @attr('IT') + @pytest.mark.it_postcommit def test_avro_it(self): num_records = self.test_pipeline.get_option('records') num_records = int(num_records) if num_records else 1000000 diff --git a/sdks/python/apache_beam/examples/flink/__init__.py b/sdks/python/apache_beam/examples/flink/__init__.py index 6569e3fe5de4..cce3acad34a4 100644 --- a/sdks/python/apache_beam/examples/flink/__init__.py +++ b/sdks/python/apache_beam/examples/flink/__init__.py @@ -14,5 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. # - -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/examples/flink/flink_streaming_impulse.py b/sdks/python/apache_beam/examples/flink/flink_streaming_impulse.py index f211c55692d1..badf40126fd5 100644 --- a/sdks/python/apache_beam/examples/flink/flink_streaming_impulse.py +++ b/sdks/python/apache_beam/examples/flink/flink_streaming_impulse.py @@ -22,8 +22,6 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import logging import sys diff --git a/sdks/python/apache_beam/examples/kafkataxi/__init__.py b/sdks/python/apache_beam/examples/kafkataxi/__init__.py index 6569e3fe5de4..cce3acad34a4 100644 --- a/sdks/python/apache_beam/examples/kafkataxi/__init__.py +++ b/sdks/python/apache_beam/examples/kafkataxi/__init__.py @@ -14,5 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. 
# - -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/examples/kafkataxi/kafka_taxi.py b/sdks/python/apache_beam/examples/kafkataxi/kafka_taxi.py index f1fc6465391f..96fee057b322 100644 --- a/sdks/python/apache_beam/examples/kafkataxi/kafka_taxi.py +++ b/sdks/python/apache_beam/examples/kafkataxi/kafka_taxi.py @@ -24,18 +24,25 @@ # pytype: skip-file -from __future__ import absolute_import - import logging +import sys import typing import apache_beam as beam from apache_beam.io.kafka import ReadFromKafka from apache_beam.io.kafka import WriteToKafka +from apache_beam.options.pipeline_options import GoogleCloudOptions from apache_beam.options.pipeline_options import PipelineOptions -def run(bootstrap_servers, topic, pipeline_args): +def run( + bootstrap_servers, + topic, + with_metadata, + bq_dataset, + bq_table_name, + project, + pipeline_options): # bootstrap_servers = '123.45.67.89:123:9092' # topic = 'kafka_taxirides_realtime' # pipeline_args = ['--project', 'my-project', @@ -45,20 +52,44 @@ def run(bootstrap_servers, topic, pipeline_args): # '--num_workers', 'my-num-workers', # '--experiments', 'use_runner_v2'] - pipeline_options = PipelineOptions( - pipeline_args, save_main_session=True, streaming=True) window_size = 15 # size of the Window in seconds. - def log_ride(ride_bytes): + def log_ride(ride): + if 'timestamp' in ride: + logging.info( + 'Found ride at latitude %r and longitude %r with %r ' + 'passengers at timestamp %r', + ride['latitude'], + ride['longitude'], + ride['passenger_count'], + ride['timestamp']) + else: + logging.info( + 'Found ride at latitude %r and longitude %r with %r ' + 'passengers', + ride['latitude'], + ride['longitude'], + ride['passenger_count']) + + def convert_kafka_record_to_dictionary(record): + # the records have 'value' attribute when --with_metadata is given + if hasattr(record, 'value'): + ride_bytes = record.value + elif isinstance(record, tuple): + ride_bytes = record[1] + else: + raise RuntimeError('unknown record type: %s' % type(record)) # Converting bytes record from Kafka to a dictionary. 
import ast ride = ast.literal_eval(ride_bytes.decode("UTF-8")) - logging.info( - 'Found ride at latitude %r and longitude %r with %r ' - 'passengers', - ride['latitude'], - ride['longitude'], - ride['passenger_count']) + output = { + key: ride[key] + for key in ['latitude', 'longitude', 'passenger_count'] + } + if hasattr(record, 'timestamp'): + # timestamp is read from Kafka metadata + output['timestamp'] = record.timestamp + return output with beam.Pipeline(options=pipeline_options) as pipeline: _ = ( @@ -73,12 +104,23 @@ def log_ride(ride_bytes): producer_config={'bootstrap.servers': bootstrap_servers}, topic=topic)) - _ = ( + ride_col = ( pipeline | ReadFromKafka( consumer_config={'bootstrap.servers': bootstrap_servers}, - topics=[topic]) - | beam.FlatMap(lambda kv: log_ride(kv[1]))) + topics=[topic], + with_metadata=with_metadata) + | beam.Map(lambda record: convert_kafka_record_to_dictionary(record))) + + if bq_dataset: + schema = 'latitude:STRING,longitude:STRING,passenger_count:INTEGER' + if with_metadata: + schema += ',timestamp:STRING' + _ = ( + ride_col + | beam.io.WriteToBigQuery(bq_table_name, bq_dataset, project, schema)) + else: + _ = ride_col | beam.FlatMap(lambda ride: log_ride(ride)) if __name__ == '__main__': @@ -97,6 +139,39 @@ def log_ride(ride_bytes): dest='topic', default='kafka_taxirides_realtime', help='Kafka topic to write to and read from') + parser.add_argument( + '--with_metadata', + default=False, + action='store_true', + help='If set, also reads metadata from the Kafka broker.') + parser.add_argument( + '--bq_dataset', + type=str, + default='', + help='BigQuery Dataset to write tables to. ' + 'If set, export data to a BigQuery table instead of just logging. ' + 'Must already exist.') + parser.add_argument( + '--bq_table_name', + default='xlang_kafka_taxi', + help='The BigQuery table name. Should not already exist.') known_args, pipeline_args = parser.parse_known_args() - run(known_args.bootstrap_servers, known_args.topic, pipeline_args) + pipeline_options = PipelineOptions( + pipeline_args, save_main_session=True, streaming=True) + + # We also require the --project option to access --bq_dataset + project = pipeline_options.view_as(GoogleCloudOptions).project + if project is None: + parser.print_usage() + print(sys.argv[0] + ': error: argument --project is required') + sys.exit(1) + + run( + known_args.bootstrap_servers, + known_args.topic, + known_args.with_metadata, + known_args.bq_dataset, + known_args.bq_table_name, + project, + pipeline_options) diff --git a/sdks/python/apache_beam/examples/snippets/__init__.py b/sdks/python/apache_beam/examples/snippets/__init__.py index 6569e3fe5de4..cce3acad34a4 100644 --- a/sdks/python/apache_beam/examples/snippets/__init__.py +++ b/sdks/python/apache_beam/examples/snippets/__init__.py @@ -14,5 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. 
# - -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/examples/snippets/snippets.py b/sdks/python/apache_beam/examples/snippets/snippets.py index a5d2b7bfa7d8..b665e85d96cf 100644 --- a/sdks/python/apache_beam/examples/snippets/snippets.py +++ b/sdks/python/apache_beam/examples/snippets/snippets.py @@ -31,17 +31,13 @@ """ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import argparse import base64 import json -from builtins import object -from builtins import range from decimal import Decimal +import mock + import apache_beam as beam from apache_beam.io import iobase from apache_beam.io.range_trackers import OffsetRangeTracker @@ -94,6 +90,7 @@ def visit_transform(self, transform_node): transform_node.transform.fn.file_to_write = self.renames['write'] +@mock.patch('apache_beam.Pipeline', TestPipeline) def construct_pipeline(renames): """A reverse words snippet as an example for constructing a pipeline.""" import re @@ -103,10 +100,12 @@ def construct_pipeline(renames): # Unresolved reference in ReverseWords class import apache_beam as beam - class ReverseWords(beam.PTransform): + @beam.ptransform_fn + @beam.typehints.with_input_types(str) + @beam.typehints.with_output_types(str) + def ReverseWords(pcoll): """A PTransform that reverses individual elements in a PCollection.""" - def expand(self, pcoll): - return pcoll | beam.Map(lambda e: e[::-1]) + return pcoll | beam.Map(lambda word: word[::-1]) def filter_words(unused_x): """Pass through filter to select everything.""" @@ -114,82 +113,66 @@ def filter_words(unused_x): # [START pipelines_constructing_creating] import apache_beam as beam - from apache_beam.options.pipeline_options import PipelineOptions - with beam.Pipeline(options=PipelineOptions()) as p: + with beam.Pipeline() as pipeline: pass # build your pipeline here # [END pipelines_constructing_creating] - with TestPipeline() as p: # Use TestPipeline for testing. 
- # pylint: disable=line-too-long + # [START pipelines_constructing_reading] + lines = pipeline | 'ReadMyFile' >> beam.io.ReadFromText( + 'gs://some/inputData.txt') + # [END pipelines_constructing_reading] - # [START pipelines_constructing_reading] - lines = p | 'ReadMyFile' >> beam.io.ReadFromText( - 'gs://some/inputData.txt') - # [END pipelines_constructing_reading] + # [START pipelines_constructing_applying] + words = lines | beam.FlatMap(lambda x: re.findall(r'[A-Za-z\']+', x)) + reversed_words = words | ReverseWords() + # [END pipelines_constructing_applying] - # [START pipelines_constructing_applying] - words = lines | beam.FlatMap(lambda x: re.findall(r'[A-Za-z\']+', x)) - reversed_words = words | ReverseWords() - # [END pipelines_constructing_applying] + # [START pipelines_constructing_writing] + filtered_words = reversed_words | 'FilterWords' >> beam.Filter(filter_words) + filtered_words | 'WriteMyFile' >> beam.io.WriteToText( + 'gs://some/outputData.txt') + # [END pipelines_constructing_writing] - # [START pipelines_constructing_writing] - filtered_words = reversed_words | 'FilterWords' >> beam.Filter( - filter_words) - filtered_words | 'WriteMyFile' >> beam.io.WriteToText( - 'gs://some/outputData.txt') - # [END pipelines_constructing_writing] + pipeline.visit(SnippetUtils.RenameFiles(renames)) - p.visit(SnippetUtils.RenameFiles(renames)) - -def model_pipelines(argv): +def model_pipelines(): """A wordcount snippet as a simple pipeline example.""" # [START model_pipelines] + import argparse import re import apache_beam as beam from apache_beam.options.pipeline_options import PipelineOptions - class MyOptions(PipelineOptions): - @classmethod - def _add_argparse_args(cls, parser): - parser.add_argument( - '--input', - dest='input', - default='gs://dataflow-samples/shakespeare/kinglear' - '.txt', - help='Input file to process.') - parser.add_argument( - '--output', - dest='output', - required=True, - help='Output file to write results to.') - - pipeline_options = PipelineOptions(argv) - my_options = pipeline_options.view_as(MyOptions) - - with beam.Pipeline(options=pipeline_options) as p: + parser = argparse.ArgumentParser() + parser.add_argument( + '--input-file', + default='gs://dataflow-samples/shakespeare/kinglear.txt', + help='The file path for the input text to process.') + parser.add_argument( + '--output-path', required=True, help='The path prefix for output files.') + args, beam_args = parser.parse_known_args() + beam_options = PipelineOptions(beam_args) + with beam.Pipeline(options=beam_options) as pipeline: ( - p - | beam.io.ReadFromText(my_options.input) + pipeline + | beam.io.ReadFromText(args.input_file) | beam.FlatMap(lambda x: re.findall(r'[A-Za-z\']+', x)) | beam.Map(lambda x: (x, 1)) | beam.combiners.Count.PerKey() - | beam.io.WriteToText(my_options.output)) + | beam.io.WriteToText(args.output_path)) # [END model_pipelines] -def model_pcollection(argv): +def model_pcollection(output_path): """Creating a PCollection from data in local memory.""" # [START model_pcollection] import apache_beam as beam - from apache_beam.options.pipeline_options import PipelineOptions - # argv = None # if None, uses sys.argv - pipeline_options = PipelineOptions(argv) - with beam.Pipeline(options=pipeline_options) as pipeline: + with beam.Pipeline() as pipeline: lines = ( pipeline | beam.Create([ @@ -198,30 +181,18 @@ def model_pcollection(argv): 'The slings and arrows of outrageous fortune, ', 'Or to take arms against a sea of troubles, ', ])) - # [END model_pcollection] - class 
MyOptions(PipelineOptions): - @classmethod - def _add_argparse_args(cls, parser): - parser.add_argument( - '--output', - dest='output', - required=True, - help='Output file to write results to.') - - my_options = pipeline_options.view_as(MyOptions) - lines | beam.io.WriteToText(my_options.output) + lines | beam.io.WriteToText(output_path) -def pipeline_options_remote(argv): +def pipeline_options_remote(): """Creating a Pipeline using a PipelineOptions object for remote execution.""" # [START pipeline_options_create] from apache_beam.options.pipeline_options import PipelineOptions - options = PipelineOptions(flags=argv) - + beam_options = PipelineOptions() # [END pipeline_options_create] # [START pipeline_options_define_custom] @@ -230,46 +201,54 @@ def pipeline_options_remote(argv): class MyOptions(PipelineOptions): @classmethod def _add_argparse_args(cls, parser): - parser.add_argument('--input') - parser.add_argument('--output') + parser.add_argument('--input-file') + parser.add_argument('--output-path') # [END pipeline_options_define_custom] - # [START pipeline_options_dataflow_service] - import apache_beam as beam - from apache_beam.options.pipeline_options import PipelineOptions - - # Create and set your PipelineOptions. - # For Cloud execution, specify DataflowRunner and set the Cloud Platform - # project, job name, temporary files location, and region. - # For more information about regions, check: - # https://cloud.google.com/dataflow/docs/concepts/regional-endpoints - options = PipelineOptions( - flags=argv, - runner='DataflowRunner', - project='my-project-id', - job_name='unique-job-name', - temp_location='gs://my-bucket/temp', - region='us-central1') - - # Create the Pipeline with the specified options. - # with beam.Pipeline(options=options) as pipeline: - # pass # build your pipeline here. - # [END pipeline_options_dataflow_service] - - my_options = options.view_as(MyOptions) - - with TestPipeline() as p: # Use TestPipeline for testing. - lines = p | beam.io.ReadFromText(my_options.input) - lines | beam.io.WriteToText(my_options.output) - - -def pipeline_options_local(argv): + @mock.patch('apache_beam.Pipeline') + def dataflow_options(mock_pipeline): + # [START pipeline_options_dataflow_service] + import argparse + + import apache_beam as beam + from apache_beam.options.pipeline_options import PipelineOptions + + parser = argparse.ArgumentParser() + # parser.add_argument('--my-arg', help='description') + args, beam_args = parser.parse_known_args() + + # Create and set your PipelineOptions. + # For Cloud execution, specify DataflowRunner and set the Cloud Platform + # project, job name, temporary files location, and region. + # For more information about regions, check: + # https://cloud.google.com/dataflow/docs/concepts/regional-endpoints + beam_options = PipelineOptions( + beam_args, + runner='DataflowRunner', + project='my-project-id', + job_name='unique-job-name', + temp_location='gs://my-bucket/temp', + region='us-central1') + + # Create the Pipeline with the specified options. + with beam.Pipeline(options=beam_options) as pipeline: + pass # build your pipeline here. + # [END pipeline_options_dataflow_service] + return beam_options + + beam_options = dataflow_options() + args = beam_options.view_as(MyOptions) + + with TestPipeline() as pipeline: # Use TestPipeline for testing. 
+ lines = pipeline | beam.io.ReadFromText(args.input_file) + lines | beam.io.WriteToText(args.output_path) + + +@mock.patch('apache_beam.Pipeline', TestPipeline) +def pipeline_options_local(): """Creating a Pipeline using a PipelineOptions object for local execution.""" - from apache_beam import Pipeline - from apache_beam.options.pipeline_options import PipelineOptions - # [START pipeline_options_define_custom_with_help_and_default] from apache_beam.options.pipeline_options import PipelineOptions @@ -277,31 +256,40 @@ class MyOptions(PipelineOptions): @classmethod def _add_argparse_args(cls, parser): parser.add_argument( - '--input', - help='Input for the pipeline', - default='gs://my-bucket/input') + '--input-file', + default='gs://dataflow-samples/shakespeare/kinglear.txt', + help='The file path for the input text to process.') parser.add_argument( - '--output', - help='Output for the pipeline', - default='gs://my-bucket/output') + '--output-path', + required=True, + help='The path prefix for output files.') # [END pipeline_options_define_custom_with_help_and_default] # [START pipeline_options_local] - # Create and set your Pipeline Options. - options = PipelineOptions(flags=argv) - my_options = options.view_as(MyOptions) + import argparse - with Pipeline(options=options) as pipeline: - pass # build your pipeline here. - # [END pipeline_options_local] + import apache_beam as beam + from apache_beam.options.pipeline_options import PipelineOptions - with TestPipeline() as p: # Use TestPipeline for testing. - lines = p | beam.io.ReadFromText(my_options.input) - lines | beam.io.WriteToText(my_options.output) + parser = argparse.ArgumentParser() + # parser.add_argument('--my-arg') + args, beam_args = parser.parse_known_args() + # Create and set your Pipeline Options. + beam_options = PipelineOptions(beam_args) + args = beam_options.view_as(MyOptions) -def pipeline_options_command_line(argv): + with beam.Pipeline(options=beam_options) as pipeline: + lines = ( + pipeline + | beam.io.ReadFromText(args.input_file) + | beam.io.WriteToText(args.output_path)) + # [END pipeline_options_local] + + +@mock.patch('apache_beam.Pipeline', TestPipeline) +def pipeline_options_command_line(): """Creating a Pipeline by passing a list of arguments.""" # [START pipeline_options_command_line] @@ -309,17 +297,27 @@ def pipeline_options_command_line(argv): import argparse import apache_beam as beam + from apache_beam.options.pipeline_options import PipelineOptions + # For more details on how to use argparse, take a look at: + # https://docs.python.org/3/library/argparse.html parser = argparse.ArgumentParser() - parser.add_argument('--input') - parser.add_argument('--output') - args, beam_args = parser.parse_known_args(argv) + parser.add_argument( + '--input-file', + default='gs://dataflow-samples/shakespeare/kinglear.txt', + help='The file path for the input text to process.') + parser.add_argument( + '--output-path', required=True, help='The path prefix for output files.') + args, beam_args = parser.parse_known_args() # Create the Pipeline with remaining arguments. 
- with beam.Pipeline(argv=beam_args) as pipeline: - lines = pipeline | 'Read files' >> beam.io.ReadFromText(args.input) - lines | 'Write files' >> beam.io.WriteToText(args.output) - # [END pipeline_options_command_line] + beam_options = PipelineOptions(beam_args) + with beam.Pipeline(options=beam_options) as pipeline: + lines = ( + pipeline + | 'Read files' >> beam.io.ReadFromText(args.input_file) + | 'Write files' >> beam.io.WriteToText(args.output_path)) + # [END pipeline_options_command_line] def pipeline_logging(lines, output): @@ -345,32 +343,20 @@ def process(self, element): # Remaining WordCount example code ... # [END pipeline_logging] - with TestPipeline() as p: # Use TestPipeline for testing. + with TestPipeline() as pipeline: # Use TestPipeline for testing. ( - p + pipeline | beam.Create(lines) | beam.ParDo(ExtractWordsFn()) | beam.io.WriteToText(output)) -def pipeline_monitoring(renames): +def pipeline_monitoring(): """Using monitoring interface snippets.""" + import argparse import re import apache_beam as beam - from apache_beam.options.pipeline_options import PipelineOptions - - class WordCountOptions(PipelineOptions): - @classmethod - def _add_argparse_args(cls, parser): - parser.add_argument( - '--input', - help='Input for the pipeline', - default='gs://my-bucket/input') - parser.add_argument( - '--output', - help='output for the pipeline', - default='gs://my-bucket/output') class ExtractWordsFn(beam.DoFn): def process(self, element): @@ -385,68 +371,83 @@ def process(self, element): # [START pipeline_monitoring_composite] # The CountWords Composite Transform inside the WordCount pipeline. - class CountWords(beam.PTransform): - def expand(self, pcoll): - return ( - pcoll - # Convert lines of text into individual words. - | 'ExtractWords' >> beam.ParDo(ExtractWordsFn()) - # Count the number of times each word occurs. - | beam.combiners.Count.PerElement() - # Format each word and count into a printable string. - | 'FormatCounts' >> beam.ParDo(FormatCountsFn())) + @beam.ptransform_fn + def CountWords(pcoll): + return ( + pcoll + # Convert lines of text into individual words. + | 'ExtractWords' >> beam.ParDo(ExtractWordsFn()) + # Count the number of times each word occurs. + | beam.combiners.Count.PerElement() + # Format each word and count into a printable string. + | 'FormatCounts' >> beam.ParDo(FormatCountsFn())) # [END pipeline_monitoring_composite] - pipeline_options = PipelineOptions() - options = pipeline_options.view_as(WordCountOptions) - with TestPipeline() as p: # Use TestPipeline for testing. + parser = argparse.ArgumentParser() + parser.add_argument( + '--input-file', + default='gs://dataflow-samples/shakespeare/kinglear.txt', + help='The file path for the input text to process.') + parser.add_argument( + '--output-path', required=True, help='The path prefix for output files.') + args, _ = parser.parse_known_args() + + with TestPipeline() as pipeline: # Use TestPipeline for testing. # [START pipeline_monitoring_execution] ( - p + pipeline # Read the lines of the input text. - | 'ReadLines' >> beam.io.ReadFromText(options.input) + | 'ReadLines' >> beam.io.ReadFromText(args.input_file) # Count the words. | CountWords() # Write the formatted word counts to output. 
- | 'WriteCounts' >> beam.io.WriteToText(options.output)) + | 'WriteCounts' >> beam.io.WriteToText(args.output_path)) # [END pipeline_monitoring_execution] - p.visit(SnippetUtils.RenameFiles(renames)) - -def examples_wordcount_minimal(renames): +def examples_wordcount_minimal(): """MinimalWordCount example snippets.""" import re import apache_beam as beam - from apache_beam.options.pipeline_options import GoogleCloudOptions - from apache_beam.options.pipeline_options import StandardOptions + # [START examples_wordcount_minimal_options] from apache_beam.options.pipeline_options import PipelineOptions - # [START examples_wordcount_minimal_options] - options = PipelineOptions() - google_cloud_options = options.view_as(GoogleCloudOptions) - google_cloud_options.project = 'my-project-id' - google_cloud_options.job_name = 'myjob' - google_cloud_options.staging_location = 'gs://your-bucket-name-here/staging' - google_cloud_options.temp_location = 'gs://your-bucket-name-here/temp' - options.view_as(StandardOptions).runner = 'DataflowRunner' + input_file = 'gs://dataflow-samples/shakespeare/kinglear.txt' + output_path = 'gs://my-bucket/counts.txt' + + beam_options = PipelineOptions( + runner='DataflowRunner', + project='my-project-id', + job_name='unique-job-name', + temp_location='gs://my-bucket/temp', + ) # [END examples_wordcount_minimal_options] # Run it locally for testing. - options = PipelineOptions() + import argparse + + parser = argparse.ArgumentParser() + parser.add_argument('--input-file') + parser.add_argument('--output-path') + args, beam_args = parser.parse_known_args() + + input_file = args.input_file + output_path = args.output_path + + beam_options = PipelineOptions(beam_args) # [START examples_wordcount_minimal_create] - p = beam.Pipeline(options=options) + pipeline = beam.Pipeline(options=beam_options) # [END examples_wordcount_minimal_create] ( # [START examples_wordcount_minimal_read] - p - | beam.io.ReadFromText('gs://dataflow-samples/shakespeare/kinglear.txt') + pipeline + | beam.io.ReadFromText(input_file) # [END examples_wordcount_minimal_read] # [START examples_wordcount_minimal_pardo] @@ -462,54 +463,52 @@ def examples_wordcount_minimal(renames): # [END examples_wordcount_minimal_map] # [START examples_wordcount_minimal_write] - | beam.io.WriteToText('gs://my-bucket/counts.txt') + | beam.io.WriteToText(output_path) # [END examples_wordcount_minimal_write] ) - p.visit(SnippetUtils.RenameFiles(renames)) - # [START examples_wordcount_minimal_run] - result = p.run() + result = pipeline.run() # [END examples_wordcount_minimal_run] result.wait_until_finish() -def examples_wordcount_wordcount(renames): +def examples_wordcount_wordcount(): """WordCount example snippets.""" import re import apache_beam as beam from apache_beam.options.pipeline_options import PipelineOptions - argv = [] - # [START examples_wordcount_wordcount_options] - class WordCountOptions(PipelineOptions): - @classmethod - def _add_argparse_args(cls, parser): - parser.add_argument( - '--input', - help='Input for the pipeline', - default='gs://my-bucket/input') + import argparse - options = PipelineOptions(argv) - word_count_options = options.view_as(WordCountOptions) - with beam.Pipeline(options=options) as p: - lines = p | beam.io.ReadFromText(word_count_options.input) + parser = argparse.ArgumentParser() + parser.add_argument( + '--input-file', + default='gs://dataflow-samples/shakespeare/kinglear.txt', + help='The file path for the input text to process.') + parser.add_argument( + '--output-path', 
required=True, help='The path prefix for output files.') + args, beam_args = parser.parse_known_args() + + beam_options = PipelineOptions(beam_args) + with beam.Pipeline(options=beam_options) as pipeline: + lines = pipeline | beam.io.ReadFromText(args.input_file) # [END examples_wordcount_wordcount_options] # [START examples_wordcount_wordcount_composite] - class CountWords(beam.PTransform): - def expand(self, pcoll): - return ( - pcoll - # Convert lines of text into individual words. - | 'ExtractWords' >> - beam.FlatMap(lambda x: re.findall(r'[A-Za-z\']+', x)) + @beam.ptransform_fn + def CountWords(pcoll): + return ( + pcoll + # Convert lines of text into individual words. + | 'ExtractWords' >> + beam.FlatMap(lambda x: re.findall(r'[A-Za-z\']+', x)) - # Count the number of times each word occurs. - | beam.combiners.Count.PerElement()) + # Count the number of times each word occurs. + | beam.combiners.Count.PerElement()) counts = lines | CountWords() @@ -524,11 +523,10 @@ def process(self, element): formatted = counts | beam.ParDo(FormatAsTextFn()) # [END examples_wordcount_wordcount_dofn] - formatted | beam.io.WriteToText('gs://my-bucket/counts.txt') - p.visit(SnippetUtils.RenameFiles(renames)) + formatted | beam.io.WriteToText(args.output_path) -def examples_wordcount_templated(renames): +def examples_wordcount_templated(): """Templated WordCount example snippet.""" import re @@ -544,15 +542,19 @@ def _add_argparse_args(cls, parser): # Use add_value_provider_argument for arguments to be templatable # Use add_argument as usual for non-templatable arguments parser.add_value_provider_argument( - '--input', help='Path of the file to read from') + '--input-file', + default='gs://dataflow-samples/shakespeare/kinglear.txt', + help='The file path for the input text to process.') parser.add_argument( - '--output', required=True, help='Output file to write results to.') + '--output-path', + required=True, + help='The path prefix for output files.') - pipeline_options = PipelineOptions(['--output', 'some/output_path']) - with beam.Pipeline(options=pipeline_options) as p: + beam_options = PipelineOptions() + args = beam_options.view_as(WordcountTemplatedOptions) - wordcount_options = pipeline_options.view_as(WordcountTemplatedOptions) - lines = p | 'Read' >> ReadFromText(wordcount_options.input) + with beam.Pipeline(options=beam_options) as pipeline: + lines = pipeline | 'Read' >> ReadFromText(args.input_file.get()) # [END example_wordcount_templated] @@ -569,9 +571,7 @@ def format_result(word_count): | 'Sum' >> beam.Map(lambda word_ones: (word_ones[0], sum(word_ones[1]))) | 'Format' >> beam.Map(format_result) - | 'Write' >> WriteToText(wordcount_options.output)) - - p.visit(SnippetUtils.RenameFiles(renames)) + | 'Write' >> WriteToText(args.output_path)) def examples_wordcount_debugging(renames): @@ -618,9 +618,9 @@ def process(self, element): # [END example_wordcount_debugging_logging] # [END example_wordcount_debugging_aggregators] - with TestPipeline() as p: # Use TestPipeline for testing. + with TestPipeline() as pipeline: # Use TestPipeline for testing. 
filtered_words = ( - p + pipeline | beam.io.ReadFromText('gs://dataflow-samples/shakespeare/kinglear.txt') | @@ -644,14 +644,13 @@ def format_result(word_count): | 'format' >> beam.Map(format_result) | 'Write' >> beam.io.WriteToText('gs://my-bucket/counts.txt')) - p.visit(SnippetUtils.RenameFiles(renames)) + pipeline.visit(SnippetUtils.RenameFiles(renames)) -def examples_wordcount_streaming(argv): +def examples_wordcount_streaming(): import apache_beam as beam from apache_beam import window from apache_beam.options.pipeline_options import PipelineOptions - from apache_beam.options.pipeline_options import StandardOptions # Parse out arguments. parser = argparse.ArgumentParser() @@ -672,19 +671,18 @@ def examples_wordcount_streaming(argv): help=( 'Input PubSub subscription of the form ' '"projects//subscriptions/."')) - known_args, pipeline_args = parser.parse_known_args(argv) + args, beam_args = parser.parse_known_args() - pipeline_options = PipelineOptions(pipeline_args) - pipeline_options.view_as(StandardOptions).streaming = True + beam_options = PipelineOptions(beam_args, streaming=True) - with TestPipeline(options=pipeline_options) as p: + with TestPipeline(options=beam_options) as pipeline: # [START example_wordcount_streaming_read] # Read from Pub/Sub into a PCollection. - if known_args.input_subscription: - lines = p | beam.io.ReadFromPubSub( - subscription=known_args.input_subscription) + if args.input_subscription: + lines = pipeline | beam.io.ReadFromPubSub( + subscription=args.input_subscription) else: - lines = p | beam.io.ReadFromPubSub(topic=known_args.input_topic) + lines = pipeline | beam.io.ReadFromPubSub(topic=args.input_topic) # [END example_wordcount_streaming_read] output = ( @@ -697,12 +695,12 @@ def examples_wordcount_streaming(argv): | 'Group' >> beam.GroupByKey() | 'Sum' >> beam.Map(lambda word_ones: (word_ones[0], sum(word_ones[1]))) - | - 'Format' >> beam.Map(lambda word_and_count: '%s: %d' % word_and_count)) + | 'Format' >> + beam.MapTuple(lambda word, count: f'{word}: {count}'.encode('utf-8'))) # [START example_wordcount_streaming_write] # Write to Pub/Sub - output | beam.io.WriteStringsToPubSub(known_args.output_topic) + output | beam.io.WriteToPubSub(args.output_topic) # [END example_wordcount_streaming_write] @@ -725,13 +723,13 @@ def __init__(self, templated_int): def process(self, an_int): yield self.templated_int.get() + an_int - pipeline_options = PipelineOptions() - with beam.Pipeline(options=pipeline_options) as p: + beam_options = PipelineOptions() + args = beam_options.view_as(TemplatedUserOptions) - user_options = pipeline_options.view_as(TemplatedUserOptions) - my_sum_fn = MySumFn(user_options.templated_int) + with beam.Pipeline(options=beam_options) as pipeline: + my_sum_fn = MySumFn(args.templated_int) sum = ( - p + pipeline | 'ReadCollection' >> beam.io.ReadFromText('gs://some/integer_collection') | 'StringToInt' >> beam.Map(lambda w: int(w)) @@ -743,7 +741,7 @@ def process(self, an_int): # so a value must be provided at graph-construction time my_sum_fn.templated_int = StaticValueProvider(int, 10) - p.visit(SnippetUtils.RenameFiles(renames)) + pipeline.visit(SnippetUtils.RenameFiles(renames)) # Defining a new source. @@ -840,8 +838,8 @@ def model_custom_source(count): # Using the source in an example pipeline. 
# [START model_custom_source_use_new_source] - with beam.Pipeline(options=PipelineOptions()) as p: - numbers = p | 'ProduceNumbers' >> beam.io.Read(CountingSource(count)) + with beam.Pipeline() as pipeline: + numbers = pipeline | 'ProduceNumbers' >> beam.io.Read(CountingSource(count)) # [END model_custom_source_use_new_source] lines = numbers | beam.core.Map(lambda number: 'line %d' % number) @@ -849,8 +847,8 @@ def model_custom_source(count): lines, equal_to(['line ' + str(number) for number in range(0, count)])) # [START model_custom_source_use_ptransform] - with beam.Pipeline(options=PipelineOptions()) as p: - numbers = p | 'ProduceNumbers' >> ReadFromCountingSource(count) + with beam.Pipeline() as pipeline: + numbers = pipeline | 'ProduceNumbers' >> ReadFromCountingSource(count) # [END model_custom_source_use_ptransform] lines = numbers | beam.core.Map(lambda number: 'line %d' % number) @@ -984,8 +982,8 @@ def model_custom_sink( # Using the new sink in an example pipeline. # [START model_custom_sink_use_new_sink] - with beam.Pipeline(options=PipelineOptions()) as p: - kvs = p | 'CreateKVs' >> beam.Create(KVs) + with beam.Pipeline(options=PipelineOptions()) as pipeline: + kvs = pipeline | 'CreateKVs' >> beam.Create(KVs) kvs | 'WriteToSimpleKV' >> beam.io.Write( SimpleKVSink(simplekv, 'http://url_to_simple_kv/', final_table_name)) @@ -994,8 +992,8 @@ def model_custom_sink( final_table_name = final_table_name_with_ptransform # [START model_custom_sink_use_ptransform] - with beam.Pipeline(options=PipelineOptions()) as p: - kvs = p | 'CreateKVs' >> beam.core.Create(KVs) + with beam.Pipeline(options=PipelineOptions()) as pipeline: + kvs = pipeline | 'CreateKVs' >> beam.core.Create(KVs) kvs | 'WriteToSimpleKV' >> WriteToKVSink( simplekv, 'http://url_to_simple_kv/', final_table_name) # [END model_custom_sink_use_ptransform] @@ -1008,9 +1006,10 @@ def filter_words(x): return re.findall(r'[A-Za-z\']+', x) # [START model_textio_read] - with beam.Pipeline(options=PipelineOptions()) as p: + with beam.Pipeline(options=PipelineOptions()) as pipeline: # [START model_pipelineio_read] - lines = p | 'ReadFromText' >> beam.io.ReadFromText('path/to/input-*.csv') + lines = pipeline | 'ReadFromText' >> beam.io.ReadFromText( + 'path/to/input-*.csv') # [END model_pipelineio_read] # [END model_textio_read] @@ -1022,21 +1021,21 @@ def filter_words(x): # [END model_pipelineio_write] # [END model_textio_write] - p.visit(SnippetUtils.RenameFiles(renames)) + pipeline.visit(SnippetUtils.RenameFiles(renames)) def model_textio_compressed(renames, expected): """Using a Read Transform to read compressed text files.""" - with TestPipeline() as p: + with TestPipeline() as pipeline: # [START model_textio_write_compressed] - lines = p | 'ReadFromText' >> beam.io.ReadFromText( + lines = pipeline | 'ReadFromText' >> beam.io.ReadFromText( '/path/to/input-*.csv.gz', compression_type=beam.io.filesystem.CompressionTypes.GZIP) # [END model_textio_write_compressed] assert_that(lines, equal_to(expected)) - p.visit(SnippetUtils.RenameFiles(renames)) + pipeline.visit(SnippetUtils.RenameFiles(renames)) def model_datastoreio(): @@ -1056,13 +1055,13 @@ def model_datastoreio(): query = Query(kind, project) # [START model_datastoreio_read] - p = beam.Pipeline(options=PipelineOptions()) - entities = p | 'Read From Datastore' >> ReadFromDatastore(query) + pipeline = beam.Pipeline(options=PipelineOptions()) + entities = pipeline | 'Read From Datastore' >> ReadFromDatastore(query) # [END model_datastoreio_read] # [START model_datastoreio_write] 
- p = beam.Pipeline(options=PipelineOptions()) - musicians = p | 'Musicians' >> beam.Create( + pipeline = beam.Pipeline(options=PipelineOptions()) + musicians = pipeline | 'Musicians' >> beam.Create( ['Mozart', 'Chopin', 'Beethoven', 'Vivaldi']) def to_entity(content): @@ -1076,7 +1075,8 @@ def to_entity(content): # [END model_datastoreio_write] -def model_bigqueryio(p, write_project='', write_dataset='', write_table=''): +def model_bigqueryio( + pipeline, write_project='', write_dataset='', write_table=''): """Using a Read and Write transform to read/write from/to BigQuery.""" # [START model_bigqueryio_table_spec] @@ -1116,29 +1116,29 @@ def model_bigqueryio(p, write_project='', write_dataset='', write_table=''): # [START model_bigqueryio_read_table] max_temperatures = ( - p - | 'ReadTable' >> beam.io.Read(beam.io.BigQuerySource(table_spec)) + pipeline + | 'ReadTable' >> beam.io.ReadFromBigQuery(table=table_spec) # Each row is a dictionary where the keys are the BigQuery columns | beam.Map(lambda elem: elem['max_temperature'])) # [END model_bigqueryio_read_table] # [START model_bigqueryio_read_query] max_temperatures = ( - p - | 'QueryTable' >> beam.io.Read(beam.io.BigQuerySource( + pipeline + | 'QueryTable' >> beam.io.ReadFromBigQuery( query='SELECT max_temperature FROM '\ - '[clouddataflow-readonly:samples.weather_stations]')) + '[clouddataflow-readonly:samples.weather_stations]') # Each row is a dictionary where the keys are the BigQuery columns | beam.Map(lambda elem: elem['max_temperature'])) # [END model_bigqueryio_read_query] # [START model_bigqueryio_read_query_std_sql] max_temperatures = ( - p - | 'QueryTableStdSQL' >> beam.io.Read(beam.io.BigQuerySource( + pipeline + | 'QueryTableStdSQL' >> beam.io.ReadFromBigQuery( query='SELECT max_temperature FROM '\ '`clouddataflow-readonly.samples.weather_stations`', - use_standard_sql=True)) + use_standard_sql=True) # Each row is a dictionary where the keys are the BigQuery columns | beam.Map(lambda elem: elem['max_temperature'])) # [END model_bigqueryio_read_query_std_sql] @@ -1162,7 +1162,7 @@ def model_bigqueryio(p, write_project='', write_dataset='', write_table=''): table_spec = '{}:{}.{}'.format(write_project, write_dataset, write_table) # [START model_bigqueryio_write_input] - quotes = p | beam.Create([ + quotes = pipeline | beam.Create([ { 'source': 'Mahatma Gandhi', 'quote': 'My life is my message.' }, @@ -1182,8 +1182,8 @@ def model_bigqueryio(p, write_project='', write_dataset='', write_table=''): # [START model_bigqueryio_write_dynamic_destinations] fictional_characters_view = beam.pvalue.AsDict( - p | 'CreateCharacters' >> beam.Create([('Yoda', True), - ('Obi Wan Kenobi', True)])) + pipeline | 'CreateCharacters' >> beam.Create([('Yoda', True), + ('Obi Wan Kenobi', True)])) def table_fn(element, fictional_characters): if element in fictional_characters: @@ -1239,9 +1239,9 @@ def expand(self, pcoll): # [END composite_ptransform_apply_method] # [END composite_transform_example] - with TestPipeline() as p: # Use TestPipeline for testing. + with TestPipeline() as pipeline: # Use TestPipeline for testing. ( - p + pipeline | beam.Create(contents) | CountWords() | beam.io.WriteToText(output_path)) @@ -1252,10 +1252,11 @@ def model_multiple_pcollections_flatten(contents, output_path): some_hash_fn = lambda s: ord(s[0]) partition_fn = lambda element, partitions: some_hash_fn(element) % partitions import apache_beam as beam - with TestPipeline() as p: # Use TestPipeline for testing. 
+ with TestPipeline() as pipeline: # Use TestPipeline for testing. # Partition into deciles - partitioned = p | beam.Create(contents) | beam.Partition(partition_fn, 3) + partitioned = pipeline | beam.Create(contents) | beam.Partition( + partition_fn, 3) pcoll1 = partitioned[0] pcoll2 = partitioned[1] pcoll3 = partitioned[2] @@ -1285,9 +1286,9 @@ def get_percentile(i): return i import apache_beam as beam - with TestPipeline() as p: # Use TestPipeline for testing. + with TestPipeline() as pipeline: # Use TestPipeline for testing. - students = p | beam.Create(contents) + students = pipeline | beam.Create(contents) # [START model_multiple_pcollections_partition] def partition_fn(student, num_partitions): @@ -1309,14 +1310,14 @@ def model_group_by_key(contents, output_path): import re import apache_beam as beam - with TestPipeline() as p: # Use TestPipeline for testing. + with TestPipeline() as pipeline: # Use TestPipeline for testing. def count_ones(word_ones): (word, ones) = word_ones return (word, sum(ones)) words_and_counts = ( - p + pipeline | beam.Create(contents) | beam.FlatMap(lambda x: re.findall(r'\w+', x)) | 'one word' >> beam.Map(lambda w: (w, 1))) @@ -1362,14 +1363,14 @@ def model_join_using_side_inputs( import apache_beam as beam from apache_beam.pvalue import AsIter - with TestPipeline() as p: # Use TestPipeline for testing. + with TestPipeline() as pipeline: # Use TestPipeline for testing. # [START model_join_using_side_inputs] # This code performs a join by receiving the set of names as an input and # passing PCollections that contain emails and phone numbers as side inputs # instead of using CoGroupByKey. - names = p | 'names' >> beam.Create(name_list) - emails = p | 'email' >> beam.Create(email_list) - phones = p | 'phone' >> beam.Create(phone_list) + names = pipeline | 'names' >> beam.Create(name_list) + emails = pipeline | 'email' >> beam.Create(email_list) + phones = pipeline | 'phone' >> beam.Create(phone_list) def join_info(name, emails, phone_numbers): filtered_emails = [] @@ -1422,9 +1423,9 @@ def file_process_pattern_access_metadata(): from apache_beam.io import fileio # [START FileProcessPatternAccessMetadataSnip1] - with beam.Pipeline() as p: + with beam.Pipeline() as pipeline: readable_files = ( - p + pipeline | fileio.MatchFiles('hdfs://path/to/*.txt') | fileio.ReadMatches() | beam.Reshuffle()) @@ -1462,21 +1463,21 @@ def process(self, an_int): 'The string value is %s' % RuntimeValueProvider.get_value('string_value', str, '')) - pipeline_options = PipelineOptions() + beam_options = PipelineOptions() + args = beam_options.view_as(MyOptions) + # Create pipeline. - with beam.Pipeline(options=pipeline_options) as p: + with beam.Pipeline(options=beam_options) as pipeline: - my_options = pipeline_options.view_as(MyOptions) # Add a branch for logging the ValueProvider value. _ = ( - p + pipeline | beam.Create([None]) - | 'LogValueProvs' >> beam.ParDo( - LogValueProvidersFn(my_options.string_value))) + | 'LogValueProvs' >> beam.ParDo(LogValueProvidersFn(args.string_value))) # The main pipeline. result_pc = ( - p + pipeline | "main_pc" >> beam.Create([1, 2, 3]) | beam.combiners.Sum.Globally()) @@ -1505,17 +1506,16 @@ def cross_join(left, rights): yield (left, x) # Create pipeline. 
- pipeline_options = PipelineOptions() - p = beam.Pipeline(options=pipeline_options) + pipeline = beam.Pipeline() side_input = ( - p + pipeline | 'PeriodicImpulse' >> PeriodicImpulse( first_timestamp, last_timestamp, interval, True) | 'MapToFileName' >> beam.Map(lambda x: src_file_pattern + str(x)) | 'ReadFromFile' >> beam.io.ReadAllFromText()) main_input = ( - p + pipeline | 'MpImpulse' >> beam.Create(sample_main_input_elements) | 'MapMpToTimestamped' >> beam.Map(lambda src: TimestampedValue(src, src)) @@ -1528,7 +1528,7 @@ def cross_join(left, rights): cross_join, rights=beam.pvalue.AsIter(side_input))) # [END SideInputSlowUpdateSnip1] - return p, result + return pipeline, result def bigqueryio_deadletter(): @@ -1537,10 +1537,10 @@ def bigqueryio_deadletter(): # Create pipeline. schema = ({'fields': [{'name': 'a', 'type': 'STRING', 'mode': 'REQUIRED'}]}) - p = beam.Pipeline() + pipeline = beam.Pipeline() errors = ( - p | 'Data' >> beam.Create([1, 2]) + pipeline | 'Data' >> beam.Create([1, 2]) | 'CreateBrokenData' >> beam.Map(lambda src: {'a': src} if src == 2 else {'a': None}) | 'WriteToBigQuery' >> beam.io.WriteToBigQuery( @@ -1611,9 +1611,9 @@ def nlp_analyze_text(): extract_syntax=True, ) - with beam.Pipeline() as p: + with beam.Pipeline() as pipeline: responses = ( - p + pipeline | beam.Create([ 'My experience so far has been fantastic! ' 'I\'d really recommend this product.' @@ -1647,7 +1647,8 @@ def sdf_basic_example(): read_next_record = None # [START SDF_BasicExample] - class FileToWordsRestrictionProvider(beam.io.RestrictionProvider): + class FileToWordsRestrictionProvider(beam.transforms.core.RestrictionProvider + ): def initial_restriction(self, file_name): return OffsetRange(0, os.stat(file_name).st_size) @@ -1680,7 +1681,8 @@ def sdf_basic_example_with_splitting(): from apache_beam.io.restriction_trackers import OffsetRange # [START SDF_BasicExampleWithSplitting] - class FileToWordsRestrictionProvider(beam.io.RestrictionProvider): + class FileToWordsRestrictionProvider(beam.transforms.core.RestrictionProvider + ): def split(self, file_name, restriction): # Compute and output 64 MiB size ranges to process in parallel split_size = 64 * (1 << 20) diff --git a/sdks/python/apache_beam/examples/snippets/snippets_test.py b/sdks/python/apache_beam/examples/snippets/snippets_test.py index 238ecc8aef59..8f215a336177 100644 --- a/sdks/python/apache_beam/examples/snippets/snippets_test.py +++ b/sdks/python/apache_beam/examples/snippets/snippets_test.py @@ -19,9 +19,7 @@ """Tests for all code snippets used in public docs.""" # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - +import gc import glob import gzip import logging @@ -32,12 +30,9 @@ import time import unittest import uuid -from builtins import map -from builtins import object -from builtins import range -from builtins import zip import mock +import parameterized import apache_beam as beam import apache_beam.transforms.combiners as combiners @@ -354,6 +349,79 @@ def expand(self, pcoll): with self.assertRaises(typehints.TypeCheckError): words_with_lens | beam.Map(lambda x: x).with_input_types(Tuple[int, int]) + def test_bad_types_annotations(self): + p = TestPipeline(options=PipelineOptions(pipeline_type_check=True)) + + numbers = p | beam.Create(['1', '2', '3']) + + # Consider the following code. 
+ # pylint: disable=expression-not-assigned + # pylint: disable=unused-variable + class FilterEvensDoFn(beam.DoFn): + def process(self, element): + if element % 2 == 0: + yield element + + evens = numbers | 'Untyped Filter' >> beam.ParDo(FilterEvensDoFn()) + + # Now suppose numbers was defined as [snippet above]. + # When running this pipeline, you'd get a runtime error, + # possibly on a remote machine, possibly very late. + + with self.assertRaises(TypeError): + p.run() + + # To catch this early, we can annotate process() with the expected types. + # Beam will then use these as type hints and perform type checking before + # the pipeline starts. + with self.assertRaises(typehints.TypeCheckError): + # [START type_hints_do_fn_annotations] + from typing import Iterable + + class FilterEvensDoFn(beam.DoFn): + def process(self, element: int) -> Iterable[int]: + if element % 2 == 0: + yield element + + evens = numbers | 'filter_evens' >> beam.ParDo(FilterEvensDoFn()) + # [END type_hints_do_fn_annotations] + + # Another example, using a list output type. Notice that the output + # annotation has an additional Optional for the else clause. + with self.assertRaises(typehints.TypeCheckError): + # [START type_hints_do_fn_annotations_optional] + from typing import List, Optional + + class FilterEvensDoubleDoFn(beam.DoFn): + def process(self, element: int) -> Optional[List[int]]: + if element % 2 == 0: + return [element, element] + return None + + evens = numbers | 'double_evens' >> beam.ParDo(FilterEvensDoubleDoFn()) + # [END type_hints_do_fn_annotations_optional] + + # Example using an annotated function. + with self.assertRaises(typehints.TypeCheckError): + # [START type_hints_map_annotations] + def my_fn(element: int) -> str: + return 'id_' + str(element) + + ids = numbers | 'to_id' >> beam.Map(my_fn) + # [END type_hints_map_annotations] + + # Example using an annotated PTransform. + with self.assertRaises(typehints.TypeCheckError): + # [START type_hints_ptransforms] + from apache_beam.pvalue import PCollection + + class IntToStr(beam.PTransform): + def expand(self, pcoll: PCollection[int]) -> PCollection[str]: + return pcoll | beam.Map(lambda elem: str(elem)) + + ids = numbers | 'convert to str' >> IntToStr() + # [END type_hints_ptransforms] + def test_runtime_checks_off(self): # We do not run the following pipeline, as it has incorrect type # information, and may fail with obscure errors, depending on the runner @@ -495,12 +563,6 @@ def expand(self, pcoll): return pcoll | 'DummyWriteForTesting' >> beam.ParDo( SnippetsTest.DummyWriteTransform.WriteDoFn(self.file_to_write)) - @classmethod - def setUpClass(cls): - # Method has been renamed in Python 3 - if sys.version_info[0] < 3: - cls.assertCountEqual = cls.assertItemsEqual - def setUp(self): self.old_read_from_text = beam.io.ReadFromText self.old_write_to_text = beam.io.WriteToText @@ -516,6 +578,8 @@ def tearDown(self): beam.io.WriteToText = self.old_write_to_text # Cleanup all the temporary files created in the test. 
map(os.remove, self.temp_files) + # Ensure that PipelineOptions subclasses have been cleaned up between tests + gc.collect() def create_temp_file(self, contents=''): with tempfile.NamedTemporaryFile(delete=False) as f: @@ -537,15 +601,20 @@ def get_output(self, path, sorted_output=True, suffix=''): def test_model_pipelines(self): temp_path = self.create_temp_file('aa bb cc\n bb cc\n cc') result_path = temp_path + '.result' - snippets.model_pipelines( - ['--input=%s*' % temp_path, '--output=%s' % result_path]) + test_argv = [ + "unused_argv[0]", + f"--input-file={temp_path}*", + f"--output-path={result_path}", + ] + with mock.patch.object(sys, 'argv', test_argv): + snippets.model_pipelines() self.assertEqual( self.get_output(result_path), [str(s) for s in [(u'aa', 1), (u'bb', 2), (u'cc', 3)]]) def test_model_pcollection(self): temp_path = self.create_temp_file() - snippets.model_pcollection(['--output=%s' % temp_path]) + snippets.model_pcollection(temp_path) self.assertEqual( self.get_output(temp_path), [ @@ -688,7 +757,13 @@ def test_model_bigqueryio(self): def _run_test_pipeline_for_options(self, fn): temp_path = self.create_temp_file('aa\nbb\ncc') result_path = temp_path + '.result' - fn(['--input=%s*' % temp_path, '--output=%s' % result_path]) + test_argv = [ + "unused_argv[0]", + f"--input-file={temp_path}*", + f"--output-path={result_path}", + ] + with mock.patch.object(sys, 'argv', test_argv): + fn() self.assertEqual(['aa', 'bb', 'cc'], self.get_output(result_path)) def test_pipeline_options_local(self): @@ -711,21 +786,24 @@ def test_pipeline_logging(self): self.assertEqual( sorted(' '.join(lines).split(' ')), self.get_output(result_path)) - def test_examples_wordcount(self): - pipelines = [ - snippets.examples_wordcount_minimal, - snippets.examples_wordcount_wordcount, - snippets.pipeline_monitoring, - snippets.examples_wordcount_templated + @parameterized.parameterized.expand([ + [snippets.examples_wordcount_minimal], + [snippets.examples_wordcount_wordcount], + [snippets.pipeline_monitoring], + [snippets.examples_wordcount_templated], + ]) + def test_examples_wordcount(self, pipeline): + temp_path = self.create_temp_file('abc def ghi\n abc jkl') + result_path = self.create_temp_file() + test_argv = [ + "unused_argv[0]", + f"--input-file={temp_path}*", + f"--output-path={result_path}", ] - - for pipeline in pipelines: - temp_path = self.create_temp_file('abc def ghi\n abc jkl') - result_path = self.create_temp_file() - pipeline({'read': temp_path, 'write': result_path}) - self.assertEqual( - self.get_output(result_path), - ['abc: 2', 'def: 1', 'ghi: 1', 'jkl: 1']) + with mock.patch.object(sys, 'argv', test_argv): + pipeline() + self.assertEqual( + self.get_output(result_path), ['abc: 2', 'def: 1', 'ghi: 1', 'jkl: 1']) def test_examples_ptransforms_templated(self): pipelines = [snippets.examples_ptransforms_templated] @@ -748,7 +826,7 @@ def test_examples_wordcount_debugging(self): @unittest.skipIf(pubsub is None, 'GCP dependencies are not installed') @mock.patch('apache_beam.io.ReadFromPubSub') - @mock.patch('apache_beam.io.WriteStringsToPubSub') + @mock.patch('apache_beam.io.WriteToPubSub') def test_examples_wordcount_streaming(self, *unused_mocks): def FakeReadFromPubSub(topic=None, subscription=None, values=None): expected_topic = topic @@ -768,7 +846,7 @@ def __init__(self, matcher): def expand(self, pcoll): assert_that(pcoll, self.matcher) - def FakeWriteStringsToPubSub(topic=None, values=None): + def FakeWriteToPubSub(topic=None, values=None): expected_topic = topic def 
_inner(topic=None, subscription=None): @@ -785,28 +863,34 @@ def _inner(topic=None, subscription=None): TimestampedValue(b'a b c c c', 20) ] output_topic = 'projects/fake-beam-test-project/topic/outtopic' - output_values = ['a: 1', 'a: 2', 'b: 1', 'b: 3', 'c: 3'] + output_values = [b'a: 1', b'a: 2', b'b: 1', b'b: 3', b'c: 3'] beam.io.ReadFromPubSub = ( FakeReadFromPubSub(topic=input_topic, values=input_values)) - beam.io.WriteStringsToPubSub = ( - FakeWriteStringsToPubSub(topic=output_topic, values=output_values)) - snippets.examples_wordcount_streaming([ + beam.io.WriteToPubSub = ( + FakeWriteToPubSub(topic=output_topic, values=output_values)) + test_argv = [ + 'unused_argv[0]', '--input_topic', 'projects/fake-beam-test-project/topic/intopic', '--output_topic', 'projects/fake-beam-test-project/topic/outtopic' - ]) + ] + with mock.patch.object(sys, 'argv', test_argv): + snippets.examples_wordcount_streaming() # Test with custom subscription. input_sub = 'projects/fake-beam-test-project/subscriptions/insub' beam.io.ReadFromPubSub = FakeReadFromPubSub( subscription=input_sub, values=input_values) - snippets.examples_wordcount_streaming([ + test_argv = [ + 'unused_argv[0]', '--input_subscription', 'projects/fake-beam-test-project/subscriptions/insub', '--output_topic', 'projects/fake-beam-test-project/topic/outtopic' - ]) + ] + with mock.patch.object(sys, 'argv', test_argv): + snippets.examples_wordcount_streaming() def test_model_composite_transform_example(self): contents = ['aa bb cc', 'bb cc', 'cc'] @@ -997,8 +1081,8 @@ def test_model_early_late_triggers(self): assert_that(counts, equal_to([('a', 4), ('b', 2), ('a', 1)])) def test_model_setting_trigger(self): - pipeline_options = PipelineOptions() - pipeline_options.view_as(StandardOptions).streaming = True + pipeline_options = PipelineOptions( + flags=['--streaming', '--allow_unsafe_triggers']) with TestPipeline(options=pipeline_options) as p: test_stream = ( @@ -1052,8 +1136,8 @@ def test_model_composite_triggers(self): assert_that(counts, equal_to([('a', 3), ('b', 2), ('a', 2), ('c', 2)])) def test_model_other_composite_triggers(self): - pipeline_options = PipelineOptions() - pipeline_options.view_as(StandardOptions).streaming = True + pipeline_options = PipelineOptions( + flags=['--streaming', '--allow_unsafe_triggers']) with TestPipeline(options=pipeline_options) as p: test_stream = ( @@ -1246,11 +1330,11 @@ def test_setting_global_window(self): beam.Map(lambda x: beam.window.TimestampedValue(('k', x), x))) # [START setting_global_window] from apache_beam import window - session_windowed_items = ( + global_windowed_items = ( items | 'window' >> beam.WindowInto(window.GlobalWindows())) # [END setting_global_window] summed = ( - session_windowed_items + global_windowed_items | 'group' >> beam.GroupByKey() | 'combine' >> beam.CombineValues(sum)) unkeyed = summed | 'unkey' >> beam.Map(lambda x: x[1]) diff --git a/sdks/python/apache_beam/examples/snippets/snippets_test_py3.py b/sdks/python/apache_beam/examples/snippets/snippets_test_py3.py deleted file mode 100644 index 5eb1d4ba15f7..000000000000 --- a/sdks/python/apache_beam/examples/snippets/snippets_test_py3.py +++ /dev/null @@ -1,114 +0,0 @@ -# coding=utf-8 -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. 
-# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -""" -Tests for all code snippets used in public docs, using Python 3 specific -syntax. -""" -# pytype: skip-file - -from __future__ import absolute_import -from __future__ import division - -import logging -import unittest - -import apache_beam as beam -from apache_beam import typehints -from apache_beam.options.pipeline_options import PipelineOptions -from apache_beam.testing.test_pipeline import TestPipeline - - -class TypeHintsTest(unittest.TestCase): - def test_bad_types_annotations(self): - p = TestPipeline(options=PipelineOptions(pipeline_type_check=True)) - - numbers = p | beam.Create(['1', '2', '3']) - - # Consider the following code. - # pylint: disable=expression-not-assigned - # pylint: disable=unused-variable - class FilterEvensDoFn(beam.DoFn): - def process(self, element): - if element % 2 == 0: - yield element - - evens = numbers | 'Untyped Filter' >> beam.ParDo(FilterEvensDoFn()) - - # Now suppose numbers was defined as [snippet above]. - # When running this pipeline, you'd get a runtime error, - # possibly on a remote machine, possibly very late. - - with self.assertRaises(TypeError): - p.run() - - # To catch this early, we can annotate process() with the expected types. - # Beam will then use these as type hints and perform type checking before - # the pipeline starts. - with self.assertRaises(typehints.TypeCheckError): - # [START type_hints_do_fn_annotations] - from typing import Iterable - - class FilterEvensDoFn(beam.DoFn): - def process(self, element: int) -> Iterable[int]: - if element % 2 == 0: - yield element - - evens = numbers | 'filter_evens' >> beam.ParDo(FilterEvensDoFn()) - # [END type_hints_do_fn_annotations] - - # Another example, using a list output type. Notice that the output - # annotation has an additional Optional for the else clause. - with self.assertRaises(typehints.TypeCheckError): - # [START type_hints_do_fn_annotations_optional] - from typing import List, Optional - - class FilterEvensDoubleDoFn(beam.DoFn): - def process(self, element: int) -> Optional[List[int]]: - if element % 2 == 0: - return [element, element] - return None - - evens = numbers | 'double_evens' >> beam.ParDo(FilterEvensDoubleDoFn()) - # [END type_hints_do_fn_annotations_optional] - - # Example using an annotated function. - with self.assertRaises(typehints.TypeCheckError): - # [START type_hints_map_annotations] - def my_fn(element: int) -> str: - return 'id_' + str(element) - - ids = numbers | 'to_id' >> beam.Map(my_fn) - # [END type_hints_map_annotations] - - # Example using an annotated PTransform. 
-    with self.assertRaises(typehints.TypeCheckError):
-      # [START type_hints_ptransforms]
-      from apache_beam.pvalue import PCollection
-
-      class IntToStr(beam.PTransform):
-        def expand(self, pcoll: PCollection[int]) -> PCollection[str]:
-          return pcoll | beam.Map(lambda elem: str(elem))
-
-      ids = numbers | 'convert to str' >> IntToStr()
-      # [END type_hints_ptransforms]
-
-
-if __name__ == '__main__':
-  logging.getLogger().setLevel(logging.INFO)
-  unittest.main()
diff --git a/sdks/python/apache_beam/examples/snippets/transforms/__init__.py b/sdks/python/apache_beam/examples/snippets/transforms/__init__.py
index 6569e3fe5de4..cce3acad34a4 100644
--- a/sdks/python/apache_beam/examples/snippets/transforms/__init__.py
+++ b/sdks/python/apache_beam/examples/snippets/transforms/__init__.py
@@ -14,5 +14,3 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
-
-from __future__ import absolute_import
diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/__init__.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/__init__.py
index 6569e3fe5de4..cce3acad34a4 100644
--- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/__init__.py
+++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/__init__.py
@@ -14,5 +14,3 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
-
-from __future__ import absolute_import
diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/cogroupbykey.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/cogroupbykey.py
index b20cd51d8eb0..ae5ce8173b3d 100644
--- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/cogroupbykey.py
+++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/cogroupbykey.py
@@ -18,9 +18,6 @@

 # pytype: skip-file

-from __future__ import absolute_import
-from __future__ import print_function
-

 def cogroupbykey(test=None):
   # [START cogroupbykey]
diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/cogroupbykey_test.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/cogroupbykey_test.py
index cccfbe3bc4f2..ad4ed99a6e2b 100644
--- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/cogroupbykey_test.py
+++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/cogroupbykey_test.py
@@ -18,9 +18,6 @@

 # pytype: skip-file

-from __future__ import absolute_import
-from __future__ import print_function
-
 import unittest

 import mock
diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally.py
index 15d16932b095..eb1ea9720df6 100644
--- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally.py
+++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally.py
@@ -18,10 +18,6 @@

 # pytype: skip-file

-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-

 def combineglobally_function(test=None):
   # [START combineglobally_function]
diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally_test.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally_test.py
index 4f7ec2f2056a..e3c36a49613d 100644
---
a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally_test.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally_test.py @@ -18,10 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - -import sys import unittest import mock @@ -67,32 +63,16 @@ def check_percentages(actual): str) # pylint: enable=line-too-long class CombineGloballyTest(unittest.TestCase): - # TODO: Remove this after Python 2 deprecation. - # https://issues.apache.org/jira/browse/BEAM-8124 - @unittest.skipIf( - sys.version_info[0] == 2, 'Python 2 renders sets in a non-compatible way') def test_combineglobally_function(self): combineglobally.combineglobally_function(check_common_items) - # TODO: Remove this after Python 2 deprecation. - # https://issues.apache.org/jira/browse/BEAM-8124 - @unittest.skipIf( - sys.version_info[0] == 2, 'Python 2 renders sets in a non-compatible way') def test_combineglobally_lambda(self): combineglobally.combineglobally_lambda(check_common_items) - # TODO: Remove this after Python 2 deprecation. - # https://issues.apache.org/jira/browse/BEAM-8124 - @unittest.skipIf( - sys.version_info[0] == 2, 'Python 2 renders sets in a non-compatible way') def test_combineglobally_multiple_arguments(self): combineglobally.combineglobally_multiple_arguments( check_common_items_with_exceptions) - # TODO: Remove this after Python 2 deprecation. - # https://issues.apache.org/jira/browse/BEAM-8124 - @unittest.skipIf( - sys.version_info[0] == 2, 'Python 2 renders sets in a non-compatible way') def test_combineglobally_side_inputs_singleton(self): combineglobally.combineglobally_side_inputs_singleton( check_common_items_with_exceptions) @@ -107,10 +87,6 @@ def test_combineglobally_side_inputs_singleton(self): # combineglobally.combineglobally_side_inputs_dict( # check_custom_common_items) - # TODO: Remove this after Python 2 deprecation. 
- # https://issues.apache.org/jira/browse/BEAM-8124 - @unittest.skipIf( - sys.version_info[0] == 2, 'Python 2 renders sets in a non-compatible way') def test_combineglobally_combinefn(self): combineglobally.combineglobally_combinefn(check_percentages) diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineperkey.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineperkey.py index 2b837ac0e6b0..31169699eebf 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineperkey.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineperkey.py @@ -18,10 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - def combineperkey_simple(test=None): # [START combineperkey_simple] diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineperkey_test.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineperkey_test.py index ec23167dad3b..990781bc3915 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineperkey_test.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineperkey_test.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import unittest import mock diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/combinevalues.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/combinevalues.py index 97d7c4cc7f69..c9333f551e0f 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/combinevalues.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/combinevalues.py @@ -18,10 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - def combinevalues_simple(test=None): # [START combinevalues_simple] diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/combinevalues_test.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/combinevalues_test.py index 3e233c9db210..dd4da75f3cf5 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/combinevalues_test.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/combinevalues_test.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import unittest import mock diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/count.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/count.py index 6ee3980daf86..de7021b33ff2 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/count.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/count.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - def count_globally(test=None): # [START count_globally] diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/count_test.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/count_test.py index fff240693170..9016300a69e7 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/count_test.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/count_test.py @@ -18,9 +18,6 @@ # 
pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import unittest import mock diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/distinct.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/distinct.py index e2ee6da3bdb9..1d2f968032c3 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/distinct.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/distinct.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - def distinct(test=None): # [START distinct] diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/distinct_test.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/distinct_test.py index 70d134163b08..6f1246ee363a 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/distinct_test.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/distinct_test.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import unittest import mock diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupby_test.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupby_test.py index 879446a69620..dfeeaadc9c98 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupby_test.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupby_test.py @@ -21,11 +21,6 @@ # Wrapping hurts the readability of the docs. # pylint: disable=line-too-long -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import sys import typing import unittest @@ -120,7 +115,6 @@ def test_groupby_two_exprs(self): ] assert_that(grouped | beam.MapTuple(normalize_kv), equal_to(expected)) - @unittest.skipIf(sys.version_info[0] < 3, 'bad comparison op') def test_group_by_attr(self): # [START groupby_attr] with beam.Pipeline() as p: @@ -145,7 +139,6 @@ def test_group_by_attr(self): ] assert_that(grouped | beam.MapTuple(normalize_kv), equal_to(expected)) - @unittest.skipIf(sys.version_info[0] < 3, 'bad comparison op') def test_group_by_attr_expr(self): # [START groupby_attr_expr] with beam.Pipeline() as p: diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupbykey.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupbykey.py index 70f49f51d4b2..4abdeae7ac11 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupbykey.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupbykey.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - def groupbykey(test=None): # [START groupbykey] diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupbykey_test.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupbykey_test.py index 3ea278c296ee..1a9b8f158c1a 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupbykey_test.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupbykey_test.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import unittest import mock diff --git 
a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupintobatches.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupintobatches.py index 5e6ceaae2558..70b3144c7758 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupintobatches.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupintobatches.py @@ -16,9 +16,6 @@ # limitations under the License. # -from __future__ import absolute_import -from __future__ import print_function - def groupintobatches(test=None): # [START groupintobatches] diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupintobatches_test.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupintobatches_test.py index 76c94da5334f..09449f77f6e4 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupintobatches_test.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupintobatches_test.py @@ -16,9 +16,6 @@ # limitations under the License. # -from __future__ import absolute_import -from __future__ import print_function - import unittest import mock diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/latest.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/latest.py index 166551886779..d7184c027844 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/latest.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/latest.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - def latest_globally(test=None): # [START latest_globally] diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/latest_test.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/latest_test.py index 6f9bffd49fe4..58ad92031838 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/latest_test.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/latest_test.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import unittest import mock diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/max.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/max.py index fd0187fa64ae..1d17c34d72a2 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/max.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/max.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - def max_globally(test=None): # [START max_globally] diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/max_test.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/max_test.py index af43781b2770..834129abe53f 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/max_test.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/max_test.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import unittest import mock diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/mean.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/mean.py index 033c0cf32ffc..489655b21c2e 100644 --- 
a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/mean.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/mean.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - def mean_globally(test=None): # [START mean_globally] diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/mean_test.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/mean_test.py index 796bbbf0cd8b..d91f9f2c74db 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/mean_test.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/mean_test.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import unittest import mock diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/min.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/min.py index 812e73315bd4..f31ce4a3f13e 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/min.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/min.py @@ -16,9 +16,6 @@ # limitations under the License. # -from __future__ import absolute_import -from __future__ import print_function - def min_globally(test=None): # [START min_globally] diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/min_test.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/min_test.py index 321fd12cf99a..c30a2bd7398b 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/min_test.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/min_test.py @@ -16,9 +16,6 @@ # limitations under the License. # -from __future__ import absolute_import -from __future__ import print_function - import unittest import mock diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/sample.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/sample.py index 1ca827344754..92fc6a376b9c 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/sample.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/sample.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - def sample_fixed_size_globally(test=None): # [START sample_fixed_size_globally] diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/sample_test.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/sample_test.py index bb337ee70d9b..453d82affc6a 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/sample_test.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/sample_test.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import unittest import mock diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/sum.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/sum.py index 74b08365b2f8..a25509237521 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/sum.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/sum.py @@ -16,9 +16,6 @@ # limitations under the License. 
# -from __future__ import absolute_import -from __future__ import print_function - def sum_globally(test=None): # [START sum_globally] diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/sum_test.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/sum_test.py index dd59770b5145..1516739161aa 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/sum_test.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/sum_test.py @@ -16,9 +16,6 @@ # limitations under the License. # -from __future__ import absolute_import -from __future__ import print_function - import unittest import mock diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/top.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/top.py index 8b6cfa61eb8d..787ab9bec7e7 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/top.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/top.py @@ -16,9 +16,6 @@ # limitations under the License. # -from __future__ import absolute_import -from __future__ import print_function - def top_largest(test=None): # [START top_largest] diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/top_test.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/top_test.py index 2bbb759c0297..95beb241f5cd 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/top_test.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/top_test.py @@ -16,10 +16,6 @@ # limitations under the License. # -from __future__ import absolute_import -from __future__ import print_function - -import sys import unittest import mock @@ -98,12 +94,6 @@ def test_top_smallest_per_key(self): def test_top_of(self): top.top_of(check_shortest_elements) - # TODO: Remove this after Python 2 deprecation. - # https://issues.apache.org/jira/browse/BEAM-8124 - @unittest.skipIf( - sys.version_info[0] == 2, - 'nosetests in Python 2 uses ascii instead of utf-8 in ' - 'the Top.PerKey transform and causes this to fail') def test_top_per_key(self): top.top_per_key(check_shortest_elements_per_key) diff --git a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/__init__.py b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/__init__.py index 6569e3fe5de4..cce3acad34a4 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/__init__.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/__init__.py @@ -14,5 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. 
# - -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/filter.py b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/filter.py index 04cf58e8a51b..123e334d6caf 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/filter.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/filter.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - def filter_function(test=None): # [START filter_function] diff --git a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/filter_test.py b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/filter_test.py index 10ad79d702ee..81023c6d9347 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/filter_test.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/filter_test.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import unittest import mock diff --git a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap.py b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap.py index 7e3e0076e646..362828576419 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - def flatmap_simple(test=None): # [START flatmap_simple] diff --git a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap_test.py b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap_test.py index 9e67e74e3005..23e6d229bdc8 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap_test.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap_test.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import unittest import mock diff --git a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/keys.py b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/keys.py index 9bbe9528ce3a..af206e020945 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/keys.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/keys.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - def keys(test=None): # [START keys] diff --git a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/keys_test.py b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/keys_test.py index 945583606a97..fcfe370234f1 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/keys_test.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/keys_test.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import unittest import mock diff --git a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/kvswap.py b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/kvswap.py index 8e69bc60a688..0fc77e90c333 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/kvswap.py +++ 
b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/kvswap.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - def kvswap(test=None): # [START kvswap] diff --git a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/kvswap_test.py b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/kvswap_test.py index c3eed2794857..fa494314bd3b 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/kvswap_test.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/kvswap_test.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import unittest import mock diff --git a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/map.py b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/map.py index eaac277f1b23..216cd5b62d79 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/map.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/map.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - def map_simple(test=None): # [START map_simple] diff --git a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/map_test.py b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/map_test.py index 2f34a6220728..423d0572c3e8 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/map_test.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/map_test.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import unittest import mock diff --git a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/pardo.py b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/pardo.py index 965446f1e50a..c54d05e39911 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/pardo.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/pardo.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - def pardo_dofn(test=None): # [START pardo_dofn] diff --git a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/pardo_test.py b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/pardo_test.py index 8498ab9be8b2..7a6f2ce00a1b 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/pardo_test.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/pardo_test.py @@ -18,12 +18,8 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - -import platform -import sys import unittest +from io import StringIO import mock @@ -34,13 +30,6 @@ from . import pardo -# TODO: Remove this after Python 2 deprecation. -# https://issues.apache.org/jira/browse/BEAM-8124 -if sys.version_info[0] == 2: - from io import BytesIO as StringIO -else: - from io import StringIO - def check_plants(actual): expected = '''[START plants] @@ -96,11 +85,6 @@ class ParDoTest(unittest.TestCase): def test_pardo_dofn(self): pardo.pardo_dofn(check_plants) - # TODO: Remove this after Python 2 deprecation. 
- # https://issues.apache.org/jira/browse/BEAM-8124 - @unittest.skipIf( - sys.version_info[0] == 2 and platform.system() == 'Windows', - 'Python 2 on Windows uses `long` rather than `int`') def test_pardo_dofn_params(self): pardo.pardo_dofn_params(check_dofn_params) diff --git a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/partition.py b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/partition.py index ba1e8a74d2c5..30dc8e32c2f3 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/partition.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/partition.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - def partition_function(test=None): # pylint: disable=line-too-long, expression-not-assigned diff --git a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/partition_test.py b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/partition_test.py index be243b2ac923..55babe0802eb 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/partition_test.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/partition_test.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import unittest import mock diff --git a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/regex.py b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/regex.py index cdd83a57f771..68688b870c0f 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/regex.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/regex.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - def regex_matches(test=None): # [START regex_matches] diff --git a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/regex_test.py b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/regex_test.py index d783b1fe66e6..1cc754992413 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/regex_test.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/regex_test.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import unittest import mock diff --git a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/tostring.py b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/tostring.py index 32755cb29e63..7ca9d2b27c00 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/tostring.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/tostring.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - def tostring_kvs(test=None): # [START tostring_kvs] diff --git a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/tostring_test.py b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/tostring_test.py index e14227899cf6..01b9c04f7c4a 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/tostring_test.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/tostring_test.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import unittest import 
mock diff --git a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/values.py b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/values.py index a9b3cc1282cd..c72c404d6f00 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/values.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/values.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - def values(test=None): # [START values] diff --git a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/values_test.py b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/values_test.py index 6c2caaacbade..acd84215a69d 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/values_test.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/values_test.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import unittest import mock diff --git a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/withtimestamps.py b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/withtimestamps.py index bf4a4d450960..5008f26e3225 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/withtimestamps.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/withtimestamps.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - def withtimestamps_event_time(test=None): # [START withtimestamps_event_time] diff --git a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/withtimestamps_test.py b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/withtimestamps_test.py index 191d1140e24b..f23f48b1e9f0 100644 --- a/sdks/python/apache_beam/examples/snippets/transforms/elementwise/withtimestamps_test.py +++ b/sdks/python/apache_beam/examples/snippets/transforms/elementwise/withtimestamps_test.py @@ -18,9 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import unittest import mock diff --git a/sdks/python/apache_beam/examples/snippets/util.py b/sdks/python/apache_beam/examples/snippets/util.py index a14cd36c2ff5..7911b32ce4d3 100644 --- a/sdks/python/apache_beam/examples/snippets/util.py +++ b/sdks/python/apache_beam/examples/snippets/util.py @@ -17,8 +17,6 @@ # pytype: skip-file -from __future__ import absolute_import - import ast import shlex import subprocess as sp diff --git a/sdks/python/apache_beam/examples/snippets/util_test.py b/sdks/python/apache_beam/examples/snippets/util_test.py index 21d4e0eb6be2..fae26a339d53 100644 --- a/sdks/python/apache_beam/examples/snippets/util_test.py +++ b/sdks/python/apache_beam/examples/snippets/util_test.py @@ -18,8 +18,6 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest from mock import patch diff --git a/sdks/python/apache_beam/examples/sql_taxi.py b/sdks/python/apache_beam/examples/sql_taxi.py index 607dea1d3f82..165bbc9cd9ba 100644 --- a/sdks/python/apache_beam/examples/sql_taxi.py +++ b/sdks/python/apache_beam/examples/sql_taxi.py @@ -29,8 +29,6 @@ # pytype: skip-file -from __future__ import absolute_import - import json import logging @@ -80,7 +78,8 @@ def run(output_topic, pipeline_args): "window_end": window.end.to_rfc3339() }) | "Convert to JSON" >> beam.Map(json.dumps) - | 
beam.io.WriteStringsToPubSub(topic=output_topic)) + | "UTF-8 encode" >> beam.Map(lambda s: s.encode("utf-8")) + | beam.io.WriteToPubSub(topic=output_topic)) if __name__ == '__main__': diff --git a/sdks/python/apache_beam/examples/streaming_wordcount.py b/sdks/python/apache_beam/examples/streaming_wordcount.py index 14d7e8c776ff..d276cfc26d54 100644 --- a/sdks/python/apache_beam/examples/streaming_wordcount.py +++ b/sdks/python/apache_beam/examples/streaming_wordcount.py @@ -20,13 +20,9 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import logging -from past.builtins import unicode - import apache_beam as beam import apache_beam.transforms.window as window from apache_beam.examples.wordcount_with_metrics import WordExtractingDoFn @@ -85,8 +81,7 @@ def count_ones(word_ones): counts = ( lines - | 'split' >> - (beam.ParDo(WordExtractingDoFn()).with_output_types(unicode)) + | 'split' >> (beam.ParDo(WordExtractingDoFn()).with_output_types(str)) | 'pair_with_one' >> beam.Map(lambda x: (x, 1)) | beam.WindowInto(window.FixedWindows(15, 0)) | 'group' >> beam.GroupByKey() diff --git a/sdks/python/apache_beam/examples/streaming_wordcount_debugging.py b/sdks/python/apache_beam/examples/streaming_wordcount_debugging.py index 28d9df9de583..2df87f4aa045 100644 --- a/sdks/python/apache_beam/examples/streaming_wordcount_debugging.py +++ b/sdks/python/apache_beam/examples/streaming_wordcount_debugging.py @@ -34,15 +34,11 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import logging import re import time -from past.builtins import unicode - import apache_beam as beam import apache_beam.transforms.window as window from apache_beam.examples.wordcount import WordExtractingDoFn @@ -132,8 +128,7 @@ def count_ones(word_ones): counts = ( lines - | 'Split' >> - (beam.ParDo(WordExtractingDoFn()).with_output_types(unicode)) + | 'Split' >> (beam.ParDo(WordExtractingDoFn()).with_output_types(str)) | 'AddTimestampFn' >> beam.ParDo(AddTimestampFn()) | 'After AddTimestampFn' >> ParDo(PrintFn('After AddTimestampFn')) | 'PairWithOne' >> beam.Map(lambda x: (x, 1)) diff --git a/sdks/python/apache_beam/examples/streaming_wordcount_debugging_it_test.py b/sdks/python/apache_beam/examples/streaming_wordcount_debugging_it_test.py index 526863d47b66..9101ff830c21 100644 --- a/sdks/python/apache_beam/examples/streaming_wordcount_debugging_it_test.py +++ b/sdks/python/apache_beam/examples/streaming_wordcount_debugging_it_test.py @@ -19,14 +19,12 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest import uuid +import pytest from hamcrest.core.core.allof import all_of -from nose.plugins.attrib import attr from apache_beam.examples import streaming_wordcount_debugging from apache_beam.io.gcp.tests.pubsub_matcher import PubSubMessageMatcher @@ -96,7 +94,7 @@ def tearDown(self): test_utils.cleanup_topics( self.pub_client, [self.input_topic, self.output_topic]) - @attr('IT') + @pytest.mark.it_postcommit @unittest.skip( "Skipped due to [BEAM-3377]: assert_that not working for streaming") def test_streaming_wordcount_debugging_it(self): diff --git a/sdks/python/apache_beam/examples/streaming_wordcount_debugging_test.py b/sdks/python/apache_beam/examples/streaming_wordcount_debugging_test.py index 1210c480a7bf..2ab80993080c 100644 --- a/sdks/python/apache_beam/examples/streaming_wordcount_debugging_test.py +++ b/sdks/python/apache_beam/examples/streaming_wordcount_debugging_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from 
__future__ import absolute_import - import unittest import mock diff --git a/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py b/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py index cd79ddfef486..8beae5e89c66 100644 --- a/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py +++ b/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py @@ -19,15 +19,12 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest import uuid -from builtins import range +import pytest from hamcrest.core.core.allof import all_of -from nose.plugins.attrib import attr from apache_beam.examples import streaming_wordcount from apache_beam.io.gcp.tests.pubsub_matcher import PubSubMessageMatcher @@ -80,7 +77,7 @@ def tearDown(self): test_utils.cleanup_topics( self.pub_client, [self.input_topic, self.output_topic]) - @attr('IT') + @pytest.mark.it_postcommit def test_streaming_wordcount_it(self): # Build expected dataset. expected_msg = [('%d: 1' % num).encode('utf-8') diff --git a/sdks/python/apache_beam/examples/windowed_wordcount.py b/sdks/python/apache_beam/examples/windowed_wordcount.py index a93c0759d83f..861a14792148 100644 --- a/sdks/python/apache_beam/examples/windowed_wordcount.py +++ b/sdks/python/apache_beam/examples/windowed_wordcount.py @@ -23,13 +23,9 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import logging -from past.builtins import unicode - import apache_beam as beam import apache_beam.transforms.window as window @@ -56,7 +52,7 @@ def process(self, element, window=beam.DoFn.WindowParam): }] -def run(argv=None): +def main(argv=None): """Build and run the pipeline.""" parser = argparse.ArgumentParser() @@ -84,7 +80,7 @@ def count_ones(word_ones): transformed = ( lines - | 'Split' >> (beam.FlatMap(find_words).with_output_types(unicode)) + | 'Split' >> (beam.FlatMap(find_words).with_output_types(str)) | 'PairWithOne' >> beam.Map(lambda x: (x, 1)) | beam.WindowInto(window.FixedWindows(2 * 60, 0)) | 'Group' >> beam.GroupByKey() @@ -102,4 +98,4 @@ def count_ones(word_ones): if __name__ == '__main__': logging.getLogger().setLevel(logging.INFO) - run() + main() diff --git a/sdks/python/apache_beam/examples/wordcount.py b/sdks/python/apache_beam/examples/wordcount.py index aa0780280370..b59baa61a469 100644 --- a/sdks/python/apache_beam/examples/wordcount.py +++ b/sdks/python/apache_beam/examples/wordcount.py @@ -19,14 +19,10 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import logging import re -from past.builtins import unicode - import apache_beam as beam from apache_beam.io import ReadFromText from apache_beam.io import WriteToText @@ -78,8 +74,7 @@ def run(argv=None, save_main_session=True): counts = ( lines - | 'Split' >> - (beam.ParDo(WordExtractingDoFn()).with_output_types(unicode)) + | 'Split' >> (beam.ParDo(WordExtractingDoFn()).with_output_types(str)) | 'PairWIthOne' >> beam.Map(lambda x: (x, 1)) | 'GroupAndSum' >> beam.CombinePerKey(sum)) diff --git a/sdks/python/apache_beam/examples/wordcount_dataframe.py b/sdks/python/apache_beam/examples/wordcount_dataframe.py index 384f6f5b1430..b6517fce76bd 100644 --- a/sdks/python/apache_beam/examples/wordcount_dataframe.py +++ b/sdks/python/apache_beam/examples/wordcount_dataframe.py @@ -15,69 +15,14 @@ # limitations under the License. 
# -"""A word-counting workflow using dataframes.""" +"""Alias for apache_beam.examples.dataframe.wordcount, a word-counting workflow +using the DataFrame API.""" # pytype: skip-file -from __future__ import absolute_import - -import argparse import logging -import apache_beam as beam -from apache_beam.dataframe.convert import to_dataframe -from apache_beam.dataframe.convert import to_pcollection -from apache_beam.io import ReadFromText -from apache_beam.options.pipeline_options import PipelineOptions - - -def run(argv=None): - """Main entry point; defines and runs the wordcount pipeline.""" - parser = argparse.ArgumentParser() - parser.add_argument( - '--input', - dest='input', - default='gs://dataflow-samples/shakespeare/kinglear.txt', - help='Input file to process.') - parser.add_argument( - '--output', - dest='output', - required=True, - help='Output file to write results to.') - known_args, pipeline_args = parser.parse_known_args(argv) - - # Import this here to avoid pickling the main session. - import re - - # The pipeline will be run on exiting the with block. - with beam.Pipeline(options=PipelineOptions(pipeline_args)) as p: - - # Read the text file[pattern] into a PCollection. - lines = p | 'Read' >> ReadFromText(known_args.input) - - words = ( - lines - | 'Split' >> beam.FlatMap( - lambda line: re.findall(r'[\w]+', line)).with_output_types(str) - # Map to Row objects to generate a schema suitable for conversion - # to a dataframe. - | 'ToRows' >> beam.Map(lambda word: beam.Row(word=word))) - - df = to_dataframe(words) - df['count'] = 1 - counted = df.groupby('word').sum() - counted.to_csv(known_args.output) - - # Deferred DataFrames can also be converted back to schema'd PCollections - counted_pc = to_pcollection(counted, include_indexes=True) - - # Print out every word that occurred >50 times - _ = ( - counted_pc - | beam.Filter(lambda row: row.count > 50) - | beam.Map(lambda row: f'{row.word}: {row.count}') - | beam.Map(print)) - +from apache_beam.examples.dataframe.wordcount import run if __name__ == '__main__': logging.getLogger().setLevel(logging.INFO) diff --git a/sdks/python/apache_beam/examples/wordcount_debugging.py b/sdks/python/apache_beam/examples/wordcount_debugging.py index 19f511e73e88..d4195df43a8d 100644 --- a/sdks/python/apache_beam/examples/wordcount_debugging.py +++ b/sdks/python/apache_beam/examples/wordcount_debugging.py @@ -42,14 +42,10 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import logging import re -from past.builtins import unicode - import apache_beam as beam from apache_beam.io import ReadFromText from apache_beam.io import WriteToText @@ -107,8 +103,8 @@ def count_ones(word_ones): return ( pcoll | 'split' >> ( - beam.FlatMap(lambda x: re.findall(r'[A-Za-z\']+', x)). 
- with_output_types(unicode)) + beam.FlatMap( + lambda x: re.findall(r'[A-Za-z\']+', x)).with_output_types(str)) | 'pair_with_one' >> beam.Map(lambda x: (x, 1)) | 'group' >> beam.GroupByKey() | 'count' >> beam.Map(count_ones)) diff --git a/sdks/python/apache_beam/examples/wordcount_debugging_test.py b/sdks/python/apache_beam/examples/wordcount_debugging_test.py index c92df6df772d..4c8a62a0c1b8 100644 --- a/sdks/python/apache_beam/examples/wordcount_debugging_test.py +++ b/sdks/python/apache_beam/examples/wordcount_debugging_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import re import tempfile diff --git a/sdks/python/apache_beam/examples/wordcount_it_test.py b/sdks/python/apache_beam/examples/wordcount_it_test.py index 5511ca6e2337..8ee49c706555 100644 --- a/sdks/python/apache_beam/examples/wordcount_it_test.py +++ b/sdks/python/apache_beam/examples/wordcount_it_test.py @@ -19,15 +19,13 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import os import time import unittest +import pytest from hamcrest.core.core.allof import all_of -from nose.plugins.attrib import attr from apache_beam.examples import wordcount from apache_beam.testing.load_tests.load_test_metrics_utils import InfluxDBMetricsPublisherOptions @@ -40,29 +38,34 @@ class WordCountIT(unittest.TestCase): - # Enable nose tests running in parallel - _multiprocess_can_split_ = True - # The default checksum is a SHA-1 hash generated from a sorted list of # lines read from expected output. This value corresponds to the default # input of WordCount example. DEFAULT_CHECKSUM = '33535a832b7db6d78389759577d4ff495980b9c0' - @attr('IT') + @pytest.mark.it_postcommit def test_wordcount_it(self): self._run_wordcount_it(wordcount.run) - @attr('IT', 'ValidatesContainer') + @pytest.mark.it_postcommit + @pytest.mark.it_validatescontainer def test_wordcount_fnapi_it(self): self._run_wordcount_it(wordcount.run, experiment='beam_fn_api') - @attr('ValidatesContainer') - def test_wordcount_it_with_prebuilt_sdk_container(self): + @pytest.mark.it_validatescontainer + def test_wordcount_it_with_prebuilt_sdk_container_local_docker(self): self._run_wordcount_it( wordcount.run, experiment='beam_fn_api', prebuild_sdk_container_engine='local_docker') + @pytest.mark.it_validatescontainer + def test_wordcount_it_with_prebuilt_sdk_container_cloud_build(self): + self._run_wordcount_it( + wordcount.run, + experiment='beam_fn_api', + prebuild_sdk_container_engine='cloud_build') + def _run_wordcount_it(self, run_wordcount, **opts): test_pipeline = TestPipeline(is_integration_test=True) extra_opts = {} diff --git a/sdks/python/apache_beam/examples/wordcount_minimal.py b/sdks/python/apache_beam/examples/wordcount_minimal.py index a4f7057d7ba4..f259bb0236bc 100644 --- a/sdks/python/apache_beam/examples/wordcount_minimal.py +++ b/sdks/python/apache_beam/examples/wordcount_minimal.py @@ -46,14 +46,10 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import logging import re -from past.builtins import unicode - import apache_beam as beam from apache_beam.io import ReadFromText from apache_beam.io import WriteToText @@ -61,7 +57,7 @@ from apache_beam.options.pipeline_options import SetupOptions -def run(argv=None, save_main_session=True): +def main(argv=None, save_main_session=True): """Main entry point; defines and runs the wordcount pipeline.""" parser = argparse.ArgumentParser() @@ -111,8 +107,8 @@ def run(argv=None, save_main_session=True): counts = ( 
lines | 'Split' >> ( - beam.FlatMap(lambda x: re.findall(r'[A-Za-z\']+', x)). - with_output_types(unicode)) + beam.FlatMap( + lambda x: re.findall(r'[A-Za-z\']+', x)).with_output_types(str)) | 'PairWithOne' >> beam.Map(lambda x: (x, 1)) | 'GroupAndSum' >> beam.CombinePerKey(sum)) @@ -130,4 +126,4 @@ def format_result(word_count): if __name__ == '__main__': logging.getLogger().setLevel(logging.INFO) - run() + main() diff --git a/sdks/python/apache_beam/examples/wordcount_minimal_test.py b/sdks/python/apache_beam/examples/wordcount_minimal_test.py index a966401204f0..90ee84220783 100644 --- a/sdks/python/apache_beam/examples/wordcount_minimal_test.py +++ b/sdks/python/apache_beam/examples/wordcount_minimal_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import collections import logging import re @@ -46,7 +44,7 @@ def test_basics(self): expected_words = collections.defaultdict(int) for word in re.findall(r'\w+', self.SAMPLE_TEXT): expected_words[word] += 1 - wordcount_minimal.run( + wordcount_minimal.main( ['--input=%s*' % temp_path, '--output=%s.result' % temp_path], save_main_session=False) # Parse result file and compare. diff --git a/sdks/python/apache_beam/examples/wordcount_test.py b/sdks/python/apache_beam/examples/wordcount_test.py index 51c0bc4efdb5..e3d476d5a786 100644 --- a/sdks/python/apache_beam/examples/wordcount_test.py +++ b/sdks/python/apache_beam/examples/wordcount_test.py @@ -20,8 +20,6 @@ # pytype: skip-file -from __future__ import absolute_import - import collections import logging import re diff --git a/sdks/python/apache_beam/examples/wordcount_with_metrics.py b/sdks/python/apache_beam/examples/wordcount_with_metrics.py index 6732568d0543..bf61476ca3d2 100644 --- a/sdks/python/apache_beam/examples/wordcount_with_metrics.py +++ b/sdks/python/apache_beam/examples/wordcount_with_metrics.py @@ -19,14 +19,10 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import logging import re -from past.builtins import unicode - import apache_beam as beam from apache_beam.io import ReadFromText from apache_beam.io import WriteToText @@ -70,7 +66,7 @@ def process(self, element): return words -def run(argv=None, save_main_session=True): +def main(argv=None, save_main_session=True): """Main entry point; defines and runs the wordcount pipeline.""" parser = argparse.ArgumentParser() parser.add_argument( @@ -101,8 +97,7 @@ def count_ones(word_ones): counts = ( lines - | 'split' >> - (beam.ParDo(WordExtractingDoFn()).with_output_types(unicode)) + | 'split' >> (beam.ParDo(WordExtractingDoFn()).with_output_types(str)) | 'pair_with_one' >> beam.Map(lambda x: (x, 1)) | 'group' >> beam.GroupByKey() | 'count' >> beam.Map(count_ones)) @@ -139,4 +134,4 @@ def format_result(word_count): if __name__ == '__main__': logging.getLogger().setLevel(logging.INFO) - run() + main() diff --git a/sdks/python/apache_beam/examples/wordcount_xlang.py b/sdks/python/apache_beam/examples/wordcount_xlang.py index 88690e7b5074..80a0c8fe7a4f 100644 --- a/sdks/python/apache_beam/examples/wordcount_xlang.py +++ b/sdks/python/apache_beam/examples/wordcount_xlang.py @@ -19,15 +19,12 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import logging import re import subprocess import grpc -from past.builtins import unicode import apache_beam as beam from apache_beam.io import ReadFromText @@ -63,8 +60,7 @@ def build_pipeline(p, input_file, output_file): counts = ( lines - | 'split' >> - 
(beam.ParDo(WordExtractingDoFn()).with_output_types(unicode)) + | 'split' >> (beam.ParDo(WordExtractingDoFn()).with_output_types(str)) | 'count' >> beam.ExternalTransform( 'beam:transforms:xlang:count', None, EXPANSION_SERVICE_ADDR)) diff --git a/sdks/python/apache_beam/examples/wordcount_xlang_sql.py b/sdks/python/apache_beam/examples/wordcount_xlang_sql.py index 28f58a372852..97a43d386c75 100644 --- a/sdks/python/apache_beam/examples/wordcount_xlang_sql.py +++ b/sdks/python/apache_beam/examples/wordcount_xlang_sql.py @@ -22,15 +22,11 @@ Docker must also be available to run this pipeline locally. """ -from __future__ import absolute_import - import argparse import logging import re import typing -from past.builtins import unicode - import apache_beam as beam from apache_beam import coders from apache_beam.io import ReadFromText @@ -44,9 +40,9 @@ # One way to create such a PCollection is to produce a PCollection of # NamedTuple registered with the RowCoder. # -# Here we create and register a simple NamedTuple with a single unicode typed +# Here we create and register a simple NamedTuple with a single str typed # field named 'word' which we will use below. -MyRow = typing.NamedTuple('MyRow', [('word', unicode)]) +MyRow = typing.NamedTuple('MyRow', [('word', str)]) coders.registry.register_coder(MyRow, coders.RowCoder) diff --git a/sdks/python/apache_beam/internal/__init__.py b/sdks/python/apache_beam/internal/__init__.py index 84381ed71d4f..0bce5d68f724 100644 --- a/sdks/python/apache_beam/internal/__init__.py +++ b/sdks/python/apache_beam/internal/__init__.py @@ -16,5 +16,3 @@ # """For internal use only; no backwards-compatibility guarantees.""" - -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/internal/gcp/__init__.py b/sdks/python/apache_beam/internal/gcp/__init__.py index 84381ed71d4f..0bce5d68f724 100644 --- a/sdks/python/apache_beam/internal/gcp/__init__.py +++ b/sdks/python/apache_beam/internal/gcp/__init__.py @@ -16,5 +16,3 @@ # """For internal use only; no backwards-compatibility guarantees.""" - -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/internal/gcp/auth.py b/sdks/python/apache_beam/internal/gcp/auth.py index 636cab658860..b3dbe320ad69 100644 --- a/sdks/python/apache_beam/internal/gcp/auth.py +++ b/sdks/python/apache_beam/internal/gcp/auth.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import socket import threading diff --git a/sdks/python/apache_beam/internal/gcp/json_value.py b/sdks/python/apache_beam/internal/gcp/json_value.py index 60009ac7515e..c7d561efedca 100644 --- a/sdks/python/apache_beam/internal/gcp/json_value.py +++ b/sdks/python/apache_beam/internal/gcp/json_value.py @@ -19,11 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - -from past.builtins import long -from past.builtins import unicode - from apache_beam.options.value_provider import ValueProvider # Protect against environments where apitools library is not available. @@ -54,7 +49,7 @@ def get_typed_value_descriptor(obj): TypeError: if the Python object has a type that is not supported. 
""" - if isinstance(obj, (bytes, unicode)): + if isinstance(obj, (bytes, str)): type_name = 'Text' elif isinstance(obj, bool): type_name = 'Boolean' @@ -106,13 +101,13 @@ def to_json_value(obj, with_type=False): return extra_types.JsonValue(object_value=json_object) elif with_type: return to_json_value(get_typed_value_descriptor(obj), with_type=False) - elif isinstance(obj, (str, unicode)): + elif isinstance(obj, str): return extra_types.JsonValue(string_value=obj) elif isinstance(obj, bytes): return extra_types.JsonValue(string_value=obj.decode('utf8')) elif isinstance(obj, bool): return extra_types.JsonValue(boolean_value=obj) - elif isinstance(obj, (int, long)): + elif isinstance(obj, int): if _MININT64 <= obj <= _MAXINT64: return extra_types.JsonValue(integer_value=obj) else: diff --git a/sdks/python/apache_beam/internal/gcp/json_value_test.py b/sdks/python/apache_beam/internal/gcp/json_value_test.py index 4b21b4f61546..21337de1a103 100644 --- a/sdks/python/apache_beam/internal/gcp/json_value_test.py +++ b/sdks/python/apache_beam/internal/gcp/json_value_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest from apache_beam.internal.gcp.json_value import from_json_value diff --git a/sdks/python/apache_beam/internal/http_client.py b/sdks/python/apache_beam/internal/http_client.py index 57d2feb56128..50c2a5aba903 100644 --- a/sdks/python/apache_beam/internal/http_client.py +++ b/sdks/python/apache_beam/internal/http_client.py @@ -21,8 +21,6 @@ """ # pytype: skip-file -from __future__ import absolute_import - import logging import os import re diff --git a/sdks/python/apache_beam/internal/http_client_test.py b/sdks/python/apache_beam/internal/http_client_test.py index 6a403a9039ed..c8790782a8c1 100644 --- a/sdks/python/apache_beam/internal/http_client_test.py +++ b/sdks/python/apache_beam/internal/http_client_test.py @@ -18,8 +18,6 @@ """Unit tests for the http_client module.""" # pytype: skip-file -from __future__ import absolute_import - import os import unittest diff --git a/sdks/python/apache_beam/internal/metrics/__init__.py b/sdks/python/apache_beam/internal/metrics/__init__.py index 84381ed71d4f..0bce5d68f724 100644 --- a/sdks/python/apache_beam/internal/metrics/__init__.py +++ b/sdks/python/apache_beam/internal/metrics/__init__.py @@ -16,5 +16,3 @@ # """For internal use only; no backwards-compatibility guarantees.""" - -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/internal/metrics/cells.py b/sdks/python/apache_beam/internal/metrics/cells.py index 44ee3953b468..3fcaecf8c677 100644 --- a/sdks/python/apache_beam/internal/metrics/cells.py +++ b/sdks/python/apache_beam/internal/metrics/cells.py @@ -25,10 +25,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - -from builtins import object from typing import TYPE_CHECKING from typing import Optional @@ -109,10 +105,6 @@ def __eq__(self, other): def __hash__(self): return hash(self.data) - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - def __repr__(self): return ''.format( self.data.histogram.get_percentile_info()) @@ -147,10 +139,6 @@ def __eq__(self, other): def __hash__(self): return hash(self.histogram) - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. 
- return not self == other - def __repr__(self): return 'HistogramData({})'.format(self.histogram.get_percentile_info()) diff --git a/sdks/python/apache_beam/internal/metrics/cells_test.py b/sdks/python/apache_beam/internal/metrics/cells_test.py index 89fe23d2ecd8..066dec4a2635 100644 --- a/sdks/python/apache_beam/internal/metrics/cells_test.py +++ b/sdks/python/apache_beam/internal/metrics/cells_test.py @@ -17,11 +17,8 @@ # pytype: skip-file -from __future__ import absolute_import - import threading import unittest -from builtins import range from apache_beam.internal.metrics.cells import HistogramCell from apache_beam.internal.metrics.cells import HistogramCellFactory diff --git a/sdks/python/apache_beam/internal/metrics/metric.py b/sdks/python/apache_beam/internal/metrics/metric.py index 069919eb5e2b..0510674d5431 100644 --- a/sdks/python/apache_beam/internal/metrics/metric.py +++ b/sdks/python/apache_beam/internal/metrics/metric.py @@ -25,13 +25,10 @@ # pytype: skip-file # mypy: disallow-untyped-defs -from __future__ import absolute_import - import datetime import logging import threading import time -from builtins import object from typing import TYPE_CHECKING from typing import Dict from typing import Optional diff --git a/sdks/python/apache_beam/internal/metrics/metric_test.py b/sdks/python/apache_beam/internal/metrics/metric_test.py index bef7713a82bd..22b64ee73aee 100644 --- a/sdks/python/apache_beam/internal/metrics/metric_test.py +++ b/sdks/python/apache_beam/internal/metrics/metric_test.py @@ -17,8 +17,6 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest from mock import patch diff --git a/sdks/python/apache_beam/internal/module_test.py b/sdks/python/apache_beam/internal/module_test.py index c30f24966b7e..eaa1629be8e5 100644 --- a/sdks/python/apache_beam/internal/module_test.py +++ b/sdks/python/apache_beam/internal/module_test.py @@ -19,11 +19,8 @@ # pytype: skip-file -from __future__ import absolute_import - import re import sys -from builtins import object from typing import Type diff --git a/sdks/python/apache_beam/internal/pickler.py b/sdks/python/apache_beam/internal/pickler.py index 395d511dd8e6..9cecedd1b5c1 100644 --- a/sdks/python/apache_beam/internal/pickler.py +++ b/sdks/python/apache_beam/internal/pickler.py @@ -30,8 +30,6 @@ # pytype: skip-file -from __future__ import absolute_import - import base64 import bz2 import logging @@ -46,6 +44,8 @@ import dill +settings = {'dill_byref': None} + class _NoOpContextManager(object): def __enter__(self): @@ -55,13 +55,9 @@ def __exit__(self, *unused_exc_info): pass -if sys.version_info[0] > 2: - # Pickling, especially unpickling, causes broken module imports on Python 3 - # if executed concurrently, see: BEAM-8651, http://bugs.python.org/issue38884. - _pickle_lock_unless_py2 = threading.RLock() -else: - # Avoid slow reentrant locks on Py2. See: https://bugs.python.org/issue3001. - _pickle_lock_unless_py2 = _NoOpContextManager() +# Pickling, especially unpickling, causes broken module imports on Python 3 +# if executed concurrently, see: BEAM-8651, http://bugs.python.org/issue38884. 
+_pickle_lock = threading.RLock() # Dill 0.28.0 renamed dill.dill to dill._dill: # https://github.com/uqfoundation/dill/commit/f0972ecc7a41d0b8acada6042d557068cac69baa # TODO: Remove this once Beam depends on dill >= 0.2.8 @@ -79,9 +75,8 @@ def _is_nested_class(cls): """Returns true if argument is a class object that appears to be nested.""" return ( isinstance(cls, type) and cls.__module__ is not None and - cls.__module__ != 'builtins' # Python 3 - and cls.__module__ != '__builtin__' # Python 2 - and cls.__name__ not in sys.modules[cls.__module__].__dict__) + cls.__module__ != 'builtins' and + cls.__name__ not in sys.modules[cls.__module__].__dict__) def _find_containing_class(nested_class): @@ -246,13 +241,13 @@ def dumps(o, enable_trace=True, use_zlib=False): # type: (...) -> bytes """For internal use only; no backwards-compatibility guarantees.""" - with _pickle_lock_unless_py2: + with _pickle_lock: try: - s = dill.dumps(o) + s = dill.dumps(o, byref=settings['dill_byref']) except Exception: # pylint: disable=broad-except if enable_trace: dill.dill._trace(True) # pylint: disable=protected-access - s = dill.dumps(o) + s = dill.dumps(o, byref=settings['dill_byref']) else: raise finally: @@ -285,7 +280,7 @@ def loads(encoded, enable_trace=True, use_zlib=False): del c # Free up some possibly large and no-longer-needed memory. - with _pickle_lock_unless_py2: + with _pickle_lock: try: return dill.loads(s) except Exception: # pylint: disable=broad-except @@ -307,12 +302,12 @@ def dump_session(file_path): create and load the dump twice to have consistent results in the worker and the running session. Check: https://github.com/uqfoundation/dill/issues/195 """ - with _pickle_lock_unless_py2: + with _pickle_lock: dill.dump_session(file_path) dill.load_session(file_path) return dill.dump_session(file_path) def load_session(file_path): - with _pickle_lock_unless_py2: + with _pickle_lock: return dill.load_session(file_path) diff --git a/sdks/python/apache_beam/internal/pickler_test.py b/sdks/python/apache_beam/internal/pickler_test.py index 929e7015007c..15e62a714c10 100644 --- a/sdks/python/apache_beam/internal/pickler_test.py +++ b/sdks/python/apache_beam/internal/pickler_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import sys import types import unittest diff --git a/sdks/python/apache_beam/internal/util.py b/sdks/python/apache_beam/internal/util.py index da90d8120e95..f0a3ad8288b5 100644 --- a/sdks/python/apache_beam/internal/util.py +++ b/sdks/python/apache_beam/internal/util.py @@ -22,12 +22,9 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import threading import weakref -from builtins import object from multiprocessing.pool import ThreadPool from typing import Any from typing import Dict @@ -65,10 +62,6 @@ def __eq__(self, other): """ return isinstance(other, ArgumentPlaceholder) - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. 
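For context on the pickler changes above, a minimal sketch of the round trip and of the new module-level `settings['dill_byref']` knob, which is forwarded to `dill.dumps(..., byref=...)` (any value other than `None` below is an assumption about how a caller might use it):

from apache_beam.internal import pickler

# Serialize and restore an object with Beam's dill-based pickler; both calls
# now take the single reentrant _pickle_lock internally.
payload = pickler.dumps({'counts': [1, 2, 3]})
assert pickler.loads(payload) == {'counts': [1, 2, 3]}

# Leaving the knob at None keeps dill's default behaviour; setting it to True
# would make dill pickle importable module-level objects by reference.
pickler.settings['dill_byref'] = None
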
- return not self == other - def __hash__(self): return hash(type(self)) diff --git a/sdks/python/apache_beam/internal/util_test.py b/sdks/python/apache_beam/internal/util_test.py index bb0ae3779264..ded1190f8405 100644 --- a/sdks/python/apache_beam/internal/util_test.py +++ b/sdks/python/apache_beam/internal/util_test.py @@ -18,8 +18,6 @@ """Unit tests for the util module.""" # pytype: skip-file -from __future__ import absolute_import - import unittest from apache_beam.internal.util import ArgumentPlaceholder diff --git a/sdks/python/apache_beam/io/__init__.py b/sdks/python/apache_beam/io/__init__.py index fcc21c11a7dd..4945da97d90f 100644 --- a/sdks/python/apache_beam/io/__init__.py +++ b/sdks/python/apache_beam/io/__init__.py @@ -18,8 +18,6 @@ """A package defining several input sources and output sinks.""" # pylint: disable=wildcard-import -from __future__ import absolute_import - from apache_beam.io.avroio import * from apache_beam.io.filebasedsink import * from apache_beam.io.iobase import Read diff --git a/sdks/python/apache_beam/io/avroio.py b/sdks/python/apache_beam/io/avroio.py index e861c434e0a6..45a619c19bbb 100644 --- a/sdks/python/apache_beam/io/avroio.py +++ b/sdks/python/apache_beam/io/avroio.py @@ -44,12 +44,9 @@ """ # pytype: skip-file -from __future__ import absolute_import - import io import os import zlib -from builtins import object from functools import partial import avro @@ -174,6 +171,7 @@ def __init__( min_bundle_size=0, desired_bundle_size=DEFAULT_DESIRED_BUNDLE_SIZE, use_fastavro=True, + with_filename=False, label='ReadAllFiles'): """Initializes ``ReadAllFromAvro``. @@ -184,6 +182,9 @@ def __init__( splitting the input into bundles. use_fastavro (bool); when set, use the `fastavro` library for IO, which is significantly faster, and is now the default. + with_filename: If True, returns a Key Value with the key being the file + name and the value being the actual data. If False, it only returns + the data. 
""" source_from_file = partial( _create_avro_source, @@ -194,7 +195,8 @@ def __init__( CompressionTypes.AUTO, desired_bundle_size, min_bundle_size, - source_from_file) + source_from_file, + with_filename) self.label = label diff --git a/sdks/python/apache_beam/io/avroio_test.py b/sdks/python/apache_beam/io/avroio_test.py index b121b8021169..2cb5c5cfae8c 100644 --- a/sdks/python/apache_beam/io/avroio_test.py +++ b/sdks/python/apache_beam/io/avroio_test.py @@ -16,37 +16,24 @@ # # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import json import logging import math import os import tempfile import unittest -from builtins import range from typing import List -import sys -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import import hamcrest as hc import avro import avro.datafile from avro.datafile import DataFileWriter from avro.io import DatumWriter +from avro.schema import Parse from fastavro.schema import parse_schema from fastavro import writer -# pylint: disable=wrong-import-order, wrong-import-position, ungrouped-imports -try: - from avro.schema import Parse # avro-python3 library for python3 -except ImportError: - from avro.schema import parse as Parse # avro library for python2 -# pylint: enable=wrong-import-order, wrong-import-position, ungrouped-imports - import apache_beam as beam from apache_beam import Create from apache_beam.io import avroio @@ -102,12 +89,6 @@ def __init__(self, methodName='runTest'): } ''' - @classmethod - def setUpClass(cls): - # Method has been renamed in Python 3 - if sys.version_info[0] < 3: - cls.assertCountEqual = cls.assertItemsEqual - def setUp(self): # Reducing the size of thread pools. Without this test execution may fail in # environments with limited amount of resources. @@ -122,16 +103,20 @@ def tearDown(self): def _write_data(self, directory, prefix, codec, count, sync_interval): raise NotImplementedError - def _write_pattern(self, num_files): + def _write_pattern(self, num_files, return_filenames=False): assert num_files > 0 temp_dir = tempfile.mkdtemp() file_name = None + file_list = [] for _ in range(num_files): file_name = self._write_data(directory=temp_dir, prefix='mytemp') + file_list.append(file_name) assert file_name file_name_prefix = file_name[:file_name.rfind(os.path.sep)] + if return_filenames: + return (file_name_prefix + os.path.sep + 'mytemp*', file_list) return file_name_prefix + os.path.sep + 'mytemp*' def _run_avro_test( @@ -407,6 +392,17 @@ def test_read_all_from_avro_many_file_patterns(self): | avroio.ReadAllFromAvro(use_fastavro=self.use_fastavro), equal_to(self.RECORDS * 10)) + def test_read_all_from_avro_with_filename(self): + file_pattern, file_paths = self._write_pattern(3, return_filenames=True) + result = [(path, record) for path in file_paths for record in self.RECORDS] + with TestPipeline() as p: + assert_that( + p \ + | Create([file_pattern]) \ + | avroio.ReadAllFromAvro(use_fastavro=self.use_fastavro, + with_filename=True), + equal_to(result)) + def test_sink_transform(self): with tempfile.NamedTemporaryFile() as dst: path = dst.name @@ -446,7 +442,7 @@ def test_sink_transform_snappy(self): @unittest.skipIf( - sys.version_info[0] == 3 and os.environ.get('RUN_SKIPPED_PY3_TESTS') != '1', + os.environ.get('RUN_SKIPPED_PY3_TESTS') != '1', 'This test requires that Beam depends on avro-python3>=1.9 or newer. 
' 'See: BEAM-6522.') class TestAvro(AvroBase, unittest.TestCase): diff --git a/sdks/python/apache_beam/io/aws/__init__.py b/sdks/python/apache_beam/io/aws/__init__.py index 6569e3fe5de4..cce3acad34a4 100644 --- a/sdks/python/apache_beam/io/aws/__init__.py +++ b/sdks/python/apache_beam/io/aws/__init__.py @@ -14,5 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. # - -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/io/aws/clients/__init__.py b/sdks/python/apache_beam/io/aws/clients/__init__.py index f4f43cbb1236..cce3acad34a4 100644 --- a/sdks/python/apache_beam/io/aws/clients/__init__.py +++ b/sdks/python/apache_beam/io/aws/clients/__init__.py @@ -14,4 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. # -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/io/aws/clients/s3/__init__.py b/sdks/python/apache_beam/io/aws/clients/s3/__init__.py index f4f43cbb1236..cce3acad34a4 100644 --- a/sdks/python/apache_beam/io/aws/clients/s3/__init__.py +++ b/sdks/python/apache_beam/io/aws/clients/s3/__init__.py @@ -14,4 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. # -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/io/aws/clients/s3/boto3_client.py b/sdks/python/apache_beam/io/aws/clients/s3/boto3_client.py index e441d6f50c1c..e2f968ddba71 100644 --- a/sdks/python/apache_beam/io/aws/clients/s3/boto3_client.py +++ b/sdks/python/apache_beam/io/aws/clients/s3/boto3_client.py @@ -17,8 +17,6 @@ # pytype: skip-file -from __future__ import absolute_import - from apache_beam.io.aws.clients.s3 import messages from apache_beam.options import pipeline_options @@ -57,7 +55,8 @@ def __init__(self, options): api_version = options.get('s3_api_version') verify = options.get('s3_verify') - self.client = boto3.client( + session = boto3.session.Session() + self.client = session.client( service_name='s3', region_name=region_name, api_version=api_version, diff --git a/sdks/python/apache_beam/io/aws/clients/s3/fake_client.py b/sdks/python/apache_beam/io/aws/clients/s3/fake_client.py index b04c642b48e4..b53ff32825a9 100644 --- a/sdks/python/apache_beam/io/aws/clients/s3/fake_client.py +++ b/sdks/python/apache_beam/io/aws/clients/s3/fake_client.py @@ -17,8 +17,6 @@ # pytype: skip-file -from __future__ import absolute_import - import datetime import time diff --git a/sdks/python/apache_beam/io/aws/clients/s3/messages.py b/sdks/python/apache_beam/io/aws/clients/s3/messages.py index 5363c856714c..8555364b727c 100644 --- a/sdks/python/apache_beam/io/aws/clients/s3/messages.py +++ b/sdks/python/apache_beam/io/aws/clients/s3/messages.py @@ -17,8 +17,6 @@ # pytype: skip-file -from __future__ import absolute_import - class GetRequest(): """ diff --git a/sdks/python/apache_beam/io/aws/s3filesystem.py b/sdks/python/apache_beam/io/aws/s3filesystem.py index dce8cde05892..fa26badd60cb 100644 --- a/sdks/python/apache_beam/io/aws/s3filesystem.py +++ b/sdks/python/apache_beam/io/aws/s3filesystem.py @@ -19,10 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - -from future.utils import iteritems - from apache_beam.io.aws import s3io from apache_beam.io.filesystem import BeamIOError from apache_beam.io.filesystem import CompressedFile @@ -132,8 +128,8 @@ def _list(self, dir_or_prefix): ``BeamIOError``: if listing fails, but not if no files were found. 
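The `boto3_client.py` hunk above switches from the module-level `boto3.client(...)` helper to a client built from a dedicated `boto3.session.Session()`. A minimal sketch of the difference (the region is illustrative; the motivation is presumably to stop sharing boto3's implicit default session between S3 client instances, for example when they are created from multiple threads):

import boto3

# Before: every such call goes through boto3's shared default session.
shared = boto3.client('s3', region_name='us-east-1')

# After: configuration and credentials are resolved on this session only.
session = boto3.session.Session()
scoped = session.client(service_name='s3', region_name='us-east-1')
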
""" try: - for path, size in iteritems( - s3io.S3IO(options=self._options).list_prefix(dir_or_prefix)): + for path, size in \ + s3io.S3IO(options=self._options).list_prefix(dir_or_prefix).items(): yield FileMetadata(path, size) except Exception as e: # pylint: disable=broad-except raise BeamIOError("List operation failed", {dir_or_prefix: e}) diff --git a/sdks/python/apache_beam/io/aws/s3filesystem_test.py b/sdks/python/apache_beam/io/aws/s3filesystem_test.py index 92d2a52b9ccf..eba67476f13e 100644 --- a/sdks/python/apache_beam/io/aws/s3filesystem_test.py +++ b/sdks/python/apache_beam/io/aws/s3filesystem_test.py @@ -20,8 +20,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest diff --git a/sdks/python/apache_beam/io/aws/s3io.py b/sdks/python/apache_beam/io/aws/s3io.py index 21b113026989..172a81d2171e 100644 --- a/sdks/python/apache_beam/io/aws/s3io.py +++ b/sdks/python/apache_beam/io/aws/s3io.py @@ -20,15 +20,12 @@ # pytype: skip-file -from __future__ import absolute_import - import errno import io import logging import re import time import traceback -from builtins import object from apache_beam.io.aws.clients.s3 import messages from apache_beam.io.filesystemio import Downloader @@ -493,7 +490,7 @@ def rename_files(self, src_dest_pairs): for src, dest, err in copy_results: if err is not None: rename_results.append((src, dest, err)) elif delete_results_dict[src] is not None: - rename_results.append(src, dest, delete_results_dict[src]) + rename_results.append((src, dest, delete_results_dict[src])) else: rename_results.append((src, dest, None)) diff --git a/sdks/python/apache_beam/io/aws/s3io_test.py b/sdks/python/apache_beam/io/aws/s3io_test.py index ad308f3e0eab..20db01614c3a 100644 --- a/sdks/python/apache_beam/io/aws/s3io_test.py +++ b/sdks/python/apache_beam/io/aws/s3io_test.py @@ -18,8 +18,6 @@ """Tests for S3 client.""" # pytype: skip-file -from __future__ import absolute_import - import logging import os import random diff --git a/sdks/python/apache_beam/io/azure/__init__.py b/sdks/python/apache_beam/io/azure/__init__.py index 6569e3fe5de4..cce3acad34a4 100644 --- a/sdks/python/apache_beam/io/azure/__init__.py +++ b/sdks/python/apache_beam/io/azure/__init__.py @@ -14,5 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. # - -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/io/azure/blobstoragefilesystem.py b/sdks/python/apache_beam/io/azure/blobstoragefilesystem.py index 707843605444..28a5a5841b3a 100644 --- a/sdks/python/apache_beam/io/azure/blobstoragefilesystem.py +++ b/sdks/python/apache_beam/io/azure/blobstoragefilesystem.py @@ -19,10 +19,6 @@ Azure Blob Storage. 
""" -from __future__ import absolute_import - -from future.utils import iteritems - from apache_beam.io.azure import blobstorageio from apache_beam.io.filesystem import BeamIOError from apache_beam.io.filesystem import CompressedFile @@ -123,7 +119,7 @@ def _list(self, dir_or_prefix): """ try: for path, size in \ - iteritems(blobstorageio.BlobStorageIO().list_prefix(dir_or_prefix)): + blobstorageio.BlobStorageIO().list_prefix(dir_or_prefix).items(): yield FileMetadata(path, size) except Exception as e: # pylint: disable=broad-except raise BeamIOError("List operation failed", {dir_or_prefix: e}) diff --git a/sdks/python/apache_beam/io/azure/blobstoragefilesystem_test.py b/sdks/python/apache_beam/io/azure/blobstoragefilesystem_test.py index ac429bc1baba..c67dbe9c5c6e 100644 --- a/sdks/python/apache_beam/io/azure/blobstoragefilesystem_test.py +++ b/sdks/python/apache_beam/io/azure/blobstoragefilesystem_test.py @@ -20,13 +20,9 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest -# patches unittest.TestCase to be python3 compatible. -import future.tests.base # pylint: disable=unused-import import mock from apache_beam.io.filesystem import BeamIOError diff --git a/sdks/python/apache_beam/io/azure/blobstorageio.py b/sdks/python/apache_beam/io/azure/blobstorageio.py index 2515993177ad..d27b81ef8522 100644 --- a/sdks/python/apache_beam/io/azure/blobstorageio.py +++ b/sdks/python/apache_beam/io/azure/blobstorageio.py @@ -20,8 +20,6 @@ # pytype: skip-file -from __future__ import absolute_import - import errno import io import logging @@ -29,7 +27,6 @@ import re import tempfile import time -from builtins import object from apache_beam.io.filesystemio import Downloader from apache_beam.io.filesystemio import DownloaderStream diff --git a/sdks/python/apache_beam/io/azure/blobstorageio_test.py b/sdks/python/apache_beam/io/azure/blobstorageio_test.py index ee5e8e49f260..262a75b756a8 100644 --- a/sdks/python/apache_beam/io/azure/blobstorageio_test.py +++ b/sdks/python/apache_beam/io/azure/blobstorageio_test.py @@ -19,8 +19,6 @@ """ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest diff --git a/sdks/python/apache_beam/io/concat_source.py b/sdks/python/apache_beam/io/concat_source.py index 4f799a9f5fb7..3872ccbe01ae 100644 --- a/sdks/python/apache_beam/io/concat_source.py +++ b/sdks/python/apache_beam/io/concat_source.py @@ -21,12 +21,8 @@ """ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import bisect import threading -from builtins import range from apache_beam.io import iobase @@ -229,9 +225,12 @@ def position_at_fraction(self, fraction): def fraction_consumed(self): with self._lock: - return self.local_to_global( - self._claimed_source_ix, - self.sub_range_tracker(self._claimed_source_ix).fraction_consumed()) + if self._claimed_source_ix == len(self._source_bundles): + return 1.0 + else: + return self.local_to_global( + self._claimed_source_ix, + self.sub_range_tracker(self._claimed_source_ix).fraction_consumed()) def local_to_global(self, source_ix, source_frac): cw = self._cumulative_weights diff --git a/sdks/python/apache_beam/io/concat_source_test.py b/sdks/python/apache_beam/io/concat_source_test.py index b796ff36f0f3..efa24b3975fc 100644 --- a/sdks/python/apache_beam/io/concat_source_test.py +++ b/sdks/python/apache_beam/io/concat_source_test.py @@ -18,12 +18,8 @@ """Unit tests for the sources framework.""" # pytype: skip-file -from __future__ import 
absolute_import -from __future__ import division - import logging import unittest -from builtins import range import apache_beam as beam from apache_beam.io import iobase @@ -87,10 +83,6 @@ def __eq__(self, other): type(self) == type(other) and self._start == other._start and self._end == other._end and self._split_freq == other._split_freq) - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - class ConcatSourceTest(unittest.TestCase): def test_range_source(self): @@ -135,6 +127,14 @@ def test_conact_source(self): self.assertEqual(range_tracker.sub_range_tracker(2).try_claim(10), True) self.assertEqual(range_tracker.sub_range_tracker(2).try_claim(11), False) + def test_fraction_consumed_at_end(self): + source = ConcatSource([ + RangeSource(0, 2), + RangeSource(2, 4), + ]) + range_tracker = source.get_range_tracker((2, None), None) + self.assertEqual(range_tracker.fraction_consumed(), 1.0) + def test_estimate_size(self): source = ConcatSource([ RangeSource(0, 10), diff --git a/sdks/python/apache_beam/io/debezium.py b/sdks/python/apache_beam/io/debezium.py new file mode 100644 index 000000000000..e598b97d4526 --- /dev/null +++ b/sdks/python/apache_beam/io/debezium.py @@ -0,0 +1,176 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +""" Unbounded source transform for + `Debezium `_. + + This transform is currently supported by Beam portable + Flink, Spark, and Dataflow v2 runners. + + **Setup** + + Transform provided in this module is cross-language transform + implemented in the Beam Java SDK. During the pipeline construction, Python SDK + will connect to a Java expansion service to expand this transform. + To facilitate this, a small amount of setup is needed before using this + transform in a Beam Python pipeline. + + There are several ways to setup cross-language Debezium transform. + + * Option 1: use the default expansion service + * Option 2: specify a custom expansion service + + See below for details regarding each of these options. + + *Option 1: Use the default expansion service* + + This is the recommended and easiest setup option for using Python Debezium + transform. This option requires following pre-requisites + before running the Beam pipeline. + + * Install Java runtime in the computer from where the pipeline is constructed + and make sure that 'java' command is available. + + In this option, Python SDK will either download (for released Beam version) or + build (when running from a Beam Git clone) a expansion service jar and use + that to expand transforms. Currently Debezium transform use the + 'beam-sdks-java-io-debezium-expansion-service' jar for this purpose. 
+ + *Option 2: specify a custom expansion service* + + In this option, you startup your own expansion service and provide that as + a parameter when using the transform provided in this module. + + This option requires following pre-requisites before running the Beam + pipeline. + + * Startup your own expansion service. + * Update your pipeline to provide the expansion service address when + initiating Debezium transform provided in this module. + + Flink Users can use the built-in Expansion Service of the Flink Runner's + Job Server. If you start Flink's Job Server, the expansion service will be + started on port 8097. For a different address, please set the + expansion_service parameter. + + **More information** + + For more information regarding cross-language transforms see: + - https://beam.apache.org/roadmap/portability/ + + For more information specific to Flink runner see: + - https://beam.apache.org/documentation/runners/flink/ +""" + +# pytype: skip-file + +import json +from enum import Enum +from typing import List +from typing import NamedTuple +from typing import Optional + +from apache_beam.transforms import DoFn +from apache_beam.transforms import ParDo +from apache_beam.transforms import PTransform +from apache_beam.transforms.external import BeamJarExpansionService +from apache_beam.transforms.external import ExternalTransform +from apache_beam.transforms.external import NamedTupleBasedPayloadBuilder + +__all__ = ['ReadFromDebezium', 'DriverClassName'] + + +def default_io_expansion_service(): + return BeamJarExpansionService( + 'sdks:java:io:debezium:expansion-service:shadowJar') + + +class DriverClassName(Enum): + MYSQL = 'MySQL' + POSTGRESQL = 'PostgreSQL' + ORACLE = 'Oracle' + DB2 = 'Db2' + + +ReadFromDebeziumSchema = NamedTuple( + 'ReadFromDebeziumSchema', + [('connector_class', str), ('username', str), ('password', str), + ('host', str), ('port', str), ('max_number_of_records', Optional[int]), + ('connection_properties', List[str])]) + + +class _JsonStringToDictionaries(DoFn): + """ A DoFn that consumes a JSON string and yields a python dictionary """ + def process(self, json_string): + obj = json.loads(json_string) + yield obj + + +class ReadFromDebezium(PTransform): + """ + An external PTransform which reads from Debezium and returns + a Dictionary for each item in the specified database + connection. + + Experimental; no backwards compatibility guarantees. + """ + URN = 'beam:external:java:debezium:read:v1' + + def __init__( + self, + connector_class, + username, + password, + host, + port, + max_number_of_records=None, + connection_properties=None, + expansion_service=None): + """ + Initializes a read operation from Debezium. + + :param connector_class: name of the jdbc driver class + :param username: database username + :param password: database password + :param host: database host + :param port: database port + :param max_number_of_records: maximum number of records + to be fetched before stop. + :param connection_properties: properties of the debezium + connection passed as string + with format + [propertyName=property;]* + :param expansion_service: The address (host:port) + of the ExpansionService. 
+ """ + self.params = ReadFromDebeziumSchema( + connector_class=connector_class.value, + username=username, + password=password, + host=host, + port=port, + max_number_of_records=max_number_of_records, + connection_properties=connection_properties) + self.expansion_service = expansion_service or default_io_expansion_service() + + def expand(self, pbegin): + return ( + pbegin | ExternalTransform( + self.URN, + NamedTupleBasedPayloadBuilder(self.params), + self.expansion_service, + ) | ParDo(_JsonStringToDictionaries())) diff --git a/sdks/python/apache_beam/io/external/__init__.py b/sdks/python/apache_beam/io/external/__init__.py index 6569e3fe5de4..cce3acad34a4 100644 --- a/sdks/python/apache_beam/io/external/__init__.py +++ b/sdks/python/apache_beam/io/external/__init__.py @@ -14,5 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. # - -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/io/external/gcp/__init__.py b/sdks/python/apache_beam/io/external/gcp/__init__.py index 6569e3fe5de4..cce3acad34a4 100644 --- a/sdks/python/apache_beam/io/external/gcp/__init__.py +++ b/sdks/python/apache_beam/io/external/gcp/__init__.py @@ -14,5 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. # - -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/io/external/gcp/pubsub.py b/sdks/python/apache_beam/io/external/gcp/pubsub.py index e78ea9c26477..8f13771eb092 100644 --- a/sdks/python/apache_beam/io/external/gcp/pubsub.py +++ b/sdks/python/apache_beam/io/external/gcp/pubsub.py @@ -17,12 +17,8 @@ # pytype: skip-file -from __future__ import absolute_import - import typing -from past.builtins import unicode - import apache_beam as beam from apache_beam.io.gcp import pubsub from apache_beam.transforms import Map @@ -32,11 +28,11 @@ ReadFromPubsubSchema = typing.NamedTuple( 'ReadFromPubsubSchema', [ - ('topic', typing.Optional[unicode]), - ('subscription', typing.Optional[unicode]), - ('id_label', typing.Optional[unicode]), + ('topic', typing.Optional[str]), + ('subscription', typing.Optional[str]), + ('id_label', typing.Optional[str]), ('with_attributes', bool), - ('timestamp_attribute', typing.Optional[unicode]), + ('timestamp_attribute', typing.Optional[str]), ]) @@ -116,11 +112,11 @@ def expand(self, pbegin): WriteToPubsubSchema = typing.NamedTuple( 'WriteToPubsubSchema', [ - ('topic', unicode), - ('id_label', typing.Optional[unicode]), + ('topic', str), + ('id_label', typing.Optional[str]), # this is not implemented yet on the Java side: # ('with_attributes', bool), - ('timestamp_attribute', typing.Optional[unicode]), + ('timestamp_attribute', typing.Optional[str]), ]) diff --git a/sdks/python/apache_beam/io/external/generate_sequence.py b/sdks/python/apache_beam/io/external/generate_sequence.py index f709560f1238..e7f56e751383 100644 --- a/sdks/python/apache_beam/io/external/generate_sequence.py +++ b/sdks/python/apache_beam/io/external/generate_sequence.py @@ -17,8 +17,6 @@ # pytype: skip-file -from __future__ import absolute_import - from apache_beam.transforms.external import ExternalTransform from apache_beam.transforms.external import ImplicitSchemaPayloadBuilder diff --git a/sdks/python/apache_beam/io/external/generate_sequence_test.py b/sdks/python/apache_beam/io/external/generate_sequence_test.py index b3450e45e557..f4147e1316e9 100644 --- a/sdks/python/apache_beam/io/external/generate_sequence_test.py +++ 
b/sdks/python/apache_beam/io/external/generate_sequence_test.py @@ -19,15 +19,12 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import logging import os import re import unittest -from nose.plugins.attrib import attr +import pytest from apache_beam.io.external.generate_sequence import GenerateSequence from apache_beam.testing.test_pipeline import TestPipeline @@ -35,7 +32,7 @@ from apache_beam.testing.util import equal_to -@attr('UsesCrossLanguageTransforms') +@pytest.mark.xlang_transforms @unittest.skipUnless( os.environ.get('EXPANSION_PORT'), "EXPANSION_PORT environment var is not provided.") diff --git a/sdks/python/apache_beam/io/external/kafka.py b/sdks/python/apache_beam/io/external/kafka.py index e1852b44f111..ca577844f0ee 100644 --- a/sdks/python/apache_beam/io/external/kafka.py +++ b/sdks/python/apache_beam/io/external/kafka.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import warnings # pylint: disable=unused-import diff --git a/sdks/python/apache_beam/io/external/snowflake.py b/sdks/python/apache_beam/io/external/snowflake.py index 43d869ef263e..cf3aedc06856 100644 --- a/sdks/python/apache_beam/io/external/snowflake.py +++ b/sdks/python/apache_beam/io/external/snowflake.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import warnings # pylint: disable=unused-import diff --git a/sdks/python/apache_beam/io/external/xlang_debeziumio_it_test.py b/sdks/python/apache_beam/io/external/xlang_debeziumio_it_test.py new file mode 100644 index 000000000000..7829baba8b69 --- /dev/null +++ b/sdks/python/apache_beam/io/external/xlang_debeziumio_it_test.py @@ -0,0 +1,123 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +import logging +import unittest + +from apache_beam.io.debezium import DriverClassName +from apache_beam.io.debezium import ReadFromDebezium +from apache_beam.options.pipeline_options import StandardOptions +from apache_beam.testing.test_pipeline import TestPipeline +from apache_beam.testing.util import assert_that +from apache_beam.testing.util import equal_to + +# pylint: disable=wrong-import-order, wrong-import-position, ungrouped-imports +try: + from testcontainers.postgres import PostgresContainer +except ImportError: + PostgresContainer = None + +NUM_RECORDS = 1 + + +@unittest.skipIf( + PostgresContainer is None, 'testcontainers package is not installed') +@unittest.skipIf( + TestPipeline().get_pipeline_options().view_as(StandardOptions).runner is + None, + 'Do not run this test on precommit suites.') +class CrossLanguageDebeziumIOTest(unittest.TestCase): + def setUp(self): + self.username = 'debezium' + self.password = 'dbz' + self.database = 'inventory' + self.start_db_container(retries=1) + self.host = self.db.get_container_host_ip() + self.port = self.db.get_exposed_port(5432) + self.connector_class = DriverClassName.POSTGRESQL + self.connection_properties = [ + "database.dbname=inventory", + "database.server.name=dbserver1", + "database.include.list=inventory", + "include.schema.changes=false" + ] + + def tearDown(self): + # Sometimes stopping the container raises ReadTimeout. We can ignore it + # here to avoid the test failure. + try: + self.db.stop() + except: # pylint: disable=bare-except + logging.error('Could not stop the DB container.') + + def test_xlang_debezium_read(self): + expected_response = [{ + "metadata": { + "connector": "postgresql", + "version": "1.3.1.Final", + "name": "dbserver1", + "database": "inventory", + "schema": "inventory", + "table": "customers" + }, + "before": None, + "after": { + "fields": { + "last_name": "Thomas", + "id": 1001, + "first_name": "Sally", + "email": "sally.thomas@acme.com" + } + } + }] + + with TestPipeline() as p: + p.not_use_test_runner_api = True + results = ( + p + | 'Read from debezium' >> ReadFromDebezium( + username=self.username, + password=self.password, + host=self.host, + port=self.port, + max_number_of_records=NUM_RECORDS, + connector_class=self.connector_class, + connection_properties=self.connection_properties)) + assert_that(results, equal_to(expected_response)) + + +# Creating a container with testcontainers sometimes raises ReadTimeout +# error. In java there are 2 retries set by default. 
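A condensed sketch of using the new `ReadFromDebezium` transform defined in `debezium.py` above (credentials, host, port, and properties are placeholders mirroring the integration test; a portable runner and an expansion service per Option 1 or 2 of the module docstring are assumed):

import apache_beam as beam
from apache_beam.io.debezium import DriverClassName, ReadFromDebezium

with beam.Pipeline() as p:
  changes = (
      p
      | 'ReadCDC' >> ReadFromDebezium(
          connector_class=DriverClassName.POSTGRESQL,
          username='debezium',          # placeholder credentials
          password='dbz',
          host='localhost',
          port='5432',                  # passed as a string, per the schema
          max_number_of_records=100,
          connection_properties=[
              'database.dbname=inventory',
              'database.server.name=dbserver1',
          ]))
  # Each element is a Python dict decoded from the connector's JSON output.
  _ = changes | beam.Map(print)
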
+ + def start_db_container(self, retries): + for i in range(retries): + try: + self.db = PostgresContainer( + 'debezium/example-postgres:latest', + user=self.username, + password=self.password, + dbname=self.database) + self.db.start() + break + except Exception as e: # pylint: disable=bare-except + if i == retries - 1: + logging.error('Unable to initialize DB container.') + raise e + +if __name__ == '__main__': + logging.getLogger().setLevel(logging.INFO) + unittest.main() diff --git a/sdks/python/apache_beam/io/external/xlang_jdbcio_it_test.py b/sdks/python/apache_beam/io/external/xlang_jdbcio_it_test.py index 67c3d9902474..23227429911e 100644 --- a/sdks/python/apache_beam/io/external/xlang_jdbcio_it_test.py +++ b/sdks/python/apache_beam/io/external/xlang_jdbcio_it_test.py @@ -17,14 +17,10 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import typing import unittest -from past.builtins import unicode - import apache_beam as beam from apache_beam import coders from apache_beam.io.jdbc import ReadFromJdbc @@ -63,7 +59,7 @@ [ ("f_id", int), ("f_real", float), - ("f_string", unicode), + ("f_string", str), ], ) coders.registry.register_coder(JdbcWriteTestRow, coders.RowCoder) diff --git a/sdks/python/apache_beam/io/external/xlang_kafkaio_it_test.py b/sdks/python/apache_beam/io/external/xlang_kafkaio_it_test.py index 514acd0ec2bf..6da4567dc7bb 100644 --- a/sdks/python/apache_beam/io/external/xlang_kafkaio_it_test.py +++ b/sdks/python/apache_beam/io/external/xlang_kafkaio_it_test.py @@ -17,8 +17,6 @@ """Integration test for Python cross-language pipelines for Java KafkaIO.""" -from __future__ import absolute_import - import contextlib import logging import os @@ -110,7 +108,6 @@ def run_xlang_kafkaio(self, pipeline): pipeline.run(False) -@unittest.skip('BEAM-10663') @unittest.skipUnless( os.environ.get('LOCAL_KAFKA_JAR'), "LOCAL_KAFKA_JAR environment var is not provided.") diff --git a/sdks/python/apache_beam/io/external/xlang_kinesisio_it_test.py b/sdks/python/apache_beam/io/external/xlang_kinesisio_it_test.py index 3df7841fae69..01ff279528bc 100644 --- a/sdks/python/apache_beam/io/external/xlang_kinesisio_it_test.py +++ b/sdks/python/apache_beam/io/external/xlang_kinesisio_it_test.py @@ -35,8 +35,6 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import logging import time diff --git a/sdks/python/apache_beam/io/external/xlang_parquetio_test.py b/sdks/python/apache_beam/io/external/xlang_parquetio_test.py index 8ca67b8b7027..e6c06be2d270 100644 --- a/sdks/python/apache_beam/io/external/xlang_parquetio_test.py +++ b/sdks/python/apache_beam/io/external/xlang_parquetio_test.py @@ -19,16 +19,11 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import logging import os import re import unittest -from nose.plugins.attrib import attr - import apache_beam as beam from apache_beam import coders from apache_beam.coders.avro_record import AvroRecord @@ -39,7 +34,8 @@ PARQUET_WRITE_URN = "beam:transforms:xlang:test:parquet_write" -@attr('UsesCrossLanguageTransforms') +# TODO: enable test_xlang_parquetio_write after fixing BEAM-10507 +# @pytest.mark.xlang_transforms @unittest.skipUnless( os.environ.get('EXPANSION_JAR'), "EXPANSION_JAR environment variable is not set.") diff --git a/sdks/python/apache_beam/io/external/xlang_snowflakeio_it_test.py b/sdks/python/apache_beam/io/external/xlang_snowflakeio_it_test.py index bedbc9030d03..052ea3156228 100644 --- 
a/sdks/python/apache_beam/io/external/xlang_snowflakeio_it_test.py +++ b/sdks/python/apache_beam/io/external/xlang_snowflakeio_it_test.py @@ -41,8 +41,6 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import binascii import logging diff --git a/sdks/python/apache_beam/io/filebasedsink.py b/sdks/python/apache_beam/io/filebasedsink.py index a355d8c9b345..cae317493d5e 100644 --- a/sdks/python/apache_beam/io/filebasedsink.py +++ b/sdks/python/apache_beam/io/filebasedsink.py @@ -19,18 +19,11 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import os import re import time import uuid -from builtins import range -from builtins import zip - -from future.utils import iteritems -from past.builtins import unicode from apache_beam.internal import util from apache_beam.io import iobase @@ -84,11 +77,11 @@ def __init__( ValueError: if **shard_name_template** is not of expected format. """ - if not isinstance(file_path_prefix, ((str, unicode), ValueProvider)): + if not isinstance(file_path_prefix, (str, ValueProvider)): raise TypeError( 'file_path_prefix must be a string or ValueProvider;' 'got %r instead' % file_path_prefix) - if not isinstance(file_name_suffix, ((str, unicode), ValueProvider)): + if not isinstance(file_name_suffix, (str, ValueProvider)): raise TypeError( 'file_name_suffix must be a string or ValueProvider;' 'got %r instead' % file_name_suffix) @@ -101,9 +94,9 @@ def __init__( shard_name_template = DEFAULT_SHARD_NAME_TEMPLATE elif shard_name_template == '': num_shards = 1 - if isinstance(file_path_prefix, (str, unicode)): + if isinstance(file_path_prefix, str): file_path_prefix = StaticValueProvider(str, file_path_prefix) - if isinstance(file_name_suffix, (str, unicode)): + if isinstance(file_name_suffix, str): file_name_suffix = StaticValueProvider(str, file_name_suffix) self.file_path_prefix = file_path_prefix self.file_name_suffix = file_name_suffix @@ -321,7 +314,7 @@ def _rename_batch(batch): except BeamIOError as exp: if exp.exception_details is None: raise - for (src, dst), exception in iteritems(exp.exception_details): + for (src, dst), exception in exp.exception_details.items(): if exception: _LOGGER.error( ('Exception in _rename_batch. src: %s, ' @@ -403,10 +396,6 @@ def __eq__(self, other): # pylint: disable=unidiomatic-typecheck return type(self) == type(other) and self.__dict__ == other.__dict__ - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - class FileBasedSinkWriter(iobase.Writer): """The writer for FileBasedSink. diff --git a/sdks/python/apache_beam/io/filebasedsink_test.py b/sdks/python/apache_beam/io/filebasedsink_test.py index aab1ffa1276b..c75958ce744e 100644 --- a/sdks/python/apache_beam/io/filebasedsink_test.py +++ b/sdks/python/apache_beam/io/filebasedsink_test.py @@ -20,19 +20,13 @@ # pytype: skip-file -from __future__ import absolute_import - import glob import logging import os import shutil -import sys import tempfile import unittest -from builtins import range -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import import hamcrest as hc import mock @@ -105,12 +99,6 @@ def close(self, file_handle): class TestFileBasedSink(_TestCaseWithTempDirCleanUp): - @classmethod - def setUpClass(cls): - # Method has been renamed in Python 3 - if sys.version_info[0] < 3: - cls.assertCountEqual = cls.assertItemsEqual - def _common_init(self, sink): # Manually invoke the generic Sink API. 
init_token = sink.initialize_write() diff --git a/sdks/python/apache_beam/io/filebasedsource.py b/sdks/python/apache_beam/io/filebasedsource.py index a957b2e45c8c..38830e73fbc5 100644 --- a/sdks/python/apache_beam/io/filebasedsource.py +++ b/sdks/python/apache_beam/io/filebasedsource.py @@ -28,12 +28,8 @@ # pytype: skip-file -from __future__ import absolute_import - from typing import Callable - -from past.builtins import long -from past.builtins import unicode +from typing import Union from apache_beam.internal import pickler from apache_beam.io import concat_source @@ -106,13 +102,13 @@ def __init__( result. """ - if not isinstance(file_pattern, ((str, unicode), ValueProvider)): + if not isinstance(file_pattern, (str, ValueProvider)): raise TypeError( '%s: file_pattern must be of type string' ' or ValueProvider; got %r instead' % (self.__class__.__name__, file_pattern)) - if isinstance(file_pattern, (str, unicode)): + if isinstance(file_pattern, str): file_pattern = StaticValueProvider(str, file_pattern) self._pattern = file_pattern @@ -251,11 +247,11 @@ def __init__( stop_offset, min_bundle_size=0, splittable=True): - if not isinstance(start_offset, (int, long)): + if not isinstance(start_offset, int): raise TypeError( 'start_offset must be a number. Received: %r' % start_offset) if stop_offset != range_trackers.OffsetRangeTracker.OFFSET_INFINITY: - if not isinstance(stop_offset, (int, long)): + if not isinstance(stop_offset, int): raise TypeError( 'stop_offset must be a number. Received: %r' % stop_offset) if start_offset >= stop_offset: @@ -366,9 +362,13 @@ def process(self, element, *args, **kwargs): class _ReadRange(DoFn): - def __init__(self, source_from_file): - # type: (Callable[[str], iobase.BoundedSource]) -> None + def __init__( + self, + source_from_file, # type: Union[str, iobase.BoundedSource] + with_filename=False # type: bool + ) -> None: self._source_from_file = source_from_file + self._with_filename = with_filename def process(self, element, *args, **kwargs): metadata, range = element @@ -382,8 +382,12 @@ def process(self, element, *args, **kwargs): if not source_list: return source = source_list[0].source + for record in source.read(range.new_tracker()): - yield record + if self._with_filename: + yield (metadata.path, record) + else: + yield record class ReadAllFiles(PTransform): @@ -400,6 +404,7 @@ def __init__(self, desired_bundle_size, # type: int min_bundle_size, # type: int source_from_file, # type: Callable[[str], iobase.BoundedSource] + with_filename=False # type: bool ): """ Args: @@ -420,12 +425,16 @@ def __init__(self, paths passed to this will be for individual files, not for file patterns even if the ``PCollection`` of files processed by the transform consist of file patterns. + with_filename: If True, returns a Key Value with the key being the file + name and the value being the actual data. If False, it only returns + the data. 
""" self._splittable = splittable self._compression_type = compression_type self._desired_bundle_size = desired_bundle_size self._min_bundle_size = min_bundle_size self._source_from_file = source_from_file + self._with_filename = with_filename def expand(self, pvalue): return ( @@ -437,4 +446,6 @@ def expand(self, pvalue): self._desired_bundle_size, self._min_bundle_size)) | 'Reshard' >> Reshuffle() - | 'ReadRange' >> ParDo(_ReadRange(self._source_from_file))) + | 'ReadRange' >> ParDo( + _ReadRange( + self._source_from_file, with_filename=self._with_filename))) diff --git a/sdks/python/apache_beam/io/filebasedsource_test.py b/sdks/python/apache_beam/io/filebasedsource_test.py index 8e7f09970cee..e68d2afbac9d 100644 --- a/sdks/python/apache_beam/io/filebasedsource_test.py +++ b/sdks/python/apache_beam/io/filebasedsource_test.py @@ -16,9 +16,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import bz2 import gzip import io @@ -26,14 +23,9 @@ import math import os import random -import sys import tempfile import unittest -from builtins import object -from builtins import range -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import import hamcrest as hc import apache_beam as beam @@ -217,12 +209,6 @@ def read(self, range_tracker): def estimate_size(self): return len(self._values) # Assuming each value to be 1 byte. - @classmethod - def setUpClass(cls): - # Method has been renamed in Python 3 - if sys.version_info[0] < 3: - cls.assertCountEqual = cls.assertItemsEqual - def setUp(self): # Reducing the size of thread pools. Without this test execution may fail in # environments with limited amount of resources. @@ -266,12 +252,6 @@ def test_estimate_size(self): class TestFileBasedSource(unittest.TestCase): - @classmethod - def setUpClass(cls): - # Method has been renamed in Python 3 - if sys.version_info[0] < 3: - cls.assertCountEqual = cls.assertItemsEqual - def setUp(self): # Reducing the size of thread pools. Without this test execution may fail in # environments with limited amount of resources. @@ -621,12 +601,6 @@ def default_output_coder(self): class TestSingleFileSource(unittest.TestCase): - @classmethod - def setUpClass(cls): - # Method has been renamed in Python 3 - if sys.version_info[0] < 3: - cls.assertCountEqual = cls.assertItemsEqual - def setUp(self): # Reducing the size of thread pools. Without this test execution may fail in # environments with limited amount of resources. 
diff --git a/sdks/python/apache_beam/io/fileio.py b/sdks/python/apache_beam/io/fileio.py index 1100991f599e..8c990b114697 100644 --- a/sdks/python/apache_beam/io/fileio.py +++ b/sdks/python/apache_beam/io/fileio.py @@ -88,31 +88,37 @@ # pytype: skip-file -from __future__ import absolute_import - import collections import logging import random import uuid +from collections import namedtuple from typing import TYPE_CHECKING from typing import Any from typing import BinaryIO # pylint: disable=unused-import from typing import Callable from typing import DefaultDict from typing import Dict +from typing import Iterable +from typing import List from typing import Tuple - -from past.builtins import unicode +from typing import Union import apache_beam as beam from apache_beam.io import filesystem from apache_beam.io import filesystems from apache_beam.io.filesystem import BeamIOError +from apache_beam.io.filesystem import CompressionTypes from apache_beam.options.pipeline_options import GoogleCloudOptions from apache_beam.options.value_provider import StaticValueProvider from apache_beam.options.value_provider import ValueProvider +from apache_beam.transforms.periodicsequence import PeriodicImpulse +from apache_beam.transforms.userstate import CombiningValueStateSpec from apache_beam.transforms.window import GlobalWindow +from apache_beam.transforms.window import IntervalWindow from apache_beam.utils.annotations import experimental +from apache_beam.utils.timestamp import MAX_TIMESTAMP +from apache_beam.utils.timestamp import Timestamp if TYPE_CHECKING: from apache_beam.transforms.window import BoundedWindow @@ -121,12 +127,17 @@ 'EmptyMatchTreatment', 'MatchFiles', 'MatchAll', + 'MatchContinuously', 'ReadableFile', 'ReadMatches' ] _LOGGER = logging.getLogger(__name__) +FileMetadata = namedtuple("FileMetadata", "mime_type compression_type") + +CreateFileMetadataFn = Callable[[str, str], FileMetadata] + class EmptyMatchTreatment(object): """How to treat empty matches in ``MatchAll`` and ``MatchFiles`` transforms. @@ -154,7 +165,7 @@ class _MatchAllFn(beam.DoFn): def __init__(self, empty_match_treatment): self._empty_match_treatment = empty_match_treatment - def process(self, file_pattern): + def process(self, file_pattern: str) -> List[filesystem.FileMetadata]: # TODO: Should we batch the lookups? 
match_results = filesystems.FileSystems.match([file_pattern]) match_result = match_results[0] @@ -175,12 +186,12 @@ class MatchFiles(beam.PTransform): of ``FileMetadata`` objects.""" def __init__( self, - file_pattern, + file_pattern: str, empty_match_treatment=EmptyMatchTreatment.ALLOW_IF_WILDCARD): self._file_pattern = file_pattern self._empty_match_treatment = empty_match_treatment - def expand(self, pcoll): + def expand(self, pcoll) -> beam.PCollection[filesystem.FileMetadata]: return pcoll.pipeline | beam.Create([self._file_pattern]) | MatchAll() @@ -192,19 +203,45 @@ class MatchAll(beam.PTransform): def __init__(self, empty_match_treatment=EmptyMatchTreatment.ALLOW): self._empty_match_treatment = empty_match_treatment - def expand(self, pcoll): + def expand( + self, + pcoll: beam.PCollection, + ) -> beam.PCollection[filesystem.FileMetadata]: return pcoll | beam.ParDo(_MatchAllFn(self._empty_match_treatment)) +class ReadableFile(object): + """A utility class for accessing files.""" + def __init__(self, metadata, compression=None): + self.metadata = metadata + self._compression = compression + + def open(self, mime_type='text/plain', compression_type=None): + compression = ( + compression_type or self._compression or + filesystems.CompressionTypes.AUTO) + return filesystems.FileSystems.open( + self.metadata.path, mime_type=mime_type, compression_type=compression) + + def read(self, mime_type='application/octet-stream'): + return self.open(mime_type).read() + + def read_utf8(self): + return self.open().read().decode('utf-8') + + class _ReadMatchesFn(beam.DoFn): def __init__(self, compression, skip_directories): self._compression = compression self._skip_directories = skip_directories - def process(self, file_metadata): + def process( + self, + file_metadata: Union[str, filesystem.FileMetadata], + ) -> Iterable[ReadableFile]: metadata = ( filesystem.FileMetadata(file_metadata, 0) if isinstance( - file_metadata, (str, unicode)) else file_metadata) + file_metadata, str) else file_metadata) if ((metadata.path.endswith('/') or metadata.path.endswith('\\')) and self._skip_directories): @@ -218,24 +255,55 @@ def process(self, file_metadata): yield ReadableFile(metadata, self._compression) -class ReadableFile(object): - """A utility class for accessing files.""" - def __init__(self, metadata, compression=None): - self.metadata = metadata - self._compression = compression +@experimental() +class MatchContinuously(beam.PTransform): + """Checks for new files for a given pattern every interval. - def open(self, mime_type='text/plain', compression_type=None): - compression = ( - compression_type or self._compression or - filesystems.CompressionTypes.AUTO) - return filesystems.FileSystems.open( - self.metadata.path, mime_type=mime_type, compression_type=compression) + This ``PTransform`` returns a ``PCollection`` of matching files in the form + of ``FileMetadata`` objects. + """ + def __init__( + self, + file_pattern, + interval=360.0, + has_deduplication=True, + start_timestamp=Timestamp.now(), + stop_timestamp=MAX_TIMESTAMP): + """Initializes a MatchContinuously transform. - def read(self, mime_type='application/octet-stream'): - return self.open(mime_type).read() + Args: + file_pattern: The file path to read from. + interval: Interval at which to check for files in seconds. + has_deduplication: Whether files already read are discarded or not. + start_timestamp: Timestamp for start file checking. + stop_timestamp: Timestamp after which no more files will be checked. 
+ """ - def read_utf8(self): - return self.open().read().decode('utf-8') + self.file_pattern = file_pattern + self.interval = interval + self.has_deduplication = has_deduplication + self.start_ts = start_timestamp + self.stop_ts = stop_timestamp + + def expand(self, pcoll): + impulse = pcoll | PeriodicImpulse( + start_timestamp=self.start_ts, + stop_timestamp=self.stop_ts, + fire_interval=self.interval) + + match_files = ( + impulse + | 'GetFilePattern' >> beam.Map(lambda x: self.file_pattern) + | MatchAll()) + + if self.has_deduplication: + match_files = ( + match_files + # Making a Key Value so each file has its own state. + | 'ToKV' >> beam.Map(lambda x: (x.path, x)) + | 'RemoveAlreadyRead' >> beam.ParDo(_RemoveDuplicates())) + + return match_files class ReadMatches(beam.PTransform): @@ -246,7 +314,10 @@ def __init__(self, compression=None, skip_directories=True): self._compression = compression self._skip_directories = skip_directories - def expand(self, pcoll): + def expand( + self, + pcoll: beam.PCollection[Union[str, filesystem.FileMetadata]], + ) -> beam.PCollection[ReadableFile]: return pcoll | beam.ParDo( _ReadMatchesFn(self._compression, self._skip_directories)) @@ -263,7 +334,16 @@ class FileSink(object): - The ``flush`` method, which flushes any buffered state. This is most often called before closing a file (but not exclusively called in that situation). The sink is not responsible for closing the file handler. + A Sink class can override the following: + - The ``create_metadata`` method, which creates all metadata passed to + Filesystems.create. """ + def create_metadata( + self, destination: str, full_file_name: str) -> FileMetadata: + return FileMetadata( + mime_type="application/octet-stream", + compression_type=CompressionTypes.AUTO) + def open(self, fh): # type: (BinaryIO) -> None raise NotImplementedError @@ -516,18 +596,27 @@ def expand(self, pcoll): return file_results -def _create_writer(base_path, writer_key): +def _create_writer( + base_path, + writer_key: Tuple[str, IntervalWindow], + create_metadata_fn: CreateFileMetadataFn, +): try: filesystems.FileSystems.mkdirs(base_path) except IOError: # Directory already exists. pass + destination = writer_key[0] + # The file name has a prefix determined by destination+window, along with # a random string. This allows us to retrieve orphaned files later on. file_name = '%s_%s' % (abs(hash(writer_key)), uuid.uuid4()) full_file_name = filesystems.FileSystems.join(base_path, file_name) - return full_file_name, filesystems.FileSystems.create(full_file_name) + metadata = create_metadata_fn(destination, full_file_name) + return full_file_name, filesystems.FileSystems.create( + full_file_name, + **metadata._asdict()) class _MoveTempFilesIntoFinalDestinationFn(beam.DoFn): @@ -608,9 +697,13 @@ def process( shard = destination_and_shard[1] records = element[1] - full_file_name, writer = _create_writer(base_path=self.base_path.get(), - writer_key=(destination, w)) sink = self.sink_fn(destination) + + full_file_name, writer = _create_writer( + base_path=self.base_path.get(), + writer_key=(destination, w), + create_metadata_fn=sink.create_metadata) + sink.open(writer) for r in records: @@ -705,10 +798,13 @@ def _get_or_create_writer_and_sink(self, destination, window): return None, None else: # The writer does not exist, but we can still create a new one. 
- full_file_name, writer = _create_writer(base_path=self.base_path.get(), - writer_key=writer_key) sink = self.sink_fn(destination) + full_file_name, writer = _create_writer( + base_path=self.base_path.get(), + writer_key=writer_key, + create_metadata_fn=sink.create_metadata) + sink.open(writer) self._writers_and_sinks[writer_key] = (writer, sink) @@ -735,3 +831,21 @@ def finish_bundle(self): timestamp=key[1].start, windows=[key[1]] # TODO(pabloem) HOW DO WE GET THE PANE )) + + +class _RemoveDuplicates(beam.DoFn): + + COUNT_STATE = CombiningValueStateSpec('count', combine_fn=sum) + + def process(self, element, count_state=beam.DoFn.StateParam(COUNT_STATE)): + + path = element[0] + file_metadata = element[1] + counter = count_state.read() + + if counter == 0: + count_state.add(1) + _LOGGER.debug('Generated entry for file %s', path) + yield file_metadata + else: + _LOGGER.debug('File %s was already read, seen %d times', path, counter) diff --git a/sdks/python/apache_beam/io/fileio_test.py b/sdks/python/apache_beam/io/fileio_test.py index 473f9ec33f0b..90086c967364 100644 --- a/sdks/python/apache_beam/io/fileio_test.py +++ b/sdks/python/apache_beam/io/fileio_test.py @@ -19,20 +19,17 @@ # pytype: skip-file -from __future__ import absolute_import - import csv import io import json import logging import os -import sys import unittest import uuid import warnings +import pytest from hamcrest.library.text import stringmatches -from nose.plugins.attrib import attr import apache_beam as beam from apache_beam.io import fileio @@ -51,16 +48,14 @@ from apache_beam.transforms.window import FixedWindows from apache_beam.transforms.window import GlobalWindow from apache_beam.transforms.window import IntervalWindow +from apache_beam.utils.timestamp import Timestamp warnings.filterwarnings( 'ignore', category=FutureWarning, module='apache_beam.io.fileio_test') def _get_file_reader(readable_file): - if sys.version_info >= (3, 0): - return io.TextIOWrapper(readable_file.open()) - else: - return readable_file.open() + return io.TextIOWrapper(readable_file.open()) class MatchTest(_TestCaseWithTempDirCleanUp): @@ -296,7 +291,7 @@ class MatchIntegrationTest(unittest.TestCase): def setUp(self): self.test_pipeline = TestPipeline(is_integration_test=True) - @attr('IT') + @pytest.mark.it_postcommit def test_transform_on_gcs(self): args = self.test_pipeline.get_full_options_as_args() @@ -326,6 +321,71 @@ def test_transform_on_gcs(self): label='Assert Checksums') +class MatchContinuouslyTest(_TestCaseWithTempDirCleanUp): + def test_with_deduplication(self): + files = [] + tempdir = '%s%s' % (self._new_tempdir(), os.sep) + + # Create a file to be matched before pipeline + files.append(self._create_temp_file(dir=tempdir)) + # Add file name that will be created mid-pipeline + files.append(FileSystems.join(tempdir, 'extra')) + + interval = 0.2 + start = Timestamp.now() + stop = start + interval + 0.1 + + def _create_extra_file(element): + writer = FileSystems.create(FileSystems.join(tempdir, 'extra')) + writer.close() + return element.path + + with TestPipeline() as p: + match_continiously = ( + p + | fileio.MatchContinuously( + file_pattern=FileSystems.join(tempdir, '*'), + interval=interval, + start_timestamp=start, + stop_timestamp=stop) + | beam.Map(_create_extra_file)) + + assert_that(match_continiously, equal_to(files)) + + def test_without_deduplication(self): + interval = 0.2 + start = Timestamp.now() + stop = start + interval + 0.1 + + files = [] + tempdir = '%s%s' % (self._new_tempdir(), os.sep) + + # Create a 
file to be matched before pipeline starts + file = self._create_temp_file(dir=tempdir) + # Add file twice, since it will be matched for every interval + files += [file, file] + # Add file name that will be created mid-pipeline + files.append(FileSystems.join(tempdir, 'extra')) + + def _create_extra_file(element): + writer = FileSystems.create(FileSystems.join(tempdir, 'extra')) + writer.close() + return element.path + + with TestPipeline() as p: + match_continiously = ( + p + | fileio.MatchContinuously( + file_pattern=FileSystems.join(tempdir, '*'), + interval=interval, + has_deduplication=False, + start_timestamp=start, + stop_timestamp=stop) + | beam.Map(_create_extra_file)) + + assert_that(match_continiously, equal_to(files)) + + class WriteFilesTest(_TestCaseWithTempDirCleanUp): SIMPLE_COLLECTION = [ diff --git a/sdks/python/apache_beam/io/filesystem.py b/sdks/python/apache_beam/io/filesystem.py index c57f133f7990..687ab043b1dc 100644 --- a/sdks/python/apache_beam/io/filesystem.py +++ b/sdks/python/apache_beam/io/filesystem.py @@ -24,9 +24,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import abc import bz2 import io @@ -36,18 +33,12 @@ import re import time import zlib -from builtins import object -from builtins import zip from typing import BinaryIO # pylint: disable=unused-import from typing import Iterator from typing import List from typing import Optional from typing import Tuple -from future.utils import with_metaclass -from past.builtins import long -from past.builtins import unicode - from apache_beam.utils.plugin import BeamPlugin logger = logging.getLogger(__name__) @@ -211,9 +202,7 @@ def write(self, data): if compressed: self._file.write(compressed) - def _fetch_to_internal_buffer(self, num_bytes): - # type: (int) -> None - + def _fetch_to_internal_buffer(self, num_bytes: int) -> None: """Fetch up to num_bytes into the internal buffer.""" if (not self._read_eof and self._read_position > 0 and (self._read_buffer.tell() - self._read_position) < num_bytes): @@ -225,6 +214,7 @@ def _fetch_to_internal_buffer(self, num_bytes): self._clear_read_buffer() self._read_buffer.write(data) + assert self._decompressor while not self._read_eof and (self._read_buffer.tell() - self._read_position) < num_bytes: # Continue reading from the underlying file object until enough bytes are @@ -298,23 +288,22 @@ def readline(self): return bytes_io.getvalue() - def closed(self): - # type: () -> bool - return not self._file or self._file.closed() + def closed(self) -> bool: + return not self._file or self._file.closed - def close(self): - # type: () -> None + def close(self) -> None: if self.readable(): self._read_buffer.close() if self.writeable(): + assert self._compressor self._file.write(self._compressor.flush()) self._file.close() - def flush(self): - # type: () -> None + def flush(self) -> None: if self.writeable(): + assert self._compressor self._file.write(self._compressor.flush()) self._file.flush() @@ -435,8 +424,8 @@ def __exit__(self, exception_type, exception_value, traceback): class FileMetadata(object): """Metadata about a file path that is the output of FileSystem.match.""" def __init__(self, path, size_in_bytes): - assert isinstance(path, (str, unicode)) and path, "Path should be a string" - assert isinstance(size_in_bytes, (int, long)) and size_in_bytes >= 0, \ + assert isinstance(path, str) and path, "Path should be a string" + assert isinstance(size_in_bytes, int) and size_in_bytes >= 0, \ "Invalid value for size_in_bytes should 
%s (of type %s)" % ( size_in_bytes, type(size_in_bytes)) self.path = path @@ -452,10 +441,6 @@ def __eq__(self, other): def __hash__(self): return hash((self.path, self.size_in_bytes)) - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - def __repr__(self): return 'FileMetadata(%s, %s)' % (self.path, self.size_in_bytes) @@ -487,7 +472,7 @@ def __init__(self, msg, exception_details=None): self.exception_details = exception_details -class FileSystem(with_metaclass(abc.ABCMeta, BeamPlugin)): # type: ignore[misc] +class FileSystem(BeamPlugin, metaclass=abc.ABCMeta): """A class that defines the functions that can be performed on a filesystem. All methods are abstract and they are for file system providers to diff --git a/sdks/python/apache_beam/io/filesystem_test.py b/sdks/python/apache_beam/io/filesystem_test.py index dda046fac794..83453a96d247 100644 --- a/sdks/python/apache_beam/io/filesystem_test.py +++ b/sdks/python/apache_beam/io/filesystem_test.py @@ -19,9 +19,6 @@ """Unit tests for filesystem module.""" # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import bz2 import gzip import logging @@ -32,10 +29,8 @@ import tempfile import unittest import zlib -from builtins import range from io import BytesIO -from future.utils import iteritems from parameterized import param from parameterized import parameterized @@ -72,7 +67,7 @@ def _insert_random_file(self, path, size): self._files[path] = size def _list(self, dir_or_prefix): - for path, size in iteritems(self._files): + for path, size in self._files.items(): if path.startswith(dir_or_prefix): yield FileMetadata(path, size) diff --git a/sdks/python/apache_beam/io/filesystemio.py b/sdks/python/apache_beam/io/filesystemio.py index c46ef2177571..70f39daf661c 100644 --- a/sdks/python/apache_beam/io/filesystemio.py +++ b/sdks/python/apache_beam/io/filesystemio.py @@ -19,14 +19,9 @@ # pytype: skip-file -from __future__ import absolute_import - import abc import io import os -from builtins import object - -from future.utils import with_metaclass __all__ = [ 'Downloader', @@ -37,7 +32,7 @@ ] -class Downloader(with_metaclass(abc.ABCMeta, object)): # type: ignore[misc] +class Downloader(metaclass=abc.ABCMeta): """Download interface for a single file. Implementations should support random access reads. 
@@ -63,7 +58,7 @@ def get_range(self, start, end): """ -class Uploader(with_metaclass(abc.ABCMeta, object)): # type: ignore[misc] +class Uploader(metaclass=abc.ABCMeta): """Upload interface for a single file.""" @abc.abstractmethod def put(self, data): diff --git a/sdks/python/apache_beam/io/filesystemio_test.py b/sdks/python/apache_beam/io/filesystemio_test.py index d101a06f5acb..cbf9449e78d1 100644 --- a/sdks/python/apache_beam/io/filesystemio_test.py +++ b/sdks/python/apache_beam/io/filesystemio_test.py @@ -19,15 +19,12 @@ # pytype: skip-file -from __future__ import absolute_import - import io import logging import multiprocessing import os import threading import unittest -from builtins import range from apache_beam.io import filesystemio diff --git a/sdks/python/apache_beam/io/filesystems.py b/sdks/python/apache_beam/io/filesystems.py index be894281ac24..4683d2bb5f11 100644 --- a/sdks/python/apache_beam/io/filesystems.py +++ b/sdks/python/apache_beam/io/filesystems.py @@ -19,14 +19,9 @@ # pytype: skip-file -from __future__ import absolute_import - import re -from builtins import object from typing import BinaryIO # pylint: disable=unused-import -from past.builtins import unicode - from apache_beam.io.filesystem import BeamIOError from apache_beam.io.filesystem import CompressionTypes from apache_beam.io.filesystem import FileSystem @@ -339,7 +334,7 @@ def delete(paths): Raises: ``BeamIOError``: if any of the delete operations fail """ - if isinstance(paths, (str, unicode)): + if isinstance(paths, str): raise BeamIOError( 'Delete passed string argument instead of list: %s' % paths) if len(paths) == 0: diff --git a/sdks/python/apache_beam/io/filesystems_test.py b/sdks/python/apache_beam/io/filesystems_test.py index e1d5d43be06c..1083499b4e2b 100644 --- a/sdks/python/apache_beam/io/filesystems_test.py +++ b/sdks/python/apache_beam/io/filesystems_test.py @@ -20,18 +20,13 @@ # pytype: skip-file -from __future__ import absolute_import - import filecmp import logging import os import shutil -import sys import tempfile import unittest -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import import mock from apache_beam.io import localfilesystem @@ -48,12 +43,6 @@ def _join(first_path, *paths): class FileSystemsTest(unittest.TestCase): - @classmethod - def setUpClass(cls): - # Method has been renamed in Python 3 - if sys.version_info[0] < 3: - cls.assertCountEqual = cls.assertItemsEqual - def setUp(self): self.tmpdir = tempfile.mkdtemp() diff --git a/sdks/python/apache_beam/io/flink/__init__.py b/sdks/python/apache_beam/io/flink/__init__.py index 6569e3fe5de4..cce3acad34a4 100644 --- a/sdks/python/apache_beam/io/flink/__init__.py +++ b/sdks/python/apache_beam/io/flink/__init__.py @@ -14,5 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. 
# - -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/io/flink/flink_streaming_impulse_source.py b/sdks/python/apache_beam/io/flink/flink_streaming_impulse_source.py index 7b9a54a712ea..d50672ed6be2 100644 --- a/sdks/python/apache_beam/io/flink/flink_streaming_impulse_source.py +++ b/sdks/python/apache_beam/io/flink/flink_streaming_impulse_source.py @@ -22,8 +22,6 @@ """ # pytype: skip-file -from __future__ import absolute_import - import json from typing import Any from typing import Dict @@ -44,7 +42,7 @@ def expand(self, pbegin): 'Input to transform must be a PBegin but found %s' % pbegin) return pvalue.PCollection(pbegin.pipeline, is_bounded=False) - def get_windowing(self, inputs): + def get_windowing(self, unused_inputs): return Windowing(GlobalWindows()) def infer_output_type(self, unused_input_type): diff --git a/sdks/python/apache_beam/io/flink/flink_streaming_impulse_source_test.py b/sdks/python/apache_beam/io/flink/flink_streaming_impulse_source_test.py index 2102d8f68746..f6ba5abdd575 100644 --- a/sdks/python/apache_beam/io/flink/flink_streaming_impulse_source_test.py +++ b/sdks/python/apache_beam/io/flink/flink_streaming_impulse_source_test.py @@ -18,8 +18,6 @@ """Unit tests for flink_streaming_impulse_source.""" # pytype: skip-file -from __future__ import absolute_import - import logging import unittest @@ -33,8 +31,7 @@ def test_serialization(self): # pylint: disable=expression-not-assigned p | FlinkStreamingImpulseSource() # Test that roundtrip through Runner API works - beam.Pipeline.from_runner_api( - p.to_runner_api(), p.runner, p._options, allow_proto_holders=False) + beam.Pipeline.from_runner_api(p.to_runner_api(), p.runner, p._options) if __name__ == '__main__': diff --git a/sdks/python/apache_beam/io/gcp/__init__.py b/sdks/python/apache_beam/io/gcp/__init__.py index 02d83df0eb05..87dc1c30ee18 100644 --- a/sdks/python/apache_beam/io/gcp/__init__.py +++ b/sdks/python/apache_beam/io/gcp/__init__.py @@ -14,26 +14,22 @@ # See the License for the specific language governing permissions and # limitations under the License. # -from __future__ import absolute_import - -import sys # Important: the MIME library in the Python 3.x standard library used by # apitools causes uploads containing '\r\n' to be corrupted, unless we # patch the BytesGenerator class to write contents verbatim. -if sys.version_info[0] == 3: - try: - # pylint: disable=wrong-import-order, wrong-import-position - # pylint: disable=ungrouped-imports - import apitools.base.py.transfer as transfer - import email.generator as email_generator +try: + # pylint: disable=wrong-import-order, wrong-import-position + # pylint: disable=ungrouped-imports + import apitools.base.py.transfer as transfer + import email.generator as email_generator - class _WrapperNamespace(object): - class BytesGenerator(email_generator.BytesGenerator): - def _write_lines(self, lines): - self.write(lines) + class _WrapperNamespace(object): + class BytesGenerator(email_generator.BytesGenerator): + def _write_lines(self, lines): + self.write(lines) - transfer.email_generator = _WrapperNamespace - except ImportError: - # We may not have the GCP dependencies installed, so we pass in this case. - pass + transfer.email_generator = _WrapperNamespace +except ImportError: + # We may not have the GCP dependencies installed, so we pass in this case. 
+ pass diff --git a/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py b/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py index c20ed47d665c..699dfa4b119a 100644 --- a/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py +++ b/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py @@ -21,8 +21,6 @@ # pytype: skip-file -from __future__ import absolute_import - import base64 import datetime import logging @@ -30,8 +28,8 @@ import time import unittest +import pytest from hamcrest.core.core.allof import all_of -from nose.plugins.attrib import attr from apache_beam.io.gcp import big_query_query_to_table_pipeline from apache_beam.io.gcp.bigquery_tools import BigQueryWrapper @@ -156,7 +154,7 @@ def _setup_new_types_env(self): self.project, self.dataset_id, NEW_TYPES_INPUT_TABLE, table_data) self.assertTrue(passed, 'Error in BQ setup: %s' % errors) - @attr('IT') + @pytest.mark.it_postcommit def test_big_query_legacy_sql(self): verify_query = DIALECT_OUTPUT_VERIFY_QUERY % self.output_table expected_checksum = test_utils.compute_hash(DIALECT_OUTPUT_EXPECTED) @@ -179,7 +177,7 @@ def test_big_query_legacy_sql(self): options = self.test_pipeline.get_full_options_as_args(**extra_opts) big_query_query_to_table_pipeline.run_bq_pipeline(options) - @attr('IT') + @pytest.mark.it_postcommit def test_big_query_standard_sql(self): verify_query = DIALECT_OUTPUT_VERIFY_QUERY % self.output_table expected_checksum = test_utils.compute_hash(DIALECT_OUTPUT_EXPECTED) @@ -202,7 +200,7 @@ def test_big_query_standard_sql(self): options = self.test_pipeline.get_full_options_as_args(**extra_opts) big_query_query_to_table_pipeline.run_bq_pipeline(options) - @attr('IT') + @pytest.mark.it_postcommit def test_big_query_standard_sql_kms_key_native(self): if isinstance(self.test_pipeline.runner, TestDirectRunner): self.skipTest("This test doesn't work on DirectRunner.") @@ -238,7 +236,7 @@ def test_big_query_standard_sql_kms_key_native(self): 'No encryption configuration found: %s' % table) self.assertEqual(kms_key, table.encryptionConfiguration.kmsKeyName) - @attr('IT') + @pytest.mark.it_postcommit def test_big_query_new_types(self): expected_checksum = test_utils.compute_hash(NEW_TYPES_OUTPUT_EXPECTED) verify_query = NEW_TYPES_OUTPUT_VERIFY_QUERY % self.output_table @@ -262,7 +260,7 @@ def test_big_query_new_types(self): options = self.test_pipeline.get_full_options_as_args(**extra_opts) big_query_query_to_table_pipeline.run_bq_pipeline(options) - @attr('IT') + @pytest.mark.it_postcommit def test_big_query_new_types_avro(self): expected_checksum = test_utils.compute_hash(NEW_TYPES_OUTPUT_EXPECTED) verify_query = NEW_TYPES_OUTPUT_VERIFY_QUERY % self.output_table @@ -285,7 +283,7 @@ def test_big_query_new_types_avro(self): options = self.test_pipeline.get_full_options_as_args(**extra_opts) big_query_query_to_table_pipeline.run_bq_pipeline(options) - @attr('IT') + @pytest.mark.it_postcommit def test_big_query_new_types_native(self): expected_checksum = test_utils.compute_hash(NEW_TYPES_OUTPUT_EXPECTED) verify_query = NEW_TYPES_OUTPUT_VERIFY_QUERY % self.output_table diff --git a/sdks/python/apache_beam/io/gcp/big_query_query_to_table_pipeline.py b/sdks/python/apache_beam/io/gcp/big_query_query_to_table_pipeline.py index 8e4b0df5c0a0..c1867be91efc 100644 --- a/sdks/python/apache_beam/io/gcp/big_query_query_to_table_pipeline.py +++ b/sdks/python/apache_beam/io/gcp/big_query_query_to_table_pipeline.py @@ -24,8 +24,6 @@ # pytype: skip-file # pylint: 
disable=wrong-import-order, wrong-import-position -from __future__ import absolute_import - import argparse import apache_beam as beam diff --git a/sdks/python/apache_beam/io/gcp/bigquery.py b/sdks/python/apache_beam/io/gcp/bigquery.py index a5ceb5bc585f..35de4aa98188 100644 --- a/sdks/python/apache_beam/io/gcp/bigquery.py +++ b/sdks/python/apache_beam/io/gcp/bigquery.py @@ -45,7 +45,7 @@ may use some caching techniques to share the side inputs between calls in order to avoid excessive reading::: - main_table = pipeline | 'VeryBig' >> beam.io.ReadFroBigQuery(...) + main_table = pipeline | 'VeryBig' >> beam.io.ReadFromBigQuery(...) side_table = pipeline | 'NotBig' >> beam.io.ReadFromBigQuery(...) results = ( main_table @@ -230,8 +230,8 @@ def compute_table_name(row): also take a callable that receives a table reference. -[1] https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#\ -configuration.load +[1] https://cloud.google.com/bigquery/docs/reference/rest/v2/Job\ + #jobconfigurationload [2] https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/insert [3] https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#resource @@ -269,8 +269,6 @@ def compute_table_name(row): # pytype: skip-file -from __future__ import absolute_import - import collections import itertools import json @@ -278,14 +276,9 @@ def compute_table_name(row): import random import time import uuid -from builtins import object -from builtins import zip from typing import Dict from typing import Union -from future.utils import itervalues -from past.builtins import unicode - import apache_beam as beam from apache_beam import coders from apache_beam import pvalue @@ -362,8 +355,6 @@ def compute_table_name(row): NOTE: This job name template does not have backwards compatibility guarantees. """ BQ_JOB_NAME_TEMPLATE = "beam_bq_job_{job_type}_{job_id}_{step_id}{random}" -"""The number of shards per destination when writing via streaming inserts.""" -DEFAULT_SHARDS_PER_DESTINATION = 500 @deprecated(since='2.11.0', current="bigquery_tools.parse_table_reference") @@ -439,7 +430,7 @@ def decode(self, encoded_table_row): od = json.loads( encoded_table_row, object_pairs_hook=collections.OrderedDict) return bigquery.TableRow( - f=[bigquery.TableCell(v=to_json_value(e)) for e in itervalues(od)]) + f=[bigquery.TableCell(v=to_json_value(e)) for e in od.values()]) class BigQueryDisposition(object): @@ -675,7 +666,7 @@ def __init__( self.query = None self.use_legacy_sql = True else: - if isinstance(query, (str, unicode)): + if isinstance(query, str): query = StaticValueProvider(str, query) self.query = query # TODO(BEAM-1082): Change the internal flag to be standard_sql @@ -723,7 +714,7 @@ def estimate_size(self): if (isinstance(self.table_reference, vp.ValueProvider) and self.table_reference.is_accessible()): table_ref = bigquery_tools.parse_table_reference( - table_ref, project=self._get_project()) + self.table_reference.get(), project=self._get_project()) elif isinstance(self.table_reference, vp.ValueProvider): # Size estimation is best effort. We return None as we have # no access to the table that we're querying. 
@@ -739,7 +730,7 @@ def estimate_size(self): self._job_name, self._source_uuid, bigquery_tools.BigQueryJobTypes.QUERY, - random.randint(0, 1000)) + '%s_%s' % (int(time.time()), random.randint(0, 1000))) job = bq._start_query_job( project, self.query.get(), @@ -803,7 +794,8 @@ def split(self, desired_bundle_size, start_position=None, stop_position=None): bq.clean_up_temporary_dataset(self._get_project()) for source in self.split_result: - yield SourceBundle(0, source, None, None) + yield SourceBundle( + weight=1.0, source=source, start_position=None, stop_position=None) def get_range_tracker(self, start_position, stop_position): class CustomBigQuerySourceRangeTracker(RangeTracker): @@ -821,6 +813,9 @@ def read(self, range_tracker): @check_accessible(['query']) def _setup_temporary_dataset(self, bq): + if self.temp_dataset: + # Temp dataset was provided by the user so we can just return. + return location = bq.get_query_location( self._get_project(), self.query.get(), self.use_legacy_sql) bq.create_temporary_dataset(self._get_project(), location) @@ -831,7 +826,7 @@ def _execute_query(self, bq): self._job_name, self._source_uuid, bigquery_tools.BigQueryJobTypes.QUERY, - random.randint(0, 1000)) + '%s_%s' % (int(time.time()), random.randint(0, 1000))) job = bq._start_query_job( self._get_project(), self.query.get(), @@ -857,7 +852,7 @@ def _export_files(self, bq): self._job_name, self._source_uuid, bigquery_tools.BigQueryJobTypes.EXPORT, - random.randint(0, 1000)) + '%s_%s' % (int(time.time()), random.randint(0, 1000))) temp_location = self.options.view_as(GoogleCloudOptions).temp_location gcs_location = bigquery_export_destination_uri( self.gcs_location, temp_location, self._source_uuid) @@ -986,7 +981,7 @@ def __init__( self.table_reference = bigquery_tools.parse_table_reference( table, dataset, project) # Transform the table schema into a bigquery.TableSchema instance. - if isinstance(schema, (str, unicode)): + if isinstance(schema, str): # TODO(silviuc): Should add a regex-based validation of the format. table_schema = bigquery.TableSchema() schema_list = [s.strip(' ') for s in schema.split(',')] @@ -1081,7 +1076,8 @@ def __init__( max_buffered_rows=None, retry_strategy=None, additional_bq_parameters=None, - ignore_insert_ids=False): + ignore_insert_ids=False, + with_batched_input=False): """Initialize a WriteToBigQuery transform. Args: @@ -1122,6 +1118,9 @@ def __init__( duplication of data inserted to BigQuery, set `ignore_insert_ids` to True to increase the throughput for BQ writing. See: https://cloud.google.com/bigquery/streaming-data-into-bigquery#disabling_best_effort_de-duplication + with_batched_input: Whether the input has already been batched per + destination. If not, perform best-effort batching per destination within + a bundle. 
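+        As a rough illustration (inferred from ``process``): with this flag
+        set, each element is expected to look like
+        ``(destination, [(row, insert_id), ...])``, whereas without it each
+        element is a single ``(destination, (row, insert_id))`` pair.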
""" self.schema = schema self.test_client = test_client @@ -1142,6 +1141,7 @@ def __init__( max_buffered_rows or BigQueryWriteFn.DEFAULT_MAX_BUFFERED_ROWS) self._retry_strategy = retry_strategy or RetryStrategy.RETRY_ALWAYS self.ignore_insert_ids = ignore_insert_ids + self.with_batched_input = with_batched_input self.additional_bq_parameters = additional_bq_parameters or {} @@ -1185,7 +1185,7 @@ def get_table_schema(schema): """ if schema is None: return schema - elif isinstance(schema, (str, unicode)): + elif isinstance(schema, str): return bigquery_tools.parse_table_schema_from_json(schema) elif isinstance(schema, dict): return bigquery_tools.parse_table_schema_from_json(json.dumps(schema)) @@ -1240,7 +1240,7 @@ def _create_table_if_needed(self, table_reference, schema=None): _KNOWN_TABLES.add(str_table_reference) def process(self, element, *schema_side_inputs): - destination = element[0] + destination = bigquery_tools.get_hashable_destination(element[0]) if callable(self.schema): schema = self.schema(destination, *schema_side_inputs) @@ -1252,15 +1252,19 @@ def process(self, element, *schema_side_inputs): self._create_table_if_needed( bigquery_tools.parse_table_reference(destination), schema) - destination = bigquery_tools.get_hashable_destination(destination) - - row_and_insert_id = element[1] - self._rows_buffer[destination].append(row_and_insert_id) - self._total_buffered_rows += 1 - if len(self._rows_buffer[destination]) >= self._max_batch_size: + if not self.with_batched_input: + row_and_insert_id = element[1] + self._rows_buffer[destination].append(row_and_insert_id) + self._total_buffered_rows += 1 + if len(self._rows_buffer[destination]) >= self._max_batch_size: + return self._flush_batch(destination) + elif self._total_buffered_rows >= self._max_buffered_rows: + return self._flush_all_batches() + else: + # The input is already batched per destination, flush the rows now. + batched_rows = element[1] + self._rows_buffer[destination].extend(batched_rows) return self._flush_batch(destination) - elif self._total_buffered_rows >= self._max_buffered_rows: - return self._flush_all_batches() def finish_bundle(self): bigquery_tools.BigQueryWrapper.HISTOGRAM_METRIC_LOGGER.log_metrics( @@ -1297,7 +1301,7 @@ def _flush_batch(self, destination): rows = [r[0] for r in rows_and_insert_ids] if self.ignore_insert_ids: - insert_ids = None + insert_ids = [None for r in rows_and_insert_ids] else: insert_ids = [r[1] for r in rows_and_insert_ids] @@ -1312,10 +1316,11 @@ def _flush_batch(self, destination): skip_invalid_rows=True) self.batch_latency_metric.update((time.time() - start) * 1000) - failed_rows = [rows[entry.index] for entry in errors] + failed_rows = [rows[entry['index']] for entry in errors] should_retry = any( RetryStrategy.should_retry( - self._retry_strategy, entry.errors[0].reason) for entry in errors) + self._retry_strategy, entry['errors'][0]['reason']) + for entry in errors) if not passed: self.failed_rows_metric.update(len(failed_rows)) message = ( @@ -1348,6 +1353,13 @@ def _flush_batch(self, destination): ] +# The number of shards per destination when writing via streaming inserts. +DEFAULT_SHARDS_PER_DESTINATION = 500 +# The max duration a batch of elements is allowed to be buffered before being +# flushed to BigQuery. 
+DEFAULT_BATCH_BUFFERING_DURATION_LIMIT_SEC = 0.2 + + class _StreamToBigQuery(PTransform): def __init__( self, @@ -1362,6 +1374,7 @@ def __init__( retry_strategy, additional_bq_parameters, ignore_insert_ids, + with_auto_sharding, test_client=None): self.table_reference = table_reference self.table_side_inputs = table_side_inputs @@ -1375,11 +1388,9 @@ def __init__( self.test_client = test_client self.additional_bq_parameters = additional_bq_parameters self.ignore_insert_ids = ignore_insert_ids + self.with_auto_sharding = with_auto_sharding class InsertIdPrefixFn(DoFn): - def __init__(self, shards=DEFAULT_SHARDS_PER_DESTINATION): - self.shards = shards - def start_bundle(self): self.prefix = str(uuid.uuid4()) self._row_count = 0 @@ -1387,8 +1398,6 @@ def start_bundle(self): def process(self, element): key = element[0] value = element[1] - key = (key, random.randint(0, self.shards)) - insert_id = '%s-%s' % (self.prefix, self._row_count) self._row_count += 1 yield (key, (value, insert_id)) @@ -1403,28 +1412,53 @@ def expand(self, input): retry_strategy=self.retry_strategy, test_client=self.test_client, additional_bq_parameters=self.additional_bq_parameters, - ignore_insert_ids=self.ignore_insert_ids) + ignore_insert_ids=self.ignore_insert_ids, + with_batched_input=self.with_auto_sharding) - def drop_shard(elms): - key_and_shard = elms[0] - key = key_and_shard[0] - value = elms[1] - return (key, value) + def _add_random_shard(element): + key = element[0] + value = element[1] + return ((key, random.randint(0, DEFAULT_SHARDS_PER_DESTINATION)), value) - sharded_data = ( + def _restore_table_ref(sharded_table_ref_elems_kv): + sharded_table_ref = sharded_table_ref_elems_kv[0] + table_ref = bigquery_tools.parse_table_reference(sharded_table_ref) + return (table_ref, sharded_table_ref_elems_kv[1]) + + tagged_data = ( input | 'AppendDestination' >> beam.ParDo( bigquery_tools.AppendDestinationsFn(self.table_reference), *self.table_side_inputs) - | 'AddInsertIdsWithRandomKeys' >> beam.ParDo( - _StreamToBigQuery.InsertIdPrefixFn())) - - if not self.ignore_insert_ids: - sharded_data = (sharded_data | 'CommitInsertIds' >> ReshufflePerKey()) + | 'AddInsertIds' >> beam.ParDo(_StreamToBigQuery.InsertIdPrefixFn()) + | + 'ToHashableTableRef' >> beam.Map(bigquery_tools.to_hashable_table_ref)) + + if not self.with_auto_sharding: + tagged_data = ( + tagged_data + | 'WithFixedSharding' >> beam.Map(_add_random_shard) + | 'CommitInsertIds' >> ReshufflePerKey() + | 'DropShard' >> beam.Map(lambda kv: (kv[0][0], kv[1]))) + else: + # Auto-sharding is achieved via GroupIntoBatches.WithShardedKey + # transform which shards, groups and at the same time batches the table + # rows to be inserted to BigQuery. + + # Firstly the keys of tagged_data (table references) are converted to a + # hashable format. This is needed to work with the keyed states used by + # GroupIntoBatches. After grouping and batching is done, original table + # references are restored. 
+ tagged_data = ( + tagged_data + | 'WithAutoSharding' >> beam.GroupIntoBatches.WithShardedKey( + (self.batch_size or BigQueryWriteFn.DEFAULT_MAX_BUFFERED_ROWS), + DEFAULT_BATCH_BUFFERING_DURATION_LIMIT_SEC) + | 'DropShard' >> beam.Map(lambda kv: (kv[0].key, kv[1]))) return ( - sharded_data - | 'DropShard' >> beam.Map(drop_shard) + tagged_data + | 'FromHashableTableRef' >> beam.Map(_restore_table_ref) | 'StreamInsertRows' >> ParDo( bigquery_write_fn, *self.schema_side_inputs).with_outputs( BigQueryWriteFn.FAILED_ROWS, main='main')) @@ -1468,20 +1502,20 @@ def __init__( triggering_frequency=None, validate=True, temp_file_format=None, - ignore_insert_ids=False): + ignore_insert_ids=False, + # TODO(BEAM-11857): Switch the default when the feature is mature. + with_auto_sharding=False): """Initialize a WriteToBigQuery transform. Args: table (str, callable, ValueProvider): The ID of the table, or a callable that returns it. The ID must contain only letters ``a-z``, ``A-Z``, - numbers ``0-9``, or underscores ``_``. If dataset argument is + numbers ``0-9``, or connectors ``-_``. If dataset argument is :data:`None` then the table argument must contain the entire table reference specified as: ``'DATASET.TABLE'`` or ``'PROJECT:DATASET.TABLE'``. If it's a callable, it must receive one argument representing an element to be written to BigQuery, and return a TableReference, or a string table name as specified above. - Multiple destinations are only supported on Batch pipelines at the - moment. dataset (str): The ID of the dataset containing this table or :data:`None` if the table reference is specified entirely by the table argument. @@ -1500,7 +1534,7 @@ def __init__( fields, repeated fields, or specifying a BigQuery mode for fields (mode will always be set to ``'NULLABLE'``). If a callable, then it should receive a destination (in the form of - a TableReference or a string, and return a str, dict or TableSchema. + a str, and return a str, dict or TableSchema). One may also pass ``SCHEMA_AUTODETECT`` here when using JSON-based file loads, and BigQuery will try to infer the schema for the files that are being loaded. @@ -1525,7 +1559,6 @@ def __init__( tables. batch_size (int): Number of rows to be written to BQ per streaming API insert. The default is 500. - insert. test_client: Override the default bigquery client used for testing. max_file_size (int): The maximum size for a file to be written and then loaded into BigQuery. The default value is 4TB, which is 80% of the @@ -1560,11 +1593,13 @@ def __init__( rows with transient errors (e.g. timeouts). Rows with permanent errors will be output to dead letter queue under `'FailedRows'` tag. - additional_bq_parameters (callable): A function that returns a dictionary - with additional parameters to pass to BQ when creating / loading data - into a table. These can be 'timePartitioning', 'clustering', etc. They - are passed directly to the job load configuration. See - https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.load + additional_bq_parameters (dict, callable): Additional parameters to pass + to BQ when creating / loading data into a table. If a callable, it + should be a function that receives a table reference indicating + the destination and returns a dictionary. + These can be 'timePartitioning', 'clustering', etc. They are passed + directly to the job load configuration. 
See + https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobconfigurationload table_side_inputs (tuple): A tuple with ``AsSideInput`` PCollections to be passed to the table callable (if one is provided). schema_side_inputs: A tuple with ``AsSideInput`` PCollections to be @@ -1592,6 +1627,10 @@ def __init__( duplication of data inserted to BigQuery, set `ignore_insert_ids` to True to increase the throughput for BQ writing. See: https://cloud.google.com/bigquery/streaming-data-into-bigquery#disabling_best_effort_de-duplication + with_auto_sharding: Experimental. If true, enables using a dynamically + determined number of shards to write to BigQuery. This can be used for + both FILE_LOADS and STREAMING_INSERTS. Only applicable to unbounded + input. """ self._table = table self._dataset = dataset @@ -1616,6 +1655,7 @@ def __init__( self.max_files_per_bundle = max_files_per_bundle self.method = method or WriteToBigQuery.Method.DEFAULT self.triggering_frequency = triggering_frequency + self.with_auto_sharding = with_auto_sharding self.insert_retry_strategy = insert_retry_strategy self._validate = validate self._temp_file_format = temp_file_format or bigquery_tools.FileFormat.JSON @@ -1650,10 +1690,14 @@ def expand(self, pcoll): self.table_reference.projectId = pcoll.pipeline.options.view_as( GoogleCloudOptions).project - experiments = p.options.view_as(DebugOptions).experiments or [] # TODO(pabloem): Use a different method to determine if streaming or batch. is_streaming_pipeline = p.options.view_as(StandardOptions).streaming + if not is_streaming_pipeline and self.with_auto_sharding: + raise ValueError( + 'with_auto_sharding is not applicable to batch pipelines.') + + experiments = p.options.view_as(DebugOptions).experiments or [] method_to_use = self._compute_method(experiments, is_streaming_pipeline) if method_to_use == WriteToBigQuery.Method.STREAMING_INSERTS: @@ -1668,17 +1712,18 @@ def expand(self, pcoll): 'FILE_LOADS method of writing to BigQuery.') outputs = pcoll | _StreamToBigQuery( - self.table_reference, - self.table_side_inputs, - self.schema_side_inputs, - self.schema, - self.batch_size, - self.create_disposition, - self.write_disposition, - self.kms_key, - self.insert_retry_strategy, - self.additional_bq_parameters, - self._ignore_insert_ids, + table_reference=self.table_reference, + table_side_inputs=self.table_side_inputs, + schema_side_inputs=self.schema_side_inputs, + schema=self.schema, + batch_size=self.batch_size, + create_disposition=self.create_disposition, + write_disposition=self.write_disposition, + kms_key=self.kms_key, + retry_strategy=self.insert_retry_strategy, + additional_bq_parameters=self.additional_bq_parameters, + ignore_insert_ids=self._ignore_insert_ids, + with_auto_sharding=self.with_auto_sharding, test_client=self.test_client) return {BigQueryWriteFn.FAILED_ROWS: outputs[BigQueryWriteFn.FAILED_ROWS]} @@ -1702,6 +1747,7 @@ def expand(self, pcoll): create_disposition=self.create_disposition, write_disposition=self.write_disposition, triggering_frequency=self.triggering_frequency, + with_auto_sharding=self.with_auto_sharding, temp_file_format=self._temp_file_format, max_file_size=self.max_file_size, max_files_per_bundle=self.max_files_per_bundle, @@ -1715,7 +1761,8 @@ def expand(self, pcoll): def display_data(self): res = {} - if self.table_reference is not None: + if self.table_reference is not None and isinstance(self.table_reference, + TableReference): tableSpec = '{}.{}'.format( self.table_reference.datasetId, self.table_reference.tableId) if 
self.table_reference.projectId is not None: @@ -1760,6 +1807,8 @@ def serialize(side_inputs): 'triggering_frequency': self.triggering_frequency, 'validate': self._validate, 'temp_file_format': self._temp_file_format, + 'ignore_insert_ids': self._ignore_insert_ids, + 'with_auto_sharding': self.with_auto_sharding, } return 'beam:transform:write_to_big_query:v0', pickler.dumps(config) @@ -1856,28 +1905,35 @@ class ReadFromBigQuery(PTransform): To learn more about type conversions between BigQuery and Avro, see: https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro\ #avro_conversions + temp_dataset (``apache_beam.io.gcp.internal.clients.bigquery.\ + DatasetReference``): + Temporary dataset reference to use when reading from BigQuery using a + query. When reading using a query, BigQuery source will create a + temporary dataset and a temporary table to store the results of the + query. With this option, you can set an existing dataset to create the + temporary table in. BigQuery source will create a temporary table in + that dataset, and will remove it once it is not needed. Job needs access + to create and delete tables within the given dataset. Dataset name + should *not* start with the reserved prefix `beam_temp_dataset_`. """ COUNTER = 0 def __init__(self, gcs_location=None, *args, **kwargs): if gcs_location: - if not isinstance(gcs_location, (str, unicode, ValueProvider)): + if not isinstance(gcs_location, (str, ValueProvider)): raise TypeError( '%s: gcs_location must be of type string' ' or ValueProvider; got %r instead' % (self.__class__.__name__, type(gcs_location))) - - if isinstance(gcs_location, (str, unicode)): + if isinstance(gcs_location, str): gcs_location = StaticValueProvider(str, gcs_location) self.gcs_location = gcs_location - self._args = args self._kwargs = kwargs def expand(self, pcoll): - # TODO(BEAM-11115): Make ReadFromBQ rely on ReadAllFromBQ implementation. temp_location = pcoll.pipeline.options.view_as( GoogleCloudOptions).temp_location job_name = pcoll.pipeline.options.view_as(GoogleCloudOptions).job_name @@ -2038,5 +2094,5 @@ def expand(self, pcoll): return ( sources_to_read - | SDFBoundedSourceReader() + | SDFBoundedSourceReader(data_to_display=self.display_data()) | _PassThroughThenCleanup(beam.pvalue.AsIter(cleanup_locations))) diff --git a/sdks/python/apache_beam/io/gcp/bigquery_avro_tools.py b/sdks/python/apache_beam/io/gcp/bigquery_avro_tools.py index f726b1139d55..85f600459056 100644 --- a/sdks/python/apache_beam/io/gcp/bigquery_avro_tools.py +++ b/sdks/python/apache_beam/io/gcp/bigquery_avro_tools.py @@ -23,9 +23,6 @@ NOTHING IN THIS FILE HAS BACKWARDS COMPATIBILITY GUARANTEES. """ -from __future__ import absolute_import -from __future__ import division - # BigQuery types as listed in # https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types # with aliases (RECORD, BOOLEAN, FLOAT, INTEGER) as defined in diff --git a/sdks/python/apache_beam/io/gcp/bigquery_avro_tools_test.py b/sdks/python/apache_beam/io/gcp/bigquery_avro_tools_test.py index 229dde01829e..ab4eb09b8654 100644 --- a/sdks/python/apache_beam/io/gcp/bigquery_avro_tools_test.py +++ b/sdks/python/apache_beam/io/gcp/bigquery_avro_tools_test.py @@ -14,25 +14,18 @@ # See the License for the specific language governing permissions and # limitations under the License. 
# -from __future__ import absolute_import -from __future__ import division - import json import logging import unittest import fastavro +from avro.schema import Parse from apache_beam.io.gcp import bigquery_avro_tools from apache_beam.io.gcp import bigquery_tools from apache_beam.io.gcp.bigquery_test import HttpError from apache_beam.io.gcp.internal.clients import bigquery -try: - from avro.schema import Parse # avro-python3 library for Python 3 -except ImportError: - from avro.schema import parse as Parse # avro library for Python 2 - @unittest.skipIf(HttpError is None, 'GCP dependencies are not installed') class TestBigQueryToAvroSchema(unittest.TestCase): @@ -90,30 +83,33 @@ def test_convert_bigquery_schema_to_avro_schema(self): # Test that schema can be parsed correctly by avro parsed_schema = Parse(json.dumps(avro_schema)) - # Avro RecordSchema provides field_map in py3 and fields_dict in py2 - field_map = getattr(parsed_schema, "field_map", None) or \ - getattr(parsed_schema, "fields_dict", None) - self.assertEqual(field_map["number"].type, Parse(json.dumps("long"))) self.assertEqual( - field_map["species"].type, Parse(json.dumps(["null", "string"]))) + parsed_schema.field_map["number"].type, Parse(json.dumps("long"))) self.assertEqual( - field_map["quality"].type, Parse(json.dumps(["null", "double"]))) + parsed_schema.field_map["species"].type, + Parse(json.dumps(["null", "string"]))) self.assertEqual( - field_map["grade"].type, Parse(json.dumps(["null", "double"]))) + parsed_schema.field_map["quality"].type, + Parse(json.dumps(["null", "double"]))) self.assertEqual( - field_map["quantity"].type, Parse(json.dumps(["null", "long"]))) + parsed_schema.field_map["grade"].type, + Parse(json.dumps(["null", "double"]))) self.assertEqual( - field_map["dependents"].type, Parse(json.dumps(["null", "long"]))) + parsed_schema.field_map["quantity"].type, + Parse(json.dumps(["null", "long"]))) self.assertEqual( - field_map["birthday"].type, + parsed_schema.field_map["dependents"].type, + Parse(json.dumps(["null", "long"]))) + self.assertEqual( + parsed_schema.field_map["birthday"].type, Parse( json.dumps( ["null", { "type": "long", "logicalType": "timestamp-micros" }]))) self.assertEqual( - field_map["birthdayMoney"].type, + parsed_schema.field_map["birthdayMoney"].type, Parse( json.dumps([ "null", @@ -125,31 +121,35 @@ def test_convert_bigquery_schema_to_avro_schema(self): } ]))) self.assertEqual( - field_map["flighted"].type, Parse(json.dumps(["null", "boolean"]))) + parsed_schema.field_map["flighted"].type, + Parse(json.dumps(["null", "boolean"]))) self.assertEqual( - field_map["flighted2"].type, Parse(json.dumps(["null", "boolean"]))) + parsed_schema.field_map["flighted2"].type, + Parse(json.dumps(["null", "boolean"]))) self.assertEqual( - field_map["sound"].type, Parse(json.dumps(["null", "bytes"]))) + parsed_schema.field_map["sound"].type, + Parse(json.dumps(["null", "bytes"]))) self.assertEqual( - field_map["anniversaryDate"].type, + parsed_schema.field_map["anniversaryDate"].type, Parse(json.dumps(["null", { "type": "int", "logicalType": "date" }]))) self.assertEqual( - field_map["anniversaryDatetime"].type, + parsed_schema.field_map["anniversaryDatetime"].type, Parse(json.dumps(["null", "string"]))) self.assertEqual( - field_map["anniversaryTime"].type, + parsed_schema.field_map["anniversaryTime"].type, Parse( json.dumps(["null", { "type": "long", "logicalType": "time-micros" }]))) self.assertEqual( - field_map["geoPositions"].type, Parse(json.dumps(["null", "string"]))) + 
parsed_schema.field_map["geoPositions"].type, + Parse(json.dumps(["null", "string"]))) for field in ("scion", "family"): self.assertEqual( - field_map[field].type, + parsed_schema.field_map[field].type, Parse( json.dumps([ "null", @@ -169,7 +169,7 @@ def test_convert_bigquery_schema_to_avro_schema(self): ]))) self.assertEqual( - field_map["associates"].type, + parsed_schema.field_map["associates"].type, Parse( json.dumps({ "type": "array", diff --git a/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py b/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py index 0342b4c84b70..e211ab4ebf8d 100644 --- a/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py +++ b/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py @@ -28,15 +28,13 @@ # pytype: skip-file -from __future__ import absolute_import - import hashlib +import io import logging import random +import time import uuid -from future.utils import iteritems - import apache_beam as beam from apache_beam import pvalue from apache_beam.io import filesystems as fs @@ -46,8 +44,16 @@ from apache_beam.options.pipeline_options import GoogleCloudOptions from apache_beam.transforms import trigger from apache_beam.transforms.display import DisplayDataItem +from apache_beam.transforms.util import GroupIntoBatches from apache_beam.transforms.window import GlobalWindows +# Protect against environments where bigquery library is not available. +# pylint: disable=wrong-import-order, wrong-import-position +try: + from apitools.base.py.exceptions import HttpError +except ImportError: + pass + _LOGGER = logging.getLogger(__name__) ONE_TERABYTE = (1 << 40) @@ -67,6 +73,10 @@ # this many records are written. _FILE_TRIGGERING_RECORD_COUNT = 500000 +# If using auto-sharding for unbounded data, we batch the records before +# triggering file write to avoid generating too many small files. +_FILE_TRIGGERING_BATCHING_DURATION_SECS = 1 + def _generate_job_name(job_name, job_type, step_name): return bigquery_tools.generate_bq_job_name( @@ -253,11 +263,11 @@ def process(self, element, file_prefix, *schema_side_inputs): self._destination_to_file_writer.pop(destination) yield pvalue.TaggedOutput( WriteRecordsToFile.WRITTEN_FILE_TAG, - (element[0], (file_path, file_size))) + (destination, (file_path, file_size))) def finish_bundle(self): for destination, file_path_writer in \ - iteritems(self._destination_to_file_writer): + self._destination_to_file_writer.items(): (file_path, writer) = file_path_writer file_size = writer.tell() writer.close() @@ -284,7 +294,7 @@ def __init__( self.file_format = file_format or bigquery_tools.FileFormat.JSON def process(self, element, file_prefix, *schema_side_inputs): - destination = element[0] + destination = bigquery_tools.get_hashable_destination(element[0]) rows = element[1] file_path, writer = None, None @@ -310,6 +320,129 @@ def process(self, element, file_prefix, *schema_side_inputs): yield (destination, (file_path, file_size)) +class UpdateDestinationSchema(beam.DoFn): + """Update destination schema based on data that is about to be copied into it. + + Unlike load and query jobs, BigQuery copy jobs do not support schema field + addition or relaxation on the destination table. This DoFn fills that gap by + updating the destination table schemas to be compatible with the data coming + from the source table so that schema field modification options are respected + regardless of whether data is loaded directly to the destination table or + loaded into temporary tables before being copied into the destination. 
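+
+  For example, a load configured with additional BigQuery parameters along the
+  lines of ``{'schemaUpdateOptions': ['ALLOW_FIELD_ADDITION']}`` (an
+  illustrative value) is the situation this DoFn is meant to handle.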
+
+  This transform takes as input a (destination, job_reference) pair where the
+  job_reference refers to a completed load job into a temporary table.
+
+  This transform emits (destination, job_reference) pairs where the
+  job_reference refers to a submitted load job for performing the schema
+  modification. Note that the input and output job references are not the same.
+
+  Experimental; no backwards compatibility guarantees.
+  """
+  def __init__(
+      self,
+      write_disposition=None,
+      test_client=None,
+      additional_bq_parameters=None,
+      step_name=None,
+      source_format=None):
+    self._test_client = test_client
+    self._write_disposition = write_disposition
+    self._additional_bq_parameters = additional_bq_parameters or {}
+    self._step_name = step_name
+    self._source_format = source_format
+
+  def setup(self):
+    self._bq_wrapper = bigquery_tools.BigQueryWrapper(client=self._test_client)
+    self._bq_io_metadata = create_bigquery_io_metadata(self._step_name)
+
+  def display_data(self):
+    return {
+        'write_disposition': str(self._write_disposition),
+        'additional_bq_params': str(self._additional_bq_parameters),
+        'source_format': str(self._source_format),
+    }
+
+  def process(self, element, schema_mod_job_name_prefix):
+    destination = element[0]
+    temp_table_load_job_reference = element[1]
+
+    if callable(self._additional_bq_parameters):
+      additional_parameters = self._additional_bq_parameters(destination)
+    elif isinstance(self._additional_bq_parameters, vp.ValueProvider):
+      additional_parameters = self._additional_bq_parameters.get()
+    else:
+      additional_parameters = self._additional_bq_parameters
+
+    # When writing to normal tables WRITE_TRUNCATE will overwrite the schema but
+    # when writing to a partition, care needs to be taken to update the schema
+    # even on WRITE_TRUNCATE.
+    if (self._write_disposition not in ('WRITE_TRUNCATE', 'WRITE_APPEND') or
+        not additional_parameters or
+        not additional_parameters.get("schemaUpdateOptions")):
+      # No need to modify schema of destination table
+      return
+
+    table_reference = bigquery_tools.parse_table_reference(destination)
+    if table_reference.projectId is None:
+      table_reference.projectId = vp.RuntimeValueProvider.get_value(
+          'project', str, '')
+
+    try:
+      # Check if destination table exists
+      destination_table = self._bq_wrapper.get_table(
+          project_id=table_reference.projectId,
+          dataset_id=table_reference.datasetId,
+          table_id=table_reference.tableId)
+    except HttpError as exn:
+      if exn.status_code == 404:
+        # Destination table does not exist, so no need to modify its schema
+        # ahead of the copy jobs.
+        return
+      else:
+        raise
+
+    temp_table_load_job = self._bq_wrapper.get_job(
+        project=temp_table_load_job_reference.projectId,
+        job_id=temp_table_load_job_reference.jobId,
+        location=temp_table_load_job_reference.location)
+    temp_table_schema = temp_table_load_job.configuration.load.schema
+
+    if bigquery_tools.check_schema_equal(temp_table_schema,
+                                         destination_table.schema,
+                                         ignore_descriptions=True,
+                                         ignore_field_order=True):
+      # Destination table schema is already the same as the temp table schema,
+      # so no need to run a job to update the destination table schema.
+ return + + destination_hash = _bq_uuid( + '%s:%s.%s' % ( + table_reference.projectId, + table_reference.datasetId, + table_reference.tableId)) + uid = _bq_uuid() + job_name = '%s_%s_%s' % (schema_mod_job_name_prefix, destination_hash, uid) + + _LOGGER.info( + 'Triggering schema modification job %s on %s', + job_name, + table_reference) + # Trigger potential schema modification by loading zero rows into the + # destination table with the temporary table schema. + schema_update_job_reference = self._bq_wrapper.perform_load_job( + destination=table_reference, + source_stream=io.BytesIO(), # file with zero rows + job_id=job_name, + schema=temp_table_schema, + write_disposition='WRITE_APPEND', + create_disposition='CREATE_NEVER', + additional_load_parameters=additional_parameters, + job_labels=self._bq_io_metadata.add_additional_bq_job_labels(), + source_format=self._source_format) + yield (destination, schema_update_job_reference) + + class TriggerCopyJobs(beam.DoFn): """Launches jobs to copy from temporary tables into the main target table. @@ -349,7 +482,7 @@ def start_bundle(self): if not self.bq_io_metadata: self.bq_io_metadata = create_bigquery_io_metadata(self._step_name) - def process(self, element, job_name_prefix=None): + def process(self, element, job_name_prefix=None, unused_schema_mod_jobs=None): destination = element[0] job_reference = element[1] @@ -450,7 +583,8 @@ def display_data(self): 'additional_bq_params': str(self.additional_bq_parameters), 'schema': str(self.schema), 'launchesBigQueryJobs': DisplayDataItem( - True, label="This Dataflow job launches bigquery jobs.") + True, label="This Dataflow job launches bigquery jobs."), + 'source_format': str(self.source_format), } return result @@ -496,8 +630,7 @@ def process(self, element, load_job_name_prefix, *schema_side_inputs): table_reference.tableId)) uid = _bq_uuid() job_name = '%s_%s_%s' % (load_job_name_prefix, destination_hash, uid) - _LOGGER.debug( - 'Load job has %s files. Job name is %s.', len(files), job_name) + _LOGGER.info('Load job has %s files. Job name is %s.', len(files), job_name) create_disposition = self.create_disposition if self.temporary_tables: @@ -506,21 +639,25 @@ def process(self, element, load_job_name_prefix, *schema_side_inputs): create_disposition = 'CREATE_IF_NEEDED' # For temporary tables, we create a new table with the name with JobId. table_reference.tableId = job_name - yield pvalue.TaggedOutput(TriggerLoadJobs.TEMP_TABLES, table_reference) + yield pvalue.TaggedOutput( + TriggerLoadJobs.TEMP_TABLES, + bigquery_tools.get_hashable_destination(table_reference)) _LOGGER.info( 'Triggering job %s to load data to BigQuery table %s.' - 'Schema: %s. Additional parameters: %s', + 'Schema: %s. Additional parameters: %s. 
Source format: %s',
         job_name,
         table_reference,
         schema,
-        additional_parameters)
+        additional_parameters,
+        self.source_format,
+    )
     if not self.bq_io_metadata:
       self.bq_io_metadata = create_bigquery_io_metadata(self._step_name)
     job_reference = self.bq_wrapper.perform_load_job(
-        table_reference,
-        files,
-        job_name,
+        destination=table_reference,
+        source_uris=files,
+        job_id=job_name,
         schema=schema,
         write_disposition=self.write_disposition,
         create_disposition=create_disposition,
@@ -641,6 +778,7 @@ def __init__(
       create_disposition=None,
       write_disposition=None,
       triggering_frequency=None,
+      with_auto_sharding=False,
       temp_file_format=None,
       max_file_size=None,
       max_files_per_bundle=None,
@@ -656,6 +794,7 @@ def __init__(
     self.create_disposition = create_disposition
     self.write_disposition = write_disposition
     self.triggering_frequency = triggering_frequency
+    self.with_auto_sharding = with_auto_sharding
     self.max_file_size = max_file_size or _DEFAULT_MAX_FILE_SIZE
     self.max_files_per_bundle = (
         max_files_per_bundle or _DEFAULT_MAX_WRITERS_PER_BUNDLE)
@@ -708,6 +847,9 @@ def verify(self):
       raise ValueError(
           'triggering_frequency can only be used with file'
           'loads in streaming')
+    if not self.is_streaming_pipeline and self.with_auto_sharding:
+      raise ValueError(
+          'with_auto_sharding can only be used with file loads in streaming.')
 
   def _window_fn(self):
     """Set the correct WindowInto PTransform"""
@@ -719,7 +861,12 @@ def _window_fn(self):
     # that the files are written if a threshold number of records are ready.
     # We use only the user-supplied trigger on the actual BigQuery load.
     # This allows us to offload the data to the filesystem.
-    if self.is_streaming_pipeline:
+    #
+    # In the case of dynamic sharding, however, we use a default trigger since
+    # the transform that performs sharding also batches elements to avoid
+    # generating too many tiny files. User trigger is applied right after
+    # writes to limit the number of load jobs.
+ if self.is_streaming_pipeline and not self.with_auto_sharding: return beam.WindowInto(beam.window.GlobalWindows(), trigger=trigger.Repeatedly( trigger.AfterAny( @@ -732,6 +879,21 @@ def _window_fn(self): else: return beam.WindowInto(beam.window.GlobalWindows()) + def _maybe_apply_user_trigger(self, destination_file_kv_pc): + if self.is_streaming_pipeline: + # Apply the user's trigger back before we start triggering load jobs + return ( + destination_file_kv_pc + | "ApplyUserTrigger" >> beam.WindowInto( + beam.window.GlobalWindows(), + trigger=trigger.Repeatedly( + trigger.AfterAll( + trigger.AfterProcessingTime(self.triggering_frequency), + trigger.AfterCount(1))), + accumulation_mode=trigger.AccumulationMode.DISCARDING)) + else: + return destination_file_kv_pc + def _write_files(self, destination_data_kv_pc, file_prefix_pcv): outputs = ( destination_data_kv_pc @@ -774,27 +936,47 @@ def _write_files(self, destination_data_kv_pc, file_prefix_pcv): (destination_files_kv_pc, more_destination_files_kv_pc) | "DestinationFilesUnion" >> beam.Flatten() | "IdentityWorkaround" >> beam.Map(lambda x: x)) + return self._maybe_apply_user_trigger(all_destination_file_pairs_pc) - if self.is_streaming_pipeline: - # Apply the user's trigger back before we start triggering load jobs - all_destination_file_pairs_pc = ( - all_destination_file_pairs_pc - | "ApplyUserTrigger" >> beam.WindowInto( - beam.window.GlobalWindows(), - trigger=trigger.Repeatedly( - trigger.AfterAll( - trigger.AfterProcessingTime(self.triggering_frequency), - trigger.AfterCount(1))), - accumulation_mode=trigger.AccumulationMode.DISCARDING)) - return all_destination_file_pairs_pc + def _write_files_with_auto_sharding( + self, destination_data_kv_pc, file_prefix_pcv): + clock = self.test_client.test_clock if self.test_client else time.time + + # Auto-sharding is achieved via GroupIntoBatches.WithShardedKey + # transform which shards, groups and at the same time batches the table rows + # to be inserted to BigQuery. + + # Firstly, the keys of tagged_data (table references) are converted to a + # hashable format. This is needed to work with the keyed states used by. + # GroupIntoBatches. After grouping and batching is done, table references + # are restored. 
+ destination_files_kv_pc = ( + destination_data_kv_pc + | + 'ToHashableTableRef' >> beam.Map(bigquery_tools.to_hashable_table_ref) + | 'WithAutoSharding' >> GroupIntoBatches.WithShardedKey( + batch_size=_FILE_TRIGGERING_RECORD_COUNT, + max_buffering_duration_secs=_FILE_TRIGGERING_BATCHING_DURATION_SECS, + clock=clock) + | 'FromHashableTableRefAndDropShard' >> beam.Map( + lambda kvs: + (bigquery_tools.parse_table_reference(kvs[0].key), kvs[1])) + | beam.ParDo( + WriteGroupedRecordsToFile( + schema=self.schema, file_format=self._temp_file_format), + file_prefix_pcv, + *self.schema_side_inputs)) + + return self._maybe_apply_user_trigger(destination_files_kv_pc) def _load_data( self, partitions_using_temp_tables, partitions_direct_to_destination, load_job_name_pcv, + schema_mod_job_name_pcv, copy_job_name_pcv, - singleton_pc, + p, step_name): """Load data to BigQuery @@ -831,34 +1013,62 @@ def _load_data( temp_tables_load_job_ids_pc = trigger_loads_outputs['main'] temp_tables_pc = trigger_loads_outputs[TriggerLoadJobs.TEMP_TABLES] - destination_copy_job_ids_pc = ( - singleton_pc + finished_temp_tables_load_jobs_pc = ( + p + | "ImpulseMonitorLoadJobs" >> beam.Create([None]) | "WaitForTempTableLoadJobs" >> beam.ParDo( WaitForBQJobs(self.test_client), - beam.pvalue.AsList(temp_tables_load_job_ids_pc)) + pvalue.AsList(temp_tables_load_job_ids_pc))) + + schema_mod_job_ids_pc = ( + finished_temp_tables_load_jobs_pc + | beam.ParDo( + UpdateDestinationSchema( + write_disposition=self.write_disposition, + test_client=self.test_client, + additional_bq_parameters=self.additional_bq_parameters, + step_name=step_name, + source_format=self._temp_file_format, + ), + schema_mod_job_name_pcv)) + + finished_schema_mod_jobs_pc = ( + p + | "ImpulseMonitorSchemaModJobs" >> beam.Create([None]) + | "WaitForSchemaModJobs" >> beam.ParDo( + WaitForBQJobs(self.test_client), + pvalue.AsList(schema_mod_job_ids_pc))) + + destination_copy_job_ids_pc = ( + finished_temp_tables_load_jobs_pc | beam.ParDo( TriggerCopyJobs( create_disposition=self.create_disposition, write_disposition=self.write_disposition, test_client=self.test_client, step_name=step_name), - copy_job_name_pcv)) + copy_job_name_pcv, + pvalue.AsIter(finished_schema_mod_jobs_pc))) finished_copy_jobs_pc = ( - singleton_pc + p + | "ImpulseMonitorCopyJobs" >> beam.Create([None]) | "WaitForCopyJobs" >> beam.ParDo( WaitForBQJobs(self.test_client), - beam.pvalue.AsList(destination_copy_job_ids_pc))) + pvalue.AsList(destination_copy_job_ids_pc))) _ = ( - finished_copy_jobs_pc + p + | "RemoveTempTables/Impulse" >> beam.Create([None]) | "RemoveTempTables/PassTables" >> beam.FlatMap( - lambda x, + lambda _, + unused_copy_jobs, deleting_tables: deleting_tables, + pvalue.AsIter(finished_copy_jobs_pc), pvalue.AsIter(temp_tables_pc)) | "RemoveTempTables/AddUselessValue" >> beam.Map(lambda x: (x, None)) | "RemoveTempTables/DeduplicateTables" >> beam.GroupByKey() - | "RemoveTempTables/GetTableNames" >> beam.Map(lambda elm: elm[0]) + | "RemoveTempTables/GetTableNames" >> beam.Keys() | "RemoveTempTables/Delete" >> beam.ParDo( DeleteTablesFn(self.test_client))) @@ -879,10 +1089,11 @@ def _load_data( *self.schema_side_inputs)) _ = ( - singleton_pc + p + | "ImpulseMonitorDestinationLoadJobs" >> beam.Create([None]) | "WaitForDestinationLoadJobs" >> beam.ParDo( WaitForBQJobs(self.test_client), - beam.pvalue.AsList(destination_load_job_ids_pc))) + pvalue.AsList(destination_load_job_ids_pc))) destination_load_job_ids_pc = ( (temp_tables_load_job_ids_pc, destination_load_job_ids_pc) @@ 
-911,6 +1122,14 @@ def expand(self, pcoll): lambda _: _generate_job_name( job_name, bigquery_tools.BigQueryJobTypes.LOAD, 'LOAD_STEP'))) + schema_mod_job_name_pcv = pvalue.AsSingleton( + singleton_pc + | "SchemaModJobNamePrefix" >> beam.Map( + lambda _: _generate_job_name( + job_name, + bigquery_tools.BigQueryJobTypes.LOAD, + 'SCHEMA_MOD_STEP'))) + copy_job_name_pcv = pvalue.AsSingleton( singleton_pc | "CopyJobNamePrefix" >> beam.Map( @@ -930,8 +1149,12 @@ def expand(self, pcoll): bigquery_tools.AppendDestinationsFn(self.destination), *self.table_side_inputs)) - all_destination_file_pairs_pc = self._write_files( - destination_data_kv_pc, file_prefix_pcv) + if not self.with_auto_sharding: + all_destination_file_pairs_pc = self._write_files( + destination_data_kv_pc, file_prefix_pcv) + else: + all_destination_file_pairs_pc = self._write_files_with_auto_sharding( + destination_data_kv_pc, file_prefix_pcv) grouped_files_pc = ( all_destination_file_pairs_pc @@ -963,15 +1186,19 @@ def expand(self, pcoll): self._load_data(all_partitions, empty_pc, load_job_name_pcv, + schema_mod_job_name_pcv, copy_job_name_pcv, - singleton_pc, + p, step_name)) else: destination_load_job_ids_pc, destination_copy_job_ids_pc = ( self._load_data(multiple_partitions_per_destination_pc, - single_partition_per_destination_pc, - load_job_name_pcv, copy_job_name_pcv, singleton_pc, - step_name)) + single_partition_per_destination_pc, + load_job_name_pcv, + schema_mod_job_name_pcv, + copy_job_name_pcv, + p, + step_name)) return { self.DESTINATION_JOBID_PAIRS: destination_load_job_ids_pc, diff --git a/sdks/python/apache_beam/io/gcp/bigquery_file_loads_test.py b/sdks/python/apache_beam/io/gcp/bigquery_file_loads_test.py index fca7d9c94bd2..1e227f8b00fe 100644 --- a/sdks/python/apache_beam/io/gcp/bigquery_file_loads_test.py +++ b/sdks/python/apache_beam/io/gcp/bigquery_file_loads_test.py @@ -19,20 +19,17 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import os import random -import sys import time import unittest import mock +import pytest from hamcrest.core import assert_that as hamcrest_assert from hamcrest.core.core.allof import all_of from hamcrest.core.core.is_ import is_ -from nose.plugins.attrib import attr from parameterized import param from parameterized import parameterized @@ -44,6 +41,8 @@ from apache_beam.io.gcp.internal.clients import bigquery as bigquery_api from apache_beam.io.gcp.tests.bigquery_matcher import BigqueryFullResultMatcher from apache_beam.io.gcp.tests.bigquery_matcher import BigqueryFullResultStreamingMatcher +from apache_beam.options.pipeline_options import PipelineOptions +from apache_beam.options.pipeline_options import StandardOptions from apache_beam.runners.dataflow.test_dataflow_runner import TestDataflowRunner from apache_beam.runners.runner import PipelineState from apache_beam.testing.pipeline_verifiers import PipelineStateMatcher @@ -52,7 +51,9 @@ from apache_beam.testing.util import assert_that from apache_beam.testing.util import equal_to from apache_beam.transforms import combiners +from apache_beam.transforms.window import TimestampedValue from apache_beam.typehints.typehints import Tuple +from apache_beam.utils import timestamp try: from apitools.base.py.exceptions import HttpError @@ -458,7 +459,6 @@ def test_records_traverse_transform_with_mocks(self): assert_that(jobs, equal_to([job_reference]), label='CheckJobs') - @unittest.skipIf(sys.version_info[0] == 2, 'Mock pickling problems in Py 2') @mock.patch('time.sleep') def 
test_wait_for_job_completion(self, sleep_mock): job_references = [bigquery_api.JobReference(), bigquery_api.JobReference()] @@ -494,7 +494,6 @@ def test_wait_for_job_completion(self, sleep_mock): sleep_mock.assert_called_once() - @unittest.skipIf(sys.version_info[0] == 2, 'Mock pickling problems in Py 2') @mock.patch('time.sleep') def test_one_job_failed_after_waiting(self, sleep_mock): job_references = [bigquery_api.JobReference(), bigquery_api.JobReference()] @@ -603,6 +602,115 @@ def test_multiple_partition_files(self): equal_to([6]), label='CheckCopyJobCount') + @parameterized.expand([ + param(is_streaming=False, with_auto_sharding=False), + param(is_streaming=True, with_auto_sharding=False), + param(is_streaming=True, with_auto_sharding=True), + ]) + def test_triggering_frequency(self, is_streaming, with_auto_sharding): + destination = 'project1:dataset1.table1' + + job_reference = bigquery_api.JobReference() + job_reference.projectId = 'project1' + job_reference.jobId = 'job_name1' + result_job = bigquery_api.Job() + result_job.jobReference = job_reference + + mock_job = mock.Mock() + mock_job.status.state = 'DONE' + mock_job.status.errorResult = None + mock_job.jobReference = job_reference + + bq_client = mock.Mock() + bq_client.jobs.Get.return_value = mock_job + bq_client.jobs.Insert.return_value = result_job + + # Insert a fake clock to work with auto-sharding, which needs a processing + # time timer. + class _FakeClock(object): + def __init__(self, now=time.time()): + self._now = now + + def __call__(self): + return self._now + + start_time = timestamp.Timestamp(0) + bq_client.test_clock = _FakeClock(now=start_time) + + triggering_frequency = 20 if is_streaming else None + transform = bqfl.BigQueryBatchFileLoads( + destination, + custom_gcs_temp_location=self._new_tempdir(), + test_client=bq_client, + validate=False, + temp_file_format=bigquery_tools.FileFormat.JSON, + is_streaming_pipeline=is_streaming, + triggering_frequency=triggering_frequency, + with_auto_sharding=with_auto_sharding) + + # Need to test this with the DirectRunner to avoid serializing mocks + test_options = PipelineOptions(flags=['--allow_unsafe_triggers']) + test_options.view_as(StandardOptions).streaming = is_streaming + with TestPipeline(runner='BundleBasedDirectRunner', + options=test_options) as p: + if is_streaming: + _SIZE = len(_ELEMENTS) + first_batch = [ + TimestampedValue(value, start_time + i + 1) for i, + value in enumerate(_ELEMENTS[:_SIZE // 2]) + ] + second_batch = [ + TimestampedValue(value, start_time + _SIZE // 2 + i + 1) for i, + value in enumerate(_ELEMENTS[_SIZE // 2:]) + ] + # Advance processing time between batches of input elements to fire the + # user triggers. Intentionally advance the processing time twice for the + # auto-sharding case since we need to first fire the timer and then + # fire the trigger. + test_stream = ( + TestStream().advance_watermark_to(start_time).add_elements( + first_batch).advance_processing_time( + 30).advance_processing_time(30).add_elements(second_batch).
+ advance_processing_time(30).advance_processing_time( + 30).advance_watermark_to_infinity()) + input = p | test_stream + else: + input = p | beam.Create(_ELEMENTS) + outputs = input | transform + + dest_files = outputs[bqfl.BigQueryBatchFileLoads.DESTINATION_FILE_PAIRS] + dest_job = outputs[bqfl.BigQueryBatchFileLoads.DESTINATION_JOBID_PAIRS] + + files = dest_files | "GetFiles" >> beam.Map(lambda x: x[1][0]) + destinations = ( + dest_files + | "GetDests" >> beam.Map( + lambda x: (bigquery_tools.get_hashable_destination(x[0]), x[1])) + | "GetUniques" >> combiners.Count.PerKey() + | "GetFinalDests" >> beam.Keys()) + jobs = dest_job | "GetJobs" >> beam.Map(lambda x: x[1]) + + # Check that all files exist. + _ = ( + files + | beam.Map(lambda x: hamcrest_assert(os.path.exists(x), is_(True)))) + + # Expect two load jobs are generated in the streaming case due to the + # triggering frequency. Grouping is per trigger so we expect two entries + # in the output as opposed to one. + file_count = files | combiners.Count.Globally().without_defaults() + expected_file_count = [1, 1] if is_streaming else [1] + expected_destinations = [destination, destination + ] if is_streaming else [destination] + expected_jobs = [job_reference, job_reference + ] if is_streaming else [job_reference] + assert_that(file_count, equal_to(expected_file_count), label='CountFiles') + assert_that( + destinations, + equal_to(expected_destinations), + label='CheckDestinations') + assert_that(jobs, equal_to(expected_jobs), label='CheckJobs') + class BigQueryFileLoadsIT(unittest.TestCase): @@ -636,7 +744,7 @@ def setUp(self): _LOGGER.info( "Created dataset %s in project %s", self.dataset_id, self.project) - @attr('IT') + @pytest.mark.it_postcommit def test_multiple_destinations_transform(self): output_table_1 = '%s%s' % (self.output_table, 1) output_table_2 = '%s%s' % (self.output_table, 2) @@ -716,7 +824,7 @@ def test_multiple_destinations_transform(self): max_file_size=20, max_files_per_bundle=-1)) - @attr('IT') + @pytest.mark.it_postcommit def test_bqfl_streaming(self): if isinstance(self.test_pipeline.runner, TestDataflowRunner): self.skipTest("TestStream is not supported on TestDataflowRunner") @@ -732,7 +840,9 @@ def test_bqfl_streaming(self): data=[(i, ) for i in range(100)]) args = self.test_pipeline.get_full_options_as_args( - on_success_matcher=all_of(state_matcher, bq_matcher), streaming=True) + on_success_matcher=all_of(state_matcher, bq_matcher), + streaming=True, + allow_unsafe_triggers=True) with beam.Pipeline(argv=args) as p: stream_source = ( TestStream().advance_watermark_to(0).advance_processing_time( @@ -752,7 +862,7 @@ def test_bqfl_streaming(self): .Method.FILE_LOADS, triggering_frequency=100)) - @attr('IT') + @pytest.mark.it_postcommit def test_one_job_fails_all_jobs_fail(self): # If one of the import jobs fails, then other jobs must not be performed. 
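The test above relies on `TestStream` to make processing-time triggers deterministic: elements are added, then processing time is advanced past the trigger delay (and advanced a second time in the auto-sharding case, where a buffering timer has to fire before the trigger does). A stripped-down sketch of that technique, independent of the BigQuery transform; every name and value below is illustrative only:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import StandardOptions
from apache_beam.testing.test_pipeline import TestPipeline
from apache_beam.testing.test_stream import TestStream
from apache_beam.transforms import trigger
from apache_beam.transforms import window

options = PipelineOptions(flags=['--allow_unsafe_triggers'])
options.view_as(StandardOptions).streaming = True

with TestPipeline(runner='BundleBasedDirectRunner', options=options) as p:
  stream = (
      TestStream().advance_watermark_to(0).add_elements(
          ['a', 'b']).advance_processing_time(30).add_elements(
              ['c']).advance_processing_time(30).advance_watermark_to_infinity())
  _ = (
      p
      | stream
      | beam.WindowInto(
          window.GlobalWindows(),
          # Fires a pane once 20s of processing time have elapsed since the
          # first element in the pane; the advance_processing_time(30) calls
          # above push it over that threshold.
          trigger=trigger.Repeatedly(trigger.AfterProcessingTime(20)),
          accumulation_mode=trigger.AccumulationMode.DISCARDING)
      | beam.Map(lambda x: (None, x))
      | beam.GroupByKey()  # one output pane per trigger firing
      | beam.Map(print))
```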
diff --git a/sdks/python/apache_beam/io/gcp/bigquery_io_metadata.py b/sdks/python/apache_beam/io/gcp/bigquery_io_metadata.py index 93a952a67637..a730f2cfc9bb 100644 --- a/sdks/python/apache_beam/io/gcp/bigquery_io_metadata.py +++ b/sdks/python/apache_beam/io/gcp/bigquery_io_metadata.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import re from apache_beam.io.gcp import gce_metadata_util diff --git a/sdks/python/apache_beam/io/gcp/bigquery_io_metadata_test.py b/sdks/python/apache_beam/io/gcp/bigquery_io_metadata_test.py index 26ee66974249..c91202465388 100644 --- a/sdks/python/apache_beam/io/gcp/bigquery_io_metadata_test.py +++ b/sdks/python/apache_beam/io/gcp/bigquery_io_metadata_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest diff --git a/sdks/python/apache_beam/io/gcp/bigquery_io_read_it_test.py b/sdks/python/apache_beam/io/gcp/bigquery_io_read_it_test.py index fd1e32f45ab3..63cc44589e48 100644 --- a/sdks/python/apache_beam/io/gcp/bigquery_io_read_it_test.py +++ b/sdks/python/apache_beam/io/gcp/bigquery_io_read_it_test.py @@ -22,13 +22,11 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest +import pytest from hamcrest.core.core.allof import all_of -from nose.plugins.attrib import attr from apache_beam.io.gcp import bigquery_io_read_pipeline from apache_beam.testing.pipeline_verifiers import PipelineStateMatcher @@ -61,11 +59,11 @@ def run_bigquery_io_read_pipeline(self, input_size, beam_bq_source=False): bigquery_io_read_pipeline.run( test_pipeline.get_full_options_as_args(**extra_opts)) - @attr('IT') + @pytest.mark.it_postcommit def test_bigquery_read_custom_1M_python(self): self.run_bigquery_io_read_pipeline('1M', True) - @attr('IT') + @pytest.mark.it_postcommit def test_bigquery_read_1M_python(self): self.run_bigquery_io_read_pipeline('1M') diff --git a/sdks/python/apache_beam/io/gcp/bigquery_io_read_pipeline.py b/sdks/python/apache_beam/io/gcp/bigquery_io_read_pipeline.py index a82326e1dfb8..8ca9736b025e 100644 --- a/sdks/python/apache_beam/io/gcp/bigquery_io_read_pipeline.py +++ b/sdks/python/apache_beam/io/gcp/bigquery_io_read_pipeline.py @@ -22,8 +22,6 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import logging import random diff --git a/sdks/python/apache_beam/io/gcp/bigquery_read_internal.py b/sdks/python/apache_beam/io/gcp/bigquery_read_internal.py index 394e435a3ace..f5b959381d21 100644 --- a/sdks/python/apache_beam/io/gcp/bigquery_read_internal.py +++ b/sdks/python/apache_beam/io/gcp/bigquery_read_internal.py @@ -25,10 +25,13 @@ import json import logging import random +import time import uuid from typing import TYPE_CHECKING +from typing import Any from typing import Dict from typing import Iterable +from typing import List from typing import Optional from typing import Union @@ -180,7 +183,6 @@ def process(self, element: 'ReadFromBigQueryRequest') -> Iterable[BoundedSource]: bq = bigquery_tools.BigQueryWrapper( temp_dataset_id=self._get_temp_dataset().datasetId) - # TODO(BEAM-11359): Clean up temp dataset at pipeline completion. 
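The TODO removed above is addressed by the cleanup added just below: when a read finishes, a temporary dataset that Beam itself created is deleted, while a user-supplied temporary dataset is kept and only Beam's temporary table inside it is dropped (see the `BigQueryWrapper` changes later in this diff). A hedged, user-facing sketch of what that means in practice; the project, bucket and dataset names are placeholders, and passing `temp_dataset` this way is an assumption about how the keyword is forwarded:

```python
import apache_beam as beam
from apache_beam.io.gcp.internal.clients import bigquery

# A scratch dataset owned by the user, supplied instead of letting Beam
# create (and later delete) its own temporary dataset.
user_temp = bigquery.DatasetReference(
    projectId='my-project', datasetId='my_scratch_dataset')

with beam.Pipeline() as p:
  rows = (
      p
      | beam.io.ReadFromBigQuery(
          query='SELECT 1 AS x',
          use_standard_sql=True,
          gcs_location='gs://my-bucket/tmp',
          # Because this dataset is user-configured it is not deleted when the
          # read completes; only the temporary extract table Beam creates
          # inside it is removed. Omit it and Beam creates, then deletes, a
          # "beam_temp_dataset_*" of its own.
          temp_dataset=user_temp))
```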
if element.query is not None: self._setup_temporary_dataset(bq, element) @@ -204,6 +206,9 @@ def process(self, table_reference.datasetId, table_reference.tableId) + if bq.created_temp_dataset: + self._clean_temporary_dataset(bq, element) + def _get_bq_metadata(self): if not self.bq_io_metadata: self.bq_io_metadata = create_bigquery_io_metadata(self._step_name) @@ -228,6 +233,12 @@ def _setup_temporary_dataset( self._get_project(), element.query, not element.use_standard_sql) bq.create_temporary_dataset(self._get_project(), location) + def _clean_temporary_dataset( + self, + bq: bigquery_tools.BigQueryWrapper, + element: 'ReadFromBigQueryRequest'): + bq.clean_up_temporary_dataset(self._get_project()) + def _execute_query( self, bq: bigquery_tools.BigQueryWrapper, @@ -236,7 +247,7 @@ def _execute_query( self._job_name, self._source_uuid, bigquery_tools.BigQueryJobTypes.QUERY, - random.randint(0, 1000)) + '%s_%s' % (int(time.time()), random.randint(0, 1000))) job = bq._start_query_job( self._get_project(), element.query, @@ -335,8 +346,7 @@ def _to_decimal(value): @staticmethod def _to_bytes(value): - """Converts value from str to bytes on Python 3.x. Does nothing on - Python 2.7.""" + """Converts value from str to bytes.""" return value.encode('utf-8') @classmethod @@ -355,34 +365,32 @@ def _convert_to_tuple(cls, table_field_schemas): def decode(self, value): value = json.loads(value.decode('utf-8')) - return self._decode_with_schema(value, self.fields) + return self._decode_row(value, self.fields) - def _decode_with_schema(self, value, schema_fields): + def _decode_row(self, row: Dict[str, Any], schema_fields: List[FieldSchema]): for field in schema_fields: - if field.name not in value: + if field.name not in row: # The field exists in the schema, but it doesn't exist in this row. 
# It probably means its value was null, as the extract to JSON job # doesn't preserve null fields - value[field.name] = None + row[field.name] = None continue - if field.type == 'RECORD': - nested_values = value[field.name] - if field.mode == 'REPEATED': - for i, nested_value in enumerate(nested_values): - nested_values[i] = self._decode_with_schema( - nested_value, field.fields) - else: - value[field.name] = self._decode_with_schema( - nested_values, field.fields) + if field.mode == 'REPEATED': + for i, elem in enumerate(row[field.name]): + row[field.name][i] = self._decode_data(elem, field) else: - try: - converter = self._converters[field.type] - value[field.name] = converter(value[field.name]) - except KeyError: - # No need to do any conversion - pass - return value + row[field.name] = self._decode_data(row[field.name], field) + return row + + def _decode_data(self, obj: Any, field: FieldSchema): + if not field.fields: + try: + return self._converters[field.type](obj) + except KeyError: + # No need to do any conversion + return obj + return self._decode_row(obj, field.fields) def is_deterministic(self): return True diff --git a/sdks/python/apache_beam/io/gcp/bigquery_read_it_test.py b/sdks/python/apache_beam/io/gcp/bigquery_read_it_test.py index 4586f046f79b..9216a9c2b716 100644 --- a/sdks/python/apache_beam/io/gcp/bigquery_read_it_test.py +++ b/sdks/python/apache_beam/io/gcp/bigquery_read_it_test.py @@ -20,8 +20,6 @@ """Unit tests for BigQuery sources and sinks.""" # pytype: skip-file -from __future__ import absolute_import - import base64 import datetime import logging @@ -31,8 +29,7 @@ from decimal import Decimal from functools import wraps -from future.utils import iteritems -from nose.plugins.attrib import attr +import pytest import apache_beam as beam from apache_beam.io.gcp.bigquery_tools import BigQueryWrapper @@ -131,7 +128,7 @@ class ReadTests(BigQueryReadIntegrationTests): @classmethod def setUpClass(cls): super(ReadTests, cls).setUpClass() - cls.table_name = 'python_write_table' + cls.table_name = 'python_read_table' cls.create_table(cls.table_name) table_id = '{}.{}'.format(cls.dataset_id, cls.table_name) @@ -160,7 +157,7 @@ def create_table(cls, table_name): cls.project, cls.dataset_id, table_name, cls.TABLE_DATA) @skip(['PortableRunner', 'FlinkRunner']) - @attr('IT') + @pytest.mark.it_postcommit def test_native_source(self): with beam.Pipeline(argv=self.args) as p: result = ( @@ -168,7 +165,7 @@ def test_native_source(self): beam.io.BigQuerySource(query=self.query, use_standard_sql=True))) assert_that(result, equal_to(self.TABLE_DATA)) - @attr('IT') + @pytest.mark.it_postcommit def test_iobase_source(self): query = StaticValueProvider(str, self.query) with beam.Pipeline(argv=self.args) as p: @@ -245,7 +242,7 @@ def create_table(cls, table_name): table_data = [row_data] # add rows with only one key value pair and None values for all other keys - for key, value in iteritems(row_data): + for key, value in row_data.items(): table_data.append({key: value}) cls.bigquery_client.insert_rows( @@ -267,7 +264,7 @@ def get_expected_data(self, native=True): expected_data = [expected_row] # add rows with only one key value pair and None values for all other keys - for key, value in iteritems(expected_row): + for key, value in expected_row.items(): row = {k: None for k in expected_row} row[key] = value expected_data.append(row) @@ -275,7 +272,7 @@ def get_expected_data(self, native=True): return expected_data @skip(['PortableRunner', 'FlinkRunner']) - @attr('IT') + 
@pytest.mark.it_postcommit def test_native_source(self): with beam.Pipeline(argv=self.args) as p: result = ( @@ -284,7 +281,7 @@ def test_native_source(self): beam.io.BigQuerySource(query=self.query, use_standard_sql=True))) assert_that(result, equal_to(self.get_expected_data())) - @attr('IT') + @pytest.mark.it_postcommit def test_iobase_source(self): with beam.Pipeline(argv=self.args) as p: result = ( @@ -381,7 +378,7 @@ def create_bq_schema(cls, with_extra=False): return table_schema @skip(['PortableRunner', 'FlinkRunner']) - @attr('IT') + @pytest.mark.it_postcommit def test_read_queries(self): # TODO(BEAM-11311): Remove experiment when tests run on r_v2. args = self.args + ["--experiments=use_runner_v2"] diff --git a/sdks/python/apache_beam/io/gcp/bigquery_read_perf_test.py b/sdks/python/apache_beam/io/gcp/bigquery_read_perf_test.py index d9e6cd3948b7..957028cb1e53 100644 --- a/sdks/python/apache_beam/io/gcp/bigquery_read_perf_test.py +++ b/sdks/python/apache_beam/io/gcp/bigquery_read_perf_test.py @@ -51,8 +51,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging from apache_beam import Map diff --git a/sdks/python/apache_beam/io/gcp/bigquery_test.py b/sdks/python/apache_beam/io/gcp/bigquery_test.py index a8290624d94d..f6e014f7875c 100644 --- a/sdks/python/apache_beam/io/gcp/bigquery_test.py +++ b/sdks/python/apache_beam/io/gcp/bigquery_test.py @@ -18,8 +18,6 @@ """Unit tests for BigQuery sources and sinks.""" # pytype: skip-file -from __future__ import absolute_import - import datetime import decimal import json @@ -32,12 +30,12 @@ import unittest import uuid -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import import hamcrest as hc import mock +import pytest import pytz -from nose.plugins.attrib import attr +from parameterized import param +from parameterized import parameterized import apache_beam as beam from apache_beam.internal import pickler @@ -79,14 +77,18 @@ # pylint: disable=wrong-import-order, wrong-import-position try: from apitools.base.py.exceptions import HttpError + from google.cloud import bigquery as gcp_bigquery except ImportError: + gcp_bigquery = None HttpError = None # pylint: enable=wrong-import-order, wrong-import-position _LOGGER = logging.getLogger(__name__) -@unittest.skipIf(HttpError is None, 'GCP dependencies are not installed') +@unittest.skipIf( + HttpError is None or gcp_bigquery is None, + 'GCP dependencies are not installed') class TestTableRowJsonCoder(unittest.TestCase): def test_row_as_table_row(self): schema_definition = [('s', 'STRING'), ('i', 'INTEGER'), ('f', 'FLOAT'), @@ -371,6 +373,18 @@ def test_record_and_repeatable_field_is_properly_converted(self): actual = coder.decode(input_row) self.assertEqual(expected_row, actual) + def test_repeatable_field_is_properly_converted(self): + input_row = b'{"repeated": ["55.5", "65.5"], "integer": "10"}' + expected_row = {'repeated': [55.5, 65.5], 'integer': 10} + schema = self._make_schema([ + ('repeated', 'FLOAT', 'REPEATED', []), + ('integer', 'INTEGER', 'NULLABLE', []), + ]) + coder = _JsonToDictCoder(schema) + + actual = coder.decode(input_row) + self.assertEqual(expected_row, actual) + @unittest.skipIf(HttpError is None, 'GCP dependencies are not installed') class TestReadFromBigQuery(unittest.TestCase): @@ -437,15 +451,16 @@ def test_get_destination_uri_fallback_temp_location(self): 'empty, using temp_location instead' ]) + @mock.patch.object(BigQueryWrapper, '_delete_table') @mock.patch.object(BigQueryWrapper, 
'_delete_dataset') @mock.patch('apache_beam.io.gcp.internal.clients.bigquery.BigqueryV2') - def test_temp_dataset_location_is_configurable(self, api, delete_dataset): + def test_temp_dataset_is_configurable( + self, api, delete_dataset, delete_table): temp_dataset = bigquery.DatasetReference( projectId='temp-project', datasetId='bq_dataset') bq = BigQueryWrapper(client=api, temp_dataset_id=temp_dataset.datasetId) gcs_location = 'gs://gcs_location' - # bq.get_or_create_dataset.return_value = temp_dataset c = beam.io.gcp.bigquery._CustomBigQuerySource( query='select * from test_table', gcs_location=gcs_location, @@ -456,30 +471,15 @@ def test_temp_dataset_location_is_configurable(self, api, delete_dataset): project='execution_project', **{'temp_dataset': temp_dataset}) - api.datasets.Get.side_effect = HttpError({ - 'status_code': 404, 'status': 404 - }, - '', - '') - c._setup_temporary_dataset(bq) - api.datasets.Insert.assert_called_with( - bigquery.BigqueryDatasetsInsertRequest( - dataset=bigquery.Dataset(datasetReference=temp_dataset), - projectId=temp_dataset.projectId)) + api.datasets.assert_not_called() - api.datasets.Get.return_value = temp_dataset - api.datasets.Get.side_effect = None + # User provided temporary dataset should not be deleted but the temporary + # table created by Beam should be deleted. bq.clean_up_temporary_dataset(temp_dataset.projectId) - delete_dataset.assert_called_with( - temp_dataset.projectId, temp_dataset.datasetId, True) - - self.assertEqual( - bq._get_temp_table(temp_dataset.projectId), - bigquery.TableReference( - projectId=temp_dataset.projectId, - datasetId=temp_dataset.datasetId, - tableId=BigQueryWrapper.TEMP_TABLE + bq._temporary_table_suffix)) + delete_dataset.assert_not_called() + delete_table.assert_called_with( + temp_dataset.projectId, temp_dataset.datasetId, mock.ANY) @unittest.skipIf(HttpError is None, 'GCP dependencies are not installed') @@ -784,8 +784,7 @@ def test_dofn_client_process_performs_batching(self): client.tables.Get.return_value = bigquery.Table( tableReference=bigquery.TableReference( projectId='project_id', datasetId='dataset_id', tableId='table_id')) - client.tabledata.InsertAll.return_value = \ - bigquery.TableDataInsertAllResponse(insertErrors=[]) + client.insert_rows_json.return_value = [] create_disposition = beam.io.BigQueryDisposition.CREATE_NEVER write_disposition = beam.io.BigQueryDisposition.WRITE_APPEND @@ -799,15 +798,14 @@ def test_dofn_client_process_performs_batching(self): fn.process(('project_id:dataset_id.table_id', {'month': 1})) # InsertRows not called as batch size is not hit yet - self.assertFalse(client.tabledata.InsertAll.called) + self.assertFalse(client.insert_rows_json.called) def test_dofn_client_process_flush_called(self): client = mock.Mock() client.tables.Get.return_value = bigquery.Table( tableReference=bigquery.TableReference( projectId='project_id', datasetId='dataset_id', tableId='table_id')) - client.tabledata.InsertAll.return_value = ( - bigquery.TableDataInsertAllResponse(insertErrors=[])) + client.insert_rows_json.return_value = [] create_disposition = beam.io.BigQueryDisposition.CREATE_NEVER write_disposition = beam.io.BigQueryDisposition.WRITE_APPEND @@ -822,15 +820,14 @@ def test_dofn_client_process_flush_called(self): fn.process(('project_id:dataset_id.table_id', ({'month': 1}, 'insertid1'))) fn.process(('project_id:dataset_id.table_id', ({'month': 2}, 'insertid2'))) # InsertRows called as batch size is hit - self.assertTrue(client.tabledata.InsertAll.called) + 
self.assertTrue(client.insert_rows_json.called) def test_dofn_client_finish_bundle_flush_called(self): client = mock.Mock() client.tables.Get.return_value = bigquery.Table( tableReference=bigquery.TableReference( projectId='project_id', datasetId='dataset_id', tableId='table_id')) - client.tabledata.InsertAll.return_value = \ - bigquery.TableDataInsertAllResponse(insertErrors=[]) + client.insert_rows_json.return_value = [] create_disposition = beam.io.BigQueryDisposition.CREATE_IF_NEEDED write_disposition = beam.io.BigQueryDisposition.WRITE_APPEND @@ -849,11 +846,11 @@ def test_dofn_client_finish_bundle_flush_called(self): self.assertTrue(client.tables.Get.called) # InsertRows not called as batch size is not hit - self.assertFalse(client.tabledata.InsertAll.called) + self.assertFalse(client.insert_rows_json.called) fn.finish_bundle() # InsertRows called in finish bundle - self.assertTrue(client.tabledata.InsertAll.called) + self.assertTrue(client.insert_rows_json.called) def test_dofn_client_no_records(self): client = mock.Mock() @@ -880,6 +877,40 @@ def test_dofn_client_no_records(self): # InsertRows not called in finish bundle as no records self.assertFalse(client.tabledata.InsertAll.called) + def test_with_batched_input(self): + client = mock.Mock() + client.tables.Get.return_value = bigquery.Table( + tableReference=bigquery.TableReference( + projectId='project_id', datasetId='dataset_id', tableId='table_id')) + client.insert_rows_json.return_value = [] + create_disposition = beam.io.BigQueryDisposition.CREATE_IF_NEEDED + write_disposition = beam.io.BigQueryDisposition.WRITE_APPEND + + fn = beam.io.gcp.bigquery.BigQueryWriteFn( + batch_size=10, + create_disposition=create_disposition, + write_disposition=write_disposition, + kms_key=None, + with_batched_input=True, + test_client=client) + + fn.start_bundle() + + # Destination is a tuple of (destination, schema) to ensure the table is + # created. + fn.process(( + 'project_id:dataset_id.table_id', + [({ + 'month': 1 + }, 'insertid3'), ({ + 'month': 2 + }, 'insertid2'), ({ + 'month': 3 + }, 'insertid1')])) + + # InsertRows called since the input is already batched. + self.assertTrue(client.insert_rows_json.called) + @unittest.skipIf(HttpError is None, 'GCP dependencies are not installed') class PipelineBasedStreamingInsertTest(_TestCaseWithTempDirCleanUp): @@ -888,12 +919,9 @@ def test_failure_has_same_insert_ids(self): file_name_1 = os.path.join(tempdir, 'file1') file_name_2 = os.path.join(tempdir, 'file2') - def store_callback(arg): - insert_ids = [r.insertId for r in arg.tableDataInsertAllRequest.rows] - colA_values = [ - r.json.additionalProperties[0].value.string_value - for r in arg.tableDataInsertAllRequest.rows - ] + def store_callback(table, **kwargs): + insert_ids = [r for r in kwargs['row_ids']] + colA_values = [r['columnA'] for r in kwargs['json_rows']] json_output = {'insertIds': insert_ids, 'colA_values': colA_values} # The first time we try to insert, we save those insertions in # file insert_calls1. @@ -905,12 +933,10 @@ def store_callback(arg): with open(file_name_2, 'w') as f: json.dump(json_output, f) - res = mock.Mock() - res.insertErrors = [] - return res + return [] client = mock.Mock() - client.tabledata.InsertAll = mock.Mock(side_effect=store_callback) + client.insert_rows_json = mock.Mock(side_effect=store_callback) # Using the bundle based direct runner to avoid pickling problems # with mocks. 
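For context on what the mocked `client.insert_rows_json` in the tests above stands in for: the streaming-insert path in this diff moves from the apitools `tabledata.InsertAll` request to the `google-cloud-bigquery` client. A minimal sketch of that client call; the project, table id, rows and insert ids are placeholders and error handling is elided:

```python
from google.cloud import bigquery as gcp_bigquery

client = gcp_bigquery.Client(project='my-project')  # placeholder project

rows = [{'columnA': 'value1'}, {'columnA': 'value3'}]
# Deterministic row ids give BigQuery a best-effort dedup key, which is why
# the test above asserts that a retried bundle reuses the same insert ids.
errors = client.insert_rows_json(
    'my-project.my_dataset.my_table',
    json_rows=rows,
    row_ids=['id-0', 'id-1'],
    skip_invalid_rows=True)
if errors:
    # Each entry describes a failed row index and its error reasons.
    print(errors)
```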
@@ -925,30 +951,88 @@ def store_callback(arg): 'columnA': 'value5', 'columnB': 'value6' }]) | _StreamToBigQuery( - 'project:dataset.table', [], [], - 'anyschema', - None, - 'CREATE_NEVER', - None, - None, - None, [], + table_reference='project:dataset.table', + table_side_inputs=[], + schema_side_inputs=[], + schema='anyschema', + batch_size=None, + create_disposition='CREATE_NEVER', + write_disposition=None, + kms_key=None, + retry_strategy=None, + additional_bq_parameters=[], ignore_insert_ids=False, + with_auto_sharding=False, test_client=client)) with open(file_name_1) as f1, open(file_name_2) as f2: self.assertEqual(json.load(f1), json.load(f2)) + @parameterized.expand([ + param(with_auto_sharding=False), + param(with_auto_sharding=True), + ]) + def test_batch_size_with_auto_sharding(self, with_auto_sharding): + tempdir = '%s%s' % (self._new_tempdir(), os.sep) + file_name_1 = os.path.join(tempdir, 'file1') + file_name_2 = os.path.join(tempdir, 'file2') + + def store_callback(table, **kwargs): + insert_ids = [r for r in kwargs['row_ids']] + colA_values = [r['columnA'] for r in kwargs['json_rows']] + json_output = {'insertIds': insert_ids, 'colA_values': colA_values} + # Expect two batches of rows will be inserted. Store them separately. + if not os.path.exists(file_name_1): + with open(file_name_1, 'w') as f: + json.dump(json_output, f) + else: + with open(file_name_2, 'w') as f: + json.dump(json_output, f) + + return [] + + client = mock.Mock() + client.insert_rows_json = mock.Mock(side_effect=store_callback) + + # Using the bundle based direct runner to avoid pickling problems + # with mocks. + with beam.Pipeline(runner='BundleBasedDirectRunner') as p: + _ = ( + p + | beam.Create([{ + 'columnA': 'value1', 'columnB': 'value2' + }, { + 'columnA': 'value3', 'columnB': 'value4' + }, { + 'columnA': 'value5', 'columnB': 'value6' + }]) + | _StreamToBigQuery( + table_reference='project:dataset.table', + table_side_inputs=[], + schema_side_inputs=[], + schema='anyschema', + # Set a batch size such that the input elements will be inserted + # in 2 batches. + batch_size=2, + create_disposition='CREATE_NEVER', + write_disposition=None, + kms_key=None, + retry_strategy=None, + additional_bq_parameters=[], + ignore_insert_ids=False, + with_auto_sharding=with_auto_sharding, + test_client=client)) + + with open(file_name_1) as f1, open(file_name_2) as f2: + out1 = json.load(f1) + self.assertEqual(out1['colA_values'], ['value1', 'value3']) + out2 = json.load(f2) + self.assertEqual(out2['colA_values'], ['value5']) + class BigQueryStreamingInsertTransformIntegrationTests(unittest.TestCase): BIG_QUERY_DATASET_ID = 'python_bq_streaming_inserts_' - # Prevent nose from finding and running tests that were not - # specified in the Gradle file. 
- # See "More tests may be found" in: - # https://nose.readthedocs.io/en/latest/doc_tests/test_multiprocess - # /multiprocess.html#other-differences-in-test-running - _multiprocess_can_split_ = True - def setUp(self): self.test_pipeline = TestPipeline(is_integration_test=True) self.runner_name = type(self.test_pipeline.runner).__name__ @@ -964,7 +1048,7 @@ def setUp(self): _LOGGER.info( "Created dataset %s in project %s", self.dataset_id, self.project) - @attr('IT') + @pytest.mark.it_postcommit def test_value_provider_transform(self): output_table_1 = '%s%s' % (self.output_table, 1) output_table_2 = '%s%s' % (self.output_table, 2) @@ -1034,7 +1118,7 @@ def test_value_provider_transform(self): additional_bq_parameters=lambda _: additional_bq_parameters, method='FILE_LOADS')) - @attr('IT') + @pytest.mark.it_postcommit def test_multiple_destinations_transform(self): streaming = self.test_pipeline.options.view_as(StandardOptions).streaming if streaming and isinstance(self.test_pipeline.runner, TestDataflowRunner): @@ -1207,7 +1291,8 @@ def _run_pubsub_bq_pipeline(self, method, triggering_frequency=None): args = self.test_pipeline.get_full_options_as_args( on_success_matcher=hc.all_of(*matchers), wait_until_finish_duration=self.WAIT_UNTIL_FINISH_DURATION, - streaming=True) + streaming=True, + allow_unsafe_triggers=True) def add_schema_info(element): yield {'number': element} @@ -1227,14 +1312,12 @@ def add_schema_info(element): method=method, triggering_frequency=triggering_frequency) - @attr('IT') + @pytest.mark.it_postcommit def test_streaming_inserts(self): self._run_pubsub_bq_pipeline(WriteToBigQuery.Method.STREAMING_INSERTS) - @attr('IT') + @pytest.mark.it_postcommit def test_file_loads(self): - if isinstance(self.test_pipeline.runner, TestDataflowRunner): - self.skipTest('https://issuetracker.google.com/issues/118375066') self._run_pubsub_bq_pipeline( WriteToBigQuery.Method.FILE_LOADS, triggering_frequency=20) @@ -1258,7 +1341,7 @@ def setUp(self): _LOGGER.info( 'Created dataset %s in project %s', self.dataset_id, self.project) - @attr('IT') + @pytest.mark.it_postcommit def test_avro_file_load(self): # Construct elements such that they can be written via Avro but not via # JSON. See BEAM-8841. diff --git a/sdks/python/apache_beam/io/gcp/bigquery_tools.py b/sdks/python/apache_beam/io/gcp/bigquery_tools.py index 056182992329..9182411e699e 100644 --- a/sdks/python/apache_beam/io/gcp/bigquery_tools.py +++ b/sdks/python/apache_beam/io/gcp/bigquery_tools.py @@ -27,8 +27,6 @@ # pytype: skip-file -from __future__ import absolute_import - import datetime import decimal import io @@ -38,17 +36,16 @@ import sys import time import uuid -from builtins import object +from json.decoder import JSONDecodeError +from typing import Tuple +from typing import TypeVar +from typing import Union import fastavro -from future.utils import iteritems -from future.utils import raise_with_traceback -from past.builtins import unicode from apache_beam import coders from apache_beam.internal.gcp import auth from apache_beam.internal.gcp.json_value import from_json_value -from apache_beam.internal.gcp.json_value import to_json_value from apache_beam.internal.http_client import get_new_http from apache_beam.internal.metrics.metric import MetricLogger from apache_beam.internal.metrics.metric import Metrics @@ -69,23 +66,35 @@ # Protect against environments where bigquery library is not available. 
# pylint: disable=wrong-import-order, wrong-import-position try: + from apitools.base.py.transfer import Upload from apitools.base.py.exceptions import HttpError, HttpForbiddenError + from google.api_core.exceptions import ClientError, GoogleAPICallError + from google.cloud import bigquery as gcp_bigquery except ImportError: + gcp_bigquery = None pass try: - # TODO(pabloem): Remove this workaround after Python 2.7 support ends. - from json.decoder import JSONDecodeError + from orjson import dumps as fast_json_dumps + from orjson import loads as fast_json_loads except ImportError: - JSONDecodeError = ValueError + fast_json_dumps = json.dumps + fast_json_loads = json.loads # pylint: enable=wrong-import-order, wrong-import-position -_LOGGER = logging.getLogger(__name__) +# pylint: disable=wrong-import-order, wrong-import-position, ungrouped-imports +try: + from apache_beam.io.gcp.internal.clients.bigquery import TableReference +except ImportError: + TableReference = None +# pylint: enable=wrong-import-order, wrong-import-position, ungrouped-imports -MAX_RETRIES = 3 +_LOGGER = logging.getLogger(__name__) JSON_COMPLIANCE_ERROR = 'NAN, INF and -INF values are not JSON compliant.' +MAX_RETRIES = 3 +UNKNOWN_MIME_TYPE = 'application/octet-stream' class FileFormat(object): @@ -130,13 +139,32 @@ def get_hashable_destination(destination): A string representing the destination containing 'PROJECT:DATASET.TABLE'. """ - if isinstance(destination, bigquery.TableReference): + if isinstance(destination, TableReference): return '%s:%s.%s' % ( destination.projectId, destination.datasetId, destination.tableId) else: return destination +V = TypeVar('V') + + +def to_hashable_table_ref( + table_ref_elem_kv: Tuple[Union[str, TableReference], V]) -> Tuple[str, V]: + """Turns the key of the input tuple to its string representation. The key + should be either a string or a TableReference. + + Args: + table_ref_elem_kv: A tuple of table reference and element. + + Returns: + A tuple of string representation of input table and input element. + """ + table_ref = table_ref_elem_kv[0] + hashable_table_ref = get_hashable_destination(table_ref) + return (hashable_table_ref, table_ref_elem_kv[1]) + + def parse_table_schema_from_json(schema_string): """Parse the Table Schema provided as string. @@ -183,10 +211,10 @@ def parse_table_reference(table, dataset=None, project=None): Args: table: The ID of the table. The ID must contain only letters - (a-z, A-Z), numbers (0-9), or underscores (_). If dataset argument is None + (a-z, A-Z), numbers (0-9), connectors (-_). If dataset argument is None then the table argument must contain the entire table reference: 'DATASET.TABLE' or 'PROJECT:DATASET.TABLE'. This argument can be a - bigquery.TableReference instance in which case dataset and project are + TableReference instance in which case dataset and project are ignored and the reference is returned as a result. Additionally, for date partitioned tables, appending '$YYYYmmdd' to the table name is supported, e.g. 'DATASET.TABLE$YYYYmmdd'. @@ -206,8 +234,8 @@ def parse_table_reference(table, dataset=None, project=None): format. 
""" - if isinstance(table, bigquery.TableReference): - return bigquery.TableReference( + if isinstance(table, TableReference): + return TableReference( projectId=table.projectId, datasetId=table.datasetId, tableId=table.tableId) @@ -216,13 +244,13 @@ def parse_table_reference(table, dataset=None, project=None): elif isinstance(table, value_provider.ValueProvider): return table - table_reference = bigquery.TableReference() + table_reference = TableReference() # If dataset argument is not specified, the expectation is that the # table argument will contain a full table reference instead of just a # table name. if dataset is None: match = re.match( - r'^((?P.+):)?(?P\w+)\.(?P[\w\$]+)$', table) + r'^((?P.+):)?(?P\w+)\.(?P
    [-\w\$]+)$', table) if not match: raise ValueError( 'Expected a table reference (PROJECT:DATASET.TABLE or ' @@ -264,8 +292,10 @@ class BigQueryWrapper(object): (e.g., find and create tables, query a table, etc.). """ - TEMP_TABLE = 'temp_table_' - TEMP_DATASET = 'temp_dataset_' + # If updating following names, also update the corresponding pydocs in + # bigquery.py. + TEMP_TABLE = 'beam_temp_table_' + TEMP_DATASET = 'beam_temp_dataset_' HISTOGRAM_METRIC_LOGGER = MetricLogger() @@ -273,7 +303,8 @@ def __init__(self, client=None, temp_dataset_id=None): self.client = client or bigquery.BigqueryV2( http=get_new_http(), credentials=auth.get_service_credentials(), - response_encoding=None if sys.version_info[0] < 3 else 'utf8') + response_encoding='utf8') + self.gcp_bq_client = client or gcp_bigquery.Client() self._unique_row_id = 0 # For testing scenarios where we pass in a client we do not want a # randomized prefix for row IDs. @@ -284,7 +315,12 @@ def __init__(self, client=None, temp_dataset_id=None): 'latency_histogram_ms', LinearBucket(0, 20, 3000), BigQueryWrapper.HISTOGRAM_METRIC_LOGGER) + if temp_dataset_id and temp_dataset_id.startswith(self.TEMP_DATASET): + raise ValueError( + 'User provided temp dataset ID cannot start with %r' % + self.TEMP_DATASET) self.temp_dataset_id = temp_dataset_id or self._get_temp_dataset() + self.created_temp_dataset = False @property def unique_row_id(self): @@ -407,13 +443,29 @@ def _insert_load_job( project_id, job_id, table_reference, - source_uris, + source_uris=None, + source_stream=None, schema=None, write_disposition=None, create_disposition=None, additional_load_parameters=None, source_format=None, job_labels=None): + + if not source_uris and not source_stream: + raise ValueError( + 'Either a non-empty list of fully-qualified source URIs must be ' + 'provided via the source_uris parameter or an open file object must ' + 'be provided via the source_stream parameter. Got neither.') + + if source_uris and source_stream: + raise ValueError( + 'Only one of source_uris and source_stream may be specified. ' + 'Got both.') + + if source_uris is None: + source_uris = [] + additional_load_parameters = additional_load_parameters or {} job_schema = None if schema == 'SCHEMA_AUTODETECT' else schema reference = bigquery.JobReference(jobId=job_id, projectId=project_id) @@ -435,20 +487,28 @@ def _insert_load_job( ), jobReference=reference, )) - return self._start_job(request).jobReference + return self._start_job(request, stream=source_stream).jobReference def _start_job( self, - request # type: bigquery.BigqueryJobsInsertRequest + request, # type: bigquery.BigqueryJobsInsertRequest + stream=None, ): """Inserts a BigQuery job. If the job exists already, it returns it. + + Args: + request (bigquery.BigqueryJobsInsertRequest): An insert job request. + stream (IO[bytes]): A bytes IO object open for reading. 
""" try: - response = self.client.jobs.Insert(request) + upload = None + if stream: + upload = Upload.FromStream(stream, mime_type=UNKNOWN_MIME_TYPE) + response = self.client.jobs.Insert(request, upload=upload) _LOGGER.info( - "Stated BigQuery job: %s\n " + "Started BigQuery job: %s\n " "bq show -j --format=prettyjson --project_id=%s %s", response.jobReference, response.jobReference.projectId, @@ -553,7 +613,13 @@ def _get_query_results( num_retries=MAX_RETRIES, retry_filter=retry.retry_on_server_errors_timeout_or_quota_issues_filter) def _insert_all_rows( - self, project_id, dataset_id, table_id, rows, skip_invalid_rows=False): + self, + project_id, + dataset_id, + table_id, + rows, + insert_ids, + skip_invalid_rows=False): """Calls the insertAll BigQuery API endpoint. Docs for this BQ call: https://cloud.google.com/bigquery/docs/reference\ @@ -561,15 +627,6 @@ def _insert_all_rows( # The rows argument is a list of # bigquery.TableDataInsertAllRequest.RowsValueListEntry instances as # required by the InsertAll() method. - request = bigquery.BigqueryTabledataInsertAllRequest( - projectId=project_id, - datasetId=dataset_id, - tableId=table_id, - tableDataInsertAllRequest=bigquery.TableDataInsertAllRequest( - skipInvalidRows=skip_invalid_rows, - # TODO(silviuc): Should have an option for ignoreUnknownValues? - rows=rows)) - resource = resource_identifiers.BigQueryTable( project_id, dataset_id, table_id) @@ -590,22 +647,31 @@ def _insert_all_rows( base_labels=labels) started_millis = int(time.time() * 1000) - response = None try: - response = self.client.tabledata.InsertAll(request) - if not response.insertErrors: + table_ref_str = '%s.%s.%s' % (project_id, dataset_id, table_id) + errors = self.gcp_bq_client.insert_rows_json( + table_ref_str, + json_rows=rows, + row_ids=insert_ids, + skip_invalid_rows=True) + if not errors: service_call_metric.call('ok') - for insert_error in response.insertErrors: - for error in insert_error.errors: - service_call_metric.call(error.reason) + else: + for insert_error in errors: + service_call_metric.call(insert_error['errors'][0]) + except (ClientError, GoogleAPICallError) as e: + # e.code.value contains the numeric http status code. + service_call_metric.call(e.code.value) + # Re-reise the exception so that we re-try appropriately. + raise except HttpError as e: service_call_metric.call(e) + # Re-reise the exception so that we re-try appropriately. 
+ raise finally: self._latency_histogram_metric.update( int(time.time() * 1000) - started_millis) - if response: - return not response.insertErrors, response.insertErrors - return False, [] + return not errors, errors @retry.with_exponential_backoff( num_retries=MAX_RETRIES, @@ -648,7 +714,7 @@ def _create_table( additional_parameters = additional_parameters or {} table = bigquery.Table( - tableReference=bigquery.TableReference( + tableReference=TableReference( projectId=project_id, datasetId=dataset_id, tableId=table_id), schema=schema, **additional_parameters) @@ -668,6 +734,7 @@ def get_or_create_dataset(self, project_id, dataset_id, location=None): dataset = self.client.datasets.Get( bigquery.BigqueryDatasetsGetRequest( projectId=project_id, datasetId=dataset_id)) + self.created_temp_dataset = False return dataset except HttpError as exn: if exn.status_code == 404: @@ -679,6 +746,7 @@ def get_or_create_dataset(self, project_id, dataset_id, location=None): request = bigquery.BigqueryDatasetsInsertRequest( projectId=project_id, dataset=dataset) response = self.client.datasets.Insert(request) + self.created_temp_dataset = True # The response is a bigquery.Dataset instance. return response else: @@ -737,18 +805,22 @@ def get_table_location(self, project_id, dataset_id, table_id): table = self.get_table(project_id, dataset_id, table_id) return table.location + # Returns true if the temporary dataset was provided by the user. + def is_user_configured_dataset(self): + return ( + self.temp_dataset_id and + not self.temp_dataset_id.startswith(self.TEMP_DATASET)) + @retry.with_exponential_backoff( num_retries=MAX_RETRIES, retry_filter=retry.retry_on_server_errors_and_timeout_filter) def create_temporary_dataset(self, project_id, location): - is_user_configured_dataset = \ - not self.temp_dataset_id.startswith(self.TEMP_DATASET) # Check if dataset exists to make sure that the temporary id is unique try: self.client.datasets.Get( bigquery.BigqueryDatasetsGetRequest( projectId=project_id, datasetId=self.temp_dataset_id)) - if project_id is not None and not is_user_configured_dataset: + if project_id is not None and not self.is_user_configured_dataset(): # Unittests don't pass projectIds so they can be run without error # User configured datasets are allowed to pre-exist. raise RuntimeError( @@ -784,7 +856,14 @@ def clean_up_temporary_dataset(self, project_id): else: raise try: - self._delete_dataset(temp_table.projectId, temp_table.datasetId, True) + # We do not want to delete temporary datasets configured by the user hence + # we just delete the temporary table in that case. + if not self.is_user_configured_dataset(): + self._delete_dataset(temp_table.projectId, temp_table.datasetId, True) + else: + self._delete_table( + temp_table.projectId, temp_table.datasetId, temp_table.tableId) + self.created_temp_dataset = False except HttpError as exn: if exn.status_code == 403: _LOGGER.warning( @@ -809,8 +888,9 @@ def get_job(self, project, job_id, location=None): def perform_load_job( self, destination, - files, job_id, + source_uris=None, + source_stream=None, schema=None, write_disposition=None, create_disposition=None, @@ -822,11 +902,23 @@ def perform_load_job( Returns: bigquery.JobReference with the information about the job that was started. """ + if not source_uris and not source_stream: + raise ValueError( + 'Either a non-empty list of fully-qualified source URIs must be ' + 'provided via the source_uris parameter or an open file object must ' + 'be provided via the source_stream parameter. 
Got neither.') + + if source_uris and source_stream: + raise ValueError( + 'Only one of source_uris and source_stream may be specified. ' + 'Got both.') + return self._insert_load_job( destination.projectId, job_id, destination, - files, + source_uris=source_uris, + source_stream=source_stream, schema=schema, create_disposition=create_disposition, write_disposition=write_disposition, @@ -930,8 +1022,8 @@ def get_or_create_table( # specified. if write_disposition == BigQueryDisposition.WRITE_TRUNCATE: self._delete_table(project_id, dataset_id, table_id) - elif (not self._is_table_empty(project_id, dataset_id, table_id) and - write_disposition == BigQueryDisposition.WRITE_EMPTY): + elif (write_disposition == BigQueryDisposition.WRITE_EMPTY and + not self._is_table_empty(project_id, dataset_id, table_id)): raise RuntimeError( 'Table %s:%s.%s is not empty but write disposition is WRITE_EMPTY.' % (project_id, dataset_id, table_id)) @@ -1056,29 +1148,19 @@ def insert_rows( # BigQuery will do a best-effort if unique IDs are provided. This situation # can happen during retries on failures. # TODO(silviuc): Must add support to writing TableRow's instead of dicts. - final_rows = [] - for i, row in enumerate(rows): - json_row = self._convert_to_json_row(row) - insert_id = str(self.unique_row_id) if not insert_ids else insert_ids[i] - final_rows.append( - bigquery.TableDataInsertAllRequest.RowsValueListEntry( - insertId=insert_id, json=json_row)) + insert_ids = [ + str(self.unique_row_id) if not insert_ids else insert_ids[i] for i, + _ in enumerate(rows) + ] + rows = [ + fast_json_loads(fast_json_dumps(r, default=default_encoder)) + for r in rows + ] + result, errors = self._insert_all_rows( - project_id, dataset_id, table_id, final_rows, skip_invalid_rows) + project_id, dataset_id, table_id, rows, insert_ids) return result, errors - def _convert_to_json_row(self, row): - json_object = bigquery.JsonObject() - for k, v in iteritems(row): - if isinstance(v, decimal.Decimal): - # decimal values are converted into string because JSON does not - # support the precision that decimal supports. BQ is able to handle - # inserts into NUMERIC columns by receiving JSON with string attrs. - v = str(v) - json_object.additionalProperties.append( - bigquery.JsonObject.AdditionalProperty(key=k, value=to_json_value(v))) - return json_object - def _convert_cell_value_to_dict(self, value, field): if field.type == 'STRING': # Input: "XYZ" --> Output: "XYZ" @@ -1239,8 +1321,10 @@ def _get_source_location(self): def __enter__(self): self.client = BigQueryWrapper(client=self.test_bigquery_client) - self.client.create_temporary_dataset( - self.executing_project, location=self._get_source_location()) + if not self.client.is_user_configured_dataset(): + # Temp dataset was provided by the user so we do not have to create one. 
+ self.client.create_temporary_dataset( + self.executing_project, location=self._get_source_location()) return self def __exit__(self, exception_type, exception_value, traceback): @@ -1448,10 +1532,10 @@ def write(self, row): try: self._avro_writer.write(row) except (TypeError, ValueError) as ex: - raise_with_traceback( - ex.__class__( - "Error writing row to Avro: {}\nSchema: {}\nRow: {}".format( - ex, self._avro_writer.schema, row))) + _, _, tb = sys.exc_info() + raise ex.__class__( + "Error writing row to Avro: {}\nSchema: {}\nRow: {}".format( + ex, self._avro_writer.schema, row)).with_traceback(tb) class RetryStrategy(object): @@ -1578,7 +1662,7 @@ def get_dict_table_schema(schema): if (isinstance(schema, (dict, value_provider.ValueProvider)) or callable(schema) or schema is None): return schema - elif isinstance(schema, (str, unicode)): + elif isinstance(schema, str): table_schema = get_table_schema_from_string(schema) return table_schema_to_dict(table_schema) elif isinstance(schema, bigquery.TableSchema): @@ -1620,3 +1704,75 @@ def generate_bq_job_name(job_name, step_id, job_type, random=None): job_id=job_name.replace("-", ""), step_id=step_id, random=random) + + +def check_schema_equal( + left, right, *, ignore_descriptions=False, ignore_field_order=False): + # type: (Union[bigquery.TableSchema, bigquery.TableFieldSchema], Union[bigquery.TableSchema, bigquery.TableFieldSchema], bool, bool) -> bool + + """Check whether schemas are equivalent. + + This comparison function differs from using == to compare TableSchema + because it ignores categories, policy tags, descriptions (optionally), and + field ordering (optionally). + + Args: + left (~apache_beam.io.gcp.internal.clients.bigquery.\ +bigquery_v2_messages.TableSchema, ~apache_beam.io.gcp.internal.clients.\ +bigquery.bigquery_v2_messages.TableFieldSchema): + One schema to compare. + right (~apache_beam.io.gcp.internal.clients.bigquery.\ +bigquery_v2_messages.TableSchema, ~apache_beam.io.gcp.internal.clients.\ +bigquery.bigquery_v2_messages.TableFieldSchema): + The other schema to compare. + ignore_descriptions (bool): (optional) Whether or not to ignore field + descriptions when comparing. Defaults to False. + ignore_field_order (bool): (optional) Whether or not to ignore struct field + order when comparing. Defaults to False. + + Returns: + bool: True if the schemas are equivalent, False otherwise. 
+ """ + if type(left) != type(right) or not isinstance( + left, (bigquery.TableSchema, bigquery.TableFieldSchema)): + return False + + if isinstance(left, bigquery.TableFieldSchema): + if left.name != right.name: + return False + + if left.type != right.type: + # Check for type aliases + if sorted( + (left.type, right.type)) not in (["BOOL", "BOOLEAN"], ["FLOAT", + "FLOAT64"], + ["INT64", "INTEGER"], ["RECORD", + "STRUCT"]): + return False + + if left.mode != right.mode: + return False + + if not ignore_descriptions and left.description != right.description: + return False + + if isinstance(left, + bigquery.TableSchema) or left.type in ("RECORD", "STRUCT"): + if len(left.fields) != len(right.fields): + return False + + if ignore_field_order: + left_fields = sorted(left.fields, key=lambda field: field.name) + right_fields = sorted(right.fields, key=lambda field: field.name) + else: + left_fields = left.fields + right_fields = right.fields + + for left_field, right_field in zip(left_fields, right_fields): + if not check_schema_equal(left_field, + right_field, + ignore_descriptions=ignore_descriptions, + ignore_field_order=ignore_field_order): + return False + + return True diff --git a/sdks/python/apache_beam/io/gcp/bigquery_tools_test.py b/sdks/python/apache_beam/io/gcp/bigquery_tools_test.py index 777b5b2620b9..84ac2f76b970 100644 --- a/sdks/python/apache_beam/io/gcp/bigquery_tools_test.py +++ b/sdks/python/apache_beam/io/gcp/bigquery_tools_test.py @@ -17,8 +17,6 @@ # pytype: skip-file -from __future__ import absolute_import - import datetime import decimal import io @@ -30,24 +28,25 @@ import unittest import fastavro -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import,ungrouped-imports import mock import pytz -from future.utils import iteritems import apache_beam as beam from apache_beam.internal.gcp.json_value import to_json_value +from apache_beam.io.gcp import resource_identifiers from apache_beam.io.gcp.bigquery import TableRowJsonCoder from apache_beam.io.gcp.bigquery_tools import JSON_COMPLIANCE_ERROR from apache_beam.io.gcp.bigquery_tools import AvroRowWriter from apache_beam.io.gcp.bigquery_tools import BigQueryJobTypes from apache_beam.io.gcp.bigquery_tools import JsonRowWriter from apache_beam.io.gcp.bigquery_tools import RowAsDictJsonCoder +from apache_beam.io.gcp.bigquery_tools import check_schema_equal from apache_beam.io.gcp.bigquery_tools import generate_bq_job_name from apache_beam.io.gcp.bigquery_tools import parse_table_reference from apache_beam.io.gcp.bigquery_tools import parse_table_schema_from_json from apache_beam.io.gcp.internal.clients import bigquery +from apache_beam.metrics import monitoring_infos +from apache_beam.metrics.execution import MetricsEnvironment from apache_beam.options.pipeline_options import PipelineOptions from apache_beam.options.value_provider import StaticValueProvider @@ -55,9 +54,14 @@ # pylint: disable=wrong-import-order, wrong-import-position try: from apitools.base.py.exceptions import HttpError, HttpForbiddenError + from google.api_core.exceptions import ClientError, DeadlineExceeded + from google.api_core.exceptions import InternalServerError except ImportError: + ClientError = None + DeadlineExceeded = None HttpError = None HttpForbiddenError = None + InternalServerError = None # pylint: enable=wrong-import-order, wrong-import-position @@ -130,6 +134,17 @@ def test_calling_with_fully_qualified_table_ref(self): self.assertEqual(parsed_ref.datasetId, datasetId) 
self.assertEqual(parsed_ref.tableId, tableId) + def test_calling_with_hyphened_table_ref(self): + projectId = 'test_project' + datasetId = 'test_dataset' + tableId = 'test-table' + fully_qualified_table = '{}:{}.{}'.format(projectId, datasetId, tableId) + parsed_ref = parse_table_reference(fully_qualified_table) + self.assertIsInstance(parsed_ref, bigquery.TableReference) + self.assertEqual(parsed_ref.projectId, projectId) + self.assertEqual(parsed_ref.datasetId, datasetId) + self.assertEqual(parsed_ref.tableId, tableId) + def test_calling_with_partially_qualified_table_ref(self): datasetId = 'test_dataset' tableId = 'test_table' @@ -386,6 +401,91 @@ def test_get_query_location(self): project_id="second_project_id", query=query, use_legacy_sql=False) self.assertEqual("US", location) + def test_perform_load_job_source_mutual_exclusivity(self): + client = mock.Mock() + wrapper = beam.io.gcp.bigquery_tools.BigQueryWrapper(client) + + # Both source_uri and source_stream specified. + with self.assertRaises(ValueError): + wrapper.perform_load_job( + destination=parse_table_reference('project:dataset.table'), + job_id='job_id', + source_uris=['gs://example.com/*'], + source_stream=io.BytesIO()) + + # Neither source_uri nor source_stream specified. + with self.assertRaises(ValueError): + wrapper.perform_load_job(destination='P:D.T', job_id='J') + + def test_perform_load_job_with_source_stream(self): + client = mock.Mock() + wrapper = beam.io.gcp.bigquery_tools.BigQueryWrapper(client) + + wrapper.perform_load_job( + destination=parse_table_reference('project:dataset.table'), + job_id='job_id', + source_stream=io.BytesIO(b'some,data')) + + client.jobs.Insert.assert_called_once() + upload = client.jobs.Insert.call_args[1]["upload"] + self.assertEqual(b'some,data', upload.stream.read()) + + def verify_write_call_metric( + self, project_id, dataset_id, table_id, status, count): + """Check if an metric was recorded for the BQ IO write API call.""" + process_wide_monitoring_infos = list( + MetricsEnvironment.process_wide_container(). + to_runner_api_monitoring_infos(None).values()) + resource = resource_identifiers.BigQueryTable( + project_id, dataset_id, table_id) + labels = { + # TODO(ajamato): Add Ptransform label. + monitoring_infos.SERVICE_LABEL: 'BigQuery', + # Refer to any method which writes elements to BigQuery in batches + # as "BigQueryBatchWrite". I.e. storage API's insertAll, or future + # APIs introduced. + monitoring_infos.METHOD_LABEL: 'BigQueryBatchWrite', + monitoring_infos.RESOURCE_LABEL: resource, + monitoring_infos.BIGQUERY_PROJECT_ID_LABEL: project_id, + monitoring_infos.BIGQUERY_DATASET_LABEL: dataset_id, + monitoring_infos.BIGQUERY_TABLE_LABEL: table_id, + monitoring_infos.STATUS_LABEL: status, + } + expected_mi = monitoring_infos.int64_counter( + monitoring_infos.API_REQUEST_COUNT_URN, count, labels=labels) + expected_mi.ClearField("start_time") + + found = False + for actual_mi in process_wide_monitoring_infos: + actual_mi.ClearField("start_time") + if expected_mi == actual_mi: + found = True + break + self.assertTrue( + found, "Did not find write call metric with status: %s" % status) + + @unittest.skipIf(ClientError is None, 'GCP dependencies are not installed') + def test_insert_rows_sets_metric_on_failure(self): + MetricsEnvironment.process_wide_container().reset() + client = mock.Mock() + client.insert_rows_json = mock.Mock( + # Fail a few times, then succeed. 
+ side_effect=[ + DeadlineExceeded("Deadline Exceeded"), + InternalServerError("Internal Error"), + [], + ]) + wrapper = beam.io.gcp.bigquery_tools.BigQueryWrapper(client) + wrapper.insert_rows("my_project", "my_dataset", "my_table", []) + + # Expect two failing calls, then a success (i.e. two retries). + self.verify_write_call_metric( + "my_project", "my_dataset", "my_table", "deadline_exceeded", 1) + self.verify_write_call_metric( + "my_project", "my_dataset", "my_table", "internal", 1) + self.verify_write_call_metric( + "my_project", "my_dataset", "my_table", "ok", 1) + @unittest.skipIf(HttpError is None, 'GCP dependencies are not installed') class TestBigQueryReader(unittest.TestCase): @@ -814,9 +914,7 @@ def test_rows_are_written(self): client.tables.Get.return_value = table write_disposition = beam.io.BigQueryDisposition.WRITE_APPEND - insert_response = mock.Mock() - insert_response.insertErrors = [] - client.tabledata.InsertAll.return_value = insert_response + client.insert_rows_json.return_value = [] with beam.io.BigQuerySink( 'project:dataset.table', @@ -824,24 +922,11 @@ def test_rows_are_written(self): writer.Write({'i': 1, 'b': True, 's': 'abc', 'f': 3.14}) sample_row = {'i': 1, 'b': True, 's': 'abc', 'f': 3.14} - expected_rows = [] - json_object = bigquery.JsonObject() - for k, v in iteritems(sample_row): - json_object.additionalProperties.append( - bigquery.JsonObject.AdditionalProperty(key=k, value=to_json_value(v))) - expected_rows.append( - bigquery.TableDataInsertAllRequest.RowsValueListEntry( - insertId='_1', # First row ID generated with prefix '' - json=json_object)) - client.tabledata.InsertAll.assert_called_with( - bigquery.BigqueryTabledataInsertAllRequest( - projectId='project', - datasetId='dataset', - tableId='table', - tableDataInsertAllRequest=bigquery.TableDataInsertAllRequest( - rows=expected_rows, - skipInvalidRows=False, - ))) + client.insert_rows_json.assert_called_with( + '%s.%s.%s' % ('project', 'dataset', 'table'), + json_rows=[sample_row], + row_ids=['_1'], + skip_invalid_rows=True) def test_table_schema_without_project(self): # Writer should pick executing project by default. 
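The updated assertion above reflects that the sink now writes through google-cloud-bigquery's Client.insert_rows_json rather than building an apitools tabledata.InsertAll request; roughly, the call it expects looks like the sketch below (a simplification for orientation only, not the wrapper's actual code path).

from google.cloud import bigquery as gcp_bq

client = gcp_bq.Client(project='project')
# insert_rows_json takes a "project.dataset.table" string plus the rows as
# plain dicts, and returns a list of per-row error dicts; the empty list the
# test mocks as the return value means every row was accepted.
errors = client.insert_rows_json(
    'project.dataset.table',
    json_rows=[{'i': 1, 'b': True, 's': 'abc', 'f': 3.14}],
    row_ids=['_1'],
    skip_invalid_rows=True)
assert not errors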
@@ -984,6 +1069,88 @@ def test_matches_template(self): self.assertRegex(job_name, base_pattern) +@unittest.skipIf(HttpError is None, 'GCP dependencies are not installed') +class TestCheckSchemaEqual(unittest.TestCase): + def test_simple_schemas(self): + schema1 = bigquery.TableSchema(fields=[]) + self.assertTrue(check_schema_equal(schema1, schema1)) + + schema2 = bigquery.TableSchema( + fields=[ + bigquery.TableFieldSchema(name="a", mode="NULLABLE", type="INT64") + ]) + self.assertTrue(check_schema_equal(schema2, schema2)) + self.assertFalse(check_schema_equal(schema1, schema2)) + + schema3 = bigquery.TableSchema( + fields=[ + bigquery.TableFieldSchema( + name="b", + mode="REPEATED", + type="RECORD", + fields=[ + bigquery.TableFieldSchema( + name="c", mode="REQUIRED", type="BOOL") + ]) + ]) + self.assertTrue(check_schema_equal(schema3, schema3)) + self.assertFalse(check_schema_equal(schema2, schema3)) + + def test_field_order(self): + """Test that field order is ignored when ignore_field_order=True.""" + schema1 = bigquery.TableSchema( + fields=[ + bigquery.TableFieldSchema( + name="a", mode="REQUIRED", type="FLOAT64"), + bigquery.TableFieldSchema(name="b", mode="REQUIRED", type="INT64"), + ]) + + schema2 = bigquery.TableSchema(fields=list(reversed(schema1.fields))) + + self.assertFalse(check_schema_equal(schema1, schema2)) + self.assertTrue( + check_schema_equal(schema1, schema2, ignore_field_order=True)) + + def test_descriptions(self): + """ + Test that differences in description are ignored + when ignore_descriptions=True. + """ + schema1 = bigquery.TableSchema( + fields=[ + bigquery.TableFieldSchema( + name="a", + mode="REQUIRED", + type="FLOAT64", + description="Field A", + ), + bigquery.TableFieldSchema( + name="b", + mode="REQUIRED", + type="INT64", + ), + ]) + + schema2 = bigquery.TableSchema( + fields=[ + bigquery.TableFieldSchema( + name="a", + mode="REQUIRED", + type="FLOAT64", + description="Field A is for Apple"), + bigquery.TableFieldSchema( + name="b", + mode="REQUIRED", + type="INT64", + description="Field B", + ), + ]) + + self.assertFalse(check_schema_equal(schema1, schema2)) + self.assertTrue( + check_schema_equal(schema1, schema2, ignore_descriptions=True)) + + if __name__ == '__main__': logging.getLogger().setLevel(logging.INFO) unittest.main() diff --git a/sdks/python/apache_beam/io/gcp/bigquery_write_it_test.py b/sdks/python/apache_beam/io/gcp/bigquery_write_it_test.py index a5c1ce71e2a3..43bc7c9fa1ed 100644 --- a/sdks/python/apache_beam/io/gcp/bigquery_write_it_test.py +++ b/sdks/python/apache_beam/io/gcp/bigquery_write_it_test.py @@ -20,8 +20,6 @@ """Unit tests for BigQuery sources and sinks.""" # pytype: skip-file -from __future__ import absolute_import - import base64 import datetime import logging @@ -31,9 +29,9 @@ from decimal import Decimal import hamcrest as hc +import mock +import pytest import pytz -from future.utils import iteritems -from nose.plugins.attrib import attr import apache_beam as beam from apache_beam.io.gcp.bigquery_tools import BigQueryWrapper @@ -107,7 +105,7 @@ def create_table(self, table_name): projectId=self.project, datasetId=self.dataset_id, table=table) self.bigquery_client.client.tables.Insert(request) - @attr('IT') + @pytest.mark.it_postcommit def test_big_query_write(self): table_name = 'python_write_table' table_id = '{}.{}'.format(self.dataset_id, table_name) @@ -166,7 +164,7 @@ def test_big_query_write(self): create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED, 
write_disposition=beam.io.BigQueryDisposition.WRITE_EMPTY)) - @attr('IT') + @pytest.mark.it_postcommit def test_big_query_write_schema_autodetect(self): if self.runner_name == 'TestDataflowRunner': self.skipTest('DataflowRunner does not support schema autodetection') @@ -211,7 +209,7 @@ def test_big_query_write_schema_autodetect(self): write_disposition=beam.io.BigQueryDisposition.WRITE_EMPTY, temp_file_format=FileFormat.JSON)) - @attr('IT') + @pytest.mark.it_postcommit def test_big_query_write_new_types(self): table_name = 'python_new_types_table' table_id = '{}.{}'.format(self.dataset_id, table_name) @@ -229,7 +227,7 @@ def test_big_query_write_new_types(self): input_data = [row_data] # add rows with only one key value pair and None values for all other keys - for key, value in iteritems(row_data): + for key, value in row_data.items(): input_data.append({key: value}) table_schema = { @@ -292,7 +290,7 @@ def test_big_query_write_new_types(self): create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED, write_disposition=beam.io.BigQueryDisposition.WRITE_EMPTY)) - @attr('IT') + @pytest.mark.it_postcommit def test_big_query_write_without_schema(self): table_name = 'python_no_schema_table' self.create_table(table_name) @@ -354,6 +352,72 @@ def test_big_query_write_without_schema(self): write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND, temp_file_format=FileFormat.JSON)) + @pytest.mark.it_postcommit + @mock.patch( + "apache_beam.io.gcp.bigquery_file_loads._MAXIMUM_SOURCE_URIS", new=1) + def test_big_query_write_temp_table_append_schema_update(self): + """ + Test that schema update options are respected when appending to an existing + table via temporary tables. + + _MAXIMUM_SOURCE_URIS and max_file_size are both set to 1 to force multiple + load jobs and usage of temporary tables. 
+ """ + table_name = 'python_append_schema_update' + self.create_table(table_name) + table_id = '{}.{}'.format(self.dataset_id, table_name) + + input_data = [{ + "int64": num, "bool": True, "nested_field": { + "fruit": "Apple" + } + } for num in range(1, 3)] + + table_schema = { + "fields": [{ + "name": "int64", "type": "INT64" + }, { + "name": "bool", "type": "BOOL" + }, + { + "name": "nested_field", + "type": "RECORD", + "mode": "REPEATED", + "fields": [ + { + "name": "fruit", + "type": "STRING", + "mode": "NULLABLE" + }, + ] + }] + } + + args = self.test_pipeline.get_full_options_as_args( + on_success_matcher=BigqueryFullResultMatcher( + project=self.project, + query=""" + SELECT bytes, date, time, int64, bool, fruit + FROM %s, + UNNEST(nested_field) as nested_field + ORDER BY int64 + """ % table_id, + data=[(None, None, None, num, True, "Apple") + for num in range(1, 3)])) + + with beam.Pipeline(argv=args) as p: + # pylint: disable=expression-not-assigned + ( + p | 'create' >> beam.Create(input_data) + | 'write' >> beam.io.WriteToBigQuery( + table_id, + schema=table_schema, + write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND, + max_file_size=1, # bytes + method=beam.io.WriteToBigQuery.Method.FILE_LOADS, + additional_bq_parameters={ + 'schemaUpdateOptions': ['ALLOW_FIELD_ADDITION']})) + if __name__ == '__main__': logging.getLogger().setLevel(logging.INFO) diff --git a/sdks/python/apache_beam/io/gcp/bigquery_write_perf_test.py b/sdks/python/apache_beam/io/gcp/bigquery_write_perf_test.py index 280d25a34b43..3936923c22e3 100644 --- a/sdks/python/apache_beam/io/gcp/bigquery_write_perf_test.py +++ b/sdks/python/apache_beam/io/gcp/bigquery_write_perf_test.py @@ -51,8 +51,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging from apache_beam import Map diff --git a/sdks/python/apache_beam/io/gcp/bigtableio.py b/sdks/python/apache_beam/io/gcp/bigtableio.py index 36c886185b5f..4887c11f5b67 100644 --- a/sdks/python/apache_beam/io/gcp/bigtableio.py +++ b/sdks/python/apache_beam/io/gcp/bigtableio.py @@ -37,8 +37,6 @@ """ # pytype: skip-file -from __future__ import absolute_import - import logging import apache_beam as beam diff --git a/sdks/python/apache_beam/io/gcp/datastore/__init__.py b/sdks/python/apache_beam/io/gcp/datastore/__init__.py index f4f43cbb1236..cce3acad34a4 100644 --- a/sdks/python/apache_beam/io/gcp/datastore/__init__.py +++ b/sdks/python/apache_beam/io/gcp/datastore/__init__.py @@ -14,4 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. # -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/io/gcp/datastore/v1new/__init__.py b/sdks/python/apache_beam/io/gcp/datastore/v1new/__init__.py index f4f43cbb1236..cce3acad34a4 100644 --- a/sdks/python/apache_beam/io/gcp/datastore/v1new/__init__.py +++ b/sdks/python/apache_beam/io/gcp/datastore/v1new/__init__.py @@ -14,4 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. 
# -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/io/gcp/datastore/v1new/adaptive_throttler.py b/sdks/python/apache_beam/io/gcp/datastore/v1new/adaptive_throttler.py index 9ae046ebb62c..700a68a5af03 100644 --- a/sdks/python/apache_beam/io/gcp/datastore/v1new/adaptive_throttler.py +++ b/sdks/python/apache_beam/io/gcp/datastore/v1new/adaptive_throttler.py @@ -21,11 +21,7 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import random -from builtins import object from apache_beam.io.gcp.datastore.v1new import util diff --git a/sdks/python/apache_beam/io/gcp/datastore/v1new/adaptive_throttler_test.py b/sdks/python/apache_beam/io/gcp/datastore/v1new/adaptive_throttler_test.py index cd45b5c6ec7a..98a75fe26dfd 100644 --- a/sdks/python/apache_beam/io/gcp/datastore/v1new/adaptive_throttler_test.py +++ b/sdks/python/apache_beam/io/gcp/datastore/v1new/adaptive_throttler_test.py @@ -17,11 +17,7 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import unittest -from builtins import range from mock import patch diff --git a/sdks/python/apache_beam/io/gcp/datastore/v1new/datastore_write_it_pipeline.py b/sdks/python/apache_beam/io/gcp/datastore/v1new/datastore_write_it_pipeline.py index 506f4ab50af0..f56443fe6ed8 100644 --- a/sdks/python/apache_beam/io/gcp/datastore/v1new/datastore_write_it_pipeline.py +++ b/sdks/python/apache_beam/io/gcp/datastore/v1new/datastore_write_it_pipeline.py @@ -29,8 +29,6 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import hashlib import logging @@ -118,7 +116,7 @@ def run(argv=None): | 'Input' >> beam.Create(list(range(num_entities))) | 'To String' >> beam.Map(str) | 'To Entity' >> beam.Map(EntityWrapper(kind, ancestor_key).make_entity) - | 'Write to Datastore' >> WriteToDatastore(project)) + | 'Write to Datastore' >> WriteToDatastore(project, hint_num_workers=1)) p.run() query = Query(kind=kind, project=project, ancestor=ancestor_key) @@ -155,7 +153,7 @@ def run(argv=None): _ = ( entities | 'To Keys' >> beam.Map(lambda entity: entity.key) - | 'delete entities' >> DeleteFromDatastore(project)) + | 'delete entities' >> DeleteFromDatastore(project, hint_num_workers=1)) p.run() diff --git a/sdks/python/apache_beam/io/gcp/datastore/v1new/datastore_write_it_test.py b/sdks/python/apache_beam/io/gcp/datastore/v1new/datastore_write_it_test.py index 7b3c01b2b72c..abecd5b6a4cf 100644 --- a/sdks/python/apache_beam/io/gcp/datastore/v1new/datastore_write_it_test.py +++ b/sdks/python/apache_beam/io/gcp/datastore/v1new/datastore_write_it_test.py @@ -27,15 +27,13 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import random import unittest from datetime import datetime +import pytest from hamcrest.core.core.allof import all_of -from nose.plugins.attrib import attr from apache_beam.testing.pipeline_verifiers import PipelineStateMatcher from apache_beam.testing.test_pipeline import TestPipeline @@ -68,7 +66,7 @@ def run_datastore_write(self, limit=None): datastore_write_it_pipeline.run( test_pipeline.get_full_options_as_args(**extra_opts)) - @attr('IT') + @pytest.mark.it_postcommit @unittest.skipIf( datastore_write_it_pipeline is None, 'GCP dependencies are not installed') def test_datastore_write_limit(self): diff --git a/sdks/python/apache_beam/io/gcp/datastore/v1new/datastoreio.py b/sdks/python/apache_beam/io/gcp/datastore/v1new/datastoreio.py index 065e5d8830ff..34539ece7fa6 100644 --- 
a/sdks/python/apache_beam/io/gcp/datastore/v1new/datastoreio.py +++ b/sdks/python/apache_beam/io/gcp/datastore/v1new/datastoreio.py @@ -23,19 +23,19 @@ """ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import logging import time -from builtins import round from apache_beam import typehints +from apache_beam.internal.metrics.metric import ServiceCallMetric +from apache_beam.io.gcp import resource_identifiers from apache_beam.io.gcp.datastore.v1new import helper from apache_beam.io.gcp.datastore.v1new import query_splitter from apache_beam.io.gcp.datastore.v1new import types from apache_beam.io.gcp.datastore.v1new import util from apache_beam.io.gcp.datastore.v1new.adaptive_throttler import AdaptiveThrottler +from apache_beam.io.gcp.datastore.v1new.rampup_throttling_fn import RampupThrottlingFn +from apache_beam.metrics import monitoring_infos from apache_beam.metrics.metric import Metrics from apache_beam.transforms import Create from apache_beam.transforms import DoFn @@ -44,6 +44,14 @@ from apache_beam.transforms import Reshuffle from apache_beam.utils import retry +# Protect against environments where datastore library is not available. +# pylint: disable=wrong-import-order, wrong-import-position +try: + from apitools.base.py.exceptions import HttpError + from google.api_core.exceptions import ClientError, GoogleAPICallError +except ImportError: + pass + __all__ = ['ReadFromDatastore', 'WriteToDatastore', 'DeleteFromDatastore'] _LOGGER = logging.getLogger(__name__) @@ -268,10 +276,33 @@ def get_estimated_num_splits(client, query): class _QueryFn(DoFn): """A DoFn that fetches entities from Cloud Datastore, for a given query.""" def process(self, query, *unused_args, **unused_kwargs): + if query.namespace is None: + query.namespace = '' _client = helper.get_client(query.project, query.namespace) client_query = query._to_client_query(_client) - for client_entity in client_query.fetch(query.limit): - yield types.Entity.from_client_entity(client_entity) + # Create request count metric + resource = resource_identifiers.DatastoreNamespace( + query.project, query.namespace) + labels = { + monitoring_infos.SERVICE_LABEL: 'Datastore', + monitoring_infos.METHOD_LABEL: 'BatchDatastoreRead', + monitoring_infos.RESOURCE_LABEL: resource, + monitoring_infos.DATASTORE_NAMESPACE_LABEL: query.namespace, + monitoring_infos.DATASTORE_PROJECT_ID_LABEL: query.project, + monitoring_infos.STATUS_LABEL: 'ok' + } + service_call_metric = ServiceCallMetric( + request_count_urn=monitoring_infos.API_REQUEST_COUNT_URN, + base_labels=labels) + try: + for client_entity in client_query.fetch(query.limit): + yield types.Entity.from_client_entity(client_entity) + service_call_metric.call('ok') + except (ClientError, GoogleAPICallError) as e: + # e.code.value contains the numeric http status code. + service_call_metric.call(e.code.value) + except HttpError as e: + service_call_metric.call(e) class _Mutate(PTransform): @@ -280,15 +311,33 @@ class _Mutate(PTransform): Only idempotent Datastore mutation operations (upsert and delete) are supported, as the commits are retried when failures occur. """ - def __init__(self, mutate_fn): + + # Default hint for the expected number of workers in the ramp-up throttling + # step for write or delete operations. + _DEFAULT_HINT_NUM_WORKERS = 500 + + def __init__( + self, + mutate_fn, + throttle_rampup=True, + hint_num_workers=_DEFAULT_HINT_NUM_WORKERS): """Initializes a Mutate transform. 
Args: mutate_fn: Instance of `DatastoreMutateFn` to use. + throttle_rampup: Whether to enforce a gradual ramp-up. + hint_num_workers: A hint for the expected number of workers, used to + estimate appropriate limits during ramp-up throttling. """ self._mutate_fn = mutate_fn + self._throttle_rampup = throttle_rampup + self._hint_num_workers = hint_num_workers def expand(self, pcoll): + if self._throttle_rampup: + throttling_fn = RampupThrottlingFn(self._hint_num_workers) + pcoll = ( + pcoll | 'Enforce throttling during ramp-up' >> ParDo(throttling_fn)) return pcoll | 'Write Batch to Datastore' >> ParDo(self._mutate_fn) class DatastoreMutateFn(DoFn): @@ -380,17 +429,39 @@ def write_mutations(self, throttler, rpc_stats_callback, throttle_delay=1): for element in self._batch_elements: self.add_to_batch(element) + # Create request count metric + resource = resource_identifiers.DatastoreNamespace(self._project, "") + labels = { + monitoring_infos.SERVICE_LABEL: 'Datastore', + monitoring_infos.METHOD_LABEL: 'BatchDatastoreWrite', + monitoring_infos.RESOURCE_LABEL: resource, + monitoring_infos.DATASTORE_NAMESPACE_LABEL: "", + monitoring_infos.DATASTORE_PROJECT_ID_LABEL: self._project, + monitoring_infos.STATUS_LABEL: 'ok' + } + + service_call_metric = ServiceCallMetric( + request_count_urn=monitoring_infos.API_REQUEST_COUNT_URN, + base_labels=labels) + try: start_time = time.time() self._batch.commit() end_time = time.time() + service_call_metric.call('ok') rpc_stats_callback(successes=1) throttler.successful_request(start_time * 1000) commit_time_ms = int((end_time - start_time) * 1000) return commit_time_ms - except Exception: + except (ClientError, GoogleAPICallError) as e: self._batch = None + # e.code.value contains the numeric http status code. + service_call_metric.call(e.code.value) + rpc_stats_callback(errors=1) + raise + except HttpError as e: + service_call_metric.call(e) rpc_stats_callback(errors=1) raise @@ -444,14 +515,22 @@ class WriteToDatastore(_Mutate): property key is empty then it is filled with the project ID passed to this transform. """ - def __init__(self, project): + def __init__( + self, + project, + throttle_rampup=True, + hint_num_workers=_Mutate._DEFAULT_HINT_NUM_WORKERS): """Initialize the `WriteToDatastore` transform. Args: project: (:class:`str`) The ID of the project to write entities to. + throttle_rampup: Whether to enforce a gradual ramp-up. + hint_num_workers: A hint for the expected number of workers, used to + estimate appropriate limits during ramp-up throttling. """ mutate_fn = WriteToDatastore._DatastoreWriteFn(project) - super(WriteToDatastore, self).__init__(mutate_fn) + super(WriteToDatastore, + self).__init__(mutate_fn, throttle_rampup, hint_num_workers) class _DatastoreWriteFn(_Mutate.DatastoreMutateFn): def element_to_client_batch_item(self, element): @@ -489,15 +568,23 @@ class DeleteFromDatastore(_Mutate): project ID passed to this transform. If ``project`` field in key is empty then it is filled with the project ID passed to this transform. """ - def __init__(self, project): + def __init__( + self, + project, + throttle_rampup=True, + hint_num_workers=_Mutate._DEFAULT_HINT_NUM_WORKERS): """Initialize the `DeleteFromDatastore` transform. Args: project: (:class:`str`) The ID of the project from which the entities will be deleted. + throttle_rampup: Whether to enforce a gradual ramp-up. + hint_num_workers: A hint for the expected number of workers, used to + estimate appropriate limits during ramp-up throttling. 
""" mutate_fn = DeleteFromDatastore._DatastoreDeleteFn(project) - super(DeleteFromDatastore, self).__init__(mutate_fn) + super(DeleteFromDatastore, + self).__init__(mutate_fn, throttle_rampup, hint_num_workers) class _DatastoreDeleteFn(_Mutate.DatastoreMutateFn): def element_to_client_batch_item(self, element): diff --git a/sdks/python/apache_beam/io/gcp/datastore/v1new/datastoreio_test.py b/sdks/python/apache_beam/io/gcp/datastore/v1new/datastoreio_test.py index 10922e28788e..603bcd018c3d 100644 --- a/sdks/python/apache_beam/io/gcp/datastore/v1new/datastoreio_test.py +++ b/sdks/python/apache_beam/io/gcp/datastore/v1new/datastoreio_test.py @@ -19,10 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import datetime import math import unittest @@ -34,6 +30,7 @@ # Protect against environments where datastore library is not available. try: + from apache_beam.io.gcp import resource_identifiers from apache_beam.io.gcp.datastore.v1new import helper, util from apache_beam.io.gcp.datastore.v1new import query_splitter from apache_beam.io.gcp.datastore.v1new import datastoreio @@ -41,6 +38,8 @@ from apache_beam.io.gcp.datastore.v1new.datastoreio import ReadFromDatastore from apache_beam.io.gcp.datastore.v1new.datastoreio import WriteToDatastore from apache_beam.io.gcp.datastore.v1new.types import Key + from apache_beam.metrics import monitoring_infos + from apache_beam.metrics.execution import MetricsEnvironment from google.cloud.datastore import client from google.cloud.datastore import entity from google.cloud.datastore import helpers @@ -113,7 +112,7 @@ def test_write_mutations_no_errors(self): mock_throttler = MagicMock() rpc_stats_callback = MagicMock() mock_throttler.throttle_request.return_value = [] - mutate = datastoreio._Mutate.DatastoreMutateFn(lambda: None) + mutate = datastoreio._Mutate.DatastoreMutateFn("") mutate._batch = mock_batch mutate.write_mutations(mock_throttler, rpc_stats_callback) rpc_stats_callback.assert_has_calls([ @@ -130,7 +129,7 @@ def test_write_mutations_reconstruct_on_error(self, unused_sleep): mock_throttler = MagicMock() rpc_stats_callback = MagicMock() mock_throttler.throttle_request.return_value = [] - mutate = datastoreio._Mutate.DatastoreMutateFn(lambda: None) + mutate = datastoreio._Mutate.DatastoreMutateFn("") mutate._batch = mock_batch mutate._client = MagicMock() mutate._batch_elements = [None] @@ -153,7 +152,7 @@ def test_write_mutations_throttle_delay_retryable_error(self, unused_sleep): # First try: throttle once [True, False] # Second try: no throttle [False] mock_throttler.throttle_request.side_effect = [True, False, False] - mutate = datastoreio._Mutate.DatastoreMutateFn(lambda: None) + mutate = datastoreio._Mutate.DatastoreMutateFn("") mutate._batch = mock_batch mutate._batch_elements = [] mutate._client = MagicMock() @@ -174,13 +173,57 @@ def test_write_mutations_non_retryable_error(self): mock_throttler = MagicMock() rpc_stats_callback = MagicMock() mock_throttler.throttle_request.return_value = False - mutate = datastoreio._Mutate.DatastoreMutateFn(lambda: None) + mutate = datastoreio._Mutate.DatastoreMutateFn("") mutate._batch = mock_batch with self.assertRaises(exceptions.InvalidArgument): mutate.write_mutations( mock_throttler, rpc_stats_callback, throttle_delay=0) rpc_stats_callback.assert_called_once_with(errors=1) + def test_write_mutations_metric_on_failure(self): + MetricsEnvironment.process_wide_container().reset() + mock_batch = MagicMock() + 
mock_batch.commit.side_effect = [ + exceptions.DeadlineExceeded("Deadline Exceeded"), [] + ] + mock_throttler = MagicMock() + rpc_stats_callback = MagicMock() + mock_throttler.throttle_request.return_value = False + mutate = datastoreio._Mutate.DatastoreMutateFn("my_project") + mutate._batch = mock_batch + mutate._batch_elements = [] + mutate._client = MagicMock() + mutate.write_mutations(mock_throttler, rpc_stats_callback, throttle_delay=0) + self.verify_write_call_metric("my_project", "", "deadline_exceeded", 1) + self.verify_write_call_metric("my_project", "", "ok", 1) + + def verify_write_call_metric(self, project_id, namespace, status, count): + """Check if a metric was recorded for the Datastore IO write API call.""" + process_wide_monitoring_infos = list( + MetricsEnvironment.process_wide_container(). + to_runner_api_monitoring_infos(None).values()) + resource = resource_identifiers.DatastoreNamespace(project_id, namespace) + labels = { + monitoring_infos.SERVICE_LABEL: 'Datastore', + monitoring_infos.METHOD_LABEL: 'BatchDatastoreWrite', + monitoring_infos.RESOURCE_LABEL: resource, + monitoring_infos.DATASTORE_NAMESPACE_LABEL: namespace, + monitoring_infos.DATASTORE_PROJECT_ID_LABEL: project_id, + monitoring_infos.STATUS_LABEL: status + } + expected_mi = monitoring_infos.int64_counter( + monitoring_infos.API_REQUEST_COUNT_URN, count, labels=labels) + expected_mi.ClearField("start_time") + + found = False + for actual_mi in process_wide_monitoring_infos: + actual_mi.ClearField("start_time") + if expected_mi == actual_mi: + found = True + break + self.assertTrue( + found, "Did not find write call metric with status: %s" % status) + @unittest.skipIf(client is None, 'Datastore dependencies are not installed') class DatastoreioTest(unittest.TestCase): @@ -274,6 +317,52 @@ def test_SplitQueryFn_with_exception(self): self.assertEqual(expected_num_splits, len(split_queries)) self.assertEqual(self._mock_query, split_queries[0]) + def test_QueryFn_metric_on_failure(self): + MetricsEnvironment.process_wide_container().reset() + with patch.object(helper, 'get_client', return_value=self._mock_client): + self._mock_query.project = self._PROJECT + self._mock_query.namespace = self._NAMESPACE + _query_fn = ReadFromDatastore._QueryFn() + client_query = self._mock_query._to_client_query() + # Test with exception + client_query.fetch.side_effect = [ + exceptions.DeadlineExceeded("Deadline exceed") + ] + list(_query_fn.process(self._mock_query)) + self.verify_read_call_metric( + self._PROJECT, self._NAMESPACE, "deadline_exceeded", 1) + # Test success + client_query.fetch.side_effect = [[]] + list(_query_fn.process(self._mock_query)) + self.verify_read_call_metric(self._PROJECT, self._NAMESPACE, "ok", 1) + + def verify_read_call_metric(self, project_id, namespace, status, count): + """Check if a metric was recorded for the Datastore IO read API call.""" + process_wide_monitoring_infos = list( + MetricsEnvironment.process_wide_container(). 
+ to_runner_api_monitoring_infos(None).values()) + resource = resource_identifiers.DatastoreNamespace(project_id, namespace) + labels = { + monitoring_infos.SERVICE_LABEL: 'Datastore', + monitoring_infos.METHOD_LABEL: 'BatchDatastoreRead', + monitoring_infos.RESOURCE_LABEL: resource, + monitoring_infos.DATASTORE_NAMESPACE_LABEL: namespace, + monitoring_infos.DATASTORE_PROJECT_ID_LABEL: project_id, + monitoring_infos.STATUS_LABEL: status + } + expected_mi = monitoring_infos.int64_counter( + monitoring_infos.API_REQUEST_COUNT_URN, count, labels=labels) + expected_mi.ClearField("start_time") + + found = False + for actual_mi in process_wide_monitoring_infos: + actual_mi.ClearField("start_time") + if expected_mi == actual_mi: + found = True + break + self.assertTrue( + found, "Did not find read call metric with status: %s" % status) + def check_DatastoreWriteFn(self, num_entities): """A helper function to test _DatastoreWriteFn.""" with patch.object(helper, 'get_client', return_value=self._mock_client): diff --git a/sdks/python/apache_beam/io/gcp/datastore/v1new/helper.py b/sdks/python/apache_beam/io/gcp/datastore/v1new/helper.py index 679e0528e012..a6f8ef594695 100644 --- a/sdks/python/apache_beam/io/gcp/datastore/v1new/helper.py +++ b/sdks/python/apache_beam/io/gcp/datastore/v1new/helper.py @@ -23,19 +23,19 @@ # pytype: skip-file -from __future__ import absolute_import - import os import uuid -from builtins import range from typing import List from typing import Union from google.api_core import exceptions +from google.api_core.gapic_v1 import client_info from google.cloud import environment_vars +from google.cloud.datastore import __version__ from google.cloud.datastore import client from apache_beam.io.gcp.datastore.v1new import types +from apache_beam.version import __version__ as beam_version from cachetools.func import ttl_cache # https://cloud.google.com/datastore/docs/concepts/errors#error_codes @@ -50,7 +50,12 @@ @ttl_cache(maxsize=128, ttl=3600) def get_client(project, namespace): """Returns a Cloud Datastore client.""" - _client = client.Client(project=project, namespace=namespace) + _client_info = client_info.ClientInfo( + client_library_version=__version__, + gapic_version=__version__, + user_agent=f'beam-python-sdk/{beam_version}') + _client = client.Client( + project=project, namespace=namespace, client_info=_client_info) # Avoid overwriting user setting. 
BEAM-7608 if not os.environ.get(environment_vars.GCD_HOST, None): _client.base_url = 'https://batch-datastore.googleapis.com' # BEAM-1387 diff --git a/sdks/python/apache_beam/io/gcp/datastore/v1new/query_splitter.py b/sdks/python/apache_beam/io/gcp/datastore/v1new/query_splitter.py index 2c8933b7e34b..8ac5b931e7a2 100644 --- a/sdks/python/apache_beam/io/gcp/datastore/v1new/query_splitter.py +++ b/sdks/python/apache_beam/io/gcp/datastore/v1new/query_splitter.py @@ -22,15 +22,6 @@ """ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - -from builtins import range -from builtins import round - -from past.builtins import long -from past.builtins import unicode - from apache_beam.io.gcp.datastore.v1new import types from apache_beam.options.value_provider import ValueProvider @@ -144,10 +135,10 @@ class IdOrName(object): """ def __init__(self, id_or_name): self.id_or_name = id_or_name - if isinstance(id_or_name, (str, unicode)): + if isinstance(id_or_name, str): self.id = None self.name = id_or_name - elif isinstance(id_or_name, (int, long)): + elif isinstance(id_or_name, int): self.id = id_or_name self.name = None else: diff --git a/sdks/python/apache_beam/io/gcp/datastore/v1new/query_splitter_test.py b/sdks/python/apache_beam/io/gcp/datastore/v1new/query_splitter_test.py index a9b2b65c2ff9..b26651e9066e 100644 --- a/sdks/python/apache_beam/io/gcp/datastore/v1new/query_splitter_test.py +++ b/sdks/python/apache_beam/io/gcp/datastore/v1new/query_splitter_test.py @@ -19,12 +19,8 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import import mock # Protect against environments where datastore library is not available. diff --git a/sdks/python/apache_beam/io/gcp/datastore/v1new/rampup_throttling_fn.py b/sdks/python/apache_beam/io/gcp/datastore/v1new/rampup_throttling_fn.py new file mode 100644 index 000000000000..bf54401e4957 --- /dev/null +++ b/sdks/python/apache_beam/io/gcp/datastore/v1new/rampup_throttling_fn.py @@ -0,0 +1,97 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import datetime +import logging +import time +from typing import TypeVar + +from apache_beam import typehints +from apache_beam.io.gcp.datastore.v1new import util +from apache_beam.metrics.metric import Metrics +from apache_beam.transforms import DoFn +from apache_beam.utils.retry import FuzzedExponentialIntervals + +T = TypeVar('T') + +_LOG = logging.getLogger(__name__) + + +@typehints.with_input_types(T) +@typehints.with_output_types(T) +class RampupThrottlingFn(DoFn): + """A ``DoFn`` that throttles ramp-up following an exponential function. 
+ + An implementation of a client-side throttler that enforces a gradual ramp-up, + broadly in line with Datastore best practices. See also + https://cloud.google.com/datastore/docs/best-practices#ramping_up_traffic. + """ + + _BASE_BUDGET = 500 + _RAMP_UP_INTERVAL = datetime.timedelta(minutes=5) + + def __init__(self, num_workers, *unused_args, **unused_kwargs): + """Initializes a ramp-up throttler transform. + + Args: + num_workers: A hint for the expected number of workers, used to derive + the local rate limit. + """ + super(RampupThrottlingFn, self).__init__(*unused_args, **unused_kwargs) + self._num_workers = num_workers + self._successful_ops = util.MovingSum(window_ms=1000, bucket_ms=1000) + self._first_instant = datetime.datetime.now() + self._throttled_secs = Metrics.counter( + RampupThrottlingFn, "cumulativeThrottlingSeconds") + + def _calc_max_ops_budget( + self, + first_instant: datetime.datetime, + current_instant: datetime.datetime): + """Function that returns per-second budget according to best practices. + + The exact function is `500 / num_workers * 1.5^max(0, (x-5)/5)`, where x is + the number of minutes since start time. + """ + timedelta_since_first = current_instant - first_instant + growth = max( + 0.0, (timedelta_since_first - self._RAMP_UP_INTERVAL) / + self._RAMP_UP_INTERVAL) + max_ops_budget = int(self._BASE_BUDGET / self._num_workers * (1.5**growth)) + return max(1, max_ops_budget) + + def process(self, element, **kwargs): + backoff = iter( + FuzzedExponentialIntervals(initial_delay_secs=1, num_retries=10000)) + + while True: + instant = datetime.datetime.now() + max_ops_budget = self._calc_max_ops_budget(self._first_instant, instant) + current_op_count = self._successful_ops.sum(instant.timestamp() * 1000) + available_ops = max_ops_budget - current_op_count + + if available_ops > 0: + self._successful_ops.add(instant.timestamp() * 1000, 1) + yield element + break + else: + backoff_secs = next(backoff) + _LOG.info( + 'Delaying by %sms to conform to gradual ramp-up.', + int(1000 * backoff_secs)) + time.sleep(backoff_secs) + self._throttled_secs.inc(int(backoff_secs)) diff --git a/sdks/python/apache_beam/io/gcp/datastore/v1new/rampup_throttling_fn_test.py b/sdks/python/apache_beam/io/gcp/datastore/v1new/rampup_throttling_fn_test.py new file mode 100644 index 000000000000..6c4783d1d7c6 --- /dev/null +++ b/sdks/python/apache_beam/io/gcp/datastore/v1new/rampup_throttling_fn_test.py @@ -0,0 +1,62 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
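A quick sanity check of _calc_max_ops_budget above: for a single worker the per-second budget is 500 * 1.5 ** max(0, (t - 5) / 5) with t in minutes since start, which reproduces the schedule asserted by the new unit test below.

# Hand-computed budget(t) for num_workers=1, t in minutes since start:
for minutes, expected in [(0, 500), (5, 500), (10, 750), (15, 1125),
                          (30, 3796), (60, 43248)]:
  budget = max(1, int(500 * 1.5 ** max(0.0, (minutes - 5) / 5)))
  assert budget == expected, (minutes, budget)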
+# + +import datetime +import unittest + +from mock import patch + +from apache_beam.io.gcp.datastore.v1new.rampup_throttling_fn import RampupThrottlingFn + +DATE_ZERO = datetime.datetime( + year=1970, month=1, day=1, tzinfo=datetime.timezone.utc) + + +class _RampupDelayException(Exception): + pass + + +class RampupThrottlerTransformTest(unittest.TestCase): + @patch('datetime.datetime') + @patch('time.sleep') + def test_rampup_throttling(self, mock_sleep, mock_datetime): + mock_datetime.now.return_value = DATE_ZERO + throttling_fn = RampupThrottlingFn(num_workers=1) + rampup_schedule = [ + (DATE_ZERO + datetime.timedelta(seconds=0), 500), + (DATE_ZERO + datetime.timedelta(milliseconds=1), 0), + (DATE_ZERO + datetime.timedelta(seconds=1), 500), + (DATE_ZERO + datetime.timedelta(seconds=1, milliseconds=1), 0), + (DATE_ZERO + datetime.timedelta(minutes=5), 500), + (DATE_ZERO + datetime.timedelta(minutes=10), 750), + (DATE_ZERO + datetime.timedelta(minutes=15), 1125), + (DATE_ZERO + datetime.timedelta(minutes=30), 3796), + (DATE_ZERO + datetime.timedelta(minutes=60), 43248), + ] + + mock_sleep.side_effect = _RampupDelayException() + for date, expected_budget in rampup_schedule: + mock_datetime.now.return_value = date + for _ in range(expected_budget): + next(throttling_fn.process(None)) + # Delay after budget is exhausted + with self.assertRaises(_RampupDelayException): + next(throttling_fn.process(None)) + + +if __name__ == '__main__': + unittest.main() diff --git a/sdks/python/apache_beam/io/gcp/datastore/v1new/types.py b/sdks/python/apache_beam/io/gcp/datastore/v1new/types.py index ab9a18dd637c..2b09d0f885ee 100644 --- a/sdks/python/apache_beam/io/gcp/datastore/v1new/types.py +++ b/sdks/python/apache_beam/io/gcp/datastore/v1new/types.py @@ -21,8 +21,6 @@ # pytype: skip-file -from __future__ import absolute_import - import copy from typing import Iterable from typing import List diff --git a/sdks/python/apache_beam/io/gcp/datastore/v1new/types_test.py b/sdks/python/apache_beam/io/gcp/datastore/v1new/types_test.py index c92da4200bca..29cfcb12fca2 100644 --- a/sdks/python/apache_beam/io/gcp/datastore/v1new/types_test.py +++ b/sdks/python/apache_beam/io/gcp/datastore/v1new/types_test.py @@ -19,14 +19,10 @@ # pytype: skip-file -from __future__ import absolute_import - import datetime import logging import unittest -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import import mock # Protect against environments where datastore library is not available. diff --git a/sdks/python/apache_beam/io/gcp/datastore/v1new/util.py b/sdks/python/apache_beam/io/gcp/datastore/v1new/util.py index 207003abeb26..06a22143f59d 100644 --- a/sdks/python/apache_beam/io/gcp/datastore/v1new/util.py +++ b/sdks/python/apache_beam/io/gcp/datastore/v1new/util.py @@ -21,15 +21,10 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import math -from builtins import object -from builtins import range # Constants used in batched mutation RPCs: -WRITE_BATCH_INITIAL_SIZE = 200 +WRITE_BATCH_INITIAL_SIZE = 50 # Max allowed Datastore writes per batch, and max bytes per batch. 
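The retuned batching constants (initial size 50 above, with the minimum size and target latency updated just below) drive the dynamic batch sizer; assuming it targets roughly WRITE_BATCH_TARGET_LATENCY_MS divided by the observed per-element commit latency, clamped to the min/max bounds, the new 120 and 240 expectations in util_test.py fall out directly:

# Back-of-the-envelope check of the updated util_test expectations,
# assuming batch_size ~= TARGET_LATENCY_MS / (commit latency per element).
TARGET_LATENCY_MS = 6000

# test_slow_queries: two commits of 200 elements at 10000 ms each
#   -> 20000 ms / 400 elements = 50 ms per element -> 6000 / 50 = 120
assert TARGET_LATENCY_MS / (20000 / 400) == 120

# test_sliding_window: the surviving window holds two commits of 200 elements
#   at 5000 ms each -> 25 ms per element -> 6000 / 25 = 240
assert TARGET_LATENCY_MS / (10000 / 400) == 240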
# Note that the max bytes per batch set here is lower than the 10MB limit # actually enforced by the API, to leave space for the CommitRequest wrapper @@ -37,8 +32,8 @@ # https://cloud.google.com/datastore/docs/concepts/limits WRITE_BATCH_MAX_SIZE = 500 WRITE_BATCH_MAX_BYTES_SIZE = 9000000 -WRITE_BATCH_MIN_SIZE = 10 -WRITE_BATCH_TARGET_LATENCY_MS = 5000 +WRITE_BATCH_MIN_SIZE = 5 +WRITE_BATCH_TARGET_LATENCY_MS = 6000 class MovingSum(object): @@ -52,8 +47,8 @@ class MovingSum(object): moving average tracker. """ def __init__(self, window_ms, bucket_ms): - if window_ms <= bucket_ms or bucket_ms <= 0: - raise ValueError("window_ms > bucket_ms > 0 please") + if window_ms < bucket_ms or bucket_ms <= 0: + raise ValueError("window_ms >= bucket_ms > 0 please") self._num_buckets = int(math.ceil(window_ms / bucket_ms)) self._bucket_ms = bucket_ms self._Reset(now=0) # initialize the moving window members diff --git a/sdks/python/apache_beam/io/gcp/datastore/v1new/util_test.py b/sdks/python/apache_beam/io/gcp/datastore/v1new/util_test.py index 84bc9fabd1b9..f82b223b67ed 100644 --- a/sdks/python/apache_beam/io/gcp/datastore/v1new/util_test.py +++ b/sdks/python/apache_beam/io/gcp/datastore/v1new/util_test.py @@ -18,8 +18,6 @@ """Tests for util.py.""" # pytype: skip-file -from __future__ import absolute_import - import unittest from apache_beam.io.gcp.datastore.v1new import util @@ -85,18 +83,18 @@ def test_fast_queries(self): def test_slow_queries(self): self._batcher.report_latency(0, 10000, 200) self._batcher.report_latency(0, 10000, 200) - self.assertEqual(100, self._batcher.get_batch_size(0)) + self.assertEqual(120, self._batcher.get_batch_size(0)) def test_size_not_below_minimum(self): - self._batcher.report_latency(0, 30000, 50) - self._batcher.report_latency(0, 30000, 50) + self._batcher.report_latency(0, 75000, 50) + self._batcher.report_latency(0, 75000, 50) self.assertEqual(util.WRITE_BATCH_MIN_SIZE, self._batcher.get_batch_size(0)) def test_sliding_window(self): self._batcher.report_latency(0, 30000, 50) self._batcher.report_latency(50000, 5000, 200) self._batcher.report_latency(100000, 5000, 200) - self.assertEqual(200, self._batcher.get_batch_size(150000)) + self.assertEqual(240, self._batcher.get_batch_size(150000)) if __name__ == '__main__': diff --git a/sdks/python/apache_beam/io/gcp/dicomclient.py b/sdks/python/apache_beam/io/gcp/dicomclient.py index dbca17cb7b83..cbe951a0c744 100644 --- a/sdks/python/apache_beam/io/gcp/dicomclient.py +++ b/sdks/python/apache_beam/io/gcp/dicomclient.py @@ -17,8 +17,6 @@ # pytype: skip-file -from __future__ import absolute_import - from google.auth import default from google.auth.transport import requests diff --git a/sdks/python/apache_beam/io/gcp/dicomio.py b/sdks/python/apache_beam/io/gcp/dicomio.py index e33d99df1696..078d61b92924 100644 --- a/sdks/python/apache_beam/io/gcp/dicomio.py +++ b/sdks/python/apache_beam/io/gcp/dicomio.py @@ -113,8 +113,6 @@ """ # pytype: skip-file -from __future__ import absolute_import - from concurrent.futures import ThreadPoolExecutor from concurrent.futures import as_completed diff --git a/sdks/python/apache_beam/io/gcp/dicomio_integration_test.py b/sdks/python/apache_beam/io/gcp/dicomio_integration_test.py index f0faac481230..7970dd52e4b9 100644 --- a/sdks/python/apache_beam/io/gcp/dicomio_integration_test.py +++ b/sdks/python/apache_beam/io/gcp/dicomio_integration_test.py @@ -25,14 +25,11 @@ """ # pytype: skip-file -from __future__ import absolute_import - import random import string -import sys import unittest -from 
nose.plugins.attrib import attr +import pytest import apache_beam as beam from apache_beam.io import fileio @@ -136,7 +133,7 @@ def tearDown(self): # clean up the temp Dicom store delete_dicom_store(self.project, DATA_SET_ID, REGION, self.temp_dicom_store) - @attr('IT') + @pytest.mark.it_postcommit def test_dicom_search_instances(self): # Search and compare the metadata of a persistent DICOM store. # Both refine and comprehensive search will be tested. @@ -186,7 +183,7 @@ def test_dicom_search_instances(self): equal_to([expected_dict_refine]), label='refine search assert') - @attr('IT') + @pytest.mark.it_postcommit def test_dicom_store_instance_from_gcs(self): # Store DICOM files to a empty DICOM store from a GCS bucket, # then check if the store metadata match. @@ -217,10 +214,7 @@ def test_dicom_store_instance_from_gcs(self): self.assertEqual(status_code, 200) # List comparison based on different version of python - if sys.version_info.major == 3: - self.assertCountEqual(result, self.expected_output_all_metadata) - else: - self.assertItemsEqual(result, self.expected_output_all_metadata) + self.assertCountEqual(result, self.expected_output_all_metadata) if __name__ == '__main__': diff --git a/sdks/python/apache_beam/io/gcp/dicomio_test.py b/sdks/python/apache_beam/io/gcp/dicomio_test.py index 2594e45d50d8..3dcb43405746 100644 --- a/sdks/python/apache_beam/io/gcp/dicomio_test.py +++ b/sdks/python/apache_beam/io/gcp/dicomio_test.py @@ -20,14 +20,10 @@ # pytype: skip-file -from __future__ import absolute_import - import json import os import unittest -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import from mock import patch import apache_beam as beam diff --git a/sdks/python/apache_beam/io/gcp/experimental/__init__.py b/sdks/python/apache_beam/io/gcp/experimental/__init__.py index f4f43cbb1236..cce3acad34a4 100644 --- a/sdks/python/apache_beam/io/gcp/experimental/__init__.py +++ b/sdks/python/apache_beam/io/gcp/experimental/__init__.py @@ -14,4 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. # -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/io/gcp/experimental/spannerio.py b/sdks/python/apache_beam/io/gcp/experimental/spannerio.py index 29b8cc061a1e..d50b3a880c3e 100644 --- a/sdks/python/apache_beam/io/gcp/experimental/spannerio.py +++ b/sdks/python/apache_beam/io/gcp/experimental/spannerio.py @@ -168,8 +168,6 @@ mutation groups together to process. If the Mutation references a table or column does not exits, it will cause a exception and fails the entire pipeline. """ -from __future__ import absolute_import - import typing from collections import deque from collections import namedtuple @@ -304,7 +302,7 @@ def snapshot_options(self): return snapshot_options -@with_input_types(ReadOperation, typing.Dict[typing.Any, typing.Any]) +@with_input_types(ReadOperation, _SPANNER_TRANSACTION) @with_output_types(typing.List[typing.Any]) class _NaiveSpannerReadDoFn(DoFn): def __init__(self, spanner_configuration): @@ -424,7 +422,7 @@ def process(self, element): @with_input_types(int) -@with_output_types(typing.Dict[typing.Any, typing.Any]) +@with_output_types(_SPANNER_TRANSACTION) class _CreateTransactionFn(DoFn): """ A DoFn to create the transaction of cloud spanner. 
diff --git a/sdks/python/apache_beam/io/gcp/experimental/spannerio_read_it_test.py b/sdks/python/apache_beam/io/gcp/experimental/spannerio_read_it_test.py index 951ea111cf3b..0272dac4e772 100644 --- a/sdks/python/apache_beam/io/gcp/experimental/spannerio_read_it_test.py +++ b/sdks/python/apache_beam/io/gcp/experimental/spannerio_read_it_test.py @@ -15,15 +15,13 @@ # limitations under the License. # -from __future__ import absolute_import - import logging import random import sys import unittest import uuid -from nose.plugins.attrib import attr +import pytest import apache_beam as beam from apache_beam.testing.test_pipeline import TestPipeline @@ -103,7 +101,7 @@ def setUpClass(cls): cls._add_dummy_entries() _LOGGER.info("Spanner Read IT Setup Complete...") - @attr('IT') + @pytest.mark.it_postcommit def test_read_via_table(self): _LOGGER.info("Spanner Read via table") with beam.Pipeline(argv=self.args) as p: @@ -115,7 +113,7 @@ def test_read_via_table(self): columns=["UserId", "Key"]) assert_that(r, equal_to(self._data)) - @attr('IT') + @pytest.mark.it_postcommit def test_read_via_sql(self): _LOGGER.info("Running Spanner via sql") with beam.Pipeline(argv=self.args) as p: diff --git a/sdks/python/apache_beam/io/gcp/experimental/spannerio_read_perf_test.py b/sdks/python/apache_beam/io/gcp/experimental/spannerio_read_perf_test.py new file mode 100644 index 000000000000..ae7c1ea8d765 --- /dev/null +++ b/sdks/python/apache_beam/io/gcp/experimental/spannerio_read_perf_test.py @@ -0,0 +1,158 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +""" +A performance test for reading data from a Spanner database table. +Besides of the standard options, there are options with special meaning: +* spanner_instance - Spanner Instance ID. +* spanner_database - Spanner Database ID. +The table will be created and populated with data from Synthetic Source if it +does not exist. +* input_options - options for Synthetic Source: +num_records - number of rows to be inserted, +value_size - the length of a single row, +key_size - required option, but its value has no meaning. + +Example test run on DataflowRunner: +python -m apache_beam.io.gcp.experimental.spannerio_read_perf_test \ + --test-pipeline-options=" + --runner=TestDataflowRunner + --project='...' + --region='...' + --temp_location='gs://...' + --sdk_location=build/apache-beam.tar.gz + --publish_to_big_query=true + --metrics_dataset='...' + --metrics_table='...' + --spanner_instance='...' + --spanner_database='...' + --input_options='{ + \"num_records\": 10, + \"key_size\": 1, + \"value_size\": 1024 + }'" + +This setup will result in a table of 1MB size. 
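Throughout these integration-test modules the nose-style @attr('IT') tag is replaced by a pytest marker; a generic sketch of the new convention follows (the test name is invented, and the marker registration and Gradle/tox wiring are outside this patch):

import pytest


@pytest.mark.it_postcommit  # replaces the old nose @attr('IT') decorator
def test_some_integration_path():
  pass  # integration assertions go here


# Tests are then selected by marker expression, e.g.: pytest -m it_postcommit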
+""" + +from __future__ import absolute_import + +import logging + +from apache_beam import FlatMap +from apache_beam import Map +from apache_beam import ParDo +from apache_beam.io import Read +from apache_beam.io.gcp.experimental.spannerio import ReadFromSpanner +from apache_beam.io.gcp.experimental.spannerio import WriteToSpanner +from apache_beam.testing.load_tests.load_test import LoadTest +from apache_beam.testing.load_tests.load_test_metrics_utils import CountMessages +from apache_beam.testing.load_tests.load_test_metrics_utils import MeasureTime +from apache_beam.testing.synthetic_pipeline import SyntheticSource +from apache_beam.testing.test_pipeline import TestPipeline +from apache_beam.testing.util import assert_that +from apache_beam.testing.util import equal_to +from apache_beam.transforms.combiners import Count + +# pylint: disable=wrong-import-order, wrong-import-position +try: + from google.api_core.exceptions import AlreadyExists + from google.cloud import spanner +except ImportError: + spanner = None + AlreadyExists = None +# pylint: enable=wrong-import-order, wrong-import-position + + +class SpannerReadPerfTest(LoadTest): + def __init__(self): + super(SpannerReadPerfTest, self).__init__() + self.project = self.pipeline.get_option('project') + self.spanner_instance = self.pipeline.get_option('spanner_instance') + self.spanner_database = self.pipeline.get_option('spanner_database') + self._init_setup() + + def _create_database(self): + spanner_client = spanner.Client() + instance = spanner_client.instance(self.spanner_instance) + database = instance.database( + self.spanner_database, + ddl_statements=[ + """CREATE TABLE test_data ( + id STRING(99) NOT NULL, + data BYTES(MAX) NOT NULL + ) PRIMARY KEY (id)""", + ]) + database.create() + + def _init_setup(self): + """Checks if a spanner database exists and creates it if not.""" + try: + self._create_database() + self._create_input_data() + except AlreadyExists: + # pass if the database already exists + pass + + def _create_input_data(self): + """ + Runs an additional pipeline which creates test data and waits for its + completion. 
+ """ + def format_record(record): + import base64 + return base64.b64encode(record[1]) + + def make_insert_mutations(element): + import uuid + from apache_beam.io.gcp.experimental.spannerio import WriteMutation + ins_mutation = WriteMutation.insert( + table='test_data', + columns=('id', 'data'), + values=[(str(uuid.uuid1()), element)]) + return [ins_mutation] + + with TestPipeline() as p: + ( # pylint: disable=expression-not-assigned + p + | 'Produce rows' >> Read( + SyntheticSource(self.parse_synthetic_source_options())) + | 'Format' >> Map(format_record) + | 'Make mutations' >> FlatMap(make_insert_mutations) + | 'Write to Spanner' >> WriteToSpanner( + project_id=self.project, + instance_id=self.spanner_instance, + database_id=self.spanner_database, + max_batch_size_bytes=5120)) + + def test(self): + output = ( + self.pipeline + | 'Read from Spanner' >> ReadFromSpanner( + self.project, + self.spanner_instance, + self.spanner_database, + sql="select data from test_data") + | 'Count messages' >> ParDo(CountMessages(self.metrics_namespace)) + | 'Measure time' >> ParDo(MeasureTime(self.metrics_namespace)) + | 'Count' >> Count.Globally()) + assert_that(output, equal_to([self.input_options['num_records']])) + + +if __name__ == '__main__': + logging.basicConfig(level=logging.INFO) + SpannerReadPerfTest().run() diff --git a/sdks/python/apache_beam/io/gcp/experimental/spannerio_test.py b/sdks/python/apache_beam/io/gcp/experimental/spannerio_test.py index 672bf8b7d242..e3d19652c306 100644 --- a/sdks/python/apache_beam/io/gcp/experimental/spannerio_test.py +++ b/sdks/python/apache_beam/io/gcp/experimental/spannerio_test.py @@ -15,12 +15,11 @@ # limitations under the License. # -from __future__ import absolute_import - import datetime import logging import random import string +import typing import unittest import mock @@ -338,7 +337,10 @@ def test_invalid_transaction( self, mock_batch_snapshot_class, mock_client_class): with self.assertRaises(ValueError): p = TestPipeline() - transaction = (p | beam.Create([{"invalid": "transaction"}])) + transaction = ( + p | beam.Create([{ + "invalid": "transaction" + }]).with_output_types(typing.Any)) _ = ( p | 'with query' >> ReadFromSpanner( project_id=TEST_PROJECT_ID, diff --git a/sdks/python/apache_beam/io/gcp/experimental/spannerio_write_it_test.py b/sdks/python/apache_beam/io/gcp/experimental/spannerio_write_it_test.py index b136add34265..7f2c8e30e3fb 100644 --- a/sdks/python/apache_beam/io/gcp/experimental/spannerio_write_it_test.py +++ b/sdks/python/apache_beam/io/gcp/experimental/spannerio_write_it_test.py @@ -15,14 +15,12 @@ # limitations under the License. 
# -from __future__ import absolute_import - import logging import random import unittest import uuid -from nose.plugins.attrib import attr +import pytest import apache_beam as beam from apache_beam.testing.test_pipeline import TestPipeline @@ -105,7 +103,7 @@ def setUpClass(cls): cls._create_database() _LOGGER.info('Spanner Write IT Setup Complete...') - @attr('IT') + @pytest.mark.it_postcommit def test_write_batches(self): _prefex = 'test_write_batches' mutations = [ @@ -131,7 +129,7 @@ def test_write_batches(self): res.wait_until_finish() self.assertEqual(self._count_data(_prefex), len(mutations)) - @attr('IT') + @pytest.mark.it_postcommit def test_spanner_update(self): _prefex = 'test_update' @@ -167,7 +165,7 @@ def test_spanner_update(self): res.wait_until_finish() self.assertEqual(self._count_data(_prefex), 2) - @attr('IT') + @pytest.mark.it_postcommit def test_spanner_error(self): mutations_update = [ WriteMutation.update( diff --git a/sdks/python/apache_beam/io/gcp/experimental/spannerio_write_perf_test.py b/sdks/python/apache_beam/io/gcp/experimental/spannerio_write_perf_test.py new file mode 100644 index 000000000000..707db81713cd --- /dev/null +++ b/sdks/python/apache_beam/io/gcp/experimental/spannerio_write_perf_test.py @@ -0,0 +1,146 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +""" +A pipeline that writes data from Synthetic Source to a Spanner. +Besides of the standard options, there are options with special meaning: +* spanner_instance - Spanner Instance ID. +* spanner_database - Spanner Database ID. +* input_options - options for Synthetic Source: +num_records - number of rows to be inserted, +value_size - the length of a single row, +key_size - required option, but its value has no meaning. + +Example test run on DataflowRunner: + +python -m apache_beam.io.gcp.experimental.spannerio_write_perf_test \ + --test-pipeline-options=" + --runner=TestDataflowRunner + --project='...' + --region='...' + --temp_location='gs://...' + --sdk_location=build/apache-beam.tar.gz + --publish_to_big_query=true + --metrics_dataset='...' + --metrics_table='...' + --spanner_instance='...' + --spanner_database='...' + --input_options='{ + \"num_records\": 10, + \"key_size\": 1, + \"value_size\": 1024 + }'" + +This setup will result in a table of 1MB size. 
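The `test()` method further down in this new file wraps the transform under test between `CountMessages` and `MeasureTime` so the load-test framework can publish throughput and runtime metrics. A condensed sketch of that instrumentation pattern, with a trivial `Map` standing in for the Spanner write and a made-up metrics namespace, might look like:

```python
import apache_beam as beam
from apache_beam.testing.load_tests.load_test_metrics_utils import CountMessages
from apache_beam.testing.load_tests.load_test_metrics_utils import MeasureTime

# Hypothetical namespace; the real LoadTest derives it from pipeline options.
METRICS_NAMESPACE = 'spanner_write_sketch'

with beam.Pipeline() as p:
  _ = (
      p
      | 'Produce rows' >> beam.Create([b'x' * 1024 for _ in range(10)])
      | 'Count messages' >> beam.ParDo(CountMessages(METRICS_NAMESPACE))
      | 'Transform under test' >> beam.Map(lambda data: data.upper())
      | 'Measure time' >> beam.ParDo(MeasureTime(METRICS_NAMESPACE)))
```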
+""" + +from __future__ import absolute_import + +import logging +import random +import uuid + +from apache_beam import FlatMap +from apache_beam import Map +from apache_beam import ParDo +from apache_beam.io import Read +from apache_beam.io.gcp.experimental.spannerio import WriteToSpanner +from apache_beam.testing.load_tests.load_test import LoadTest +from apache_beam.testing.load_tests.load_test_metrics_utils import CountMessages +from apache_beam.testing.load_tests.load_test_metrics_utils import MeasureTime +from apache_beam.testing.synthetic_pipeline import SyntheticSource + +# pylint: disable=wrong-import-order, wrong-import-position +try: + from google.cloud import spanner +except ImportError: + spanner = None +# pylint: enable=wrong-import-order, wrong-import-position + + +class SpannerWritePerfTest(LoadTest): + TEST_DATABASE = None + + def __init__(self): + super(SpannerWritePerfTest, self).__init__() + self.project = self.pipeline.get_option('project') + self.spanner_instance = self.pipeline.get_option('spanner_instance') + self.spanner_database = self.pipeline.get_option('spanner_database') + self._init_setup() + + def _generate_table_name(self): + self.TEST_DATABASE = "{}_{}".format( + self.spanner_database, ''.join(random.sample(uuid.uuid4().hex, 4))) + return self.TEST_DATABASE + + def _create_database(self): + spanner_client = spanner.Client() + instance = self._SPANNER_INSTANCE = spanner_client.instance( + self.spanner_instance) + database = instance.database( + self.TEST_DATABASE, + ddl_statements=[ + """CREATE TABLE test ( + id STRING(99) NOT NULL, + data BYTES(MAX) NOT NULL + ) PRIMARY KEY (id)""", + ]) + database.create() + + def _init_setup(self): + """Create database.""" + self._generate_table_name() + self._create_database() + + def test(self): + def format_record(record): + import base64 + return base64.b64encode(record[1]) + + def make_insert_mutations(element): + import uuid # pylint: disable=reimported + from apache_beam.io.gcp.experimental.spannerio import WriteMutation + ins_mutation = WriteMutation.insert( + table='test', + columns=('id', 'data'), + values=[(str(uuid.uuid1()), element)]) + return [ins_mutation] + + ( # pylint: disable=expression-not-assigned + self.pipeline + | 'Produce rows' >> Read( + SyntheticSource(self.parse_synthetic_source_options())) + | 'Count messages' >> ParDo(CountMessages(self.metrics_namespace)) + | 'Format' >> Map(format_record) + | 'Make mutations' >> FlatMap(make_insert_mutations) + | 'Measure time' >> ParDo(MeasureTime(self.metrics_namespace)) + | 'Write to Spanner' >> WriteToSpanner( + project_id=self.project, + instance_id=self.spanner_instance, + database_id=self.TEST_DATABASE, + max_batch_size_bytes=5120) + ) + + def cleanup(self): + """Removes test database.""" + database = self._SPANNER_INSTANCE.database(self.TEST_DATABASE) + database.drop() + + +if __name__ == '__main__': + logging.basicConfig(level=logging.INFO) + SpannerWritePerfTest().run() diff --git a/sdks/python/apache_beam/io/gcp/gce_metadata_util.py b/sdks/python/apache_beam/io/gcp/gce_metadata_util.py index 40f1745c170b..ff185876f04c 100644 --- a/sdks/python/apache_beam/io/gcp/gce_metadata_util.py +++ b/sdks/python/apache_beam/io/gcp/gce_metadata_util.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import requests BASE_METADATA_URL = "http://metadata/computeMetadata/v1/" diff --git a/sdks/python/apache_beam/io/gcp/gcsfilesystem.py b/sdks/python/apache_beam/io/gcp/gcsfilesystem.py index 91dcda325abb..0b20718c009d 100644 --- 
a/sdks/python/apache_beam/io/gcp/gcsfilesystem.py +++ b/sdks/python/apache_beam/io/gcp/gcsfilesystem.py @@ -19,13 +19,8 @@ # pytype: skip-file -from __future__ import absolute_import - -from builtins import zip from typing import BinaryIO # pylint: disable=unused-import -from future.utils import iteritems - from apache_beam.io.filesystem import BeamIOError from apache_beam.io.filesystem import CompressedFile from apache_beam.io.filesystem import CompressionTypes @@ -125,7 +120,7 @@ def _list(self, dir_or_prefix): ``BeamIOError``: if listing fails, but not if no files were found. """ try: - for path, size in iteritems(gcsio.GcsIO().list_prefix(dir_or_prefix)): + for path, size in gcsio.GcsIO().list_prefix(dir_or_prefix).items(): yield FileMetadata(path, size) except Exception as e: # pylint: disable=broad-except raise BeamIOError("List operation failed", {dir_or_prefix: e}) diff --git a/sdks/python/apache_beam/io/gcp/gcsfilesystem_test.py b/sdks/python/apache_beam/io/gcp/gcsfilesystem_test.py index 2a67b4113af9..c5f80bb6a1d0 100644 --- a/sdks/python/apache_beam/io/gcp/gcsfilesystem_test.py +++ b/sdks/python/apache_beam/io/gcp/gcsfilesystem_test.py @@ -20,14 +20,9 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest -from builtins import zip -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import import mock from apache_beam.io.filesystem import BeamIOError diff --git a/sdks/python/apache_beam/io/gcp/gcsio.py b/sdks/python/apache_beam/io/gcp/gcsio.py index a83364bd4421..b9fb15f655eb 100644 --- a/sdks/python/apache_beam/io/gcp/gcsio.py +++ b/sdks/python/apache_beam/io/gcp/gcsio.py @@ -23,26 +23,25 @@ # pytype: skip-file -from __future__ import absolute_import - import errno import io import logging import multiprocessing import re -import sys import threading import time import traceback -from builtins import object from itertools import islice from apache_beam.internal.http_client import get_new_http +from apache_beam.internal.metrics.metric import ServiceCallMetric from apache_beam.io.filesystemio import Downloader from apache_beam.io.filesystemio import DownloaderStream from apache_beam.io.filesystemio import PipeStream from apache_beam.io.filesystemio import Uploader from apache_beam.io.filesystemio import UploaderStream +from apache_beam.io.gcp import resource_identifiers +from apache_beam.metrics import monitoring_infos from apache_beam.utils import retry __all__ = ['GcsIO'] @@ -107,7 +106,9 @@ def parse_gcs_path(gcs_path, object_optional=False): """Return the bucket and object names of the given gs:// path.""" match = re.match('^gs://([^/]+)/(.*)$', gcs_path) if match is None or (match.group(2) == '' and not object_optional): - raise ValueError('GCS path must be in the form gs:///.') + raise ValueError( + 'GCS path must be in the form gs:///. 
' + f'Encountered {gcs_path!r}') return match.group(1), match.group(2) @@ -154,9 +155,17 @@ def __init__(self, storage_client=None): credentials=auth.get_service_credentials(), get_credentials=False, http=get_new_http(), - response_encoding=None if sys.version_info[0] < 3 else 'utf8') + response_encoding='utf8') self.client = storage_client self._rewrite_cb = None + self.bucket_to_project_number = {} + + def get_project_number(self, bucket): + if bucket not in self.bucket_to_project_number: + bucket_metadata = self.get_bucket(bucket_name=bucket) + self.bucket_to_project_number[bucket] = bucket_metadata.projectNumber + + return self.bucket_to_project_number[bucket] def _set_rewrite_response_callback(self, callback): """For testing purposes only. No backward compatibility guarantees. @@ -212,13 +221,20 @@ def open( """ if mode == 'r' or mode == 'rb': downloader = GcsDownloader( - self.client, filename, buffer_size=read_buffer_size) + self.client, + filename, + buffer_size=read_buffer_size, + get_project_number=self.get_project_number) return io.BufferedReader( DownloaderStream( downloader, read_buffer_size=read_buffer_size, mode=mode), buffer_size=read_buffer_size) elif mode == 'w' or mode == 'wb': - uploader = GcsUploader(self.client, filename, mime_type) + uploader = GcsUploader( + self.client, + filename, + mime_type, + get_project_number=self.get_project_number) return io.BufferedWriter( UploaderStream(uploader, mode=mode), buffer_size=128 * 1024) else: @@ -559,11 +575,27 @@ def list_prefix(self, path): class GcsDownloader(Downloader): - def __init__(self, client, path, buffer_size): + def __init__(self, client, path, buffer_size, get_project_number): self._client = client self._path = path self._bucket, self._name = parse_gcs_path(path) self._buffer_size = buffer_size + self._get_project_number = get_project_number + + project_number = self._get_project_number(self._bucket) + + # Create a request count metric + resource = resource_identifiers.GoogleCloudStorageBucket(self._bucket) + labels = { + monitoring_infos.SERVICE_LABEL: 'Storage', + monitoring_infos.METHOD_LABEL: 'Objects.get', + monitoring_infos.RESOURCE_LABEL: resource, + monitoring_infos.GCS_BUCKET_LABEL: self._bucket, + monitoring_infos.GCS_PROJECT_ID_LABEL: str(project_number) + } + service_call_metric = ServiceCallMetric( + request_count_urn=monitoring_infos.API_REQUEST_COUNT_URN, + base_labels=labels) # Get object state. 
self._get_request = ( @@ -571,7 +603,9 @@ def __init__(self, client, path, buffer_size): bucket=self._bucket, object=self._name)) try: metadata = self._get_object_metadata(self._get_request) + service_call_metric.call('ok') except HttpError as http_error: + service_call_metric.call(http_error) if http_error.status_code == 404: raise IOError(errno.ENOENT, 'Not found: %s' % self._path) else: @@ -590,7 +624,12 @@ def __init__(self, client, path, buffer_size): auto_transfer=False, chunksize=self._buffer_size, num_retries=20) - self._client.objects.Get(self._get_request, download=self._downloader) + + try: + self._client.objects.Get(self._get_request, download=self._downloader) + service_call_metric.call('ok') + except HttpError as e: + service_call_metric.call(e) @retry.with_exponential_backoff( retry_filter=retry.retry_on_server_errors_and_timeout_filter) @@ -609,11 +648,12 @@ def get_range(self, start, end): class GcsUploader(Uploader): - def __init__(self, client, path, mime_type): + def __init__(self, client, path, mime_type, get_project_number): self._client = client self._path = path self._bucket, self._name = parse_gcs_path(path) self._mime_type = mime_type + self._get_project_number = get_project_number # Set up communication with child thread. parent_conn, child_conn = multiprocessing.Pipe() @@ -647,9 +687,26 @@ def _start_upload(self): # # The uploader by default transfers data in chunks of 1024 * 1024 bytes at # a time, buffering writes until that size is reached. + + project_number = self._get_project_number(self._bucket) + + # Create a request count metric + resource = resource_identifiers.GoogleCloudStorageBucket(self._bucket) + labels = { + monitoring_infos.SERVICE_LABEL: 'Storage', + monitoring_infos.METHOD_LABEL: 'Objects.insert', + monitoring_infos.RESOURCE_LABEL: resource, + monitoring_infos.GCS_BUCKET_LABEL: self._bucket, + monitoring_infos.GCS_PROJECT_ID_LABEL: str(project_number) + } + service_call_metric = ServiceCallMetric( + request_count_urn=monitoring_infos.API_REQUEST_COUNT_URN, + base_labels=labels) try: self._client.objects.Insert(self._insert_request, upload=self._upload) + service_call_metric.call('ok') except Exception as e: # pylint: disable=broad-except + service_call_metric.call(e) _LOGGER.error( 'Error in _start_upload while inserting file %s: %s', self._path, diff --git a/sdks/python/apache_beam/io/gcp/gcsio_integration_test.py b/sdks/python/apache_beam/io/gcp/gcsio_integration_test.py index 998dd4222956..b06e374c1e8e 100644 --- a/sdks/python/apache_beam/io/gcp/gcsio_integration_test.py +++ b/sdks/python/apache_beam/io/gcp/gcsio_integration_test.py @@ -39,13 +39,11 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest import uuid -from nose.plugins.attrib import attr +import pytest from apache_beam.io.filesystems import FileSystems from apache_beam.testing.test_pipeline import TestPipeline @@ -113,17 +111,18 @@ def _test_copy( self.gcsio.copy(src, dst, kms_key_name, **extra_kwargs) self._verify_copy(src, dst, kms_key_name) - @attr('IT') + @pytest.mark.it_postcommit def test_copy(self): self._test_copy("test_copy") - @attr('IT') + @pytest.mark.it_postcommit def test_copy_kms(self): if self.kms_key_name is None: raise unittest.SkipTest('--kms_key_name not specified') self._test_copy("test_copy_kms", self.kms_key_name) - @attr('IT') + @pytest.mark.it_postcommit + @unittest.skip('BEAM-12352: enable once maxBytesRewrittenPerCall works again') def test_copy_rewrite_token(self): # Tests a multi-part copy (rewrite) 
operation. This is triggered by a # combination of 3 conditions: @@ -166,17 +165,18 @@ def _test_copy_batch( for _src, _dst in src_dst_pairs: self._verify_copy(_src, _dst, kms_key_name) - @attr('IT') + @pytest.mark.it_postcommit def test_copy_batch(self): self._test_copy_batch("test_copy_batch") - @attr('IT') + @pytest.mark.it_postcommit def test_copy_batch_kms(self): if self.kms_key_name is None: raise unittest.SkipTest('--kms_key_name not specified') self._test_copy_batch("test_copy_batch_kms", self.kms_key_name) - @attr('IT') + @pytest.mark.it_postcommit + @unittest.skip('BEAM-12352: enable once maxBytesRewrittenPerCall works again') def test_copy_batch_rewrite_token(self): # Tests a multi-part copy (rewrite) operation. This is triggered by a # combination of 3 conditions: diff --git a/sdks/python/apache_beam/io/gcp/gcsio_overrides.py b/sdks/python/apache_beam/io/gcp/gcsio_overrides.py index 2b6328ec3aa5..fc06cb28f1ad 100644 --- a/sdks/python/apache_beam/io/gcp/gcsio_overrides.py +++ b/sdks/python/apache_beam/io/gcp/gcsio_overrides.py @@ -17,8 +17,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import math import time diff --git a/sdks/python/apache_beam/io/gcp/gcsio_test.py b/sdks/python/apache_beam/io/gcp/gcsio_test.py index 29c7e7be60eb..06a98b11b590 100644 --- a/sdks/python/apache_beam/io/gcp/gcsio_test.py +++ b/sdks/python/apache_beam/io/gcp/gcsio_test.py @@ -18,31 +18,27 @@ """Tests for Google Cloud Storage client.""" # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import datetime import errno import io import logging import os import random -import sys import time import unittest -from builtins import object -from builtins import range from email.message import Message -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import import httplib2 import mock # Protect against environments where apitools library is not available. # pylint: disable=wrong-import-order, wrong-import-position +from apache_beam.metrics import monitoring_infos +from apache_beam.metrics.execution import MetricsEnvironment +from apache_beam.metrics.metricbase import MetricName + try: - from apache_beam.io.gcp import gcsio + from apache_beam.io.gcp import gcsio, resource_identifiers from apache_beam.io.gcp.internal.clients import storage from apitools.base.py.exceptions import HttpError except ImportError: @@ -50,6 +46,7 @@ # pylint: enable=wrong-import-order, wrong-import-position DEFAULT_GCP_PROJECT = 'apache-beam-testing' +DEFAULT_PROJECT_NUMBER = 1 class FakeGcsClient(object): @@ -58,6 +55,7 @@ class FakeGcsClient(object): def __init__(self): self.objects = FakeGcsObjects() + self.buckets = FakeGcsBuckets() # Referenced in GcsIO.copy_batch() and GcsIO.delete_batch(). self._http = object() @@ -87,6 +85,17 @@ def get_metadata(self): updated=last_updated_datetime) +class FakeGcsBuckets(object): + def __init__(self): + pass + + def get_bucket(self, bucket): + return storage.Bucket(name=bucket, projectNumber=DEFAULT_PROJECT_NUMBER) + + def Get(self, get_request): + return self.get_bucket(get_request.bucket) + + class FakeGcsObjects(object): def __init__(self): self.files = {} @@ -750,10 +759,7 @@ def test_mime_binary_encoding(self): # and does not corrupt '\r\n' during uploads (the patch to apitools in # Python 3 is applied in io/gcp/__init__.py). 
from apitools.base.py.transfer import email_generator - if sys.version_info[0] == 3: - generator_cls = email_generator.BytesGenerator - else: - generator_cls = email_generator.Generator + generator_cls = email_generator.BytesGenerator output_buffer = io.BytesIO() generator = generator_cls(output_buffer) test_msg = 'a\nb\r\nc\n\r\n\n\nd' @@ -762,6 +768,59 @@ def test_mime_binary_encoding(self): generator._handle_text(message) self.assertEqual(test_msg.encode('ascii'), output_buffer.getvalue()) + def test_downloader_monitoring_info(self): + # Clear the process wide metric container. + MetricsEnvironment.process_wide_container().reset() + + file_name = 'gs://gcsio-metrics-test/dummy_mode_file' + file_size = 5 * 1024 * 1024 + 100 + random_file = self._insert_random_file(self.client, file_name, file_size) + self.gcs.open(file_name, 'r') + + resource = resource_identifiers.GoogleCloudStorageBucket(random_file.bucket) + labels = { + monitoring_infos.SERVICE_LABEL: 'Storage', + monitoring_infos.METHOD_LABEL: 'Objects.get', + monitoring_infos.RESOURCE_LABEL: resource, + monitoring_infos.GCS_BUCKET_LABEL: random_file.bucket, + monitoring_infos.GCS_PROJECT_ID_LABEL: str(DEFAULT_PROJECT_NUMBER), + monitoring_infos.STATUS_LABEL: 'ok' + } + + metric_name = MetricName( + None, None, urn=monitoring_infos.API_REQUEST_COUNT_URN, labels=labels) + metric_value = MetricsEnvironment.process_wide_container().get_counter( + metric_name).get_cumulative() + + self.assertEqual(metric_value, 2) + + def test_uploader_monitoring_info(self): + # Clear the process wide metric container. + MetricsEnvironment.process_wide_container().reset() + + file_name = 'gs://gcsio-metrics-test/dummy_mode_file' + file_size = 5 * 1024 * 1024 + 100 + random_file = self._insert_random_file(self.client, file_name, file_size) + f = self.gcs.open(file_name, 'w') + + resource = resource_identifiers.GoogleCloudStorageBucket(random_file.bucket) + labels = { + monitoring_infos.SERVICE_LABEL: 'Storage', + monitoring_infos.METHOD_LABEL: 'Objects.insert', + monitoring_infos.RESOURCE_LABEL: resource, + monitoring_infos.GCS_BUCKET_LABEL: random_file.bucket, + monitoring_infos.GCS_PROJECT_ID_LABEL: str(DEFAULT_PROJECT_NUMBER), + monitoring_infos.STATUS_LABEL: 'ok' + } + + f.close() + metric_name = MetricName( + None, None, urn=monitoring_infos.API_REQUEST_COUNT_URN, labels=labels) + metric_value = MetricsEnvironment.process_wide_container().get_counter( + metric_name).get_cumulative() + + self.assertEqual(metric_value, 1) + if __name__ == '__main__': logging.getLogger().setLevel(logging.INFO) diff --git a/sdks/python/apache_beam/io/gcp/internal/__init__.py b/sdks/python/apache_beam/io/gcp/internal/__init__.py index f4f43cbb1236..cce3acad34a4 100644 --- a/sdks/python/apache_beam/io/gcp/internal/__init__.py +++ b/sdks/python/apache_beam/io/gcp/internal/__init__.py @@ -14,4 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. # -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/io/gcp/internal/clients/__init__.py b/sdks/python/apache_beam/io/gcp/internal/clients/__init__.py index f4f43cbb1236..cce3acad34a4 100644 --- a/sdks/python/apache_beam/io/gcp/internal/clients/__init__.py +++ b/sdks/python/apache_beam/io/gcp/internal/clients/__init__.py @@ -14,4 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. 
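The gcsio.py instrumentation above and the two monitoring-info tests just added share one pattern: build the monitoring labels once, report each RPC outcome through a `ServiceCallMetric`, and read the result back from the process-wide metrics container. A stripped-down sketch of that pattern follows; the bucket name, project number, and `do_request` callable are hypothetical placeholders, not part of the patch:

```python
from apache_beam.internal.metrics.metric import ServiceCallMetric
from apache_beam.io.gcp import resource_identifiers
from apache_beam.metrics import monitoring_infos


def instrumented_get(bucket, project_number, do_request):
  """Issues one GCS 'Objects.get'-style request and records its outcome."""
  labels = {
      monitoring_infos.SERVICE_LABEL: 'Storage',
      monitoring_infos.METHOD_LABEL: 'Objects.get',
      monitoring_infos.RESOURCE_LABEL: resource_identifiers.
      GoogleCloudStorageBucket(bucket),
      monitoring_infos.GCS_BUCKET_LABEL: bucket,
      monitoring_infos.GCS_PROJECT_ID_LABEL: str(project_number),
  }
  metric = ServiceCallMetric(
      request_count_urn=monitoring_infos.API_REQUEST_COUNT_URN,
      base_labels=labels)
  try:
    response = do_request()   # placeholder for the apitools client call
    metric.call('ok')         # counts the request with status 'ok'
    return response
  except Exception as error:  # the real code catches apitools HttpError
    metric.call(error)        # records the failure status instead
    raise
```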
# -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/io/gcp/internal/clients/bigquery/__init__.py b/sdks/python/apache_beam/io/gcp/internal/clients/bigquery/__init__.py index 3630d043083b..6f7bb4adbb8b 100644 --- a/sdks/python/apache_beam/io/gcp/internal/clients/bigquery/__init__.py +++ b/sdks/python/apache_beam/io/gcp/internal/clients/bigquery/__init__.py @@ -18,8 +18,6 @@ """Common imports for generated bigquery client library.""" # pylint:disable=wildcard-import -from __future__ import absolute_import - import pkgutil # Protect against environments where apitools library is not available. diff --git a/sdks/python/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py b/sdks/python/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py index 74d71f182e21..b695843cc821 100644 --- a/sdks/python/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py +++ b/sdks/python/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py @@ -18,8 +18,6 @@ """Generated client library for bigquery version v2.""" # NOTE: This file is autogenerated and should not be edited by hand. -from __future__ import absolute_import - from apitools.base.py import base_api from apache_beam.io.gcp.internal.clients.bigquery import \ diff --git a/sdks/python/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_messages.py b/sdks/python/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_messages.py index 364dd35d3990..d0ecfb5d5eb6 100644 --- a/sdks/python/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_messages.py +++ b/sdks/python/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_messages.py @@ -21,8 +21,6 @@ """ # NOTE: This file is autogenerated and should not be edited by hand. -from __future__ import absolute_import - from apitools.base.protorpclite import message_types as _message_types from apitools.base.protorpclite import messages as _messages from apitools.base.py import encoding diff --git a/sdks/python/apache_beam/io/gcp/internal/clients/storage/__init__.py b/sdks/python/apache_beam/io/gcp/internal/clients/storage/__init__.py index 22011f8af063..b37a4b57c115 100644 --- a/sdks/python/apache_beam/io/gcp/internal/clients/storage/__init__.py +++ b/sdks/python/apache_beam/io/gcp/internal/clients/storage/__init__.py @@ -18,8 +18,6 @@ """Common imports for generated storage client library.""" # pylint:disable=wildcard-import -from __future__ import absolute_import - import pkgutil # Protect against environments where apitools library is not available. diff --git a/sdks/python/apache_beam/io/gcp/internal/clients/storage/storage_v1_client.py b/sdks/python/apache_beam/io/gcp/internal/clients/storage/storage_v1_client.py index df890a9c869c..9a6e426e9f02 100644 --- a/sdks/python/apache_beam/io/gcp/internal/clients/storage/storage_v1_client.py +++ b/sdks/python/apache_beam/io/gcp/internal/clients/storage/storage_v1_client.py @@ -18,8 +18,6 @@ """Generated client library for storage version v1.""" # NOTE: This file is autogenerated and should not be edited by hand. 
-from __future__ import absolute_import - from apitools.base.py import base_api from apache_beam.io.gcp.gcsio_overrides import GcsIOOverrides diff --git a/sdks/python/apache_beam/io/gcp/internal/clients/storage/storage_v1_messages.py b/sdks/python/apache_beam/io/gcp/internal/clients/storage/storage_v1_messages.py index 61744efb0e5c..cdba20be3e81 100644 --- a/sdks/python/apache_beam/io/gcp/internal/clients/storage/storage_v1_messages.py +++ b/sdks/python/apache_beam/io/gcp/internal/clients/storage/storage_v1_messages.py @@ -21,8 +21,6 @@ """ # NOTE: This file is autogenerated and should not be edited by hand. -from __future__ import absolute_import - from apitools.base.protorpclite import message_types as _message_types from apitools.base.protorpclite import messages as _messages from apitools.base.py import encoding, extra_types diff --git a/sdks/python/apache_beam/io/gcp/pubsub.py b/sdks/python/apache_beam/io/gcp/pubsub.py index 777ae5446a09..39312bf5b673 100644 --- a/sdks/python/apache_beam/io/gcp/pubsub.py +++ b/sdks/python/apache_beam/io/gcp/pubsub.py @@ -25,17 +25,12 @@ # pytype: skip-file -from __future__ import absolute_import - import re -from builtins import object from typing import Any from typing import List from typing import NamedTuple from typing import Optional - -from future.utils import iteritems -from past.builtins import unicode +from typing import Tuple from apache_beam import coders from apache_beam.io.iobase import Read @@ -74,13 +69,29 @@ class PubsubMessage(object): attributes: (dict) Key-value map of str to str, containing both user-defined and service generated attributes (such as id_label and timestamp_attribute). May be None. + message_id: (str) ID of the message, assigned by the pubsub service when the + message is published. Guaranteed to be unique within the topic. Will be + reset to None if the message is being written to pubsub. + publish_time: (datetime) Time at which the message was published. Will be + reset to None if the Message is being written to pubsub. + ordering_key: (str) If non-empty, identifies related messages for which + publish order is respected by the PubSub subscription. """ - def __init__(self, data, attributes): + def __init__( + self, + data, + attributes, + message_id=None, + publish_time=None, + ordering_key=""): if data is None and not attributes: raise ValueError( 'Either data (%r) or attributes (%r) must be set.', data, attributes) self.data = data self.attributes = attributes + self.message_id = message_id + self.publish_time = publish_time + self.ordering_key = ordering_key def __hash__(self): return hash((self.data, frozenset(self.attributes.items()))) @@ -89,10 +100,6 @@ def __eq__(self, other): return isinstance(other, PubsubMessage) and ( self.data == other.data and self.attributes == other.attributes) - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - def __repr__(self): return 'PubsubMessage(%s, %s)' % (self.data, self.attributes) @@ -113,13 +120,21 @@ def _from_proto_str(proto_msg): msg.ParseFromString(proto_msg) # Convert ScalarMapContainer to dict. attributes = dict((key, msg.attributes[key]) for key in msg.attributes) - return PubsubMessage(msg.data, attributes) - - def _to_proto_str(self): + return PubsubMessage( + msg.data, + attributes, + msg.message_id, + msg.publish_time.ToDatetime(), + msg.ordering_key) + + def _to_proto_str(self, for_publish=False): """Get serialized form of ``PubsubMessage``. Args: proto_msg: str containing a serialized protobuf. 
+ for_publish: bool, if True strip out message fields which cannot be + published (currently message_id and publish_time) per + https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#pubsubmessage Returns: A str containing a serialized protobuf of type @@ -128,8 +143,13 @@ def _to_proto_str(self): """ msg = pubsub.types.pubsub_pb2.PubsubMessage() msg.data = self.data - for key, value in iteritems(self.attributes): + for key, value in self.attributes.items(): msg.attributes[key] = value + if self.message_id and not for_publish: + msg.message_id = self.message_id + if self.publish_time and not for_publish: + msg.publish_time = msg.publish_time.FromDatetime(self.publish_time) + msg.ordering_key = self.ordering_key return msg.SerializeToString() @staticmethod @@ -142,7 +162,14 @@ def _from_message(msg): """ # Convert ScalarMapContainer to dict. attributes = dict((key, msg.attributes[key]) for key in msg.attributes) - return PubsubMessage(msg.data, attributes) + pubsubmessage = PubsubMessage(msg.data, attributes) + if msg.message_id: + pubsubmessage.message_id = msg.message_id + if msg.publish_time: + pubsubmessage.publish_time = msg.publish_time + if msg.ordering_key: + pubsubmessage.ordering_key = msg.ordering_key + return pubsubmessage class ReadFromPubSub(PTransform): @@ -234,7 +261,7 @@ def expand(self, pvalue): | ReadFromPubSub( self.topic, self.subscription, self.id_label, with_attributes=False) | 'DecodeString' >> Map(lambda b: b.decode('utf-8'))) - p.element_type = unicode + p.element_type = str return p @@ -252,16 +279,12 @@ def __init__(self, topic): topic: Cloud Pub/Sub topic in the form "/topics//". """ super(_WriteStringsToPubSub, self).__init__() - self.with_attributes = False - self.id_label = None - self.timestamp_attribute = None - self.project, self.topic_name = parse_topic(topic) - self._sink = _PubSubSink(topic, id_label=None, timestamp_attribute=None) + self.topic = topic def expand(self, pcoll): pcoll = pcoll | 'EncodeString' >> Map(lambda s: s.encode('utf-8')) pcoll.element_type = bytes - return pcoll | Write(self._sink) + return pcoll | WriteToPubSub(self.topic) class WriteToPubSub(PTransform): @@ -307,7 +330,7 @@ def message_to_proto_str(element): raise TypeError( 'Unexpected element. 
Type: %s (expected: PubsubMessage), ' 'value: %r' % (type(element), element)) - return element._to_proto_str() + return element._to_proto_str(for_publish=True) @staticmethod def bytes_to_proto_str(element): @@ -345,7 +368,7 @@ def display_data(self): TOPIC_REGEXP = 'projects/([^/]+)/topics/(.+)' -def parse_topic(full_topic): +def parse_topic(full_topic: str) -> Tuple[str, str]: match = re.match(TOPIC_REGEXP, full_topic) if not match: raise ValueError( @@ -440,9 +463,9 @@ class _PubSubSink(dataflow_io.NativeSink): """ def __init__( self, - topic, # type: str - id_label, # type: Optional[str] - timestamp_attribute # type: Optional[str] + topic: str, + id_label: Optional[str], + timestamp_attribute: Optional[str], ): self.coder = coders.BytesCoder() self.full_topic = topic diff --git a/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py b/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py index c15137a42509..541bb525abe1 100644 --- a/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py +++ b/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py @@ -20,14 +20,12 @@ """ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest import uuid +import pytest from hamcrest.core.core.allof import all_of -from nose.plugins.attrib import attr from apache_beam.io.gcp import pubsub_it_pipeline from apache_beam.io.gcp.pubsub import PubsubMessage @@ -206,11 +204,11 @@ def _test_streaming(self, with_attributes): id_label=self.ID_LABEL, timestamp_attribute=self.TIMESTAMP_ATTRIBUTE) - @attr('IT') + @pytest.mark.it_postcommit def test_streaming_data_only(self): self._test_streaming(with_attributes=False) - @attr('IT') + @pytest.mark.it_postcommit def test_streaming_with_attributes(self): self._test_streaming(with_attributes=True) diff --git a/sdks/python/apache_beam/io/gcp/pubsub_io_perf_test.py b/sdks/python/apache_beam/io/gcp/pubsub_io_perf_test.py index 24bc9b98fd4f..674f4e48abc9 100644 --- a/sdks/python/apache_beam/io/gcp/pubsub_io_perf_test.py +++ b/sdks/python/apache_beam/io/gcp/pubsub_io_perf_test.py @@ -44,8 +44,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import sys diff --git a/sdks/python/apache_beam/io/gcp/pubsub_it_pipeline.py b/sdks/python/apache_beam/io/gcp/pubsub_it_pipeline.py index e9c3971ef0e8..c32f8dbaa2a2 100644 --- a/sdks/python/apache_beam/io/gcp/pubsub_it_pipeline.py +++ b/sdks/python/apache_beam/io/gcp/pubsub_it_pipeline.py @@ -21,8 +21,6 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import apache_beam as beam diff --git a/sdks/python/apache_beam/io/gcp/pubsub_test.py b/sdks/python/apache_beam/io/gcp/pubsub_test.py index f8d039313aff..2f71139f6329 100644 --- a/sdks/python/apache_beam/io/gcp/pubsub_test.py +++ b/sdks/python/apache_beam/io/gcp/pubsub_test.py @@ -20,14 +20,9 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest -from builtins import object -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import import hamcrest as hc import mock @@ -829,6 +824,7 @@ def test_write_messages_success(self, mock_pubsub): def test_write_messages_deprecated(self, mock_pubsub): data = 'data' + data_bytes = b'data' payloads = [data] options = PipelineOptions([]) @@ -839,7 +835,7 @@ def test_write_messages_deprecated(self, mock_pubsub): | Create(payloads) | WriteStringsToPubSub('projects/fakeprj/topics/a_topic')) mock_pubsub.return_value.publish.assert_has_calls( - 
[mock.call(mock.ANY, data)]) + [mock.call(mock.ANY, data_bytes)]) def test_write_messages_with_attributes_success(self, mock_pubsub): data = b'data' diff --git a/sdks/python/apache_beam/io/gcp/resource_identifiers.py b/sdks/python/apache_beam/io/gcp/resource_identifiers.py index fb6ebbe66480..a89a9e17a324 100644 --- a/sdks/python/apache_beam/io/gcp/resource_identifiers.py +++ b/sdks/python/apache_beam/io/gcp/resource_identifiers.py @@ -33,3 +33,12 @@ def BigQueryTable(project_id, dataset_id, table_id): return '//bigquery.googleapis.com/projects/%s/datasets/%s/tables/%s' % ( project_id, dataset_id, table_id) + + +def GoogleCloudStorageBucket(bucket_id): + return '//storage.googleapis.com/buckets/%s' % bucket_id + + +def DatastoreNamespace(project_id, namespace_id): + return '//bigtable.googleapis.com/projects/%s/namespaces/%s' % ( + project_id, namespace_id) diff --git a/sdks/python/apache_beam/io/gcp/spanner.py b/sdks/python/apache_beam/io/gcp/spanner.py index dd1a0c48b8f0..60fae0405853 100644 --- a/sdks/python/apache_beam/io/gcp/spanner.py +++ b/sdks/python/apache_beam/io/gcp/spanner.py @@ -80,15 +80,11 @@ # pytype: skip-file -from __future__ import absolute_import - from enum import Enum from enum import auto from typing import NamedTuple from typing import Optional -from past.builtins import unicode - from apache_beam.transforms.external import BeamJarExpansionService from apache_beam.transforms.external import ExternalTransform from apache_beam.transforms.external import NamedTupleBasedPayloadBuilder @@ -129,19 +125,19 @@ class TimestampBoundMode(Enum): class ReadFromSpannerSchema(NamedTuple): - instance_id: unicode - database_id: unicode + instance_id: str + database_id: str schema: bytes - sql: Optional[unicode] - table: Optional[unicode] - project_id: Optional[unicode] - host: Optional[unicode] - emulator_host: Optional[unicode] + sql: Optional[str] + table: Optional[str] + project_id: Optional[str] + host: Optional[str] + emulator_host: Optional[str] batching: Optional[bool] - timestamp_bound_mode: Optional[unicode] - read_timestamp: Optional[unicode] + timestamp_bound_mode: Optional[str] + read_timestamp: Optional[str] staleness: Optional[int] - time_unit: Optional[unicode] + time_unit: Optional[str] class ReadFromSpanner(ExternalTransform): @@ -168,7 +164,7 @@ class ExampleRow(NamedTuple): database_id='your_database_id', project_id='your_project_id', row_type=ExampleRow, - query='SELECT * FROM some_table', + sql='SELECT * FROM some_table', timestamp_bound_mode=TimestampBoundMode.MAX_STALENESS, staleness=3, time_unit=TimeUnit.HOURS, @@ -274,16 +270,16 @@ def __init__( class WriteToSpannerSchema(NamedTuple): - project_id: unicode - instance_id: unicode - database_id: unicode - table: unicode + project_id: str + instance_id: str + database_id: str + table: str max_batch_size_bytes: Optional[int] max_number_mutations: Optional[int] max_number_rows: Optional[int] grouping_factor: Optional[int] - host: Optional[unicode] - emulator_host: Optional[unicode] + host: Optional[str] + emulator_host: Optional[str] commit_deadline: Optional[int] max_cumulative_backoff: Optional[int] diff --git a/sdks/python/apache_beam/io/gcp/tests/__init__.py b/sdks/python/apache_beam/io/gcp/tests/__init__.py index 6569e3fe5de4..cce3acad34a4 100644 --- a/sdks/python/apache_beam/io/gcp/tests/__init__.py +++ b/sdks/python/apache_beam/io/gcp/tests/__init__.py @@ -14,5 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. 
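Returning to the `PubsubMessage` changes earlier in this diff: `message_id` and `publish_time` are populated by the Pub/Sub service on the read path and stripped again before publishing (`_to_proto_str(for_publish=True)`), while `ordering_key` round-trips. A small illustrative construction, with all values made up:

```python
import datetime

from apache_beam.io.gcp.pubsub import PubsubMessage

# On the write path only data, attributes and ordering_key are honoured;
# message_id and publish_time are dropped before publishing.
outgoing = PubsubMessage(
    data=b'order accepted',
    attributes={'customer': 'alice'},
    ordering_key='orders-alice')

# On the read path the service fills in the extra fields, roughly like this.
incoming = PubsubMessage(
    data=b'order accepted',
    attributes={'customer': 'alice'},
    message_id='1234567890',                     # assigned by Pub/Sub
    publish_time=datetime.datetime(2021, 4, 1),  # assigned by Pub/Sub
    ordering_key='orders-alice')
```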
# - -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/io/gcp/tests/bigquery_matcher.py b/sdks/python/apache_beam/io/gcp/tests/bigquery_matcher.py index 20ca128a2810..f30baade5a28 100644 --- a/sdks/python/apache_beam/io/gcp/tests/bigquery_matcher.py +++ b/sdks/python/apache_beam/io/gcp/tests/bigquery_matcher.py @@ -19,11 +19,8 @@ # pytype: skip-file -from __future__ import absolute_import - import concurrent import logging -import sys import time from hamcrest.core.base_matcher import BaseMatcher @@ -206,10 +203,7 @@ def _get_query_result(self): return response _LOGGER.debug('Query result contains %d rows' % len(response)) time.sleep(1) - if sys.version_info >= (3, ): - raise TimeoutError('Timeout exceeded for matcher.') # noqa: F821 - else: - raise RuntimeError('Timeout exceeded for matcher.') + raise TimeoutError('Timeout exceeded for matcher.') # noqa: F821 class BigQueryTableMatcher(BaseMatcher): diff --git a/sdks/python/apache_beam/io/gcp/tests/bigquery_matcher_test.py b/sdks/python/apache_beam/io/gcp/tests/bigquery_matcher_test.py index ec22317cadbc..5511d61db46d 100644 --- a/sdks/python/apache_beam/io/gcp/tests/bigquery_matcher_test.py +++ b/sdks/python/apache_beam/io/gcp/tests/bigquery_matcher_test.py @@ -19,10 +19,7 @@ # pytype: skip-file -from __future__ import absolute_import - import logging -import sys import unittest import mock @@ -173,12 +170,8 @@ def test__get_query_result_timeout(self, mock__query_with_retry): mock__query_with_retry.side_effect = lambda: [] matcher = bq_verifier.BigqueryFullResultStreamingMatcher( 'some-project', 'some-query', [1, 2, 3], timeout=self.timeout) - if sys.version_info >= (3, ): - with self.assertRaises(TimeoutError): # noqa: F821 - matcher._get_query_result() - else: - with self.assertRaises(RuntimeError): - matcher._get_query_result() + with self.assertRaises(TimeoutError): # noqa: F821 + matcher._get_query_result() if __name__ == '__main__': diff --git a/sdks/python/apache_beam/io/gcp/tests/pubsub_matcher.py b/sdks/python/apache_beam/io/gcp/tests/pubsub_matcher.py index 04bca0a8e445..b944a4a82c14 100644 --- a/sdks/python/apache_beam/io/gcp/tests/pubsub_matcher.py +++ b/sdks/python/apache_beam/io/gcp/tests/pubsub_matcher.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import time from collections import Counter diff --git a/sdks/python/apache_beam/io/gcp/tests/pubsub_matcher_test.py b/sdks/python/apache_beam/io/gcp/tests/pubsub_matcher_test.py index 747f00b5f2cb..46656349205a 100644 --- a/sdks/python/apache_beam/io/gcp/tests/pubsub_matcher_test.py +++ b/sdks/python/apache_beam/io/gcp/tests/pubsub_matcher_test.py @@ -19,14 +19,9 @@ # pytype: skip-file -from __future__ import absolute_import - import logging -import sys import unittest -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import import mock from hamcrest import assert_that as hc_assert_that @@ -45,12 +40,6 @@ @mock.patch('time.sleep', return_value=None) @mock.patch('google.cloud.pubsub.SubscriberClient') class PubSubMatcherTest(unittest.TestCase): - @classmethod - def setUpClass(cls): - # Method has been renamed in Python 3 - if sys.version_info[0] < 3: - cls.assertCountEqual = cls.assertItemsEqual - def setUp(self): self.mock_presult = mock.MagicMock() diff --git a/sdks/python/apache_beam/io/gcp/tests/utils.py b/sdks/python/apache_beam/io/gcp/tests/utils.py index b437e97f155a..fa721eb40233 100644 --- a/sdks/python/apache_beam/io/gcp/tests/utils.py +++ 
b/sdks/python/apache_beam/io/gcp/tests/utils.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import random import time diff --git a/sdks/python/apache_beam/io/gcp/tests/utils_test.py b/sdks/python/apache_beam/io/gcp/tests/utils_test.py index 901f33a75225..5ae17436aac0 100644 --- a/sdks/python/apache_beam/io/gcp/tests/utils_test.py +++ b/sdks/python/apache_beam/io/gcp/tests/utils_test.py @@ -19,13 +19,9 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import import mock from apache_beam.io.gcp.pubsub import PubsubMessage diff --git a/sdks/python/apache_beam/io/gcp/tests/xlang_spannerio_it_test.py b/sdks/python/apache_beam/io/gcp/tests/xlang_spannerio_it_test.py index 9fa0d4e79aa9..eb0c71cbfa80 100644 --- a/sdks/python/apache_beam/io/gcp/tests/xlang_spannerio_it_test.py +++ b/sdks/python/apache_beam/io/gcp/tests/xlang_spannerio_it_test.py @@ -17,8 +17,6 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import logging import os @@ -28,8 +26,6 @@ from typing import NamedTuple from typing import Optional -from past.builtins import unicode - import apache_beam as beam from apache_beam import coders from apache_beam.io.gcp.spanner import ReadFromSpanner @@ -56,17 +52,17 @@ class SpannerTestKey(NamedTuple): - f_string: unicode + f_string: str class SpannerTestRow(NamedTuple): - f_string: unicode + f_string: str f_int64: Optional[int] f_boolean: Optional[bool] class SpannerPartTestRow(NamedTuple): - f_string: unicode + f_string: str f_int64: Optional[int] diff --git a/sdks/python/apache_beam/io/hadoopfilesystem.py b/sdks/python/apache_beam/io/hadoopfilesystem.py index c34deadec87d..041b9c54cb0d 100644 --- a/sdks/python/apache_beam/io/hadoopfilesystem.py +++ b/sdks/python/apache_beam/io/hadoopfilesystem.py @@ -20,13 +20,10 @@ # pytype: skip-file -from __future__ import absolute_import - import io import logging import posixpath import re -from builtins import zip from typing import BinaryIO # pylint: disable=unused-import import hdfs diff --git a/sdks/python/apache_beam/io/hadoopfilesystem_test.py b/sdks/python/apache_beam/io/hadoopfilesystem_test.py index e83091f145c9..ed56927cec0e 100644 --- a/sdks/python/apache_beam/io/hadoopfilesystem_test.py +++ b/sdks/python/apache_beam/io/hadoopfilesystem_test.py @@ -19,18 +19,11 @@ # pytype: skip-file -from __future__ import absolute_import - import io import logging import posixpath -import sys import unittest -from builtins import object -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import -from future.utils import itervalues from parameterized import parameterized_class from apache_beam.io import hadoopfilesystem as hdfs @@ -56,10 +49,6 @@ def __init__(self, path, mode='', type='FILE'): def __eq__(self, other): return self.stat == other.stat and self.getvalue() == self.getvalue() - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. 
- return not self == other - def close(self): self.saved_data = self.getvalue() io.BytesIO.close(self) @@ -138,7 +127,7 @@ def list(self, path, status=False): 'list must be called on a directory, got file: %s' % path) result = [] - for file in itervalues(self.files): + for file in self.files.values(): if file.stat['path'].startswith(path): fs = file.get_file_status() result.append((fs[hdfs._FILE_STATUS_PATH_SUFFIX], fs)) @@ -209,12 +198,6 @@ def checksum(self, path): @parameterized_class(('full_urls', ), [(False, ), (True, )]) class HadoopFileSystemTest(unittest.TestCase): - @classmethod - def setUpClass(cls): - # Method has been renamed in Python 3 - if sys.version_info[0] < 3: - cls.assertCountEqual = cls.assertItemsEqual - def setUp(self): self._fake_hdfs = FakeHdfs() hdfs.hdfs.InsecureClient = (lambda *args, **kwargs: self._fake_hdfs) diff --git a/sdks/python/apache_beam/io/hdfs_integration_test/hdfs_integration_test.sh b/sdks/python/apache_beam/io/hdfs_integration_test/hdfs_integration_test.sh index fe3f828c006d..98cf4f74e4ab 100755 --- a/sdks/python/apache_beam/io/hdfs_integration_test/hdfs_integration_test.sh +++ b/sdks/python/apache_beam/io/hdfs_integration_test/hdfs_integration_test.sh @@ -23,7 +23,7 @@ if [[ $# != 1 ]]; then printf "Usage: \n$> ./apache_beam/io/hdfs_integration_test/hdfs_integration_test.sh " printf "\n\tpython_version: [required] Python version used for container build and run tests." - printf " Use 'python:2' for Python2, 'python:3.8' for Python3.8." + printf " Use 'python:3.8' for Python3.8." exit 1 fi diff --git a/sdks/python/apache_beam/io/iobase.py b/sdks/python/apache_beam/io/iobase.py index 28b98a066ca3..8e81da31fb5b 100644 --- a/sdks/python/apache_beam/io/iobase.py +++ b/sdks/python/apache_beam/io/iobase.py @@ -31,24 +31,21 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import logging import math import random import uuid -from builtins import object -from builtins import range from collections import namedtuple -from typing import TYPE_CHECKING from typing import Any from typing import Iterator from typing import Optional from typing import Tuple +from typing import Union from apache_beam import coders from apache_beam import pvalue +from apache_beam.coders.coders import _MemoizingPickleCoder +from apache_beam.internal import pickler from apache_beam.portability import common_urns from apache_beam.portability import python_urns from apache_beam.portability.api import beam_runner_api_pb2 @@ -65,9 +62,6 @@ from apache_beam.utils import urns from apache_beam.utils.windowed_value import WindowedValue -if TYPE_CHECKING: - from apache_beam.runners.pipeline_context import PipelineContext - __all__ = [ 'BoundedSource', 'RangeTracker', @@ -869,6 +863,9 @@ def close(self): class Read(ptransform.PTransform): """A transform that reads a PCollection.""" + # Import runners here to prevent circular imports + from apache_beam.runners.pipeline_context import PipelineContext + def __init__(self, source): # type: (SourceBase) -> None @@ -891,12 +888,14 @@ def get_desired_chunk_size(total_size): def expand(self, pbegin): if isinstance(self.source, BoundedSource): + coders.registry.register_coder(BoundedSource, _MemoizingPickleCoder) display_data = self.source.display_data() or {} display_data['source'] = self.source.__class__ + return ( pbegin | Impulse() - | core.Map(lambda _: self.source) + | core.Map(lambda _: self.source).with_output_types(BoundedSource) | SDFBoundedSourceReader(display_data)) elif 
isinstance(self.source, ptransform.PTransform): # The Read transform can also admit a full PTransform as an input @@ -928,8 +927,10 @@ def display_data(self): 'source_dd': self.source } - def to_runner_api_parameter(self, context): - # type: (PipelineContext) -> Tuple[str, Any] + def to_runner_api_parameter( + self, + context: PipelineContext, + ) -> Tuple[str, Any]: from apache_beam.runners.dataflow.native_io import iobase as dataflow_io if isinstance(self.source, (BoundedSource, dataflow_io.NativeSource)): from apache_beam.io.gcp.pubsub import _PubSubSource @@ -951,17 +952,19 @@ def to_runner_api_parameter(self, context): beam_runner_api_pb2.IsBounded.UNBOUNDED)) elif isinstance(self.source, ptransform.PTransform): return self.source.to_runner_api_parameter(context) + raise NotImplementedError( + "to_runner_api_parameter not " + "implemented for type") @staticmethod - @ptransform.PTransform.register_urn( - common_urns.deprecated_primitives.READ.urn, - beam_runner_api_pb2.ReadPayload) - @ptransform.PTransform.register_urn( - common_urns.composites.PUBSUB_READ.urn, - beam_runner_api_pb2.PubSubReadPayload) - def from_runner_api_parameter(transform, payload, context): - # type: (Any, Any, PipelineContext) -> Read + def from_runner_api_parameter( + transform: beam_runner_api_pb2.PTransform, + payload: Union[beam_runner_api_pb2.ReadPayload, + beam_runner_api_pb2.PubSubReadPayload], + context: PipelineContext, + ) -> "Read": if transform.spec.urn == common_urns.composites.PUBSUB_READ.urn: + assert isinstance(payload, beam_runner_api_pb2.PubSubReadPayload) # Importing locally to prevent circular dependencies. from apache_beam.io.gcp.pubsub import _PubSubSource source = _PubSubSource( @@ -972,8 +975,42 @@ def from_runner_api_parameter(transform, payload, context): timestamp_attribute=payload.timestamp_attribute or None) return Read(source) else: + assert isinstance(payload, beam_runner_api_pb2.ReadPayload) return Read(SourceBase.from_runner_api(payload.source, context)) + @staticmethod + def _from_runner_api_parameter_read( + transform: beam_runner_api_pb2.PTransform, + payload: beam_runner_api_pb2.ReadPayload, + context: PipelineContext, + ) -> "Read": + """Method for type proxying when calling register_urn due to limitations + in type exprs in Python""" + return Read.from_runner_api_parameter(transform, payload, context) + + @staticmethod + def _from_runner_api_parameter_pubsub_read( + transform: beam_runner_api_pb2.PTransform, + payload: beam_runner_api_pb2.PubSubReadPayload, + context: PipelineContext, + ) -> "Read": + """Method for type proxying when calling register_urn due to limitations + in type exprs in Python""" + return Read.from_runner_api_parameter(transform, payload, context) + + +ptransform.PTransform.register_urn( + common_urns.deprecated_primitives.READ.urn, + beam_runner_api_pb2.ReadPayload, + Read._from_runner_api_parameter_read, +) + +ptransform.PTransform.register_urn( + common_urns.composites.PUBSUB_READ.urn, + beam_runner_api_pb2.PubSubReadPayload, + Read._from_runner_api_parameter_pubsub_read, +) + class Write(ptransform.PTransform): """A ``PTransform`` that writes to a sink. @@ -1002,6 +1039,9 @@ class Write(ptransform.PTransform): should not be updated by users. These sinks are processed using a Dataflow native write transform. """ + # Import runners here to prevent circular imports + from apache_beam.runners.pipeline_context import PipelineContext + def __init__(self, sink): """Initializes a Write transform. 
@@ -1030,8 +1070,10 @@ def expand(self, pcoll): 'A sink must inherit iobase.Sink, iobase.NativeSink, ' 'or be a PTransform. Received : %r' % self.sink) - def to_runner_api_parameter(self, context): - # type: (PipelineContext) -> Tuple[str, Any] + def to_runner_api_parameter( + self, + context: PipelineContext, + ) -> Tuple[str, Any]: # Importing locally to prevent circular dependencies. from apache_beam.io.gcp.pubsub import _PubSubSink if isinstance(self.sink, _PubSubSink): @@ -1047,17 +1089,25 @@ def to_runner_api_parameter(self, context): @ptransform.PTransform.register_urn( common_urns.composites.PUBSUB_WRITE.urn, beam_runner_api_pb2.PubSubWritePayload) - def from_runner_api_parameter(ptransform, payload, unused_context): - # type: (Any, Any, PipelineContext) -> Write + def from_runner_api_parameter( + ptransform: Any, + payload: beam_runner_api_pb2.PubSubWritePayload, + unused_context: PipelineContext, + ) -> "Write": if ptransform.spec.urn != common_urns.composites.PUBSUB_WRITE.urn: raise ValueError( 'Write transform cannot be constructed for the given proto %r', ptransform) + if not payload.topic: + raise NotImplementedError( + "from_runner_api_parameter does not " + "handle empty or None topic") + # Importing locally to prevent circular dependencies. from apache_beam.io.gcp.pubsub import _PubSubSink sink = _PubSubSink( - topic=payload.topic or None, + topic=payload.topic, id_label=payload.id_attribute or None, timestamp_attribute=payload.timestamp_attribute or None) return Write(sink) @@ -1079,7 +1129,7 @@ def expand(self, pcoll): if min_shards == 1: keyed_pcoll = pcoll | core.Map(lambda x: (None, x)) else: - keyed_pcoll = pcoll | core.ParDo(_RoundRobinKeyFn(min_shards)) + keyed_pcoll = pcoll | core.ParDo(_RoundRobinKeyFn(), count=min_shards) write_result_coll = ( keyed_pcoll | core.WindowInto(window.GlobalWindows()) @@ -1180,17 +1230,13 @@ def _finalize_write( class _RoundRobinKeyFn(core.DoFn): - def __init__(self, count): - # type: (int) -> None - self.count = count - def start_bundle(self): - self.counter = random.randint(0, self.count - 1) + self.counter = None - def process(self, element): - self.counter += 1 - if self.counter >= self.count: - self.counter -= self.count + def process(self, element, count): + if self.counter is None: + self.counter = random.randrange(0, count) + self.counter = (1 + self.counter) % count yield self.counter, element @@ -1453,33 +1499,39 @@ def source(self): return self._source_bundle.source def try_split(self, fraction_of_remainder): - consumed_fraction = self.range_tracker().fraction_consumed() - fraction = ( - consumed_fraction + (1 - consumed_fraction) * fraction_of_remainder) - position = self.range_tracker().position_at_fraction(fraction) - # Need to stash current stop_pos before splitting since - # range_tracker.split will update its stop_pos if splits - # successfully. - stop_pos = self._source_bundle.stop_position - split_result = self.range_tracker().try_split(position) - if split_result: - split_pos, split_fraction = split_result - primary_weight = self._source_bundle.weight * split_fraction - residual_weight = self._source_bundle.weight - primary_weight - # Update self to primary weight and end position. 
- self._source_bundle = SourceBundle( - primary_weight, - self._source_bundle.source, - self._source_bundle.start_position, - split_pos) - return ( - self, - _SDFBoundedSourceRestriction( - SourceBundle( - residual_weight, - self._source_bundle.source, - split_pos, - stop_pos))) + try: + consumed_fraction = self.range_tracker().fraction_consumed() + fraction = ( + consumed_fraction + (1 - consumed_fraction) * fraction_of_remainder) + position = self.range_tracker().position_at_fraction(fraction) + # Need to stash current stop_pos before splitting since + # range_tracker.split will update its stop_pos if splits + # successfully. + stop_pos = self._source_bundle.stop_position + split_result = self.range_tracker().try_split(position) + if split_result: + split_pos, split_fraction = split_result + primary_weight = self._source_bundle.weight * split_fraction + residual_weight = self._source_bundle.weight - primary_weight + # Update self to primary weight and end position. + self._source_bundle = SourceBundle( + primary_weight, + self._source_bundle.source, + self._source_bundle.start_position, + split_pos) + return ( + self, + _SDFBoundedSourceRestriction( + SourceBundle( + residual_weight, + self._source_bundle.source, + split_pos, + stop_pos))) + except Exception: + # For any exceptions from underlying trySplit calls, the wrapper will + # think that the source refuses to split at this point. In this case, + # no split happens at the wrapper level. + return None class _SDFBoundedSourceRestrictionTracker(RestrictionTracker): @@ -1525,6 +1577,18 @@ def is_bounded(self): return True +class _SDFBoundedSourceWrapperRestrictionCoder(coders.Coder): + def decode(self, value): + return _SDFBoundedSourceRestriction(SourceBundle(*pickler.loads(value))) + + def encode(self, restriction): + return pickler.dumps(( + restriction._source_bundle.weight, + restriction._source_bundle.source, + restriction._source_bundle.start_position, + restriction._source_bundle.stop_position)) + + class _SDFBoundedSourceRestrictionProvider(core.RestrictionProvider): """ A `RestrictionProvider` that is used by SDF for `BoundedSource`. @@ -1532,8 +1596,10 @@ class _SDFBoundedSourceRestrictionProvider(core.RestrictionProvider): This restriction provider initializes restriction based on input element that is expected to be of BoundedSource type. 
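The new `_SDFBoundedSourceWrapperRestrictionCoder` above simply pickles the four `SourceBundle` fields and rebuilds the restriction on decode. A quick round-trip sketch, purely illustrative and using the `RangeSource` test helper rather than a real source:

```python
from apache_beam.io.concat_source_test import RangeSource
from apache_beam.io.iobase import (
    SourceBundle,
    _SDFBoundedSourceRestriction,
    _SDFBoundedSourceWrapperRestrictionCoder)

coder = _SDFBoundedSourceWrapperRestrictionCoder()
restriction = _SDFBoundedSourceRestriction(
    SourceBundle(10.0, RangeSource(0, 10), 0, 10))

# encode() pickles (weight, source, start_position, stop_position);
# decode() rebuilds an equivalent restriction from those fields.
decoded = coder.decode(coder.encode(restriction))
assert decoded.weight() == 10.0
assert isinstance(decoded.source(), RangeSource)
```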
""" - def __init__(self, desired_chunk_size=None): + def __init__(self, desired_chunk_size=None, restriction_coder=None): self._desired_chunk_size = desired_chunk_size + self._restriction_coder = ( + restriction_coder or _SDFBoundedSourceWrapperRestrictionCoder()) def _check_source(self, src): if not isinstance(src, BoundedSource): @@ -1570,7 +1636,7 @@ def restriction_size(self, element, restriction): return restriction.weight() def restriction_coder(self): - return coders.DillCoder() + return self._restriction_coder class SDFBoundedSourceReader(PTransform): diff --git a/sdks/python/apache_beam/io/iobase_test.py b/sdks/python/apache_beam/io/iobase_test.py index c8d147acf9c3..bde05664e380 100644 --- a/sdks/python/apache_beam/io/iobase_test.py +++ b/sdks/python/apache_beam/io/iobase_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest import mock @@ -29,6 +27,7 @@ from apache_beam.io.concat_source import ConcatSource from apache_beam.io.concat_source_test import RangeSource from apache_beam.io import iobase +from apache_beam.io import range_trackers from apache_beam.io.iobase import SourceBundle from apache_beam.options.pipeline_options import DebugOptions from apache_beam.testing.util import assert_that @@ -183,6 +182,18 @@ def test_try_split_at_remainder(self): actual_primary._source_bundle.weight, self.sdf_restriction_tracker.current_restriction().weight()) + def test_try_split_with_any_exception(self): + source_bundle = SourceBundle( + range_trackers.OffsetRangeTracker.OFFSET_INFINITY, + RangeSource(0, range_trackers.OffsetRangeTracker.OFFSET_INFINITY), + 0, + range_trackers.OffsetRangeTracker.OFFSET_INFINITY) + self.sdf_restriction_tracker = ( + iobase._SDFBoundedSourceRestrictionTracker( + iobase._SDFBoundedSourceRestriction(source_bundle))) + self.sdf_restriction_tracker.try_claim(0) + self.assertIsNone(self.sdf_restriction_tracker.try_split(0.5)) + class UseSdfBoundedSourcesTests(unittest.TestCase): def _run_sdf_wrapper_pipeline(self, source, expected_values): diff --git a/sdks/python/apache_beam/io/jdbc.py b/sdks/python/apache_beam/io/jdbc.py index 0b9fb9fb339f..060a4d8d177a 100644 --- a/sdks/python/apache_beam/io/jdbc.py +++ b/sdks/python/apache_beam/io/jdbc.py @@ -18,7 +18,7 @@ """PTransforms for supporting Jdbc in Python pipelines. These transforms are currently supported by Beam portable - Flink and Spark runners. + Flink, Spark, and Dataflow v2 runners. 
**Setup** @@ -79,12 +79,8 @@ # pytype: skip-file -from __future__ import absolute_import - import typing -from past.builtins import unicode - from apache_beam.coders import RowCoder from apache_beam.transforms.external import BeamJarExpansionService from apache_beam.transforms.external import ExternalTransform @@ -104,20 +100,20 @@ def default_io_expansion_service(): JdbcConfigSchema = typing.NamedTuple( 'JdbcConfigSchema', - [('location', unicode), ('config', bytes)], + [('location', str), ('config', bytes)], ) Config = typing.NamedTuple( 'Config', [ - ('driver_class_name', unicode), - ('jdbc_url', unicode), - ('username', unicode), - ('password', unicode), - ('connection_properties', typing.Optional[unicode]), - ('connection_init_sqls', typing.Optional[typing.List[unicode]]), - ('write_statement', typing.Optional[unicode]), - ('read_query', typing.Optional[unicode]), + ('driver_class_name', str), + ('jdbc_url', str), + ('username', str), + ('password', str), + ('connection_properties', typing.Optional[str]), + ('connection_init_sqls', typing.Optional[typing.List[str]]), + ('write_statement', typing.Optional[str]), + ('read_query', typing.Optional[str]), ('fetch_size', typing.Optional[int]), ('output_parallelization', typing.Optional[bool]), ], diff --git a/sdks/python/apache_beam/io/kafka.py b/sdks/python/apache_beam/io/kafka.py index b36c791bfa74..3e28cfe59839 100644 --- a/sdks/python/apache_beam/io/kafka.py +++ b/sdks/python/apache_beam/io/kafka.py @@ -80,27 +80,20 @@ # pytype: skip-file -from __future__ import absolute_import - import typing -from past.builtins import unicode - from apache_beam.transforms.external import BeamJarExpansionService from apache_beam.transforms.external import ExternalTransform from apache_beam.transforms.external import NamedTupleBasedPayloadBuilder ReadFromKafkaSchema = typing.NamedTuple( 'ReadFromKafkaSchema', - [ - ('consumer_config', typing.Mapping[unicode, unicode]), - ('topics', typing.List[unicode]), - ('key_deserializer', unicode), - ('value_deserializer', unicode), - ('start_read_time', typing.Optional[int]), - ('max_num_records', typing.Optional[int]), - ('max_read_time', typing.Optional[int]), - ]) + [('consumer_config', typing.Mapping[str, str]), + ('topics', typing.List[str]), ('key_deserializer', str), + ('value_deserializer', str), ('start_read_time', typing.Optional[int]), + ('max_num_records', typing.Optional[int]), + ('max_read_time', typing.Optional[int]), + ('commit_offset_in_finalize', bool), ('timestamp_policy', str)]) def default_io_expansion_service(): @@ -120,7 +113,12 @@ class ReadFromKafka(ExternalTransform): byte_array_deserializer = ( 'org.apache.kafka.common.serialization.ByteArrayDeserializer') - URN = 'beam:external:java:kafka:read:v1' + processing_time_policy = 'ProcessingTime' + create_time_policy = 'CreateTime' + log_append_time = 'LogAppendTime' + + URN_WITH_METADATA = 'beam:external:java:kafkaio:externalwithmetadata:v1' + URN_WITHOUT_METADATA = 'beam:external:java:kafkaio:typedwithoutmetadata:v1' def __init__( self, @@ -131,6 +129,9 @@ def __init__( start_read_time=None, max_num_records=None, max_read_time=None, + commit_offset_in_finalize=False, + timestamp_policy=processing_time_policy, + with_metadata=False, expansion_service=None, ): """ @@ -152,10 +153,26 @@ def __init__( for tests and demo applications. :param max_read_time: Maximum amount of time in seconds the transform executes. Mainly used for tests and demo applications. + :param commit_offset_in_finalize: Whether to commit offsets when finalizing. 
+ :param timestamp_policy: The built-in timestamp policy which is used for + extracting timestamp from KafkaRecord. + :param with_metadata: whether the returned PCollection should contain + Kafka related metadata or not. If False (default), elements of the + returned PCollection will be of type 'bytes' if True, elements of the + returned PCollection will be of the type 'Row'. Note that, currently + this only works when using default key and value deserializers where + Java Kafka Reader reads keys and values as 'byte[]'. :param expansion_service: The address (host:port) of the ExpansionService. """ + if timestamp_policy not in [ReadFromKafka.processing_time_policy, + ReadFromKafka.create_time_policy, + ReadFromKafka.log_append_time]: + raise ValueError( + 'timestamp_policy should be one of ' + '[ProcessingTime, CreateTime, LogAppendTime]') + super(ReadFromKafka, self).__init__( - self.URN, + self.URN_WITH_METADATA if with_metadata else self.URN_WITHOUT_METADATA, NamedTupleBasedPayloadBuilder( ReadFromKafkaSchema( consumer_config=consumer_config, @@ -165,17 +182,18 @@ def __init__( max_num_records=max_num_records, max_read_time=max_read_time, start_read_time=start_read_time, - )), + commit_offset_in_finalize=commit_offset_in_finalize, + timestamp_policy=timestamp_policy)), expansion_service or default_io_expansion_service()) WriteToKafkaSchema = typing.NamedTuple( 'WriteToKafkaSchema', [ - ('producer_config', typing.Mapping[unicode, unicode]), - ('topic', unicode), - ('key_serializer', unicode), - ('value_serializer', unicode), + ('producer_config', typing.Mapping[str, str]), + ('topic', str), + ('key_serializer', str), + ('value_serializer', str), ]) diff --git a/sdks/python/apache_beam/io/kinesis.py b/sdks/python/apache_beam/io/kinesis.py index 3f8353d07c56..4a70f76d83d1 100644 --- a/sdks/python/apache_beam/io/kinesis.py +++ b/sdks/python/apache_beam/io/kinesis.py @@ -79,16 +79,12 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import time from typing import Mapping from typing import NamedTuple from typing import Optional -from past.builtins import unicode - from apache_beam import BeamJarExpansionService from apache_beam import ExternalTransform from apache_beam import NamedTupleBasedPayloadBuilder @@ -109,14 +105,14 @@ def default_io_expansion_service(): WriteToKinesisSchema = NamedTuple( 'WriteToKinesisSchema', [ - ('stream_name', unicode), - ('aws_access_key', unicode), - ('aws_secret_key', unicode), - ('region', unicode), - ('partition_key', unicode), - ('service_endpoint', Optional[unicode]), + ('stream_name', str), + ('aws_access_key', str), + ('aws_secret_key', str), + ('region', str), + ('partition_key', str), + ('service_endpoint', Optional[str]), ('verify_certificate', Optional[bool]), - ('producer_properties', Optional[Mapping[unicode, unicode]]), + ('producer_properties', Optional[Mapping[str, str]]), ], ) @@ -177,20 +173,20 @@ def __init__( ReadFromKinesisSchema = NamedTuple( 'ReadFromKinesisSchema', [ - ('stream_name', unicode), - ('aws_access_key', unicode), - ('aws_secret_key', unicode), - ('region', unicode), - ('service_endpoint', Optional[unicode]), + ('stream_name', str), + ('aws_access_key', str), + ('aws_secret_key', str), + ('region', str), + ('service_endpoint', Optional[str]), ('verify_certificate', Optional[bool]), ('max_num_records', Optional[int]), ('max_read_time', Optional[int]), - ('initial_position_in_stream', Optional[unicode]), + ('initial_position_in_stream', Optional[str]), ('initial_timestamp_in_stream', Optional[int]), 
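An illustrative, untested usage sketch (not from the patch itself) of the new ReadFromKafka options introduced above; the broker address and topic name are placeholders, and actually running it requires Java and the default cross-language expansion service:

import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka

with beam.Pipeline() as p:
    _ = (
        p
        | ReadFromKafka(
            consumer_config={'bootstrap.servers': 'localhost:9092'},
            topics=['my_topic'],
            # New in this change: commit offsets when bundles are finalized.
            commit_offset_in_finalize=True,
            # New in this change: take timestamps from the record CreateTime.
            timestamp_policy=ReadFromKafka.create_time_policy,
            # Default: plain bytes elements; True would return Row metadata.
            with_metadata=False)
        | beam.Map(print))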
('request_records_limit', Optional[int]), ('up_to_date_threshold', Optional[int]), ('max_capacity_per_shard', Optional[int]), - ('watermark_policy', Optional[unicode]), + ('watermark_policy', Optional[str]), ('watermark_idle_duration_threshold', Optional[int]), ('rate_limit', Optional[int]), ], diff --git a/sdks/python/apache_beam/io/localfilesystem.py b/sdks/python/apache_beam/io/localfilesystem.py index 3cbdba61d004..56668aa89529 100644 --- a/sdks/python/apache_beam/io/localfilesystem.py +++ b/sdks/python/apache_beam/io/localfilesystem.py @@ -19,12 +19,9 @@ # pytype: skip-file -from __future__ import absolute_import - import io import os import shutil -from builtins import zip from typing import BinaryIO # pylint: disable=unused-import from apache_beam.io.filesystem import BeamIOError @@ -162,9 +159,7 @@ def create( Returns: file handle with a close function for the user to use """ - if not os.path.exists(os.path.dirname(path)): - # TODO(Py3): Add exist_ok parameter. - os.makedirs(os.path.dirname(path)) + os.makedirs(os.path.dirname(path), exist_ok=True) return self._path_open(path, 'wb', mime_type, compression_type) def open( diff --git a/sdks/python/apache_beam/io/localfilesystem_test.py b/sdks/python/apache_beam/io/localfilesystem_test.py index 80d32697f359..84455c28f4cd 100644 --- a/sdks/python/apache_beam/io/localfilesystem_test.py +++ b/sdks/python/apache_beam/io/localfilesystem_test.py @@ -20,18 +20,13 @@ # pytype: skip-file -from __future__ import absolute_import - import filecmp import logging import os import shutil -import sys import tempfile import unittest -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import import mock from parameterized import param from parameterized import parameterized @@ -62,12 +57,6 @@ def _split(path): class LocalFileSystemTest(unittest.TestCase): - @classmethod - def setUpClass(cls): - # Method has been renamed in Python 3 - if sys.version_info[0] < 3: - cls.assertCountEqual = cls.assertItemsEqual - def setUp(self): self.tmpdir = tempfile.mkdtemp() pipeline_options = PipelineOptions() diff --git a/sdks/python/apache_beam/io/mongodbio.py b/sdks/python/apache_beam/io/mongodbio.py index 392c824b376f..c6f7d97f4705 100644 --- a/sdks/python/apache_beam/io/mongodbio.py +++ b/sdks/python/apache_beam/io/mongodbio.py @@ -66,17 +66,17 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import itertools import json import logging import math import struct +from typing import Union import apache_beam as beam from apache_beam.io import iobase +from apache_beam.io.range_trackers import LexicographicKeyRangeTracker +from apache_beam.io.range_trackers import OffsetRangeTracker from apache_beam.io.range_trackers import OrderedPositionRangeTracker from apache_beam.transforms import DoFn from apache_beam.transforms import PTransform @@ -86,11 +86,13 @@ _LOGGER = logging.getLogger(__name__) try: - # Mongodb has its own bundled bson, which is not compatible with bson pakcage. + # Mongodb has its own bundled bson, which is not compatible with bson package. # (https://github.com/py-bson/bson/issues/82). Try to import objectid and if # it fails because bson package is installed, MongoDB IO will not work but at # least rest of the SDK will work. + from bson import json_util from bson import objectid + from bson.objectid import ObjectId # pymongo also internally depends on bson. 
from pymongo import ASCENDING @@ -99,24 +101,30 @@ from pymongo import ReplaceOne except ImportError: objectid = None + json_util = None + ObjectId = None + ASCENDING = 1 + DESCENDING = -1 + MongoClient = None + ReplaceOne = None _LOGGER.warning("Could not find a compatible bson package.") -__all__ = ['ReadFromMongoDB', 'WriteToMongoDB'] +__all__ = ["ReadFromMongoDB", "WriteToMongoDB"] @experimental() class ReadFromMongoDB(PTransform): - """A ``PTransform`` to read MongoDB documents into a ``PCollection``. - """ + """A ``PTransform`` to read MongoDB documents into a ``PCollection``.""" def __init__( self, - uri='mongodb://localhost:27017', + uri="mongodb://localhost:27017", db=None, coll=None, filter=None, projection=None, extra_client_params=None, - bucket_auto=False): + bucket_auto=False, + ): """Initialize a :class:`ReadFromMongoDB` Args: @@ -139,16 +147,14 @@ def __init__( Returns: :class:`~apache_beam.transforms.ptransform.PTransform` - """ if extra_client_params is None: extra_client_params = {} if not isinstance(db, str): - raise ValueError('ReadFromMongDB db param must be specified as a string') + raise ValueError("ReadFromMongDB db param must be specified as a string") if not isinstance(coll, str): raise ValueError( - 'ReadFromMongDB coll param must be specified as a ' - 'string') + "ReadFromMongDB coll param must be specified as a string") self._mongo_source = _BoundedMongoSource( uri=uri, db=db, @@ -156,13 +162,88 @@ def __init__( filter=filter, projection=projection, extra_client_params=extra_client_params, - bucket_auto=bucket_auto) + bucket_auto=bucket_auto, + ) def expand(self, pcoll): return pcoll | iobase.Read(self._mongo_source) +class _ObjectIdRangeTracker(OrderedPositionRangeTracker): + """RangeTracker for tracking mongodb _id of bson ObjectId type.""" + def position_to_fraction( + self, + pos: ObjectId, + start: ObjectId, + end: ObjectId, + ): + """Returns the fraction of keys in the range [start, end) that + are less than the given key. + """ + pos_number = _ObjectIdHelper.id_to_int(pos) + start_number = _ObjectIdHelper.id_to_int(start) + end_number = _ObjectIdHelper.id_to_int(end) + return (pos_number - start_number) / (end_number - start_number) + + def fraction_to_position( + self, + fraction: float, + start: ObjectId, + end: ObjectId, + ): + """Converts a fraction between 0 and 1 + to a position between start and end. + """ + start_number = _ObjectIdHelper.id_to_int(start) + end_number = _ObjectIdHelper.id_to_int(end) + total = end_number - start_number + pos = int(total * fraction + start_number) + # make sure split position is larger than start position and smaller than + # end position. + if pos <= start_number: + return _ObjectIdHelper.increment_id(start, 1) + + if pos >= end_number: + return _ObjectIdHelper.increment_id(end, -1) + + return _ObjectIdHelper.int_to_id(pos) + + class _BoundedMongoSource(iobase.BoundedSource): + """A MongoDB source that reads a finite amount of input records. + + This class defines following operations which can be used to read + MongoDB source efficiently. + + * Size estimation - method ``estimate_size()`` may return an accurate + estimation in bytes for the size of the source. + * Splitting into bundles of a given size - method ``split()`` can be used to + split the source into a set of sub-sources (bundles) based on a desired + bundle size. 
+ * Getting a RangeTracker - method ``get_range_tracker()`` should return a + ``RangeTracker`` object for a given position range for the position type + of the records returned by the source. + * Reading the data - method ``read()`` can be used to read data from the + source while respecting the boundaries defined by a given + ``RangeTracker``. + + A runner will perform reading the source in two steps. + + (1) Method ``get_range_tracker()`` will be invoked with start and end + positions to obtain a ``RangeTracker`` for the range of positions the + runner intends to read. Source must define a default initial start and end + position range. These positions must be used if the start and/or end + positions passed to the method ``get_range_tracker()`` are ``None`` + (2) Method read() will be invoked with the ``RangeTracker`` obtained in the + previous step. + + **Mutability** + + A ``_BoundedMongoSource`` object should not be mutated while + its methods (for example, ``read()``) are being invoked by a runner. Runner + implementations may invoke methods of ``_BoundedMongoSource`` objects through + multi-threaded and/or reentrant execution modes. + """ def __init__( self, uri=None, @@ -171,7 +252,8 @@ def __init__( filter=None, projection=None, extra_client_params=None, - bucket_auto=False): + bucket_auto=False, + ): if extra_client_params is None: extra_client_params = {} if filter is None: @@ -186,13 +268,33 @@ def __init__( def estimate_size(self): with MongoClient(self.uri, **self.spec) as client: - return client[self.db].command('collstats', self.coll).get('size') + return client[self.db].command("collstats", self.coll).get("size") def _estimate_average_document_size(self): with MongoClient(self.uri, **self.spec) as client: - return client[self.db].command('collstats', self.coll).get('avgObjSize') + return client[self.db].command("collstats", self.coll).get("avgObjSize") + + def split( + self, + desired_bundle_size: int, + start_position: Union[int, str, bytes, ObjectId] = None, + stop_position: Union[int, str, bytes, ObjectId] = None, + ): + """Splits the source into a set of bundles. + + Bundles should be approximately of size ``desired_bundle_size`` bytes. + + Args: + desired_bundle_size: the desired size (in bytes) of the bundles returned. + start_position: if specified the given position must be used as the + starting position of the first bundle. + stop_position: if specified the given position must be used as the ending + position of the last bundle. + Returns: + an iterator of objects of type 'SourceBundle' that gives information about + the generated bundles. 
+ """ - def split(self, desired_bundle_size, start_position=None, stop_position=None): desired_bundle_size_in_mb = desired_bundle_size // 1024 // 1024 # for desired bundle size, if desired chunk size smaller than 1mb, use @@ -202,18 +304,21 @@ def split(self, desired_bundle_size, start_position=None, stop_position=None): is_initial_split = start_position is None and stop_position is None start_position, stop_position = self._replace_none_positions( - start_position, stop_position) + start_position, stop_position + ) if self.bucket_auto: # Use $bucketAuto for bundling split_keys = [] weights = [] - for bucket in self._get_auto_buckets(desired_bundle_size_in_mb, - start_position, - stop_position, - is_initial_split): - split_keys.append({'_id': bucket['_id']['max']}) - weights.append(bucket['count']) + for bucket in self._get_auto_buckets( + desired_bundle_size_in_mb, + start_position, + stop_position, + is_initial_split, + ): + split_keys.append({"_id": bucket["_id"]["max"]}) + weights.append(bucket["count"]) else: # Use splitVector for bundling split_keys = self._get_split_keys( @@ -224,12 +329,13 @@ def split(self, desired_bundle_size, start_position=None, stop_position=None): for split_key_id, weight in zip(split_keys, weights): if bundle_start >= stop_position: break - bundle_end = min(stop_position, split_key_id['_id']) + bundle_end = min(stop_position, split_key_id["_id"]) yield iobase.SourceBundle( weight=weight, source=self, start_position=bundle_start, - stop_position=bundle_end) + stop_position=bundle_end, + ) bundle_start = bundle_end # add range of last split_key to stop_position if bundle_start < stop_position: @@ -239,61 +345,146 @@ def split(self, desired_bundle_size, start_position=None, stop_position=None): weight=weight, source=self, start_position=bundle_start, - stop_position=stop_position) + stop_position=stop_position, + ) + + def get_range_tracker( + self, + start_position: Union[int, str, ObjectId] = None, + stop_position: Union[int, str, ObjectId] = None, + ) -> Union[ + _ObjectIdRangeTracker, OffsetRangeTracker, LexicographicKeyRangeTracker]: + """Returns a RangeTracker for a given position range depending on type. - def get_range_tracker(self, start_position, stop_position): + Args: + start_position: starting position of the range. If 'None' default start + position of the source must be used. + stop_position: ending position of the range. If 'None' default stop + position of the source must be used. + Returns: + a ``_ObjectIdRangeTracker``, ``OffsetRangeTracker`` + or ``LexicographicKeyRangeTracker`` depending on the given position range. + """ start_position, stop_position = self._replace_none_positions( - start_position, stop_position) - return _ObjectIdRangeTracker(start_position, stop_position) + start_position, stop_position + ) + + if isinstance(start_position, ObjectId): + return _ObjectIdRangeTracker(start_position, stop_position) + + if isinstance(start_position, int): + return OffsetRangeTracker(start_position, stop_position) + + if isinstance(start_position, str): + return LexicographicKeyRangeTracker(start_position, stop_position) + + raise NotImplementedError( + f"RangeTracker for {type(start_position)} not implemented!") def read(self, range_tracker): + """Returns an iterator that reads data from the source. + + The returned set of data must respect the boundaries defined by the given + ``RangeTracker`` object. For example: + + * Returned set of data must be for the range + ``[range_tracker.start_position, range_tracker.stop_position)``. 
Note + that a source may decide to return records that start after + ``range_tracker.stop_position``. See documentation in class + ``RangeTracker`` for more details. Also, note that the framework might + invoke ``range_tracker.try_split()`` to perform dynamic split + operations. range_tracker.stop_position may be updated + dynamically due to successful dynamic split operations. + * Method ``range_tracker.try_split()`` must be invoked for every record + that starts at a split point. + * Method ``range_tracker.record_current_position()`` may be invoked for + records that do not start at split points. + + Args: + range_tracker: a ``RangeTracker`` whose boundaries must be respected + when reading data from the source. A runner that reads this + source must pass a ``RangeTracker`` object that is not + ``None``. + Returns: + an iterator of data read by the source. + """ with MongoClient(self.uri, **self.spec) as client: all_filters = self._merge_id_filter( range_tracker.start_position(), range_tracker.stop_position()) - docs_cursor = client[self.db][self.coll].find( - filter=all_filters, - projection=self.projection).sort([('_id', ASCENDING)]) + docs_cursor = ( + client[self.db][self.coll].find( + filter=all_filters, + projection=self.projection).sort([("_id", ASCENDING)])) for doc in docs_cursor: - if not range_tracker.try_claim(doc['_id']): + if not range_tracker.try_claim(doc["_id"]): return yield doc def display_data(self): - res = super(_BoundedMongoSource, self).display_data() - res['uri'] = self.uri - res['database'] = self.db - res['collection'] = self.coll - res['filter'] = json.dumps(self.filter) - res['projection'] = str(self.projection) - res['mongo_client_spec'] = json.dumps(self.spec) - res['bucket_auto'] = self.bucket_auto + """Returns the display data associated with a pipeline component.""" + res = super().display_data() + res["database"] = self.db + res["collection"] = self.coll + res["filter"] = json.dumps(self.filter, default=json_util.default) + res["projection"] = str(self.projection) + res["bucket_auto"] = self.bucket_auto return res - def _get_split_keys(self, desired_chunk_size_in_mb, start_pos, end_pos): - # calls mongodb splitVector command to get document ids at split position - if start_pos >= _ObjectIdHelper.increment_id(end_pos, -1): - # single document not splittable + @staticmethod + def _range_is_not_splittable( + start_pos: Union[int, str, ObjectId], + end_pos: Union[int, str, ObjectId], + ): + """Return `True` if splitting the range doesn't make sense + (a single document is not splittable), + `False` otherwise. + """ + return (( + isinstance(start_pos, ObjectId) and + start_pos >= _ObjectIdHelper.increment_id(end_pos, -1)) or + (isinstance(start_pos, int) and start_pos >= end_pos - 1) or + (isinstance(start_pos, str) and start_pos >= end_pos)) + + def _get_split_keys( + self, + desired_chunk_size_in_mb: int, + start_pos: Union[int, str, ObjectId], + end_pos: Union[int, str, ObjectId], + ): + """Calls MongoDB `splitVector` command + to get document ids at split position.
+ """ + # single document not splittable + if self._range_is_not_splittable(start_pos, end_pos): return [] + with MongoClient(self.uri, **self.spec) as client: - name_space = '%s.%s' % (self.db, self.coll) - return ( - client[self.db].command( - 'splitVector', - name_space, - keyPattern={'_id': 1}, # Ascending index - min={'_id': start_pos}, - max={'_id': end_pos}, - maxChunkSize=desired_chunk_size_in_mb)['splitKeys']) + name_space = "%s.%s" % (self.db, self.coll) + return client[self.db].command( + "splitVector", + name_space, + keyPattern={"_id": 1}, # Ascending index + min={"_id": start_pos}, + max={"_id": end_pos}, + maxChunkSize=desired_chunk_size_in_mb, + )["splitKeys"] def _get_auto_buckets( - self, desired_chunk_size_in_mb, start_pos, end_pos, is_initial_split): - - if start_pos >= _ObjectIdHelper.increment_id(end_pos, -1): - # single document not splittable + self, + desired_chunk_size_in_mb: int, + start_pos: Union[int, str, ObjectId], + end_pos: Union[int, str, ObjectId], + is_initial_split: bool, + ) -> list: + """Use MongoDB `$bucketAuto` aggregation to split collection into bundles + instead of `splitVector` command, which does not work with MongoDB Atlas. + """ + # single document not splittable + if self._range_is_not_splittable(start_pos, end_pos): return [] if is_initial_split and not self.filter: - # total collection size + # total collection size in MB size_in_mb = self.estimate_size() / float(1 << 20) else: # size of documents within start/end id range and possibly filtered @@ -310,34 +501,46 @@ def _get_auto_buckets( pipeline = [ { # filter by positions and by the custom filter if any - '$match': self._merge_id_filter(start_pos, end_pos) + "$match": self._merge_id_filter(start_pos, end_pos) }, { - '$bucketAuto': { - 'groupBy': '$_id', 'buckets': bucket_count + "$bucketAuto": { + "groupBy": "$_id", "buckets": bucket_count } - } + }, ] - buckets = list(client[self.db][self.coll].aggregate(pipeline)) + buckets = list( + # Use `allowDiskUse` option to avoid aggregation limit of 100 Mb RAM + client[self.db][self.coll].aggregate(pipeline, allowDiskUse=True)) if buckets: - buckets[-1]['_id']['max'] = end_pos + buckets[-1]["_id"]["max"] = end_pos return buckets - def _merge_id_filter(self, start_position, stop_position): - # Merge the default filter (if any) with refined _id field range - # of range_tracker. - # $gte specifies start position (inclusive) - # and $lt specifies the end position (exclusive), - # see more at - # https://docs.mongodb.com/manual/reference/operator/query/gte/ and - # https://docs.mongodb.com/manual/reference/operator/query/lt/ - id_filter = {'_id': {'$gte': start_position, '$lt': stop_position}} + def _merge_id_filter( + self, + start_position: Union[int, str, bytes, ObjectId], + stop_position: Union[int, str, bytes, ObjectId] = None, + ) -> dict: + """Merge the default filter (if any) with refined _id field range + of range_tracker. 
+ $gte specifies start position (inclusive) + and $lt specifies the end position (exclusive), + see more at + https://docs.mongodb.com/manual/reference/operator/query/gte/ and + https://docs.mongodb.com/manual/reference/operator/query/lt/ + """ + + if stop_position is None: + id_filter = {"_id": {"$gte": start_position}} + else: + id_filter = {"_id": {"$gte": start_position, "$lt": stop_position}} + if self.filter: all_filters = { # see more at # https://docs.mongodb.com/manual/reference/operator/query/and/ - '$and': [self.filter.copy(), id_filter] + "$and": [self.filter.copy(), id_filter] } else: all_filters = id_filter @@ -346,45 +549,58 @@ def _merge_id_filter(self, start_position, stop_position): def _get_head_document_id(self, sort_order): with MongoClient(self.uri, **self.spec) as client: - cursor = client[self.db][self.coll].find( - filter={}, projection=[]).sort([('_id', sort_order)]).limit(1) + cursor = ( + client[self.db][self.coll].find(filter={}, projection=[]).sort([ + ("_id", sort_order) + ]).limit(1)) try: - return cursor[0]['_id'] + return cursor[0]["_id"] + except IndexError: - raise ValueError('Empty Mongodb collection') + raise ValueError("Empty Mongodb collection") def _replace_none_positions(self, start_position, stop_position): + if start_position is None: start_position = self._get_head_document_id(ASCENDING) if stop_position is None: last_doc_id = self._get_head_document_id(DESCENDING) # increment last doc id binary value by 1 to make sure the last document # is not excluded - stop_position = _ObjectIdHelper.increment_id(last_doc_id, 1) + if isinstance(last_doc_id, ObjectId): + stop_position = _ObjectIdHelper.increment_id(last_doc_id, 1) + elif isinstance(last_doc_id, int): + stop_position = last_doc_id + 1 + elif isinstance(last_doc_id, str): + stop_position = last_doc_id + '\x00' + return start_position, stop_position def _count_id_range(self, start_position, stop_position): - # Number of documents between start_position (inclusive) - # and stop_position (exclusive), respecting the custom filter if any. + """Number of documents between start_position (inclusive) + and stop_position (exclusive), respecting the custom filter if any. + """ with MongoClient(self.uri, **self.spec) as client: return client[self.db][self.coll].count_documents( filter=self._merge_id_filter(start_position, stop_position)) -class _ObjectIdHelper(object): +class _ObjectIdHelper: """A Utility class to manipulate bson object ids.""" @classmethod - def id_to_int(cls, id): + def id_to_int(cls, _id: Union[int, ObjectId]) -> int: """ Args: - id: ObjectId required for each MongoDB document _id field. + _id: ObjectId required for each MongoDB document _id field. Returns: Converted integer value of ObjectId's 12 bytes binary value. - """ + if isinstance(_id, int): + return _id + # converts object id binary to integer # id object is bytes type with size of 12 - ints = struct.unpack('>III', id.binary) + ints = struct.unpack(">III", _id.binary) return (ints[0] << 64) + (ints[1] << 32) + ints[2] @classmethod @@ -395,61 +611,45 @@ def int_to_id(cls, number): Returns: The ObjectId that has the 12 bytes binary converted from the integer value. - """ # converts integer value to object id. Int value should be less than # (2 ^ 96) so it can be convert to 12 bytes required by object id. 
if number < 0 or number >= (1 << 96): - raise ValueError('number value must be within [0, %s)' % (1 << 96)) - ints = [(number & 0xffffffff0000000000000000) >> 64, - (number & 0x00000000ffffffff00000000) >> 32, - number & 0x0000000000000000ffffffff] + raise ValueError("number value must be within [0, %s)" % (1 << 96)) + ints = [ + (number & 0xFFFFFFFF0000000000000000) >> 64, + (number & 0x00000000FFFFFFFF00000000) >> 32, + number & 0x0000000000000000FFFFFFFF, + ] - bytes = struct.pack('>III', *ints) - return objectid.ObjectId(bytes) + number_bytes = struct.pack(">III", *ints) + return ObjectId(number_bytes) @classmethod - def increment_id(cls, object_id, inc): + def increment_id( + cls, + _id: ObjectId, + inc: int, + ) -> ObjectId: """ + Increment object_id binary value by inc value and return new object id. + Args: - object_id: The ObjectId to change. - inc(int): The incremental int value to be added to ObjectId. + _id: The `_id` to change. + inc(int): The incremental int value to be added to `_id`. Returns: - + `_id` incremented by `inc` value """ - # increment object_id binary value by inc value and return new object id. - id_number = _ObjectIdHelper.id_to_int(object_id) + id_number = _ObjectIdHelper.id_to_int(_id) new_number = id_number + inc if new_number < 0 or new_number >= (1 << 96): raise ValueError( - 'invalid incremental, inc value must be within [' - '%s, %s)' % (0 - id_number, 1 << 96 - id_number)) + "invalid incremental, inc value must be within [" + "%s, %s)" % (0 - id_number, 1 << 96 - id_number)) return _ObjectIdHelper.int_to_id(new_number) -class _ObjectIdRangeTracker(OrderedPositionRangeTracker): - """RangeTracker for tracking mongodb _id of bson ObjectId type.""" - def position_to_fraction(self, pos, start, end): - pos_number = _ObjectIdHelper.id_to_int(pos) - start_number = _ObjectIdHelper.id_to_int(start) - end_number = _ObjectIdHelper.id_to_int(end) - return (pos_number - start_number) / (end_number - start_number) - - def fraction_to_position(self, fraction, start, end): - start_number = _ObjectIdHelper.id_to_int(start) - end_number = _ObjectIdHelper.id_to_int(end) - total = end_number - start_number - pos = int(total * fraction + start_number) - # make sure split position is larger than start position and smaller than - # end position. 
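A worked example (an editorial illustration, not from the patch itself) of the _ObjectIdHelper conversions, using the same boundary values exercised by the tests further down; it assumes the bson package is installed:

from bson.objectid import ObjectId

from apache_beam.io.mongodbio import _ObjectIdHelper

assert _ObjectIdHelper.id_to_int(ObjectId('00' * 12)) == 0
assert _ObjectIdHelper.id_to_int(ObjectId('ff' * 12)) == 2**96 - 1
assert _ObjectIdHelper.int_to_id(2**32 - 1) == ObjectId(
    '0000000000000000ffffffff')
# increment_id adds to the underlying 96-bit value and re-packs it.
assert _ObjectIdHelper.increment_id(
    ObjectId('0000000000000000ffffffff'), 1) == ObjectId(
        '000000000000000100000000')
# New in this change: plain int ids pass through id_to_int unchanged.
assert _ObjectIdHelper.id_to_int(7) == 7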
- if pos <= start_number: - return _ObjectIdHelper.increment_id(start, 1) - if pos >= end_number: - return _ObjectIdHelper.increment_id(end, -1) - return _ObjectIdHelper.int_to_id(pos) - - @experimental() class WriteToMongoDB(PTransform): """WriteToMongoDB is a ``PTransform`` that writes a ``PCollection`` of @@ -476,11 +676,12 @@ class WriteToMongoDB(PTransform): """ def __init__( self, - uri='mongodb://localhost:27017', + uri="mongodb://localhost:27017", db=None, coll=None, batch_size=100, - extra_client_params=None): + extra_client_params=None, + ): """ Args: @@ -500,11 +701,10 @@ def __init__( if extra_client_params is None: extra_client_params = {} if not isinstance(db, str): - raise ValueError('WriteToMongoDB db param must be specified as a string') + raise ValueError("WriteToMongoDB db param must be specified as a string") if not isinstance(coll, str): raise ValueError( - 'WriteToMongoDB coll param must be specified as a ' - 'string') + "WriteToMongoDB coll param must be specified as a string") self._uri = uri self._db = db self._coll = coll @@ -512,25 +712,27 @@ def __init__( self._spec = extra_client_params def expand(self, pcoll): - return pcoll \ - | beam.ParDo(_GenerateObjectIdFn()) \ - | Reshuffle() \ - | beam.ParDo(_WriteMongoFn(self._uri, self._db, self._coll, - self._batch_size, self._spec)) + return ( + pcoll + | beam.ParDo(_GenerateObjectIdFn()) + | Reshuffle() + | beam.ParDo( + _WriteMongoFn( + self._uri, self._db, self._coll, self._batch_size, self._spec))) class _GenerateObjectIdFn(DoFn): def process(self, element, *args, **kwargs): # if _id field already exist we keep it as it is, otherwise the ptransform # generates a new _id field to achieve idempotent write to mongodb. - if '_id' not in element: + if "_id" not in element: # object.ObjectId() generates a unique identifier that follows mongodb # default format, if _id is not present in document, mongodb server # generates it with this same function upon write. However the # uniqueness of generated id may not be guaranteed if the work load are # distributed across too many processes. See more on the ObjectId format # https://docs.mongodb.com/manual/reference/bson-types/#objectid. - element['_id'] = objectid.ObjectId() + element["_id"] = objectid.ObjectId() yield element @@ -564,15 +766,13 @@ def _flush(self): def display_data(self): res = super(_WriteMongoFn, self).display_data() - res['uri'] = self.uri - res['database'] = self.db - res['collection'] = self.coll - res['mongo_client_params'] = json.dumps(self.spec) - res['batch_size'] = self.batch_size + res["database"] = self.db + res["collection"] = self.coll + res["batch_size"] = self.batch_size return res -class _MongoSink(object): +class _MongoSink: def __init__(self, uri=None, db=None, coll=None, extra_params=None): if extra_params is None: extra_params = {} @@ -591,17 +791,18 @@ def write(self, documents): # insert new one, otherwise overwrite it. 
requests.append( ReplaceOne( - filter={'_id': doc.get('_id', None)}, + filter={"_id": doc.get("_id", None)}, replacement=doc, upsert=True)) resp = self.client[self.db][self.coll].bulk_write(requests) _LOGGER.debug( - 'BulkWrite to MongoDB result in nModified:%d, nUpserted:%d, ' - 'nMatched:%d, Errors:%s' % ( + "BulkWrite to MongoDB result in nModified:%d, nUpserted:%d, " + "nMatched:%d, Errors:%s" % ( resp.modified_count, resp.upserted_count, resp.matched_count, - resp.bulk_api_result.get('writeErrors'))) + resp.bulk_api_result.get("writeErrors"), + )) def __enter__(self): if self.client is None: diff --git a/sdks/python/apache_beam/io/mongodbio_it_test.py b/sdks/python/apache_beam/io/mongodbio_it_test.py index 9619c7af64b2..dfbc9e65305b 100644 --- a/sdks/python/apache_beam/io/mongodbio_it_test.py +++ b/sdks/python/apache_beam/io/mongodbio_it_test.py @@ -17,8 +17,6 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import logging import time diff --git a/sdks/python/apache_beam/io/mongodbio_test.py b/sdks/python/apache_beam/io/mongodbio_test.py index dbcd7001c9d4..150eac2d5437 100644 --- a/sdks/python/apache_beam/io/mongodbio_test.py +++ b/sdks/python/apache_beam/io/mongodbio_test.py @@ -16,17 +16,15 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import datetime import logging import random -import sys import unittest +from typing import Union from unittest import TestCase import mock +from bson import ObjectId from bson import objectid from parameterized import parameterized_class from pymongo import ASCENDING @@ -42,6 +40,8 @@ from apache_beam.io.mongodbio import _ObjectIdHelper from apache_beam.io.mongodbio import _ObjectIdRangeTracker from apache_beam.io.mongodbio import _WriteMongoFn +from apache_beam.io.range_trackers import LexicographicKeyRangeTracker +from apache_beam.io.range_trackers import OffsetRangeTracker from apache_beam.testing.test_pipeline import TestPipeline from apache_beam.testing.util import assert_that from apache_beam.testing.util import equal_to @@ -124,7 +124,7 @@ def limit(self, num): def count_documents(self, filter): return len(self._filter(filter)) - def aggregate(self, pipeline): + def aggregate(self, pipeline, **kwargs): # Simulate $bucketAuto aggregate pipeline. 
# Example splits doc count for the total of 5 docs: # - 1 bucket: [5] @@ -179,7 +179,7 @@ def __getitem__(self, coll_name): def command(self, command, *args, **kwargs): if command == 'collstats': return {'size': 5 * 1024 * 1024, 'avgObjSize': 1 * 1024 * 1024} - elif command == 'splitVector': + if command == 'splitVector': return self.get_split_keys(command, *args, **kwargs) def get_split_keys(self, command, ns, min, max, maxChunkSize, **kwargs): @@ -209,7 +209,7 @@ def get_split_keys(self, command, ns, min, max, maxChunkSize, **kwargs): } -class _MockMongoClient(object): +class _MockMongoClient: def __init__(self, docs): self.docs = docs @@ -223,15 +223,87 @@ def __exit__(self, exc_type, exc_val, exc_tb): pass -@parameterized_class(('bucket_auto', ), [(None, ), (True, )]) +# Generate test data for MongoDB collections of different types +OBJECT_IDS = [ + objectid.ObjectId.from_datetime( + datetime.datetime(year=2020, month=i + 1, day=i + 1)) for i in range(5) +] + +INT_IDS = [n for n in range(5)] # [0, 1, 2, 3, 4] + +STR_IDS_1 = [str(n) for n in range(5)] # ['0', '1', '2', '3', '4'] + +# ['aaaaa', 'bbbbb', 'ccccc', 'ddddd', 'eeeee'] +STR_IDS_2 = [chr(97 + n) * 5 for n in range(5)] + +# ['AAAAAAAAAAAAAAAAAAAA', 'BBBBBBBBBBBBBBBBBBBB', ..., 'EEEEEEEEEEEEEEEEEEEE'] +STR_IDS_3 = [chr(65 + n) * 20 for n in range(5)] + + +@parameterized_class(('bucket_auto', '_ids', 'min_id', 'max_id'), + [ + ( + None, + OBJECT_IDS, + _ObjectIdHelper.int_to_id(0), + _ObjectIdHelper.int_to_id(2**96 - 1)), + ( + True, + OBJECT_IDS, + _ObjectIdHelper.int_to_id(0), + _ObjectIdHelper.int_to_id(2**96 - 1)), + ( + None, + INT_IDS, + 0, + 2**96 - 1, + ), + ( + True, + INT_IDS, + 0, + 2**96 - 1, + ), + ( + None, + STR_IDS_1, + chr(0), + chr(0x10ffff), + ), + ( + True, + STR_IDS_1, + chr(0), + chr(0x10ffff), + ), + ( + None, + STR_IDS_2, + chr(0), + chr(0x10ffff), + ), + ( + True, + STR_IDS_2, + chr(0), + chr(0x10ffff), + ), + ( + None, + STR_IDS_3, + chr(0), + chr(0x10ffff), + ), + ( + True, + STR_IDS_3, + chr(0), + chr(0x10ffff), + ), + ]) class MongoSourceTest(unittest.TestCase): @mock.patch('apache_beam.io.mongodbio.MongoClient') def setUp(self, mock_client): - self._ids = [ - objectid.ObjectId.from_datetime( - datetime.datetime(year=2020, month=i + 1, day=i + 1)) - for i in range(5) - ] self._docs = [{'_id': self._ids[i], 'x': i} for i in range(len(self._ids))] mock_client.return_value = _MockMongoClient(self._docs) @@ -246,6 +318,27 @@ def _create_source(filter=None, bucket_auto=None): kwargs['bucket_auto'] = bucket_auto return _BoundedMongoSource('mongodb://test', 'testdb', 'testcoll', **kwargs) + def _increment_id( + self, + _id: Union[ObjectId, int, str], + inc: int, + ) -> Union[ObjectId, int, str]: + """Helper method to increment `_id` of different types.""" + + if isinstance(_id, ObjectId): + return _ObjectIdHelper.increment_id(_id, inc) + + if isinstance(_id, int): + return _id + inc + + if isinstance(_id, str): + index = self._ids.index(_id) + inc + if index <= 0: + return self._ids[0] + if index >= len(self._ids): + return self._ids[-1] + return self._ids[index] + @mock.patch('apache_beam.io.mongodbio.MongoClient') def test_estimate_size(self, mock_client): mock_client.return_value = _MockMongoClient(self._docs) @@ -286,10 +379,12 @@ def test_split_single_document(self, mock_client): start_position=None, stop_position=None, desired_bundle_size=size)) self.assertEqual(len(splits), 1) - self.assertEqual(splits[0].start_position, self._docs[0]['_id']) - self.assertEqual( - splits[0].stop_position, - 
_ObjectIdHelper.increment_id(self._docs[0]['_id'], 1)) + _id = self._docs[0]['_id'] + assert _id == splits[0].start_position + assert _id <= splits[0].stop_position + if isinstance(_id, (ObjectId, int)): + # We can unambiguously determine next `_id` + assert self._increment_id(_id, 1) == splits[0].stop_position @mock.patch('apache_beam.io.mongodbio.MongoClient') def test_split_no_documents(self, mock_client): @@ -361,8 +456,8 @@ def test_split_filtered_empty(self, mock_client): reference_info = ( filtered_mongo_source, # range to match no documents: - _ObjectIdHelper.increment_id(self._docs[-1]['_id'], 1), - _ObjectIdHelper.increment_id(self._docs[-1]['_id'], 2), + self._increment_id(self._docs[-1]['_id'], 1), + self._increment_id(self._docs[-1]['_id'], 2), ) sources_info = ([ (split.source, split.start_position, split.stop_position) @@ -383,8 +478,21 @@ def test_dynamic_work_rebalancing(self, mock_client): @mock.patch('apache_beam.io.mongodbio.MongoClient') def test_get_range_tracker(self, mock_client): mock_client.return_value = _MockMongoClient(self._docs) - self.assertIsInstance( - self.mongo_source.get_range_tracker(None, None), _ObjectIdRangeTracker) + if self._ids == OBJECT_IDS: + self.assertIsInstance( + self.mongo_source.get_range_tracker(None, None), + _ObjectIdRangeTracker, + ) + elif self._ids == INT_IDS: + self.assertIsInstance( + self.mongo_source.get_range_tracker(None, None), + OffsetRangeTracker, + ) + elif self._ids == STR_IDS_1: + self.assertIsInstance( + self.mongo_source.get_range_tracker(None, None), + LexicographicKeyRangeTracker, + ) @mock.patch('apache_beam.io.mongodbio.MongoClient') def test_read(self, mock_client): @@ -398,26 +506,26 @@ def test_read(self, mock_client): }, { # range covers from the first to the third documents - 'start': _ObjectIdHelper.int_to_id(0), # smallest possible id + 'start': self.min_id, # smallest possible id 'stop': self._ids[2], 'expected': self._docs[0:2] }, { # range covers from the third to last documents 'start': self._ids[2], - 'stop': _ObjectIdHelper.int_to_id(2**96 - 1), # largest possible id + 'stop': self.max_id, # largest possible id 'expected': self._docs[2:] }, { # range covers all documents - 'start': _ObjectIdHelper.int_to_id(0), - 'stop': _ObjectIdHelper.int_to_id(2**96 - 1), + 'start': self.min_id, + 'stop': self.max_id, 'expected': self._docs }, { # range doesn't include any document - 'start': _ObjectIdHelper.increment_id(self._ids[2], 1), - 'stop': _ObjectIdHelper.increment_id(self._ids[3], -1), + 'start': self._increment_id(self._ids[2], 1), + 'stop': self._increment_id(self._ids[3], -1), 'expected': [] }, ] @@ -430,10 +538,35 @@ def test_read(self, mock_client): def test_display_data(self): data = self.mongo_source.display_data() - self.assertTrue('uri' in data) self.assertTrue('database' in data) self.assertTrue('collection' in data) + def test_range_is_not_splittable(self): + self.assertTrue( + self.mongo_source._range_is_not_splittable( + _ObjectIdHelper.int_to_id(1), + _ObjectIdHelper.int_to_id(1), + )) + self.assertTrue( + self.mongo_source._range_is_not_splittable( + _ObjectIdHelper.int_to_id(1), + _ObjectIdHelper.int_to_id(2), + )) + self.assertFalse( + self.mongo_source._range_is_not_splittable( + _ObjectIdHelper.int_to_id(1), + _ObjectIdHelper.int_to_id(3), + )) + + self.assertTrue(self.mongo_source._range_is_not_splittable(0, 0)) + self.assertTrue(self.mongo_source._range_is_not_splittable(0, 1)) + self.assertFalse(self.mongo_source._range_is_not_splittable(0, 2)) + + 
self.assertTrue(self.mongo_source._range_is_not_splittable("AAA", "AAA")) + self.assertFalse( + self.mongo_source._range_is_not_splittable("AAA", "AAA\x00")) + self.assertFalse(self.mongo_source._range_is_not_splittable("AAA", "AAB")) + @parameterized_class(('bucket_auto', ), [(False, ), (True, )]) class ReadFromMongoDBTest(unittest.TestCase): @@ -505,9 +638,13 @@ def test_write(self, mock_client): class WriteToMongoDBTest(unittest.TestCase): @mock.patch('apache_beam.io.mongodbio.MongoClient') def test_write_to_mongodb_with_existing_id(self, mock_client): - id = objectid.ObjectId() - docs = [{'x': 1, '_id': id}] - expected_update = [ReplaceOne({'_id': id}, {'x': 1, '_id': id}, True, None)] + _id = objectid.ObjectId() + docs = [{'x': 1, '_id': _id}] + expected_update = [ + ReplaceOne({'_id': _id}, { + 'x': 1, '_id': _id + }, True, None) + ] with TestPipeline() as p: _ = ( p | "Create" >> beam.Create(docs) @@ -543,37 +680,36 @@ def test_conversion(self): (objectid.ObjectId('00000000ffffffffffffffff'), 2**64 - 1), (objectid.ObjectId('ffffffffffffffffffffffff'), 2**96 - 1), ] - for (id, number) in test_cases: - self.assertEqual(id, _ObjectIdHelper.int_to_id(number)) - self.assertEqual(number, _ObjectIdHelper.id_to_int(id)) + for (_id, number) in test_cases: + self.assertEqual(_id, _ObjectIdHelper.int_to_id(number)) + self.assertEqual(number, _ObjectIdHelper.id_to_int(_id)) # random tests for _ in range(100): - id = objectid.ObjectId() - if sys.version_info[0] < 3: - number = int(id.binary.encode('hex'), 16) - else: # PY3 - number = int(id.binary.hex(), 16) - self.assertEqual(id, _ObjectIdHelper.int_to_id(number)) - self.assertEqual(number, _ObjectIdHelper.id_to_int(id)) + _id = objectid.ObjectId() + number = int(_id.binary.hex(), 16) + self.assertEqual(_id, _ObjectIdHelper.int_to_id(number)) + self.assertEqual(number, _ObjectIdHelper.id_to_int(_id)) def test_increment_id(self): test_cases = [ ( - objectid.ObjectId('000000000000000100000000'), - objectid.ObjectId('0000000000000000ffffffff')), + objectid.ObjectId("000000000000000100000000"), + objectid.ObjectId("0000000000000000ffffffff"), + ), ( - objectid.ObjectId('000000010000000000000000'), - objectid.ObjectId('00000000ffffffffffffffff')), + objectid.ObjectId("000000010000000000000000"), + objectid.ObjectId("00000000ffffffffffffffff"), + ), ] - for (first, second) in test_cases: + for first, second in test_cases: self.assertEqual(second, _ObjectIdHelper.increment_id(first, -1)) self.assertEqual(first, _ObjectIdHelper.increment_id(second, 1)) for _ in range(100): - id = objectid.ObjectId() - self.assertLess(id, _ObjectIdHelper.increment_id(id, 1)) - self.assertGreater(id, _ObjectIdHelper.increment_id(id, -1)) + _id = objectid.ObjectId() + self.assertLess(_id, _ObjectIdHelper.increment_id(_id, 1)) + self.assertGreater(_id, _ObjectIdHelper.increment_id(_id, -1)) class ObjectRangeTrackerTest(TestCase): @@ -586,10 +722,10 @@ def test_fraction_position_conversion(self): [random.randint(start_int, stop_int) for _ in range(100)]) tracker = _ObjectIdRangeTracker() for pos in test_cases: - id = _ObjectIdHelper.int_to_id(pos - start_int) + _id = _ObjectIdHelper.int_to_id(pos - start_int) desired_fraction = (pos - start_int) / (stop_int - start_int) self.assertAlmostEqual( - tracker.position_to_fraction(id, start, stop), + tracker.position_to_fraction(_id, start, stop), desired_fraction, places=20) diff --git a/sdks/python/apache_beam/io/parquetio.py b/sdks/python/apache_beam/io/parquetio.py index 50ee9388a70f..188aab7ce83d 100644 --- 
a/sdks/python/apache_beam/io/parquetio.py +++ b/sdks/python/apache_beam/io/parquetio.py @@ -30,8 +30,6 @@ """ # pytype: skip-file -from __future__ import absolute_import - from functools import partial from apache_beam.io import filebasedsink @@ -66,14 +64,20 @@ class _ArrowTableToRowDictionaries(DoFn): """ A DoFn that consumes an Arrow table and yields a python dictionary for each row in the table.""" - def process(self, table): + def process(self, table, with_filename=False): + if with_filename: + file_name = table[0] + table = table[1] num_rows = table.num_rows data_items = table.to_pydict().items() for n in range(num_rows): row = {} for column, values in data_items: row[column] = values[n] - yield row + if with_filename: + yield (file_name, row) + else: + yield row class ReadFromParquetBatched(PTransform): @@ -217,6 +221,7 @@ def __init__( min_bundle_size=0, desired_bundle_size=DEFAULT_DESIRED_BUNDLE_SIZE, columns=None, + with_filename=False, label='ReadAllFiles'): """Initializes ``ReadAllFromParquet``. @@ -228,6 +233,9 @@ def __init__( columns: list of columns that will be read from files. A column name may be a prefix of a nested field, e.g. 'a' will select 'a.b', 'a.c', and 'a.d.e' + with_filename: If True, returns a Key Value with the key being the file + name and the value being the actual data. If False, it only returns + the data. """ super(ReadAllFromParquetBatched, self).__init__() source_from_file = partial( @@ -239,7 +247,8 @@ def __init__( CompressionTypes.UNCOMPRESSED, desired_bundle_size, min_bundle_size, - source_from_file) + source_from_file, + with_filename) self.label = label @@ -248,11 +257,14 @@ def expand(self, pvalue): class ReadAllFromParquet(PTransform): - def __init__(self, **kwargs): - self._read_batches = ReadAllFromParquetBatched(**kwargs) + def __init__(self, with_filename=False, **kwargs): + self._with_filename = with_filename + self._read_batches = ReadAllFromParquetBatched( + with_filename=self._with_filename, **kwargs) def expand(self, pvalue): - return pvalue | self._read_batches | ParDo(_ArrowTableToRowDictionaries()) + return pvalue | self._read_batches | ParDo( + _ArrowTableToRowDictionaries(), with_filename=self._with_filename) def _create_parquet_source( @@ -560,7 +572,7 @@ def display_data(self): return res def _write_batches(self, writer): - table = pa.Table.from_batches(self._record_batches) + table = pa.Table.from_batches(self._record_batches, schema=self._schema) self._record_batches = [] self._record_batches_byte_size = 0 writer.write_table(table) @@ -570,7 +582,7 @@ def _flush_buffer(self): for x, y in enumerate(self._buffer): arrays[x] = pa.array(y, type=self._schema.types[x]) self._buffer[x] = [] - rb = pa.RecordBatch.from_arrays(arrays, self._schema.names) + rb = pa.RecordBatch.from_arrays(arrays, schema=self._schema) self._record_batches.append(rb) size = 0 for x in arrays: diff --git a/sdks/python/apache_beam/io/parquetio_it_test.py b/sdks/python/apache_beam/io/parquetio_it_test.py index cb63f8227e7b..0d3cd0d65313 100644 --- a/sdks/python/apache_beam/io/parquetio_it_test.py +++ b/sdks/python/apache_beam/io/parquetio_it_test.py @@ -16,16 +16,12 @@ # # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import logging import string -import sys import unittest from collections import Counter -from nose.plugins.attrib import attr +import pytest from apache_beam import Create from apache_beam import DoFn @@ -50,19 +46,13 @@ @unittest.skipIf(pa is None, "PyArrow is not installed.") class 
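An illustrative pipeline (not from the patch itself) using the new with_filename=True option of ReadAllFromParquet described above; the file pattern is a placeholder and pyarrow must be installed:

import apache_beam as beam
from apache_beam.io.parquetio import ReadAllFromParquet

with beam.Pipeline() as p:
    _ = (
        p
        | beam.Create(['/tmp/data/*.parquet'])   # placeholder pattern
        # Each output element is a (file_name, row_dict) pair.
        | ReadAllFromParquet(with_filename=True)
        | beam.MapTuple(lambda file_name, row: print(file_name, row)))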
TestParquetIT(unittest.TestCase): - @classmethod - def setUpClass(cls): - # Method has been renamed in Python 3 - if sys.version_info[0] < 3: - cls.assertCountEqual = cls.assertItemsEqual - def setUp(self): pass def tearDown(self): pass - @attr('IT') + @pytest.mark.it_postcommit def test_parquetio_it(self): file_prefix = "parquet_it_test" init_size = 10 diff --git a/sdks/python/apache_beam/io/parquetio_test.py b/sdks/python/apache_beam/io/parquetio_test.py index 23d4b415c391..0232bac45e06 100644 --- a/sdks/python/apache_beam/io/parquetio_test.py +++ b/sdks/python/apache_beam/io/parquetio_test.py @@ -16,18 +16,17 @@ # # pytype: skip-file -from __future__ import absolute_import - import json import logging import os import shutil -import sys import tempfile import unittest +from tempfile import TemporaryDirectory import hamcrest as hc import pandas +import pytest from parameterized import param from parameterized import parameterized @@ -48,8 +47,6 @@ from apache_beam.testing.util import equal_to from apache_beam.transforms.display import DisplayData from apache_beam.transforms.display_test import DisplayDataItemMatcher -# TODO(BEAM-8371): Use tempfile.TemporaryDirectory. -from apache_beam.utils.subprocess_server_test import TemporaryDirectory try: import pyarrow as pa @@ -62,13 +59,8 @@ @unittest.skipIf(pa is None, "PyArrow is not installed.") +@pytest.mark.uses_pyarrow class TestParquet(unittest.TestCase): - @classmethod - def setUpClass(cls): - # Method has been renamed in Python 3 - if sys.version_info[0] < 3: - cls.assertCountEqual = cls.assertItemsEqual - def setUp(self): # Reducing the size of thread pools. Without this test execution may fail in # environments with limited amount of resources. @@ -102,14 +94,19 @@ def setUp(self): 'name': 'Percy', 'favorite_number': 6, 'favorite_color': 'Green' + }, + { + 'name': 'Peter', + 'favorite_number': 3, + 'favorite_color': None }] - self.SCHEMA = pa.schema([('name', pa.string()), - ('favorite_number', pa.int64()), + self.SCHEMA = pa.schema([('name', pa.string(), False), + ('favorite_number', pa.int64(), False), ('favorite_color', pa.string())]) - self.SCHEMA96 = pa.schema([('name', pa.string()), - ('favorite_number', pa.timestamp('ns')), + self.SCHEMA96 = pa.schema([('name', pa.string(), False), + ('favorite_number', pa.timestamp('ns'), False), ('favorite_color', pa.string())]) def tearDown(self): @@ -121,6 +118,7 @@ def _record_to_columns(self, records, schema): column = [] for r in records: column.append(r[n]) + col_list.append(column) return col_list @@ -137,7 +135,7 @@ def _records_as_arrow(self, schema=None, count=None): data.append(self.RECORDS[i % len_records]) col_data = self._record_to_columns(data, schema) col_array = [pa.array(c, schema.types[cn]) for cn, c in enumerate(col_data)] - return pa.Table.from_arrays(col_array, schema.names) + return pa.Table.from_arrays(col_array, schema=schema) def _write_data( self, @@ -162,13 +160,16 @@ def _write_data( return f.name - def _write_pattern(self, num_files): + def _write_pattern(self, num_files, with_filename=False): assert num_files > 0 temp_dir = tempfile.mkdtemp(dir=self.temp_dir) + file_list = [] for _ in range(num_files): - self._write_data(directory=temp_dir, prefix='mytemp') + file_list.append(self._write_data(directory=temp_dir, prefix='mytemp')) + if with_filename: + return (temp_dir + os.path.sep + 'mytemp*', file_list) return temp_dir + os.path.sep + 'mytemp*' def _run_parquet_test( @@ -414,7 +415,7 @@ def test_int96_type_conversion(self): count=120, row_group_size=20, 
schema=self.SCHEMA96) orig = self._records_as_arrow(count=120, schema=self.SCHEMA96) expected_result = [ - pa.Table.from_batches([batch]) + pa.Table.from_batches([batch], schema=self.SCHEMA96) for batch in orig.to_batches(chunksize=20) ] self._run_parquet_test(file_name, None, None, False, expected_result) @@ -450,8 +451,12 @@ def test_split_points(self): def test_selective_columns(self): file_name = self._write_data() orig = self._records_as_arrow() + name_column = self.SCHEMA.field('name') expected_result = [ - pa.Table.from_arrays([orig.column('name')], names=['name']) + pa.Table.from_arrays( + [orig.column('name')], + schema=pa.schema([('name', name_column.type, name_column.nullable) + ])) ] self._run_parquet_test(file_name, ['name'], None, False, expected_result) @@ -532,6 +537,16 @@ def test_read_all_from_parquet_many_file_patterns(self): | ReadAllFromParquetBatched(), equal_to([self._records_as_arrow()] * 10)) + def test_read_all_from_parquet_with_filename(self): + file_pattern, file_paths = self._write_pattern(3, with_filename=True) + result = [(path, record) for path in file_paths for record in self.RECORDS] + with TestPipeline() as p: + assert_that( + p \ + | Create([file_pattern]) \ + | ReadAllFromParquet(with_filename=True), + equal_to(result)) + if __name__ == '__main__': logging.getLogger().setLevel(logging.INFO) diff --git a/sdks/python/apache_beam/io/range_trackers.py b/sdks/python/apache_beam/io/range_trackers.py index 433ebbdce227..33b15d5ecfa2 100644 --- a/sdks/python/apache_beam/io/range_trackers.py +++ b/sdks/python/apache_beam/io/range_trackers.py @@ -19,16 +19,11 @@ """ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import codecs import logging import math import threading -from builtins import zip - -from past.builtins import long +from typing import Union from apache_beam.io import iobase @@ -59,9 +54,9 @@ def __init__(self, start, end): raise ValueError('Start offset must not be \'None\'') if end is None: raise ValueError('End offset must not be \'None\'') - assert isinstance(start, (int, long)) + assert isinstance(start, int) if end != self.OFFSET_INFINITY: - assert isinstance(end, (int, long)) + assert isinstance(end, int) assert start <= end @@ -142,7 +137,7 @@ def set_current_position(self, record_start): self._last_record_start = record_start def try_split(self, split_offset): - assert isinstance(split_offset, (int, long)) + assert isinstance(split_offset, int) with self._lock: if self._stop_offset == OffsetRangeTracker.OFFSET_INFINITY: _LOGGER.debug( @@ -271,16 +266,17 @@ def try_split(self, position): if ((self._stop_position is not None and position >= self._stop_position) or (self._start_position is not None and position <= self._start_position)): - raise ValueError( - "Split at '%s' not in range %s" % - (position, [self._start_position, self._stop_position])) + _LOGGER.debug( + 'Refusing to split %r at %d: proposed split position out of range', + self, + position) + return + if self._last_claim is self.UNSTARTED or self._last_claim < position: fraction = self.position_to_fraction( position, start=self._start_position, end=self._stop_position) self._stop_position = position return position, fraction - else: - return None def fraction_consumed(self): if self._last_claim is self.UNSTARTED: @@ -295,6 +291,12 @@ def fraction_to_position(self, fraction, start, end): """ raise NotImplementedError + def position_to_fraction(self, position, start, end): + """Returns the fraction of keys in the range [start, end) 
that + are less than the given key. + """ + raise NotImplementedError + class UnsplittableRangeTracker(iobase.RangeTracker): """A RangeTracker that always ignores split requests. @@ -345,87 +347,112 @@ def set_split_points_unclaimed_callback(self, callback): class LexicographicKeyRangeTracker(OrderedPositionRangeTracker): - """ - A range tracker that tracks progress through a lexicographically + """A range tracker that tracks progress through a lexicographically ordered keyspace of strings. """ @classmethod - def fraction_to_position(cls, fraction, start=None, end=None): - """ - Linearly interpolates a key that is lexicographically + def fraction_to_position( + cls, + fraction: float, + start: Union[bytes, str] = None, + end: Union[bytes, str] = None, + ) -> Union[bytes, str]: + """Linearly interpolates a key that is lexicographically fraction of the way between start and end. """ assert 0 <= fraction <= 1, fraction + if start is None: start = b'' + + if fraction == 0: + return start + if fraction == 1: return end - elif fraction == 0: - return start + + if not end: + common_prefix_len = len(start) - len(start.lstrip(b'\xFF')) else: - if not end: - common_prefix_len = len(start) - len(start.lstrip(b'\xFF')) + for ix, (s, e) in enumerate(zip(start, end)): + if s != e: + common_prefix_len = ix + break else: - for ix, (s, e) in enumerate(zip(start, end)): - if s != e: - common_prefix_len = ix - break - else: - common_prefix_len = min(len(start), len(end)) - # Convert the relative precision of fraction (~53 bits) to an absolute - # precision needed to represent values between start and end distinctly. - prec = common_prefix_len + int(-math.log(fraction, 256)) + 7 - istart = cls._bytestring_to_int(start, prec) - iend = cls._bytestring_to_int(end, prec) if end else 1 << (prec * 8) - ikey = istart + int((iend - istart) * fraction) - # Could be equal due to rounding. - # Adjust to ensure we never return the actual start and end - # unless fraction is exatly 0 or 1. - if ikey == istart: - ikey += 1 - elif ikey == iend: - ikey -= 1 - return cls._bytestring_from_int(ikey, prec).rstrip(b'\0') + common_prefix_len = min(len(start), len(end)) + + # Convert the relative precision of fraction (~53 bits) to an absolute + # precision needed to represent values between start and end distinctly. + prec = common_prefix_len + int(-math.log(fraction, 256)) + 7 + istart = cls._bytestring_to_int(start, prec) + iend = cls._bytestring_to_int(end, prec) if end else 1 << (prec * 8) + ikey = istart + int((iend - istart) * fraction) + + # Could be equal due to rounding. + # Adjust to ensure we never return the actual start and end + # unless fraction is exactly 0 or 1. + if ikey == istart: + ikey += 1 + elif ikey == iend: + ikey -= 1 + + position: bytes = cls._bytestring_from_int(ikey, prec).rstrip(b'\0') + + if isinstance(start, bytes): + return position + + return position.decode(encoding='unicode_escape', errors='replace') @classmethod - def position_to_fraction(cls, key, start=None, end=None): - """ - Returns the fraction of keys in the range [start, end) that + def position_to_fraction( + cls, + key: Union[bytes, str] = None, + start: Union[bytes, str] = None, + end: Union[bytes, str] = None, + ) -> float: + """Returns the fraction of keys in the range [start, end) that + are less than the given key. 
""" if not key: return 0 + if start is None: - start = b'' + start = '' if isinstance(key, str) else b'' + prec = len(start) + 7 if key.startswith(start): # Higher absolute precision needed for very small values of fixed # relative position. - prec = max(prec, len(key) - len(key[len(start):].strip(b'\0')) + 7) + trailing_symbol = '\0' if isinstance(key, str) else b'\0' + prec = max( + prec, len(key) - len(key[len(start):].strip(trailing_symbol)) + 7) istart = cls._bytestring_to_int(start, prec) ikey = cls._bytestring_to_int(key, prec) iend = cls._bytestring_to_int(end, prec) if end else 1 << (prec * 8) return float(ikey - istart) / (iend - istart) @staticmethod - def _bytestring_to_int(s, prec): - """ - Returns int(256**prec * f) where f is the fraction + def _bytestring_to_int(s: Union[bytes, str], prec: int) -> int: + """Returns int(256**prec * f) where f is the fraction represented by interpreting '.' + s as a base-256 floating point number. """ if not s: return 0 - elif len(s) < prec: + + if isinstance(s, str): + s = s.encode() # str -> bytes + + if len(s) < prec: s += b'\0' * (prec - len(s)) else: s = s[:prec] - return int(codecs.encode(s, 'hex'), 16) + + h = codecs.encode(s, encoding='hex') + return int(h, base=16) @staticmethod - def _bytestring_from_int(i, prec): - """ - Inverse of _bytestring_to_int. - """ + def _bytestring_from_int(i: int, prec: int) -> bytes: + """Inverse of _bytestring_to_int.""" h = '%x' % i - return codecs.decode('0' * (2 * prec - len(h)) + h, 'hex') + return codecs.decode('0' * (2 * prec - len(h)) + h, encoding='hex') diff --git a/sdks/python/apache_beam/io/range_trackers_test.py b/sdks/python/apache_beam/io/range_trackers_test.py index d223f99da27f..0bf37997f2ce 100644 --- a/sdks/python/apache_beam/io/range_trackers_test.py +++ b/sdks/python/apache_beam/io/range_trackers_test.py @@ -18,15 +18,12 @@ """Unit tests for the range_trackers module.""" # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import copy import logging import math import unittest - -from past.builtins import long +from typing import Optional +from typing import Union from apache_beam.io import range_trackers @@ -129,7 +126,7 @@ def test_get_position_for_fraction_dense(self): tracker = range_trackers.OffsetRangeTracker(3, 6) # Position must be an integer type. - self.assertTrue(isinstance(tracker.position_at_fraction(0.0), (int, long))) + self.assertTrue(isinstance(tracker.position_at_fraction(0.0), int)) # [3, 3) represents 0.0 of [3, 6) self.assertEqual(3, tracker.position_at_fraction(0.0)) # [3, 4) represents up to 1/3 of [3, 6) @@ -262,25 +259,26 @@ def test_claim_order(self): def test_out_of_range(self): tracker = self.DoubleRangeTracker(10, 20) + # Can't claim before range. with self.assertRaises(ValueError): tracker.try_claim(-5) + # Can't split before range. - with self.assertRaises(ValueError): - tracker.try_split(-5) + self.assertFalse(tracker.try_split(-5)) + # Reject useless split at start position. - with self.assertRaises(ValueError): - tracker.try_split(10) + self.assertFalse(tracker.try_split(10)) + # Can't split after range. - with self.assertRaises(ValueError): - tracker.try_split(25) + self.assertFalse(tracker.try_split(25)) tracker.try_split(15) + # Can't split after modified range. - with self.assertRaises(ValueError): - tracker.try_split(17) + self.assertFalse(tracker.try_split(17)) + # Reject useless split at end position. 
- with self.assertRaises(ValueError): - tracker.try_split(15) + self.assertFalse(tracker.try_split(15)) self.assertTrue(tracker.try_split(14)) @@ -308,16 +306,20 @@ def test_try_split_fails(self): class LexicographicKeyRangeTrackerTest(unittest.TestCase): - """ - Tests of LexicographicKeyRangeTracker. - """ + """Tests of LexicographicKeyRangeTracker.""" key_to_fraction = ( range_trackers.LexicographicKeyRangeTracker.position_to_fraction) fraction_to_key = ( range_trackers.LexicographicKeyRangeTracker.fraction_to_position) - def _check(self, fraction=None, key=None, start=None, end=None, delta=0): + def _check( + self, + fraction: Optional[float] = None, + key: Union[bytes, str] = None, + start: Union[bytes, str] = None, + end: Union[bytes, str] = None, + delta: float = 0.0): assert key is not None or fraction is not None if fraction is None: fraction = self.key_to_fraction(key, start, end) @@ -346,14 +348,42 @@ def test_key_to_fraction_no_endpoints(self): self._check(key=b'\x07', fraction=7 / 256.) self._check(key=b'\xFF', fraction=255 / 256.) self._check(key=b'\x01\x02\x03', fraction=(2**16 + 2**9 + 3) / (2.0**24)) + self._check(key=b'UUUUUUT', fraction=1 / 3) + self._check(key=b'3333334', fraction=1 / 5) + self._check(key=b'$\x92I$\x92I$', fraction=1 / 7, delta=1e-3) + self._check(key=b'\x01\x02\x03', fraction=(2**16 + 2**9 + 3) / (2.0**24)) def test_key_to_fraction(self): + # test no key, no start + self._check(end=b'eeeeee', fraction=0.0) + self._check(end='eeeeee', fraction=0.0) + + # test no fraction + self._check(key=b'bbbbbb', start=b'aaaaaa', end=b'eeeeee') + self._check(key='bbbbbb', start='aaaaaa', end='eeeeee') + + # test no start + self._check(key=b'eeeeee', end=b'eeeeee', fraction=1.0) + self._check(key='eeeeee', end='eeeeee', fraction=1.0) + self._check(key=b'\x19YYYYY@', end=b'eeeeee', fraction=0.25) + self._check(key=b'2\xb2\xb2\xb2\xb2\xb2\x80', end='eeeeee', fraction=0.5) + self._check(key=b'L\x0c\x0c\x0c\x0c\x0b\xc0', end=b'eeeeee', fraction=0.75) + + # test bytes keys self._check(key=b'\x87', start=b'\x80', fraction=7 / 128.) self._check(key=b'\x07', end=b'\x10', fraction=7 / 16.) self._check(key=b'\x47', start=b'\x40', end=b'\x80', fraction=7 / 64.) self._check(key=b'\x47\x80', start=b'\x40', end=b'\x80', fraction=15 / 128.) 
+ # test string keys + self._check(key='aaaaaa', start='aaaaaa', end='eeeeee', fraction=0.0) + self._check(key='bbbbbb', start='aaaaaa', end='eeeeee', fraction=0.25) + self._check(key='cccccc', start='aaaaaa', end='eeeeee', fraction=0.5) + self._check(key='dddddd', start='aaaaaa', end='eeeeee', fraction=0.75) + self._check(key='eeeeee', start='aaaaaa', end='eeeeee', fraction=1.0) + def test_key_to_fraction_common_prefix(self): + # test bytes keys self._check( key=b'a' * 100 + b'b', start=b'a' * 100 + b'a', @@ -375,7 +405,35 @@ def test_key_to_fraction_common_prefix(self): end=b'foob\x00\x00\x00\x00\x00\x00\x00\x00\x02', fraction=0.5) + # test string keys + self._check( + key='a' * 100 + 'a', + start='a' * 100 + 'a', + end='a' * 100 + 'e', + fraction=0.0) + self._check( + key='a' * 100 + 'b', + start='a' * 100 + 'a', + end='a' * 100 + 'e', + fraction=0.25) + self._check( + key='a' * 100 + 'c', + start='a' * 100 + 'a', + end='a' * 100 + 'e', + fraction=0.5) + self._check( + key='a' * 100 + 'd', + start='a' * 100 + 'a', + end='a' * 100 + 'e', + fraction=0.75) + self._check( + key='a' * 100 + 'e', + start='a' * 100 + 'a', + end='a' * 100 + 'e', + fraction=1.0) + def test_tiny(self): + # test bytes keys self._check(fraction=.5**20, key=b'\0\0\x10') self._check(fraction=.5**20, start=b'a', end=b'b', key=b'a\0\0\x10') self._check(fraction=.5**20, start=b'a', end=b'c', key=b'a\0\0\x20') @@ -391,6 +449,11 @@ def test_tiny(self): delta=1e-15) self._check(fraction=.5**100, key=b'\0' * 12 + b'\x10') + # test string keys + self._check(fraction=.5**20, start='a', end='b', key='a\0\0\x10') + self._check(fraction=.5**20, start='a', end='c', key='a\0\0\x20') + self._check(fraction=.5**20, start='xy_a', end='xy_c', key='xy_a\0\0\x20') + def test_lots(self): for fraction in (0, 1, .5, .75, 7. / 512, 1 - 7. / 4096): self._check(fraction) @@ -421,6 +484,12 @@ def test_lots(self): def test_good_prec(self): # There should be about 7 characters (~53 bits) of precision # (beyond the common prefix of start and end). + self._check( + 1 / math.e, + start='AAAAAAA', + end='zzzzzzz', + key='VNg/ot\x82', + delta=1e-14) self._check( 1 / math.e, start=b'abc_abc', diff --git a/sdks/python/apache_beam/io/restriction_trackers.py b/sdks/python/apache_beam/io/restriction_trackers.py index 00cefee30315..f2b3e1d03d2f 100644 --- a/sdks/python/apache_beam/io/restriction_trackers.py +++ b/sdks/python/apache_beam/io/restriction_trackers.py @@ -18,10 +18,6 @@ """`iobase.RestrictionTracker` implementations provided with Apache Beam.""" # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - -from builtins import object from typing import Tuple from apache_beam.io.iobase import RestrictionProgress @@ -44,10 +40,6 @@ def __eq__(self, other): return self.start == other.start and self.stop == other.stop - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. 
- return not self == other - def __hash__(self): return hash((type(self), self.start, self.stop)) @@ -88,7 +80,7 @@ class OffsetRestrictionTracker(RestrictionTracker): """ def __init__(self, offset_range): # type: (OffsetRange) -> None - assert isinstance(offset_range, OffsetRange) + assert isinstance(offset_range, OffsetRange), offset_range self._range = offset_range self._current_position = None self._last_claim_attempt = None diff --git a/sdks/python/apache_beam/io/restriction_trackers_test.py b/sdks/python/apache_beam/io/restriction_trackers_test.py index 3af0e59b38a1..0d3eee18036e 100644 --- a/sdks/python/apache_beam/io/restriction_trackers_test.py +++ b/sdks/python/apache_beam/io/restriction_trackers_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest diff --git a/sdks/python/apache_beam/io/snowflake.py b/sdks/python/apache_beam/io/snowflake.py index de6b23e7f1d8..9331aeac02bd 100644 --- a/sdks/python/apache_beam/io/snowflake.py +++ b/sdks/python/apache_beam/io/snowflake.py @@ -74,14 +74,10 @@ # pytype: skip-file -from __future__ import absolute_import - from typing import List from typing import NamedTuple from typing import Optional -from past.builtins import unicode - import apache_beam as beam from apache_beam.transforms.external import BeamJarExpansionService from apache_beam.transforms.external import ExternalTransform @@ -103,21 +99,21 @@ def default_io_expansion_service(): ReadFromSnowflakeSchema = NamedTuple( 'ReadFromSnowflakeSchema', [ - ('server_name', unicode), - ('schema', unicode), - ('database', unicode), - ('staging_bucket_name', unicode), - ('storage_integration_name', unicode), - ('username', Optional[unicode]), - ('password', Optional[unicode]), - ('private_key_path', Optional[unicode]), - ('raw_private_key', Optional[unicode]), - ('private_key_passphrase', Optional[unicode]), - ('o_auth_token', Optional[unicode]), - ('table', Optional[unicode]), - ('query', Optional[unicode]), - ('role', Optional[unicode]), - ('warehouse', Optional[unicode]), + ('server_name', str), + ('schema', str), + ('database', str), + ('staging_bucket_name', str), + ('storage_integration_name', str), + ('username', Optional[str]), + ('password', Optional[str]), + ('private_key_path', Optional[str]), + ('raw_private_key', Optional[str]), + ('private_key_passphrase', Optional[str]), + ('o_auth_token', Optional[str]), + ('table', Optional[str]), + ('query', Optional[str]), + ('role', Optional[str]), + ('warehouse', Optional[str]), ]) @@ -240,24 +236,24 @@ def expand(self, pbegin): WriteToSnowflakeSchema = NamedTuple( 'WriteToSnowflakeSchema', [ - ('server_name', unicode), - ('schema', unicode), - ('database', unicode), - ('staging_bucket_name', unicode), - ('storage_integration_name', unicode), - ('create_disposition', unicode), - ('write_disposition', unicode), - ('table_schema', unicode), - ('username', Optional[unicode]), - ('password', Optional[unicode]), - ('private_key_path', Optional[unicode]), - ('raw_private_key', Optional[unicode]), - ('private_key_passphrase', Optional[unicode]), - ('o_auth_token', Optional[unicode]), - ('table', Optional[unicode]), - ('query', Optional[unicode]), - ('role', Optional[unicode]), - ('warehouse', Optional[unicode]), + ('server_name', str), + ('schema', str), + ('database', str), + ('staging_bucket_name', str), + ('storage_integration_name', str), + ('create_disposition', str), + ('write_disposition', str), + ('table_schema', str), + ('username', Optional[str]), + ('password', 
Optional[str]), + ('private_key_path', Optional[str]), + ('raw_private_key', Optional[str]), + ('private_key_passphrase', Optional[str]), + ('o_auth_token', Optional[str]), + ('table', Optional[str]), + ('query', Optional[str]), + ('role', Optional[str]), + ('warehouse', Optional[str]), ], ) diff --git a/sdks/python/apache_beam/io/source_test_utils.py b/sdks/python/apache_beam/io/source_test_utils.py index deb7ef1e1b33..b40f70604c42 100644 --- a/sdks/python/apache_beam/io/source_test_utils.py +++ b/sdks/python/apache_beam/io/source_test_utils.py @@ -45,15 +45,9 @@ """ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import logging import threading import weakref -from builtins import next -from builtins import object -from builtins import range from collections import namedtuple from multiprocessing.pool import ThreadPool diff --git a/sdks/python/apache_beam/io/source_test_utils_test.py b/sdks/python/apache_beam/io/source_test_utils_test.py index d94c7b9f9db3..6d3f2e3a4a85 100644 --- a/sdks/python/apache_beam/io/source_test_utils_test.py +++ b/sdks/python/apache_beam/io/source_test_utils_test.py @@ -17,25 +17,15 @@ # pytype: skip-file -from __future__ import absolute_import - import logging -import sys import tempfile import unittest -from builtins import range import apache_beam.io.source_test_utils as source_test_utils from apache_beam.io.filebasedsource_test import LineSource class SourceTestUtilsTest(unittest.TestCase): - @classmethod - def setUpClass(cls): - # Method has been renamed in Python 3 - if sys.version_info[0] < 3: - cls.assertCountEqual = cls.assertItemsEqual - def _create_file_with_data(self, lines): assert isinstance(lines, list) with tempfile.NamedTemporaryFile(delete=False) as f: diff --git a/sdks/python/apache_beam/io/sources_test.py b/sdks/python/apache_beam/io/sources_test.py index 7cf6eb1ac94b..f75e4fdafff0 100644 --- a/sdks/python/apache_beam/io/sources_test.py +++ b/sdks/python/apache_beam/io/sources_test.py @@ -18,11 +18,8 @@ """Unit tests for the sources framework.""" # pytype: skip-file -from __future__ import absolute_import - import logging import os -import sys import tempfile import unittest @@ -92,12 +89,6 @@ def _get_file_size(self): class SourcesTest(unittest.TestCase): - @classmethod - def setUpClass(cls): - # Method has been renamed in Python 3 - if sys.version_info[0] < 3: - cls.assertCountEqual = cls.assertItemsEqual - def _create_temp_file(self, contents): with tempfile.NamedTemporaryFile(delete=False) as f: f.write(contents) diff --git a/sdks/python/apache_beam/io/textio.py b/sdks/python/apache_beam/io/textio.py index b7af82230d55..277b993aabe7 100644 --- a/sdks/python/apache_beam/io/textio.py +++ b/sdks/python/apache_beam/io/textio.py @@ -19,16 +19,10 @@ # pytype: skip-file -from __future__ import absolute_import - import logging -from builtins import object -from builtins import range from functools import partial from typing import Optional -from past.builtins import long - from apache_beam.coders import coders from apache_beam.io import filebasedsink from apache_beam.io import filebasedsource @@ -85,7 +79,7 @@ def position(self): @position.setter def position(self, value): - assert isinstance(value, (int, long)) + assert isinstance(value, int) if value > len(self._data): raise ValueError( 'Cannot set position to %d since it\'s larger than ' @@ -455,6 +449,10 @@ class ReadAllFromText(PTransform): Parses a text file as newline-delimited elements, by default assuming UTF-8 encoding. 
Supports newline delimiters '\\n' and '\\r\\n'. + If `with_filename` is ``True`` the output will include the file name. This is + similar to ``ReadFromTextWithFilename`` but this ``PTransform`` can be placed + anywhere in the pipeline. + This implementation only supports reading text encoded using UTF-8 or ASCII. This does not support other encodings such as UTF-16 or UTF-32. """ @@ -469,6 +467,7 @@ def __init__( strip_trailing_newlines=True, coder=coders.StrUtf8Coder(), # type: coders.Coder skip_header_lines=0, + with_filename=False, **kwargs): """Initialize the ``ReadAllFromText`` transform. @@ -490,6 +489,9 @@ def __init__( from each source file. Must be 0 or higher. Large number of skipped lines might impact performance. coder: Coder used to decode each line. + with_filename: If True, returns a Key Value with the key being the file + name and the value being the actual data. If False, it only returns + the data. """ super(ReadAllFromText, self).__init__(**kwargs) source_from_file = partial( @@ -507,7 +509,8 @@ def __init__( compression_type, desired_bundle_size, min_bundle_size, - source_from_file) + source_from_file, + with_filename) def expand(self, pvalue): return pvalue | 'ReadAllFiles' >> self._read_all_files diff --git a/sdks/python/apache_beam/io/textio_test.py b/sdks/python/apache_beam/io/textio_test.py index 627fc9345ce9..6be437054780 100644 --- a/sdks/python/apache_beam/io/textio_test.py +++ b/sdks/python/apache_beam/io/textio_test.py @@ -18,20 +18,15 @@ """Tests for textio module.""" # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import bz2 import glob import gzip import logging import os import shutil -import sys import tempfile import unittest import zlib -from builtins import range import apache_beam as beam import apache_beam.io.source_test_utils as source_test_utils @@ -152,12 +147,6 @@ class TextSourceTest(unittest.TestCase): # Number of records that will be written by most tests. 
DEFAULT_NUM_RECORDS = 100 - @classmethod - def setUpClass(cls): - # Method has been renamed in Python 3 - if sys.version_info[0] < 3: - cls.assertCountEqual = cls.assertItemsEqual - def _run_read_test( self, file_or_pattern, @@ -593,6 +582,17 @@ def test_read_all_many_file_patterns(self): [pattern1, pattern2, pattern3]) | 'ReadAll' >> ReadAllFromText() assert_that(pcoll, equal_to(expected_data)) + def test_read_all_with_filename(self): + pattern, expected_data = write_pattern([5, 3], return_filenames=True) + assert len(expected_data) == 8 + + with TestPipeline() as pipeline: + pcoll = ( + pipeline + | 'Create' >> Create([pattern]) + | 'ReadAll' >> ReadAllFromText(with_filename=True)) + assert_that(pcoll, equal_to(expected_data)) + def test_read_auto_bzip2(self): _, lines = write_data(15) with TempDir() as tempdir: @@ -1017,12 +1017,6 @@ def test_read_after_splitting_skip_header(self): class TextSinkTest(unittest.TestCase): - @classmethod - def setUpClass(cls): - # Method has been renamed in Python 3 - if sys.version_info[0] < 3: - cls.assertCountEqual = cls.assertItemsEqual - def setUp(self): super(TextSinkTest, self).setUp() self.lines = [b'Line %d' % d for d in range(100)] diff --git a/sdks/python/apache_beam/io/tfrecordio.py b/sdks/python/apache_beam/io/tfrecordio.py index b6809ce5ed0e..e699756360e0 100644 --- a/sdks/python/apache_beam/io/tfrecordio.py +++ b/sdks/python/apache_beam/io/tfrecordio.py @@ -19,12 +19,9 @@ # pytype: skip-file -from __future__ import absolute_import - import codecs import logging import struct -from builtins import object from functools import partial import crcmod @@ -38,7 +35,7 @@ from apache_beam.io.iobase import Write from apache_beam.transforms import PTransform -__all__ = ['ReadFromTFRecord', 'WriteToTFRecord'] +__all__ = ['ReadFromTFRecord', 'ReadAllFromTFRecord', 'WriteToTFRecord'] _LOGGER = logging.getLogger(__name__) @@ -206,7 +203,10 @@ def _create_tfrecordio_source( class ReadAllFromTFRecord(PTransform): """A ``PTransform`` for reading a ``PCollection`` of TFRecord files.""" def __init__( - self, coder=coders.BytesCoder(), compression_type=CompressionTypes.AUTO): + self, + coder=coders.BytesCoder(), + compression_type=CompressionTypes.AUTO, + with_filename=False): """Initialize the ``ReadAllFromTFRecord`` transform. Args: @@ -214,6 +214,9 @@ def __init__( compression_type: Used to handle compressed input files. Default value is CompressionTypes.AUTO, in which case the file_path's extension will be used to detect the compression. + with_filename: If True, returns a Key Value with the key being the file + name and the value being the actual data. If False, it only returns + the data. 
""" super(ReadAllFromTFRecord, self).__init__() source_from_file = partial( @@ -227,7 +230,8 @@ def __init__( compression_type=compression_type, desired_bundle_size=0, min_bundle_size=0, - source_from_file=source_from_file) + source_from_file=source_from_file, + with_filename=with_filename) def expand(self, pvalue): return pvalue | 'ReadAllFiles' >> self._read_all_files diff --git a/sdks/python/apache_beam/io/tfrecordio_test.py b/sdks/python/apache_beam/io/tfrecordio_test.py index dca5a5ecc2eb..a867c0212ad3 100644 --- a/sdks/python/apache_beam/io/tfrecordio_test.py +++ b/sdks/python/apache_beam/io/tfrecordio_test.py @@ -17,8 +17,6 @@ # pytype: skip-file -from __future__ import absolute_import - import binascii import glob import gzip @@ -28,14 +26,10 @@ import pickle import random import re -import sys import unittest import zlib -from builtins import range import crcmod -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import import apache_beam as beam from apache_beam import Create @@ -105,12 +99,8 @@ def _as_file_handle(self, contents): def _increment_value_at_index(self, value, index): l = list(value) - if sys.version_info[0] <= 2: - l[index] = bytes(ord(l[index]) + 1) - return b"".join(l) - else: - l[index] = l[index] + 1 - return bytes(l) + l[index] = l[index] + 1 + return bytes(l) def _test_error(self, record, error_text): with self.assertRaisesRegex(ValueError, re.escape(error_text)): @@ -372,6 +362,28 @@ def test_process_multiple(self): compression_type=CompressionTypes.AUTO)) assert_that(result, equal_to([b'foo', b'bar'])) + def test_process_with_filename(self): + with TempDir() as temp_dir: + num_files = 3 + files = [] + for i in range(num_files): + path = temp_dir.create_temp_file('result%s' % i) + _write_file(path, FOO_BAR_RECORD_BASE64) + files.append(path) + content = [b'foo', b'bar'] + expected = [(file, line) for file in files for line in content] + pattern = temp_dir.get_path() + os.path.sep + '*' + + with TestPipeline() as p: + result = ( + p + | Create([pattern]) + | ReadAllFromTFRecord( + coder=coders.BytesCoder(), + compression_type=CompressionTypes.AUTO, + with_filename=True)) + assert_that(result, equal_to(expected)) + def test_process_glob(self): with TempDir() as temp_dir: self._write_glob(temp_dir, 'result') diff --git a/sdks/python/apache_beam/io/utils.py b/sdks/python/apache_beam/io/utils.py index 4903020300c0..0d1f52f35f2b 100644 --- a/sdks/python/apache_beam/io/utils.py +++ b/sdks/python/apache_beam/io/utils.py @@ -22,10 +22,6 @@ # pytype: skip-file -from __future__ import absolute_import - -from builtins import range - from apache_beam.io import iobase from apache_beam.io.range_trackers import OffsetRangeTracker from apache_beam.metrics import Metrics diff --git a/sdks/python/apache_beam/io/utils_test.py b/sdks/python/apache_beam/io/utils_test.py index 5fc90125082a..76ed9969f995 100644 --- a/sdks/python/apache_beam/io/utils_test.py +++ b/sdks/python/apache_beam/io/utils_test.py @@ -17,8 +17,6 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest import mock diff --git a/sdks/python/apache_beam/io/vcfio.py b/sdks/python/apache_beam/io/vcfio.py deleted file mode 100644 index c97ff4f87ecc..000000000000 --- a/sdks/python/apache_beam/io/vcfio.py +++ /dev/null @@ -1,511 +0,0 @@ -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. 
See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. -# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -"""A source for reading from VCF files (version 4.x). - -The 4.2 spec is available at https://samtools.github.io/hts-specs/VCFv4.2.pdf. -""" - -# pytype: skip-file - -from __future__ import absolute_import - -import logging -import sys -import traceback -import warnings -from builtins import next -from builtins import object -from collections import namedtuple - -from future.utils import iteritems -from past.builtins import long -from past.builtins import unicode - -from apache_beam.coders import coders -from apache_beam.io import filebasedsource -from apache_beam.io.filesystem import CompressionTypes -from apache_beam.io.iobase import Read -from apache_beam.io.textio import _TextSource as TextSource -from apache_beam.transforms import PTransform - -if sys.version_info[0] < 3: - import vcf -else: - warnings.warn( - "VCF IO will support Python 3 after migration to Nucleus, " - "see: BEAM-5628.") - -__all__ = [ - 'ReadFromVcf', - 'Variant', - 'VariantCall', - 'VariantInfo', - 'MalformedVcfRecord' -] - -_LOGGER = logging.getLogger(__name__) - -# Stores data about variant INFO fields. The type of 'data' is specified in the -# VCF headers. 'field_count' is a string that specifies the number of fields -# that the data type contains. Its value can either be a number representing a -# constant number of fields, `None` indicating that the value is not set -# (equivalent to '.' in the VCF file) or one of: -# - 'A': one value per alternate allele. -# - 'G': one value for each possible genotype. -# - 'R': one value for each possible allele (including the reference). -VariantInfo = namedtuple('VariantInfo', ['data', 'field_count']) -# Stores data about failed VCF record reads. `line` is the text line that -# caused the failed read and `file_name` is the name of the file that the read -# failed in. -MalformedVcfRecord = namedtuple('MalformedVcfRecord', ['file_name', 'line']) -MISSING_FIELD_VALUE = '.' # Indicates field is missing in VCF record. -PASS_FILTER = 'PASS' # Indicates that all filters have been passed. -END_INFO_KEY = 'END' # The info key that explicitly specifies end of a record. -GENOTYPE_FORMAT_KEY = 'GT' # The genotype format key in a call. -PHASESET_FORMAT_KEY = 'PS' # The phaseset format key. -DEFAULT_PHASESET_VALUE = '*' # Default phaseset value if call is phased, but -# no 'PS' is present. -MISSING_GENOTYPE_VALUE = -1 # Genotype to use when '.' is used in GT field. - - -class Variant(object): - """A class to store info about a genomic variant. - - Each object corresponds to a single record in a VCF file. - """ - __hash__ = None # type: ignore[assignment] - - def __init__( - self, - reference_name=None, - start=None, - end=None, - reference_bases=None, - alternate_bases=None, - names=None, - quality=None, - filters=None, - info=None, - calls=None): - """Initialize the :class:`Variant` object. 
- - Args: - reference_name (str): The reference on which this variant occurs - (such as `chr20` or `X`). . - start (int): The position at which this variant occurs (0-based). - Corresponds to the first base of the string of reference bases. - end (int): The end position (0-based) of this variant. Corresponds to the - first base after the last base in the reference allele. - reference_bases (str): The reference bases for this variant. - alternate_bases (List[str]): The bases that appear instead of the - reference bases. - names (List[str]): Names for the variant, for example a RefSNP ID. - quality (float): Phred-scaled quality score (-10log10 prob(call is wrong)) - Higher values imply better quality. - filters (List[str]): A list of filters (normally quality filters) this - variant has failed. `PASS` indicates this variant has passed all - filters. - info (dict): A map of additional variant information. The key is specified - in the VCF record and the value is of type ``VariantInfo``. - calls (list of :class:`VariantCall`): The variant calls for this variant. - Each one represents the determination of genotype with respect to this - variant. - """ - self.reference_name = reference_name - self.start = start - self.end = end - self.reference_bases = reference_bases - self.alternate_bases = alternate_bases or [] - self.names = names or [] - self.quality = quality - self.filters = filters or [] - self.info = info or {} - self.calls = calls or [] - - def __eq__(self, other): - return isinstance(other, Variant) and vars(self) == vars(other) - - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - - def __repr__(self): - return ', '.join([ - str(s) for s in [ - self.reference_name, - self.start, - self.end, - self.reference_bases, - self.alternate_bases, - self.names, - self.quality, - self.filters, - self.info, - self.calls - ] - ]) - - def __lt__(self, other): - if not isinstance(other, Variant): - return NotImplemented - - # Elements should first be sorted by reference_name, start, end. - # Ordering of other members is not important, but must be - # deterministic. - if self.reference_name != other.reference_name: - return self.reference_name < other.reference_name - elif self.start != other.start: - return self.start < other.start - elif self.end != other.end: - return self.end < other.end - - self_vars = vars(self) - other_vars = vars(other) - for key in sorted(self_vars): - if self_vars[key] != other_vars[key]: - return self_vars[key] < other_vars[key] - - return False - - def __le__(self, other): - if not isinstance(other, Variant): - return NotImplemented - - return self < other or self == other - - def __gt__(self, other): - if not isinstance(other, Variant): - return NotImplemented - - return other < self - - def __ge__(self, other): - if not isinstance(other, Variant): - return NotImplemented - - return other <= self - - -class VariantCall(object): - """A class to store info about a variant call. - - A call represents the determination of genotype with respect to a particular - variant. It may include associated information such as quality and phasing. - """ - - __hash__ = None # type: ignore[assignment] - - def __init__(self, name=None, genotype=None, phaseset=None, info=None): - """Initialize the :class:`VariantCall` object. - - Args: - name (str): The name of the call. - genotype (List[int]): The genotype of this variant call as specified by - the VCF schema. 
The values are either `0` representing the reference, - or a 1-based index into alternate bases. Ordering is only important if - `phaseset` is present. If a genotype is not called (that is, a `.` is - present in the GT string), -1 is used - phaseset (str): If this field is present, this variant call's genotype - ordering implies the phase of the bases and is consistent with any other - variant calls in the same reference sequence which have the same - phaseset value. If the genotype data was phased but no phase set was - specified, this field will be set to `*`. - info (dict): A map of additional variant call information. The key is - specified in the VCF record and the type of the value is specified by - the VCF header FORMAT. - """ - self.name = name - self.genotype = genotype or [] - self.phaseset = phaseset - self.info = info or {} - - def __eq__(self, other): - return ((self.name, self.genotype, self.phaseset, self.info) == ( - other.name, other.genotype, other.phaseset, other.info)) - - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - - def __repr__(self): - return ', '.join( - [str(s) for s in [self.name, self.genotype, self.phaseset, self.info]]) - - -class _VcfSource(filebasedsource.FileBasedSource): - """A source for reading VCF files. - - Parses VCF files (version 4) using PyVCF library. If file_pattern specifies - multiple files, then the header from each file is used separately to parse - the content. However, the output will be a uniform PCollection of - :class:`Variant` objects. - """ - - DEFAULT_VCF_READ_BUFFER_SIZE = 65536 # 64kB - - def __init__( - self, - file_pattern, - compression_type=CompressionTypes.AUTO, - buffer_size=DEFAULT_VCF_READ_BUFFER_SIZE, - validate=True, - allow_malformed_records=False): - super(_VcfSource, self).__init__( - file_pattern, compression_type=compression_type, validate=validate) - - self._header_lines_per_file = {} - self._compression_type = compression_type - self._buffer_size = buffer_size - self._allow_malformed_records = allow_malformed_records - - def read_records(self, file_name, range_tracker): - record_iterator = _VcfSource._VcfRecordIterator( - file_name, - range_tracker, - self._pattern, - self._compression_type, - self._allow_malformed_records, - buffer_size=self._buffer_size, - skip_header_lines=0) - - # Convert iterator to generator to abstract behavior - for line in record_iterator: - yield line - - class _VcfRecordIterator(object): - """An Iterator for processing a single VCF file.""" - def __init__( - self, - file_name, - range_tracker, - file_pattern, - compression_type, - allow_malformed_records, - **kwargs): - self._header_lines = [] - self._last_record = None - self._file_name = file_name - self._allow_malformed_records = allow_malformed_records - - text_source = TextSource( - file_pattern, - 0, # min_bundle_size - compression_type, - True, # strip_trailing_newlines - coders.StrUtf8Coder(), # coder - validate=False, - header_processor_fns=(lambda x: x.startswith('#'), - self._store_header_lines), - **kwargs) - - self._text_lines = text_source.read_records( - self._file_name, range_tracker) - try: - self._vcf_reader = vcf.Reader(fsock=self._create_generator()) - except SyntaxError: - # Throw the exception inside the generator to ensure file is properly - # closed (it's opened inside TextSource.read_records). 
- self._text_lines.throw( - ValueError( - 'An exception was raised when reading header from VCF ' - 'file %s: %s' % (self._file_name, traceback.format_exc()))) - - def _store_header_lines(self, header_lines): - self._header_lines = header_lines - - def _create_generator(self): - header_processed = False - for text_line in self._text_lines: - if not header_processed and self._header_lines: - for header in self._header_lines: - self._last_record = header - yield self._last_record - header_processed = True - # PyVCF has explicit str() calls when parsing INFO fields, which fails - # with UTF-8 decoded strings. Encode the line back to UTF-8. - self._last_record = text_line.encode('utf-8') - yield self._last_record - - def __iter__(self): - return self - - # pylint: disable=next-method-defined - def next(self): - return self.__next__() - - def __next__(self): - try: - record = next(self._vcf_reader) - return self._convert_to_variant_record( - record, self._vcf_reader.infos, self._vcf_reader.formats) - except (LookupError, ValueError): - if self._allow_malformed_records: - _LOGGER.warning( - 'An exception was raised when reading record from VCF file ' - '%s. Invalid record was %s: %s', - self._file_name, - self._last_record, - traceback.format_exc()) - return MalformedVcfRecord(self._file_name, self._last_record) - - # Throw the exception inside the generator to ensure file is properly - # closed (it's opened inside TextSource.read_records). - self._text_lines.throw( - ValueError( - 'An exception was raised when reading record from VCF ' - 'file %s. Invalid record was %s: %s' % - (self._file_name, self._last_record, traceback.format_exc()))) - - def _convert_to_variant_record(self, record, infos, formats): - """Converts the PyVCF record to a :class:`Variant` object. - - Args: - record (:class:`~vcf.model._Record`): An object containing info about a - variant. - infos (dict): The PyVCF dict storing INFO extracted from the VCF header. - The key is the info key and the value is :class:`~vcf.parser._Info`. - formats (dict): The PyVCF dict storing FORMAT extracted from the VCF - header. The key is the FORMAT key and the value is - :class:`~vcf.parser._Format`. - Returns: - A :class:`Variant` object from the given record. - """ - variant = Variant() - variant.reference_name = record.CHROM - variant.start = record.start - variant.end = record.end - variant.reference_bases = ( - record.REF if record.REF != MISSING_FIELD_VALUE else None) - # ALT fields are classes in PyVCF (e.g. Substitution), so need convert - # them to their string representations. - variant.alternate_bases.extend([str(r) for r in record.ALT - if r] if record.ALT else []) - variant.names.extend(record.ID.split(';') if record.ID else []) - variant.quality = record.QUAL - # PyVCF uses None for '.' and an empty list for 'PASS'. - if record.FILTER is not None: - variant.filters.extend( - record.FILTER if record.FILTER else [PASS_FILTER]) - for k, v in iteritems(record.INFO): - # Special case: END info value specifies end of the record, so adjust - # variant.end and do not include it as part of variant.info. 
- if k == END_INFO_KEY: - variant.end = v - continue - field_count = None - if k in infos: - field_count = self._get_field_count_as_string(infos[k].num) - variant.info[k] = VariantInfo(data=v, field_count=field_count) - for sample in record.samples: - call = VariantCall() - call.name = sample.sample - for allele in sample.gt_alleles or [MISSING_GENOTYPE_VALUE]: - if allele is None: - allele = MISSING_GENOTYPE_VALUE - call.genotype.append(int(allele)) - phaseset_from_format = ( - getattr(sample.data, PHASESET_FORMAT_KEY) - if PHASESET_FORMAT_KEY in sample.data._fields else None) - # Note: Call is considered phased if it contains the 'PS' key regardless - # of whether it uses '|'. - if phaseset_from_format or sample.phased: - call.phaseset = ( - str(phaseset_from_format) - if phaseset_from_format else DEFAULT_PHASESET_VALUE) - for field in sample.data._fields: - # Genotype and phaseset (if present) are already included. - if field in (GENOTYPE_FORMAT_KEY, PHASESET_FORMAT_KEY): - continue - data = getattr(sample.data, field) - # Convert single values to a list for cases where the number of fields - # is unknown. This is to ensure consistent types across all records. - # Note: this is already done for INFO fields in PyVCF. - if (field in formats and formats[field].num is None and - isinstance(data, (int, float, long, str, unicode, bool))): - data = [data] - call.info[field] = data - variant.calls.append(call) - return variant - - def _get_field_count_as_string(self, field_count): - """Returns the string representation of field_count from PyVCF. - - PyVCF converts field counts to an integer with some predefined constants - as specified in the vcf.parser.field_counts dict (e.g. 'A' is -1). This - method converts them back to their string representation to avoid having - direct dependency on the arbitrary PyVCF constants. - Args: - field_count (int): An integer representing the number of fields in INFO - as specified by PyVCF. - Returns: - A string representation of field_count (e.g. '-1' becomes 'A'). - Raises: - ValueError: if the field_count is not valid. - """ - if field_count is None: - return None - elif field_count >= 0: - return str(field_count) - field_count_to_string = {v: k for k, v in vcf.parser.field_counts.items()} - if field_count in field_count_to_string: - return field_count_to_string[field_count] - else: - raise ValueError('Invalid value for field_count: %d' % field_count) - - -class ReadFromVcf(PTransform): - """A :class:`~apache_beam.transforms.ptransform.PTransform` for reading VCF - files. - - Parses VCF files (version 4) using PyVCF library. If file_pattern specifies - multiple files, then the header from each file is used separately to parse - the content. However, the output will be a PCollection of - :class:`Variant` (or :class:`MalformedVcfRecord` for failed reads) objects. - """ - def __init__( - self, - file_pattern=None, - compression_type=CompressionTypes.AUTO, - validate=True, - allow_malformed_records=False, - **kwargs): - """Initialize the :class:`ReadFromVcf` transform. - - Args: - file_pattern (str): The file path to read from either as a single file or - a glob pattern. - compression_type (str): Used to handle compressed input files. - Typical value is :attr:`CompressionTypes.AUTO - `, in which case the - underlying file_path's extension will be used to detect the compression. - validate (bool): flag to verify that the files exist during the pipeline - creation time. - allow_malformed_records (bool): determines if failed VCF - record reads will be tolerated. 
Failed record reads will result in a - :class:`MalformedVcfRecord` being returned from the read of the record - rather than a :class:`Variant`. - """ - super(ReadFromVcf, self).__init__(**kwargs) - self._source = _VcfSource( - file_pattern, - compression_type, - validate=validate, - allow_malformed_records=allow_malformed_records) - - def expand(self, pvalue): - return pvalue.pipeline | Read(self._source) diff --git a/sdks/python/apache_beam/io/vcfio_test.py b/sdks/python/apache_beam/io/vcfio_test.py deleted file mode 100644 index eafced635dae..000000000000 --- a/sdks/python/apache_beam/io/vcfio_test.py +++ /dev/null @@ -1,635 +0,0 @@ -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. -# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -"""Tests for vcfio module.""" - -# pytype: skip-file - -from __future__ import absolute_import - -import logging -import os -import sys -import unittest -from itertools import chain -from itertools import permutations - -import apache_beam.io.source_test_utils as source_test_utils -from apache_beam.io.vcfio import _VcfSource as VcfSource -from apache_beam.io.vcfio import DEFAULT_PHASESET_VALUE -from apache_beam.io.vcfio import MISSING_GENOTYPE_VALUE -from apache_beam.io.vcfio import MalformedVcfRecord -from apache_beam.io.vcfio import ReadFromVcf -from apache_beam.io.vcfio import Variant -from apache_beam.io.vcfio import VariantCall -from apache_beam.io.vcfio import VariantInfo -from apache_beam.testing.test_pipeline import TestPipeline -from apache_beam.testing.test_utils import TempDir -from apache_beam.testing.util import BeamAssertException -from apache_beam.testing.util import assert_that - -# Note: mixing \n and \r\n to verify both behaviors. -_SAMPLE_HEADER_LINES = [ - '##fileformat=VCFv4.2\n', - '##INFO=\n', - '##INFO=\n', - '##FORMAT=\r\n', - '##FORMAT=\n', - '#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1 Sample2\r\n', -] - -_SAMPLE_TEXT_LINES = [ - '20 14370 . G A 29 PASS AF=0.5 GT:GQ 0|0:48 1|0:48\n', - '20 17330 . T A 3 q10 AF=0.017 GT:GQ 0|0:49 0|1:3\n', - '20 1110696 . A G,T 67 PASS AF=0.3,0.7 GT:GQ 1|2:21 2|1:2\n', - '20 1230237 . T . 47 PASS . GT:GQ 0|0:54 0|0:48\n', - '19 1234567 . GTCT G,GTACT 50 PASS . GT:GQ 0/1:35 0/2:17\n', - '20 1234 rs123 C A,T 50 PASS AF=0.5 GT:GQ 0/0:48 1/0:20\n', - '19 123 rs1234 GTC . 40 q10;s50 NS=2 GT:GQ 1|0:48 0/1:.\n', - '19 12 . C 49 q10 AF=0.5 GT:GQ 0|1:45 .:.\n' -] - - -def get_full_file_path(file_name): - """Returns the full path of the specified ``file_name`` from ``data``.""" - return os.path.join( - os.path.dirname(__file__), '..', 'testing', 'data', 'vcf', file_name) - - -def get_full_dir(): - """Returns the full path of the ``data`` directory.""" - return os.path.join(os.path.dirname(__file__), '..', 'testing', 'data', 'vcf') - - -# Helper method for comparing variants. 
-def _variant_comparator(v1, v2): - if v1.reference_name == v2.reference_name: - if v1.start == v2.start: - return (v1.end > v2.end) - (v1.end < v2.end) - return (v1.start > v2.start) - (v1.start < v2.start) - return (v1.reference_name > v2.reference_name) - \ - (v1.reference_name < v2.reference_name) - - -# Helper method for verifying equal count on PCollection. -def _count_equals_to(expected_count): - def _count_equal(actual_list): - actual_count = len(actual_list) - if expected_count != actual_count: - raise BeamAssertException( - 'Expected %d not equal actual %d' % (expected_count, actual_count)) - - return _count_equal - - -@unittest.skipIf( - sys.version_info[0] == 3, - 'VCF io will be ported to Python 3 after switch to Nucleus. ' - 'See BEAM-5628') -class VcfSourceTest(unittest.TestCase): - - # Distribution should skip tests that need VCF files due to large size - VCF_FILE_DIR_MISSING = not os.path.exists(get_full_dir()) - - def _create_temp_vcf_file(self, lines, tempdir): - return tempdir.create_temp_file(suffix='.vcf', lines=lines) - - def _read_records(self, file_or_pattern, **kwargs): - return source_test_utils.read_from_source( - VcfSource(file_or_pattern, **kwargs)) - - def _create_temp_file_and_read_records(self, lines): - with TempDir() as tempdir: - file_name = tempdir.create_temp_file(suffix='.vcf', lines=lines) - return self._read_records(file_name) - - def _assert_variants_equal(self, actual, expected): - self.assertEqual(sorted(expected), sorted(actual)) - - def _get_sample_variant_1(self): - """Get first sample variant. - - Features: - multiple alternates - not phased - multiple names - """ - vcf_line = ( - '20 1234 rs123;rs2 C A,T 50 PASS AF=0.5,0.1;NS=1 ' - 'GT:GQ 0/0:48 1/0:20\n') - variant = Variant( - reference_name='20', - start=1233, - end=1234, - reference_bases='C', - alternate_bases=['A', 'T'], - names=['rs123', 'rs2'], - quality=50, - filters=['PASS'], - info={ - 'AF': VariantInfo(data=[0.5, 0.1], field_count='A'), - 'NS': VariantInfo(data=1, field_count='1') - }) - variant.calls.append( - VariantCall(name='Sample1', genotype=[0, 0], info={'GQ': 48})) - variant.calls.append( - VariantCall(name='Sample2', genotype=[1, 0], info={'GQ': 20})) - return variant, vcf_line - - def _get_sample_variant_2(self): - """Get second sample variant. - - Features: - multiple references - no alternate - phased - multiple filters - missing format field - """ - vcf_line = ('19 123 rs1234 GTC . 40 q10;s50 NS=2 GT:GQ 1|0:48 0/1:.\n') - variant = Variant( - reference_name='19', - start=122, - end=125, - reference_bases='GTC', - alternate_bases=[], - names=['rs1234'], - quality=40, - filters=['q10', 's50'], - info={'NS': VariantInfo(data=2, field_count='1')}) - variant.calls.append( - VariantCall( - name='Sample1', - genotype=[1, 0], - phaseset=DEFAULT_PHASESET_VALUE, - info={'GQ': 48})) - variant.calls.append( - VariantCall(name='Sample2', genotype=[0, 1], info={'GQ': None})) - return variant, vcf_line - - def _get_sample_variant_3(self): - """Get third sample variant. - - Features: - symbolic alternate - no calls for sample 2 - """ - vcf_line = ('19 12 . 
C 49 q10 AF=0.5 GT:GQ 0|1:45 .:.\n') - variant = Variant( - reference_name='19', - start=11, - end=12, - reference_bases='C', - alternate_bases=[''], - quality=49, - filters=['q10'], - info={'AF': VariantInfo(data=[0.5], field_count='A')}) - variant.calls.append( - VariantCall( - name='Sample1', - genotype=[0, 1], - phaseset=DEFAULT_PHASESET_VALUE, - info={'GQ': 45})) - variant.calls.append( - VariantCall( - name='Sample2', - genotype=[MISSING_GENOTYPE_VALUE], - info={'GQ': None})) - return variant, vcf_line - - def _get_invalid_file_contents(self): - """Gets sample invalid files contents. - - Returns: - A `tuple` where the first element is contents that are invalid because - of record errors and the second element is contents that are invalid - because of header errors. - """ - malformed_vcf_records = [ - # Malfromed record. - ['#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample\n', '1 1 '], - # GT is not an integer. - [ - '#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample\n', - '19 123 rs12345 T C 50 q10 AF=0.2;NS=2 GT A|0' - ], - # POS should be an integer. - [ - '##FILTER=\n', - '##FILTER=\n', - '#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample\n', - '19 abc rs12345 T C 9 q10 AF=0.2;NS=2 GT:GQ 1|0:48\n', - ] - ] - malformed_header_lines = [ - # Malformed FILTER. - [ - '##FILTER=\n', - '##FILTER=\n', - '#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample\n', - '19 123 rs12345 T C 50 q10 AF=0.2;NS=2 GT:GQ 1|0:48\n', - ] - ] - - return (malformed_vcf_records, malformed_header_lines) - - def test_sort_variants(self): - sorted_variants = [ - Variant(reference_name='a', start=20, end=22), - Variant(reference_name='a', start=20, end=22, quality=20), - Variant(reference_name='b', start=20, end=22), - Variant(reference_name='b', start=21, end=22), - Variant(reference_name='b', start=21, end=23) - ] - - for permutation in permutations(sorted_variants): - self.assertEqual(sorted(permutation), sorted_variants) - - def test_variant_equality(self): - base_variant = Variant( - reference_name='a', - start=20, - end=22, - reference_bases='a', - alternate_bases=['g', 't'], - names=['variant'], - quality=9, - filters=['q10'], - info={'key': 'value'}, - calls=[VariantCall(genotype=[0, 0])]) - equal_variant = Variant( - reference_name='a', - start=20, - end=22, - reference_bases='a', - alternate_bases=['g', 't'], - names=['variant'], - quality=9, - filters=['q10'], - info={'key': 'value'}, - calls=[VariantCall(genotype=[0, 0])]) - different_calls = Variant( - reference_name='a', - start=20, - end=22, - reference_bases='a', - alternate_bases=['g', 't'], - names=['variant'], - quality=9, - filters=['q10'], - info={'key': 'value'}, - calls=[VariantCall(genotype=[1, 0])]) - missing_field = Variant( - reference_name='a', - start=20, - end=22, - reference_bases='a', - alternate_bases=['g', 't'], - names=['variant'], - quality=9, - filters=['q10'], - info={'key': 'value'}) - - self.assertEqual(base_variant, equal_variant) - self.assertNotEqual(base_variant, different_calls) - self.assertNotEqual(base_variant, missing_field) - - @unittest.skipIf(VCF_FILE_DIR_MISSING, 'VCF test file directory is missing') - def test_read_single_file_large(self): - test_data_conifgs = [ - { - 'file': 'valid-4.0.vcf', 'num_records': 5 - }, - { - 'file': 'valid-4.0.vcf.gz', 'num_records': 5 - }, - { - 'file': 'valid-4.0.vcf.bz2', 'num_records': 5 - }, - { - 'file': 'valid-4.1-large.vcf', 'num_records': 9882 - }, - { - 'file': 'valid-4.2.vcf', 'num_records': 13 - }, - ] - for config in test_data_conifgs: - read_data = 
self._read_records(get_full_file_path(config['file'])) - self.assertEqual(config['num_records'], len(read_data)) - - @unittest.skipIf(VCF_FILE_DIR_MISSING, 'VCF test file directory is missing') - def test_read_file_pattern_large(self): - read_data = self._read_records(os.path.join(get_full_dir(), 'valid-*.vcf')) - self.assertEqual(9900, len(read_data)) - read_data_gz = self._read_records( - os.path.join(get_full_dir(), 'valid-*.vcf.gz')) - self.assertEqual(9900, len(read_data_gz)) - - def test_single_file_no_records(self): - self.assertEqual([], self._create_temp_file_and_read_records([''])) - self.assertEqual([], - self._create_temp_file_and_read_records( - ['\n', '\r\n', '\n'])) - self.assertEqual( - [], self._create_temp_file_and_read_records(_SAMPLE_HEADER_LINES)) - - def test_single_file_verify_details(self): - variant_1, vcf_line_1 = self._get_sample_variant_1() - read_data = self._create_temp_file_and_read_records( - _SAMPLE_HEADER_LINES + [vcf_line_1]) - self.assertEqual(1, len(read_data)) - self.assertEqual(variant_1, read_data[0]) - variant_2, vcf_line_2 = self._get_sample_variant_2() - variant_3, vcf_line_3 = self._get_sample_variant_3() - read_data = self._create_temp_file_and_read_records( - _SAMPLE_HEADER_LINES + [vcf_line_1, vcf_line_2, vcf_line_3]) - self.assertEqual(3, len(read_data)) - self._assert_variants_equal([variant_1, variant_2, variant_3], read_data) - - def test_file_pattern_verify_details(self): - variant_1, vcf_line_1 = self._get_sample_variant_1() - variant_2, vcf_line_2 = self._get_sample_variant_2() - variant_3, vcf_line_3 = self._get_sample_variant_3() - with TempDir() as tempdir: - self._create_temp_vcf_file(_SAMPLE_HEADER_LINES + [vcf_line_1], tempdir) - self._create_temp_vcf_file( - (_SAMPLE_HEADER_LINES + [vcf_line_2, vcf_line_3]), tempdir) - read_data = self._read_records(os.path.join(tempdir.get_path(), '*.vcf')) - self.assertEqual(3, len(read_data)) - self._assert_variants_equal([variant_1, variant_2, variant_3], read_data) - - @unittest.skipIf(VCF_FILE_DIR_MISSING, 'VCF test file directory is missing') - def test_read_after_splitting(self): - file_name = get_full_file_path('valid-4.1-large.vcf') - source = VcfSource(file_name) - splits = [p for p in source.split(desired_bundle_size=500)] - self.assertGreater(len(splits), 1) - sources_info = ([(split.source, split.start_position, split.stop_position) - for split in splits]) - self.assertGreater(len(sources_info), 1) - split_records = [] - for source_info in sources_info: - split_records.extend(source_test_utils.read_from_source(*source_info)) - self.assertEqual(9882, len(split_records)) - - def test_invalid_file(self): - invalid_file_contents = self._get_invalid_file_contents() - for content in chain(*invalid_file_contents): - with TempDir() as tempdir, self.assertRaises(ValueError): - self._read_records(self._create_temp_vcf_file(content, tempdir)) - # Try with multiple files (any one of them will throw an exception). 
- with TempDir() as tempdir, self.assertRaises(ValueError): - for content in chain(*invalid_file_contents): - self._create_temp_vcf_file(content, tempdir) - self._read_records(os.path.join(tempdir.get_path(), '*.vcf')) - - def test_allow_malformed_records(self): - invalid_records, invalid_headers = self._get_invalid_file_contents() - - # Invalid records should not raise errors - for content in invalid_records: - with TempDir() as tempdir: - records = self._read_records( - self._create_temp_vcf_file(content, tempdir), - allow_malformed_records=True) - for record in records: - self.assertIsInstance(record, MalformedVcfRecord) - - # Invalid headers should still raise errors - for content in invalid_headers: - with TempDir() as tempdir, self.assertRaises(ValueError): - self._read_records( - self._create_temp_vcf_file(content, tempdir), - allow_malformed_records=True) - - def test_no_samples(self): - header_line = '#CHROM POS ID REF ALT QUAL FILTER INFO\n' - record_line = '19 123 . G A . PASS AF=0.2' - expected_variant = Variant( - reference_name='19', - start=122, - end=123, - reference_bases='G', - alternate_bases=['A'], - filters=['PASS'], - info={'AF': VariantInfo(data=[0.2], field_count='A')}) - read_data = self._create_temp_file_and_read_records( - _SAMPLE_HEADER_LINES[:-1] + [header_line, record_line]) - self.assertEqual(1, len(read_data)) - self.assertEqual(expected_variant, read_data[0]) - - def test_no_info(self): - record_line = 'chr19 123 . . . . . . GT . .' - expected_variant = Variant(reference_name='chr19', start=122, end=123) - expected_variant.calls.append( - VariantCall(name='Sample1', genotype=[MISSING_GENOTYPE_VALUE])) - expected_variant.calls.append( - VariantCall(name='Sample2', genotype=[MISSING_GENOTYPE_VALUE])) - read_data = self._create_temp_file_and_read_records( - _SAMPLE_HEADER_LINES + [record_line]) - self.assertEqual(1, len(read_data)) - self.assertEqual(expected_variant, read_data[0]) - - def test_info_numbers_and_types(self): - info_headers = [ - '##INFO=\n', - '##INFO=\n', - '##INFO=\n', - '##INFO=\n', - '##INFO=\n' - ] - record_lines = [ - '19 2 . A T,C . . HA=a1,a2;HG=1,2,3;HR=a,b,c;HF;HU=0.1 GT 1/0 0/1\n', - '19 124 . A T . . HG=3,4,5;HR=d,e;HU=1.1,1.2 GT 0/0 0/1' - ] - variant_1 = Variant( - reference_name='19', - start=1, - end=2, - reference_bases='A', - alternate_bases=['T', 'C'], - info={ - 'HA': VariantInfo(data=['a1', 'a2'], field_count='A'), - 'HG': VariantInfo(data=[1, 2, 3], field_count='G'), - 'HR': VariantInfo(data=['a', 'b', 'c'], field_count='R'), - 'HF': VariantInfo(data=True, field_count='0'), - 'HU': VariantInfo(data=[0.1], field_count=None) - }) - variant_1.calls.append(VariantCall(name='Sample1', genotype=[1, 0])) - variant_1.calls.append(VariantCall(name='Sample2', genotype=[0, 1])) - variant_2 = Variant( - reference_name='19', - start=123, - end=124, - reference_bases='A', - alternate_bases=['T'], - info={ - 'HG': VariantInfo(data=[3, 4, 5], field_count='G'), - 'HR': VariantInfo(data=['d', 'e'], field_count='R'), - 'HU': VariantInfo(data=[1.1, 1.2], field_count=None) - }) - variant_2.calls.append(VariantCall(name='Sample1', genotype=[0, 0])) - variant_2.calls.append(VariantCall(name='Sample2', genotype=[0, 1])) - read_data = self._create_temp_file_and_read_records( - info_headers + _SAMPLE_HEADER_LINES[1:] + record_lines) - self.assertEqual(2, len(read_data)) - self._assert_variants_equal([variant_1, variant_2], read_data) - - def test_end_info_key(self): - phaseset_header_line = ( - '##INFO=\n') - record_lines = [ - '19 123 . A . . . 
END=1111 GT 1/0 0/1\n', - '19 123 . A . . . . GT 0/1 1/1\n' - ] - variant_1 = Variant( - reference_name='19', start=122, end=1111, reference_bases='A') - variant_1.calls.append(VariantCall(name='Sample1', genotype=[1, 0])) - variant_1.calls.append(VariantCall(name='Sample2', genotype=[0, 1])) - variant_2 = Variant( - reference_name='19', start=122, end=123, reference_bases='A') - variant_2.calls.append(VariantCall(name='Sample1', genotype=[0, 1])) - variant_2.calls.append(VariantCall(name='Sample2', genotype=[1, 1])) - read_data = self._create_temp_file_and_read_records( - [phaseset_header_line] + _SAMPLE_HEADER_LINES[1:] + record_lines) - self.assertEqual(2, len(read_data)) - self._assert_variants_equal([variant_1, variant_2], read_data) - - def test_custom_phaseset(self): - phaseset_header_line = ( - '##FORMAT=\n') - record_lines = [ - '19 123 . A T . . . GT:PS 1|0:1111 0/1:.\n', - '19 121 . A T . . . GT:PS 1|0:2222 0/1:2222\n' - ] - variant_1 = Variant( - reference_name='19', - start=122, - end=123, - reference_bases='A', - alternate_bases=['T']) - variant_1.calls.append( - VariantCall(name='Sample1', genotype=[1, 0], phaseset='1111')) - variant_1.calls.append(VariantCall(name='Sample2', genotype=[0, 1])) - variant_2 = Variant( - reference_name='19', - start=120, - end=121, - reference_bases='A', - alternate_bases=['T']) - variant_2.calls.append( - VariantCall(name='Sample1', genotype=[1, 0], phaseset='2222')) - variant_2.calls.append( - VariantCall(name='Sample2', genotype=[0, 1], phaseset='2222')) - read_data = self._create_temp_file_and_read_records( - [phaseset_header_line] + _SAMPLE_HEADER_LINES[1:] + record_lines) - self.assertEqual(2, len(read_data)) - self._assert_variants_equal([variant_1, variant_2], read_data) - - def test_format_numbers(self): - format_headers = [ - '##FORMAT=\n', - '##FORMAT=\n', - '##FORMAT=\n' - ] - record_lines = [ - '19 2 . A T,C . . . 
GT:FU:F1:F2 1/0:a1:3:a,b 0/1:a2,a3:4:b,c\n' - ] - expected_variant = Variant( - reference_name='19', - start=1, - end=2, - reference_bases='A', - alternate_bases=['T', 'C']) - expected_variant.calls.append( - VariantCall( - name='Sample1', - genotype=[1, 0], - info={ - 'FU': ['a1'], 'F1': 3, 'F2': ['a', 'b'] - })) - expected_variant.calls.append( - VariantCall( - name='Sample2', - genotype=[0, 1], - info={ - 'FU': ['a2', 'a3'], 'F1': 4, 'F2': ['b', 'c'] - })) - read_data = self._create_temp_file_and_read_records( - format_headers + _SAMPLE_HEADER_LINES[1:] + record_lines) - self.assertEqual(1, len(read_data)) - self.assertEqual(expected_variant, read_data[0]) - - def test_pipeline_read_single_file(self): - with TempDir() as tempdir: - file_name = self._create_temp_vcf_file( - _SAMPLE_HEADER_LINES + _SAMPLE_TEXT_LINES, tempdir) - with TestPipeline() as pipeline: - pcoll = pipeline | 'Read' >> ReadFromVcf(file_name) - assert_that(pcoll, _count_equals_to(len(_SAMPLE_TEXT_LINES))) - - @unittest.skipIf(VCF_FILE_DIR_MISSING, 'VCF test file directory is missing') - def test_pipeline_read_single_file_large(self): - with TestPipeline() as pipeline: - pcoll = pipeline | 'Read' >> ReadFromVcf( - get_full_file_path('valid-4.0.vcf')) - assert_that(pcoll, _count_equals_to(5)) - - @unittest.skipIf(VCF_FILE_DIR_MISSING, 'VCF test file directory is missing') - def test_pipeline_read_file_pattern_large(self): - with TestPipeline() as pipeline: - pcoll = pipeline | 'Read' >> ReadFromVcf( - os.path.join(get_full_dir(), 'valid-*.vcf')) - assert_that(pcoll, _count_equals_to(9900)) - - def test_read_reentrant_without_splitting(self): - with TempDir() as tempdir: - file_name = self._create_temp_vcf_file( - _SAMPLE_HEADER_LINES + _SAMPLE_TEXT_LINES, tempdir) - source = VcfSource(file_name) - source_test_utils.assert_reentrant_reads_succeed((source, None, None)) - - def test_read_reentrant_after_splitting(self): - with TempDir() as tempdir: - file_name = self._create_temp_vcf_file( - _SAMPLE_HEADER_LINES + _SAMPLE_TEXT_LINES, tempdir) - source = VcfSource(file_name) - splits = [split for split in source.split(desired_bundle_size=100000)] - assert len(splits) == 1 - source_test_utils.assert_reentrant_reads_succeed( - (splits[0].source, splits[0].start_position, splits[0].stop_position)) - - def test_dynamic_work_rebalancing(self): - with TempDir() as tempdir: - file_name = self._create_temp_vcf_file( - _SAMPLE_HEADER_LINES + _SAMPLE_TEXT_LINES, tempdir) - source = VcfSource(file_name) - splits = [split for split in source.split(desired_bundle_size=100000)] - assert len(splits) == 1 - source_test_utils.assert_split_at_fraction_exhaustive( - splits[0].source, splits[0].start_position, splits[0].stop_position) - - -if __name__ == '__main__': - logging.getLogger().setLevel(logging.INFO) - unittest.main() diff --git a/sdks/python/apache_beam/io/watermark_estimators.py b/sdks/python/apache_beam/io/watermark_estimators.py index 65c4611fca4c..be91ffb9c751 100644 --- a/sdks/python/apache_beam/io/watermark_estimators.py +++ b/sdks/python/apache_beam/io/watermark_estimators.py @@ -20,8 +20,6 @@ # pytype: skip-file -from __future__ import absolute_import - from apache_beam.io.iobase import WatermarkEstimator from apache_beam.transforms.core import WatermarkEstimatorProvider from apache_beam.utils.timestamp import Timestamp diff --git a/sdks/python/apache_beam/io/watermark_estimators_test.py b/sdks/python/apache_beam/io/watermark_estimators_test.py index edc5d7db7a2a..322217c3aa44 100644 --- 
a/sdks/python/apache_beam/io/watermark_estimators_test.py +++ b/sdks/python/apache_beam/io/watermark_estimators_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest import mock diff --git a/sdks/python/apache_beam/metrics/__init__.py b/sdks/python/apache_beam/metrics/__init__.py index e74168f09e4b..8ce7bbb173fd 100644 --- a/sdks/python/apache_beam/metrics/__init__.py +++ b/sdks/python/apache_beam/metrics/__init__.py @@ -14,7 +14,5 @@ # See the License for the specific language governing permissions and # limitations under the License. # -from __future__ import absolute_import - from apache_beam.metrics.metric import Metrics from apache_beam.metrics.metric import MetricsFilter diff --git a/sdks/python/apache_beam/metrics/cells.py b/sdks/python/apache_beam/metrics/cells.py index 34ce2a45b0ff..0c6b8f3dc21a 100644 --- a/sdks/python/apache_beam/metrics/cells.py +++ b/sdks/python/apache_beam/metrics/cells.py @@ -15,6 +15,8 @@ # limitations under the License. # +# cython: language_level=3 + """ This file contains metric cell classes. A metric cell is used to accumulate in-memory changes to a metric. It represents a specific metric in a single @@ -23,12 +25,8 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import threading import time -from builtins import object from datetime import datetime from typing import Any from typing import Optional @@ -286,11 +284,6 @@ def __hash__(self): # type: () -> int return hash(self.data) - def __ne__(self, other): - # type: (object) -> bool - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - def __repr__(self): # type: () -> str return 'DistributionResult(sum={}, count={}, min={}, max={})'.format( @@ -345,11 +338,6 @@ def __hash__(self): # type: () -> int return hash(self.data) - def __ne__(self, other): - # type: (object) -> bool - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - def __repr__(self): return ''.format( self.value, self.timestamp) @@ -391,11 +379,6 @@ def __hash__(self): # type: () -> int return hash((self.value, self.timestamp)) - def __ne__(self, other): - # type: (object) -> bool - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - def __repr__(self): # type: () -> str return ''.format( @@ -457,11 +440,6 @@ def __hash__(self): # type: () -> int return hash((self.sum, self.count, self.min, self.max)) - def __ne__(self, other): - # type: (object) -> bool - # TODO(BEAM-5949): Needed for Python 2 compatibility. 
- return not self == other - def __repr__(self): # type: () -> str return 'DistributionData(sum={}, count={}, min={}, max={})'.format( diff --git a/sdks/python/apache_beam/metrics/cells_test.py b/sdks/python/apache_beam/metrics/cells_test.py index a120d15f7166..3d4d81c3d12b 100644 --- a/sdks/python/apache_beam/metrics/cells_test.py +++ b/sdks/python/apache_beam/metrics/cells_test.py @@ -17,11 +17,8 @@ # pytype: skip-file -from __future__ import absolute_import - import threading import unittest -from builtins import range from apache_beam.metrics.cells import CounterCell from apache_beam.metrics.cells import DistributionCell diff --git a/sdks/python/apache_beam/metrics/execution.pxd b/sdks/python/apache_beam/metrics/execution.pxd index 3605d82dae4c..6311158ea1a3 100644 --- a/sdks/python/apache_beam/metrics/execution.pxd +++ b/sdks/python/apache_beam/metrics/execution.pxd @@ -41,5 +41,6 @@ cdef class MetricUpdater(object): cdef class MetricsContainer(object): cdef object step_name + cdef object lock cdef public dict metrics cpdef MetricCell get_metric_cell(self, metric_key) diff --git a/sdks/python/apache_beam/metrics/execution.py b/sdks/python/apache_beam/metrics/execution.py index 453be6b6ecb8..940086044976 100644 --- a/sdks/python/apache_beam/metrics/execution.py +++ b/sdks/python/apache_beam/metrics/execution.py @@ -15,6 +15,8 @@ # limitations under the License. # +# cython: language_level=3 + """ This module is for internal use only; no backwards-compatibility guarantees. @@ -32,9 +34,7 @@ # pytype: skip-file -from __future__ import absolute_import - -from builtins import object +import threading from typing import TYPE_CHECKING from typing import Any from typing import Dict @@ -85,10 +85,6 @@ def __eq__(self, other): self.step == other.step and self.metric == other.metric and self.labels == other.labels) - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - def __hash__(self): return hash((self.step, self.metric, frozenset(self.labels))) @@ -128,10 +124,6 @@ def __eq__(self, other): self.key == other.key and self.committed == other.committed and self.attempted == other.attempted) - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. 
- return not self == other - def __hash__(self): return hash((self.key, self.committed, self.attempted)) @@ -192,9 +184,6 @@ def __eq__(self, other): return self is other or ( self.cell_type == other.cell_type and self.fast_name == other.fast_name) - def __ne__(self, other): - return not self == other - def __hash__(self): return self._hash @@ -249,6 +238,7 @@ class MetricsContainer(object): """ def __init__(self, step_name): self.step_name = step_name + self.lock = threading.Lock() self.metrics = dict() # type: Dict[_TypedMetricName, MetricCell] def get_counter(self, metric_name): @@ -273,7 +263,8 @@ def get_metric_cell(self, typed_metric_name): # type: (_TypedMetricName) -> MetricCell cell = self.metrics.get(typed_metric_name, None) if cell is None: - cell = self.metrics[typed_metric_name] = typed_metric_name.cell_type() + with self.lock: + cell = self.metrics[typed_metric_name] = typed_metric_name.cell_type() return cell def get_cumulative(self): @@ -313,10 +304,12 @@ def to_runner_api_monitoring_infos(self, transform_id): # type: (str) -> Dict[FrozenSet, metrics_pb2.MonitoringInfo] """Returns a list of MonitoringInfos for the metrics in this container.""" + with self.lock: + items = list(self.metrics.items()) all_metrics = [ cell.to_runner_api_monitoring_info(key.metric_name, transform_id) for key, - cell in self.metrics.items() + cell in items ] return { monitoring_infos.to_key(mi): mi diff --git a/sdks/python/apache_beam/metrics/execution_test.py b/sdks/python/apache_beam/metrics/execution_test.py index d1086dac385c..a888376e7091 100644 --- a/sdks/python/apache_beam/metrics/execution_test.py +++ b/sdks/python/apache_beam/metrics/execution_test.py @@ -17,10 +17,7 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest -from builtins import range from apache_beam.metrics.execution import MetricKey from apache_beam.metrics.execution import MetricsContainer diff --git a/sdks/python/apache_beam/metrics/metric.py b/sdks/python/apache_beam/metrics/metric.py index 2ca0984b0175..f4896e9decf1 100644 --- a/sdks/python/apache_beam/metrics/metric.py +++ b/sdks/python/apache_beam/metrics/metric.py @@ -27,10 +27,7 @@ # pytype: skip-file # mypy: disallow-untyped-defs -from __future__ import absolute_import - import logging -from builtins import object from typing import TYPE_CHECKING from typing import Dict from typing import FrozenSet diff --git a/sdks/python/apache_beam/metrics/metric_test.py b/sdks/python/apache_beam/metrics/metric_test.py index a0f75ed11de4..e3701228feec 100644 --- a/sdks/python/apache_beam/metrics/metric_test.py +++ b/sdks/python/apache_beam/metrics/metric_test.py @@ -17,13 +17,10 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest -from builtins import object import hamcrest as hc -from nose.plugins.attrib import attr +import pytest import apache_beam as beam from apache_beam import metrics @@ -153,7 +150,7 @@ def test_general_urn_metric_name_str(self): "urn=my_urn, labels={'key': 'value'})") self.assertEqual(str(mn), expected_str) - @attr('ValidatesRunner') + @pytest.mark.it_validatesrunner def test_user_counter_using_pardo(self): class SomeDoFn(beam.DoFn): """A custom dummy DoFn using yield.""" diff --git a/sdks/python/apache_beam/metrics/metricbase.py b/sdks/python/apache_beam/metrics/metricbase.py index b89b4bb3700b..12e7881792f9 100644 --- a/sdks/python/apache_beam/metrics/metricbase.py +++ b/sdks/python/apache_beam/metrics/metricbase.py @@ -34,9 +34,6 @@ # pytype: skip-file -from __future__ import absolute_import - 
-from builtins import object from typing import Dict from typing import Optional @@ -82,10 +79,6 @@ def __eq__(self, other): self.namespace == other.namespace and self.name == other.name and self.urn == other.urn and self.labels == other.labels) - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - def __str__(self): if self.urn: return 'MetricName(namespace={}, name={}, urn={}, labels={})'.format( diff --git a/sdks/python/apache_beam/metrics/monitoring_infos.py b/sdks/python/apache_beam/metrics/monitoring_infos.py index 33bb1ca8af9f..2d8faa8fe535 100644 --- a/sdks/python/apache_beam/metrics/monitoring_infos.py +++ b/sdks/python/apache_beam/metrics/monitoring_infos.py @@ -20,8 +20,6 @@ # pytype: skip-file -from __future__ import absolute_import - import collections import time from functools import reduce @@ -97,6 +95,14 @@ common_urns.monitoring_info_labels.BIGQUERY_VIEW.label_props.name) BIGQUERY_QUERY_NAME_LABEL = ( common_urns.monitoring_info_labels.BIGQUERY_QUERY_NAME.label_props.name) +GCS_PROJECT_ID_LABEL = ( + common_urns.monitoring_info_labels.GCS_PROJECT_ID.label_props.name) +GCS_BUCKET_LABEL = ( + common_urns.monitoring_info_labels.GCS_BUCKET.label_props.name) +DATASTORE_PROJECT_ID_LABEL = ( + common_urns.monitoring_info_labels.DATASTORE_PROJECT.label_props.name) +DATASTORE_NAMESPACE_LABEL = ( + common_urns.monitoring_info_labels.DATASTORE_NAMESPACE.label_props.name) def extract_counter_value(monitoring_info_proto): diff --git a/sdks/python/apache_beam/metrics/monitoring_infos_test.py b/sdks/python/apache_beam/metrics/monitoring_infos_test.py index 2855e230b7f7..d19e8bc10df1 100644 --- a/sdks/python/apache_beam/metrics/monitoring_infos_test.py +++ b/sdks/python/apache_beam/metrics/monitoring_infos_test.py @@ -16,8 +16,6 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest from apache_beam.metrics import monitoring_infos diff --git a/sdks/python/apache_beam/ml/gcp/cloud_dlp.py b/sdks/python/apache_beam/ml/gcp/cloud_dlp.py index 3c4406cb691e..93510c8ebb26 100644 --- a/sdks/python/apache_beam/ml/gcp/cloud_dlp.py +++ b/sdks/python/apache_beam/ml/gcp/cloud_dlp.py @@ -19,8 +19,6 @@ functionality. 
""" -from __future__ import absolute_import - import logging from google.cloud import dlp_v2 diff --git a/sdks/python/apache_beam/ml/gcp/cloud_dlp_it_test.py b/sdks/python/apache_beam/ml/gcp/cloud_dlp_it_test.py index fbba610ac70d..a699aaa36be5 100644 --- a/sdks/python/apache_beam/ml/gcp/cloud_dlp_it_test.py +++ b/sdks/python/apache_beam/ml/gcp/cloud_dlp_it_test.py @@ -17,12 +17,10 @@ """Integration tests for Google Cloud Video Intelligence API transforms.""" -from __future__ import absolute_import - import logging import unittest -from nose.plugins.attrib import attr +import pytest import apache_beam as beam from apache_beam.testing.test_pipeline import TestPipeline @@ -68,7 +66,7 @@ def setUp(self): self.runner_name = type(self.test_pipeline.runner).__name__ self.project = self.test_pipeline.get_option('project') - @attr("IT") + @pytest.mark.it_postcommit def test_deidentification(self): with TestPipeline(is_integration_test=True) as p: output = ( @@ -79,7 +77,7 @@ def test_deidentification(self): inspection_config=INSPECT_CONFIG)) assert_that(output, equal_to(['####################'])) - @attr("IT") + @pytest.mark.it_postcommit def test_inspection(self): with TestPipeline(is_integration_test=True) as p: output = ( diff --git a/sdks/python/apache_beam/ml/gcp/cloud_dlp_test.py b/sdks/python/apache_beam/ml/gcp/cloud_dlp_test.py index ef25555c2a58..111e5be6631a 100644 --- a/sdks/python/apache_beam/ml/gcp/cloud_dlp_test.py +++ b/sdks/python/apache_beam/ml/gcp/cloud_dlp_test.py @@ -17,8 +17,6 @@ """Unit tests for Google Cloud Video Intelligence API transforms.""" -from __future__ import absolute_import - import logging import unittest diff --git a/sdks/python/apache_beam/ml/gcp/naturallanguageml.py b/sdks/python/apache_beam/ml/gcp/naturallanguageml.py index 5263a607d0a1..7817eb9c4c23 100644 --- a/sdks/python/apache_beam/ml/gcp/naturallanguageml.py +++ b/sdks/python/apache_beam/ml/gcp/naturallanguageml.py @@ -15,8 +15,6 @@ # limitations under the License. 
# -from __future__ import absolute_import - from typing import Mapping from typing import Optional from typing import Sequence diff --git a/sdks/python/apache_beam/ml/gcp/naturallanguageml_test.py b/sdks/python/apache_beam/ml/gcp/naturallanguageml_test.py index ef7235936bd1..e6395176091f 100644 --- a/sdks/python/apache_beam/ml/gcp/naturallanguageml_test.py +++ b/sdks/python/apache_beam/ml/gcp/naturallanguageml_test.py @@ -18,8 +18,6 @@ """Unit tests for Google Cloud Natural Language API transform.""" -from __future__ import absolute_import - import unittest import mock diff --git a/sdks/python/apache_beam/ml/gcp/naturallanguageml_test_it.py b/sdks/python/apache_beam/ml/gcp/naturallanguageml_test_it.py index 4cf58e4152ec..9adf56a90102 100644 --- a/sdks/python/apache_beam/ml/gcp/naturallanguageml_test_it.py +++ b/sdks/python/apache_beam/ml/gcp/naturallanguageml_test_it.py @@ -16,11 +16,9 @@ # # pytype: skip-file -from __future__ import absolute_import - import unittest -from nose.plugins.attrib import attr +import pytest import apache_beam as beam from apache_beam.testing.test_pipeline import TestPipeline @@ -48,7 +46,7 @@ def extract(response): ]) -@attr('IT') +@pytest.mark.it_postcommit @unittest.skipIf(AnnotateText is None, 'GCP dependencies are not installed') class NaturalLanguageMlTestIT(unittest.TestCase): def test_analyzing_syntax(self): diff --git a/sdks/python/apache_beam/ml/gcp/recommendations_ai.py b/sdks/python/apache_beam/ml/gcp/recommendations_ai.py new file mode 100644 index 000000000000..b6eb4cfb4bcd --- /dev/null +++ b/sdks/python/apache_beam/ml/gcp/recommendations_ai.py @@ -0,0 +1,585 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""A connector for sending API requests to the GCP Recommendations AI +API (https://cloud.google.com/recommendations). 
+""" + +from __future__ import absolute_import + +from typing import Sequence +from typing import Tuple + +from google.api_core.retry import Retry + +from apache_beam import pvalue +from apache_beam.metrics import Metrics +from apache_beam.options.pipeline_options import GoogleCloudOptions +from apache_beam.transforms import DoFn +from apache_beam.transforms import ParDo +from apache_beam.transforms import PTransform +from apache_beam.transforms.util import GroupIntoBatches +from cachetools.func import ttl_cache + +# pylint: disable=wrong-import-order, wrong-import-position, ungrouped-imports +try: + from google.cloud import recommendationengine +except ImportError: + raise ImportError( + 'Google Cloud Recommendation AI not supported for this execution ' + 'environment (could not import google.cloud.recommendationengine).') +# pylint: enable=wrong-import-order, wrong-import-position, ungrouped-imports + +__all__ = [ + 'CreateCatalogItem', + 'WriteUserEvent', + 'ImportCatalogItems', + 'ImportUserEvents', + 'PredictUserEvent' +] + +FAILED_CATALOG_ITEMS = "failed_catalog_items" + + +@ttl_cache(maxsize=128, ttl=3600) +def get_recommendation_prediction_client(): + """Returns a Recommendation AI - Prediction Service client.""" + _client = recommendationengine.PredictionServiceClient() + return _client + + +@ttl_cache(maxsize=128, ttl=3600) +def get_recommendation_catalog_client(): + """Returns a Recommendation AI - Catalog Service client.""" + _client = recommendationengine.CatalogServiceClient() + return _client + + +@ttl_cache(maxsize=128, ttl=3600) +def get_recommendation_user_event_client(): + """Returns a Recommendation AI - UserEvent Service client.""" + _client = recommendationengine.UserEventServiceClient() + return _client + + +class CreateCatalogItem(PTransform): + """Creates catalogitem information. + The ``PTranform`` returns a PCollectionTuple with a PCollections of + successfully and failed created CatalogItems. + + Example usage:: + + pipeline | CreateCatalogItem( + project='example-gcp-project', + catalog_name='my-catalog') + """ + def __init__( + self, + project: str = None, + retry: Retry = None, + timeout: float = 120, + metadata: Sequence[Tuple[str, str]] = (), + catalog_name: str = "default_catalog"): + """Initializes a :class:`CreateCatalogItem` transform. + + Args: + project (str): Optional. GCP project name in which the catalog + data will be imported. + retry: Optional. Designation of what + errors, if any, should be retried. + timeout (float): Optional. The amount of time, in seconds, to wait + for the request to complete. + metadata: Optional. Strings which + should be sent along with the request as metadata. + catalog_name (str): Optional. Name of the catalog. 
+ Default: 'default_catalog' + """ + self.project = project + self.retry = retry + self.timeout = timeout + self.metadata = metadata + self.catalog_name = catalog_name + + def expand(self, pcoll): + if self.project is None: + self.project = pcoll.pipeline.options.view_as(GoogleCloudOptions).project + if self.project is None: + raise ValueError( + """GCP project name needs to be specified in "project" pipeline + option""") + return pcoll | ParDo( + _CreateCatalogItemFn( + self.project, + self.retry, + self.timeout, + self.metadata, + self.catalog_name)) + + +class _CreateCatalogItemFn(DoFn): + def __init__( + self, + project: str = None, + retry: Retry = None, + timeout: float = 120, + metadata: Sequence[Tuple[str, str]] = (), + catalog_name: str = None): + self._client = None + self.retry = retry + self.timeout = timeout + self.metadata = metadata + self.parent = f"projects/{project}/locations/global/catalogs/{catalog_name}" + self.counter = Metrics.counter(self.__class__, "api_calls") + + def setup(self): + if self._client is None: + self._client = get_recommendation_catalog_client() + + def process(self, element): + catalog_item = recommendationengine.CatalogItem(element) + request = recommendationengine.CreateCatalogItemRequest( + parent=self.parent, catalog_item=catalog_item) + + try: + created_catalog_item = self._client.create_catalog_item( + request=request, + retry=self.retry, + timeout=self.timeout, + metadata=self.metadata) + + self.counter.inc() + yield recommendationengine.CatalogItem.to_dict(created_catalog_item) + except Exception: + yield pvalue.TaggedOutput( + FAILED_CATALOG_ITEMS, + recommendationengine.CatalogItem.to_dict(catalog_item)) + + +class ImportCatalogItems(PTransform): + """Imports catalogitems in bulk. + The `PTransform` returns a PCollectionTuple with PCollections of + successfully and failed imported CatalogItems. + + Example usage:: + + pipeline + | ImportCatalogItems( + project='example-gcp-project', + catalog_name='my-catalog') + """ + def __init__( + self, + max_batch_size: int = 5000, + project: str = None, + retry: Retry = None, + timeout: float = 120, + metadata: Sequence[Tuple[str, str]] = (), + catalog_name: str = "default_catalog"): + """Initializes a :class:`ImportCatalogItems` transform + + Args: + batch_size (int): Required. Maximum number of catalogitems per + request. + project (str): Optional. GCP project name in which the catalog + data will be imported. + retry: Optional. Designation of what + errors, if any, should be retried. + timeout (float): Optional. The amount of time, in seconds, to wait + for the request to complete. + metadata: Optional. Strings which + should be sent along with the request as metadata. + catalog_name (str): Optional. Name of the catalog. 
+ Default: 'default_catalog' + """ + self.max_batch_size = max_batch_size + self.project = project + self.retry = retry + self.timeout = timeout + self.metadata = metadata + self.catalog_name = catalog_name + + def expand(self, pcoll): + if self.project is None: + self.project = pcoll.pipeline.options.view_as(GoogleCloudOptions).project + if self.project is None: + raise ValueError( + 'GCP project name needs to be specified in "project" pipeline option') + return ( + pcoll | GroupIntoBatches.WithShardedKey(self.max_batch_size) | ParDo( + _ImportCatalogItemsFn( + self.project, + self.retry, + self.timeout, + self.metadata, + self.catalog_name))) + + +class _ImportCatalogItemsFn(DoFn): + def __init__( + self, + project=None, + retry=None, + timeout=120, + metadata=None, + catalog_name=None): + self._client = None + self.retry = retry + self.timeout = timeout + self.metadata = metadata + self.parent = f"projects/{project}/locations/global/catalogs/{catalog_name}" + self.counter = Metrics.counter(self.__class__, "api_calls") + + def setup(self): + if self._client is None: + self.client = get_recommendation_catalog_client() + + def process(self, element): + catalog_items = [recommendationengine.CatalogItem(e) for e in element[1]] + catalog_inline_source = recommendationengine.CatalogInlineSource( + {"catalog_items": catalog_items}) + input_config = recommendationengine.InputConfig( + catalog_inline_source=catalog_inline_source) + + request = recommendationengine.ImportCatalogItemsRequest( + parent=self.parent, input_config=input_config) + + try: + operation = self._client.import_catalog_items( + request=request, + retry=self.retry, + timeout=self.timeout, + metadata=self.metadata) + self.counter.inc(len(catalog_items)) + yield operation.result() + except Exception: + yield pvalue.TaggedOutput(FAILED_CATALOG_ITEMS, catalog_items) + + +class WriteUserEvent(PTransform): + """Write user event information. + The `PTransform` returns a PCollectionTuple with PCollections of + successfully and failed written UserEvents. + + Example usage:: + + pipeline + | WriteUserEvent( + project='example-gcp-project', + catalog_name='my-catalog', + event_store='my_event_store') + """ + def __init__( + self, + project: str = None, + retry: Retry = None, + timeout: float = 120, + metadata: Sequence[Tuple[str, str]] = (), + catalog_name: str = "default_catalog", + event_store: str = "default_event_store"): + """Initializes a :class:`WriteUserEvent` transform. + + Args: + project (str): Optional. GCP project name in which the catalog + data will be imported. + retry: Optional. Designation of what + errors, if any, should be retried. + timeout (float): Optional. The amount of time, in seconds, to wait + for the request to complete. + metadata: Optional. Strings which + should be sent along with the request as metadata. + catalog_name (str): Optional. Name of the catalog. + Default: 'default_catalog' + event_store (str): Optional. Name of the event store. 
+ Default: 'default_event_store' + """ + self.project = project + self.retry = retry + self.timeout = timeout + self.metadata = metadata + self.catalog_name = catalog_name + self.event_store = event_store + + def expand(self, pcoll): + if self.project is None: + self.project = pcoll.pipeline.options.view_as(GoogleCloudOptions).project + if self.project is None: + raise ValueError( + 'GCP project name needs to be specified in "project" pipeline option') + return pcoll | ParDo( + _WriteUserEventFn( + self.project, + self.retry, + self.timeout, + self.metadata, + self.catalog_name, + self.event_store)) + + +class _WriteUserEventFn(DoFn): + FAILED_USER_EVENTS = "failed_user_events" + + def __init__( + self, + project=None, + retry=None, + timeout=120, + metadata=None, + catalog_name=None, + event_store=None): + self._client = None + self.retry = retry + self.timeout = timeout + self.metadata = metadata + self.parent = f"projects/{project}/locations/global/catalogs/"\ + f"{catalog_name}/eventStores/{event_store}" + self.counter = Metrics.counter(self.__class__, "api_calls") + + def setup(self): + if self._client is None: + self._client = get_recommendation_user_event_client() + + def process(self, element): + user_event = recommendationengine.UserEvent(element) + request = recommendationengine.WriteUserEventRequest( + parent=self.parent, user_event=user_event) + + try: + created_user_event = self._client.write_user_event(request) + self.counter.inc() + yield recommendationengine.UserEvent.to_dict(created_user_event) + except Exception: + yield pvalue.TaggedOutput( + self.FAILED_USER_EVENTS, + recommendationengine.UserEvent.to_dict(user_event)) + + +class ImportUserEvents(PTransform): + """Imports userevents in bulk. + The `PTransform` returns a PCollectionTuple with PCollections of + successfully and failed imported UserEvents. + + Example usage:: + + pipeline + | ImportUserEvents( + project='example-gcp-project', + catalog_name='my-catalog', + event_store='my_event_store') + """ + def __init__( + self, + max_batch_size: int = 5000, + project: str = None, + retry: Retry = None, + timeout: float = 120, + metadata: Sequence[Tuple[str, str]] = (), + catalog_name: str = "default_catalog", + event_store: str = "default_event_store"): + """Initializes a :class:`WriteUserEvent` transform. + + Args: + batch_size (int): Required. Maximum number of catalogitems + per request. + project (str): Optional. GCP project name in which the catalog + data will be imported. + retry: Optional. Designation of what + errors, if any, should be retried. + timeout (float): Optional. The amount of time, in seconds, to wait + for the request to complete. + metadata: Optional. Strings which + should be sent along with the request as metadata. + catalog_name (str): Optional. Name of the catalog. + Default: 'default_catalog' + event_store (str): Optional. Name of the event store. 
+ Default: 'default_event_store' + """ + self.max_batch_size = max_batch_size + self.project = project + self.retry = retry + self.timeout = timeout + self.metadata = metadata + self.catalog_name = catalog_name + self.event_store = event_store + + def expand(self, pcoll): + if self.project is None: + self.project = pcoll.pipeline.options.view_as(GoogleCloudOptions).project + if self.project is None: + raise ValueError( + 'GCP project name needs to be specified in "project" pipeline option') + return ( + pcoll | GroupIntoBatches.WithShardedKey(self.max_batch_size) | ParDo( + _ImportUserEventsFn( + self.project, + self.retry, + self.timeout, + self.metadata, + self.catalog_name, + self.event_store))) + + +class _ImportUserEventsFn(DoFn): + FAILED_USER_EVENTS = "failed_user_events" + + def __init__( + self, + project=None, + retry=None, + timeout=120, + metadata=None, + catalog_name=None, + event_store=None): + self._client = None + self.retry = retry + self.timeout = timeout + self.metadata = metadata + self.parent = f"projects/{project}/locations/global/catalogs/"\ + f"{catalog_name}/eventStores/{event_store}" + self.counter = Metrics.counter(self.__class__, "api_calls") + + def setup(self): + if self._client is None: + self.client = get_recommendation_user_event_client() + + def process(self, element): + + user_events = [recommendationengine.UserEvent(e) for e in element[1]] + user_event_inline_source = recommendationengine.UserEventInlineSource( + {"user_events": user_events}) + input_config = recommendationengine.InputConfig( + user_event_inline_source=user_event_inline_source) + + request = recommendationengine.ImportUserEventsRequest( + parent=self.parent, input_config=input_config) + + try: + operation = self._client.write_user_event(request) + self.counter.inc(len(user_events)) + yield recommendationengine.PredictResponse.to_dict(operation.result()) + except Exception: + yield pvalue.TaggedOutput(self.FAILED_USER_EVENTS, user_events) + + +class PredictUserEvent(PTransform): + """Make a recommendation prediction. + The `PTransform` returns a PCollection + + Example usage:: + + pipeline + | PredictUserEvent( + project='example-gcp-project', + catalog_name='my-catalog', + event_store='my_event_store', + placement_id='recently_viewed_default') + """ + def __init__( + self, + project: str = None, + retry: Retry = None, + timeout: float = 120, + metadata: Sequence[Tuple[str, str]] = (), + catalog_name: str = "default_catalog", + event_store: str = "default_event_store", + placement_id: str = None): + """Initializes a :class:`PredictUserEvent` transform. + + Args: + project (str): Optional. GCP project name in which the catalog + data will be imported. + retry: Optional. Designation of what + errors, if any, should be retried. + timeout (float): Optional. The amount of time, in seconds, to wait + for the request to complete. + metadata: Optional. Strings which + should be sent along with the request as metadata. + catalog_name (str): Optional. Name of the catalog. + Default: 'default_catalog' + event_store (str): Optional. Name of the event store. + Default: 'default_event_store' + placement_id (str): Required. ID of the recommendation engine + placement. This id is used to identify the set of models that + will be used to make the prediction. 
+ """ + self.project = project + self.retry = retry + self.timeout = timeout + self.metadata = metadata + self.placement_id = placement_id + self.catalog_name = catalog_name + self.event_store = event_store + if placement_id is None: + raise ValueError('placement_id must be specified') + else: + self.placement_id = placement_id + + def expand(self, pcoll): + if self.project is None: + self.project = pcoll.pipeline.options.view_as(GoogleCloudOptions).project + if self.project is None: + raise ValueError( + 'GCP project name needs to be specified in "project" pipeline option') + return pcoll | ParDo( + _PredictUserEventFn( + self.project, + self.retry, + self.timeout, + self.metadata, + self.catalog_name, + self.event_store, + self.placement_id)) + + +class _PredictUserEventFn(DoFn): + FAILED_PREDICTIONS = "failed_predictions" + + def __init__( + self, + project=None, + retry=None, + timeout=120, + metadata=None, + catalog_name=None, + event_store=None, + placement_id=None): + self._client = None + self.retry = retry + self.timeout = timeout + self.metadata = metadata + self.name = f"projects/{project}/locations/global/catalogs/"\ + f"{catalog_name}/eventStores/{event_store}/placements/"\ + f"{placement_id}" + self.counter = Metrics.counter(self.__class__, "api_calls") + + def setup(self): + if self._client is None: + self._client = get_recommendation_prediction_client() + + def process(self, element): + user_event = recommendationengine.UserEvent(element) + request = recommendationengine.PredictRequest( + name=self.name, user_event=user_event) + + try: + prediction = self._client.predict(request) + self.counter.inc() + yield [ + recommendationengine.PredictResponse.to_dict(p) + for p in prediction.pages + ] + except Exception: + yield pvalue.TaggedOutput(self.FAILED_PREDICTIONS, user_event) diff --git a/sdks/python/apache_beam/ml/gcp/recommendations_ai_test.py b/sdks/python/apache_beam/ml/gcp/recommendations_ai_test.py new file mode 100644 index 000000000000..2f688d97a309 --- /dev/null +++ b/sdks/python/apache_beam/ml/gcp/recommendations_ai_test.py @@ -0,0 +1,207 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +"""Unit tests for Recommendations AI transforms.""" + +from __future__ import absolute_import + +import unittest + +import mock + +import apache_beam as beam +from apache_beam.metrics import MetricsFilter + +# pylint: disable=wrong-import-order, wrong-import-position, ungrouped-imports +try: + from google.cloud import recommendationengine + from apache_beam.ml.gcp import recommendations_ai +except ImportError: + recommendationengine = None +# pylint: enable=wrong-import-order, wrong-import-position, ungrouped-imports + + +@unittest.skipIf( + recommendationengine is None, + "Recommendations AI dependencies not installed.") +class RecommendationsAICatalogItemTest(unittest.TestCase): + def setUp(self): + self._mock_client = mock.Mock() + self._mock_client.create_catalog_item.return_value = ( + recommendationengine.CatalogItem()) + self.m2 = mock.Mock() + self.m2.result.return_value = None + self._mock_client.import_catalog_items.return_value = self.m2 + + self._catalog_item = { + "id": "12345", + "title": "Sample laptop", + "description": "Indisputably the most fantastic laptop ever created.", + "language_code": "en", + "category_hierarchies": [{ + "categories": ["Electronic", "Computers"] + }] + } + + def test_CreateCatalogItem(self): + expected_counter = 1 + with mock.patch.object(recommendations_ai, + 'get_recommendation_catalog_client', + return_value=self._mock_client): + p = beam.Pipeline() + + _ = ( + p | "Create data" >> beam.Create([self._catalog_item]) + | "Create CatalogItem" >> + recommendations_ai.CreateCatalogItem(project="test")) + + result = p.run() + result.wait_until_finish() + + read_filter = MetricsFilter().with_name('api_calls') + query_result = result.metrics().query(read_filter) + if query_result['counters']: + read_counter = query_result['counters'][0] + self.assertTrue(read_counter.result == expected_counter) + + def test_ImportCatalogItems(self): + expected_counter = 1 + with mock.patch.object(recommendations_ai, + 'get_recommendation_catalog_client', + return_value=self._mock_client): + p = beam.Pipeline() + + _ = ( + p | "Create data" >> beam.Create([ + (self._catalog_item["id"], self._catalog_item), + (self._catalog_item["id"], self._catalog_item) + ]) | "Create CatalogItems" >> + recommendations_ai.ImportCatalogItems(project="test")) + + result = p.run() + result.wait_until_finish() + + read_filter = MetricsFilter().with_name('api_calls') + query_result = result.metrics().query(read_filter) + if query_result['counters']: + read_counter = query_result['counters'][0] + self.assertTrue(read_counter.result == expected_counter) + + +@unittest.skipIf( + recommendationengine is None, + "Recommendations AI dependencies not installed.") +class RecommendationsAIUserEventTest(unittest.TestCase): + def setUp(self): + self._mock_client = mock.Mock() + self._mock_client.write_user_event.return_value = ( + recommendationengine.UserEvent()) + self.m2 = mock.Mock() + self.m2.result.return_value = None + self._mock_client.import_user_events.return_value = self.m2 + + self._user_event = { + "event_type": "page-visit", "user_info": { + "visitor_id": "1" + } + } + + def test_CreateUserEvent(self): + expected_counter = 1 + with mock.patch.object(recommendations_ai, + 'get_recommendation_user_event_client', + return_value=self._mock_client): + p = beam.Pipeline() + + _ = ( + p | "Create data" >> beam.Create([self._user_event]) + | "Create UserEvent" >> + recommendations_ai.WriteUserEvent(project="test")) + + result = p.run() + result.wait_until_finish() + + read_filter = 
MetricsFilter().with_name('api_calls') + query_result = result.metrics().query(read_filter) + if query_result['counters']: + read_counter = query_result['counters'][0] + self.assertTrue(read_counter.result == expected_counter) + + def test_ImportUserEvents(self): + expected_counter = 1 + with mock.patch.object(recommendations_ai, + 'get_recommendation_user_event_client', + return_value=self._mock_client): + p = beam.Pipeline() + + _ = ( + p | "Create data" >> beam.Create([ + (self._user_event["user_info"]["visitor_id"], self._user_event), + (self._user_event["user_info"]["visitor_id"], self._user_event) + ]) | "Create UserEvents" >> + recommendations_ai.ImportUserEvents(project="test")) + + result = p.run() + result.wait_until_finish() + + read_filter = MetricsFilter().with_name('api_calls') + query_result = result.metrics().query(read_filter) + if query_result['counters']: + read_counter = query_result['counters'][0] + self.assertTrue(read_counter.result == expected_counter) + + +@unittest.skipIf( + recommendationengine is None, + "Recommendations AI dependencies not installed.") +class RecommendationsAIPredictTest(unittest.TestCase): + def setUp(self): + self._mock_client = mock.Mock() + self._mock_client.predict.return_value = [ + recommendationengine.PredictResponse() + ] + + self._user_event = { + "event_type": "page-visit", "user_info": { + "visitor_id": "1" + } + } + + def test_Predict(self): + expected_counter = 1 + with mock.patch.object(recommendations_ai, + 'get_recommendation_prediction_client', + return_value=self._mock_client): + p = beam.Pipeline() + + _ = ( + p | "Create data" >> beam.Create([self._user_event]) + | "Prediction UserEvents" >> recommendations_ai.PredictUserEvent( + project="test", placement_id="recently_viewed_default")) + + result = p.run() + result.wait_until_finish() + + read_filter = MetricsFilter().with_name('api_calls') + query_result = result.metrics().query(read_filter) + if query_result['counters']: + read_counter = query_result['counters'][0] + self.assertTrue(read_counter.result == expected_counter) + + +if __name__ == '__main__': + unittest.main() diff --git a/sdks/python/apache_beam/ml/gcp/recommendations_ai_test_it.py b/sdks/python/apache_beam/ml/gcp/recommendations_ai_test_it.py new file mode 100644 index 000000000000..b7175ec08e33 --- /dev/null +++ b/sdks/python/apache_beam/ml/gcp/recommendations_ai_test_it.py @@ -0,0 +1,109 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +"""Integration tests for Recommendations AI transforms.""" + +from __future__ import absolute_import + +import random +import unittest + +import pytest + +import apache_beam as beam +from apache_beam.testing.test_pipeline import TestPipeline +from apache_beam.testing.util import assert_that +from apache_beam.testing.util import equal_to +from apache_beam.testing.util import is_not_empty + +# pylint: disable=wrong-import-order, wrong-import-position, ungrouped-imports +try: + from google.cloud import recommendationengine + from apache_beam.ml.gcp import recommendations_ai +except ImportError: + recommendationengine = None +# pylint: enable=wrong-import-order, wrong-import-position, ungrouped-imports + +GCP_TEST_PROJECT = 'apache-beam-testing' + + +def extract_id(response): + yield response["id"] + + +def extract_event_type(response): + yield response["event_type"] + + +def extract_prediction(response): + yield response[0]["results"] + + +@pytest.mark.it_postcommit +@unittest.skipIf( + recommendationengine is None, + "Recommendations AI dependencies not installed.") +@unittest.skip('https://issues.apache.org/jira/browse/BEAM-12683') +class RecommendationAIIT(unittest.TestCase): + def test_create_catalog_item(self): + + CATALOG_ITEM = { + "id": str(int(random.randrange(100000))), + "title": "Sample laptop", + "description": "Indisputably the most fantastic laptop ever created.", + "language_code": "en", + "category_hierarchies": [{ + "categories": ["Electronic", "Computers"] + }] + } + + with TestPipeline(is_integration_test=True) as p: + output = ( + p | 'Create data' >> beam.Create([CATALOG_ITEM]) + | 'Create CatalogItem' >> + recommendations_ai.CreateCatalogItem(project=GCP_TEST_PROJECT) + | beam.ParDo(extract_id) | beam.combiners.ToList()) + + assert_that(output, equal_to([[CATALOG_ITEM["id"]]])) + + def test_create_user_event(self): + USER_EVENT = {"event_type": "page-visit", "user_info": {"visitor_id": "1"}} + + with TestPipeline(is_integration_test=True) as p: + output = ( + p | 'Create data' >> beam.Create([USER_EVENT]) | 'Create UserEvent' >> + recommendations_ai.WriteUserEvent(project=GCP_TEST_PROJECT) + | beam.ParDo(extract_event_type) | beam.combiners.ToList()) + + assert_that(output, equal_to([[USER_EVENT["event_type"]]])) + + def test_predict(self): + USER_EVENT = {"event_type": "page-visit", "user_info": {"visitor_id": "1"}} + + with TestPipeline(is_integration_test=True) as p: + output = ( + p | 'Create data' >> beam.Create([USER_EVENT]) + | 'Predict UserEvent' >> recommendations_ai.PredictUserEvent( + project=GCP_TEST_PROJECT, placement_id="recently_viewed_default") + | beam.ParDo(extract_prediction)) + + assert_that(output, is_not_empty()) + + +if __name__ == '__main__': + print(recommendationengine.CatalogItem.__module__) + unittest.main() diff --git a/sdks/python/apache_beam/ml/gcp/videointelligenceml.py b/sdks/python/apache_beam/ml/gcp/videointelligenceml.py index 67ff4969f624..bc0aa0845923 100644 --- a/sdks/python/apache_beam/ml/gcp/videointelligenceml.py +++ b/sdks/python/apache_beam/ml/gcp/videointelligenceml.py @@ -17,15 +17,10 @@ """A connector for sending API requests to the GCP Video Intelligence API.""" -from __future__ import absolute_import - from typing import Optional from typing import Tuple from typing import Union -from future.utils import binary_type -from future.utils import text_type - from apache_beam import typehints from apache_beam.metrics import Metrics from apache_beam.transforms import DoFn @@ -55,8 +50,8 @@ class AnnotateVideo(PTransform): 
ref: https://cloud.google.com/video-intelligence/docs Sends each element to the GCP Video Intelligence API. Element is a - Union[text_type, binary_type] of either an URI (e.g. a GCS URI) or - binary_type base64-encoded video data. + Union[str, bytes] of either an URI (e.g. a GCS URI) or + bytes base64-encoded video data. Accepts an `AsDict` side input that maps each video to a video context. """ def __init__( @@ -118,8 +113,7 @@ def expand(self, pvalue): @typehints.with_input_types( - Union[text_type, binary_type], - Optional[videointelligence.types.VideoContext]) + Union[str, bytes], Optional[videointelligence.types.VideoContext]) class _VideoAnnotateFn(DoFn): """A DoFn that sends each input element to the GCP Video Intelligence API service and outputs an element with the return result of the API @@ -138,7 +132,7 @@ def start_bundle(self): self._client = get_videointelligence_client() def _annotate_video(self, element, video_context): - if isinstance(element, text_type): # Is element an URI to a GCS bucket + if isinstance(element, str): # Is element an URI to a GCS bucket response = self._client.annotate_video( input_uri=element, features=self.features, @@ -171,11 +165,11 @@ class AnnotateVideoWithContext(AnnotateVideo): Sends each element to the GCP Video Intelligence API. Element is a tuple of - (Union[text_type, binary_type], + (Union[str, bytes], Optional[videointelligence.types.VideoContext]) where the former is either an URI (e.g. a GCS URI) or - binary_type base64-encoded video data + bytes base64-encoded video data """ def __init__(self, features, location_id=None, metadata=None, timeout=120): """ @@ -208,8 +202,7 @@ def expand(self, pvalue): @typehints.with_input_types( - Tuple[Union[text_type, binary_type], - Optional[videointelligence.types.VideoContext]]) + Tuple[Union[str, bytes], Optional[videointelligence.types.VideoContext]]) class _VideoAnnotateFnWithContext(_VideoAnnotateFn): """A DoFn that unpacks each input tuple to element, video_context variables and sends these to the GCP Video Intelligence API service and outputs diff --git a/sdks/python/apache_beam/ml/gcp/videointelligenceml_test.py b/sdks/python/apache_beam/ml/gcp/videointelligenceml_test.py index b0bb441c5bba..3215cebb4a88 100644 --- a/sdks/python/apache_beam/ml/gcp/videointelligenceml_test.py +++ b/sdks/python/apache_beam/ml/gcp/videointelligenceml_test.py @@ -19,9 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import unicode_literals - import logging import unittest diff --git a/sdks/python/apache_beam/ml/gcp/videointelligenceml_test_it.py b/sdks/python/apache_beam/ml/gcp/videointelligenceml_test_it.py index a2d82162bf37..03f79d171597 100644 --- a/sdks/python/apache_beam/ml/gcp/videointelligenceml_test_it.py +++ b/sdks/python/apache_beam/ml/gcp/videointelligenceml_test_it.py @@ -19,13 +19,10 @@ """An integration test that labels entities appearing in a video and checks if some expected entities were properly recognized.""" -from __future__ import absolute_import -from __future__ import unicode_literals - import unittest import hamcrest as hc -from nose.plugins.attrib import attr +import pytest import apache_beam as beam from apache_beam.testing.test_pipeline import TestPipeline @@ -48,7 +45,7 @@ def extract_entities_descriptions(response): yield segment.entity.description -@attr('IT') +@pytest.mark.it_postcommit @unittest.skipIf( AnnotateVideoWithContext is None, 'GCP dependencies are not installed') class VideoIntelligenceMlTestIT(unittest.TestCase): diff --git 
a/sdks/python/apache_beam/ml/gcp/visionml.py b/sdks/python/apache_beam/ml/gcp/visionml.py index 13884a7bf0f0..0fb45ce72fdc 100644 --- a/sdks/python/apache_beam/ml/gcp/visionml.py +++ b/sdks/python/apache_beam/ml/gcp/visionml.py @@ -20,16 +20,11 @@ A connector for sending API requests to the GCP Vision API. """ -from __future__ import absolute_import - from typing import List from typing import Optional from typing import Tuple from typing import Union -from future.utils import binary_type -from future.utils import text_type - from apache_beam import typehints from apache_beam.metrics import Metrics from apache_beam.transforms import DoFn @@ -65,8 +60,8 @@ class AnnotateImage(PTransform): Batches elements together using ``util.BatchElements`` PTransform and sends each batch of elements to the GCP Vision API. - Element is a Union[text_type, binary_type] of either an URI (e.g. a GCS URI) - or binary_type base64-encoded image data. + Element is a Union[str, bytes] of either an URI (e.g. a GCS URI) + or bytes base64-encoded image data. Accepts an `AsDict` side input that maps each image to an image context. """ @@ -158,7 +153,7 @@ def expand(self, pvalue): metadata=self.metadata))) @typehints.with_input_types( - Union[text_type, binary_type], Optional[vision.types.ImageContext]) + Union[str, bytes], Optional[vision.types.ImageContext]) @typehints.with_output_types(List[vision.types.AnnotateImageRequest]) def _create_image_annotation_pairs(self, element, context_side_input): if context_side_input: # If we have a side input image context, use that @@ -166,10 +161,10 @@ def _create_image_annotation_pairs(self, element, context_side_input): else: image_context = None - if isinstance(element, text_type): + if isinstance(element, str): image = vision.types.Image( source=vision.types.ImageSource(image_uri=element)) - else: # Typehint checks only allows text_type or binary_type + else: # Typehint checks only allows str or bytes image = vision.types.Image(content=element) request = vision.types.AnnotateImageRequest( @@ -185,10 +180,10 @@ class AnnotateImageWithContext(AnnotateImage): Element is a tuple of:: - (Union[text_type, binary_type], + (Union[str, bytes], Optional[``vision.types.ImageContext``]) - where the former is either an URI (e.g. a GCS URI) or binary_type + where the former is either an URI (e.g. a GCS URI) or bytes base64-encoded image data. 
""" def __init__( @@ -249,14 +244,14 @@ def expand(self, pvalue): metadata=self.metadata))) @typehints.with_input_types( - Tuple[Union[text_type, binary_type], Optional[vision.types.ImageContext]]) + Tuple[Union[str, bytes], Optional[vision.types.ImageContext]]) @typehints.with_output_types(List[vision.types.AnnotateImageRequest]) def _create_image_annotation_pairs(self, element, **kwargs): element, image_context = element # Unpack (image, image_context) tuple - if isinstance(element, text_type): + if isinstance(element, str): image = vision.types.Image( source=vision.types.ImageSource(image_uri=element)) - else: # Typehint checks only allows text_type or binary_type + else: # Typehint checks only allows str or bytes image = vision.types.Image(content=element) request = vision.types.AnnotateImageRequest( diff --git a/sdks/python/apache_beam/ml/gcp/visionml_test.py b/sdks/python/apache_beam/ml/gcp/visionml_test.py index d4c6c203feaa..f038442468f8 100644 --- a/sdks/python/apache_beam/ml/gcp/visionml_test.py +++ b/sdks/python/apache_beam/ml/gcp/visionml_test.py @@ -20,9 +20,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import unicode_literals - import logging import unittest diff --git a/sdks/python/apache_beam/ml/gcp/visionml_test_it.py b/sdks/python/apache_beam/ml/gcp/visionml_test_it.py index a81acd015416..4413266dcc5c 100644 --- a/sdks/python/apache_beam/ml/gcp/visionml_test_it.py +++ b/sdks/python/apache_beam/ml/gcp/visionml_test_it.py @@ -16,11 +16,9 @@ # # pytype: skip-file -from __future__ import absolute_import - import unittest -from nose.plugins.attrib import attr +import pytest import apache_beam as beam from apache_beam.testing.test_pipeline import TestPipeline @@ -42,7 +40,7 @@ def extract(response): yield text_annotation.description -@attr('IT') +@pytest.mark.it_postcommit @unittest.skipIf(vision is None, 'GCP dependencies are not installed') class VisionMlTestIT(unittest.TestCase): def test_text_detection_with_language_hint(self): diff --git a/sdks/python/apache_beam/options/__init__.py b/sdks/python/apache_beam/options/__init__.py index f4f43cbb1236..cce3acad34a4 100644 --- a/sdks/python/apache_beam/options/__init__.py +++ b/sdks/python/apache_beam/options/__init__.py @@ -14,4 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. 
# -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/options/pipeline_options.py b/sdks/python/apache_beam/options/pipeline_options.py index c169586575df..bba56efe8d04 100644 --- a/sdks/python/apache_beam/options/pipeline_options.py +++ b/sdks/python/apache_beam/options/pipeline_options.py @@ -19,13 +19,9 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import json import logging -from builtins import list -from builtins import object from typing import Any from typing import Callable from typing import Dict @@ -414,13 +410,30 @@ class StandardOptions(PipelineOptions): DEFAULT_RUNNER = 'DirectRunner' + ALL_KNOWN_RUNNERS = ( + 'apache_beam.runners.dataflow.dataflow_runner.DataflowRunner', + 'apache_beam.runners.direct.direct_runner.BundleBasedDirectRunner', + 'apache_beam.runners.direct.direct_runner.DirectRunner', + 'apache_beam.runners.direct.direct_runner.SwitchingDirectRunner', + 'apache_beam.runners.interactive.interactive_runner.InteractiveRunner', + 'apache_beam.runners.portability.flink_runner.FlinkRunner', + 'apache_beam.runners.portability.portable_runner.PortableRunner', + 'apache_beam.runners.portability.spark_runner.SparkRunner', + 'apache_beam.runners.test.TestDirectRunner', + 'apache_beam.runners.test.TestDataflowRunner', + ) + + KNOWN_RUNNER_NAMES = [path.split('.')[-1] for path in ALL_KNOWN_RUNNERS] + @classmethod def _add_argparse_args(cls, parser): parser.add_argument( '--runner', help=( 'Pipeline runner used to execute the workflow. Valid values are ' - 'DirectRunner, DataflowRunner.')) + 'one of %s, or the fully qualified name of a PipelineRunner ' + 'subclass. If unspecified, defaults to %s.' % + (', '.join(cls.KNOWN_RUNNER_NAMES), cls.DEFAULT_RUNNER))) # Whether to enable streaming mode. parser.add_argument( '--streaming', @@ -428,6 +441,18 @@ def _add_argparse_args(cls, parser): action='store_true', help='Whether to enable streaming mode.') + parser.add_argument( + '--resource_hint', + '--resource_hints', + dest='resource_hints', + action='append', + default=[], + help=( + 'Resource hint to set in the pipeline execution environment.' + 'Hints specified via this option override hints specified ' + 'at transform level. Interpretation of hints is defined by ' + 'Beam runners.')) + class CrossLanguageOptions(PipelineOptions): @classmethod @@ -499,6 +524,23 @@ def _add_argparse_args(cls, parser): help='Enable faster type checking via sampling at pipeline execution ' 'time. NOTE: only supported with portable runners ' '(including the DirectRunner)') + parser.add_argument( + '--allow_non_deterministic_key_coders', + default=False, + action='store_true', + help='Use non-deterministic coders (such as pickling) for key-grouping ' + 'operations such as GropuByKey. This is unsafe, as runners may group ' + 'keys based on their encoded bytes, but is available for backwards ' + 'compatibility. See BEAM-11719.') + parser.add_argument( + '--allow_unsafe_triggers', + default=False, + action='store_true', + help='Allow the use of unsafe triggers. Unsafe triggers have the ' + 'potential to cause data loss due to finishing and/or never having ' + 'their condition met. Some operations, such as GroupByKey, disallow ' + 'this. This exists for cases where such loss is acceptable and for ' + 'backwards compatibility. 
See BEAM-9487.') def validate(self, unused_validator): errors = [] @@ -647,11 +689,32 @@ def _add_argparse_args(cls, parser): default=None, help='Set a Google Cloud KMS key name to be used in ' 'Dataflow state operations (GBK, Streaming).') + parser.add_argument( + '--create_from_snapshot', + default=None, + help='The snapshot from which the job should be created.') parser.add_argument( '--flexrs_goal', default=None, choices=['COST_OPTIMIZED', 'SPEED_OPTIMIZED'], help='Set the Flexible Resource Scheduling mode') + parser.add_argument( + '--dataflow_service_option', + '--dataflow_service_options', + dest='dataflow_service_options', + action='append', + default=None, + help=( + 'Options to configure the Dataflow service. These ' + 'options decouple service side feature availbility ' + 'from the Apache Beam release cycle.')) + parser.add_argument( + '--enable_hot_key_logging', + default=False, + action='store_true', + help='When true, will enable the direct logging of any detected hot ' + 'keys into Cloud Logging. Warning: this will log the literal key as an ' + 'unobfuscated string.') def _create_default_gcs_bucket(self): try: @@ -817,9 +880,20 @@ def _add_argparse_args(cls, parser): default=None, help=( 'Docker registry location of container image to use for the ' - 'worker harness. Default is the container for the version of the ' - 'SDK. Note: currently, only approved Google Cloud Dataflow ' - 'container images may be used here.')) + 'worker harness. If not set, an appropriate approved Google Cloud ' + 'Dataflow image will be used based on the version of the ' + 'SDK. Note: This flag is deprecated and only supports ' + 'approved Google Cloud Dataflow container images. To provide a ' + 'custom container image, use sdk_container_image instead.')) + parser.add_argument( + '--sdk_container_image', + default=None, + help=( + 'Docker registry location of container image to use for the ' + 'worker harness. If not set, an appropriate approved Google Cloud ' + 'Dataflow image will be used based on the version of the ' + 'SDK. If set for a non-portable pipeline, only official ' + 'Google Cloud Dataflow container images may be used here.')) parser.add_argument( '--sdk_harness_container_image_overrides', action='append', @@ -857,9 +931,10 @@ def _add_argparse_args(cls, parser): def validate(self, validator): errors = [] + errors.extend(validator.validate_sdk_container_image_options(self)) + if validator.is_service_runner(): - errors.extend( - validator.validate_optional_argument_positive(self, 'num_workers')) + errors.extend(validator.validate_num_workers(self)) errors.extend(validator.validate_worker_region_zone(self)) return errors @@ -1019,7 +1094,6 @@ def _add_argparse_args(cls, parser): 'the command line.')) parser.add_argument( '--prebuild_sdk_container_engine', - choices=['local_docker', 'cloud_build'], help=( 'Prebuild sdk worker container image before job submission. If ' 'enabled, SDK invokes the boot sequence in SDK worker ' @@ -1028,7 +1102,9 @@ def _add_argparse_args(cls, parser): 'environment. This may speed up pipeline execution. To enable, ' 'select the Docker build engine: local_docker using ' 'locally-installed Docker or cloud_build for using Google Cloud ' - 'Build (requires a GCP project with Cloud Build API enabled).')) + 'Build (requires a GCP project with Cloud Build API enabled). 
You ' + 'can also subclass SdkContainerImageBuilder and use that to build ' + 'in other environments.')) parser.add_argument( '--prebuild_sdk_container_base_image', default=None, @@ -1164,16 +1240,19 @@ def _add_argparse_args(cls, parser): parser.add_argument( '--job_port', default=0, + type=int, help='Port to use for the job service. 0 to use a ' 'dynamic port.') parser.add_argument( '--artifact_port', default=0, + type=int, help='Port to use for artifact staging. 0 to use a ' 'dynamic port.') parser.add_argument( '--expansion_port', default=0, + type=int, help='Port to use for artifact staging. 0 to use a ' 'dynamic port.') parser.add_argument( @@ -1182,11 +1261,19 @@ def _add_argparse_args(cls, parser): help='The Java Application Launcher executable file to use for ' 'starting a Java job server. If unset, `java` from the ' 'environment\'s $PATH is used.') + parser.add_argument( + '--job_server_jvm_properties', + '--job_server_jvm_property', + dest='job_server_jvm_properties', + action='append', + default=[], + help='JVM properties to pass to a Java job server.') class FlinkRunnerOptions(PipelineOptions): - PUBLISHED_FLINK_VERSIONS = ['1.7', '1.8', '1.9', '1.10'] + # These should stay in sync with gradle.properties. + PUBLISHED_FLINK_VERSIONS = ['1.10', '1.11', '1.12', '1.13'] @classmethod def _add_argparse_args(cls, parser): @@ -1227,7 +1314,8 @@ def _add_argparse_args(cls, parser): 'the execution.') parser.add_argument( '--spark_job_server_jar', - help='Path or URL to a Beam Spark jobserver jar.') + help='Path or URL to a Beam Spark job server jar. ' + 'Overrides --spark_version.') parser.add_argument( '--spark_submit_uber_jar', default=False, @@ -1238,7 +1326,13 @@ def _add_argparse_args(cls, parser): parser.add_argument( '--spark_rest_url', help='URL for the Spark REST endpoint. ' - 'Only required when using spark_submit_uber_jar.') + 'Only required when using spark_submit_uber_jar. 
' + 'For example, http://hostname:6066') + parser.add_argument( + '--spark_version', + default='2', + choices=['2', '3'], + help='Spark major version to use.') class TestOptions(PipelineOptions): diff --git a/sdks/python/apache_beam/options/pipeline_options_test.py b/sdks/python/apache_beam/options/pipeline_options_test.py index 25c42c0715bc..e0041fc6caf7 100644 --- a/sdks/python/apache_beam/options/pipeline_options_test.py +++ b/sdks/python/apache_beam/options/pipeline_options_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import json import logging import unittest @@ -217,6 +215,7 @@ def _add_argparse_args(cls, parser): parser.add_argument( '--fake_multi_option', action='append', help='fake multi option') + @unittest.skip("TODO(BEAM-12515): Flaky test.") def test_display_data(self): for case in PipelineOptionsTest.TEST_CASES: options = PipelineOptions(flags=case['flags']) @@ -623,6 +622,31 @@ def test_transform_name_mapping(self): mapping = options.view_as(GoogleCloudOptions).transform_name_mapping self.assertEqual(mapping['from'], 'to') + def test_dataflow_service_options(self): + options = PipelineOptions([ + '--dataflow_service_option', + 'whizz=bang', + '--dataflow_service_option', + 'beep=boop' + ]) + self.assertEqual( + sorted(options.get_all_options()['dataflow_service_options']), + ['beep=boop', 'whizz=bang']) + + options = PipelineOptions([ + '--dataflow_service_options', + 'whizz=bang', + '--dataflow_service_options', + 'beep=boop' + ]) + self.assertEqual( + sorted(options.get_all_options()['dataflow_service_options']), + ['beep=boop', 'whizz=bang']) + + options = PipelineOptions(flags=['']) + self.assertEqual( + options.get_all_options()['dataflow_service_options'], None) + if __name__ == '__main__': logging.getLogger().setLevel(logging.INFO) diff --git a/sdks/python/apache_beam/options/pipeline_options_validator.py b/sdks/python/apache_beam/options/pipeline_options_validator.py index c1496ae8ef8b..15f1f65dc93c 100644 --- a/sdks/python/apache_beam/options/pipeline_options_validator.py +++ b/sdks/python/apache_beam/options/pipeline_options_validator.py @@ -21,13 +21,8 @@ """ # pytype: skip-file -from __future__ import absolute_import - import logging import re -from builtins import object - -from past.builtins import unicode from apache_beam.internal import pickler from apache_beam.options.pipeline_options import DebugOptions @@ -115,6 +110,8 @@ class PipelineOptionsValidator(object): 'Option environment_config is incompatible with option(s) %s.') ERR_MISSING_REQUIRED_ENVIRONMENT_OPTION = ( 'Option %s is required for environment type %s.') + ERR_NUM_WORKERS_TOO_HIGH = ( + 'num_workers (%s) cannot exceed max_num_workers (%s)') # GCS path specific patterns. GCS_URI = '(?P[^:]+)://(?P[^/]+)(/(?P.*))?' 
@@ -220,8 +217,7 @@ def validate_cloud_options(self, view): 'Transform name mapping option is only useful when ' '--update and --streaming is specified') for _, (key, value) in enumerate(view.transform_name_mapping.items()): - if not isinstance(key, (str, unicode)) \ - or not isinstance(value, (str, unicode)): + if not isinstance(key, str) or not isinstance(value, str): errors.extend( self._validate_error( self.ERR_INVALID_TRANSFORM_NAME_MAPPING, key, value)) @@ -234,6 +230,44 @@ def validate_cloud_options(self, view): view.region = default_region return errors + def validate_sdk_container_image_options(self, view): + errors = [] + if view.sdk_container_image and view.worker_harness_container_image: + # To be fully backwards-compatible, these options will be set to the same + # value. Check that the values are different. + if view.sdk_container_image != view.worker_harness_container_image: + errors.extend( + self._validate_error( + 'Cannot use legacy flag --worker_harness_container_image along ' + 'with view.sdk_container_image')) + elif view.worker_harness_container_image: + # Warn about legacy flag and set new flag to value of old flag. + _LOGGER.warning( + 'Setting sdk_container_image to value of legacy flag ' + 'worker_harness_container_image.') + view.sdk_container_image = view.worker_harness_container_image + elif view.sdk_container_image: + # Set legacy option to value of new option. + view.worker_harness_container_image = view.sdk_container_image + + return errors + + def validate_num_workers(self, view): + """Validates that Dataflow worker number is valid.""" + errors = self.validate_optional_argument_positive(view, 'num_workers') + errors.extend( + self.validate_optional_argument_positive(view, 'max_num_workers')) + + num_workers = view.num_workers + max_num_workers = view.max_num_workers + if (num_workers is not None and max_num_workers is not None and + num_workers > max_num_workers): + errors.extend( + self._validate_error( + self.ERR_NUM_WORKERS_TOO_HIGH, num_workers, max_num_workers)) + + return errors + def validate_worker_region_zone(self, view): """Validates Dataflow worker region and zone arguments are consistent.""" errors = [] diff --git a/sdks/python/apache_beam/options/pipeline_options_validator_test.py b/sdks/python/apache_beam/options/pipeline_options_validator_test.py index cbdb2045f8b6..15071d8c7b78 100644 --- a/sdks/python/apache_beam/options/pipeline_options_validator_test.py +++ b/sdks/python/apache_beam/options/pipeline_options_validator_test.py @@ -19,11 +19,8 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest -from builtins import object from hamcrest import assert_that from hamcrest import contains_string @@ -373,6 +370,64 @@ def test_validate_dataflow_job_file(self): errors = validator.validate() self.assertFalse(errors) + def test_num_workers_is_positive(self): + runner = MockRunners.DataflowRunner() + options = PipelineOptions([ + '--num_workers=-1', + '--worker_region=us-east1', + '--project=example:example', + '--temp_location=gs://foo/bar', + ]) + validator = PipelineOptionsValidator(options, runner) + errors = validator.validate() + self.assertEqual(len(errors), 1) + self.assertIn('num_workers', errors[0]) + self.assertIn('-1', errors[0]) + + def test_max_num_workers_is_positive(self): + runner = MockRunners.DataflowRunner() + options = PipelineOptions([ + '--max_num_workers=-1', + '--worker_region=us-east1', + '--project=example:example', + '--temp_location=gs://foo/bar', + ]) + validator = 
PipelineOptionsValidator(options, runner) + errors = validator.validate() + self.assertEqual(len(errors), 1) + self.assertIn('max_num_workers', errors[0]) + self.assertIn('-1', errors[0]) + + def test_num_workers_cannot_exceed_max_num_workers(self): + runner = MockRunners.DataflowRunner() + options = PipelineOptions([ + '--num_workers=43', + '--max_num_workers=42', + '--worker_region=us-east1', + '--project=example:example', + '--temp_location=gs://foo/bar', + ]) + validator = PipelineOptionsValidator(options, runner) + errors = validator.validate() + self.assertEqual(len(errors), 1) + self.assertIn('num_workers', errors[0]) + self.assertIn('43', errors[0]) + self.assertIn('max_num_workers', errors[0]) + self.assertIn('42', errors[0]) + + def test_num_workers_can_equal_max_num_workers(self): + runner = MockRunners.DataflowRunner() + options = PipelineOptions([ + '--num_workers=42', + '--max_num_workers=42', + '--worker_region=us-east1', + '--project=example:example', + '--temp_location=gs://foo/bar', + ]) + validator = PipelineOptionsValidator(options, runner) + errors = validator.validate() + self.assertEqual(len(errors), 0) + def test_zone_and_worker_region_mutually_exclusive(self): runner = MockRunners.DataflowRunner() options = PipelineOptions([ @@ -480,6 +535,54 @@ def test_region_optional_for_non_service_runner(self): errors = validator.validate() self.assertEqual(len(errors), 0) + def test_alias_sdk_container_to_worker_harness(self): + runner = MockRunners.DataflowRunner() + test_image = "SDK_IMAGE" + options = PipelineOptions([ + '--sdk_container_image=%s' % test_image, + '--project=example:example', + '--temp_location=gs://foo/bar', + ]) + validator = PipelineOptionsValidator(options, runner) + errors = validator.validate() + self.assertEqual(len(errors), 0) + self.assertEqual( + options.view_as(WorkerOptions).worker_harness_container_image, + test_image) + self.assertEqual( + options.view_as(WorkerOptions).sdk_container_image, test_image) + + def test_alias_worker_harness_sdk_container_image(self): + runner = MockRunners.DataflowRunner() + test_image = "WORKER_HARNESS" + options = PipelineOptions([ + '--worker_harness_container_image=%s' % test_image, + '--project=example:example', + '--temp_location=gs://foo/bar', + ]) + validator = PipelineOptionsValidator(options, runner) + errors = validator.validate() + self.assertEqual(len(errors), 0) + self.assertEqual( + options.view_as(WorkerOptions).worker_harness_container_image, + test_image) + self.assertEqual( + options.view_as(WorkerOptions).sdk_container_image, test_image) + + def test_worker_harness_sdk_container_image_mutually_exclusive(self): + runner = MockRunners.DataflowRunner() + options = PipelineOptions([ + '--worker_harness_container_image=WORKER', + '--sdk_container_image=SDK_ONLY', + '--project=example:example', + '--temp_location=gs://foo/bar', + ]) + validator = PipelineOptionsValidator(options, runner) + errors = validator.validate() + self.assertEqual(len(errors), 1) + self.assertIn('sdk_container_image', errors[0]) + self.assertIn('worker_harness_container_image', errors[0]) + def test_test_matcher(self): def get_validator(matcher): options = [ diff --git a/sdks/python/apache_beam/options/value_provider.py b/sdks/python/apache_beam/options/value_provider.py index 0fa5f2b5f157..5a5d36370f39 100644 --- a/sdks/python/apache_beam/options/value_provider.py +++ b/sdks/python/apache_beam/options/value_provider.py @@ -24,9 +24,6 @@ # pytype: skip-file -from __future__ import absolute_import - -from builtins import object 
from functools import wraps from typing import Set @@ -88,10 +85,6 @@ def __eq__(self, other): return True return False - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - def __hash__(self): return hash((type(self), self.value_type, self.value)) diff --git a/sdks/python/apache_beam/options/value_provider_test.py b/sdks/python/apache_beam/options/value_provider_test.py index 189501bb9eed..21e05b326a0a 100644 --- a/sdks/python/apache_beam/options/value_provider_test.py +++ b/sdks/python/apache_beam/options/value_provider_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest diff --git a/sdks/python/apache_beam/pipeline.py b/sdks/python/apache_beam/pipeline.py index 0b70d0803732..287ec49203f0 100644 --- a/sdks/python/apache_beam/pipeline.py +++ b/sdks/python/apache_beam/pipeline.py @@ -47,23 +47,21 @@ # pytype: skip-file # mypy: disallow-untyped-defs -from __future__ import absolute_import - import abc import logging import os import re import shutil -import sys import tempfile -from builtins import object -from builtins import zip +import unicodedata from collections import defaultdict from typing import TYPE_CHECKING +from typing import Any from typing import Dict from typing import FrozenSet from typing import Iterable from typing import List +from typing import Mapping from typing import Optional from typing import Sequence from typing import Set @@ -71,8 +69,7 @@ from typing import Type from typing import Union -from future.utils import with_metaclass -from past.builtins import unicode +from google.protobuf import message from apache_beam import pvalue from apache_beam.internal import pickler @@ -89,9 +86,9 @@ from apache_beam.runners import create_runner from apache_beam.transforms import ParDo from apache_beam.transforms import ptransform -from apache_beam.transforms.core import RunnerAPIPTransformHolder -from apache_beam.transforms.sideinputs import SIDE_INPUT_PREFIX -from apache_beam.transforms.sideinputs import SIDE_INPUT_REGEX +from apache_beam.transforms.display import DisplayData +from apache_beam.transforms.resources import merge_resource_hints +from apache_beam.transforms.resources import resource_hints_from_options from apache_beam.transforms.sideinputs import get_sideinput_index from apache_beam.typehints import TypeCheckError from apache_beam.typehints import typehints @@ -221,13 +218,17 @@ def __init__(self, runner=None, options=None, argv=None): # If a transform is applied and the full label is already in the set # then the transform will have to be cloned with a new label. self.applied_labels = set() # type: Set[str] - + # Hints supplied via pipeline options are considered the outermost hints. + self._root_transform().resource_hints = resource_hints_from_options(options) # Create a ComponentIdMap for assigning IDs to components. Ensures that any # components that receive an ID during pipeline construction (for example in # ExternalTransform), will receive the same component ID when generating the # full pipeline proto. self.component_id_map = ComponentIdMap() + # Records whether this pipeline contains any external transforms. 
+ self.contains_external_transforms = False + @property # type: ignore[misc] # decorated property not supported @deprecated( @@ -238,6 +239,11 @@ def options(self): # type: () -> PipelineOptions return self._options + @property + def allow_unsafe_triggers(self): + # type: () -> bool + return self._options.view_as(TypeOptions).allow_unsafe_triggers + def _current_transform(self): # type: () -> AppliedPTransform @@ -266,7 +272,7 @@ def _replace(self, override): output_replacements = { } # type: Dict[AppliedPTransform, List[Tuple[pvalue.PValue, Optional[str]]]] input_replacements = { - } # type: Dict[AppliedPTransform, Sequence[Union[pvalue.PBegin, pvalue.PCollection]]] + } # type: Dict[AppliedPTransform, Mapping[str, Union[pvalue.PBegin, pvalue.PCollection]]] side_input_replacements = { } # type: Dict[AppliedPTransform, List[pvalue.AsSideInput]] @@ -292,7 +298,11 @@ def _replace_if_needed(self, original_transform_node): original_transform_node.parent, replacement_transform, original_transform_node.full_label, - original_transform_node.inputs) + original_transform_node.main_inputs) + + # TODO(BEAM-12854): Merge rather than override. + replacement_transform_node.resource_hints = ( + original_transform_node.resource_hints) # Transform execution could depend on order in which nodes are # considered. Hence we insert the replacement transform node to same @@ -429,11 +439,11 @@ def visit_transform(self, transform_node): output_replacements[transform_node].append((tag, replacement)) if replace_input: - new_input = [ - input if not input in output_map else output_map[input] - for input in transform_node.inputs - ] - input_replacements[transform_node] = new_input + new_inputs = { + tag: input if not input in output_map else output_map[input] + for (tag, input) in transform_node.main_inputs.items() + } + input_replacements[transform_node] = new_inputs if replace_side_inputs: new_side_inputs = [] @@ -452,10 +462,10 @@ def visit_transform(self, transform_node): transform.replace_output(output, tag=tag) for transform in input_replacements: - transform.inputs = input_replacements[transform] + transform.replace_inputs(input_replacements[transform]) for transform in side_input_replacements: - transform.side_inputs = side_input_replacements[transform] + transform.replace_side_inputs(side_input_replacements[transform]) def _check_replacement(self, override): # type: (PTransformOverride) -> None @@ -500,6 +510,10 @@ def run(self, test_runner_api='AUTO'): """Runs the pipeline. Returns whatever our runner returns after running.""" + # Records whether this pipeline contains any cross-language transforms. + self.contains_external_transforms = ( + ExternalTransformFinder.contains_external_transforms(self)) + try: if test_runner_api == 'AUTO': # Don't pay the cost of a round-trip if we're going to be going through @@ -509,10 +523,15 @@ def run(self, test_runner_api='AUTO'): self.runner.__class__.__name__ == 'SwitchingDirectRunner' and not self._options.view_as(StandardOptions).streaming) + # Multi-language pipelines that contain external pipeline segments may + # not be able to create a Python pipeline object graph. Hence following + # runner API check should be skipped for such pipelines. + # The InteractiveRunner relies on a constant pipeline reference, skip # it. test_runner_api = ( not is_fnapi_compatible and + not self.contains_external_transforms and self.runner.__class__.__name__ != 'InteractiveRunner') # When possible, invoke a round trip through the runner API. 
@@ -520,8 +539,7 @@ def run(self, test_runner_api='AUTO'): return Pipeline.from_runner_api( self.to_runner_api(use_fake_coders=True), self.runner, - self._options, - allow_proto_holders=True).run(False) + self._options).run(False) if (self._options.view_as(TypeOptions).runtime_type_check and self._options.view_as(TypeOptions).performance_runtime_type_check): @@ -535,13 +553,8 @@ def run(self, test_runner_api='AUTO'): self.visit(typecheck.TypeCheckVisitor()) if self._options.view_as(TypeOptions).performance_runtime_type_check: - if sys.version_info < (3, ): - raise RuntimeError( - 'You cannot turn on performance_runtime_type_check ' - 'in Python 2. This is a Python 3 feature.') - else: - from apache_beam.typehints import typecheck - self.visit(typecheck.PerformanceTypeCheckVisitor()) + from apache_beam.typehints import typecheck + self.visit(typecheck.PerformanceTypeCheckVisitor()) if self._options.view_as(SetupOptions).save_main_session: # If this option is chosen, verify we can pickle the main session early. @@ -659,15 +672,18 @@ def apply( pvalueish, inputs = transform._extract_input_pvalues(pvalueish) try: - inputs = tuple(inputs) - for leaf_input in inputs: - if not isinstance(leaf_input, pvalue.PValue): - raise TypeError + if not isinstance(inputs, dict): + inputs = {str(ix): input for (ix, input) in enumerate(inputs)} except TypeError: raise NotImplementedError( 'Unable to extract PValue inputs from %s; either %s does not accept ' 'inputs of this format, or it does not properly override ' '_extract_input_pvalues' % (pvalueish, transform)) + for t, leaf_input in inputs.items(): + if not isinstance(leaf_input, pvalue.PValue) or not isinstance(t, str): + raise NotImplementedError( + '%s does not properly override _extract_input_pvalues, ' + 'returned %s from %s' % (transform, inputs, pvalueish)) current = AppliedPTransform( self._current_transform(), transform, full_label, inputs) @@ -694,7 +710,8 @@ def apply( if result.producer is None: result.producer = current - self._infer_result_type(transform, inputs, result) + # TODO(BEAM-1833): Pass full tuples dict. + self._infer_result_type(transform, tuple(inputs.values()), result) assert isinstance(result.producer.inputs, tuple) # The DoOutputsTuple adds the PCollection to the outputs when accessed @@ -741,9 +758,11 @@ def _infer_result_type( (not result_pcollection.element_type # TODO(robertwb): Ideally we'd do intersection here. or result_pcollection.element_type == typehints.Any)): - # Single-input, single-output inference. + # {Single, multi}-input, single-output inference. + input_element_types_tuple = tuple(i.element_type for i in inputs) input_element_type = ( - inputs[0].element_type if len(inputs) == 1 else typehints.Any) + input_element_types_tuple[0] if len(input_element_types_tuple) == 1 + else typehints.Union[input_element_types_tuple]) type_hints = transform.get_type_hints() declared_output_type = type_hints.simple_output_type(transform.label) if declared_output_type: @@ -760,7 +779,7 @@ def _infer_result_type( result_pcollection.element_type = transform.infer_output_type( input_element_type) elif isinstance(result_pcollection, pvalue.DoOutputsTuple): - # Single-input, multi-output inference. + # {Single, multi}-input, multi-output inference. # TODO(BEAM-4132): Add support for tagged type hints. # https://github.com/apache/beam/pull/9810#discussion_r338765251 for pcoll in result_pcollection: @@ -835,6 +854,10 @@ def to_runner_api( # general shapes, potential conflicts will have to be resolved. 
# We also only handle single-input, and (for fixing the output) single # output, which is sufficient. + # Also marks such values as requiring deterministic key coders. + deterministic_key_coders = not self._options.view_as( + TypeOptions).allow_non_deterministic_key_coders + class ForceKvInputTypes(PipelineVisitor): def enter_composite_transform(self, transform_node): # type: (AppliedPTransform) -> None @@ -848,18 +871,27 @@ def visit_transform(self, transform_node): pcoll = transform_node.inputs[0] pcoll.element_type = typehints.coerce_to_kv_type( pcoll.element_type, transform_node.full_label) + pcoll.requires_deterministic_key_coder = ( + deterministic_key_coders and transform_node.full_label) if len(transform_node.outputs) == 1: # The runner often has expectations about the output types as well. output, = transform_node.outputs.values() if not output.element_type: output.element_type = transform_node.transform.infer_output_type( pcoll.element_type) + if (isinstance(output.element_type, + typehints.TupleHint.TupleConstraint) and + len(output.element_type.tuple_types) == 2): + output.requires_deterministic_key_coder = ( + deterministic_key_coders and transform_node.full_label) for side_input in transform_node.transform.side_inputs: if side_input.requires_keyed_input(): side_input.pvalue.element_type = typehints.coerce_to_kv_type( side_input.pvalue.element_type, transform_node.full_label, side_input_producer=side_input.pvalue.producer.full_label) + side_input.pvalue.requires_deterministic_key_coder = ( + deterministic_key_coders and transform_node.full_label) self.visit(ForceKvInputTypes()) @@ -883,7 +915,6 @@ def from_runner_api( runner, # type: PipelineRunner options, # type: PipelineOptions return_context=False, # type: bool - allow_proto_holders=False # type: bool ): # type: (...) -> Pipeline @@ -891,9 +922,7 @@ def from_runner_api( p = Pipeline(runner=runner, options=options) from apache_beam.runners import pipeline_context context = pipeline_context.PipelineContext( - proto.components, - allow_proto_holders=allow_proto_holders, - requirements=proto.requirements) + proto.components, requirements=proto.requirements) if proto.root_transform_ids: root_transform_id, = proto.root_transform_ids p.transforms_stack = [context.transforms.get_by_id(root_transform_id)] @@ -917,7 +946,7 @@ def from_runner_api( for id in proto.components.transforms: transform = context.transforms.get_by_id(id) if not transform.inputs and transform.transform.__class__ in has_pbegin: - transform.inputs = (pvalue.PBegin(p), ) + transform.main_inputs = {'None': pvalue.PBegin(p)} if return_context: return p, context # type: ignore # too complicated for now @@ -962,6 +991,40 @@ def leave_composite_transform(self, transform_node): pass +class ExternalTransformFinder(PipelineVisitor): + """Looks for any external transforms in the pipeline and if found records + it. 
+ """ + def __init__(self): + self._contains_external_transforms = False + + @staticmethod + def contains_external_transforms(pipeline): + visitor = ExternalTransformFinder() + pipeline.visit(visitor) + return visitor._contains_external_transforms + + def _perform_exernal_transform_test(self, transform): + if not transform: + return + from apache_beam.transforms import ExternalTransform + if isinstance(transform, ExternalTransform): + self._contains_external_transforms = True + + def visit_transform(self, transform_node): + # type: (AppliedPTransform) -> None + self._perform_exernal_transform_test(transform_node.transform) + + def enter_composite_transform(self, transform_node): + # type: (AppliedPTransform) -> None + # Python SDK object graph may represent an external transform that is a leaf + # of the pipeline graph as a composite without sub-transforms. + # Note that this visitor is just used to identify pipelines with external + # transforms. A Runner API pipeline proto generated from the Pipeline object + # will include external sub-transform. + self._perform_exernal_transform_test(transform_node.transform) + + class AppliedPTransform(object): """For internal use only; no backwards-compatibility guarantees. @@ -973,9 +1036,9 @@ def __init__( parent, # type: Optional[AppliedPTransform] transform, # type: Optional[ptransform.PTransform] full_label, # type: str - inputs, # type: Optional[Sequence[Union[pvalue.PBegin, pvalue.PCollection]]] + main_inputs, # type: Optional[Mapping[str, Union[pvalue.PBegin, pvalue.PCollection]]] environment_id=None, # type: Optional[str] - input_tags_to_preserve=None, # type: Dict[pvalue.PCollection, str] + annotations=None, # type: Optional[Dict[str, bytes]] ): # type: (...) -> None self.parent = parent @@ -986,13 +1049,42 @@ def __init__( # reusing PTransform instances in different contexts (apply() calls) without # any interference. This is particularly useful for composite transforms. self.full_label = full_label - self.inputs = inputs or () + self.main_inputs = dict(main_inputs or {}) self.side_inputs = tuple() if transform is None else transform.side_inputs self.outputs = {} # type: Dict[Union[str, int, None], pvalue.PValue] self.parts = [] # type: List[AppliedPTransform] self.environment_id = environment_id if environment_id else None # type: Optional[str] - self.input_tags_to_preserve = input_tags_to_preserve or {} + # We may need to merge the hints with environment-provided hints here + # once environment is a first-class citizen in Beam graph and we have + # access to actual environment, not just an id. + self.resource_hints = dict( + transform.get_resource_hints()) if transform else { + } # type: Dict[str, bytes] + + if annotations is None and transform: + + def annotation_to_bytes(key, a: Any) -> bytes: + if isinstance(a, bytes): + return a + elif isinstance(a, str): + return a.encode('ascii') + elif isinstance(a, message.Message): + return a.SerializeToString() + else: + raise TypeError( + 'Unknown annotation type %r (type %s) for %s' % (a, type(a), key)) + + annotations = { + key: annotation_to_bytes(key, a) + for key, + a in transform.annotations().items() + } + self.annotations = annotations + + @property + def inputs(self): + return tuple(self.main_inputs.values()) def __repr__(self): # type: () -> str @@ -1022,6 +1114,27 @@ def replace_output( else: raise TypeError("Unexpected output type: %s" % output) + # Importing locally to prevent circular dependency issues. 
+ from apache_beam.transforms import external + if isinstance(self.transform, external.ExternalTransform): + self.transform.replace_named_outputs(self.named_outputs()) + + def replace_inputs(self, main_inputs): + self.main_inputs = main_inputs + + # Importing locally to prevent circular dependency issues. + from apache_beam.transforms import external + if isinstance(self.transform, external.ExternalTransform): + self.transform.replace_named_inputs(self.named_inputs()) + + def replace_side_inputs(self, side_inputs): + self.side_inputs = side_inputs + + # Importing locally to prevent circular dependency issues. + from apache_beam.transforms import external + if isinstance(self.transform, external.ExternalTransform): + self.transform.replace_named_inputs(self.named_inputs()) + def add_output( self, output, # type: Union[pvalue.DoOutputsTuple, pvalue.PValue] @@ -1039,6 +1152,7 @@ def add_output( def add_part(self, part): # type: (AppliedPTransform) -> None assert isinstance(part, AppliedPTransform) + part._merge_outer_resource_hints() self.parts.append(part) def is_composite(self): @@ -1111,31 +1225,27 @@ def visit( def named_inputs(self): # type: () -> Dict[str, pvalue.PValue] - # TODO(BEAM-1833): Push names up into the sdk construction. - main_inputs = { - str(ix): input - for ix, - input in enumerate(self.inputs) - if isinstance(input, pvalue.PCollection) - } - side_inputs = {(SIDE_INPUT_PREFIX + '%s') % ix: si.pvalue - for (ix, si) in enumerate(self.side_inputs)} - return dict(main_inputs, **side_inputs) + if self.transform is None: + assert not self.main_inputs and not self.side_inputs + return {} + else: + return self.transform._named_inputs(self.main_inputs, self.side_inputs) def named_outputs(self): # type: () -> Dict[str, pvalue.PCollection] - return { - try_unicode(tag): output - for tag, - output in self.outputs.items() - if isinstance(output, pvalue.PCollection) - } + if self.transform is None: + assert not self.outputs + return {} + else: + return self.transform._named_outputs(self.outputs) def to_runner_api(self, context): # type: (PipelineContext) -> beam_runner_api_pb2.PTransform - # External tranforms require more splicing than just setting the spec. + # External transforms require more splicing than just setting the spec. from apache_beam.transforms import external if isinstance(self.transform, external.ExternalTransform): + # TODO(BEAM-12082): Support resource hints in XLang transforms. + # In particular, make sure hints on composites are properly propagated. return self.transform.to_runner_api_transform(context, self.full_label) from apache_beam.portability.api import beam_runner_api_pb2 @@ -1164,13 +1274,8 @@ def transform_to_runner_api( transform_urn = transform_spec.urn if transform_spec else None if (not environment_id and (transform_urn not in Pipeline.runner_implemented_transforms())): - environment_id = context.default_environment_id() - - def _maybe_preserve_tag(new_tag, pc, input_tags_to_preserve): - # TODO(BEAM-1833): remove this after we update Python SDK and - # DataflowRunner to construct pipelines using runner API. 
- return input_tags_to_preserve[ - pc] if pc in input_tags_to_preserve else new_tag + environment_id = context.get_environment_id_for_resource_hints( + self.resource_hints) return beam_runner_api_pb2.PTransform( unique_name=self.full_label, @@ -1180,9 +1285,9 @@ def _maybe_preserve_tag(new_tag, pc, input_tags_to_preserve): for part in self.parts ], inputs={ - _maybe_preserve_tag(tag, pc, self.input_tags_to_preserve): - context.pcollections.get_id(pc) - for (tag, pc) in sorted(self.named_inputs().items()) + tag: context.pcollections.get_id(pc) + for tag, + pc in sorted(self.named_inputs().items()) }, outputs={ tag: context.pcollections.get_id(out) @@ -1190,8 +1295,10 @@ def _maybe_preserve_tag(new_tag, pc, input_tags_to_preserve): out in sorted(self.named_outputs().items()) }, environment_id=environment_id, + annotations=self.annotations, # TODO(BEAM-366): Add display_data. - display_data=None) + display_data=DisplayData.create_from(self.transform).to_proto() + if self.transform else None) @staticmethod def from_runner_api( @@ -1211,73 +1318,34 @@ def from_runner_api( pardo_payload = None side_input_tags = [] - main_inputs = [ - context.pcollections.get_by_id(id) for tag, - id in proto.inputs.items() if tag not in side_input_tags - ] - - def is_python_side_input(tag): - # type: (str) -> bool - # As per named_inputs() above. - return re.match(SIDE_INPUT_REGEX, tag) - - uses_python_sideinput_tags = ( - is_python_side_input(side_input_tags[0]) if side_input_tags else False) + main_inputs = { + tag: context.pcollections.get_by_id(id) + for (tag, id) in proto.inputs.items() if tag not in side_input_tags + } transform = ptransform.PTransform.from_runner_api(proto, context) - if uses_python_sideinput_tags: - # Ordering is important here. - # TODO(BEAM-9635): use key, value pairs instead of depending on tags with - # index as a suffix. - indexed_side_inputs = [ - (get_sideinput_index(tag), context.pcollections.get_by_id(id)) - for tag, - id in proto.inputs.items() if tag in side_input_tags - ] - side_inputs = [si for _, si in sorted(indexed_side_inputs)] - else: - # These must be set in the same order for subsequent zip to work. - side_inputs = [] - transform_side_inputs = [] - - for tag, id in proto.inputs.items(): - if tag in side_input_tags: - pc = context.pcollections.get_by_id(id) - side_inputs.append(pc) - assert pardo_payload # This must be a ParDo with side inputs. - side_input_from_pardo = pardo_payload.side_inputs[tag] - - # TODO(BEAM-1833): use 'pvalue.SideInputData.from_runner_api' here - # when that is updated to better represent runner API. - if (common_urns.side_inputs.MULTIMAP.urn == - side_input_from_pardo.access_pattern.urn): - transform_side_inputs.append(pvalue.AsMultiMap(pc)) - elif (common_urns.side_inputs.ITERABLE.urn == - side_input_from_pardo.access_pattern.urn): - transform_side_inputs.append(pvalue.AsIter(pc)) - else: - raise ValueError( - 'Unsupported side input access pattern %r' % - side_input_from_pardo.access_pattern.urn) - if transform: - transform.side_inputs = transform_side_inputs - - if isinstance(transform, RunnerAPIPTransformHolder): - # For external transforms that are ParDos, we have to preserve input tags. 
- input_tags_to_preserve = { - context.pcollections.get_by_id(id): tag - for (tag, id) in proto.inputs.items() - } - else: - input_tags_to_preserve = {} + if transform and proto.environment_id: + resource_hints = context.environments.get_by_id( + proto.environment_id).resource_hints() + if resource_hints: + transform._resource_hints = dict(resource_hints) + + # Ordering is important here. + # TODO(BEAM-9635): use key, value pairs instead of depending on tags with + # index as a suffix. + indexed_side_inputs = [ + (get_sideinput_index(tag), context.pcollections.get_by_id(id)) for tag, + id in proto.inputs.items() if tag in side_input_tags + ] + side_inputs = [si for _, si in sorted(indexed_side_inputs)] result = AppliedPTransform( parent=None, transform=transform, full_label=proto.unique_name, - inputs=main_inputs, - environment_id=proto.environment_id, - input_tags_to_preserve=input_tags_to_preserve) + main_inputs=main_inputs, + environment_id=None, + annotations=proto.annotations) if result.transform and result.transform.side_inputs: for si, pcoll in zip(result.transform.side_inputs, side_inputs): @@ -1287,7 +1355,7 @@ def is_python_side_input(tag): for transform_id in proto.subtransforms: part = context.transforms.get_by_id(transform_id) part.parent = result - result.parts.append(part) + result.add_part(part) result.outputs = { None if tag == 'None' else tag: context.pcollections.get_by_id(id) for tag, @@ -1295,9 +1363,6 @@ def is_python_side_input(tag): } # This annotation is expected by some runners. if proto.spec.urn == common_urns.primitives.PAR_DO.urn: - # TODO(BEAM-9168): Figure out what to do for RunnerAPIPTransformHolder. - assert isinstance(result.transform, (ParDo, RunnerAPIPTransformHolder)),\ - type(result.transform) result.transform.output_tags = set(proto.outputs.keys()).difference( {'None'}) if not result.parts: @@ -1308,9 +1373,17 @@ def is_python_side_input(tag): pc.tag = None if tag == 'None' else tag return result + def _merge_outer_resource_hints(self): + if (self.parent is not None and self.parent.resource_hints): + self.resource_hints = merge_resource_hints( + outer_hints=self.parent.resource_hints, + inner_hints=self.resource_hints) + if self.resource_hints: + for part in self.parts: + part._merge_outer_resource_hints() + -class PTransformOverride(with_metaclass(abc.ABCMeta, - object)): # type: ignore[misc] +class PTransformOverride(metaclass=abc.ABCMeta): """For internal use only; no backwards-compatibility guarantees. Gives a matcher and replacements for matching PTransforms. @@ -1407,22 +1480,14 @@ def get_or_assign(self, obj=None, obj_type=None, label=None): return self._obj_to_id[obj] + def _normalize(self, str_value): + str_value = unicodedata.normalize('NFC', str_value) + return re.sub(r'[^a-zA-Z0-9-_]+', '-', str_value) + def _unique_ref(self, obj=None, obj_type=None, label=None): + # Normalize, trim, and uniqify. 
+ prefix = self._normalize( + '%s_%s_%s' % + (self.namespace, obj_type.__name__, label or type(obj).__name__))[0:100] self._counters[obj_type] += 1 - return "%s_%s_%s_%d" % ( - self.namespace, - obj_type.__name__, - label or type(obj).__name__, - self._counters[obj_type]) - - -if sys.version_info >= (3, ): - try_unicode = str - -else: - - def try_unicode(s): - try: - return unicode(s) - except UnicodeDecodeError: - return str(s).decode('ascii', 'replace') + return '%s_%d' % (prefix, self._counters[obj_type]) diff --git a/sdks/python/apache_beam/pipeline_test.py b/sdks/python/apache_beam/pipeline_test.py index e86de573a070..731d5e3ae12e 100644 --- a/sdks/python/apache_beam/pipeline_test.py +++ b/sdks/python/apache_beam/pipeline_test.py @@ -19,21 +19,18 @@ # pytype: skip-file -from __future__ import absolute_import - import copy import platform import unittest -from builtins import object -from builtins import range -from nose.plugins.attrib import attr +import pytest import apache_beam as beam from apache_beam import typehints from apache_beam.coders import BytesCoder from apache_beam.io import Read from apache_beam.metrics import Metrics +from apache_beam.options.pipeline_options import PortableOptions from apache_beam.pipeline import Pipeline from apache_beam.pipeline import PipelineOptions from apache_beam.pipeline import PipelineVisitor @@ -53,6 +50,9 @@ from apache_beam.transforms import ParDo from apache_beam.transforms import PTransform from apache_beam.transforms import WindowInto +from apache_beam.transforms.display import DisplayDataItem +from apache_beam.transforms.environments import ProcessEnvironment +from apache_beam.transforms.resources import ResourceHint from apache_beam.transforms.userstate import BagStateSpec from apache_beam.transforms.window import SlidingWindows from apache_beam.transforms.window import TimestampedValue @@ -60,7 +60,6 @@ from apache_beam.utils.timestamp import MIN_TIMESTAMP # TODO(BEAM-1555): Test is failing on the service, with FakeSource. -# from nose.plugins.attrib import attr class FakeSource(NativeSource): @@ -259,7 +258,7 @@ def test_create_singleton_pcollection(self): assert_that(pcoll, equal_to([[1, 2, 3]])) # TODO(BEAM-1555): Test is failing on the service, with FakeSource. 
- # @attr('ValidatesRunner') + # @pytest.mark.it_validatesrunner def test_metrics_in_fake_source(self): pipeline = TestPipeline() pcoll = pipeline | Read(FakeSource([1, 2, 3, 4, 5, 6])) @@ -636,7 +635,7 @@ def test_track_pcoll_unbounded(self): pcoll2 = pcoll1 | 'do1' >> FlatMap(lambda x: [x + 1]) pcoll3 = pcoll2 | 'do2' >> FlatMap(lambda x: [x + 1]) self.assertIs(pcoll1.is_bounded, False) - self.assertIs(pcoll1.is_bounded, False) + self.assertIs(pcoll2.is_bounded, False) self.assertIs(pcoll3.is_bounded, False) def test_track_pcoll_bounded(self): @@ -719,7 +718,7 @@ def process(self, element, prefix, suffix=DoFn.SideInputParam): TestDoFn(), prefix, suffix=AsSingleton(suffix)) assert_that(result, equal_to(['zyx-%s-xyz' % x for x in words_list])) - @attr('ValidatesRunner') + @pytest.mark.it_validatesrunner def test_element_param(self): pipeline = TestPipeline() input = [1, 2] @@ -730,7 +729,7 @@ def test_element_param(self): assert_that(pcoll, equal_to(input)) pipeline.run() - @attr('ValidatesRunner') + @pytest.mark.it_validatesrunner def test_key_param(self): pipeline = TestPipeline() pcoll = ( @@ -924,6 +923,358 @@ def test_requirements(self): common_urns.requirements.REQUIRES_BUNDLE_FINALIZATION.urn, proto.requirements) + def test_annotations(self): + some_proto = BytesCoder().to_runner_api(None) + + class EmptyTransform(beam.PTransform): + def expand(self, pcoll): + return pcoll + + def annotations(self): + return {'foo': 'some_string'} + + class NonEmptyTransform(beam.PTransform): + def expand(self, pcoll): + return pcoll | beam.Map(lambda x: x) + + def annotations(self): + return { + 'foo': b'some_bytes', + 'proto': some_proto, + } + + p = beam.Pipeline() + _ = p | beam.Create([]) | EmptyTransform() | NonEmptyTransform() + proto = p.to_runner_api() + + seen = 0 + for transform in proto.components.transforms.values(): + if transform.unique_name == 'EmptyTransform': + seen += 1 + self.assertEqual(transform.annotations['foo'], b'some_string') + elif transform.unique_name == 'NonEmptyTransform': + seen += 1 + self.assertEqual(transform.annotations['foo'], b'some_bytes') + self.assertEqual( + transform.annotations['proto'], some_proto.SerializeToString()) + self.assertEqual(seen, 2) + + def test_transform_ids(self): + class MyPTransform(beam.PTransform): + def expand(self, p): + self.p = p + return p | beam.Create([None]) + + p = beam.Pipeline() + p | MyPTransform() # pylint: disable=expression-not-assigned + runner_api_proto = Pipeline.to_runner_api(p) + + for transform_id in runner_api_proto.components.transforms: + self.assertRegex(transform_id, r'[a-zA-Z0-9-_]+') + + def test_input_names(self): + class MyPTransform(beam.PTransform): + def expand(self, pcolls): + return pcolls.values() | beam.Flatten() + + p = beam.Pipeline() + input_names = set('ABC') + inputs = {x: p | x >> beam.Create([x]) for x in input_names} + inputs | MyPTransform() # pylint: disable=expression-not-assigned + runner_api_proto = Pipeline.to_runner_api(p) + + for transform_proto in runner_api_proto.components.transforms.values(): + if transform_proto.unique_name == 'MyPTransform': + self.assertEqual(set(transform_proto.inputs.keys()), input_names) + break + else: + self.fail('Unable to find transform.') + + def test_display_data(self): + class MyParentTransform(beam.PTransform): + def expand(self, p): + self.p = p + return p | beam.Create([None]) + + def display_data(self): # type: () -> dict + parent_dd = super(MyParentTransform, self).display_data() + parent_dd['p_dd_string'] = DisplayDataItem( + 
'p_dd_string_value', label='p_dd_string_label') + parent_dd['p_dd_string_2'] = DisplayDataItem('p_dd_string_value_2') + parent_dd['p_dd_bool'] = DisplayDataItem(True, label='p_dd_bool_label') + parent_dd['p_dd_int'] = DisplayDataItem(1, label='p_dd_int_label') + return parent_dd + + class MyPTransform(MyParentTransform): + def expand(self, p): + self.p = p + return p | beam.Create([None]) + + def display_data(self): # type: () -> dict + parent_dd = super(MyPTransform, self).display_data() + parent_dd['dd_string'] = DisplayDataItem( + 'dd_string_value', label='dd_string_label') + parent_dd['dd_string_2'] = DisplayDataItem('dd_string_value_2') + parent_dd['dd_bool'] = DisplayDataItem(False, label='dd_bool_label') + parent_dd['dd_int'] = DisplayDataItem(1.1, label='dd_int_label') + return parent_dd + + p = beam.Pipeline() + p | MyPTransform() # pylint: disable=expression-not-assigned + from apache_beam.portability.api import beam_runner_api_pb2 + + proto_pipeline = Pipeline.to_runner_api(p, use_fake_coders=True) + my_transform, = [ + transform + for transform in proto_pipeline.components.transforms.values() + if transform.unique_name == 'MyPTransform' + ] + self.assertIsNotNone(my_transform) + self.assertListEqual( + list(my_transform.display_data), + [ + beam_runner_api_pb2.DisplayData( + urn=common_urns.StandardDisplayData.DisplayData.LABELLED.urn, + payload=beam_runner_api_pb2.LabelledPayload( + label='p_dd_string_label', + string_value='p_dd_string_value').SerializeToString()), + beam_runner_api_pb2.DisplayData( + urn=common_urns.StandardDisplayData.DisplayData.LABELLED.urn, + payload=beam_runner_api_pb2.LabelledPayload( + label='p_dd_string_2', + string_value='p_dd_string_value_2').SerializeToString()), + beam_runner_api_pb2.DisplayData( + urn=common_urns.StandardDisplayData.DisplayData.LABELLED.urn, + payload=beam_runner_api_pb2.LabelledPayload( + label='p_dd_bool_label', + bool_value=True).SerializeToString()), + beam_runner_api_pb2.DisplayData( + urn=common_urns.StandardDisplayData.DisplayData.LABELLED.urn, + payload=beam_runner_api_pb2.LabelledPayload( + label='p_dd_int_label', + double_value=1).SerializeToString()), + beam_runner_api_pb2.DisplayData( + urn=common_urns.StandardDisplayData.DisplayData.LABELLED.urn, + payload=beam_runner_api_pb2.LabelledPayload( + label='dd_string_label', + string_value='dd_string_value').SerializeToString()), + beam_runner_api_pb2.DisplayData( + urn=common_urns.StandardDisplayData.DisplayData.LABELLED.urn, + payload=beam_runner_api_pb2.LabelledPayload( + label='dd_string_2', + string_value='dd_string_value_2').SerializeToString()), + beam_runner_api_pb2.DisplayData( + urn=common_urns.StandardDisplayData.DisplayData.LABELLED.urn, + payload=beam_runner_api_pb2.LabelledPayload( + label='dd_bool_label', + bool_value=False).SerializeToString()), + beam_runner_api_pb2.DisplayData( + urn=common_urns.StandardDisplayData.DisplayData.LABELLED.urn, + payload=beam_runner_api_pb2.LabelledPayload( + label='dd_int_label', + double_value=1.1).SerializeToString()), + ]) + + def test_runner_api_roundtrip_preserves_resource_hints(self): + p = beam.Pipeline() + _ = ( + p | beam.Create([1, 2]) + | beam.Map(lambda x: x + 1).with_resource_hints(accelerator='gpu')) + + self.assertEqual( + p.transforms_stack[0].parts[1].transform.get_resource_hints(), + {common_urns.resource_hints.ACCELERATOR.urn: b'gpu'}) + + for _ in range(3): + # Verify that DEFAULT environments are recreated during multiple RunnerAPI + # translation and hints don't get lost. 
+ p = Pipeline.from_runner_api(Pipeline.to_runner_api(p), None, None) + self.assertEqual( + p.transforms_stack[0].parts[1].transform.get_resource_hints(), + {common_urns.resource_hints.ACCELERATOR.urn: b'gpu'}) + + def test_hints_on_composite_transforms_are_propagated_to_subtransforms(self): + class FooHint(ResourceHint): + urn = 'foo_urn' + + class BarHint(ResourceHint): + urn = 'bar_urn' + + class BazHint(ResourceHint): + urn = 'baz_urn' + + class QuxHint(ResourceHint): + urn = 'qux_urn' + + class UseMaxValueHint(ResourceHint): + urn = 'use_max_value_urn' + + @classmethod + def get_merged_value( + cls, outer_value, inner_value): # type: (bytes, bytes) -> bytes + return ResourceHint._use_max(outer_value, inner_value) + + ResourceHint.register_resource_hint('foo_hint', FooHint) + ResourceHint.register_resource_hint('bar_hint', BarHint) + ResourceHint.register_resource_hint('baz_hint', BazHint) + ResourceHint.register_resource_hint('qux_hint', QuxHint) + ResourceHint.register_resource_hint('use_max_value_hint', UseMaxValueHint) + + @beam.ptransform_fn + def SubTransform(pcoll): + return pcoll | beam.Map(lambda x: x + 1).with_resource_hints( + foo_hint='set_on_subtransform', use_max_value_hint='10') + + @beam.ptransform_fn + def CompositeTransform(pcoll): + return pcoll | beam.Map(lambda x: x * 2) | SubTransform() + + p = beam.Pipeline() + _ = ( + p | beam.Create([1, 2]) + | CompositeTransform().with_resource_hints( + foo_hint='should_be_overriden_by_subtransform', + bar_hint='set_on_composite', + baz_hint='set_on_composite', + use_max_value_hint='100')) + options = PortableOptions([ + '--resource_hint=baz_hint=should_be_overriden_by_composite', + '--resource_hint=qux_hint=set_via_options', + '--environment_type=PROCESS', + '--environment_option=process_command=foo', + '--sdk_location=container', + ]) + environment = ProcessEnvironment.from_options(options) + proto = Pipeline.to_runner_api(p, default_environment=environment) + + for t in proto.components.transforms.values(): + if "CompositeTransform/SubTransform/Map" in t.unique_name: + environment = proto.components.environments.get(t.environment_id) + self.assertEqual( + environment.resource_hints.get('foo_urn'), b'set_on_subtransform') + self.assertEqual( + environment.resource_hints.get('bar_urn'), b'set_on_composite') + self.assertEqual( + environment.resource_hints.get('baz_urn'), b'set_on_composite') + self.assertEqual( + environment.resource_hints.get('qux_urn'), b'set_via_options') + self.assertEqual( + environment.resource_hints.get('use_max_value_urn'), b'100') + found = True + assert found + + def test_environments_with_same_resource_hints_are_reused(self): + class HintX(ResourceHint): + urn = 'X_urn' + + class HintY(ResourceHint): + urn = 'Y_urn' + + class HintIsOdd(ResourceHint): + urn = 'IsOdd_urn' + + ResourceHint.register_resource_hint('X', HintX) + ResourceHint.register_resource_hint('Y', HintY) + ResourceHint.register_resource_hint('IsOdd', HintIsOdd) + + p = beam.Pipeline() + num_iter = 4 + for i in range(num_iter): + _ = ( + p + | f'NoHintCreate_{i}' >> beam.Create([1, 2]) + | f'NoHint_{i}' >> beam.Map(lambda x: x + 1)) + _ = ( + p + | f'XCreate_{i}' >> beam.Create([1, 2]) + | + f'HintX_{i}' >> beam.Map(lambda x: x + 1).with_resource_hints(X='X')) + _ = ( + p + | f'XYCreate_{i}' >> beam.Create([1, 2]) + | f'HintXY_{i}' >> beam.Map(lambda x: x + 1).with_resource_hints( + X='X', Y='Y')) + _ = ( + p + | f'IsOddCreate_{i}' >> beam.Create([1, 2]) + | f'IsOdd_{i}' >> + beam.Map(lambda x: x + 1).with_resource_hints(IsOdd=str(i 
% 2 != 0))) + + proto = Pipeline.to_runner_api(p) + count_x = count_xy = count_is_odd = count_no_hints = 0 + env_ids = set() + for _, t in proto.components.transforms.items(): + env = proto.components.environments[t.environment_id] + if t.unique_name.startswith('HintX_'): + count_x += 1 + env_ids.add(t.environment_id) + self.assertEqual(env.resource_hints, {'X_urn': b'X'}) + + if t.unique_name.startswith('HintXY_'): + count_xy += 1 + env_ids.add(t.environment_id) + self.assertEqual(env.resource_hints, {'X_urn': b'X', 'Y_urn': b'Y'}) + + if t.unique_name.startswith('NoHint_'): + count_no_hints += 1 + env_ids.add(t.environment_id) + self.assertEqual(env.resource_hints, {}) + + if t.unique_name.startswith('IsOdd_'): + count_is_odd += 1 + env_ids.add(t.environment_id) + self.assertTrue( + env.resource_hints == {'IsOdd_urn': b'True'} or + env.resource_hints == {'IsOdd_urn': b'False'}) + assert count_x == count_is_odd == count_xy == count_no_hints == num_iter + assert num_iter > 1 + + self.assertEqual(len(env_ids), 5) + + def test_multiple_application_of_the_same_transform_set_different_hints(self): + class FooHint(ResourceHint): + urn = 'foo_urn' + + class UseMaxValueHint(ResourceHint): + urn = 'use_max_value_urn' + + @classmethod + def get_merged_value( + cls, outer_value, inner_value): # type: (bytes, bytes) -> bytes + return ResourceHint._use_max(outer_value, inner_value) + + ResourceHint.register_resource_hint('foo_hint', FooHint) + ResourceHint.register_resource_hint('use_max_value_hint', UseMaxValueHint) + + @beam.ptransform_fn + def SubTransform(pcoll): + return pcoll | beam.Map(lambda x: x + 1) + + @beam.ptransform_fn + def CompositeTransform(pcoll): + sub = SubTransform() + return ( + pcoll + | 'first' >> sub.with_resource_hints(foo_hint='first_application') + | 'second' >> sub.with_resource_hints(foo_hint='second_application')) + + p = beam.Pipeline() + _ = (p | beam.Create([1, 2]) | CompositeTransform()) + proto = Pipeline.to_runner_api(p) + count = 0 + for t in proto.components.transforms.values(): + if "CompositeTransform/first/Map" in t.unique_name: + environment = proto.components.environments.get(t.environment_id) + self.assertEqual( + b'first_application', environment.resource_hints.get('foo_urn')) + count += 1 + if "CompositeTransform/second/Map" in t.unique_name: + environment = proto.components.environments.get(t.environment_id) + self.assertEqual( + b'second_application', environment.resource_hints.get('foo_urn')) + count += 1 + assert count == 2 + if __name__ == '__main__': unittest.main() diff --git a/sdks/python/apache_beam/portability/__init__.py b/sdks/python/apache_beam/portability/__init__.py index 9fbf21557df7..0bce5d68f724 100644 --- a/sdks/python/apache_beam/portability/__init__.py +++ b/sdks/python/apache_beam/portability/__init__.py @@ -16,4 +16,3 @@ # """For internal use only; no backwards-compatibility guarantees.""" -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/portability/common_urns.py b/sdks/python/apache_beam/portability/common_urns.py index e3563197c649..4b8838fdd3de 100644 --- a/sdks/python/apache_beam/portability/common_urns.py +++ b/sdks/python/apache_beam/portability/common_urns.py @@ -19,15 +19,15 @@ # pytype: skip-file -from __future__ import absolute_import - from apache_beam.portability.api.beam_runner_api_pb2_urns import BeamConstants from apache_beam.portability.api.beam_runner_api_pb2_urns import StandardArtifacts from apache_beam.portability.api.beam_runner_api_pb2_urns import StandardCoders +from 
apache_beam.portability.api.beam_runner_api_pb2_urns import StandardDisplayData from apache_beam.portability.api.beam_runner_api_pb2_urns import StandardEnvironments from apache_beam.portability.api.beam_runner_api_pb2_urns import StandardProtocols from apache_beam.portability.api.beam_runner_api_pb2_urns import StandardPTransforms from apache_beam.portability.api.beam_runner_api_pb2_urns import StandardRequirements +from apache_beam.portability.api.beam_runner_api_pb2_urns import StandardResourceHints from apache_beam.portability.api.beam_runner_api_pb2_urns import StandardSideInputTypes from apache_beam.portability.api.metrics_pb2_urns import MonitoringInfo from apache_beam.portability.api.metrics_pb2_urns import MonitoringInfoSpecs @@ -51,6 +51,7 @@ environments = StandardEnvironments.Environments artifact_types = StandardArtifacts.Types artifact_roles = StandardArtifacts.Roles +resource_hints = StandardResourceHints.Enum global_windows = GlobalWindowsPayload.Enum.PROPERTIES fixed_windows = FixedWindowsPayload.Enum.PROPERTIES @@ -63,3 +64,5 @@ protocols = StandardProtocols.Enum requirements = StandardRequirements.Enum + +displayData = StandardDisplayData.DisplayData diff --git a/sdks/python/apache_beam/portability/python_urns.py b/sdks/python/apache_beam/portability/python_urns.py index 0f77e2d00bb7..a025709f38a5 100644 --- a/sdks/python/apache_beam/portability/python_urns.py +++ b/sdks/python/apache_beam/portability/python_urns.py @@ -55,3 +55,8 @@ # and artifact fetching code. # (Used for testing.) SUBPROCESS_SDK = "beam:env:harness_subprocess_python:v1" + +# An annotation that indicates combiner packing is OK in all sub-transforms +# of this transform. This optimization may result in renamed counters and +# PCollection element counts. +APPLY_COMBINER_PACKING = "beam:annotation:apply_combiner_packing:v1" diff --git a/sdks/python/apache_beam/portability/utils.py b/sdks/python/apache_beam/portability/utils.py index 176c6db73c14..4d23ff102490 100644 --- a/sdks/python/apache_beam/portability/utils.py +++ b/sdks/python/apache_beam/portability/utils.py @@ -16,8 +16,6 @@ # """For internal use only; no backwards-compatibility guarantees.""" -from __future__ import absolute_import - from typing import TYPE_CHECKING from typing import NamedTuple diff --git a/sdks/python/apache_beam/pvalue.py b/sdks/python/apache_beam/pvalue.py index 5196525a643c..2b593e4aaf98 100644 --- a/sdks/python/apache_beam/pvalue.py +++ b/sdks/python/apache_beam/pvalue.py @@ -26,12 +26,8 @@ # pytype: skip-file -from __future__ import absolute_import - import collections import itertools -from builtins import hex -from builtins import object from typing import TYPE_CHECKING from typing import Any from typing import Dict @@ -42,8 +38,6 @@ from typing import TypeVar from typing import Union -from past.builtins import unicode - from apache_beam import coders from apache_beam import typehints from apache_beam.internal import pickler @@ -62,6 +56,7 @@ __all__ = [ 'PCollection', 'TaggedOutput', + 'AsSideInput', 'AsSingleton', 'AsIter', 'AsList', @@ -109,6 +104,7 @@ def __init__(self, self.is_bounded = is_bounded if windowing: self._windowing = windowing + self.requires_deterministic_key_coder = None def __str__(self): return self._str_internal() @@ -151,10 +147,6 @@ def __eq__(self, other): if isinstance(other, PCollection): return self.tag == other.tag and self.producer == other.producer - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. 
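# Illustrative sketch (not part of the patch): the Row field-order semantics
# documented in the docstring change above. Assumes the Beam 2.30.0+ behaviour
# introduced by this change; beam.Row is the public alias for pvalue.Row.
import apache_beam as beam

r1 = beam.Row(x=3, y=4)
r2 = beam.Row(y=4, x=3)
assert r1 == beam.Row(x=3, y=4)  # same fields in the same order: equal
assert r1 != r2                  # same fields, different order: no longer equal
assert list(r1) == [3, 4]        # iteration follows declaration order, not sorted order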
- return not self == other - def __hash__(self): return hash((self.tag, self.producer)) @@ -187,7 +179,8 @@ def to_runner_api(self, context): # type: (PipelineContext) -> beam_runner_api_pb2.PCollection return beam_runner_api_pb2.PCollection( unique_name=self._unique_name(), - coder_id=context.coder_id_from_element_type(self.element_type), + coder_id=context.coder_id_from_element_type( + self.element_type, self.requires_deterministic_key_coder), is_bounded=beam_runner_api_pb2.IsBounded.BOUNDED if self.is_bounded else beam_runner_api_pb2.IsBounded.UNBOUNDED, windowing_strategy_id=context.windowing_strategies.get_id( @@ -335,7 +328,7 @@ class TaggedOutput(object): """ def __init__(self, tag, value): # type: (str, Any) -> None - if not isinstance(tag, (str, unicode)): + if not isinstance(tag, str): raise TypeError( 'Attempting to create a TaggedOutput with non-string tag %s' % (tag, )) @@ -421,7 +414,16 @@ def __init__(self, side_input_data): @staticmethod def _from_runtime_iterable(it, options): - return options['data'].view_fn(it) + access_pattern = options['data'].access_pattern + if access_pattern == common_urns.side_inputs.ITERABLE.urn: + raw_view = it + elif access_pattern == common_urns.side_inputs.MULTIMAP.urn: + raw_view = collections.defaultdict(list) + for k, v in it: + raw_view[k].append(v) + else: + raise ValueError('Unknown access_pattern: %s' % access_pattern) + return options['data'].view_fn(raw_view) def _view_options(self): return { @@ -656,6 +658,9 @@ class Row(object): when applied to a PCollection of ints will produce a PCollection with schema `(x=int, y=float)`. + + Note that in Beam 2.30.0 and later, Row objects are sensitive to field order. + So `Row(x=3, y=4)` is not considered equal to `Row(y=4, x=3)`. """ def __init__(self, **kwargs): self.__dict__.update(kwargs) @@ -664,24 +669,21 @@ def as_dict(self): return dict(self.__dict__) def __iter__(self): - for _, value in sorted(self.__dict__.items()): + for _, value in self.__dict__.items(): yield value def __repr__(self): - return 'Row(%s)' % ', '.join( - '%s=%r' % kv for kv in sorted(self.__dict__.items())) + return 'Row(%s)' % ', '.join('%s=%r' % kv for kv in self.__dict__.items()) def __hash__(self): - return hash(type(sorted(self.__dict__.items()))) + return hash(self.__dict__.items()) def __eq__(self, other): - return type(self) == type(other) and self.__dict__ == other.__dict__ - - def __ne__(self, other): - return not self == other + return type(self) == type(other) and all( + s == o for s, o in zip(self.__dict__.items(), other.__dict__.items())) def __reduce__(self): - return _make_Row, tuple(sorted(self.__dict__.items())) + return _make_Row, tuple(self.__dict__.items()) def _make_Row(*items): diff --git a/sdks/python/apache_beam/pvalue_test.py b/sdks/python/apache_beam/pvalue_test.py index fbe90f5425cc..0ea1a8c9835e 100644 --- a/sdks/python/apache_beam/pvalue_test.py +++ b/sdks/python/apache_beam/pvalue_test.py @@ -19,13 +19,8 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import - from apache_beam.pvalue import AsSingleton from apache_beam.pvalue import PValue from apache_beam.pvalue import TaggedOutput diff --git a/sdks/python/apache_beam/runners/__init__.py b/sdks/python/apache_beam/runners/__init__.py index 0f278d182d7c..f92d95aa4826 100644 --- a/sdks/python/apache_beam/runners/__init__.py +++ b/sdks/python/apache_beam/runners/__init__.py @@ -20,8 +20,6 @@ This package 
defines runners, which are used to execute a pipeline. """ -from __future__ import absolute_import - from apache_beam.runners.direct.direct_runner import DirectRunner from apache_beam.runners.direct.test_direct_runner import TestDirectRunner from apache_beam.runners.runner import PipelineRunner diff --git a/sdks/python/apache_beam/runners/common.pxd b/sdks/python/apache_beam/runners/common.pxd index 05f7a99ad153..08de4b9c332f 100644 --- a/sdks/python/apache_beam/runners/common.pxd +++ b/sdks/python/apache_beam/runners/common.pxd @@ -44,6 +44,7 @@ cdef class MethodWrapper(object): cdef object restriction_provider_arg_name cdef object watermark_estimator_provider cdef object watermark_estimator_provider_arg_name + cdef object dynamic_timer_tag_arg_name cdef bint unbounded_per_element diff --git a/sdks/python/apache_beam/runners/common.py b/sdks/python/apache_beam/runners/common.py index 5e0f6a554ce6..30004d16b5e9 100644 --- a/sdks/python/apache_beam/runners/common.py +++ b/sdks/python/apache_beam/runners/common.py @@ -15,6 +15,7 @@ # limitations under the License. # # cython: profile=True +# cython: language_level=3 """Worker operations executor. @@ -22,16 +23,9 @@ """ # pytype: skip-file - -from __future__ import absolute_import -from __future__ import division - +import sys import threading import traceback -from builtins import next -from builtins import object -from builtins import round -from builtins import zip from typing import TYPE_CHECKING from typing import Any from typing import Dict @@ -41,9 +35,6 @@ from typing import Optional from typing import Tuple -from future.utils import raise_with_traceback -from past.builtins import unicode - from apache_beam.coders import TupleCoder from apache_beam.internal import util from apache_beam.options.value_provider import RuntimeValueProvider @@ -62,6 +53,7 @@ from apache_beam.transforms.window import GlobalWindow from apache_beam.transforms.window import TimestampedValue from apache_beam.transforms.window import WindowFn +from apache_beam.typehints import typehints from apache_beam.utils.counters import Counter from apache_beam.utils.counters import CounterName from apache_beam.utils.timestamp import Timestamp @@ -91,10 +83,6 @@ def __init__(self, step_name, transform_id=None): def __eq__(self, other): return self.step_name == other.step_name - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - def __repr__(self): return 'NameContext(%s)' % self.__dict__ @@ -134,10 +122,6 @@ def __eq__(self, other): self.user_name == other.user_name and self.system_name == other.system_name) - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - def __hash__(self): return hash((self.step_name, self.user_name, self.system_name)) @@ -198,6 +182,7 @@ def __init__(self, obj_to_invoke, method_name): self.restriction_provider_arg_name = None self.watermark_estimator_provider = None self.watermark_estimator_provider_arg_name = None + self.dynamic_timer_tag_arg_name = None if hasattr(self.method_value, 'unbounded_per_element'): self.unbounded_per_element = True @@ -224,6 +209,8 @@ def __init__(self, obj_to_invoke, method_name): self.watermark_estimator_provider = ( v.watermark_estimator_provider or obj_to_invoke) self.watermark_estimator_provider_arg_name = kw + elif core.DoFn.DynamicTimerTagParam == v: + self.dynamic_timer_tag_arg_name = kw # Create NoOpWatermarkEstimatorProvider if there is no # WatermarkEstimatorParam provided. 
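# Hedged sketch (not part of the patch) of the user-facing pattern that the
# DynamicTimerTagParam plumbing above supports: one TimerSpec armed under
# several dynamic tags, with the firing tag handed to the callback. Assumes a
# keyed input PCollection and the timer.set(..., dynamic_timer_tag=...) keyword
# from the user-state API; names below are illustrative only.
import apache_beam as beam
from apache_beam.transforms import userstate

class DynamicTimerDoFn(beam.DoFn):
  TIMER = userstate.TimerSpec('timer', userstate.TimeDomain.WATERMARK)

  def process(self, element, timer=beam.DoFn.TimerParam(TIMER)):
    _, ts = element
    # The same spec is armed twice under different dynamic tags.
    timer.set(ts, dynamic_timer_tag='tag-a')
    timer.set(ts + 1, dynamic_timer_tag='tag-b')

  @userstate.on_timer(TIMER)
  def on_timer(self, tag=beam.DoFn.DynamicTimerTagParam):
    # 'tag' receives 'tag-a' or 'tag-b' depending on which firing this is.
    yield tag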
@@ -231,7 +218,13 @@ def __init__(self, obj_to_invoke, method_name): self.watermark_estimator_provider = NoOpWatermarkEstimatorProvider() def invoke_timer_callback( - self, user_state_context, key, window, timestamp, pane_info): + self, + user_state_context, + key, + window, + timestamp, + pane_info, + dynamic_timer_tag): # TODO(ccy): support side inputs. kwargs = {} if self.has_userstate_arguments: @@ -247,6 +240,8 @@ def invoke_timer_callback( kwargs[self.window_arg_name] = window if self.key_arg_name: kwargs[self.key_arg_name] = key + if self.dynamic_timer_tag_arg_name: + kwargs[self.dynamic_timer_tag_arg_name] = dynamic_timer_tag if kwargs: return self.method_value(**kwargs) @@ -527,12 +522,18 @@ def invoke_teardown(self): """ self.signature.teardown_lifecycle_method.method_value() - def invoke_user_timer(self, timer_spec, key, window, timestamp, pane_info): + def invoke_user_timer( + self, timer_spec, key, window, timestamp, pane_info, dynamic_timer_tag): # self.output_processor is Optional, but in practice it won't be None here self.output_processor.process_outputs( WindowedValue(None, timestamp, (window, )), self.signature.timer_methods[timer_spec].invoke_timer_callback( - self.user_state_context, key, window, timestamp, pane_info)) + self.user_state_context, + key, + window, + timestamp, + pane_info, + dynamic_timer_tag)) def invoke_create_watermark_estimator(self, estimator_state): return self.signature.create_watermark_estimator_method.method_value( @@ -700,6 +701,14 @@ def invoke_process(self, residuals = [] if self.is_splittable: + if restriction is None: + # This may be a SDF invoked as an ordinary DoFn on runners that don't + # understand SDF. See, e.g. BEAM-11472. + # In this case, processing the element is simply processing it against + # the entire initial restriction. + restriction = self.signature.initial_restriction_method.method_value( + windowed_value.value) + with self.splitting_lock: self.current_windowed_value = windowed_value self.restriction = restriction @@ -877,6 +886,8 @@ def _invoke_process_per_window(self, element = windowed_value.value size = self.signature.get_restriction_provider().restriction_size( element, deferred_restriction) + if size < 0: + raise ValueError('Expected size >= 0 but received %s.' % size) current_watermark = ( self.threadsafe_watermark_estimator.current_watermark()) estimator_state = ( @@ -937,6 +948,9 @@ def _try_split(fraction, def compute_whole_window_split(to_index, from_index): restriction_size = restriction_provider.restriction_size( windowed_value, restriction) + if restriction_size < 0: + raise ValueError( + 'Expected size >= 0 but received %s.' % restriction_size) # The primary and residual both share the same value only differing # by the set of windows they are in. value = ((windowed_value.value, (restriction, watermark_estimator_state)), @@ -1020,8 +1034,12 @@ def compute_whole_window_split(to_index, from_index): element = windowed_value.value primary_size = restriction_provider.restriction_size( windowed_value.value, primary) + if primary_size < 0: + raise ValueError('Expected size >= 0 but received %s.' % primary_size) residual_size = restriction_provider.restriction_size( windowed_value.value, residual) + if residual_size < 0: + raise ValueError('Expected size >= 0 but received %s.' % residual_size) # We use the watermark estimator state for the original process call # for the primary and the updated watermark estimator state for the # residual for the split. 
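# Hedged sketch (not part of the patch) of the contract the new checks above
# enforce: RestrictionProvider.restriction_size() must return a value >= 0,
# otherwise the invoker now raises ValueError. The provider below is
# illustrative only.
from apache_beam.io.restriction_trackers import OffsetRange
from apache_beam.io.restriction_trackers import OffsetRestrictionTracker
from apache_beam.transforms.core import RestrictionProvider

class LineRangeProvider(RestrictionProvider):
  def initial_restriction(self, element):
    return OffsetRange(0, len(element))

  def create_tracker(self, restriction):
    return OffsetRestrictionTracker(restriction)

  def restriction_size(self, element, restriction):
    # OffsetRange.size() is stop - start, which is never negative for a valid
    # range, satisfying the size >= 0 requirement checked by the invoker.
    return restriction.size()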
@@ -1234,10 +1252,11 @@ def current_element_progress(self): assert isinstance(self.do_fn_invoker, PerWindowInvoker) return self.do_fn_invoker.current_element_progress() - def process_user_timer(self, timer_spec, key, window, timestamp, pane_info): + def process_user_timer( + self, timer_spec, key, window, timestamp, pane_info, dynamic_timer_tag): try: self.do_fn_invoker.invoke_user_timer( - timer_spec, key, window, timestamp, pane_info) + timer_spec, key, window, timestamp, pane_info, dynamic_timer_tag) except BaseException as exn: self._reraise_augmented(exn) @@ -1292,7 +1311,8 @@ def _reraise_augmented(self, exn): traceback.format_exception_only(type(exn), exn)[-1].strip() + step_annotation) new_exn._tagged_with_step = True - raise_with_traceback(new_exn) + _, _, tb = sys.exc_info() + raise new_exn.with_traceback(tb) class OutputProcessor(object): @@ -1351,7 +1371,7 @@ def process_outputs( tag = None if isinstance(result, TaggedOutput): tag = result.tag - if not isinstance(tag, (str, unicode)): + if not isinstance(tag, str): raise TypeError('In %s, tag %s is not a string' % (self, tag)) result = result.value if isinstance(result, WindowedValue): @@ -1401,7 +1421,7 @@ def finish_bundle_outputs(self, results): tag = None if isinstance(result, TaggedOutput): tag = result.tag - if not isinstance(tag, (str, unicode)): + if not isinstance(tag, str): raise TypeError('In %s, tag %s is not a string' % (self, tag)) result = result.value @@ -1485,3 +1505,41 @@ def windows(self): raise AttributeError('windows not accessible in this context') else: return self.windowed_value.windows + + +def group_by_key_input_visitor(deterministic_key_coders=True): + # Importing here to avoid a circular dependency + from apache_beam.pipeline import PipelineVisitor + from apache_beam.transforms.core import GroupByKey + + class GroupByKeyInputVisitor(PipelineVisitor): + """A visitor that replaces `Any` element type for input `PCollection` of + a `GroupByKey` with a `KV` type. + + TODO(BEAM-115): Once Python SDK is compatible with the new Runner API, + we could directly replace the coder instead of mutating the element type. + """ + def __init__(self, deterministic_key_coders=True): + self.deterministic_key_coders = deterministic_key_coders + + def enter_composite_transform(self, transform_node): + self.visit_transform(transform_node) + + def visit_transform(self, transform_node): + # Imported here to avoid circular dependencies. 
+ # pylint: disable=wrong-import-order, wrong-import-position + if isinstance(transform_node.transform, GroupByKey): + pcoll = transform_node.inputs[0] + pcoll.element_type = typehints.coerce_to_kv_type( + pcoll.element_type, transform_node.full_label) + pcoll.requires_deterministic_key_coder = ( + self.deterministic_key_coders and transform_node.full_label) + key_type, value_type = pcoll.element_type.tuple_types + if transform_node.outputs: + key = next(iter(transform_node.outputs.keys())) + transform_node.outputs[key].element_type = typehints.KV[ + key_type, typehints.Iterable[value_type]] + transform_node.outputs[key].requires_deterministic_key_coder = ( + self.deterministic_key_coders and transform_node.full_label) + + return GroupByKeyInputVisitor(deterministic_key_coders) diff --git a/sdks/python/apache_beam/runners/common_test.py b/sdks/python/apache_beam/runners/common_test.py index c9860e61021c..c089d5a02286 100644 --- a/sdks/python/apache_beam/runners/common_test.py +++ b/sdks/python/apache_beam/runners/common_test.py @@ -17,8 +17,6 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest import hamcrest as hc diff --git a/sdks/python/apache_beam/runners/dataflow/__init__.py b/sdks/python/apache_beam/runners/dataflow/__init__.py index 2148f1691e97..6674ba5d9ff9 100644 --- a/sdks/python/apache_beam/runners/dataflow/__init__.py +++ b/sdks/python/apache_beam/runners/dataflow/__init__.py @@ -21,7 +21,5 @@ with no backwards-compatibility guarantees. """ -from __future__ import absolute_import - from apache_beam.runners.dataflow.dataflow_runner import DataflowRunner from apache_beam.runners.dataflow.test_dataflow_runner import TestDataflowRunner diff --git a/sdks/python/apache_beam/runners/dataflow/dataflow_exercise_metrics_pipeline.py b/sdks/python/apache_beam/runners/dataflow/dataflow_exercise_metrics_pipeline.py index 7dafe53523e6..bfe56c7e38c2 100644 --- a/sdks/python/apache_beam/runners/dataflow/dataflow_exercise_metrics_pipeline.py +++ b/sdks/python/apache_beam/runners/dataflow/dataflow_exercise_metrics_pipeline.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import time from hamcrest.library.number.ordering_comparison import greater_than diff --git a/sdks/python/apache_beam/runners/dataflow/dataflow_exercise_metrics_pipeline_test.py b/sdks/python/apache_beam/runners/dataflow/dataflow_exercise_metrics_pipeline_test.py index 5aa61ba4e78c..934dfe58916d 100644 --- a/sdks/python/apache_beam/runners/dataflow/dataflow_exercise_metrics_pipeline_test.py +++ b/sdks/python/apache_beam/runners/dataflow/dataflow_exercise_metrics_pipeline_test.py @@ -19,12 +19,10 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import unittest -from nose.plugins.attrib import attr +import pytest import apache_beam as beam from apache_beam.options.pipeline_options import PipelineOptions @@ -44,7 +42,7 @@ def run_pipeline(self, **opts): p = beam.Pipeline(options=pipeline_options) return dataflow_exercise_metrics_pipeline.apply_and_run(p) - @attr('IT') + @pytest.mark.it_postcommit def test_metrics_it(self): result = self.run_pipeline() errors = metric_result_matchers.verify_all( @@ -52,7 +50,8 @@ def test_metrics_it(self): dataflow_exercise_metrics_pipeline.metric_matchers()) self.assertFalse(errors, str(errors)) - @attr('IT', 'ValidatesContainer') + @pytest.mark.it_postcommit + @pytest.mark.it_validatescontainer def test_metrics_fnapi_it(self): result = self.run_pipeline(experiment='beam_fn_api') errors = 
metric_result_matchers.verify_all( diff --git a/sdks/python/apache_beam/runners/dataflow/dataflow_exercise_streaming_metrics_pipeline.py b/sdks/python/apache_beam/runners/dataflow/dataflow_exercise_streaming_metrics_pipeline.py index 77d4106d536c..01c0c5beb909 100644 --- a/sdks/python/apache_beam/runners/dataflow/dataflow_exercise_streaming_metrics_pipeline.py +++ b/sdks/python/apache_beam/runners/dataflow/dataflow_exercise_streaming_metrics_pipeline.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import logging import time diff --git a/sdks/python/apache_beam/runners/dataflow/dataflow_exercise_streaming_metrics_pipeline_test.py b/sdks/python/apache_beam/runners/dataflow/dataflow_exercise_streaming_metrics_pipeline_test.py index 353d32ee3e9f..aa302a4633a6 100644 --- a/sdks/python/apache_beam/runners/dataflow/dataflow_exercise_streaming_metrics_pipeline_test.py +++ b/sdks/python/apache_beam/runners/dataflow/dataflow_exercise_streaming_metrics_pipeline_test.py @@ -19,14 +19,12 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest import uuid +import pytest from hamcrest.core.core.allof import all_of -from nose.plugins.attrib import attr from apache_beam.io.gcp.tests.pubsub_matcher import PubSubMessageMatcher from apache_beam.runners.dataflow import dataflow_exercise_streaming_metrics_pipeline @@ -115,7 +113,9 @@ def run_pipeline(self): return dataflow_exercise_streaming_metrics_pipeline.run(argv) # Need not run streaming test in batch mode. - @attr('IT', 'ValidatesRunner', 'sickbay-batch') + @pytest.mark.it_validatesrunner + @pytest.mark.no_sickbay_batch + @pytest.mark.no_xdist def test_streaming_pipeline_returns_expected_user_metrics_fnapi_it(self): """ Runs streaming Dataflow job and verifies that user metrics are reported diff --git a/sdks/python/apache_beam/runners/dataflow/dataflow_metrics.py b/sdks/python/apache_beam/runners/dataflow/dataflow_metrics.py index c292e245d823..e5728e8f24b5 100644 --- a/sdks/python/apache_beam/runners/dataflow/dataflow_metrics.py +++ b/sdks/python/apache_beam/runners/dataflow/dataflow_metrics.py @@ -23,16 +23,12 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import logging import numbers import sys from collections import defaultdict -from future.utils import iteritems - from apache_beam.metrics.cells import DistributionData from apache_beam.metrics.cells import DistributionResult from apache_beam.metrics.execution import MetricKey @@ -197,7 +193,7 @@ def _populate_metrics(self, response, result, user_metrics=False): metrics_by_name[metric_key][tentative_or_committed] = metric # Now we create the MetricResult elements. 
- for metric_key, metric in iteritems(metrics_by_name): + for metric_key, metric in metrics_by_name.items(): attempted = self._get_metric_value(metric['tentative']) committed = self._get_metric_value(metric['committed']) result.append( diff --git a/sdks/python/apache_beam/runners/dataflow/dataflow_metrics_test.py b/sdks/python/apache_beam/runners/dataflow/dataflow_metrics_test.py index daa01b771626..a2ef0fe12dce 100644 --- a/sdks/python/apache_beam/runners/dataflow/dataflow_metrics_test.py +++ b/sdks/python/apache_beam/runners/dataflow/dataflow_metrics_test.py @@ -22,11 +22,8 @@ # pytype: skip-file -from __future__ import absolute_import - import types import unittest -from builtins import object import mock diff --git a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py index 92d9c19dfc46..29a601610120 100644 --- a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py +++ b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py @@ -22,29 +22,23 @@ """ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import base64 import logging import os -import subprocess -import sys import threading import time import traceback -import urllib -from builtins import hex from collections import defaultdict +from subprocess import DEVNULL from typing import TYPE_CHECKING from typing import List - -from future.utils import iteritems +from urllib.parse import quote +from urllib.parse import quote_from_bytes +from urllib.parse import unquote_to_bytes import apache_beam as beam from apache_beam import coders from apache_beam import error -from apache_beam import pvalue from apache_beam.internal import pickler from apache_beam.internal.gcp import json_value from apache_beam.options.pipeline_options import DebugOptions @@ -52,11 +46,13 @@ from apache_beam.options.pipeline_options import SetupOptions from apache_beam.options.pipeline_options import StandardOptions from apache_beam.options.pipeline_options import TestOptions +from apache_beam.options.pipeline_options import TypeOptions from apache_beam.options.pipeline_options import WorkerOptions from apache_beam.portability import common_urns from apache_beam.portability.api import beam_runner_api_pb2 from apache_beam.pvalue import AsSideInput from apache_beam.runners.common import DoFnSignature +from apache_beam.runners.common import group_by_key_input_visitor from apache_beam.runners.dataflow.internal import names from apache_beam.runners.dataflow.internal.clients import dataflow as dataflow_api from apache_beam.runners.dataflow.internal.names import PropertyNames @@ -66,7 +62,6 @@ from apache_beam.runners.runner import PipelineState from apache_beam.runners.runner import PValueCache from apache_beam.transforms import window -from apache_beam.transforms.core import RunnerAPIPTransformHolder from apache_beam.transforms.display import DisplayData from apache_beam.transforms.sideinputs import SIDE_INPUT_PREFIX from apache_beam.typehints import typehints @@ -78,13 +73,6 @@ if TYPE_CHECKING: from apache_beam.pipeline import PTransformOverride -if sys.version_info[0] > 2: - unquote_to_bytes = urllib.parse.unquote_to_bytes - quote = urllib.parse.quote -else: - unquote_to_bytes = urllib.unquote # pylint: disable=deprecated-urllib-function - quote = urllib.quote # pylint: disable=deprecated-urllib-function - __all__ = ['DataflowRunner'] _LOGGER = logging.getLogger(__name__) @@ -272,62 +260,10 @@ def _only_element(iterable): return element 
@staticmethod - def group_by_key_input_visitor(): - # Imported here to avoid circular dependencies. - from apache_beam.pipeline import PipelineVisitor - - class GroupByKeyInputVisitor(PipelineVisitor): - """A visitor that replaces `Any` element type for input `PCollection` of - a `GroupByKey` with a `KV` type. - - TODO(BEAM-115): Once Python SDk is compatible with the new Runner API, - we could directly replace the coder instead of mutating the element type. - """ - def enter_composite_transform(self, transform_node): - self.visit_transform(transform_node) - - def visit_transform(self, transform_node): - # Imported here to avoid circular dependencies. - # pylint: disable=wrong-import-order, wrong-import-position - from apache_beam.transforms.core import GroupByKey - if isinstance(transform_node.transform, GroupByKey): - pcoll = transform_node.inputs[0] - pcoll.element_type = typehints.coerce_to_kv_type( - pcoll.element_type, transform_node.full_label) - key_type, value_type = pcoll.element_type.tuple_types - if transform_node.outputs: - key = DataflowRunner._only_element(transform_node.outputs.keys()) - transform_node.outputs[key].element_type = typehints.KV[ - key_type, typehints.Iterable[value_type]] - - return GroupByKeyInputVisitor() - - @staticmethod - def _set_pdone_visitor(pipeline): - # Imported here to avoid circular dependencies. - from apache_beam.pipeline import PipelineVisitor - - class SetPDoneVisitor(PipelineVisitor): - def __init__(self, pipeline): - self._pipeline = pipeline - - @staticmethod - def _maybe_fix_output(transform_node, pipeline): - if not transform_node.outputs: - pval = pvalue.PDone(pipeline) - pval.producer = transform_node - transform_node.outputs = {None: pval} - - def enter_composite_transform(self, transform_node): - SetPDoneVisitor._maybe_fix_output(transform_node, self._pipeline) - - def visit_transform(self, transform_node): - SetPDoneVisitor._maybe_fix_output(transform_node, self._pipeline) - - return SetPDoneVisitor(pipeline) - - @staticmethod - def side_input_visitor(use_unified_worker=False, use_fn_api=False): + def side_input_visitor( + use_unified_worker=False, + use_fn_api=False, + deterministic_key_coders=True): # Imported here to avoid circular dependencies. # pylint: disable=wrong-import-order, wrong-import-position from apache_beam.pipeline import PipelineVisitor @@ -366,10 +302,10 @@ def visit_transform(self, transform_node): is_bounded=side_input.pvalue.is_bounded) parent = transform_node.parent or pipeline._root_transform() map_to_void_key = beam.pipeline.AppliedPTransform( - pipeline, + parent, beam.Map(lambda x: (b'', x)), transform_node.full_label + '/MapToVoidKey%s' % ix, - (side_input.pvalue, )) + {'input': side_input.pvalue}) new_side_input.pvalue.producer = map_to_void_key map_to_void_key.add_output(new_side_input.pvalue, None) parent.add_part(map_to_void_key) @@ -378,6 +314,8 @@ def visit_transform(self, transform_node): # access pattern to appease Dataflow. 
side_input.pvalue.element_type = typehints.coerce_to_kv_type( side_input.pvalue.element_type, transform_node.full_label) + side_input.pvalue.requires_deterministic_key_coder = ( + deterministic_key_coders and transform_node.full_label) new_side_input = _DataflowMultimapSideInput(side_input) else: raise ValueError( @@ -438,29 +376,13 @@ def _overrides_setup_or_teardown(combinefn): return CombineFnVisitor() - def _check_for_unsupported_fnapi_features(self, pipeline_proto): - components = pipeline_proto.components - for windowing_strategy in components.windowing_strategies.values(): - if (windowing_strategy.merge_status == - beam_runner_api_pb2.MergeStatus.NEEDS_MERGE and - windowing_strategy.window_fn.urn not in ( - common_urns.session_windows.urn, )): - raise RuntimeError( - 'Unsupported merging windowing strategy: %s' % - windowing_strategy.window_fn.urn) - elif components.coders[ - windowing_strategy.window_coder_id].spec.urn not in ( - common_urns.coders.GLOBAL_WINDOW.urn, - common_urns.coders.INTERVAL_WINDOW.urn): - raise RuntimeError( - 'Unsupported window coder %s for window fn %s' % ( - components.coders[windowing_strategy.window_coder_id].spec.urn, - windowing_strategy.window_fn.urn)) - def _adjust_pipeline_for_dataflow_v2(self, pipeline): # Dataflow runner requires a KV type for GBK inputs, hence we enforce that # here. - pipeline.visit(self.group_by_key_input_visitor()) + pipeline.visit( + group_by_key_input_visitor( + not pipeline._options.view_as( + TypeOptions).allow_non_deterministic_key_coders)) def _check_for_unsupported_features_on_non_portable_worker(self, pipeline): pipeline.visit(self.combinefn_visitor()) @@ -486,6 +408,18 @@ def run_pipeline(self, pipeline, options): 'Google Cloud Dataflow runner not available, ' 'please install apache_beam[gcp]') + debug_options = options.view_as(DebugOptions) + if pipeline.contains_external_transforms: + if not apiclient._use_unified_worker(options): + _LOGGER.info( + 'Automatically enabling Dataflow Runner v2 since the ' + 'pipeline used cross-language transforms.') + # This has to be done before any Fn API specific setup. + debug_options.add_experiment("use_runner_v2") + # Dataflow multi-language pipelines require portable job submission. + if not debug_options.lookup_experiment('use_portable_job_submission'): + debug_options.add_experiment("use_portable_job_submission") + self._maybe_add_unified_worker_missing_options(options) use_fnapi = apiclient._use_fnapi(options) @@ -497,7 +431,9 @@ def run_pipeline(self, pipeline, options): pipeline.visit( self.side_input_visitor( apiclient._use_unified_worker(options), - apiclient._use_fnapi(options))) + apiclient._use_fnapi(options), + deterministic_key_coders=not options.view_as( + TypeOptions).allow_non_deterministic_key_coders)) # Performing configured PTransform overrides. Note that this is currently # done before Runner API serialization, since the new proto needs to contain @@ -521,13 +457,14 @@ def run_pipeline(self, pipeline, options): # instead of using the inferred default container image. 
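# Hedged sketch (not part of the patch) of the behaviour wired up in
# _adjust_pipeline_for_dataflow_v2 above: the shared group_by_key_input_visitor
# enforces KV typing on GroupByKey inputs and requests deterministic key coders
# unless the user opts out via TypeOptions. The command-line spelling of the
# opt-out flag is assumed from the option name.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, TypeOptions
from apache_beam.runners.common import group_by_key_input_visitor

options = PipelineOptions(['--allow_non_deterministic_key_coders'])
p = beam.Pipeline(options=options)
_ = p | beam.Create([('k', 1), ('k', 2)]) | beam.GroupByKey()
p.visit(
    group_by_key_input_visitor(
        not options.view_as(TypeOptions).allow_non_deterministic_key_coders))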
self._default_environment = ( environments.DockerEnvironment.from_options(options)) - options.view_as(WorkerOptions).worker_harness_container_image = ( + options.view_as(WorkerOptions).sdk_container_image = ( self._default_environment.container_image) else: self._default_environment = ( environments.DockerEnvironment.from_container_image( apiclient.get_container_image_from_options(options), - artifacts=environments.python_sdk_dependencies(options))) + artifacts=environments.python_sdk_dependencies(options), + resource_hints=environments.resource_hints_from_options(options))) # This has to be performed before pipeline proto is constructed to make sure # that the changes are reflected in the portable job submission path. @@ -537,34 +474,47 @@ def run_pipeline(self, pipeline, options): self.proto_pipeline, self.proto_context = pipeline.to_runner_api( return_context=True, default_environment=self._default_environment) - if use_fnapi: - self._check_for_unsupported_fnapi_features(self.proto_pipeline) - - # Cross language transform require using a pipeline object constructed - # from the full pipeline proto to make sure that expanded version of - # external transforms are reflected in the Pipeline job graph. - # TODO(chamikara): remove following pipeline and pipeline proto recreation - # after portable job submission path is fully in place. - from apache_beam import Pipeline - pipeline = Pipeline.from_runner_api( - self.proto_pipeline, - pipeline.runner, - options, - allow_proto_holders=True) - - # Pipelines generated from proto do not have output set to PDone set for - # leaf elements. - pipeline.visit(self._set_pdone_visitor(pipeline)) - - # We need to generate a new context that maps to the new pipeline object. - self.proto_pipeline, self.proto_context = pipeline.to_runner_api( - return_context=True, default_environment=self._default_environment) + # Optimize the pipeline if it not streaming and the pre_optimize + # experiment is set. + if not options.view_as(StandardOptions).streaming: + pre_optimize = options.view_as(DebugOptions).lookup_experiment( + 'pre_optimize', 'default').lower() + from apache_beam.runners.portability.fn_api_runner import translations + if pre_optimize == 'none': + phases = [] + elif pre_optimize == 'default' or pre_optimize == 'all': + phases = [translations.pack_combiners, translations.sort_stages] + else: + phases = [] + for phase_name in pre_optimize.split(','): + # For now, these are all we allow. + if phase_name in ('pack_combiners', ): + phases.append(getattr(translations, phase_name)) + else: + raise ValueError( + 'Unknown or inapplicable phase for pre_optimize: %s' % + phase_name) + phases.append(translations.sort_stages) - else: + if phases: + self.proto_pipeline = translations.optimize_pipeline( + self.proto_pipeline, + phases=phases, + known_runner_urns=frozenset(), + partial=True) + + if not use_fnapi: # Performing configured PTransform overrides which should not be reflected # in the proto representation of the graph. pipeline.replace_all(DataflowRunner._NON_PORTABLE_PTRANSFORM_OVERRIDES) + # Always upload graph out-of-band when explicitly using runner v2 with + # use_portable_job_submission to avoid irrelevant large graph limits. 
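# Hedged sketch (not part of the patch) of how the pre_optimize branch above is
# driven. The value is read with DebugOptions.lookup_experiment('pre_optimize',
# 'default'), so users supply it as a key=value experiment; per the code above
# the accepted values are 'none', 'default', 'all', or a comma-separated list of
# phases (currently only 'pack_combiners').
from apache_beam.options.pipeline_options import DebugOptions, PipelineOptions

options = PipelineOptions(['--experiments=pre_optimize=pack_combiners'])
assert options.view_as(DebugOptions).lookup_experiment(
    'pre_optimize', 'default') == 'pack_combiners'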
+ if (apiclient._use_unified_worker(debug_options) and + debug_options.lookup_experiment('use_portable_job_submission') and + not debug_options.lookup_experiment('upload_graph')): + debug_options.add_experiment("upload_graph") + # Add setup_options for all the BeamPlugin imports setup_options = options.view_as(SetupOptions) plugins = BeamPlugin.get_all_plugin_paths() @@ -614,8 +564,7 @@ def run_pipeline(self, pipeline, options): # is set. Note that use_avro is only interpreted by the Dataflow runner # at job submission and is not interpreted by Dataflow service or workers, # which by default use avro library unless use_fastavro experiment is set. - if sys.version_info[0] > 2 and ( - not debug_options.lookup_experiment('use_avro')): + if not debug_options.lookup_experiment('use_avro'): debug_options.add_experiment('use_fastavro') self.job = apiclient.Job(options, self.proto_pipeline) @@ -702,8 +651,6 @@ def _get_side_input_encoding(self, input_encoding): def _get_encoded_output_coder( self, transform_node, window_value=True, output_tag=None): """Returns the cloud encoding of the coder for the output of a transform.""" - is_external_transform = isinstance( - transform_node.transform, RunnerAPIPTransformHolder) if output_tag in transform_node.outputs: element_type = transform_node.outputs[output_tag].element_type @@ -711,10 +658,7 @@ def _get_encoded_output_coder( output_tag = DataflowRunner._only_element(transform_node.outputs.keys()) # TODO(robertwb): Handle type hints for multi-output transforms. element_type = transform_node.outputs[output_tag].element_type - elif is_external_transform: - raise ValueError( - 'For external transforms, output_tag must be specified ' - 'since we cannot fallback to a Python only coder.') + else: # TODO(silviuc): Remove this branch (and assert) when typehints are # propagated everywhere. Returning an 'Any' as type hint will trigger @@ -753,19 +697,15 @@ def _add_step(self, step_kind, step_label, transform_node, side_tags=()): step.add_property(PropertyNames.USER_NAME, step_label) # Cache the node/step association for the main output of the transform node. - # Main output key of external transforms can be ambiguous, so we only tag if - # there's only one tag instead of None. - output_tag = ( - DataflowRunner._only_element(transform_node.outputs.keys()) if len( - transform_node.outputs.keys()) == 1 else None) + # External transforms may not use 'None' as an output tag. + output_tags = ([None] + + list(side_tags) if None in transform_node.outputs.keys() else + list(transform_node.outputs.keys())) - self._cache.cache_output(transform_node, output_tag, step) - # If side_tags is not () then this is a multi-output transform node and we - # need to cache the (node, tag, step) for each of the tags used to access - # the outputs. This is essential because the keys used to search in the - # cache always contain the tag. - for tag in side_tags: - self._cache.cache_output(transform_node, tag, step) + # We have to cache output for all tags since some transforms may produce + # multiple outputs. + for output_tag in output_tags: + self._cache.cache_output(transform_node, output_tag, step) # Finally, we add the display data items to the pipeline step. # If the transform contains no display data then an empty list is added. 
@@ -776,6 +716,14 @@ def _add_step(self, step_kind, step_label, transform_node, side_tags=()): for item in DisplayData.create_from(transform_node.transform).items ]) + if transform_node.resource_hints: + step.add_property( + PropertyNames.RESOURCE_HINTS, + { + hint: quote_from_bytes(value) + for (hint, value) in transform_node.resource_hints.items() + }) + return step def _add_singleton_step( @@ -892,12 +840,6 @@ def _verify_gbk_coders(self, transform, pcoll): parent = pcoll.producer if parent: - # Skip the check because we can assume that any x-lang transform is - # properly formed (the onus is on the expansion service to construct - # transforms correctly). - if isinstance(parent.transform, RunnerAPIPTransformHolder): - return - coder = parent.transform._infer_output_coder() # pylint: disable=protected-access if not coder: coder = self._get_coder(pcoll.element_type or typehints.Any, None) @@ -939,35 +881,32 @@ def run_GroupByKey(self, transform_node, options): PropertyNames.SERIALIZED_FN, self.serialize_windowing_strategy(windowing, self._default_environment)) - def run_RunnerAPIPTransformHolder(self, transform_node, options): - """Adding Dataflow runner job description for transform holder objects. + def run_ExternalTransform(self, transform_node, options): + # Adds a dummy step to the Dataflow job description so that inputs and + # outputs are mapped correctly in the presence of external transforms. + # + # Note that Dataflow Python multi-language pipelines use Portable Job + # Submission by default, hence this step and rest of the Dataflow step + # definitions defined here are not used at Dataflow service but we have to + # maintain the mapping correctly till we can fully drop the Dataflow step + # definitions from the SDK. + + # AppliedTransform node outputs have to be updated to correctly map the + # outputs for external transforms. + transform_node.outputs = ({ + output.tag: output + for output in transform_node.outputs.values() + }) - These holder transform objects are generated for some of the transforms that - become available after a cross-language transform expansion, usually if the - corresponding transform object cannot be generated in Python SDK (for - example, a python `ParDo` transform cannot be generated without a serialized - Python `DoFn` object). - """ - urn = transform_node.transform.proto().urn - assert urn - # TODO(chamikara): support other transforms that requires holder objects in - # Python SDk. - if common_urns.primitives.PAR_DO.urn == urn: - self.run_ParDo(transform_node, options) - else: - raise NotImplementedError( - '%s uses unsupported URN: %s' % (transform_node.full_label, urn)) + self.run_Impulse(transform_node, options) def run_ParDo(self, transform_node, options): transform = transform_node.transform input_tag = transform_node.inputs[0].tag input_step = self._cache.get_pvalue(transform_node.inputs[0]) - is_external_transform = isinstance(transform, RunnerAPIPTransformHolder) - # Attach side inputs. 
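# Hedged illustration (not part of the patch) of the encoding used for the
# RESOURCE_HINTS step property added earlier in this hunk: hint values are
# bytes and are percent-encoded with urllib.parse.quote_from_bytes before being
# written into the Dataflow job description.
from urllib.parse import quote_from_bytes

assert quote_from_bytes(b'gpu') == 'gpu'          # printable ASCII passes through
assert quote_from_bytes(b'\x00\xff') == '%00%FF'  # arbitrary bytes are %-escaped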
si_dict = {} - all_input_labels = transform_node.input_tags_to_preserve si_labels = {} full_label_counts = defaultdict(int) lookup_label = lambda side_pval: si_labels[side_pval] @@ -977,13 +916,10 @@ def run_ParDo(self, transform_node, options): assert isinstance(side_pval, AsSideInput) step_name = 'SideInput-' + self._get_unique_step_name() si_label = ((SIDE_INPUT_PREFIX + '%d-%s') % - (ix, transform_node.full_label) - if side_pval.pvalue not in all_input_labels else - all_input_labels[side_pval.pvalue]) + (ix, transform_node.full_label)) old_label = (SIDE_INPUT_PREFIX + '%d') % ix - if not is_external_transform: - label_renames[old_label] = si_label + label_renames[old_label] = si_label assert old_label in named_inputs pcollection_label = '%s.%s' % ( @@ -1032,14 +968,14 @@ def run_ParDo(self, transform_node, options): if (label_renames and transform_proto.spec.urn == common_urns.primitives.PAR_DO.urn): # Patch PTransform proto. - for old, new in iteritems(label_renames): + for old, new in label_renames.items(): transform_proto.inputs[new] = transform_proto.inputs[old] del transform_proto.inputs[old] # Patch ParDo proto. proto_type, _ = beam.PTransform._known_urns[transform_proto.spec.urn] proto = proto_utils.parse_Bytes(transform_proto.spec.payload, proto_type) - for old, new in iteritems(label_renames): + for old, new in label_renames.items(): proto.side_inputs[new].CopyFrom(proto.side_inputs[old]) del proto.side_inputs[old] transform_proto.spec.payload = proto.SerializeToString() @@ -1084,7 +1020,7 @@ def run_ParDo(self, transform_node, options): # external transforms, we leave tags unmodified. # # Python SDK uses 'None' as the tag of the main output. - main_output_tag = (all_output_tags[0] if is_external_transform else 'None') + main_output_tag = 'None' step.encoding = self._get_encoded_output_coder( transform_node, output_tag=main_output_tag) @@ -1122,9 +1058,7 @@ def run_ParDo(self, transform_node, options): self._get_cloud_encoding(restriction_coder)) if options.view_as(StandardOptions).streaming: - is_stateful_dofn = ( - transform.is_pardo_with_stateful_dofn if is_external_transform else - DoFnSignature(transform.dofn).is_stateful_dofn()) + is_stateful_dofn = (DoFnSignature(transform.dofn).is_stateful_dofn()) if is_stateful_dofn: step.add_property(PropertyNames.USES_KEYED_STATE, 'true') @@ -1535,11 +1469,6 @@ def get_default_gcp_region(self): return environment_region try: cmd = ['gcloud', 'config', 'get-value', 'compute/region'] - # Use subprocess.DEVNULL in Python 3.3+. 
- if hasattr(subprocess, 'DEVNULL'): - DEVNULL = subprocess.DEVNULL - else: - DEVNULL = open(os.devnull, 'ab') raw_output = processes.check_output(cmd, stderr=DEVNULL) formatted_output = raw_output.decode('utf-8').strip() if formatted_output: diff --git a/sdks/python/apache_beam/runners/dataflow/dataflow_runner_test.py b/sdks/python/apache_beam/runners/dataflow/dataflow_runner_test.py index 436d5d15a275..e7ce71b4f247 100644 --- a/sdks/python/apache_beam/runners/dataflow/dataflow_runner_test.py +++ b/sdks/python/apache_beam/runners/dataflow/dataflow_runner_test.py @@ -19,32 +19,28 @@ # pytype: skip-file -from __future__ import absolute_import - import json -import sys import unittest -from builtins import object -from builtins import range from datetime import datetime -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import import mock +from parameterized import param +from parameterized import parameterized import apache_beam as beam import apache_beam.transforms as ptransform from apache_beam.coders import BytesCoder -from apache_beam.coders import coders from apache_beam.options.pipeline_options import DebugOptions from apache_beam.options.pipeline_options import PipelineOptions from apache_beam.pipeline import AppliedPTransform from apache_beam.pipeline import Pipeline from apache_beam.portability import common_urns +from apache_beam.portability import python_urns from apache_beam.portability.api import beam_runner_api_pb2 from apache_beam.pvalue import PCollection from apache_beam.runners import DataflowRunner from apache_beam.runners import TestDataflowRunner +from apache_beam.runners import common from apache_beam.runners import create_runner from apache_beam.runners.dataflow.dataflow_runner import DataflowPipelineResult from apache_beam.runners.dataflow.dataflow_runner import DataflowRuntimeException @@ -207,9 +203,29 @@ def test_create_runner(self): self.assertTrue( isinstance(create_runner('TestDataflowRunner'), TestDataflowRunner)) - def test_environment_override_translation(self): + def test_environment_override_translation_legacy_worker_harness_image(self): self.default_properties.append('--experiments=beam_fn_api') - self.default_properties.append('--worker_harness_container_image=FOO') + self.default_properties.append('--worker_harness_container_image=LEGACY') + remote_runner = DataflowRunner() + with Pipeline(remote_runner, + options=PipelineOptions(self.default_properties)) as p: + ( # pylint: disable=expression-not-assigned + p | ptransform.Create([1, 2, 3]) + | 'Do' >> ptransform.FlatMap(lambda x: [(x, x)]) + | ptransform.GroupByKey()) + self.assertEqual( + list(remote_runner.proto_pipeline.components.environments.values()), + [ + beam_runner_api_pb2.Environment( + urn=common_urns.environments.DOCKER.urn, + payload=beam_runner_api_pb2.DockerPayload( + container_image='LEGACY').SerializeToString(), + capabilities=environments.python_sdk_capabilities()) + ]) + + def test_environment_override_translation_sdk_container_image(self): + self.default_properties.append('--experiments=beam_fn_api') + self.default_properties.append('--sdk_container_image=FOO') remote_runner = DataflowRunner() with Pipeline(remote_runner, options=PipelineOptions(self.default_properties)) as p: @@ -332,9 +348,10 @@ def test_group_by_key_input_visitor_with_valid_inputs(self): pcoll2.element_type = typehints.Any pcoll3.element_type = typehints.KV[typehints.Any, typehints.Any] for pcoll in [pcoll1, pcoll2, pcoll3]: - applied = 
AppliedPTransform(None, beam.GroupByKey(), "label", [pcoll]) + applied = AppliedPTransform( + None, beam.GroupByKey(), "label", {'pcoll': pcoll}) applied.outputs[None] = PCollection(None) - DataflowRunner.group_by_key_input_visitor().visit_transform(applied) + common.group_by_key_input_visitor().visit_transform(applied) self.assertEqual( pcoll.element_type, typehints.KV[typehints.Any, typehints.Any]) @@ -350,16 +367,16 @@ def test_group_by_key_input_visitor_with_invalid_inputs(self): "Found .*") for pcoll in [pcoll1, pcoll2]: with self.assertRaisesRegex(ValueError, err_msg): - DataflowRunner.group_by_key_input_visitor().visit_transform( - AppliedPTransform(None, beam.GroupByKey(), "label", [pcoll])) + common.group_by_key_input_visitor().visit_transform( + AppliedPTransform(None, beam.GroupByKey(), "label", {'in': pcoll})) def test_group_by_key_input_visitor_for_non_gbk_transforms(self): p = TestPipeline() pcoll = PCollection(p) for transform in [beam.Flatten(), beam.Map(lambda x: x)]: pcoll.element_type = typehints.Any - DataflowRunner.group_by_key_input_visitor().visit_transform( - AppliedPTransform(None, transform, "label", [pcoll])) + common.group_by_key_input_visitor().visit_transform( + AppliedPTransform(None, transform, "label", {'in': pcoll})) self.assertEqual(pcoll.element_type, typehints.Any) def test_flatten_input_with_visitor_with_single_input(self): @@ -371,11 +388,11 @@ def test_flatten_input_with_visitor_with_multiple_inputs(self): def _test_flatten_input_visitor(self, input_type, output_type, num_inputs): p = TestPipeline() - inputs = [] - for _ in range(num_inputs): + inputs = {} + for ix in range(num_inputs): input_pcoll = PCollection(p) input_pcoll.element_type = input_type - inputs.append(input_pcoll) + inputs[str(ix)] = input_pcoll output_pcoll = PCollection(p) output_pcoll.element_type = output_type @@ -383,7 +400,7 @@ def _test_flatten_input_visitor(self, input_type, output_type, num_inputs): flatten.add_output(output_pcoll, None) DataflowRunner.flatten_input_visitor().visit_transform(flatten) for _ in range(num_inputs): - self.assertEqual(inputs[0].element_type, output_type) + self.assertEqual(inputs['0'].element_type, output_type) def test_gbk_then_flatten_input_visitor(self): p = TestPipeline( @@ -398,7 +415,7 @@ def test_gbk_then_flatten_input_visitor(self): # to make sure the check below is not vacuous. 
self.assertNotIsInstance(flat.element_type, typehints.TupleConstraint) - p.visit(DataflowRunner.group_by_key_input_visitor()) + p.visit(common.group_by_key_input_visitor()) p.visit(DataflowRunner.flatten_input_visitor()) # The dataflow runner requires gbk input to be tuples *and* flatten @@ -426,7 +443,7 @@ def test_side_input_visitor(self): z: (x, y, z), beam.pvalue.AsSingleton(pc), beam.pvalue.AsMultiMap(pc)) - applied_transform = AppliedPTransform(None, transform, "label", [pc]) + applied_transform = AppliedPTransform(None, transform, "label", {'pc': pc}) DataflowRunner.side_input_visitor( use_fn_api=True).visit_transform(applied_transform) self.assertEqual(2, len(applied_transform.side_inputs)) @@ -503,8 +520,7 @@ def test_use_fastavro_experiment_is_added_on_py3_and_onwards(self): with Pipeline(remote_runner, PipelineOptions(self.default_properties)) as p: p | ptransform.Create([1]) # pylint: disable=expression-not-assigned - self.assertEqual( - sys.version_info[0] > 2, + self.assertTrue( remote_runner.job.options.view_as(DebugOptions).lookup_experiment( 'use_fastavro', False)) @@ -519,23 +535,6 @@ def test_use_fastavro_experiment_is_not_added_when_use_avro_is_present(self): self.assertFalse(debug_options.lookup_experiment('use_fastavro', False)) - def test_unsupported_fnapi_features(self): - remote_runner = DataflowRunner() - self.default_properties.append('--experiment=beam_fn_api') - self.default_properties.append('--experiment=use_runner_v2') - - with self.assertRaisesRegex(RuntimeError, 'Unsupported merging'): - with Pipeline(remote_runner, - options=PipelineOptions(self.default_properties)) as p: - # pylint: disable=expression-not-assigned - p | beam.Create([]) | beam.WindowInto(CustomMergingWindowFn()) - - with self.assertRaisesRegex(RuntimeError, 'Unsupported window coder'): - with Pipeline(remote_runner, - options=PipelineOptions(self.default_properties)) as p: - # pylint: disable=expression-not-assigned - p | beam.Create([]) | beam.WindowInto(CustomWindowTypeWindowFn()) - @mock.patch('os.environ.get', return_value=None) @mock.patch('apache_beam.utils.processes.check_output', return_value=b'') def test_get_default_gcp_region_no_default_returns_none( @@ -791,8 +790,6 @@ def teardown(self, *args, **kwargs): def _run_group_into_batches_and_get_step_properties( self, with_sharded_key, additional_properties): self.default_properties.append('--streaming') - self.default_properties.append( - '--experiment=enable_streaming_auto_sharding') for property in additional_properties: self.default_properties.append(property) @@ -815,58 +812,98 @@ def _run_group_into_batches_and_get_step_properties( def test_group_into_batches_translation(self): properties = self._run_group_into_batches_and_get_step_properties( - True, ['--enable_streaming_engine', '--experiment=use_runner_v2']) + True, ['--enable_streaming_engine', '--experiments=use_runner_v2']) self.assertEqual(properties[PropertyNames.USES_KEYED_STATE], u'true') self.assertEqual(properties[PropertyNames.ALLOWS_SHARDABLE_STATE], u'true') self.assertEqual(properties[PropertyNames.PRESERVES_KEYS], u'true') - def test_group_into_batches_translation_non_se(self): - properties = self._run_group_into_batches_and_get_step_properties( - True, ['--experiment=use_runner_v2']) - self.assertEqual(properties[PropertyNames.USES_KEYED_STATE], u'true') - self.assertFalse(PropertyNames.ALLOWS_SHARDABLE_STATE in properties) - self.assertFalse(PropertyNames.PRESERVES_KEYS in properties) - def test_group_into_batches_translation_non_sharded(self): properties = 
self._run_group_into_batches_and_get_step_properties( - False, ['--enable_streaming_engine', '--experiment=use_runner_v2']) + False, ['--enable_streaming_engine', '--experiments=use_runner_v2']) self.assertEqual(properties[PropertyNames.USES_KEYED_STATE], u'true') - self.assertFalse(PropertyNames.ALLOWS_SHARDABLE_STATE in properties) - self.assertFalse(PropertyNames.PRESERVES_KEYS in properties) + self.assertNotIn(PropertyNames.ALLOWS_SHARDABLE_STATE, properties) + self.assertNotIn(PropertyNames.PRESERVES_KEYS, properties) + + def test_group_into_batches_translation_non_se(self): + with self.assertRaisesRegex( + ValueError, + 'Runner determined sharding not available in Dataflow for ' + 'GroupIntoBatches for non-Streaming-Engine jobs'): + _ = self._run_group_into_batches_and_get_step_properties( + True, ['--experiments=use_runner_v2']) def test_group_into_batches_translation_non_unified_worker(self): # non-portable - properties = self._run_group_into_batches_and_get_step_properties( - True, ['--enable_streaming_engine']) - self.assertEqual(properties[PropertyNames.USES_KEYED_STATE], u'true') - self.assertFalse(PropertyNames.ALLOWS_SHARDABLE_STATE in properties) - self.assertFalse(PropertyNames.PRESERVES_KEYS in properties) + with self.assertRaisesRegex( + ValueError, + 'Runner determined sharding not available in Dataflow for ' + 'GroupIntoBatches for jobs not using Runner V2'): + _ = self._run_group_into_batches_and_get_step_properties( + True, ['--enable_streaming_engine']) # JRH - properties = self._run_group_into_batches_and_get_step_properties( - True, ['--enable_streaming_engine', '--experiment=beam_fn_api']) - self.assertEqual(properties[PropertyNames.USES_KEYED_STATE], u'true') - self.assertFalse(PropertyNames.ALLOWS_SHARDABLE_STATE in properties) - self.assertFalse(PropertyNames.PRESERVES_KEYS in properties) - - -class CustomMergingWindowFn(window.WindowFn): - def assign(self, assign_context): - return [] + with self.assertRaisesRegex( + ValueError, + 'Runner determined sharding not available in Dataflow for ' + 'GroupIntoBatches for jobs not using Runner V2'): + _ = self._run_group_into_batches_and_get_step_properties( + True, ['--enable_streaming_engine', '--experiments=beam_fn_api']) + + def test_pack_combiners(self): + class PackableCombines(beam.PTransform): + def annotations(self): + return {python_urns.APPLY_COMBINER_PACKING: b''} + + def expand(self, pcoll): + _ = pcoll | 'PackableMin' >> beam.CombineGlobally(min) + _ = pcoll | 'PackableMax' >> beam.CombineGlobally(max) - def merge(self, merge_context): - pass - - def get_window_coder(self): - return coders.IntervalWindowCoder() - - -class CustomWindowTypeWindowFn(window.NonMergingWindowFn): - def assign(self, assign_context): - return [] + runner = DataflowRunner() + with beam.Pipeline(runner=runner, + options=PipelineOptions(self.default_properties)) as p: + _ = p | beam.Create([10, 20, 30]) | PackableCombines() + + unpacked_minimum_step_name = ( + 'PackableCombines/PackableMin/CombinePerKey/Combine') + unpacked_maximum_step_name = ( + 'PackableCombines/PackableMax/CombinePerKey/Combine') + packed_step_name = ( + 'PackableCombines/Packed[PackableMin_CombinePerKey, ' + 'PackableMax_CombinePerKey]/Pack') + transform_names = set( + transform.unique_name + for transform in runner.proto_pipeline.components.transforms.values()) + self.assertNotIn(unpacked_minimum_step_name, transform_names) + self.assertNotIn(unpacked_maximum_step_name, transform_names) + self.assertIn(packed_step_name, transform_names) + + 
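# A standalone sketch of the combiner-packing hook exercised by
# test_pack_combiners above: a composite transform opts in by returning the
# APPLY_COMBINER_PACKING annotation, and runners that support packing may then
# fuse the sibling combines. Whether packing actually happens depends on the
# runner; the packed step name asserted above is specific to the Dataflow
# translation. Names below other than the Beam imports are illustrative.
import apache_beam as beam
from apache_beam.portability import python_urns


class TwoCombines(beam.PTransform):
  def annotations(self):
    # An empty payload under APPLY_COMBINER_PACKING marks this composite as
    # safe to pack.
    return {python_urns.APPLY_COMBINER_PACKING: b''}

  def expand(self, pcoll):
    return (
        pcoll | 'Min' >> beam.CombineGlobally(min),
        pcoll | 'Max' >> beam.CombineGlobally(max))


with beam.Pipeline() as p:
  _ = p | beam.Create([10, 20, 30]) | TwoCombines()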
@parameterized.expand([ + param(memory_hint='min_ram'), + param(memory_hint='minRam'), + ]) + def test_resource_hints_translation(self, memory_hint): + runner = DataflowRunner() + self.default_properties.append('--resource_hint=accelerator=some_gpu') + self.default_properties.append(f'--resource_hint={memory_hint}=20GB') + with beam.Pipeline(runner=runner, + options=PipelineOptions(self.default_properties)) as p: + # pylint: disable=expression-not-assigned + ( + p + | beam.Create([1]) + | 'MapWithHints' >> beam.Map(lambda x: x + 1).with_resource_hints( + min_ram='10GB', + accelerator='type:nvidia-tesla-k80;count:1;install-nvidia-drivers' + )) - def get_window_coder(self): - return coders.BytesCoder() + step = self._find_step(runner.job, 'MapWithHints') + self.assertEqual( + step['properties']['resource_hints'], + { + 'beam:resources:min_ram_bytes:v1': '20000000000', + 'beam:resources:accelerator:v1': \ + 'type%3Anvidia-tesla-k80%3Bcount%3A1%3Binstall-nvidia-drivers' + }) if __name__ == '__main__': diff --git a/sdks/python/apache_beam/runners/dataflow/internal/__init__.py b/sdks/python/apache_beam/runners/dataflow/internal/__init__.py index f4f43cbb1236..cce3acad34a4 100644 --- a/sdks/python/apache_beam/runners/dataflow/internal/__init__.py +++ b/sdks/python/apache_beam/runners/dataflow/internal/__init__.py @@ -14,4 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. # -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py b/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py index b9de09bab378..c60e88e7c070 100644 --- a/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py +++ b/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py @@ -21,14 +21,14 @@ # pytype: skip-file -from __future__ import absolute_import - import codecs import getpass import io import json import logging import os +import random + import pkg_resources import re import sys @@ -37,9 +37,6 @@ from copy import copy from datetime import datetime -from builtins import object -from past.builtins import unicode - from apitools.base.py import encoding from apitools.base.py import exceptions @@ -48,6 +45,7 @@ from apache_beam.internal.gcp.json_value import to_json_value from apache_beam.internal.http_client import get_new_http from apache_beam.io.filesystems import FileSystems +from apache_beam.io.gcp.gcsfilesystem import GCSFileSystem from apache_beam.io.gcp.internal.clients import storage from apache_beam.options.pipeline_options import DebugOptions from apache_beam.options.pipeline_options import GoogleCloudOptions @@ -63,6 +61,7 @@ from apache_beam.transforms import DataflowDistributionCounter from apache_beam.transforms import cy_combiners from apache_beam.transforms.display import DisplayData +from apache_beam.transforms.environments import is_apache_beam_container from apache_beam.utils import retry from apache_beam.utils import proto_utils @@ -281,18 +280,11 @@ def __init__( # Dataflow workers. environments_to_use = self._get_environments_from_tranforms() if _use_unified_worker(options): - # Adding a SDK container image for the pipeline SDKs - container_image = dataflow.SdkHarnessContainerImage() - pipeline_sdk_container_image = get_container_image_from_options(options) - container_image.containerImage = pipeline_sdk_container_image - container_image.useSingleCorePerContainer = True # True for Python SDK. 
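# A standalone sketch of the two encodings asserted by
# test_resource_hints_translation above; the helper below is illustrative and
# not a Beam API. min_ram values are expressed in bytes (decimal units, so
# '20GB' -> 20 * 10**9), and the pipeline-level 20GB wins over the
# transform-level 10GB because min_ram hints appear to merge by maximum.
# Accelerator strings are percent-encoded before landing in the step
# properties.
from urllib.parse import quote


def min_ram_gb_to_bytes(hint):
  # Only handles the 'GB' suffix used in the test above.
  assert hint.endswith('GB')
  return int(hint[:-2]) * 10**9


print(min_ram_gb_to_bytes('20GB'))  # 20000000000
print(quote('type:nvidia-tesla-k80;count:1;install-nvidia-drivers', safe=''))
# type%3Anvidia-tesla-k80%3Bcount%3A1%3Binstall-nvidia-drivers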
- pool.sdkHarnessContainerImages.append(container_image) - - already_added_containers = [pipeline_sdk_container_image] + python_sdk_container_image = get_container_image_from_options(options) # Adding container images for other SDKs that may be needed for # cross-language pipelines. - for environment in environments_to_use: + for id, environment in environments_to_use: if environment.urn != common_urns.environments.DOCKER.urn: raise Exception( 'Dataflow can only execute pipeline steps in Docker environments.' @@ -300,26 +292,14 @@ def __init__( environment_payload = proto_utils.parse_Bytes( environment.payload, beam_runner_api_pb2.DockerPayload) container_image_url = environment_payload.container_image - if container_image_url in already_added_containers: - # Do not add the pipeline environment again. - - # Currently, Dataflow uses Docker container images to uniquely - # identify execution environments. Hence Dataflow executes all - # transforms that specify the the same Docker container image in a - # single container instance. Dependencies of all environments that - # specify a given container image will be staged in the container - # instance for that particular container image. - # TODO(BEAM-9455): loosen this restriction to support multiple - # environments with the same container image when Dataflow supports - # environment specific artifact provisioning. - continue - already_added_containers.append(container_image_url) container_image = dataflow.SdkHarnessContainerImage() container_image.containerImage = container_image_url # Currently we only set following to True for Python SDK. # TODO: set this correctly for remote environments that might be Python. - container_image.useSingleCorePerContainer = False + container_image.useSingleCorePerContainer = ( + container_image_url == python_sdk_container_image) + container_image.environmentId = id pool.sdkHarnessContainerImages.append(container_image) if self.debug_options.number_of_worker_harness_threads: @@ -364,6 +344,13 @@ def __init__( dataflow.Environment.SdkPipelineOptionsValue.AdditionalProperty( key='display_data', value=to_json_value(items))) + if self.google_cloud_options.dataflow_service_options: + for option in self.google_cloud_options.dataflow_service_options: + self.proto.serviceOptions.append(option) + + if self.google_cloud_options.enable_hot_key_logging: + self.proto.debugOptions = dataflow.DebugOptions(enableHotKeyLogging=True) + def _get_environments_from_tranforms(self): if not self._proto_pipeline: return [] @@ -373,10 +360,8 @@ def _get_environments_from_tranforms(self): for transform in self._proto_pipeline.components.transforms.values() if transform.environment_id) - return [ - self._proto_pipeline.components.environments[id] - for id in environment_ids - ] + return [(id, self._proto_pipeline.components.environments[id]) + for id in environment_ids] def _get_python_sdk_name(self): python_version = '%d.%d' % (sys.version_info[0], sys.version_info[1]) @@ -404,7 +389,7 @@ def encode_shortstrings(input_buffer, errors='strict'): def decode_shortstrings(input_buffer, errors='strict'): """Decoder (to Unicode) that suppresses long base64 strings.""" shortened, length = encode_shortstrings(input_buffer, errors) - return unicode(shortened), length + return str(shortened), length def shortstrings_registerer(encoding_name): if encoding_name == 'shortstrings': @@ -500,6 +485,9 @@ def __init__(self, options, proto_pipeline): self.proto.transformNameMapping.additionalProperties.append( 
dataflow.Job.TransformNameMappingValue.AdditionalProperty( key=key, value=value)) + if self.google_cloud_options.create_from_snapshot: + self.proto.createdFromSnapshotId = ( + self.google_cloud_options.create_from_snapshot) # Labels. if self.google_cloud_options.labels: self.proto.labels = dataflow.Job.LabelsValue() @@ -510,6 +498,11 @@ def __init__(self, options, proto_pipeline): self.proto.labels.additionalProperties.append( dataflow.Job.LabelsValue.AdditionalProperty(key=key, value=value)) + # Client Request ID + self.proto.clientRequestId = '{}-{}'.format( + datetime.utcnow().strftime('%Y%m%d%H%M%S%f'), + random.randrange(9000) + 1000) + self.base64_str_re = re.compile(r'^[A-Za-z0-9+/]*=*$') self.coder_str_re = re.compile(r'^([A-Za-z]+\$)([A-Za-z0-9+/]*=*)$') @@ -560,11 +553,12 @@ def __init__(self, options): def _get_sdk_image_overrides(self, pipeline_options): worker_options = pipeline_options.view_as(WorkerOptions) sdk_overrides = worker_options.sdk_harness_container_image_overrides - if sdk_overrides: - return dict(override_str.split(',', 1) for override_str in sdk_overrides) + return ( + dict(s.split(',', 1) + for s in sdk_overrides) if sdk_overrides else dict()) - # TODO(silviuc): Refactor so that retry logic can be applied. - @retry.no_retries # Using no_retries marks this as an integration point. + @retry.with_exponential_backoff( + retry_filter=retry.retry_on_server_errors_and_timeout_filter) def _gcs_file_copy(self, from_path, to_path): to_folder, to_name = os.path.split(to_path) total_size = os.path.getsize(from_path) @@ -579,7 +573,7 @@ def _stage_resources(self, pipeline, options): raise RuntimeError('The --temp_location option must be specified.') resources = [] - hashs = {} + hashes = {} for _, env in sorted(pipeline.components.environments.items(), key=lambda kv: kv[0]): for dep in env.dependencies: @@ -592,16 +586,31 @@ def _stage_resources(self, pipeline, options): role_payload = ( beam_runner_api_pb2.ArtifactStagingToRolePayload.FromString( dep.role_payload)) - if type_payload.sha256 and type_payload.sha256 in hashs: + if type_payload.sha256 and type_payload.sha256 in hashes: _LOGGER.info( 'Found duplicated artifact: %s (%s)', type_payload.path, type_payload.sha256) + staged_name = hashes[type_payload.sha256] dep.role_payload = beam_runner_api_pb2.ArtifactStagingToRolePayload( - staged_name=hashs[type_payload.sha256]).SerializeToString() + staged_name=staged_name).SerializeToString() else: - resources.append((type_payload.path, role_payload.staged_name)) - hashs[type_payload.sha256] = role_payload.staged_name + staged_name = role_payload.staged_name + resources.append((type_payload.path, staged_name)) + hashes[type_payload.sha256] = staged_name + + if FileSystems.get_scheme( + google_cloud_options.staging_location) == GCSFileSystem.scheme(): + dep.type_urn = common_urns.artifact_types.URL.urn + dep.type_payload = beam_runner_api_pb2.ArtifactUrlPayload( + url=FileSystems.join( + google_cloud_options.staging_location, staged_name), + sha256=type_payload.sha256).SerializeToString() + else: + dep.type_payload = beam_runner_api_pb2.ArtifactFilePayload( + path=FileSystems.join( + google_cloud_options.staging_location, staged_name), + sha256=type_payload.sha256).SerializeToString() resource_stager = _LegacyDataflowStager(self) staged_resources = resource_stager.stage_job_resources( @@ -659,14 +668,11 @@ def create_job(self, job): template_location = ( job.options.view_as(GoogleCloudOptions).template_location) - job_location = template_location or dataflow_job_file - if 
job_location: - gcs_or_local_path = os.path.dirname(job_location) - file_name = os.path.basename(job_location) - self.stage_file( - gcs_or_local_path, file_name, io.BytesIO(job.json().encode('utf-8'))) - if job.options.view_as(DebugOptions).lookup_experiment('upload_graph'): + # For Runner V2, also set portable job submission. + if _use_unified_worker(job.options): + job.options.view_as(DebugOptions).add_experiment( + 'use_portable_job_submission') self.stage_file( job.options.view_as(GoogleCloudOptions).staging_location, "dataflow_graph.json", @@ -676,6 +682,15 @@ def create_job(self, job): job.options.view_as(GoogleCloudOptions).staging_location, "dataflow_graph.json") + # template file generation should be placed immediately before the + # conditional API call. + job_location = template_location or dataflow_job_file + if job_location: + gcs_or_local_path = os.path.dirname(job_location) + file_name = os.path.basename(job_location) + self.stage_file( + gcs_or_local_path, file_name, io.BytesIO(job.json().encode('utf-8'))) + if not template_location: return self.submit_job_description(job) @@ -684,22 +699,59 @@ def create_job(self, job): return None @staticmethod - def _apply_sdk_environment_overrides(proto_pipeline, sdk_overrides): - # Update environments based on user provided overrides - if sdk_overrides: - for environment in proto_pipeline.components.environments.values(): - docker_payload = proto_utils.parse_Bytes( - environment.payload, beam_runner_api_pb2.DockerPayload) - for pattern, override in sdk_overrides.items(): - new_payload = copy(docker_payload) - new_payload.container_image = re.sub( - pattern, override, docker_payload.container_image) - environment.payload = new_payload.SerializeToString() + def _update_container_image_for_dataflow(beam_container_image_url): + # By default Dataflow pipelines use containers hosted in Dataflow GCR + # instead of Docker Hub. + image_suffix = beam_container_image_url.rsplit('/', 1)[1] + return names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY + '/' + image_suffix + + @staticmethod + def _apply_sdk_environment_overrides( + proto_pipeline, sdk_overrides, pipeline_options): + # Updates container image URLs for Dataflow. + # For a given container image URL + # * If a matching override has been provided that will be used. + # * For improved performance, External Apache Beam container images that are + # not explicitly overridden will be + # updated to use GCR copies instead of directly downloading from the + # Docker Hub. + + current_sdk_container_image = get_container_image_from_options( + pipeline_options) + + for environment in proto_pipeline.components.environments.values(): + docker_payload = proto_utils.parse_Bytes( + environment.payload, beam_runner_api_pb2.DockerPayload) + overridden = False + new_container_image = docker_payload.container_image + for pattern, override in sdk_overrides.items(): + new_container_image = re.sub(pattern, override, new_container_image) + if new_container_image != docker_payload.container_image: + overridden = True + + # Container of the current (Python) SDK is overridden separately, hence + # not updated here. 
+ if (is_apache_beam_container(new_container_image) and not overridden and + new_container_image != current_sdk_container_image): + new_container_image = ( + DataflowApplicationClient._update_container_image_for_dataflow( + docker_payload.container_image)) + + if not new_container_image: + raise ValueError( + 'SDK Docker container image has to be a non-empty string') + + new_payload = copy(docker_payload) + new_payload.container_image = new_container_image + environment.payload = new_payload.SerializeToString() def create_job_description(self, job): """Creates a job described by the workflow proto.""" DataflowApplicationClient._apply_sdk_environment_overrides( - job.proto_pipeline, self._sdk_image_overrides) + job.proto_pipeline, self._sdk_image_overrides, job.options) + + # Stage other resources for the SDK harness + resources = self._stage_resources(job.proto_pipeline, job.options) # Stage proto pipeline. self.stage_file( @@ -707,9 +759,6 @@ def create_job_description(self, job): shared_names.STAGED_PIPELINE_FILENAME, io.BytesIO(job.proto_pipeline.SerializeToString())) - # Stage other resources for the SDK harness - resources = self._stage_resources(job.proto_pipeline, job.options) - job.proto.environment = Environment( proto_pipeline_staged_url=FileSystems.join( job.google_cloud_options.staging_location, @@ -753,6 +802,20 @@ def submit_job_description(self, job): self.google_cloud_options.dataflow_endpoint) _LOGGER.fatal('details of server error: %s', e) raise + + if response.clientRequestId and \ + response.clientRequestId != job.proto.clientRequestId: + if self.google_cloud_options.update: + raise DataflowJobAlreadyExistsError( + "The job named %s with id: %s has already been updated into job " + "id: %s and cannot be updated again." % + (response.name, job.proto.replaceJobId, response.id)) + else: + raise DataflowJobAlreadyExistsError( + 'There is already active job named %s with id: %s. If you want to ' + 'submit a second job, try again by setting a different name using ' + '--job_name.' % (response.name, response.id)) + _LOGGER.info('Create job: %s', response) # The response is a Job proto with the id for the new job. _LOGGER.info('Created job with id: [%s]', response.id) @@ -987,6 +1050,13 @@ def get_sdk_package_name(): return shared_names.BEAM_PACKAGE_NAME +class DataflowJobAlreadyExistsError(retry.PermanentException): + """A non-retryable exception that a job with the given name already exists.""" + # Inherits retry.PermanentException to avoid retry in + # DataflowApplicationClient.submit_job_description + pass + + def to_split_int(n): res = dataflow.SplitInt64() res.lowBits = n & 0xffffffff @@ -994,6 +1064,8 @@ def to_split_int(n): return res +# TODO: Used in legacy batch worker. Move under MetricUpdateTranslators +# after Runner V2 transition. def translate_distribution(distribution_update, metric_update_proto): """Translate metrics DistributionUpdate to dataflow distribution update. @@ -1014,20 +1086,11 @@ def translate_distribution(distribution_update, metric_update_proto): metric_update_proto.distribution = dist_update_proto +# TODO: Used in legacy batch worker. Delete after Runner V2 transition. 
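# A standalone sketch (outside the patch) of the image-resolution flow that
# _apply_sdk_environment_overrides implements above: user-supplied regex
# overrides are applied first, and Apache Beam container images that were
# neither overridden nor the pipeline's own SDK image are redirected to the
# Dataflow-hosted copy, keeping only the name:tag suffix. The repository
# constant and the startswith() check are illustrative stand-ins for
# names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY and is_apache_beam_container.
import re

DATAFLOW_REPOSITORY = 'gcr.io/cloud-dataflow/v1beta3'


def resolve_container_image(image, overrides, current_sdk_image):
  new_image = image
  for pattern, override in overrides.items():
    new_image = re.sub(pattern, override, new_image)
  overridden = new_image != image
  if (new_image.startswith('apache/beam') and not overridden and
      new_image != current_sdk_image):
    new_image = DATAFLOW_REPOSITORY + '/' + image.rsplit('/', 1)[1]
  if not new_image:
    raise ValueError('SDK Docker container image has to be a non-empty string')
  return new_image


print(
    resolve_container_image(
        'apache/beam_java8_sdk:2.35.0',
        {},
        'apache/beam_python3.8_sdk:2.35.0'))
# gcr.io/cloud-dataflow/v1beta3/beam_java8_sdk:2.35.0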
def translate_value(value, metric_update_proto): metric_update_proto.integer = to_split_int(value) -def translate_mean(accumulator, metric_update): - if accumulator.count: - metric_update.meanSum = to_json_value(accumulator.sum, with_type=True) - metric_update.meanCount = to_json_value(accumulator.count, with_type=True) - else: - # A denominator of 0 will raise an error in the service. - # What it means is we have nothing to report yet, so don't. - metric_update.kind = None - - def _use_fnapi(pipeline_options): standard_options = pipeline_options.view_as(StandardOptions) debug_options = pipeline_options.view_as(DebugOptions) @@ -1070,8 +1133,8 @@ def get_container_image_from_options(pipeline_options): str: Container image for remote execution. """ worker_options = pipeline_options.view_as(WorkerOptions) - if worker_options.worker_harness_container_image: - return worker_options.worker_harness_container_image + if worker_options.sdk_container_image: + return worker_options.sdk_container_image use_fnapi = _use_fnapi(pipeline_options) # TODO(tvalentyn): Use enumerated type instead of strings for job types. @@ -1129,7 +1192,7 @@ def get_runner_harness_container_image(): def get_response_encoding(): """Encoding to use to decode HTTP response from Google APIs.""" - return None if sys.version_info[0] < 3 else 'utf8' + return 'utf8' def _verify_interpreter_version_is_supported(pipeline_options): diff --git a/sdks/python/apache_beam/runners/dataflow/internal/apiclient_test.py b/sdks/python/apache_beam/runners/dataflow/internal/apiclient_test.py index 34f49880c6b3..b63da701a751 100644 --- a/sdks/python/apache_beam/runners/dataflow/internal/apiclient_test.py +++ b/sdks/python/apache_beam/runners/dataflow/internal/apiclient_test.py @@ -19,17 +19,15 @@ # pytype: skip-file -from __future__ import absolute_import - +import json import logging import sys import unittest -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import import mock from apache_beam.metrics.cells import DistributionData +from apache_beam.options.pipeline_options import GoogleCloudOptions from apache_beam.options.pipeline_options import PipelineOptions from apache_beam.pipeline import Pipeline from apache_beam.portability import common_urns @@ -173,7 +171,7 @@ def test_flexrs_speed(self): dataflow.Environment.FlexResourceSchedulingGoalValueValuesEnum. FLEXRS_SPEED_OPTIMIZED)) - def test_sdk_harness_container_images_get_set(self): + def test_default_environment_get_set(self): pipeline_options = PipelineOptions([ '--experiments=beam_fn_api', @@ -189,8 +187,6 @@ def test_sdk_harness_container_images_get_set(self): proto_pipeline, _ = pipeline.to_runner_api( return_context=True, default_environment=test_environment) - # We have to manually add environments since Dataflow only sets - # 'sdkHarnessContainerImages' when there are at least two environments. dummy_env = beam_runner_api_pb2.Environment( urn=common_urns.environments.DOCKER.urn, payload=( @@ -214,16 +210,13 @@ def test_sdk_harness_container_images_get_set(self): }) worker_pool = env.proto.workerPools[0] - # For the test, a third environment get added since actual default - # container image for Dataflow is different from 'test_default_image' - # we've provided above. - self.assertEqual(3, len(worker_pool.sdkHarnessContainerImages)) + self.assertEqual(2, len(worker_pool.sdkHarnessContainerImages)) - # Container image should be overridden by a Dataflow specific URL. 
- self.assertTrue( - str.startswith( - (worker_pool.sdkHarnessContainerImages[0]).containerImage, - 'gcr.io/cloud-dataflow/v1beta3/python')) + images_from_proto = [ + sdk_info.containerImage + for sdk_info in worker_pool.sdkHarnessContainerImages + ] + self.assertIn('test_default_image', images_from_proto) def test_sdk_harness_container_image_overrides(self): test_environment = DockerEnvironment( @@ -231,9 +224,21 @@ def test_sdk_harness_container_image_overrides(self): proto_pipeline, _ = Pipeline().to_runner_api( return_context=True, default_environment=test_environment) + pipeline_options = PipelineOptions([ + '--experiments=beam_fn_api', + '--experiments=use_unified_worker', + '--temp_location', + 'gs://any-location/temp' + ]) + # Accessing non-public method for testing. apiclient.DataflowApplicationClient._apply_sdk_environment_overrides( - proto_pipeline, {'.*dummy.*': 'new_dummy_container_image'}) + proto_pipeline, + { + '.*dummy.*': 'new_dummy_container_image', + '.*notfound.*': 'new_dummy_container_image_2' + }, + pipeline_options) self.assertIsNotNone(1, len(proto_pipeline.components.environments)) env = list(proto_pipeline.components.environments.values())[0] @@ -246,6 +251,106 @@ def test_sdk_harness_container_image_overrides(self): self.assertEqual( docker_payload.container_image, 'new_dummy_container_image') + def test_dataflow_container_image_override(self): + pipeline_options = PipelineOptions([ + '--experiments=beam_fn_api', + '--experiments=use_unified_worker', + '--temp_location', + 'gs://any-location/temp' + ]) + + pipeline = Pipeline(options=pipeline_options) + pipeline | Create([1, 2, 3]) | ParDo(DoFn()) # pylint:disable=expression-not-assigned + + dummy_env = DockerEnvironment( + container_image='apache/beam_dummy_name:dummy_tag') + proto_pipeline, _ = pipeline.to_runner_api( + return_context=True, default_environment=dummy_env) + + # Accessing non-public method for testing. + apiclient.DataflowApplicationClient._apply_sdk_environment_overrides( + proto_pipeline, dict(), pipeline_options) + + from apache_beam.utils import proto_utils + found_override = False + for env in proto_pipeline.components.environments.values(): + docker_payload = proto_utils.parse_Bytes( + env.payload, beam_runner_api_pb2.DockerPayload) + if docker_payload.container_image.startswith( + names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY): + found_override = True + + self.assertTrue(found_override) + + def test_non_apache_container_not_overridden(self): + pipeline_options = PipelineOptions([ + '--experiments=beam_fn_api', + '--experiments=use_unified_worker', + '--temp_location', + 'gs://any-location/temp' + ]) + + pipeline = Pipeline(options=pipeline_options) + pipeline | Create([1, 2, 3]) | ParDo(DoFn()) # pylint:disable=expression-not-assigned + + dummy_env = DockerEnvironment( + container_image='other_org/dummy_name:dummy_tag') + proto_pipeline, _ = pipeline.to_runner_api( + return_context=True, default_environment=dummy_env) + + # Accessing non-public method for testing. 
+ apiclient.DataflowApplicationClient._apply_sdk_environment_overrides( + proto_pipeline, dict(), pipeline_options) + + self.assertIsNotNone(2, len(proto_pipeline.components.environments)) + + from apache_beam.utils import proto_utils + found_override = False + for env in proto_pipeline.components.environments.values(): + docker_payload = proto_utils.parse_Bytes( + env.payload, beam_runner_api_pb2.DockerPayload) + if docker_payload.container_image.startswith( + names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY): + found_override = True + + self.assertFalse(found_override) + + def test_pipeline_sdk_not_overridden(self): + pipeline_options = PipelineOptions([ + '--experiments=beam_fn_api', + '--experiments=use_unified_worker', + '--temp_location', + 'gs://any-location/temp', + '--sdk_container_image=dummy_prefix/dummy_name:dummy_tag' + ]) + + pipeline = Pipeline(options=pipeline_options) + pipeline | Create([1, 2, 3]) | ParDo(DoFn()) # pylint:disable=expression-not-assigned + + proto_pipeline, _ = pipeline.to_runner_api(return_context=True) + + dummy_env = DockerEnvironment( + container_image='dummy_prefix/dummy_name:dummy_tag') + proto_pipeline, _ = pipeline.to_runner_api( + return_context=True, default_environment=dummy_env) + + # Accessing non-public method for testing. + apiclient.DataflowApplicationClient._apply_sdk_environment_overrides( + proto_pipeline, dict(), pipeline_options) + + self.assertIsNotNone(2, len(proto_pipeline.components.environments)) + + from apache_beam.utils import proto_utils + found_override = False + for env in proto_pipeline.components.environments.values(): + docker_payload = proto_utils.parse_Bytes( + env.payload, beam_runner_api_pb2.DockerPayload) + if docker_payload.container_image.startswith( + names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY): + found_override = True + + self.assertFalse(found_override) + def test_invalid_default_job_name(self): # Regexp for job names in dataflow. regexp = '^[a-z]([-a-z0-9]{0,61}[a-z0-9])?$' @@ -540,27 +645,14 @@ def test_pinned_worker_harness_image_tag_used_in_dev_sdk(self): pipeline_options, '2.0.0', #any environment version FAKE_PIPELINE_URL) - if sys.version_info[0] == 2: - self.assertEqual( - env.proto.workerPools[0].workerHarnessContainerImage, - ( - names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY + '/python-fnapi:' + - names.BEAM_FNAPI_CONTAINER_VERSION)) - elif sys.version_info[0:2] == (3, 5): - self.assertEqual( - env.proto.workerPools[0].workerHarnessContainerImage, - ( - names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY + '/python3-fnapi:' + - names.BEAM_FNAPI_CONTAINER_VERSION)) - else: - self.assertEqual( - env.proto.workerPools[0].workerHarnessContainerImage, - ( - names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY + - '/python%d%d-fnapi:%s' % ( - sys.version_info[0], - sys.version_info[1], - names.BEAM_FNAPI_CONTAINER_VERSION))) + self.assertEqual( + env.proto.workerPools[0].workerHarnessContainerImage, + ( + names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY + '/python%d%d-fnapi:%s' % + ( + sys.version_info[0], + sys.version_info[1], + names.BEAM_FNAPI_CONTAINER_VERSION))) # batch, legacy pipeline. 
pipeline_options = PipelineOptions( @@ -570,26 +662,13 @@ def test_pinned_worker_harness_image_tag_used_in_dev_sdk(self): pipeline_options, '2.0.0', #any environment version FAKE_PIPELINE_URL) - if sys.version_info[0] == 2: - self.assertEqual( - env.proto.workerPools[0].workerHarnessContainerImage, - ( - names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY + '/python:' + - names.BEAM_CONTAINER_VERSION)) - elif sys.version_info[0:2] == (3, 5): - self.assertEqual( - env.proto.workerPools[0].workerHarnessContainerImage, - ( - names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY + '/python3:' + - names.BEAM_CONTAINER_VERSION)) - else: - self.assertEqual( - env.proto.workerPools[0].workerHarnessContainerImage, - ( - names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY + '/python%d%d:%s' % ( - sys.version_info[0], - sys.version_info[1], - names.BEAM_CONTAINER_VERSION))) + self.assertEqual( + env.proto.workerPools[0].workerHarnessContainerImage, + ( + names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY + '/python%d%d:%s' % ( + sys.version_info[0], + sys.version_info[1], + names.BEAM_CONTAINER_VERSION))) @mock.patch( 'apache_beam.runners.dataflow.internal.apiclient.' @@ -604,21 +683,12 @@ def test_worker_harness_image_tag_matches_released_sdk_version(self): pipeline_options, '2.0.0', #any environment version FAKE_PIPELINE_URL) - if sys.version_info[0] == 2: - self.assertEqual( - env.proto.workerPools[0].workerHarnessContainerImage, - (names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY + '/python-fnapi:2.2.0')) - elif sys.version_info[0:2] == (3, 5): - self.assertEqual( - env.proto.workerPools[0].workerHarnessContainerImage, - (names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY + '/python3-fnapi:2.2.0')) - else: - self.assertEqual( - env.proto.workerPools[0].workerHarnessContainerImage, - ( - names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY + - '/python%d%d-fnapi:2.2.0' % - (sys.version_info[0], sys.version_info[1]))) + self.assertEqual( + env.proto.workerPools[0].workerHarnessContainerImage, + ( + names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY + + '/python%d%d-fnapi:2.2.0' % + (sys.version_info[0], sys.version_info[1]))) # batch, legacy pipeline. pipeline_options = PipelineOptions( @@ -628,20 +698,11 @@ def test_worker_harness_image_tag_matches_released_sdk_version(self): pipeline_options, '2.0.0', #any environment version FAKE_PIPELINE_URL) - if sys.version_info[0] == 2: - self.assertEqual( - env.proto.workerPools[0].workerHarnessContainerImage, - (names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY + '/python:2.2.0')) - elif sys.version_info[0:2] == (3, 5): - self.assertEqual( - env.proto.workerPools[0].workerHarnessContainerImage, - (names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY + '/python3:2.2.0')) - else: - self.assertEqual( - env.proto.workerPools[0].workerHarnessContainerImage, - ( - names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY + '/python%d%d:2.2.0' % - (sys.version_info[0], sys.version_info[1]))) + self.assertEqual( + env.proto.workerPools[0].workerHarnessContainerImage, + ( + names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY + '/python%d%d:2.2.0' % + (sys.version_info[0], sys.version_info[1]))) @mock.patch( 'apache_beam.runners.dataflow.internal.apiclient.' 
@@ -656,21 +717,12 @@ def test_worker_harness_image_tag_matches_base_sdk_version_of_an_rc(self): pipeline_options, '2.0.0', #any environment version FAKE_PIPELINE_URL) - if sys.version_info[0] == 2: - self.assertEqual( - env.proto.workerPools[0].workerHarnessContainerImage, - (names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY + '/python-fnapi:2.2.0')) - elif sys.version_info[0:2] == (3, 5): - self.assertEqual( - env.proto.workerPools[0].workerHarnessContainerImage, - (names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY + '/python3-fnapi:2.2.0')) - else: - self.assertEqual( - env.proto.workerPools[0].workerHarnessContainerImage, - ( - names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY + - '/python%d%d-fnapi:2.2.0' % - (sys.version_info[0], sys.version_info[1]))) + self.assertEqual( + env.proto.workerPools[0].workerHarnessContainerImage, + ( + names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY + + '/python%d%d-fnapi:2.2.0' % + (sys.version_info[0], sys.version_info[1]))) # batch, legacy pipeline. pipeline_options = PipelineOptions( @@ -680,20 +732,11 @@ def test_worker_harness_image_tag_matches_base_sdk_version_of_an_rc(self): pipeline_options, '2.0.0', #any environment version FAKE_PIPELINE_URL) - if sys.version_info[0] == 2: - self.assertEqual( - env.proto.workerPools[0].workerHarnessContainerImage, - (names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY + '/python:2.2.0')) - elif sys.version_info[0:2] == (3, 5): - self.assertEqual( - env.proto.workerPools[0].workerHarnessContainerImage, - (names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY + '/python3:2.2.0')) - else: - self.assertEqual( - env.proto.workerPools[0].workerHarnessContainerImage, - ( - names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY + '/python%d%d:2.2.0' % - (sys.version_info[0], sys.version_info[1]))) + self.assertEqual( + env.proto.workerPools[0].workerHarnessContainerImage, + ( + names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY + '/python%d%d:2.2.0' % + (sys.version_info[0], sys.version_info[1]))) def test_worker_harness_override_takes_precedence_over_sdk_defaults(self): # streaming, fnapi pipeline. 
@@ -701,7 +744,7 @@ def test_worker_harness_override_takes_precedence_over_sdk_defaults(self): '--temp_location', 'gs://any-location/temp', '--streaming', - '--worker_harness_container_image=some:image' + '--sdk_container_image=some:image' ]) env = apiclient.Environment( [], #packages @@ -714,7 +757,7 @@ def test_worker_harness_override_takes_precedence_over_sdk_defaults(self): pipeline_options = PipelineOptions([ '--temp_location', 'gs://any-location/temp', - '--worker_harness_container_image=some:image' + '--sdk_container_image=some:image' ]) env = apiclient.Environment( [], #packages @@ -743,6 +786,20 @@ def test_transform_name_mapping(self, mock_job): job = apiclient.Job(pipeline_options, FAKE_PIPELINE_URL) self.assertIsNotNone(job.proto.transformNameMapping) + def test_created_from_snapshot_id(self): + pipeline_options = PipelineOptions([ + '--project', + 'test_project', + '--job_name', + 'test_job_name', + '--temp_location', + 'gs://test-location/temp', + '--create_from_snapshot', + 'test_snapshot_id' + ]) + job = apiclient.Job(pipeline_options, FAKE_PIPELINE_URL) + self.assertEqual('test_snapshot_id', job.proto.createdFromSnapshotId) + def test_labels(self): pipeline_options = PipelineOptions([ '--project', @@ -952,11 +1009,9 @@ def test_use_unified_worker(self): def test_get_response_encoding(self): encoding = apiclient.get_response_encoding() - version_to_encoding = {3: 'utf8', 2: None} - assert encoding == version_to_encoding[sys.version_info[0]] + assert encoding == 'utf8' - @unittest.skip("Enable once BEAM-1080 is fixed.") def test_graph_is_uploaded(self): pipeline_options = PipelineOptions([ '--project', @@ -971,6 +1026,7 @@ def test_graph_is_uploaded(self): 'upload_graph' ]) job = apiclient.Job(pipeline_options, FAKE_PIPELINE_URL) + pipeline_options.view_as(GoogleCloudOptions).no_auth = True client = apiclient.DataflowApplicationClient(pipeline_options) with mock.patch.object(client, 'stage_file', side_effect=None): with mock.patch.object(client, 'create_job_description', @@ -983,6 +1039,129 @@ def test_graph_is_uploaded(self): mock.ANY, "dataflow_graph.json", mock.ANY) client.create_job_description.assert_called_once() + def test_create_job_returns_existing_job(self): + pipeline_options = PipelineOptions([ + '--project', + 'test_project', + '--job_name', + 'test_job_name', + '--temp_location', + 'gs://test-location/temp', + ]) + job = apiclient.Job(pipeline_options, FAKE_PIPELINE_URL) + self.assertTrue(job.proto.clientRequestId) # asserts non-empty string + pipeline_options.view_as(GoogleCloudOptions).no_auth = True + client = apiclient.DataflowApplicationClient(pipeline_options) + + response = dataflow.Job() + # different clientRequestId from `job` + response.clientRequestId = "20210821081910123456-1234" + response.name = 'test_job_name' + response.id = '2021-08-19_21_18_43-9756917246311111021' + + with mock.patch.object(client._client.projects_locations_jobs, + 'Create', + side_effect=[response]): + with mock.patch.object(client, 'create_job_description', + side_effect=None): + with self.assertRaises( + apiclient.DataflowJobAlreadyExistsError) as context: + client.create_job(job) + + self.assertEqual( + str(context.exception), + 'There is already active job named %s with id: %s. If you want to ' + 'submit a second job, try again by setting a different name using ' + '--job_name.' 
% ('test_job_name', response.id)) + + def test_update_job_returns_existing_job(self): + pipeline_options = PipelineOptions([ + '--project', + 'test_project', + '--job_name', + 'test_job_name', + '--temp_location', + 'gs://test-location/temp', + '--region', + 'us-central1', + '--update', + ]) + replace_job_id = '2021-08-21_00_00_01-6081497447916622336' + with mock.patch('apache_beam.runners.dataflow.internal.apiclient.Job.' + 'job_id_for_name', + return_value=replace_job_id) as job_id_for_name_mock: + job = apiclient.Job(pipeline_options, FAKE_PIPELINE_URL) + job_id_for_name_mock.assert_called_once() + + self.assertTrue(job.proto.clientRequestId) # asserts non-empty string + + pipeline_options.view_as(GoogleCloudOptions).no_auth = True + client = apiclient.DataflowApplicationClient(pipeline_options) + + response = dataflow.Job() + # different clientRequestId from `job` + response.clientRequestId = "20210821083254123456-1234" + response.name = 'test_job_name' + response.id = '2021-08-19_21_29_07-5725551945600207770' + + with mock.patch.object(client, 'create_job_description', side_effect=None): + with mock.patch.object(client._client.projects_locations_jobs, + 'Create', + side_effect=[response]): + + with self.assertRaises( + apiclient.DataflowJobAlreadyExistsError) as context: + client.create_job(job) + + self.assertEqual( + str(context.exception), + 'The job named %s with id: %s has already been updated into job ' + 'id: %s and cannot be updated again.' % + ('test_job_name', replace_job_id, response.id)) + + def test_template_file_generation_with_upload_graph(self): + pipeline_options = PipelineOptions([ + '--project', + 'test_project', + '--job_name', + 'test_job_name', + '--temp_location', + 'gs://test-location/temp', + '--experiments', + 'upload_graph', + '--template_location', + 'gs://test-location/template' + ]) + job = apiclient.Job(pipeline_options, FAKE_PIPELINE_URL) + job.proto.steps.append(dataflow.Step(name='test_step_name')) + + pipeline_options.view_as(GoogleCloudOptions).no_auth = True + client = apiclient.DataflowApplicationClient(pipeline_options) + with mock.patch.object(client, 'stage_file', side_effect=None): + with mock.patch.object(client, 'create_job_description', + side_effect=None): + with mock.patch.object(client, + 'submit_job_description', + side_effect=None): + client.create_job(job) + + client.stage_file.assert_has_calls([ + mock.call(mock.ANY, 'dataflow_graph.json', mock.ANY), + mock.call(mock.ANY, 'template', mock.ANY) + ]) + client.create_job_description.assert_called_once() + # template is generated, but job should not be submitted to the + # service. 
+ client.submit_job_description.assert_not_called() + + template_filename = client.stage_file.call_args_list[-1][0][1] + self.assertTrue('template' in template_filename) + template_content = client.stage_file.call_args_list[-1][0][2].read( + ).decode('utf-8') + template_obj = json.loads(template_content) + self.assertFalse(template_obj.get('steps')) + self.assertTrue(template_obj['stepsLocation']) + def test_stage_resources(self): pipeline_options = PipelineOptions([ '--temp_location', @@ -1046,6 +1225,91 @@ def test_stage_resources(self): ('/tmp/bar2', 'bar2')], staging_location='gs://test-location/staging') + pipeline_expected = beam_runner_api_pb2.Pipeline( + components=beam_runner_api_pb2.Components( + environments={ + 'env1': beam_runner_api_pb2.Environment( + dependencies=[ + beam_runner_api_pb2.ArtifactInformation( + type_urn=common_urns.artifact_types.URL.urn, + type_payload=beam_runner_api_pb2.ArtifactUrlPayload( + url='gs://test-location/staging/foo1' + ).SerializeToString(), + role_urn=common_urns.artifact_roles.STAGING_TO.urn, + role_payload=beam_runner_api_pb2. + ArtifactStagingToRolePayload( + staged_name='foo1').SerializeToString()), + beam_runner_api_pb2.ArtifactInformation( + type_urn=common_urns.artifact_types.URL.urn, + type_payload=beam_runner_api_pb2.ArtifactUrlPayload( + url='gs://test-location/staging/bar1'). + SerializeToString(), + role_urn=common_urns.artifact_roles.STAGING_TO.urn, + role_payload=beam_runner_api_pb2. + ArtifactStagingToRolePayload( + staged_name='bar1').SerializeToString()) + ]), + 'env2': beam_runner_api_pb2.Environment( + dependencies=[ + beam_runner_api_pb2.ArtifactInformation( + type_urn=common_urns.artifact_types.URL.urn, + type_payload=beam_runner_api_pb2.ArtifactUrlPayload( + url='gs://test-location/staging/foo2'). + SerializeToString(), + role_urn=common_urns.artifact_roles.STAGING_TO.urn, + role_payload=beam_runner_api_pb2. + ArtifactStagingToRolePayload( + staged_name='foo2').SerializeToString()), + beam_runner_api_pb2.ArtifactInformation( + type_urn=common_urns.artifact_types.URL.urn, + type_payload=beam_runner_api_pb2.ArtifactUrlPayload( + url='gs://test-location/staging/bar2'). + SerializeToString(), + role_urn=common_urns.artifact_roles.STAGING_TO.urn, + role_payload=beam_runner_api_pb2. + ArtifactStagingToRolePayload( + staged_name='bar2').SerializeToString()) + ]) + })) + self.assertEqual(pipeline, pipeline_expected) + + def test_set_dataflow_service_option(self): + pipeline_options = PipelineOptions([ + '--dataflow_service_option', + 'whizz=bang', + '--temp_location', + 'gs://any-location/temp' + ]) + env = apiclient.Environment( + [], #packages + pipeline_options, + '2.0.0', #any environment version + FAKE_PIPELINE_URL) + self.assertEqual(env.proto.serviceOptions, ['whizz=bang']) + + def test_enable_hot_key_logging(self): + # Tests that the enable_hot_key_logging is not set by default. + pipeline_options = PipelineOptions( + ['--temp_location', 'gs://any-location/temp']) + env = apiclient.Environment( + [], #packages + pipeline_options, + '2.0.0', #any environment version + FAKE_PIPELINE_URL) + self.assertIsNone(env.proto.debugOptions) + + # Now test that it is set when given. 
+ pipeline_options = PipelineOptions([ + '--enable_hot_key_logging', '--temp_location', 'gs://any-location/temp' + ]) + env = apiclient.Environment( + [], #packages + pipeline_options, + '2.0.0', #any environment version + FAKE_PIPELINE_URL) + self.assertEqual( + env.proto.debugOptions, dataflow.DebugOptions(enableHotKeyLogging=True)) + if __name__ == '__main__': unittest.main() diff --git a/sdks/python/apache_beam/runners/dataflow/internal/clients/__init__.py b/sdks/python/apache_beam/runners/dataflow/internal/clients/__init__.py index f4f43cbb1236..cce3acad34a4 100644 --- a/sdks/python/apache_beam/runners/dataflow/internal/clients/__init__.py +++ b/sdks/python/apache_beam/runners/dataflow/internal/clients/__init__.py @@ -14,4 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. # -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/runners/dataflow/internal/clients/cloudbuild/__init__.py b/sdks/python/apache_beam/runners/dataflow/internal/clients/cloudbuild/__init__.py new file mode 100644 index 000000000000..35017d88b02d --- /dev/null +++ b/sdks/python/apache_beam/runners/dataflow/internal/clients/cloudbuild/__init__.py @@ -0,0 +1,35 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Common imports for generated cloudbuild client library.""" +# pylint:disable=wildcard-import +# mypy: ignore-errors + +import pkgutil + +# Protect against environments where apitools library is not available. +# pylint: disable=wrong-import-order, wrong-import-position +try: + from apitools.base.py import * + + from apache_beam.runners.dataflow.internal.clients.cloudbuild.cloudbuild_v1_client import * + from apache_beam.runners.dataflow.internal.clients.cloudbuild.cloudbuild_v1_messages import * +except ImportError: + pass +# pylint: enable=wrong-import-order, wrong-import-position + +__path__ = pkgutil.extend_path(__path__, __name__) diff --git a/sdks/python/apache_beam/runners/dataflow/internal/clients/cloudbuild/cloudbuild_v1_client.py b/sdks/python/apache_beam/runners/dataflow/internal/clients/cloudbuild/cloudbuild_v1_client.py new file mode 100644 index 000000000000..fe60374480e2 --- /dev/null +++ b/sdks/python/apache_beam/runners/dataflow/internal/clients/cloudbuild/cloudbuild_v1_client.py @@ -0,0 +1,686 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Generated client library for cloudbuild version v1.""" +# NOTE: This file is autogenerated and should not be edited by hand. +# To regenerate the client: +# pip install google-apitools[cli] +# gen_client --discovery_url=cloudbuild.v1 --overwrite \ +# --outdir=apache_beam/runners/dataflow/internal/clients/cloudbuild \ +# --root_package=. client +# +# mypy: ignore-errors +from apitools.base.py import base_api + +from . import cloudbuild_v1_messages as messages + + +class CloudbuildV1(base_api.BaseApiClient): + """Generated client library for service cloudbuild version v1.""" + + MESSAGES_MODULE = messages + BASE_URL = 'https://cloudbuild.googleapis.com/' + MTLS_BASE_URL = 'https://cloudbuild.mtls.googleapis.com/' + + _PACKAGE = 'cloudbuild' + _SCOPES = ['https://www.googleapis.com/auth/cloud-platform'] + _VERSION = 'v1' + _CLIENT_ID = '1042881264118.apps.googleusercontent.com' + _CLIENT_SECRET = 'x_Tw5K8nnjoRAqULM9PFAC2b' + _USER_AGENT = 'x_Tw5K8nnjoRAqULM9PFAC2b' + _CLIENT_CLASS_NAME = 'CloudbuildV1' + _URL_VERSION = 'v1' + _API_KEY = None + + def __init__( + self, + url='', + credentials=None, + get_credentials=True, + http=None, + model=None, + log_request=False, + log_response=False, + credentials_args=None, + default_global_params=None, + additional_http_headers=None, + response_encoding=None): + """Create a new cloudbuild handle.""" + url = url or self.BASE_URL + super(CloudbuildV1, self).__init__( + url, + credentials=credentials, + get_credentials=get_credentials, + http=http, + model=model, + log_request=log_request, + log_response=log_response, + credentials_args=credentials_args, + default_global_params=default_global_params, + additional_http_headers=additional_http_headers, + response_encoding=response_encoding) + self.operations = self.OperationsService(self) + self.projects_builds = self.ProjectsBuildsService(self) + self.projects_locations_builds = self.ProjectsLocationsBuildsService(self) + self.projects_locations_operations = self.ProjectsLocationsOperationsService( + self) + self.projects_locations = self.ProjectsLocationsService(self) + self.projects_triggers = self.ProjectsTriggersService(self) + self.projects = self.ProjectsService(self) + + class OperationsService(base_api.BaseApiService): + """Service class for the operations resource.""" + + _NAME = 'operations' + + def __init__(self, client): + super(CloudbuildV1.OperationsService, self).__init__(client) + self._upload_configs = {} + + def Cancel(self, request, global_params=None): + r"""Starts asynchronous cancellation on a long-running operation. The server makes a best effort to cancel the operation, but success is not guaranteed. If the server doesn't support this method, it returns `google.rpc.Code.UNIMPLEMENTED`. Clients can use Operations.GetOperation or other methods to check whether the cancellation succeeded or whether the operation completed despite cancellation. On successful cancellation, the operation is not deleted; instead, it becomes an operation with an Operation.error value with a google.rpc.Status.code of 1, corresponding to `Code.CANCELLED`. 
+ + Args: + request: (CloudbuildOperationsCancelRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (Empty) The response message. + """ + config = self.GetMethodConfig('Cancel') + return self._RunMethod(config, request, global_params=global_params) + + Cancel.method_config = lambda: base_api.ApiMethodInfo( + flat_path='v1/operations/{operationsId}:cancel', + http_method='POST', + method_id='cloudbuild.operations.cancel', + ordered_params=['name'], + path_params=['name'], + query_params=[], + relative_path='v1/{+name}:cancel', + request_field='cancelOperationRequest', + request_type_name='CloudbuildOperationsCancelRequest', + response_type_name='Empty', + supports_download=False, + ) + + def Get(self, request, global_params=None): + r"""Gets the latest state of a long-running operation. Clients can use this method to poll the operation result at intervals as recommended by the API service. + + Args: + request: (CloudbuildOperationsGetRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (Operation) The response message. + """ + config = self.GetMethodConfig('Get') + return self._RunMethod(config, request, global_params=global_params) + + Get.method_config = lambda: base_api.ApiMethodInfo( + flat_path='v1/operations/{operationsId}', + http_method='GET', + method_id='cloudbuild.operations.get', + ordered_params=['name'], + path_params=['name'], + query_params=[], + relative_path='v1/{+name}', + request_field='', + request_type_name='CloudbuildOperationsGetRequest', + response_type_name='Operation', + supports_download=False, + ) + + class ProjectsBuildsService(base_api.BaseApiService): + """Service class for the projects_builds resource.""" + + _NAME = 'projects_builds' + + def __init__(self, client): + super(CloudbuildV1.ProjectsBuildsService, self).__init__(client) + self._upload_configs = {} + + def Cancel(self, request, global_params=None): + r"""Cancels a build in progress. + + Args: + request: (CancelBuildRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (Build) The response message. + """ + config = self.GetMethodConfig('Cancel') + return self._RunMethod(config, request, global_params=global_params) + + Cancel.method_config = lambda: base_api.ApiMethodInfo( + http_method='POST', + method_id='cloudbuild.projects.builds.cancel', + ordered_params=['projectId', 'id'], + path_params=['id', 'projectId'], + query_params=[], + relative_path='v1/projects/{projectId}/builds/{id}:cancel', + request_field='', + request_type_name='CancelBuildRequest', + response_type_name='Build', + supports_download=False, + ) + + def Create(self, request, global_params=None): + r"""Starts a build with the specified configuration. This method returns a long-running `Operation`, which includes the build ID. Pass the build ID to `GetBuild` to determine the build status (such as `SUCCESS` or `FAILURE`). + + Args: + request: (CloudbuildProjectsBuildsCreateRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (Operation) The response message. 
+ """ + config = self.GetMethodConfig('Create') + return self._RunMethod(config, request, global_params=global_params) + + Create.method_config = lambda: base_api.ApiMethodInfo( + http_method='POST', + method_id='cloudbuild.projects.builds.create', + ordered_params=['projectId'], + path_params=['projectId'], + query_params=['parent'], + relative_path='v1/projects/{projectId}/builds', + request_field='build', + request_type_name='CloudbuildProjectsBuildsCreateRequest', + response_type_name='Operation', + supports_download=False, + ) + + def Get(self, request, global_params=None): + r"""Returns information about a previously requested build. The `Build` that is returned includes its status (such as `SUCCESS`, `FAILURE`, or `WORKING`), and timing information. + + Args: + request: (CloudbuildProjectsBuildsGetRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (Build) The response message. + """ + config = self.GetMethodConfig('Get') + return self._RunMethod(config, request, global_params=global_params) + + Get.method_config = lambda: base_api.ApiMethodInfo( + http_method='GET', + method_id='cloudbuild.projects.builds.get', + ordered_params=['projectId', 'id'], + path_params=['id', 'projectId'], + query_params=['name'], + relative_path='v1/projects/{projectId}/builds/{id}', + request_field='', + request_type_name='CloudbuildProjectsBuildsGetRequest', + response_type_name='Build', + supports_download=False, + ) + + def List(self, request, global_params=None): + r"""Lists previously requested builds. Previously requested builds may still be in-progress, or may have finished successfully or unsuccessfully. + + Args: + request: (CloudbuildProjectsBuildsListRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (ListBuildsResponse) The response message. + """ + config = self.GetMethodConfig('List') + return self._RunMethod(config, request, global_params=global_params) + + List.method_config = lambda: base_api.ApiMethodInfo( + http_method='GET', + method_id='cloudbuild.projects.builds.list', + ordered_params=['projectId'], + path_params=['projectId'], + query_params=['filter', 'pageSize', 'pageToken', 'parent'], + relative_path='v1/projects/{projectId}/builds', + request_field='', + request_type_name='CloudbuildProjectsBuildsListRequest', + response_type_name='ListBuildsResponse', + supports_download=False, + ) + + def Retry(self, request, global_params=None): + r"""Creates a new build based on the specified build. This method creates a new build using the original build request, which may or may not result in an identical build. For triggered builds: * Triggered builds resolve to a precise revision; therefore a retry of a triggered build will result in a build that uses the same revision. For non-triggered builds that specify `RepoSource`: * If the original build built from the tip of a branch, the retried build will build from the tip of that branch, which may not be the same revision as the original build. * If the original build specified a commit sha or revision ID, the retried build will use the identical source. For builds that specify `StorageSource`: * If the original build pulled source from Google Cloud Storage without specifying the generation of the object, the new build will use the current object, which may be different from the original build source. 
* If the original build pulled source from Cloud Storage and specified the generation of the object, the new build will attempt to use the same object, which may or may not be available depending on the bucket's lifecycle management settings. + + Args: + request: (RetryBuildRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (Operation) The response message. + """ + config = self.GetMethodConfig('Retry') + return self._RunMethod(config, request, global_params=global_params) + + Retry.method_config = lambda: base_api.ApiMethodInfo( + http_method='POST', + method_id='cloudbuild.projects.builds.retry', + ordered_params=['projectId', 'id'], + path_params=['id', 'projectId'], + query_params=[], + relative_path='v1/projects/{projectId}/builds/{id}:retry', + request_field='', + request_type_name='RetryBuildRequest', + response_type_name='Operation', + supports_download=False, + ) + + class ProjectsLocationsBuildsService(base_api.BaseApiService): + """Service class for the projects_locations_builds resource.""" + + _NAME = 'projects_locations_builds' + + def __init__(self, client): + super(CloudbuildV1.ProjectsLocationsBuildsService, self).__init__(client) + self._upload_configs = {} + + def Cancel(self, request, global_params=None): + r"""Cancels a build in progress. + + Args: + request: (CancelBuildRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (Build) The response message. + """ + config = self.GetMethodConfig('Cancel') + return self._RunMethod(config, request, global_params=global_params) + + Cancel.method_config = lambda: base_api.ApiMethodInfo( + flat_path= + 'v1/projects/{projectsId}/locations/{locationsId}/builds/{buildsId}:cancel', + http_method='POST', + method_id='cloudbuild.projects.locations.builds.cancel', + ordered_params=['name'], + path_params=['name'], + query_params=[], + relative_path='v1/{+name}:cancel', + request_field='', + request_type_name='CancelBuildRequest', + response_type_name='Build', + supports_download=False, + ) + + def Create(self, request, global_params=None): + r"""Starts a build with the specified configuration. This method returns a long-running `Operation`, which includes the build ID. Pass the build ID to `GetBuild` to determine the build status (such as `SUCCESS` or `FAILURE`). + + Args: + request: (CloudbuildProjectsLocationsBuildsCreateRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (Operation) The response message. + """ + config = self.GetMethodConfig('Create') + return self._RunMethod(config, request, global_params=global_params) + + Create.method_config = lambda: base_api.ApiMethodInfo( + flat_path='v1/projects/{projectsId}/locations/{locationsId}/builds', + http_method='POST', + method_id='cloudbuild.projects.locations.builds.create', + ordered_params=['parent'], + path_params=['parent'], + query_params=['projectId'], + relative_path='v1/{+parent}/builds', + request_field='build', + request_type_name='CloudbuildProjectsLocationsBuildsCreateRequest', + response_type_name='Operation', + supports_download=False, + ) + + def Get(self, request, global_params=None): + r"""Returns information about a previously requested build. The `Build` that is returned includes its status (such as `SUCCESS`, `FAILURE`, or `WORKING`), and timing information. 
+ + Args: + request: (CloudbuildProjectsLocationsBuildsGetRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (Build) The response message. + """ + config = self.GetMethodConfig('Get') + return self._RunMethod(config, request, global_params=global_params) + + Get.method_config = lambda: base_api.ApiMethodInfo( + flat_path= + 'v1/projects/{projectsId}/locations/{locationsId}/builds/{buildsId}', + http_method='GET', + method_id='cloudbuild.projects.locations.builds.get', + ordered_params=['name'], + path_params=['name'], + query_params=['id', 'projectId'], + relative_path='v1/{+name}', + request_field='', + request_type_name='CloudbuildProjectsLocationsBuildsGetRequest', + response_type_name='Build', + supports_download=False, + ) + + def List(self, request, global_params=None): + r"""Lists previously requested builds. Previously requested builds may still be in-progress, or may have finished successfully or unsuccessfully. + + Args: + request: (CloudbuildProjectsLocationsBuildsListRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (ListBuildsResponse) The response message. + """ + config = self.GetMethodConfig('List') + return self._RunMethod(config, request, global_params=global_params) + + List.method_config = lambda: base_api.ApiMethodInfo( + flat_path='v1/projects/{projectsId}/locations/{locationsId}/builds', + http_method='GET', + method_id='cloudbuild.projects.locations.builds.list', + ordered_params=['parent'], + path_params=['parent'], + query_params=['filter', 'pageSize', 'pageToken', 'projectId'], + relative_path='v1/{+parent}/builds', + request_field='', + request_type_name='CloudbuildProjectsLocationsBuildsListRequest', + response_type_name='ListBuildsResponse', + supports_download=False, + ) + + def Retry(self, request, global_params=None): + r"""Creates a new build based on the specified build. This method creates a new build using the original build request, which may or may not result in an identical build. For triggered builds: * Triggered builds resolve to a precise revision; therefore a retry of a triggered build will result in a build that uses the same revision. For non-triggered builds that specify `RepoSource`: * If the original build built from the tip of a branch, the retried build will build from the tip of that branch, which may not be the same revision as the original build. * If the original build specified a commit sha or revision ID, the retried build will use the identical source. For builds that specify `StorageSource`: * If the original build pulled source from Google Cloud Storage without specifying the generation of the object, the new build will use the current object, which may be different from the original build source. * If the original build pulled source from Cloud Storage and specified the generation of the object, the new build will attempt to use the same object, which may or may not be available depending on the bucket's lifecycle management settings. + + Args: + request: (RetryBuildRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (Operation) The response message. 
+ """ + config = self.GetMethodConfig('Retry') + return self._RunMethod(config, request, global_params=global_params) + + Retry.method_config = lambda: base_api.ApiMethodInfo( + flat_path= + 'v1/projects/{projectsId}/locations/{locationsId}/builds/{buildsId}:retry', + http_method='POST', + method_id='cloudbuild.projects.locations.builds.retry', + ordered_params=['name'], + path_params=['name'], + query_params=[], + relative_path='v1/{+name}:retry', + request_field='', + request_type_name='RetryBuildRequest', + response_type_name='Operation', + supports_download=False, + ) + + class ProjectsLocationsOperationsService(base_api.BaseApiService): + """Service class for the projects_locations_operations resource.""" + + _NAME = 'projects_locations_operations' + + def __init__(self, client): + super(CloudbuildV1.ProjectsLocationsOperationsService, + self).__init__(client) + self._upload_configs = {} + + def Cancel(self, request, global_params=None): + r"""Starts asynchronous cancellation on a long-running operation. The server makes a best effort to cancel the operation, but success is not guaranteed. If the server doesn't support this method, it returns `google.rpc.Code.UNIMPLEMENTED`. Clients can use Operations.GetOperation or other methods to check whether the cancellation succeeded or whether the operation completed despite cancellation. On successful cancellation, the operation is not deleted; instead, it becomes an operation with an Operation.error value with a google.rpc.Status.code of 1, corresponding to `Code.CANCELLED`. + + Args: + request: (CloudbuildProjectsLocationsOperationsCancelRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (Empty) The response message. + """ + config = self.GetMethodConfig('Cancel') + return self._RunMethod(config, request, global_params=global_params) + + Cancel.method_config = lambda: base_api.ApiMethodInfo( + flat_path= + 'v1/projects/{projectsId}/locations/{locationsId}/operations/{operationsId}:cancel', + http_method='POST', + method_id='cloudbuild.projects.locations.operations.cancel', + ordered_params=['name'], + path_params=['name'], + query_params=[], + relative_path='v1/{+name}:cancel', + request_field='cancelOperationRequest', + request_type_name='CloudbuildProjectsLocationsOperationsCancelRequest', + response_type_name='Empty', + supports_download=False, + ) + + def Get(self, request, global_params=None): + r"""Gets the latest state of a long-running operation. Clients can use this method to poll the operation result at intervals as recommended by the API service. + + Args: + request: (CloudbuildProjectsLocationsOperationsGetRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (Operation) The response message. 
+ """ + config = self.GetMethodConfig('Get') + return self._RunMethod(config, request, global_params=global_params) + + Get.method_config = lambda: base_api.ApiMethodInfo( + flat_path= + 'v1/projects/{projectsId}/locations/{locationsId}/operations/{operationsId}', + http_method='GET', + method_id='cloudbuild.projects.locations.operations.get', + ordered_params=['name'], + path_params=['name'], + query_params=[], + relative_path='v1/{+name}', + request_field='', + request_type_name='CloudbuildProjectsLocationsOperationsGetRequest', + response_type_name='Operation', + supports_download=False, + ) + + class ProjectsLocationsService(base_api.BaseApiService): + """Service class for the projects_locations resource.""" + + _NAME = 'projects_locations' + + def __init__(self, client): + super(CloudbuildV1.ProjectsLocationsService, self).__init__(client) + self._upload_configs = {} + + class ProjectsTriggersService(base_api.BaseApiService): + """Service class for the projects_triggers resource.""" + + _NAME = 'projects_triggers' + + def __init__(self, client): + super(CloudbuildV1.ProjectsTriggersService, self).__init__(client) + self._upload_configs = {} + + def Create(self, request, global_params=None): + r"""Creates a new `BuildTrigger`. This API is experimental. + + Args: + request: (CloudbuildProjectsTriggersCreateRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (BuildTrigger) The response message. + """ + config = self.GetMethodConfig('Create') + return self._RunMethod(config, request, global_params=global_params) + + Create.method_config = lambda: base_api.ApiMethodInfo( + http_method='POST', + method_id='cloudbuild.projects.triggers.create', + ordered_params=['projectId'], + path_params=['projectId'], + query_params=[], + relative_path='v1/projects/{projectId}/triggers', + request_field='buildTrigger', + request_type_name='CloudbuildProjectsTriggersCreateRequest', + response_type_name='BuildTrigger', + supports_download=False, + ) + + def Delete(self, request, global_params=None): + r"""Deletes a `BuildTrigger` by its project ID and trigger ID. This API is experimental. + + Args: + request: (CloudbuildProjectsTriggersDeleteRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (Empty) The response message. + """ + config = self.GetMethodConfig('Delete') + return self._RunMethod(config, request, global_params=global_params) + + Delete.method_config = lambda: base_api.ApiMethodInfo( + http_method='DELETE', + method_id='cloudbuild.projects.triggers.delete', + ordered_params=['projectId', 'triggerId'], + path_params=['projectId', 'triggerId'], + query_params=[], + relative_path='v1/projects/{projectId}/triggers/{triggerId}', + request_field='', + request_type_name='CloudbuildProjectsTriggersDeleteRequest', + response_type_name='Empty', + supports_download=False, + ) + + def Get(self, request, global_params=None): + r"""Returns information about a `BuildTrigger`. This API is experimental. + + Args: + request: (CloudbuildProjectsTriggersGetRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (BuildTrigger) The response message. 
+ """ + config = self.GetMethodConfig('Get') + return self._RunMethod(config, request, global_params=global_params) + + Get.method_config = lambda: base_api.ApiMethodInfo( + http_method='GET', + method_id='cloudbuild.projects.triggers.get', + ordered_params=['projectId', 'triggerId'], + path_params=['projectId', 'triggerId'], + query_params=[], + relative_path='v1/projects/{projectId}/triggers/{triggerId}', + request_field='', + request_type_name='CloudbuildProjectsTriggersGetRequest', + response_type_name='BuildTrigger', + supports_download=False, + ) + + def List(self, request, global_params=None): + r"""Lists existing `BuildTrigger`s. This API is experimental. + + Args: + request: (CloudbuildProjectsTriggersListRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (ListBuildTriggersResponse) The response message. + """ + config = self.GetMethodConfig('List') + return self._RunMethod(config, request, global_params=global_params) + + List.method_config = lambda: base_api.ApiMethodInfo( + http_method='GET', + method_id='cloudbuild.projects.triggers.list', + ordered_params=['projectId'], + path_params=['projectId'], + query_params=['pageSize', 'pageToken'], + relative_path='v1/projects/{projectId}/triggers', + request_field='', + request_type_name='CloudbuildProjectsTriggersListRequest', + response_type_name='ListBuildTriggersResponse', + supports_download=False, + ) + + def Patch(self, request, global_params=None): + r"""Updates a `BuildTrigger` by its project ID and trigger ID. This API is experimental. + + Args: + request: (CloudbuildProjectsTriggersPatchRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (BuildTrigger) The response message. + """ + config = self.GetMethodConfig('Patch') + return self._RunMethod(config, request, global_params=global_params) + + Patch.method_config = lambda: base_api.ApiMethodInfo( + http_method='PATCH', + method_id='cloudbuild.projects.triggers.patch', + ordered_params=['projectId', 'triggerId'], + path_params=['projectId', 'triggerId'], + query_params=[], + relative_path='v1/projects/{projectId}/triggers/{triggerId}', + request_field='buildTrigger', + request_type_name='CloudbuildProjectsTriggersPatchRequest', + response_type_name='BuildTrigger', + supports_download=False, + ) + + def Run(self, request, global_params=None): + r"""Runs a `BuildTrigger` at a particular source revision. + + Args: + request: (CloudbuildProjectsTriggersRunRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (Operation) The response message. + """ + config = self.GetMethodConfig('Run') + return self._RunMethod(config, request, global_params=global_params) + + Run.method_config = lambda: base_api.ApiMethodInfo( + http_method='POST', + method_id='cloudbuild.projects.triggers.run', + ordered_params=['projectId', 'triggerId'], + path_params=['projectId', 'triggerId'], + query_params=[], + relative_path='v1/projects/{projectId}/triggers/{triggerId}:run', + request_field='repoSource', + request_type_name='CloudbuildProjectsTriggersRunRequest', + response_type_name='Operation', + supports_download=False, + ) + + def Webhook(self, request, global_params=None): + r"""ReceiveTriggerWebhook [Experimental] is called when the API receives a webhook request targeted at a specific trigger. 
+ + Args: + request: (CloudbuildProjectsTriggersWebhookRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (ReceiveTriggerWebhookResponse) The response message. + """ + config = self.GetMethodConfig('Webhook') + return self._RunMethod(config, request, global_params=global_params) + + Webhook.method_config = lambda: base_api.ApiMethodInfo( + http_method='POST', + method_id='cloudbuild.projects.triggers.webhook', + ordered_params=['projectId', 'trigger'], + path_params=['projectId', 'trigger'], + query_params=['secret'], + relative_path='v1/projects/{projectId}/triggers/{trigger}:webhook', + request_field='httpBody', + request_type_name='CloudbuildProjectsTriggersWebhookRequest', + response_type_name='ReceiveTriggerWebhookResponse', + supports_download=False, + ) + + class ProjectsService(base_api.BaseApiService): + """Service class for the projects resource.""" + + _NAME = 'projects' + + def __init__(self, client): + super(CloudbuildV1.ProjectsService, self).__init__(client) + self._upload_configs = {} diff --git a/sdks/python/apache_beam/runners/dataflow/internal/clients/cloudbuild/cloudbuild_v1_messages.py b/sdks/python/apache_beam/runners/dataflow/internal/clients/cloudbuild/cloudbuild_v1_messages.py new file mode 100644 index 000000000000..9dbef65b2137 --- /dev/null +++ b/sdks/python/apache_beam/runners/dataflow/internal/clients/cloudbuild/cloudbuild_v1_messages.py @@ -0,0 +1,1911 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Generated message classes for cloudbuild version v1. + +Creates and manages builds on Google Cloud Platform. +""" +# NOTE: This file is autogenerated and should not be edited by hand. +# mypy: ignore-errors + +from apitools.base.protorpclite import messages as _messages +from apitools.base.py import encoding +from apitools.base.py import extra_types + +package = 'cloudbuild' + + +class ArtifactObjects(_messages.Message): + r"""Files in the workspace to upload to Cloud Storage upon successful + completion of all build steps. + + Fields: + location: Cloud Storage bucket and optional object path, in the form + "gs://bucket/path/to/somewhere/". (see [Bucket Name + Requirements](https://cloud.google.com/storage/docs/bucket- + naming#requirements)). Files in the workspace matching any path pattern + will be uploaded to Cloud Storage with this location as a prefix. + paths: Path globs used to match files in the build's workspace. + timing: Output only. Stores timing information for pushing all artifact + objects. + """ + + location = _messages.StringField(1) + paths = _messages.StringField(2, repeated=True) + timing = _messages.MessageField('TimeSpan', 3) + + +class ArtifactResult(_messages.Message): + r"""An artifact that was uploaded during a build. 
This is a single record in + the artifact manifest JSON file. + + Fields: + fileHash: The file hash of the artifact. + location: The path of an artifact in a Google Cloud Storage bucket, with + the generation number. For example, + `gs://mybucket/path/to/output.jar#generation`. + """ + + fileHash = _messages.MessageField('FileHashes', 1, repeated=True) + location = _messages.StringField(2) + + +class Artifacts(_messages.Message): + r"""Artifacts produced by a build that should be uploaded upon successful + completion of all build steps. + + Fields: + images: A list of images to be pushed upon the successful completion of + all build steps. The images will be pushed using the builder service + account's credentials. The digests of the pushed images will be stored + in the Build resource's results field. If any of the images fail to be + pushed, the build is marked FAILURE. + objects: A list of objects to be uploaded to Cloud Storage upon successful + completion of all build steps. Files in the workspace matching specified + paths globs will be uploaded to the specified Cloud Storage location + using the builder service account's credentials. The location and + generation of the uploaded objects will be stored in the Build + resource's results field. If any objects fail to be pushed, the build is + marked FAILURE. + """ + + images = _messages.StringField(1, repeated=True) + objects = _messages.MessageField('ArtifactObjects', 2) + + +class Build(_messages.Message): + r"""A build resource in the Cloud Build API. At a high level, a `Build` + describes where to find source code, how to build it (for example, the + builder image to run on the source), and where to store the built artifacts. + Fields can include the following variables, which will be expanded when the + build is created: - $PROJECT_ID: the project ID of the build. - + $PROJECT_NUMBER: the project number of the build. - $BUILD_ID: the + autogenerated ID of the build. - $REPO_NAME: the source repository name + specified by RepoSource. - $BRANCH_NAME: the branch name specified by + RepoSource. - $TAG_NAME: the tag name specified by RepoSource. - + $REVISION_ID or $COMMIT_SHA: the commit SHA specified by RepoSource or + resolved from the specified branch or tag. - $SHORT_SHA: first 7 characters + of $REVISION_ID or $COMMIT_SHA. + + Enums: + StatusValueValuesEnum: Output only. Status of the build. + + Messages: + SubstitutionsValue: Substitutions data for `Build` resource. + TimingValue: Output only. Stores timing information for phases of the + build. Valid keys are: * BUILD: time to execute all build steps * PUSH: + time to push all specified images. * FETCHSOURCE: time to fetch source. + If the build does not specify source or images, these keys will not be + included. + + Fields: + artifacts: Artifacts produced by the build that should be uploaded upon + successful completion of all build steps. + availableSecrets: Secrets and secret environment variables. + buildTriggerId: Output only. The ID of the `BuildTrigger` that triggered + this build, if it was triggered automatically. + createTime: Output only. Time at which the request to create the build was + received. + finishTime: Output only. Time at which execution of the build was + finished. The difference between finish_time and start_time is the + duration of the build's execution. + id: Output only. Unique identifier of the build. + images: A list of images to be pushed upon the successful completion of + all build steps. 
The images are pushed using the builder service + account's credentials. The digests of the pushed images will be stored + in the `Build` resource's results field. If any of the images fail to be + pushed, the build status is marked `FAILURE`. + logUrl: Output only. URL to logs for this build in Google Cloud Console. + logsBucket: Google Cloud Storage bucket where logs should be written (see + [Bucket Name Requirements](https://cloud.google.com/storage/docs/bucket- + naming#requirements)). Logs file names will be of the format + `${logs_bucket}/log-${build_id}.txt`. + name: Output only. The 'Build' name with format: + `projects/{project}/locations/{location}/builds/{build}`, where {build} + is a unique identifier generated by the service. + options: Special options for this build. + projectId: Output only. ID of the project. + queueTtl: TTL in queue for this build. If provided and the build is + enqueued longer than this value, the build will expire and the build + status will be `EXPIRED`. The TTL starts ticking from create_time. + results: Output only. Results of the build. + secrets: Secrets to decrypt using Cloud Key Management Service. Note: + Secret Manager is the recommended technique for managing sensitive data + with Cloud Build. Use `available_secrets` to configure builds to access + secrets from Secret Manager. For instructions, see: + https://cloud.google.com/cloud-build/docs/securing-builds/use-secrets + serviceAccount: IAM service account whose credentials will be used at + build runtime. Must be of the format + `projects/{PROJECT_ID}/serviceAccounts/{ACCOUNT}`. ACCOUNT can be email + address or uniqueId of the service account. This field is in beta. + source: The location of the source files to build. + sourceProvenance: Output only. A permanent fixed identifier for source. + startTime: Output only. Time at which execution of the build was started. + status: Output only. Status of the build. + statusDetail: Output only. Customer-readable message about the current + status. + steps: Required. The operations to be performed on the workspace. + substitutions: Substitutions data for `Build` resource. + tags: Tags for annotation of a `Build`. These are not docker tags. + timeout: Amount of time that this build should be allowed to run, to + second granularity. If this amount of time elapses, work on the build + will cease and the build status will be `TIMEOUT`. `timeout` starts + ticking from `startTime`. Default time is ten minutes. + timing: Output only. Stores timing information for phases of the build. + Valid keys are: * BUILD: time to execute all build steps * PUSH: time to + push all specified images. * FETCHSOURCE: time to fetch source. If the + build does not specify source or images, these keys will not be + included. + """ + class StatusValueValuesEnum(_messages.Enum): + r"""Output only. Status of the build. + + Values: + STATUS_UNKNOWN: Status of the build is unknown. + QUEUED: Build or step is queued; work has not yet begun. + WORKING: Build or step is being executed. + SUCCESS: Build or step finished successfully. + FAILURE: Build or step failed to complete successfully. + INTERNAL_ERROR: Build or step failed due to an internal cause. + TIMEOUT: Build or step took longer than was allowed. + CANCELLED: Build or step was canceled by a user. + EXPIRED: Build was enqueued for longer than the value of `queue_ttl`. 
+ """ + STATUS_UNKNOWN = 0 + QUEUED = 1 + WORKING = 2 + SUCCESS = 3 + FAILURE = 4 + INTERNAL_ERROR = 5 + TIMEOUT = 6 + CANCELLED = 7 + EXPIRED = 8 + + @encoding.MapUnrecognizedFields('additionalProperties') + class SubstitutionsValue(_messages.Message): + r"""Substitutions data for `Build` resource. + + Messages: + AdditionalProperty: An additional property for a SubstitutionsValue + object. + + Fields: + additionalProperties: Additional properties of type SubstitutionsValue + """ + class AdditionalProperty(_messages.Message): + r"""An additional property for a SubstitutionsValue object. + + Fields: + key: Name of the additional property. + value: A string attribute. + """ + + key = _messages.StringField(1) + value = _messages.StringField(2) + + additionalProperties = _messages.MessageField( + 'AdditionalProperty', 1, repeated=True) + + @encoding.MapUnrecognizedFields('additionalProperties') + class TimingValue(_messages.Message): + r"""Output only. Stores timing information for phases of the build. Valid + keys are: * BUILD: time to execute all build steps * PUSH: time to push + all specified images. * FETCHSOURCE: time to fetch source. If the build + does not specify source or images, these keys will not be included. + + Messages: + AdditionalProperty: An additional property for a TimingValue object. + + Fields: + additionalProperties: Additional properties of type TimingValue + """ + class AdditionalProperty(_messages.Message): + r"""An additional property for a TimingValue object. + + Fields: + key: Name of the additional property. + value: A TimeSpan attribute. + """ + + key = _messages.StringField(1) + value = _messages.MessageField('TimeSpan', 2) + + additionalProperties = _messages.MessageField( + 'AdditionalProperty', 1, repeated=True) + + artifacts = _messages.MessageField('Artifacts', 1) + availableSecrets = _messages.MessageField('Secrets', 2) + buildTriggerId = _messages.StringField(3) + createTime = _messages.StringField(4) + finishTime = _messages.StringField(5) + id = _messages.StringField(6) + images = _messages.StringField(7, repeated=True) + logUrl = _messages.StringField(8) + logsBucket = _messages.StringField(9) + name = _messages.StringField(10) + options = _messages.MessageField('BuildOptions', 11) + projectId = _messages.StringField(12) + queueTtl = _messages.StringField(13) + results = _messages.MessageField('Results', 14) + secrets = _messages.MessageField('Secret', 15, repeated=True) + serviceAccount = _messages.StringField(16) + source = _messages.MessageField('Source', 17) + sourceProvenance = _messages.MessageField('SourceProvenance', 18) + startTime = _messages.StringField(19) + status = _messages.EnumField('StatusValueValuesEnum', 20) + statusDetail = _messages.StringField(21) + steps = _messages.MessageField('BuildStep', 22, repeated=True) + substitutions = _messages.MessageField('SubstitutionsValue', 23) + tags = _messages.StringField(24, repeated=True) + timeout = _messages.StringField(25) + timing = _messages.MessageField('TimingValue', 26) + + +class BuildOperationMetadata(_messages.Message): + r"""Metadata for build operations. + + Fields: + build: The build that the operation is tracking. + """ + + build = _messages.MessageField('Build', 1) + + +class BuildOptions(_messages.Message): + r"""Optional arguments to enable specific features of builds. + + Enums: + LogStreamingOptionValueValuesEnum: Option to define build log streaming + behavior to Google Cloud Storage. 
+ LoggingValueValuesEnum: Option to specify the logging mode, which + determines if and where build logs are stored. + MachineTypeValueValuesEnum: Compute Engine machine type on which to run + the build. + RequestedVerifyOptionValueValuesEnum: Requested verifiability options. + SourceProvenanceHashValueListEntryValuesEnum: + SubstitutionOptionValueValuesEnum: Option to specify behavior when there + is an error in the substitution checks. NOTE: this is always set to + ALLOW_LOOSE for triggered builds and cannot be overridden in the build + configuration file. + + Fields: + diskSizeGb: Requested disk size for the VM that runs the build. Note that + this is *NOT* "disk free"; some of the space will be used by the + operating system and build utilities. Also note that this is the minimum + disk size that will be allocated for the build -- the build may run with + a larger disk than requested. At present, the maximum disk size is + 1000GB; builds that request more than the maximum are rejected with an + error. + dynamicSubstitutions: Option to specify whether or not to apply bash style + string operations to the substitutions. NOTE: this is always enabled for + triggered builds and cannot be overridden in the build configuration + file. + env: A list of global environment variable definitions that will exist for + all build steps in this build. If a variable is defined in both globally + and in a build step, the variable will use the build step value. The + elements are of the form "KEY=VALUE" for the environment variable "KEY" + being given the value "VALUE". + logStreamingOption: Option to define build log streaming behavior to + Google Cloud Storage. + logging: Option to specify the logging mode, which determines if and where + build logs are stored. + machineType: Compute Engine machine type on which to run the build. + requestedVerifyOption: Requested verifiability options. + secretEnv: A list of global environment variables, which are encrypted + using a Cloud Key Management Service crypto key. These values must be + specified in the build's `Secret`. These variables will be available to + all build steps in this build. + sourceProvenanceHash: Requested hash for SourceProvenance. + substitutionOption: Option to specify behavior when there is an error in + the substitution checks. NOTE: this is always set to ALLOW_LOOSE for + triggered builds and cannot be overridden in the build configuration + file. + volumes: Global list of volumes to mount for ALL build steps Each volume + is created as an empty volume prior to starting the build process. Upon + completion of the build, volumes and their contents are discarded. + Global volume names and paths cannot conflict with the volumes defined a + build step. Using a global volume in a build with only one step is not + valid as it is indicative of a build request with an incorrect + configuration. + workerPool: Option to specify a `WorkerPool` for the build. Format: + projects/{project}/locations/{location}/workerPools/{workerPool} This + field is in beta and is available only to restricted users. + """ + class LogStreamingOptionValueValuesEnum(_messages.Enum): + r"""Option to define build log streaming behavior to Google Cloud Storage. + + Values: + STREAM_DEFAULT: Service may automatically determine build log streaming + behavior. + STREAM_ON: Build logs should be streamed to Google Cloud Storage. + STREAM_OFF: Build logs should not be streamed to Google Cloud Storage; + they will be written when the build is completed. 
+ """ + STREAM_DEFAULT = 0 + STREAM_ON = 1 + STREAM_OFF = 2 + + class LoggingValueValuesEnum(_messages.Enum): + r"""Option to specify the logging mode, which determines if and where + build logs are stored. + + Values: + LOGGING_UNSPECIFIED: The service determines the logging mode. The + default is `LEGACY`. Do not rely on the default logging behavior as it + may change in the future. + LEGACY: Cloud Logging and Cloud Storage logging are enabled. + GCS_ONLY: Only Cloud Storage logging is enabled. + STACKDRIVER_ONLY: This option is the same as CLOUD_LOGGING_ONLY. + CLOUD_LOGGING_ONLY: Only Cloud Logging is enabled. Note that logs for + both the Cloud Console UI and Cloud SDK are based on Cloud Storage + logs, so neither will provide logs if this option is chosen. + NONE: Turn off all logging. No build logs will be captured. + """ + LOGGING_UNSPECIFIED = 0 + LEGACY = 1 + GCS_ONLY = 2 + STACKDRIVER_ONLY = 3 + CLOUD_LOGGING_ONLY = 4 + NONE = 5 + + class MachineTypeValueValuesEnum(_messages.Enum): + r"""Compute Engine machine type on which to run the build. + + Values: + UNSPECIFIED: Standard machine type. + N1_HIGHCPU_8: Highcpu machine with 8 CPUs. + N1_HIGHCPU_32: Highcpu machine with 32 CPUs. + E2_HIGHCPU_8: Highcpu e2 machine with 8 CPUs. + E2_HIGHCPU_32: Highcpu e2 machine with 32 CPUs. + """ + UNSPECIFIED = 0 + N1_HIGHCPU_8 = 1 + N1_HIGHCPU_32 = 2 + E2_HIGHCPU_8 = 3 + E2_HIGHCPU_32 = 4 + + class RequestedVerifyOptionValueValuesEnum(_messages.Enum): + r"""Requested verifiability options. + + Values: + NOT_VERIFIED: Not a verifiable build. (default) + VERIFIED: Verified build. + """ + NOT_VERIFIED = 0 + VERIFIED = 1 + + class SourceProvenanceHashValueListEntryValuesEnum(_messages.Enum): + r"""SourceProvenanceHashValueListEntryValuesEnum enum type. + + Values: + NONE: No hash requested. + SHA256: Use a sha256 hash. + MD5: Use a md5 hash. + """ + NONE = 0 + SHA256 = 1 + MD5 = 2 + + class SubstitutionOptionValueValuesEnum(_messages.Enum): + r"""Option to specify behavior when there is an error in the substitution + checks. NOTE: this is always set to ALLOW_LOOSE for triggered builds and + cannot be overridden in the build configuration file. + + Values: + MUST_MATCH: Fails the build if error in substitutions checks, like + missing a substitution in the template or in the map. + ALLOW_LOOSE: Do not fail the build if error in substitutions checks. + """ + MUST_MATCH = 0 + ALLOW_LOOSE = 1 + + diskSizeGb = _messages.IntegerField(1) + dynamicSubstitutions = _messages.BooleanField(2) + env = _messages.StringField(3, repeated=True) + logStreamingOption = _messages.EnumField( + 'LogStreamingOptionValueValuesEnum', 4) + logging = _messages.EnumField('LoggingValueValuesEnum', 5) + machineType = _messages.EnumField('MachineTypeValueValuesEnum', 6) + requestedVerifyOption = _messages.EnumField( + 'RequestedVerifyOptionValueValuesEnum', 7) + secretEnv = _messages.StringField(8, repeated=True) + sourceProvenanceHash = _messages.EnumField( + 'SourceProvenanceHashValueListEntryValuesEnum', 9, repeated=True) + substitutionOption = _messages.EnumField( + 'SubstitutionOptionValueValuesEnum', 10) + volumes = _messages.MessageField('Volume', 11, repeated=True) + workerPool = _messages.StringField(12) + + +class BuildStep(_messages.Message): + r"""A step in the build pipeline. + + Enums: + StatusValueValuesEnum: Output only. Status of the build step. At this + time, build step status is only updated on build completion; step status + is not updated in real-time as the build progresses. 
+ + Fields: + args: A list of arguments that will be presented to the step when it is + started. If the image used to run the step's container has an + entrypoint, the `args` are used as arguments to that entrypoint. If the + image does not define an entrypoint, the first element in args is used + as the entrypoint, and the remainder will be used as arguments. + dir: Working directory to use when running this step's container. If this + value is a relative path, it is relative to the build's working + directory. If this value is absolute, it may be outside the build's + working directory, in which case the contents of the path may not be + persisted across build step executions, unless a `volume` for that path + is specified. If the build specifies a `RepoSource` with `dir` and a + step with a `dir`, which specifies an absolute path, the `RepoSource` + `dir` is ignored for the step's execution. + entrypoint: Entrypoint to be used instead of the build step image's + default entrypoint. If unset, the image's default entrypoint is used. + env: A list of environment variable definitions to be used when running a + step. The elements are of the form "KEY=VALUE" for the environment + variable "KEY" being given the value "VALUE". + id: Unique identifier for this build step, used in `wait_for` to reference + this build step as a dependency. + name: Required. The name of the container image that will run this + particular build step. If the image is available in the host's Docker + daemon's cache, it will be run directly. If not, the host will attempt + to pull the image first, using the builder service account's credentials + if necessary. The Docker daemon's cache will already have the latest + versions of all of the officially supported build steps + ([https://github.com/GoogleCloudPlatform/cloud- + builders](https://github.com/GoogleCloudPlatform/cloud-builders)). The + Docker daemon will also have cached many of the layers for some popular + images, like "ubuntu", "debian", but they will be refreshed at the time + you attempt to use them. If you built an image in a previous build step, + it will be stored in the host's Docker daemon's cache and is available + to use as the name for a later build step. + pullTiming: Output only. Stores timing information for pulling this build + step's builder image only. + secretEnv: A list of environment variables which are encrypted using a + Cloud Key Management Service crypto key. These values must be specified + in the build's `Secret`. + status: Output only. Status of the build step. At this time, build step + status is only updated on build completion; step status is not updated + in real-time as the build progresses. + timeout: Time limit for executing this build step. If not defined, the + step has no time limit and will be allowed to continue to run until + either it completes or the build itself times out. + timing: Output only. Stores timing information for executing this build + step. + volumes: List of volumes to mount into the build step. Each volume is + created as an empty volume prior to execution of the build step. Upon + completion of the build, volumes and their contents are discarded. Using + a named volume in only one step is not valid as it is indicative of a + build request with an incorrect configuration. + waitFor: The ID(s) of the step(s) that this build step depends on. This + build step will not start until all the build steps in `wait_for` have + completed successfully. 
If `wait_for` is empty, this build step will + start when all previous build steps in the `Build.Steps` list have + completed successfully. + """ + class StatusValueValuesEnum(_messages.Enum): + r"""Output only. Status of the build step. At this time, build step status + is only updated on build completion; step status is not updated in real- + time as the build progresses. + + Values: + STATUS_UNKNOWN: Status of the build is unknown. + QUEUED: Build or step is queued; work has not yet begun. + WORKING: Build or step is being executed. + SUCCESS: Build or step finished successfully. + FAILURE: Build or step failed to complete successfully. + INTERNAL_ERROR: Build or step failed due to an internal cause. + TIMEOUT: Build or step took longer than was allowed. + CANCELLED: Build or step was canceled by a user. + EXPIRED: Build was enqueued for longer than the value of `queue_ttl`. + """ + STATUS_UNKNOWN = 0 + QUEUED = 1 + WORKING = 2 + SUCCESS = 3 + FAILURE = 4 + INTERNAL_ERROR = 5 + TIMEOUT = 6 + CANCELLED = 7 + EXPIRED = 8 + + args = _messages.StringField(1, repeated=True) + dir = _messages.StringField(2) + entrypoint = _messages.StringField(3) + env = _messages.StringField(4, repeated=True) + id = _messages.StringField(5) + name = _messages.StringField(6) + pullTiming = _messages.MessageField('TimeSpan', 7) + secretEnv = _messages.StringField(8, repeated=True) + status = _messages.EnumField('StatusValueValuesEnum', 9) + timeout = _messages.StringField(10) + timing = _messages.MessageField('TimeSpan', 11) + volumes = _messages.MessageField('Volume', 12, repeated=True) + waitFor = _messages.StringField(13, repeated=True) + + +class BuildTrigger(_messages.Message): + r"""Configuration for an automated build in response to source repository + changes. + + Messages: + SubstitutionsValue: Substitutions for Build resource. The keys must match + the following regular expression: `^_[A-Z0-9_]+$`. + + Fields: + build: Contents of the build template. + createTime: Output only. Time when the trigger was created. + description: Human-readable description of this trigger. + disabled: If true, the trigger will never automatically execute a build. + filename: Path, from the source root, to a file whose contents is used for + the template. + github: GitHubEventsConfig describes the configuration of a trigger that + creates a build whenever a GitHub event is received. Mutually exclusive + with `trigger_template`. + id: Output only. Unique identifier of the trigger. + ignoredFiles: ignored_files and included_files are file glob matches using + https://golang.org/pkg/path/filepath/#Match extended with support for + "**". If ignored_files and changed files are both empty, then they are + not used to determine whether or not to trigger a build. If + ignored_files is not empty, then we ignore any files that match any of + the ignored_file globs. If the change has no files that are outside of + the ignored_files globs, then we do not trigger a build. + includedFiles: If any of the files altered in the commit pass the + ignored_files filter and included_files is empty, then as far as this + filter is concerned, we should trigger the build. If any of the files + altered in the commit pass the ignored_files filter and included_files + is not empty, then we make sure that at least one of those files matches + a included_files glob. If not, then we do not trigger a build. + name: User-assigned name of the trigger. Must be unique within the + project. 
Trigger names must meet the following requirements: + They must + contain only alphanumeric characters and dashes. + They can be 1-64 + characters long. + They must begin and end with an alphanumeric + character. + substitutions: Substitutions for Build resource. The keys must match the + following regular expression: `^_[A-Z0-9_]+$`. + tags: Tags for annotation of a `BuildTrigger` + triggerTemplate: Template describing the types of source changes to + trigger a build. Branch and tag names in trigger templates are + interpreted as regular expressions. Any branch or tag change that + matches that regular expression will trigger a build. Mutually exclusive + with `github`. + """ + @encoding.MapUnrecognizedFields('additionalProperties') + class SubstitutionsValue(_messages.Message): + r"""Substitutions for Build resource. The keys must match the following + regular expression: `^_[A-Z0-9_]+$`. + + Messages: + AdditionalProperty: An additional property for a SubstitutionsValue + object. + + Fields: + additionalProperties: Additional properties of type SubstitutionsValue + """ + class AdditionalProperty(_messages.Message): + r"""An additional property for a SubstitutionsValue object. + + Fields: + key: Name of the additional property. + value: A string attribute. + """ + + key = _messages.StringField(1) + value = _messages.StringField(2) + + additionalProperties = _messages.MessageField( + 'AdditionalProperty', 1, repeated=True) + + build = _messages.MessageField('Build', 1) + createTime = _messages.StringField(2) + description = _messages.StringField(3) + disabled = _messages.BooleanField(4) + filename = _messages.StringField(5) + github = _messages.MessageField('GitHubEventsConfig', 6) + id = _messages.StringField(7) + ignoredFiles = _messages.StringField(8, repeated=True) + includedFiles = _messages.StringField(9, repeated=True) + name = _messages.StringField(10) + substitutions = _messages.MessageField('SubstitutionsValue', 11) + tags = _messages.StringField(12, repeated=True) + triggerTemplate = _messages.MessageField('RepoSource', 13) + + +class BuiltImage(_messages.Message): + r"""An image built by the pipeline. + + Fields: + digest: Docker Registry 2.0 digest. + name: Name used to push the container image to Google Container Registry, + as presented to `docker push`. + pushTiming: Output only. Stores timing information for pushing the + specified image. + """ + + digest = _messages.StringField(1) + name = _messages.StringField(2) + pushTiming = _messages.MessageField('TimeSpan', 3) + + +class CancelBuildRequest(_messages.Message): + r"""Request to cancel an ongoing build. + + Fields: + id: Required. ID of the build. + name: The name of the `Build` to cancel. Format: + `projects/{project}/locations/{location}/builds/{build}` + projectId: Required. ID of the project. + """ + + id = _messages.StringField(1) + name = _messages.StringField(2) + projectId = _messages.StringField(3) + + +class CancelOperationRequest(_messages.Message): + r"""The request message for Operations.CancelOperation.""" + + +class CloudbuildOperationsCancelRequest(_messages.Message): + r"""A CloudbuildOperationsCancelRequest object. + + Fields: + cancelOperationRequest: A CancelOperationRequest resource to be passed as + the request body. + name: The name of the operation resource to be cancelled. 
+ """ + + cancelOperationRequest = _messages.MessageField('CancelOperationRequest', 1) + name = _messages.StringField(2, required=True) + + +class CloudbuildOperationsGetRequest(_messages.Message): + r"""A CloudbuildOperationsGetRequest object. + + Fields: + name: The name of the operation resource. + """ + + name = _messages.StringField(1, required=True) + + +class CloudbuildProjectsBuildsCreateRequest(_messages.Message): + r"""A CloudbuildProjectsBuildsCreateRequest object. + + Fields: + build: A Build resource to be passed as the request body. + parent: The parent resource where this build will be created. Format: + `projects/{project}/locations/{location}` + projectId: Required. ID of the project. + """ + + build = _messages.MessageField('Build', 1) + parent = _messages.StringField(2) + projectId = _messages.StringField(3, required=True) + + +class CloudbuildProjectsBuildsGetRequest(_messages.Message): + r"""A CloudbuildProjectsBuildsGetRequest object. + + Fields: + id: Required. ID of the build. + name: The name of the `Build` to retrieve. Format: + `projects/{project}/locations/{location}/builds/{build}` + projectId: Required. ID of the project. + """ + + id = _messages.StringField(1, required=True) + name = _messages.StringField(2) + projectId = _messages.StringField(3, required=True) + + +class CloudbuildProjectsBuildsListRequest(_messages.Message): + r"""A CloudbuildProjectsBuildsListRequest object. + + Fields: + filter: The raw filter text to constrain the results. + pageSize: Number of results to return in the list. + pageToken: The page token for the next page of Builds. If unspecified, the + first page of results is returned. If the token is rejected for any + reason, INVALID_ARGUMENT will be thrown. In this case, the token should + be discarded, and pagination should be restarted from the first page of + results. See https://google.aip.dev/158 for more. + parent: The parent of the collection of `Builds`. Format: + `projects/{project}/locations/location` + projectId: Required. ID of the project. + """ + + filter = _messages.StringField(1) + pageSize = _messages.IntegerField(2, variant=_messages.Variant.INT32) + pageToken = _messages.StringField(3) + parent = _messages.StringField(4) + projectId = _messages.StringField(5, required=True) + + +class CloudbuildProjectsLocationsBuildsCreateRequest(_messages.Message): + r"""A CloudbuildProjectsLocationsBuildsCreateRequest object. + + Fields: + build: A Build resource to be passed as the request body. + parent: The parent resource where this build will be created. Format: + `projects/{project}/locations/{location}` + projectId: Required. ID of the project. + """ + + build = _messages.MessageField('Build', 1) + parent = _messages.StringField(2, required=True) + projectId = _messages.StringField(3) + + +class CloudbuildProjectsLocationsBuildsGetRequest(_messages.Message): + r"""A CloudbuildProjectsLocationsBuildsGetRequest object. + + Fields: + id: Required. ID of the build. + name: The name of the `Build` to retrieve. Format: + `projects/{project}/locations/{location}/builds/{build}` + projectId: Required. ID of the project. + """ + + id = _messages.StringField(1) + name = _messages.StringField(2, required=True) + projectId = _messages.StringField(3) + + +class CloudbuildProjectsLocationsBuildsListRequest(_messages.Message): + r"""A CloudbuildProjectsLocationsBuildsListRequest object. + + Fields: + filter: The raw filter text to constrain the results. + pageSize: Number of results to return in the list. 
+ pageToken: The page token for the next page of Builds. If unspecified, the + first page of results is returned. If the token is rejected for any + reason, INVALID_ARGUMENT will be thrown. In this case, the token should + be discarded, and pagination should be restarted from the first page of + results. See https://google.aip.dev/158 for more. + parent: The parent of the collection of `Builds`. Format: + `projects/{project}/locations/location` + projectId: Required. ID of the project. + """ + + filter = _messages.StringField(1) + pageSize = _messages.IntegerField(2, variant=_messages.Variant.INT32) + pageToken = _messages.StringField(3) + parent = _messages.StringField(4, required=True) + projectId = _messages.StringField(5) + + +class CloudbuildProjectsLocationsOperationsCancelRequest(_messages.Message): + r"""A CloudbuildProjectsLocationsOperationsCancelRequest object. + + Fields: + cancelOperationRequest: A CancelOperationRequest resource to be passed as + the request body. + name: The name of the operation resource to be cancelled. + """ + + cancelOperationRequest = _messages.MessageField('CancelOperationRequest', 1) + name = _messages.StringField(2, required=True) + + +class CloudbuildProjectsLocationsOperationsGetRequest(_messages.Message): + r"""A CloudbuildProjectsLocationsOperationsGetRequest object. + + Fields: + name: The name of the operation resource. + """ + + name = _messages.StringField(1, required=True) + + +class CloudbuildProjectsTriggersCreateRequest(_messages.Message): + r"""A CloudbuildProjectsTriggersCreateRequest object. + + Fields: + buildTrigger: A BuildTrigger resource to be passed as the request body. + projectId: Required. ID of the project for which to configure automatic + builds. + """ + + buildTrigger = _messages.MessageField('BuildTrigger', 1) + projectId = _messages.StringField(2, required=True) + + +class CloudbuildProjectsTriggersDeleteRequest(_messages.Message): + r"""A CloudbuildProjectsTriggersDeleteRequest object. + + Fields: + projectId: Required. ID of the project that owns the trigger. + triggerId: Required. ID of the `BuildTrigger` to delete. + """ + + projectId = _messages.StringField(1, required=True) + triggerId = _messages.StringField(2, required=True) + + +class CloudbuildProjectsTriggersGetRequest(_messages.Message): + r"""A CloudbuildProjectsTriggersGetRequest object. + + Fields: + projectId: Required. ID of the project that owns the trigger. + triggerId: Required. Identifier (`id` or `name`) of the `BuildTrigger` to + get. + """ + + projectId = _messages.StringField(1, required=True) + triggerId = _messages.StringField(2, required=True) + + +class CloudbuildProjectsTriggersListRequest(_messages.Message): + r"""A CloudbuildProjectsTriggersListRequest object. + + Fields: + pageSize: Number of results to return in the list. + pageToken: Token to provide to skip to a particular spot in the list. + projectId: Required. ID of the project for which to list BuildTriggers. + """ + + pageSize = _messages.IntegerField(1, variant=_messages.Variant.INT32) + pageToken = _messages.StringField(2) + projectId = _messages.StringField(3, required=True) + + +class CloudbuildProjectsTriggersPatchRequest(_messages.Message): + r"""A CloudbuildProjectsTriggersPatchRequest object. + + Fields: + buildTrigger: A BuildTrigger resource to be passed as the request body. + projectId: Required. ID of the project that owns the trigger. + triggerId: Required. ID of the `BuildTrigger` to update. 
+ """ + + buildTrigger = _messages.MessageField('BuildTrigger', 1) + projectId = _messages.StringField(2, required=True) + triggerId = _messages.StringField(3, required=True) + + +class CloudbuildProjectsTriggersRunRequest(_messages.Message): + r"""A CloudbuildProjectsTriggersRunRequest object. + + Fields: + projectId: Required. ID of the project. + repoSource: A RepoSource resource to be passed as the request body. + triggerId: Required. ID of the trigger. + """ + + projectId = _messages.StringField(1, required=True) + repoSource = _messages.MessageField('RepoSource', 2) + triggerId = _messages.StringField(3, required=True) + + +class CloudbuildProjectsTriggersWebhookRequest(_messages.Message): + r"""A CloudbuildProjectsTriggersWebhookRequest object. + + Fields: + httpBody: A HttpBody resource to be passed as the request body. + projectId: Project in which the specified trigger lives + secret: Secret token used for authorization if an OAuth token isn't + provided. + trigger: Name of the trigger to run the payload against + """ + + httpBody = _messages.MessageField('HttpBody', 1) + projectId = _messages.StringField(2, required=True) + secret = _messages.StringField(3) + trigger = _messages.StringField(4, required=True) + + +class Empty(_messages.Message): + r"""A generic empty message that you can re-use to avoid defining duplicated + empty messages in your APIs. A typical example is to use it as the request + or the response type of an API method. For instance: service Foo { rpc + Bar(google.protobuf.Empty) returns (google.protobuf.Empty); } The JSON + representation for `Empty` is empty JSON object `{}`. + """ + + +class FileHashes(_messages.Message): + r"""Container message for hashes of byte content of files, used in + SourceProvenance messages to verify integrity of source input to the build. + + Fields: + fileHash: Collection of file hashes. + """ + + fileHash = _messages.MessageField('Hash', 1, repeated=True) + + +class GitHubEventsConfig(_messages.Message): + r"""GitHubEventsConfig describes the configuration of a trigger that creates + a build whenever a GitHub event is received. This message is experimental. + + Fields: + installationId: The installationID that emits the GitHub event. + name: Name of the repository. For example: The name for + https://github.com/googlecloudplatform/cloud-builders is "cloud- + builders". + owner: Owner of the repository. For example: The owner for + https://github.com/googlecloudplatform/cloud-builders is + "googlecloudplatform". + pullRequest: filter to match changes in pull requests. + push: filter to match changes in refs like branches, tags. + """ + + installationId = _messages.IntegerField(1) + name = _messages.StringField(2) + owner = _messages.StringField(3) + pullRequest = _messages.MessageField('PullRequestFilter', 4) + push = _messages.MessageField('PushFilter', 5) + + +class HTTPDelivery(_messages.Message): + r"""HTTPDelivery is the delivery configuration for an HTTP notification. + + Fields: + uri: The URI to which JSON-containing HTTP POST requests should be sent. + """ + + uri = _messages.StringField(1) + + +class Hash(_messages.Message): + r"""Container message for hash values. + + Enums: + TypeValueValuesEnum: The type of hash that was performed. + + Fields: + type: The type of hash that was performed. + value: The hash value. + """ + class TypeValueValuesEnum(_messages.Enum): + r"""The type of hash that was performed. + + Values: + NONE: No hash requested. + SHA256: Use a sha256 hash. + MD5: Use a md5 hash. 
+ """ + NONE = 0 + SHA256 = 1 + MD5 = 2 + + type = _messages.EnumField('TypeValueValuesEnum', 1) + value = _messages.BytesField(2) + + +class HttpBody(_messages.Message): + r"""Message that represents an arbitrary HTTP body. It should only be used + for payload formats that can't be represented as JSON, such as raw binary or + an HTML page. This message can be used both in streaming and non-streaming + API methods in the request as well as the response. It can be used as a top- + level request field, which is convenient if one wants to extract parameters + from either the URL or HTTP template into the request fields and also want + access to the raw HTTP body. Example: message GetResourceRequest { // A + unique request id. string request_id = 1; // The raw HTTP body is bound to + this field. google.api.HttpBody http_body = 2; } service ResourceService { + rpc GetResource(GetResourceRequest) returns (google.api.HttpBody); rpc + UpdateResource(google.api.HttpBody) returns (google.protobuf.Empty); } + Example with streaming methods: service CaldavService { rpc + GetCalendar(stream google.api.HttpBody) returns (stream + google.api.HttpBody); rpc UpdateCalendar(stream google.api.HttpBody) returns + (stream google.api.HttpBody); } Use of this type only changes how the + request and response bodies are handled, all other features will continue to + work unchanged. + + Messages: + ExtensionsValueListEntry: A ExtensionsValueListEntry object. + + Fields: + contentType: The HTTP Content-Type header value specifying the content + type of the body. + data: The HTTP request/response body as raw binary. + extensions: Application specific response metadata. Must be set in the + first response for streaming APIs. + """ + @encoding.MapUnrecognizedFields('additionalProperties') + class ExtensionsValueListEntry(_messages.Message): + r"""A ExtensionsValueListEntry object. + + Messages: + AdditionalProperty: An additional property for a + ExtensionsValueListEntry object. + + Fields: + additionalProperties: Properties of the object. Contains field @type + with type URL. + """ + class AdditionalProperty(_messages.Message): + r"""An additional property for a ExtensionsValueListEntry object. + + Fields: + key: Name of the additional property. + value: A extra_types.JsonValue attribute. + """ + + key = _messages.StringField(1) + value = _messages.MessageField('extra_types.JsonValue', 2) + + additionalProperties = _messages.MessageField( + 'AdditionalProperty', 1, repeated=True) + + contentType = _messages.StringField(1) + data = _messages.BytesField(2) + extensions = _messages.MessageField( + 'ExtensionsValueListEntry', 3, repeated=True) + + +class InlineSecret(_messages.Message): + r"""Pairs a set of secret environment variables mapped to encrypted values + with the Cloud KMS key to use to decrypt the value. + + Messages: + EnvMapValue: Map of environment variable name to its encrypted value. + Secret environment variables must be unique across all of a build's + secrets, and must be used by at least one build step. Values can be at + most 64 KB in size. There can be at most 100 secret values across all of + a build's secrets. + + Fields: + envMap: Map of environment variable name to its encrypted value. Secret + environment variables must be unique across all of a build's secrets, + and must be used by at least one build step. Values can be at most 64 KB + in size. There can be at most 100 secret values across all of a build's + secrets. 
+ kmsKeyName: Resource name of Cloud KMS crypto key to decrypt the encrypted + value. In format: projects/*/locations/*/keyRings/*/cryptoKeys/* + """ + @encoding.MapUnrecognizedFields('additionalProperties') + class EnvMapValue(_messages.Message): + r"""Map of environment variable name to its encrypted value. Secret + environment variables must be unique across all of a build's secrets, and + must be used by at least one build step. Values can be at most 64 KB in + size. There can be at most 100 secret values across all of a build's + secrets. + + Messages: + AdditionalProperty: An additional property for a EnvMapValue object. + + Fields: + additionalProperties: Additional properties of type EnvMapValue + """ + class AdditionalProperty(_messages.Message): + r"""An additional property for a EnvMapValue object. + + Fields: + key: Name of the additional property. + value: A byte attribute. + """ + + key = _messages.StringField(1) + value = _messages.BytesField(2) + + additionalProperties = _messages.MessageField( + 'AdditionalProperty', 1, repeated=True) + + envMap = _messages.MessageField('EnvMapValue', 1) + kmsKeyName = _messages.StringField(2) + + +class ListBuildTriggersResponse(_messages.Message): + r"""Response containing existing `BuildTriggers`. + + Fields: + nextPageToken: Token to receive the next page of results. + triggers: `BuildTriggers` for the project, sorted by `create_time` + descending. + """ + + nextPageToken = _messages.StringField(1) + triggers = _messages.MessageField('BuildTrigger', 2, repeated=True) + + +class ListBuildsResponse(_messages.Message): + r"""Response including listed builds. + + Fields: + builds: Builds will be sorted by `create_time`, descending. + nextPageToken: Token to receive the next page of results. This will be + absent if the end of the response list has been reached. + """ + + builds = _messages.MessageField('Build', 1, repeated=True) + nextPageToken = _messages.StringField(2) + + +class Notification(_messages.Message): + r"""Notification is the container which holds the data that is relevant to + this particular notification. + + Messages: + StructDeliveryValue: Escape hatch for users to supply custom delivery + configs. + + Fields: + filter: The filter string to use for notification filtering. Currently, + this is assumed to be a CEL program. See + https://opensource.google/projects/cel for more. + httpDelivery: Configuration for HTTP delivery. + slackDelivery: Configuration for Slack delivery. + smtpDelivery: Configuration for SMTP (email) delivery. + structDelivery: Escape hatch for users to supply custom delivery configs. + """ + @encoding.MapUnrecognizedFields('additionalProperties') + class StructDeliveryValue(_messages.Message): + r"""Escape hatch for users to supply custom delivery configs. + + Messages: + AdditionalProperty: An additional property for a StructDeliveryValue + object. + + Fields: + additionalProperties: Properties of the object. + """ + class AdditionalProperty(_messages.Message): + r"""An additional property for a StructDeliveryValue object. + + Fields: + key: Name of the additional property. + value: A extra_types.JsonValue attribute. 
+ """ + + key = _messages.StringField(1) + value = _messages.MessageField('extra_types.JsonValue', 2) + + additionalProperties = _messages.MessageField( + 'AdditionalProperty', 1, repeated=True) + + filter = _messages.StringField(1) + httpDelivery = _messages.MessageField('HTTPDelivery', 2) + slackDelivery = _messages.MessageField('SlackDelivery', 3) + smtpDelivery = _messages.MessageField('SMTPDelivery', 4) + structDelivery = _messages.MessageField('StructDeliveryValue', 5) + + +class NotifierConfig(_messages.Message): + r"""NotifierConfig is the top-level configuration message. + + Fields: + apiVersion: The API version of this configuration format. + kind: The type of notifier to use (e.g. SMTPNotifier). + metadata: Metadata for referring to/handling/deploying this notifier. + spec: The actual configuration for this notifier. + """ + + apiVersion = _messages.StringField(1) + kind = _messages.StringField(2) + metadata = _messages.MessageField('NotifierMetadata', 3) + spec = _messages.MessageField('NotifierSpec', 4) + + +class NotifierMetadata(_messages.Message): + r"""NotifierMetadata contains the data which can be used to reference or + describe this notifier. + + Fields: + name: The human-readable and user-given name for the notifier. For + example: "repo-merge-email-notifier". + notifier: The string representing the name and version of notifier to + deploy. Expected to be of the form of "/:". For example: "gcr.io/my- + project/notifiers/smtp:1.2.34". + """ + + name = _messages.StringField(1) + notifier = _messages.StringField(2) + + +class NotifierSecret(_messages.Message): + r"""NotifierSecret is the container that maps a secret name (reference) to + its Google Cloud Secret Manager resource path. + + Fields: + name: Name is the local name of the secret, such as the verbatim string + "my-smtp-password". + value: Value is interpreted to be a resource path for fetching the actual + (versioned) secret data for this secret. For example, this would be a + Google Cloud Secret Manager secret version resource path like: + "projects/my-project/secrets/my-secret/versions/latest". + """ + + name = _messages.StringField(1) + value = _messages.StringField(2) + + +class NotifierSecretRef(_messages.Message): + r"""NotifierSecretRef contains the reference to a secret stored in the + corresponding NotifierSpec. + + Fields: + secretRef: The value of `secret_ref` should be a `name` that is registered + in a `Secret` in the `secrets` list of the `Spec`. + """ + + secretRef = _messages.StringField(1) + + +class NotifierSpec(_messages.Message): + r"""NotifierSpec is the configuration container for notifications. + + Fields: + notification: The configuration of this particular notifier. + secrets: Configurations for secret resources used by this particular + notifier. + """ + + notification = _messages.MessageField('Notification', 1) + secrets = _messages.MessageField('NotifierSecret', 2, repeated=True) + + +class Operation(_messages.Message): + r"""This resource represents a long-running operation that is the result of + a network API call. + + Messages: + MetadataValue: Service-specific metadata associated with the operation. It + typically contains progress information and common metadata such as + create time. Some services might not provide such metadata. Any method + that returns a long-running operation should document the metadata type, + if any. + ResponseValue: The normal response of the operation in case of success. 
If + the original method returns no data on success, such as `Delete`, the + response is `google.protobuf.Empty`. If the original method is standard + `Get`/`Create`/`Update`, the response should be the resource. For other + methods, the response should have the type `XxxResponse`, where `Xxx` is + the original method name. For example, if the original method name is + `TakeSnapshot()`, the inferred response type is `TakeSnapshotResponse`. + + Fields: + done: If the value is `false`, it means the operation is still in + progress. If `true`, the operation is completed, and either `error` or + `response` is available. + error: The error result of the operation in case of failure or + cancellation. + metadata: Service-specific metadata associated with the operation. It + typically contains progress information and common metadata such as + create time. Some services might not provide such metadata. Any method + that returns a long-running operation should document the metadata type, + if any. + name: The server-assigned name, which is only unique within the same + service that originally returns it. If you use the default HTTP mapping, + the `name` should be a resource name ending with + `operations/{unique_id}`. + response: The normal response of the operation in case of success. If the + original method returns no data on success, such as `Delete`, the + response is `google.protobuf.Empty`. If the original method is standard + `Get`/`Create`/`Update`, the response should be the resource. For other + methods, the response should have the type `XxxResponse`, where `Xxx` is + the original method name. For example, if the original method name is + `TakeSnapshot()`, the inferred response type is `TakeSnapshotResponse`. + """ + @encoding.MapUnrecognizedFields('additionalProperties') + class MetadataValue(_messages.Message): + r"""Service-specific metadata associated with the operation. It typically + contains progress information and common metadata such as create time. + Some services might not provide such metadata. Any method that returns a + long-running operation should document the metadata type, if any. + + Messages: + AdditionalProperty: An additional property for a MetadataValue object. + + Fields: + additionalProperties: Properties of the object. Contains field @type + with type URL. + """ + class AdditionalProperty(_messages.Message): + r"""An additional property for a MetadataValue object. + + Fields: + key: Name of the additional property. + value: A extra_types.JsonValue attribute. + """ + + key = _messages.StringField(1) + value = _messages.MessageField('extra_types.JsonValue', 2) + + additionalProperties = _messages.MessageField( + 'AdditionalProperty', 1, repeated=True) + + @encoding.MapUnrecognizedFields('additionalProperties') + class ResponseValue(_messages.Message): + r"""The normal response of the operation in case of success. If the + original method returns no data on success, such as `Delete`, the response + is `google.protobuf.Empty`. If the original method is standard + `Get`/`Create`/`Update`, the response should be the resource. For other + methods, the response should have the type `XxxResponse`, where `Xxx` is + the original method name. For example, if the original method name is + `TakeSnapshot()`, the inferred response type is `TakeSnapshotResponse`. + + Messages: + AdditionalProperty: An additional property for a ResponseValue object. + + Fields: + additionalProperties: Properties of the object. Contains field @type + with type URL. 
+ """ + class AdditionalProperty(_messages.Message): + r"""An additional property for a ResponseValue object. + + Fields: + key: Name of the additional property. + value: A extra_types.JsonValue attribute. + """ + + key = _messages.StringField(1) + value = _messages.MessageField('extra_types.JsonValue', 2) + + additionalProperties = _messages.MessageField( + 'AdditionalProperty', 1, repeated=True) + + done = _messages.BooleanField(1) + error = _messages.MessageField('Status', 2) + metadata = _messages.MessageField('MetadataValue', 3) + name = _messages.StringField(4) + response = _messages.MessageField('ResponseValue', 5) + + +class PullRequestFilter(_messages.Message): + r"""PullRequestFilter contains filter properties for matching GitHub Pull + Requests. + + Enums: + CommentControlValueValuesEnum: Configure builds to run whether a + repository owner or collaborator need to comment `/gcbrun`. + + Fields: + branch: Regex of branches to match. The syntax of the regular expressions + accepted is the syntax accepted by RE2 and described at + https://github.com/google/re2/wiki/Syntax + commentControl: Configure builds to run whether a repository owner or + collaborator need to comment `/gcbrun`. + invertRegex: If true, branches that do NOT match the git_ref will trigger + a build. + """ + class CommentControlValueValuesEnum(_messages.Enum): + r"""Configure builds to run whether a repository owner or collaborator + need to comment `/gcbrun`. + + Values: + COMMENTS_DISABLED: Do not require comments on Pull Requests before + builds are triggered. + COMMENTS_ENABLED: Enforce that repository owners or collaborators must + comment on Pull Requests before builds are triggered. + COMMENTS_ENABLED_FOR_EXTERNAL_CONTRIBUTORS_ONLY: Enforce that repository + owners or collaborators must comment on external contributors' Pull + Requests before builds are triggered. + """ + COMMENTS_DISABLED = 0 + COMMENTS_ENABLED = 1 + COMMENTS_ENABLED_FOR_EXTERNAL_CONTRIBUTORS_ONLY = 2 + + branch = _messages.StringField(1) + commentControl = _messages.EnumField('CommentControlValueValuesEnum', 2) + invertRegex = _messages.BooleanField(3) + + +class PushFilter(_messages.Message): + r"""Push contains filter properties for matching GitHub git pushes. + + Fields: + branch: Regexes matching branches to build. The syntax of the regular + expressions accepted is the syntax accepted by RE2 and described at + https://github.com/google/re2/wiki/Syntax + invertRegex: When true, only trigger a build if the revision regex does + NOT match the git_ref regex. + tag: Regexes matching tags to build. The syntax of the regular expressions + accepted is the syntax accepted by RE2 and described at + https://github.com/google/re2/wiki/Syntax + """ + + branch = _messages.StringField(1) + invertRegex = _messages.BooleanField(2) + tag = _messages.StringField(3) + + +class ReceiveTriggerWebhookResponse(_messages.Message): + r"""ReceiveTriggerWebhookResponse [Experimental] is the response object for + the ReceiveTriggerWebhook method. + """ + + +class RepoSource(_messages.Message): + r"""Location of the source in a Google Cloud Source Repository. + + Messages: + SubstitutionsValue: Substitutions to use in a triggered build. Should only + be used with RunBuildTrigger + + Fields: + branchName: Regex matching branches to build. The syntax of the regular + expressions accepted is the syntax accepted by RE2 and described at + https://github.com/google/re2/wiki/Syntax + commitSha: Explicit commit SHA to build. 
+ dir: Directory, relative to the source root, in which to run the build. + This must be a relative path. If a step's `dir` is specified and is an + absolute path, this value is ignored for that step's execution. + invertRegex: Only trigger a build if the revision regex does NOT match the + revision regex. + projectId: ID of the project that owns the Cloud Source Repository. If + omitted, the project ID requesting the build is assumed. + repoName: Name of the Cloud Source Repository. + substitutions: Substitutions to use in a triggered build. Should only be + used with RunBuildTrigger + tagName: Regex matching tags to build. The syntax of the regular + expressions accepted is the syntax accepted by RE2 and described at + https://github.com/google/re2/wiki/Syntax + """ + @encoding.MapUnrecognizedFields('additionalProperties') + class SubstitutionsValue(_messages.Message): + r"""Substitutions to use in a triggered build. Should only be used with + RunBuildTrigger + + Messages: + AdditionalProperty: An additional property for a SubstitutionsValue + object. + + Fields: + additionalProperties: Additional properties of type SubstitutionsValue + """ + class AdditionalProperty(_messages.Message): + r"""An additional property for a SubstitutionsValue object. + + Fields: + key: Name of the additional property. + value: A string attribute. + """ + + key = _messages.StringField(1) + value = _messages.StringField(2) + + additionalProperties = _messages.MessageField( + 'AdditionalProperty', 1, repeated=True) + + branchName = _messages.StringField(1) + commitSha = _messages.StringField(2) + dir = _messages.StringField(3) + invertRegex = _messages.BooleanField(4) + projectId = _messages.StringField(5) + repoName = _messages.StringField(6) + substitutions = _messages.MessageField('SubstitutionsValue', 7) + tagName = _messages.StringField(8) + + +class Results(_messages.Message): + r"""Artifacts created by the build pipeline. + + Fields: + artifactManifest: Path to the artifact manifest. Only populated when + artifacts are uploaded. + artifactTiming: Time to push all non-container artifacts. + buildStepImages: List of build step digests, in the order corresponding to + build step indices. + buildStepOutputs: List of build step outputs, produced by builder images, + in the order corresponding to build step indices. [Cloud + Builders](https://cloud.google.com/cloud-build/docs/cloud-builders) can + produce this output by writing to `$BUILDER_OUTPUT/output`. Only the + first 4KB of data is stored. + images: Container images that were built as a part of the build. + numArtifacts: Number of artifacts uploaded. Only populated when artifacts + are uploaded. + """ + + artifactManifest = _messages.StringField(1) + artifactTiming = _messages.MessageField('TimeSpan', 2) + buildStepImages = _messages.StringField(3, repeated=True) + buildStepOutputs = _messages.BytesField(4, repeated=True) + images = _messages.MessageField('BuiltImage', 5, repeated=True) + numArtifacts = _messages.IntegerField(6) + + +class RetryBuildRequest(_messages.Message): + r"""Specifies a build to retry. + + Fields: + id: Required. Build ID of the original build. + name: The name of the `Build` to retry. Format: + `projects/{project}/locations/{location}/builds/{build}` + projectId: Required. ID of the project. 
+ """ + + id = _messages.StringField(1) + name = _messages.StringField(2) + projectId = _messages.StringField(3) + + +class SMTPDelivery(_messages.Message): + r"""SMTPDelivery is the delivery configuration for an SMTP (email) + notification. + + Fields: + fromAddress: This is the SMTP account/email that appears in the `From:` of + the email. If empty, it is assumed to be sender. + password: The SMTP sender's password. + port: The SMTP port of the server. + recipientAddresses: This is the list of addresses to which we send the + email (i.e. in the `To:` of the email). + senderAddress: This is the SMTP account/email that is used to send the + message. + server: The address of the SMTP server. + """ + + fromAddress = _messages.StringField(1) + password = _messages.MessageField('NotifierSecretRef', 2) + port = _messages.StringField(3) + recipientAddresses = _messages.StringField(4, repeated=True) + senderAddress = _messages.StringField(5) + server = _messages.StringField(6) + + +class Secret(_messages.Message): + r"""Pairs a set of secret environment variables containing encrypted values + with the Cloud KMS key to use to decrypt the value. Note: Use `kmsKeyName` + with `available_secrets` instead of using `kmsKeyName` with `secret`. For + instructions see: https://cloud.google.com/cloud-build/docs/securing- + builds/use-encrypted-credentials. + + Messages: + SecretEnvValue: Map of environment variable name to its encrypted value. + Secret environment variables must be unique across all of a build's + secrets, and must be used by at least one build step. Values can be at + most 64 KB in size. There can be at most 100 secret values across all of + a build's secrets. + + Fields: + kmsKeyName: Cloud KMS key name to use to decrypt these envs. + secretEnv: Map of environment variable name to its encrypted value. Secret + environment variables must be unique across all of a build's secrets, + and must be used by at least one build step. Values can be at most 64 KB + in size. There can be at most 100 secret values across all of a build's + secrets. + """ + @encoding.MapUnrecognizedFields('additionalProperties') + class SecretEnvValue(_messages.Message): + r"""Map of environment variable name to its encrypted value. Secret + environment variables must be unique across all of a build's secrets, and + must be used by at least one build step. Values can be at most 64 KB in + size. There can be at most 100 secret values across all of a build's + secrets. + + Messages: + AdditionalProperty: An additional property for a SecretEnvValue object. + + Fields: + additionalProperties: Additional properties of type SecretEnvValue + """ + class AdditionalProperty(_messages.Message): + r"""An additional property for a SecretEnvValue object. + + Fields: + key: Name of the additional property. + value: A byte attribute. + """ + + key = _messages.StringField(1) + value = _messages.BytesField(2) + + additionalProperties = _messages.MessageField( + 'AdditionalProperty', 1, repeated=True) + + kmsKeyName = _messages.StringField(1) + secretEnv = _messages.MessageField('SecretEnvValue', 2) + + +class SecretManagerSecret(_messages.Message): + r"""Pairs a secret environment variable with a SecretVersion in Secret + Manager. + + Fields: + env: Environment variable name to associate with the secret. Secret + environment variables must be unique across all of a build's secrets, + and must be used by at least one build step. + versionName: Resource name of the SecretVersion. 
In format: + projects/*/secrets/*/versions/* + """ + + env = _messages.StringField(1) + versionName = _messages.StringField(2) + + +class Secrets(_messages.Message): + r"""Secrets and secret environment variables. + + Fields: + inline: Secrets encrypted with KMS key and the associated secret + environment variable. + secretManager: Secrets in Secret Manager and associated secret environment + variable. + """ + + inline = _messages.MessageField('InlineSecret', 1, repeated=True) + secretManager = _messages.MessageField( + 'SecretManagerSecret', 2, repeated=True) + + +class SlackDelivery(_messages.Message): + r"""SlackDelivery is the delivery configuration for delivering Slack + messages via webhooks. See Slack webhook documentation at: + https://api.slack.com/messaging/webhooks. + + Fields: + webhookUri: The secret reference for the Slack webhook URI for sending + messages to a channel. + """ + + webhookUri = _messages.MessageField('NotifierSecretRef', 1) + + +class Source(_messages.Message): + r"""Location of the source in a supported storage service. + + Fields: + repoSource: If provided, get the source from this location in a Cloud + Source Repository. + storageSource: If provided, get the source from this location in Google + Cloud Storage. + """ + + repoSource = _messages.MessageField('RepoSource', 1) + storageSource = _messages.MessageField('StorageSource', 2) + + +class SourceProvenance(_messages.Message): + r"""Provenance of the source. Ways to find the original source, or verify + that some source was used for this build. + + Messages: + FileHashesValue: Output only. Hash(es) of the build source, which can be + used to verify that the original source integrity was maintained in the + build. Note that `FileHashes` will only be populated if `BuildOptions` + has requested a `SourceProvenanceHash`. The keys to this map are file + paths used as build source and the values contain the hash values for + those files. If the build source came in a single package such as a + gzipped tarfile (`.tar.gz`), the `FileHash` will be for the single path + to that file. + + Fields: + fileHashes: Output only. Hash(es) of the build source, which can be used + to verify that the original source integrity was maintained in the + build. Note that `FileHashes` will only be populated if `BuildOptions` + has requested a `SourceProvenanceHash`. The keys to this map are file + paths used as build source and the values contain the hash values for + those files. If the build source came in a single package such as a + gzipped tarfile (`.tar.gz`), the `FileHash` will be for the single path + to that file. + resolvedRepoSource: A copy of the build's `source.repo_source`, if exists, + with any revisions resolved. + resolvedStorageSource: A copy of the build's `source.storage_source`, if + exists, with any generations resolved. + """ + @encoding.MapUnrecognizedFields('additionalProperties') + class FileHashesValue(_messages.Message): + r"""Output only. Hash(es) of the build source, which can be used to verify + that the original source integrity was maintained in the build. Note that + `FileHashes` will only be populated if `BuildOptions` has requested a + `SourceProvenanceHash`. The keys to this map are file paths used as build + source and the values contain the hash values for those files. If the + build source came in a single package such as a gzipped tarfile + (`.tar.gz`), the `FileHash` will be for the single path to that file. 
+ + Messages: + AdditionalProperty: An additional property for a FileHashesValue object. + + Fields: + additionalProperties: Additional properties of type FileHashesValue + """ + class AdditionalProperty(_messages.Message): + r"""An additional property for a FileHashesValue object. + + Fields: + key: Name of the additional property. + value: A FileHashes attribute. + """ + + key = _messages.StringField(1) + value = _messages.MessageField('FileHashes', 2) + + additionalProperties = _messages.MessageField( + 'AdditionalProperty', 1, repeated=True) + + fileHashes = _messages.MessageField('FileHashesValue', 1) + resolvedRepoSource = _messages.MessageField('RepoSource', 2) + resolvedStorageSource = _messages.MessageField('StorageSource', 3) + + +class StandardQueryParameters(_messages.Message): + r"""Query parameters accepted by all methods. + + Enums: + FXgafvValueValuesEnum: V1 error format. + AltValueValuesEnum: Data format for response. + + Fields: + f__xgafv: V1 error format. + access_token: OAuth access token. + alt: Data format for response. + callback: JSONP + fields: Selector specifying which fields to include in a partial response. + key: API key. Your API key identifies your project and provides you with + API access, quota, and reports. Required unless you provide an OAuth 2.0 + token. + oauth_token: OAuth 2.0 token for the current user. + prettyPrint: Returns response with indentations and line breaks. + quotaUser: Available to use for quota purposes for server-side + applications. Can be any arbitrary string assigned to a user, but should + not exceed 40 characters. + trace: A tracing token of the form "token:" to include in api + requests. + uploadType: Legacy upload protocol for media (e.g. "media", "multipart"). + upload_protocol: Upload protocol for media (e.g. "raw", "multipart"). + """ + class AltValueValuesEnum(_messages.Enum): + r"""Data format for response. + + Values: + json: Responses with Content-Type of application/json + media: Media download with context-dependent Content-Type + proto: Responses with Content-Type of application/x-protobuf + """ + json = 0 + media = 1 + proto = 2 + + class FXgafvValueValuesEnum(_messages.Enum): + r"""V1 error format. + + Values: + _1: v1 error format + _2: v2 error format + """ + _1 = 0 + _2 = 1 + + f__xgafv = _messages.EnumField('FXgafvValueValuesEnum', 1) + access_token = _messages.StringField(2) + alt = _messages.EnumField('AltValueValuesEnum', 3, default='json') + callback = _messages.StringField(4) + fields = _messages.StringField(5) + key = _messages.StringField(6) + oauth_token = _messages.StringField(7) + prettyPrint = _messages.BooleanField(8, default=True) + quotaUser = _messages.StringField(9) + trace = _messages.StringField(10) + uploadType = _messages.StringField(11) + upload_protocol = _messages.StringField(12) + + +class Status(_messages.Message): + r"""The `Status` type defines a logical error model that is suitable for + different programming environments, including REST APIs and RPC APIs. It is + used by [gRPC](https://github.com/grpc). Each `Status` message contains + three pieces of data: error code, error message, and error details. You can + find out more about this error model and how to work with it in the [API + Design Guide](https://cloud.google.com/apis/design/errors). + + Messages: + DetailsValueListEntry: A DetailsValueListEntry object. + + Fields: + code: The status code, which should be an enum value of google.rpc.Code. + details: A list of messages that carry the error details. 
There is a + common set of message types for APIs to use. + message: A developer-facing error message, which should be in English. Any + user-facing error message should be localized and sent in the + google.rpc.Status.details field, or localized by the client. + """ + @encoding.MapUnrecognizedFields('additionalProperties') + class DetailsValueListEntry(_messages.Message): + r"""A DetailsValueListEntry object. + + Messages: + AdditionalProperty: An additional property for a DetailsValueListEntry + object. + + Fields: + additionalProperties: Properties of the object. Contains field @type + with type URL. + """ + class AdditionalProperty(_messages.Message): + r"""An additional property for a DetailsValueListEntry object. + + Fields: + key: Name of the additional property. + value: A extra_types.JsonValue attribute. + """ + + key = _messages.StringField(1) + value = _messages.MessageField('extra_types.JsonValue', 2) + + additionalProperties = _messages.MessageField( + 'AdditionalProperty', 1, repeated=True) + + code = _messages.IntegerField(1, variant=_messages.Variant.INT32) + details = _messages.MessageField('DetailsValueListEntry', 2, repeated=True) + message = _messages.StringField(3) + + +class StorageSource(_messages.Message): + r"""Location of the source in an archive file in Google Cloud Storage. + + Fields: + bucket: Google Cloud Storage bucket containing the source (see [Bucket + Name Requirements](https://cloud.google.com/storage/docs/bucket- + naming#requirements)). + generation: Google Cloud Storage generation for the object. If the + generation is omitted, the latest generation will be used. + object: Google Cloud Storage object containing the source. This object + must be a gzipped archive file (`.tar.gz`) containing source to build. + """ + + bucket = _messages.StringField(1) + generation = _messages.IntegerField(2) + object = _messages.StringField(3) + + +class TimeSpan(_messages.Message): + r"""Start and end times for a build execution phase. + + Fields: + endTime: End of time span. + startTime: Start of time span. + """ + + endTime = _messages.StringField(1) + startTime = _messages.StringField(2) + + +class Volume(_messages.Message): + r"""Volume describes a Docker container volume which is mounted into build + steps in order to persist files across build step execution. + + Fields: + name: Name of the volume to mount. Volume names must be unique per build + step and must be valid names for Docker volumes. Each named volume must + be used by at least two build steps. + path: Path at which to mount the volume. Paths must be absolute and cannot + conflict with other volume paths on the same build step or with certain + reserved volume paths. 
+ """ + + name = _messages.StringField(1) + path = _messages.StringField(2) + + +encoding.AddCustomJsonFieldMapping( + StandardQueryParameters, 'f__xgafv', '$.xgafv') +encoding.AddCustomJsonEnumMapping( + StandardQueryParameters.FXgafvValueValuesEnum, '_1', '1') +encoding.AddCustomJsonEnumMapping( + StandardQueryParameters.FXgafvValueValuesEnum, '_2', '2') diff --git a/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/__init__.py b/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/__init__.py index 0100df61db3f..239ee8c700a2 100644 --- a/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/__init__.py +++ b/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/__init__.py @@ -18,8 +18,6 @@ """Common imports for generated dataflow client library.""" # pylint:disable=wildcard-import -from __future__ import absolute_import - import pkgutil # Protect against environments where apitools library is not available. diff --git a/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_client.py b/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_client.py index b7cdcaa8ab9f..a48b9ea47c40 100644 --- a/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_client.py +++ b/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_client.py @@ -18,8 +18,6 @@ """Generated client library for dataflow version v1b3.""" # NOTE: This file is autogenerated and should not be edited by hand. -from __future__ import absolute_import - from apitools.base.py import base_api from apache_beam.runners.dataflow.internal.clients.dataflow import dataflow_v1b3_messages as messages @@ -29,17 +27,17 @@ class DataflowV1b3(base_api.BaseApiClient): """Generated client library for service dataflow version v1b3.""" MESSAGES_MODULE = messages - BASE_URL = u'https://dataflow.googleapis.com/' - MTLS_BASE_URL = u'' + BASE_URL = 'https://dataflow.googleapis.com/' + MTLS_BASE_URL = 'https://dataflow.mtls.googleapis.com/' - _PACKAGE = u'dataflow' - _SCOPES = [u'https://www.googleapis.com/auth/cloud-platform', u'https://www.googleapis.com/auth/compute', u'https://www.googleapis.com/auth/compute.readonly', u'https://www.googleapis.com/auth/userinfo.email'] - _VERSION = u'v1b3' + _PACKAGE = 'dataflow' + _SCOPES = ['https://www.googleapis.com/auth/cloud-platform', 'https://www.googleapis.com/auth/compute', 'https://www.googleapis.com/auth/compute.readonly', 'https://www.googleapis.com/auth/userinfo.email'] + _VERSION = 'v1b3' _CLIENT_ID = '1042881264118.apps.googleusercontent.com' _CLIENT_SECRET = 'x_Tw5K8nnjoRAqULM9PFAC2b' _USER_AGENT = 'x_Tw5K8nnjoRAqULM9PFAC2b' - _CLIENT_CLASS_NAME = u'DataflowV1b3' - _URL_VERSION = u'v1b3' + _CLIENT_CLASS_NAME = 'DataflowV1b3' + _URL_VERSION = 'v1b3' _API_KEY = None def __init__(self, url='', credentials=None, @@ -57,6 +55,8 @@ def __init__(self, url='', credentials=None, default_global_params=default_global_params, additional_http_headers=additional_http_headers, response_encoding=response_encoding) + self.projects_catalogTemplates_templateVersions = self.ProjectsCatalogTemplatesTemplateVersionsService(self) + self.projects_catalogTemplates = self.ProjectsCatalogTemplatesService(self) self.projects_jobs_debug = self.ProjectsJobsDebugService(self) self.projects_jobs_messages = self.ProjectsJobsMessagesService(self) self.projects_jobs_workItems = self.ProjectsJobsWorkItemsService(self) @@ -64,18 +64,205 @@ def __init__(self, url='', 
credentials=None, self.projects_locations_flexTemplates = self.ProjectsLocationsFlexTemplatesService(self) self.projects_locations_jobs_debug = self.ProjectsLocationsJobsDebugService(self) self.projects_locations_jobs_messages = self.ProjectsLocationsJobsMessagesService(self) + self.projects_locations_jobs_snapshots = self.ProjectsLocationsJobsSnapshotsService(self) + self.projects_locations_jobs_stages = self.ProjectsLocationsJobsStagesService(self) self.projects_locations_jobs_workItems = self.ProjectsLocationsJobsWorkItemsService(self) self.projects_locations_jobs = self.ProjectsLocationsJobsService(self) + self.projects_locations_snapshots = self.ProjectsLocationsSnapshotsService(self) self.projects_locations_sql = self.ProjectsLocationsSqlService(self) self.projects_locations_templates = self.ProjectsLocationsTemplatesService(self) self.projects_locations = self.ProjectsLocationsService(self) + self.projects_snapshots = self.ProjectsSnapshotsService(self) + self.projects_templateVersions = self.ProjectsTemplateVersionsService(self) self.projects_templates = self.ProjectsTemplatesService(self) self.projects = self.ProjectsService(self) + class ProjectsCatalogTemplatesTemplateVersionsService(base_api.BaseApiService): + """Service class for the projects_catalogTemplates_templateVersions resource.""" + + _NAME = 'projects_catalogTemplates_templateVersions' + + def __init__(self, client): + super(DataflowV1b3.ProjectsCatalogTemplatesTemplateVersionsService, self).__init__(client) + self._upload_configs = { + } + + def Create(self, request, global_params=None): + r"""Creates a new Template with TemplateVersion. Requires project_id(projects) and template display_name(catalogTemplates). The template display_name is set by the user. + + Args: + request: (DataflowProjectsCatalogTemplatesTemplateVersionsCreateRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (TemplateVersion) The response message. + """ + config = self.GetMethodConfig('Create') + return self._RunMethod( + config, request, global_params=global_params) + + Create.method_config = lambda: base_api.ApiMethodInfo( + flat_path='v1b3/projects/{projectsId}/catalogTemplates/{catalogTemplatesId}/templateVersions', + http_method='POST', + method_id='dataflow.projects.catalogTemplates.templateVersions.create', + ordered_params=['parent'], + path_params=['parent'], + query_params=[], + relative_path='v1b3/{+parent}/templateVersions', + request_field='createTemplateVersionRequest', + request_type_name='DataflowProjectsCatalogTemplatesTemplateVersionsCreateRequest', + response_type_name='TemplateVersion', + supports_download=False, + ) + + class ProjectsCatalogTemplatesService(base_api.BaseApiService): + """Service class for the projects_catalogTemplates resource.""" + + _NAME = 'projects_catalogTemplates' + + def __init__(self, client): + super(DataflowV1b3.ProjectsCatalogTemplatesService, self).__init__(client) + self._upload_configs = { + } + + def Commit(self, request, global_params=None): + r"""Creates a new TemplateVersion (Important: not new Template) entry in the spanner table. Requires project_id and display_name (template). + + Args: + request: (DataflowProjectsCatalogTemplatesCommitRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (TemplateVersion) The response message. 
+ """ + config = self.GetMethodConfig('Commit') + return self._RunMethod( + config, request, global_params=global_params) + + Commit.method_config = lambda: base_api.ApiMethodInfo( + flat_path='v1b3/projects/{projectsId}/catalogTemplates/{catalogTemplatesId}:commit', + http_method='POST', + method_id='dataflow.projects.catalogTemplates.commit', + ordered_params=['name'], + path_params=['name'], + query_params=[], + relative_path='v1b3/{+name}:commit', + request_field='commitTemplateVersionRequest', + request_type_name='DataflowProjectsCatalogTemplatesCommitRequest', + response_type_name='TemplateVersion', + supports_download=False, + ) + + def Delete(self, request, global_params=None): + r"""Deletes an existing Template. Do nothing if Template does not exist. + + Args: + request: (DataflowProjectsCatalogTemplatesDeleteRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (Empty) The response message. + """ + config = self.GetMethodConfig('Delete') + return self._RunMethod( + config, request, global_params=global_params) + + Delete.method_config = lambda: base_api.ApiMethodInfo( + flat_path='v1b3/projects/{projectsId}/catalogTemplates/{catalogTemplatesId}', + http_method='DELETE', + method_id='dataflow.projects.catalogTemplates.delete', + ordered_params=['name'], + path_params=['name'], + query_params=[], + relative_path='v1b3/{+name}', + request_field='', + request_type_name='DataflowProjectsCatalogTemplatesDeleteRequest', + response_type_name='Empty', + supports_download=False, + ) + + def Get(self, request, global_params=None): + r"""Get TemplateVersion using project_id and display_name with an optional version_id field. Get latest (has tag "latest") TemplateVersion if version_id not set. + + Args: + request: (DataflowProjectsCatalogTemplatesGetRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (TemplateVersion) The response message. + """ + config = self.GetMethodConfig('Get') + return self._RunMethod( + config, request, global_params=global_params) + + Get.method_config = lambda: base_api.ApiMethodInfo( + flat_path='v1b3/projects/{projectsId}/catalogTemplates/{catalogTemplatesId}', + http_method='GET', + method_id='dataflow.projects.catalogTemplates.get', + ordered_params=['name'], + path_params=['name'], + query_params=[], + relative_path='v1b3/{+name}', + request_field='', + request_type_name='DataflowProjectsCatalogTemplatesGetRequest', + response_type_name='TemplateVersion', + supports_download=False, + ) + + def Label(self, request, global_params=None): + r"""Updates the label of the TemplateVersion. Label can be duplicated in Template, so either add or remove the label in the TemplateVersion. + + Args: + request: (DataflowProjectsCatalogTemplatesLabelRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (ModifyTemplateVersionLabelResponse) The response message. 
+ """ + config = self.GetMethodConfig('Label') + return self._RunMethod( + config, request, global_params=global_params) + + Label.method_config = lambda: base_api.ApiMethodInfo( + flat_path='v1b3/projects/{projectsId}/catalogTemplates/{catalogTemplatesId}:label', + http_method='POST', + method_id='dataflow.projects.catalogTemplates.label', + ordered_params=['name'], + path_params=['name'], + query_params=[], + relative_path='v1b3/{+name}:label', + request_field='modifyTemplateVersionLabelRequest', + request_type_name='DataflowProjectsCatalogTemplatesLabelRequest', + response_type_name='ModifyTemplateVersionLabelResponse', + supports_download=False, + ) + + def Tag(self, request, global_params=None): + r"""Updates the tag of the TemplateVersion, and tag is unique in Template. If tag exists in another TemplateVersion in the Template, updates the tag to this TemplateVersion will remove it from the old TemplateVersion and add it to this TemplateVersion. If request is remove_only (remove_only = true), remove the tag from this TemplateVersion. + + Args: + request: (DataflowProjectsCatalogTemplatesTagRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (ModifyTemplateVersionTagResponse) The response message. + """ + config = self.GetMethodConfig('Tag') + return self._RunMethod( + config, request, global_params=global_params) + + Tag.method_config = lambda: base_api.ApiMethodInfo( + flat_path='v1b3/projects/{projectsId}/catalogTemplates/{catalogTemplatesId}:tag', + http_method='POST', + method_id='dataflow.projects.catalogTemplates.tag', + ordered_params=['name'], + path_params=['name'], + query_params=[], + relative_path='v1b3/{+name}:tag', + request_field='modifyTemplateVersionTagRequest', + request_type_name='DataflowProjectsCatalogTemplatesTagRequest', + response_type_name='ModifyTemplateVersionTagResponse', + supports_download=False, + ) + class ProjectsJobsDebugService(base_api.BaseApiService): """Service class for the projects_jobs_debug resource.""" - _NAME = u'projects_jobs_debug' + _NAME = 'projects_jobs_debug' def __init__(self, client): super(DataflowV1b3.ProjectsJobsDebugService, self).__init__(client) @@ -96,15 +283,15 @@ def GetConfig(self, request, global_params=None): config, request, global_params=global_params) GetConfig.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'POST', - method_id=u'dataflow.projects.jobs.debug.getConfig', - ordered_params=[u'projectId', u'jobId'], - path_params=[u'jobId', u'projectId'], + http_method='POST', + method_id='dataflow.projects.jobs.debug.getConfig', + ordered_params=['projectId', 'jobId'], + path_params=['jobId', 'projectId'], query_params=[], - relative_path=u'v1b3/projects/{projectId}/jobs/{jobId}/debug/getConfig', - request_field=u'getDebugConfigRequest', - request_type_name=u'DataflowProjectsJobsDebugGetConfigRequest', - response_type_name=u'GetDebugConfigResponse', + relative_path='v1b3/projects/{projectId}/jobs/{jobId}/debug/getConfig', + request_field='getDebugConfigRequest', + request_type_name='DataflowProjectsJobsDebugGetConfigRequest', + response_type_name='GetDebugConfigResponse', supports_download=False, ) @@ -122,22 +309,22 @@ def SendCapture(self, request, global_params=None): config, request, global_params=global_params) SendCapture.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'POST', - method_id=u'dataflow.projects.jobs.debug.sendCapture', - ordered_params=[u'projectId', u'jobId'], - path_params=[u'jobId', u'projectId'], + 
http_method='POST', + method_id='dataflow.projects.jobs.debug.sendCapture', + ordered_params=['projectId', 'jobId'], + path_params=['jobId', 'projectId'], query_params=[], - relative_path=u'v1b3/projects/{projectId}/jobs/{jobId}/debug/sendCapture', - request_field=u'sendDebugCaptureRequest', - request_type_name=u'DataflowProjectsJobsDebugSendCaptureRequest', - response_type_name=u'SendDebugCaptureResponse', + relative_path='v1b3/projects/{projectId}/jobs/{jobId}/debug/sendCapture', + request_field='sendDebugCaptureRequest', + request_type_name='DataflowProjectsJobsDebugSendCaptureRequest', + response_type_name='SendDebugCaptureResponse', supports_download=False, ) class ProjectsJobsMessagesService(base_api.BaseApiService): """Service class for the projects_jobs_messages resource.""" - _NAME = u'projects_jobs_messages' + _NAME = 'projects_jobs_messages' def __init__(self, client): super(DataflowV1b3.ProjectsJobsMessagesService, self).__init__(client) @@ -145,13 +332,7 @@ def __init__(self, client): } def List(self, request, global_params=None): - r"""Request the job status. - -To request the status of a job, we recommend using -`projects.locations.jobs.messages.list` with a [regional endpoint] -(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using -`projects.jobs.messages.list` is not recommended, as you can only request -the status of jobs that are running in `us-central1`. + r"""Request the job status. To request the status of a job, we recommend using `projects.locations.jobs.messages.list` with a [regional endpoint] (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using `projects.jobs.messages.list` is not recommended, as you can only request the status of jobs that are running in `us-central1`. Args: request: (DataflowProjectsJobsMessagesListRequest) input message @@ -164,22 +345,22 @@ def List(self, request, global_params=None): config, request, global_params=global_params) List.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'GET', - method_id=u'dataflow.projects.jobs.messages.list', - ordered_params=[u'projectId', u'jobId'], - path_params=[u'jobId', u'projectId'], - query_params=[u'endTime', u'location', u'minimumImportance', u'pageSize', u'pageToken', u'startTime'], - relative_path=u'v1b3/projects/{projectId}/jobs/{jobId}/messages', + http_method='GET', + method_id='dataflow.projects.jobs.messages.list', + ordered_params=['projectId', 'jobId'], + path_params=['jobId', 'projectId'], + query_params=['endTime', 'location', 'minimumImportance', 'pageSize', 'pageToken', 'startTime'], + relative_path='v1b3/projects/{projectId}/jobs/{jobId}/messages', request_field='', - request_type_name=u'DataflowProjectsJobsMessagesListRequest', - response_type_name=u'ListJobMessagesResponse', + request_type_name='DataflowProjectsJobsMessagesListRequest', + response_type_name='ListJobMessagesResponse', supports_download=False, ) class ProjectsJobsWorkItemsService(base_api.BaseApiService): """Service class for the projects_jobs_workItems resource.""" - _NAME = u'projects_jobs_workItems' + _NAME = 'projects_jobs_workItems' def __init__(self, client): super(DataflowV1b3.ProjectsJobsWorkItemsService, self).__init__(client) @@ -200,15 +381,15 @@ def Lease(self, request, global_params=None): config, request, global_params=global_params) Lease.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'POST', - method_id=u'dataflow.projects.jobs.workItems.lease', - ordered_params=[u'projectId', u'jobId'], - path_params=[u'jobId', u'projectId'], + 
http_method='POST', + method_id='dataflow.projects.jobs.workItems.lease', + ordered_params=['projectId', 'jobId'], + path_params=['jobId', 'projectId'], query_params=[], - relative_path=u'v1b3/projects/{projectId}/jobs/{jobId}/workItems:lease', - request_field=u'leaseWorkItemRequest', - request_type_name=u'DataflowProjectsJobsWorkItemsLeaseRequest', - response_type_name=u'LeaseWorkItemResponse', + relative_path='v1b3/projects/{projectId}/jobs/{jobId}/workItems:lease', + request_field='leaseWorkItemRequest', + request_type_name='DataflowProjectsJobsWorkItemsLeaseRequest', + response_type_name='LeaseWorkItemResponse', supports_download=False, ) @@ -226,22 +407,22 @@ def ReportStatus(self, request, global_params=None): config, request, global_params=global_params) ReportStatus.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'POST', - method_id=u'dataflow.projects.jobs.workItems.reportStatus', - ordered_params=[u'projectId', u'jobId'], - path_params=[u'jobId', u'projectId'], + http_method='POST', + method_id='dataflow.projects.jobs.workItems.reportStatus', + ordered_params=['projectId', 'jobId'], + path_params=['jobId', 'projectId'], query_params=[], - relative_path=u'v1b3/projects/{projectId}/jobs/{jobId}/workItems:reportStatus', - request_field=u'reportWorkItemStatusRequest', - request_type_name=u'DataflowProjectsJobsWorkItemsReportStatusRequest', - response_type_name=u'ReportWorkItemStatusResponse', + relative_path='v1b3/projects/{projectId}/jobs/{jobId}/workItems:reportStatus', + request_field='reportWorkItemStatusRequest', + request_type_name='DataflowProjectsJobsWorkItemsReportStatusRequest', + response_type_name='ReportWorkItemStatusResponse', supports_download=False, ) class ProjectsJobsService(base_api.BaseApiService): """Service class for the projects_jobs resource.""" - _NAME = u'projects_jobs' + _NAME = 'projects_jobs' def __init__(self, client): super(DataflowV1b3.ProjectsJobsService, self).__init__(client) @@ -262,26 +443,20 @@ def Aggregated(self, request, global_params=None): config, request, global_params=global_params) Aggregated.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'GET', - method_id=u'dataflow.projects.jobs.aggregated', - ordered_params=[u'projectId'], - path_params=[u'projectId'], - query_params=[u'filter', u'location', u'pageSize', u'pageToken', u'view'], - relative_path=u'v1b3/projects/{projectId}/jobs:aggregated', + http_method='GET', + method_id='dataflow.projects.jobs.aggregated', + ordered_params=['projectId'], + path_params=['projectId'], + query_params=['filter', 'location', 'pageSize', 'pageToken', 'view'], + relative_path='v1b3/projects/{projectId}/jobs:aggregated', request_field='', - request_type_name=u'DataflowProjectsJobsAggregatedRequest', - response_type_name=u'ListJobsResponse', + request_type_name='DataflowProjectsJobsAggregatedRequest', + response_type_name='ListJobsResponse', supports_download=False, ) def Create(self, request, global_params=None): - r"""Creates a Cloud Dataflow job. - -To create a job, we recommend using `projects.locations.jobs.create` with a -[regional endpoint] -(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using -`projects.jobs.create` is not recommended, as your job will always start -in `us-central1`. + r"""Creates a Cloud Dataflow job. To create a job, we recommend using `projects.locations.jobs.create` with a [regional endpoint] (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). 
Using `projects.jobs.create` is not recommended, as your job will always start in `us-central1`. Args: request: (DataflowProjectsJobsCreateRequest) input message @@ -294,26 +469,20 @@ def Create(self, request, global_params=None): config, request, global_params=global_params) Create.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'POST', - method_id=u'dataflow.projects.jobs.create', - ordered_params=[u'projectId'], - path_params=[u'projectId'], - query_params=[u'location', u'replaceJobId', u'view'], - relative_path=u'v1b3/projects/{projectId}/jobs', - request_field=u'job', - request_type_name=u'DataflowProjectsJobsCreateRequest', - response_type_name=u'Job', + http_method='POST', + method_id='dataflow.projects.jobs.create', + ordered_params=['projectId'], + path_params=['projectId'], + query_params=['location', 'replaceJobId', 'view'], + relative_path='v1b3/projects/{projectId}/jobs', + request_field='job', + request_type_name='DataflowProjectsJobsCreateRequest', + response_type_name='Job', supports_download=False, ) def Get(self, request, global_params=None): - r"""Gets the state of the specified Cloud Dataflow job. - -To get the state of a job, we recommend using `projects.locations.jobs.get` -with a [regional endpoint] -(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using -`projects.jobs.get` is not recommended, as you can only get the state of -jobs that are running in `us-central1`. + r"""Gets the state of the specified Cloud Dataflow job. To get the state of a job, we recommend using `projects.locations.jobs.get` with a [regional endpoint] (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using `projects.jobs.get` is not recommended, as you can only get the state of jobs that are running in `us-central1`. Args: request: (DataflowProjectsJobsGetRequest) input message @@ -326,26 +495,20 @@ def Get(self, request, global_params=None): config, request, global_params=global_params) Get.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'GET', - method_id=u'dataflow.projects.jobs.get', - ordered_params=[u'projectId', u'jobId'], - path_params=[u'jobId', u'projectId'], - query_params=[u'location', u'view'], - relative_path=u'v1b3/projects/{projectId}/jobs/{jobId}', + http_method='GET', + method_id='dataflow.projects.jobs.get', + ordered_params=['projectId', 'jobId'], + path_params=['jobId', 'projectId'], + query_params=['location', 'view'], + relative_path='v1b3/projects/{projectId}/jobs/{jobId}', request_field='', - request_type_name=u'DataflowProjectsJobsGetRequest', - response_type_name=u'Job', + request_type_name='DataflowProjectsJobsGetRequest', + response_type_name='Job', supports_download=False, ) def GetMetrics(self, request, global_params=None): - r"""Request the job status. - -To request the status of a job, we recommend using -`projects.locations.jobs.getMetrics` with a [regional endpoint] -(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using -`projects.jobs.getMetrics` is not recommended, as you can only request the -status of jobs that are running in `us-central1`. + r"""Request the job status. To request the status of a job, we recommend using `projects.locations.jobs.getMetrics` with a [regional endpoint] (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using `projects.jobs.getMetrics` is not recommended, as you can only request the status of jobs that are running in `us-central1`. 
Args: request: (DataflowProjectsJobsGetMetricsRequest) input message @@ -358,27 +521,20 @@ def GetMetrics(self, request, global_params=None): config, request, global_params=global_params) GetMetrics.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'GET', - method_id=u'dataflow.projects.jobs.getMetrics', - ordered_params=[u'projectId', u'jobId'], - path_params=[u'jobId', u'projectId'], - query_params=[u'location', u'startTime'], - relative_path=u'v1b3/projects/{projectId}/jobs/{jobId}/metrics', + http_method='GET', + method_id='dataflow.projects.jobs.getMetrics', + ordered_params=['projectId', 'jobId'], + path_params=['jobId', 'projectId'], + query_params=['location', 'startTime'], + relative_path='v1b3/projects/{projectId}/jobs/{jobId}/metrics', request_field='', - request_type_name=u'DataflowProjectsJobsGetMetricsRequest', - response_type_name=u'JobMetrics', + request_type_name='DataflowProjectsJobsGetMetricsRequest', + response_type_name='JobMetrics', supports_download=False, ) def List(self, request, global_params=None): - r"""List the jobs of a project. - -To list the jobs of a project in a region, we recommend using -`projects.locations.jobs.get` with a [regional endpoint] -(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). To -list the all jobs across all regions, use `projects.jobs.aggregated`. Using -`projects.jobs.list` is not recommended, as you can only get the list of -jobs that are running in `us-central1`. + r"""List the jobs of a project. To list the jobs of a project in a region, we recommend using `projects.locations.jobs.list` with a [regional endpoint] (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). To list the all jobs across all regions, use `projects.jobs.aggregated`. Using `projects.jobs.list` is not recommended, as you can only get the list of jobs that are running in `us-central1`. Args: request: (DataflowProjectsJobsListRequest) input message @@ -391,26 +547,46 @@ def List(self, request, global_params=None): config, request, global_params=global_params) List.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'GET', - method_id=u'dataflow.projects.jobs.list', - ordered_params=[u'projectId'], - path_params=[u'projectId'], - query_params=[u'filter', u'location', u'pageSize', u'pageToken', u'view'], - relative_path=u'v1b3/projects/{projectId}/jobs', + http_method='GET', + method_id='dataflow.projects.jobs.list', + ordered_params=['projectId'], + path_params=['projectId'], + query_params=['filter', 'location', 'pageSize', 'pageToken', 'view'], + relative_path='v1b3/projects/{projectId}/jobs', request_field='', - request_type_name=u'DataflowProjectsJobsListRequest', - response_type_name=u'ListJobsResponse', + request_type_name='DataflowProjectsJobsListRequest', + response_type_name='ListJobsResponse', supports_download=False, ) - def Update(self, request, global_params=None): - r"""Updates the state of an existing Cloud Dataflow job. + def Snapshot(self, request, global_params=None): + r"""Snapshot the state of a streaming job. -To update the state of an existing job, we recommend using -`projects.locations.jobs.update` with a [regional endpoint] -(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using -`projects.jobs.update` is not recommended, as you can only update the state -of jobs that are running in `us-central1`. 
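The hunk above replaces the old `Update`-only block with a new generated `Snapshot` method on `ProjectsJobsService`. As a rough, illustrative sketch of how that method might be exercised, assuming the usual apitools wiring in which each service is exposed on the client under its `_NAME`, and using placeholder project and job IDs:

# Illustrative sketch only; not part of the generated module.
from apache_beam.runners.dataflow.internal.clients.dataflow import dataflow_v1b3_client
from apache_beam.runners.dataflow.internal.clients.dataflow import dataflow_v1b3_messages as messages

client = dataflow_v1b3_client.DataflowV1b3()  # credential/endpoint setup omitted here
snapshot = client.projects_jobs.Snapshot(
    messages.DataflowProjectsJobsSnapshotRequest(
        projectId='my-project',                    # placeholder
        jobId='2021-01-01_00_00_00-1234567890'))   # placeholder
# `snapshot` is a Snapshot message, per the response_type_name in the hunk above.
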
+ Args: + request: (DataflowProjectsJobsSnapshotRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (Snapshot) The response message. + """ + config = self.GetMethodConfig('Snapshot') + return self._RunMethod( + config, request, global_params=global_params) + + Snapshot.method_config = lambda: base_api.ApiMethodInfo( + http_method='POST', + method_id='dataflow.projects.jobs.snapshot', + ordered_params=['projectId', 'jobId'], + path_params=['jobId', 'projectId'], + query_params=[], + relative_path='v1b3/projects/{projectId}/jobs/{jobId}:snapshot', + request_field='snapshotJobRequest', + request_type_name='DataflowProjectsJobsSnapshotRequest', + response_type_name='Snapshot', + supports_download=False, + ) + + def Update(self, request, global_params=None): + r"""Updates the state of an existing Cloud Dataflow job. To update the state of an existing job, we recommend using `projects.locations.jobs.update` with a [regional endpoint] (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using `projects.jobs.update` is not recommended, as you can only update the state of jobs that are running in `us-central1`. Args: request: (DataflowProjectsJobsUpdateRequest) input message @@ -423,22 +599,22 @@ def Update(self, request, global_params=None): config, request, global_params=global_params) Update.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'PUT', - method_id=u'dataflow.projects.jobs.update', - ordered_params=[u'projectId', u'jobId'], - path_params=[u'jobId', u'projectId'], - query_params=[u'location'], - relative_path=u'v1b3/projects/{projectId}/jobs/{jobId}', - request_field=u'job', - request_type_name=u'DataflowProjectsJobsUpdateRequest', - response_type_name=u'Job', + http_method='PUT', + method_id='dataflow.projects.jobs.update', + ordered_params=['projectId', 'jobId'], + path_params=['jobId', 'projectId'], + query_params=['location'], + relative_path='v1b3/projects/{projectId}/jobs/{jobId}', + request_field='job', + request_type_name='DataflowProjectsJobsUpdateRequest', + response_type_name='Job', supports_download=False, ) class ProjectsLocationsFlexTemplatesService(base_api.BaseApiService): """Service class for the projects_locations_flexTemplates resource.""" - _NAME = u'projects_locations_flexTemplates' + _NAME = 'projects_locations_flexTemplates' def __init__(self, client): super(DataflowV1b3.ProjectsLocationsFlexTemplatesService, self).__init__(client) @@ -459,22 +635,22 @@ def Launch(self, request, global_params=None): config, request, global_params=global_params) Launch.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'POST', - method_id=u'dataflow.projects.locations.flexTemplates.launch', - ordered_params=[u'projectId', u'location'], - path_params=[u'location', u'projectId'], + http_method='POST', + method_id='dataflow.projects.locations.flexTemplates.launch', + ordered_params=['projectId', 'location'], + path_params=['location', 'projectId'], query_params=[], - relative_path=u'v1b3/projects/{projectId}/locations/{location}/flexTemplates:launch', - request_field=u'launchFlexTemplateRequest', - request_type_name=u'DataflowProjectsLocationsFlexTemplatesLaunchRequest', - response_type_name=u'LaunchFlexTemplateResponse', + relative_path='v1b3/projects/{projectId}/locations/{location}/flexTemplates:launch', + request_field='launchFlexTemplateRequest', + request_type_name='DataflowProjectsLocationsFlexTemplatesLaunchRequest', + response_type_name='LaunchFlexTemplateResponse', 
supports_download=False, ) class ProjectsLocationsJobsDebugService(base_api.BaseApiService): """Service class for the projects_locations_jobs_debug resource.""" - _NAME = u'projects_locations_jobs_debug' + _NAME = 'projects_locations_jobs_debug' def __init__(self, client): super(DataflowV1b3.ProjectsLocationsJobsDebugService, self).__init__(client) @@ -495,15 +671,15 @@ def GetConfig(self, request, global_params=None): config, request, global_params=global_params) GetConfig.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'POST', - method_id=u'dataflow.projects.locations.jobs.debug.getConfig', - ordered_params=[u'projectId', u'location', u'jobId'], - path_params=[u'jobId', u'location', u'projectId'], + http_method='POST', + method_id='dataflow.projects.locations.jobs.debug.getConfig', + ordered_params=['projectId', 'location', 'jobId'], + path_params=['jobId', 'location', 'projectId'], query_params=[], - relative_path=u'v1b3/projects/{projectId}/locations/{location}/jobs/{jobId}/debug/getConfig', - request_field=u'getDebugConfigRequest', - request_type_name=u'DataflowProjectsLocationsJobsDebugGetConfigRequest', - response_type_name=u'GetDebugConfigResponse', + relative_path='v1b3/projects/{projectId}/locations/{location}/jobs/{jobId}/debug/getConfig', + request_field='getDebugConfigRequest', + request_type_name='DataflowProjectsLocationsJobsDebugGetConfigRequest', + response_type_name='GetDebugConfigResponse', supports_download=False, ) @@ -521,22 +697,22 @@ def SendCapture(self, request, global_params=None): config, request, global_params=global_params) SendCapture.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'POST', - method_id=u'dataflow.projects.locations.jobs.debug.sendCapture', - ordered_params=[u'projectId', u'location', u'jobId'], - path_params=[u'jobId', u'location', u'projectId'], + http_method='POST', + method_id='dataflow.projects.locations.jobs.debug.sendCapture', + ordered_params=['projectId', 'location', 'jobId'], + path_params=['jobId', 'location', 'projectId'], query_params=[], - relative_path=u'v1b3/projects/{projectId}/locations/{location}/jobs/{jobId}/debug/sendCapture', - request_field=u'sendDebugCaptureRequest', - request_type_name=u'DataflowProjectsLocationsJobsDebugSendCaptureRequest', - response_type_name=u'SendDebugCaptureResponse', + relative_path='v1b3/projects/{projectId}/locations/{location}/jobs/{jobId}/debug/sendCapture', + request_field='sendDebugCaptureRequest', + request_type_name='DataflowProjectsLocationsJobsDebugSendCaptureRequest', + response_type_name='SendDebugCaptureResponse', supports_download=False, ) class ProjectsLocationsJobsMessagesService(base_api.BaseApiService): """Service class for the projects_locations_jobs_messages resource.""" - _NAME = u'projects_locations_jobs_messages' + _NAME = 'projects_locations_jobs_messages' def __init__(self, client): super(DataflowV1b3.ProjectsLocationsJobsMessagesService, self).__init__(client) @@ -544,13 +720,7 @@ def __init__(self, client): } def List(self, request, global_params=None): - r"""Request the job status. - -To request the status of a job, we recommend using -`projects.locations.jobs.messages.list` with a [regional endpoint] -(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using -`projects.jobs.messages.list` is not recommended, as you can only request -the status of jobs that are running in `us-central1`. + r"""Request the job status. 
To request the status of a job, we recommend using `projects.locations.jobs.messages.list` with a [regional endpoint] (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using `projects.jobs.messages.list` is not recommended, as you can only request the status of jobs that are running in `us-central1`. Args: request: (DataflowProjectsLocationsJobsMessagesListRequest) input message @@ -563,22 +733,94 @@ def List(self, request, global_params=None): config, request, global_params=global_params) List.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'GET', - method_id=u'dataflow.projects.locations.jobs.messages.list', - ordered_params=[u'projectId', u'location', u'jobId'], - path_params=[u'jobId', u'location', u'projectId'], - query_params=[u'endTime', u'minimumImportance', u'pageSize', u'pageToken', u'startTime'], - relative_path=u'v1b3/projects/{projectId}/locations/{location}/jobs/{jobId}/messages', + http_method='GET', + method_id='dataflow.projects.locations.jobs.messages.list', + ordered_params=['projectId', 'location', 'jobId'], + path_params=['jobId', 'location', 'projectId'], + query_params=['endTime', 'minimumImportance', 'pageSize', 'pageToken', 'startTime'], + relative_path='v1b3/projects/{projectId}/locations/{location}/jobs/{jobId}/messages', + request_field='', + request_type_name='DataflowProjectsLocationsJobsMessagesListRequest', + response_type_name='ListJobMessagesResponse', + supports_download=False, + ) + + class ProjectsLocationsJobsSnapshotsService(base_api.BaseApiService): + """Service class for the projects_locations_jobs_snapshots resource.""" + + _NAME = 'projects_locations_jobs_snapshots' + + def __init__(self, client): + super(DataflowV1b3.ProjectsLocationsJobsSnapshotsService, self).__init__(client) + self._upload_configs = { + } + + def List(self, request, global_params=None): + r"""Lists snapshots. + + Args: + request: (DataflowProjectsLocationsJobsSnapshotsListRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (ListSnapshotsResponse) The response message. + """ + config = self.GetMethodConfig('List') + return self._RunMethod( + config, request, global_params=global_params) + + List.method_config = lambda: base_api.ApiMethodInfo( + http_method='GET', + method_id='dataflow.projects.locations.jobs.snapshots.list', + ordered_params=['projectId', 'location', 'jobId'], + path_params=['jobId', 'location', 'projectId'], + query_params=[], + relative_path='v1b3/projects/{projectId}/locations/{location}/jobs/{jobId}/snapshots', + request_field='', + request_type_name='DataflowProjectsLocationsJobsSnapshotsListRequest', + response_type_name='ListSnapshotsResponse', + supports_download=False, + ) + + class ProjectsLocationsJobsStagesService(base_api.BaseApiService): + """Service class for the projects_locations_jobs_stages resource.""" + + _NAME = 'projects_locations_jobs_stages' + + def __init__(self, client): + super(DataflowV1b3.ProjectsLocationsJobsStagesService, self).__init__(client) + self._upload_configs = { + } + + def GetExecutionDetails(self, request, global_params=None): + r"""Request detailed information about the execution status of a stage of the job. EXPERIMENTAL. This API is subject to change or removal without notice. + + Args: + request: (DataflowProjectsLocationsJobsStagesGetExecutionDetailsRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (StageExecutionDetails) The response message. 
+ """ + config = self.GetMethodConfig('GetExecutionDetails') + return self._RunMethod( + config, request, global_params=global_params) + + GetExecutionDetails.method_config = lambda: base_api.ApiMethodInfo( + http_method='GET', + method_id='dataflow.projects.locations.jobs.stages.getExecutionDetails', + ordered_params=['projectId', 'location', 'jobId', 'stageId'], + path_params=['jobId', 'location', 'projectId', 'stageId'], + query_params=['endTime', 'pageSize', 'pageToken', 'startTime'], + relative_path='v1b3/projects/{projectId}/locations/{location}/jobs/{jobId}/stages/{stageId}/executionDetails', request_field='', - request_type_name=u'DataflowProjectsLocationsJobsMessagesListRequest', - response_type_name=u'ListJobMessagesResponse', + request_type_name='DataflowProjectsLocationsJobsStagesGetExecutionDetailsRequest', + response_type_name='StageExecutionDetails', supports_download=False, ) class ProjectsLocationsJobsWorkItemsService(base_api.BaseApiService): """Service class for the projects_locations_jobs_workItems resource.""" - _NAME = u'projects_locations_jobs_workItems' + _NAME = 'projects_locations_jobs_workItems' def __init__(self, client): super(DataflowV1b3.ProjectsLocationsJobsWorkItemsService, self).__init__(client) @@ -599,15 +841,15 @@ def Lease(self, request, global_params=None): config, request, global_params=global_params) Lease.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'POST', - method_id=u'dataflow.projects.locations.jobs.workItems.lease', - ordered_params=[u'projectId', u'location', u'jobId'], - path_params=[u'jobId', u'location', u'projectId'], + http_method='POST', + method_id='dataflow.projects.locations.jobs.workItems.lease', + ordered_params=['projectId', 'location', 'jobId'], + path_params=['jobId', 'location', 'projectId'], query_params=[], - relative_path=u'v1b3/projects/{projectId}/locations/{location}/jobs/{jobId}/workItems:lease', - request_field=u'leaseWorkItemRequest', - request_type_name=u'DataflowProjectsLocationsJobsWorkItemsLeaseRequest', - response_type_name=u'LeaseWorkItemResponse', + relative_path='v1b3/projects/{projectId}/locations/{location}/jobs/{jobId}/workItems:lease', + request_field='leaseWorkItemRequest', + request_type_name='DataflowProjectsLocationsJobsWorkItemsLeaseRequest', + response_type_name='LeaseWorkItemResponse', supports_download=False, ) @@ -625,22 +867,22 @@ def ReportStatus(self, request, global_params=None): config, request, global_params=global_params) ReportStatus.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'POST', - method_id=u'dataflow.projects.locations.jobs.workItems.reportStatus', - ordered_params=[u'projectId', u'location', u'jobId'], - path_params=[u'jobId', u'location', u'projectId'], + http_method='POST', + method_id='dataflow.projects.locations.jobs.workItems.reportStatus', + ordered_params=['projectId', 'location', 'jobId'], + path_params=['jobId', 'location', 'projectId'], query_params=[], - relative_path=u'v1b3/projects/{projectId}/locations/{location}/jobs/{jobId}/workItems:reportStatus', - request_field=u'reportWorkItemStatusRequest', - request_type_name=u'DataflowProjectsLocationsJobsWorkItemsReportStatusRequest', - response_type_name=u'ReportWorkItemStatusResponse', + relative_path='v1b3/projects/{projectId}/locations/{location}/jobs/{jobId}/workItems:reportStatus', + request_field='reportWorkItemStatusRequest', + request_type_name='DataflowProjectsLocationsJobsWorkItemsReportStatusRequest', + response_type_name='ReportWorkItemStatusResponse', supports_download=False, 
) class ProjectsLocationsJobsService(base_api.BaseApiService): """Service class for the projects_locations_jobs resource.""" - _NAME = u'projects_locations_jobs' + _NAME = 'projects_locations_jobs' def __init__(self, client): super(DataflowV1b3.ProjectsLocationsJobsService, self).__init__(client) @@ -648,13 +890,7 @@ def __init__(self, client): } def Create(self, request, global_params=None): - r"""Creates a Cloud Dataflow job. - -To create a job, we recommend using `projects.locations.jobs.create` with a -[regional endpoint] -(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using -`projects.jobs.create` is not recommended, as your job will always start -in `us-central1`. + r"""Creates a Cloud Dataflow job. To create a job, we recommend using `projects.locations.jobs.create` with a [regional endpoint] (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using `projects.jobs.create` is not recommended, as your job will always start in `us-central1`. Args: request: (DataflowProjectsLocationsJobsCreateRequest) input message @@ -667,26 +903,20 @@ def Create(self, request, global_params=None): config, request, global_params=global_params) Create.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'POST', - method_id=u'dataflow.projects.locations.jobs.create', - ordered_params=[u'projectId', u'location'], - path_params=[u'location', u'projectId'], - query_params=[u'replaceJobId', u'view'], - relative_path=u'v1b3/projects/{projectId}/locations/{location}/jobs', - request_field=u'job', - request_type_name=u'DataflowProjectsLocationsJobsCreateRequest', - response_type_name=u'Job', + http_method='POST', + method_id='dataflow.projects.locations.jobs.create', + ordered_params=['projectId', 'location'], + path_params=['location', 'projectId'], + query_params=['replaceJobId', 'view'], + relative_path='v1b3/projects/{projectId}/locations/{location}/jobs', + request_field='job', + request_type_name='DataflowProjectsLocationsJobsCreateRequest', + response_type_name='Job', supports_download=False, ) def Get(self, request, global_params=None): - r"""Gets the state of the specified Cloud Dataflow job. - -To get the state of a job, we recommend using `projects.locations.jobs.get` -with a [regional endpoint] -(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using -`projects.jobs.get` is not recommended, as you can only get the state of -jobs that are running in `us-central1`. + r"""Gets the state of the specified Cloud Dataflow job. To get the state of a job, we recommend using `projects.locations.jobs.get` with a [regional endpoint] (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using `projects.jobs.get` is not recommended, as you can only get the state of jobs that are running in `us-central1`. 
Args: request: (DataflowProjectsLocationsJobsGetRequest) input message @@ -699,26 +929,46 @@ def Get(self, request, global_params=None): config, request, global_params=global_params) Get.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'GET', - method_id=u'dataflow.projects.locations.jobs.get', - ordered_params=[u'projectId', u'location', u'jobId'], - path_params=[u'jobId', u'location', u'projectId'], - query_params=[u'view'], - relative_path=u'v1b3/projects/{projectId}/locations/{location}/jobs/{jobId}', + http_method='GET', + method_id='dataflow.projects.locations.jobs.get', + ordered_params=['projectId', 'location', 'jobId'], + path_params=['jobId', 'location', 'projectId'], + query_params=['view'], + relative_path='v1b3/projects/{projectId}/locations/{location}/jobs/{jobId}', request_field='', - request_type_name=u'DataflowProjectsLocationsJobsGetRequest', - response_type_name=u'Job', + request_type_name='DataflowProjectsLocationsJobsGetRequest', + response_type_name='Job', supports_download=False, ) - def GetMetrics(self, request, global_params=None): - r"""Request the job status. + def GetExecutionDetails(self, request, global_params=None): + r"""Request detailed information about the execution status of the job. EXPERIMENTAL. This API is subject to change or removal without notice. -To request the status of a job, we recommend using -`projects.locations.jobs.getMetrics` with a [regional endpoint] -(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using -`projects.jobs.getMetrics` is not recommended, as you can only request the -status of jobs that are running in `us-central1`. + Args: + request: (DataflowProjectsLocationsJobsGetExecutionDetailsRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (JobExecutionDetails) The response message. + """ + config = self.GetMethodConfig('GetExecutionDetails') + return self._RunMethod( + config, request, global_params=global_params) + + GetExecutionDetails.method_config = lambda: base_api.ApiMethodInfo( + http_method='GET', + method_id='dataflow.projects.locations.jobs.getExecutionDetails', + ordered_params=['projectId', 'location', 'jobId'], + path_params=['jobId', 'location', 'projectId'], + query_params=['pageSize', 'pageToken'], + relative_path='v1b3/projects/{projectId}/locations/{location}/jobs/{jobId}/executionDetails', + request_field='', + request_type_name='DataflowProjectsLocationsJobsGetExecutionDetailsRequest', + response_type_name='JobExecutionDetails', + supports_download=False, + ) + + def GetMetrics(self, request, global_params=None): + r"""Request the job status. To request the status of a job, we recommend using `projects.locations.jobs.getMetrics` with a [regional endpoint] (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using `projects.jobs.getMetrics` is not recommended, as you can only request the status of jobs that are running in `us-central1`. 
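The same hunk also adds the experimental `GetExecutionDetails` method to the regional `ProjectsLocationsJobsService`. A minimal sketch of a call, again assuming the standard apitools client wiring and placeholder identifiers:

# Illustrative sketch only; not part of the generated module.
from apache_beam.runners.dataflow.internal.clients.dataflow import dataflow_v1b3_client
from apache_beam.runners.dataflow.internal.clients.dataflow import dataflow_v1b3_messages as messages

client = dataflow_v1b3_client.DataflowV1b3()  # client construction details omitted
details = client.projects_locations_jobs.GetExecutionDetails(
    messages.DataflowProjectsLocationsJobsGetExecutionDetailsRequest(
        projectId='my-project',   # placeholder
        location='us-central1',   # placeholder regional endpoint
        jobId='some-job-id',      # placeholder
        pageSize=100))            # optional paging field, per query_params above
# `details` is a JobExecutionDetails message, per the response_type_name above.
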
Args: request: (DataflowProjectsLocationsJobsGetMetricsRequest) input message @@ -731,27 +981,20 @@ def GetMetrics(self, request, global_params=None): config, request, global_params=global_params) GetMetrics.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'GET', - method_id=u'dataflow.projects.locations.jobs.getMetrics', - ordered_params=[u'projectId', u'location', u'jobId'], - path_params=[u'jobId', u'location', u'projectId'], - query_params=[u'startTime'], - relative_path=u'v1b3/projects/{projectId}/locations/{location}/jobs/{jobId}/metrics', + http_method='GET', + method_id='dataflow.projects.locations.jobs.getMetrics', + ordered_params=['projectId', 'location', 'jobId'], + path_params=['jobId', 'location', 'projectId'], + query_params=['startTime'], + relative_path='v1b3/projects/{projectId}/locations/{location}/jobs/{jobId}/metrics', request_field='', - request_type_name=u'DataflowProjectsLocationsJobsGetMetricsRequest', - response_type_name=u'JobMetrics', + request_type_name='DataflowProjectsLocationsJobsGetMetricsRequest', + response_type_name='JobMetrics', supports_download=False, ) def List(self, request, global_params=None): - r"""List the jobs of a project. - -To list the jobs of a project in a region, we recommend using -`projects.locations.jobs.get` with a [regional endpoint] -(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). To -list the all jobs across all regions, use `projects.jobs.aggregated`. Using -`projects.jobs.list` is not recommended, as you can only get the list of -jobs that are running in `us-central1`. + r"""List the jobs of a project. To list the jobs of a project in a region, we recommend using `projects.locations.jobs.list` with a [regional endpoint] (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). To list the all jobs across all regions, use `projects.jobs.aggregated`. Using `projects.jobs.list` is not recommended, as you can only get the list of jobs that are running in `us-central1`. Args: request: (DataflowProjectsLocationsJobsListRequest) input message @@ -764,26 +1007,46 @@ def List(self, request, global_params=None): config, request, global_params=global_params) List.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'GET', - method_id=u'dataflow.projects.locations.jobs.list', - ordered_params=[u'projectId', u'location'], - path_params=[u'location', u'projectId'], - query_params=[u'filter', u'pageSize', u'pageToken', u'view'], - relative_path=u'v1b3/projects/{projectId}/locations/{location}/jobs', + http_method='GET', + method_id='dataflow.projects.locations.jobs.list', + ordered_params=['projectId', 'location'], + path_params=['location', 'projectId'], + query_params=['filter', 'pageSize', 'pageToken', 'view'], + relative_path='v1b3/projects/{projectId}/locations/{location}/jobs', request_field='', - request_type_name=u'DataflowProjectsLocationsJobsListRequest', - response_type_name=u'ListJobsResponse', + request_type_name='DataflowProjectsLocationsJobsListRequest', + response_type_name='ListJobsResponse', supports_download=False, ) - def Update(self, request, global_params=None): - r"""Updates the state of an existing Cloud Dataflow job. + def Snapshot(self, request, global_params=None): + r"""Snapshot the state of a streaming job. -To update the state of an existing job, we recommend using -`projects.locations.jobs.update` with a [regional endpoint] -(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). 
Using -`projects.jobs.update` is not recommended, as you can only update the state -of jobs that are running in `us-central1`. + Args: + request: (DataflowProjectsLocationsJobsSnapshotRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (Snapshot) The response message. + """ + config = self.GetMethodConfig('Snapshot') + return self._RunMethod( + config, request, global_params=global_params) + + Snapshot.method_config = lambda: base_api.ApiMethodInfo( + http_method='POST', + method_id='dataflow.projects.locations.jobs.snapshot', + ordered_params=['projectId', 'location', 'jobId'], + path_params=['jobId', 'location', 'projectId'], + query_params=[], + relative_path='v1b3/projects/{projectId}/locations/{location}/jobs/{jobId}:snapshot', + request_field='snapshotJobRequest', + request_type_name='DataflowProjectsLocationsJobsSnapshotRequest', + response_type_name='Snapshot', + supports_download=False, + ) + + def Update(self, request, global_params=None): + r"""Updates the state of an existing Cloud Dataflow job. To update the state of an existing job, we recommend using `projects.locations.jobs.update` with a [regional endpoint] (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using `projects.jobs.update` is not recommended, as you can only update the state of jobs that are running in `us-central1`. Args: request: (DataflowProjectsLocationsJobsUpdateRequest) input message @@ -796,22 +1059,110 @@ def Update(self, request, global_params=None): config, request, global_params=global_params) Update.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'PUT', - method_id=u'dataflow.projects.locations.jobs.update', - ordered_params=[u'projectId', u'location', u'jobId'], - path_params=[u'jobId', u'location', u'projectId'], + http_method='PUT', + method_id='dataflow.projects.locations.jobs.update', + ordered_params=['projectId', 'location', 'jobId'], + path_params=['jobId', 'location', 'projectId'], query_params=[], - relative_path=u'v1b3/projects/{projectId}/locations/{location}/jobs/{jobId}', - request_field=u'job', - request_type_name=u'DataflowProjectsLocationsJobsUpdateRequest', - response_type_name=u'Job', + relative_path='v1b3/projects/{projectId}/locations/{location}/jobs/{jobId}', + request_field='job', + request_type_name='DataflowProjectsLocationsJobsUpdateRequest', + response_type_name='Job', + supports_download=False, + ) + + class ProjectsLocationsSnapshotsService(base_api.BaseApiService): + """Service class for the projects_locations_snapshots resource.""" + + _NAME = 'projects_locations_snapshots' + + def __init__(self, client): + super(DataflowV1b3.ProjectsLocationsSnapshotsService, self).__init__(client) + self._upload_configs = { + } + + def Delete(self, request, global_params=None): + r"""Deletes a snapshot. + + Args: + request: (DataflowProjectsLocationsSnapshotsDeleteRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (DeleteSnapshotResponse) The response message. 
+ """ + config = self.GetMethodConfig('Delete') + return self._RunMethod( + config, request, global_params=global_params) + + Delete.method_config = lambda: base_api.ApiMethodInfo( + http_method='DELETE', + method_id='dataflow.projects.locations.snapshots.delete', + ordered_params=['projectId', 'location', 'snapshotId'], + path_params=['location', 'projectId', 'snapshotId'], + query_params=[], + relative_path='v1b3/projects/{projectId}/locations/{location}/snapshots/{snapshotId}', + request_field='', + request_type_name='DataflowProjectsLocationsSnapshotsDeleteRequest', + response_type_name='DeleteSnapshotResponse', + supports_download=False, + ) + + def Get(self, request, global_params=None): + r"""Gets information about a snapshot. + + Args: + request: (DataflowProjectsLocationsSnapshotsGetRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (Snapshot) The response message. + """ + config = self.GetMethodConfig('Get') + return self._RunMethod( + config, request, global_params=global_params) + + Get.method_config = lambda: base_api.ApiMethodInfo( + http_method='GET', + method_id='dataflow.projects.locations.snapshots.get', + ordered_params=['projectId', 'location', 'snapshotId'], + path_params=['location', 'projectId', 'snapshotId'], + query_params=[], + relative_path='v1b3/projects/{projectId}/locations/{location}/snapshots/{snapshotId}', + request_field='', + request_type_name='DataflowProjectsLocationsSnapshotsGetRequest', + response_type_name='Snapshot', + supports_download=False, + ) + + def List(self, request, global_params=None): + r"""Lists snapshots. + + Args: + request: (DataflowProjectsLocationsSnapshotsListRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (ListSnapshotsResponse) The response message. + """ + config = self.GetMethodConfig('List') + return self._RunMethod( + config, request, global_params=global_params) + + List.method_config = lambda: base_api.ApiMethodInfo( + http_method='GET', + method_id='dataflow.projects.locations.snapshots.list', + ordered_params=['projectId', 'location'], + path_params=['location', 'projectId'], + query_params=['jobId'], + relative_path='v1b3/projects/{projectId}/locations/{location}/snapshots', + request_field='', + request_type_name='DataflowProjectsLocationsSnapshotsListRequest', + response_type_name='ListSnapshotsResponse', supports_download=False, ) class ProjectsLocationsSqlService(base_api.BaseApiService): """Service class for the projects_locations_sql resource.""" - _NAME = u'projects_locations_sql' + _NAME = 'projects_locations_sql' def __init__(self, client): super(DataflowV1b3.ProjectsLocationsSqlService, self).__init__(client) @@ -819,10 +1170,7 @@ def __init__(self, client): } def Validate(self, request, global_params=None): - r"""Validates a GoogleSQL query for Cloud Dataflow syntax. Will always. -confirm the given query parses correctly, and if able to look up -schema information from DataCatalog, will validate that the query -analyzes properly as well. + r"""Validates a GoogleSQL query for Cloud Dataflow syntax. Will always confirm the given query parses correctly, and if able to look up schema information from DataCatalog, will validate that the query analyzes properly as well. 
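The block above also introduces a full `ProjectsLocationsSnapshotsService` (Delete, Get, List). A hedged sketch of listing the snapshots taken for one job, with the same caveats about client wiring and placeholder values:

# Illustrative sketch only; not part of the generated module.
from apache_beam.runners.dataflow.internal.clients.dataflow import dataflow_v1b3_client
from apache_beam.runners.dataflow.internal.clients.dataflow import dataflow_v1b3_messages as messages

client = dataflow_v1b3_client.DataflowV1b3()  # client construction details omitted
snapshots = client.projects_locations_snapshots.List(
    messages.DataflowProjectsLocationsSnapshotsListRequest(
        projectId='my-project',   # placeholder
        location='us-central1',   # placeholder
        jobId='some-job-id'))     # optional filter, per query_params above
# `snapshots` is a ListSnapshotsResponse message, per the response_type_name above.
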
Args: request: (DataflowProjectsLocationsSqlValidateRequest) input message @@ -835,22 +1183,22 @@ def Validate(self, request, global_params=None): config, request, global_params=global_params) Validate.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'GET', - method_id=u'dataflow.projects.locations.sql.validate', - ordered_params=[u'projectId', u'location'], - path_params=[u'location', u'projectId'], - query_params=[u'query'], - relative_path=u'v1b3/projects/{projectId}/locations/{location}/sql:validate', + http_method='GET', + method_id='dataflow.projects.locations.sql.validate', + ordered_params=['projectId', 'location'], + path_params=['location', 'projectId'], + query_params=['query'], + relative_path='v1b3/projects/{projectId}/locations/{location}/sql:validate', request_field='', - request_type_name=u'DataflowProjectsLocationsSqlValidateRequest', - response_type_name=u'ValidateResponse', + request_type_name='DataflowProjectsLocationsSqlValidateRequest', + response_type_name='ValidateResponse', supports_download=False, ) class ProjectsLocationsTemplatesService(base_api.BaseApiService): """Service class for the projects_locations_templates resource.""" - _NAME = u'projects_locations_templates' + _NAME = 'projects_locations_templates' def __init__(self, client): super(DataflowV1b3.ProjectsLocationsTemplatesService, self).__init__(client) @@ -871,15 +1219,15 @@ def Create(self, request, global_params=None): config, request, global_params=global_params) Create.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'POST', - method_id=u'dataflow.projects.locations.templates.create', - ordered_params=[u'projectId', u'location'], - path_params=[u'location', u'projectId'], + http_method='POST', + method_id='dataflow.projects.locations.templates.create', + ordered_params=['projectId', 'location'], + path_params=['location', 'projectId'], query_params=[], - relative_path=u'v1b3/projects/{projectId}/locations/{location}/templates', - request_field=u'createJobFromTemplateRequest', - request_type_name=u'DataflowProjectsLocationsTemplatesCreateRequest', - response_type_name=u'Job', + relative_path='v1b3/projects/{projectId}/locations/{location}/templates', + request_field='createJobFromTemplateRequest', + request_type_name='DataflowProjectsLocationsTemplatesCreateRequest', + response_type_name='Job', supports_download=False, ) @@ -897,15 +1245,15 @@ def Get(self, request, global_params=None): config, request, global_params=global_params) Get.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'GET', - method_id=u'dataflow.projects.locations.templates.get', - ordered_params=[u'projectId', u'location'], - path_params=[u'location', u'projectId'], - query_params=[u'gcsPath', u'view'], - relative_path=u'v1b3/projects/{projectId}/locations/{location}/templates:get', + http_method='GET', + method_id='dataflow.projects.locations.templates.get', + ordered_params=['projectId', 'location'], + path_params=['location', 'projectId'], + query_params=['gcsPath', 'view'], + relative_path='v1b3/projects/{projectId}/locations/{location}/templates:get', request_field='', - request_type_name=u'DataflowProjectsLocationsTemplatesGetRequest', - response_type_name=u'GetTemplateResponse', + request_type_name='DataflowProjectsLocationsTemplatesGetRequest', + response_type_name='GetTemplateResponse', supports_download=False, ) @@ -923,22 +1271,22 @@ def Launch(self, request, global_params=None): config, request, global_params=global_params) Launch.method_config = lambda: base_api.ApiMethodInfo( 
- http_method=u'POST', - method_id=u'dataflow.projects.locations.templates.launch', - ordered_params=[u'projectId', u'location'], - path_params=[u'location', u'projectId'], - query_params=[u'dynamicTemplate_gcsPath', u'dynamicTemplate_stagingLocation', u'gcsPath', u'validateOnly'], - relative_path=u'v1b3/projects/{projectId}/locations/{location}/templates:launch', - request_field=u'launchTemplateParameters', - request_type_name=u'DataflowProjectsLocationsTemplatesLaunchRequest', - response_type_name=u'LaunchTemplateResponse', + http_method='POST', + method_id='dataflow.projects.locations.templates.launch', + ordered_params=['projectId', 'location'], + path_params=['location', 'projectId'], + query_params=['dynamicTemplate_gcsPath', 'dynamicTemplate_stagingLocation', 'gcsPath', 'validateOnly'], + relative_path='v1b3/projects/{projectId}/locations/{location}/templates:launch', + request_field='launchTemplateParameters', + request_type_name='DataflowProjectsLocationsTemplatesLaunchRequest', + response_type_name='LaunchTemplateResponse', supports_download=False, ) class ProjectsLocationsService(base_api.BaseApiService): """Service class for the projects_locations resource.""" - _NAME = u'projects_locations' + _NAME = 'projects_locations' def __init__(self, client): super(DataflowV1b3.ProjectsLocationsService, self).__init__(client) @@ -959,22 +1307,121 @@ def WorkerMessages(self, request, global_params=None): config, request, global_params=global_params) WorkerMessages.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'POST', - method_id=u'dataflow.projects.locations.workerMessages', - ordered_params=[u'projectId', u'location'], - path_params=[u'location', u'projectId'], + http_method='POST', + method_id='dataflow.projects.locations.workerMessages', + ordered_params=['projectId', 'location'], + path_params=['location', 'projectId'], query_params=[], - relative_path=u'v1b3/projects/{projectId}/locations/{location}/WorkerMessages', - request_field=u'sendWorkerMessagesRequest', - request_type_name=u'DataflowProjectsLocationsWorkerMessagesRequest', - response_type_name=u'SendWorkerMessagesResponse', + relative_path='v1b3/projects/{projectId}/locations/{location}/WorkerMessages', + request_field='sendWorkerMessagesRequest', + request_type_name='DataflowProjectsLocationsWorkerMessagesRequest', + response_type_name='SendWorkerMessagesResponse', + supports_download=False, + ) + + class ProjectsSnapshotsService(base_api.BaseApiService): + """Service class for the projects_snapshots resource.""" + + _NAME = 'projects_snapshots' + + def __init__(self, client): + super(DataflowV1b3.ProjectsSnapshotsService, self).__init__(client) + self._upload_configs = { + } + + def Get(self, request, global_params=None): + r"""Gets information about a snapshot. + + Args: + request: (DataflowProjectsSnapshotsGetRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (Snapshot) The response message. 
+ """ + config = self.GetMethodConfig('Get') + return self._RunMethod( + config, request, global_params=global_params) + + Get.method_config = lambda: base_api.ApiMethodInfo( + http_method='GET', + method_id='dataflow.projects.snapshots.get', + ordered_params=['projectId', 'snapshotId'], + path_params=['projectId', 'snapshotId'], + query_params=['location'], + relative_path='v1b3/projects/{projectId}/snapshots/{snapshotId}', + request_field='', + request_type_name='DataflowProjectsSnapshotsGetRequest', + response_type_name='Snapshot', + supports_download=False, + ) + + def List(self, request, global_params=None): + r"""Lists snapshots. + + Args: + request: (DataflowProjectsSnapshotsListRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (ListSnapshotsResponse) The response message. + """ + config = self.GetMethodConfig('List') + return self._RunMethod( + config, request, global_params=global_params) + + List.method_config = lambda: base_api.ApiMethodInfo( + http_method='GET', + method_id='dataflow.projects.snapshots.list', + ordered_params=['projectId'], + path_params=['projectId'], + query_params=['jobId', 'location'], + relative_path='v1b3/projects/{projectId}/snapshots', + request_field='', + request_type_name='DataflowProjectsSnapshotsListRequest', + response_type_name='ListSnapshotsResponse', + supports_download=False, + ) + + class ProjectsTemplateVersionsService(base_api.BaseApiService): + """Service class for the projects_templateVersions resource.""" + + _NAME = 'projects_templateVersions' + + def __init__(self, client): + super(DataflowV1b3.ProjectsTemplateVersionsService, self).__init__(client) + self._upload_configs = { + } + + def List(self, request, global_params=None): + r"""List TemplateVersions using project_id and an optional display_name field. List all the TemplateVersions in the Template if display set. List all the TemplateVersions in the Project if display_name not set. + + Args: + request: (DataflowProjectsTemplateVersionsListRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (ListTemplateVersionsResponse) The response message. 
+ """ + config = self.GetMethodConfig('List') + return self._RunMethod( + config, request, global_params=global_params) + + List.method_config = lambda: base_api.ApiMethodInfo( + flat_path='v1b3/projects/{projectsId}/templateVersions', + http_method='GET', + method_id='dataflow.projects.templateVersions.list', + ordered_params=['parent'], + path_params=['parent'], + query_params=['pageSize', 'pageToken'], + relative_path='v1b3/{+parent}/templateVersions', + request_field='', + request_type_name='DataflowProjectsTemplateVersionsListRequest', + response_type_name='ListTemplateVersionsResponse', supports_download=False, ) class ProjectsTemplatesService(base_api.BaseApiService): """Service class for the projects_templates resource.""" - _NAME = u'projects_templates' + _NAME = 'projects_templates' def __init__(self, client): super(DataflowV1b3.ProjectsTemplatesService, self).__init__(client) @@ -995,15 +1442,15 @@ def Create(self, request, global_params=None): config, request, global_params=global_params) Create.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'POST', - method_id=u'dataflow.projects.templates.create', - ordered_params=[u'projectId'], - path_params=[u'projectId'], + http_method='POST', + method_id='dataflow.projects.templates.create', + ordered_params=['projectId'], + path_params=['projectId'], query_params=[], - relative_path=u'v1b3/projects/{projectId}/templates', - request_field=u'createJobFromTemplateRequest', - request_type_name=u'DataflowProjectsTemplatesCreateRequest', - response_type_name=u'Job', + relative_path='v1b3/projects/{projectId}/templates', + request_field='createJobFromTemplateRequest', + request_type_name='DataflowProjectsTemplatesCreateRequest', + response_type_name='Job', supports_download=False, ) @@ -1021,15 +1468,15 @@ def Get(self, request, global_params=None): config, request, global_params=global_params) Get.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'GET', - method_id=u'dataflow.projects.templates.get', - ordered_params=[u'projectId'], - path_params=[u'projectId'], - query_params=[u'gcsPath', u'location', u'view'], - relative_path=u'v1b3/projects/{projectId}/templates:get', + http_method='GET', + method_id='dataflow.projects.templates.get', + ordered_params=['projectId'], + path_params=['projectId'], + query_params=['gcsPath', 'location', 'view'], + relative_path='v1b3/projects/{projectId}/templates:get', request_field='', - request_type_name=u'DataflowProjectsTemplatesGetRequest', - response_type_name=u'GetTemplateResponse', + request_type_name='DataflowProjectsTemplatesGetRequest', + response_type_name='GetTemplateResponse', supports_download=False, ) @@ -1047,28 +1494,54 @@ def Launch(self, request, global_params=None): config, request, global_params=global_params) Launch.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'POST', - method_id=u'dataflow.projects.templates.launch', - ordered_params=[u'projectId'], - path_params=[u'projectId'], - query_params=[u'dynamicTemplate_gcsPath', u'dynamicTemplate_stagingLocation', u'gcsPath', u'location', u'validateOnly'], - relative_path=u'v1b3/projects/{projectId}/templates:launch', - request_field=u'launchTemplateParameters', - request_type_name=u'DataflowProjectsTemplatesLaunchRequest', - response_type_name=u'LaunchTemplateResponse', + http_method='POST', + method_id='dataflow.projects.templates.launch', + ordered_params=['projectId'], + path_params=['projectId'], + query_params=['dynamicTemplate_gcsPath', 'dynamicTemplate_stagingLocation', 'gcsPath', 
'location', 'validateOnly'], + relative_path='v1b3/projects/{projectId}/templates:launch', + request_field='launchTemplateParameters', + request_type_name='DataflowProjectsTemplatesLaunchRequest', + response_type_name='LaunchTemplateResponse', supports_download=False, ) class ProjectsService(base_api.BaseApiService): """Service class for the projects resource.""" - _NAME = u'projects' + _NAME = 'projects' def __init__(self, client): super(DataflowV1b3.ProjectsService, self).__init__(client) self._upload_configs = { } + def DeleteSnapshots(self, request, global_params=None): + r"""Deletes a snapshot. + + Args: + request: (DataflowProjectsDeleteSnapshotsRequest) input message + global_params: (StandardQueryParameters, default: None) global arguments + Returns: + (DeleteSnapshotResponse) The response message. + """ + config = self.GetMethodConfig('DeleteSnapshots') + return self._RunMethod( + config, request, global_params=global_params) + + DeleteSnapshots.method_config = lambda: base_api.ApiMethodInfo( + http_method='DELETE', + method_id='dataflow.projects.deleteSnapshots', + ordered_params=['projectId'], + path_params=['projectId'], + query_params=['location', 'snapshotId'], + relative_path='v1b3/projects/{projectId}/snapshots', + request_field='', + request_type_name='DataflowProjectsDeleteSnapshotsRequest', + response_type_name='DeleteSnapshotResponse', + supports_download=False, + ) + def WorkerMessages(self, request, global_params=None): r"""Send a worker_message to the service. @@ -1083,14 +1556,14 @@ def WorkerMessages(self, request, global_params=None): config, request, global_params=global_params) WorkerMessages.method_config = lambda: base_api.ApiMethodInfo( - http_method=u'POST', - method_id=u'dataflow.projects.workerMessages', - ordered_params=[u'projectId'], - path_params=[u'projectId'], + http_method='POST', + method_id='dataflow.projects.workerMessages', + ordered_params=['projectId'], + path_params=['projectId'], query_params=[], - relative_path=u'v1b3/projects/{projectId}/WorkerMessages', - request_field=u'sendWorkerMessagesRequest', - request_type_name=u'DataflowProjectsWorkerMessagesRequest', - response_type_name=u'SendWorkerMessagesResponse', + relative_path='v1b3/projects/{projectId}/WorkerMessages', + request_field='sendWorkerMessagesRequest', + request_type_name='DataflowProjectsWorkerMessagesRequest', + response_type_name='SendWorkerMessagesResponse', supports_download=False, ) diff --git a/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_messages.py b/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_messages.py index 684b51e31faf..b8b9f94d1aeb 100644 --- a/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_messages.py +++ b/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_messages.py @@ -21,8 +21,6 @@ """ # NOTE: This file is autogenerated and should not be edited by hand. -from __future__ import absolute_import - from apitools.base.protorpclite import messages as _messages from apitools.base.py import encoding from apitools.base.py import extra_types @@ -53,9 +51,9 @@ class ApproximateReportedProgress(_messages.Message): consumedParallelism: Total amount of parallelism in the portion of input of this task that has already been consumed and is no longer active. In the first two examples above (see remaining_parallelism), the value - should be 29 or 2 respectively. The sum of remaining_parallelism and + should be 29 or 2 respectively. 
The sum of remaining_parallelism and consumed_parallelism should equal the total amount of parallelism in - this work item. If specified, must be finite. + this work item. If specified, must be finite. fractionConsumed: Completion as fraction of the input consumed, from 0.0 (beginning, nothing consumed), to 1.0 (end of the input, entire input consumed). @@ -63,23 +61,22 @@ class ApproximateReportedProgress(_messages.Message): remainingParallelism: Total amount of parallelism in the input of this task that remains, (i.e. can be delegated to this task and any new tasks via dynamic splitting). Always at least 1 for non-finished work items - and 0 for finished. "Amount of parallelism" refers to how many non- - empty parts of the input can be read in parallel. This does not - necessarily equal number of records. An input that can be read in - parallel down to the individual records is called "perfectly - splittable". An example of non-perfectly parallelizable input is a - block-compressed file format where a block of records has to be read as - a whole, but different blocks can be read in parallel. Examples: * If - we are processing record #30 (starting at 1) out of 50 in a perfectly - splittable 50-record input, this value should be 21 (20 remaining + 1 - current). * If we are reading through block 3 in a block-compressed file - consisting of 5 blocks, this value should be 3 (since blocks 4 and 5 - can be processed in parallel by new tasks via dynamic splitting and - the current task remains processing block 3). * If we are reading - through the last block in a block-compressed file, or reading or - processing the last record in a perfectly splittable input, this value - should be 1, because apart from the current task, no additional - remainder can be split off. + and 0 for finished. "Amount of parallelism" refers to how many non-empty + parts of the input can be read in parallel. This does not necessarily + equal number of records. An input that can be read in parallel down to + the individual records is called "perfectly splittable". An example of + non-perfectly parallelizable input is a block-compressed file format + where a block of records has to be read as a whole, but different blocks + can be read in parallel. Examples: * If we are processing record #30 + (starting at 1) out of 50 in a perfectly splittable 50-record input, + this value should be 21 (20 remaining + 1 current). * If we are reading + through block 3 in a block-compressed file consisting of 5 blocks, this + value should be 3 (since blocks 4 and 5 can be processed in parallel by + new tasks via dynamic splitting and the current task remains processing + block 3). * If we are reading through the last block in a block- + compressed file, or reading or processing the last record in a perfectly + splittable input, this value should be 1, because apart from the current + task, no additional remainder can be split off. """ consumedParallelism = _messages.MessageField('ReportedParallelism', 1) @@ -106,6 +103,20 @@ class ApproximateSplitRequest(_messages.Message): position = _messages.MessageField('Position', 3) +class Artifact(_messages.Message): + r"""Job information for templates. + + Fields: + containerSpec: Container image path set for flex Template. + jobGraphGcsPath: job_graph_gcs_path set for legacy Template. + metadata: Metadata set for legacy Template. 
+ """ + + containerSpec = _messages.MessageField('ContainerSpec', 1) + jobGraphGcsPath = _messages.StringField(2) + metadata = _messages.MessageField('TemplateMetadata', 3) + + class AutoscalingEvent(_messages.Message): r"""A structured message reporting an autoscaling decision made by the Dataflow service. @@ -132,8 +143,7 @@ class EventTypeValueValuesEnum(_messages.Enum): r"""The type of autoscaling event to report. Values: - TYPE_UNKNOWN: Default type for the enum. Value should never be - returned. + TYPE_UNKNOWN: Default type for the enum. Value should never be returned. TARGET_NUM_WORKERS_CHANGED: The TARGET_NUM_WORKERS_CHANGED type should be used when the target worker pool size has changed at the start of an actuation. An event should always be specified as @@ -238,6 +248,16 @@ class CPUTime(_messages.Message): totalMs = _messages.IntegerField(3, variant=_messages.Variant.UINT64) +class CommitTemplateVersionRequest(_messages.Message): + r"""Commit will add a new TemplateVersion to an existing template. + + Fields: + templateVersion: TemplateVersion obejct to create. + """ + + templateVersion = _messages.MessageField('TemplateVersion', 1) + + class ComponentSource(_messages.Message): r"""Description of an interstitial value between transforms in an execution stage. @@ -309,15 +329,17 @@ class ContainerSpec(_messages.Message): r"""Container Spec. Fields: + defaultEnvironment: Default runtime environment for the job. image: Name of the docker container image. E.g., gcr.io/project/some-image metadata: Metadata describing a template including description and validation rules. sdkInfo: Required. SDK info of the Flex Template. """ - image = _messages.StringField(1) - metadata = _messages.MessageField('TemplateMetadata', 2) - sdkInfo = _messages.MessageField('SDKInfo', 3) + defaultEnvironment = _messages.MessageField('FlexTemplateRuntimeEnvironment', 1) + image = _messages.StringField(2) + metadata = _messages.MessageField('TemplateMetadata', 3) + sdkInfo = _messages.MessageField('SDKInfo', 4) class CounterMetadata(_messages.Message): @@ -563,6 +585,16 @@ class AdditionalProperty(_messages.Message): parameters = _messages.MessageField('ParametersValue', 5) +class CreateTemplateVersionRequest(_messages.Message): + r"""Creates a new Template with TemplateVersions. + + Fields: + templateVersion: The TemplateVersion object to create. + """ + + templateVersion = _messages.MessageField('TemplateVersion', 1) + + class CustomSourceLocation(_messages.Message): r"""Identifies the location of a custom souce. @@ -590,13 +622,115 @@ class DataDiskAssignment(_messages.Message): vmInstance = _messages.StringField(2) +class DataflowProjectsCatalogTemplatesCommitRequest(_messages.Message): + r"""A DataflowProjectsCatalogTemplatesCommitRequest object. + + Fields: + commitTemplateVersionRequest: A CommitTemplateVersionRequest resource to + be passed as the request body. + name: The location of the template, name includes project_id and + display_name. Commit using project_id(pid1) and display_name(tid1). + Format: projects/{pid1}/catalogTemplates/{tid1} + """ + + commitTemplateVersionRequest = _messages.MessageField('CommitTemplateVersionRequest', 1) + name = _messages.StringField(2, required=True) + + +class DataflowProjectsCatalogTemplatesDeleteRequest(_messages.Message): + r"""A DataflowProjectsCatalogTemplatesDeleteRequest object. + + Fields: + name: name includes project_id and display_name. Delete by + project_id(pid1) and display_name(tid1). 
Format: + projects/{pid1}/catalogTemplates/{tid1} + """ + + name = _messages.StringField(1, required=True) + + +class DataflowProjectsCatalogTemplatesGetRequest(_messages.Message): + r"""A DataflowProjectsCatalogTemplatesGetRequest object. + + Fields: + name: Resource name includes project_id and display_name. version_id is + optional. Get the latest TemplateVersion if version_id not set. Get by + project_id(pid1) and display_name(tid1): Format: + projects/{pid1}/catalogTemplates/{tid1} Get by project_id(pid1), + display_name(tid1), and version_id(vid1): Format: + projects/{pid1}/catalogTemplates/{tid1@vid} + """ + + name = _messages.StringField(1, required=True) + + +class DataflowProjectsCatalogTemplatesLabelRequest(_messages.Message): + r"""A DataflowProjectsCatalogTemplatesLabelRequest object. + + Fields: + modifyTemplateVersionLabelRequest: A ModifyTemplateVersionLabelRequest + resource to be passed as the request body. + name: Resource name includes project_id, display_name, and version_id. + Updates by project_id(pid1), display_name(tid1), and version_id(vid1): + Format: projects/{pid1}/catalogTemplates/{tid1@vid} + """ + + modifyTemplateVersionLabelRequest = _messages.MessageField('ModifyTemplateVersionLabelRequest', 1) + name = _messages.StringField(2, required=True) + + +class DataflowProjectsCatalogTemplatesTagRequest(_messages.Message): + r"""A DataflowProjectsCatalogTemplatesTagRequest object. + + Fields: + modifyTemplateVersionTagRequest: A ModifyTemplateVersionTagRequest + resource to be passed as the request body. + name: Resource name includes project_id, display_name, and version_id. + Updates by project_id(pid1), display_name(tid1), and version_id(vid1): + Format: projects/{pid1}/catalogTemplates/{tid1@vid} + """ + + modifyTemplateVersionTagRequest = _messages.MessageField('ModifyTemplateVersionTagRequest', 1) + name = _messages.StringField(2, required=True) + + +class DataflowProjectsCatalogTemplatesTemplateVersionsCreateRequest(_messages.Message): + r"""A DataflowProjectsCatalogTemplatesTemplateVersionsCreateRequest object. + + Fields: + createTemplateVersionRequest: A CreateTemplateVersionRequest resource to + be passed as the request body. + parent: The parent project and template that the TemplateVersion will be + created under. Create using project_id(pid1) and display_name(tid1). + Format: projects/{pid1}/catalogTemplates/{tid1} + """ + + createTemplateVersionRequest = _messages.MessageField('CreateTemplateVersionRequest', 1) + parent = _messages.StringField(2, required=True) + + +class DataflowProjectsDeleteSnapshotsRequest(_messages.Message): + r"""A DataflowProjectsDeleteSnapshotsRequest object. + + Fields: + location: The location that contains this snapshot. + projectId: The ID of the Cloud Platform project that the snapshot belongs + to. + snapshotId: The ID of the snapshot. + """ + + location = _messages.StringField(1) + projectId = _messages.StringField(2, required=True) + snapshotId = _messages.StringField(3) + + class DataflowProjectsJobsAggregatedRequest(_messages.Message): r"""A DataflowProjectsJobsAggregatedRequest object. Enums: FilterValueValuesEnum: The kind of filter to use. - ViewValueValuesEnum: Level of information requested in response. Default - is `JOB_VIEW_SUMMARY`. + ViewValueValuesEnum: Deprecated. ListJobs always returns summaries now. + Use GetJob for other JobViews. Fields: filter: The kind of filter to use. 
@@ -609,18 +743,23 @@ class DataflowProjectsJobsAggregatedRequest(_messages.Message): pageToken: Set this to the 'next_page_token' field of a previous response to request additional results in a long list. projectId: The project which owns the jobs. - view: Level of information requested in response. Default is - `JOB_VIEW_SUMMARY`. + view: Deprecated. ListJobs always returns summaries now. Use GetJob for + other JobViews. """ class FilterValueValuesEnum(_messages.Enum): r"""The kind of filter to use. Values: - UNKNOWN: - ALL: - TERMINATED: - ACTIVE: + UNKNOWN: The filter isn't specified, or is unknown. This returns all + jobs ordered on descending `JobUuid`. + ALL: Returns all running jobs first ordered on creation timestamp, then + returns all terminated jobs ordered on the termination timestamp. + TERMINATED: Filters the jobs that have a terminated state, ordered on + the termination timestamp. Example terminated states: + `JOB_STATE_STOPPED`, `JOB_STATE_UPDATED`, `JOB_STATE_DRAINED`, etc. + ACTIVE: Filters the jobs that are running ordered on the creation + timestamp. """ UNKNOWN = 0 ALL = 1 @@ -628,14 +767,19 @@ class FilterValueValuesEnum(_messages.Enum): ACTIVE = 3 class ViewValueValuesEnum(_messages.Enum): - r"""Level of information requested in response. Default is - `JOB_VIEW_SUMMARY`. + r"""Deprecated. ListJobs always returns summaries now. Use GetJob for + other JobViews. Values: - JOB_VIEW_UNKNOWN: - JOB_VIEW_SUMMARY: - JOB_VIEW_ALL: - JOB_VIEW_DESCRIPTION: + JOB_VIEW_UNKNOWN: The job view to return isn't specified, or is unknown. + Responses will contain at least the `JOB_VIEW_SUMMARY` information, + and may contain additional information. + JOB_VIEW_SUMMARY: Request summary information only: Project ID, Job ID, + job name, job type, job status, start/end time, and Cloud SDK version + details. + JOB_VIEW_ALL: Request all information available for this job. + JOB_VIEW_DESCRIPTION: Request summary info and limited job description + data for steps, labels and environment. """ JOB_VIEW_UNKNOWN = 0 JOB_VIEW_SUMMARY = 1 @@ -670,10 +814,15 @@ class ViewValueValuesEnum(_messages.Enum): r"""The level of information requested in response. Values: - JOB_VIEW_UNKNOWN: - JOB_VIEW_SUMMARY: - JOB_VIEW_ALL: - JOB_VIEW_DESCRIPTION: + JOB_VIEW_UNKNOWN: The job view to return isn't specified, or is unknown. + Responses will contain at least the `JOB_VIEW_SUMMARY` information, + and may contain additional information. + JOB_VIEW_SUMMARY: Request summary information only: Project ID, Job ID, + job name, job type, job status, start/end time, and Cloud SDK version + details. + JOB_VIEW_ALL: Request all information available for this job. + JOB_VIEW_DESCRIPTION: Request summary info and limited job description + data for steps, labels and environment. """ JOB_VIEW_UNKNOWN = 0 JOB_VIEW_SUMMARY = 1 @@ -721,7 +870,7 @@ class DataflowProjectsJobsGetMetricsRequest(_messages.Message): r"""A DataflowProjectsJobsGetMetricsRequest object. Fields: - jobId: The job to get messages for. + jobId: The job to get metrics for. location: The [regional endpoint] (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that contains the job specified by job_id. @@ -755,10 +904,15 @@ class ViewValueValuesEnum(_messages.Enum): r"""The level of information requested in response. Values: - JOB_VIEW_UNKNOWN: - JOB_VIEW_SUMMARY: - JOB_VIEW_ALL: - JOB_VIEW_DESCRIPTION: + JOB_VIEW_UNKNOWN: The job view to return isn't specified, or is unknown. 
+ Responses will contain at least the `JOB_VIEW_SUMMARY` information, + and may contain additional information. + JOB_VIEW_SUMMARY: Request summary information only: Project ID, Job ID, + job name, job type, job status, start/end time, and Cloud SDK version + details. + JOB_VIEW_ALL: Request all information available for this job. + JOB_VIEW_DESCRIPTION: Request summary info and limited job description + data for steps, labels and environment. """ JOB_VIEW_UNKNOWN = 0 JOB_VIEW_SUMMARY = 1 @@ -776,8 +930,8 @@ class DataflowProjectsJobsListRequest(_messages.Message): Enums: FilterValueValuesEnum: The kind of filter to use. - ViewValueValuesEnum: Level of information requested in response. Default - is `JOB_VIEW_SUMMARY`. + ViewValueValuesEnum: Deprecated. ListJobs always returns summaries now. + Use GetJob for other JobViews. Fields: filter: The kind of filter to use. @@ -790,18 +944,23 @@ class DataflowProjectsJobsListRequest(_messages.Message): pageToken: Set this to the 'next_page_token' field of a previous response to request additional results in a long list. projectId: The project which owns the jobs. - view: Level of information requested in response. Default is - `JOB_VIEW_SUMMARY`. + view: Deprecated. ListJobs always returns summaries now. Use GetJob for + other JobViews. """ class FilterValueValuesEnum(_messages.Enum): r"""The kind of filter to use. Values: - UNKNOWN: - ALL: - TERMINATED: - ACTIVE: + UNKNOWN: The filter isn't specified, or is unknown. This returns all + jobs ordered on descending `JobUuid`. + ALL: Returns all running jobs first ordered on creation timestamp, then + returns all terminated jobs ordered on the termination timestamp. + TERMINATED: Filters the jobs that have a terminated state, ordered on + the termination timestamp. Example terminated states: + `JOB_STATE_STOPPED`, `JOB_STATE_UPDATED`, `JOB_STATE_DRAINED`, etc. + ACTIVE: Filters the jobs that are running ordered on the creation + timestamp. """ UNKNOWN = 0 ALL = 1 @@ -809,14 +968,19 @@ class FilterValueValuesEnum(_messages.Enum): ACTIVE = 3 class ViewValueValuesEnum(_messages.Enum): - r"""Level of information requested in response. Default is - `JOB_VIEW_SUMMARY`. + r"""Deprecated. ListJobs always returns summaries now. Use GetJob for + other JobViews. Values: - JOB_VIEW_UNKNOWN: - JOB_VIEW_SUMMARY: - JOB_VIEW_ALL: - JOB_VIEW_DESCRIPTION: + JOB_VIEW_UNKNOWN: The job view to return isn't specified, or is unknown. + Responses will contain at least the `JOB_VIEW_SUMMARY` information, + and may contain additional information. + JOB_VIEW_SUMMARY: Request summary information only: Project ID, Job ID, + job name, job type, job status, start/end time, and Cloud SDK version + details. + JOB_VIEW_ALL: Request all information available for this job. + JOB_VIEW_DESCRIPTION: Request summary info and limited job description + data for steps, labels and environment. """ JOB_VIEW_UNKNOWN = 0 JOB_VIEW_SUMMARY = 1 @@ -847,7 +1011,7 @@ class DataflowProjectsJobsMessagesListRequest(_messages.Message): that contains the job specified by job_id. minimumImportance: Filter to only get messages with importance >= level pageSize: If specified, determines the maximum number of messages to - return. If unspecified, the service may choose an appropriate default, + return. If unspecified, the service may choose an appropriate default, or may return an arbitrarily large number of results. pageToken: If supplied, this should be the value of next_page_token returned by an earlier call. 
This will cause the next page of results to @@ -862,12 +1026,30 @@ class MinimumImportanceValueValuesEnum(_messages.Enum): r"""Filter to only get messages with importance >= level Values: - JOB_MESSAGE_IMPORTANCE_UNKNOWN: - JOB_MESSAGE_DEBUG: - JOB_MESSAGE_DETAILED: - JOB_MESSAGE_BASIC: - JOB_MESSAGE_WARNING: - JOB_MESSAGE_ERROR: + JOB_MESSAGE_IMPORTANCE_UNKNOWN: The message importance isn't specified, + or is unknown. + JOB_MESSAGE_DEBUG: The message is at the 'debug' level: typically only + useful for software engineers working on the code the job is running. + Typically, Dataflow pipeline runners do not display log messages at + this level by default. + JOB_MESSAGE_DETAILED: The message is at the 'detailed' level: somewhat + verbose, but potentially useful to users. Typically, Dataflow pipeline + runners do not display log messages at this level by default. These + messages are displayed by default in the Dataflow monitoring UI. + JOB_MESSAGE_BASIC: The message is at the 'basic' level: useful for + keeping track of the execution of a Dataflow pipeline. Typically, + Dataflow pipeline runners display log messages at this level by + default, and these messages are displayed by default in the Dataflow + monitoring UI. + JOB_MESSAGE_WARNING: The message is at the 'warning' level: indicating a + condition pertaining to a job which may require human intervention. + Typically, Dataflow pipeline runners display log messages at this + level by default, and these messages are displayed by default in the + Dataflow monitoring UI. + JOB_MESSAGE_ERROR: The message is at the 'error' level: indicating a + condition preventing a job from succeeding. Typically, Dataflow + pipeline runners display log messages at this level by default, and + these messages are displayed by default in the Dataflow monitoring UI. """ JOB_MESSAGE_IMPORTANCE_UNKNOWN = 0 JOB_MESSAGE_DEBUG = 1 @@ -886,6 +1068,21 @@ class MinimumImportanceValueValuesEnum(_messages.Enum): startTime = _messages.StringField(8) +class DataflowProjectsJobsSnapshotRequest(_messages.Message): + r"""A DataflowProjectsJobsSnapshotRequest object. + + Fields: + jobId: The job to be snapshotted. + projectId: The project which owns the job to be snapshotted. + snapshotJobRequest: A SnapshotJobRequest resource to be passed as the + request body. + """ + + jobId = _messages.StringField(1, required=True) + projectId = _messages.StringField(2, required=True) + snapshotJobRequest = _messages.MessageField('SnapshotJobRequest', 3) + + class DataflowProjectsJobsUpdateRequest(_messages.Message): r"""A DataflowProjectsJobsUpdateRequest object. @@ -972,10 +1169,15 @@ class ViewValueValuesEnum(_messages.Enum): r"""The level of information requested in response. Values: - JOB_VIEW_UNKNOWN: - JOB_VIEW_SUMMARY: - JOB_VIEW_ALL: - JOB_VIEW_DESCRIPTION: + JOB_VIEW_UNKNOWN: The job view to return isn't specified, or is unknown. + Responses will contain at least the `JOB_VIEW_SUMMARY` information, + and may contain additional information. + JOB_VIEW_SUMMARY: Request summary information only: Project ID, Job ID, + job name, job type, job status, start/end time, and Cloud SDK version + details. + JOB_VIEW_ALL: Request all information available for this job. + JOB_VIEW_DESCRIPTION: Request summary info and limited job description + data for steps, labels and environment. 
""" JOB_VIEW_UNKNOWN = 0 JOB_VIEW_SUMMARY = 1 @@ -1027,11 +1229,35 @@ class DataflowProjectsLocationsJobsDebugSendCaptureRequest(_messages.Message): sendDebugCaptureRequest = _messages.MessageField('SendDebugCaptureRequest', 4) +class DataflowProjectsLocationsJobsGetExecutionDetailsRequest(_messages.Message): + r"""A DataflowProjectsLocationsJobsGetExecutionDetailsRequest object. + + Fields: + jobId: The job to get execution details for. + location: The [regional endpoint] + (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) + that contains the job specified by job_id. + pageSize: If specified, determines the maximum number of stages to return. + If unspecified, the service may choose an appropriate default, or may + return an arbitrarily large number of results. + pageToken: If supplied, this should be the value of next_page_token + returned by an earlier call. This will cause the next page of results to + be returned. + projectId: A project id. + """ + + jobId = _messages.StringField(1, required=True) + location = _messages.StringField(2, required=True) + pageSize = _messages.IntegerField(3, variant=_messages.Variant.INT32) + pageToken = _messages.StringField(4) + projectId = _messages.StringField(5, required=True) + + class DataflowProjectsLocationsJobsGetMetricsRequest(_messages.Message): r"""A DataflowProjectsLocationsJobsGetMetricsRequest object. Fields: - jobId: The job to get messages for. + jobId: The job to get metrics for. location: The [regional endpoint] (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that contains the job specified by job_id. @@ -1065,10 +1291,15 @@ class ViewValueValuesEnum(_messages.Enum): r"""The level of information requested in response. Values: - JOB_VIEW_UNKNOWN: - JOB_VIEW_SUMMARY: - JOB_VIEW_ALL: - JOB_VIEW_DESCRIPTION: + JOB_VIEW_UNKNOWN: The job view to return isn't specified, or is unknown. + Responses will contain at least the `JOB_VIEW_SUMMARY` information, + and may contain additional information. + JOB_VIEW_SUMMARY: Request summary information only: Project ID, Job ID, + job name, job type, job status, start/end time, and Cloud SDK version + details. + JOB_VIEW_ALL: Request all information available for this job. + JOB_VIEW_DESCRIPTION: Request summary info and limited job description + data for steps, labels and environment. """ JOB_VIEW_UNKNOWN = 0 JOB_VIEW_SUMMARY = 1 @@ -1086,8 +1317,8 @@ class DataflowProjectsLocationsJobsListRequest(_messages.Message): Enums: FilterValueValuesEnum: The kind of filter to use. - ViewValueValuesEnum: Level of information requested in response. Default - is `JOB_VIEW_SUMMARY`. + ViewValueValuesEnum: Deprecated. ListJobs always returns summaries now. + Use GetJob for other JobViews. Fields: filter: The kind of filter to use. @@ -1100,18 +1331,23 @@ class DataflowProjectsLocationsJobsListRequest(_messages.Message): pageToken: Set this to the 'next_page_token' field of a previous response to request additional results in a long list. projectId: The project which owns the jobs. - view: Level of information requested in response. Default is - `JOB_VIEW_SUMMARY`. + view: Deprecated. ListJobs always returns summaries now. Use GetJob for + other JobViews. """ class FilterValueValuesEnum(_messages.Enum): r"""The kind of filter to use. Values: - UNKNOWN: - ALL: - TERMINATED: - ACTIVE: + UNKNOWN: The filter isn't specified, or is unknown. This returns all + jobs ordered on descending `JobUuid`. 
+ ALL: Returns all running jobs first ordered on creation timestamp, then + returns all terminated jobs ordered on the termination timestamp. + TERMINATED: Filters the jobs that have a terminated state, ordered on + the termination timestamp. Example terminated states: + `JOB_STATE_STOPPED`, `JOB_STATE_UPDATED`, `JOB_STATE_DRAINED`, etc. + ACTIVE: Filters the jobs that are running ordered on the creation + timestamp. """ UNKNOWN = 0 ALL = 1 @@ -1119,14 +1355,19 @@ class FilterValueValuesEnum(_messages.Enum): ACTIVE = 3 class ViewValueValuesEnum(_messages.Enum): - r"""Level of information requested in response. Default is - `JOB_VIEW_SUMMARY`. + r"""Deprecated. ListJobs always returns summaries now. Use GetJob for + other JobViews. Values: - JOB_VIEW_UNKNOWN: - JOB_VIEW_SUMMARY: - JOB_VIEW_ALL: - JOB_VIEW_DESCRIPTION: + JOB_VIEW_UNKNOWN: The job view to return isn't specified, or is unknown. + Responses will contain at least the `JOB_VIEW_SUMMARY` information, + and may contain additional information. + JOB_VIEW_SUMMARY: Request summary information only: Project ID, Job ID, + job name, job type, job status, start/end time, and Cloud SDK version + details. + JOB_VIEW_ALL: Request all information available for this job. + JOB_VIEW_DESCRIPTION: Request summary info and limited job description + data for steps, labels and environment. """ JOB_VIEW_UNKNOWN = 0 JOB_VIEW_SUMMARY = 1 @@ -1157,7 +1398,7 @@ class DataflowProjectsLocationsJobsMessagesListRequest(_messages.Message): that contains the job specified by job_id. minimumImportance: Filter to only get messages with importance >= level pageSize: If specified, determines the maximum number of messages to - return. If unspecified, the service may choose an appropriate default, + return. If unspecified, the service may choose an appropriate default, or may return an arbitrarily large number of results. pageToken: If supplied, this should be the value of next_page_token returned by an earlier call. This will cause the next page of results to @@ -1172,12 +1413,30 @@ class MinimumImportanceValueValuesEnum(_messages.Enum): r"""Filter to only get messages with importance >= level Values: - JOB_MESSAGE_IMPORTANCE_UNKNOWN: - JOB_MESSAGE_DEBUG: - JOB_MESSAGE_DETAILED: - JOB_MESSAGE_BASIC: - JOB_MESSAGE_WARNING: - JOB_MESSAGE_ERROR: + JOB_MESSAGE_IMPORTANCE_UNKNOWN: The message importance isn't specified, + or is unknown. + JOB_MESSAGE_DEBUG: The message is at the 'debug' level: typically only + useful for software engineers working on the code the job is running. + Typically, Dataflow pipeline runners do not display log messages at + this level by default. + JOB_MESSAGE_DETAILED: The message is at the 'detailed' level: somewhat + verbose, but potentially useful to users. Typically, Dataflow pipeline + runners do not display log messages at this level by default. These + messages are displayed by default in the Dataflow monitoring UI. + JOB_MESSAGE_BASIC: The message is at the 'basic' level: useful for + keeping track of the execution of a Dataflow pipeline. Typically, + Dataflow pipeline runners display log messages at this level by + default, and these messages are displayed by default in the Dataflow + monitoring UI. + JOB_MESSAGE_WARNING: The message is at the 'warning' level: indicating a + condition pertaining to a job which may require human intervention. + Typically, Dataflow pipeline runners display log messages at this + level by default, and these messages are displayed by default in the + Dataflow monitoring UI. 
+ JOB_MESSAGE_ERROR: The message is at the 'error' level: indicating a + condition preventing a job from succeeding. Typically, Dataflow + pipeline runners display log messages at this level by default, and + these messages are displayed by default in the Dataflow monitoring UI. """ JOB_MESSAGE_IMPORTANCE_UNKNOWN = 0 JOB_MESSAGE_DEBUG = 1 @@ -1196,6 +1455,67 @@ class MinimumImportanceValueValuesEnum(_messages.Enum): startTime = _messages.StringField(8) +class DataflowProjectsLocationsJobsSnapshotRequest(_messages.Message): + r"""A DataflowProjectsLocationsJobsSnapshotRequest object. + + Fields: + jobId: The job to be snapshotted. + location: The location that contains this job. + projectId: The project which owns the job to be snapshotted. + snapshotJobRequest: A SnapshotJobRequest resource to be passed as the + request body. + """ + + jobId = _messages.StringField(1, required=True) + location = _messages.StringField(2, required=True) + projectId = _messages.StringField(3, required=True) + snapshotJobRequest = _messages.MessageField('SnapshotJobRequest', 4) + + +class DataflowProjectsLocationsJobsSnapshotsListRequest(_messages.Message): + r"""A DataflowProjectsLocationsJobsSnapshotsListRequest object. + + Fields: + jobId: If specified, list snapshots created from this job. + location: The location to list snapshots in. + projectId: The project ID to list snapshots for. + """ + + jobId = _messages.StringField(1, required=True) + location = _messages.StringField(2, required=True) + projectId = _messages.StringField(3, required=True) + + +class DataflowProjectsLocationsJobsStagesGetExecutionDetailsRequest(_messages.Message): + r"""A DataflowProjectsLocationsJobsStagesGetExecutionDetailsRequest object. + + Fields: + endTime: Upper time bound of work items to include, by start time. + jobId: The job to get execution details for. + location: The [regional endpoint] + (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) + that contains the job specified by job_id. + pageSize: If specified, determines the maximum number of work items to + return. If unspecified, the service may choose an appropriate default, + or may return an arbitrarily large number of results. + pageToken: If supplied, this should be the value of next_page_token + returned by an earlier call. This will cause the next page of results to + be returned. + projectId: A project id. + stageId: The stage for which to fetch information. + startTime: Lower time bound of work items to include, by start time. + """ + + endTime = _messages.StringField(1) + jobId = _messages.StringField(2, required=True) + location = _messages.StringField(3, required=True) + pageSize = _messages.IntegerField(4, variant=_messages.Variant.INT32) + pageToken = _messages.StringField(5) + projectId = _messages.StringField(6, required=True) + stageId = _messages.StringField(7, required=True) + startTime = _messages.StringField(8) + + class DataflowProjectsLocationsJobsUpdateRequest(_messages.Message): r"""A DataflowProjectsLocationsJobsUpdateRequest object. @@ -1252,6 +1572,50 @@ class DataflowProjectsLocationsJobsWorkItemsReportStatusRequest(_messages.Messag reportWorkItemStatusRequest = _messages.MessageField('ReportWorkItemStatusRequest', 4) +class DataflowProjectsLocationsSnapshotsDeleteRequest(_messages.Message): + r"""A DataflowProjectsLocationsSnapshotsDeleteRequest object. + + Fields: + location: The location that contains this snapshot. + projectId: The ID of the Cloud Platform project that the snapshot belongs + to. 
+ snapshotId: The ID of the snapshot. + """ + + location = _messages.StringField(1, required=True) + projectId = _messages.StringField(2, required=True) + snapshotId = _messages.StringField(3, required=True) + + +class DataflowProjectsLocationsSnapshotsGetRequest(_messages.Message): + r"""A DataflowProjectsLocationsSnapshotsGetRequest object. + + Fields: + location: The location that contains this snapshot. + projectId: The ID of the Cloud Platform project that the snapshot belongs + to. + snapshotId: The ID of the snapshot. + """ + + location = _messages.StringField(1, required=True) + projectId = _messages.StringField(2, required=True) + snapshotId = _messages.StringField(3, required=True) + + +class DataflowProjectsLocationsSnapshotsListRequest(_messages.Message): + r"""A DataflowProjectsLocationsSnapshotsListRequest object. + + Fields: + jobId: If specified, list snapshots created from this job. + location: The location to list snapshots in. + projectId: The project ID to list snapshots for. + """ + + jobId = _messages.StringField(1) + location = _messages.StringField(2, required=True) + projectId = _messages.StringField(3, required=True) + + class DataflowProjectsLocationsSqlValidateRequest(_messages.Message): r"""A DataflowProjectsLocationsSqlValidateRequest object. @@ -1308,7 +1672,8 @@ class ViewValueValuesEnum(_messages.Enum): r"""The view to retrieve. Defaults to METADATA_ONLY. Values: - METADATA_ONLY: + METADATA_ONLY: Template view that retrieves only the metadata associated + with the template. """ METADATA_ONLY = 0 @@ -1365,6 +1730,53 @@ class DataflowProjectsLocationsWorkerMessagesRequest(_messages.Message): sendWorkerMessagesRequest = _messages.MessageField('SendWorkerMessagesRequest', 3) +class DataflowProjectsSnapshotsGetRequest(_messages.Message): + r"""A DataflowProjectsSnapshotsGetRequest object. + + Fields: + location: The location that contains this snapshot. + projectId: The ID of the Cloud Platform project that the snapshot belongs + to. + snapshotId: The ID of the snapshot. + """ + + location = _messages.StringField(1) + projectId = _messages.StringField(2, required=True) + snapshotId = _messages.StringField(3, required=True) + + +class DataflowProjectsSnapshotsListRequest(_messages.Message): + r"""A DataflowProjectsSnapshotsListRequest object. + + Fields: + jobId: If specified, list snapshots created from this job. + location: The location to list snapshots in. + projectId: The project ID to list snapshots for. + """ + + jobId = _messages.StringField(1) + location = _messages.StringField(2) + projectId = _messages.StringField(3, required=True) + + +class DataflowProjectsTemplateVersionsListRequest(_messages.Message): + r"""A DataflowProjectsTemplateVersionsListRequest object. + + Fields: + pageSize: The maximum number of TemplateVersions to return per page. + pageToken: The page token, received from a previous ListTemplateVersions + call. Provide this to retrieve the subsequent page. + parent: parent includes project_id, and display_name is optional. List by + project_id(pid1) and display_name(tid1). Format: + projects/{pid1}/catalogTemplates/{tid1} List by project_id(pid1). + Format: projects/{pid1} + """ + + pageSize = _messages.IntegerField(1, variant=_messages.Variant.INT32) + pageToken = _messages.StringField(2) + parent = _messages.StringField(3, required=True) + + class DataflowProjectsTemplatesCreateRequest(_messages.Message): r"""A DataflowProjectsTemplatesCreateRequest object. 
@@ -1400,7 +1812,8 @@ class ViewValueValuesEnum(_messages.Enum): r"""The view to retrieve. Defaults to METADATA_ONLY. Values: - METADATA_ONLY: + METADATA_ONLY: Template view that retrieves only the metadata associated + with the template. """ METADATA_ONLY = 0 @@ -1465,6 +1878,21 @@ class DatastoreIODetails(_messages.Message): projectId = _messages.StringField(2) +class DebugOptions(_messages.Message): + r"""Describes any options that have an effect on the debugging of pipelines. + + Fields: + enableHotKeyLogging: When true, enables the logging of the literal hot key + to the user's Cloud Logging. + """ + + enableHotKeyLogging = _messages.BooleanField(1) + + +class DeleteSnapshotResponse(_messages.Message): + r"""Response from deleting a snapshot.""" + + class DerivedSource(_messages.Message): r"""Specification of one of the bundles produced as a result of splitting a Source (e.g. when executing a SourceSplitRequest, or when splitting an @@ -1506,22 +1934,22 @@ class Disk(_messages.Message): r"""Describes the data disk used by a workflow job. Fields: - diskType: Disk storage type, as defined by Google Compute Engine. This + diskType: Disk storage type, as defined by Google Compute Engine. This must be a disk type appropriate to the project and zone in which the - workers will run. If unknown or unspecified, the service will attempt - to choose a reasonable default. For example, the standard persistent - disk type is a resource name typically ending in "pd-standard". If SSD + workers will run. If unknown or unspecified, the service will attempt to + choose a reasonable default. For example, the standard persistent disk + type is a resource name typically ending in "pd-standard". If SSD persistent disks are available, the resource name typically ends with - "pd-ssd". The actual valid values are defined the Google Compute Engine + "pd-ssd". The actual valid values are defined the Google Compute Engine API, not by the Cloud Dataflow API; consult the Google Compute Engine documentation for more information about determining the set of - available disk types for a particular project and zone. Google Compute + available disk types for a particular project and zone. Google Compute Engine Disk types are local to a particular project in a particular zone, and so the resource name will typically look something like this: compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd- standard mountPoint: Directory in a VM where disk is mounted. - sizeGb: Size of disk in GB. If zero or unspecified, the service will + sizeGb: Size of disk in GB. If zero or unspecified, the service will attempt to choose a reasonable default. """ @@ -1608,12 +2036,24 @@ class DynamicSourceSplit(_messages.Message): residual = _messages.MessageField('DerivedSource', 2) +class Empty(_messages.Message): + r"""A generic empty message that you can re-use to avoid defining duplicated + empty messages in your APIs. A typical example is to use it as the request + or the response type of an API method. For instance: service Foo { rpc + Bar(google.protobuf.Empty) returns (google.protobuf.Empty); } The JSON + representation for `Empty` is empty JSON object `{}`. + """ + + + class Environment(_messages.Message): r"""Describes the environment in which a Dataflow Job runs. Enums: FlexResourceSchedulingGoalValueValuesEnum: Which Flexible Resource Scheduling mode to run in. + ShuffleModeValueValuesEnum: Output only. The shuffle mode used for the + job. Messages: InternalExperimentsValue: Experimental settings. 
@@ -1626,14 +2066,18 @@ class Environment(_messages.Message): of the service are required in order to run the job. Fields: - clusterManagerApiService: The type of cluster manager API to use. If + clusterManagerApiService: The type of cluster manager API to use. If unknown or unspecified, the service will attempt to choose a reasonable - default. This should be in the form of the API service name, e.g. + default. This should be in the form of the API service name, e.g. "compute.googleapis.com". dataset: The dataset for the current project where various workflow - related tables are stored. The supported resource type is: Google - BigQuery: bigquery.googleapis.com/{dataset} - experiments: The list of experiments to enable. + related tables are stored. The supported resource type is: Google + BigQuery: bigquery.googleapis.com/{dataset} + debugOptions: Any debugging options to be supplied to the job. + experiments: The list of experiments to enable. This field should be used + for SDK related experiments and not for service related experiments. The + proper field for service related experiments is service_options. For + more details see the rationale at go/user-specified-service-options. flexResourceSchedulingGoal: Which Flexible Resource Scheduling mode to run in. internalExperiments: Experimental settings. @@ -1647,13 +2091,19 @@ class Environment(_messages.Message): encrypt data at rest, AKA a Customer Managed Encryption Key (CMEK). Format: projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY + serviceOptions: The list of service options to enable. This field should + be used for service related experiments only. These experiments, when + graduating to GA, should be replaced by dedicated fields or become + default (i.e. always on). For more details see the rationale at go/user- + specified-service-options. + shuffleMode: Output only. The shuffle mode used for the job. tempStoragePrefix: The prefix of the resources the system should use for - temporary storage. The system will append the suffix "/temp-{JOBNAME} - to this resource prefix, where {JOBNAME} is the value of the job_name - field. The resulting bucket and object prefix is used as the prefix of + temporary storage. The system will append the suffix "/temp-{JOBNAME} to + this resource prefix, where {JOBNAME} is the value of the job_name + field. The resulting bucket and object prefix is used as the prefix of the resources used to store temporary data needed during the job - execution. NOTE: This will override the value in taskrunner_settings. - The supported resource type is: Google Cloud Storage: + execution. NOTE: This will override the value in taskrunner_settings. + The supported resource type is: Google Cloud Storage: storage.googleapis.com/{bucket}/{object} bucket.storage.googleapis.com/{object} userAgent: A description of the process that generated the request. @@ -1666,11 +2116,12 @@ class Environment(_messages.Message): which worker processing should occur, e.g. "us-west1". Mutually exclusive with worker_zone. If neither worker_region nor worker_zone is specified, default to the control plane's region. - workerZone: The Compute Engine zone (https://cloud.google.com/compute/docs - /regions-zones/regions-zones) in which worker processing should occur, - e.g. "us-west1-a". Mutually exclusive with worker_region. If neither - worker_region nor worker_zone is specified, a zone in the control - plane's region is chosen based on available capacity. 
+ workerZone: The Compute Engine zone + (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in + which worker processing should occur, e.g. "us-west1-a". Mutually + exclusive with worker_region. If neither worker_region nor worker_zone + is specified, a zone in the control plane's region is chosen based on + available capacity. """ class FlexResourceSchedulingGoalValueValuesEnum(_messages.Enum): @@ -1685,6 +2136,18 @@ class FlexResourceSchedulingGoalValueValuesEnum(_messages.Enum): FLEXRS_SPEED_OPTIMIZED = 1 FLEXRS_COST_OPTIMIZED = 2 + class ShuffleModeValueValuesEnum(_messages.Enum): + r"""Output only. The shuffle mode used for the job. + + Values: + SHUFFLE_MODE_UNSPECIFIED: Shuffle mode information is not available. + VM_BASED: Shuffle is done on the worker VMs. + SERVICE_BASED: Shuffle is done on the service side. + """ + SHUFFLE_MODE_UNSPECIFIED = 0 + VM_BASED = 1 + SERVICE_BASED = 2 + @encoding.MapUnrecognizedFields('additionalProperties') class InternalExperimentsValue(_messages.Message): r"""Experimental settings. @@ -1790,18 +2253,21 @@ class AdditionalProperty(_messages.Message): clusterManagerApiService = _messages.StringField(1) dataset = _messages.StringField(2) - experiments = _messages.StringField(3, repeated=True) - flexResourceSchedulingGoal = _messages.EnumField('FlexResourceSchedulingGoalValueValuesEnum', 4) - internalExperiments = _messages.MessageField('InternalExperimentsValue', 5) - sdkPipelineOptions = _messages.MessageField('SdkPipelineOptionsValue', 6) - serviceAccountEmail = _messages.StringField(7) - serviceKmsKeyName = _messages.StringField(8) - tempStoragePrefix = _messages.StringField(9) - userAgent = _messages.MessageField('UserAgentValue', 10) - version = _messages.MessageField('VersionValue', 11) - workerPools = _messages.MessageField('WorkerPool', 12, repeated=True) - workerRegion = _messages.StringField(13) - workerZone = _messages.StringField(14) + debugOptions = _messages.MessageField('DebugOptions', 3) + experiments = _messages.StringField(4, repeated=True) + flexResourceSchedulingGoal = _messages.EnumField('FlexResourceSchedulingGoalValueValuesEnum', 5) + internalExperiments = _messages.MessageField('InternalExperimentsValue', 6) + sdkPipelineOptions = _messages.MessageField('SdkPipelineOptionsValue', 7) + serviceAccountEmail = _messages.StringField(8) + serviceKmsKeyName = _messages.StringField(9) + serviceOptions = _messages.StringField(10, repeated=True) + shuffleMode = _messages.EnumField('ShuffleModeValueValuesEnum', 11) + tempStoragePrefix = _messages.StringField(12) + userAgent = _messages.MessageField('UserAgentValue', 13) + version = _messages.MessageField('VersionValue', 14) + workerPools = _messages.MessageField('WorkerPool', 15, repeated=True) + workerRegion = _messages.StringField(16) + workerZone = _messages.StringField(17) class ExecutionStageState(_messages.Message): @@ -1828,12 +2294,12 @@ class ExecutionStageStateValueValuesEnum(_messages.Enum): JOB_STATE_RUNNING: `JOB_STATE_RUNNING` indicates that the job is currently running. JOB_STATE_DONE: `JOB_STATE_DONE` indicates that the job has successfully - completed. This is a terminal job state. This state may be set by the + completed. This is a terminal job state. This state may be set by the Cloud Dataflow service, as a transition from `JOB_STATE_RUNNING`. It may also be set via a Cloud Dataflow `UpdateJob` call, if the job has not yet reached a terminal state. JOB_STATE_FAILED: `JOB_STATE_FAILED` indicates that the job has failed. - This is a terminal job state. 
This state may only be set by the Cloud + This is a terminal job state. This state may only be set by the Cloud Dataflow service, and only as a transition from `JOB_STATE_RUNNING`. JOB_STATE_CANCELLED: `JOB_STATE_CANCELLED` indicates that the job has been explicitly cancelled. This is a terminal job state. This state @@ -1858,15 +2324,19 @@ class ExecutionStageStateValueValuesEnum(_messages.Enum): Cloud Dataflow service, and only as a transition from `JOB_STATE_DRAINING`. JOB_STATE_PENDING: `JOB_STATE_PENDING` indicates that the job has been - created but is not yet running. Jobs that are pending may only + created but is not yet running. Jobs that are pending may only transition to `JOB_STATE_RUNNING`, or `JOB_STATE_FAILED`. JOB_STATE_CANCELLING: `JOB_STATE_CANCELLING` indicates that the job has - been explicitly cancelled and is in the process of stopping. Jobs - that are cancelling may only transition to `JOB_STATE_CANCELLED` or + been explicitly cancelled and is in the process of stopping. Jobs that + are cancelling may only transition to `JOB_STATE_CANCELLED` or `JOB_STATE_FAILED`. JOB_STATE_QUEUED: `JOB_STATE_QUEUED` indicates that the job has been created but is being delayed until launch. Jobs that are queued may only transition to `JOB_STATE_PENDING` or `JOB_STATE_CANCELLED`. + JOB_STATE_RESOURCE_CLEANING_UP: `JOB_STATE_RESOURCE_CLEANING_UP` + indicates that the batch job's associated resources are currently + being cleaned up after a successful run. Currently, this is an opt-in + feature, please reach out to Cloud support team if you are intersted. """ JOB_STATE_UNKNOWN = 0 JOB_STATE_STOPPED = 1 @@ -1880,6 +2350,7 @@ class ExecutionStageStateValueValuesEnum(_messages.Enum): JOB_STATE_PENDING = 9 JOB_STATE_CANCELLING = 10 JOB_STATE_QUEUED = 11 + JOB_STATE_RESOURCE_CLEANING_UP = 12 currentStateTime = _messages.StringField(1) executionStageName = _messages.StringField(2) @@ -1888,7 +2359,7 @@ class ExecutionStageStateValueValuesEnum(_messages.Enum): class ExecutionStageSummary(_messages.Message): r"""Description of the composing transforms, names/ids, and input/outputs of - a stage of execution. Some composing transforms and sources may have been + a stage of execution. Some composing transforms and sources may have been generated by the Dataflow service during execution planning. Enums: @@ -1903,6 +2374,8 @@ class ExecutionStageSummary(_messages.Message): kind: Type of tranform this stage is executing. name: Dataflow service generated name for this stage. outputSource: Output sources for this stage. + prerequisiteStage: Other stages that must complete before this stage can + run. """ class KindValueValuesEnum(_messages.Enum): @@ -1937,6 +2410,7 @@ class KindValueValuesEnum(_messages.Enum): kind = _messages.EnumField('KindValueValuesEnum', 5) name = _messages.StringField(6) outputSource = _messages.MessageField('StageSource', 7, repeated=True) + prerequisiteStage = _messages.StringField(8, repeated=True) class FailedLocation(_messages.Message): @@ -1974,6 +2448,143 @@ class FlattenInstruction(_messages.Message): inputs = _messages.MessageField('InstructionInput', 1, repeated=True) +class FlexTemplateRuntimeEnvironment(_messages.Message): + r"""The environment values to be set at runtime for flex template. + + Enums: + FlexrsGoalValueValuesEnum: Set FlexRS goal for the job. + https://cloud.google.com/dataflow/docs/guides/flexrs + IpConfigurationValueValuesEnum: Configuration for VM IPs. + + Messages: + AdditionalUserLabelsValue: Additional user labels to be specified for the + job. 
Keys and values must follow the restrictions specified in the + [labeling restrictions](https://cloud.google.com/compute/docs/labeling- + resources#restrictions) page. An object containing a list of "key": + value pairs. Example: { "name": "wrench", "mass": "1kg", "count": "3" }. + + Fields: + additionalExperiments: Additional experiment flags for the job. + additionalUserLabels: Additional user labels to be specified for the job. + Keys and values must follow the restrictions specified in the [labeling + restrictions](https://cloud.google.com/compute/docs/labeling- + resources#restrictions) page. An object containing a list of "key": + value pairs. Example: { "name": "wrench", "mass": "1kg", "count": "3" }. + enableStreamingEngine: Whether to enable Streaming Engine for the job. + flexrsGoal: Set FlexRS goal for the job. + https://cloud.google.com/dataflow/docs/guides/flexrs + ipConfiguration: Configuration for VM IPs. + kmsKeyName: Name for the Cloud KMS key for the job. Key format is: + projects//locations//keyRings//cryptoKeys/ + machineType: The machine type to use for the job. Defaults to the value + from the template if not specified. + maxWorkers: The maximum number of Google Compute Engine instances to be + made available to your pipeline during execution, from 1 to 1000. + network: Network to which VMs will be assigned. If empty or unspecified, + the service will use the network "default". + numWorkers: The initial number of Google Compute Engine instances for the + job. + serviceAccountEmail: The email address of the service account to run the + job as. + subnetwork: Subnetwork to which VMs will be assigned, if desired. You can + specify a subnetwork using either a complete URL or an abbreviated path. + Expected to be of the form "https://www.googleapis.com/compute/v1/projec + ts/HOST_PROJECT_ID/regions/REGION/subnetworks/SUBNETWORK" or + "regions/REGION/subnetworks/SUBNETWORK". If the subnetwork is located in + a Shared VPC network, you must use the complete URL. + tempLocation: The Cloud Storage path to use for temporary files. Must be a + valid Cloud Storage URL, beginning with `gs://`. + workerRegion: The Compute Engine region + (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in + which worker processing should occur, e.g. "us-west1". Mutually + exclusive with worker_zone. If neither worker_region nor worker_zone is + specified, default to the control plane's region. + workerZone: The Compute Engine zone + (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in + which worker processing should occur, e.g. "us-west1-a". Mutually + exclusive with worker_region. If neither worker_region nor worker_zone + is specified, a zone in the control plane's region is chosen based on + available capacity. If both `worker_zone` and `zone` are set, + `worker_zone` takes precedence. + zone: The Compute Engine [availability + zone](https://cloud.google.com/compute/docs/regions-zones/regions-zones) + for launching worker instances to run your pipeline. In the future, + worker_zone will take precedence. + """ + + class FlexrsGoalValueValuesEnum(_messages.Enum): + r"""Set FlexRS goal for the job. + https://cloud.google.com/dataflow/docs/guides/flexrs + + Values: + FLEXRS_UNSPECIFIED: Run in the default mode. + FLEXRS_SPEED_OPTIMIZED: Optimize for lower execution time. + FLEXRS_COST_OPTIMIZED: Optimize for lower cost. 
+ """ + FLEXRS_UNSPECIFIED = 0 + FLEXRS_SPEED_OPTIMIZED = 1 + FLEXRS_COST_OPTIMIZED = 2 + + class IpConfigurationValueValuesEnum(_messages.Enum): + r"""Configuration for VM IPs. + + Values: + WORKER_IP_UNSPECIFIED: The configuration is unknown, or unspecified. + WORKER_IP_PUBLIC: Workers should have public IP addresses. + WORKER_IP_PRIVATE: Workers should have private IP addresses. + """ + WORKER_IP_UNSPECIFIED = 0 + WORKER_IP_PUBLIC = 1 + WORKER_IP_PRIVATE = 2 + + @encoding.MapUnrecognizedFields('additionalProperties') + class AdditionalUserLabelsValue(_messages.Message): + r"""Additional user labels to be specified for the job. Keys and values + must follow the restrictions specified in the [labeling + restrictions](https://cloud.google.com/compute/docs/labeling- + resources#restrictions) page. An object containing a list of "key": value + pairs. Example: { "name": "wrench", "mass": "1kg", "count": "3" }. + + Messages: + AdditionalProperty: An additional property for a + AdditionalUserLabelsValue object. + + Fields: + additionalProperties: Additional properties of type + AdditionalUserLabelsValue + """ + + class AdditionalProperty(_messages.Message): + r"""An additional property for a AdditionalUserLabelsValue object. + + Fields: + key: Name of the additional property. + value: A string attribute. + """ + + key = _messages.StringField(1) + value = _messages.StringField(2) + + additionalProperties = _messages.MessageField('AdditionalProperty', 1, repeated=True) + + additionalExperiments = _messages.StringField(1, repeated=True) + additionalUserLabels = _messages.MessageField('AdditionalUserLabelsValue', 2) + enableStreamingEngine = _messages.BooleanField(3) + flexrsGoal = _messages.EnumField('FlexrsGoalValueValuesEnum', 4) + ipConfiguration = _messages.EnumField('IpConfigurationValueValuesEnum', 5) + kmsKeyName = _messages.StringField(6) + machineType = _messages.StringField(7) + maxWorkers = _messages.IntegerField(8, variant=_messages.Variant.INT32) + network = _messages.StringField(9) + numWorkers = _messages.IntegerField(10, variant=_messages.Variant.INT32) + serviceAccountEmail = _messages.StringField(11) + subnetwork = _messages.StringField(12) + tempLocation = _messages.StringField(13) + workerRegion = _messages.StringField(14) + workerZone = _messages.StringField(15) + zone = _messages.StringField(16) + + class FloatingPointList(_messages.Message): r"""A metric value representing a list of floating point numbers. @@ -2058,7 +2669,7 @@ class TemplateTypeValueValuesEnum(_messages.Enum): class Histogram(_messages.Message): - r"""Histogram of value counts for a distribution. Buckets have an inclusive + r"""Histogram of value counts for a distribution. Buckets have an inclusive lower bound and exclusive upper bound and use "1,2,5 bucketing": The first bucket range is from [0,1) and all subsequent bucket boundaries are powers of ten multiplied by 1, 2, or 5. Thus, bucket boundaries are 0, 1, 2, 5, 10, @@ -2101,7 +2712,7 @@ class InstructionInput(_messages.Message): Fields: outputNum: The output index (origin zero) within the producer. producerInstructionIndex: The index (origin zero) of the parallel - instruction that produces the output to be consumed by this input. This + instruction that produces the output to be consumed by this input. This index is relative to the list of instructions in this input's instruction's containing MapTask. 
""" @@ -2198,28 +2809,28 @@ class IntegerMean(_messages.Message): class Job(_messages.Message): - r"""Defines a job to be run by the Cloud Dataflow service. + r"""Defines a job to be run by the Cloud Dataflow service. nextID: 26 Enums: - CurrentStateValueValuesEnum: The current state of the job. Jobs are - created in the `JOB_STATE_STOPPED` state unless otherwise specified. A + CurrentStateValueValuesEnum: The current state of the job. Jobs are + created in the `JOB_STATE_STOPPED` state unless otherwise specified. A job in the `JOB_STATE_RUNNING` state may asynchronously enter a terminal state. After a job has reached a terminal state, no further state - updates may be made. This field may be mutated by the Cloud Dataflow + updates may be made. This field may be mutated by the Cloud Dataflow service; callers cannot mutate it. - RequestedStateValueValuesEnum: The job's requested state. `UpdateJob` may + RequestedStateValueValuesEnum: The job's requested state. `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and - `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may + `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may also be used to directly set a job's requested state to `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the job if it has not already reached a terminal state. TypeValueValuesEnum: The type of Cloud Dataflow job. Messages: - LabelsValue: User-defined labels for this job. The labels map can contain - no more than 64 entries. Entries of the labels map are UTF8 strings - that comply with the following restrictions: * Keys must conform to - regexp: \p{Ll}\p{Lo}{0,62} * Values must conform to regexp: + LabelsValue: User-defined labels for this job. The labels map can contain + no more than 64 entries. Entries of the labels map are UTF8 strings that + comply with the following restrictions: * Keys must conform to regexp: + \p{Ll}\p{Lo}{0,62} * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63} * Both keys and values are additionally constrained to be <= 128 bytes in size. TransformNameMappingValue: The map of transform name prefixes of the job @@ -2237,52 +2848,54 @@ class Job(_messages.Message): and set by the Cloud Dataflow service. createdFromSnapshotId: If this is specified, the job's initial state is populated from the given snapshot. - currentState: The current state of the job. Jobs are created in the - `JOB_STATE_STOPPED` state unless otherwise specified. A job in the + currentState: The current state of the job. Jobs are created in the + `JOB_STATE_STOPPED` state unless otherwise specified. A job in the `JOB_STATE_RUNNING` state may asynchronously enter a terminal state. After a job has reached a terminal state, no further state updates may - be made. This field may be mutated by the Cloud Dataflow service; + be made. This field may be mutated by the Cloud Dataflow service; callers cannot mutate it. currentStateTime: The timestamp associated with the current state. environment: The environment for the job. executionInfo: Deprecated. - id: The unique ID of this job. This field is set by the Cloud Dataflow + id: The unique ID of this job. This field is set by the Cloud Dataflow service when the Job is created, and is immutable for the life of the job. jobMetadata: This field is populated by the Dataflow service to support filtering jobs by the metadata values provided here. Populated for ListJobs and all GetJob views SUMMARY and higher. - labels: User-defined labels for this job. 
The labels map can contain no - more than 64 entries. Entries of the labels map are UTF8 strings that - comply with the following restrictions: * Keys must conform to regexp: + labels: User-defined labels for this job. The labels map can contain no + more than 64 entries. Entries of the labels map are UTF8 strings that + comply with the following restrictions: * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62} * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63} * Both keys and values are additionally constrained to be <= 128 bytes in size. location: The [regional endpoint] (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that contains this job. - name: The user-specified Cloud Dataflow job name. Only one Job with a + name: The user-specified Cloud Dataflow job name. Only one Job with a given name may exist in a project at any given time. If a caller attempts to create a Job with the same name as an already-existing Job, - the attempt returns the existing Job. The name must match the regular + the attempt returns the existing Job. The name must match the regular expression `[a-z]([-a-z0-9]{0,38}[a-z0-9])?` pipelineDescription: Preliminary field: The format of this data may change at any time. A description of the user pipeline and stages through which - it is executed. Created by Cloud Dataflow service. Only retrieved with + it is executed. Created by Cloud Dataflow service. Only retrieved with JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL. projectId: The ID of the Cloud Platform project that the job belongs to. replaceJobId: If this job is an update of an existing job, this field is - the job ID of the job it replaced. When sending a `CreateJobRequest`, + the job ID of the job it replaced. When sending a `CreateJobRequest`, you can update a job by specifying it here. The job named here is stopped, and its intermediate state is transferred to this job. replacedByJobId: If another job is an update of this job (and thus, this job is in `JOB_STATE_UPDATED`), this field contains the ID of that job. - requestedState: The job's requested state. `UpdateJob` may be used to + requestedState: The job's requested state. `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and `JOB_STATE_RUNNING` states, - by setting requested_state. `UpdateJob` may also be used to directly - set a job's requested state to `JOB_STATE_CANCELLED` or - `JOB_STATE_DONE`, irrevocably terminating the job if it has not already - reached a terminal state. + by setting requested_state. `UpdateJob` may also be used to directly set + a job's requested state to `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, + irrevocably terminating the job if it has not already reached a terminal + state. + satisfiesPzs: Reserved for future use. This field is set only in responses + from the server; it is ignored if it is set in any requests. stageStates: This field may be mutated by the Cloud Dataflow service; callers cannot mutate it. startTime: The timestamp when the job was started (transitioned to @@ -2291,13 +2904,14 @@ class Job(_messages.Message): is updated when the job is started by the Cloud Dataflow service. For other jobs, start_time always equals to create_time and is immutable and set by the Cloud Dataflow service. - steps: Exactly one of step or steps_location should be specified. The - top-level steps that constitute the entire job. + steps: Exactly one of step or steps_location should be specified. The top- + level steps that constitute the entire job. Only retrieved with + JOB_VIEW_ALL. 
     stepsLocation: The GCS location where the steps are stored.
     tempFiles: A set of files the system should be aware of that are used
       for temporary storage. These temporary files will be removed on job
       completion. No duplicates are allowed. No file patterns are supported.
-      The supported files are: Google Cloud Storage:
+      The supported files are: Google Cloud Storage:
       storage.googleapis.com/{bucket}/{object}
       bucket.storage.googleapis.com/{object}
     transformNameMapping: The map of transform name prefixes of the job to be
@@ -2306,8 +2920,8 @@ class Job(_messages.Message):
   """
 
   class CurrentStateValueValuesEnum(_messages.Enum):
-    r"""The current state of the job. Jobs are created in the
-    `JOB_STATE_STOPPED` state unless otherwise specified. A job in the
+    r"""The current state of the job. Jobs are created in the
+    `JOB_STATE_STOPPED` state unless otherwise specified. A job in the
     `JOB_STATE_RUNNING` state may asynchronously enter a terminal state.
     After a job has reached a terminal state, no further state updates may be
     made. This field may be mutated by the Cloud Dataflow service; callers cannot
@@ -2320,12 +2934,12 @@ class CurrentStateValueValuesEnum(_messages.Enum):
       JOB_STATE_RUNNING: `JOB_STATE_RUNNING` indicates that the job is
         currently running.
       JOB_STATE_DONE: `JOB_STATE_DONE` indicates that the job has successfully
-        completed. This is a terminal job state. This state may be set by the
+        completed. This is a terminal job state. This state may be set by the
         Cloud Dataflow service, as a transition from `JOB_STATE_RUNNING`. It
         may also be set via a Cloud Dataflow `UpdateJob` call, if the job has
         not yet reached a terminal state.
       JOB_STATE_FAILED: `JOB_STATE_FAILED` indicates that the job has failed.
-        This is a terminal job state. This state may only be set by the Cloud
+        This is a terminal job state. This state may only be set by the Cloud
         Dataflow service, and only as a transition from `JOB_STATE_RUNNING`.
       JOB_STATE_CANCELLED: `JOB_STATE_CANCELLED` indicates that the job has
         been explicitly cancelled. This is a terminal job state. This state
@@ -2350,15 +2964,19 @@ class CurrentStateValueValuesEnum(_messages.Enum):
         Cloud Dataflow service, and only as a transition from
         `JOB_STATE_DRAINING`.
       JOB_STATE_PENDING: `JOB_STATE_PENDING` indicates that the job has been
-        created but is not yet running. Jobs that are pending may only
+        created but is not yet running. Jobs that are pending may only
        transition to `JOB_STATE_RUNNING`, or `JOB_STATE_FAILED`.
       JOB_STATE_CANCELLING: `JOB_STATE_CANCELLING` indicates that the job has
-        been explicitly cancelled and is in the process of stopping. Jobs
-        that are cancelling may only transition to `JOB_STATE_CANCELLED` or
+        been explicitly cancelled and is in the process of stopping. Jobs that
+        are cancelling may only transition to `JOB_STATE_CANCELLED` or
        `JOB_STATE_FAILED`.
       JOB_STATE_QUEUED: `JOB_STATE_QUEUED` indicates that the job has been
        created but is being delayed until launch. Jobs that are queued may
        only transition to `JOB_STATE_PENDING` or `JOB_STATE_CANCELLED`.
+      JOB_STATE_RESOURCE_CLEANING_UP: `JOB_STATE_RESOURCE_CLEANING_UP`
+        indicates that the batch job's associated resources are currently
+        being cleaned up after a successful run. Currently, this is an opt-in
+        feature, please reach out to Cloud support team if you are interested.
""" JOB_STATE_UNKNOWN = 0 JOB_STATE_STOPPED = 1 @@ -2372,11 +2990,12 @@ class CurrentStateValueValuesEnum(_messages.Enum): JOB_STATE_PENDING = 9 JOB_STATE_CANCELLING = 10 JOB_STATE_QUEUED = 11 + JOB_STATE_RESOURCE_CLEANING_UP = 12 class RequestedStateValueValuesEnum(_messages.Enum): - r"""The job's requested state. `UpdateJob` may be used to switch between + r"""The job's requested state. `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and `JOB_STATE_RUNNING` states, by setting - requested_state. `UpdateJob` may also be used to directly set a job's + requested_state. `UpdateJob` may also be used to directly set a job's requested state to `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the job if it has not already reached a terminal state. @@ -2387,12 +3006,12 @@ class RequestedStateValueValuesEnum(_messages.Enum): JOB_STATE_RUNNING: `JOB_STATE_RUNNING` indicates that the job is currently running. JOB_STATE_DONE: `JOB_STATE_DONE` indicates that the job has successfully - completed. This is a terminal job state. This state may be set by the + completed. This is a terminal job state. This state may be set by the Cloud Dataflow service, as a transition from `JOB_STATE_RUNNING`. It may also be set via a Cloud Dataflow `UpdateJob` call, if the job has not yet reached a terminal state. JOB_STATE_FAILED: `JOB_STATE_FAILED` indicates that the job has failed. - This is a terminal job state. This state may only be set by the Cloud + This is a terminal job state. This state may only be set by the Cloud Dataflow service, and only as a transition from `JOB_STATE_RUNNING`. JOB_STATE_CANCELLED: `JOB_STATE_CANCELLED` indicates that the job has been explicitly cancelled. This is a terminal job state. This state @@ -2417,15 +3036,19 @@ class RequestedStateValueValuesEnum(_messages.Enum): Cloud Dataflow service, and only as a transition from `JOB_STATE_DRAINING`. JOB_STATE_PENDING: `JOB_STATE_PENDING` indicates that the job has been - created but is not yet running. Jobs that are pending may only + created but is not yet running. Jobs that are pending may only transition to `JOB_STATE_RUNNING`, or `JOB_STATE_FAILED`. JOB_STATE_CANCELLING: `JOB_STATE_CANCELLING` indicates that the job has - been explicitly cancelled and is in the process of stopping. Jobs - that are cancelling may only transition to `JOB_STATE_CANCELLED` or + been explicitly cancelled and is in the process of stopping. Jobs that + are cancelling may only transition to `JOB_STATE_CANCELLED` or `JOB_STATE_FAILED`. JOB_STATE_QUEUED: `JOB_STATE_QUEUED` indicates that the job has been created but is being delayed until launch. Jobs that are queued may only transition to `JOB_STATE_PENDING` or `JOB_STATE_CANCELLED`. + JOB_STATE_RESOURCE_CLEANING_UP: `JOB_STATE_RESOURCE_CLEANING_UP` + indicates that the batch job's associated resources are currently + being cleaned up after a successful run. Currently, this is an opt-in + feature, please reach out to Cloud support team if you are intersted. """ JOB_STATE_UNKNOWN = 0 JOB_STATE_STOPPED = 1 @@ -2439,6 +3062,7 @@ class RequestedStateValueValuesEnum(_messages.Enum): JOB_STATE_PENDING = 9 JOB_STATE_CANCELLING = 10 JOB_STATE_QUEUED = 11 + JOB_STATE_RESOURCE_CLEANING_UP = 12 class TypeValueValuesEnum(_messages.Enum): r"""The type of Cloud Dataflow job. @@ -2456,9 +3080,9 @@ class TypeValueValuesEnum(_messages.Enum): @encoding.MapUnrecognizedFields('additionalProperties') class LabelsValue(_messages.Message): - r"""User-defined labels for this job. 
The labels map can contain no more - than 64 entries. Entries of the labels map are UTF8 strings that comply - with the following restrictions: * Keys must conform to regexp: + r"""User-defined labels for this job. The labels map can contain no more + than 64 entries. Entries of the labels map are UTF8 strings that comply + with the following restrictions: * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62} * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63} * Both keys and values are additionally constrained to be <= 128 bytes in size. @@ -2527,13 +3151,28 @@ class AdditionalProperty(_messages.Message): replaceJobId = _messages.StringField(15) replacedByJobId = _messages.StringField(16) requestedState = _messages.EnumField('RequestedStateValueValuesEnum', 17) - stageStates = _messages.MessageField('ExecutionStageState', 18, repeated=True) - startTime = _messages.StringField(19) - steps = _messages.MessageField('Step', 20, repeated=True) - stepsLocation = _messages.StringField(21) - tempFiles = _messages.StringField(22, repeated=True) - transformNameMapping = _messages.MessageField('TransformNameMappingValue', 23) - type = _messages.EnumField('TypeValueValuesEnum', 24) + satisfiesPzs = _messages.BooleanField(18) + stageStates = _messages.MessageField('ExecutionStageState', 19, repeated=True) + startTime = _messages.StringField(20) + steps = _messages.MessageField('Step', 21, repeated=True) + stepsLocation = _messages.StringField(22) + tempFiles = _messages.StringField(23, repeated=True) + transformNameMapping = _messages.MessageField('TransformNameMappingValue', 24) + type = _messages.EnumField('TypeValueValuesEnum', 25) + + +class JobExecutionDetails(_messages.Message): + r"""Information about the execution of a job. + + Fields: + nextPageToken: If present, this response does not contain all requested + tasks. To obtain the next page of results, repeat the request with + page_token set to this value. + stages: The stages of the job execution. + """ + + nextPageToken = _messages.StringField(1) + stages = _messages.MessageField('StageSummary', 2, repeated=True) class JobExecutionInfo(_messages.Message): @@ -2612,11 +3251,11 @@ class MessageImportanceValueValuesEnum(_messages.Enum): Typically, Dataflow pipeline runners do not display log messages at this level by default. JOB_MESSAGE_DETAILED: The message is at the 'detailed' level: somewhat - verbose, but potentially useful to users. Typically, Dataflow - pipeline runners do not display log messages at this level by default. - These messages are displayed by default in the Dataflow monitoring UI. + verbose, but potentially useful to users. Typically, Dataflow pipeline + runners do not display log messages at this level by default. These + messages are displayed by default in the Dataflow monitoring UI. JOB_MESSAGE_BASIC: The message is at the 'basic' level: useful for - keeping track of the execution of a Dataflow pipeline. Typically, + keeping track of the execution of a Dataflow pipeline. Typically, Dataflow pipeline runners display log messages at this level by default, and these messages are displayed by default in the Dataflow monitoring UI. @@ -2626,7 +3265,7 @@ class MessageImportanceValueValuesEnum(_messages.Enum): level by default, and these messages are displayed by default in the Dataflow monitoring UI. JOB_MESSAGE_ERROR: The message is at the 'error' level: indicating a - condition preventing a job from succeeding. Typically, Dataflow + condition preventing a job from succeeding. 
Typically, Dataflow pipeline runners display log messages at this level by default, and these messages are displayed by default in the Dataflow monitoring UI. """ @@ -2673,7 +3312,7 @@ class JobMetadata(_messages.Message): class JobMetrics(_messages.Message): r"""JobMetrics contains a collection of metrics describing the detailed progress of a Dataflow job. Metrics correspond to user-defined and system- - defined metrics in the job. This resource captures only the most recent + defined metrics in the job. This resource captures only the most recent values of each metric; time-series data can be queried for them (under the same metric names) from Cloud Monitoring. @@ -2735,16 +3374,58 @@ class LaunchFlexTemplateParameter(_messages.Message): r"""Launch FlexTemplate Parameter. Messages: + LaunchOptionsValue: Launch options for this flex template job. This is a + common set of options across languages and templates. This should not be + used to pass job parameters. ParametersValue: The parameters for FlexTemplate. Ex. {"num_workers":"5"} + TransformNameMappingsValue: Use this to pass transform_name_mappings for + streaming update jobs. Ex:{"oldTransformName":"newTransformName",...}' Fields: containerSpec: Spec about the container image to launch. containerSpecGcsPath: Gcs path to a file with json serialized ContainerSpec as content. - jobName: Required. The job name to use for the created job. + environment: The runtime environment for the FlexTemplate job + jobName: Required. The job name to use for the created job. For update job + request, job name should be same as the existing running job. + launchOptions: Launch options for this flex template job. This is a common + set of options across languages and templates. This should not be used + to pass job parameters. parameters: The parameters for FlexTemplate. Ex. {"num_workers":"5"} + transformNameMappings: Use this to pass transform_name_mappings for + streaming update jobs. Ex:{"oldTransformName":"newTransformName",...}' + update: Set this to true if you are sending a request to update a running + streaming job. When set, the job name should be the same as the running + job. """ + @encoding.MapUnrecognizedFields('additionalProperties') + class LaunchOptionsValue(_messages.Message): + r"""Launch options for this flex template job. This is a common set of + options across languages and templates. This should not be used to pass + job parameters. + + Messages: + AdditionalProperty: An additional property for a LaunchOptionsValue + object. + + Fields: + additionalProperties: Additional properties of type LaunchOptionsValue + """ + + class AdditionalProperty(_messages.Message): + r"""An additional property for a LaunchOptionsValue object. + + Fields: + key: Name of the additional property. + value: A string attribute. + """ + + key = _messages.StringField(1) + value = _messages.StringField(2) + + additionalProperties = _messages.MessageField('AdditionalProperty', 1, repeated=True) + @encoding.MapUnrecognizedFields('additionalProperties') class ParametersValue(_messages.Message): r"""The parameters for FlexTemplate. Ex. {"num_workers":"5"} @@ -2769,10 +3450,41 @@ class AdditionalProperty(_messages.Message): additionalProperties = _messages.MessageField('AdditionalProperty', 1, repeated=True) + @encoding.MapUnrecognizedFields('additionalProperties') + class TransformNameMappingsValue(_messages.Message): + r"""Use this to pass transform_name_mappings for streaming update jobs. 
+ Ex:{"oldTransformName":"newTransformName",...}' + + Messages: + AdditionalProperty: An additional property for a + TransformNameMappingsValue object. + + Fields: + additionalProperties: Additional properties of type + TransformNameMappingsValue + """ + + class AdditionalProperty(_messages.Message): + r"""An additional property for a TransformNameMappingsValue object. + + Fields: + key: Name of the additional property. + value: A string attribute. + """ + + key = _messages.StringField(1) + value = _messages.StringField(2) + + additionalProperties = _messages.MessageField('AdditionalProperty', 1, repeated=True) + containerSpec = _messages.MessageField('ContainerSpec', 1) containerSpecGcsPath = _messages.StringField(2) - jobName = _messages.StringField(3) - parameters = _messages.MessageField('ParametersValue', 4) + environment = _messages.MessageField('FlexTemplateRuntimeEnvironment', 3) + jobName = _messages.StringField(4) + launchOptions = _messages.MessageField('LaunchOptionsValue', 5) + parameters = _messages.MessageField('ParametersValue', 6) + transformNameMappings = _messages.MessageField('TransformNameMappingsValue', 7) + update = _messages.BooleanField(8) class LaunchFlexTemplateRequest(_messages.Message): @@ -3024,10 +3736,33 @@ class ListJobsResponse(_messages.Message): nextPageToken = _messages.StringField(3) +class ListSnapshotsResponse(_messages.Message): + r"""List of snapshots. + + Fields: + snapshots: Returned snapshots. + """ + + snapshots = _messages.MessageField('Snapshot', 1, repeated=True) + + +class ListTemplateVersionsResponse(_messages.Message): + r"""A response containing a list of TemplateVersions. + + Fields: + nextPageToken: A token that can be sent as `page_token` to retrieve the + next page. If this field is omitted, there are no subsequent pages. + templateVersions: A list of TemplateVersions. + """ + + nextPageToken = _messages.StringField(1) + templateVersions = _messages.MessageField('TemplateVersion', 2, repeated=True) + + class MapTask(_messages.Message): r"""MapTask consists of an ordered set of instructions, each of which describes one particular low-level operation for the worker to perform in - order to accomplish the MapTask's WorkItem. Each instruction must appear in + order to accomplish the MapTask's WorkItem. Each instruction must appear in the list before any instructions which depends on its output. Fields: @@ -3046,6 +3781,23 @@ class MapTask(_messages.Message): systemName = _messages.StringField(4) +class MemInfo(_messages.Message): + r"""Information about the memory usage of a worker or a container within a + worker. + + Fields: + currentLimitBytes: Instantaneous memory limit in bytes. + currentRssBytes: Instantaneous memory (RSS) size in bytes. + timestamp: Timestamp of the measurement. + totalGbMs: Total memory (RSS) usage since start up in GB * ms. + """ + + currentLimitBytes = _messages.IntegerField(1, variant=_messages.Variant.UINT64) + currentRssBytes = _messages.IntegerField(2, variant=_messages.Variant.UINT64) + timestamp = _messages.StringField(3) + totalGbMs = _messages.IntegerField(4, variant=_messages.Variant.UINT64) + + class MetricShortId(_messages.Message): r"""The metric short id is returned to the user alongside an offset into ReportWorkItemStatusRequest @@ -3067,18 +3819,16 @@ class MetricStructuredName(_messages.Message): Messages: ContextValue: Zero or more labeled fields which identify the part of the job this metric is associated with, such as the name of a step or - collection.
For example, built-in counters associated with steps will - have context['step'] = . Counters associated with - PCollections in the SDK will have context['pcollection'] = . + collection. For example, built-in counters associated with steps will + have context['step'] = . Counters associated with PCollections in the + SDK will have context['pcollection'] = . Fields: context: Zero or more labeled fields which identify the part of the job this metric is associated with, such as the name of a step or - collection. For example, built-in counters associated with steps will - have context['step'] = . Counters associated with - PCollections in the SDK will have context['pcollection'] = . + collection. For example, built-in counters associated with steps will + have context['step'] = . Counters associated with PCollections in the + SDK will have context['pcollection'] = . name: Worker-defined metric name. origin: Origin (namespace) of metric name. May be blank for user-define metrics; will be "dataflow" for metrics defined by the Dataflow service @@ -3088,10 +3838,10 @@ class MetricStructuredName(_messages.Message): @encoding.MapUnrecognizedFields('additionalProperties') class ContextValue(_messages.Message): r"""Zero or more labeled fields which identify the part of the job this - metric is associated with, such as the name of a step or collection. For + metric is associated with, such as the name of a step or collection. For example, built-in counters associated with steps will have context['step'] - = . Counters associated with PCollections in the SDK will have - context['pcollection'] = . + = . Counters associated with PCollections in the SDK will have + context['pcollection'] = . Messages: AdditionalProperty: An additional property for a ContextValue object. @@ -3133,9 +3883,9 @@ class MetricUpdate(_messages.Message): the newest value. internal: Worker-computed aggregate value for internal use by the Dataflow service. - kind: Metric aggregation kind. The possible metric aggregation kinds are + kind: Metric aggregation kind. The possible metric aggregation kinds are "Sum", "Max", "Min", "Mean", "Set", "And", "Or", and "Distribution". The - specified aggregation kind is case-insensitive. If omitted, this is not + specified aggregation kind is case-insensitive. If omitted, this is not an aggregated value but instead a single metric sample value. meanCount: Worker-computed aggregate value for the "Mean" aggregation kind. This holds the count of the aggregated values and is used in @@ -3147,11 +3897,11 @@ class MetricUpdate(_messages.Message): only possible value types are Long and Double. name: Name of the metric. scalar: Worker-computed aggregate value for aggregation kinds "Sum", - "Max", "Min", "And", and "Or". The possible value types are Long, + "Max", "Min", "And", and "Or". The possible value types are Long, Double, and Boolean. - set: Worker-computed aggregate value for the "Set" aggregation kind. The + set: Worker-computed aggregate value for the "Set" aggregation kind. The only possible value type is a list of Values whose type can be Long, - Double, or String, according to the metric's type. All Values in the + Double, or String, according to the metric's type. All Values in the list must be of the same type. updateTime: Timestamp associated with the metric value. 
Optional when workers are reporting work progress; it will be filled in responses from @@ -3171,6 +3921,102 @@ class MetricUpdate(_messages.Message): updateTime = _messages.StringField(11) +class ModifyTemplateVersionLabelRequest(_messages.Message): + r"""Either add the label to the TemplateVersion or remove it from the + TemplateVersion. + + Enums: + OpValueValuesEnum: Request to add a label to the TemplateVersion or + remove a label from it. + + Fields: + key: The label key for update. + op: Request to add a label to the TemplateVersion or remove a label from + it. + value: The label value for update. + """ + + class OpValueValuesEnum(_messages.Enum): + r"""Request to add a label to the TemplateVersion or remove a label from + it. + + Values: + OPERATION_UNSPECIFIED: Default value. + ADD: Add the label to the TemplateVersion object. + REMOVE: Remove the label from the TemplateVersion object. + """ + OPERATION_UNSPECIFIED = 0 + ADD = 1 + REMOVE = 2 + + key = _messages.StringField(1) + op = _messages.EnumField('OpValueValuesEnum', 2) + value = _messages.StringField(3) + + +class ModifyTemplateVersionLabelResponse(_messages.Message): + r"""A response containing the labels in the TemplateVersion. + + Messages: + LabelsValue: All the labels in the TemplateVersion. + + Fields: + labels: All the labels in the TemplateVersion. + """ + + @encoding.MapUnrecognizedFields('additionalProperties') + class LabelsValue(_messages.Message): + r"""All the labels in the TemplateVersion. + + Messages: + AdditionalProperty: An additional property for a LabelsValue object. + + Fields: + additionalProperties: Additional properties of type LabelsValue + """ + + class AdditionalProperty(_messages.Message): + r"""An additional property for a LabelsValue object. + + Fields: + key: Name of the additional property. + value: A string attribute. + """ + + key = _messages.StringField(1) + value = _messages.StringField(2) + + additionalProperties = _messages.MessageField('AdditionalProperty', 1, repeated=True) + + labels = _messages.MessageField('LabelsValue', 1) + + +class ModifyTemplateVersionTagRequest(_messages.Message): + r"""Add a tag to the current TemplateVersion. If the tag exists in another + TemplateVersion in the Template, remove it from that TemplateVersion before + adding it to the current one. If remove_only is set, remove the tag from the + current TemplateVersion. + + Fields: + removeOnly: The flag that indicates if the request is only to remove the + tag from the TemplateVersion. + tag: The tag for update. + """ + + removeOnly = _messages.BooleanField(1) + tag = _messages.StringField(2) + + +class ModifyTemplateVersionTagResponse(_messages.Message): + r"""A response containing the current tags in the TemplateVersion. + + Fields: + tags: All the tags in the TemplateVersion. + """ + + tags = _messages.StringField(1, repeated=True) + + class MountedDataDisk(_messages.Message): r"""Describes mounted data disk. @@ -3248,7 +4094,7 @@ class Package(_messages.Message): Fields: location: The resource to read the package from. The supported resource - type is: Google Cloud Storage: storage.googleapis.com/{bucket} + type is: Google Cloud Storage: storage.googleapis.com/{bucket} bucket.storage.googleapis.com/ name: The name of the package. """ @@ -3480,7 +4326,7 @@ class AdditionalProperty(_messages.Message): class PipelineDescription(_messages.Message): r"""A descriptive representation of submitted pipeline as well as the - executed form. This data is provided by the Dataflow service for ease of + executed form.
This data is provided by the Dataflow service for ease of visualizing the pipeline and interpreting Dataflow provided metrics. Fields: @@ -3496,8 +4342,20 @@ class PipelineDescription(_messages.Message): originalPipelineTransform = _messages.MessageField('TransformSummary', 3, repeated=True) +class Point(_messages.Message): + r"""A point in the timeseries. + + Fields: + time: The timestamp of the point. + value: The value of the point. + """ + + time = _messages.StringField(1) + value = _messages.FloatField(2) + + class Position(_messages.Message): - r"""Position defines a position within a collection of data. The value can + r"""Position defines a position within a collection of data. The value can be either the end position, a key (used with ordered collections), a byte offset, or a record index. @@ -3520,6 +4378,20 @@ class Position(_messages.Message): shufflePosition = _messages.StringField(6) +class ProgressTimeseries(_messages.Message): + r"""Information about the progress of some component of job execution. + + Fields: + currentProgress: The current progress of the component, in the range + [0,1]. + dataPoints: History of progress for the component. Points are sorted by + time. + """ + + currentProgress = _messages.FloatField(1) + dataPoints = _messages.MessageField('Point', 2, repeated=True) + + class PubSubIODetails(_messages.Message): r"""Metadata for a PubSub connector used by the job. @@ -3541,12 +4413,11 @@ class PubsubLocation(_messages.Message): idLabel: If set, contains a pubsub label from which to extract record ids. If left empty, record deduplication will be strictly best effort. subscription: A pubsub subscription, in the form of - "pubsub.googleapis.com/subscriptions//" + "pubsub.googleapis.com/subscriptions//" timestampLabel: If set, contains a pubsub label from which to extract record timestamps. If left empty, record timestamps will be generated upon arrival. - topic: A pubsub topic, in the form of "pubsub.googleapis.com/topics - //" + topic: A pubsub topic, in the form of "pubsub.googleapis.com/topics//" trackingSubscription: If set, specifies the pubsub subscription that will be used for tracking custom time timestamps for watermark estimation. withAttributes: If true, then the client has requested to get pubsub @@ -3562,6 +4433,45 @@ class PubsubLocation(_messages.Message): withAttributes = _messages.BooleanField(7) +class PubsubSnapshotMetadata(_messages.Message): + r"""Represents a Pubsub snapshot. + + Fields: + expireTime: The expire time of the Pubsub snapshot. + snapshotName: The name of the Pubsub snapshot. + topicName: The name of the Pubsub topic. + """ + + expireTime = _messages.StringField(1) + snapshotName = _messages.StringField(2) + topicName = _messages.StringField(3) + + +class QueryInfo(_messages.Message): + r"""Information about a validated query. + + Enums: + QueryPropertyValueListEntryValuesEnum: + + Fields: + queryProperty: Includes an entry for each satisfied QueryProperty. + """ + + class QueryPropertyValueListEntryValuesEnum(_messages.Enum): + r"""QueryPropertyValueListEntryValuesEnum enum type. + + Values: + QUERY_PROPERTY_UNSPECIFIED: The query property is unknown or + unspecified. + HAS_UNBOUNDED_SOURCE: Indicates this query reads from >= 1 unbounded + source. + """ + QUERY_PROPERTY_UNSPECIFIED = 0 + HAS_UNBOUNDED_SOURCE = 1 + + queryProperty = _messages.EnumField('QueryPropertyValueListEntryValuesEnum', 1, repeated=True) + + class ReadInstruction(_messages.Message): r"""An instruction that reads records. 
Takes no inputs, produces one output. @@ -3589,8 +4499,8 @@ class ReportWorkItemStatusRequest(_messages.Message): workItemStatuses: The order is unimportant, except that the order of the WorkItemServiceState messages in the ReportWorkItemStatusResponse corresponds to the order of WorkItemStatus messages here. - workerId: The ID of the worker reporting the WorkItem status. If this - does not match the ID of the worker which the Dataflow service believes + workerId: The ID of the worker reporting the WorkItem status. If this does + not match the ID of the worker which the Dataflow service believes currently has the lease on the WorkItem, the report will be dropped (with an error response). """ @@ -3695,14 +4605,45 @@ class ReportedParallelism(_messages.Message): class ResourceUtilizationReport(_messages.Message): r"""Worker metrics exported from workers. This contains resource utilization - metrics accumulated from a variety of sources. For more information, see go - /df-resource-signals. + metrics accumulated from a variety of sources. For more information, see + go/df-resource-signals. + + Messages: + ContainersValue: Per container information. Key: container name. Fields: + containers: Per container information. Key: container name. cpuTime: CPU utilization samples. + memoryInfo: Memory utilization samples. """ - cpuTime = _messages.MessageField('CPUTime', 1, repeated=True) + @encoding.MapUnrecognizedFields('additionalProperties') + class ContainersValue(_messages.Message): + r"""Per container information. Key: container name. + + Messages: + AdditionalProperty: An additional property for a ContainersValue object. + + Fields: + additionalProperties: Additional properties of type ContainersValue + """ + + class AdditionalProperty(_messages.Message): + r"""An additional property for a ContainersValue object. + + Fields: + key: Name of the additional property. + value: A ResourceUtilizationReport attribute. + """ + + key = _messages.StringField(1) + value = _messages.MessageField('ResourceUtilizationReport', 2) + + additionalProperties = _messages.MessageField('AdditionalProperty', 1, repeated=True) + + containers = _messages.MessageField('ContainersValue', 1) + cpuTime = _messages.MessageField('CPUTime', 2, repeated=True) + memoryInfo = _messages.MessageField('MemInfo', 3, repeated=True) class ResourceUtilizationReportResponse(_messages.Message): @@ -3721,32 +4662,38 @@ class RuntimeEnvironment(_messages.Message): AdditionalUserLabelsValue: Additional user labels to be specified for the job. Keys and values should follow the restrictions specified in the [labeling restrictions](https://cloud.google.com/compute/docs/labeling- - resources#restrictions) page. + resources#restrictions) page. An object containing a list of "key": + value pairs. Example: { "name": "wrench", "mass": "1kg", "count": "3" }. Fields: additionalExperiments: Additional experiment flags for the job. additionalUserLabels: Additional user labels to be specified for the job. Keys and values should follow the restrictions specified in the [labeling restrictions](https://cloud.google.com/compute/docs/labeling- - resources#restrictions) page. + resources#restrictions) page. An object containing a list of "key": + value pairs. Example: { "name": "wrench", "mass": "1kg", "count": "3" }. bypassTempDirValidation: Whether to bypass the safety checks for the job's temporary directory. Use with caution. + enableStreamingEngine: Whether to enable Streaming Engine for the job. ipConfiguration: Configuration for VM IPs. 
- kmsKeyName: Optional. Name for the Cloud KMS key for the job. Key format - is: projects//locations//keyRings//cryptoKey - s/ + kmsKeyName: Name for the Cloud KMS key for the job. Key format is: + projects//locations//keyRings//cryptoKeys/ machineType: The machine type to use for the job. Defaults to the value from the template if not specified. maxWorkers: The maximum number of Google Compute Engine instances to be made available to your pipeline during execution, from 1 to 1000. - network: Network to which VMs will be assigned. If empty or unspecified, + network: Network to which VMs will be assigned. If empty or unspecified, the service will use the network "default". numWorkers: The initial number of Google Compute Engine instnaces for the job. serviceAccountEmail: The email address of the service account to run the job as. - subnetwork: Subnetwork to which VMs will be assigned, if desired. - Expected to be of the form "regions/REGION/subnetworks/SUBNETWORK". + subnetwork: Subnetwork to which VMs will be assigned, if desired. You can + specify a subnetwork using either a complete URL or an abbreviated path. + Expected to be of the form "https://www.googleapis.com/compute/v1/projec + ts/HOST_PROJECT_ID/regions/REGION/subnetworks/SUBNETWORK" or + "regions/REGION/subnetworks/SUBNETWORK". If the subnetwork is located in + a Shared VPC network, you must use the complete URL. tempLocation: The Cloud Storage path to use for temporary files. Must be a valid Cloud Storage URL, beginning with `gs://`. workerRegion: The Compute Engine region @@ -3754,12 +4701,13 @@ class RuntimeEnvironment(_messages.Message): which worker processing should occur, e.g. "us-west1". Mutually exclusive with worker_zone. If neither worker_region nor worker_zone is specified, default to the control plane's region. - workerZone: The Compute Engine zone (https://cloud.google.com/compute/docs - /regions-zones/regions-zones) in which worker processing should occur, - e.g. "us-west1-a". Mutually exclusive with worker_region. If neither - worker_region nor worker_zone is specified, a zone in the control - plane's region is chosen based on available capacity. If both - `worker_zone` and `zone` are set, `worker_zone` takes precedence. + workerZone: The Compute Engine zone + (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in + which worker processing should occur, e.g. "us-west1-a". Mutually + exclusive with worker_region. If neither worker_region nor worker_zone + is specified, a zone in the control plane's region is chosen based on + available capacity. If both `worker_zone` and `zone` are set, + `worker_zone` takes precedence. zone: The Compute Engine [availability zone](https://cloud.google.com/compute/docs/regions-zones/regions-zones) for launching worker instances to run your pipeline. In the future, @@ -3783,7 +4731,8 @@ class AdditionalUserLabelsValue(_messages.Message): r"""Additional user labels to be specified for the job. Keys and values should follow the restrictions specified in the [labeling restrictions](https://cloud.google.com/compute/docs/labeling- - resources#restrictions) page. + resources#restrictions) page. An object containing a list of "key": value + pairs. Example: { "name": "wrench", "mass": "1kg", "count": "3" }. 
Messages: AdditionalProperty: An additional property for a @@ -3810,18 +4759,19 @@ class AdditionalProperty(_messages.Message): additionalExperiments = _messages.StringField(1, repeated=True) additionalUserLabels = _messages.MessageField('AdditionalUserLabelsValue', 2) bypassTempDirValidation = _messages.BooleanField(3) - ipConfiguration = _messages.EnumField('IpConfigurationValueValuesEnum', 4) - kmsKeyName = _messages.StringField(5) - machineType = _messages.StringField(6) - maxWorkers = _messages.IntegerField(7, variant=_messages.Variant.INT32) - network = _messages.StringField(8) - numWorkers = _messages.IntegerField(9, variant=_messages.Variant.INT32) - serviceAccountEmail = _messages.StringField(10) - subnetwork = _messages.StringField(11) - tempLocation = _messages.StringField(12) - workerRegion = _messages.StringField(13) - workerZone = _messages.StringField(14) - zone = _messages.StringField(15) + enableStreamingEngine = _messages.BooleanField(4) + ipConfiguration = _messages.EnumField('IpConfigurationValueValuesEnum', 5) + kmsKeyName = _messages.StringField(6) + machineType = _messages.StringField(7) + maxWorkers = _messages.IntegerField(8, variant=_messages.Variant.INT32) + network = _messages.StringField(9) + numWorkers = _messages.IntegerField(10, variant=_messages.Variant.INT32) + serviceAccountEmail = _messages.StringField(11) + subnetwork = _messages.StringField(12) + tempLocation = _messages.StringField(13) + workerRegion = _messages.StringField(14) + workerZone = _messages.StringField(15) + zone = _messages.StringField(16) class RuntimeMetadata(_messages.Message): @@ -3869,6 +4819,8 @@ class SdkHarnessContainerImage(_messages.Message): Fields: containerImage: A docker container image that resides in Google Container Registry. + environmentId: Environment ID for the Beam runner API proto Environment + that corresponds to the current SDK Harness. useSingleCorePerContainer: If true, recommends the Dataflow service to use only one core per SDK container instance with this image. If false (or unset) recommends using more than one core per SDK container instance @@ -3877,7 +4829,8 @@ class SdkHarnessContainerImage(_messages.Message): """ containerImage = _messages.StringField(1) - useSingleCorePerContainer = _messages.BooleanField(2) + environmentId = _messages.StringField(2) + useSingleCorePerContainer = _messages.BooleanField(3) class SdkVersion(_messages.Message): @@ -3936,8 +4889,7 @@ class SendDebugCaptureRequest(_messages.Message): class SendDebugCaptureResponse(_messages.Message): - r"""Response to a send capture request. -nothing""" + r"""Response to a send capture request. nothing""" class SendWorkerMessagesRequest(_messages.Message): @@ -4146,6 +5098,72 @@ class AdditionalProperty(_messages.Message): spec = _messages.MessageField('SpecValue', 2) +class Snapshot(_messages.Message): + r"""Represents a snapshot of a job. + + Enums: + StateValueValuesEnum: State of the snapshot. + + Fields: + creationTime: The time this snapshot was created. + description: User-specified description of the snapshot. May be empty. + diskSizeBytes: The disk byte size of the snapshot. Only available for + snapshots in READY state. + id: The unique ID of this snapshot. + projectId: The project this snapshot belongs to. + pubsubMetadata: PubSub snapshot metadata. + sourceJobId: The job this snapshot was created from. + state: State of the snapshot. + ttl: The time after which this snapshot will be automatically deleted.
+ """ + + class StateValueValuesEnum(_messages.Enum): + r"""State of the snapshot. + + Values: + UNKNOWN_SNAPSHOT_STATE: Unknown state. + PENDING: Snapshot intent to create has been persisted, snapshotting of + state has not yet started. + RUNNING: Snapshotting is being performed. + READY: Snapshot has been created and is ready to be used. + FAILED: Snapshot failed to be created. + DELETED: Snapshot has been deleted. + """ + UNKNOWN_SNAPSHOT_STATE = 0 + PENDING = 1 + RUNNING = 2 + READY = 3 + FAILED = 4 + DELETED = 5 + + creationTime = _messages.StringField(1) + description = _messages.StringField(2) + diskSizeBytes = _messages.IntegerField(3) + id = _messages.StringField(4) + projectId = _messages.StringField(5) + pubsubMetadata = _messages.MessageField('PubsubSnapshotMetadata', 6, repeated=True) + sourceJobId = _messages.StringField(7) + state = _messages.EnumField('StateValueValuesEnum', 8) + ttl = _messages.StringField(9) + + +class SnapshotJobRequest(_messages.Message): + r"""Request to create a snapshot of a job. + + Fields: + description: User specified description of the snapshot. Maybe empty. + location: The location that contains this job. + snapshotSources: If true, perform snapshots for sources which support + this. + ttl: TTL for the snapshot. + """ + + description = _messages.StringField(1) + location = _messages.StringField(2) + snapshotSources = _messages.BooleanField(3) + ttl = _messages.StringField(4) + + class Source(_messages.Message): r"""A source that records can be read and decoded from. @@ -4165,19 +5183,19 @@ class Source(_messages.Message): codec: The codec to use to decode data read from the source. doesNotNeedSplitting: Setting this value to true hints to the framework that the source doesn't need splitting, and using SourceSplitRequest on - it would yield SOURCE_SPLIT_OUTCOME_USE_CURRENT. E.g. a file splitter + it would yield SOURCE_SPLIT_OUTCOME_USE_CURRENT. E.g. a file splitter may set this to true when splitting a single file into a set of byte ranges of appropriate size, and set this to false when splitting a filepattern into individual files. However, for efficiency, a file splitter may decide to produce file subranges directly from the - filepattern to avoid a splitting round-trip. See SourceSplitRequest for - an overview of the splitting process. This field is meaningful only in + filepattern to avoid a splitting round-trip. See SourceSplitRequest for + an overview of the splitting process. This field is meaningful only in the Source objects populated by the user (e.g. when filling in a DerivedSource). Source objects supplied by the framework to the user don't have this field populated. metadata: Optionally, metadata for this source can be supplied right away, avoiding a SourceGetMetadataOperation roundtrip (see - SourceOperationRequest). This field is meaningful only in the Source + SourceOperationRequest). This field is meaningful only in the Source objects populated by the user (e.g. when filling in a DerivedSource). Source objects supplied by the framework to the user don't have this field populated. @@ -4306,7 +5324,7 @@ class SourceMetadata(_messages.Message): Fields: estimatedSizeBytes: An estimate of the total size (in bytes) of the data - that would be read from this source. This estimate is in terms of + that would be read from this source. This estimate is in terms of external storage size, before any decompression or other processing done by the reader. 
infinite: Specifies that the size of this source is known to be infinite @@ -4374,11 +5392,11 @@ class SourceSplitOptions(_messages.Message): class SourceSplitRequest(_messages.Message): r"""Represents the operation to split a high-level Source specification into - bundles (parts for parallel processing). At a high level, splitting of a + bundles (parts for parallel processing). At a high level, splitting of a source into bundles happens as follows: SourceSplitRequest is applied to the source. If it returns SOURCE_SPLIT_OUTCOME_USE_CURRENT, no further splitting happens and the source is used "as is". Otherwise, splitting is applied - recursively to each produced DerivedSource. As an optimization, for any + recursively to each produced DerivedSource. As an optimization, for any Source, if its does_not_need_splitting is true, the framework assumes that splitting this source would return SOURCE_SPLIT_OUTCOME_USE_CURRENT, and doesn't initiate a SourceSplitRequest. This applies both to the initial @@ -4499,6 +5517,20 @@ class SplitInt64(_messages.Message): lowBits = _messages.IntegerField(2, variant=_messages.Variant.UINT32) +class StageExecutionDetails(_messages.Message): + r"""Information about the workers and work items within a stage. + + Fields: + nextPageToken: If present, this response does not contain all requested + tasks. To obtain the next page of results, repeat the request with + page_token set to this value. + workers: Workers that have done work on the stage. + """ + + nextPageToken = _messages.StringField(1) + workers = _messages.MessageField('WorkerDetails', 2, repeated=True) + + class StageSource(_messages.Message): r"""Description of an input or output of an execution stage. @@ -4517,6 +5549,49 @@ class StageSource(_messages.Message): userName = _messages.StringField(4) +class StageSummary(_messages.Message): + r"""Information about a particular execution stage of a job. + + Enums: + StateValueValuesEnum: State of this stage. + + Fields: + endTime: End time of this stage. If the work item is completed, this is + the actual end time of the stage. Otherwise, it is the predicted end + time. + metrics: Metrics for this stage. + progress: Progress for this stage. Only applicable to Batch jobs. + stageId: ID of this stage + startTime: Start time of this stage. + state: State of this stage. + """ + + class StateValueValuesEnum(_messages.Enum): + r"""State of this stage. + + Values: + EXECUTION_STATE_UNKNOWN: The component state is unknown or unspecified. + EXECUTION_STATE_NOT_STARTED: The component is not yet running. + EXECUTION_STATE_RUNNING: The component is currently running. + EXECUTION_STATE_SUCCEEDED: The component succeeded. + EXECUTION_STATE_FAILED: The component failed. + EXECUTION_STATE_CANCELLED: Execution of the component was cancelled. + """ + EXECUTION_STATE_UNKNOWN = 0 + EXECUTION_STATE_NOT_STARTED = 1 + EXECUTION_STATE_RUNNING = 2 + EXECUTION_STATE_SUCCEEDED = 3 + EXECUTION_STATE_FAILED = 4 + EXECUTION_STATE_CANCELLED = 5 + + endTime = _messages.StringField(1) + metrics = _messages.MessageField('MetricUpdate', 2, repeated=True) + progress = _messages.MessageField('ProgressTimeseries', 3) + stageId = _messages.StringField(4) + startTime = _messages.StringField(5) + state = _messages.EnumField('StateValueValuesEnum', 6) + + class StandardQueryParameters(_messages.Message): r"""Query parameters accepted by all methods. 
@@ -4568,7 +5643,7 @@ class FXgafvValueValuesEnum(_messages.Enum): f__xgafv = _messages.EnumField('FXgafvValueValuesEnum', 1) access_token = _messages.StringField(2) - alt = _messages.EnumField('AltValueValuesEnum', 3, default=u'json') + alt = _messages.EnumField('AltValueValuesEnum', 3, default='json') callback = _messages.StringField(4) fields = _messages.StringField(5) key = _messages.StringField(6) @@ -4596,7 +5671,7 @@ class Status(_messages.Message): r"""The `Status` type defines a logical error model that is suitable for different programming environments, including REST APIs and RPC APIs. It is used by [gRPC](https://github.com/grpc). Each `Status` message contains - three pieces of data: error code, error message, and error details. You can + three pieces of data: error code, error message, and error details. You can find out more about this error model and how to work with it in the [API Design Guide](https://cloud.google.com/apis/design/errors). @@ -4605,7 +5680,7 @@ class Status(_messages.Message): Fields: code: The status code, which should be an enum value of google.rpc.Code. - details: A list of messages that carry the error details. There is a + details: A list of messages that carry the error details. There is a common set of message types for APIs to use. message: A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the @@ -4644,23 +5719,23 @@ class AdditionalProperty(_messages.Message): class Step(_messages.Message): - r"""Defines a particular step within a Cloud Dataflow job. A job consists - of multiple steps, each of which performs some specific operation as part of - the overall job. Data is typically passed from one step to another as part - of the job. Here's an example of a sequence of steps which together - implement a Map-Reduce job: * Read a collection of data from some source, - parsing the collection's elements. * Validate the elements. * - Apply a user-defined function to map each element to some value and - extract an element-specific key value. * Group elements with the same key - into a single element with that key, transforming a multiply-keyed - collection into a uniquely-keyed collection. * Write the elements out - to some data sink. Note that the Cloud Dataflow service may be used to run - many different types of jobs, not just Map-Reduce. + r"""Defines a particular step within a Cloud Dataflow job. A job consists of + multiple steps, each of which performs some specific operation as part of + the overall job. Data is typically passed from one step to another as part + of the job. Here's an example of a sequence of steps which together + implement a Map-Reduce job: * Read a collection of data from some source, + parsing the collection's elements. * Validate the elements. * Apply a user- + defined function to map each element to some value and extract an element- + specific key value. * Group elements with the same key into a single element + with that key, transforming a multiply-keyed collection into a uniquely- + keyed collection. * Write the elements out to some data sink. Note that the + Cloud Dataflow service may be used to run many different types of jobs, not + just Map-Reduce. Messages: PropertiesValue: Named properties associated with the step. Each kind of predefined step has its own required set of properties. Must be provided - on Create. Only retrieved with JOB_VIEW_ALL. + on Create. Only retrieved with JOB_VIEW_ALL. 
Fields: kind: The kind of step in the Cloud Dataflow job. @@ -4668,7 +5743,7 @@ class Step(_messages.Message): with respect to all other steps in the Cloud Dataflow job. properties: Named properties associated with the step. Each kind of predefined step has its own required set of properties. Must be provided - on Create. Only retrieved with JOB_VIEW_ALL. + on Create. Only retrieved with JOB_VIEW_ALL. """ @encoding.MapUnrecognizedFields('additionalProperties') @@ -4955,7 +6030,7 @@ class StructuredMessage(_messages.Message): programmatic consumption. Fields: - messageKey: Identifier for this message type. Used by external systems to + messageKey: Identifier for this message type. Used by external systems to internationalize or personalize message. messageText: Human-readable version of message. parameters: The structured data associated with this message. @@ -4973,10 +6048,10 @@ class TaskRunnerSettings(_messages.Message): alsologtostderr: Whether to also send taskrunner log info to stderr. baseTaskDir: The location on the worker for task-specific subdirectories. baseUrl: The base URL for the taskrunner to use when accessing Google - Cloud APIs. When workers access Google Cloud APIs, they logically do so - via relative URLs. If this field is specified, it supplies the base URL - to use for resolving these relative URLs. The normative algorithm used - is defined by RFC 1808, "Relative Uniform Resource Locators". If not + Cloud APIs. When workers access Google Cloud APIs, they logically do so + via relative URLs. If this field is specified, it supplies the base URL + to use for resolving these relative URLs. The normative algorithm used + is defined by RFC 1808, "Relative Uniform Resource Locators". If not specified, the default value is "http://www.googleapis.com/" commandlinesFileName: The file to store preprocessing commands in. continueOnException: Whether to continue taskrunner if an exception is @@ -4987,9 +6062,9 @@ class TaskRunnerSettings(_messages.Message): logDir: The directory on the VM to store logs. logToSerialconsole: Whether to send taskrunner log info to Google Compute Engine VM serial console. - logUploadLocation: Indicates where to put logs. If this is not specified, - the logs will not be uploaded. The supported resource type is: Google - Cloud Storage: storage.googleapis.com/{bucket}/{object} + logUploadLocation: Indicates where to put logs. If this is not specified, + the logs will not be uploaded. The supported resource type is: Google + Cloud Storage: storage.googleapis.com/{bucket}/{object} bucket.storage.googleapis.com/{object} oauthScopes: The OAuth2 scopes to be requested by the taskrunner in order to access the Cloud Dataflow API. @@ -5001,8 +6076,8 @@ class TaskRunnerSettings(_messages.Message): taskUser: The UNIX user ID on the worker VM to use for tasks launched by taskrunner; e.g. "root". tempStoragePrefix: The prefix of the resources the taskrunner should use - for temporary storage. The supported resource type is: Google Cloud - Storage: storage.googleapis.com/{bucket}/{object} + for temporary storage. The supported resource type is: Google Cloud + Storage: storage.googleapis.com/{bucket}/{object} bucket.storage.googleapis.com/{object} vmId: The ID string of the VM. workflowFileName: The file to store the workflow in. 
@@ -5043,6 +6118,87 @@ class TemplateMetadata(_messages.Message): parameters = _messages.MessageField('ParameterMetadata', 3, repeated=True) +class TemplateVersion(_messages.Message): + r"""A TemplateVersion. The Template Catalog is used to organize user + TemplateVersions. TemplateVersions that have the same project_id and + display_name belong to the same Template. Templates with the same + project_id belong to the same Project. A TemplateVersion may have labels, + and multiple labels are allowed; duplicated labels in the same + `TemplateVersion` are not allowed. A TemplateVersion may have tags, and + multiple tags are allowed; duplicated tags in the same `Template` are not + allowed. + + Enums: + TypeValueValuesEnum: Either LEGACY or FLEX. This should match the type of + artifact. + + Messages: + LabelsValue: Labels for the Template Version. Labels can be duplicated + within the Template. + + Fields: + artifact: Job graph and metadata if it is a legacy Template. Container + image path and metadata if it is a flex Template. + createTime: Creation time of this TemplateVersion. + description: Template description from the user. + displayName: A customized name for the Template. Multiple TemplateVersions + per Template. + labels: Labels for the Template Version. Labels can be duplicated within + the Template. + projectId: A unique project_id. Multiple Templates per Project. + tags: Alias for version_id, helps locate a TemplateVersion. + type: Either LEGACY or FLEX. This should match the type of artifact. + versionId: An auto-generated version_id for the TemplateVersion. + """ + + class TypeValueValuesEnum(_messages.Enum): + r"""Either LEGACY or FLEX. This should match the type of artifact. + + Values: + TEMPLATE_TYPE_UNSPECIFIED: Default value. Not a useful zero case. + LEGACY: Legacy Template. + FLEX: Flex Template. + """ + TEMPLATE_TYPE_UNSPECIFIED = 0 + LEGACY = 1 + FLEX = 2 + + @encoding.MapUnrecognizedFields('additionalProperties') + class LabelsValue(_messages.Message): + r"""Labels for the Template Version. Labels can be duplicated within the + Template. + + Messages: + AdditionalProperty: An additional property for a LabelsValue object. + + Fields: + additionalProperties: Additional properties of type LabelsValue + """ + + class AdditionalProperty(_messages.Message): + r"""An additional property for a LabelsValue object. + + Fields: + key: Name of the additional property. + value: A string attribute. + """ + + key = _messages.StringField(1) + value = _messages.StringField(2) + + additionalProperties = _messages.MessageField('AdditionalProperty', 1, repeated=True) + + artifact = _messages.MessageField('Artifact', 1) + createTime = _messages.StringField(2) + description = _messages.StringField(3) + displayName = _messages.StringField(4) + labels = _messages.MessageField('LabelsValue', 5) + projectId = _messages.StringField(6) + tags = _messages.StringField(7, repeated=True) + type = _messages.EnumField('TypeValueValuesEnum', 8) + versionId = _messages.StringField(9) + + class TopologyConfig(_messages.Message): r"""Global topology of the streaming Dataflow job, including all computations and their sharded locations. @@ -5108,7 +6264,7 @@ class TransformSummary(_messages.Message): transform. kind: Type of transform. name: User provided name for this transform instance.
- outputCollectionName: User names for all collection outputs to this + outputCollectionName: User names for all collection outputs to this transform. """ @@ -5150,9 +6306,12 @@ class ValidateResponse(_messages.Message): Fields: errorMessage: Will be empty if validation succeeds. + queryInfo: Information about the validated query. Not defined if + validation fails. """ errorMessage = _messages.StringField(1) + queryInfo = _messages.MessageField('QueryInfo', 2) class WorkItem(_messages.Message): @@ -5200,6 +6359,51 @@ class WorkItem(_messages.Message): streamingSetupTask = _messages.MessageField('StreamingSetupTask', 15) +class WorkItemDetails(_messages.Message): + r"""Information about an individual work item execution. + + Enums: + StateValueValuesEnum: State of this work item. + + Fields: + attemptId: Attempt ID of this work item + endTime: End time of this work item attempt. If the work item is + completed, this is the actual end time of the work item. Otherwise, it + is the predicted end time. + metrics: Metrics for this work item. + progress: Progress of this work item. + startTime: Start time of this work item attempt. + state: State of this work item. + taskId: Name of this work item. + """ + + class StateValueValuesEnum(_messages.Enum): + r"""State of this work item. + + Values: + EXECUTION_STATE_UNKNOWN: The component state is unknown or unspecified. + EXECUTION_STATE_NOT_STARTED: The component is not yet running. + EXECUTION_STATE_RUNNING: The component is currently running. + EXECUTION_STATE_SUCCEEDED: The component succeeded. + EXECUTION_STATE_FAILED: The component failed. + EXECUTION_STATE_CANCELLED: Execution of the component was cancelled. + """ + EXECUTION_STATE_UNKNOWN = 0 + EXECUTION_STATE_NOT_STARTED = 1 + EXECUTION_STATE_RUNNING = 2 + EXECUTION_STATE_SUCCEEDED = 3 + EXECUTION_STATE_FAILED = 4 + EXECUTION_STATE_CANCELLED = 5 + + attemptId = _messages.StringField(1) + endTime = _messages.StringField(2) + metrics = _messages.MessageField('MetricUpdate', 3, repeated=True) + progress = _messages.MessageField('ProgressTimeseries', 4) + startTime = _messages.StringField(5) + state = _messages.EnumField('StateValueValuesEnum', 6) + taskId = _messages.StringField(7) + + class WorkItemServiceState(_messages.Message): r"""The Dataflow service's idea of the current state of a WorkItem being processed by a worker. @@ -5209,6 +6413,10 @@ class WorkItemServiceState(_messages.Message): particular worker harness. Fields: + completeWorkStatus: If set, a request to complete the work item with the + given status. This will not be set to OK, unless supported by the + specific kind of WorkItem. It can be used for the backend to indicate a + WorkItem must terminate, e.g., for aborting work. harnessData: Other data returned by the service, specific to the particular worker harness. 
hotKeyDetection: A hot key is a symptom of poor data distribution in which @@ -5257,15 +6465,16 @@ class AdditionalProperty(_messages.Message): additionalProperties = _messages.MessageField('AdditionalProperty', 1, repeated=True) - harnessData = _messages.MessageField('HarnessDataValue', 1) - hotKeyDetection = _messages.MessageField('HotKeyDetection', 2) - leaseExpireTime = _messages.StringField(3) - metricShortId = _messages.MessageField('MetricShortId', 4, repeated=True) - nextReportIndex = _messages.IntegerField(5) - reportStatusInterval = _messages.StringField(6) - splitRequest = _messages.MessageField('ApproximateSplitRequest', 7) - suggestedStopPoint = _messages.MessageField('ApproximateProgress', 8) - suggestedStopPosition = _messages.MessageField('Position', 9) + completeWorkStatus = _messages.MessageField('Status', 1) + harnessData = _messages.MessageField('HarnessDataValue', 2) + hotKeyDetection = _messages.MessageField('HotKeyDetection', 3) + leaseExpireTime = _messages.StringField(4) + metricShortId = _messages.MessageField('MetricShortId', 5, repeated=True) + nextReportIndex = _messages.IntegerField(6) + reportStatusInterval = _messages.StringField(7) + splitRequest = _messages.MessageField('ApproximateSplitRequest', 8) + suggestedStopPoint = _messages.MessageField('ApproximateProgress', 9) + suggestedStopPosition = _messages.MessageField('Position', 10) class WorkItemStatus(_messages.Message): @@ -5276,19 +6485,19 @@ class WorkItemStatus(_messages.Message): unsuccessfully). counterUpdates: Worker output counters for this WorkItem. dynamicSourceSplit: See documentation of stop_position. - errors: Specifies errors which occurred during processing. If errors are + errors: Specifies errors which occurred during processing. If errors are provided, and completed = true, then the WorkItem is considered to have failed. metricUpdates: DEPRECATED in favor of counter_updates. progress: DEPRECATED in favor of reported_progress. - reportIndex: The report index. When a WorkItem is leased, the lease will - contain an initial report index. When a WorkItem's status is reported - to the system, the report should be sent with that report index, and the + reportIndex: The report index. When a WorkItem is leased, the lease will + contain an initial report index. When a WorkItem's status is reported to + the system, the report should be sent with that report index, and the response will contain the index the worker should use for the next - report. Reports received with unexpected index values will be rejected - by the service. In order to preserve idempotency, the worker should not + report. Reports received with unexpected index values will be rejected + by the service. In order to preserve idempotency, the worker should not alter the contents of a report, even if the worker must submit the same - report multiple times before getting back a response. The worker should + report multiple times before getting back a response. The worker should not submit a subsequent report until the response for the previous report had been received from the service. reportedProgress: The worker's progress through this WorkItem. @@ -5307,14 +6516,14 @@ class WorkItemStatus(_messages.Message): way in which the original task is decomposed into the two parts is specified either as a position demarcating them (stop_position), or explicitly as two DerivedSources, if this task consumes a user-defined - source type (dynamic_source_split). The "current" task is adjusted as a + source type (dynamic_source_split). 
The "current" task is adjusted as a result of the split: after a task with range [A, B) sends a stop_position update at C, its range is considered to be [A, C), e.g.: * - Progress should be interpreted relative to the new range, e.g. "75% + Progress should be interpreted relative to the new range, e.g. "75% completed" means "75% of [A, C) completed" * The worker should interpret - proposed_stop_position relative to the new range, e.g. "split at 68%" - should be interpreted as "split at 68% of [A, C)". * If the worker - chooses to split again using stop_position, only stop_positions in [A, + proposed_stop_position relative to the new range, e.g. "split at 68%" + should be interpreted as "split at 68% of [A, C)". * If the worker + chooses to split again using stop_position, only stop_positions in [A, C) will be accepted. * Etc. dynamic_source_split has similar semantics: e.g., if a task with source S splits using dynamic_source_split into {P, R} (where P and R must be together equivalent to S), then subsequent @@ -5342,6 +6551,18 @@ class WorkItemStatus(_messages.Message): workItemId = _messages.StringField(14) +class WorkerDetails(_messages.Message): + r"""Information about a worker + + Fields: + workItems: Work items processed by this worker, sorted by time. + workerName: Name of this worker + """ + + workItems = _messages.MessageField('WorkItemDetails', 1, repeated=True) + workerName = _messages.StringField(2) + + class WorkerHealthReport(_messages.Message): r"""WorkerHealthReport contains information about the health of a worker. The VM should be identified by the labels attached to the WorkerMessage that @@ -5352,10 +6573,10 @@ class WorkerHealthReport(_messages.Message): Fields: msg: A message describing any unusual health reports. - pods: The pods running on the worker. See: http://kubernetes.io/v1.1/docs - /api-reference/v1/definitions.html#_v1_pod This field is used by the - worker to send the status of the indvidual containers running on each - worker. + pods: The pods running on the worker. See: + http://kubernetes.io/v1.1/docs/api-reference/v1/definitions.html#_v1_pod + This field is used by the worker to send the status of the indvidual + containers running on each worker. reportInterval: The interval at which the worker is sending health reports. The default value of 0 should be interpreted as the field is not being explicitly set by the worker. @@ -5405,7 +6626,7 @@ class WorkerHealthReportResponse(_messages.Message): Fields: reportInterval: A positive value indicates the worker should change its - reporting interval to the specified value. The default value of zero + reporting interval to the specified value. The default value of zero means no change in report rate is requested by the server. """ @@ -5495,8 +6716,8 @@ class WorkerMessage(_messages.Message): Messages: LabelsValue: Labels are used to group WorkerMessages. For example, a worker_message about a particular container might have the labels: { - "JOB_ID": "2015-04-22", "WORKER_ID": "wordcount-vm-2015..." - "CONTAINER_TYPE": "worker", "CONTAINER_ID": "ac1234def"} Label tags + "JOB_ID": "2015-04-22", "WORKER_ID": "wordcount-vm-2015..." + "CONTAINER_TYPE": "worker", "CONTAINER_ID": "ac1234def"} Label tags typically correspond to Label enum values. However, for ease of development other strings can be used as tags. LABEL_UNSPECIFIED should not be used here. @@ -5504,8 +6725,8 @@ class WorkerMessage(_messages.Message): Fields: labels: Labels are used to group WorkerMessages. 
For example, a worker_message about a particular container might have the labels: { - "JOB_ID": "2015-04-22", "WORKER_ID": "wordcount-vm-2015..." - "CONTAINER_TYPE": "worker", "CONTAINER_ID": "ac1234def"} Label tags + "JOB_ID": "2015-04-22", "WORKER_ID": "wordcount-vm-2015..." + "CONTAINER_TYPE": "worker", "CONTAINER_ID": "ac1234def"} Label tags typically correspond to Label enum values. However, for ease of development other strings can be used as tags. LABEL_UNSPECIFIED should not be used here. @@ -5521,10 +6742,10 @@ class WorkerMessage(_messages.Message): class LabelsValue(_messages.Message): r"""Labels are used to group WorkerMessages. For example, a worker_message about a particular container might have the labels: { "JOB_ID": - "2015-04-22", "WORKER_ID": "wordcount-vm-2015..." "CONTAINER_TYPE": - "worker", "CONTAINER_ID": "ac1234def"} Label tags typically correspond - to Label enum values. However, for ease of development other strings can - be used as tags. LABEL_UNSPECIFIED should not be used here. + "2015-04-22", "WORKER_ID": "wordcount-vm-2015..." "CONTAINER_TYPE": + "worker", "CONTAINER_ID": "ac1234def"} Label tags typically correspond to + Label enum values. However, for ease of development other strings can be + used as tags. LABEL_UNSPECIFIED should not be used here. Messages: AdditionalProperty: An additional property for a LabelsValue object. @@ -5559,62 +6780,62 @@ class WorkerMessageCode(_messages.Message): r"""A message code is used to report status and error messages to the service. The message codes are intended to be machine readable. The service will take care of translating these into user understandable messages if - necessary. Example use cases: 1. Worker processes reporting successful - startup. 2. Worker processes reporting specific errors (e.g. package - staging failure). + necessary. Example use cases: 1. Worker processes reporting successful + startup. 2. Worker processes reporting specific errors (e.g. package staging + failure). Messages: ParametersValue: Parameters contains specific information about the code. - This is a struct to allow parameters of different types. Examples: 1. - For a "HARNESS_STARTED" message parameters might provide the name of - the worker and additional data like timing information. 2. For a - "GCS_DOWNLOAD_ERROR" parameters might contain fields listing the GCS - objects being downloaded and fields containing errors. In general + This is a struct to allow parameters of different types. Examples: 1. + For a "HARNESS_STARTED" message parameters might provide the name of the + worker and additional data like timing information. 2. For a + "GCS_DOWNLOAD_ERROR" parameters might contain fields listing the GCS + objects being downloaded and fields containing errors. In general complex data structures should be avoided. If a worker needs to send a specific and complicated data structure then please consider defining a new proto and adding it to the data oneof in WorkerMessageResponse. - Conventions: Parameters should only be used for information that isn't - typically passed as a label. hostname and other worker identifiers - should almost always be passed as labels since they will be included on + Conventions: Parameters should only be used for information that isn't + typically passed as a label. hostname and other worker identifiers + should almost always be passed as labels since they will be included on most messages. 
Fields: code: The code is a string intended for consumption by a machine that - identifies the type of message being sent. Examples: 1. + identifies the type of message being sent. Examples: 1. "HARNESS_STARTED" might be used to indicate the worker harness has - started. 2. "GCS_DOWNLOAD_ERROR" might be used to indicate an error - downloading a GCS file as part of the boot process of one of the - worker containers. This is a string and not an enum to make it easy to - add new codes without waiting for an API change. - parameters: Parameters contains specific information about the code. This - is a struct to allow parameters of different types. Examples: 1. For a - "HARNESS_STARTED" message parameters might provide the name of the - worker and additional data like timing information. 2. For a - "GCS_DOWNLOAD_ERROR" parameters might contain fields listing the GCS - objects being downloaded and fields containing errors. In general + started. 2. "GCS_DOWNLOAD_ERROR" might be used to indicate an error + downloading a GCS file as part of the boot process of one of the worker + containers. This is a string and not an enum to make it easy to add new + codes without waiting for an API change. + parameters: Parameters contains specific information about the code. This + is a struct to allow parameters of different types. Examples: 1. For a + "HARNESS_STARTED" message parameters might provide the name of the + worker and additional data like timing information. 2. For a + "GCS_DOWNLOAD_ERROR" parameters might contain fields listing the GCS + objects being downloaded and fields containing errors. In general complex data structures should be avoided. If a worker needs to send a specific and complicated data structure then please consider defining a new proto and adding it to the data oneof in WorkerMessageResponse. - Conventions: Parameters should only be used for information that isn't - typically passed as a label. hostname and other worker identifiers - should almost always be passed as labels since they will be included on + Conventions: Parameters should only be used for information that isn't + typically passed as a label. hostname and other worker identifiers + should almost always be passed as labels since they will be included on most messages. """ @encoding.MapUnrecognizedFields('additionalProperties') class ParametersValue(_messages.Message): - r"""Parameters contains specific information about the code. This is a - struct to allow parameters of different types. Examples: 1. For a - "HARNESS_STARTED" message parameters might provide the name of the - worker and additional data like timing information. 2. For a - "GCS_DOWNLOAD_ERROR" parameters might contain fields listing the GCS - objects being downloaded and fields containing errors. In general complex - data structures should be avoided. If a worker needs to send a specific - and complicated data structure then please consider defining a new proto - and adding it to the data oneof in WorkerMessageResponse. Conventions: - Parameters should only be used for information that isn't typically passed - as a label. hostname and other worker identifiers should almost always be - passed as labels since they will be included on most messages. + r"""Parameters contains specific information about the code. This is a + struct to allow parameters of different types. Examples: 1. For a + "HARNESS_STARTED" message parameters might provide the name of the worker + and additional data like timing information. 2. 
For a "GCS_DOWNLOAD_ERROR" + parameters might contain fields listing the GCS objects being downloaded + and fields containing errors. In general complex data structures should be + avoided. If a worker needs to send a specific and complicated data + structure then please consider defining a new proto and adding it to the + data oneof in WorkerMessageResponse. Conventions: Parameters should only + be used for information that isn't typically passed as a label. hostname + and other worker identifiers should almost always be passed as labels + since they will be included on most messages. Messages: AdditionalProperty: An additional property for a ParametersValue object. @@ -5661,27 +6882,27 @@ class WorkerMessageResponse(_messages.Message): class WorkerPool(_messages.Message): r"""Describes one particular pool of Cloud Dataflow workers to be instantiated by the Cloud Dataflow service in order to perform the - computations required by a job. Note that a workflow job may use multiple + computations required by a job. Note that a workflow job may use multiple pools, in order to match the various computational requirements of the various stages of the job. Enums: - DefaultPackageSetValueValuesEnum: The default package set to install. - This allows the service to select a default set of packages which are - useful to worker harnesses written in a particular language. + DefaultPackageSetValueValuesEnum: The default package set to install. This + allows the service to select a default set of packages which are useful + to worker harnesses written in a particular language. IpConfigurationValueValuesEnum: Configuration for VM IPs. TeardownPolicyValueValuesEnum: Sets the policy for determining when to turndown worker pool. Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and `TEARDOWN_NEVER`. `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down if the job succeeds. - `TEARDOWN_NEVER` means the workers are never torn down. If the workers + `TEARDOWN_NEVER` means the workers are never torn down. If the workers are not torn down by the service, they will continue to run and use Google Compute Engine VM resources in the user's project until they are explicitly terminated by the user. Because of this, Google recommends using the `TEARDOWN_ALWAYS` policy except for small, manually supervised - test jobs. If unknown or unspecified, the service will attempt to - choose a reasonable default. + test jobs. If unknown or unspecified, the service will attempt to choose + a reasonable default. Messages: MetadataValue: Metadata to set on the Google Compute Engine VMs. @@ -5690,28 +6911,28 @@ class WorkerPool(_messages.Message): Fields: autoscalingSettings: Settings for autoscaling of this WorkerPool. dataDisks: Data disks that are used by a VM in this workflow. - defaultPackageSet: The default package set to install. This allows the + defaultPackageSet: The default package set to install. This allows the service to select a default set of packages which are useful to worker harnesses written in a particular language. - diskSizeGb: Size of root disk for VMs, in GB. If zero or unspecified, the + diskSizeGb: Size of root disk for VMs, in GB. If zero or unspecified, the service will attempt to choose a reasonable default. diskSourceImage: Fully qualified source image for disks. - diskType: Type of root disk for VMs. If empty or unspecified, the service + diskType: Type of root disk for VMs. 
If empty or unspecified, the service will attempt to choose a reasonable default. ipConfiguration: Configuration for VM IPs. kind: The kind of the worker pool; currently only `harness` and `shuffle` are supported. - machineType: Machine type (e.g. "n1-standard-1"). If empty or - unspecified, the service will attempt to choose a reasonable default. + machineType: Machine type (e.g. "n1-standard-1"). If empty or unspecified, + the service will attempt to choose a reasonable default. metadata: Metadata to set on the Google Compute Engine VMs. - network: Network to which VMs will be assigned. If empty or unspecified, + network: Network to which VMs will be assigned. If empty or unspecified, the service will use the network "default". numThreadsPerWorker: The number of threads per worker harness. If empty or unspecified, the service will choose a number of threads (according to the number of cores on the selected machine type for batch, or 1 by convention for streaming). numWorkers: Number of Google Compute Engine workers in this pool needed to - execute the job. If zero or unspecified, the service will attempt to + execute the job. If zero or unspecified, the service will attempt to choose a reasonable default. onHostMaintenance: The action to take on host maintenance, as defined by the Google Compute Engine API. @@ -5721,32 +6942,32 @@ class WorkerPool(_messages.Message): this pipeline. This will only be set in the Fn API path. For non-cross- language pipelines this should have only one entry. Cross-language pipelines will have two or more entries. - subnetwork: Subnetwork to which VMs will be assigned, if desired. - Expected to be of the form "regions/REGION/subnetworks/SUBNETWORK". + subnetwork: Subnetwork to which VMs will be assigned, if desired. Expected + to be of the form "regions/REGION/subnetworks/SUBNETWORK". taskrunnerSettings: Settings passed through to Google Compute Engine - workers when using the standard Dataflow task runner. Users should + workers when using the standard Dataflow task runner. Users should ignore this field. teardownPolicy: Sets the policy for determining when to turndown worker pool. Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and `TEARDOWN_NEVER`. `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down if the job succeeds. `TEARDOWN_NEVER` means the - workers are never torn down. If the workers are not torn down by the + workers are never torn down. If the workers are not torn down by the service, they will continue to run and use Google Compute Engine VM resources in the user's project until they are explicitly terminated by the user. Because of this, Google recommends using the `TEARDOWN_ALWAYS` - policy except for small, manually supervised test jobs. If unknown or + policy except for small, manually supervised test jobs. If unknown or unspecified, the service will attempt to choose a reasonable default. workerHarnessContainerImage: Required. Docker container image that executes the Cloud Dataflow worker harness, residing in Google Container - Registry. Deprecated for the Fn API path. Use + Registry. Deprecated for the Fn API path. Use sdk_harness_container_images instead. - zone: Zone to run the worker pools in. If empty or unspecified, the + zone: Zone to run the worker pools in. If empty or unspecified, the service will attempt to choose a reasonable default. 
""" class DefaultPackageSetValueValuesEnum(_messages.Enum): - r"""The default package set to install. This allows the service to select + r"""The default package set to install. This allows the service to select a default set of packages which are useful to worker harnesses written in a particular language. @@ -5783,11 +7004,11 @@ class TeardownPolicyValueValuesEnum(_messages.Enum): `TEARDOWN_NEVER`. `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down if the job succeeds. `TEARDOWN_NEVER` means the - workers are never torn down. If the workers are not torn down by the + workers are never torn down. If the workers are not torn down by the service, they will continue to run and use Google Compute Engine VM resources in the user's project until they are explicitly terminated by the user. Because of this, Google recommends using the `TEARDOWN_ALWAYS` - policy except for small, manually supervised test jobs. If unknown or + policy except for small, manually supervised test jobs. If unknown or unspecified, the service will attempt to choose a reasonable default. Values: @@ -5881,20 +7102,20 @@ class WorkerSettings(_messages.Message): r"""Provides data to pass through to the worker harness. Fields: - baseUrl: The base URL for accessing Google Cloud APIs. When workers - access Google Cloud APIs, they logically do so via relative URLs. If - this field is specified, it supplies the base URL to use for resolving - these relative URLs. The normative algorithm used is defined by RFC - 1808, "Relative Uniform Resource Locators". If not specified, the - default value is "http://www.googleapis.com/" + baseUrl: The base URL for accessing Google Cloud APIs. When workers access + Google Cloud APIs, they logically do so via relative URLs. If this field + is specified, it supplies the base URL to use for resolving these + relative URLs. The normative algorithm used is defined by RFC 1808, + "Relative Uniform Resource Locators". If not specified, the default + value is "http://www.googleapis.com/" reportingEnabled: Whether to send work progress updates to the service. servicePath: The Cloud Dataflow service path relative to the root URL, for example, "dataflow/v1b3/projects". shuffleServicePath: The Shuffle service path relative to the root URL, for example, "shuffle/v1beta1". tempStoragePrefix: The prefix of the resources the system should use for - temporary storage. The supported resource type is: Google Cloud - Storage: storage.googleapis.com/{bucket}/{object} + temporary storage. The supported resource type is: Google Cloud Storage: + storage.googleapis.com/{bucket}/{object} bucket.storage.googleapis.com/{object} workerId: The ID of the worker running this pipeline. """ @@ -5914,8 +7135,8 @@ class WorkerShutdownNotice(_messages.Message): Fields: reason: The reason for the worker shutdown. Current possible values are: - "UNKNOWN": shutdown reason is unknown. "PREEMPTION": shutdown reason - is preemption. Other possible reasons may be added in the future. + "UNKNOWN": shutdown reason is unknown. "PREEMPTION": shutdown reason is + preemption. Other possible reasons may be added in the future. 
""" reason = _messages.StringField(1) diff --git a/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/message_matchers.py b/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/message_matchers.py index 90962925854f..5b8753dfab65 100644 --- a/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/message_matchers.py +++ b/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/message_matchers.py @@ -16,9 +16,6 @@ # # pytype: skip-file -from __future__ import absolute_import - -from future.utils import iteritems from hamcrest.core.base_matcher import BaseMatcher IGNORED = object() @@ -50,7 +47,7 @@ def _matches(self, item): if self.origin != IGNORED and item.origin != self.origin: return False if self.context != IGNORED: - for key, name in iteritems(self.context): + for key, name in self.context.items(): if key not in item.context: return False if name != IGNORED and item.context[key] != name: diff --git a/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/message_matchers_test.py b/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/message_matchers_test.py index e60cf0e58e66..68dd06681ca0 100644 --- a/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/message_matchers_test.py +++ b/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/message_matchers_test.py @@ -16,8 +16,6 @@ # # pytype: skip-file -from __future__ import absolute_import - import unittest import hamcrest as hc diff --git a/sdks/python/apache_beam/runners/dataflow/internal/names.py b/sdks/python/apache_beam/runners/dataflow/internal/names.py index 7a2ed09d6295..837989ae4c5a 100644 --- a/sdks/python/apache_beam/runners/dataflow/internal/names.py +++ b/sdks/python/apache_beam/runners/dataflow/internal/names.py @@ -22,11 +22,6 @@ # pytype: skip-file -from __future__ import absolute_import - -# Standard file names used for staging files. -from builtins import object - # Referenced by Dataflow legacy worker. from apache_beam.runners.internal.names import PICKLED_MAIN_SESSION_FILE # pylint: disable=unused-import @@ -41,10 +36,10 @@ # Update this version to the next version whenever there is a change that will # require changes to legacy Dataflow worker execution environment. -BEAM_CONTAINER_VERSION = 'beam-master-20201116' +BEAM_CONTAINER_VERSION = 'beam-master-20210809' # Update this version to the next version whenever there is a change that # requires changes to SDK harness container or SDK harness launcher. -BEAM_FNAPI_CONTAINER_VERSION = 'beam-master-20201116' +BEAM_FNAPI_CONTAINER_VERSION = 'beam-master-20210809' DATAFLOW_CONTAINER_IMAGE_REPOSITORY = 'gcr.io/cloud-dataflow/v1beta3' @@ -108,6 +103,7 @@ class PropertyNames(object): PUBSUB_SUBSCRIPTION = 'pubsub_subscription' PUBSUB_TIMESTAMP_ATTRIBUTE = 'pubsub_timestamp_label' PUBSUB_TOPIC = 'pubsub_topic' + RESOURCE_HINTS = 'resource_hints' RESTRICTION_ENCODING = 'restriction_encoding' SERIALIZED_FN = 'serialized_fn' SHARD_NAME_TEMPLATE = 'shard_template' diff --git a/sdks/python/apache_beam/runners/dataflow/native_io/__init__.py b/sdks/python/apache_beam/runners/dataflow/native_io/__init__.py index f4f43cbb1236..cce3acad34a4 100644 --- a/sdks/python/apache_beam/runners/dataflow/native_io/__init__.py +++ b/sdks/python/apache_beam/runners/dataflow/native_io/__init__.py @@ -14,4 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. 
# -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/runners/dataflow/native_io/iobase.py b/sdks/python/apache_beam/runners/dataflow/native_io/iobase.py index 4d920f7580ea..c545ecd2b176 100644 --- a/sdks/python/apache_beam/runners/dataflow/native_io/iobase.py +++ b/sdks/python/apache_beam/runners/dataflow/native_io/iobase.py @@ -22,10 +22,7 @@ # pytype: skip-file -from __future__ import absolute_import - import logging -from builtins import object from typing import TYPE_CHECKING from typing import Optional diff --git a/sdks/python/apache_beam/runners/dataflow/native_io/iobase_test.py b/sdks/python/apache_beam/runners/dataflow/native_io/iobase_test.py index 7392535c6cb6..5e72ca555b69 100644 --- a/sdks/python/apache_beam/runners/dataflow/native_io/iobase_test.py +++ b/sdks/python/apache_beam/runners/dataflow/native_io/iobase_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest from apache_beam import Create diff --git a/sdks/python/apache_beam/runners/dataflow/ptransform_overrides.py b/sdks/python/apache_beam/runners/dataflow/ptransform_overrides.py index 402a4ed4e32a..add888522716 100644 --- a/sdks/python/apache_beam/runners/dataflow/ptransform_overrides.py +++ b/sdks/python/apache_beam/runners/dataflow/ptransform_overrides.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - from apache_beam.options.pipeline_options import DebugOptions from apache_beam.options.pipeline_options import GoogleCloudOptions from apache_beam.options.pipeline_options import StandardOptions @@ -360,14 +358,19 @@ def matches(self, applied_ptransform): return False google_cloud_options = self.options.view_as(GoogleCloudOptions) if not google_cloud_options.enable_streaming_engine: - return False + raise ValueError( + 'Runner determined sharding not available in Dataflow for ' + 'GroupIntoBatches for non-Streaming-Engine jobs. In order to use ' + 'runner determined sharding, please use ' + '--streaming --enable_streaming_engine --experiments=use_runner_v2') from apache_beam.runners.dataflow.internal import apiclient if not apiclient._use_unified_worker(self.options): - return False - experiments = self.options.view_as(DebugOptions).experiments or [] - if 'enable_streaming_auto_sharding' not in experiments: - return False + raise ValueError( + 'Runner determined sharding not available in Dataflow for ' + 'GroupIntoBatches for jobs not using Runner V2. 
In order to use ' + 'runner determined sharding, please use ' + '--streaming --enable_streaming_engine --experiments=use_runner_v2') self.dataflow_runner.add_pcoll_with_auto_sharding(applied_ptransform) return True diff --git a/sdks/python/apache_beam/runners/dataflow/template_runner_test.py b/sdks/python/apache_beam/runners/dataflow/template_runner_test.py index 54e39a5ee21a..021056608f1f 100644 --- a/sdks/python/apache_beam/runners/dataflow/template_runner_test.py +++ b/sdks/python/apache_beam/runners/dataflow/template_runner_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import json import tempfile import unittest @@ -77,27 +75,6 @@ def test_full_completion(self): ['job_name'], 'test-job') - def test_bad_path(self): - dummy_sdk_file = tempfile.NamedTemporaryFile() - remote_runner = DataflowRunner() - pipeline = Pipeline( - remote_runner, - options=PipelineOptions([ - '--dataflow_endpoint=ignored', - '--sdk_location=' + dummy_sdk_file.name, - '--job_name=test-job', - '--project=test-project', - '--staging_location=ignored', - '--temp_location=/dev/null', - '--template_location=/bad/path', - '--no_auth' - ])) - remote_runner.job = apiclient.Job( - pipeline._options, pipeline.to_runner_api()) - - with self.assertRaises(IOError): - pipeline.run().wait_until_finish() - if __name__ == '__main__': unittest.main() diff --git a/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py b/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py index 43bc32ce6ffa..b63249061b6a 100644 --- a/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py +++ b/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py @@ -19,9 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import logging import time diff --git a/sdks/python/apache_beam/runners/direct/__init__.py b/sdks/python/apache_beam/runners/direct/__init__.py index 0b647fd6f7c4..97cffc66b417 100644 --- a/sdks/python/apache_beam/runners/direct/__init__.py +++ b/sdks/python/apache_beam/runners/direct/__init__.py @@ -20,6 +20,5 @@ Anything in this package not imported here is an internal implementation detail with no backwards-compatibility guarantees. 
""" -from __future__ import absolute_import from apache_beam.runners.direct.direct_runner import DirectRunner diff --git a/sdks/python/apache_beam/runners/direct/bundle_factory.py b/sdks/python/apache_beam/runners/direct/bundle_factory.py index 20a979bb08a0..e4beefe992c1 100644 --- a/sdks/python/apache_beam/runners/direct/bundle_factory.py +++ b/sdks/python/apache_beam/runners/direct/bundle_factory.py @@ -19,9 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - -from builtins import object from typing import Iterable from typing import Iterator from typing import List diff --git a/sdks/python/apache_beam/runners/direct/clock.py b/sdks/python/apache_beam/runners/direct/clock.py index e1c9b20e2968..99e3bed3abea 100644 --- a/sdks/python/apache_beam/runners/direct/clock.py +++ b/sdks/python/apache_beam/runners/direct/clock.py @@ -21,10 +21,7 @@ """ # pytype: skip-file -from __future__ import absolute_import - import time -from builtins import object from apache_beam.utils.timestamp import Timestamp diff --git a/sdks/python/apache_beam/runners/direct/consumer_tracking_pipeline_visitor.py b/sdks/python/apache_beam/runners/direct/consumer_tracking_pipeline_visitor.py index 4290b55501e1..2a6fc3ee6093 100644 --- a/sdks/python/apache_beam/runners/direct/consumer_tracking_pipeline_visitor.py +++ b/sdks/python/apache_beam/runners/direct/consumer_tracking_pipeline_visitor.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - from typing import TYPE_CHECKING from typing import Dict from typing import Set diff --git a/sdks/python/apache_beam/runners/direct/consumer_tracking_pipeline_visitor_test.py b/sdks/python/apache_beam/runners/direct/consumer_tracking_pipeline_visitor_test.py index ec9dd8165e7f..5d8f21c23f88 100644 --- a/sdks/python/apache_beam/runners/direct/consumer_tracking_pipeline_visitor_test.py +++ b/sdks/python/apache_beam/runners/direct/consumer_tracking_pipeline_visitor_test.py @@ -18,8 +18,6 @@ """Tests for consumer_tracking_pipeline_visitor.""" # pytype: skip-file -from __future__ import absolute_import - import logging import unittest @@ -45,10 +43,6 @@ class ConsumerTrackingPipelineVisitorTest(unittest.TestCase): def setUp(self): self.pipeline = Pipeline(DirectRunner()) self.visitor = ConsumerTrackingPipelineVisitor() - try: # Python 2 - self.assertCountEqual = self.assertItemsEqual - except AttributeError: # Python 3 - pass def test_root_transforms(self): root_read = beam.Impulse() diff --git a/sdks/python/apache_beam/runners/direct/direct_metrics.py b/sdks/python/apache_beam/runners/direct/direct_metrics.py index 85c14be2e501..e4fd44053119 100644 --- a/sdks/python/apache_beam/runners/direct/direct_metrics.py +++ b/sdks/python/apache_beam/runners/direct/direct_metrics.py @@ -22,10 +22,7 @@ """ # pytype: skip-file -from __future__ import absolute_import - import threading -from builtins import object from collections import defaultdict from apache_beam.metrics.cells import CounterAggregator diff --git a/sdks/python/apache_beam/runners/direct/direct_metrics_test.py b/sdks/python/apache_beam/runners/direct/direct_metrics_test.py index cf072bb5ddc5..506663eb55c8 100644 --- a/sdks/python/apache_beam/runners/direct/direct_metrics_test.py +++ b/sdks/python/apache_beam/runners/direct/direct_metrics_test.py @@ -17,8 +17,6 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest import hamcrest as hc diff --git a/sdks/python/apache_beam/runners/direct/direct_runner.py 
b/sdks/python/apache_beam/runners/direct/direct_runner.py index f57166073c67..4149ea2bf402 100644 --- a/sdks/python/apache_beam/runners/direct/direct_runner.py +++ b/sdks/python/apache_beam/runners/direct/direct_runner.py @@ -23,8 +23,6 @@ # pytype: skip-file -from __future__ import absolute_import - import itertools import logging import time @@ -118,8 +116,15 @@ def visit_transform(self, applied_ptransform): # Check whether all transforms used in the pipeline are supported by the # FnApiRunner, and the pipeline was not meant to be run as streaming. if _FnApiRunnerSupportVisitor().accept(pipeline): - from apache_beam.runners.portability.fn_api_runner import FnApiRunner - runner = FnApiRunner() + from apache_beam.portability.api import beam_provision_api_pb2 + from apache_beam.runners.portability.fn_api_runner import fn_runner + from apache_beam.runners.portability.portable_runner import JobServiceHandle + all_options = options.get_all_options() + encoded_options = JobServiceHandle.encode_pipeline_options(all_options) + provision_info = fn_runner.ExtendedProvisionInfo( + beam_provision_api_pb2.ProvisionInfo( + pipeline_options=encoded_options)) + runner = fn_runner.FnApiRunner(provision_info=provision_info) else: runner = BundleBasedDirectRunner() @@ -393,7 +398,7 @@ def _infer_output_coder( # type: (...) -> typing.Optional[coders.Coder] return coders.BytesCoder() - def get_windowing(self, inputs): + def get_windowing(self, unused_inputs): return beam.Windowing(beam.window.GlobalWindows()) def expand(self, pvalue): @@ -472,9 +477,7 @@ def get_replacement_transform_for_applied_ptransform( class WriteToPubSubOverride(PTransformOverride): def matches(self, applied_ptransform): - return isinstance( - applied_ptransform.transform, - (beam_pubsub.WriteToPubSub, beam_pubsub._WriteStringsToPubSub)) + return isinstance(applied_ptransform.transform, beam_pubsub.WriteToPubSub) def get_replacement_transform_for_applied_ptransform( self, applied_ptransform): diff --git a/sdks/python/apache_beam/runners/direct/direct_runner_test.py b/sdks/python/apache_beam/runners/direct/direct_runner_test.py index dcdd42320a9e..6f226133917d 100644 --- a/sdks/python/apache_beam/runners/direct/direct_runner_test.py +++ b/sdks/python/apache_beam/runners/direct/direct_runner_test.py @@ -17,8 +17,6 @@ # pytype: skip-file -from __future__ import absolute_import - import threading import unittest from collections import defaultdict diff --git a/sdks/python/apache_beam/runners/direct/direct_userstate.py b/sdks/python/apache_beam/runners/direct/direct_userstate.py index a6fcccea651a..715355cba2a8 100644 --- a/sdks/python/apache_beam/runners/direct/direct_userstate.py +++ b/sdks/python/apache_beam/runners/direct/direct_userstate.py @@ -18,8 +18,6 @@ """Support for user state in the BundleBasedDirectRunner.""" # pytype: skip-file -from __future__ import absolute_import - import copy import itertools @@ -206,8 +204,7 @@ def __init__(self, step_context, dofn, key_coder): self.dofn = dofn self.key_coder = key_coder - self.all_state_specs, self.all_timer_specs = ( - userstate.get_dofn_specs(dofn)) + self.all_state_specs, self.all_timer_specs = userstate.get_dofn_specs(dofn) self.state_tags = {} for state_spec in self.all_state_specs: state_key = 'user/%s' % state_spec.name @@ -226,12 +223,14 @@ def __init__(self, step_context, dofn, key_coder): self.cached_states = {} self.cached_timers = {} - def get_timer(self, timer_spec, key, window, timestamp, pane): + def get_timer( + self, timer_spec: userstate.TimerSpec, key, window, 
timestamp, + pane) -> userstate.RuntimeTimer: assert timer_spec in self.all_timer_specs encoded_key = self.key_coder.encode(key) cache_key = (encoded_key, window, timer_spec) if cache_key not in self.cached_timers: - self.cached_timers[cache_key] = userstate.RuntimeTimer(timer_spec) + self.cached_timers[cache_key] = userstate.RuntimeTimer() return self.cached_timers[cache_key] def get_state(self, state_spec, key, window): @@ -291,16 +290,22 @@ def commit(self): encoded_key, window, timer_spec = cache_key state = self.step_context.get_keyed_state(encoded_key) timer_name = 'user/%s' % timer_spec.name - if runtime_timer._cleared: - state.clear_timer(window, timer_name, timer_spec.time_domain) - if runtime_timer._new_timestamp is not None: - # TODO(ccy): add corresponding watermark holds after the DirectRunner - # allows for keyed watermark holds. - state.set_timer( - window, - timer_name, - timer_spec.time_domain, - runtime_timer._new_timestamp) + for dynamic_timer_tag, timer in runtime_timer._timer_recordings.items(): + if timer.cleared: + state.clear_timer( + window, + timer_name, + timer_spec.time_domain, + dynamic_timer_tag=dynamic_timer_tag) + if timer.timestamp: + # TODO(ccy): add corresponding watermark holds after the DirectRunner + # allows for keyed watermark holds. + state.set_timer( + window, + timer_name, + timer_spec.time_domain, + timer.timestamp, + dynamic_timer_tag=dynamic_timer_tag) def reset(self): for state in self.cached_states.values(): diff --git a/sdks/python/apache_beam/runners/direct/evaluation_context.py b/sdks/python/apache_beam/runners/direct/evaluation_context.py index 48a99bd26c3a..8d50d689cc7b 100644 --- a/sdks/python/apache_beam/runners/direct/evaluation_context.py +++ b/sdks/python/apache_beam/runners/direct/evaluation_context.py @@ -19,11 +19,8 @@ # pytype: skip-file -from __future__ import absolute_import - import collections import threading -from builtins import object from typing import TYPE_CHECKING from typing import Any from typing import DefaultDict diff --git a/sdks/python/apache_beam/runners/direct/executor.py b/sdks/python/apache_beam/runners/direct/executor.py index 1c5dff594bb9..bfcb47f99e88 100644 --- a/sdks/python/apache_beam/runners/direct/executor.py +++ b/sdks/python/apache_beam/runners/direct/executor.py @@ -19,16 +19,13 @@ # pytype: skip-file -from __future__ import absolute_import - import collections import itertools import logging +import queue import sys import threading import traceback -from builtins import object -from builtins import range from typing import TYPE_CHECKING from typing import Any from typing import Dict @@ -37,9 +34,6 @@ from typing import Set from weakref import WeakValueDictionary -from future.moves import queue -from future.utils import raise_ - from apache_beam.metrics.execution import MetricsContainer from apache_beam.runners.worker import statesampler from apache_beam.transforms import sideinputs @@ -485,7 +479,7 @@ def await_completion(self): try: if update.exception: t, v, tb = update.exc_info - raise_(t, v, tb) + raise t(v).with_traceback(tb) finally: self.executor_service.shutdown() self.executor_service.await_completion() diff --git a/sdks/python/apache_beam/runners/direct/helper_transforms.py b/sdks/python/apache_beam/runners/direct/helper_transforms.py index 09125f44a096..0e88c021e2f9 100644 --- a/sdks/python/apache_beam/runners/direct/helper_transforms.py +++ b/sdks/python/apache_beam/runners/direct/helper_transforms.py @@ -17,8 +17,6 @@ # pytype: skip-file -from __future__ import absolute_import - 
import collections import itertools import typing diff --git a/sdks/python/apache_beam/runners/direct/sdf_direct_runner.py b/sdks/python/apache_beam/runners/direct/sdf_direct_runner.py index cb23d6921e6b..381394ef7221 100644 --- a/sdks/python/apache_beam/runners/direct/sdf_direct_runner.py +++ b/sdks/python/apache_beam/runners/direct/sdf_direct_runner.py @@ -20,10 +20,7 @@ # pytype: skip-file -from __future__ import absolute_import - import uuid -from builtins import object from threading import Lock from threading import Timer from typing import TYPE_CHECKING diff --git a/sdks/python/apache_beam/runners/direct/sdf_direct_runner_test.py b/sdks/python/apache_beam/runners/direct/sdf_direct_runner_test.py index dd802f34bcce..f60d50c073f2 100644 --- a/sdks/python/apache_beam/runners/direct/sdf_direct_runner_test.py +++ b/sdks/python/apache_beam/runners/direct/sdf_direct_runner_test.py @@ -19,13 +19,9 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import logging import os import unittest -from builtins import range import apache_beam as beam from apache_beam import Create diff --git a/sdks/python/apache_beam/runners/direct/test_direct_runner.py b/sdks/python/apache_beam/runners/direct/test_direct_runner.py index 81f58706255d..507ec1a5eb01 100644 --- a/sdks/python/apache_beam/runners/direct/test_direct_runner.py +++ b/sdks/python/apache_beam/runners/direct/test_direct_runner.py @@ -19,9 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - from apache_beam.internal import pickler from apache_beam.options.pipeline_options import StandardOptions from apache_beam.options.pipeline_options import TestOptions diff --git a/sdks/python/apache_beam/runners/direct/test_stream_impl.py b/sdks/python/apache_beam/runners/direct/test_stream_impl.py index 94829eddf741..45362e11c3a1 100644 --- a/sdks/python/apache_beam/runners/direct/test_stream_impl.py +++ b/sdks/python/apache_beam/runners/direct/test_stream_impl.py @@ -25,9 +25,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import itertools import logging from queue import Empty as EmptyException diff --git a/sdks/python/apache_beam/runners/direct/transform_evaluator.py b/sdks/python/apache_beam/runners/direct/transform_evaluator.py index 8a695183489e..c77e44216f2b 100644 --- a/sdks/python/apache_beam/runners/direct/transform_evaluator.py +++ b/sdks/python/apache_beam/runners/direct/transform_evaluator.py @@ -19,14 +19,11 @@ # pytype: skip-file -from __future__ import absolute_import - import atexit import collections import logging import random import time -from builtins import object from typing import TYPE_CHECKING from typing import Any from typing import Dict @@ -34,8 +31,6 @@ from typing import Tuple from typing import Type -from future.utils import iteritems - import apache_beam.io as io from apache_beam import coders from apache_beam import pvalue @@ -311,7 +306,10 @@ def process_timer_wrapper(self, timer_firing): """ state = self._step_context.get_keyed_state(timer_firing.encoded_key) state.clear_timer( - timer_firing.window, timer_firing.name, timer_firing.time_domain) + timer_firing.window, + timer_firing.name, + timer_firing.time_domain, + dynamic_timer_tag=timer_firing.dynamic_timer_tag) self.process_timer(timer_firing) def process_timer(self, timer_firing): @@ -875,7 +873,8 @@ def process_timer(self, timer_firing): timer_firing.window, timer_firing.timestamp, # TODO Add paneinfo to 
timer_firing in DirectRunner - None) + None, + timer_firing.dynamic_timer_tag) def process_element(self, element): self.runner.process(element) @@ -1038,7 +1037,7 @@ def process_element(self, element): def finish_bundle(self): bundles = [] bundle = None - for encoded_k, vs in iteritems(self.gbk_items): + for encoded_k, vs in self.gbk_items.items(): if not bundle: bundle = self._evaluation_context.create_bundle(self.output_pcollection) bundles.append(bundle) diff --git a/sdks/python/apache_beam/runners/direct/util.py b/sdks/python/apache_beam/runners/direct/util.py index 045a2ce55f24..11081c1289b2 100644 --- a/sdks/python/apache_beam/runners/direct/util.py +++ b/sdks/python/apache_beam/runners/direct/util.py @@ -22,10 +22,6 @@ # pytype: skip-file -from __future__ import absolute_import - -from builtins import object - class TransformResult(object): """Result of evaluating an AppliedPTransform with a TransformEvaluator.""" @@ -61,16 +57,28 @@ def __init__( class TimerFiring(object): """A single instance of a fired timer.""" - def __init__(self, encoded_key, window, name, time_domain, timestamp): + def __init__( + self, + encoded_key, + window, + name, + time_domain, + timestamp, + dynamic_timer_tag=''): self.encoded_key = encoded_key self.window = window self.name = name self.time_domain = time_domain self.timestamp = timestamp + self.dynamic_timer_tag = dynamic_timer_tag def __repr__(self): - return 'TimerFiring({!r}, {!r}, {}, {})'.format( - self.encoded_key, self.name, self.time_domain, self.timestamp) + return 'TimerFiring({!r}, {!r}, {}, {}, {})'.format( + self.encoded_key, + self.name, + self.time_domain, + self.timestamp, + self.dynamic_timer_tag) class KeyedWorkItem(object): diff --git a/sdks/python/apache_beam/runners/direct/watermark_manager.py b/sdks/python/apache_beam/runners/direct/watermark_manager.py index 7c4590c3bd48..8f97de508ff5 100644 --- a/sdks/python/apache_beam/runners/direct/watermark_manager.py +++ b/sdks/python/apache_beam/runners/direct/watermark_manager.py @@ -19,10 +19,7 @@ # pytype: skip-file -from __future__ import absolute_import - import threading -from builtins import object from typing import TYPE_CHECKING from typing import Dict from typing import Iterable @@ -325,8 +322,14 @@ def extract_transform_timers(self): if had_realtime_timer: has_realtime_timer = True for expired in timers: - window, (name, time_domain, timestamp) = expired + window, (name, time_domain, timestamp, dynamic_timer_tag) = expired fired_timers.append( - TimerFiring(encoded_key, window, name, time_domain, timestamp)) + TimerFiring( + encoded_key, + window, + name, + time_domain, + timestamp, + dynamic_timer_tag=dynamic_timer_tag)) self._fired_timers.update(fired_timers) return fired_timers, has_realtime_timer diff --git a/sdks/python/apache_beam/runners/interactive/augmented_pipeline.py b/sdks/python/apache_beam/runners/interactive/augmented_pipeline.py new file mode 100644 index 000000000000..324316315018 --- /dev/null +++ b/sdks/python/apache_beam/runners/interactive/augmented_pipeline.py @@ -0,0 +1,130 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Module to augment interactive flavor into the given pipeline. + +For internal use only; no backward-compatibility guarantees. +""" +# pytype: skip-file + +import copy +from typing import Dict +from typing import Optional +from typing import Set + +import apache_beam as beam +from apache_beam.portability.api import beam_runner_api_pb2 +from apache_beam.runners.interactive import interactive_environment as ie +from apache_beam.runners.interactive import background_caching_job +from apache_beam.runners.interactive.caching.cacheable import Cacheable +from apache_beam.runners.interactive.caching.read_cache import ReadCache +from apache_beam.runners.interactive.caching.write_cache import WriteCache + + +class AugmentedPipeline: + """A pipeline with augmented interactive flavor that caches intermediate + PCollections defined by the user, reads computed PCollections as source and + prunes unnecessary pipeline parts for fast computation. + """ + def __init__( + self, + user_pipeline: beam.Pipeline, + pcolls: Optional[Set[beam.pvalue.PCollection]] = None): + """ + Initializes a pipeline for augmenting interactive flavor. + + Args: + user_pipeline: a beam.Pipeline instance defined by the user. + pcolls: cacheable pcolls to be computed/retrieved. If the set is + empty, all intermediate pcolls assigned to variables are applicable. + """ + assert not pcolls or all([pcoll.pipeline is user_pipeline for pcoll in + pcolls]), 'All %s need to belong to %s' % (pcolls, user_pipeline) + self._user_pipeline = user_pipeline + self._pcolls = pcolls + self._cache_manager = ie.current_env().get_cache_manager( + self._user_pipeline, create_if_absent=True) + if background_caching_job.has_source_to_cache(self._user_pipeline): + self._cache_manager = ie.current_env().get_cache_manager( + self._user_pipeline) + _, self._context = self._user_pipeline.to_runner_api(return_context=True) + self._context.component_id_map = copy.copy( + self._user_pipeline.component_id_map) + self._cacheables = self.cacheables() + + @property + def augmented_pipeline(self) -> beam_runner_api_pb2.Pipeline: + return self.augment() + + # TODO(BEAM-10708): Support generating a background recording job that + # contains unbound source recording transforms only. + @property + def background_recording_pipeline(self) -> beam_runner_api_pb2.Pipeline: + raise NotImplementedError + + def cacheables(self) -> Dict[beam.pvalue.PCollection, Cacheable]: + """Finds all the cacheable intermediate PCollections in the pipeline with + their metadata. + """ + c = {} + for watching in ie.current_env().watching(): + for key, val in watching: + if (isinstance(val, beam.pvalue.PCollection) and + val.pipeline is self._user_pipeline and + (not self._pcolls or val in self._pcolls)): + pcoll_id = self._context.pcollections.get_id(val) + c[val] = Cacheable( + pcoll_id=pcoll_id, + var=key, + pcoll=val, + version=str(id(val)), + producer_version=str(id(val.producer))) + return c + + def augment(self) -> beam_runner_api_pb2.Pipeline: + """Augments the pipeline with cache. Always calculates a new result.
+ + For a cacheable PCollection, if cache exists, read cache; else, write cache. + """ + pipeline = self._user_pipeline.to_runner_api() + + # Find pcolls eligible for reading or writing cache. + readcache_pcolls = set() + for pcoll, cacheable in self._cacheables.items(): + key = repr(cacheable.to_key()) + if (self._cache_manager.exists('full', key) and + pcoll in ie.current_env().computed_pcollections): + readcache_pcolls.add(pcoll) + writecache_pcolls = set( + self._cacheables.keys()).difference(readcache_pcolls) + + # Wire in additional transforms to read cache and write cache. + for readcache_pcoll in readcache_pcolls: + ReadCache( + pipeline, + self._context, + self._cache_manager, + self._cacheables[readcache_pcoll]).read_cache() + for writecache_pcoll in writecache_pcolls: + WriteCache( + pipeline, + self._context, + self._cache_manager, + self._cacheables[writecache_pcoll]).write_cache() + # TODO(BEAM-10708): Support streaming, add pruning logic, and integrate + # pipeline fragment logic. + return pipeline diff --git a/sdks/python/apache_beam/runners/interactive/augmented_pipeline_test.py b/sdks/python/apache_beam/runners/interactive/augmented_pipeline_test.py new file mode 100644 index 000000000000..1bafb9fb16f5 --- /dev/null +++ b/sdks/python/apache_beam/runners/interactive/augmented_pipeline_test.py @@ -0,0 +1,84 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +"""Tests for augmented_pipeline module.""" + +# pytest: skip-file + +import unittest + +import apache_beam as beam +from apache_beam.runners.interactive import augmented_pipeline as ap +from apache_beam.runners.interactive import interactive_beam as ib +from apache_beam.runners.interactive import interactive_environment as ie + + +class CacheableTest(unittest.TestCase): + def setUp(self): + ie.new_env() + + def test_find_all_cacheables(self): + p = beam.Pipeline() + cacheable_pcoll_1 = p | beam.Create([1, 2, 3]) + cacheable_pcoll_2 = cacheable_pcoll_1 | beam.Map(lambda x: x * x) + ib.watch(locals()) + + aug_p = ap.AugmentedPipeline(p) + cacheables = aug_p.cacheables() + self.assertIn(cacheable_pcoll_1, cacheables) + self.assertIn(cacheable_pcoll_2, cacheables) + + def test_ignore_cacheables(self): + p = beam.Pipeline() + cacheable_pcoll_1 = p | 'cacheable_pcoll_1' >> beam.Create([1, 2, 3]) + cacheable_pcoll_2 = p | 'cacheable_pcoll_2' >> beam.Create([4, 5, 6]) + ib.watch(locals()) + + aug_p = ap.AugmentedPipeline(p, (cacheable_pcoll_1, )) + cacheables = aug_p.cacheables() + self.assertIn(cacheable_pcoll_1, cacheables) + self.assertNotIn(cacheable_pcoll_2, cacheables) + + def test_ignore_pcoll_from_other_pipeline(self): + p = beam.Pipeline() + p2 = beam.Pipeline() + cacheable_from_p2 = p2 | beam.Create([1, 2, 3]) + ib.watch(locals()) + + aug_p = ap.AugmentedPipeline(p) + cacheables = aug_p.cacheables() + self.assertNotIn(cacheable_from_p2, cacheables) + + +class AugmentTest(unittest.TestCase): + def setUp(self): + ie.new_env() + + def test_error_when_pcolls_from_mixed_pipelines(self): + p = beam.Pipeline() + cacheable_from_p = p | beam.Create([1, 2, 3]) + p2 = beam.Pipeline() + cacheable_from_p2 = p2 | beam.Create([1, 2, 3]) + ib.watch(locals()) + + self.assertRaises( + AssertionError, + lambda: ap.AugmentedPipeline(p, (cacheable_from_p, cacheable_from_p2))) + + +if __name__ == '__main__': + unittest.main() diff --git a/sdks/python/apache_beam/runners/interactive/background_caching_job.py b/sdks/python/apache_beam/runners/interactive/background_caching_job.py index 9cbb9e8cc0df..c08c5d637305 100644 --- a/sdks/python/apache_beam/runners/interactive/background_caching_job.py +++ b/sdks/python/apache_beam/runners/interactive/background_caching_job.py @@ -38,8 +38,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import threading import time @@ -140,7 +138,8 @@ def attempt_to_run_background_caching_job( # pipeline_instrument module to this module and aggregate tests. 
from apache_beam.runners.interactive import pipeline_instrument as instr runner_pipeline = beam.pipeline.Pipeline.from_runner_api( - user_pipeline.to_runner_api(use_fake_coders=True), runner, options) + user_pipeline.to_runner_api(), runner, options) + ie.current_env().add_derived_pipeline(user_pipeline, runner_pipeline) background_caching_job_result = beam.pipeline.Pipeline.from_runner_api( instr.build_pipeline_instrument( runner_pipeline).background_caching_pipeline_proto(), @@ -231,11 +230,13 @@ def has_source_to_cache(user_pipeline): create_if_absent=True), streaming_cache.StreamingCache): + file_based_cm = ie.current_env().get_cache_manager(user_pipeline) ie.current_env().set_cache_manager( streaming_cache.StreamingCache( - ie.current_env().get_cache_manager(user_pipeline)._cache_dir, + file_based_cm._cache_dir, is_cache_complete=is_cache_complete, - sample_resolution_sec=1.0), + sample_resolution_sec=1.0, + saved_pcoders=file_based_cm._saved_pcoders), user_pipeline) return has_cache @@ -337,8 +338,7 @@ def extract_source_to_cache_signature(user_pipeline): user_pipeline) unbounded_sources_as_ptransforms = set( map(lambda x: x.transform, unbounded_sources_as_applied_transforms)) - _, context = user_pipeline.to_runner_api( - return_context=True, use_fake_coders=True) + _, context = user_pipeline.to_runner_api(return_context=True) signature = set( map( lambda transform: str(transform.to_runner_api(context)), diff --git a/sdks/python/apache_beam/runners/interactive/background_caching_job_test.py b/sdks/python/apache_beam/runners/interactive/background_caching_job_test.py index cfba3f288763..aef2f768237e 100644 --- a/sdks/python/apache_beam/runners/interactive/background_caching_job_test.py +++ b/sdks/python/apache_beam/runners/interactive/background_caching_job_test.py @@ -18,10 +18,8 @@ """Tests for apache_beam.runners.interactive.background_caching_job.""" # pytype: skip-file -from __future__ import absolute_import - -import sys import unittest +from unittest.mock import patch import apache_beam as beam from apache_beam.options.pipeline_options import PipelineOptions @@ -38,13 +36,6 @@ from apache_beam.testing.test_stream_service import TestStreamServiceController from apache_beam.transforms.window import TimestampedValue -# TODO(BEAM-8288): clean up the work-around of nose tests using Python2 without -# unittest.mock module. 
-try: - from unittest.mock import patch -except ImportError: - from mock import patch # type: ignore[misc] - _FOO_PUBSUB_SUB = 'projects/test-project/subscriptions/foo' _BAR_PUBSUB_SUB = 'projects/test-project/subscriptions/bar' _TEST_CACHE_KEY = 'test' @@ -85,8 +76,6 @@ def _setup_test_streaming_cache(pipeline): @unittest.skipIf( not ie.current_env().is_interactive_ready, '[interactive] dependency is not installed.') -@unittest.skipIf( - sys.version_info < (3, 6), 'The tests require at least Python 3.6 to work.') class BackgroundCachingJobTest(unittest.TestCase): def tearDown(self): ie.new_env() diff --git a/sdks/python/apache_beam/runners/interactive/cache_manager.py b/sdks/python/apache_beam/runners/interactive/cache_manager.py index d168a7641b14..9ed0b25fd934 100644 --- a/sdks/python/apache_beam/runners/interactive/cache_manager.py +++ b/sdks/python/apache_beam/runners/interactive/cache_manager.py @@ -17,15 +17,11 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import collections import os -import sys import tempfile -import urllib +from urllib.parse import quote +from urllib.parse import unquote_to_bytes import apache_beam as beam from apache_beam import coders @@ -34,13 +30,6 @@ from apache_beam.io import tfrecordio from apache_beam.transforms import combiners -if sys.version_info[0] > 2: - unquote_to_bytes = urllib.parse.unquote_to_bytes - quote = urllib.parse.quote -else: - unquote_to_bytes = urllib.unquote # pylint: disable=deprecated-urllib-function - quote = urllib.quote # pylint: disable=deprecated-urllib-function - class CacheManager(object): """Abstract class for caching PCollections. @@ -176,7 +165,7 @@ def __init__(self, cache_dir=None, cache_format='text'): self._cache_dir = cache_dir else: self._cache_dir = tempfile.mkdtemp( - prefix='it-', dir=os.environ.get('TEST_TMPDIR', None)) + prefix='ib-', dir=os.environ.get('TEST_TMPDIR', None)) self._versions = collections.defaultdict(lambda: self._CacheVersion()) self.cache_format = cache_format @@ -218,9 +207,11 @@ def save_pcoder(self, pcoder, *labels): self._saved_pcoders[self._path(*labels)] = pcoder def load_pcoder(self, *labels): - return ( - self._default_pcoder if self._default_pcoder is not None else - self._saved_pcoders[self._path(*labels)]) + saved_pcoder = self._saved_pcoders.get(self._path(*labels), None) + if saved_pcoder is None or isinstance(saved_pcoder, + coders.FastPrimitivesCoder): + return self._default_pcoder + return saved_pcoder def read(self, *labels, **args): # Return an iterator to an empty list if it doesn't exist. @@ -236,11 +227,22 @@ def read(self, *labels, **args): return reader, version def write(self, values, *labels): - sink = self.sink(labels)._sink - path = self._path(*labels) + """Imitates how a WriteCache tranform works without running a pipeline. - init_result = sink.initialize_write() - writer = sink.open_writer(init_result, path) + For testing and cache manager development, not for production usage because + the write is not sharded and does not use Beam execution model. + """ + pcoder = coders.registry.get_coder(type(values[0])) + # Save the pcoder for the actual labels. + self.save_pcoder(pcoder, *labels) + single_shard_labels = [*labels[:-1], '-00000-of-00001'] + # Save the pcoder for the labels that imitates the sharded cache file name + # suffix. + self.save_pcoder(pcoder, *single_shard_labels) + # Put a '-%05d-of-%05d' suffix to the cache file. 
+ sink = self.sink(single_shard_labels)._sink + path = self._path(*labels[:-1]) + writer = sink.open_writer(path, labels[-1]) for v in values: writer.write(v) writer.close() @@ -265,7 +267,7 @@ def cleanup(self): self._saved_pcoders = {} def _glob_path(self, *labels): - return self._path(*labels) + '-*-of-*' + return self._path(*labels) + '*-*-of-*' def _path(self, *labels): return filesystems.FileSystems.join(self._cache_dir, *labels) diff --git a/sdks/python/apache_beam/runners/interactive/cache_manager_test.py b/sdks/python/apache_beam/runners/interactive/cache_manager_test.py index 12c644ca83dc..d6319b65f750 100644 --- a/sdks/python/apache_beam/runners/interactive/cache_manager_test.py +++ b/sdks/python/apache_beam/runners/interactive/cache_manager_test.py @@ -17,18 +17,11 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os -import shutil -import tempfile import time import unittest +import apache_beam as beam from apache_beam import coders -from apache_beam.io import filesystems from apache_beam.runners.interactive import cache_manager as cache @@ -46,36 +39,20 @@ class FileBasedCacheManagerTest(object): cache_format = None # type: str def setUp(self): - self.test_dir = tempfile.mkdtemp() self.cache_manager = cache.FileBasedCacheManager( - self.test_dir, cache_format=self.cache_format) + cache_format=self.cache_format) def tearDown(self): - # The test_dir might have already been removed by cache_manager.cleanup(). - if os.path.exists(self.test_dir): - shutil.rmtree(self.test_dir) + self.cache_manager.cleanup() - def mock_write_cache(self, pcoll_list, prefix, cache_label): + def mock_write_cache(self, values, prefix, cache_label): """Cache the PCollection where cache.WriteCache would write to.""" - cache_path = filesystems.FileSystems.join( - self.cache_manager._cache_dir, prefix) - if not filesystems.FileSystems.exists(cache_path): - filesystems.FileSystems.mkdirs(cache_path) - # Pause for 0.1 sec, because the Jenkins test runs so fast that the file # writes happen at the same timestamp. time.sleep(0.1) - cache_file = cache_label + '-1-of-2' labels = [prefix, cache_label] - - # Usually, the pcoder will be inferred from `pcoll.element_type` - pcoder = coders.registry.get_coder(object) - # Save a pcoder for reading. - self.cache_manager.save_pcoder(pcoder, *labels) - # Save a pcoder for the fake write to the file. 
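# A minimal sketch, assuming only what this change shows, of how the
# FileBasedCacheManager.load_pcoder fallback introduced above is expected to
# behave; the manager instance, labels and coders here are illustrative
# assumptions, not names used elsewhere in this change.
from apache_beam import coders
from apache_beam.runners.interactive import cache_manager as cache

cm = cache.FileBasedCacheManager(cache_format='text')
# Nothing saved for these labels yet: fall back to the manager's default pcoder.
assert cm.load_pcoder('full', 'a_key') is cm._default_pcoder
# A plain FastPrimitivesCoder is treated the same as "nothing saved".
cm.save_pcoder(coders.FastPrimitivesCoder(), 'full', 'a_key')
assert cm.load_pcoder('full', 'a_key') is cm._default_pcoder
# Any more specific saved coder takes precedence over the default.
cm.save_pcoder(coders.VarIntCoder(), 'full', 'a_key')
assert isinstance(cm.load_pcoder('full', 'a_key'), coders.VarIntCoder)
cm.cleanup()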
- self.cache_manager.save_pcoder(pcoder, prefix, cache_file) - self.cache_manager.write(pcoll_list, prefix, cache_file) + self.cache_manager.write(values, *labels) def test_exists(self): """Test that CacheManager can correctly tell if the cache exists or not.""" @@ -218,6 +195,14 @@ def test_read_over_cleanup(self): self.assertTrue( self.cache_manager.is_latest_version(version, prefix, cache_label)) + def test_load_saved_pcoder(self): + pipeline = beam.Pipeline() + pcoll = pipeline | beam.Create([1, 2, 3]) + _ = pcoll | cache.WriteCache(self.cache_manager, 'a key') + self.assertIs( + type(self.cache_manager.load_pcoder('full', 'a key')), + type(coders.registry.get_coder(int))) + class TextFileBasedCacheManagerTest( FileBasedCacheManagerTest, diff --git a/sdks/python/apache_beam/runners/interactive/caching/__init__.py b/sdks/python/apache_beam/runners/interactive/caching/__init__.py index dc60d93f883f..cce3acad34a4 100644 --- a/sdks/python/apache_beam/runners/interactive/caching/__init__.py +++ b/sdks/python/apache_beam/runners/interactive/caching/__init__.py @@ -14,6 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. # -from __future__ import absolute_import - -from apache_beam.runners.interactive.caching.streaming_cache import StreamingCache diff --git a/sdks/python/apache_beam/runners/interactive/caching/cacheable.py b/sdks/python/apache_beam/runners/interactive/caching/cacheable.py new file mode 100644 index 000000000000..96663a70a148 --- /dev/null +++ b/sdks/python/apache_beam/runners/interactive/caching/cacheable.py @@ -0,0 +1,76 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Module for dataclasses to hold metadata of cacheable PCollections in the user +code scope. + +For internal use only; no backwards-compatibility guarantees. +""" + +# pytype: skip-file + +from dataclasses import dataclass + +import apache_beam as beam +from apache_beam.runners.interactive.utils import obfuscate + + +@dataclass +class Cacheable: + pcoll_id: str + var: str + version: str + pcoll: beam.pvalue.PCollection + producer_version: str + + def __hash__(self): + return hash(( + self.pcoll_id, + self.var, + self.version, + self.pcoll, + self.producer_version)) + + def to_key(self): + return CacheKey( + self.var, + self.version, + self.producer_version, + str(id(self.pcoll.pipeline))) + + +@dataclass +class CacheKey: + var: str + version: str + producer_version: str + pipeline_id: str + + def __post_init__(self): + # Normalize arbitrary variable name to a fixed length hex str. 
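# A short sketch of how the Cacheable/CacheKey dataclasses in this file are
# meant to round-trip through their string form, relying on from_str and
# __repr__ defined just below; the concrete values are illustrative assumptions.
from apache_beam.runners.interactive.caching.cacheable import CacheKey

key = CacheKey('squares', '0', '0', '1234')
key_str = repr(key)  # e.g. 'a1b2c3d4e5-0-0-1234' (var becomes a 10-char hex digest)
assert CacheKey.from_str(key_str).var == key.var
assert repr(CacheKey.from_str(key_str)) == key_str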
+ self.var = obfuscate(self.var)[:10] + + @staticmethod + def from_str(r): + r_split = r.split('-') + ck = CacheKey(*r_split) + ck.var = r_split[0] + return ck + + def __repr__(self): + return '-'.join( + [self.var, self.version, self.producer_version, self.pipeline_id]) diff --git a/sdks/python/apache_beam/runners/interactive/caching/expression_cache.py b/sdks/python/apache_beam/runners/interactive/caching/expression_cache.py new file mode 100644 index 000000000000..5b1b9effe5c5 --- /dev/null +++ b/sdks/python/apache_beam/runners/interactive/caching/expression_cache.py @@ -0,0 +1,109 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from typing import * + +import apache_beam as beam +from apache_beam.dataframe import convert +from apache_beam.dataframe import expressions + + +class ExpressionCache(object): + """Utility class for caching deferred DataFrames expressions. + + This is cache is currently a light-weight wrapper around the + TO_PCOLLECTION_CACHE in the beam.dataframes.convert module and the + computed_pcollections in the interactive module. + + Example:: + + df : beam.dataframe.DeferredDataFrame = ... + ... + cache = ExpressionCache() + cache.replace_with_cached(df._expr) + + This will automatically link the instance to the existing caches. After it is + created, the cache can then be used to modify an existing deferred dataframe + expression tree to replace nodes with computed PCollections. + + This object can be created and destroyed whenever. This class holds no state + and the only side-effect is modifying the given expression. + """ + def __init__(self, pcollection_cache=None, computed_cache=None): + from apache_beam.runners.interactive import interactive_environment as ie + + self._pcollection_cache = ( + convert.TO_PCOLLECTION_CACHE + if pcollection_cache is None else pcollection_cache) + self._computed_cache = ( + ie.current_env().computed_pcollections + if computed_cache is None else computed_cache) + + def replace_with_cached( + self, expr: expressions.Expression) -> Dict[str, expressions.Expression]: + """Replaces any previously computed expressions with PlaceholderExpressions. + + This is used to correctly read any expressions that were cached in previous + runs. This enables the InteractiveRunner to prune off old calculations from + the expression tree. + """ + + replaced_inputs: Dict[str, expressions.Expression] = {} + self._replace_with_cached_recur(expr, replaced_inputs) + return replaced_inputs + + def _replace_with_cached_recur( + self, + expr: expressions.Expression, + replaced_inputs: Dict[str, expressions.Expression]) -> None: + """Recursive call for `replace_with_cached`. + + Recurses through the expression tree and replaces any cached inputs with + `PlaceholderExpression`s. 
+ """ + + final_inputs = [] + + for input in expr.args(): + pc = self._get_cached(input) + + # Only read from cache when there is the PCollection has been fully + # computed. This is so that no partial results are used. + if self._is_computed(pc): + + # Reuse previously seen cached expressions. This is so that the same + # value isn't cached multiple times. + if input._id in replaced_inputs: + cached = replaced_inputs[input._id] + else: + cached = expressions.PlaceholderExpression( + input.proxy(), self._pcollection_cache[input._id]) + + replaced_inputs[input._id] = cached + final_inputs.append(cached) + else: + final_inputs.append(input) + self._replace_with_cached_recur(input, replaced_inputs) + expr._args = tuple(final_inputs) + + def _get_cached(self, + expr: expressions.Expression) -> Optional[beam.PCollection]: + """Returns the PCollection associated with the expression.""" + return self._pcollection_cache.get(expr._id, None) + + def _is_computed(self, pc: beam.PCollection) -> bool: + """Returns True if the PCollection has been run and computed.""" + return pc is not None and pc in self._computed_cache diff --git a/sdks/python/apache_beam/runners/interactive/caching/expression_cache_test.py b/sdks/python/apache_beam/runners/interactive/caching/expression_cache_test.py new file mode 100644 index 000000000000..c6e46f3cc3ff --- /dev/null +++ b/sdks/python/apache_beam/runners/interactive/caching/expression_cache_test.py @@ -0,0 +1,128 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import unittest + +import apache_beam as beam +from apache_beam.dataframe import expressions +from apache_beam.runners.interactive.caching.expression_cache import ExpressionCache + + +class ExpressionCacheTest(unittest.TestCase): + def setUp(self): + self._pcollection_cache = {} + self._computed_cache = set() + self._pipeline = beam.Pipeline() + self.cache = ExpressionCache(self._pcollection_cache, self._computed_cache) + + def create_trace(self, expr): + trace = [expr] + for input in expr.args(): + trace += self.create_trace(input) + return trace + + def mock_cache(self, expr): + pcoll = beam.PCollection(self._pipeline) + self._pcollection_cache[expr._id] = pcoll + self._computed_cache.add(pcoll) + + def assertTraceTypes(self, expr, expected): + actual_types = [type(e).__name__ for e in self.create_trace(expr)] + expected_types = [e.__name__ for e in expected] + self.assertListEqual(actual_types, expected_types) + + def test_only_replaces_cached(self): + in_expr = expressions.ConstantExpression(0) + comp_expr = expressions.ComputedExpression('test', lambda x: x, [in_expr]) + + # Expect that no replacement of expressions is performed. 
+ expected_trace = [ + expressions.ComputedExpression, expressions.ConstantExpression + ] + self.assertTraceTypes(comp_expr, expected_trace) + + self.cache.replace_with_cached(comp_expr) + + self.assertTraceTypes(comp_expr, expected_trace) + + # Now "cache" the expression and assert that the cached expression was + # replaced with a placeholder. + self.mock_cache(in_expr) + + replaced = self.cache.replace_with_cached(comp_expr) + + expected_trace = [ + expressions.ComputedExpression, expressions.PlaceholderExpression + ] + self.assertTraceTypes(comp_expr, expected_trace) + self.assertIn(in_expr._id, replaced) + + def test_only_replaces_inputs(self): + arg_0_expr = expressions.ConstantExpression(0) + ident_val = expressions.ComputedExpression( + 'ident', lambda x: x, [arg_0_expr]) + + arg_1_expr = expressions.ConstantExpression(1) + comp_expr = expressions.ComputedExpression( + 'add', lambda x, y: x + y, [ident_val, arg_1_expr]) + + self.mock_cache(ident_val) + + replaced = self.cache.replace_with_cached(comp_expr) + + # Assert that ident_val was replaced and that its arguments were removed + # from the expression tree. + expected_trace = [ + expressions.ComputedExpression, + expressions.PlaceholderExpression, + expressions.ConstantExpression + ] + self.assertTraceTypes(comp_expr, expected_trace) + self.assertIn(ident_val._id, replaced) + self.assertNotIn(arg_0_expr, self.create_trace(comp_expr)) + + def test_only_caches_same_input(self): + arg_0_expr = expressions.ConstantExpression(0) + ident_val = expressions.ComputedExpression( + 'ident', lambda x: x, [arg_0_expr]) + comp_expr = expressions.ComputedExpression( + 'add', lambda x, y: x + y, [ident_val, arg_0_expr]) + + self.mock_cache(arg_0_expr) + + replaced = self.cache.replace_with_cached(comp_expr) + + # Assert that arg_0_expr, being an input to two computations, was replaced + # with the same placeholder expression. + expected_trace = [ + expressions.ComputedExpression, + expressions.ComputedExpression, + expressions.PlaceholderExpression, + expressions.PlaceholderExpression + ] + actual_trace = self.create_trace(comp_expr) + unique_placeholders = set( + t for t in actual_trace + if isinstance(t, expressions.PlaceholderExpression)) + self.assertTraceTypes(comp_expr, expected_trace) + self.assertTrue( + all(e == replaced[arg_0_expr._id] for e in unique_placeholders)) + self.assertIn(arg_0_expr._id, replaced) + + +if __name__ == '__main__': + unittest.main() diff --git a/sdks/python/apache_beam/runners/interactive/caching/read_cache.py b/sdks/python/apache_beam/runners/interactive/caching/read_cache.py new file mode 100644 index 000000000000..b23681d4b43f --- /dev/null +++ b/sdks/python/apache_beam/runners/interactive/caching/read_cache.py @@ -0,0 +1,146 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +"""Module to read cache of computed PCollections. + +For internal use only; no backward-compatibility guarantees. +""" +# pytype: skip-file + +from typing import Tuple + +import apache_beam as beam +from apache_beam.portability.api import beam_runner_api_pb2 +from apache_beam.runners.interactive import cache_manager as cache +from apache_beam.runners.interactive.caching.cacheable import Cacheable +from apache_beam.runners.pipeline_context import PipelineContext +from apache_beam.transforms.ptransform import PTransform + + +class ReadCache: + """Class that facilitates reading cache of computed PCollections. + """ + def __init__( + self, + pipeline: beam_runner_api_pb2.Pipeline, + context: PipelineContext, + cache_manager: cache.CacheManager, + cacheable: Cacheable): + self._pipeline = pipeline + self._context = context + self._cache_manager = cache_manager + self._cacheable = cacheable + self._key = repr(cacheable.to_key()) + self._label = '{}{}'.format('_cache_', self._key) + + def read_cache(self) -> Tuple[str, str]: + """Reads cache of the cacheable PCollection and wires the cache into the + pipeline proto. Returns the pipeline-scoped ids of the cacheable PCollection + and the cache reading output PCollection that replaces it. + + First, it creates a temporary pipeline instance on top of the existing + component_id_map from the self._pipeline's context so that both pipelines + share the context and have no conflict component ids. + Second, it instantiates a _ReadCacheTransform to build the temporary + pipeline with a subgraph under top level transforms that reads the cache of + a cacheable PCollection. + Third, it copies components of the subgraph from the temporary pipeline to + self._pipeline, skipping components that are not in the temporary pipeline + but presents in the component_id_map of self._pipeline. Since to_runner_api + generates components for all entries in the component_id_map, those + component ids from the context shared by self._pipeline need to be ignored. + Last, it replaces inputs of all transforms that consume the cacheable + PCollection with the output PCollection of the _ReadCacheTransform so that + the whole pipeline computes with data from the cache. The pipeline + fragment of reading the cacheable PCollection is now disconnected from the + rest of the pipeline and can be pruned later. + """ + template, read_output = self._build_runner_api_template() + output_id = self._context.pcollections.get_id(read_output) + source_id = self._context.pcollections.get_id(self._cacheable.pcoll) + # Copy cache reading subgraph from the template to the pipeline proto. 
+ for pcoll_id in template.components.pcollections: + if pcoll_id in self._pipeline.components.pcollections: + continue + self._pipeline.components.pcollections[pcoll_id].CopyFrom( + template.components.pcollections[pcoll_id]) + for coder_id in template.components.coders: + if coder_id in self._pipeline.components.coders: + continue + self._pipeline.components.coders[coder_id].CopyFrom( + template.components.coders[coder_id]) + for windowing_strategy_id in template.components.windowing_strategies: + if (windowing_strategy_id in + self._pipeline.components.windowing_strategies): + continue + self._pipeline.components.windowing_strategies[ + windowing_strategy_id].CopyFrom( + template.components.windowing_strategies[windowing_strategy_id]) + template_root_transform_id = template.root_transform_ids[0] + root_transform_id = self._pipeline.root_transform_ids[0] + for transform_id in template.components.transforms: + if (transform_id == template_root_transform_id or + transform_id in self._pipeline.components.transforms): + continue + self._pipeline.components.transforms[transform_id].CopyFrom( + template.components.transforms[transform_id]) + self._pipeline.components.transforms[ + root_transform_id].subtransforms.extend( + template.components.transforms[template_root_transform_id]. + subtransforms) + + # Replace all the input pcoll of source_id with output pcoll of output_id + # from cache reading. + for transform in self._pipeline.components.transforms.values(): + inputs = transform.inputs + if source_id in inputs.values(): + keys_need_replacement = set() + for key in inputs: + if inputs[key] == source_id: + keys_need_replacement.add(key) + for key in keys_need_replacement: + inputs[key] = output_id + + return source_id, output_id + + def _build_runner_api_template( + self) -> Tuple[beam_runner_api_pb2.Pipeline, beam.pvalue.PCollection]: + transform = _ReadCacheTransform(self._cache_manager, self._key, self._label) + tmp_pipeline = beam.Pipeline() + tmp_pipeline.component_id_map = self._context.component_id_map + read_output = tmp_pipeline | 'source' + self._label >> transform + return tmp_pipeline.to_runner_api(), read_output + + +class _ReadCacheTransform(PTransform): + """A composite transform encapsulates reading cache of PCollections. + """ + def __init__(self, cache_manager: cache.CacheManager, key: str, label: str): + self._cache_manager = cache_manager + self._key = key + self._label = label + + def expand(self, pcoll: beam.pvalue.PCollection) -> beam.pvalue.PCollection: + class Unreify(beam.DoFn): + def process(self, e): + yield e.windowed_value + + return ( + pcoll.pipeline + | + 'read' + self._label >> cache.ReadCache(self._cache_manager, self._key) + | 'unreify' + self._label >> beam.ParDo(Unreify())) diff --git a/sdks/python/apache_beam/runners/interactive/caching/read_cache_test.py b/sdks/python/apache_beam/runners/interactive/caching/read_cache_test.py new file mode 100644 index 000000000000..aa2ed202b6cc --- /dev/null +++ b/sdks/python/apache_beam/runners/interactive/caching/read_cache_test.py @@ -0,0 +1,89 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Tests for read_cache.""" +# pytype: skip-file + +import unittest +from unittest.mock import patch + +import apache_beam as beam +from apache_beam.runners.interactive import augmented_pipeline as ap +from apache_beam.runners.interactive import interactive_beam as ib +from apache_beam.runners.interactive import interactive_environment as ie +from apache_beam.runners.interactive.caching import read_cache +from apache_beam.runners.interactive.testing.pipeline_assertion import assert_pipeline_proto_equal +from apache_beam.runners.interactive.testing.test_cache_manager import InMemoryCache + + +class ReadCacheTest(unittest.TestCase): + def setUp(self): + ie.new_env() + + @patch( + 'apache_beam.runners.interactive.interactive_environment' + '.InteractiveEnvironment.get_cache_manager') + def test_read_cache(self, mocked_get_cache_manager): + p = beam.Pipeline() + pcoll = p | beam.Create([1, 2, 3]) + consumer_transform = beam.Map(lambda x: x * x) + _ = pcoll | consumer_transform + ib.watch(locals()) + + # Create the cache in memory. + cache_manager = InMemoryCache() + mocked_get_cache_manager.return_value = cache_manager + aug_p = ap.AugmentedPipeline(p) + key = repr(aug_p._cacheables[pcoll].to_key()) + cache_manager.write('test', 'full', key) + + # Capture the applied transform of the consumer_transform. + pcoll_id = aug_p._context.pcollections.get_id(pcoll) + consumer_transform_id = None + pipeline_proto = p.to_runner_api() + for (transform_id, + transform) in pipeline_proto.components.transforms.items(): + if pcoll_id in transform.inputs.values(): + consumer_transform_id = transform_id + break + self.assertIsNotNone(consumer_transform_id) + + # Read cache on the pipeline proto. + _, cache_id = read_cache.ReadCache( + pipeline_proto, aug_p._context, aug_p._cache_manager, + aug_p._cacheables[pcoll]).read_cache() + actual_pipeline = pipeline_proto + + # Read cache directly on the pipeline instance. + label = '{}{}'.format('_cache_', key) + transform = read_cache._ReadCacheTransform(aug_p._cache_manager, key, label) + p | 'source' + label >> transform + expected_pipeline = p.to_runner_api() + + # This rougly checks the equivalence between two protos, not detailed + # wiring in sub transforms under top level transforms. + assert_pipeline_proto_equal(self, expected_pipeline, actual_pipeline) + + # Check if the actual_pipeline uses cache as input of the + # consumer_transform instead of the original pcoll from source. 
+ inputs = actual_pipeline.components.transforms[consumer_transform_id].inputs + self.assertIn(cache_id, inputs.values()) + self.assertNotIn(pcoll_id, inputs.values()) + + +if __name__ == '__main__': + unittest.main() diff --git a/sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py b/sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py index 52e8388d5f66..fc8a8aa44768 100644 --- a/sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py +++ b/sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py @@ -17,8 +17,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import os import shutil @@ -26,27 +24,25 @@ import time import traceback from collections import OrderedDict +# We don't have an explicit pathlib dependency because this code only works with +# the interactive target installed which has an indirect dependency on pathlib +# through ipython>=5.9.0. +from pathlib import Path from google.protobuf.message import DecodeError import apache_beam as beam +from apache_beam import coders from apache_beam.portability.api.beam_interactive_api_pb2 import TestStreamFileHeader from apache_beam.portability.api.beam_interactive_api_pb2 import TestStreamFileRecord from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload from apache_beam.runners.interactive.cache_manager import CacheManager from apache_beam.runners.interactive.cache_manager import SafeFastPrimitivesCoder +from apache_beam.runners.interactive.caching.cacheable import CacheKey from apache_beam.testing.test_stream import OutputFormat from apache_beam.testing.test_stream import ReverseTestStream from apache_beam.utils import timestamp -# We don't have an explicit pathlib dependency because this code only works with -# the interactive target installed which has an indirect dependency on pathlib -# and pathlib2 through ipython>=5.9.0. -try: - from pathlib import Path -except ImportError: - from pathlib2 import Path # python 2 backport - _LOGGER = logging.getLogger(__name__) @@ -160,8 +156,6 @@ def __init__(self, cache_dir, labels, is_cache_complete=None, coder=None): self._labels = labels self._path = os.path.join(self._cache_dir, *self._labels) self._is_cache_complete = is_cache_complete - - from apache_beam.runners.interactive.pipeline_instrument import CacheKey self._pipeline_id = CacheKey.from_str(labels[-1]).pipeline_id def _wait_until_file_exists(self, timeout_secs=30): @@ -172,7 +166,6 @@ def _wait_until_file_exists(self, timeout_secs=30): while not os.path.exists(self._path): time.sleep(1) if time.time() - start > timeout_secs: - from apache_beam.runners.interactive.pipeline_instrument import CacheKey pcollection_var = CacheKey.from_str(self._labels[-1]).var raise RuntimeError( 'Timed out waiting for cache file for PCollection `{}` to be ' @@ -246,7 +239,11 @@ class StreamingCache(CacheManager): """Abstraction that holds the logic for reading and writing to cache. 
""" def __init__( - self, cache_dir, is_cache_complete=None, sample_resolution_sec=0.1): + self, + cache_dir, + is_cache_complete=None, + sample_resolution_sec=0.1, + saved_pcoders=None): self._sample_resolution_sec = sample_resolution_sec self._is_cache_complete = is_cache_complete @@ -254,7 +251,7 @@ def __init__( self._cache_dir = cache_dir else: self._cache_dir = tempfile.mkdtemp( - prefix='interactive-temp-', dir=os.environ.get('TEST_TMPDIR', None)) + prefix='ib-', dir=os.environ.get('TEST_TMPDIR', None)) # List of saved pcoders keyed by PCollection path. It is OK to keep this # list in memory because once FileBasedCacheManager object is @@ -266,7 +263,7 @@ def __init__( # However, if we are to implement better cache persistence, one needs # to take care of keeping consistency between the cached PCollection # and its PCoder type. - self._saved_pcoders = {} + self._saved_pcoders = saved_pcoders or {} self._default_pcoder = SafeFastPrimitivesCoder() # The sinks to capture data from capturable sources. @@ -306,7 +303,10 @@ def read(self, *labels, **args): return iter([]), -1 reader = StreamingCacheSource( - self._cache_dir, labels, self._is_cache_complete).read(tail=tail) + self._cache_dir, + labels, + self._is_cache_complete, + self.load_pcoder(*labels)).read(tail=tail) # Return an empty iterator if there is nothing in the file yet. This can # only happen when tail is False. @@ -324,9 +324,9 @@ def read_multiple(self, labels, tail=True): pipeline runtime which needs to block. """ readers = [ - StreamingCacheSource(self._cache_dir, l, - self._is_cache_complete).read(tail=tail) - for l in labels + StreamingCacheSource( + self._cache_dir, l, self._is_cache_complete, + self.load_pcoder(*l)).read(tail=tail) for l in labels ] headers = [next(r) for r in readers] return StreamingCache.Reader(headers, readers).read() @@ -343,8 +343,10 @@ def write(self, values, *labels): if isinstance(v, (TestStreamFileHeader, TestStreamFileRecord)): val = v.SerializeToString() else: - val = v - f.write(self._default_pcoder.encode(val) + b'\n') + raise TypeError( + 'Values given to streaming cache should be either ' + 'TestStreamFileHeader or TestStreamFileRecord.') + f.write(self.load_pcoder(*labels).encode(val) + b'\n') def clear(self, *labels): directory = os.path.join(self._cache_dir, *labels[:-1]) @@ -372,23 +374,39 @@ def sink(self, labels, is_capture=False): """ filename = labels[-1] cache_dir = os.path.join(self._cache_dir, *labels[:-1]) - sink = StreamingCacheSink(cache_dir, filename, self._sample_resolution_sec) + sink = StreamingCacheSink( + cache_dir, + filename, + self._sample_resolution_sec, + self.load_pcoder(*labels)) if is_capture: self._capture_sinks[sink.path] = sink self._capture_keys.add(filename) return sink def save_pcoder(self, pcoder, *labels): - self._saved_pcoders[os.path.join(*labels)] = pcoder + self._saved_pcoders[os.path.join(self._cache_dir, *labels)] = pcoder def load_pcoder(self, *labels): - return ( - self._default_pcoder if self._default_pcoder is not None else - self._saved_pcoders[os.path.join(*labels)]) + saved_pcoder = self._saved_pcoders.get( + os.path.join(self._cache_dir, *labels), None) + if saved_pcoder is None or isinstance(saved_pcoder, + coders.FastPrimitivesCoder): + return self._default_pcoder + return saved_pcoder def cleanup(self): + if os.path.exists(self._cache_dir): - shutil.rmtree(self._cache_dir) + + def on_fail_to_cleanup(function, path, excinfo): + _LOGGER.warning( + 'Failed to clean up temporary files: %s. You may' + 'manually delete them if necessary. 
Error was: %s', + path, + excinfo) + + shutil.rmtree(self._cache_dir, onerror=on_fail_to_cleanup) self._saved_pcoders = {} self._capture_sinks = {} self._capture_keys = set() diff --git a/sdks/python/apache_beam/runners/interactive/caching/streaming_cache_test.py b/sdks/python/apache_beam/runners/interactive/caching/streaming_cache_test.py index 2238e0d2b94e..6b811026b98b 100644 --- a/sdks/python/apache_beam/runners/interactive/caching/streaming_cache_test.py +++ b/sdks/python/apache_beam/runners/interactive/caching/streaming_cache_test.py @@ -17,8 +17,6 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest from apache_beam import coders @@ -27,8 +25,8 @@ from apache_beam.portability.api.beam_interactive_api_pb2 import TestStreamFileRecord from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload from apache_beam.runners.interactive.cache_manager import SafeFastPrimitivesCoder +from apache_beam.runners.interactive.caching.cacheable import CacheKey from apache_beam.runners.interactive.caching.streaming_cache import StreamingCache -from apache_beam.runners.interactive.pipeline_instrument import CacheKey from apache_beam.runners.interactive.testing.test_cache_manager import FileRecordsBuilder from apache_beam.testing.test_pipeline import TestPipeline from apache_beam.testing.test_stream import TestStream @@ -429,6 +427,25 @@ def test_read_and_write_multiple_outputs(self): self.assertListEqual(actual_events, expected_events) + def test_always_default_coder_for_test_stream_records(self): + CACHED_NUMBERS = repr(CacheKey('numbers', '', '', '')) + numbers = (FileRecordsBuilder(CACHED_NUMBERS) + .advance_processing_time(2) + .add_element(element=1, event_time_secs=0) + .advance_processing_time(1) + .add_element(element=2, event_time_secs=0) + .advance_processing_time(1) + .add_element(element=2, event_time_secs=0) + .build()) # yapf: disable + cache = StreamingCache(cache_dir=None) + cache.write(numbers, CACHED_NUMBERS) + self.assertIs( + type(cache.load_pcoder(CACHED_NUMBERS)), type(cache._default_pcoder)) + + def test_streaming_cache_does_not_write_non_record_or_header_types(self): + cache = StreamingCache(cache_dir=None) + self.assertRaises(TypeError, cache.write, 'some value', 'a key') + if __name__ == '__main__': unittest.main() diff --git a/sdks/python/apache_beam/runners/interactive/caching/write_cache.py b/sdks/python/apache_beam/runners/interactive/caching/write_cache.py new file mode 100644 index 000000000000..94effdf44453 --- /dev/null +++ b/sdks/python/apache_beam/runners/interactive/caching/write_cache.py @@ -0,0 +1,186 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Module to write cache for PCollections being computed. + +For internal use only; no backward-compatibility guarantees. 
+""" +# pytype: skip-file + +from typing import Tuple + +import apache_beam as beam +from apache_beam.portability.api import beam_runner_api_pb2 +from apache_beam.runners.interactive import cache_manager as cache +from apache_beam.runners.interactive.caching.cacheable import Cacheable +from apache_beam.runners.pipeline_context import PipelineContext +from apache_beam.testing import test_stream +from apache_beam.transforms.ptransform import PTransform +from apache_beam.transforms.window import WindowedValue + + +class WriteCache: + """Class that facilitates writing cache for PCollections being computed. + """ + def __init__( + self, + pipeline: beam_runner_api_pb2.Pipeline, + context: PipelineContext, + cache_manager: cache.CacheManager, + cacheable: Cacheable): + self._pipeline = pipeline + self._context = context + self._cache_manager = cache_manager + self._cacheable = cacheable + self._key = repr(cacheable.to_key()) + self._label = '{}{}'.format('_cache_', self._key) + + def write_cache(self) -> None: + """Writes cache for the cacheable PCollection that is being computed. + + First, it creates a temporary pipeline instance on top of the existing + component_id_map from self._pipeline's context so that both pipelines + share the context and have no conflict component ids. + Second, it creates a _PCollectionPlaceHolder in the temporary pipeline that + mimics the attributes of the cacheable PCollection to be written into cache. + It also marks all components in the current temporary pipeline as + ignorable when later copying components to self._pipeline. + Third, it instantiates a _WriteCacheTransform that uses the + _PCollectionPlaceHolder as the input. This adds a subgraph under top level + transforms that writes the _PCollectionPlaceHolder into cache. + Fourth, it copies components of the subgraph from the temporary pipeline to + self._pipeline, skipping components that are ignored in the temporary + pipeline and components that are not in the temporary pipeline but presents + in the component_id_map of self._pipeline. + Last, it replaces inputs of all transforms that consume the + _PCollectionPlaceHolder with the cacheable PCollection to be written to + cache. + """ + template, write_input_placeholder = self._build_runner_api_template() + input_placeholder_id = self._context.pcollections.get_id( + write_input_placeholder.placeholder_pcoll) + input_id = self._context.pcollections.get_id(self._cacheable.pcoll) + + # Copy cache writing subgraph from the template to the pipeline proto. 
+ for pcoll_id in template.components.pcollections: + if (pcoll_id in self._pipeline.components.pcollections or + pcoll_id in write_input_placeholder.ignorable_components.pcollections + ): + continue + self._pipeline.components.pcollections[pcoll_id].CopyFrom( + template.components.pcollections[pcoll_id]) + for coder_id in template.components.coders: + if (coder_id in self._pipeline.components.coders or + coder_id in write_input_placeholder.ignorable_components.coders): + continue + self._pipeline.components.coders[coder_id].CopyFrom( + template.components.coders[coder_id]) + for windowing_strategy_id in template.components.windowing_strategies: + if (windowing_strategy_id in + self._pipeline.components.windowing_strategies or + windowing_strategy_id in + write_input_placeholder.ignorable_components.windowing_strategies): + continue + self._pipeline.components.windowing_strategies[ + windowing_strategy_id].CopyFrom( + template.components.windowing_strategies[windowing_strategy_id]) + template_root_transform_id = template.root_transform_ids[0] + root_transform_id = self._pipeline.root_transform_ids[0] + for transform_id in template.components.transforms: + if (transform_id in self._pipeline.components.transforms or transform_id + in write_input_placeholder.ignorable_components.transforms): + continue + self._pipeline.components.transforms[transform_id].CopyFrom( + template.components.transforms[transform_id]) + for top_level_transform in template.components.transforms[ + template_root_transform_id].subtransforms: + if (top_level_transform in + write_input_placeholder.ignorable_components.transforms): + continue + self._pipeline.components.transforms[ + root_transform_id].subtransforms.append(top_level_transform) + + # Replace all the input pcoll of input_placeholder_id from cache writing + # with cacheable pcoll of input_id. + for transform in self._pipeline.components.transforms.values(): + inputs = transform.inputs + if input_placeholder_id in inputs.values(): + keys_need_replacement = set() + for key in inputs: + if inputs[key] == input_placeholder_id: + keys_need_replacement.add(key) + for key in keys_need_replacement: + inputs[key] = input_id + + def _build_runner_api_template( + self) -> Tuple[beam_runner_api_pb2.Pipeline, '_PCollectionPlaceHolder']: + pph = _PCollectionPlaceHolder(self._cacheable.pcoll, self._context) + transform = _WriteCacheTransform( + self._cache_manager, self._key, self._label) + _ = pph.placeholder_pcoll | 'sink' + self._label >> transform + return pph.placeholder_pcoll.pipeline.to_runner_api(), pph + + +class _WriteCacheTransform(PTransform): + """A composite transform encapsulates writing cache for PCollections. + """ + def __init__(self, cache_manager: cache.CacheManager, key: str, label: str): + self._cache_manager = cache_manager + self._key = key + self._label = label + + def expand(self, pcoll: beam.pvalue.PCollection) -> beam.pvalue.PCollection: + class Reify(beam.DoFn): + def process( + self, + e, + w=beam.DoFn.WindowParam, + p=beam.DoFn.PaneInfoParam, + t=beam.DoFn.TimestampParam): + yield test_stream.WindowedValueHolder(WindowedValue(e, t, [w], p)) + + return ( + pcoll + | 'reify' + self._label >> beam.ParDo(Reify()) + | 'write' + self._label >> cache.WriteCache( + self._cache_manager, self._key, is_capture=False)) + + +class _PCollectionPlaceHolder: + """A placeholder as an input to the cache writing transform. 
+ """ + def __init__(self, pcoll: beam.pvalue.PCollection, context: PipelineContext): + tmp_pipeline = beam.Pipeline() + tmp_pipeline.component_id_map = context.component_id_map + self._input_placeholder = tmp_pipeline | 'CreatePInput' >> beam.Create( + [], reshuffle=False) + self._input_placeholder.tag = pcoll.tag + self._input_placeholder.element_type = pcoll.element_type + self._input_placeholder.is_bounded = pcoll.is_bounded + self._input_placeholder._windowing = pcoll.windowing + self._ignorable_components = tmp_pipeline.to_runner_api().components + + @property + def placeholder_pcoll(self) -> beam.pvalue.PCollection: + return self._input_placeholder + + @property + def ignorable_components(self) -> beam_runner_api_pb2.Components: + """Subgraph generated by the placeholder that can be ignored in the final + pipeline proto. + """ + return self._ignorable_components diff --git a/sdks/python/apache_beam/runners/interactive/caching/write_cache_test.py b/sdks/python/apache_beam/runners/interactive/caching/write_cache_test.py new file mode 100644 index 000000000000..af8dc7b21ef7 --- /dev/null +++ b/sdks/python/apache_beam/runners/interactive/caching/write_cache_test.py @@ -0,0 +1,83 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Tests for write_cache.""" +# pytype: skip-file + +import unittest +from unittest.mock import patch + +import apache_beam as beam +from apache_beam.runners.interactive import augmented_pipeline as ap +from apache_beam.runners.interactive import interactive_beam as ib +from apache_beam.runners.interactive import interactive_environment as ie +from apache_beam.runners.interactive.caching import write_cache +from apache_beam.runners.interactive.testing.pipeline_assertion import assert_pipeline_proto_equal +from apache_beam.runners.interactive.testing.test_cache_manager import InMemoryCache + + +class WriteCacheTest(unittest.TestCase): + def setUp(self): + ie.new_env() + + @patch( + 'apache_beam.runners.interactive.interactive_environment' + '.InteractiveEnvironment.get_cache_manager') + def test_write_cache(self, mocked_get_cache_manager): + p = beam.Pipeline() + pcoll = p | beam.Create([1, 2, 3]) + ib.watch(locals()) + + cache_manager = InMemoryCache() + mocked_get_cache_manager.return_value = cache_manager + aug_p = ap.AugmentedPipeline(p) + key = repr(aug_p._cacheables[pcoll].to_key()) + pipeline_proto = p.to_runner_api() + + # Write cache on the pipeline proto. + write_cache.WriteCache( + pipeline_proto, + aug_p._context, + aug_p._cache_manager, + aug_p._cacheables[pcoll]).write_cache() + actual_pipeline = pipeline_proto + + # Write cache directly on the piepline instance. 
+ label = '{}{}'.format('_cache_', key) + transform = write_cache._WriteCacheTransform( + aug_p._cache_manager, key, label) + _ = pcoll | 'sink' + label >> transform + expected_pipeline = p.to_runner_api() + + assert_pipeline_proto_equal(self, expected_pipeline, actual_pipeline) + + # Check if the actual_pipeline uses pcoll as an input of a write transform. + pcoll_id = aug_p._context.pcollections.get_id(pcoll) + write_transform_id = None + for transform_id, transform in \ + actual_pipeline.components.transforms.items(): + if pcoll_id in transform.inputs.values(): + write_transform_id = transform_id + break + self.assertIsNotNone(write_transform_id) + self.assertIn( + 'sink', + actual_pipeline.components.transforms[write_transform_id].unique_name) + + +if __name__ == '__main__': + unittest.main() diff --git a/sdks/python/apache_beam/runners/interactive/display/display_manager.py b/sdks/python/apache_beam/runners/interactive/display/display_manager.py index 3c58ce2d2daf..84df97135647 100644 --- a/sdks/python/apache_beam/runners/interactive/display/display_manager.py +++ b/sdks/python/apache_beam/runners/interactive/display/display_manager.py @@ -22,10 +22,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import collections import threading import time diff --git a/sdks/python/apache_beam/runners/interactive/display/interactive_pipeline_graph.py b/sdks/python/apache_beam/runners/interactive/display/interactive_pipeline_graph.py index 5811769c57b6..48c926f21263 100644 --- a/sdks/python/apache_beam/runners/interactive/display/interactive_pipeline_graph.py +++ b/sdks/python/apache_beam/runners/interactive/display/interactive_pipeline_graph.py @@ -22,10 +22,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import re from apache_beam.runners.interactive.display import pipeline_graph diff --git a/sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py b/sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py index 615758d3e053..89800d83d5e3 100644 --- a/sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py +++ b/sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py @@ -22,9 +22,6 @@ """ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import base64 import datetime import html @@ -156,7 +153,8 @@ def visualize( stream, dynamic_plotting_interval=None, include_window_info=False, - display_facets=False): + display_facets=False, + element_type=None): """Visualizes the data of a given PCollection. Optionally enables dynamic plotting with interval in seconds if the PCollection is being produced by a running pipeline or the pipeline is streaming indefinitely. 
The function @@ -192,7 +190,8 @@ def visualize( pv = PCollectionVisualization( stream, include_window_info=include_window_info, - display_facets=display_facets) + display_facets=display_facets, + element_type=element_type) if ie.current_env().is_in_notebook: pv.display() else: @@ -217,7 +216,8 @@ def continuous_update_display(): # pylint: disable=unused-variable updated_pv = PCollectionVisualization( stream, include_window_info=include_window_info, - display_facets=display_facets) + display_facets=display_facets, + element_type=element_type) updated_pv.display(updating_pv=pv) # Stop updating the visualizations as soon as the stream will not yield @@ -243,7 +243,12 @@ class PCollectionVisualization(object): access current interactive environment for materialized PCollection data at the moment of self instantiation through cache. """ - def __init__(self, stream, include_window_info=False, display_facets=False): + def __init__( + self, + stream, + include_window_info=False, + display_facets=False, + element_type=None): assert _pcoll_visualization_ready, ( 'Dependencies for PCollection visualization are not available. Please ' 'use `pip install apache-beam[interactive]` to install necessary ' @@ -261,6 +266,7 @@ def __init__(self, stream, include_window_info=False, display_facets=False): self._include_window_info = include_window_info self._display_facets = display_facets self._is_datatable_empty = True + self._element_type = element_type def display_plain_text(self): """Displays a head sample of the normalized PCollection data. @@ -410,7 +416,8 @@ def _display_dataframe(self, data, update=None): def _to_dataframe(self): results = list(self._stream.read(tail=False)) - return elements_to_df(results, self._include_window_info) + return elements_to_df( + results, self._include_window_info, element_type=self._element_type) def format_window_info_in_dataframe(data): diff --git a/sdks/python/apache_beam/runners/interactive/display/pcoll_visualization_test.py b/sdks/python/apache_beam/runners/interactive/display/pcoll_visualization_test.py index 5d01b88b1057..5b0b51d35d62 100644 --- a/sdks/python/apache_beam/runners/interactive/display/pcoll_visualization_test.py +++ b/sdks/python/apache_beam/runners/interactive/display/pcoll_visualization_test.py @@ -18,10 +18,10 @@ """Tests for apache_beam.runners.interactive.display.pcoll_visualization.""" # pytype: skip-file -from __future__ import absolute_import - -import sys import unittest +from unittest.mock import ANY +from unittest.mock import PropertyMock +from unittest.mock import patch import pytz @@ -38,13 +38,6 @@ from apache_beam.utils.windowed_value import PaneInfo from apache_beam.utils.windowed_value import PaneInfoTiming -# TODO(BEAM-8288): clean up the work-around of nose tests using Python2 without -# unittest.mock module. 
-try: - from unittest.mock import patch, ANY, PropertyMock -except ImportError: - from mock import patch, ANY, PropertyMock # type: ignore[misc] - try: import timeloop except ImportError: @@ -54,8 +47,6 @@ @unittest.skipIf( not ie.current_env().is_interactive_ready, '[interactive] dependency is not installed.') -@unittest.skipIf( - sys.version_info < (3, 6), 'The tests require at least Python 3.6 to work.') class PCollectionVisualizationTest(unittest.TestCase): def setUp(self): ie.new_env() diff --git a/sdks/python/apache_beam/runners/interactive/display/pipeline_graph.py b/sdks/python/apache_beam/runners/interactive/display/pipeline_graph.py index e7e0161410f5..d084569d6a2b 100644 --- a/sdks/python/apache_beam/runners/interactive/display/pipeline_graph.py +++ b/sdks/python/apache_beam/runners/interactive/display/pipeline_graph.py @@ -22,10 +22,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import collections import logging import threading diff --git a/sdks/python/apache_beam/runners/interactive/display/pipeline_graph_renderer.py b/sdks/python/apache_beam/runners/interactive/display/pipeline_graph_renderer.py index 4ac6d756133a..9e23fc1deeda 100644 --- a/sdks/python/apache_beam/runners/interactive/display/pipeline_graph_renderer.py +++ b/sdks/python/apache_beam/runners/interactive/display/pipeline_graph_renderer.py @@ -22,10 +22,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import abc import os import subprocess @@ -33,15 +29,13 @@ from typing import Optional from typing import Type -from future.utils import with_metaclass - from apache_beam.utils.plugin import BeamPlugin if TYPE_CHECKING: from apache_beam.runners.interactive.display.pipeline_graph import PipelineGraph -class PipelineGraphRenderer(with_metaclass(abc.ABCMeta, BeamPlugin)): # type: ignore[misc] +class PipelineGraphRenderer(BeamPlugin, metaclass=abc.ABCMeta): """Abstract class for renderers, who decide how pipeline graphs are rendered. """ @classmethod diff --git a/sdks/python/apache_beam/runners/interactive/display/pipeline_graph_test.py b/sdks/python/apache_beam/runners/interactive/display/pipeline_graph_test.py index 01d85bd38301..9a0c7a9df1be 100644 --- a/sdks/python/apache_beam/runners/interactive/display/pipeline_graph_test.py +++ b/sdks/python/apache_beam/runners/interactive/display/pipeline_graph_test.py @@ -18,9 +18,8 @@ """Tests for apache_beam.runners.interactive.display.pipeline_graph.""" # pytype: skip-file -from __future__ import absolute_import - import unittest +from unittest.mock import patch import apache_beam as beam from apache_beam.runners.interactive import interactive_beam as ib @@ -29,13 +28,6 @@ from apache_beam.runners.interactive.display import pipeline_graph from apache_beam.runners.interactive.testing.mock_ipython import mock_get_ipython -# TODO(BEAM-8288): clean up the work-around of nose tests using Python2 without -# unittest.mock module. -try: - from unittest.mock import patch -except ImportError: - from mock import patch # type: ignore[misc] - # pylint: disable=range-builtin-not-iterating,unused-variable,possibly-unused-variable # Reason: # Disable pylint for pipelines built for testing. 
Not all PCollections are diff --git a/sdks/python/apache_beam/runners/interactive/interactive_beam.py b/sdks/python/apache_beam/runners/interactive/interactive_beam.py index d77bf524fe18..0b3f8a3e732e 100644 --- a/sdks/python/apache_beam/runners/interactive/interactive_beam.py +++ b/sdks/python/apache_beam/runners/interactive/interactive_beam.py @@ -31,18 +31,18 @@ # pytype: skip-file -from __future__ import absolute_import - import logging from datetime import timedelta import pandas as pd import apache_beam as beam +from apache_beam.dataframe.frame_base import DeferredBase from apache_beam.runners.interactive import interactive_environment as ie from apache_beam.runners.interactive.display import pipeline_graph from apache_beam.runners.interactive.display.pcoll_visualization import visualize from apache_beam.runners.interactive.options import interactive_options +from apache_beam.runners.interactive.utils import deferred_df_to_pcollection from apache_beam.runners.interactive.utils import elements_to_df from apache_beam.runners.interactive.utils import progress_indicated from apache_beam.runners.runner import PipelineState @@ -361,12 +361,14 @@ def run_pipeline(self): ie.current_env().watch(watchable) -# TODO(BEAM-8288): Change the signature of this function to -# `show(*pcolls, include_window_info=False, visualize_data=False)` once Python 2 -# is completely deprecated from Beam. @progress_indicated -def show(*pcolls, **configs): - # type: (*Union[Dict[Any, PCollection], Iterable[PCollection], PCollection], **bool) -> None +def show( + *pcolls, + include_window_info=False, + visualize_data=False, + n='inf', + duration='inf'): + # type: (*Union[Dict[Any, PCollection], Iterable[PCollection], PCollection], bool, bool, Union[int, str], Union[int, str]) -> None """Shows given PCollections in an interactive exploratory way if used within a notebook, or prints a heading sampled data if used within an ipython shell. @@ -434,7 +436,7 @@ def show(*pcolls, **configs): for pcoll_container in pcolls: if isinstance(pcoll_container, dict): flatten_pcolls.extend(pcoll_container.values()) - elif isinstance(pcoll_container, beam.pvalue.PCollection): + elif isinstance(pcoll_container, (beam.pvalue.PCollection, DeferredBase)): flatten_pcolls.append(pcoll_container) else: try: @@ -443,20 +445,31 @@ def show(*pcolls, **configs): raise ValueError( 'The given pcoll %s is not a dict, an iterable or a PCollection.' % pcoll_container) - pcolls = flatten_pcolls - assert len(pcolls) > 0, ( - 'Need at least 1 PCollection to show data visualization.') - for pcoll in pcolls: + + # Iterate through the given PCollections and convert any deferred DataFrames + # or Series into PCollections. + pcolls = [] + + # The element type is used to help visualize the given PCollection. For the + # deferred DataFrame/Series case it is the proxy of the frame. + element_types = {} + for pcoll in flatten_pcolls: + if isinstance(pcoll, DeferredBase): + pcoll, element_type = deferred_df_to_pcollection(pcoll) + watch({'anonymous_pcollection_{}'.format(id(pcoll)): pcoll}) + else: + element_type = pcoll.element_type + + element_types[pcoll] = element_type + + pcolls.append(pcoll) assert isinstance(pcoll, beam.pvalue.PCollection), ( '{} is not an apache_beam.pvalue.PCollection.'.format(pcoll)) - user_pipeline = pcolls[0].pipeline - # TODO(BEAM-8288): Remove below pops and assertion once Python 2 is - # deprecated from Beam. 
- include_window_info = configs.pop('include_window_info', False) - visualize_data = configs.pop('visualize_data', False) - n = configs.pop('n', 'inf') - duration = configs.pop('duration', 'inf') + assert len(pcolls) > 0, ( + 'Need at least 1 PCollection to show data visualization.') + + user_pipeline = pcolls[0].pipeline if isinstance(n, str): assert n == 'inf', ( @@ -475,12 +488,6 @@ def show(*pcolls, **configs): if duration == 'inf': duration = float('inf') - # This assertion is to protect the backward compatibility for function - # signature change after Python 2 deprecation. - assert not configs, ( - 'The only supported arguments are include_window_info, visualize_data, ' - 'n, and duration') - recording_manager = ie.current_env().get_recording_manager( user_pipeline, create_if_absent=True) recording = recording_manager.record(pcolls, max_n=n, max_duration=duration) @@ -494,10 +501,14 @@ def show(*pcolls, **configs): visualize( stream, include_window_info=include_window_info, - display_facets=visualize_data) + display_facets=visualize_data, + element_type=element_types[stream.pcoll]) elif ie.current_env().is_in_ipython: for stream in recording.computed().values(): - visualize(stream, include_window_info=include_window_info) + visualize( + stream, + include_window_info=include_window_info, + element_type=element_types[stream.pcoll]) if recording.is_computed(): return @@ -509,7 +520,8 @@ def show(*pcolls, **configs): stream, dynamic_plotting_interval=1, include_window_info=include_window_info, - display_facets=visualize_data) + display_facets=visualize_data, + element_type=element_types[stream.pcoll]) # Invoke wait_until_finish to ensure the blocking nature of this API without # relying on the run to be blocking. @@ -538,6 +550,8 @@ def collect(pcoll, n='inf', duration='inf', include_window_info=False): n: (optional) max number of elements to visualize. Default 'inf'. duration: (optional) max duration of elements to read in integer seconds or a string duration. Default 'inf'. + include_window_info: (optional) if True, appends the windowing information + to each row. Default False. For example:: @@ -548,6 +562,15 @@ def collect(pcoll, n='inf', duration='inf', include_window_info=False): # Run the pipeline and bring the PCollection into memory as a Dataframe. in_memory_square = head(square, n=5) """ + # Remember the element type so we can make an informed decision on how to + # collect the result in elements_to_df. + if isinstance(pcoll, DeferredBase): + # Get the proxy so we can get the output shape of the DataFrame. + pcoll, element_type = deferred_df_to_pcollection(pcoll) + watch({'anonymous_pcollection_{}'.format(id(pcoll)): pcoll}) + else: + element_type = pcoll.element_type + assert isinstance(pcoll, beam.pvalue.PCollection), ( '{} is not an apache_beam.pvalue.PCollection.'.format(pcoll)) @@ -580,7 +603,15 @@ def collect(pcoll, n='inf', duration='inf', include_window_info=False): recording.cancel() return pd.DataFrame() - return elements_to_df(elements, include_window_info=include_window_info) + if n == float('inf'): + n = None + + # Collecting DataFrames may have a length > n, so slice again to be sure. Note + # that array[:None] returns everything. 
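  # For example, slicing pd.DataFrame({'v': range(7)}) with [:None] keeps all
  # 7 rows, while [:5] keeps exactly 5, so the slice below is a no-op when n
  # was 'inf' and a trim otherwise (the values here are only illustrative).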
+ return elements_to_df( + elements, + include_window_info=include_window_info, + element_type=element_type)[:n] @progress_indicated diff --git a/sdks/python/apache_beam/runners/interactive/interactive_beam_test.py b/sdks/python/apache_beam/runners/interactive/interactive_beam_test.py index b76e73d07547..f12ab8ee3de7 100644 --- a/sdks/python/apache_beam/runners/interactive/interactive_beam_test.py +++ b/sdks/python/apache_beam/runners/interactive/interactive_beam_test.py @@ -18,14 +18,15 @@ """Tests for apache_beam.runners.interactive.interactive_beam.""" # pytype: skip-file -from __future__ import absolute_import - import importlib import sys import time import unittest +from typing import NamedTuple +from unittest.mock import patch import apache_beam as beam +from apache_beam import dataframe as frames from apache_beam.options.pipeline_options import PipelineOptions from apache_beam.runners.interactive import interactive_beam as ib from apache_beam.runners.interactive import interactive_environment as ie @@ -34,12 +35,12 @@ from apache_beam.runners.runner import PipelineState from apache_beam.testing.test_stream import TestStream -# TODO(BEAM-8288): clean up the work-around of nose tests using Python2 without -# unittest.mock module. -try: - from unittest.mock import patch -except ImportError: - from mock import patch + +class Record(NamedTuple): + order_id: int + product_id: int + quantity: int + # The module name is also a variable in module. _module_name = 'apache_beam.runners.interactive.interactive_beam_test' @@ -119,9 +120,6 @@ def test_show_mark_pcolls_computed_when_done(self): ib.show(pcoll) self.assertTrue(pcoll in ie.current_env().computed_pcollections) - @unittest.skipIf( - sys.version_info < (3, 6), - 'The tests require at least Python 3.6 to work.') @patch('apache_beam.runners.interactive.interactive_beam.visualize') def test_show_handles_dict_of_pcolls(self, mocked_visualize): p = beam.Pipeline(ir.InteractiveRunner()) @@ -135,9 +133,6 @@ def test_show_handles_dict_of_pcolls(self, mocked_visualize): ib.show({'pcoll': pcoll}) mocked_visualize.assert_called_once() - @unittest.skipIf( - sys.version_info < (3, 6), - 'The tests require at least Python 3.6 to work.') @patch('apache_beam.runners.interactive.interactive_beam.visualize') def test_show_handles_iterable_of_pcolls(self, mocked_visualize): p = beam.Pipeline(ir.InteractiveRunner()) @@ -151,9 +146,19 @@ def test_show_handles_iterable_of_pcolls(self, mocked_visualize): ib.show([pcoll]) mocked_visualize.assert_called_once() - @unittest.skipIf( - sys.version_info < (3, 6), - 'The tests require at least Python 3.6 to work.') + @patch('apache_beam.runners.interactive.interactive_beam.visualize') + def test_show_handles_deferred_dataframes(self, mocked_visualize): + p = beam.Pipeline(ir.InteractiveRunner()) + + deferred = frames.convert.to_dataframe(p | beam.Create([Record(0, 0, 0)])) + + ib.watch(locals()) + ie.current_env().track_user_pipelines() + ie.current_env()._is_in_ipython = True + ie.current_env()._is_in_notebook = True + ib.show(deferred) + mocked_visualize.assert_called_once() + @patch('apache_beam.runners.interactive.interactive_beam.visualize') def test_show_noop_when_pcoll_container_is_invalid(self, mocked_visualize): class SomeRandomClass: @@ -169,9 +174,6 @@ def __init__(self, pcoll): self.assertRaises(ValueError, ib.show, SomeRandomClass(pcoll)) mocked_visualize.assert_not_called() - @unittest.skipIf( - sys.version_info < (3, 6), - 'The tests require at least Python 3.6 to work.') def 
test_recordings_describe(self): """Tests that getting the description works.""" @@ -195,9 +197,6 @@ def test_recordings_describe(self): self.assertEqual(all_descriptions[p1]['pipeline_var'], 'p1') self.assertEqual(all_descriptions[p2]['pipeline_var'], 'p2') - @unittest.skipIf( - sys.version_info < (3, 6), - 'The tests require at least Python 3.6 to work.') def test_recordings_clear(self): """Tests that clearing the pipeline is correctly forwarded.""" @@ -215,9 +214,6 @@ def test_recordings_clear(self): ib.recordings.clear(p) self.assertEqual(ib.recordings.describe(p)['size'], 0) - @unittest.skipIf( - sys.version_info < (3, 6), - 'The tests require at least Python 3.6 to work.') def test_recordings_record(self): """Tests that recording pipeline succeeds.""" diff --git a/sdks/python/apache_beam/runners/interactive/interactive_environment.py b/sdks/python/apache_beam/runners/interactive/interactive_environment.py index c6938f3b05de..102f10a0830f 100644 --- a/sdks/python/apache_beam/runners/interactive/interactive_environment.py +++ b/sdks/python/apache_beam/runners/interactive/interactive_environment.py @@ -24,13 +24,10 @@ """ # pytype: skip-file -from __future__ import absolute_import - import atexit import importlib import logging import os -import sys import tempfile from collections.abc import Iterable @@ -172,12 +169,6 @@ def __init__(self): self._computed_pcolls = set() # Always watch __main__ module. self.watch('__main__') - # Do a warning level logging if current python version is below 3.6. - if sys.version_info < (3, 6): - self._is_py_version_ready = False - _LOGGER.warning('Interactive Beam requires Python 3.5.3+.') - else: - self._is_py_version_ready = True # Check if [interactive] dependencies are installed. try: import IPython # pylint: disable=unused-import @@ -202,7 +193,7 @@ def __init__(self): if self._is_in_ipython and not self._is_in_notebook: _LOGGER.warning( 'You have limited Interactive Beam features since your ' - 'ipython kernel is not connected any notebook frontend.') + 'ipython kernel is not connected to any notebook frontend.') if self._is_in_notebook: self.load_jquery_with_datatable() register_ipython_log_handler() @@ -222,11 +213,6 @@ def options(self): from apache_beam.runners.interactive.interactive_beam import options return options - @property - def is_py_version_ready(self): - """If Python version is above the minimum requirement.""" - return self._is_py_version_ready - @property def is_interactive_ready(self): """If the [interactive] dependencies are installed.""" @@ -261,7 +247,9 @@ def cleanup(self, pipeline=None): bcj.attempt_to_cancel_background_caching_job(pipeline) bcj.attempt_to_stop_test_stream_service(pipeline) cache_manager = self.get_cache_manager(pipeline) - if cache_manager: + # Recording manager performs cache manager cleanup during eviction, so we + # don't need to clean it up here. + if cache_manager and self.get_recording_manager(pipeline) is None: cache_manager.cleanup() else: for _, job in self._background_caching_jobs.items(): @@ -270,8 +258,10 @@ def cleanup(self, pipeline=None): for _, controller in self._test_stream_service_controllers.items(): if controller: controller.stop() - for _, cache_manager in self._cache_managers.items(): - if cache_manager: + for pipeline_id, cache_manager in self._cache_managers.items(): + # Recording manager performs cache manager cleanup during eviction, so + # we don't need to clean it up here. 
+ if cache_manager and pipeline_id not in self._recording_managers: cache_manager.cleanup() self.evict_recording_manager(pipeline) diff --git a/sdks/python/apache_beam/runners/interactive/interactive_environment_test.py b/sdks/python/apache_beam/runners/interactive/interactive_environment_test.py index 2d91608814b5..f08db0156e83 100644 --- a/sdks/python/apache_beam/runners/interactive/interactive_environment_test.py +++ b/sdks/python/apache_beam/runners/interactive/interactive_environment_test.py @@ -18,11 +18,9 @@ """Tests for apache_beam.runners.interactive.interactive_environment.""" # pytype: skip-file -from __future__ import absolute_import - import importlib -import sys import unittest +from unittest.mock import patch import apache_beam as beam from apache_beam.runners import runner @@ -30,19 +28,10 @@ from apache_beam.runners.interactive import interactive_environment as ie from apache_beam.runners.interactive.recording_manager import RecordingManager -# TODO(BEAM-8288): clean up the work-around of nose tests using Python2 without -# unittest.mock module. -try: - from unittest.mock import patch -except ImportError: - from mock import patch # type: ignore[misc] - # The module name is also a variable in module. _module_name = 'apache_beam.runners.interactive.interactive_environment_test' -@unittest.skipIf( - sys.version_info < (3, 6), 'The tests require at least Python 3.6 to work.') class InteractiveEnvironmentTest(unittest.TestCase): def setUp(self): self._p = beam.Pipeline() @@ -168,78 +157,59 @@ def test_pipeline_result_is_none_when_pipeline_absent(self): self.assertIs(ie.current_env().is_terminated(self._p), True) self.assertIs(ie.current_env().evict_pipeline_result(self._p), None) - @patch('atexit.register') - def test_cleanup_registered_when_creating_new_env(self, mocked_atexit): - ie.new_env() - mocked_atexit.assert_called_once() - - @patch( - 'apache_beam.runners.interactive.interactive_environment' - '.InteractiveEnvironment.cleanup') - def test_cleanup_invoked_when_new_env_replace_not_none_env( - self, mocked_cleanup): - ie._interactive_beam_env = None - ie.new_env() - mocked_cleanup.assert_not_called() - ie.new_env() - mocked_cleanup.assert_called_once() - - @patch( - 'apache_beam.runners.interactive.interactive_environment' - '.InteractiveEnvironment.cleanup') - def test_cleanup_not_invoked_when_cm_changed_from_none(self, mocked_cleanup): - ie._interactive_beam_env = None - ie.new_env() - dummy_pipeline = 'dummy' - self.assertIsNone(ie.current_env().get_cache_manager(dummy_pipeline)) - cache_manager = cache.FileBasedCacheManager() - ie.current_env().set_cache_manager(cache_manager, dummy_pipeline) - mocked_cleanup.assert_not_called() - self.assertIs( - ie.current_env().get_cache_manager(dummy_pipeline), cache_manager) + def test_cleanup_registered_when_creating_new_env(self): + with patch('atexit.register') as mocked_atexit: + _ = ie.InteractiveEnvironment() + mocked_atexit.assert_called_once() - @patch( - 'apache_beam.runners.interactive.interactive_environment' - '.InteractiveEnvironment.cleanup') - def test_cleanup_invoked_when_not_none_cm_changed(self, mocked_cleanup): + def test_cleanup_invoked_when_new_env_replace_not_none_env(self): ie._interactive_beam_env = None ie.new_env() - dummy_pipeline = 'dummy' - ie.current_env().set_cache_manager( - cache.FileBasedCacheManager(), dummy_pipeline) - mocked_cleanup.assert_not_called() - ie.current_env().set_cache_manager( - cache.FileBasedCacheManager(), dummy_pipeline) - mocked_cleanup.assert_called_once() - - @patch( - 
'apache_beam.runners.interactive.interactive_environment' - '.InteractiveEnvironment.cleanup') - def test_noop_when_cm_is_not_changed(self, mocked_cleanup): - ie._interactive_beam_env = None + with patch('apache_beam.runners.interactive.interactive_environment' + '.InteractiveEnvironment.cleanup') as mocked_cleanup: + ie.new_env() + mocked_cleanup.assert_called_once() + + def test_cleanup_not_invoked_when_cm_changed_from_none(self): + env = ie.InteractiveEnvironment() + with patch('apache_beam.runners.interactive.interactive_environment' + '.InteractiveEnvironment.cleanup') as mocked_cleanup: + dummy_pipeline = 'dummy' + self.assertIsNone(env.get_cache_manager(dummy_pipeline)) + cache_manager = cache.FileBasedCacheManager() + env.set_cache_manager(cache_manager, dummy_pipeline) + mocked_cleanup.assert_not_called() + self.assertIs(env.get_cache_manager(dummy_pipeline), cache_manager) + + def test_cleanup_invoked_when_not_none_cm_changed(self): + env = ie.InteractiveEnvironment() + with patch('apache_beam.runners.interactive.interactive_environment' + '.InteractiveEnvironment.cleanup') as mocked_cleanup: + dummy_pipeline = 'dummy' + env.set_cache_manager(cache.FileBasedCacheManager(), dummy_pipeline) + mocked_cleanup.assert_not_called() + env.set_cache_manager(cache.FileBasedCacheManager(), dummy_pipeline) + mocked_cleanup.assert_called_once() + + def test_noop_when_cm_is_not_changed(self): cache_manager = cache.FileBasedCacheManager() dummy_pipeline = 'dummy' - ie.new_env() - ie.current_env()._cache_managers[str(id(dummy_pipeline))] = cache_manager - mocked_cleanup.assert_not_called() - ie.current_env().set_cache_manager(cache_manager, dummy_pipeline) - mocked_cleanup.assert_not_called() + env = ie.InteractiveEnvironment() + with patch('apache_beam.runners.interactive.interactive_environment' + '.InteractiveEnvironment.cleanup') as mocked_cleanup: + env._cache_managers[str(id(dummy_pipeline))] = cache_manager + mocked_cleanup.assert_not_called() + env.set_cache_manager(cache_manager, dummy_pipeline) + mocked_cleanup.assert_not_called() def test_get_cache_manager_creates_cache_manager_if_absent(self): - ie._interactive_beam_env = None - ie.new_env() + env = ie.InteractiveEnvironment() dummy_pipeline = 'dummy' - self.assertIsNone(ie.current_env().get_cache_manager(dummy_pipeline)) + self.assertIsNone(env.get_cache_manager(dummy_pipeline)) self.assertIsNotNone( - ie.current_env().get_cache_manager( - dummy_pipeline, create_if_absent=True)) - - @patch( - 'apache_beam.runners.interactive.interactive_environment' - '.InteractiveEnvironment.cleanup') - def test_track_user_pipeline_cleanup_non_inspectable_pipeline( - self, mocked_cleanup): - ie._interactive_beam_env = None + env.get_cache_manager(dummy_pipeline, create_if_absent=True)) + + def test_track_user_pipeline_cleanup_non_inspectable_pipeline(self): ie.new_env() dummy_pipeline_1 = beam.Pipeline() dummy_pipeline_2 = beam.Pipeline() @@ -262,9 +232,10 @@ def test_track_user_pipeline_cleanup_non_inspectable_pipeline( dummy_non_inspectable_pipeline, None) ie.current_env().set_pipeline_result( dummy_pipeline_5, runner.PipelineResult(runner.PipelineState.RUNNING)) - mocked_cleanup.assert_not_called() - ie.current_env().track_user_pipelines() - mocked_cleanup.assert_called_once() + with patch('apache_beam.runners.interactive.interactive_environment' + '.InteractiveEnvironment.cleanup') as mocked_cleanup: + ie.current_env().track_user_pipelines() + mocked_cleanup.assert_called_once() def test_evict_pcollections(self): """Tests the evicton logic in 
the InteractiveEnvironment.""" diff --git a/sdks/python/apache_beam/runners/interactive/interactive_runner.py b/sdks/python/apache_beam/runners/interactive/interactive_runner.py index affc426572ca..68a66e000deb 100644 --- a/sdks/python/apache_beam/runners/interactive/interactive_runner.py +++ b/sdks/python/apache_beam/runners/interactive/interactive_runner.py @@ -22,10 +22,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import logging import apache_beam as beam @@ -190,7 +186,7 @@ def visit_transform(self, transform_node): if not self._skip_display: a_pipeline_graph = pipeline_graph.PipelineGraph( - pipeline_instrument.original_pipeline, + pipeline_instrument.original_pipeline_proto, render_option=self._render_option) a_pipeline_graph.display_graph() diff --git a/sdks/python/apache_beam/runners/interactive/interactive_runner_test.py b/sdks/python/apache_beam/runners/interactive/interactive_runner_test.py index 6f22de3fb500..69551de15596 100644 --- a/sdks/python/apache_beam/runners/interactive/interactive_runner_test.py +++ b/sdks/python/apache_beam/runners/interactive/interactive_runner_test.py @@ -22,16 +22,15 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import sys import unittest +from typing import NamedTuple +from unittest.mock import patch import pandas as pd import apache_beam as beam +from apache_beam.dataframe.convert import to_dataframe from apache_beam.options.pipeline_options import StandardOptions from apache_beam.runners.direct import direct_runner from apache_beam.runners.interactive import interactive_beam as ib @@ -46,13 +45,6 @@ from apache_beam.utils.windowed_value import PaneInfoTiming from apache_beam.utils.windowed_value import WindowedValue -# TODO(BEAM-8288): clean up the work-around of nose tests using Python2 without -# unittest.mock module. -try: - from unittest.mock import patch -except ImportError: - from mock import patch # type: ignore[misc] - def print_with_message(msg): def printer(elem): @@ -62,6 +54,12 @@ def printer(elem): return printer +class Record(NamedTuple): + name: str + age: int + height: int + + class InteractiveRunnerTest(unittest.TestCase): @unittest.skipIf(sys.platform == "win32", "[BEAM-10627]") def test_basic(self): @@ -154,9 +152,6 @@ def process(self, element): ] self.assertEqual(actual_reified, expected_reified) - @unittest.skipIf( - sys.version_info < (3, 5, 3), - 'The tests require at least Python 3.6 to work.') def test_streaming_wordcount(self): class WordExtractingDoFn(beam.DoFn): def process(self, element): @@ -288,6 +283,206 @@ def test_mark_pcollection_completed_after_successful_run(self, cell): self.assertTrue(cube in ie.current_env().computed_pcollections) self.assertEqual({0, 1, 8, 27, 64}, set(result.get(cube))) + @unittest.skipIf(sys.platform == "win32", "[BEAM-10627]") + def test_dataframes(self): + p = beam.Pipeline( + runner=interactive_runner.InteractiveRunner( + direct_runner.DirectRunner())) + data = p | beam.Create( + [1, 2, 3]) | beam.Map(lambda x: beam.Row(square=x * x, cube=x * x * x)) + df = to_dataframe(data) + + # Watch the local scope for Interactive Beam so that values will be cached. + ib.watch(locals()) + + # This is normally done in the interactive_utils when a transform is + # applied but needs an IPython environment. So we manually run this here. 
+ ie.current_env().track_user_pipelines() + + df_expected = pd.DataFrame({'square': [1, 4, 9], 'cube': [1, 8, 27]}) + pd.testing.assert_frame_equal( + df_expected, ib.collect(df, n=10).reset_index(drop=True)) + + @unittest.skipIf(sys.platform == "win32", "[BEAM-10627]") + def test_dataframes_with_grouped_index(self): + p = beam.Pipeline( + runner=interactive_runner.InteractiveRunner( + direct_runner.DirectRunner())) + + data = [ + Record('a', 20, 170), + Record('a', 30, 170), + Record('b', 22, 180), + Record('c', 18, 150) + ] + + aggregate = lambda df: df.groupby('height').mean() + + deferred_df = aggregate(to_dataframe(p | beam.Create(data))) + df_expected = aggregate(pd.DataFrame(data)) + + # Watch the local scope for Interactive Beam so that values will be cached. + ib.watch(locals()) + + # This is normally done in the interactive_utils when a transform is + # applied but needs an IPython environment. So we manually run this here. + ie.current_env().track_user_pipelines() + + pd.testing.assert_frame_equal(df_expected, ib.collect(deferred_df, n=10)) + + @unittest.skipIf(sys.platform == "win32", "[BEAM-10627]") + def test_dataframes_with_multi_index(self): + p = beam.Pipeline( + runner=interactive_runner.InteractiveRunner( + direct_runner.DirectRunner())) + + data = [ + Record('a', 20, 170), + Record('a', 30, 170), + Record('b', 22, 180), + Record('c', 18, 150) + ] + + aggregate = lambda df: df.groupby(['name', 'height']).mean() + + deferred_df = aggregate(to_dataframe(p | beam.Create(data))) + df_expected = aggregate(pd.DataFrame(data)) + + # Watch the local scope for Interactive Beam so that values will be cached. + ib.watch(locals()) + + # This is normally done in the interactive_utils when a transform is + # applied but needs an IPython environment. So we manually run this here. + ie.current_env().track_user_pipelines() + + pd.testing.assert_frame_equal(df_expected, ib.collect(deferred_df, n=10)) + + @unittest.skipIf(sys.platform == "win32", "[BEAM-10627]") + def test_dataframes_with_multi_index_get_result(self): + p = beam.Pipeline( + runner=interactive_runner.InteractiveRunner( + direct_runner.DirectRunner())) + + data = [ + Record('a', 20, 170), + Record('a', 30, 170), + Record('b', 22, 180), + Record('c', 18, 150) + ] + + aggregate = lambda df: df.groupby(['name', 'height']).mean()['age'] + + deferred_df = aggregate(to_dataframe(p | beam.Create(data))) + df_expected = aggregate(pd.DataFrame(data)) + + # Watch the local scope for Interactive Beam so that values will be cached. + ib.watch(locals()) + + # This is normally done in the interactive_utils when a transform is + # applied but needs an IPython environment. So we manually run this here. + ie.current_env().track_user_pipelines() + + pd.testing.assert_series_equal(df_expected, ib.collect(deferred_df, n=10)) + + @unittest.skipIf(sys.platform == "win32", "[BEAM-10627]") + def test_dataframes_same_cell_twice(self): + p = beam.Pipeline( + runner=interactive_runner.InteractiveRunner( + direct_runner.DirectRunner())) + data = p | beam.Create( + [1, 2, 3]) | beam.Map(lambda x: beam.Row(square=x * x, cube=x * x * x)) + df = to_dataframe(data) + + # Watch the local scope for Interactive Beam so that values will be cached. + ib.watch(locals()) + + # This is normally done in the interactive_utils when a transform is + # applied but needs an IPython environment. So we manually run this here. 
+ ie.current_env().track_user_pipelines() + + df_expected = pd.DataFrame({'square': [1, 4, 9], 'cube': [1, 8, 27]}) + pd.testing.assert_series_equal( + df_expected['square'], + ib.collect(df['square'], n=10).reset_index(drop=True)) + pd.testing.assert_series_equal( + df_expected['cube'], + ib.collect(df['cube'], n=10).reset_index(drop=True)) + + @unittest.skipIf( + not ie.current_env().is_interactive_ready, + '[interactive] dependency is not installed.') + @unittest.skipIf(sys.platform == "win32", "[BEAM-10627]") + @patch('IPython.get_ipython', new_callable=mock_get_ipython) + def test_dataframe_caching(self, cell): + + # Create a pipeline that exercises the DataFrame API. This will also use + # caching in the background. + with cell: # Cell 1 + p = beam.Pipeline(interactive_runner.InteractiveRunner()) + ib.watch({'p': p}) + + with cell: # Cell 2 + data = p | beam.Create([ + 1, 2, 3 + ]) | beam.Map(lambda x: beam.Row(square=x * x, cube=x * x * x)) + + with beam.dataframe.allow_non_parallel_operations(): + df = to_dataframe(data).reset_index(drop=True) + + ib.collect(df) + + with cell: # Cell 3 + df['output'] = df['square'] * df['cube'] + ib.collect(df) + + with cell: # Cell 4 + df['output'] = 0 + ib.collect(df) + + # We use a trace through the graph to perform an isomorphism test. The end + # output should look like a linear graph. This indicates that the dataframe + # transform was correctly broken into separate pieces to cache. If caching + # isn't enabled, all the dataframe computation nodes are connected to a + # single shared node. + trace = [] + + # Only look at the top-level transforms for the isomorphism. The test + # doesn't care about the transform implementations, just the overall shape. + class TopLevelTracer(beam.pipeline.PipelineVisitor): + def _find_root_producer(self, node: beam.pipeline.AppliedPTransform): + if node is None or not node.full_label: + return None + + parent = self._find_root_producer(node.parent) + if parent is None: + return node + + return parent + + def _add_to_trace(self, node, trace): + if '/' not in str(node): + if node.inputs: + producer = self._find_root_producer(node.inputs[0].producer) + producer_name = producer.full_label if producer else '' + trace.append((producer_name, node.full_label)) + + def visit_transform(self, node: beam.pipeline.AppliedPTransform): + self._add_to_trace(node, trace) + + def enter_composite_transform( + self, node: beam.pipeline.AppliedPTransform): + self._add_to_trace(node, trace) + + p.visit(TopLevelTracer()) + + # Do the isomorphism test which states that the topological sort of the + # graph yields a linear graph. 
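      # A self-contained sketch (hypothetical transform labels, not produced by
      # this test) of the invariant asserted below: in a linear chain every
      # producer equals the previous consumer, while a fan-out from a shared
      # node would fail at its second edge.
      linear_trace = [('', 'Create'), ('Create', 'Square'), ('Square', 'Cube')]
      prev_consumer = ''
      for producer, consumer in linear_trace:
        assert producer == prev_consumer
        prev_consumer = consumer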
+ trace_string = '\n'.join(str(t) for t in trace) + prev_producer = '' + for producer, consumer in trace: + self.assertEqual(producer, prev_producer, trace_string) + prev_producer = consumer + if __name__ == '__main__': unittest.main() diff --git a/sdks/python/apache_beam/runners/interactive/messaging/interactive_environment_inspector.py b/sdks/python/apache_beam/runners/interactive/messaging/interactive_environment_inspector.py index 0fc1e7cc38c5..ed3dc51f972a 100644 --- a/sdks/python/apache_beam/runners/interactive/messaging/interactive_environment_inspector.py +++ b/sdks/python/apache_beam/runners/interactive/messaging/interactive_environment_inspector.py @@ -22,8 +22,6 @@ """ # pytype: skip-file -from __future__ import absolute_import - import apache_beam as beam from apache_beam.runners.interactive.utils import as_json from apache_beam.runners.interactive.utils import obfuscate diff --git a/sdks/python/apache_beam/runners/interactive/messaging/interactive_environment_inspector_test.py b/sdks/python/apache_beam/runners/interactive/messaging/interactive_environment_inspector_test.py index 764e0f19b98b..4edb0d47c436 100644 --- a/sdks/python/apache_beam/runners/interactive/messaging/interactive_environment_inspector_test.py +++ b/sdks/python/apache_beam/runners/interactive/messaging/interactive_environment_inspector_test.py @@ -18,11 +18,10 @@ """Tests for interactive_environment_inspector.""" # pytype: skip-file -from __future__ import absolute_import - import json import sys import unittest +from unittest.mock import patch import apache_beam as beam import apache_beam.runners.interactive.messaging.interactive_environment_inspector as inspector @@ -32,13 +31,6 @@ from apache_beam.runners.interactive.testing.mock_ipython import mock_get_ipython from apache_beam.runners.interactive.utils import obfuscate -# TODO(BEAM-8288): clean up the work-around of nose tests using Python2 without -# unittest.mock module. -try: - from unittest.mock import patch -except ImportError: - from mock import patch # type: ignore[misc] - @unittest.skipIf( not ie.current_env().is_interactive_ready, diff --git a/sdks/python/apache_beam/runners/interactive/options/capture_control.py b/sdks/python/apache_beam/runners/interactive/options/capture_control.py index 837ff57d8839..e4ea95d432a5 100644 --- a/sdks/python/apache_beam/runners/interactive/options/capture_control.py +++ b/sdks/python/apache_beam/runners/interactive/options/capture_control.py @@ -23,8 +23,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging from datetime import timedelta diff --git a/sdks/python/apache_beam/runners/interactive/options/capture_control_test.py b/sdks/python/apache_beam/runners/interactive/options/capture_control_test.py index d57e01b3c8ad..279e8570b2ac 100644 --- a/sdks/python/apache_beam/runners/interactive/options/capture_control_test.py +++ b/sdks/python/apache_beam/runners/interactive/options/capture_control_test.py @@ -19,10 +19,8 @@ # pytype: skip-file -from __future__ import absolute_import - -import sys import unittest +from unittest.mock import patch import apache_beam as beam from apache_beam import coders @@ -38,13 +36,6 @@ from apache_beam.runners.interactive.options import capture_limiters from apache_beam.testing.test_stream_service import TestStreamServiceController -# TODO(BEAM-8288): clean up the work-around of nose tests using Python2 without -# unittest.mock module. 
-try: - from unittest.mock import patch -except ImportError: - from mock import patch # type: ignore[misc] - def _build_an_empty_streaming_pipeline(): from apache_beam.options.pipeline_options import PipelineOptions @@ -71,8 +62,6 @@ def read_multiple(self): @unittest.skipIf( not ie.current_env().is_interactive_ready, '[interactive] dependency is not installed.') -@unittest.skipIf( - sys.version_info < (3, 6), 'The tests require at least Python 3.6 to work.') class CaptureControlTest(unittest.TestCase): def setUp(self): ie.new_env() diff --git a/sdks/python/apache_beam/runners/interactive/options/capture_limiters.py b/sdks/python/apache_beam/runners/interactive/options/capture_limiters.py index c32aa46934f2..ed921cce1d0f 100644 --- a/sdks/python/apache_beam/runners/interactive/options/capture_limiters.py +++ b/sdks/python/apache_beam/runners/interactive/options/capture_limiters.py @@ -20,14 +20,15 @@ For internal use only; no backwards-compatibility guarantees. """ -from __future__ import absolute_import - import threading +import pandas as pd + from apache_beam.portability.api.beam_interactive_api_pb2 import TestStreamFileHeader from apache_beam.portability.api.beam_interactive_api_pb2 import TestStreamFileRecord from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload from apache_beam.runners.interactive import interactive_environment as ie +from apache_beam.utils.windowed_value import WindowedValue class Limiter: @@ -107,7 +108,12 @@ def update(self, e): # Otherwise, count everything else but the header of the file since it is # not an element. elif not isinstance(e, TestStreamFileHeader): - self._count += 1 + # When elements are DataFrames, we want the output to be constrained by + # how many rows we have read, not how many DataFrames we have read. + if isinstance(e, WindowedValue) and isinstance(e.value, pd.DataFrame): + self._count += len(e.value) + else: + self._count += 1 def is_triggered(self): return self._count >= self._max_count diff --git a/sdks/python/apache_beam/runners/interactive/options/capture_limiters_test.py b/sdks/python/apache_beam/runners/interactive/options/capture_limiters_test.py index 3144a9e3fd4f..2af1995894c6 100644 --- a/sdks/python/apache_beam/runners/interactive/options/capture_limiters_test.py +++ b/sdks/python/apache_beam/runners/interactive/options/capture_limiters_test.py @@ -15,13 +15,14 @@ # limitations under the License. # -from __future__ import absolute_import - import unittest +import pandas as pd + from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload from apache_beam.runners.interactive.options.capture_limiters import CountLimiter from apache_beam.runners.interactive.options.capture_limiters import ProcessingTimeLimiter +from apache_beam.utils.windowed_value import WindowedValue class CaptureLimitersTest(unittest.TestCase): @@ -35,6 +36,19 @@ def test_count_limiter(self): limiter.update(4) self.assertTrue(limiter.is_triggered()) + def test_count_limiter_with_dataframes(self): + limiter = CountLimiter(5) + + # Test that empty dataframes don't count. 
+ for _ in range(10): + df = WindowedValue(pd.DataFrame(), 0, []) + limiter.update(df) + + self.assertFalse(limiter.is_triggered()) + df = WindowedValue(pd.DataFrame({'col': list(range(10))}), 0, []) + limiter.update(df) + self.assertTrue(limiter.is_triggered()) + def test_processing_time_limiter(self): limiter = ProcessingTimeLimiter(max_duration_secs=2) diff --git a/sdks/python/apache_beam/runners/interactive/options/interactive_options.py b/sdks/python/apache_beam/runners/interactive/options/interactive_options.py index 781289c50ef8..ccc5d19ae44e 100644 --- a/sdks/python/apache_beam/runners/interactive/options/interactive_options.py +++ b/sdks/python/apache_beam/runners/interactive/options/interactive_options.py @@ -22,8 +22,6 @@ # pytype: skip-file -from __future__ import absolute_import - from dateutil import tz from apache_beam.runners.interactive.options import capture_control diff --git a/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py b/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py index 31d0c18a2798..6c0a9228b343 100644 --- a/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py +++ b/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py @@ -19,8 +19,6 @@ For internal use only; no backwards-compatibility guarantees. """ -from __future__ import absolute_import - import apache_beam as beam from apache_beam.pipeline import PipelineVisitor from apache_beam.runners.interactive import interactive_environment as ie @@ -65,8 +63,7 @@ def __init__(self, pcolls, options=None): # pipeline instance held by the end user. This instance can be processed # into a pipeline fragment that later run by the underlying runner. self._runner_pipeline = self._build_runner_pipeline() - _, self._context = self._runner_pipeline.to_runner_api( - return_context=True, use_fake_coders=True) + _, self._context = self._runner_pipeline.to_runner_api(return_context=True) from apache_beam.runners.interactive import pipeline_instrument as instr self._runner_pcoll_to_id = instr.pcolls_to_pcoll_id( self._runner_pipeline, self._context) @@ -98,7 +95,7 @@ def __init__(self, pcolls, options=None): def deduce_fragment(self): """Deduce the pipeline fragment as an apache_beam.Pipeline instance.""" fragment = beam.pipeline.Pipeline.from_runner_api( - self._runner_pipeline.to_runner_api(use_fake_coders=True), + self._runner_pipeline.to_runner_api(), self._runner_pipeline.runner, self._options) ie.current_env().add_derived_pipeline(self._runner_pipeline, fragment) @@ -121,7 +118,7 @@ def run(self, display_pipeline_graph=False, use_cache=True, blocking=False): def _build_runner_pipeline(self): runner_pipeline = beam.pipeline.Pipeline.from_runner_api( - self._user_pipeline.to_runner_api(use_fake_coders=True), + self._user_pipeline.to_runner_api(), self._user_pipeline.runner, self._options) ie.current_env().add_derived_pipeline(self._user_pipeline, runner_pipeline) diff --git a/sdks/python/apache_beam/runners/interactive/pipeline_fragment_test.py b/sdks/python/apache_beam/runners/interactive/pipeline_fragment_test.py index 4ab580138e34..6e9d327e3021 100644 --- a/sdks/python/apache_beam/runners/interactive/pipeline_fragment_test.py +++ b/sdks/python/apache_beam/runners/interactive/pipeline_fragment_test.py @@ -16,9 +16,8 @@ # """Tests for apache_beam.runners.interactive.pipeline_fragment.""" -from __future__ import absolute_import - import unittest +from unittest.mock import patch import apache_beam as beam from apache_beam.options.pipeline_options import StandardOptions @@ 
-31,13 +30,6 @@ from apache_beam.runners.interactive.testing.pipeline_assertion import assert_pipeline_proto_equal from apache_beam.testing.test_stream import TestStream -# TODO(BEAM-8288): clean up the work-around of nose tests using Python2 without -# unittest.mock module. -try: - from unittest.mock import patch -except ImportError: - from mock import patch # type: ignore[misc] - @unittest.skipIf( not ie.current_env().is_interactive_ready, @@ -92,10 +84,10 @@ def test_user_pipeline_intact_after_deducing_pipeline_fragment(self, cell): # calling locals(). ib.watch({'init': init, 'square': square, 'cube': cube}) user_pipeline_proto_before_deducing_fragment = p.to_runner_api( - return_context=False, use_fake_coders=True) + return_context=False) _ = pf.PipelineFragment([square]).deduce_fragment() user_pipeline_proto_after_deducing_fragment = p.to_runner_api( - return_context=False, use_fake_coders=True) + return_context=False) assert_pipeline_proto_equal( self, user_pipeline_proto_before_deducing_fragment, diff --git a/sdks/python/apache_beam/runners/interactive/pipeline_instrument.py b/sdks/python/apache_beam/runners/interactive/pipeline_instrument.py index f112aebaf8d8..448fca7c515f 100644 --- a/sdks/python/apache_beam/runners/interactive/pipeline_instrument.py +++ b/sdks/python/apache_beam/runners/interactive/pipeline_instrument.py @@ -23,8 +23,6 @@ """ # pytype: skip-file -from __future__ import absolute_import - import apache_beam as beam from apache_beam.pipeline import PipelineVisitor from apache_beam.portability.api import beam_runner_api_pb2 @@ -32,7 +30,8 @@ from apache_beam.runners.interactive import interactive_environment as ie from apache_beam.runners.interactive import pipeline_fragment as pf from apache_beam.runners.interactive import background_caching_job -from apache_beam.runners.interactive.utils import obfuscate +from apache_beam.runners.interactive.caching.cacheable import Cacheable +from apache_beam.runners.interactive.caching.cacheable import CacheKey from apache_beam.testing import test_stream from apache_beam.transforms.window import WindowedValue @@ -40,57 +39,6 @@ WRITE_CACHE = "_WriteCache_" -# TODO: turn this into a dataclass object when we finally get off of Python2. -class Cacheable: - def __init__(self, pcoll_id, var, version, pcoll, producer_version): - self.pcoll_id = pcoll_id - self.var = var - self.version = version - self.pcoll = pcoll - self.producer_version = producer_version - - def __eq__(self, other): - return ( - self.pcoll_id == other.pcoll_id and self.var == other.var and - self.version == other.version and self.pcoll == other.pcoll and - self.producer_version == other.producer_version) - - def __hash__(self): - return hash(( - self.pcoll_id, - self.var, - self.version, - self.pcoll, - self.producer_version)) - - def to_key(self): - return CacheKey( - self.var, - self.version, - self.producer_version, - str(id(self.pcoll.pipeline))) - - -# TODO: turn this into a dataclass object when we finally get off of Python2. -class CacheKey: - def __init__(self, var, version, producer_version, pipeline_id): - # Makes sure that the variable name is obfuscated and only first 10 - # characters taken so that the CacheKey has a constant length. 
- self.var = obfuscate(var)[:10] - self.version = version - self.producer_version = producer_version - self.pipeline_id = pipeline_id - - @staticmethod - def from_str(r): - split = r.split('-') - return CacheKey(split[0], split[1], split[2], split[3]) - - def __repr__(self): - return '-'.join( - [self.var, self.version, self.producer_version, self.pipeline_id]) - - class PipelineInstrument(object): """A pipeline instrument for pipeline to be executed by interactive runner. @@ -103,36 +51,34 @@ class PipelineInstrument(object): """ def __init__(self, pipeline, options=None): self._pipeline = pipeline - # The cache manager per user-defined pipeline is lazily initiated the first - # time accessed. It is owned by interactive_environment module. This - # shortcut reference will be initialized when the user pipeline associated - # to the given pipeline is identified. - self._cache_manager = None - - # Invoke a round trip through the runner API. This makes sure the Pipeline - # proto is stable. The snapshot of pipeline will not be mutated within this - # module and can be used to recover original pipeline if needed. - self._pipeline_snap = beam.pipeline.Pipeline.from_runner_api( - pipeline.to_runner_api(use_fake_coders=True), pipeline.runner, options) - ie.current_env().add_derived_pipeline(self._pipeline, self._pipeline_snap) + + self._user_pipeline = ie.current_env().user_pipeline(pipeline) + if not self._user_pipeline: + self._user_pipeline = pipeline + self._cache_manager = ie.current_env().get_cache_manager( + self._user_pipeline, create_if_absent=True) + # Check if the user defined pipeline contains any source to cache. + # If so, during the check, the cache manager is converted into a + # streaming cache manager, thus re-assign. + if background_caching_job.has_source_to_cache(self._user_pipeline): + self._cache_manager = ie.current_env().get_cache_manager( + self._user_pipeline) self._background_caching_pipeline = beam.pipeline.Pipeline.from_runner_api( - pipeline.to_runner_api(use_fake_coders=True), pipeline.runner, options) + pipeline.to_runner_api(), pipeline.runner, options) ie.current_env().add_derived_pipeline( self._pipeline, self._background_caching_pipeline) # Snapshot of original pipeline information. (self._original_pipeline_proto, - self._original_context) = self._pipeline_snap.to_runner_api( - return_context=True, use_fake_coders=True) + context) = self._pipeline.to_runner_api(return_context=True) # All compute-once-against-original-pipeline fields. self._unbounded_sources = unbounded_sources( self._background_caching_pipeline) # TODO(BEAM-7760): once cache scope changed, this is not needed to manage # relationships across pipelines, runners, and jobs. - self._pcolls_to_pcoll_id = pcolls_to_pcoll_id( - self._pipeline_snap, self._original_context) + self._pcolls_to_pcoll_id = pcolls_to_pcoll_id(self._pipeline, context) # A mapping from PCollection id to python id() value in user defined # pipeline instance. @@ -149,11 +95,6 @@ def __init__(self, pipeline, options=None): # (Dict[str, AppliedPTransform]). self._cached_pcoll_read = {} - # Reference to the user defined pipeline instance based on the given - # pipeline. The class never mutates it. - # Note: the original pipeline is not the user pipeline. - self._user_pipeline = None - # A dict from PCollections in the runner pipeline instance to their # corresponding PCollections in the user pipeline instance. Populated # after preprocess(). 
@@ -182,8 +123,8 @@ def instrumented_pipeline_proto(self): # Prunes upstream transforms that don't contribute to the targets the # instrumented pipeline run cares. return pf.PipelineFragment( - list(targets)).deduce_fragment().to_runner_api(use_fake_coders=True) - return self._pipeline.to_runner_api(use_fake_coders=True) + list(targets)).deduce_fragment().to_runner_api() + return self._pipeline.to_runner_api() def _required_components( self, @@ -286,8 +227,7 @@ def _required_components( def prune_subgraph_for(self, pipeline, required_transform_ids): # Create the pipeline_proto to read all the components from. It will later # create a new pipeline proto from the cut out components. - pipeline_proto, context = pipeline.to_runner_api( - return_context=True, use_fake_coders=False) + pipeline_proto, context = pipeline.to_runner_api(return_context=True) # Get all the root transforms. The caching transforms will be subtransforms # of one of these roots. @@ -337,7 +277,7 @@ def background_caching_pipeline_proto(self): # Create the pipeline_proto to read all the components from. It will later # create a new pipeline proto from the cut out components. pipeline_proto, context = self._background_caching_pipeline.to_runner_api( - return_context=True, use_fake_coders=False) + return_context=True) # Get all the sources we want to cache. sources = unbounded_sources(self._background_caching_pipeline) @@ -421,15 +361,9 @@ def pcolls_to_pcoll_id(self): @property def original_pipeline_proto(self): - """Returns the portable proto representation of the pipeline before - instrumentation.""" + """Returns a snapshot of the pipeline proto before instrumentation.""" return self._original_pipeline_proto - @property - def original_pipeline(self): - """Returns a snapshot of the pipeline before instrumentation.""" - return self._pipeline_snap - @property def user_pipeline(self): """Returns a reference to the pipeline instance defined by the user. If a @@ -528,8 +462,7 @@ def visit_transform(self, transform_node): v = TestStreamVisitor() self._pipeline.visit(v) - pipeline_proto = self._pipeline.to_runner_api( - return_context=False, use_fake_coders=True) + pipeline_proto = self._pipeline.to_runner_api(return_context=False) test_stream_id = '' for t_id, t in pipeline_proto.components.transforms.items(): if t.unique_name == v.test_stream: @@ -571,29 +504,11 @@ def _process(self, pcoll): cacheable_key = self._pin._cacheable_key(pcoll) user_pcoll = self._pin.cacheables[cacheable_key].pcoll if (cacheable_key in self._pin.cacheables and user_pcoll != pcoll): - if not self._pin._user_pipeline: - # Retrieve a reference to the user defined pipeline instance. - self._pin._user_pipeline = user_pcoll.pipeline - # Retrieve a reference to the cache manager for the user defined - # pipeline instance. - self._pin._cache_manager = ie.current_env().get_cache_manager( - self._pin._user_pipeline, create_if_absent=True) - # Check if the user defined pipeline contains any source to cache. - # If so, during the check, the cache manager is converted into a - # streaming cache manager, thus re-assign the reference. 
- if background_caching_job.has_source_to_cache( - self._pin._user_pipeline): - self._pin._cache_manager = ie.current_env().get_cache_manager( - self._pin._user_pipeline) self._pin._runner_pcoll_to_user_pcoll[pcoll] = user_pcoll self._pin.cacheables[cacheable_key].pcoll = pcoll v = PreprocessVisitor(self) self._pipeline.visit(v) - if not self._user_pipeline: - self._user_pipeline = self._pipeline - self._cache_manager = ie.current_env().get_cache_manager( - self._user_pipeline, create_if_absent=True) def _write_cache( self, @@ -679,7 +594,6 @@ def _read_cache(self, pipeline, pcoll, is_unbounded_source_output): key = self.cache_key(pcoll) # Can only read from cache when the cache with expected key exists and its # computation has been completed. - is_cached = self._cache_manager.exists('full', key) is_computed = ( pcoll in self._runner_pcoll_to_user_pcoll and @@ -771,8 +685,8 @@ def enter_composite_transform(self, transform_node): def visit_transform(self, transform_node): if transform_node.inputs: - input_list = list(transform_node.inputs) - for i, input_pcoll in enumerate(input_list): + main_inputs = dict(transform_node.main_inputs) + for tag, input_pcoll in main_inputs.items(): key = self._pin.cache_key(input_pcoll) # Replace the input pcollection with the cached pcollection (if it @@ -780,9 +694,9 @@ def visit_transform(self, transform_node): if key in self._pin._cached_pcoll_read: # Ignore this pcoll in the final pruned instrumented pipeline. self._pin._ignored_targets.add(input_pcoll) - input_list[i] = self._pin._cached_pcoll_read[key] + main_inputs[tag] = self._pin._cached_pcoll_read[key] # Update the transform with its new inputs. - transform_node.inputs = tuple(input_list) + transform_node.main_inputs = main_inputs v = ReadCacheWireVisitor(self) pipeline.visit(v) @@ -885,10 +799,7 @@ def cacheables(pcolls_to_pcoll_id): cacheable_var_by_pcoll_id = {} for watching in ie.current_env().watching(): for key, val in watching: - # TODO(BEAM-8288): cleanup the attribute check when py2 is not supported. - if hasattr(val, '__class__') and isinstance(val, beam.pvalue.PCollection): - cacheable = {} - + if isinstance(val, beam.pvalue.PCollection): pcoll_id = pcolls_to_pcoll_id.get(str(val), None) # It's highly possible that PCollection str is not unique across # multiple pipelines, further check during instrument is needed. 
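For reference, the tag-keyed rewiring that ReadCacheWireVisitor performs above can be sketched as a small standalone helper; the names below are illustrative stand-ins for the instrument's cache_key() and _cached_pcoll_read mapping, not Beam APIs:

def rewire_main_inputs(main_inputs, cache_key_of, cached_reads):
  """Returns a copy of main_inputs with cached reads substituted by tag."""
  rewired = dict(main_inputs)
  for tag, input_pcoll in rewired.items():
    key = cache_key_of(input_pcoll)
    if key in cached_reads:
      rewired[tag] = cached_reads[key]
  return rewired

Keying by tag rather than by positional index is what the change above enables: AppliedPTransform.main_inputs is a dict, whereas the old inputs attribute was a tuple.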
diff --git a/sdks/python/apache_beam/runners/interactive/pipeline_instrument_test.py b/sdks/python/apache_beam/runners/interactive/pipeline_instrument_test.py index 2ab98ed5567b..a3f91c068f2c 100644 --- a/sdks/python/apache_beam/runners/interactive/pipeline_instrument_test.py +++ b/sdks/python/apache_beam/runners/interactive/pipeline_instrument_test.py @@ -18,8 +18,6 @@ """Tests for apache_beam.runners.interactive.pipeline_instrument.""" # pytype: skip-file -from __future__ import absolute_import - import tempfile import unittest @@ -42,6 +40,9 @@ class PipelineInstrumentTest(unittest.TestCase): + def setUp(self): + ie.new_env() + def cache_key_of(self, name, pcoll): return repr( instr.CacheKey( @@ -55,7 +56,7 @@ def test_pcolls_to_pcoll_id(self): ie.current_env().set_cache_manager(InMemoryCache(), p) # pylint: disable=range-builtin-not-iterating init_pcoll = p | 'Init Create' >> beam.Impulse() - _, ctx = p.to_runner_api(use_fake_coders=True, return_context=True) + _, ctx = p.to_runner_api(return_context=True) self.assertEqual( instr.pcolls_to_pcoll_id(p, ctx), {str(init_pcoll): 'ref_PCollection_PCollection_1'}) @@ -65,7 +66,7 @@ def test_cacheable_key_without_version_map(self): ie.current_env().set_cache_manager(InMemoryCache(), p) # pylint: disable=range-builtin-not-iterating init_pcoll = p | 'Init Create' >> beam.Create(range(10)) - _, ctx = p.to_runner_api(use_fake_coders=True, return_context=True) + _, ctx = p.to_runner_api(return_context=True) self.assertEqual( instr.cacheable_key(init_pcoll, instr.pcolls_to_pcoll_id(p, ctx)), str(id(init_pcoll)) + '_ref_PCollection_PCollection_8') @@ -87,7 +88,7 @@ def test_cacheable_key_with_version_map(self): ie.current_env().set_cache_manager(InMemoryCache(), p2) # pylint: disable=range-builtin-not-iterating init_pcoll_2 = p2 | 'Init Create' >> beam.Create(range(10)) - _, ctx = p2.to_runner_api(use_fake_coders=True, return_context=True) + _, ctx = p2.to_runner_api(return_context=True) # The cacheable_key should use id(init_pcoll) as prefix even when # init_pcoll_2 is supplied as long as the version map is given. @@ -188,7 +189,7 @@ def test_background_caching_pipeline_proto(self): # Add some extra PTransform afterwards to make sure that only the unbounded # sources remain. - c = (a, b) | beam.CoGroupByKey() + c = (a, b) | beam.Flatten() _ = c | beam.Map(lambda x: x) ib.watch(locals()) @@ -213,8 +214,7 @@ def test_background_caching_pipeline_proto(self): | 'reify b' >> beam.Map(lambda _: _) | 'b' >> cache.WriteCache(ie.current_env().get_cache_manager(p), '')) - expected_pipeline = p.to_runner_api( - return_context=False, use_fake_coders=True) + expected_pipeline = p.to_runner_api(return_context=False) assert_pipeline_proto_equal(self, expected_pipeline, actual_pipeline) @@ -279,6 +279,7 @@ def test_instrument_example_pipeline_to_read_cache(self): self._mock_write_cache(p_origin, [b'1', b'4', b'9'], second_pcoll_cache_key) # Mark the completeness of PCollections from the original(user) pipeline. 
ie.current_env().mark_pcollection_computed((init_pcoll, second_pcoll)) + ie.current_env().add_derived_pipeline(p_origin, p_copy) instr.build_pipeline_instrument(p_copy) cached_init_pcoll = ( @@ -296,11 +297,11 @@ def enter_composite_transform(self, transform_node): def visit_transform(self, transform_node): if transform_node.inputs: - input_list = list(transform_node.inputs) - for i in range(len(input_list)): - if input_list[i] == init_pcoll: - input_list[i] = cached_init_pcoll - transform_node.inputs = tuple(input_list) + main_inputs = dict(transform_node.main_inputs) + for tag in main_inputs.keys(): + if main_inputs[tag] == init_pcoll: + main_inputs[tag] = cached_init_pcoll + transform_node.main_inputs = main_inputs v = TestReadCacheWireVisitor() p_origin.visit(v) @@ -312,9 +313,8 @@ def test_find_out_correct_user_pipeline(self): # This is a new runner pipeline instance with the same pipeline graph to # what the user_pipeline represents. runner_pipeline = beam.pipeline.Pipeline.from_runner_api( - user_pipeline.to_runner_api(use_fake_coders=True), - user_pipeline.runner, - options=None) + user_pipeline.to_runner_api(), user_pipeline.runner, options=None) + ie.current_env().add_derived_pipeline(user_pipeline, runner_pipeline) # This is a totally irrelevant user pipeline in the watched scope. irrelevant_user_pipeline = beam.Pipeline( interactive_runner.InteractiveRunner()) @@ -342,7 +342,7 @@ def test_instrument_example_unbounded_pipeline_to_read_cache(self): if not isinstance(pcoll, beam.pvalue.PCollection): continue cache_key = self.cache_key_of(name, pcoll) - self._mock_write_cache(p_original, [b''], cache_key) + self._mock_write_cache(p_original, [], cache_key) # Instrument the original pipeline to create the pipeline the user will see. instrumenter = instr.build_pipeline_instrument(p_original) @@ -470,7 +470,7 @@ def visit_transform(self, transform_node): # Test that the pipeline is as expected. assert_pipeline_proto_equal( self, - p_expected.to_runner_api(use_fake_coders=True), + p_expected.to_runner_api(), instrumenter.instrumented_pipeline_proto()) def test_instrument_mixed_streaming_batch(self): @@ -498,7 +498,7 @@ def test_instrument_mixed_streaming_batch(self): ib.watch(locals()) self._mock_write_cache( - p_original, [b''], self.cache_key_of('source_2', source_2)) + p_original, [], self.cache_key_of('source_2', source_2)) ie.current_env().mark_pcollection_computed([source_2]) # Instrument the original pipeline to create the pipeline the user will see. @@ -506,6 +506,7 @@ def test_instrument_mixed_streaming_batch(self): p_original.to_runner_api(), runner=interactive_runner.InteractiveRunner(), options=options) + ie.current_env().add_derived_pipeline(p_original, p_copy) instrumenter = instr.build_pipeline_instrument(p_copy) actual_pipeline = beam.Pipeline.from_runner_api( proto=instrumenter.instrumented_pipeline_proto(), @@ -553,7 +554,7 @@ def visit_transform(self, transform_node): # Test that the pipeline is as expected. assert_pipeline_proto_equal( self, - p_expected.to_runner_api(use_fake_coders=True), + p_expected.to_runner_api(), instrumenter.instrumented_pipeline_proto()) def test_instrument_example_unbounded_pipeline_direct_from_source(self): @@ -615,7 +616,7 @@ def visit_transform(self, transform_node): # Test that the pipeline is as expected. 
assert_pipeline_proto_equal( self, - p_expected.to_runner_api(use_fake_coders=True), + p_expected.to_runner_api(), instrumenter.instrumented_pipeline_proto()) def test_instrument_example_unbounded_pipeline_to_read_cache_not_cached(self): @@ -683,7 +684,7 @@ def visit_transform(self, transform_node): # Test that the pipeline is as expected. assert_pipeline_proto_equal( self, - p_expected.to_runner_api(use_fake_coders=True), + p_expected.to_runner_api(), instrumenter.instrumented_pipeline_proto()) def test_instrument_example_unbounded_pipeline_to_multiple_read_cache(self): @@ -709,7 +710,7 @@ def test_instrument_example_unbounded_pipeline_to_multiple_read_cache(self): if not isinstance(pcoll, beam.pvalue.PCollection): continue cache_key = self.cache_key_of(name, pcoll) - self._mock_write_cache(p_original, [b''], cache_key) + self._mock_write_cache(p_original, [], cache_key) # Instrument the original pipeline to create the pipeline the user will see. instrumenter = instr.build_pipeline_instrument(p_original) @@ -763,9 +764,8 @@ def visit_transform(self, transform_node): def test_pipeline_pruned_when_input_pcoll_is_cached(self): user_pipeline, init_pcoll, _ = self._example_pipeline() runner_pipeline = beam.Pipeline.from_runner_api( - user_pipeline.to_runner_api(use_fake_coders=True), - user_pipeline.runner, - None) + user_pipeline.to_runner_api(), user_pipeline.runner, None) + ie.current_env().add_derived_pipeline(user_pipeline, runner_pipeline) # Mock as if init_pcoll is cached. init_pcoll_cache_key = self.cache_key_of('init_pcoll', init_pcoll) @@ -778,8 +778,7 @@ def test_pipeline_pruned_when_input_pcoll_is_cached(self): pruned_proto = pipeline_instrument.instrumented_pipeline_proto() # Skip the prune step for comparison, it should contain the sub-graph that # produces init_pcoll but not useful anymore. - full_proto = pipeline_instrument._pipeline.to_runner_api( - use_fake_coders=True) + full_proto = pipeline_instrument._pipeline.to_runner_api() self.assertEqual( len( pruned_proto.components.transforms[ @@ -793,7 +792,7 @@ def test_pipeline_pruned_when_input_pcoll_is_cached(self): 'ref_AppliedPTransform_AppliedPTransform_1'].subtransforms), 6) assert_pipeline_proto_contain_top_level_transform( - self, full_proto, 'Init Source') + self, full_proto, 'Init-Source') def test_side_effect_pcoll_is_included(self): pipeline_with_side_effect = beam.Pipeline( diff --git a/sdks/python/apache_beam/runners/interactive/recording_manager.py b/sdks/python/apache_beam/runners/interactive/recording_manager.py index f1fc085672ce..c51a6480a18a 100644 --- a/sdks/python/apache_beam/runners/interactive/recording_manager.py +++ b/sdks/python/apache_beam/runners/interactive/recording_manager.py @@ -15,8 +15,6 @@ # limitations under the License. 
# -from __future__ import absolute_import - import logging import threading import time @@ -25,6 +23,7 @@ import pandas as pd import apache_beam as beam +from apache_beam.dataframe.frame_base import DeferredBase from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload from apache_beam.runners.interactive import background_caching_job as bcj from apache_beam.runners.interactive import interactive_environment as ie @@ -65,6 +64,13 @@ def var(self): """Returns the variable named that defined this PCollection.""" return self._var + @property + def pcoll(self): + # type: () -> beam.pvalue.PCollection + + """Returns the PCollection that supplies this stream with data.""" + return self._pcoll + @property def cache_key(self): # type: () -> str @@ -289,10 +295,22 @@ def _watch(self, pcolls): """ watched_pcollections = set() + watched_dataframes = set() for watching in ie.current_env().watching(): for _, val in watching: if isinstance(val, beam.pvalue.PCollection): watched_pcollections.add(val) + elif isinstance(val, DeferredBase): + watched_dataframes.add(val) + + # Convert them one-by-one to generate a unique label for each. This allows + # caching at a more fine-grained granularity. + # + # TODO(BEAM-12388): investigate the mixing pcollections in multiple + # pipelines error when using the default label. + for df in watched_dataframes: + pcoll, _ = utils.deferred_df_to_pcollection(df) + watched_pcollections.add(pcoll) for pcoll in pcolls: if pcoll not in watched_pcollections: ie.current_env().watch( diff --git a/sdks/python/apache_beam/runners/interactive/recording_manager_test.py b/sdks/python/apache_beam/runners/interactive/recording_manager_test.py index bf7d216151b6..ca44ca3ef66d 100644 --- a/sdks/python/apache_beam/runners/interactive/recording_manager_test.py +++ b/sdks/python/apache_beam/runners/interactive/recording_manager_test.py @@ -15,11 +15,9 @@ # limitations under the License. # -from __future__ import absolute_import - -import sys import time import unittest +from unittest.mock import MagicMock import apache_beam as beam from apache_beam import coders @@ -43,13 +41,6 @@ from apache_beam.utils.timestamp import MIN_TIMESTAMP from apache_beam.utils.windowed_value import WindowedValue -# TODO(BEAM-8288): clean up the work-around of nose tests using Python2 without -# unittest.mock module. -try: - from unittest.mock import MagicMock -except ImportError: - from mock import MagicMock # type: ignore[misc] - class MockPipelineResult(beam.runners.runner.PipelineResult): """Mock class for controlling a PipelineResult.""" @@ -191,9 +182,6 @@ def as_windowed_value(element): class RecordingTest(unittest.TestCase): - @unittest.skipIf( - sys.version_info < (3, 6, 0), - 'This test requires at least Python 3.6 to work.') def test_computed(self): """Tests that a PCollection is marked as computed only in a complete state. 
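The `_watch` change to `recording_manager.py` above also tracks watched deferred DataFrames, converting each one to a PCollection so it can be cached at per-DataFrame granularity. A minimal sketch of that flow, using the `deferred_df_to_pcollection` helper added to `utils.py` later in this diff (pipeline and variable names are illustrative):

from typing import NamedTuple

import apache_beam as beam
from apache_beam.dataframe.convert import to_dataframe
from apache_beam.runners.interactive import interactive_beam as ib
from apache_beam.runners.interactive import utils
from apache_beam.runners.interactive.interactive_runner import InteractiveRunner


class Record(NamedTuple):
    i: int
    a: int


p = beam.Pipeline(InteractiveRunner())
elems = p | beam.Create([Record(n, n % 5) for n in range(10)])
df = to_dataframe(elems)

# Watching the deferred DataFrame is enough: _watch() converts it back into a
# PCollection (with a unique label per expression) so the RecordingManager can
# cache it like any directly watched PCollection.
ib.watch({'df': df})
pcoll_for_cache, _proxy = utils.deferred_df_to_pcollection(df)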
@@ -261,9 +249,6 @@ def test_computed(self): self.assertTrue(recording.computed()) self.assertFalse(recording.uncomputed()) - @unittest.skipIf( - sys.version_info < (3, 6, 0), - 'This test requires at least Python 3.6 to work.') def test_describe(self): p = beam.Pipeline(InteractiveRunner()) numbers = p | 'numbers' >> beam.Create([0, 1, 2]) @@ -309,9 +294,6 @@ def test_describe(self): class RecordingManagerTest(unittest.TestCase): - @unittest.skipIf( - sys.version_info < (3, 6, 0), - 'This test requires at least Python 3.6 to work.') def test_basic_execution(self): """A basic pipeline to be used as a smoke test.""" @@ -351,9 +333,6 @@ def test_basic_execution(self): rm.cancel() - @unittest.skipIf( - sys.version_info < (3, 6, 0), - 'This test requires at least Python 3.6 to work.') def test_duration_parsing(self): p = beam.Pipeline(InteractiveRunner()) elems = p | beam.Create([0, 1, 2]) @@ -371,9 +350,6 @@ def test_duration_parsing(self): # Assert that the duration was parsed correctly to integer seconds. self.assertEqual(recording.describe()['duration'], 500) - @unittest.skipIf( - sys.version_info < (3, 6, 0), - 'This test requires at least Python 3.6 to work.') def test_cancel_stops_recording(self): # Add the TestStream so that it can be cached. ib.options.recordable_sources.add(TestStream) @@ -418,9 +394,6 @@ def is_triggered(self): rm.cancel() self.assertTrue(bcj.is_done()) - @unittest.skipIf( - sys.version_info < (3, 6, 0), - 'This test requires at least Python 3.6 to work.') def test_recording_manager_clears_cache(self): """Tests that the RecordingManager clears the cache before recording. @@ -463,9 +436,6 @@ def test_recording_manager_clears_cache(self): unittest.mock.ANY, set(pipeline_instrument.cache_key(pc) for pc in (elems, squares))) - @unittest.skipIf( - sys.version_info < (3, 6, 0), - 'This test requires at least Python 3.6 to work.') def test_clear(self): """Tests that clear can empty the cache for a specific pipeline.""" @@ -502,9 +472,6 @@ def test_clear(self): rm_2.clear() self.assertEqual(rm_2.describe()['size'], 0) - @unittest.skipIf( - sys.version_info < (3, 6, 0), - 'This test requires at least Python 3.6 to work.') def test_record_pipeline(self): # Add the TestStream so that it can be cached. ib.options.recordable_sources.add(TestStream) @@ -532,18 +499,23 @@ def test_record_pipeline(self): class SizeLimiter(Limiter): def __init__(self, p): self.pipeline = p + self._rm = None + + def set_recording_manager(self, rm): + self._rm = rm def is_triggered(self): - rm = ie.current_env().get_recording_manager(self.pipeline) - return rm.describe()['size'] > 0 if rm else False + return self._rm.describe()['size'] > 0 if self._rm else False # Do the first recording to get the timestamp of the first time the fragment # was run. - rm = RecordingManager(p, test_limiters=[SizeLimiter(p)]) + size_limiter = SizeLimiter(p) + rm = RecordingManager(p, test_limiters=[size_limiter]) + size_limiter.set_recording_manager(rm) self.assertEqual(rm.describe()['state'], PipelineState.STOPPED) self.assertTrue(rm.record_pipeline()) - ie.current_env().set_recording_manager(rm, p) + # A recording is in progress, no need to start another one. 
self.assertFalse(rm.record_pipeline()) for _ in range(60): diff --git a/sdks/python/apache_beam/runners/interactive/sql/__init__.py b/sdks/python/apache_beam/runners/interactive/sql/__init__.py new file mode 100644 index 000000000000..cce3acad34a4 --- /dev/null +++ b/sdks/python/apache_beam/runners/interactive/sql/__init__.py @@ -0,0 +1,16 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# diff --git a/sdks/python/apache_beam/runners/interactive/sql/beam_sql_magics.py b/sdks/python/apache_beam/runners/interactive/sql/beam_sql_magics.py new file mode 100644 index 000000000000..1dc42e0bf959 --- /dev/null +++ b/sdks/python/apache_beam/runners/interactive/sql/beam_sql_magics.py @@ -0,0 +1,297 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Module of beam_sql cell magic that executes a Beam SQL. + +Only works within an IPython kernel. 
+""" + +import importlib +import keyword +import logging +from typing import Dict +from typing import Optional +from typing import Tuple +from typing import Union + +import apache_beam as beam +from apache_beam.pvalue import PValue +from apache_beam.runners.interactive import cache_manager as cache +from apache_beam.runners.interactive import interactive_beam as ib +from apache_beam.runners.interactive import interactive_environment as ie +from apache_beam.runners.interactive import pipeline_instrument as inst +from apache_beam.runners.interactive.cache_manager import FileBasedCacheManager +from apache_beam.runners.interactive.caching.streaming_cache import StreamingCache +from apache_beam.runners.interactive.sql.utils import find_pcolls +from apache_beam.runners.interactive.sql.utils import is_namedtuple +from apache_beam.runners.interactive.sql.utils import pcolls_by_name +from apache_beam.runners.interactive.sql.utils import register_coder_for_schema +from apache_beam.runners.interactive.sql.utils import replace_single_pcoll_token +from apache_beam.runners.interactive.utils import obfuscate +from apache_beam.runners.interactive.utils import progress_indicated +from apache_beam.testing import test_stream +from apache_beam.testing.test_stream_service import TestStreamServiceController +from apache_beam.transforms.sql import SqlTransform +from IPython.core.magic import Magics +from IPython.core.magic import cell_magic +from IPython.core.magic import magics_class + +_LOGGER = logging.getLogger(__name__) + +_EXAMPLE_USAGE = """Usage: + %%%%beam_sql [output_name] + Calcite SQL statement + Syntax: https://beam.apache.org/documentation/dsls/sql/calcite/query-syntax/ + Please make sure that there is no conflicts between your variable names and + the SQL keywords, such as "SELECT", "FROM", "WHERE" and etc. + + output_name is optional. If not supplied, a variable name is automatically + assigned to the output of the magic. + + The output of the magic is usually a PCollection or similar PValue, + depending on the SQL statement executed. +""" + + +def on_error(error_msg, *args): + """Logs the error and the usage example.""" + _LOGGER.error(error_msg, *args) + _LOGGER.info(_EXAMPLE_USAGE) + + +@magics_class +class BeamSqlMagics(Magics): + @cell_magic + def beam_sql(self, line: str, cell: str) -> Union[None, PValue]: + """The beam_sql cell magic that executes a Beam SQL. + + Args: + line: (optional) the string on the same line after the beam_sql magic. + Used as the output variable name in the __main__ module. + cell: everything else in the same notebook cell as a string. Used as a + Beam SQL query. + + Returns None if running into an error, otherwise a PValue as if a + SqlTransform is applied. + """ + if line and not line.strip().isidentifier() or keyword.iskeyword( + line.strip()): + on_error( + 'The output_name "%s" is not a valid identifier. Please supply a ' + 'valid identifier that is not a Python keyword.', + line) + return + if not cell or cell.isspace(): + on_error('Please supply the sql to be executed.') + return + found = find_pcolls(cell, pcolls_by_name()) + for _, pcoll in found.items(): + if not is_namedtuple(pcoll.element_type): + on_error( + 'PCollection %s of type %s is not a NamedTuple. See ' + 'https://beam.apache.org/documentation/programming-guide/#schemas ' + 'for more details.', + pcoll, + pcoll.element_type) + return + register_coder_for_schema(pcoll.element_type) + + # TODO(BEAM-10708): implicitly execute the pipeline and write output into + # cache. 
+ return apply_sql(cell, line, found) + + +@progress_indicated +def apply_sql( + query: str, output_name: Optional[str], + found: Dict[str, beam.PCollection]) -> PValue: + """Applies a SqlTransform with the given sql and queried PCollections. + + Args: + query: The SQL query executed in the magic. + output_name: (optional) The output variable name in __main__ module. + found: The PCollections with variable names found to be used in the query. + + Returns: + A PValue, mostly a PCollection, depending on the query. + """ + output_name = _generate_output_name(output_name, query, found) + query, sql_source = _build_query_components(query, found) + try: + output = sql_source | SqlTransform(query) + # Declare a variable with the output_name and output value in the + # __main__ module so that the user can use the output smoothly. + setattr(importlib.import_module('__main__'), output_name, output) + ib.watch({output_name: output}) + _LOGGER.info( + "The output PCollection variable is %s: %s", output_name, output) + return output + except (KeyboardInterrupt, SystemExit): + raise + except Exception as e: + on_error('Error when applying the Beam SQL: %s', e) + + +def pcoll_from_file_cache( + query_pipeline: beam.Pipeline, + pcoll: beam.PCollection, + cache_manager: FileBasedCacheManager, + key: str) -> beam.PCollection: + """Reads PCollection cache from files. + + Args: + query_pipeline: The beam.Pipeline object built by the magic to execute the + SQL query. + pcoll: The PCollection to read cache for. + cache_manager: The file based cache manager that holds the PCollection + cache. + key: The key of the PCollection cache. + + Returns: + A PCollection read from the cache. + """ + schema = pcoll.element_type + + class Unreify(beam.DoFn): + def process(self, e): + if isinstance(e, beam.Row) and hasattr(e, 'windowed_value'): + yield e.windowed_value + + return ( + query_pipeline + | + '{}{}'.format('QuerySource', key) >> cache.ReadCache(cache_manager, key) + | '{}{}'.format('Unreify', key) >> beam.ParDo( + Unreify()).with_output_types(schema)) + + +def pcolls_from_streaming_cache( + user_pipeline: beam.Pipeline, + query_pipeline: beam.Pipeline, + name_to_pcoll: Dict[str, beam.PCollection], + instrumentation: inst.PipelineInstrument, + cache_manager: StreamingCache) -> Dict[str, beam.PCollection]: + """Reads PCollection cache through the TestStream. + + Args: + user_pipeline: The beam.Pipeline object defined by the user in the + notebook. + query_pipeline: The beam.Pipeline object built by the magic to execute the + SQL query. + name_to_pcoll: PCollections with variable names used in the SQL query. + instrumentation: A pipeline_instrument.PipelineInstrument that helps + calculate the cache key of a given PCollection. + cache_manager: The streaming cache manager that holds the PCollection cache. + + Returns: + A Dict[str, beam.PCollection], where each PCollection is tagged with + their PCollection variable names, read from the cache. + + When the user_pipeline has unbounded sources, we force all cache reads to go + through the TestStream even if they are bounded sources. 
+ """ + def exception_handler(e): + _LOGGER.error(str(e)) + return True + + test_stream_service = ie.current_env().get_test_stream_service_controller( + user_pipeline) + if not test_stream_service: + test_stream_service = TestStreamServiceController( + cache_manager, exception_handler=exception_handler) + test_stream_service.start() + ie.current_env().set_test_stream_service_controller( + user_pipeline, test_stream_service) + + tag_to_name = {} + for name, pcoll in name_to_pcoll.items(): + key = instrumentation.cache_key(pcoll) + tag_to_name[key] = name + output_pcolls = query_pipeline | test_stream.TestStream( + output_tags=set(tag_to_name.keys()), + coder=cache_manager._default_pcoder, + endpoint=test_stream_service.endpoint) + sql_source = {} + for tag, output in output_pcolls.items(): + name = tag_to_name[tag] + # Must mark the element_type to avoid introducing pickled Python coder + # to the Java expansion service. + output.element_type = name_to_pcoll[name].element_type + sql_source[name] = output + return sql_source + + +def _generate_output_name( + output_name: Optional[str], query: str, + found: Dict[str, beam.PCollection]) -> str: + """Generates a unique output name if None is provided. + + Otherwise, returns the given output name directly. + The generated output name is sql_output_{uuid} where uuid is an obfuscated + value from the query and PCollections found to be used in the query. + """ + if not output_name: + execution_id = obfuscate(query, found)[:12] + output_name = 'sql_output_' + execution_id + return output_name + + +def _build_query_components( + query: str, found: Dict[str, beam.PCollection] +) -> Tuple[str, + Union[Dict[str, beam.PCollection], beam.PCollection, beam.Pipeline]]: + """Builds necessary components needed to apply the SqlTransform. + + Args: + query: The SQL query to be executed by the magic. + found: The PCollections with variable names found to be used by the query. + + Returns: + The processed query to be executed by the magic and a source to apply the + SqlTransform to: a dictionary of tagged PCollections, or a single + PCollection, or the pipeline to execute the query. + """ + if found: + user_pipeline = next(iter(found.values())).pipeline + cache_manager = ie.current_env().get_cache_manager(user_pipeline) + instrumentation = inst.build_pipeline_instrument(user_pipeline) + sql_pipeline = beam.Pipeline(options=user_pipeline._options) + ie.current_env().add_derived_pipeline(user_pipeline, sql_pipeline) + sql_source = {} + if instrumentation.has_unbounded_sources: + sql_source = pcolls_from_streaming_cache( + user_pipeline, sql_pipeline, found, instrumentation, cache_manager) + else: + for pcoll_name, pcoll in found.items(): + cache_key = instrumentation.cache_key(pcoll) + sql_source[pcoll_name] = pcoll_from_file_cache( + sql_pipeline, pcoll, cache_manager, cache_key) + if len(sql_source) == 1: + query = replace_single_pcoll_token(query, next(iter(sql_source.keys()))) + sql_source = next(iter(sql_source.values())) + else: + sql_source = beam.Pipeline() + return query, sql_source + + +def load_ipython_extension(ipython): + """Marks this module as an IPython extension. + + To load this magic in an IPython environment, execute: + %load_ext apache_beam.runners.interactive.sql.beam_sql_magics. 
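To make the expected workflow concrete, here is a sketch of a notebook session using this magic. Variable names and the query are illustrative; the automatic output naming follows `_generate_output_name` above.

# Cell 1: define a schema-aware PCollection with an interactive pipeline.
from typing import NamedTuple

import apache_beam as beam
from apache_beam.runners.interactive.interactive_runner import InteractiveRunner


class Word(NamedTuple):
    text: str
    length: int


p = beam.Pipeline(InteractiveRunner())
words = p | beam.Create([Word('cat', 3), Word('horse', 5)])

# Cell 2: load the magic (once per kernel).
%load_ext apache_beam.runners.interactive.sql.beam_sql_magics

# Cell 3: query the PCollection by its variable name. The optional `out`
# becomes the output variable in __main__; if omitted, a name such as
# sql_output_<12-char-hash> is derived from the query and its inputs.
%%beam_sql out
SELECT text FROM words WHERE length > 3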
+ """ + ipython.register_magics(BeamSqlMagics) diff --git a/sdks/python/apache_beam/runners/interactive/sql/beam_sql_magics_test.py b/sdks/python/apache_beam/runners/interactive/sql/beam_sql_magics_test.py new file mode 100644 index 000000000000..d35bd4624b7f --- /dev/null +++ b/sdks/python/apache_beam/runners/interactive/sql/beam_sql_magics_test.py @@ -0,0 +1,121 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Tests for beam_sql_magics module.""" + +# pytype: skip-file + +import unittest +from unittest.mock import patch + +import pytest + +import apache_beam as beam +from apache_beam.runners.interactive import interactive_beam as ib +from apache_beam.runners.interactive import interactive_environment as ie + +try: + from apache_beam.runners.interactive.sql.beam_sql_magics import _build_query_components + from apache_beam.runners.interactive.sql.beam_sql_magics import _generate_output_name +except (ImportError, NameError): + pass # The test is to be skipped because [interactive] dep not installed. + + +@unittest.skipIf( + not ie.current_env().is_interactive_ready, + '[interactive] dependency is not installed.') +@pytest.mark.skipif( + not ie.current_env().is_interactive_ready, + reason='[interactive] dependency is not installed.') +class BeamSqlMagicsTest(unittest.TestCase): + def test_generate_output_name_when_not_provided(self): + output_name = None + self.assertTrue( + _generate_output_name(output_name, '', {}).startswith('sql_output_')) + + def test_use_given_output_name_when_provided(self): + output_name = 'output' + self.assertEqual(_generate_output_name(output_name, '', {}), output_name) + + def test_build_query_components_when_no_pcoll_queried(self): + query = """SELECT CAST(1 AS INT) AS `id`, + CAST('foo' AS VARCHAR) AS `str`, + CAST(3.14 AS DOUBLE) AS `flt`""" + processed_query, sql_source = _build_query_components(query, {}) + self.assertEqual(processed_query, query) + self.assertIsInstance(sql_source, beam.Pipeline) + + def test_build_query_components_when_single_pcoll_queried(self): + p = beam.Pipeline() + target = p | beam.Create([1, 2, 3]) + ib.watch(locals()) + query = 'SELECT * FROM target where a=1' + found = {'target': target} + + with patch('apache_beam.runners.interactive.sql.beam_sql_magics.' 
+ 'pcoll_from_file_cache', + lambda a, + b, + c, + d: target): + processed_query, sql_source = _build_query_components(query, found) + + self.assertEqual(processed_query, 'SELECT * FROM PCOLLECTION where a=1') + self.assertIsInstance(sql_source, beam.PCollection) + + def test_build_query_components_when_multiple_pcolls_queried(self): + p = beam.Pipeline() + pcoll_1 = p | 'Create 1' >> beam.Create([1, 2, 3]) + pcoll_2 = p | 'Create 2' >> beam.Create([4, 5, 6]) + ib.watch(locals()) + query = 'SELECT * FROM pcoll_1 JOIN pcoll_2 USING (a)' + found = {'pcoll_1': pcoll_1, 'pcoll_2': pcoll_2} + + with patch('apache_beam.runners.interactive.sql.beam_sql_magics.' + 'pcoll_from_file_cache', + lambda a, + b, + c, + d: pcoll_1): + processed_query, sql_source = _build_query_components(query, found) + + self.assertEqual(processed_query, query) + self.assertIsInstance(sql_source, dict) + self.assertIn('pcoll_1', sql_source) + self.assertIn('pcoll_2', sql_source) + + def test_build_query_components_when_unbounded_pcolls_queried(self): + p = beam.Pipeline() + pcoll = p | beam.io.ReadFromPubSub( + subscription='projects/fake-project/subscriptions/fake_sub') + ib.watch(locals()) + query = 'SELECT * FROM pcoll' + found = {'pcoll': pcoll} + + with patch('apache_beam.runners.interactive.sql.beam_sql_magics.' + 'pcolls_from_streaming_cache', + lambda a, + b, + c, + d, + e: found): + _, sql_source = _build_query_components(query, found) + self.assertIs(sql_source, pcoll) + + +if __name__ == '__main__': + unittest.main() diff --git a/sdks/python/apache_beam/runners/interactive/sql/utils.py b/sdks/python/apache_beam/runners/interactive/sql/utils.py new file mode 100644 index 000000000000..355b6e6693dd --- /dev/null +++ b/sdks/python/apache_beam/runners/interactive/sql/utils.py @@ -0,0 +1,125 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Module of utilities for SQL magics. + +For internal use only; no backward-compatibility guarantees. +""" + +# pytype: skip-file + +import logging +from typing import Dict +from typing import NamedTuple + +import apache_beam as beam +from apache_beam.runners.interactive import interactive_beam as ib +from apache_beam.runners.interactive import interactive_environment as ie + +_LOGGER = logging.getLogger(__name__) + + +def is_namedtuple(cls: type) -> bool: + """Determines if a class is built from typing.NamedTuple.""" + return ( + isinstance(cls, type) and issubclass(cls, tuple) and + hasattr(cls, '_fields') and hasattr(cls, '_field_types')) + + +def register_coder_for_schema(schema: NamedTuple) -> None: + """Registers a RowCoder for the given schema if hasn't. + + Notifies the user of what code has been implicitly executed. + """ + assert is_namedtuple(schema), ( + 'Schema %s is not a typing.NamedTuple.' 
% schema) + coder = beam.coders.registry.get_coder(schema) + if not isinstance(coder, beam.coders.RowCoder): + _LOGGER.warning( + 'Schema %s has not been registered to use a RowCoder. ' + 'Automatically registering it by running: ' + 'beam.coders.registry.register_coder(%s, ' + 'beam.coders.RowCoder)', + schema.__name__, + schema.__name__) + beam.coders.registry.register_coder(schema, beam.coders.RowCoder) + + +def pcolls_by_name() -> Dict[str, beam.PCollection]: + """Finds all PCollections by their variable names defined in the notebook.""" + inspectables = ie.current_env().inspector.inspectables + pcolls = {} + for _, inspectable in inspectables.items(): + metadata = inspectable['metadata'] + if metadata['type'] == 'pcollection': + pcolls[metadata['name']] = inspectable['value'] + return pcolls + + +def find_pcolls( + sql: str, pcolls: Dict[str, + beam.PCollection]) -> Dict[str, beam.PCollection]: + """Finds all PCollections used in the given sql query. + + It does a simple word by word match and calls ib.collect for each PCollection + found. + """ + found = {} + for word in sql.split(): + if word in pcolls: + found[word] = pcolls[word] + if found: + _LOGGER.info('Found PCollections used in the magic: %s.', found) + _LOGGER.info('Collecting data...') + for name, pcoll in found.items(): + try: + _ = ib.collect(pcoll) + except (KeyboardInterrupt, SystemExit): + raise + except: + _LOGGER.error( + 'Cannot collect data for PCollection %s. Please make sure the ' + 'PCollections queried in the sql "%s" are all from a single ' + 'pipeline using an InteractiveRunner. Make sure there is no ' + 'ambiguity, for example, same named PCollections from multiple ' + 'pipelines or notebook re-executions.', + name, + sql) + raise + _LOGGER.info('Done collecting data.') + return found + + +def replace_single_pcoll_token(sql: str, pcoll_name: str) -> str: + """Replaces the pcoll_name used in the sql with 'PCOLLECTION'. + + For sql query using only a single PCollection, the PCollection needs to be + referred to as 'PCOLLECTION' instead of its variable/tag name. + """ + words = sql.split() + token_locations = [] + i = 0 + for word in words: + if word.lower() == 'from': + token_locations.append(i + 1) + i += 2 + continue + i += 1 + for token_location in token_locations: + if token_location < len(words) and words[token_location] == pcoll_name: + words[token_location] = 'PCOLLECTION' + return ' '.join(words) diff --git a/sdks/python/apache_beam/runners/interactive/sql/utils_test.py b/sdks/python/apache_beam/runners/interactive/sql/utils_test.py new file mode 100644 index 000000000000..ed52cad30d18 --- /dev/null +++ b/sdks/python/apache_beam/runners/interactive/sql/utils_test.py @@ -0,0 +1,90 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
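A quick illustration of the single-PCollection token replacement defined above; only the name immediately following FROM is rewritten, other occurrences stay untouched (the PCollection name is illustrative):

from apache_beam.runners.interactive.sql.utils import replace_single_pcoll_token

# 'SELECT a, b FROM PCOLLECTION WHERE a > 0'
replace_single_pcoll_token('SELECT a, b FROM my_pcoll WHERE a > 0', 'my_pcoll')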
+# + +"""Tests for utils module.""" + +# pytype: skip-file + +import unittest +from typing import NamedTuple +from unittest.mock import patch + +import apache_beam as beam +from apache_beam.runners.interactive import interactive_beam as ib +from apache_beam.runners.interactive.sql.utils import find_pcolls +from apache_beam.runners.interactive.sql.utils import is_namedtuple +from apache_beam.runners.interactive.sql.utils import pcolls_by_name +from apache_beam.runners.interactive.sql.utils import register_coder_for_schema +from apache_beam.runners.interactive.sql.utils import replace_single_pcoll_token + + +class ANamedTuple(NamedTuple): + a: int + b: str + + +class UtilsTest(unittest.TestCase): + def test_is_namedtuple(self): + class AType: + pass + + a_type = AType + a_tuple = type((1, 2, 3)) + + a_namedtuple = ANamedTuple + + self.assertTrue(is_namedtuple(a_namedtuple)) + self.assertFalse(is_namedtuple(a_type)) + self.assertFalse(is_namedtuple(a_tuple)) + + def test_register_coder_for_schema(self): + self.assertNotIsInstance( + beam.coders.registry.get_coder(ANamedTuple), beam.coders.RowCoder) + register_coder_for_schema(ANamedTuple) + self.assertIsInstance( + beam.coders.registry.get_coder(ANamedTuple), beam.coders.RowCoder) + + def test_pcolls_by_name(self): + p = beam.Pipeline() + pcoll = p | beam.Create([1]) + ib.watch({'p': p, 'pcoll': pcoll}) + + name_to_pcoll = pcolls_by_name() + self.assertIn('pcoll', name_to_pcoll) + + def test_find_pcolls(self): + with patch('apache_beam.runners.interactive.interactive_beam.collect', + lambda _: None): + found = find_pcolls( + """SELECT * FROM pcoll_1 JOIN pcoll_2 + USING (common_column)""", { + 'pcoll_1': None, 'pcoll_2': None + }) + self.assertIn('pcoll_1', found) + self.assertIn('pcoll_2', found) + + def test_replace_single_pcoll_token(self): + sql = 'SELECT * FROM abc WHERE a=1 AND b=2' + replaced_sql = replace_single_pcoll_token(sql, 'wow') + self.assertEqual(replaced_sql, sql) + replaced_sql = replace_single_pcoll_token(sql, 'abc') + self.assertEqual( + replaced_sql, 'SELECT * FROM PCOLLECTION WHERE a=1 AND b=2') + + +if __name__ == '__main__': + unittest.main() diff --git a/sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Darwin/29c9237ddf4f3d5988a503069b4d3c47.png b/sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Darwin/29c9237ddf4f3d5988a503069b4d3c47.png new file mode 100644 index 000000000000..8463b3f65a87 Binary files /dev/null and b/sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Darwin/29c9237ddf4f3d5988a503069b4d3c47.png differ diff --git a/sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Darwin/7a35f487b2a5f3a9b9852a8659eeb4bd.png b/sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Darwin/7a35f487b2a5f3a9b9852a8659eeb4bd.png index bfafc868b663..21796197835f 100644 Binary files a/sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Darwin/7a35f487b2a5f3a9b9852a8659eeb4bd.png and b/sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Darwin/7a35f487b2a5f3a9b9852a8659eeb4bd.png differ diff --git a/sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Linux/29c9237ddf4f3d5988a503069b4d3c47.png b/sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Linux/29c9237ddf4f3d5988a503069b4d3c47.png new file mode 100644 index 000000000000..a0880181cf9f Binary files /dev/null and 
b/sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Linux/29c9237ddf4f3d5988a503069b4d3c47.png differ diff --git a/sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Linux/7a35f487b2a5f3a9b9852a8659eeb4bd.png b/sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Linux/7a35f487b2a5f3a9b9852a8659eeb4bd.png index 6f3f325aef6a..d089751f359e 100644 Binary files a/sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Linux/7a35f487b2a5f3a9b9852a8659eeb4bd.png and b/sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Linux/7a35f487b2a5f3a9b9852a8659eeb4bd.png differ diff --git a/sdks/python/apache_beam/runners/interactive/testing/integration/notebook_executor.py b/sdks/python/apache_beam/runners/interactive/testing/integration/notebook_executor.py index 10517223630a..ecff3bd40560 100644 --- a/sdks/python/apache_beam/runners/interactive/testing/integration/notebook_executor.py +++ b/sdks/python/apache_beam/runners/interactive/testing/integration/notebook_executor.py @@ -20,8 +20,6 @@ # pytype: skip-file -from __future__ import absolute_import - import html import os import shutil @@ -82,7 +80,8 @@ def execute(self): for path in self._paths: with open(path, 'r') as nb_f: nb = nbformat.read(nb_f, as_version=4) - ep = ExecutePreprocessor(allow_errors=True, kernel_name='test') + ep = ExecutePreprocessor( + timeout=-1, allow_errors=True, kernel_name='test') ep.preprocess(nb, {'metadata': {'path': os.path.dirname(path)}}) execution_id = obfuscate(path) diff --git a/sdks/python/apache_beam/runners/interactive/testing/integration/screen_diff.py b/sdks/python/apache_beam/runners/interactive/testing/integration/screen_diff.py index 57dc0a72912c..5e0c8551ca1c 100644 --- a/sdks/python/apache_beam/runners/interactive/testing/integration/screen_diff.py +++ b/sdks/python/apache_beam/runners/interactive/testing/integration/screen_diff.py @@ -19,26 +19,18 @@ # pytype: skip-file -from __future__ import absolute_import - import os import platform import threading import unittest +from http.server import HTTPServer +from http.server import SimpleHTTPRequestHandler import pytest from apache_beam.runners.interactive import interactive_environment as ie from apache_beam.runners.interactive.testing.integration import notebook_executor -# TODO(BEAM-8288): clean up the work-around when Python2 support is deprecated. 
-try: - from http.server import SimpleHTTPRequestHandler - from http.server import HTTPServer -except ImportError: - import SimpleHTTPServer as HTTPServer - from SimpleHTTPServer import SimpleHTTPRequestHandler - try: import chromedriver_binary # pylint: disable=unused-import from needle.cases import NeedleTestCase @@ -127,7 +119,6 @@ def should_skip(): """Whether a screen diff test should be skipped.""" return not ( platform.system() in _SUPPORTED_PLATFORMS and - ie.current_env().is_py_version_ready and ie.current_env().is_interactive_ready and _interactive_integration_ready) diff --git a/sdks/python/apache_beam/runners/interactive/testing/integration/test_notebooks/dataframes.ipynb b/sdks/python/apache_beam/runners/interactive/testing/integration/test_notebooks/dataframes.ipynb new file mode 100644 index 000000000000..cb6bf4f4d4cd --- /dev/null +++ b/sdks/python/apache_beam/runners/interactive/testing/integration/test_notebooks/dataframes.ipynb @@ -0,0 +1,106 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "# Interactive Beam Examples" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import apache_beam as beam\n", + "from apache_beam.runners.interactive.interactive_runner import InteractiveRunner\n", + "from apache_beam.runners.interactive import interactive_beam as ib\n", + "from apache_beam.dataframe.convert import to_dataframe\n", + "from apache_beam.dataframe.convert import to_pcollection\n", + "from typing import NamedTuple\n", + "\n", + "from IPython.display import display, HTML" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class Record(NamedTuple):\n", + " i: int\n", + " a: int\n", + " \n", + "data = [Record(i, i % 5) for i in range(10)]\n", + "p = beam.Pipeline(InteractiveRunner())\n", + "\n", + "els = p | beam.Create(data)\n", + "df = to_dataframe(els).set_index('i', drop=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ib.show(df)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ib.collect(df)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "display(HTML('
    '))" + ] + } + ], + "metadata": { + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.7" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/sdks/python/apache_beam/runners/interactive/testing/integration/tests/init_square_cube_test.py b/sdks/python/apache_beam/runners/interactive/testing/integration/tests/screen_diff_test.py similarity index 62% rename from sdks/python/apache_beam/runners/interactive/testing/integration/tests/init_square_cube_test.py rename to sdks/python/apache_beam/runners/interactive/testing/integration/tests/screen_diff_test.py index 9853dd3cd3a0..0d36c8844349 100644 --- a/sdks/python/apache_beam/runners/interactive/testing/integration/tests/init_square_cube_test.py +++ b/sdks/python/apache_beam/runners/interactive/testing/integration/tests/screen_diff_test.py @@ -18,16 +18,38 @@ """Integration tests for interactive beam.""" # pytype: skip-file -from __future__ import absolute_import - import unittest +import pytest + from apache_beam.runners.interactive.testing.integration.screen_diff import BaseTestCase +@pytest.mark.timeout(300) +class DataFramesTest(BaseTestCase): + def __init__(self, *args, **kwargs): + kwargs['golden_size'] = (1024, 10000) + super(DataFramesTest, self).__init__(*args, **kwargs) + + def explicit_wait(self): + try: + from selenium.webdriver.common.by import By + from selenium.webdriver.support import expected_conditions + from selenium.webdriver.support.ui import WebDriverWait + + WebDriverWait(self.driver, 5).until( + expected_conditions.presence_of_element_located((By.ID, 'test-done'))) + except: + pass # The test will be ignored. + + def test_dataframes(self): + self.assert_notebook('dataframes') + + +@pytest.mark.timeout(300) class InitSquareCubeTest(BaseTestCase): def __init__(self, *args, **kwargs): - kwargs['golden_size'] = (1024, 7000) + kwargs['golden_size'] = (1024, 10000) super(InitSquareCubeTest, self).__init__(*args, **kwargs) def test_init_square_cube_notebook(self): diff --git a/sdks/python/apache_beam/runners/interactive/testing/test_cache_manager.py b/sdks/python/apache_beam/runners/interactive/testing/test_cache_manager.py index 207881836167..b94926913116 100644 --- a/sdks/python/apache_beam/runners/interactive/testing/test_cache_manager.py +++ b/sdks/python/apache_beam/runners/interactive/testing/test_cache_manager.py @@ -14,8 +14,6 @@ # See the License for the specific language governing permissions and # limitations under the License. # -from __future__ import absolute_import - import collections import itertools import sys @@ -67,6 +65,10 @@ def cleanup(self): self._cached = collections.defaultdict(list) self._pcoders = {} + def clear(self, *label): + # Noop because in-memory. + pass + def source(self, *labels): vals = self._cached[self._key(*labels)] return beam.Create(vals) diff --git a/sdks/python/apache_beam/runners/interactive/user_pipeline_tracker_test.py b/sdks/python/apache_beam/runners/interactive/user_pipeline_tracker_test.py index 38e409e06b6d..f7025b8b75bf 100644 --- a/sdks/python/apache_beam/runners/interactive/user_pipeline_tracker_test.py +++ b/sdks/python/apache_beam/runners/interactive/user_pipeline_tracker_test.py @@ -15,8 +15,6 @@ # limitations under the License. 
# -from __future__ import absolute_import - import unittest import apache_beam as beam diff --git a/sdks/python/apache_beam/runners/interactive/utils.py b/sdks/python/apache_beam/runners/interactive/utils.py index f9e97380ad0c..cb0b7db67ec9 100644 --- a/sdks/python/apache_beam/runners/interactive/utils.py +++ b/sdks/python/apache_beam/runners/interactive/utils.py @@ -18,16 +18,30 @@ """Utilities to be used in Interactive Beam. """ -from __future__ import absolute_import - +import functools import hashlib import json import logging import pandas as pd +from apache_beam.dataframe.convert import to_pcollection +from apache_beam.dataframe.frame_base import DeferredBase from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload +from apache_beam.runners.interactive.caching.expression_cache import ExpressionCache from apache_beam.testing.test_stream import WindowedValueHolder +from apache_beam.typehints.schemas import named_fields_from_element_type + +_LOGGER = logging.getLogger(__name__) + +# Add line breaks to the IPythonLogHandler's HTML output. +_INTERACTIVE_LOG_STYLE = """ + +""" def to_element_list( @@ -76,8 +90,8 @@ def elements(): count += 1 -def elements_to_df(elements, include_window_info=False): - # type: (List[WindowedValue], bool) -> DataFrame +def elements_to_df(elements, include_window_info=False, element_type=None): + # type: (List[WindowedValue], bool, Any) -> DataFrame """Parses the given elements into a Dataframe. @@ -86,6 +100,12 @@ def elements_to_df(elements, include_window_info=False): True, then it will concatenate the windowing information onto the elements DataFrame. """ + try: + columns_names = [ + name for name, _ in named_fields_from_element_type(element_type) + ] + except TypeError: + columns_names = None rows = [] windowed_info = [] @@ -94,8 +114,14 @@ def elements_to_df(elements, include_window_info=False): if include_window_info: windowed_info.append([e.timestamp.micros, e.windows, e.pane_info]) - rows_df = pd.DataFrame(rows) - if include_window_info: + using_dataframes = isinstance(element_type, pd.DataFrame) + using_series = isinstance(element_type, pd.Series) + if using_dataframes or using_series: + rows_df = pd.concat(rows) + else: + rows_df = pd.DataFrame(rows, columns=columns_names) + + if include_window_info and not using_series: windowed_info_df = pd.DataFrame( windowed_info, columns=['event_time', 'windows', 'pane_info']) final_df = pd.concat([rows_df, windowed_info_df], axis=1) @@ -152,6 +178,7 @@ def emit(self, record): from html import escape from IPython.core.display import HTML from IPython.core.display import display + display(HTML(_INTERACTIVE_LOG_STYLE)) display( HTML( self.log_template.format( @@ -197,8 +224,10 @@ def __enter__(self): display(HTML(self.spinner_template.format(id=self._id))) else: display(self._enter_text) - except ImportError: - pass # NOOP when dependencies are not available. + except ImportError as e: + _LOGGER.error( + 'Please use interactive Beam features in an IPython' + 'or notebook environment: %s' % e) def __exit__(self, exc_type, exc_value, traceback): try: @@ -214,8 +243,10 @@ def __exit__(self, exc_type, exc_value, traceback): customized_script=script))) else: display(self._exit_text) - except ImportError: - pass # NOOP when dependencies are not avaialble. 
+ except ImportError as e: + _LOGGER.error( + 'Please use interactive Beam features in an IPython' + 'or notebook environment: %s' % e) def progress_indicated(func): @@ -223,6 +254,7 @@ def progress_indicated(func): """A decorator using a unique progress indicator as a context manager to execute the given function within.""" + @functools.wraps(func) def run_within_progress_indicator(*args, **kwargs): with ProgressIndicator('Processing...', 'Done.'): return func(*args, **kwargs) @@ -248,3 +280,17 @@ def return_as_json(*args, **kwargs): return str(return_value) return return_as_json + + +def deferred_df_to_pcollection(df): + assert isinstance(df, DeferredBase), '{} is not a DeferredBase'.format(df) + + # The proxy is used to output a DataFrame with the correct columns. + # + # TODO(BEAM-11064): Once type hints are implemented for pandas, use those + # instead of the proxy. + cache = ExpressionCache() + cache.replace_with_cached(df._expr) + + proxy = df._expr.proxy() + return to_pcollection(df, yield_elements='pandas', label=str(df._expr)), proxy diff --git a/sdks/python/apache_beam/runners/interactive/utils_test.py b/sdks/python/apache_beam/runners/interactive/utils_test.py index 3e4a5461f9d4..ecbba302be2f 100644 --- a/sdks/python/apache_beam/runners/interactive/utils_test.py +++ b/sdks/python/apache_beam/runners/interactive/utils_test.py @@ -15,42 +15,46 @@ # limitations under the License. # -from __future__ import absolute_import - import json import logging -import sys import unittest +from typing import NamedTuple +from unittest.mock import PropertyMock +from unittest.mock import patch import numpy as np import pandas as pd +import pytest +import apache_beam as beam from apache_beam import coders +from apache_beam.dataframe.convert import to_dataframe from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload from apache_beam.runners.interactive import interactive_environment as ie from apache_beam.runners.interactive import utils +from apache_beam.runners.interactive.testing.mock_ipython import mock_get_ipython from apache_beam.testing.test_stream import WindowedValueHolder from apache_beam.utils.timestamp import Timestamp from apache_beam.utils.windowed_value import WindowedValue -# TODO(BEAM-8288): clean up the work-around of nose tests using Python2 without -# unittest.mock module. -try: - from unittest.mock import patch -except ImportError: - from mock import patch + +class Record(NamedTuple): + order_id: int + product_id: int + quantity: int + + +def windowed_value(e): + from apache_beam.transforms.window import GlobalWindow + return WindowedValue(e, 1, [GlobalWindow()]) class ParseToDataframeTest(unittest.TestCase): def test_parse_windowedvalue(self): """Tests that WindowedValues are supported but not present. """ - from apache_beam.transforms.window import GlobalWindow - els = [ - WindowedValue(('a', 2), 1, [GlobalWindow()]), - WindowedValue(('b', 3), 1, [GlobalWindow()]) - ] + els = [windowed_value(('a', 2)), windowed_value(('b', 3))] actual_df = utils.elements_to_df(els, include_window_info=False) expected_df = pd.DataFrame([['a', 2], ['b', 3]], columns=[0, 1]) @@ -60,12 +64,8 @@ def test_parse_windowedvalue(self): def test_parse_windowedvalue_with_window_info(self): """Tests that WindowedValues are supported and have their own columns. 
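The `elements_to_df` change earlier in this file derives column names from the element type when it carries a schema. A small sketch under the assumption that a typed NamedTuple element type yields named fields:

from typing import NamedTuple

from apache_beam.runners.interactive import utils
from apache_beam.transforms.window import GlobalWindow
from apache_beam.utils.windowed_value import WindowedValue


class Record(NamedTuple):
    order_id: int
    quantity: int


els = [
    WindowedValue(Record(1, 2), 1, [GlobalWindow()]),
    WindowedValue(Record(3, 4), 1, [GlobalWindow()]),
]

# With element_type set to the schema'd NamedTuple, the resulting DataFrame
# gets named columns ('order_id', 'quantity') instead of positional 0, 1.
df = utils.elements_to_df(els, element_type=Record)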
""" - from apache_beam.transforms.window import GlobalWindow - els = [ - WindowedValue(('a', 2), 1, [GlobalWindow()]), - WindowedValue(('b', 3), 1, [GlobalWindow()]) - ] + els = [windowed_value(('a', 2)), windowed_value(('b', 3))] actual_df = utils.elements_to_df(els, include_window_info=True) expected_df = pd.DataFrame( @@ -78,15 +78,13 @@ def test_parse_windowedvalue_with_window_info(self): def test_parse_windowedvalue_with_dicts(self): """Tests that dicts play well with WindowedValues. """ - from apache_beam.transforms.window import GlobalWindow - els = [ - WindowedValue({ + windowed_value({ 'b': 2, 'd': 4 - }, 1, [GlobalWindow()]), - WindowedValue({ + }), + windowed_value({ 'a': 1, 'b': 2, 'c': 3 - }, 1, [GlobalWindow()]) + }) ] actual_df = utils.elements_to_df(els, include_window_info=True) @@ -97,6 +95,31 @@ def test_parse_windowedvalue_with_dicts(self): # check_like so that ordering of indices doesn't matter. pd.testing.assert_frame_equal(actual_df, expected_df, check_like=True) + def test_parse_dataframes(self): + """Tests that it correctly parses a DataFrame. + """ + deferred = to_dataframe(beam.Pipeline() | beam.Create([Record(0, 0, 0)])) + + els = [windowed_value(pd.DataFrame(Record(n, 0, 0))) for n in range(10)] + + actual_df = utils.elements_to_df( + els, element_type=deferred._expr.proxy()).reset_index(drop=True) + expected_df = pd.concat([e.value for e in els], ignore_index=True) + pd.testing.assert_frame_equal(actual_df, expected_df) + + def test_parse_series(self): + """Tests that it correctly parses a Pandas Series. + """ + deferred = to_dataframe(beam.Pipeline() + | beam.Create([Record(0, 0, 0)]))['order_id'] + + els = [windowed_value(pd.Series([n])) for n in range(10)] + + actual_df = utils.elements_to_df( + els, element_type=deferred._expr.proxy()).reset_index(drop=True) + expected_df = pd.concat([e.value for e in els], ignore_index=True) + pd.testing.assert_series_equal(actual_df, expected_df) + class ToElementListTest(unittest.TestCase): def test_test_stream_payload_events(self): @@ -135,8 +158,6 @@ def test_element_limit_count(self): @unittest.skipIf( not ie.current_env().is_interactive_ready, '[interactive] dependency is not installed.') -@unittest.skipIf( - sys.version_info < (3, 6), 'The tests require at least Python 3.6 to work.') class IPythonLogHandlerTest(unittest.TestCase): def setUp(self): utils.register_ipython_log_handler() @@ -191,54 +212,50 @@ def test_child_module_logger_can_override_logging_level(self, mock_emit): @unittest.skipIf( not ie.current_env().is_interactive_ready, '[interactive] dependency is not installed.') -@unittest.skipIf( - sys.version_info < (3, 6), 'The tests require at least Python 3.6 to work.') +@pytest.mark.skipif( + not ie.current_env().is_interactive_ready, + reason='[interactive] dependency is not installed.') class ProgressIndicatorTest(unittest.TestCase): def setUp(self): ie.new_env() - @patch('IPython.core.display.display') - def test_progress_in_plain_text_when_not_in_notebook(self, mocked_display): - ie.current_env()._is_in_notebook = False - mocked_display.assert_not_called() - - @utils.progress_indicated - def progress_indicated_dummy(): - mocked_display.assert_called_with('Processing...') - - progress_indicated_dummy() - mocked_display.assert_called_with('Done.') - - @patch('IPython.core.display.HTML') - @patch('IPython.core.display.Javascript') - @patch('IPython.core.display.display') - @patch('IPython.core.display.display_javascript') + @patch('IPython.get_ipython', new_callable=mock_get_ipython) + @patch( + 
'apache_beam.runners.interactive.interactive_environment' + '.InteractiveEnvironment.is_in_notebook', + new_callable=PropertyMock) + def test_progress_in_plain_text_when_not_in_notebook( + self, mocked_is_in_notebook, unused): + mocked_is_in_notebook.return_value = False + + with patch('IPython.core.display.display') as mocked_display: + + @utils.progress_indicated + def progress_indicated_dummy(): + mocked_display.assert_any_call('Processing...') + + progress_indicated_dummy() + mocked_display.assert_any_call('Done.') + + @patch('IPython.get_ipython', new_callable=mock_get_ipython) + @patch( + 'apache_beam.runners.interactive.interactive_environment' + '.InteractiveEnvironment.is_in_notebook', + new_callable=PropertyMock) def test_progress_in_HTML_JS_when_in_notebook( - self, - mocked_display_javascript, - mocked_display, - mocked_javascript, - mocked_html): + self, mocked_is_in_notebook, unused): + mocked_is_in_notebook.return_value = True - ie.current_env()._is_in_notebook = True - mocked_display.assert_not_called() - mocked_display_javascript.assert_not_called() - - @utils.progress_indicated - def progress_indicated_dummy(): - mocked_display.assert_called_once() - mocked_html.assert_called_once() - - progress_indicated_dummy() - mocked_display_javascript.assert_called_once() - mocked_javascript.assert_called_once() + with patch('IPython.core.display.HTML') as mocked_html,\ + patch('IPython.core.display.Javascript') as mocked_js: + with utils.ProgressIndicator('enter', 'exit'): + mocked_html.assert_called() + mocked_js.assert_called() @unittest.skipIf( not ie.current_env().is_interactive_ready, '[interactive] dependency is not installed.') -@unittest.skipIf( - sys.version_info < (3, 6), 'The tests require at least Python 3.6 to work.') class MessagingUtilTest(unittest.TestCase): SAMPLE_DATA = {'a': [1, 2, 3], 'b': 4, 'c': '5', 'd': {'e': 'f'}} diff --git a/sdks/python/apache_beam/runners/job/__init__.py b/sdks/python/apache_beam/runners/job/__init__.py index f4f43cbb1236..cce3acad34a4 100644 --- a/sdks/python/apache_beam/runners/job/__init__.py +++ b/sdks/python/apache_beam/runners/job/__init__.py @@ -14,4 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. 
# -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/runners/job/manager.py b/sdks/python/apache_beam/runners/job/manager.py index a1c329e4e06f..9c9265d8fd4e 100644 --- a/sdks/python/apache_beam/runners/job/manager.py +++ b/sdks/python/apache_beam/runners/job/manager.py @@ -20,12 +20,9 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import subprocess import time -from builtins import object import grpc diff --git a/sdks/python/apache_beam/runners/job/utils.py b/sdks/python/apache_beam/runners/job/utils.py index 7bde57735df9..205d87941a5a 100644 --- a/sdks/python/apache_beam/runners/job/utils.py +++ b/sdks/python/apache_beam/runners/job/utils.py @@ -20,8 +20,6 @@ # pytype: skip-file -from __future__ import absolute_import - import json import logging diff --git a/sdks/python/apache_beam/runners/pipeline_context.py b/sdks/python/apache_beam/runners/pipeline_context.py index b63e5396096c..7cefe3956c9e 100644 --- a/sdks/python/apache_beam/runners/pipeline_context.py +++ b/sdks/python/apache_beam/runners/pipeline_context.py @@ -23,9 +23,6 @@ # pytype: skip-file # mypy: disallow-untyped-defs -from __future__ import absolute_import - -from builtins import object from typing import TYPE_CHECKING from typing import Any from typing import Dict @@ -49,12 +46,14 @@ from apache_beam.portability.api import beam_runner_api_pb2 from apache_beam.transforms import core from apache_beam.transforms import environments +from apache_beam.transforms.resources import merge_resource_hints from apache_beam.typehints import native_type_compatibility if TYPE_CHECKING: from google.protobuf import message # pylint: disable=ungrouped-imports from apache_beam.coders.coder_impl import IterableStateReader from apache_beam.coders.coder_impl import IterableStateWriter + from apache_beam.transforms import ptransform PortableObjectT = TypeVar('PortableObjectT', bound='PortableObject') @@ -118,13 +117,23 @@ def get_by_id(self, id): def get_by_proto(self, maybe_new_proto, label=None, deduplicate=False): # type: (message.Message, Optional[str], bool) -> str + # TODO: this method may not be safe for arbitrary protos due to + # xlang concerns, hence limiting usage to the only current use-case it has. + # See: https://github.com/apache/beam/pull/14390#discussion_r616062377 + assert isinstance(maybe_new_proto, beam_runner_api_pb2.Environment) + obj = self._obj_type.from_runner_api( + maybe_new_proto, self._pipeline_context) + if deduplicate: + if obj in self._obj_to_id: + return self._obj_to_id[obj] + for id, proto in self._id_to_proto.items(): if proto == maybe_new_proto: return id return self.put_proto( self._pipeline_context.component_id_map.get_or_assign( - label, obj_type=self._obj_type), + obj=obj, obj_type=self._obj_type, label=label), maybe_new_proto) def get_id_to_proto_map(self): @@ -172,7 +181,6 @@ def __init__(self, iterable_state_read=None, # type: Optional[IterableStateReader] iterable_state_write=None, # type: Optional[IterableStateWriter] namespace='ref', # type: str - allow_proto_holders=False, # type: bool requirements=(), # type: Iterable[str] ): # type: (...) -> None @@ -185,6 +193,7 @@ def __init__(self, self.component_id_map = component_id_map or ComponentIdMap(namespace) assert self.component_id_map.namespace == namespace + # TODO(BEAM-12084) Initialize component_id_map with objects from proto. 
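The `get_by_proto` change in `pipeline_context.py` above narrows proto-based lookup to Environment messages and deduplicates through the deserialized object. A small sketch of the intended behaviour, mirroring the tests later in this diff:

from apache_beam.runners import pipeline_context
from apache_beam.transforms import environments

context = pipeline_context.PipelineContext()
env = environments.SubprocessSDKEnvironment(command_string="foo")
env_proto = env.to_runner_api(None)

# Structurally equal environments resolve to a single id, whether they are
# registered from the proto or from the object itself.
id_from_proto = context.environments.get_by_proto(env_proto)
id_from_obj = context.environments.get_id(env)
assert id_from_obj == id_from_proto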
self.transforms = _PipelineContextMap( self, pipeline.AppliedPTransform, @@ -211,16 +220,15 @@ def __init__(self, namespace, proto.environments if proto is not None else None) - if default_environment: - self._default_environment_id = self.environments.get_id( - default_environment, - label='default_environment') # type: Optional[str] - else: - self._default_environment_id = None + if default_environment is None: + default_environment = environments.DefaultEnvironment() + + self._default_environment_id = self.environments.get_id( + default_environment, label='default_environment') # type: str + self.use_fake_coders = use_fake_coders self.iterable_state_read = iterable_state_read self.iterable_state_write = iterable_state_write - self.allow_proto_holders = allow_proto_holders self._requirements = set(requirements) def add_requirement(self, requirement): @@ -235,12 +243,20 @@ def requirements(self): # rather than an actual coder. The element type is required for some runners, # as well as performing a round-trip through protos. # TODO(BEAM-2717): Remove once this is no longer needed. - def coder_id_from_element_type(self, element_type): - # type: (Any) -> str + def coder_id_from_element_type( + self, element_type, requires_deterministic_key_coder=None): + # type: (Any, Optional[str]) -> str if self.use_fake_coders: return pickler.dumps(element_type).decode('ascii') else: - return self.coders.get_id(coders.registry.get_coder(element_type)) + coder = coders.registry.get_coder(element_type) + if requires_deterministic_key_coder: + coder = coders.TupleCoder([ + coder.key_coder().as_deterministic_coder( + requires_deterministic_key_coder), + coder.value_coder() + ]) + return self.coders.get_id(coder) def element_type_from_coder_id(self, coder_id): # type: (str) -> Any @@ -268,5 +284,33 @@ def to_runner_api(self): return context_proto def default_environment_id(self): - # type: () -> Optional[str] + # type: () -> str return self._default_environment_id + + def get_environment_id_for_resource_hints( + self, hints): # type: (Dict[str, bytes]) -> str + """Returns an environment id that has necessary resource hints.""" + if not hints: + return self.default_environment_id() + + def get_or_create_environment_with_resource_hints( + template_env_id, + resource_hints, + ): # type: (str, Dict[str, bytes]) -> str + """Creates an environment that has necessary hints and returns its id.""" + template_env = self.environments.get_proto_from_id(template_env_id) + cloned_env = beam_runner_api_pb2.Environment() + cloned_env.CopyFrom(template_env) + cloned_env.resource_hints.clear() + cloned_env.resource_hints.update(resource_hints) + + return self.environments.get_by_proto( + cloned_env, label='environment_with_resource_hints', deduplicate=True) + + default_env_id = self.default_environment_id() + env_hints = self.environments.get_by_id(default_env_id).resource_hints() + hints = merge_resource_hints(outer_hints=env_hints, inner_hints=hints) + maybe_new_env_id = get_or_create_environment_with_resource_hints( + default_env_id, hints) + + return maybe_new_env_id diff --git a/sdks/python/apache_beam/runners/pipeline_context_test.py b/sdks/python/apache_beam/runners/pipeline_context_test.py index 119c25695e01..49ff6f744bf1 100644 --- a/sdks/python/apache_beam/runners/pipeline_context_test.py +++ b/sdks/python/apache_beam/runners/pipeline_context_test.py @@ -19,12 +19,11 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest from apache_beam import coders from apache_beam.runners import 
pipeline_context +from apache_beam.transforms import environments class PipelineContextTest(unittest.TestCase): @@ -36,11 +35,32 @@ def test_deduplication(self): def test_deduplication_by_proto(self): context = pipeline_context.PipelineContext() - bytes_coder_proto = coders.BytesCoder().to_runner_api(None) - bytes_coder_ref = context.coders.get_by_proto(bytes_coder_proto) - bytes_coder_ref2 = context.coders.get_by_proto( - bytes_coder_proto, deduplicate=True) - self.assertEqual(bytes_coder_ref, bytes_coder_ref2) + env_proto = environments.SubprocessSDKEnvironment( + command_string="foo").to_runner_api(None) + env_ref_1 = context.environments.get_by_proto(env_proto) + env_ref_2 = context.environments.get_by_proto(env_proto, deduplicate=True) + self.assertEqual(env_ref_1, env_ref_2) + + def test_equal_environments_are_deduplicated_when_fetched_by_obj_or_proto( + self): + context = pipeline_context.PipelineContext() + + env = environments.SubprocessSDKEnvironment(command_string="foo") + env_proto = env.to_runner_api(None) + id_from_proto = context.environments.get_by_proto(env_proto) + id_from_obj = context.environments.get_id(env) + self.assertEqual(id_from_obj, id_from_proto) + self.assertEqual( + context.environments.get_by_id(id_from_obj).command_string, "foo") + + env = environments.SubprocessSDKEnvironment(command_string="bar") + env_proto = env.to_runner_api(None) + id_from_obj = context.environments.get_id(env) + id_from_proto = context.environments.get_by_proto( + env_proto, deduplicate=True) + self.assertEqual(id_from_obj, id_from_proto) + self.assertEqual( + context.environments.get_by_id(id_from_obj).command_string, "bar") def test_serialization(self): context = pipeline_context.PipelineContext() diff --git a/sdks/python/apache_beam/runners/portability/__init__.py b/sdks/python/apache_beam/runners/portability/__init__.py index d247cadd0dac..7af93ed945fa 100644 --- a/sdks/python/apache_beam/runners/portability/__init__.py +++ b/sdks/python/apache_beam/runners/portability/__init__.py @@ -16,5 +16,3 @@ # """This runner is experimental; no backwards-compatibility guarantees.""" - -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/runners/portability/abstract_job_service.py b/sdks/python/apache_beam/runners/portability/abstract_job_service.py index aa88f5f23279..55224f49584a 100644 --- a/sdks/python/apache_beam/runners/portability/abstract_job_service.py +++ b/sdks/python/apache_beam/runners/portability/abstract_job_service.py @@ -16,8 +16,6 @@ # # pytype: skip-file -from __future__ import absolute_import - import copy import itertools import json @@ -26,7 +24,6 @@ import tempfile import uuid import zipfile -from builtins import object from concurrent import futures from typing import TYPE_CHECKING from typing import Dict diff --git a/sdks/python/apache_beam/runners/portability/artifact_service.py b/sdks/python/apache_beam/runners/portability/artifact_service.py index 18537f49f6e8..64023eb15aaa 100644 --- a/sdks/python/apache_beam/runners/portability/artifact_service.py +++ b/sdks/python/apache_beam/runners/portability/artifact_service.py @@ -21,10 +21,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import concurrent.futures import contextlib import hashlib @@ -42,9 +38,9 @@ from typing import MutableMapping from typing import Optional from typing import Tuple +from urllib.request import urlopen import grpc -from future.moves.urllib.request import urlopen from 
apache_beam.io import filesystems from apache_beam.io.filesystems import CompressionTypes @@ -271,7 +267,8 @@ def file_reader(self, path): def file_writer(self, name=None): full_path = filesystems.FileSystems.join(self._root, name) - return filesystems.FileSystems.create(full_path), full_path + return filesystems.FileSystems.create( + full_path, compression_type=CompressionTypes.UNCOMPRESSED), full_path def resolve_artifacts(artifacts, service, dest_dir): @@ -355,6 +352,3 @@ def __next__(self): raise self._queue.get() else: return item - - if sys.version_info < (3, ): - next = __next__ diff --git a/sdks/python/apache_beam/runners/portability/artifact_service_test.py b/sdks/python/apache_beam/runners/portability/artifact_service_test.py index 5855d89eecf2..17f1e962b9a0 100644 --- a/sdks/python/apache_beam/runners/portability/artifact_service_test.py +++ b/sdks/python/apache_beam/runners/portability/artifact_service_test.py @@ -18,16 +18,11 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import contextlib import io import threading import unittest - -from future.moves.urllib.parse import quote +from urllib.parse import quote from apache_beam.portability import common_urns from apache_beam.portability.api import beam_artifact_api_pb2 diff --git a/sdks/python/apache_beam/runners/portability/expansion_service.py b/sdks/python/apache_beam/runners/portability/expansion_service.py index 2536ef3fb5b5..da7b425b6e9f 100644 --- a/sdks/python/apache_beam/runners/portability/expansion_service.py +++ b/sdks/python/apache_beam/runners/portability/expansion_service.py @@ -19,9 +19,6 @@ """ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import traceback from apache_beam import pipeline as beam_pipeline @@ -39,6 +36,8 @@ class ExpansionServiceServicer( def __init__(self, options=None): self._options = options or beam_pipeline.PipelineOptions( environment_type=python_urns.EMBEDDED_PYTHON, sdk_location='container') + self._default_environment = ( + portable_runner.PortableRunner._create_environment(self._options)) def Expand(self, request, context=None): try: @@ -54,8 +53,7 @@ def with_pipeline(component, pcoll_id=None): context = pipeline_context.PipelineContext( request.components, - default_environment=portable_runner.PortableRunner. 
- _create_environment(self._options), + default_environment=self._default_environment, namespace=request.namespace) producers = { pcoll_id: (context.transforms.get_by_id(t_id), pcoll_tag) diff --git a/sdks/python/apache_beam/runners/portability/expansion_service_test.py b/sdks/python/apache_beam/runners/portability/expansion_service_test.py index bd8f72e78df6..98d2faa89c6b 100644 --- a/sdks/python/apache_beam/runners/portability/expansion_service_test.py +++ b/sdks/python/apache_beam/runners/portability/expansion_service_test.py @@ -16,8 +16,6 @@ # # pytype: skip-file -from __future__ import absolute_import - import argparse import logging import signal @@ -25,16 +23,18 @@ import typing import grpc -from past.builtins import unicode import apache_beam as beam import apache_beam.transforms.combiners as combine from apache_beam.coders import RowCoder from apache_beam.pipeline import PipelineOptions +from apache_beam.portability.api import beam_artifact_api_pb2_grpc from apache_beam.portability.api import beam_expansion_api_pb2_grpc from apache_beam.portability.api.external_transforms_pb2 import ExternalConfigurationPayload +from apache_beam.runners.portability import artifact_service from apache_beam.runners.portability import expansion_service from apache_beam.transforms import ptransform +from apache_beam.transforms.environments import PyPIArtifactRegistry from apache_beam.transforms.external import ImplicitSchemaPayloadBuilder from apache_beam.utils import thread_pool_executor @@ -51,6 +51,10 @@ TEST_COMPK_URN = "beam:transforms:xlang:test:compk" TEST_FLATTEN_URN = "beam:transforms:xlang:test:flatten" TEST_PARTITION_URN = "beam:transforms:xlang:test:partition" +TEST_PYTHON_BS4_URN = "beam:transforms:xlang:test:python_bs4" + +# A transform that does not produce an output. 
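> Reviewer note: the test transforms added below all follow the same cross-language registration pattern: a `PTransform` is registered under a URN together with `to_runner_api_parameter` / `from_runner_api_parameter` hooks so the expansion service can reconstruct it. A hedged sketch of that pattern (the URN and class name here are illustrative, not part of this change):

```python
import apache_beam as beam
from apache_beam.transforms import ptransform

ILLUSTRATIVE_URN = "beam:transforms:xlang:test:illustrative"  # hypothetical URN


@ptransform.PTransform.register_urn(ILLUSTRATIVE_URN, None)
class IllustrativeTransform(ptransform.PTransform):
  def expand(self, pcoll):
    return pcoll | beam.Map(lambda x: x)

  def to_runner_api_parameter(self, unused_context):
    # This transform carries no payload.
    return ILLUSTRATIVE_URN, None

  @staticmethod
  def from_runner_api_parameter(
      unused_ptransform, unused_parameter, unused_context):
    return IllustrativeTransform()
```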
+TEST_NO_OUTPUT_URN = "beam:transforms:xlang:test:nooutput" @ptransform.PTransform.register_urn('beam:transforms:xlang:count', None) @@ -88,7 +92,7 @@ def from_runner_api_parameter(unused_ptransform, payload, unused_context): @ptransform.PTransform.register_urn(TEST_PREFIX_URN, None) -@beam.typehints.with_output_types(unicode) +@beam.typehints.with_output_types(str) class PrefixTransform(ptransform.PTransform): def __init__(self, payload): self._payload = payload @@ -113,9 +117,9 @@ def expand(self, pcolls): 'main': (pcolls['main1'], pcolls['main2']) | beam.Flatten() | beam.Map(lambda x, s: x + s, beam.pvalue.AsSingleton( - pcolls['side'])).with_output_types(unicode), + pcolls['side'])).with_output_types(str), 'side': pcolls['side'] - | beam.Map(lambda x: x + x).with_output_types(unicode), + | beam.Map(lambda x: x + x).with_output_types(str), } def to_runner_api_parameter(self, unused_context): @@ -152,7 +156,7 @@ def expand(self, pcoll): return pcoll \ | beam.CoGroupByKey() \ | beam.ParDo(self.ConcatFn()).with_output_types( - typing.Tuple[int, typing.Iterable[unicode]]) + typing.Tuple[int, typing.Iterable[str]]) def to_runner_api_parameter(self, unused_context): return TEST_CGBK_URN, None @@ -181,9 +185,12 @@ def from_runner_api_parameter( @ptransform.PTransform.register_urn(TEST_COMPK_URN, None) class CombinePerKeyTransform(ptransform.PTransform): def expand(self, pcoll): - return pcoll \ - | beam.CombinePerKey(sum).with_output_types( - typing.Tuple[unicode, int]) + output = pcoll \ + | beam.CombinePerKey(sum) + # TODO: Use `with_output_types` instead of explicitly + # assigning to `.element_type` after fixing BEAM-12872 + output.element_type = beam.typehints.Tuple[str, int] + return output def to_runner_api_parameter(self, unused_context): return TEST_COMPK_URN, None @@ -226,6 +233,27 @@ def from_runner_api_parameter( return PartitionTransform() +class ExtractHtmlTitleDoFn(beam.DoFn): + def process(self, element): + from bs4 import BeautifulSoup + soup = BeautifulSoup(element, 'html.parser') + return [soup.title.string] + + +@ptransform.PTransform.register_urn(TEST_PYTHON_BS4_URN, None) +class ExtractHtmlTitleTransform(ptransform.PTransform): + def expand(self, pcoll): + return pcoll | beam.ParDo(ExtractHtmlTitleDoFn()).with_output_types(str) + + def to_runner_api_parameter(self, unused_context): + return TEST_PYTHON_BS4_URN, None + + @staticmethod + def from_runner_api_parameter( + unused_ptransform, unused_parameter, unused_context): + return ExtractHtmlTitleTransform() + + @ptransform.PTransform.register_urn('payload', bytes) class PayloadTransform(ptransform.PTransform): def __init__(self, payload): @@ -271,6 +299,23 @@ def from_runner_api_parameter(unused_ptransform, level, unused_context): return FibTransform(int(level.decode('ascii'))) +@ptransform.PTransform.register_urn(TEST_NO_OUTPUT_URN, None) +class NoOutputTransform(ptransform.PTransform): + def expand(self, pcoll): + def log_val(val): + logging.debug('Got value: %r', val) + + # Logging without returning anything + _ = (pcoll | 'TestLabel' >> beam.ParDo(log_val)) + + def to_runner_api_parameter(self, unused_context): + return TEST_NO_OUTPUT_URN, None + + @staticmethod + def from_runner_api_parameter(unused_ptransform, payload, unused_context): + return NoOutputTransform(parse_string_payload(payload)['data']) + + def parse_string_payload(input_byte): payload = ExternalConfigurationPayload() payload.ParseFromString(input_byte) @@ -287,6 +332,7 @@ def cleanup(unused_signum, unused_frame): def main(unused_argv): + 
PyPIArtifactRegistry.register_artifact('beautifulsoup4', '>=4.9,<5.0') parser = argparse.ArgumentParser() parser.add_argument( '-p', '--port', type=int, help='port on which to serve the job api') @@ -298,6 +344,10 @@ def main(unused_argv): PipelineOptions( ["--experiments", "beam_fn_api", "--sdk_location", "container"])), server) + beam_artifact_api_pb2_grpc.add_ArtifactRetrievalServiceServicer_to_server( + artifact_service.ArtifactRetrievalService( + artifact_service.BeamFilesystemHandler(None).file_reader), + server) server.add_insecure_port('localhost:{}'.format(options.port)) server.start() _LOGGER.info('Listening for expansion requests at %d', options.port) diff --git a/sdks/python/apache_beam/runners/portability/flink_runner.py b/sdks/python/apache_beam/runners/portability/flink_runner.py index f886a7d45a59..6486d3d0a282 100644 --- a/sdks/python/apache_beam/runners/portability/flink_runner.py +++ b/sdks/python/apache_beam/runners/portability/flink_runner.py @@ -19,13 +19,9 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import logging import os import re -import sys import urllib from apache_beam.options import pipeline_options @@ -54,10 +50,6 @@ def default_job_server(self, options): flink_options.flink_master = flink_master if (flink_options.flink_submit_uber_jar and flink_master not in MAGIC_HOST_NAMES): - if sys.version_info < (3, 6): - raise ValueError( - 'flink_submit_uber_jar requires Python 3.6+, current version %s' % - sys.version) # This has to be changed [auto], otherwise we will attempt to submit a # the pipeline remotely on the Flink JobMaster which will _fail_. # DO NOT CHANGE the following line, unless you have tested this. diff --git a/sdks/python/apache_beam/runners/portability/flink_runner_test.py b/sdks/python/apache_beam/runners/portability/flink_runner_test.py index b3c3d5276aa4..7d26a70c9e39 100644 --- a/sdks/python/apache_beam/runners/portability/flink_runner_test.py +++ b/sdks/python/apache_beam/runners/portability/flink_runner_test.py @@ -16,9 +16,6 @@ # # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import argparse import logging import shlex @@ -31,7 +28,6 @@ from tempfile import mkdtemp import pytest -from past.builtins import unicode import apache_beam as beam from apache_beam import Impulse @@ -61,7 +57,7 @@ _LOGGER = logging.getLogger(__name__) -Row = typing.NamedTuple("Row", [("col1", int), ("col2", unicode)]) +Row = typing.NamedTuple("Row", [("col1", int), ("col2", str)]) beam.coders.registry.register_coder(Row, beam.coders.RowCoder) @@ -238,7 +234,8 @@ def test_expand_kafka_read(self): p | ReadFromKafka( consumer_config={ - 'bootstrap.servers': 'notvalid1:7777, notvalid2:3531' + 'bootstrap.servers': 'notvalid1:7777, notvalid2:3531', + 'group.id': 'any_group' }, topics=['topic1', 'topic2'], key_deserializer='org.apache.kafka.' @@ -247,6 +244,8 @@ def test_expand_kafka_read(self): value_deserializer='org.apache.kafka.' 'common.serialization.' 
'LongDeserializer', + commit_offset_in_finalize=True, + timestamp_policy=ReadFromKafka.create_time_policy, expansion_service=self.get_expansion_service())) self.assertTrue( 'No resolvable bootstrap urls given in bootstrap.servers' in str( @@ -338,46 +337,47 @@ def process(self, kv, state=beam.DoFn.StateParam(state_spec)): if options.view_as(StandardOptions).streaming: lines_expected.update([ # Gauges for the last finished bundle - 'stateful.beam.metric:statecache:capacity: 123', - 'stateful.beam.metric:statecache:size: 10', - 'stateful.beam.metric:statecache:get: 20', - 'stateful.beam.metric:statecache:miss: 0', - 'stateful.beam.metric:statecache:hit: 20', - 'stateful.beam.metric:statecache:put: 0', - 'stateful.beam.metric:statecache:evict: 0', + 'stateful.beam_metric:statecache:capacity: 123', + 'stateful.beam_metric:statecache:size: 10', + 'stateful.beam_metric:statecache:get: 20', + 'stateful.beam_metric:statecache:miss: 0', + 'stateful.beam_metric:statecache:hit: 20', + 'stateful.beam_metric:statecache:put: 0', + 'stateful.beam_metric:statecache:evict: 0', # Counters - 'stateful.beam.metric:statecache:get_total: 220', - 'stateful.beam.metric:statecache:miss_total: 10', - 'stateful.beam.metric:statecache:hit_total: 210', - 'stateful.beam.metric:statecache:put_total: 10', - 'stateful.beam.metric:statecache:evict_total: 0', + 'stateful.beam_metric:statecache:get_total: 220', + 'stateful.beam_metric:statecache:miss_total: 10', + 'stateful.beam_metric:statecache:hit_total: 210', + 'stateful.beam_metric:statecache:put_total: 10', + 'stateful.beam_metric:statecache:evict_total: 0', ]) else: # Batch has a different processing model. All values for # a key are processed at once. lines_expected.update([ # Gauges - 'stateful).beam.metric:statecache:capacity: 123', + 'stateful).beam_metric:statecache:capacity: 123', # For the first key, the cache token will not be set yet. 
# It's lazily initialized after first access in StateRequestHandlers - 'stateful).beam.metric:statecache:size: 10', + 'stateful).beam_metric:statecache:size: 10', # We have 11 here because there are 110 / 10 elements per key - 'stateful).beam.metric:statecache:get: 12', - 'stateful).beam.metric:statecache:miss: 1', - 'stateful).beam.metric:statecache:hit: 11', + 'stateful).beam_metric:statecache:get: 12', + 'stateful).beam_metric:statecache:miss: 1', + 'stateful).beam_metric:statecache:hit: 11', # State is flushed back once per key - 'stateful).beam.metric:statecache:put: 1', - 'stateful).beam.metric:statecache:evict: 0', + 'stateful).beam_metric:statecache:put: 1', + 'stateful).beam_metric:statecache:evict: 0', # Counters - 'stateful).beam.metric:statecache:get_total: 120', - 'stateful).beam.metric:statecache:miss_total: 10', - 'stateful).beam.metric:statecache:hit_total: 110', - 'stateful).beam.metric:statecache:put_total: 10', - 'stateful).beam.metric:statecache:evict_total: 0', + 'stateful).beam_metric:statecache:get_total: 120', + 'stateful).beam_metric:statecache:miss_total: 10', + 'stateful).beam_metric:statecache:hit_total: 110', + 'stateful).beam_metric:statecache:put_total: 10', + 'stateful).beam_metric:statecache:evict_total: 0', ]) lines_actual = set() with open(self.test_metrics_path, 'r') as f: for line in f: + print(line, end='') for metric_str in lines_expected: metric_name = metric_str.split()[0] if metric_str in line: @@ -395,6 +395,9 @@ def test_callbacks_with_exception(self): def test_register_finalizations(self): raise unittest.SkipTest("BEAM-11021") + def test_custom_merging_window(self): + raise unittest.SkipTest("BEAM-11004") + # Inherits all other tests. @@ -420,6 +423,12 @@ def test_expand_kafka_write(self): def test_sql(self): raise unittest.SkipTest("BEAM-7252") + def test_pack_combiners(self): + # Stages produced by translations.pack_combiners are fused + # by translations.greedily_fuse, which prevent the stages + # from being detecting using counters by the test. + self._test_pack_combiners(assert_using_counter_names=False) + class FlinkRunnerTestStreaming(FlinkRunnerTest): def __init__(self, *args, **kwargs): diff --git a/sdks/python/apache_beam/runners/portability/flink_uber_jar_job_server.py b/sdks/python/apache_beam/runners/portability/flink_uber_jar_job_server.py index b10a75f300fa..9b0d6ff4f571 100644 --- a/sdks/python/apache_beam/runners/portability/flink_uber_jar_job_server.py +++ b/sdks/python/apache_beam/runners/portability/flink_uber_jar_job_server.py @@ -19,9 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import logging import os import tempfile @@ -66,12 +63,16 @@ def executable_jar(self): if not os.path.exists(self._executable_jar): parsed = urllib.parse.urlparse(self._executable_jar) if not parsed.scheme: + try: + flink_version = self.flink_version() + except Exception: + flink_version = '$FLINK_VERSION' raise ValueError( 'Unable to parse jar URL "%s". If using a full URL, make sure ' 'the scheme is specified. If using a local file path, make sure ' 'the file exists; you may have to first build the job server ' 'using `./gradlew runners:flink:%s:job-server:shadowJar`.' 
% - (self._executable_jar, self._flink_version)) + (self._executable_jar, flink_version)) url = self._executable_jar else: url = job_server.JavaJarJobServer.path_to_beam_jar( diff --git a/sdks/python/apache_beam/runners/portability/flink_uber_jar_job_server_test.py b/sdks/python/apache_beam/runners/portability/flink_uber_jar_job_server_test.py index 591f2dbb7270..1294f4653b2a 100644 --- a/sdks/python/apache_beam/runners/portability/flink_uber_jar_job_server_test.py +++ b/sdks/python/apache_beam/runners/portability/flink_uber_jar_job_server_test.py @@ -16,13 +16,9 @@ # # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import contextlib import logging import os -import sys import tempfile import unittest import zipfile @@ -46,7 +42,6 @@ def temp_name(*args, **kwargs): os.unlink(name) -@unittest.skipIf(sys.version_info < (3, 6), "Requires Python 3.6+") class FlinkUberJarJobServerTest(unittest.TestCase): @requests_mock.mock() def test_flink_version(self, http_mock): @@ -197,6 +192,36 @@ def test_retain_unknown_options(self): self.assertEqual( options_proto['beam:option:unknown_option_foo:v1'], 'some_value') + @requests_mock.mock() + def test_bad_url_flink_version(self, http_mock): + http_mock.get('http://flink/v1/config', json={'flink-version': '1.2.3.4'}) + options = pipeline_options.FlinkRunnerOptions() + options.flink_job_server_jar = "bad url" + job_server = flink_uber_jar_job_server.FlinkUberJarJobServer( + 'http://flink', options) + with self.assertRaises(ValueError) as context: + job_server.executable_jar() + self.assertEqual( + 'Unable to parse jar URL "bad url". If using a full URL, make sure ' + 'the scheme is specified. If using a local file path, make sure ' + 'the file exists; you may have to first build the job server ' + 'using `./gradlew runners:flink:1.2:job-server:shadowJar`.', + str(context.exception)) + + def test_bad_url_placeholder_version(self): + options = pipeline_options.FlinkRunnerOptions() + options.flink_job_server_jar = "bad url" + job_server = flink_uber_jar_job_server.FlinkUberJarJobServer( + 'http://example.com/bad', options) + with self.assertRaises(ValueError) as context: + job_server.executable_jar() + self.assertEqual( + 'Unable to parse jar URL "bad url". If using a full URL, make sure ' + 'the scheme is specified. If using a local file path, make sure ' + 'the file exists; you may have to first build the job server ' + 'using `./gradlew runners:flink:$FLINK_VERSION:job-server:shadowJar`.', + str(context.exception)) + if __name__ == '__main__': logging.getLogger().setLevel(logging.INFO) diff --git a/sdks/python/apache_beam/runners/portability/fn_api_runner/__init__.py b/sdks/python/apache_beam/runners/portability/fn_api_runner/__init__.py index b1e560b6653a..3ef1bea3df57 100644 --- a/sdks/python/apache_beam/runners/portability/fn_api_runner/__init__.py +++ b/sdks/python/apache_beam/runners/portability/fn_api_runner/__init__.py @@ -15,5 +15,4 @@ # limitations under the License. 
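> Reviewer note: the two new tests above pin down the error message for unusable job-server jar paths. A hedged sketch of how the same failure surfaces outside the tests (option and class names as used in this diff):

```python
from apache_beam.options import pipeline_options
from apache_beam.runners.portability import flink_uber_jar_job_server

options = pipeline_options.FlinkRunnerOptions()
options.flink_job_server_jar = 'bad url'
server = flink_uber_jar_job_server.FlinkUberJarJobServer(
    'http://example.com/bad', options)

try:
  server.executable_jar()
except ValueError as e:
  # The message includes a gradle build hint; the Flink version falls back to
  # the $FLINK_VERSION placeholder when the cluster cannot be reached.
  print(e)
```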
# -from __future__ import absolute_import from apache_beam.runners.portability.fn_api_runner.fn_runner import FnApiRunner diff --git a/sdks/python/apache_beam/runners/portability/fn_api_runner/execution.py b/sdks/python/apache_beam/runners/portability/fn_api_runner/execution.py index bc691235f6b2..4f35cc0f2109 100644 --- a/sdks/python/apache_beam/runners/portability/fn_api_runner/execution.py +++ b/sdks/python/apache_beam/runners/portability/fn_api_runner/execution.py @@ -19,11 +19,11 @@ # mypy: disallow-untyped-defs -from __future__ import absolute_import - import collections import copy import itertools +import uuid +import weakref from typing import TYPE_CHECKING from typing import Any from typing import DefaultDict @@ -54,7 +54,9 @@ from apache_beam.runners.portability.fn_api_runner.translations import only_element from apache_beam.runners.portability.fn_api_runner.translations import split_buffer_id from apache_beam.runners.portability.fn_api_runner.translations import unique_name +from apache_beam.runners.portability.fn_api_runner.watermark_manager import WatermarkManager from apache_beam.runners.worker import bundle_processor +from apache_beam.transforms import core from apache_beam.transforms import trigger from apache_beam.transforms import window from apache_beam.transforms.window import GlobalWindow @@ -69,7 +71,7 @@ from apache_beam.runners.portability.fn_api_runner.fn_runner import DataOutput from apache_beam.runners.portability.fn_api_runner.fn_runner import OutputTimers from apache_beam.runners.portability.fn_api_runner.translations import DataSideInput - from apache_beam.transforms import core + from apache_beam.runners.portability.fn_api_runner.translations import TimerFamilyId from apache_beam.transforms.window import BoundedWindow ENCODED_IMPULSE_VALUE = WindowedValueCoder( @@ -338,6 +340,222 @@ def from_runner_api_parameter(window_coder_id, context): context.coders[window_coder_id.decode('utf-8')]) +class GenericMergingWindowFn(window.WindowFn): + + URN = 'internal-generic-merging' + + TO_SDK_TRANSFORM = 'read' + FROM_SDK_TRANSFORM = 'write' + + _HANDLES = {} # type: Dict[str, GenericMergingWindowFn] + + def __init__(self, execution_context, windowing_strategy_proto): + # type: (FnApiRunnerExecutionContext, beam_runner_api_pb2.WindowingStrategy) -> None + self._worker_handler = None # type: Optional[worker_handlers.WorkerHandler] + self._handle_id = handle_id = uuid.uuid4().hex + self._HANDLES[handle_id] = self + # ExecutionContexts are expensive, we don't want to keep them in the + # static dictionary forever. Instead we hold a weakref and pop self + # out of the dict once this context goes away. 
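> Reviewer note: the constructor here uses a class-level registry keyed by a generated handle id plus a weak reference whose callback removes the entry; that is what lets the window fn be addressed by a small string payload without pinning the expensive execution context in memory. A standalone sketch of the idiom (illustrative names only):

```python
import uuid
import weakref


class HandleRegistered:
  _HANDLES = {}  # class-level registry keyed by handle id

  def __init__(self, heavy_context):
    self._handle_id = handle_id = uuid.uuid4().hex
    self._HANDLES[handle_id] = self
    # Hold the expensive object only weakly; once it is garbage collected,
    # drop this instance from the registry as well.
    self._context_ref = weakref.ref(
        heavy_context, lambda _: self._HANDLES.pop(handle_id, None))

  @classmethod
  def lookup(cls, handle_id):
    return cls._HANDLES[handle_id]
```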
+ self._execution_context_ref_obj = weakref.ref( + execution_context, lambda _: self._HANDLES.pop(handle_id, None)) + self._windowing_strategy_proto = windowing_strategy_proto + self._counter = 0 + # Lazily created in make_process_bundle_descriptor() + self._process_bundle_descriptor = None + self._bundle_processor_id = None # type: Optional[str] + self.windowed_input_coder_impl = None # type: Optional[CoderImpl] + self.windowed_output_coder_impl = None # type: Optional[CoderImpl] + + def _execution_context_ref(self): + # type: () -> FnApiRunnerExecutionContext + result = self._execution_context_ref_obj() + assert result is not None + return result + + def payload(self): + # type: () -> bytes + return self._handle_id.encode('utf-8') + + @staticmethod + @window.urns.RunnerApiFn.register_urn(URN, bytes) + def from_runner_api_parameter(handle_id, unused_context): + # type: (bytes, Any) -> GenericMergingWindowFn + return GenericMergingWindowFn._HANDLES[handle_id.decode('utf-8')] + + def assign(self, assign_context): + # type: (window.WindowFn.AssignContext) -> Iterable[window.BoundedWindow] + raise NotImplementedError() + + def merge(self, merge_context): + # type: (window.WindowFn.MergeContext) -> None + worker_handler = self.worker_handle() + + assert self.windowed_input_coder_impl is not None + assert self.windowed_output_coder_impl is not None + process_bundle_id = self.uid('process') + to_worker = worker_handler.data_conn.output_stream( + process_bundle_id, self.TO_SDK_TRANSFORM) + to_worker.write( + self.windowed_input_coder_impl.encode_nested( + window.GlobalWindows.windowed_value((b'', merge_context.windows)))) + to_worker.close() + + process_bundle_req = beam_fn_api_pb2.InstructionRequest( + instruction_id=process_bundle_id, + process_bundle=beam_fn_api_pb2.ProcessBundleRequest( + process_bundle_descriptor_id=self._bundle_processor_id)) + result_future = worker_handler.control_conn.push(process_bundle_req) + for output in worker_handler.data_conn.input_elements( + process_bundle_id, [self.FROM_SDK_TRANSFORM], + abort_callback=lambda: bool(result_future.is_done() and result_future. + get().error)): + if isinstance(output, beam_fn_api_pb2.Elements.Data): + windowed_result = self.windowed_output_coder_impl.decode_nested( + output.data) + for merge_result, originals in windowed_result.value[1][1]: + merge_context.merge(originals, merge_result) + else: + raise RuntimeError("Unexpected data: %s" % output) + + result = result_future.get() + if result.error: + raise RuntimeError(result.error) + # The result was "returned" via the merge callbacks on merge_context above. 
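> Reviewer note: `merge()` above delegates the actual window merging to an SDK worker and reports the results through `merge_context.merge(to_be_merged, merge_result)`. For contrast, an ordinary in-process `WindowFn` implements the same callback directly; a simplified, sessions-style sketch (illustrative only, not part of this change):

```python
from apache_beam.coders.coders import IntervalWindowCoder
from apache_beam.transforms import window


class MergeAdjacent(window.WindowFn):
  """Illustrative only: merge overlapping IntervalWindows into one."""
  def assign(self, context):
    return [window.IntervalWindow(context.timestamp, context.timestamp + 10)]

  def get_window_coder(self):
    return IntervalWindowCoder()

  def merge(self, merge_context):
    group, end = [], None
    for w in sorted(merge_context.windows, key=lambda iw: iw.start):
      if group and w.start <= end:
        group.append(w)
        end = max(end, w.end)
        continue
      if len(group) > 1:
        merge_context.merge(group, window.IntervalWindow(group[0].start, end))
      group, end = [w], w.end
    if len(group) > 1:
      merge_context.merge(group, window.IntervalWindow(group[0].start, end))
```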
+ + def get_window_coder(self): + # type: () -> coders.Coder + return self._execution_context_ref().pipeline_context.coders[ + self._windowing_strategy_proto.window_coder_id] + + def worker_handle(self): + # type: () -> worker_handlers.WorkerHandler + if self._worker_handler is None: + worker_handler_manager = self._execution_context_ref( + ).worker_handler_manager + self._worker_handler = worker_handler_manager.get_worker_handlers( + self._windowing_strategy_proto.environment_id, 1)[0] + process_bundle_decriptor = self.make_process_bundle_descriptor( + self._worker_handler.data_api_service_descriptor(), + self._worker_handler.state_api_service_descriptor()) + worker_handler_manager.register_process_bundle_descriptor( + process_bundle_decriptor) + return self._worker_handler + + def make_process_bundle_descriptor( + self, data_api_service_descriptor, state_api_service_descriptor): + # type: (Optional[endpoints_pb2.ApiServiceDescriptor], Optional[endpoints_pb2.ApiServiceDescriptor]) -> beam_fn_api_pb2.ProcessBundleDescriptor + + """Creates a ProcessBundleDescriptor for invoking the WindowFn's + merge operation. + """ + def make_channel_payload(coder_id): + # type: (str) -> bytes + data_spec = beam_fn_api_pb2.RemoteGrpcPort(coder_id=coder_id) + if data_api_service_descriptor: + data_spec.api_service_descriptor.url = (data_api_service_descriptor.url) + return data_spec.SerializeToString() + + pipeline_context = self._execution_context_ref().pipeline_context + global_windowing_strategy_id = self.uid('global_windowing_strategy') + global_windowing_strategy_proto = core.Windowing( + window.GlobalWindows()).to_runner_api(pipeline_context) + coders = dict(pipeline_context.coders.get_id_to_proto_map()) + + def make_coder(urn, *components): + # type: (str, str) -> str + coder_proto = beam_runner_api_pb2.Coder( + spec=beam_runner_api_pb2.FunctionSpec(urn=urn), + component_coder_ids=components) + coder_id = self.uid('coder') + coders[coder_id] = coder_proto + pipeline_context.coders.put_proto(coder_id, coder_proto) + return coder_id + + bytes_coder_id = make_coder(common_urns.coders.BYTES.urn) + window_coder_id = self._windowing_strategy_proto.window_coder_id + global_window_coder_id = make_coder(common_urns.coders.GLOBAL_WINDOW.urn) + iter_window_coder_id = make_coder( + common_urns.coders.ITERABLE.urn, window_coder_id) + input_coder_id = make_coder( + common_urns.coders.KV.urn, bytes_coder_id, iter_window_coder_id) + output_coder_id = make_coder( + common_urns.coders.KV.urn, + bytes_coder_id, + make_coder( + common_urns.coders.KV.urn, + iter_window_coder_id, + make_coder( + common_urns.coders.ITERABLE.urn, + make_coder( + common_urns.coders.KV.urn, + window_coder_id, + iter_window_coder_id)))) + windowed_input_coder_id = make_coder( + common_urns.coders.WINDOWED_VALUE.urn, + input_coder_id, + global_window_coder_id) + windowed_output_coder_id = make_coder( + common_urns.coders.WINDOWED_VALUE.urn, + output_coder_id, + global_window_coder_id) + + self.windowed_input_coder_impl = pipeline_context.coders[ + windowed_input_coder_id].get_impl() + self.windowed_output_coder_impl = pipeline_context.coders[ + windowed_output_coder_id].get_impl() + + self._bundle_processor_id = self.uid('merge_windows') + return beam_fn_api_pb2.ProcessBundleDescriptor( + id=self._bundle_processor_id, + transforms={ + self.TO_SDK_TRANSFORM: beam_runner_api_pb2.PTransform( + unique_name='MergeWindows/Read', + spec=beam_runner_api_pb2.FunctionSpec( + urn=bundle_processor.DATA_INPUT_URN, + 
payload=make_channel_payload(windowed_input_coder_id)), + outputs={'input': 'input'}), + 'Merge': beam_runner_api_pb2.PTransform( + unique_name='MergeWindows/Merge', + spec=beam_runner_api_pb2.FunctionSpec( + urn=common_urns.primitives.MERGE_WINDOWS.urn, + payload=self._windowing_strategy_proto.window_fn. + SerializeToString()), + inputs={'input': 'input'}, + outputs={'output': 'output'}), + self.FROM_SDK_TRANSFORM: beam_runner_api_pb2.PTransform( + unique_name='MergeWindows/Write', + spec=beam_runner_api_pb2.FunctionSpec( + urn=bundle_processor.DATA_OUTPUT_URN, + payload=make_channel_payload(windowed_output_coder_id)), + inputs={'output': 'output'}), + }, + pcollections={ + 'input': beam_runner_api_pb2.PCollection( + unique_name='input', + windowing_strategy_id=global_windowing_strategy_id, + coder_id=input_coder_id), + 'output': beam_runner_api_pb2.PCollection( + unique_name='output', + windowing_strategy_id=global_windowing_strategy_id, + coder_id=output_coder_id), + }, + coders=coders, + windowing_strategies={ + global_windowing_strategy_id: global_windowing_strategy_proto, + }, + environments=dict( + self._execution_context_ref().pipeline_components.environments. + items()), + state_api_service_descriptor=state_api_service_descriptor, + timer_api_service_descriptor=data_api_service_descriptor) + + def uid(self, name=''): + # type: (str) -> str + self._counter += 1 + return '%s_%s_%s' % (self._handle_id, name, self._counter) + + class FnApiRunnerExecutionContext(object): """ :var pcoll_buffers: (dict): Mapping of @@ -345,20 +563,19 @@ class FnApiRunnerExecutionContext(object): ``beam.PCollection``. """ def __init__(self, - stages, # type: List[translations.Stage] - worker_handler_manager, # type: worker_handlers.WorkerHandlerManager - pipeline_components, # type: beam_runner_api_pb2.Components - safe_coders, # type: Dict[str, str] - data_channel_coders, # type: Dict[str, str] - ): - # type: (...) -> None - + stages, # type: List[translations.Stage] + worker_handler_manager, # type: worker_handlers.WorkerHandlerManager + pipeline_components, # type: beam_runner_api_pb2.Components + safe_coders: translations.SafeCoderMapping, + data_channel_coders: Dict[str, str], + ) -> None: """ :param worker_handler_manager: This class manages the set of worker handlers, and the communication with state / control APIs. - :param pipeline_components: (beam_runner_api_pb2.Components): TODO - :param safe_coders: - :param data_channel_coders: + :param pipeline_components: (beam_runner_api_pb2.Components) + :param safe_coders: A map from Coder ID to Safe Coder ID. + :param data_channel_coders: A map from PCollection ID to the ID of the Coder + for that PCollection. 
""" self.stages = stages self.side_input_descriptors_by_stage = ( @@ -370,6 +587,12 @@ def __init__(self, self.safe_coders = safe_coders self.data_channel_coders = data_channel_coders + self.input_transform_to_buffer_id = { + t.unique_name: t.spec.payload + for s in stages for t in s.transforms + if t.spec.urn == bundle_processor.DATA_INPUT_URN + } + self.watermark_manager = WatermarkManager(stages) self.pipeline_context = pipeline_context.PipelineContext( self.pipeline_components, iterable_state_write=self._iterable_state_write) @@ -443,23 +666,27 @@ def _make_safe_windowing_strategy(self, id): windowing_strategy_proto = self.pipeline_components.windowing_strategies[id] if windowing_strategy_proto.window_fn.urn in SAFE_WINDOW_FNS: return id - elif (windowing_strategy_proto.merge_status == - beam_runner_api_pb2.MergeStatus.NON_MERGING) or True: + else: safe_id = id + '_safe' while safe_id in self.pipeline_components.windowing_strategies: safe_id += '_' safe_proto = copy.copy(windowing_strategy_proto) - safe_proto.window_fn.urn = GenericNonMergingWindowFn.URN - safe_proto.window_fn.payload = ( - windowing_strategy_proto.window_coder_id.encode('utf-8')) + if (windowing_strategy_proto.merge_status == + beam_runner_api_pb2.MergeStatus.NON_MERGING): + safe_proto.window_fn.urn = GenericNonMergingWindowFn.URN + safe_proto.window_fn.payload = ( + windowing_strategy_proto.window_coder_id.encode('utf-8')) + elif (windowing_strategy_proto.merge_status == + beam_runner_api_pb2.MergeStatus.NEEDS_MERGE): + window_fn = GenericMergingWindowFn(self, windowing_strategy_proto) + safe_proto.window_fn.urn = GenericMergingWindowFn.URN + safe_proto.window_fn.payload = window_fn.payload() + else: + raise NotImplementedError( + 'Unsupported merging strategy: %s' % + windowing_strategy_proto.merge_status) self.pipeline_context.windowing_strategies.put_proto(safe_id, safe_proto) return safe_id - elif windowing_strategy_proto.window_fn.urn == python_urns.PICKLED_WINDOWFN: - return id - else: - raise NotImplementedError( - '[BEAM-10119] Unknown merging WindowFn: %s' % - windowing_strategy_proto) @property def state_servicer(self): @@ -594,7 +821,7 @@ def _build_process_bundle_descriptor(self): timer_api_service_descriptor=self.data_api_service_descriptor()) def extract_bundle_inputs_and_outputs(self): - # type: () -> Tuple[Dict[str, PartitionableBuffer], DataOutput, Dict[Tuple[str, str], bytes]] + # type: () -> Tuple[Dict[str, PartitionableBuffer], DataOutput, Dict[TimerFamilyId, bytes]] """Returns maps of transform names to PCollection identifiers. @@ -725,7 +952,7 @@ def get_buffer(self, buffer_id, transform_id): self.execution_context.pipeline_context.windowing_strategies[ self.execution_context.safe_windowing_strategies[ self.execution_context.pipeline_components. 
- pcollections[output_pcoll].windowing_strategy_id]]) + pcollections[input_pcoll].windowing_strategy_id]]) self.execution_context.pcoll_buffers[buffer_id] = GroupingBuffer( pre_gbk_coder, post_gbk_coder, windowing_strategy) else: @@ -734,8 +961,8 @@ def get_buffer(self, buffer_id, transform_id): raise NotImplementedError(buffer_id) return self.execution_context.pcoll_buffers[buffer_id] - def input_for(self, transform_id, input_id): - # type: (str, str) -> str + def input_for(self, transform_id: str, input_id: str) -> str: + """Returns the name of the transform producing the given PCollection.""" input_pcoll = self.process_bundle_descriptor.transforms[ transform_id].inputs[input_id] for read_id, proto in self.process_bundle_descriptor.transforms.items(): diff --git a/sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py b/sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py index 404261f6b979..be8fe60bd2c1 100644 --- a/sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py +++ b/sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py @@ -20,9 +20,6 @@ # pytype: skip-file # mypy: check-untyped-defs -from __future__ import absolute_import -from __future__ import print_function - import contextlib import copy import itertools @@ -33,15 +30,16 @@ import sys import threading import time -from builtins import object from typing import TYPE_CHECKING from typing import Callable from typing import Dict +from typing import Iterable from typing import Iterator from typing import List from typing import Mapping from typing import MutableMapping from typing import Optional +from typing import Set from typing import Tuple from typing import TypeVar from typing import Union @@ -57,6 +55,7 @@ from apache_beam.portability.api import beam_provision_api_pb2 from apache_beam.portability.api import beam_runner_api_pb2 from apache_beam.runners import runner +from apache_beam.runners.common import group_by_key_input_visitor from apache_beam.runners.portability import portable_metrics from apache_beam.runners.portability.fn_api_runner import execution from apache_beam.runners.portability.fn_api_runner import translations @@ -64,9 +63,11 @@ from apache_beam.runners.portability.fn_api_runner.translations import create_buffer_id from apache_beam.runners.portability.fn_api_runner.translations import only_element from apache_beam.runners.portability.fn_api_runner.worker_handlers import WorkerHandlerManager +from apache_beam.runners.worker import bundle_processor from apache_beam.transforms import environments from apache_beam.utils import proto_utils from apache_beam.utils import thread_pool_executor +from apache_beam.utils import timestamp from apache_beam.utils.profiler import Profile if TYPE_CHECKING: @@ -116,7 +117,7 @@ def __init__( """ super(FnApiRunner, self).__init__() self._default_environment = ( - default_environment or environments.EmbeddedPythonEnvironment()) + default_environment or environments.EmbeddedPythonEnvironment.default()) self._bundle_repeat = bundle_repeat self._num_workers = 1 self._progress_frequency = progress_request_frequency @@ -153,9 +154,10 @@ def run_pipeline(self, # This is sometimes needed if type checking is disabled # to enforce that the inputs (and outputs) of GroupByKey operations # are known to be KVs. - from apache_beam.runners.dataflow.dataflow_runner import DataflowRunner - # TODO: Move group_by_key_input_visitor() to a non-dataflow specific file. 
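> Reviewer note: the GroupByKey input visitor is now taken from `apache_beam.runners.common` and is parameterized by the type-checking escape hatch used just below. A hedged sketch of how a user would opt out of the enforced deterministic key coders (flag name taken from this diff; the wrapping behaviour follows from the `coder_id_from_element_type` change earlier in this patch):

```python
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import TypeOptions

options = PipelineOptions()
# Keys that do not encode deterministically would otherwise be wrapped in a
# deterministic key coder before a GroupByKey; this flag, consulted by the
# runner below, skips that conversion.
options.view_as(TypeOptions).allow_non_deterministic_key_coders = True
```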
- pipeline.visit(DataflowRunner.group_by_key_input_visitor()) + pipeline.visit( + group_by_key_input_visitor( + not options.view_as(pipeline_options.TypeOptions). + allow_non_deterministic_key_coders)) self._bundle_repeat = self._bundle_repeat or options.view_as( pipeline_options.DirectOptions).direct_runner_bundle_repeat pipeline_direct_num_workers = options.view_as( @@ -169,12 +171,23 @@ def run_pipeline(self, running_mode = \ options.view_as(pipeline_options.DirectOptions).direct_running_mode if running_mode == 'multi_threading': - self._default_environment = environments.EmbeddedPythonGrpcEnvironment() + self._default_environment = ( + environments.EmbeddedPythonGrpcEnvironment.default()) elif running_mode == 'multi_processing': command_string = '%s -m apache_beam.runners.worker.sdk_worker_main' \ % sys.executable - self._default_environment = environments.SubprocessSDKEnvironment( - command_string=command_string) + self._default_environment = ( + environments.SubprocessSDKEnvironment.from_command_string( + command_string=command_string)) + + if running_mode == 'in_memory' and self._num_workers != 1: + _LOGGER.warning( + 'If direct_num_workers is not equal to 1, direct_running_mode ' + 'should be `multi_processing` or `multi_threading` instead of ' + '`in_memory` in order for it to have the desired worker parallelism ' + 'effect. direct_num_workers: %d ; running_mode: %s', + self._num_workers, + running_mode) self._profiler_factory = Profile.factory_from_options( options.view_as(pipeline_options.ProfilingOptions)) @@ -306,7 +319,6 @@ def create_stages( phases=[ translations.annotate_downstream_side_inputs, translations.fix_side_input_pcoll_coders, - translations.eliminate_common_key_with_none, translations.pack_combiners, translations.lift_combiners, translations.expand_sdf, @@ -355,10 +367,36 @@ def run_stages(self, bundle_context_manager = execution.BundleContextManager( runner_execution_context, stage, self._num_workers) + assert ( + runner_execution_context.watermark_manager.get_stage_node( + bundle_context_manager.stage.name + ).input_watermark() == timestamp.MAX_TIMESTAMP), ( + 'wrong watermark for %s. Expected %s, but got %s.' % ( + runner_execution_context.watermark_manager.get_stage_node( + bundle_context_manager.stage.name), + timestamp.MAX_TIMESTAMP, + runner_execution_context.watermark_manager.get_stage_node( + bundle_context_manager.stage.name + ).input_watermark() + ) + ) + stage_results = self._run_stage( - runner_execution_context, - bundle_context_manager, + runner_execution_context, bundle_context_manager) + + assert ( + runner_execution_context.watermark_manager.get_stage_node( + bundle_context_manager.stage.name + ).input_watermark() == timestamp.MAX_TIMESTAMP), ( + 'wrong input watermark for %s. Expected %s, but got %s.' % ( + runner_execution_context.watermark_manager.get_stage_node( + bundle_context_manager.stage.name), + timestamp.MAX_TIMESTAMP, + runner_execution_context.watermark_manager.get_stage_node( + bundle_context_manager.stage.name + ).output_watermark()) ) + monitoring_infos_by_stage[stage.name] = ( stage_results.process_bundle.monitoring_infos) finally: @@ -373,9 +411,7 @@ def _run_bundle_multiple_times_for_testing( data_output, # type: DataOutput fired_timers, # type: Mapping[Tuple[str, str], execution.PartitionableBuffer] expected_output_timers, # type: Dict[Tuple[str, str], bytes] - ): - # type: (...) -> None - + ) -> None: """ If bundle_repeat > 0, replay every bundle for profiling and debugging. 
""" @@ -392,16 +428,30 @@ def _run_bundle_multiple_times_for_testing( finally: runner_execution_context.state_servicer.restore() - def _collect_written_timers_and_add_to_fired_timers( - self, - bundle_context_manager, # type: execution.BundleContextManager - fired_timers # type: Dict[Tuple[str, str], ListBuffer] - ): - # type: (...) -> None - + @staticmethod + def _collect_written_timers( + bundle_context_manager: execution.BundleContextManager, + ) -> Tuple[Dict[translations.TimerFamilyId, timestamp.Timestamp], + Dict[translations.TimerFamilyId, ListBuffer]]: + """Review output buffers, and collect written timers. + + This function reviews a stage that has just been run. The stage will have + written timers to its output buffers. The function then takes the timers, + and adds them to the `newly_set_timers` dictionary, and the + timer_watermark_data dictionary. + + The function then returns the following two elements in a tuple: + - timer_watermark_data: A dictionary mapping timer family to upcoming + timestamp to fire. + - newly_set_timers: A dictionary mapping timer family to timer buffers + to be passed to the SDK upon firing. + """ + timer_watermark_data = {} + newly_set_timers = {} for (transform_id, timer_family_id) in bundle_context_manager.stage.timers: written_timers = bundle_context_manager.get_buffer( create_buffer_id(timer_family_id, kind='timers'), transform_id) + assert isinstance(written_timers, ListBuffer) timer_coder_impl = bundle_context_manager.get_timer_coder_impl( transform_id, timer_family_id) if not written_timers.cleared: @@ -415,26 +465,50 @@ def _collect_written_timers_and_add_to_fired_timers( # Only add not cleared timer to fired timers. if not decoded_timer.clear_bit: timer_coder_impl.encode_to_stream(decoded_timer, out, True) - fired_timers[(transform_id, timer_family_id)] = ListBuffer( + if (transform_id, timer_family_id) not in timer_watermark_data: + timer_watermark_data[(transform_id, + timer_family_id)] = timestamp.MAX_TIMESTAMP + timer_watermark_data[(transform_id, timer_family_id)] = min( + timer_watermark_data[(transform_id, timer_family_id)], + decoded_timer.hold_timestamp) + newly_set_timers[(transform_id, timer_family_id)] = ListBuffer( coder_impl=timer_coder_impl) - fired_timers[(transform_id, timer_family_id)].append(out.get()) + newly_set_timers[(transform_id, timer_family_id)].append(out.get()) written_timers.clear() + return timer_watermark_data, newly_set_timers + def _add_sdk_delayed_applications_to_deferred_inputs( self, bundle_context_manager, # type: execution.BundleContextManager bundle_result, # type: beam_fn_api_pb2.InstructionResponse deferred_inputs # type: MutableMapping[str, execution.PartitionableBuffer] ): - # type: (...) -> None + # type: (...) -> Set[str] + + """Returns a set of PCollection IDs of PColls having delayed applications. + + This transform inspects the bundle_context_manager, and bundle_result + objects, and adds all deferred inputs to the deferred_inputs object. 
+ """ + pcolls_with_delayed_apps = set() for delayed_application in bundle_result.process_bundle.residual_roots: - name = bundle_context_manager.input_for( + producer_name = bundle_context_manager.input_for( delayed_application.application.transform_id, delayed_application.application.input_id) - if name not in deferred_inputs: - deferred_inputs[name] = ListBuffer( - coder_impl=bundle_context_manager.get_input_coder_impl(name)) - deferred_inputs[name].append(delayed_application.application.element) + if producer_name not in deferred_inputs: + deferred_inputs[producer_name] = ListBuffer( + coder_impl=bundle_context_manager.get_input_coder_impl( + producer_name)) + deferred_inputs[producer_name].append( + delayed_application.application.element) + + transform = bundle_context_manager.process_bundle_descriptor.transforms[ + producer_name] + # We take the output with tag 'out' from the producer transform. The + # producer transform is a GRPC read, and it has a single output. + pcolls_with_delayed_apps.add(only_element(transform.outputs.values())) + return pcolls_with_delayed_apps def _add_residuals_and_channel_splits_to_deferred_inputs( self, @@ -443,22 +517,37 @@ def _add_residuals_and_channel_splits_to_deferred_inputs( last_sent, # type: Dict[str, execution.PartitionableBuffer] deferred_inputs # type: MutableMapping[str, execution.PartitionableBuffer] ): - # type: (...) -> None + # type: (...) -> Tuple[Set[str], Set[str]] + + """Returns a two sets representing PCollections with watermark holds. + + The first set represents PCollections with delayed root applications. + The second set represents PTransforms with channel splits. + """ + pcolls_with_delayed_apps = set() + transforms_with_channel_splits = set() prev_stops = {} # type: Dict[str, int] for split in splits: for delayed_application in split.residual_roots: - name = bundle_context_manager.input_for( + producer_name = bundle_context_manager.input_for( delayed_application.application.transform_id, delayed_application.application.input_id) - if name not in deferred_inputs: - deferred_inputs[name] = ListBuffer( - coder_impl=bundle_context_manager.get_input_coder_impl(name)) - deferred_inputs[name].append(delayed_application.application.element) + if producer_name not in deferred_inputs: + deferred_inputs[producer_name] = ListBuffer( + coder_impl=bundle_context_manager.get_input_coder_impl( + producer_name)) + deferred_inputs[producer_name].append( + delayed_application.application.element) + # We take the output with tag 'out' from the producer transform. The + # producer transform is a GRPC read, and it has a single output. + pcolls_with_delayed_apps.add( + bundle_context_manager.process_bundle_descriptor. + transforms[producer_name].outputs['out']) for channel_split in split.channel_splits: coder_impl = bundle_context_manager.get_input_coder_impl( channel_split.transform_id) - # TODO(SDF): This requires determanistic ordering of buffer iteration. + # TODO(SDF): This requires deterministic ordering of buffer iteration. # TODO(SDF): The return split is in terms of indices. Ideally, # a runner could map these back to actual positions to effectively # describe the two "halves" of the now-split range. Even if we have @@ -478,6 +567,12 @@ def _add_residuals_and_channel_splits_to_deferred_inputs( channel_split.first_residual_element:prev_stops. 
get(channel_split.transform_id, len(all_elements)) + 1] if residual_elements: + transform = ( + bundle_context_manager.process_bundle_descriptor.transforms[ + channel_split.transform_id]) + assert transform.spec.urn == bundle_processor.DATA_INPUT_URN + transforms_with_channel_splits.add(transform.unique_name) + if channel_split.transform_id not in deferred_inputs: coder_impl = bundle_context_manager.get_input_coder_impl( channel_split.transform_id) @@ -487,6 +582,7 @@ def _add_residuals_and_channel_splits_to_deferred_inputs( coder_impl.encode_all(residual_elements)) prev_stops[ channel_split.transform_id] = channel_split.last_primary_element + return pcolls_with_delayed_apps, transforms_with_channel_splits def _run_stage(self, runner_execution_context, # type: execution.FnApiRunnerExecutionContext @@ -546,19 +642,30 @@ def merge_results(last_result): error=final_result.error or last_result.error)) while True: - last_result, deferred_inputs, fired_timers = self._run_bundle( + last_result, deferred_inputs, fired_timers, watermark_updates = ( + self._run_bundle( runner_execution_context, bundle_context_manager, data_input, data_output, input_timers, expected_timer_output, - bundle_manager) + bundle_manager)) + + for pc_name, watermark in watermark_updates.items(): + runner_execution_context.watermark_manager.set_pcoll_watermark( + pc_name, watermark) final_result = merge_results(last_result) if not deferred_inputs and not fired_timers: break else: + assert (runner_execution_context.watermark_manager.get_stage_node( + bundle_context_manager.stage.name).output_watermark() + < timestamp.MAX_TIMESTAMP), ( + 'wrong timestamp for %s. ' + % runner_execution_context.watermark_manager.get_stage_node( + bundle_context_manager.stage.name)) data_input = deferred_inputs input_timers = fired_timers @@ -571,6 +678,72 @@ def merge_results(last_result): return final_result + @staticmethod + def _build_watermark_updates( + runner_execution_context, # type: execution.FnApiRunnerExecutionContext + stage_inputs, # type: Iterable[str] + expected_timers, # type: Iterable[translations.TimerFamilyId] + pcolls_with_da, # type: Set[str] + transforms_w_splits, # type: Set[str] + watermarks_by_transform_and_timer_family # type: Dict[translations.TimerFamilyId, timestamp.Timestamp] + ) -> Dict[Union[str, translations.TimerFamilyId], timestamp.Timestamp]: + """Builds a dictionary of PCollection (or TimerFamilyId) to timestamp. + + Args: + stage_inputs: represent the set of expected input PCollections for a + stage. These do not include timers. + expected_timers: represent the set of TimerFamilyIds that the stage can + expect to receive as inputs. + pcolls_with_da: represent the set of stage input PCollections that had + delayed applications. + transforms_w_splits: represent the set of transforms in the stage that had + input splits. + watermarks_by_transform_and_timer_family: represent the set of watermark + holds to be added for each timer family. + """ + updates = { + } # type: Dict[Union[str, translations.TimerFamilyId], timestamp.Timestamp] + + def get_pcoll_id(transform_id): + buffer_id = runner_execution_context.input_transform_to_buffer_id[ + transform_id] + # For IMPULSE-reading transforms, we use the transform name as buffer id. + if buffer_id == translations.IMPULSE_BUFFER: + pcollection_id = transform_id + else: + _, pcollection_id = translations.split_buffer_id(buffer_id) + return pcollection_id + + # Any PCollections that have deferred applications should have their + # watermark held back. 
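> Reviewer note: the docstring above states the watermark policy in prose; the dictionary it produces is shaped roughly like this (a toy illustration with made-up names, not real runner state):

```python
from apache_beam.utils import timestamp

updates = {
    # an input PCollection that still has deferred/residual work is held back
    'pcoll_with_residuals': timestamp.MIN_TIMESTAMP,
    # a timer family that set a firing time is held at that time
    ('ParDo(StatefulDoFn)', 'ts-timer'): timestamp.Timestamp(42),
    # everything else is released
    'fully_processed_pcoll': timestamp.MAX_TIMESTAMP,
}
```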
+ for pcoll in pcolls_with_da: + updates[pcoll] = timestamp.MIN_TIMESTAMP + + # Also any transforms with splits should have their input PCollection's + # watermark held back. + for tr in transforms_w_splits: + pcoll_id = get_pcoll_id(tr) + updates[pcoll_id] = timestamp.MIN_TIMESTAMP + + # For all expected stage timers, we have two possible outcomes: + # 1) If the stage set a firing time for the timer, then we hold the + # watermark at that time + # 2) If the stage did not set a firing time for the timer, then we + # advance the watermark for that timer to MAX_TIMESTAMP. + for timer_pcoll_id in expected_timers: + updates[timer_pcoll_id] = watermarks_by_transform_and_timer_family.get( + timer_pcoll_id, timestamp.MAX_TIMESTAMP) + + # For any PCollection in the set of stage inputs, if its watermark was not + # held back (i.e. there weren't splits in its consumer PTransform, and there + # weren't delayed applications of the PCollection's elements), then the + # watermark should be advanced to MAX_TIMESTAMP. + for transform_id in stage_inputs: + pcoll_id = get_pcoll_id(transform_id) + if pcoll_id not in updates: + updates[pcoll_id] = timestamp.MAX_TIMESTAMP + return updates + def _run_bundle( self, runner_execution_context, # type: execution.FnApiRunnerExecutionContext @@ -578,11 +751,12 @@ def _run_bundle( data_input, # type: Dict[str, execution.PartitionableBuffer] data_output, # type: DataOutput input_timers, # type: Mapping[Tuple[str, str], execution.PartitionableBuffer] - expected_timer_output, # type: Dict[Tuple[str, str], bytes] + expected_timer_output, # type: Dict[translations.TimerFamilyId, bytes] bundle_manager # type: BundleManager - ): - # type: (...) -> Tuple[beam_fn_api_pb2.InstructionResponse, Dict[str, execution.PartitionableBuffer], Dict[Tuple[str, str], ListBuffer]] - + ) -> Tuple[beam_fn_api_pb2.InstructionResponse, + Dict[str, execution.PartitionableBuffer], + Dict[translations.TimerFamilyId, ListBuffer], + Dict[Union[str, translations.TimerFamilyId], timestamp.Timestamp]]: """Execute a bundle, and return a result object, and deferred inputs.""" self._run_bundle_multiple_times_for_testing( runner_execution_context, @@ -600,20 +774,28 @@ def _run_bundle( # - SDK-initiated deferred applications of root elements # - Runner-initiated deferred applications of root elements deferred_inputs = {} # type: Dict[str, execution.PartitionableBuffer] - fired_timers = {} # type: Dict[Tuple[str, str], ListBuffer] - self._collect_written_timers_and_add_to_fired_timers( - bundle_context_manager, fired_timers) + watermarks_by_transform_and_timer_family, newly_set_timers = ( + self._collect_written_timers(bundle_context_manager)) - self._add_sdk_delayed_applications_to_deferred_inputs( + sdk_pcolls_with_da = self._add_sdk_delayed_applications_to_deferred_inputs( bundle_context_manager, result, deferred_inputs) - self._add_residuals_and_channel_splits_to_deferred_inputs( - splits, bundle_context_manager, data_input, deferred_inputs) + runner_pcolls_with_da, transforms_with_channel_splits = ( + self._add_residuals_and_channel_splits_to_deferred_inputs( + splits, bundle_context_manager, data_input, deferred_inputs)) + + watermark_updates = self._build_watermark_updates( + runner_execution_context, + data_input.keys(), + expected_timer_output.keys(), + runner_pcolls_with_da.union(sdk_pcolls_with_da), + transforms_with_channel_splits, + watermarks_by_transform_and_timer_family) # After collecting deferred inputs, we 'pad' the structure with empty # buffers for other expected inputs. 
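For intuition, the rules in `_build_watermark_updates` above combine as in this short sketch; it is illustrative only, and the PCollection, transform and timer-family names in it are hypothetical rather than taken from the change.

from apache_beam.utils import timestamp

# Hypothetical outcome of one bundle for a stage with two data inputs and one
# expected timer family.
pcolls_with_da = {'pcoll_main'}                   # SDK returned a residual
transforms_w_splits = set()                       # no channel splits this time
expected_timers = [('StatefulDoFn', 'ts-timer')]  # hypothetical TimerFamilyId
timer_watermarks = {}                             # no new firing time was set

updates = {}
for pcoll in pcolls_with_da:
  updates[pcoll] = timestamp.MIN_TIMESTAMP        # hold the watermark back
for timer_id in expected_timers:
  # Hold at the requested firing time, or advance fully when none was set.
  updates[timer_id] = timer_watermarks.get(timer_id, timestamp.MAX_TIMESTAMP)
# Stage inputs without holds (no splits, no delayed applications) advance.
updates.setdefault('pcoll_side', timestamp.MAX_TIMESTAMP)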
- if deferred_inputs or fired_timers: + if deferred_inputs or newly_set_timers: # The worker will be waiting on these inputs as well. for other_input in data_input: if other_input not in deferred_inputs: @@ -621,7 +803,7 @@ def _run_bundle( coder_impl=bundle_context_manager.get_input_coder_impl( other_input)) - return result, deferred_inputs, fired_timers + return result, deferred_inputs, newly_set_timers, watermark_updates @staticmethod def get_cache_token_generator(static=True): @@ -810,6 +992,7 @@ def _generate_splits_for_testing(self, split_fraction = next(split_manager_generator) done = False except StopIteration: + split_fraction = None done = True # Send all the data. diff --git a/sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py b/sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py index 05bd0d2b32fb..4053b9cce770 100644 --- a/sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py +++ b/sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py @@ -16,15 +16,13 @@ # # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import collections +import gc import logging import os import random +import re import shutil -import sys import tempfile import threading import time @@ -32,20 +30,19 @@ import typing import unittest import uuid -from builtins import range from typing import Any from typing import Dict from typing import Tuple -# patches unittest.TestCase to be python3 compatible import hamcrest # pylint: disable=ungrouped-imports +import pytest from hamcrest.core.matcher import Matcher from hamcrest.core.string_description import StringDescription -from nose.plugins.attrib import attr from tenacity import retry from tenacity import stop_after_attempt import apache_beam as beam +from apache_beam.coders import coders from apache_beam.coders.coders import StrUtf8Coder from apache_beam.io import restriction_trackers from apache_beam.io.watermark_estimators import ManualWatermarkEstimator @@ -54,7 +51,9 @@ from apache_beam.metrics.metricbase import MetricName from apache_beam.options.pipeline_options import DebugOptions from apache_beam.options.pipeline_options import PipelineOptions +from apache_beam.options.pipeline_options import StandardOptions from apache_beam.options.value_provider import RuntimeValueProvider +from apache_beam.portability import python_urns from apache_beam.runners.portability import fn_api_runner from apache_beam.runners.portability.fn_api_runner import fn_runner from apache_beam.runners.sdf_utils import RestrictionTrackerView @@ -104,12 +103,7 @@ def create_pipeline(self, is_drain=False): def test_assert_that(self): # TODO: figure out a way for fn_api_runner to parse and raise the # underlying exception. 
- if sys.version_info < (3, 2): - assertRaisesRegex = self.assertRaisesRegexp - else: - assertRaisesRegex = self.assertRaisesRegex - - with assertRaisesRegex(Exception, 'Failed assert'): + with self.assertRaisesRegex(Exception, 'Failed assert'): with self.create_pipeline() as p: assert_that(p | beam.Create(['a', 'b']), equal_to(['a'])) @@ -493,6 +487,30 @@ def is_buffered_correctly(actual): assert_that(actual, is_buffered_correctly) + def test_pardo_dynamic_timer(self): + class DynamicTimerDoFn(beam.DoFn): + dynamic_timer_spec = userstate.TimerSpec( + 'dynamic_timer', userstate.TimeDomain.WATERMARK) + + def process( + self, element, + dynamic_timer=beam.DoFn.TimerParam(dynamic_timer_spec)): + dynamic_timer.set(element[1], dynamic_timer_tag=element[0]) + + @userstate.on_timer(dynamic_timer_spec) + def dynamic_timer_callback( + self, + tag=beam.DoFn.DynamicTimerTagParam, + timestamp=beam.DoFn.TimestampParam): + yield (tag, timestamp) + + with self.create_pipeline() as p: + actual = ( + p + | beam.Create([('key1', 10), ('key2', 20), ('key3', 30)]) + | beam.ParDo(DynamicTimerDoFn())) + assert_that(actual, equal_to([('key1', 10), ('key2', 20), ('key3', 30)])) + def test_sdf(self): class ExpandingStringsDoFn(beam.DoFn): def process( @@ -756,6 +774,21 @@ def test_windowing(self): | beam.Map(lambda k_vs1: (k_vs1[0], sorted(k_vs1[1])))) assert_that(res, equal_to([('k', [1, 2]), ('k', [100, 101, 102])])) + def test_custom_merging_window(self): + with self.create_pipeline() as p: + res = ( + p + | beam.Create([1, 2, 100, 101, 102]) + | beam.Map(lambda t: window.TimestampedValue(('k', t), t)) + | beam.WindowInto(CustomMergingWindowFn()) + | beam.GroupByKey() + | beam.Map(lambda k_vs1: (k_vs1[0], sorted(k_vs1[1])))) + assert_that( + res, equal_to([('k', [1]), ('k', [101]), ('k', [2, 100, 102])])) + gc.collect() + from apache_beam.runners.portability.fn_api_runner.execution import GenericMergingWindowFn + self.assertEqual(GenericMergingWindowFn._HANDLES, {}) + @unittest.skip('BEAM-9119: test is flaky') def test_large_elements(self): with self.create_pipeline() as p: @@ -963,6 +996,55 @@ def _add_argparse_args(cls, parser): with self.create_pipeline() as p: assert_that(p | beam.Create(['a', 'b']), equal_to(['a', 'b'])) + def _test_pack_combiners(self, assert_using_counter_names): + counter = beam.metrics.Metrics.counter('ns', 'num_values') + + def min_with_counter(values): + counter.inc() + return min(values) + + def max_with_counter(values): + counter.inc() + return max(values) + + class PackableCombines(beam.PTransform): + def annotations(self): + return {python_urns.APPLY_COMBINER_PACKING: b''} + + def expand(self, pcoll): + assert_that( + pcoll | 'PackableMin' >> beam.CombineGlobally(min_with_counter), + equal_to([10]), + label='AssertMin') + assert_that( + pcoll | 'PackableMax' >> beam.CombineGlobally(max_with_counter), + equal_to([30]), + label='AssertMax') + + with self.create_pipeline() as p: + _ = p | beam.Create([10, 20, 30]) | PackableCombines() + + res = p.run() + res.wait_until_finish() + + packed_step_name_regex = ( + r'.*Packed.*PackableMin.*CombinePerKey.*PackableMax.*CombinePerKey.*' + + 'Pack.*') + + counters = res.metrics().query(beam.metrics.MetricsFilter())['counters'] + step_names = set(m.key.step for m in counters) + pipeline_options = p._options + if assert_using_counter_names: + if pipeline_options.view_as(StandardOptions).streaming: + self.assertFalse( + any([re.match(packed_step_name_regex, s) for s in step_names])) + else: + self.assertTrue( + 
any([re.match(packed_step_name_regex, s) for s in step_names])) + + def test_pack_combiners(self): + self._test_pack_combiners(assert_using_counter_names=True) + # These tests are kept in a separate group so that they are # not ran in the FnApiRunnerTestWithBundleRepeat which repeats @@ -1361,7 +1443,8 @@ class FnApiRunnerTestWithGrpc(FnApiRunnerTest): def create_pipeline(self, is_drain=False): return beam.Pipeline( runner=fn_api_runner.FnApiRunner( - default_environment=environments.EmbeddedPythonGrpcEnvironment(), + default_environment=environments.EmbeddedPythonGrpcEnvironment. + default(), is_drain=is_drain)) @@ -1370,7 +1453,10 @@ def create_pipeline(self, is_drain=False): return beam.Pipeline( runner=fn_api_runner.FnApiRunner( default_environment=environments.EmbeddedPythonGrpcEnvironment( - state_cache_size=0, data_buffer_time_limit_ms=0), + state_cache_size=0, + data_buffer_time_limit_ms=0, + capabilities=environments.python_sdk_capabilities(), + artifacts=()), is_drain=is_drain)) @@ -1477,7 +1563,8 @@ def create_pipeline(self, is_drain=False): # to the bundle process request. return beam.Pipeline( runner=fn_api_runner.FnApiRunner( - default_environment=environments.EmbeddedPythonGrpcEnvironment(), + default_environment=environments.EmbeddedPythonGrpcEnvironment. + default(), is_drain=is_drain)) def test_checkpoint(self): @@ -1560,11 +1647,12 @@ def split_manager(num_elements): def run_sdf_split_half(self, is_drain=False): element_counter = ElementCounter() - is_first_bundle = [True] # emulate nonlocal for Python 2 + is_first_bundle = True def split_manager(num_elements): + nonlocal is_first_bundle if is_first_bundle and num_elements > 0: - del is_first_bundle[:] + is_first_bundle = False breakpoint = element_counter.set_breakpoint(1) yield breakpoint.wait() @@ -1858,14 +1946,12 @@ class FnApiBasedLullLoggingTest(unittest.TestCase): def create_pipeline(self): return beam.Pipeline( runner=fn_api_runner.FnApiRunner( - default_environment=environments.EmbeddedPythonGrpcEnvironment(), + default_environment=environments.EmbeddedPythonGrpcEnvironment. + default(), progress_request_frequency=0.5)) def test_lull_logging(self): - # TODO(BEAM-1251): Remove this test skip after dropping Py 2 support. - if sys.version_info < (3, 4): - self.skipTest('Log-based assertions are supported after Python 3.4') try: utils.check_compiled('apache_beam.runners.worker.opcounters') except RuntimeError: @@ -1901,7 +1987,7 @@ def __reduce__(self): return (self.__class__, (self.num_elements, 'x' * self.num_elements)) -@attr('ValidatesRunner') +@pytest.mark.it_validatesrunner class FnApiBasedStateBackedCoderTest(unittest.TestCase): def create_pipeline(self): return beam.Pipeline( @@ -1926,6 +2012,26 @@ def test_gbk_many_values(self): assert_that(r, equal_to([VALUES_PER_ELEMENT * NUM_OF_ELEMENTS])) +# TODO(robertwb): Why does pickling break when this is inlined? 
+class CustomMergingWindowFn(window.WindowFn): + def assign(self, assign_context): + return [ + window.IntervalWindow( + assign_context.timestamp, assign_context.timestamp + 1000) + ] + + def merge(self, merge_context): + evens = [w for w in merge_context.windows if w.start % 2 == 0] + if evens: + merge_context.merge( + evens, + window.IntervalWindow( + min(w.start for w in evens), max(w.end for w in evens))) + + def get_window_coder(self): + return coders.IntervalWindowCoder() + + if __name__ == '__main__': logging.getLogger().setLevel(logging.INFO) unittest.main() diff --git a/sdks/python/apache_beam/runners/portability/fn_api_runner/translations.py b/sdks/python/apache_beam/runners/portability/fn_api_runner/translations.py index b3d6e027b579..dc536d22ec43 100644 --- a/sdks/python/apache_beam/runners/portability/fn_api_runner/translations.py +++ b/sdks/python/apache_beam/runners/portability/fn_api_runner/translations.py @@ -20,14 +20,11 @@ # pytype: skip-file # mypy: check-untyped-defs -from __future__ import absolute_import -from __future__ import print_function - import collections import functools import itertools import logging -from builtins import object +from typing import Callable from typing import Container from typing import DefaultDict from typing import Dict @@ -40,8 +37,6 @@ from typing import Tuple from typing import TypeVar -from past.builtins import unicode - from apache_beam import coders from apache_beam.internal import pickler from apache_beam.portability import common_urns @@ -78,12 +73,21 @@ IMPULSE_BUFFER = b'impulse' +# TimerFamilyId is identified by transform name + timer family +TimerFamilyId = Tuple[str, str] + # SideInputId is identified by a consumer ParDo + tag. SideInputId = Tuple[str, str] SideInputAccessPattern = beam_runner_api_pb2.FunctionSpec DataOutput = Dict[str, bytes] +# A map from a PCollection coder ID to a Safe Coder ID +# A safe coder is a coder that can be used on the runner-side of the FnApi. +# A safe coder receives a byte string, and returns a type that can be +# understood by the runner when deserializing. +SafeCoderMapping = Dict[str, str] + # DataSideInput maps SideInputIds to a tuple of the encoded bytes of the side # input content, and a payload specification regarding the type of side input # (MultiMap / Iterable). @@ -105,7 +109,7 @@ def __init__( self.transforms = transforms self.downstream_side_inputs = downstream_side_inputs self.must_follow = must_follow - self.timers = set() # type: Set[Tuple[str, str]] + self.timers = set() # type: Set[TimerFamilyId] self.parent = parent if environment is None: environment = functools.reduce( @@ -174,7 +178,7 @@ def fuse(self, other, context): union(self.must_follow, other.must_follow), environment=self._merge_environments( self.environment, other.environment), - parent=_parent_for_fused_stages([self.name, other.name], context), + parent=_parent_for_fused_stages([self, other], context), forced_root=self.forced_root or other.forced_root) def is_runner_urn(self, context): @@ -375,7 +379,12 @@ def __init__( coder_proto = coders.BytesCoder().to_runner_api( None) # type: ignore[arg-type] self.bytes_coder_id = self.add_or_get_coder_id(coder_proto, 'bytes_coder') - self.safe_coders = {self.bytes_coder_id: self.bytes_coder_id} + + self.safe_coders: SafeCoderMapping = { + self.bytes_coder_id: self.bytes_coder_id + } + + # A map of PCollection ID to Coder ID. 
self.data_channel_coders = {} # type: Dict[str, str] def add_or_get_coder_id( @@ -762,8 +771,39 @@ def _remap_input_pcolls(transform, pcoll_id_remap): transform.inputs[input_key] = pcoll_id_remap[transform.inputs[input_key]] -def eliminate_common_key_with_none(stages, context): - # type: (Iterable[Stage], TransformContext) -> Iterable[Stage] +def _make_pack_name(names): + """Return the packed Transform or Stage name. + + The output name will contain the input names' common prefix, the infix + '/Packed', and the input names' suffixes in square brackets. + For example, if the input names are 'a/b/c1/d1' and 'a/b/c2/d2, then + the output name is 'a/b/Packed[c1_d1, c2_d2]'. + """ + assert names + tokens_in_names = [name.split('/') for name in names] + common_prefix_tokens = [] + + # Find the longest common prefix of tokens. + while True: + first_token_in_names = set() + for tokens in tokens_in_names: + if not tokens: + break + first_token_in_names.add(tokens[0]) + if len(first_token_in_names) != 1: + break + common_prefix_tokens.append(next(iter(first_token_in_names))) + for tokens in tokens_in_names: + tokens.pop(0) + + common_prefix_tokens.append('Packed') + common_prefix = '/'.join(common_prefix_tokens) + suffixes = ['_'.join(tokens) for tokens in tokens_in_names] + return '%s[%s]' % (common_prefix, ', '.join(suffixes)) + + +def _eliminate_common_key_with_none(stages, context, can_pack=lambda s: True): + # type: (Iterable[Stage], TransformContext, Callable[[str], bool]) -> Iterable[Stage] """Runs common subexpression elimination for sibling KeyWithNone stages. @@ -778,7 +818,7 @@ def eliminate_common_key_with_none(stages, context): # elimination, and group eligible KeyWithNone stages by parent and # environment. def get_stage_key(stage): - if len(stage.transforms) == 1: + if len(stage.transforms) == 1 and can_pack(stage.name): transform = only_transform(stage.transforms) if (transform.spec.urn == common_urns.primitives.PAR_DO.urn and len(transform.inputs) == 1 and len(transform.outputs) == 1): @@ -795,15 +835,23 @@ def get_stage_key(stage): pcoll_id_remap = {} remaining_stages = [] for sibling_stages in grouped_eligible_stages.values(): - output_pcoll_ids = [ - only_element(stage.transforms[0].outputs.values()) - for stage in sibling_stages - ] - parent = _parent_for_fused_stages([s.name for s in sibling_stages], context) - for to_delete_pcoll_id in output_pcoll_ids[1:]: - pcoll_id_remap[to_delete_pcoll_id] = output_pcoll_ids[0] - del context.components.pcollections[to_delete_pcoll_id] - sibling_stages[0].parent = parent + if len(sibling_stages) > 1: + output_pcoll_ids = [ + only_element(stage.transforms[0].outputs.values()) + for stage in sibling_stages + ] + parent = _parent_for_fused_stages(sibling_stages, context) + for to_delete_pcoll_id in output_pcoll_ids[1:]: + pcoll_id_remap[to_delete_pcoll_id] = output_pcoll_ids[0] + del context.components.pcollections[to_delete_pcoll_id] + sibling_stages[0].parent = parent + sibling_stages[0].name = _make_pack_name( + stage.name for stage in sibling_stages) + only_transform( + sibling_stages[0].transforms).unique_name = _make_pack_name( + only_transform(stage.transforms).unique_name + for stage in sibling_stages) + remaining_stages.append(sibling_stages[0]) # Remap all transforms in components. 
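A quick illustration of the naming scheme produced by `_make_pack_name` above; the transform names in this snippet are made up.

# _make_pack_name is a private helper in translations.py, imported here only
# to show the packed names it generates.
from apache_beam.runners.portability.fn_api_runner.translations import (
    _make_pack_name)

print(_make_pack_name(['a/b/c1/d1', 'a/b/c2/d2']))  # a/b/Packed[c1_d1, c2_d2]
print(_make_pack_name(['x/Sum', 'x/Count']))  # x/Packed[Sum, Count]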
@@ -818,8 +866,8 @@ def get_stage_key(stage): yield stage -def pack_combiners(stages, context): - # type: (Iterable[Stage], TransformContext) -> Iterator[Stage] +def pack_per_key_combiners(stages, context, can_pack=lambda s: True): + # type: (Iterable[Stage], TransformContext, Callable[[str], bool]) -> Iterator[Stage] """Packs sibling CombinePerKey stages into a single CombinePerKey. @@ -874,9 +922,9 @@ def _try_fuse_stages(a, b): # Partition stages by whether they are eligible for CombinePerKey packing # and group eligible CombinePerKey stages by parent and environment. def get_stage_key(stage): - if (len(stage.transforms) == 1 and stage.environment is not None and - python_urns.PACKED_COMBINE_FN in context.components.environments[ - stage.environment].capabilities): + if (len(stage.transforms) == 1 and can_pack(stage.name) and + stage.environment is not None and python_urns.PACKED_COMBINE_FN in + context.components.environments[stage.environment].capabilities): transform = only_transform(stage.transforms) if (transform.spec.urn == common_urns.composites.COMBINE_PER_KEY.urn and len(transform.inputs) == 1 and len(transform.outputs) == 1): @@ -948,13 +996,16 @@ def get_stage_key(stage): component_coder_ids=[key_coder_id, pack_output_value_coder_id]) pack_output_kv_coder_id = context.add_or_get_coder_id(pack_output_kv_coder) - # Set up packed PCollection - pack_combine_name = fused_stage.name + pack_stage_name = _make_pack_name([stage.name for stage in packable_stages]) + pack_transform_name = _make_pack_name([ + only_transform(stage.transforms).unique_name + for stage in packable_stages + ]) pack_pcoll_id = unique_name(context.components.pcollections, 'pcollection') input_pcoll = context.components.pcollections[input_pcoll_id] context.components.pcollections[pack_pcoll_id].CopyFrom( beam_runner_api_pb2.PCollection( - unique_name=pack_combine_name + '.out', + unique_name=pack_transform_name + '/Pack.out', coder_id=pack_output_kv_coder_id, windowing_strategy_id=input_pcoll.windowing_strategy_id, is_bounded=input_pcoll.is_bounded)) @@ -969,7 +1020,7 @@ def get_stage_key(stage): for combine_payload in combine_payloads ]).to_runner_api(context) # type: ignore[arg-type] pack_transform = beam_runner_api_pb2.PTransform( - unique_name=pack_combine_name + '/Pack', + unique_name=pack_transform_name + '/Pack', spec=beam_runner_api_pb2.FunctionSpec( urn=common_urns.composites.COMBINE_PER_KEY.urn, payload=beam_runner_api_pb2.CombinePayload( @@ -977,10 +1028,11 @@ def get_stage_key(stage): accumulator_coder_id=tuple_accumulator_coder_id). SerializeToString()), inputs={'in': input_pcoll_id}, - outputs={'out': pack_pcoll_id}, + # 'None' single output key follows convention for CombinePerKey. 
+ outputs={'None': pack_pcoll_id}, environment_id=fused_stage.environment) pack_stage = Stage( - pack_combine_name + '/Pack', [pack_transform], + pack_stage_name + '/Pack', [pack_transform], downstream_side_inputs=fused_stage.downstream_side_inputs, must_follow=fused_stage.must_follow, parent=fused_stage.parent, @@ -991,7 +1043,7 @@ def get_stage_key(stage): tags = [str(i) for i in range(len(output_pcoll_ids))] pickled_do_fn_data = pickler.dumps((_UnpackFn(tags), (), {}, [], None)) unpack_transform = beam_runner_api_pb2.PTransform( - unique_name=pack_combine_name + '/Unpack', + unique_name=pack_transform_name + '/Unpack', spec=beam_runner_api_pb2.FunctionSpec( urn=common_urns.primitives.PAR_DO.urn, payload=beam_runner_api_pb2.ParDoPayload( @@ -1002,7 +1054,7 @@ def get_stage_key(stage): outputs=dict(zip(tags, output_pcoll_ids)), environment_id=fused_stage.environment) unpack_stage = Stage( - pack_combine_name + '/Unpack', [unpack_transform], + pack_stage_name + '/Unpack', [unpack_transform], downstream_side_inputs=fused_stage.downstream_side_inputs, must_follow=fused_stage.must_follow, parent=fused_stage.parent, @@ -1010,6 +1062,34 @@ def get_stage_key(stage): yield unpack_stage +def pack_combiners(stages, context, can_pack=None): + # type: (Iterable[Stage], TransformContext, Optional[Callable[[str], bool]]) -> Iterator[Stage] + if can_pack is None: + can_pack_names = {} # type: Dict[str, bool] + parents = context.parents_map() + + def can_pack_fn(name: str) -> bool: + if name in can_pack_names: + return can_pack_names[name] + else: + transform = context.components.transforms[name] + if python_urns.APPLY_COMBINER_PACKING in transform.annotations: + result = True + elif name in parents: + result = can_pack_fn(parents[name]) + else: + result = False + can_pack_names[name] = result + return result + + can_pack = can_pack_fn + + yield from pack_per_key_combiners( + _eliminate_common_key_with_none(stages, context, can_pack), + context, + can_pack) + + def lift_combiners(stages, context): # type: (List[Stage], TransformContext) -> Iterator[Stage] @@ -1191,18 +1271,16 @@ def unlifted_stages(stage): yield stage -def _lowest_common_ancestor(a, b, context): - # type: (str, str, TransformContext) -> Optional[str] +def _lowest_common_ancestor(a, b, parents): + # type: (str, str, Dict[str, str]) -> Optional[str] '''Returns the name of the lowest common ancestor of the two named stages. - The provided context is used to compute ancestors of stages. Note that stages - are considered to be ancestors of themselves. + The map of stage names to their parents' stage names should be provided + in parents. Note that stages are considered to be ancestors of themselves. ''' assert a != b - parents = context.parents_map() - def get_ancestors(name): ancestor = name while ancestor is not None: @@ -1217,7 +1295,7 @@ def get_ancestors(name): def _parent_for_fused_stages(stages, context): - # type: (Iterable[Optional[str]], TransformContext) -> Optional[str] + # type: (Iterable[Stage], TransformContext) -> Optional[str] '''Returns the name of the new parent for the fused stages. @@ -1225,15 +1303,25 @@ def _parent_for_fused_stages(stages, context): contained in the set of stages to be fused. The provided context is used to compute ancestors of stages. 
''' + + parents = context.parents_map() + # If any of the input stages were produced by fusion or an optimizer phase, + # or had its parent modified by an optimizer phase, its parent will not be + # be reflected in the PipelineContext yet, so we need to add it to the + # parents map. + for stage in stages: + parents[stage.name] = stage.parent + def reduce_fn(a, b): # type: (Optional[str], Optional[str]) -> Optional[str] if a is None or b is None: return None - return _lowest_common_ancestor(a, b, context) + return _lowest_common_ancestor(a, b, parents) - result = functools.reduce(reduce_fn, stages) - if result in stages: - result = context.parents_map().get(result) + stage_names = [stage.name for stage in stages] # type: List[Optional[str]] + result = functools.reduce(reduce_fn, stage_names) + if result in stage_names: + result = parents.get(result) return result @@ -1251,7 +1339,7 @@ def expand_sdf(stages, context): if pardo_payload.restriction_coder_id: def copy_like(protos, original, suffix='_copy', **kwargs): - if isinstance(original, (str, unicode)): + if isinstance(original, str): key = original original = protos[original] else: diff --git a/sdks/python/apache_beam/runners/portability/fn_api_runner/translations_test.py b/sdks/python/apache_beam/runners/portability/fn_api_runner/translations_test.py index 8eb79609367d..37882dc31ffa 100644 --- a/sdks/python/apache_beam/runners/portability/fn_api_runner/translations_test.py +++ b/sdks/python/apache_beam/runners/portability/fn_api_runner/translations_test.py @@ -16,16 +16,20 @@ # # pytype: skip-file -from __future__ import absolute_import - import logging import unittest +import pytest + import apache_beam as beam from apache_beam import runners from apache_beam.options import pipeline_options from apache_beam.portability import common_urns +from apache_beam.portability import python_urns from apache_beam.runners.portability.fn_api_runner import translations +from apache_beam.testing.test_pipeline import TestPipeline +from apache_beam.testing.util import assert_that +from apache_beam.testing.util import equal_to from apache_beam.transforms import combiners from apache_beam.transforms import core from apache_beam.transforms import environments @@ -38,13 +42,14 @@ class MultipleKeyWithNone(beam.PTransform): def expand(self, pcoll): _ = pcoll | 'key-with-none-a' >> beam.ParDo(core._KeyWithNone()) _ = pcoll | 'key-with-none-b' >> beam.ParDo(core._KeyWithNone()) + _ = pcoll | 'key-with-none-c' >> beam.ParDo(core._KeyWithNone()) pipeline = beam.Pipeline() _ = pipeline | beam.Create( [1, 2, 3]) | 'multiple-key-with-none' >> MultipleKeyWithNone() pipeline_proto = pipeline.to_runner_api() _, stages = translations.create_and_optimize_stages( - pipeline_proto, [translations.eliminate_common_key_with_none], + pipeline_proto, [translations._eliminate_common_key_with_none], known_runner_urns=frozenset()) key_with_none_stages = [ stage for stage in stages if 'key-with-none' in stage.name @@ -54,9 +59,13 @@ def expand(self, pcoll): def test_pack_combiners(self): class MultipleCombines(beam.PTransform): + def annotations(self): + return {python_urns.APPLY_COMBINER_PACKING: b''} + def expand(self, pcoll): _ = pcoll | 'mean-perkey' >> combiners.Mean.PerKey() _ = pcoll | 'count-perkey' >> combiners.Count.PerKey() + _ = pcoll | 'largest-perkey' >> core.CombinePerKey(combiners.Largest(1)) pipeline = beam.Pipeline() vals = [6, 3, 1, 1, 9, 1, 5, 2, 0, 6] @@ -74,15 +83,20 @@ def expand(self, pcoll): if transform.spec.urn == 
common_urns.composites.COMBINE_PER_KEY.urn: combine_per_key_stages.append(stage) self.assertEqual(len(combine_per_key_stages), 1) - self.assertIn('/Pack', combine_per_key_stages[0].name) + self.assertIn('Packed', combine_per_key_stages[0].name) + self.assertIn('Packed', combine_per_key_stages[0].transforms[0].unique_name) self.assertIn('multiple-combines', combine_per_key_stages[0].parent) self.assertNotIn('-perkey', combine_per_key_stages[0].parent) def test_pack_combiners_with_missing_environment_capability(self): class MultipleCombines(beam.PTransform): + def annotations(self): + return {python_urns.APPLY_COMBINER_PACKING: b''} + def expand(self, pcoll): _ = pcoll | 'mean-perkey' >> combiners.Mean.PerKey() _ = pcoll | 'count-perkey' >> combiners.Count.PerKey() + _ = pcoll | 'largest-perkey' >> core.CombinePerKey(combiners.Largest(1)) pipeline = beam.Pipeline() vals = [6, 3, 1, 1, 9, 1, 5, 2, 0, 6] @@ -99,15 +113,22 @@ def expand(self, pcoll): combine_per_key_stages.append(stage) # Combiner packing should be skipped because the environment is missing # the beam:combinefn:packed_python:v1 capability. - self.assertEqual(len(combine_per_key_stages), 2) + self.assertEqual(len(combine_per_key_stages), 3) for combine_per_key_stage in combine_per_key_stages: - self.assertNotIn('/Pack', combine_per_key_stage.name) + self.assertNotIn('Packed', combine_per_key_stage.name) + self.assertNotIn( + 'Packed', combine_per_key_stage.transforms[0].unique_name) def test_pack_global_combiners(self): class MultipleCombines(beam.PTransform): + def annotations(self): + return {python_urns.APPLY_COMBINER_PACKING: b''} + def expand(self, pcoll): _ = pcoll | 'mean-globally' >> combiners.Mean.Globally() _ = pcoll | 'count-globally' >> combiners.Count.Globally() + _ = pcoll | 'largest-globally' >> core.CombineGlobally( + combiners.Largest(1)) pipeline = beam.Pipeline() vals = [6, 3, 1, 1, 9, 1, 5, 2, 0, 6] @@ -117,7 +138,6 @@ def expand(self, pcoll): pipeline_proto = pipeline.to_runner_api(default_environment=environment) _, stages = translations.create_and_optimize_stages( pipeline_proto, [ - translations.eliminate_common_key_with_none, translations.pack_combiners, ], known_runner_urns=frozenset()) @@ -134,7 +154,8 @@ def expand(self, pcoll): if transform.spec.urn == common_urns.composites.COMBINE_PER_KEY.urn: combine_per_key_stages.append(stage) self.assertEqual(len(combine_per_key_stages), 1) - self.assertIn('/Pack', combine_per_key_stages[0].name) + self.assertIn('Packed', combine_per_key_stages[0].name) + self.assertIn('Packed', combine_per_key_stages[0].transforms[0].unique_name) self.assertIn('multiple-combines', combine_per_key_stages[0].parent) self.assertNotIn('-globally', combine_per_key_stages[0].parent) @@ -149,14 +170,19 @@ def test_optimize_empty_pipeline(self): optimized_pipeline_proto, runner, pipeline_options.PipelineOptions()) def test_optimize_single_combine_globally(self): + class SingleCombine(beam.PTransform): + def annotations(self): + return {python_urns.APPLY_COMBINER_PACKING: b''} + + def expand(self, pcoll): + _ = pcoll | combiners.Count.Globally() + pipeline = beam.Pipeline() vals = [6, 3, 1, 1, 9, 1, 5, 2, 0, 6] - _ = pipeline | Create(vals) | combiners.Count.Globally() + _ = pipeline | Create(vals) | SingleCombine() pipeline_proto = pipeline.to_runner_api() optimized_pipeline_proto = translations.optimize_pipeline( - pipeline_proto, - [ - translations.eliminate_common_key_with_none, + pipeline_proto, [ translations.pack_combiners, ], known_runner_urns=frozenset(), @@ -167,16 +193,22 @@ 
def test_optimize_single_combine_globally(self): optimized_pipeline_proto, runner, pipeline_options.PipelineOptions()) def test_optimize_multiple_combine_globally(self): + class MultipleCombines(beam.PTransform): + def annotations(self): + return {python_urns.APPLY_COMBINER_PACKING: b''} + + def expand(self, pcoll): + _ = pcoll | 'mean-globally' >> combiners.Mean.Globally() + _ = pcoll | 'count-globally' >> combiners.Count.Globally() + _ = pcoll | 'largest-globally' >> core.CombineGlobally( + combiners.Largest(1)) + pipeline = beam.Pipeline() vals = [6, 3, 1, 1, 9, 1, 5, 2, 0, 6] - pcoll = pipeline | Create(vals) - _ = pcoll | 'mean-globally' >> combiners.Mean.Globally() - _ = pcoll | 'count-globally' >> combiners.Count.Globally() + _ = pipeline | Create(vals) | MultipleCombines() pipeline_proto = pipeline.to_runner_api() optimized_pipeline_proto = translations.optimize_pipeline( - pipeline_proto, - [ - translations.eliminate_common_key_with_none, + pipeline_proto, [ translations.pack_combiners, ], known_runner_urns=frozenset(), @@ -217,6 +249,108 @@ def assert_is_topologically_sorted(transform_id, visited_pcolls): assert_is_topologically_sorted( optimized_pipeline_proto.root_transform_ids[0], set()) + @pytest.mark.it_validatesrunner + def test_run_packable_combine_per_key(self): + class MultipleCombines(beam.PTransform): + def annotations(self): + return {python_urns.APPLY_COMBINER_PACKING: b''} + + def expand(self, pcoll): + # These CombinePerKey stages will be packed if and only if + # translations.pack_combiners is enabled in the TestPipeline runner. + assert_that( + pcoll | 'min-perkey' >> core.CombinePerKey(min), + equal_to([('a', -1)]), + label='assert-min-perkey') + assert_that( + pcoll | 'count-perkey' >> combiners.Count.PerKey(), + equal_to([('a', 10)]), + label='assert-count-perkey') + assert_that( + pcoll + | 'largest-perkey' >> combiners.Top.LargestPerKey(2), + equal_to([('a', [9, 6])]), + label='assert-largest-perkey') + + with TestPipeline() as pipeline: + vals = [6, 3, 1, -1, 9, 1, 5, 2, 0, 6] + _ = ( + pipeline + | Create([('a', x) for x in vals]) + | 'multiple-combines' >> MultipleCombines()) + + @pytest.mark.it_validatesrunner + def test_run_packable_combine_globally(self): + class MultipleCombines(beam.PTransform): + def annotations(self): + return {python_urns.APPLY_COMBINER_PACKING: b''} + + def expand(self, pcoll): + # These CombineGlobally stages will be packed if and only if + # translations.eliminate_common_key_with_void and + # translations.pack_combiners are enabled in the TestPipeline runner. 
+ assert_that( + pcoll | 'min-globally' >> core.CombineGlobally(min), + equal_to([-1]), + label='assert-min-globally') + assert_that( + pcoll | 'count-globally' >> combiners.Count.Globally(), + equal_to([10]), + label='assert-count-globally') + assert_that( + pcoll + | 'largest-globally' >> combiners.Top.Largest(2), + equal_to([[9, 6]]), + label='assert-largest-globally') + + with TestPipeline() as pipeline: + vals = [6, 3, 1, -1, 9, 1, 5, 2, 0, 6] + _ = pipeline | Create(vals) | 'multiple-combines' >> MultipleCombines() + + def test_conditionally_packed_combiners(self): + class RecursiveCombine(beam.PTransform): + def __init__(self, labels): + self._labels = labels + + def expand(self, pcoll): + base = pcoll | 'Sum' >> beam.CombineGlobally(sum) + if self._labels: + rest = pcoll | self._labels[0] >> RecursiveCombine(self._labels[1:]) + return (base, rest) | beam.Flatten() + else: + return base + + def annotations(self): + if len(self._labels) == 2: + return {python_urns.APPLY_COMBINER_PACKING: b''} + else: + return {} + + # Verify the results are as expected. + with TestPipeline() as pipeline: + result = pipeline | beam.Create([1, 2, 3]) | RecursiveCombine('ABCD') + assert_that(result, equal_to([6, 6, 6, 6, 6])) + + # Verify the optimization is as expected. + proto = pipeline.to_runner_api( + default_environment=environments.EmbeddedPythonEnvironment( + capabilities=environments.python_sdk_capabilities())) + optimized = translations.optimize_pipeline( + proto, + phases=[translations.pack_combiners], + known_runner_urns=frozenset(), + partial=True) + optimized_stage_names = sorted( + t.unique_name for t in optimized.components.transforms.values()) + self.assertIn('RecursiveCombine/Sum/CombinePerKey', optimized_stage_names) + self.assertIn('RecursiveCombine/A/Sum/CombinePerKey', optimized_stage_names) + self.assertNotIn( + 'RecursiveCombine/A/B/Sum/CombinePerKey', optimized_stage_names) + self.assertIn( + 'RecursiveCombine/A/B/Packed[Sum_CombinePerKey, ' + 'C_Sum_CombinePerKey, C_D_Sum_CombinePerKey]/Pack', + optimized_stage_names) + if __name__ == '__main__': logging.getLogger().setLevel(logging.INFO) diff --git a/sdks/python/apache_beam/runners/portability/fn_api_runner/trigger_manager.py b/sdks/python/apache_beam/runners/portability/fn_api_runner/trigger_manager.py new file mode 100644 index 000000000000..0fd74b33a8cb --- /dev/null +++ b/sdks/python/apache_beam/runners/portability/fn_api_runner/trigger_manager.py @@ -0,0 +1,458 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# +import collections +import logging +import typing +from collections import defaultdict + +from apache_beam import typehints +from apache_beam.coders import PickleCoder +from apache_beam.coders import StrUtf8Coder +from apache_beam.coders import TupleCoder +from apache_beam.coders import VarIntCoder +from apache_beam.coders.coders import IntervalWindowCoder +from apache_beam.transforms import DoFn +from apache_beam.transforms.core import Windowing +from apache_beam.transforms.timeutil import TimeDomain +from apache_beam.transforms.trigger import AccumulationMode +from apache_beam.transforms.trigger import TriggerContext +from apache_beam.transforms.trigger import _CombiningValueStateTag +from apache_beam.transforms.trigger import _StateTag +from apache_beam.transforms.userstate import AccumulatingRuntimeState +from apache_beam.transforms.userstate import BagRuntimeState +from apache_beam.transforms.userstate import BagStateSpec +from apache_beam.transforms.userstate import CombiningValueStateSpec +from apache_beam.transforms.userstate import RuntimeTimer +from apache_beam.transforms.userstate import SetRuntimeState +from apache_beam.transforms.userstate import SetStateSpec +from apache_beam.transforms.userstate import TimerSpec +from apache_beam.transforms.userstate import on_timer +from apache_beam.transforms.window import BoundedWindow +from apache_beam.transforms.window import GlobalWindow +from apache_beam.transforms.window import TimestampedValue +from apache_beam.transforms.window import WindowFn +from apache_beam.typehints import TypeCheckError +from apache_beam.utils import windowed_value +from apache_beam.utils.timestamp import MIN_TIMESTAMP +from apache_beam.utils.timestamp import Timestamp +from apache_beam.utils.windowed_value import WindowedValue + +_LOGGER = logging.getLogger(__name__) +_LOGGER.setLevel(logging.DEBUG) + +K = typing.TypeVar('K') + + +class _ReifyWindows(DoFn): + """Receives KV pairs, and wraps the values into WindowedValues.""" + def process( + self, element, window=DoFn.WindowParam, timestamp=DoFn.TimestampParam): + try: + k, v = element + except TypeError: + raise TypeCheckError( + 'Input to GroupByKey must be a PCollection with ' + 'elements compatible with KV[A, B]') + + yield (k, windowed_value.WindowedValue(v, timestamp, [window])) + + +class _GroupBundlesByKey(DoFn): + def start_bundle(self): + self.keys = defaultdict(list) + + def process(self, element): + key, windowed_value = element + self.keys[key].append(windowed_value) + + def finish_bundle(self): + for k, vals in self.keys.items(): + yield windowed_value.WindowedValue((k, vals), + MIN_TIMESTAMP, [GlobalWindow()]) + + +def read_watermark(watermark_state): + try: + return watermark_state.read() + except ValueError: + watermark_state.add(MIN_TIMESTAMP) + return watermark_state.read() + + +class TriggerMergeContext(WindowFn.MergeContext): + def __init__( + self, all_windows, context: 'FnRunnerStatefulTriggerContext', windowing): + super(TriggerMergeContext, self).__init__(all_windows) + self.trigger_context = context + self.windowing = windowing + self.merged_away: typing.Dict[BoundedWindow, BoundedWindow] = {} + + def merge(self, to_be_merged, merge_result): + _LOGGER.debug("Merging %s into %s", to_be_merged, merge_result) + self.trigger_context.merge_windows(to_be_merged, merge_result) + for window in to_be_merged: + if window != merge_result: + self.merged_away[window] = merge_result + # Clear state associated with PaneInfo since it is + # not preserved across merges. 
+ self.trigger_context.for_window(window).clear_state(None) + self.windowing.triggerfn.on_merge( + to_be_merged, + merge_result, + self.trigger_context.for_window(merge_result)) + + +@typehints.with_input_types( + typing.Tuple[K, typing.Iterable[windowed_value.WindowedValue]]) +@typehints.with_output_types( + typing.Tuple[K, typing.Iterable[windowed_value.WindowedValue]]) +class GeneralTriggerManagerDoFn(DoFn): + """A trigger manager that supports all windowing / triggering cases. + + This implements a DoFn that manages triggering in a per-key basis. All + elements for a single key are processed together. Per-key state holds data + related to all windows. + """ + + # TODO(BEAM-12026) Add support for Global and custom window fns. + KNOWN_WINDOWS = SetStateSpec('known_windows', IntervalWindowCoder()) + FINISHED_WINDOWS = SetStateSpec('finished_windows', IntervalWindowCoder()) + LAST_KNOWN_TIME = CombiningValueStateSpec('last_known_time', combine_fn=max) + LAST_KNOWN_WATERMARK = CombiningValueStateSpec( + 'last_known_watermark', combine_fn=max) + + # TODO(pabloem) What's the coder for the elements/keys here? + WINDOW_ELEMENT_PAIRS = BagStateSpec( + 'all_elements', TupleCoder([IntervalWindowCoder(), PickleCoder()])) + WINDOW_TAG_VALUES = BagStateSpec( + 'per_window_per_tag_value_state', + TupleCoder([IntervalWindowCoder(), StrUtf8Coder(), VarIntCoder()])) + + PROCESSING_TIME_TIMER = TimerSpec( + 'processing_time_timer', TimeDomain.REAL_TIME) + WATERMARK_TIMER = TimerSpec('watermark_timer', TimeDomain.WATERMARK) + + def __init__(self, windowing: Windowing): + self.windowing = windowing + # Only session windows are merging. Other windows are non-merging. + self.merging_windows = self.windowing.windowfn.is_merging() + + def process( + self, + element: typing.Tuple[K, typing.Iterable[windowed_value.WindowedValue]], + all_elements: BagRuntimeState = DoFn.StateParam(WINDOW_ELEMENT_PAIRS), # type: ignore + latest_processing_time: AccumulatingRuntimeState = DoFn.StateParam(LAST_KNOWN_TIME), # type: ignore + latest_watermark: AccumulatingRuntimeState = DoFn.StateParam( # type: ignore + LAST_KNOWN_WATERMARK), + window_tag_values: BagRuntimeState = DoFn.StateParam(WINDOW_TAG_VALUES), # type: ignore + windows_state: SetRuntimeState = DoFn.StateParam(KNOWN_WINDOWS), # type: ignore + finished_windows_state: SetRuntimeState = DoFn.StateParam( # type: ignore + FINISHED_WINDOWS), + processing_time_timer=DoFn.TimerParam(PROCESSING_TIME_TIMER), + watermark_timer=DoFn.TimerParam(WATERMARK_TIMER), + *args, **kwargs): + context = FnRunnerStatefulTriggerContext( + processing_time_timer=processing_time_timer, + watermark_timer=watermark_timer, + latest_processing_time=latest_processing_time, + latest_watermark=latest_watermark, + all_elements_state=all_elements, + window_tag_values=window_tag_values, + finished_windows_state=finished_windows_state) + key, windowed_values = element + watermark = read_watermark(latest_watermark) + + windows_to_elements = collections.defaultdict(list) + for wv in windowed_values: + for window in wv.windows: + # ignore expired windows + if watermark > window.end + self.windowing.allowed_lateness: + continue + if window in finished_windows_state.read(): + continue + windows_to_elements[window].append( + TimestampedValue(wv.value, wv.timestamp)) + + # Processing merging of windows + if self.merging_windows: + old_windows = set(windows_state.read()) + all_windows = old_windows.union(list(windows_to_elements)) + if all_windows != old_windows: + merge_context = TriggerMergeContext( + 
all_windows, context, self.windowing) + self.windowing.windowfn.merge(merge_context) + + merged_windows_to_elements = collections.defaultdict(list) + for window, values in windows_to_elements.items(): + while window in merge_context.merged_away: + window = merge_context.merged_away[window] + merged_windows_to_elements[window].extend(values) + windows_to_elements = merged_windows_to_elements + + for w in windows_to_elements: + windows_state.add(w) + # Done processing merging of windows + + seen_windows = set() + for w in windows_to_elements: + window_context = context.for_window(w) + seen_windows.add(w) + for value_w_timestamp in windows_to_elements[w]: + _LOGGER.debug(value_w_timestamp) + all_elements.add((w, value_w_timestamp)) + self.windowing.triggerfn.on_element(windowed_values, w, window_context) + + return self._fire_eligible_windows( + key, TimeDomain.WATERMARK, watermark, None, context, seen_windows) + + def _fire_eligible_windows( + self, + key: K, + time_domain, + timestamp: Timestamp, + timer_tag: typing.Optional[str], + context: 'FnRunnerStatefulTriggerContext', + windows_of_interest: typing.Optional[typing.Set[BoundedWindow]] = None): + windows_to_elements = context.windows_to_elements_map() + context.all_elements_state.clear() + + fired_windows = set() + _LOGGER.debug( + '%s - tag %s - timestamp %s', time_domain, timer_tag, timestamp) + for w, elems in windows_to_elements.items(): + if windows_of_interest is not None and w not in windows_of_interest: + # windows_of_interest=None means that we care about all windows. + # If we care only about some windows, and this window is not one of + # them, then we do not intend to fire this window. + continue + window_context = context.for_window(w) + if self.windowing.triggerfn.should_fire(time_domain, + timestamp, + w, + window_context): + finished = self.windowing.triggerfn.on_fire( + timestamp, w, window_context) + _LOGGER.debug('Firing on window %s. Finished: %s', w, finished) + fired_windows.add(w) + if finished: + context.finished_windows_state.add(w) + # TODO(pabloem): Format the output: e.g. pane info + elems = [WindowedValue(e.value, e.timestamp, (w, )) for e in elems] + yield (key, elems) + + finished_windows: typing.Set[BoundedWindow] = set( + context.finished_windows_state.read()) + # Add elements that were not fired back into state. 
+ for w, elems in windows_to_elements.items(): + for e in elems: + if (w in finished_windows or + (w in fired_windows and + self.windowing.accumulation_mode == AccumulationMode.DISCARDING)): + continue + context.all_elements_state.add((w, e)) + + @on_timer(PROCESSING_TIME_TIMER) + def processing_time_trigger( + self, + key=DoFn.KeyParam, + timer_tag=DoFn.DynamicTimerTagParam, + timestamp=DoFn.TimestampParam, + latest_processing_time=DoFn.StateParam(LAST_KNOWN_TIME), + all_elements=DoFn.StateParam(WINDOW_ELEMENT_PAIRS), + processing_time_timer=DoFn.TimerParam(PROCESSING_TIME_TIMER), + window_tag_values: BagRuntimeState = DoFn.StateParam(WINDOW_TAG_VALUES), # type: ignore + finished_windows_state: SetRuntimeState = DoFn.StateParam( # type: ignore + FINISHED_WINDOWS), + watermark_timer=DoFn.TimerParam(WATERMARK_TIMER)): + context = FnRunnerStatefulTriggerContext( + processing_time_timer=processing_time_timer, + watermark_timer=watermark_timer, + latest_processing_time=latest_processing_time, + latest_watermark=None, + all_elements_state=all_elements, + window_tag_values=window_tag_values, + finished_windows_state=finished_windows_state) + result = self._fire_eligible_windows( + key, TimeDomain.REAL_TIME, timestamp, timer_tag, context) + latest_processing_time.add(timestamp) + return result + + @on_timer(WATERMARK_TIMER) + def watermark_trigger( + self, + key=DoFn.KeyParam, + timer_tag=DoFn.DynamicTimerTagParam, + timestamp=DoFn.TimestampParam, + latest_watermark=DoFn.StateParam(LAST_KNOWN_WATERMARK), + all_elements=DoFn.StateParam(WINDOW_ELEMENT_PAIRS), + processing_time_timer=DoFn.TimerParam(PROCESSING_TIME_TIMER), + window_tag_values: BagRuntimeState = DoFn.StateParam(WINDOW_TAG_VALUES), # type: ignore + finished_windows_state: SetRuntimeState = DoFn.StateParam( # type: ignore + FINISHED_WINDOWS), + watermark_timer=DoFn.TimerParam(WATERMARK_TIMER)): + context = FnRunnerStatefulTriggerContext( + processing_time_timer=processing_time_timer, + watermark_timer=watermark_timer, + latest_processing_time=None, + latest_watermark=latest_watermark, + all_elements_state=all_elements, + window_tag_values=window_tag_values, + finished_windows_state=finished_windows_state) + result = self._fire_eligible_windows( + key, TimeDomain.WATERMARK, timestamp, timer_tag, context) + latest_watermark.add(timestamp) + return result + + +class FnRunnerStatefulTriggerContext(TriggerContext): + def __init__( + self, + processing_time_timer: RuntimeTimer, + watermark_timer: RuntimeTimer, + latest_processing_time: typing.Optional[AccumulatingRuntimeState], + latest_watermark: typing.Optional[AccumulatingRuntimeState], + all_elements_state: BagRuntimeState, + window_tag_values: BagRuntimeState, + finished_windows_state: SetRuntimeState): + self.timers = { + TimeDomain.REAL_TIME: processing_time_timer, + TimeDomain.WATERMARK: watermark_timer + } + self.current_times = { + TimeDomain.REAL_TIME: latest_processing_time, + TimeDomain.WATERMARK: latest_watermark + } + self.all_elements_state = all_elements_state + self.window_tag_values = window_tag_values + self.finished_windows_state = finished_windows_state + + def windows_to_elements_map( + self + ) -> typing.Dict[BoundedWindow, typing.List[windowed_value.WindowedValue]]: + window_element_pairs: typing.Iterable[typing.Tuple[ + BoundedWindow, + windowed_value.WindowedValue]] = self.all_elements_state.read() + result: typing.Dict[BoundedWindow, + typing.List[windowed_value.WindowedValue]] = {} + for w, e in window_element_pairs: + if w not in result: + result[w] = [] + 
result[w].append(e) + return result + + def for_window(self, window): + return PerWindowTriggerContext(window, self) + + def get_current_time(self): + return self.current_times[TimeDomain.REAL_TIME].read() + + def set_timer(self, name, time_domain, timestamp): + _LOGGER.debug('Setting timer (%s, %s) at %s', time_domain, name, timestamp) + self.timers[time_domain].set(timestamp, dynamic_timer_tag=name) + + def clear_timer(self, name, time_domain): + _LOGGER.debug('Clearing timer (%s, %s)', time_domain, name) + self.timers[time_domain].clear(dynamic_timer_tag=name) + + def merge_windows(self, to_be_merged, merge_result): + all_triplets = list(self.window_tag_values.read()) + # Collect all the triplets for the window we are merging away, and tag them + # with the new window (merge_result). + merging_away_triplets = [(merge_result, state_tag, state) + for (window, state_tag, state) in all_triplets + if window in to_be_merged] + + # Collect all of the other triplets, and joining them with the newly-tagged + # set of triplets. + resulting_triplets = [(window, state_tag, state) + for (window, state_tag, state) in all_triplets + if window not in to_be_merged] + merging_away_triplets + self.window_tag_values.clear() + for t in resulting_triplets: + self.window_tag_values.add(t) + + # Merge also element-window pairs + all_elements = self.all_elements_state.read() + resulting_elements = [ + (merge_result if e[0] in to_be_merged else e[0], e[1]) + for e in all_elements + ] + self.all_elements_state.clear() + for e in resulting_elements: + self.all_elements_state.add(e) + + def add_state(self, tag, value): + # State can only be kept in per-window context, so this is not implemented. + raise NotImplementedError('unimplemented') + + def get_state(self, tag): + # State can only be kept in per-window context, so this is not implemented. + raise NotImplementedError('unimplemented') + + def clear_state(self, tag): + # State can only be kept in per-window context, so this is not implemented. 
+ raise NotImplementedError('unimplemented') + + +class PerWindowTriggerContext(TriggerContext): + def __init__(self, window, parent: FnRunnerStatefulTriggerContext): + self.window = window + self.parent = parent + + def get_current_time(self): + return self.parent.get_current_time() + + def set_timer(self, name, time_domain, timestamp): + self.parent.set_timer(name, time_domain, timestamp) + + def clear_timer(self, name, time_domain): + _LOGGER.debug('Clearing timer (%s, %s)', time_domain, name) + self.parent.clear_timer(name, time_domain) + + def add_state(self, tag: _StateTag, value): + assert isinstance(tag, _CombiningValueStateTag) + # Used to count: + # 1) number of elements in a window ('count') + # 2) number of triggers matched individually ('index') + # 3) whether the watermark has passed end of window ('is_late') + self.parent.window_tag_values.add((self.window, tag.tag, value)) + + def get_state(self, tag: _StateTag): + assert isinstance(tag, _CombiningValueStateTag) + # Used to count: + # 1) number of elements in a window ('count') + # 2) number of triggers matched individually ('index') + # 3) whether the watermark has passed end of window ('is_late') + all_triplets = self.parent.window_tag_values.read() + relevant_triplets = [(window, state_tag, state) + for (window, state_tag, state) in all_triplets + if window == self.window and state_tag == tag.tag] + return tag.combine_fn.apply(relevant_triplets) + + def clear_state(self, tag: _StateTag): + if tag is None: + matches = lambda x: True + else: + matches = lambda x: x == tag.tag + all_triplets = self.parent.window_tag_values.read() + remaining_triplets = [(window, state_tag, state) + for (window, state_tag, state) in all_triplets + if not (window == self.window and matches(state_tag))] + _LOGGER.debug('Tag: %s | Remaining triplets: %s', tag, remaining_triplets) + self.parent.window_tag_values.clear() + for t in remaining_triplets: + self.parent.window_tag_values.add(t) diff --git a/sdks/python/apache_beam/runners/portability/fn_api_runner/trigger_manager_test.py b/sdks/python/apache_beam/runners/portability/fn_api_runner/trigger_manager_test.py new file mode 100644 index 000000000000..8a071520ad15 --- /dev/null +++ b/sdks/python/apache_beam/runners/portability/fn_api_runner/trigger_manager_test.py @@ -0,0 +1,250 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
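The tests below exercise the trigger manager by composing the DoFns from trigger_manager.py in a fixed chain; here is a minimal sketch of that wiring, using a hypothetical fixed windowing chosen only for the example.

import apache_beam as beam
from apache_beam.runners.portability.fn_api_runner import trigger_manager
from apache_beam.transforms.core import Windowing
from apache_beam.transforms.window import FixedWindows

windowing = Windowing(FixedWindows(10))  # example windowing, not prescriptive

def apply_trigger_manager(kv_pcoll):
  # Reify windows onto the values, bundle per key, then run the trigger
  # manager DoFn that fires panes according to the Windowing above.
  return (
      kv_pcoll
      | beam.WindowInto(windowing.windowfn)
      | beam.ParDo(trigger_manager._ReifyWindows())
      | beam.ParDo(trigger_manager._GroupBundlesByKey())
      | beam.ParDo(trigger_manager.GeneralTriggerManagerDoFn(windowing)))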
+# + +import unittest + +from apache_beam import Map +from apache_beam import WindowInto +from apache_beam.runners.portability.fn_api_runner import trigger_manager +from apache_beam.testing.test_pipeline import TestPipeline +from apache_beam.testing.test_stream import TestStream +from apache_beam.testing.util import assert_that +from apache_beam.testing.util import equal_to +from apache_beam.transforms.core import ParDo +from apache_beam.transforms.core import Windowing +from apache_beam.transforms.trigger import AccumulationMode +from apache_beam.transforms.trigger import AfterCount +from apache_beam.transforms.trigger import AfterWatermark +from apache_beam.transforms.trigger import Repeatedly +from apache_beam.transforms.window import FixedWindows +from apache_beam.transforms.window import IntervalWindow +from apache_beam.transforms.window import Sessions +from apache_beam.transforms.window import SlidingWindows +from apache_beam.transforms.window import TimestampedValue +from apache_beam.utils.timestamp import MAX_TIMESTAMP + + +class TriggerManagerTest(unittest.TestCase): + def test_with_trigger_window_that_finish(self): + def tsv(key, value, ts): + return TimestampedValue((key, value), timestamp=ts) + + # yapf: disable + test_stream = ( + TestStream() + .advance_watermark_to(0) + .add_elements([tsv('k1', 1, 0), tsv('k1', 2, 0)]) + .add_elements([tsv('k1', 3, 0)]) + .advance_watermark_to(2) + .add_elements([tsv('k1', 6, 0)]) # This value is discarded. + .advance_watermark_to_infinity()) + # yapf: enable + + # Fixed, one-second windows with DefaultTrigger (after watermark) + windowing = Windowing( + FixedWindows(1), + triggerfn=AfterWatermark(), + allowed_lateness=0, + accumulation_mode=AccumulationMode.DISCARDING) + + with TestPipeline() as p: + result = ( + p + | test_stream + | WindowInto(windowing.windowfn) + | ParDo(trigger_manager._ReifyWindows()) + | ParDo(trigger_manager._GroupBundlesByKey()) + | ParDo(trigger_manager.GeneralTriggerManagerDoFn(windowing)) + | Map( + lambda elm: + (elm[0], elm[1][0].windows[0], [v.value for v in elm[1]]))) + assert_that( + result, + equal_to([ + ('k1', IntervalWindow(0, 1), [1, 2, 3]), # On the watermark + ])) + + def test_fixed_windows_simple_watermark(self): + def tsv(key, value, ts): + return TimestampedValue((key, value), timestamp=ts) + + # yapf: disable + test_stream = ( + TestStream() + .advance_watermark_to(0) + .add_elements([tsv('k1', 1, 0), tsv('k2', 1, 0), + tsv('k1', 2, 0), tsv('k2', 2, 0)]) + .add_elements([tsv('k1', 3, 0), tsv('k2', 3, 0)]) + .add_elements([tsv('k1', 4, 1), tsv('k2', 4, 1)]) + .add_elements([tsv('k1', 5, 1), tsv('k2', 5, 1)]) + .advance_watermark_to(1) + .add_elements([tsv('k1', 6, 0)]) + .advance_watermark_to_infinity()) + # yapf: enable + + # Fixed, one-second windows with DefaultTrigger (after watermark) + windowing = Windowing( + FixedWindows(1), allowed_lateness=MAX_TIMESTAMP.seconds()) + + with TestPipeline() as p: + result = ( + p + | test_stream + | WindowInto(windowing.windowfn) + | ParDo(trigger_manager._ReifyWindows()) + | ParDo(trigger_manager._GroupBundlesByKey()) + | ParDo(trigger_manager.GeneralTriggerManagerDoFn(windowing)) + | Map( + lambda elm: + (elm[0], elm[1][0].windows[0], [v.value for v in elm[1]]))) + assert_that( + result, + equal_to([ + ('k1', IntervalWindow(0, 1), [1, 2, 3]), # On the watermark + ('k2', IntervalWindow(0, 1), [1, 2, 3]), # On the watermark + ('k1', IntervalWindow(1, 2), [4, 5]), # On the watermark + ('k2', IntervalWindow(1, 2), [4, 5]), # On the watermark + ('k1', 
IntervalWindow(0, 1), [6]), # After the watermark + ])) + + def test_sliding_windows_simple_watermark(self): + # yapf: disable + test_stream = ( + TestStream() + .advance_watermark_to(0) + .add_elements([('k1', 1), ('k2', 1), ('k1', 1), ('k2', 1)]) + .add_elements([('k1', 1), ('k2', 1)]) + .advance_watermark_to(1) + .add_elements([('k1', 2), ('k2', 2)]) + .add_elements([('k1', 2), ('k2', 2)]) + .advance_watermark_to(2) + .add_elements([('k1', 3), ('k2', 3)]) + .add_elements([('k1', 3), ('k2', 3)]) + .advance_watermark_to_infinity()) + # yapf: enable + + # Fixed, one-second windows with DefaultTrigger (after watermark) + windowing = Windowing(SlidingWindows(2, 1)) + + with TestPipeline() as p: + result = ( + p + | test_stream + | WindowInto(windowing.windowfn) + | ParDo(trigger_manager._ReifyWindows()) + | ParDo(trigger_manager._GroupBundlesByKey()) + | ParDo(trigger_manager.GeneralTriggerManagerDoFn(windowing)) + | Map( + lambda elm: + (elm[0], elm[1][0].windows[0], [v.value for v in elm[1]]))) + assert_that( + result, + equal_to([ + ('k1', IntervalWindow(-1, 1), [1, 1, 1]), + ('k2', IntervalWindow(-1, 1), [1, 1, 1]), + ('k1', IntervalWindow(0, 2), [1, 1, 1, 2, 2]), + ('k2', IntervalWindow(0, 2), [1, 1, 1, 2, 2]), + ('k1', IntervalWindow(1, 3), [2, 2, 3, 3]), + ('k2', IntervalWindow(1, 3), [2, 2, 3, 3]), + ('k1', IntervalWindow(2, 4), [3, 3]), + ('k2', IntervalWindow(2, 4), [3, 3]), + ])) + + def test_fixed_after_count_accumulating(self): + # yapf: disable + test_stream = ( + TestStream() + .advance_watermark_to(0) + .add_elements([('k1', 1), ('k1', 1), ('k2', 1), ('k2', 1)]) + .add_elements([('k1', 1), ('k1', 1)]) + .advance_watermark_to(2) + .add_elements([('k1', 2), ('k2', 2)]) # This values are discarded. + .advance_watermark_to_infinity()) + # yapf: enable + + # Fixed, one-second windows with DefaultTrigger (after watermark) + windowing = Windowing( + FixedWindows(2), + triggerfn=Repeatedly(AfterCount(2)), + accumulation_mode=AccumulationMode.ACCUMULATING) + + with TestPipeline() as p: + result = ( + p + | test_stream + | WindowInto(windowing.windowfn) + | ParDo(trigger_manager._ReifyWindows()) + | ParDo(trigger_manager._GroupBundlesByKey()) + | ParDo(trigger_manager.GeneralTriggerManagerDoFn(windowing)) + | Map( + lambda elm: + (elm[0], elm[1][0].windows[0], [v.value for v in elm[1]]))) + assert_that( + result, + equal_to([ + ('k1', IntervalWindow(0, 2), [1, 1]), + ('k2', IntervalWindow(0, 2), [1, 1]), + ('k1', IntervalWindow(0, 2), [1, 1, 1, 1]), + ])) + + def test_sessions_and_complex_trigger_accumulating(self): + def tsv(key, value, ts): + return TimestampedValue((key, value), timestamp=ts) + + # yapf: disable + test_stream = ( + TestStream() + .advance_watermark_to(0) + .add_elements([tsv('k1', 1, 1), tsv('k1', 2, 15), + tsv('k1', 3, 7), tsv('k1', 4, 30)]) + .advance_watermark_to(50) + .add_elements([tsv('k1', -3, 1), tsv('k1', -2, 2),]) + .add_elements([tsv('k1', -1, 21)]) + .advance_watermark_to_infinity()) + # yapf: enable + + # Fixed, one-second windows with DefaultTrigger (after watermark) + windowing = Windowing( + Sessions(10), + triggerfn=AfterWatermark(early=AfterCount(2), late=AfterCount(1)), + accumulation_mode=AccumulationMode.ACCUMULATING, + allowed_lateness=MAX_TIMESTAMP.seconds()) + + with TestPipeline() as p: + result = ( + p + | test_stream + | WindowInto(windowing.windowfn) + | ParDo(trigger_manager._ReifyWindows()) + | ParDo(trigger_manager._GroupBundlesByKey()) + | ParDo(trigger_manager.GeneralTriggerManagerDoFn(windowing)) + | Map( + lambda elm: + (elm[0], 
elm[1][0].windows[0], set(v.value for v in elm[1])))) + assert_that( + result, + equal_to([ + ('k1', IntervalWindow(1, 25), {1, 2, 3}), # early + ('k1', IntervalWindow(1, 25), {1, 2, 3}), # on time + ('k1', IntervalWindow(30, 40), {4}), # on time + ('k1', IntervalWindow(1, 25), {1, 2, 3, -3, -2}), # late + ('k1', IntervalWindow(1, 40), {1, 2, 3, 4, -3, -2, -1}), # late + ])) + + +if __name__ == '__main__': + unittest.main() diff --git a/sdks/python/apache_beam/runners/portability/fn_api_runner/visualization_tools.py b/sdks/python/apache_beam/runners/portability/fn_api_runner/visualization_tools.py new file mode 100644 index 000000000000..a4eeba13ebda --- /dev/null +++ b/sdks/python/apache_beam/runners/portability/fn_api_runner/visualization_tools.py @@ -0,0 +1,115 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Set of utilities to visualize a pipeline to be executed by FnApiRunner.""" +from typing import Set +from typing import Tuple + +from apache_beam.runners.portability.fn_api_runner.translations import Stage +from apache_beam.runners.portability.fn_api_runner.watermark_manager import WatermarkManager + + +def show_stage(stage: Stage): + try: + import graphviz + except ImportError: + import warnings + warnings.warn('Unable to draw pipeline. graphviz library missing.') + return + + g = graphviz.Digraph() + + seen_pcollections = set() + for t in stage.transforms: + g.node(t.unique_name, shape='box') + + for i in t.inputs.values(): + assert isinstance(i, str) + if i not in seen_pcollections: + g.node(i) + seen_pcollections.add(i) + + g.edge(i, t.unique_name) + + for o in t.outputs.values(): + assert isinstance(o, str) + if o not in seen_pcollections: + g.node(o) + seen_pcollections.add(o) + + g.edge(t.unique_name, o) + + g.render('stage_graph', format='png') + + +def show_watermark_manager(watermark_manager: WatermarkManager): + try: + import graphviz + except ImportError: + import warnings + warnings.warn('Unable to draw pipeline. 
graphviz library missing.') + return + + g = graphviz.Digraph() + + def pcoll_node_name(pcoll_node: WatermarkManager.PCollectionNode): + if isinstance(pcoll_node.name, tuple): + return 'PCOLL_%s_%s' % pcoll_node.name + else: + return 'PCOLL_%s' % pcoll_node.name + + def add_node(name, shape=None): + if name not in seen_nodes: + seen_nodes.add(name) + g.node(name, shape=shape) + + def add_links(link_from=None, link_to=None, edge_style="solid"): + if link_from and link_to: + if (link_to, link_from) not in seen_links: + g.edge(link_from, link_to, style=edge_style) + seen_links.add((link_to, link_from)) + + seen_nodes: Set[str] = set() + seen_links: Set[Tuple[str, str]] = set() + for node in watermark_manager._stages_by_name.values(): + name = 'STAGE_%s...%s' % (node.name[:30], node.name[-30:]) + add_node(name, 'box') + + for pcnode in watermark_manager._pcollections_by_name.values(): + assert isinstance(pcnode, WatermarkManager.PCollectionNode) + name = pcoll_node_name(pcnode) + add_node(name) + + for node in watermark_manager._stages_by_name.values(): + stage = 'STAGE_%s...%s' % (node.name[:30], node.name[-30:]) + for pcoll in node.inputs: + input_name = pcoll_node_name(pcoll) + # Main inputs have a BOLD edge. + add_links(link_from=input_name, link_to=stage, edge_style="bold") + for pcoll in node.side_inputs: + # Side inputs have a dashed edge. + input_name = pcoll_node_name(pcoll) + add_links(link_from=input_name, link_to=stage, edge_style="dashed") + + for pcnode in watermark_manager._pcollections_by_name.values(): + assert isinstance(pcnode, WatermarkManager.PCollectionNode) + pcoll_name = pcoll_node_name(pcnode) + for producer in pcnode.producers: + prod_name = 'STAGE_%s...%s' % (producer.name[:30], producer.name[-30:]) + add_links(link_from=prod_name, link_to=pcoll_name) + + g.render('pipeline_graph', format='png') diff --git a/sdks/python/apache_beam/runners/portability/fn_api_runner/watermark_manager.py b/sdks/python/apache_beam/runners/portability/fn_api_runner/watermark_manager.py new file mode 100644 index 000000000000..fd15c6a0aac2 --- /dev/null +++ b/sdks/python/apache_beam/runners/portability/fn_api_runner/watermark_manager.py @@ -0,0 +1,171 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +"""Utilities for managing watermarks for a pipeline execution by FnApiRunner.""" + +from __future__ import absolute_import + +from typing import Dict +from typing import List +from typing import Set +from typing import Union + +from apache_beam.portability.api import beam_runner_api_pb2 +from apache_beam.runners.portability.fn_api_runner import translations +from apache_beam.runners.portability.fn_api_runner.translations import split_buffer_id +from apache_beam.runners.worker import bundle_processor +from apache_beam.utils import proto_utils +from apache_beam.utils import timestamp + + +class WatermarkManager(object): + """Manages the watermarks of a pipeline's stages. + It works by constructing an internal graph representation of the pipeline, + and keeping track of dependencies.""" + class PCollectionNode(object): + def __init__(self, name): + self.name = name + self._watermark = timestamp.MIN_TIMESTAMP + self.producers: Set[WatermarkManager.StageNode] = set() + + def __str__(self): + return 'PCollectionNode' % list(self.producers) + + def set_watermark(self, wm: timestamp.Timestamp): + self._watermark = min(self.upstream_watermark(), wm) + + def upstream_watermark(self): + if self.producers: + return min(p.input_watermark() for p in self.producers) + else: + return timestamp.MAX_TIMESTAMP + + def watermark(self): + return self._watermark + + class StageNode(object): + def __init__(self, name): + # We keep separate inputs and side inputs because side inputs + # should hold back a stage's input watermark, to hold back execution + # for that stage; but they should not be considered when calculating + # the output watermark of the stage, because only the main input + # can actually advance that watermark. + self.name = name + self.inputs: Set[WatermarkManager.PCollectionNode] = set() + self.side_inputs: Set[WatermarkManager.PCollectionNode] = set() + self.outputs: Set[WatermarkManager.PCollectionNode] = set() + + def __str__(self): + return 'StageNode None + self._pcollections_by_name: Dict[Union[str, translations.TimerFamilyId], + WatermarkManager.PCollectionNode] = {} + self._stages_by_name: Dict[str, WatermarkManager.StageNode] = {} + + def add_pcollection( + pcname: str, + snode: WatermarkManager.StageNode) -> WatermarkManager.PCollectionNode: + if pcname not in self._pcollections_by_name: + self._pcollections_by_name[pcname] = WatermarkManager.PCollectionNode( + pcname) + pcnode = self._pcollections_by_name[pcname] + assert isinstance(pcnode, WatermarkManager.PCollectionNode) + snode.inputs.add(pcnode) + return pcnode + + for s in stages: + stage_name = s.name + stage_node = WatermarkManager.StageNode(stage_name) + self._stages_by_name[stage_name] = stage_node + + # 1. Get stage inputs, create nodes for them, add to _stages_by_name, + # and add as inputs to stage node. + for transform in s.transforms: + if transform.spec.urn == bundle_processor.DATA_INPUT_URN: + buffer_id = transform.spec.payload + if buffer_id == translations.IMPULSE_BUFFER: + pcoll_name = transform.unique_name + add_pcollection(pcoll_name, stage_node) + continue + else: + _, pcoll_name = split_buffer_id(buffer_id) + add_pcollection(pcoll_name, stage_node) + + # 2. Get stage timers, and add them as inputs to the stage. 
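# Note (descriptive comment, not part of the original patch): timer collections
# are keyed by a (transform name, timer family id) tuple rather than a plain
# PCollection id; registering them as stage inputs means pending timers hold
# back the stage's input watermark like any other input.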
+ for transform in s.transforms: + if transform.spec.urn in translations.PAR_DO_URNS: + payload = proto_utils.parse_Bytes( + transform.spec.payload, beam_runner_api_pb2.ParDoPayload) + for timer_family_id in payload.timer_family_specs.keys(): + timer_pcoll_name = (transform.unique_name, timer_family_id) + self._pcollections_by_name[ + timer_pcoll_name] = WatermarkManager.PCollectionNode( + timer_pcoll_name) + timer_pcoll_node = self._pcollections_by_name[timer_pcoll_name] + assert isinstance( + timer_pcoll_node, WatermarkManager.PCollectionNode) + stage_node.inputs.add(timer_pcoll_node) + + # 3. Get stage outputs, create nodes for them, add to + # _pcollections_by_name, and add stage as their producer + for transform in s.transforms: + if transform.spec.urn == bundle_processor.DATA_OUTPUT_URN: + buffer_id = transform.spec.payload + _, pcoll_name = split_buffer_id(buffer_id) + if pcoll_name not in self._pcollections_by_name: + self._pcollections_by_name[ + pcoll_name] = WatermarkManager.PCollectionNode(pcoll_name) + pcoll_node = self._pcollections_by_name[pcoll_name] + assert isinstance(pcoll_node, WatermarkManager.PCollectionNode) + pcoll_node.producers.add(stage_node) + stage_node.outputs.add(pcoll_node) + + # 4. Get stage side inputs, create nodes for them, add to + # _pcollections_by_name, and add them as side inputs of the stage. + for pcoll_name in s.side_inputs(): + if pcoll_name not in self._pcollections_by_name: + self._pcollections_by_name[ + pcoll_name] = WatermarkManager.PCollectionNode(pcoll_name) + pcoll_node = self._pcollections_by_name[pcoll_name] + assert isinstance(pcoll_node, WatermarkManager.PCollectionNode) + stage_node.side_inputs.add(pcoll_node) + + def get_stage_node(self, name): + # type: (str) -> StageNode + return self._stages_by_name[name] + + def set_pcoll_watermark(self, name, watermark): + element = self._pcollections_by_name[name] + element.set_watermark(watermark) diff --git a/sdks/python/apache_beam/runners/portability/fn_api_runner/worker_handlers.py b/sdks/python/apache_beam/runners/portability/fn_api_runner/worker_handlers.py index dbfcf6cd19d6..b967e01260d5 100644 --- a/sdks/python/apache_beam/runners/portability/fn_api_runner/worker_handlers.py +++ b/sdks/python/apache_beam/runners/portability/fn_api_runner/worker_handlers.py @@ -19,8 +19,6 @@ # mypy: disallow-untyped-defs -from __future__ import absolute_import - import collections import contextlib import copy @@ -30,7 +28,6 @@ import sys import threading import time -from builtins import object from typing import TYPE_CHECKING from typing import Any from typing import BinaryIO # pylint: disable=unused-import @@ -71,6 +68,7 @@ from apache_beam.runners.worker.statecache import StateCache from apache_beam.utils import proto_utils from apache_beam.utils import thread_pool_executor +from apache_beam.utils.interactive_utils import is_in_notebook from apache_beam.utils.sentinel import Sentinel if TYPE_CHECKING: @@ -318,7 +316,7 @@ def register_environment( urn, # type: str payload_type # type: Optional[Type[T]] ): - # type: (...) -> Callable[[Callable[[T, sdk_worker.StateHandler, ExtendedProvisionInfo, GrpcServer], WorkerHandler]], Callable[[T, sdk_worker.StateHandler, ExtendedProvisionInfo, GrpcServer], WorkerHandler]] + # type: (...) 
-> Callable[[Type[WorkerHandler]], Callable[[T, sdk_worker.StateHandler, ExtendedProvisionInfo, GrpcServer], WorkerHandler]] def wrapper(constructor): # type: (Callable) -> Callable cls._registered_environments[urn] = constructor, payload_type # type: ignore[assignment] @@ -345,7 +343,7 @@ def create(cls, # This takes a WorkerHandlerManager instead of GrpcServer, so it is not # compatible with WorkerHandler.register_environment. There is a special case # in WorkerHandlerManager.get_worker_handlers() that allows it to work. -@WorkerHandler.register_environment(python_urns.EMBEDDED_PYTHON, None) # type: ignore[arg-type] +@WorkerHandler.register_environment(python_urns.EMBEDDED_PYTHON, None) class EmbeddedWorkerHandler(WorkerHandler): """An in-memory worker_handler for fn API control, state and data planes.""" @@ -363,7 +361,7 @@ def __init__(self, state_cache = StateCache(STATE_CACHE_SIZE) self.bundle_processor_cache = sdk_worker.BundleProcessorCache( SingletonStateHandlerFactory( - sdk_worker.CachingStateHandler(state_cache, state)), + sdk_worker.GlobalCachingStateHandler(state_cache, state)), data_plane.InMemoryDataChannelFactory( self.data_plane_handler.inverse()), worker_manager._process_bundle_descriptors) @@ -707,7 +705,10 @@ def start_worker(self): # type: () -> None from apache_beam.runners.portability import local_job_service self.worker = local_job_service.SubprocessSdkWorker( - self._worker_command_line, self.control_address, self.worker_id) + self._worker_command_line, + self.control_address, + self.provision_info, + self.worker_id) self.worker_thread = threading.Thread( name='run_worker', target=self.worker.run) self.worker_thread.start() @@ -734,11 +735,15 @@ def __init__(self, def host_from_worker(self): # type: () -> str - if sys.platform == "darwin": + if sys.platform == 'darwin': # See https://docs.docker.com/docker-for-mac/networking/ return 'host.docker.internal' - else: - return super(DockerSdkWorkerHandler, self).host_from_worker() + if sys.platform == 'linux' and is_in_notebook(): + import socket + # Gets ipv4 address of current host. Note the host is not guaranteed to + # be localhost because the python SDK could be running within a container. 
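# Note (descriptive comment, not part of the original patch): resolving the
# fully-qualified hostname yields an address the worker container can reach,
# which 'localhost' would not be when the SDK itself runs inside a container.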
+ return socket.gethostbyname(socket.getfqdn()) + return super(DockerSdkWorkerHandler, self).host_from_worker() def start_worker(self): # type: () -> None @@ -1048,6 +1053,10 @@ def clear(self, state_key): pass return _Future.done() + def done(self): + # type: () -> None + pass + @staticmethod def _to_key(state_key): # type: (beam_fn_api_pb2.StateKey) -> bytes diff --git a/sdks/python/apache_beam/runners/portability/job_server.py b/sdks/python/apache_beam/runners/portability/job_server.py index 4aa9b6094905..b1363c6a4021 100644 --- a/sdks/python/apache_beam/runners/portability/job_server.py +++ b/sdks/python/apache_beam/runners/portability/job_server.py @@ -17,8 +17,6 @@ # pytype: skip-file -from __future__ import absolute_import - import atexit import shutil import signal @@ -130,6 +128,7 @@ def __init__(self, options): self._expansion_port = options.expansion_port self._artifacts_dir = options.artifacts_dir self._java_launcher = options.job_server_java_launcher + self._jvm_properties = options.job_server_jvm_properties def java_arguments( self, job_port, artifact_port, expansion_port, artifacts_dir): @@ -139,8 +138,9 @@ def path_to_jar(self): raise NotImplementedError(type(self)) @staticmethod - def path_to_beam_jar(gradle_target): - return subprocess_server.JavaJarServer.path_to_beam_jar(gradle_target) + def path_to_beam_jar(gradle_target, artifact_id=None): + return subprocess_server.JavaJarServer.path_to_beam_jar( + gradle_target, artifact_id=artifact_id) @staticmethod def local_jar(url): @@ -152,8 +152,9 @@ def subprocess_cmd_and_endpoint(self): self._artifacts_dir if self._artifacts_dir else self.local_temp_dir( prefix='artifacts')) job_port, = subprocess_server.pick_port(self._job_port) - return ([self._java_launcher, '-jar', jar_path] + list( + subprocess_cmd = [self._java_launcher, '-jar'] + self._jvm_properties + [ + jar_path + ] + list( self.java_arguments( - job_port, self._artifact_port, self._expansion_port, - artifacts_dir)), - 'localhost:%s' % job_port) + job_port, self._artifact_port, self._expansion_port, artifacts_dir)) + return (subprocess_cmd, 'localhost:%s' % job_port) diff --git a/sdks/python/apache_beam/runners/portability/job_server_test.py b/sdks/python/apache_beam/runners/portability/job_server_test.py new file mode 100644 index 000000000000..1e2ede281c9d --- /dev/null +++ b/sdks/python/apache_beam/runners/portability/job_server_test.py @@ -0,0 +1,81 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# +# pytype: skip-file + +import logging +import unittest + +from apache_beam.options.pipeline_options import PipelineOptions +from apache_beam.runners.portability.job_server import JavaJarJobServer + + +class JavaJarJobServerStub(JavaJarJobServer): + def java_arguments( + self, job_port, artifact_port, expansion_port, artifacts_dir): + return [ + '--artifacts-dir', + artifacts_dir, + '--job-port', + job_port, + '--artifact-port', + artifact_port, + '--expansion-port', + expansion_port + ] + + def path_to_jar(self): + return '/path/to/jar' + + @staticmethod + def local_jar(url): + return url + + +class JavaJarJobServerTest(unittest.TestCase): + def test_subprocess_cmd_and_endpoint(self): + pipeline_options = PipelineOptions([ + '--job_port=8099', + '--artifact_port=8098', + '--expansion_port=8097', + '--artifacts_dir=/path/to/artifacts/', + '--job_server_java_launcher=/path/to/java', + '--job_server_jvm_properties=-Dsome.property=value' + ]) + job_server = JavaJarJobServerStub(pipeline_options) + subprocess_cmd, endpoint = job_server.subprocess_cmd_and_endpoint() + self.assertEqual( + subprocess_cmd, + [ + '/path/to/java', + '-jar', + '-Dsome.property=value', + '/path/to/jar', + '--artifacts-dir', + '/path/to/artifacts/', + '--job-port', + 8099, + '--artifact-port', + 8098, + '--expansion-port', + 8097 + ]) + self.assertEqual(endpoint, 'localhost:8099') + + +if __name__ == '__main__': + logging.getLogger().setLevel(logging.INFO) + unittest.main() diff --git a/sdks/python/apache_beam/runners/portability/local_job_service.py b/sdks/python/apache_beam/runners/portability/local_job_service.py index 8e33e4df24ef..aedfc03fcaa9 100644 --- a/sdks/python/apache_beam/runners/portability/local_job_service.py +++ b/sdks/python/apache_beam/runners/portability/local_job_service.py @@ -16,8 +16,6 @@ # # pytype: skip-file -from __future__ import absolute_import - import concurrent.futures import itertools import logging @@ -29,12 +27,12 @@ import threading import time import traceback -from builtins import object from typing import TYPE_CHECKING from typing import List from typing import Optional import grpc +from google.protobuf import json_format from google.protobuf import text_format # type: ignore # not in typeshed from apache_beam.metrics import monitoring_infos @@ -172,14 +170,16 @@ def GetJobMetrics(self, request, context=None): class SubprocessSdkWorker(object): """Manages a SDK worker implemented as a subprocess communicating over grpc. - """ + """ def __init__( self, worker_command_line, # type: bytes control_address, + provision_info, worker_id=None): self._worker_command_line = worker_command_line self._control_address = control_address + self._provision_info = provision_info self._worker_id = worker_id def run(self): @@ -195,11 +195,14 @@ def run(self): control_descriptor = text_format.MessageToString( endpoints_pb2.ApiServiceDescriptor(url=self._control_address)) + pipeline_options = json_format.MessageToJson( + self._provision_info.provision_info.pipeline_options) env_dict = dict( os.environ, CONTROL_API_SERVICE_DESCRIPTOR=control_descriptor, - LOGGING_API_SERVICE_DESCRIPTOR=logging_descriptor) + LOGGING_API_SERVICE_DESCRIPTOR=logging_descriptor, + PIPELINE_OPTIONS=pipeline_options) # only add worker_id when it is set. 
if self._worker_id: env_dict['WORKER_ID'] = self._worker_id diff --git a/sdks/python/apache_beam/runners/portability/local_job_service_main.py b/sdks/python/apache_beam/runners/portability/local_job_service_main.py index 5c5b2adba50d..6d9d32f5f23f 100644 --- a/sdks/python/apache_beam/runners/portability/local_job_service_main.py +++ b/sdks/python/apache_beam/runners/portability/local_job_service_main.py @@ -28,9 +28,6 @@ at a time. Pass --help to see them all. """ -from __future__ import absolute_import -from __future__ import print_function - import argparse import logging import os diff --git a/sdks/python/apache_beam/runners/portability/local_job_service_test.py b/sdks/python/apache_beam/runners/portability/local_job_service_test.py index afa2fce10d7e..7d9d70d98df8 100644 --- a/sdks/python/apache_beam/runners/portability/local_job_service_test.py +++ b/sdks/python/apache_beam/runners/portability/local_job_service_test.py @@ -16,9 +16,6 @@ # # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import logging import unittest diff --git a/sdks/python/apache_beam/runners/portability/portable_metrics.py b/sdks/python/apache_beam/runners/portability/portable_metrics.py index f3af1097d69d..08a328b11234 100644 --- a/sdks/python/apache_beam/runners/portability/portable_metrics.py +++ b/sdks/python/apache_beam/runners/portability/portable_metrics.py @@ -17,8 +17,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging from apache_beam.metrics import monitoring_infos diff --git a/sdks/python/apache_beam/runners/portability/portable_runner.py b/sdks/python/apache_beam/runners/portability/portable_runner.py index 15c860698611..d040d2ec6e0f 100644 --- a/sdks/python/apache_beam/runners/portability/portable_runner.py +++ b/sdks/python/apache_beam/runners/portability/portable_runner.py @@ -18,9 +18,6 @@ # pytype: skip-file # mypy: check-untyped-defs -from __future__ import absolute_import -from __future__ import division - import atexit import functools import itertools @@ -28,6 +25,8 @@ import threading import time from typing import TYPE_CHECKING +from typing import Any +from typing import Dict from typing import Iterator from typing import Optional from typing import Tuple @@ -169,6 +168,11 @@ def add_runner_options(parser): add_extra_args_fn=add_runner_options, retain_unknown_options=self._retain_unknown_options) + return self.encode_pipeline_options(all_options) + + @staticmethod + def encode_pipeline_options( + all_options: Dict[str, Any]) -> 'struct_pb2.Struct': def convert_pipeline_option_value(v): # convert int values: BEAM-5509 if type(v) == int: @@ -321,54 +325,86 @@ def get_proto_pipeline(pipeline, options): default_environment=PortableRunner._create_environment( portable_options)) - # Preemptively apply combiner lifting, until all runners support it. - # These optimizations commute and are idempotent. + # TODO: https://issues.apache.org/jira/browse/BEAM-7199 + # Eventually remove the 'pre_optimize' option alltogether and only perform + # the equivalent of the 'default' case below (minus the 'lift_combiners' + # part). 
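# Note (descriptive comment, not part of the original patch): the
# 'pre_optimize' experiment selects which client-side translation phases run
# before submission: 'default' only packs and lifts combiners, 'all'
# additionally applies greedy fusion, and 'all_except_fusion' applies
# everything except fusion.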
pre_optimize = options.view_as(DebugOptions).lookup_experiment( - 'pre_optimize', 'lift_combiners').lower() - if not options.view_as(StandardOptions).streaming: - flink_known_urns = frozenset([ - common_urns.composites.RESHUFFLE.urn, - common_urns.primitives.IMPULSE.urn, - common_urns.primitives.FLATTEN.urn, - common_urns.primitives.GROUP_BY_KEY.urn - ]) - if pre_optimize == 'none': - pass + 'pre_optimize', 'default').lower() + if (not options.view_as(StandardOptions).streaming and + pre_optimize != 'none'): + if pre_optimize == 'default': + phases = [ + # TODO: https://issues.apache.org/jira/browse/BEAM-4678 + # https://issues.apache.org/jira/browse/BEAM-11478 + # Eventually remove the 'lift_combiners' phase from 'default'. + translations.pack_combiners, + translations.lift_combiners, + translations.sort_stages + ] + partial = True elif pre_optimize == 'all': - proto_pipeline = translations.optimize_pipeline( - proto_pipeline, - phases=[ - translations.annotate_downstream_side_inputs, - translations.annotate_stateful_dofns_as_roots, - translations.fix_side_input_pcoll_coders, - translations.eliminate_common_key_with_none, - translations.pack_combiners, - translations.lift_combiners, - translations.expand_sdf, - translations.fix_flatten_coders, - # fn_api_runner_transforms.sink_flattens, - translations.greedily_fuse, - translations.read_to_impulse, - translations.extract_impulse_stages, - translations.remove_data_plane_ops, - translations.sort_stages - ], - known_runner_urns=flink_known_urns) + phases = [ + translations.annotate_downstream_side_inputs, + translations.annotate_stateful_dofns_as_roots, + translations.fix_side_input_pcoll_coders, + translations.pack_combiners, + translations.lift_combiners, + translations.expand_sdf, + translations.fix_flatten_coders, + # translations.sink_flattens, + translations.greedily_fuse, + translations.read_to_impulse, + translations.extract_impulse_stages, + translations.remove_data_plane_ops, + translations.sort_stages + ] + partial = False + elif pre_optimize == 'all_except_fusion': + # TODO(BEAM-7248): Delete this branch after PortableRunner supports + # beam:runner:executable_stage:v1. + phases = [ + translations.annotate_downstream_side_inputs, + translations.annotate_stateful_dofns_as_roots, + translations.fix_side_input_pcoll_coders, + translations.pack_combiners, + translations.lift_combiners, + translations.expand_sdf, + translations.fix_flatten_coders, + # translations.sink_flattens, + # translations.greedily_fuse, + translations.read_to_impulse, + translations.extract_impulse_stages, + translations.remove_data_plane_ops, + translations.sort_stages + ] + partial = True else: phases = [] for phase_name in pre_optimize.split(','): # For now, these are all we allow. - if phase_name in 'lift_combiners': + if phase_name in ('pack_combiners', 'lift_combiners'): phases.append(getattr(translations, phase_name)) else: raise ValueError( 'Unknown or inapplicable phase for pre_optimize: %s' % phase_name) - proto_pipeline = translations.optimize_pipeline( - proto_pipeline, - phases=phases, - known_runner_urns=flink_known_urns, - partial=True) + phases.append(translations.sort_stages) + partial = True + + # All (known) portable runners (ie Flink and Spark) support these URNs. 
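# Note (descriptive comment, not part of the original patch): composite
# transforms with these URNs are left intact for the runner to execute
# natively rather than being expanded into executable stages.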
+ known_urns = frozenset([ + common_urns.composites.RESHUFFLE.urn, + common_urns.primitives.IMPULSE.urn, + common_urns.primitives.FLATTEN.urn, + common_urns.primitives.GROUP_BY_KEY.urn + ]) + proto_pipeline = translations.optimize_pipeline( + proto_pipeline, + phases=phases, + known_runner_urns=known_urns, + partial=partial) + return proto_pipeline def run_pipeline(self, pipeline, options): diff --git a/sdks/python/apache_beam/runners/portability/portable_runner_test.py b/sdks/python/apache_beam/runners/portability/portable_runner_test.py index 25991746a7db..5d4217f306e8 100644 --- a/sdks/python/apache_beam/runners/portability/portable_runner_test.py +++ b/sdks/python/apache_beam/runners/portability/portable_runner_test.py @@ -16,9 +16,6 @@ # # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import inspect import logging import socket @@ -235,6 +232,19 @@ def create_options(self): return options +# TODO(BEAM-7248): Delete this test after PortableRunner supports +# beam:runner:executable_stage:v1. +class PortableRunnerOptimizedWithoutFusion(PortableRunnerTest): + def create_options(self): + options = super(PortableRunnerOptimizedWithoutFusion, self).create_options() + options.view_as(DebugOptions).add_experiment( + 'pre_optimize=all_except_fusion') + options.view_as(DebugOptions).add_experiment('state_cache_size=100') + options.view_as(DebugOptions).add_experiment( + 'data_buffer_time_limit_ms=1000') + return options + + class PortableRunnerTestWithExternalEnv(PortableRunnerTest): @classmethod def setUpClass(cls): diff --git a/sdks/python/apache_beam/runners/portability/samza_runner_test.py b/sdks/python/apache_beam/runners/portability/samza_runner_test.py new file mode 100644 index 000000000000..2f60ad81b272 --- /dev/null +++ b/sdks/python/apache_beam/runners/portability/samza_runner_test.py @@ -0,0 +1,184 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# +# pytype: skip-file + +# Run as +# +# pytest samza_runner_test.py[::TestClass::test_case] \ +# --test-pipeline-options="--environment_type=LOOPBACK" +import argparse +import logging +import shlex +import unittest +from shutil import rmtree +from tempfile import mkdtemp + +import pytest + +from apache_beam.options.pipeline_options import PortableOptions +from apache_beam.runners.portability import job_server +from apache_beam.runners.portability import portable_runner +from apache_beam.runners.portability import portable_runner_test + +_LOGGER = logging.getLogger(__name__) + + +class SamzaRunnerTest(portable_runner_test.PortableRunnerTest): + _use_grpc = True + _use_subprocesses = True + + expansion_port = None + samza_job_server_jar = None + + @pytest.fixture(autouse=True) + def parse_options(self, request): + if not request.config.option.test_pipeline_options: + raise unittest.SkipTest( + 'Skipping because --test-pipeline-options is not specified.') + test_pipeline_options = request.config.option.test_pipeline_options + parser = argparse.ArgumentParser(add_help=True) + parser.add_argument( + '--samza_job_server_jar', + help='Job server jar to submit jobs.', + action='store') + parser.add_argument( + '--environment_type', + default='LOOPBACK', + choices=['DOCKER', 'PROCESS', 'LOOPBACK'], + help='Set the environment type for running user code. DOCKER runs ' + 'user code in a container. PROCESS runs user code in ' + 'automatically started processes. LOOPBACK runs user code on ' + 'the same process that originally submitted the job.') + parser.add_argument( + '--environment_option', + '--environment_options', + dest='environment_options', + action='append', + default=None, + help=( + 'Environment configuration for running the user code. ' + 'Recognized options depend on --environment_type.\n ' + 'For DOCKER: docker_container_image (optional)\n ' + 'For PROCESS: process_command (required), process_variables ' + '(optional, comma-separated)\n ' + 'For EXTERNAL: external_service_address (required)')) + known_args, unknown_args = parser.parse_known_args( + shlex.split(test_pipeline_options)) + if unknown_args: + _LOGGER.warning('Discarding unrecognized arguments %s' % unknown_args) + self.set_samza_job_server_jar( + known_args.samza_job_server_jar or + job_server.JavaJarJobServer.path_to_beam_jar( + ':runners:samza:job-server:shadowJar')) + self.environment_type = known_args.environment_type + self.environment_options = known_args.environment_options\ + + @classmethod + def _subprocess_command(cls, job_port, expansion_port): + # will be cleaned up at the end of this method, and recreated and used by + # the job server + tmp_dir = mkdtemp(prefix='samzatest') + + cls.expansion_port = expansion_port + + try: + return [ + 'java', + '-jar', + cls.samza_job_server_jar, + '--artifacts-dir', + tmp_dir, + '--job-port', + str(job_port), + '--artifact-port', + '0', + '--expansion-port', + str(expansion_port), + ] + finally: + rmtree(tmp_dir) + + @classmethod + def set_samza_job_server_jar(cls, samza_job_server_jar): + cls.samza_job_server_jar = samza_job_server_jar + + @classmethod + def get_runner(cls): + return portable_runner.PortableRunner() + + @classmethod + def get_expansion_service(cls): + # TODO Move expansion address resides into PipelineOptions + return 'localhost:%s' % cls.expansion_port + + def create_options(self): + options = super(SamzaRunnerTest, self).create_options() + options.view_as(PortableOptions).environment_type = self.environment_type + options.view_as( + 
PortableOptions).environment_options = self.environment_options + + return options + + def test_metrics(self): + # Skip until Samza portable runner supports distribution metrics. + raise unittest.SkipTest("BEAM-12614") + + def test_flattened_side_input(self): + # Blocked on support for transcoding + # https://issues.apache.org/jira/browse/BEAM-12681 + super(SamzaRunnerTest, + self).test_flattened_side_input(with_transcoding=False) + + def test_pack_combiners(self): + # Stages produced by translations.pack_combiners are fused + # by translations.greedily_fuse, which prevent the stages + # from being detecting using counters by the test. + self._test_pack_combiners(assert_using_counter_names=False) + + def test_pardo_timers(self): + # Skip until Samza portable runner supports clearing timer. + raise unittest.SkipTest("BEAM-12774") + + def test_register_finalizations(self): + # Skip until Samza runner supports bundle finalization. + raise unittest.SkipTest("BEAM-12615") + + def test_callbacks_with_exception(self): + # Skip until Samza runner supports bundle finalization. + raise unittest.SkipTest("BEAM-12615") + + def test_sdf_with_dofn_as_watermark_estimator(self): + # Skip until Samza runner supports SDF and self-checkpoint. + raise unittest.SkipTest("BEAM-12616") + + def test_sdf_with_sdf_initiated_checkpointing(self): + # Skip until Samza runner supports SDF. + raise unittest.SkipTest("BEAM-12616") + + def test_sdf_with_watermark_tracking(self): + # Skip until Samza runner supports SDF. + raise unittest.SkipTest("BEAM-12616") + + def test_custom_merging_window(self): + # Skip until Samza runner supports merging window fns + raise unittest.SkipTest("BEAM-12617") + + +if __name__ == '__main__': + # Run the tests. + logging.getLogger().setLevel(logging.INFO) + unittest.main() diff --git a/sdks/python/apache_beam/runners/portability/sdk_container_builder.py b/sdks/python/apache_beam/runners/portability/sdk_container_builder.py index f41e9f8fbdb7..d1c6247e7800 100644 --- a/sdks/python/apache_beam/runners/portability/sdk_container_builder.py +++ b/sdks/python/apache_beam/runners/portability/sdk_container_builder.py @@ -24,8 +24,6 @@ to build the new image. 
""" -from __future__ import absolute_import - import json import logging import os @@ -36,8 +34,8 @@ import tempfile import time import uuid +from typing import Type -from google.protobuf.duration_pb2 import Duration from google.protobuf.json_format import MessageToJson from apache_beam import version as beam_version @@ -49,7 +47,9 @@ from apache_beam.options.pipeline_options import SetupOptions from apache_beam.portability import common_urns from apache_beam.portability.api import beam_runner_api_pb2 +from apache_beam.runners.dataflow.internal.clients import cloudbuild from apache_beam.runners.portability.stager import Stager +from apache_beam.utils import plugin ARTIFACTS_CONTAINER_DIR = '/opt/apache/beam/artifacts' ARTIFACTS_MANIFEST_FILE = 'artifacts_info.json' @@ -65,7 +65,7 @@ _LOGGER = logging.getLogger(__name__) -class SdkContainerImageBuilder(object): +class SdkContainerImageBuilder(plugin.BeamPlugin): def __init__(self, options): self._options = options self._docker_registry_push_url = self._options.view_as( @@ -79,24 +79,27 @@ def __init__(self, options): (sys.version_info[0], sys.version_info[1], version)) self._temp_src_dir = None - def build(self): + def _build(self): container_image_tag = str(uuid.uuid4()) container_image_name = os.path.join( self._docker_registry_push_url or '', 'beam_python_prebuilt_sdk:%s' % container_image_tag) with tempfile.TemporaryDirectory() as temp_folder: self._temp_src_dir = temp_folder - self.prepare_dependencies() - self.invoke_docker_build_and_push(container_image_name) + self._prepare_dependencies() + self._invoke_docker_build_and_push(container_image_name) return container_image_name - def prepare_dependencies(self): + def _prepare_dependencies(self): with tempfile.TemporaryDirectory() as tmp: - resources = Stager.create_job_resources(self._options, tmp) + artifacts = Stager.create_job_resources(self._options, tmp) + resources = Stager.extract_staging_tuple_iter(artifacts) # make a copy of the staged artifacts into the temp source folder. 
+ file_names = [] for path, name in resources: shutil.copyfile(path, os.path.join(self._temp_src_dir, name)) + file_names.append(name) with open(os.path.join(self._temp_src_dir, 'Dockerfile'), 'w') as file: file.write( DOCKERFILE_TEMPLATE.format( @@ -104,15 +107,20 @@ def prepare_dependencies(self): workdir=ARTIFACTS_CONTAINER_DIR, manifest_file=ARTIFACTS_MANIFEST_FILE, entrypoint=SDK_CONTAINER_ENTRYPOINT)) - self.generate_artifacts_manifests_json_file(resources, self._temp_src_dir) + self._generate_artifacts_manifests_json_file( + file_names, self._temp_src_dir) - def invoke_docker_build_and_push(self, container_image_name): + def _invoke_docker_build_and_push(self, container_image_name): raise NotImplementedError + @classmethod + def _builder_key(cls) -> str: + return f'{cls.__module__}.{cls.__name__}' + @staticmethod - def generate_artifacts_manifests_json_file(resources, temp_dir): + def _generate_artifacts_manifests_json_file(file_names, temp_dir): infos = [] - for _, name in resources: + for name in file_names: info = beam_runner_api_pb2.ArtifactInformation( type_urn=common_urns.StandardArtifacts.Types.FILE.urn, type_payload=beam_runner_api_pb2.ArtifactFilePayload( @@ -126,25 +134,36 @@ def generate_artifacts_manifests_json_file(resources, temp_dir): def build_container_image(cls, pipeline_options: PipelineOptions) -> str: setup_options = pipeline_options.view_as(SetupOptions) container_build_engine = setup_options.prebuild_sdk_container_engine - if container_build_engine: - if container_build_engine == 'local_docker': - builder = _SdkContainerImageLocalBuilder( - pipeline_options) # type: SdkContainerImageBuilder - elif container_build_engine == 'cloud_build': - builder = _SdkContainerImageCloudBuilder(pipeline_options) - else: - raise ValueError( - 'Only (--prebuild_sdk_container_engine local_docker) and ' - '(--prebuild_sdk_container_engine cloud_build) are supported') - else: - raise ValueError('No --prebuild_sdk_container_engine option specified.') - return builder.build() + builder_cls = cls._get_subclass_by_key(container_build_engine) + builder = builder_cls(pipeline_options) + return builder._build() + + @classmethod + def _get_subclass_by_key(cls, key: str) -> Type['SdkContainerImageBuilder']: + available_builders = [ + subclass for subclass in cls.get_all_subclasses() + if subclass._builder_key() == key + ] + if not available_builders: + available_builder_keys = [ + subclass._builder_key() for subclass in cls.get_all_subclasses() + ] + raise ValueError( + f'Cannot find SDK builder type {key} in ' + f'{available_builder_keys}') + elif len(available_builders) > 1: + raise ValueError(f'Found multiple builders under key {key}') + return available_builders[0] class _SdkContainerImageLocalBuilder(SdkContainerImageBuilder): """SdkContainerLocalBuilder builds the sdk container image with local docker.""" - def invoke_docker_build_and_push(self, container_image_name): + @classmethod + def _builder_key(cls): + return 'local_docker' + + def _invoke_docker_build_and_push(self, container_image_name): try: _LOGGER.info("Building sdk container, this may take a few minutes...") now = time.time() @@ -194,11 +213,20 @@ def __init__(self, options): get_credentials=(not self._google_cloud_options.no_auth), http=get_new_http(), response_encoding='utf8') + self._cloudbuild_client = cloudbuild.CloudbuildV1( + credentials=credentials, + get_credentials=(not self._google_cloud_options.no_auth), + http=get_new_http(), + response_encoding='utf8') if not self._docker_registry_push_url: 
self._docker_registry_push_url = ( 'gcr.io/%s/prebuilt_beam_sdk' % self._google_cloud_options.project) - def invoke_docker_build_and_push(self, container_image_name): + @classmethod + def _builder_key(cls): + return 'cloud_build' + + def _invoke_docker_build_and_push(self, container_image_name): project_id = self._google_cloud_options.project temp_location = self._google_cloud_options.temp_location # google cloud build service expects all the build source file to be @@ -214,36 +242,50 @@ def invoke_docker_build_and_push(self, container_image_name): temp_location, '%s-%s.tgz' % (SOURCE_FOLDER, container_image_tag)) self._upload_to_gcs(tarball_path, gcs_location) - from google.cloud.devtools import cloudbuild_v1 - client = cloudbuild_v1.CloudBuildClient() - build = cloudbuild_v1.Build() + build = cloudbuild.Build() build.steps = [] - step = cloudbuild_v1.BuildStep() + step = cloudbuild.BuildStep() step.name = 'gcr.io/kaniko-project/executor:latest' step.args = ['--destination=' + container_image_name, '--cache=true'] step.dir = SOURCE_FOLDER build.steps.append(step) - source = cloudbuild_v1.Source() - source.storage_source = cloudbuild_v1.StorageSource() + source = cloudbuild.Source() + source.storageSource = cloudbuild.StorageSource() gcs_bucket, gcs_object = self._get_gcs_bucket_and_name(gcs_location) - source.storage_source.bucket = os.path.join(gcs_bucket) - source.storage_source.object = gcs_object + source.storageSource.bucket = os.path.join(gcs_bucket) + source.storageSource.object = gcs_object build.source = source # TODO(zyichi): make timeout configurable - build.timeout = Duration().FromSeconds(seconds=1800) + build.timeout = '7200s' now = time.time() - operation = client.create_build(project_id=project_id, build=build) + # operation = client.create_build(project_id=project_id, build=build) + request = cloudbuild.CloudbuildProjectsBuildsCreateRequest( + projectId=project_id, build=build) + build = self._cloudbuild_client.projects_builds.Create(request) + build_id, log_url = self._get_cloud_build_id_and_log_url(build.metadata) _LOGGER.info( 'Building sdk container with Google Cloud Build, this may ' - 'take a few minutes, you may check build log at %s' % - operation.metadata.build.log_url) + 'take a few minutes, you may check build log at %s' % log_url) # block until build finish, if build fails exception will be raised and # stops the job submission. 
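# Note (descriptive comment, not part of the original patch): the build status
# is polled every 10 seconds until it leaves the QUEUED/WORKING states; any
# terminal status other than SUCCESS fails the job submission.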
- operation.result() + response = self._cloudbuild_client.projects_builds.Get( + cloudbuild.CloudbuildProjectsBuildsGetRequest( + id=build_id, projectId=project_id)) + while response.status in [cloudbuild.Build.StatusValueValuesEnum.QUEUED, + cloudbuild.Build.StatusValueValuesEnum.WORKING]: + time.sleep(10) + response = self._cloudbuild_client.projects_builds.Get( + cloudbuild.CloudbuildProjectsBuildsGetRequest( + id=build_id, projectId=project_id)) + + if response.status != cloudbuild.Build.StatusValueValuesEnum.SUCCESS: + raise RuntimeError( + 'Failed to build python sdk container image on google cloud build, ' + 'please check build log for error.') _LOGGER.info( "Python SDK container pre-build finished in %.2f seconds" % @@ -276,6 +318,18 @@ def _upload_to_gcs(self, local_file_path, gcs_location): raise _LOGGER.info('Completed GCS upload to %s.', gcs_location) + def _get_cloud_build_id_and_log_url(self, metadata): + id = None + log_url = None + for item in metadata.additionalProperties: + if item.key == 'build': + for field in item.value.object_value.properties: + if field.key == 'logUrl': + log_url = field.value.string_value + if field.key == 'id': + id = field.value.string_value + return id, log_url + @staticmethod def _get_gcs_bucket_and_name(gcs_location): return gcs_location[5:].split('/', 1) diff --git a/sdks/python/apache_beam/runners/portability/sdk_container_builder_test.py b/sdks/python/apache_beam/runners/portability/sdk_container_builder_test.py new file mode 100644 index 000000000000..955fe328f171 --- /dev/null +++ b/sdks/python/apache_beam/runners/portability/sdk_container_builder_test.py @@ -0,0 +1,101 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +"""Unit tests for the sdk container builder module.""" + +# pytype: skip-file + +import gc +import logging +import unittest +import unittest.mock + +from apache_beam.options import pipeline_options +from apache_beam.runners.portability import sdk_container_builder + + +class SdkContainerBuilderTest(unittest.TestCase): + def tearDown(self): + # Ensures SdkContainerImageBuilder subclasses are cleared + gc.collect() + + def test_can_find_local_builder(self): + local_builder = ( + sdk_container_builder.SdkContainerImageBuilder._get_subclass_by_key( + 'local_docker')) + self.assertEqual( + local_builder, sdk_container_builder._SdkContainerImageLocalBuilder) + + def test_can_find_cloud_builder(self): + local_builder = ( + sdk_container_builder.SdkContainerImageBuilder._get_subclass_by_key( + 'cloud_build')) + self.assertEqual( + local_builder, sdk_container_builder._SdkContainerImageCloudBuilder) + + def test_missing_builder_key_throws_value_error(self): + with self.assertRaises(ValueError): + sdk_container_builder.SdkContainerImageBuilder._get_subclass_by_key( + 'missing-id') + + def test_multiple_matchings_keys_throws_value_error(self): + # pylint: disable=unused-variable + class _PluginSdkBuilder(sdk_container_builder.SdkContainerImageBuilder): + @classmethod + def _builder_key(cls): + return 'test-id' + + class _PluginSdkBuilder2(sdk_container_builder.SdkContainerImageBuilder): + @classmethod + def _builder_key(cls): + return 'test-id' + + # pylint: enable=unused-variable + + with self.assertRaises(ValueError): + sdk_container_builder.SdkContainerImageBuilder._get_subclass_by_key( + 'test-id') + + def test_can_find_new_subclass(self): + class _PluginSdkBuilder(sdk_container_builder.SdkContainerImageBuilder): + pass + + expected_key = f'{_PluginSdkBuilder.__module__}._PluginSdkBuilder' + local_builder = ( + sdk_container_builder.SdkContainerImageBuilder._get_subclass_by_key( + expected_key)) + self.assertEqual(local_builder, _PluginSdkBuilder) + + @unittest.mock.patch( + 'apache_beam.runners.portability.sdk_container_builder._SdkContainerImageLocalBuilder' # pylint: disable=line-too-long + ) + @unittest.mock.patch.object( + sdk_container_builder.SdkContainerImageBuilder, "_get_subclass_by_key") + def test_build_container_image_locates_subclass_invokes_build( + self, mock_get_subclass, mocked_local_builder): + mock_get_subclass.return_value = mocked_local_builder + options = pipeline_options.PipelineOptions() + sdk_container_builder.SdkContainerImageBuilder.build_container_image( + options) + mocked_local_builder.assert_called_once_with(options) + mocked_local_builder.return_value._build.assert_called_once_with() + + +if __name__ == '__main__': + # Run the tests. + logging.getLogger().setLevel(logging.INFO) + unittest.main() diff --git a/sdks/python/apache_beam/runners/portability/spark_java_job_server_test.py b/sdks/python/apache_beam/runners/portability/spark_java_job_server_test.py new file mode 100644 index 000000000000..50490d9c5c15 --- /dev/null +++ b/sdks/python/apache_beam/runners/portability/spark_java_job_server_test.py @@ -0,0 +1,65 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# pytype: skip-file + +import logging +import unittest + +from apache_beam.options import pipeline_options +from apache_beam.runners.portability.spark_runner import SparkRunner + + +class SparkTestPipelineOptions(pipeline_options.PipelineOptions): + def view_as(self, cls): + # Ensure only SparkRunnerOptions and JobServerOptions are used when calling + # default_job_server. If other options classes are needed, the cache key + # must include them to prevent incorrect hits. + assert ( + cls is pipeline_options.SparkRunnerOptions or + cls is pipeline_options.JobServerOptions) + return super().view_as(cls) + + +class SparkJavaJobServerTest(unittest.TestCase): + def test_job_server_cache(self): + # Multiple SparkRunner instances may be created, so we need to make sure we + # cache job servers across runner instances. + + # Most pipeline-specific options, such as sdk_worker_parallelism, don't + # affect job server configuration, so it is ok to ignore them for caching. + job_server1 = SparkRunner().default_job_server( + SparkTestPipelineOptions(['--sdk_worker_parallelism=1'])) + job_server2 = SparkRunner().default_job_server( + SparkTestPipelineOptions(['--sdk_worker_parallelism=2'])) + self.assertIs(job_server2, job_server1) + + # JobServerOptions and SparkRunnerOptions do affect job server + # configuration, so using different pipeline options gives us a different + # job server. + job_server3 = SparkRunner().default_job_server( + SparkTestPipelineOptions(['--job_port=1234'])) + self.assertIsNot(job_server3, job_server1) + + job_server4 = SparkRunner().default_job_server( + SparkTestPipelineOptions(['--spark_master_url=spark://localhost:5678'])) + self.assertIsNot(job_server4, job_server1) + self.assertIsNot(job_server4, job_server3) + + +if __name__ == '__main__': + logging.getLogger().setLevel(logging.INFO) + unittest.main() diff --git a/sdks/python/apache_beam/runners/portability/spark_runner.py b/sdks/python/apache_beam/runners/portability/spark_runner.py index 4619ea9f8592..bb7b4c0465e2 100644 --- a/sdks/python/apache_beam/runners/portability/spark_runner.py +++ b/sdks/python/apache_beam/runners/portability/spark_runner.py @@ -19,12 +19,8 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import os import re -import sys import urllib from apache_beam.options import pipeline_options @@ -35,6 +31,10 @@ # https://spark.apache.org/docs/latest/submitting-applications.html#master-urls LOCAL_MASTER_PATTERN = r'^local(\[.+\])?$' +# Since Java job servers are heavyweight external processes, cache them. +# This applies only to SparkJarJobServer, not SparkUberJarJobServer. 
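# Note (descriptive comment, not part of the original patch): the cache is
# keyed by the string form of SparkRunnerOptions plus JobServerOptions (see
# default_job_server below); values are StopOnExitJobServer instances reused
# across SparkRunner instances.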
+JOB_SERVER_CACHE = {} + class SparkRunner(portable_runner.PortableRunner): def run_pipeline(self, pipeline, options): @@ -49,15 +49,19 @@ def run_pipeline(self, pipeline, options): def default_job_server(self, options): spark_options = options.view_as(pipeline_options.SparkRunnerOptions) if spark_options.spark_submit_uber_jar: - if sys.version_info < (3, 6): - raise ValueError( - 'spark_submit_uber_jar requires Python 3.6+, current version %s' % - sys.version) if not spark_options.spark_rest_url: raise ValueError('Option spark_rest_url must be set.') return spark_uber_jar_job_server.SparkUberJarJobServer( spark_options.spark_rest_url, options) - return job_server.StopOnExitJobServer(SparkJarJobServer(options)) + # Use Java job server by default. + # Only SparkRunnerOptions and JobServerOptions affect job server + # configuration, so concat those as the cache key. + job_server_options = options.view_as(pipeline_options.JobServerOptions) + options_str = str(spark_options) + str(job_server_options) + if not options_str in JOB_SERVER_CACHE: + JOB_SERVER_CACHE[options_str] = job_server.StopOnExitJobServer( + SparkJarJobServer(options)) + return JOB_SERVER_CACHE[options_str] def create_job_service_handle(self, job_service, options): return portable_runner.JobServiceHandle( @@ -73,6 +77,7 @@ def __init__(self, options): options = options.view_as(pipeline_options.SparkRunnerOptions) self._jar = options.spark_job_server_jar self._master_url = options.spark_master_url + self._spark_version = options.spark_version def path_to_jar(self): if self._jar: @@ -83,11 +88,15 @@ def path_to_jar(self): 'Unable to parse jar URL "%s". If using a full URL, make sure ' 'the scheme is specified. If using a local file path, make sure ' 'the file exists; you may have to first build the job server ' - 'using `./gradlew runners:spark:job-server:shadowJar`.' % + 'using `./gradlew runners:spark:2:job-server:shadowJar`.' % self._jar) return self._jar else: - return self.path_to_beam_jar(':runners:spark:job-server:shadowJar') + if self._spark_version == '3': + return self.path_to_beam_jar(':runners:spark:3:job-server:shadowJar') + return self.path_to_beam_jar( + ':runners:spark:2:job-server:shadowJar', + artifact_id='beam-runners-spark-job-server') def java_arguments( self, job_port, artifact_port, expansion_port, artifacts_dir): diff --git a/sdks/python/apache_beam/runners/portability/spark_runner_test.py b/sdks/python/apache_beam/runners/portability/spark_runner_test.py index 4cf7e4b5148a..ba5f103fc1b5 100644 --- a/sdks/python/apache_beam/runners/portability/spark_runner_test.py +++ b/sdks/python/apache_beam/runners/portability/spark_runner_test.py @@ -16,9 +16,6 @@ # # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import argparse import logging import shlex @@ -87,7 +84,7 @@ def parse_options(self, request): self.set_spark_job_server_jar( known_args.spark_job_server_jar or job_server.JavaJarJobServer.path_to_beam_jar( - ':runners:spark:job-server:shadowJar')) + ':runners:spark:2:job-server:shadowJar')) self.environment_type = known_args.environment_type self.environment_options = known_args.environment_options @@ -172,12 +169,18 @@ def test_sdf_with_dofn_as_watermark_estimator(self): # Skip until Spark runner supports SDF and self-checkpoint. 
raise unittest.SkipTest("BEAM-7222") + def test_pardo_dynamic_timer(self): + raise unittest.SkipTest("BEAM-9912") + def test_flattened_side_input(self): # Blocked on support for transcoding # https://jira.apache.org/jira/browse/BEAM-7236 super(SparkRunnerTest, self).test_flattened_side_input(with_transcoding=False) + def test_custom_merging_window(self): + raise unittest.SkipTest("BEAM-11004") + # Inherits all other tests from PortableRunnerTest. diff --git a/sdks/python/apache_beam/runners/portability/spark_uber_jar_job_server.py b/sdks/python/apache_beam/runners/portability/spark_uber_jar_job_server.py index 6b9e6fd3b2b3..60b2e88357c9 100644 --- a/sdks/python/apache_beam/runners/portability/spark_uber_jar_job_server.py +++ b/sdks/python/apache_beam/runners/portability/spark_uber_jar_job_server.py @@ -19,9 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import itertools import logging import os @@ -50,12 +47,12 @@ class SparkUberJarJobServer(abstract_job_service.AbstractJobServiceServicer): def __init__(self, rest_url, options): super(SparkUberJarJobServer, self).__init__() self._rest_url = rest_url - self._executable_jar = ( - options.view_as( - pipeline_options.SparkRunnerOptions).spark_job_server_jar) self._artifact_port = ( options.view_as(pipeline_options.JobServerOptions).artifact_port) self._temp_dir = tempfile.mkdtemp(prefix='apache-beam-spark') + spark_options = options.view_as(pipeline_options.SparkRunnerOptions) + self._executable_jar = spark_options.spark_job_server_jar + self._spark_version = spark_options.spark_version def start(self): return self @@ -72,12 +69,17 @@ def executable_jar(self): 'Unable to parse jar URL "%s". If using a full URL, make sure ' 'the scheme is specified. If using a local file path, make sure ' 'the file exists; you may have to first build the job server ' - 'using `./gradlew runners:spark:job-server:shadowJar`.' % + 'using `./gradlew runners:spark:2:job-server:shadowJar`.' 
% self._executable_jar) url = self._executable_jar else: - url = job_server.JavaJarJobServer.path_to_beam_jar( - ':runners:spark:job-server:shadowJar') + if self._spark_version == '3': + url = job_server.JavaJarJobServer.path_to_beam_jar( + ':runners:spark:3:job-server:shadowJar') + else: + url = job_server.JavaJarJobServer.path_to_beam_jar( + ':runners:spark:2:job-server:shadowJar', + artifact_id='beam-runners-spark-job-server') return job_server.JavaJarJobServer.local_jar(url) def create_beam_job(self, job_id, job_name, pipeline, options): diff --git a/sdks/python/apache_beam/runners/portability/spark_uber_jar_job_server_test.py b/sdks/python/apache_beam/runners/portability/spark_uber_jar_job_server_test.py index 0a7b954574bf..d9bb7e686f05 100644 --- a/sdks/python/apache_beam/runners/portability/spark_uber_jar_job_server_test.py +++ b/sdks/python/apache_beam/runners/portability/spark_uber_jar_job_server_test.py @@ -16,13 +16,9 @@ # # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import contextlib import logging import os -import sys import tempfile import unittest import zipfile @@ -60,7 +56,6 @@ def spark_job(): pipeline_options.SparkRunnerOptions()) -@unittest.skipIf(sys.version_info < (3, 6), "Requires Python 3.6+") class SparkUberJarJobServerTest(unittest.TestCase): @requests_mock.mock() def test_get_server_spark_version(self, http_mock): diff --git a/sdks/python/apache_beam/runners/portability/stager.py b/sdks/python/apache_beam/runners/portability/stager.py index 3aa89379a758..bee664af1737 100644 --- a/sdks/python/apache_beam/runners/portability/stager.py +++ b/sdks/python/apache_beam/runners/portability/stager.py @@ -47,9 +47,8 @@ """ # pytype: skip-file -from __future__ import absolute_import - import glob +import hashlib import logging import os import shutil @@ -58,9 +57,9 @@ from typing import List from typing import Optional from typing import Tuple +from urllib.parse import urlparse import pkg_resources -from future.moves.urllib.parse import urlparse from apache_beam.internal import pickler from apache_beam.internal.http_client import get_new_http @@ -69,6 +68,8 @@ from apache_beam.options.pipeline_options import PipelineOptions # pylint: disable=unused-import from apache_beam.options.pipeline_options import SetupOptions from apache_beam.options.pipeline_options import WorkerOptions +from apache_beam.portability import common_urns +from apache_beam.portability.api import beam_runner_api_pb2 from apache_beam.runners.internal import names from apache_beam.utils import processes from apache_beam.utils import retry @@ -114,10 +115,50 @@ def get_sdk_package_name(): Returns the PyPI package name to be staged.""" return names.BEAM_PACKAGE_NAME + @staticmethod + def _create_file_stage_to_artifact(local_path, staged_name): + return beam_runner_api_pb2.ArtifactInformation( + type_urn=common_urns.artifact_types.FILE.urn, + type_payload=beam_runner_api_pb2.ArtifactFilePayload( + path=local_path).SerializeToString(), + role_urn=common_urns.artifact_roles.STAGING_TO.urn, + role_payload=beam_runner_api_pb2.ArtifactStagingToRolePayload( + staged_name=staged_name).SerializeToString()) + + @staticmethod + def _create_file_pip_requirements_artifact(local_path): + return beam_runner_api_pb2.ArtifactInformation( + type_urn=common_urns.artifact_types.FILE.urn, + type_payload=beam_runner_api_pb2.ArtifactFilePayload( + path=local_path).SerializeToString(), + role_urn=common_urns.artifact_roles.PIP_REQUIREMENTS_FILE.urn) + + @staticmethod + def 
extract_staging_tuple_iter( + artifacts: List[beam_runner_api_pb2.ArtifactInformation]): + for artifact in artifacts: + if artifact.type_urn == common_urns.artifact_types.FILE.urn: + file_payload = beam_runner_api_pb2.ArtifactFilePayload() + file_payload.ParseFromString(artifact.type_payload) + src = file_payload.path + if artifact.role_urn == common_urns.artifact_roles.STAGING_TO.urn: + role_payload = beam_runner_api_pb2.ArtifactStagingToRolePayload() + role_payload.ParseFromString(artifact.role_payload) + dst = role_payload.staged_name + elif (artifact.role_urn == + common_urns.artifact_roles.PIP_REQUIREMENTS_FILE.urn): + dst = hashlib.sha256(artifact.SerializeToString()).hexdigest() + else: + raise RuntimeError("unknown role type: %s" % artifact.role_urn) + yield (src, dst) + else: + raise RuntimeError("unknown artifact type: %s" % artifact.type_urn) + @staticmethod def create_job_resources(options, # type: PipelineOptions temp_dir, # type: str build_setup_args=None, # type: Optional[List[str]] + pypi_requirements=None, # type: Optional[List[str]] populate_requirements_cache=None, # type: Optional[str] skip_prestaged_dependencies=False, # type: Optional[bool] ): @@ -135,21 +176,22 @@ def create_job_resources(options, # type: PipelineOptions build_setup_args: A list of command line arguments used to build a setup package. Used only if options.setup_file is not None. Used only for testing. + pypi_requirements: A list of PyPI requirements used to cache source + packages. populate_requirements_cache: Callable for populating the requirements cache. Used only for testing. skip_prestaged_dependencies: Skip staging dependencies that can be added into SDK containers during prebuilding. Returns: - A list of tuples of local file paths and file names (no paths) to be - used for staging resources. + A list of ArtifactInformation to be used for staging resources. Raises: RuntimeError: If files specified are not found or error encountered while trying to create the resources (e.g., build a setup package). """ - resources = [] # type: List[Tuple[str, str]] + resources = [] # type: List[beam_runner_api_pb2.ArtifactInformation] setup_options = options.view_as(SetupOptions) @@ -157,6 +199,13 @@ def create_job_resources(options, # type: PipelineOptions # requirements.txt, python packages from extra_packages and workflow tarball # if we know we are using a dependency pre-installed sdk container image. if not skip_prestaged_dependencies: + requirements_cache_path = ( + os.path.join(tempfile.gettempdir(), 'dataflow-requirements-cache') + if setup_options.requirements_cache is None else + setup_options.requirements_cache) + if not os.path.exists(requirements_cache_path): + os.makedirs(requirements_cache_path) + # Stage a requirements file if present. if setup_options.requirements_file is not None: if not os.path.isfile(setup_options.requirements_file): @@ -164,21 +213,32 @@ def create_job_resources(options, # type: PipelineOptions 'The file %s cannot be found. It was specified in the ' '--requirements_file command line option.' % setup_options.requirements_file) - resources.append((setup_options.requirements_file, REQUIREMENTS_FILE)) - requirements_cache_path = ( - os.path.join(tempfile.gettempdir(), 'dataflow-requirements-cache') - if setup_options.requirements_cache is None else - setup_options.requirements_cache) - # Populate cache with packages from requirements and stage the files - # in the cache. 
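# [Editor's illustration, not part of the diff] Round-trip sketch of the new
# artifact representation introduced above: a staged file becomes an
# ArtifactInformation proto, and extract_staging_tuple_iter recovers the legacy
# (local_path, staged_name) tuple expected by stage_job_resources. The local
# path is hypothetical.
from apache_beam.runners.portability.stager import Stager

artifact = Stager._create_file_stage_to_artifact(
    '/tmp/requirements.txt', 'requirements.txt')
assert list(Stager.extract_staging_tuple_iter([artifact])) == [
    ('/tmp/requirements.txt', 'requirements.txt')
]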
- if not os.path.exists(requirements_cache_path): - os.makedirs(requirements_cache_path) + resources.append( + Stager._create_file_stage_to_artifact( + setup_options.requirements_file, REQUIREMENTS_FILE)) + # Populate cache with packages from the requirement file option and + # stage the files in the cache. ( populate_requirements_cache if populate_requirements_cache else Stager._populate_requirements_cache)( setup_options.requirements_file, requirements_cache_path) + + if pypi_requirements: + tf = tempfile.NamedTemporaryFile(mode='w', delete=False) + tf.writelines(pypi_requirements) + tf.close() + resources.append(Stager._create_file_pip_requirements_artifact(tf.name)) + # Populate cache with packages from PyPI requirements and stage + # the files in the cache. + ( + populate_requirements_cache if populate_requirements_cache else + Stager._populate_requirements_cache)( + tf.name, requirements_cache_path) + + if setup_options.requirements_file is not None or pypi_requirements: for pkg in glob.glob(os.path.join(requirements_cache_path, '*')): - resources.append((pkg, os.path.basename(pkg))) + resources.append( + Stager._create_file_stage_to_artifact(pkg, os.path.basename(pkg))) # Handle a setup file if present. # We will build the setup package locally and then copy it to the staging @@ -195,7 +255,9 @@ def create_job_resources(options, # type: PipelineOptions 'setup.py instead of %s' % setup_options.setup_file) tarball_file = Stager._build_setup_package( setup_options.setup_file, temp_dir, build_setup_args) - resources.append((tarball_file, WORKFLOW_TARBALL_FILE)) + resources.append( + Stager._create_file_stage_to_artifact( + tarball_file, WORKFLOW_TARBALL_FILE)) # Handle extra local packages that should be staged. if setup_options.extra_packages is not None: @@ -236,10 +298,11 @@ def create_job_resources(options, # type: PipelineOptions if os.path.isfile(sdk_path): _LOGGER.info('Copying Beam SDK "%s" to staging location.', sdk_path) - resources.append(( - sdk_path, - Stager._desired_sdk_filename_in_staging_location( - setup_options.sdk_location))) + resources.append( + Stager._create_file_stage_to_artifact( + sdk_path, + Stager._desired_sdk_filename_in_staging_location( + setup_options.sdk_location))) else: if setup_options.sdk_location == 'default': raise RuntimeError( @@ -275,13 +338,17 @@ def create_job_resources(options, # type: PipelineOptions pickled_session_file = os.path.join( temp_dir, names.PICKLED_MAIN_SESSION_FILE) pickler.dump_session(pickled_session_file) - resources.append((pickled_session_file, names.PICKLED_MAIN_SESSION_FILE)) + resources.append( + Stager._create_file_stage_to_artifact( + pickled_session_file, names.PICKLED_MAIN_SESSION_FILE)) worker_options = options.view_as(WorkerOptions) dataflow_worker_jar = getattr(worker_options, 'dataflow_worker_jar', None) if dataflow_worker_jar is not None: jar_staged_filename = 'dataflow-worker.jar' - resources.append((dataflow_worker_jar, jar_staged_filename)) + resources.append( + Stager._create_file_stage_to_artifact( + dataflow_worker_jar, jar_staged_filename)) return resources @@ -323,6 +390,7 @@ def create_and_stage_job_resources( options, # type: PipelineOptions build_setup_args=None, # type: Optional[List[str]] temp_dir=None, # type: Optional[str] + pypi_requirements=None, # type: Optional[List[str]] populate_requirements_cache=None, # type: Optional[str] staging_location=None # type: Optional[str] ): @@ -340,6 +408,8 @@ def create_and_stage_job_resources( temp_dir: Temporary folder where the resource building can 
happen. If None then a unique temp directory will be created. Used only for testing. + pypi_requirements: A list of PyPI requirements used to cache source + packages. populate_requirements_cache: Callable for populating the requirements cache. Used only for testing. staging_location: Location to stage the file. @@ -357,9 +427,14 @@ def create_and_stage_job_resources( temp_dir = temp_dir or tempfile.mkdtemp() resources = self.create_job_resources( - options, temp_dir, build_setup_args, populate_requirements_cache) + options, + temp_dir, + build_setup_args, + pypi_requirements=pypi_requirements, + populate_requirements_cache=populate_requirements_cache) - staged_resources = self.stage_job_resources(resources, staging_location) + staged_resources = self.stage_job_resources( + list(Stager.extract_staging_tuple_iter(resources)), staging_location) # Delete all temp files created while staging job resources. shutil.rmtree(temp_dir) @@ -402,7 +477,7 @@ def _is_remote_path(path): @staticmethod def _create_jar_packages(jar_packages, temp_dir): - # type: (...) -> List[Tuple[str, str]] + # type: (...) -> List[beam_runner_api_pb2.ArtifactInformation] """Creates a list of local jar packages for Java SDK Harness. @@ -416,7 +491,7 @@ def _create_jar_packages(jar_packages, temp_dir): RuntimeError: If files specified are not found or do not have expected name patterns. """ - resources = [] # type: List[Tuple[str, str]] + resources = [] # type: List[beam_runner_api_pb2.ArtifactInformation] staging_temp_dir = tempfile.mkdtemp(dir=temp_dir) local_packages = [] # type: List[str] for package in jar_packages: @@ -447,13 +522,13 @@ def _create_jar_packages(jar_packages, temp_dir): for package in local_packages: basename = os.path.basename(package) - resources.append((package, basename)) + resources.append(Stager._create_file_stage_to_artifact(package, basename)) return resources @staticmethod def _create_extra_packages(extra_packages, temp_dir): - # type: (...) -> List[Tuple[str, str]] + # type: (...) -> List[beam_runner_api_pb2.ArtifactInformation] """Creates a list of local extra packages. @@ -465,15 +540,15 @@ def _create_extra_packages(extra_packages, temp_dir): returns. Returns: - A list of tuples of local file paths and file names (no paths) for the - resources staged. All the files are assumed to be staged in - staging_location. + A list of ArtifactInformation of local file paths and file names + (no paths) for the resources staged. All the files are assumed to be + staged in staging_location. Raises: RuntimeError: If files specified are not found or do not have expected name patterns. """ - resources = [] # type: List[Tuple[str, str]] + resources = [] # type: List[beam_runner_api_pb2.ArtifactInformation] staging_temp_dir = tempfile.mkdtemp(dir=temp_dir) local_packages = [] # type: List[str] for package in extra_packages: @@ -514,7 +589,7 @@ def _create_extra_packages(extra_packages, temp_dir): for package in local_packages: basename = os.path.basename(package) - resources.append((package, basename)) + resources.append(Stager._create_file_stage_to_artifact(package, basename)) # Create a file containing the list of extra packages and stage it. # The file is important so that in the worker the packages are installed # exactly in the order specified. This approach will avoid extra PyPI @@ -528,7 +603,8 @@ def _create_extra_packages(extra_packages, temp_dir): # Note that the caller of this function is responsible for deleting the # temporary folder where all temp files are created, including this one. 
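# [Editor's illustration, not part of the diff] PIP_REQUIREMENTS_FILE artifacts
# carry no caller-chosen staged name, so extract_staging_tuple_iter derives a
# deterministic one by hashing the serialized proto. Sketch with a hypothetical
# local path:
import hashlib
from apache_beam.runners.portability.stager import Stager

req = Stager._create_file_pip_requirements_artifact('/tmp/extra_reqs.txt')
(src, staged_name), = Stager.extract_staging_tuple_iter([req])
assert src == '/tmp/extra_reqs.txt'
assert staged_name == hashlib.sha256(req.SerializeToString()).hexdigest()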
resources.append( - (os.path.join(temp_dir, EXTRA_PACKAGES_FILE), EXTRA_PACKAGES_FILE)) + Stager._create_file_stage_to_artifact( + os.path.join(temp_dir, EXTRA_PACKAGES_FILE), EXTRA_PACKAGES_FILE)) return resources @@ -614,7 +690,7 @@ def _desired_sdk_filename_in_staging_location(sdk_location): @staticmethod def _create_beam_sdk(sdk_remote_location, temp_dir): - # type: (...) -> List[Tuple[str, str]] + # type: (...) -> List[beam_runner_api_pb2.ArtifactInformation] """Creates a Beam SDK file with the appropriate version. @@ -626,8 +702,8 @@ def _create_beam_sdk(sdk_remote_location, temp_dir): downloaded. Returns: - A list of tuples of local files path and SDK files that will be staged - to the staging location. + A list of ArtifactInformation of local files path and SDK files that + will be staged to the staging location. Raises: RuntimeError: if staging was not successful. @@ -637,11 +713,12 @@ def _create_beam_sdk(sdk_remote_location, temp_dir): sdk_sources_staged_name = Stager.\ _desired_sdk_filename_in_staging_location(sdk_local_file) _LOGGER.info('Staging SDK sources from PyPI: %s', sdk_sources_staged_name) - staged_sdk_files = [(sdk_local_file, sdk_sources_staged_name)] + staged_sdk_files = [ + Stager._create_file_stage_to_artifact( + sdk_local_file, sdk_sources_staged_name) + ] try: - abi_suffix = ( - 'mu' if sys.version_info[0] < 3 else - ('m' if sys.version_info < (3, 8) else '')) + abi_suffix = 'm' if sys.version_info < (3, 8) else '' # Stage binary distribution of the SDK, for now on a best-effort basis. sdk_local_file = Stager._download_pypi_sdk_package( temp_dir, @@ -655,7 +732,9 @@ def _create_beam_sdk(sdk_remote_location, temp_dir): _LOGGER.info( 'Staging binary distribution of the SDK from PyPI: %s', sdk_binary_staged_name) - staged_sdk_files.append((sdk_local_file, sdk_binary_staged_name)) + staged_sdk_files.append( + Stager._create_file_stage_to_artifact( + sdk_local_file, sdk_binary_staged_name)) except RuntimeError as e: _LOGGER.warning( 'Failed to download requested binary distribution ' @@ -671,7 +750,10 @@ def _create_beam_sdk(sdk_remote_location, temp_dir): staged_name = Stager._desired_sdk_filename_in_staging_location( local_download_file) _LOGGER.info('Staging Beam SDK from %s', sdk_remote_location) - return [(local_download_file, staged_name)] + return [ + Stager._create_file_stage_to_artifact( + local_download_file, staged_name) + ] else: raise RuntimeError( 'The --sdk_location option was used with an unsupported ' diff --git a/sdks/python/apache_beam/runners/portability/stager_test.py b/sdks/python/apache_beam/runners/portability/stager_test.py index ea676af223d8..d12a05628306 100644 --- a/sdks/python/apache_beam/runners/portability/stager_test.py +++ b/sdks/python/apache_beam/runners/portability/stager_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import os import shutil @@ -214,6 +212,25 @@ def test_with_requirements_file(self): staging_location=staging_dir)[1])) self.assertTrue( os.path.isfile(os.path.join(staging_dir, stager.REQUIREMENTS_FILE))) + + def test_with_pypi_requirements(self): + staging_dir = self.make_temp_dir() + requirements_cache_dir = self.make_temp_dir() + + options = PipelineOptions() + self.update_options(options) + options.view_as(SetupOptions).requirements_cache = requirements_cache_dir + resources = self.stager.create_and_stage_job_resources( + options, + pypi_requirements=['nothing>=1.0,<2.0'], + populate_requirements_cache=self.populate_requirements_cache, + 
staging_location=staging_dir)[1] + self.assertEqual(3, len(resources)) + self.assertTrue({'abc.txt', 'def.txt'} <= set(resources)) + generated_requirements = (set(resources) - {'abc.txt', 'def.txt'}).pop() + with open(os.path.join(staging_dir, generated_requirements)) as f: + data = f.read() + self.assertEqual('nothing>=1.0,<2.0', data) self.assertTrue(os.path.isfile(os.path.join(staging_dir, 'abc.txt'))) self.assertTrue(os.path.isfile(os.path.join(staging_dir, 'def.txt'))) @@ -255,41 +272,6 @@ def test_with_requirements_file_and_cache(self): self.assertTrue(os.path.isfile(os.path.join(staging_dir, 'abc.txt'))) self.assertTrue(os.path.isfile(os.path.join(staging_dir, 'def.txt'))) - @unittest.skipIf( - sys.version_info[0] == 3, - 'This test is not hermetic ' - 'and halts test suite execution on Python 3. ' - 'TODO: BEAM-5502') - def test_with_setup_file(self): - staging_dir = self.make_temp_dir() - source_dir = self.make_temp_dir() - self.create_temp_file(os.path.join(source_dir, 'setup.py'), 'notused') - - options = PipelineOptions() - self.update_options(options) - options.view_as(SetupOptions).setup_file = os.path.join( - source_dir, 'setup.py') - - self.assertEqual( - [stager.WORKFLOW_TARBALL_FILE], - self.stager.create_and_stage_job_resources( - options, - # We replace the build setup command because a realistic one would - # require the setuptools package to be installed. Note that we can't - # use "touch" here to create the expected output tarball file, since - # touch is not available on Windows, so we invoke python to produce - # equivalent behavior. - build_setup_args=[ - 'python', - '-c', - 'open(__import__("sys").argv[1], "a")', - os.path.join(source_dir, stager.WORKFLOW_TARBALL_FILE) - ], - temp_dir=source_dir, - staging_location=staging_dir)[1]) - self.assertTrue( - os.path.isfile(os.path.join(staging_dir, stager.WORKFLOW_TARBALL_FILE))) - def test_setup_file_not_present(self): staging_dir = self.make_temp_dir() diff --git a/sdks/python/apache_beam/runners/runner.py b/sdks/python/apache_beam/runners/runner.py index ba80b276c95d..1030a53ff3f3 100644 --- a/sdks/python/apache_beam/runners/runner.py +++ b/sdks/python/apache_beam/runners/runner.py @@ -19,18 +19,17 @@ # pytype: skip-file -from __future__ import absolute_import - import importlib import logging import os import shelve import shutil import tempfile -from builtins import object from typing import TYPE_CHECKING from typing import Optional +from apache_beam.options.pipeline_options import StandardOptions + if TYPE_CHECKING: from apache_beam import pvalue from apache_beam import PTransform @@ -41,22 +40,10 @@ __all__ = ['PipelineRunner', 'PipelineState', 'PipelineResult'] -_ALL_KNOWN_RUNNERS = ( - 'apache_beam.runners.dataflow.dataflow_runner.DataflowRunner', - 'apache_beam.runners.direct.direct_runner.BundleBasedDirectRunner', - 'apache_beam.runners.direct.direct_runner.DirectRunner', - 'apache_beam.runners.direct.direct_runner.SwitchingDirectRunner', - 'apache_beam.runners.interactive.interactive_runner.InteractiveRunner', - 'apache_beam.runners.portability.flink_runner.FlinkRunner', - 'apache_beam.runners.portability.portable_runner.PortableRunner', - 'apache_beam.runners.portability.spark_runner.SparkRunner', - 'apache_beam.runners.test.TestDirectRunner', - 'apache_beam.runners.test.TestDataflowRunner', -) - -_KNOWN_RUNNER_NAMES = [path.split('.')[-1] for path in _ALL_KNOWN_RUNNERS] - -_RUNNER_MAP = {path.split('.')[-1].lower(): path for path in _ALL_KNOWN_RUNNERS} +_RUNNER_MAP = { + path.split('.')[-1].lower(): path 
+ for path in StandardOptions.ALL_KNOWN_RUNNERS +} # Allow this alias, but don't make public. _RUNNER_MAP['pythonrpcdirectrunner'] = ( @@ -110,7 +97,7 @@ def create_runner(runner_name): raise ValueError( 'Unexpected pipeline runner: %s. Valid values are %s ' 'or the fully qualified name of a PipelineRunner subclass.' % - (runner_name, ', '.join(_KNOWN_RUNNER_NAMES))) + (runner_name, ', '.join(StandardOptions.KNOWN_RUNNER_NAMES))) class PipelineRunner(object): diff --git a/sdks/python/apache_beam/runners/runner_test.py b/sdks/python/apache_beam/runners/runner_test.py index 0923369ad0ba..61fe400997dd 100644 --- a/sdks/python/apache_beam/runners/runner_test.py +++ b/sdks/python/apache_beam/runners/runner_test.py @@ -24,8 +24,6 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest import apache_beam as beam diff --git a/sdks/python/apache_beam/runners/sdf_utils.py b/sdks/python/apache_beam/runners/sdf_utils.py index aa91bda7cfad..8ff9a35d2ed6 100644 --- a/sdks/python/apache_beam/runners/sdf_utils.py +++ b/sdks/python/apache_beam/runners/sdf_utils.py @@ -19,12 +19,8 @@ """Common utility class to help SDK harness to execute an SDF. """ -from __future__ import absolute_import -from __future__ import division - import logging import threading -from builtins import object from typing import TYPE_CHECKING from typing import Any from typing import NamedTuple diff --git a/sdks/python/apache_beam/runners/sdf_utils_test.py b/sdks/python/apache_beam/runners/sdf_utils_test.py index 087068073d95..a4510d747d13 100644 --- a/sdks/python/apache_beam/runners/sdf_utils_test.py +++ b/sdks/python/apache_beam/runners/sdf_utils_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import time import unittest diff --git a/sdks/python/apache_beam/runners/test/__init__.py b/sdks/python/apache_beam/runners/test/__init__.py index 26f13539d4aa..ed8413e5be05 100644 --- a/sdks/python/apache_beam/runners/test/__init__.py +++ b/sdks/python/apache_beam/runners/test/__init__.py @@ -23,8 +23,6 @@ # Protect against environments where dataflow runner is not available. 
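# [Editor's illustration, not part of the diff] With the registry now derived
# from StandardOptions, create_runner still accepts either a short runner name
# or a fully qualified class path; both resolve to the same runner class.
from apache_beam.runners import create_runner

by_name = create_runner('SparkRunner')
by_path = create_runner(
    'apache_beam.runners.portability.spark_runner.SparkRunner')
assert type(by_name) is type(by_path)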
# pylint: disable=wrong-import-order, wrong-import-position -from __future__ import absolute_import - try: from apache_beam.runners.dataflow.test_dataflow_runner import TestDataflowRunner from apache_beam.runners.direct.test_direct_runner import TestDirectRunner diff --git a/sdks/python/apache_beam/runners/worker/__init__.py b/sdks/python/apache_beam/runners/worker/__init__.py index 9fbf21557df7..0bce5d68f724 100644 --- a/sdks/python/apache_beam/runners/worker/__init__.py +++ b/sdks/python/apache_beam/runners/worker/__init__.py @@ -16,4 +16,3 @@ # """For internal use only; no backwards-compatibility guarantees.""" -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/runners/worker/bundle_processor.py b/sdks/python/apache_beam/runners/worker/bundle_processor.py index 60bdc45e214b..497d613f46f9 100644 --- a/sdks/python/apache_beam/runners/worker/bundle_processor.py +++ b/sdks/python/apache_beam/runners/worker/bundle_processor.py @@ -19,10 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import base64 import bisect import collections @@ -31,8 +27,6 @@ import logging import random import threading -from builtins import next -from builtins import object from typing import TYPE_CHECKING from typing import Any from typing import Callable @@ -52,7 +46,6 @@ from typing import Union from typing import cast -from future.utils import itervalues from google.protobuf import duration_pb2 from google.protobuf import timestamp_pb2 @@ -76,6 +69,7 @@ from apache_beam.transforms import core from apache_beam.transforms import sideinputs from apache_beam.transforms import userstate +from apache_beam.transforms import window from apache_beam.utils import counters from apache_beam.utils import proto_utils from apache_beam.utils import timestamp @@ -88,8 +82,8 @@ from apache_beam.runners.sdf_utils import SplitResultResidual from apache_beam.runners.worker import data_plane from apache_beam.runners.worker import sdk_worker - from apache_beam.transforms import window from apache_beam.transforms.core import Windowing + from apache_beam.transforms.window import BoundedWindow from apache_beam.utils import windowed_value # This module is experimental. No backwards-compatibility guarantees. @@ -173,7 +167,7 @@ class DataInputOperation(RunnerIOOperation): def __init__(self, operation_name, # type: Union[str, common.NameContext] step_name, - consumers, # type: Mapping[Any, Iterable[operations.Operation]] + consumers, # type: Mapping[Any, List[operations.Operation]] counter_factory, # type: counters.CounterFactory state_sampler, # type: statesampler.StateSampler windowed_coder, # type: coders.Coder @@ -196,7 +190,7 @@ def __init__(self, self.counter_factory, self.name_context.step_name, 0, - next(iter(itervalues(consumers))), + next(iter(consumers.values())), self.windowed_coder, self._get_runtime_performance_hints()) ] @@ -333,8 +327,9 @@ def finish(self): def reset(self): # type: () -> None - self.index = -1 - self.stop = float('inf') + with self.splitting_lock: + self.index = -1 + self.stop = float('inf') super(DataInputOperation, self).reset() @@ -381,7 +376,7 @@ def __init__(self, self._element_coder = coder.wrapped_value_coder self._target_window_coder = coder.window_coder # TODO(robertwb): Limit the cache size. 
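# [Editor's illustration, not part of the diff] The OutputTimer.set/clear and
# DoOperation.process_timer changes in this diff thread a dynamic_timer_tag
# through to user code. A user-level sketch using the standard userstate API;
# the DoFn must be applied to a keyed PCollection.
import apache_beam as beam
from apache_beam.transforms.timeutil import TimeDomain
from apache_beam.transforms.userstate import TimerSpec, on_timer

class EmitTaggedTimers(beam.DoFn):
  FAMILY = TimerSpec('family', TimeDomain.WATERMARK)

  def process(self, element, timer=beam.DoFn.TimerParam(FAMILY)):
    key, ts = element
    # Two logical timers in one timer family, distinguished only by their tag.
    timer.set(ts, dynamic_timer_tag='first')
    timer.set(ts + 10, dynamic_timer_tag='second')

  @on_timer(FAMILY)
  def fired(self, tag=beam.DoFn.DynamicTimerTagParam):
    yield tag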
- self._cache = {} # type: Dict[window.BoundedWindow, Any] + self._cache = {} # type: Dict[BoundedWindow, Any] def __getitem__(self, window): target_window = self._side_input_data.window_mapping_fn(window) @@ -644,14 +639,14 @@ def commit(self): class OutputTimer(userstate.BaseTimer): def __init__(self, key, - window, # type: windowed_value.BoundedWindow + window, # type: BoundedWindow timestamp, # type: timestamp.Timestamp paneinfo, # type: windowed_value.PaneInfo time_domain, # type: str timer_family_id, # type: str timer_coder_impl, # type: coder_impl.TimerCoderImpl output_stream # type: data_plane.ClosableOutputStream - ): + ): self._key = key self._window = window self._input_timestamp = timestamp @@ -661,12 +656,11 @@ def __init__(self, self._output_stream = output_stream self._timer_coder_impl = timer_coder_impl - def set(self, ts): - # type: (timestamp.TimestampTypes) -> None + def set(self, ts: timestamp.TimestampTypes, dynamic_timer_tag='') -> None: ts = timestamp.Timestamp.of(ts) timer = userstate.Timer( user_key=self._key, - dynamic_timer_tag='', + dynamic_timer_tag=dynamic_timer_tag, windows=(self._window, ), clear_bit=False, fire_timestamp=ts, @@ -676,11 +670,10 @@ def set(self, ts): self._timer_coder_impl.encode_to_stream(timer, self._output_stream, True) self._output_stream.maybe_flush() - def clear(self): - # type: () -> None + def clear(self, dynamic_timer_tag='') -> None: timer = userstate.Timer( user_key=self._key, - dynamic_timer_tag='', + dynamic_timer_tag=dynamic_timer_tag, windows=(self._window, ), clear_bit=True, fire_timestamp=None, @@ -713,10 +706,8 @@ def __init__(self, Args: state_handler: A StateServicer object. transform_id: The name of the PTransform that this context is associated. - key_coder: - window_coder: - timer_family_specs: A list of ``userstate.TimerSpec`` objects specifying - the timers associated with this operation. + key_coder: Coder for the key type. + window_coder: Coder for the window type. """ self._state_handler = state_handler self._transform_id = transform_id @@ -731,14 +722,11 @@ def add_timer_info(self, timer_family_id, timer_info): self._timers_info[timer_family_id] = timer_info def get_timer( - self, - timer_spec, - key, - window, # type: windowed_value.BoundedWindow - timestamp, - pane): - # type: (...) -> OutputTimer + self, timer_spec: userstate.TimerSpec, key, window, timestamp, + pane) -> OutputTimer: assert self._timers_info[timer_spec.name].output_stream is not None + timer_coder_impl = self._timers_info[timer_spec.name].timer_coder_impl + output_stream = self._timers_info[timer_spec.name].output_stream return OutputTimer( key, window, @@ -746,8 +734,8 @@ def get_timer( pane, timer_spec.time_domain, timer_spec.name, - self._timers_info[timer_spec.name].timer_coder_impl, - self._timers_info[timer_spec.name].output_stream) + timer_coder_impl, + output_stream) def get_state(self, *args): # type: (*Any) -> FnApiUserRuntimeStateTypes @@ -759,7 +747,7 @@ def get_state(self, *args): def _create_state(self, state_spec, # type: userstate.StateSpec key, - window # type: windowed_value.BoundedWindow + window # type: BoundedWindow ): # type: (...) -> FnApiUserRuntimeStateTypes if isinstance(state_spec, @@ -849,6 +837,7 @@ def __init__(self, self.process_bundle_descriptor = process_bundle_descriptor self.state_handler = state_handler self.data_channel_factory = data_channel_factory + self.current_instruction_id = None # type: Optional[str] # There is no guarantee that the runner only set # timer_api_service_descriptor when having timers. 
So this field cannot be @@ -956,6 +945,7 @@ def process_bundle(self, instruction_id): try: execution_context = ExecutionContext() + self.current_instruction_id = instruction_id self.state_sampler.start() # Start all operations. for op in reversed(self.ops.values()): @@ -969,7 +959,7 @@ def process_bundle(self, instruction_id): # (transform_id, timer_family_id). data_channels = collections.defaultdict( list - ) # type: DefaultDict[data_plane.GrpcClientDataChannel, List[Union[str, Tuple[str, str]]]] + ) # type: DefaultDict[data_plane.DataChannel, List[Union[str, Tuple[str, str]]]] # Add expected data inputs for each data channel. input_op_by_transform_id = {} @@ -982,13 +972,13 @@ def process_bundle(self, instruction_id): data_channels[self.timer_data_channel].extend( list(self.timers_info.keys())) - # Set up timer output stream for DoOperation. - for ((transform_id, timer_family_id), - timer_info) in self.timers_info.items(): - output_stream = self.timer_data_channel.output_timer_stream( - instruction_id, transform_id, timer_family_id) - timer_info.output_stream = output_stream - self.ops[transform_id].add_timer_info(timer_family_id, timer_info) + # Set up timer output stream for DoOperation. + for ((transform_id, timer_family_id), + timer_info) in self.timers_info.items(): + output_stream = self.timer_data_channel.output_timer_stream( + instruction_id, transform_id, timer_family_id) + timer_info.output_stream = output_stream + self.ops[transform_id].add_timer_info(timer_family_id, timer_info) # Process data and timer inputs for data_channel, expected_inputs in data_channels.items(): @@ -1025,7 +1015,7 @@ def process_bundle(self, instruction_id): finally: # Ensure any in-flight split attempts complete. with self.splitting_lock: - pass + self.current_instruction_id = None self.state_sampler.stop_if_still_running() def finalize_bundle(self): @@ -1042,6 +1032,10 @@ def try_split(self, bundle_split_request): # type: (beam_fn_api_pb2.ProcessBundleSplitRequest) -> beam_fn_api_pb2.ProcessBundleSplitResponse split_response = beam_fn_api_pb2.ProcessBundleSplitResponse() with self.splitting_lock: + if bundle_split_request.instruction_id != self.current_instruction_id: + # This may be a delayed split for a former bundle, see BEAM-12475. + return split_response + for op in self.ops.values(): if isinstance(op, DataInputOperation): desired_split = bundle_split_request.desired_splits.get( @@ -1452,6 +1446,8 @@ def process(self, element_restriction, *args, **kwargs): element, (restriction, _) = element_restriction for part, size in self.restriction_provider.split_and_size( element, restriction): + if size < 0: + raise ValueError('Expected size >= 0 but received %s.' % size) estimator_state = ( self.watermark_estimator_provider.initial_estimator_state( element, part)) @@ -1476,6 +1472,10 @@ def process(self, element_restriction, *args, **kwargs): truncated_restriction_size = ( self.restriction_provider.restriction_size( element, truncated_restriction)) + if truncated_restriction_size < 0: + raise ValueError( + 'Expected size >= 0 but received %s.' 
% + truncated_restriction_size) yield ((element, (truncated_restriction, estimator_state)), truncated_restriction_size) @@ -1863,6 +1863,48 @@ def process(self, element): factory, transform_id, transform_proto, consumers, MapWindows()) +@BeamTransformFactory.register_urn( + common_urns.primitives.MERGE_WINDOWS.urn, beam_runner_api_pb2.FunctionSpec) +def create_merge_windows( + factory, # type: BeamTransformFactory + transform_id, # type: str + transform_proto, # type: beam_runner_api_pb2.PTransform + mapping_fn_spec, # type: beam_runner_api_pb2.FunctionSpec + consumers # type: Dict[str, List[operations.Operation]] +): + assert mapping_fn_spec.urn == python_urns.PICKLED_WINDOWFN + window_fn = pickler.loads(mapping_fn_spec.payload) + + class MergeWindows(beam.DoFn): + def process(self, element): + nonce, windows = element + + original_windows = set(windows) # type: Set[window.BoundedWindow] + merged_windows = collections.defaultdict( + set + ) # type: MutableMapping[window.BoundedWindow, Set[window.BoundedWindow]] + + class RecordingMergeContext(window.WindowFn.MergeContext): + def merge( + self, + to_be_merged, # type: Iterable[window.BoundedWindow] + merge_result, # type: window.BoundedWindow + ): + originals = merged_windows[merge_result] + for window in to_be_merged: + if window in original_windows: + originals.add(window) + original_windows.remove(window) + else: + originals.update(merged_windows.pop(window)) + + window_fn.merge(RecordingMergeContext(windows)) + yield nonce, (original_windows, merged_windows.items()) + + return _create_simple_pardo_operation( + factory, transform_id, transform_proto, consumers, MergeWindows()) + + @BeamTransformFactory.register_urn(common_urns.primitives.TO_STRING.urn, None) def create_to_string_fn( factory, # type: BeamTransformFactory diff --git a/sdks/python/apache_beam/runners/worker/bundle_processor_test.py b/sdks/python/apache_beam/runners/worker/bundle_processor_test.py index 4a3ce44ddfc1..87802bcb1c86 100644 --- a/sdks/python/apache_beam/runners/worker/bundle_processor_test.py +++ b/sdks/python/apache_beam/runners/worker/bundle_processor_test.py @@ -18,8 +18,6 @@ """Unit tests for bundle processing.""" # pytype: skip-file -from __future__ import absolute_import - import unittest from apache_beam.runners.worker.bundle_processor import DataInputOperation diff --git a/sdks/python/apache_beam/runners/worker/channel_factory.py b/sdks/python/apache_beam/runners/worker/channel_factory.py index b3f31b1930f6..cd859e6dc1c8 100644 --- a/sdks/python/apache_beam/runners/worker/channel_factory.py +++ b/sdks/python/apache_beam/runners/worker/channel_factory.py @@ -18,10 +18,6 @@ """Factory to create grpc channel.""" # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import grpc diff --git a/sdks/python/apache_beam/runners/worker/data_plane.py b/sdks/python/apache_beam/runners/worker/data_plane.py index c1067581d025..e89a66978efc 100644 --- a/sdks/python/apache_beam/runners/worker/data_plane.py +++ b/sdks/python/apache_beam/runners/worker/data_plane.py @@ -20,10 +20,6 @@ # pytype: skip-file # mypy: disallow-untyped-defs -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import abc import collections import logging @@ -31,11 +27,11 @@ import sys import threading import time -from builtins import object -from builtins import range +from types import TracebackType from typing import TYPE_CHECKING from typing import Any 
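# [Editor's illustration, not part of the diff] Pipeline-level view of window
# merging: Sessions is the stock merging WindowFn, while custom (pickled)
# merging WindowFns are the ones routed through the MERGE_WINDOWS primitive
# registered above when executed portably. Values are toy data.
import apache_beam as beam
from apache_beam.transforms import window

with beam.Pipeline() as p:
  _ = (
      p
      | beam.Create([('k', 1), ('k', 5), ('k', 40)])
      | beam.Map(lambda kv: window.TimestampedValue(kv, kv[1]))
      | beam.WindowInto(window.Sessions(gap_size=10))
      | beam.GroupByKey())  # yields ('k', [1, 5]) and ('k', [40])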
from typing import Callable +from typing import Collection from typing import DefaultDict from typing import Dict from typing import Iterable @@ -49,8 +45,6 @@ from typing import Union import grpc -from future.utils import raise_ -from future.utils import with_metaclass from apache_beam.coders import coder_impl from apache_beam.portability.api import beam_fn_api_pb2 @@ -59,21 +53,16 @@ from apache_beam.runners.worker.worker_id_interceptor import WorkerIdInterceptor if TYPE_CHECKING: - # TODO(BEAM-9372): move this out of the TYPE_CHECKING scope when we drop - # support for python < 3.5.3 - from types import TracebackType - ExcInfo = Tuple[Type[BaseException], BaseException, TracebackType] - OptExcInfo = Union[ExcInfo, Tuple[None, None, None]] - # TODO: move this out of the TYPE_CHECKING scope when we drop support for - # python < 3.6 - from typing import Collection # pylint: disable=ungrouped-imports import apache_beam.coders.slow_stream OutputStream = apache_beam.coders.slow_stream.OutputStream - DataOrTimers = \ - Union[beam_fn_api_pb2.Elements.Data, beam_fn_api_pb2.Elements.Timers] + DataOrTimers = Union[beam_fn_api_pb2.Elements.Data, + beam_fn_api_pb2.Elements.Timers] else: OutputStream = type(coder_impl.create_OutputStream()) +ExcInfo = Tuple[Type[BaseException], BaseException, TracebackType] +OptExcInfo = Union[ExcInfo, Tuple[None, None, None]] + # This module is experimental. No backwards-compatibility guarantees. _LOGGER = logging.getLogger(__name__) @@ -81,6 +70,10 @@ _DEFAULT_SIZE_FLUSH_THRESHOLD = 10 << 20 # 10MB _DEFAULT_TIME_FLUSH_THRESHOLD_MS = 0 # disable time-based flush by default +# Keep a set of completed instructions to discard late received data. The set +# can have up to _MAX_CLEANED_INSTRUCTIONS items. See _GrpcDataChannel. +_MAX_CLEANED_INSTRUCTIONS = 10000 + class ClosableOutputStream(OutputStream): """A Outputstream for use with CoderImpls that has a close() method.""" @@ -106,10 +99,11 @@ def flush(self): pass @staticmethod - def create(close_callback, # type: Optional[Callable[[bytes], None]] - flush_callback, # type: Optional[Callable[[bytes], None]] - data_buffer_time_limit_ms # type: int - ): + def create( + close_callback, # type: Optional[Callable[[bytes], None]] + flush_callback, # type: Optional[Callable[[bytes], None]] + data_buffer_time_limit_ms # type: int + ): # type: (...) 
-> ClosableOutputStream if data_buffer_time_limit_ms > 0: return TimeBasedBufferingClosableOutputStream( @@ -123,12 +117,12 @@ def create(close_callback, # type: Optional[Callable[[bytes], None]] class SizeBasedBufferingClosableOutputStream(ClosableOutputStream): """A size-based buffering OutputStream.""" - - def __init__(self, - close_callback=None, # type: Optional[Callable[[bytes], None]] - flush_callback=None, # type: Optional[Callable[[bytes], None]] - size_flush_threshold=_DEFAULT_SIZE_FLUSH_THRESHOLD # type: int - ): + def __init__( + self, + close_callback=None, # type: Optional[Callable[[bytes], None]] + flush_callback=None, # type: Optional[Callable[[bytes], None]] + size_flush_threshold=_DEFAULT_SIZE_FLUSH_THRESHOLD # type: int + ): super(SizeBasedBufferingClosableOutputStream, self).__init__(close_callback) self._flush_callback = flush_callback self._size_flush_threshold = size_flush_threshold @@ -198,12 +192,13 @@ def _flush(): class PeriodicThread(threading.Thread): """Call a function periodically with the specified number of seconds""" - def __init__(self, - interval, # type: float - function, # type: Callable - args=None, # type: Optional[Iterable] - kwargs=None # type: Optional[Mapping[str, Any]] - ): + def __init__( + self, + interval, # type: float + function, # type: Callable + args=None, # type: Optional[Iterable] + kwargs=None # type: Optional[Mapping[str, Any]] + ): # type: (...) -> None threading.Thread.__init__(self) self._interval = interval @@ -226,7 +221,7 @@ def cancel(self): self._finished.set() -class DataChannel(with_metaclass(abc.ABCMeta, object)): # type: ignore[misc] +class DataChannel(metaclass=abc.ABCMeta): """Represents a channel for reading and writing data over the data plane. Read data and timer from this channel with the input_elements method:: @@ -254,11 +249,12 @@ class DataChannel(with_metaclass(abc.ABCMeta, object)): # type: ignore[misc] data_channel.close() """ @abc.abstractmethod - def input_elements(self, - instruction_id, # type: str - expected_inputs, # type: Collection[Union[str, Tuple[str, str]]] - abort_callback=None # type: Optional[Callable[[], bool]] - ): + def input_elements( + self, + instruction_id, # type: str + expected_inputs, # type: Collection[Union[str, Tuple[str, str]]] + abort_callback=None # type: Optional[Callable[[], bool]] + ): # type: (...) -> Iterator[DataOrTimers] """Returns an iterable of all Element.Data and Element.Timers bundles for @@ -292,11 +288,12 @@ def output_stream( raise NotImplementedError(type(self)) @abc.abstractmethod - def output_timer_stream(self, - instruction_id, # type: str - transform_id, # type: str - timer_family_id # type: str - ): + def output_timer_stream( + self, + instruction_id, # type: str + transform_id, # type: str + timer_family_id # type: str + ): # type: (...) -> ClosableOutputStream """Returns an output stream written timers to transform_id. @@ -339,11 +336,12 @@ def inverse(self): # type: () -> InMemoryDataChannel return self._inverse - def input_elements(self, + def input_elements( + self, instruction_id, # type: str - unused_expected_inputs, # type: Any + unused_expected_inputs, # type: Any abort_callback=None # type: Optional[Callable[[], bool]] - ): + ): # type: (...) 
-> Iterator[DataOrTimers] other_inputs = [] for element in self._inputs: @@ -358,11 +356,12 @@ def input_elements(self, other_inputs.append(element) self._inputs = other_inputs - def output_timer_stream(self, - instruction_id, # type: str - transform_id, # type: str - timer_family_id # type: str - ): + def output_timer_stream( + self, + instruction_id, # type: str + transform_id, # type: str + timer_family_id # type: str + ): # type: (...) -> ClosableOutputStream def add_to_inverse_output(timer): # type: (bytes) -> None @@ -421,10 +420,16 @@ def __init__(self, data_buffer_time_limit_ms=0): lambda: queue.Queue(maxsize=5) ) # type: DefaultDict[str, queue.Queue[DataOrTimers]] + # Keep a cache of completed instructions. Data for completed instructions + # must be discarded. See input_elements() and _clean_receiving_queue(). + # OrderedDict is used as FIFO set with the value being always `True`. + self._cleaned_instruction_ids = collections.OrderedDict( + ) # type: collections.OrderedDict[str, bool] + self._receive_lock = threading.Lock() self._reads_finished = threading.Event() self._closed = False - self._exc_info = None # type: Optional[OptExcInfo] + self._exc_info = (None, None, None) # type: OptExcInfo def close(self): # type: () -> None @@ -436,20 +441,37 @@ def wait(self, timeout=None): self._reads_finished.wait(timeout) def _receiving_queue(self, instruction_id): - # type: (str) -> queue.Queue[DataOrTimers] + # type: (str) -> Optional[queue.Queue[DataOrTimers]] + + """ + Gets or creates queue for a instruction_id. Or, returns None if the + instruction_id is already cleaned up. This is best-effort as we track + a limited number of cleaned-up instructions. + """ with self._receive_lock: + if instruction_id in self._cleaned_instruction_ids: + return None return self._received[instruction_id] def _clean_receiving_queue(self, instruction_id): # type: (str) -> None + + """ + Removes the queue and adds the instruction_id to the cleaned-up list. The + instruction_id cannot be reused for new queue. + """ with self._receive_lock: self._received.pop(instruction_id) + self._cleaned_instruction_ids[instruction_id] = True + while len(self._cleaned_instruction_ids) > _MAX_CLEANED_INSTRUCTIONS: + self._cleaned_instruction_ids.popitem(last=False) - def input_elements(self, + def input_elements( + self, instruction_id, # type: str - expected_inputs, # type: Collection[Union[str, Tuple[str, str]]] + expected_inputs, # type: Collection[Union[str, Tuple[str, str]]] abort_callback=None # type: Optional[Callable[[], bool]] - ): + ): # type: (...) -> Iterator[DataOrTimers] @@ -462,6 +484,8 @@ def input_elements(self, expected_inputs(collection): expected inputs, include both data and timer. 
""" received = self._receiving_queue(instruction_id) + if received is None: + raise RuntimeError('Instruction cleaned up already %s' % instruction_id) done_inputs = set() # type: Set[Union[str, Tuple[str, str]]] abort_callback = abort_callback or (lambda: False) try: @@ -473,9 +497,9 @@ def input_elements(self, raise RuntimeError('Channel closed prematurely.') if abort_callback(): return - if self._exc_info: - t, v, tb = self._exc_info - raise_(t, v, tb) + t, v, tb = self._exc_info + if t: + raise t(v).with_traceback(tb) else: if isinstance(element, beam_fn_api_pb2.Elements.Timers): if element.is_last: @@ -519,11 +543,12 @@ def close_callback(data): return ClosableOutputStream.create( close_callback, add_to_send_queue, self._data_buffer_time_limit_ms) - def output_timer_stream(self, - instruction_id, # type: str - transform_id, # type: str - timer_family_id # type: str - ): + def output_timer_stream( + self, + instruction_id, # type: str + transform_id, # type: str + timer_family_id # type: str + ): # type: (...) -> ClosableOutputStream def add_to_send_queue(timer): # type: (bytes) -> None @@ -577,12 +602,48 @@ def _write_outputs(self): def _read_inputs(self, elements_iterator): # type: (Iterable[beam_fn_api_pb2.Elements]) -> None + + next_discard_log_time = 0 # type: float + + def _put_queue(instruction_id, element): + # type: (str, Union[beam_fn_api_pb2.Elements.Data, beam_fn_api_pb2.Elements.Timers]) -> None + + """ + Puts element to the queue of the instruction_id, or discards it if the + instruction_id is already cleaned up. + """ + nonlocal next_discard_log_time + start_time = time.time() + next_waiting_log_time = start_time + 300 + while True: + input_queue = self._receiving_queue(instruction_id) + if input_queue is None: + current_time = time.time() + if next_discard_log_time <= current_time: + # Log every 10 seconds across all _put_queue calls + _LOGGER.info( + 'Discard inputs for cleaned up instruction: %s', instruction_id) + next_discard_log_time = current_time + 10 + return + try: + input_queue.put(element, timeout=1) + return + except queue.Full: + current_time = time.time() + if next_waiting_log_time <= current_time: + # Log every 5 mins in each _put_queue call + _LOGGER.info( + 'Waiting on input queue of instruction: %s for %.2f seconds', + instruction_id, + current_time - start_time) + next_waiting_log_time = current_time + 300 + try: for elements in elements_iterator: for timer in elements.timers: - self._receiving_queue(timer.instruction_id).put(timer) + _put_queue(timer.instruction_id, timer) for data in elements.data: - self._receiving_queue(data.instruction_id).put(data) + _put_queue(data.instruction_id, data) except: # pylint: disable=bare-except if not self._closed: _LOGGER.exception('Failed to read inputs in the data plane.') @@ -603,11 +664,11 @@ def set_inputs(self, elements_iterator): class GrpcClientDataChannel(_GrpcDataChannel): """A DataChannel wrapping the client side of a BeamFnData connection.""" - - def __init__(self, - data_stub, # type: beam_fn_api_pb2_grpc.BeamFnDataStub - data_buffer_time_limit_ms=0 # type: int - ): + def __init__( + self, + data_stub, # type: beam_fn_api_pb2_grpc.BeamFnDataStub + data_buffer_time_limit_ms=0 # type: int + ): # type: (...) 
-> None super(GrpcClientDataChannel, self).__init__(data_buffer_time_limit_ms) self.set_inputs(data_stub.Data(self._write_outputs())) @@ -629,10 +690,11 @@ def get_conn_by_worker_id(self, worker_id): with self._lock: return self._connections_by_worker_id[worker_id] - def Data(self, - elements_iterator, # type: Iterable[beam_fn_api_pb2.Elements] - context # type: Any - ): + def Data( + self, + elements_iterator, # type: Iterable[beam_fn_api_pb2.Elements] + context # type: Any + ): # type: (...) -> Iterator[beam_fn_api_pb2.Elements] worker_id = dict(context.invocation_metadata())['worker_id'] data_conn = self.get_conn_by_worker_id(worker_id) @@ -641,7 +703,7 @@ def Data(self, yield elements -class DataChannelFactory(with_metaclass(abc.ABCMeta, object)): # type: ignore[misc] +class DataChannelFactory(metaclass=abc.ABCMeta): """An abstract factory for creating ``DataChannel``.""" @abc.abstractmethod def create_data_channel(self, remote_grpc_port): @@ -650,6 +712,13 @@ def create_data_channel(self, remote_grpc_port): """Returns a ``DataChannel`` from the given RemoteGrpcPort.""" raise NotImplementedError(type(self)) + @abc.abstractmethod + def create_data_channel_from_url(self, url): + # type: (str) -> Optional[GrpcClientDataChannel] + + """Returns a ``DataChannel`` from the given url.""" + raise NotImplementedError(type(self)) + @abc.abstractmethod def close(self): # type: () -> None @@ -663,12 +732,12 @@ class GrpcClientDataChannelFactory(DataChannelFactory): Caches the created channels by ``data descriptor url``. """ - - def __init__(self, - credentials=None, # type: Any - worker_id=None, # type: Optional[str] - data_buffer_time_limit_ms=0 # type: int - ): + def __init__( + self, + credentials=None, # type: Any + worker_id=None, # type: Optional[str] + data_buffer_time_limit_ms=0 # type: int + ): # type: (...) 
-> None self._data_channel_cache = {} # type: Dict[str, GrpcClientDataChannel] self._lock = threading.Lock() diff --git a/sdks/python/apache_beam/runners/worker/data_plane_test.py b/sdks/python/apache_beam/runners/worker/data_plane_test.py index 1a83fe1a5e9c..5124bb69e6c8 100644 --- a/sdks/python/apache_beam/runners/worker/data_plane_test.py +++ b/sdks/python/apache_beam/runners/worker/data_plane_test.py @@ -19,10 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import itertools import logging import time diff --git a/sdks/python/apache_beam/runners/worker/log_handler.py b/sdks/python/apache_beam/runners/worker/log_handler.py index ae01ff3dc6c6..46157db5d097 100644 --- a/sdks/python/apache_beam/runners/worker/log_handler.py +++ b/sdks/python/apache_beam/runners/worker/log_handler.py @@ -20,9 +20,6 @@ # pytype: skip-file # mypy: disallow-untyped-defs -from __future__ import absolute_import -from __future__ import print_function - import logging import math import queue diff --git a/sdks/python/apache_beam/runners/worker/log_handler_test.py b/sdks/python/apache_beam/runners/worker/log_handler_test.py index dcae3f67be62..a045c0cde78b 100644 --- a/sdks/python/apache_beam/runners/worker/log_handler_test.py +++ b/sdks/python/apache_beam/runners/worker/log_handler_test.py @@ -17,12 +17,9 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import re import unittest -from builtins import range import grpc diff --git a/sdks/python/apache_beam/runners/worker/logger.py b/sdks/python/apache_beam/runners/worker/logger.py index f6501233ccdd..e171caf7710b 100644 --- a/sdks/python/apache_beam/runners/worker/logger.py +++ b/sdks/python/apache_beam/runners/worker/logger.py @@ -22,8 +22,6 @@ # pytype: skip-file # mypy: disallow-untyped-defs -from __future__ import absolute_import - import contextlib import json import logging diff --git a/sdks/python/apache_beam/runners/worker/logger_test.py b/sdks/python/apache_beam/runners/worker/logger_test.py index 47a5ab54828b..158fd3b5856b 100644 --- a/sdks/python/apache_beam/runners/worker/logger_test.py +++ b/sdks/python/apache_beam/runners/worker/logger_test.py @@ -19,15 +19,11 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import unicode_literals - import json import logging import sys import threading import unittest -from builtins import object from apache_beam.runners.worker import logger from apache_beam.runners.worker import statesampler diff --git a/sdks/python/apache_beam/runners/worker/opcounters.py b/sdks/python/apache_beam/runners/worker/opcounters.py index d761be48c930..bafbf9f6c645 100644 --- a/sdks/python/apache_beam/runners/worker/opcounters.py +++ b/sdks/python/apache_beam/runners/worker/opcounters.py @@ -22,19 +22,13 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import math import random -from builtins import hex -from builtins import object +import sys from typing import TYPE_CHECKING from typing import Any from typing import Optional -from future.utils import raise_with_traceback - from apache_beam.typehints import TypeCheckError from apache_beam.typehints.decorators import _check_instance_type from apache_beam.utils import counters @@ -245,7 +239,8 @@ def type_check(self, value): error_msg = ( 'Runtime type violation detected within %s: ' '%s' % (transform_label, e)) - raise_with_traceback(TypeCheckError(error_msg)) + _, _, traceback = 
sys.exc_info() + raise TypeCheckError(error_msg).with_traceback(traceback) def do_sample(self, windowed_value): # type: (windowed_value.WindowedValue) -> None diff --git a/sdks/python/apache_beam/runners/worker/opcounters_test.py b/sdks/python/apache_beam/runners/worker/opcounters_test.py index d3ab22dcb0a3..75842fc948ab 100644 --- a/sdks/python/apache_beam/runners/worker/opcounters_test.py +++ b/sdks/python/apache_beam/runners/worker/opcounters_test.py @@ -17,15 +17,10 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import logging import math import random import unittest -from builtins import object -from builtins import range from apache_beam import coders from apache_beam.runners.worker import opcounters diff --git a/sdks/python/apache_beam/runners/worker/operation_specs.py b/sdks/python/apache_beam/runners/worker/operation_specs.py index f1aa83052b7f..75fccd864d4a 100644 --- a/sdks/python/apache_beam/runners/worker/operation_specs.py +++ b/sdks/python/apache_beam/runners/worker/operation_specs.py @@ -23,8 +23,6 @@ # pytype: skip-file -from __future__ import absolute_import - import collections from apache_beam import coders diff --git a/sdks/python/apache_beam/runners/worker/operations.py b/sdks/python/apache_beam/runners/worker/operations.py index 8c1c4a079159..19546dd98c93 100644 --- a/sdks/python/apache_beam/runners/worker/operations.py +++ b/sdks/python/apache_beam/runners/worker/operations.py @@ -22,15 +22,9 @@ # pytype: skip-file -from __future__ import absolute_import - import collections import logging -import sys import threading -from builtins import filter -from builtins import object -from builtins import zip from typing import TYPE_CHECKING from typing import Any from typing import DefaultDict @@ -73,6 +67,7 @@ from apache_beam.runners.sdf_utils import SplitResultResidual from apache_beam.runners.worker.bundle_processor import ExecutionContext from apache_beam.runners.worker.statesampler import StateSampler + from apache_beam.transforms.userstate import TimerSpec # Allow some "pure mode" declarations. 
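# [Editor's illustration, not part of the diff] Several hunks above drop the
# future.utils raise helpers (raise_, raise_with_traceback) in favour of the
# Python 3 idiom: re-raise under a new exception type while preserving the
# original traceback. Standalone sketch with illustrative names:
import sys

class WrappedError(Exception):
  pass

def reraise_as_wrapped():
  try:
    {}['missing']
  except KeyError as e:
    _, _, tb = sys.exc_info()
    # The traceback of the new exception still points at the original failure.
    raise WrappedError('lookup failed: %r' % e).with_traceback(tb)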
try: @@ -685,7 +680,7 @@ def setup(self): self.timer_specs = { spec.name: spec for spec in userstate.get_dofn_specs(fn)[1] - } + } # type: Dict[str, TimerSpec] if self.side_input_maps is None: if tags_and_types: @@ -741,7 +736,8 @@ def process_timer(self, tag, timer_data): timer_data.user_key, timer_data.windows[0], timer_data.fire_timestamp, - timer_data.paneinfo) + timer_data.paneinfo, + timer_data.dynamic_timer_tag) def finish(self): # type: () -> None @@ -1003,10 +999,7 @@ def __init__( fn, args, kwargs = pickler.loads(self.spec.combine_fn)[:3] self.combine_fn = curry_combine_fn(fn, args, kwargs) self.combine_fn_add_input = self.combine_fn.add_input - base_compact = ( - core.CombineFn.compact if sys.version_info >= - (3, ) else core.CombineFn.compact.__func__) - if self.combine_fn.compact.__func__ is base_compact: + if self.combine_fn.compact.__func__ is core.CombineFn.compact: self.combine_fn_compact = None else: self.combine_fn_compact = self.combine_fn.compact diff --git a/sdks/python/apache_beam/runners/worker/sdk_worker.py b/sdks/python/apache_beam/runners/worker/sdk_worker.py index 18909921e669..4ad0727442e5 100644 --- a/sdks/python/apache_beam/runners/worker/sdk_worker.py +++ b/sdks/python/apache_beam/runners/worker/sdk_worker.py @@ -20,10 +20,6 @@ # pytype: skip-file # mypy: disallow-untyped-defs -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import abc import collections import contextlib @@ -34,8 +30,8 @@ import threading import time import traceback -from builtins import object from concurrent import futures +from types import TracebackType from typing import TYPE_CHECKING from typing import Any from typing import Callable @@ -54,8 +50,6 @@ from typing import Union import grpc -from future.utils import raise_ -from future.utils import with_metaclass from apache_beam.coders import coder_impl from apache_beam.metrics import monitoring_infos @@ -76,14 +70,12 @@ from apache_beam.utils.sentinel import Sentinel if TYPE_CHECKING: - # TODO(BEAM-9372): move this out of the TYPE_CHECKING scope when we drop - # support for python < 3.5.3 - from types import TracebackType - ExcInfo = Tuple[Type[BaseException], BaseException, TracebackType] - OptExcInfo = Union[ExcInfo, Tuple[None, None, None]] from apache_beam.portability.api import endpoints_pb2 from apache_beam.utils.profiler import Profile +ExcInfo = Tuple[Type[BaseException], BaseException, TracebackType] +OptExcInfo = Union[ExcInfo, Tuple[None, None, None]] + T = TypeVar('T') _KT = TypeVar('_KT') _VT = TypeVar('_VT') @@ -117,11 +109,11 @@ class ShortIdCache(object): def __init__(self): # type: () -> None self._lock = threading.Lock() - self._lastShortId = 0 - self._infoKeyToShortId = {} # type: Dict[FrozenSet, str] - self._shortIdToInfo = {} # type: Dict[str, metrics_pb2.MonitoringInfo] + self._last_short_id = 0 + self._info_key_to_short_id = {} # type: Dict[FrozenSet, str] + self._short_id_to_info = {} # type: Dict[str, metrics_pb2.MonitoringInfo] - def getShortId(self, monitoring_info): + def get_short_id(self, monitoring_info): # type: (metrics_pb2.MonitoringInfo) -> str """ Returns the assigned shortId for a given MonitoringInfo, assigns one if @@ -130,29 +122,32 @@ def getShortId(self, monitoring_info): key = monitoring_infos.to_key(monitoring_info) with self._lock: try: - return self._infoKeyToShortId[key] + return self._info_key_to_short_id[key] except KeyError: - self._lastShortId += 1 + self._last_short_id += 1 # Convert to a hex string (and drop the '0x') 
for some compression - shortId = hex(self._lastShortId)[2:] + shortId = hex(self._last_short_id)[2:] payload_cleared = metrics_pb2.MonitoringInfo() payload_cleared.CopyFrom(monitoring_info) payload_cleared.ClearField('payload') - self._infoKeyToShortId[key] = shortId - self._shortIdToInfo[shortId] = payload_cleared + self._info_key_to_short_id[key] = shortId + self._short_id_to_info[shortId] = payload_cleared return shortId - def getInfos(self, short_ids): + def get_infos(self, short_ids): #type: (Iterable[str]) -> Dict[str, metrics_pb2.MonitoringInfo] """ Gets the base MonitoringInfo (with payload cleared) for each short ID. Throws KeyError if an unassigned short ID is encountered. """ - return {short_id: self._shortIdToInfo[short_id] for short_id in short_ids} + return { + short_id: self._short_id_to_info[short_id] + for short_id in short_ids + } SHORT_ID_CACHE = ShortIdCache() @@ -339,7 +334,7 @@ def _request_harness_monitoring_infos(self, request): harness_monitoring_infos=( beam_fn_api_pb2.HarnessMonitoringInfosResponse( monitoring_data={ - SHORT_ID_CACHE.getShortId(info): info.payload + SHORT_ID_CACHE.get_short_id(info): info.payload for info in process_wide_monitoring_infos }))), request) @@ -350,7 +345,7 @@ def _request_monitoring_infos(self, request): lambda: beam_fn_api_pb2.InstructionResponse( instruction_id=request.instruction_id, monitoring_infos=beam_fn_api_pb2.MonitoringInfosMetadataResponse( - monitoring_info=SHORT_ID_CACHE.getInfos( + monitoring_info=SHORT_ID_CACHE.get_infos( request.monitoring_infos.monitoring_info_id))), request) @@ -597,6 +592,7 @@ def __init__( self.log_lull_timeout_ns = ( log_lull_timeout_ns or DEFAULT_LOG_LULL_TIMEOUT_NS) self._last_full_thread_dump_secs = 0.0 + self._last_lull_logged_secs = 0.0 def do_instruction(self, request): # type: (beam_fn_api_pb2.InstructionRequest) -> beam_fn_api_pb2.InstructionResponse @@ -650,7 +646,7 @@ def process_bundle( residual_roots=delayed_applications, monitoring_infos=monitoring_infos, monitoring_data={ - SHORT_ID_CACHE.getShortId(info): info.payload + SHORT_ID_CACHE.get_short_id(info): info.payload for info in monitoring_infos }, requires_finalization=requests_finalization)) @@ -691,7 +687,8 @@ def _log_lull_in_bundle_processor(self, processor): def _log_lull_sampler_info(self, sampler_info): # type: (statesampler.StateSamplerInfo) -> None if (sampler_info and sampler_info.time_since_transition and - sampler_info.time_since_transition > self.log_lull_timeout_ns): + sampler_info.time_since_transition > self.log_lull_timeout_ns and + self._passed_lull_timeout_since_last_log()): step_name = sampler_info.state_name.step_name state_name = sampler_info.state_name.name lull_seconds = sampler_info.time_since_transition / 1e9 @@ -717,6 +714,14 @@ def _log_lull_sampler_info(self, sampler_info): if self._should_log_full_thread_dump(lull_seconds): self._log_full_thread_dump() + def _passed_lull_timeout_since_last_log(self) -> bool: + if (time.time() - self._last_lull_logged_secs > + self.log_lull_timeout_ns / 1e9): + self._last_lull_logged_secs = time.time() + return True + else: + return False + def _should_log_full_thread_dump(self, lull_seconds): # type: (float) -> bool if lull_seconds < LOG_LULL_FULL_THREAD_DUMP_LULL_S: @@ -755,7 +760,7 @@ def process_bundle_progress( process_bundle_progress=beam_fn_api_pb2.ProcessBundleProgressResponse( monitoring_infos=monitoring_infos, monitoring_data={ - SHORT_ID_CACHE.getShortId(info): info.payload + SHORT_ID_CACHE.get_short_id(info): info.payload for info in monitoring_infos })) @@ 
-799,7 +804,7 @@ def maybe_profile(self, instruction_id): yield -class StateHandler(with_metaclass(abc.ABCMeta, object)): # type: ignore[misc] +class StateHandler(metaclass=abc.ABCMeta): """An abstract object representing a ``StateHandler``.""" @abc.abstractmethod def get_raw( @@ -824,8 +829,19 @@ def clear(self, state_key): # type: (beam_fn_api_pb2.StateKey) -> _Future raise NotImplementedError(type(self)) + @abc.abstractmethod + @contextlib.contextmanager + def process_instruction_id(self, bundle_id): + # type: (str) -> Iterator[None] + raise NotImplementedError(type(self)) -class StateHandlerFactory(with_metaclass(abc.ABCMeta, object)): # type: ignore[misc] + @abc.abstractmethod + def done(self): + # type: () -> None + raise NotImplementedError(type(self)) + + +class StateHandlerFactory(metaclass=abc.ABCMeta): """An abstract factory for creating ``DataChannel``.""" @abc.abstractmethod def create_state_handler(self, api_service_descriptor): @@ -880,7 +896,7 @@ def create_state_handler(self, api_service_descriptor): # Add workerId to the grpc channel grpc_channel = grpc.intercept_channel( grpc_channel, WorkerIdInterceptor()) - self._state_handler_cache[url] = CachingStateHandler( + self._state_handler_cache[url] = GlobalCachingStateHandler( self._state_cache, GrpcStateHandler( beam_fn_api_pb2_grpc.BeamFnStateStub(grpc_channel))) @@ -895,22 +911,67 @@ def close(self): self._state_cache.evict_all() -class ThrowingStateHandler(StateHandler): - """A state handler that errors on any requests.""" - def get_raw( +class CachingStateHandler(metaclass=abc.ABCMeta): + @abc.abstractmethod + @contextlib.contextmanager + def process_instruction_id(self, bundle_id, cache_tokens): + # type: (str, Iterable[beam_fn_api_pb2.ProcessBundleRequest.CacheToken]) -> Iterator[None] + raise NotImplementedError(type(self)) + + @abc.abstractmethod + def blocking_get( self, state_key, # type: beam_fn_api_pb2.StateKey - continuation_token=None # type: Optional[bytes] + coder, # type: coder_impl.CoderImpl ): - # type: (...) -> Tuple[bytes, Optional[bytes]] + # type: (...) -> Iterable[Any] + raise NotImplementedError(type(self)) + + @abc.abstractmethod + def extend( + self, + state_key, # type: beam_fn_api_pb2.StateKey + coder, # type: coder_impl.CoderImpl + elements, # type: Iterable[Any] + ): + # type: (...) -> _Future + raise NotImplementedError(type(self)) + + @abc.abstractmethod + def clear(self, state_key): + # type: (beam_fn_api_pb2.StateKey) -> _Future + raise NotImplementedError(type(self)) + + @abc.abstractmethod + def done(self): + # type: () -> None + raise NotImplementedError(type(self)) + + +class ThrowingStateHandler(CachingStateHandler): + """A caching state handler that errors on any requests.""" + @contextlib.contextmanager + def process_instruction_id(self, bundle_id, cache_tokens): + # type: (str, Iterable[beam_fn_api_pb2.ProcessBundleRequest.CacheToken]) -> Iterator[None] + raise RuntimeError( + 'Unable to handle state requests for ProcessBundleDescriptor ' + 'for bundle id %s.' % bundle_id) + + def blocking_get( + self, + state_key, # type: beam_fn_api_pb2.StateKey + coder, # type: coder_impl.CoderImpl + ): + # type: (...) -> Iterable[Any] raise RuntimeError( 'Unable to handle state requests for ProcessBundleDescriptor without ' 'state ApiServiceDescriptor for state key %s.' % state_key) - def append_raw( + def extend( self, state_key, # type: beam_fn_api_pb2.StateKey - data # type: bytes + coder, # type: coder_impl.CoderImpl + elements, # type: Iterable[Any] ): # type: (...) 
-> _Future raise RuntimeError( @@ -923,6 +984,11 @@ def clear(self, state_key): 'Unable to handle state requests for ProcessBundleDescriptor without ' 'state ApiServiceDescriptor for state key %s.' % state_key) + def done(self): + # type: () -> None + raise RuntimeError( + 'Unable to handle state requests for ProcessBundleDescriptor.') + class GrpcStateHandler(StateHandler): @@ -1035,7 +1101,8 @@ def _blocking_request(self, request): while not req_future.wait(timeout=1): if self._exc_info: t, v, tb = self._exc_info - raise_(t, v, tb) + if t and v and tb: + raise t(v).with_traceback(tb) elif self._done: raise RuntimeError() response = req_future.get() @@ -1056,7 +1123,7 @@ def _next_id(self): return str(request_id) -class CachingStateHandler(object): +class GlobalCachingStateHandler(CachingStateHandler): """ A State handler which retrieves and caches state. If caching is activated, caches across bundles using a supplied cache token. If activated but no cache token is supplied, caching is done at the bundle diff --git a/sdks/python/apache_beam/runners/worker/sdk_worker_main.py b/sdks/python/apache_beam/runners/worker/sdk_worker_main.py index c3dda2940036..394781b15bf7 100644 --- a/sdks/python/apache_beam/runners/worker/sdk_worker_main.py +++ b/sdks/python/apache_beam/runners/worker/sdk_worker_main.py @@ -19,75 +19,41 @@ # pytype: skip-file -from __future__ import absolute_import - -import http.server import json import logging import os import re import sys -import threading import traceback -from builtins import object from google.protobuf import text_format # type: ignore # not in typeshed from apache_beam.internal import pickler +from apache_beam.io import filesystems from apache_beam.options.pipeline_options import DebugOptions +from apache_beam.options.pipeline_options import GoogleCloudOptions from apache_beam.options.pipeline_options import PipelineOptions from apache_beam.options.pipeline_options import ProfilingOptions +from apache_beam.options.value_provider import RuntimeValueProvider from apache_beam.portability.api import endpoints_pb2 from apache_beam.runners.internal import names from apache_beam.runners.worker.log_handler import FnApiLogRecordHandler from apache_beam.runners.worker.sdk_worker import SdkHarness -from apache_beam.runners.worker.worker_status import thread_dump from apache_beam.utils import profiler # This module is experimental. No backwards-compatibility guarantees. _LOGGER = logging.getLogger(__name__) +_ENABLE_GOOGLE_CLOUD_PROFILER = 'enable_google_cloud_profiler' -class StatusServer(object): - def start(self, status_http_port=0): - """Executes the serving loop for the status server. - - Args: - status_http_port(int): Binding port for the debug server. 
- Default is 0 which means any free unsecured port - """ - class StatusHttpHandler(http.server.BaseHTTPRequestHandler): - """HTTP handler for serving stacktraces of all threads.""" - def do_GET(self): # pylint: disable=invalid-name - """Return all thread stacktraces information for GET request.""" - self.send_response(200) - self.send_header('Content-Type', 'text/plain') - self.end_headers() - - self.wfile.write(thread_dump().encode('utf-8')) - - def log_message(self, f, *args): - """Do not log any messages.""" - pass - - self.httpd = httpd = http.server.HTTPServer(('localhost', status_http_port), - StatusHttpHandler) - _LOGGER.info( - 'Status HTTP server running at %s:%s', - httpd.server_name, - httpd.server_port) - - httpd.serve_forever() - - -def main(unused_argv): - """Main entry point for SDK Fn Harness.""" - if 'LOGGING_API_SERVICE_DESCRIPTOR' in os.environ: +def create_harness(environment, dry_run=False): + """Creates SDK Fn Harness.""" + if 'LOGGING_API_SERVICE_DESCRIPTOR' in environment: try: logging_service_descriptor = endpoints_pb2.ApiServiceDescriptor() text_format.Merge( - os.environ['LOGGING_API_SERVICE_DESCRIPTOR'], + environment['LOGGING_API_SERVICE_DESCRIPTOR'], logging_service_descriptor) # Send all logs to the runner. @@ -104,61 +70,87 @@ def main(unused_argv): else: fn_log_handler = None - # Start status HTTP server thread. - thread = threading.Thread( - name='status_http_server', target=StatusServer().start) - thread.daemon = True - thread.setName('status-server-demon') - thread.start() - - if 'PIPELINE_OPTIONS' in os.environ: - sdk_pipeline_options = _parse_pipeline_options( - os.environ['PIPELINE_OPTIONS']) - else: - sdk_pipeline_options = PipelineOptions.from_dictionary({}) + pipeline_options_dict = _load_pipeline_options( + environment.get('PIPELINE_OPTIONS')) + # These are used for dataflow templates. 
+ RuntimeValueProvider.set_runtime_options(pipeline_options_dict) + sdk_pipeline_options = PipelineOptions.from_dictionary(pipeline_options_dict) + filesystems.FileSystems.set_options(sdk_pipeline_options) - if 'SEMI_PERSISTENT_DIRECTORY' in os.environ: - semi_persistent_directory = os.environ['SEMI_PERSISTENT_DIRECTORY'] + if 'SEMI_PERSISTENT_DIRECTORY' in environment: + semi_persistent_directory = environment['SEMI_PERSISTENT_DIRECTORY'] else: semi_persistent_directory = None _LOGGER.info('semi_persistent_directory: %s', semi_persistent_directory) - _worker_id = os.environ.get('WORKER_ID', None) + _worker_id = environment.get('WORKER_ID', None) try: _load_main_session(semi_persistent_directory) + except CorruptMainSessionException: + exception_details = traceback.format_exc() + _LOGGER.error( + 'Could not load main session: %s', exception_details, exc_info=True) + raise except Exception: # pylint: disable=broad-except exception_details = traceback.format_exc() _LOGGER.error( 'Could not load main session: %s', exception_details, exc_info=True) - try: - _LOGGER.info( - 'Python sdk harness started with pipeline_options: %s', - sdk_pipeline_options.get_all_options(drop_default=True)) - control_service_descriptor = endpoints_pb2.ApiServiceDescriptor() - status_service_descriptor = endpoints_pb2.ApiServiceDescriptor() + _LOGGER.info( + 'Pipeline_options: %s', + sdk_pipeline_options.get_all_options(drop_default=True)) + control_service_descriptor = endpoints_pb2.ApiServiceDescriptor() + status_service_descriptor = endpoints_pb2.ApiServiceDescriptor() + text_format.Merge( + environment['CONTROL_API_SERVICE_DESCRIPTOR'], control_service_descriptor) + if 'STATUS_API_SERVICE_DESCRIPTOR' in environment: text_format.Merge( - os.environ['CONTROL_API_SERVICE_DESCRIPTOR'], - control_service_descriptor) - if 'STATUS_API_SERVICE_DESCRIPTOR' in os.environ: - text_format.Merge( - os.environ['STATUS_API_SERVICE_DESCRIPTOR'], - status_service_descriptor) - # TODO(robertwb): Support authentication. - assert not control_service_descriptor.HasField('authentication') - - experiments = sdk_pipeline_options.view_as(DebugOptions).experiments or [] - enable_heap_dump = 'enable_heap_dump' in experiments - SdkHarness( - control_address=control_service_descriptor.url, - status_address=status_service_descriptor.url, - worker_id=_worker_id, - state_cache_size=_get_state_cache_size(experiments), - data_buffer_time_limit_ms=_get_data_buffer_time_limit_ms(experiments), - profiler_factory=profiler.Profile.factory_from_options( - sdk_pipeline_options.view_as(ProfilingOptions)), - enable_heap_dump=enable_heap_dump).run() + environment['STATUS_API_SERVICE_DESCRIPTOR'], status_service_descriptor) + # TODO(robertwb): Support authentication. 
+ assert not control_service_descriptor.HasField('authentication') + + experiments = sdk_pipeline_options.view_as(DebugOptions).experiments or [] + enable_heap_dump = 'enable_heap_dump' in experiments + if dry_run: + return + sdk_harness = SdkHarness( + control_address=control_service_descriptor.url, + status_address=status_service_descriptor.url, + worker_id=_worker_id, + state_cache_size=_get_state_cache_size(experiments), + data_buffer_time_limit_ms=_get_data_buffer_time_limit_ms(experiments), + profiler_factory=profiler.Profile.factory_from_options( + sdk_pipeline_options.view_as(ProfilingOptions)), + enable_heap_dump=enable_heap_dump) + return fn_log_handler, sdk_harness, sdk_pipeline_options + + +def main(unused_argv): + """Main entry point for SDK Fn Harness.""" + fn_log_handler, sdk_harness, sdk_pipeline_options = create_harness(os.environ) + experiments = sdk_pipeline_options.view_as(DebugOptions).experiments or [] + dataflow_service_options = ( + sdk_pipeline_options.view_as(GoogleCloudOptions).dataflow_service_options + or []) + if (_ENABLE_GOOGLE_CLOUD_PROFILER in experiments) or ( + _ENABLE_GOOGLE_CLOUD_PROFILER in dataflow_service_options): + try: + import googlecloudprofiler + job_id = os.environ["JOB_ID"] + job_name = os.environ["JOB_NAME"] + if job_id and job_name: + googlecloudprofiler.start( + service=job_name, service_version=job_id, verbose=1) + _LOGGER.info('Turning on Google Cloud Profiler.') + else: + raise RuntimeError('Unable to find the job id or job name from envvar.') + except Exception as e: # pylint: disable=broad-except + _LOGGER.warning( + 'Unable to start google cloud profiler due to error: %s' % e) + try: + _LOGGER.info('Python sdk harness starting.') + sdk_harness.run() _LOGGER.info('Python sdk harness exiting.') except: # pylint: disable=broad-except _LOGGER.exception('Python sdk harness failed: ') @@ -168,20 +160,26 @@ def main(unused_argv): fn_log_handler.close() -def _parse_pipeline_options(options_json): +def _load_pipeline_options(options_json): + if options_json is None: + return {} options = json.loads(options_json) # Check the options field first for backward compatibility. if 'options' in options: - return PipelineOptions.from_dictionary(options.get('options')) + return options.get('options') else: # Remove extra urn part from the key. portable_option_regex = r'^beam:option:(?P.*):v1$' - return PipelineOptions.from_dictionary({ + return { re.match(portable_option_regex, k).group('key') if re.match( portable_option_regex, k) else k: v for k, v in options.items() - }) + } + + +def _parse_pipeline_options(options_json): + return PipelineOptions.from_dictionary(_load_pipeline_options(options_json)) def _get_state_cache_size(experiments): @@ -225,12 +223,29 @@ def _get_data_buffer_time_limit_ms(experiments): return 0 +class CorruptMainSessionException(Exception): + """ + Used to crash this worker if a main session file was provided but + is not valid. + """ + pass + + def _load_main_session(semi_persistent_directory): """Loads a pickled main session from the path specified.""" if semi_persistent_directory: session_file = os.path.join( semi_persistent_directory, 'staged', names.PICKLED_MAIN_SESSION_FILE) if os.path.isfile(session_file): + # If the expected session file is present but empty, it's likely that + # the user code run by this worker will likely crash at runtime. + # This can happen if the worker fails to download the main session. + # Raise a fatal error and crash this worker, forcing a restart. 
+ if os.path.getsize(session_file) == 0: + raise CorruptMainSessionException( + 'Session file found, but empty: %s. Functions defined in __main__ ' + '(interactive session) will almost certainly fail.' % + (session_file, )) pickler.load_session(session_file) else: _LOGGER.warning( diff --git a/sdks/python/apache_beam/runners/worker/sdk_worker_main_test.py b/sdks/python/apache_beam/runners/worker/sdk_worker_main_test.py index 6239da56c22a..ea6a0728a555 100644 --- a/sdks/python/apache_beam/runners/worker/sdk_worker_main_test.py +++ b/sdks/python/apache_beam/runners/worker/sdk_worker_main_test.py @@ -19,20 +19,15 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import logging import unittest -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import from hamcrest import all_of from hamcrest import assert_that from hamcrest import has_entry from apache_beam.options.pipeline_options import PipelineOptions +from apache_beam.options.value_provider import RuntimeValueProvider from apache_beam.runners.worker import sdk_worker_main from apache_beam.runners.worker import worker_status @@ -52,7 +47,6 @@ def _add_argparse_args(cls, parser): '--m_m_option', action='append', help='mock multi option') def test_status_server(self): - # Wrapping the method to see if it appears in threadump def wrapped_method_for_test(): threaddump = worker_status.thread_dump() @@ -95,6 +89,16 @@ def test_parse_pipeline_options(self): get_all_options(), has_entry('eam:option:m_option:v', 'mock_val')) + def test_runtime_values(self): + test_runtime_provider = RuntimeValueProvider('test_param', int, None) + sdk_worker_main.create_harness({ + 'CONTROL_API_SERVICE_DESCRIPTOR': '', + 'PIPELINE_OPTIONS': '{"test_param": 37}', + }, + dry_run=True) + self.assertTrue(test_runtime_provider.is_accessible()) + self.assertEqual(test_runtime_provider.get(), 37) + if __name__ == '__main__': logging.getLogger().setLevel(logging.INFO) diff --git a/sdks/python/apache_beam/runners/worker/sdk_worker_test.py b/sdks/python/apache_beam/runners/worker/sdk_worker_test.py index 7fa290fddb8f..92a2f72058cd 100644 --- a/sdks/python/apache_beam/runners/worker/sdk_worker_test.py +++ b/sdks/python/apache_beam/runners/worker/sdk_worker_test.py @@ -19,16 +19,11 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import contextlib import logging import threading import time import unittest -from builtins import range from collections import namedtuple import grpc @@ -47,7 +42,7 @@ from apache_beam.runners.worker import statecache from apache_beam.runners.worker import statesampler from apache_beam.runners.worker.sdk_worker import BundleProcessorCache -from apache_beam.runners.worker.sdk_worker import CachingStateHandler +from apache_beam.runners.worker.sdk_worker import GlobalCachingStateHandler from apache_beam.runners.worker.sdk_worker import SdkWorker from apache_beam.utils import thread_pool_executor from apache_beam.utils.counters import CounterName @@ -334,7 +329,7 @@ def test_log_lull_in_bundle_processor(self): with mock.patch(log_full_thread_dump_fn_name) as log_full_thread_dump: with mock.patch('time.time') as time_mock: - time_mock.return_value = now + 21 * 60 # 21 minutes + time_mock.return_value = now + 42 * 60 # 21 minutes after previous one sampler_info = self._get_state_sampler_info_for_lull(21 * 60) 
worker._log_lull_sampler_info(sampler_info) log_full_thread_dump.assert_called_once_with() @@ -362,7 +357,7 @@ def process_instruction_id(self, bundle_id): underlying_state = FakeUnderlyingState() state_cache = statecache.StateCache(100) - caching_state_hander = sdk_worker.CachingStateHandler( + caching_state_hander = GlobalCachingStateHandler( state_cache, underlying_state) state1 = beam_fn_api_pb2.StateKey( @@ -498,8 +493,7 @@ def test_append_clear_with_preexisting_state(self): underlying_state_handler = self.UnderlyingStateHandler() state_cache = statecache.StateCache(100) - handler = sdk_worker.CachingStateHandler( - state_cache, underlying_state_handler) + handler = GlobalCachingStateHandler(state_cache, underlying_state_handler) def get(): return handler.blocking_get(state, coder.get_impl()) @@ -529,8 +523,7 @@ def clear(): def test_continuation_token(self): underlying_state_handler = self.UnderlyingStateHandler() state_cache = statecache.StateCache(100) - handler = sdk_worker.CachingStateHandler( - state_cache, underlying_state_handler) + handler = GlobalCachingStateHandler(state_cache, underlying_state_handler) coder = VarIntCoder() @@ -558,10 +551,12 @@ def clear(): underlying_state_handler.set_continuations(True) underlying_state_handler.set_values([45, 46, 47], coder) with handler.process_instruction_id('bundle', [cache_token]): - self.assertEqual(get_type(), CachingStateHandler.ContinuationIterable) + self.assertEqual( + get_type(), GlobalCachingStateHandler.ContinuationIterable) self.assertEqual(get(), [45, 46, 47]) append(48, 49) - self.assertEqual(get_type(), CachingStateHandler.ContinuationIterable) + self.assertEqual( + get_type(), GlobalCachingStateHandler.ContinuationIterable) self.assertEqual(get(), [45, 46, 47, 48, 49]) clear() self.assertEqual(get_type(), list) @@ -579,7 +574,7 @@ def clear(): class ShortIdCacheTest(unittest.TestCase): def testShortIdAssignment(self): - TestCase = namedtuple('TestCase', ['expectedShortId', 'info']) + TestCase = namedtuple('TestCase', ['expected_short_id', 'info']) test_cases = [ TestCase(*args) for args in [ ( @@ -664,14 +659,14 @@ def testShortIdAssignment(self): for case in test_cases: self.assertEqual( - case.expectedShortId, - cache.getShortId(case.info), + case.expected_short_id, + cache.get_short_id(case.info), "Got incorrect short id for monitoring info:\n%s" % case.info) # Retrieve all of the monitoring infos by short id, and verify that the # metadata (everything but the payload) matches the originals - actual_recovered_infos = cache.getInfos( - case.expectedShortId for case in test_cases).values() + actual_recovered_infos = cache.get_infos( + case.expected_short_id for case in test_cases).values() for recoveredInfo, case in zip(actual_recovered_infos, test_cases): self.assertEqual( monitoringInfoMetadata(case.info), @@ -680,8 +675,8 @@ def testShortIdAssignment(self): # Retrieve short ids one more time in reverse for case in reversed(test_cases): self.assertEqual( - case.expectedShortId, - cache.getShortId(case.info), + case.expected_short_id, + cache.get_short_id(case.info), "Got incorrect short id on second retrieval for monitoring info:\n%s" % case.info) diff --git a/sdks/python/apache_beam/runners/worker/sideinputs.py b/sdks/python/apache_beam/runners/worker/sideinputs.py index 02ff4fdedcca..7192ec145455 100644 --- a/sdks/python/apache_beam/runners/worker/sideinputs.py +++ b/sdks/python/apache_beam/runners/worker/sideinputs.py @@ -19,15 +19,11 @@ # pytype: skip-file -from __future__ import absolute_import - import 
collections import logging import queue import threading import traceback -from builtins import object -from builtins import range from apache_beam.coders import observable from apache_beam.io import iobase diff --git a/sdks/python/apache_beam/runners/worker/sideinputs_test.py b/sdks/python/apache_beam/runners/worker/sideinputs_test.py index 35e5d365e786..f609cf4f814e 100644 --- a/sdks/python/apache_beam/runners/worker/sideinputs_test.py +++ b/sdks/python/apache_beam/runners/worker/sideinputs_test.py @@ -19,13 +19,9 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import time import unittest -from builtins import object -from builtins import range import mock diff --git a/sdks/python/apache_beam/runners/worker/statecache.py b/sdks/python/apache_beam/runners/worker/statecache.py index 0fd8ac750a9c..3ed0c1589c26 100644 --- a/sdks/python/apache_beam/runners/worker/statecache.py +++ b/sdks/python/apache_beam/runners/worker/statecache.py @@ -19,8 +19,6 @@ # pytype: skip-file # mypy: disallow-untyped-defs -from __future__ import absolute_import - import collections import logging import threading diff --git a/sdks/python/apache_beam/runners/worker/statecache_test.py b/sdks/python/apache_beam/runners/worker/statecache_test.py index d18dd6c98ec8..a1a175ed3473 100644 --- a/sdks/python/apache_beam/runners/worker/statecache_test.py +++ b/sdks/python/apache_beam/runners/worker/statecache_test.py @@ -18,8 +18,6 @@ """Tests for state caching.""" # pytype: skip-file -from __future__ import absolute_import - import logging import unittest diff --git a/sdks/python/apache_beam/runners/worker/statesampler.py b/sdks/python/apache_beam/runners/worker/statesampler.py index abb43a74ec2a..7230b248e444 100644 --- a/sdks/python/apache_beam/runners/worker/statesampler.py +++ b/sdks/python/apache_beam/runners/worker/statesampler.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import contextlib import threading from typing import TYPE_CHECKING diff --git a/sdks/python/apache_beam/runners/worker/statesampler_fast.pyx b/sdks/python/apache_beam/runners/worker/statesampler_fast.pyx index 3c0fddec37e9..f0e1e6ccb595 100644 --- a/sdks/python/apache_beam/runners/worker/statesampler_fast.pyx +++ b/sdks/python/apache_beam/runners/worker/statesampler_fast.pyx @@ -16,6 +16,7 @@ # # cython: profile=True +# cython: language_level=3 """State sampler for tracking time spent in execution steps. 
diff --git a/sdks/python/apache_beam/runners/worker/statesampler_slow.py b/sdks/python/apache_beam/runners/worker/statesampler_slow.py index a27e00cfcbc6..2451ab2c6474 100644 --- a/sdks/python/apache_beam/runners/worker/statesampler_slow.py +++ b/sdks/python/apache_beam/runners/worker/statesampler_slow.py @@ -19,9 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - -from builtins import object from typing import Optional from apache_beam.runners import common diff --git a/sdks/python/apache_beam/runners/worker/statesampler_test.py b/sdks/python/apache_beam/runners/worker/statesampler_test.py index b34c9b59d6ed..c9ea7e8eef97 100644 --- a/sdks/python/apache_beam/runners/worker/statesampler_test.py +++ b/sdks/python/apache_beam/runners/worker/statesampler_test.py @@ -18,13 +18,9 @@ """Tests for state sampler.""" # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import logging import time import unittest -from builtins import range from tenacity import retry from tenacity import stop_after_attempt diff --git a/sdks/python/apache_beam/runners/worker/worker_id_interceptor.py b/sdks/python/apache_beam/runners/worker/worker_id_interceptor.py index 4bcdb03c6b44..b913f2c63b63 100644 --- a/sdks/python/apache_beam/runners/worker/worker_id_interceptor.py +++ b/sdks/python/apache_beam/runners/worker/worker_id_interceptor.py @@ -18,10 +18,6 @@ """Client Interceptor to inject worker_id""" # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import collections import os from typing import Optional diff --git a/sdks/python/apache_beam/runners/worker/worker_id_interceptor_test.py b/sdks/python/apache_beam/runners/worker/worker_id_interceptor_test.py index 258eac017208..0db9c1b4ddc0 100644 --- a/sdks/python/apache_beam/runners/worker/worker_id_interceptor_test.py +++ b/sdks/python/apache_beam/runners/worker/worker_id_interceptor_test.py @@ -18,8 +18,6 @@ """Test for WorkerIdInterceptor""" # pytype: skip-file -from __future__ import absolute_import - import collections import logging import unittest diff --git a/sdks/python/apache_beam/runners/worker/worker_pool_main.py b/sdks/python/apache_beam/runners/worker/worker_pool_main.py index f4a4728c373d..e5bfff81355b 100644 --- a/sdks/python/apache_beam/runners/worker/worker_pool_main.py +++ b/sdks/python/apache_beam/runners/worker/worker_pool_main.py @@ -29,8 +29,6 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import atexit import logging diff --git a/sdks/python/apache_beam/runners/worker/worker_status.py b/sdks/python/apache_beam/runners/worker/worker_status.py index 0fe84831e718..b699f0794fce 100644 --- a/sdks/python/apache_beam/runners/worker/worker_status.py +++ b/sdks/python/apache_beam/runners/worker/worker_status.py @@ -17,9 +17,6 @@ """Worker status api handler for reporting SDK harness debug info.""" -from __future__ import absolute_import -from __future__ import division - import queue import sys import threading diff --git a/sdks/python/apache_beam/runners/worker/worker_status_test.py b/sdks/python/apache_beam/runners/worker/worker_status_test.py index 686b2d60c0ec..2b4c7dcf5561 100644 --- a/sdks/python/apache_beam/runners/worker/worker_status_test.py +++ b/sdks/python/apache_beam/runners/worker/worker_status_test.py @@ -15,8 +15,6 @@ # limitations under the License. 
# -from __future__ import absolute_import - import logging import threading import unittest diff --git a/sdks/python/apache_beam/testing/__init__.py b/sdks/python/apache_beam/testing/__init__.py index f4f43cbb1236..cce3acad34a4 100644 --- a/sdks/python/apache_beam/testing/__init__.py +++ b/sdks/python/apache_beam/testing/__init__.py @@ -14,4 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. # -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/testing/benchmarks/__init__.py b/sdks/python/apache_beam/testing/benchmarks/__init__.py index f4f43cbb1236..cce3acad34a4 100644 --- a/sdks/python/apache_beam/testing/benchmarks/__init__.py +++ b/sdks/python/apache_beam/testing/benchmarks/__init__.py @@ -14,4 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. # -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/testing/benchmarks/chicago_taxi/preprocess.py b/sdks/python/apache_beam/testing/benchmarks/chicago_taxi/preprocess.py index 057cb5863e2f..2016a2c97658 100644 --- a/sdks/python/apache_beam/testing/benchmarks/chicago_taxi/preprocess.py +++ b/sdks/python/apache_beam/testing/benchmarks/chicago_taxi/preprocess.py @@ -15,8 +15,6 @@ """Preprocessor applying tf.transform to the chicago_taxi data.""" # pytype: skip-file -from __future__ import absolute_import, division, print_function - import argparse import os @@ -24,7 +22,7 @@ import tensorflow_transform as transform import tensorflow_transform.beam as tft_beam from tensorflow_transform.coders import example_proto_coder -from tensorflow_transform.tf_metadata import dataset_metadata, dataset_schema +from tensorflow_transform.tf_metadata import dataset_metadata, schema_utils import apache_beam as beam from apache_beam.io.gcp.bigquery import ReadFromBigQuery @@ -138,7 +136,7 @@ def preprocessing_fn(inputs): filters=MetricsFilter().with_namespace(namespace)) schema = taxi.read_schema(schema_file) raw_feature_spec = taxi.get_raw_feature_spec(schema) - raw_schema = dataset_schema.from_feature_spec(raw_feature_spec) + raw_schema = schema_utils.schema_from_feature_spec(raw_feature_spec) raw_data_metadata = dataset_metadata.DatasetMetadata(raw_schema) pipeline = beam.Pipeline(argv=pipeline_args) diff --git a/sdks/python/apache_beam/testing/benchmarks/chicago_taxi/process_tfma.py b/sdks/python/apache_beam/testing/benchmarks/chicago_taxi/process_tfma.py index f2af28468ad1..ac9dcab0026a 100644 --- a/sdks/python/apache_beam/testing/benchmarks/chicago_taxi/process_tfma.py +++ b/sdks/python/apache_beam/testing/benchmarks/chicago_taxi/process_tfma.py @@ -16,8 +16,6 @@ # pytype: skip-file -from __future__ import absolute_import, division, print_function - import argparse import tensorflow as tf diff --git a/sdks/python/apache_beam/testing/benchmarks/chicago_taxi/setup.py b/sdks/python/apache_beam/testing/benchmarks/chicago_taxi/setup.py index 5882651e31f6..75c7bda05a3e 100644 --- a/sdks/python/apache_beam/testing/benchmarks/chicago_taxi/setup.py +++ b/sdks/python/apache_beam/testing/benchmarks/chicago_taxi/setup.py @@ -15,8 +15,6 @@ """Setup dependencies for local and cloud deployment.""" # pytype: skip-file -from __future__ import absolute_import - import setuptools TF_VERSION = '1.14.0' diff --git a/sdks/python/apache_beam/testing/benchmarks/chicago_taxi/tfdv_analyze_and_validate.py b/sdks/python/apache_beam/testing/benchmarks/chicago_taxi/tfdv_analyze_and_validate.py index 
ab70ccc9237a..87c631762287 100644 --- a/sdks/python/apache_beam/testing/benchmarks/chicago_taxi/tfdv_analyze_and_validate.py +++ b/sdks/python/apache_beam/testing/benchmarks/chicago_taxi/tfdv_analyze_and_validate.py @@ -15,10 +15,6 @@ """Compute stats, infer schema, and validate stats for chicago taxi example.""" # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import argparse import numpy as np diff --git a/sdks/python/apache_beam/testing/benchmarks/chicago_taxi/trainer/model.py b/sdks/python/apache_beam/testing/benchmarks/chicago_taxi/trainer/model.py index 2095520a919c..ad9e33c749dc 100644 --- a/sdks/python/apache_beam/testing/benchmarks/chicago_taxi/trainer/model.py +++ b/sdks/python/apache_beam/testing/benchmarks/chicago_taxi/trainer/model.py @@ -15,10 +15,6 @@ """Defines the model used to predict who will tip in the Chicago Taxi demo.""" # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import tensorflow as tf import tensorflow_model_analysis as tfma diff --git a/sdks/python/apache_beam/testing/benchmarks/chicago_taxi/trainer/task.py b/sdks/python/apache_beam/testing/benchmarks/chicago_taxi/trainer/task.py index a6b9960aef87..709d76c813c7 100644 --- a/sdks/python/apache_beam/testing/benchmarks/chicago_taxi/trainer/task.py +++ b/sdks/python/apache_beam/testing/benchmarks/chicago_taxi/trainer/task.py @@ -15,10 +15,6 @@ """Trainer for the chicago_taxi demo.""" # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import argparse import os diff --git a/sdks/python/apache_beam/testing/benchmarks/chicago_taxi/trainer/taxi.py b/sdks/python/apache_beam/testing/benchmarks/chicago_taxi/trainer/taxi.py index 147e2c423281..6d84c995bfca 100644 --- a/sdks/python/apache_beam/testing/benchmarks/chicago_taxi/trainer/taxi.py +++ b/sdks/python/apache_beam/testing/benchmarks/chicago_taxi/trainer/taxi.py @@ -15,12 +15,7 @@ """Utility and schema methods for the chicago_taxi sample.""" # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - from tensorflow_transform import coders as tft_coders -from tensorflow_transform.tf_metadata import dataset_schema from tensorflow_transform.tf_metadata import schema_utils from google.protobuf import text_format # type: ignore # typeshed out of date @@ -103,14 +98,14 @@ def get_raw_feature_spec(schema): def make_proto_coder(schema): raw_feature_spec = get_raw_feature_spec(schema) - raw_schema = dataset_schema.from_feature_spec(raw_feature_spec) + raw_schema = schema_utils.schema_from_feature_spec(raw_feature_spec) return tft_coders.ExampleProtoCoder(raw_schema) def make_csv_coder(schema): """Return a coder for tf.transform to read csv files.""" raw_feature_spec = get_raw_feature_spec(schema) - parsing_schema = dataset_schema.from_feature_spec(raw_feature_spec) + parsing_schema = schema_utils.schema_from_feature_spec(raw_feature_spec) return tft_coders.CsvCoder(CSV_COLUMN_NAMES, parsing_schema) diff --git a/sdks/python/apache_beam/testing/benchmarks/nexmark/__init__.py b/sdks/python/apache_beam/testing/benchmarks/nexmark/__init__.py index f4f43cbb1236..cce3acad34a4 100644 --- a/sdks/python/apache_beam/testing/benchmarks/nexmark/__init__.py +++ b/sdks/python/apache_beam/testing/benchmarks/nexmark/__init__.py @@ -14,4 +14,3 @@ # See the License for 
the specific language governing permissions and # limitations under the License. # -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/testing/benchmarks/nexmark/models/__init__.py b/sdks/python/apache_beam/testing/benchmarks/nexmark/models/__init__.py index f4f43cbb1236..cce3acad34a4 100644 --- a/sdks/python/apache_beam/testing/benchmarks/nexmark/models/__init__.py +++ b/sdks/python/apache_beam/testing/benchmarks/nexmark/models/__init__.py @@ -14,4 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. # -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/testing/benchmarks/nexmark/models/auction_bid.py b/sdks/python/apache_beam/testing/benchmarks/nexmark/models/auction_bid.py index af3600332882..7424a3a48355 100644 --- a/sdks/python/apache_beam/testing/benchmarks/nexmark/models/auction_bid.py +++ b/sdks/python/apache_beam/testing/benchmarks/nexmark/models/auction_bid.py @@ -16,8 +16,6 @@ # """Result of WinningBid transform.""" -from __future__ import absolute_import - from apache_beam.coders import coder_impl from apache_beam.coders.coders import FastCoder from apache_beam.testing.benchmarks.nexmark import nexmark_util diff --git a/sdks/python/apache_beam/testing/benchmarks/nexmark/models/nexmark_model.py b/sdks/python/apache_beam/testing/benchmarks/nexmark/models/nexmark_model.py index 0cb7aa889f7c..4613d7f90c26 100644 --- a/sdks/python/apache_beam/testing/benchmarks/nexmark/models/nexmark_model.py +++ b/sdks/python/apache_beam/testing/benchmarks/nexmark/models/nexmark_model.py @@ -26,8 +26,6 @@ - The bid on an item for auction (Bid). """ -from __future__ import absolute_import - from apache_beam.coders import coder_impl from apache_beam.coders.coders import FastCoder from apache_beam.coders.coders import StrUtf8Coder diff --git a/sdks/python/apache_beam/testing/benchmarks/nexmark/monitor.py b/sdks/python/apache_beam/testing/benchmarks/nexmark/monitor.py index fab62188a5c6..9d363bfeec61 100644 --- a/sdks/python/apache_beam/testing/benchmarks/nexmark/monitor.py +++ b/sdks/python/apache_beam/testing/benchmarks/nexmark/monitor.py @@ -15,8 +15,6 @@ # limitations under the License. 
# -from __future__ import absolute_import - from time import time import apache_beam as beam diff --git a/sdks/python/apache_beam/testing/benchmarks/nexmark/nexmark_launcher.py b/sdks/python/apache_beam/testing/benchmarks/nexmark/nexmark_launcher.py index e7ce033fb2a1..89b6dce2b58f 100644 --- a/sdks/python/apache_beam/testing/benchmarks/nexmark/nexmark_launcher.py +++ b/sdks/python/apache_beam/testing/benchmarks/nexmark/nexmark_launcher.py @@ -59,10 +59,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import argparse import logging import time @@ -245,7 +241,7 @@ def read_from_pubsub(self): | 'deserialization' >> beam.ParDo(nexmark_util.ParseJsonEventFn())) return events - def run_query(self, query, query_args, query_errors): + def run_query(self, query, query_args, pipeline_options, query_errors): try: self.pipeline = beam.Pipeline(options=self.pipeline_options) nexmark_util.setup_coder() @@ -263,7 +259,7 @@ def run_query(self, query, query_args, query_errors): events = self.read_from_file() events = events | 'event_monitor' >> beam.ParDo(event_monitor.doFn) - output = query.load(events, query_args) + output = query.load(events, query_args, pipeline_options) output | 'result_monitor' >> beam.ParDo(result_monitor.doFn) # pylint: disable=expression-not-assigned result = self.pipeline.run() @@ -430,7 +426,11 @@ def run(self): query_errors = [] for i in self.args.query: logging.info('Running query %d', i) - self.run_query(queries[i], query_args, query_errors=query_errors) + self.run_query( + queries[i], + query_args, + self.pipeline_options, + query_errors=query_errors) if query_errors: logging.error('Query failed with %s', ', '.join(query_errors)) diff --git a/sdks/python/apache_beam/testing/benchmarks/nexmark/nexmark_util.py b/sdks/python/apache_beam/testing/benchmarks/nexmark/nexmark_util.py index 85181795fa75..570fcb1e1ec0 100644 --- a/sdks/python/apache_beam/testing/benchmarks/nexmark/nexmark_util.py +++ b/sdks/python/apache_beam/testing/benchmarks/nexmark/nexmark_util.py @@ -34,9 +34,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import json import logging import threading diff --git a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/nexmark_query_util.py b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/nexmark_query_util.py index 0d12b697844e..85bece4083c6 100644 --- a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/nexmark_query_util.py +++ b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/nexmark_query_util.py @@ -16,8 +16,6 @@ # """Utilities for working with NEXmark data stream.""" -from __future__ import absolute_import - import apache_beam as beam from apache_beam.testing.benchmarks.nexmark.models import nexmark_model diff --git a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query0.py b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query0.py index a4c50550a6dc..904e1d208dc0 100644 --- a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query0.py +++ b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query0.py @@ -27,8 +27,6 @@ # pytype: skip-file -from __future__ import absolute_import - import apache_beam as beam @@ -40,7 +38,7 @@ def process(self, element): yield recon -def load(events, query_args=None): +def load(events, metadata=None, pipeline_options=None): return ( events | 'serialization_and_deserialization' >> beam.ParDo(RoundTripFn())) 
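The Nexmark hunks above and below converge every query module on the same entry point, `load(events, metadata=None, pipeline_options=None)`, so that `nexmark_launcher.run_query` can thread the pipeline options into any registered query uniformly (the launcher change earlier in this diff now calls `query.load(events, query_args, pipeline_options)`). A minimal sketch of a query module written against that signature, assuming only `apache_beam`; the transform label and the pass-through body are illustrative and not taken from any real query:

import apache_beam as beam


def load(events, metadata=None, pipeline_options=None):
  """Hypothetical query: pass events through unchanged.

  `metadata` carries the per-query arguments (the old `query_args`/`metadata`
  parameter); `pipeline_options` is the new third parameter that the launcher
  now supplies to every query, whether or not the query uses it.
  """
  return events | 'pass_through' >> beam.Map(lambda event: event)

With this shape in place, the launcher can invoke every query the same way, and queries that need runner-specific settings (query10 above, for example, which reads GoogleCloudOptions) can inspect `pipeline_options` while the others simply ignore it.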
diff --git a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query1.py b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query1.py index acb205fe2395..2173a93c2abe 100644 --- a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query1.py +++ b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query1.py @@ -25,8 +25,6 @@ """ # pytype: skip-file -from __future__ import absolute_import - import apache_beam as beam from apache_beam.testing.benchmarks.nexmark.models import nexmark_model from apache_beam.testing.benchmarks.nexmark.queries import nexmark_query_util @@ -34,7 +32,7 @@ USD_TO_EURO = 0.89 -def load(events, query_args=None): +def load(events, metadata=None, pipeline_options=None): return ( events | nexmark_query_util.JustBids() diff --git a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query10.py b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query10.py index f678d8da68f6..8640e50867f9 100644 --- a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query10.py +++ b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query10.py @@ -22,8 +22,6 @@ 2*max_workers log files. """ -from __future__ import absolute_import - import apache_beam as beam from apache_beam.options.pipeline_options import GoogleCloudOptions from apache_beam.transforms import trigger @@ -31,7 +29,7 @@ from apache_beam.utils.timestamp import Duration NUM_SHARD_PER_WORKER = 5 -LATE_BATCHING_PERIOD = Duration.of(10) +LATE_BATCHING_PERIOD = 10 output_path = None max_num_workers = 5 @@ -78,7 +76,7 @@ def index_path_for(window): return None -def load(events, pipeline_options, metadata=None): +def load(events, metadata=None, pipeline_options=None): return ( events | 'query10_shard_events' >> beam.ParDo(ShardEventsDoFn()) diff --git a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query11.py b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query11.py index 1da48df39923..644e070eb907 100644 --- a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query11.py +++ b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query11.py @@ -24,8 +24,6 @@ bids per session. """ -from __future__ import absolute_import - import apache_beam as beam from apache_beam.testing.benchmarks.nexmark.queries import nexmark_query_util from apache_beam.testing.benchmarks.nexmark.queries.nexmark_query_util import ResultNames @@ -33,7 +31,7 @@ from apache_beam.transforms import window -def load(events, metadata=None): +def load(events, metadata=None, pipeline_options=None): return ( events diff --git a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query12.py b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query12.py index 92a2d0ddaa3b..160efb18095a 100644 --- a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query12.py +++ b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query12.py @@ -23,8 +23,6 @@ Emit the count of bids per window. 
""" -from __future__ import absolute_import - import apache_beam as beam from apache_beam.testing.benchmarks.nexmark.queries import nexmark_query_util from apache_beam.testing.benchmarks.nexmark.queries.nexmark_query_util import ResultNames @@ -32,7 +30,7 @@ from apache_beam.transforms import window -def load(events, metadata=None): +def load(events, metadata=None, pipeline_options=None): return ( events | nexmark_query_util.JustBids() diff --git a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query2.py b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query2.py index db3d33c6b65d..4295d0a51702 100644 --- a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query2.py +++ b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query2.py @@ -25,14 +25,12 @@ # pytype: skip-file -from __future__ import absolute_import - import apache_beam as beam from apache_beam.testing.benchmarks.nexmark.queries import nexmark_query_util from apache_beam.testing.benchmarks.nexmark.queries.nexmark_query_util import ResultNames -def load(events, metadata=None): +def load(events, metadata=None, pipeline_options=None): return ( events | nexmark_query_util.JustBids() diff --git a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query3.py b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query3.py index f723f5a286cb..2ddb25d00d5b 100644 --- a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query3.py +++ b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query3.py @@ -31,8 +31,6 @@ the stored person record. """ -from __future__ import absolute_import - import logging import apache_beam as beam @@ -45,7 +43,7 @@ from apache_beam.transforms.userstate import on_timer -def load(events, metadata=None): +def load(events, metadata=None, pipeline_options=None): num_events_in_pane = 30 windowed_events = ( events diff --git a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query4.py b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query4.py index e4226197f56d..81f35224620d 100644 --- a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query4.py +++ b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query4.py @@ -39,8 +39,6 @@ window_size_sec and period window_period_sec. """ -from __future__ import absolute_import - import apache_beam as beam from apache_beam.testing.benchmarks.nexmark.queries import nexmark_query_util from apache_beam.testing.benchmarks.nexmark.queries import winning_bids @@ -48,7 +46,7 @@ from apache_beam.transforms import window -def load(events, metadata=None): +def load(events, metadata=None, pipeline_options=None): # find winning bids for each closed auction all_winning_bids = ( events diff --git a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query5.py b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query5.py index 766a84e6582e..3c5e572fed08 100644 --- a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query5.py +++ b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query5.py @@ -31,15 +31,13 @@ windows, and we'll also preserve the bid counts. 
""" -from __future__ import absolute_import - import apache_beam as beam from apache_beam.testing.benchmarks.nexmark.queries import nexmark_query_util from apache_beam.testing.benchmarks.nexmark.queries.nexmark_query_util import ResultNames from apache_beam.transforms import window -def load(events, metadata=None): +def load(events, metadata=None, pipeline_options=None): return ( events | nexmark_query_util.JustBids() diff --git a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query6.py b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query6.py index 6dabe3779868..0f8a0eb59325 100644 --- a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query6.py +++ b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query6.py @@ -28,9 +28,6 @@ GROUP BY Q.seller; """ -from __future__ import absolute_import -from __future__ import division - import apache_beam as beam from apache_beam.testing.benchmarks.nexmark.queries import nexmark_query_util from apache_beam.testing.benchmarks.nexmark.queries import winning_bids @@ -39,7 +36,7 @@ from apache_beam.transforms import window -def load(events, metadata=None): +def load(events, metadata=None, pipeline_options=None): # find winning bids for each closed auction return ( events diff --git a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query7.py b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query7.py index 8f16b497fbcc..930eb08f0366 100644 --- a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query7.py +++ b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query7.py @@ -29,14 +29,12 @@ (A combiner, as used in Query 5, is a more efficient approach.). """ -from __future__ import absolute_import - import apache_beam as beam from apache_beam.testing.benchmarks.nexmark.queries import nexmark_query_util from apache_beam.transforms import window -def load(events, metadata=None): +def load(events, metadata=None, pipeline_options=None): # window bids into fixed window sliding_bids = ( events diff --git a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query8.py b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query8.py index 8f0e42356672..59a0459742c1 100644 --- a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query8.py +++ b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query8.py @@ -27,15 +27,13 @@ shorter window. 
""" -from __future__ import absolute_import - import apache_beam as beam from apache_beam.testing.benchmarks.nexmark.queries import nexmark_query_util from apache_beam.testing.benchmarks.nexmark.queries.nexmark_query_util import ResultNames from apache_beam.transforms import window -def load(events, metadata=None): +def load(events, metadata=None, pipeline_options=None): # window person and key by persons' id persons_by_id = ( events diff --git a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query9.py b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query9.py index 180d8500d062..05f3f726b045 100644 --- a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query9.py +++ b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query9.py @@ -20,14 +20,12 @@ See winning_bids.py for detailed documentation """ -from __future__ import absolute_import - import apache_beam as beam from apache_beam.testing.benchmarks.nexmark.queries import nexmark_query_util from apache_beam.testing.benchmarks.nexmark.queries import winning_bids -def load(events, metadata=None): +def load(events, metadata=None, pipeline_options=None): return ( events | beam.Filter(nexmark_query_util.auction_or_bid) diff --git a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/winning_bids.py b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/winning_bids.py index 6cb69db3e263..94f84b6d20dd 100644 --- a/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/winning_bids.py +++ b/sdks/python/apache_beam/testing/benchmarks/nexmark/queries/winning_bids.py @@ -31,8 +31,6 @@ and auctions together without requiring global state. """ -from __future__ import absolute_import - import apache_beam as beam from apache_beam.coders import coder_impl from apache_beam.coders.coders import FastCoder diff --git a/sdks/python/apache_beam/testing/data/vcf/valid-4.0.vcf b/sdks/python/apache_beam/testing/data/vcf/valid-4.0.vcf deleted file mode 100644 index e9b064ba0947..000000000000 --- a/sdks/python/apache_beam/testing/data/vcf/valid-4.0.vcf +++ /dev/null @@ -1,23 +0,0 @@ -##fileformat=VCFv4.0 -##fileDate=20090805 -##source=myImputationProgramV3.1 -##reference=1000GenomesPilot-NCBI36 -##phasing=partial -##INFO= -##INFO= -##INFO= -##INFO= -##INFO= -##INFO= -##FILTER= -##FILTER= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003 -20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,. -20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3 -20 1110696 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4 -20 1230237 . T . 
47 PASS NS=3;DP=13;AA=T GT:GQ:DP:HQ 0|0:54:7:56,60 0|0:48:4:51,51 0/0:61:2 -19 1234567 microsat1 GTCT G,GTACT 50 PASS NS=3;DP=9;AA=G GT:GQ:DP 0/1:35:4 0/2:17:2 1/1:40:3 \ No newline at end of file diff --git a/sdks/python/apache_beam/testing/data/vcf/valid-4.0.vcf.bz2 b/sdks/python/apache_beam/testing/data/vcf/valid-4.0.vcf.bz2 deleted file mode 100644 index dff64e61a60b..000000000000 Binary files a/sdks/python/apache_beam/testing/data/vcf/valid-4.0.vcf.bz2 and /dev/null differ diff --git a/sdks/python/apache_beam/testing/data/vcf/valid-4.0.vcf.gz b/sdks/python/apache_beam/testing/data/vcf/valid-4.0.vcf.gz deleted file mode 100644 index ac581282d3c0..000000000000 Binary files a/sdks/python/apache_beam/testing/data/vcf/valid-4.0.vcf.gz and /dev/null differ diff --git a/sdks/python/apache_beam/testing/data/vcf/valid-4.1-large.vcf b/sdks/python/apache_beam/testing/data/vcf/valid-4.1-large.vcf deleted file mode 100644 index c47068521cdb..000000000000 --- a/sdks/python/apache_beam/testing/data/vcf/valid-4.1-large.vcf +++ /dev/null @@ -1,10000 +0,0 @@ -##fileformat=VCFv4.1 -##fileDate=20121204 -##center=Complete Genomics -##source=CGAPipeline_2.2.0.26 -##source_GENOME_REFERENCE=NCBI build 37 -##source_MAX_PLOIDY=10 -##source_NUMBER_LEVELS=GS01868-DNA_H02:7 -##source_NONDIPLOID_WINDOW_WIDTH=100000 -##source_MEAN_GC_CORRECTED_CVG=GS01868-DNA_H02:41.51 -##source_GENE_ANNOTATIONS=NCBI build 37.2 -##source_DBSNP_BUILD=dbSNP build 135 -##source_MEI_1000G_ANNOTATIONS=INITIAL-DATA-RELEASE -##source_COSMIC=COSMIC v59 -##source_DGV_VERSION=9 -##source_MIRBASE_VERSION=mirBase version 18 -##source_PFAM_DATE=April 21, 2011 -##source_REPMASK_GENERATED_AT=2011-Feb-15 10:08 -##source_SEGDUP_GENERATED_AT=2010-Dec-01 13:40 -##phasing=partial -##reference=ftp://ftp.completegenomics.com/ReferenceFiles/build37.fa.bz2 -##contig= -##contig= -##contig= -##contig= -##contig= -##contig= -##contig= -##contig= -##contig= -##contig= -##contig= -##contig= -##contig= -##contig= -##contig= -##contig= -##contig= -##contig= -##contig= -##contig= -##contig= -##contig= -##contig= -##contig= -##contig= -##ALT= -##ALT= -##ALT= -##ALT= -##ALT= -##ALT= -##ALT= -##ALT= -##ALT= -##FILTER= -##FILTER= -##FILTER= -##FILTER= -##FILTER= -##FILTER= -##FILTER= -##FILTER= -##FILTER= -##INFO= -##INFO= -##INFO= -##INFO= -##INFO= -##INFO= -##INFO= -##INFO= -##INFO= -##INFO= -##INFO= -##INFO= -##INFO= -##INFO= -##INFO= -##INFO= -##INFO= -##INFO= -##INFO= -##INFO= -##INFO= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT GS000016676-ASM -1 1 . N . . END=10000;NS=1;AN=0 GT:PS ./.:. -1 10001 . T . . NS=1;CGA_WINEND=12000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.67:1.44:.:0:.:0:0.999:152 -1 10001 . T . . END=11038;NS=1;AN=0 GT:PS ./.:. -1 11048 . CGCACGGCGCCGGGCTGGGGCGGGGGGAGGGTGGCGCC . . . NS=1;AN=0 GT:PS ./.:. -1 11270 . AGAGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 11302 . GGGCACTGCAGGGCCCTCTTGCTTACTGTATAGTGGTGGCACGCCG . . . NS=1;AN=0 GT:PS ./.:. -1 11388 . AGGTGTAGTGGCAGCACGCCCACCTGCTGGCAGCTGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 11475 . ACACCCGGAGCATATGCTGTTTGGTCTCAGTAGACTC . . . NS=1;AN=0 GT:PS ./.:. -1 11528 . TGGGTTTAAAAGTAAAAAATAAATATGTTTAATTTGTGAAC . . . NS=1;AN=0 GT:PS ./.:. 
-1 11650 . TGGATTTTTGCCAGTCTAACAGGTGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 11707 . TGGGGCCTGGCCATGTGTATTTTTTTAAATTTC . . . NS=1;AN=0 GT:PS ./.:. -1 11769 . TGAGAATGACTGCGCAAATTTGCCGGATTTCCTTTGCT . . . NS=1;AN=0 GT:PS ./.:. -1 11841 . CCGGGTATCATTCACCATTTTTCTTTTCGTT . . . NS=1;AN=0 GT:PS ./.:. -1 11891 . CTTTGACCTCTTCTTTCTGTTCATGTGTATTTGC . . . NS=1;AN=0 GT:PS ./.:. -1 11958 . ACCGGGCCTTTGAGAGGTCACAGGGTCTTGATGCTGTGGTCTTCATC . . . NS=1;AN=0 GT:PS ./.:. -1 12001 . C . . NS=1;CGA_WINEND=14000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:2.26:1.68:.:0:.:0:0.999:152 -1 12027 . ACTGCTGGCCTGTGCCAGGGTGCAAG . . . NS=1;AN=0 GT:PS ./.:. -1 12099 . AGTGGGATGGGCCATTGTTCATCTTC . . . NS=1;AN=0 GT:PS ./.:. -1 12135 . TGTCTGCATGTAACTTAATACCACAACCAGGCATAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 12187 . AAGATGAGTGAGAGCATC . . . NS=1;AN=0 GT:PS ./.:. -1 12238 . CTTGTGC . . . NS=1;AN=0 GT:PS ./.:. -1 12264 . ACGTGGCCGGCCCTCGCTCCAGCAGCTGGACCCCTACCTGCCGTCTGCTGCC . . . NS=1;AN=0 GT:PS ./.:. -1 12329 . GCCGGGCTGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 12380 . TCTGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 12551 . GGTAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 12656 . CCAGAGCTGCAGAAGACGAC . . . NS=1;AN=0 GT:PS ./.:. -1 12734 . TAGACAGTGAGTGGGAGTGGCGTCGCC . . . NS=1;AN=0 GT:PS ./.:. -1 12780 . GGCGTCTCCTGTCTCCTGGAGAGGCTTCGATG . . . NS=1;AN=0 GT:PS ./.:. -1 12829 . GATCTTCCCTGTGATGTCATCTGGAGCCCT . . . NS=1;AN=0 GT:PS ./.:. -1 12936 . CAGCAAACAGTCTGCAT . . . NS=1;AN=0 GT:PS ./.:. -1 12976 . TCAGAGCCCAGGCCAGGGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 13113 . AAGTGAGGT . . . NS=1;AN=0 GT:PS ./.:. -1 13175 . GGGGAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 13270 . CCAGTGATACACCCG . . . NS=1;AN=0 GT:PS ./.:. -1 13299 . ACACGCTGTTGGCCTGGATCTGAGCCCTGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 13358 . ATTGCTGCTGTGTGGAAGTTCACTCC . . . NS=1;AN=0 GT:PS ./.:. -1 13410 . ACCACCCCGAGATCAC . . . NS=1;AN=0 GT:PS ./.:. -1 13610 . GTGTTGC . . . NS=1;AN=0 GT:PS ./.:. -1 13653 . AAACAGGGGAATCCCGAAGAAATGGTGGGTCCTGGCCATCCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 13747 . CTGCGTGGCCGAGGGCCAGGCTTCTCACTGGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 13810 . ACCTTCTTAGAAGCGAGACGGAGCAGACCCAT . . . NS=1;AN=0 GT:PS ./.:. -1 13865 . ACTAAAGTTAGCTGCCCTGGACTATTCACCCCCTA . . . NS=1;AN=0 GT:PS ./.:. -1 13954 . ACCTCCCCCACCTTCTTCCTGAGTCATTCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 14001 . A . . NS=1;CGA_WINEND=16000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:2.19:1.78:.:0:.:0:0.999:152 -1 14108 . ATCTTCTACCC . . . NS=1;AN=0 GT:PS ./.:. -1 14162 . ACTATTT . . . NS=1;AN=0 GT:PS ./.:. -1 14207 . GGAGACT . . . NS=1;AN=0 GT:PS ./.:. -1 14245 . GACTGCC . . . NS=1;AN=0 GT:PS ./.:. -1 14351 . AGACGTT . . . NS=1;AN=0 GT:PS ./.:. -1 14433 . GCCGTTTTCTCTGGAAGCCTCTTAAGAACACAGTGGCGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 14495 . ATGGAGCACAGGCAGACAGAAGTCCCCGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 14539 . TCAAGCCAGCCTTCCGCTC . . . NS=1;AN=0 GT:PS ./.:. -1 14650 . CAACGGC . . . NS=1;AN=0 GT:PS ./.:. -1 14668 . TCTGGGGGGGAAGGTGTCATGGAGCCCCCTACGATTCCCAGTCGTCC . . . NS=1;AN=0 GT:PS ./.:. -1 14733 . GGCTGCTGCGGTGGCGGCAGAGGAG . . . NS=1;AN=0 GT:PS ./.:. -1 14809 . CAGGTCCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 14904 . GGAAGAAAAAGGCAGGACAGAATTACAAGGTGC . . . NS=1;AN=0 GT:PS ./.:. -1 14973 . TGCGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 15012 . AGTGCCCACCTTGGCTCGTGGCTCTCACTGCAACGGGAAAGCCAC . . . NS=1;AN=0 GT:PS ./.:. -1 15115 . GACACTC . . . NS=1;AN=0 GT:PS ./.:. -1 15186 . CACCGGGCACTGATGAGACAGCGGCTGTT . . . NS=1;AN=0 GT:PS ./.:. -1 15237 . CTCGGGGCCAGGGCCAGGGTGTGCAGCACCACTGTACAATGG . . . NS=1;AN=0 GT:PS ./.:. -1 15387 . GGCGCAG . . . 
NS=1;AN=0 GT:PS ./.:. -1 15422 . ACAGCAGGCATCATCAGTAGCCTCCAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 15510 . GACCGCTCTTGGCAGTCGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 15588 . TCCCAAACCAG . . . NS=1;AN=0 GT:PS ./.:. -1 15814 . GCTGCTGCTTCTCCAGCTTTCGCTCCTTCATGCT . . . NS=1;AN=0 GT:PS ./.:. -1 15862 . TGCCGAT . . . NS=1;AN=0 GT:PS ./.:. -1 15893 . TAGCAGAGTGGCCAGCCACCGGAGGGGTCAACC . . . NS=1;AN=0 GT:PS ./.:. -1 15953 . GCCGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 15988 . TGCTCAGGCAGGGCTGGGGAAGCTTACTGT . . . NS=1;AN=0 GT:PS ./.:. -1 16001 . C . . NS=1;CGA_WINEND=18000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:2.06:1.76:.:0:.:0:0.999:152 -1 16055 . AAACGAGGAGCCCTGCG . . . NS=1;AN=0 GT:PS ./.:. -1 16100 . GTGTGGGGGCCTGGGCACTGACTTCTGCA . . . NS=1;AN=0 GT:PS ./.:. -1 16203 . CCCTGAG . . . NS=1;AN=0 GT:PS ./.:. -1 16254 . AGGGGTTTTGTGCCACTTCTGGATGCTAGGGTTACACTGGGAGACACA . . . NS=1;AN=0 GT:PS ./.:. -1 16374 . GGAATGGG . . . NS=1;AN=0 GT:PS ./.:. -1 16484 . ATATTTGAAATGGAAACTATTCAAAAAATTGA . . . NS=1;AN=0 GT:PS ./.:. -1 16531 . TAACAAA . . . NS=1;AN=0 GT:PS ./.:. -1 16567 . GCACGCCAGAAATCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 16678 . GGGAGTGGGGGTGC . . . NS=1;AN=0 GT:PS ./.:. -1 16715 . GGGGTGGTGGTGGGGGCGGTGGGGGTGGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 16798 . AAGGTGTGTGACCAGGGAGGTCCCCGGCCCAGCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 16853 . CCTACCT . . . NS=1;AN=0 GT:PS ./.:. -1 16954 . CATGAGGTCGTTGGCAATGCCGGGCAGGTCAGGCAGGTAGGATGGA . . . NS=1;AN=0 GT:PS ./.:. -1 17219 . CCCACCC . . . NS=1;AN=0 GT:PS ./.:. -1 17362 . CTTCTACCTACAGAGGCGACATGGGGGTCAGGCAAGCTGACACCCGCT . . . NS=1;AN=0 GT:PS ./.:. -1 17481 . GCCGAGCCACCCGTCACCCCCTGGC . . . NS=1;AN=0 GT:PS ./.:. -1 17594 . C . . . NS=1;AN=0 GT:PS ./.:. -1 17713 . CACGCACACAGGAAAGTCCTTCAGCTTCTCCTGAGAGGGCCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 17805 . T . . . NS=1;AN=0 GT:PS ./.:. -1 17882 . GTGTGTG . . . NS=1;AN=0 GT:PS ./.:. -1 17942 . G . . . NS=1;AN=0 GT:PS ./.:. -1 17995 . CGCCCGTGAAGATGGAGCCATATTCCTGCAGGCGCCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 18001 . T . . NS=1;CGA_WINEND=20000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.79:1.98:.:0:.:0:0.999:152 -1 18083 . TGAGGGGGCCCC . . . NS=1;AN=0 GT:PS ./.:. -1 18140 . TGGAAGCCTGGGCGAGAAGAAAGCTCAA . . . NS=1;AN=0 GT:PS ./.:. -1 18183 . CAGGGCAGAGACTGGGCAGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 18253 . GGGTATG . . . NS=1;AN=0 GT:PS ./.:. -1 18350 . CTCCGGCTCTGCTCTACCTGCTGGGAGAT . . . NS=1;AN=0 GT:PS ./.:. -1 18496 . CCTACTT . . . NS=1;AN=0 GT:PS ./.:. -1 18562 . GTCGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 18640 . GAAGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 18691 . GGCTACTGATGGGGCAAGCACTTC . . . NS=1;AN=0 GT:PS ./.:. -1 18742 . ACAATGTGGCCTCTGCAGAGGGGGAACGGAGACCGGAG . . . NS=1;AN=0 GT:PS ./.:. -1 18831 . GAACTGCCCCTGCACATACTGAACGGCTCACTGAGCAAACCCCGAGT . . . NS=1;AN=0 GT:PS ./.:. -1 18930 . GGTGCGGGGTGGGCCCAGTGATATCAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 19001 . TGCATCT . . . NS=1;AN=0 GT:PS ./.:. -1 19048 . CTGATGC . . . NS=1;AN=0 GT:PS ./.:. -1 19169 . CACAACATCCTCCTCCCAGTCGCCCCTGTA . . . NS=1;AN=0 GT:PS ./.:. -1 19265 . GCACTCACCGGGCACGAGCGAGCCTGT . . . NS=1;AN=0 GT:PS ./.:. -1 19303 . GGATGAGAAGGCAGAGGCGCGAC . . . NS=1;AN=0 GT:PS ./.:. -1 19387 . AGGCGCGACTGGGGTTCATGAGGAAAGGGAGGGGGAGGATGTGGGATGG . . . NS=1;AN=0 GT:PS ./.:. -1 19589 . CCGTGCCCTAAAGGGTCTGCCCTGATTACTCCTGGCTCCTTGTG . . . NS=1;AN=0 GT:PS ./.:. -1 19680 . AAGCGGC . . . NS=1;AN=0 GT:PS ./.:. -1 19772 . CGGGACCACCACCCAGCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 19855 . TGTCTTAA . . . NS=1;AN=0 GT:PS ./.:. -1 19939 . TTCGAGGTCCACAGGGGCAGTGGGA . . . NS=1;AN=0 GT:PS ./.:. 
-1 20001 . C . . NS=1;CGA_WINEND=22000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:2.01:1.69:.:0:.:0:0.999:152 -1 20096 . AACAGAGAGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 20133 . GAGTCCCAGGGGCCAGCACTGCTCGAAATGTACAGCATTTCTC . . . NS=1;AN=0 GT:PS ./.:. -1 20188 . TTATTAGCCTGCTGTGCCCGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 20242 . CAGGATTTTGAC . . . NS=1;AN=0 GT:PS ./.:. -1 20301 . GGAGAGAACATATAGGAAAAATCAG . . . NS=1;AN=0 GT:PS ./.:. -1 20385 . CACGGAG . . . NS=1;AN=0 GT:PS ./.:. -1 20463 . TAAGCTGGT . . . NS=1;AN=0 GT:PS ./.:. -1 20508 . GCCAGAGGGTAGACTGCAATCACCAAGATGAAATTTACAAGGA . . . NS=1;AN=0 GT:PS ./.:. -1 20592 . TATAAATACAGAAGGTGGAGGGAACTTGCTTTAGACACAG . . . NS=1;AN=0 GT:PS ./.:. -1 20708 . AAAGGCAATGAGATCTTAGGGCACACAGCTCCCCGCCCCTCTT . . . NS=1;AN=0 GT:PS ./.:. -1 20810 . AAAACAGA . . . NS=1;AN=0 GT:PS ./.:. -1 20859 . CAGAGGGTGAC . . . NS=1;AN=0 GT:PS ./.:. -1 20911 . AGCAGGAGGAGAGAGCACAGCCTGCAAT . . . NS=1;AN=0 GT:PS ./.:. -1 20957 . CACCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 20982 . ACGCCAGTGAGGCCAGAGGCCGGGCTGTGCTGGGGCCTGAGCCGGGTGGT . . . NS=1;AN=0 GT:PS ./.:. -1 21073 . GAGGAGCATGTTTAAGGGGACGGG . . . NS=1;AN=0 GT:PS ./.:. -1 21118 . ACCGAAAAAGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 21169 . AGGAGGGGCAAGTGGAGGAGGAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 21221 . GTCGTCA . . . NS=1;AN=0 GT:PS ./.:. -1 21316 . CTTGCAAGTCCCCTGTCTGTAGCCTCACCCCTGTCGTATC . . . NS=1;AN=0 GT:PS ./.:. -1 21408 . CTTGTCCCTTCCGTGACGGATGCCTGA . . . NS=1;AN=0 GT:PS ./.:. -1 21500 . CACGCCTGAATCAACTTGA . . . NS=1;AN=0 GT:PS ./.:. -1 21580 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.83|rs526642;CGA_FI=653635|NR_024540.1|WASH5P|INTRON|UNKNOWN-INC;CGA_SDO=5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:126:126,168:126,168:48,48:-168,-126,0:-48,-48,0:1:1,1:0 -1 21623 . ACACAAA . . . NS=1;AN=0 GT:PS ./.:. -1 21707 . CCCGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 21766 . CCCCTCCCACCCCTGTGCAGGCCGGCCTTCGCGGC . . . NS=1;AN=0 GT:PS ./.:. -1 21837 . CCTCCCTCCAAGCCTGCAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 21870 . CCCTGCCA . . . NS=1;AN=0 GT:PS ./.:. -1 21988 . GCAATGGCCCCATTTCCCTTGGGGAATCCATCTCTCTCGCA . . . NS=1;AN=0 GT:PS ./.:. -1 22001 . T . . NS=1;CGA_WINEND=24000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.65:1.60:.:0:.:0:0.999:152 -1 22067 . GCTCCTCAGTCTAAGCCAAGTGGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 22109 . CCCATTAA . . . NS=1;AN=0 GT:PS ./.:. -1 22152 . GGGATGAGTGAGTGTGGCTTCTGGAGGAAGTGGGGACACAGGACAGC . . . NS=1;AN=0 GT:PS ./.:. -1 22246 . CGAGAGC . . . NS=1;AN=0 GT:PS ./.:. -1 22371 . TTAATTTTTGCTTAGCTTGGTCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 22408 . GGCGTGCCACCAATTCTTACCGATTTCTCTCCACTCTAGACC . . . NS=1;AN=0 GT:PS ./.:. -1 22491 . TCTCGCCCTATGTGTTCCCATTCCAGCCTCTAGGACACAGTGGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 22651 . TGAGAGGCATCTGGCCCTCCCTGCGCTGTGCCAGCAGCTT . . . NS=1;AN=0 GT:PS ./.:. -1 22755 . CATCCCA . . . NS=1;AN=0 GT:PS ./.:. -1 22822 . AGACGCCAAAAATCCAGCGCTGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 22861 . CCACGCAGTCCCCATCTTGGCAAGGAAACACAATTTCCGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 22945 . CCATAATC . . . NS=1;AN=0 GT:PS ./.:. -1 22977 . TGCATCCTCTTCCCTAGGTGTCCCTCGGGCACATTTAGCACAAAGATA . . . NS=1;AN=0 GT:PS ./.:. -1 23044 . GCACTTTGTTACTATTGGTGGCAGGTTTATGA . . . NS=1;AN=0 GT:PS ./.:. -1 23094 . GTACGGGTCAAGATTATCAACAGGGAAGAGATAGC . . . NS=1;AN=0 GT:PS ./.:. -1 23171 . TTTGCATGTTTTGATTAATTTAATATTTAAAATA . . . NS=1;AN=0 GT:PS ./.:. -1 23252 . CACCGAGGCTTAGAGGGGTTGGGTTGCCCAAGGTTACAGAGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 23356 . TCACTGTGTGTCCCCTGGTTACTGGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 23391 . ACAAACTCGGGG . . . 
NS=1;AN=0 GT:PS ./.:. -1 23503 . CTGGCGAC . . . NS=1;AN=0 GT:PS ./.:. -1 23528 . GGCAGGGATGGCTTGGACCACGAGAGGCACCTGAG . . . NS=1;AN=0 GT:PS ./.:. -1 23581 . CCCACTG . . . NS=1;AN=0 GT:PS ./.:. -1 23643 . TCAGTTTGCTTATGGCCAAAGACAGGACCT . . . NS=1;AN=0 GT:PS ./.:. -1 23697 . TTTACCAAAAAAAGAGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 23786 . AGCACTGCCAATACAAGAAGCTGCAGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 23827 . CCCTCAATGGCCACTCCGTGCTCCA . . . NS=1;AN=0 GT:PS ./.:. -1 23878 . CCACCTC . . . NS=1;AN=0 GT:PS ./.:. -1 23975 . G A . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.100|rs2748067&dbsnp.131|rs76046194;CGA_FI=653635|NR_024540.1|WASH5P|INTRON|UNKNOWN-INC;CGA_RPT=L2b|L2|53.1;CGA_SDO=5 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:102,.:102,.:40,.:-102,0,0:-40,0,0:0:0,.:0 -1 24001 . G . . NS=1;CGA_WINEND=26000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.46:1.70:.:0:.:0:0.999:152 -1 24031 . CCCCATC . . . NS=1;AN=0 GT:PS ./.:. -1 24095 . GAATCCTGGCTCTGTCACTAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 24127 . CAGCCCTTCTGTGCCTCAGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 24194 . TGAGTTAATGCACTCAAATCAATGGTTGTGCACGGTTTATA . . . NS=1;AN=0 GT:PS ./.:. -1 24266 . AGACCTTGTCACAACTGTTATTGAAGAACTAATCATCTATTGCTTA . . . NS=1;AN=0 GT:PS ./.:. -1 24344 . TCCAGGTGGAGAGGTATGTTGC . . . NS=1;AN=0 GT:PS ./.:. -1 24416 . CACTGCTGGGTAAATATTTGTTGGCTGCAGGAAAACGTG . . . NS=1;AN=0 GT:PS ./.:. -1 24480 . AAAAGCATGAG . . . NS=1;AN=0 GT:PS ./.:. -1 24508 . CCACAGGAAACCAGGAGGCTAAGTGGGGTGGA . . . NS=1;AN=0 GT:PS ./.:. -1 24635 . GACCGGGATTCCCCAAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 24701 . GCCCTCTCATCAGGTGGGGGTGAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 24762 . TTCTGCAGGTACTGCAGGGCATCCGCCATCTGCTGGACGGC . . . NS=1;AN=0 GT:PS ./.:. -1 24829 . TGAAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 24871 . TCCTCACAGGAGTCATGGTGCCTGTGGGTCGGAGCCGGAGCGTCAGAGC . . . NS=1;AN=0 GT:PS ./.:. -1 24936 . CACGCCCCCACCACAGGGCAGCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 25041 . TTGCTGA . . . NS=1;AN=0 GT:PS ./.:. -1 25062 . ATAGATGGGACTCTGCTGATGCCTGCTGAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 25108 . CAGGGCCCGGGACTGGGGAATCTGTAGGGTCAATGGAGGAGTTCAGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 25215 . TACCTTGTCTCAGTTCA . . . NS=1;AN=0 GT:PS ./.:. -1 25305 . GGGGTAGCAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 25344 . TACACAGTTCTGGAAAAGCACAAA . . . NS=1;AN=0 GT:PS ./.:. -1 25437 . GGGCAGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 25496 . CTTAGGGGGACAA . . . NS=1;AN=0 GT:PS ./.:. -1 25530 . GGTGGAGGACAGGAAGGAAAAACACTCCTGGAATTGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 25609 . CTCTCCCTGGTGCCACTAAAGCAGCAATCACACTGCAGACAGC . . . NS=1;AN=0 GT:PS ./.:. -1 25730 . CTCACTG . . . NS=1;AN=0 GT:PS ./.:. -1 25771 . ACTGTGC . . . NS=1;AN=0 GT:PS ./.:. -1 25871 . TTTCACTTG . . . NS=1;AN=0 GT:PS ./.:. -1 25931 . TTCGATT . . . NS=1;AN=0 GT:PS ./.:. -1 25970 . GAGACGTGGTTATTTCCAATAATAATTTGT . . . NS=1;AN=0 GT:PS ./.:. -1 26001 . T . . NS=1;CGA_WINEND=28000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.37:1.49:.:0:.:0:0.999:152 -1 26010 . TAACGCACCACACCAACATCTTCACCCAGT . . . NS=1;AN=0 GT:PS ./.:. -1 26057 . CTCCCGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 26263 . ACCCAACCCTCTGAGACCAGCACACCCCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 26339 . GTTTGCTGGCTGTCCTAACTTAT . . . NS=1;AN=0 GT:PS ./.:. -1 26443 . ATTTCTTGTTAGTGTGTGTGTGTTTGCTCACACATATGCGTGAAAGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 26501 . ACAGATCTCCTCAAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 26561 . TGAGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 26595 . GATCATCTGTTAGGTGC . . . NS=1;AN=0 GT:PS ./.:. -1 26648 . ACTAGCCAGGGAGAGTCTCAAAAACAAC . . . NS=1;AN=0 GT:PS ./.:. -1 26697 . 
CTACTCCAGTCATGGGTACAAAGCTAAGGAGTGACAAATCCCTCTTG . . . NS=1;AN=0 GT:PS ./.:. -1 26792 . GCCGGGCGCAGCGGCTCACGCC . . . NS=1;AN=0 GT:PS ./.:. -1 26835 . GGCGAAGGCAGGCAGATCA . . . NS=1;AN=0 GT:PS ./.:. -1 26887 . ACATGGT . . . NS=1;AN=0 GT:PS ./.:. -1 26926 . GCCAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 26954 . CCCCGCTACTCGGGAGGCTGAGGAAGGAGAAT . . . NS=1;AN=0 GT:PS ./.:. -1 26992 . AACCAGGAAGGTGGAGGTTGCAGTGTGCCAAGATCGCGCCATGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 27044 . CCTAGGCAACGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 27103 . AAAGAAACAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 27136 . AACCGCAAGCGGTCTTGAGTGC . . . NS=1;AN=0 GT:PS ./.:. -1 27172 . TCCTTGGGGAAGTACTAGAAGAAAGAAT . . . NS=1;AN=0 GT:PS ./.:. -1 27227 . CACTCAA . . . NS=1;AN=0 GT:PS ./.:. -1 27274 . CCTTAGA . . . NS=1;AN=0 GT:PS ./.:. -1 27313 . CATGCAGCCACTGAGCACTTGA . . . NS=1;AN=0 GT:PS ./.:. -1 27361 . GCCATAAGTGTAAAATATGCACCAAATTTCAAAGGCTA . . . NS=1;AN=0 GT:PS ./.:. -1 27428 . TTTATATTGATTACGTGCTAAAATAACCATATTTG . . . NS=1;AN=0 GT:PS ./.:. -1 27486 . TATCACTAATTTCATCTGTTTCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 27537 . TTAAATATTTCTTTTCTTTTTCTTTCCTCTCACTCAGCGTC . . . NS=1;AN=0 GT:PS ./.:. -1 27604 . GCTGTTTTTGGGCAGCAGATATCCTAGAATGGA . . . NS=1;AN=0 GT:PS ./.:. -1 27664 . TCATAAC . . . NS=1;AN=0 GT:PS ./.:. -1 27720 . TGACCATGTA . . . NS=1;AN=0 GT:PS ./.:. -1 27777 . ACAGTATGACTGCTAATAATACCTACACATGTT . . . NS=1;AN=0 GT:PS ./.:. -1 27840 . TTAACTCTTATTATCAGTGAATTTATCATCATCCCCTATTTTACATAA . . . NS=1;AN=0 GT:PS ./.:. -1 27904 . AGACCAAATAACATTTTTTCAACATCAA . . . NS=1;AN=0 GT:PS ./.:. -1 27965 . CTGTCGTCTGAATTCCAAGCTTTTTGTTATTTATTG . . . NS=1;AN=0 GT:PS ./.:. -1 28001 . A . . NS=1;CGA_WINEND=30000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.40:1.68:.:0:.:0:0.999:152 -1 28039 . GCCCAAACATTTTGTTAGTAGTACCAACTGTAAGTCACCTTATCTTCATA . . . NS=1;AN=0 GT:PS ./.:. -1 28110 . AATTAGATCTGTTTTTGATACTGAGGGAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 28174 . TGTGGTCAACACTTTCGTTACTTTAGTATACATCACCCCAATTG . . . NS=1;AN=0 GT:PS ./.:. -1 28244 . TAGGTAGTAGTATCTATTTT . . . NS=1;AN=0 GT:PS ./.:. -1 28324 . TAGTTGCTCATCTGAAGAAGTGACGGACCACCTC . . . NS=1;AN=0 GT:PS ./.:. -1 28364 . AGTGGACAGACAGTAA . . . NS=1;AN=0 GT:PS ./.:. -1 28391 . GACAGGGGATTTTGTTGGCGGAAAAAAAAATTTATCAAAAGTCGTCTTCTATCA . . . NS=1;AN=0 GT:PS ./.:. -1 28474 . AGTTCCACAGTGGGTAACTGTAATTCATTCTAGGTCTGCGA . . . NS=1;AN=0 GT:PS ./.:. -1 28555 . CCACAAATACCT . . . NS=1;AN=0 GT:PS ./.:. -1 28582 . ATGGTGGTTTTTTTTTTTTTTTGCAT . . . NS=1;AN=0 GT:PS ./.:. -1 28660 . CGCTCAATATTTCTAGTCGACAGCACTGCTTTCGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 28748 . ACCGCGG . . . NS=1;AN=0 GT:PS ./.:. -1 28811 . TCCAGGGTCTCTCCCGGAGTTACAAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 28860 . CAACGCGGTGTCAGAGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 28906 . TCCGGGC . . . NS=1;AN=0 GT:PS ./.:. -1 28950 . GAACCCGGCAGGGGCGGGAAGACGCAGGAGT . . . NS=1;AN=0 GT:PS ./.:. -1 29011 . CGGGTCC . . . NS=1;AN=0 GT:PS ./.:. -1 29061 . GCCGGGTGCAGGGCGCGGCTCCAGGGAGGAAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 29121 . GGCGGTCGGGGCCCAGCGGCG . . . NS=1;AN=0 GT:PS ./.:. -1 29153 . GGAGCCGGGCACCGGGCAGCGGCCGCGGAACACCAGCTTGG . . . NS=1;AN=0 GT:PS ./.:. -1 29253 . CGGGTCCCCTACTTCGCCCCGCCAGGCCCCCACGACC . . . NS=1;AN=0 GT:PS ./.:. -1 29359 . CGCTCTGCCGGAG . . . NS=1;AN=0 GT:PS ./.:. -1 29390 . GCCGCCCCCAGTCCGCCCGCGCCTCCGG . . . NS=1;AN=0 GT:PS ./.:. -1 29430 . CCGCTCGCCCTCCACTGCGCCCTCCCCGAGCGCGGCTCCAGGACCCCG . . . NS=1;AN=0 GT:PS ./.:. -1 29495 . CCTGTCGGGCCG . . . NS=1;AN=0 GT:PS ./.:. -1 29568 . CATGCGTTGTCTTCCGAGCGTCAG . . . NS=1;AN=0 GT:PS ./.:. -1 29630 . TCCTAGACCTCCGTCCTTTGTCCCATCGCTG . . . 
NS=1;AN=0 GT:PS ./.:. -1 29693 . CCAACCTCGGCTCCTCCGGG . . . NS=1;AN=0 GT:PS ./.:. -1 29721 . GCCCGGGGTGCGCCCCGGGGCAGGACCCCCAGCCCACGCCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 29787 . TACGCCTTGACCCGCTTTCCTGCGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 29846 . GGGCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 29874 . CCCACCCCCCTTTAAGAATTCAATAGAGAAGCCAGACGCAAAACTACAGATATCGTATGAGTCCAGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 29958 . A . . . NS=1;AN=0 GT:PS ./.:. -1 30001 . G . . NS=1;CGA_WINEND=32000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.52:1.49:.:0:.:0:0.999:152 -1 30024 . AGCTCGTGTTCAATGGG . . . NS=1;AN=0 GT:PS ./.:. -1 30076 . AAATGAGTGGTAGTGATGGCGGCACAACAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 30237 . TTTTAAAAAGTTAAATATAT . . . NS=1;AN=0 GT:PS ./.:. -1 30285 . GCAGTTGTCCCTCCTGGAATCCGTTGGCTTGCCTC . . . NS=1;AN=0 GT:PS ./.:. -1 30359 . AAAGACAGGATGCCCAGCTAGT . . . NS=1;AN=0 GT:PS ./.:. -1 30408 . TTCGTAGCATAAATATGTCC . . . NS=1;AN=0 GT:PS ./.:. -1 30485 . TTCAGAATTAAGCATTTTATATTTTA . . . NS=1;AN=0 GT:PS ./.:. -1 30525 . CCACCCTACTCTCTTCCTAACACTCTC . . . NS=1;AN=0 GT:PS ./.:. -1 30567 . TGTCCGCCTTCCCTGCCTCCTCTTCTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 30633 . CTCGCTGGCTGCAGCGTGTGGTCCCCTTACCAGAGGTAAAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 30725 . AATGCTT . . . NS=1;AN=0 GT:PS ./.:. -1 30776 . CCTTTGGTAGGTAATTACGGTTAGATGAGGTCATGG . . . NS=1;AN=0 GT:PS ./.:. -1 30852 . TTGTCTCTGTGTCTCCCTCTCTCTCTCTCTCTCTCTCTCTCATTT . . . NS=1;AN=0 GT:PS ./.:. -1 30907 . CATTTCTCTCTCTCTCGCTATCTCATTTTTCTCTCTCTCTCTTTCTCTCCTCTGTCTTTTCCCACCAAGT . . . NS=1;AN=0 GT:PS ./.:. -1 30982 . TGCGAAGAGAAGGTGGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 31026 . ACCGGGAACCCGTCC . . . NS=1;AN=0 GT:PS ./.:. -1 31125 . TTTTGTTTTGACTAATACAACCTGAAAACATTTTCCCCTCACTCC . . . NS=1;AN=0 GT:PS ./.:. -1 31248 . GCCTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 31282 . CACAGGCTCAGGGATCTGCTATTCATTCTTTGTG . . . NS=1;AN=0 GT:PS ./.:. -1 31381 . GCCCTGCCTCCTTTTGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 31431 . AATCTGGCTGGGCGTGGTGGCTCATGCCTGTAATCCTAGC . . . NS=1;AN=0 GT:PS ./.:. -1 31484 . GACGCGAGAGGACTGCTTGAGCCCAAGAGTTTGAGAT . . . NS=1;AN=0 GT:PS ./.:. -1 31552 . TACAAAAATAAAATAAAATAGCCA . . . NS=1;AN=0 GT:PS ./.:. -1 31647 . GATCGAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 31669 . GATTGTACCACTGCACTCCAGGCTGGGCGACA . . . NS=1;AN=0 GT:PS ./.:. -1 31716 . TCAGAAAAAAAAAAAAAAGTAC . . . NS=1;AN=0 GT:PS ./.:. -1 31790 . CACGATGCCTGTGAATATACACACACACCACA . . . NS=1;AN=0 GT:PS ./.:. -1 31856 . TGCACTGCTAGGCACCACCCCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 31938 . GTTCCCTACCTAATCTACTGACAGGCTCATCCCCGAC . . . NS=1;AN=0 GT:PS ./.:. -1 32000 . TGCAGTGGGAATCCTGGACCTCAGCCTGGACAAAGAACAGCTGCA . . . NS=1;AN=0 GT:PS ./.:. -1 32001 . G . . NS=1;CGA_WINEND=34000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.55:1.39:.:0:.:0:0.999:152 -1 32064 . CACAGAAGCTCTG . . . NS=1;AN=0 GT:PS ./.:. -1 32093 . AGCTGGGCTGAGCGGGCCTGGGAATTAAGGCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 32147 . TTGCTGAAGCTTGCCACATCCCCCAGCCTCCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 32227 . GAGTGAAGAAAATGTGACAGGGTGTCCTAAGCCCCGATCTACAGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 32322 . GCCTCTAGCTTTTGTGCTACAGTTCTGGGAACAGACTCCTCTC . . . NS=1;AN=0 GT:PS ./.:. -1 32374 . CCACTTCCCTCCGCAGCATTAGATT . . . NS=1;AN=0 GT:PS ./.:. -1 32455 . ATGGAAATGTCCTGCTCTCTAAACAGATAGACAGAT . . . NS=1;AN=0 GT:PS ./.:. -1 32584 . TGCACGCA . . . NS=1;AN=0 GT:PS ./.:. -1 32646 . CGGTGACTGTGTTCAGAGTGAGTTCACACCAT . . . NS=1;AN=0 GT:PS ./.:. -1 32733 . CAGCCCAGGAACCTCCCCTTATCGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 32815 . TGCCATGTGGGTTGTTCTCTGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 32891 . CCTCGGCTGGAGTCCT . . . 
NS=1;AN=0 GT:PS ./.:. -1 32971 . TATGTAATAACTGAATCTGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 33051 . CCATAGA . . . NS=1;AN=0 GT:PS ./.:. -1 33108 . TCATAAAAAGGAAGGCAGAGCTGATCCATGGCACCATGTGACAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 33176 . GGAAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 33258 . AGCCGAT . . . NS=1;AN=0 GT:PS ./.:. -1 33329 . GAGCTGATGAAAATGTTTTGGAACTACATAGA . . . NS=1;AN=0 GT:PS ./.:. -1 33377 . CATGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 33398 . CACTGATTGTTCAATTTAAAATGGTCAAACTTATATGA . . . NS=1;AN=0 GT:PS ./.:. -1 33443 . CTCCATTAAAAAAAAAAAAAAAGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 33492 . AATCCCAACACTTTGGAAAAAGGTGAAAGTTTTTTTTTCTTTTTTTTTTTATAT . . . NS=1;AN=0 GT:PS ./.:. -1 33552 . GTTCTAG . . . NS=1;AN=0 GT:PS ./.:. -1 33653 . TCTTCTAATGCTATCCCTCCCCCAGCCCCCCACCCAC . . . NS=1;AN=0 GT:PS ./.:. -1 33703 . TGTATGATGTTCTCTGCCCCATGTCCAAGCGTTC . . . NS=1;AN=0 GT:PS ./.:. -1 33744 . TCAATTCCCACCTGTGAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 33797 . A . . . NS=1;AN=0 GT:PS ./.:. -1 33868 . ATGGCTGCATAGTATCCCATGGTATATATGTGCCACATTCTCT . . . NS=1;AN=0 GT:PS ./.:. -1 33925 . ATTGATGGACATT . . . NS=1;AN=0 GT:PS ./.:. -1 33958 . GCTATTGTGAATACTGCCACAATAAACATACAT . . . NS=1;AN=0 GT:PS ./.:. -1 34001 . C . . NS=1;CGA_WINEND=36000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.23:1.45:.:0:.:0:0.999:152 -1 34026 . CTTTGGGTATATACCCTAAGACCTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 34125 . TGGGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 34291 . TCTCACATCTTCTTGGCCAGCACTGGACCAC . . . NS=1;AN=0 GT:PS ./.:. -1 34358 . TATGAGAAAGAAGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 34404 . CATCTGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 34452 . AGAAGGCTTTCTGGACGGT . . . NS=1;AN=0 GT:PS ./.:. -1 34513 . TGGTGAA . . . NS=1;AN=0 GT:PS ./.:. -1 34581 . AGGCAGT . . . NS=1;AN=0 GT:PS ./.:. -1 34631 . AAGTTTATT . . . NS=1;AN=0 GT:PS ./.:. -1 34704 . TGGTTCTGCC . . . NS=1;AN=0 GT:PS ./.:. -1 34740 . GCCGTGCTCCTTGGAGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 34767 . AGGCGGAGGACACGTGGGAGGTTTTAGGGACAAGCCTGGAGGCAGCA . . . NS=1;AN=0 GT:PS ./.:. -1 34968 . GTCAGGCAGGGAGTGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 35030 . GGTGGAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 35076 . GACCACGTGCTGGATGTCACGCT . . . NS=1;AN=0 GT:PS ./.:. -1 35116 . GCCGGGTTAGGCACCTGGTGTTTTACGTACATA . . . NS=1;AN=0 GT:PS ./.:. -1 35160 . GTGAGGGCATCCGACC . . . NS=1;AN=0 GT:PS ./.:. -1 35229 . TTAGAGCTTAATCGTGTTCAAAATACA . . . NS=1;AN=0 GT:PS ./.:. -1 35364 . GTCGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 35407 . GGGAGGCTGAGGT . . . NS=1;AN=0 GT:PS ./.:. -1 35482 . CTATAAAAAATAAATAAATAAATAAAAACAAC . . . NS=1;AN=0 GT:PS ./.:. -1 35656 . GGCTTAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 35755 . TCACGAA . . . NS=1;AN=0 GT:PS ./.:. -1 36001 . T . . NS=1;CGA_WINEND=38000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.06:1.37:.:0:.:0:0.999:152 -1 36070 . CCCCGTTGTGTGGGAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 36125 . AGGGAGGA . . . NS=1;AN=0 GT:PS ./.:. -1 36158 . TGCAGCTGGCTCATTCCCATATAGGGGGAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 36229 . GGGGAGGCCAGCA . . . NS=1;AN=0 GT:PS ./.:. -1 36258 . GGGTGGCTCTGAGGGGGCTCTCAGGGGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 36344 . AAGTTTGGAAAAAAAAAAAAACCCAGCCTGGCGGAAAGAATTTAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 36483 . GTCATCCTTCCCCAACACATCCCC . . . NS=1;AN=0 GT:PS ./.:. -1 36521 . CAAGCCTCTCCCACCCAAGT . . . NS=1;AN=0 GT:PS ./.:. -1 36590 . AGACATG . . . NS=1;AN=0 GT:PS ./.:. -1 36722 . TGATTCTGTGGTATGTTAATGTTTATGCATAG . . . NS=1;AN=0 GT:PS ./.:. -1 36786 . GAGCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 36851 . G . . . NS=1;AN=0 GT:PS ./.:. -1 36888 . GGCTAGTTGTTTGAATGTCTGCATGAAAAAGCGGACGAC . . . NS=1;AN=0 GT:PS ./.:. 
-1 36960 . GACCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 36996 . CTGCACTATTAATTTGTTTTTTAGCTAAGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 37052 . CCACCCAGTT . . . NS=1;AN=0 GT:PS ./.:. -1 37103 . TGAACCTACCTTTTCAATGTAAATTCAGTGAAATCTAAGTGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 37236 . AGGGATTTTTTTTTTGTCA . . . NS=1;AN=0 GT:PS ./.:. -1 37296 . AATGATAATCTTTTGGATATATTGGGTTAAATAAATTTATTA . . . NS=1;AN=0 GT:PS ./.:. -1 37362 . GTTTAATG . . . NS=1;AN=0 GT:PS ./.:. -1 37506 . AGAGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 37560 . ACAGAGATAACTCCAACCCTTAAGAAGGTGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 37624 . TACTGAG . . . NS=1;AN=0 GT:PS ./.:. -1 37688 . GCCGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 37714 . GTGGTCTCACCTCCGGCAGTATCACCACCACTGGGCACAAGCTT . . . NS=1;AN=0 GT:PS ./.:. -1 37769 . CAACTGT . . . NS=1;AN=0 GT:PS ./.:. -1 37791 . GTACTCCCAGTGTTCACACCATGCTGCACTCACAGAAGACTCTTCGTTGA . . . NS=1;AN=0 GT:PS ./.:. -1 37882 . G . . . NS=1;AN=0 GT:PS ./.:. -1 37941 . GGGGAGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 37962 . AGCCAGGAGTCTCATCCCCTGGGGAAGTTCCAGGGACCCCTCA . . . NS=1;AN=0 GT:PS ./.:. -1 38001 . C . . NS=1;CGA_WINEND=40000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.37:1.47:.:0:.:0:0.999:152 -1 38049 . CAGAGCCTGCCTTCCACGTGGGTTTGACAGGAGCCTCCTAACTGC . . . NS=1;AN=0 GT:PS ./.:. -1 38140 . GCCAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 38221 . G . . . NS=1;AN=0 GT:PS ./.:. -1 38232 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.131|rs77823476&dbsnp.86|rs806727;CGA_FI=645520|NR_026818.1|FAM138A|TSS-UPSTREAM|UNKNOWN-INC;CGA_SDO=6 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:87:195,87:195,87:48,48:-195,-87,0:-48,-48,0:1:1,1:0 -1 38339 . GGACAGG . . . NS=1;AN=0 GT:PS ./.:. -1 38378 . AAAGTGGTCTCCTGCAGTTACGTGGCAAA . . . NS=1;AN=0 GT:PS ./.:. -1 38481 . TTCTTCTTACTGCTTATAATATAGT . . . NS=1;AN=0 GT:PS ./.:. -1 38592 . GTCTCCCCACATGGAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 38736 . AGATTACAAGGGTGTACCATGCAGAACCTCTCCACCAAACCTT . . . NS=1;AN=0 GT:PS ./.:. -1 38821 . TTGGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 38907 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.108|rs3874156&dbsnp.131|rs75829199&dbsnp.86|rs806726;CGA_FI=645520|NR_026818.1|FAM138A|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=MLT1E1A-int|ERVL-MaLR|38.5;CGA_SDO=6 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:94:172,94:172,94:48,48:-172,-94,0:-48,-48,0:13:9,9:4 -1 38998 . AAAGAAGACTGTCAGAGACCCCAAACTCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 39140 . AATCTTCCCACATCTTAAAACCTGTTTAGGGAACACCAGCATCTGTCCA . . . NS=1;AN=0 GT:PS ./.:. -1 39217 . TCCTTCCCCTGCTGCCTCTTTCTGAACAGCAATGTCTCAAGCTTTACC . . . NS=1;AN=0 GT:PS ./.:. -1 39289 . GGGGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 39403 . CAAGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 39484 . GTGACCCCCACAA . . . NS=1;AN=0 GT:PS ./.:. -1 39575 . AGGCGGTATATATGTGATTCATGTACTGATCAT . . . NS=1;AN=0 GT:PS ./.:. -1 39625 . GCTGGATGCAGTGGCTCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 39654 . CCAACACTTTGGGAGGCTGAGGCGGGT . . . NS=1;AN=0 GT:PS ./.:. -1 39702 . TCGAGACCAGGCTGGCCAACATGGCAAAACCCCGCC . . . NS=1;AN=0 GT:PS ./.:. -1 39827 . ACTCAGGAGGTGGAGGTGGCAGTGAGCCAAGATCGTGC . . . NS=1;AN=0 GT:PS ./.:. -1 40001 . G . . NS=1;CGA_WINEND=42000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.33:1.54:.:0:.:0:0.999:152 -1 40026 . GGAGCTA . . . NS=1;AN=0 GT:PS ./.:. -1 40090 . CATAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 40156 . GAACCCAGTGCTGGCTGACACCCTGATGGCACCTTACAGAGGACC . . . NS=1;AN=0 GT:PS ./.:. -1 40234 . CTGGGGAACACTGGGTCGTATTTGCAGCTGCT . . . NS=1;AN=0 GT:PS ./.:. -1 40295 . TGGGAGT . . . NS=1;AN=0 GT:PS ./.:. -1 40386 . CCGCAGCCACGCTGGCCCTCTGCTGTTCTTCGAAGCCACCAGGGCTG . . . 
NS=1;AN=0 GT:PS ./.:. -1 40491 . GTATATG . . . NS=1;AN=0 GT:PS ./.:. -1 40604 . ACATGCC . . . NS=1;AN=0 GT:PS ./.:. -1 40636 . TTCCTTTTTTTTTTTTTTTTTTTGACA . . . NS=1;AN=0 GT:PS ./.:. -1 40703 . CTCGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 40890 . ACTGGTCTA . . . NS=1;AN=0 GT:PS ./.:. -1 41025 . TCTTGGGAATATTAAGTGGAGAGGGGTACGTG . . . NS=1;AN=0 GT:PS ./.:. -1 41215 . TTCTTCA . . . NS=1;AN=0 GT:PS ./.:. -1 41253 . CCTCATGT . . . NS=1;AN=0 GT:PS ./.:. -1 41376 . AAAGGTATAGCAATATTTCTATTTCCTCA . . . NS=1;AN=0 GT:PS ./.:. -1 41475 . GGAAGCCAAAATGACAGGGAGCTACTAAAACTTTATTCTTA . . . NS=1;AN=0 GT:PS ./.:. -1 41668 . AGGACCCAATATCTTACAATGTCCATTGGTTCAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 41757 . GGCATTTA . . . NS=1;AN=0 GT:PS ./.:. -1 41839 . AGTATAT . . . NS=1;AN=0 GT:PS ./.:. -1 41981 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.86|rs806721;CGA_RPT=ERVL-E-int|ERVL|47.4;CGA_SDO=2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:109:109,238:109,238:48,48:-238,-109,0:-48,-48,0:6:2,2:4 -1 42001 . A . . NS=1;CGA_WINEND=44000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.57:1.37:.:0:.:0:0.999:152 -1 42036 . GAACAAATTGCT . . . NS=1;AN=0 GT:PS ./.:. -1 42198 . CATGCTGAA . . . NS=1;AN=0 GT:PS ./.:. -1 42262 . ATTGTGCAAGCATAAGTGGCTGAGTCAGGTTCT . . . NS=1;AN=0 GT:PS ./.:. -1 42343 . ACATAAA . . . NS=1;AN=0 GT:PS ./.:. -1 42522 . TTATACT . . . NS=1;AN=0 GT:PS ./.:. -1 42574 . TTACGCTTTTCTTAAACACACAAAATACAAA . . . NS=1;AN=0 GT:PS ./.:. -1 42662 . AATCAATTAGCAATCAGGAAGGAGTTGTGGTA . . . NS=1;AN=0 GT:PS ./.:. -1 42736 . AAATTATTCACAATAAAAAAAAAGATTAGAATAGTTTTTTTAAAAAAAAAGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 42821 . AGGTATA . . . NS=1;AN=0 GT:PS ./.:. -1 43039 . TTAGGCAAGGAAGCAAAAGCAGAAACCATGAAAAAAGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 43201 . ATATCAATAACCACAACATTCAAGCTGTCAGTTTGAATAG . . . NS=1;AN=0 GT:PS ./.:. -1 43370 . GAAACCAA . . . NS=1;AN=0 GT:PS ./.:. -1 43583 . ACACTGGTAAAAAAAATGAAAGCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 43662 . ATATCGTACTAAAAGTCCTAGCCAGGACAATTAGACAAA . . . NS=1;AN=0 GT:PS ./.:. -1 43793 . CAGCAAAAAAAAAAAAAAAACTAC . . . NS=1;AN=0 GT:PS ./.:. -1 44001 . T . . NS=1;CGA_WINEND=46000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.42:1.36:.:0:.:0:0.999:152 -1 44162 . TTTGACAGAAATAAAAAAAAAATTCT . . . NS=1;AN=0 GT:PS ./.:. -1 44329 . AAACACACAGATA . . . NS=1;AN=0 GT:PS ./.:. -1 44669 . AAATAGACAAATGAGACTATGCCAAATTAAAAAATTTCTAACAACAAA . . . NS=1;AN=0 GT:PS ./.:. -1 44752 . GAATGGGAGAAATATTTGCAAACTACTCATCCAACCGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 44832 . AGTAAAATAAATAAATAAATAAATAAATAAATAAATTAAATAAATTATTTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 45028 . TGGCTATTATCA . . . NS=1;AN=0 GT:PS ./.:. -1 45084 . GAACCCTTGCATCATGTACAAATTAAAAATAGAAC . . . NS=1;AN=0 GT:PS ./.:. -1 45244 . ACCTAAC . . . NS=1;AN=0 GT:PS ./.:. -1 45517 . AAAGGAAAAAAATTCAATTAGTAGGATTACATTCAGGGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 45623 . TATTGTAAATGTTAATATGAGGTAATATATGTGTTAA . . . NS=1;AN=0 GT:PS ./.:. -1 45753 . A . . . NS=1;AN=0 GT:PS ./.:. -1 45758 . T . . . NS=1;AN=0 GT:PS ./.:. -1 45791 . GATTAAAAAAATTAT . . . NS=1;AN=0 GT:PS ./.:. -1 45853 . TTTTAAATATAATTTAAACCAAATTTAAAATAAGCATATAAA . . . NS=1;AN=0 GT:PS ./.:. -1 45974 . TAAATTTTAAAATATA . . . NS=1;AN=0 GT:PS ./.:. -1 46001 . G . . NS=1;CGA_WINEND=48000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.56:1.63:.:0:.:0:0.999:152 -1 46051 . CACTCAT . . . NS=1;AN=0 GT:PS ./.:. -1 46085 . TATGTCAGATCATGAATT . . . NS=1;AN=0 GT:PS ./.:. -1 46399 . TAACTTTTTTTTTTTTTTGAGCAGCAGCAAGATT . . . NS=1;AN=0 GT:PS ./.:. -1 46670 . A G . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.100|rs2548905;CGA_RPT=MER45A|hAT-Tip100|29.0;CGA_SDO=2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:VQLOW:24:24,24:24,24:5,24:-24,0,-24:-5,0,-24:22:0,22:22 -1 46868 . GCCAAAC . . . NS=1;AN=0 GT:PS ./.:. -1 46922 . AAAGTAAATATT . . . NS=1;AN=0 GT:PS ./.:. -1 47108 . G C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.100|rs2531241;CGA_SDO=2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:75:75,75:75,75:32,36:-75,0,-75:-32,0,-36:16:1,15:15 -1 47489 . TCCCTCA . . . NS=1;AN=0 GT:PS ./.:. -1 47658 . AAATCAA . . . NS=1;AN=0 GT:PS ./.:. -1 47693 . GCATCAATGGGTCACTA . . . NS=1;AN=0 GT:PS ./.:. -1 47932 . TCTCCTCCACTTTTCTGTTTTCCTCCTATCTC . . . NS=1;AN=0 GT:PS ./.:. -1 47995 . ACAATCT . . . NS=1;AN=0 GT:PS ./.:. -1 48001 . T . . NS=1;CGA_WINEND=50000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.72:1.41:.:0:.:0:0.999:152 -1 48445 . GAACGGG . . . NS=1;AN=0 GT:PS ./.:. -1 48934 . CAGTATG . . . NS=1;AN=0 GT:PS ./.:. -1 49240 . CAAGCTA . . . NS=1;AN=0 GT:PS ./.:. -1 49269 . ACCGCATGTTCTCACTTATGAGCGTGAGATAAA . . . NS=1;AN=0 GT:PS ./.:. -1 49431 . CTGTACAACGAACCCCCAGGA . . . NS=1;AN=0 GT:PS ./.:. -1 49479 . CACGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 49506 . AGTCAAAAAGAAAAAGAAAAAAAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 49829 . AAAGCAAAACAAACAAACAAACAAAACAAAACACTAA . . . NS=1;AN=0 GT:PS ./.:. -1 50001 . A . . NS=1;CGA_WINEND=52000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.47:1.56:.:0:.:0:0.999:152 -1 50169 . GGTATGG . . . NS=1;AN=0 GT:PS ./.:. -1 50482 . GTGT . . . NS=1;AN=0 GT:PS ./.:. -1 50888 . TGGTATG . . . NS=1;AN=0 GT:PS ./.:. -1 51456 . AGCGGAAGAGTAAGTCTTTGTATTTTATGCTACTGTACCTCTGGGATT . . . NS=1;AN=0 GT:PS ./.:. -1 51617 . AGCACTTTGGGAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 51665 . GACCATCCTGGCTAACACGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 51742 . TGCGGTCCCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 51803 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.129|rs62637812;CGA_RPT=AluY|Alu|7.7;CGA_SDO=2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:82:112,82:111,81:38,38:-112,0,-82:-38,0,-38:20:9,11:11 -1 51861 . CCTCAAAAAAAAAAAAAGAAGATTGATCAGAGAGTACCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 52001 . A . . NS=1;CGA_WINEND=54000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.20:1.19:.:0:.:0:0.999:152 -1 52058 . G C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.129|rs62637813&dbsnp.131|rs76894830;CGA_SDO=2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:VQLOW:24:24,26:24,26:3,25:-24,0,-26:-3,0,-25:9:4,5:5 -1 52134 . CTTGTCTAATTGTTATTAATAATTAATAAATAACTTA . . . NS=1;AN=0 GT:PS ./.:. -1 52182 . TTATTAATAATAACTTA . . . NS=1;AN=0 GT:PS ./.:. -1 52206 . ATGA . . . NS=1;AN=0 GT:PS ./.:. -1 52228 . AATAACTT AC . . NS=1;AN=2;AC=1;CGA_RPT=AT_rich|Low_complexity|3.1;CGA_SDO=2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:52228:PASS:43:121,43:118,40:24,20:-121,0,-43:-24,0,-20:27:11,16:16 -1 52238 . T G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2691277&dbsnp.134|rs150021059;CGA_RPT=AT_rich|Low_complexity|3.1;CGA_SDO=2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|1:52228:PASS:75:275,75:272,72:52,48:-275,-75,0:-52,-48,0:26:26,26:0 -1 52727 . C G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.100|rs2691278;CGA_SDO=2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:78:78,78:78,78:27,37:-78,0,-78:-27,0,-37:22:1,21:21 -1 52797 . CAGGTAC . . . NS=1;AN=0 GT:PS ./.:. -1 53006 . AGACACTTACAA . . . NS=1;AN=0 GT:PS ./.:. -1 53184 . TTTTTATGCCATGTATATTTCTGTAT . . . NS=1;AN=0 GT:PS ./.:. -1 53385 . AGTGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 53425 . 
ACATGGCTACC . . . NS=1;AN=0 GT:PS ./.:. -1 53705 . AGTAAGCATATAGATGGAATAAATAAAATGTGAACTTAGGTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 53786 . TTAATTT . . . NS=1;AN=0 GT:PS ./.:. -1 53955 . GTCAAGA . . . NS=1;AN=0 GT:PS ./.:. -1 53998 . CCTACAT . . . NS=1;AN=0 GT:PS ./.:. -1 54001 . A . . NS=1;CGA_WINEND=56000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.63:1.53:.:0:.:0:0.999:152 -1 54043 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.100|rs2531228;CGA_SDO=2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:VQLOW:35:35,35:35,35:10,27:-35,0,-35:-10,0,-27:26:5,21:21 -1 54108 . AAGGATA . . . NS=1;AN=0 GT:PS ./.:. -1 54175 . AGCTTAT . . . NS=1;AN=0 GT:PS ./.:. -1 54287 . ACACATA . . . NS=1;AN=0 GT:PS ./.:. -1 54351 . AACCGTACCTATGCTATTT . . . NS=1;AN=0 GT:PS ./.:. -1 54586 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs79600414;CGA_RPT=L2|L2|49.7;CGA_SDO=2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:VQLOW:39:39,39:33,33:3,17:-39,0,-39:-3,0,-17:64:15,49:49 -1 54705 . CTTGTATTTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTCCTCCTTTTCTTTCCTTTTCTTTCTTTCATTCTTTCTTTCTTTTTTA . . . NS=1;AN=0 GT:PS ./.:. -1 54841 . GTTGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 55082 . CAATAAT . . . NS=1;AN=0 GT:PS ./.:. -1 55164 . C A . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.103|rs3091274;CGA_RPT=L2|L2|49.7;CGA_SDO=2 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:44,.:44,.:18,.:-44,0,0:-18,0,0:18:18,.:0 -1 55296 . ATGCGACCTTCCCACTTAAAATCCTACTATTTACGCT . . . NS=1;AN=0 GT:PS ./.:. -1 55378 . GCTGAAGACACTTCACTTGC . . . NS=1;AN=0 GT:PS ./.:. -1 55425 . CATGGTATAGTAGG . . . NS=1;AN=0 GT:PS ./.:. -1 55484 . AACATAC . . . NS=1;AN=0 GT:PS ./.:. -1 55542 . ACTCCAAAATCTATCAACTCTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 55813 . GTCGTGTTCACCTCTATCACATCATAAATATAGCAAACAGT . . . NS=1;AN=0 GT:PS ./.:. -1 55926 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.101|rs3020698&dbsnp.121|rs13343114;CGA_SDO=2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:31:65,31:65,31:22,30:-65,-31,0:-30,-22,0:23:23,23:0 -1 55973 . TTCTTCT . . . NS=1;AN=0 GT:PS ./.:. -1 56001 . T . . NS=1;CGA_WINEND=58000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.42:1.69:.:0:.:0:0.999:152 -1 56154 . GTTAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 56295 . ATACGCT . . . NS=1;AN=0 GT:PS ./.:. -1 56378 . TTATGTA . . . NS=1;AN=0 GT:PS ./.:. -1 56482 . TTTCACC . . . NS=1;AN=0 GT:PS ./.:. -1 56635 . ACACTTCTTATTCTGCTGCTGTTCTAGAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 56799 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.100|rs2691309;CGA_SDO=2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:VQLOW:31:31,31:31,31:8,26:-31,0,-31:-8,0,-26:23:2,21:21 -1 56984 . ATCCAAAAAAAATACA . . . NS=1;AN=0 GT:PS ./.:. -1 57155 . CCTGTTA . . . NS=1;AN=0 GT:PS ./.:. -1 57246 . C G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.100|rs2691313;CGA_SDO=2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:VQLOW:26:26,26:26,26:3,17:-26,0,-26:-3,0,-17:34:5,29:29 -1 57289 . TTCCGAA . . . NS=1;AN=0 GT:PS ./.:. -1 57373 . TCCCTCCCCCTATTTCATCA . . . NS=1;AN=0 GT:PS ./.:. -1 57792 . TTTCTTC . . . NS=1;AN=0 GT:PS ./.:. -1 57853 . AACTAAT . . . NS=1;AN=0 GT:PS ./.:. -1 57892 . TGTCCCT . . . NS=1;AN=0 GT:PS ./.:. -1 57952 . A C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2691334&dbsnp.135|rs189727433;CGA_SDO=2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:81:205,81:205,81:48,48:-205,-81,0:-48,-48,0:13:12,12:1 -1 57987 . TGTCTGATCTCAGCTATTTCCATCCTATTTGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 58001 . T . . 
NS=1;CGA_WINEND=60000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.85:1.47:.:0:.:0:0.999:152 -1 58176 . G . . . NS=1;AN=0 GT:PS ./.:. -1 58211 . A . . . NS=1;AN=0 GT:PS ./.:. -1 58595 . ACTATAA . . . NS=1;AN=0 GT:PS ./.:. -1 58909 . TAAATACT . . . NS=1;AN=0 GT:PS ./.:. -1 58986 . TCTCTCT . . . NS=1;AN=0 GT:PS ./.:. -1 59051 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.100|rs2691352;CGA_SDO=2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:113:147,113:147,113:44,42:-147,0,-113:-44,0,-42:26:11,15:15 -1 59131 . ATGCAAAAAGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 59319 . ATCCCACCATACCTCATTATCACACCTAT . . . NS=1;AN=0 GT:PS ./.:. -1 59498 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.100|rs2854666&dbsnp.131|rs76479716;CGA_SDO=2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:122:122,122:112,112:13,34:-122,0,-122:-13,0,-34:50:32,18:18 -1 59615 . CCTTCCCCTGAC . . . NS=1;AN=0 GT:PS ./.:. -1 60001 . A . . NS=1;CGA_WINEND=62000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.37:1.68:.:0:.:0:0.999:152 -1 60276 . TTGTTCT . . . NS=1;AN=0 GT:PS ./.:. -1 60405 . TTACGTC . . . NS=1;AN=0 GT:PS ./.:. -1 60718 . G . . . NS=1;AN=0 GT:PS ./.:. -1 60726 . C A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.100|rs2531295&dbsnp.131|rs77618875&dbsnp.135|rs192328835;CGA_SDO=2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:60726:PASS:77:86,77:80,71:12,27:-86,0,-77:-12,0,-27:43:31,12:12 -1 60788 . AATACATGCATATTGTGGAGATAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 61018 . TTCGTTC . . . NS=1;AN=0 GT:PS ./.:. -1 61216 . TGATTGA . . . NS=1;AN=0 GT:PS ./.:. -1 61289 . A AG . . NS=1;AN=2;AC=1;CGA_SDO=2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:57:57,57:57,57:22,26:-57,0,-57:-22,0,-26:11:1,10:10 -1 61347 . TTGTAAAAAAAAAAATATA . . . NS=1;AN=0 GT:PS ./.:. -1 61442 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2531261&dbsnp.129|rs62637818&dbsnp.131|rs74970982;CGA_SDO=2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:31:31,62:0,9:14,0:-62,-31,0:-14,0,0:11:11,11:0 -1 61448 . A . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0/.:.:PASS:127,.:74,.:0,.:0:0 -1 61477 . TTTGTTTACCATTATTACTCTTGGTATTTTTAAGA . . . NS=1;AN=0 GT:PS ./.:. -1 61576 . TCCGTCA . . . NS=1;AN=0 GT:PS ./.:. -1 61675 . T . . . NS=1;AN=0 GT:PS ./.:. -1 61848 . TAATAATTGTAAAACTTTTTTTTCTTTTTTTTTGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 61984 . CCCAAGTAGCTGGGACTACAGGCATGCACCACCATGCCCAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 62001 . A . . NS=1;CGA_WINEND=64000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.32:1.47:.:0:.:0:0.999:152 -1 62203 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28402963;CGA_RPT=L1M5|L1|41.7;CGA_SDO=2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:90:90,90:90,90:37,39:-90,0,-90:-37,0,-39:18:5,13:13 -1 62236 . ACATACACACACACACACACATATCTGTATATACAAATA . . . NS=1;AN=0 GT:PS ./.:. -1 62295 . ATTCTTCATTTCATTTGTTC . . . NS=1;AN=0 GT:PS ./.:. -1 62536 . TTTTTAAAGATTCTGTATTTTTTAAACCATTTATTTGTATATGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 62678 . CCAGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 62774 . TTAACCT . . . NS=1;AN=0 GT:PS ./.:. -1 62877 . TTTTCAGT . . . NS=1;AN=0 GT:PS ./.:. -1 63071 . AAAACTC . . . NS=1;AN=0 GT:PS ./.:. -1 63236 . TTGCATGATACAAAAGTTCTTTATCCATGTTATGGG . . . NS=1;AN=0 GT:PS ./.:. -1 63356 . GAATCCC . . . NS=1;AN=0 GT:PS ./.:. -1 63413 . TGCTATGTCTCAGTTTGTTTTTGT . . . NS=1;AN=0 GT:PS ./.:. -1 63466 . ATGTGGGGAGCTTTTATTGTGATTTTCCTCGGG . . . NS=1;AN=0 GT:PS ./.:. -1 63513 . TGCATGGACACTTATGGG . . . NS=1;AN=0 GT:PS ./.:. -1 63693 . TTGTTTTT . . . 
NS=1;AN=0 GT:PS ./.:. -1 63733 . TCCCTACTAAGT . . . NS=1;AN=0 GT:PS ./.:. -1 63792 . G T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.101|rs2907079;CGA_SDO=2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:57:57,57:57,57:20,33:-57,0,-57:-20,0,-33:23:8,15:15 -1 63908 . ACAGGAT . . . NS=1;AN=0 GT:PS ./.:. -1 64001 . A . . NS=1;CGA_WINEND=66000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.68:1.49:.:0:.:0:0.999:152 -1 64122 . CTACCTT . . . NS=1;AN=0 GT:PS ./.:. -1 64168 . A . . . NS=1;AN=0 GT:PS ./.:. -1 64200 . AGTATTTTTATGC . . . NS=1;AN=0 GT:PS ./.:. -1 64277 . ATGAGTT . . . NS=1;AN=0 GT:PS ./.:. -1 64476 . GAAGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 64510 . GGTGGGGGGAAGGGGGAGGGATAGCATTAGGAGATATAACTAGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 64580 . CACACCCACATGGCACATGTATACATATGTAACTAACCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 64761 . TCTCGAG . . . NS=1;AN=0 GT:PS ./.:. -1 64973 . ATTCCTA . . . NS=1;AN=0 GT:PS ./.:. -1 65250 . AAAAAGCACCTTTAGACTCAGT . . . NS=1;AN=0 GT:PS ./.:. -1 65526 . TGCCTCATTCTGTGAAAATTGCTGTAGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 65588 . AGACAAA . . . NS=1;AN=0 GT:PS ./.:. -1 65742 . GAGAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 65794 . TGCTGGC . . . NS=1;AN=0 GT:PS ./.:. -1 66001 . A . . NS=1;CGA_WINEND=68000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:2.27:1.67:.:0:.:0:0.999:152 -1 66094 . A . . . NS=1;AN=0 GT:PS ./.:. -1 66157 . C . . END=66634;NS=1;AN=0 GT:PS ./.:. -1 66734 . AAGGACT . . . NS=1;AN=0 GT:PS ./.:. -1 66790 . TCTAATTTTTTTTGAATAATTTTTAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 67069 . TTAATTTTATGT . . . NS=1;AN=0 GT:PS ./.:. -1 67223 . C . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:67223:PASS:42,.:39,.:0,.:0:0 -1 67242 . A C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.100|rs2531263&dbsnp.131|rs77818189;CGA_FI=79501|NM_001005484.1|OR4F5|TSS-UPSTREAM|UNKNOWN-INC;CGA_SDO=2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 0|1:67223:PASS:42:42,42:39,39:20,4:-42,0,-42:-4,0,-20:59:34,25:34 -1 67445 . AAATCAA . . . NS=1;AN=0 GT:PS ./.:. -1 67602 . ATGTCTC . . . NS=1;AN=0 GT:PS ./.:. -1 68001 . T . . NS=1;CGA_WINEND=70000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.57:1.36:.:0:.:0:0.999:152 -1 68303 . CAGCTATTACCTATTGT . . . NS=1;AN=0 GT:PS ./.:. -1 68384 . AGAGCTAAATTAAACAATCATTCAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 68544 . TTTTTCTC . . . NS=1;AN=0 GT:PS ./.:. -1 68613 . ATATTAT . . . NS=1;AN=0 GT:PS ./.:. -1 68893 . ATAG . . . NS=1;AN=0 GT:PS ./.:. -1 68905 . T . . . NS=1;AN=0 GT:PS ./.:. -1 68931 . GTAATAG . . . NS=1;AN=0 GT:PS ./.:. -1 69060 . GAATGAA . . . NS=1;AN=0 GT:PS ./.:. -1 69267 . CTCACTC . . . NS=1;AN=0 GT:PS ./.:. -1 69453 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.100|rs2854682&dbsnp.134|rs142004627;CGA_FI=79501|NM_001005484.1|OR4F5|CDS|SYNONYMOUS;CGA_PFAM=PFAM|PF00001|7tm_1;CGA_SDO=2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:129:129,129:129,129:42,45:-129,0,-129:-42,0,-45:28:4,24:24 -1 69511 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2691305&dbsnp.131|rs75062661;CGA_FI=79501|NM_001005484.1|OR4F5|CDS|MISSENSE;CGA_PFAM=PFAM|PF00001|7tm_1;CGA_SDO=2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:135:135,317:135,317:48,48:-317,-135,0:-48,-48,0:5:4,4:1 -1 69552 . G C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.100|rs2531266&dbsnp.129|rs55874132;CGA_FI=79501|NM_001005484.1|OR4F5|CDS|SYNONYMOUS;CGA_PFAM=PFAM|PF00001|7tm_1;CGA_SDO=2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:69552:PASS:130:130,178:130,178:42,49:-130,0,-178:-42,0,-49:23:4,19:19 -1 69569 . T C . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.100|rs2531267;CGA_FI=79501|NM_001005484.1|OR4F5|CDS|MISSENSE;CGA_PFAM=PFAM|PF00001|7tm_1;CGA_SDO=2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:69552:PASS:177:177,178:177,178:48,49:-177,0,-178:-48,0,-49:29:8,21:21 -1 69732 . CCTAATG . . . NS=1;AN=0 GT:PS ./.:. -1 69894 . TTCTGTG . . . NS=1;AN=0 GT:PS ./.:. -1 69965 . GACAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 70001 . A . . NS=1;CGA_WINEND=72000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.24:1.30:.:0:.:0:0.999:152 -1 70242 . GAACAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 70297 . TGACCCT . . . NS=1;AN=0 GT:PS ./.:. -1 70349 . TTTAA . . . NS=1;AN=0 GT:PS ./.:. -1 70496 . ATACTTC . . . NS=1;AN=0 GT:PS ./.:. -1 70604 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.100|rs2854679;CGA_RPT=LTR89|ERVL?|39.7;CGA_SDO=3 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:VQLOW:28:28,28:28,28:6,25:-28,0,-28:-6,0,-25:16:2,14:14 -1 70725 . AAGCACAGGCTTTAAAGTAAAAAACAAAGAGCTGGATTCA . . . NS=1;AN=0 GT:PS ./.:. -1 71176 . A . . . NS=1;AN=0 GT:PS ./.:. -1 71378 . AATTTAT . . . NS=1;AN=0 GT:PS ./.:. -1 71632 . AATATGGTAAAGATGGA . . . NS=1;AN=0 GT:PS ./.:. -1 71666 . TTAATTTTTAATGCGTAATAAAACTATGAGAAAATTTAAAAGTGAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 71779 . CCCAAAATATTAAT . . . NS=1;AN=0 GT:PS ./.:. -1 71843 . AGTCTTTTTTTTTTTTTTTACAGTTGTAGGCAGAAAACTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 71989 . AATTTCT . . . NS=1;AN=0 GT:PS ./.:. -1 72001 . A . . NS=1;CGA_WINEND=74000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.86:1.18:.:0:.:0:0.999:152 -1 72082 . TACTCTTTTATATATATACATATATGTGTGTATATGTGTATATATATATACACACATATATACATACATACATACATACATATTAT . . . NS=1;AN=0 GT:PS ./.:. -1 72206 . GGGATACATGTGCAGAATGTACAGGTTTGTTACACAGGTAT . . . NS=1;AN=0 GT:PS ./.:. -1 72277 . CAACTCACCATCTACATTAGGTATTTCTC . . . NS=1;AN=0 GT:PS ./.:. -1 72336 . T . . . NS=1;AN=0 GT:PS ./.:. -1 72389 . TTGGTCAACTCCCATTTA . . . NS=1;AN=0 GT:PS ./.:. -1 72452 . TGCGGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 72520 . GCTGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 72585 . TTAAAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 72694 . GTGTTTC . . . NS=1;AN=0 GT:PS ./.:. -1 72743 . CCATTAC . . . NS=1;AN=0 GT:PS ./.:. -1 72787 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2854675&dbsnp.129|rs62641289;CGA_RPT=L1PA7|L1|7.3;CGA_SDO=3 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:54:169,54:169,54:48,45:-169,-54,0:-48,-45,0:7:7,7:0 -1 72859 . CCCGTCAATGTTAGACTAGATAAA . . . NS=1;AN=0 GT:PS ./.:. -1 73230 . ATGTTTAGCTCCCCCTTGTTAGG . . . NS=1;AN=0 GT:PS ./.:. -1 73343 . GACTTTCTTCTTAT . . . NS=1;AN=0 GT:PS ./.:. -1 73486 . TGTCTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 73620 . C . . . NS=1;AN=0 GT:PS ./.:. -1 73633 . A . . . NS=1;AN=0 GT:PS ./.:. -1 73645 . TTCCCCA . . . NS=1;AN=0 GT:PS ./.:. -1 73695 . TATGAAAAAAATGTTCAAGTCTCTCAGATTAAGATGCATGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 73799 . TTTGATA . . . NS=1;AN=0 GT:PS ./.:. -1 73838 . CACCCTTTTTTTTTTTTTTAATT . . . NS=1;AN=0 GT:PS ./.:. -1 73874 . TAGGGTACATGTGCACCTTGTGCAGGTTAGTTACATATGTA . . . NS=1;AN=0 GT:PS ./.:. -1 73932 . TGCGCTGAACCCACTAACTCGTC . . . NS=1;AN=0 GT:PS ./.:. -1 73971 . ATCTCCCAATGCTATCCCTCCCCCCTCCCCCCACCCCACAACA . . . NS=1;AN=0 GT:PS ./.:. -1 74001 . C . . NS=1;CGA_WINEND=76000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.20:0.87:.:0:.:0:0.999:152 -1 74021 . GAGTGTGATATTC . . . NS=1;AN=0 GT:PS ./.:. -1 74070 . CCACCTATGAGTGAGAATATGCGGTGTTTGGTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 74112 . T . . END=74372;NS=1;AN=0 GT:PS ./.:. -1 74384 . TGGTATTTCCAGTTCGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 74408 . 
GAGGAATCGCCACACTGACTTCCACAATGGTTGAACTAGTTTACAGTCCCACCAACAGTGTAAAAGTGTTCCTATTTCTCCACATCCT . . . NS=1;AN=0 GT:PS ./.:. -1 74503 . A . . . NS=1;AN=0 GT:PS ./.:. -1 74506 . TGTTGT . . . NS=1;AN=0 GT:PS ./.:. -1 74520 . TTTTTAATGAT . . . NS=1;AN=0 GT:PS ./.:. -1 74535 . A . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:74535:VQLOW:28,.:23,.:0,.:0:0 -1 74550 . AGATGATATCTCATTGTGGTTTTGATTTGCATTTCTCTGATGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 74610 . TTTTTCATGTGTTTTTTGGCTGCATAGATGTCTTC . . . NS=1;AN=0 GT:PS ./.:. -1 74672 . C . . . NS=1;AN=0 GT:PS ./.:. -1 74676 . CACTTGTT . . . NS=1;AN=0 GT:PS .|.:74676 -1 74690 . GTTGT . . . NS=1;AN=0 GT:PS ./.:. -1 74701 . TTTTCTTGTAAATTTGTTTGAGTTCATTGTAGATTCTGGATATTAGCCCTTTGTCAGATGAGTAGGTTGCAAAAATTTTCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 74822 . TTTTGCTGTGCAGAAGCTCTTTAGTTTAATTAGATCCCATTTGT . . . NS=1;AN=0 GT:PS ./.:. -1 74927 . ATCAGAGAATACTACAAACACCTCTACGCAAATAAACTAGAAAATCTAGAAGAAATGGATAAATTCCTGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 75004 . CACTCTCCCAAGCCTA . . . NS=1;AN=0 GT:PS ./.:. -1 75085 . TAGCTTACCAACCAAAAAGAGTCCAGGACCAGATGGATTCACAGCCGAA . . . NS=1;AN=0 GT:PS ./.:. -1 75144 . GGTAC . . . NS=1;AN=0 GT:PS ./.:. -1 75172 . TCTGAAACTATTCCAATCAATAGAAAAAGAGGGAGTCCT . . . NS=1;AN=0 GT:PS ./.:. -1 75222 . TTTATGAGGCCAGCATCATTCTGATACCAAAGCCAGGCAGAGACACAACAAAAAAAGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 75301 . GATGAACATTGATGCAAAAATCCTCA . . . NS=1;AN=0 GT:PS ./.:. -1 75333 . TACTGGCAAAACGAATCCAGCAGCACATCAAAAAGCTTATCCACCAAGATCAAGTGGGCTTCATCCCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 75409 . AGGCTGGTTCAATATACGCAAATCAATAAATGTA . . . NS=1;AN=0 GT:PS ./.:. -1 75451 . TATAAACAGAGCCAAAGACAAAAACC . . . NS=1;AN=0 GT:PS ./.:. -1 75490 . AATAGATGCAGAAAAGGCCTTTGACAAA . . . NS=1;AN=0 GT:PS ./.:. -1 75523 . A . . END=75821;NS=1;AN=0 GT:PS ./.:. -1 75828 . CCATTGTCTCAGCCCAAAATCTTCC . . . NS=1;AN=0 GT:PS ./.:. -1 75876 . A . . END=76275;NS=1;AN=0 GT:PS ./.:. -1 76001 . C . . NS=1;CGA_WINEND=78000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.60:0.90:.:0:.:0:0.999:152 -1 76285 . AAAGAACAAAGCTGGAGGCATCACGCT . . . NS=1;AN=0 GT:PS ./.:. -1 76326 . TATACGAC . . . NS=1;AN=0 GT:PS ./.:. -1 76416 . TAATGCCGCATATCTACAACTATCTGATCTTTGACAAACCTGAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 76536 . TGAAACTGGATCCCTTCCTTACACCTTATACAAAAATCAATTCAAGATG . . . NS=1;AN=0 GT:PS ./.:. -1 76597 . AAACGTT . . . NS=1;AN=0 GT:PS ./.:. -1 76619 . AAAACCCTAGAAGAAAACCTAGGCTTTACCATTC . . . NS=1;AN=0 GT:PS ./.:. -1 76843 . CACTCAAGTCTATTCATTGAAGCATTGTTTTTCATAGTAAACGG . . . NS=1;AN=0 GT:PS ./.:. -1 76918 . TGATCCCAGCATTTTGGGAGGCTGAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 76992 . TGGCGAAACCCCATCTCTACCAAAAATACAAAAATTAGCTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 77081 . GGGAGGATCGCTTGAACCTGGGAGGCAGAAGTTTCAG . . . NS=1;AN=0 GT:PS ./.:. -1 77129 . CGTGCCTCTGCACTCC . . . NS=1;AN=0 GT:PS ./.:. -1 77175 . AAAA . . . NS=1;AN=0 GT:PS ./.:. -1 77276 . A . . . NS=1;AN=0 GT:PS ./.:. -1 77409 . ATAAGTATATATTTTATAAATGTTTAA . . . NS=1;AN=0 GT:PS ./.:. -1 77459 . AACGTAATACATATATAATTTTCTTATGGCAGGAGGAGGAAACA . . . NS=1;AN=0 GT:PS ./.:. -1 77565 . ATCCTGTAGCTGTTTTATGTAATATAAAAATGTAATTAAATTAACAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 77688 . CATGGGACACTAACATACAGACAAATTCATTTG . . . NS=1;AN=0 GT:PS ./.:. -1 77773 . CGAATAA . . . NS=1;AN=0 GT:PS ./.:. -1 77916 . TTGTTAAATATTCTCTATTTTATGACAA . . . NS=1;AN=0 GT:PS ./.:. -1 77958 . GTCGAAGAGAGAAACATGCAAGAACACCGTAGGGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 78001 . A . . NS=1;CGA_WINEND=80000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.80:1.04:.:0:.:0:0.999:152 -1 78015 . T . . . 
NS=1;AN=0 GT:PS ./.:. -1 78035 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.100|rs2691297;CGA_RPT=L1MC4a|L1|35.4;CGA_SDO=3 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:78035:VQLOW:23:23,23:23,23:2,24:-23,0,-23:-2,0,-24:8:1,7:7 -1 78093 . TATATTTTTAAAAACTAAAAAGATATATTAGCTGATG . . . NS=1;AN=0 GT:PS ./.:. -1 78193 . TATTAAAATAATTTAAAAATGACCAAGTATTTGATTATATCAAATATA . . . NS=1;AN=0 GT:PS ./.:. -1 78311 . TAAT . . . NS=1;AN=0 GT:PS ./.:. -1 78354 . GG . . . NS=1;AN=0 GT:PS ./.:. -1 78388 . TGAAACCCTATCTCTACAAAAAACAAACAAAAATA . . . NS=1;AN=0 GT:PS ./.:. -1 78437 . TTTAAAAAATAT . . . NS=1;AN=0 GT:PS ./.:. -1 78507 . TGATCTGACTATGTGCTTCCCTGAACAAATGCACTTTACC . . . NS=1;AN=0 GT:PS ./.:. -1 78630 . CTTCTGT . . . NS=1;AN=0 GT:PS ./.:. -1 78892 . CTCGCAGCCCTCACCCTGGAGAGTCCACAGGTACCAGGGGTTGGTCTGA . . . NS=1;AN=0 GT:PS ./.:. -1 78965 . CACAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 79022 . CAAGCAGGGCCACCTGGCCTGGGACTCCGGTAC . . . NS=1;AN=0 GT:PS ./.:. -1 79075 . GACGACT . . . NS=1;AN=0 GT:PS ./.:. -1 79150 . GCAGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 79415 . CCCGTGTCACAGCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 79663 . A G . . NS=1;AN=2;AC=1;CGA_RPT=L1PREC2|L1|19.7;CGA_SDO=3 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:79663:PASS:62:62,64:62,64:27,35:-62,0,-64:-27,0,-35:14:1,13:13 -1 79678 . T C . . NS=1;AN=2;AC=1;CGA_RPT=L1PREC2|L1|19.7;CGA_SDO=3 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:79663:PASS:54:54,64:54,64:24,35:-54,0,-64:-24,0,-35:17:1,16:16 -1 79769 . CCTCCAA . . . NS=1;AN=0 GT:PS ./.:. -1 79876 . AACGACCATACTGCCAAAAGCAACCTACAAATTCAATGC . . . NS=1;AN=0 GT:PS ./.:. -1 80001 . T . . NS=1;CGA_WINEND=82000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.15:1.21:.:0:.:0:0.999:152 -1 80141 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.108|rs4030287;CGA_RPT=L1PB|L1|9.0;CGA_SDO=3 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:106:106,106:106,106:26,33:-106,0,-106:-26,0,-33:32:7,25:25 -1 80383 . AACCCTT . . . NS=1;AN=0 GT:PS ./.:. -1 80500 . AAATAAATAATCAGCAGAGTAAACAGACAACCCACA . . . NS=1;AN=0 GT:PS ./.:. -1 81204 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.100|rs2531303;CGA_RPT=Tigger5|TcMar-Tigger|32.1;CGA_SDO=3 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:40:40,40:40,40:16,28:-40,0,-40:-16,0,-28:17:2,15:15 -1 81257 . TGCCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 81326 . CAGAGACTGACTGTGTCAAAGTATTAGTCC . . . NS=1;AN=0 GT:PS ./.:. -1 81784 . AAAATGTAAAAAGTATCTAA . . . NS=1;AN=0 GT:PS ./.:. -1 81856 . TAATAAAATAAGAAGCCAAAAAACAGA . . . NS=1;AN=0 GT:PS ./.:. -1 82001 . G . . NS=1;CGA_WINEND=84000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.87:1.25:.:0:.:0:0.999:152 -1 82030 . T . . . NS=1;AN=0 GT:PS ./.:. -1 82112 . ACAATTCACCACAACTGACTTCAAAAAAAAAAAAAAAAAAAAAGAAGTACCGCA . . . NS=1;AN=0 GT:PS ./.:. -1 82210 . TATAACACACACACAAACACTAGGTTTAGATGTTTTCACAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 82300 . AACTCTCAGCCATTTGAGGCAAAATATTACAATTGA . . . NS=1;AN=0 GT:PS ./.:. -1 82453 . GCCAAGG . . . NS=1;AN=0 GT:PS ./.:. -1 82537 . AAATGAAGGCTAAGGCAGAATTATATATGGCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 82620 . AGAAGTTTTTCATATTTTTTTCTTTCTTG . . . NS=1;AN=0 GT:PS ./.:. -1 82673 . TTTTAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 82734 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.108|rs4030331;CGA_RPT=L1M4c|L1|45.3;CGA_SDO=3 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:VQLOW:32:32,32:30,30:4,18:-32,0,-32:-4,0,-18:38:25,13:13 -1 82797 . CATCCCA . . . NS=1;AN=0 GT:PS ./.:. -1 82884 . G . . . NS=1;AN=0 GT:PS ./.:. -1 82961 . CCCTGAGTAAGCAGATATTGAAAATATTAGACAAAAACTTTA . . . 
NS=1;AN=0 GT:PS ./.:. -1 83010 . GTCTTAA . . . NS=1;AN=0 GT:PS ./.:. -1 83052 . ACATATAAATAAATAAGAAATATGAATTTTTTTAAAGGTACAAAAAAATTCT . . . NS=1;AN=0 GT:PS ./.:. -1 83121 . TAAGTAA . . . NS=1;AN=0 GT:PS ./.:. -1 83242 . AATGACAACAAAAAAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 83329 . CAGGAAGACTATTTGAAGAAATGTGTTTG . . . NS=1;AN=0 GT:PS ./.:. -1 83385 . AATATAT . . . NS=1;AN=0 GT:PS ./.:. -1 83412 . ACTTCATCAAGGAAATATACAAAGATATTCACACC . . . NS=1;AN=0 GT:PS ./.:. -1 83511 . CAACGCG . . . NS=1;AN=0 GT:PS ./.:. -1 83556 . A . . . NS=1;AN=0 GT:PS ./.:. -1 83588 . TTAGGAAAAAGGCAACGCGTCA . . . NS=1;AN=0 GT:PS ./.:. -1 83642 . GAGAGCTCATTATAAACCATGGGTGCCAGAAGAGCTTAGGATG . . . NS=1;AN=0 GT:PS ./.:. -1 83783 . G . . END=84056;NS=1;AN=0 GT:PS ./.:. -1 84001 . A . . NS=1;CGA_WINEND=86000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.21:1.10:.:0:.:0:0.999:152 -1 84091 . CACTTTTAAAAAAAAGACTCCTTCAGATACAAACTAAAAAACACTA . . . NS=1;AN=0 GT:PS ./.:. -1 84195 . ATATAAAAGCAAT . . . NS=1;AN=0 GT:PS ./.:. -1 84241 . TATATGT . . . NS=1;AN=0 GT:PS ./.:. -1 84283 . TATGTTA . . . NS=1;AN=0 GT:PS ./.:. -1 84299 . GATGTATACAGATGTGGTTTGTGAAATTACCAACATAAAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 84379 . CTATTGA . . . NS=1;AN=0 GT:PS ./.:. -1 84435 . AAATCCCCATGGTA . . . NS=1;AN=0 GT:PS ./.:. -1 84585 . ACAGAAAACAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 84680 . ATTAACAGAATGGATTTTTTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 84750 . ACACAAT . . . NS=1;AN=0 GT:PS ./.:. -1 85048 . AAAAGATAAAACATCTAGA . . . NS=1;AN=0 GT:PS ./.:. -1 85118 . CTATAGATAACACTTCTCTCAAAAACTGCAGAGTAC . . . NS=1;AN=0 GT:PS ./.:. -1 85197 . ATATGTTAGGCCATAAGATAAGCTCAATAAACTTAAAAAGATT . . . NS=1;AN=0 GT:PS ./.:. -1 85403 . AGACGATTGAAAACAAATATA . . . NS=1;AN=0 GT:PS ./.:. -1 85490 . AACATGT . . . NS=1;AN=0 GT:PS ./.:. -1 85801 . ATGCAAAAAAAATGAA . . . NS=1;AN=0 GT:PS ./.:. -1 85994 . AAATGTACCAGAATCTGAAAACATCTAT . . . NS=1;AN=0 GT:PS ./.:. -1 86001 . C . . NS=1;CGA_WINEND=88000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.57:0.68:.:0:.:0:0.999:152 -1 86300 . TTAGTTCAAATTGACTTTTGAACATACTTGGACTA . . . NS=1;AN=0 GT:PS ./.:. -1 86417 . AAATATA . . . NS=1;AN=0 GT:PS ./.:. -1 86429 . T . . . NS=1;AN=0 GT:PS ./.:. -1 86435 . A . . . NS=1;AN=0 GT:PS ./.:. -1 86448 . C . . . NS=1;AN=0 GT:PS ./.:. -1 86452 . GTAA . . . NS=1;AN=0 GT:PS ./.:. -1 86469 . AGTAAATATTAATATATTTGTATTGCTAGAACCCCAGAT . . . NS=1;AN=0 GT:PS ./.:. -1 86515 . GTGAAAGGACAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 86573 . TACATTAGAATCAGTATTATCAACATAAA . . . NS=1;AN=0 GT:PS ./.:. -1 86636 . TCTTAAAAAAATATAATATGGACATATTATATATTATATGCAT . . . NS=1;AN=0 GT:PS ./.:. -1 86687 . GTGTGTCTATACAT . . . NS=1;AN=0 GT:PS ./.:. -1 86758 . T . . . NS=1;AN=0 GT:PS ./.:. -1 86796 . T . . . NS=1;AN=0 GT:PS ./.:. -1 86809 . ATAATT . . . NS=1;AN=0 GT:PS ./.:. -1 86818 . CCTC . . . NS=1;AN=0 GT:PS ./.:. -1 86983 . ACATCAA . . . NS=1;AN=0 GT:PS ./.:. -1 88001 . A . . NS=1;CGA_WINEND=90000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.60:0.82:.:0:.:0:0.999:152 -1 88065 . CTAG . . . NS=1;AN=0 GT:PS ./.:. -1 88125 . TCTGACT . . . NS=1;AN=0 GT:PS ./.:. -1 88239 . CCATGGG . . . NS=1;AN=0 GT:PS ./.:. -1 88259 . GACCTTCTCCTGGGACCACAGGCCTGTGTCTCTATCTGCT . . . NS=1;AN=0 GT:PS ./.:. -1 88329 . TAAAACT . . . NS=1;AN=0 GT:PS ./.:. -1 88570 . TTTCACG . . . NS=1;AN=0 GT:PS ./.:. -1 88773 . G . . . NS=1;AN=0 GT:PS ./.:. -1 88797 . G . . . NS=1;AN=0 GT:PS ./.:. -1 88915 . GCAGAGCCGGCCCCCATCTCCTCTGACCTCCCACCTCTCTCCCTC . . . NS=1;AN=0 GT:PS ./.:. -1 89251 . AATATTGAGCACTATCAGTAAAATACATAAAACCCT . . . NS=1;AN=0 GT:PS ./.:. 
-1 89346 . CAAATGGATTACACGCAT . . . NS=1;AN=0 GT:PS ./.:. -1 89412 . TGGCTCT . . . NS=1;AN=0 GT:PS ./.:. -1 89575 . TAATGAATAATTTTAATT . . . NS=1;AN=0 GT:PS ./.:. -1 89628 . GGTTTCT . . . NS=1;AN=0 GT:PS ./.:. -1 89845 . AATACTCCCAC . . . NS=1;AN=0 GT:PS ./.:. -1 89885 . AGCCTCCATCTTTCCACTCCTTAATCTGGGCTTGGCCAAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 89948 . ATTAACAAGTCTGATGTGCACAGAGGCTGTAGAATGTGCACTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 90001 . T . . NS=1;CGA_WINEND=92000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.44:0.93:.:0:.:0:0.999:152 -1 90025 . TGCCCCACGAAGGAAACAGAGCCAACC . . . NS=1;AN=0 GT:PS ./.:. -1 90062 . T . . END=90300;NS=1;AN=0 GT:PS ./.:. -1 90306 . GACAGTCCCTCTGTCCCTCTGTCTCTGCCAACCAGTTAACCTGCTGCTTCCTGGAGGAAGACAGTCCCTCTGTCCCTCTGTCTCTGCCAACCAGTTAACCTGCTGCTTCCTGGAGGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 90452 . TTGACCGCAGACATGCA . . . NS=1;AN=0 GT:PS ./.:. -1 90638 . TTCTCTGCTCATTTAAAATGCCCC . . . NS=1;AN=0 GT:PS ./.:. -1 90674 . TACATTTTTATAGGATCAGGGATCTGCTCTTGGATTTATGTCATGT . . . NS=1;AN=0 GT:PS ./.:. -1 90777 . AGGCGCTGGGAGGCCTGTGCATCAGCTGCTGCTGTCTGTAGCTGAGTTCCTTCACCCCTCTGCTGTCCTCAGCTCCTTCGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 90868 . CAGGAAATCAATGTCATGCTGACATCACTCTAGATCTAAAACTTG . . . NS=1;AN=0 GT:PS ./.:. -1 90939 . ACATCTGTAATCCCAGCAATTTGGGAGGCCGA . . . NS=1;AN=0 GT:PS ./.:. -1 91004 . GATCCTGGCTAACACGGTGAAACCCCGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 91061 . GGTTTGG . . . NS=1;AN=0 GT:PS ./.:. -1 91079 . TGTAGCCCCAGCTACTTGGGAGGCTGAAGCAGGAGAATGGCGTGAACCTGGGAGGTGGAGCTGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 91160 . GCCACTGCACTCCAGACTGGGAGAGAGAGCGAGACTTTCTCAAAAAAAAAAAAATCTTA . . . NS=1;AN=0 GT:PS ./.:. -1 91252 . CTAGAATCCTTGAAGCGCCCCCAAGGGC . . . NS=1;AN=0 GT:PS ./.:. -1 91417 . TGTGTGGCACCAGGTGGCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 91450 . GGCAAACCCGAGCCCAGGGATGCGGGGTGGGGGCAGGTACATCC . . . NS=1;AN=0 GT:PS ./.:. -1 91504 . TACAGCAGATTAACTCTGTTCTGTTTCATTGTGGTTGT . . . NS=1;AN=0 GT:PS ./.:. -1 91548 . TGCGTTTTTTTTTCTCCAACTTTGTGCTTCATCGGGAAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 91613 . GAAGAAAAGGCCAAACTCTGGAAAAAATTTGAAT . . . NS=1;AN=0 GT:PS ./.:. -1 91666 . GACCACA . . . NS=1;AN=0 GT:PS ./.:. -1 91727 . AAAAGCCA . . . NS=1;AN=0 GT:PS ./.:. -1 91801 . AGACAAAAAAGCTACATCCCTGCCTCTACCTCCATCGCATGCAAA . . . NS=1;AN=0 GT:PS ./.:. -1 91886 . AACCATT . . . NS=1;AN=0 GT:PS ./.:. -1 91937 . TCCCCAATACCCGCAC . . . NS=1;AN=0 GT:PS ./.:. -1 91983 . CAACCTTTGGGAAAAGCAC . . . NS=1;AN=0 GT:PS ./.:. -1 92001 . C . . NS=1;CGA_WINEND=94000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.09:1.17:.:0:.:0:0.999:152 -1 92143 . GAACGTG . . . NS=1;AN=0 GT:PS ./.:. -1 92478 . GCTGAGGCTGCTATTCTTTTGCTC . . . NS=1;AN=0 GT:PS ./.:. -1 92635 . CCAAACCTCAGTCCCTCAGTTGTAAAATTAAAAAAAAAAAAAAGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 92708 . GATTTGA . . . NS=1;AN=0 GT:PS ./.:. -1 92833 . GAGGGACAGAAACAAGTGGGAGAAGGTAAAGAGATGGACAA . . . NS=1;AN=0 GT:PS ./.:. -1 92915 . TATGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 93120 . GAAATACAGAAGAGAGATTTCTCATGGTTAAAACGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 93279 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.111|rs4265376;CGA_RPT=L5|RTE|30.4;CGA_SDO=14 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:VQLOW:36:36,36:36,36:13,27:-36,0,-36:-13,0,-27:16:1,15:15 -1 93491 . TCAATTTTATTGAAGTTCACTTCTGACCTCTTTA . . . NS=1;AN=0 GT:PS ./.:. -1 93662 . TAGAGCTGAGACCATTTGCCACTCAGTTTCCTCA . . . NS=1;AN=0 GT:PS ./.:. -1 93725 . CCGGTTTTTTTGTTTTTGTTTTTGTTTTTAGACGGA . . . NS=1;AN=0 GT:PS ./.:. -1 93804 . GCTCACTGCAACCTCCGCTGCCTGTGTT . . . NS=1;AN=0 GT:PS ./.:. -1 94001 . G . . 
NS=1;CGA_WINEND=96000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.97:1.22:.:0:.:0:0.999:152 -1 94011 . T . . . NS=1;AN=0 GT:PS ./.:. -1 94116 . CTCGTAAGTAGATTACTACAATCACCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 94221 . AAATGAAAATCTGACCACGTTACTCT . . . NS=1;AN=0 GT:PS ./.:. -1 94257 . TCCGCCTATGGCCGCTGTTAGGATCAAGTCTAAACTCCCGAC . . . NS=1;AN=0 GT:PS ./.:. -1 94461 . G . . . NS=1;AN=0 GT:PS ./.:. -1 94522 . GCTGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 94567 . GCCTCACTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 94722 . AATCACATCACATTGCTTCCTTCATATTTTTTTGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 94805 . GCTCCTTTTCTTTTCTTTTCTTTTTTTTTTTTTTTTTTTTTTTGAGT . . . NS=1;AN=0 GT:PS ./.:. -1 94857 . TCTCGCTC . . . NS=1;AN=0 GT:PS ./.:. -1 94890 . TGCAATCTAGGCTCACTGCAAGCTCTGCCTCCTGGGTTCACGTCATTCTCCT . . . NS=1;AN=0 GT:PS ./.:. -1 94975 . ACCTACCACCACGCCTGGCTAATTTTTTTTTATTTTTTATTT . . . NS=1;AN=0 GT:PS ./.:. -1 95147 . GCATAAACTAAATGTTTTCCAAAGGGAATAGGGCAAAACAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 95290 . GGGCTCTCCACTTACAAGAAGAGAGCATGG . . . NS=1;AN=0 GT:PS ./.:. -1 95412 . CCTGTTAATTTAATCACACGGAACACTTCTATTTAAAATTCCCGAGAGTTAAGATGTAAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 95532 . CATCAGA . . . NS=1;AN=0 GT:PS ./.:. -1 95575 . ATGGGGCAATTTCTTAAAAGCACCATGTATTTTATCG . . . NS=1;AN=0 GT:PS ./.:. -1 95695 . CCACTATAAAGAACCCAGCGTGGTTTTAACTAATGGATCAAAAGATG . . . NS=1;AN=0 GT:PS ./.:. -1 96001 . T . . NS=1;CGA_WINEND=98000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.96:1.21:.:0:.:0:0.999:152 -1 96005 . TGGGAGGCACAGTGGAAGATCATG . . . NS=1;AN=0 GT:PS ./.:. -1 96150 . G . . . NS=1;AN=0 GT:PS ./.:. -1 96250 . TCACGGAGGAAAAAAATCTCTCAATGATCTTATCTTTATA . . . NS=1;AN=0 GT:PS ./.:. -1 96363 . GAGGCAACCTCCAAAGGTGGGGCCCTCTGCTCACCTG . . . NS=1;AN=0 GT:PS ./.:. -1 96476 . TTTCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 96589 . AAGTACA . . . NS=1;AN=0 GT:PS ./.:. -1 96636 . ATAAATTCGTTCAAGCAGCCATTCTG . . . NS=1;AN=0 GT:PS ./.:. -1 96895 . G . . . NS=1;AN=0 GT:PS ./.:. -1 96941 . CGGTAGACTAA . . . NS=1;AN=0 GT:PS ./.:. -1 97044 . T . . . NS=1;AN=0 GT:PS ./.:. -1 97127 . TTATAAAAAGGTGAGCTGTAATAAATACTAGTGCCACATTT . . . NS=1;AN=0 GT:PS ./.:. -1 97279 . AAGTACA . . . NS=1;AN=0 GT:PS ./.:. -1 97311 . ACCGGCAAATTCTGTTGTTTGTATAAACATCAGCCATGTT . . . NS=1;AN=0 GT:PS ./.:. -1 97461 . ATAATTAATACATTATTAAATTGAATTGTGC . . . NS=1;AN=0 GT:PS ./.:. -1 97563 . TCAAATT . . . NS=1;AN=0 GT:PS ./.:. -1 97621 . T . . . NS=1;AN=0 GT:PS ./.:. -1 97624 . G . . . NS=1;AN=0 GT:PS ./.:. -1 97818 . AGACATG . . . NS=1;AN=0 GT:PS ./.:. -1 97866 . CCACGTG . . . NS=1;AN=0 GT:PS ./.:. -1 97929 . TAACAATCTGAGAGACATTCATACATTTTCCATGTGCTGTAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 98001 . C . . NS=1;CGA_WINEND=100000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.74:1.17:.:0:.:0:0.999:152 -1 98018 . CCCTGTTGA . . . NS=1;AN=0 GT:PS ./.:. -1 98103 . AATAAAGAATTCTATCAATGCTGAGGGAAGATGACTAG . . . NS=1;AN=0 GT:PS ./.:. -1 98307 . CCTGCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 98378 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.108|rs3868675&dbsnp.108|rs4114931;CGA_SDO=14 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:VQLOW:37:37,37:37,37:14,27:-37,0,-37:-14,0,-27:17:1,16:16 -1 98447 . ACATGGGCACCCATATTTTTCTAGCCACTTTC . . . NS=1;AN=0 GT:PS ./.:. -1 98507 . C . . . NS=1;AN=0 GT:PS ./.:. -1 98566 . GTGATTTTCTGTTGGTGTTCACTTCAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 98615 . TTATTGACTGACTGACTAACTAATGTG . . . NS=1;AN=0 GT:PS ./.:. -1 98652 . TTCATAAAGAAAGGCTCTCTACAAAAACGGAGGGATGCCCT . . . NS=1;AN=0 GT:PS ./.:. -1 98776 . AATGTGCCTTTCTAGTAACAGGTTTTTAGAAA . . . NS=1;AN=0 GT:PS ./.:. 
-1 98830 . TATTTGTGTGTGTGCATGTGGTAGTGGGGAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 98882 . AGAAAGA . . . NS=1;AN=0 GT:PS ./.:. -1 98910 . ATACTGTATTCAGGGGGAAAAAATTTTCCCAAGGTCCTA . . . NS=1;AN=0 GT:PS ./.:. -1 98996 . TGCTTTTATTTATTTATTTATTTATTTATTTATTTATTTATTTATTTTTCCTTTTTTTTCTTTCTCTTTTTTTCTTCTTTTTTTTTTCTTTTCTTTCTTTTTTTTTTTTTTTTTTTTTTTTGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 99331 . CTCATGATCCACCCACGTTGGCCTCCCAAAGTGCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 99373 . CAGGCGTGAGCCACCGCCCCTGGCCAGGATTGCTTTTACAGC . . . NS=1;AN=0 GT:PS ./.:. -1 99480 . T . . . NS=1;AN=0 GT:PS ./.:. -1 99488 . G . . . NS=1;AN=0 GT:PS ./.:. -1 99497 . C . . . NS=1;AN=0 GT:PS ./.:. -1 99584 . GTTAAAAGATATTATTTTGCTTTACACTTTTTCTCTCAGAAATAAGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 99754 . AGGGGAGATTTTTCAGGAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 99820 . GAAAGTGTATAATGATGTCAGA . . . NS=1;AN=0 GT:PS ./.:. -1 99871 . TGCCGTCA . . . NS=1;AN=0 GT:PS ./.:. -1 99938 . AAAAGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 100001 . A . . NS=1;CGA_WINEND=102000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.83:1.22:.:0:.:0:0.999:153 -1 100319 . TTTGGACAGTA . . . NS=1;AN=0 GT:PS ./.:. -1 100481 . TCCGTGTTACTGAGCAGTTCTCAG . . . NS=1;AN=0 GT:PS ./.:. -1 100538 . TTCTTTC . . . NS=1;AN=0 GT:PS ./.:. -1 100583 . AGGTCAAATTCAAAGGAGAGAAAAAAGCTC . . . NS=1;AN=0 GT:PS ./.:. -1 100652 . ATGGCACAATTT . . . NS=1;AN=0 GT:PS ./.:. -1 100750 . TACCCTTCTAATCTCTATCACAGCAAAAAGATA . . . NS=1;AN=0 GT:PS ./.:. -1 100855 . ACATTGA . . . NS=1;AN=0 GT:PS ./.:. -1 100942 . AATGTAGAAATGCTACAGATTATATTCTCTGATTATGACACAA . . . NS=1;AN=0 GT:PS ./.:. -1 101011 . TTTAAAAGCTTTCTCTTAAATAATTCTATGTCAAAAAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 101092 . CTTTGGGAGGCCAAGGTGGGCAGGTCACTTGAGGTCAGCAGT . . . NS=1;AN=0 GT:PS ./.:. -1 101140 . CCAGCCTCGTCAACATGGTAACACCC . . . NS=1;AN=0 GT:PS ./.:. -1 101210 . TGCCTGTAATCCCAGCTACTTAGGAGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 101265 . AAGGTGGAGGTTGCAGTGAGCTGA . . . NS=1;AN=0 GT:PS ./.:. -1 101312 . TAGGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 101376 . GTGGAAAATAGTGACAATAAAAATATT . . . NS=1;AN=0 GT:PS ./.:. -1 101425 . TTGAGATGCCAAGGTGGCAGGATCACTTGAGACCAGGAGTTCGCA . . . NS=1;AN=0 GT:PS ./.:. -1 101509 . C . . . NS=1;AN=0 GT:PS ./.:. -1 101512 . C . . . NS=1;AN=0 GT:PS ./.:. -1 101547 . TTCCTGTAATCCCA . . . NS=1;AN=0 GT:PS ./.:. -1 101608 . A . . . NS=1;AN=0 GT:PS ./.:. -1 101618 . C . . . NS=1;AN=0 GT:PS ./.:. -1 101666 . TCCTATCTCAAAAAAAAAAAAAAAATCAC . . . NS=1;AN=0 GT:PS ./.:. -1 101792 . GGAAATAA . . . NS=1;AN=0 GT:PS ./.:. -1 101825 . TAGCCACGGTGACTCACATCTGTAATCCCAGCACTTTGGGAGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 101909 . GCCTGGCCAACATGGTGAAATCTTGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 102001 . G . . NS=1;CGA_WINEND=104000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.86:1.32:.:0:.:0:0.999:153 -1 102133 . ATAAACTAGAAAACAGAAACATAGA . . . NS=1;AN=0 GT:PS ./.:. -1 102181 . ATGCCTT . . . NS=1;AN=0 GT:PS ./.:. -1 102207 . GTGAATTAAGGAAGGGAAGAGATGGTTGGAGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 102322 . AGAGATGCTTGACTGCTA . . . NS=1;AN=0 GT:PS ./.:. -1 102412 . ATCTCCTCCCCTCCCCTACTCCTCACCCCACACTCAA . . . NS=1;AN=0 GT:PS ./.:. -1 102494 . GCATTCTTATTTCCCTGATTTCTTTTTGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 102593 . CAAGGGCTTCACAGACAGT . . . NS=1;AN=0 GT:PS ./.:. -1 102635 . TTCAGGTTTTATACCTACCTTATAGATAAAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 102693 . TGTTCCCAAAGCCTCGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 102766 . ACTCTACTGCCTCTCCATGGATAAAGACAGAGATCACATATTAATA . . . NS=1;AN=0 GT:PS ./.:. -1 102856 . TGATTAA . . . NS=1;AN=0 GT:PS ./.:. -1 102915 . 
AATTTGAATAACTCCCTGCGGGTGAAGTTCAAAGTACTAT . . . NS=1;AN=0 GT:PS ./.:. -1 103007 . A . . . NS=1;AN=0 GT:PS ./.:. -1 103088 . TCTACTAAAAATAAAAAATTAGCCGGGCCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 103165 . G . . . NS=1;AN=0 GT:PS ./.:. -1 103214 . CACTGCACTCCAGCCTGGGCGACAGAGCGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 103257 . AAAGTAAAATAAAATAAAATAAAAAATAAAAGTTTG . . . NS=1;AN=0 GT:PS ./.:. -1 103304 . ATCAGGGAGGTCTGTTGG . . . NS=1;AN=0 GT:PS ./.:. -1 103336 . A . . . NS=1;AN=0 GT:PS ./.:. -1 103381 . TCAGGGTCCTAGCAGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 103441 . AGGGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 103523 . TGTTGGAGGTGGGGCCTAATGGGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 103737 . CAGAGTAGCTAGGATTACAGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 103800 . AGACGGGGTTTCACCAT . . . NS=1;AN=0 GT:PS ./.:. -1 103853 . GATACACCTGCCTCGGCCTCCCAA . . . NS=1;AN=0 GT:PS ./.:. -1 103890 . CAGGTGTGAGCCACCATGCCTACC . . . NS=1;AN=0 GT:PS ./.:. -1 103970 . AACCCCTCTCTCTCGCCACGTGATCTC . . . NS=1;AN=0 GT:PS ./.:. -1 104001 . C . . NS=1;CGA_WINEND=106000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.06:1.45:.:0:.:0:0.999:153 -1 104030 . GAGTGGA . . . NS=1;AN=0 GT:PS ./.:. -1 104104 . AGCCAAATAAACCTCTCTTCTTTAAAATTATTCAGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 104157 . AACAACACACACACACACACACACACACATACACACACACG . . . NS=1;AN=0 GT:PS ./.:. -1 104222 . AATTAGA . . . NS=1;AN=0 GT:PS ./.:. -1 104317 . TAATGGTTAAGTAATTATTTGCTCTTACTCTCAA . . . NS=1;AN=0 GT:PS ./.:. -1 104382 . TCAACTAGAATCTAGGAAGCAGAGAACCTGAGTGTTGCA . . . NS=1;AN=0 GT:PS ./.:. -1 104519 . CAACAGAGCGACTCAGATGCTATAAAACTTGCTAACA . . . NS=1;AN=0 GT:PS ./.:. -1 104611 . CACAGTC . . . NS=1;AN=0 GT:PS ./.:. -1 104672 . ACCTCACAGAGAAGGAAATTTACACGCGAG . . . NS=1;AN=0 GT:PS ./.:. -1 104756 . TTCTGGA . . . NS=1;AN=0 GT:PS ./.:. -1 104819 . GACATTA . . . NS=1;AN=0 GT:PS ./.:. -1 104939 . ATACAAAGAGTAATACCATGTCACTTAAGAATAGAATCATGGACGAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 105054 . TTCCTTA . . . NS=1;AN=0 GT:PS ./.:. -1 105120 . ACACAAA . . . NS=1;AN=0 GT:PS ./.:. -1 105178 . GCTCAGATACCTTCTCCGCTA . . . NS=1;AN=0 GT:PS ./.:. -1 105265 . GAGACTAATGAGTAGTGAG . . . NS=1;AN=0 GT:PS ./.:. -1 105295 . AAGCTGAGAATGCTTCTACCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 105330 . GGAATATTCATCAAAACACAGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 105484 . TTCCAGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 105631 . ACACTCA . . . NS=1;AN=0 GT:PS ./.:. -1 105713 . TACAATAAACATGTGTTTTTAACAAGAAAAGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 105785 . A . . . NS=1;AN=0 GT:PS ./.:. -1 105886 . TAGAGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 105984 . ACATGGGTGTTAAAATCAG . . . NS=1;AN=0 GT:PS ./.:. -1 106001 . A . . NS=1;CGA_WINEND=108000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.19:1.25:.:0:.:0:0.999:153 -1 106040 . AGAGCAAGCTGGGAAAGCAGTGGCCTTTAATAC . . . NS=1;AN=0 GT:PS ./.:. -1 106115 . C . . . NS=1;AN=0 GT:PS ./.:. -1 106124 . AGT . . . NS=1;AN=0 GT:PS ./.:. -1 106130 . T . . . NS=1;AN=0 GT:PS ./.:. -1 106133 . TAT . . . NS=1;AN=0 GT:PS ./.:. -1 106154 . AATAGTAAACTGAC . . . NS=1;AN=0 GT:PS ./.:. -1 106462 . AGAATAG . . . NS=1;AN=0 GT:PS ./.:. -1 106497 . GTAATTTTAATATATAACTGGGGTGAGAATCATTGACATAATTGTAACAGGATAATATTCAGGAAATATGG . . . NS=1;AN=0 GT:PS ./.:. -1 106608 . AAAAGTTTTATGTTTTCCCCTAACTCAGGGTCATCAG . . . NS=1;AN=0 GT:PS ./.:. -1 106908 . CTTGAGCAAATGGTAAATTAACTCTCTCTTTTCTCTCTCTCTCTAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 107122 . TTTACTGGAGTACACAATTGTGACTATTTTTAGCCATAGGAAC . . . NS=1;AN=0 GT:PS ./.:. -1 107279 . CACCTTACACTTAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 107329 . CACTTTTCAAAAGACT . . . NS=1;AN=0 GT:PS ./.:. -1 107514 . 
GGAGATTTGGACATAGAGAGAGGCACACGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 107967 . TGGGCTT . . . NS=1;AN=0 GT:PS ./.:. -1 108001 . A . . NS=1;CGA_WINEND=110000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.13:1.11:.:0:.:0:0.999:153 -1 108016 . TCTTTCCTGGCTATGTTTCTGACATCCTCTTGTACCATGCTCCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 108252 . CTATAACAACCTAATATATTCTCAATTGATTAACTGTTTTGCTGAATAC . . . NS=1;AN=0 GT:PS ./.:. -1 108318 . GAAAGAAAACATGGCCAGGTGCAGTGGC . . . NS=1;AN=0 GT:PS ./.:. -1 108356 . AATCCCACCACTT . . . NS=1;AN=0 GT:PS ./.:. -1 108401 . CTTCAAAAAATTTT . . . NS=1;AN=0 GT:PS ./.:. -1 108537 . GTCACTATCAAAAAAAAAAAAAAAAGAAT . . . NS=1;AN=0 GT:PS ./.:. -1 108596 . TATCTAT . . . NS=1;AN=0 GT:PS ./.:. -1 108672 . AATGCAT . . . NS=1;AN=0 GT:PS ./.:. -1 108890 . TGACCCAGCATGGCTGAACACTCAGTGACTACCAC . . . NS=1;AN=0 GT:PS ./.:. -1 108998 . TTATATTCAGAATTACTCAAGTCTTAGAAGCACCACTT . . . NS=1;AN=0 GT:PS ./.:. -1 109061 . TCAAGTGATGGGCTGAAGTGAAGGGAGGGAGTCACTCACTTGAACGGT . . . NS=1;AN=0 GT:PS ./.:. -1 109403 . ATATTTG . . . NS=1;AN=0 GT:PS ./.:. -1 109568 . GTGTATGCGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGAAAGACAGAAGAAAGAGGGAGACCTTAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 109707 . AGGGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 109773 . ATATATGCAATATATATACATATATACACACATATACATATGTATTTAAATATTTAAATTACATTTTCTCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 109852 . AGATATG . . . NS=1;AN=0 GT:PS ./.:. -1 109951 . CAACCCTCCTGTATTAGTCTCCCCAGTA . . . NS=1;AN=0 GT:PS ./.:. -1 110001 . A . . NS=1;CGA_WINEND=112000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.05:1.31:.:0:.:0:0.999:153 -1 110004 . ATGTCCACCTTTATGCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 110059 . AACTTAATAATAAAAACATTTCAAATGTAAAGAAATTTA . . . NS=1;AN=0 GT:PS ./.:. -1 110124 . AAATTGG . . . NS=1;AN=0 GT:PS ./.:. -1 110148 . ACACTTTTCAAAAGAATACATGCATGCA . . . NS=1;AN=0 GT:PS ./.:. -1 110217 . TTAGAGAAATGCAAATCAAAACCATAATGAGATACCATCTC . . . NS=1;AN=0 GT:PS ./.:. -1 110416 . CCATTCG . . . NS=1;AN=0 GT:PS ./.:. -1 110438 . TACTGGGTATATACCCAGATGAATATAAACCAT . . . NS=1;AN=0 GT:PS ./.:. -1 110504 . TTGCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 110606 . ATGCCGC . . . NS=1;AN=0 GT:PS ./.:. -1 110700 . ATACAGCATACTCTCAGTTATAAGTGGGAGCTAAATGAT . . . NS=1;AN=0 GT:PS ./.:. -1 110925 . AAATAAAAGTTAAAAAAAAAAGAAAATTA . . . NS=1;AN=0 GT:PS ./.:. -1 110980 . TATGAAAAACACATATCTTTCATT . . . NS=1;AN=0 GT:PS ./.:. -1 111099 . ACACCTGTAATCCCAGCACTTTGGGAGGCCGATG . . . NS=1;AN=0 GT:PS ./.:. -1 111158 . TTCGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 111201 . TAAAAATACAAAAATTAGCTGGGTGTGGTGGCAGGCACCTGTAATAC . . . NS=1;AN=0 GT:PS ./.:. -1 111259 . GAGGCTGAGGCAGGAGAATCGCTT . . . NS=1;AN=0 GT:PS ./.:. -1 111341 . GCAACAA . . . NS=1;AN=0 GT:PS ./.:. -1 111363 . GGGGAAAAAAAAAAACAAAAAAAACCACCACCATCATTTTGCAAG . . . NS=1;AN=0 GT:PS ./.:. -1 111520 . ATTATTTTGTATGCGATGACAACAGAATATATTATCATGCT . . . NS=1;AN=0 GT:PS ./.:. -1 111570 . AATCTCATTCATAATATAAAGTATAAATTTGTGATTTTG . . . NS=1;AN=0 GT:PS ./.:. -1 111708 . AAAATTTGAAACTAGTAACATGGAGGACTATTGTCATTGTTTA . . . NS=1;AN=0 GT:PS ./.:. -1 111779 . CAGTGTACATAAAAATAAT . . . NS=1;AN=0 GT:PS ./.:. -1 111916 . GGCTAATAGTAGGCACCTAATA . . . NS=1;AN=0 GT:PS ./.:. -1 111987 . C . . . NS=1;AN=0 GT:PS ./.:. -1 112001 . C . . NS=1;CGA_WINEND=114000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.16:1.52:.:0:.:0:0.999:153 -1 112014 . AATTACTGTTTAGAGAATAACATTTGATGGAATCATGCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 112065 . TTACGACTCAATTGTTTGTACTGACATTAACATCCCAAATCCTT . . . NS=1;AN=0 GT:PS ./.:. -1 112134 . ATGTGGCACCTGCTGAAGCCTGCTGCCTCATTTAA . . . NS=1;AN=0 GT:PS ./.:. 
-1 112200 . CTCTAACATTTTTTAGCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 112248 . TGACCTTTGTACCTGTTCTTTATTCCTGGAGT . . . NS=1;AN=0 GT:PS ./.:. -1 112320 . AGGTTAAATGGCACTAACTCAGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 112357 . CTGCCTC . . . NS=1;AN=0 GT:PS ./.:. -1 112402 . TAGTATCGAATCAAGTTTATAATTTTAAAATAATTGGGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 112504 . CCTGTTTGTTCACTCCTGCCACAGTCAGAATAGTGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 112656 . CACGTAGGTAATTTA . . . NS=1;AN=0 GT:PS ./.:. -1 112755 . TTCTTTAGAAAACTTGGACAATAGCATTTGCTGTCTTGTCCAAATTGTTACTAAGAAT . . . NS=1;AN=0 GT:PS ./.:. -1 112824 . TCTGACATGAAATGACATTGGAAAACATTAAACACGATTGAAATAAT . . . NS=1;AN=0 GT:PS ./.:. -1 112885 . TTATTATTAGAAACCAATTA . . . NS=1;AN=0 GT:PS ./.:. -1 112918 . AAAATAGT . . . NS=1;AN=0 GT:PS ./.:. -1 113068 . GTGATTTTTCAGGTTCACTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 113113 . A . . . NS=1;AN=0 GT:PS ./.:. -1 113179 . TATGAAAACAAGAGATAAATATACACAACTGAGGGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 113296 . TAACTTTTAATAGAACCAGTCACTACATTAAAAAAATGAC . . . NS=1;AN=0 GT:PS ./.:. -1 113584 . TACAATTTAGGAGAAGAAATTGTATGGAAGGAAGGTTCATTTC . . . NS=1;AN=0 GT:PS ./.:. -1 113679 . GATAAGATATTGTGGCTGCTACCAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 113965 . TTGACGATTACCTGTAGCCAACCCTAAGTGAAGAACTGAG . . . NS=1;AN=0 GT:PS ./.:. -1 114001 . T . . NS=1;CGA_WINEND=116000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.04:1.52:.:0:.:0:0.999:153 -1 114039 . TAGCTAAGAACCATGTGAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 114079 . CCTCAGTTGAAATTTAAGATGACATATTGAGCAGACATACTGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 114180 . ACTGATTTTGAGATTCTCACATAAGTATTAC . . . NS=1;AN=0 GT:PS ./.:. -1 114237 . ATTTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 114284 . ATTTCCCTATG . . . NS=1;AN=0 GT:PS ./.:. -1 114376 . AACTCACACAATCTGTGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 114465 . TAGGAAAAAAAATCCTCTCTGGACAAATAAATCATCAAAGCAAGC . . . NS=1;AN=0 GT:PS ./.:. -1 114654 . AAACACATATTTTAATGTGGTTA . . . NS=1;AN=0 GT:PS ./.:. -1 114690 . ATCACAACTATGAGTAAAGACCAAGAAAATTGTGCTGGATG . . . NS=1;AN=0 GT:PS ./.:. -1 114745 . GGCTCCCCTCCTATTTAAGTCTGGGTACTGTGTCACCCGAAGTCTTCAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 114807 . GGTCTGGGTTTGCCTATGAAAGAAACTCA . . . NS=1;AN=0 GT:PS ./.:. -1 114919 . A . . . NS=1;AN=0 GT:PS ./.:. -1 114985 . ATTCCTATAAGCTTGGGTTCTGTGCCCACACTCTAGACTGTCAGGCTA . . . NS=1;AN=0 GT:PS ./.:. -1 115042 . ATATAAAACAGACCTCTTCTGATTTTGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 115115 . GCATAAGGCCCTAATTAATATTAAACTTTTATCA . . . NS=1;AN=0 GT:PS ./.:. -1 115200 . TAAAATATAAAGAATTGTCCAGAAATATATAAAAAAAGAATCAC . . . NS=1;AN=0 GT:PS ./.:. -1 115268 . TATAACAATTGTATGGACTAGG . . . NS=1;AN=0 GT:PS ./.:. -1 115320 . TTTGAAGAAAAAAGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 115436 . A . . . NS=1;AN=0 GT:PS ./.:. -1 115440 . T . . . NS=1;AN=0 GT:PS ./.:. -1 115471 . ACAATTTCCTAACAATTTTGGGGTTTATATTTTTGAAAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 115662 . TGTCTGTCCACGATAAGCACTATCAC . . . NS=1;AN=0 GT:PS ./.:. -1 115785 . AAAATTCCTCAAGACTGCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 115819 . CACCCTCACAAGAACACTTGCCTAGCAATGGCTGTTTCTGCCAGTAAGTTAACACCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 115886 . CAGACCCTGTGACCAATGAT . . . NS=1;AN=0 GT:PS ./.:. -1 116001 . A . . NS=1;CGA_WINEND=118000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.12:1.54:.:0:.:0:0.999:153 -1 116017 . AATCCTTGTTTATTTCCAAATAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 116131 . TCCAGAAGAATTTCTGTAACAG . . . NS=1;AN=0 GT:PS ./.:. -1 116164 . GTTCTTTGCATGTTTGCTAGAACTCACC . . . NS=1;AN=0 GT:PS ./.:. -1 116204 . TGAGCAACCAAAGCCTGGTTTTTGTGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 116253 . 
GGAGGGGGGTTTATCGTACTGATTCAAGGTGTGAAGGTAACATC . . . NS=1;AN=0 GT:PS ./.:. -1 116486 . TACTGTAAAACATCCCATGGTTTCTCATAG . . . NS=1;AN=0 GT:PS ./.:. -1 116523 . AGTAAAAGTGAAATTTTTATGATGGCCTGAGAAACTTTTCCCATTAGATGCCCAAGTGCTGGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 116624 . GGCAGTCACACTAGCCTCCTTGCTGCTCCACAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 116878 . AAACTGTAAATATACATGTTCACTTTTTTA . . . NS=1;AN=0 GT:PS ./.:. -1 116926 . TGGAATG . . . NS=1;AN=0 GT:PS ./.:. -1 116985 . A . . . NS=1;AN=0 GT:PS ./.:. -1 116996 . TACTAGG . . . NS=1;AN=0 GT:PS ./.:. -1 117034 . CACATTGGGC . . . NS=1;AN=0 GT:PS ./.:. -1 117230 . ATATTATTTTCATGTATAAAGCAC . . . NS=1;AN=0 GT:PS ./.:. -1 117308 . C . . . NS=1;AN=0 GT:PS ./.:. -1 117325 . T . . . NS=1;AN=0 GT:PS ./.:. -1 117344 . A . . . NS=1;AN=0 GT:PS ./.:. -1 117365 . AGGGCCAAAAGAGTCAACTTCTGAAGAAGCGCAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 117443 . TTGCAAAAATAAAC . . . NS=1;AN=0 GT:PS ./.:. -1 117469 . ATTCATGAGTAGAAAAATAGACTAGTGGAATAACATAAAAATAAA . . . NS=1;AN=0 GT:PS ./.:. -1 117619 . GGAGAATAATGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 117746 . TTACCCAGATGGGCCCAGTCTAATCACATGAGTTCTTAAAAATGGA . . . NS=1;AN=0 GT:PS ./.:. -1 117871 . A . . . NS=1;AN=0 GT:PS ./.:. -1 117913 . CTAGAAGATAGAAAAGGCCAGGATATGGATTCTACCCTAGCCA . . . NS=1;AN=0 GT:PS ./.:. -1 117983 . TTGATTTTAGTTCACTAAAATTCA . . . NS=1;AN=0 GT:PS ./.:. -1 118001 . A . . NS=1;CGA_WINEND=120000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.93:1.53:.:0:.:0:0.999:153 -1 118056 . TTTAGGTCACTTAGTTTGTAGAAATTTGTTACAGCAGT . . . NS=1;AN=0 GT:PS ./.:. -1 118098 . G . . . NS=1;AN=0 GT:PS ./.:. -1 118157 . AATTCAAGGTGAATTC . . . NS=1;AN=0 GT:PS ./.:. -1 118183 . CTTAAAACATTTAGATTAAAAATAAATGAGAATTTTTGTTACTTTTGGTA . . . NS=1;AN=0 GT:PS ./.:. -1 118246 . AGAAAAACAAACATTAAGGAGGAAAAATGAACATA . . . NS=1;AN=0 GT:PS ./.:. -1 118313 . GGAAGATATCATAAGGTGACAAATCAT . . . NS=1;AN=0 GT:PS ./.:. -1 118351 . TTTACAACATATATATAAG . . . NS=1;AN=0 GT:PS ./.:. -1 118385 . TTAGAATATATATGAACTCCCAAAAATCAACAGGAAAAATAAGACATAGA . . . NS=1;AN=0 GT:PS ./.:. -1 118442 . AAATGCATAAACAAAAGAAGGCAAAACAAAAATAATGACTCATAA . . . NS=1;AN=0 GT:PS ./.:. -1 118514 . GATGAGC . . . NS=1;AN=0 GT:PS ./.:. -1 118526 . AATGCAAATTAAAACCACCCTGAGATGCTTTTTACATCCA . . . NS=1;AN=0 GT:PS ./.:. -1 118590 . AAAAGTAATAACAAAGATGGGAAGTAATAGAAAATCTTGTCCATTA . . . NS=1;AN=0 GT:PS ./.:. -1 118721 . G . . . NS=1;AN=0 GT:PS ./.:. -1 118725 . A . . . NS=1;AN=0 GT:PS ./.:. -1 118794 . AACATTGTTTGTTATATCAAAAAATAAAAAAGACA . . . NS=1;AN=0 GT:PS ./.:. -1 118839 . CAGCAAAAAAAATAAGTAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 118883 . ATGGAATAATATA . . . NS=1;AN=0 GT:PS ./.:. -1 119029 . T . . . NS=1;AN=0 GT:PS ./.:. -1 119033 . A . . . NS=1;AN=0 GT:PS ./.:. -1 119041 . GA . . . NS=1;AN=0 GT:PS ./.:. -1 119103 . GA . . . NS=1;AN=0 GT:PS ./.:. -1 119126 . AGTTGAGGGAATTTCAATTGGAAAAAAATAATATC . . . NS=1;AN=0 GT:PS ./.:. -1 119222 . AGCTACACTATATATTTTCAATGTATTTAATGTATTTTTTGCAT . . . NS=1;AN=0 GT:PS ./.:. -1 119271 . AATATTATGCAATAAAAATGAGAAAACAAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 119326 . AAAGAAATGGAGAAAAAATTATAATCTAGTTGAGTAATGGTATATTACATAGCTATTTTCTTAAGTAGATGTATGTACATG . . . NS=1;AN=0 GT:PS ./.:. -1 119434 . CTTAATTATATATAAATATATATGTACATATTTTTAATATAAAATACT . . . NS=1;AN=0 GT:PS ./.:. -1 119496 . AAAATATT . . . NS=1;AN=0 GT:PS ./.:. -1 119536 . TTTTGTATTTTAAGTTTTACATAGTAGGTGTATTTTTCTGT . . . NS=1;AN=0 GT:PS ./.:. -1 119587 . CTATAAAGAACTGCCCAAGACTGGGTAATTTATAAAGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 119643 . CACCGTT . . . NS=1;AN=0 GT:PS ./.:. -1 119694 . AGACAAAGAGGAAGCAAGCCAGCTTCTTCG . . . NS=1;AN=0 GT:PS ./.:. 
-1 119735 . GAAGAAGTGCCGAGCA . . . NS=1;AN=0 GT:PS ./.:. -1 119766 . CTTATAAAACCATCAAATCTCGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 119803 . A . . . NS=1;AN=0 GT:PS ./.:. -1 119827 . CCCCCATGATTCAATTACCTCCACC . . . NS=1;AN=0 GT:PS ./.:. -1 119952 . TGTATTGAATTTTAAACTCAGAGAAAAATACT . . . NS=1;AN=0 GT:PS ./.:. -1 120001 . C . . NS=1;CGA_WINEND=122000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.60:1.70:.:0:.:0:0.999:153 -1 120057 . GTTGTTTAGTTACAAGATAGAATGTGGCCTTGTAAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 120367 . GAGAGCTGTTCCAAAGTTTAGGGAGTTTTTGTAA . . . NS=1;AN=0 GT:PS ./.:. -1 120410 . AAATAAAAATGTT . . . NS=1;AN=0 GT:PS ./.:. -1 120452 . ATACTGTCAGAATTG . . . NS=1;AN=0 GT:PS ./.:. -1 120504 . G . . . NS=1;AN=0 GT:PS ./.:. -1 120508 . T . . . NS=1;AN=0 GT:PS ./.:. -1 120529 . T . . . NS=1;AN=0 GT:PS ./.:. -1 120606 . TGGTAAATCATTTTCTACCAAAAGAAAGAAATGTCTTGTCTATTC . . . NS=1;AN=0 GT:PS ./.:. -1 120696 . TTAGAAAATTATATTTTATACGTAC . . . NS=1;AN=0 GT:PS ./.:. -1 120775 . CTTCAAGTTTGCTCTTAGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 120877 . TTCATTGAATCCTGGATGCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 120908 . AATAAGAGGAATTCATATGGATCAGCTAGAAAAAAATTA . . . NS=1;AN=0 GT:PS ./.:. -1 120964 . AAAGTTATATATTATATATCTATTATATATAATATATATCTATTACATATTATATATTGTATATCTATTACATATATATTATATATGTATTATATATATTATATATTATATATGTATTATATATATTATATATTATATATCTATTATATATATAATATTATATATTATATATCATT . . . NS=1;AN=0 GT:PS ./.:. -1 121145 . ATTCCCCAGCGTTCATATTTGTCAGTGCAAGTAAAGAGCCTTACTGCTGATGAGGTTTGAGGTATGACCATTT . . . NS=1;AN=0 GT:PS ./.:. -1 121263 . G . . END=121528;NS=1;AN=0 GT:PS ./.:. -1 121542 . AGGTGTGAGCCACCACGCCCGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 121594 . TTTGAAGGTCATAAAAAATATAATAAGAGATAAGGCTAATT . . . NS=1;AN=0 GT:PS ./.:. -1 121648 . ATAAAATCCTTTAATAAAAATATAAAGGAATAATATAATAATTTTCTTTAATAAAATATAATAAGAGATAAGGCTAATTTCCTTTAATAAAATATA . . . NS=1;AN=0 GT:PS ./.:. -1 121765 . TCCAAAAAAAGAAATGGAGAGGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 121802 . ATTAATCTTGTCAAAAATATAAAATTAT . . . NS=1;AN=0 GT:PS ./.:. -1 121849 . ACTGTTTTCCTTGTCTGCGGCCATTGTGCTGCTGCTACACAACTACCGCAAGCAGCCCTTCACGCCCTCCTCCCAGTACAAAGCTAATTG . . . NS=1;AN=0 GT:PS ./.:. -1 121948 . AAATGTTAAGCTTGGAAGAGTCAGCATCACTGCACTTATTTTTTATTC . . . NS=1;AN=0 GT:PS ./.:. -1 122001 . T . . NS=1;CGA_WINEND=124000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.83:1.55:.:0:.:0:0.999:153 -1 122021 . AGTGGGGGAAAGGTTAAAAACCCCCCTGGATAAGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 122100 . TTTCCTTGTCCCTTGACATAAACTTGATAAATAA . . . NS=1;AN=0 GT:PS ./.:. -1 122177 . CAGGTACTTAAAGTTAGCTCCAAAAATTTA . . . NS=1;AN=0 GT:PS ./.:. -1 122226 . GCATTGTTA . . . NS=1;AN=0 GT:PS ./.:. -1 122432 . AGTGACATTGCCTTTTAGTTGTACTTTCACAAAAATTCA . . . NS=1;AN=0 GT:PS ./.:. -1 122546 . TCTCAAT . . . NS=1;AN=0 GT:PS ./.:. -1 122633 . CTTCTTTTCATATTTTTGAAAACTTTTGAAAAACTACC . . . NS=1;AN=0 GT:PS ./.:. -1 122769 . AAAACAATATGTTGTCTTTATCTTTACCTCTCTGTGGCATTTAATGATAA . . . NS=1;AN=0 GT:PS ./.:. -1 122830 . T . . . NS=1;AN=0 GT:PS ./.:. -1 122854 . GAATTCAGTCAGACAACGTACTTACATTTTTCGTCTTATT . . . NS=1;AN=0 GT:PS ./.:. -1 122909 . CACCTCAGCTTTCTCCATTCAGCTATA . . . NS=1;AN=0 GT:PS ./.:. -1 122962 . TCTGCCTCTCCTCTCACTCTATACTATCTCTGTTAGCTAATTTTATTT . . . NS=1;AN=0 GT:PS ./.:. -1 123036 . TATACACATATGCATGTGTGTACATGTGCACACACACACTGTAT . . . NS=1;AN=0 GT:PS ./.:. -1 123085 . C . . END=123432;NS=1;AN=0 GT:PS ./.:. -1 123467 . ATTCTATAT . . . NS=1;AN=0 GT:PS ./.:. -1 123508 . AGAGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 123629 . TATCCTCATTTTTTTCAGATTCTTGCTTAGAAGTCACTC . . . NS=1;AN=0 GT:PS ./.:. -1 123685 . GACATATTAAACATTGCAGTCCATTATAAGCTGCAA . . . 
NS=1;AN=0 GT:PS ./.:. -1 123727 . AGGGATTTTTGCCTGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 123798 . TGTAATGAATATTTGT . . . NS=1;AN=0 GT:PS ./.:. -1 123827 . AAATTCAATCAAATCACATCACCTGTTTA . . . NS=1;AN=0 GT:PS ./.:. -1 123881 . ACTTAGAATAAA . . . NS=1;AN=0 GT:PS ./.:. -1 123900 . TCTTTTTATACAATATACAATATATTTTATACAATATAAGT . . . NS=1;AN=0 GT:PS ./.:. -1 123984 . G . . . NS=1;AN=0 GT:PS ./.:. -1 124001 . C . . NS=1;CGA_WINEND=126000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.99:1.52:.:0:.:0:0.999:153 -1 124109 . GTCCCTCCACTTTTGCCAACTAATCCCTGCTCAACTTTTCATCTCAGCAGGAGGCCCATTCTCTTTGGCAATCCTCTGGCCTCCAGCCCATTTATTATATGCTCACATGTCAACATGTACTTCGTA . . . NS=1;AN=0 GT:PS ./.:. -1 124252 . GCACTTTTATATTTTAACAAATTATATT . . . NS=1;AN=0 GT:PS ./.:. -1 124310 . CAGGAATTTTGTTCTTGCTCATCATCAACTTTTTCAAC . . . NS=1;AN=0 GT:PS ./.:. -1 124411 . G . . . NS=1;AN=0 GT:PS ./.:. -1 124426 . CCTTTAAATTAGGATGGCAAAGATCGTATATAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 124494 . CCCAATTAAGGAGCACAGCTATGAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 124552 . TTTTAAAAAGAAAACTGGCCAGGTACTGTGGCTTA . . . NS=1;AN=0 GT:PS ./.:. -1 124674 . TGTCTCTACAAAAAATACAAAAATTAGCCAAGTTTGGTGGC . . . NS=1;AN=0 GT:PS ./.:. -1 124739 . GGGAGGCTGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 124800 . T . . . NS=1;AN=0 GT:PS ./.:. -1 124842 . TCTCAAAAAAAAAAAAAAAAAAAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 124991 . TAGTATTAGAAAATTACATTAC . . . NS=1;AN=0 GT:PS ./.:. -1 125183 . TATGTTTTTGAAATAAAATATATCTGAGTAGCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 125265 . GACGTACATGATA . . . NS=1;AN=0 GT:PS ./.:. -1 125299 . CTTTTAGAAGTCAATCAGGAAGAGGGGAGCAGTTAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 125491 . TGCATGCTTATACATATAAACACAGCTGATAATTTATTAG . . . NS=1;AN=0 GT:PS ./.:. -1 125563 . CCAGTTTTTTATTTAAATTGAAGATTAGTATACATTT . . . NS=1;AN=0 GT:PS ./.:. -1 125611 . TCAAAATAAAAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 125653 . TCAGAAAAAAAAAGTCAAAAGCTAGAGTATAGAGAAATTAA . . . NS=1;AN=0 GT:PS ./.:. -1 125730 . ACAAGATTTAAATATTTTAATG . . . NS=1;AN=0 GT:PS ./.:. -1 125799 . AAATAAACAGATTATATGGAGGATTTTTAGAAGATAAGTAA . . . NS=1;AN=0 GT:PS ./.:. -1 125861 . AAACAAGGGAAATATA . . . NS=1;AN=0 GT:PS ./.:. -1 125906 . T . . . NS=1;AN=0 GT:PS ./.:. -1 125920 . A . . . NS=1;AN=0 GT:PS ./.:. -1 125954 . GATGCATAAATATATAAATAAACGATAAAAAATGTTGCATACATATATGACTTTTTCAGAATCAAAAAATTTAAATTTCTGTAATAAAATTTAAATGTTTATAAATTTAAAAAACTAGAAGAAAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 126001 . T . . NS=1;CGA_WINEND=128000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.04:1.68:.:0:.:0:0.999:153 -1 126099 . CAAATAAATGACAACTATT . . . NS=1;AN=0 GT:PS ./.:. -1 126208 . CAATTATTTGTCTCAAAAACAAACAAAAAAAAGATA . . . NS=1;AN=0 GT:PS ./.:. -1 126258 . A . . . NS=1;AN=0 GT:PS ./.:. -1 126322 . CA . . . NS=1;AN=0 GT:PS ./.:. -1 126337 . AGGCCAGTAGGCAGTGGATTATATATTTAAAATAAT . . . NS=1;AN=0 GT:PS ./.:. -1 126394 . AATATATAGCTGGAAAACTTATCCTTCAAAAATGAAGGA . . . NS=1;AN=0 GT:PS ./.:. -1 126444 . ATTTCCGGATTTTTTTTTAAAACTGA . . . NS=1;AN=0 GT:PS ./.:. -1 126622 . T . . . NS=1;AN=0 GT:PS ./.:. -1 126642 . TTTTGGTTTGTAACTCTGCTTTTTATTT . . . NS=1;AN=0 GT:PS ./.:. -1 126680 . TTAAAAGGCAAATGCATAAAATGTAATTG . . . NS=1;AN=0 GT:PS ./.:. -1 126768 . AAAGAGTAGAGCTATATATATAGCAGT . . . NS=1;AN=0 GT:PS ./.:. -1 126808 . GTGATTGAACTTAAGTTGAAATAAATTCAAATTAAAATGTTATAACTCTAGGATGTTA . . . NS=1;AN=0 GT:PS ./.:. -1 126876 . TCATAGTAACCAAAAATGAAATATACATAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 126935 . AC . . . NS=1;AN=0 GT:PS ./.:. -1 126977 . G . . . NS=1;AN=0 GT:PS ./.:. -1 126987 . A . . . NS=1;AN=0 GT:PS ./.:. -1 127001 . A . . . NS=1;AN=0 GT:PS ./.:. -1 127048 . C . . . 
NS=1;AN=0 GT:PS ./.:. -1 127062 . TCACATAAAACTGGAGCTGAAAGAAACAAATATTTACCTATAAAGTTAAAAGTTATATA . . . NS=1;AN=0 GT:PS ./.:. -1 127136 . TTT . . . NS=1;AN=0 GT:PS ./.:. -1 127152 . T . . . NS=1;AN=0 GT:PS ./.:. -1 127155 . T . . . NS=1;AN=0 GT:PS ./.:. -1 127161 . G . . . NS=1;AN=0 GT:PS ./.:. -1 127165 . A . . . NS=1;AN=0 GT:PS ./.:. -1 127266 . CCCTATG . . . NS=1;AN=0 GT:PS ./.:. -1 127289 . CCTAAAAAGTCTATTCTCAAATGCAGCAGAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 127374 . TTCAATTTTATAACACTGGGTTAAGATGAAAGAATGAGAAGATAAAGGTC . . . NS=1;AN=0 GT:PS ./.:. -1 127436 . AACTCACAAACATGTTCAGAAGCAGTAAGAAGTTACATTAATTATCTTTTGAAAGTCGA . . . NS=1;AN=0 GT:PS ./.:. -1 127505 . CTTTAAT . . . NS=1;AN=0 GT:PS ./.:. -1 127647 . GAGTTTCCATTAAAAGACAATTTAGTAAAACTTTTCTTCCCCCAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 127709 . ATGATTTAACAACATGTGTAAAAGTCATTGTGGGCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 127829 . G . . . NS=1;AN=0 GT:PS ./.:. -1 127864 . GCCTCAGCTTTCTGAGTAGCAAGGACTACAGGTGCACACCATCACGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 127922 . TTGTACTATTAGTACAGACGGAG . . . NS=1;AN=0 GT:PS ./.:. -1 127958 . GCCAGGCTGGTCTCGAACTCCTGACCTCAAATGATCCATCTACC . . . NS=1;AN=0 GT:PS ./.:. -1 128001 . C . . NS=1;CGA_WINEND=130000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.76:1.41:.:0:.:0:0.999:153 -1 128064 . ACTTTGGTAA . . . NS=1;AN=0 GT:PS ./.:. -1 128126 . CT . . . NS=1;AN=0 GT:PS ./.:. -1 128195 . ACACAAAATCATGGGAGTTCTAATCAAAATCCAACCTT . . . NS=1;AN=0 GT:PS ./.:. -1 128285 . GCCTATTAATTAGATTTGTCTTTGTAGCATTTAACTCTATAATAAATAATATTTTA . . . NS=1;AN=0 GT:PS ./.:. -1 128495 . TGGCAAATATTGATTGTCATCTTCGTGTTTGTCTATGTCCTAAGTGCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 128588 . TCACCCCCTTTTTTTTTTTTTTTTGAGAT . . . NS=1;AN=0 GT:PS ./.:. -1 128798 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.108|rs3926105&dbsnp.131|rs76554219;CGA_RPT=AluSq2|Alu|12.1;CGA_SDO=24 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:55:55,183:55,183:45,48:-183,-55,0:-48,-45,0:5:5,5:0 -1 128844 . CTCGGCCTCCCACAGTGCTGAGATTACAGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 128883 . CCACGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 128979 . AAAGACAAACTCACAGGAAGATGGGATGTAGAATGATAAG . . . NS=1;AN=0 GT:PS ./.:. -1 129083 . CAGCTGAGTCTGCAGTGCT . . . NS=1;AN=0 GT:PS ./.:. -1 129112 . CTTTCATTTTATAAAAATCTATGATTTCTCCTTCCAGTTTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 129229 . A . . . NS=1;AN=0 GT:PS ./.:. -1 129271 . ACCTAAAAGAATACGCTTTTTTTCTGA . . . NS=1;AN=0 GT:PS ./.:. -1 129312 . CAAATCATCACAGTAGACCACGATC . . . NS=1;AN=0 GT:PS ./.:. -1 129427 . T . . . NS=1;AN=0 GT:PS ./.:. -1 129455 . GCGATCTCAGCTCACTGCAACCTCCATCTCCC . . . NS=1;AN=0 GT:PS ./.:. -1 129571 . CTGGCTAATTTTTGTATTT . . . NS=1;AN=0 GT:PS ./.:. -1 129602 . GGGTTTTGCCATGATGGCCAGGCTGGT . . . NS=1;AN=0 GT:PS ./.:. -1 129674 . AGACTTTTTTTTTTTTTTTAATA . . . NS=1;AN=0 GT:PS ./.:. -1 129739 . TCCTGAGCTCAAGTGATCCTCCCACCTCAGCTTCCCAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 129806 . C . . . NS=1;AN=0 GT:PS ./.:. -1 129947 . GAAATTATCCAGTCAGTGGACAGAGAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 130001 . C . . NS=1;CGA_WINEND=132000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.98:1.51:.:0:.:0:0.999:153 -1 130118 . AAAATACAAAAATTAGCCGGGTGTGGTGGC . . . NS=1;AN=0 GT:PS ./.:. -1 130216 . T . . . NS=1;AN=0 GT:PS ./.:. -1 130278 . CTGTCTCAAAAAGAAAAAAAAAAGAGACAGAGAAAAGAAAGCCAA . . . NS=1;AN=0 GT:PS ./.:. -1 130333 . TTAAGCAAACCATTGTCAGGTTATGGGAGTT . . . NS=1;AN=0 GT:PS ./.:. -1 130430 . T . . . NS=1;AN=0 GT:PS ./.:. -1 130434 . A . . . NS=1;AN=0 GT:PS ./.:. -1 130437 . G . . . NS=1;AN=0 GT:PS ./.:. -1 130573 . CATTCAAAGTGCTGAAAGAAAA . . . NS=1;AN=0 GT:PS ./.:. 
-1 130695 . CCACTAGGTCTACCTTAAAAAAATGCTT . . . NS=1;AN=0 GT:PS ./.:. -1 130734 . TCAAGTAAAAATGAA . . . NS=1;AN=0 GT:PS ./.:. -1 130758 . GAGCGGTGGCTCATGCCTGTAATCCCATTTTGGGAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 130852 . AAAACCCCACCTCCAGTAAAAATACAAAAAATTAGCCAGGTATGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 130961 . ACAAAAAACAAAACAAACAAAAAAAACAAAACTTGA . . . NS=1;AN=0 GT:PS ./.:. -1 131062 . GGGCGAGGAGGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 131099 . CCCGGACCAAGTGCTCGGCCCCCAGGCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 131143 . TCCCGTGGCGTCAGCATGTT . . . NS=1;AN=0 GT:PS ./.:. -1 131200 . TTCTCCTGGTACTCCATCCCCTTCCTGACCCCTCCCTGCAGCCACACGAG . . . NS=1;AN=0 GT:PS ./.:. -1 131281 . C . . . NS=1;AN=0 GT:PS ./.:. -1 131305 . AGTTGGCAGCTGTTGCTCATGAGCGTCCACCAGGTGGGACAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 131361 . GGGCGGCCCCCTGGAGCCACCTGCCCTGAAAGCCCAGGGCCCGCA . . . NS=1;AN=0 GT:PS ./.:. -1 131413 . ACACTTTGGGGTTGGTGGAACCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 131455 . CCATGGAGGAGGAGCCCTGGGCCCCTCAGGGGAGTCCCTGCTGGA . . . NS=1;AN=0 GT:PS ./.:. -1 131549 . CCTGACACCCAGTTGCCTCTACCACTC . . . NS=1;AN=0 GT:PS ./.:. -1 131597 . CTCAGTGCCCTGCGCAAGGAACAGGACTCATCTTCTGAGAAGGA . . . NS=1;AN=0 GT:PS ./.:. -1 131658 . AATCAGACAAGGACCACATCCGGT . . . NS=1;AN=0 GT:PS ./.:. -1 131694 . GCGCTCATGATCTTCAGCAGGCGGCACCAGGCCCTGGCGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 131746 . TCACCCCAACCAGGATAACCGGACCGTCAGCCAGATGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 131875 . G . . . NS=1;AN=0 GT:PS ./.:. -1 131889 . TCAGAGGCCAAGCCCACAAGCCAGGGGCTAGCAGGAGT . . . NS=1;AN=0 GT:PS ./.:. -1 131944 . GAGCGGAGCATATCAGAGACGGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 132001 . C . . NS=1;CGA_WINEND=134000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.16:1.77:.:0:.:0:0.999:153 -1 132027 . GAGCTCGGATACCAAGGAGCAGCTTCTGTGGGGCAGAACGGCTGCA . . . NS=1;AN=0 GT:PS ./.:. -1 132081 . GGGAACCTGGCTCAGCCTGGCCCAAGCCTTCTCCCACAGCGGG . . . NS=1;AN=0 GT:PS ./.:. -1 132135 . GGACGGCAGGGAAATAGACCGTCAGGCACTACGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 132264 . GGGAGGTGACCCGTGGGCAGCCCTGCTGCCGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 132327 . CAGCGAGGTCATAGCGAGTGACGAAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 132366 . CCATGAGGAGGAGGGGGTGATGATGTCAT . . . NS=1;AN=0 GT:PS ./.:. -1 132401 . TGATGGCTTTAGCACCACCGACA . . . NS=1;AN=0 GT:PS ./.:. -1 132441 . GAGTGGGTGACCGACTGAGAGTGGGGACAACTCTGGGGAGGAGCCAGAGGGCAACAAG . . . NS=1;AN=0 GT:PS ./.:. -1 132514 . GTATTTGCACCTGTCATTCCTTCCTCCTT . . . NS=1;AN=0 GT:PS ./.:. -1 132560 . GCTGGATCCTGAGCCCCCAGGGTCCCCCGATCCACCTGCAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 132624 . CCTGTCCTCCTCCTACACATACTCGGATGCTTCCTCCTCAA . . . NS=1;AN=0 GT:PS ./.:. -1 132716 . AGTCTGGTCAACGCAGCAGAGCGGGCCCCCTACGGCCCCAACCCCTGGGGATGGGGGCCCAGGGACGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 132835 . AGAGACCTGAAAGTGTGGGCGAC . . . NS=1;AN=0 GT:PS ./.:. -1 132936 . CCCGGGGGGCAGAGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 133006 . TGGACCCCACACTGGAGGACCCCACCGCGCC . . . NS=1;AN=0 GT:PS ./.:. -1 133055 . ATGCTCCAGCTGCAGTCCAAAGCCCAACACCCCCAAGTGTGCCA . . . NS=1;AN=0 GT:PS ./.:. -1 133119 . CCCTTTGCCTGTACAGGT . . . NS=1;AN=0 GT:PS ./.:. -1 133187 . CTCTTCA . . . NS=1;AN=0 GT:PS ./.:. -1 133236 . CTTCCAGGCCCACTGCTTCTTCCTGTCCACTAGGCCACAGCCGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 133305 . T . . . NS=1;AN=0 GT:PS ./.:. -1 133333 . CCCTGACTCCCAGCCCTGTGGGGGTCCTGACCGCACCTCACCTGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 133456 . GGGCAGCAGTGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 133478 . GGGGCGGTCCAGTGGGAGGAGCCTCAGCCTCGCAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 133584 . ATGCTGGTGGTGGGTGCAGGGCCGCTGGGAGCTGCTGCA . . . NS=1;AN=0 GT:PS ./.:. -1 133668 . G . . . NS=1;AN=0 GT:PS ./.:. -1 133711 . 
GAAGATGTGTGCATAGCAGGTCCACTGCTGCTGCCCCTGCCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 133869 . ACACCCCAGCCCTGCCTCAACACCTGGGGGTCTC . . . NS=1;AN=0 GT:PS ./.:. -1 133971 . ACAGTATGTGGGGGTAGC . . . NS=1;AN=0 GT:PS ./.:. -1 134001 . G . . NS=1;CGA_WINEND=136000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.92:1.36:.:0:.:0:0.999:153 -1 134047 . GGACGGAGTAAGGCCTTCCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGAGACCGAGTCTTGCTCTGTCGCCCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 134231 . C . . . NS=1;AN=0 GT:PS ./.:. -1 134259 . TACAGACGGGGCTTCATCATCTTGGCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 134331 . T . . . NS=1;AN=0 GT:PS ./.:. -1 134367 . G . . . NS=1;AN=0 GT:PS ./.:. -1 134421 . GGGGGAAAGCTGGGCAGTTTCCCTCCTCCGAGCCCCTGTA . . . NS=1;AN=0 GT:PS ./.:. -1 134493 . TCACTTTTCGGAAAATAGCTCCTGCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 134527 . AAGATGGAGTGTGAAGAGGGCCTTGGGCCACAGGGAGGCGCC . . . NS=1;AN=0 GT:PS ./.:. -1 134597 . TTCTTTCCCCAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 134624 . GTGAGTATGGGGGTGGGGGCTCCTGCACTTCGACACAGGCAGCAGGAGGGTTTTCTCCC . . . NS=1;AN=0 GT:PS ./.:. -1 134712 . TACTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 134741 . TTTGCCCCCTTCCCCAGAACAGAACACGTTGAT . . . NS=1;AN=0 GT:PS ./.:. -1 134873 . TGGCGCA . . . NS=1;AN=0 GT:PS ./.:. -1 134929 . AAATATTCCAAAATTCAATATTTTGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 134973 . AAACAAATTAGAGGCCAAGAGGCTGCCGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 135037 . GAATGAGCTGGGCCTAAAGAGGCCACTGGCAGGCAGGAGCTGGACCTGCCGAAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 135176 . GCCGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 135230 . GCCGACTGGAGATCAAGTTCTGCGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 135462 . GGCCGAC . . . NS=1;AN=0 GT:PS ./.:. -1 135536 . GGACGATTTGGGCCTGCGGAGGCCGCCGGGAGGCCCAA . . . NS=1;AN=0 GT:PS ./.:. -1 135587 . G . . . NS=1;AN=0 GT:PS ./.:. -1 135660 . ACCGCGA . . . NS=1;AN=0 GT:PS ./.:. -1 135801 . GCCGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 135823 . GGAGGGGGCGCCGGGAGGCTGCAAGTGGGTCTGA . . . NS=1;AN=0 GT:PS ./.:. -1 135978 . CCCCAAAGT . . . NS=1;AN=0 GT:PS ./.:. -1 136001 . T . . NS=1;CGA_WINEND=138000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.73:1.27:.:0:.:0:0.999:153 -1 136014 . ATCCCTTCTCCCAGTGACT . . . NS=1;AN=0 GT:PS ./.:. -1 136045 . CTCCGGC . . . NS=1;AN=0 GT:PS ./.:. -1 136158 . GCAGCCCGAGTGCGATC . . . NS=1;AN=0 GT:PS ./.:. -1 136186 . TCACGGTGGCCTGTTGAGGCAGGGGGTCACGCT . . . NS=1;AN=0 GT:PS ./.:. -1 136226 . GTCCGCGTGGGAGGGGCCGGTGTGAGGCAAGGGCTCACACTG . . . NS=1;AN=0 GT:PS ./.:. -1 136276 . CAGCGTGGGAGGGGCCGGTGTGAGGCAAGGGGCTCACGCTGACCTCTGTCCGCGTGGGAGGGGCCGGTGTGAGGCAAGGGCTCACACTGACCTCTCTCAGCGTGGGAGGGGCCGGTGTGAGGCAAGGGGCTCACGCTGACCTCTGTCCGCG . . . NS=1;AN=0 GT:PS ./.:. -1 136434 . G . . . NS=1;AN=0 GT:PS ./.:. -1 136437 . T . . . NS=1;AN=0 GT:PS ./.:. -1 136446 . G . . . NS=1;AN=0 GT:PS ./.:. -1 136467 . T . . END=137271;NS=1;AN=0 GT:PS ./.:. -1 137278 . GGGGCTCACGCCTCTGGGCAGGGTGCCA . . . NS=1;AN=0 GT:PS ./.:. -1 137332 . CACCGTGAGGGAGGAGCTGGGCCGCACGCGGGCTGCTGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 137560 . CGGTTGG . . . NS=1;AN=0 GT:PS ./.:. -1 137606 . GCTGGGAGGCAGGGCCGGGAGAGCCCGACTTCAGGACAACT . . . NS=1;AN=0 GT:PS ./.:. -1 137656 . GGC . . . NS=1;AN=0 GT:PS ./.:. -1 137685 . TGGAGGAGCCCACCGACCGGAGACCA . . . NS=1;AN=0 GT:PS ./.:. -1 137723 . AGATGCCATCGGAGGGCAGGAGCTCATC . . . NS=1;AN=0 GT:PS ./.:. -1 137776 . C . . . NS=1;AN=0 GT:PS ./.:. -1 137918 . GAGAACG . . . NS=1;AN=0 GT:PS ./.:. -1 138001 . A . . NS=1;CGA_WINEND=140000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.13:1.73:.:0:.:0:0.999:153 -1 138153 . TCCGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 138260 . GGCCGCCGGG . . . NS=1;AN=0 GT:PS ./.:. 
-1 138304 . GGGAGGCAGGAGGAGCTGGGCCTGGAG . . . NS=1;AN=0 GT:PS ./.:. -1 138353 . TCACCTGAGGATGCCACAGTGAGACACCATCTGGGTCTGGA . . . NS=1;AN=0 GT:PS ./.:. -1 138444 . AGTTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 138557 . GTGAGGGAGGAGCTGTGCCTGTTGAGGCTGCTGGCAGGCAGGCAGAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 138648 . AAAAGCCCCTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 138813 . AGGCTGTTGTGAGGCAGCAGTTGTGCCTGTAGACCCA . . . NS=1;AN=0 GT:PS ./.:. -1 138887 . AGGCAGA . . . NS=1;AN=0 GT:PS ./.:. -1 138921 . GCAGGAGCTGGGCCTGGACAGG . . . NS=1;AN=0 GT:PS ./.:. -1 138970 . TAGGCCACCAGGAGGCAGCAGTT . . . NS=1;AN=0 GT:PS ./.:. -1 139054 . TTGGCCGTGGAGAGGCCACCGTGAGGCATAAGCTGGATG . . . NS=1;AN=0 GT:PS ./.:. -1 139106 . GTGAGGCAAGACCTGGGCCTGTCTAGGCTGCTGGGAGAC . . . NS=1;AN=0 GT:PS ./.:. -1 139186 . TTGGGCCTGGAAAGGCCCTTGTGAAGCA . . . NS=1;AN=0 GT:PS ./.:. -1 139223 . CCTAAAGAGGCCACTGGGTGGCAGGAGCTGGGTGTGTAGAAGCTGCTGA . . . NS=1;AN=0 GT:PS ./.:. -1 139479 . CCTGGAGAGAAGGCTGGGAGGCAGGAGCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 139532 . ATGGAGCTGTGCCTGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 139556 . TTGTGAGGCAGTAGCCTCATCTGCGGAGGCTGCCGTGACGTAGGGTATG . . . NS=1;AN=0 GT:PS ./.:. -1 139757 . TGTAATATATAATAAAATAATTATGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 139903 . AGCGCAACCTCGATCCCTCACATGCACGGTTCACAACAGGGTGCGTTCTCCTATGAGAATCTAATGCTGCTGCTCATCTGAGAAGG . . . NS=1;AN=0 GT:PS ./.:. -1 139994 . CTCAGGCGGG . . . NS=1;AN=0 GT:PS ./.:. -1 140001 . G . . NS=1;CGA_WINEND=142000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.91:1.45:.:0:.:0:0.999:153 -1 140012 . CAAAGGGGAGTGGCTGTAAATACAGACGAAGCTTCCCTCACTCCCTCACTCGACACCGCTCACCTCCTGCTGTGTGGCTCCTTGCGGCTCCATGGCTCAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 140127 . GCTCAAGTGCATCCAAAACGA . . . NS=1;AN=0 GT:PS ./.:. -1 140350 . AAATCCCATCTGTGTGGGTTTACCT . . . NS=1;AN=0 GT:PS ./.:. -1 140407 . ACATACTTGAGAGGCTGAGGTGAGACAA . . . NS=1;AN=0 GT:PS ./.:. -1 140466 . GGACGAC . . . NS=1;AN=0 GT:PS ./.:. -1 140512 . TCACTTGAGCCCAGGAATTTG . . . NS=1;AN=0 GT:PS ./.:. -1 140559 . CCCCATCTGGCCAACATGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 140814 . GGCACAGAGCTTCTAAAGCTCTTACAAAGACCTCAGTGATAGAT . . . NS=1;AN=0 GT:PS ./.:. -1 140884 . ATTTGGT . . . NS=1;AN=0 GT:PS ./.:. -1 140923 . AAGAACTTTGGGATCTCCAGCATGGTAAGAATGCATT . . . NS=1;AN=0 GT:PS ./.:. -1 141029 . ATGCAACCACATGGTAAGAGGCTTGGAACTTTCAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 141225 . G . . . NS=1;AN=0 GT:PS ./.:. -1 141260 . GCCCCTCCGAACTTAACTTGCCCTGGGTATCTTTCTTTTTTTTGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 141330 . GCTGGAGTGCAGTGGCACAATCTCAGCTTACTGTAACCTAA . . . NS=1;AN=0 GT:PS ./.:. -1 141377 . CCAGTCCCCAGCTCAAGGTATCCTCTCATCTCAGCTTCCCTAGTAG . . . NS=1;AN=0 GT:PS ./.:. -1 141462 . ATTATTTTTTAATTTTTTATA . . . NS=1;AN=0 GT:PS ./.:. -1 141501 . GTTGCCCAGGCTGGTCTCAAACTCCTGAGTTTAAG . . . NS=1;AN=0 GT:PS ./.:. -1 141611 . TCATTGACTGTTTCTGAGATGTATCC . . . NS=1;AN=0 GT:PS ./.:. -1 141730 . ACTTGAG . . . NS=1;AN=0 GT:PS ./.:. -1 141754 . AGCCTGGCCAACACAACAAGACCCCATCTATACAAAAAATAAA . . . NS=1;AN=0 GT:PS ./.:. -1 141929 . CAACAAAATAAGACCCTCTCTCTCAGAAAAAAAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 142001 . G . . NS=1;CGA_WINEND=144000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.96:1.55:.:0:.:0:0.999:153 -1 142027 . TTACGGGAACCCCCGATT . . . NS=1;AN=0 GT:PS ./.:. -1 142119 . GGACTGAGCCCCTAACTTGTGGGGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 142258 . A . . . NS=1;AN=0 GT:PS ./.:. -1 142356 . CCAGACC . . . NS=1;AN=0 GT:PS ./.:. -1 142466 . CTACATGTGATTCTGTGAGAATTAACGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 142632 . G . . . NS=1;AN=0 GT:PS ./.:. -1 142675 . ACAGAAT . . . NS=1;AN=0 GT:PS ./.:. -1 142716 . 
GGTTTCCCTTCCCGGACAGTTTGCGCTATCCCATCCCGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 142778 . CCCCTTCCCTCCCCACTCTCATACA . . . NS=1;AN=0 GT:PS ./.:. -1 142952 . T . . . NS=1;AN=0 GT:PS ./.:. -1 143002 . GTAGATCCAGCTGGAAGTGACAAAAAGACA . . . NS=1;AN=0 GT:PS ./.:. -1 143152 . CATGAAGGGTTAATTTGTATTTTATTTTATTT . . . NS=1;AN=0 GT:PS ./.:. -1 143205 . CACCTAGGCTGGAGTGCAGTGGTGCAATCAGGCTCACTGCAGCCTTGACC . . . NS=1;AN=0 GT:PS ./.:. -1 143265 . AAGTAATCTCACTTAATTTTTATTT . . . NS=1;AN=0 GT:PS ./.:. -1 143347 . GGAGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 143411 . GACCCTGCCTCTACC . . . NS=1;AN=0 GT:PS ./.:. -1 143634 . T . . . NS=1;AN=0 GT:PS ./.:. -1 143804 . G . . . NS=1;AN=0 GT:PS ./.:. -1 143807 . AGG . . . NS=1;AN=0 GT:PS ./.:. -1 143818 . ATACAGATGAAGTTTCCCTTCACTCGCCTGCTGCTCACCTCCAGCTCTG . . . NS=1;AN=0 GT:PS ./.:. -1 143882 . AGACCGC . . . NS=1;AN=0 GT:PS ./.:. -1 143906 . AAGGAACCAACCCACGCCATTCTTCAGA . . . NS=1;AN=0 GT:PS ./.:. -1 143944 . CTGCTGCAGTGGTCAACTTGTAGCACCCCTAAGCTCGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 144001 . G . . NS=1;CGA_WINEND=146000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.79:1.57:.:0:.:0:0.999:153 -1 144047 . CACATCCTGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 144135 . TCTGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 144183 . TTGTAGGAGTCCAATCAGGAGACACAAACCACTCAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 144263 . A . . . NS=1;AN=0 GT:PS ./.:. -1 144290 . A . . . NS=1;AN=0 GT:PS ./.:. -1 144295 . T . . . NS=1;AN=0 GT:PS ./.:. -1 144299 . A . . . NS=1;AN=0 GT:PS ./.:. -1 144331 . TGGCGAAACCTCGTCTCTACAAAAAACACAAAAATCAGCTGGGTGTGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 144425 . A . . . NS=1;AN=0 GT:PS ./.:. -1 144427 . T . . . NS=1;AN=0 GT:PS ./.:. -1 144431 . T . . . NS=1;AN=0 GT:PS ./.:. -1 144435 . GCC . . . NS=1;AN=0 GT:PS ./.:. -1 144459 . AGCAGAGGTTGTGCCACTGTACTCCAGCCTGGGTGACAGTGTG . . . NS=1;AN=0 GT:PS ./.:. -1 144524 . AACGTATATATATATATATATATATATATATATATATATATATATATATATATGTAAATTTAAT . . . NS=1;AN=0 GT:PS ./.:. -1 144645 . TGGGAGGCCAAGGCAGACAGATCACCTGAGGTCAGGAGTTCGAGACCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 144702 . GCACAGAGAAACCCCATCTCTACTAAAAATACA . . . NS=1;AN=0 GT:PS ./.:. -1 144761 . C . . . NS=1;AN=0 GT:PS ./.:. -1 144778 . T . . . NS=1;AN=0 GT:PS ./.:. -1 144809 . CCCAGAAGGTGGAGGTTGCGCTGAGCCGAGATAGCGCCATTGCACTCCAGCCTGGGCAACAAGAGTGAAACTCCATCTCAAAAAAAAAAAAGGGTATTAATTT . . . NS=1;AN=0 GT:PS ./.:. -1 144993 . G . . . NS=1;AN=0 GT:PS ./.:. -1 145048 . C . . . NS=1;AN=0 GT:PS ./.:. -1 145103 . TACCAAAAAAAAGAGACATTAGCCAGGTGTGGTGGTGGTGCACA . . . NS=1;AN=0 GT:PS ./.:. -1 145174 . G . . . NS=1;AN=0 GT:PS ./.:. -1 145182 . A . . . NS=1;AN=0 GT:PS ./.:. -1 145214 . C . . . NS=1;AN=0 GT:PS ./.:. -1 145248 . TGGGCGACAGAGTGAGACCCTGTCTTAG . . . NS=1;AN=0 GT:PS ./.:. -1 145364 . CCCACTTTCCTGTATCT . . . NS=1;AN=0 GT:PS ./.:. -1 145507 . TCCCGGTGCCTTCTCTACAGCAGCCTGAGCCATGTCTCTAATCTATGA . . . NS=1;AN=0 GT:PS ./.:. -1 145642 . GCCTTGCAAGGCAGCCTCACTGCTTGCCCCTCTCCATTTCAT . . . NS=1;AN=0 GT:PS ./.:. -1 145720 . GAACGCACACTCTTTCTCCTCTGGGAGTCTCTGAAGTGGGTAA . . . NS=1;AN=0 GT:PS ./.:. -1 145858 . TCATAGA . . . NS=1;AN=0 GT:PS ./.:. -1 145884 . GATTCTCAGGAGCATGGCAGGTGAAGTGCTCCTCCCATGAA . . . NS=1;AN=0 GT:PS ./.:. -1 145935 . TTAGGGAGTGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 146001 . T . . NS=1;CGA_WINEND=148000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.16:1.38:.:0:.:0:0.999:153 -1 146006 . TGGTGGC . . . NS=1;AN=0 GT:PS ./.:. -1 146055 . GTTACACAGCACAGTTACAGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 146098 . GATAAAATTAATGTTGCTCATCAGCTGACCTT . . . NS=1;AN=0 GT:PS ./.:. -1 146182 . 
CTTTAAAACTGGAAGAGGGAGGCAGAAGGTTAAGAACCAGAGACGGT . . . NS=1;AN=0 GT:PS ./.:. -1 146477 . CTCCATCCTGAGTGACAGGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 146537 . GGGCCCTCTCCATCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 146575 . CTCTGCAAACGAGTAAACATCACCCTCCAAC . . . NS=1;AN=0 GT:PS ./.:. -1 146700 . AAGCAAT . . . NS=1;AN=0 GT:PS ./.:. -1 146747 . TTCCTTTGGTTCTCAGTAGGCAGGGTAGGGGCCAGGCAT . . . NS=1;AN=0 GT:PS ./.:. -1 146864 . CCTGGACAACATAGCAAGACCTGGGTGGCACACA . . . NS=1;AN=0 GT:PS ./.:. -1 146958 . TTTCAGGCTGCAGTGAGCCATGATCACACCACTGCACTTCAG . . . NS=1;AN=0 GT:PS ./.:. -1 147024 . TCACAAAAAGTTAGAAAAAAAAAAGAGAGAGGGAG . . . NS=1;AN=0 GT:PS ./.:. -1 147066 . ATACACAGGCACCACCACATTT . . . NS=1;AN=0 GT:PS ./.:. -1 147129 . GTTGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 147184 . TTGTAAT . . . NS=1;AN=0 GT:PS ./.:. -1 147278 . TATCAAAAATACAAAAATCAGCTGGGCAGTAGTGGCGTGTGCCTGTAGTCTCA . . . NS=1;AN=0 GT:PS ./.:. -1 147430 . CTGTCAACAACAACAACAACAACAAAAACAAAAACAACAACAACAAAAAAAACTCC . . . NS=1;AN=0 GT:PS ./.:. -1 147521 . A . . . NS=1;AN=0 GT:PS ./.:. -1 147587 . GGAAGAAAAAAAAAACAGA . . . NS=1;AN=0 GT:PS ./.:. -1 147682 . ATCACTTGAGGCCAGGAGTATGA . . . NS=1;AN=0 GT:PS ./.:. -1 147718 . AACATGGTAAAATCCCACCACTACAGAAAAATCTAAAAATTAG . . . NS=1;AN=0 GT:PS ./.:. -1 147841 . C . . . NS=1;AN=0 GT:PS ./.:. -1 148001 . T . . NS=1;CGA_WINEND=150000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.92:1.34:.:0:.:0:0.999:153 -1 148022 . TGGATTTTTAAAAAATCAAGACGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 148084 . GGCTCAAGCCATCCTCCTAG . . . NS=1;AN=0 GT:PS ./.:. -1 148113 . CTGAGTAGC . . . NS=1;AN=0 GT:PS ./.:. -1 148151 . AACTGGTATAGCCACGTTAGAAAACATTCTGGCAGTTTCTCAAAAGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 148250 . CCAGAAAAATAAAAATATATGTCCACACAAAAACTTGTACAACAA . . . NS=1;AN=0 GT:PS ./.:. -1 148336 . ACATGGAAACAACCCAAATATC . . . NS=1;AN=0 GT:PS ./.:. -1 148466 . ACACTGTGCTAAGAGGGAAAAAAAGCCACAAAAGATCACATATTGTAC . . . NS=1;AN=0 GT:PS ./.:. -1 148548 . GACAAAAAAATTAATCAATGGTTGCCTAA . . . NS=1;AN=0 GT:PS ./.:. -1 148613 . AGTGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 148655 . TTCTAAAAGTGACTGTGGTGATCGATG . . . NS=1;AN=0 GT:PS ./.:. -1 148690 . TGTGAATATTCTAAAACCTACTGAA . . . NS=1;AN=0 GT:PS ./.:. -1 148741 . TGGTATGTGAATATTTTAATAAA . . . NS=1;AN=0 GT:PS ./.:. -1 148770 . ATTTAAAATAATAATAATAGGGGGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 148826 . CCAGCACTTTGGGAGGCTGAGGCAGGAGGATCACTTGAGGTCAGGAGTTTTGAGCCCAGTCGGA . . . NS=1;AN=0 GT:PS ./.:. -1 148905 . CCCGTCTCTATGATAAAAAATTAGCTGGACATGGTGGCACA . . . NS=1;AN=0 GT:PS ./.:. -1 149046 . CCTAAGCAACAGAGCAAGACGCTGTCTCTGAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 149092 . AATGCAAGTTTTTATCACTTTGTGAGTGTAGCCAAGTTGGAGGAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 149147 . TAATAAAAGAGCACTGAATAATGACAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 149319 . G . . . NS=1;AN=0 GT:PS ./.:. -1 149667 . AGACTCTTATCTTAAAAAAAAAAAGAAAAAAAAGAAATGGC . . . NS=1;AN=0 GT:PS ./.:. -1 149787 . CCAATAT . . . NS=1;AN=0 GT:PS ./.:. -1 149933 . AT . . . NS=1;AN=0 GT:PS ./.:. -1 150001 . T . . NS=1;CGA_WINEND=152000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.92:1.23:.:0:.:0:0.999:153 -1 150018 . TTGAAAAATAA . . . NS=1;AN=0 GT:PS ./.:. -1 150072 . C . . . NS=1;AN=0 GT:PS ./.:. -1 150096 . GCTGTATGGTTACAAGGCCTACATAGC . . . NS=1;AN=0 GT:PS ./.:. -1 150366 . GCCTCAA . . . NS=1;AN=0 GT:PS ./.:. -1 150448 . GCTATATTTATTTTATTTTATTAAATTTATTTTTTTTATTT . . . NS=1;AN=0 GT:PS ./.:. -1 150558 . T . . . NS=1;AN=0 GT:PS ./.:. -1 150670 . AGTGTTTTTTAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 150764 . TCCTACTGACTGACTTCAACATG . . . NS=1;AN=0 GT:PS ./.:. -1 150824 . CCCCGTC . . . 
NS=1;AN=0 GT:PS ./.:. -1 150977 . CCTCGGT . . . NS=1;AN=0 GT:PS ./.:. -1 151247 . ATTCACTTATGAGGCCAACCCTGACCACTC . . . NS=1;AN=0 GT:PS ./.:. -1 151293 . CTGTCCC . . . NS=1;AN=0 GT:PS ./.:. -1 151324 . CTTTCTTTTTGAAACAAGATCTTGCTTTATTG . . . NS=1;AN=0 GT:PS ./.:. -1 151481 . TTTATTTTTTTTTTTTTGAGACGGAG . . . NS=1;AN=0 GT:PS ./.:. -1 151591 . CCCAAGTAGCTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 151689 . GAACTTCTGACCTCGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 151744 . GGCGTGAGCCACTGCGCCTGGCCTTTAAAAAAATATT . . . NS=1;AN=0 GT:PS ./.:. -1 151844 . TCCTCCC . . . NS=1;AN=0 GT:PS ./.:. -1 151912 . TTCGTGTTCATTTTTTGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 151999 . GGGGCCAGAAGCCCAACTAGGTCTATTAAGGCTAA . . . NS=1;AN=0 GT:PS ./.:. -1 152001 . G . . NS=1;CGA_WINEND=154000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.94:1.36:.:0:.:0:0.999:153 -1 152053 . CTGCATTCCTTCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 152120 . ACCCCCA . . . NS=1;AN=0 GT:PS ./.:. -1 152240 . CTACCTCATTCTCTTATAAAGATCCTTGTGAT . . . NS=1;AN=0 GT:PS ./.:. -1 152449 . CTAAGGTGGGAGGATTGCTTTAGCCTAGGAGGTCAAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 152584 . CCCGTGGTTCCTGGCATAGAGC . . . NS=1;AN=0 GT:PS ./.:. -1 152759 . AACGGGGTGGCTGCAAGCTCCGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 152833 . TTTAAAACAGAGTTTGG . . . NS=1;AN=0 GT:PS ./.:. -1 152873 . CAATGGCACAATCTCAGCTTGCTACAACCTCCACCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 152949 . GCTGGAATTATAGGGGTGTGCCACAATGCC . . . NS=1;AN=0 GT:PS ./.:. -1 153013 . TTTCACCATGTTGGTCAGGCTGGTC . . . NS=1;AN=0 GT:PS ./.:. -1 153089 . GGATTACAGGAGTGAGCCACCGCA . . . NS=1;AN=0 GT:PS ./.:. -1 153240 . AAACACCCAAACAGCAGGGTTTGGAGAGCTTCTGTG . . . NS=1;AN=0 GT:PS ./.:. -1 153340 . CCCTCCCCACTT . . . NS=1;AN=0 GT:PS ./.:. -1 153387 . TGAGATG . . . NS=1;AN=0 GT:PS ./.:. -1 153413 . GTAATAGAAAATAAGGTGGCCAGATGCACTGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 153520 . CTGGGCAACATAAGAAGACCCCATCTATACAAAAAATAA . . . NS=1;AN=0 GT:PS ./.:. -1 153622 . GGAGAATCACTTGAGCCCTGGACGTT . . . NS=1;AN=0 GT:PS ./.:. -1 153670 . A G . . NS=1;AN=2;AC=2;CGA_RPT=AluJo|Alu|18.9;CGA_SDO=17 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:46:97,46:97,46:39,41:-97,-46,0:-41,-39,0:0:0,0:0 -1 153699 . AGCGAGGCCCTGTCTCTTAAAAAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 153850 . TTGCAAC . . . NS=1;AN=0 GT:PS ./.:. -1 153953 . GCGGTTGGCATC . . . NS=1;AN=0 GT:PS ./.:. -1 154001 . C . . NS=1;CGA_WINEND=156000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.88:1.58:.:0:.:0:0.999:153 -1 154069 . GACTGGCTGAAGCCACAGCAGAAGAATATAAATTGTGAAGATTTCATG . . . NS=1;AN=0 GT:PS ./.:. -1 154222 . C . . . NS=1;AN=0 GT:PS ./.:. -1 154272 . CCGGTCATCTTCGTAAGCTGAG . . . NS=1;AN=0 GT:PS ./.:. -1 154549 . CTAAGGGGAGGCCTCTGAAATGGCCGC . . . NS=1;AN=0 GT:PS ./.:. -1 154586 . GCTGTCTTTTACAGTCATAG . . . NS=1;AN=0 GT:PS ./.:. -1 154630 . TTCGCGTGGCGCTCCCAGGCTT . . . NS=1;AN=0 GT:PS ./.:. -1 154772 . TTAATTTCGCCCCAGTCCTGTGGTCCTGTGATCTTGCCCTGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 155000 . TCTCTTTTGTACTCTTTCCCTTTATTTCTCAGACTGGC . . . NS=1;AN=0 GT:PS ./.:. -1 155046 . AGGGAAAATAGAAAAGAACCTACATGAAATATCAGGGGTGAATTTCC . . . NS=1;AN=0 GT:PS ./.:. -1 155169 . AACAAGGCAAGCATTAAAGTCAGACCAGACTAACATTTG . . . NS=1;AN=0 GT:PS ./.:. -1 155240 . AATCGCTC . . . NS=1;AN=0 GT:PS ./.:. -1 155424 . GGGCAAAATTCCATACAGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 155487 . CATACCC . . . NS=1;AN=0 GT:PS ./.:. -1 155535 . GCTGGTTTCCCTGCCTGGGCAGCTCA . . . NS=1;AN=0 GT:PS ./.:. -1 155589 . CCCTCCACCTCCCCCTTCCCTCCCCACTCTCATACAACTCTTCCTTATC . . . NS=1;AN=0 GT:PS ./.:. -1 155677 . TCTCTCCCTCTCCAGAAGAGCTTCCGATT . . . NS=1;AN=0 GT:PS ./.:. 
-1 155757 . C . . . NS=1;AN=0 GT:PS ./.:. -1 155797 . TTCCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 155835 . GAAGTGACAAAAAGACATTTAAAAAAAAAAAAAAAGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 155889 . CATCAGCACTTAAAAGTTTTAAACGATATGTGAAAAACAAAATTTAAGG . . . NS=1;AN=0 GT:PS ./.:. -1 155960 . GGAAGGTGTTACTGGGAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 155992 . TAATTTTTATTTTATTTTATTTTTAGA . . . NS=1;AN=0 GT:PS ./.:. -1 156001 . T . . NS=1;CGA_WINEND=158000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.70:1.36:.:0:.:0:0.999:153 -1 156049 . ACTGCAGTGGTGCAATCACAGTTAACTGCAGCCTCAA . . . NS=1;AN=0 GT:PS ./.:. -1 156133 . GAAATGCAGTCTTGCTCTTAGCAAAGCTAAAGTGCAATGGTGTG . . . NS=1;AN=0 GT:PS ./.:. -1 156231 . G . . . NS=1;AN=0 GT:PS ./.:. -1 156276 . CTCATTTTTTTTTTTTAATTTTTAGTAGAGACAAAGTGTCACTA . . . NS=1;AN=0 GT:PS ./.:. -1 156350 . CTCAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 156380 . AAATGCTGGGATTACAGGTGTGAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 156433 . TTAATTATATAAAGAGCTCAAAGCAAATATTAGA . . . NS=1;AN=0 GT:PS ./.:. -1 156483 . C . . . NS=1;AN=0 GT:PS ./.:. -1 156503 . GTAAATTGTGATACATCCATATAATAAAATATT . . . NS=1;AN=0 GT:PS ./.:. -1 156603 . T . . . NS=1;AN=0 GT:PS ./.:. -1 156671 . TTTCCTGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 156725 . AAAAGTATTTATCATTTTTATAATTTAATAAAAGATT . . . NS=1;AN=0 GT:PS ./.:. -1 156837 . GTGGCTCATGCTTATAATACCAGTACTTTG . . . NS=1;AN=0 GT:PS ./.:. -1 156875 . TGGTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 156927 . G . . . NS=1;AN=0 GT:PS ./.:. -1 156941 . C . . . NS=1;AN=0 GT:PS ./.:. -1 156945 . TAAA . . . NS=1;AN=0 GT:PS ./.:. -1 157026 . T . . . NS=1;AN=0 GT:PS ./.:. -1 157028 . A . . . NS=1;AN=0 GT:PS ./.:. -1 157045 . C . . . NS=1;AN=0 GT:PS ./.:. -1 157104 . TCTCAAAAAAAAAAAAAAGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 157226 . GCCAAGGCGGGTAGATCTTGAGATCAGGAGTTCGAGAC . . . NS=1;AN=0 GT:PS ./.:. -1 157310 . AATTAAC . . . NS=1;AN=0 GT:PS ./.:. -1 157400 . TACCTGTGTGGGCGAAGGTGCAGTGAAATGGCCATTTTCTTGTAG . . . NS=1;AN=0 GT:PS ./.:. -1 157492 . GAAATTTTTTTTAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 157635 . CATCGAG . . . NS=1;AN=0 GT:PS ./.:. -1 157729 . GTCATATGCACAAACACAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 157761 . GTAATTTTTTTCTCTTTTTTTAAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 157913 . GTGAAATAAGGAAGAATTATGGAGAATTTAAAAATCTATGCTATTTATAG . . . NS=1;AN=0 GT:PS ./.:. -1 158001 . T . . NS=1;CGA_WINEND=160000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.95:1.50:.:0:.:0:0.999:153 -1 158003 . ATTATTT . . . NS=1;AN=0 GT:PS ./.:. -1 158049 . TTGCCAT . . . NS=1;AN=0 GT:PS ./.:. -1 158085 . TAATTGTAGAAACAGATACAATTTGTCCCTTGGTATATGGGGGGATT . . . NS=1;AN=0 GT:PS ./.:. -1 158361 . AGAGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 158472 . ATGCTGGAATTGGGAACAGCAGAAGTGTCATCTCAGAGCTACTCGCA . . . NS=1;AN=0 GT:PS ./.:. -1 158535 . GGGGCTCAGGTGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 158582 . GTGGGATTTACTTGTCCATCCATTTTCTATAT . . . NS=1;AN=0 GT:PS ./.:. -1 158696 . ATTGAAAAATCGTCGCAGGTCAGGTGAGGTGGCTC . . . NS=1;AN=0 GT:PS ./.:. -1 158743 . CCAGCCCACTGGGAGACTAAGGCAGGAGGATTCCGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 158828 . CTACAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 158986 . CCCTGTCTCAAGACACACACAAACACACACACACACACACACACACCCCCAATCTCACTCT . . . NS=1;AN=0 GT:PS ./.:. -1 159068 . AGGGCCTTCTGGTTACAGAAGAGGTAT . . . NS=1;AN=0 GT:PS ./.:. -1 159237 . ATGCCTCCTTTGTCAATTAATAAATGGAACATCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 159360 . GCGCGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 159435 . AGATCAGCCTGGCCAACATGGTGAAACCCCGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 159525 . ACTTGGGAGGCTGAGGCAGGAAAATCGCTTGAACCCGGAAGGCGGAG . . . NS=1;AN=0 GT:PS ./.:. -1 159598 . CATTAGC . . . NS=1;AN=0 GT:PS ./.:. 
-1 159948 . CTCCAGAAACAATAGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 160001 . T . . NS=1;CGA_WINEND=162000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.05:1.50:.:0:.:0:0.999:153 -1 160010 . G . . . NS=1;AN=0 GT:PS ./.:. -1 160116 . GGGCCCAGCTCCTCACTACTCACCTAT . . . NS=1;AN=0 GT:PS ./.:. -1 160166 . AGAGGATGGGGAAACAAGGCTCCTGACTTTTTTTCCCTAATATCT . . . NS=1;AN=0 GT:PS ./.:. -1 160383 . TGGGATCATTCCAAATTCC . . . NS=1;AN=0 GT:PS ./.:. -1 160435 . AAGGACCAACCATTCAAATGGGCCCTGCTGCCAAGCCTTTTTTTTTTTTTTTTAACAA . . . NS=1;AN=0 GT:PS ./.:. -1 160541 . T . . . NS=1;AN=0 GT:PS ./.:. -1 160562 . G . . . NS=1;AN=0 GT:PS ./.:. -1 160815 . G . . . NS=1;AN=0 GT:PS ./.:. -1 160819 . CT . . . NS=1;AN=0 GT:PS ./.:. -1 160823 . C . . . NS=1;AN=0 GT:PS ./.:. -1 160833 . A . . . NS=1;AN=0 GT:PS ./.:. -1 160925 . GAGCAAA . . . NS=1;AN=0 GT:PS ./.:. -1 161075 . GTGACTGAAGCAAAAGCTTCATAACCAGAAAGA . . . NS=1;AN=0 GT:PS ./.:. -1 161224 . TATATGCAGGGATGCAGGCTGTAGTCTC . . . NS=1;AN=0 GT:PS ./.:. -1 161283 . CAAGACCTCAAACTG . . . NS=1;AN=0 GT:PS ./.:. -1 161400 . AAGCCATCAA . . . NS=1;AN=0 GT:PS ./.:. -1 161466 . AACCTAGATGAGACAATCTAAGCATCCAAAACAATAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 161512 . TGGCCTGA . . . NS=1;AN=0 GT:PS ./.:. -1 161588 . TAGAGGGAAGGTTACTAGGTCACTAACTT . . . NS=1;AN=0 GT:PS ./.:. -1 161654 . GGAAAATTAGCTATTTATTCAGTCTTTCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 161716 . AGATGAAACAAATCTGGAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 161808 . C . . . NS=1;AN=0 GT:PS ./.:. -1 161815 . C . . . NS=1;AN=0 GT:PS ./.:. -1 161831 . G . . . NS=1;AN=0 GT:PS ./.:. -1 161870 . TCTGCCTACAAGAGACACTAGGATATGAGGGGTAGTTTTAGCCCTAATGG . . . NS=1;AN=0 GT:PS ./.:. -1 162001 . C . . NS=1;CGA_WINEND=164000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.14:1.55:.:0:.:0:0.999:153 -1 162016 . A . . . NS=1;AN=0 GT:PS ./.:. -1 162049 . CTATAGA . . . NS=1;AN=0 GT:PS ./.:. -1 162140 . T . . . NS=1;AN=0 GT:PS ./.:. -1 162290 . GGGAAGGAGGAAGGGAGGGAAGGAGGGAGGGAGGGAGAGAGAGAGGGAGGGAGGGGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 162378 . GAAGAAAAGGAAAGGAAAGGAATAAATTTTATTTCTTAACAGTT . . . NS=1;AN=0 GT:PS ./.:. -1 162428 . GTTAGGA . . . NS=1;AN=0 GT:PS ./.:. -1 162554 . TTTATAATAAGCCCACTCTGTG . . . NS=1;AN=0 GT:PS ./.:. -1 162588 . TTACCACAATAAC . . . NS=1;AN=0 GT:PS ./.:. -1 162802 . AAGGAAAAAAGTCACAGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 162853 . ACCTAAAAAAGATCTCATTAACTCCCCCAGCTCACCTCCACGCAC . . . NS=1;AN=0 GT:PS ./.:. -1 162980 . GACACCCGACCTCAATAGCTCCAGAACAGCCCTAAAACATTTC . . . NS=1;AN=0 GT:PS ./.:. -1 163184 . CTCTCGGGGTCCACCAAGACC . . . NS=1;AN=0 GT:PS ./.:. -1 163268 . CACAAGGCATGTCGTCCTCAAAGATAAATGAGCAGGCAAG . . . NS=1;AN=0 GT:PS ./.:. -1 163405 . TCTGAGGAT . . . NS=1;AN=0 GT:PS ./.:. -1 163512 . TCACGCTGACCCCAGCTCCCTGGATGT . . . NS=1;AN=0 GT:PS ./.:. -1 163702 . AAAGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 163738 . CTCTCACCTATTTCCAA . . . NS=1;AN=0 GT:PS ./.:. -1 163799 . TTTTAAGCTGATAATGAAAAAAAAAGAAAAAGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 163859 . GTGGCTCATGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGT . . . NS=1;AN=0 GT:PS ./.:. -1 163964 . TACTAAAAATACAAAAAATTAGCCAGGCATGGTGGCGCAT . . . NS=1;AN=0 GT:PS ./.:. -1 164001 . C . . NS=1;CGA_WINEND=166000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.98:1.50:.:0:.:0:0.999:153 -1 164020 . ACTAAGG . . . NS=1;AN=0 GT:PS ./.:. -1 164059 . AGGCAGAGAATGTGGTGACCTGAGATCACGTCAT . . . NS=1;AN=0 GT:PS ./.:. -1 164135 . AAAACAAAACAAAACAAAATCAC . . . NS=1;AN=0 GT:PS ./.:. -1 164305 . T . . . NS=1;AN=0 GT:PS ./.:. -1 164342 . T . . . NS=1;AN=0 GT:PS ./.:. -1 164374 . CACGTCTGTAATCTCAGCACTCTGGGAGGCCGAGGCGGGA . 
. . NS=1;AN=0 GT:PS ./.:. -1 164439 . GCGGGAAGATCACTTGACGTCA . . . NS=1;AN=0 GT:PS ./.:. -1 164474 . G . . . NS=1;AN=0 GT:PS ./.:. -1 164670 . TCTCAAAAAAAAAAAAAAAAAAAAAATTCCTTTGGGAAGGCCTTCTACA . . . NS=1;AN=0 GT:PS ./.:. -1 164746 . AAAGGGTATGGGATCATCACCGGACCTTTGGCTTTTACAG . . . NS=1;AN=0 GT:PS ./.:. -1 164811 . AAGGGAT . . . NS=1;AN=0 GT:PS ./.:. -1 164938 . AAATAAAAAGAAC . . . NS=1;AN=0 GT:PS ./.:. -1 165064 . TCTCCTGAAGTCAGGAGTTCAAGGCCAGCCTGGCCAACATGG . . . NS=1;AN=0 GT:PS ./.:. -1 165259 . GTGCGTGACAGAACAAAACTTCAACCTCCAAAAAAAAAAAAAAAAAAAAAAACAGC . . . NS=1;AN=0 GT:PS ./.:. -1 165408 . TTTCGTC . . . NS=1;AN=0 GT:PS ./.:. -1 165565 . GAATGCT . . . NS=1;AN=0 GT:PS ./.:. -1 165614 . TGCATTA . . . NS=1;AN=0 GT:PS ./.:. -1 165642 . C . . . NS=1;AN=0 GT:PS ./.:. -1 165700 . TAAT . . . NS=1;AN=0 GT:PS ./.:. -1 165706 . A . . . NS=1;AN=0 GT:PS ./.:. -1 165709 . GAAA . . . NS=1;AN=0 GT:PS ./.:. -1 165846 . T . . . NS=1;AN=0 GT:PS ./.:. -1 165886 . GACAAAGTTGATTT . . . NS=1;AN=0 GT:PS ./.:. -1 165949 . GGTGAGAATAAATACTTA . . . NS=1;AN=0 GT:PS ./.:. -1 166001 . C . . NS=1;CGA_WINEND=168000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.95:1.51:.:0:.:0:0.999:153 -1 166044 . AGAGAAAGGACAGGCTGGGCACAGTGGCTCACACCTGTAATCCCAGCAGTTTGGGAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 166162 . C . . . NS=1;AN=0 GT:PS ./.:. -1 166220 . CAGCTACTCGGGAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 166358 . AAGAAGG . . . NS=1;AN=0 GT:PS ./.:. -1 166484 . GGAGAAGAGACGTGGCCAGCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 166567 . GAGATTTTTGCTTTAAAATGAACCAAAAAAAAACCAAAGGTGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 166647 . AGAGAAC . . . NS=1;AN=0 GT:PS ./.:. -1 166787 . AATACCA . . . NS=1;AN=0 GT:PS ./.:. -1 166885 . CAAGTAA . . . NS=1;AN=0 GT:PS ./.:. -1 166972 . GAGAGGATAATAACAAATCGCTAATTTCTTTCATCACTATATAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 167104 . CAACAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 167131 . ACATTTTATTTATTTATTTATTTTGAG . . . NS=1;AN=0 GT:PS ./.:. -1 167228 . TCCTGGGTTCAAGCGATTCTCCTGCCTTGGCCTCCCGAATAGCTGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 167285 . TGCGCCACCACACCCGTCTAATTTTGTATTTTTAGTA . . . NS=1;AN=0 GT:PS ./.:. -1 167409 . TACAGGTGTGAGCCACCACGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 167514 . CACCTATAATCCCAGCACTTTGGGAGGCTGAGGTGAGTGGATCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 167596 . GAAACCCCGTCTCTACTAAAAATACAAAAATCAG . . . NS=1;AN=0 GT:PS ./.:. -1 167664 . GCTCGGGAGGCTGAGGCAGAGAACTGCTTGAACCCGGG . . . NS=1;AN=0 GT:PS ./.:. -1 167770 . TCTCAAAAAAAATAAATAAA . . . NS=1;AN=0 GT:PS ./.:. -1 167960 . T . . . NS=1;AN=0 GT:PS ./.:. -1 168001 . G . . NS=1;CGA_WINEND=170000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.89:1.48:.:0:.:0:0.999:153 -1 168063 . AACCTAAGATTACAAGACTTTTCCAGTTTAGACATACCA . . . NS=1;AN=0 GT:PS ./.:. -1 168145 . CGATACG . . . NS=1;AN=0 GT:PS ./.:. -1 168223 . AACATATAATGACTGATTTCATATATTTTA . . . NS=1;AN=0 GT:PS ./.:. -1 168352 . A . . . NS=1;AN=0 GT:PS ./.:. -1 168373 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.108|rs4083155&dbsnp.131|rs79285249&dbsnp.135|rs183198872;CGA_SDO=14 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:VQLOW:15:15,44:15,44:17,18:-44,-15,0:-18,-17,0:0:0,0:0 -1 168442 . TCCTAAAACATG . . . NS=1;AN=0 GT:PS ./.:. -1 168463 . TGTGAAATAAGACTTTACAGCAGCCGGGTGCAGTGGTGCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 168566 . AGCCAAAACCCCCCTCCCTAGCCCCACCCCCACCCCGTCCCTACCAAAAATACA . . . NS=1;AN=0 GT:PS ./.:. -1 168636 . TGGCGGGCGCCTGTAGTCCCAGCTACTCAGGAGGCTGA . . . NS=1;AN=0 GT:PS ./.:. -1 168750 . G . . . NS=1;AN=0 GT:PS ./.:. -1 168762 . ACCTCAAAAAAAACAAAAACAAAAACACA . . . NS=1;AN=0 GT:PS ./.:. -1 168857 . 
ACCCTTTTTCTCCCAATCATTGAAACAC . . . NS=1;AN=0 GT:PS ./.:. -1 168930 . C . . . NS=1;AN=0 GT:PS ./.:. -1 168964 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.131|rs80158785&dbsnp.135|rs187450123;CGA_RPT=MLT1D|ERVL-MaLR|40.2;CGA_SDO=14 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:50:50,113:50,113:43,42:-113,-50,0:-43,-42,0:2:1,1:1 -1 169221 . ACCTTGAC . . . NS=1;AN=0 GT:PS ./.:. -1 169267 . GTAAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 169489 . ATACAAAAAATACAAAAACTAG . . . NS=1;AN=0 GT:PS ./.:. -1 169578 . CCCAGGGGGGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 169631 . TGGGTGACAGAGCGACGCTCCATCTCGAAAACAAAACAAA . . . NS=1;AN=0 GT:PS ./.:. -1 169717 . CACCCAACCCCCAGAAACAGA . . . NS=1;AN=0 GT:PS ./.:. -1 169755 . CCACGGA . . . NS=1;AN=0 GT:PS ./.:. -1 169856 . ATTCTTTTTTTTTTTTTGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 169919 . TCTCGGC . . . NS=1;AN=0 GT:PS ./.:. -1 170001 . G . . NS=1;CGA_WINEND=172000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.09:1.41:.:0:.:0:0.999:153 -1 170038 . ATGCGGT . . . NS=1;AN=0 GT:PS ./.:. -1 170102 . CAGCCCCGCAAAGTGCTGCTATTATAGG . . . NS=1;AN=0 GT:PS ./.:. -1 170333 . AGGTTCTGA . . . NS=1;AN=0 GT:PS ./.:. -1 170372 . AAAGGATGGGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 170423 . CCAGGTTGGCAGGGCTGGGGAAGGGAAAGTGTT . . . NS=1;AN=0 GT:PS ./.:. -1 170466 . CAAGAAAAAAAAGAGGCAGCAGAGGGAGCAGGAGAGCGCTCACATGGAACTCAT . . . NS=1;AN=0 GT:PS ./.:. -1 170525 . TGCCTGAGGGGAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 170556 . GACGTCAGGGGGCAGAGAGGCGCAGTTCCAGGGCGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 170634 . C G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.108|rs4096701;CGA_SDO=15 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:48:48,123:48,123:42,43:-123,-48,0:-43,-42,0:1:1,1:0 -1 170702 . GAGTGGGGCAGAGCAGGGAGGAGTCCTGCACA . . . NS=1;AN=0 GT:PS ./.:. -1 170778 . G . . . NS=1;AN=0 GT:PS ./.:. -1 170821 . TACCCCTGGGCTGATCACTTGGGGAAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 170966 . TTTCGCTCCTGTCGCCCAGGCTGGAGTGCAGTGGCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 171035 . GTTCAAGTGATTCTCCTGCCTCAGCCTCTGGA . . . NS=1;AN=0 GT:PS ./.:. -1 171146 . GGGCAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 171159 . T . . . NS=1;AN=0 GT:PS ./.:. -1 171236 . CACGATGAA . . . NS=1;AN=0 GT:PS ./.:. -1 171411 . TCGAGCCCATCATCCCCTAGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 171454 . GCAGAGCTGAGGGGGTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 171485 . TGTGAAATCGCCCTGAGATGACCCACA . . . NS=1;AN=0 GT:PS ./.:. -1 171519 . GCTAGGAAGTAAGCGCTGCATCTCCTGCAGCGTCCTCCATCCCTAGAGC . . . NS=1;AN=0 GT:PS ./.:. -1 171612 . TTATTGA . . . NS=1;AN=0 GT:PS ./.:. -1 171782 . GGACGGG . . . NS=1;AN=0 GT:PS ./.:. -1 171828 . AGTGGGAGGGGGAACAGCATGAGCCAGGCCTCGAGGCAGAAGGACAACCAG . . . NS=1;AN=0 GT:PS ./.:. -1 171897 . TGCTGGA . . . NS=1;AN=0 GT:PS ./.:. -1 172001 . G . . NS=1;CGA_WINEND=174000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.08:1.43:.:0:.:0:0.999:153 -1 172093 . AGGTTTTTTGTTTTTGTTTTGGAGACGGAGTTTCGCTCTTGTCACAC . . . NS=1;AN=0 GT:PS ./.:. -1 172273 . GGCAGGAAGCTAAACT . . . NS=1;AN=0 GT:PS ./.:. -1 172531 . G . . . NS=1;AN=0 GT:PS ./.:. -1 172536 . CAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 172561 . GCCTGGCAGCTCTCTCCAACTTTGGAAGCCCAGGGGCATG . . . NS=1;AN=0 GT:PS ./.:. -1 172627 . T . . . NS=1;AN=0 GT:PS ./.:. -1 172705 . GTCTGCT . . . NS=1;AN=0 GT:PS ./.:. -1 172874 . CGTGACTCCAAAGTCCCTGCCCTAGCCCCTGGA . . . NS=1;AN=0 GT:PS ./.:. -1 172915 . GCAGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 172934 . AGGCCAGCAGCACAGCCGGCCAAGACCAGGGAAACTTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 172983 . AGCACCCCCAGGTATTCCAACCTAACCCT . . . NS=1;AN=0 GT:PS ./.:. -1 173022 . 
CTCTCACCACCCTTCTTCCTGCTTTAACCTCAACCCCTAC . . . NS=1;AN=0 GT:PS ./.:. -1 173095 . AGACGCCTCAATAAATCAGTCTAATCTCGAAAATAAAAAAGACT . . . NS=1;AN=0 GT:PS ./.:. -1 173167 . GCTAAAAACCATAAACATATAACAACTTAA . . . NS=1;AN=0 GT:PS ./.:. -1 173209 . TTCAATATATATCCAATCATTGTAACTATGACACAGT . . . NS=1;AN=0 GT:PS ./.:. -1 173264 . TTTCAAAATGTA . . . NS=1;AN=0 GT:PS ./.:. -1 173297 . ATTCAAACTATTTATTCAAAATACA . . . NS=1;AN=0 GT:PS ./.:. -1 173386 . ATAAATTAGCAGCCAGCAGGCAGTGACACACCGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 173582 . CAATGTCTGTAACGTGACTTGTAA . . . NS=1;AN=0 GT:PS ./.:. -1 173649 . CATGCTA . . . NS=1;AN=0 GT:PS ./.:. -1 173682 . G . . . NS=1;AN=0 GT:PS ./.:. -1 173709 . A C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.108|rs4117998&dbsnp.131|rs80085981&dbsnp.135|rs192722547;CGA_SDO=14 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:68:68,134:68,134:48,45:-134,-68,0:-48,-45,0:2:2,2:0 -1 173737 . TTCGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 173769 . CATTTTCCGTTCCCCAGCATTGGCAGGT . . . NS=1;AN=0 GT:PS ./.:. -1 173820 . CTCGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 173870 . AAGAAAAAATGGTC . . . NS=1;AN=0 GT:PS ./.:. -1 174001 . G . . NS=1;CGA_WINEND=176000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.65:1.33:.:0:.:0:0.999:153 -1 174055 . TGAGTACTGCTTTTTCCTCT . . . NS=1;AN=0 GT:PS ./.:. -1 174090 . T . . . NS=1;AN=0 GT:PS ./.:. -1 174214 . GCCGCCTAAAGTTATACAAATATA . . . NS=1;AN=0 GT:PS ./.:. -1 174278 . T . . . NS=1;AN=0 GT:PS ./.:. -1 174353 . A . . . NS=1;AN=0 GT:PS ./.:. -1 174482 . AGTGAAACCCTCTCTCTACTAAAAATACA . . . NS=1;AN=0 GT:PS ./.:. -1 174525 . A . . . NS=1;AN=0 GT:PS ./.:. -1 174531 . G . . . NS=1;AN=0 GT:PS ./.:. -1 174536 . ACG . . . NS=1;AN=0 GT:PS ./.:. -1 174544 . GA . . . NS=1;AN=0 GT:PS ./.:. -1 174556 . TTGAGAGGCCGAGGCGGGTAGATCACCTGAGGTCAGGAGTTTGAGAC . . . NS=1;AN=0 GT:PS ./.:. -1 174678 . A . . . NS=1;AN=0 GT:PS ./.:. -1 174721 . CCCAAAAGGCAAAGATTGTGGTGAGCCGAGATTGTGCCATTGC . . . NS=1;AN=0 GT:PS ./.:. -1 174777 . CAAAAACAGCGAAACTCCGTCTCAAAAAAAAAAAAAAGAAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 174841 . TGCAGTGAGCTGAGACTGCACCATTGCACTCCAGCCTGGGTAGCAGAGC . . . NS=1;AN=0 GT:PS ./.:. -1 174899 . TCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGAGAGAGAGAGAAAGA . . . NS=1;AN=0 GT:PS ./.:. -1 175131 . TGCAATA . . . NS=1;AN=0 GT:PS ./.:. -1 175180 . GCTGACGCCTGTAATCCTAACACTTTGGGAAGCCGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 175239 . GGAG . . . NS=1;AN=0 GT:PS ./.:. -1 175245 . CGAG . . . NS=1;AN=0 GT:PS ./.:. -1 175276 . T . . . NS=1;AN=0 GT:PS ./.:. -1 175312 . GGTGGCGCGCGCCTGTAATCCCAGCTACTCGGGAGCCTGAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 175360 . AATCGCTTGAACCCGGGAGGTGGAGGTTGCAGTGAGCCGAGATC . . . NS=1;AN=0 GT:PS ./.:. -1 175427 . GGACGGA . . . NS=1;AN=0 GT:PS ./.:. -1 175443 . TGTCAAAAAAAAAAAACAGAAAAAGAAAAAGAAAAAAGAATTAG . . . NS=1;AN=0 GT:PS ./.:. -1 175578 . A . . . NS=1;AN=0 GT:PS ./.:. -1 175595 . AATCTTTTTTTTATTTTGAGACAGAGTTTTGCTCGTT . . . NS=1;AN=0 GT:PS ./.:. -1 175707 . C . . . NS=1;AN=0 GT:PS ./.:. -1 175717 . T . . . NS=1;AN=0 GT:PS ./.:. -1 175726 . G . . . NS=1;AN=0 GT:PS ./.:. -1 175729 . A . . . NS=1;AN=0 GT:PS ./.:. -1 175763 . TGTATTTTCAGTTGAGACAGGGTTTCTC . . . NS=1;AN=0 GT:PS ./.:. -1 175808 . TCTCGAACTCCTGACCTCAGGTGATCCACTGACCTTGGCCTC . . . NS=1;AN=0 GT:PS ./.:. -1 175918 . AGGCGCGGTGGCTCATGCCTATAATCCC . . . NS=1;AN=0 GT:PS ./.:. -1 176001 . G . . NS=1;CGA_WINEND=177417 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.62:1.26:.:0:.:0:0.999:153 -1 176034 . AAAAATATATTTTAAAAATTAGCTGGGCATGG . . . NS=1;AN=0 GT:PS ./.:. -1 176082 . TC . . . NS=1;AN=0 GT:PS ./.:. -1 176110 . 
AGAACCACTTGAACCTGGGAGGTGGAGGTTGCAGTGAGCGGAGATCACGCCACTGCACTC . . . NS=1;AN=0 GT:PS ./.:. -1 176182 . CAATAGA . . . NS=1;AN=0 GT:PS ./.:. -1 176199 . CTCAAAAACAAAACAAAACAAAACAAAACAAAAAACCACTAAAAAAAAGACT . . . NS=1;AN=0 GT:PS ./.:. -1 176299 . C . . . NS=1;AN=0 GT:PS ./.:. -1 176309 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.108|rs3897087;CGA_SDO=14 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:21:56,21:56,21:25,22:-56,-21,0:-25,-22,0:0:0,0:0 -1 176357 . A . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0/.:.:VQLOW:34,.:0,.:0,.:0:0 -1 176366 . A . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0/.:.:VQLOW:25,.:0,.:0,.:0:0 -1 176372 . A C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.108|rs3897088;CGA_SDO=14 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:30:30,55:8,20:15,0:-55,-30,0:-15,0,0:4:2,2:2 -1 176409 . AAAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 176523 . AAGGTAAGAAATGTAAATTTGGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 176620 . TTAAGGGAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 176673 . TTGTTGC . . . NS=1;AN=0 GT:PS ./.:. -1 176762 . GTCACCAACATCGATGTAG . . . NS=1;AN=0 GT:PS ./.:. -1 176929 . GCCGCAT . . . NS=1;AN=0 GT:PS ./.:. -1 176977 . ACAGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 177073 . CAAGAAAAGAGAGAAGAATGGAGACGGCAGCACGAG . . . NS=1;AN=0 GT:PS ./.:. -1 177132 . TTGAGATCAGTTATATTTCTTCTGACAAAAATTAA . . . NS=1;AN=0 GT:PS ./.:. -1 177289 . TACTTTGATTCCAAATAAAACAAATATTTAAAAAATTTAATGAATA . . . NS=1;AN=0 GT:PS ./.:. -1 177376 . G . . . NS=1;AN=0 GT:PS ./.:. -1 177400 . TGGTGGTCTAGAGAATTC . . . NS=1;AN=0 GT:PS ./.:. -1 177418 . N . . END=227417;NS=1;AN=0 GT:PS ./.:. -1 227418 . G . . NS=1;CGA_WINEND=230000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.92:1.55:.:0:.:0:0.999:167 -1 227418 . GATTCATGGCTGAAATCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 227455 . GTCTCTCAATCCGATCAAGTAGATGTCTAAAATTAACCGTCA . . . NS=1;AN=0 GT:PS ./.:. -1 227509 . CTGATTCATGGCTGAAATTGTGTTTGAC . . . NS=1;AN=0 GT:PS ./.:. -1 227543 . TGTGTGTCTCTTAATCCACTCAAGTAGATGTCTAAAATTAACCATCAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 227599 . TGCCTGATTCATGGCTGAAATCACGTT . . . NS=1;AN=0 GT:PS ./.:. -1 227632 . GCTATGTGTGTCTCTTAATCCAGTCAAGTAGATGTCTAAAATTAA . . . NS=1;AN=0 GT:PS ./.:. -1 227695 . CTGATTCATGGCTGAAATCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 227737 . TCTCAATCCGATCAAGTAGATGTCTGAAATTAACCATCA . . . NS=1;AN=0 GT:PS ./.:. -1 227782 . TTATGCCTGATTCATGGCTGAAATTTCAGGATGAAAGC . . . NS=1;AN=0 GT:PS ./.:. -1 227839 . T . . . NS=1;AN=0 GT:PS ./.:. -1 227851 . T . . . NS=1;AN=0 GT:PS ./.:. -1 227880 . TCTTAACTCCAGAGAGCATTGCAAAATTCATTTATGAAAACCTCTAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 227949 . TTGGAAAAAAATAAGCATTTATAAATAAATATTC . . . NS=1;AN=0 GT:PS ./.:. -1 228018 . TTTCTTA . . . NS=1;AN=0 GT:PS ./.:. -1 228070 . G . . . NS=1;AN=0 GT:PS ./.:. -1 228093 . CTATATCTTCAAAATTATCATTATTGAATATAAAACAAGCAT . . . NS=1;AN=0 GT:PS ./.:. -1 228154 . GTTCTAGTCAAATAAGCTAATATTATACTTACTAGAAACGTA . . . NS=1;AN=0 GT:PS ./.:. -1 228211 . TAGATTTGATTCTAATTAAGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 228250 . CATTATTTTTTTTATGCTGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 228337 . ATCGAAAGCATCATAATCAGGAGCAAGTCGAACATATGCCTTCTCTT . . . NS=1;AN=0 GT:PS ./.:. -1 228425 . TCATAGAGCTTCTTCACAGCCTGTCTGATCTGGTGCTTGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 228491 . AGCGTGTTGTTTTCTTCTATCT . . . NS=1;AN=0 GT:PS ./.:. -1 228528 . CAGTGGTCAGCGGA . . . NS=1;AN=0 GT:PS ./.:. -1 228590 . TTCCGAGGATATCTGGGCTGCCTCCGGA . . . NS=1;AN=0 GT:PS ./.:. -1 228644 . GTGAGTGACATGCGGATCTTCTTTTTTGCGT . . . NS=1;AN=0 GT:PS ./.:. -1 228687 . ACCTTTCAACACTGCCTTCTTGGCCTTTAAGGCCTTCGCTTTGGCT . . . NS=1;AN=0 GT:PS ./.:. 
-1 228780 . GTGAAAAGCGAAAAACATTATTTCAAAAATAA . . . NS=1;AN=0 GT:PS ./.:. -1 228874 . AAAGTTA . . . NS=1;AN=0 GT:PS ./.:. -1 228947 . A . . . NS=1;AN=0 GT:PS ./.:. -1 229001 . AGAGAGACTAAAGATATTTTGGCCCGTTAATAAACATGTTTTTTTCTG . . . NS=1;AN=0 GT:PS ./.:. -1 229136 . ATGCTAA . . . NS=1;AN=0 GT:PS ./.:. -1 229181 . CTGGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 229202 . AGTATTAAAATTATAATCAATATATGTAAATAAA . . . NS=1;AN=0 GT:PS ./.:. -1 229270 . CTATGCA . . . NS=1;AN=0 GT:PS ./.:. -1 229327 . TTTTATAAGGAAAACCATACAGAAGATACAAATAAAAAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 229401 . GTAAAATGTTATGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 229561 . TTTAGGAGGCTGAGGCGGGT . . . NS=1;AN=0 GT:PS ./.:. -1 229673 . A C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs6678242;CGA_RPT=AluY|Alu|10.4;CGA_SDO=14 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:VQLOW:11:110,11:110,11:37,15:-110,-11,0:-37,-15,0:25:25,25:0 -1 229801 . TCTCTAAATAAATAAATAAATAAATGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 229865 . CCCCGTCTCTACCAAAAATACAAAAATTAGATGGGCAGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 230001 . T . . NS=1;CGA_WINEND=232000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.46:1.77:.:0:.:0:0.999:167 -1 230018 . CGATAGAGCCAGACCCTGTCTCAAAAAAAATTTTTTTAAATGAA . . . NS=1;AN=0 GT:PS ./.:. -1 230430 . ACATAGT . . . NS=1;AN=0 GT:PS ./.:. -1 230541 . TGGGATATCTGCCACAATGCATTTGTCGAAATATGCA . . . NS=1;AN=0 GT:PS ./.:. -1 230630 . ACTCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 230685 . CTACAAATTACGATGGTTTGGATGTGGTTTGTCCCCACAA . . . NS=1;AN=0 GT:PS ./.:. -1 230768 . G . . . NS=1;AN=0 GT:PS ./.:. -1 230794 . GTCGTGGGGACGGATCCCTCATGAAAGGATTAATGTCCTCCATGGGGGTGAGTGAGTTCTGTTCTCACAGG . . . NS=1;AN=0 GT:PS ./.:. -1 230921 . CCTCTTGCTTTCACTTCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 230955 . TGCACCCCTTGCTCCCCTTCCACTT . . . NS=1;AN=0 GT:PS ./.:. -1 230989 . AGGTGAAAAAGACTGAGGCCCCGCCAGATGCAACTGCC . . . NS=1;AN=0 GT:PS ./.:. -1 231057 . TGAACCAAATGAAACTTTTTTACTTATAAATTACGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 231116 . AGCACAAAATG . . . NS=1;AN=0 GT:PS ./.:. -1 231139 . AATCTAGGTAAAAACTTTGAAAATGAATAGAATCTGTAGG . . . NS=1;AN=0 GT:PS ./.:. -1 231202 . CATTATTGGATTCCATTTTATAAAGTTCTTTCCAACAGAAGCAATTGT . . . NS=1;AN=0 GT:PS ./.:. -1 231477 . TTAGGTGTGTGTAGGTAGGTTAGACACGCAC . . . NS=1;AN=0 GT:PS ./.:. -1 231523 . TTGCTAA . . . NS=1;AN=0 GT:PS ./.:. -1 231681 . ACTAAAAAAATTAAAATATA . . . NS=1;AN=0 GT:PS ./.:. -1 231857 . GCATAGA . . . NS=1;AN=0 GT:PS ./.:. -1 231984 . ATGCAGTACGGACAAGGAGGAAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 232001 . A . . NS=1;CGA_WINEND=234000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.92:1.67:.:0:.:0:0.999:167 -1 232063 . GAATATCAA . . . NS=1;AN=0 GT:PS ./.:. -1 232104 . CAATGGGGGGCATT . . . NS=1;AN=0 GT:PS ./.:. -1 232212 . ATGACTACATGCCGT . . . NS=1;AN=0 GT:PS ./.:. -1 232367 . TGAGACA . . . NS=1;AN=0 GT:PS ./.:. -1 232410 . TTGCGGCTACATGGGAAATCTCTGCTTTTTTTTTTTGACGATT . . . NS=1;AN=0 GT:PS ./.:. -1 232471 . AGACGTAAAATAAAACTTTATTTAAAACACAGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 232566 . CTAGTCCTGTTTTTTAAAATAAGAGCAATTA . . . NS=1;AN=0 GT:PS ./.:. -1 232626 . ATATATAAATCAAAACAAATGTCCTT . . . NS=1;AN=0 GT:PS ./.:. -1 232672 . AGTAACAATATGTGTAAAC . . . NS=1;AN=0 GT:PS ./.:. -1 232771 . AAATATT . . . NS=1;AN=0 GT:PS ./.:. -1 232832 . TGTAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 232874 . GATCAATTA . . . NS=1;AN=0 GT:PS ./.:. -1 232927 . TAGCCATAATACAACAGAATCAAATATTGGCCACTGG . . . NS=1;AN=0 GT:PS ./.:. -1 233001 . CTTATGA . . . NS=1;AN=0 GT:PS ./.:. -1 233030 . CACAATGCTTTCTAAAACAAAAGAGTCTAA . . . NS=1;AN=0 GT:PS ./.:. -1 233083 . TCAGCGAG . . . NS=1;AN=0 GT:PS ./.:. 
-1 233140 . TACGGGTGATTTTAAATGTTGCTATGTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 233218 . AACCCAAACTCTGGAATGTTTGCAAATTTAGTTGAGCTTCTGTGTAATT . . . NS=1;AN=0 GT:PS ./.:. -1 233551 . CATTTGT . . . NS=1;AN=0 GT:PS ./.:. -1 233629 . CTTACACGTATTGATTGATCTCTCGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 233687 . CCCGACC . . . NS=1;AN=0 GT:PS ./.:. -1 233769 . TTCTAAAAAATCTGAGAGCTGTCTCAGATT . . . NS=1;AN=0 GT:PS ./.:. -1 233811 . ACATGTAATGTAGGATGTCAATGTT . . . NS=1;AN=0 GT:PS ./.:. -1 234001 . T . . NS=1;CGA_WINEND=236000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:2.24:1.81:.:0:.:0:0.999:167 -1 234156 . GATATCT . . . NS=1;AN=0 GT:PS ./.:. -1 234426 . GGACACACCAC . . . NS=1;AN=0 GT:PS ./.:. -1 234463 . GACCCACCCTCGGGTGGGTCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 234539 . GACACTGAACCTAAATCCTGA . . . NS=1;AN=0 GT:PS ./.:. -1 234606 . TGATCTC . . . NS=1;AN=0 GT:PS ./.:. -1 234710 . ATGTCTA . . . NS=1;AN=0 GT:PS ./.:. -1 234756 . CTTTAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 235080 . TGACTTTTTCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 235435 . ATATGTT . . . NS=1;AN=0 GT:PS ./.:. -1 235500 . TTTAAAAGACGGT . . . NS=1;AN=0 GT:PS ./.:. -1 235604 . CTGACTCATG . . . NS=1;AN=0 GT:PS ./.:. -1 235840 . AAAACCCT . . . NS=1;AN=0 GT:PS ./.:. -1 235926 . TTCGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 235973 . TGGCTCTTTTTCATGTCCTTTATCAAGTTTGGATC . . . NS=1;AN=0 GT:PS ./.:. -1 236001 . T . . NS=1;CGA_WINEND=238000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.65:1.95:.:0:.:0:0.999:167 -1 236197 . ACATGTC . . . NS=1;AN=0 GT:PS ./.:. -1 236475 . TGGGCTTGG . . . NS=1;AN=0 GT:PS ./.:. -1 236615 . CTGCTGCTTCCTGGAGGAAGACAGTCCCTCTGTCCCTCTGTCTCTGCCAA . . . NS=1;AN=0 GT:PS ./.:. -1 236682 . TCCTGGAGGGAGACAGTCCCTCAGTCCCTCTGTCTCTGCCAAC . . . NS=1;AN=0 GT:PS ./.:. -1 236733 . CTGCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 236790 . ACATGCA . . . NS=1;AN=0 GT:PS ./.:. -1 237006 . T . . . NS=1;AN=0 GT:PS ./.:. -1 237058 . AGGCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 237158 . A . . . NS=1;AN=0 GT:PS ./.:. -1 237269 . ACCTGTA . . . NS=1;AN=0 GT:PS ./.:. -1 237328 . AGACGAT . . . NS=1;AN=0 GT:PS ./.:. -1 237420 . ACTTGGGAGGCTGAGGCAGGAGAATGGCTTGA . . . NS=1;AN=0 GT:PS ./.:. -1 237467 . CTTGCAGTGAGCCAAGATCACGCCACT . . . NS=1;AN=0 GT:PS ./.:. -1 237525 . TCTCAAAAAAAAAAAAAAAACTTA . . . NS=1;AN=0 GT:PS ./.:. -1 237663 . TATACAG . . . NS=1;AN=0 GT:PS ./.:. -1 237749 . TGCAGCACCAGGTGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 237801 . GCGGAGTGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 237869 . TTAGTTTGCGTTGTGTTTCTCCAACTT . . . NS=1;AN=0 GT:PS ./.:. -1 238001 . C . . NS=1;CGA_WINEND=240000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.41:1.51:.:0:.:0:0.999:167 -1 238055 . AAAGCCA . . . NS=1;AN=0 GT:PS ./.:. -1 238213 . AACAATTTGCCTC . . . NS=1;AN=0 GT:PS ./.:. -1 238263 . TTCCCCAAT . . . NS=1;AN=0 GT:PS ./.:. -1 238307 . AGAAAATCTTTGGGAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 238470 . GAACGTG . . . NS=1;AN=0 GT:PS ./.:. -1 238817 . ATTCTTTTGCTC . . . NS=1;AN=0 GT:PS ./.:. -1 238987 . AATTAAAAAAAAAAAAAAAGAAGAAGAAGAGTAC . . . NS=1;AN=0 GT:PS ./.:. -1 239036 . GATTTGA . . . NS=1;AN=0 GT:PS ./.:. -1 239161 . GAGGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 239243 . TATGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 239479 . AACGAAGCTCTCTTTATTTGCTTCTGCTAATTAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 239693 . TTCTGTAGGTTGTACAATAACTTTTGGTGAGAAAAAATAAAAGTC . . . NS=1;AN=0 GT:PS ./.:. -1 239819 . TCAATTTTATTGAAGTTCACTTCTGACCTCTTTA . . . NS=1;AN=0 GT:PS ./.:. -1 240001 . C . . NS=1;CGA_WINEND=242000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.16:1.55:.:0:.:0:0.999:167 -1 240017 . TTCCTCA . 
. . NS=1;AN=0 GT:PS ./.:. -1 240082 . AGACGGA . . . NS=1;AN=0 GT:PS ./.:. -1 240144 . CTCCGCT . . . NS=1;AN=0 GT:PS ./.:. -1 240549 . AAATGAAAATCTGACCACGTTACTCT . . . NS=1;AN=0 GT:PS ./.:. -1 240608 . TCAAGTC . . . NS=1;AN=0 GT:PS ./.:. -1 240850 . GCTGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 240895 . GCCTCACTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 241079 . TTTTGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 241133 . GCTCCTTTTCTTTTCTTTTCTTTTCTTTTTTCTTTTTTTTTTTTTTTTGAGTCAGAATCTCGCTC . . . NS=1;AN=0 GT:PS ./.:. -1 241220 . TGGTGCG . . . NS=1;AN=0 GT:PS ./.:. -1 241271 . T . . . NS=1;AN=0 GT:PS ./.:. -1 241317 . CACACCCGGCTAATTTTTTTTTTTTTTTTGTATTTTTTAGTAGAGACTGTGTCTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 241482 . GCATAAACTAAATGTTTTCCAAAGGGAATAGGGCAAAACAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 241625 . GGGCTCTCCACTTACAAGAAGAGAGCATGG . . . NS=1;AN=0 GT:PS ./.:. -1 241747 . CCTGTTAATTTAATCACACGGAACACTTCTATTTAAAATTCC . . . NS=1;AN=0 GT:PS ./.:. -1 241833 . ACACTGCTTGGAGTGTCAGGCCTAGATCTCTATCCATCAGA . . . NS=1;AN=0 GT:PS ./.:. -1 241910 . ATGGGGCAATTTCTTAAAAGCACCATGTATTTTATCG . . . NS=1;AN=0 GT:PS ./.:. -1 241989 . AATAAATGTCTTCCACAATCCCATAGCCCAGAGCTAACTAACCACTAT . . . NS=1;AN=0 GT:PS ./.:. -1 242001 . C . . NS=1;CGA_WINEND=244000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.08:1.40:.:0:.:0:0.999:167 -1 242340 . TGGGAGGCACAGTGGAAGATCATG . . . NS=1;AN=0 GT:PS ./.:. -1 242461 . AGACTGTGGGTCCCCTCAGTCTTGGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 242575 . AAACTGCCAATCATGGAGGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 242695 . AGGGAGGCAACCTCCAAAGGTGGGGCCCTCTGCTCACCTG . . . NS=1;AN=0 GT:PS ./.:. -1 242811 . TTTCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 242949 . AAACGGCCTTTTGAGTTGAGCAATAAATT . . . NS=1;AN=0 GT:PS ./.:. -1 243186 . TAACATTACAGTAACTGTTACAGGTTCCAGCAGGCTAACTGGGTGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 243397 . TGTGTGTTGGGATAGAGTGGTAAGAAAATGGGAAATAATAAGAAT . . . NS=1;AN=0 GT:PS ./.:. -1 243496 . CACATTT . . . NS=1;AN=0 GT:PS ./.:. -1 243572 . CTTGTAT . . . NS=1;AN=0 GT:PS ./.:. -1 243761 . ATTTGACTA . . . NS=1;AN=0 GT:PS ./.:. -1 243850 . CAAGT C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.134|rs140116466;CGA_SDO=14 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:322:322,322:269,269:32,45:-322,0,-322:-32,0,-45:67:17,50:50 -1 243898 . TCAGATT . . . NS=1;AN=0 GT:PS ./.:. -1 244001 . C . . NS=1;CGA_WINEND=246000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.98:1.46:.:0:.:0:0.999:167 -1 244285 . TACATTTTCCATGTGCTGTAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 244353 . CCCTGGTGAGCCACAGAGATTTA . . . NS=1;AN=0 GT:PS ./.:. -1 244642 . CCTGCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 244782 . ACATGGGCACCCATATTTTTCTAGCCACTTTC . . . NS=1;AN=0 GT:PS ./.:. -1 244831 . G . . . NS=1;AN=0 GT:PS ./.:. -1 244920 . CACTTCTGCT . . . NS=1;AN=0 GT:PS ./.:. -1 244954 . TGAC . . . NS=1;AN=0 GT:PS ./.:. -1 245021 . ATGCCCT . . . NS=1;AN=0 GT:PS ./.:. -1 245088 . T . . . NS=1;AN=0 GT:PS ./.:. -1 245105 . TGTATTAATGTGCCTTTCTAGTAACAGGTTTTTAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 245165 . TATTTGTGTGTGTGCATGTGGTAGTGGGGAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 245217 . AGAAAGA . . . NS=1;AN=0 GT:PS ./.:. -1 245258 . GGGGAAAAAATTTTCCCA . . . NS=1;AN=0 GT:PS ./.:. -1 245325 . TAGAGTTGCTTTTATTTATTTATTTATTTATTTATTTATTTTTCCTTTTTTTTCTTTCTCTTTTTTTCTTCTTTTTTTTTTCTTTTCTTTCTTTTTTTTTTTTTTTGGACAGAGTCTCACACTGTCACCTCGGCTGGAGTGCATTGGTGCAATCTCGACTCACTGCAACTT . . . NS=1;AN=0 GT:PS ./.:. -1 245602 . TGAGGTTTCACTATGTTGGCCAGGCTGGTCTCAAACTCCTGACCTCATGA . . . NS=1;AN=0 GT:PS ./.:. -1 245658 . CACGTTGGCCTCCCAAAGTGCTGGGATTACAGGCGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 246001 . C . . 
NS=1;CGA_WINEND=248000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.97:1.35:.:0:.:0:0.999:167 -1 246056 . ATGCGAACTGGGAGGGGAGATTTTTCAGGAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 246134 . GAAAGTGTATAATGATGTCAGA . . . NS=1;AN=0 GT:PS ./.:. -1 246186 . GCCGTCA . . . NS=1;AN=0 GT:PS ./.:. -1 246252 . AAAAGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 246408 . TCGCCATGCCTAGTACAGACTCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 246633 . TTTGGACAGTA . . . NS=1;AN=0 GT:PS ./.:. -1 246795 . TCCGTGTTACTGAGCAGTTCTCAG . . . NS=1;AN=0 GT:PS ./.:. -1 246852 . TTCTTTC . . . NS=1;AN=0 GT:PS ./.:. -1 246900 . T . . . NS=1;AN=0 GT:PS ./.:. -1 246974 . ACAATTTTTATACATAAAGATTTCATAAAACCA . . . NS=1;AN=0 GT:PS ./.:. -1 247067 . TACCCTT . . . NS=1;AN=0 GT:PS ./.:. -1 247172 . ACATTGA . . . NS=1;AN=0 GT:PS ./.:. -1 247295 . GACACAA . . . NS=1;AN=0 GT:PS ./.:. -1 247406 . GTACTTTGGGAGGCCAAGGTGGGCAGGTCACTTGAGGTCAGCAGT . . . NS=1;AN=0 GT:PS ./.:. -1 247527 . TGCCTGTAATCCCAGCTACTTAGGAGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 247582 . AAGGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 247632 . G . . . NS=1;AN=0 GT:PS ./.:. -1 247652 . C . . . NS=1;AN=0 GT:PS ./.:. -1 247712 . AAAATATTACACATGTGTAATCCCAGCATTTTGAGATGCC . . . NS=1;AN=0 GT:PS ./.:. -1 247935 . C . . . NS=1;AN=0 GT:PS ./.:. -1 247983 . TCCTATCTCAAAAAAAAAAAAAAATCAC . . . NS=1;AN=0 GT:PS ./.:. -1 248001 . A . . NS=1;CGA_WINEND=250000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.07:1.52:.:0:.:0:0.999:167 -1 248108 . GGAAATAA . . . NS=1;AN=0 GT:PS ./.:. -1 248225 . GCCTGGCCAACATGGTGAAATCTTGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 248467 . ACATAGAACTAATTTATAAATCAAAGCACTATGCCTT . . . NS=1;AN=0 GT:PS ./.:. -1 248735 . C . . . NS=1;AN=0 GT:PS ./.:. -1 248828 . TTTCTTTTTGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 248951 . TTCAGGTTTTATACCTACCTTATAGATAAAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 249269 . ATTCGGGTTTTTTTTTTAAAGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 249401 . CATTTCTACTAAAAATAAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 249555 . GAGCGAGATTCCGTCTCAAAAAGTAAAATAAAATAAAATAAAAAATA . . . NS=1;AN=0 GT:PS ./.:. -1 249698 . TCAGGGTCCTAGCAGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 249840 . TGTTGGAGGTGGGGCCTA . . . NS=1;AN=0 GT:PS ./.:. -1 249938 . TTTTCTTTTTGCGAT . . . NS=1;AN=0 GT:PS ./.:. -1 250001 . G . . NS=1;CGA_WINEND=252000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.19:1.52:.:0:.:0:0.999:167 -1 250054 . CAGAGTAGCTAGGATTACAGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 250120 . C . . . NS=1;AN=0 GT:PS ./.:. -1 250287 . AACCCCTCTCTCTCGCCA . . . NS=1;AN=0 GT:PS ./.:. -1 250340 . CTGTCATGAGTGGA . . . NS=1;AN=0 GT:PS ./.:. -1 250421 . AGCCAAATAAACCTCTCTTCTTTAAAATTATTCAGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 250474 . AACAACACACACACACACACACACACATACACACACACG . . . NS=1;AN=0 GT:PS ./.:. -1 250537 . AATTAGAAATGGTGATGCACCGAGGGATTGGCACCGAGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 250632 . TAATGGTTAAGTA . . . NS=1;AN=0 GT:PS ./.:. -1 250697 . TCAACTAGAATCTAGGAAGCAGAGAACCTGAGT . . . NS=1;AN=0 GT:PS ./.:. -1 250834 . CAACAGAGCGACTCAGATGCTATAAAACTTGCTAACA . . . NS=1;AN=0 GT:PS ./.:. -1 250926 . CACAGTC . . . NS=1;AN=0 GT:PS ./.:. -1 251008 . ACACGCGAG . . . NS=1;AN=0 GT:PS ./.:. -1 251071 . TTCTGGA . . . NS=1;AN=0 GT:PS ./.:. -1 251134 . GACATTAGACATATA . . . NS=1;AN=0 GT:PS ./.:. -1 251254 . ATACAAAGAGTAATACCATGTCACTTAAGAATAC . . . NS=1;AN=0 GT:PS ./.:. -1 251369 . TTCCTTA . . . NS=1;AN=0 GT:PS ./.:. -1 251435 . ACACAAA . . . NS=1;AN=0 GT:PS ./.:. -1 251493 . GCTCAGATACCTTCTCCGCTA . . . NS=1;AN=0 GT:PS ./.:. -1 251591 . GTAGTGAGCAAATATCCTGAAGTTGAGAATGCTTCTACCTCCT . . . NS=1;AN=0 GT:PS ./.:. -1 251645 . 
GGAATATTCATCAAAACACAAC . . . NS=1;AN=0 GT:PS ./.:. -1 251799 . TTCCAGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 251920 . GTCAATATATATATAGATATATACACACACTCA . . . NS=1;AN=0 GT:PS ./.:. -1 252001 . T . . NS=1;CGA_WINEND=254000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.32:1.32:.:0:.:0:0.999:167 -1 252028 . TACAATAAACATGTGTTTTTAACAAGAAAAGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 252100 . A . . . NS=1;AN=0 GT:PS ./.:. -1 252201 . TAGAGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 252299 . ACATGGG . . . NS=1;AN=0 GT:PS ./.:. -1 252355 . AGAGCAAGCTGGGAAAGCAGTGGCCTTTAATAC . . . NS=1;AN=0 GT:PS ./.:. -1 252469 . AATAGTAAACTGAC . . . NS=1;AN=0 GT:PS ./.:. -1 252953 . TCATCAG . . . NS=1;AN=0 GT:PS ./.:. -1 253126 . GCAGTAT . . . NS=1;AN=0 GT:PS ./.:. -1 253223 . CTTGAGCAAATGGTAAATTAACTCTCTCTTTTCTCTCTCTCTCTAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 253437 . TTTACTGGAGTACACAATTGTGACTATTTTTAGCCATAGGAAC . . . NS=1;AN=0 GT:PS ./.:. -1 253598 . TTACACTTAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 253829 . GGAGATTTGGACATAGAGAGAGGCACACGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 254001 . C . . NS=1;CGA_WINEND=256000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.21:1.27:.:0:.:0:0.999:167 -1 254109 . GATCTGA . . . NS=1;AN=0 GT:PS ./.:. -1 254257 . TTAAGGCCTTGCTTTAAAGCTTCAATGGGCTT . . . NS=1;AN=0 GT:PS ./.:. -1 254355 . TCCTCTTG . . . NS=1;AN=0 GT:PS ./.:. -1 254489 . TTCAAATGTCACTTCCCTGTAAAAGCTTCCTGGCCAT . . . NS=1;AN=0 GT:PS ./.:. -1 254567 . CTATAACAACCTAATATATTCTCAATTGATTAACTGT . . . NS=1;AN=0 GT:PS ./.:. -1 254639 . AAACGTGGCCAGGTGCAGTGGCTCACACCTGTAATCCCACCACTT . . . NS=1;AN=0 GT:PS ./.:. -1 254716 . CTTCAAAAAATTTT . . . NS=1;AN=0 GT:PS ./.:. -1 254857 . TATCAAAAAAAAAAAAAAAAAAAAAAGAAT . . . NS=1;AN=0 GT:PS ./.:. -1 254917 . TATCTATCACGTTCACCTCCCAAGAGGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 254993 . AATGCAT . . . NS=1;AN=0 GT:PS ./.:. -1 255123 . T . . . NS=1;AN=0 GT:PS ./.:. -1 255141 . TTTCCAGTTATATATCTGGTAGAGATTAGG . . . NS=1;AN=0 GT:PS ./.:. -1 255211 . TGACCCAGCATGGCTGAACACTCAGTGACTACCAC . . . NS=1;AN=0 GT:PS ./.:. -1 255319 . TTATATTCAGAATTACTCAAGTCTTAGAAGCACCACTT . . . NS=1;AN=0 GT:PS ./.:. -1 255382 . TCAAGTGATGGGCTGAAGTGAAGGGAGGGAGTCACTCACTTGAATGGT . . . NS=1;AN=0 GT:PS ./.:. -1 255500 . TGATTGG . . . NS=1;AN=0 GT:PS ./.:. -1 255630 . GAAATTTAGAATAAATTAATAAATAT . . . NS=1;AN=0 GT:PS ./.:. -1 255724 . ATATTTG . . . NS=1;AN=0 GT:PS ./.:. -1 255816 . TGCTTGAGGGATAA . . . NS=1;AN=0 GT:PS ./.:. -1 255889 . GTGTATGCGTGTGTGTGTGTATGTGTGTGTGTGTGTGTGTGAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 256001 . C . . NS=1;CGA_WINEND=258000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.98:1.27:.:0:.:0:0.999:167 -1 256019 . CATAAGGA . . . NS=1;AN=0 GT:PS ./.:. -1 256088 . ATATATGCAATATATATACATATATACACACATATACATATGTATTTAAATATTTAAATTACATTT . . . NS=1;AN=0 GT:PS ./.:. -1 256266 . CAACCCTCCTGTATTAGTCTCCCCAGTA . . . NS=1;AN=0 GT:PS ./.:. -1 256319 . ATGTCCACCTTTATGCTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 256374 . AACTTAATAATAAAAACATTTCAAATGTAAAGAAATTTA . . . NS=1;AN=0 GT:PS ./.:. -1 256439 . AAATTGG . . . NS=1;AN=0 GT:PS ./.:. -1 256458 . AACGGACACTTTTCAAAAGAATACATGCATGCA . . . NS=1;AN=0 GT:PS ./.:. -1 256532 . TTAGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 256720 . AAGCAGAGCTACCATTCG . . . NS=1;AN=0 GT:PS ./.:. -1 256755 . CTGGGTATATACCCAGATGAATATAAACCAT . . . NS=1;AN=0 GT:PS ./.:. -1 256819 . TTGCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 257015 . ATACAGCATACTCTCAGTTATAAGTGGGAGCTAAATGAT . . . NS=1;AN=0 GT:PS ./.:. -1 257240 . AAATAAAAGTTAAAAAAAAAAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 257288 . CTGGTAATATGAAAAACACA . . . NS=1;AN=0 GT:PS ./.:. -1 257414 . 
ACACCTGTAATCCCAGCACTTTGGGAGGCCGATGCTGGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 257473 . TTCGGGACC . . . NS=1;AN=0 GT:PS ./.:. -1 257527 . AAATTAGCTGGGTGTGGTGGCTGGC . . . NS=1;AN=0 GT:PS ./.:. -1 257591 . ATCGCTT . . . NS=1;AN=0 GT:PS ./.:. -1 257656 . GCAACAA . . . NS=1;AN=0 GT:PS ./.:. -1 257678 . GGGGAAAAAAAAAAACAAAAAAAACCACCACCATCATTTTGCAAG . . . NS=1;AN=0 GT:PS ./.:. -1 257803 . ATTTGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 257834 . GTTATTATTTTGTATGCGATGACAACAGAATATATTATCATGCT . . . NS=1;AN=0 GT:PS ./.:. -1 257887 . AATCTCATTCATAATATAAAGTATAAATTTGTGATTTTGCTTTAAT . . . NS=1;AN=0 GT:PS ./.:. -1 258001 . A . . NS=1;CGA_WINEND=260000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.00:1.59:.:0:.:0:0.999:167 -1 258025 . AAAATTTGAAACTAGTAACATGGAGGACTATTGTCATTGTTTA . . . NS=1;AN=0 GT:PS ./.:. -1 258086 . TTCTGCAAAGCAGTGTACATAAAAATAATTTCAAGAAATTTATAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 258141 . TTATGGTGTATAAACAACTTTAGATTCTTTGTTTAAGAAATACT . . . NS=1;AN=0 GT:PS ./.:. -1 258233 . GGCTAATAGTAGGCACCTAATA . . . NS=1;AN=0 GT:PS ./.:. -1 258349 . AACATTTGATGGAATCATGCTTTTACTTTCTGCTTACGAC . . . NS=1;AN=0 GT:PS ./.:. -1 258444 . CTGAGCAATGTGGCACCTGCTGAAGCCTGCTGCCTCATTT . . . NS=1;AN=0 GT:PS ./.:. -1 258517 . CTCTAACATTTTTTAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 258599 . TTTTCCCCTGACAAATTACTTATCATCTATCATAATTCAGGTTAAATG . . . NS=1;AN=0 GT:PS ./.:. -1 258674 . CTGTCTC . . . NS=1;AN=0 GT:PS ./.:. -1 258719 . TAGTATCGAATCAAGTTTATAATTTTAAAATAATTGGGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 258845 . TCAGAAT . . . NS=1;AN=0 GT:PS ./.:. -1 258878 . GAACAAGCACTAAATAAATGAAC . . . NS=1;AN=0 GT:PS ./.:. -1 258922 . CTCATTTTCAGAACAGAGTACTAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 258973 . CACGTAGGTAATTTACAAGGGCTACAATTTCAGCTCAGATTTACCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 259060 . CACTTCAGATTCTTCTTTAGAAAACTTGGACAATAGCATTTGCTGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 259122 . CTAAGAATCAAGAGAGATATCTGACA . . . NS=1;AN=0 GT:PS ./.:. -1 259163 . AAACATTAAACACGAT . . . NS=1;AN=0 GT:PS ./.:. -1 259236 . AAATAGTAATACTTATTGCAGACTCAA . . . NS=1;AN=0 GT:PS ./.:. -1 259345 . A . . . NS=1;AN=0 GT:PS ./.:. -1 259385 . GTGATTTTTCAGGTTCACTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 259430 . A . . . NS=1;AN=0 GT:PS ./.:. -1 259496 . TATGAAAACAAGAGATAAATATACACAACTGAGGGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 259613 . TAACTTTTAATAGAACCAGTCACTAAATTAAAAAAATGAC . . . NS=1;AN=0 GT:PS ./.:. -1 259664 . C . . . NS=1;AN=0 GT:PS ./.:. -1 259742 . GCTTACTGAATAAGCTGCTAAGGTTTGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 259782 . GTGCGGTGAAATGATGTCTACATCACAGTCCAACATTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 259871 . AGACGTTTGTATATGATAAGAGAGCCAGAGTACAATTTAGGAGAAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 259996 . GATAAGATATTGTGGCTGCTACCAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 260001 . G . . NS=1;CGA_WINEND=262000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.97:1.54:.:0:.:0:0.999:167 -1 260153 . GATTAGATTATAAGATACTGTGAATTTCTTCTTGTGTCCT . . . NS=1;AN=0 GT:PS ./.:. -1 260228 . ATGAAGCCAACTGGCATGCTGTCAGTGGCCCAGTGTA . . . NS=1;AN=0 GT:PS ./.:. -1 260356 . TAGCTAAGAACCATGTGAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 260396 . CCTCAGTTGAAATTTAAGATGACATATTGAGCAGACATACTGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 260521 . GTATTACACCTTCAGTGAGCACGTGTACTAGAAATTTAAAAAATAAATAAAATAAACCTTCAAAGTGAG . . . NS=1;AN=0 GT:PS ./.:. -1 260679 . CAGGTTTCAGCCTGAACTCACACAATCTGTGTGGC . . . NS=1;AN=0 GT:PS ./.:. -1 260785 . G . . . NS=1;AN=0 GT:PS ./.:. -1 260820 . AGCAAGC . . . NS=1;AN=0 GT:PS ./.:. -1 260850 . CATGAGCTTCTAACACACACACAAAAATCACACACACAAAATGGGGGTAGC . . . NS=1;AN=0 GT:PS ./.:. -1 260969 . AAACACATATTTTAATGTGGTTAATTTTT . . . 
NS=1;AN=0 GT:PS ./.:. -1 261026 . CAAGAAAATTGTGCTGGATGGCCACTTCCACCATGGCTCCCCTCCT . . . NS=1;AN=0 GT:PS ./.:. -1 261077 . AGTCTGGGTACTGTGTCACCCGAAGTCTTCAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 261122 . GGTCTGGGTTTGCCTATGAAAGAAACTCA . . . NS=1;AN=0 GT:PS ./.:. -1 261160 . GAAATGAGGAGTGAAGAGGAGGTCTTCAAATA . . . NS=1;AN=0 GT:PS ./.:. -1 261231 . GTGATGGCTTGCAGAATCCTT . . . NS=1;AN=0 GT:PS ./.:. -1 261293 . TTCAGAAATTCCTATAAGCTTGGGTTCTGTGCCCACACTCTAGACTGTCA . . . NS=1;AN=0 GT:PS ./.:. -1 261357 . ATATAAAACAGACCTCTTCTGATTTTGTCTAGCTGCTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 261449 . A . . . NS=1;AN=0 GT:PS ./.:. -1 261455 . C . . . NS=1;AN=0 GT:PS ./.:. -1 261463 . A . . . NS=1;AN=0 GT:PS ./.:. -1 261515 . TAAAATATAAAGAATTGTCCAGAAATATATAAAAAAAGAATCAC . . . NS=1;AN=0 GT:PS ./.:. -1 261583 . TATAACAATTGTATGGACTAGG . . . NS=1;AN=0 GT:PS ./.:. -1 261635 . TTTGAAGAAAAAAGCAATAAGAAGCCTCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 261746 . TTGAGATGTTTATTATAATGAATTATCTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 261786 . ACAATTTCCTAACAATTTTGGGGTTTATATTTTTGAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 261849 . TTGTACTATTGTTAGGTAACTTTGATGTGCA . . . NS=1;AN=0 GT:PS ./.:. -1 262001 . A . . NS=1;CGA_WINEND=264000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.07:1.55:.:0:.:0:0.999:167 -1 262069 . CTGTAGG . . . NS=1;AN=0 GT:PS ./.:. -1 262089 . GGACCTATCATAAAAAATTCCTCAAGACTGCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 262136 . CACCCTCACAAGAACACTTGCCTAGCAATGGCTGTTTCTGCCAGTAAGTTAACACCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 262204 . AGACCCTGTGACCAATGATGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 262329 . ATTATAATCCTTGTTTATTTCCAAATAAATTTAT . . . NS=1;AN=0 GT:PS ./.:. -1 262400 . ACATTGTTATTATGAAATTGGTTGGGTGATGTGTCTTATTTTCTTG . . . NS=1;AN=0 GT:PS ./.:. -1 262489 . CATGTTTGCTAGAACTCACCTGTAA . . . NS=1;AN=0 GT:PS ./.:. -1 262539 . TTTTTGTGTTTAGTTTTTCTTTTGTGATTGGGGAGGGGGGTTTATCGTAC . . . NS=1;AN=0 GT:PS ./.:. -1 262608 . A . . . NS=1;AN=0 GT:PS ./.:. -1 262715 . CTGCTTCTGCC . . . NS=1;AN=0 GT:PS ./.:. -1 262826 . CTCGTAGTATTTATAGTAAAAGTGAAATT . . . NS=1;AN=0 GT:PS ./.:. -1 262888 . TGCCCAAGTGCTGGTCTGGTCTGATCTTCTCATCTTCCCTTGG . . . NS=1;AN=0 GT:PS ./.:. -1 262943 . CAGTCACACTAGCCTCCTTGCTGCTCCACAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 263033 . CTTTTCTCCCATAT . . . NS=1;AN=0 GT:PS ./.:. -1 263088 . TAAATGTCCCATTCTCTGTGAAGCTTTCCTGCCCACCCTATTTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 263195 . AAACTGTAAATATACATGTTCACTTTTTTA . . . NS=1;AN=0 GT:PS ./.:. -1 263243 . TGGAATG . . . NS=1;AN=0 GT:PS ./.:. -1 263302 . A . . . NS=1;AN=0 GT:PS ./.:. -1 263313 . TACTAGG . . . NS=1;AN=0 GT:PS ./.:. -1 263351 . CACATTGGGC . . . NS=1;AN=0 GT:PS ./.:. -1 263417 . C . . . NS=1;AN=0 GT:PS ./.:. -1 263547 . ATATTATTTTCATGTATAAAGCAC . . . NS=1;AN=0 GT:PS ./.:. -1 263625 . C . . . NS=1;AN=0 GT:PS ./.:. -1 263642 . T . . . NS=1;AN=0 GT:PS ./.:. -1 263661 . A . . . NS=1;AN=0 GT:PS ./.:. -1 263682 . AGGGCCAAAAGAGTCAACTTCTGAAGAAGCGCAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 263760 . TTGCAAAAATAAA . . . NS=1;AN=0 GT:PS ./.:. -1 263786 . ATTCATGAGTAGAAAAATAGACTAGTGGAATAACATAAAAATAAA . . . NS=1;AN=0 GT:PS ./.:. -1 263936 . GGAGAATAATGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 264001 . T . . NS=1;CGA_WINEND=266000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.87:1.43:.:0:.:0:0.999:167 -1 264063 . TTACCCAGATGGGCCCAGTCTAATCACATGAGTTCTTAAAAATGGA . . . NS=1;AN=0 GT:PS ./.:. -1 264188 . A . . . NS=1;AN=0 GT:PS ./.:. -1 264230 . CTAGAAGATAGAAAAGGCCAGGATATGGATTCTACCCTAGCCA . . . NS=1;AN=0 GT:PS ./.:. -1 264300 . TTGATTTTAGTTCACTAAAATTCA . . . NS=1;AN=0 GT:PS ./.:. -1 264373 . 
TTTAGGTCACTTAGTTTGTAGAAATTTGTTACAGCAGT . . . NS=1;AN=0 GT:PS ./.:. -1 264415 . G . . . NS=1;AN=0 GT:PS ./.:. -1 264474 . AATTCAAGGTGAATTC . . . NS=1;AN=0 GT:PS ./.:. -1 264500 . CTTAAAACATTTAGATTAAAAATAAATGAGAATTTTTGTTACTTTTGGTA . . . NS=1;AN=0 GT:PS ./.:. -1 264563 . AGAAAAACAAACATTAAGGAGGAAAAATGAACATATG . . . NS=1;AN=0 GT:PS ./.:. -1 264630 . GGAAGATATCATAAGGTGACAAATCATAAACTGTAATATTTACAA . . . NS=1;AN=0 GT:PS ./.:. -1 264680 . ATATAAGTGAATAAATATACATTTAGAATATATATGAACTCCCAAAAATCAACAGGAAAAATAAGACATAGA . . . NS=1;AN=0 GT:PS ./.:. -1 264759 . AAATGCATAAACAAAAGAAGGCAAAACAAAAATAATGACTCATAATTAT . . . NS=1;AN=0 GT:PS ./.:. -1 264831 . GATGAGC . . . NS=1;AN=0 GT:PS ./.:. -1 264843 . AATGCAAATTAAAACCACCCTGAGATGCTTTTTACATCCATGAGCCTGATAAAAGTTAGAGTCTAAAAGTAATAATTAACAAAGATGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 265042 . G . . . NS=1;AN=0 GT:PS ./.:. -1 265046 . A . . . NS=1;AN=0 GT:PS ./.:. -1 265115 . AACATTGTTTGTTATATCAAAAAATAAAAAAGACA . . . NS=1;AN=0 GT:PS ./.:. -1 265160 . CAGCAAAAAAAATAAGTAAAAATAAATCCTGTTGTATTCTAACAATGGAATAATATATAGCCATTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 265258 . TAAGTATCAGCAAAACATATTGTTTAGTGAAAAACTAA . . . NS=1;AN=0 GT:PS ./.:. -1 265330 . T . . . NS=1;AN=0 GT:PS ./.:. -1 265354 . A . . . NS=1;AN=0 GT:PS ./.:. -1 265421 . TAAGAGGGATAGCAGTA . . . NS=1;AN=0 GT:PS ./.:. -1 265447 . AGTTGAGGGAATTTCAATTGGAAAAAAATAATATC . . . NS=1;AN=0 GT:PS ./.:. -1 265490 . TAAGTCAGGTAGTGGGTATTAGCATTTGTT . . . NS=1;AN=0 GT:PS ./.:. -1 265538 . CTTATAGC . . . NS=1;AN=0 GT:PS ./.:. -1 265554 . ATATTTTCAATGTATTTAATGTATTTTTTGCATAATTAAATATTATGCAATAAAAATGAGAAAACAAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 265634 . GATAAATTACAATAAAGAAATGGAGAAAAAATTATAATCTAGTTGAGTAATGGTATATTACATAGCTATTTTCTTAAGTAGATGT . . . NS=1;AN=0 GT:PS ./.:. -1 265755 . CTTAATTATATATAAATATATATGTACATATTTTTAATATAAAATACTAAACAAAGTACACCAAAATATTAGCTCCTATGTTAGTGAGATAATG . . . NS=1;AN=0 GT:PS ./.:. -1 265857 . TTTTGTATTTTAAGT . . . NS=1;AN=0 GT:PS ./.:. -1 265885 . TGTATTTTTCTGTTTTCATACTGCTATAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 265989 . CTCAGGAAATCTACAATCATGGCGGAAGACAAAGAGGA . . . NS=1;AN=0 GT:PS ./.:. -1 266001 . A . . NS=1;CGA_WINEND=267719 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.64:1.33:.:0:.:0:0.999:167 -1 266038 . TTCTTCGCAAGGCAGCATGAAGAAGTGCCGAG . . . NS=1;AN=0 GT:PS ./.:. -1 266105 . CTCGTGAGAACTCACTATCACAAGAACA . . . NS=1;AN=0 GT:PS ./.:. -1 266148 . CCCCCATGATTCAATTA . . . NS=1;AN=0 GT:PS ./.:. -1 266229 . C . . . NS=1;AN=0 GT:PS ./.:. -1 266240 . T . . . NS=1;AN=0 GT:PS ./.:. -1 266273 . TGTATTGAATTTTAAACTCAGAGAAAAATACT . . . NS=1;AN=0 GT:PS ./.:. -1 266397 . GAATGTGGCCTTGTAAGAAAGCAAATTAACTTCTAAC . . . NS=1;AN=0 GT:PS ./.:. -1 266494 . ACTTGAACTGCAGTAAAATATCCTCAGCAACATAGATGTGTGTGTTTCA . . . NS=1;AN=0 GT:PS ./.:. -1 266555 . ATACAAATTTAATGAAACTCCATTGGTGGTGTTTTTAATC . . . NS=1;AN=0 GT:PS ./.:. -1 266607 . AAGATGTCCTGGCTTATTCACAGATGCAAG . . . NS=1;AN=0 GT:PS ./.:. -1 266709 . GGAGTTTTTGTAAGGAATTAATTAATAAAAATGTTCTTGAAAGAGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 266822 . ATCGCTATTTTTTTTTTGACACACACTTTACA . . . NS=1;AN=0 GT:PS ./.:. -1 266924 . TGATGGTAAATCATTTTCTAC . . . NS=1;AN=0 GT:PS ./.:. -1 266954 . GAAATGTCTTGTCTATTCAGGTTCTGCTCTACTTAAAAGTTTTCCTT . . . NS=1;AN=0 GT:PS ./.:. -1 267096 . CTTCAAGTTTGCTCTTAGCAAGTAATTGTTTCAGTATCTATAT . . . NS=1;AN=0 GT:PS ./.:. -1 267198 . TTCATTGAATCCTGGATGCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 267229 . AATAAGAGGAATTCATATGGATCAGCTAGAAAAAAATTAA . . . NS=1;AN=0 GT:PS ./.:. -1 267286 . A . . END=267719;NS=1;AN=0 GT:PS ./.:. -1 267720 . N . . END=317719;NS=1;AN=0 GT:PS ./.:. -1 317720 . G . . 
NS=1;CGA_WINEND=320000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.92:1.27:.:0:.:0:0.999:211 -1 317720 . GATCTACCATGAAAGACTTGTGAATCCAGGA . . . NS=1;AN=0 GT:PS ./.:. -1 317771 . ATGTTATTCAGGTACAAAAAGA . . . NS=1;AN=0 GT:PS ./.:. -1 317885 . A . . . NS=1;AN=0 GT:PS ./.:. -1 317926 . TGTATGAGAGTGGGGAGGGAAGGGGGAGGTGGAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 317988 . CGCAAACTGCCCGGGAAGGGAAACCA . . . NS=1;AN=0 GT:PS ./.:. -1 318093 . AGACACAGTAATTTGC . . . NS=1;AN=0 GT:PS ./.:. -1 318135 . TCATTGCAGTTTCTGA . . . NS=1;AN=0 GT:PS ./.:. -1 318233 . TTTCGTTAATTCTCACAGAATCACATATAGG . . . NS=1;AN=0 GT:PS ./.:. -1 318366 . GGTCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 318470 . T . . . NS=1;AN=0 GT:PS ./.:. -1 318493 . T . . . NS=1;AN=0 GT:PS ./.:. -1 318583 . AGACCCCACAAGTTAGGGGCTCAGTCC . . . NS=1;AN=0 GT:PS ./.:. -1 318684 . AATCGGGGGTTCCCGTAA . . . NS=1;AN=0 GT:PS ./.:. -1 318764 . TTCTTTTTTTCTGAGAGAGAGGGTCTTATTTTGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 318932 . TTTATTTTTTGTATAGATGGGGTCTTGTTGTGTTGGCCAGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 318992 . CTCAAGT . . . NS=1;AN=0 GT:PS ./.:. -1 319092 . GGATACATCTCAGAAACAGTCAATGAAAGAGACGTG . . . NS=1;AN=0 GT:PS ./.:. -1 319193 . CTTAAACTCAGGAGTTTGAGACCAGCCTGGGCAAC . . . NS=1;AN=0 GT:PS ./.:. -1 319253 . AATTAAAAAATAAT . . . NS=1;AN=0 GT:PS ./.:. -1 319306 . CTACTAGGGAAGCTGAGATGAGAGGATACCTTGAGCTGGGGACTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 319422 . TCTCAAAAAAAAGAAAGATACCCAGGGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 319462 . GAGGGGCACAGAGCTCCCATGCCCTCTGTTGAACATGCGACCCTCCC . . . NS=1;AN=0 GT:PS ./.:. -1 319656 . TGCATGAGGCTGAAAGTTCCAAGCCTCTTACCATGTGGTTGCAT . . . NS=1;AN=0 GT:PS ./.:. -1 319755 . CAACAACATCCCCAAATGCAT . . . NS=1;AN=0 GT:PS ./.:. -1 319799 . AGTTCTTAGAGGCTCTTGTGTTAGAAACCTGGGACCAAGATCAAATATT . . . NS=1;AN=0 GT:PS ./.:. -1 319871 . ATCTATCACTGAGGTCTTTGTAAGAGCTTTAGAAGCTCTGTG . . . NS=1;AN=0 GT:PS ./.:. -1 319949 . TTTCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 320001 . G . . NS=1;CGA_WINEND=322000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.02:1.56:.:0:.:0:0.999:211 -1 320148 . TGGCCATGTTGGCCAGATGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 320196 . CAAATTCCTGGGCTCAAGTGATC . . . NS=1;AN=0 GT:PS ./.:. -1 320252 . TTACGTCGTCCAGGCTGATCTCAA . . . NS=1;AN=0 GT:PS ./.:. -1 320294 . TTGTCTCACCTCAGCCTCTCAAGTATGT . . . NS=1;AN=0 GT:PS ./.:. -1 320365 . CACAGATGGGATTTGGGCATAGGTTTGGTTTCCCAGGGGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 320581 . TCGTTTTGGATGCACTTGAGCAGGGGTCCCCAACCCCTGAG . . . NS=1;AN=0 GT:PS ./.:. -1 320631 . CGCAAGGAGCCACACAGCAGGAGGTGAGCGGTGTCGAGTGAGGGAGTGAGGGAAGCTTCGTCTGTATTTACAGC . . . NS=1;AN=0 GT:PS ./.:. -1 320743 . TCTCAGATGAGCAGCAGCGTTAGATTCTCATAGGAGAACGCAC . . . NS=1;AN=0 GT:PS ./.:. -1 320795 . AACCGTGCATGTGAGGGATCTAGG . . . NS=1;AN=0 GT:PS ./.:. -1 320896 . TGCGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 321061 . C . . . NS=1;AN=0 GT:PS ./.:. -1 321103 . TCACAAT . . . NS=1;AN=0 GT:PS ./.:. -1 321122 . CCCATACCCTACGTCACGGCAGCCTCCGCAGATGAGCCTA . . . NS=1;AN=0 GT:PS ./.:. -1 321212 . TTTAGACCCAGCTCCTGCCTCCCAGCCTTCTCTCCAGGCTCTGAACTTTC . . . NS=1;AN=0 GT:PS ./.:. -1 321288 . TAGGTAT . . . NS=1;AN=0 GT:PS ./.:. -1 321449 . TTTTCTAAAAAAAAAATGTCTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 321539 . CACACATATTCTTTTGACA . . . NS=1;AN=0 GT:PS ./.:. -1 321575 . TATGTACATTCCTTACAAACAAACAAAAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 321699 . TGTCGCTACCAGTATTTTGCTTC . . . NS=1;AN=0 GT:PS ./.:. -1 321849 . TCTCTTTTTACTCTCTGTCTTTGTGTTCTGCATTTTCCTTA . . . NS=1;AN=0 GT:PS ./.:. -1 321913 . AAGTCAATCGTACTAATTTATCACGATTTGCTTTATTAATTTATACT . . . NS=1;AN=0 GT:PS ./.:. -1 321980 . CCAGCAGACCTCATTACAAT . 
. . NS=1;AN=0 GT:PS ./.:. -1 322001 . T . . NS=1;CGA_WINEND=324000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.08:1.44:.:0:.:0:0.999:211 -1 322006 . CCTGTTTTATTTTGTTTTTTTTTCTGAGACAGGGTCTCCCTCTGTTGT . . . NS=1;AN=0 GT:PS ./.:. -1 322062 . GGAGTGTAGTAGTGCTATCGCAGCTGACTGCAGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 322113 . AAGCGATCCTCCCACCTCAACCTCCCACGTGGCTGAGACTACA . . . NS=1;AN=0 GT:PS ./.:. -1 322192 . TTCGTAT . . . NS=1;AN=0 GT:PS ./.:. -1 322206 . TTCCAGAGGGGTGACAGCGAAACGTGAGTAAGCATGGATTTTGGTAT . . . NS=1;AN=0 GT:PS ./.:. -1 322287 . ACTGAGGGACGACGACTATATGTTTTTACAATTATGCTGTGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 322338 . TGTTGCATAGCCTTGAAAATAATAACTTTTAATTGAGTGGAATAA . . . NS=1;AN=0 GT:PS ./.:. -1 322422 . GGCTCACACTGGTAATCGCAACACTTTGGGAGGCTGAGGCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 322471 . GCTTGAGGCCAA . . . NS=1;AN=0 GT:PS ./.:. -1 322528 . ACAGAAAAATACATGAATTA . . . NS=1;AN=0 GT:PS ./.:. -1 322569 . CTGTAGTCCCAGCTACTTGGGAGGCTGAGGTGGGAGGATC . . . NS=1;AN=0 GT:PS ./.:. -1 322682 . GAGACCCTGTCTCAAAACAACAAAAAAGTAGCAGCTAACATCAACTGACCTTTTACCAGGT . . . NS=1;AN=0 GT:PS ./.:. -1 322831 . GTGTGTACAGATGACCTTTTGTTTAGATTGAATTGTCTCCCCAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 322888 . GTGAGTCATGGTGAATGGACATTCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 323003 . GGCCAAGAGTTGGAGGAGGCAGTATGGCAGTATGGTGAGACCCTGTCTCCATTATTTTAAAAAATTGACAGGCTTTACCCGGGAAGGCTTATACACAATTTAAACACCCCTCATAGTATAAGAAGGTGCCCA . . . NS=1;AN=0 GT:PS ./.:. -1 323170 . TTTAGTA . . . NS=1;AN=0 GT:PS ./.:. -1 323199 . TTACATAGACAAAAGAACTCATATTACTTTACTTGTCACA . . . NS=1;AN=0 GT:PS ./.:. -1 323347 . GTAGTGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 323371 . GCAGCCTTGACCTCCCAGGCTCAAACTTCAGCATTCCGAGTAG . . . NS=1;AN=0 GT:PS ./.:. -1 323421 . TACAAGTGTGCACCACCACC . . . NS=1;AN=0 GT:PS ./.:. -1 323466 . GATAGAGACAGGGTCTCACTCTGTTGTCCAGACCGGTCTCTAGCTCCTGGCCTTAAGCAATCCTC . . . NS=1;AN=0 GT:PS ./.:. -1 323542 . TCTGAAATTACTG . . . NS=1;AN=0 GT:PS ./.:. -1 323574 . CCATGCCTGGCCTGGGCTAGTCCCATATTCTCTAGAGTTCTCTTTACTCTGTGCTAGCCAATCTCT . . . NS=1;AN=0 GT:PS ./.:. -1 323658 . TTATAAT . . . NS=1;AN=0 GT:PS ./.:. -1 323751 . GGGTCCCAGGAGGTAAACCCACACAGATGGGATTTGGGCATAGGT . . . NS=1;AN=0 GT:PS ./.:. -1 323805 . CAGGGGGCAGTGCTGAGCTCTTTGCCAGTGGGAAATGGGGTGCTGGTGATTTCCAGTAG . . . NS=1;AN=0 GT:PS ./.:. -1 324001 . A . . NS=1;CGA_WINEND=326000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.76:1.33:.:0:.:0:0.999:211 -1 324006 . AGCAGGGGTCCCCAACCCCTGAGCCATGGAGCCGCAAGGAGCCACACAGCAGGAGGTGAGCGGTGTCGAGTGAGGGAGTGAGGGAAGCTTCGTCTGTATTTAGAGCCACTCCCCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 324132 . CCCGCCTGAG . . . NS=1;AN=0 GT:PS ./.:. -1 324155 . GATGAGCAGCAGCATTAGATGCTCATAGGAGAACGCACCCTGTTGTGAACCGTGCATGTGAGGGATCGAGGTTGCGCT . . . NS=1;AN=0 GT:PS ./.:. -1 324245 . A . . . NS=1;AN=0 GT:PS ./.:. -1 324263 . TGTCACTTTCTCCCATCACGCTCAGGTGGGACCATCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 324310 . AAACAAGCTTAACACGCCCACTAATT . . . NS=1;AN=0 GT:PS ./.:. -1 324354 . TATAATTATTTTATTATATATTACAGTGTAATAATGGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 324509 . CTCACAATGGCCTATTTAGGCCCATACCCTACGTCACGGCAGCCTCCGCAGATGAGGCTACTGCCTCACA . . . NS=1;AN=0 GT:PS ./.:. -1 324600 . CCATCGTTACAATGGCCTCTTTAGACCCAGCTCCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 324650 . CTCCAGGCCCTGAACTTTCTCAAGTCGACCTCACCAGGCCCAGCTCA . . . NS=1;AN=0 GT:PS ./.:. -1 324769 . GTCGGCCTCTGCAGTCCCAACGTCTGCCTCACAGCAGATTCTTCACGCCCAGCATCT . . . NS=1;AN=0 GT:PS ./.:. -1 324899 . GTGGCCTCTTTAGGCCAAGCTCATGCTTCA . . . NS=1;AN=0 GT:PS ./.:. -1 324968 . CTTCCCTGGCCAGATTCCTGCCTGTCTCCCAGCAGCCTAGACAG . . . NS=1;AN=0 GT:PS ./.:. -1 325023 . GCCTCACACTGGCCTCTCTACATCCAGCTTATG . 
. . NS=1;AN=0 GT:PS ./.:. -1 325068 . CTCTCCAGGCCCAACTCCTGTCCCAGGACGTCA . . . NS=1;AN=0 GT:PS ./.:. -1 325117 . TTACTCAAGTTAGACTCTCTAGTCCCAACTGCTGCCTCCTGGTGGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 325199 . CAGGCCCAGCTCCTGCCTCCTGTCAGCGTCTACAGGCCCAACCTCTGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 325290 . TCTACAGGCACAACTGCTGCCTCACAACAGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 325345 . GCTCATGGCGGCCAATGTAGGCCCAAAACTTCCTCAAGTCA . . . NS=1;AN=0 GT:PS ./.:. -1 325475 . CCCAGGGGCTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 325525 . GCCAAATTTCTGCCTGCCTGCCAGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 325564 . CAGCTCCTCCCTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 325598 . ACTCATGACTGTGAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 325685 . CCCAACT . . . NS=1;AN=0 GT:PS ./.:. -1 325742 . TCCAGAC . . . NS=1;AN=0 GT:PS ./.:. -1 325776 . CAGGCGAAGCTCCTGCCTTTCGGCAGCCTCTCCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 325866 . CCCGGCGGCCTTCCCAAGCCCCGCTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 325975 . ACAGCGGA . . . NS=1;AN=0 GT:PS ./.:. -1 326001 . C . . NS=1;CGA_WINEND=328000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.88:1.01:.:0:.:0:0.999:211 -1 326140 . CCTTCCGGCGGCCTCTCCGGGCCCAGAACCTCCTCAA . . . NS=1;AN=0 GT:PS ./.:. -1 326211 . CGTTCTCTCCGGGCCCAGCTCTTCTTCCTGGTTGGGTCTCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 326320 . CGGCCCACAGCTTCCTCAAGCCAAGCTCCCCAGGCCCAGGTCAGGCCTCACGGTGGCCTCTCCAGGATGAGCTCCTGCCCTCCGATGGCATCT . . . NS=1;AN=0 GT:PS ./.:. -1 326434 . GTCGGTGGGCTCCTCCACGCCAAGGTTGGGCCTCCCGGCGACCGCCGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 326489 . AGTTGTCCTGAAGTCGGGCTCTCCCGGCCCTGCCTCCCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 326566 . CTCCCAACCGCCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 326676 . GCAGGCCCCTCCCTTGCCTCCCAGGGGCCTCTCCAGGCCCAGCTCTTGCC . . . NS=1;AN=0 GT:PS ./.:. -1 326737 . CTCCCGGGGCCAAGTCCCTGCCTGCCTCCCAGCAGCCCGCGTGCGGCCCAGCTCCTCCCTCACGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 326824 . TGCCTCTGGCACCCTGCCCAGAGGCGTGAGCCCC . . . NS=1;AN=0 GT:PS ./.:. -1 326863 . CACACTGGCTCCTCCCACGCTGAGAGAGGTCAGTGTGAGCCCTTGCCTCACACCGGCCCCTCCCACGCGGACAGAGGTCAGCGTGAGCCCCTTGCCTCACACCGGCCCCTCCCATGCTGAGAGAGGTCAGTGTGAGCCCTTGCCTCACCCCGG . . . NS=1;AN=0 GT:PS ./.:. -1 327022 . C . . END=327233;NS=1;AN=0 GT:PS ./.:. -1 327241 . GAGAGCTGGGCCTGGAGACTCCCCTGGGAGGCAACAGCGGGGTCTGCAGACGCCCTTCTCCAGCCGGAG . . . NS=1;AN=0 GT:PS ./.:. -1 327322 . AGTCACTGGGAGAAGGGAT . . . NS=1;AN=0 GT:PS ./.:. -1 327367 . AACTTTGGGGTCTACAAACGCAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 327453 . TGGGCAATGCGGGAGGCAGAGGCCAGGCCTCCTTAAGT . . . NS=1;AN=0 GT:PS ./.:. -1 327498 . TCAGACCCACTTGCAGCCTCCCGGCGCCCCCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 327547 . TCCCGGCTGCATCTCCAGGCCGGACTCTGGCCCGACT . . . NS=1;AN=0 GT:PS ./.:. -1 327688 . TCGCGGT . . . NS=1;AN=0 GT:PS ./.:. -1 327764 . GCTCCTCTAGGCCCAGCTTGGGCCTCCCGGCGGCCTCCGCA . . . NS=1;AN=0 GT:PS ./.:. -1 327812 . ATCGTCC . . . NS=1;AN=0 GT:PS ./.:. -1 327886 . GTCGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 328001 . G . . NS=1;CGA_WINEND=330000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.70:1.33:.:0:.:0:0.999:211 -1 328098 . AGGCGCAGAACTTGATCTCCAGTCGGCCTTTGCAGGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 328144 . TGCCTCTCGAAGGCCTGCACGGGCCCGGCCTCGGCCTCGGCCTCACAGCAGACTCTCCACGCCCAGCTAGCTCTCG . . . NS=1;AN=0 GT:PS ./.:. -1 328262 . CACTTCGGCAGGTCCAGCTCCTGCCTGCCAGTGGCCTCTTTAGGCCCAGCTCATTCCTCACGTCGGCCATTCCAGGCCCCGTT . . . NS=1;AN=0 GT:PS ./.:. -1 328439 . AATATTTTGATAG . . . NS=1;AN=0 GT:PS ./.:. -1 328475 . TGCGCCAAGCCCGAATTTTTTATTTTATTTTCCTTATTATTTGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 328568 . AATATCGCCCACGATCAACGTGTTCTGTTCTGGGGAAGGGGGCAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 328630 . 
TTCTTAAAAAGTATAGCTCAAGTTGGGAGTGCA . . . NS=1;AN=0 GT:PS ./.:. -1 328687 . C . . . NS=1;AN=0 GT:PS ./.:. -1 328702 . AGTGCAGGAGCCCCCACCCCCATACTCACCT . . . NS=1;AN=0 GT:PS ./.:. -1 328744 . CTCTGGGGAAAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 328773 . CCCCTAGTCCACAGGCGCCTCCCTGTGGCCCAAGGCCCTCTTCACACTCCATCTT . . . NS=1;AN=0 GT:PS ./.:. -1 328834 . CCAGCAGGAGCTATTTTCCGAAAAGTGAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 328894 . TACAGGGGCTCGGAGGAGGGAAACTGC . . . NS=1;AN=0 GT:PS ./.:. -1 328948 . GGTAGGGGGTATAGATAAGAGGAGCAGGCCTTGGCCAGGCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 329026 . G . . . NS=1;AN=0 GT:PS ./.:. -1 329036 . CA . . . NS=1;AN=0 GT:PS ./.:. -1 329089 . GTCTGTACTAAAAATACA . . . NS=1;AN=0 GT:PS ./.:. -1 329120 . CGTGGTAGCGTCCACCTGTA . . . NS=1;AN=0 GT:PS ./.:. -1 329206 . GCCAAGATCGCACCACTGCACTCCAGCCTGGGCGACAGAGCAAGACTCGGTCTCAAAAAAAAAAAAAAAAAAAAAAAAAGGAAGGCCTTACTCCGTCC . . . NS=1;AN=0 GT:PS ./.:. -1 329362 . GCTACCCCCACATACTGT . . . NS=1;AN=0 GT:PS ./.:. -1 329436 . GTGGTAGTTATGGAGACCCCCAGGTGTT . . . NS=1;AN=0 GT:PS ./.:. -1 329471 . GGCTGGGGTGTCCCCTT . . . NS=1;AN=0 GT:PS ./.:. -1 329501 . CAAGGCCCCAACTCTGGGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 329596 . CAGGGCAGGGGCAGCAGCAGTGGACCTGCTATGCACACATCTTC . . . NS=1;AN=0 GT:PS ./.:. -1 329682 . T . . . NS=1;AN=0 GT:PS ./.:. -1 329716 . CTCTGGGAACCATGCAGCAGCTCCCAGCGGCCCTGCACCCACCACCAGCAT . . . NS=1;AN=0 GT:PS ./.:. -1 329801 . CCAGAAGATCATGCAGTCATCAGTCCCA . . . NS=1;AN=0 GT:PS ./.:. -1 329836 . GCCTGCGAGGCTGAGGCTCCTCCCACTGGACCGCCCCCCAACTGGCACCACTGCTGCCCCTGCCCCTACTCTCAGCCTCACGTGACTCTCGGGCAGAAGCAGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 329971 . AGCCAGGTGAGGTGCGGTCAGGACCCCCACAGGGCTGGGAGTCAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 330001 . A . . NS=1;CGA_WINEND=332000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.97:1.25:.:0:.:0:0.999:211 -1 330059 . TGGGCCTGGAGGGCGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 330136 . ACTGGTCCAGGGTACGTGCAGTGAAGAGGACAGCGCCTTCTCGGT . . . NS=1;AN=0 GT:PS ./.:. -1 330201 . CCTCGGCTTCTCCACCTGTACAGGCAAAGGGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 330240 . CCCCATCACACATGGCACACTTGGGGGTGTTGGGCTTTGGACTGCAGCTGGAGCATCTTCTCGTCTTGCATTTGGGCGCGGTGGGGTCCTCCAGTGTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 330358 . C . . . NS=1;AN=0 GT:PS ./.:. -1 330368 . C . . . NS=1;AN=0 GT:PS ./.:. -1 330398 . AGGCTCTGCCCCCCGGGTGGCTCAGCCCAGCTCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 330451 . G . . . NS=1;AN=0 GT:PS ./.:. -1 330493 . GTCGCCCACACTTTCAGGTCTCTTGCACCAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 330542 . GGGAGGAAACAGGCCAC . . . NS=1;AN=0 GT:PS ./.:. -1 330565 . AGGCGTCCCTGGGCCCCCATCCCCAGGGGTTGGGGCCGTAGGGGGCCCGCTCTGCTGCGTTGACCAGACT . . . NS=1;AN=0 GT:PS ./.:. -1 330649 . GCTCCTGGGCCCAGTAAGAAGGAGGTGGGTGCCAAGGTTGAGGAGGAAGCATCCGAGTACGTGTAGGAGGAGGACAGG . . . NS=1;AN=0 GT:PS ./.:. -1 330748 . AGCTGCAGGTGGATCGGGGGACCCTGGGGGCTCAGGATCCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 330808 . AAGGAGGAAGGAATGACAGGTGCAAATAC . . . NS=1;AN=0 GT:PS ./.:. -1 330852 . CTTGTTGCCCTCTGGCTCCTCCCCAGAGT . . . NS=1;AN=0 GT:PS ./.:. -1 330887 . CACTCTCAGTCGGTCACCCACTC . . . NS=1;AN=0 GT:PS ./.:. -1 330927 . TGTCGGTGGTGCTAAAGCCATCA . . . NS=1;AN=0 GT:PS ./.:. -1 330956 . ATGACATCATCACCCCCTCCTCCTCATGG . . . NS=1;AN=0 GT:PS ./.:. -1 330996 . CTCTTCGTCACTCGCTATGACCTCGCTGGCCATGTGCTGGGAATG . . . NS=1;AN=0 GT:PS ./.:. -1 331047 . TCACGTGGGCGGCAGCAGGGCTGCCCACGGGTCACCTCCCTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 331180 . TCCCGTAGTGCCTGACGGTCTATTTCCCTGCCGTCC . . . NS=1;AN=0 GT:PS ./.:. -1 331227 . CCCGCTGTGGGAGAAGGCTTGGGCCAGGCTGAGCCAGGTTCCCT . . . NS=1;AN=0 GT:PS ./.:. -1 331278 . TGCAGCCGTTCTGCCCCACAGAAGCTGCTCCTTGGTATCCGAGCTC . . . 
NS=1;AN=0 GT:PS ./.:. -1 331354 . TCAGAGGACACCCCAGGGGCAGTGGCCGTGCCCGTCTCTGATATGCTCCGCTCCCACGAGCCCTTGTTA . . . NS=1;AN=0 GT:PS ./.:. -1 331431 . CTAGCCCCTGGCTTGTGGGCTTGGCCTCTGAGCTGGACTTCTTTCG . . . NS=1;AN=0 GT:PS ./.:. -1 331496 . CACCTTCACCTGGAAGGCCAGGTTGTATTTCT . . . NS=1;AN=0 GT:PS ./.:. -1 331558 . GCTCGCTCAGCATCTGGCTGACGGTCCGGTTATCCTGGTTGGGGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 331615 . CCCCGCCAGGGCCTGGTGCCGCCTGCTGAAGATCATGAGCGC . . . NS=1;AN=0 GT:PS ./.:. -1 331669 . ACCGGATGTGGTCCTTGTCTGATTTGTTGGGGCTGCGTCCATCCTTCT . . . NS=1;AN=0 GT:PS ./.:. -1 331725 . GAGTCCTGTTCCTTGCGCAGGGCACTGAG . . . NS=1;AN=0 GT:PS ./.:. -1 331773 . CTGAGTGGTAGAGGCAACTGGGTGTCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 331851 . TCCAGCAGGGACTCCCCTGAGGGGCCCAGGGCTCCTCCTCCATGG . . . NS=1;AN=0 GT:PS ./.:. -1 331913 . CCAGGTTCCACCACCCCCAAAGTGTGTGGGGTTGCGGGCCCTGGGCTTTCAGGGCAGGTGGCTCCAGGGGGCCGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 332001 . T . . NS=1;CGA_WINEND=334000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.84:1.20:.:0:.:0:0.999:211 -1 332001 . TCCCTGTCCCACCTGGTGGACGCTCATGAGCAACGGCTGCCAACT . . . NS=1;AN=0 GT:PS ./.:. -1 332104 . G . . . NS=1;AN=0 GT:PS ./.:. -1 332115 . GGGAGGGGTCAGGAAGGGGATGGAGTACCAGGAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 332188 . AACATGCTGACGCCACGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 332222 . CAGGCCTGGGGGCCGAGCACTTGGTCCGGGCAGGGGGTTCC . . . NS=1;AN=0 GT:PS ./.:. -1 332275 . ACACCTCCTCGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 332354 . TCAAGTTTTGTTTTTTTTGTTTGTTTTGTTTTTTGTTTTTGAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 332451 . CCTTCATACCTGGCTAATTTTTTGTATTTTTACTGGAGGTGGGGTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 332550 . CCTCAGCCTCCCAAAATGGGATTACAGGCATGAGCTACCGCTC . . . NS=1;AN=0 GT:PS ./.:. -1 332610 . TACTTGAAAAACTCCGTTAAGCATTTTTTTAAGGTAGACCTAGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 332668 . TCAGCTTTGTTTGTCGAGGAAACACGTTATTTCTTTTTCCTT . . . NS=1;AN=0 GT:PS ./.:. -1 332756 . TTTTCTTTCAGCACTTTGAATG . . . NS=1;AN=0 GT:PS ./.:. -1 332886 . AAGCTTCATGGATTTAGATGTCTAAATCTTTCCCATGATTTAGGCAGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 333033 . TTTCTTTTCTCTGTCTCTTTTTTTTTTCTTTTTGAGACAG . . . NS=1;AN=0 GT:PS ./.:. -1 333131 . CCAACCTCCGCCTCCTGGGTTCAAGCGA . . . NS=1;AN=0 GT:PS ./.:. -1 333164 . CTGCCTCAGCCTCCTAAGTAGCTGGGACTACAGGTGTGTGCCACCACACCCGGCTAATTTTTGTATTTT . . . NS=1;AN=0 GT:PS ./.:. -1 333284 . CCTCTTAATCTGCCTGCCTCGGCCTCCCA . . . NS=1;AN=0 GT:PS ./.:. -1 333526 . T . . . NS=1;AN=0 GT:PS ./.:. -1 333576 . GGGAAGCTGAGGTGGGAGGATCACTT . . . NS=1;AN=0 GT:PS ./.:. -1 333630 . CAACGTACTGAGAACTTGTCTCTATATTAAAAAAAAAAAAAAAAAGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 333721 . GAGACCAGCCTGGCCATCATGGCAAAACCC . . . NS=1;AN=0 GT:PS ./.:. -1 333763 . AAATACAAAAATTAGCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 333799 . TGTAGTGGTGGTGCATGCCTATAG . . . NS=1;AN=0 GT:PS ./.:. -1 333850 . G . . . NS=1;AN=0 GT:PS ./.:. -1 333868 . GAGAGGGAGGTTGCAGTGAGCTGAGATCGCACCAGTGCA . . . NS=1;AN=0 GT:PS ./.:. -1 333912 . GCCTGGGCAACAGAGTGAGACTCCATCTTATAAAAGGAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 333988 . ACTCCTGAGTTTTTGAGAT . . . NS=1;AN=0 GT:PS ./.:. -1 334001 . T . . NS=1;CGA_WINEND=336000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.68:1.14:.:0:.:0:0.999:211 -1 334016 . GATCGTGCTCTACTGTGATGATTTG . . . NS=1;AN=0 GT:PS ./.:. -1 334055 . TCAGAAAAAAAGCGTATTCTTTTAGGT . . . NS=1;AN=0 GT:PS ./.:. -1 334121 . T . . . NS=1;AN=0 GT:PS ./.:. -1 334157 . TGATTCAACAGGAGGAGATAAGGAAGCTCGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 334220 . TAGATTTTTATAAAATGAAAGCTGCCTCTGAAGCACTGCAGACTCAGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 334315 . CTCTTACTATTCTGAGAGCCTTATCATTCT . . . NS=1;AN=0 GT:PS ./.:. -1 334365 . 
T . . . NS=1;AN=0 GT:PS ./.:. -1 334406 . TAAAAAC . . . NS=1;AN=0 GT:PS ./.:. -1 334462 . GGGCGTGGTGGCTCATGCCTGTAATCTCAGCACTGTGGGAGGCCGAG . . . NS=1;AN=0 GT:PS ./.:. -1 334586 . A . . . NS=1;AN=0 GT:PS ./.:. -1 334606 . GGCTGGGACCTGTAATCCCAGCTACTTGGGAGGCTGAGGCAGGAGAAT . . . NS=1;AN=0 GT:PS ./.:. -1 334659 . GAACCCTGGAGGTGGAGGTTGCAGTGAGCAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 334698 . CCATTACACTCCA . . . NS=1;AN=0 GT:PS ./.:. -1 334736 . ATCTCAAAAAAAAAAAAAAAAAGGGGGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 334841 . GACAATCAATATTTGCCAAAATGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 334922 . A . . . NS=1;AN=0 GT:PS ./.:. -1 334945 . ATCCAGTAGTGACTTTTACAGTTTGTATCTAAATAGAAGCTGGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 335010 . GCATAAAATATTATTTATTATAGAGTTAAATGCTACA . . . NS=1;AN=0 GT:PS ./.:. -1 335054 . ATCTAATTAATAGGCCTATTTTCCTTTTTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 335256 . ACAGATTTGGGGTATATGCTAAAGTTACCAA . . . NS=1;AN=0 GT:PS ./.:. -1 335345 . AGGCCGAGGTGGGCGGATCATTTGAGGTCAGGAGTTCGAGACCAGCCTGGCCAA . . . NS=1;AN=0 GT:PS ./.:. -1 335405 . GAAACTCCGTCTGTACTAATAGTACAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 335441 . AGGCGTGATGGTGTGCACCTG . . . NS=1;AN=0 GT:PS ./.:. -1 335476 . CAGAAAGCTGAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 335521 . TTGCAGTGAGCAGAGATTGTGCCACTGCACTCCAGCCTGGGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 335626 . ACACATGTTGTTAAATCATCTTACAGATTTTATAA . . . NS=1;AN=0 GT:PS ./.:. -1 335668 . GAAGAAAAGTTTTACTAAATGGTCTTTTAATGGAAACTC . . . NS=1;AN=0 GT:PS ./.:. -1 335743 . ATCCATTCCTAGGCCTAGAAAAATGTCTG . . . NS=1;AN=0 GT:PS ./.:. -1 335808 . CAGCGATAGTACATTAGCTATGCTATATGCATACATTAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 335859 . TCGACTTTCAAAAGATA . . . NS=1;AN=0 GT:PS ./.:. -1 335889 . CTTACTGCTTCTGAACATGTTTGTGAGTTATATTGCTGAGGGACCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 335958 . AACCCAGTGTTATAAAATTGAAATC . . . NS=1;AN=0 GT:PS ./.:. -1 336001 . C . . NS=1;CGA_WINEND=338000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.94:1.30:.:0:.:0:0.999:211 -1 336003 . AAAATTAATATCTACCTTGTAAAAAATATC . . . NS=1;AN=0 GT:PS ./.:. -1 336039 . CTGCATTTGAGAATAGACTTTTTAGGTAATAATGATGCAATCCATAGGGTTTTTGGGGGCACAGAGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 336179 . CATATTTTTACTCTTTTTATAATTTTTTCTAAAAAAAATTAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 336232 . TATATAACTTTTAACTTTATAGGTAAATATTTGTTTCTTTCAGCTCCAGTTTTATGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 336304 . G . . . NS=1;AN=0 GT:PS ./.:. -1 336365 . T . . . NS=1;AN=0 GT:PS ./.:. -1 336369 . T . . . NS=1;AN=0 GT:PS ./.:. -1 336408 . CACATTTTGTTTC . . . NS=1;AN=0 GT:PS ./.:. -1 336435 . TTTGTTTATATTCTATATATATTTCATTTTTGGTTACTATGAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 336536 . TTCAATCACATACCAAAATTCTACTGCTATATATATAGCTCTACTCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 336600 . G . . . NS=1;AN=0 GT:PS ./.:. -1 336610 . A . . . NS=1;AN=0 GT:PS ./.:. -1 336644 . CAATTACATTTTATGCATTTGCCTTTTAA . . . NS=1;AN=0 GT:PS ./.:. -1 336682 . AAAATAAAAAGCAGAGTTACAAACCAAAATTACAATAGGACTGTTTTTATGTTTATGTATTTACCTTTACCAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 336793 . C . . . NS=1;AN=0 GT:PS ./.:. -1 336803 . G . . . NS=1;AN=0 GT:PS ./.:. -1 336831 . G . . . NS=1;AN=0 GT:PS ./.:. -1 336846 . T . . . NS=1;AN=0 GT:PS ./.:. -1 336880 . CAGTTTTAAAAAAAAATCCGGAAATGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 336915 . CTCCTTCATTTTTGAAGGATAAGTTTTCCAGCTATATATTT . . . NS=1;AN=0 GT:PS ./.:. -1 336976 . ATTATTTTAAATATATAATCCACTGCCTACTGGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 337022 . TGCCGAGAAATCAGCTGCTAATGTTA . . . NS=1;AN=0 GT:PS ./.:. -1 337083 . TGAGTTTTCAACATTCTCCCATTATCTTTTTTTTGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 337134 . ATAATTGTACATATTCATGGGATACAGAGTGATATTTTGATA . . . NS=1;AN=0 GT:PS ./.:. 
-1 337213 . AGCATATCCATCACCTCAAATATTTGTCATTTATTTG . . . NS=1;AN=0 GT:PS ./.:. -1 337268 . TTCTTTCTTCTAGTTTTTTAAATTTATAAACATTTAAATTTTATTACAGAAATTTAAATTTTTTGATTCTGAAAAAGTCATATATGTATGCAACATTTTTTATCATTTATTTATATATTT . . . NS=1;AN=0 GT:PS ./.:. -1 337421 . TCTATTTTATCATTATTTCAAA . . . NS=1;AN=0 GT:PS ./.:. -1 337479 . CCCTTGTTTTTTCCTAGTATATTAATTTATTTACTTATCTTCTAAAAATCCTCCATATAATCTGTTTATTTT . . . NS=1;AN=0 GT:PS ./.:. -1 337573 . AATAATTAGTTCTGTTCTATTTTCCATTAAAATATTTAAATCTTGT . . . NS=1;AN=0 GT:PS ./.:. -1 337655 . TTAATTTCTCTATACTCTAGCTTTTGACTTTTTTTTTCTGA . . . NS=1;AN=0 GT:PS ./.:. -1 337731 . AT . . . NS=1;AN=0 GT:PS ./.:. -1 337772 . AAATAAA . . . NS=1;AN=0 GT:PS ./.:. -1 337796 . C . . . NS=1;AN=0 GT:PS ./.:. -1 337811 . A . . . NS=1;AN=0 GT:PS ./.:. -1 337815 . T . . . NS=1;AN=0 GT:PS ./.:. -1 337833 . C . . . NS=1;AN=0 GT:PS ./.:. -1 337844 . TGTATAAGCATGCA . . . NS=1;AN=0 GT:PS ./.:. -1 337933 . CGTTGGGGAAAAAATAAAATA . . . NS=1;AN=0 GT:PS ./.:. -1 338001 . C . . NS=1;CGA_WINEND=340000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.70:1.19:.:0:.:0:0.999:211 -1 338130 . TGCTGCTACTCAGATACATTTTATTTCAAAAACATA . . . NS=1;AN=0 GT:PS ./.:. -1 338190 . TTCCAAAAACATATTCACACTGAACTTTCAATCAC . . . NS=1;AN=0 GT:PS ./.:. -1 338253 . G . . . NS=1;AN=0 GT:PS ./.:. -1 338305 . AGTATTTG . . . NS=1;AN=0 GT:PS ./.:. -1 338336 . GTAATGTAATTTTCTAATGCTAAATCA . . . NS=1;AN=0 GT:PS ./.:. -1 338421 . CCCCTAATTTTTCTTACCTTTGACATGATCCTT . . . NS=1;AN=0 GT:PS ./.:. -1 338474 . GTGATTTTTCTTTTTTTTTTTTTTTTTTGAGACAGGGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 338533 . GAGTGCAGTGGTGCAATCACAGCTCATTGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 338597 . CCTCAGCCTCCC . . . NS=1;AN=0 GT:PS ./.:. -1 338633 . GCCACCAAACTTGGCTAATTTTTGTATTTTTTGTAGAGACAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 338685 . AATTCTCAGGCTGGTCTGGAATTTCTTGGCTCAAGTAATCCTGCCTTGGCCTCCCAACATGCTGATATTA . . . NS=1;AN=0 GT:PS ./.:. -1 338761 . TAAGCCACAGTACCTGGCCAGTTTTCTTTTTAAAAAAATCTA . . . NS=1;AN=0 GT:PS ./.:. -1 338833 . TAGCTGTGCTCCTTAATTGGGAGC . . . NS=1;AN=0 GT:PS ./.:. -1 338874 . AACTTAGCCAATTTTCTATAT . . . NS=1;AN=0 GT:PS ./.:. -1 338969 . C . . . NS=1;AN=0 GT:PS ./.:. -1 338983 . TAAATGGTGCACTGGATGTTGAAAAAGTTGATGATGA . . . NS=1;AN=0 GT:PS ./.:. -1 339036 . TG . . . NS=1;AN=0 GT:PS ./.:. -1 339038 . C . . . NS=1;AN=0 GT:PS ./.:. -1 339066 . GAAATATAATTTGTTAAAATATAAAAGTGC . . . NS=1;AN=0 GT:PS ./.:. -1 339148 . ATGGGCTGGAGGCCAGAGGATTGCCAAAGAGAATGGGCCTCCTGCTGAGATGAAAAGTTGAGCAGGGATTAGTTGGCAAAAGTGGAGGGACGAT . . . NS=1;AN=0 GT:PS ./.:. -1 339289 . TGCGACAAAGTCTATATAAAAAACTGA . . . NS=1;AN=0 GT:PS ./.:. -1 339326 . AATGTGGCTTAAATACAGAAGCTAGTAGGAGAGGAGTCGAA . . . NS=1;AN=0 GT:PS ./.:. -1 339416 . GTATAAAAAGAATTTCTCTTTATTCTAAGT . . . NS=1;AN=0 GT:PS ./.:. -1 339468 . CTTTAAACAGGTGATGTGATTTGATTGAATTT . . . NS=1;AN=0 GT:PS ./.:. -1 339522 . CATTACATGCCCACTGTTTGTCAGATATTGCTGTAG . . . NS=1;AN=0 GT:PS ./.:. -1 339582 . AAACAGGCAAAAATCCCTGTCCTCTTGCAGCTTATAATGGACTGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 339635 . ATATGTCAGAGGAGGTCCACGGAGGAGT . . . NS=1;AN=0 GT:PS ./.:. -1 339681 . TGAAAAAAATGAGGATA . . . NS=1;AN=0 GT:PS ./.:. -1 339812 . CTTCTCTCCCTC . . . NS=1;AN=0 GT:PS ./.:. -1 339842 . TGTCTATATATATAGAATATG . . . NS=1;AN=0 GT:PS ./.:. -1 339880 . T . . END=340145;NS=1;AN=0 GT:PS ./.:. -1 340001 . A . . NS=1;CGA_WINEND=342000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.60:1.27:.:0:.:0:0.999:211 -1 340154 . TTTCCACTAATTTAAAATGCCACCTTTATGTTATTGTAATTTATATATATACTATATATACACACACACACATATATATATACATG . . . NS=1;AN=0 GT:PS ./.:. -1 340246 . 
TACAGTGTGTGTGTGCACATGTACACACATGCATATGTGTATAGAATGCCCAGTATAAGCAATGTGCACAAATAAAATTAG . . . NS=1;AN=0 GT:PS ./.:. -1 340342 . TAGAGTGAGAGGAG . . . NS=1;AN=0 GT:PS ./.:. -1 340389 . TATAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 340409 . TGAGGTGGTTTCTAAGATGGAGAATAAGACGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 340449 . AGTACGTTGTTTGACTGAATTCAAGAAAGAAGGGTAAAAGAGAAGAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 340506 . TTATCATTAAATGCCACAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 340548 . ATTGTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 340568 . TTAAGGGTGACCAAATTCCGTTTTGGAGGAGGAACAGATTCCAT . . . NS=1;AN=0 GT:PS ./.:. -1 340654 . GGTAGTTTTTCAAAAGTTTTCAAAAATATGAAAAGAAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 340723 . AATGGGAGAGACTATGGTGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 340772 . ATTGAGATAGAGATATTGACTATATAAACAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 340854 . TGAATTTTTGTGAAAGTACAACTAAAAGGCAATGTCACT . . . NS=1;AN=0 GT:PS ./.:. -1 341091 . AACAATGCATGACAATTTACAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 341123 . TTTTGGAGCTAACTTTAAGTACCTG . . . NS=1;AN=0 GT:PS ./.:. -1 341191 . TTATTTATCAAGTTTATGTCAAGGGACAAGGA . . . NS=1;AN=0 GT:PS ./.:. -1 341267 . ACACTTATCCAGGGGGGTTTTTAACCTTTCCCCCACT . . . NS=1;AN=0 GT:PS ./.:. -1 341330 . AAT . . . NS=1;AN=0 GT:PS ./.:. -1 341337 . A . . . NS=1;AN=0 GT:PS ./.:. -1 341347 . GCG . . . NS=1;AN=0 GT:PS ./.:. -1 341369 . TAACATTTCTCACAAGTCAATTAGCTTTGTACTGGGAGGAGGGCGTGAAGG . . . NS=1;AN=0 GT:PS ./.:. -1 341425 . TTGCGGTAGTTGTGTAGCAGCAGCACAATGGCCGCAGACAAGGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 341495 . ATAATTTTATATTTTTGACAAGATTAAT . . . NS=1;AN=0 GT:PS ./.:. -1 341535 . CTTCCTCTCCATTTCTTTTTTTGGA . . . NS=1;AN=0 GT:PS ./.:. -1 341584 . ATTTTATTAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 341605 . CTTATCTCTTATTATATTTTATTAAAGAAAATTATTATATTATTCCTTTATATTTTTATTAAAGGATTTTATTATTA . . . NS=1;AN=0 GT:PS ./.:. -1 341687 . GGAAATTAGCCTTATCTCTTATTATATTTTTTATGACCTTCAAA . . . NS=1;AN=0 GT:PS ./.:. -1 341739 . TCTGCTTAAAAGTGTACCCTGGCCGGGCGTGGTGGCTCACACCT . . . NS=1;AN=0 GT:PS ./.:. -1 341793 . C . . END=342057;NS=1;AN=0 GT:PS ./.:. -1 342001 . C . . NS=1;CGA_WINEND=344000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.62:1.17:.:0:.:0:0.999:211 -1 342069 . A . . . NS=1;AN=0 GT:PS ./.:. -1 342105 . A . . END=342360;NS=1;AN=0 GT:PS ./.:. -1 342377 . TTAATTTTTTTCTAGCTGATCCATATGAATTCCTCTTATT . . . NS=1;AN=0 GT:PS ./.:. -1 342426 . AAAGCATCCAGGATTCAATGAAGAACTGACTA . . . NS=1;AN=0 GT:PS ./.:. -1 342487 . GCAGGCTTAAGCCATTTTTGATATAGATACTGA . . . NS=1;AN=0 GT:PS ./.:. -1 342529 . TTGCTAAGAGCAAACTTGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 342573 . CTTCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 342625 . CTAACCACTTGCTCGCCAACAAGGAAAACTTTTAAGT . . . NS=1;AN=0 GT:PS ./.:. -1 342674 . GAATAGACAAGACATTTCTTTCTTTTGGTAGAAAATGAT . . . NS=1;AN=0 GT:PS ./.:. -1 342745 . GTAATTTTAACTTTGTGATTTATTGCCGGA . . . NS=1;AN=0 GT:PS ./.:. -1 342782 . TCTTCTGTACTGTAAAGTGTGTGTCAAAAAAAAAAAATAGCGATTTTGGAGGAT . . . NS=1;AN=0 GT:PS ./.:. -1 342889 . TTTCTCTCTTTCAAGAACATTTTTATTTATT . . . NS=1;AN=0 GT:PS ./.:. -1 342926 . TTACAAAAACTCCCTAAACTTTGGAACAGCTCTC . . . NS=1;AN=0 GT:PS ./.:. -1 343011 . CTTGCATCTGTGAATAAGCCAGGACATCTT . . . NS=1;AN=0 GT:PS ./.:. -1 343052 . TGATTAAAAACACCACCAATGGAGTTTCATTAAATTTGTATTGCTCTGACTAGTGAAACATACA . . . NS=1;AN=0 GT:PS ./.:. -1 343124 . TTGCTGAGGATATTTTACTGCAGTTCAAGT . . . NS=1;AN=0 GT:PS ./.:. -1 343222 . TTAATTTGCTTTCTTACAAGGCCACATTC . . . NS=1;AN=0 GT:PS ./.:. -1 343333 . TAAAAACACTAGTATTTTTCTCTGAGTTTAAAATTCAATACA . . . NS=1;AN=0 GT:PS ./.:. -1 343383 . GATATGGTTAGGCTTTGTATCCCCACCTGAATCTCGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 343483 . TAATTGAATCATGGGGG . . . 
NS=1;AN=0 GT:PS ./.:. -1 343531 . AGTTCTCACGAG . . . NS=1;AN=0 GT:PS ./.:. -1 343578 . CTCGGCACTTCTTCATGCTGCCTTGCGAAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 343629 . G . . . NS=1;AN=0 GT:PS ./.:. -1 343711 . A . . . NS=1;AN=0 GT:PS ./.:. -1 343714 . A . . . NS=1;AN=0 GT:PS ./.:. -1 343730 . TTCTTTATAGCAGTATGAAAATAGAAAAATACA . . . NS=1;AN=0 GT:PS ./.:. -1 343781 . AAATACAAAAAAACAAAACATTATCTCACTAACATAGGAGCTAATATTTT . . . NS=1;AN=0 GT:PS ./.:. -1 343845 . AGTATTTTATATTAAAAATATGTACATATATATTTATATATAATTAAGAACAT . . . NS=1;AN=0 GT:PS ./.:. -1 343929 . ACATCTACTTAAGAAAATAGCTATGTAATATACCATTACTCAACTAGATTATAATTTTTTCTCCATTTCTTTATTGTAATT . . . NS=1;AN=0 GT:PS ./.:. -1 344001 . A . . NS=1;CGA_WINEND=346000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.04:1.21:.:0:.:0:0.999:211 -1 344022 . CTTTTTTGTTTTCTCATTTTTATTGCATAATATTTAATTATGCAAAAAATACA . . . NS=1;AN=0 GT:PS ./.:. -1 344104 . TAT . . . NS=1;AN=0 GT:PS ./.:. -1 344128 . AACAAATGCTAATACCCACTACCTGACTTA . . . NS=1;AN=0 GT:PS ./.:. -1 344166 . GATATTATTTTTTTCCAATTGAAATTCCCTCAACT . . . NS=1;AN=0 GT:PS ./.:. -1 344222 . TC . . . NS=1;AN=0 GT:PS ./.:. -1 344293 . T . . . NS=1;AN=0 GT:PS ./.:. -1 344315 . AAATGTAACCATATTGTATATATTCTTCTTCAGCTTCTTAGTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 344375 . TTTTGCTGATACTTACATTCATATGTACAGT . . . NS=1;AN=0 GT:PS ./.:. -1 344423 . AATGGCTATATATTATTCCAT . . . NS=1;AN=0 GT:PS ./.:. -1 344473 . T . . . NS=1;AN=0 GT:PS ./.:. -1 344475 . A . . . NS=1;AN=0 GT:PS ./.:. -1 344498 . TGTCTTTTTTATTTTTTGATATAACAAACAATGTT . . . NS=1;AN=0 GT:PS ./.:. -1 344601 . T . . . NS=1;AN=0 GT:PS ./.:. -1 344603 . T . . . NS=1;AN=0 GT:PS ./.:. -1 344627 . T . . . NS=1;AN=0 GT:PS ./.:. -1 344691 . TAATGGACAAGA . . . NS=1;AN=0 GT:PS ./.:. -1 344711 . TACTTCCCATCTTTGTTAATTATTACTTTTAGACTCTAACTTTTATCA . . . NS=1;AN=0 GT:PS ./.:. -1 344765 . TGGATGTAAAAAGCATCTCAGGGTGGTTTTAATTTGCATT . . . NS=1;AN=0 GT:PS ./.:. -1 344810 . GCTCATCTATGAAGATGAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 344849 . A . . . NS=1;AN=0 GT:PS ./.:. -1 344877 . GTTTATGCATTTTGCTTGTTCTATGTCTTATTTTTCCTGTTGATTTTTGGGAGTT . . . NS=1;AN=0 GT:PS ./.:. -1 344939 . ATTCTAAATGTATATTTATTCACTTATATATATGTTGTAAATAT . . . NS=1;AN=0 GT:PS ./.:. -1 344995 . T . . . NS=1;AN=0 GT:PS ./.:. -1 345009 . ATATCTTCCAAATAGAGAAGCTTTATATTTTGATGTAGTCATATGTTCATTTTTCCTCCTTAATGTTTGTTTTTCT . . . NS=1;AN=0 GT:PS ./.:. -1 345098 . TACCAAAAGTAACAAAAATTCTCATTTATTTTTAATC . . . NS=1;AN=0 GT:PS ./.:. -1 345158 . GAATTCACCTTGAATT . . . NS=1;AN=0 GT:PS ./.:. -1 345218 . GATAACCACTTGTTCTATTACTGCTGTAACAAATTTCTACAAACTAAGTGACCTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 345313 . GAAATCAGGCATGAATTTTAGTGAACTAAAATCAA . . . NS=1;AN=0 GT:PS ./.:. -1 345375 . TGGCTAGGGTAGAATCCATATCCTGGCCTTTTCTATCTTCTAG . . . NS=1;AN=0 GT:PS ./.:. -1 345458 . T . . . NS=1;AN=0 GT:PS ./.:. -1 345539 . TCCATTTTTAAGAACTCATGTGATTAGACTGGGCCCATCTGGGTAA . . . NS=1;AN=0 GT:PS ./.:. -1 345811 . ATTGTTTTTATTTTTATGTTATTCCACTA . . . NS=1;AN=0 GT:PS ./.:. -1 345855 . CATGAATTATGGTACATGAGTTTATTTTTGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 345930 . TTTTGCGCTTCTTCAGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 345956 . TTTTGGCCCTTTGGTCTTCTATACACATTTTAGAAATGCTTTGTTGAGGACTA . . . NS=1;AN=0 GT:PS ./.:. -1 346001 . G . . NS=1;CGA_WINEND=348000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.01:1.17:.:0:.:0:0.999:211 -1 346077 . GTGCTTTATACATGAAAATAATAT . . . NS=1;AN=0 GT:PS ./.:. -1 346287 . GCCCAATGTGATATTTATTCAACAAATATTCATTGAGTATACCTAGTA . . . NS=1;AN=0 GT:PS ./.:. -1 346398 . CATTCCAGTATTTGGAGACAGATGATAAAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 346505 . 
AGTCTGTAATTTAAATAGGGTGGGCAGGAAAGCTTCACAGAGAATGG . . . NS=1;AN=0 GT:PS ./.:. -1 346601 . ATATGGGAGAAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 346686 . AAGGAGGCTAGTGTGACTGCCACAGAATCACCCAAGGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 346743 . AGACCAGCACTTGGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 346790 . AAAAATTTCACTTTTACTATAAATACTATGAG . . . NS=1;AN=0 GT:PS ./.:. -1 346838 . TACAGTA . . . NS=1;AN=0 GT:PS ./.:. -1 346867 . CATGTTTTAAACAAACTCTATAGCTTCTGA . . . NS=1;AN=0 GT:PS ./.:. -1 346922 . GGCAGAAGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 347059 . GTACGATAAACCCCCCTCCCCAATCACAAAAGAAAAACTAAACACAAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 347120 . TTGCTCAGACAATTTTACAGGTGAGTTCTAGCAAACATGCA . . . NS=1;AN=0 GT:PS ./.:. -1 347212 . A . . . NS=1;AN=0 GT:PS ./.:. -1 347297 . G . . . NS=1;AN=0 GT:PS ./.:. -1 347421 . AAACATCATTGGTCACAGGGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 347453 . GCTGGTGTTAACTTACTGGCAGAAACAGCCATTGCTAGGCAAGTGTTCTTGTGAGGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 347525 . GCTGCAGTCTTGAGGAATTTTTTATGATAGGTCCTATTATAAAACACCTACAGGATGAGCTGGA . . . NS=1;AN=0 GT:PS ./.:. -1 347623 . GTGAGTTTATAGAAAGTCCTTGTGATAG . . . NS=1;AN=0 GT:PS ./.:. -1 347666 . G . . . NS=1;AN=0 GT:PS ./.:. -1 347750 . TTCACTGTAGTATAATCTGCACATCAAA . . . NS=1;AN=0 GT:PS ./.:. -1 347791 . AGTACAAAGAAAGAAAATTAAAGGTATATCTCTTTCAAAAATATA . . . NS=1;AN=0 GT:PS ./.:. -1 347883 . TTATAATAAACATCTCAAGCTTCACAGAATTCT . . . NS=1;AN=0 GT:PS ./.:. -1 347927 . CACTCTCATCCACAATCTTTTCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 347978 . GTTGCTGAGGCTTCTTATTGCTTTTTTCTTCAAATAACAGTCAGA . . . NS=1;AN=0 GT:PS ./.:. -1 348001 . T . . NS=1;CGA_WINEND=350000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.92:1.19:.:0:.:0:0.999:211 -1 348042 . CCTAGTCCATACAATTGTTATATT . . . NS=1;AN=0 GT:PS ./.:. -1 348088 . GTGATTCTTTTTTTATATATTTCTGGACAATTCTTTATATTTTA . . . NS=1;AN=0 GT:PS ./.:. -1 348186 . T . . . NS=1;AN=0 GT:PS ./.:. -1 348197 . T . . . NS=1;AN=0 GT:PS ./.:. -1 348202 . A . . . NS=1;AN=0 GT:PS ./.:. -1 348250 . AAAAGCAGCTAGACAAAATCAGAAGAGGTCTGTTTTATAT . . . NS=1;AN=0 GT:PS ./.:. -1 348304 . TGACAGTCTAGAGTGTGGGCACAGAACCCAAGCTTATAGGAATTTCTGAA . . . NS=1;AN=0 GT:PS ./.:. -1 348387 . AGCTCATAAAGAATTCTGCGAGCCATCACATCTGTCAAACCTGCTATGT . . . NS=1;AN=0 GT:PS ./.:. -1 348455 . TATTTGAAGACCTCCTCTTCACTCCTCATTTC . . . NS=1;AN=0 GT:PS ./.:. -1 348496 . TGAGTTTCTTTCATAGGCAAACCCAGACC . . . NS=1;AN=0 GT:PS ./.:. -1 348537 . CCTGAAGACTTCGGGTGACACAGTACCCAGACTTAAATAGGAGGGGAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 348601 . CATCCAGCACAATTTT . . . NS=1;AN=0 GT:PS ./.:. -1 348631 . ATAGTTTTGATTCCTTAAAAAAATTAACCACATTAAAATATGTGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 348746 . GCTACCCCCATTTTGTGTGTGTGATTTTTGTGTGTGTGTTAGAAGCTC . . . NS=1;AN=0 GT:PS ./.:. -1 348817 . TGAGCTTGCTTTGATGATTTATTTGTCCAGAGAGGATTTTTTTTCCTA . . . NS=1;AN=0 GT:PS ./.:. -1 348915 . CAGCTTGCAAATTTGGAAGCCACACAGATTGTGTGAGTT . . . NS=1;AN=0 GT:PS ./.:. -1 349057 . CTCACTTTGAAGGTTTATTTTATTTATTTTTTAAATTTCTAGTACACGTG . . . NS=1;AN=0 GT:PS ./.:. -1 349119 . GTAATAC . . . NS=1;AN=0 GT:PS ./.:. -1 349208 . TCAGTATGTCTGCTCAATAT . . . NS=1;AN=0 GT:PS ./.:. -1 349244 . ACTGAGGGTGGATCTTCTTCCCAGCTCACTCACATGGTTCTTAGCTA . . . NS=1;AN=0 GT:PS ./.:. -1 349386 . CTGGGCCACTGACAGCATGCCAGTT . . . NS=1;AN=0 GT:PS ./.:. -1 349420 . CAAATGAGAGGGCAAGAGAAAGAGAGAGAGGGAGAGGGCACAAGAAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 349544 . G . . . NS=1;AN=0 GT:PS ./.:. -1 349562 . CATG . . . NS=1;AN=0 GT:PS .|.:349562 -1 349624 . TTCTGGTAGCAGCCACAATATCTTATC . . . NS=1;AN=0 GT:PS ./.:. -1 349767 . CAAATGTCTATGTTAGGT . . . NS=1;AN=0 GT:PS ./.:. -1 349809 . TAGGCTTTTAAACTCTGTGAATGTTGGACTGTGATGTAGA . 
. . NS=1;AN=0 GT:PS ./.:. -1 349890 . GCTTATTCAGTAAGCACATACTTGGCTCTTA . . . NS=1;AN=0 GT:PS ./.:. -1 349982 . G . . . NS=1;AN=0 GT:PS ./.:. -1 349994 . GTCATTTTTTTAATTTAGTGACTGGTTCTATTAAAAGTTA . . . NS=1;AN=0 GT:PS ./.:. -1 350001 . T . . NS=1;CGA_WINEND=352000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.90:1.23:.:0:.:0:0.999:211 -1 350107 . AAACCACAGCCCTCAGTTGTGTATATTTATCTCTTGTTTTCATA . . . NS=1;AN=0 GT:PS ./.:. -1 350215 . ATAAATATTTAATTCACAAGTTTAGAAAAGTGAACCTGAAAAATCAC . . . NS=1;AN=0 GT:PS ./.:. -1 350301 . T . . . NS=1;AN=0 GT:PS ./.:. -1 350364 . TTGTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 350404 . ACTATTTTTAAGTTGAAAATGTAATTGGTTTCTAATAATAA . . . NS=1;AN=0 GT:PS ./.:. -1 350468 . ATCGTGTTTAATGTTTTC . . . NS=1;AN=0 GT:PS ./.:. -1 350499 . TGTCAGATATCTCTCTTGATTCTTA . . . NS=1;AN=0 GT:PS ./.:. -1 350539 . AGACAGCAAATGCTATTGTCCAAGTTTTCTAAAGAAGAATCTGAAGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 350644 . AAATTGTAGCCCTTGTAAATTACCTACGTG . . . NS=1;AN=0 GT:PS ./.:. -1 350699 . ATTCAGTACTCTGTTCTAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 350751 . T . . . NS=1;AN=0 GT:PS ./.:. -1 350754 . A . . . NS=1;AN=0 GT:PS ./.:. -1 350762 . C . . . NS=1;AN=0 GT:PS ./.:. -1 350795 . ATTCTGACTGTGGCAGGAGTGAA . . . NS=1;AN=0 GT:PS ./.:. -1 350887 . TCTCCCAATTATTTTAAAATTATAAAC . . . NS=1;AN=0 GT:PS ./.:. -1 350966 . GAGGCAGTTAGGGAAGCCTTCCCTGAGTTAGTGCCATTTAA . . . NS=1;AN=0 GT:PS ./.:. -1 351016 . TGATAGACGATAAGTAATTTGTCAGGGGAAAAATACTCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 351075 . AAGGTCA . . . NS=1;AN=0 GT:PS ./.:. -1 351098 . TGTCTTGGTCCAGGAGCTAAAAAATGTTAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 351152 . AAAGAGTTATTAAATGAGGCAGCAGGCTTCAGCAGGTGCCACAT . . . NS=1;AN=0 GT:PS ./.:. -1 351258 . GTCGTAAGCAGAAAGTAAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 351291 . AAATGTTATTCTCTAAACAGTAATT . . . NS=1;AN=0 GT:PS ./.:. -1 351333 . A . . . NS=1;AN=0 GT:PS ./.:. -1 351392 . TATTAGGTGCCTACTATTAGCCAGGTACAGCCCTTAGCTACTT . . . NS=1;AN=0 GT:PS ./.:. -1 351461 . CAGAATTTCTTAAACAAAGAATCTAAAGTTGTTTATACACCATAA . . . NS=1;AN=0 GT:PS ./.:. -1 351514 . TTTTATAAATTTCTTGAAATTATTTTTATGTACACTG . . . NS=1;AN=0 GT:PS ./.:. -1 351611 . T . . . NS=1;AN=0 GT:PS ./.:. -1 351657 . TTTACTGTCAATAGTTTGTATCA . . . NS=1;AN=0 GT:PS ./.:. -1 351695 . T . . . NS=1;AN=0 GT:PS ./.:. -1 351701 . G . . . NS=1;AN=0 GT:PS ./.:. -1 351712 . A . . . NS=1;AN=0 GT:PS ./.:. -1 351721 . CAAAATCACAAATTTATACTTTATATTATGAATGAGATT . . . NS=1;AN=0 GT:PS ./.:. -1 351769 . AGCATGATAATATATTCTGTTGTCATCACATACA . . . NS=1;AN=0 GT:PS ./.:. -1 351810 . AACATATAGAGTATGAATCAATAATTTTTCAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 351903 . TAAC . . . NS=1;AN=0 GT:PS ./.:. -1 351924 . CTTGCAAAATGATGGTGGTGGTTTTTTTTTTTTTTTTTTCCCC . . . NS=1;AN=0 GT:PS ./.:. -1 351982 . TTGTTGC . . . NS=1;AN=0 GT:PS ./.:. -1 352001 . G . . NS=1;CGA_WINEND=354000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.78:1.13:.:0:.:0:0.999:211 -1 352005 . TGGCGTGATTTTGGCTCACTGTAAACTCCACCTCCTGGGTTCAAGCGAT . . . NS=1;AN=0 GT:PS ./.:. -1 352082 . GTATTACAGGTGCCTGCC . . . NS=1;AN=0 GT:PS ./.:. -1 352105 . ACCCAGCTAATTTTTGTATTTTTAGTAGAGATGGGGGTTTC . . . NS=1;AN=0 GT:PS ./.:. -1 352162 . TGGTCCCGAA . . . NS=1;AN=0 GT:PS ./.:. -1 352185 . GTGATCCACCAGCATCGGCCTCCCAAAGTGCTGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 352227 . GTGTGAGCCACTGCGTCCAGCCAGTGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 352301 . T . . . NS=1;AN=0 GT:PS ./.:. -1 352361 . ATGATAAGAAGATTTTAATTTTCTTTTTTTTTTAACTTTTATTTTAAGTT . . . NS=1;AN=0 GT:PS ./.:. -1 352442 . TCACAGGGGTTTATTGTACAGATTATTTCATCATCCAGGT . . . NS=1;AN=0 GT:PS ./.:. -1 352515 . C . . . NS=1;AN=0 GT:PS ./.:. 
-1 352558 . T . . . NS=1;AN=0 GT:PS ./.:. -1 352623 . GCTGTATTTGGTTTTCTGTTCCTGCATT . . . NS=1;AN=0 GT:PS ./.:. -1 352679 . CCACATTCCAGCAAAAGACATGATATCATTTTTTAATG . . . NS=1;AN=0 GT:PS ./.:. -1 352778 . TCACCAGAGCAGTCTGACAGAACCTCTCTGAAAGACTTCTCCTA . . . NS=1;AN=0 GT:PS ./.:. -1 352853 . CCCTAAATAAATCTAACTTTAATTTCTTAAAAGCTTAATTTTTTTCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 352943 . TTCGGTATTTTTCCTATTTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 353005 . ATATGACTTTGGTGCCTTTTCCTCAGCCTCCACTGATGATTTTTTCTCA . . . NS=1;AN=0 GT:PS ./.:. -1 353147 . ATGCTTTTACAGGTTTTGCTATCA . . . NS=1;AN=0 GT:PS ./.:. -1 353316 . ACCTCAAAACTTCGCAGCTTAAAACAAC . . . NS=1;AN=0 GT:PS ./.:. -1 353391 . CTACATGATTCTGCTCCGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 353424 . AGCCATCTGTGGTATTCAGCTGGCAGCTGGGTAGTCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 353574 . GATCTATTCAGCAGCAATATGGTTTGGCTGT . . . NS=1;AN=0 GT:PS ./.:. -1 353643 . AATTTCCGTGTCTTGTGGGAGGGATCCAACGGG . . . NS=1;AN=0 GT:PS ./.:. -1 353714 . CTCGTGGTCTTGAATAAGTCTCACAAGAGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 353768 . CTTTCACTCAGCTCTCATTCTGTCTTGTCTGCCATCATGTAGAGATGTG . . . NS=1;AN=0 GT:PS ./.:. -1 353884 . TTTTTCTTCATA . . . NS=1;AN=0 GT:PS ./.:. -1 353924 . CAGCAGCATGAAAACAGACTAATACAGTA . . . NS=1;AN=0 GT:PS ./.:. -1 354001 . G . . NS=1;CGA_WINEND=356000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.83:1.14:.:0:.:0:0.999:211 -1 354025 . AACAGGCAGGGGTTGAAACAGTTTGGAGGGCTCAGAAAACAACAG . . . NS=1;AN=0 GT:PS ./.:. -1 354129 . TCAAATAATGATATAGACAATGAAATCCAGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 354189 . CTCAATGGGAACTGGAGTAAAGGTGTC . . . NS=1;AN=0 GT:PS ./.:. -1 354327 . CAAATCATTCAAGAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 354381 . AACAGAGCATAAACGTTCAGAAAATTTGTAGCATGACACT . . . NS=1;AN=0 GT:PS ./.:. -1 354563 . C . . . NS=1;AN=0 GT:PS ./.:. -1 354756 . GTTGAGCCTGTGGATTCACAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 354801 . CCTCCATCTAGATTTCA . . . NS=1;AN=0 GT:PS ./.:. -1 354910 . GAAATGTGGGGTTGAAGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 354998 . CCCGGAATGGTAGATCCACCGACA . . . NS=1;AN=0 GT:PS ./.:. -1 355073 . AACCGGGAGGAAGGCTCTCCCC . . . NS=1;AN=0 GT:PS ./.:. -1 355240 . AGCCTTTAGTCCCTTTGTTTTGGGTAAATTTTACCATTTGGAATGGC . . . NS=1;AN=0 GT:PS ./.:. -1 355323 . GAAAATAACTAAATTGCTTTTGATT . . . NS=1;AN=0 GT:PS ./.:. -1 355401 . TGCCAACTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 355444 . GGACTAT . . . NS=1;AN=0 GT:PS ./.:. -1 355477 . GTGAGGACATGAGATTTGGGAGGT . . . NS=1;AN=0 GT:PS ./.:. -1 355531 . CC GT . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.131|rs75706905&dbsnp.131|rs78394685;CGA_RPT=THE1B|ERVL-MaLR|25.6;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:42:42,82:42,82:33,19:-82,-42,0:-33,-19,0:3:3,3:0 -1 355823 . CCCGGTC . . . NS=1;AN=0 GT:PS ./.:. -1 355867 . ATGCAAGCAGAATAAGCAGAATTCTTCGATACTGACTCAGCACCCACAC . . . NS=1;AN=0 GT:PS ./.:. -1 355936 . AAATGGAAGCTTCAAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 356001 . T . . NS=1;CGA_WINEND=358000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.89:1.07:.:0:.:0:0.999:211 -1 356055 . CCCCACGTTTTCAACGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 356237 . AAAATGAGTTCATTACATGGTAAGGATGAGGGAGAAAGAAAAGATCTG . . . NS=1;AN=0 GT:PS ./.:. -1 356350 . CCACATATAGCTGTGTGGTA . . . NS=1;AN=0 GT:PS ./.:. -1 356443 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.131|rs75715938&dbsnp.132|rs112222596;CGA_RPT=L2|L2|50.1;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:66:66,131:66,131:48,44:-131,-66,0:-48,-44,0:3:1,1:2 -1 356537 . G A . . 
NS=1;AN=2;AC=2;CGA_XR=dbsnp.131|rs75249042&dbsnp.135|rs183318141;CGA_RPT=L2|L2|50.1;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:56:114,56:114,56:42,45:-114,-56,0:-45,-42,0:3:3,3:0 -1 356561 . T . . . NS=1;AN=0 GT:PS ./.:. -1 356598 . T . . . NS=1;AN=0 GT:PS ./.:. -1 356934 . TCCTCACAATGTGGTTTCAGGCACAACTCCC . . . NS=1;AN=0 GT:PS ./.:. -1 357016 . TTTGTTC . . . NS=1;AN=0 GT:PS ./.:. -1 357082 . GGAAACTGACTCACATCAAATCCCTCTCTTATAGTAATTTTCATT . . . NS=1;AN=0 GT:PS ./.:. -1 357307 . C . . . NS=1;AN=0 GT:PS ./.:. -1 357311 . TATTTTA . . . NS=1;AN=0 GT:PS ./.:. -1 357328 . G . . . NS=1;AN=0 GT:PS ./.:. -1 357373 . TC . . . NS=1;AN=0 GT:PS ./.:. -1 357453 . GGGTGCT . . . NS=1;AN=0 GT:PS ./.:. -1 357525 . TCATCCAACTTTATTTGATTTTGTTATTTTTAATTTTTCTAATAAT . . . NS=1;AN=0 GT:PS ./.:. -1 357583 . TTAAAATAATATTTAATCACCCA . . . NS=1;AN=0 GT:PS ./.:. -1 357653 . G . . . NS=1;AN=0 GT:PS ./.:. -1 357842 . T . . . NS=1;AN=0 GT:PS ./.:. -1 358001 . C . . NS=1;CGA_WINEND=360000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.84:1.16:.:0:.:0:0.999:211 -1 358013 . TCTATTTTTTTTTTTTTTTGTGGGTTGTTTATTTTGTGTTTG . . . NS=1;AN=0 GT:PS ./.:. -1 358105 . CACATTTTTTCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 358191 . T . . . NS=1;AN=0 GT:PS ./.:. -1 358283 . TTTCGTT . . . NS=1;AN=0 GT:PS ./.:. -1 358346 . ATTATTCTCTCCACTTCCCACTTTTTTTCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 358422 . C . . . NS=1;AN=0 GT:PS ./.:. -1 358431 . T . . . NS=1;AN=0 GT:PS ./.:. -1 358440 . TTTATTTTTT . . . NS=1;AN=0 GT:PS .|.:358440 -1 358485 . TCTAAATATCATTTCAGAGAGAAAAC . . . NS=1;AN=0 GT:PS ./.:. -1 358517 . GAGAAGGCATATTTTTAAAAAACATAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 358553 . TCCTGTTTTTTACTGCTTCATGGTATA . . . NS=1;AN=0 GT:PS ./.:. -1 358607 . TTTCCCTTTTTCTTCTAGTATTTTTATAA . . . NS=1;AN=0 GT:PS ./.:. -1 358645 . A . . . NS=1;AN=0 GT:PS ./.:. -1 358659 . AATACAATTATTATAAATAATTATATAATA . . . NS=1;AN=0 GT:PS ./.:. -1 358721 . A . . . NS=1;AN=0 GT:PS ./.:. -1 358769 . CAGACGAGTTTTGGAATCTTGTCTCCA . . . NS=1;AN=0 GT:PS ./.:. -1 358894 . ATCCTTTTCTTTTCATAATAGAATGAAAATTGC . . . NS=1;AN=0 GT:PS ./.:. -1 358944 . CCCCATCTTTTTGCCTTCTTTCATTACACCACACACGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 359245 . G . . . NS=1;AN=0 GT:PS ./.:. -1 359372 . CTGAATTTATTTACACGATTGAGATTGATCTAGAGTTTTATTTTTTCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 359455 . TTTTGTGAATATGT . . . NS=1;AN=0 GT:PS ./.:. -1 359518 . A . . . NS=1;AN=0 GT:PS ./.:. -1 359523 . G . . . NS=1;AN=0 GT:PS ./.:. -1 359530 . T . . . NS=1;AN=0 GT:PS ./.:. -1 359678 . AGCATTAGTTTTGAT . . . NS=1;AN=0 GT:PS ./.:. -1 359722 . TCTATTTTTTCCATGGAACATTATATGATTTCATGTCCACAT . . . NS=1;AN=0 GT:PS ./.:. -1 359822 . ATGTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 359877 . C . . . NS=1;AN=0 GT:PS ./.:. -1 359990 . G . . . NS=1;AN=0 GT:PS ./.:. -1 360001 . A . . NS=1;CGA_WINEND=362000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.99:1.06:.:0:.:0:0.999:211 -1 360197 . T . . . NS=1;AN=0 GT:PS ./.:. -1 360321 . CCTCAGT . . . NS=1;AN=0 GT:PS ./.:. -1 360399 . GGCACCG . . . NS=1;AN=0 GT:PS ./.:. -1 360695 . CATTCCCCTTCTCAGCCCTTTTCATTCTCATG . . . NS=1;AN=0 GT:PS ./.:. -1 360747 . TTTCAAG . . . NS=1;AN=0 GT:PS ./.:. -1 360853 . ACAACTC . . . NS=1;AN=0 GT:PS ./.:. -1 360880 . C . . . NS=1;AN=0 GT:PS ./.:. -1 360950 . CACTTAC . . . NS=1;AN=0 GT:PS ./.:. -1 361108 . TTGGCTCCTCTAGG . . . NS=1;AN=0 GT:PS ./.:. -1 361165 . TTGATTC . . . NS=1;AN=0 GT:PS ./.:. -1 361206 . TCTATAG . . . NS=1;AN=0 GT:PS ./.:. -1 361271 . GCACTCT . . . NS=1;AN=0 GT:PS ./.:. -1 361439 . A . . . NS=1;AN=0 GT:PS ./.:. -1 361487 . 
TATTTATTCTTGC . . . NS=1;AN=0 GT:PS ./.:. -1 361544 . CTCCAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 361590 . TATTTCTATTTCTGTTTCTATA . . . NS=1;AN=0 GT:PS ./.:. -1 361731 . ATCGCAC . . . NS=1;AN=0 GT:PS ./.:. -1 361762 . TTACTGGCTTCATCCTCAATTCCCGGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 361804 . GGCTCCCTAAATTTATTTAATAATTCTCATA . . . NS=1;AN=0 GT:PS ./.:. -1 361871 . AATATTGACTACCTTTCTTATTTTCCTTTCTTC . . . NS=1;AN=0 GT:PS ./.:. -1 361961 . CCATTTTTGAAATATTTTACAA . . . NS=1;AN=0 GT:PS ./.:. -1 362001 . T . . NS=1;CGA_WINEND=364000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.73:1.00:.:0:.:0:0.999:211 -1 362017 . ATGTATTGTAATATTTATCCTCTTTTATGGCTAT . . . NS=1;AN=0 GT:PS ./.:. -1 362119 . TTGTAGAGGATTTGTTACACTTCTTAAACTGCTGATCTCAA . . . NS=1;AN=0 GT:PS ./.:. -1 362174 . TTGACCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 362215 . CACATGGCCTGATTTGTTCCACTTCTACTA . . . NS=1;AN=0 GT:PS ./.:. -1 362439 . TTTTCTGATATGCATG . . . NS=1;AN=0 GT:PS ./.:. -1 362532 . GGGAGAAAAGTCCATTTTCCTGAGGATAAGGTA . . . NS=1;AN=0 GT:PS ./.:. -1 362672 . G . . . NS=1;AN=0 GT:PS ./.:. -1 362742 . AAGACATTTTTTTTCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 362819 . C . . . NS=1;AN=0 GT:PS ./.:. -1 362891 . AGCAATTTTTTTTTTGGGGGGGGTGCTGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 363059 . TAGGCCTCAGATTATAATAAATAACGTTTATTGTGCTATACTATGT . . . NS=1;AN=0 GT:PS ./.:. -1 363167 . AGGGAGCAAGCAAGCAAGGGAG . . . NS=1;AN=0 GT:PS ./.:. -1 363196 . TATCCCTATTTTATGTATTTATGTATTTATTTT . . . NS=1;AN=0 GT:PS ./.:. -1 363241 . TTTTGGTTTTTTTGTTGTTTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 363285 . ATGTTTTGGGGTACATGTGATATTTTTGATACATGTATACAAT . . . NS=1;AN=0 GT:PS ./.:. -1 363369 . C . . . NS=1;AN=0 GT:PS ./.:. -1 363420 . GCTATTTTAGAATATATAATAAATTATTGTTCCCTATGAT . . . NS=1;AN=0 GT:PS ./.:. -1 363469 . GTACAATTGAATACTAAAATTTATTATGTCTATCTACATA . . . NS=1;AN=0 GT:PS ./.:. -1 363519 . AATAACTATAATTTCTCTGCTGTGCAATCAA . . . NS=1;AN=0 GT:PS ./.:. -1 363567 . TTCGAAC . . . NS=1;AN=0 GT:PS ./.:. -1 363633 . TACGTTA . . . NS=1;AN=0 GT:PS ./.:. -1 363685 . GTCGACCTCAAAGTCCAGATTATTGACC . . . NS=1;AN=0 GT:PS ./.:. -1 363770 . CAACCTCAAAGTCCAAATTAT . . . NS=1;AN=0 GT:PS ./.:. -1 363875 . AAATGGT . . . NS=1;AN=0 GT:PS ./.:. -1 363963 . TGACCTT . . . NS=1;AN=0 GT:PS ./.:. -1 364001 . A . . NS=1;CGA_WINEND=366000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.87:0.95:.:0:.:0:0.999:211 -1 364215 . GGCAGAT . . . NS=1;AN=0 GT:PS ./.:. -1 364247 . CCCCGTTAAATGCAACCCAATGTCACAGTCCGTA . . . NS=1;AN=0 GT:PS ./.:. -1 364287 . CCAGATGAGTGTATTA . . . NS=1;AN=0 GT:PS ./.:. -1 364336 . CATCCTTGAATGAGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 364364 . TAAATTTATTTTATTTATGTTCTTTTTAACC . . . NS=1;AN=0 GT:PS ./.:. -1 364422 . ATTTCATCCTACCTT . . . NS=1;AN=0 GT:PS ./.:. -1 364475 . TTTATTTTTAAATCATTGTTC . . . NS=1;AN=0 GT:PS ./.:. -1 364505 . AATTTCCCCTTTCTCCCATTCTTTCCATATTTTTAATCTTTTTTTAACTTT . . . NS=1;AN=0 GT:PS ./.:. -1 364564 . TCTGCTT . . . NS=1;AN=0 GT:PS ./.:. -1 364676 . T . . . NS=1;AN=0 GT:PS ./.:. -1 364851 . ATCAAGAACTATCCACTTTCTATTTGTT . . . NS=1;AN=0 GT:PS ./.:. -1 364932 . AGGTCTCCTCCTGTTATATCAATTTAGGTATTTTTCACACATTGTGTAC . . . NS=1;AN=0 GT:PS ./.:. -1 365069 . AATTGTATCTTCTGCCCTCCAATATCTATAACACTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 365203 . CAACTTTTTTTGGTTCTTTCCATTTACAGGAATTAGAGAAAGAACTAAT . . . NS=1;AN=0 GT:PS ./.:. -1 365314 . A . . . NS=1;AN=0 GT:PS ./.:. -1 365325 . A . . . NS=1;AN=0 GT:PS ./.:. -1 365400 . TGACCAACAAAATATGAAGAATGGGTAAACAA . . . NS=1;AN=0 GT:PS ./.:. -1 365544 . CTATATA . . . NS=1;AN=0 GT:PS ./.:. -1 365620 . ACTCTTTTGAGTTTTGCTGTTTTTACCAGAT . . . NS=1;AN=0 GT:PS ./.:. 
-1 365835 . TTTATATAATTCCATAG . . . NS=1;AN=0 GT:PS ./.:. -1 365886 . ACAATTATAGAGTTTTCCGGTCCCTCTTCCCAGTGCCTCAGCA . . . NS=1;AN=0 GT:PS ./.:. -1 365965 . T . . . NS=1;AN=0 GT:PS ./.:. -1 365979 . A . . . NS=1;AN=0 GT:PS ./.:. -1 366001 . G . . NS=1;CGA_WINEND=368000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.80:0.92:.:0:.:0:0.999:211 -1 366040 . TGTAAAAAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 366157 . AGATATGATGTGTGCCTCTGTGAGTTTTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 366240 . GGAGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 366314 . TCTATTA . . . NS=1;AN=0 GT:PS ./.:. -1 366357 . G . . . NS=1;AN=0 GT:PS ./.:. -1 366433 . ACAATATG . . . NS=1;AN=0 GT:PS ./.:. -1 366604 . TCCGAATGGA . . . NS=1;AN=0 GT:PS ./.:. -1 366686 . AGGGTGGGGAATGCAATGAACATAAAAGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 366839 . GTGTCCATTTTATGCATGT . . . NS=1;AN=0 GT:PS ./.:. -1 366952 . TCATTGCAGGATGGGAAGCCATTTAGGGCTTCATAATT . . . NS=1;AN=0 GT:PS ./.:. -1 367027 . AGTAAGCATGAGAAAATGTGGTAGATAATGGTGGTGATTC . . . NS=1;AN=0 GT:PS ./.:. -1 367091 . TCTAGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 367130 . CCCCAGTCTCTATATTCTCTCAATCCACAG . . . NS=1;AN=0 GT:PS ./.:. -1 367196 . TGCAGTGAT . . . NS=1;AN=0 GT:PS ./.:. -1 367240 . ACTTTCTGTCAGAGAATTGTGGGTGC . . . NS=1;AN=0 GT:PS ./.:. -1 367342 . ATCACAG . . . NS=1;AN=0 GT:PS ./.:. -1 367412 . A . . . NS=1;AN=0 GT:PS ./.:. -1 367470 . TCACATGTTGTGATAATATTATGTTGATTTTTTGTTTTTAATA . . . NS=1;AN=0 GT:PS ./.:. -1 367564 . TTGATTTTAGTACATATTATTATCTTCA . . . NS=1;AN=0 GT:PS ./.:. -1 367651 . C . . . NS=1;AN=0 GT:PS ./.:. -1 367828 . CCCCCATGTACTTTCTACTGGCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 367941 . GGAGGCTGCATCGCTCAAATCTTCTTCATCCACGTCGTTGGTGGTGTGGA . . . NS=1;AN=0 GT:PS ./.:. -1 368001 . C . . NS=1;CGA_WINEND=370000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.90:1.03:.:0:.:0:0.999:211 -1 368021 . CAGATATGTGGCCCTATGTAA . . . NS=1;AN=0 GT:PS ./.:. -1 368110 . TTGGTGTCAGTCA . . . NS=1;AN=0 GT:PS ./.:. -1 368154 . TTAGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 368203 . CTCGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 368292 . TTCATACTTCTAATCTCCTACGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 368392 . CAGTGGT . . . NS=1;AN=0 GT:PS ./.:. -1 368433 . ACACGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 368584 . AGAGGATCTCATAAATGATATAATAAGCCCTTCTCATTA . . . NS=1;AN=0 GT:PS ./.:. -1 368653 . GATATTTTAGATTCAGGAACTATGAGACATTATGTATTGATT . . . NS=1;AN=0 GT:PS ./.:. -1 368724 . TTATCTGATGAATATATGATGAATATATTCCT . . . NS=1;AN=0 GT:PS ./.:. -1 368800 . TATGCCCATTTAATTTCTTTCAGCAATGTTTTGTAG . . . NS=1;AN=0 GT:PS ./.:. -1 368927 . CAAT . . . NS=1;AN=0 GT:PS ./.:. -1 369008 . TATAAGTTCCATCAGTTTTTTATAGGTTATGTAGGATTTTCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 369088 . C . . . NS=1;AN=0 GT:PS ./.:. -1 369151 . GAGTAGA . . . NS=1;AN=0 GT:PS ./.:. -1 369194 . T . . . NS=1;AN=0 GT:PS ./.:. -1 369265 . TCT . . . NS=1;AN=0 GT:PS ./.:. -1 369334 . TTTTTCGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 369472 . T . . . NS=1;AN=0 GT:PS ./.:. -1 369520 . ATACGCA . . . NS=1;AN=0 GT:PS ./.:. -1 369562 . ATTATTTTAACACTGGAGATAGAATCTGGTGGAATGACGTCAA . . . NS=1;AN=0 GT:PS ./.:. -1 369644 . TAAAAGA . . . NS=1;AN=0 GT:PS ./.:. -1 369810 . TCTATCTTATTGACATACATATCTTTTTTGTGGTGAAAACA . . . NS=1;AN=0 GT:PS ./.:. -1 369915 . TTCTGTGCAATAGTTCACTGAAACTTT . . . NS=1;AN=0 GT:PS ./.:. -1 369961 . CCCTTTTATCAACATCTACCTTTTCCATGTCTACCCCCAACTACTA . . . NS=1;AN=0 GT:PS ./.:. -1 370001 . C . . NS=1;CGA_WINEND=372000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.91:1.09:.:0:.:0:0.999:211 -1 370054 . TTTTAGA . . . NS=1;AN=0 GT:PS ./.:. -1 370096 . TTTCGTG . . . NS=1;AN=0 GT:PS ./.:. 
-1 370197 . ATTGTATATATACACTACATTTT . . . NS=1;AN=0 GT:PS ./.:. -1 370387 . TTTTAGTTTTTTGAGT . . . NS=1;AN=0 GT:PS ./.:. -1 370503 . TATCATT . . . NS=1;AN=0 GT:PS ./.:. -1 370564 . TGCATTTCCCTGAT . . . NS=1;AN=0 GT:PS ./.:. -1 370613 . TGTATTTTTTTTGAGAAATGTCTATTTAGGACCTTGCCCATTT . . . NS=1;AN=0 GT:PS ./.:. -1 370735 . CTCTGCAAATATTTTCTCACAACTT . . . NS=1;AN=0 GT:PS ./.:. -1 370820 . AATCCCA . . . NS=1;AN=0 GT:PS ./.:. -1 370859 . GGTGATA . . . NS=1;AN=0 GT:PS ./.:. -1 370902 . TTTTCCCCTATGTTTTCATCTAGTAGTTTTACAGTT . . . NS=1;AN=0 GT:PS ./.:. -1 370985 . GTGAGATAAGGATACACACCATACACATTCGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 371107 . AAAATTTATTGGTCATAAATGCATGAGTTTATTTCTGGGCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 371180 . TTTTGTGCAAGTGTCATATTG . . . NS=1;AN=0 GT:PS ./.:. -1 371223 . ATATGTACTTGTTTTGGGGGGGATC . . . NS=1;AN=0 GT:PS ./.:. -1 371301 . TGCCTCACCAAGATTGCCCAGAAAACTGA . . . NS=1;AN=0 GT:PS ./.:. -1 371476 . ATGCGGA . . . NS=1;AN=0 GT:PS ./.:. -1 371604 . TAGTAAC . . . NS=1;AN=0 GT:PS ./.:. -1 371681 . CACTATTCCTTCAAATTCCCTATTTCTATCTCTT . . . NS=1;AN=0 GT:PS ./.:. -1 371767 . AGCTTAGGTC . . . NS=1;AN=0 GT:PS ./.:. -1 371799 . TTGTAAAATATTGTAGAACA . . . NS=1;AN=0 GT:PS ./.:. -1 371976 . GAGATTTTTTTTTAAATTATATTTTAAGTTCTGGGGTACATATGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 372001 . A . . NS=1;CGA_WINEND=374000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.54:1.08:.:0:.:0:0.999:211 -1 372089 . TCTACAT . . . NS=1;AN=0 GT:PS ./.:. -1 372123 . CCCCAGCCCCTACCCCCAGACAGGCCCCGGTGTGTTGT . . . NS=1;AN=0 GT:PS ./.:. -1 372189 . ATTGTTCAACTCTCAT . . . NS=1;AN=0 GT:PS ./.:. -1 372262 . TGATGGTTTCCAGCTTAATC . . . NS=1;AN=0 GT:PS ./.:. -1 372335 . C . . . NS=1;AN=0 GT:PS ./.:. -1 372344 . TATATGCCACATTTTCTTTATGCAGTCTATCACTGAATGGGCATTTTGGTTGGTTCCAAGTC . . . NS=1;AN=0 GT:PS ./.:. -1 372411 . TATTGTGAACAGTGCCACAATAAACATATGTGTGCATGTGTCTTTATAGTAGAATG . . . NS=1;AN=0 GT:PS ./.:. -1 372472 . TAATCCTTTGGATATATACCCAGTAATGCAATTACTG . . . NS=1;AN=0 GT:PS ./.:. -1 372565 . AATGGTTGAACTAATTTACACTCCCACCAACAGTGTAAAAGCAT . . . NS=1;AN=0 GT:PS ./.:. -1 372697 . GTGGTATTGATATGCATTTCTCTGATGACCAGTGATGATGAGCTTTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 372807 . CCACTGATGGGTTTGTTTGTTATTTTCTTG . . . NS=1;AN=0 GT:PS ./.:. -1 372860 . ATTCTGGATATTAGCCCTTTGTCAGATGGATAGATTGCAAAAATTTTCACCCATTC . . . NS=1;AN=0 GT:PS ./.:. -1 372921 . GTTGCCTGTTCACTCTGATGATAGTTTCTTTTGCTGTGCAGGAGCTCTTTAGTTTAATTAGATCCCATTTGTCAATTTTGGCTTTTGTTGCCATT . . . NS=1;AN=0 GT:PS ./.:. -1 373038 . AGTTTTTGTCCATGCCTATGTACTGAATGGTATTGCCTAG . . . NS=1;AN=0 GT:PS ./.:. -1 373126 . TAATACATCGTGAGTTAATTTTTGTGTAAAGTGTAAGAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 373198 . GCCGGTT . . . NS=1;AN=0 GT:PS ./.:. -1 373223 . AAAAAGGGAATCGTTTCCCCATTGCTTGTTTTTGTCAGGTTTGTCAAAGATCAGATAGTTGTAGATGTGTGGTGTTATTTCTGAGGCCTCTGTTCTGTTCCATTGGTCTACATATCTGTTTTGGTACCAGTACCATGCTGTTTTG . . . NS=1;AN=0 GT:PS ./.:. -1 373375 . AAGACTTGTAGTATAGTTCGAAGTCAGACAG . . . NS=1;AN=0 GT:PS ./.:. -1 373459 . CTCATTTTTGGTTCCATATGA . . . NS=1;AN=0 GT:PS ./.:. -1 373490 . A . . . NS=1;AN=0 GT:PS ./.:. -1 373516 . TCAGTGGTAGCTTGATGGGGACAGCA . . . NS=1;AN=0 GT:PS ./.:. -1 373558 . TTTGGGCAGTATGTCCATTTTCATGATATTGATTCTTCCTATCCA . . . NS=1;AN=0 GT:PS ./.:. -1 373618 . TTTCCATTTGTTTGTGTCCTCTCTTATTTCCTTGAGCAGTGGTTTGTAGTTCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 373702 . TTGGATTCCTAGGTATTTTATTCTCTTAGTAGCAATTGTGAATGGGAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 373817 . TTGATTTTGTATCCTGAGACTTTTCTGAAGTTGCTTATTAGC . . . NS=1;AN=0 GT:PS ./.:. -1 373868 . TTTTGGGCTGAGACCATGGGGTTTTCTAAATACACAAT . . . NS=1;AN=0 GT:PS ./.:. -1 373959 . A . . . 
NS=1;AN=0 GT:PS ./.:. -1 373967 . C . . . NS=1;AN=0 GT:PS ./.:. -1 373985 . G . . . NS=1;AN=0 GT:PS ./.:. -1 374001 . C . . NS=1;CGA_WINEND=376000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.64:1.18:.:0:.:0:0.999:211 -1 374019 . G . . . NS=1;AN=0 GT:PS ./.:. -1 374023 . TC . . . NS=1;AN=0 GT:PS ./.:. -1 374066 . TGGGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 374123 . TATTGAGAGTTTTTAGCATGAAGGGCTGTTGAATTTTTTCGAAGGCCTT . . . NS=1;AN=0 GT:PS ./.:. -1 374265 . TTGTATCCCAGGGATGAAGCCGAC . . . NS=1;AN=0 GT:PS ./.:. -1 374300 . GGACAAGCTTTTGATGTGCTGCTGGATTTGGTT . . . NS=1;AN=0 GT:PS ./.:. -1 374340 . ATTTTATTGAGGATTCTTGCATCGAT . . . NS=1;AN=0 GT:PS ./.:. -1 374392 . TTTCTTTTTTTTTGTTGTGTCTCTGCCAAATT . . . NS=1;AN=0 GT:PS ./.:. -1 374444 . GCCTCATAAAATGAGTTAGGGAGGATTCCCTC . . . NS=1;AN=0 GT:PS ./.:. -1 374513 . ACCAGCTCCTCTTTGTACCTCTG . . . NS=1;AN=0 GT:PS ./.:. -1 374569 . ACTGTTTTTTGGTTGGTAGGCTATTAATTCTGCCACAATTTCAGACCTT . . . NS=1;AN=0 GT:PS ./.:. -1 374635 . G . . . NS=1;AN=0 GT:PS ./.:. -1 374650 . CTGGTTTAGTCTTGGGAGGGTGTATGTGTCCAGGAATTTGTCCA . . . NS=1;AN=0 GT:PS ./.:. -1 374711 . A . . . NS=1;AN=0 GT:PS ./.:. -1 374742 . GATAGTAGTTTGTATTTCTGTGGGATCAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 374788 . TCATTTTTTATTGCATCTGTTTGATTCTTCTCTGTTTTCTTCTTTATGA . . . NS=1;AN=0 GT:PS ./.:. -1 374896 . TTGATTTTTTTTGAAGGTTTTTTTGTGTCTCTATCTCCTTCAGTTCTGCTC . . . NS=1;AN=0 GT:PS ./.:. -1 375005 . CTTCTAATTGAGATGTTAGGGTGTCAATTT . . . NS=1;AN=0 GT:PS ./.:. -1 375187 . ACCCAGTAGTTGTTCAGGAGCAGGTTGTTCAGTTTACAT . . . NS=1;AN=0 GT:PS ./.:. -1 375235 . TGGGTTTGAGTCAGTTTCTTAATCCTGAGTTCTAATTTAATTGCACTGCGATCT . . . NS=1;AN=0 GT:PS ./.:. -1 375306 . CCCATTTTTTTTTGCATTTGCTGAGGAGTGTTTTACTTCCAAATAT . . . NS=1;AN=0 GT:PS ./.:. -1 375371 . GTGCAATGTGGTGCTGAGAAGAATGTATATTCTGTT . . . NS=1;AN=0 GT:PS ./.:. -1 375471 . C . . . NS=1;AN=0 GT:PS ./.:. -1 375494 . T . . . NS=1;AN=0 GT:PS ./.:. -1 375516 . TTGACAGTGGGATGTTAAAGTCTCCCACTAT . . . NS=1;AN=0 GT:PS ./.:. -1 375590 . TACTTTA . . . NS=1;AN=0 GT:PS ./.:. -1 375613 . TGTATTGGGTGCATATATATTTAGGATAATTA . . . NS=1;AN=0 GT:PS ./.:. -1 375663 . ATCCTTTTACCA . . . NS=1;AN=0 GT:PS ./.:. -1 375712 . TGGTTTAAAGTCTGTTTTATCAGAGACTAGGATTGCGACCCCTGCTTTTTTTTGCCTTCCATTTGCTT . . . NS=1;AN=0 GT:PS ./.:. -1 375789 . TCCTCCATCCCTTTATTTTGAGCCTATTTGTGTCTTTGCACGTGAGATGGGTCTCCTGAATACAGAACACTGATGGGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 375876 . TTAGCCAATTTGCCAGTCTGTGTTTTTTAATTGGAGCATTTAGCCCATTTACATTTAAG . . . NS=1;AN=0 GT:PS ./.:. -1 376001 . T . . NS=1;CGA_WINEND=378000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.58:1.23:.:0:.:0:0.999:211 -1 376021 . GTCAATGGTCTTTACAATTTGGTATGTTTTTGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 376134 . ATCTCTCAGCATTTGCTTGTCTGTAAAGATTTT . . . NS=1;AN=0 GT:PS ./.:. -1 376174 . CTTCACTTATGAAGCTTAGTTTGGCTGGATATGAAATCCTGGGTGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 376253 . CCCCCATTCTCTTCTGGCTTGTAGAATTTCTGCTGATAGATCTGCTGTTAGTCTGATGGGCTTCCCTTTGTGGGTAACCTGACCT . . . NS=1;AN=0 GT:PS ./.:. -1 376356 . AACATTTTTTCCTTCATTTCAACCTTGCTCAATCTGATGACTAT . . . NS=1;AN=0 GT:PS ./.:. -1 376435 . GTGATGTTCTCCGTATTTCTGAATTTGAATATTG . . . NS=1;AN=0 GT:PS ./.:. -1 376502 . A . . . NS=1;AN=0 GT:PS ./.:. -1 376511 . TG . . . NS=1;AN=0 GT:PS ./.:. -1 376519 . G . . . NS=1;AN=0 GT:PS ./.:. -1 376535 . CCATTCTCCCCATCACTTTCAGGTACAGCAATCAAACGTAG . . . NS=1;AN=0 GT:PS ./.:. -1 376606 . TTGGAGGCTTTGTTCATTTCTTTTCATTCTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 376702 . TCCGCTTGATCGA . . . NS=1;AN=0 GT:PS ./.:. -1 376924 . TCTCTCAATTCATCAAACTCATTCTCCGTCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 376984 . A . . . 
NS=1;AN=0 GT:PS ./.:. -1 377094 . TGGTAACCTTCTGATGAGGT . . . NS=1;AN=0 GT:PS ./.:. -1 377153 . CCTTTTTGTTTGTTAGTTTTCCTTCTAACAGTCAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 377209 . TGGTGTTTGCCCTAGGTCTACTCTAGA . . . NS=1;AN=0 GT:PS ./.:. -1 377248 . GGGCATCACCAGCGGAGGCTGGAGAACAGCAAAGA . . . NS=1;AN=0 GT:PS ./.:. -1 377292 . GTTTCTTCCTCTGGAAGTTTTGTCCAAGAGGGGCACCCACCAGATG . . . NS=1;AN=0 GT:PS ./.:. -1 377362 . GTGCCTGTTGGCACCTACTGGGAGGTGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 377411 . GGTTAGG . . . NS=1;AN=0 GT:PS ./.:. -1 377458 . AATACTGTGCTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 377489 . CAGAGCCATCAGGCTTTTCAAAGATGCTTTAAGTCTGCTGAAGCTGTGCCCACAGGCGC . . . NS=1;AN=0 GT:PS ./.:. -1 377601 . CTGACTGGGGCTGCTGCCCTTTTTTCAGA . . . NS=1;AN=0 GT:PS ./.:. -1 377658 . G . . . NS=1;AN=0 GT:PS ./.:. -1 377735 . TACACTGTGGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 377784 . CCCTCCCGCCACCAAGCTCGAGTGTC . . . NS=1;AN=0 GT:PS ./.:. -1 377826 . CTGCTGTGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 377856 . CTTAGCTTGCTGGGCTCTGTATGGGTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 377971 . CCATTGTGGCATGAAAAAAAAAAAAAAACTCCT . . . NS=1;AN=0 GT:PS ./.:. -1 378001 . C . . NS=1;CGA_WINEND=380000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.76:1.04:.:0:.:0:0.999:211 -1 378072 . GAGGCAC . . . NS=1;AN=0 GT:PS ./.:. -1 378112 . ACCGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 378218 . TCCCGGGTGAGGCGACA . . . NS=1;AN=0 GT:PS ./.:. -1 378261 . GTGCGCTGCACCCACTGTCCAACCAGT . . . NS=1;AN=0 GT:PS ./.:. -1 378318 . G . . . NS=1;AN=0 GT:PS ./.:. -1 378409 . GAGATTTTTTTTAAAAGTGCAAAGAAAGACATC . . . NS=1;AN=0 GT:PS ./.:. -1 378493 . GCCCCTGTTGGATTCAGCAGAGGGAGATAGGCCTTGCCATA . . . NS=1;AN=0 GT:PS ./.:. -1 378572 . GGGAGTGCTGATACCTGGGCCACAGTTAGTCCAAGTTTATCACTGAA . . . NS=1;AN=0 GT:PS ./.:. -1 378673 . TGTGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 378765 . CACCGGCCTCCGAGAGTCCATGGAGCAATGGGAAAATTGCAGT . . . NS=1;AN=0 GT:PS ./.:. -1 378864 . GGGGACTGCAGTCGACAATG . . . NS=1;AN=0 GT:PS ./.:. -1 378920 . T . . . NS=1;AN=0 GT:PS ./.:. -1 379003 . GGCCGCAATTCAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 379174 . CAGGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 379201 . GCTGGACTTGAGGGCTGGCTTGGCTGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 379235 . GACACAGCAGCAGCTCAGGATGATGGTGATGGTCCACGCGAGC . . . NS=1;AN=0 GT:PS ./.:. -1 379292 . TTCATAGTAGTAGTTACAACACTGAGACTGC . . . NS=1;AN=0 GT:PS ./.:. -1 379473 . TCCTTAC . . . NS=1;AN=0 GT:PS ./.:. -1 379575 . ATACGTTTTGAAGTCAGATTGTGAGGCCTTC . . . NS=1;AN=0 GT:PS ./.:. -1 379628 . AGTGCTTTAGTTATTCAGGGTCC . . . NS=1;AN=0 GT:PS ./.:. -1 379721 . AATGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 379795 . ACATAAAATATCTTTCCATGTATTTGTGTCATCTACAATTTTTCATC . . . NS=1;AN=0 GT:PS ./.:. -1 379935 . TACAGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 380000 . G . . . NS=1;AN=0 GT:PS ./.:. -1 380001 . A . . NS=1;CGA_WINEND=382000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.61:1.14:.:0:.:0:0.999:211 -1 380025 . GTATATTGATTTTGAATTCTGCAAT . . . NS=1;AN=0 GT:PS ./.:. -1 380089 . GGAGTTTTTAGGGTTTCCTATATA . . . NS=1;AN=0 GT:PS ./.:. -1 380159 . TGAATATCTTTTATT . . . NS=1;AN=0 GT:PS ./.:. -1 380255 . GTTCTTG . . . NS=1;AN=0 GT:PS ./.:. -1 380358 . CTACGCCCAATTTGTTGAGAGGTTTTGTC . . . NS=1;AN=0 GT:PS ./.:. -1 380421 . CTGCATATATAGAGATAGCTATTTTTTTATCCTTCATT . . . NS=1;AN=0 GT:PS ./.:. -1 380544 . AATATGT . . . NS=1;AN=0 GT:PS ./.:. -1 380559 . TCTGTTTGCTAGTACTTATTTTGAGGACTTTTGTATCTATGT . . . NS=1;AN=0 GT:PS ./.:. -1 380609 . GATATTGGTTGGCCCATACTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 380655 . T . . END=380854;NS=1;AN=0 GT:PS ./.:. -1 380867 . C . . END=381153;NS=1;AN=0 GT:PS ./.:. -1 381166 . 
TGTGGAAGTCAGTGTGGCGATTTCTT . . . NS=1;AN=0 GT:PS ./.:. -1 381207 . GAAATACCAT . . . NS=1;AN=0 GT:PS ./.:. -1 381224 . AGCCATCCCATTACTGGGTATATACCCAAAGGATTATAAATCAT . . . NS=1;AN=0 GT:PS ./.:. -1 381297 . TTTATTGCAGCACTATTC . . . NS=1;AN=0 GT:PS ./.:. -1 381369 . AAGA . . . NS=1;AN=0 GT:PS ./.:. -1 381376 . T . . . NS=1;AN=0 GT:PS ./.:. -1 381379 . G . . . NS=1;AN=0 GT:PS ./.:. -1 381425 . GTTCATGTCCTTTGTAGGGACATGGATGA . . . NS=1;AN=0 GT:PS ./.:. -1 381473 . CAGCAAACTATCACAAGGACAAAAAACCAAACAGTGCATGTTCTCACT . . . NS=1;AN=0 GT:PS ./.:. -1 381543 . A . . . NS=1;AN=0 GT:PS ./.:. -1 381563 . GGGGAACATCACTTTAAAAAAAAAACAAT . . . NS=1;AN=0 GT:PS ./.:. -1 381619 . AAATACTCTTTCTCCT . . . NS=1;AN=0 GT:PS ./.:. -1 381670 . ATTATTTTTTAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 381904 . TCTATAT . . . NS=1;AN=0 GT:PS ./.:. -1 381973 . CTCGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 382001 . G . . NS=1;CGA_WINEND=384000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.80:1.05:.:0:.:0:0.999:211 -1 382017 . TTTTCAAAGAACAAATTCTTAGTTTCA . . . NS=1;AN=0 GT:PS ./.:. -1 382069 . GTCTTTTTGTAT . . . NS=1;AN=0 GT:PS ./.:. -1 382119 . A . . . NS=1;AN=0 GT:PS ./.:. -1 382247 . ATTTTTTTGCTGTGTTTTCTTAAAAATAA . . . NS=1;AN=0 GT:PS ./.:. -1 382294 . T . . . NS=1;AN=0 GT:PS ./.:. -1 382348 . T . . . NS=1;AN=0 GT:PS ./.:. -1 382375 . TCAGGGTAATGCTGGTTTTGAAAAATGAATTTGGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 382489 . CACTGAGCCATTCAATCCTGGGTC . . . NS=1;AN=0 GT:PS ./.:. -1 382557 . TCACGATTTGTTC . . . NS=1;AN=0 GT:PS ./.:. -1 382617 . TTCTTCTAGGTTATCTAATTTGCTGGTGAATAATTAATTATAGT . . . NS=1;AN=0 GT:PS ./.:. -1 382692 . TCAGTTTTAATGTCTCTTCTTTCATTTC . . . NS=1;AN=0 GT:PS ./.:. -1 382729 . TTTCTTTTTTCTTA . . . NS=1;AN=0 GT:PS ./.:. -1 382771 . TATCTTTTCAAAAAACAATTCTTA . . . NS=1;AN=0 GT:PS ./.:. -1 382941 . TTTCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 382973 . AGT . . . NS=1;AN=0 GT:PS ./.:. -1 382978 . C . . . NS=1;AN=0 GT:PS ./.:. -1 382981 . TCT . . . NS=1;AN=0 GT:PS ./.:. -1 383078 . C . . . NS=1;AN=0 GT:PS ./.:. -1 383101 . ATATAGGTATATATAAAAACAATACACACATTGTATATAAACTATGGCAT . . . NS=1;AN=0 GT:PS ./.:. -1 383167 . GAACATAAAGTGTCTG . . . NS=1;AN=0 GT:PS ./.:. -1 383361 . AAGATAAACAGAATAACTAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 383425 . CAAGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 383455 . TTTCTTTTTTTTTTAATTTGTGTAGCTTTTTATTTTATTTT . . . NS=1;AN=0 GT:PS ./.:. -1 383526 . G . . . NS=1;AN=0 GT:PS ./.:. -1 383534 . TGCA . . . NS=1;AN=0 GT:PS ./.:. -1 383566 . T . . . NS=1;AN=0 GT:PS ./.:. -1 383606 . ACCTAAAAATCCA . . . NS=1;AN=0 GT:PS ./.:. -1 383625 . GGGAGCGTTTCACTGAAGACAGTTGTTAGAGAAACGTTA . . . NS=1;AN=0 GT:PS ./.:. -1 383687 . GTCCCTTCTTCTCTGCTTTCTTCTTTTCTCCTCCTCCTCCTCCTCCCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 383744 . TTCTTCTCTCTCTGTTTTTCTAATCATGAAAACAAACGAAAAAAACTATG . . . NS=1;AN=0 GT:PS ./.:. -1 383817 . AGCGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 383849 . ACAGAAAATAAAATAACTAT . . . NS=1;AN=0 GT:PS ./.:. -1 383888 . AAACAAAAAATTAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 383971 . CATAGAGAATAACATAATTGAAATTTAAAC . . . NS=1;AN=0 GT:PS ./.:. -1 384001 . C . . NS=1;CGA_WINEND=386000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.10:0.96:.:0:.:0:0.999:211 -1 384043 . GTTGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 384086 . CTATCCAAAATGAACACAGGAATAAGGAAATGGAAAATAAG . . . NS=1;AN=0 GT:PS ./.:. -1 384153 . ACAGAGGGGGAAATGCTAAAAAGTTTTAAAGAGTGTTTCAGAAGGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 384231 . TTTTAAATTTTATTTTATG . . . NS=1;AN=0 GT:PS ./.:. -1 384320 . CCATCAACCCATCACCTAGGTATGAGGCCCT . . . NS=1;AN=0 GT:PS ./.:. -1 384390 . T . . END=386531;NS=1;AN=0 GT:PS ./.:. 
-1 386001 . A . . NS=1;CGA_WINEND=388000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.25:0.99:.:0:.:0:0.999:211 -1 386537 . CCTACCCCCTGTCCCCCTGAGAGGCCCTGGTGTGTGTTGTTCCCCTCCATGTACCCACGTGTTTGTCTTGATGGTCTCCTACCCCCTGTCCCCCTGAGAGGCCCTGGTGTGTGTTGTTCCCCTCCATGTATCCACGTGTTTGTCTTGA . . . NS=1;AN=0 GT:PS ./.:. -1 386691 . CCTACCCCTGTCCCGCTGAGAGGCCCTGGTATGTGTTGTTCCCCTCCATGTATCCATGTGTTTGCTCTCATTGTTCAACTCCCTCTTACGACTGAGAACATGTGGTGTTTGGTTTTCTGTTCCTGTGTTAGTTTGCTGAGGGTGATGGCTTCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 386855 . T . . END=387191;NS=1;AN=0 GT:PS ./.:. -1 387210 . G . . . NS=1;AN=0 GT:PS ./.:. -1 387270 . CAGCGAT . . . NS=1;AN=0 GT:PS ./.:. -1 387322 . CACTATACCTGGCTAATTTTTTTATTTTTAGTAGAGATGGAGTTTCCC . . . NS=1;AN=0 GT:PS ./.:. -1 387398 . TGTCCTCAGGTGATCCACCCGCCTTG . . . NS=1;AN=0 GT:PS ./.:. -1 387492 . AGAATGATTTATATTCCTTTGGGCATATATCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 387576 . ATTTTCACACTGCCTT . . . NS=1;AN=0 GT:PS ./.:. -1 387628 . GTGTAAAAACATT . . . NS=1;AN=0 GT:PS ./.:. -1 387728 . GTGGTTTTGAGTTGCAT . . . NS=1;AN=0 GT:PS ./.:. -1 387803 . CTTCTTTTGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 387845 . TGATGTTTTTTTTTTTTCTTG . . . NS=1;AN=0 GT:PS ./.:. -1 387922 . TGTGGAG . . . NS=1;AN=0 GT:PS ./.:. -1 388001 . G . . NS=1;CGA_WINEND=390000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.86:1.15:.:0:.:0:0.999:211 -1 388004 . GGTACTC . . . NS=1;AN=0 GT:PS ./.:. -1 388202 . CCCAGCA . . . NS=1;AN=0 GT:PS ./.:. -1 388294 . TTTTCTGTTGCCCCG . . . NS=1;AN=0 GT:PS ./.:. -1 388352 . GGTGTTTTCCA . . . NS=1;AN=0 GT:PS ./.:. -1 388437 . TGCAGGCATTCT . . . NS=1;AN=0 GT:PS ./.:. -1 388461 . AAAGAAAAAAAAAAGCTGACTTTGCAT . . . NS=1;AN=0 GT:PS ./.:. -1 388671 . TACAGAGGGAGGCTCT . . . NS=1;AN=0 GT:PS ./.:. -1 388809 . A . . . NS=1;AN=0 GT:PS ./.:. -1 388873 . GGCCGAG . . . NS=1;AN=0 GT:PS ./.:. -1 388938 . TTTTGCAAATTGTTGTTTGTGGGGGTCTGTCCTGCAGACCCCAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 389051 . G . . . NS=1;AN=0 GT:PS ./.:. -1 389129 . CAGGGGC . . . NS=1;AN=0 GT:PS ./.:. -1 389183 . GAGAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 389297 . ACTCAAGGAAGGATTAGGCCGCCTTCATTCAGATC . . . NS=1;AN=0 GT:PS ./.:. -1 389349 . AACACCCCCAGCCTTCCATGAAGGTTTGTGTCGATTT . . . NS=1;AN=0 GT:PS ./.:. -1 389417 . GCCTGACTGAACTCCCACAGTTGTTGCTACTTTTTGAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 389577 . TGGAAAACTTC . . . NS=1;AN=0 GT:PS ./.:. -1 389690 . A . . . NS=1;AN=0 GT:PS ./.:. -1 389729 . ATCTTATATATAGAAAACACCAATAAC . . . NS=1;AN=0 GT:PS ./.:. -1 389783 . T . . . NS=1;AN=0 GT:PS ./.:. -1 389900 . CAACAAAAAATACT . . . NS=1;AN=0 GT:PS ./.:. -1 389960 . CTACAAAACATTGATGAAAACAATTGA . . . NS=1;AN=0 GT:PS ./.:. -1 390001 . A . . NS=1;CGA_WINEND=392000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.99:0.99:.:0:.:0:0.999:211 -1 390009 . GATATATCATGTTCATGGATTGGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 390052 . TGTCCATACTATCCAAAGTGATGTAC . . . NS=1;AN=0 GT:PS ./.:. -1 390110 . GACATTT . . . NS=1;AN=0 GT:PS ./.:. -1 390176 . A . . . NS=1;AN=0 GT:PS ./.:. -1 390379 . ACATGGGGAAAGAACAGTCTCTTCAAAAAATGGT . . . NS=1;AN=0 GT:PS ./.:. -1 390465 . ATATAAAAATATAAATTCA . . . NS=1;AN=0 GT:PS ./.:. -1 390526 . CCTGGAAGAAAACAGGGGAATA . . . NS=1;AN=0 GT:PS ./.:. -1 390654 . TGCACAGCAAAAGAAAGTCAACAGAGTGAAGTGATAAC . . . NS=1;AN=0 GT:PS ./.:. -1 390758 . AGGAACTCAAACAACACAA . . . NS=1;AN=0 GT:PS ./.:. -1 390799 . AATGAAAAATTGACAAAGGATCTAAATAGACATTTCTCAAAAGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 391030 . AAACTAT . . . NS=1;AN=0 GT:PS ./.:. -1 391060 . TAAATGATTCTGTAAAAATAGAATTACCATATGATTCAG . . . NS=1;AN=0 GT:PS ./.:. -1 391130 . TATCAAATCAGTGTGTCAGTGA . . . 
NS=1;AN=0 GT:PS ./.:. -1 391246 . AAATGTGGTACACATACACAATAGAATAGTATACAACCTTAAAAAATAA . . . NS=1;AN=0 GT:PS ./.:. -1 391466 . GATGTTGGTCAAAGGGT . . . NS=1;AN=0 GT:PS ./.:. -1 391606 . TTTATGGGGTAC . . . NS=1;AN=0 GT:PS ./.:. -1 391657 . A . . . NS=1;AN=0 GT:PS ./.:. -1 391717 . CCAATTA . . . NS=1;AN=0 GT:PS ./.:. -1 391811 . TTTCTTTTTTTGTACCCATTAACCATCCCCCACCTCTCCCA . . . NS=1;AN=0 GT:PS ./.:. -1 391961 . TGTCTTTTTGTGC . . . NS=1;AN=0 GT:PS ./.:. -1 392001 . C . . NS=1;CGA_WINEND=394000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.92:1.07:.:0:.:0:0.999:211 -1 392041 . TTCCTTTTTATGGCTGAATAATACT . . . NS=1;AN=0 GT:PS ./.:. -1 392091 . CTTTATTC . . . NS=1;AN=0 GT:PS ./.:. -1 392206 . TTTGGGGGGTATATATGCAGCAGTGGGATTGCTGGACCAC . . . NS=1;AN=0 GT:PS ./.:. -1 392254 . TCTATTTTTACTTTTTTGAGGAGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 392293 . CATAGGGGATGTACTTGTATTCTTCAAAATTGCTAAGAGTAGATTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 392350 . CAT . . . NS=1;AN=0 GT:PS ./.:. -1 392395 . CATGGCTATTTTACAATGAATATA . . . NS=1;AN=0 GT:PS ./.:. -1 392427 . A . . END=392699;NS=1;AN=0 GT:PS ./.:. -1 392734 . ACATAACACTTTCCAACTTGTTTTATAT . . . NS=1;AN=0 GT:PS ./.:. -1 392831 . ACATAAT . . . NS=1;AN=0 GT:PS ./.:. -1 392862 . ATATAATCCAACAATATACTCCAAAGAAAAATACATCATGA . . . NS=1;AN=0 GT:PS ./.:. -1 392948 . CAAACAA . . . NS=1;AN=0 GT:PS ./.:. -1 393008 . CTGTGGGAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 393042 . TTTGTAATAATAACAAACTA . . . NS=1;AN=0 GT:PS ./.:. -1 393099 . ACACAAAAAC . . . NS=1;AN=0 GT:PS ./.:. -1 393141 . TAAACAGTTTCATTCTAACATGAGGAACA . . . NS=1;AN=0 GT:PS ./.:. -1 393203 . CATTAAAATAATACTGTGCA . . . NS=1;AN=0 GT:PS ./.:. -1 393286 . TCAACAC . . . NS=1;AN=0 GT:PS ./.:. -1 393332 . CTAACAACACATCAAA . . . NS=1;AN=0 GT:PS ./.:. -1 393380 . TCCAGGGGTGCTTGAATGGTTCAAAATAGA . . . NS=1;AN=0 GT:PS ./.:. -1 393562 . A . . . NS=1;AN=0 GT:PS ./.:. -1 393617 . GTTAGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 393677 . ATTCAAT . . . NS=1;AN=0 GT:PS ./.:. -1 393797 . A . . . NS=1;AN=0 GT:PS ./.:. -1 393883 . TGTATTTCTATACACCAAATACAT . . . NS=1;AN=0 GT:PS ./.:. -1 394001 . G . . NS=1;CGA_WINEND=396000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.83:1.08:.:0:.:0:0.999:211 -1 394177 . TTACCAACGTTATTTCTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 394229 . AGCCAAAAAAAGATCCCTAATAGCCAAAGCACTTCTAAGCAAAAAACAAA . . . NS=1;AN=0 GT:PS ./.:. -1 394314 . ATATGGT . . . NS=1;AN=0 GT:PS ./.:. -1 394389 . AA . . . NS=1;AN=0 GT:PS ./.:. -1 394454 . ACCCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 394582 . GTCCGACATATTCATGT . . . NS=1;AN=0 GT:PS ./.:. -1 394618 . ATACGAATGGATGAATTACACCAA . . . NS=1;AN=0 GT:PS ./.:. -1 394794 . TATTAAAAAACACATACAATGCAATGGAAGGACAGT . . . NS=1;AN=0 GT:PS ./.:. -1 394880 . AATGGGAGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 394944 . TTTATTTTTCTGAGACATGGTGGCCCAGGCTGGAGTACGATA . . . NS=1;AN=0 GT:PS ./.:. -1 395131 . TCTCTTTTTTATTATTCCTTGAACACTAAGTATATGTTTATTT . . . NS=1;AN=0 GT:PS ./.:. -1 395229 . CCCCATCCCCTTTTTACTTTTCGTCATGTCATATAGTTTCTGAAATA . . . NS=1;AN=0 GT:PS ./.:. -1 395303 . TCAAAGC . . . NS=1;AN=0 GT:PS ./.:. -1 395352 . TTCATCC . . . NS=1;AN=0 GT:PS ./.:. -1 395388 . CACCAGGTCATGTTAATTTTACTGTCACGCACATATCT . . . NS=1;AN=0 GT:PS ./.:. -1 395472 . GATAATC . . . NS=1;AN=0 GT:PS ./.:. -1 395560 . GCTAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 395597 . AACAAAAAACCAAACACCGCATGTTTTCACTCAT . . . NS=1;AN=0 GT:PS ./.:. -1 395644 . ACAATGAGAACACTTGGACACAGGAAGGGGAATATC . . . NS=1;AN=0 GT:PS ./.:. -1 395714 . GGGAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 395777 . ACACGGCACATGTATACATATGTAACCTGCACGTT . . . NS=1;AN=0 GT:PS ./.:. -1 395830 . 
ACTTAAAGTATAATAAAAATAAATATTAAAAAAAGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 395895 . TCCTACT . . . NS=1;AN=0 GT:PS ./.:. -1 395911 . GCTATAAATTTCTTATAATTTAATAAATATTTTTCCATAA . . . NS=1;AN=0 GT:PS ./.:. -1 395982 . C . . . NS=1;AN=0 GT:PS ./.:. -1 395996 . ATTATCAGGCAGCAGTACAAAGCTGCTTACACCTCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 396001 . C . . NS=1;CGA_WINEND=398000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.77:0.98:.:0:.:0:0.999:211 -1 396043 . TTTATTTTAGGCCTCCTTGCATTT . . . NS=1;AN=0 GT:PS ./.:. -1 396089 . TGAAAAACTTACACATTTCTTCATTTGGTCAATCCTTGCACACTTTCAAGATTCATTTGAG . . . NS=1;AN=0 GT:PS ./.:. -1 396163 . TGGA . . . NS=1;AN=0 GT:PS ./.:. -1 396169 . C . . . NS=1;AN=0 GT:PS ./.:. -1 396182 . CTTGTATTATTAATTTATCTTAGTTTTCACTTAT . . . NS=1;AN=0 GT:PS ./.:. -1 396262 . GACAGTAAGCTTTTGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 396375 . TGGAGCA . . . NS=1;AN=0 GT:PS ./.:. -1 396486 . AACCTGTT . . . NS=1;AN=0 GT:PS ./.:. -1 396555 . CTGCCTCTCTCTCTTTCTATATATATAACAT . . . NS=1;AN=0 GT:PS ./.:. -1 396668 . A . . . NS=1;AN=0 GT:PS ./.:. -1 396709 . AAATAAAAATGAATCATTAATACACCACTTGGCACAGAT . . . NS=1;AN=0 GT:PS ./.:. -1 396794 . GGGTTTA . . . NS=1;AN=0 GT:PS ./.:. -1 396862 . TATCTTTGCTCTGAATTTGTTTTTTTTCACT . . . NS=1;AN=0 GT:PS ./.:. -1 396932 . TGATATTTTATTTAAAAACTGTTACTATAGAAAATTA . . . NS=1;AN=0 GT:PS ./.:. -1 397014 . CTGTATT . . . NS=1;AN=0 GT:PS ./.:. -1 397060 . TT . . . NS=1;AN=0 GT:PS ./.:. -1 397093 . CCTTGATTTTTGTGGCTATCATTTTATTCATTCCATTGCTTTTTTCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 397179 . AACGATAGATTGTTTTGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 397215 . GTGCCTGTGATGAGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 397275 . TTTAAAAACATAAATTTTATTC . . . NS=1;AN=0 GT:PS ./.:. -1 397327 . TAAAGTATGATAAAAATATACATATATAACAAATTAAAAAC . . . NS=1;AN=0 GT:PS ./.:. -1 397398 . GCTATTTTTATTC . . . NS=1;AN=0 GT:PS ./.:. -1 397490 . GTAAATGTGATTTACTGATGAAGATGCCACCCCCAA . . . NS=1;AN=0 GT:PS ./.:. -1 397546 . CCTGCGT . . . NS=1;AN=0 GT:PS ./.:. -1 397948 . TATTTATTTTTAAGTTGAAAAACAAAACACACACACACAATAC . . . NS=1;AN=0 GT:PS ./.:. -1 398001 . A . . NS=1;CGA_WINEND=400000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.80:0.93:.:0:.:0:0.999:211 -1 398095 . CTCCTCAACTGCAGCCACCT . . . NS=1;AN=0 GT:PS ./.:. -1 398126 . AGTGACGATAAACATGACTGTTCCAGTAGAGGCGTC . . . NS=1;AN=0 GT:PS ./.:. -1 398221 . TGCCCAT . . . NS=1;AN=0 GT:PS ./.:. -1 398322 . GGGATTTTTGTCACTTGGTGTCCACACTGTGAATTTTGCTGGTGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 398406 . T . . . NS=1;AN=0 GT:PS ./.:. -1 398436 . TTTGACTAGGACACGTGCTTGATTCTATTTTTGGTAAGAT . . . NS=1;AN=0 GT:PS ./.:. -1 398575 . T . . . NS=1;AN=0 GT:PS ./.:. -1 398577 . A . . . NS=1;AN=0 GT:PS ./.:. -1 398583 . T . . . NS=1;AN=0 GT:PS ./.:. -1 398612 . GTACATGTAG . . . NS=1;AN=0 GT:PS ./.:. -1 398627 . GTATAGACACACGTGTGTCTATGCATATAATATATAATAAGTACATATTATATATTTTTTAAGAGAAGGAAAGAGCATGTGAGAGAGCACATTGCTTATTTATGTTGATAATACTGATTCTAATGTAACTCAGTGGGCTGCTTCTTTCCTTGTCTTA . . . NS=1;AN=0 GT:PS ./.:. -1 398791 . TCTGTATCTCTGTCCTTTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 398826 . GTTCTAGCAATACAAATATATTAATATTTACTCATTGG . . . NS=1;AN=0 GT:PS ./.:. -1 398871 . TTACAATGCACATACATTTGTTTCAGAATTGTTATATTTATACTAATATAAAAAGAAAAAATCCT . . . NS=1;AN=0 GT:PS ./.:. -1 398941 . GAAGATTTAATAATTTTTTGCA . . . NS=1;AN=0 GT:PS ./.:. -1 399024 . TAAGAGAACAGTCAAAAATTAATCAGATTAATTATTATTTTCCCCTTA . . . NS=1;AN=0 GT:PS ./.:. -1 399177 . T . . . NS=1;AN=0 GT:PS ./.:. -1 399187 . A . . . NS=1;AN=0 GT:PS ./.:. -1 399280 . ATTTCCCTCTAATTCTTTATAG . . . NS=1;AN=0 GT:PS ./.:. -1 399394 . ATCCACAGCAATGTTTTATTTCCATGTTTT . . . NS=1;AN=0 GT:PS ./.:. 
-1 399492 . GTGAGGTAGGGGTCCAGCTTCATTTTTTTTGCAT . . . NS=1;AN=0 GT:PS ./.:. -1 399627 . T . . . NS=1;AN=0 GT:PS ./.:. -1 399668 . TTTTAAAAGAACCAACTTTTAGTTTTGTTGATCTCTGTATA . . . NS=1;AN=0 GT:PS ./.:. -1 399744 . AATGTTTATT . . . NS=1;AN=0 GT:PS ./.:. -1 399831 . ACATGTT . . . NS=1;AN=0 GT:PS ./.:. -1 399904 . TATCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 400001 . A . . NS=1;CGA_WINEND=402000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.94:1.11:.:0:.:0:0.999:353 -1 400088 . AATCTTTTTAAGTTTATTGAGCTTATCTTATGGCCTAACATAT . . . NS=1;AN=0 GT:PS ./.:. -1 400203 . TCTATAG . . . NS=1;AN=0 GT:PS ./.:. -1 400363 . TATTTTGGAGCCCTGTTAGTTGGTGTACATATATT . . . NS=1;AN=0 GT:PS ./.:. -1 400415 . CTTGTTTAATTTTGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 400570 . ATTGTGTATTTTGATC . . . NS=1;AN=0 GT:PS ./.:. -1 400624 . TTTTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 400725 . TTGCTTTTGTTTTCTGT . . . NS=1;AN=0 GT:PS ./.:. -1 400878 . TACCATGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 400911 . TGGAACAATTTAGAATAAATTGAAACAACTTCAATAG . . . NS=1;AN=0 GT:PS ./.:. -1 401021 . ATACATC . . . NS=1;AN=0 GT:PS ./.:. -1 401072 . TTTTAAAACATATAGAATATTTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 401110 . AATATGATAATTGCTTTTATATTTACCTATGTAGTTATCCTTA . . . NS=1;AN=0 GT:PS ./.:. -1 401190 . TAGTGTTTTTTAGTTTGTATCTGAAGGAGTCTTTTTTTTTAAAAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 401271 . T . . END=401471;NS=1;AN=0 GT:PS ./.:. -1 401519 . TTACAGAATTTTTGGTTGACAGTGTTTTTTTCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 401565 . AATGTCATTCTA . . . NS=1;AN=0 GT:PS ./.:. -1 401599 . TATAATGAGCTCTCTGCTGTCAAACTTACTGAGGATC . . . NS=1;AN=0 GT:PS ./.:. -1 401676 . G . . . NS=1;AN=0 GT:PS ./.:. -1 401686 . ATT . . . NS=1;AN=0 GT:PS ./.:. -1 401692 . A . . . NS=1;AN=0 GT:PS ./.:. -1 401695 . C . . . NS=1;AN=0 GT:PS ./.:. -1 401709 . A . . . NS=1;AN=0 GT:PS ./.:. -1 401737 . CGCGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 401780 . TTTGACAATTTGAAACGTAG . . . NS=1;AN=0 GT:PS ./.:. -1 401829 . TTCCTTGATGAAGTGCACTGAGCTTTTTGAATGTATAT . . . NS=1;AN=0 GT:PS ./.:. -1 401896 . TCAAACACATTTCTTCAAATAGTCTTCCTGCCCCTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 401962 . TCTTGATGTTGTTACACAGGTCTTTAGGCTCTATTCTTTTTTGTTGT . . . NS=1;AN=0 GT:PS ./.:. -1 402001 . T . . NS=1;CGA_WINEND=404000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.93:0.98:.:0:.:0:0.999:353 -1 402046 . TCACTTG . . . NS=1;AN=0 GT:PS ./.:. -1 402110 . AAGTAAAAAGTAACTTTTTAATTA . . . NS=1;AN=0 GT:PS ./.:. -1 402176 . AAAAATTCATATTTCTTATTTATTTATATGT . . . NS=1;AN=0 GT:PS ./.:. -1 402242 . TTAAGAC . . . NS=1;AN=0 GT:PS ./.:. -1 402369 . G . . . NS=1;AN=0 GT:PS ./.:. -1 402406 . CTGGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 402455 . TGGGATG . . . NS=1;AN=0 GT:PS ./.:. -1 402581 . ATTTAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 402613 . CAAAAAAGAAAAAAATATGAAAAACATC . . . NS=1;AN=0 GT:PS ./.:. -1 402687 . TAGTGGAGCCATATAT . . . NS=1;AN=0 GT:PS ./.:. -1 402802 . CCTTGGC . . . NS=1;AN=0 GT:PS ./.:. -1 402926 . TCAATTG . . . NS=1;AN=0 GT:PS ./.:. -1 402975 . TGGATGATCTGATTTCTAAAGGTTTGGTGGAATTCTCTGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 403033 . TGTTTGTGTGTGTGTTATA . . . NS=1;AN=0 GT:PS ./.:. -1 403103 . CTTCTTTTTTTTTTTTTTTTTTGAAGTCAATTG . . . NS=1;AN=0 GT:PS ./.:. -1 403284 . TCTCTTTTTTTATTC . . . NS=1;AN=0 GT:PS ./.:. -1 403389 . CTTATTTTATTAATTTTCACTTTTTTCTACTATGG . . . NS=1;AN=0 GT:PS ./.:. -1 403451 . TTTTAGATACTTTTTACATTTTTCAGCTGATAATTTAATTCCTACACTTT . . . NS=1;AN=0 GT:PS ./.:. -1 403699 . TTTTGATT . . . NS=1;AN=0 GT:PS ./.:. -1 403763 . G . . . NS=1;AN=0 GT:PS ./.:. -1 403770 . ATTA . . . NS=1;AN=0 GT:PS ./.:. -1 403922 . AGTGTCTGTCTCTTCCGAT . . . NS=1;AN=0 GT:PS ./.:. 
-1 404001 . A . . NS=1;CGA_WINEND=406000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.67:1.01:.:0:.:0:0.999:353 -1 404027 . AT . . . NS=1;AN=0 GT:PS ./.:. -1 404113 . TTTCGAA . . . NS=1;AN=0 GT:PS ./.:. -1 404156 . TCTCGTGCGTA . . . NS=1;AN=0 GT:PS ./.:. -1 404243 . CTGCGAT . . . NS=1;AN=0 GT:PS ./.:. -1 404285 . GATCAGACCCTGAGACTGCGTTAGC . . . NS=1;AN=0 GT:PS ./.:. -1 404333 . TCTGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 404406 . CAATATTCCCTAGGAAGCTGACCTCTGCTGAATGCAACACTCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 404470 . TAGTTGA . . . NS=1;AN=0 GT:PS ./.:. -1 404529 . TACTTAACCATTA . . . NS=1;AN=0 GT:PS ./.:. -1 404567 . TCTATCCATGGCCTCAGTTCCTGTTGGGGAGCCTCG . . . NS=1;AN=0 GT:PS ./.:. -1 404628 . TTTCTAATT . . . NS=1;AN=0 GT:PS ./.:. -1 404658 . TTGCGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTGTT . . . NS=1;AN=0 GT:PS ./.:. -1 404727 . AGGCTGAATAATTTTAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 404757 . ATTTGGTTCACAGTTCTGAAGGTGTGCAAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 404799 . ACCATTTGCTTCTGGTGAGGGCTTTAGTCTGTTTCCACTC . . . NS=1;AN=0 GT:PS ./.:. -1 404872 . GAGATCACGTGGCAAGAGAGAGGGGTTTGTACCA . . . NS=1;AN=0 GT:PS ./.:. -1 404932 . G . . . NS=1;AN=0 GT:PS ./.:. -1 404968 . GCTTACACCT . . . NS=1;AN=0 GT:PS ./.:. -1 404988 . CACTTTGGGAGGCCGAGGCAGGTGGATCACCTGAGGTC . . . NS=1;AN=0 GT:PS ./.:. -1 405052 . ATGGTGAAACCCCGTCTCTACTAAAAATACCAAAAATTAGCTGGGCATAGTGGTGGGTACC . . . NS=1;AN=0 GT:PS ./.:. -1 405328 . AGGCCCCACCTCCAACAATGGGGATCACATTTC . . . NS=1;AN=0 GT:PS ./.:. -1 405391 . ACCCTAGCAGTTCCCTTAACCCTGGAAAGAGACCCT . . . NS=1;AN=0 GT:PS ./.:. -1 405479 . GACCCTGACGGATAGAGGGACCATACAGATCACTAAAATGCTGAGGAAT . . . NS=1;AN=0 GT:PS ./.:. -1 405576 . AAACTTTTATTTTTTATTTTATTTTATTTTATTTTTTGAGACGGAATCTCGCCCTGTCACCC . . . NS=1;AN=0 GT:PS ./.:. -1 405669 . C . . . NS=1;AN=0 GT:PS ./.:. -1 405704 . CCTGCCTTAGCCTCCCGAGTAGCTGGGACTACAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 405748 . GCAGGCCCGGCTAATTTTTTATTTTTAGTAGAGATGG . . . NS=1;AN=0 GT:PS ./.:. -1 405800 . ACCGGGCTGGTCTTCAACTCCTGACTTCATGATCCACCC . . . NS=1;AN=0 GT:PS ./.:. -1 405873 . TGAGCCGCTGCACCCAGCCAAACTTAAAAAAAAAAACCCAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 405947 . TCAAATT . . . NS=1;AN=0 GT:PS ./.:. -1 406001 . T . . NS=1;CGA_WINEND=408000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.71:1.03:.:0:.:0:0.999:353 -1 406006 . TTAATCA . . . NS=1;AN=0 GT:PS ./.:. -1 406089 . AGAGGCAGTATAGTATCATGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 406140 . TAGGTTTGAATTACTAGCCACGAGGCTTTGGGAACA . . . NS=1;AN=0 GT:PS ./.:. -1 406199 . ATTTCTTTATCTA . . . NS=1;AN=0 GT:PS ./.:. -1 406219 . TAGATACCTTACCTTTATCTATACCTTATAGATACCTTTATCTATAAGGTATAGAT . . . NS=1;AN=0 GT:PS ./.:. -1 406288 . ACCTGAAAGTTTTGGCATGAGTTTAGTAAAACTGTCTGTGAAGCCCTTG . . . NS=1;AN=0 GT:PS ./.:. -1 406367 . AAATGGTGGCTTTATATAGAGTAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 406405 . TCTCAAAAAGAAATCAGGGAAATAAGAATGC . . . NS=1;AN=0 GT:PS ./.:. -1 406481 . TTGAGTGTGGGGTGAGGAGTAGGGGAGGGGAGGAGAT . . . NS=1;AN=0 GT:PS ./.:. -1 406547 . GGAGAAAAATGATGTCACTGGGAACTGCAGTCATTTGAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 406681 . TCCCACCTCCTCTCCAACCATCTCTTCCCTTCCTTAATTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 406742 . AAGGCATAGTGCTTTGATTTATAAATTAGTTCTATGT . . . NS=1;AN=0 GT:PS ./.:. -1 406908 . AGAGATTCTCCTGCCTCAGCCTCCCAAGTAGCTGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 406971 . CTAGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 406993 . AGACAAGATTTCACCATGTTGGCCAGGCTGGTCTGGAACTCCTGACC . . . NS=1;AN=0 GT:PS ./.:. -1 407123 . TAACACATTATTTCCACTTTCCTAAGGATAG . . . NS=1;AN=0 GT:PS ./.:. -1 407175 . C . . . NS=1;AN=0 GT:PS ./.:. -1 407235 . GTGATTTTTTTTTTTTTTTTTGAGATAGGA . 
. . NS=1;AN=0 GT:PS ./.:. -1 407358 . CTCTCAAGT . . . NS=1;AN=0 GT:PS ./.:. -1 407406 . AAAATTTTTTTTTGGCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 407462 . TGCGAAC . . . NS=1;AN=0 GT:PS ./.:. -1 407497 . GGCATCTCAAAATGCT . . . NS=1;AN=0 GT:PS ./.:. -1 407529 . AATATTTTTATTGTCACTATTTTCCAC . . . NS=1;AN=0 GT:PS ./.:. -1 407564 . GAAATTTTATTTGGATTTCTTTTTTTTTTTTTTTTGACA . . . NS=1;AN=0 GT:PS ./.:. -1 407648 . AGCTCACTGCAACCTCCACCTTC . . . NS=1;AN=0 GT:PS ./.:. -1 407687 . TCCTGCCTCAGCCTCCTAAGTAGCTGGGATTACAGGCATGCGCCA . . . NS=1;AN=0 GT:PS ./.:. -1 407750 . T . . . NS=1;AN=0 GT:PS ./.:. -1 407753 . A . . . NS=1;AN=0 GT:PS ./.:. -1 407775 . C . . . NS=1;AN=0 GT:PS ./.:. -1 407812 . CAAGTGACCTGCCCA . . . NS=1;AN=0 GT:PS ./.:. -1 407883 . TTTCTTTTTGACATAGA . . . NS=1;AN=0 GT:PS ./.:. -1 407920 . TAAATTTTCATGCTGTAATTTCTAGTTTTGTTGTGTC . . . NS=1;AN=0 GT:PS ./.:. -1 408001 . T . . NS=1;CGA_WINEND=410000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.71:0.91:.:0:.:0:0.999:353 -1 408008 . AT . . . NS=1;AN=0 GT:PS ./.:. -1 408073 . TCAATGTTTACTTTTTCAGGCTATAGGCTTTGCTACATAA . . . NS=1;AN=0 GT:PS ./.:. -1 408124 . TTTTTGGTCCTCATATAGATTTTTTAATTACCTT . . . NS=1;AN=0 GT:PS ./.:. -1 408178 . AAGGGTA . . . NS=1;AN=0 GT:PS ./.:. -1 408204 . A . . . NS=1;AN=0 GT:PS ./.:. -1 408207 . A . . . NS=1;AN=0 GT:PS ./.:. -1 408223 . C . . . NS=1;AN=0 GT:PS ./.:. -1 408271 . AAATTGT . . . NS=1;AN=0 GT:PS ./.:. -1 408393 . GAAAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 408425 . TTGTGTTGCTGAGAACTGCTCAGTAACACGGA . . . NS=1;AN=0 GT:PS ./.:. -1 408487 . TTCAGTGTCTTGAGTATTGTGAAACACA . . . NS=1;AN=0 GT:PS ./.:. -1 408600 . ATGCAGGTCTACTGTCCAAA . . . NS=1;AN=0 GT:PS ./.:. -1 408713 . GAGGTATATTTTTATAA . . . NS=1;AN=0 GT:PS ./.:. -1 408820 . GGAGAGTCTGTACTAGGCATGGCGA . . . NS=1;AN=0 GT:PS ./.:. -1 408858 . GCCCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 408993 . CAACTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 409063 . C . . . NS=1;AN=0 GT:PS ./.:. -1 409097 . TCTGACATCATTATACACTTTCCAATGAAAGCAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 409164 . CACTCCTGAAAAATCTCCCCT . . . NS=1;AN=0 GT:PS ./.:. -1 409193 . ACAGTGACTTATTAACCAACACTCATGAT . . . NS=1;AN=0 GT:PS ./.:. -1 409482 . GGGCTAAATGACATTGTTCCTACAGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 409527 . A . . . NS=1;AN=0 GT:PS ./.:. -1 409573 . AG . . . NS=1;AN=0 GT:PS ./.:. -1 409580 . T . . . NS=1;AN=0 GT:PS ./.:. -1 409584 . A . . . NS=1;AN=0 GT:PS ./.:. -1 409589 . AAC . . . NS=1;AN=0 GT:PS ./.:. -1 409604 . T . . . NS=1;AN=0 GT:PS ./.:. -1 409655 . TACTAAAAATATAAAAAAATTAT . . . NS=1;AN=0 GT:PS ./.:. -1 409723 . TGAGACAGGAGAATCACTTGAACCTGGGAGGCAGAAGTTGCAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 409798 . AGGCGACA . . . NS=1;AN=0 GT:PS ./.:. -1 409812 . GACTCTGTCCAAAAAAAAAAAAGAAGAAAAAAAGAGAAAGGAAAAAAAAAGGAAAAATAAATAAATAAGTAAATAAATAAATACATA . . . NS=1;AN=0 GT:PS ./.:. -1 409906 . CCCTAAC . . . NS=1;AN=0 GT:PS ./.:. -1 409971 . TTTCCCC . . . NS=1;AN=0 GT:PS ./.:. -1 410001 . G . . NS=1;CGA_WINEND=412000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.77:1.04:.:0:.:0:0.999:353 -1 410012 . TCTTTCTCTCCC . . . NS=1;AN=0 GT:PS ./.:. -1 410045 . CACTACCACATGCACACACACAAATA . . . NS=1;AN=0 GT:PS ./.:. -1 410093 . TTTCTAAAAACCTG . . . NS=1;AN=0 GT:PS ./.:. -1 410149 . TAATCAGGCAACTCTGGTTTCTATCGGAG . . . NS=1;AN=0 GT:PS ./.:. -1 410208 . AGGGCATCCCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 410259 . CACATTAGTTAGTCAGTCAGTCAATAA . . . NS=1;AN=0 GT:PS ./.:. -1 410306 . AGCCGAAGTGAACACCAACAGAAAATCAC . . . NS=1;AN=0 GT:PS ./.:. -1 410393 . G . . . NS=1;AN=0 GT:PS ./.:. -1 410417 . CTAATGAAAGTGGCTAGAAAAATATGGGTGCCCATGT . . . 
NS=1;AN=0 GT:PS ./.:. -1 410471 . TGTATGA . . . NS=1;AN=0 GT:PS ./.:. -1 410586 . CTGGCAGGCTAAGGGTTCTGAGGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 410872 . GCTCAACATGGAGCTCTG . . . NS=1;AN=0 GT:PS ./.:. -1 410928 . GGCTACA . . . NS=1;AN=0 GT:PS ./.:. -1 410965 . ATTCTTAAAGCTTCTGCTGGAAGTGACATGTC . . . NS=1;AN=0 GT:PS ./.:. -1 411028 . CACGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 411157 . GGTATAATTT . . . NS=1;AN=0 GT:PS ./.:. -1 411208 . AAATCACAGTAAATTTCCTTTAAAAGATCTAT . . . NS=1;AN=0 GT:PS ./.:. -1 411270 . GAGTCCTTAGTAATGGACTCCATCTCTTCCATCAGATAAAATGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 411330 . AATTTGAATAATGAAACCAAAGGAAAAAAAATTAA . . . NS=1;AN=0 GT:PS ./.:. -1 411390 . GAACAACTGTGGCATCAGCATAATTCAATTT . . . NS=1;AN=0 GT:PS ./.:. -1 411426 . TGTATTAAATATTTTGCAGAAAAGTGAAAACAAATTGATAGCCAAATCAA . . . NS=1;AN=0 GT:PS ./.:. -1 411490 . ACCATGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 411527 . AAACACTAGTTTAGTTATATAAACATGGCTGATGTTTA . . . NS=1;AN=0 GT:PS ./.:. -1 411616 . A . . . NS=1;AN=0 GT:PS ./.:. -1 411731 . AAATGTGGCACTAGTATTTATTACAGCTCACCTTTTTATAATGAAGGGC . . . NS=1;AN=0 GT:PS ./.:. -1 411792 . ATTCTTATTATTTCCCATTTTCTTACCA . . . NS=1;AN=0 GT:PS ./.:. -1 411935 . TTTTATTAGGTATTAGTCTACCAACTT . . . NS=1;AN=0 GT:PS ./.:. -1 412000 . TTCCACCCAGTTATCCTGCTGGAACCTGTAACAGTTACTGTAATGTTA . . . NS=1;AN=0 GT:PS ./.:. -1 412001 . T . . NS=1;CGA_WINEND=414000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.81:0.92:.:0:.:0:0.999:353 -1 412124 . G . . . NS=1;AN=0 GT:PS ./.:. -1 412179 . CAGTAGCTAGAGAAGTGCTGGAATGCCCCTGTTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 412236 . AGAATGGCTGCTTGAACGAATTTACTGCTCAACTCGAAAGGCCATTT . . . NS=1;AN=0 GT:PS ./.:. -1 412390 . CACCCATTCATTATTATCTAATGAGGAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 412497 . CAGGTGAGCAGAGGGCCCCACCTTCGGAGGTTGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 412607 . TATAAAGATAAGATCATTGAGAGATTTTTTTCCTCCGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 412797 . CTGCCCTACCCCTAGCCTGTGTGCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 412841 . TGGCAGAGGAGTCCCAGGGGAAGGATGCATGATCTTCCACTGTGCCTCCCA . . . NS=1;AN=0 GT:PS ./.:. -1 412920 . AGCAGGTCTCCCTTAA . . . NS=1;AN=0 GT:PS ./.:. -1 412979 . GTTCCCTCCTTGTCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 413127 . CTCAGGAAAGCTCAGAGCTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 413162 . TGATCCATTAGTTAAAACCACGCTGGGTTCTTTATAGTGGTTAGTTAG . . . NS=1;AN=0 GT:PS ./.:. -1 413285 . CGATAAAATACATGGTGCTTTTCAGAAATTGCCCCAT . . . NS=1;AN=0 GT:PS ./.:. -1 413358 . TCTGATGGATAGAGATCTAGGCCTGACACTCCAAGCAGT . . . NS=1;AN=0 GT:PS ./.:. -1 413423 . TTCTTACATCTTAACCCTCAGGAATTTTAAATAGAAGTGTTCCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 413543 . CACAGCGTGTTGGGAATGGTGATTATAAATGTAACCATGCT . . . NS=1;AN=0 GT:PS ./.:. -1 413592 . GTAAGTGGAGAGC . . . NS=1;AN=0 GT:PS ./.:. -1 413708 . TTTTGTTTTGCCCTATTCC . . . NS=1;AN=0 GT:PS ./.:. -1 413867 . TGGTCGGGCACGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 413925 . A . . . NS=1;AN=0 GT:PS ./.:. -1 413945 . GGAGACC . . . NS=1;AN=0 GT:PS ./.:. -1 413991 . ATACAAAAAAAAAAAAAAATTAGCCAGGCGTGGTGGTAGGT . . . NS=1;AN=0 GT:PS ./.:. -1 414001 . A . . NS=1;CGA_WINEND=416000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.82:1.07:.:0:.:0:0.999:353 -1 414068 . G . . . NS=1;AN=0 GT:PS ./.:. -1 414108 . TAGATCGCGCCACTGCACTCCAGCCTGGGCAACAGAGCAAGA . . . NS=1;AN=0 GT:PS ./.:. -1 414155 . ACTCAAAAAAAAAAAAAAAAAAAGAAAAGAAAAGGAGCCTCTTTG . . . NS=1;AN=0 GT:PS ./.:. -1 414239 . TGCCAAAAAAAATATGAAGGAAGCAATGTGATGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 414393 . ATCAAGGCTCCCATCAACAGCCAGTCCTGTGAGTGAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 414464 . TGCTGAACACAGC . . . NS=1;AN=0 GT:PS ./.:. -1 414503 . 
AATCATGTGGTCAGTTTGCAGGGTGGTTATTACACAGCAGTAGATGA . . . NS=1;AN=0 GT:PS ./.:. -1 414611 . CGGTGAGCAGAATATGGATGTGGGAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 414663 . AAGCACTGAACAGAGCACAAAGACCTGATGTTCCAGGGTCGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 414722 . TAACAGCGGCCATAGGCGGA . . . NS=1;AN=0 GT:PS ./.:. -1 414752 . AGAGTAACGTGGTCAGATTTTCATTT . . . NS=1;AN=0 GT:PS ./.:. -1 414790 . CTGACATCCATGTGGAGAAT . . . NS=1;AN=0 GT:PS ./.:. -1 414860 . GATCGTAGTAATCTACTTAAGAGTT . . . NS=1;AN=0 GT:PS ./.:. -1 414899 . GAATGGGGAGGTGTTAGAGAAGAGAAAATGGATTTGAAGAGC . . . NS=1;AN=0 GT:PS ./.:. -1 414988 . CCTGTAATCCCAGCACTTTGGGAGGCCAAGGTGGACAG . . . NS=1;AN=0 GT:PS ./.:. -1 415167 . AACCCAGGCAGTGGAGGTTGCAGTGAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 415233 . AAGACTCCGTCTAAAAAAAAAAAACAAC . . . NS=1;AN=0 GT:PS ./.:. -1 415286 . ATTGGGTGAGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 415463 . TAAAGAGGTCAGAAGTGAACTTCAATAAAATTGA . . . NS=1;AN=0 GT:PS ./.:. -1 415551 . GGAACACACCTATGAGTCAGAAAGCCAGACATTT . . . NS=1;AN=0 GT:PS ./.:. -1 415736 . TTCATTAAGCCAATAGACAACCATGGCATTTTAACT . . . NS=1;AN=0 GT:PS ./.:. -1 415795 . CTGATTTTTTAATTAGCAGAAGCAAATAAAGAGAGCTTCATTT . . . NS=1;AN=0 GT:PS ./.:. -1 415864 . T . . . NS=1;AN=0 GT:PS ./.:. -1 415919 . CAACATAATGGTTAATTTTCTGAGAAGTAAGTTCATGCTT . . . NS=1;AN=0 GT:PS ./.:. -1 416001 . A . . NS=1;CGA_WINEND=418000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.78:0.99:.:0:.:0:0.999:353 -1 416026 . TTGCTTCTG . . . NS=1;AN=0 GT:PS ./.:. -1 416066 . ACACATAATAAAGAATTGTGTTAGTGCCAGAGAGACTTT . . . NS=1;AN=0 GT:PS ./.:. -1 416139 . CTTCTTTCTGTCCCTCCACCCCACCAGCTCTGATAC . . . NS=1;AN=0 GT:PS ./.:. -1 416208 . T G . . NS=1;AN=2;AC=1;CGA_RPT=MIR|MIR|38.2;CGA_SDO=14 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:VQLOW:26:26,26:26,26:6,25:-26,0,-26:-6,0,-25:25:1,24:24 -1 416276 . A . . . NS=1;AN=0 GT:PS ./.:. -1 416292 . TAGGTACTCTTCTTCTTTTTTTTTTTTTTTTTAAATTTTACA . . . NS=1;AN=0 GT:PS ./.:. -1 416560 . ATCGAATAAAGAGCTGATCAGGA . . . NS=1;AN=0 GT:PS ./.:. -1 416697 . GCCGCGCACCCAGCAGTCGGCTGT . . . NS=1;AN=0 GT:PS ./.:. -1 416932 . TTTGCTCTTACTTTCAACATTTCTGCTGGGGCCTTGCATTGAGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 417055 . TTATTTATTATTTATTTATCAGA . . . NS=1;AN=0 GT:PS ./.:. -1 417366 . ATCTTGTAGGAAAGACAAAGGTAGAGAATCTGTCTGATGGCAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 417461 . GCTATGAAAC . . . NS=1;AN=0 GT:PS ./.:. -1 417516 . CTGTAGGT . . . NS=1;AN=0 GT:PS ./.:. -1 417832 . G . . . NS=1;AN=0 GT:PS ./.:. -1 417914 . TTACGAG . . . NS=1;AN=0 GT:PS ./.:. -1 417965 . AATTCAG . . . NS=1;AN=0 GT:PS ./.:. -1 418001 . G . . NS=1;CGA_WINEND=420000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.80:1.01:.:0:.:0:0.999:353 -1 418066 . CATCCTT . . . NS=1;AN=0 GT:PS ./.:. -1 418284 . GAAGTACTG . . . NS=1;AN=0 GT:PS ./.:. -1 418476 . GCTTAAAACAATAGTCAAGAAAATTAGAGCCACAAACT . . . NS=1;AN=0 GT:PS ./.:. -1 418610 . TAAAGTATAATTAAAAAAAAAAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 418788 . AACACTCACAACTGTTCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 418819 . CTTCCTGAGGGACCTGGAGGAAGTCACACCC . . . NS=1;AN=0 GT:PS ./.:. -1 418879 . T . . . NS=1;AN=0 GT:PS ./.:. -1 418932 . GGTTTTTGTTTGTTTGTTTGAGACAGAGTCTCACTCT . . . NS=1;AN=0 GT:PS ./.:. -1 418974 . CCAGGCTGGAGTGCAGTGGCACGATCTCAGCTC . . . NS=1;AN=0 GT:PS ./.:. -1 419024 . CCACGTT . . . NS=1;AN=0 GT:PS ./.:. -1 419086 . CCATGCCCAGATAATTTTTGTATTTTTAGTAGAGATGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 419135 . GTTTGTC . . . NS=1;AN=0 GT:PS ./.:. -1 419170 . TTATCCACCCACCTTGGCCTCCCAAAGCGCTGAGATTACAGGTGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 419336 . GTATTATAT . . . NS=1;AN=0 GT:PS ./.:. -1 419383 . 
CCCTTAC . . . NS=1;AN=0 GT:PS ./.:. -1 419638 . A . . . NS=1;AN=0 GT:PS ./.:. -1 419789 . A . . . NS=1;AN=0 GT:PS ./.:. -1 420001 . G . . NS=1;CGA_WINEND=422000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.86:1.01:.:0:.:0:0.999:353 -1 420131 . GCCCTGTATCAGGGGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 420165 . AAACAAAAAAGTCCCACTG . . . NS=1;AN=0 GT:PS ./.:. -1 420367 . ATACGCT . . . NS=1;AN=0 GT:PS ./.:. -1 420436 . C . . . NS=1;AN=0 GT:PS ./.:. -1 420500 . A . . . NS=1;AN=0 GT:PS ./.:. -1 420555 . GGA . . . NS=1;AN=0 GT:PS ./.:. -1 420634 . AAGGACACCTAGGGTTAGTCAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 420759 . A . . . NS=1;AN=0 GT:PS ./.:. -1 420773 . T . . . NS=1;AN=0 GT:PS ./.:. -1 420974 . TTTGTGTCTACCTCAAAGGTACGTTGCAAGGATCGAGGGACAGAGCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 421087 . TACCAGAGAATGTGAGGT . . . NS=1;AN=0 GT:PS ./.:. -1 421126 . TTAGAAAAGAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 421171 . CATAGATGGATGATCAGAATTTGTAA . . . NS=1;AN=0 GT:PS ./.:. -1 421208 . G . . . NS=1;AN=0 GT:PS ./.:. -1 421225 . A . . . NS=1;AN=0 GT:PS ./.:. -1 421325 . AGCCTCT . . . NS=1;AN=0 GT:PS ./.:. -1 421459 . G . . . NS=1;AN=0 GT:PS ./.:. -1 421580 . TCTGATG . . . NS=1;AN=0 GT:PS ./.:. -1 421737 . AGCGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 421860 . G . . . NS=1;AN=0 GT:PS ./.:. -1 421958 . ATCCTGCCTTCAAGGAATCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 422001 . G . . NS=1;CGA_WINEND=424000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.67:1.00:.:0:.:0:0.999:353 -1 422026 . AAATAAA . . . NS=1;AN=0 GT:PS ./.:. -1 422112 . AGCCCTT . . . NS=1;AN=0 GT:PS ./.:. -1 422228 . TTGCTAT . . . NS=1;AN=0 GT:PS ./.:. -1 422310 . AGCACTTTGGGAGGCCGAGGTGGGCAGACCACGAG . . . NS=1;AN=0 GT:PS ./.:. -1 422362 . AGCCTGGCCAACATGGTGAAACCCC . . . NS=1;AN=0 GT:PS ./.:. -1 422392 . TACTACACACACACACACACACACACACACACACACACACACACCTGT . . . NS=1;AN=0 GT:PS ./.:. -1 422447 . GCTACTC . . . NS=1;AN=0 GT:PS ./.:. -1 422557 . TCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAGAAAGATGTT . . . NS=1;AN=0 GT:PS ./.:. -1 422675 . AGTGAGACTCTGTCTCAAAAAAAAAAAAAAAAAATCCC . . . NS=1;AN=0 GT:PS ./.:. -1 422757 . CATATAA . . . NS=1;AN=0 GT:PS ./.:. -1 422980 . AGGCATT . . . NS=1;AN=0 GT:PS ./.:. -1 423120 . AGCGCCGTGTACCTGATACGCTACCCTGGGCACAGGCGATCAGAC . . . NS=1;AN=0 GT:PS ./.:. -1 423254 . TTTTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 423366 . A . . . NS=1;AN=0 GT:PS ./.:. -1 423370 . G . . . NS=1;AN=0 GT:PS ./.:. -1 423382 . G . . . NS=1;AN=0 GT:PS ./.:. -1 423440 . CTATTAAAAAATACA . . . NS=1;AN=0 GT:PS ./.:. -1 423482 . C . . . NS=1;AN=0 GT:PS ./.:. -1 423500 . G . . . NS=1;AN=0 GT:PS ./.:. -1 423529 . ACCAGGGAGGTGGAGGTTGCA . . . NS=1;AN=0 GT:PS ./.:. -1 423639 . AAAATAGGATGACTGCACGTC . . . NS=1;AN=0 GT:PS ./.:. -1 423811 . CAAATGATTTTTGTCCTGTAAAAAGATTTTATTGCTCCACA . . . NS=1;AN=0 GT:PS ./.:. -1 424001 . T . . NS=1;CGA_WINEND=426000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.74:1.08:.:0:.:0:0.999:353 -1 424221 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4951897&dbsnp.131|rs78102372;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:39:39,64:39,64:36,28:-64,-39,0:-36,-28,0:1:1,1:0 -1 424324 . TGGGAAAACAGGAGAAAGTGTTGTTGGATA . . . NS=1;AN=0 GT:PS ./.:. -1 424455 . GAAATGATAATATCAGTAGTAGCAATACTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 424621 . CGTATTTTTACA . . . NS=1;AN=0 GT:PS ./.:. -1 424674 . CAAATCCACTGTCTATGTCCCTCCCTCCCTCCCTCCCTCCCTCCCTTCCTTCCTTCCTTCCTTCCTTGCTC . . . NS=1;AN=0 GT:PS ./.:. -1 424775 . TTTCTGCCTCCCTCCCT . . . NS=1;AN=0 GT:PS ./.:. -1 425118 . GAAGTGGACATGGTGGCAGACAAAGCTGCTGGAAGCTGTC . . . NS=1;AN=0 GT:PS ./.:. -1 425186 . 
CACATTCCAAAGGCAGTGTGTGTGTGATGTTGCATCTCT . . . NS=1;AN=0 GT:PS ./.:. -1 425380 . AC . . . NS=1;AN=0 GT:PS ./.:. -1 425506 . G . . . NS=1;AN=0 GT:PS ./.:. -1 425525 . GCTCTACACACACACACACACACACACACACGATT . . . NS=1;AN=0 GT:PS ./.:. -1 425713 . TTGAGTGTGAATG . . . NS=1;AN=0 GT:PS ./.:. -1 426001 . T . . NS=1;CGA_WINEND=428000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.89:0.86:.:0:.:0:0.999:353 -1 426037 . CTTACAGTGGGTCTCCTGATGAAGCTGAGGTCATGGGT . . . NS=1;AN=0 GT:PS ./.:. -1 426133 . AAACTTGGCTGCACATTGGAATTCCTAGGGAGAGTTTTAAACATC . . . NS=1;AN=0 GT:PS ./.:. -1 426351 . GAAGTCCATCCTCACAGCGGTCAGGCCTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 426403 . GGCATATGCCTAAGTTTCTGTGCAATGAATGCATGCCTTA . . . NS=1;AN=0 GT:PS ./.:. -1 426773 . AACTCAA . . . NS=1;AN=0 GT:PS ./.:. -1 426894 . GGTGGTC . . . NS=1;AN=0 GT:PS ./.:. -1 427077 . GATGGAGAGAAAGG . . . NS=1;AN=0 GT:PS ./.:. -1 427200 . AGGCGGA . . . NS=1;AN=0 GT:PS ./.:. -1 427315 . GAGCGCA . . . NS=1;AN=0 GT:PS ./.:. -1 427342 . CCACGACAGTGGGGAGTGAGGTGAGGCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 427390 . ACTCACCCAGCACATTCCCACCTCCTC . . . NS=1;AN=0 GT:PS ./.:. -1 427432 . AGGGACAAACTGACCAGAAGGCTATT . . . NS=1;AN=0 GT:PS ./.:. -1 427492 . T . . . NS=1;AN=0 GT:PS ./.:. -1 427612 . TTCCCTACC . . . NS=1;AN=0 GT:PS ./.:. -1 427855 . AGCAGTTTTTTTTTTTTTTAGAGCTGGTGTCTTGCTCTGTTGCCCAAGTT . . . NS=1;AN=0 GT:PS ./.:. -1 428001 . G . . NS=1;CGA_WINEND=430000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.88:1.16:.:0:.:0:0.999:353 -1 428247 . TATAGGTCA . . . NS=1;AN=0 GT:PS ./.:. -1 428502 . TCTTCAAAACCAGTAGGAGGATTTCTCTTGCACTGAATCTCTTTTGCAT . . . NS=1;AN=0 GT:PS ./.:. -1 428830 . GGCCTTGCATTTTATTTCATTACCAGTGTCTATTGGCTCTGAG . . . NS=1;AN=0 GT:PS ./.:. -1 428944 . TCACGGTTGTTAGGAGGATTAAAAGTGTATAAAGATG . . . NS=1;AN=0 GT:PS ./.:. -1 429049 . ACAGAGCATTTTAAGTGAAAAAGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 429134 . TGCTGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 429200 . TTTACAG . . . NS=1;AN=0 GT:PS ./.:. -1 429359 . AACTTGA . . . NS=1;AN=0 GT:PS ./.:. -1 429439 . ATCAACA . . . NS=1;AN=0 GT:PS ./.:. -1 429509 . TCCTGGT . . . NS=1;AN=0 GT:PS ./.:. -1 429543 . GCACCTGCTGGACCTGTGGCAAGATCATCAGGGATTGC . . . NS=1;AN=0 GT:PS ./.:. -1 429609 . ACCAGTT . . . NS=1;AN=0 GT:PS ./.:. -1 429702 . GACTCTTTCTTATTCTGGGCCTCTGCTGTTATCTGGGTCTAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 429796 . TCTCGGCAGGCTGAATGCTT . . . NS=1;AN=0 GT:PS ./.:. -1 429869 . ACACGAC . . . NS=1;AN=0 GT:PS ./.:. -1 429941 . CTATTTCA . . . NS=1;AN=0 GT:PS ./.:. -1 430001 . T . . NS=1;CGA_WINEND=432000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.80:1.02:.:0:.:0:0.999:353 -1 430049 . GGGTGGGGTGACTTCCAACCACACTGGGAGCCATCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 430109 . TCCGCAGCACAGATGGGTGCAACCCAATGTCAGGAGC . . . NS=1;AN=0 GT:PS ./.:. -1 430184 . GTAGACGAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 430251 . CACGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 430306 . GGGGTACAGCTTGGTTTTATGCATTTTAGAAAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 430416 . GGGCGCT . . . NS=1;AN=0 GT:PS ./.:. -1 430523 . TTTTGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 430567 . AATGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 430622 . CACGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 430734 . A . . . NS=1;AN=0 GT:PS ./.:. -1 430943 . C . . . NS=1;AN=0 GT:PS ./.:. -1 431221 . AGATAAAAAAAAAAATCAG . . . NS=1;AN=0 GT:PS ./.:. -1 431346 . AATGGCCTTGCCTTCCTGTGCAGTCAGATGGAGCACCC . . . NS=1;AN=0 GT:PS ./.:. -1 431400 . CCTGCTGAGTAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 431777 . TGTATGTCACAGATCACCAGATGT . . . NS=1;AN=0 GT:PS ./.:. -1 431986 . ATATTGGTTAAAAACCA . . . NS=1;AN=0 GT:PS ./.:. -1 432001 . C . . 
NS=1;CGA_WINEND=434000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.91:0.88:.:0:.:0:0.999:353 -1 432024 . ACTGAAGGATGAGAATT . . . NS=1;AN=0 GT:PS ./.:. -1 432063 . CTACGCTGTTGTGAAGTTCCCCACAGGGATCTCCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 432132 . C . . . NS=1;AN=0 GT:PS ./.:. -1 432193 . GGCTCTGGGGTTGGGCCCTTCTCAT . . . NS=1;AN=0 GT:PS ./.:. -1 432325 . TTGTAAAAGTTC . . . NS=1;AN=0 GT:PS ./.:. -1 432404 . CTCTGCCTCCTCACTTGGCTACATTTTTCTCTCACTTGTGCATC . . . NS=1;AN=0 GT:PS ./.:. -1 432496 . GTACGCT . . . NS=1;AN=0 GT:PS ./.:. -1 432741 . GGGATGTGGGGGGAGGGCGTGTGTGTGGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 432847 . GCAGGTCATTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 432885 . ACAGTCTAGCCTTTTAAGAATGC . . . NS=1;AN=0 GT:PS ./.:. -1 432936 . ACGTGGGGATATGTGGGGGCGGCCATGTTGCCAGCCACCTGTTGGGGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 433036 . TGCGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 433183 . ACTCCTA . . . NS=1;AN=0 GT:PS ./.:. -1 433257 . TGCTGAGTTACCCCAGTC . . . NS=1;AN=0 GT:PS ./.:. -1 433320 . GAGATTGTG . . . NS=1;AN=0 GT:PS ./.:. -1 433430 . ACTTTTA . . . NS=1;AN=0 GT:PS ./.:. -1 433523 . CCCAGAAGCAAAGCAGAAATTTCTTTTGAGATCCAGGGTGGGAAATGGT . . . NS=1;AN=0 GT:PS ./.:. -1 433649 . AGTCACATTT . . . NS=1;AN=0 GT:PS ./.:. -1 433798 . CATTCCCCACCTTG . . . NS=1;AN=0 GT:PS ./.:. -1 433938 . C . . . NS=1;AN=0 GT:PS ./.:. -1 434001 . G . . NS=1;CGA_WINEND=436000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.83:0.99:.:0:.:0:0.999:353 -1 434150 . CTGGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 434203 . TTTTGTTGTTGTTATTTCAGATGGAGTTTCACTCGCC . . . NS=1;AN=0 GT:PS ./.:. -1 434312 . A . . . NS=1;AN=0 GT:PS ./.:. -1 434478 . AGGCGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 434696 . ACTCCTCATTGACAGCAGATGGGATTTTTAATT . . . NS=1;AN=0 GT:PS ./.:. -1 435059 . AGCGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 435283 . TTATAAAAAGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 435563 . TTTGTTTTCTTGAACCCATTATATGAATAATTTTGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 435803 . GTAATTTTTTTTTTCTTTTTTGAGACGGA . . . NS=1;AN=0 GT:PS ./.:. -1 435947 . CAGGTGCCTGCCACCATGCCTGGCTAATTTTTGTAT . . . NS=1;AN=0 GT:PS ./.:. -1 436001 . T . . NS=1;CGA_WINEND=438000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.62:0.97:.:0:.:0:0.999:353 -1 436189 . GATATTTA . . . NS=1;AN=0 GT:PS ./.:. -1 436231 . TCTCAAA . . . NS=1;AN=0 GT:PS ./.:. -1 436272 . CTTTACTGA . . . NS=1;AN=0 GT:PS ./.:. -1 436333 . T . . . NS=1;AN=0 GT:PS ./.:. -1 436350 . A . . . NS=1;AN=0 GT:PS ./.:. -1 436449 . TCCTCCCACCTCAGCCTCCGGAGTAGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 436512 . TTTTAAAATTTTTTTGTA . . . NS=1;AN=0 GT:PS ./.:. -1 436750 . T . . . NS=1;AN=0 GT:PS ./.:. -1 436768 . GCCCCGTCACACCCATGAACTCACACCGAGAACACAACATCA . . . NS=1;AN=0 GT:PS ./.:. -1 436821 . TCACGCACAGAGGACATGTGAACACTACACACACGTT . . . NS=1;AN=0 GT:PS ./.:. -1 436867 . CACACCCCGTCACACCCATGAACTCACACCAAGAACACAACATCAC . . . NS=1;AN=0 GT:PS ./.:. -1 436924 . CACGCAC . . . NS=1;AN=0 GT:PS ./.:. -1 436970 . ACACGCCCCGTCACACTCAT . . . NS=1;AN=0 GT:PS ./.:. -1 437005 . ACACAACATCACACACAGGGACTCACGCACAGAGGACATGTGAACA . . . NS=1;AN=0 GT:PS ./.:. -1 437057 . ACACGTTCTCACACAGCACACACCC . . . NS=1;AN=0 GT:PS ./.:. -1 437095 . CCCCCACATGCCTAG . . . NS=1;AN=0 GT:PS ./.:. -1 437231 . TGGGCCCTGCGCTGGCCCCGGCCGCAGACGCCCACCTGCTGCTGTGCT . . . NS=1;AN=0 GT:PS ./.:. -1 437297 . GGGCTCT . . . NS=1;AN=0 GT:PS ./.:. -1 437345 . TGAAAGG . . . NS=1;AN=0 GT:PS ./.:. -1 437380 . GCAGAAGCATGAGCCCCAGAATGTGCACGAAGGAAGAGAGAGCCGG . . . NS=1;AN=0 GT:PS ./.:. -1 437501 . CCATCCCCGCACCCACTGGTGTGGCCTGACCCTTCA . . . NS=1;AN=0 GT:PS ./.:. -1 437560 . ACTAGGG . . . 
NS=1;AN=0 GT:PS ./.:. -1 437642 . C . . END=438074;NS=1;AN=0 GT:PS ./.:. -1 438001 . G . . NS=1;CGA_WINEND=440000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.77:0.89:.:0:.:0:0.999:353 -1 438085 . GTGCGCCGGCCGCCTGGCCTGGGCATCTCCTCTCCTGCAGCGCCGCCTGCTGGCCACAGAGAACCCGCGTGCGCCGGCCGCCAGGCCTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 438182 . CCCGGGCCCTAGTTCCCCCCCTCACCTAAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 438246 . T . . . NS=1;AN=0 GT:PS ./.:. -1 438441 . G . . . NS=1;AN=0 GT:PS ./.:. -1 438506 . AGCTGGCCCAGGGGTTCAGGCTTTCCTTTCATAAAGTGGGGTCGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 438747 . GGTGCATAGAAAACAAACACTCTGGGAAGCCGCT . . . NS=1;AN=0 GT:PS ./.:. -1 438818 . GTGCGCT . . . NS=1;AN=0 GT:PS ./.:. -1 438894 . A . . . NS=1;AN=0 GT:PS ./.:. -1 439001 . CTCACTGAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 439129 . CCCTGGGGGAGGTTTTCCCCTTACTTGAAATGTCTC . . . NS=1;AN=0 GT:PS ./.:. -1 439224 . T . . . NS=1;AN=0 GT:PS ./.:. -1 439261 . TTGGCCCCCTCTGGCCAGCTGGTTCCCCAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 439321 . CCCCATGGCCTGCCTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 439380 . AAAATCTGCTTTATAGATGAGGAAAGACCC . . . NS=1;AN=0 GT:PS ./.:. -1 439460 . C . . . NS=1;AN=0 GT:PS ./.:. -1 439468 . T . . . NS=1;AN=0 GT:PS ./.:. -1 439473 . T . . . NS=1;AN=0 GT:PS ./.:. -1 439660 . TAGGTGGTATATGGGTGATAGGT . . . NS=1;AN=0 GT:PS ./.:. -1 439922 . GAAAGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 439968 . CTCACCCCTATTAGCCCCAGTGTTTGGCCTGAGC . . . NS=1;AN=0 GT:PS ./.:. -1 440001 . C . . NS=1;CGA_WINEND=442000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.86:0.98:.:0:.:0:0.999:353 -1 440158 . CTCCATGGGCGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 440350 . GAAAGAATATGGTAAAGGAAAGCTTTGAGCCCATAT . . . NS=1;AN=0 GT:PS ./.:. -1 440573 . C . . . NS=1;AN=0 GT:PS ./.:. -1 440694 . GGGCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 440894 . ACCATGG . . . NS=1;AN=0 GT:PS ./.:. -1 441007 . CCCTGGGGAACTCTCAGCACGGGGGTTGCATGCT . . . NS=1;AN=0 GT:PS ./.:. -1 441055 . TGCTGTC . . . NS=1;AN=0 GT:PS ./.:. -1 441221 . GA . . . NS=1;AN=0 GT:PS ./.:. -1 441239 . T . . . NS=1;AN=0 GT:PS ./.:. -1 441244 . G . . . NS=1;AN=0 GT:PS ./.:. -1 441295 . CCAGTCCTCTGACTTGTCATTTTTCTACCCC . . . NS=1;AN=0 GT:PS ./.:. -1 441338 . TTAACCCCATA . . . NS=1;AN=0 GT:PS ./.:. -1 441450 . AAGTCCA . . . NS=1;AN=0 GT:PS ./.:. -1 441506 . TGGTTGA . . . NS=1;AN=0 GT:PS ./.:. -1 441681 . GCAGGACTGGCTA . . . NS=1;AN=0 GT:PS ./.:. -1 441734 . ATGAGGTTTTTTTTTTTCCTCT . . . NS=1;AN=0 GT:PS ./.:. -1 441827 . TTTTGGTGTACAGTTCTATGAGTTTAACACATTT . . . NS=1;AN=0 GT:PS ./.:. -1 441927 . CACGCTCCCCGACC . . . NS=1;AN=0 GT:PS ./.:. -1 441989 . GCAGCTGGCGGCACCTGGAGACCGGCTCCT . . . NS=1;AN=0 GT:PS ./.:. -1 442001 . A . . NS=1;CGA_WINEND=444000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.67:0.77:.:0:.:0:0.999:353 -1 442027 . CGACCGCGCGTGCGCGGCTC . . . NS=1;AN=0 GT:PS ./.:. -1 442137 . GTTGTTGGATAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 442191 . AGCTTTTGTTTCTCTCGGG . . . NS=1;AN=0 GT:PS ./.:. -1 442271 . AACACCACAGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 442301 . TCCCGCTAATCGGGAATGAGTGCCTGCTGCTCCGCGTTCTTGCGGGCACC . . . NS=1;AN=0 GT:PS ./.:. -1 442450 . TTTCCCATCCATATCCCTTGCTTGGTGACGC . . . NS=1;AN=0 GT:PS ./.:. -1 442562 . TAGATACAATTCC . . . NS=1;AN=0 GT:PS ./.:. -1 442668 . TTTATGT . . . NS=1;AN=0 GT:PS ./.:. -1 442699 . ACTATATTTAAAATTTTAATTTAAAAACTAT . . . NS=1;AN=0 GT:PS ./.:. -1 442939 . CACGCAGCGCTAG . . . NS=1;AN=0 GT:PS ./.:. -1 442983 . GCCTCCATACCCGCTTGGCACCCACCAAAGGGTCCGGGGACCC . . . NS=1;AN=0 GT:PS ./.:. -1 443114 . GGTCTTTTTTTTTTTTTTTTTTGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 443152 . 
GTCGCCCAGGCTGGAGTGCAGTGGCGCGA . . . NS=1;AN=0 GT:PS ./.:. -1 443235 . CCTCGCG . . . NS=1;AN=0 GT:PS ./.:. -1 443261 . GCCTGCCACCATGCCTGGCTAATTTT . . . NS=1;AN=0 GT:PS ./.:. -1 443397 . GAGCCACCGCACCTGGCCAGGCTGGTCTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 443463 . GGAGAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 443529 . AGAAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 443594 . CTAGATCCCCTGCCTGGAGGCCCAAGT . . . NS=1;AN=0 GT:PS ./.:. -1 443632 . TCTGGATAGTTGCAGCCACTGTGGGGAAGCTTCAGTTTGGGAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 443691 . TATTGTCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 443711 . TGCAGAGAGGAGAATGGAGCGGATGGAGGTGCGGTGGGGGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 443823 . CTTCTTCTTGTCCCA . . . NS=1;AN=0 GT:PS ./.:. -1 443856 . GTCGCCCATCCTGGTTAGGAGGCACAAGGAAGTGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 444001 . G . . NS=1;CGA_WINEND=446000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.75:0.93:.:0:.:0:0.999:353 -1 444034 . CTTCGAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 444065 . CCCCTGGAGGGAGAGGCTTCCTTCCTTCTG . . . NS=1;AN=0 GT:PS ./.:. -1 444129 . CCTCAGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 444187 . CTCAGTC . . . NS=1;AN=0 GT:PS ./.:. -1 444253 . GAAATCA . . . NS=1;AN=0 GT:PS ./.:. -1 444290 . ACCTAAAAATATTACTCAGTTTTC . . . NS=1;AN=0 GT:PS ./.:. -1 444353 . ATCCAACAT . . . NS=1;AN=0 GT:PS ./.:. -1 444414 . CACGTCC . . . NS=1;AN=0 GT:PS ./.:. -1 444505 . C . . . NS=1;AN=0 GT:PS ./.:. -1 444536 . TTCATACGTTATTAAACCTAAACATAAAAATGGA . . . NS=1;AN=0 GT:PS ./.:. -1 444601 . AAGGTTCTTATATGTACACATCACATA . . . NS=1;AN=0 GT:PS ./.:. -1 444690 . TTTAGGGGCCAAGGGGCCCCACACTC . . . NS=1;AN=0 GT:PS ./.:. -1 444772 . GGAGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 444865 . TAATGAG . . . NS=1;AN=0 GT:PS ./.:. -1 444920 . CACATCG . . . NS=1;AN=0 GT:PS ./.:. -1 444954 . GGTCTGCTTTGCTTACAAGGTCCAGAAACCCAGCAGAAAACCCTCGAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 445039 . TCCATGTGTGACACAAGACAC . . . NS=1;AN=0 GT:PS ./.:. -1 445091 . AGTTGAAGCTGAATAAAGCCAGCAGGCAGCGAAAATCACCTGCACTCAGTTCAGTTGTGGCAGGAAAGGGAAAGAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 445192 . AGAGGGAGAAAGGAGAGGGGGATGGAGAGGAGAAAGGAGAGGGGGATGGAGAGGGAGAAAGGAGAGGGGGATAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 445282 . GGGGCTAGAGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 445305 . GGGAGATGGAGAGGGGGATGGAGAGGGGGATGGAGAGGAGAAAGGAGACGGGGATGGAGGGGGGATGGAGAGAAGAAAGGAGAGGAGGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 445426 . GGCTGGCTGGGCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 445458 . GCAGGGGAGGCCGGCTGGGCATGGACACTG . . . NS=1;AN=0 GT:PS ./.:. -1 445500 . AGCCACTCTGGGAACAGCAGCCAGTGGCGGA . . . NS=1;AN=0 GT:PS ./.:. -1 445543 . GCTGTTCCCTCCTGTCCAGGATGGGCCTGTCTCTGCAGACAGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 445613 . CACTCCCCATGTGCATGGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 445898 . AGGGGAGAGGAGTTTACTACCGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 445982 . GTACGTGATGCTGTGCTTG . . . NS=1;AN=0 GT:PS ./.:. -1 446001 . T . . NS=1;CGA_WINEND=448000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.23:0.91:.:0:.:0:0.999:353 -1 446007 . CCCACAG . . . NS=1;AN=0 GT:PS ./.:. -1 446073 . GTCAGGGCCCGGGGGTCCTGACGGGTACA . . . NS=1;AN=0 GT:PS ./.:. -1 446142 . CACAGCAGGACACGCCTGCGCTGAAAGAGTGGGC . . . NS=1;AN=0 GT:PS ./.:. -1 446279 . ATTGTTTCCCACAGCAATACATGTTAGCAACTTTGAAACTACTTTTATAT . . . NS=1;AN=0 GT:PS ./.:. -1 446355 . TACAGTAGTAAGAGTCAGGGTTCTCACAGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 446425 . G . . END=447446;NS=1;AN=0 GT:PS ./.:. -1 447472 . CAGTAGGAAATCATGTTTACATGATACATATATACAGATCAGAATGGACCCTGAGGTGGTCGGTTACAGTCAGATATGCCAGTAGGAAGTCGTGTTTACATGA . . . NS=1;AN=0 GT:PS ./.:. -1 447581 . TATACAGATCAGAATGGACCCTGAGGTGGTCAGTT . . . NS=1;AN=0 GT:PS ./.:. -1 447630 . 
CAGCAGGAACTCATGTTTACATAATACATATATACAGATAGGTTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 447693 . TGCGTGTGCA . . . NS=1;AN=0 GT:PS ./.:. -1 447818 . T . . . NS=1;AN=0 GT:PS ./.:. -1 447881 . GATGAAC . . . NS=1;AN=0 GT:PS ./.:. -1 447929 . AAATAAAATTTTTAAAAAGCTGTG . . . NS=1;AN=0 GT:PS ./.:. -1 448001 . C . . NS=1;CGA_WINEND=450000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.35:0.84:.:0:.:0:0.999:353 -1 448006 . CTACAAT . . . NS=1;AN=0 GT:PS ./.:. -1 448029 . CACTCACACAGACATACACACACACACACACACAAATCGGATTATGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 448164 . TGGCGAGCGCC . . . NS=1;AN=0 GT:PS ./.:. -1 448188 . ACTCGGGAGGCTGAGGCAGGAGAATGGTGTGAGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 448252 . TCACGCCAATGCACTCCAGCTGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 448287 . AGACTCTGTCTCAAAAACAAAACAAAACAAAACAAAACAAAAAACTGA . . . NS=1;AN=0 GT:PS ./.:. -1 448404 . GGCTGGGCACAGTGGCTCGTGC . . . NS=1;AN=0 GT:PS ./.:. -1 448463 . ATCTCCTGAGGTCAGGAGTTC . . . NS=1;AN=0 GT:PS ./.:. -1 448510 . ACCCTGTCTCTACAAAAAAATACA . . . NS=1;AN=0 GT:PS ./.:. -1 448634 . TGTGCCACTGCACTCCAGCCTGGGCAATAAGA . . . NS=1;AN=0 GT:PS ./.:. -1 448677 . TCTCAAAAAAAAGAAAGAAAGAAAGAAAAGAAAAGAAAAAAAAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 448738 . TGCAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 448843 . ACCATGGTGCCCAAATTCTTTGATGCTCCTTCCTGTGGGAGT . . . NS=1;AN=0 GT:PS ./.:. -1 448938 . GAAAACAGTCACTCTCCCGTGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 448974 . CCTCACCT . . . NS=1;AN=0 GT:PS ./.:. -1 449070 . CTCTGCCGTGTTCTTACGCAAACGCACAGTTCCAGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 449142 . GGGCATCT . . . NS=1;AN=0 GT:PS ./.:. -1 449187 . GGCAGTGAAACGAA . . . NS=1;AN=0 GT:PS ./.:. -1 449219 . CACAGGTGCAGGGGACTAAGGAGGCGTGAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 449270 . T . . END=450577;NS=1;AN=0 GT:PS ./.:. -1 450001 . C . . NS=1;CGA_WINEND=452000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.48:0.71:.:0:.:0:0.999:353 -1 450587 . CACGTGCATCA . . . NS=1;AN=0 GT:PS ./.:. -1 450646 . GGGAAGTTGCAGTAC . . . NS=1;AN=0 GT:PS ./.:. -1 450679 . GGACATCTTTTTTTTTTTTTTAAATAAAACATTTTTAACATGA . . . NS=1;AN=0 GT:PS ./.:. -1 450728 . GCAGAGCACGGTGGCTCGCACCTGTAAT . . . NS=1;AN=0 GT:PS ./.:. -1 450848 . AATACAAAAATTAGCTGGGCGTGGTGATGGG . . . NS=1;AN=0 GT:PS ./.:. -1 450946 . G . . . NS=1;AN=0 GT:PS ./.:. -1 450989 . AGAGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 451113 . T . . . NS=1;AN=0 GT:PS ./.:. -1 451244 . GACGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 451283 . GTGAGTGTGTATGACTGTGTGTATGAGTGTGTATGATTTGTGTGTGTGAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 451417 . TGAGTGTGAATGTGAACATGTGTGTGTGAGTGGGTATATGATTTGGGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 451661 . CTGGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 451676 . TGAGCATGAATGTGAACATGTGTGCATGAATGTGAACATGTGTGCATGAATATATGATTTGG . . . NS=1;AN=0 GT:PS ./.:. -1 451768 . GTACAATCGGT . . . NS=1;AN=0 GT:PS ./.:. -1 451922 . TGCACAA . . . NS=1;AN=0 GT:PS ./.:. -1 451992 . TGGGTAT . . . NS=1;AN=0 GT:PS ./.:. -1 452001 . G . . NS=1;CGA_WINEND=454000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.85:0.98:.:0:.:0:0.999:353 -1 452058 . GTGAGCATGAATGTGTATGCACAAGTGTGTGTATGTGTGTATGATCT . . . NS=1;AN=0 GT:PS ./.:. -1 452122 . A . . . NS=1;AN=0 GT:PS ./.:. -1 452159 . AGTATGT . . . NS=1;AN=0 GT:PS ./.:. -1 452258 . AGCATTG . . . NS=1;AN=0 GT:PS ./.:. -1 452429 . GAAGGGC . . . NS=1;AN=0 GT:PS ./.:. -1 452559 . CTCAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 453065 . G . . . NS=1;AN=0 GT:PS ./.:. -1 453163 . CCATGCCGCAAAGATGGG . . . NS=1;AN=0 GT:PS ./.:. -1 453264 . AAAGTATAAATCTGGAAAAAATAATGTCGATGTTATTTATTTAC . . . NS=1;AN=0 GT:PS ./.:. -1 453334 . ACATAAAAACACAACAGAACATTTCATAGGCCACT . . . 
NS=1;AN=0 GT:PS ./.:. -1 453466 . CCATAAAGAAATAAGTCAAAAGAGAAACGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 453718 . A . . . NS=1;AN=0 GT:PS ./.:. -1 453752 . AAATGCTCACCTTTTGACAGGGTACTTTTAGTTCTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 453987 . GCTGTTCCACCTGAGCCTAAGGTTCCTCTGCTGTA . . . NS=1;AN=0 GT:PS ./.:. -1 454001 . G . . NS=1;CGA_WINEND=456000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.94:0.93:.:0:.:0:0.999:353 -1 454044 . AAGCGAG . . . NS=1;AN=0 GT:PS ./.:. -1 454087 . GGGCGCC . . . NS=1;AN=0 GT:PS ./.:. -1 454156 . AACGTTTCCATCTGTTAATAAAGAGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 454225 . C . . . NS=1;AN=0 GT:PS ./.:. -1 454359 . CTCCGCATGACTTGGATAACACGGTA . . . NS=1;AN=0 GT:PS ./.:. -1 454528 . G . . . NS=1;AN=0 GT:PS ./.:. -1 454733 . GATGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 454777 . CTTGGCTCTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 454862 . GGATGTTAT . . . NS=1;AN=0 GT:PS ./.:. -1 454948 . TGATGTA . . . NS=1;AN=0 GT:PS ./.:. -1 455168 . GGCGTGGGCAGGGGGCTGACTCCATGTGGGC . . . NS=1;AN=0 GT:PS ./.:. -1 455257 . TTTCTCTCCTCCCTATAAAGCCTATTTTTGTATTAGGGTGTTTG . . . NS=1;AN=0 GT:PS ./.:. -1 455715 . TGTTCATGGTCCTCCCCGTGGGATGT . . . NS=1;AN=0 GT:PS ./.:. -1 456001 . G . . NS=1;CGA_WINEND=458000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.75:0.96:.:0:.:0:0.999:353 -1 456108 . TAGTGGGCGAGGATTGTTTAGCCGCCTTG . . . NS=1;AN=0 GT:PS ./.:. -1 456145 . A . . . NS=1;AN=0 GT:PS ./.:. -1 456152 . TG . . . NS=1;AN=0 GT:PS ./.:. -1 456201 . ATTC . . . NS=1;AN=0 GT:PS ./.:. -1 456218 . ATAAATATATATTTATATTATTTATATTTAGAGC . . . NS=1;AN=0 GT:PS ./.:. -1 456332 . CTCCTACT . . . NS=1;AN=0 GT:PS ./.:. -1 456501 . CAAATCAGCCTACTTGGAATAT . . . NS=1;AN=0 GT:PS ./.:. -1 456570 . TTCTTCTC . . . NS=1;AN=0 GT:PS ./.:. -1 456609 . TTGCCAA . . . NS=1;AN=0 GT:PS ./.:. -1 456684 . TCTCTTTCTCCCCCTCTCCCCGCTACATTTCCCC . . . NS=1;AN=0 GT:PS ./.:. -1 456742 . CTCTGTCTTCACGAGAGCTACTTT . . . NS=1;AN=0 GT:PS ./.:. -1 456839 . TAGTTCC . . . NS=1;AN=0 GT:PS ./.:. -1 456913 . CACGTTC . . . NS=1;AN=0 GT:PS ./.:. -1 457090 . TTTAGATGGTTTTTTTTTTTTTTGTTTGTTTTTTG . . . NS=1;AN=0 GT:PS ./.:. -1 457234 . TCCCGAG . . . NS=1;AN=0 GT:PS ./.:. -1 457284 . TTTGTATTTTAGTAGAGACGGGATTT . . . NS=1;AN=0 GT:PS ./.:. -1 457331 . CTCGATC . . . NS=1;AN=0 GT:PS ./.:. -1 457583 . CTAATTTTTTATAT . . . NS=1;AN=0 GT:PS ./.:. -1 457711 . GCCCGGCCCCTTTTGGTT . . . NS=1;AN=0 GT:PS ./.:. -1 457794 . GAGGGTC . . . NS=1;AN=0 GT:PS ./.:. -1 458001 . A . . NS=1;CGA_WINEND=460000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.84:0.98:.:0:.:0:0.999:353 -1 458009 . ATTCGAT . . . NS=1;AN=0 GT:PS ./.:. -1 458055 . CTTTTCA . . . NS=1;AN=0 GT:PS ./.:. -1 458211 . G . . . NS=1;AN=0 GT:PS ./.:. -1 458693 . ACACCTGTAGTCCCAACTACGCAGGAG . . . NS=1;AN=0 GT:PS ./.:. -1 458753 . TTGAGGCTGCTCTGAGCTGTGATTG . . . NS=1;AN=0 GT:PS ./.:. -1 458820 . TCTTAAAAAAAAAAAAAAAAAACTATTGCAAGAGGAGAGAGAGAGACTGA . . . NS=1;AN=0 GT:PS ./.:. -1 459164 . CTTACCA . . . NS=1;AN=0 GT:PS ./.:. -1 459222 . GTGGCTT . . . NS=1;AN=0 GT:PS ./.:. -1 459249 . TGCATGTCTCTTTGAATCCGTATCT . . . NS=1;AN=0 GT:PS ./.:. -1 459485 . ATGCAAGTCCTACTGTTTCTGTAACTGGC . . . NS=1;AN=0 GT:PS ./.:. -1 459661 . CTCGTCATGG . . . NS=1;AN=0 GT:PS ./.:. -1 459678 . CCCGCTCGAGCCTCTCCACATGCAGCAGGAAGGAAAGTGGAGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 459732 . T . . . NS=1;AN=0 GT:PS ./.:. -1 459746 . T . . . NS=1;AN=0 GT:PS ./.:. -1 459921 . CCAGTTTTTCCTGCCTGTCCTGTTTGGGCAGGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 460001 . A . . 
NS=1;CGA_WINEND=462000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.98:1.04:.:0:.:0:0.999:353 -1 460084 . TTCTACACCATTCCGGGATGCTGGTGTCCACCACTGCT . . . NS=1;AN=0 GT:PS ./.:. -1 460365 . CATTAAG . . . NS=1;AN=0 GT:PS ./.:. -1 460403 . CTCGCCTTCAGGGTCGTCTGTGTCTGTTAAAGTCGAG . . . NS=1;AN=0 GT:PS ./.:. -1 460663 . AAACAGA . . . NS=1;AN=0 GT:PS ./.:. -1 460720 . TGATTCCAGTGCAGCATCACATGACAGACAGAGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 460819 . CTGTCTCTTGTG . . . NS=1;AN=0 GT:PS ./.:. -1 461079 . CAATTGA . . . NS=1;AN=0 GT:PS ./.:. -1 461230 . CACGTTCTCAGGACCTCCTGAAGCTGCGTCACAGGCGCT . . . NS=1;AN=0 GT:PS ./.:. -1 461421 . GGAGAAGTGTGGGTGTTGGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 461539 . ATGTGAT . . . NS=1;AN=0 GT:PS ./.:. -1 461645 . TGGCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 461710 . G . . . NS=1;AN=0 GT:PS ./.:. -1 461811 . AAGTTGGAGAAACAAAACGCAAACTAAGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 461895 . CCCCACCCCGCATCCCTGGGCTCGGGTT . . . NS=1;AN=0 GT:PS ./.:. -1 461941 . TGCCACC . . . NS=1;AN=0 GT:PS ./.:. -1 462001 . G . . NS=1;CGA_WINEND=464000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.22:1.35:.:0:.:0:0.999:353 -1 462037 . CTGTATA . . . NS=1;AN=0 GT:PS ./.:. -1 462100 . CTTGGGGGCGCTT . . . NS=1;AN=0 GT:PS ./.:. -1 462158 . TAAGTTTTTTTTTTTTTTTTGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 462213 . AGTGGCGTGATCTTGGCTCACTGCAAGCTCCACCTCCCAGGTTCAAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 462280 . CCCAAGT . . . NS=1;AN=0 GT:PS ./.:. -1 462321 . CTAATTTTTTTGTAT . . . NS=1;AN=0 GT:PS ./.:. -1 462372 . ATCGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 462431 . TACAGGTGTGAGCCACCACACCT . . . NS=1;AN=0 GT:PS ./.:. -1 462535 . GACAGCAGAGGGGTGAA . . . NS=1;AN=0 GT:PS ./.:. -1 462562 . TACAGACAGCAGCAGCTGATGCACAGGCCTCCCAGCGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 462618 . GGGAAGTGCTCAGAAGCTTACAAAGCTGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 462676 . GAGTAGATCCCTGATCCTATAAAAATGTACTAGATG . . . NS=1;AN=0 GT:PS ./.:. -1 462736 . GGGCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 462910 . TGCATGT . . . NS=1;AN=0 GT:PS ./.:. -1 462927 . CTGGGTTGGCAGAGACAGAGTGACTGTCTTCCTCCAGGAAGCAGCAGGTTAACTGGTTGGCAGAGACAGAGGGACTGAGGGACTGTCTCCCTCCAGGAAGCAGCAGGTTAACTGGTTGGCAGAGACAGAGGGACAGAGGGACTGTCTTCCTCCAGGAAGCAGCAGGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 463224 . CAAGCCCA . . . NS=1;AN=0 GT:PS ./.:. -1 463503 . GACGTGTCAGAAACC . . . NS=1;AN=0 GT:PS ./.:. -1 463557 . AAAATTATTCATTAAAAACATCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 463644 . ACAATACCCTTCAGACTTTGAGAGTT . . . NS=1;AN=0 GT:PS ./.:. -1 463699 . GATCCAAACTTGATAAAGGACATGAAAAAGAGCCA . . . NS=1;AN=0 GT:PS ./.:. -1 463859 . AGGGTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 463921 . ATACGAG . . . NS=1;AN=0 GT:PS ./.:. -1 463965 . CTGTAGA . . . NS=1;AN=0 GT:PS ./.:. -1 464001 . T . . NS=1;CGA_WINEND=466000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.35:1.25:.:0:.:0:0.999:353 -1 464012 . GCCATTGAGCACCCTGGTGTTGAGAGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 464066 . GGGTCACTGTGAGTGGGCTGCCCCCAACATGAGTCAG . . . NS=1;AN=0 GT:PS ./.:. -1 464194 . ACCGTCTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 464265 . AACATAT . . . NS=1;AN=0 GT:PS ./.:. -1 464574 . AAAAGTATGCGGAATAGAACT . . . NS=1;AN=0 GT:PS ./.:. -1 464614 . CCTGAAAAAGTCACATGTTATTTCTTC . . . NS=1;AN=0 GT:PS ./.:. -1 464651 . TGGCAAAAAAAAAGTCACGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 464937 . AAATAATTTAAAAGTGCTTTTGGGT . . . NS=1;AN=0 GT:PS ./.:. -1 464976 . CTAGAGGGTAAGATTAGACAT . . . NS=1;AN=0 GT:PS ./.:. -1 465018 . GGTTAGGGATAGGATTAGGATCTGGGTCAGAGTCAGGGCCAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 465094 . GAGATCA . . . NS=1;AN=0 GT:PS ./.:. -1 465135 . TCAGGATTTAGGTTCAGTGTC . . . 
NS=1;AN=0 GT:PS ./.:. -1 465165 . GGACAGGGTTAGGGTTAGGATT . . . NS=1;AN=0 GT:PS ./.:. -1 465194 . GAGCTTTGTTCTCCTCAGGACCCACCCGAGGACGGGTCACC . . . NS=1;AN=0 GT:PS ./.:. -1 465244 . GAGCACCTGGTAGTGTGGCATGTCCACA . . . NS=1;AN=0 GT:PS ./.:. -1 465286 . TTCATTG . . . NS=1;AN=0 GT:PS ./.:. -1 465306 . CCTGGGGAGACGTGGCTGCAGGCCATTGAGGAAGGTGAGGAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 465365 . CCCGTGTGCTGAGGAGGGAGCTCTGCCGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 465531 . AGAGATC . . . NS=1;AN=0 GT:PS ./.:. -1 465577 . G . . . NS=1;AN=0 GT:PS ./.:. -1 465592 . AAGTTGGGTATAGGCAGAGGCTGGAGGAAACATGTGCATCCC . . . NS=1;AN=0 GT:PS ./.:. -1 465660 . ATTACTCATTTTTCTTACAGTGTTAAATTAGTAAAGATA . . . NS=1;AN=0 GT:PS ./.:. -1 465858 . AACATTGACATCCTACATTACATGT . . . NS=1;AN=0 GT:PS ./.:. -1 465895 . AATCTGAGACAGCTCTCAGATTTTTTAGAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 465977 . TCCTGACAACATGGGCCCAAGGTGGTCGGGGCACA . . . NS=1;AN=0 GT:PS ./.:. -1 466001 . G . . NS=1;CGA_WINEND=468000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.54:1.22:.:0:.:0:0.999:353 -1 466036 . GACACGAGAGATCAATCAATATGTGTAAGATGTACATTGGTTC . . . NS=1;AN=0 GT:PS ./.:. -1 466102 . GACAGGGGGCTTCCAGGTCACAGGTAGGTAAGAGACAAATG . . . NS=1;AN=0 GT:PS ./.:. -1 466176 . CACGTGAGGCAATCAGGTATGCATTTATCTCGGTGATCAGATGGG . . . NS=1;AN=0 GT:PS ./.:. -1 466253 . CTAGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 466356 . GTGTCTT . . . NS=1;AN=0 GT:PS ./.:. -1 466470 . TTGGGTTTCCAATA . . . NS=1;AN=0 GT:PS ./.:. -1 466677 . CATCGTGGTCACAAG . . . NS=1;AN=0 GT:PS ./.:. -1 466945 . TTTTAGA . . . NS=1;AN=0 GT:PS ./.:. -1 467152 . GTATAAATAATTAAACAAGGAAGTGTTAAAAAAAGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 467234 . AATCGTCAAAAAAAAAAAGAGATTTCCCATGTAGCCGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 467450 . ATCCCACACGACATGTAGTCAT . . . NS=1;AN=0 GT:PS ./.:. -1 467501 . TGTTAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 467554 . GGATTTTTGTAGAATGTCCC . . . NS=1;AN=0 GT:PS ./.:. -1 467693 . ACTGCATGTGTTCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 467800 . ATTCTTTTGCAAAGGAGATTTCTATGCA . . . NS=1;AN=0 GT:PS ./.:. -1 468001 . G . . NS=1;CGA_WINEND=470000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.15:1.21:.:0:.:0:0.999:353 -1 468102 . G . . . NS=1;AN=0 GT:PS ./.:. -1 468154 . TTAGCAATGCTAGGAAGCATATGTGCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 468191 . ACCTACACACACCTAA . . . NS=1;AN=0 GT:PS ./.:. -1 468434 . ACAATTGCTTCTGTTGGAAAGAACTTTATAAAATGGAATCCAATAATG . . . NS=1;AN=0 GT:PS ./.:. -1 468491 . TCACGTGCCTTCAGCCTACAG . . . NS=1;AN=0 GT:PS ./.:. -1 468523 . TTTCAAAGTTTTTACCTAGATT . . . NS=1;AN=0 GT:PS ./.:. -1 468557 . CATTTTGTGCT . . . NS=1;AN=0 GT:PS ./.:. -1 468589 . CTGCGTAATTTATAAGTAAAACAGTTTCATTTGGTTCA . . . NS=1;AN=0 GT:PS ./.:. -1 468634 . GGTAGCTGGAATGTCCGAGATTGGGCAGTTGCATCTGGCGGGGCCTCAG . . . NS=1;AN=0 GT:PS ./.:. -1 468688 . TTCACCTCATGGTGGAAAGTGGAAGGGGAGCAAGGGGTGCA . . . NS=1;AN=0 GT:PS ./.:. -1 468738 . ACATAGCAGAAGTGAAAGCAAGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 468777 . CAGACTCTTTTTAATTACCTACTCCTGCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 468816 . ATTCCTGTGAGAACAGAACTCACTCACCCCCATGG . . . NS=1;AN=0 GT:PS ./.:. -1 468860 . ATCTATTCATGAGGGATCCGTCCCCACGACCCAAACACCGTCC . . . NS=1;AN=0 GT:PS ./.:. -1 468910 . CCCACCGCCCCACACTGACACAGTGGGAGTC . . . NS=1;AN=0 GT:PS ./.:. -1 468962 . T . . . NS=1;AN=0 GT:PS ./.:. -1 468984 . CATCGTAATTTATAGCATAAATTCTTTTTCACATGATGTATT . . . NS=1;AN=0 GT:PS ./.:. -1 469037 . TACTCCACATCCTGAGTAATTTGATT . . . NS=1;AN=0 GT:PS ./.:. -1 469112 . TTTCGACAAATGCATTGTGGCAGATATCCCA . . . NS=1;AN=0 GT:PS ./.:. -1 469604 . CTCTCAA . . . NS=1;AN=0 GT:PS ./.:. -1 469626 . 
TTTAAAAAAATTTTTTTTGAGACAGGGTCTGGCTCTGTCGCCCAGGCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 469689 . AATCTCAGCTC . . . NS=1;AN=0 GT:PS ./.:. -1 469739 . C . . . NS=1;AN=0 GT:PS ./.:. -1 469772 . CATGCCGTCATGCCCATCTAATTTTTGTATTTTTGGTAGAGACGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 469854 . CAACATTTATTTATTTATTTATTTAGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 469940 . A . . . NS=1;AN=0 GT:PS ./.:. -1 470001 . C . . NS=1;CGA_WINEND=471368 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.60:0.97:.:0:.:0:0.999:353 -1 470001 . CAGGTGCCTGCCATCACGCCCGGCTAATTTTTTATAT . . . NS=1;AN=0 GT:PS ./.:. -1 470065 . TTAGCCA . . . NS=1;AN=0 GT:PS ./.:. -1 470103 . ACCTGCCTCAGCCTCCTAAAGTGCTGGGATTATAGGCATGA . . . NS=1;AN=0 GT:PS ./.:. -1 470200 . TATAGTGGCACATAGCATGGATAAGGAGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 470267 . AAACATAACATTTTAC . . . NS=1;AN=0 GT:PS ./.:. -1 470302 . GGAAGGTTAGGTATCTCTTTTTATTTGTATCTTCTGTA . . . NS=1;AN=0 GT:PS ./.:. -1 470349 . TTATAAAAAATGCAACCTACTTTACTTGCGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 470401 . ATGCTTTGCATAGAGTTGTTTCTAG . . . NS=1;AN=0 GT:PS ./.:. -1 470448 . TTTATTTACATATATTGATTATAATTTTAATACT . . . NS=1;AN=0 GT:PS ./.:. -1 470497 . TTC . . . NS=1;AN=0 GT:PS ./.:. -1 470502 . G . . . NS=1;AN=0 GT:PS ./.:. -1 470516 . GTAGACAGTTATAAACTGTCATATATTAGCATTCTATAGTA . . . NS=1;AN=0 GT:PS ./.:. -1 470603 . G . . . NS=1;AN=0 GT:PS ./.:. -1 470609 . C . . . NS=1;AN=0 GT:PS ./.:. -1 470630 . TGTGGCAGAAAAAAACATGTTTATTAACGGG . . . NS=1;AN=0 GT:PS ./.:. -1 470674 . AGTCTCTCTGTAAAAACAGGAAGCCAAAAGTAT . . . NS=1;AN=0 GT:PS ./.:. -1 470715 . GAATTATTTATGTTCAGTAATTAATGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 470756 . TTAT . . . NS=1;AN=0 GT:PS ./.:. -1 470863 . GTAGACA . . . NS=1;AN=0 GT:PS ./.:. -1 470884 . TAATGTTTTTCGCTTTTCACAAGACGGCACCGAAAGCGAA . . . NS=1;AN=0 GT:PS ./.:. -1 470951 . AGCCAAAGTGAAGGTTTTAAAGGCCAA . . . NS=1;AN=0 GT:PS ./.:. -1 470999 . CCGCAGCCACACGCAAAAAAGAAGATCCGCATGTCACCCAC . . . NS=1;AN=0 GT:PS ./.:. -1 471059 . CTGCGACTCCGGAGGCAGCCCAGATATCCTCGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 471142 . TCCGCTGACCACTGAGTCGGCCGGAAGAAGATAGAAGAAAACAACACGCT . . . NS=1;AN=0 GT:PS ./.:. -1 471216 . CAACAAGCACCAGATCAGACAGGCTGTGAAGAAGCTCTATGA . . . NS=1;AN=0 GT:PS ./.:. -1 471287 . TTTGTCCTGATAAAGAGAACAAGGCATATGTTCGACTTGCT . . . NS=1;AN=0 GT:PS ./.:. -1 471347 . ATGTTGTAACAAAATTGGGATC . . . NS=1;AN=0 GT:PS ./.:. -1 471369 . N . . END=521368;NS=1;AN=0 GT:PS ./.:. -1 521369 . G . . NS=1;CGA_WINEND=524000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.58:0.75:.:0:.:0:0.999:426 -1 521369 . GATCCTTGAAGCGCCCCCAAGGGCATCTT . . . NS=1;AN=0 GT:PS ./.:. -1 521421 . TC . . . NS=1;AN=0 GT:PS ./.:. -1 521429 . G . . . NS=1;AN=0 GT:PS ./.:. -1 521542 . GGTGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 521582 . ATGCGGGGTGGGGGCAGCTACGTCCTCTCTTGAGCTACAGCA . . . NS=1;AN=0 GT:PS ./.:. -1 521648 . TTGCTTAGTTTGCGTTTTGTTTCTCCAACTT . . . NS=1;AN=0 GT:PS ./.:. -1 521776 . GACCACA . . . NS=1;AN=0 GT:PS ./.:. -1 521838 . AAAGCCA . . . NS=1;AN=0 GT:PS ./.:. -1 521941 . TCCCTCGCATGTAAAATGTGTATTCAGTGAACACTG . . . NS=1;AN=0 GT:PS ./.:. -1 522046 . TTCCCCAATACCCGCACTTTTCCCC . . . NS=1;AN=0 GT:PS ./.:. -1 522079 . CTGAGGCCCCCAGACAATCTTTGGGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 522173 . T . . . NS=1;AN=0 GT:PS ./.:. -1 522210 . GTTCTTTGATCAGCGCCTGTGACGCAGCTTCAGGAGGTCCTGAGAACGTG . . . NS=1;AN=0 GT:PS ./.:. -1 522363 . GGGCTTA . . . NS=1;AN=0 GT:PS ./.:. -1 522393 . TTTTCTGATTGTCAATTGATTG . . . NS=1;AN=0 GT:PS ./.:. -1 522462 . A . . . NS=1;AN=0 GT:PS ./.:. -1 522482 . G . . . NS=1;AN=0 GT:PS ./.:. -1 522523 . GCCACCCTTAGAGAAAATAG . . . 
NS=1;AN=0 GT:PS ./.:. -1 522564 . GACCCATAGAAGGTGCTAGACTCTCAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 522659 . CACAAGAGACAG . . . NS=1;AN=0 GT:PS ./.:. -1 522730 . TCACGGCCTCTGTCTGTCATGT . . . NS=1;AN=0 GT:PS ./.:. -1 522820 . TCTGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 522961 . GCGGGGTCCACGCAATC . . . NS=1;AN=0 GT:PS ./.:. -1 523011 . CGGAAGC . . . NS=1;AN=0 GT:PS ./.:. -1 523074 . C . . . NS=1;AN=0 GT:PS ./.:. -1 523079 . AAGGT . . . NS=1;AN=0 GT:PS ./.:. -1 523118 . CTTAATG . . . NS=1;AN=0 GT:PS ./.:. -1 523181 . ATTGTGGCACACGGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 523278 . T . . . NS=1;AN=0 GT:PS ./.:. -1 523349 . CTGTTCCCCGTGCCCAGGCAGCAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 523388 . TCCCGGAATGGTGTAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 523414 . CCCGTCATAGCCAAAGCCTGGGGTCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 523461 . CCCACTCCTCTCCCGACCCCTCCCTCCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 523536 . GCCTGCCCAAACAGGACAGGCAGGAAAAACTGG . . . NS=1;AN=0 GT:PS ./.:. -1 523743 . A . . . NS=1;AN=0 GT:PS ./.:. -1 523751 . T . . . NS=1;AN=0 GT:PS ./.:. -1 523799 . AGGCTCGAGCGGGGCACAGTCCATGACGAG . . . NS=1;AN=0 GT:PS ./.:. -1 523976 . GCCAGTTACAGAAACAGTAGGACTTGCAT . . . NS=1;AN=0 GT:PS ./.:. -1 524001 . G . . NS=1;CGA_WINEND=526000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.66:0.74:.:0:.:0:0.999:426 -1 524106 . AGAGGCCAGCAGGAGGGAAACACCGACCC . . . NS=1;AN=0 GT:PS ./.:. -1 524145 . AGACGGGGATTGGGAGAGAAATTCAGAAAAGATTGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 524216 . AGATACG . . . NS=1;AN=0 GT:PS ./.:. -1 524261 . AAGCCAC . . . NS=1;AN=0 GT:PS ./.:. -1 524316 . TTTTGGTAAGTTCTT . . . NS=1;AN=0 GT:PS ./.:. -1 524412 . CCATTCAGGCTCCTTTGAGCCTTC . . . NS=1;AN=0 GT:PS ./.:. -1 524460 . CC . . . NS=1;AN=0 GT:PS ./.:. -1 524624 . G . . . NS=1;AN=0 GT:PS ./.:. -1 524641 . T . . . NS=1;AN=0 GT:PS ./.:. -1 524644 . AATAG . . . NS=1;AN=0 GT:PS ./.:. -1 524668 . A . . . NS=1;AN=0 GT:PS ./.:. -1 524692 . AGGTTGG . . . NS=1;AN=0 GT:PS ./.:. -1 524719 . GCTCACAGCAGCCTCAAACTCCTAGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 524771 . CCTGCGTAGTTGGGACTACAGGTGTGTGC . . . NS=1;AN=0 GT:PS ./.:. -1 524853 . GCCAAGGCTGGTTTTGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 525132 . CCCATCTCTCTTACCATACACAGAAATCAAATAAAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 525259 . C . . . NS=1;AN=0 GT:PS ./.:. -1 525272 . A . . . NS=1;AN=0 GT:PS ./.:. -1 525337 . CTGCAGATCAAAGGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 525461 . ATCGAAT . . . NS=1;AN=0 GT:PS ./.:. -1 525531 . CATGAAAAAATGC . . . NS=1;AN=0 GT:PS ./.:. -1 525665 . GAAAGGGGGACCCTCACACACTGTTGTGGGAACATTG . . . NS=1;AN=0 GT:PS ./.:. -1 525745 . AACTAAAAGGGGCTGGGCCCGG . . . NS=1;AN=0 GT:PS ./.:. -1 525877 . ATATAAAAAATTAG . . . NS=1;AN=0 GT:PS ./.:. -1 526001 . A . . NS=1;CGA_WINEND=528000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.37:0.59:.:0:.:0:0.999:426 -1 526028 . TCCACCTTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 526111 . GGGCGGATCACAAGGTCAGGAGATCGAG . . . NS=1;AN=0 GT:PS ./.:. -1 526160 . AAATCCCGTCTCTACTAAAATACAAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 526229 . CTCGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 526343 . CTCCAAAAACAAACAAACAAACAAAAAACACCTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 526387 . TGCTGTATGATCCAGTAATTTCACTAACTGGGCATAT . . . NS=1;AN=0 GT:PS ./.:. -1 526549 . GAACGTG . . . NS=1;AN=0 GT:PS ./.:. -1 526595 . G . . . NS=1;AN=0 GT:PS ./.:. -1 526887 . GGCGGATC . . . NS=1;AN=0 GT:PS ./.:. -1 527036 . CCCGGGAG . . . NS=1;AN=0 GT:PS ./.:. -1 527110 . TCTCAAAAAAAAAAAAAAAAAAGAAGGAATAAGACCTAGTGTT . . . NS=1;AN=0 GT:PS ./.:. -1 527164 . AGTTGACAATTGCCTACTGTATATTTCAAAATAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 527231 . 
AAGCAAAAACAAATATTTAAGGCGATAGATATTCC . . . NS=1;AN=0 GT:PS ./.:. -1 527319 . TACCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 527365 . A . . . NS=1;AN=0 GT:PS ./.:. -1 527377 . CTTAGA . . . NS=1;AN=0 GT:PS ./.:. -1 527398 . TCTCCT . . . NS=1;AN=0 GT:PS ./.:. -1 527406 . TGGGCTCCTT . . . NS=1;AN=0 GT:PS ./.:. -1 527418 . AAATATTT . . . NS=1;AN=0 GT:PS ./.:. -1 527429 . T . . . NS=1;AN=0 GT:PS ./.:. -1 527439 . CAAAGTAGGAGGATTCCTTGAGCCCGGGAGCTTGAGGCTGCAGTGAGATCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 527512 . CGACAGA . . . NS=1;AN=0 GT:PS ./.:. -1 527530 . GCTCTAAATATAAATAATATAAATATATATTTAT . . . NS=1;AN=0 GT:PS ./.:. -1 527577 . G . . . NS=1;AN=0 GT:PS ./.:. -1 527613 . ACAGTCCGCCCTTGGC . . . NS=1;AN=0 GT:PS ./.:. -1 527636 . TGACACAGCCAAGGCAGCTGAACAATCCTCGC . . . NS=1;AN=0 GT:PS ./.:. -1 527673 . AGACAGTGGAGGTCGCCCTCCAGAGGACCTTATCAGATGTACGTGCA . . . NS=1;AN=0 GT:PS ./.:. -1 527739 . TTTCTATTCAGAGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 527967 . CAGGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 528001 . G . . NS=1;CGA_WINEND=530000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.67:0.67:.:0:.:0:0.999:426 -1 528043 . ATCTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 528094 . GTGCCCA . . . NS=1;AN=0 GT:PS ./.:. -1 528148 . TAAGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 528358 . AGATGAA . . . NS=1;AN=0 GT:PS ./.:. -1 528432 . ATGGCAC . . . NS=1;AN=0 GT:PS ./.:. -1 528476 . TTTAACAAACACCCTAATACAAAAATATGCT . . . NS=1;AN=0 GT:PS ./.:. -1 528583 . GCCCACATGGAGTCAGCCCCCTGCCCACGCC . . . NS=1;AN=0 GT:PS ./.:. -1 528633 . CCCTCGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 528793 . ACTGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 528827 . TACATCATGGCTGAGGGCACATACGTGCACGCACATAT . . . NS=1;AN=0 GT:PS ./.:. -1 528911 . ATAACATCC . . . NS=1;AN=0 GT:PS ./.:. -1 528943 . A . . . NS=1;AN=0 GT:PS ./.:. -1 528958 . T . . . NS=1;AN=0 GT:PS ./.:. -1 528967 . GGCC . . . NS=1;AN=0 GT:PS ./.:. -1 528998 . AGCCAAGCCTGGGGGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 529042 . CCCCATCCCTGTCCGGAATAGCACGGGTGCTTCTCAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 529126 . TCCACGCC . . . NS=1;AN=0 GT:PS ./.:. -1 529232 . AGCT . . . NS=1;AN=0 GT:PS ./.:. -1 529397 . TACCGTGTTATCCAAGTCATGCGGAG . . . NS=1;AN=0 GT:PS ./.:. -1 529598 . TTGCTCT . . . NS=1;AN=0 GT:PS ./.:. -1 529675 . GCGTCCCCTCCCTGGCGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 529731 . CTCGCTTGGCCCCCACCTGATTCCTGAC . . . NS=1;AN=0 GT:PS ./.:. -1 529788 . GAACAGCCTCAACTGATTCAGTC . . . NS=1;AN=0 GT:PS ./.:. -1 529992 . CCCAGAACTAAAAGTACCCTGTCAAAGGGT . . . NS=1;AN=0 GT:PS ./.:. -1 530001 . A . . NS=1;CGA_WINEND=532000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.67:0.65:.:0:.:0:0.999:426 -1 530062 . CTCCCCTGCTGCAAC . . . NS=1;AN=0 GT:PS ./.:. -1 530098 . GGCAGGTAGGGGAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 530237 . CTATAGG . . . NS=1;AN=0 GT:PS ./.:. -1 530269 . TAGTGATATACGTGTACACGTT . . . NS=1;AN=0 GT:PS ./.:. -1 530309 . TTTATGGTCTGTCTTCTTATAACTGCTACACCCATGCCACCGTCATTA . . . NS=1;AN=0 GT:PS ./.:. -1 530389 . ATCGTTGCCTATTTTATTGTGTAAAGTGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 530474 . GTAAATAAATAACATCGACATTATTTTTTCCAGATTTATACTTT . . . NS=1;AN=0 GT:PS ./.:. -1 530601 . CCCATCTTTGCGGCATGG . . . NS=1;AN=0 GT:PS ./.:. -1 530653 . CTGCCCTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 530763 . CAAGGTCC . . . NS=1;AN=0 GT:PS ./.:. -1 530889 . CCCAGCAGTGCAGACCCCTCTCTAGAGCCGAGATGCTCCCGGCAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 530960 . GCTCCCGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 531172 . TGAGTGC . . . NS=1;AN=0 GT:PS ./.:. -1 531216 . CACTGAGCTCAGATCCCACGTCTGAGCCTCCGCCTTTCCGT . . . NS=1;AN=0 GT:PS ./.:. -1 531517 . CAATGCT . . . NS=1;AN=0 GT:PS ./.:. -1 531616 . ACATACT . . 
. NS=1;AN=0 GT:PS ./.:. -1 531677 . AGATCATACACACATACACACACTTGTGCATACACATTCATGCTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 531783 . ATACCCA . . . NS=1;AN=0 GT:PS ./.:. -1 531891 . AGATCATACACACATACACACACTTG . . . NS=1;AN=0 GT:PS ./.:. -1 532001 . A . . NS=1;CGA_WINEND=534000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.40:0.64:.:0:.:0:0.999:426 -1 532003 . ACCGATTGTACACTCGTGCACACATTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 532098 . CACCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 532151 . T . . . NS=1;AN=0 GT:PS ./.:. -1 532239 . CACACTCATACACAGCCCAAAATAAT . . . NS=1;AN=0 GT:PS ./.:. -1 532304 . ACACCCAAATCATATACCCACTCACACACACATGTTCACATTCACACTCA . . . NS=1;AN=0 GT:PS ./.:. -1 532399 . CCCTCAA . . . NS=1;AN=0 GT:PS ./.:. -1 532436 . CACTCACACACACAAATACACACTCATACACAGTCATACACACTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 532515 . CACCGTC . . . NS=1;AN=0 GT:PS ./.:. -1 532643 . CTGCTTTTTAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 532770 . TTGCTCT . . . NS=1;AN=0 GT:PS ./.:. -1 532887 . CCCATCACCACGCCCAGCTAATTTTTGTATT . . . NS=1;AN=0 GT:PS ./.:. -1 532926 . AGATGGGGTTTCACCATGTTGGCTAGGCTGGTCTTGAACTCCTGACCT . . . NS=1;AN=0 GT:PS ./.:. -1 532987 . CCTCGGCCTCCCAAAGTGCTGGGATTACAG . . . NS=1;AN=0 GT:PS ./.:. -1 533031 . GCTCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 533058 . TTTTATTTAAAAAAAAAAAAAAGATGTC . . . NS=1;AN=0 GT:PS ./.:. -1 533106 . TACTGCAACTTCCCACAG . . . NS=1;AN=0 GT:PS ./.:. -1 533160 . CACGCAATGAGGCACGTGTAGAAACTGCGACACTCACACGGG . . . NS=1;AN=0 GT:PS ./.:. -1 533222 . G . . END=533973;NS=1;AN=0 GT:PS ./.:. -1 533982 . ACTGCGACACTCA . . . NS=1;AN=0 GT:PS ./.:. -1 534001 . T . . NS=1;CGA_WINEND=536000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.58:0.89:.:0:.:0:0.999:426 -1 534005 . G A . . NS=1;AN=1;AC=1;CGA_SDO=6 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:534005:VQLOW:31,.:30,.:8,.:-31,0,0:-8,0,0:7:3,.:4 -1 534014 . GCTC . . . NS=1;AN=0 GT:PS ./.:. -1 534025 . G . . . NS=1;AN=0 GT:PS ./.:. -1 534028 . AACTGCGACACTCACGC . . . NS=1;AN=0 GT:PS ./.:. -1 534050 . C . . . NS=1;AN=0 GT:PS ./.:. -1 534053 . T . . . NS=1;AN=0 GT:PS ./.:. -1 534057 . A . . . NS=1;AN=0 GT:PS ./.:. -1 534067 . GGTGTGGA . . . NS=1;AN=0 GT:PS ./.:. -1 534082 . A . . . NS=1;AN=0 GT:PS ./.:. -1 534085 . CTCA . . . NS=1;AN=0 GT:PS ./.:. -1 534096 . GCCGTCTCAGCAGCTCACGTCCA . . . NS=1;AN=0 GT:PS ./.:. -1 534138 . CCCTCACGCCTCCTTAGTCCCCTGCACCTGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 534189 . TCGCTTCACCGCC . . . NS=1;AN=0 GT:PS ./.:. -1 534311 . CGGCGGGGGGGGGGGCGGCGCT . . . NS=1;AN=0 GT:PS ./.:. -1 534411 . AGGTGAGGCCTGCCAGGTCTCCC . . . NS=1;AN=0 GT:PS ./.:. -1 534445 . GACTGTTTTCC . . . NS=1;AN=0 GT:PS ./.:. -1 534542 . ACCGTGGTAATTACTGAACATTTAGGGGAGACACTTTGAGACTAT . . . NS=1;AN=0 GT:PS ./.:. -1 534648 . GCCTGCA . . . NS=1;AN=0 GT:PS ./.:. -1 534666 . GATTTTCTTTTTTTTCTTTTCTTTTCTTTCTTTCTTTCTTTTTTTGAGACAGAGTTTTGCTCTCATT . . . NS=1;AN=0 GT:PS ./.:. -1 534806 . GCCTCCC . . . NS=1;AN=0 GT:PS ./.:. -1 534859 . GTATTTTTTTGTAGAGACAGGGT . . . NS=1;AN=0 GT:PS ./.:. -1 534908 . GAACTCC . . . NS=1;AN=0 GT:PS ./.:. -1 534939 . CAGCCTCCCGAAGTGTTGAGATTACAGGCACGAGCCACTGTGCCCAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 535057 . TCAGTTTTTTGTTTTGTTTTGTTTTGTTTTGTTTTTGAGACAGAGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 535133 . GGCGTGATCTTGGCTCACTGCAAGCTCCACCTCCCGGGCTCACACCA . . . NS=1;AN=0 GT:PS ./.:. -1 535196 . TCCCGAGTAGCTGGGACTACAGGCGCTCGCCACCTCGCCTGGC . . . NS=1;AN=0 GT:PS ./.:. -1 535314 . GGGCATAATCCGATTTGTGTGTGTGTGTGTGTGTATGTCTGTGTGAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 535379 . ATTGTAG . . . NS=1;AN=0 GT:PS ./.:. -1 535434 . ATGTCCACAGCTTTTTAAAAATTTTATTT . . . 
NS=1;AN=0 GT:PS ./.:. -1 535504 . G . . . NS=1;AN=0 GT:PS ./.:. -1 535507 . C . . . NS=1;AN=0 GT:PS ./.:. -1 535523 . T . . . NS=1;AN=0 GT:PS ./.:. -1 535672 . GTGTGCTAAGCCATGTATGTACACGCA . . . NS=1;AN=0 GT:PS ./.:. -1 535715 . TTTAACCTATCTGTATATATGTATTATG . . . NS=1;AN=0 GT:PS ./.:. -1 535755 . CCTGCTGGCATATCTGACTATAACTGACCACCTCAGGGTCCAT . . . NS=1;AN=0 GT:PS ./.:. -1 535803 . TCTGTATATATGTATCATGTAAACACGACTTC . . . NS=1;AN=0 GT:PS ./.:. -1 535843 . ATATCTGACTGTAAC . . . NS=1;AN=0 GT:PS ./.:. -1 535864 . CCTCAGGGTCCATTCCGATCTGTATATATGTATCATGTAAACATGATTTCCTACTGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 535945 . T . . END=536546;NS=1;AN=0 GT:PS ./.:. -1 536001 . A . . NS=1;CGA_WINEND=538000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.49:0.90:.:0:.:0:0.999:426 -1 536585 . CAGCTGTGAGAACCCTGACTCTTACTACTGTATTGACTT . . . NS=1;AN=0 GT:PS ./.:. -1 536670 . CATGTATTGCTGTGGGAAACAATTTT . . . NS=1;AN=0 GT:PS ./.:. -1 536800 . ACTCTTTCAGCGCAGGCGTGTCCTGCTGTG . . . NS=1;AN=0 GT:PS ./.:. -1 536863 . CCCATCCTGTACCCGTCAGGACCCCCGGGCCCTGAC . . . NS=1;AN=0 GT:PS ./.:. -1 536958 . CTGTGGGGTTTGACAAACACAGCATCACGTAC . . . NS=1;AN=0 GT:PS ./.:. -1 537049 . CCTCGGTAGTAAACTCCTCTCCCCT . . . NS=1;AN=0 GT:PS ./.:. -1 537188 . TGGCGGGGGTGGGCACACA . . . NS=1;AN=0 GT:PS ./.:. -1 537338 . CCACCATGCACATAGGCTGTG . . . NS=1;AN=0 GT:PS ./.:. -1 537415 . CAGGAGGGAACAGC . . . NS=1;AN=0 GT:PS ./.:. -1 537441 . TCCGCCACTGGCTGCTGTTCCCAGAGTGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 537484 . CAGCGTCCATGCCGGCCTCCCC . . . NS=1;AN=0 GT:PS ./.:. -1 537526 . CACGCCCAGCCAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 537571 . CACCCTCCTCTCCTTTCTTCTCTCCATCCCCCCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 537629 . CCATCCCCCTCTCCATCCCCCTCTCCATCTCCCTCTCCTTTCTCCTCTCTAGCCCCCTCTCCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 537699 . CTCCATCCCCCTCTCCTTTCTCCCTCTCCATCCC . . . NS=1;AN=0 GT:PS ./.:. -1 537749 . GGCTCTTTCCCTTTCCTGCCACAACTGAACTGAGTGCAGGTGATTTTCGCTGCCTGCTGGCTTTATTCAGCTTCAACT . . . NS=1;AN=0 GT:PS ./.:. -1 537852 . AAATGTGTGTCTTGTGTCACACATGGAAATGTT . . . NS=1;AN=0 GT:PS ./.:. -1 537914 . GCCTCGAGGGTTTTCTGCTGGGTTTCTGGACCTTGTAAGCAAAGCAGACC . . . NS=1;AN=0 GT:PS ./.:. -1 537991 . CGATGTG . . . NS=1;AN=0 GT:PS ./.:. -1 538001 . T . . NS=1;CGA_WINEND=540000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.68:0.76:.:0:.:0:0.999:426 -1 538046 . CTCATTA . . . NS=1;AN=0 GT:PS ./.:. -1 538139 . CAGCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 538194 . CCGCTGGAGAGTGTGGGGCCCCTTGGCCCCTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 538290 . TATGTGATGTGTACATATAAGAACCTT . . . NS=1;AN=0 GT:PS ./.:. -1 538348 . TCCATTTTTATGTTTAGGTTTAATAACGTATGAA . . . NS=1;AN=0 GT:PS ./.:. -1 538472 . TTCATCTCTGAGCATTTTCTTCTCTGGACGTG . . . NS=1;AN=0 GT:PS ./.:. -1 538556 . ATGTTGGAT . . . NS=1;AN=0 GT:PS ./.:. -1 538600 . GACAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 538658 . TGATTTC . . . NS=1;AN=0 GT:PS ./.:. -1 538687 . GAGAGTCTGGTTTCTACAGCGCCTTCAGGGAGAATGAGACTGAG . . . NS=1;AN=0 GT:PS ./.:. -1 538780 . CTGCTGA . . . NS=1;AN=0 GT:PS ./.:. -1 538823 . CAGAAGGAAGGAAGCCTCTCCCTCCAGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 538875 . CCCTCGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 538957 . G . . . NS=1;AN=0 GT:PS ./.:. -1 538968 . TCT . . . NS=1;AN=0 GT:PS ./.:. -1 539025 . CAACACTTCCTTGTGCCTCCTAACCAGGATGGGCGACACCAGCCCATTTT . . . NS=1;AN=0 GT:PS ./.:. -1 539080 . TGGGACAAGAAGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 539164 . TCTCCCCCA . . . NS=1;AN=0 GT:PS ./.:. -1 539179 . CTCCATCCGCTCCATTCTCCTCTCTGCACATCAGCTTCCCAGACAATA . . . NS=1;AN=0 GT:PS ./.:. -1 539245 . CCCAAACTGAAGCTTCCCCACAGTGGCTGCAACTATCCAGA . . . NS=1;AN=0 GT:PS ./.:. -1 539297 . 
ACTTGGGCCTCCAGGCAGGGGATCTAG . . . NS=1;AN=0 GT:PS ./.:. -1 539382 . CTCTTCT . . . NS=1;AN=0 GT:PS ./.:. -1 539447 . TTTTCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 539490 . AAAAGACC . . . NS=1;AN=0 GT:PS ./.:. -1 539632 . AAATTAGCCAGGCATGGTGGCAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 539665 . TCCCAGCTACGCGCAAGGCTGAGGCAGGAG . . . NS=1;AN=0 GT:PS ./.:. -1 539719 . GCTTGCAGTGAGCCGAGATGGTGCCACTGCACTCCAGCCTGGGCGAC . . . NS=1;AN=0 GT:PS ./.:. -1 539778 . TCTCAAAAAAAAAAAAAAAAAAGACC . . . NS=1;AN=0 GT:PS ./.:. -1 539892 . GGGTCCCCGGACCCTTTGGTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 539946 . T . . . NS=1;AN=0 GT:PS ./.:. -1 539972 . CTGCGTGGGGAGGGCGAGCTCAGAGAGCAGGGGAGCCTGACCC . . . NS=1;AN=0 GT:PS ./.:. -1 540001 . G . . NS=1;CGA_WINEND=542000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.76:0.91:.:0:.:0:0.999:426 -1 540165 . A . . . NS=1;AN=0 GT:PS ./.:. -1 540193 . TTTTAAATTAAAATTTTAAATATAG . . . NS=1;AN=0 GT:PS ./.:. -1 540243 . ACATAAA . . . NS=1;AN=0 GT:PS ./.:. -1 540343 . GGAATTGTATCTA . . . NS=1;AN=0 GT:PS ./.:. -1 540437 . GCGTCACCAAGCAAGGGATATGGATGGGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 540708 . CCCGAGAGAAACAAAAGCTTATGTTCACACAAA . . . NS=1;AN=0 GT:PS ./.:. -1 540767 . CTCTATCCAACAACCCTGGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 540863 . GAGCCGCGCACGCGCGGTCGGCTCGGCGAGGAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 540903 . CCAAGTGCCGCCAGCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 540969 . GGTGGGGGAGCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 541049 . AAATGTG . . . NS=1;AN=0 GT:PS ./.:. -1 541154 . AGAGAAAAAAAAAAACCTCATTTCCTCCCCACAAAGCCA . . . NS=1;AN=0 GT:PS ./.:. -1 541218 . C . . . NS=1;AN=0 GT:PS ./.:. -1 541238 . G . . . NS=1;AN=0 GT:PS ./.:. -1 541396 . TCAACCA . . . NS=1;AN=0 GT:PS ./.:. -1 541451 . ATGGACTT . . . NS=1;AN=0 GT:PS ./.:. -1 541560 . TATGGGGGTAA . . . NS=1;AN=0 GT:PS ./.:. -1 541598 . CAAGTCAGAGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 541665 . TCCCAGGCCATGTGGAAGACCTCACAGGGGGACCAACTGCT . . . NS=1;AN=0 GT:PS ./.:. -1 541847 . GACAGCAGCTGTCTAAACAGGAGCATGCAACCCCCGTGCTGAGAGTTCC . . . NS=1;AN=0 GT:PS ./.:. -1 542001 . G . . NS=1;CGA_WINEND=544000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.93:0.91:.:0:.:0:0.999:426 -1 542008 . CCATGGT . . . NS=1;AN=0 GT:PS ./.:. -1 542518 . ATATGGG . . . NS=1;AN=0 GT:PS ./.:. -1 542732 . GGGTGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 542902 . GCTCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 542958 . ACCACTGTCCCCTTCTCACCTTTCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 543221 . ACCTATCACCCATATACCACCTA . . . NS=1;AN=0 GT:PS ./.:. -1 543435 . A . . . NS=1;AN=0 GT:PS ./.:. -1 543565 . GTGAGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 543635 . GGGGCCAA . . . NS=1;AN=0 GT:PS ./.:. -1 543768 . CCCGGGGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 543865 . AGTGTCTTAGGTAGACGGTTACACTTGTTTTCAGTGAG . . . NS=1;AN=0 GT:PS ./.:. -1 543993 . CTGAGC . . . NS=1;AN=0 GT:PS ./.:. -1 544001 . G . . NS=1;CGA_WINEND=546000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.39:0.85:.:0:.:0:0.999:426 -1 544001 . GAGG . . . NS=1;AN=0 GT:PS ./.:. -1 544079 . AGCGCAC . . . NS=1;AN=0 GT:PS ./.:. -1 544123 . AGCGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 544391 . GCCAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 544520 . CTGTCCT . . . NS=1;AN=0 GT:PS ./.:. -1 544697 . GTGAGGGGGGGAACT . . . NS=1;AN=0 GT:PS ./.:. -1 544733 . G . . END=545023;NS=1;AN=0 GT:PS ./.:. -1 545035 . C . . END=545972;NS=1;AN=0 GT:PS ./.:. -1 545980 . GGCCAGCAGGCGGCGCTGCAGGAGAGGAGATGCCCAGGCCTGGCGGCCGGCGCACGCGGGTTCTCTGTGGCCAGCAGGCGGCGCTGCAGGAGGGGAGATGCCCAGGCCTGGCGGCCGGCGCACGTGGGCTCTCTGTGGCCAGCAGGCGGCGCTGCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 546001 . G . . 
NS=1;CGA_WINEND=548000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.18:1.04:.:0:.:0:0.999:426 -1 546189 . ACCGCGGCGGCCTCTCCTGAGGTTCCCTAGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 546267 . GTGCGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 546354 . CCGGCTCTCTCTTCCTTCGTGCACATTCTGGGGCTCATGCTTCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 546428 . CCTTTCA . . . NS=1;AN=0 GT:PS ./.:. -1 546501 . AGCACAGCAGCAGGTGGGCGTCTGCGGCCGGGGCCAGCGCAGGGCCCACT . . . NS=1;AN=0 GT:PS ./.:. -1 546644 . TGCGCGCAGAGTAAGGATGTGTGTGTCTACGCAT . . . NS=1;AN=0 GT:PS ./.:. -1 546693 . TGACAGGGTGTGTTCTGTGTGAGAACATGT . . . NS=1;AN=0 GT:PS ./.:. -1 546729 . TGTCCACATGTCCTCTGTGCGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 546768 . GTTGTGTTCTCGGTGTGAGTTCATGGGTGTGATGGGGTGTGTGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 546820 . AACGTGTGTGTAGTGTCCACATGTCCTCTGTGCGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 546872 . GTTGTGTTCTTGGTGTGAGTTCATGGGTGTGACGGGGTGTG . . . NS=1;AN=0 GT:PS ./.:. -1 546922 . AACGTGTGTGTAGTGTTCACATGTCCTCTGTGCGTGAGTCCCCGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 546974 . GTTGTGTTCTCGGT . . . NS=1;AN=0 GT:PS ./.:. -1 546996 . ATGGGTGTGACGGGGTGTGTGCTGTGTGAGAACGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 547039 . TGTCCACATGTCCTCTGTGCG . . . NS=1;AN=0 GT:PS ./.:. -1 547065 . CCCCGTGTGTGATGTTGTGTTCTCGGT . . . NS=1;AN=0 GT:PS ./.:. -1 547100 . ATGGGTGTGACGGGGTGTGTGCTGTGTGAGAACGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 547143 . TGTCCACATGTCCTCTGTGCG . . . NS=1;AN=0 GT:PS ./.:. -1 547182 . GTTGTGTTCTCGGTGTGAGTTCATGGGTGTGACGGGGTGTGTGCTGT . . . NS=1;AN=0 GT:PS ./.:. -1 547234 . AACGTGTGTGTAGTGTCCACATGTCCTCTGTGCGTGAGTCCCCGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 547286 . GTTGTGTTCTCGGT . . . NS=1;AN=0 GT:PS ./.:. -1 547308 . ATGGGTGTGACGGGGTGTGTGCTGTGTGAGAACGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 547351 . TGTCCACATGTCCTCTGTGCGTGAGTCCCCGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 547390 . GTTGTGTTCTCGGT . . . NS=1;AN=0 GT:PS ./.:. -1 547412 . ATGGGTGTGACGGGGTGTGTGCTGTGTGAGAACGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 547455 . TGTCCACATGTCCTCTGTGCGTGAGTCCCCGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 547494 . GTTGTGTTCTCGGTGTGAGTTCATGGGTGTGACGGGGTGTGTGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 547545 . GAACGTGTGTGTAGTGTCCACATGTCCTCTGTGCATGA . . . NS=1;AN=0 GT:PS ./.:. -1 547598 . GTTGTGTTCTCGGTGTGAGTTCATGGGTGTGACGGGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 547675 . CTGTGCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 547700 . GTTGTGTTCTCGGTGTGAGTTCATGGGTGTGACGGGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 547750 . AACGTGTGTGTAGTGTTCACAT . . . NS=1;AN=0 GT:PS ./.:. -1 547777 . CTGTGCGTGAGTCCCTGTGTGTGATGTTGTGTTCTCGGTGTGAGTT . . . NS=1;AN=0 GT:PS ./.:. -1 547831 . TGACGGGGCGTGTGCTGTGTGAGAACATGT . . . NS=1;AN=0 GT:PS ./.:. -1 547875 . TGTTCTC . . . NS=1;AN=0 GT:PS ./.:. -1 548001 . C . . NS=1;CGA_WINEND=550000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.67:0.99:.:0:.:0:0.999:426 -1 548081 . A . . . NS=1;AN=0 GT:PS ./.:. -1 548087 . ATT . . . NS=1;AN=0 GT:PS ./.:. -1 548092 . A . . . NS=1;AN=0 GT:PS ./.:. -1 548130 . CAGCTACTCCGGAGGCTGAGGTGGGAGGA . . . NS=1;AN=0 GT:PS ./.:. -1 548274 . A . . . NS=1;AN=0 GT:PS ./.:. -1 548281 . T . . . NS=1;AN=0 GT:PS ./.:. -1 548291 . A . . . NS=1;AN=0 GT:PS ./.:. -1 548328 . CAGTAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 548457 . G . . . NS=1;AN=0 GT:PS ./.:. -1 548491 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2792860&dbsnp.131|rs75892356;CGA_SDO=8 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:28:28,63:28,63:27,28:-63,-28,0:-28,-27,0:0:0,0:0 -1 548511 . A . . . NS=1;AN=0 GT:PS ./.:. -1 548585 . C . . . NS=1;AN=0 GT:PS ./.:. -1 548622 . AAAATACAAAAATTAGCCAGGCATGGTGGCAGGCACCTGTA . . . 
NS=1;AN=0 GT:PS ./.:. -1 548701 . TGAACCCGGGAGGTGGAGCTTGCAGCGAGCTGAG . . . NS=1;AN=0 GT:PS ./.:. -1 548776 . TCCGTCTCAAAAAAGAAAAAAAAAATTAC . . . NS=1;AN=0 GT:PS ./.:. -1 549007 . AAACAAAATTATTCATATAATGGGTTCAAGAAAACAAA . . . NS=1;AN=0 GT:PS ./.:. -1 549337 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.100|rs2491327;CGA_SDO=8 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:54:54,54:54,54:24,32:-54,0,-54:-24,0,-32:17:3,14:14 -1 549538 . AAAGTGCCACTG . . . NS=1;AN=0 GT:PS ./.:. -1 549879 . AATTAAAAATCCCATCTGCTGTCAATGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 550001 . T . . NS=1;CGA_WINEND=552000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.93:0.87:.:0:.:0:0.999:426 -1 550122 . TCACGCC . . . NS=1;AN=0 GT:PS ./.:. -1 550295 . T . . . NS=1;AN=0 GT:PS ./.:. -1 550368 . GGCGAGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 550398 . AACAAAAAGCCAACCATGGGATCTGTGGCAC . . . NS=1;AN=0 GT:PS ./.:. -1 550451 . CAGCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 550669 . G . . . NS=1;AN=0 GT:PS ./.:. -1 550796 . CAAGGTGGGGAATGTTCTCTTAACCTGCAGCTTTCTCCTTCAGGA . . . NS=1;AN=0 GT:PS ./.:. -1 550882 . CTCAGGGCCTTCCAGCCAGACCTTC . . . NS=1;AN=0 GT:PS ./.:. -1 550952 . TGTGACT . . . NS=1;AN=0 GT:PS ./.:. -1 551036 . ACCATTTCCCACCCTGGATCTCAAAAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 551333 . GACTGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 551401 . CCACCCCCTTATTGTTATAGGAGT . . . NS=1;AN=0 GT:PS ./.:. -1 551540 . GAGTCACTGGCAAGCTTTGATATGCAAACGCA . . . NS=1;AN=0 GT:PS ./.:. -1 551622 . TTGCCCCAACAGGTGGCTGGCAACATGGCCGCCCCCACATATCCCC . . . NS=1;AN=0 GT:PS ./.:. -1 551718 . C . . . NS=1;AN=0 GT:PS ./.:. -1 551753 . GACCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 551814 . TCCTCATATAACTAGCTGATTACACCACACACACGCCCTCCCCCCACA . . . NS=1;AN=0 GT:PS ./.:. -1 552001 . G . . NS=1;CGA_WINEND=554000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.79:0.94:.:0:.:0:0.999:426 -1 552100 . CACCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 552159 . GATGCACAAGTGAGAGAAAAATGTA . . . NS=1;AN=0 GT:PS ./.:. -1 552270 . GAACTTTTACAA . . . NS=1;AN=0 GT:PS ./.:. -1 552389 . ATGAGAAGGGCCCAACCCCAGAGCCCAGGCCAGTCAGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 552471 . CCAGGGGCTAGGTCCATGGCCTT . . . NS=1;AN=0 GT:PS ./.:. -1 552507 . GCAGGAG . . . NS=1;AN=0 GT:PS ./.:. -1 552537 . AGCGTAGATTCCCACTCATTCCCACAGCCAATTCTCATCCTTCAGT . . . NS=1;AN=0 GT:PS ./.:. -1 552604 . TGGTTTTTAACCAATAT . . . NS=1;AN=0 GT:PS ./.:. -1 552823 . ACATACA . . . NS=1;AN=0 GT:PS ./.:. -1 553193 . CCCTACTCAGCAGGCCCAGATGAG . . . NS=1;AN=0 GT:PS ./.:. -1 553223 . GGGTGCTCCATCTGACTGCACAGGAAGGCAAGGCCATT . . . NS=1;AN=0 GT:PS ./.:. -1 553331 . GAACACAGGGGCACTCTT . . . NS=1;AN=0 GT:PS ./.:. -1 553367 . CTGATTTTTTTTTTTATCT . . . NS=1;AN=0 GT:PS ./.:. -1 553673 . TCCTAAAGAGATTAACTGAAAGTCTAGCACTTTGTTTTTTTTTTGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 553848 . GCCCACGACCACACACAGCTAATTTT . . . NS=1;AN=0 GT:PS ./.:. -1 554001 . C . . NS=1;CGA_WINEND=556000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.83:1.04:.:0:.:0:0.999:426 -1 554034 . AAACATT . . . NS=1;AN=0 GT:PS ./.:. -1 554077 . CTTCAAAAGAAC . . . NS=1;AN=0 GT:PS ./.:. -1 554188 . G . . . NS=1;AN=0 GT:PS ./.:. -1 554350 . GGCCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 554413 . TTTTCGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 554462 . GCTCCTGACATTGGGTTGCGCC . . . NS=1;AN=0 GT:PS ./.:. -1 554492 . CTGTGGACTCTTCCCTCGGAATGAGAGAGGGAGATG . . . NS=1;AN=0 GT:PS ./.:. -1 554659 . TGAAATAG . . . NS=1;AN=0 GT:PS ./.:. -1 554732 . GTCGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 554792 . AAGCACTCAGCCTGCCGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 554862 . AGCTATACC . . . NS=1;AN=0 GT:PS ./.:. -1 554899 . 
AAGAGTCTGTGAGTGGGCAGAATTCCCTCCAGGGTCAT . . . NS=1;AN=0 GT:PS ./.:. -1 554992 . AACTGGTCAGCCTCACTCCCTTGCTGAGACCAATAGCAACCCC . . . NS=1;AN=0 GT:PS ./.:. -1 555092 . ACCAGGA . . . NS=1;AN=0 GT:PS ./.:. -1 555162 . TGTTGATA . . . NS=1;AN=0 GT:PS ./.:. -1 555242 . TCACGTT . . . NS=1;AN=0 GT:PS ./.:. -1 555401 . CTGTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 555466 . CAGCAGCAACAGACTTTACAA . . . NS=1;AN=0 GT:PS ./.:. -1 555533 . CTTCTTTTTCACTTAAAATGCTGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 555660 . G . . . NS=1;AN=0 GT:PS ./.:. -1 555735 . CTCAGAGCCAATAGACA . . . NS=1;AN=0 GT:PS ./.:. -1 555771 . CAAGGCCCCTTATCTTTGGAGCCCAGTGTTCCTTCCACAT . . . NS=1;AN=0 GT:PS ./.:. -1 556001 . A . . NS=1;CGA_WINEND=558000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.90:0.88:.:0:.:0:0.999:426 -1 556057 . ATGCAAAAGAGATTCAGTGCAAGAGAAATCCTCCTACTGGTTTTGAAGA . . . NS=1;AN=0 GT:PS ./.:. -1 556352 . TGACCTATA . . . NS=1;AN=0 GT:PS ./.:. -1 556703 . AACTTGGGCAACAGAGCAAGACACCAGCTCTAAAAAAAAAAAAAACTGCT . . . NS=1;AN=0 GT:PS ./.:. -1 556827 . GTGAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 556973 . AATCATTCCAGGAAGGTAGGGAAAGATG . . . NS=1;AN=0 GT:PS ./.:. -1 557112 . AGGAAGA . . . NS=1;AN=0 GT:PS ./.:. -1 557191 . GAGGAGGTGGGAATGTGCTGGGTGAGT . . . NS=1;AN=0 GT:PS ./.:. -1 557236 . CCTGCCTCACCTCACTCCCCACTGTCGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 557286 . TGCGCTC . . . NS=1;AN=0 GT:PS ./.:. -1 557401 . TCCGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 557517 . CCTTTCTCTCCATC . . . NS=1;AN=0 GT:PS ./.:. -1 557707 . GACCACC . . . NS=1;AN=0 GT:PS ./.:. -1 557793 . GAGCTATGGGAACCAATGGAATTGGATCTAAGGTTTTGAGTT . . . NS=1;AN=0 GT:PS ./.:. -1 558001 . C . . NS=1;CGA_WINEND=560000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.76:1.00:.:0:.:0:0.999:426 -1 558166 . AAGGCATGCATTCATTGCAC . . . NS=1;AN=0 GT:PS ./.:. -1 558226 . GTGAGGCCTGACCGCTGTGAGGATGGACT . . . NS=1;AN=0 GT:PS ./.:. -1 558433 . GTTTAAAACTCTCCCTAGGAATTCCAATGTGCAGCCAAGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 558533 . ACCCATGACCTCAGCTTCATCAGGAGACCCACTGTAAG . . . NS=1;AN=0 GT:PS ./.:. -1 558720 . CCCCAGCAGAAGGGAGAGGCGAAGA . . . NS=1;AN=0 GT:PS ./.:. -1 558753 . GCCAGGACTGTGTAAGGCCTTTGAAGGTTGACCATCCATCTC . . . NS=1;AN=0 GT:PS ./.:. -1 558882 . CATTCACACTCAA . . . NS=1;AN=0 GT:PS ./.:. -1 559020 . A . . . NS=1;AN=0 GT:PS ./.:. -1 559048 . AATCGTGTGTGTGTGTGTGTGTGTGTGTGTGTAGAGC . . . NS=1;AN=0 GT:PS ./.:. -1 559127 . GGCACCTCTTTTCACTGCTGTCATGCCCT . . . NS=1;AN=0 GT:PS ./.:. -1 559228 . GT . . . NS=1;AN=0 GT:PS ./.:. -1 559385 . AGAGATG . . . NS=1;AN=0 GT:PS ./.:. -1 559408 . CTGTCTTTGGAATGTGTGATTAAGCTTTTGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 559452 . GACAGCTTCCAGCAGCTTTGTCTGCCACCGTGTCCACTT . . . NS=1;AN=0 GT:PS ./.:. -1 559818 . AGGGAGGGAGGCAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 559865 . GAGCAAGGAAGGAAGGAAGGAAGGAAGGGAGGGAGGGAGGGAGGGAGGGACATAGACAGTGGATTTG . . . NS=1;AN=0 GT:PS ./.:. -1 559975 . T . . . NS=1;AN=0 GT:PS ./.:. -1 560001 . T . . NS=1;CGA_WINEND=562000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.76:0.97:.:0:.:0:0.999:426 -1 560042 . GGCTGGGGTAAAGAGTGATACATGTAAGGT . . . NS=1;AN=0 GT:PS ./.:. -1 560129 . CTACTACTGATATTATCATTTC . . . NS=1;AN=0 GT:PS ./.:. -1 560373 . CAGCAATTG . . . NS=1;AN=0 GT:PS ./.:. -1 560788 . TCATTTG . . . NS=1;AN=0 GT:PS ./.:. -1 560946 . GACGTGCAGTCATCCTATTTT . . . NS=1;AN=0 GT:PS ./.:. -1 561056 . TGCAACCTCCACCTCCCTGGT . . . NS=1;AN=0 GT:PS ./.:. -1 561089 . CCTGCCTCAGCCTTCCCAGCAGCTGGGATTACAGGTA . . . NS=1;AN=0 GT:PS ./.:. -1 561151 . TGTATTTTTTAATAG . . . NS=1;AN=0 GT:PS ./.:. -1 561239 . T . . . NS=1;AN=0 GT:PS ./.:. -1 561441 . 
GTCTGATCGCCTGTGCCCACGGTAGCGTATCAGGTACACGGCGCT . . . NS=1;AN=0 GT:PS ./.:. -1 561619 . AATGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 561826 . ACCATTGTTTAGACAGTTATATGAAATGGGGTATTTTCTAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 561888 . ATATGGGGATTTTTTTTTTTTTTTTTTGAGACAGAGTCTCACTCTGTCGCCTAGGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 562001 . C . . NS=1;CGA_WINEND=564000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.70:0.95:.:0:.:0:0.999:426 -1 562013 . TTTCTTTTTTTTTTTTTTTTTTTTGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 562144 . GAGTAGC . . . NS=1;AN=0 GT:PS ./.:. -1 562158 . ACAGGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTTAGT . . . NS=1;AN=0 GT:PS ./.:. -1 562222 . CCATGTTGGCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 562255 . CTCGTGGTCTGCCCACCTCGGCCTC . . . NS=1;AN=0 GT:PS ./.:. -1 562365 . ATAGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 562481 . AAGGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 562567 . TTTATTT . . . NS=1;AN=0 GT:PS ./.:. -1 562620 . CCAGATTCCTTGAAGGCAGGAT . . . NS=1;AN=0 GT:PS ./.:. -1 562739 . C . . . NS=1;AN=0 GT:PS ./.:. -1 562856 . CCTCGCT . . . NS=1;AN=0 GT:PS ./.:. -1 562907 . GA GG . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.131|rs78231499;CGA_FI=100131754|XR_108278.1|LOC100131754|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=L2c|L2|35.5;CGA_SDO=9 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:50,.:50,.:22,.:-50,0,0:-22,0,0:4:2,.:2 -1 562972 . A T . . NS=1;AN=2;AC=1;CGA_FI=100131754|XR_108278.1|LOC100131754|TSS-UPSTREAM|UNKNOWN-INC;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:VQLOW:34:34,34:34,34:12,27:-34,0,-34:-12,0,-27:14:1,13:13 -1 563013 . CATCAGA . . . NS=1;AN=0 GT:PS ./.:. -1 563124 . C . . . NS=1;AN=0 GT:PS ./.:. -1 563268 . AGAGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 563374 . T . . . NS=1;AN=0 GT:PS ./.:. -1 563458 . TTTCTTTTCTAA . . . NS=1;AN=0 GT:PS ./.:. -1 563491 . ACCTCACATTCTCTGGTA . . . NS=1;AN=0 GT:PS ./.:. -1 563573 . CACGCTCT . . . NS=1;AN=0 GT:PS ./.:. -1 563615 . GCACAAAAATTCCCA . . . NS=1;AN=0 GT:PS ./.:. -1 563819 . TAAAATCCAGTTTGTGCCTAC . . . NS=1;AN=0 GT:PS ./.:. -1 563938 . TTTTGACTA . . . NS=1;AN=0 GT:PS ./.:. -1 564001 . C . . NS=1;CGA_WINEND=566000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:92.54:0.98:.:0:.:0:0.999:426 -1 564038 . TCC . . . NS=1;AN=0 GT:PS ./.:. -1 564067 . CAACAATGATTTCAAATATTTCACTTTTTAAGTCAGTGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 564151 . AAACAGTCGTCATTACCATAGCTGTGACAGGGAGACTGTTGAATTTATAATCTATTGGCCATTCACAGCATAGCGTATA . . . NS=1;AN=0 GT:PS ./.:. -1 564248 . T . . . NS=1;AN=0 GT:PS ./.:. -1 564270 . TCATCACATTCCCTTCACAACTTACTCACCAGATCAGACTTTGAGCTCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 564326 . GCTTA . . . NS=1;AN=0 GT:PS ./.:. -1 564338 . TCGTTTGAAATGGTCATCCATCCTT . . . NS=1;AN=0 GT:PS ./.:. -1 564365 . G . . . NS=1;AN=0 GT:PS ./.:. -1 564373 . ACC . . . NS=1;AN=0 GT:PS ./.:. -1 564383 . A . . . NS=1;AN=0 GT:PS ./.:. -1 564385 . GTCTAT . . . NS=1;AN=0 GT:PS ./.:. -1 564399 . AGATGATTTTCTTC . . . NS=1;AN=0 GT:PS ./.:. -1 564415 . T . . . NS=1;AN=0 GT:PS ./.:. -1 564423 . TTTTGTTTAATATATTAGATTTGACCTTCAGCAAGGTC . . . NS=1;AN=0 GT:PS ./.:. -1 564463 . A . . . NS=1;AN=0 GT:PS ./.:. -1 564583 GS000016676-ASM_2060_L C [12:41757451[NC . . NS=1;SVTYPE=BND;MATEID=GS000016676-ASM_2060_R;CGA_BF=0.10;CGA_BNDGO=NM_001164595|+ GT:FT:CGA_BNDMPC:CGA_BNDPOS:CGA_BNDDEF:CGA_BNDP 1:TSNR;SHORT;INTERBL:12:564583:[41757451[NC:IMPRECISE -1 564621 . C T . . 
NS=1;AN=1;AC=1;CGA_XR=dbsnp.119|rs10458597;CGA_FI=100131754|XR_108278.1|LOC100131754|UTR|UNKNOWN-INC;CGA_SDO=9 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:145,.:145,.:11,.:-145,0,0:-11,0,0:467:464,.:3 -1 564654 . G . . . NS=1;AN=0 GT:PS ./.:. -1 564714 . A . . . NS=1;AN=0 GT:PS ./.:. -1 564862 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.92|rs1988726;CGA_FI=100131754|XR_108278.1|LOC100131754|UTR|UNKNOWN-INC;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:222:467,222:467,222:20,50:-467,-222,0:-50,-20,0:451:448,448:2 -1 564868 . T . . . NS=1;AN=0 GT:PS ./.:. -1 565006 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.134|rs142650224&dbsnp.92|rs1856864;CGA_FI=100131754|XR_108278.1|LOC100131754|UTR|UNKNOWN-INC;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:270:270,460:270,460:53,20:-460,-270,0:-53,-20,0:313:311,311:2 -1 565148 . T A . . NS=1;AN=1;AC=1;CGA_FI=100131754|XR_108278.1|LOC100131754|UTR|UNKNOWN-INC;CGA_SDO=9 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:166,.:166,.:11,.:-166,0,0:-11,0,0:495:494,.:1 -1 565286 . C T . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.134|rs140432825&dbsnp.88|rs1578391;CGA_FI=100131754|XR_108278.1|LOC100131754|UTR|UNKNOWN-INC;CGA_SDO=9 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:234,.:234,.:12,.:-234,0,0:-12,0,0:322:308,.:14 -1 565406 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs6594029&dbsnp.134|rs144191907;CGA_FI=100131754|XR_108278.1|LOC100131754|UTR|UNKNOWN-INC;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:422:926,422:926,422:41,53:-926,-422,0:-53,-41,0:259:254,254:5 -1 565454 . T . . . NS=1;AN=0 GT:PS ./.:. -1 565464 . T C . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.116|rs6594030;CGA_FI=100131754|XR_108278.1|LOC100131754|UTR|UNKNOWN-INC;CGA_SDO=9 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:380,.:380,.:16,.:-380,0,0:-16,0,0:388:388,.:0 -1 565490 . T C . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.116|rs7349153&dbsnp.134|rs141292355;CGA_FI=100131754|XR_108278.1|LOC100131754|INTRON|UNKNOWN-INC;CGA_SDO=9 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:319,.:319,.:15,.:-319,0,0:-15,0,0:353:351,.:1 -1 565508 . G A . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.119|rs9283150&dbsnp.134|rs145079224;CGA_FI=100131754|XR_108278.1|LOC100131754|INTRON|UNKNOWN-INC;CGA_SDO=9 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:288,.:288,.:14,.:-288,0,0:-14,0,0:296:294,.:1 -1 565541 . A G . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.116|rs6594031&dbsnp.134|rs145787647;CGA_FI=100131754|XR_108278.1|LOC100131754|INTRON|UNKNOWN-INC;CGA_SDO=9 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:374,.:374,.:16,.:-374,0,0:-16,0,0:272:269,.:3 -1 565591 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs7416152&dbsnp.134|rs141914161;CGA_FI=100131754|XR_108278.1|LOC100131754|INTRON|UNKNOWN-INC;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:249:249,367:249,367:53,16:-367,-249,0:-53,-16,0:262:255,255:7 -1 565697 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.101|rs3021087&dbsnp.134|rs143799743;CGA_FI=100131754|XR_108278.1|LOC100131754|INTRON|UNKNOWN-INC;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:218:218,469:218,469:50,20:-469,-218,0:-50,-20,0:202:196,196:6 -1 565870 . T C . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.119|rs9326619&dbsnp.134|rs148172508;CGA_SDO=9 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:276,.:276,.:14,.:-276,0,0:-14,0,0:347:344,.:3 -1 565937 . T . . . NS=1;AN=0 GT:PS ./.:. -1 565976 . C T . . 
NS=1;AN=1;AC=1;CGA_XR=dbsnp.119|rs9283151&dbsnp.134|rs138694460;CGA_SDO=9 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:172,.:172,.:11,.:-172,0,0:-11,0,0:321:311,.:9 -1 566001 . A . . NS=1;CGA_WINEND=568000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:98.61:1.03:.:0:.:0:0.999:426 -1 566010 . G . . . NS=1;AN=0 GT:PS ./.:. -1 566021 . A G . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.116|rs6421778;CGA_SDO=9 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:224,.:224,.:12,.:-224,0,0:-12,0,0:396:395,.:1 -1 566024 . G A . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.116|rs6421779;CGA_SDO=9 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:224,.:224,.:12,.:-224,0,0:-12,0,0:397:397,.:0 -1 566048 . G A . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.116|rs6421780&dbsnp.134|rs145777875;CGA_SDO=9 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:105,.:105,.:9,.:-105,0,0:-9,0,0:209:208,.:1 -1 566130 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.134|rs147381188&dbsnp.92|rs1832730;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:517:517,1336:517,1336:53,44:-1336,-517,0:-53,-44,0:340:338,338:2 -1 566371 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.129|rs56133209&dbsnp.134|rs149516726&dbsnp.92|rs1832731;CGA_SDO=10 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:416:416,987:416,987:53,43:-987,-416,0:-53,-43,0:256:254,254:2 -1 566390 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs7418044&dbsnp.134|rs144069128;CGA_SDO=10 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:407:897,407:897,407:39,53:-897,-407,0:-53,-39,0:298:297,297:1 -1 566573 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.134|rs143522810&dbsnp.92|rs1856866;CGA_SDO=10 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:265:265,730:265,730:53,33:-730,-265,0:-53,-33,0:133:130,130:3 -1 566771 . C T . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.134|rs146545403&dbsnp.96|rs2185536;CGA_SDO=10 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:222,.:222,.:12,.:-222,0,0:-12,0,0:453:452,.:1 -1 566778 . C T . . NS=1;AN=1;AC=1;CGA_SDO=10 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:236,.:236,.:12,.:-236,0,0:-12,0,0:502:502,.:0 -1 566792 . T C . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.119|rs9283152&dbsnp.134|rs140024420;CGA_SDO=10 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:129,.:129,.:11,.:-129,0,0:-11,0,0:301:301,.:0 -1 566816 . C A . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.134|rs151278223&dbsnp.96|rs2185537;CGA_SDO=10 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:41,.:41,.:5,.:-41,0,0:-5,0,0:120:118,.:2 -1 566849 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.134|rs140536240&dbsnp.96|rs2185538;CGA_SDO=10 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:305:305,713:305,713:53,32:-713,-305,0:-53,-32,0:155:152,152:3 -1 566916 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.108|rs3949348&dbsnp.134|rs145370722;CGA_SDO=10 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:372:372,906:372,906:53,40:-906,-372,0:-53,-40,0:240:235,235:5 -1 566933 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.132|rs113120793&dbsnp.134|rs142636660;CGA_SDO=10 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:342:342,832:342,832:53,37:-832,-342,0:-53,-37,0:180:176,176:4 -1 566960 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.134|rs146905624&dbsnp.96|rs2185540;CGA_SDO=10 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:420:1045,420:1045,420:44,53:-1045,-420,0:-53,-44,0:175:174,174:1 -1 567002 . T C . . 
NS=1;AN=1;AC=1;CGA_XR=dbsnp.119|rs9285834&dbsnp.134|rs138612401;CGA_SDO=10 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:567002:PASS:61,.:193,.:11,.:-61,0,0:-11,0,0:486:486,.:0 -1 567005 . C T . . NS=1;AN=1;AC=1;CGA_SDO=10 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:567002:PASS:66,.:207,.:11,.:-66,0,0:-11,0,0:541:541,.:0 -1 567033 . T . . . NS=1;AN=0 GT:PS ./.:. -1 567037 . TTCTC . . . NS=1;AN=0 GT:PS ./.:. -1 567061 . CC CTGCT,CT . . NS=1;AN=2;AC=1,1;CGA_XR=.,dbsnp.119|rs9326621&dbsnp.134|rs143484473;CGA_SDO=10 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/2:.:PASS:75:75,240:22,187:4,11:-240,-240,-240,-75,0,-75:-11,-11,-11,-4,0,-4:120:16,65:1 -1 567092 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9326622&dbsnp.134|rs148002581;CGA_SDO=10 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:53:53,587:159,534:43,24:-587,-53,0:-43,-24,0:116:116,116:0 -1 567119 . A C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9283153&dbsnp.134|rs141701738;CGA_SDO=10 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:221:221,583:221,583:50,26:-583,-221,0:-50,-26,0:123:117,117:6 -1 567191 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.134|rs139593516&dbsnp.96|rs2185541;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:341:341,816:341,816:53,36:-816,-341,0:-53,-36,0:141:136,136:5 -1 567230 . T . . . NS=1;AN=0 GT:PS ./.:. -1 567239 . CG C . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.129|rs60652689&dbsnp.131|rs78150957&dbsnp.134|rs147555239&dbsnp.134|rs149882676;CGA_SDO=9 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:VQLOW:28,.:28,.:4,.:-28,0,0:-4,0,0:358:357,.:1 -1 567486 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.92|rs1972377;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:204:204,498:204,498:48,22:-498,-204,0:-48,-22,0:134:133,133:1 -1 567489 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.92|rs1972378;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:174:393,174:393,174:17,44:-393,-174,0:-44,-17,0:136:134,134:2 -1 567575 . TAGCCCACT . . . NS=1;AN=0 GT:PS ./.:. -1 567697 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.134|rs148529803&dbsnp.92|rs1972379;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:617:617,1464:617,1464:53,44:-1464,-617,0:-53,-44,0:365:365,365:0 -1 567783 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.134|rs142895724&dbsnp.92|rs2000095;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:337:635,337:635,337:28,53:-635,-337,0:-53,-28,0:240:240,240:0 -1 567807 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2853819&dbsnp.134|rs151077676;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:471:1005,471:1005,471:44,53:-1005,-471,0:-53,-44,0:362:360,360:2 -1 567867 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.134|rs142547273&dbsnp.92|rs2000096;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:119:256,119:256,119:26,38:-256,-119,0:-38,-26,0:65:64,64:1 -1 568001 . A . . NS=1;CGA_WINEND=570000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:100.32:0.91:.:0:.:0:0.999:426 -1 568072 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2853820&dbsnp.134|rs147963388;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:293:713,293:713,293:32,53:-713,-293,0:-53,-32,0:193:192,192:1 -1 568201 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.108|rs4098611&dbsnp.134|rs150293476;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:287:287,489:287,489:53,21:-489,-287,0:-53,-21,0:229:228,228:1 -1 568235 . T . . . 
NS=1;AN=0 GT:PS ./.:. -1 568256 . C T . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.108|rs4098612&dbsnp.134|rs137953150;CGA_SDO=9 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:131,.:131,.:11,.:-131,0,0:-11,0,0:336:336,.:0 -1 568361 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.108|rs4098613&dbsnp.134|rs143080150;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:580:580,1458:580,1458:53,44:-1458,-580,0:-53,-44,0:354:354,354:0 -1 568404 . G A . . NS=1;AN=1;AC=1;CGA_SDO=9 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:349,.:349,.:15,.:-349,0,0:-15,0,0:467:467,.:0 -1 568419 . T . . . NS=1;AN=0 GT:PS ./.:. -1 568442 . T C . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.116|rs7419218&dbsnp.134|rs146264256;CGA_SDO=9 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:127,.:127,.:11,.:-127,0,0:-11,0,0:273:272,.:0 -1 568463 . A . . . NS=1;AN=0 GT:PS ./.:. -1 568572 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs7413388&dbsnp.134|rs139256206;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:490:490,1198:490,1198:53,44:-1198,-490,0:-53,-44,0:219:216,216:3 -1 568616 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs7411884&dbsnp.134|rs144294451;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:408:408,972:408,972:53,42:-972,-408,0:-53,-42,0:147:142,142:5 -1 568691 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.117|rs8179256&dbsnp.134|rs141850957;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:257:257,459:257,459:53,20:-459,-257,0:-53,-20,0:170:169,169:1 -1 568703 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs7411906&dbsnp.134|rs143001019;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:163:163,245:163,245:43,13:-245,-163,0:-43,-13,0:166:166,166:0 -1 568718 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs7417964&dbsnp.134|rs150918512;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:152:152,247:152,247:42,13:-247,-152,0:-42,-13,0:130:129,129:1 -1 568745 . C CCA . . NS=1;AN=1;AC=1;CGA_SDO=9 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:126,.:126,.:7,.:-126,0,0:-7,0,0:140:140,.:0 -1 568752 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9326624;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:146:180,146:180,146:14,41:-180,-146,0:-41,-14,0:102:102,102:0 -1 568941 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.117|rs8179289&dbsnp.134|rs150104574;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:208:208,356:208,356:49,15:-356,-208,0:-49,-15,0:162:158,158:4 -1 569004 . T . . . NS=1;AN=0 GT:PS ./.:. -1 569010 . T C . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.119|rs9285836;CGA_SDO=9 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:264,.:264,.:14,.:-264,0,0:-14,0,0:291:291,.:0 -1 569052 . C T . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.134|rs138605248&dbsnp.96|rs2153588;CGA_SDO=9 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:166,.:166,.:11,.:-166,0,0:-11,0,0:237:235,.:2 -1 569094 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9283154&dbsnp.134|rs140303844;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:351:623,351:623,351:28,53:-623,-351,0:-53,-28,0:319:319,319:0 -1 569204 . T C . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.132|rs112660509&dbsnp.134|rs145433303;CGA_SDO=9 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:198,.:198,.:11,.:-198,0,0:-11,0,0:368:367,.:1 -1 569226 . C . . . NS=1;AN=0 GT:PS ./.:. -1 569267 . G . . . NS=1;AN=0 GT:PS ./.:. -1 569492 . T C . . 
NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs6594033&dbsnp.134|rs147253560;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:255:522,255:522,255:23,53:-522,-255,0:-53,-23,0:218:218,218:0 -1 569609 . A . . . NS=1;AN=0 GT:PS ./.:. -1 569624 . T C . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.116|rs6594035&dbsnp.134|rs144764545;CGA_SDO=9 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:472,.:472,.:21,.:-472,0,0:-21,0,0:381:379,.:2 -1 569717 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.134|rs147510063&dbsnp.96|rs2096044;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:40:272,40:272,40:14,29:-272,-40,0:-29,-14,0:301:301,301:0 -1 569803 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.134|rs150050719&dbsnp.96|rs2096045;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:312:312,823:312,823:53,36:-823,-312,0:-53,-36,0:285:282,282:3 -1 569874 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.96|rs2096046;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:391:391,834:391,834:53,37:-834,-391,0:-53,-37,0:326:325,325:1 -1 569878 . C G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9283155;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:478:1099,478:1099,478:44,53:-1099,-478,0:-53,-44,0:352:351,351:1 -1 569983 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.134|rs145414098&dbsnp.96|rs2096047;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:437:437,1105:437,1105:53,44:-1105,-437,0:-53,-44,0:334:333,333:1 -1 570001 . G . . NS=1;CGA_WINEND=572000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:9.98:0.94:.:0:.:0:0.999:426 -1 570076 . T . . . NS=1;AN=0 GT:PS ./.:. -1 570079 . C T . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.119|rs9283156;CGA_SDO=9 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:122,.:122,.:10,.:-122,0,0:-10,0,0:367:366,.:1 -1 570094 . G . . . NS=1;AN=0 GT:PS ./.:. -1 570097 . A G . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.100|rs2298012;CGA_SDO=9 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:61,.:61,.:6,.:-61,0,0:-6,0,0:209:207,.:2 -1 570178 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9326626&dbsnp.134|rs146675873;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:412:412,981:412,981:53,43:-981,-412,0:-53,-43,0:208:206,206:2 -1 570301 . C . . END=570596;NS=1;AN=0 GT:PS ./.:. -1 570623 . CA . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:570623:VQLOW:25,.:0,.:0,.:0:0 -1 570626 . G GA . . NS=1;AN=2;AC=1;CGA_SDO=9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 0|1:570623:VQLOW:25:25,25:0,0:5,0:-25,0,-25:0,0,-5:11:10,1:10 -1 570634 . TTAATGATAAACCCATTCACAA . . . NS=1;AN=0 GT:PS ./.:. -1 570662 . A . . . NS=1;AN=0 GT:PS ./.:. -1 570668 . T . . . NS=1;AN=0 GT:PS ./.:. -1 570672 . T . . . NS=1;AN=0 GT:PS ./.:. -1 570679 . ATGTT . . . NS=1;AN=0 GT:PS ./.:. -1 570814 . GCTCTTTTCATTACACATGAC . . . NS=1;AN=0 GT:PS ./.:. -1 570918 . AAAAGGTCACCAAACCAAATTTGGGTCCACCCACCCAGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 571203 . A . . . NS=1;AN=0 GT:PS ./.:. -1 571253 . TCACACCTGTAATCTCAGCGCTTTGGGAGGCCAAGGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 571319 . GACCAGCCAGACCAAC . . . NS=1;AN=0 GT:PS ./.:. -1 571344 . CACCATCTCTACTAAAAATACAAAAATTATCTGGGCATGG . . . NS=1;AN=0 GT:PS ./.:. -1 571439 . AACGTGGGAGGCAGAGGGTTCAGTGAGCTGAGATCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 571489 . AGCCTGGGCAGCAGAGTGAGACTCTGTCTCAAACAAACAAACAAAAACC . . . NS=1;AN=0 GT:PS ./.:. -1 571590 . A . . . NS=1;AN=0 GT:PS ./.:. -1 571620 . GGGTGTGACTTCCTCCAGGTCCCTCAGGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 571662 . 
AAAGAACAGTTGTGAGTGTT . . . NS=1;AN=0 GT:PS ./.:. -1 571834 . TTTCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 571948 . TTTAAGATTAGTTTGTGGCTCTAATTTTCTTGACTATTGTTTTAAGC . . . NS=1;AN=0 GT:PS ./.:. -1 572001 . T . . NS=1;CGA_WINEND=574000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.74:0.92:.:0:.:0:0.999:426 -1 572315 . AGCGTGCCCCCAATTTTGCATGCCCC . . . NS=1;AN=0 GT:PS ./.:. -1 572398 . AAGGATG . . . NS=1;AN=0 GT:PS ./.:. -1 572468 . ACACACCAGATTCATTCCCTGATTAGAGCTGCTGAATT . . . NS=1;AN=0 GT:PS ./.:. -1 572550 . CTCATAA . . . NS=1;AN=0 GT:PS ./.:. -1 572638 . C . . . NS=1;AN=0 GT:PS ./.:. -1 572947 . ACCTACA . . . NS=1;AN=0 GT:PS ./.:. -1 573002 . TTCATAGC . . . NS=1;AN=0 GT:PS ./.:. -1 573098 . ACAAGAT . . . NS=1;AN=0 GT:PS ./.:. -1 573212 . TGGCGAA . . . NS=1;AN=0 GT:PS ./.:. -1 573393 . TCTGATAAATAAATAATAAATAA . . . NS=1;AN=0 GT:PS ./.:. -1 573493 . TTGCTCAATGCAAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 573532 . GAGCAAATAGATAGTTAACCACTCTTTAAGACA . . . NS=1;AN=0 GT:PS ./.:. -1 573752 . AGCCGACTGCTGGGTGCGCGGC . . . NS=1;AN=0 GT:PS ./.:. -1 573888 . TCCTGATCAGCTCTTTATTCGAT . . . NS=1;AN=0 GT:PS ./.:. -1 574001 . T . . NS=1;CGA_WINEND=576000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.87:1.01:.:0:.:0:0.999:426 -1 574140 . AAAATTTTTTTAAAAAAAAAGAAGAAGAGTACCTACTGTATAGCATTGATTTGA . . . NS=1;AN=0 GT:PS ./.:. -1 574312 . GAGGGACAGAAAGAAGTAGGAGAAGGTAAAGAGATGGGAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 574389 . TTTATTATGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 574433 . AGAAGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 574556 . TGTGATTATTTGAACTTCAGCATCT . . . NS=1;AN=0 GT:PS ./.:. -1 574599 . GAAATACA . . . NS=1;AN=0 GT:PS ./.:. -1 574661 . T . . . NS=1;AN=0 GT:PS ./.:. -1 574695 . AGTTAAAATGCCATGGTTGTCTATTGGCTTAATGAA . . . NS=1;AN=0 GT:PS ./.:. -1 574841 . CTCTTCCGTA . . . NS=1;AN=0 GT:PS ./.:. -1 574882 . AAATGTCTGGCTTTCTGACTCATAGGTGTGTTCC . . . NS=1;AN=0 GT:PS ./.:. -1 574970 . TCAATTTTATTGAAGTTCACTTCTGACCTCTTTA . . . NS=1;AN=0 GT:PS ./.:. -1 575023 . AACTGCCCAACAGA . . . NS=1;AN=0 GT:PS ./.:. -1 575127 . G . . . NS=1;AN=0 GT:PS ./.:. -1 575168 . TTCCTCACCCAAT . . . NS=1;AN=0 GT:PS ./.:. -1 575206 . GTTGTTTTTTTTTTTTTAGACGGAGTCTT . . . NS=1;AN=0 GT:PS ./.:. -1 575276 . CACTGCAACCTCCACTGCCTGGGTT . . . NS=1;AN=0 GT:PS ./.:. -1 575476 . C . . . NS=1;AN=0 GT:PS ./.:. -1 575658 . ATTCTCCACATGGATGTCAG . . . NS=1;AN=0 GT:PS ./.:. -1 575690 . AAATGAAAATCTGACCACGTTACTCT . . . NS=1;AN=0 GT:PS ./.:. -1 575726 . TCCGCCTATGGCCGCTGTTA . . . NS=1;AN=0 GT:PS ./.:. -1 575881 . GTGCTTC . . . NS=1;AN=0 GT:PS ./.:. -1 575926 . T . . . NS=1;AN=0 GT:PS ./.:. -1 575930 . GTG . . . NS=1;AN=0 GT:PS ./.:. -1 575945 . T . . . NS=1;AN=0 GT:PS ./.:. -1 575958 . CATGATTTCATTTTGCAAGGGTTCCTTCCTTGGGCTGTGTTCAGCA . . . NS=1;AN=0 GT:PS ./.:. -1 576001 . G . . NS=1;CGA_WINEND=578000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.83:0.94:.:0:.:0:0.999:426 -1 576036 . GCCTCACTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 576190 . AAATCACATCACATTGCTTCCTTCATATTTTTTTGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 576291 . TTTCTTTTTTTTTTTTTTGAGTCAGAATCTTGCT . . . NS=1;AN=0 GT:PS ./.:. -1 576346 . AGTGGCGCGATCTA . . . NS=1;AN=0 GT:PS ./.:. -1 576436 . ACCTACCACCACGCCTGGCTAATTTTTTTTTTTTTTTGTATTTTTTGGTA . . . NS=1;AN=0 GT:PS ./.:. -1 576519 . C . . . NS=1;AN=0 GT:PS ./.:. -1 576552 . GGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACCGTGCCTGAC . . . NS=1;AN=0 GT:PS ./.:. -1 576719 . ATAAACTAAATGTTTTCCAAAGGGAATAGGGCAAAACAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 576863 . GCTCTCCACTTACAAGAAGAGAGCATGG . . . NS=1;AN=0 GT:PS ./.:. -1 576913 . 
CAACACGCTGTGAGTGCAGGCAGCTACCAGGAGGAGAAC . . . NS=1;AN=0 GT:PS ./.:. -1 576983 . CCTGTTAATTTAATCACACGGAACACTTCTATTTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 577071 . ACTGCTTGGAGTGTCAGGCCTAGATCTCTATCCATCAGA . . . NS=1;AN=0 GT:PS ./.:. -1 577145 . GATGGGGCAATTTCTGAAAAGCACCATGTATTTTATCG . . . NS=1;AN=0 GT:PS ./.:. -1 577258 . CTAACTAACCACTATAAAGAACCCAGCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 577472 . CTGGACAAGGAGGGAAC . . . NS=1;AN=0 GT:PS ./.:. -1 577532 . TTAAGGGAGACCTGCT . . . NS=1;AN=0 GT:PS ./.:. -1 577576 . TGGGAGGCACAGTGGAAGATCATG . . . NS=1;AN=0 GT:PS ./.:. -1 577644 . CCTGCACACAGGCTAGGGGTAGGGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 577721 . G . . . NS=1;AN=0 GT:PS ./.:. -1 577821 . TCACGGAGGAAAAAAATCTCTCAATGATCTTATCTTTATA . . . NS=1;AN=0 GT:PS ./.:. -1 577942 . CTCCGAAGGTGGGGCCCTCTGCTCACCTG . . . NS=1;AN=0 GT:PS ./.:. -1 578001 . G . . NS=1;CGA_WINEND=580000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.83:0.99:.:0:.:0:0.999:426 -1 578047 . TTTCTCCTCATTAGATAATAATGAATGGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 578098 . GTGAGGAAATCTACAAAATTAATTTCACAA . . . NS=1;AN=0 GT:PS ./.:. -1 578185 . AAATGGCCTTTCGAGTTGAGCAGTAAATT . . . NS=1;AN=0 GT:PS ./.:. -1 578253 . TTTAACAGGGGCATTCCAGCACTTCTCTAGCTACTG . . . NS=1;AN=0 GT:PS ./.:. -1 578333 . G . . . NS=1;AN=0 GT:PS ./.:. -1 578343 . C . . . NS=1;AN=0 GT:PS ./.:. -1 578365 . CTGCGGA . . . NS=1;AN=0 GT:PS ./.:. -1 578420 . TAACATTACAGTAACTGTTACAGGTTCCAGCAGGATAACTGGGTGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 578506 . AAGTTGGTAGACTAA . . . NS=1;AN=0 GT:PS ./.:. -1 578630 . ATGTGTGTTGGGATAGAGTGGTAAGAAAATGGGAAATAATAAGAAT . . . NS=1;AN=0 GT:PS ./.:. -1 578696 . TTATAAAAAGGTGAGCTGTAATAAATACTAGTGCCACATTT . . . NS=1;AN=0 GT:PS ./.:. -1 578883 . G . . . NS=1;AN=0 GT:PS ./.:. -1 578903 . TAAACATCAGCCATGTTTATATAACTAAACTAGTGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 578969 . CCACATGGTGGCTTAATGCTGCATTGATTTG . . . NS=1;AN=0 GT:PS ./.:. -1 579008 . TTTGTTTTCACTTTTCTGCAAAATATTTAATACATTATTAAATTGAATTATGCTGATGCC . . . NS=1;AN=0 GT:PS ./.:. -1 579085 . AA . . . NS=1;AN=0 GT:PS ./.:. -1 579102 . TTTAATTTTTTTTTCCTTTGGTTTCATTATTCAAATT . . . NS=1;AN=0 GT:PS ./.:. -1 579153 . CAACATTTTATCTGATGGAAGAGATGGAGTCCATTACTAAGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 579239 . TAAAGGAAATTTACTGTGATTT . . . NS=1;AN=0 GT:PS ./.:. -1 579384 . AGAGACATG . . . NS=1;AN=0 GT:PS ./.:. -1 579434 . CCACGTG . . . NS=1;AN=0 GT:PS ./.:. -1 579472 . GACATGTCACTTCCAGCAGAAGCTTTAAGAAT . . . NS=1;AN=0 GT:PS ./.:. -1 579534 . TGTAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 579586 . CCCTGTTGA . . . NS=1;AN=0 GT:PS ./.:. -1 579857 . TTCCCTCAGAACCCTTAGCCTGCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 579991 . TCATACA . . . NS=1;AN=0 GT:PS ./.:. -1 580001 . G . . NS=1;CGA_WINEND=582000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.66:0.90:.:0:.:0:0.999:426 -1 580015 . ACATGGGCACCCATATTTTTCTAGCCACTTTCATTAG . . . NS=1;AN=0 GT:PS ./.:. -1 580075 . C . . . NS=1;AN=0 GT:PS ./.:. -1 580085 . T . . . NS=1;AN=0 GT:PS ./.:. -1 580110 . CCAGGTA . . . NS=1;AN=0 GT:PS ./.:. -1 580134 . GTGATTTTCTGTTGGTGTTCACTTCGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 580183 . TTATTGACTGACTGACTAACTAATGTG . . . NS=1;AN=0 GT:PS ./.:. -1 580231 . AGGCTCTCTACAAAAACGGAGGGATGCCCT . . . NS=1;AN=0 GT:PS ./.:. -1 580276 . TACGTAAGAAATTGCCTCCGATAG . . . NS=1;AN=0 GT:PS ./.:. -1 580362 . CAGGTTTTTAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 580398 . TATTTGTGTGTGTGCATGTGGTAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 580491 . GGGGAAAACATTTTCCCAAGGTTCTAACAGAAGAGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 580555 . TGTGAGGGTTGCTTTTATGTATTTATTTATTT . . . NS=1;AN=0 GT:PS ./.:. -1 580617 . 
CTTTCTCTTTTTTTCTTCTTTTTTTTTTTTTGGACAGAGTC . . . NS=1;AN=0 GT:PS ./.:. -1 580664 . TGTCGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 580792 . ATAATTTTTTTATATTTTTAGTA . . . NS=1;AN=0 GT:PS ./.:. -1 580882 . GCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACCGCCCCTGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 580939 . TTATAGC . . . NS=1;AN=0 GT:PS ./.:. -1 580981 . TTAGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 581142 . TTTGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 581285 . AGGGGAGATTTTTCAGGAGTGCCACAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 581338 . CCTGCTTTCATTGGAAAGTGTATAATGATGTCAG . . . NS=1;AN=0 GT:PS ./.:. -1 581400 . ACTGCCGTCATAGGGATGCCTTAGTGAATCAATCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 581469 . AAAAGTT . . . NS=1;AN=0 GT:PS ./.:. -1 581612 . TGACATGTAAGCATTGCCATGCCTAGTACAGACTCTCCCT . . . NS=1;AN=0 GT:PS ./.:. -1 581740 . TTATAAAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 581829 . CACTGGGAATAACCTCTGTACTTTGGACAGTA . . . NS=1;AN=0 GT:PS ./.:. -1 581938 . CATGTTTCTGCT . . . NS=1;AN=0 GT:PS ./.:. -1 582001 . T . . NS=1;CGA_WINEND=584000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.74:1.01:.:0:.:0:0.999:426 -1 582013 . TCCGTGTTACTGAGCAGTTCTCAGCAACACAA . . . NS=1;AN=0 GT:PS ./.:. -1 582073 . T . . . NS=1;AN=0 GT:PS ./.:. -1 582192 . ACAATTT . . . NS=1;AN=0 GT:PS ./.:. -1 582253 . AAAGAAAAATGATGGTAAATGAGACATTAATTTACCCTT . . . NS=1;AN=0 GT:PS ./.:. -1 582377 . CCTAAAAAAGTAAACATTGA . . . NS=1;AN=0 GT:PS ./.:. -1 582460 . AT . . . NS=1;AN=0 GT:PS ./.:. -1 582515 . CACGACA . . . NS=1;AN=0 GT:PS ./.:. -1 582570 . TCTATGTCAAAAAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 582633 . GAGGCCAAGGTGGGCAGGTCACTTG . . . NS=1;AN=0 GT:PS ./.:. -1 582691 . GGCGACACCCTGTCTCTACTAAAAATACA . . . NS=1;AN=0 GT:PS ./.:. -1 582738 . TGGCGCATGCCTGTAATCCCAGCTACTTAGGAGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 582799 . GAAGGTGGAGGTTGCAGTGAGCTGA . . . NS=1;AN=0 GT:PS ./.:. -1 582850 . G . . . NS=1;AN=0 GT:PS ./.:. -1 582867 . TGTCAAAAAAAAAAAAAAAAAAGAAATCCAAATAAAATTTC . . . NS=1;AN=0 GT:PS ./.:. -1 582916 . GTGGAAAATAGTGACAATAAAAATATT . . . NS=1;AN=0 GT:PS ./.:. -1 582965 . TTGAGATGCCAAGGTGGCAGGATCACTTGAGACCAGGAGTTCGCA . . . NS=1;AN=0 GT:PS ./.:. -1 583046 . ACACGCCAAAAAAAAATTTT . . . NS=1;AN=0 GT:PS ./.:. -1 583149 . A . . . NS=1;AN=0 GT:PS ./.:. -1 583159 . CAAGC . . . NS=1;AN=0 GT:PS ./.:. -1 583207 . TCCTATCTCAAAAAAAAAAAAAAAAAATCAC . . . NS=1;AN=0 GT:PS ./.:. -1 583335 . GGAAATAATGTGTTA . . . NS=1;AN=0 GT:PS ./.:. -1 583372 . CACGGTGACTCACATCTGTAATCCCAGCACTTTGGGAGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 583433 . GGTCAGGAGTTCCAGACCAGCCTGGCCAACATGGTGAAATCTTGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 583498 . C . . . NS=1;AN=0 GT:PS ./.:. -1 583541 . G . . . NS=1;AN=0 GT:PS ./.:. -1 583558 . A . . . NS=1;AN=0 GT:PS ./.:. -1 583562 . T . . . NS=1;AN=0 GT:PS ./.:. -1 583573 . A . . . NS=1;AN=0 GT:PS ./.:. -1 583693 . AACATAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 583724 . ATGCCTTGAAAAGAGGGAGAAAAATTGTGAATT . . . NS=1;AN=0 GT:PS ./.:. -1 583777 . GGAGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 583888 . T . . . NS=1;AN=0 GT:PS ./.:. -1 583955 . ATCTCCTCCCCTCCCCTACTCCTCACCCCACACTCAA . . . NS=1;AN=0 GT:PS ./.:. -1 584001 . A . . NS=1;CGA_WINEND=586000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.71:0.98:.:0:.:0:0.999:426 -1 584058 . C . . . NS=1;AN=0 GT:PS ./.:. -1 584136 . CAAGGGCTTCACAGACAGTTTTACTAAACTCATGCCAA . . . NS=1;AN=0 GT:PS ./.:. -1 584201 . TATACCTTATAGATAAAGGTATCTATAAGGTATAGATAAAGGTAAGGTATCTA . . . NS=1;AN=0 GT:PS ./.:. -1 584261 . TAGATAAAGAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 584297 . TGTTCCCAAAGCCTCGTGGCTAGTAATTCAAACCTA . . . NS=1;AN=0 GT:PS ./.:. -1 584361 . CCTCATGATACTATACTGCCTCT . . . 
NS=1;AN=0 GT:PS ./.:. -1 584460 . TGATTAA . . . NS=1;AN=0 GT:PS ./.:. -1 584519 . AATTTGAATAACTCCCTGCGGGTGAAGTTC . . . NS=1;AN=0 GT:PS ./.:. -1 584592 . CGG . . . NS=1;AN=0 GT:PS ./.:. -1 584633 . GGGTGGATCATGAAGTCAGGAGTTGAAGACCAGCCCGGT . . . NS=1;AN=0 GT:PS ./.:. -1 584687 . CCATCTCTACTAAAAATAAAAAATTAGCCGGGCCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 584769 . G . . . NS=1;AN=0 GT:PS ./.:. -1 584834 . GGGTGACAGGGCGAGATTCCGTCTCAAAAAATAAAATAAAATAAAATAAAAAATAAAAGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 584944 . ATTCCTCAGCATTTTAGTGATCTGTATGGTCCCTCTATCCGTCAGGGTC . . . NS=1;AN=0 GT:PS ./.:. -1 585045 . AGGGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 585109 . TTGAAATGTGATCCCCATTGTTGGAGGTGGGGCCTAATGGGAG . . . NS=1;AN=0 GT:PS ./.:. -1 585340 . CAGAGTAGCTAGGATTACAGGTACCC . . . NS=1;AN=0 GT:PS ./.:. -1 585403 . AGACGGGGTTTCACCATG . . . NS=1;AN=0 GT:PS ./.:. -1 585433 . GTCTCAAACTCCTGACCTCAGGTGATC . . . NS=1;AN=0 GT:PS ./.:. -1 585466 . CCTCGGCCTCCCAAAGTGCTGGGATTACAGGCGTAAGC . . . NS=1;AN=0 GT:PS ./.:. -1 585566 . TGGTACAAACCCCTCTCTCTTGCCACGTGATCTC . . . NS=1;AN=0 GT:PS ./.:. -1 585632 . TGAGTGGAAACAGACTAAAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 585705 . TGAACCAAATAAACCTCTCTTCTTTAAAATTATTCAGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 585760 . AACAACACACACACACACACACACACACACACACACACACACACGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 585815 . CTAAAACAGGAACTAATTAGAAATGGTAATG . . . NS=1;AN=0 GT:PS ./.:. -1 585863 . CGAGGCTCCCCAACAGGAACTGAGGCCATGGATAGA . . . NS=1;AN=0 GT:PS ./.:. -1 585924 . TAATGGTTAAGTA . . . NS=1;AN=0 GT:PS ./.:. -1 585965 . GCCAAGGCCTCCCATGGACCAAACTCAACTA . . . NS=1;AN=0 GT:PS ./.:. -1 586001 . T . . NS=1;CGA_WINEND=588000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.94:0.97:.:0:.:0:0.999:426 -1 586015 . CCTGAGTGTTGCATTCAGCAGAAGTCAGCTTCCTAGGGAATATTG . . . NS=1;AN=0 GT:PS ./.:. -1 586126 . CAACAGAGCGACTCAGATGCTATAAAACTTGCTAACGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 586192 . CAAGCCAGGTTTTAGTCATCAGAAATCGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 586301 . CACACGA . . . NS=1;AN=0 GT:PS ./.:. -1 586346 . TTCGAAAACTCA . . . NS=1;AN=0 GT:PS ./.:. -1 586437 . A . . . NS=1;AN=0 GT:PS ./.:. -1 586525 . ATCGGAAGAGACAGACACT . . . NS=1;AN=0 GT:PS ./.:. -1 586689 . ATTTAATAAATATCAG . . . NS=1;AN=0 GT:PS ./.:. -1 586782 . C . . . NS=1;AN=0 GT:PS ./.:. -1 586786 . GTA . . . NS=1;AN=0 GT:PS ./.:. -1 586956 . TTATTAATAAAAGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 586993 . AAAATGTAAAAAGTATCTAA . . . NS=1;AN=0 GT:PS ./.:. -1 587042 . CCATAGTAGAAAAAAGTGAAAATTAATAAAATAAG . . . NS=1;AN=0 GT:PS ./.:. -1 587168 . AATAAAAAAAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 587337 . CTTCAAAAAAAAAAAAAAAAAGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 587413 . TATAACACACACACAAACA . . . NS=1;AN=0 GT:PS ./.:. -1 587447 . TTCACAGAGAATTCCACCAAACCTT . . . NS=1;AN=0 GT:PS ./.:. -1 587482 . ATCATCCA . . . NS=1;AN=0 GT:PS ./.:. -1 587532 . CAATTGA . . . NS=1;AN=0 GT:PS ./.:. -1 587656 . GCCAAGG . . . NS=1;AN=0 GT:PS ./.:. -1 587762 . ATATATGG . . . NS=1;AN=0 GT:PS ./.:. -1 587824 . GATGTTTTTCATATTTTTTTCTTTTTTG . . . NS=1;AN=0 GT:PS ./.:. -1 587876 . TTTTAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 588001 . T . . NS=1;CGA_WINEND=590000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.96:1.16:.:0:.:0:0.999:426 -1 588003 . CATCCCA . . . NS=1;AN=0 GT:PS ./.:. -1 588052 . TTTCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 588090 . G . . . NS=1;AN=0 GT:PS ./.:. -1 588167 . CCCTGAG . . . NS=1;AN=0 GT:PS ./.:. -1 588216 . GTCTTAA . . . NS=1;AN=0 GT:PS ./.:. -1 588258 . ACATATAAATAAATAAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 588331 . TAATTAAAAAGTTACTTTTTACTT . . . NS=1;AN=0 GT:PS ./.:. -1 588382 . 
TGGAAAAAAGAATCAGTGAACTTGATAGATCAAGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 588452 . AATGACAACAAAAAAGAAT . . . NS=1;AN=0 GT:PS ./.:. -1 588541 . GGAAGACTATTTGAAGAAATGTGTTTGA . . . NS=1;AN=0 GT:PS ./.:. -1 588595 . AATATATACATTCAAAAAGCTCAGTGCACT . . . NS=1;AN=0 GT:PS ./.:. -1 588653 . C . . . NS=1;AN=0 GT:PS ./.:. -1 588721 . CAACGCG . . . NS=1;AN=0 GT:PS ./.:. -1 588755 . T . . . NS=1;AN=0 GT:PS ./.:. -1 588769 . G . . . NS=1;AN=0 GT:PS ./.:. -1 588772 . T . . . NS=1;AN=0 GT:PS ./.:. -1 588774 . A . . . NS=1;AN=0 GT:PS ./.:. -1 588777 . AT . . . NS=1;AN=0 GT:PS ./.:. -1 588788 . C . . . NS=1;AN=0 GT:PS ./.:. -1 588860 . A . . . NS=1;AN=0 GT:PS ./.:. -1 588888 . TAGAATGACATTTTAAAGTTCTGAAAGAAAAAAACACT . . . NS=1;AN=0 GT:PS ./.:. -1 588939 . TCTGTAACTTGGAAGA . . . NS=1;AN=0 GT:PS ./.:. -1 588993 . GATTAAAAAAAAAAGAGAGAGAGAAAGAGAAAGAAAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 589043 . A . . . NS=1;AN=0 GT:PS ./.:. -1 589058 . AAGAAAAAGAAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAAAGAAAGAAAAGAAAGAAAGAAAAGCAAGCAAGCTTTA . . . NS=1;AN=0 GT:PS ./.:. -1 589220 . CACTTTTAAAAAAAAAGACTCCTTCAGATACAAACTAAAAAACACTA . . . NS=1;AN=0 GT:PS ./.:. -1 589325 . ATATAAAAGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 589360 . TTTAAATATTCTATATGTTTTAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 589429 . GATGTAT . . . NS=1;AN=0 GT:PS ./.:. -1 589509 . CTATTGA . . . NS=1;AN=0 GT:PS ./.:. -1 589569 . CCCCATGGTA . . . NS=1;AN=0 GT:PS ./.:. -1 589715 . ACAGAAAACAAAAGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 589880 . ACACAAT . . . NS=1;AN=0 GT:PS ./.:. -1 590001 . A . . NS=1;CGA_WINEND=592000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.94:0.95:.:0:.:0:0.999:426 -1 590025 . TTTCAAAATTAAACAAG . . . NS=1;AN=0 GT:PS ./.:. -1 590081 . G . . . NS=1;AN=0 GT:PS ./.:. -1 590088 . A . . . NS=1;AN=0 GT:PS ./.:. -1 590187 . ACACCCA . . . NS=1;AN=0 GT:PS ./.:. -1 590247 . CTATAGA . . . NS=1;AN=0 GT:PS ./.:. -1 590326 . ATATGTTAGGCCATAAGATAAGCTCAATAAACTTAAAAAGATT . . . NS=1;AN=0 GT:PS ./.:. -1 590546 . AAAGATA . . . NS=1;AN=0 GT:PS ./.:. -1 590619 . AACATGT . . . NS=1;AN=0 GT:PS ./.:. -1 590703 . AATAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 590748 . TATACAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 590931 . ATGCAAAAAAAATGAA . . . NS=1;AN=0 GT:PS ./.:. -1 591056 . TGTGGATTGGCAATGCATTCTTAGATAATACAACAA . . . NS=1;AN=0 GT:PS ./.:. -1 591156 . TATAAAGAATTAGAGGGGAATTTGGTGAAAGAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 591269 . T . . . NS=1;AN=0 GT:PS ./.:. -1 591279 . A . . . NS=1;AN=0 GT:PS ./.:. -1 591391 . GAAAATAATAATTAATCTGATTAATTTTTGACTGTTCTCTTA . . . NS=1;AN=0 GT:PS ./.:. -1 591499 . AAAATTAT . . . NS=1;AN=0 GT:PS ./.:. -1 591529 . TTCTTTTTATATTAGTATAAATATAACAATTCTGAAACAAATGTATGTGCATTGTAA . . . NS=1;AN=0 GT:PS ./.:. -1 591593 . CCAATGAGTAAATATTAATATATTTGTATTGCTAGAACCCCAGAT . . . NS=1;AN=0 GT:PS ./.:. -1 591645 . GTGAAAGGACAGAGATACAGATATGGAATAAGACAAGGAAAGAAGCAGCCCACTGAGTTACATTAGAATCAGTATTATCAACATAAATATACAATGTGCTCTCTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 591766 . TCTTAAAAAATATATAATATGTACATATTATATATTATATGCATAGACACACGTGTGTCTATAC . . . NS=1;AN=0 GT:PS ./.:. -1 591835 . CTACATGTAC . . . NS=1;AN=0 GT:PS ./.:. -1 591873 . A . . . NS=1;AN=0 GT:PS ./.:. -1 591878 . T . . . NS=1;AN=0 GT:PS ./.:. -1 592001 . A . . NS=1;CGA_WINEND=594000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.93:0.99:.:0:.:0:0.999:426 -1 592004 . CACGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 592088 . TCTAACCAGCAAAATTCACAGTGTGGACA . . . NS=1;AN=0 GT:PS ./.:. -1 592232 . G . . . NS=1;AN=0 GT:PS ./.:. -1 592265 . GCAGGAAGGACACAGCCGTGAAAATGCAAGGACGCCTCTACTGG . . . 
NS=1;AN=0 GT:PS ./.:. -1 592480 . TGTGTTTTGTTTTTCAA . . . NS=1;AN=0 GT:PS ./.:. -1 592906 . ACGCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 592933 . TTGGGGGTGGCATCTTCATCAGTAAATCACA . . . NS=1;AN=0 GT:PS ./.:. -1 592988 . TA . . . NS=1;AN=0 GT:PS ./.:. -1 593048 . GAATAAAAATAGCAGGAAGA . . . NS=1;AN=0 GT:PS ./.:. -1 593088 . TATGTTTTTAATTTGTTATATATGTATATTTTTATCATACTTTA . . . NS=1;AN=0 GT:PS ./.:. -1 593148 . TGCACAATGTGCAGGAATAAAATTTATGTTTTTAAAATTTATTC . . . NS=1;AN=0 GT:PS ./.:. -1 593227 . AGCCTCATCACAGGCAC . . . NS=1;AN=0 GT:PS ./.:. -1 593260 . TGCCAAAACAATCTATCGTT . . . NS=1;AN=0 GT:PS ./.:. -1 593317 . GGAGAAAAAAGCAATGGAATGAATAAAATGATAGCCACAAAAATCAAGG . . . NS=1;AN=0 GT:PS ./.:. -1 593438 . AATACAG . . . NS=1;AN=0 GT:PS ./.:. -1 593494 . TTTCTATAGTAACAGTTTTTAAATAAAATATCAA . . . NS=1;AN=0 GT:PS ./.:. -1 593566 . AGTGAAAAAAAACAAATTCAGAGCAAAGATA . . . NS=1;AN=0 GT:PS ./.:. -1 593658 . TAAACCC . . . NS=1;AN=0 GT:PS ./.:. -1 593711 . ATCTGTGCCAAGTGGTGTATTAATGATTCATTTTTATTT . . . NS=1;AN=0 GT:PS ./.:. -1 593775 . GGTGTAGCCTGCAACTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 593873 . ATGTTATATATATAGAAAGAGAGAGAGGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 593965 . AACAGGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 594001 . A . . NS=1;CGA_WINEND=596000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.78:1.11:.:0:.:0:0.999:426 -1 594077 . TGCTCCA . . . NS=1;AN=0 GT:PS ./.:. -1 594179 . GTCCAAAAGCTTACTGTCTAGTGGGAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 594243 . ATAAGTGAAAACTAAGATAAATTAA . . . NS=1;AN=0 GT:PS ./.:. -1 594286 . AAGGTTTCCAAAGTCAATGAGGCCTCAAATGAATCT . . . NS=1;AN=0 GT:PS ./.:. -1 594377 . AGGAACAGCATGAGCAAATGCAAGGAGGCCTAAAATAAA . . . NS=1;AN=0 GT:PS ./.:. -1 594425 . AAAGAGGTGTAAGCAGCTTTGTACTGCTGCCTGATAAT . . . NS=1;AN=0 GT:PS ./.:. -1 594499 . GAGTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 594526 . TAAATTATAAGAAATTTATAGCATAAGGAATAGTAGGACCGTTAAATGTT . . . NS=1;AN=0 GT:PS ./.:. -1 594591 . CTTCTTTTTTTAATATTTATTTTTATTATACTTTAAG . . . NS=1;AN=0 GT:PS ./.:. -1 594647 . AACGTGCAGGTTACATATGTATACATGTGCCGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 594777 . GTGATGTTCCCCTTCCTGTGTCCAAGTGTTCTCATTGT . . . NS=1;AN=0 GT:PS ./.:. -1 594828 . ATGAGTGAAAACATGCGGTGTTTGGTTTTTTGTT . . . NS=1;AN=0 GT:PS ./.:. -1 594892 . CTCTAGCTGCATTGTGGGAGGAAAAAAGAT . . . NS=1;AN=0 GT:PS ./.:. -1 594980 . GATTATC . . . NS=1;AN=0 GT:PS ./.:. -1 595033 . AGATATGTGCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 595087 . AAGTGAAAGGAAAGGATGAA . . . NS=1;AN=0 GT:PS ./.:. -1 595149 . GCTTTGA . . . NS=1;AN=0 GT:PS ./.:. -1 595183 . TATTTCAGAAACTATATGACATGACGAAAAGTAAAAAGGGGATGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 595502 . TCAGAAAAATAAAAAAACTA . . . NS=1;AN=0 GT:PS ./.:. -1 595553 . CTTTACTA . . . NS=1;AN=0 GT:PS ./.:. -1 595629 . ACTGTCCTTCCATTGCATTGTATGTGTTTTTTAATA . . . NS=1;AN=0 GT:PS ./.:. -1 595778 . ATTCAATAATAAGTTTGCATATTACAACCT . . . NS=1;AN=0 GT:PS ./.:. -1 595817 . TTGGTGTAATTCATCCATTCGTAT . . . NS=1;AN=0 GT:PS ./.:. -1 595870 . GTCGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 595928 . AACTATTA . . . NS=1;AN=0 GT:PS ./.:. -1 595998 . CTGGGGT . . . NS=1;AN=0 GT:PS ./.:. -1 596001 . G . . NS=1;CGA_WINEND=598000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.03:1.06:.:0:.:0:0.999:426 -1 596138 . ACCATAT . . . NS=1;AN=0 GT:PS ./.:. -1 596180 . TTTGTTTTTTGCTTAGAAGTGCTTTGGCTATTAGGGATCTTTTTTTGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 596262 . GTGAGAAATAACGTTGGTAA . . . NS=1;AN=0 GT:PS ./.:. -1 596355 . CGTGAG . . . NS=1;AN=0 GT:PS ./.:. -1 596369 . G . . . NS=1;AN=0 GT:PS ./.:. -1 596375 . CCAT . . . NS=1;AN=0 GT:PS ./.:. -1 596526 . TCTCAACAAACATTCAAACAGCTTGAATGTATTTGGTGTATAGAAATACA . . 
. NS=1;AN=0 GT:PS ./.:. -1 596658 . TTCTAGG . . . NS=1;AN=0 GT:PS ./.:. -1 596775 . ATTGAAT . . . NS=1;AN=0 GT:PS ./.:. -1 596833 . ACTTCTAAC . . . NS=1;AN=0 GT:PS ./.:. -1 596891 . A . . . NS=1;AN=0 GT:PS ./.:. -1 597049 . TCTATTTTGAACCATTCAAGCACCCCTGGA . . . NS=1;AN=0 GT:PS ./.:. -1 597110 . TTTTGATGTGTTGTTAG . . . NS=1;AN=0 GT:PS ./.:. -1 597166 . GTGTTGA . . . NS=1;AN=0 GT:PS ./.:. -1 597236 . TGCACAGTATTATTTTAATG . . . NS=1;AN=0 GT:PS ./.:. -1 597289 . TGTTCCTCATGTTACAATGAAACTGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 597350 . GTTTTTGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 597397 . TAGTTTGTTATTATTACAAA . . . NS=1;AN=0 GT:PS ./.:. -1 597444 . CCCACAGCTATTGAAATAACCATATTTTGTTC . . . NS=1;AN=0 GT:PS ./.:. -1 597504 . TTGTTTG . . . NS=1;AN=0 GT:PS ./.:. -1 597621 . ATTATGT . . . NS=1;AN=0 GT:PS ./.:. -1 597675 . TTACTATCATGGAGGTACCACCATATAAAACAAGTTGGAAAGTGTTATGT . . . NS=1;AN=0 GT:PS ./.:. -1 597759 . GGCATTATTTCTTTATTAAATATTTGGTAATATTTCTTTATTAAATATTGCATCCA . . . NS=1;AN=0 GT:PS ./.:. -1 597821 . CCTGGAGTTCTTTCTACAGGAAAAAAAAATTTTCTAAATAAAATTTCTACAATGAAAAAAAAACTACT . . . NS=1;AN=0 GT:PS ./.:. -1 597897 . CTAGTTTTTTTCTGATCATTTCATAAAAGTAGGTATTTTTCAT . . . NS=1;AN=0 GT:PS ./.:. -1 597961 . GATTGTCAAATTTATTAATATAAAGTTTCATATTTTATATTTATTTTATCAGATAAATAAAATTATATGTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 598001 . T . . NS=1;CGA_WINEND=600000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.92:1.01:.:0:.:0:0.999:426 -1 598057 . AGCCATGTTAAGCTAACATATG . . . NS=1;AN=0 GT:PS ./.:. -1 598095 . T . . . NS=1;AN=0 GT:PS ./.:. -1 598098 . C . . . NS=1;AN=0 GT:PS ./.:. -1 598106 . ATGA . . . NS=1;AN=0 GT:PS ./.:. -1 598116 . TTA . . . NS=1;AN=0 GT:PS ./.:. -1 598121 . A . . . NS=1;AN=0 GT:PS ./.:. -1 598149 . CAAGTACATCCCCTATG . . . NS=1;AN=0 GT:PS ./.:. -1 598178 . AAGGCTCCTCAAAAAAGTAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 598213 . GTGGTCCAGCAATCCCACTGCTGCATATATACCCCCCAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 598360 . GAATAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 598393 . AGTATTATTCAGCCATAAAAAGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 598485 . GCACAAAAAGACA . . . NS=1;AN=0 GT:PS ./.:. -1 598607 . TGGGAGAGGTGGGGGATGGTTAATGGGTACAAAAAAAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 598735 . TAATTGG . . . NS=1;AN=0 GT:PS ./.:. -1 598840 . TGTACCTCAT . . . NS=1;AN=0 GT:PS ./.:. -1 598889 . T . . . NS=1;AN=0 GT:PS ./.:. -1 598976 . ACCCTTTGACCAACATC . . . NS=1;AN=0 GT:PS ./.:. -1 599169 . TTTTAAGGTTGTATACTATTCTATTGTGTATGTGTACCACATTT . . . NS=1;AN=0 GT:PS ./.:. -1 599306 . CTCATTGACACACTGATTTGATA . . . NS=1;AN=0 GT:PS ./.:. -1 599360 . CTGAATCATATGGTAATTCTATTTTTACAGAATCATTTA . . . NS=1;AN=0 GT:PS ./.:. -1 599422 . ATAGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 599485 . ATCGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 599540 . GTAGTTTTAATA . . . NS=1;AN=0 GT:PS ./.:. -1 599624 . GAAATGT . . . NS=1;AN=0 GT:PS ./.:. -1 599682 . TTGTGTTGTTTGAGTT . . . NS=1;AN=0 GT:PS ./.:. -1 599767 . GTTATCACTTCACTCTGTTGACTTTCTTTTGCTGTGCA . . . NS=1;AN=0 GT:PS ./.:. -1 599912 . ATTCCCCTGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 599987 . TTTATATGGTATGAAATAAGGGCCTAATATC . . . NS=1;AN=0 GT:PS ./.:. -1 600001 . A . . NS=1;CGA_WINEND=602000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.05:1.14:.:0:.:0:0.999:218 -1 600073 . CCCATGTG . . . NS=1;AN=0 GT:PS ./.:. -1 600287 . AGAGTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 600342 . AAATGTCATAGGAATTTTGATAG . . . NS=1;AN=0 GT:PS ./.:. -1 600381 . GTACATCACTTTGGATAGTATGGACA . . . NS=1;AN=0 GT:PS ./.:. -1 600475 . ATCGCTTTCATCAATGTTTTGTAG . . . NS=1;AN=0 GT:PS ./.:. -1 600545 . AGTATTTTTTGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 600703 . 
GTTATTGGTGTTTTCTATATATAAGAT . . . NS=1;AN=0 GT:PS ./.:. -1 601003 . TTTCAAAAAGTAGCAACAACTGTGGGAGTTCAGTCAGGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 601063 . TAGTTTTAAGAAATCGACACAAACCTTCATGGAAGGCTGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 601127 . GATCTGAATGAAGGCGGC . . . NS=1;AN=0 GT:PS ./.:. -1 601269 . CTCTCTC . . . NS=1;AN=0 GT:PS ./.:. -1 601323 . GCCCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 601580 . CTCGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 601773 . AGAGCCTCCCTCTGTA . . . NS=1;AN=0 GT:PS ./.:. -1 601979 . GTCAGCTTTTTTTTTTTCTTTTTGTGACCCAGCAGAATGCCTGCAC . . . NS=1;AN=0 GT:PS ./.:. -1 602001 . T . . NS=1;CGA_WINEND=604000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.51:1.23:.:0:.:0:0.999:218 -1 602110 . C . . . NS=1;AN=0 GT:PS ./.:. -1 602152 . CGGGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 602252 . TGCTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 602380 . TTTGTAAACATAGGTTGTGGTGCAGTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 602450 . GAGTACC . . . NS=1;AN=0 GT:PS ./.:. -1 602523 . CCCTCTCATCTCCACA . . . NS=1;AN=0 GT:PS ./.:. -1 602595 . CAAGAAAAAAAAAACCAT . . . NS=1;AN=0 GT:PS ./.:. -1 602642 . AATCCTCAAAAGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 602715 . ATGCAACTCAAAACCAC . . . NS=1;AN=0 GT:PS ./.:. -1 602868 . AAGGCAGTGTGAAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 602935 . TGGATATATGCCCAAAGGAATATAAATCACTCT . . . NS=1;AN=0 GT:PS ./.:. -1 603017 . TCCCAGCACTTTGGGAGGCCAAGGCGGGT . . . NS=1;AN=0 GT:PS ./.:. -1 603183 . ATCGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 603280 . ATATATATACATATACATACATATATATACATATATATACACATATATATATACATATATACATATATTATATAGGTAAATGTATATATATGTGTATATATATACACACATATATATACACATATATATACATATTATAACTACATATATATACACACACACATACATATACATGCAC . . . NS=1;AN=0 GT:PS ./.:. -1 603480 . TTTAT . . . NS=1;AN=0 GT:PS ./.:. -1 603490 . A . . . NS=1;AN=0 GT:PS ./.:. -1 603499 . GGAATCAACCCAAATGCCCATCAATGATATATTGGATAAAGAAAATGTGATATATATTCACCATGGAATACTATGCAGCCGTTAAAATAAA . . . NS=1;AN=0 GT:PS ./.:. -1 603631 . CCATCACCCTCAGCAAACTAACACAGGAACAGAAAACCAAACACCA . . . NS=1;AN=0 GT:PS ./.:. -1 603685 . CAGTCGTAAGAGGGAGTTGAACAATGAGAGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 603720 . ACG . . . NS=1;AN=0 GT:PS ./.:. -1 603726 . A . . . NS=1;AN=0 GT:PS ./.:. -1 603739 . AACAACACACACCAGGGCCTCTCAGGGGGACAGGGGTAGG . . . NS=1;AN=0 GT:PS ./.:. -1 603788 . GGACAAACACGTGGATACATGGAGGGGAACAACAC . . . NS=1;AN=0 GT:PS ./.:. -1 603828 . AGGGCCTCTCAGGGGGACAGGGGTAGGAGACCATCAGGACAAACACATGGATACATGGA . . . NS=1;AN=0 GT:PS ./.:. -1 603892 . ACAACACACACCAGGGCCTCT . . . NS=1;AN=0 GT:PS ./.:. -1 603918 . GGACAGGGGTAGGAGACCATCAGGACAAACACGTGGATAC . . . NS=1;AN=0 GT:PS ./.:. -1 603967 . AACAACACACACCAGGGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 603995 . GACAGGGGGTAGGAGACCATCAGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 604001 . G . . NS=1;CGA_WINEND=606000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.38:0.92:.:0:.:0:0.999:218 -1 604057 . AGGACCTCTCAGCGGGACAGGGGGTAGGAGACCATCAGGACAAACACGTGGATAC . . . NS=1;AN=0 GT:PS ./.:. -1 604121 . AACAACACACACCAGGGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 604149 . GACAGGGGGTAGGAGACCATCAGGACAAACACGTGGGTACATGGAGGGGAACAACACACACCAGGGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 604223 . GGGGACAGGGGTAGGAGACCATCAGGACAAACACGTGGATACATGGAGGGGAACAACACACACCAGGACCTCTCAGCGGGACAGGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 604317 . ACCATCAGGACAAACACGTGGGTAC . . . NS=1;AN=0 GT:PS ./.:. -1 604349 . GGAGCAACACACACCAGGGCCTCTCAGGGGGACGGGGGGTAGGAGACCATCAGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 604412 . TGGATAC . . . NS=1;AN=0 GT:PS ./.:. -1 604429 . ACAACACACACCAGGGCCTCTCAGGGGGACGGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 604471 . ACCATCAGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 604488 . 
GTGGGTACATGGAGGGGAACAACACACACCAGGGCCTCTCAGGGGGACGGGGGGTAGGAGACCATCAGGACAAACACGTGAGTACATG . . . NS=1;AN=0 GT:PS ./.:. -1 604583 . ACAACACACACCAGGGCCTCTCAGGGGGACGGGGGGTAGGAGACCATCAGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 604642 . GTGGGTACATGGAGGGGAACAACACACACCAGGGCCTCTCAGGGGGACGGGGGGTAGGAGACCATCAGGACAAACACGTGGATACAT . . . NS=1;AN=0 GT:PS ./.:. -1 604734 . GGAACAACACACACCAGGGCCTCTCAGCGGGACAGGGGGTAGGAGACCATCAGGA . . . NS=1;AN=0 GT:PS ./.:. -1 604794 . ACGTGAGTACATGGAGGGGAACAACACATACCAGGGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 604840 . GGACGGGGGGTAGGAGACCATCAGGACAAACACGTGGGTACATGGAGGGGAACAACACACACCAGGGCCTCTCAGGGGGACGGGGGGTAGGAGACCATCA . . . NS=1;AN=0 GT:PS ./.:. -1 604951 . TGGATACATGGAGGGGAACAACACACACCAGGGCCTCTCAGGGGGACAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 605028 . TGGATACATGGAGGGGAACAACACACACCAGGGCCTCTCAGGGGGACAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 605085 . AGACCATCAGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 605105 . TGGATACATGGAGGGGAACAACACACACCAGGGCCTCTCAGGGGGACGGGGGGTAGGAGACCATCAGGACAAACACGTGGGTACATGGAGGGGAACAACACACACCAGGGCCTCTCAGGGGGACGGGGGGTAGGAGACCATCAAGACAAACACGTGGGTACATGGAGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 605281 . ACACACCAGGGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 605345 . AGGGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 605396 . ACACATTTACCTATGTATCAAACCTACACT . . . NS=1;AN=0 GT:PS ./.:. -1 605456 . AAATTTAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 605496 . TCTCCTTCTGAAACACTCTTTAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 605579 . CATTTCCTTATTCCTGTGTTCATTTTGGAT . . . NS=1;AN=0 GT:PS ./.:. -1 605647 . CTTCAAC . . . NS=1;AN=0 GT:PS ./.:. -1 605700 . AAATTTCAATTATGTTATTCTCTATG . . . NS=1;AN=0 GT:PS ./.:. -1 605793 . TTTTAATTT . . . NS=1;AN=0 GT:PS ./.:. -1 605820 . TACTAAATATAGTTATTTTATTTTCTGT . . . NS=1;AN=0 GT:PS ./.:. -1 605873 . CTTCGCTAGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 605903 . CATAGTTTTTTTCGTTTGTTTTCATGATTAGAAAAACAGAGAGAGAAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 605960 . AAAGGGAGGAGGAGGAGGAGGAGAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 605994 . GCAGAGAAGAAGGGACAG . . . NS=1;AN=0 GT:PS ./.:. -1 606001 . A . . NS=1;CGA_WINEND=608000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.85:1.14:.:0:.:0:0.999:218 -1 606033 . TAACGTTTCTCTAACAACTGTCTTCAGTGAAACGCTCCC . . . NS=1;AN=0 GT:PS ./.:. -1 606078 . TGGATTTTTAGGT . . . NS=1;AN=0 GT:PS ./.:. -1 606130 . A . . . NS=1;AN=0 GT:PS ./.:. -1 606169 . GC . . . NS=1;AN=0 GT:PS ./.:. -1 606201 . AAAATAAAATAAAAAGCTACACAAATTAAAAAAAAAAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 606265 . CCTCTTG . . . NS=1;AN=0 GT:PS ./.:. -1 606310 . TTCTTTTTAGTTATTCTGTTTATCTT . . . NS=1;AN=0 GT:PS ./.:. -1 606514 . CAGACACTTTATGTTCTCTTTTCTTTACAAGCATGCCAT . . . NS=1;AN=0 GT:PS ./.:. -1 606559 . TATACAATGTGTGTATTGTTTTTATATATACCTA . . . NS=1;AN=0 GT:PS ./.:. -1 606618 . G . . . NS=1;AN=0 GT:PS ./.:. -1 606700 . AAAAGCAGTTATAAGAGGGACACTTATAGCAATAAATGCC . . . NS=1;AN=0 GT:PS ./.:. -1 606902 . TAAGAATTGTTTTTTGAAAAGATA . . . NS=1;AN=0 GT:PS ./.:. -1 606954 . TAAGAAAAAAGAAAACAAACTCAGAAATGAAAGAAGAGACATTAAAACT . . . NS=1;AN=0 GT:PS ./.:. -1 607036 . ACTATAATTAATTATTCACCAGCAAATTAGATAACCTAGAAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 607093 . TACCAAAACTGAATCATGAAGAATTCAAAATTTAGAACAAATCGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 607184 . GACCCAGGATTGAATGGCTCAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 607299 . TTTCAAAACCAGCATTACCCTGA . . . NS=1;AN=0 GT:PS ./.:. -1 607421 . TTATTTTTAAGAAAACACAGCAAAAAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 607619 . C . . . NS=1;AN=0 GT:PS ./.:. -1 607653 . TGAAACTAAGAATTTGTTCTTTGAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 607717 . AAACGAG . . . NS=1;AN=0 GT:PS ./.:. -1 607786 . ATATAGA . . . NS=1;AN=0 GT:PS ./.:. -1 608001 . T . . 
NS=1;CGA_WINEND=610000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.58:1.04:.:0:.:0:0.999:218 -1 608013 . TTTTAAAAAATAATAATACAAT . . . NS=1;AN=0 GT:PS ./.:. -1 608060 . GAAGGAGAAAGAGTATTT . . . NS=1;AN=0 GT:PS ./.:. -1 608105 . ATTGTTTTTTTTTTAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 608176 . A . . . NS=1;AN=0 GT:PS ./.:. -1 608208 . TTGTGATAGTTTGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 608265 . CATGAACTCATCATTTTTTATGG . . . NS=1;AN=0 GT:PS ./.:. -1 608317 . CACATTTTCTT . . . NS=1;AN=0 GT:PS ./.:. -1 608382 . GAATAGTGCTGCAATAAACATACGTGTG . . . NS=1;AN=0 GT:PS ./.:. -1 608433 . TTTATAATCCTTTGGGTATATACCCAGTAATGGGATGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 608480 . ATGGTATTTCTAGTTC . . . NS=1;AN=0 GT:PS ./.:. -1 608502 . CCTAAGAAATCGCCACACTGACTTCCACAATGGTTGAACTAGT . . . NS=1;AN=0 GT:PS ./.:. -1 608556 . C . . END=608822;NS=1;AN=0 GT:PS ./.:. -1 608831 . T . . END=609042;NS=1;AN=0 GT:PS ./.:. -1 609061 . TAAGAAAAGTATGGGCCAACCAATATC . . . NS=1;AN=0 GT:PS ./.:. -1 609096 . ACATAGATACAAAAGTCCTCAAAATAAGTACTAGCAAACAGA . . . NS=1;AN=0 GT:PS ./.:. -1 609146 . ACATATT . . . NS=1;AN=0 GT:PS ./.:. -1 609193 . ATGTTTCAGCAAACACAAATCAAATGTGAT . . . NS=1;AN=0 GT:PS ./.:. -1 609244 . GGATAAAAAAATAGCTATCTCTATATATGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 609313 . A . . . NS=1;AN=0 GT:PS ./.:. -1 609319 . T . . . NS=1;AN=0 GT:PS ./.:. -1 609335 . G . . . NS=1;AN=0 GT:PS ./.:. -1 609435 . CAAGAAC . . . NS=1;AN=0 GT:PS ./.:. -1 609523 . AATAAAAGATATTCAAATTGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 609578 . ATATTATATATAGGAAACCCTAAAAACTCC . . . NS=1;AN=0 GT:PS ./.:. -1 609665 . AATGTACAAAACTCAGTAGTTTCTTTACACTCACAA . . . NS=1;AN=0 GT:PS ./.:. -1 609754 . AAACTGTA . . . NS=1;AN=0 GT:PS ./.:. -1 609855 . GATGAAAAATTGTAGATGACACAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 609969 . CAACATT . . . NS=1;AN=0 GT:PS ./.:. -1 610001 . A . . NS=1;CGA_WINEND=612000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.73:1.12:.:0:.:0:0.999:218 -1 610046 . GGACCCTGAATAACTAAAGCACT . . . NS=1;AN=0 GT:PS ./.:. -1 610091 . GAAGGCCTCACAATCTGACTTCAAAACGTAT . . . NS=1;AN=0 GT:PS ./.:. -1 610189 . CCCGCGCTGGGTCGGAGGAGCAGGAGTATGGA . . . NS=1;AN=0 GT:PS ./.:. -1 610240 . ATGGAGC . . . NS=1;AN=0 GT:PS ./.:. -1 610371 . GCAGTCTCAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 610415 . GGCTCGCGTGGACCATCACCATCATCCTGAGCTGCTGCTGTGTC . . . NS=1;AN=0 GT:PS ./.:. -1 610464 . CCACAGCCAAGCCAGCCCTCAAGTCCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 610513 . CTGCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 610676 . CTTTGAATTGCGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 610731 . CCCTCTC . . . NS=1;AN=0 GT:PS ./.:. -1 610771 . A . . . NS=1;AN=0 GT:PS ./.:. -1 610810 . CATTGTCGACTGCAGTCCCC . . . NS=1;AN=0 GT:PS ./.:. -1 610915 . CTCAGAGGCCGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 611014 . GTCCACA . . . NS=1;AN=0 GT:PS ./.:. -1 611075 . TTCAGTGATAAACTTGGACTAACTGTGGCCCAGGTATCAGCACTCCC . . . NS=1;AN=0 GT:PS ./.:. -1 611160 . TATGGCAAGGCCTATCTCCCTCTGCTGAATCCAACAGGGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 611272 . TA . . . NS=1;AN=0 GT:PS ./.:. -1 611371 . A . . . NS=1;AN=0 GT:PS ./.:. -1 611406 . ACTGGTTGGACAGTGGGTGCAGCGCACGG . . . NS=1;AN=0 GT:PS ./.:. -1 611459 . TGTCGCCTCACCCGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 611575 . TCACGGT . . . NS=1;AN=0 GT:PS ./.:. -1 611615 . GTGCCTC . . . NS=1;AN=0 GT:PS ./.:. -1 611675 . CACCGAGCTAGCTGCAGGAGTTTTTTTTTTTTTTTCATGCCACAATGG . . . NS=1;AN=0 GT:PS ./.:. -1 611809 . CCCACCCATACAGAGCCCAGCAAGCTAAG . . . NS=1;AN=0 GT:PS ./.:. -1 611861 . ACAGCAGTCTGAGGTTGACCTAGGACACTCGAGCTTGGTGGCGGG . . . NS=1;AN=0 GT:PS ./.:. -1 612001 . A . . 
NS=1;CGA_WINEND=614000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.57:1.17:.:0:.:0:0.999:218 -1 612035 . C . . . NS=1;AN=0 GT:PS ./.:. -1 612064 . TCTGAAAAAAGGGCAGCAGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 612130 . C . . . NS=1;AN=0 GT:PS ./.:. -1 612160 . A . . . NS=1;AN=0 GT:PS ./.:. -1 612164 . T . . . NS=1;AN=0 GT:PS ./.:. -1 612167 . GC . . . NS=1;AN=0 GT:PS ./.:. -1 612173 . T . . . NS=1;AN=0 GT:PS ./.:. -1 612276 . CCTAACC . . . NS=1;AN=0 GT:PS ./.:. -1 612310 . CCCAGTAGGTGCCAACAGGCAC . . . NS=1;AN=0 GT:PS ./.:. -1 612356 . CATCTGGTGGGTGCCCCTCTTGGA . . . NS=1;AN=0 GT:PS ./.:. -1 612429 . CTCCGCTGGTGATGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 612478 . AACACCA . . . NS=1;AN=0 GT:PS ./.:. -1 612504 . GCCTGACTGTTAGAAGGAAAACTAACAAACAAAAAGG . . . NS=1;AN=0 GT:PS ./.:. -1 612580 . ACCTCATCAGAAGGTTACC . . . NS=1;AN=0 GT:PS ./.:. -1 612907 . T . . . NS=1;AN=0 GT:PS ./.:. -1 612934 . T . . . NS=1;AN=0 GT:PS ./.:. -1 612979 . TCGATCAAGCGGA . . . NS=1;AN=0 GT:PS ./.:. -1 613060 . AATGAAAAGAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 613118 . CTACATTTGATTGCTGTACCTGAAAGTGATGGGGAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 613165 . GTTAGAAAACACTCTTCAGGATATTATCCAGGAGAACTTCCCTAACCT . . . NS=1;AN=0 GT:PS ./.:. -1 613243 . ATACGGAGAACATCAC . . . NS=1;AN=0 GT:PS ./.:. -1 613294 . ATAGTCATCAGATTGAGCAAGGTTGAAATGA . . . NS=1;AN=0 GT:PS ./.:. -1 613342 . G . . . NS=1;AN=0 GT:PS ./.:. -1 613358 . GTCAGGTTACCCACAAAGGGAAGCCCATCAGACTAACAGCAGATCTATCAGCAGAAATTCTACAAGCCAGAAGAGAATGGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 613459 . AAAGAAAAGAATTTTCCACCCAGGATT . . . NS=1;AN=0 GT:PS ./.:. -1 613493 . CAGCCAAACTAAGCTTCATAAGT . . . NS=1;AN=0 GT:PS ./.:. -1 613523 . AAATAAAATCTTTACAGACAAGCAAATGCTGAGAGATTTTGTCACCACCAGGCCTGCCTTAAAGGAGCTCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 613621 . A . . . NS=1;AN=0 GT:PS ./.:. -1 613647 . CATACCAAATTGTAAAGACAATT . . . NS=1;AN=0 GT:PS ./.:. -1 613718 . ATCGTAATGACAGGATCAAATTCACACATAAC . . . NS=1;AN=0 GT:PS ./.:. -1 613766 . GTAAATGGGCTAAATGCTCCAATTAAAAAACACAGACTGGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 613836 . TGTTCTGTATTCAGGAGACCCATCTCACGTGCAAAGA . . . NS=1;AN=0 GT:PS ./.:. -1 613881 . GGCTCAAAATAAAGGGAT . . . NS=1;AN=0 GT:PS ./.:. -1 613923 . GAAGGCAAAAAAAAGCAGGGGTCGCAATCCTAGTCTCTGATAAAACAGACTTTAAACCAACAAAGATCAAAAGAGAC . . . NS=1;AN=0 GT:PS ./.:. -1 614001 . A . . NS=1;CGA_WINEND=616000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.61:1.06:.:0:.:0:0.999:218 -1 614019 . TGGTAAAAGGATCAATGCAAC . . . NS=1;AN=0 GT:PS ./.:. -1 614049 . TAATTATCCTAAATATATATGCACCCAATACAGGAG . . . NS=1;AN=0 GT:PS ./.:. -1 614091 . GATGCATAAAGTAAGCTCTTAGAGACTTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 614171 . CTGTCAATACTAGACAGATCAACGAAACAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 614222 . G . . . NS=1;AN=0 GT:PS ./.:. -1 614239 . G . . . NS=1;AN=0 GT:PS ./.:. -1 614271 . C . . . NS=1;AN=0 GT:PS ./.:. -1 614278 . C . . . NS=1;AN=0 GT:PS ./.:. -1 614290 . AG . . . NS=1;AN=0 GT:PS ./.:. -1 614311 . ACCACATTGCACTTATTCTAAAATTGACCAC . . . NS=1;AN=0 GT:PS ./.:. -1 614350 . AAGTAAAACACTCCTCAGCAAATGCAAAAAAAAATGGG . . . NS=1;AN=0 GT:PS ./.:. -1 614406 . GATCGCAGTGCAATTAAATTAGAACTCAGGATTAAGAAACTGACTCAA . . . NS=1;AN=0 GT:PS ./.:. -1 614500 . ACTGGGT . . . NS=1;AN=0 GT:PS ./.:. -1 614547 . ACCA . . . NS=1;AN=0 GT:PS ./.:. -1 614557 . AC . . . NS=1;AN=0 GT:PS ./.:. -1 614571 . T . . . NS=1;AN=0 GT:PS ./.:. -1 614575 . AG . . . NS=1;AN=0 GT:PS ./.:. -1 614615 . ATTTATAGCACTAAATGCCCACAAGAGAAAGCAGGAAAGATCTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 614685 . G . . . NS=1;AN=0 GT:PS ./.:. -1 614758 . 
AAGGAGATAGAGACACAAAAAAACCTTCAAAAAAAATCAATGAATCCAGGAGCTGGTTTTTTGAA . . . NS=1;AN=0 GT:PS ./.:. -1 614836 . TAGATAG . . . NS=1;AN=0 GT:PS ./.:. -1 614948 . T . . . NS=1;AN=0 GT:PS ./.:. -1 615001 . GGACAAATTCCTGGACACATACACCCTCCCAAGACTAAACCAG . . . NS=1;AN=0 GT:PS ./.:. -1 615095 . G . . . NS=1;AN=0 GT:PS ./.:. -1 615121 . C . . . NS=1;AN=0 GT:PS ./.:. -1 615124 . T . . . NS=1;AN=0 GT:PS ./.:. -1 615145 . AGCCGAATTCTACCAGAGGTACAAAGAGGAGCTGGTACCA . . . NS=1;AN=0 GT:PS ./.:. -1 615195 . AACTATTCCAA . . . NS=1;AN=0 GT:PS ./.:. -1 615243 . ATGAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 615269 . AAATTTGGCAGAGACACAACAAAAAAAAAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 615334 . GCAAAAATCCTCAATAAAATACTGGCAAACCAAATCCAGCAGCATATCAAAAGCTTGTC . . . NS=1;AN=0 GT:PS ./.:. -1 615444 . C . . . NS=1;AN=0 GT:PS ./.:. -1 615522 . AAGGCCTTCGAAAAAATTCAACAGCCCTTCATGCTAA . . . NS=1;AN=0 GT:PS ./.:. -1 615564 . CTCAATAAACTAGGTACTGATGGAACATATCTCAAAATAATAATACCTA . . . NS=1;AN=0 GT:PS ./.:. -1 615666 . TTTGAAAACCAGCACAAGACAAGGATGCCCTATC . . . NS=1;AN=0 GT:PS ./.:. -1 615705 . ACTCCTATTCAACGTAGTATTGGAAGTTCTGGC . . . NS=1;AN=0 GT:PS ./.:. -1 615790 . TGTGTAT . . . NS=1;AN=0 GT:PS ./.:. -1 615826 . TCTTCTTAAGCTAATAAGCAACTTCAGAAAAGTCTCAGGATACAAAATCAA . . . NS=1;AN=0 GT:PS ./.:. -1 615924 . AGCCAAATCATGAGTGAACTCTCCCA . . . NS=1;AN=0 GT:PS ./.:. -1 615966 . A . . . NS=1;AN=0 GT:PS ./.:. -1 616001 . G . . NS=1;CGA_WINEND=618000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.57:1.09:.:0:.:0:0.999:218 -1 616038 . TGCTCAAGGAAATAAGAGAGGACACA . . . NS=1;AN=0 GT:PS ./.:. -1 616138 . TT . . . NS=1;AN=0 GT:PS ./.:. -1 616191 . ATTGGAAAAAACTACTTTAAATTTCATATGGAACCAAAAATGAG . . . NS=1;AN=0 GT:PS ./.:. -1 616352 . ACAGATATGTAGACCAATGGAACAGAACAGAGGCCTCAGAAATAACACCACACATCTACAACTATCTGATCTTTGA . . . NS=1;AN=0 GT:PS ./.:. -1 616434 . TGACAAAAACAAGCAATGGGGAAACGAT . . . NS=1;AN=0 GT:PS ./.:. -1 616492 . C . . . NS=1;AN=0 GT:PS ./.:. -1 616561 . TGTATTA . . . NS=1;AN=0 GT:PS ./.:. -1 616593 . CATAAAAAACCCTAGAAGAAAACCTAGGCAATACCATTCAGTACA . . . NS=1;AN=0 GT:PS ./.:. -1 616678 . AATGGCAACAAAAGCCAAAATTGACAAATGGGATCTAATTAAACTAAAGAGCTCCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 616741 . AAAAGAAACTATCATCAGAGTGAACAGG . . . NS=1;AN=0 GT:PS ./.:. -1 616782 . GGGTGAAAATTTTTGCAATCTATCCATCTGACAAAGGGCTA . . . NS=1;AN=0 GT:PS ./.:. -1 616857 . CAAGAAAATAACAAACAAACCCATCAGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 616968 . GTCATCAGAGAAATGCATATCAATACC . . . NS=1;AN=0 GT:PS ./.:. -1 617079 . ATAAGAATGCTTTTACACTGTTGGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 617126 . A . . . NS=1;AN=0 GT:PS ./.:. -1 617185 . CAGTAATTGCATTACTGGGTATATATCCAAAGGATTAT . . . NS=1;AN=0 GT:PS ./.:. -1 617232 . TACTATAAAGACACATGCACACATATGT . . . NS=1;AN=0 GT:PS ./.:. -1 617276 . CACAATAGCAAAGACTTGGAACCAACCAAAATGCCCATTCAGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 617335 . AAAATGTGGCATATATACACCATGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 617400 . TTCAGGGACATGGATTAAG . . . NS=1;AN=0 GT:PS ./.:. -1 617428 . A . . . NS=1;AN=0 GT:PS ./.:. -1 617489 . ATGAGAGTTGAACA . . . NS=1;AN=0 GT:PS ./.:. -1 617539 . CACCGGGGCCTGTCTGGGGGTAGGGGCTGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 617598 . ATGTAGATGATGGGTTGATGGGTGCAGCAAAC . . . NS=1;AN=0 GT:PS ./.:. -1 617673 . GCATATGTACCCCAGAACTTAAAATATAATTTAAAAAAAAATCTCAA . . . NS=1;AN=0 GT:PS ./.:. -1 617901 . GAACAGGCAACCTGCAGACCTAAGCTTGATTCCCAAGTCACAGTCTGA . . . NS=1;AN=0 GT:PS ./.:. -1 617979 . AAGAGATAGAAATAGGGAATTTGAAGGAATAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 618001 . G . . NS=1;CGA_WINEND=620000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.97:1.08:.:0:.:0:0.999:218 -1 618083 . GTTACTA . . 
. NS=1;AN=0 GT:PS ./.:. -1 618211 . TCCGCAT . . . NS=1;AN=0 GT:PS ./.:. -1 618310 . C . . . NS=1;AN=0 GT:PS ./.:. -1 618364 . TCAGTTTTCTGGGCAATCTTGGTGAGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 618453 . CCCAAAACAAGTAC . . . NS=1;AN=0 GT:PS ./.:. -1 618493 . CAATATGACACTTGCACAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 618544 . GGAGCCCAGAAATAAACTCATGCATTTATGACCAATAAATTTTTGACA . . . NS=1;AN=0 GT:PS ./.:. -1 618675 . ACACGAATGTGTATGGTGTGTATCCTTATCTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 618756 . AACTGTAAAACTACTAGATGAAAACATAGGGGAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 618828 . TATCACC . . . NS=1;AN=0 GT:PS ./.:. -1 618867 . TGGGATT . . . NS=1;AN=0 GT:PS ./.:. -1 618934 . AAGTTGTGAGAAAATATTTGCAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 619038 . AAATGGGCAAGGTCCTAAATAGACATTTCTCAAAAAAAATACA . . . NS=1;AN=0 GT:PS ./.:. -1 619116 . ATCAGGGAAATGCA . . . NS=1;AN=0 GT:PS ./.:. -1 619184 . AATGATAAAAGATAACA . . . NS=1;AN=0 GT:PS ./.:. -1 619474 . AAAATGTAGTGTATATATACAAT . . . NS=1;AN=0 GT:PS ./.:. -1 619591 . CACGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 619633 . TCTAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 619687 . TAGTAGTTGGGGGTAGACATGGAAAAGGTAGATGTTGATAAAAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 619752 . AAAGTTTCAGTGAACTATTGCACAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 619843 . TGTTTTCACCACAAAAAAGATATGTATGTCAATAAGATAGA . . . NS=1;AN=0 GT:PS ./.:. -1 620001 . C . . NS=1;CGA_WINEND=622000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.90:1.02:.:0:.:0:0.999:218 -1 620043 . TCTTTTA . . . NS=1;AN=0 GT:PS ./.:. -1 620089 . TTGACGTCATTCCACCAGATTCTATCTCCAGTGTTAAAATAAT . . . NS=1;AN=0 GT:PS ./.:. -1 620167 . TGCGTAT . . . NS=1;AN=0 GT:PS ./.:. -1 620217 . T . . . NS=1;AN=0 GT:PS ./.:. -1 620350 . AAACGAAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 620426 . AGA . . . NS=1;AN=0 GT:PS ./.:. -1 620446 . T . . . NS=1;AN=0 GT:PS ./.:. -1 620536 . TCTACTC . . . NS=1;AN=0 GT:PS ./.:. -1 620600 . T . . . NS=1;AN=0 GT:PS ./.:. -1 620642 . AAAGAAAATCCTACATAACCTATAAAAAACTGATGGAACTTATA . . . NS=1;AN=0 GT:PS ./.:. -1 620763 . ATTGAAATAAA . . . NS=1;AN=0 GT:PS ./.:. -1 620858 . CTACAAAACATTGCTGAAAGAAATTAAATGGG . . . NS=1;AN=0 GT:PS ./.:. -1 620957 . ATTCATCAGATAA . . . NS=1;AN=0 GT:PS ./.:. -1 620999 . AATCAATACATAATGTCTCATAGTTCCTGAATCTAAAATATC . . . NS=1;AN=0 GT:PS ./.:. -1 621071 . TAATGAGAAGGGCTTATTATATCATTTATGAGATCCTCT . . . NS=1;AN=0 GT:PS ./.:. -1 621254 . GCCGTGTATACACAAACATGGGTGGACCAAAGAACAAAAGGACCACTG . . . NS=1;AN=0 GT:PS ./.:. -1 621377 . AGACGTAGGAGATTAGAAGTATGAA . . . NS=1;AN=0 GT:PS ./.:. -1 621484 . AGCCGAG . . . NS=1;AN=0 GT:PS ./.:. -1 621530 . AGAAGGCTAA . . . NS=1;AN=0 GT:PS ./.:. -1 621571 . TGACTGACACCAA . . . NS=1;AN=0 GT:PS ./.:. -1 621652 . TTACATAGGGCCACATATCTG . . . NS=1;AN=0 GT:PS ./.:. -1 621703 . TCCACACCACCAACGACGTGGATGAAGAAGATTTGAGCGATGCAGCCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 621841 . CTGGCCAGTAGAAAGTACATGGGGGAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 622001 . A . . NS=1;CGA_WINEND=624000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.83:0.92:.:0:.:0:0.999:218 -1 622042 . G . . . NS=1;AN=0 GT:PS ./.:. -1 622102 . TGAAGATAATAATATGTA . . . NS=1;AN=0 GT:PS ./.:. -1 622181 . TATTAAAAACAAAAAATCAACATAATATTATCACAACATGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 622276 . T . . . NS=1;AN=0 GT:PS ./.:. -1 622345 . CTGTGAT . . . NS=1;AN=0 GT:PS ./.:. -1 622428 . GCACCCACAATTCTCTGACAGAAAGT . . . NS=1;AN=0 GT:PS ./.:. -1 622489 . ATCACTGCAGTAAGAGGTGCCTGGGACAGACT . . . NS=1;AN=0 GT:PS ./.:. -1 622534 . CTGTGGATTGAGAGAATATAGAGACTGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 622595 . AGACTAGAGTAAGTGGACTTTCACAAGAAATAGAATCACCACCATTAT . . . NS=1;AN=0 GT:PS ./.:. -1 622660 . 
GCTTACTGCTATTTAAGTGCCTCAGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 622707 . TACGAAGCCCTAAATGGCTTCCCATCCTGCAATGA . . . NS=1;AN=0 GT:PS ./.:. -1 622977 . CAGCTTTTATGTTCATTGCATTCCCCACCCTTTTAAGT . . . NS=1;AN=0 GT:PS ./.:. -1 623083 . ATTCGGA . . . NS=1;AN=0 GT:PS ./.:. -1 623253 . CATATTGT . . . NS=1;AN=0 GT:PS ./.:. -1 623373 . TAATAGA . . . NS=1;AN=0 GT:PS ./.:. -1 623447 . CAACTCC . . . NS=1;AN=0 GT:PS ./.:. -1 623533 . A . . . NS=1;AN=0 GT:PS ./.:. -1 623643 . CACTTTTTACA . . . NS=1;AN=0 GT:PS ./.:. -1 623710 . A . . . NS=1;AN=0 GT:PS ./.:. -1 623728 . A . . . NS=1;AN=0 GT:PS ./.:. -1 623801 . TAATTGTTGAAAAAGACT . . . NS=1;AN=0 GT:PS ./.:. -1 624001 . C . . NS=1;CGA_WINEND=626000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.18:1.12:.:0:.:0:0.999:218 -1 624049 . T . . . NS=1;AN=0 GT:PS ./.:. -1 624143 . TATATAG . . . NS=1;AN=0 GT:PS ./.:. -1 624285 . TTGGTCA . . . NS=1;AN=0 GT:PS ./.:. -1 624456 . ATTCCTGTAAATAGAAAGAACCAAAAAAAGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 624812 . AAACGAATAGAAAGTGGATAGTTCTTGAT . . . NS=1;AN=0 GT:PS ./.:. -1 625120 . AAAGTAGAACAGAGATAAAGTTAAAAAAAGATT . . . NS=1;AN=0 GT:PS ./.:. -1 625196 . GAACAATGATTTAAAAATAAA . . . NS=1;AN=0 GT:PS ./.:. -1 625255 . AAGGTAGGATGAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 625297 . GGTTAAAAAGAACATAAATAAAATAAATTTA . . . NS=1;AN=0 GT:PS ./.:. -1 625339 . ACACTCATTCAAGGATGCTACTGAGTTTGACTTTGG . . . NS=1;AN=0 GT:PS ./.:. -1 625389 . TAATACACTCATCTGGGGATGCTACGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 625469 . TATCGCCCTGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 625721 . AAGATCA . . . NS=1;AN=0 GT:PS ./.:. -1 625794 . TTCAAAAGTCCAGAAACCATTT . . . NS=1;AN=0 GT:PS ./.:. -1 625900 . ATAATTTGGACTTTGAGGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 626000 . GTCGACC . . . NS=1;AN=0 GT:PS ./.:. -1 626001 . T . . NS=1;CGA_WINEND=628000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.87:1.06:.:0:.:0:0.999:218 -1 626051 . TAACGTA . . . NS=1;AN=0 GT:PS ./.:. -1 626117 . GTTCGAA . . . NS=1;AN=0 GT:PS ./.:. -1 626165 . AGTTATT . . . NS=1;AN=0 GT:PS ./.:. -1 626182 . TATGTAGATAGACATAATAAATTTTAGTATTCAATTATAC . . . NS=1;AN=0 GT:PS ./.:. -1 626231 . ATCGTAGGGAACAATAATTTATTATATATTCTAAAATAGC . . . NS=1;AN=0 GT:PS ./.:. -1 626302 . AAAGAAAAGATAAATATTCAAGG . . . NS=1;AN=0 GT:PS ./.:. -1 626363 . ATTGTATACATGTATCAAAAATATC . . . NS=1;AN=0 GT:PS ./.:. -1 626427 . AAAAAACAACAAAAAAACCAAAAGAATAGAAATCAAAAATAAATA . . . NS=1;AN=0 GT:PS ./.:. -1 626576 . CATGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 626621 . TCTGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 626929 . AAAGAAAAAAAATGTCTT . . . NS=1;AN=0 GT:PS ./.:. -1 627016 . C . . . NS=1;AN=0 GT:PS ./.:. -1 627240 . CATCAGA . . . NS=1;AN=0 GT:PS ./.:. -1 627410 . CATTAAAATATGCATCAGGCATGTGTGATGCATACAGTAGAAGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 627638 . ATAGCCATAAAAGAGGATAAATATTACAATACA . . . NS=1;AN=0 GT:PS ./.:. -1 627706 . TTGTAAAATATTTCAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 627791 . AGGAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 627920 . CCAGTAA . . . NS=1;AN=0 GT:PS ./.:. -1 627953 . GTGATGTATCAAATGGAGGAAAAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 628001 . T . . NS=1;CGA_WINEND=630000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.06:1.12:.:0:.:0:0.999:218 -1 628077 . TATAGAAACAGAAATAGAAATA . . . NS=1;AN=0 GT:PS ./.:. -1 628137 . AGCTGGAG . . . NS=1;AN=0 GT:PS ./.:. -1 628189 . GCAAGAATAAATA . . . NS=1;AN=0 GT:PS ./.:. -1 628411 . AGAGTGC . . . NS=1;AN=0 GT:PS ./.:. -1 628476 . CTATAGA . . . NS=1;AN=0 GT:PS ./.:. -1 628517 . GAATCAA . . . NS=1;AN=0 GT:PS ./.:. -1 628735 . A . . . NS=1;AN=0 GT:PS ./.:. -1 628791 . T . . . NS=1;AN=0 GT:PS ./.:. 
-1 628829 . GAGTTGTCAAAAGCTGAATTAATTAAAAGTC . . . NS=1;AN=0 GT:PS ./.:. -1 628935 . CTTGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 628962 . CATGAGAATGAAAAGGGCTGAGAAGGGGAATG . . . NS=1;AN=0 GT:PS ./.:. -1 629283 . CGGTGCC . . . NS=1;AN=0 GT:PS ./.:. -1 629361 . ACTGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 629487 . A . . . NS=1;AN=0 GT:PS ./.:. -1 629698 . C . . . NS=1;AN=0 GT:PS ./.:. -1 629811 . G . . . NS=1;AN=0 GT:PS ./.:. -1 629860 . TTTACAT . . . NS=1;AN=0 GT:PS ./.:. -1 629925 . ATGTGGACATGAAATCATATAATGTTCCATGGAAAAAATAGA . . . NS=1;AN=0 GT:PS ./.:. -1 629996 . ATCAAAACTAATGCT . . . NS=1;AN=0 GT:PS ./.:. -1 630001 . A . . NS=1;CGA_WINEND=632000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.86:1.19:.:0:.:0:0.999:218 -1 630158 . A . . . NS=1;AN=0 GT:PS ./.:. -1 630220 . ACATATTCACAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 630267 . GCAGAAAAAATAAAACTCTAGATCAATCTCAATCGTGTAAATAAATTCAG . . . NS=1;AN=0 GT:PS ./.:. -1 630443 . C . . . NS=1;AN=0 GT:PS ./.:. -1 630705 . ACACGTGTGTGGTGTAATGAAAGAAGGCAAAAAGATGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 630762 . GCAATTTTCATTCTATTATGAAAAGAAAAGGAT . . . NS=1;AN=0 GT:PS ./.:. -1 630893 . TGGAGACAAGATTCCAAAACTCGTCTG . . . NS=1;AN=0 GT:PS ./.:. -1 630964 . CTTTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 631000 . TATTATATAATTATTTATAATAATTGTAT . . . NS=1;AN=0 GT:PS ./.:. -1 631040 . AACTTGATGACCTTTATAAAAATACTAGAAGAAAAAGGGAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 631109 . TATACCATGAAGCAGTAAAAAACAGGATCAAATTTTCTATGTTTTTTAAAAATATGCCTTCTC . . . NS=1;AN=0 GT:PS ./.:. -1 631178 . GTTTTCTCTCTGAAATGATATTTAGA . . . NS=1;AN=0 GT:PS ./.:. -1 631266 . G . . . NS=1;AN=0 GT:PS ./.:. -1 631311 . CAGGAAAAAAAGTGGGAAGTGGAGAGAATAAT . . . NS=1;AN=0 GT:PS ./.:. -1 631399 . AACGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 631488 . GGTTAAAAAATTCA . . . NS=1;AN=0 GT:PS ./.:. -1 631570 . CCAGAAAAAATGTG . . . NS=1;AN=0 GT:PS ./.:. -1 631634 . CAAACACAAAATAAACAACCCACAAAAAAAAAAAAAAATAGA . . . NS=1;AN=0 GT:PS ./.:. -1 631846 . A . . . NS=1;AN=0 GT:PS ./.:. -1 631956 . A . . . NS=1;AN=0 GT:PS ./.:. -1 632001 . A . . NS=1;CGA_WINEND=634000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.94:1.12:.:0:.:0:0.999:218 -1 632083 . TGGGTGATTAAATATTATTTT . . . NS=1;AN=0 GT:PS ./.:. -1 632118 . ATTATTAGAAAAATTAAAAATAACAAAATCAAATAAAGTTGGATGA . . . NS=1;AN=0 GT:PS ./.:. -1 632229 . AGCACCCTTGGAAATAACACAAAAATTAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 632339 . CATACAGAGAAATATTAATACCATGAAAATATTAAAATAAAAGCAT . . . NS=1;AN=0 GT:PS ./.:. -1 632540 . CATAAAAATTACTCTGAAATACAATGAAAATTAC . . . NS=1;AN=0 GT:PS ./.:. -1 632655 . CAGTTAACTGAGTGGAGAACAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 632706 . C . . . NS=1;AN=0 GT:PS ./.:. -1 632729 . GGGAGTTGTGCCTGAAACCACATTGTGAGGA . . . NS=1;AN=0 GT:PS ./.:. -1 632946 . TTAGAAAGCAACATGCTTGCTAAGGCAAGAAATTTGATTA . . . NS=1;AN=0 GT:PS ./.:. -1 633092 . CTAATGCACCATCTAATGCTTCAAGTCTAATAATTATTCAATTC . . . NS=1;AN=0 GT:PS ./.:. -1 633324 . TACCACACAGCTATATGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 633375 . TATACTT . . . NS=1;AN=0 GT:PS ./.:. -1 633582 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.120|rs11485756&dbsnp.131|rs75160317&dbsnp.135|rs192643069;CGA_SDO=11 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:48:48,121:48,121:42,43:-121,-48,0:-43,-42,0:0:0,0:0 -1 633623 . C . . . NS=1;AN=0 GT:PS ./.:. -1 633740 . GCCTTGAAGCTTCCATTTCTACATCTTGGAACCAACGTGTGTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 633801 . AAGAATTCTGCTTATTCTGCTTGCATTA . . . NS=1;AN=0 GT:PS ./.:. -1 633998 . A . . . NS=1;AN=0 GT:PS ./.:. -1 634001 . G . . 
NS=1;CGA_WINEND=636000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.81:1.17:.:0:.:0:0.999:218 -1 634137 . A . . . NS=1;AN=0 GT:PS ./.:. -1 634196 . T . . . NS=1;AN=0 GT:PS ./.:. -1 634243 . ATAGTCC . . . NS=1;AN=0 GT:PS ./.:. -1 634286 . GTTGGCAGTCCAAAGTCTC . . . NS=1;AN=0 GT:PS ./.:. -1 634335 . TAAACCTGTAAAATCAAAAGCAATTTAGTTATTTTC . . . NS=1;AN=0 GT:PS ./.:. -1 634407 . GCCGTTCCAAATGGTAAAATTTAC . . . NS=1;AN=0 GT:PS ./.:. -1 634450 . G . . . NS=1;AN=0 GT:PS ./.:. -1 634458 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.131|rs78339854&dbsnp.132|rs111275359;CGA_RPT=THE1B-int|ERVL-MaLR|13.2;CGA_SDO=11 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:48:114,48:114,48:42,42:-114,-48,0:-42,-42,0:1:1,1:0 -1 634583 . CCCCTGTGGTTTTGCAGGGGAGAGCCTTCCTCCCGGTT . . . NS=1;AN=0 GT:PS ./.:. -1 634672 . TGTCGGTGGATCTACCATTCCGGG . . . NS=1;AN=0 GT:PS ./.:. -1 634708 . GCA . . . NS=1;AN=0 GT:PS ./.:. -1 634715 . TCT . . . NS=1;AN=0 GT:PS ./.:. -1 634725 . G . . . NS=1;AN=0 GT:PS ./.:. -1 634730 . A . . . NS=1;AN=0 GT:PS ./.:. -1 634733 . A . . . NS=1;AN=0 GT:PS ./.:. -1 634738 . G . . . NS=1;AN=0 GT:PS ./.:. -1 634764 . GGGCTTCAACCCCACATTTCCATTCCCCACT . . . NS=1;AN=0 GT:PS ./.:. -1 634915 . TTCCGTGAATCCACAGGCTCAAC . . . NS=1;AN=0 GT:PS ./.:. -1 634995 . TGTACCTTGACCCCTTTTAGCTGTGGCTGGAGCAGCTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 635362 . A . . . NS=1;AN=0 GT:PS ./.:. -1 635388 . C . . . NS=1;AN=0 GT:PS ./.:. -1 635478 . GACACCTTTACTCCAGTTCCCATTGA . . . NS=1;AN=0 GT:PS ./.:. -1 635532 . AGCCTGGATTTCATTGTCTATATCATTATTTGA . . . NS=1;AN=0 GT:PS ./.:. -1 635657 . CCCCTGCCTGTT . . . NS=1;AN=0 GT:PS ./.:. -1 635747 . ATTAGTCTGTTTTCATGCTGCTGATA . . . NS=1;AN=0 GT:PS ./.:. -1 635798 . TATGAAGAAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 635877 . CACATCTCTACATGATGGCAGACAAGACAGAATGAGAGCTAAGTGAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 635948 . CAGCTCTTGTGAGACTTATTCAAGACCACGAG . . . NS=1;AN=0 GT:PS ./.:. -1 636001 . C . . NS=1;CGA_WINEND=638000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.81:1.03:.:0:.:0:0.999:218 -1 636018 . CCCGTTGGATCCCTCCCACAAGACACGGAAATT . . . NS=1;AN=0 GT:PS ./.:. -1 636113 . ATAGATC . . . NS=1;AN=0 GT:PS ./.:. -1 636231 . CCAGACTACCCAGCTGCCAGCTGAATACCACAGATGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 636295 . TCATGTAGCTAAGCCCTGCTTGCGCTAATACAAGTCCACAATTTTTTTTAAGT . . . NS=1;AN=0 GT:PS ./.:. -1 636362 . TGCTAAGTTTTGA . . . NS=1;AN=0 GT:PS ./.:. -1 636465 . GAAGCCAGA . . . NS=1;AN=0 GT:PS ./.:. -1 636523 . TGATAGCAAAACCTGTAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 636646 . AAAATCATCAGTGGAGGCTGAGGAAAACGCA . . . NS=1;AN=0 GT:PS ./.:. -1 636698 . AGAATTA . . . NS=1;AN=0 GT:PS ./.:. -1 636725 . ATATAAAAAATAGGAAAAATACCGAA . . . NS=1;AN=0 GT:PS ./.:. -1 636792 . AAAGAAAAAAATTAA . . . NS=1;AN=0 GT:PS ./.:. -1 636872 . TAGGAGAAGTCTTCCAGAGAGGTTCTGTCAGACTGCTCTGGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 636976 . CCATTAAAAAATGATATCATGTCTTTTGCTGGAACATG . . . NS=1;AN=0 GT:PS ./.:. -1 637043 . AATGCAGGAACAGAAAACCAAATACAGC . . . NS=1;AN=0 GT:PS ./.:. -1 637289 . AAATAAAAGTTAAAAAAAAAGAAAATTA . . . NS=1;AN=0 GT:PS ./.:. -1 637327 . ATCATCTACCTGGTAATATGAAAAACACA . . . NS=1;AN=0 GT:PS ./.:. -1 637391 . A . . . NS=1;AN=0 GT:PS ./.:. -1 637437 . CACCACTGGCTGGACGCAGTGGCTCACACCTGTAATCCCAGCACTTTGGGAGGCCGATGCTGGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 637521 . TTCGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 637562 . ACTAAAAATACAAAAATTAGCTGGGTGTGGTGGCAGGCACCTGTAATAC . . . NS=1;AN=0 GT:PS ./.:. -1 637639 . ATCGCTT . . . NS=1;AN=0 GT:PS ./.:. -1 637680 . TCACGCCATTGCACTCCAGCCTGGGCAACAA . . . NS=1;AN=0 GT:PS ./.:. -1 637725 . 
CGGGGAAAAAAAAGAAAAAAAAAAACCACCGCCATCATTTTGCAAGT . . . NS=1;AN=0 GT:PS ./.:. -1 637851 . ATTTGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 637874 . CTCTATATGTTATTATTTTGTATGTGATGACAACAGAATATATTATCATGCT . . . NS=1;AN=0 GT:PS ./.:. -1 637935 . AATCTCATTCATAATATAAAGTATAAATTTGTGATTT . . . NS=1;AN=0 GT:PS ./.:. -1 637993 . C . . . NS=1;AN=0 GT:PS ./.:. -1 638001 . A . . NS=1;CGA_WINEND=640000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.96:1.27:.:0:.:0:0.999:218 -1 638073 . AAAATTTGAAACTAGTAACATGGAGGACTATTGTCATTGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 638144 . CAGTGTACATAAAAATAATTTCAAGAAATTTATAAAATAC . . . NS=1;AN=0 GT:PS ./.:. -1 638189 . TTATGGTGTATAAACAACTTTAGATTCTTTGTTTAAGA . . . NS=1;AN=0 GT:PS ./.:. -1 638281 . GGCTAATAGTAGGCACCTAATA . . . NS=1;AN=0 GT:PS ./.:. -1 638352 . C . . . NS=1;AN=0 GT:PS ./.:. -1 638379 . AATTACTGTTTAGAGAATAACATTTGATGG . . . NS=1;AN=0 GT:PS ./.:. -1 638430 . TTACGAC . . . NS=1;AN=0 GT:PS ./.:. -1 638499 . ATGTGGCACCTGCTGAAGCCTGCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 638565 . CTCTAACATTTTTTAGCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 638616 . C . . . NS=1;AN=0 GT:PS ./.:. -1 638685 . AGGTTAAATGGCACTAACTCAGGGAAGGCTTCCCTAACTGCCTC . . . NS=1;AN=0 GT:PS ./.:. -1 638767 . TAGTATCGAATCAAGTTTATAATTTTAAAATAATTGGGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 638869 . CCTGTTTGTTCACTCCTGCCACAGTCAGAAT . . . NS=1;AN=0 GT:PS ./.:. -1 638926 . GAACAAGCACTAAATAAATGAAC . . . NS=1;AN=0 GT:PS ./.:. -1 638989 . ACTGAATGGATCATGAACACTATCTGGTATGTCACGTAGGTAATTTA . . . NS=1;AN=0 GT:PS ./.:. -1 639107 . TCACTTCAGATTCTTCTTTAGAAAACTTGGACAATAGCATTTGCTGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 639170 . CTAAGAATCAAGAGAGATATCTGACA . . . NS=1;AN=0 GT:PS ./.:. -1 639209 . GAAAACATTAAACACGATT . . . NS=1;AN=0 GT:PS ./.:. -1 639250 . TTATTATTAGAAACCAATTA . . . NS=1;AN=0 GT:PS ./.:. -1 639282 . AAAAATAGTAATACTTATTGCAGACTCAAATGTGC . . . NS=1;AN=0 GT:PS ./.:. -1 639433 . GTGATTTTTCAGGTTCACTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 639478 . A . . . NS=1;AN=0 GT:PS ./.:. -1 639544 . TATGAAAACAAGAGATAAATATACACAACTGAGGGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 639598 . T . . . NS=1;AN=0 GT:PS ./.:. -1 639607 . T . . . NS=1;AN=0 GT:PS ./.:. -1 639620 . C T . . NS=1;AN=2;AC=1;CGA_SDO=20 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:639620:VQLOW:34:34,34:34,34:12,27:-34,0,-34:-12,0,-27:15:1,14:14 -1 639661 . TAACTTTTAATAGAACCAGTCACTAAATTAAAAAAATGAC . . . NS=1;AN=0 GT:PS ./.:. -1 639910 . ACCTAACATAGACATTTGTATATGATAAGA . . . NS=1;AN=0 GT:PS ./.:. -1 639964 . G . . . NS=1;AN=0 GT:PS ./.:. -1 640001 . A . . NS=1;CGA_WINEND=642000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.96:1.24:.:0:.:0:0.999:218 -1 640022 . GGCCTGAATAAGAAATATTCTGGATAAGATATTGTGGCTGCTACCAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 640204 . T . . . NS=1;AN=0 GT:PS ./.:. -1 640226 . TTCTTCTTGTGCCCTCTCCCTCTCTCTCTTTCTCTTGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 640279 . AAGCCAACTGGCATGCTGTCAGTGGCCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 640404 . TAGCTAAGAACCATGTGAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 640444 . CCTCAGTTGAAATTTAAGATGACATATTGAGCAGACATACTGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 640569 . GTATTACACCTTCAGTGAGCACGTGTACTAGAAATTTAAAAAATAAATAAAATAAACCTTCAAAGTGAG . . . NS=1;AN=0 GT:PS ./.:. -1 640727 . CAGGTTTCAGCCTGAACTCACACAATCTGTGTGGC . . . NS=1;AN=0 GT:PS ./.:. -1 640833 . G . . . NS=1;AN=0 GT:PS ./.:. -1 640868 . AGCAAGC . . . NS=1;AN=0 GT:PS ./.:. -1 640898 . CATGAGCTTCTAACACACACACAAAAATCACACACACAAAATGGGGGTAGC . . . NS=1;AN=0 GT:PS ./.:. -1 641017 . AAACACATATTTTAATGTGGTTAATTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 641074 . CAAGAAAATTGTGCTGGATG . . . NS=1;AN=0 GT:PS ./.:. -1 641108 . 
GGCTCCCCTCCTATTTAAGTCTGGGTACTGTGTCACCCGAAGTCTTCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 641170 . GGTCTGGGTTTGCCTATGAAAGAAACTCATGAGAGCTGGAAATGA . . . NS=1;AN=0 GT:PS ./.:. -1 641233 . TCAAATA . . . NS=1;AN=0 GT:PS ./.:. -1 641279 . GTGATGGCTTGCAGAATCCTT . . . NS=1;AN=0 GT:PS ./.:. -1 641341 . TTCAGAAATTCCTATAAGCTTGGGTTCTGTGCCCACACTCTAGACTGTCA . . . NS=1;AN=0 GT:PS ./.:. -1 641405 . ATATAAAACAGACCTCTTCTGATTTTGTCTAGCTGCTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 641497 . A . . . NS=1;AN=0 GT:PS ./.:. -1 641503 . C . . . NS=1;AN=0 GT:PS ./.:. -1 641511 . A . . . NS=1;AN=0 GT:PS ./.:. -1 641563 . TAAAATATAAAGAATTGTCCAGAAATATATAAAAAAAGAATCAC . . . NS=1;AN=0 GT:PS ./.:. -1 641629 . AATATAACAATTGTATGGACTAGG . . . NS=1;AN=0 GT:PS ./.:. -1 641683 . TTTGAAGAAAAAAGCAATAAGAAGCCTCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 641785 . CTGTGAAGCTTGAGATGTTTATTATAATGAATTATCTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 641834 . ACAATTTCCTAACAATTTTGGGGTTTATATTTTTGAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 641897 . TTGTACTATTGTTAGGTAACTTTGATGTGCA . . . NS=1;AN=0 GT:PS ./.:. -1 642001 . G . . NS=1;CGA_WINEND=644000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.07:1.23:.:0:.:0:0.999:218 -1 642025 . TGTCTGTCCACGATAAGCACTATCACAAGGACTTTCTATA . . . NS=1;AN=0 GT:PS ./.:. -1 642117 . CTGTAGGTGTCTCTATAATAGGACCTATCATAAAAAATTC . . . NS=1;AN=0 GT:PS ./.:. -1 642164 . CTGCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 642184 . CACCCTCACAAGAACACTTGCCTAGCAATGGCTGTTTCTGCCAGTAAGTTAACACCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 642252 . AGACCCTGTGACCAATGATGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 642375 . AGATTATAATCCTTGTTTATTTCCAAATAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 642458 . T . . . NS=1;AN=0 GT:PS ./.:. -1 642483 . T . . . NS=1;AN=0 GT:PS ./.:. -1 642537 . CATGTTTGCTAGAACTCACCTGTAA . . . NS=1;AN=0 GT:PS ./.:. -1 642587 . TTTTTGTGTTTAGTTTTTCTTTTGTGATTGGGGAGGGGGGTTTATCGTAC . . . NS=1;AN=0 GT:PS ./.:. -1 642656 . A . . . NS=1;AN=0 GT:PS ./.:. -1 642763 . CTGCTTCTGCC . . . NS=1;AN=0 GT:PS ./.:. -1 642874 . CTCGTAGTATTTATAGTAAAAGTGAAATT . . . NS=1;AN=0 GT:PS ./.:. -1 642936 . TGCCCAAGTGCTGGTCTGGTCTGATCTTCTCATCTTCCCTTGG . . . NS=1;AN=0 GT:PS ./.:. -1 642991 . CAGTCACACTAGCCTCCTTGCTGCTCCACAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 643081 . CTTTTCTCCCATAT . . . NS=1;AN=0 GT:PS ./.:. -1 643136 . TAAATGTCCCATTCTCTGTGAAGCTTTCCTGCCCACCCTATTTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 643243 . AAACTGTAAATATACATGTTCACTTTTTTA . . . NS=1;AN=0 GT:PS ./.:. -1 643326 . TTTGGTTCACTGGTGTATTCTTAAAGCATGTTACATACTAGG . . . NS=1;AN=0 GT:PS ./.:. -1 643402 . ATTGGGC . . . NS=1;AN=0 GT:PS ./.:. -1 643453 . CAAATGTAATTCCTTACATTA . . . NS=1;AN=0 GT:PS ./.:. -1 643571 . GATACTAGGAAAAGAGGAAGGGATATATTATTTTCATGTATAAAGCAC . . . NS=1;AN=0 GT:PS ./.:. -1 643730 . AGGGCCAAAAGAGTCAACTTCTGAAGAAGCGCAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 643808 . TTGCAAAAATAAACTCATGTGCCATAATTCATGAGTAGAAAAATAGA . . . NS=1;AN=0 GT:PS ./.:. -1 643997 . C . . . NS=1;AN=0 GT:PS ./.:. -1 644001 . A . . NS=1;CGA_WINEND=646000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.90:1.21:.:0:.:0:0.999:218 -1 644111 . TTACCCAGATGGGCCCAGTCTAATCACATGAGTTCTTAAAAATGGA . . . NS=1;AN=0 GT:PS ./.:. -1 644232 . T . . . NS=1;AN=0 GT:PS ./.:. -1 644278 . CTAGAAGATAGAAAAGGCCAGGATATGGATTCTACCCTAGCCA . . . NS=1;AN=0 GT:PS ./.:. -1 644348 . TTGATTTTAGTTCACTAAAATTCA . . . NS=1;AN=0 GT:PS ./.:. -1 644458 . T . . . NS=1;AN=0 GT:PS ./.:. -1 644463 . G . . . NS=1;AN=0 GT:PS ./.:. -1 644522 . AATTCAAGGTGAATTC . . . NS=1;AN=0 GT:PS ./.:. -1 644548 . CTTAAAACATTTAGATTAAAAATAAATGAGAATTTTTGTTACTTTTGGTA . . . NS=1;AN=0 GT:PS ./.:. -1 644611 . 
AGAAAAACAAACATTAAGGAGGAAAAATGAACATATG . . . NS=1;AN=0 GT:PS ./.:. -1 644660 . TATAAAGCTTCTCTATTTGGAAGATAT . . . NS=1;AN=0 GT:PS ./.:. -1 644712 . AATATTTACAACATATATATAAGTGAATAAATATA . . . NS=1;AN=0 GT:PS ./.:. -1 644756 . TATATATGAACTCCCAAAAATCAACAGGAAAAATAAGACATAGA . . . NS=1;AN=0 GT:PS ./.:. -1 644807 . AAATGCATAAACAAAAGAAGGCAAAACAAAAATAATGACTCATAATTAT . . . NS=1;AN=0 GT:PS ./.:. -1 644879 . GATGAGC . . . NS=1;AN=0 GT:PS ./.:. -1 644891 . AATGCAAATTAAAACCACCCTGAGATGCTTTTTACATCCATGAGCCTGATAAAAGTTAGAGTCTAAAAGTAATAATTAACAAAGATGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 644993 . TCTTGTCCATTA . . . NS=1;AN=0 GT:PS ./.:. -1 645090 . G . . . NS=1;AN=0 GT:PS ./.:. -1 645094 . A . . . NS=1;AN=0 GT:PS ./.:. -1 645163 . AACATTGTTTGTTATATCAAAAAATAAAAAAGACA . . . NS=1;AN=0 GT:PS ./.:. -1 645208 . CAGCAAAAAAAATAAGTAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 645252 . ATGGAATAATATA . . . NS=1;AN=0 GT:PS ./.:. -1 645290 . ACTGTACATATGAATGTAAGTATCAGCAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 645340 . C . . . NS=1;AN=0 GT:PS ./.:. -1 645360 . A . . . NS=1;AN=0 GT:PS ./.:. -1 645363 . T . . . NS=1;AN=0 GT:PS ./.:. -1 645472 . GA . . . NS=1;AN=0 GT:PS ./.:. -1 645495 . AGTTGAGGGAATTTCAATTGGAAAAAAATAATATC . . . NS=1;AN=0 GT:PS ./.:. -1 645538 . TAAGTCAGGTAGTGGGTATTAGCATTTGTT . . . NS=1;AN=0 GT:PS ./.:. -1 645588 . T . . . NS=1;AN=0 GT:PS ./.:. -1 645607 . TTCAATGTATTTAATATATTTTTTGCATAATTAAATATTATGCAATAAAAATGAGAAAACAAAAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 645695 . AAAGAAATGGAGAAAAAATTATAATCTAGTTGAGTAATGGTATATTACATAGCTATTTTCTTAAGTAGATGT . . . NS=1;AN=0 GT:PS ./.:. -1 645803 . CTTAATTATATATAAATATATATGTACATATTTTTAATATAAAATACTAAACAAAGTACACCAAAATATTAGCTCCTATGTTAGTGAGATAATG . . . NS=1;AN=0 GT:PS ./.:. -1 645905 . TTTTGTATTTTAAGTTTTACATAGTAGGTGTATTTTTCTGT . . . NS=1;AN=0 GT:PS ./.:. -1 645956 . CTATAAAGAACTGCCCAAGACTGGGTAATTTATAAAGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 646001 . T . . NS=1;CGA_WINEND=648000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.56:1.14:.:0:.:0:0.999:218 -1 646063 . AGACAAAGAGGAAGCAAGCCAGCTTCTTCG . . . NS=1;AN=0 GT:PS ./.:. -1 646104 . GAAGAAGTGCCGAG . . . NS=1;AN=0 GT:PS ./.:. -1 646153 . CTCGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 646167 . CTATCACAAGAACAGCACAGGGGAAACTGCCCCCATGATTCAATTA . . . NS=1;AN=0 GT:PS ./.:. -1 646277 . C . . . NS=1;AN=0 GT:PS ./.:. -1 646306 . CCATATCAGTAGGCATGTATTGAAT . . . NS=1;AN=0 GT:PS ./.:. -1 646445 . GAATGTGGCCTTGTAAGAAAGCAAATTAACTTCTAAC . . . NS=1;AN=0 GT:PS ./.:. -1 646542 . ACTTGAACTGCAGTAAAATATCCTCAGCAACATAGATGTGTATGTTTCA . . . NS=1;AN=0 GT:PS ./.:. -1 646622 . CCATTGG . . . NS=1;AN=0 GT:PS ./.:. -1 646648 . ATTTCTGAAGATGTCCTGGCTTATTCACAGATGCAAG . . . NS=1;AN=0 GT:PS ./.:. -1 646736 . GAGAGCTGTTCCAAAGTTTAGGGAGTTTTTGTAA . . . NS=1;AN=0 GT:PS ./.:. -1 646776 . AATAAATAAAAATGTTCTTGAAAGAGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 646873 . GCTATTTTTTTTTTTGACACACACTTTACA . . . NS=1;AN=0 GT:PS ./.:. -1 646914 . AATGTCTCCGGCAATAAATCACAAAGTTAAAATTAC . . . NS=1;AN=0 GT:PS ./.:. -1 646976 . TGGTAAATCATTTTCTACCAAAAGAAAGAAATGTCTTGTCTATTC . . . NS=1;AN=0 GT:PS ./.:. -1 647038 . AAAGTTTTCCTTGTTGGCGAGC . . . NS=1;AN=0 GT:PS ./.:. -1 647115 . CAGGAAGATGACTCAGGGCCTTATCCATACCTTCAAG . . . NS=1;AN=0 GT:PS ./.:. -1 647175 . TCAGTATCTATATCAAAAATGGCTTAAGCCTGCAACATGTTTCTGA . . . NS=1;AN=0 GT:PS ./.:. -1 647247 . TTCATTGAATCCTGGATGCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 647278 . AATAAGAGGAATTCATATGGATCAGCTAGAAAAAAATTAA . . . NS=1;AN=0 GT:PS ./.:. -1 647334 . AAAGTTATATATTATATATCTATTATATATAATATTATATATCTATTATATATTATATATTGTATATCTATTACATATATATTATATATGTATTATATATATTATATATCTATTATATATATAATATTATATATTATATATCATT . . . 
NS=1;AN=0 GT:PS ./.:. -1 647484 . ATTCCCCAGCGTTCATATTTGTCAGTGCAAGTAAAGAGCCTTACTGCTGATGAGGTTTGAGGTATGACCATTT . . . NS=1;AN=0 GT:PS ./.:. -1 647594 . T . . END=647867;NS=1;AN=0 GT:PS ./.:. -1 647881 . AGGTGTGAGCCACCACGCCCGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 647933 . TTTGAAGGTCATAAAAAATATAATAAGAGATAAGGCTAATT . . . NS=1;AN=0 GT:PS ./.:. -1 647987 . ATAAAATCCTTTAATAAAAATATAAAGGAATAATATAATAATTTTATTTAATAAAATATAATAAGAGATAAGGCTA . . . NS=1;AN=0 GT:PS ./.:. -1 648001 . T . . NS=1;CGA_WINEND=650000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.67:1.30:.:0:.:0:0.999:218 -1 648068 . CTTTAATAAAATATA . . . NS=1;AN=0 GT:PS ./.:. -1 648104 . TCCAAAAAAAGAAATGGAGAGGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 648141 . ATTAATCTTGTCAAAAATATAAAATTAT . . . NS=1;AN=0 GT:PS ./.:. -1 648188 . ACTGTTTTCCTTGTCTGCGGCCATTGTGCTGCTGCTACACAACTACCGCAAGCAGCCCTTCACGCCCTCCTCCCAGTACAAAGCTAATTG . . . NS=1;AN=0 GT:PS ./.:. -1 648287 . AAATGTTAAGCTTGGAAGAGTCAGCATCGCTGCACTTATTTTTTATTC . . . NS=1;AN=0 GT:PS ./.:. -1 648360 . AGTGGGGGAAAGGTTAAAAACCCCCCTGGATAAGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 648441 . TCCTTGTCCCTTGACATAAACTTGATAAATAA . . . NS=1;AN=0 GT:PS ./.:. -1 648516 . CAGGTACTTAAAGTTA . . . NS=1;AN=0 GT:PS ./.:. -1 648565 . GCATTGTTA . . . NS=1;AN=0 GT:PS ./.:. -1 648771 . AGTGACATTGCCTTTTAGTTGTACTTTCACAAAAATTCA . . . NS=1;AN=0 GT:PS ./.:. -1 648883 . TCTCAAT . . . NS=1;AN=0 GT:PS ./.:. -1 648917 . AGCCACCATAGTCTCTCCCATT . . . NS=1;AN=0 GT:PS ./.:. -1 648970 . CTTCTTTTCATATTTTTGAAAACTTTTGAAAAACTACC . . . NS=1;AN=0 GT:PS ./.:. -1 649079 . TTTGGTAACCCTTAAATTACTAAACCCAAAACAACATG . . . NS=1;AN=0 GT:PS ./.:. -1 649149 . ATGATAA . . . NS=1;AN=0 GT:PS ./.:. -1 649164 . CTTTCTTCTCTTTTACCCTTCTTTCTTGAATTCAGTCAAACAACGTA . . . NS=1;AN=0 GT:PS ./.:. -1 649219 . TTTCGTCTTATT . . . NS=1;AN=0 GT:PS ./.:. -1 649246 . CACCTCAGCTTTCTCCATTCAGCTATA . . . NS=1;AN=0 GT:PS ./.:. -1 649307 . TCCTCTCACTCTATACTATCTCTGTTAGCTAATTTTATTT . . . NS=1;AN=0 GT:PS ./.:. -1 649377 . CACATATGCATGTGTGTACATGTGCATACACACACTGTATGTGGACATGTATATATATATATGTGTGTGTGTATATATATAGTATATATATAAATTACAATAACATAAAGGTGGCATTTTAAATTAGTGGAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 649521 . TGATCACTACACATTCTATACATGTAAAGAAAATATCACTCTGTATCCCAAGAATATGTACAATTATGGTTTGTCAAATGAAAAAGTTCATACATTGAAAAATTTTAGATAAATATCAAACTTTCTCTGAAACTGTAACTGTAAAATGTAAAAAACAGTAATTGCTATATTGCTT . . . NS=1;AN=0 GT:PS ./.:. -1 649704 . GTAGAAGAATA . . . NS=1;AN=0 GT:PS ./.:. -1 649721 . ATTTCCCTAATCATTATGTGTAATTACAATTACATAGAAGAATATGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 649775 . CCCTAATCATTATGTGTAATTACAATTACATATATATATGTAATTGTAATTACACATAA . . . NS=1;AN=0 GT:PS ./.:. -1 649851 . CATATTCTATATATATAGACA . . . NS=1;AN=0 GT:PS ./.:. -1 649890 . GAGGGAGAGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 650001 . T . . NS=1;CGA_WINEND=652000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.90:1.24:.:0:.:0:0.999:218 -1 650016 . TATCCTCATTTTTTTCAGATTCTTGCTTAGAAGTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 650059 . GTGGACCTCCTCTGACATATTAAACATTGCAGTCCATTATAA . . . NS=1;AN=0 GT:PS ./.:. -1 650114 . AGGGATTTTTGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 650156 . CTACAGCAATATCTGACAAACAGTGGGCATGTAATG . . . NS=1;AN=0 GT:PS ./.:. -1 650214 . AAATTCAATCAAATCACATCACCTGTTTA . . . NS=1;AN=0 GT:PS ./.:. -1 650268 . ACTTAGAATAAAGAGAAATTCTTTTTATAC . . . NS=1;AN=0 GT:PS ./.:. -1 650347 . TTCGACTCCTCTCCTACTAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 650381 . CCACATTAGACCTTTCTTCAGTTTTTTATAT . . . NS=1;AN=0 GT:PS ./.:. -1 650472 . ATCGTCCCTCCACTTTCGCCA . . . NS=1;AN=0 GT:PS ./.:. -1 650510 . TTTTCATCTCAGCAGGAGGCCCATTCTCTTTGGCAATCCTCTGGCCTC . . . NS=1;AN=0 GT:PS ./.:. -1 650618 . 
GCACTTTTATATTTTAACAAATTATATT . . . NS=1;AN=0 GT:PS ./.:. -1 650694 . TCATCATCAACTTTTTCAACATC . . . NS=1;AN=0 GT:PS ./.:. -1 650724 . CCATTTAGAACTTAGATGTAGTCAATACAGG . . . NS=1;AN=0 GT:PS ./.:. -1 650777 . G . . . NS=1;AN=0 GT:PS ./.:. -1 650792 . CCTTTAAATTAGGATGGCAAAGATCGTATATAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 650857 . GCTCCCAATT . . . NS=1;AN=0 GT:PS ./.:. -1 650911 . TAGATTTTTTTAAAAAGAAAACTGGCCAGGTACTGTGGCTTA . . . NS=1;AN=0 GT:PS ./.:. -1 650968 . CATGTTGGGAGGCCAAGGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 651013 . GACCAGCCTGAGAATTTGGCAAAACTCTGTCTCTACAAAAAATACA . . . NS=1;AN=0 GT:PS ./.:. -1 651074 . TGGTGGCATGTGCCTGTAGTACCAGCTACTTGGGAGGCTGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 651132 . AGTCTGGGAGGTCAAGGCTGCAATGAGCTGTGATTGCACCACTGCACTC . . . NS=1;AN=0 GT:PS ./.:. -1 651208 . TCTCAAAAAAAAAAAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 651253 . AAGGATCATGTCAAAGGTAAGAAAAATTAGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 651344 . TGATTTAGCATTAGAAAATTACATTAC . . . NS=1;AN=0 GT:PS ./.:. -1 651394 . CAAATACTAAAGA . . . NS=1;AN=0 GT:PS ./.:. -1 651546 . TTTTGAAATAAAATGTATCTGAGTAGCAGCA . . . NS=1;AN=0 GT:PS ./.:. -1 651662 . AGAAGTCAATCAGGAAGAGGGGAGCAGTTAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 651753 . TATTTTATTTTTTCCCCAACGTT . . . NS=1;AN=0 GT:PS ./.:. -1 651853 . TGCTTATACATATAAACACAGCTGATAATTTATTAG . . . NS=1;AN=0 GT:PS ./.:. -1 651921 . CCAGTTTTTTATTTAAATTGAAGATTAGTATACATTTTAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 651972 . AAAT . . . NS=1;AN=0 GT:PS ./.:. -1 652001 . A . . NS=1;CGA_WINEND=654000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.93:1.34:.:0:.:0:0.999:218 -1 652011 . TCAGAAAAAAAAAGTCAAAAGCTAGAGTATAGAGAAATTAA . . . NS=1;AN=0 GT:PS ./.:. -1 652088 . ACAAGATTTAAATATTTTAATGGAAAATAGAACAGAACTAATTATT . . . NS=1;AN=0 GT:PS ./.:. -1 652156 . AAAATAAACAGATTATATGGAGGATTTTTAGAAGATAAGTAAATAAATTAATATAC . . . NS=1;AN=0 GT:PS ./.:. -1 652219 . AAACAAGGGAAATATA . . . NS=1;AN=0 GT:PS ./.:. -1 652264 . TTTGAAATAATGATAAAATAGA . . . NS=1;AN=0 GT:PS ./.:. -1 652309 . AAAGATGCATAAATATATAAATAAATGATAAAAAATGTTGCATACATATATGACTTTTTCAGAATCAAAAAATTTAAATTTCTGTAATAAAATTTAAATGTTTATAAATTTAAAAAACTAGAAGAAAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 652457 . CAAATAAATGACAAATATTTGAGGTGATGGATATGCT . . . NS=1;AN=0 GT:PS ./.:. -1 652568 . ATTATTTGTCTCAAAAACAAACAAAAAAAAAGATA . . . NS=1;AN=0 GT:PS ./.:. -1 652681 . C . . . NS=1;AN=0 GT:PS ./.:. -1 652709 . GTGGATTATATATTTAAAATAAT . . . NS=1;AN=0 GT:PS ./.:. -1 652746 . ATTGAGAAATATATAGCTGGAAAACTTATCCTTCAAAAATGAAGGA . . . NS=1;AN=0 GT:PS ./.:. -1 652803 . ATTTCCGGATTTTTTTTTAAAACTGA . . . NS=1;AN=0 GT:PS ./.:. -1 652981 . T . . . NS=1;AN=0 GT:PS ./.:. -1 652997 . GTAATTTTGGTTTGTAACTCTGCTTTTTATTT . . . NS=1;AN=0 GT:PS ./.:. -1 653039 . TTAAAAGGCAAATGCATAAAATGTAATTGTA . . . NS=1;AN=0 GT:PS ./.:. -1 653127 . AAAGAGTAGAGCTATATATATAGCAGTAGAATTTTGGTATGTGATTG . . . NS=1;AN=0 GT:PS ./.:. -1 653235 . TCATAGTAACCAAAAATGAAATATATATAGAATATAAACAAAAGGAAATG . . . NS=1;AN=0 GT:PS ./.:. -1 653291 . GAAACAAAATGTG . . . NS=1;AN=0 GT:PS ./.:. -1 653336 . G . . . NS=1;AN=0 GT:PS ./.:. -1 653359 . CA . . . NS=1;AN=0 GT:PS ./.:. -1 653373 . T . . . NS=1;AN=0 GT:PS ./.:. -1 653377 . A . . . NS=1;AN=0 GT:PS ./.:. -1 653400 . CATAAATCTGAAAACTCTATTTCACATAAAACTGGAGCTGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 653465 . G . . . NS=1;AN=0 GT:PS ./.:. -1 653491 . CTAATTTTTTTTAGAAAAAATTATAAAAAGTAAAAATATG . . . NS=1;AN=0 GT:PS ./.:. -1 653602 . TCCCTCTGTGCCCCCAAAAAACCCTATG . . . NS=1;AN=0 GT:PS ./.:. -1 653639 . ATTATTACCTAAAAAGTCTATTCTCAAATGCAGCAGAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 653746 . CTGGGTT . . . NS=1;AN=0 GT:PS ./.:. -1 653774 . 
AAAGGTCCCTCAGCAATATAACTCACAAACATGTTCAGAAGCAGTAAG . . . NS=1;AN=0 GT:PS ./.:. -1 653834 . TTATCTTTTGAAAGTCGA . . . NS=1;AN=0 GT:PS ./.:. -1 653862 . CTTTAATGTATGCATATAGCATAGCTAATGTACTATCGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 653910 . TTATTCAATGAATA . . . NS=1;AN=0 GT:PS ./.:. -1 653967 . T . . . NS=1;AN=0 GT:PS ./.:. -1 654001 . G . . NS=1;CGA_WINEND=656000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.71:1.15:.:0:.:0:0.999:218 -1 654004 . GAGTTTCCATTAAAAGACAATTTAGTAAAACTTTTCTTCCCCCAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 654060 . TGTAAGATGATTTAACAACATGTGTAAAAGTCATTGTGGGCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 654146 . TCACCCAGGCTGGAGTGCAGTGGCACAATCTCTGCTCACTGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 654220 . TGCCTCAGCTTTCTGAGTAGCAAGGACTACAGGTGCACACCATCACGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 654278 . TTTGTACTATTAGTACAGACGGAGTTTCACCATGTTGGCCAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 654348 . ATCCGCCCACCTCGGCCTCCCAAAGTGCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 654422 . CTTTGGTAA . . . NS=1;AN=0 GT:PS ./.:. -1 654583 . CAACCTT . . . NS=1;AN=0 GT:PS ./.:. -1 654605 . TGAAGAAATTATGAGTAGAATTTAAAAAGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 654642 . GCCTATTAATTAGATTTGTCTTTGTAGCA . . . NS=1;AN=0 GT:PS ./.:. -1 654678 . CTATAATAAATAATATTTTATGCCTATGAGTCCCCAACAAAGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 654738 . ATACAAACTGTAAAAGTCACTACTGGAT . . . NS=1;AN=0 GT:PS ./.:. -1 654788 . T . . . NS=1;AN=0 GT:PS ./.:. -1 654827 . ACATGTT . . . NS=1;AN=0 GT:PS ./.:. -1 654852 . TGGCAAATATTGATTGTCATCTTCGTGTTTGTCTATGTCCTAAGTGCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 654945 . TCACCCCCTTTTTTTTTTTTTTTTGAGATGCAGT . . . NS=1;AN=0 GT:PS ./.:. -1 654999 . TGGAGTGTAATGG . . . NS=1;AN=0 GT:PS ./.:. -1 655028 . TGCAACCTCCACCTCCAGGGTTC . . . NS=1;AN=0 GT:PS ./.:. -1 655056 . ATTCTCCTGCCTCAGCCTCCCAAGTAGCTGGGATT . . . NS=1;AN=0 GT:PS ./.:. -1 655101 . G . . . NS=1;AN=0 GT:PS ./.:. -1 655119 . T . . . NS=1;AN=0 GT:PS ./.:. -1 655201 . CTCGGCCTCCCACAGTGCTGAGATTACAGGCATGAGCCACCACGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 655336 . AAAGACAAACTCACAGGAAGATGGGATGTAGAATGATAAG . . . NS=1;AN=0 GT:PS ./.:. -1 655453 . A . . . NS=1;AN=0 GT:PS ./.:. -1 655470 . TTTCATTTTATAAAAATCTA . . . NS=1;AN=0 GT:PS ./.:. -1 655502 . CCAGTTGTTTTTTCTCTTCCTCGAGC . . . NS=1;AN=0 GT:PS ./.:. -1 655586 . A . . . NS=1;AN=0 GT:PS ./.:. -1 655639 . TACGCTTTTTTTCTGA . . . NS=1;AN=0 GT:PS ./.:. -1 655669 . CAAATCATCACAGTAGAGCACGATC . . . NS=1;AN=0 GT:PS ./.:. -1 655701 . CAATCTCAAAAACTCAGGAGT . . . NS=1;AN=0 GT:PS ./.:. -1 655784 . T . . . NS=1;AN=0 GT:PS ./.:. -1 655810 . GTGCGATCTCAGCTCACTGCAACCTCCATCTCCCAGTTCAA . . . NS=1;AN=0 GT:PS ./.:. -1 655887 . CTATAGGCATGCACCACCACTACA . . . NS=1;AN=0 GT:PS ./.:. -1 655928 . CTGGCTAATTTTTGTATTT . . . NS=1;AN=0 GT:PS ./.:. -1 655959 . GGGTTTTGCCATGATGGCCAGGCTGGTCTCGAACTCCTGACCTCAG . . . NS=1;AN=0 GT:PS ./.:. -1 656001 . T . . NS=1;CGA_WINEND=658000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.92:1.35:.:0:.:0:0.999:218 -1 656031 . AGACTTTTTTTTTTTTTTTTAATA . . . NS=1;AN=0 GT:PS ./.:. -1 656097 . TCCTGAGCTCAAGTGATCCTCCCACCTCAGCTTCCC . . . NS=1;AN=0 GT:PS ./.:. -1 656164 . C . . . NS=1;AN=0 GT:PS ./.:. -1 656182 . A . . . NS=1;AN=0 GT:PS ./.:. -1 656476 . AAAATACAAAAATTAGCCGGGTGTGGTGGCACACACCTGTA . . . NS=1;AN=0 GT:PS ./.:. -1 656526 . ACTTAGGAGGCTGAGGCAGGAGAATCGCTTG . . . NS=1;AN=0 GT:PS ./.:. -1 656563 . GGAGGCGGAGGTTGG . . . NS=1;AN=0 GT:PS ./.:. -1 656639 . TCTCAAAAAGAAAAAAAAAAGAGACAGAGAAAAGAAAGCCAA . . . NS=1;AN=0 GT:PS ./.:. -1 656773 . CAACTGCCTAAATCATGGGAAAGATTTA . . . NS=1;AN=0 GT:PS ./.:. -1 656918 . GAATCGAATAATACATTCAAAGTGCTGAAAGAAAA . . . 
NS=1;AN=0 GT:PS ./.:. -1 656999 . AAGGAAAAAGAAATAACGTG . . . NS=1;AN=0 GT:PS ./.:. -1 657032 . CAAAGCTGAGGGCATTCAGGACCACTAGGTCTACCTTAAAAAAATGCTT . . . NS=1;AN=0 GT:PS ./.:. -1 657092 . TCAAGTAAAAATGAA . . . NS=1;AN=0 GT:PS ./.:. -1 657120 . GGTAGCTCATGCCTGTAATCCCATTTTGGGAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 657240 . ATTAGCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 657319 . ACAAAAAACAAAACAAACAAAAAAAACAAA . . . NS=1;AN=0 GT:PS ./.:. -1 657420 . GGGCGAGGAGGTGTGAGCCCCTGCCAGGAAC . . . NS=1;AN=0 GT:PS ./.:. -1 657457 . CCCGGACCAAGTGCTCGGCCCCCAGGCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 657501 . TCCCGTGGCGTC . . . NS=1;AN=0 GT:PS ./.:. -1 657581 . CCTGACCCCTCCCTGCAGCCACACGAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 657632 . GAGTCTC G . . NS=1;AN=2;AC=2;CGA_FI=100287654|XM_002342011.2|LOC100287654|INTRON|UNKNOWN-INC;CGA_SDO=28 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:VQLOW:12:12,28:12,28:19,6:-28,-12,0:-19,-6,0:1:1,1:0 -1 657676 . AGCCGTTGCTCATGAGCGTCCACCAAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 657743 . ACCTGCCCTGAAAGCCCAGGGCCCGCAACCCCACACACTTTGGGGGTGGTGGAACCTGGT . . . NS=1;AN=0 GT:PS ./.:. -1 657819 . CCATGGAGGAGGAGCCCTGGGCCCCTCAGGGGAGTCCCTGCTGGA . . . NS=1;AN=0 GT:PS ./.:. -1 657913 . CCTGACACCCAGTTGCCTCTACCACTC . . . NS=1;AN=0 GT:PS ./.:. -1 658001 . A . . NS=1;CGA_WINEND=660000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.11:1.42:.:0:.:0:0.999:218 -1 658022 . AATCAGACAAGGACC . . . NS=1;AN=0 GT:PS ./.:. -1 658058 . GCGCTCATGATCTTCAGCAGGCGGCACCAGGCCCTGGCGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 658110 . TCACCCCAACCAGGATAACCGGACCGTCAGCCAGATGCTGAGCGAGC . . . NS=1;AN=0 GT:PS ./.:. -1 658256 . GAGGCCAAGCCCACAAGCCAGGGGCTAGC . . . NS=1;AN=0 GT:PS ./.:. -1 658308 . GAGCGGAGCATATCAGAGACGGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 658416 . CTATGGGGCAGA . . . NS=1;AN=0 GT:PS ./.:. -1 658445 . GGGAACCTGGCTCAGCCTGGCCCAAGCCTTCTCCCACAGCGGG . . . NS=1;AN=0 GT:PS ./.:. -1 658499 . GGACGGCAGGGAAATAGACCGTCAGGCACTACGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 658628 . GGGAGGTGACCCGTGGGCAGCCCTGCTGCCGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 658691 . CAGCGAGGTCAT . . . NS=1;AN=0 GT:PS ./.:. -1 658726 . TCATCCATGAGGAGGAGGGGGTGATGATGTCAT . . . NS=1;AN=0 GT:PS ./.:. -1 658781 . ACCGACACCGATCTCAAGTTCAAGGAGTGGGTGACCGACTGAGAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 658835 . CTCTGGGGAGGAGCCAGAGGGCAACAA . . . NS=1;AN=0 GT:PS ./.:. -1 658878 . GTATTTGCACCTGTCATTCCTTCCTCCTT . . . NS=1;AN=0 GT:PS ./.:. -1 658924 . GCTGGATCCTGAGCCCCCAGGGTCC . . . NS=1;AN=0 GT:PS ./.:. -1 659080 . AGTCTGGTCAACGCAGCAGAGCGGGCCCCCTACGGCCCCAACCCCTGGGGATGGGGGCCCAGGGACGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 659159 . G . . . NS=1;AN=0 GT:PS ./.:. -1 659199 . AGAGACCTGAAAGTGTGGGTGAC . . . NS=1;AN=0 GT:PS ./.:. -1 659263 . C . . . NS=1;AN=0 GT:PS ./.:. -1 659300 . CCCGGGGGGCAGAGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 659345 . C . . . NS=1;AN=0 GT:PS ./.:. -1 659354 . G . . . NS=1;AN=0 GT:PS ./.:. -1 659375 . CCCACACTGGAGGACCCCACCGCGCCCAAATGCA . . . NS=1;AN=0 GT:PS ./.:. -1 659419 . ATGCTCCAGCTGCAGTCCAAA . . . NS=1;AN=0 GT:PS ./.:. -1 659445 . ACACCCCCAAGTGTGCCATGTGTGATGGGGACAGCTTCCCCTTTGCCTGTACAGGT . . . NS=1;AN=0 GT:PS ./.:. -1 659534 . ACCGAGAAGGCGCTGTCCTCTTCACTGCACGTACCCTGGACCAGT . . . NS=1;AN=0 GT:PS ./.:. -1 659632 . GGCCACAGCCGCCCTCCA . . . NS=1;AN=0 GT:PS ./.:. -1 659697 . CCCTGACTCCCAGCCCTGTGGGGGTCCTGACCGCACCTCACCTGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 659774 . CCACTGCTTCTGCCCGAGAGTCACGTGAGGCTGAGAGTAGGGGCAGGGGCAGCAGTGGTGCCAGTTGGGGGGCGGTCCAGTGGGAGGAGCCTCAGCCTCGCAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 659887 . GGGACTGATGACTGCATGATCTTCTGGGCACCTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 659948 . 
ATGCTGGTGGTGGGTGCAGGGCCGCTGGGAGCTGCTGCATGGTTCCCAGAGGCTGGACTGAGGCAGGTGCCAACTGAAGCTGCTGGGGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 660001 . T . . NS=1;CGA_WINEND=662000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.73:1.26:.:0:.:0:0.999:218 -1 660074 . AGAAGATGTGTGCATAGCAGGTCCACTGCTGCTGCCCCTGCCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 660193 . TGCCCCAGAGTTGGGGCCTTG . . . NS=1;AN=0 GT:PS ./.:. -1 660220 . TGGTTGGAAGGGGACACCCCAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 660252 . ACACCTGGGGGTCTCCATAACTACCAC . . . NS=1;AN=0 GT:PS ./.:. -1 660315 . AGTACCCCCTGAGAACATGGACAGTATGTGGGGGTAGC . . . NS=1;AN=0 GT:PS ./.:. -1 660409 . TGGGACGGA . . . NS=1;AN=0 GT:PS ./.:. -1 660425 . CTTCCTCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGAGACCGAGTCTTGCTCTGTCGCCCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 660502 . GTGCGATCTTGGCTCACTGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 660581 . TACAGGTGGACGCTACCACGTCCGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 660615 . GTATTTTTAGTACAGACGGGGCTTCATCATCTTGGCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 660676 . C . . . NS=1;AN=0 GT:PS ./.:. -1 660697 . T . . . NS=1;AN=0 GT:PS ./.:. -1 660720 . CGTGAGCCACCACGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 660760 . TATACCCCCTACC . . . NS=1;AN=0 GT:PS ./.:. -1 660787 . GGGGGAAAGCTGGGCAGTTTCCCTCCTCCGAGCCCCTGTA . . . NS=1;AN=0 GT:PS ./.:. -1 660856 . TTTTCACTTTTCGGAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 660879 . CCTGCTGGGGCTACAAGATGGAGTGTGAAGAGGGCCTTGGGCCACAGGGAGGCGCC . . . NS=1;AN=0 GT:PS ./.:. -1 660971 . CCA . . . NS=1;AN=0 GT:PS ./.:. -1 660988 . AGGTGAGTATGGGGGTGGGGGCTCCTGCACT . . . NS=1;AN=0 GT:PS ./.:. -1 661058 . TGCACTCCCAACTTGAGCTATACTTTTTAAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 661106 . CTTTGCCCCCTTCCCCAGAACAGAACACGTTGATCGTGGGCGATAT . . . NS=1;AN=0 GT:PS ./.:. -1 661177 . TGACCGTCATTAAACCTGTTTAACACCAAATAATAAG . . . NS=1;AN=0 GT:PS ./.:. -1 661228 . AAATTCGGGCTTGGCGCAGAAACTCACTCCAA . . . NS=1;AN=0 GT:PS ./.:. -1 661269 . CTATCAAAATATT . . . NS=1;AN=0 GT:PS ./.:. -1 661295 . AAATATTCCAAAATTCAATATTTTGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 661338 . T . . . NS=1;AN=0 GT:PS ./.:. -1 661376 . AACGGGGCCTGGAATGGCCGACGTGAGGAATGAGCTGGGCCTAAAGAGGCCACTGGCAGGCAGGAGCTGGACCTGCCGAAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 661479 . GACTGGGGAGGCCGCAGTGAGGCGAGAGCTAGCTGGGCGTGGAGAGTCTGCTGTGAGGCCGAGGCCGAGGCCGGGCCCGTGCAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 661584 . GGGCCTGCAAAGGCCGACTGGAGATCAAGTTCTGCGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 661690 . C . . . NS=1;AN=0 GT:PS ./.:. -1 661702 . C . . . NS=1;AN=0 GT:PS ./.:. -1 661828 . GGCCGACTTGAGGACGACTTGGGCCTGCAGAGGCCGCC . . . NS=1;AN=0 GT:PS ./.:. -1 661902 . GGACGATTTGGGCCTGCAGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 661932 . AGGCCCAAGCTGGGCCTAGAGGAGC . . . NS=1;AN=0 GT:PS ./.:. -1 662001 . G . . NS=1;CGA_WINEND=664000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.96:1.11:.:0:.:0:0.999:218 -1 662026 . ACCGCGA . . . NS=1;AN=0 GT:PS ./.:. -1 662167 . GCCGGGAGGAAGAGCTGGGCCCGGAGGGGGCGCCGGGAGGCTGCAAGT . . . NS=1;AN=0 GT:PS ./.:. -1 662380 . ATCCCTTCTCCCAGTGACT . . . NS=1;AN=0 GT:PS ./.:. -1 662425 . GGGCGTCTGCAGACCCCGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 662497 . CAGGCCCAAGTCCCTGCCTACCTCCCAGCAGCCCGAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 662552 . TCACGGTGGCCTGTTGAGGCAGGGGATCACGCTGACCTCTGTCCGCG . . . NS=1;AN=0 GT:PS ./.:. -1 662609 . CGGTGTGAGGCAAGGGCTCACATTGACCTCTCTCAGCGTGGGAGGGGCCGGTGTGAGACAAGGGGCTCACGCTGACCTCTGTCCGCGTGGGAGGGGCCGGTGTGAGGCAAGGGCTCACACTG . . . NS=1;AN=0 GT:PS ./.:. -1 662738 . TCAGCGTGGGAGGAGCCAGTGTG . . . NS=1;AN=0 GT:PS ./.:. -1 662766 . GGGGCTCACGCCTCTGGGCAGGGTGCCAGAGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 662820 . 
CACCGTGAGGGAGGAGCTGGGCCGCACGCGGGCTGCTGGGAGGCAGGCAGGGACTTGGCCCCGG . . . NS=1;AN=0 GT:PS ./.:. -1 663048 . CGGTTGG . . . NS=1;AN=0 GT:PS ./.:. -1 663094 . GCTGGGAGGCAGGGCCGGGAGAGCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 663170 . GCGTGGAGGAGCCCACCGACCGGAGACCA . . . NS=1;AN=0 GT:PS ./.:. -1 663207 . CTGGAGATGCCATCGGAGGGCAGGAGCTCATC . . . NS=1;AN=0 GT:PS ./.:. -1 663258 . GCCTGACCTGGGCCTGGGGAGCTTGGCTTGAGGAAGCTGTGGGCCG . . . NS=1;AN=0 GT:PS ./.:. -1 663406 . GAGAACG . . . NS=1;AN=0 GT:PS ./.:. -1 663447 . TTGAGGAGGTTCTGGGCCCGGAGAGGCCGCCGGAAGG . . . NS=1;AN=0 GT:PS ./.:. -1 663506 . T C . . NS=1;AN=2;AC=2;CGA_FI=100133331|NR_028327.1|LOC100133331|UTR|UNKNOWN-INC&100287654|XM_002342011.2|LOC100287654|TSS-UPSTREAM|UNKNOWN-INC;CGA_SDO=19 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:VQLOW:18:18,38:18,38:19,15:-38,-18,0:-19,-15,0:4:2,2:2 -1 663641 . TCCGCTGT . . . NS=1;AN=0 GT:PS ./.:. -1 663748 . GGCCGCCGGG . . . NS=1;AN=0 GT:PS ./.:. -1 663792 . GGGAGGCAGGAGGAGCTGGGCCTGGAGAGGCTGCCGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 663840 . TTCGCCTGAGGATGCCACAGTGAGACACCATCTGGGTCTGGA . . . NS=1;AN=0 GT:PS ./.:. -1 663932 . AGTTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 664001 . T . . NS=1;CGA_WINEND=666000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.81:1.64:.:0:.:0:0.999:218 -1 664007 . TGGACTCACAGTCATGAGT . . . NS=1;AN=0 GT:PS ./.:. -1 664045 . GTGAGGGAGGAGCTGTGCCTGTTGAGGCTGCTGGCAGGCAGGCAGAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 664135 . AAAAAGCCCCTGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 664268 . CCGCCATGAGC . . . NS=1;AN=0 GT:PS ./.:. -1 664301 . AGGCTGTTGTGAGGCAGCAGTTGTGCCTGTAGACCCA . . . NS=1;AN=0 GT:PS ./.:. -1 664375 . AGGCAGAGGTTGGGCCTGTAGACGCTGACAGGAGGCAGGAGCTGGGCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 664458 . TAGGCCACCAGGAGGCAGCAGTTGGGACTAGAGAGTCTGACTTGAGTA . . . NS=1;AN=0 GT:PS ./.:. -1 664523 . TGACGTC . . . NS=1;AN=0 GT:PS ./.:. -1 664574 . CTGGATG . . . NS=1;AN=0 GT:PS ./.:. -1 664594 . GTGAGGCAAGACCTGGGCCTGTCTAGGCTGCTGGGAGACAGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 664674 . TTGGGCCTGGAAAGGCCCTTGTGAAGCATGAGCTTGGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 664737 . GCTAGGTGTGTAGAAGCTGCTGAAAGGTTGGGAGCTTGGCTTGGGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 664798 . AGATGCTGGGCGTGAAGAATCTGCTGTGAGGCAGACTTTG . . . NS=1;AN=0 GT:PS ./.:. -1 664927 . TGAGCTGGGCCTGGTGAGGTCGACT . . . NS=1;AN=0 GT:PS ./.:. -1 664966 . GCCTGGAGAGAAGGCTGGGAGGCAGGAGCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 665046 . GTGAGGCAGTAGCCTCATCTGCGGAGGCTGCCGTGACATAG . . . NS=1;AN=0 GT:PS ./.:. -1 665107 . ATTGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 665158 . C . . . NS=1;AN=0 GT:PS ./.:. -1 665245 . TGTAATATATAATAAAATAATTATAGAACTCACCAT . . . NS=1;AN=0 GT:PS ./.:. -1 665288 . AATTAGTGGGCGTGTTAAGCTTGTTTTCCTGCAACTGAATGGTCC . . . NS=1;AN=0 GT:PS ./.:. -1 665354 . AGTGACAGATCAATAGGTATTAGATTCTCAT . . . NS=1;AN=0 GT:PS ./.:. -1 665391 . AGCGCAACCTCGAT . . . NS=1;AN=0 GT:PS ./.:. -1 665414 . GCACGGTTCACAACAGGGTGCGTTCTCCTATGAGCATCTAATGCTGCTGCTCATC . . . NS=1;AN=0 GT:PS ./.:. -1 665512 . GCTCTAAATACAGACGAAGCTTCCCTCACTCCCTCACTCGACACCGCTCACCTCCTGCTGTGTGGCTCCTTGCGGC . . . NS=1;AN=0 GT:PS ./.:. -1 665610 . C . . . NS=1;AN=0 GT:PS ./.:. -1 665760 . CTACTGGAAATCACCAGCACCCCATTTCCCACTGGCAAAGAGCTCAGCACTGCCCCCTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 665843 . CCATCTGTGTGGGTCTACCTCCTGGGACCCTTCCTAACATATTAGTCA . . . NS=1;AN=0 GT:PS ./.:. -1 665959 . ATTATAA . . . NS=1;AN=0 GT:PS ./.:. -1 665998 . ACAGAGT . . . NS=1;AN=0 GT:PS ./.:. -1 666001 . G . . NS=1;CGA_WINEND=668000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.19:1.52:.:0:.:0:0.999:218 -1 666036 . CAGGCCAG . . . 
NS=1;AN=0 GT:PS ./.:. -1 666057 . TGCCTGAAATTCCAGCCATTACAGAAGCTAATGCAGGA . . . NS=1;AN=0 GT:PS ./.:. -1 666122 . CCGGTCTGGACGACA . . . NS=1;AN=0 GT:PS ./.:. -1 666179 . TGGGGGTGGTGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 666212 . ACTCAGAATGCTGAAGTTTGAGCCTGGGAGGTCAAGGCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 666268 . TGCCACTACAGTCCAGCCGGATGA . . . NS=1;AN=0 GT:PS ./.:. -1 666402 . T . . . NS=1;AN=0 GT:PS ./.:. -1 666441 . ATGGCTTACGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 666507 . ATGAGGGGTGGTTA . . . NS=1;AN=0 GT:PS ./.:. -1 666537 . TCCAGGGTAAAGCCTGTCAATTTTTTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 666571 . GGAGACAGGGTCTCACCATACTGCCATACTGCCTCCTCCAACTCTTGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 666709 . GGAGAATGTCCATTCACCATGACTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 666751 . GGGGAGACAATTCAATCTAAGCAAAAGGTCATCTGTACACACACAGTAAAAATCTG . . . NS=1;AN=0 GT:PS ./.:. -1 666885 . GGTAAAAGGTCAGTTGATGTTAGCTGCTACTTTTTTGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 667076 . TAATTCATGTATTTTTCTGTAGGGATGGTGACTCCCCCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 667225 . TTTATCAATATTATTCTTATTCCACTCAATTAAAAATTAT . . . NS=1;AN=0 GT:PS ./.:. -1 667289 . TGTATCCTACAGCGTAATTGTAAAAACATACACAGTCGTCATCC . . . NS=1;AN=0 GT:PS ./.:. -1 667374 . TACCAAAATCCATGCTTACTCACGTTTCGCT . . . NS=1;AN=0 GT:PS ./.:. -1 667470 . TGTAGTCTCAGCCACGTGGGAGGTTGAGGTGGGAGGATCGCTT . . . NS=1;AN=0 GT:PS ./.:. -1 667531 . C . . . NS=1;AN=0 GT:PS ./.:. -1 667572 . ACAACAGAGGGAGACCCTGTCTCAGAAAAAAAAACAAAATAAAACAGG . . . NS=1;AN=0 GT:PS ./.:. -1 667628 . TGTAATGAGGTCTGCTGGGCAAAATTCCATATAAGCAATGTAT . . . NS=1;AN=0 GT:PS ./.:. -1 667679 . AAAGCAAATCGTGATAAATTAGTACGATTGACTT . . . NS=1;AN=0 GT:PS ./.:. -1 667736 . TAAGGAAAATGCAGAACACAAAGACAGAGAGTAAAAAGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 668001 . T . . NS=1;CGA_WINEND=670000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.32:1.91:.:0:.:0:0.999:218 -1 668020 . GCCTTTTGTTTGTTTGTAAGGAATGTACATA . . . NS=1;AN=0 GT:PS ./.:. -1 668161 . TTTCTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 668321 . ATCGTCATGTAT . . . NS=1;AN=0 GT:PS ./.:. -1 668473 . CTGCGGAGGCTGCCGT . . . NS=1;AN=0 GT:PS ./.:. -1 668510 . TAGGCCATTGTGAG . . . NS=1;AN=0 GT:PS ./.:. -1 668682 . C . . . NS=1;AN=0 GT:PS ./.:. -1 668731 . CTGGATGGTCCCACCTGAGCGTGATGGGAGAAAGTGACA . . . NS=1;AN=0 GT:PS ./.:. -1 668784 . AGATTCTCATAAGGACAGCGCAACCTAGAT . . . NS=1;AN=0 GT:PS ./.:. -1 668822 . TGCACGGTTCACAACAGGGTGCGTTCTCCTATGAGAATCTAACGCTGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 668881 . GAAGGTGGAGCTCAGGCGGG . . . NS=1;AN=0 GT:PS ./.:. -1 668932 . AGACGAAGCTTCCCTCACTCCCTCACTCGACA . . . NS=1;AN=0 GT:PS ./.:. -1 668986 . C . . . NS=1;AN=0 GT:PS ./.:. -1 668989 . C . . . NS=1;AN=0 GT:PS ./.:. -1 669038 . AAAGCGA . . . NS=1;AN=0 GT:PS ./.:. -1 669159 . GTGAGGTCACCTACTGGAAATCACCAGCATCCC . . . NS=1;AN=0 GT:PS ./.:. -1 669314 . GCATAAACCACTCAAAAGTTTAAAGTGGTAAAATTTAATACAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 669367 . ATTATAA . . . NS=1;AN=0 GT:PS ./.:. -1 669462 . TCAGGCCTGAAATTCCAGCAATT . . . NS=1;AN=0 GT:PS ./.:. -1 669587 . TGGGGGT . . . NS=1;AN=0 GT:PS ./.:. -1 669627 . ATGCGGAAGTTTGAGCCTGGGAGGTCAAGGCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 669676 . TGCCACTAC . . . NS=1;AN=0 GT:PS ./.:. -1 669855 . TACTAAATTATAATACCCT . . . NS=1;AN=0 GT:PS ./.:. -1 669901 . CACCTTC . . . NS=1;AN=0 GT:PS ./.:. -1 669968 . TTTAAAATAATGGAGACAGGGTCTCACCATACTGCCATACTGCCTCCTCCAACTCTTGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 670001 . G . . NS=1;CGA_WINEND=672000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.21:1.90:.:0:.:0:0.999:218 -1 670117 . GGAGAATGTCCATTCACCATGACTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 670170 . 
TCAATCTAAGCAAAAGGTCATCTGTACACACACAGTAAAAATCTG . . . NS=1;AN=0 GT:PS ./.:. -1 670293 . GGTAAAAGGTCAGTTGATGTTAGCTGCTACTTTTTTGTTGT . . . NS=1;AN=0 GT:PS ./.:. -1 670484 . TAATTCATGTATTTTTCTGT . . . NS=1;AN=0 GT:PS ./.:. -1 670514 . ACTCCCCCTTTGTTTCCAAGGCCTATCGCAAACTCTTGGCCTCAAGC . . . NS=1;AN=0 GT:PS ./.:. -1 670567 . CCTGCCTCAGCCTCCCAAAGTGTTGCGAT . . . NS=1;AN=0 GT:PS ./.:. -1 670633 . TTTATCAATATTATTCTTATTCCACTCAATTAAAAATTATTATTTTCA . . . NS=1;AN=0 GT:PS ./.:. -1 670701 . TCCCACAGCATAATTGTAAAAACATATAGTCGTCGTCCCTCAGT . . . NS=1;AN=0 GT:PS ./.:. -1 670780 . TACCAAAATCCATGCTTACTCACGTTTCGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 670819 . TCTGGAATCCACGTATACGAA . . . NS=1;AN=0 GT:PS ./.:. -1 670850 . G . . . NS=1;AN=0 GT:PS ./.:. -1 670876 . TGTAGTCTCAGCCACGTGGGAGGTTGAGGTGGGAGGATCGCTT . . . NS=1;AN=0 GT:PS ./.:. -1 670934 . A . . . NS=1;AN=0 GT:PS ./.:. -1 670937 . C . . . NS=1;AN=0 GT:PS ./.:. -1 670978 . ACAACAGAGGGAGACCCTGTCTCAGAAAAAAAAACAAAATAAAACAGGTTAGAAATTGTAATGAGGTCTGCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 671142 . TAAGGAAAATGCAGAACACAAAGACAGAGAGTAAAAAGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 671310 . GAAGCAAAATACTGGTAGCGACA . . . NS=1;AN=0 GT:PS ./.:. -1 671426 . GCCTTTTGTTTGTTTGTAAGGAATGTACATA . . . NS=1;AN=0 GT:PS ./.:. -1 671563 . GAAATTTTTTTTTTAGAAAATTGAACAAGTGCTCCCTGT . . . NS=1;AN=0 GT:PS ./.:. -1 671737 . ATACCTACAGTCCCAGCTACCTGAACTTACTGAGAAAGTTCAGAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 671838 . GAGCTGTGCCTGTGGAGGCTGTTGTGAGGCAGTAGGCTCATCTGCGGA . . . NS=1;AN=0 GT:PS ./.:. -1 671922 . ATTGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 672001 . T . . NS=1;CGA_WINEND=674000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.67:1.98:.:0:.:0:0.999:218 -1 672060 . TGTAATATATAATAAAATAATCATA . . . NS=1;AN=0 GT:PS ./.:. -1 672206 . AGCGCAAC . . . NS=1;AN=0 GT:PS ./.:. -1 672246 . GTGCGTTCTCCTATGAGAATCTAACGC . . . NS=1;AN=0 GT:PS ./.:. -1 672327 . GCTGTAAATACAGACGAAGCTTCCCTCACTCCCTCACTCGA . . . NS=1;AN=0 GT:PS ./.:. -1 672392 . C . . . NS=1;AN=0 GT:PS ./.:. -1 672398 . G . . . NS=1;AN=0 GT:PS ./.:. -1 672660 . ATCTGTGTGGGTCTACCTCCTGGGACCCTTCCTAACATATTAGTCA . . . NS=1;AN=0 GT:PS ./.:. -1 672774 . ATTATAA . . . NS=1;AN=0 GT:PS ./.:. -1 672820 . AAACAGAACTCTAGAGAATATGGGACTAGCCCAGGCCAGGCATGG . . . NS=1;AN=0 GT:PS ./.:. -1 672908 . GAGGATT . . . NS=1;AN=0 GT:PS ./.:. -1 672945 . GACGACACAGTGAGACCCTGTCTCTATCCAA . . . NS=1;AN=0 GT:PS ./.:. -1 673029 . TCGGAATGCTGAAGTTTGAGCCTGGGAGGTCAAGGCTGCAGTGAG . . . NS=1;AN=0 GT:PS ./.:. -1 673114 . AGATCCT . . . NS=1;AN=0 GT:PS ./.:. -1 673214 . TAATATGAGTTCTTTTGTCTATGTAA . . . NS=1;AN=0 GT:PS ./.:. -1 673324 . GAGGGGT . . . NS=1;AN=0 GT:PS ./.:. -1 673397 . CTCACCATACTGCCATACTGCCTCCTCCAACTCTTGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 673709 . GGTCAGT . . . NS=1;AN=0 GT:PS ./.:. -1 673893 . TAATTCA . . . NS=1;AN=0 GT:PS ./.:. -1 674001 . T . . NS=1;CGA_WINEND=676000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.29:1.83:.:0:.:0:0.999:218 -1 674048 . AATATTATTCTTATTCCACTCAATTAAAAATTAT . . . NS=1;AN=0 GT:PS ./.:. -1 674106 . TGTATCCTACAGCGTAATTGTAAAAACATATACA . . . NS=1;AN=0 GT:PS ./.:. -1 674244 . ATACGAAAATTCCAAATATTA . . . NS=1;AN=0 GT:PS ./.:. -1 674287 . TGTAGTCTCAGCCACGTGGGAGGTTGAGGTGGGAGGATCGCTT . . . NS=1;AN=0 GT:PS ./.:. -1 674410 . TCAGAAAAAAAAAAATAAATAAATAAAACAG . . . NS=1;AN=0 GT:PS ./.:. -1 674558 . TAAGGAAAATGCAGAACACAAAGACAGAGAGTAAAAAGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 674742 . AGCGACA . . . NS=1;AN=0 GT:PS ./.:. -1 674842 . GCCTTTTGTTTGTTTGTAAGGAATGTATATA . . . NS=1;AN=0 GT:PS ./.:. -1 674979 . GAAATTTTTTTTTTAGAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 675143 . 
ATCGTCATGTAT . . . NS=1;AN=0 GT:PS ./.:. -1 675253 . G . . . NS=1;AN=0 GT:PS ./.:. -1 675313 . CGTAGGGTATGGGCCTAAATAGGCCATTGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 675527 . GGCGTGTTAAGCTTGTTTTCCTGCA . . . NS=1;AN=0 GT:PS ./.:. -1 675622 . AGCGCAACCTAGATCCCTCACATGCACGGT . . . NS=1;AN=0 GT:PS ./.:. -1 675662 . GTGCGTTCTCCTATGAGAATC . . . NS=1;AN=0 GT:PS ./.:. -1 675743 . GCTGTAAATACAGACGAAGCTTCCCTCACTCCCTCACTCGACACCGCTC . . . NS=1;AN=0 GT:PS ./.:. -1 675813 . T . . . NS=1;AN=0 GT:PS ./.:. -1 675825 . GCTCAGGGGTTGGGGACCCCTGCTCAAGTGCATCCAAAACGA . . . NS=1;AN=0 GT:PS ./.:. -1 675943 . TTCGTCA . . . NS=1;AN=0 GT:PS ./.:. -1 676001 . T . . NS=1;CGA_WINEND=678000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.96:1.37:.:0:.:0:0.999:218 -1 676035 . CAGCACTGCCCCCTGGGAAACCAAACCTATGCCCAAATCCCATCTGTG . . . NS=1;AN=0 GT:PS ./.:. -1 676147 . GAGACAATCGATTTAGCCCAGGAGTTTGAGAT . . . NS=1;AN=0 GT:PS ./.:. -1 676215 . GGAAGAGGTGGGAGGATCACTTGAGCCCAGGAATTTG . . . NS=1;AN=0 GT:PS ./.:. -1 676278 . CCCCATCTGGCCAACATGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 676570 . GATAGAT . . . NS=1;AN=0 GT:PS ./.:. -1 676600 . AATATTTGATCTTGGTCCCAGGTTTC . . . NS=1;AN=0 GT:PS ./.:. -1 676748 . ATGCAACCACATGGTAAGAGGCTTGGAACTTTCAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 676979 . GCCCCTCCGAACTTAACTTGCCCTGGGTATCTTTCTTTTTTTTGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 677049 . GCTGGAGTGCAGTGGCACAATCTCAGCTTACTGTAACCTAA . . . NS=1;AN=0 GT:PS ./.:. -1 677095 . CCCAGTCCCCAGCTCAAGGTATCCTCTCATCTCAGCTTCCCTAGTAG . . . NS=1;AN=0 GT:PS ./.:. -1 677167 . CACCAGTTATTATTATTATTTTTTAATTTTTTATA . . . NS=1;AN=0 GT:PS ./.:. -1 677220 . GTTGCCCAGGCTGGTCTCAAACTCCTGAGTTTAAG . . . NS=1;AN=0 GT:PS ./.:. -1 677330 . TCATTGACTGTTTCTGAGATGTATCC . . . NS=1;AN=0 GT:PS ./.:. -1 677449 . ACTTGAG . . . NS=1;AN=0 GT:PS ./.:. -1 677473 . AGCCTGGCCAACACAACAAGACCCCATCTATACAAAAAATAAA . . . NS=1;AN=0 GT:PS ./.:. -1 677648 . CAACAAAATAAGACCCTCTCTCTCAGAAAAAAAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 677746 . TTACGGGAACCCCCGATT . . . NS=1;AN=0 GT:PS ./.:. -1 677838 . GGACTGAGCCCCTAACTTGTGGGGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 678001 . T . . NS=1;CGA_WINEND=680000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.93:1.59:.:0:.:0:0.999:218 -1 678075 . CCAGACC . . . NS=1;AN=0 GT:PS ./.:. -1 678184 . CCTATATGTGATTCTGTGAGAATTAACGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 678309 . A . . . NS=1;AN=0 GT:PS ./.:. -1 678435 . GGTTTCCCTTCCCGGGCAGTTTGCGCTATCCCATCCCGGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 678486 . CCCTCCACCTCCCCCTTCCCTCCCCACTCTCATACA . . . NS=1;AN=0 GT:PS ./.:. -1 678671 . T . . . NS=1;AN=0 GT:PS ./.:. -1 678721 . GTAGATCCAGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 678804 . ATGTCATGTGAAAATTAAACAG . . . NS=1;AN=0 GT:PS ./.:. -1 678871 . CATGAAGGGTTAATTTGTATTTTATTTTATTT . . . NS=1;AN=0 GT:PS ./.:. -1 678924 . CACCTAGGCTGGAGTGCAGTGGTGCAATCAGGCTCACTGCAGCCTTGACC . . . NS=1;AN=0 GT:PS ./.:. -1 678984 . AAGTAATCTCACTTAATTTTTATTT . . . NS=1;AN=0 GT:PS ./.:. -1 679066 . GGAGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 679117 . GGCAATATATTAAGACCCTGCCTCTACC . . . NS=1;AN=0 GT:PS ./.:. -1 679464 . CAACCTA . . . NS=1;AN=0 GT:PS ./.:. -1 679509 . GTCGTGC . . . NS=1;AN=0 GT:PS ./.:. -1 679537 . ATACAGATGAAGTTTCCCTTCACTCGCCTGCTGCTCACCTCCAGCTCTG . . . NS=1;AN=0 GT:PS ./.:. -1 679601 . AGACCGCTGCTCAAGTGCATTTGAAAGGAACCATCCCACGCC . . . NS=1;AN=0 GT:PS ./.:. -1 679655 . CATCTTTACTGCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 679689 . CCCCAAGCTCGCAGGACA . . . NS=1;AN=0 GT:PS ./.:. -1 679771 . CCTGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 679854 . TCTGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 679932 . ACTCAAAAGTTTAAACTAG . . . NS=1;AN=0 GT:PS ./.:. 
-1 679982 . A . . . NS=1;AN=0 GT:PS ./.:. -1 679988 . C . . . NS=1;AN=0 GT:PS ./.:. -1 679993 . A . . . NS=1;AN=0 GT:PS ./.:. -1 680001 . G . . NS=1;CGA_WINEND=682000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.93:1.46:.:0:.:0:0.999:218 -1 680014 . T . . . NS=1;AN=0 GT:PS ./.:. -1 680093 . TGTGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 680144 . A . . . NS=1;AN=0 GT:PS ./.:. -1 680146 . T . . . NS=1;AN=0 GT:PS ./.:. -1 680149 . CTT . . . NS=1;AN=0 GT:PS ./.:. -1 680154 . GCC . . . NS=1;AN=0 GT:PS ./.:. -1 680177 . GAGCAGAGGTTGTGCCACTGTACTCCAGCCTGGGTGACAGTGTG . . . NS=1;AN=0 GT:PS ./.:. -1 680243 . AACGTATATATATATATATATGTAAATTTAAT . . . NS=1;AN=0 GT:PS ./.:. -1 680299 . GCAAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 680331 . TTGGGAGGCCAAGGCAGACAGATCACCTGAGGTCAGGA . . . NS=1;AN=0 GT:PS ./.:. -1 680389 . GCACAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 680407 . TCTACTAAAAATACAAAATTAGCTGGGCATGGTGGCACATGCCTGTAATCCCAACTACTCGGGAGGCTGA . . . NS=1;AN=0 GT:PS ./.:. -1 680496 . CCCAGAAGGTGGAGGTTGCGCTGAGCCGAGATAGCACCATTGCACTCCAGCCTGGGCAACAAGAGT . . . NS=1;AN=0 GT:PS ./.:. -1 680571 . TCTCAAAAAAAAAAAAAAAGGTATTAATTTT . . . NS=1;AN=0 GT:PS ./.:. -1 680746 . C . . . NS=1;AN=0 GT:PS ./.:. -1 680792 . TACCAAAAAAAAGAGACATTAGCCAGGTGTGGTGGTGGTGCACA . . . NS=1;AN=0 GT:PS ./.:. -1 680903 . C . . . NS=1;AN=0 GT:PS ./.:. -1 680938 . GGGCGAC . . . NS=1;AN=0 GT:PS ./.:. -1 681053 . CCCACTTTCCTGTATCTTTAACCTATCCC . . . NS=1;AN=0 GT:PS ./.:. -1 681331 . GCCTTGCAAGGCAGCCTCACTGCTTGCCCCTCTCCATTTCAT . . . NS=1;AN=0 GT:PS ./.:. -1 681409 . GAACGCACACTCTTTCTCCTCTGGGAGTCTCTGAAGTGGGTAA . . . NS=1;AN=0 GT:PS ./.:. -1 681547 . TCATAGA . . . NS=1;AN=0 GT:PS ./.:. -1 681573 . GATTCTCAGGAGCATGGCAGGTGAAGTGCTCCTCCCATGAA . . . NS=1;AN=0 GT:PS ./.:. -1 681624 . TTAGGGAGTGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 681695 . TGGTGGCCAAAGTAATAACCCCCACCGCCA . . . NS=1;AN=0 GT:PS ./.:. -1 681796 . AATGTTGCTCATCAGCTGA . . . NS=1;AN=0 GT:PS ./.:. -1 681871 . CTTTAAAACTGGAAGAGGGAGGCAGAAGGTTAAGAACCAGAGACGGT . . . NS=1;AN=0 GT:PS ./.:. -1 682001 . G . . NS=1;CGA_WINEND=684000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.11:1.29:.:0:.:0:0.999:218 -1 682044 . A . . . NS=1;AN=0 GT:PS ./.:. -1 682225 . AGGACCCTCTCCATCCTGA . . . NS=1;AN=0 GT:PS ./.:. -1 682264 . CTCTGCAAACGAGTAAACATCACCCTCCAAC . . . NS=1;AN=0 GT:PS ./.:. -1 682389 . AAGCAAT . . . NS=1;AN=0 GT:PS ./.:. -1 682436 . TTCCTTTGGTTCTCAGTAGGCAGGGTAGGGGGCCAGGCAT . . . NS=1;AN=0 GT:PS ./.:. -1 682554 . CCTGGACAACATAGCAAGACCTGGGTGGCATACACC . . . NS=1;AN=0 GT:PS ./.:. -1 682647 . GTTTCAGGCTGCAGTGAGCCATGAT . . . NS=1;AN=0 GT:PS ./.:. -1 682678 . ACTGCACTTCAGCCTGGGTGAC . . . NS=1;AN=0 GT:PS ./.:. -1 682724 . TTAGAAAAAAAAAAGAGGGAGAGAGACTATACACAGGCACCACCACATTT . . . NS=1;AN=0 GT:PS ./.:. -1 682815 . GTTGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 682870 . TTGTAAT . . . NS=1;AN=0 GT:PS ./.:. -1 682964 . TATCAAAAATACAAAAATTAGCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 683085 . A . . . NS=1;AN=0 GT:PS ./.:. -1 683120 . CAAAAACAACAACAACAATAACAAAAACAAAAACAACAACAACAAAAAAAACTCC . . . NS=1;AN=0 GT:PS ./.:. -1 683207 . CAAAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 683276 . GGAAGAAAAAAAAAACAGA . . . NS=1;AN=0 GT:PS ./.:. -1 683371 . ATCACTTGAGGCCAGGAGTATGA . . . NS=1;AN=0 GT:PS ./.:. -1 683411 . TGGTAAAATCCCACCACTACAGAAAAATCTAAAAATTAG . . . NS=1;AN=0 GT:PS ./.:. -1 683530 . C . . . NS=1;AN=0 GT:PS ./.:. -1 683593 . CTAGAAAAAAAAAATGTC . . . NS=1;AN=0 GT:PS ./.:. -1 683638 . ACATACA . . . NS=1;AN=0 GT:PS ./.:. -1 683711 . TGGATTTTTAAAAAATCAAGACGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 683773 . GGCTCAAGCCATCCTCCTAG . . . 
NS=1;AN=0 GT:PS ./.:. -1 683805 . A . . . NS=1;AN=0 GT:PS ./.:. -1 683882 . AAAGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 683920 . TCACTCCTAGGCATATATCCCAGAAAAATAAAAATATATG . . . NS=1;AN=0 GT:PS ./.:. -1 684001 . T . . NS=1;CGA_WINEND=686000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.96:1.33:.:0:.:0:0.999:218 -1 684025 . ACATGGAAACAACCCAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 684155 . ACACTGTGCTAAGAGGGAAGAAAAGCCACAAAAGATCACATATTGTAC . . . NS=1;AN=0 GT:PS ./.:. -1 684302 . AGTGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 684344 . TTCTAAAAGTGACTGTGGTGATCGATG . . . NS=1;AN=0 GT:PS ./.:. -1 684379 . TGTGAATATTCTAAAACCTACTGAA . . . NS=1;AN=0 GT:PS ./.:. -1 684433 . T . . . NS=1;AN=0 GT:PS ./.:. -1 684436 . G . . . NS=1;AN=0 GT:PS ./.:. -1 684460 . TTTAAAATAATAATAATAGGGGGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 684525 . GGGAGGCTGAGGCAGGAGGATCACTTGAGGTCAGGAGTTTTGAG . . . NS=1;AN=0 GT:PS ./.:. -1 684659 . AGACTGAAGTGAGAGAACCACTTGAGCCCAGGAGTTTGAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 684782 . ATGCAAGTTTTTATCACTTTGTGAGTGTAGCCAAGTTGGAGGAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 684859 . ACGGTGAGTGGCTGGTTAGGCTCAGTTGC . . . NS=1;AN=0 GT:PS ./.:. -1 685008 . G . . . NS=1;AN=0 GT:PS ./.:. -1 685046 . ATAGCAGCTGTTTATTAAAGACTACAAAAAAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 685368 . T . . . NS=1;AN=0 GT:PS ./.:. -1 685380 . G . . . NS=1;AN=0 GT:PS ./.:. -1 685472 . TTACCCAATAT . . . NS=1;AN=0 GT:PS ./.:. -1 685523 . TTTCTCA . . . NS=1;AN=0 GT:PS ./.:. -1 685707 . TTGAAAAATAA . . . NS=1;AN=0 GT:PS ./.:. -1 685761 . C . . . NS=1;AN=0 GT:PS ./.:. -1 685796 . ACAAGGCCTACATAGC . . . NS=1;AN=0 GT:PS ./.:. -1 685838 . TTTCATTTGTATTTGTATTTTGAGACAGGGTCCC . . . NS=1;AN=0 GT:PS ./.:. -1 686001 . A . . NS=1;CGA_WINEND=688000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.98:1.28:.:0:.:0:0.999:218 -1 686003 . GCATTTTTTTCATTTTTGTAGAGAGAGAAGAC . . . NS=1;AN=0 GT:PS ./.:. -1 686055 . GCCTCAAACTCCTAGAATCAAGAGATCTGCCCATCTC . . . NS=1;AN=0 GT:PS ./.:. -1 686144 . TTATTTTATTTTATTAAATTTATTTTTTTTATTTTTGTAGAGAGGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 686214 . CTCTCAAACTCATGGCCTTAAAACATACTCCCATCTCTGCCTCTCAA . . . NS=1;AN=0 GT:PS ./.:. -1 686342 . TTGGGAAAAGCAGTAGTGTTTTTTAAAATTACTTA . . . NS=1;AN=0 GT:PS ./.:. -1 686399 . CAACCTTGACCACTGCCTTCTCTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 686453 . TACTGACTGACTTCAACATG . . . NS=1;AN=0 GT:PS ./.:. -1 686510 . CCCCGTC . . . NS=1;AN=0 GT:PS ./.:. -1 686535 . CACCGGGATGTTGCCACAGCTTGGCTCCC . . . NS=1;AN=0 GT:PS ./.:. -1 686663 . CCTCGGTGGTTCCCATTTTAGTCAGAGTAAAAGCCAAAGCCCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 686767 . A T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs79616629;CGA_RPT=L2a|L2|37.7;CGA_SDO=20 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:56:56,56:56,56:20,33:-56,0,-56:-20,0,-33:25:3,22:22 -1 686937 . ATTCACTTATGAGGCCAACCCTGACCACTC . . . NS=1;AN=0 GT:PS ./.:. -1 686983 . CTGTCCCCATTCCCACCATGCTCATTTCTTTCTTTCTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 687092 . ACCTCCCAGGCTTAAACAATCCTCCCGCCTCAGCCACCCTAGGAAC . . . NS=1;AN=0 GT:PS ./.:. -1 687166 . TGGCTTTTTTTTTTTTTTTTTGAGATGGA . . . NS=1;AN=0 GT:PS ./.:. -1 687291 . G . . . NS=1;AN=0 GT:PS ./.:. -1 687378 . GAACTTCTGACCTCGTGATCCACCCTCCT . . . NS=1;AN=0 GT:PS ./.:. -1 687446 . GCGCCTGGCCTTTAAAAAAATTTTTTTTTAGACATGAGGTCTCATTATGTTGTCCA . . . NS=1;AN=0 GT:PS ./.:. -1 687953 . TTGTGAT . . . NS=1;AN=0 GT:PS ./.:. -1 688001 . G . . NS=1;CGA_WINEND=690000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.02:1.34:.:0:.:0:0.999:218 -1 688033 . AGAGTTAGAGATCAGCCTGGGGAAAAAAAGGAAGATCCTGCCTTTACAAA . . . NS=1;AN=0 GT:PS ./.:. -1 688514 . A . . . 
NS=1;AN=0 GT:PS ./.:. -1 688525 . A . . . NS=1;AN=0 GT:PS ./.:. -1 688601 . G . . . NS=1;AN=0 GT:PS ./.:. -1 688656 . TGCCACAACGCCTAGCTAACTGTTGTTATTTTTAGTAGAAATGGG . . . NS=1;AN=0 GT:PS ./.:. -1 688723 . G . . . NS=1;AN=0 GT:PS ./.:. -1 688778 . GGATTACAGGAGTGAGCCACCGCA . . . NS=1;AN=0 GT:PS ./.:. -1 688814 . CACATTTTTTGAGGCTTGGAACTTTCAGCCTCACCTGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 688923 . CTCCATAAACACCCAAACAGCAGGGTTTGGAGAGCTTCTG . . . NS=1;AN=0 GT:PS ./.:. -1 689029 . CCCTCCCCACTT . . . NS=1;AN=0 GT:PS ./.:. -1 689076 . TGAGATGGAGCCATTACATTGAGCCAGTAATAGA . . . NS=1;AN=0 GT:PS ./.:. -1 689140 . CCCGTAATCCCAGCACTTTGGGAGGCAGAGGTGGGCGGA . . . NS=1;AN=0 GT:PS ./.:. -1 689209 . CTGGGCAACATAAGAAGACCCCATCTATACAAAAAATAAA . . . NS=1;AN=0 GT:PS ./.:. -1 689374 . AGCTTGGACAACAGAGCGAG . . . NS=1;AN=0 GT:PS ./.:. -1 689403 . CTTAAAAAGAAAAGAAAAAAAAACTTGTTTTTCTAA . . . NS=1;AN=0 GT:PS ./.:. -1 689503 . TAACTGGTTGGTCAAAATACAGGTGACAACCTAGGACTTGCAAC . . . NS=1;AN=0 GT:PS ./.:. -1 689765 . CTGAAGCCACAGCAGAAGAACATAAATTGTGAAGATTTCATG . . . NS=1;AN=0 GT:PS ./.:. -1 689912 . C . . . NS=1;AN=0 GT:PS ./.:. -1 689962 . CCGGTCATCTTCGTAAGCTGAGGATGAATGTCCCCGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 690001 . G . . NS=1;CGA_WINEND=692000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.90:1.36:.:0:.:0:0.999:218 -1 690107 . AACCTTAAACTCTGGCTGCCAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 690245 . GGAGGCCTCTGAAATGGCCGCTTTGGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 690292 . ATA . . . NS=1;AN=0 GT:PS ./.:. -1 690327 . GGCGCTCCCAGCCTTATCAGGACAAGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 690642 . ACCTGCCGACGTGTGATGT . . . NS=1;AN=0 GT:PS ./.:. -1 690766 . ATCAGGGGTGAATTTCC . . . NS=1;AN=0 GT:PS ./.:. -1 690890 . AACATTT . . . NS=1;AN=0 GT:PS ./.:. -1 690931 . ATCGCTC . . . NS=1;AN=0 GT:PS ./.:. -1 691079 . ACATCAAAAAATTAGAAACTGTAATGAGGTCTCTTGGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 691225 . GCTGGTTTCCCTGCCTGGGCAGCTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 691279 . CCCTCCACCTCCCCCTTCCCTCCCCACTCTCATACAACTCTTCCTTATC . . . NS=1;AN=0 GT:PS ./.:. -1 691367 . TCTCTCCCTCTCCAGAAGAGCTTCCGATT . . . NS=1;AN=0 GT:PS ./.:. -1 691487 . TTCCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 691534 . AAAAGACATTTAAAAAAAAAAAAAGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 691577 . CATCAGCACTTAAAAGTTTTAAACGATATGTGAAAAACAAAATTTA . . . NS=1;AN=0 GT:PS ./.:. -1 691648 . GGAAGGTGTTACTGGGAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 691680 . TAATTTTTATTTTATTTTATTTTTAGA . . . NS=1;AN=0 GT:PS ./.:. -1 691737 . ACTGCAGTGGTGCAATCACAGTTAACT . . . NS=1;AN=0 GT:PS ./.:. -1 691821 . GAAATGCAGTCTTGCTCTTAGCAAAGCTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 691964 . CTCATTTTTTTTTTTAATTTTTAGTA . . . NS=1;AN=0 GT:PS ./.:. -1 692001 . T . . NS=1;CGA_WINEND=694000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.68:1.13:.:0:.:0:0.999:218 -1 692037 . CTCAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 692067 . AAATGCTGGGATTACAGGTGTGAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 692120 . TTAATTATATAAAGAGCTCAAAGCAAATATTAGA . . . NS=1;AN=0 GT:PS ./.:. -1 692170 . C . . . NS=1;AN=0 GT:PS ./.:. -1 692201 . T . . . NS=1;AN=0 GT:PS ./.:. -1 692211 . T . . . NS=1;AN=0 GT:PS ./.:. -1 692214 . TAA . . . NS=1;AN=0 GT:PS ./.:. -1 692233 . A . . . NS=1;AN=0 GT:PS ./.:. -1 692285 . G . . . NS=1;AN=0 GT:PS ./.:. -1 692290 . T . . . NS=1;AN=0 GT:PS ./.:. -1 692297 . T . . . NS=1;AN=0 GT:PS ./.:. -1 692308 . C . . . NS=1;AN=0 GT:PS ./.:. -1 692358 . TTTCCTGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 692412 . AAAAGTATTTATCATTTTTATAATTTAATAAAAGATT . . . NS=1;AN=0 GT:PS ./.:. -1 692562 . TGGTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 692628 . CTAGTAAAAATAAAAAAATAAAAATAATTGGCC . . . 
NS=1;AN=0 GT:PS ./.:. -1 692790 . ATCTCAAAAAAAAAAAAAGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 692912 . GCCAAGGCGGGTAGATCACGAGATCAGGAGTTCGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 692989 . ATACAAAAATTAACCAGGCATGGTGGCATATGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 693036 . ACTCAGGAGGCTGA . . . NS=1;AN=0 GT:PS ./.:. -1 693058 . AATCGCTTGAACCTGGGAGGCACAGGTTGCAGTGAGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 693267 . CTTGTAG . . . NS=1;AN=0 GT:PS ./.:. -1 693321 . GAAATTTTTTTTAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 693525 . TCACCTTTGGATATAATTCAACCTAAACAAAAGGTCATAT . . . NS=1;AN=0 GT:PS ./.:. -1 693598 . TTTCTCTTTTTTTAAAAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 693743 . TGACATAAGGAAGAATTATGGAGAATTTAAAAATCTATGCTATTTATAGG . . . NS=1;AN=0 GT:PS ./.:. -1 693824 . CTACTATTATTATTTTTATGG . . . NS=1;AN=0 GT:PS ./.:. -1 693861 . AAAACTGTCATTAAAAATTAC . . . NS=1;AN=0 GT:PS ./.:. -1 693945 . GGTATATGGGGGGATT . . . NS=1;AN=0 GT:PS ./.:. -1 693993 . AATCCACGCATACTCAAGTTTTCGAAGTCAGTCCTGTGGAATCCACAT . . . NS=1;AN=0 GT:PS ./.:. -1 694001 . C . . NS=1;CGA_WINEND=696000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.87:1.29:.:0:.:0:0.999:218 -1 694142 . A . . . NS=1;AN=0 GT:PS ./.:. -1 694168 . ACTGCACTCCAGTCTGGGCAACACAGTGAGACAG . . . NS=1;AN=0 GT:PS ./.:. -1 694279 . AAATAACTATCTAATCCAATTAATGCTGGAATTGGGAACAGCAGAAGT . . . NS=1;AN=0 GT:PS ./.:. -1 694344 . CA . . . NS=1;AN=0 GT:PS ./.:. -1 694363 . TGGGGCTCAGGTGTGTTGAGGTCCCCATGCCTGGACTAT . . . NS=1;AN=0 GT:PS ./.:. -1 694411 . GTGGGATTTACTT . . . NS=1;AN=0 GT:PS ./.:. -1 694433 . TTTTCTATATTCCAGCACTGGGAAACTAGGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 694525 . ATTGAAAAATCGTCGCAGGTCAGGTGAGGTGGCTCATACCTATAATCCCAGCCCACTGGGAGAC . . . NS=1;AN=0 GT:PS ./.:. -1 694603 . TCCGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 694657 . CTACAAAAAATTAGAAAATGAACTGGGTGCGGTAAAACATT . . . NS=1;AN=0 GT:PS ./.:. -1 694807 . GAGTGAGACCCTGTCTCAAGACACACACAAACACACACACACACACACACACACACACACACACACACACACCCAAT . . . NS=1;AN=0 GT:PS ./.:. -1 694913 . AGGGCCTTCTGGTTACAGAAGAGGTAT . . . NS=1;AN=0 GT:PS ./.:. -1 695082 . ATGCCTCCTTTGTCAATTAATAAATGGAACATCAGCCTTAAAATCCAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 695205 . GCGCGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 695280 . AGATCAGCCTGGCCAACATGGTGAAACCCCGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 695373 . T . . . NS=1;AN=0 GT:PS ./.:. -1 695409 . AGGCGGAGGTTGCAGTTACTTCTAGAAGAATTTCCATTAGC . . . NS=1;AN=0 GT:PS ./.:. -1 695752 . AAAATTTGCTTCGGCAAATCTTATGCAGAGCCAACTCCAGGCTCCAGAAACAATAGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 695961 . GGGCCCAGCTCCTCACTACTCACCTAT . . . NS=1;AN=0 GT:PS ./.:. -1 696001 . G . . NS=1;CGA_WINEND=698000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.02:1.28:.:0:.:0:0.999:218 -1 696011 . AGAGGATGGGGAAACAAGGCTCCTGACTTTTTTTCCCTAATATCT . . . NS=1;AN=0 GT:PS ./.:. -1 696228 . TGGGATCATTCCAAATTCC . . . NS=1;AN=0 GT:PS ./.:. -1 696280 . AAGGACC . . . NS=1;AN=0 GT:PS ./.:. -1 696295 . AAATGGGCCCTGCTGCCAAGCCTTTTTTTTTTTTTTTAACAA . . . NS=1;AN=0 GT:PS ./.:. -1 696545 . TGCCAAAAATGCACTACAGCCCCCACCCAATTC . . . NS=1;AN=0 GT:PS ./.:. -1 696634 . AACTAAAACAGAAACTCCTGAACTGGGTTCTTTTGAGCCCAGGAAGCA . . . NS=1;AN=0 GT:PS ./.:. -1 696769 . GAGCAAA . . . NS=1;AN=0 GT:PS ./.:. -1 696919 . GTGACTGAAGCAAAAGCTTCAGAACCAGAAAGA . . . NS=1;AN=0 GT:PS ./.:. -1 697068 . TATATGCAGGGATGCAGGCTGTAGTCTC . . . NS=1;AN=0 GT:PS ./.:. -1 697127 . CAAGACCTCAAACTG . . . NS=1;AN=0 GT:PS ./.:. -1 697154 . AAAGGAATCAAGGTTCCCTAGAGAAACGGCTGACTCCATGTATGGTGCAGT . . . NS=1;AN=0 GT:PS ./.:. -1 697222 . TCTTTTTTGCCAGAAAGCAAGGAAGCCATCAA . . . NS=1;AN=0 GT:PS ./.:. -1 697337 . AAAACAATAAAGACTGCAATGGCCTGA . . . NS=1;AN=0 GT:PS ./.:. 
-1 697454 . CTAACTTACTACTCTGAAAAGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 697491 . GTAGGGTGGAGAATTAGCTATTTATTCAGTCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 697648 . TGAACCCCCTACAAAAAAAGCACAAGACAGAATGTGAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 697745 . GTAGTTTTAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 697870 . CTAGGTAGTGGATCTGAGGCTACCTATAGA . . . NS=1;AN=0 GT:PS ./.:. -1 698001 . C . . NS=1;CGA_WINEND=700000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.09:1.28:.:0:.:0:0.999:218 -1 698016 . AAAGGAAAAAAGGAAGGAAAGAAAAAAGGAAGGAAGGAGGGAAGGAGGGAAAAAGGGAAGGAGGGAAGGAAAGGAAGGAAGGGAAAGAAGGAAAGGAAGGAAGGGAAAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 698134 . GGGAAGGAGGAAGGGAGGGAAGGAGGGAGGGAGGGAGAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 698218 . GAAGAAAAGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 698240 . TAAATTTTATTTCTTAACAGTTCTGGATGTTAGGA . . . NS=1;AN=0 GT:PS ./.:. -1 698375 . AAAAGGGGCTGAACTCTGTTTTATAATAAGCCCACTCTGTG . . . NS=1;AN=0 GT:PS ./.:. -1 698642 . AAGGAAAAAAGTCACAGTGAA . . . NS=1;AN=0 GT:PS ./.:. -1 698680 . CTTTATCAAAAGCACCTAAAAAAGATC . . . NS=1;AN=0 GT:PS ./.:. -1 698731 . CACCCAC . . . NS=1;AN=0 GT:PS ./.:. -1 698785 . A . . . NS=1;AN=0 GT:PS ./.:. -1 698823 . ACCTGACCTCAATAGCTCCAGAACAGCCCTAAAACATTTC . . . NS=1;AN=0 GT:PS ./.:. -1 698987 . GGGGAAAAAAGGAACAATGAGTAGAGGAGAAACAGACCACTCGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 699118 . G . . . NS=1;AN=0 GT:PS ./.:. -1 699135 . ATGAGCAGGCAAGCTGGCTAGAAAACCAC . . . NS=1;AN=0 GT:PS ./.:. -1 699245 . TCTGAGGATGATGTCAGTATCCC . . . NS=1;AN=0 GT:PS ./.:. -1 699311 . C . . . NS=1;AN=0 GT:PS ./.:. -1 699352 . TCACGCTGACCCCAGCTCCCTGGATGTTACCATTAGCCAAGACT . . . NS=1;AN=0 GT:PS ./.:. -1 699542 . AAAGGGGGAGGAGAAACTAGGAAAATCATATATGGGCTCTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 699654 . G . . . NS=1;AN=0 GT:PS ./.:. -1 699664 . G . . . NS=1;AN=0 GT:PS ./.:. -1 699699 . GTGGCTCATGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 699807 . T . . . NS=1;AN=0 GT:PS ./.:. -1 699815 . C . . . NS=1;AN=0 GT:PS ./.:. -1 699925 . CACGTCAT . . . NS=1;AN=0 GT:PS ./.:. -1 700001 . A . . NS=1;CGA_WINEND=702000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.92:1.21:.:0:.:0:0.999:409 -1 700182 . T . . . NS=1;AN=0 GT:PS ./.:. -1 700214 . CACGTCTGTAATCTCAGCACTCTGGGAGGCCGAGGCGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 700273 . GTCGAGGCGGGAAGATCACTTGACGTCA . . . NS=1;AN=0 GT:PS ./.:. -1 700334 . ACCGCATCTCCACTAAAAATACAAAAATTAGCCTGGTGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 700510 . TCTCAAAAAAAAAAAAAAAAAATTCCTTTGGGAAGGCCTTCTACA . . . NS=1;AN=0 GT:PS ./.:. -1 700576 . TGGAAAAAAGGGTATGGGATCATCACCGGACCTTTGGCTTTTACAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 700647 . AAGGGAT . . . NS=1;AN=0 GT:PS ./.:. -1 700774 . AAATAAAAAGAACACCAAAAATG . . . NS=1;AN=0 GT:PS ./.:. -1 700900 . TCTCCTGAAGTCAGGAGTTCAAGGCCAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 701063 . CATGAGCTGAGATCATACCACTGCACTCCAGCGTGGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 701120 . CTCCAAAAAAAAAAAAACAGCTAGCAGGTGACATTTGCTATAGGGAGAC . . . NS=1;AN=0 GT:PS ./.:. -1 701233 . TTTCGTCC . . . NS=1;AN=0 GT:PS ./.:. -1 701390 . GAATGCTTATTCAGTTGACTGGTGTAGACTT . . . NS=1;AN=0 GT:PS ./.:. -1 701439 . TGCATTATGCCAGATGAATCTTGCATCTCAAAAGTAG . . . NS=1;AN=0 GT:PS ./.:. -1 701525 . TAATAAAAAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 701570 . TTGACAACATGAATTCTCCTGTCCTAGGACATAATTAA . . . NS=1;AN=0 GT:PS ./.:. -1 701671 . T . . . NS=1;AN=0 GT:PS ./.:. -1 701709 . CTGACAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 701776 . TGAGAATAAATACTTA . . . NS=1;AN=0 GT:PS ./.:. -1 701839 . TAACATTTCATCATGAACTGCGGGT . . . NS=1;AN=0 GT:PS ./.:. -1 701910 . CCAGCAGTTTGGGAGGCCGAGGCAGGCAGATCATGAAGTCAGGAGTTCGAG . . . 
NS=1;AN=0 GT:PS ./.:. -1 701986 . C . . . NS=1;AN=0 GT:PS ./.:. -1 702001 . A . . NS=1;CGA_WINEND=704000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.94:1.23:.:0:.:0:0.999:409 -1 702045 . AGCGACTCAGGA . . . NS=1;AN=0 GT:PS ./.:. -1 702308 . GGAGAAGAGACGTGGCCAGCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 702391 . GAGATTTTTGCTTTAAAATGAACCAAAAAAAAACCAAA . . . NS=1;AN=0 GT:PS ./.:. -1 702453 . AAAGTGGGAGAAACACTAAGAGAAC . . . NS=1;AN=0 GT:PS ./.:. -1 702709 . CAAGTAAATAGTCACCCAAATAAAAACATCATGTTTTC . . . NS=1;AN=0 GT:PS ./.:. -1 702783 . G . . . NS=1;AN=0 GT:PS ./.:. -1 702796 . GAGAGGATAATAACAAATCGCTAATTTCTTTCATCACTATATAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 702928 . CAACAGGGGCACCTTGGTGAGTACTGAACATTTTATTTATTTACTTATTT . . . NS=1;AN=0 GT:PS ./.:. -1 703012 . CTAGAGT . . . NS=1;AN=0 GT:PS ./.:. -1 703067 . AGCGATTCTCCTGCCTTGGCCTCCCGAATAGCTGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 703113 . TGCGCCACCACACCCGTCTAATTTTGTATTTTTAGTAGAGACGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 703342 . CACCTAT . . . NS=1;AN=0 GT:PS ./.:. -1 703381 . GATCAG . . . NS=1;AN=0 GT:PS ./.:. -1 703424 . GAAACCCCGTCTCTACTAAAAATACA . . . NS=1;AN=0 GT:PS ./.:. -1 703492 . GCTCGGGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 703523 . ACCCGGGAGGCGGAGGTTGCAGTGAGCCGAGATCGTGCCACTGCACTC . . . NS=1;AN=0 GT:PS ./.:. -1 703586 . AGTGAGGCTCCGTCTCAAAAAAAATAAA . . . NS=1;AN=0 GT:PS ./.:. -1 703804 . GTAAGGATGGCATGACTCGCCGGCAGCCCTGGGCTTG . . . NS=1;AN=0 GT:PS ./.:. -1 703891 . AACATAAGATTACAAGACTTTTCCAGTTTAGACATACCA . . . NS=1;AN=0 GT:PS ./.:. -1 703973 . CGATACG . . . NS=1;AN=0 GT:PS ./.:. -1 704001 . C . . NS=1;CGA_WINEND=706000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.88:1.31:.:0:.:0:0.999:409 -1 704065 . GATGTCATATATTTTACAT . . . NS=1;AN=0 GT:PS ./.:. -1 704270 . TCCTAAAACATG . . . NS=1;AN=0 GT:PS ./.:. -1 704291 . TGTGAAAATAGACTTTACAGCAGCCGGGTGCAGTGGTGCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 704345 . GCACTTTGGCAGCAGAGGCAGGTGGATCACTT . . . NS=1;AN=0 GT:PS ./.:. -1 704398 . AAAACCCCCCTCCCCAGCCCCACCCCCACCCCGTCC . . . NS=1;AN=0 GT:PS ./.:. -1 704455 . AGGGCATGGTGGCGGGCGCCTGTAGT . . . NS=1;AN=0 GT:PS ./.:. -1 704543 . GAGCCAAGATCACACCAC . . . NS=1;AN=0 GT:PS ./.:. -1 704590 . ACCTCAAAAAAAACAAAAACAAAAACACA . . . NS=1;AN=0 GT:PS ./.:. -1 704634 . CCCGACCTTACAGATGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 704685 . ACCCTTTTTCTCCCAAT . . . NS=1;AN=0 GT:PS ./.:. -1 705004 . AACAGATTCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 705049 . ACCTTGACT . . . NS=1;AN=0 GT:PS ./.:. -1 705093 . CTGTAAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 705175 . AGTG . . . NS=1;AN=0 GT:PS ./.:. -1 705205 . TGGGCTCAGTGGCTCATGCCTGTAATCCCAGCACTTTGGGAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 705317 . ATACAAAAAATACAAA . . . NS=1;AN=0 GT:PS ./.:. -1 705373 . ACTCAGGAGGCTGAGACAGGAGAATTGTTTGAACCCAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 705545 . CACCCAA . . . NS=1;AN=0 GT:PS ./.:. -1 705651 . ACTGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 705684 . ATTCTTTTTTTTTTTTTTGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 705744 . ACAGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 705917 . TGGTATTACAGG . . . NS=1;AN=0 GT:PS ./.:. -1 706001 . G . . NS=1;CGA_WINEND=708000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.19:1.14:.:0:.:0:0.999:409 -1 706131 . CAGGTCC . . . NS=1;AN=0 GT:PS ./.:. -1 706177 . TGGGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 706222 . CCAGGTTGGCAGGGCTGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 706355 . GACGTCAGGGGGCAGAGAGGCGCAGTTCCAGGGTGGCTTTCCC . . . NS=1;AN=0 GT:PS ./.:. -1 706582 . CCCTTCTCATGGGTCCTGCTTTCTGGCTTCTCCTTCCTTACCCCT . . . NS=1;AN=0 GT:PS ./.:. -1 706642 . GGAAGAACTGAGACAAAGTTTCTCGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 706693 . 
T . . . NS=1;AN=0 GT:PS ./.:. -1 706765 . TTTCGCTCCTGTCGCCCAGGCTGGAGTGCAGTGGCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 706850 . T . . . NS=1;AN=0 GT:PS ./.:. -1 706862 . T . . . NS=1;AN=0 GT:PS ./.:. -1 707216 . CCATCATCCCCTAGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 707246 . GGTATTTGCAGAGCTGAGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 707284 . TGTGAAATCGCCCT . . . NS=1;AN=0 GT:PS ./.:. -1 707318 . GCTGGGAAGTGAGCGCTGCATCTCCTGCAGCGTCCT . . . NS=1;AN=0 GT:PS ./.:. -1 707411 . TTATTGA . . . NS=1;AN=0 GT:PS ./.:. -1 707572 . TATCTGA . . . NS=1;AN=0 GT:PS ./.:. -1 707882 . AAACGATGCAAGGTTTTTTGTTTTTGTTTTGGAGACGGAGTTTCGC . . . NS=1;AN=0 GT:PS ./.:. -1 708001 . G . . NS=1;CGA_WINEND=710000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.08:1.21:.:0:.:0:0.999:409 -1 708072 . GGCAGGAAGCTAAACTGATACCTAGGGTAATCCCA . . . NS=1;AN=0 GT:PS ./.:. -1 708335 . CAGCTGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 708367 . AGCTTTCTC . . . NS=1;AN=0 GT:PS ./.:. -1 708434 . GCCCAGAGGGACAGAGGCAGATGAGTTGCGT . . . NS=1;AN=0 GT:PS ./.:. -1 708504 . GTCTGCT . . . NS=1;AN=0 GT:PS ./.:. -1 708673 . CGTGACT . . . NS=1;AN=0 GT:PS ./.:. -1 708714 . GCAGGCCCATCAGATGCCCAGGCCAGCAGCACAGCCGGC . . . NS=1;AN=0 GT:PS ./.:. -1 708760 . AGGGAAACTTGGGGAGCCTCAGAGCACCCCCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 708843 . TTTTACC . . . NS=1;AN=0 GT:PS ./.:. -1 708894 . AGACGCC . . . NS=1;AN=0 GT:PS ./.:. -1 708930 . AAAGGCTTAACAGATATACAATTGCACGTT . . . NS=1;AN=0 GT:PS ./.:. -1 709007 . TTCAATATATATCCAATCATTGTAACTATGACACAGT . . . NS=1;AN=0 GT:PS ./.:. -1 709057 . ACTATTTTCAAA . . . NS=1;AN=0 GT:PS ./.:. -1 709137 . GCCACTA . . . NS=1;AN=0 GT:PS ./.:. -1 709190 . TAGCAGCCGGCAGGCAGTGACACACCGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 709341 . CAATGTCCTATACTTTGGTAAATACAGACTATGTTTAAACAATGTC . . . NS=1;AN=0 GT:PS ./.:. -1 709447 . CATGCTA . . . NS=1;AN=0 GT:PS ./.:. -1 709496 . ACATTGAGAGTCCAGAAGATAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 709534 . TTTCGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 709571 . TTCCGTTCCCCAGCATTGGCGGGT . . . NS=1;AN=0 GT:PS ./.:. -1 709618 . CTCGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 709662 . TTGAAAAAGAAAAAATGGTC . . . NS=1;AN=0 GT:PS ./.:. -1 709884 . TTACTTTTTAGTTTCCTTCATTTGAATCATCATTGTAAGTCTCCCCTT . . . NS=1;AN=0 GT:PS ./.:. -1 710001 . C . . NS=1;CGA_WINEND=712000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.66:1.03:.:0:.:0:0.999:409 -1 710012 . GCCGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 710076 . T . . . NS=1;AN=0 GT:PS ./.:. -1 710219 . G . . . NS=1;AN=0 GT:PS ./.:. -1 710248 . TTCAAGACCAGCCTGGCCAACAGAGTGAAACCCTCTCTCTACTA . . . NS=1;AN=0 GT:PS ./.:. -1 710546 . CCATTGC . . . NS=1;AN=0 GT:PS ./.:. -1 710585 . TCTCAAAAAAAAAAAAAAAGAAAAAATTAGCCAGGCGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 710686 . TCTCAAAAAAAAAAAAAAAAGAGAGAGAGAGAGAAAGA . . . NS=1;AN=0 GT:PS ./.:. -1 710905 . TGCAATA . . . NS=1;AN=0 GT:PS ./.:. -1 710985 . GCCGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 711086 . GGTGGCGCGCGCCTGTAATCCCAGCTACTCGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 711136 . TCGCTTGAACCCGGGAGGTGGAGGTTGCAGTGAGCCGAGATCTTGC . . . NS=1;AN=0 GT:PS ./.:. -1 711201 . GGACGGA . . . NS=1;AN=0 GT:PS ./.:. -1 711217 . TGTCAAAAAAAAAAAAAAAAAAAACAGAAAAAGAAAAAGAAAAAAGAATTAG . . . NS=1;AN=0 GT:PS ./.:. -1 711377 . AATCTTTTTTTTATTTTGAGACAGAGTTTTGCTCATTGCCCAGGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 711453 . C . . . NS=1;AN=0 GT:PS ./.:. -1 711456 . T . . . NS=1;AN=0 GT:PS ./.:. -1 711460 . TG . . . NS=1;AN=0 GT:PS ./.:. -1 711466 . CT . . . NS=1;AN=0 GT:PS ./.:. -1 711471 . T . . . NS=1;AN=0 GT:PS ./.:. -1 711489 . C . . . NS=1;AN=0 GT:PS ./.:. -1 711510 . G . . . NS=1;AN=0 GT:PS ./.:. -1 711526 . C . . . 
NS=1;AN=0 GT:PS ./.:. -1 711545 . TGTATTTTCAGTTGAGACAGGGTTTCTC . . . NS=1;AN=0 GT:PS ./.:. -1 711591 . CTCGAACTCCTGACCTCAGGTGATCCACTGACCTTGGCCTC . . . NS=1;AN=0 GT:PS ./.:. -1 711700 . AGGCGCGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 711816 . AAAAATATATTTTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 711879 . AGGCTGAGGCAGGAGAACCAC . . . NS=1;AN=0 GT:PS ./.:. -1 711909 . GGAGGTGGAGGTTGCAGTGAGCGGAGATCACGCC . . . NS=1;AN=0 GT:PS ./.:. -1 711985 . AAAAC . . . NS=1;AN=0 GT:PS ./.:. -1 712001 . A . . NS=1;CGA_WINEND=714000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.94:0.91:2:29:=:29:0.999:409 -1 712020 . T . . . NS=1;AN=0 GT:PS ./.:. -1 712046 . AAAACTAAAACCAAAAACACAACACAAATGTAGTACACA . . . NS=1;AN=0 GT:PS ./.:. -1 712191 . AAAGAAACAGGCTCAGAGAATGTTATTTGATTGGACCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 712323 . TTGGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 712406 . GGGAAAGTGAAAATGCTTCTAGAAGGCGAC . . . NS=1;AN=0 GT:PS ./.:. -1 712455 . TTGTTGC . . . NS=1;AN=0 GT:PS ./.:. -1 712622 . GTATTTCAGAAAAACATAATCATATTAACAAATAATAACACTGT . . . NS=1;AN=0 GT:PS ./.:. -1 712683 . AATGCTACTTTAGAAAAACATGCTCAAATCTAGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 713162 . CAGGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 713362 . GTTCGAA . . . NS=1;AN=0 GT:PS ./.:. -1 713542 . CCGCCTCGGC . . . NS=1;AN=0 GT:PS ./.:. -1 713977 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs74512038;CGA_FI=100288069|NR_033908.1|LOC100288069|UTR|UNKNOWN-INC;CGA_SDO=7 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:198:198,198:196,196:43,46:-198,0,-198:-43,0,-46:34:15,19:19 -1 714001 . C . . NS=1;CGA_WINEND=716000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.82:0.93:2:37:=:37:0.999:409 -1 714202 . C . . . NS=1;AN=0 GT:PS ./.:. -1 714427 . G . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:714427:PASS:177,.:160,.:0,.:0:0 -1 714439 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs74707816&dbsnp.134|rs139182182;CGA_FI=100288069|NR_033908.1|LOC100288069|TSS-UPSTREAM|UNKNOWN-INC;CGA_SDO=5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:714427:VQLOW:21:116,21:124,28:43,25:-116,0,-21:-43,0,-25:17:10,7:7 -1 715348 . T G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3131984;CGA_FI=100288069|NR_033908.1|LOC100288069|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluJb|Alu|19.0;CGA_SDO=5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:73:73,229:73,229:48,48:-229,-73,0:-48,-48,0:17:17,17:0 -1 715487 . CTGAGATTACAGGAATGAGCCATCGTGCCTGGCTTTACAC . . . NS=1;AN=0 GT:PS ./.:. -1 715646 . TTTCTTTTTTGAGATGGG . . . NS=1;AN=0 GT:PS ./.:. -1 715985 . TTTAATTTAAAATTTTTTTTTCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 716001 . T . . NS=1;CGA_WINEND=718000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.99:0.96:2:44:=:44:0.999:409 -1 716137 . T . . . NS=1;AN=0 GT:PS ./.:. -1 716164 . CTGCAGTGAGCCATGATCGTACCA . . . NS=1;AN=0 GT:PS ./.:. -1 716215 . AGACCCTGACTCCACAAATAAATAAATCAACGTCA . . . NS=1;AN=0 GT:PS ./.:. -1 716432 . GCATAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 716708 . TCTACCT . . . NS=1;AN=0 GT:PS ./.:. -1 716908 . CTGCATG . . . NS=1;AN=0 GT:PS ./.:. -1 716980 . GTGATATATAATCCAACTTGGATTTTTAAATAA . . . NS=1;AN=0 GT:PS ./.:. -1 717883 . AGAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 718001 . C . . NS=1;CGA_WINEND=720000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.99:1.05:2:49:=:49:0.999:409 -1 718135 . CTGATGCTGCTGGATGGTAGATCACACTTTATAAAGCAAGGGGCTAG . . . NS=1;AN=0 GT:PS ./.:. -1 718206 . AATAGTACAGCA . . . NS=1;AN=0 GT:PS ./.:. -1 718386 . A G . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs10900602&dbsnp.131|rs77614743;CGA_FI=100287934|XR_108279.1|LOC100287934|TSS-UPSTREAM|UNKNOWN-INC&100288069|NR_033908.1|LOC100288069|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=MER33|hAT-Charlie|30.0;CGA_SDO=8 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:VQLOW:39:79,39:79,39:27,28:-79,0,-39:-27,0,-28:23:11,12:12 -1 718449 . T . . . NS=1;AN=0 GT:PS ./.:. -1 718555 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.120|rs10751453;CGA_FI=100287934|XR_108279.1|LOC100287934|TSS-UPSTREAM|UNKNOWN-INC&100288069|NR_033908.1|LOC100288069|TSS-UPSTREAM|UNKNOWN-INC;CGA_SDO=8 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:25:25,133:25,133:25,45:-133,-25,0:-45,-25,0:16:13,13:3 -1 718787 . T . . . NS=1;AN=0 GT:PS ./.:. -1 719139 . T . . . NS=1;AN=0 GT:PS ./.:. -1 719195 . ATTATTTACTCATGTTGGGTGTAGTTGTTTTTTTCAT . . . NS=1;AN=0 GT:PS ./.:. -1 719256 . A . . . NS=1;AN=0 GT:PS ./.:. -1 719408 . TT . . . NS=1;AN=0 GT:PS ./.:. -1 719475 . TTT . . . NS=1;AN=0 GT:PS ./.:. -1 719481 . T . . . NS=1;AN=0 GT:PS ./.:. -1 719804 . CCT . . . NS=1;AN=0 GT:PS ./.:. -1 720001 . A . . NS=1;CGA_WINEND=722000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.97:0.94:2:52:=:52:0.999:409 -1 720236 . GAACTACT . . . NS=1;AN=0 GT:PS ./.:. -1 720447 . TTATAAAAGTCA . . . NS=1;AN=0 GT:PS ./.:. -1 720797 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.103|rs3115846&dbsnp.131|rs75530702;CGA_FI=100287934|XR_108279.1|LOC100287934|TSS-UPSTREAM|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:91:91,91:90,90:22,31:-91,0,-91:-22,0,-31:39:16,23:23 -1 720950 . GTTGTGGGGTAGGGGGAGGGGGGAGGGATAA . . . NS=1;AN=0 GT:PS ./.:. -1 721022 . TGCAGCAC . . . NS=1;AN=0 GT:PS ./.:. -1 721677 . T C . . NS=1;AN=2;AC=1;CGA_FI=100287934|XR_108279.1|LOC100287934|UTR|UNKNOWN-INC;CGA_RPT=BLACKJACK|hAT-Blackjack|28.9;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:VQLOW:22:22,22:1,1:0,8:-22,0,-22:0,0,-8:37:6,31:31 -1 721719 . T . . . NS=1;AN=0 GT:PS ./.:. -1 721722 . TCATT . . . NS=1;AN=0 GT:PS ./.:. -1 721844 . TCCCAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 721946 . C . . . NS=1;AN=0 GT:PS ./.:. -1 722001 . A . . NS=1;CGA_WINEND=724000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.99:0.93:2:53:=:53:0.999:409 -1 722025 . T . . . NS=1;AN=0 GT:PS ./.:. -1 722459 . GACATTTTGATA . . . NS=1;AN=0 GT:PS ./.:. -1 722715 . C . . . NS=1;AN=0 GT:PS ./.:. -1 723722 . GGGCAAAAAAAGCAAAACTCTGAAGAAAGAGAGAGAGAGGGAG . . . NS=1;AN=0 GT:PS ./.:. -1 723798 . CAG C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.126|rs34882115&dbsnp.126|rs35182822;CGA_RPT=GA-rich|Low_complexity|16.6;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:189:189,252:227,252:37,39:-189,0,-252:-37,0,-39:34:17,17:17 -1 723882 . TAAGGTGTGGGCCAAAGAAAGTAAGTTAG . . . NS=1;AN=0 GT:PS ./.:. -1 724001 . C . . NS=1;CGA_WINEND=726000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:5.71:0.89:.:0:.:0:0.999:409 -1 724013 . TGTGTGGGTTAAATGTAATTAAATTCAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 724053 . C . . . NS=1;AN=0 GT:PS ./.:. -1 724138 . AATGGAATGG . . . NS=1;AN=0 GT:PS ./.:. -1 724153 . AATGGAATGGAATGG . . . NS=1;AN=0 GT:PS ./.:. -1 724176 . GGAATG . . . NS=1;AN=0 GT:PS ./.:. -1 724191 . GGAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 724285 . C T . . NS=1;AN=2;AC=1;CGA_RPT=(GAATG)n|Satellite|15.4;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:724285:PASS:111:111,111:103,103:35,41:-111,0,-111:-35,0,-41:22:6,16:16 -1 724298 . AACGG . . . NS=1;AN=0 GT:PS ./.:. -1 724421 . G . . 
END=725562;NS=1;AN=0 GT:PS ./.:. -1 724597 . G . . IMPRECISE;SVTYPE=INS;END=724597;SVLEN=39;CIPOS=-15,15;MEINFO=L1PA2,653,691,-;NS=1 GT:FT:CGA_IS:CGA_IDC:CGA_IDCL:CGA_IDCR:CGA_RDC:CGA_NBET:CGA_ETS:CGA_KES .:sns95:24:1:1:0:145:L1PA3:0:0.999 -1 725573 . AATGGAAT . . . NS=1;AN=0 GT:PS ./.:. -1 725591 . G . . . NS=1;AN=0 GT:PS ./.:. -1 725601 . G . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:725601:PASS:67,.:64,.:0,.:0:0 -1 725630 . T A . . NS=1;AN=2;AC=1;CGA_RPT=(GAATG)n|Satellite|13.8;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 0|1:725601:PASS:117:145,117:142,123:47,43:-117,0,-145:-43,0,-47:10:5,5:5 -1 725640 . T C . . NS=1;AN=2;AC=1;CGA_RPT=(GAATG)n|Satellite|13.8;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 0|1:725601:PASS:94:132,94:131,103:45,40:-94,0,-132:-40,0,-45:13:7,6:7 -1 725665 . TG . . . NS=1;AN=0 GT:PS ./.:. -1 725671 . CTAATGGAATGGACTT . . . NS=1;AN=0 GT:PS ./.:. -1 725689 . A . . . NS=1;AN=0 GT:PS ./.:. -1 725695 . T . . . NS=1;AN=0 GT:PS ./.:. -1 725716 . G . . END=725994;NS=1;AN=0 GT:PS ./.:. -1 726001 . G . . NS=1;CGA_WINEND=728000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:3.31:0.91:.:0:.:0:0.999:409 -1 726014 . A . . . NS=1;AN=0 GT:PS ./.:. -1 726075 . TGGACTGGAATGTAATGAGTTTGGAATGGACTTGAATGCAATGGAATGGAATGGAATGGAATGGAATGGACTCAAATGGAATAGCATGGAATGGAATGGACTCAAATGCATTGGAATGGAATGGACTCGAATGGAATGGAATGGACTCGAATGGAATGGAGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 726293 . ATAACAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 726308 . G . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:726308:PASS:42,.:32,.:0,.:0:0 -1 726311 . C . . . NS=1;AN=0 GT:PS ./.:. -1 726314 . GAATGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 726324 . GA . . . NS=1;AN=0 GT:PS ./.:. -1 726330 . AATGGA . . . NS=1;AN=0 GT:PS ./.:. -1 726350 . AATGGAATGGAATGG . . . NS=1;AN=0 GT:PS ./.:. -1 726367 . TGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 726376 . C . . . NS=1;AN=0 GT:PS ./.:. -1 726385 . AATGGAATGGAATGGAATAGAATGGACTCAAATGGAATGGAATATAATGGAATGGGAATGGGAATGGGAATGGAAGGGATGGGATGGGATGGGATGTAAT . . . NS=1;AN=0 GT:PS ./.:. -1 726498 . ATGGAATGGACACCTATGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 726563 . A . . . NS=1;AN=0 GT:PS ./.:. -1 726579 . TGGACTCG . . . NS=1;AN=0 GT:PS ./.:. -1 726594 . T . . . NS=1;AN=0 GT:PS ./.:. -1 726606 . GAATGGAATGGAATG . . . NS=1;AN=0 GT:PS ./.:. -1 726626 . GACTCG . . . NS=1;AN=0 GT:PS ./.:. -1 726634 . TGGAATGGAATGGA . . . NS=1;AN=0 GT:PS ./.:. -1 726671 . TAATGGAATGGACTTGAATGAAATGGCAAG . . . NS=1;AN=0 GT:PS ./.:. -1 726706 . ACTCGAATGGAATGGAATGGAATGGAATCGAATGGAATCGAATGGAATGACA . . . NS=1;AN=0 GT:PS ./.:. -1 726847 . CTCAAATGGAATAGAATGGAATAGAATGGACTCGAATATAATGGAATGAAATTGGCTCGAATGGAATGGAATGGACTTGAATGGAATGGAATGGAATCGAATGGAATGGAATGGAATGGAATGGAATGGAATGGAATGGAATGGAATGGAATGGACTCGA . . . NS=1;AN=0 GT:PS ./.:. -1 727652 . CGCCGCTGCTCCACCTGCCCCTCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 727776 . T . . . NS=1;AN=0 GT:PS ./.:. -1 727850 . TTACGTA . . . NS=1;AN=0 GT:PS ./.:. -1 727894 . AAATCTTTTACTTTTACCAACCT . . . NS=1;AN=0 GT:PS ./.:. -1 727959 . CAATGTT . . . NS=1;AN=0 GT:PS ./.:. -1 728001 . T . . NS=1;CGA_WINEND=730000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.15:1.04:.:0:.:0:0.999:409 -1 728010 . TTGTGTT . . . NS=1;AN=0 GT:PS ./.:. -1 728086 . TAACTCTATGT . . . NS=1;AN=0 GT:PS ./.:. -1 728153 . TCCACACTGT . . . NS=1;AN=0 GT:PS ./.:. -1 728239 . TTACTGCCTCCCTGTGATTATAGGTGAGACAGTCAACAAACCT . . . NS=1;AN=0 GT:PS ./.:. -1 728312 . CACTCACACTCTGCTCCATCACCCTCAGCCACACAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 728690 . TCATAAC . . . NS=1;AN=0 GT:PS ./.:. 
-1 728858 . GGAACCCCTC . . . NS=1;AN=0 GT:PS ./.:. -1 729676 . GGGCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 730001 . T . . NS=1;CGA_WINEND=732000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.03:1.00:.:0:.:0:0.999:409 -1 731410 . TCTTTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 731883 . TGTTAAAAACACA . . . NS=1;AN=0 GT:PS ./.:. -1 732001 . A . . NS=1;CGA_WINEND=734000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.08:1.05:.:0:.:0:0.999:409 -1 733639 . TTTTTGGC . . . NS=1;AN=0 GT:PS ./.:. -1 733995 . CAACTAG . . . NS=1;AN=0 GT:PS ./.:. -1 734001 . G . . NS=1;CGA_WINEND=736000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.76:1.08:.:0:.:0:0.999:409 -1 734016 . ATATATGTAAATATATCTTTTTCTGTGTAA . . . NS=1;AN=0 GT:PS ./.:. -1 734235 . AAAATAC . . . NS=1;AN=0 GT:PS ./.:. -1 734462 . G . . . NS=1;AN=0 GT:PS ./.:. -1 734491 . T . . . NS=1;AN=0 GT:PS ./.:. -1 734936 . ACACTGTCTTCCACAATGG . . . NS=1;AN=0 GT:PS ./.:. -1 735149 . CTTCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 735243 . GATGTTA . . . NS=1;AN=0 GT:PS ./.:. -1 735330 . GCTATGCAGAAGCTCTTTAGTTTAATTAGATCCCATT . . . NS=1;AN=0 GT:PS ./.:. -1 735423 . CCCGTGC . . . NS=1;AN=0 GT:PS ./.:. -1 735452 . TAGCTTTTCTTGTAGGGTTTTTATGGTTTTAGGTCTTATGTTTAAG . . . NS=1;AN=0 GT:PS ./.:. -1 735630 . TTTCTCAG . . . NS=1;AN=0 GT:PS ./.:. -1 735661 . TAGATGTGTGGCGTTATTTCTGAGGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 735734 . A . . . NS=1;AN=0 GT:PS ./.:. -1 735739 . G . . . NS=1;AN=0 GT:PS ./.:. -1 735810 . ACTATAAAGACACACGCACACGTAT . . . NS=1;AN=0 GT:PS ./.:. -1 735865 . G . . . NS=1;AN=0 GT:PS ./.:. -1 735879 . C . . . NS=1;AN=0 GT:PS ./.:. -1 735890 . TCAGTGATAGACTGGATAAAGAAAATGTGGCACATATACA . . . NS=1;AN=0 GT:PS ./.:. -1 735948 . T . . . NS=1;AN=0 GT:PS ./.:. -1 736001 . A . . NS=1;CGA_WINEND=738000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.92:1.07:.:0:.:0:0.999:409 -1 736016 . AAACTAA . . . NS=1;AN=0 GT:PS ./.:. -1 736103 . GGGAACATCAC . . . NS=1;AN=0 GT:PS ./.:. -1 736225 . CCT . . . NS=1;AN=0 GT:PS ./.:. -1 736253 . T . . . NS=1;AN=0 GT:PS ./.:. -1 736288 . A . . . NS=1;AN=0 GT:PS ./.:. -1 736291 . T . . . NS=1;AN=0 GT:PS ./.:. -1 736522 . AT TC . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.103|rs3094318&dbsnp.103|rs3131974&dbsnp.111|rs4951928&dbsnp.130|rs71490526&dbsnp.131|rs75961312&dbsnp.131|rs76630699;CGA_SDO=2 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:VQLOW:29,.:29,.:0,.:-29,0,0:0,0,0:32:32,.:0 -1 736808 . AGGGTCACTCACTAAGCATCTTTCCCATGCGCTGCTGTG . . . NS=1;AN=0 GT:PS ./.:. -1 736893 . ATTATGAA . . . NS=1;AN=0 GT:PS ./.:. -1 737044 . CCTCGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 737640 . GGCGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 737737 . TAGGCTCA . . . NS=1;AN=0 GT:PS ./.:. -1 738001 . C . . NS=1;CGA_WINEND=740000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.03:1.03:.:0:.:0:0.999:409 -1 738020 . T . . . NS=1;AN=0 GT:PS ./.:. -1 738121 . GGAGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 738182 . ACATTGA . . . NS=1;AN=0 GT:PS ./.:. -1 738527 . ACAGGCTGGGCATGGTGGCTCACACCTGTAATCCCAACAC . . . NS=1;AN=0 GT:PS ./.:. -1 738811 . ATCTAAAAAAAAAAAAGAT . . . NS=1;AN=0 GT:PS ./.:. -1 738904 . ATCAAGCATTTTTTAAAAATGCTTCTACATT . . . NS=1;AN=0 GT:PS ./.:. -1 739139 . AAATAGC . . . NS=1;AN=0 GT:PS ./.:. -1 739207 . GGGATGG . . . NS=1;AN=0 GT:PS ./.:. -1 739426 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3131973&dbsnp.131|rs76643345;CGA_SDO=2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:51:211,51:211,51:51,44:-211,-51,0:-51,-44,0:23:23,23:0 -1 739528 . GAG . . . NS=1;AN=0 GT:PS ./.:. -1 739593 . GGGCTGC . . . 
NS=1;AN=0 GT:PS ./.:. -1 740001 . T . . NS=1;CGA_WINEND=742000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.12:1.16:.:0:.:0:0.999:409 -1 740583 . AAGGTGC . . . NS=1;AN=0 GT:PS ./.:. -1 740683 . AAATTTCTTCCAAAAGGGTAACAT . . . NS=1;AN=0 GT:PS ./.:. -1 740896 . CCCCAAGG . . . NS=1;AN=0 GT:PS ./.:. -1 741267 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs80183632;CGA_SDO=3 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:109:109,109:97,97:24,32:-109,0,-109:-24,0,-32:31:10,21:21 -1 741394 . CTGATTT . . . NS=1;AN=0 GT:PS ./.:. -1 741576 . CGATGTT . . . NS=1;AN=0 GT:PS ./.:. -1 741743 . TGTAACATGGGATTTCTCTCCAGAGCAGCCATGCACTGCCCAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 741903 . CCACGGGC . . . NS=1;AN=0 GT:PS ./.:. -1 741987 . AACCGAA . . . NS=1;AN=0 GT:PS ./.:. -1 742001 . A . . NS=1;CGA_WINEND=744000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.19:1.15:.:0:.:0:0.999:409 -1 742171 . CAGATTTTCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 742231 . AGACGAT . . . NS=1;AN=0 GT:PS ./.:. -1 742433 . TATGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 742573 . A . . . NS=1;AN=0 GT:PS ./.:. -1 742823 . CAATAAG . . . NS=1;AN=0 GT:PS ./.:. -1 742941 . TTTTACAA . . . NS=1;AN=0 GT:PS ./.:. -1 743069 . AATCCTT . . . NS=1;AN=0 GT:PS ./.:. -1 743151 . AGACGAT . . . NS=1;AN=0 GT:PS ./.:. -1 743268 . GAAACAGCTTGAAGCTCTCTGGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 743449 . ACACGCC . . . NS=1;AN=0 GT:PS ./.:. -1 743510 . TGCGTAT . . . NS=1;AN=0 GT:PS ./.:. -1 743648 . GCATAAATA . . . NS=1;AN=0 GT:PS ./.:. -1 743848 . AAAAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 744001 . A . . NS=1;CGA_WINEND=746000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.12:1.00:.:0:.:0:0.999:409 -1 744056 . AAGATTTTAAAGGAATTTAAATAATTGGACATATTA . . . NS=1;AN=0 GT:PS ./.:. -1 744301 . TTCCTATGTACCCTAATAAAAAGTT . . . NS=1;AN=0 GT:PS ./.:. -1 744485 . TACCCAC . . . NS=1;AN=0 GT:PS ./.:. -1 744628 . TCCCTTTGTCAAGCGAGTCCTA . . . NS=1;AN=0 GT:PS ./.:. -1 744680 . GTCTTGG . . . NS=1;AN=0 GT:PS ./.:. -1 744739 . TTGCACT . . . NS=1;AN=0 GT:PS ./.:. -1 744875 . ATACAGA . . . NS=1;AN=0 GT:PS ./.:. -1 744948 . CCCCTGAGGTAAGGAAGC . . . NS=1;AN=0 GT:PS ./.:. -1 745018 . AACGCTATGC . . . NS=1;AN=0 GT:PS ./.:. -1 745049 . CTCCAGCCTTCCCCTGCTGCCATCTAACA . . . NS=1;AN=0 GT:PS ./.:. -1 745361 . TTTATTTCATAAGATA . . . NS=1;AN=0 GT:PS ./.:. -1 745570 . TATTAAAATATTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 745621 . GGATACA . . . NS=1;AN=0 GT:PS ./.:. -1 745840 . TA . . . NS=1;AN=0 GT:PS ./.:. -1 745905 . AATGACT . . . NS=1;AN=0 GT:PS ./.:. -1 746001 . C . . NS=1;CGA_WINEND=748000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.17:1.28:.:0:.:0:0.999:409 -1 746099 . TACGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 746203 . ACAGAAAAAAAAAACCTGTAAATAGTAATCCAC . . . NS=1;AN=0 GT:PS ./.:. -1 746365 . ATCCAAG . . . NS=1;AN=0 GT:PS ./.:. -1 746391 . GCCGGTCATCGTTCTTTGACAAGAAAATTG . . . NS=1;AN=0 GT:PS ./.:. -1 746443 . CCACTATTCACTTTTAGTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 746521 . CAGTGCCAACAGTTGTAAATCATCAAGACAAGCAAAGCACATT . . . NS=1;AN=0 GT:PS ./.:. -1 746691 . TGAAATA . . . NS=1;AN=0 GT:PS ./.:. -1 746819 . GAAGGAGACAGGATTGCTGACTAATCTCATATGTACAGGGAGAACGAC . . . NS=1;AN=0 GT:PS ./.:. -1 746913 . AAGGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 747186 . GAGGACC . . . NS=1;AN=0 GT:PS ./.:. -1 747295 . AAACAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 747513 . CTGTCCA . . . NS=1;AN=0 GT:PS ./.:. -1 747585 . CAACAAGCCCATGAAAAGATGCTCCACATCACGAATTGT . . . NS=1;AN=0 GT:PS ./.:. -1 747688 . AATTAAAAAAAAAAAAAGAAAGTTTAACAAGTATTG . . . NS=1;AN=0 GT:PS ./.:. -1 747914 . CACATTC . . . 
NS=1;AN=0 GT:PS ./.:. -1 748001 . T . . NS=1;CGA_WINEND=750000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.14:1.02:.:0:.:0:0.999:409 -1 748138 . TATGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 748177 . AAACAGAAAGCCAGTTACCAGGTC . . . NS=1;AN=0 GT:PS ./.:. -1 748506 . AACGTTATACAACACAAGCGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 748722 . ATGAGTT . . . NS=1;AN=0 GT:PS ./.:. -1 748767 . GAATAGAGAAGAATCAACAAAACAACTC . . . NS=1;AN=0 GT:PS ./.:. -1 748875 . CTAGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 749065 . AGGATAA . . . NS=1;AN=0 GT:PS ./.:. -1 749127 . TAAGATA . . . NS=1;AN=0 GT:PS ./.:. -1 749173 . CATTTGT . . . NS=1;AN=0 GT:PS ./.:. -1 749212 . CTACACT . . . NS=1;AN=0 GT:PS ./.:. -1 749262 . TGAGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 749393 . AAATTATCTATCTTTTCAAGCTAC . . . NS=1;AN=0 GT:PS ./.:. -1 749514 . ACTCTAA . . . NS=1;AN=0 GT:PS ./.:. -1 749541 . GCAGACTGTGATAATTTCACCAAAACC . . . NS=1;AN=0 GT:PS ./.:. -1 749592 . G A . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.111|rs4606254&dbsnp.131|rs80161738;CGA_FI=400728|XR_108280.1|FAM87B|TSS-UPSTREAM|UNKNOWN-INC;CGA_SDO=3 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:749592:PASS:62,.:37,.:4,.:-62,0,0:-4,0,0:57:37,.:20 -1 749602 . C . . . NS=1;AN=0 GT:PS ./.:. -1 749641 . CGATCAA . . . NS=1;AN=0 GT:PS ./.:. -1 749680 . GGACGACATTTTG . . . NS=1;AN=0 GT:PS ./.:. -1 749734 . CAATCCCCAGTTGTTGTCTTAAGTCACACT . . . NS=1;AN=0 GT:PS ./.:. -1 749853 . TTACGTT . . . NS=1;AN=0 GT:PS ./.:. -1 749899 . G . . . NS=1;AN=0 GT:PS ./.:. -1 749960 . AGTTAAAAAAAAGATG . . . NS=1;AN=0 GT:PS ./.:. -1 749986 . TTTCAGAATTAACATTTCCTTCCTAAACATCTAACACGACACACTTAC . . . NS=1;AN=0 GT:PS ./.:. -1 750001 . T . . NS=1;CGA_WINEND=752000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.92:0.99:.:0:.:0:0.999:409 -1 750041 . TAACACACCCTTATTACATGAAGGAGCAGCAGAGCAGAGGGCG . . . NS=1;AN=0 GT:PS ./.:. -1 750110 . CACAGCCAATG . . . NS=1;AN=0 GT:PS ./.:. -1 750150 . AACTGCACTGTGCACAGAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 750342 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3103768&dbsnp.120|rs10793768&dbsnp.129|rs55727401;CGA_FI=400728|XR_108280.1|FAM87B|TSS-UPSTREAM|UNKNOWN-INC;CGA_SDO=3 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:42:42,91:42,91:39,37:-91,-42,0:-39,-37,0:9:9,9:0 -1 750433 . CTTCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 750501 . ACTCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 750619 . ATCAAGG . . . NS=1;AN=0 GT:PS ./.:. -1 750740 . AGATGCAAAGGTGAGCTGCAGGTGGTCTGTAT . . . NS=1;AN=0 GT:PS ./.:. -1 750894 . AGGGGCCAACTGGGACTGGGGTGTCCATCAGCTAA . . . NS=1;AN=0 GT:PS ./.:. -1 751000 . GTGGAAAAAAAAGAGGA . . . NS=1;AN=0 GT:PS ./.:. -1 751123 . AAAACAG . . . NS=1;AN=0 GT:PS ./.:. -1 751169 . G . . . NS=1;AN=0 GT:PS ./.:. -1 751290 . AAATAATAAAAATAAATTATTTAAACATA . . . NS=1;AN=0 GT:PS ./.:. -1 751372 . TTACAAATAAAAGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 751473 . CTAGGTATCTGATTAGAAAAAAAAAAAAATAAGA . . . NS=1;AN=0 GT:PS ./.:. -1 751530 . AAATTTA . . . NS=1;AN=0 GT:PS ./.:. -1 751562 . AAGAGATAGAGAGAAAAACGTG . . . NS=1;AN=0 GT:PS ./.:. -1 752001 . T . . NS=1;CGA_WINEND=754000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.08:1.18:.:0:.:0:0.999:409 -1 752242 . T A . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.120|rs12090487;CGA_FI=400728|XR_108280.1|FAM87B|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=L1P4|L1|15.2;CGA_SDO=2 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:67,.:67,.:15,.:-67,0,0:-15,0,0:36:16,.:20 -1 752383 . TATTAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 752434 . CTAGGAAATACTCTTCTTGACGTTGGCCTTGGCAAAGAATTTTTGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 752493 . 
AACGATTGCAACAAAACAAAATTGG . . . NS=1;AN=0 GT:PS ./.:. -1 752566 . GT AT . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.103|rs3094315;CGA_FI=400728|XR_108280.1|FAM87B|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=L1P4|L1|15.2;CGA_SDO=2 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:752566:PASS:725,.:680,.:49,.:-725,0,0:-49,0,0:51:40,.:0 -1 752593 . T . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:752566:PASS:626,.:583,.:0,.:0:0 -1 752721 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.103|rs3131972;CGA_FI=400728|XR_108280.1|FAM87B|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:489:519,489:581,489:48,47:-519,0,-489:-48,0,-47:60:33,27:27 -1 752894 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3131971&dbsnp.131|rs77059159;CGA_FI=400728|XR_108280.1|FAM87B|UTR|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:34:66,34:66,34:15,27:-66,-34,0:-27,-15,0:31:31,31:0 -1 753269 . C G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3115861&dbsnp.129|rs61770172;CGA_FI=400728|XR_108280.1|FAM87B|UTR|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:36:51,36:51,36:8,28:-51,-36,0:-28,-8,0:44:44,44:0 -1 753376 . CGGCGGC . . . NS=1;AN=0 GT:PS ./.:. -1 753405 . C A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3115860&dbsnp.129|rs61770173;CGA_FI=400728|XR_108280.1|FAM87B|UTR|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:54:294,54:294,54:52,34:-294,-54,0:-52,-34,0:32:31,31:1 -1 753425 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3131970&dbsnp.130|rs71507459;CGA_FI=400728|XR_108280.1|FAM87B|UTR|UNKNOWN-INC;CGA_RPT=MER58A|hAT-Charlie|38.4;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:51:131,51:131,51:42,44:-131,-51,0:-44,-42,0:28:27,27:1 -1 753471 . GGGCAGCCGTAGACCACACACGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 753537 . AGACGGGCTGGGCTTG . . . NS=1;AN=0 GT:PS ./.:. -1 753648 . AATGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 753845 . CT . . . NS=1;AN=0 GT:PS ./.:. -1 753849 . G T . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.129|rs56101360&dbsnp.129|rs58324164;CGA_FI=400728|XR_108280.1|FAM87B|INTRON|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:753849:VQLOW:36,.:26,.:6,.:-36,0,0:-6,0,0:23:17,.:6 -1 753973 . TCTGTGGACACTTTTTCCCC . . . NS=1;AN=0 GT:PS ./.:. -1 754001 . T . . NS=1;CGA_WINEND=756000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.87:0.94:2:46:=:46:0.999:409 -1 754182 . A G . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.103|rs3131969;CGA_FI=400728|XR_108280.1|FAM87B|UTR|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:754182:PASS:68,.:70,.:15,.:-68,0,0:-15,0,0:33:19,.:14 -1 754192 . A . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL .|0:754182:PASS:.,43:.,50:.,0:0:0 -1 754334 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.103|rs3131967;CGA_FI=400728|XR_108280.1|FAM87B|UTR|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:VQLOW:30:30,74:70,70:9,26:-30,0,-74:-9,0,-26:51:17,34:34 -1 754379 . TTCCTCCAGACACTCT . . . NS=1;AN=0 GT:PS ./.:. -1 754444 . CAGCCAATGGAACGGA . . . NS=1;AN=0 GT:PS ./.:. -1 754503 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.103|rs3115859;CGA_FI=400728|XR_108280.1|FAM87B|UTR|UNKNOWN-INC;CGA_RPT=MLT1I|ERVL-MaLR|41.4;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:280:339,280:339,280:53,52:-339,0,-280:-53,0,-52:37:19,18:18 -1 754629 . AGTGATT . . . NS=1;AN=0 GT:PS ./.:. -1 754758 . CAGTATC . . . NS=1;AN=0 GT:PS ./.:. -1 754964 . C T . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.103|rs3131966&dbsnp.131|rs74918077;CGA_FI=400728|XR_108280.1|FAM87B|UTR|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:89:148,89:148,89:36,30:-148,0,-89:-36,0,-30:39:17,22:22 -1 755153 . AGTCAGT . . . NS=1;AN=0 GT:PS ./.:. -1 755274 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs78408995;CGA_RPT=MER44D|TcMar-Tigger|23.4;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:55:55,55:50,50:17,31:-55,0,-55:-17,0,-31:26:9,17:17 -1 755482 . C . . . NS=1;AN=0 GT:PS ./.:. -1 755486 . A . . . NS=1;AN=0 GT:PS ./.:. -1 755638 . A . . . NS=1;AN=0 GT:PS ./.:. -1 755648 . CCAGCAGAACCACCCTGTCTATACTACCTGCCTGTCCAGCAGATCCACCCTGTCTACACTA . . . NS=1;AN=0 GT:PS ./.:. -1 755714 . CTGGCCAGCAGATC . . . NS=1;AN=0 GT:PS ./.:. -1 755736 . CTATACTACCTGCCGCTCCAGCAGATCCACCCTGTCTACACTACCTGCCTGTCCAGCAGACCCG . . . NS=1;AN=0 GT:PS ./.:. -1 755828 . ATATCCACCCTATCTACACTACCTGCCTGGCCAGCATATCCGCC . . . NS=1;AN=0 GT:PS ./.:. -1 755883 . ACCTCCCAGCCCAGCAGATCCGCCCTGTCTACACTA . . . NS=1;AN=0 GT:PS ./.:. -1 755924 . CTGGCCAGTAGATCCACGCTATCTACACTACCTGCCTGGCCAGCAGATCCACCCTGTCAACACTACCTGCTTGTCCAGCAGGTCCACACTGTCTACACTACCTGCCTGTCCAGCAGGTGCACCCTATCTA . . . NS=1;AN=0 GT:PS ./.:. -1 756001 . G . . NS=1;CGA_WINEND=758000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.41:1.54:2:38:=:38:0.999:409 -1 756068 . CCAG . . . NS=1;AN=0 GT:PS ./.:. -1 756080 . CCCTGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 756108 . AGATCCACCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 756167 . GC . . . NS=1;AN=0 GT:PS ./.:. -1 756176 . GCAGATCC GCAGGTCC . . NS=1;AN=1;AC=1;CGA_SDO=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:756176:PASS:42,.:42,.:17,.:-42,0,0:-17,0,0:2:2,.:0 -1 756238 . CTTGTCCAGCAGGTCCACCATGTCTACACTGCCTGCCTGGCCAGCAGATCC . . . NS=1;AN=0 GT:PS ./.:. -1 756299 . CACTACCTGCTTGTCCAGCAGGTCCACCCTGTC . . . NS=1;AN=0 GT:PS ./.:. -1 756341 . TGCCTGCAAAGCAGATCC . . . NS=1;AN=0 GT:PS ./.:. -1 756366 . CTACACTACCTGGCTGGCCAGTAGATCCACGCTATCTACACTACCTTCCTGTCCA . . . NS=1;AN=0 GT:PS ./.:. -1 756427 . CCAACCT . . . NS=1;AN=0 GT:PS ./.:. -1 756448 . CCTGTCGAGCAGATCCACCCTGTCTATACTACCTGCCTGTCCAGCAGGTCCACCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 756513 . ACCTGCGTGCCCAGCTGATCCGCCCTGTC . . . NS=1;AN=0 GT:PS ./.:. -1 756551 . TGCTTGTCGAGCAGATCTGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 756586 . TGCCTGTCCAGCAGATCCACCCTGTCTATACTCCGTACCTGGCCAGCAGATCCACGCTA . . . NS=1;AN=0 GT:PS ./.:. -1 756650 . ACTACCTGCCTGTCCAGCAGATCCACACTGTCTACACTACTTGCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 756781 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs75189095 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:73:74,73:74,73:26,36:-74,0,-73:-26,0,-36:27:14,13:13 -1 756879 . GGTC . . . NS=1;AN=0 GT:PS ./.:. -1 756944 . CAGTAGATCCACGGTATCTACACTACCTCCCTGGCCAGCAGATTCACCCAGTCTACACTAACTGCTTGTCCAGCAGGTCCACCCTGTC . . . NS=1;AN=0 GT:PS ./.:. -1 757047 . CCAGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 757103 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs76779543 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:757103:PASS:290:311,290:311,300:53,53:-311,0,-290:-53,0,-53:32:16,16:16 -1 757120 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs77598327 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:757103:PASS:230:408,230:408,229:53,49:-408,0,-230:-53,0,-49:33:20,13:13 -1 757253 . TCCAGCA . . . NS=1;AN=0 GT:PS ./.:. -1 757520 . C . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:757520:VQLOW:37,.:37,.:0,.:0:0 -1 757527 . T . . . NS=1;AN=0 GT:PS ./.:. -1 757532 . A . . . 
NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:757520:VQLOW:37,.:37,.:0,.:0:0 -1 757535 . C T . . NS=1;AN=2;AC=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 0|1:757520:VQLOW:37:37,37:37,37:27,14:-37,0,-37:-14,0,-27:14:6,8:6 -1 757595 . C . . . NS=1;AN=0 GT:PS ./.:. -1 757640 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.103|rs3115853 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:65:65,65:65,65:28,35:-65,0,-65:-28,0,-35:13:6,7:7 -1 757734 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4951929 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:75:620,75:620,75:52,48:-620,-75,0:-52,-48,0:29:29,29:0 -1 757803 . ACCTCCCTGGCCAGCAGATCCACCCTGTCTATACTACCTGCCTGGCCAGCAGATCCACCCTGTCTATACTACCTGACTGGCCAGCAGATCCACCC . . . NS=1;AN=0 GT:PS ./.:. -1 757936 . C A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4951862 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:21:21,58:21,58:22,26:-58,-21,0:-26,-22,0:17:17,17:0 -1 757976 . CTAGCTGCCTGTCCAGCATGTCCACCCTATCTACACTACCTGCCTGTCCAGCAGATCC . . . NS=1;AN=0 GT:PS ./.:. -1 758001 . C . . NS=1;CGA_WINEND=760000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.54:1.09:2:42:=:42:0.999:409 -1 758052 . GCCTATCCAGCAGATCTACCCTGTCTACACTACCTGCCTGCTCAGCAGAT . . . NS=1;AN=0 GT:PS ./.:. -1 758138 . CACGCTATCTA . . . NS=1;AN=0 GT:PS ./.:. -1 758202 . CAGGTCCACCCTATCTACACTACCTGCCTGCCCAGCAGA . . . NS=1;AN=0 GT:PS ./.:. -1 758262 . GCCTGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 758298 . CC . . . NS=1;AN=0 GT:PS ./.:. -1 758302 . C . . . NS=1;AN=0 GT:PS ./.:. -1 758305 . A . . . NS=1;AN=0 GT:PS ./.:. -1 758309 . G . . . NS=1;AN=0 GT:PS ./.:. -1 758315 . CCCTGTCCATACTACC CCCTGTCCACACTACC . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.103|rs3131955;CGA_FI=643837|NR_015368.1|LOC643837|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:758315:PASS:47,.:38,.:15,.:-47,0,0:-15,0,0:16:3,.:0 -1 758333 . CCT . . . NS=1;AN=0 GT:PS ./.:. -1 758339 . C . . . NS=1;AN=0 GT:PS ./.:. -1 758375 . A . . . NS=1;AN=0 GT:PS ./.:. -1 758378 . A . . . NS=1;AN=0 GT:PS ./.:. -1 758482 . CAGATCCACCCTGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 758585 . AG . . . NS=1;AN=0 GT:PS ./.:. -1 758607 . TACCT . . . NS=1;AN=0 GT:PS ./.:. -1 758626 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3131954;CGA_FI=643837|NR_015368.1|LOC643837|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:VQLOW:11:98,11:98,11:39,15:-98,-11,0:-39,-15,0:6:6,6:0 -1 758863 . C . . . NS=1;AN=0 GT:PS ./.:. -1 758866 . GCAGGTCC . . . NS=1;AN=0 GT:PS ./.:. -1 758881 . C . . . NS=1;AN=0 GT:PS ./.:. -1 758896 . G . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:758896:PASS:48,.:21,.:0,.:0:0 -1 758901 . G . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:758896:PASS:48,.:21,.:0,.:0:0 -1 758905 . ATC . . . NS=1;AN=0 GT:PS ./.:. -1 759146 . AAATTTTAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 759435 . GTTATATAAATATT . . . NS=1;AN=0 GT:PS ./.:. -1 759663 . CATCAAAAATGAGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 759686 . AAATTCATTTTACATATATGTCTATAAAATAATAAA . . . NS=1;AN=0 GT:PS ./.:. -1 759837 . T A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3115851&dbsnp.132|rs114111569;CGA_FI=643837|NR_015368.1|LOC643837|TSS-UPSTREAM|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:34:96,34:96,34:33,32:-96,-34,0:-33,-32,0:28:28,28:0 -1 760001 . A . . NS=1;CGA_WINEND=762000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.07:1.08:2:46:=:46:0.999:409 -1 760354 . GCACAGTT . . . NS=1;AN=0 GT:PS ./.:. -1 760418 . TTGCCTCAC . . . 
NS=1;AN=0 GT:PS ./.:. -1 760575 . TGGGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 760806 . ACATCCT . . . NS=1;AN=0 GT:PS ./.:. -1 760912 . C T . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.86|rs1048488;CGA_FI=643837|NR_015368.1|LOC643837|TSS-UPSTREAM|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:760912:PASS:833,.:794,.:49,.:-833,0,0:-49,0,0:50:35,.:15 -1 760913 . G . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:760912:PASS:841,.:801,.:0,.:0:0 -1 760954 . TTCATAA . . . NS=1;AN=0 GT:PS ./.:. -1 761147 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3115850;CGA_FI=643837|NR_015368.1|LOC643837|TSS-UPSTREAM|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:39:39,270:39,270:29,34:-270,-39,0:-34,-29,0:54:45,45:9 -1 761257 . CCTATCCACTGCCTTGTGTCAGTATGTGTGTGTCTTGGGGCCG . . . NS=1;AN=0 GT:PS ./.:. -1 761732 . C . . . NS=1;AN=0 GT:PS ./.:. -1 761752 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.86|rs1057213;CGA_FI=643837|NR_015368.1|LOC643837|TSS-UPSTREAM|UNKNOWN-INC&79854|NR_024321.1|NCRNA00115|UTR|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:48:97,48:96,67:23,38:-97,-48,0:-38,-23,0:38:38,38:0 -1 761800 . A T . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.130|rs71507460&dbsnp.86|rs1064272;CGA_FI=643837|NR_015368.1|LOC643837|TSS-UPSTREAM|UNKNOWN-INC&79854|NR_024321.1|NCRNA00115|UTR|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:70,.:70,.:9,.:-70,0,0:-9,0,0:57:57,.:0 -1 761811 . G A . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.129|rs55884753&dbsnp.86|rs1057212;CGA_FI=643837|NR_015368.1|LOC643837|TSS-UPSTREAM|UNKNOWN-INC&79854|NR_024321.1|NCRNA00115|UTR|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:VQLOW:31,.:31,.:4,.:-31,0,0:-4,0,0:59:59,.:0 -1 761957 . A AT . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.129|rs59038458&dbsnp.130|rs70949521;CGA_FI=643837|NR_015368.1|LOC643837|TSS-UPSTREAM|UNKNOWN-INC&79854|NR_024321.1|NCRNA00115|UTR|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:40,.:40,.:5,.:-40,0,0:-5,0,0:28:28,.:0 -1 762001 . T . . NS=1;CGA_WINEND=764000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.67:1.01:.:0:.:0:0.999:409 -1 762269 . A . . . NS=1;AN=0 GT:PS ./.:. -1 762273 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3115849;CGA_FI=100506327|XR_108281.1|LOC100506327|TSS-UPSTREAM|UNKNOWN-INC&643837|NR_015368.1|LOC643837|TSS-UPSTREAM|UNKNOWN-INC&79854|NR_024321.1|NCRNA00115|UTR|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:29:56,29:56,29:12,26:-56,-29,0:-26,-12,0:34:34,34:0 -1 762317 . AGTCACCGCTAGTGGGAGGCGATTGTGCAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 762589 . G . . . NS=1;AN=0 GT:PS ./.:. -1 762592 . C . . . NS=1;AN=0 GT:PS ./.:. -1 762601 . T C . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.103|rs3131949&dbsnp.130|rs71507463;CGA_FI=100506327|XR_108281.1|LOC100506327|TSS-UPSTREAM|UNKNOWN-INC&643837|NR_015368.1|LOC643837|TSS-UPSTREAM|UNKNOWN-INC&79854|NR_024321.1|NCRNA00115|UTR|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:VQLOW:24,.:24,.:3,.:-24,0,0:-3,0,0:19:19,.:0 -1 762628 . AGGGTGGA . . . NS=1;AN=0 GT:PS ./.:. -1 762696 . A . . . NS=1;AN=0 GT:PS ./.:. -1 762784 . AGC . . . NS=1;AN=0 GT:PS ./.:. -1 762818 . G C . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.135|rs188947985;CGA_FI=100506327|XR_108281.1|LOC100506327|TSS-UPSTREAM|UNKNOWN-INC&643837|NR_015368.1|LOC643837|TSS-UPSTREAM|UNKNOWN-INC&79854|NR_024321.1|NCRNA00115|UTR|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:762818:PASS:167:204,167:205,168:50,49:-204,0,-167:-50,0,-49:20:11,9:9 -1 762856 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.134|rs142969342;CGA_FI=100506327|XR_108281.1|LOC100506327|TSS-UPSTREAM|UNKNOWN-INC&643837|NR_015368.1|LOC643837|TSS-UPSTREAM|UNKNOWN-INC&79854|NR_024321.1|NCRNA00115|UTR|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:762818:PASS:151:151,189:151,189:45,50:-151,0,-189:-45,0,-50:21:8,13:13 -1 762932 . G . . . NS=1;AN=0 GT:PS ./.:. -1 762964 . GGCGGGGCCGGGCCGGGCCGGGGCGG . . . NS=1;AN=0 GT:PS ./.:. -1 763012 . CTTACCGACCTCCCGCCCCCGCTGCGCGCGTTTCTGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 763185 . GGGCGGTGCTGCTCCCGAGTCGGCGCGCGGCGGGGACGCGAG . . . NS=1;AN=0 GT:PS ./.:. -1 763254 . C . . . NS=1;AN=0 GT:PS ./.:. -1 763263 . A . . . NS=1;AN=0 GT:PS ./.:. -1 763267 . T . . . NS=1;AN=0 GT:PS ./.:. -1 763289 . TGGGTAGCAGCCTCTTCGGCCCCAC . . . NS=1;AN=0 GT:PS ./.:. -1 763319 . GTGACGCGCGCTCGGGCTCCGCGTTCGCGTCGAGGCAGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 763387 . A . . . NS=1;AN=0 GT:PS ./.:. -1 763394 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.103|rs3115847&dbsnp.126|rs35782521;CGA_FI=100506327|XR_108281.1|LOC100506327|TSS-UPSTREAM|UNKNOWN-INC&643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC&79854|NR_024321.1|NCRNA00115|TSS-UPSTREAM|UNKNOWN-INC;CGA_SDO=2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:763394:VQLOW:21:21,21:16,16:0,21:-21,0,-21:0,0,-21:5:2,3:3 -1 763410 . CCGCCCTGGACAC . . . NS=1;AN=0 GT:PS ./.:. -1 763459 . CCTATTTTCAACCTGTCCTGCTCCGCACCTGAGATGATTTATAA . . . NS=1;AN=0 GT:PS ./.:. -1 763532 . ACATCCC . . . NS=1;AN=0 GT:PS ./.:. -1 763583 . C . . . NS=1;AN=0 GT:PS ./.:. -1 763629 . TA . . . NS=1;AN=0 GT:PS ./.:. -1 763633 . TAT . . . NS=1;AN=0 GT:PS ./.:. -1 763652 . TTTGTCTCCAGTACATATAATGAGGCTTGTA . . . NS=1;AN=0 GT:PS ./.:. -1 763765 . AAGGATTTTTTTTTTTAAATGAA . . . NS=1;AN=0 GT:PS ./.:. -1 763869 . TCATTTTTATT . . . NS=1;AN=0 GT:PS ./.:. -1 764001 . G . . NS=1;CGA_WINEND=766000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.22:1.05:.:0:.:0:0.999:409 -1 764259 . AGTCACAGAGTATAGTGAAGTTAAGATAGTTTATTAGCAACT . . . NS=1;AN=0 GT:PS ./.:. -1 764646 . TAATGTT . . . NS=1;AN=0 GT:PS ./.:. -1 764847 . ATGTGTG . . . NS=1;AN=0 GT:PS ./.:. -1 764902 . TGTATTG . . . NS=1;AN=0 GT:PS ./.:. -1 764941 . TATATTG . . . NS=1;AN=0 GT:PS ./.:. -1 764993 . TGTGCAGCATCTGAGCTCCGTCTTCACCTTAATCCCGAA . . . NS=1;AN=0 GT:PS ./.:. -1 765079 . CACTTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 765161 . AACAGTTTCTCTGGTTATAGAATATATTTTGG . . . NS=1;AN=0 GT:PS ./.:. -1 765345 . TTCATTTT . . . NS=1;AN=0 GT:PS ./.:. -1 765496 . CAAGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 765933 . TTATACT . . . NS=1;AN=0 GT:PS ./.:. -1 766001 . T . . NS=1;CGA_WINEND=768000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.70:0.69:.:0:.:0:0.999:409 -1 766102 . CAGTTCT . . . NS=1;AN=0 GT:PS ./.:. -1 766203 . CCCGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 766294 . CAGTCTCCCTTGGCAGCTCTCAGCTGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 766593 . CTGCTTTG . . . NS=1;AN=0 GT:PS ./.:. -1 766972 . TCTGCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 767038 . G . . . NS=1;AN=0 GT:PS ./.:. -1 767076 . TGTGTTTC . . . NS=1;AN=0 GT:PS ./.:. -1 767348 . TGTCAGA . . . NS=1;AN=0 GT:PS ./.:. -1 767780 . G A . . 
NS=1;AN=2;AC=2;CGA_XR=dbsnp.101|rs2905042;CGA_FI=100506327|XR_108281.1|LOC100506327|INTRON|UNKNOWN-INC&643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC&79854|NR_024321.1|NCRNA00115|TSS-UPSTREAM|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:39:39,252:39,252:36,52:-252,-39,0:-52,-36,0:21:20,20:1 -1 768001 . G . . NS=1;CGA_WINEND=770000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.76:0.80:.:0:.:0:0.999:409 -1 768113 . GTAAGTTTTGTTTTGTTTTGTTTTGTTTTGTTTTGTTTTGTTTTGTTTTAGAT . . . NS=1;AN=0 GT:PS ./.:. -1 768241 . T . . . NS=1;AN=0 GT:PS ./.:. -1 768253 . A C . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.101|rs2977608;CGA_FI=100506327|XR_108281.1|LOC100506327|INTRON|UNKNOWN-INC&643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_RPT=AluSx|Alu|16.1;CGA_SDO=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:768253:PASS:118,.:107,.:37,.:-118,0,0:-37,0,0:23:14,.:9 -1 768349 . GGTGTCCAACTCCTGA . . . NS=1;AN=0 GT:PS ./.:. -1 768553 . CCCACCT . . . NS=1;AN=0 GT:PS ./.:. -1 768592 . CCATTTT . . . NS=1;AN=0 GT:PS ./.:. -1 768630 . TTTTCTACATTCCTAACTTACTTTCCAGGGGATCGTT . . . NS=1;AN=0 GT:PS ./.:. -1 768726 . TGCCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 768796 . GG . . . NS=1;AN=0 GT:PS ./.:. -1 768910 . ATTTGTA . . . NS=1;AN=0 GT:PS ./.:. -1 769237 . CAGGCCCTCATGTACGTCCAGGATGCGGTGACAGCGAG . . . NS=1;AN=0 GT:PS ./.:. -1 769308 . TGAAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 769357 . ATCCGGC . . . NS=1;AN=0 GT:PS ./.:. -1 769548 . GTTCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 769632 . GCACCAGCAGCAAACA . . . NS=1;AN=0 GT:PS ./.:. -1 769829 . C A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.101|rs2977607;CGA_FI=100506327|XR_108281.1|LOC100506327|INTRON|UNKNOWN-INC&643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:107:470,107:470,107:32,36:-470,-107,0:-36,-32,0:80:78,78:2 -1 769866 . GCAAGATGTTCCTGCTTCCGCGC . . . NS=1;AN=0 GT:PS ./.:. -1 769959 . TGACGGT . . . NS=1;AN=0 GT:PS ./.:. -1 770001 . C . . NS=1;CGA_WINEND=772000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.00:1.25:.:0:.:0:0.999:409 -1 770075 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.101|rs2977606&dbsnp.131|rs77324267;CGA_FI=100506327|XR_108281.1|LOC100506327|INTRON|UNKNOWN-INC&643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:63:63,303:63,303:37,53:-303,-63,0:-53,-37,0:35:35,35:0 -1 770125 . TTACGCA . . . NS=1;AN=0 GT:PS ./.:. -1 770297 . ACCTATCCATCCACCCATCTATCCAACCCTCCCTCCCTCCATCCACCCATCC . . . NS=1;AN=0 GT:PS ./.:. -1 770412 . TCTACTT . . . NS=1;AN=0 GT:PS ./.:. -1 770568 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3131943&dbsnp.131|rs78979757;CGA_FI=100506327|XR_108281.1|LOC100506327|INTRON|UNKNOWN-INC&643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:36:36,126:36,126:34,41:-126,-36,0:-41,-34,0:25:25,25:0 -1 770735 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.134|rs139531586;CGA_FI=100506327|XR_108281.1|LOC100506327|INTRON|UNKNOWN-INC&643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:60:73,60:72,59:11,24:-73,0,-60:-11,0,-24:40:18,22:22 -1 770847 . GCAGGGC . . . NS=1;AN=0 GT:PS ./.:. -1 770897 . CCACGGCCACGATGGTGTGCCTCCGAAGCCACGCGG . . . NS=1;AN=0 GT:PS ./.:. -1 771083 . GAGTTGG . . . NS=1;AN=0 GT:PS ./.:. -1 771150 . TGAGTCA . . . NS=1;AN=0 GT:PS ./.:. -1 771316 . CAGTCCAATCCATGGGGAAGGTGTGAATTAAAGACCCTG . . . NS=1;AN=0 GT:PS ./.:. 
-1 771410 . C T . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.100|rs2519006&dbsnp.134|rs142008205;CGA_FI=100506327|XR_108281.1|LOC100506327|INTRON|UNKNOWN-INC&643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:VQLOW:33,.:33,.:4,.:-33,0,0:-4,0,0:58:22,.:36 -1 771665 . CCAATAAGTCCCTGGCTCCATGTG . . . NS=1;AN=0 GT:PS ./.:. -1 771823 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.101|rs2977605&dbsnp.131|rs74599385;CGA_FI=100506327|XR_108281.1|LOC100506327|INTRON|UNKNOWN-INC&643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_RPT=LTR16C|ERVL|52.2;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:33:33,102:33,102:31,35:-102,-33,0:-35,-31,0:25:24,24:1 -1 771854 . CCAGGGGCCAGTGCCCCCAGTT . . . NS=1;AN=0 GT:PS ./.:. -1 771947 . TCCCGGG . . . NS=1;AN=0 GT:PS ./.:. -1 772001 . T . . NS=1;CGA_WINEND=774000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.99:1.13:2:49:=:49:0.999:409 -1 772043 . GCACTCT . . . NS=1;AN=0 GT:PS ./.:. -1 772430 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs79479693&dbsnp.134|rs145794890;CGA_FI=100506327|XR_108281.1|LOC100506327|INTRON|UNKNOWN-INC&643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_RPT=FLAM_C|Alu|17.4;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:101:179,101:179,101:33,33:-179,0,-101:-33,0,-33:40:18,22:22 -1 772734 . G . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:772734:PASS:308,.:274,.:0,.:0:0 -1 772755 . A C . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.101|rs2905039;CGA_FI=100506327|XR_108281.1|LOC100506327|INTRON|UNKNOWN-INC&643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_RPT=MER58A|hAT-Charlie|26.3;CGA_SDO=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:772734:PASS:224,.:211,.:45,.:-224,0,0:-45,0,0:33:23,.:0 -1 772764 . C . . . NS=1;AN=0 GT:PS ./.:. -1 772782 . C . . . NS=1;AN=0 GT:PS ./.:. -1 772978 . CTAGAAA . . . NS=1;AN=0 GT:PS ./.:. -1 773885 . AGCAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 774001 . A . . NS=1;CGA_WINEND=776000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.99:1.07:2:49:=:49:0.999:409 -1 774363 . A G . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.120|rs12563150&dbsnp.131|rs76662252;CGA_FI=100506327|XR_108281.1|LOC100506327|INTRON|UNKNOWN-INC&643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_RPT=L1M4b|L1|50.3;CGA_SDO=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:56,.:56,.:7,.:-56,0,0:-7,0,0:56:20,.:36 -1 774419 . GAAAGAACCAGCAGACTT . . . NS=1;AN=0 GT:PS ./.:. -1 774568 . AAAGTTCCAAGCAGAAAAAATGTC . . . NS=1;AN=0 GT:PS ./.:. -1 774715 . G . . . NS=1;AN=0 GT:PS ./.:. -1 774722 . A . . . NS=1;AN=0 GT:PS ./.:. -1 774729 . A . . . NS=1;AN=0 GT:PS ./.:. -1 774736 . A . . . NS=1;AN=0 GT:PS ./.:. -1 774785 . G . . . NS=1;AN=0 GT:PS ./.:. -1 774801 . AC . . . NS=1;AN=0 GT:PS ./.:. -1 774839 . CTGGACACACACACCTAGACACACACACCTGGACAAACACACCTGGACACACACACCTAGACACACACACCTGGACACACACACGTAGACACACACACCTAGAGACACACACCTGGACACACACACCTAGACACACACACCTGGA . . . NS=1;AN=0 GT:PS ./.:. -1 775092 . CTCACCA . . . NS=1;AN=0 GT:PS ./.:. -1 775252 . G . . . NS=1;AN=0 GT:PS ./.:. -1 775255 . AT . . . NS=1;AN=0 GT:PS ./.:. -1 775278 . C . . . NS=1;AN=0 GT:PS ./.:. -1 775423 . G . . . NS=1;AN=0 GT:PS ./.:. -1 775426 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.101|rs2905037;CGA_FI=100506327|XR_108281.1|LOC100506327|INTRON|UNKNOWN-INC&643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_RPT=L1M4b|L1|50.3;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|1:775426:PASS:27:27,77:34,51:27,8:-77,-27,0:-27,-8,0:40:40,40:0 -1 775429 . A . . . 
NS=1;AN=0 GT:PS ./.:. -1 775570 . GGAGTAAGGAAATGGCCACATATTGTATAATCCCATTTATATGAA . . . NS=1;AN=0 GT:PS ./.:. -1 775637 . C . . . NS=1;AN=0 GT:PS ./.:. -1 775648 . A . . . NS=1;AN=0 GT:PS ./.:. -1 775659 . A . . . NS=1;AN=0 GT:PS ./.:. -1 775683 . T . . . NS=1;AN=0 GT:PS ./.:. -1 775779 . ATT . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:775779:PASS:77,.:7,.:0,.:0:0 -1 775790 . AAA AA . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.129|rs55687311&dbsnp.134|rs145511843;CGA_FI=100506327|XR_108281.1|LOC100506327|INTRON|UNKNOWN-INC&643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_RPT=(A)n|Simple_repeat|12.9;CGA_SDO=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:775779:VQLOW:31,.:7,.:0,.:-31,0,0:0,0,0:44:13,.:1 -1 776001 . T . . NS=1;CGA_WINEND=778000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.05:1.10:2:49:=:49:0.999:409 -1 776170 . TTTTCCT . . . NS=1;AN=0 GT:PS ./.:. -1 776232 . AACGCGTTGGTGAGGT . . . NS=1;AN=0 GT:PS ./.:. -1 776288 . GTAGGTA . . . NS=1;AN=0 GT:PS ./.:. -1 776455 . TGCCTCT . . . NS=1;AN=0 GT:PS ./.:. -1 776804 . GCCAAGGCCACCGTCAGCCA . . . NS=1;AN=0 GT:PS ./.:. -1 776854 . TCTCTGA . . . NS=1;AN=0 GT:PS ./.:. -1 777122 . A T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.101|rs2980319;CGA_FI=100506327|XR_108281.1|LOC100506327|UTR|UNKNOWN-INC&643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_RPT=MIRb|MIR|36.1;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:127:127,681:94,641:48,49:-681,-127,0:-49,-48,0:53:53,53:0 -1 777126 . C . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0/.:.:PASS:495,.:455,.:0,.:0:0 -1 777202 . A . . . NS=1;AN=0 GT:PS ./.:. -1 777856 . GCATAGG . . . NS=1;AN=0 GT:PS ./.:. -1 777992 . CAGGAGC . . . NS=1;AN=0 GT:PS ./.:. -1 778001 . G . . NS=1;CGA_WINEND=780000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.01:1.14:.:0:.:0:0.999:409 -1 778067 . CTGATCCTTCCTCCCA . . . NS=1;AN=0 GT:PS ./.:. -1 778124 . ACACCTCCTGCTGCC . . . NS=1;AN=0 GT:PS ./.:. -1 778204 . TGTGTCACT . . . NS=1;AN=0 GT:PS ./.:. -1 778302 . C CCT . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.129|rs58115377&dbsnp.132|rs112119688;CGA_FI=100506327|XR_108281.1|LOC100506327|UTR|UNKNOWN-INC&643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:VQLOW:16:16,32:16,34:18,5:-32,-16,0:-18,-5,0:19:18,18:1 -1 778569 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.101|rs2977615;CGA_FI=100506327|XR_108281.1|LOC100506327|UTR|UNKNOWN-INC&643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:43:43,156:43,156:39,46:-156,-43,0:-46,-39,0:29:29,29:0 -1 778836 . G . . . NS=1;AN=0 GT:PS ./.:. -1 779310 . TGA T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs80302052;CGA_FI=643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:264:264,342:302,342:35,42:-264,0,-342:-35,0,-42:41:20,21:21 -1 779356 . C T . . NS=1;AN=2;AC=1;CGA_FI=643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:103:238,103:238,103:22,24:-238,0,-103:-22,0,-24:75:34,41:41 -1 779911 . G GT . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.132|rs112437059;CGA_FI=643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:96:195,96:195,96:39,33:-195,0,-96:-39,0,-33:28:14,14:14 -1 780001 . G . . NS=1;CGA_WINEND=782000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.03:1.02:2:50:=:50:0.999:409 -1 780027 . 
G T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.101|rs2977613;CGA_FI=643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:95:95,787:95,787:48,53:-787,-95,0:-53,-48,0:35:35,35:0 -1 780229 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs12565032;CGA_FI=643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:163:186,163:184,161:42,42:-186,0,-163:-42,0,-42:32:22,10:10 -1 780559 . T . . . NS=1;AN=0 GT:PS ./.:. -1 780785 . T A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.101|rs2977612;CGA_FI=643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:427:427,427:426,426:50,54:-427,0,-427:-50,0,-54:46:22,24:24 -1 781640 . ACCAACA . . . NS=1;AN=0 GT:PS ./.:. -1 781686 . CAGGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 781929 . TACATGG . . . NS=1;AN=0 GT:PS ./.:. -1 782001 . A . . NS=1;CGA_WINEND=784000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.21:1.09:2:49:=:49:0.999:409 -1 782051 . TGATATT . . . NS=1;AN=0 GT:PS ./.:. -1 782470 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs76135949;CGA_FI=643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_RPT=MER103C|hAT-Charlie|32.0;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:197:197,197:196,196:20,30:-197,0,-197:-20,0,-30:60:25,35:35 -1 782528 . T . . . NS=1;AN=0 GT:PS ./.:. -1 782553 . C . . . NS=1;AN=0 GT:PS ./.:. -1 782677 . G T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.134|rs140041012;CGA_FI=643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:107:240,107:240,107:31,34:-240,0,-107:-31,0,-34:54:24,30:30 -1 782830 . GCCCGGCACAAGGTAGGAACTGAGTGTGAGTGCGGGATGCA . . . NS=1;AN=0 GT:PS ./.:. -1 782981 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs6594026;CGA_FI=643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:123:123,123:123,123:13,25:-123,0,-123:-13,0,-25:67:38,29:29 -1 783304 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.101|rs2980295&dbsnp.131|rs75216674;CGA_FI=643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_RPT=MIR|MIR|37.7;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:92:627,92:627,92:50,47:-627,-92,0:-50,-47,0:42:41,41:1 -1 784001 . G . . NS=1;CGA_WINEND=786000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.97:1.08:2:49:=:49:0.999:409 -1 784020 . CAGATAA . . . NS=1;AN=0 GT:PS ./.:. -1 785050 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.101|rs2905062;CGA_FI=643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_RPT=MSTD|ERVL-MaLR|31.4;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:286:396,286:396,285:53,52:-396,0,-286:-53,0,-52:36:19,17:17 -1 785340 . A . . . NS=1;AN=0 GT:PS ./.:. -1 785440 . C . . . NS=1;AN=0 GT:PS ./.:. -1 785734 . AAGTCCC . . . NS=1;AN=0 GT:PS ./.:. -1 785989 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.101|rs2980300;CGA_FI=643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:177:177,177:174,174:41,44:-177,0,-177:-41,0,-44:31:12,19:19 -1 786001 . A . . NS=1;CGA_WINEND=788000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.82:1.04:.:0:.:0:0.999:409 -1 786038 . TAT . . . NS=1;AN=0 GT:PS ./.:. -1 786069 . A . . . NS=1;AN=0 GT:PS ./.:. -1 786096 . TACATGTGCCATGCTGGTGCGCTGCACCCACTAACTCGTC . . . NS=1;AN=0 GT:PS ./.:. -1 786163 . CTATCCCTCCCCCCTCC . . . NS=1;AN=0 GT:PS ./.:. 
-1 786185 . CCCCACAACAGTCCCCAGAGTGTGATGTTCCCCTTCCTGTGTCCATGTGTTCTCATTGTTCAGTTCCCACC . . . NS=1;AN=0 GT:PS ./.:. -1 786346 . GGACATGAACTCATCATTT . . . NS=1;AN=0 GT:PS ./.:. -1 786435 . ATTCGGG . . . NS=1;AN=0 GT:PS ./.:. -1 786730 . CTCAGCCAGCAGGGTTGCCCAGTGCCCCTTGTCACCC . . . NS=1;AN=0 GT:PS ./.:. -1 786787 . CTGCGGTTACTCTGGGTCTGTGCGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 787019 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.101|rs2905060&dbsnp.129|rs56289866;CGA_FI=643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:34:206,34:206,34:44,27:-206,-34,0:-44,-27,0:30:27,27:3 -1 787066 . AAACAGGGAAAATGTCTTGGA . . . NS=1;AN=0 GT:PS ./.:. -1 787121 . T . . . NS=1;AN=0 GT:PS ./.:. -1 787135 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.125|rs28753393;CGA_FI=643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|1:787135:PASS:36:100,36:115,34:19,27:-100,-36,0:-27,-19,0:42:42,42:0 -1 787151 . G . . . NS=1;AN=0 GT:PS ./.:. -1 787185 . G . . . NS=1;AN=0 GT:PS ./.:. -1 787205 . G A . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.101|rs2905057&dbsnp.129|rs55693913;CGA_FI=643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:105,.:105,.:13,.:-105,0,0:-13,0,0:54:54,.:0 -1 787262 . C G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.101|rs2905056&dbsnp.129|rs56108613;CGA_FI=643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:62:62,184:62,184:37,22:-184,-62,0:-37,-22,0:58:58,58:0 -1 787399 . G . . . NS=1;AN=0 GT:PS ./.:. -1 787675 . T . . . NS=1;AN=0 GT:PS ./.:. -1 787680 . G . . . NS=1;AN=0 GT:PS ./.:. -1 787685 . G T . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.101|rs2905054;CGA_FI=643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:318,.:318,.:38,.:-318,0,0:-38,0,0:55:29,.:26 -1 787717 . CGTTGGTTCCCAGTTGGCTTCCGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 787765 . CCAACAGGCTGGTGTTGAATCCC . . . NS=1;AN=0 GT:PS ./.:. -1 787844 . C T . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.101|rs2905053;CGA_FI=643837|NR_015368.1|LOC643837|INTRON|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:191,.:180,.:33,.:-191,0,0:-33,0,0:48:32,.:16 -1 788001 . A . . NS=1;CGA_WINEND=790000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.82:0.96:.:0:.:0:0.999:409 -1 788380 . TCCCCCA . . . NS=1;AN=0 GT:PS ./.:. -1 788560 . CTTGGAACTCCTGACCTCAAGTGATCTGCCCGCCTCGGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 788709 . TTGATTTT . . . NS=1;AN=0 GT:PS ./.:. -1 788837 . CCCTGACCCTGATCAACATGAGATGACCG . . . NS=1;AN=0 GT:PS ./.:. -1 788886 . CCCCGACCCTGATGAACGTGAGATGACCGCCGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 788931 . TGAACCCCGACCCTGATGAACGTGAGATGACCGCCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 788984 . CCCCGACCCTGATGAACGTGAGATGACCGCCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 789033 . CCCCGACCCTGATGAACGTGAGATGACCG . . . NS=1;AN=0 GT:PS ./.:. -1 789086 . GACCCTGATCAACGTGAGATGACCG . . . NS=1;AN=0 GT:PS ./.:. -1 789138 . CCTGATG . . . NS=1;AN=0 GT:PS ./.:. -1 789239 . G . . . NS=1;AN=0 GT:PS ./.:. -1 789242 . C . . . NS=1;AN=0 GT:PS ./.:. -1 789256 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3131939;CGA_FI=643837|NR_015368.1|LOC643837|UTR|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:25:117,25:117,25:42,25:-117,-25,0:-42,-25,0:19:18,18:1 -1 789848 . ATGCGTGCAGATGAGTGACGCAGTGCATCG . . . NS=1;AN=0 GT:PS ./.:. -1 789932 . CAGTGGG . . . 
NS=1;AN=0 GT:PS ./.:. -1 790001 . C . . NS=1;CGA_WINEND=792000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.03:1.14:.:0:.:0:0.999:409 -1 790033 . CACGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 790215 . ACACCGA . . . NS=1;AN=0 GT:PS ./.:. -1 790278 . AGGAGCA . . . NS=1;AN=0 GT:PS ./.:. -1 790381 . GGCATCA . . . NS=1;AN=0 GT:PS ./.:. -1 790696 . C CAT . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.129|rs56224346&dbsnp.131|rs76089329&dbsnp.134|rs146955212;CGA_RPT=(TATG)n|Simple_repeat|37.8;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:29:29,398:29,398:22,39:-398,-29,0:-39,-22,0:31:30,30:1 -1 790753 . CAT C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs74609473;CGA_RPT=(TG)n|Simple_repeat|35.6;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:790753:PASS:222:460,222:459,225:38,39:-460,0,-222:-38,0,-39:40:21,17:17 -1 790758 . GTA G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.102|rs3039191&dbsnp.130|rs72558500&dbsnp.134|rs137938478;CGA_RPT=(TG)n|Simple_repeat|35.6;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 0|1:790753:PASS:309:415,309:424,317:42,36:-309,0,-415:-36,0,-42:41:21,17:21 -1 790904 . GCACGTG . . . NS=1;AN=0 GT:PS ./.:. -1 790948 . GGATGT . . . NS=1;AN=0 GT:PS ./.:. -1 790958 . AGGC . . . NS=1;AN=0 GT:PS ./.:. -1 791315 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs77384149&dbsnp.132|rs116928318;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:791315:PASS:347:557,347:555,356:49,54:-557,0,-347:-49,0,-54:56:30,26:26 -1 791328 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs74877941;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:791315:PASS:271:521,271:518,278:49,52:-521,0,-271:-49,0,-52:50:29,21:21 -1 791389 . ACTCTAA . . . NS=1;AN=0 GT:PS ./.:. -1 791675 . GTGTTGT . . . NS=1;AN=0 GT:PS ./.:. -1 791984 . TCTTCCA . . . NS=1;AN=0 GT:PS ./.:. -1 792001 . G . . NS=1;CGA_WINEND=794000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.91:1.13:.:0:.:0:0.999:409 -1 792263 . A G . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.131|rs76190315&dbsnp.86|rs1044922;CGA_SDO=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:46,.:46,.:6,.:-46,0,0:-6,0,0:67:24,.:43 -1 792297 . GCTCACCTTCCTGCCTCAAGCCCCTCTCCCACGT . . . NS=1;AN=0 GT:PS ./.:. -1 792360 . CCCTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 792480 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.101|rs2905036;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:58:236,58:236,58:40,35:-236,-58,0:-40,-35,0:43:43,43:0 -1 793062 . GGGGAAATGAGAGGCCTGAGCG . . . NS=1;AN=0 GT:PS ./.:. -1 793145 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.101|rs2905030;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:VQLOW:38:38,66:57,57:7,24:-38,0,-66:-7,0,-24:55:43,12:12 -1 793205 . TGTCAGTTTAATTTTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 793325 . TTTCTTTTTTTTTTTCTTAAGGGGTCAGTTTTAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 793408 . TTTAACATAGAAATATCTGCCTTTA . . . NS=1;AN=0 GT:PS ./.:. -1 793466 . A . . . NS=1;AN=0 GT:PS ./.:. -1 793907 . ACTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 793915 . T . . . NS=1;AN=0 GT:PS ./.:. -1 793922 . G C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.134|rs145538528;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:793922:PASS:69:69,69:63,63:28,34:-69,0,-69:-28,0,-34:17:5,12:12 -1 794001 . G . . NS=1;CGA_WINEND=796000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.02:1.05:.:0:.:0:0.999:409 -1 794319 . G . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0/.:.:PASS:113,.:0,.:0,.:0:0 -1 794332 . G . . . 
NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0/.:.:PASS:51,.:0,.:0,.:0:0 -1 794575 . GAAGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 794613 . TACTGGTAACGTAG . . . NS=1;AN=0 GT:PS ./.:. -1 794874 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.134|rs146966553;CGA_RPT=L1ME4a|L1|37.6;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:55:157,55:157,55:48,33:-157,0,-55:-48,0,-33:18:9,9:9 -1 794901 . CATATTTTCAATAATTTCCATTATAATAATAATGTCATGAAT . . . NS=1;AN=0 GT:PS ./.:. -1 795555 . CATCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTATGGA . . . NS=1;AN=0 GT:PS ./.:. -1 795985 . TCCCTCCAT . . . NS=1;AN=0 GT:PS ./.:. -1 796001 . A . . NS=1;CGA_WINEND=798000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.30:0.91:.:0:.:0:0.999:409 -1 796111 . ACTCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 796373 . GGTGGGGGGGTTAATCTTTTAAC . . . NS=1;AN=0 GT:PS ./.:. -1 796531 . GACTTTA . . . NS=1;AN=0 GT:PS ./.:. -1 796714 . C . . . NS=1;AN=0 GT:PS ./.:. -1 796727 . G T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.101|rs2909612&dbsnp.132|rs115637794;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:95:255,95:254,124:42,52:-255,-95,0:-52,-42,0:43:43,43:0 -1 796767 . G A . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.120|rs12076540&dbsnp.131|rs75932129;CGA_SDO=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:796767:PASS:197,.:175,.:18,.:-197,0,0:-18,0,0:61:43,.:18 -1 796771 . T . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:796767:PASS:177,.:167,.:0,.:0:0 -1 797039 . A . . . NS=1;AN=0 GT:PS ./.:. -1 797297 . T . . . NS=1;AN=0 GT:PS ./.:. -1 797474 . G . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:797474:PASS:156,.:72,.:0,.:0:0 -1 797502 . GATAGAT . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:797474:PASS:156,.:72,.:0,.:0:0 -1 797541 . GAGATAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 797810 . GGCTGGA . . . NS=1;AN=0 GT:PS ./.:. -1 798001 . A . . NS=1;CGA_WINEND=800000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.25:1.07:.:0:.:0:0.999:409 -1 798026 . C . . . NS=1;AN=0 GT:PS ./.:. -1 798045 . C . . . NS=1;AN=0 GT:PS ./.:. -1 798245 . TGTGGGC . . . NS=1;AN=0 GT:PS ./.:. -1 798400 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs10900604;CGA_RPT=MLT1K|ERVL-MaLR|41.9;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:91:91,91:86,86:21,30:-91,0,-91:-21,0,-30:33:13,20:20 -1 798616 . GGGAGGGCAGATTCCCT . . . NS=1;AN=0 GT:PS ./.:. -1 798730 . TTGGATAAAGAGCAAGCAC . . . NS=1;AN=0 GT:PS ./.:. -1 798754 . GAGCCAC . . . NS=1;AN=0 GT:PS ./.:. -1 798798 . TGCGAGAGAAAGATGTGAGCAAATATCCGGA . . . NS=1;AN=0 GT:PS ./.:. -1 798872 . A G . . NS=1;AN=1;AC=1;CGA_SDO=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:VQLOW:38,.:38,.:6,.:-38,0,0:-6,0,0:41:19,.:22 -1 798959 . G . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0/.:.:VQLOW:30,.:29,.:0,.:0:0 -1 798999 . GACGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 799303 . ACAAGCA . . . NS=1;AN=0 GT:PS ./.:. -1 799447 . CCA . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0/.:.:PASS:428,.:363,.:0,.:0:0 -1 799463 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4245756;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:115:360,115:363,169:31,44:-360,-115,0:-44,-31,0:79:79,79:0 -1 799496 . AAATCTCCCCTAAGGAGGAGATACGACGTGTGCAGATTGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 799625 . GTGGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 799668 . CACCCTCGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 799791 . AGAATTGGTATGCCTTACCCA . . . NS=1;AN=0 GT:PS ./.:. -1 799883 . GAGGTGGTT . . . NS=1;AN=0 GT:PS ./.:. -1 800001 . C . . 
NS=1;CGA_WINEND=802000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.29:1.03:.:0:.:0:0.999:406 -1 800001 . C . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0/.:.:PASS:63,.:15,.:0,.:0:0 -1 800007 . T . . . NS=1;AN=0 GT:PS ./.:. -1 800102 . CTAACTA . . . NS=1;AN=0 GT:PS ./.:. -1 800191 . CTATCCCTAT . . . NS=1;AN=0 GT:PS ./.:. -1 800342 . CTGGCATCGATGACATGGGAACTA . . . NS=1;AN=0 GT:PS ./.:. -1 800383 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.111|rs4951931&dbsnp.131|rs76291278;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:67:179,67:179,67:22,26:-179,0,-67:-22,0,-26:54:27,27:27 -1 800691 . CCTCTTG . . . NS=1;AN=0 GT:PS ./.:. -1 800974 . GTGCTTG . . . NS=1;AN=0 GT:PS ./.:. -1 801706 . TGGAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 801943 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs7516866;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:283:507,283:502,237:43,33:-507,0,-283:-43,0,-33:66:35,31:31 -1 801957 . C . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0/.:.:PASS:547,.:502,.:0,.:0:0 -1 801983 . CCGTGCCCTCACGTGGTCCTCCCTCTGCACTCACATCCCTGACGTCCTCCCGTGCCCTCACGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 802001 . C . . NS=1;CGA_WINEND=804000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.08:0.97:.:0:.:0:0.999:406 -1 802090 . CACGTGGTCCTCCCTCTGCACTCACATCCCTGACGTCCTCCCGTGCCCTCACGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 802170 . GACGTCCTCCCGAGCCCTCACGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 802219 . GACGTCCTCCCGAGCCCTCACGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 802268 . GACGTCCTCCCGAGCCCTCACGTGGTCCTCCCTCTGCACTCACATCCCTGACGTCCTCCCGAGCCCTCACGTGGT . . . NS=1;AN=0 GT:PS ./.:. -1 802366 . GACATCCTCCCGTGCTCTC . . . NS=1;AN=0 GT:PS ./.:. -1 802415 . GACGTCCTCCCGTGCC . . . NS=1;AN=0 GT:PS ./.:. -1 802846 . AAAAGAC . . . NS=1;AN=0 GT:PS ./.:. -1 803045 . CAGCAGCA . . . NS=1;AN=0 GT:PS ./.:. -1 803062 . GTGTGCTCAGCAGCCA . . . NS=1;AN=0 GT:PS ./.:. -1 803081 . G . . . NS=1;AN=0 GT:PS ./.:. -1 803085 . CATCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 803417 . AGACGGC . . . NS=1;AN=0 GT:PS ./.:. -1 803544 . GGCACATATGGCATAGACA . . . NS=1;AN=0 GT:PS ./.:. -1 803859 . CAGATTT . . . NS=1;AN=0 GT:PS ./.:. -1 803896 . AAGTCTGGGATTCTTCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 804001 . A . . NS=1;CGA_WINEND=806000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.81:1.26:.:0:.:0:0.999:406 -1 804115 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.119|rs9725068&dbsnp.131|rs79338660;CGA_FI=284593|NR_027055.1|FAM41C|INTRON|UNKNOWN-INC;CGA_SDO=23 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:113:113,113:113,113:28,35:-113,0,-113:-28,0,-35:36:14,22:22 -1 804527 . A . . . NS=1;AN=0 GT:PS ./.:. -1 804537 . TTGTG . . . NS=1;AN=0 GT:PS ./.:. -1 804593 . TA T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs75244191;CGA_FI=284593|NR_027055.1|FAM41C|INTRON|UNKNOWN-INC;CGA_SDO=22 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:83:83,83:83,83:25,34:-83,0,-83:-25,0,-34:24:12,12:12 -1 804975 . AATTAAAAAAAAAAATCCG . . . NS=1;AN=0 GT:PS ./.:. -1 805202 . G . . . NS=1;AN=0 GT:PS ./.:. -1 805226 . C . . . NS=1;AN=0 GT:PS ./.:. -1 805230 . CGG . . . NS=1;AN=0 GT:PS ./.:. -1 805251 . CCGC . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:805251:PASS:59,.:33,.:0,.:0:0 -1 805436 . C . . . NS=1;AN=0 GT:PS ./.:. -1 805470 . C A . . NS=1;AN=2;AC=1;CGA_FI=284593|NR_027055.1|FAM41C|INTRON|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:805470:VQLOW:20:20,20:0,0:0,11:-20,0,-20:0,0,-11:11:1,10:10 -1 805477 . CG . . . NS=1;AN=0 GT:PS ./.:. 
-1 805485 . GC G . . NS=1;AN=2;AC=1;CGA_FI=284593|NR_027055.1|FAM41C|INTRON|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:805470:VQLOW:20:20,20:0,0:0,3:-20,0,-20:0,0,-3:15:1,14:14 -1 805491 . C . . . NS=1;AN=0 GT:PS ./.:. -1 805494 . A . . . NS=1;AN=0 GT:PS ./.:. -1 806001 . T . . NS=1;CGA_WINEND=808000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.08:1.01:.:0:.:0:0.999:406 -1 806367 . A . . . NS=1;AN=0 GT:PS ./.:. -1 806466 . TCTATTG . . . NS=1;AN=0 GT:PS ./.:. -1 806805 . CTTATCTAACATTTTTATGTGTTGCTTCTTCCAGTTTACTA . . . NS=1;AN=0 GT:PS ./.:. -1 807299 . AAGATTTTTTTTTTTTTTTTGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 807512 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.120|rs10751454;CGA_FI=284593|NR_027055.1|FAM41C|INTRON|UNKNOWN-INC;CGA_SDO=7 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:57:57,559:57,559:46,52:-559,-57,0:-52,-46,0:26:26,26:0 -1 807652 . ATAAATA . . . NS=1;AN=0 GT:PS ./.:. -1 807761 . C A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4951932;CGA_FI=284593|NR_027055.1|FAM41C|INTRON|UNKNOWN-INC;CGA_SDO=7 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:46:46,470:46,470:41,48:-470,-46,0:-48,-41,0:19:19,19:0 -1 807960 . T . . . NS=1;AN=0 GT:PS ./.:. -1 807983 . GAAGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 808001 . C . . NS=1;CGA_WINEND=810000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.80:1.08:.:0:.:0:0.999:406 -1 808010 . AGGGGGTTTCATTTGCTCCACCTGCAGCGAGGTTAGCCCGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 808073 . ATTCCTAACAGGGGAAGC . . . NS=1;AN=0 GT:PS ./.:. -1 808125 . TTGCTCCACCTGCAGTGAGGTCTGTTAGCCCATCTC . . . NS=1;AN=0 GT:PS ./.:. -1 808220 . AGGGGGTTTCATTTGCTCCACCTGCAGCGAGGTTAGCCCATCT . . . NS=1;AN=0 GT:PS ./.:. -1 808283 . ATTCCTAACAGGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 808309 . TGTGACTCTGGAGAAGGGGGTTTCATTTGCTCCACCTGCAGCGAG . . . NS=1;AN=0 GT:PS ./.:. -1 808430 . AGGGGGTTTCATTTGCTCCACCTGCAGCGAGGTTAGCCCGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 808517 . C . . . NS=1;AN=0 GT:PS ./.:. -1 808631 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs11240779;CGA_FI=284593|NR_027055.1|FAM41C|INTRON|UNKNOWN-INC;CGA_SDO=7 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:123:409,123:403,117:53,35:-409,0,-123:-53,0,-35:35:22,13:13 -1 808812 . GGTTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 808922 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs6594027;CGA_FI=284593|NR_027055.1|FAM41C|INTRON|UNKNOWN-INC;CGA_SDO=7 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:32:32,145:53,166:34,39:-145,-32,0:-39,-34,0:31:27,27:4 -1 808928 . C . . . NS=1;AN=0 GT:PS ./.:. -1 809107 . CGTGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 809164 . TGGTCCCGTTTCCCCGGCTGCATTTCTTCATGCCCGGCTTTGCCCCAC . . . NS=1;AN=0 GT:PS ./.:. -1 809236 . A . . . NS=1;AN=0 GT:PS ./.:. -1 809298 . GCCCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 809350 . GTCGCAT . . . NS=1;AN=0 GT:PS ./.:. -1 809378 . ACAAATGTTCAACATTCAAAATA . . . NS=1;AN=0 GT:PS ./.:. -1 809436 . AACTTAA . . . NS=1;AN=0 GT:PS ./.:. -1 809624 . GGCCGAG . . . NS=1;AN=0 GT:PS ./.:. -1 809729 . GAATTTT . . . NS=1;AN=0 GT:PS ./.:. -1 810001 . A . . NS=1;CGA_WINEND=812000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.75:0.99:.:0:.:0:0.999:406 -1 810286 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28410559;CGA_FI=284593|NR_027055.1|FAM41C|UTR|UNKNOWN-INC;CGA_SDO=7 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:420:420,420:420,420:50,54:-420,0,-420:-50,0,-54:44:21,23:23 -1 811242 . C . . END=811812;NS=1;AN=0 GT:PS ./.:. -1 811897 . C . . . NS=1;AN=0 GT:PS ./.:. -1 811919 . A . . . 
NS=1;AN=0 GT:PS ./.:. -1 812001 . G . . NS=1;CGA_WINEND=814000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.22:0.93:.:0:.:0:0.999:406 -1 812267 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs7541694;CGA_FI=284593|NR_027055.1|FAM41C|TSS-UPSTREAM|UNKNOWN-INC;CGA_SDO=6 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:78:783,78:783,78:53,42:-783,-78,0:-53,-42,0:34:34,34:0 -1 812284 . C G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs7545373;CGA_FI=284593|NR_027055.1|FAM41C|TSS-UPSTREAM|UNKNOWN-INC;CGA_SDO=6 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:92:682,92:682,92:53,47:-682,-92,0:-53,-47,0:36:36,36:0 -1 812732 . C . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:812732:PASS:502,.:454,.:0,.:0:0 -1 812750 . CT CC . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.111|rs4246500;CGA_FI=284593|NR_027055.1|FAM41C|TSS-UPSTREAM|UNKNOWN-INC;CGA_SDO=5 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:812732:PASS:548,.:518,.:50,.:-548,0,0:-50,0,0:45:34,.:0 -1 812762 . C . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:812732:PASS:360,.:323,.:0,.:0:0 -1 813219 . TGGTTGTTGGTGGCTCAGTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 813285 . TGAGAAATTG . . . NS=1;AN=0 GT:PS ./.:. -1 813314 . ATCCGTTGAAAAGGTGAGTAATGCTGGCAGTT . . . NS=1;AN=0 GT:PS ./.:. -1 813374 . ACTGGTGTTCAGAGGTGGATTTGGTTCCTTCCCAGCCTTTCCCGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 813436 . CATGTACTTTTGTGTGTAC . . . NS=1;AN=0 GT:PS ./.:. -1 813566 . GGGCATCAGGTACTTCCTA . . . NS=1;AN=0 GT:PS ./.:. -1 813613 . C . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:813613:PASS:55,.:17,.:0,.:0:0 -1 813624 . C . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:813613:PASS:83,.:17,.:0,.:0:0 -1 813639 . CA . . . NS=1;AN=0 GT:PS .|.:813613 -1 813649 . C . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:813613:PASS:83,.:17,.:0,.:0:0 -1 813747 . G A . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.111|rs4970388;CGA_FI=284593|NR_027055.1|FAM41C|TSS-UPSTREAM|UNKNOWN-INC;CGA_SDO=5 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:49,.:49,.:8,.:-49,0,0:-8,0,0:45:45,.:0 -1 813816 . AACATTTGGGCCCTGAGAAACGGC . . . NS=1;AN=0 GT:PS ./.:. -1 813861 . TATGACA . . . NS=1;AN=0 GT:PS ./.:. -1 813944 . TCAGAGAGAAAGCGGAG . . . NS=1;AN=0 GT:PS ./.:. -1 813971 . TAATGACAGCCTGAAATTATTTGAGTCAG . . . NS=1;AN=0 GT:PS ./.:. -1 814001 . T . . NS=1;CGA_WINEND=816000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:2.33:2.65:.:0:.:0:0.999:406 -1 814024 . TTCTGTCTCCTTGTGCATCACCTGCGCAGCTC . . . NS=1;AN=0 GT:PS ./.:. -1 814074 . TGTATACAGGTAGCTGTGTTACCCTCCTAGCCGCT . . . NS=1;AN=0 GT:PS ./.:. -1 814144 . GCAGTTAAGGTCTAGGCTCATGGGAGGACAGTT . . . NS=1;AN=0 GT:PS ./.:. -1 814212 . CATACCTGCCTTCCTCCATCTG . . . NS=1;AN=0 GT:PS ./.:. -1 814241 . AGTAGGGGATGGGAGGTCTCACACTGAC . . . NS=1;AN=0 GT:PS ./.:. -1 814294 . CTCTGCTTTCCCAGACAGCCCCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 814368 . TGGGTGTTTGTTTATAAATTATTCCCCTGGAGGGGAATAAA . . . NS=1;AN=0 GT:PS ./.:. -1 814439 . AAACATCTAAGCCTGGCCCTTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 814487 . AAACTTACCATT . . . NS=1;AN=0 GT:PS ./.:. -1 814606 . CCAAAACCA . . . NS=1;AN=0 GT:PS ./.:. -1 814685 . ATGCCTCAAAGTCAAAAGTCAACAGGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 814728 . CAATATA . . . NS=1;AN=0 GT:PS ./.:. -1 814767 . ATATACACACACAC . . . NS=1;AN=0 GT:PS ./.:. -1 814784 . T . . . NS=1;AN=0 GT:PS ./.:. -1 814788 . TACAT . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:814788:PASS:47,.:34,.:0,.:0:0 -1 814795 . G . . . NS=1;AN=0 GT:PS ./.:. -1 814800 . T . . . NS=1;AN=0 GT:PS ./.:. -1 814813 . A . . . 
NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:814788:PASS:47,.:34,.:0,.:0:0 -1 814822 . ACCATTTAAAAAATACCATCCTTTCCCCATTGAATAGTGTTGACTCCC . . . NS=1;AN=0 GT:PS ./.:. -1 814885 . ACCGTGTTTTTCGGTTCTTTATTTCTATCGCATTGGTCTTTATGT . . . NS=1;AN=0 GT:PS ./.:. -1 814998 . GTTTGTCCTCTAACT . . . NS=1;AN=0 GT:PS ./.:. -1 815036 . GGCTACTTAGGATTTTTTTAGA . . . NS=1;AN=0 GT:PS ./.:. -1 815074 . GAATAGGTTTTTCTATTTTTGAATATATTGGAATTTTTATAGT . . . NS=1;AN=0 GT:PS ./.:. -1 815135 . GATCGCTATAGATAACAATGGCATCTTGACAAGGTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 815178 . CCAGTCCATAAACACATGATGTCTTTTCATTTA . . . NS=1;AN=0 GT:PS ./.:. -1 815226 . ATACTTTTCTGCCATGTTTATAG . . . NS=1;AN=0 GT:PS ./.:. -1 815278 . AAGTATTTTATTCTTTTGATGCTATCATACATGATACTGTT . . . NS=1;AN=0 GT:PS ./.:. -1 815334 . TCAGATAGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 815370 . TTTCTGTGTATTGATTTTGTATCCTGCAACTT . . . NS=1;AN=0 GT:PS ./.:. -1 815417 . ATTGTATCT . . . NS=1;AN=0 GT:PS ./.:. -1 815449 . AAAGATTTTTAATATATAAGGTTA . . . NS=1;AN=0 GT:PS ./.:. -1 815489 . ATAATTTTACTTTTTAAAAAATTGGAATATCCTTTATTCT . . . NS=1;AN=0 GT:PS ./.:. -1 815538 . CTTATTGTTTTAACTAACTAGAACCTTCAGTACTACATTAAATAGAAGT . . . NS=1;AN=0 GT:PS ./.:. -1 815603 . CTTGTTTTCGCTCTGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 815661 . TG . . . NS=1;AN=0 GT:PS ./.:. -1 815668 . G . . . NS=1;AN=0 GT:PS ./.:. -1 815676 . TTGGGGTT . . . NS=1;AN=0 GT:PS ./.:. -1 815695 . A . . . NS=1;AN=0 GT:PS ./.:. -1 815704 . TAAGGTGTTTTCTTTCTTTTTATAATTTATTAAGTACATT . . . NS=1;AN=0 GT:PS ./.:. -1 815751 . GAATGTGGGT . . . NS=1;AN=0 GT:PS ./.:. -1 815778 . TTTTTCTTTAAGATGATCACATGAGGTTTTTTCCTTCATTA . . . NS=1;AN=0 GT:PS ./.:. -1 815833 . TACACTGATTTTCATGTGTTAGAACATACTTTTATTTCAGGAGTCAGTTA . . . NS=1;AN=0 GT:PS ./.:. -1 815889 . TTCATAGTGTATAATCCTTTTAATGTACTGCTAAATTTGAATTG . . . NS=1;AN=0 GT:PS ./.:. -1 815959 . TCAACATTTGTAA . . . NS=1;AN=0 GT:PS ./.:. -1 815989 . TTTTCTTATGGTGCCTTTGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 816001 . G . . NS=1;CGA_WINEND=818000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:3.93:2.74:.:0:.:0:0.999:406 -1 816039 . ATATAATAAGTTAGAAAATGTTACCTCCTTTTC . . . NS=1;AN=0 GT:PS ./.:. -1 816098 . CTGGGTTAATTCTGCTTTAAACGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 816138 . GTGTAGCCATCTGGTCCAGGCTTTTCTTTGTTGCTGGGTTTTTTATTA . . . NS=1;AN=0 GT:PS ./.:. -1 816200 . CTGCTGAATCTCCTTGCTCAAT . . . NS=1;AN=0 GT:PS ./.:. -1 816289 . TTAGGTTATTCAATTTTTTTAGTGTATAATTC . . . NS=1;AN=0 GT:PS ./.:. -1 816331 . TCTTCTACATCCTTTTTTTACTCCAAAAGTTTGTTAGTTATG . . . NS=1;AN=0 GT:PS ./.:. -1 816380 . TTTATTTTTGAGTTTGGTAATTTGAGTATTCCCTTTTTTTCTTAGTCAATCTA . . . NS=1;AN=0 GT:PS ./.:. -1 816449 . TTTTTATCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 816471 . AACTGGGTTTTGTTGGTTTTTGATATCTTTTTCTATTCTCTATTTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 816539 . ATCATTTTTAAAATTTTGCTAGCTTTTAGTTGTCC . . . NS=1;AN=0 GT:PS ./.:. -1 816594 . TTCCCTCTGTTTTTCTTTC . . . NS=1;AN=0 GT:PS ./.:. -1 816632 . ATTCGGTATCTTATTTTTTATAATCATTTA . . . NS=1;AN=0 GT:PS ./.:. -1 816674 . TTTCCCGTGTGGTACTGTTTTTGATG . . . NS=1;AN=0 GT:PS ./.:. -1 816713 . TTGGTATTTCATATTTTTAATTTGTCTCTAGATATTTTCTATTTT . . . NS=1;AN=0 GT:PS ./.:. -1 816784 . T . . . NS=1;AN=0 GT:PS ./.:. -1 816789 . G . . . NS=1;AN=0 GT:PS ./.:. -1 816796 . A . . . NS=1;AN=0 GT:PS ./.:. -1 816800 . A . . . NS=1;AN=0 GT:PS ./.:. -1 816831 . ACAAAATTTTCTTGATTTGTTACAGTTTTATTTGTTGTAAGTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 816889 . AATATGTGTATCAACATTTGTTGTGTTCTCATAAACTTTGTA . . . NS=1;AN=0 GT:PS ./.:. -1 816941 . ATTTCTGGTCCACATATGTAAGTCTCTACATTAATATTATTTTGAAGCATTTAAACTTCTGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 817033 . AGATTTTTGGTT . 
. . NS=1;AN=0 GT:PS ./.:. -1 817060 . TGGTAGGTGACTGAGAAATGCTTAAAAATTAGCCAAAACTTAAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 817118 . TACTAA . . . NS=1;AN=0 GT:PS ./.:. -1 817146 . AAAAAATAAGTCTTTAATGGTATAAAAGCAAAC . . . NS=1;AN=0 GT:PS ./.:. -1 817185 . GAATGTTTTCTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 817209 . CATCTAAATTAAAAGCTGGAAAAAAATTTTATGG . . . NS=1;AN=0 GT:PS ./.:. -1 817266 . ATTTCAACTTTTTCTGGTTAAAATTTTTCCAAACAGATTCCTGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 817323 . AAA . . . NS=1;AN=0 GT:PS ./.:. -1 817341 . A . . . NS=1;AN=0 GT:PS ./.:. -1 817368 . AAACTTTCTTGAACCTGTGGGAATCCATGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 817405 . TGACATTATGTTCTATTCTCTTGGAAGGTAGAAATATCGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 817457 . TTGCTGACAAGAAATATGGTCCTGAGCAAGGCTCCCT . . . NS=1;AN=0 GT:PS ./.:. -1 817518 . AACGCTGGAGCCCATCTGTCTCCAATCTGCTGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 817566 . GAACTTCAGTTTTCCCTTTGATACTCTGTATTTCTACCAACCACAACGCC . . . NS=1;AN=0 GT:PS ./.:. -1 817642 . AATGACAAATATAGGCCTGAAGGAAGATG . . . NS=1;AN=0 GT:PS ./.:. -1 817677 . TGGCATTCCCAGCTTACTACCACTCCTTGGGTGCC . . . NS=1;AN=0 GT:PS ./.:. -1 817722 . TACGTGGATTCAACTCATAGACTCAGGTGGGTGAGGATCTATTGTTCA . . . NS=1;AN=0 GT:PS ./.:. -1 817780 . AAGTGACTGCTTAAGACTCTGGTGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 817828 . CTCAATGCAGTGTTAGGA . . . NS=1;AN=0 GT:PS ./.:. -1 817860 . TAATTACCATCTTACTATCACTAAATCATAGCTAAAATAAGGAAATT . . . NS=1;AN=0 GT:PS ./.:. -1 817921 . AGAGATGTAATCTTATGAAGAC . . . NS=1;AN=0 GT:PS ./.:. -1 817951 . AGAGATTTGTGGAGAGCCCTTCATAATTTCATGGTGTT . . . NS=1;AN=0 GT:PS ./.:. -1 818001 . G . . NS=1;CGA_WINEND=820000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.62:2.36:.:0:.:0:0.999:406 -1 818012 . GACATTTCATTATAATATATTAGCTATTC . . . NS=1;AN=0 GT:PS ./.:. -1 818057 . ATGTAAAGTTTTCTTTGTTGCACTTTAAGTTCTGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 818103 . AGAGCATGCAGGTTTGTTACGTAAGTATACACGTGTCATG . . . NS=1;AN=0 GT:PS ./.:. -1 818152 . TGCACCCATCAACCCATCATCTACATTAAGTA . . . NS=1;AN=0 GT:PS ./.:. -1 818205 . CCCCAGCCTCTCACCC . . . NS=1;AN=0 GT:PS ./.:. -1 818241 . GATGTTCCTCTCCCTGTGTCCATGTGTTCTCATTGTTCAACTC . . . NS=1;AN=0 GT:PS ./.:. -1 818292 . GAGTGAGAACATGCAGTGTTTGGTTTTCTTTTCTTTTTTTCTTTCTCTCTTTTCTTTTTTTTTTTTTGAGACAAACTTT . . . NS=1;AN=0 GT:PS ./.:. -1 818379 . TTGTCCAGGTTGGAGTGCAATGGCGCGATCTCGGCTCACTGCAA . . . NS=1;AN=0 GT:PS ./.:. -1 818431 . TCCCGGGTTCAAGCGATTCTCCTGCCTCAGCCTCCCAAGTAGCTAG . . . NS=1;AN=0 GT:PS ./.:. -1 818483 . AGGCATGTGCCAACATGCCTGGCTAATTGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 818528 . AGACGGGGTTTCTCCATGTTGGTCAGGCTGGTCTCAACTC . . . NS=1;AN=0 GT:PS ./.:. -1 818581 . CCCGAAGATCTGAGACTACAGGTGTGAGCCAATGC . . . NS=1;AN=0 GT:PS ./.:. -1 818646 . TTTGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 818676 . CTACGTCCCTGGAAAGGACATAAATGCGTAGT . . . NS=1;AN=0 GT:PS ./.:. -1 818728 . CACCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 818775 . CCAGGTC . . . NS=1;AN=0 GT:PS ./.:. -1 818798 . GTGCTGAAATAAACATACAGTGCATGTGTCTTTAT . . . NS=1;AN=0 GT:PS ./.:. -1 818864 . ACCCCGTAATGGGATTGCTAGGTCAAATTGTA . . . NS=1;AN=0 GT:PS ./.:. -1 818905 . CTAGATCCTTGAGGAATTGTCACACTGTCTTCCATAATGACTGAACTAA . . . NS=1;AN=0 GT:PS ./.:. -1 818962 . CCTACCAACAGTATGAAAGCATTCCTATTT . . . NS=1;AN=0 GT:PS ./.:. -1 819012 . CTGTTGTTTCCTGACTTTTAATAATAGCCATTCTAACTGGCTTGAGATG . . . NS=1;AN=0 GT:PS ./.:. -1 819086 . ATTTATC . . . NS=1;AN=0 GT:PS ./.:. -1 819103 . TGACGATGAGCTTTTTTTCATGTTTGTTGGCCACATAA . . . NS=1;AN=0 GT:PS ./.:. -1 819161 . TCTGTTTG . . . NS=1;AN=0 GT:PS ./.:. -1 819191 . GGGGTTGTTTTTTCTTGTAAATTTGTTTAAGTTATTTGTAGATTCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 819262 . 
TAGATTGCAAAAATTTTCTCCCAATCTATAGGTT . . . NS=1;AN=0 GT:PS ./.:. -1 819323 . TTGCTGTGCAGAAGCTCTTTAGTTTAATTAGATCCCATTCGTC . . . NS=1;AN=0 GT:PS ./.:. -1 819394 . TTTGGTGTTTTAGTCATAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 819461 . CTAGGGTTTTTATGG . . . NS=1;AN=0 GT:PS ./.:. -1 819504 . CCATCTTCAGT . . . NS=1;AN=0 GT:PS ./.:. -1 819530 . GGTGTAAGGAAGATGTCCAGTTTCAATTT . . . NS=1;AN=0 GT:PS ./.:. -1 819597 . AAATAAGGAATCCTTTCCCCGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 819650 . GATGGTTGTATGTGTATGCTCT . . . NS=1;AN=0 GT:PS ./.:. -1 819683 . TCTATATTCTGGTTCATTGGTCTATGTGTCTGTT . . . NS=1;AN=0 GT:PS ./.:. -1 819731 . ATGCTGTTTTGGTTACTGTAGCCTTATAGTATATTTTGAAGTT . . . NS=1;AN=0 GT:PS ./.:. -1 819798 . GTTATTTTTGCTTAGAATTGTCTTGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 819837 . TTTTTGGTTCATGAGAATTTTTAAATAGT . . . NS=1;AN=0 GT:PS ./.:. -1 819873 . AATTCTGTGAAGAATGTCATTGGTAGTTTAATGGGAATACCATTAAATT . . . NS=1;AN=0 GT:PS ./.:. -1 819949 . GCTATTTTCACGAATTAATTCTTCCGTATCCATGAGCATGGAATGC . . . NS=1;AN=0 GT:PS ./.:. -1 820001 . A . . NS=1;CGA_WINEND=822000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:2.16:2.62:.:0:.:0:0.999:406 -1 820013 . CCTGTCTGATTTCTCTGAGCAGTGGTTTGTAGTCCTCCTTGAAG . . . NS=1;AN=0 GT:PS ./.:. -1 820084 . ATTCTGATGTATTTTATT . . . NS=1;AN=0 GT:PS ./.:. -1 820123 . GAATTTCATTCATGAT . . . NS=1;AN=0 GT:PS ./.:. -1 820161 . TTGATGTATAGAAATACTAGCAATTTTTGCACATTGGTTTTGTATACTG . . . NS=1;AN=0 GT:PS ./.:. -1 820239 . AGAAGCTTTTGGGCTGAGATGATGGG . . . NS=1;AN=0 GT:PS ./.:. -1 820274 . ATACAGGATCATGTCGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 820313 . TTCCTCTCTTCCTATTTAAATACCTTTATTTCTTTCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 820359 . TTGCCCTGGCCAGAAATTCCAGCACTATATTGAATAGGAGTGGTAAGA . . . NS=1;AN=0 GT:PS ./.:. -1 820416 . CTTGTCTTGTGCCAGTTTTCAAGG . . . NS=1;AN=0 GT:PS ./.:. -1 820460 . TCATTCAGTATGATATTGGCTGTGGGTTTGTCATTTAT . . . NS=1;AN=0 GT:PS ./.:. -1 820521 . ATCCTTCAATAGCTATTTTATTGAGGGTT . . . NS=1;AN=0 GT:PS ./.:. -1 820572 . TTTCATT . . . NS=1;AN=0 GT:PS ./.:. -1 820604 . ATAGTCGTGTTGTTTTATGTTTAGTTC . . . NS=1;AN=0 GT:PS ./.:. -1 820653 . TATAGATTTGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 820698 . GCCATCTTGATCGTGGTG . . . NS=1;AN=0 GT:PS ./.:. -1 820729 . TTAACAGTCTTAAGTTCAGTCTTTTTACATAATCCCACATT . . . NS=1;AN=0 GT:PS ./.:. -1 820787 . TTCATTCTTTTTTGTTCTTTTTTCTCTATTCTTTTCT . . . NS=1;AN=0 GT:PS ./.:. -1 820845 . TTTCAAG . . . NS=1;AN=0 GT:PS ./.:. -1 820931 . C . . . NS=1;AN=0 GT:PS ./.:. -1 820935 . C . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:820935:PASS:268,.:245,.:0,.:0:0 -1 820943 . AGGTCAGTTA . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:820935:PASS:356,.:320,.:0,.:0:0 -1 820955 . TTCC . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:820935:PASS:356,.:320,.:0,.:0:0 -1 820967 . C . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:820935:PASS:356,.:320,.:0,.:0:0 -1 820979 . C G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs72890746;CGA_RPT=L1PA7|L1|26.4;CGA_SDO=8 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 0|1:820935:PASS:113:113,121:194,222:30,16:-121,0,-113:-16,0,-30:109:56,53:56 -1 821001 . T . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:821001:PASS:435,.:0,.:0,.:0:0 -1 821034 . GTGCC . . . NS=1;AN=0 GT:PS ./.:. -1 821051 . T . . . NS=1;AN=0 GT:PS ./.:. -1 821054 . A . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:821001:PASS:435,.:0,.:0,.:0:0 -1 821056 . T . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0/.:.:PASS:435,.:0,.:0,.:0:0 -1 821060 . A . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0/.:.:PASS:435,.:0,.:0,.:0:0 -1 821069 . A . . . 
NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0/.:.:PASS:435,.:0,.:0,.:0:0 -1 821108 . CCTTGGGTCAGTTCTGTGCCCTTGCTGGGGAGGTGGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 821187 . TTCAGCGTTTTTGTGTTGATGCT . . . NS=1;AN=0 GT:PS ./.:. -1 821246 . GACGTTGCTGACCTTTGAATGGGGTTTTTGTGGGGTCTTTTTTGTT . . . NS=1;AN=0 GT:PS ./.:. -1 821305 . TTGCTTTCTGTTTGTTCTTA . . . NS=1;AN=0 GT:PS ./.:. -1 821339 . CTTTCCTAGGGCTGCTGTGGTTTTCTGGGGGTCCACTCTGGACCCTA . . . NS=1;AN=0 GT:PS ./.:. -1 821418 . CACCGGTGAAGGCTGCAAAACAGCAAAGATGGCAGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 821470 . GAGTACT . . . NS=1;AN=0 GT:PS ./.:. -1 821509 . AACGCTC . . . NS=1;AN=0 GT:PS ./.:. -1 821530 . TGGAGACCCCTGTTGGGAGGCCTCACCCAGTCAGGGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 821577 . AGGAACTGCTTA . . . NS=1;AN=0 GT:PS ./.:. -1 821602 . GCTGCCCTTTGGCAGAGCAGGTGTGCTGTGCTGTGCTGATCCCCGGGAGTCTCCAGAGCCAGCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 821697 . AGATAGCAGCTACCCCTCTCCCTGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 821757 . ATAGAACTCTGGCTGGAGTTGCTAAAATTCCAATGGGGAG . . . NS=1;AN=0 GT:PS ./.:. -1 821817 . ATGTGTTTTG . . . NS=1;AN=0 GT:PS ./.:. -1 821852 . CACAATCAGGCACAGCAGCTGTGCTGTGTTATGGGAAACTCCTCCTGGACCC . . . NS=1;AN=0 GT:PS ./.:. -1 821942 . ATATAGTGGC . . . NS=1;AN=0 GT:PS ./.:. -1 822001 . T . . NS=1;CGA_WINEND=824000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:3.07:3.18:.:0:.:0:0.999:406 -1 822002 . GCCTCTG . . . NS=1;AN=0 GT:PS ./.:. -1 822032 . GGGTTTTGTGAGGTGCTGTGGGAGTGGGGCCTCAGAATGATG . . . NS=1;AN=0 GT:PS ./.:. -1 822086 . CTGGATTCAGCCCCCTTCCTAGGGGAATGCACAGATGTATCTCCCACTT . . . NS=1;AN=0 GT:PS ./.:. -1 822154 . TAGGATGCAAAACTCCTGGGTTTCCACGCATGCCCCAGTGAGC . . . NS=1;AN=0 GT:PS ./.:. -1 822203 . GCATTCCGCCGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 822247 . GCCATGGTGGCTGAGCTCACCGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 822353 . TTGGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 822397 . G . . . NS=1;AN=0 GT:PS ./.:. -1 822400 . C . . . NS=1;AN=0 GT:PS ./.:. -1 822469 . CTGGATACCTCAACTCAAG . . . NS=1;AN=0 GT:PS ./.:. -1 822493 . GAAGTCACTTGTAGTTTTCATTGCTCTCCATGAGAGCCATGGGCCAC . . . NS=1;AN=0 GT:PS ./.:. -1 822545 . CTTCTAATCGGCCAGCTTGGCCCCATCTAAAGT . . . NS=1;AN=0 GT:PS ./.:. -1 822607 . G . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:822607:PASS:411,.:0,.:0,.:0:0 -1 822613 . G . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:822607:VQLOW:24,.:0,.:0,.:0:0 -1 822633 . C . . . NS=1;AN=0 GT:PS ./.:. -1 822638 . A . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:822638:PASS:672,.:0,.:0,.:0:0 -1 822646 . ATC . . . NS=1;AN=0 GT:PS ./.:. -1 822654 . TTCAT . . . NS=1;AN=0 GT:PS ./.:. -1 822662 . A . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:822638:PASS:484,.:0,.:0,.:0:0 -1 822673 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs75183112;CGA_SDO=8 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:148:148,606:0,0:0,11:-148,0,-606:0,0,-11:93:51,42:42 -1 822683 . AT . . . NS=1;AN=0 GT:PS ./.:. -1 822694 . C . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0/.:.:PASS:698,.:0,.:0,.:0:0 -1 822716 . AACATCTGCCATTTATAAAATTTCTGTAGAGTTAATAGAACTTTTCCTGA . . . NS=1;AN=0 GT:PS ./.:. -1 822837 . CATCTTGTTACTGGGATTATAAATTCGTACAAACTTTATCTAATGT . . . NS=1;AN=0 GT:PS ./.:. -1 822906 . TATCAGTGGAAAT . . . NS=1;AN=0 GT:PS ./.:. -1 822938 . TCATTTCCAGAAATTTACCCTACAGACATACTCATGATGCCTGGA . . . NS=1;AN=0 GT:PS ./.:. -1 823009 . ATACCACCTTTTTCATTAACAAAAGTCTGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 823057 . TCAGTCACATTAATGGTTATCACCTTACTGTGTAT . . . NS=1;AN=0 GT:PS ./.:. -1 823112 . C . . . NS=1;AN=0 GT:PS ./.:. -1 823118 . CA . . . NS=1;AN=0 GT:PS ./.:. -1 823135 . 
T . . . NS=1;AN=0 GT:PS ./.:. -1 823139 . A . . . NS=1;AN=0 GT:PS ./.:. -1 823164 . AAAATGATGTCCAATATATGTTACAAAGTAAAAACAACCCTGGGTGCAGAGCCATGTGTCTGATATGCCTC . . . NS=1;AN=0 GT:PS ./.:. -1 823257 . G . . . NS=1;AN=0 GT:PS ./.:. -1 823260 . G . . . NS=1;AN=0 GT:PS ./.:. -1 823270 . GAA . . . NS=1;AN=0 GT:PS ./.:. -1 823279 . T . . . NS=1;AN=0 GT:PS ./.:. -1 823294 . G . . . NS=1;AN=0 GT:PS ./.:. -1 823321 . AAGACCTTTCTTTTTTAATGCTAACACTCCAAGGAACTGCACAACATT . . . NS=1;AN=0 GT:PS ./.:. -1 823374 . ATTCCCAACACATGCCAA . . . NS=1;AN=0 GT:PS ./.:. -1 823405 . CTATAGATTTTAGGAAATGAGGCTCTTGAAAAATCAAGCATTAATATTCCTA . . . NS=1;AN=0 GT:PS ./.:. -1 823469 . TGGGACATA . . . NS=1;AN=0 GT:PS ./.:. -1 823496 . TTGAAAGTTTGTAAAATTTGCTCATACAACCATCTGGGCCTAGTCCTTCC . . . NS=1;AN=0 GT:PS ./.:. -1 823558 . TTTATTT . . . NS=1;AN=0 GT:PS ./.:. -1 823570 . GAGATTTTATTTTCTTCACATTATAGTACTTCTTAGGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 823618 . TTCTTGATTCTTGAGTCAAT . . . NS=1;AN=0 GT:PS ./.:. -1 823648 . CTTTGTAC . . . NS=1;AN=0 GT:PS ./.:. -1 823661 . AGGGTGCTTGGGCTCAGAACACATCAGTCAAAGAAAGAGAGAAGAAAGGAAGGCAGGAAGGCAGGGTGAAAGGAAGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 823758 . GAC . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:823758:PASS:274,.:268,.:0,.:0:0 -1 823763 . G . . . NS=1;AN=0 GT:PS ./.:. -1 823767 . A . . . NS=1;AN=0 GT:PS ./.:. -1 823769 . A . . . NS=1;AN=0 GT:PS ./.:. -1 823774 . G . . . NS=1;AN=0 GT:PS ./.:. -1 823777 . G . . . NS=1;AN=0 GT:PS ./.:. -1 823812 . A . . . NS=1;AN=0 GT:PS ./.:. -1 823831 . G . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0/.:.:PASS:456,.:314,.:0,.:0:0 -1 823865 . GG GAGGGAAAGA . . NS=1;AN=2;AC=1;CGA_RPT=GA-rich|Low_complexity|23.3;CGA_SDO=4 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:177:177,456:35,314:4,34:-177,0,-456:-4,0,-34:161:95,161:66 -1 823893 . GGGAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 823906 . AGAGGAAGGGAGAGGAAGGAAGGAGGGAGAGAAAGAGGGAAAGGGAAAGAAGGAAGGAAAGAAACAAG . . . NS=1;AN=0 GT:PS ./.:. -1 823994 . G . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:823994:PASS:203,.:190,.:0,.:0:0 -1 824001 . G . . NS=1;CGA_WINEND=826000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:3.82:3.29:.:0:.:0:0.999:406 -1 824002 . G . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:823994:PASS:203,.:190,.:0,.:0:0 -1 824004 . G . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:823994:PASS:203,.:190,.:0,.:0:0 -1 824036 . GAGGGAGAGAGGGAGGGAGGGAGGAAAAGAAAGAGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 824091 . CGGGAAGGGAAGAGGAGCCAGCCAAAGG . . . NS=1;AN=0 GT:PS ./.:. -1 824139 . CTTCTACTTGACCCAAGCAGTTCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 824201 . G . . . NS=1;AN=0 GT:PS ./.:. -1 824215 . T C . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.131|rs72890762;CGA_SDO=4 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:244,.:22,.:3,.:-244,0,0:-3,0,0:123:74,.:48 -1 824254 . T . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:824254:PASS:629,.:221,.:0,.:0:0 -1 824258 . CC . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:824254:PASS:715,.:293,.:0,.:0:0 -1 824269 . G . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0/.:.:PASS:240,.:0,.:0,.:0:0 -1 824284 . C . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0/.:.:PASS:415,.:35,.:0,.:0:0 -1 824300 . C . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0/.:.:PASS:715,.:293,.:0,.:0:0 -1 824308 . T . . . NS=1;AN=0 GT:PS ./.:. -1 824332 . C . . . NS=1;AN=0 GT:PS ./.:. -1 824338 . GC . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0/.:.:PASS:110,.:0,.:0,.:0:0 -1 824357 . C . . . 
NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0/.:.:PASS:189,.:0,.:0,.:0:0 -1 824370 . G . . . NS=1;AN=0 GT:PS ./.:. -1 824376 . A . . . NS=1;AN=0 GT:PS ./.:. -1 824379 . A . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0/.:.:PASS:188,.:0,.:0,.:0:0 -1 824393 . A . . . NS=1;AN=0 GT:PS ./.:. -1 824407 . AGGCGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 824428 . ATTAGGTTATAAGATTCTGCCCCCATGTATGGA . . . NS=1;AN=0 GT:PS ./.:. -1 824467 . TCATTATCTCAGGAGTGGGTTAGTTATCTTGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 824518 . GCATGTT . . . NS=1;AN=0 GT:PS ./.:. -1 824547 . TTCCCAAACTCTTGCTCTTCTGCCTTCAGCCATGAGATGACACAGCCAGA . . . NS=1;AN=0 GT:PS ./.:. -1 824604 . CACGAGATGCAGACCCCTCATCCTTGGATTTCCTAGCCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 824663 . TTTCTTTTCTTTATAGAT . . . NS=1;AN=0 GT:PS ./.:. -1 824691 . G A . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.131|rs75026264;CGA_RPT=MSTD|ERVL-MaLR|17.5;CGA_SDO=4 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:824691:PASS:88,.:69,.:8,.:-88,0,0:-8,0,0:65:36,.:26 -1 824694 . G . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL .|0:824691:PASS:.,67:.,28:.,0:0:0 -1 824709 . A . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0/.:.:PASS:206,.:144,.:0,.:0:0 -1 824727 . ATGCCCCATTACATGGAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 824764 . C . . . NS=1;AN=0 GT:PS ./.:. -1 824816 . A . . . NS=1;AN=0 GT:PS ./.:. -1 824838 . C . . . NS=1;AN=0 GT:PS ./.:. -1 824845 . G . . . NS=1;AN=0 GT:PS ./.:. -1 824857 . TTC . . . NS=1;AN=0 GT:PS ./.:. -1 824874 . C . . . NS=1;AN=0 GT:PS ./.:. -1 824901 . GCCGACTT . . . NS=1;AN=0 GT:PS ./.:. -1 824921 . ATCATGGCAGGTTTGATGTGCTCACTTCTAT . . . NS=1;AN=0 GT:PS ./.:. -1 824970 . G . . . NS=1;AN=0 GT:PS ./.:. -1 824983 . G . . . NS=1;AN=0 GT:PS ./.:. -1 825016 . AC . . . NS=1;AN=0 GT:PS ./.:. -1 825037 . C . . . NS=1;AN=0 GT:PS ./.:. -1 825051 . A . . . NS=1;AN=0 GT:PS ./.:. -1 825063 . GCCCCTGTAGGCAGAGCCTAGACAAGAGTT . . . NS=1;AN=0 GT:PS ./.:. -1 825104 . G . . . NS=1;AN=0 GT:PS ./.:. -1 825140 . C . . . NS=1;AN=0 GT:PS ./.:. -1 825144 . C . . . NS=1;AN=0 GT:PS ./.:. -1 825197 . G . . . NS=1;AN=0 GT:PS ./.:. -1 825207 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.129|rs61768257;CGA_RPT=MIRb|MIR|33.0;CGA_SDO=4 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:314:314,377:296,307:27,37:-314,0,-377:-27,0,-37:77:34,42:42 -1 825250 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.129|rs61768258;CGA_RPT=MIRb|MIR|33.0;CGA_SDO=4 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:825250:PASS:288:379,288:304,216:45,48:-379,0,-288:-45,0,-48:49:31,18:18 -1 825266 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs72890768;CGA_RPT=MIRb|MIR|33.0;CGA_SDO=4 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:825250:PASS:138:138,241:71,179:24,49:-138,0,-241:-24,0,-49:27:14,13:13 -1 825288 . C . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0/.:.:PASS:889,.:811,.:0,.:0:0 -1 825292 . G . . . NS=1;AN=0 GT:PS ./.:. -1 825307 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs77548197;CGA_SDO=4 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:825307:VQLOW:28:28,100:0,42:0,29:-28,0,-100:0,0,-29:22:10,12:12 -1 825323 . G . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0/.:.:PASS:403,.:334,.:0,.:0:0 -1 825333 . TGT TATAACA . . NS=1;AN=2;AC=1;CGA_RPT=MLT1I|ERVL-MaLR|36.7;CGA_SDO=4 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:825333:PASS:287:287,889:226,811:28,43:-287,0,-889:-28,0,-43:37:11,26:26 -1 825346 . T C . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs74765147;CGA_RPT=MLT1I|ERVL-MaLR|36.7;CGA_SDO=4 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:825333:PASS:164:164,506:111,438:27,54:-164,0,-506:-27,0,-54:39:15,24:24 -1 825360 . GTC TTA . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs74559290&dbsnp.134|rs150390413;CGA_RPT=MLT1I|ERVL-MaLR|36.7;CGA_SDO=4 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:825333:PASS:363:363,889:288,811:18,43:-363,0,-889:-18,0,-43:54:24,30:30 -1 825382 . TTTCTTTTTTCTATTTTTTCTTTTGTTGGGGGGGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 825431 . AGTCTCACTCTGTCACCCGGGCTGGAGTGCAGTGGTGCAATCTCAGCTC . . . NS=1;AN=0 GT:PS ./.:. -1 825500 . G . . . NS=1;AN=0 GT:PS ./.:. -1 825508 . T . . . NS=1;AN=0 GT:PS ./.:. -1 825513 . C . . . NS=1;AN=0 GT:PS ./.:. -1 825518 . G . . IMPRECISE;SVTYPE=INS;END=825518;SVLEN=208;CIPOS=-241,242;MEINFO=AluY,9,216,-;NS=1 GT:FT:CGA_IS:CGA_IDC:CGA_IDCL:CGA_IDCR:CGA_RDC:CGA_NBET:CGA_ETS:CGA_KES .:sns95:257:11:10:1:.:AluYa1:0:0.979 -1 825542 . G . . . NS=1;AN=0 GT:PS ./.:. -1 825549 . A . . . NS=1;AN=0 GT:PS ./.:. -1 825552 . TCAG . . . NS=1;AN=0 GT:PS ./.:. -1 825562 . GG . . . NS=1;AN=0 GT:PS ./.:. -1 825566 . CA . . . NS=1;AN=0 GT:PS ./.:. -1 825595 . T . . . NS=1;AN=0 GT:PS ./.:. -1 825604 . ACCATGTTAGTCAGGATGGTCTCGATCTCCTGACCTCATGATCTGCCTGCCTCAGCCTCCC . . . NS=1;AN=0 GT:PS ./.:. -1 825681 . AGGCGTGAGCCACTGCACCCGGCCTTGACTTC . . . NS=1;AN=0 GT:PS ./.:. -1 825748 . GGGTTCT . . . NS=1;AN=0 GT:PS ./.:. -1 825767 GS000016676-ASM_3618_L C ]1:5726936]C . . NS=1;SVTYPE=BND;MATEID=GS000016676-ASM_3618_R;CGA_BF=0.92 GT:FT:CGA_BNDMPC:CGA_BNDPOS:CGA_BNDDEF:CGA_BNDP 1:MPCBT:8:825767:]5726936]C:PRECISE -1 825826 . C . . . NS=1;AN=0 GT:PS ./.:. -1 825851 . T . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:825851:PASS:77,.:64,.:0,.:0:0 -1 826001 . G . . NS=1;CGA_WINEND=828000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.23:1.18:.:0:.:0:0.999:406 -1 826023 . GCTTCTA . . . NS=1;AN=0 GT:PS ./.:. -1 826057 . TAAACTGGGAAGCACTTTTC . . . NS=1;AN=0 GT:PS ./.:. -1 826113 . TAGATAGAGGATGGAGAGACTGCAGGGGGCAGGAG . . . NS=1;AN=0 GT:PS ./.:. -1 826162 . CT . . . NS=1;AN=0 GT:PS ./.:. -1 826182 . AG . . . NS=1;AN=0 GT:PS ./.:. -1 826194 . A . . . NS=1;AN=0 GT:PS ./.:. -1 826214 . C . . . NS=1;AN=0 GT:PS ./.:. -1 826225 . A . . . NS=1;AN=0 GT:PS ./.:. -1 826240 . C . . . NS=1;AN=0 GT:PS ./.:. -1 826255 . A . . . NS=1;AN=0 GT:PS ./.:. -1 826259 . GA . . . NS=1;AN=0 GT:PS ./.:. -1 826272 . C . . . NS=1;AN=0 GT:PS ./.:. -1 826277 . G . . . NS=1;AN=0 GT:PS ./.:. -1 826282 . C . . . NS=1;AN=0 GT:PS ./.:. -1 826301 . T . . . NS=1;AN=0 GT:PS ./.:. -1 826458 . G . . . NS=1;AN=0 GT:PS ./.:. -1 826610 . GGAAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 826643 . AAGTTTTTATAGTAACAGA . . . NS=1;AN=0 GT:PS ./.:. -1 826676 . GTCAATATTTATTACTTCATTAAGAGCAAATAAATACTTTA . . . NS=1;AN=0 GT:PS ./.:. -1 826748 . AGTTTTGTATCACTATGTTTTTAATATTATACCTAA . . . NS=1;AN=0 GT:PS ./.:. -1 826802 . ACAATCT . . . NS=1;AN=0 GT:PS ./.:. -1 826831 . CCTCGAGGTAAGATTTACGTAA . . . NS=1;AN=0 GT:PS ./.:. -1 826885 . TTTTCCAACTTTTTATACACATTT . . . NS=1;AN=0 GT:PS ./.:. -1 826936 . AAAATAA . . . NS=1;AN=0 GT:PS ./.:. -1 827052 . CCACAATATTAATACTTAGTAACCTTTATTTTAATAAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 827105 . GAAATCTTGAATTGTCATATAGCAGTATCTTACAGATG . . . NS=1;AN=0 GT:PS ./.:. -1 827162 . GAAATATGTGTTCCTAAAACATTTTTTTTTAAGATGGAGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 827232 . TGGAACAATCTCGGCTCACTGCAACTTCCGCCCCCCGGGTTCAAGTAAT . . . NS=1;AN=0 GT:PS ./.:. -1 827323 . C T . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.111|rs4970386;CGA_RPT=AluSp|Alu|10.6;CGA_SDO=11 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:148:148,187:173,187:48,50:-148,0,-187:-48,0,-50:19:10,9:9 -1 827597 . ACACTTTAGGAGGCCAAGGCAGGAGTATCATGAGACC . . . NS=1;AN=0 GT:PS ./.:. -1 827639 . GAGCAAAATAGTGAGATGCTAACTCTACAAAAAAAATAAAAATTAGCTGAGCAT . . . NS=1;AN=0 GT:PS ./.:. -1 827709 . AATTACAGC . . . NS=1;AN=0 GT:PS ./.:. -1 827731 . GAGGTGGGAGGATCCCTTGAGGGCAGGAGGTCAAGGTTGCAGTGA . . . NS=1;AN=0 GT:PS ./.:. -1 827783 . TCATACCACTGTACTTCAGCCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 827828 . TCAAAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 827869 . CATGTATCCATTTACATTTACTTATTTTTAACAGTTTATCTAGAGT . . . NS=1;AN=0 GT:PS ./.:. -1 827966 . TTTCTTGTTAACCATGTTATAGCCTGTGAATATCAGGTGTTCACGTAA . . . NS=1;AN=0 GT:PS ./.:. -1 828001 . G . . NS=1;CGA_WINEND=830000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.02:1.18:.:0:.:0:0.999:406 -1 828054 . AACTCAGAAAATTCCATTACTTT . . . NS=1;AN=0 GT:PS ./.:. -1 828112 . T . . . NS=1;AN=0 GT:PS ./.:. -1 828154 . TTATAGCTTTATAACC . . . NS=1;AN=0 GT:PS ./.:. -1 828264 . TTTCCTTTTATCCATACTTT . . . NS=1;AN=0 GT:PS ./.:. -1 828338 . CCTATATCCCAGCACTTT . . . NS=1;AN=0 GT:PS ./.:. -1 828362 . CTGAGGCAAGAGGATCACTTGAGCTCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 828415 . TAGCGAGACCGAAT . . . NS=1;AN=0 GT:PS ./.:. -1 828447 . AAAAAATTGCCAGGCATGGTGGTGCATGCC . . . NS=1;AN=0 GT:PS ./.:. -1 828583 . AGAGTGAGACCTTAGAGAGAGACCTT . . . NS=1;AN=0 GT:PS ./.:. -1 828618 . AGAGAAAAAAATAAAGAATTATTT . . . NS=1;AN=0 GT:PS ./.:. -1 828682 . TAATACC . . . NS=1;AN=0 GT:PS ./.:. -1 828722 . CAAGTCAGCAGATTCAAAATAGGCAGGGAAAAAAAAATAGA . . . NS=1;AN=0 GT:PS ./.:. -1 828778 . AGATTTT . . . NS=1;AN=0 GT:PS ./.:. -1 828899 . GTAATCCCAACACTTTGGGAGGCTGAGGCAGGTGGATCACTT . . . NS=1;AN=0 GT:PS ./.:. -1 829037 . GTCCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 829105 . C . . . NS=1;AN=0 GT:PS ./.:. -1 829159 . AAAGAAAAAAAAAAAAAAAAAAAATATATATATATATATATATATATATATATATATATATAGGCCTGTGCGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 829267 . GCCGAGGCGGGT . . . NS=1;AN=0 GT:PS ./.:. -1 829298 . GTTCGAGACCAGCCTGGCCAACATGGTGAAACCTCGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 829361 . GGGCATGGTGGCACACA . . . NS=1;AN=0 GT:PS ./.:. -1 829397 . AGGAGGCTGAGACAGGAGAATCACTAGAAC . . . NS=1;AN=0 GT:PS ./.:. -1 829442 . TGCAGTCAGCCAAGATCACACCACTGCACTCCAGCCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 829515 . TGTGTGTGTGACTAGAATGGTCTATAATATATAGCCAGCTCGAGT . . . NS=1;AN=0 GT:PS ./.:. -1 829593 . TCTCATTTTGGCCTTTTCAAGATTAAATCT . . . NS=1;AN=0 GT:PS ./.:. -1 829631 . AGGCCCCTCCCCTCTAGGGAAGTACTTGCCGGAGCGCTGCCTAAAG . . . NS=1;AN=0 GT:PS ./.:. -1 829710 . AATGGTTTATTTCTCATTATAAGGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 829767 . ACAGGAAGAATTTTTAAAATCGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 829816 . GGTCCCCAGAAAGAC . . . NS=1;AN=0 GT:PS ./.:. -1 829858 . CAGGAGGAGGAGTGAGGGAGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 829912 . TCAGTTTTTCACGTGTGCCATTTTCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 829969 . TTCAAGT . . . NS=1;AN=0 GT:PS ./.:. -1 830001 . T . . NS=1;CGA_WINEND=832000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.41:1.32:.:0:.:0:0.999:406 -1 830051 . AATATTTTAGACCAAAAATACA . . . NS=1;AN=0 GT:PS ./.:. -1 830104 . ATTGAGTATCTCAGTGGCTGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 830151 . ATTTGAAC . . . NS=1;AN=0 GT:PS ./.:. -1 830172 . AGGGTTCTAACCCAACCAGGATCCCCCTGGGGTGAAACTGAAACCCACAG . . . NS=1;AN=0 GT:PS ./.:. -1 830244 . TTTTTATTTTTATTTTTTCATTTTTTTTGAGACGGA . . . NS=1;AN=0 GT:PS ./.:. -1 830324 . G A . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs77038580;CGA_RPT=AluSp|Alu|10.6;CGA_SDO=11 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:830324:PASS:662:798,662:798,668:35,53:-798,0,-662:-35,0,-53:117:68,49:49 -1 830340 . CGC TGT . . NS=1;AN=2;AC=1;CGA_RPT=AluSp|Alu|10.6;CGA_SDO=11 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:830324:PASS:830:1376,830:1387,861:25,43:-1376,0,-830:-25,0,-43:124:81,43:43 -1 830348 . G A . . NS=1;AN=2;AC=1;CGA_RPT=AluSp|Alu|10.6;CGA_SDO=11 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:830324:PASS:328:639,328:640,330:41,39:-639,0,-328:-41,0,-39:84:52,32:32 -1 830357 . C A . . NS=1;AN=2;AC=1;CGA_RPT=AluSp|Alu|10.6;CGA_SDO=11 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:830324:PASS:439:439,570:439,580:30,51:-439,0,-570:-30,0,-51:80:42,38:38 -1 830361 . C T . . NS=1;AN=2;AC=1;CGA_RPT=AluSp|Alu|10.6;CGA_SDO=11 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:830324:PASS:284:284,455:283,459:26,46:-284,0,-455:-26,0,-46:72:36,36:36 -1 830374 . C . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL .|0:830324:PASS:.,451:.,459:.,0:0:0 -1 830440 . GACAGGGTTTCTCCATGTTGGTCAAGCTGGTCTCGAACTCCCAACCTCAGGTGATCCGCCTGCCTA . . . NS=1;AN=0 GT:PS ./.:. -1 830520 . TTGGGTTTACAGGCGTGAGCCACCACGCCCGGCCCGCT . . . NS=1;AN=0 GT:PS ./.:. -1 830578 . GATGTCC . . . NS=1;AN=0 GT:PS ./.:. -1 830602 . TGAGTTGGTGTTCACTAACAAGCACGGAAGCTTTGTTACATT . . . NS=1;AN=0 GT:PS ./.:. -1 830658 . TGGCAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 830681 . CCCGGGTTCACGCCATTCTCCTGCCTCAGCCTCCCGAG . . . NS=1;AN=0 GT:PS ./.:. -1 830778 . CAC . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:830778:VQLOW:28,.:22,.:0,.:0:0 -1 830783 . T . . . NS=1;AN=0 GT:PS ./.:. -1 830807 . G . . . NS=1;AN=0 GT:PS ./.:. -1 830830 . TCCGCCCACCTCGGCCTCCCA . . . NS=1;AN=0 GT:PS ./.:. -1 830862 . TTACAGGCGTGAGCCACGCGC . . . NS=1;AN=0 GT:PS ./.:. -1 830953 . AAAGTTTCAGGTGTCTTGATGGTATCTAAATCAGTTGTTGATTCGGTC . . . NS=1;AN=0 GT:PS ./.:. -1 831009 . ACACAAGCATAACCTCTACGCCAAGTTATTATT . . . NS=1;AN=0 GT:PS ./.:. -1 831072 . TCTGTCCACCAAACCAGTTGTTCTGCTTGTGTCTTTGCAGCTGGTTT . . . NS=1;AN=0 GT:PS ./.:. -1 831132 . TCAGCTGCTGGTAACATCTGGCCTTTGGGAAGGCTCGAAAAATGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 831233 . CCCCTTTTTTGTTTTTATTA . . . NS=1;AN=0 GT:PS ./.:. -1 831269 . ATGAAATCGTAACTGAGCATTTTCAATTAACTGTGTGGAATGAACCATG . . . NS=1;AN=0 GT:PS ./.:. -1 831340 . TAATAGG . . . NS=1;AN=0 GT:PS ./.:. -1 831362 . CAATACCTCAATTACAGCTACAAGCTCCAC . . . NS=1;AN=0 GT:PS ./.:. -1 831402 . GAAATATAGGGCGTC . . . NS=1;AN=0 GT:PS ./.:. -1 831461 . ACTGGACCCATCTGTGAAACAATGAAAACGCTTAGCAGGCTGCAGGTTGTTTA . . . NS=1;AN=0 GT:PS ./.:. -1 831533 . AAATCGTTCACAGTCTTG . . . NS=1;AN=0 GT:PS ./.:. -1 831606 . CAATTTTTTGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 831648 . TATATTTCCATAGGTTGTATAACTGAATTGATGGCTC . . . NS=1;AN=0 GT:PS ./.:. -1 831724 . TAATTAT . . . NS=1;AN=0 GT:PS ./.:. -1 831787 . GTTCAGTAACTAATTTCTCTAAAGCGTCC . . . NS=1;AN=0 GT:PS ./.:. -1 831848 . CCATATT . . . NS=1;AN=0 GT:PS ./.:. -1 831902 . CAACGGCCGCC . . . NS=1;AN=0 GT:PS ./.:. -1 831938 . GCAGGAACTTTGTCTTTCCACTTG . . . NS=1;AN=0 GT:PS ./.:. -1 832001 . C . . NS=1;CGA_WINEND=834000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.77:1.04:.:0:.:0:0.999:406 -1 832060 . G A . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.119|rs9697378;CGA_RPT=HERVK9-int|ERVK|6.4;CGA_SDO=11 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:832060:VQLOW:32,.:0,.:0,.:-32,0,0:0,0,0:25:13,.:12 -1 832066 . G . . . NS=1;AN=0 GT:PS ./.:. -1 832089 . T . . . 
NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:832089:PASS:62,.:17,.:0,.:0:0 -1 832092 . TG . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:832089:PASS:80,.:35,.:0,.:0:0 -1 832109 . CCATAAATTTATAGGTACAGAAGTTATAA . . . NS=1;AN=0 GT:PS ./.:. -1 832178 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.121|rs13302934;CGA_RPT=HERVK9-int|ERVK|6.4;CGA_SDO=11 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:97:97,140:134,165:43,49:-97,0,-140:-43,0,-49:21:11,10:10 -1 832297 . CTG C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.126|rs34017275;CGA_RPT=HERVK9-int|ERVK|6.4;CGA_SDO=11 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:832297:PASS:164:224,164:223,168:38,39:-224,0,-164:-38,0,-39:26:13,13:13 -1 832318 . C A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.111|rs4500250;CGA_RPT=HERVK9-int|ERVK|6.4;CGA_SDO=11 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:832297:PASS:130:237,130:237,131:52,45:-237,0,-130:-52,0,-45:25:14,11:11 -1 832359 . C . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:832359:PASS:50,.:16,.:0,.:0:0 -1 832368 . GTAATTTGATTTACC . . . NS=1;AN=0 GT:PS ./.:. -1 832398 . T . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:832359:PASS:50,.:16,.:0,.:0:0 -1 832603 . C . . . NS=1;AN=0 GT:PS ./.:. -1 832606 . A . . . NS=1;AN=0 GT:PS ./.:. -1 832619 . C . . . NS=1;AN=0 GT:PS ./.:. -1 832756 . T G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28833197;CGA_RPT=HERVK9-int|ERVK|6.4;CGA_SDO=11 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:186:186,213:211,211:45,48:-186,0,-213:-45,0,-48:30:12,18:18 -1 832918 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28765502;CGA_RPT=HERVK9-int|ERVK|6.4;CGA_SDO=11 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:832918:PASS:66:66,66:52,52:18,32:-66,0,-66:-18,0,-32:21:5,16:16 -1 832960 . AT A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.126|rs34260203&dbsnp.131|rs77120786&dbsnp.134|rs146934973;CGA_RPT=HERVK9-int|ERVK|6.4;CGA_SDO=11 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:77:77,465:77,496:38,40:-465,-77,0:-40,-38,0:33:30,30:3 -1 833172 . T TCGAA . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.129|rs57423329&dbsnp.132|rs113551494;CGA_RPT=HERVK9-int|ERVK|8.0;CGA_SDO=11 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:114:114,152:124,152:35,38:-114,0,-152:-35,0,-38:19:8,11:11 -1 833223 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.121|rs13303211;CGA_RPT=HERVK9-int|ERVK|8.0;CGA_SDO=11 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:832918:PASS:195:195,195:191,191:43,46:-195,0,-195:-43,0,-46:32:13,19:19 -1 833302 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28752186&dbsnp.131|rs75132976;CGA_RPT=HERVK9-int|ERVK|8.0;CGA_SDO=11 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:832918:PASS:162:162,162:161,161:46,48:-162,0,-162:-46,0,-48:27:12,15:15 -1 833641 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28594623;CGA_RPT=MER9a2|ERVK|8.2;CGA_SDO=11 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:832918:PASS:330:330,396:331,395:53,54:-330,0,-396:-53,0,-54:32:14,18:18 -1 833659 . T A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28522102;CGA_RPT=MER9a2|ERVK|8.2;CGA_SDO=11 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:832918:PASS:398:398,435:397,434:53,54:-398,0,-435:-53,0,-54:33:15,18:18 -1 833663 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28416910;CGA_RPT=MER9a2|ERVK|8.2;CGA_SDO=11 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:832918:PASS:303:471,303:472,302:53,53:-471,0,-303:-53,0,-53:33:18,15:15 -1 833824 . T C . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28484835;CGA_RPT=MER9a2|ERVK|8.2;CGA_SDO=11 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:833824:PASS:234:234,234:229,229:46,49:-234,0,-234:-46,0,-49:39:15,24:24 -1 833927 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28593608;CGA_RPT=MER9a2|ERVK|8.2;CGA_SDO=11 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:73:73,73:66,66:23,35:-73,0,-73:-23,0,-35:20:6,14:14 -1 833977 . G . . . NS=1;AN=0 GT:PS ./.:. -1 834001 . C . . NS=1;CGA_WINEND=836000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.96:0.92:2:27:=:27:0.999:406 -1 834198 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28385272 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:833824:PASS:244:244,244:244,244:52,50:-244,0,-244:-52,0,-50:27:13,14:14 -1 834401 . CTTCTTTTTTTTTTTTTTTTTTTTTAGACGG . . . NS=1;AN=0 GT:PS ./.:. -1 834832 . G C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.111|rs4411087 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:115:115,115:114,114:42,43:-115,0,-115:-42,0,-43:18:7,11:11 -1 834928 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.111|rs4422949 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:335:335,335:331,331:46,54:-335,0,-335:-46,0,-54:48:19,29:29 -1 834999 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28570054 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:347:347,347:347,347:53,54:-347,0,-347:-53,0,-54:39:19,20:20 -1 835499 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.111|rs4422948 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:287:287,287:286,286:52,52:-287,0,-287:-52,0,-52:39:17,22:22 -1 836001 . C . . NS=1;CGA_WINEND=838000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.05:0.98:2:37:=:37:0.999:406 -1 836529 . C G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28731045;CGA_RPT=AluJb|Alu|17.0 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:836529:PASS:191:191,191:191,191:49,50:-191,0,-191:-49,0,-50:28:13,15:15 -1 836896 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28705752;CGA_RPT=MER67A|ERV1|43.6 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:836529:PASS:264:264,648:265,650:33,54:-264,0,-648:-33,0,-54:50:17,33:33 -1 836924 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.130|rs72890788;CGA_RPT=MER67A|ERV1|43.6 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:836529:PASS:351:440,351:470,359:50,54:-440,0,-351:-50,0,-54:43:24,19:19 -1 838001 . A . . NS=1;CGA_WINEND=840000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.84:0.99:2:43:=:43:0.999:406 -1 838150 . TCTCAAAAAAAAAAAAAAATCGT . . . NS=1;AN=0 GT:PS ./.:. -1 838329 . G GC . . NS=1;AN=2;AC=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:46:46,46:44,44:12,23:-46,0,-46:-12,0,-23:16:6,10:10 -1 838387 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.111|rs4970384;CGA_RPT=MLT1B|ERVL-MaLR|29.7 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:100:160,100:159,99:48,40:-160,0,-100:-48,0,-40:16:9,7:7 -1 838555 . C A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.111|rs4970383;CGA_RPT=MLT1B|ERVL-MaLR|29.7 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:447:473,447:473,447:50,54:-473,0,-447:-50,0,-54:47:24,23:23 -1 838931 . A C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs7523690;CGA_RPT=MLT1A|ERVL-MaLR|29.9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:196:196,196:196,196:49,50:-196,0,-196:-49,0,-50:26:12,14:14 -1 839103 . A G . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28562941;CGA_RPT=MER2|TcMar-Tigger|17.7 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:289:441,289:439,287:53,52:-441,0,-289:-53,0,-52:37:21,16:16 -1 839301 . CAGA . . . NS=1;AN=0 GT:PS ./.:. -1 839670 . CA . . . NS=1;AN=0 GT:PS ./.:. -1 839676 . AGGG . . . NS=1;AN=0 GT:PS ./.:. -1 839688 . T . . . NS=1;AN=0 GT:PS ./.:. -1 839695 . C . . . NS=1;AN=0 GT:PS ./.:. -1 839853 . GCCGCCGCCTCCTCCGAACGCGGCCGCCTCCTCCTC . . . NS=1;AN=0 GT:PS ./.:. -1 839894 . GTGGCCTCCTCCGAACGCGGCCGCCTCCTCCTCCGAACGCGGCCGCCTCCTCCTCCG . . . NS=1;AN=0 GT:PS ./.:. -1 839958 . CCT . . . NS=1;AN=0 GT:PS ./.:. -1 839980 . C . . . NS=1;AN=0 GT:PS ./.:. -1 839982 . T . . . NS=1;AN=0 GT:PS ./.:. -1 839985 . T . . . NS=1;AN=0 GT:PS ./.:. -1 840001 . T . . NS=1;CGA_WINEND=842000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.69:0.95:2:48:=:48:0.999:406 -1 840006 . A . . . NS=1;AN=0 GT:PS ./.:. -1 840017 . G . . . NS=1;AN=0 GT:PS ./.:. -1 840020 . TCCT . . . NS=1;AN=0 GT:PS ./.:. -1 840027 . AAC . . . NS=1;AN=0 GT:PS ./.:. -1 840036 . T G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.135|rs186698170;CGA_RPT=(CCG)n|Simple_repeat|30.7 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:840036:VQLOW:29:29,33:22,26:1,25:-29,0,-33:-1,0,-25:4:2,2:2 -1 840056 . CCG . . . NS=1;AN=0 GT:PS ./.:. -1 840061 . T . . . NS=1;AN=0 GT:PS ./.:. -1 840114 . GG . . . NS=1;AN=0 GT:PS ./.:. -1 840123 . C . . . NS=1;AN=0 GT:PS ./.:. -1 840134 . GCG . . . NS=1;AN=0 GT:PS ./.:. -1 840139 . G . . . NS=1;AN=0 GT:PS ./.:. -1 840153 . A . . . NS=1;AN=0 GT:PS ./.:. -1 840165 . CA . . . NS=1;AN=0 GT:PS ./.:. -1 840180 . C . . . NS=1;AN=0 GT:PS ./.:. -1 840197 . ACAAAGAGCCGCGCGGCCACGACGGCCGCGTGCCCG . . . NS=1;AN=0 GT:PS ./.:. -1 840303 . CGGTTC . . . NS=1;AN=0 GT:PS ./.:. -1 840327 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28625089 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:840327:PASS:43:45,43:44,43:18,29:-45,0,-43:-18,0,-29:4:2,2:2 -1 840331 . CG . . . NS=1;AN=0 GT:PS ./.:. -1 840336 . AGGCGGGTGAGGGGAGGGGGCCGGAGGGTCGGGGGTGCCGGGGGGTGCGGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 840409 . GGGACGTTCGTGGCGGGGGAGGCTGTTGGGGACGTTCGTGGC . . . NS=1;AN=0 GT:PS ./.:. -1 840472 . TCGTGGCGGGGGAGGCTGTTGGGTCCCCTCCCCGCCCCACCGC . . . NS=1;AN=0 GT:PS ./.:. -1 840561 . TCCG . . . NS=1;AN=0 GT:PS ./.:. -1 840576 . C . . . NS=1;AN=0 GT:PS ./.:. -1 840577 . T . . . NS=1;AN=0 GT:PS ./.:. -1 840580 . T . . . NS=1;AN=0 GT:PS ./.:. -1 840587 . GCCTGTG . . . NS=1;AN=0 GT:PS ./.:. -1 840753 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4970382;CGA_RPT=L1MB5|L1|41.7 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:44:44,349:44,349:40,52:-349,-44,0:-52,-40,0:21:21,21:0 -1 841085 . C G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.88|rs1574243;CGA_RPT=L1MB5|L1|26.0 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:337:348,337:348,337:53,54:-348,0,-337:-53,0,-54:35:18,17:17 -1 841666 . TTTT . . . NS=1;AN=0 GT:PS ./.:. -1 841678 . C T . . NS=1;AN=2;AC=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:841678:VQLOW:24:24,24:5,5:0,13:-24,0,-24:0,0,-13:13:1,12:12 -1 842001 . C . . NS=1;CGA_WINEND=844000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.71:1.03:2:49:=:49:0.999:406 -1 842013 . T G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs7419119;CGA_FI=284600|XR_108282.1|LOC284600|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:124:124,124:124,124:43,44:-124,0,-124:-43,0,-44:14:7,7:7 -1 842057 . A AAACTCAGCTGCCTCTCCCCTTC . . 
NS=1;AN=2;AC=1;CGA_FI=284600|XR_108282.1|LOC284600|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:VQLOW:38:38,101:37,95:7,33:-38,0,-101:-7,0,-33:18:13,5:5 -1 842291 . ACCCGAG . . . NS=1;AN=0 GT:PS ./.:. -1 842359 . ACCCGAG . . . NS=1;AN=0 GT:PS ./.:. -1 842616 . TA T . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.129|rs55764002&dbsnp.132|rs111389427;CGA_FI=284600|XR_108282.1|LOC284600|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluSz6|Alu|20.0 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:VQLOW:20,.:20,.:1,.:-20,0,0:-1,0,0:11:6,.:5 -1 842825 . AA GA . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.116|rs7519340;CGA_FI=284600|XR_108282.1|LOC284600|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluSz|Alu|13.3 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:283,.:283,.:48,.:-283,0,0:-48,0,0:15:15,.:0 -1 842919 . AA . . . NS=1;AN=0 GT:PS ./.:. -1 843215 . C CCTGCCCGGTCCTTCTGACCAGCCGAGAGTA . . NS=1;AN=2;AC=1;CGA_FI=284600|XR_108282.1|LOC284600|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:843215:PASS:129:129,336:126,328:35,40:-129,0,-336:-35,0,-40:16:7,9:9 -1 843249 . AGACCCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 843405 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs11516185;CGA_FI=284600|XR_108282.1|LOC284600|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:148:148,148:141,141:44,46:-148,0,-148:-44,0,-46:28:9,19:19 -1 843558 . GGGCAGCACA . . . NS=1;AN=0 GT:PS ./.:. -1 843579 . C . . . NS=1;AN=0 GT:PS ./.:. -1 843595 . GT . . . NS=1;AN=0 GT:PS ./.:. -1 843607 . AG . . . NS=1;AN=0 GT:PS ./.:. -1 843639 . A . . . NS=1;AN=0 GT:PS ./.:. -1 843642 . G . . . NS=1;AN=0 GT:PS ./.:. -1 843676 . AGGCTGGTGGGGGCAGCAGCTGGAAGGGGAAGGTC . . . NS=1;AN=0 GT:PS ./.:. -1 843913 . G . . END=844215;NS=1;AN=0 GT:PS ./.:. -1 844001 . G . . NS=1;CGA_WINEND=846000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.76:0.84:.:0:.:0:0.999:406 -1 844300 . C G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.129|rs61769713;CGA_FI=284600|XR_108282.1|LOC284600|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:53:112,53:111,51:41,31:-112,0,-53:-41,0,-31:8:5,3:3 -1 844356 . C . . . NS=1;AN=0 GT:PS ./.:. -1 844383 . CGGGGGTCGGGGTCAGGCCCCCGGGCGCACCGTTGCTGGTATATGCGGGGGTCG . . . NS=1;AN=0 GT:PS ./.:. -1 844443 . GGCCCCCGGGCGCACCGTTGCTGGTATATGCGGTGGT . . . NS=1;AN=0 GT:PS ./.:. -1 844516 . ATGCGGTGGTCGGGGTCAGGCCCCCGGGCGCACCTTTG . . . NS=1;AN=0 GT:PS ./.:. -1 844680 . G . . . NS=1;AN=0 GT:PS ./.:. -1 844691 . ATGCGGTGGTCGGGGTCAGGCCCCCGGGCGCACCGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 844736 . ATGCGGGGGTCGGGGTCAGGCCCCCGGGCGCACCGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 844781 . ATGCGGTGGTCGGGGTCAGGCCCC . . . NS=1;AN=0 GT:PS ./.:. -1 844810 . GCATCTTTGCTGGTATATGCGGTGGTCGGGGTCAGGCCCCCCGGGCGCACTGTTGCTGGTATATGCGGTGGTCGGGGTCAGGCCCCCGGGCGCACCTTTG . . . NS=1;AN=0 GT:PS ./.:. -1 844924 . GGTCGGGGTCAGGCCCCCGGGCGCACCGTTGCTGGTATATGCGGGGGTCGGGGTCAGGCCCCCGGGCGTTGCTGGTATATGCGGTGGTCGGGGTCAGGCCCCCGGGCGCACCGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 845047 . ATGCGGGGGTCGGGGTCAGGCCCCCGGGC . . . NS=1;AN=0 GT:PS ./.:. -1 845092 . ATGCGGT . . . NS=1;AN=0 GT:PS ./.:. -1 845108 . CAGGCCCCCGGGCGCACCGTTGCTGGTATATGCGGGGGTCGGGGTCAGGCCCCCGGGCGTTGCTGGTATATGCGGGGGTCGGGGTCAGGCCCCCGGGCGCAC . . . NS=1;AN=0 GT:PS ./.:. -1 845273 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.132|rs117039017;CGA_FI=284600|XR_108282.1|LOC284600|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:VQLOW:37:65,37:65,37:28,27:-65,0,-37:-28,0,-27:16:7,9:9 -1 845337 . 
GGCGCGGGTGGGGGTCCTGGGCAGGGGCGGCGGCGCGG . . . NS=1;AN=0 GT:PS ./.:. -1 845398 . CCCGC . . . NS=1;AN=0 GT:PS ./.:. -1 845589 . GGACGCCCTGGCGCCTCTGCTGCCCACGGCGGCCCCGAGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 845654 . CATCTCCTGTCAGCCGCAGTGACTGCGGGTGCCTGACGGCGCCGCCA . . . NS=1;AN=0 GT:PS ./.:. -1 845708 . G . . . NS=1;AN=0 GT:PS ./.:. -1 845711 . C . . . NS=1;AN=0 GT:PS ./.:. -1 846001 . G . . NS=1;CGA_WINEND=848000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.96:1.08:2:48:=:48:0.999:406 -1 847220 . A . . . NS=1;AN=0 GT:PS ./.:. -1 847223 . ACAATCGTGTTTAC . . . NS=1;AN=0 GT:PS ./.:. -1 847250 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs7416129;CGA_FI=284600|XR_108282.1|LOC284600|INTRON|UNKNOWN-INC;CGA_RPT=MIRb|MIR|38.8 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|1:847250:PASS:117:880,117:775,12:53,20:-880,-117,0:-53,-20,0:37:37,37:0 -1 848001 . T . . NS=1;CGA_WINEND=850000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.95:1.10:2:48:=:48:0.999:406 -1 848828 . GA G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.107|rs3841265&dbsnp.134|rs150504890;CGA_FI=284600|XR_108282.1|LOC284600|UTR|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:333:333,339:363,337:38,42:-333,0,-339:-38,0,-42:41:23,18:18 -1 849440 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs79376265;CGA_FI=284600|XR_108282.1|LOC284600|UTR|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:148:336,148:331,143:52,47:-336,0,-148:-52,0,-47:27:17,10:10 -1 849998 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.121|rs13303222;CGA_FI=284600|XR_108282.1|LOC284600|UTR|UNKNOWN-INC;CGA_RPT=L1MEc|L1|46.8 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:95:95,95:84,84:29,38:-95,0,-95:-29,0,-38:23:6,17:17 -1 850001 . G . . NS=1;CGA_WINEND=852000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.11:1.10:2:48:=:48:0.999:406 -1 850328 . CA CAA . . NS=1;AN=1;AC=1;CGA_FI=284600|XR_108282.1|LOC284600|UTR|UNKNOWN-INC;CGA_RPT=AluSp|Alu|10.6 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:850328:VQLOW:26,.:0,.:0,.:-26,0,0:0,0,0:12:4,.:4 -1 850352 . G . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:850328:VQLOW:26,.:0,.:0,.:0:0 -1 850528 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.132|rs114889920;CGA_FI=284600|XR_108282.1|LOC284600|UTR|UNKNOWN-INC;CGA_RPT=L1MEc|L1|46.8 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:850528:PASS:113:113,113:111,111:38,42:-113,0,-113:-38,0,-42:21:8,13:13 -1 850542 . T . . . NS=1;AN=0 GT:PS ./.:. -1 850780 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs6657440 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:333:333,333:331,331:46,54:-333,0,-333:-46,0,-54:43:18,25:25 -1 851030 . G C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs79709025 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:323:356,323:354,322:48,54:-356,0,-323:-48,0,-54:43:24,19:19 -1 851499 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.111|rs4970465 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:219:332,219:331,219:52,50:-332,0,-219:-52,0,-50:29:15,14:14 -1 851757 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4970464&dbsnp.129|rs62677860 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:30:312,30:312,30:48,29:-312,-30,0:-48,-29,0:16:16,16:0 -1 852001 . G . . NS=1;CGA_WINEND=854000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.06:0.92:2:49:=:49:0.999:406 -1 852016 . C . . . NS=1;AN=0 GT:PS ./.:. -1 852020 . CC . . . NS=1;AN=0 GT:PS ./.:. -1 852964 . T G . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.111|rs4970461;CGA_FI=100130417|NR_026874.1|FLJ39609|UTR|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:368:383,368:383,368:53,54:-383,0,-368:-53,0,-54:39:20,19:19 -1 853488 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs77644389;CGA_FI=100130417|NR_026874.1|FLJ39609|UTR|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:267:267,267:265,265:50,51:-267,0,-267:-50,0,-51:32:13,19:19 -1 854001 . G . . NS=1;CGA_WINEND=856000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.10:0.97:2:44:=:44:0.999:406 -1 854168 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs79188446;CGA_FI=100130417|NR_026874.1|FLJ39609|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:226:226,226:226,226:46,49:-226,0,-226:-46,0,-49:31:16,15:15 -1 854777 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.121|rs13303019;CGA_FI=100130417|NR_026874.1|FLJ39609|UTR|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:854777:PASS:196:196,196:195,195:49,50:-196,0,-196:-49,0,-50:27:13,14:14 -1 855075 . C G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs6673914;CGA_FI=100130417|NR_026874.1|FLJ39609|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:854777:PASS:285:285,285:285,285:51,52:-285,0,-285:-51,0,-52:34:15,19:19 -1 855168 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs78128413;CGA_FI=100130417|NR_026874.1|FLJ39609|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:99:99,99:98,98:39,40:-99,0,-99:-39,0,-40:15:6,9:9 -1 855635 . CCTT C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs77712898;CGA_FI=100130417|NR_026874.1|FLJ39609|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:855635:PASS:174:174,272:189,256:30,39:-174,0,-272:-30,0,-39:46:14,32:32 -1 855774 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs77595185;CGA_FI=100130417|NR_026874.1|FLJ39609|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:855635:PASS:171:298,171:296,169:52,49:-298,0,-171:-52,0,-49:26:15,11:11 -1 855878 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.132|rs117282503;CGA_FI=100130417|NR_026874.1|FLJ39609|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:855635:PASS:207:207,207:204,204:44,47:-207,0,-207:-44,0,-47:35:14,21:21 -1 856001 . G . . NS=1;CGA_WINEND=858000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.28:0.98:2:37:=:37:0.999:406 -1 856041 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.129|rs61769717;CGA_FI=100130417|NR_026874.1|FLJ39609|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:163:260,163:257,161:52,48:-260,0,-163:-52,0,-48:27:16,11:11 -1 856099 . T G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28534711;CGA_FI=100130417|NR_026874.1|FLJ39609|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 0|1:855635:PASS:109:193,109:183,104:50,36:-109,0,-193:-36,0,-50:28:21,7:21 -1 856108 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28742275;CGA_FI=100130417|NR_026874.1|FLJ39609|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 0|1:855635:PASS:129:193,129:183,138:50,43:-129,0,-193:-43,0,-50:22:15,7:15 -1 856329 . C G . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs7414599;CGA_FI=100130417|NR_026874.1|FLJ39609|TSS-UPSTREAM|UNKNOWN-INC&148398|NM_152486.2|SAMD11|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:284:355,284:354,282:53,52:-355,0,-284:-53,0,-52:35:20,15:15 -1 856476 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.108|rs4040605;CGA_FI=100130417|NR_026874.1|FLJ39609|TSS-UPSTREAM|UNKNOWN-INC&148398|NM_152486.2|SAMD11|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:36:310,36:310,36:48,34:-310,-36,0:-48,-34,0:17:16,16:1 -1 856628 . C . . . NS=1;AN=0 GT:PS ./.:. -1 857177 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28409649;CGA_FI=100130417|NR_026874.1|FLJ39609|TSS-UPSTREAM|UNKNOWN-INC&148398|NM_152486.2|SAMD11|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:857177:PASS:454:454,454:454,454:47,54:-454,0,-454:-47,0,-54:54:26,28:28 -1 857318 . G GGAGT . . NS=1;AN=2;AC=1;CGA_FI=100130417|NR_026874.1|FLJ39609|TSS-UPSTREAM|UNKNOWN-INC&148398|NM_152486.2|SAMD11|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:270:270,351:287,345:31,41:-270,0,-351:-31,0,-41:52:21,31:31 -1 857443 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.129|rs56663360;CGA_FI=100130417|NR_026874.1|FLJ39609|TSS-UPSTREAM|UNKNOWN-INC&148398|NM_152486.2|SAMD11|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:857177:PASS:371:639,371:635,366:49,54:-639,0,-371:-49,0,-54:57:34,23:23 -1 857728 . T G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs6689107;CGA_FI=100130417|NR_026874.1|FLJ39609|TSS-UPSTREAM|UNKNOWN-INC&148398|NM_152486.2|SAMD11|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:59:542,59:542,59:52,46:-542,-59,0:-52,-46,0:27:27,27:0 -1 858001 . C . . NS=1;CGA_WINEND=860000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.36:2.02:2:25:=:25:0.999:406 -1 858149 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs77520260;CGA_FI=100130417|NR_026874.1|FLJ39609|TSS-UPSTREAM|UNKNOWN-INC&148398|NM_152486.2|SAMD11|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:389:472,389:472,389:50,54:-472,0,-389:-50,0,-54:49:24,25:25 -1 858691 . TG T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.126|rs34628185&dbsnp.134|rs149702675;CGA_FI=100130417|NR_026874.1|FLJ39609|TSS-UPSTREAM|UNKNOWN-INC&148398|NM_152486.2|SAMD11|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:228:322,228:320,227:40,39:-322,0,-228:-40,0,-39:34:19,15:15 -1 858801 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs7418179;CGA_FI=100130417|NR_026874.1|FLJ39609|TSS-UPSTREAM|UNKNOWN-INC&148398|NM_152486.2|SAMD11|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=MIR|MIR|30.2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:56:56,56:40,40:12,28:-56,0,-56:-12,0,-28:25:6,19:19 -1 859222 . CCGTCACGCACCCCCCGCGGGAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 859332 . GCGGGCGCGCGCCAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 859353 . C . . . NS=1;AN=0 GT:PS ./.:. -1 859359 . CACGACTGA . . . NS=1;AN=0 GT:PS ./.:. -1 859404 . C G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.130|rs71509444;CGA_FI=100130417|NR_026874.1|FLJ39609|TSS-UPSTREAM|UNKNOWN-INC&148398|NM_152486.2|SAMD11|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:859404:VQLOW:36:74,36:73,35:31,27:-74,0,-36:-31,0,-27:9:5,4:4 -1 859436 . GCCGGGC . . . NS=1;AN=0 GT:PS ./.:. -1 859465 . GAGCCG . . . NS=1;AN=0 GT:PS ./.:. -1 859473 . CGTGGGACT . . . NS=1;AN=0 GT:PS ./.:. -1 859490 . 
CGCGGGC . . . NS=1;AN=0 GT:PS ./.:. -1 859506 . CGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 859520 . C . . . NS=1;AN=0 GT:PS ./.:. -1 859620 . CGGCCGGCTGGGCAGTCCGGGGAGGCCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 859656 . GCGC . . . NS=1;AN=0 GT:PS ./.:. -1 859664 . GCGGCGGCTGCGG . . . NS=1;AN=0 GT:PS ./.:. -1 859689 . AC . . . NS=1;AN=0 GT:PS ./.:. -1 859701 . CGT . . . NS=1;AN=0 GT:PS ./.:. -1 859709 . C . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:859709:VQLOW:21,.:17,.:0,.:0:0 -1 859713 . G . . . NS=1;AN=0 GT:PS ./.:. -1 859716 . AGGCGCCTCCCCGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 859820 . G . . . NS=1;AN=0 GT:PS ./.:. -1 859894 . T . . . NS=1;AN=0 GT:PS ./.:. -1 859908 . CTGCCA CTGCCG . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.132|rs112703963;CGA_FI=148398|NM_152486.2|SAMD11|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=(CCG)n|Simple_repeat|27.9 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:859908:PASS:61,.:61,.:27,.:-61,0,0:-27,0,0:6:6,.:0 -1 859967 . GCCCGCCTCGGCCGCCGGTTACGAGGCTCTGCTGGCCCCGCCG . . . NS=1;AN=0 GT:PS ./.:. -1 860001 . G . . NS=1;CGA_WINEND=862000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.90:1.05:2:36:=:36:0.999:406 -1 860100 . GCCACCGCGGCCGCGGCCCCGGATTTCCAGCCGCTGCTGGACAACGGCGAGCCGTGCATCGAGGTGGAGTGCGGCGCCAACCGCGCGCTGCT . . . NS=1;AN=0 GT:PS ./.:. -1 860254 . T . . . NS=1;AN=0 GT:PS ./.:. -1 860280 . AGC . . . NS=1;AN=0 GT:PS ./.:. -1 860288 . CG . . . NS=1;AN=0 GT:PS ./.:. -1 860296 . C . . . NS=1;AN=0 GT:PS ./.:. -1 860302 . AC . . . NS=1;AN=0 GT:PS ./.:. -1 860307 . A . . . NS=1;AN=0 GT:PS ./.:. -1 860310 . C . . . NS=1;AN=0 GT:PS ./.:. -1 860363 . CG . . . NS=1;AN=0 GT:PS ./.:. -1 860366 . G . . . NS=1;AN=0 GT:PS ./.:. -1 860371 . GACC . . . NS=1;AN=0 GT:PS ./.:. -1 860383 . T . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:860383:VQLOW:22,.:21,.:0,.:0:0 -1 860391 . TCGGACACCCGGGAGCCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 860413 . CTCG . . . NS=1;AN=0 GT:PS ./.:. -1 860428 . GCTGCAGCTCCAGGGCTGCGCGGGGACACCCCCGCCGCGCGCGGAGGCCTCGGTGAACAC . . . NS=1;AN=0 GT:PS ./.:. -1 860504 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs78033073;CGA_FI=148398|NM_152486.2|SAMD11|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:860504:VQLOW:30:30,64:32,64:10,35:-30,0,-64:-10,0,-35:7:2,5:5 -1 860521 . C A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.129|rs57924093;CGA_FI=148398|NM_152486.2|SAMD11|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 0|1:860504:PASS:49:49,60:49,60:31,27:-60,0,-49:-27,0,-31:8:3,5:3 -1 860583 . T . . . NS=1;AN=0 GT:PS ./.:. -1 860598 . C . . . NS=1;AN=0 GT:PS ./.:. -1 860601 . G . . . NS=1;AN=0 GT:PS ./.:. -1 860655 . CGGAGCCCCGGGTTCGGGGGAGACTGGAGGGGCGCACGTGCGGCCGGGTGCGAGCGCGCGGCGGGGGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 860730 . GGGCGGCG . . . NS=1;AN=0 GT:PS ./.:. -1 860761 . CGGCG . . . NS=1;AN=0 GT:PS ./.:. -1 860778 . A . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:860778:VQLOW:28,.:23,.:0,.:0:0 -1 860789 . TGGCGCCTGCGGGCG . . . NS=1;AN=0 GT:PS ./.:. -1 860810 . CCCAA . . . NS=1;AN=0 GT:PS ./.:. -1 860854 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.129|rs57816555;CGA_FI=148398|NM_152486.2|SAMD11|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:36:36,146:30,139:29,45:-146,-36,0:-45,-29,0:11:11,11:0 -1 860862 . GCGCC . . . NS=1;AN=0 GT:PS ./.:. -1 860929 . T . . . NS=1;AN=0 GT:PS ./.:. -1 860932 . T . . . NS=1;AN=0 GT:PS ./.:. -1 861008 . G C . . 
NS=1;AN=2;AC=2;CGA_XR=dbsnp.125|rs28521172;CGA_FI=148398|NM_152486.2|SAMD11|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:21:189,21:189,21:48,22:-189,-21,0:-48,-22,0:8:8,8:0 -1 861115 . AGCCCAGCAGATCCCTGCGGCGTTCGCGAGGGTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 861630 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.101|rs2879816;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:135:279,135:277,132:52,45:-279,0,-135:-52,0,-45:21:13,8:8 -1 861808 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.121|rs13302982;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:861808:PASS:345:345,345:345,345:47,54:-345,0,-345:-47,0,-54:41:20,21:21 -1 862001 . T . . NS=1;CGA_WINEND=864000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.14:0.92:2:44:=:44:0.999:406 -1 862093 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.121|rs13303291;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:861808:PASS:239:319,239:321,240:53,50:-319,0,-239:-53,0,-50:35:19,16:16 -1 862124 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.121|rs13303101;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:861808:PASS:393:393,530:393,531:50,54:-393,0,-530:-50,0,-54:46:20,26:26 -1 862186 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs74442310;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:194:194,194:194,194:49,50:-194,0,-194:-49,0,-50:21:11,10:10 -1 862383 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs6680268;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:861808:PASS:415:422,415:421,419:53,54:-422,0,-415:-53,0,-54:38:19,19:19 -1 862389 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs6693546;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:861808:PASS:330:441,330:440,331:53,54:-441,0,-330:-53,0,-54:36:19,17:17 -1 862866 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.108|rs3892970;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:260:260,260:259,259:49,51:-260,0,-260:-49,0,-51:33:15,18:18 -1 863124 . G T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.108|rs4040604;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:200:200,200:197,197:43,47:-200,0,-200:-43,0,-47:33:12,21:21 -1 863499 . TGT CGC . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28649395&dbsnp.125|rs28718350;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC;CGA_RPT=AluY|Alu|6.6 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:367:456,367:456,366:33,35:-456,0,-367:-33,0,-35:41:23,18:18 -1 863508 . G T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs7410984;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC;CGA_RPT=AluY|Alu|6.6 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:165:165,780:181,783:52,50:-780,-165,0:-52,-50,0:40:40,40:0 -1 863511 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.125|rs28626846;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC;CGA_RPT=AluY|Alu|6.6 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:117:117,576:124,577:48,52:-576,-117,0:-52,-48,0:29:29,29:0 -1 863556 . G A . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs7410998;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC;CGA_RPT=AluY|Alu|6.6 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:274:274,274:272,272:50,52:-274,0,-274:-50,0,-52:36:15,21:21 -1 863632 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28403979;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC;CGA_RPT=AluY|Alu|6.6 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:95:281,95:278,93:52,39:-281,0,-95:-52,0,-39:24:14,10:10 -1 863641 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.125|rs28569249;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC;CGA_RPT=AluY|Alu|6.6 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:61:61,476:95,477:48,52:-476,-61,0:-52,-48,0:24:23,23:0 -1 863644 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.125|rs28739566;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC;CGA_RPT=AluY|Alu|6.6 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:74:74,465:98,462:48,52:-465,-74,0:-52,-48,0:22:22,22:0 -1 863689 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs7417994;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC;CGA_RPT=AluY|Alu|6.6 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:863689:PASS:250:250,251:254,251:52,50:-250,0,-251:-52,0,-50:20:10,10:10 -1 863696 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.126|rs35717056;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC;CGA_RPT=AluY|Alu|6.6 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 0|1:863689:PASS:181:181,289:181,289:50,52:-289,0,-181:-52,0,-50:23:9,14:9 -1 863776 . A C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.126|rs35485427;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC;CGA_RPT=AluY|Alu|6.6 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:323:349,323:348,322:53,54:-349,0,-323:-53,0,-54:35:20,15:15 -1 863843 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.126|rs35599603;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:863843:PASS:248:356,248:355,253:53,51:-356,0,-248:-53,0,-51:32:18,14:14 -1 863863 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.126|rs35854196;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:863843:PASS:168:323,168:322,169:52,49:-323,0,-168:-52,0,-49:24:15,9:9 -1 863978 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.130|rs74047403;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:863978:PASS:102:268,102:265,98:48,40:-268,0,-102:-48,0,-40:19:12,7:7 -1 863999 . T . . . NS=1;AN=0 GT:PS ./.:. -1 864001 . G . . NS=1;CGA_WINEND=866000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.16:0.88:2:50:=:50:0.999:406 -1 864278 . G A . . NS=1;AN=2;AC=1;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:240:436,240:434,238:50,50:-436,0,-240:-50,0,-50:41:23,18:18 -1 864726 . T A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2340590;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:135:1145,135:1145,135:49,52:-1145,-135,0:-52,-49,0:50:50,50:0 -1 864755 . AGG GGA . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2340588&dbsnp.100|rs2340589;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:109:109,1078:109,1078:37,33:-1078,-109,0:-37,-33,0:49:49,49:0 -1 864938 . G A . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.100|rs2340587&dbsnp.130|rs78370858;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:864938:PASS:435:435,435:435,435:46,54:-435,0,-435:-46,0,-54:52:26,26:26 -1 865367 . G T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs75294478;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 0|1:864938:PASS:269:269,269:267,267:51,50:-269,0,-269:-50,0,-51:34:20,14:20 -1 865663 . G . . . NS=1;AN=0 GT:PS ./.:. -1 865694 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.119|rs9988179;CGA_FI=148398|NM_152486.2|SAMD11|CDS|MISSENSE GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:865694:PASS:214:214,214:212,212:51,50:-214,0,-214:-51,0,-50:29:12,17:17 -1 865948 . GCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 866001 . C . . NS=1;CGA_WINEND=868000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.07:0.90:2:52:=:52:0.999:406 -1 866319 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9988021;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:37:240,37:240,37:48,35:-240,-37,0:-48,-35,0:14:14,14:0 -1 866511 . C CCCCT . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.129|rs60722469;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:66:66,101:63,98:25,33:-66,0,-101:-25,0,-33:18:11,7:7 -1 866920 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2341361;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:72:72,632:72,632:48,52:-632,-72,0:-52,-48,0:29:29,29:0 -1 867584 . A T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2341360;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:132:132,1154:132,1154:52,50:-1154,-132,0:-52,-50,0:47:47,47:0 -1 867993 . GTTTC G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.114|rs5772025&dbsnp.134|rs138211850;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:101:101,764:101,787:39,38:-764,-101,0:-39,-38,0:42:42,42:0 -1 868001 . C . . NS=1;CGA_WINEND=870000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.71:0.81:2:50:=:50:0.999:406 -1 868329 . A C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2341359;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:75:75,658:75,658:41,53:-658,-75,0:-53,-41,0:31:31,31:0 -1 868404 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.121|rs13302914;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC;CGA_RPT=C-rich|Low_complexity|28.7 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:94:94,753:94,753:48,53:-753,-94,0:-53,-48,0:33:33,33:0 -1 868791 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.121|rs13303003;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:35:35,374:35,374:33,48:-374,-35,0:-48,-33,0:16:16,16:0 -1 868840 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28532704;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:166:185,166:185,166:48,49:-185,0,-166:-48,0,-49:19:9,10:10 -1 868891 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.121|rs13303066;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:71:613,71:613,71:53,40:-613,-71,0:-53,-40,0:30:30,30:0 -1 868928 . A AG . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.129|rs56367715&dbsnp.131|rs74724223;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:60:60,135:85,123:29,36:-60,0,-135:-29,0,-36:29:8,21:21 -1 868981 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.121|rs13303037;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:120:120,120:117,117:42,43:-120,0,-120:-42,0,-43:17:6,11:11 -1 869121 . T TG . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.129|rs56190854&dbsnp.131|rs75384481&dbsnp.132|rs111626358;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:40:40,70:51,69:18,29:-40,0,-70:-18,0,-29:14:5,9:9 -1 869244 . C . . . NS=1;AN=0 GT:PS ./.:. -1 869323 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.121|rs13303207;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:VQLOW:12:103,12:103,12:40,16:-103,-12,0:-40,-16,0:8:8,8:0 -1 869354 . G . . END=870152;NS=1;AN=0 GT:PS ./.:. -1 870001 . G . . NS=1;CGA_WINEND=872000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.85:0.77:2:50:=:50:0.999:406 -1 870166 . GTGGGCAGGGGAGGCGGCTGCGTTACAGGTGGGCGGGGGAGGCGGCTCCGTTACAGGTGGGCGGG . . . NS=1;AN=0 GT:PS ./.:. -1 870233 . A . . . NS=1;AN=0 GT:PS ./.:. -1 870248 . A . . . NS=1;AN=0 GT:PS ./.:. -1 870252 . GGGCGGGG GTGCAGGG . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.125|rs28491190&dbsnp.135|rs186226871;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:870252:PASS:137,.:105,.:23,.:-137,0,0:-23,0,0:18:14,.:4 -1 870269 . GC . . . NS=1;AN=0 GT:PS ./.:. -1 870284 . G A . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.125|rs28621383;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:870252:PASS:83,.:85,.:35,.:-83,0,0:-35,0,0:11:7,.:4 -1 870290 . GGCGGCTGCGTTACAGGTGGGCGGGGGGGG GG . . NS=1;AN=1;AC=1;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:870252:PASS:60,.:28,.:6,.:-60,0,0:-6,0,0:13:6,.:4 -1 870331 . C . . . NS=1;AN=0 GT:PS ./.:. -1 870340 . GGGCGG . . . NS=1;AN=0 GT:PS ./.:. -1 870360 . T . . . NS=1;AN=0 GT:PS ./.:. -1 870373 . G . . . NS=1;AN=0 GT:PS ./.:. -1 870388 . GCCCCTTCT . . . NS=1;AN=0 GT:PS ./.:. -1 870806 . C CGGAGCTCCTCT . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.129|rs59561572&dbsnp.132|rs113002845;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:226:226,373:222,369:33,41:-226,0,-373:-33,0,-41:40:25,15:15 -1 870903 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.121|rs13303094;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:93:753,93:753,93:53,47:-753,-93,0:-53,-47,0:31:31,31:0 -1 871098 . G . . . NS=1;AN=0 GT:PS ./.:. -1 871215 . C G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28419423;CGA_FI=148398|NM_152486.2|SAMD11|CDS|SYNONYMOUS GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:143:143,181:155,178:45,49:-143,0,-181:-45,0,-49:26:10,16:16 -1 871280 . A . . . NS=1;AN=0 GT:PS ./.:. -1 871334 . G T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.108|rs4072383;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:103:265,103:261,99:48,40:-265,0,-103:-48,0,-40:19:12,7:7 -1 871683 . G A . . 
NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4504834;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:VQLOW:16:41,16:41,16:17,18:-41,-16,0:-18,-17,0:2:1,1:1 -1 871958 . G . . . NS=1;AN=0 GT:PS ./.:. -1 871966 . G A . . NS=1;AN=2;AC=1;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:871966:VQLOW:27:27,27:26,26:5,25:-27,0,-27:-5,0,-25:6:2,4:4 -1 872001 . C . . NS=1;CGA_WINEND=874000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.00:1.02:2:52:=:52:0.999:406 -1 872087 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs78308511;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:872087:PASS:205:205,263:210,262:51,50:-205,0,-263:-51,0,-50:29:12,17:17 -1 872091 . T G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs77614634;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:872087:PASS:206:206,263:213,262:45,51:-206,0,-263:-45,0,-51:30:13,17:17 -1 872352 . G C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.92|rs1806780;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:872087:PASS:114:114,114:103,103:35,41:-114,0,-114:-35,0,-41:28:8,20:20 -1 872776 . GCCC G,GC . . NS=1;AN=2;AC=1,1;CGA_XR=.,dbsnp.114|rs5772023&dbsnp.126|rs34936020;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC,148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/2:.:PASS:64:64,64:63,64:19,13:-64,-64,-64,-64,0,-64:-19,-13,-13,-19,0,-19:12:6,5:0 -1 872817 . G . . . NS=1;AN=0 GT:PS ./.:. -1 872964 . GCCTTGGCCCACCCCCTCCCAGCCCACC . . . NS=1;AN=0 GT:PS ./.:. -1 873394 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.86|rs1110051;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:150:150,150:146,146:44,47:-150,0,-150:-44,0,-47:27:10,17:17 -1 873558 . G T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.86|rs1110052;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:873558:PASS:389:389,389:387,387:43,54:-389,0,-389:-43,0,-54:51:22,29:29 -1 874001 . C . . NS=1;CGA_WINEND=876000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.15:1.04:2:52:=:52:0.999:406 -1 874073 . C G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28450942;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 0|1:873558:PASS:408:408,419:408,419:54,50:-419,0,-408:-50,0,-54:47:24,23:24 -1 874659 . GCT . . . NS=1;AN=0 GT:PS ./.:. -1 874925 . ACA . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:874925:PASS:206,.:202,.:0,.:0:0 -1 874950 . T TCCCTGGAGGACC . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.114|rs6143081&dbsnp.131|rs79212057;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 0|1:874925:PASS:112:206,112:202,124:40,35:-112,0,-206:-35,0,-40:22:12,10:12 -1 875605 . CC . . . NS=1;AN=0 GT:PS ./.:. -1 875612 . GCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 875623 . GGGG . . . NS=1;AN=0 GT:PS ./.:. -1 875667 . CGCGCCG . . . NS=1;AN=0 GT:PS ./.:. -1 875676 . GCC G . . NS=1;AN=1;AC=1;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC;CGA_RPT=GC_rich|Low_complexity|2.2 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:875676:PASS:49,.:48,.:14,.:-49,0,0:-14,0,0:10:3,.:0 -1 875741 . C . . . NS=1;AN=0 GT:PS ./.:. -1 876001 . C . . 
NS=1;CGA_WINEND=878000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.94:0.70:2:48:=:48:0.999:406 -1 876402 . A . . . NS=1;AN=0 GT:PS ./.:. -1 876499 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4372192;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:29:237,29:237,29:52,28:-237,-29,0:-52,-28,0:20:20,20:0 -1 876949 . G C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.130|rs72902600;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:69:69,127:70,127:24,45:-69,0,-127:-24,0,-45:20:10,10:10 -1 877003 . G T . . NS=1;AN=2;AC=1;CGA_FI=148398|NM_152486.2|SAMD11|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:88:88,88:87,87:36,38:-88,0,-88:-36,0,-38:18:8,10:10 -1 877176 . T . . . NS=1;AN=0 GT:PS ./.:. -1 877235 . CGCTCGGGTCCGCAGGGGAGGGGAGCAGGCGGGGCCGGCGCCCCGCGC . . . NS=1;AN=0 GT:PS ./.:. -1 877334 . G . . . NS=1;AN=0 GT:PS ./.:. -1 877337 . G . . . NS=1;AN=0 GT:PS ./.:. -1 877353 . GCC . . . NS=1;AN=0 GT:PS ./.:. -1 877358 . C . . . NS=1;AN=0 GT:PS ./.:. -1 877361 . C . . . NS=1;AN=0 GT:PS ./.:. -1 877375 . GCGCGCG . . . NS=1;AN=0 GT:PS ./.:. -1 877472 . AACGGGGGCGGGGGGGACGCCGCTCATTGCGCTGCCGTCC . . . NS=1;AN=0 GT:PS ./.:. -1 877522 . G . . END=877723;NS=1;AN=0 GT:PS ./.:. -1 877746 . CCGCCTCGGACCCCCCGACCCCGCGTTGTCCCCCTCCCCA . . . NS=1;AN=0 GT:PS ./.:. -1 877800 . C . . END=878115;NS=1;AN=0 GT:PS ./.:. -1 878001 . T . . NS=1;CGA_WINEND=880000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.77:0.85:2:50:=:50:0.999:406 -1 878121 . CGGGCTCCGGACCCCCCACCCCGTCCCGGGACTCTGCCCGGCGA . . . NS=1;AN=0 GT:PS ./.:. -1 878302 . GG . . . NS=1;AN=0 GT:PS ./.:. -1 878326 . C . . . NS=1;AN=0 GT:PS ./.:. -1 878361 . CCGA . . . NS=1;AN=0 GT:PS ./.:. -1 878369 . AAGGGGCTTTTCCCAGGGTCCACACTGCCCCTGGGC . . . NS=1;AN=0 GT:PS ./.:. -1 878440 . T . . . NS=1;AN=0 GT:PS ./.:. -1 878445 . A . . . NS=1;AN=0 GT:PS ./.:. -1 878903 . CACCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 879013 . G . . . NS=1;AN=0 GT:PS ./.:. -1 879118 . T . . . NS=1;AN=0 GT:PS ./.:. -1 879317 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs7523549;CGA_FI=148398|NM_152486.2|SAMD11|CDS|SYNONYMOUS GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:104:104,104:100,100:35,40:-104,0,-104:-35,0,-40:20:7,13:13 -1 879676 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs6605067;CGA_FI=148398|NM_152486.2|SAMD11|UTR3|UNKNOWN-INC&26155|NM_015658.3|NOC2L|UTR3|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:126:126,803:126,803:52,49:-803,-126,0:-52,-49,0:52:51,51:1 -1 879687 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.36|rs2839;CGA_FI=148398|NM_152486.2|SAMD11|UTR3|UNKNOWN-INC&26155|NM_015658.3|NOC2L|UTR3|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:78:653,78:653,78:53,42:-653,-78,0:-53,-42,0:39:39,39:0 -1 879776 . CTGCA . . . NS=1;AN=0 GT:PS ./.:. -1 879786 . CCC . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:879786:PASS:41,.:14,.:0,.:0:0 -1 879792 . GGAA . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:879786:PASS:41,.:14,.:0,.:0:0 -1 879817 . TGACG . . . NS=1;AN=0 GT:PS ./.:. -1 879824 . A . . . NS=1;AN=0 GT:PS ./.:. -1 879830 . AG . . . NS=1;AN=0 GT:PS ./.:. -1 879840 . CG . . . NS=1;AN=0 GT:PS ./.:. -1 879858 . C . . . NS=1;AN=0 GT:PS ./.:. -1 879863 . AGCTGTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 879874 . T . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:879874:PASS:92,.:91,.:0,.:0:0 -1 879878 . G . . . 
NS=1;AN=0 GT:PS ./.:. -1 879881 . G . . . NS=1;AN=0 GT:PS ./.:. -1 880001 . G . . NS=1;CGA_WINEND=882000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.10:1.15:2:50:=:50:0.999:406 -1 880128 . CAGCTGCTGCAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 880238 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.107|rs3748592;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:111:1006,111:1006,111:53,52:-1006,-111,0:-53,-52,0:39:39,39:0 -1 880390 . C A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.107|rs3748593;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:233:332,233:331,231:53,49:-332,0,-233:-53,0,-49:31:18,13:13 -1 881377 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs74360597;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:390:452,390:452,390:50,54:-452,0,-390:-50,0,-54:49:25,24:24 -1 882001 . C . . NS=1;CGA_WINEND=884000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.93:0.97:2:51:=:51:0.999:406 -1 882275 . T . . . NS=1;AN=0 GT:PS ./.:. -1 882280 . TAC . . . NS=1;AN=0 GT:PS ./.:. -1 882803 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2340582;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:94:94,1002:94,1002:48,50:-1002,-94,0:-50,-48,0:45:45,45:0 -1 883625 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4970378;CGA_FI=26155|NM_015658.3|NOC2L|ACCEPTOR|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:63:427,63:427,63:52,48:-427,-63,0:-52,-48,0:25:25,25:0 -1 883948 . C . . . NS=1;AN=0 GT:PS ./.:. -1 883961 . TTGAAGTCGACCTGCTGGAACATCTGCC . . . NS=1;AN=0 GT:PS ./.:. -1 884001 . T . . NS=1;CGA_WINEND=886000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.97:1.10:2:50:=:50:0.999:406 -1 884038 . TCACCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 884091 . C . . . NS=1;AN=0 GT:PS ./.:. -1 884426 . A AACAGCAAAG . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.114|rs5772022;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:36:36,379:36,379:25,39:-379,-36,0:-39,-25,0:33:33,33:0 -1 884815 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4246503;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:76:793,76:793,76:53,41:-793,-76,0:-53,-41,0:36:36,36:0 -1 885654 . T . . . NS=1;AN=0 GT:PS ./.:. -1 885657 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.132|rs116767636;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC;CGA_RPT=(CA)n|Simple_repeat|38.0 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:885657:PASS:80:230,80:183,34:48,27:-230,0,-80:-48,0,-27:20:14,6:6 -1 885676 . C A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4970377;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC;CGA_RPT=(CA)n|Simple_repeat|38.0 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|1:885676:PASS:46:46,561:75,517:48,52:-561,-46,0:-52,-48,0:28:28,28:0 -1 885689 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4970452;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC;CGA_RPT=(CA)n|Simple_repeat|38.0 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|1:885676:PASS:45:45,697:79,651:48,52:-697,-45,0:-52,-48,0:29:29,29:0 -1 885699 . A G . . 
NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4970376;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC;CGA_RPT=(CA)n|Simple_repeat|38.0 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|1:885676:VQLOW:17:17,577:17,531:19,52:-577,-17,0:-52,-19,0:26:26,26:0 -1 885704 . T . . . NS=1;AN=0 GT:PS ./.:. -1 885945 . G C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28535998;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC;CGA_RPT=(CA)n|Simple_repeat|34.9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:127:127,127:126,126:41,44:-127,0,-127:-41,0,-44:20:9,11:11 -1 886001 . G . . NS=1;CGA_WINEND=888000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.92:0.96:2:52:=:52:0.999:406 -1 886006 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4970375;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC;CGA_RPT=(CA)n|Simple_repeat|34.9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:26:182,26:182,26:48,26:-182,-26,0:-48,-26,0:11:11,11:0 -1 886049 . ACAG A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.134|rs146799731;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC;CGA_RPT=(CA)n|Simple_repeat|34.9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 0|1:885657:PASS:387:387,435:387,435:42,40:-435,0,-387:-40,0,-42:39:18,21:18 -1 886180 . ACTGTTC CTTTCAG . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.126|rs34767445&dbsnp.130|rs71490527;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:83:723,83:723,83:32,35:-723,-83,0:-35,-32,0:53:53,53:0 -1 886309 . G . . . NS=1;AN=0 GT:PS ./.:. -1 886384 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.107|rs3748594;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 0|1:885657:PASS:69:69,69:54,54:32,19:-69,0,-69:-19,0,-32:23:17,6:17 -1 886654 . G . . . NS=1;AN=0 GT:PS ./.:. -1 886786 . CAGGTACC . . . NS=1;AN=0 GT:PS ./.:. -1 886802 . C . . . NS=1;AN=0 GT:PS ./.:. -1 886817 . C CATTTT . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.132|rs111748052;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:886817:PASS:166:166,180:175,175:39,40:-166,0,-180:-39,0,-40:23:8,15:15 -1 886982 . C . . . NS=1;AN=0 GT:PS ./.:. -1 887059 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.132|rs76456117;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:299:395,299:393,297:53,53:-395,0,-299:-53,0,-53:38:22,16:16 -1 887560 . A C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.107|rs3748595;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:76:778,76:778,76:53,41:-778,-76,0:-53,-41,0:38:38,38:0 -1 887801 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.107|rs3828047;CGA_FI=26155|NM_015658.3|NOC2L|CDS|NO-CHANGE;CGA_PFAM=PFAM|PF03715|Noc2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:69:69,583:69,583:39,53:-583,-69,0:-53,-39,0:31:31,31:0 -1 888001 . G . . NS=1;CGA_WINEND=890000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.10:0.92:2:53:=:53:0.999:406 -1 888220 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28711536;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:327:327,327:326,326:46,54:-327,0,-327:-46,0,-54:41:19,22:22 -1 888639 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.107|rs3748596;CGA_FI=26155|NM_015658.3|NOC2L|CDS|SYNONYMOUS GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:92:689,92:689,92:50,47:-689,-92,0:-50,-47,0:42:42,42:0 -1 888659 . T C . . 
NS=1;AN=2;AC=2;CGA_XR=dbsnp.107|rs3748597;CGA_FI=26155|NM_015658.3|NOC2L|CDS|NO-CHANGE GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:73:632,73:632,73:52,48:-632,-73,0:-52,-48,0:27:27,27:0 -1 888740 . CT . . . NS=1;AN=0 GT:PS ./.:. -1 888743 . C . . . NS=1;AN=0 GT:PS ./.:. -1 888747 . GCCA . . . NS=1;AN=0 GT:PS ./.:. -1 889131 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.107|rs3828048;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:218:218,285:218,285:52,50:-218,0,-285:-52,0,-50:26:13,13:13 -1 889158 . GA CC . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.121|rs13302945&dbsnp.121|rs13303056&dbsnp.129|rs56262069;CGA_FI=26155|NM_015658.3|NOC2L|DONOR|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:96:789,96:789,124:35,37:-789,-96,0:-37,-35,0:34:34,34:0 -1 889638 . G C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.121|rs13303206;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:113:1027,113:1027,113:50,52:-1027,-113,0:-52,-50,0:42:42,42:0 -1 889713 . C A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.121|rs13303051;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:84:84,855:84,855:44,53:-855,-84,0:-53,-44,0:37:37,37:0 -1 890001 . C . . NS=1;CGA_WINEND=892000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.99:0.94:2:53:=:53:0.999:406 -1 890295 . G C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs6661531;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:242:412,242:410,240:53,50:-412,0,-242:-53,0,-50:36:21,15:15 -1 890447 . G . . . NS=1;AN=0 GT:PS ./.:. -1 891021 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.121|rs13302957;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC&339451|NM_198317.1|KLHL17|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:301:301,301:298,298:45,53:-301,0,-301:-45,0,-53:40:16,24:24 -1 892001 . C . . NS=1;CGA_WINEND=894000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.09:0.97:2:52:=:52:0.999:406 -1 892460 . G C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.127|rs41285802;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC&339451|NM_198317.1|KLHL17|TSS-UPSTREAM|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:295:295,295:295,295:53,53:-295,0,-295:-53,0,-53:34:17,17:17 -1 892745 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.121|rs13303227;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC&339451|NM_198317.1|KLHL17|TSS-UPSTREAM|UNKNOWN-INC;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:112:1023,112:1023,112:50,52:-1023,-112,0:-52,-50,0:44:44,44:0 -1 893280 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4970371;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC&339451|NM_198317.1|KLHL17|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=L2a|L2|63.0;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:74:74,739:74,739:41,53:-739,-74,0:-53,-41,0:31:30,30:1 -1 893461 . TC . . . NS=1;AN=0 GT:PS ./.:. -1 893615 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs75254714;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC&339451|NM_198317.1|KLHL17|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluSx|Alu|9.3;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:511:511,511:511,511:49,54:-511,0,-511:-49,0,-54:54:27,26:26 -1 893631 . A G . . 
NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs6605069;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC&339451|NM_198317.1|KLHL17|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluSx|Alu|11.8;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:249:1359,249:1359,300:49,52:-1359,-249,0:-52,-49,0:54:54,54:0 -1 893719 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.111|rs4970445;CGA_FI=26155|NM_015658.3|NOC2L|INTRON|UNKNOWN-INC&339451|NM_198317.1|KLHL17|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluSx|Alu|11.8;CGA_SDO=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:174:351,174:347,170:52,49:-351,0,-174:-52,0,-49:29:18,11:11 -1 894001 . T . . NS=1;CGA_WINEND=896000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.02:0.93:2:51:=:51:0.999:406 -1 894673 . T C . . NS=1;AN=2;AC=1;CGA_FI=26155|NM_015658.3|NOC2L|UTR5|UNKNOWN-INC&339451|NM_198317.1|KLHL17|TSS-UPSTREAM|UNKNOWN-INC;CGA_SDO=2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:117:119,117:118,117:43,43:-119,0,-117:-43,0,-43:18:10,8:8 -1 894890 . A AAGAC . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.130|rs70949531&dbsnp.132|rs111643007;CGA_FI=26155|NM_015658.3|NOC2L|TSS-UPSTREAM|UNKNOWN-INC&339451|NM_198317.1|KLHL17|TSS-UPSTREAM|UNKNOWN-INC;CGA_SDO=2 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:134,.:134,.:35,.:-134,0,0:-35,0,0:11:11,.:0 -1 895115 . GATCCGCGGAGACCGGGCCAGCGCCACGAACACCACGCAGGGCG . . . NS=1;AN=0 GT:PS ./.:. -1 895181 . GTGCCGACCGCGGCTCTTCCCGGGGACGCCGCACGGGACGAAGA . . . NS=1;AN=0 GT:PS ./.:. -1 895293 . AGACCCCGC . . . NS=1;AN=0 GT:PS ./.:. -1 895324 . GGCACGGCTC . . . NS=1;AN=0 GT:PS ./.:. -1 895427 . GCTCCCCGGAGGAGAGCAAGTTAGGGGGTCG . . . NS=1;AN=0 GT:PS ./.:. -1 895507 . CTCCTCGGAGGAGGAAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 895545 . CCTTGGAGGAGGAGGAGGGCGAGGCTTAGGAGGGCTCTTCGGA . . . NS=1;AN=0 GT:PS ./.:. -1 895610 . G . . . NS=1;AN=0 GT:PS ./.:. -1 895613 . G . . . NS=1;AN=0 GT:PS ./.:. -1 895616 . C . . . NS=1;AN=0 GT:PS ./.:. -1 895619 . C . . . NS=1;AN=0 GT:PS ./.:. -1 895622 . G . . . NS=1;AN=0 GT:PS ./.:. -1 895629 . A . . . NS=1;AN=0 GT:PS ./.:. -1 895641 . A . . . NS=1;AN=0 GT:PS ./.:. -1 895652 . CTTCCCGGAGGAGGAGGAGGAGGAGGGCTAGGCCGGGGGCTTCCCAGAGGAGGAGGATGGCGGGGCCTGGGGGGCTTCTCG . . . NS=1;AN=0 GT:PS ./.:. -1 895767 . C . . . NS=1;AN=0 GT:PS ./.:. -1 895845 . GGAGGAGGCGGACCCGGGGCGCAGCGCTGGAAGAATCCGCGTCC . . . NS=1;AN=0 GT:PS ./.:. -1 895895 . TAGTCCCCGAAGCCTCTCGGGAGGCGGGGCGGGCGGCGCCGAGAAACA . . . NS=1;AN=0 GT:PS ./.:. -1 895972 . GCGACACAGAGCGGGCCGCCACCGCCGAGCAGCCCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 896001 . C . . NS=1;CGA_WINEND=898000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.08:1.34:2:45:=:45:0.999:406 -1 896041 . CCTCCGC . . . NS=1;AN=0 GT:PS ./.:. -1 896061 . GTCCGGCAGCCGAATGCAGCCCCGCAGCGAGCGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 896103 . CAGGACGCAGAGCCCGGAGCACGGCAGCCCGGGGCCCGGGCCCGAGGCGCCGCCGCCTCCACCGCCGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 896190 . C . . . NS=1;AN=0 GT:PS ./.:. -1 896202 . C . . . NS=1;AN=0 GT:PS ./.:. -1 896225 . GCGCGGCCCCCGCCCTCGCGTCCGCTCGCAGAAGGGGCGGGGGCCGCT . . . NS=1;AN=0 GT:PS ./.:. -1 896333 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.134|rs144174542;CGA_FI=26155|NM_015658.3|NOC2L|TSS-UPSTREAM|UNKNOWN-INC&339451|NM_198317.1|KLHL17|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:896333:PASS:49:93,49:92,49:37,31:-93,0,-49:-37,0,-31:11:6,5:5 -1 896339 . C . . . NS=1;AN=0 GT:PS ./.:. -1 896348 . C . . . NS=1;AN=0 GT:PS ./.:. -1 896350 . CGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 896426 . GGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 896476 . A G . . 
NS=1;AN=2;AC=2;CGA_XR=dbsnp.125|rs28393498;CGA_FI=26155|NM_015658.3|NOC2L|TSS-UPSTREAM|UNKNOWN-INC&339451|NM_198317.1|KLHL17|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:21:208,21:208,21:48,22:-208,-21,0:-48,-22,0:17:17,17:0 -1 896554 . T . . . NS=1;AN=0 GT:PS ./.:. -1 896563 . C . . . NS=1;AN=0 GT:PS ./.:. -1 896610 . AGCGGCTCCAGGGCGGGCGGGCGGCTCCAGCGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 896731 . GCCGTGCAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 897325 . G C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4970441;CGA_FI=26155|NM_015658.3|NOC2L|TSS-UPSTREAM|UNKNOWN-INC&339451|NM_198317.1|KLHL17|CDS|SYNONYMOUS&84069|NM_001160184.1|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC;CGA_PFAM=PFAM|PF07707|BACK GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:84:589,84:589,84:53,44:-589,-84,0:-53,-44,0:31:31,31:0 -1 897564 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.121|rs13303229;CGA_FI=26155|NM_015658.3|NOC2L|TSS-UPSTREAM|UNKNOWN-INC&339451|NM_198317.1|KLHL17|INTRON|UNKNOWN-INC&84069|NM_001160184.1|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:39:39,330:39,330:36,52:-330,-39,0:-52,-36,0:20:20,20:0 -1 897718 . A . . . NS=1;AN=0 GT:PS ./.:. -1 897730 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs7549631;CGA_FI=26155|NM_015658.3|NOC2L|TSS-UPSTREAM|UNKNOWN-INC&339451|NM_198317.1|KLHL17|ACCEPTOR|UNKNOWN-INC&84069|NM_001160184.1|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:897730:PASS:125:193,125:186,119:48,43:-193,0,-125:-48,0,-43:16:9,7:7 -1 897879 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs79008338;CGA_FI=26155|NM_015658.3|NOC2L|TSS-UPSTREAM|UNKNOWN-INC&339451|NM_198317.1|KLHL17|INTRON|UNKNOWN-INC&84069|NM_001160184.1|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:101:215,101:211,97:48,40:-215,0,-101:-48,0,-40:18:12,6:6 -1 898001 . A . . NS=1;CGA_WINEND=900000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.84:0.86:2:50:=:50:0.999:406 -1 898323 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs6605071;CGA_FI=26155|NM_015658.3|NOC2L|TSS-UPSTREAM|UNKNOWN-INC&339451|NM_198317.1|KLHL17|INTRON|UNKNOWN-INC&84069|NM_001160184.1|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:84:644,84:644,84:52,48:-644,-84,0:-52,-48,0:29:29,29:0 -1 899000 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.107|rs3813184;CGA_FI=26155|NM_015658.3|NOC2L|TSS-UPSTREAM|UNKNOWN-INC&339451|NM_198317.1|KLHL17|INTRON|UNKNOWN-INC&84069|NM_001160184.1|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:234:283,234:282,234:51,49:-283,0,-234:-51,0,-49:30:16,14:14 -1 899462 . C . . . NS=1;AN=0 GT:PS ./.:. -1 899667 . G . . . NS=1;AN=0 GT:PS ./.:. -1 899692 . TGGCGGGTCTGCGTCCAGCCCACGCCCTCGCCCCCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 899739 . GGTCGCCCGTGGCGTCCATGCTGAGCCGACGCAGCTCAGCGGGCGTGGCCGTGCT . . . NS=1;AN=0 GT:PS ./.:. -1 899803 . CC . . . NS=1;AN=0 GT:PS ./.:. -1 899809 . CGTGGCAGGGGGC . . . NS=1;AN=0 GT:PS ./.:. -1 899829 . G . . . NS=1;AN=0 GT:PS ./.:. -1 899832 . C . . . NS=1;AN=0 GT:PS ./.:. -1 899911 . GTCCGCAGTGGGGCTGCGGGGAGGGGGG GTCCGCAGTGGGGCTGCCGGGAGGGGTC . . 
NS=1;AN=1;AC=1;CGA_XR=dbsnp.116|rs6677386&dbsnp.130|rs71490529&dbsnp.134|rs143296006&dbsnp.134|rs147467971;CGA_FI=339451|NM_198317.1|KLHL17|DONOR|NO-CHANGE&339451|NM_198317.1|KLHL17|INTRON|UNKNOWN-INC&84069|NM_001160184.1|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:VQLOW:26,.:25,.:0,.:-26,0,0:0,0,0:6:6,.:0 -1 899942 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.130|rs71509448;CGA_FI=339451|NM_198317.1|KLHL17|INTRON|UNKNOWN-INC&84069|NM_001160184.1|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:VQLOW:17:17,75:17,75:19,32:-75,-17,0:-32,-19,0:5:5,5:0 -1 899946 . CCGCAGT . . . NS=1;AN=0 GT:PS ./.:. -1 899970 . CCGCGCGTCCG . . . NS=1;AN=0 GT:PS ./.:. -1 899986 . G . . END=900204;NS=1;AN=0 GT:PS ./.:. -1 900001 . G . . NS=1;CGA_WINEND=902000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.89:0.77:2:49:=:49:0.999:303 -1 900285 . CA TG . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4970434&dbsnp.111|rs4970435&dbsnp.130|rs71490530;CGA_FI=339451|NM_198317.1|KLHL17|INTRON|UNKNOWN-INC&84069|NM_001160184.1|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:28:411,28:411,28:30,33:-411,-28,0:-33,-30,0:17:17,17:0 -1 900319 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.132|rs80351873;CGA_FI=339451|NM_198317.1|KLHL17|INTRON|UNKNOWN-INC&84069|NM_001160184.1|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:96:96,96:95,95:38,40:-96,0,-96:-38,0,-40:16:6,10:10 -1 900540 . C . . . NS=1;AN=0 GT:PS ./.:. -1 900553 . G . . . NS=1;AN=0 GT:PS ./.:. -1 900972 . T G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9697711;CGA_FI=339451|NM_198317.1|KLHL17|UTR3|UNKNOWN-INC&84069|NM_001160184.1|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:72:72,578:72,578:48,52:-578,-72,0:-52,-48,0:24:24,24:0 -1 901023 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.121|rs13303351;CGA_FI=339451|NM_198317.1|KLHL17|UTR3|UNKNOWN-INC&84069|NM_001160184.1|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:71:529,71:529,71:52,48:-529,-71,0:-52,-48,0:24:24,24:0 -1 901607 . C G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.121|rs13302996;CGA_FI=84069|NM_001160184.1|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:74:74,447:74,447:41,50:-447,-74,0:-50,-41,0:45:45,45:0 -1 901652 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.101|rs2879814;CGA_FI=84069|NM_001160184.1|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:82:721,82:721,82:53,44:-721,-82,0:-53,-44,0:36:36,36:0 -1 901806 . G C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.101|rs2879815;CGA_FI=84069|NM_001160184.1|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:53:431,53:431,53:52,44:-431,-53,0:-52,-44,0:24:24,24:0 -1 902001 . G . . NS=1;CGA_WINEND=904000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.11:0.88:2:49:=:49:0.999:303 -1 902069 . T C . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.132|rs116147894;CGA_FI=84069|NM_001160184.1|PLEKHN1|ACCEPTOR|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|ACCEPTOR|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:91:91,91:90,90:37,39:-91,0,-91:-37,0,-39:12:5,7:7 -1 902128 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28499371;CGA_FI=84069|NM_001160184.1|PLEKHN1|CDS|MISSENSE&84069|NM_032129.2|PLEKHN1|CDS|MISSENSE GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:95:95,95:94,94:38,40:-95,0,-95:-38,0,-40:14:6,8:8 -1 902749 . C G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.132|rs113967711;CGA_FI=84069|NM_001160184.1|PLEKHN1|INTRON|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:176:257,176:217,134:52,46:-257,0,-176:-52,0,-46:27:15,12:12 -1 902756 . ACC AC,A . . NS=1;AN=2;AC=1,1;CGA_XR=dbsnp.126|rs33930611&dbsnp.126|rs34363514&dbsnp.129|rs55994311&dbsnp.134|rs141099258,dbsnp.126|rs33930611&dbsnp.126|rs34363514&dbsnp.129|rs55994311&dbsnp.134|rs141099258;CGA_FI=84069|NM_001160184.1|PLEKHN1|INTRON|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|INTRON|UNKNOWN-INC,84069|NM_001160184.1|PLEKHN1|INTRON|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/2:.:PASS:45:189,45:145,0:28,0:-189,-45,-45,-189,0,-189:-28,0,0,-28,0,-28:29:15,5:0 -1 903104 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs6696281;CGA_FI=84069|NM_001160184.1|PLEKHN1|INTRON|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:41:351,41:351,41:52,38:-351,-41,0:-52,-38,0:28:28,28:0 -1 903321 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs6669800;CGA_FI=84069|NM_001160184.1|PLEKHN1|INTRON|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:191:249,191:247,190:52,50:-249,0,-191:-52,0,-50:23:14,9:9 -1 903426 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs6696609;CGA_FI=84069|NM_001160184.1|PLEKHN1|INTRON|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:271:271,271:270,270:50,51:-271,0,-271:-50,0,-51:36:17,19:19 -1 904001 . A . . NS=1;CGA_WINEND=906000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.94:0.87:.:0:.:0:0.999:303 -1 904165 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28391282;CGA_FI=84069|NM_001160184.1|PLEKHN1|INTRON|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|INTRON|UNKNOWN-INC;CGA_RPT=L2a|L2|47.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:440:440,440:440,440:50,54:-440,0,-440:-50,0,-54:45:22,23:23 -1 904628 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28562326;CGA_FI=84069|NM_001160184.1|PLEKHN1|INTRON|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:904628:PASS:291:291,291:290,290:44,52:-291,0,-291:-44,0,-52:42:19,23:23 -1 904752 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.126|rs35241590;CGA_FI=84069|NM_001160184.1|PLEKHN1|INTRON|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|INTRON|UNKNOWN-INC;CGA_RPT=(TG)n|Simple_repeat|23.2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:97:97,207:93,203:38,50:-97,0,-207:-38,0,-50:16:5,11:11 -1 904757 . A ATG . . 
NS=1;AN=2;AC=2;CGA_XR=dbsnp.129|rs56411007&dbsnp.131|rs77716369;CGA_FI=84069|NM_001160184.1|PLEKHN1|INTRON|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|INTRON|UNKNOWN-INC;CGA_RPT=(TG)n|Simple_repeat|23.2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:27:27,179:52,177:37,35:-179,-27,0:-37,-35,0:18:15,15:1 -1 904942 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs6667868;CGA_FI=84069|NM_001160184.1|PLEKHN1|INTRON|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|INTRON|UNKNOWN-INC;CGA_RPT=(TG)n|Simple_repeat|27.7 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:904628:PASS:82:259,82:253,75:48,36:-259,0,-82:-48,0,-36:19:13,6:6 -1 905017 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.129|rs59766802;CGA_FI=84069|NM_001160184.1|PLEKHN1|INTRON|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|INTRON|UNKNOWN-INC;CGA_RPT=(TG)n|Simple_repeat|29.1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:904628:PASS:152:152,225:154,221:45,50:-152,0,-225:-45,0,-50:25:10,15:15 -1 905029 . CGTGT C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.132|rs111304798;CGA_FI=84069|NM_001160184.1|PLEKHN1|INTRON|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|INTRON|UNKNOWN-INC;CGA_RPT=(TG)n|Simple_repeat|29.1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:904628:PASS:154:154,225:175,221:36,40:-154,0,-225:-36,0,-40:26:8,18:18 -1 905275 . AT . . . NS=1;AN=0 GT:PS ./.:. -1 906001 . G . . NS=1;CGA_WINEND=908000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.07:0.74:2:47:=:47:0.999:303 -1 906272 . A C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.125|rs28507236;CGA_FI=84069|NM_001160184.1|PLEKHN1|CDS|SYNONYMOUS&84069|NM_032129.2|PLEKHN1|CDS|SYNONYMOUS;CGA_PFAM=PFAM|PF00169|PH-like GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:41:41,356:41,356:38,52:-356,-41,0:-52,-38,0:20:20,20:0 -1 906695 . C . . . NS=1;AN=0 GT:PS ./.:. -1 907082 . AC . . . NS=1;AN=0 GT:PS ./.:. -1 907170 . AG A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.132|rs113797882&dbsnp.134|rs137960056;CGA_FI=84069|NM_001160184.1|PLEKHN1|INTRON|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:143:143,213:176,187:27,38:-143,0,-213:-27,0,-38:54:14,40:40 -1 907609 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs79890672;CGA_FI=84069|NM_001160184.1|PLEKHN1|INTRON|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:907609:PASS:86:86,86:61,61:27,34:-86,0,-86:-27,0,-34:18:4,14:14 -1 907622 . G C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.129|rs59928984;CGA_FI=84069|NM_001160184.1|PLEKHN1|INTRON|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:907609:PASS:64:64,86:55,61:19,34:-64,0,-86:-19,0,-34:21:4,17:17 -1 907884 . GCCTGTAAT . . . NS=1;AN=0 GT:PS ./.:. -1 907920 . C G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28430926;CGA_FI=84069|NM_001160184.1|PLEKHN1|INTRON|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|INTRON|UNKNOWN-INC;CGA_RPT=AluSz|Alu|13.3 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:907920:VQLOW:33:33,35:33,35:11,27:-33,0,-35:-11,0,-27:12:5,7:7 -1 907992 . AAA AC . . NS=1;AN=2;AC=1;CGA_FI=84069|NM_001160184.1|PLEKHN1|INTRON|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|INTRON|UNKNOWN-INC;CGA_RPT=AluSz|Alu|13.3 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:907609:PASS:177:177,216:177,221:29,30:-177,0,-216:-29,0,-30:19:12,7:7 -1 907998 . A T . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs77098784;CGA_FI=84069|NM_001160184.1|PLEKHN1|INTRON|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|INTRON|UNKNOWN-INC;CGA_RPT=AluSz|Alu|13.3 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:907609:PASS:206:255,206:255,208:52,50:-255,0,-206:-52,0,-50:23:12,11:11 -1 908001 . A . . NS=1;CGA_WINEND=910000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.01:0.88:2:49:=:49:0.999:303 -1 908151 . GAT . . . NS=1;AN=0 GT:PS ./.:. -1 908170 . C G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28542142;CGA_FI=84069|NM_001160184.1|PLEKHN1|INTRON|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:908170:PASS:41:41,41:6,6:0,14:-41,0,-41:0,0,-14:13:3,10:10 -1 908414 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28504611;CGA_FI=84069|NM_001160184.1|PLEKHN1|INTRON|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:232:478,232:471,225:53,49:-478,0,-232:-53,0,-49:37:24,13:13 -1 909073 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.108|rs3892467;CGA_FI=84069|NM_001160184.1|PLEKHN1|INTRON|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:909073:PASS:206:478,206:472,200:50,47:-478,0,-206:-50,0,-47:41:26,15:15 -1 909238 . G C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.107|rs3829740;CGA_FI=84069|NM_001160184.1|PLEKHN1|CDS|MISSENSE&84069|NM_032129.2|PLEKHN1|CDS|MISSENSE GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:78:78,612:78,612:48,52:-612,-78,0:-52,-48,0:28:28,28:0 -1 909309 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.107|rs3829738;CGA_FI=84069|NM_001160184.1|PLEKHN1|CDS|MISSENSE&84069|NM_032129.2|PLEKHN1|CDS|MISSENSE GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:909073:PASS:166:439,166:428,155:53,42:-439,0,-166:-53,0,-42:32:22,10:10 -1 909363 . A C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.129|rs62639990;CGA_FI=84069|NM_001160184.1|PLEKHN1|CDS|SYNONYMOUS&84069|NM_032129.2|PLEKHN1|CDS|SYNONYMOUS GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:909073:PASS:124:207,124:205,122:48,44:-207,0,-124:-48,0,-44:17:10,7:7 -1 909555 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2340594;CGA_FI=84069|NM_001160184.1|PLEKHN1|INTRON|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:46:270,46:270,46:52,41:-270,-46,0:-52,-41,0:20:20,20:0 -1 909768 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2340593;CGA_FI=84069|NM_001160184.1|PLEKHN1|INTRON|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:51:51,487:51,487:44,48:-487,-51,0:-48,-44,0:19:19,19:0 -1 909880 . T . . . NS=1;AN=0 GT:PS ./.:. -1 910001 . A . . NS=1;CGA_WINEND=912000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.12:0.87:2:48:=:48:0.999:303 -1 910394 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28477686;CGA_FI=84069|NM_001160184.1|PLEKHN1|UTR3|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|UTR3|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:153:153,153:152,152:45,48:-153,0,-153:-45,0,-48:25:11,14:14 -1 910438 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs6685581;CGA_FI=84069|NM_001160184.1|PLEKHN1|UTR3|UNKNOWN-INC&84069|NM_032129.2|PLEKHN1|UTR3|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:VQLOW:19:172,19:172,19:48,20:-172,-19,0:-48,-20,0:16:16,16:0 -1 910512 . G C . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.132|rs74424855;CGA_RPT=GA-rich|Low_complexity|21.9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:910512:PASS:413:413,810:405,801:34,57:-413,0,-810:-34,0,-57:73:32,41:41 -1 910527 . GAGAGA G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.134|rs141996759;CGA_RPT=GA-rich|Low_complexity|21.9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:910512:PASS:147:147,709:184,701:28,45:-147,0,-709:-28,0,-45:73:26,47:47 -1 910903 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4970429;CGA_FI=84808|NR_027693.1|C1orf170|UTR|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:77:77,687:77,687:42,53:-687,-77,0:-53,-42,0:34:33,33:1 -1 910935 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.100|rs2340592;CGA_FI=84808|NR_027693.1|C1orf170|UTR|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:910512:PASS:207:207,207:192,192:35,46:-207,0,-207:-35,0,-46:45:14,31:31 -1 911595 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs7417106;CGA_FI=84808|NR_027693.1|C1orf170|UTR|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:65:65,469:65,469:38,53:-469,-65,0:-53,-38,0:31:29,29:2 -1 911768 . G . . . NS=1;AN=0 GT:PS ./.:. -1 911934 . AAAAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 912001 . T . . NS=1;CGA_WINEND=914000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.26:0.67:2:45:=:45:0.999:303 -1 912049 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.119|rs9803103;CGA_FI=84808|NR_027693.1|C1orf170|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:73:73,73:73,73:31,36:-73,0,-73:-31,0,-36:11:5,6:6 -1 912074 . AACCGCCTGCCCCCACC . . . NS=1;AN=0 GT:PS ./.:. -1 912099 . C . . END=913448;NS=1;AN=0 GT:PS ./.:. -1 913464 . CG . . . NS=1;AN=0 GT:PS ./.:. -1 913497 . CGCACACGGCCGCCCCGGGAACCGCCTGCCTCCCCCTCCAACCCCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 913606 . GC . . . NS=1;AN=0 GT:PS ./.:. -1 913889 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.100|rs2340596;CGA_FI=84808|NR_027693.1|C1orf170|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:173:455,173:448,166:53,43:-455,0,-173:-53,0,-43:39:25,14:14 -1 914001 . G . . NS=1;CGA_WINEND=916000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.22:0.85:2:49:=:49:0.999:303 -1 914192 . G C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2340595;CGA_FI=84808|NR_027693.1|C1orf170|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:65:545,65:545,65:52,48:-545,-65,0:-52,-48,0:26:26,26:0 -1 914333 . C G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.121|rs13302979;CGA_FI=84808|NR_027693.1|C1orf170|UTR|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:63:63,447:63,447:37,53:-447,-63,0:-53,-37,0:30:30,30:0 -1 914852 . G C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.121|rs13303368;CGA_FI=84808|NR_027693.1|C1orf170|UTR|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:914852:PASS:187:187,275:186,274:49,50:-187,0,-275:-49,0,-50:29:13,16:16 -1 914876 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.121|rs13302983;CGA_FI=84808|NR_027693.1|C1orf170|UTR|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:69:69,561:118,561:48,52:-561,-69,0:-52,-48,0:26:26,26:0 -1 914940 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.121|rs13303033;CGA_FI=84808|NR_027693.1|C1orf170|UTR|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:914852:PASS:167:167,167:166,166:47,49:-167,0,-167:-47,0,-49:25:11,14:14 -1 915227 . A G . . 
NS=1;AN=2;AC=2;CGA_XR=dbsnp.121|rs13303355;CGA_FI=84808|NR_027693.1|C1orf170|UTR|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:30:30,615:74,615:48,52:-615,-30,0:-52,-48,0:25:25,25:0 -1 915264 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.132|rs113243246;CGA_FI=84808|NR_027693.1|C1orf170|UTR|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 0|1:914852:PASS:136:429,136:422,129:54,32:-136,0,-429:-32,0,-54:33:21,12:21 -1 916001 . G . . NS=1;CGA_WINEND=918000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.06:1.05:2:52:=:52:0.999:303 -1 916071 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.135|rs190879197;CGA_FI=84808|NR_027693.1|C1orf170|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:916071:PASS:118:118,118:118,118:39,43:-118,0,-118:-39,0,-43:20:9,11:11 -1 916108 . T . . . NS=1;AN=0 GT:PS ./.:. -1 916144 . T . . . NS=1;AN=0 GT:PS ./.:. -1 916160 . G . . . NS=1;AN=0 GT:PS ./.:. -1 916164 . C . . . NS=1;AN=0 GT:PS ./.:. -1 916239 . TGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 916246 . CTG . . . NS=1;AN=0 GT:PS ./.:. -1 916257 . GCTGCCACTGCT . . . NS=1;AN=0 GT:PS ./.:. -1 916549 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs6660139;CGA_FI=84808|NR_027693.1|C1orf170|UTR|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:42:306,42:306,42:48,39:-306,-42,0:-48,-39,0:14:14,14:0 -1 916834 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs6694632;CGA_FI=84808|NR_027693.1|C1orf170|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:86:86,698:86,698:45,53:-698,-86,0:-53,-45,0:31:31,31:0 -1 917060 . C G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs6605058;CGA_FI=84808|NR_027693.1|C1orf170|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:38:260,38:260,38:48,36:-260,-38,0:-48,-36,0:16:16,16:0 -1 918001 . G . . NS=1;CGA_WINEND=920000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.81:0.93:2:53:=:53:0.999:303 -1 918384 . G T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.121|rs13303118;CGA_FI=84808|NR_027693.1|C1orf170|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:378:440,378:440,378:50,54:-440,0,-378:-50,0,-54:48:25,23:23 -1 918573 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2341354;CGA_FI=84808|NR_027693.1|C1orf170|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:111:111,918:111,918:52,53:-918,-111,0:-53,-52,0:39:38,38:1 -1 919419 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs6605059;CGA_FI=84808|NR_027693.1|C1orf170|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:32:32,290:32,290:31,48:-290,-32,0:-48,-31,0:12:12,12:0 -1 919501 . GGA TGA . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.111|rs4970414;CGA_FI=84808|NR_027693.1|C1orf170|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:919501:PASS:441,.:412,.:52,.:-441,0,0:-52,0,0:20:18,.:0 -1 919507 . TGGCTGTAG . . . NS=1;AN=0 GT:PS ./.:. -1 919521 . GGAATAG . . . NS=1;AN=0 GT:PS ./.:. -1 919555 . A . . . NS=1;AN=0 GT:PS ./.:. -1 919645 . GGGGAGGTCGGGCAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 919695 . A . . END=919922;NS=1;AN=0 GT:PS ./.:. -1 919946 . G . . . NS=1;AN=0 GT:PS ./.:. -1 919960 . C . . . NS=1;AN=0 GT:PS ./.:. -1 919963 . TCTTTTTTCTTTTC . . . NS=1;AN=0 GT:PS ./.:. -1 919979 . TC . . . NS=1;AN=0 GT:PS ./.:. -1 919983 . TT . . . NS=1;AN=0 GT:PS ./.:. -1 919987 . CC . . . NS=1;AN=0 GT:PS ./.:. -1 919991 . CTTA . . . NS=1;AN=0 GT:PS ./.:. -1 919998 . A C . . 
NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2341357;CGA_FI=84808|NR_027693.1|C1orf170|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluJb|Alu|18.4 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|1:919998:VQLOW:15:159,15:158,15:48,17:-159,-15,0:-48,-17,0:7:7,7:0 -1 920001 . T . . NS=1;CGA_WINEND=922000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.80:0.88:2:53:=:53:0.999:303 -1 920002 . A C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.129|rs55746336;CGA_FI=84808|NR_027693.1|C1orf170|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluJb|Alu|18.4 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|1:919998:VQLOW:15:150,15:152,15:47,17:-150,-15,0:-47,-17,0:6:6,6:0 -1 920006 . A C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.108|rs4039719;CGA_FI=84808|NR_027693.1|C1orf170|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluJb|Alu|18.4 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|1:919998:PASS:33:150,33:152,33:47,31:-150,-33,0:-47,-31,0:5:5,5:0 -1 920010 . A C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.108|rs4039720;CGA_FI=84808|NR_027693.1|C1orf170|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluJb|Alu|18.4 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|1:919998:PASS:47:47,150:47,151:42,47:-150,-47,0:-47,-42,0:7:7,7:0 -1 920014 . A C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.134|rs142966470;CGA_FI=84808|NR_027693.1|C1orf170|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluJb|Alu|18.4 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|1:919998:VQLOW:14:14,106:14,106:17,41:-106,-14,0:-41,-17,0:5:5,5:0 -1 920018 . A . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:919998:VQLOW:24,.:24,.:0,.:0:0 -1 920022 . A . . . NS=1;AN=0 GT:PS ./.:. -1 920109 . T . . . NS=1;AN=0 GT:PS ./.:. -1 920128 . TC . . . NS=1;AN=0 GT:PS ./.:. -1 920133 . CCTCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 920149 . A . . . NS=1;AN=0 GT:PS ./.:. -1 920166 . C . . . NS=1;AN=0 GT:PS ./.:. -1 920648 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs6677020;CGA_FI=84808|NR_027693.1|C1orf170|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=L1ME1|L1|39.0 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:88:758,88:758,88:53,46:-758,-88,0:-53,-46,0:36:36,36:0 -1 920733 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs6677131;CGA_FI=84808|NR_027693.1|C1orf170|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=L1ME1|L1|39.0 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:70:70,682:70,682:39,53:-682,-70,0:-53,-39,0:32:32,32:0 -1 921570 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs6662128;CGA_FI=84808|NR_027693.1|C1orf170|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluY|Alu|4.6 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:96:836,96:836,96:53,48:-836,-96,0:-53,-48,0:35:35,35:0 -1 921716 . C A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.121|rs13303278;CGA_FI=84808|NR_027693.1|C1orf170|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluY|Alu|4.6 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:54:516,54:516,54:48,45:-516,-54,0:-48,-45,0:19:19,19:0 -1 922001 . C . . NS=1;CGA_WINEND=924000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.67:0.88:2:52:=:52:0.999:303 -1 922207 . TGGA . . . NS=1;AN=0 GT:PS ./.:. -1 922305 . G GC . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.126|rs34505725&dbsnp.130|rs70954422&dbsnp.131|rs79100974;CGA_FI=84808|NR_027693.1|C1orf170|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluSz|Alu|14.1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:37:37,403:37,403:32,35:-403,-37,0:-35,-32,0:19:19,19:0 -1 922538 . G . . . NS=1;AN=0 GT:PS ./.:. -1 922576 . C . . . NS=1;AN=0 GT:PS ./.:. -1 922643 . C . . . NS=1;AN=0 GT:PS ./.:. -1 922646 . A . . . 
NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:922646:PASS:124,.:64,.:0,.:0:0 -1 922664 . TTCTTTTTTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 922741 . A C . . NS=1;AN=2;AC=1;CGA_RPT=AluSq|Alu|10.2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:922741:VQLOW:20:20,25:8,13:0,19:-20,0,-25:0,0,-19:5:1,4:4 -1 922745 . TCCGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 922760 . C . . . NS=1;AN=0 GT:PS ./.:. -1 922873 . C . . . NS=1;AN=0 GT:PS ./.:. -1 922877 . G . . . NS=1;AN=0 GT:PS ./.:. -1 922882 . TC . . . NS=1;AN=0 GT:PS ./.:. -1 922905 . ATCCACCCAACTCGGCCTCCCA . . . NS=1;AN=0 GT:PS ./.:. -1 922937 . A . . . NS=1;AN=0 GT:PS ./.:. -1 923050 . C . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:922646:PASS:363,.:328,.:0,.:0:0 -1 923061 . A . . . NS=1;AN=0 GT:PS ./.:. -1 923074 . CCA CCG . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.116|rs6605060;CGA_RPT=AluSx1|Alu|12.1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:922646:PASS:399,.:364,.:48,.:-399,0,0:-48,0,0:17:17,.:0 -1 923459 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9442609;CGA_RPT=L1ME1|L1|40.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:48:48,371:48,371:42,48:-371,-48,0:-48,-42,0:19:19,19:0 -1 923522 . AGATTTTTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 923749 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9442610;CGA_RPT=AluSg|Alu|10.7 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:68:68,653:68,653:48,52:-653,-68,0:-52,-48,0:25:25,25:0 -1 923978 . A AG . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.130|rs70949537;CGA_RPT=AluSg|Alu|10.7 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:66,.:66,.:13,.:-66,0,0:-13,0,0:38:36,.:2 -1 924001 . A . . NS=1;CGA_WINEND=926000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.99:0.95:.:0:.:0:0.999:303 -1 924095 . GGAGTTCGGGGTTGATTGTTTCTGGCGTTTAGGGTTGATTGTTTCTGGCGTTCAGG . . . NS=1;AN=0 GT:PS ./.:. -1 924177 . GATAGTTTCTGGAGTTCTGGGTGGAGTGTTTCGGGAGTTCTGGGTTGATTGTTTCTGGGGTT . . . NS=1;AN=0 GT:PS ./.:. -1 924245 . TGATTGTTTCTGGA . . . NS=1;AN=0 GT:PS ./.:. -1 924267 . TTGATTATTTCTGGTGTTCTGGGTTAATTGTT . . . NS=1;AN=0 GT:PS ./.:. -1 924314 . TGATTGTTTCTGGAGTTTGGGGTTGACTGTTTCTGGAGTTCTGGGTTGATTGTTCCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 924383 . TGATTGTTTCTGGAGTTTGGGGTTGACTTT . . . NS=1;AN=0 GT:PS ./.:. -1 924440 . AGTTTGGGGTTGATTGTTTCTGGAGTTCGTGGT . . . NS=1;AN=0 GT:PS ./.:. -1 924485 . GAGTTCTGGGTTGACTGTTTCTGGAGTTCAGGG . . . NS=1;AN=0 GT:PS ./.:. -1 924525 . TTTCTGGAATTTGGGGTTGATTGTTTCTGGAGTTCTGGGTTGATTGT . . . NS=1;AN=0 GT:PS ./.:. -1 924583 . T . . . NS=1;AN=0 GT:PS ./.:. -1 924603 . T . . . NS=1;AN=0 GT:PS ./.:. -1 924616 . T . . . NS=1;AN=0 GT:PS ./.:. -1 924629 . A . . . NS=1;AN=0 GT:PS ./.:. -1 924802 . CACTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 924898 . C A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs6665000;CGA_RPT=MSTD|ERVL-MaLR|32.8 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:118:118,1011:118,1011:52,50:-1011,-118,0:-52,-50,0:41:41,41:0 -1 925551 . AT A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.126|rs35621903&dbsnp.132|rs111393411;CGA_RPT=AluSx1|Alu|9.6 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:73:73,344:38,310:35,35:-344,-73,0:-35,-35,0:16:16,16:0 -1 925575 . ACCATGTTGGC . . . NS=1;AN=0 GT:PS ./.:. -1 925684 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs6605061;CGA_RPT=AluSq2|Alu|8.8 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:66:446,66:446,66:53,38:-446,-66,0:-53,-38,0:38:38,38:0 -1 925798 . C . . . NS=1;AN=0 GT:PS ./.:. -1 925880 . C . . . NS=1;AN=0 GT:PS ./.:. -1 926001 . G . . 
NS=1;CGA_WINEND=928000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.87:0.88:.:0:.:0:0.999:303 -1 926166 . AG . . . NS=1;AN=0 GT:PS ./.:. -1 926169 . T . . . NS=1;AN=0 GT:PS ./.:. -1 926351 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs6671243;CGA_RPT=AluSx1|Alu|16.7 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:59:59,551:59,551:46,52:-551,-59,0:-52,-46,0:25:24,24:1 -1 926431 . A T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4970403;CGA_RPT=L1MEc|L1|32.3 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:79:903,79:903,79:53,42:-903,-79,0:-53,-42,0:38:37,37:1 -1 926621 . A C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4970351;CGA_RPT=AluJb|Alu|21.7 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:52:52,477:52,477:44,48:-477,-52,0:-48,-44,0:17:17,17:0 -1 927309 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2341362 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:83:752,83:752,83:53,44:-752,-83,0:-53,-44,0:35:35,35:0 -1 927535 . TTTTG T . . NS=1;AN=2;AC=1;CGA_RPT=AluSx|Alu|16.0 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:VQLOW:25:25,25:0,0:0,3:-25,0,-25:0,0,-3:26:2,24:24 -1 928001 . T . . NS=1;CGA_WINEND=930000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.82:0.89:.:0:.:0:0.999:303 -1 928346 . G . . . NS=1;AN=0 GT:PS ./.:. -1 928364 . C . . . NS=1;AN=0 GT:PS ./.:. -1 928373 . AGG . . . NS=1;AN=0 GT:PS ./.:. -1 928441 . C . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:928441:PASS:51,.:48,.:0,.:0:0 -1 928520 . A . . . NS=1;AN=0 GT:PS ./.:. -1 928578 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.125|rs28394749 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:151:151,1329:151,1329:52,49:-1329,-151,0:-52,-49,0:59:58,58:1 -1 928836 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9777703 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:82:82,925:82,925:44,53:-925,-82,0:-53,-44,0:38:37,37:1 -1 928979 . GGGG GGGGC . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.126|rs35805387&dbsnp.131|rs79780781 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:928979:PASS:117,.:117,.:35,.:-117,0,0:-35,0,0:16:16,.:0 -1 928993 . G . . . NS=1;AN=0 GT:PS ./.:. -1 929190 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9777939;CGA_RPT=MSTA|ERVL-MaLR|19.1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:63:490,63:490,63:52,48:-490,-63,0:-52,-48,0:28:28,28:0 -1 929316 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9777893;CGA_RPT=MSTA|ERVL-MaLR|19.1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:50:50,606:50,606:43,52:-606,-50,0:-52,-43,0:25:24,24:1 -1 929321 . A C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.121|rs13302916;CGA_RPT=MSTA|ERVL-MaLR|19.1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:31:31,619:31,619:30,52:-619,-31,0:-52,-30,0:26:25,25:1 -1 929327 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.121|rs13302924;CGA_RPT=MSTA|ERVL-MaLR|19.1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:62:62,536:62,536:47,52:-536,-62,0:-52,-47,0:23:23,23:0 -1 929459 . TTTGGGAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 929701 . C CA . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.126|rs35561765;CGA_RPT=AluSz|Alu|12.0 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:VQLOW:14:14,106:14,106:16,35:-106,-14,0:-35,-16,0:12:11,11:1 -1 929872 . C . . . NS=1;AN=0 GT:PS ./.:. -1 930001 . G . . NS=1;CGA_WINEND=932000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.88:0.92:.:0:.:0:0.999:303 -1 930162 . C . . . NS=1;AN=0 GT:PS ./.:. -1 930888 . ATTTTT A . . 
NS=1;AN=2;AC=2;CGA_XR=dbsnp.129|rs55952878&dbsnp.134|rs139994710;CGA_RPT=L1MEc|L1|22.1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:54:250,54:250,54:38,38:-250,-54,0:-38,-38,0:29:29,29:0 -1 930923 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2710882 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:22:165,22:165,22:48,22:-165,-22,0:-48,-22,0:7:7,7:0 -1 931014 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2710881 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:72:72,735:72,735:40,53:-735,-72,0:-53,-40,0:39:37,37:2 -1 931096 . G . . . NS=1;AN=0 GT:PS ./.:. -1 931327 . CGTTCCCCCCGCGGCAGACAAGCCCAGACACACACGGCCCAGAC . . . NS=1;AN=0 GT:PS ./.:. -1 931497 . T . . . NS=1;AN=0 GT:PS ./.:. -1 931500 . C . . . NS=1;AN=0 GT:PS ./.:. -1 931504 . CTATTCAGCAG CTATACAGCAG . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.100|rs2799061 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:931504:PASS:261,.:257,.:48,.:-261,0,0:-48,0,0:12:10,.:0 -1 931518 . ACC . . . NS=1;AN=0 GT:PS ./.:. -1 931582 . TG . . . NS=1;AN=0 GT:PS ./.:. -1 931589 . CAGCCGCCA . . . NS=1;AN=0 GT:PS ./.:. -1 931659 . TGAGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 931730 . A . . . NS=1;AN=0 GT:PS ./.:. -1 931769 . AGGGCAGAAAGGACCCCCCGCTGGAGGGGGCACCCCACGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 931835 . AGGACAGAAAGGACCCCCCGCTGGAGGGGGCACCCCACAT . . . NS=1;AN=0 GT:PS ./.:. -1 931901 . AGGGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 931928 . GGGCACCTCACGTCTGGGGCCACAGGATGCA . . . NS=1;AN=0 GT:PS ./.:. -1 931967 . AGGACAGAAAGGACCCCCCGCTGGAGGGGGCACCCCACATCT . . . NS=1;AN=0 GT:PS ./.:. -1 932001 . C . . NS=1;CGA_WINEND=934000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.30:0.75:2:48:=:48:0.999:303 -1 932018 . GGATGCAGGGTGGGGAGGGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 932050 . CCGCGGGAGGGGGCACCTCACGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 932084 . GGATGCAGGGTGGGGAGGGCAGAAAGAACCCCCGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 932126 . GGCACCCCACATCTGGGGCCACAGGATGCAGGGTGGGGAGGGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 932182 . CGCGGGAGGGGGCACCTCACGTCT . . . NS=1;AN=0 GT:PS ./.:. -1 932215 . GGATGCAGGGTGGGGAGGGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 932618 . C G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3128112 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:170:1385,170:1385,170:48,44:-1385,-170,0:-48,-44,0:62:62,62:0 -1 933790 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9442392 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:58:532,58:532,58:52,46:-532,-58,0:-52,-46,0:25:25,25:0 -1 933854 . AG . . . NS=1;AN=0 GT:PS ./.:. -1 933858 . CG . . . NS=1;AN=0 GT:PS ./.:. -1 933866 . CTC . . . NS=1;AN=0 GT:PS ./.:. -1 934001 . G . . NS=1;CGA_WINEND=936000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.23:1.45:2:40:=:40:0.999:303 -1 934099 . G C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2710867;CGA_RPT=G-rich|Low_complexity|26.3 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|1:934099:VQLOW:14:144,14:142,14:46,17:-144,-14,0:-46,-17,0:7:7,7:0 -1 934102 . G . . . NS=1;AN=0 GT:PS ./.:. -1 934106 . AT GT . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.100|rs2799063;CGA_RPT=G-rich|Low_complexity|26.3 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:934099:PASS:94,.:91,.:37,.:-94,0,0:-37,0,0:7:7,.:0 -1 934126 . GGAGGGGAGGGCGCGGAGCG . . . NS=1;AN=0 GT:PS ./.:. -1 934154 . GGAGCG . . . NS=1;AN=0 GT:PS ./.:. -1 934164 . G . . . NS=1;AN=0 GT:PS ./.:. -1 934180 . C . . . NS=1;AN=0 GT:PS ./.:. -1 934183 . TCCCGCCCCCAGCCCGCCCCCCCGGGCCCGCCCGACGCCCCCAGCCA . . . NS=1;AN=0 GT:PS ./.:. -1 934248 . 
GGGAGGAGGGCGGGACCCCGGCGCGGCGTGGCTGCGGGGCGCT . . . NS=1;AN=0 GT:PS ./.:. -1 934339 . GA . . . NS=1;AN=0 GT:PS ./.:. -1 934402 . G . . . NS=1;AN=0 GT:PS ./.:. -1 934435 . A . . . NS=1;AN=0 GT:PS ./.:. -1 934459 . AGGGCCCACCCGGGCCCTGCGGCCCCGCCCTGGGGGCGGCGGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 934514 . CAGACCCGGCAGCAGCGGCGGCGCGAGC . . . NS=1;AN=0 GT:PS ./.:. -1 934565 . T . . . NS=1;AN=0 GT:PS ./.:. -1 935020 . C T . . NS=1;AN=2;AC=1;CGA_FI=57801|NM_001142467.1|HES4|INTRON|UNKNOWN-INC&57801|NM_021170.3|HES4|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:92:244,92:238,86:48,38:-244,0,-92:-48,0,-38:19:13,6:6 -1 935222 . C A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2298214;CGA_FI=57801|NM_001142467.1|HES4|CDS|MISSENSE&57801|NM_021170.3|HES4|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:36:196,36:196,36:48,34:-196,-36,0:-48,-34,0:12:12,12:0 -1 935416 . C . . . NS=1;AN=0 GT:PS ./.:. -1 935420 . C . . . NS=1;AN=0 GT:PS ./.:. -1 935423 . G . . . NS=1;AN=0 GT:PS ./.:. -1 935428 . GAGCGCG . . . NS=1;AN=0 GT:PS ./.:. -1 935437 . G . . . NS=1;AN=0 GT:PS ./.:. -1 935459 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3128113;CGA_FI=57801|NM_001142467.1|HES4|UTR5|UNKNOWN-INC&57801|NM_021170.3|HES4|UTR5|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:VQLOW:14:14,73:14,73:17,31:-73,-14,0:-31,-17,0:5:5,5:0 -1 935492 . G T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3121571;CGA_FI=57801|NM_001142467.1|HES4|UTR5|UNKNOWN-INC&57801|NM_021170.3|HES4|UTR5|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:45:45,341:45,341:41,48:-341,-45,0:-48,-41,0:14:14,14:0 -1 935600 . T . . . NS=1;AN=0 GT:PS ./.:. -1 935633 . C . . . NS=1;AN=0 GT:PS ./.:. -1 935650 . C . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:935650:VQLOW:20,.:15,.:0,.:0:0 -1 935656 . CC . . . NS=1;AN=0 GT:PS ./.:. -1 935673 . TCCG . . . NS=1;AN=0 GT:PS ./.:. -1 935703 . TCCCGCGCTCCCCGATGCAGCCGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 935792 . G . . . NS=1;AN=0 GT:PS ./.:. -1 935833 . C G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3128115;CGA_FI=57801|NM_001142467.1|HES4|TSS-UPSTREAM|UNKNOWN-INC&57801|NM_021170.3|HES4|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:32:32,300:32,300:31,48:-300,-32,0:-48,-31,0:13:13,13:0 -1 935893 . T . . . NS=1;AN=0 GT:PS ./.:. -1 935896 . C . . . NS=1;AN=0 GT:PS ./.:. -1 935899 . AGG . . . NS=1;AN=0 GT:PS ./.:. -1 935910 . G . . . NS=1;AN=0 GT:PS ./.:. -1 936001 . C . . NS=1;CGA_WINEND=938000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.78:1.21:2:43:=:43:0.999:303 -1 936003 . A . . . NS=1;AN=0 GT:PS ./.:. -1 936111 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.92|rs1936360;CGA_FI=57801|NM_001142467.1|HES4|TSS-UPSTREAM|UNKNOWN-INC&57801|NM_021170.3|HES4|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:58:574,58:574,58:52,46:-574,-58,0:-52,-46,0:25:25,25:0 -1 936194 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3121570;CGA_FI=57801|NM_001142467.1|HES4|TSS-UPSTREAM|UNKNOWN-INC&57801|NM_021170.3|HES4|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:46:417,46:417,46:48,41:-417,-46,0:-48,-41,0:18:18,18:0 -1 936210 . C A . . 
NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3121569;CGA_FI=57801|NM_001142467.1|HES4|TSS-UPSTREAM|UNKNOWN-INC&57801|NM_021170.3|HES4|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:37:372,37:372,37:48,35:-372,-37,0:-48,-35,0:19:18,18:1 -1 936265 . GGCACCCGGGGCGA . . . NS=1;AN=0 GT:PS ./.:. -1 936317 . T . . . NS=1;AN=0 GT:PS ./.:. -1 936335 . T . . . NS=1;AN=0 GT:PS ./.:. -1 936338 . CCCGC . . . NS=1;AN=0 GT:PS ./.:. -1 936520 . G . . . NS=1;AN=0 GT:PS ./.:. -1 936525 . ACGCGTCCGT . . . NS=1;AN=0 GT:PS ./.:. -1 936614 . CAGGAGGCAGATGGCAGACTCAGCAGTCAC . . . NS=1;AN=0 GT:PS ./.:. -1 936675 . G . . . NS=1;AN=0 GT:PS ./.:. -1 936724 . GT . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:936724:PASS:41,.:21,.:0,.:0:0 -1 936734 . C . . . NS=1;AN=0 GT:PS ./.:. -1 936827 . G A . . NS=1;AN=2;AC=1;CGA_FI=57801|NM_001142467.1|HES4|TSS-UPSTREAM|UNKNOWN-INC&57801|NM_021170.3|HES4|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:VQLOW:24:24,24:11,11:0,17:-24,0,-24:0,0,-17:17:4,13:13 -1 936957 . C . . . NS=1;AN=0 GT:PS ./.:. -1 937032 . TTCG . . . NS=1;AN=0 GT:PS ./.:. -1 937038 . GC . . . NS=1;AN=0 GT:PS ./.:. -1 937057 . G . . . NS=1;AN=0 GT:PS ./.:. -1 937069 . A . . . NS=1;AN=0 GT:PS ./.:. -1 937651 . C . . . NS=1;AN=0 GT:PS ./.:. -1 937680 . GGC . . . NS=1;AN=0 GT:PS ./.:. -1 937688 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2489000;CGA_FI=57801|NM_001142467.1|HES4|TSS-UPSTREAM|UNKNOWN-INC&57801|NM_021170.3|HES4|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=FLAM_C|Alu|13.1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|1:937688:PASS:42:322,42:322,42:48,39:-322,-42,0:-48,-39,0:17:17,17:0 -1 937795 . C . . . NS=1;AN=0 GT:PS ./.:. -1 938001 . G . . NS=1;CGA_WINEND=940000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.08:0.89:2:49:=:49:0.999:303 -1 938116 . T G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2710869;CGA_FI=57801|NM_001142467.1|HES4|TSS-UPSTREAM|UNKNOWN-INC&57801|NM_021170.3|HES4|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluSp|Alu|10.6 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:64:676,64:676,64:52,48:-676,-64,0:-52,-48,0:26:26,26:0 -1 938213 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2799058;CGA_FI=57801|NM_001142467.1|HES4|TSS-UPSTREAM|UNKNOWN-INC&57801|NM_021170.3|HES4|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluSp|Alu|10.6 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:86:86,994:86,994:45,50:-994,-86,0:-50,-45,0:41:40,40:1 -1 940001 . T . . NS=1;CGA_WINEND=942000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.90:1.01:2:51:=:51:0.999:303 -1 940096 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4503294;CGA_FI=57801|NM_001142467.1|HES4|TSS-UPSTREAM|UNKNOWN-INC&57801|NM_021170.3|HES4|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=L2|L2|43.0 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:137:137,1223:137,1223:52,50:-1223,-137,0:-52,-50,0:49:48,48:1 -1 940413 . CTGCTGTC . . . NS=1;AN=0 GT:PS ./.:. -1 940725 . AG . . . NS=1;AN=0 GT:PS ./.:. -1 940728 . C . . . NS=1;AN=0 GT:PS ./.:. -1 940735 . G . . . NS=1;AN=0 GT:PS ./.:. -1 940739 . CCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 941146 . A C . . NS=1;AN=2;AC=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:941146:VQLOW:34:34,48:20,34:0,27:-34,0,-48:0,0,-27:9:2,7:7 -1 941150 . GTG GTTC . . NS=1;AN=1;AC=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP .|1:941146:PASS:.,48:.,34:.,1:-48,0,0:-1,0,0:8:.,8:0 -1 941161 . G . . . NS=1;AN=0 GT:PS ./.:. -1 941163 . AGTTCCCCCCGCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 941182 . CCC . . 
. NS=1;AN=0 GT:PS ./.:. -1 941471 . G . . . NS=1;AN=0 GT:PS ./.:. -1 941490 . GGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 941644 . GGACCCTCCGCCCT . . . NS=1;AN=0 GT:PS ./.:. -1 941665 . G . . . NS=1;AN=0 GT:PS ./.:. -1 941672 . T . . . NS=1;AN=0 GT:PS ./.:. -1 941680 . G . . . NS=1;AN=0 GT:PS ./.:. -1 942001 . T . . NS=1;CGA_WINEND=944000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.85:0.84:2:52:=:52:0.999:303 -1 942849 . A . . . NS=1;AN=0 GT:PS ./.:. -1 942865 . A . . . NS=1;AN=0 GT:PS ./.:. -1 943127 . TTTTTTTTTTTTTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 943250 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3121568;CGA_RPT=AluYa5|Alu|0.3 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:42:42,303:42,303:39,48:-303,-42,0:-48,-39,0:14:14,14:0 -1 943468 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3121567 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:73:73,687:73,687:48,52:-687,-73,0:-52,-48,0:29:29,29:0 -1 943907 . C G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2488992;CGA_FI=9636|NM_005101.3|ISG15|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluSq2|Alu|9.4 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:51:51,461:51,461:33,50:-461,-51,0:-50,-33,0:42:42,42:0 -1 943968 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.121|rs13303313;CGA_FI=9636|NM_005101.3|ISG15|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluSq2|Alu|9.4 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:57:486,57:486,57:52,46:-486,-57,0:-52,-46,0:21:21,21:0 -1 944001 . G . . NS=1;CGA_WINEND=946000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.58:0.80:2:51:=:51:0.999:303 -1 945058 . C . . . NS=1;AN=0 GT:PS ./.:. -1 945096 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3135457;CGA_FI=9636|NM_005101.3|ISG15|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:87:87,901:87,901:45,50:-901,-87,0:-50,-45,0:41:41,41:0 -1 945111 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.121|rs13303172;CGA_FI=9636|NM_005101.3|ISG15|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:69:733,69:733,69:53,39:-733,-69,0:-53,-39,0:33:33,33:0 -1 945214 . GAGTGCAGTGGTGCGATCACAGCTCACTGCAGCCTCCACCTCCCGGGTTCAA . . . NS=1;AN=0 GT:PS ./.:. -1 945474 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3121566;CGA_FI=9636|NM_005101.3|ISG15|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluSz|Alu|10.9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:79:79,762:79,762:42,53:-762,-79,0:-53,-42,0:32:31,31:1 -1 945612 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3121565;CGA_FI=9636|NM_005101.3|ISG15|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluSz|Alu|10.9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:26:26,243:26,243:26,48:-243,-26,0:-48,-26,0:9:9,9:0 -1 945669 . C . . . NS=1;AN=0 GT:PS ./.:. -1 945720 . T . . . NS=1;AN=0 GT:PS ./.:. -1 945862 . TTATTTATTT TTATTT . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.108|rs4039717&dbsnp.126|rs35187710&dbsnp.129|rs61025563&dbsnp.134|rs141880858&dbsnp.134|rs150242145;CGA_FI=9636|NM_005101.3|ISG15|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=(TTTA)n|Simple_repeat|13.6 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:945862:PASS:53,.:53,.:16,.:-53,0,0:-16,0,0:6:6,.:0 -1 945876 . ATTTGAATCTTAT . . . NS=1;AN=0 GT:PS ./.:. -1 946001 . T . . NS=1;CGA_WINEND=948000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.70:1.09:2:51:=:51:0.999:303 -1 946134 . TTTT . . . NS=1;AN=0 GT:PS ./.:. -1 946180 . TGCAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 946195 . TCGGCTC . . . NS=1;AN=0 GT:PS ./.:. -1 946204 . T . . . NS=1;AN=0 GT:PS ./.:. 
-1 946207 . AGCCTCCG . . . NS=1;AN=0 GT:PS ./.:. -1 946331 . TTGGCCAGGCTGGTCTCAAA . . . NS=1;AN=0 GT:PS ./.:. -1 947034 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2465126;CGA_FI=9636|NM_005101.3|ISG15|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=L1MB5|L1|34.0 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:74:74,662:74,662:48,52:-662,-74,0:-52,-48,0:28:28,28:0 -1 947538 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2465125;CGA_FI=9636|NM_005101.3|ISG15|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluSx|Alu|12.8 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:50:50,517:50,517:43,52:-517,-50,0:-52,-43,0:22:22,22:0 -1 947617 . A . . . NS=1;AN=0 GT:PS ./.:. -1 947761 . TTTA . . . NS=1;AN=0 GT:PS ./.:. -1 947769 . TATTATTACAGTCATAC . . . NS=1;AN=0 GT:PS ./.:. -1 947791 . G . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:947791:VQLOW:30,.:0,.:0,.:0:0 -1 947838 . AATGATG . . . NS=1;AN=0 GT:PS ./.:. -1 947934 . TTTATTTTTATTT . . . NS=1;AN=0 GT:PS ./.:. -1 948001 . T . . NS=1;CGA_WINEND=950000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.91:0.86:2:52:=:52:0.999:303 -1 948134 . CAGTGGCTC . . . NS=1;AN=0 GT:PS ./.:. -1 948421 . A AAAC . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.129|rs60637872&dbsnp.132|rs113047134;CGA_FI=9636|NM_005101.3|ISG15|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluJb|Alu|14.5 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:230,.:199,.:36,.:-230,0,0:-36,0,0:34:21,.:0 -1 948692 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2341365;CGA_FI=9636|NM_005101.3|ISG15|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:78:78,722:78,722:42,53:-722,-78,0:-53,-42,0:35:35,35:0 -1 948846 . T TA . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.107|rs3841266&dbsnp.114|rs5772027;CGA_FI=9636|NM_005101.3|ISG15|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:93:93,854:93,854:39,37:-854,-93,0:-39,-37,0:44:44,44:0 -1 948870 . C G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4615788;CGA_FI=9636|NM_005101.3|ISG15|UTR5|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:93:816,93:816,93:53,47:-816,-93,0:-53,-47,0:39:39,39:0 -1 948921 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.52|rs15842;CGA_FI=9636|NM_005101.3|ISG15|UTR5|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:84:84,679:84,679:44,53:-679,-84,0:-53,-44,0:30:30,30:0 -1 949235 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2465124;CGA_FI=9636|NM_005101.3|ISG15|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:52:389,52:389,52:52,44:-389,-52,0:-52,-44,0:21:21,21:0 -1 949654 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.52|rs8997;CGA_FI=9636|NM_005101.3|ISG15|CDS|SYNONYMOUS;CGA_PFAM=PFAM|PF00240|ubiquitin,PFAM|PF00788|UBQ GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:20:150,20:150,20:47,21:-150,-20,0:-47,-21,0:9:9,9:0 -1 949925 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2799070 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:108:108,904:108,904:52,50:-904,-108,0:-52,-50,0:42:42,42:0 -1 950001 . G . . NS=1;CGA_WINEND=952000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.87:0.91:2:53:=:53:0.999:303 -1 950113 . GAAGT G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.130|rs70949541&dbsnp.134|rs140258289 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:86:844,86:867,86:38,39:-844,-86,0:-39,-38,0:42:42,42:0 -1 950677 . T C . . 
NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9331223;CGA_FI=375790|NM_198576.2|AGRN|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:140:1162,140:1162,140:49,52:-1162,-140,0:-52,-49,0:51:51,51:0 -1 950716 . A T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2799069;CGA_FI=375790|NM_198576.2|AGRN|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:70:254,70:254,70:52,48:-254,-70,0:-52,-48,0:29:29,29:0 -1 950837 . GCTG . . . NS=1;AN=0 GT:PS ./.:. -1 951283 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9442363;CGA_FI=375790|NM_198576.2|AGRN|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluY|Alu|5.0 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:91:765,91:765,91:53,47:-765,-91,0:-53,-47,0:35:35,35:0 -1 951295 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9442388;CGA_FI=375790|NM_198576.2|AGRN|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluY|Alu|5.0 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:93:847,93:847,93:53,47:-847,-93,0:-53,-47,0:34:34,34:0 -1 951322 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9697362&dbsnp.131|rs75970800;CGA_FI=375790|NM_198576.2|AGRN|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluY|Alu|5.0 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:43:411,43:411,43:52,39:-411,-43,0:-52,-39,0:25:25,25:0 -1 951330 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9697717;CGA_FI=375790|NM_198576.2|AGRN|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluY|Alu|5.0 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:49:49,432:49,432:43,52:-432,-49,0:-52,-43,0:20:20,20:0 -1 951441 . T . . . NS=1;AN=0 GT:PS ./.:. -1 951824 . G . . . NS=1;AN=0 GT:PS ./.:. -1 952001 . C . . NS=1;CGA_WINEND=954000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.61:0.88:.:0:.:0:0.999:303 -1 952428 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9442611;CGA_FI=375790|NM_198576.2|AGRN|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluSx|Alu|12.4 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:27:27,215:27,215:27,48:-215,-27,0:-48,-27,0:11:11,11:0 -1 952554 . TCCAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 952712 . GCT . . . NS=1;AN=0 GT:PS ./.:. -1 952839 . CTTGGCGGTGAGTGTTACAGCTCATAAAAGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 953033 . GCCTAGT . . . NS=1;AN=0 GT:PS ./.:. -1 953168 . AGT . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0/.:.:PASS:116,.:89,.:0,.:0:0 -1 953183 . A AGT . . NS=1;AN=2;AC=2;CGA_FI=375790|NM_198576.2|AGRN|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=LTR12C|ERV1|19.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:61:222,61:195,40:39,34:-222,-61,0:-39,-34,0:29:29,29:0 -1 953215 . G . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:953215:PASS:153,.:120,.:0,.:0:0 -1 953223 . GCGCT GTGCT . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.134|rs139816136;CGA_FI=375790|NM_198576.2|AGRN|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=LTR12C|ERV1|19.5 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:953215:PASS:112,.:79,.:27,.:-112,0,0:-27,0,0:27:18,.:1 -1 953238 . TTT . . . NS=1;AN=0 GT:PS ./.:. -1 953351 . GGCGCAT . . . NS=1;AN=0 GT:PS ./.:. -1 953428 . CGATTGGTGTATTTACAATCCCTGAGCTAGACATAAAGGTT . . . NS=1;AN=0 GT:PS ./.:. -1 953678 . TGCGC CGCGC . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.134|rs142441326;CGA_FI=375790|NM_198576.2|AGRN|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=LTR12C|ERV1|19.5 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:71,.:71,.:31,.:-71,0,0:-31,0,0:6:6,.:0 -1 953761 . GCTTGGGCCGCACAGGAGCCCACGGA . . . NS=1;AN=0 GT:PS ./.:. -1 953875 . A . . . NS=1;AN=0 GT:PS ./.:. -1 953893 . CTGCTGGGGGACCCACTACACCCTCCGCAG . . . 
NS=1;AN=0 GT:PS ./.:. -1 953952 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9442612;CGA_FI=375790|NM_198576.2|AGRN|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=LTR12C|ERV1|19.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:VQLOW:15:15,124:15,124:17,43:-124,-15,0:-43,-17,0:14:12,12:2 -1 954001 . A . . NS=1;CGA_WINEND=956000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.51:1.01:2:52:=:52:0.999:303 -1 954619 . C . . . NS=1;AN=0 GT:PS ./.:. -1 954625 . T . . . NS=1;AN=0 GT:PS ./.:. -1 954777 . C A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.129|rs61766299;CGA_FI=375790|NM_198576.2|AGRN|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|1:954777:VQLOW:13:99,13:99,13:39,16:-99,-13,0:-39,-16,0:6:6,6:0 -1 954794 . CG . . . NS=1;AN=0 GT:PS ./.:. -1 954800 . G . . . NS=1;AN=0 GT:PS ./.:. -1 954813 . G . . . NS=1;AN=0 GT:PS ./.:. -1 954824 . A . . . NS=1;AN=0 GT:PS ./.:. -1 954832 . G . . . NS=1;AN=0 GT:PS ./.:. -1 954860 . GGGGGGGCGGGGCCGGGAGGGGC . . . NS=1;AN=0 GT:PS ./.:. -1 954908 . CAGCGCCCG . . . NS=1;AN=0 GT:PS ./.:. -1 954924 . ACCGGGACCCGCC . . . NS=1;AN=0 GT:PS ./.:. -1 954951 . C . . END=955175;NS=1;AN=0 GT:PS ./.:. -1 955185 . C . . . NS=1;AN=0 GT:PS ./.:. -1 955203 . C . . END=955678;NS=1;AN=0 GT:PS ./.:. -1 955686 . TGGTGCTCACCGGGACGGTGGAGGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 955728 . C . . END=955977;NS=1;AN=0 GT:PS ./.:. -1 955985 . A . . . NS=1;AN=0 GT:PS ./.:. -1 955987 . ACAAG . . . NS=1;AN=0 GT:PS ./.:. -1 955994 . CACCTCGC . . . NS=1;AN=0 GT:PS ./.:. -1 956001 . C . . NS=1;CGA_WINEND=958000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.72:0.79:2:50:=:50:0.999:303 -1 956007 . G . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:956007:VQLOW:23,.:21,.:0,.:0:0 -1 956015 . CTG . . . NS=1;AN=0 GT:PS ./.:. -1 956022 . G . . . NS=1;AN=0 GT:PS ./.:. -1 956024 . GCCCTCCC . . . NS=1;AN=0 GT:PS ./.:. -1 956051 . GGGC . . . NS=1;AN=0 GT:PS ./.:. -1 956071 . TTTGCCCGGGCGGGGAGCGGGGGCTGGGCCTGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 956109 . GCTTGTTCCC . . . NS=1;AN=0 GT:PS ./.:. -1 956129 . GCCTTTCCCGGGGCGAGAGGGGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 956162 . AGAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 956178 . AGCCTGAGCTCCCAACCCCGGGAGCCAGGTGGGGGGTGCCGCAGTGGTGCGGGGGGGGGGC . . . NS=1;AN=0 GT:PS ./.:. -1 956241 . G . . . NS=1;AN=0 GT:PS ./.:. -1 956308 . AT . . . NS=1;AN=0 GT:PS ./.:. -1 956315 . T . . . NS=1;AN=0 GT:PS ./.:. -1 956327 . C . . . NS=1;AN=0 GT:PS ./.:. -1 956333 . T . . . NS=1;AN=0 GT:PS ./.:. -1 956455 . C . . . NS=1;AN=0 GT:PS ./.:. -1 956467 . A . . . NS=1;AN=0 GT:PS ./.:. -1 956818 . GC . . . NS=1;AN=0 GT:PS ./.:. -1 956852 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9777931;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:64:457,64:459,64:52,48:-457,-64,0:-52,-48,0:27:27,27:0 -1 957633 . T . . . NS=1;AN=0 GT:PS ./.:. -1 957967 . T TTGTAGTCTGACCTGTGGTCTGAC . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.114|rs6143083;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:VQLOW:14:14,346:14,346:16,37:-346,-14,0:-37,-16,0:40:39,39:1 -1 957980 . T . . . NS=1;AN=0 GT:PS ./.:. -1 958001 . C . . NS=1;CGA_WINEND=960000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.32:0.89:2:52:=:52:0.999:303 -1 958067 . T C . . NS=1;AN=2;AC=1;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:958067:VQLOW:27:27,27:14,14:0,19:-27,0,-27:0,0,-19:17:4,13:13 -1 958070 . C . . . 
NS=1;AN=0 GT:PS ./.:. -1 958905 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.100|rs2710890;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:406:406,406:406,406:50,54:-406,0,-406:-50,0,-54:41:20,21:21 -1 959155 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.108|rs3845291;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:111:1019,111:1019,111:50,52:-1019,-111,0:-52,-50,0:44:44,44:0 -1 959169 . G C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.108|rs3845292;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:86:86,881:86,881:45,50:-881,-86,0:-50,-45,0:43:43,43:0 -1 959231 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.108|rs4039721;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:85:85,698:85,698:45,53:-698,-85,0:-53,-45,0:30:30,30:0 -1 959509 . T G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28591569;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:208:288,208:288,208:52,50:-288,0,-208:-52,0,-50:28:15,13:13 -1 960001 . C . . NS=1;CGA_WINEND=962000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.10:0.80:2:51:=:51:0.999:303 -1 960409 . G C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4970392;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:77:77,761:77,761:42,53:-761,-77,0:-53,-42,0:31:31,31:0 -1 960734 . CA . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0/.:.:VQLOW:22,.:8,.:0,.:0:0 -1 960790 . T . . . NS=1;AN=0 GT:PS ./.:. -1 961008 . CCC . . . NS=1;AN=0 GT:PS ./.:. -1 961054 . G . . . NS=1;AN=0 GT:PS ./.:. -1 961057 . TCGA . . . NS=1;AN=0 GT:PS ./.:. -1 961252 . GT . . . NS=1;AN=0 GT:PS ./.:. -1 961257 . T . . . NS=1;AN=0 GT:PS ./.:. -1 961364 . TGGGGGGGGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 961827 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3121556;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:79:671,79:671,79:52,48:-671,-79,0:-52,-48,0:28:28,28:0 -1 962001 . G . . NS=1;CGA_WINEND=964000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.07:0.91:.:0:.:0:0.999:303 -1 962236 . GGC . . . NS=1;AN=0 GT:PS ./.:. -1 962241 . T . . . NS=1;AN=0 GT:PS ./.:. -1 962266 . GGGTGAGC . . . NS=1;AN=0 GT:PS ./.:. -1 962606 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4970393;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC;CGA_RPT=(TG)n|Simple_repeat|21.7 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:91:773,91:773,91:53,47:-773,-91,0:-53,-47,0:32:32,32:0 -1 962891 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4970394;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:101:790,101:790,101:50,50:-790,-101,0:-50,-50,0:40:40,40:0 -1 963013 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9442389;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:91:736,91:736,91:53,47:-736,-91,0:-53,-47,0:34:34,34:0 -1 963143 . T . . . NS=1;AN=0 GT:PS ./.:. -1 963249 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2710870;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:65:524,65:524,65:52,48:-524,-65,0:-52,-48,0:25:25,25:0 -1 963701 . CCCCCCCC CACCCC . . 
NS=1;AN=1;AC=1;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC;CGA_RPT=GC_rich|Low_complexity|0.0 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:40,.:38,.:2,.:-40,0,0:-2,0,0:19:10,.:0 -1 963721 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2465127;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:66:66,288:64,290:48,52:-288,-66,0:-52,-48,0:25:25,25:0 -1 963835 . AGCCTGT . . . NS=1;AN=0 GT:PS ./.:. -1 964001 . A . . NS=1;CGA_WINEND=966000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.66:0.79:.:0:.:0:0.999:303 -1 964051 . ACAGCCACGCCACCCTCTCCCAAGGAACCGAGCCCCA . . . NS=1;AN=0 GT:PS ./.:. -1 964115 . C . . . NS=1;AN=0 GT:PS ./.:. -1 964128 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4246501;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:46:144,46:144,46:44,41:-144,-46,0:-44,-41,0:28:26,26:2 -1 964267 . C . . . NS=1;AN=0 GT:PS ./.:. -1 964298 . C G . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.111|rs4970396;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:47,.:47,.:20,.:-47,0,0:-20,0,0:11:7,.:4 -1 964385 . GAGCCCCAGCCCCTCGTGGGC . . . NS=1;AN=0 GT:PS ./.:. -1 964447 . AGCCCCA . . . NS=1;AN=0 GT:PS ./.:. -1 964766 . G GGAC . . NS=1;AN=2;AC=1;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:48:48,282:55,282:7,38:-48,0,-282:-7,0,-38:36:17,19:19 -1 964815 . C . . . NS=1;AN=0 GT:PS ./.:. -1 964840 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4970343;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:VQLOW:15:195,15:199,15:48,17:-195,-15,0:-48,-17,0:9:9,9:0 -1 964848 . A T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4970344;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:20:261,20:268,37:48,35:-261,-20,0:-48,-35,0:12:12,12:0 -1 964891 . C . . . NS=1;AN=0 GT:PS ./.:. -1 964896 . G . . . NS=1;AN=0 GT:PS ./.:. -1 964978 . T . . . NS=1;AN=0 GT:PS ./.:. -1 964984 . C . . . NS=1;AN=0 GT:PS ./.:. -1 964992 . GGTGCTGT . . . NS=1;AN=0 GT:PS ./.:. -1 965009 . A . . . NS=1;AN=0 GT:PS ./.:. -1 965020 . TGTGTGCAGTGCA . . . NS=1;AN=0 GT:PS ./.:. -1 965051 . ATGTGTG . . . NS=1;AN=0 GT:PS ./.:. -1 965097 . TG . . . NS=1;AN=0 GT:PS ./.:. -1 965102 . A . . . NS=1;AN=0 GT:PS ./.:. -1 965122 . CAGCGTGTGTGTGTGCAGTGCATGGTGCTGTGAGTGTGAGATCGTGTGTGTGTATGCAGTGCATGGTGCTGTGTGAGATCAGCGTGTGTGTGTGTGCAGT . . . NS=1;AN=0 GT:PS ./.:. -1 965240 . AGCATGTGTGTGTGCAGTGCATGGTGCTGTGAGATCAGTGTGTGTGTGTGTGCAGTGCATGGTGCTGTGTGAGATCAGCATGTGTGTGTGTGTGCAGCGCATGGTGCTGTGTGAGATCAGCATGTGTGTGTGTGTGTGCAGTGCATGGT . . . NS=1;AN=0 GT:PS ./.:. -1 965400 . AGCATGTGTGTGTGCAGTGCATGGTGCTGTGTGAGATCAGCATGTGTGTGTGTGTGCAGT . . . NS=1;AN=0 GT:PS ./.:. -1 965477 . CAGCATGTGTGTGTGTGTGTGCAGTGCATGGTGCTGTGAGATCAGCATGTGTGTGTGTGTGTGTGCAGTGCA . . . NS=1;AN=0 GT:PS ./.:. -1 965563 . CAGCATGTGTGTGTGCAGTGCATGGTGCTGTGAGATCAGCGCGTGTGTGTGTGCAGTGCATGGTGCTGTGAGATCAGCGTGTGTGTGTGTGCAGTGCATGGTGCTGTGTGAGATCAGCATGTGTGTGTGTGCAGTGCA . . . NS=1;AN=0 GT:PS ./.:. -1 965730 . CAGCGCA . . . NS=1;AN=0 GT:PS ./.:. -1 965744 . GTGTGAGATCAGCATGTGTGTGTGTGTGTGCAGTGCATGGTGCTGTGAGATCAGCGTGTGTGTGTGTGCAGT . . . NS=1;AN=0 GT:PS ./.:. -1 965826 . GTGTGAGATCAGCATGTGTGTGTGTGCAGTGCATGGTGCTGTGAGATCAGCGTGTGTGTGCAGCGCATGGTGCTGTGTGAGATCAGCGTGTGTGTGTGCAGCGCATGGTGCTGAGAGATCAGCATGTGTGTGTGCAGTGCATGG . . . 
NS=1;AN=0 GT:PS ./.:. -1 965981 . CAGCGTGTGTGTGTGTGCAGTGCATGGTGCTGAGTGTGAGATCAGCATGTGTGTGTGTGCAGTGCATGGTGCTGTGAGATCAGTGTGTGTGTGTGCAGTGCA . . . NS=1;AN=0 GT:PS ./.:. -1 966001 . T . . NS=1;CGA_WINEND=968000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.97:0.91:.:0:.:0:0.999:303 -1 966090 . GTGTGAGATCAGCATGTGTGTGTGTGTGTGTGCAGCGCA . . . NS=1;AN=0 GT:PS ./.:. -1 966144 . AGCATGTGTGTGTGTGTGTGTGTGCAGTGCATGGT . . . NS=1;AN=0 GT:PS ./.:. -1 966190 . AGCATGTGTGTGTGCAGTGCA . . . NS=1;AN=0 GT:PS ./.:. -1 966225 . CAGCGTGTGTGTGTGTGCAGTGCATGGTGCTGTGTGAGATCAGCATGT . . . NS=1;AN=0 GT:PS ./.:. -1 966291 . GTGCTGTGAGATCAGCGTGTGTGTGTGTGCAGTGCA . . . NS=1;AN=0 GT:PS ./.:. -1 966374 . CTGAGTGTGAGATCAGCATGTGTGTGTGCAGTGCATGGTGCTGTGAGTGTATCAGCATGTGTGTGTGTGCAGTGCATGGTGCTGTGAGTGTGATTGTGTGTGTGTGTGTGCGGTGCATGGTGCTGTGTGAGATG . . . NS=1;AN=0 GT:PS ./.:. -1 966577 . AGCATGTGTGTGTGCAGTGCATGGTGCTGTGAGTG . . . NS=1;AN=0 GT:PS ./.:. -1 966725 . AGTACTC . . . NS=1;AN=0 GT:PS ./.:. -1 967658 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4970349;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:99:99,897:99,897:49,53:-897,-99,0:-53,-49,0:38:38,38:0 -1 968001 . C . . NS=1;CGA_WINEND=970000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.34:1.10:2:51:=:51:0.999:303 -1 968275 . TCTAGGGGATGG . . . NS=1;AN=0 GT:PS ./.:. -1 968468 . TTC . . . NS=1;AN=0 GT:PS ./.:. -1 968475 . A . . . NS=1;AN=0 GT:PS ./.:. -1 968709 . GGGCGGGGCCGCGGGCGCAGACACTCGCGGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 968767 . T . . . NS=1;AN=0 GT:PS ./.:. -1 968772 . C . . . NS=1;AN=0 GT:PS ./.:. -1 968775 . G . . . NS=1;AN=0 GT:PS ./.:. -1 968785 . G . . . NS=1;AN=0 GT:PS ./.:. -1 968797 . G . . . NS=1;AN=0 GT:PS ./.:. -1 968820 . G . . . NS=1;AN=0 GT:PS ./.:. -1 968925 . G . . . NS=1;AN=0 GT:PS ./.:. -1 968928 . G . . . NS=1;AN=0 GT:PS ./.:. -1 969009 . C . . END=969242;NS=1;AN=0 GT:PS ./.:. -1 969293 . TC . . . NS=1;AN=0 GT:PS ./.:. -1 969419 . C . . . NS=1;AN=0 GT:PS ./.:. -1 969448 . GCT . . . NS=1;AN=0 GT:PS ./.:. -1 969452 . T . . . NS=1;AN=0 GT:PS ./.:. -1 969459 . G . . . NS=1;AN=0 GT:PS ./.:. -1 969461 . T . . . NS=1;AN=0 GT:PS ./.:. -1 969513 . CTGCCGG . . . NS=1;AN=0 GT:PS ./.:. -1 969532 . CAGACCCTCGGCGCCCGGCCCCCGCGCACCTGCCGCGCGCAC . . . NS=1;AN=0 GT:PS ./.:. -1 969593 . G . . . NS=1;AN=0 GT:PS ./.:. -1 969809 . GG CT . . NS=1;AN=2;AC=1;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:969809:VQLOW:25:25,25:9,9:0,18:-25,0,-25:0,0,-18:8:1,7:7 -1 969815 . AGCTGCTGTCCGCCG . . . NS=1;AN=0 GT:PS ./.:. -1 969892 . C . . . NS=1;AN=0 GT:PS ./.:. -1 969899 . G . . . NS=1;AN=0 GT:PS ./.:. -1 969947 . GCCGCTGGCGCGGGACACCCGGCAGCCGCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 969996 . C . . . NS=1;AN=0 GT:PS ./.:. -1 970001 . C . . NS=1;CGA_WINEND=972000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.85:0.85:2:52:=:52:0.999:303 -1 970215 . G C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9442364;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:50:361,50:368,50:48,43:-361,-50,0:-48,-43,0:17:17,17:0 -1 970312 . G . . . NS=1;AN=0 GT:PS ./.:. -1 970320 . GGCTA . . . NS=1;AN=0 GT:PS ./.:. -1 970348 . T . . . NS=1;AN=0 GT:PS ./.:. -1 970354 . AGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 970388 . CCTGCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 970399 . C . . . NS=1;AN=0 GT:PS ./.:. -1 970404 . T . . . NS=1;AN=0 GT:PS ./.:. -1 970409 . GG . . . NS=1;AN=0 GT:PS ./.:. -1 970499 . 
ATGGGGTCAG . . . NS=1;AN=0 GT:PS ./.:. -1 970516 . CCCTTCAGCAGCCTGGATCCCTGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 970550 . GG . . . NS=1;AN=0 GT:PS ./.:. -1 970622 . A . . . NS=1;AN=0 GT:PS ./.:. -1 970625 . G . . . NS=1;AN=0 GT:PS ./.:. -1 971224 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2799055;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:52:361,52:361,52:53,33:-361,-52,0:-53,-33,0:34:34,34:0 -1 971367 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2710883;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:90:90,743:90,743:46,50:-743,-90,0:-50,-46,0:40:40,40:0 -1 972001 . G . . NS=1;CGA_WINEND=974000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.05:0.98:2:53:=:53:0.999:303 -1 972134 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3121575;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:52:52,480:52,480:44,52:-480,-52,0:-52,-44,0:27:27,27:0 -1 972180 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4970350;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:66:579,66:579,66:52,48:-579,-66,0:-52,-48,0:26:26,26:0 -1 973105 . TGT . . . NS=1;AN=0 GT:PS ./.:. -1 973336 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2488993;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:30:217,30:217,30:48,29:-217,-30,0:-48,-29,0:19:19,19:0 -1 973377 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2465129;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:66:66,650:66,650:38,53:-650,-66,0:-53,-38,0:30:30,30:0 -1 973458 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2465130;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:110:110,812:110,812:52,50:-812,-110,0:-52,-50,0:42:42,42:0 -1 973668 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2488994;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:64:64,479:64,479:48,52:-479,-64,0:-52,-48,0:28:28,28:0 -1 974001 . G . . NS=1;CGA_WINEND=976000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.86:0.97:2:52:=:52:0.999:303 -1 974180 . G T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3121577;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:56:471,56:471,56:52,45:-471,-56,0:-52,-45,0:23:23,23:0 -1 974199 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2465131;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:79:703,79:703,79:53,42:-703,-79,0:-53,-42,0:31:31,31:0 -1 974225 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2488995;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC;CGA_RPT=G-rich|Low_complexity|27.6 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:110:110,937:110,937:52,50:-937,-110,0:-52,-50,0:42:42,42:0 -1 974296 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2488996;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:86:86,815:86,815:45,53:-815,-86,0:-53,-45,0:31:31,31:0 -1 974355 . GT AC . . 
NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2465132&dbsnp.100|rs2488997&dbsnp.126|rs35487305;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:47:47,414:47,414:34,33:-414,-47,0:-34,-33,0:21:21,21:0 -1 974494 . G T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2465133;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC;CGA_RPT=MIR|MIR|35.2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:84:678,84:678,84:53,44:-678,-84,0:-53,-44,0:32:32,32:0 -1 974570 . T G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2465134;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:66:471,66:471,66:53,38:-471,-66,0:-53,-38,0:30:30,30:0 -1 974662 . G T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2465135;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:72:72,553:72,553:48,52:-553,-72,0:-52,-48,0:28:27,27:1 -1 974791 . T TGG . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.126|rs36095704&dbsnp.130|rs71576591;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:VQLOW:28,.:28,.:1,.:-28,0,0:-1,0,0:11:3,.:0 -1 974894 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3121578;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:44:395,44:395,44:52,40:-395,-44,0:-52,-40,0:20:20,20:0 -1 975133 . T A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3121579;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:40:40,351:40,351:37,48:-351,-40,0:-48,-37,0:16:16,16:0 -1 975419 . GGCGCCAGAG . . . NS=1;AN=0 GT:PS ./.:. -1 975537 . TCAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 975551 . CGCTGCAGCAGCGCGG . . . NS=1;AN=0 GT:PS ./.:. -1 975702 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9331224;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:62:457,62:457,62:52,47:-457,-62,0:-52,-47,0:29:29,29:0 -1 975862 . G . . . NS=1;AN=0 GT:PS ./.:. -1 975900 . ACG . . . NS=1;AN=0 GT:PS ./.:. -1 976001 . A . . NS=1;CGA_WINEND=978000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.28:0.70:2:47:=:47:0.999:303 -1 976047 . TGCCGGGGAATG . . . NS=1;AN=0 GT:PS ./.:. -1 976075 . CCGTGTGCGAGCCCAACGCGGAGGGGCCGGGCCGGGCGTCCT . . . NS=1;AN=0 GT:PS ./.:. -1 976200 . C . . END=976517;NS=1;AN=0 GT:PS ./.:. -1 976620 . G . . . NS=1;AN=0 GT:PS ./.:. -1 976623 . G . . . NS=1;AN=0 GT:PS ./.:. -1 976788 . C . . . NS=1;AN=0 GT:PS ./.:. -1 976793 . GGC . . . NS=1;AN=0 GT:PS ./.:. -1 976805 . GA . . . NS=1;AN=0 GT:PS ./.:. -1 976937 . A . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:976937:VQLOW:24,.:0,.:0,.:0:0 -1 976940 . GCCC . . . NS=1;AN=0 GT:PS ./.:. -1 976956 . GCCCGGCA . . . NS=1;AN=0 GT:PS ./.:. -1 977056 . TGCGCTCCGGCCAGTGCCAGGGTCGAGGTGAGCGGCTCCCCCGGGGGAGGGCTCCGGCCAGTGCCAGGGTCGAGGTGGGCGGCTCCCCCGGG . . . NS=1;AN=0 GT:PS ./.:. -1 977178 . TGGGC . . . NS=1;AN=0 GT:PS ./.:. -1 977187 . C . . . NS=1;AN=0 GT:PS ./.:. -1 977203 . G C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3121552;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|1:977203:PASS:20:20,53:20,53:21,23:-53,-20,0:-23,-21,0:15:15,15:0 -1 977330 . T C . . 
NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2799066;CGA_FI=375790|NM_198576.2|AGRN|ACCEPTOR|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:74:551,74:551,74:53,41:-551,-74,0:-53,-41,0:32:31,31:1 -1 977516 . T . . . NS=1;AN=0 GT:PS ./.:. -1 977522 . GCAA . . . NS=1;AN=0 GT:PS ./.:. -1 977570 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2710876;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:77:737,77:737,77:53,42:-737,-77,0:-53,-42,0:31:31,31:0 -1 977780 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2710875;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:89:735,89:735,89:53,46:-735,-89,0:-53,-46,0:33:33,33:0 -1 978001 . T . . NS=1;CGA_WINEND=980000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.07:0.95:2:50:=:50:0.999:303 -1 978387 . G . . . NS=1;AN=0 GT:PS ./.:. -1 978477 . G . . . NS=1;AN=0 GT:PS ./.:. -1 978480 . G C . . NS=1;AN=2;AC=1;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:978480:VQLOW:26:26,26:0,0:0,11:-26,0,-26:0,0,-11:22:2,20:20 -1 978580 . GGGGCTTGTG . . . NS=1;AN=0 GT:PS ./.:. -1 978592 . AC . . . NS=1;AN=0 GT:PS ./.:. -1 978598 . GAGCCCCT GAGCCC . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.134|rs138543546;CGA_FI=375790|NM_198576.2|AGRN|ACCEPTOR|UNKNOWN-INC&375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:978598:PASS:331,.:289,.:40,.:-331,0,0:-40,0,0:39:25,.:0 -1 978747 . GA . . . NS=1;AN=0 GT:PS ./.:. -1 979121 . G . . . NS=1;AN=0 GT:PS ./.:. -1 979384 . GGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 980001 . G . . NS=1;CGA_WINEND=982000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.99:0.77:2:48:=:48:0.999:303 -1 980460 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3128097;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:74:74,665:74,665:48,52:-665,-74,0:-52,-48,0:28:27,27:1 -1 981087 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3128098;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:101:101,818:101,818:50,53:-818,-101,0:-53,-50,0:39:39,39:0 -1 981639 . T . . . NS=1;AN=0 GT:PS ./.:. -1 981776 . GAGG . . . NS=1;AN=0 GT:PS ./.:. -1 981828 . CC . . . NS=1;AN=0 GT:PS ./.:. -1 981891 . G . . . NS=1;AN=0 GT:PS ./.:. -1 981895 . TACC . . . NS=1;AN=0 GT:PS ./.:. -1 981905 . AGC . . . NS=1;AN=0 GT:PS ./.:. -1 981917 . CCTCCGCCCTCATCACGAC CCTCCGCCCTCATCGCGAC . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.100|rs2465128;CGA_FI=375790|NM_198576.2|AGRN|CDS|NO-CHANGE&375790|NM_198576.2|AGRN|CDS|SYNONYMOUS GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:981917:PASS:116,.:109,.:41,.:-116,0,0:-41,0,0:9:5,.:0 -1 981948 . C . . . NS=1;AN=0 GT:PS ./.:. -1 981951 . G . . . NS=1;AN=0 GT:PS ./.:. -1 981958 . CAGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 981970 . G . . . NS=1;AN=0 GT:PS ./.:. -1 981974 . CCCGTGCT . . . NS=1;AN=0 GT:PS ./.:. -1 981997 . GG . . . NS=1;AN=0 GT:PS ./.:. -1 982001 . C . . NS=1;CGA_WINEND=984000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.73:0.86:2:47:=:47:0.999:303 -1 982001 . CCC . . . NS=1;AN=0 GT:PS ./.:. -1 982013 . CCCAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 982249 . C . . . NS=1;AN=0 GT:PS ./.:. -1 982258 . TT . . . NS=1;AN=0 GT:PS ./.:. -1 982268 . A . . . NS=1;AN=0 GT:PS ./.:. -1 982274 . G . . . NS=1;AN=0 GT:PS ./.:. -1 982278 . TGGGC . . . NS=1;AN=0 GT:PS ./.:. 
-1 982413 . GC . . . NS=1;AN=0 GT:PS ./.:. -1 982444 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3128099;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:31:31,269:31,269:30,52:-269,-31,0:-52,-30,0:20:20,20:0 -1 982455 . T . . . NS=1;AN=0 GT:PS ./.:. -1 982462 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3128100;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:75:75,635:75,635:48,52:-635,-75,0:-52,-48,0:26:26,26:0 -1 982513 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3128101;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:48:363,48:363,48:48,42:-363,-48,0:-48,-42,0:18:18,18:0 -1 982941 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.103|rs3128102;CGA_FI=375790|NM_198576.2|AGRN|ACCEPTOR|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:67:576,67:576,67:53,38:-576,-67,0:-53,-38,0:33:33,33:0 -1 982994 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.52|rs10267;CGA_FI=375790|NM_198576.2|AGRN|CDS|SYNONYMOUS;CGA_PFAM=PFAM|PF01390|SEA GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:80:579,80:579,80:53,43:-579,-80,0:-53,-43,0:32:32,32:0 -1 983149 . T . . . NS=1;AN=0 GT:PS ./.:. -1 983422 . CGTCAGGAGCCATT . . . NS=1;AN=0 GT:PS ./.:. -1 983444 . AGCCACGG . . . NS=1;AN=0 GT:PS ./.:. -1 983473 . GC . . . NS=1;AN=0 GT:PS ./.:. -1 983502 . G . . . NS=1;AN=0 GT:PS ./.:. -1 983506 . C . . . NS=1;AN=0 GT:PS ./.:. -1 983521 . C . . . NS=1;AN=0 GT:PS ./.:. -1 983566 . G . . . NS=1;AN=0 GT:PS ./.:. -1 983589 . CGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 983601 . CGTCGGCCCCCGGCCCCCCAGCAGCCTCCAAAGCCCTGTGACTCACAGCCCTGCTTCCACGGGGGGACCTGCCAGGACTGGGCATTGGGCGGGGGCTTCACCTGCAGCTGCCCGGCAGGCAGGGGAGGCGCCGTCTGTGAGAAGGGTAAGGATGTCCACTGCAGAGGAGGGCGGGGAGGCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 984001 . T . . NS=1;CGA_WINEND=986000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.89:0.63:2:43:=:43:0.999:303 -1 984103 . C . . . NS=1;AN=0 GT:PS ./.:. -1 984148 . AAAAAAAAAAAA A . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.129|rs57668569;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC;CGA_RPT=AluSp|Alu|7.6 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:54,.:54,.:16,.:-54,0,0:-16,0,0:7:7,.:0 -1 984171 . CAG C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.134|rs140904842;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:VQLOW:15:208,15:208,15:35,21:-208,-15,0:-35,-21,0:12:12,12:0 -1 984249 . C . . . NS=1;AN=0 GT:PS ./.:. -1 984252 . G . . . NS=1;AN=0 GT:PS ./.:. -1 984262 . T . . . NS=1;AN=0 GT:PS ./.:. -1 984302 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9442391;CGA_FI=375790|NM_198576.2|AGRN|CDS|SYNONYMOUS GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:42:42,402:42,402:39,48:-402,-42,0:-48,-39,0:19:19,19:0 -1 984386 . CCG . . . NS=1;AN=0 GT:PS ./.:. -1 984469 . GCTCAGGTGGGCGGGGAGGGGACGGGCGGGGGAGGGGGGGCCGGGGCAGCTCAGGTGGGTGGGGTGGGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 984543 . G . . . NS=1;AN=0 GT:PS ./.:. -1 984635 . C . . . NS=1;AN=0 GT:PS ./.:. -1 985202 . A G . . NS=1;AN=2;AC=1;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:VQLOW:32:32,32:29,29:7,25:-32,0,-32:-7,0,-25:10:3,7:7 -1 985266 . C T . . 
NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2275813;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:45:45,453:45,453:41,52:-453,-45,0:-52,-41,0:20:20,20:0 -1 985379 . A . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:985379:VQLOW:20,.:14,.:0,.:0:0 -1 985396 . G . . . NS=1;AN=0 GT:PS ./.:. -1 985424 . G . . . NS=1;AN=0 GT:PS ./.:. -1 985445 . GGG GT . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.100|rs2275812&dbsnp.100|rs2799067&dbsnp.130|rs71576592;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:985445:PASS:241:325,241:323,269:33,32:-325,0,-241:-33,0,-32:27:16,11:11 -1 985449 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.100|rs2799067&dbsnp.129|rs56255212;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 0|1:985445:PASS:136:231,136:230,143:50,44:-136,0,-231:-44,0,-50:23:16,7:16 -1 985450 . G A . . NS=1;AN=2;AC=1;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:985445:PASS:136:231,136:230,143:52,47:-231,0,-136:-52,0,-47:20:13,7:7 -1 985460 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.100|rs2275811;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:985445:PASS:273:508,273:505,278:53,52:-508,0,-273:-53,0,-52:35:22,13:13 -1 986001 . C . . NS=1;CGA_WINEND=988000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.94:0.89:2:48:=:48:0.999:303 -1 986060 . T . . . NS=1;AN=0 GT:PS ./.:. -1 986064 . C . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:986064:VQLOW:24,.:0,.:0,.:0:0 -1 986443 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2710887;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:79:675,79:675,79:53,42:-675,-79,0:-53,-42,0:31:31,31:0 -1 986579 . T . . . NS=1;AN=0 GT:PS ./.:. -1 986684 . C . . . NS=1;AN=0 GT:PS ./.:. -1 986687 . GACT . . . NS=1;AN=0 GT:PS ./.:. -1 986816 . C . . . NS=1;AN=0 GT:PS ./.:. -1 986840 . TCGGAGGCCGC . . . NS=1;AN=0 GT:PS ./.:. -1 986855 . TGCTGACCCC . . . NS=1;AN=0 GT:PS ./.:. -1 987200 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9803031;CGA_FI=375790|NM_198576.2|AGRN|DONOR|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:32:32,295:32,295:31,48:-295,-32,0:-48,-31,0:11:11,11:0 -1 987670 . T G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.121|rs13303287;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC;CGA_RPT=(TG)n|Simple_repeat|33.4 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:70:70,582:70,582:48,52:-582,-70,0:-52,-48,0:25:25,25:0 -1 987894 . C CGT . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.126|rs34235844;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC;CGA_RPT=(TG)n|Simple_repeat|23.6 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:151,.:169,.:35,.:-151,0,0:-35,0,0:15:11,.:4 -1 988001 . T . . NS=1;CGA_WINEND=990000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.04:0.84:2:50:=:50:0.999:303 -1 988503 . A T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2799071;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:126:126,1072:126,1072:52,50:-1072,-126,0:-52,-50,0:47:46,46:1 -1 988742 . TGTGGGTGCC . . . NS=1;AN=0 GT:PS ./.:. -1 988900 . GAC . . . NS=1;AN=0 GT:PS ./.:. -1 988932 . G C . . 
NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2710871;CGA_FI=375790|NM_198576.2|AGRN|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:32:564,32:564,75:52,48:-564,-32,0:-52,-48,0:23:23,23:0 -1 990001 . C . . NS=1;CGA_WINEND=992000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.73:1.03:2:52:=:52:0.999:303 -1 990126 . TG . . . NS=1;AN=0 GT:PS ./.:. -1 990280 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4275402;CGA_FI=375790|NM_198576.2|AGRN|CDS|SYNONYMOUS GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:41:339,41:339,41:48,38:-339,-41,0:-48,-38,0:19:19,19:0 -1 990320 . G . . . NS=1;AN=0 GT:PS ./.:. -1 990465 . T . . . NS=1;AN=0 GT:PS ./.:. -1 990517 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2710872;CGA_FI=375790|NM_198576.2|AGRN|UTR3|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:50:476,50:476,50:52,43:-476,-50,0:-52,-43,0:23:23,23:0 -1 990602 . AGC . . . NS=1;AN=0 GT:PS ./.:. -1 990773 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2799072;CGA_FI=375790|NM_198576.2|AGRN|UTR3|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:57:57,416:57,416:46,52:-416,-57,0:-52,-46,0:25:23,23:2 -1 990806 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2799073;CGA_FI=375790|NM_198576.2|AGRN|UTR3|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:91:762,91:762,91:52,48:-762,-91,0:-52,-48,0:29:29,29:0 -1 990898 . GC . . . NS=1;AN=0 GT:PS ./.:. -1 990905 . G . . . NS=1;AN=0 GT:PS ./.:. -1 990984 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.52|rs8014;CGA_FI=375790|NM_198576.2|AGRN|UTR3|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:69:604,69:604,69:52,48:-604,-69,0:-52,-48,0:27:27,27:0 -1 991233 . G . . . NS=1;AN=0 GT:PS ./.:. -1 991249 . GCCAGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 991347 . C . . . NS=1;AN=0 GT:PS ./.:. -1 991364 . CCCCCAGCCCCAGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 991392 . AG . . . NS=1;AN=0 GT:PS ./.:. -1 991397 . G . . . NS=1;AN=0 GT:PS ./.:. -1 991401 . C . . . NS=1;AN=0 GT:PS ./.:. -1 991500 . TTGCTTTTGTCCATCCTCACCAGCGCGCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 991548 . C . . . NS=1;AN=0 GT:PS ./.:. -1 991604 . CAG . . . NS=1;AN=0 GT:PS ./.:. -1 991611 . T . . . NS=1;AN=0 GT:PS ./.:. -1 991658 . AC A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.129|rs60286592 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:24:24,207:24,207:28,35:-207,-24,0:-35,-28,0:13:13,13:0 -1 991721 . GTGCGTGTACGTGTGGGGGTGTGTGTGTGTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 991793 . GCACGTGTGGGTGCGGGTACGTGTGGGTGCGGGTACGTGTGGGTGCATGT . . . NS=1;AN=0 GT:PS ./.:. -1 991884 . GGTACGTGTGGGTGTGGGTGCGT . . . NS=1;AN=0 GT:PS ./.:. -1 991925 . GCATGTATGGGTGCATGTACGTGTGTG . . . NS=1;AN=0 GT:PS ./.:. -1 991974 . GGCGTGTATGTGTGGGTGCGTGTGCGTGTGGGTGCGTGTGCTTGC . . . NS=1;AN=0 GT:PS ./.:. -1 992001 . G . . NS=1;CGA_WINEND=994000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.64:0.90:2:53:=:53:0.999:303 -1 992038 . GGTGCATGTACGTGTGTGGGTGCGTG . . . NS=1;AN=0 GT:PS ./.:. -1 992076 . GGCGTGTATGTGTGGGTGTGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 992115 . GCACGTGTGTGTGTGTGTGTGTGTGCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 992327 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2245754 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:72:667,72:667,72:53,40:-667,-72,0:-53,-40,0:35:35,35:0 -1 992622 . G . . . NS=1;AN=0 GT:PS ./.:. -1 992635 . T . . . NS=1;AN=0 GT:PS ./.:. -1 992651 . C . . . NS=1;AN=0 GT:PS ./.:. -1 992819 . G A . . 
NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9331226 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:33:367,33:367,33:48,31:-367,-33,0:-48,-31,0:15:15,15:0 -1 992840 . ATT A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.126|rs34537778&dbsnp.129|rs55929714&dbsnp.134|rs150915126 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:51:416,51:422,51:38,38:-416,-51,0:-38,-38,0:27:27,27:0 -1 992852 . AAG A . . NS=1;AN=2;AC=2;CGA_RPT=AluSx1|Alu|12.8 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:22:22,165:22,165:27,36:-165,-22,0:-36,-27,0:20:20,20:0 -1 993360 . C A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.121|rs13303240;CGA_RPT=AluSx1|Alu|11.7 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:66:455,66:455,66:52,48:-455,-66,0:-52,-48,0:25:25,25:0 -1 993402 . A . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:993402:VQLOW:21,.:0,.:0,.:0:0 -1 993405 . TTGC . . . NS=1;AN=0 GT:PS ./.:. -1 993916 . GCA . . . NS=1;AN=0 GT:PS ./.:. -1 994001 . C . . NS=1;CGA_WINEND=996000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.85:0.98:2:53:=:53:0.999:303 -1 994391 . G T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2488991 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:50:394,50:394,50:52,43:-394,-50,0:-52,-43,0:27:27,27:0 -1 994519 . G . . . NS=1;AN=0 GT:PS ./.:. -1 994569 . T . . . NS=1;AN=0 GT:PS ./.:. -1 994573 . TC . . . NS=1;AN=0 GT:PS ./.:. -1 994674 . GGGAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 994768 . CCCGCCCCCCAGCCTGGAGCGCCCCCCTCCGGCCCCGGTCCGCAGTGGA . . . NS=1;AN=0 GT:PS ./.:. -1 994874 . AA . . . NS=1;AN=0 GT:PS ./.:. -1 994878 . CCCGCGACCCGCGACCCGGCTGCCCGCGGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 994915 . G . . . NS=1;AN=0 GT:PS ./.:. -1 994924 . CGGGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 994933 . CC . . . NS=1;AN=0 GT:PS ./.:. -1 994939 . C . . . NS=1;AN=0 GT:PS ./.:. -1 994958 . C . . . NS=1;AN=0 GT:PS ./.:. -1 994966 . CGCCCG . . . NS=1;AN=0 GT:PS ./.:. -1 994991 . ACCCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 995481 . T G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9442393;CGA_RPT=AluSg|Alu|18.4 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:82:637,82:637,82:53,44:-637,-82,0:-53,-44,0:38:37,37:1 -1 996001 . G . . NS=1;CGA_WINEND=998000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.62:0.92:2:53:=:53:0.999:303 -1 996354 . TTAT . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0/.:.:VQLOW:28,.:0,.:0,.:0:0 -1 997263 . GTGGGGC . . . NS=1;AN=0 GT:PS ./.:. -1 997275 . TCCTAGAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 997290 . GCCCCAC . . . NS=1;AN=0 GT:PS ./.:. -1 997309 . TA . . . NS=1;AN=0 GT:PS ./.:. -1 997408 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.125|rs28397086 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:104:799,104:799,104:50,51:-799,-104,0:-51,-50,0:43:43,43:0 -1 997436 . CTCCCTCCCTTGTCCCCGTTCCCTCCG C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.126|rs34515215&dbsnp.129|rs59725750&dbsnp.134|rs145846158;CGA_RPT=C-rich|Low_complexity|18.9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:46:325,46:325,46:36,31:-325,-46,0:-36,-31,0:43:43,43:0 -1 997524 . CCCTC . . . NS=1;AN=0 GT:PS ./.:. -1 997585 . AGT . . . NS=1;AN=0 GT:PS ./.:. -1 997602 . GCTCTTTTT . . . NS=1;AN=0 GT:PS ./.:. -1 997618 . A . . . NS=1;AN=0 GT:PS ./.:. -1 997623 . GG . . . NS=1;AN=0 GT:PS ./.:. -1 997636 . GCCG . . . NS=1;AN=0 GT:PS ./.:. -1 997666 . C . . . NS=1;AN=0 GT:PS ./.:. -1 997709 . A . . . NS=1;AN=0 GT:PS ./.:. -1 997715 . GCTCCG . . . NS=1;AN=0 GT:PS ./.:. -1 997723 . GC . . . NS=1;AN=0 GT:PS ./.:. -1 997727 . CC . . . 
NS=1;AN=0 GT:PS ./.:. -1 997731 . GACCTCATGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 997747 . G . . . NS=1;AN=0 GT:PS ./.:. -1 997751 . ACTAGGA . . . NS=1;AN=0 GT:PS ./.:. -1 997765 . G . . . NS=1;AN=0 GT:PS ./.:. -1 997768 . ACGGGGGGGCTGTTCACCAGGA . . . NS=1;AN=0 GT:PS ./.:. -1 997797 . GGGCTGCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 997828 . CGGCAGCACCAGCCTCTGCCTGCATGGGGCCGCGAGGTTTGCAGTGAC . . . NS=1;AN=0 GT:PS ./.:. -1 997886 . CTTCCTGACCTGCCCCGGACACGGAGCACGGCTCC . . . NS=1;AN=0 GT:PS ./.:. -1 998001 . A . . NS=1;CGA_WINEND=1000000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.09:0.88:2:51:=:51:0.999:303 -1 998055 . A . . . NS=1;AN=0 GT:PS ./.:. -1 998062 . G . . . NS=1;AN=0 GT:PS ./.:. -1 998185 . GAGGCTGGGAGGGGTGGAGTGGACGGGGCTGCTGACAGCCTCGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 998395 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs7526076 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:136:1183,136:1183,136:49,52:-1183,-136,0:-52,-49,0:50:50,50:0 -1 998582 . G C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.107|rs3813194 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:77:637,77:637,77:53,42:-637,-77,0:-53,-42,0:33:33,33:0 -1 999041 . ATGTG A,ATG . . NS=1;AN=2;AC=1,1;CGA_XR=dbsnp.114|rs5772030&dbsnp.114|rs5772031&dbsnp.114|rs5772032&dbsnp.114|rs5772033&dbsnp.129|rs59276350&dbsnp.130|rs63618161&dbsnp.132|rs112424583&dbsnp.134|rs150552617,dbsnp.114|rs5772030&dbsnp.114|rs5772031&dbsnp.114|rs5772032&dbsnp.114|rs5772033&dbsnp.129|rs59276350&dbsnp.130|rs63618161&dbsnp.134|rs150552617;CGA_RPT=(TG)n|Simple_repeat|19.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/2:.:PASS:89:89,252:84,247:26,30:-252,-252,-252,-89,0,-89:-30,-30,-30,-26,0,-26:19:7,9:1 -1 999840 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1000001 . G . . NS=1;CGA_WINEND=1002000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.62:0.70:2:47:=:47:0.999:312 -1 1000156 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.120|rs11584349 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:43:43,228:43,228:39,48:-228,-43,0:-48,-39,0:18:18,18:0 -1 1000164 . CG . . . NS=1;AN=0 GT:PS ./.:. -1 1000204 . CCC . . . NS=1;AN=0 GT:PS ./.:. -1 1000212 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1000231 . G . . END=1000696;NS=1;AN=0 GT:PS ./.:. -1 1001177 . G C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4970401;CGA_RPT=LTR13A|ERVK|6.3 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:52:52,501:52,501:44,52:-501,-52,0:-52,-44,0:25:24,24:1 -1 1001562 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1002001 . A . . NS=1;CGA_WINEND=1004000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.11:1.02:2:51:=:51:0.999:312 -1 1002434 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.120|rs11260596 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:86:711,86:711,86:53,45:-711,-86,0:-53,-45,0:33:33,33:0 -1 1002932 . C G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4246502 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:61:61,396:61,396:47,52:-396,-61,0:-52,-47,0:29:29,29:0 -1 1003053 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.108|rs4074992 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:100:847,100:847,100:53,50:-847,-100,0:-53,-50,0:36:36,36:0 -1 1003629 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.108|rs4075116 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:111:111,1064:111,1064:52,50:-1064,-111,0:-52,-50,0:44:43,43:1 -1 1003678 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1003683 . GG . . . NS=1;AN=0 GT:PS ./.:. -1 1003962 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1004001 . G . . 
NS=1;CGA_WINEND=1006000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.90:0.73:2:48:=:48:0.999:312 -1 1004017 . CGGGCCGATCCC . . . NS=1;AN=0 GT:PS ./.:. -1 1004034 . CGTCCTCGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 1004051 . ACG . . . NS=1;AN=0 GT:PS ./.:. -1 1004090 . G . . END=1004535;NS=1;AN=0 GT:PS ./.:. -1 1004581 . GTCCGCAGGGCTGGACTGCGAC . . . NS=1;AN=0 GT:PS ./.:. -1 1004687 . CT . . . NS=1;AN=0 GT:PS ./.:. -1 1004721 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1004736 . AG . . . NS=1;AN=0 GT:PS ./.:. -1 1004740 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1004754 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1004957 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.108|rs4073176 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:33:463,33:463,33:52,31:-463,-33,0:-52,-31,0:23:23,23:0 -1 1004980 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.108|rs4073177 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:72:607,72:607,72:52,48:-607,-72,0:-52,-48,0:29:29,29:0 -1 1005251 . CACCGCA . . . NS=1;AN=0 GT:PS ./.:. -1 1006001 . T . . NS=1;CGA_WINEND=1008000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.85:0.88:2:51:=:51:0.999:312 -1 1006223 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9442394 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:57:57,455:57,455:46,52:-455,-57,0:-52,-46,0:22:22,22:0 -1 1006947 . GACTGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 1006990 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4326571 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:79:79,548:79,548:42,53:-548,-79,0:-53,-42,0:31:31,31:0 -1 1007120 . GTG . . . NS=1;AN=0 GT:PS ./.:. -1 1007158 . CTGGGCGAGGGC . . . NS=1;AN=0 GT:PS ./.:. -1 1007203 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4633229;CGA_FI=401934|XM_002342025.2|LOC401934|CDS|SYNONYMOUS GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:22:225,22:225,22:48,22:-225,-22,0:-48,-22,0:13:13,13:0 -1 1007360 . CGCCTCCAGTCCCTGCAGCGCGCCCAGCAGCGGGCCAGGCGGCCCC . . . NS=1;AN=0 GT:PS ./.:. -1 1007432 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4333796;CGA_FI=401934|XM_002342025.2|LOC401934|CDS|MISSENSE;CGA_RPT=GC_rich|Low_complexity|5.8 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:VQLOW:13:200,13:200,13:48,16:-200,-13,0:-48,-16,0:12:12,12:0 -1 1007439 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1008001 . T . . NS=1;CGA_WINEND=1010000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.11:0.92:2:53:=:53:0.999:312 -1 1009231 . CTGTGA CTGCGA . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.119|rs9442366;CGA_FI=401934|XM_002342025.2|LOC401934|INTRON|UNKNOWN-INC GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:99,.:89,.:37,.:-99,0,0:-37,0,0:12:9,.:0 -1 1009478 . G C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9442367;CGA_FI=401934|XM_002342025.2|LOC401934|UTR5|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:40:40,483:40,483:37,52:-483,-40,0:-52,-37,0:25:25,25:0 -1 1009554 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1009558 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1010001 . G . . NS=1;CGA_WINEND=1012000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.64:0.72:.:0:.:0:0.999:312 -1 1010717 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9442368;CGA_FI=401934|XM_002342025.2|LOC401934|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:83:690,83:690,83:53,44:-690,-83,0:-53,-44,0:31:31,31:0 -1 1010997 . GTCGGGGGTTG . . . NS=1;AN=0 GT:PS ./.:. -1 1011041 . GTCAGGGGTTGGGGGGACCC . . . NS=1;AN=0 GT:PS ./.:. -1 1011088 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1011092 . GTTA . . 
. NS=1;AN=0 GT:PS ./.:. -1 1011303 . GGG . . . NS=1;AN=0 GT:PS ./.:. -1 1011310 . GG . . . NS=1;AN=0 GT:PS ./.:. -1 1011318 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1011331 . GTGCC . . . NS=1;AN=0 GT:PS ./.:. -1 1011342 . CTGAGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 1011373 . TGA . . . NS=1;AN=0 GT:PS ./.:. -1 1011401 . GCTGAGGTTATGGGGACTCCGTGCTGGGAGGCTGAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 1011471 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1011485 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1011492 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1011522 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1011538 . TC . . . NS=1;AN=0 GT:PS ./.:. -1 1011543 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1011561 . GGTGACTCCGTGCA . . . NS=1;AN=0 GT:PS ./.:. -1 1011582 . GA . . . NS=1;AN=0 GT:PS ./.:. -1 1011595 . CT . . . NS=1;AN=0 GT:PS ./.:. -1 1011616 . C . . END=1014753;NS=1;AN=0 GT:PS ./.:. -1 1012001 . A . . NS=1;CGA_WINEND=1014000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.01:0.19:.:0:.:0:0.999:312 -1 1014001 . G . . NS=1;CGA_WINEND=1016000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.61:0.82:.:0:.:0:0.999:312 -1 1014836 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.120|rs12401605 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:39:317,39:317,39:48,36:-317,-39,0:-48,-36,0:14:14,14:0 -1 1014864 . T A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.120|rs12411041 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:52:52,396:52,396:44,48:-396,-52,0:-48,-44,0:19:19,19:0 -1 1015026 . GTGTGGT . . . NS=1;AN=0 GT:PS ./.:. -1 1015126 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.126|rs36027499 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:51:498,51:498,51:52,44:-498,-51,0:-52,-44,0:22:22,22:0 -1 1015257 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9442369 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:35:35,349:35,349:33,52:-349,-35,0:-52,-33,0:21:21,21:0 -1 1015551 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9442370 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:71:467,71:467,71:53,40:-467,-71,0:-53,-40,0:33:33,33:0 -1 1015618 . TAGGCCCAGCTC T . . NS=1;AN=2;AC=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:469:469,469:468,468:40,42:-469,0,-469:-40,0,-42:37:17,20:20 -1 1015817 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.121|rs12746483 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:VQLOW:12:209,12:209,12:48,16:-209,-12,0:-48,-16,0:10:10,10:0 -1 1015855 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1016001 . T . . NS=1;CGA_WINEND=1018000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.96:0.85:2:52:=:52:0.999:312 -1 1016058 . TCCGCCCCCACCTCGGTCCCTGTC . . . NS=1;AN=0 GT:PS ./.:. -1 1016088 . CCTCCGCCCCCACCTCGGTCCCTGTCTCCTTCCCTCCGCCCCCACCTCGGTCCCTGTCTCCTTCCCTCCGCCCCCACCTCGGTCCCTGTCTCCTTCCCTCCGCCCCCACCTCGGTCCCTGTCTCCTTCCCTCCGCCCCCACCTCGGTCCCTGTCTCCTTCCCTCCGCCCCCACCTCGGTCCCTGT . . . NS=1;AN=0 GT:PS ./.:. -1 1016278 . TCCCTCCGCCCCCACCTCGGTCCCTGT . . . NS=1;AN=0 GT:PS ./.:. -1 1016310 . TCCCTCCGCCCCCACCTCGGTCCCTGTCTCCTTCCCTCCGCCCCCACCTC . . . NS=1;AN=0 GT:PS ./.:. -1 1016429 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1017028 . GC GG,G . . NS=1;AN=2;AC=1,1;CGA_XR=dbsnp.111|rs4970352,. GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/2:.:VQLOW:31:127,31:124,27:41,5:-127,-31,-31,-127,0,-127:-41,-5,-5,-41,0,-41:21:9,4:0 -1 1017170 . C G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.107|rs3766193 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:70:70,721:70,721:48,52:-721,-70,0:-52,-48,0:28:28,28:0 -1 1017197 . C T . . 
NS=1;AN=2;AC=2;CGA_XR=dbsnp.107|rs3766192 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:139:139,1236:139,1236:52,49:-1236,-139,0:-52,-49,0:55:55,55:0 -1 1017341 . G T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.86|rs1133647;CGA_FI=54991|NM_017891.4|C1orf159|UTR3|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:59:530,59:530,59:52,46:-530,-59,0:-52,-46,0:23:23,23:0 -1 1018001 . C . . NS=1;CGA_WINEND=1020000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.10:0.97:2:53:=:53:0.999:312 -1 1018144 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9442395;CGA_FI=54991|NM_017891.4|C1orf159|UTR3|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:90:90,859:90,859:46,53:-859,-90,0:-53,-46,0:34:34,34:0 -1 1018562 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9442371;CGA_FI=54991|NM_017891.4|C1orf159|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:89:89,720:89,720:46,53:-720,-89,0:-53,-46,0:32:32,32:0 -1 1018678 . G . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0/.:.:PASS:504,.:457,.:0,.:0:0 -1 1018704 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9442372;CGA_FI=54991|NM_017891.4|C1orf159|INTRON|UNKNOWN-INC;CGA_RPT=MLT1C|ERVL-MaLR|17.4 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:91:811,91:764,44:50,31:-811,-91,0:-50,-31,0:44:43,43:0 -1 1019106 . GGACGGGGCACG . . . NS=1;AN=0 GT:PS ./.:. -1 1019175 . C G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2298215;CGA_FI=54991|NM_017891.4|C1orf159|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:83:717,83:717,83:53,44:-717,-83,0:-53,-44,0:30:30,30:0 -1 1019180 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9442396;CGA_FI=54991|NM_017891.4|C1orf159|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:84:84,727:84,727:44,53:-727,-84,0:-53,-44,0:36:36,36:0 -1 1019962 . CAGCAGA . . . NS=1;AN=0 GT:PS ./.:. -1 1020001 . G . . NS=1;CGA_WINEND=1022000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.20:0.92:2:53:=:53:0.999:312 -1 1020007 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1020177 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1020406 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9442397;CGA_FI=54991|NM_017891.4|C1orf159|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:76:618,76:618,76:52,48:-618,-76,0:-52,-48,0:28:28,28:0 -1 1020581 . ACCTGGGAGGAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 1020885 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1020923 . A AG . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.130|rs70949550;CGA_FI=54991|NM_017891.4|C1orf159|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:67:67,512:67,512:37,39:-512,-67,0:-39,-37,0:29:29,29:0 -1 1021415 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.107|rs3737728;CGA_FI=54991|NM_017891.4|C1orf159|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:26:343,26:343,26:52,26:-343,-26,0:-52,-26,0:20:20,20:0 -1 1021695 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9442398;CGA_FI=54991|NM_017891.4|C1orf159|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:76:76,713:76,713:41,53:-713,-76,0:-53,-41,0:32:31,31:1 -1 1021873 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1022001 . A . . NS=1;CGA_WINEND=1024000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.85:0.96:2:53:=:53:0.999:312 -1 1022037 . C T . . 
NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs6701114;CGA_FI=54991|NM_017891.4|C1orf159|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:110:110,984:110,984:52,50:-984,-110,0:-52,-50,0:41:41,41:0 -1 1022457 . GCT . . . NS=1;AN=0 GT:PS ./.:. -1 1022502 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1023139 . C G . . NS=1;AN=2;AC=1;CGA_FI=54991|NM_017891.4|C1orf159|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:254:338,254:338,253:53,51:-338,0,-254:-53,0,-51:31:17,14:14 -1 1023441 . CTGCAGG CTGGAGG . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.119|rs9442399;CGA_FI=54991|NM_017891.4|C1orf159|INTRON|UNKNOWN-INC GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:1023441:PASS:267,.:274,.:52,.:-267,0,0:-52,0,0:21:14,.:0 -1 1023453 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1023456 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1023531 . A . . END=1023734;NS=1;AN=0 GT:PS ./.:. -1 1023853 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1023910 . CCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 1023920 . CAAGCCTC . . . NS=1;AN=0 GT:PS ./.:. -1 1023930 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1023934 . CCGCCAGGCCGACGCTGCGCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 1024001 . A . . NS=1;CGA_WINEND=1026000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.11:0.88:2:53:=:53:0.999:312 -1 1024095 . GGTG . . . NS=1;AN=0 GT:PS ./.:. -1 1024194 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1024897 . GGCTCCCCAACCCCCACGC . . . NS=1;AN=0 GT:PS ./.:. -1 1025301 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.119|rs9442400;CGA_FI=54991|NM_017891.4|C1orf159|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:228:228,228:227,227:46,49:-228,0,-228:-46,0,-49:37:17,20:20 -1 1025797 . GGGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 1026001 . C . . NS=1;CGA_WINEND=1028000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.95:0.92:2:53:=:53:0.999:312 -1 1026657 . CCCAGCAGGCAGCGGAGGGCTCCCCTCTGCG . . . NS=1;AN=0 GT:PS ./.:. -1 1026702 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1026707 . CC AC,AT . . NS=1;AN=2;AC=1,1;CGA_XR=dbsnp.108|rs4074137,.;CGA_FI=54991|NM_017891.4|C1orf159|INTRON|UNKNOWN-INC,54991|NM_017891.4|C1orf159|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|2:1026707:VQLOW:20:186,20:182,17:30,0:-186,-20,-20,-186,0,-186:-30,0,0,-30,0,-30:11:8,1:0 -1 1026712 . CA . . . NS=1;AN=0 GT:PS ./.:. -1 1026726 . AGCACCT . . . NS=1;AN=0 GT:PS ./.:. -1 1026801 . T A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4562563;CGA_FI=54991|NM_017891.4|C1orf159|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:46:377,46:377,46:48,41:-377,-46,0:-48,-41,0:16:16,16:0 -1 1026898 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1026902 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1026905 . CG . . . NS=1;AN=0 GT:PS ./.:. -1 1026924 . GC . . . NS=1;AN=0 GT:PS ./.:. -1 1027094 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1027097 . TGGAGGGTGGGGCCAAATGGAAGTGGGCGGGGCTGTGGTGGAGGGTGGGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 1027155 . A . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:1027155:VQLOW:24,.:20,.:0,.:0:0 -1 1027199 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1027208 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1028001 . G . . NS=1;CGA_WINEND=1030000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.99:0.94:2:53:=:53:0.999:312 -1 1030001 . C . . NS=1;CGA_WINEND=1032000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.93:0.88:2:51:=:51:0.999:312 -1 1030347 . TGCTAAAAAGACT . . . NS=1;AN=0 GT:PS ./.:. -1 1030592 . TCCATTT . . . NS=1;AN=0 GT:PS ./.:. -1 1031540 . A G . . 
NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9651273;CGA_FI=54991|NM_017891.4|C1orf159|INTRON|UNKNOWN-INC;CGA_RPT=L1MD3|L1|40.2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:109:109,1043:109,1043:52,50:-1043,-109,0:-52,-50,0:40:40,40:0 -1 1031719 . ATAATTTTTTTTGTAC . . . NS=1;AN=0 GT:PS ./.:. -1 1032001 . A . . NS=1;CGA_WINEND=1034000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.90:0.92:2:48:=:48:0.999:312 -1 1032184 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9651272;CGA_FI=54991|NM_017891.4|C1orf159|INTRON|UNKNOWN-INC;CGA_RPT=AluJo|Alu|22.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:55:55,572:55,572:45,52:-572,-55,0:-52,-45,0:25:25,25:0 -1 1032579 . C T . . NS=1;AN=2;AC=1;CGA_FI=54991|NM_017891.4|C1orf159|INTRON|UNKNOWN-INC;CGA_RPT=AluSx1|Alu|10.1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:VQLOW:32:32,32:12,12:0,18:-32,0,-32:0,0,-18:16:2,14:14 -1 1032965 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1033999 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4970353;CGA_FI=54991|NM_017891.4|C1orf159|INTRON|UNKNOWN-INC;CGA_RPT=L1MD3|L1|40.2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:38:288,38:288,38:52,36:-288,-38,0:-52,-36,0:26:26,26:0 -1 1034001 . T . . NS=1;CGA_WINEND=1036000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.95:0.58:2:42:=:42:0.999:312 -1 1035053 . C T . . NS=1;AN=2;AC=1;CGA_FI=54991|NM_017891.4|C1orf159|INTRON|UNKNOWN-INC;CGA_RPT=L1MD|L1|23.6 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:VQLOW:20:20,20:0,0:0,11:-20,0,-20:0,0,-11:20:3,17:17 -1 1036001 . G . . NS=1;CGA_WINEND=1038000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.87:0.94:2:49:=:49:0.999:312 -1 1036533 . C . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:1036533:PASS:40,.:5,.:0,.:0:0 -1 1036536 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1037452 . AAAA . . . NS=1;AN=0 GT:PS ./.:. -1 1037995 . AAAAAAAAAAAAAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 1038001 . A . . NS=1;CGA_WINEND=1040000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.86:0.86:2:51:=:51:0.999:312 -1 1038106 . AGGCAGAAGTTGCAGTGAGAC . . . NS=1;AN=0 GT:PS ./.:. -1 1038130 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1038134 . TGC . . . NS=1;AN=0 GT:PS ./.:. -1 1040001 . A . . NS=1;CGA_WINEND=1042000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.67:0.95:2:51:=:51:0.999:312 -1 1040195 . AAGGAAAAAAAAAAAGCTTT . . . NS=1;AN=0 GT:PS ./.:. -1 1040418 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.132|rs77147003;CGA_FI=54991|NM_017891.4|C1orf159|INTRON|UNKNOWN-INC;CGA_RPT=MIR|MIR|47.0 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:171:543,171:528,156:53,42:-543,0,-171:-53,0,-42:33:24,9:9 -1 1040548 . AACAAATATG . . . NS=1;AN=0 GT:PS ./.:. -1 1040918 . ATTTAAAAAACAAT . . . NS=1;AN=0 GT:PS ./.:. -1 1041007 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1041026 . CCGAGGTGGGTGGATCACGAGGTCAGGAG . . . NS=1;AN=0 GT:PS ./.:. -1 1041088 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1041129 . CTGT . . . NS=1;AN=0 GT:PS ./.:. -1 1041154 . TGA . . . NS=1;AN=0 GT:PS ./.:. -1 1041165 . AATAG . . . NS=1;AN=0 GT:PS ./.:. -1 1041173 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1041176 . CCTG . . . NS=1;AN=0 GT:PS ./.:. -1 1041261 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1041265 . AAAT . . . NS=1;AN=0 GT:PS ./.:. -1 1041451 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1041455 . AACTA . . . NS=1;AN=0 GT:PS ./.:. -1 1041462 . AAGTCAGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 1042001 . G . . 
NS=1;CGA_WINEND=1044000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.64:1.25:2:47:=:47:0.999:312 -1 1043612 . CAG . . . NS=1;AN=0 GT:PS ./.:. -1 1043673 . TCTCGGCACCGTTCACCACAGCCACCATGTCTCAGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 1043854 . CTCAGCAGCACCGTCCACCACAGCCACCATGTCTCGGCAGCACCGTCCAC . . . NS=1;AN=0 GT:PS ./.:. -1 1044001 . A . . NS=1;CGA_WINEND=1046000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.87:1.03:.:0:.:0:0.999:312 -1 1044008 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1044036 . TCTCGGCAGCACCGTTCACC . . . NS=1;AN=0 GT:PS ./.:. -1 1044098 . CTCAGCAGCACCGTCCACCA . . . NS=1;AN=0 GT:PS ./.:. -1 1044137 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1044143 . TT . . . NS=1;AN=0 GT:PS ./.:. -1 1044171 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1044242 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1044251 . GT . . . NS=1;AN=0 GT:PS ./.:. -1 1044256 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1044448 . ACCGTTCACCACAGCCACCATGTCTGCAGCAGCATTGTTCACCACAGCCAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 1046001 . A . . NS=1;CGA_WINEND=1048000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.92:1.02:2:50:=:50:0.999:312 -1 1046320 . CG . . . NS=1;AN=0 GT:PS ./.:. -1 1046746 . AAA . . . NS=1;AN=0 GT:PS ./.:. -1 1047077 . CAGTA . . . NS=1;AN=0 GT:PS ./.:. -1 1047094 . AGG AT . . NS=1;AN=2;AC=1;CGA_FI=54991|NM_017891.4|C1orf159|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1047094:VQLOW:21:21,21:4,4:0,17:-21,0,-21:0,0,-17:8:1,7:7 -1 1047106 . CC . . . NS=1;AN=0 GT:PS ./.:. -1 1047422 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1048001 . A . . NS=1;CGA_WINEND=1050000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.79:0.90:2:53:=:53:0.999:312 -1 1048572 . TCTCAAAAAAAAAAAAAAAACTTG . . . NS=1;AN=0 GT:PS ./.:. -1 1048862 . TCCGGGGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 1049294 . TTT . . . NS=1;AN=0 GT:PS ./.:. -1 1050001 . C . . NS=1;CGA_WINEND=1052000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.69:0.87:2:52:=:52:0.999:312 -1 1050391 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.134|rs149580770;CGA_FI=54991|NM_017891.4|C1orf159|INTRON|UNKNOWN-INC;CGA_RPT=L1PB1|L1|21.3 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:514:726,514:724,512:49,54:-726,0,-514:-49,0,-54:56:32,24:24 -1 1050892 . A . . END=1051143;NS=1;AN=0 GT:PS ./.:. -1 1051157 . GACGCCCCTTCCC . . . NS=1;AN=0 GT:PS ./.:. -1 1051187 . TCAGCCCGGGCGCCTTTCCCCCATAGGACCGCGGCCAGGCTCGTTGGGAGGCGGCGACGAGGACGCGGGCCCAGGCGCTGGCGGCTCCTCCG . . . NS=1;AN=0 GT:PS ./.:. -1 1051297 . AC . . . NS=1;AN=0 GT:PS ./.:. -1 1051303 . AGGGAA . . . NS=1;AN=0 GT:PS ./.:. -1 1051312 . C T . . NS=1;AN=2;AC=1;CGA_FI=54991|NM_017891.4|C1orf159|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1051312:VQLOW:21:21,25:20,24:0,24:-21,0,-25:0,0,-24:4:1,3:3 -1 1051319 . GCCGGGCCAGGGCCGCCGACCTTTGTCTGCCTCTCGCACTCCCTGCGCCGACCCGGCCGCCCAGACGGACCCCAGCGCCCCAACCCGCT . . . NS=1;AN=0 GT:PS ./.:. -1 1051417 . CCTG . . . NS=1;AN=0 GT:PS ./.:. -1 1051439 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1051444 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1051447 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1051450 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1051496 . GCC . . . NS=1;AN=0 GT:PS ./.:. -1 1051514 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1051526 . CCCGGCCCCGGCGCCA . . . NS=1;AN=0 GT:PS ./.:. -1 1051547 . CGCAGCTCCCAAAGAAAACTACAACTCCCGGCGGCCCGCGCGAGAGCCGCC . . . NS=1;AN=0 GT:PS ./.:. -1 1051617 . GCGCGCGGCCGTGGGTGGGGCGCCGGGGCGGGGCGCGAAGCGCCCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 1051748 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1051751 . G . . . NS=1;AN=0 GT:PS ./.:. 
-1 1051757 . CGTTGCCGGGAGACGGGGCGGGGCGT . . . NS=1;AN=0 GT:PS ./.:. -1 1051785 . TCGGGGTCTC . . . NS=1;AN=0 GT:PS ./.:. -1 1051927 . CC . . . NS=1;AN=0 GT:PS ./.:. -1 1051931 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1051968 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1052001 . G . . NS=1;CGA_WINEND=1054000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.81:0.83:2:52:=:52:0.999:312 -1 1052368 . CTTCAAT . . . NS=1;AN=0 GT:PS ./.:. -1 1052397 . CTCCTAGCTGGGCCAGCGCGCAGGGTGGGGGGGCGC . . . NS=1;AN=0 GT:PS ./.:. -1 1052511 . CCCCCCATATACCCCCAACCCCTCAGACCCCCCAACCCCCCAGACCCCCC . . . NS=1;AN=0 GT:PS ./.:. -1 1052574 . CCCAGA . . . NS=1;AN=0 GT:PS ./.:. -1 1052842 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1052845 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1054001 . G . . NS=1;CGA_WINEND=1056000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.79:0.95:2:53:=:53:0.999:312 -1 1054179 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1055427 . GA G . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.126|rs34088984&dbsnp.130|rs71576602;CGA_FI=54991|NM_017891.4|C1orf159|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluSz|Alu|16.2 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:VQLOW:22,.:17,.:0,.:-22,0,0:0,0,0:18:5,.:13 -1 1055654 . T C . . NS=1;AN=2;AC=1;CGA_FI=54991|NM_017891.4|C1orf159|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluJo|Alu|28.3 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:84:84,84:64,64:22,35:-84,0,-84:-22,0,-35:28:6,22:22 -1 1056001 . A . . NS=1;CGA_WINEND=1058000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.88:0.87:2:53:=:53:0.999:312 -1 1056050 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1056053 . GCG . . . NS=1;AN=0 GT:PS ./.:. -1 1056058 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1056874 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1056928 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1056938 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1057024 . CCCCATTGCCCCCTGGGATTGCCCCCCCTCCCCCACCCCACCCCATCTGGGACTCCTGCCCCTACAGCAGCCGCAGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 1057321 . CTCTGTGTGGGGGGCGTC . . . NS=1;AN=0 GT:PS ./.:. -1 1058001 . G . . NS=1;CGA_WINEND=1060000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.97:0.87:2:53:=:53:0.999:312 -1 1060001 . G . . NS=1;CGA_WINEND=1062000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.84:1.06:2:52:=:52:0.999:312 -1 1061395 . CCATAGA . . . NS=1;AN=0 GT:PS ./.:. -1 1061483 . C G . . NS=1;AN=2;AC=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:VQLOW:20:20,20:3,3:0,12:-20,0,-20:0,0,-12:14:2,12:12 -1 1061537 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1061691 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1061931 . TCCGCCCGCCTTGGGGGAGGCGGG . . . NS=1;AN=0 GT:PS ./.:. -1 1062001 . G . . NS=1;CGA_WINEND=1064000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.93:0.80:2:51:=:51:0.999:312 -1 1062171 . CCCTCCGCGA . . . NS=1;AN=0 GT:PS ./.:. -1 1062229 . CCGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 1062638 . C A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.119|rs9442373 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:87:87,87:78,78:27,37:-87,0,-87:-27,0,-37:25:7,18:18 -1 1063044 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs7545801 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:66:148,66:145,63:46,34:-148,0,-66:-46,0,-34:11:7,4:4 -1 1063241 . G T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.111|rs4970413 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:180:304,180:301,177:53,44:-304,0,-180:-53,0,-44:30:18,12:12 -1 1064001 . G . . NS=1;CGA_WINEND=1066000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.02:0.86:2:52:=:52:0.999:312 -1 1064535 . G C . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs6682475 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:144:171,144:171,144:48,47:-171,0,-144:-48,0,-47:17:9,8:8 -1 1064670 . C G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs7547403 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:53:53,395:53,395:34,53:-395,-53,0:-53,-34,0:35:35,35:0 -1 1064802 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.100|rs2298216 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:420:520,420:520,420:50,54:-520,0,-420:-50,0,-54:49:25,24:24 -1 1064979 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.100|rs2298217 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1064979:PASS:101:374,101:365,92:53,31:-374,0,-101:-53,0,-31:31:21,10:10 -1 1065296 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.108|rs4072537 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 0|1:1064979:PASS:308:308,310:308,310:53,53:-310,0,-308:-53,0,-53:34:18,16:18 -1 1065567 . AAAAAAAAAA . . . NS=1;AN=0 GT:PS ./.:. -1 1065592 . AAAAAC . . . NS=1;AN=0 GT:PS ./.:. -1 1065824 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1065842 . TCTCTGCTGTCACT . . . NS=1;AN=0 GT:PS ./.:. -1 1066001 . T . . NS=1;CGA_WINEND=1068000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.87:1.09:2:50:=:50:0.999:312 -1 1066259 . G C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.108|rs4072496;CGA_RPT=HAL1|L1|45.8 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1066259:PASS:264:264,407:263,407:43,54:-264,0,-407:-43,0,-54:40:17,23:23 -1 1066282 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs10907181 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1066259:PASS:236:284,236:287,236:52,50:-284,0,-236:-52,0,-50:29:16,13:13 -1 1066388 . C CT . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.126|rs34287831&dbsnp.134|rs140634183;CGA_RPT=HAL1|L1|43.0 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1066388:PASS:204:204,205:218,205:39,40:-204,0,-205:-39,0,-40:27:17,10:10 -1 1066403 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs10907182;CGA_RPT=HAL1|L1|43.0 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1066388:PASS:258:258,329:258,333:52,50:-258,0,-329:-52,0,-50:27:12,15:15 -1 1066816 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1066819 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs7513297;CGA_RPT=AluY|Alu|6.2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1066819:PASS:388:450,388:451,392:50,54:-450,0,-388:-50,0,-54:40:19,21:21 -1 1066828 . C G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs7553878;CGA_RPT=AluY|Alu|6.2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1066819:PASS:228:594,228:597,229:53,49:-594,0,-228:-53,0,-49:37:23,14:14 -1 1066946 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs7513404;CGA_RPT=AluY|Alu|6.2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1066946:PASS:335:335,551:335,549:47,54:-335,0,-551:-47,0,-54:45:18,27:27 -1 1066952 . AT GC . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs7513405&dbsnp.116|rs7516160&dbsnp.126|rs34955020;CGA_RPT=AluY|Alu|6.2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1066946:PASS:460:460,568:478,567:33,39:-460,0,-568:-33,0,-39:43:19,24:24 -1 1067596 . CAG C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.134|rs141257782 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:250:250,250:250,250:38,40:-250,0,-250:-38,0,-40:28:13,15:15 -1 1067674 . T TGG . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.126|rs35574593&dbsnp.131|rs79280895 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:160:160,162:160,162:39,39:-160,0,-162:-39,0,-39:21:10,11:11 -1 1067862 . T G . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.119|rs9442374 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1067862:PASS:287:287,360:323,356:53,54:-287,0,-360:-53,0,-54:33:15,18:18 -1 1067865 . A C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.119|rs9442358 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1067862:PASS:245:245,360:245,356:48,54:-245,0,-360:-48,0,-54:31:11,20:20 -1 1068001 . T . . NS=1;CGA_WINEND=1070000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.95:1.15:2:49:=:49:0.999:312 -1 1068441 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1068450 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1068459 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1068669 . GT G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.126|rs34990026&dbsnp.134|rs148642912&dbsnp.134|rs149201820 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:157:157,184:183,183:35,38:-157,0,-184:-35,0,-38:35:16,19:19 -1 1068720 . GGTG . . . NS=1;AN=0 GT:PS ./.:. -1 1068836 . GCCTGCCTGCCCGGCCTCCTCAGCAGATGCT . . . NS=1;AN=0 GT:PS ./.:. -1 1069425 . C A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.119|rs9442375;CGA_RPT=AluY|Alu|3.9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1069425:PASS:233:323,233:326,234:52,50:-323,0,-233:-52,0,-50:28:16,12:12 -1 1069443 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.119|rs9442376;CGA_RPT=AluY|Alu|3.9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1069425:PASS:224:377,224:379,223:53,49:-377,0,-224:-53,0,-49:30:18,12:12 -1 1069451 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.119|rs9442377;CGA_RPT=AluY|Alu|3.9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1069425:PASS:282:282,443:281,445:51,54:-282,0,-443:-51,0,-54:36:16,20:20 -1 1069475 . AAAAAAAG A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.129|rs57309011;CGA_RPT=AluY|Alu|3.9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1069425:PASS:146:146,292:150,292:34,40:-146,0,-292:-34,0,-40:25:11,14:14 -1 1070001 . A . . NS=1;CGA_WINEND=1072000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.99:0.86:2:52:=:52:0.999:312 -1 1070128 . T G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9442378 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:42:337,42:337,42:48,39:-337,-42,0:-48,-39,0:17:17,17:0 -1 1070202 . GGAGGGGGACAC . . . NS=1;AN=0 GT:PS ./.:. -1 1070441 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.119|rs9442379 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:169:340,169:335,165:52,49:-340,0,-169:-52,0,-49:26:17,9:9 -1 1071118 . G C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs10907183;CGA_RPT=MER41B|ERV1|22.4 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:423:423,423:423,423:45,54:-423,0,-423:-45,0,-54:50:25,25:25 -1 1071192 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs6604971;CGA_RPT=MER41B|ERV1|22.4 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:392:392,392:392,392:50,54:-392,0,-392:-50,0,-54:49:22,27:27 -1 1072001 . C . . NS=1;CGA_WINEND=1074000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.73:0.87:2:52:=:52:0.999:312 -1 1072409 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1072458 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1072498 . G C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.119|rs9442360 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:115:115,115:115,115:42,43:-115,0,-115:-42,0,-43:13:6,7:7 -1 1072536 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1072542 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1072613 . CGGG . . . NS=1;AN=0 GT:PS ./.:. -1 1072636 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1072645 . CA . . . NS=1;AN=0 GT:PS ./.:. -1 1072732 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1072744 . T . . . NS=1;AN=0 GT:PS ./.:. 
-1 1072764 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1072810 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1072814 . GAGAA . . . NS=1;AN=0 GT:PS ./.:. -1 1072830 . GCC . . . NS=1;AN=0 GT:PS ./.:. -1 1072847 . GCA . . . NS=1;AN=0 GT:PS ./.:. -1 1073099 . AGCCTGCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 1073949 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1074001 . A . . NS=1;CGA_WINEND=1076000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.64:0.99:.:0:.:0:0.999:312 -1 1074125 . ACAG . . . NS=1;AN=0 GT:PS ./.:. -1 1074606 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1074671 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1074715 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1074763 . GGGGACCTGGGTCCTGGGGAGCTTCCTGGGGTCAGAAGGTGGGGGTGTCAGCATCG . . . NS=1;AN=0 GT:PS ./.:. -1 1074824 . GGGGACCTGGGTCCTGGGGAGCTTCCTGGGGTCAGAAGGTGGGGGTGTCAGCATCG . . . NS=1;AN=0 GT:PS ./.:. -1 1074885 . GGGGACCTGGGTCCTGGGGAGCTTCCTGGGGTCAGAAGGTGGGGGTGTCAGCATCG . . . NS=1;AN=0 GT:PS ./.:. -1 1074946 . GGGGACCTGGGTCCTGGGGAGCTTCCTGGGGTCAGAAGGTGGGGGTGTCAGCATCG . . . NS=1;AN=0 GT:PS ./.:. -1 1075007 . GGGGACCTGGGTCCTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 1075030 . TCCTGGGGTCAGAAGGTGGGGGTGTCAGCATCGAACCGGGGGACCTGGGTCATGG . . . NS=1;AN=0 GT:PS ./.:. -1 1075130 . GGGGCCTGGGTCCT . . . NS=1;AN=0 GT:PS ./.:. -1 1075152 . TCCTGGGGTCAGAAGGTAGGGGTGTCAACGTCGAACCGGGGGACCT . . . NS=1;AN=0 GT:PS ./.:. -1 1075208 . GAGCTTCCTGGGTTCAGAAGGTGGGGGTGTCAGCATCGAACCGGGGGACCTGGGTC . . . NS=1;AN=0 GT:PS ./.:. -1 1075276 . CTGAGGTCAGAAGGTGGGGGTGTCAGCATCGAACCGGGGGACCTGGGTCCTGGGGAGCTTCCTGGGGTCAGAAGGTGGGGGTGTCAGCATCGAACCGGGGGACCTGGGTCCTGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 1075423 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1075434 . GGGGACCTGGGTCCTGGGGAGCTTCCTGGGGTCAGAAGGTGGGGGTGTCAGCATCG . . . NS=1;AN=0 GT:PS ./.:. -1 1075495 . GGGGACCTGGGTCCTGGGGAGCTTCCTGGGGTCA . . . NS=1;AN=0 GT:PS ./.:. -1 1075539 . GTGTCAGCATCGAACCGGGGGACCTGGGTCATGG . . . NS=1;AN=0 GT:PS ./.:. -1 1075585 . GGTCAGAAGGTGGGGGTGTCAACGTCGAACCGGGGGGCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 1075640 . TCCTGGGGTCAGAAGGTAGGGGTGTCAACGTCGAACCGGGGGACCT . . . NS=1;AN=0 GT:PS ./.:. -1 1075696 . GAGCTTCCTGGG . . . NS=1;AN=0 GT:PS ./.:. -1 1075725 . TCAACGTCGAACCGGGGGACCT . . . NS=1;AN=0 GT:PS ./.:. -1 1075757 . GAGCTTCCTGGGGTCAGAAGGTGGGGGTGTCAACGT . . . NS=1;AN=0 GT:PS ./.:. -1 1075801 . GGGACCTGGGTCCTGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 1075925 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs75969607 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:66:66,66:66,66:15,26:-66,0,-66:-15,0,-26:30:14,16:16 -1 1076001 . G . . NS=1;CGA_WINEND=1078000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.75:0.75:.:0:.:0:0.999:312 -1 1076491 . C . . END=1076943;NS=1;AN=0 GT:PS ./.:. -1 1077010 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1077064 . C A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.111|rs4970357 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1077064:PASS:102:102,133:108,130:41,45:-102,0,-133:-41,0,-45:14:7,7:7 -1 1077100 . AGGCCG . . . NS=1;AN=0 GT:PS ./.:. -1 1077432 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1077435 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1077962 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.129|rs55750860 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:210:210,210:209,209:45,48:-210,0,-210:-45,0,-48:30:13,17:17 -1 1078001 . T . . NS=1;CGA_WINEND=1080000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.07:0.91:.:0:.:0:0.999:312 -1 1078279 . GACACCCTTGTGTCTTCGGAAAATGCCAGGTCCCCCCCCAGCT . . . NS=1;AN=0 GT:PS ./.:. -1 1080001 . C . . 
NS=1;CGA_WINEND=1082000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.24:0.78:.:0:.:0:0.999:312 -1 1080286 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.88|rs1539638 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:70:306,70:323,63:52,34:-306,0,-70:-52,0,-34:26:17,9:9 -1 1080511 . CAACCAC . . . NS=1;AN=0 GT:PS ./.:. -1 1080927 . T TCTGACCTCATGGCCGACCCCAC . . NS=1;AN=2;AC=1;CGA_RPT=(ACCTG)n|Simple_repeat|35.1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:238:238,411:244,408:39,41:-238,0,-411:-39,0,-41:39:23,16:16 -1 1081648 . CAGCCCCTCCA . . . NS=1;AN=0 GT:PS ./.:. -1 1081975 . GCCCAAC . . . NS=1;AN=0 GT:PS ./.:. -1 1082001 . G . . NS=1;CGA_WINEND=1084000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.18:0.97:.:0:.:0:0.999:312 -1 1083613 . CAGCA . . . NS=1;AN=0 GT:PS ./.:. -1 1084001 . A . . NS=1;CGA_WINEND=1086000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.20:0.96:.:0:.:0:0.999:312 -1 1085043 . GCGC . . . NS=1;AN=0 GT:PS ./.:. -1 1086001 . T . . NS=1;CGA_WINEND=1088000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.02:0.99:.:0:.:0:0.999:312 -1 1086293 . TGGCG . . . NS=1;AN=0 GT:PS ./.:. -1 1086305 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1086313 . TCTAG . . . NS=1;AN=0 GT:PS ./.:. -1 1086320 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1086325 . GTGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 1086338 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1086678 . GT . . . NS=1;AN=0 GT:PS ./.:. -1 1086896 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1087683 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.119|rs9442380;CGA_RPT=L1MC3|L1|39.8 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:604:604,604:603,603:48,51:-604,0,-604:-48,0,-51:62:30,32:32 -1 1088001 . C . . NS=1;CGA_WINEND=1090000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.29:0.88:.:0:.:0:0.999:312 -1 1089167 . A AT . . NS=1;AN=2;AC=1;CGA_RPT=AluJb|Alu|41.8 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:VQLOW:31:31,31:2,2:0,5:-31,0,-31:0,0,-5:31:5,26:26 -1 1089262 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4970358;CGA_RPT=AluJb|Alu|41.8 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:101:887,101:887,101:53,50:-887,-101,0:-53,-50,0:39:39,39:0 -1 1090001 . T . . NS=1;CGA_WINEND=1092000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.81:0.86:2:50:=:50:0.999:312 -1 1090010 . C A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.119|rs9442361;CGA_RPT=AluSp|Alu|10.7 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:192:192,192:192,192:49,50:-192,0,-192:-49,0,-50:21:9,12:12 -1 1090038 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1090055 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1090170 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1090577 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs6604972;CGA_RPT=L1MEg|L1|42.3 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:64:64,640:64,640:48,52:-640,-64,0:-52,-48,0:26:26,26:0 -1 1092001 . G . . NS=1;CGA_WINEND=1094000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.86:0.73:2:48:=:48:0.999:312 -1 1092153 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1092157 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1092269 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1092297 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1092300 . GC . . . NS=1;AN=0 GT:PS ./.:. -1 1092331 . AGGTCGCACAGCAGGACCAGGACCCAGGACCTCGGGCTGGGGACAGAGTGACCTTC . . . NS=1;AN=0 GT:PS ./.:. -1 1092396 . CCG . . . NS=1;AN=0 GT:PS ./.:. -1 1092411 . CTTGCAGTGGCCAACAGGTGCCTGGGGTCTT . . . NS=1;AN=0 GT:PS ./.:. -1 1092459 . CCGGGACAAAGT . . . NS=1;AN=0 GT:PS ./.:. -1 1092474 . 
AGAAAAGGTAGCTAGCTGGAAGAGGGTGCAGGAGGCCCCCCGCTCTGTGCAGCATTAAATCATGGTGGGGTCACCTGCCTTGTCTGGCAGCATGGT . . . NS=1;AN=0 GT:PS ./.:. -1 1092577 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1092599 . G C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.129|rs56863140 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1092599:VQLOW:39:39,59:35,55:13,33:-39,0,-59:-13,0,-33:8:2,6:6 -1 1092613 . A . . END=1092863;NS=1;AN=0 GT:PS ./.:. -1 1092982 . CGCT . . . NS=1;AN=0 GT:PS ./.:. -1 1092989 . C . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:1092989:VQLOW:29,.:27,.:0,.:0:0 -1 1092991 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1092994 . CGAGGAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 1093009 . GCTGGGAGGGGCCTCCCTCCGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 1094001 . T . . NS=1;CGA_WINEND=1096000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.03:0.89:2:51:=:51:0.999:312 -1 1094157 . CC . . . NS=1;AN=0 GT:PS ./.:. -1 1094190 . C G . . NS=1;AN=2;AC=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1094190:VQLOW:25:25,25:0,0:0,11:-25,0,-25:0,0,-11:13:2,11:11 -1 1094199 . TG . . . NS=1;AN=0 GT:PS ./.:. -1 1094485 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.111|rs4970360;CGA_RPT=AluSx|Alu|12.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:90:382,90:372,80:52,37:-382,0,-90:-52,0,-37:22:16,6:6 -1 1094672 . A C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.111|rs4970361;CGA_RPT=AluSx|Alu|12.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:190:190,190:184,184:48,50:-190,0,-190:-48,0,-50:29:10,19:19 -1 1094738 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.111|rs4970362 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:442:442,442:441,441:46,54:-442,0,-442:-46,0,-54:51:23,28:28 -1 1094979 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs7538773 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:163:339,163:335,159:52,48:-339,0,-163:-52,0,-48:29:17,12:12 -1 1095383 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs6604973;CGA_RPT=AluJb|Alu|18.4 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:203:426,203:422,199:53,47:-426,0,-203:-53,0,-47:34:21,13:13 -1 1095619 . C G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.111|rs4970419 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:186:263,186:263,185:52,50:-263,0,-186:-52,0,-50:27:14,13:13 -1 1096001 . C . . NS=1;CGA_WINEND=1098000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.18:0.97:2:53:=:53:0.999:312 -1 1096011 . C G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.119|rs9442381 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:70:70,70:57,57:20,33:-70,0,-70:-20,0,-33:24:6,18:18 -1 1096198 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.119|rs9442382 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:216:349,216:348,214:52,50:-349,0,-216:-52,0,-50:28:16,12:12 -1 1096908 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.88|rs1539636 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:145:145,145:117,117:39,43:-145,0,-145:-39,0,-43:26:12,14:14 -1 1096928 . G GT . . NS=1;AN=2;AC=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:VQLOW:28:28,145:2,117:0,27:-28,0,-145:0,0,-27:34:3,29:29 -1 1097092 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.88|rs1539635 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1097092:PASS:132:184,132:183,136:48,46:-184,0,-132:-48,0,-46:18:10,8:8 -1 1097100 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.88|rs1539634 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1097092:PASS:150:272,150:273,151:52,47:-272,0,-150:-52,0,-47:26:15,11:11 -1 1097287 . T C . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.119|rs9442384 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1097287:PASS:56:378,56:359,37:52,27:-378,0,-56:-52,0,-27:22:18,4:4 -1 1097301 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1097335 . T G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.119|rs9442385 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:192:192,192:185,185:42,45:-192,0,-192:-42,0,-45:39:14,25:25 -1 1097407 . CCCCA C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.134|rs145121017 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:286:358,286:358,286:40,40:-358,0,-286:-40,0,-40:33:17,16:16 -1 1097618 . CGGAGCCTCTTGCCCATTGGGGTGGGACTGGGG . . . NS=1;AN=0 GT:PS ./.:. -1 1097937 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.92|rs1891907;CGA_FI=406984|NR_029639.1|MIR200B|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:48:421,48:421,48:48,42:-421,-48,0:-48,-42,0:17:17,17:0 -1 1098001 . C . . NS=1;CGA_WINEND=1100000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.11:0.87:2:53:=:53:0.999:312 -1 1098088 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1098421 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs12135382;CGA_FI=406983|NR_029834.1|MIR200A|TSS-UPSTREAM|UNKNOWN-INC&406984|NR_029639.1|MIR200B|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:125:125,125:124,124:43,44:-125,0,-125:-43,0,-44:19:8,11:11 -1 1098610 . GG . . . NS=1;AN=0 GT:PS ./.:. -1 1098618 . CA . . . NS=1;AN=0 GT:PS ./.:. -1 1098645 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1098714 . C G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.111|rs4379629;CGA_FI=406983|NR_029834.1|MIR200A|TSS-UPSTREAM|UNKNOWN-INC&406984|NR_029639.1|MIR200B|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:51:127,51:126,50:44,31:-127,0,-51:-44,0,-31:12:7,5:5 -1 1098820 . TC T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.114|rs5772039&dbsnp.130|rs71578305&dbsnp.134|rs148367913;CGA_FI=406983|NR_029834.1|MIR200A|TSS-UPSTREAM|UNKNOWN-INC&406984|NR_029639.1|MIR200B|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:88:88,124:111,123:31,38:-88,0,-124:-31,0,-38:27:16,11:11 -1 1099020 . CTTCTAG . . . NS=1;AN=0 GT:PS ./.:. -1 1099342 . A C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.119|rs9660710;CGA_FI=406983|NR_029834.1|MIR200A|TSS-UPSTREAM|UNKNOWN-INC&406984|NR_029639.1|MIR200B|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:332:332,332:332,332:39,54:-332,0,-332:-39,0,-54:52:25,27:27 -1 1100001 . A . . NS=1;CGA_WINEND=1102000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.08:0.94:2:53:=:53:0.999:315 -1 1100217 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.92|rs1891905;CGA_FI=406983|NR_029834.1|MIR200A|TSS-UPSTREAM|UNKNOWN-INC&406984|NR_029639.1|MIR200B|TSS-UPSTREAM|UNKNOWN-INC&554210|NR_029957.1|MIR429|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:475:619,475:618,474:49,54:-619,0,-475:-49,0,-54:53:29,24:24 -1 1100319 . C A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.92|rs1891904;CGA_FI=406983|NR_029834.1|MIR200A|TSS-UPSTREAM|UNKNOWN-INC&406984|NR_029639.1|MIR200B|TSS-UPSTREAM|UNKNOWN-INC&554210|NR_029957.1|MIR429|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:144:144,144:143,143:44,47:-144,0,-144:-44,0,-47:23:10,13:13 -1 1101003 . C T . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs7549819;CGA_FI=406983|NR_029834.1|MIR200A|TSS-UPSTREAM|UNKNOWN-INC&406984|NR_029639.1|MIR200B|TSS-UPSTREAM|UNKNOWN-INC&554210|NR_029957.1|MIR429|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:50:182,50:176,44:48,29:-182,0,-50:-48,0,-29:16:11,5:5 -1 1101393 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1101397 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1101689 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1101858 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1101899 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1101910 . CACCCCCACCCCCAC . . . NS=1;AN=0 GT:PS ./.:. -1 1101949 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1102001 . G . . NS=1;CGA_WINEND=1104000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.04:0.91:2:53:=:53:0.999:315 -1 1102069 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9442386;CGA_FI=406983|NR_029834.1|MIR200A|TSS-UPSTREAM|UNKNOWN-INC&406984|NR_029639.1|MIR200B|TSS-UPSTREAM|UNKNOWN-INC&554210|NR_029957.1|MIR429|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:55:530,55:530,55:52,45:-530,-55,0:-52,-45,0:26:26,26:0 -1 1102116 . CAGA . . . NS=1;AN=0 GT:PS ./.:. -1 1102162 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1102306 . TGCCCACCCCAGGACCCAAAGCTGGTGGCTGCTGG . . . NS=1;AN=0 GT:PS ./.:. -1 1102573 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1102593 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1102841 . GCCCACAGCGCCTGGGCGGGA . . . NS=1;AN=0 GT:PS ./.:. -1 1103053 . TCAGCAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 1103068 . T . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:1103068:VQLOW:38,.:16,.:0,.:0:0 -1 1103071 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1103387 . CTGGGGCGGAGGGCCGAGCGGGGCCAGCAGACGGGTGAGGGCGGAGGGCCGAGCGGGGCCAGCAGACGGGTGAGGGCGGAGGGCCGAGCGGGGCCAGCAGACGGGTGAGGGCGGAGGGCTGAGCGGGCGGCAGAGG . . . NS=1;AN=0 GT:PS ./.:. -1 1103534 . CTCCGAAGTCCAGCCCCCAGGGGAGGGGCCGGCCTCTG . . . NS=1;AN=0 GT:PS ./.:. -1 1103690 . T TCA . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.126|rs34866926&dbsnp.126|rs36015232&dbsnp.131|rs77069227;CGA_FI=554210|NR_029957.1|MIR429|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:128:226,128:222,124:39,36:-226,0,-128:-39,0,-36:23:15,8:8 -1 1103717 . TTTAGGGTGAC . . . NS=1;AN=0 GT:PS ./.:. -1 1103812 . CC . . . NS=1;AN=0 GT:PS ./.:. -1 1103958 . T G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs7521584;CGA_FI=554210|NR_029957.1|MIR429|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:325:325,325:325,325:53,54:-325,0,-325:-53,0,-54:34:16,18:18 -1 1104001 . T . . NS=1;CGA_WINEND=1106000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.04:0.86:2:52:=:52:0.999:315 -1 1104117 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.132|rs117987012;CGA_FI=554210|NR_029957.1|MIR429|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:273:273,273:272,272:50,52:-273,0,-273:-50,0,-52:30:14,16:16 -1 1105002 . C T . . NS=1;AN=2;AC=1;CGA_FI=254173|NM_001130045.1|TTLL10|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:181:195,181:195,181:49,50:-195,0,-181:-49,0,-50:23:11,12:12 -1 1105773 . GC . . . NS=1;AN=0 GT:PS ./.:. -1 1105788 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1105814 . CGG . . . NS=1;AN=0 GT:PS ./.:. -1 1105821 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1106001 . G . . NS=1;CGA_WINEND=1108000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.98:0.91:2:53:=:53:0.999:315 -1 1106061 . G A . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs6656236;CGA_FI=254173|NM_001130045.1|TTLL10|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1106061:PASS:148:398,148:393,143:52,47:-398,0,-148:-52,0,-47:29:18,11:11 -1 1106473 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.111|rs4970420;CGA_FI=254173|NM_001130045.1|TTLL10|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 0|1:1106061:PASS:331:331,331:328,328:54,46:-331,0,-331:-46,0,-54:45:27,18:27 -1 1106784 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4442317;CGA_FI=254173|NM_001130045.1|TTLL10|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:39:377,39:377,39:48,36:-377,-39,0:-48,-36,0:18:18,18:0 -1 1106950 . GAACCCCC . . . NS=1;AN=0 GT:PS ./.:. -1 1107294 . G GC . . NS=1;AN=2;AC=1;CGA_FI=254173|NM_001130045.1|TTLL10|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1107294:PASS:181:181,293:181,291:36,39:-181,0,-293:-36,0,-39:33:12,21:21 -1 1107303 . ACACA GCCCG . . NS=1;AN=2;AC=1;CGA_FI=254173|NM_001130045.1|TTLL10|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1107294:PASS:394:394,428:409,428:35,37:-394,0,-428:-35,0,-37:31:14,17:17 -1 1107879 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1107990 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1107994 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1108001 . G . . NS=1;CGA_WINEND=1110000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.88:0.99:2:51:=:51:0.999:315 -1 1108015 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1108044 . AG . . . NS=1;AN=0 GT:PS ./.:. -1 1108117 . AATAAATTAATAATAAAATAATTATAATAATATAGAAAATAATATATAAAATAATAAATATATAAAATATAAAAC . . . NS=1;AN=0 GT:PS ./.:. -1 1108203 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1108207 . AATAA . . . NS=1;AN=0 GT:PS ./.:. -1 1108218 . TATTAAAAATACA . . . NS=1;AN=0 GT:PS ./.:. -1 1108277 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.120|rs12136529;CGA_FI=254173|NM_001130045.1|TTLL10|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluSx1|Alu|13.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|1:1108277:PASS:21:21,227:21,227:22,48:-227,-21,0:-48,-22,0:12:12,12:0 -1 1108286 . AG . . . NS=1;AN=0 GT:PS ./.:. -1 1108315 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1108368 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1109154 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.120|rs12041521;CGA_FI=100506376|XM_003118493.1|LOC100506376|UTR3|UNKNOWN-INC&254173|NM_001130045.1|TTLL10|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:53:53,394:53,394:44,52:-394,-53,0:-52,-44,0:22:22,22:0 -1 1109252 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.113|rs5010607;CGA_FI=100506376|XM_003118493.1|LOC100506376|UTR3|UNKNOWN-INC&254173|NM_001130045.1|TTLL10|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:108:108,900:108,900:52,50:-900,-108,0:-52,-50,0:41:41,41:0 -1 1109437 . G A . . NS=1;AN=2;AC=1;CGA_FI=100506376|XM_003118493.1|LOC100506376|UTR3|UNKNOWN-INC&254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:319:461,319:460,318:50,53:-461,0,-319:-50,0,-53:49:26,23:23 -1 1109476 . C A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.120|rs12141369;CGA_FI=100506376|XM_003118493.1|LOC100506376|UTR3|UNKNOWN-INC&254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:67:455,67:455,67:52,48:-455,-67,0:-52,-48,0:24:24,24:0 -1 1109782 . G A . . 
NS=1;AN=2;AC=2;CGA_XR=dbsnp.130|rs72894004;CGA_FI=100506376|XM_003118493.1|LOC100506376|UTR3|UNKNOWN-INC&254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:92:793,92:793,92:53,47:-793,-92,0:-53,-47,0:34:34,34:0 -1 1110001 . C . . NS=1;CGA_WINEND=1112000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.78:1.20:2:48:=:48:0.999:315 -1 1110374 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1110377 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1110586 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9442387;CGA_FI=100506376|XM_003118493.1|LOC100506376|UTR3|UNKNOWN-INC&254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC&254173|NM_153254.2|TTLL10|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:102:829,102:829,102:50,50:-829,-102,0:-50,-50,0:44:44,44:0 -1 1111114 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1111116 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1111130 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1111173 . GCTGTGCAGGTGGAGAGAGGCTGCCACCGTGCAGGTGGAGAGAGGCTGCCGCTGTGCCGGTGGAGAGGCTGCTGCCGTGCAGGTGGAGAGAGGCTGCCGCTGTGCCGGTGGAGAGGCTGCTGCTCCCAGCCGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 1111455 . CTGCCGCTGTGCAGGTGGAGAGACTGCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 1111586 . CTGCTGC . . . NS=1;AN=0 GT:PS ./.:. -1 1111621 . CTGACGCTGTGCAGGTGGAGAGAGGCTGACGCTGTGCAGGTGGAGAGGCTGCTGCTCCCAGCCGC . . . NS=1;AN=0 GT:PS ./.:. -1 1111955 . AATTCCCGGCT . . . NS=1;AN=0 GT:PS ./.:. -1 1112001 . C . . NS=1;CGA_WINEND=1114000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.94:0.85:2:52:=:52:0.999:315 -1 1112309 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.119|rs9724957;CGA_FI=100506376|XM_003118493.1|LOC100506376|INTRON|UNKNOWN-INC&254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC&254173|NM_153254.2|TTLL10|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:294:479,294:477,293:50,52:-479,0,-294:-50,0,-52:43:24,19:19 -1 1112405 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1112408 . CTG . . . NS=1;AN=0 GT:PS ./.:. -1 1112422 . GTCTGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 1112434 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1112699 . T TC . . NS=1;AN=2;AC=1;CGA_FI=100506376|XM_003118493.1|LOC100506376|INTRON|UNKNOWN-INC&254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC&254173|NM_153254.2|TTLL10|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluY|Alu|8.8 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:93:93,262:94,261:26,37:-93,0,-262:-26,0,-37:37:17,20:20 -1 1112982 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs6671609;CGA_FI=100506376|XM_003118493.1|LOC100506376|INTRON|UNKNOWN-INC&254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC&254173|NM_153254.2|TTLL10|TSS-UPSTREAM|UNKNOWN-INC;CGA_RPT=AluY|Alu|8.8 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:110:353,110:346,103:52,41:-353,0,-110:-52,0,-41:23:16,7:7 -1 1113087 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1113091 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1113094 . C T . . NS=1;AN=2;AC=1;CGA_FI=100506376|XM_003118493.1|LOC100506376|INTRON|UNKNOWN-INC&254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC&254173|NM_153254.2|TTLL10|TSS-UPSTREAM|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1113094:VQLOW:22:22,22:0,0:0,11:-22,0,-22:0,0,-11:25:1,24:24 -1 1114001 . G . . NS=1;CGA_WINEND=1116000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.14:0.87:2:53:=:53:0.999:315 -1 1114297 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1115210 . G A . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs11260546;CGA_FI=100506376|XM_003118493.1|LOC100506376|TSS-UPSTREAM|UNKNOWN-INC&254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC&254173|NM_153254.2|TTLL10|UTR5|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1115210:PASS:208:411,208:411,220:53,49:-411,0,-208:-53,0,-49:30:17,13:13 -1 1115213 . G T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs11260547;CGA_FI=100506376|XM_003118493.1|LOC100506376|TSS-UPSTREAM|UNKNOWN-INC&254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC&254173|NM_153254.2|TTLL10|UTR5|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1115210:PASS:224:379,224:378,240:53,50:-379,0,-224:-53,0,-50:31:17,14:14 -1 1115427 . GAGGCCA . . . NS=1;AN=0 GT:PS ./.:. -1 1115776 . G T . . NS=1;AN=2;AC=1;CGA_FI=100506376|XM_003118493.1|LOC100506376|TSS-UPSTREAM|UNKNOWN-INC&254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC&254173|NM_153254.2|TTLL10|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:175:216,175:215,175:52,49:-216,0,-175:-52,0,-49:23:12,11:11 -1 1115908 . CAGCCGC . . . NS=1;AN=0 GT:PS ./.:. -1 1115922 . ACTA . . . NS=1;AN=0 GT:PS ./.:. -1 1115976 . GGGAAGGTAGC . . . NS=1;AN=0 GT:PS ./.:. -1 1115994 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1116001 . G . . NS=1;CGA_WINEND=1118000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.91:0.99:2:53:=:53:0.999:315 -1 1116004 . GCGGGCAGCCTG . . . NS=1;AN=0 GT:PS ./.:. -1 1116018 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1116021 . GCCTCCTGGGCCGGAGCACAGGGCAGGTTGGAGGGGTGGGGACCGAGGTCCTGCGCTCCCTCCACACGAGCCCTGGCCTCTGACCTCCAGGAGAGCAG . . . NS=1;AN=0 GT:PS ./.:. -1 1116123 . TGTACC . . . NS=1;AN=0 GT:PS ./.:. -1 1116136 . CAAC . . . NS=1;AN=0 GT:PS ./.:. -1 1116142 . CA . . . NS=1;AN=0 GT:PS ./.:. -1 1116183 . GGG . . . NS=1;AN=0 GT:PS ./.:. -1 1116195 . GGGCCATG . . . NS=1;AN=0 GT:PS ./.:. -1 1116208 . GG . . . NS=1;AN=0 GT:PS ./.:. -1 1116223 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1116284 . TACGCCTGCCCCTGCCCCTGCCCCTGCACCCGCCCCACCCCTGCCCCTGCGCCCGCCCCTGCC . . . NS=1;AN=0 GT:PS ./.:. -1 1116370 . AGGCTCCCAGGCTGGCTCCAGCCCCTGCC . . . NS=1;AN=0 GT:PS ./.:. -1 1116553 . C G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.130|rs72631898;CGA_FI=100506376|XM_003118493.1|LOC100506376|TSS-UPSTREAM|UNKNOWN-INC&254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC&254173|NM_153254.2|TTLL10|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:315:383,315:382,315:53,53:-383,0,-315:-53,0,-53:37:19,18:18 -1 1116601 . T C . . NS=1;AN=2;AC=1;CGA_FI=100506376|XM_003118493.1|LOC100506376|TSS-UPSTREAM|UNKNOWN-INC&254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC&254173|NM_153254.2|TTLL10|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:VQLOW:22:22,22:0,0:0,11:-22,0,-22:0,0,-11:29:4,25:25 -1 1116683 . G A . . NS=1;AN=2;AC=1;CGA_FI=100506376|XM_003118493.1|LOC100506376|TSS-UPSTREAM|UNKNOWN-INC&254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC&254173|NM_153254.2|TTLL10|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:126:325,126:322,124:52,44:-325,0,-126:-52,0,-44:22:13,9:9 -1 1117398 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.120|rs12097586;CGA_FI=100506376|XM_003118493.1|LOC100506376|TSS-UPSTREAM|UNKNOWN-INC&254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC&254173|NM_153254.2|TTLL10|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:80:80,670:80,670:48,52:-670,-80,0:-52,-48,0:28:28,28:0 -1 1117486 . A G . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.121|rs13376670;CGA_FI=100506376|XM_003118493.1|LOC100506376|TSS-UPSTREAM|UNKNOWN-INC&254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC&254173|NM_153254.2|TTLL10|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:370:449,370:449,370:50,54:-449,0,-370:-50,0,-54:41:21,20:20 -1 1118001 . A . . NS=1;CGA_WINEND=1120000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.15:0.93:2:53:=:53:0.999:315 -1 1118212 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs10907171;CGA_FI=100506376|XM_003118493.1|LOC100506376|TSS-UPSTREAM|UNKNOWN-INC&254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC&254173|NM_153254.2|TTLL10|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:142:142,142:142,142:44,47:-142,0,-142:-44,0,-47:22:11,11:11 -1 1119657 . G C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.111|rs4560982;CGA_FI=100506376|XM_003118493.1|LOC100506376|TSS-UPSTREAM|UNKNOWN-INC&254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC&254173|NM_153254.2|TTLL10|INTRON|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:52:52,469:52,469:44,52:-469,-52,0:-52,-44,0:28:28,28:0 -1 1120001 . T . . NS=1;CGA_WINEND=1122000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.06:0.97:.:0:.:0:0.999:315 -1 1120032 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1120035 . G . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:1120035:PASS:69,.:43,.:0,.:0:0 -1 1120069 . C . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:1120035:VQLOW:36,.:29,.:0,.:0:0 -1 1120173 . GGGACCCCCGTGAGGACAGGCCCTCCGGAC . . . NS=1;AN=0 GT:PS ./.:. -1 1120307 . G . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:1120035:PASS:83,.:79,.:0,.:0:0 -1 1121014 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.107|rs3813204;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC&254173|NM_153254.2|TTLL10|UTR3|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:344:344,344:344,344:47,54:-344,0,-344:-47,0,-54:40:20,20:20 -1 1121341 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.111|rs4297230;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|56.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1121341:PASS:340:340,417:343,416:53,54:-340,0,-417:-53,0,-54:35:17,18:18 -1 1121358 . A C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.131|rs80057011;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|56.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 0|1:1121341:PASS:345:345,521:343,520:54,53:-521,0,-345:-53,0,-54:39:17,22:17 -1 1121472 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1121480 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs12063663;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=AluSx1|Alu|9.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1121480:PASS:162:259,162:241,143:47,40:-259,0,-162:-47,0,-40:32:24,8:8 -1 1121625 . C G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.108|rs4081334;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=AluSx1|Alu|9.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:137:137,251:136,250:43,50:-137,0,-251:-43,0,-50:22:9,13:13 -1 1121657 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.120|rs11260548;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=AluSx1|Alu|9.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:VQLOW:19:301,19:301,43:48,39:-301,-19,0:-48,-39,0:18:17,17:1 -1 1121715 . C T . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.108|rs4081333;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=AluSx1|Alu|9.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:46:46,46:32,32:10,26:-46,0,-46:-10,0,-26:16:3,13:13 -1 1121794 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs11260549;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|44.0 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:300:308,300:308,300:53,53:-308,0,-300:-53,0,-53:34:17,17:17 -1 1121835 . C CTG . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.129|rs57346441;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|44.0 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:61:654,61:654,61:39,33:-654,-61,0:-39,-33,0:36:36,36:0 -1 1122001 . G . . NS=1;CGA_WINEND=1124000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.75:0.86:2:52:=:52:0.999:315 -1 1122021 . CACTTGTTATTTTCCTTTCTTTATTAAAGAATCACCATCTG . . . NS=1;AN=0 GT:PS ./.:. -1 1122196 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.111|rs4634847;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|44.0 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:203:203,203:193,193:43,46:-203,0,-203:-43,0,-46:39:12,27:27 -1 1122230 . GT G . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.126|rs35158481;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=AluY|Alu|4.8 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:51,.:42,.:12,.:-51,0,0:-12,0,0:15:9,.:1 -1 1122283 . T G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs12064046;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=AluY|Alu|4.8 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:217:292,217:292,217:52,48:-292,0,-217:-52,0,-48:31:17,14:14 -1 1122319 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs7415847;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=AluY|Alu|4.8 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:86:86,86:86,86:36,38:-86,0,-86:-36,0,-38:18:9,9:9 -1 1122388 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs11260550;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=AluY|Alu|4.8 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1122388:PASS:235:461,235:457,235:53,50:-461,0,-235:-53,0,-50:30:19,11:11 -1 1122395 . TC TTT . . NS=1;AN=2;AC=1;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=AluY|Alu|4.8 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1122388:PASS:332:349,332:345,346:34,35:-349,0,-332:-34,0,-35:31:19,12:12 -1 1122468 . TG CG . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.116|rs7545694&dbsnp.131|rs80083461;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=AluY|Alu|4.8 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:1122468:PASS:81,.:81,.:34,.:-81,0,0:-34,0,0:8:8,.:0 -1 1122473 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1122478 . GGCCT . . . NS=1;AN=0 GT:PS ./.:. -1 1122485 . CA . . . NS=1;AN=0 GT:PS ./.:. -1 1122516 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs11260551;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=AluY|Alu|4.8 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1122516:PASS:89:157,89:157,90:48,39:-157,0,-89:-48,0,-39:13:8,5:5 -1 1122539 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs12063897;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|34.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 0|1:1122516:PASS:198:308,198:308,199:53,44:-198,0,-308:-44,0,-53:30:16,14:16 -1 1122771 . A G . . 
NS=1;AN=2;AC=1;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=AluSx1|Alu|12.3 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:VQLOW:22:22,22:16,16:0,21:-22,0,-22:0,0,-21:9:2,7:7 -1 1122844 . TACTAAAAATATG . . . NS=1;AN=0 GT:PS ./.:. -1 1122915 . AA GA,GG . . NS=1;AN=2;AC=1,1;CGA_XR=dbsnp.125|rs28595293,dbsnp.125|rs28460227&dbsnp.125|rs28595293;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC,254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=AluSx1|Alu|12.3 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|2:1122915:PASS:123:126,123:126,124:25,25:-126,-123,-123,-126,0,-126:-25,-25,-25,-25,0,-25:13:6,7:0 -1 1122937 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.125|rs28648687;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=AluSx1|Alu|12.3 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 0|1:1122915:PASS:139:191,139:191,139:50,45:-139,0,-191:-45,0,-50:19:11,8:11 -1 1123106 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs12401472;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=AluSx|Alu|12.1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:184:186,184:186,184:48,50:-186,0,-184:-48,0,-50:17:8,9:9 -1 1123284 . CA C . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.126|rs35247267;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=AluSx|Alu|12.1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/.:.:PASS:53,.:53,.:16,.:-53,0,0:-16,0,0:8:6,.:2 -1 1123434 . T A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs12066716;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|34.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:264:264,264:263,263:49,51:-264,0,-264:-49,0,-51:31:14,17:17 -1 1123785 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1124001 . T . . NS=1;CGA_WINEND=1126000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.87:0.92:2:53:=:53:0.999:315 -1 1124257 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs10907172;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|34.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:219:219,219:217,217:45,48:-219,0,-219:-45,0,-48:31:13,18:18 -1 1124399 . C G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs10907173;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|34.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:166:166,166:160,160:46,48:-166,0,-166:-46,0,-48:27:9,18:18 -1 1124663 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs6684820;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|34.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:76:76,823:76,823:41,53:-823,-76,0:-53,-41,0:35:35,35:0 -1 1124750 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.116|rs6702156;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|34.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:103:103,1038:103,1038:51,50:-1038,-103,0:-51,-50,0:41:41,41:0 -1 1124819 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.116|rs6694487;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|34.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:340:473,340:471,339:50,54:-473,0,-340:-50,0,-54:42:24,18:18 -1 1124891 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.129|rs61768485;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|34.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:260:260,260:259,259:52,50:-260,0,-260:-52,0,-50:29:13,16:16 -1 1125110 . T C . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs12124436;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|34.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1125110:VQLOW:24:24,71:28,70:6,36:-24,0,-71:-6,0,-36:10:4,6:6 -1 1125119 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1125122 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1125220 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs12065129;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=AluY|Alu|5.7 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:215:315,215:313,213:53,48:-315,0,-215:-53,0,-48:30:17,13:13 -1 1125348 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs12029885;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=AluY|Alu|5.7 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:186:186,213:198,213:50,50:-186,0,-213:-50,0,-50:22:11,11:11 -1 1125488 . TCTCAAAAAAAAAAAAAAAAGAAT . . . NS=1;AN=0 GT:PS ./.:. -1 1125553 . A T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.120|rs10907174;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|34.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:80:80,788:80,788:43,53:-788,-80,0:-53,-43,0:33:33,33:0 -1 1125811 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1126001 . G . . NS=1;CGA_WINEND=1128000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.93:0.88:2:53:=:53:0.999:315 -1 1126236 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.111|rs4578157;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=AluSq2|Alu|13.2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1126236:PASS:414:414,540:414,537:50,54:-414,0,-540:-50,0,-54:40:17,23:23 -1 1126255 . G C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.111|rs4449971;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=AluSq2|Alu|13.2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1126236:PASS:361:361,540:360,537:53,54:-361,0,-540:-53,0,-54:38:17,21:21 -1 1126455 . CTGCCAA . . . NS=1;AN=0 GT:PS ./.:. -1 1126723 . GCTACTCA . . . NS=1;AN=0 GT:PS ./.:. -1 1126968 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1126994 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.96|rs2094830;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=MER9a1|ERVK|6.3 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1126994:PASS:98:98,98:88,88:31,39:-98,0,-98:-31,0,-39:25:7,18:18 -1 1127101 . C A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.119|rs9659458;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=MER9a1|ERVK|6.3 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1127101:PASS:222:222,459:220,457:46,54:-222,0,-459:-46,0,-54:36:13,23:23 -1 1127137 . A C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs12062271;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=MER9a1|ERVK|6.3 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1127101:PASS:281:281,325:298,326:53,54:-281,0,-325:-53,0,-54:34:19,15:15 -1 1127322 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs11260552;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=MER9a1|ERVK|6.3 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1127322:PASS:214:246,214:246,214:52,50:-246,0,-214:-52,0,-50:28:14,13:13 -1 1127330 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs12061357;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=MER9a1|ERVK|6.3 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 0|1:1127322:PASS:246:254,246:254,248:51,48:-246,0,-254:-48,0,-51:35:19,14:19 -1 1127380 . G A . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs11260553;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=MER9a1|ERVK|6.3 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:267:267,267:267,267:50,51:-267,0,-267:-50,0,-51:35:17,18:18 -1 1127507 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs12021582;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|34.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1127507:PASS:299:415,299:415,299:53,53:-415,0,-299:-53,0,-53:39:21,18:18 -1 1127523 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs12024296;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|34.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1127507:PASS:382:440,382:440,385:50,54:-440,0,-382:-50,0,-54:42:21,21:21 -1 1127608 . G GT . . NS=1;AN=2;AC=1;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|34.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:139:253,139:277,136:39,37:-253,0,-139:-39,0,-37:29:17,12:12 -1 1127681 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.129|rs61768486;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|34.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:113:113,113:106,106:36,41:-113,0,-113:-36,0,-41:20:6,14:14 -1 1127739 . TGATATGAGTGTAGACACTCCAGTTGGTATAAGT . . . NS=1;AN=0 GT:PS ./.:. -1 1127786 . TTGATATGAATGTAGACACTCCAGTTGGTATGAGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 1127838 . TATGAGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 1127857 . GTTCGTATCAGTGTAGACACTCCAGTTGATATCAGTGT . . . NS=1;AN=0 GT:PS ./.:. -1 1127939 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1127948 . A . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:1127948:VQLOW:37,.:37,.:0,.:0:0 -1 1127977 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1128001 . G . . NS=1;CGA_WINEND=1130000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.75:0.89:2:53:=:53:0.999:315 -1 1128122 . TTGGTATGAGTGTAGACACTCCAGTTGTTCATA . . . NS=1;AN=0 GT:PS ./.:. -1 1128196 . GTTGGTATGAGTGTAGACA . . . NS=1;AN=0 GT:PS ./.:. -1 1128331 . G T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.129|rs61766176;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|30.4 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:260:260,260:260,260:49,51:-260,0,-260:-49,0,-51:31:14,17:17 -1 1128429 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.129|rs61766177;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=AluY|Alu|8.6 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:134:241,134:239,132:48,45:-241,0,-134:-48,0,-45:19:11,8:8 -1 1128521 . C T . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.129|rs61766178;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=AluY|Alu|8.6 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:52:52,391:52,391:44,52:-391,-52,0:-52,-44,0:25:25,25:0 -1 1128605 . TGA . . . NS=1;AN=0 GT:PS ./.:. -1 1128709 . AAT . . . NS=1;AN=0 GT:PS ./.:. -1 1128717 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1128778 . C CTTA . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.131|rs78174849;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|45.8 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:50:50,338:50,341:37,39:-338,-50,0:-39,-37,0:28:28,28:0 -1 1128849 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1128861 . GAAAC . . . NS=1;AN=0 GT:PS ./.:. -1 1128886 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1129009 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1129122 . G C . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.119|rs9659213;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=AluSq|Alu|7.4 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:77:111,77:111,76:41,37:-111,0,-77:-41,0,-37:10:6,4:4 -1 1129263 . TGTGGTGG . . . NS=1;AN=0 GT:PS ./.:. -1 1129672 . G T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs11260554;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|39.9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1129672:PASS:247:594,247:589,242:50,50:-594,0,-247:-50,0,-50:43:28,15:15 -1 1129707 . A G . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs11260555;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|39.9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1129672:PASS:210:294,210:294,213:52,50:-294,0,-210:-52,0,-50:27:16,11:11 -1 1129789 . G C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs12060374;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|39.9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1129789:PASS:207:451,207:447,202:52,50:-451,0,-207:-52,0,-50:29:18,11:11 -1 1129920 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs12060422;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|39.9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:150:494,150:481,137:53,39:-494,0,-150:-53,0,-39:32:23,9:9 -1 1130001 . G . . NS=1;CGA_WINEND=1132000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.81:0.96:2:53:=:53:0.999:315 -1 1130093 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs12026524;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|39.9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 0|1:1129789:PASS:183:183,189:183,189:50,49:-189,0,-183:-49,0,-50:20:10,10:10 -1 1130206 . G A . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.120|rs11260556;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|39.9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:118:1037,118:1037,118:50,52:-1037,-118,0:-52,-50,0:44:44,44:0 -1 1130480 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.119|rs9659772;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=AluSz|Alu|16.9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1130480:PASS:185:185,185:185,185:48,50:-185,0,-185:-48,0,-50:22:11,11:11 -1 1130727 . A C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs10907175;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|36.1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:118:123,118:125,116:44,43:-123,0,-118:-44,0,-43:15:9,6:6 -1 1130843 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9727857;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|36.1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:58:664,58:653,98:53,49:-664,-58,0:-53,-49,0:34:34,34:0 -1 1130855 . T C . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs10907176;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|36.1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 0|1:1130480:PASS:148:148,491:136,479:39,53:-491,0,-148:-53,0,-39:33:10,23:10 -1 1130881 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1130954 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1130960 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1130968 . AC . . . NS=1;AN=0 GT:PS ./.:. -1 1131052 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs12066103;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=AluJo|Alu|17.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:42:42,42:39,39:15,28:-42,0,-42:-15,0,-28:10:3,7:7 -1 1131233 . 
GCTGATCTCAGACTCCTGAGCTCAAGCGATC . . . NS=1;AN=0 GT:PS ./.:. -1 1131310 . C T . . NS=1;AN=2;AC=1;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=AluJo|Alu|17.5 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1131310:PASS:122:122,341:127,302:32,53:-122,0,-341:-32,0,-53:36:10,26:26 -1 1131323 . C T . . NS=1;AN=2;AC=1;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=AluJb|Alu|34.1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1131310:PASS:88:88,341:89,302:22,53:-88,0,-341:-22,0,-53:36:5,31:31 -1 1131334 . C T . . NS=1;AN=2;AC=1;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=AluJb|Alu|34.1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1131310:PASS:221:221,341:224,302:29,53:-221,0,-341:-29,0,-53:51:13,38:38 -1 1131394 . A . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0/.:.:PASS:103,.:98,.:0,.:0:0 -1 1131419 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1131441 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs12403745;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=AluJb|Alu|34.1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:50:50,50:50,50:22,31:-50,0,-50:-22,0,-31:7:3,4:4 -1 1131581 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9329409;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=L1MB8|L1|47.6 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:66:493,66:493,66:52,48:-493,-66,0:-52,-48,0:24:24,24:0 -1 1131825 . GTGAGGC . . . NS=1;AN=0 GT:PS ./.:. -1 1132001 . G . . NS=1;CGA_WINEND=1134000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.94:0.92:2:53:=:53:0.999:315 -1 1132010 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1132013 . CTG . . . NS=1;AN=0 GT:PS ./.:. -1 1132744 . CCCTCACACCCTCCCCACC . . . NS=1;AN=0 GT:PS ./.:. -1 1132785 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.100|rs2274792;CGA_FI=254173|NM_001130045.1|TTLL10|INTRON|UNKNOWN-INC;CGA_RPT=C-rich|Low_complexity|30.9 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1132785:PASS:64:66,64:66,64:29,35:-66,0,-64:-29,0,-35:8:4,4:4 -1 1132797 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1132869 . CTCTGCTGTCCCAGCGCCGCTTCGTGCT . . . NS=1;AN=0 GT:PS ./.:. -1 1132907 . GGTGAGGCCGAC . . . NS=1;AN=0 GT:PS ./.:. -1 1132938 . G A . . NS=1;AN=1;AC=1;CGA_XR=dbsnp.100|rs2274791;CGA_FI=254173|NM_001130045.1|TTLL10|CDS|MISSENSE GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|.:1132938:PASS:66,.:65,.:28,.:-66,0,0:-28,0,0:6:4,.:2 -1 1132947 . GCCTCCG . . . NS=1;AN=0 GT:PS ./.:. -1 1132960 . GCCGCCCCTGCC . . . NS=1;AN=0 GT:PS ./.:. -1 1133005 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1133008 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1133023 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1133033 . CG . . . NS=1;AN=0 GT:PS ./.:. -1 1133038 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1133042 . GCG . . . NS=1;AN=0 GT:PS ./.:. -1 1133047 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1133065 . GCG . . . NS=1;AN=0 GT:PS ./.:. -1 1133074 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1133077 . A G . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.100|rs2274789;CGA_FI=254173|NM_001130045.1|TTLL10|CDS|SYNONYMOUS GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|1:1133077:VQLOW:12:107,12:107,12:41,16:-107,-12,0:-41,-16,0:7:7,7:0 -1 1133084 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1133110 . G A . . NS=1;AN=2;AC=1;CGA_FI=254173|NM_001130045.1|TTLL10|CDS|SYNONYMOUS GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 0|1:1132785:PASS:47:47,47:47,47:30,20:-47,0,-47:-20,0,-30:11:6,5:6 -1 1133254 . G A . . 
NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs12026794;CGA_FI=254173|NM_001130045.1|TTLL10|UTR3|UNKNOWN-INC;CGA_RPT=GC_rich|Low_complexity|3.2 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:72:174,72:173,70:48,36:-174,0,-72:-48,0,-36:17:10,7:7 -1 1133273 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.120|rs12031928;CGA_FI=254173|NM_001130045.1|TTLL10|UTR3|UNKNOWN-INC GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:48:449,48:449,73:48,48:-449,-48,0:-48,-48,0:19:19,19:0 -1 1133787 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.120|rs12405246 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:42:42,268:42,268:39,48:-268,-42,0:-48,-39,0:15:15,15:0 -1 1133815 . T C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.126|rs35009659 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:63:512,63:512,63:52,48:-512,-63,0:-52,-48,0:22:22,22:0 -1 1134001 . A . . NS=1;CGA_WINEND=1136000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:1.06:1.00:2:53:=:53:0.999:315 -1 1134295 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1134633 . C T . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.120|rs11260558 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1134633:PASS:242:264,242:254,232:52,50:-264,0,-242:-52,0,-50:26:14,12:12 -1 1134642 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1134659 . G . . . NS=1;AN=1 GT:PS:FT:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL 0|.:1134633:PASS:249,.:239,.:0,.:0:0 -1 1134707 . G A . . NS=1;AN=2;AC=1;CGA_XR=dbsnp.119|rs9727747 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/0:.:PASS:135:135,171:134,168:43,49:-135,0,-171:-43,0,-49:26:10,16:16 -1 1135242 . A C . . NS=1;AN=2;AC=2;CGA_XR=dbsnp.119|rs9729550 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1/1:.:PASS:80:80,690:80,690:48,52:-690,-80,0:-52,-48,0:29:28,28:1 -1 1135760 . TCTCACCAGCAGCGGGGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 1135789 . CTGAGGGCCAG . . . NS=1;AN=0 GT:PS ./.:. -1 1136001 . G . . NS=1;CGA_WINEND=1138000 GT:CGA_GP:CGA_NP:CGA_CP:CGA_PS:CGA_CT:CGA_TS:CGA_CL:CGA_LS .:0.78:0.75:.:0:.:0:0.999:315 -1 1136124 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1136148 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1136477 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1136480 . GC G . . NS=1;AN=2;AC=1 GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP 1|0:1136480:VQLOW:22:22,22:4,4:0,3:-22,0,-22:0,0,-3:10:1,9:9 -1 1136485 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1136488 . CTCGGG . . . NS=1;AN=0 GT:PS ./.:. -1 1136513 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1136515 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1136540 . ACTGCGCCCCGAGTGGGAGGGGGCGGCG . . . NS=1;AN=0 GT:PS ./.:. -1 1136574 . CCGCCGCGAC . . . NS=1;AN=0 GT:PS ./.:. -1 1136586 . CG . . . NS=1;AN=0 GT:PS ./.:. -1 1136644 . CCATGGCGGGT . . . NS=1;AN=0 GT:PS ./.:. -1 1136659 . GCT . . . NS=1;AN=0 GT:PS ./.:. -1 1136664 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1136668 . GG . . . NS=1;AN=0 GT:PS ./.:. -1 1136672 . TCCCC . . . NS=1;AN=0 GT:PS ./.:. -1 1136692 . A . . . NS=1;AN=0 GT:PS ./.:. -1 1136695 . CGCCCCCGCCCCC . . . NS=1;AN=0 GT:PS ./.:. -1 1136713 . C . . . NS=1;AN=0 GT:PS ./.:. -1 1136715 . AGAACCCG . . . NS=1;AN=0 GT:PS ./.:. -1 1136726 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1136729 . T . . . NS=1;AN=0 GT:PS ./.:. -1 1136746 . CGCACGCTG . . . NS=1;AN=0 GT:PS ./.:. -1 1136757 . G . . . NS=1;AN=0 GT:PS ./.:. -1 1136761 . CCGCCAGGCCCGGAGGGTCGCGCTCCAGGTAAA . . . NS=1;AN=0 GT:PS ./.:. -1 1136797 . CGCGGGGCGGGCC . . . NS=1;AN=0 GT:PS ./.:. -1 1136828 . CCGGAGACCCCGCCCAGAGCCCGCTCCGCCGCCCGCGGAATCCCCCGCCCGTCCG . . . NS=1;AN=0 GT:PS ./.:. -1 1136897 . GGCC . . . NS=1;AN=0 GT:PS ./.:. -1 1136930 . CCCACGCGAGGCCGCCCACGA . . . 
NS=1;AN=0 GT:PS ./.:. -1 1136960 . GCGCTGAGTCAGCCCCGCGGGACCCGCGCTACGCGGGCCGCCG . . . NS=1;AN=0 GT:PS ./.:. -1 1137014 . GGCCGGTGCGGGACAGCCCCGGTGTGGGGGGCGCGTGGGGAGA . . . NS=1;AN=0 GT:PS ./.:. -1 1137097 . TGCGCGGGGCAGGGCCCGACCGCTCAGCCTCCC . . . NS=1;AN=0 GT:PS ./.:. -1 1137169 . GCCCCGAGCCCAGTACCCAGCCTCCAGCCCAGTACCCAGCCCCGAGCCCAGTACCCAGCCTCCAGCCCAGTACCCAGCCCCGAGCCCAGTACCCAGCCTCCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 1137283 . GCCCCGAGCCCAGTACCCAGCCTCCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 1137323 . CCCGAGCCCAGTACCCAGCCTCCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 1137361 . CCCGAGCCCAGTACCCAGCCTCCAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 1137399 . CCCGAGCCCAGTACCCAGCCTCCAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 1137437 . CCCGAGCCCAGTACCCAGCCTCCAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 1137470 . CCAGCCCCGAGCCCAGTACCCAGCCTCCAGCCCAGTACCCAGCCCCGAGCCCAGTACCCAGCCTCCAGCCCAGTACCCAGCCCCGAGCCCAGTACCCAGCCTCCAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 1137584 . CCAGCCCCGAGCCCAGTACCCAGCCTCCAGCC . . . NS=1;AN=0 GT:PS ./.:. -1 1137622 . CCAGCCCCGAGCCCAGTACCCAGCCTCCAGCCCAGTACCCAGCCCCCAGC . . . NS=1;AN=0 GT:PS ./.:. -1 1137679 . CCAGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 1137693 . AGTACCCAGCCCCCAGCCCAGTACCCAGCCTCCAGCCCAGTACCCAGCCCCGAGCCCAGTACCCAGCCTCCA . . . NS=1;AN=0 GT:PS ./.:. -1 1137774 . CCAGCCCCGAGCCCAGTACCCAGCCCCCAGCCCAGTACCCAGCCTCCAGCCCAGTACCCAGCCTCCA . . . NS=1;AN=0 GT:PS ./.:. -1 1137850 . CCAGCCCCGAGCCCAGTACCCAGCCCCCAGCCCAGTACCCAGCCC . . . NS=1;AN=0 GT:PS ./.:. -1 1137907 . CCAGCCCCGAGCCCAGTACCCAGCCCCGAGCCCAGCACCCAGCCTCCAGCCCAGTACCCATCCC . . . NS=1;AN=0 GT:PS ./.:. \ No newline at end of file diff --git a/sdks/python/apache_beam/testing/data/vcf/valid-4.1-large.vcf.gz b/sdks/python/apache_beam/testing/data/vcf/valid-4.1-large.vcf.gz deleted file mode 100644 index e2d0fed14a86..000000000000 Binary files a/sdks/python/apache_beam/testing/data/vcf/valid-4.1-large.vcf.gz and /dev/null differ diff --git a/sdks/python/apache_beam/testing/data/vcf/valid-4.2.vcf b/sdks/python/apache_beam/testing/data/vcf/valid-4.2.vcf deleted file mode 100644 index c42d71cbb6b2..000000000000 --- a/sdks/python/apache_beam/testing/data/vcf/valid-4.2.vcf +++ /dev/null @@ -1,42 +0,0 @@ -##fileformat=VCFv4.2 -##fileDate=20090805 -##source=myImputationProgramV3.1 -##phasing=partial -##INFO= -##INFO= -##INFO= -##INFO= -##INFO= -##INFO= -##INFO= -##INFO= -##FILTER= -##FILTER= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##FORMAT= -##reference=file:/lustre/scratch105/projects/g1k/ref/main_project/human_g1k_v37.fasta -##contig= -##contig= -##contig= -##SAMPLE= -##SAMPLE= -##PEDIGREE= -##PEDIGREE= -##pedigreeDB=url -#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003 -19 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,. -20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3 -20 1110696 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4 -20 1230237 . T . 47 PASS NS=3;DP=13;AA=T GT:GQ:DP:HQ 0|0:54:7:56,60 0|0:48:4:51,51 0/0:61:2 -20 1234567 microsat1 GTC G,GTCTC 50 PASS NS=3;DP=9;AA=G GT:GQ:DP 0/1:35:4 0/2:17:2 1/1:40:3 -20 2234567 . C [13:123457[ACGC 50 PASS SVTYPE=BÑD;NS=3;DP=9;AA=G GT:GQ:DP 0/1:35:4 0/1:17:2 1/1:40:3 -20 2234568 . C .TC 50 PASS SVTYPE=BND;NS=3;DP=9;AA=G GT:GQ:DP 0/1:35:4 0/1:17:2 1/1:40:3 -20 2234569 . C CT. 50 PASS SVTYPE=BND;NS=3;DP=9;AA=G GT:GQ:DP 0/1:35:4 0/1:17:2 1/1:40:3 -20 3234569 . C 50 PASS END=3235677;NS=3;DP=9;AA=G GT:GQ:DP 0/1:35:4 0/1:17:2 1/1:40:3 -20 4234569 . 
N .[13:123457[ 50 PASS SVTYPE=BND;NS=3;DP=9;AA=G GT:GQ:DP 0/1:35:4 0/1:17:2 ./.:40:3 -20 5234569 . N [13:123457[. 50 PASS SVTYPE=BND;NS=3;DP=9;AA=G GT:GQ:DP 0/1:35:4 0/1:17:2 1/1:40:3 -Y 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GL 0:0,49 0:0,3 1:41,0 -HLA-A*01:01:01:01 1 . N T 50 PASS END=1;NS=3;DP=9;AA=G GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,. \ No newline at end of file diff --git a/sdks/python/apache_beam/testing/data/vcf/valid-4.2.vcf.gz b/sdks/python/apache_beam/testing/data/vcf/valid-4.2.vcf.gz deleted file mode 100644 index 4208e3e7969d..000000000000 Binary files a/sdks/python/apache_beam/testing/data/vcf/valid-4.2.vcf.gz and /dev/null differ diff --git a/sdks/python/apache_beam/testing/datatype_inference.py b/sdks/python/apache_beam/testing/datatype_inference.py index 1c00dfb2f506..c8236ab1d642 100644 --- a/sdks/python/apache_beam/testing/datatype_inference.py +++ b/sdks/python/apache_beam/testing/datatype_inference.py @@ -16,25 +16,16 @@ # # pytype: skip-file -from __future__ import absolute_import - import array import json from collections import OrderedDict import numpy as np -from past.builtins import unicode +from avro.schema import Parse from apache_beam.typehints import trivial_inference from apache_beam.typehints import typehints -# pylint: disable=wrong-import-order, wrong-import-position -try: - from avro.schema import Parse # avro-python3 library for python3 -except ImportError: - from avro.schema import parse as Parse # avro library for python2 -# pylint: enable=wrong-import-order, wrong-import-position - try: import pyarrow as pa except ImportError: @@ -98,7 +89,6 @@ def infer_avro_schema(data, use_fastavro=False): int: "int", float: "double", str: "string", - unicode: "string", bytes: "bytes", np.ndarray: "bytes", array.array: "bytes", diff --git a/sdks/python/apache_beam/testing/datatype_inference_test.py b/sdks/python/apache_beam/testing/datatype_inference_test.py index c4a429746ab9..d59adbf81ff2 100644 --- a/sdks/python/apache_beam/testing/datatype_inference_test.py +++ b/sdks/python/apache_beam/testing/datatype_inference_test.py @@ -16,15 +16,12 @@ # # pytype: skip-file -from __future__ import absolute_import - import logging import unittest from collections import OrderedDict import numpy as np from parameterized import parameterized -from past.builtins import unicode from apache_beam.testing import datatype_inference from apache_beam.typehints import typehints @@ -72,7 +69,7 @@ "type_schema": OrderedDict([ ("a", int), ("b", float), - ("c", unicode), + ("c", str), ("d", np.ndarray), ("e", bytes), ]), diff --git a/sdks/python/apache_beam/testing/extra_assertions.py b/sdks/python/apache_beam/testing/extra_assertions.py index b67f814bbf49..7820956c7c5e 100644 --- a/sdks/python/apache_beam/testing/extra_assertions.py +++ b/sdks/python/apache_beam/testing/extra_assertions.py @@ -16,23 +16,10 @@ # # pytype: skip-file -from __future__ import absolute_import - -import sys - import numpy as np class ExtraAssertionsMixin(object): - - if sys.version_info[0] < 3: - - def assertCountEqual(self, first, second, msg=None): - """Assert that two containers have the same number of the same items in - any order. - """ - return self.assertItemsEqual(first, second, msg=msg) - def assertUnhashableCountEqual(self, data1, data2): """Assert that two containers have the same items, with special treatment for numpy arrays. 
diff --git a/sdks/python/apache_beam/testing/extra_assertions_test.py b/sdks/python/apache_beam/testing/extra_assertions_test.py index 73522ac8437d..9867c7989742 100644 --- a/sdks/python/apache_beam/testing/extra_assertions_test.py +++ b/sdks/python/apache_beam/testing/extra_assertions_test.py @@ -17,8 +17,6 @@ # # pytype: skip-file -from __future__ import absolute_import - import logging import unittest diff --git a/sdks/python/apache_beam/testing/load_tests/co_group_by_key_test.py b/sdks/python/apache_beam/testing/load_tests/co_group_by_key_test.py index 165fd920c689..e7d212baffab 100644 --- a/sdks/python/apache_beam/testing/load_tests/co_group_by_key_test.py +++ b/sdks/python/apache_beam/testing/load_tests/co_group_by_key_test.py @@ -76,8 +76,6 @@ # pytype: skip-file -from __future__ import absolute_import - import json import logging diff --git a/sdks/python/apache_beam/testing/load_tests/combine_test.py b/sdks/python/apache_beam/testing/load_tests/combine_test.py index 7076cdadb46a..d3a372f645d4 100644 --- a/sdks/python/apache_beam/testing/load_tests/combine_test.py +++ b/sdks/python/apache_beam/testing/load_tests/combine_test.py @@ -68,8 +68,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import sys diff --git a/sdks/python/apache_beam/testing/load_tests/group_by_key_test.py b/sdks/python/apache_beam/testing/load_tests/group_by_key_test.py index 81a6281a320f..69ea74c8c262 100644 --- a/sdks/python/apache_beam/testing/load_tests/group_by_key_test.py +++ b/sdks/python/apache_beam/testing/load_tests/group_by_key_test.py @@ -69,8 +69,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import apache_beam as beam diff --git a/sdks/python/apache_beam/testing/load_tests/load_test.py b/sdks/python/apache_beam/testing/load_tests/load_test.py index a15f81987042..f5917fbfba27 100644 --- a/sdks/python/apache_beam/testing/load_tests/load_test.py +++ b/sdks/python/apache_beam/testing/load_tests/load_test.py @@ -16,8 +16,6 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import json import logging @@ -100,6 +98,7 @@ def __init__(self, metrics_namespace=None): options = self.pipeline.get_pipeline_options().view_as(LoadTestOptions) self.timeout_ms = options.timeout_ms self.input_options = options.input_options + self.extra_metrics = {} if metrics_namespace: self.metrics_namespace = metrics_namespace @@ -150,7 +149,7 @@ def run(self): self.result = self.pipeline.run() # Defaults to waiting forever, unless timeout_ms has been set self.result.wait_until_finish(duration=self.timeout_ms) - self._metrics_monitor.publish_metrics(self.result) + self._metrics_monitor.publish_metrics(self.result, self.extra_metrics) finally: self.cleanup() diff --git a/sdks/python/apache_beam/testing/load_tests/load_test_metrics_utils.py b/sdks/python/apache_beam/testing/load_tests/load_test_metrics_utils.py index d8168270c3e4..f6d33409c7e9 100644 --- a/sdks/python/apache_beam/testing/load_tests/load_test_metrics_utils.py +++ b/sdks/python/apache_beam/testing/load_tests/load_test_metrics_utils.py @@ -29,8 +29,6 @@ # pytype: skip-file -from __future__ import absolute_import - import json import logging import time @@ -50,7 +48,7 @@ from apache_beam.utils.timestamp import Timestamp try: - from google.cloud import bigquery + from google.cloud import bigquery # type: ignore from google.cloud.bigquery.schema import SchemaField from google.cloud.exceptions import NotFound except ImportError: @@ -218,7 +216,8 @@ def __init__( 'InfluxDB') 
self.filters = filters - def publish_metrics(self, result): + def publish_metrics(self, result, extra_metrics: dict): + metric_id = uuid.uuid4().hex metrics = result.metrics().query(self.filters) # Metrics from pipeline result are stored in map with keys: 'gauges', @@ -226,11 +225,20 @@ def publish_metrics(self, result): # Under each key there is list of objects of each metric type. It is # required to prepare metrics for publishing purposes. Expected is to have # a list of dictionaries matching the schema. - insert_dicts = self._prepare_all_metrics(metrics) + insert_dicts = self._prepare_all_metrics(metrics, metric_id) + + insert_dicts += self._prepare_extra_metrics(extra_metrics, metric_id) if len(insert_dicts) > 0: for publisher in self.publishers: publisher.publish(insert_dicts) + def _prepare_extra_metrics(self, extra_metrics: dict, metric_id: str): + ts = time.time() + return [ + Metric(ts, metric_id, v, label=k).as_dict() for k, + v in extra_metrics.items() + ] + def publish_values(self, labeled_values): """The method to publish simple labeled values. @@ -246,8 +254,7 @@ def publish_values(self, labeled_values): for publisher in self.publishers: publisher.publish(metric_dicts) - def _prepare_all_metrics(self, metrics): - metric_id = uuid.uuid4().hex + def _prepare_all_metrics(self, metrics, metric_id): insert_rows = self._get_counters(metrics['counters'], metric_id) insert_rows += self._get_distributions(metrics['distributions'], metric_id) @@ -398,11 +405,10 @@ def publish(self, results): outputs = self.bq.save(results) if len(outputs) > 0: for output in outputs: - errors = output['errors'] - for err in errors: - _LOGGER.error(err['message']) + if output['errors']: + _LOGGER.error(output) raise ValueError( - 'Unable save rows in BigQuery: {}'.format(err['message'])) + 'Unable save rows in BigQuery: {}'.format(output['errors'])) class BigQueryClient(object): diff --git a/sdks/python/apache_beam/testing/load_tests/microbenchmarks_test.py b/sdks/python/apache_beam/testing/load_tests/microbenchmarks_test.py new file mode 100644 index 000000000000..2dff7404f8fe --- /dev/null +++ b/sdks/python/apache_beam/testing/load_tests/microbenchmarks_test.py @@ -0,0 +1,98 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +""" +This is a load test that runs a set of basic microbenchmarks for the Python SDK +and the DirectRunner. + +This test does not need any additional options passed to run, besides the +dataset information. + +Example test run: + +python -m apache_beam.testing.load_tests.microbenchmarks_test \ + --test-pipeline-options=" + --project=big-query-project + --input_options='{}' + --region=... 
+ --publish_to_big_query=true + --metrics_dataset=python_load_tests + --metrics_table=microbenchmarks" + +or: + +./gradlew -PloadTest.args=" + --publish_to_big_query=true + --project=... + --region=... + --input_options='{}' + --metrics_dataset=python_load_tests + --metrics_table=microbenchmarks + --runner=DirectRunner" \ +-PloadTest.mainClass=apache_beam.testing.load_tests.microbenchmarks_test \ +-Prunner=DirectRunner :sdks:python:apache_beam:testing:load_tests:run +""" + +# pytype: skip-file + +import logging +import time + +from apache_beam.testing.load_tests.load_test import LoadTest +from apache_beam.tools import fn_api_runner_microbenchmark +from apache_beam.tools import teststream_microbenchmark +from apache_beam.transforms.util import _BatchSizeEstimator + + +class MicroBenchmarksLoadTest(LoadTest): + def __init__(self): + super(MicroBenchmarksLoadTest, self).__init__() + + def test(self): + self.extra_metrics.update(self._run_fn_api_runner_microbenchmark()) + self.extra_metrics.update(self._run_teststream_microbenchmark()) + + def _run_teststream_microbenchmark(self): + start = time.perf_counter() + result = teststream_microbenchmark.run_benchmark(verbose=False) + sizes = list(result[0].values())[0] + costs = list(result[1].values())[0] + a, b = _BatchSizeEstimator.linear_regression_no_numpy(sizes, costs) + + return { + 'teststream_microbenchmark_runtime_sec': time.perf_counter() - start, + 'teststream_microbenchmark_fixed_cost_ms': a * 1000, + 'teststream_microbenchmark_per_element_cost_ms': b * 1000, + } + + def _run_fn_api_runner_microbenchmark(self): + start = time.perf_counter() + result = fn_api_runner_microbenchmark.run_benchmark(verbose=False) + sizes = list(result[0].values())[0] + costs = list(result[1].values())[0] + a, b = _BatchSizeEstimator.linear_regression_no_numpy(sizes, costs) + + return { + 'fn_api_runner_microbenchmark_runtime_sec': time.perf_counter() - start, + 'fn_api_runner_microbenchmark_fixed_cost_ms': a * 1000, + 'fn_api_runner_microbenchmark_per_element_cost_ms': b * 1000, + } + + +if __name__ == '__main__': + logging.basicConfig(level=logging.INFO) + MicroBenchmarksLoadTest().run() diff --git a/sdks/python/apache_beam/testing/load_tests/pardo_test.py b/sdks/python/apache_beam/testing/load_tests/pardo_test.py index 9ccb45e1b93e..6722fe33d8ab 100644 --- a/sdks/python/apache_beam/testing/load_tests/pardo_test.py +++ b/sdks/python/apache_beam/testing/load_tests/pardo_test.py @@ -72,8 +72,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import apache_beam as beam diff --git a/sdks/python/apache_beam/testing/load_tests/sideinput_test.py b/sdks/python/apache_beam/testing/load_tests/sideinput_test.py index 28dd16c1e5b1..f77e35c3cda9 100644 --- a/sdks/python/apache_beam/testing/load_tests/sideinput_test.py +++ b/sdks/python/apache_beam/testing/load_tests/sideinput_test.py @@ -56,9 +56,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import logging from typing import Any from typing import Dict diff --git a/sdks/python/apache_beam/testing/metric_result_matchers.py b/sdks/python/apache_beam/testing/metric_result_matchers.py index e49575cb7bce..a4a7f69290a8 100644 --- a/sdks/python/apache_beam/testing/metric_result_matchers.py +++ b/sdks/python/apache_beam/testing/metric_result_matchers.py @@ -43,8 +43,6 @@ # pytype: skip-file -from __future__ import absolute_import - from hamcrest import equal_to from hamcrest.core import string_description from hamcrest.core.base_matcher import 
BaseMatcher diff --git a/sdks/python/apache_beam/testing/metric_result_matchers_test.py b/sdks/python/apache_beam/testing/metric_result_matchers_test.py index 4e724d26a6f3..9f4d4086c087 100644 --- a/sdks/python/apache_beam/testing/metric_result_matchers_test.py +++ b/sdks/python/apache_beam/testing/metric_result_matchers_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest from hamcrest import assert_that as hc_assert_that diff --git a/sdks/python/apache_beam/testing/pipeline_verifiers.py b/sdks/python/apache_beam/testing/pipeline_verifiers.py index b952a2968021..225e6d0dbae1 100644 --- a/sdks/python/apache_beam/testing/pipeline_verifiers.py +++ b/sdks/python/apache_beam/testing/pipeline_verifiers.py @@ -24,8 +24,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import time diff --git a/sdks/python/apache_beam/testing/pipeline_verifiers_test.py b/sdks/python/apache_beam/testing/pipeline_verifiers_test.py index 0943ee5cd4f6..085339003699 100644 --- a/sdks/python/apache_beam/testing/pipeline_verifiers_test.py +++ b/sdks/python/apache_beam/testing/pipeline_verifiers_test.py @@ -19,13 +19,10 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import os import tempfile import unittest -from builtins import range from hamcrest import assert_that as hc_assert_that from mock import Mock diff --git a/sdks/python/apache_beam/testing/synthetic_pipeline.py b/sdks/python/apache_beam/testing/synthetic_pipeline.py index 33dd2ced112e..2387be79fd18 100644 --- a/sdks/python/apache_beam/testing/synthetic_pipeline.py +++ b/sdks/python/apache_beam/testing/synthetic_pipeline.py @@ -34,9 +34,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import argparse import json import logging @@ -88,10 +85,6 @@ def bytes(self, length): Generator = _Random -# TODO(BEAM-7372): Remove this when Beam drops Python 2. 
-if np is not None and sys.version_info.major == 2: - Generator = np.random.RandomState - def parse_byte_size(s): suffixes = 'BKMGTP' diff --git a/sdks/python/apache_beam/testing/synthetic_pipeline_test.py b/sdks/python/apache_beam/testing/synthetic_pipeline_test.py index dbecb16d99a9..7e973a3ca7d7 100644 --- a/sdks/python/apache_beam/testing/synthetic_pipeline_test.py +++ b/sdks/python/apache_beam/testing/synthetic_pipeline_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import glob import json import logging @@ -36,9 +34,11 @@ from apache_beam.testing.util import equal_to try: - import numpy as np + import numpy # pylint: disable=unused-import except ImportError: - np = None + NP_INSTALLED = False +else: + NP_INSTALLED = True def input_spec( @@ -60,7 +60,8 @@ def input_spec( } -@unittest.skipIf(np is None, 'Synthetic source dependencies are not installed') +@unittest.skipIf( + not NP_INSTALLED, 'Synthetic source dependencies are not installed') class SyntheticPipelineTest(unittest.TestCase): # pylint: disable=expression-not-assigned diff --git a/sdks/python/apache_beam/testing/test_pipeline.py b/sdks/python/apache_beam/testing/test_pipeline.py index 678a07855ad9..910f14997536 100644 --- a/sdks/python/apache_beam/testing/test_pipeline.py +++ b/sdks/python/apache_beam/testing/test_pipeline.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import argparse import shlex from unittest import SkipTest @@ -42,7 +40,7 @@ class TestPipeline(Pipeline): It has a functionality to parse arguments from command line and build pipeline options for tests who runs against a pipeline runner and utilizes resources of the pipeline runner. Those test functions are recommended to be tagged by - ``@attr("ValidatesRunner")`` annotation. + ``@pytest.mark.it_validatesrunner`` annotation. In order to configure the test with customized pipeline options from command line, system argument ``--test-pipeline-options`` can be used to obtains a @@ -50,7 +48,7 @@ class TestPipeline(Pipeline): For example, use following command line to execute all ValidatesRunner tests:: - python setup.py nosetests -a ValidatesRunner \\ + pytest -m it_validatesrunner \\ --test-pipeline-options="--runner=DirectRunner \\ --job_name=myJobName \\ --num_workers=1" @@ -61,6 +59,10 @@ class TestPipeline(Pipeline): pcoll = ... assert_that(pcoll, equal_to(...)) """ + # Command line options read in by pytest. + # If this is not None, will use as default value for --test-pipeline-options. + pytest_test_pipeline_options = None + def __init__( self, runner=None, @@ -144,8 +146,9 @@ def _parse_test_option_args(self, argv): default=False, help='whether not to use test-runner-api') known, unused_argv = parser.parse_known_args(argv) - - if self.is_integration_test and not known.test_pipeline_options: + test_pipeline_options = known.test_pipeline_options or \ + TestPipeline.pytest_test_pipeline_options + if self.is_integration_test and not test_pipeline_options: # Skip integration test when argument '--test-pipeline-options' is not # specified since nose calls integration tests when runs unit test by # 'setup.py test'. 
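The pytest-based flow described in the TestPipeline docstring above can be illustrated with a small sketch. This is a hypothetical test module, not part of the patch: the class, method, and pipeline contents are assumptions, and the options are expected to arrive either via `--test-pipeline-options` on the pytest command line or via `TestPipeline.pytest_test_pipeline_options` once a plugin populates it.

```python
# Hypothetical ValidatesRunner-style test using the pytest flow sketched above.
import unittest

import pytest

import apache_beam as beam
from apache_beam.testing.test_pipeline import TestPipeline
from apache_beam.testing.util import assert_that
from apache_beam.testing.util import equal_to


@pytest.mark.it_validatesrunner
class SquareValidatesRunnerTest(unittest.TestCase):
  def test_square(self):
    # TestPipeline reads --test-pipeline-options from the command line, or
    # falls back to TestPipeline.pytest_test_pipeline_options when set.
    with TestPipeline() as p:
      squares = p | beam.Create([1, 2, 3]) | beam.Map(lambda x: x * x)
      assert_that(squares, equal_to([1, 4, 9]))
```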
@@ -154,8 +157,8 @@ def _parse_test_option_args(self, argv): 'is not specified') self.not_use_test_runner_api = known.not_use_test_runner_api - return shlex.split(known.test_pipeline_options) \ - if known.test_pipeline_options else [] + return shlex.split(test_pipeline_options) \ + if test_pipeline_options else [] def get_full_options_as_args(self, **extra_opts): """Get full pipeline options as an argument list. diff --git a/sdks/python/apache_beam/testing/test_pipeline_test.py b/sdks/python/apache_beam/testing/test_pipeline_test.py index a5461704f1ef..c38b802dc90c 100644 --- a/sdks/python/apache_beam/testing/test_pipeline_test.py +++ b/sdks/python/apache_beam/testing/test_pipeline_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest diff --git a/sdks/python/apache_beam/testing/test_stream.py b/sdks/python/apache_beam/testing/test_stream.py index 590792c3089a..d655a9021946 100644 --- a/sdks/python/apache_beam/testing/test_stream.py +++ b/sdks/python/apache_beam/testing/test_stream.py @@ -21,16 +21,11 @@ """ # pytype: skip-file -from __future__ import absolute_import - from abc import ABCMeta from abc import abstractmethod -from builtins import object from enum import Enum from functools import total_ordering -from future.utils import with_metaclass - import apache_beam as beam from apache_beam import coders from apache_beam import pvalue @@ -63,7 +58,7 @@ @total_ordering -class Event(with_metaclass(ABCMeta, object)): # type: ignore[misc] +class Event(metaclass=ABCMeta): # type: ignore[misc] """Test stream event to be emitted during execution of a TestStream.""" @abstractmethod def __eq__(self, other): @@ -77,10 +72,6 @@ def __hash__(self): def __lt__(self, other): raise NotImplementedError - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - @abstractmethod def to_runner_api(self, element_coder): raise NotImplementedError @@ -216,15 +207,56 @@ def __repr__(self): return 'ProcessingTimeEvent: <{}>'.format(self.advance_by) -class WindowedValueHolder: +class WindowedValueHolderMeta(type): + """A metaclass that overrides the isinstance check for WindowedValueHolder. + + Python does a quick test for exact match. If an instance is exactly of + type WindowedValueHolder, the overridden isinstance check is omitted. + The override is needed because WindowedValueHolder elements encoded then + decoded become Row elements. + """ + def __instancecheck__(cls, other): + """Checks if a beam.Row typed instance is a WindowedValueHolder. + """ + return ( + isinstance(other, beam.Row) and hasattr(other, 'windowed_value') and + hasattr(other, 'urn') and + isinstance(other.windowed_value, WindowedValue) and + other.urn == common_urns.coders.ROW.urn) + + +class WindowedValueHolder(beam.Row, metaclass=WindowedValueHolderMeta): """A class that holds a WindowedValue. This is a special class that can be used by the runner that implements the TestStream as a signal that the underlying value should be unreified to the specified window. """ + # Register WindowedValueHolder to always use RowCoder. + coders.registry.register_coder(WindowedValueHolderMeta, coders.RowCoder) + def __init__(self, windowed_value): - self.windowed_value = windowed_value + assert isinstance(windowed_value, WindowedValue), ( + 'WindowedValueHolder can only hold %s type. 
Instead, %s is given.') % ( + WindowedValue, windowed_value) + super().__init__( + **{ + 'windowed_value': windowed_value, 'urn': common_urns.coders.ROW.urn + }) + + @classmethod + def from_row(cls, row): + """Converts a beam.Row typed instance to WindowedValueHolder. + """ + if isinstance(row, WindowedValueHolder): + return WindowedValueHolder(row.windowed_value) + assert isinstance(row, beam.Row), 'The given row %s must be a %s type' % ( + row, beam.Row) + assert hasattr(row, 'windowed_value'), ( + 'The given %s must have a windowed_value attribute.') % row + assert isinstance(row.windowed_value, WindowedValue), ( + 'The windowed_value attribute of %s must be a %s type') % ( + row, WindowedValue) class TestStream(PTransform): diff --git a/sdks/python/apache_beam/testing/test_stream_it_test.py b/sdks/python/apache_beam/testing/test_stream_it_test.py index c9458d726573..0e293eda3713 100644 --- a/sdks/python/apache_beam/testing/test_stream_it_test.py +++ b/sdks/python/apache_beam/testing/test_stream_it_test.py @@ -19,12 +19,10 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest from functools import wraps -from nose.plugins.attrib import attr +import pytest import apache_beam as beam from apache_beam.options.pipeline_options import StandardOptions @@ -69,7 +67,7 @@ def setUpClass(cls): cls.project = cls.test_pipeline.get_option('project') @supported(['DirectRunner', 'SwitchingDirectRunner']) - @attr('IT') + @pytest.mark.it_postcommit def test_basic_execution(self): test_stream = ( TestStream().advance_watermark_to(10).add_elements([ @@ -105,7 +103,7 @@ def process( ])) @supported(['DirectRunner', 'SwitchingDirectRunner']) - @attr('IT') + @pytest.mark.it_postcommit def test_multiple_outputs(self): """Tests that the TestStream supports emitting to multiple PCollections.""" letters_elements = [ @@ -153,7 +151,7 @@ def process( p.run() @supported(['DirectRunner', 'SwitchingDirectRunner']) - @attr('IT') + @pytest.mark.it_postcommit def test_multiple_outputs_with_watermark_advancement(self): """Tests that the TestStream can independently control output watermarks.""" diff --git a/sdks/python/apache_beam/testing/test_stream_service.py b/sdks/python/apache_beam/testing/test_stream_service.py index 618e208fbb98..988e1b8e3750 100644 --- a/sdks/python/apache_beam/testing/test_stream_service.py +++ b/sdks/python/apache_beam/testing/test_stream_service.py @@ -17,8 +17,6 @@ # pytype: skip-file -from __future__ import absolute_import - from concurrent.futures import ThreadPoolExecutor import grpc diff --git a/sdks/python/apache_beam/testing/test_stream_service_test.py b/sdks/python/apache_beam/testing/test_stream_service_test.py index 7a5b403d36d7..749dafd035bf 100644 --- a/sdks/python/apache_beam/testing/test_stream_service_test.py +++ b/sdks/python/apache_beam/testing/test_stream_service_test.py @@ -17,10 +17,8 @@ # pytype: skip-file -from __future__ import absolute_import - -import sys import unittest +from unittest.mock import patch import grpc @@ -31,13 +29,6 @@ from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload from apache_beam.testing.test_stream_service import TestStreamServiceController -# TODO(BEAM-8288): clean up the work-around of nose tests using Python2 without -# unittest.mock module. -try: - from unittest.mock import patch -except ImportError: - from mock import patch # type: ignore[misc] - # Nose automatically detects tests if they match a regex. Here, it mistakens # these protos as tests. 
For more info see the Nose docs at: # https://nose.readthedocs.io/en/latest/writing_tests.html @@ -124,8 +115,6 @@ def test_multiple_sessions(self): self.assertEqual(events_b, expected_events) -@unittest.skipIf( - sys.version_info < (3, 6), 'The tests require at least Python 3.6 to work.') class TestStreamServiceStartStopTest(unittest.TestCase): # Weak internal use needs to be explicitly imported. diff --git a/sdks/python/apache_beam/testing/test_stream_test.py b/sdks/python/apache_beam/testing/test_stream_test.py index fbba8def73a2..a4580b787abb 100644 --- a/sdks/python/apache_beam/testing/test_stream_test.py +++ b/sdks/python/apache_beam/testing/test_stream_test.py @@ -19,13 +19,12 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest import apache_beam as beam from apache_beam.options.pipeline_options import PipelineOptions from apache_beam.options.pipeline_options import StandardOptions +from apache_beam.options.pipeline_options import TypeOptions from apache_beam.portability import common_urns from apache_beam.portability.api.beam_interactive_api_pb2 import TestStreamFileHeader from apache_beam.portability.api.beam_interactive_api_pb2 import TestStreamFileRecord @@ -333,6 +332,27 @@ def process( ('a', timestamp.Timestamp(5), beam.window.IntervalWindow(5, 10)), ])) + def test_instance_check_windowed_value_holder(self): + windowed_value = WindowedValue( + 'a', + Timestamp(5), [beam.window.IntervalWindow(5, 10)], + PaneInfo(True, True, PaneInfoTiming.ON_TIME, 0, 0)) + self.assertTrue( + isinstance(WindowedValueHolder(windowed_value), WindowedValueHolder)) + self.assertTrue( + isinstance( + beam.Row( + windowed_value=windowed_value, urn=common_urns.coders.ROW.urn), + WindowedValueHolder)) + self.assertFalse( + isinstance( + beam.Row(windowed_value=windowed_value), WindowedValueHolder)) + self.assertFalse(isinstance(windowed_value, WindowedValueHolder)) + self.assertFalse( + isinstance(beam.Row(x=windowed_value), WindowedValueHolder)) + self.assertFalse( + isinstance(beam.Row(windowed_value=1), WindowedValueHolder)) + def test_gbk_execution_no_triggers(self): test_stream = ( TestStream().advance_watermark_to(10).add_elements([ @@ -429,6 +449,7 @@ def test_gbk_execution_after_processing_trigger_fired(self): options = PipelineOptions() options.view_as(StandardOptions).streaming = True + options.view_as(TypeOptions).allow_unsafe_triggers = True p = TestPipeline(options=options) records = ( p diff --git a/sdks/python/apache_beam/testing/test_utils.py b/sdks/python/apache_beam/testing/test_utils.py index cb81091171e3..b58bfcde8766 100644 --- a/sdks/python/apache_beam/testing/test_utils.py +++ b/sdks/python/apache_beam/testing/test_utils.py @@ -22,17 +22,11 @@ # pytype: skip-file -from __future__ import absolute_import - import hashlib import imp import os import shutil import tempfile -from builtins import object - -from mock import Mock -from mock import patch from apache_beam.io.filesystems import FileSystems from apache_beam.utils import retry @@ -101,6 +95,10 @@ def patch_retry(testcase, module): module: The module that uses retry and need to be replaced with mock clock and logger in test. 
""" + # Import mock here to avoid execution time errors for other utilities + from mock import Mock + from mock import patch + real_retry_with_exponential_backoff = retry.with_exponential_backoff def patched_retry_with_exponential_backoff(**kwargs): diff --git a/sdks/python/apache_beam/testing/test_utils_test.py b/sdks/python/apache_beam/testing/test_utils_test.py index 07c2a3196dd3..a65712184b7b 100644 --- a/sdks/python/apache_beam/testing/test_utils_test.py +++ b/sdks/python/apache_beam/testing/test_utils_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import os import tempfile diff --git a/sdks/python/apache_beam/testing/util.py b/sdks/python/apache_beam/testing/util.py index 110ec4190034..595158150ad1 100644 --- a/sdks/python/apache_beam/testing/util.py +++ b/sdks/python/apache_beam/testing/util.py @@ -19,13 +19,10 @@ # pytype: skip-file -from __future__ import absolute_import - import collections import glob import io import tempfile -from builtins import object from apache_beam import pvalue from apache_beam.transforms import window @@ -75,10 +72,6 @@ def __init__(self, iterable): def __eq__(self, other): return self._counter == collections.Counter(other) - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - def __hash__(self): return hash(self._counter) @@ -289,11 +282,13 @@ def expand(self, pcoll): pcoll = pcoll | ParDo(ReifyTimestampWindow()) keyed_singleton = pcoll.pipeline | Create([(None, None)]) + keyed_singleton.is_bounded = True if use_global_window: pcoll = pcoll | WindowInto(window.GlobalWindows()) keyed_actual = pcoll | "ToVoidKey" >> Map(lambda v: (None, v)) + keyed_actual.is_bounded = True # This is a CoGroupByKey so that the matcher always runs, even if the # PCollection is empty. diff --git a/sdks/python/apache_beam/testing/util_test.py b/sdks/python/apache_beam/testing/util_test.py index 7efbdb43b75f..98c1349ef36c 100644 --- a/sdks/python/apache_beam/testing/util_test.py +++ b/sdks/python/apache_beam/testing/util_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest import apache_beam as beam @@ -43,12 +41,6 @@ class UtilTest(unittest.TestCase): - def setUp(self): - try: # Python 3 - _ = self.assertRaisesRegex - except AttributeError: # Python 2 - self.assertRaisesRegex = self.assertRaisesRegexp - def test_assert_that_passes(self): with TestPipeline() as p: assert_that(p | Create([1, 2, 3]), equal_to([1, 2, 3])) diff --git a/sdks/python/apache_beam/tools/__init__.py b/sdks/python/apache_beam/tools/__init__.py index 30ca5c31be0f..7b7c3b391932 100644 --- a/sdks/python/apache_beam/tools/__init__.py +++ b/sdks/python/apache_beam/tools/__init__.py @@ -19,4 +19,3 @@ For internal use only; no backwards-compatibility guarantees. 
""" -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/tools/coders_microbenchmark.py b/sdks/python/apache_beam/tools/coders_microbenchmark.py index 902812fc76de..6474ac44a995 100644 --- a/sdks/python/apache_beam/tools/coders_microbenchmark.py +++ b/sdks/python/apache_beam/tools/coders_microbenchmark.py @@ -31,9 +31,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import argparse import logging import random @@ -41,8 +38,6 @@ import string import sys -from past.builtins import unicode - from apache_beam.coders import proto2_coder_test_messages_pb2 as test_message from apache_beam.coders import coders from apache_beam.tools import utils @@ -81,10 +76,9 @@ def large_int(): def random_string(length): - return unicode( - ''.join( - random.choice(string.ascii_letters + string.digits) - for _ in range(length))) + return ''.join( + random.choice(string.ascii_letters + string.digits) + for _ in range(length)) def small_string(): diff --git a/sdks/python/apache_beam/tools/distribution_counter_microbenchmark.py b/sdks/python/apache_beam/tools/distribution_counter_microbenchmark.py index c29b7ff5cf1a..4d867cccf9ce 100644 --- a/sdks/python/apache_beam/tools/distribution_counter_microbenchmark.py +++ b/sdks/python/apache_beam/tools/distribution_counter_microbenchmark.py @@ -26,15 +26,10 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import logging import random import sys import time -from builtins import range from apache_beam.tools import utils diff --git a/sdks/python/apache_beam/tools/fn_api_runner_microbenchmark.py b/sdks/python/apache_beam/tools/fn_api_runner_microbenchmark.py index 0b9eb6a0df75..a7bf7cbae317 100644 --- a/sdks/python/apache_beam/tools/fn_api_runner_microbenchmark.py +++ b/sdks/python/apache_beam/tools/fn_api_runner_microbenchmark.py @@ -56,14 +56,9 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import argparse import logging import random -from builtins import range import apache_beam as beam import apache_beam.typehints.typehints as typehints @@ -77,7 +72,7 @@ NUM_PARALLEL_STAGES = 7 -NUM_SERIAL_STAGES = 5 +NUM_SERIAL_STAGES = 7 class BagInStateOutputAfterTimer(beam.DoFn): @@ -128,12 +123,13 @@ def _pipeline_runner(): return _pipeline_runner -def run_benchmark(starting_point, num_runs, num_elements_step, verbose): +def run_benchmark( + starting_point=1, num_runs=10, num_elements_step=100, verbose=True): suite = [ utils.LinearRegressionBenchmarkConfig( run_single_pipeline, starting_point, num_elements_step, num_runs) ] - utils.run_benchmarks(suite, verbose=verbose) + return utils.run_benchmarks(suite, verbose=verbose) if __name__ == '__main__': diff --git a/sdks/python/apache_beam/tools/map_fn_microbenchmark.py b/sdks/python/apache_beam/tools/map_fn_microbenchmark.py index a6613f0d8a67..ddafbc147b5f 100644 --- a/sdks/python/apache_beam/tools/map_fn_microbenchmark.py +++ b/sdks/python/apache_beam/tools/map_fn_microbenchmark.py @@ -32,14 +32,8 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import logging import time -from builtins import range -from builtins import zip import apache_beam as beam from apache_beam.tools import utils diff --git a/sdks/python/apache_beam/tools/microbenchmarks_test.py 
b/sdks/python/apache_beam/tools/microbenchmarks_test.py index 850ef33abf00..aacf05851e3a 100644 --- a/sdks/python/apache_beam/tools/microbenchmarks_test.py +++ b/sdks/python/apache_beam/tools/microbenchmarks_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest from pkg_resources import DistributionNotFound diff --git a/sdks/python/apache_beam/tools/runtime_type_check_microbenchmark.py b/sdks/python/apache_beam/tools/runtime_type_check_microbenchmark.py index 2dfe336312a0..6ba01b9a5172 100644 --- a/sdks/python/apache_beam/tools/runtime_type_check_microbenchmark.py +++ b/sdks/python/apache_beam/tools/runtime_type_check_microbenchmark.py @@ -27,12 +27,7 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import logging -from builtins import range from collections import defaultdict from time import time from typing import Iterable diff --git a/sdks/python/apache_beam/tools/sideinput_microbenchmark.py b/sdks/python/apache_beam/tools/sideinput_microbenchmark.py index 9c69b3f9d180..69f8bff7ad73 100644 --- a/sdks/python/apache_beam/tools/sideinput_microbenchmark.py +++ b/sdks/python/apache_beam/tools/sideinput_microbenchmark.py @@ -25,13 +25,8 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import logging import time -from builtins import range from apache_beam.runners.worker import opcounters from apache_beam.runners.worker import sideinputs diff --git a/sdks/python/apache_beam/tools/teststream_microbenchmark.py b/sdks/python/apache_beam/tools/teststream_microbenchmark.py new file mode 100644 index 000000000000..4b00de043df1 --- /dev/null +++ b/sdks/python/apache_beam/tools/teststream_microbenchmark.py @@ -0,0 +1,125 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""A microbenchmark for measuring changes in the performance of TestStream +running locally. +This microbenchmark attempts to measure the overhead of the main data paths +for the TestStream. Specifically new elements, watermark changes and processing +time advances. + +This runs a series of N parallel pipelines with M parallel stages each. Each +stage does the following: + +1) Put all the PCollection elements in a window +2) Wait until the watermark advances past the end of the window. +3) When the watermark passes, change the key and output all the elements +4) Go back to #1 until all elements in the stream have been consumed. + +This executes the same codepaths that are run on the Fn API (and Dataflow) +workers, but is generally easier to run (locally) and more stable. 
+ +Run as + + python -m apache_beam.tools.teststream_microbenchmark + +""" + +# pytype: skip-file + +import argparse +import itertools +import logging +import random + +import apache_beam as beam +import apache_beam.typehints.typehints as typehints +from apache_beam import WindowInto +from apache_beam.runners import DirectRunner +from apache_beam.testing.test_stream import TestStream +from apache_beam.tools import utils +from apache_beam.transforms.window import FixedWindows + +NUM_PARALLEL_STAGES = 7 + +NUM_SERIAL_STAGES = 6 + + +class RekeyElements(beam.DoFn): + def process(self, element): + _, values = element + return [(random.randint(0, 1000), v) for v in values] + + +def _build_serial_stages(input_pc, num_serial_stages, stage_count): + pc = (input_pc | ('gbk_start_stage%s' % stage_count) >> beam.GroupByKey()) + + for i in range(num_serial_stages): + pc = ( + pc + | ('stage%s_map%s' % (stage_count, i)) >> beam.ParDo( + RekeyElements()).with_output_types(typehints.KV[int, int]) + | ('stage%s_gbk%s' % (stage_count, i)) >> beam.GroupByKey()) + + return pc + + +def run_single_pipeline(size): + def _pipeline_runner(): + with beam.Pipeline(runner=DirectRunner()) as p: + ts = TestStream().advance_watermark_to(0) + all_elements = iter(range(size)) + watermark = 0 + while True: + next_batch = list(itertools.islice(all_elements, 100)) + if not next_batch: + break + ts = ts.add_elements([(i, random.randint(0, 1000)) for i in next_batch]) + watermark = watermark + 100 + ts = ts.advance_watermark_to(watermark) + ts = ts.advance_watermark_to_infinity() + + input_pc = p | ts | WindowInto(FixedWindows(100)) + for i in range(NUM_PARALLEL_STAGES): + _build_serial_stages(input_pc, NUM_SERIAL_STAGES, i) + + return _pipeline_runner + + +def run_benchmark( + starting_point=1, num_runs=10, num_elements_step=300, verbose=True): + suite = [ + utils.LinearRegressionBenchmarkConfig( + run_single_pipeline, starting_point, num_elements_step, num_runs) + ] + return utils.run_benchmarks(suite, verbose=verbose) + + +if __name__ == '__main__': + logging.basicConfig() + utils.check_compiled('apache_beam.runners.common') + + parser = argparse.ArgumentParser() + parser.add_argument('--num_runs', default=10, type=int) + parser.add_argument('--starting_point', default=1, type=int) + parser.add_argument('--increment', default=300, type=int) + parser.add_argument('--verbose', default=True, type=bool) + options = parser.parse_args() + + run_benchmark( + options.starting_point, + options.num_runs, + options.increment, + options.verbose) diff --git a/sdks/python/apache_beam/tools/utils.py b/sdks/python/apache_beam/tools/utils.py index 1ad5403b4be5..2f27598fbee9 100644 --- a/sdks/python/apache_beam/tools/utils.py +++ b/sdks/python/apache_beam/tools/utils.py @@ -19,10 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import collections import gc import importlib diff --git a/sdks/python/apache_beam/transforms/__init__.py b/sdks/python/apache_beam/transforms/__init__.py index ab35331f2ba9..4e66a290842c 100644 --- a/sdks/python/apache_beam/transforms/__init__.py +++ b/sdks/python/apache_beam/transforms/__init__.py @@ -18,7 +18,6 @@ """PTransform and descendants.""" # pylint: disable=wildcard-import -from __future__ import absolute_import from apache_beam.transforms import combiners from apache_beam.transforms.core import * diff --git a/sdks/python/apache_beam/transforms/combinefn_lifecycle_pipeline.py 
b/sdks/python/apache_beam/transforms/combinefn_lifecycle_pipeline.py index 5186e090b073..1964082e8dab 100644 --- a/sdks/python/apache_beam/transforms/combinefn_lifecycle_pipeline.py +++ b/sdks/python/apache_beam/transforms/combinefn_lifecycle_pipeline.py @@ -102,6 +102,8 @@ def run_combine(pipeline, input_elements=5, lift_combiners=True): # Enable runtime type checking in order to cover TypeCheckCombineFn by # the test. pipeline.get_pipeline_options().view_as(TypeOptions).runtime_type_check = True + pipeline.get_pipeline_options().view_as( + TypeOptions).allow_unsafe_triggers = True with pipeline as p: pcoll = p | 'Start' >> beam.Create(range(input_elements)) diff --git a/sdks/python/apache_beam/transforms/combinefn_lifecycle_test.py b/sdks/python/apache_beam/transforms/combinefn_lifecycle_test.py index b55baed8726a..a244f805ee35 100644 --- a/sdks/python/apache_beam/transforms/combinefn_lifecycle_test.py +++ b/sdks/python/apache_beam/transforms/combinefn_lifecycle_test.py @@ -22,10 +22,11 @@ import unittest from functools import wraps -from nose.plugins.attrib import attr +import pytest from parameterized import parameterized_class from apache_beam.options.pipeline_options import DebugOptions +from apache_beam.options.pipeline_options import PipelineOptions from apache_beam.options.pipeline_options import StandardOptions from apache_beam.runners.direct import direct_runner from apache_beam.runners.portability import fn_api_runner @@ -54,7 +55,7 @@ def wrapped(*args, **kwargs): return wrapped -@attr('ValidatesRunner') +@pytest.mark.it_validatesrunner class CombineFnLifecycleTest(unittest.TestCase): def setUp(self): self.pipeline = TestPipeline(is_integration_test=True) @@ -69,6 +70,9 @@ def test_non_liftable_combine(self): @skip_unless_v2 def test_combining_value_state(self): + if ('DataflowRunner' in self.pipeline.get_pipeline_options().view_as( + StandardOptions).runner): + self.skipTest('BEAM-11793') run_pardo(self.pipeline) @@ -85,7 +89,10 @@ def test_combine(self): self._assert_teardown_called() def test_non_liftable_combine(self): - run_combine(TestPipeline(runner=self.runner()), lift_combiners=False) + test_options = PipelineOptions(flags=['--allow_unsafe_triggers']) + run_combine( + TestPipeline(runner=self.runner(), options=test_options), + lift_combiners=False) self._assert_teardown_called() def test_combining_value_state(self): diff --git a/sdks/python/apache_beam/transforms/combiners.py b/sdks/python/apache_beam/transforms/combiners.py index 68c9e084f4cc..41ad3df14178 100644 --- a/sdks/python/apache_beam/transforms/combiners.py +++ b/sdks/python/apache_beam/transforms/combiners.py @@ -19,17 +19,11 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import copy import heapq +import itertools import operator import random -import sys -import warnings -from builtins import object -from builtins import zip from typing import Any from typing import Dict from typing import Iterable @@ -39,8 +33,6 @@ from typing import TypeVar from typing import Union -from past.builtins import long - from apache_beam import typehints from apache_beam.transforms import core from apache_beam.transforms import cy_combiners @@ -66,6 +58,7 @@ class CombinerWithoutDefaults(ptransform.PTransform): """Super class to inherit without_defaults to built-in Combiners.""" def __init__(self, has_defaults=True): + super(CombinerWithoutDefaults, self).__init__() self.has_defaults = has_defaults def with_defaults(self, has_defaults=True): @@ -95,7 +88,7 @@ def expand(self, 
pcoll): # TODO(laolu): This type signature is overly restrictive. This should be # more general. -@with_input_types(Union[float, int, long]) +@with_input_types(Union[float, int]) @with_output_types(float) class MeanCombineFn(core.CombineFn): """CombineFn for computing an arithmetic mean.""" @@ -176,7 +169,8 @@ class Top(object): """Combiners for obtaining extremal elements.""" # pylint: disable=no-self-argument - + @with_input_types(T) + @with_output_types(List[T]) class Of(CombinerWithoutDefaults): """Obtain a list of the compare-most N elements in a PCollection. @@ -184,94 +178,53 @@ class Of(CombinerWithoutDefaults): to which it is applied, where "greatest" is determined by the comparator function supplied as the compare argument. """ - def _py2__init__(self, n, compare=None, *args, **kwargs): - """Initializer. - - compare should be an implementation of "a < b" taking at least two - arguments (a and b). Additional arguments and side inputs specified in - the apply call become additional arguments to the comparator. Defaults to - the natural ordering of the elements. - The arguments 'key' and 'reverse' may instead be passed as keyword - arguments, and have the same meaning as for Python's sort functions. - - Args: - pcoll: PCollection to process. - n: number of elements to extract from pcoll. - compare: as described above. - *args: as described above. - **kwargs: as described above. - """ - super(Top.Of, self).__init__() - if compare: - warnings.warn( - 'Compare not available in Python 3, use key instead.', - DeprecationWarning) - self._n = n - self._compare = compare - self._key = kwargs.pop('key', None) - self._reverse = kwargs.pop('reverse', False) - self._args = args - self._kwargs = kwargs - - def _py3__init__(self, n, **kwargs): + def __init__(self, n, key=None, reverse=False): """Creates a global Top operation. The arguments 'key' and 'reverse' may be passed as keyword arguments, and have the same meaning as for Python's sort functions. Args: - pcoll: PCollection to process. n: number of elements to extract from pcoll. - **kwargs: may contain 'key' and/or 'reverse' + key: (optional) a mapping of elements to a comparable key, similar to + the key argument of Python's sorting methods. + reverse: (optional) whether to order things smallest to largest, rather + than largest to smallest """ - unknown_kwargs = set(kwargs.keys()) - set(['key', 'reverse']) - if unknown_kwargs: - raise ValueError( - 'Unknown keyword arguments: ' + ', '.join(unknown_kwargs)) - self._py2__init__(n, None, **kwargs) - - # Python 3 sort does not accept a comparison operator, and nor do we. - # FIXME: mypy would handle this better if we placed the _py*__init__ funcs - # inside the if/else block below: - if sys.version_info[0] < 3: - __init__ = _py2__init__ - else: - __init__ = _py3__init__ # type: ignore + super(Top.Of, self).__init__() + self._n = n + self._key = key + self._reverse = reverse def default_label(self): return 'Top(%d)' % self._n def expand(self, pcoll): - compare = self._compare - if (not self._args and not self._kwargs and pcoll.windowing.is_default()): - if self._reverse: - if compare is None or compare is operator.lt: - compare = operator.gt - else: - original_compare = compare - compare = lambda a, b: original_compare(b, a) + if pcoll.windowing.is_default(): # This is a more efficient global algorithm. 
top_per_bundle = pcoll | core.ParDo( - _TopPerBundle(self._n, compare, self._key)) - # If pcoll is empty, we can't guerentee that top_per_bundle + _TopPerBundle(self._n, self._key, self._reverse)) + # If pcoll is empty, we can't guarantee that top_per_bundle # won't be empty, so inject at least one empty accumulator - # so that downstream is guerenteed to produce non-empty output. - empty_bundle = pcoll.pipeline | core.Create([(None, [])]) + # so that downstream is guaranteed to produce non-empty output. + empty_bundle = ( + pcoll.pipeline | core.Create([(None, [])]).with_output_types( + top_per_bundle.element_type)) return ((top_per_bundle, empty_bundle) | core.Flatten() | core.GroupByKey() - | core.ParDo(_MergeTopPerBundle(self._n, compare, self._key))) + | core.ParDo( + _MergeTopPerBundle(self._n, self._key, self._reverse))) else: if self.has_defaults: return pcoll | core.CombineGlobally( - TopCombineFn(self._n, compare, self._key, self._reverse), - *self._args, - **self._kwargs) + TopCombineFn(self._n, self._key, self._reverse)) else: return pcoll | core.CombineGlobally( - TopCombineFn(self._n, compare, self._key, self._reverse), - *self._args, - **self._kwargs).without_defaults() + TopCombineFn(self._n, self._key, + self._reverse)).without_defaults() + @with_input_types(Tuple[K, V]) + @with_output_types(Tuple[K, List[V]]) class PerKey(ptransform.PTransform): """Identifies the compare-most N elements associated with each key. @@ -280,56 +233,22 @@ class PerKey(ptransform.PTransform): "greatest" is determined by the comparator function supplied as the compare argument in the initializer. """ - def _py2__init__(self, n, compare=None, *args, **kwargs): - """Initializer. - - compare should be an implementation of "a < b" taking at least two - arguments (a and b). Additional arguments and side inputs specified in - the apply call become additional arguments to the comparator. Defaults to - the natural ordering of the elements. - - The arguments 'key' and 'reverse' may instead be passed as keyword - arguments, and have the same meaning as for Python's sort functions. - - Args: - n: number of elements to extract from input. - compare: as described above. - *args: as described above. - **kwargs: as described above. - """ - if compare: - warnings.warn( - 'Compare not available in Python 3, use key instead.', - DeprecationWarning) - self._n = n - self._compare = compare - self._key = kwargs.pop('key', None) - self._reverse = kwargs.pop('reverse', False) - self._args = args - self._kwargs = kwargs - - def _py3__init__(self, n, **kwargs): + def __init__(self, n, key=None, reverse=False): """Creates a per-key Top operation. The arguments 'key' and 'reverse' may be passed as keyword arguments, and have the same meaning as for Python's sort functions. Args: - pcoll: PCollection to process. n: number of elements to extract from pcoll. - **kwargs: may contain 'key' and/or 'reverse' + key: (optional) a mapping of elements to a comparable key, similar to + the key argument of Python's sorting methods. + reverse: (optional) whether to order things smallest to largest, rather + than largest to smallest """ - unknown_kwargs = set(kwargs.keys()) - set(['key', 'reverse']) - if unknown_kwargs: - raise ValueError( - 'Unknown keyword arguments: ' + ', '.join(unknown_kwargs)) - self._py2__init__(n, None, **kwargs) - - # Python 3 sort does not accept a comparison operator, and nor do we. 
- if sys.version_info[0] < 3: - __init__ = _py2__init__ - else: - __init__ = _py3__init__ # type: ignore + self._n = n + self._key = key + self._reverse = reverse def default_label(self): return 'TopPerKey(%d)' % self._n @@ -347,9 +266,7 @@ def expand(self, pcoll): the PCollection containing the result. """ return pcoll | core.CombinePerKey( - TopCombineFn(self._n, self._compare, self._key, self._reverse), - *self._args, - **self._kwargs) + TopCombineFn(self._n, self._key, self._reverse)) @staticmethod @ptransform.ptransform_fn @@ -385,18 +302,17 @@ def SmallestPerKey(pcoll, n, reverse=True): @with_input_types(T) @with_output_types(Tuple[None, List[T]]) class _TopPerBundle(core.DoFn): - def __init__(self, n, less_than, key): + def __init__(self, n, key, reverse): self._n = n - self._less_than = None if less_than is operator.le else less_than + self._compare = operator.gt if reverse else None self._key = key def start_bundle(self): self._heap = [] def process(self, element): - if self._less_than or self._key: - element = cy_combiners.ComparableValue( - element, self._less_than, self._key) + if self._compare or self._key: + element = cy_combiners.ComparableValue(element, self._compare, self._key) if len(self._heap) < self._n: heapq.heappush(self._heap, element) else: @@ -410,7 +326,7 @@ def finish_bundle(self): self._heap.sort() # Unwrap to avoid serialization via pickle. - if self._less_than or self._key: + if self._compare or self._key: yield window.GlobalWindows.windowed_value( (None, [wrapper.value for wrapper in self._heap])) else: @@ -420,9 +336,9 @@ def finish_bundle(self): @with_input_types(Tuple[None, Iterable[List[T]]]) @with_output_types(List[T]) class _MergeTopPerBundle(core.DoFn): - def __init__(self, n, less_than, key): + def __init__(self, n, key, reverse): self._n = n - self._less_than = None if less_than is operator.lt else less_than + self._compare = operator.gt if reverse else None self._key = key def process(self, key_and_bundles): @@ -440,19 +356,19 @@ def push(hp, e): heapq.heappushpop(hp, e) return False - if self._less_than or self._key: + if self._compare or self._key: heapc = [] # type: List[cy_combiners.ComparableValue] for bundle in bundles: if not heapc: heapc = [ - cy_combiners.ComparableValue(element, self._less_than, self._key) + cy_combiners.ComparableValue(element, self._compare, self._key) for element in bundle ] continue for element in reversed(bundle): if push(heapc, cy_combiners.ComparableValue(element, - self._less_than, + self._compare, self._key)): break heapc.sort() @@ -479,33 +395,14 @@ class TopCombineFn(core.CombineFn): This CombineFn uses a key or comparison operator to rank the elements. Args: - compare: (optional) an implementation of "a < b" taking at least two - arguments (a and b). Additional arguments and side inputs specified - in the apply call become additional arguments to the comparator. key: (optional) a mapping of elements to a comparable key, similar to the key argument of Python's sorting methods. reverse: (optional) whether to order things smallest to largest, rather than largest to smallest """ - - # TODO(robertwb): For Python 3, remove compare and only keep key. 
- def __init__(self, n, compare=None, key=None, reverse=False): + def __init__(self, n, key=None, reverse=False): self._n = n - - if compare is operator.lt: - compare = None - elif compare is operator.gt: - compare = None - reverse = not reverse - - if compare: - self._compare = (( - lambda a, b, *args, **kwargs: not compare(a, b, *args, **kwargs)) - if reverse else compare) - else: - self._compare = operator.gt if reverse else operator.lt - - self._less_than = None + self._compare = operator.gt if reverse else operator.lt self._key = key def _hydrated_heap(self, heap): @@ -513,18 +410,16 @@ def _hydrated_heap(self, heap): first = heap[0] if isinstance(first, cy_combiners.ComparableValue): if first.requires_hydration: - assert self._less_than is not None for comparable in heap: assert comparable.requires_hydration - comparable.hydrate(self._less_than, self._key) + comparable.hydrate(self._compare, self._key) assert not comparable.requires_hydration return heap else: return heap else: - assert self._less_than is not None return [ - cy_combiners.ComparableValue(element, self._less_than, self._key) + cy_combiners.ComparableValue(element, self._compare, self._key) for element in heap ] else: @@ -553,21 +448,15 @@ def create_accumulator(self, *args, **kwargs): def add_input(self, accumulator, element, *args, **kwargs): # Caching to avoid paying the price of variadic expansion of args / kwargs # when it's not needed (for the 'if' case below). - if self._less_than is None: - if args or kwargs: - self._less_than = lambda a, b: self._compare(a, b, *args, **kwargs) - else: - self._less_than = self._compare - holds_comparables, heap = accumulator - if self._less_than is not operator.lt or self._key: + if self._compare is not operator.lt or self._key: heap = self._hydrated_heap(heap) holds_comparables = True else: assert not holds_comparables comparable = ( - cy_combiners.ComparableValue(element, self._less_than, self._key) + cy_combiners.ComparableValue(element, self._compare, self._key) if holds_comparables else element) if len(heap) < self._n: @@ -577,19 +466,11 @@ def add_input(self, accumulator, element, *args, **kwargs): return (holds_comparables, heap) def merge_accumulators(self, accumulators, *args, **kwargs): - if args or kwargs: - self._less_than = lambda a, b: self._compare(a, b, *args, **kwargs) - add_input = lambda accumulator, element: self.add_input( - accumulator, element, *args, **kwargs) - else: - self._less_than = self._compare - add_input = self.add_input - result_heap = None holds_comparables = None for accumulator in accumulators: holds_comparables, heap = accumulator - if self._less_than is not operator.lt or self._key: + if self._compare is not operator.lt or self._key: heap = self._hydrated_heap(heap) holds_comparables = True else: @@ -599,7 +480,7 @@ def merge_accumulators(self, accumulators, *args, **kwargs): result_heap = heap else: for comparable in heap: - _, result_heap = add_input( + _, result_heap = self.add_input( (holds_comparables, result_heap), comparable.value if holds_comparables else comparable) @@ -615,13 +496,8 @@ def compact(self, accumulator, *args, **kwargs): return accumulator def extract_output(self, accumulator, *args, **kwargs): - if args or kwargs: - self._less_than = lambda a, b: self._compare(a, b, *args, **kwargs) - else: - self._less_than = self._compare - holds_comparables, heap = accumulator - if self._less_than is not operator.lt or self._key: + if self._compare is not operator.lt or self._key: if not holds_comparables: heap = 
self._hydrated_heap(heap) holds_comparables = True @@ -654,6 +530,8 @@ class Sample(object): # pylint: disable=no-self-argument + @with_input_types(T) + @with_output_types(List[T]) class FixedSizeGlobally(CombinerWithoutDefaults): """Sample n elements from the input PCollection without replacement.""" def __init__(self, n): @@ -673,6 +551,8 @@ def display_data(self): def default_label(self): return 'FixedSizeGlobally(%d)' % self._n + @with_input_types(Tuple[K, V]) + @with_output_types(Tuple[K, List[V]]) class FixedSizePerKey(ptransform.PTransform): """Sample n elements associated with each key without replacement.""" def __init__(self, n): @@ -727,16 +607,25 @@ def teardown(self): class _TupleCombineFnBase(core.CombineFn): - def __init__(self, *combiners): + def __init__(self, *combiners, merge_accumulators_batch_size=None): self._combiners = [core.CombineFn.maybe_from_callable(c) for c in combiners] self._named_combiners = combiners + # If the `merge_accumulators_batch_size` value is not specified, we chose a + # bounded default that is inversely proportional to the number of + # accumulators in merged tuples. + num_combiners = max(1, len(combiners)) + self._merge_accumulators_batch_size = ( + merge_accumulators_batch_size or max(10, 1000 // num_combiners)) def display_data(self): combiners = [ c.__name__ if hasattr(c, '__name__') else c.__class__.__name__ for c in self._named_combiners ] - return {'combiners': str(combiners)} + return { + 'combiners': str(combiners), + 'merge_accumulators_batch_size': self._merge_accumulators_batch_size + } def setup(self, *args, **kwargs): for c in self._combiners: @@ -746,10 +635,22 @@ def create_accumulator(self, *args, **kwargs): return [c.create_accumulator(*args, **kwargs) for c in self._combiners] def merge_accumulators(self, accumulators, *args, **kwargs): - return [ - c.merge_accumulators(a, *args, **kwargs) for c, - a in zip(self._combiners, zip(*accumulators)) - ] + # Make sure that `accumulators` is an iterator (so that the position is + # remembered). + accumulators = iter(accumulators) + result = next(accumulators) + while True: + # Load accumulators into memory and merge in batches to decrease peak + # memory usage. + accumulators_batch = [result] + list( + itertools.islice(accumulators, self._merge_accumulators_batch_size)) + if len(accumulators_batch) == 1: + break + result = [ + c.merge_accumulators(a, *args, **kwargs) for c, + a in zip(self._combiners, zip(*accumulators_batch)) + ] + return result def compact(self, accumulator, *args, **kwargs): return [ @@ -800,11 +701,10 @@ def add_input(self, accumulator, element, *args, **kwargs): ] +@with_input_types(T) +@with_output_types(List[T]) class ToList(CombinerWithoutDefaults): """A global CombineFn that condenses a PCollection into a single list.""" - def __init__(self, label='ToList'): # pylint: disable=useless-super-delegation - super(ToList, self).__init__(label) - def expand(self, pcoll): if self.has_defaults: return pcoll | self.label >> core.CombineGlobally(ToListCombineFn()) @@ -831,6 +731,8 @@ def extract_output(self, accumulator): return accumulator +@with_input_types(Tuple[K, V]) +@with_output_types(Dict[K, V]) class ToDict(CombinerWithoutDefaults): """A global CombineFn that condenses a PCollection into a single dict. @@ -838,9 +740,6 @@ class ToDict(CombinerWithoutDefaults): If multiple values are associated with the same key, only one of the values will be present in the resulting dict. 
""" - def __init__(self, label='ToDict'): # pylint: disable=useless-super-delegation - super(ToDict, self).__init__(label) - def expand(self, pcoll): if self.has_defaults: return pcoll | self.label >> core.CombineGlobally(ToDictCombineFn()) @@ -871,11 +770,10 @@ def extract_output(self, accumulator): return accumulator +@with_input_types(T) +@with_output_types(Set[T]) class ToSet(CombinerWithoutDefaults): """A global CombineFn that condenses a PCollection into a set.""" - def __init__(self, label='ToSet'): # pylint: disable=useless-super-delegation - super(ToSet, self).__init__(label) - def expand(self, pcoll): if self.has_defaults: return pcoll | self.label >> core.CombineGlobally(ToSetCombineFn()) diff --git a/sdks/python/apache_beam/transforms/combiners_test.py b/sdks/python/apache_beam/transforms/combiners_test.py index 60cd2b899d3e..7e0e83542ee9 100644 --- a/sdks/python/apache_beam/transforms/combiners_test.py +++ b/sdks/python/apache_beam/transforms/combiners_test.py @@ -18,17 +18,12 @@ """Unit tests for our libraries of combine PTransforms.""" # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import itertools import random -import sys import unittest import hamcrest as hc -from future.builtins import range -from nose.plugins.attrib import attr +import pytest import apache_beam as beam import apache_beam.transforms.combiners as combine @@ -172,46 +167,6 @@ def test_top(self): assert_that( result_key_bot, equal_to([('a', [0, 1, 1, 1])]), label='key:bot') - @unittest.skipIf(sys.version_info[0] > 2, 'deprecated comparator') - def test_top_py2(self): - with TestPipeline() as pipeline: - - # A parameter we'll be sharing with a custom comparator. - names = { - 0: 'zo', - 1: 'one', - 2: 'twoo', - 3: 'three', - 5: 'fiiive', - 6: 'sssssix', - 9: 'nniiinne' - } - - # First for global combines. - pcoll = pipeline | 'start' >> Create([6, 3, 1, 1, 9, 1, 5, 2, 0, 6]) - - result_cmp = pcoll | 'cmp' >> combine.Top.Of( - 6, lambda a, b, names: len(names[a]) < len(names[b]), - names) # Note parameter passed to comparator. - result_cmp_rev = pcoll | 'cmp_rev' >> combine.Top.Of( - 3, - lambda a, b, names: len(names[a]) < len(names[b]), - names, # Note parameter passed to comparator. - reverse=True) - assert_that(result_cmp, equal_to([[9, 6, 6, 5, 3, 2]]), label='CheckCmp') - assert_that(result_cmp_rev, equal_to([[0, 1, 1]]), label='CheckCmpRev') - - # Again for per-key combines. - pcoll = pipeline | 'start-perkye' >> Create( - [('a', x) for x in [6, 3, 1, 1, 9, 1, 5, 2, 0, 6]]) - result_key_cmp = pcoll | 'cmp-perkey' >> combine.Top.PerKey( - 6, lambda a, b, names: len(names[a]) < len(names[b]), - names) # Note parameter passed to comparator. - assert_that( - result_key_cmp, - equal_to([('a', [9, 6, 6, 5, 3, 2])]), - label='key:cmp') - def test_empty_global_top(self): with TestPipeline() as p: assert_that(p | beam.Create([]) | combine.Top.Largest(10), equal_to([[]])) @@ -236,19 +191,6 @@ def test_top_key(self): | combine.Top.Of(3, key=len, reverse=True), [['c', 'aa', 'bbb']]) - @unittest.skipIf(sys.version_info[0] > 2, 'deprecated comparator') - def test_top_key_py2(self): - # The largest elements compared by their length mod 5. 
- self.assertEqual(['aa', 'bbbb', 'c', 'ddddd', 'eee', 'ffffff'] - | combine.Top.Of( - 3, - compare=lambda len_a, - len_b, - m: len_a % m > len_b % m, - key=len, - reverse=True, - m=5), [['bbbb', 'eee', 'aa']]) - def test_sharded_top_combine_fn(self): def test_combine_fn(combine_fn, shards, expected): accumulators = [ @@ -307,7 +249,8 @@ def test_basic_combiners_display_data(self): dd = DisplayData.create_from(transform) expected_items = [ DisplayDataItemMatcher('combine_fn', combine.TupleCombineFn), - DisplayDataItemMatcher('combiners', "['max', 'MeanCombineFn', 'sum']") + DisplayDataItemMatcher('combiners', "['max', 'MeanCombineFn', 'sum']"), + DisplayDataItemMatcher('merge_accumulators_batch_size', 333), ] hc.assert_that(dd.items, hc.contains_inanyorder(*expected_items)) @@ -416,6 +359,49 @@ def test_tuple_combine_fn_without_defaults(self): max).with_common_input()).without_defaults()) assert_that(result, equal_to([(1, 7.0 / 4, 3)])) + def test_empty_tuple_combine_fn(self): + with TestPipeline() as p: + result = ( + p + | Create([(), (), ()]) + | beam.CombineGlobally(combine.TupleCombineFn())) + assert_that(result, equal_to([()])) + + def test_tuple_combine_fn_batched_merge(self): + num_combine_fns = 10 + max_num_accumulators_in_memory = 30 + # Maximum number of accumulator tuples in memory - 1 for the merge result. + merge_accumulators_batch_size = ( + max_num_accumulators_in_memory // num_combine_fns - 1) + num_accumulator_tuples_to_merge = 20 + + class CountedAccumulator: + count = 0 + oom = False + + def __init__(self): + if CountedAccumulator.count > max_num_accumulators_in_memory: + CountedAccumulator.oom = True + else: + CountedAccumulator.count += 1 + + class CountedAccumulatorCombineFn(beam.CombineFn): + def create_accumulator(self): + return CountedAccumulator() + + def merge_accumulators(self, accumulators): + CountedAccumulator.count += 1 + for _ in accumulators: + CountedAccumulator.count -= 1 + + combine_fn = combine.TupleCombineFn( + *[CountedAccumulatorCombineFn() for _ in range(num_combine_fns)], + merge_accumulators_batch_size=merge_accumulators_batch_size) + combine_fn.merge_accumulators( + combine_fn.create_accumulator() + for _ in range(num_accumulator_tuples_to_merge)) + assert not CountedAccumulator.oom + def test_to_list_and_to_dict1(self): with TestPipeline() as pipeline: the_list = [6, 3, 1, 1, 9, 1, 5, 2, 0, 6] @@ -869,7 +855,7 @@ def test_with_input_types_decorator_violation(self): # # Test cases for streaming. 
# -@attr('ValidatesRunner') +@pytest.mark.it_validatesrunner class TimestampCombinerTest(unittest.TestCase): def test_combiner_earliest(self): """Test TimestampCombiner with EARLIEST.""" diff --git a/sdks/python/apache_beam/transforms/core.py b/sdks/python/apache_beam/transforms/core.py index ca204d7ddfee..25b05df14164 100644 --- a/sdks/python/apache_beam/transforms/core.py +++ b/sdks/python/apache_beam/transforms/core.py @@ -19,26 +19,17 @@ # pytype: skip-file -from __future__ import absolute_import - import copy import inspect import logging import random -import sys import types import typing -from builtins import map -from builtins import object -from builtins import range - -from past.builtins import unicode from apache_beam import coders from apache_beam import pvalue from apache_beam import typehints from apache_beam.coders import typecoders -from apache_beam.coders.coders import ExternalCoder from apache_beam.internal import pickler from apache_beam.internal import util from apache_beam.options.pipeline_options import TypeOptions @@ -69,7 +60,6 @@ from apache_beam.typehints.decorators import with_output_types from apache_beam.typehints.trivial_inference import element_type from apache_beam.typehints.typehints import is_consistent_with -from apache_beam.utils import proto_utils from apache_beam.utils import urns from apache_beam.utils.timestamp import Duration @@ -83,11 +73,6 @@ from apache_beam.transforms.trigger import DefaultTrigger from apache_beam.transforms.trigger import TriggerFn -try: - import funcsigs # Python 2 only. -except ImportError: - funcsigs = None - __all__ = [ 'DoFn', 'CombineFn', @@ -317,11 +302,13 @@ def restriction_coder(self): return coders.registry.get_coder(object) def restriction_size(self, element, restriction): - """Returns the size of an element with respect to the given element. + """Returns the size of a restriction with respect to the given element. By default, asks a newly-created restriction tracker for the default size of the restriction. + The return value must be non-negative. + This API is required to be implemented. """ raise NotImplementedError @@ -329,6 +316,8 @@ def restriction_size(self, element, restriction): def split_and_size(self, element, restriction): """Like split, but also does sizing, returning (restriction, size) pairs. + For each pair, size must be non-negative. + This API is optional if ``split`` and ``restriction_size`` have been implemented. """ @@ -388,12 +377,7 @@ def get_function_args_defaults(f): it doesn't include bound arguments and may follow function wrappers. """ signature = get_signature(f) - # Fall back on funcsigs if inspect module doesn't have 'Parameter'; prefer - # inspect.Parameter over funcsigs.Parameter if both are available. - try: - parameter = inspect.Parameter - except AttributeError: - parameter = funcsigs.Parameter + parameter = inspect.Parameter # TODO(BEAM-5878) support kwonlyargs on Python 3. _SUPPORTED_ARG_TYPES = [ parameter.POSITIONAL_ONLY, parameter.POSITIONAL_OR_KEYWORD @@ -410,95 +394,6 @@ def get_function_args_defaults(f): return args, defaults -class RunnerAPIPTransformHolder(PTransform): - """A `PTransform` that holds a runner API `PTransform` proto. - - This is used for transforms, for which corresponding objects - cannot be initialized in Python SDK. 
For example, for `ParDo` transforms for - remote SDKs that may be available in Python SDK transform graph when expanding - a cross-language transform since a Python `ParDo` object cannot be generated - without a serialized Python `DoFn` object. - """ - def __init__(self, proto, context): - self._proto = proto - self._context = context - - # For ParDos with side-inputs, this will be populated after this object is - # created. - self.side_inputs = [] - self.is_pardo_with_stateful_dofn = bool(self._get_pardo_state_specs()) - - def proto(self): - """Runner API payload for a `PTransform`""" - return self._proto - - def to_runner_api(self, context, **extra_kwargs): - # TODO(BEAM-7850): no need to copy around Environment if it is a direct - # attribute of PTransform. - id_to_proto_map = self._context.environments.get_id_to_proto_map() - for env_id in id_to_proto_map: - if env_id not in context.environments: - context.environments.put_proto(env_id, id_to_proto_map[env_id]) - else: - env1 = id_to_proto_map[env_id] - env2 = context.environments[env_id] - assert env1.urn == env2.to_runner_api(context).urn, ( - 'Expected environments with the same ID to be equal but received ' - 'environments with different URNs ' - '%r and %r', - env1.urn, env2.to_runner_api(context).urn) - assert env1.payload == env2.to_runner_api(context).payload, ( - 'Expected environments with the same ID to be equal but received ' - 'environments with different payloads ' - '%r and %r', - env1.payload, env2.to_runner_api(context).payload) - - def recursively_add_coder_protos(coder_id, old_context, new_context): - coder_proto = old_context.coders.get_proto_from_id(coder_id) - new_context.coders.put_proto(coder_id, coder_proto, True) - for component_coder_id in coder_proto.component_coder_ids: - recursively_add_coder_protos( - component_coder_id, old_context, new_context) - - if common_urns.primitives.PAR_DO.urn == self._proto.urn: - # If a restriction coder has been set by an external SDK, we have to - # explicitly add it (and all component coders recursively) to the context - # to make sure that it does not get dropped by Python SDK. - par_do_payload = proto_utils.parse_Bytes( - self._proto.payload, beam_runner_api_pb2.ParDoPayload) - if par_do_payload.restriction_coder_id: - recursively_add_coder_protos( - par_do_payload.restriction_coder_id, self._context, context) - elif (common_urns.composites.COMBINE_PER_KEY.urn == self._proto.urn or - common_urns.composites.COMBINE_GLOBALLY.urn == self._proto.urn): - # We have to include coders embedded in `CombinePayload`. - combine_payload = proto_utils.parse_Bytes( - self._proto.payload, beam_runner_api_pb2.CombinePayload) - if combine_payload.accumulator_coder_id: - recursively_add_coder_protos( - combine_payload.accumulator_coder_id, self._context, context) - - return self._proto - - def get_restriction_coder(self): - # For some runners, restriction coder ID has to be provided to correctly - # encode ParDo transforms that are SDF. 
- if common_urns.primitives.PAR_DO.urn == self._proto.urn: - par_do_payload = proto_utils.parse_Bytes( - self._proto.payload, beam_runner_api_pb2.ParDoPayload) - if par_do_payload.restriction_coder_id: - restriction_coder_proto = self._context.coders.get_proto_from_id( - par_do_payload.restriction_coder_id) - - return ExternalCoder(restriction_coder_proto) - - def _get_pardo_state_specs(self): - if common_urns.primitives.PAR_DO.urn == self._proto.urn: - par_do_payload = proto_utils.parse_Bytes( - self._proto.payload, beam_runner_api_pb2.ParDoPayload) - return par_do_payload.state_specs - - class WatermarkEstimatorProvider(object): """Provides methods for generating WatermarkEstimator. @@ -539,10 +434,6 @@ def __eq__(self, other): return self.param_id == other.param_id return False - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - def __hash__(self): return hash(self.param_id) @@ -653,6 +544,7 @@ class DoFn(WithTypeHints, HasDisplayData, urns.RunnerApiFn): # DoFn.KeyParam StateParam = _StateDoFnParam TimerParam = _TimerDoFnParam + DynamicTimerTagParam = _DoFnParam('DynamicTimerTagParam') DoFnProcessParams = [ ElementParam, @@ -664,7 +556,7 @@ class DoFn(WithTypeHints, HasDisplayData, urns.RunnerApiFn): BundleFinalizerParam, KeyParam, StateParam, - TimerParam + TimerParam, ] RestrictionParam = _RestrictionDoFnParam @@ -692,23 +584,30 @@ def process(self, element, *args, **kwargs): This is invoked by ``DoFnRunner`` for each element of a input ``PCollection``. - If specified, following default arguments are used by the ``DoFnRunner`` to - be able to pass the parameters correctly. - - ``DoFn.ElementParam``: element to be processed, should not be mutated. - ``DoFn.SideInputParam``: a side input that may be used when processing. - ``DoFn.TimestampParam``: timestamp of the input element. - ``DoFn.WindowParam``: ``Window`` the input element belongs to. - ``DoFn.TimerParam``: a ``userstate.RuntimeTimer`` object defined by the spec - of the parameter. - ``DoFn.StateParam``: a ``userstate.RuntimeState`` object defined by the spec - of the parameter. - ``DoFn.KeyParam``: key associated with the element. - ``DoFn.RestrictionParam``: an ``iobase.RestrictionTracker`` will be - provided here to allow treatment as a Splittable ``DoFn``. The restriction - tracker will be derived from the restriction provider in the parameter. - ``DoFn.WatermarkEstimatorParam``: a function that can be used to track - output watermark of Splittable ``DoFn`` implementations. + The following parameters can be used as default values on ``process`` + arguments to indicate that a DoFn accepts the corresponding parameters. For + example, a DoFn might accept the element and its timestamp with the + following signature:: + + def process(element=DoFn.ElementParam, timestamp=DoFn.TimestampParam): + ... + + The full set of parameters is: + + - ``DoFn.ElementParam``: element to be processed, should not be mutated. + - ``DoFn.SideInputParam``: a side input that may be used when processing. + - ``DoFn.TimestampParam``: timestamp of the input element. + - ``DoFn.WindowParam``: ``Window`` the input element belongs to. + - ``DoFn.TimerParam``: a ``userstate.RuntimeTimer`` object defined by the + spec of the parameter. + - ``DoFn.StateParam``: a ``userstate.RuntimeState`` object defined by the + spec of the parameter. + - ``DoFn.KeyParam``: key associated with the element. 
+ - ``DoFn.RestrictionParam``: an ``iobase.RestrictionTracker`` will be + provided here to allow treatment as a Splittable ``DoFn``. The restriction + tracker will be derived from the restriction provider in the parameter. + - ``DoFn.WatermarkEstimatorParam``: a function that can be used to track + output watermark of Splittable ``DoFn`` implementations. Args: element: The element to be processed @@ -803,19 +702,6 @@ def _process_argspec_fn(self): urns.RunnerApiFn.register_pickle_urn(python_urns.PICKLED_DOFN) -def _fn_takes_side_inputs(fn): - try: - signature = get_signature(fn) - except TypeError: - # We can't tell; maybe it does. - return True - - return ( - len(signature.parameters) > 1 or any( - p.kind == p.VAR_POSITIONAL or p.kind == p.VAR_KEYWORD - for p in signature.parameters.values())) - - class CallableWrapperDoFn(DoFn): """For internal use only; no backwards-compatibility guarantees. @@ -1158,10 +1044,7 @@ def compact(self, accumulator, *args, **kwargs): return [self._fn(accumulator, *args, **kwargs)] def extract_output(self, accumulator, *args, **kwargs): - if len(accumulator) == 1: - return accumulator[0] - else: - return self._fn(accumulator, *args, **kwargs) + return self._fn(accumulator, *args, **kwargs) def default_type_hints(self): fn_hints = get_type_hints(self._fn) @@ -1238,10 +1121,7 @@ def compact(self, accumulator): return [self._fn(accumulator)] def extract_output(self, accumulator): - if len(accumulator) == 1: - return accumulator[0] - else: - return self._fn(accumulator) + return self._fn(accumulator) class PartitionFn(WithTypeHints): @@ -1675,7 +1555,8 @@ def Map(fn, *args, **kwargs): # pylint: disable=invalid-name raise TypeError( 'Map can be used only with callable objects. ' 'Received %r instead.' % (fn)) - if _fn_takes_side_inputs(fn): + from apache_beam.transforms.util import fn_takes_side_inputs + if fn_takes_side_inputs(fn): wrapper = lambda x, *args, **kwargs: [fn(x, *args, **kwargs)] else: wrapper = lambda x: [fn(x)] @@ -1712,10 +1593,6 @@ def MapTuple(fn, *args, **kwargs): # pylint: disable=invalid-name beam.MapTuple(lambda a, b, ...: ...) - is equivalent to Python 2 - - beam.Map(lambda (a, b, ...), ...: ...) - In other words beam.MapTuple(fn) @@ -2433,6 +2310,44 @@ def infer_output_type(self, input_type): key_type, typehints.WindowedValue[value_type]]] # type: ignore[misc] def expand(self, pcoll): + from apache_beam.transforms.trigger import DataLossReason + from apache_beam.transforms.trigger import DefaultTrigger + windowing = pcoll.windowing + trigger = windowing.triggerfn + if not pcoll.is_bounded and isinstance( + windowing.windowfn, GlobalWindows) and isinstance(trigger, + DefaultTrigger): + if pcoll.pipeline.allow_unsafe_triggers: + # TODO(BEAM-9487) Change comment for Beam 2.33 + _LOGGER.warning( + '%s: PCollection passed to GroupByKey is unbounded, has a global ' + 'window, and uses a default trigger. This is being allowed ' + 'because --allow_unsafe_triggers is set, but it may prevent ' + 'data from making it through the pipeline.', + self.label) + else: + raise ValueError( + 'GroupByKey cannot be applied to an unbounded ' + + 'PCollection with global windowing and a default trigger') + + unsafe_reason = trigger.may_lose_data(windowing) + if unsafe_reason != DataLossReason.NO_POTENTIAL_LOSS: + reason_msg = str(unsafe_reason).replace('DataLossReason.', '') + if pcoll.pipeline.allow_unsafe_triggers: + _LOGGER.warning( + '%s: Unsafe trigger `%s` detected (reason: %s). This is ' + 'being allowed because --allow_unsafe_triggers is set. 
This could ' + 'lead to missing or incomplete groups.', + self.label, + trigger, + reason_msg) + else: + msg = '{}: Unsafe trigger: `{}` may lose data. '.format( + self.label, trigger) + msg += 'Reason: {}. '.format(reason_msg) + msg += 'This can be overriden with the --allow_unsafe_triggers flag.' + raise ValueError(msg) + return pvalue.PCollection.from_(pcoll) def infer_output_type(self, input_type): @@ -2501,14 +2416,7 @@ def __init__( for ix, field in enumerate(fields): name = field if isinstance(field, str) else 'key%d' % ix key_fields.append((name, _expr_to_callable(field, ix))) - if sys.version_info < (3, 6): - # Before PEP 468, these are randomly ordered. - # At least provide deterministic behavior here. - # pylint: disable=dict-items-not-iterating - kwargs_items = sorted(kwargs.items()) - else: - kwargs_items = kwargs.items() # pylint: disable=dict-items-not-iterating - for name, expr in kwargs_items: + for name, expr in kwargs.items(): key_fields.append((name, _expr_to_callable(expr, name))) self._key_fields = key_fields field_names = tuple(name for name, _ in key_fields) @@ -2552,11 +2460,23 @@ def _key_func(self): key_exprs = [expr for _, expr in self._key_fields] return lambda element: key_type(*(expr(element) for expr in key_exprs)) + def _key_type_hint(self): + if not self._force_tuple_keys and len(self._key_fields) == 1: + return typing.Any + else: + return _dynamic_named_tuple( + 'Key', tuple(name for name, _ in self._key_fields)) + def default_label(self): return 'GroupBy(%s)' % ', '.join(name for name, _ in self._key_fields) def expand(self, pcoll): - return pcoll | Map(lambda x: (self._key_func()(x), x)) | GroupByKey() + input_type = pcoll.element_type or typing.Any + return ( + pcoll + | Map(lambda x: (self._key_func()(x), x)).with_output_types( + typehints.Tuple[self._key_type_hint(), input_type]) + | GroupByKey()) _dynamic_named_tuple_cache = { @@ -2606,10 +2526,12 @@ def expand(self, pcoll): result_fields = tuple(name for name, _ in self._grouping._key_fields) + tuple( dest for _, __, dest in self._aggregations) + key_type_hint = self._grouping.force_tuple_keys(True)._key_type_hint() return ( pcoll - | Map(lambda x: (key_func(x), value_func(x))) + | Map(lambda x: (key_func(x), value_func(x))).with_output_types( + typehints.Tuple[key_type_hint, typing.Any]) | CombinePerKey( TupleCombineFn( *[combine_fn for _, combine_fn, __ in self._aggregations])) @@ -2772,10 +2694,6 @@ def __eq__(self, other): self.environment_id == self.environment_id) return False - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - def __hash__(self): return hash(( self.windowfn, @@ -2818,7 +2736,7 @@ def from_runner_api(proto, context): accumulation_mode=proto.accumulation_mode, timestamp_combiner=proto.output_time, allowed_lateness=Duration(micros=proto.allowed_lateness * 1000), - environment_id=proto.environment_id) + environment_id=None) @typehints.with_input_types(T) @@ -2960,17 +2878,10 @@ def expand(self, pcolls): for pcoll in pcolls: self._check_pcollection(pcoll) is_bounded = all(pcoll.is_bounded for pcoll in pcolls) - result = pvalue.PCollection(self.pipeline, is_bounded=is_bounded) - result.element_type = typehints.Union[tuple( - pcoll.element_type for pcoll in pcolls)] - return result + return pvalue.PCollection(self.pipeline, is_bounded=is_bounded) - def get_windowing(self, inputs): - # type: (typing.Any) -> Windowing - if not inputs: - # TODO(robertwb): Return something compatible with every windowing? 
- return Windowing(GlobalWindows()) - return super(Flatten, self).get_windowing(inputs) + def infer_output_type(self, input_type): + return input_type def to_runner_api_parameter(self, context): # type: (PipelineContext) -> typing.Tuple[str, None] @@ -2995,7 +2906,7 @@ def __init__(self, values, reshuffle=True): values: An object of values for the PCollection """ super(Create, self).__init__() - if isinstance(values, (unicode, str, bytes)): + if isinstance(values, (str, bytes)): raise TypeError( 'PTransform Create: Refusing to treat string as ' 'an iterable. (string=%r)' % values) @@ -3003,6 +2914,15 @@ def __init__(self, values, reshuffle=True): values = values.items() self.values = tuple(values) self.reshuffle = reshuffle + self._coder = typecoders.registry.get_coder(self.get_output_type()) + + def __getstate__(self): + serialized_values = [self._coder.encode(v) for v in self.values] + return serialized_values, self.reshuffle, self._coder + + def __setstate__(self, state): + serialized_values, self.reshuffle, self._coder = state + self.values = [self._coder.decode(v) for v in serialized_values] def to_runner_api_parameter(self, context): # type: (PipelineContext) -> typing.Tuple[str, bytes] @@ -3024,8 +2944,7 @@ def get_output_type(self): def expand(self, pbegin): assert isinstance(pbegin, pvalue.PBegin) - coder = typecoders.registry.get_coder(self.get_output_type()) - serialized_values = [coder.encode(v) for v in self.values] + serialized_values = [self._coder.encode(v) for v in self.values] reshuffle = self.reshuffle # Avoid the "redistributing" reshuffle for 0 and 1 element Creates. @@ -3045,12 +2964,11 @@ def expand(self, pcoll): | Impulse() | FlatMap(lambda _: serialized_values).with_output_types(bytes) | MaybeReshuffle().with_output_types(bytes) - | Map(coder.decode).with_output_types(self.get_output_type())) + | Map(self._coder.decode).with_output_types(self.get_output_type())) def as_read(self): from apache_beam.io import iobase - coder = typecoders.registry.get_coder(self.get_output_type()) - source = self._create_source_from_iterable(self.values, coder) + source = self._create_source_from_iterable(self.values, self._coder) return iobase.Read(source).with_output_types(self.get_output_type()) def get_windowing(self, unused_inputs): diff --git a/sdks/python/apache_beam/transforms/create_source.py b/sdks/python/apache_beam/transforms/create_source.py index e25a383f3de5..2fbc925afdda 100644 --- a/sdks/python/apache_beam/transforms/create_source.py +++ b/sdks/python/apache_beam/transforms/create_source.py @@ -17,13 +17,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - -from builtins import map -from builtins import next -from builtins import range - from apache_beam.io import iobase from apache_beam.transforms.core import Create diff --git a/sdks/python/apache_beam/transforms/create_test.py b/sdks/python/apache_beam/transforms/create_test.py index fe9f84fdb31e..37f32d478008 100644 --- a/sdks/python/apache_beam/transforms/create_test.py +++ b/sdks/python/apache_beam/transforms/create_test.py @@ -18,15 +18,13 @@ """Unit tests for the Create and _CreateSource classes.""" # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import logging import unittest -from builtins import range from apache_beam import Create +from apache_beam import coders from apache_beam.coders import FastPrimitivesCoder +from apache_beam.internal import pickler from apache_beam.io import source_test_utils from 
apache_beam.testing.test_pipeline import TestPipeline from apache_beam.testing.util import assert_that @@ -125,6 +123,45 @@ def test_create_source_progress(self): self.assertEqual(expected_split_points_report, split_points_report) + def test_create_uses_coder_for_pickling(self): + coders.registry.register_coder(_Unpicklable, _UnpicklableCoder) + create = Create([_Unpicklable(1), _Unpicklable(2), _Unpicklable(3)]) + unpickled_create = pickler.loads(pickler.dumps(create)) + self.assertEqual( + sorted(create.values, key=lambda v: v.value), + sorted(unpickled_create.values, key=lambda v: v.value)) + + with self.assertRaises(NotImplementedError): + # As there is no special coder for Union types, this will fall back to + # FastPrimitivesCoder, which in turn falls back to pickling. + create_mixed_types = Create([_Unpicklable(1), 2]) + pickler.dumps(create_mixed_types) + + +class _Unpicklable(object): + def __init__(self, value): + self.value = value + + def __eq__(self, other): + return self.value == other.value + + def __getstate__(self): + raise NotImplementedError() + + def __setstate__(self, state): + raise NotImplementedError() + + +class _UnpicklableCoder(coders.Coder): + def encode(self, value): + return str(value.value).encode() + + def decode(self, encoded): + return _Unpicklable(int(encoded.decode())) + + def to_type_hint(self): + return _Unpicklable + if __name__ == '__main__': logging.getLogger().setLevel(logging.INFO) diff --git a/sdks/python/apache_beam/transforms/cy_combiners.py b/sdks/python/apache_beam/transforms/cy_combiners.py index b6f6fa9d4562..dbbff0798ec2 100644 --- a/sdks/python/apache_beam/transforms/cy_combiners.py +++ b/sdks/python/apache_beam/transforms/cy_combiners.py @@ -24,11 +24,7 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import operator -from builtins import object from apache_beam.transforms import core @@ -62,10 +58,6 @@ def __eq__(self, other): isinstance(other, AccumulatorCombineFn) and self._accumulator_type is other._accumulator_type) - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - def __hash__(self): return hash(self._accumulator_type) diff --git a/sdks/python/apache_beam/transforms/cy_dataflow_distribution_counter.pyx b/sdks/python/apache_beam/transforms/cy_dataflow_distribution_counter.pyx index b50cdebbdeb3..a567478999e9 100644 --- a/sdks/python/apache_beam/transforms/cy_dataflow_distribution_counter.pyx +++ b/sdks/python/apache_beam/transforms/cy_dataflow_distribution_counter.pyx @@ -15,6 +15,7 @@ # # cython: profile=True +# cython: language_level=3 """ For internal use only. 
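
[Editor's note] Related to the `Create` pickling change above (`__getstate__`/`__setstate__` in core.py) and the new `test_create_uses_coder_for_pickling`: a hedged sketch of what this enables for user code. The `Point` type and `PointCoder` are illustrative only; registering a coder for the element type lets `Create` round-trip values that cannot, or should not, be pickled directly.

```python
import apache_beam as beam
from apache_beam import coders
from apache_beam.internal import pickler

class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y

class PointCoder(coders.Coder):
    def encode(self, p):
        return ('%d,%d' % (p.x, p.y)).encode()

    def decode(self, encoded):
        x, y = encoded.decode().split(',')
        return Point(int(x), int(y))

    def to_type_hint(self):
        return Point

coders.registry.register_coder(Point, PointCoder)

# Create now serializes its values through PointCoder when the transform
# itself is pickled, rather than pickling the raw Python objects.
create = beam.Create([Point(1, 2), Point(3, 4)])
roundtripped = pickler.loads(pickler.dumps(create))
```
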
No backwards compatibility guarantees.""" diff --git a/sdks/python/apache_beam/transforms/dataflow_distribution_counter_test.py b/sdks/python/apache_beam/transforms/dataflow_distribution_counter_test.py index f072cd104b2c..eda658e245c6 100644 --- a/sdks/python/apache_beam/transforms/dataflow_distribution_counter_test.py +++ b/sdks/python/apache_beam/transforms/dataflow_distribution_counter_test.py @@ -16,8 +16,6 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest from mock import Mock diff --git a/sdks/python/apache_beam/transforms/deduplicate.py b/sdks/python/apache_beam/transforms/deduplicate.py index 743412e3b5fe..8c348986ad60 100644 --- a/sdks/python/apache_beam/transforms/deduplicate.py +++ b/sdks/python/apache_beam/transforms/deduplicate.py @@ -19,9 +19,6 @@ """a collection of ptransforms for deduplicating elements.""" -from __future__ import absolute_import -from __future__ import division - import typing from apache_beam import typehints diff --git a/sdks/python/apache_beam/transforms/deduplicate_test.py b/sdks/python/apache_beam/transforms/deduplicate_test.py index 5bea89591ee1..b6ec53dacef0 100644 --- a/sdks/python/apache_beam/transforms/deduplicate_test.py +++ b/sdks/python/apache_beam/transforms/deduplicate_test.py @@ -19,11 +19,9 @@ """Unit tests for deduplicate transform by using TestStream.""" -from __future__ import absolute_import - import unittest -from nose.plugins.attrib import attr +import pytest import apache_beam as beam from apache_beam.coders import coders @@ -41,7 +39,9 @@ # TestStream is only supported in streaming pipeline. The Deduplicate transform # also requires Timer support. Sickbaying this testsuite until dataflow runner # supports both TestStream and user timer. -@attr('ValidatesRunner', 'sickbay-batch', 'sickbay-streaming') +@pytest.mark.no_sickbay_batch +@pytest.mark.no_sickbay_streaming +@pytest.mark.it_validatesrunner class DeduplicateTest(unittest.TestCase): def __init__(self, *args, **kwargs): self.runner = None diff --git a/sdks/python/apache_beam/transforms/display.py b/sdks/python/apache_beam/transforms/display.py index 65fb567d12ed..449e45ab1cc5 100644 --- a/sdks/python/apache_beam/transforms/display.py +++ b/sdks/python/apache_beam/transforms/display.py @@ -38,18 +38,16 @@ # pytype: skip-file -from __future__ import absolute_import - import calendar import inspect import json -from builtins import object from datetime import datetime from datetime import timedelta from typing import TYPE_CHECKING from typing import List -from past.builtins import unicode +from apache_beam.portability import common_urns +from apache_beam.portability.api import beam_runner_api_pb2 if TYPE_CHECKING: from apache_beam.options.pipeline_options import PipelineOptions @@ -128,6 +126,51 @@ def _populate_items(self, display_data_dict): self.items.append( DisplayDataItem(element, namespace=self.namespace, key=key)) + def to_proto(self): + # type: (...) -> List[beam_runner_api_pb2.DisplayData] + + """Returns a List of Beam proto representation of Display data.""" + def create_payload(dd): + display_data_dict = None + try: + display_data_dict = dd.get_dict() + except ValueError: + # Skip if the display data is invalid. + return None + + # We use 'label' or 'key' properties to populate the 'label' attribute of + # 'LabelledPayload'. 'label' is a better choice since it's expected to be + # more human readable but some transforms, sources, etc. may not set a + # 'label' property when configuring DisplayData. 
+ label = ( + display_data_dict['label'] + if 'label' in display_data_dict else display_data_dict['key']) + + value = display_data_dict['value'] + if isinstance(value, str): + return beam_runner_api_pb2.LabelledPayload( + label=label, string_value=value) + elif isinstance(value, bool): + return beam_runner_api_pb2.LabelledPayload( + label=label, bool_value=value) + elif isinstance(value, (int, float, complex)): + return beam_runner_api_pb2.LabelledPayload( + label=label, double_value=value) + else: + raise ValueError( + 'Unsupported type %s for value of display data %s' % + (type(value), label)) + + dd_protos = [] + for dd in self.items: + dd_proto = create_payload(dd) + if dd_proto: + dd_protos.append( + beam_runner_api_pb2.DisplayData( + urn=common_urns.StandardDisplayData.DisplayData.LABELLED.urn, + payload=create_payload(dd).SerializeToString())) + return dd_protos + @classmethod def create_from_options(cls, pipeline_options): """ Creates :class:`~apache_beam.transforms.display.DisplayData` from a @@ -189,7 +232,6 @@ class DisplayDataItem(object): """ typeDict = { str: 'STRING', - unicode: 'STRING', int: 'INTEGER', float: 'FLOAT', bool: 'BOOLEAN', @@ -321,10 +363,6 @@ def __eq__(self, other): return self._get_dict() == other._get_dict() return False - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - def __hash__(self): return hash(tuple(sorted(self._get_dict().items()))) diff --git a/sdks/python/apache_beam/transforms/display_test.py b/sdks/python/apache_beam/transforms/display_test.py index 840fc6d581bd..a7605b09a3ad 100644 --- a/sdks/python/apache_beam/transforms/display_test.py +++ b/sdks/python/apache_beam/transforms/display_test.py @@ -19,15 +19,12 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest from datetime import datetime # pylint: disable=ungrouped-imports import hamcrest as hc from hamcrest.core.base_matcher import BaseMatcher -from past.builtins import unicode import apache_beam as beam from apache_beam.options.pipeline_options import PipelineOptions @@ -167,7 +164,7 @@ def test_unicode_type_display_data(self): class MyDoFn(beam.DoFn): def display_data(self): return { - 'unicode_string': unicode('my string'), + 'unicode_string': 'my string', 'unicode_literal_string': u'my literal string' } diff --git a/sdks/python/apache_beam/transforms/dofn_lifecycle_test.py b/sdks/python/apache_beam/transforms/dofn_lifecycle_test.py index ec446a1bd1da..73068657ac4a 100644 --- a/sdks/python/apache_beam/transforms/dofn_lifecycle_test.py +++ b/sdks/python/apache_beam/transforms/dofn_lifecycle_test.py @@ -19,11 +19,9 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest -from nose.plugins.attrib import attr +import pytest import apache_beam as beam from apache_beam.testing.test_pipeline import TestPipeline @@ -76,7 +74,7 @@ def teardown(self): self._teardown_called = True -@attr('ValidatesRunner') +@pytest.mark.it_validatesrunner class DoFnLifecycleTest(unittest.TestCase): def test_dofn_lifecycle(self): with TestPipeline() as p: diff --git a/sdks/python/apache_beam/transforms/environments.py b/sdks/python/apache_beam/transforms/environments.py index a00422283285..c90c8585a520 100644 --- a/sdks/python/apache_beam/transforms/environments.py +++ b/sdks/python/apache_beam/transforms/environments.py @@ -21,12 +21,11 @@ # pytype: skip-file -from __future__ import absolute_import - import json import logging import sys import tempfile +from types import MappingProxyType from typing 
import TYPE_CHECKING from typing import Any from typing import Callable @@ -36,6 +35,7 @@ from typing import List from typing import Mapping from typing import Optional +from typing import Set from typing import Tuple from typing import Type from typing import TypeVar @@ -52,21 +52,24 @@ from apache_beam.portability.api import endpoints_pb2 from apache_beam.runners.portability import stager from apache_beam.runners.portability.sdk_container_builder import SdkContainerImageBuilder +from apache_beam.transforms.resources import resource_hints_from_options from apache_beam.utils import proto_utils if TYPE_CHECKING: + from apache_beam.options.pipeline_options import PipelineOptions from apache_beam.options.pipeline_options import PortableOptions from apache_beam.runners.pipeline_context import PipelineContext __all__ = [ 'Environment', + 'DefaultEnvironment', 'DockerEnvironment', 'ProcessEnvironment', 'ExternalEnvironment', 'EmbeddedPythonEnvironment', 'EmbeddedPythonGrpcEnvironment', 'SubprocessSDKEnvironment', - 'RunnerAPIEnvironmentHolder' + 'PyPIArtifactRegistry' ] T = TypeVar('T') @@ -75,6 +78,7 @@ Optional[Any], Iterable[str], Iterable[beam_runner_api_pb2.ArtifactInformation], + Mapping[str, bytes], 'PipelineContext' ], Any] @@ -85,6 +89,16 @@ def looks_like_json(s): return re.match(r'\s*\{.*\}\s*$', s) +APACHE_BEAM_DOCKER_IMAGE_PREFIX = 'apache/beam' + +APACHE_BEAM_JAVA_CONTAINER_NAME_PREFIX = 'beam_java' + + +def is_apache_beam_container(container_image): + return container_image and container_image.startswith( + APACHE_BEAM_DOCKER_IMAGE_PREFIX) + + class Environment(object): """Abstract base class for environments. @@ -98,12 +112,29 @@ class Environment(object): _urn_to_env_cls = {} # type: Dict[str, type] def __init__(self, - capabilities, # type: Iterable[str] - artifacts, # type: Iterable[beam_runner_api_pb2.ArtifactInformation] - ): + capabilities=(), # type: Iterable[str] + artifacts=(), # type: Iterable[beam_runner_api_pb2.ArtifactInformation] + resource_hints=None, # type: Optional[Mapping[str, bytes]] + ): # type: (...) -> None self._capabilities = capabilities - self._artifacts = artifacts + self._artifacts = sorted(artifacts, key=lambda x: x.SerializeToString()) + # Hints on created environments should be immutable since pipeline context + # stores environments in hash maps and we use hints to compute the hash. + self._resource_hints = MappingProxyType( + dict(resource_hints) if resource_hints else {}) + + def __eq__(self, other): + return ( + self.__class__ == other.__class__ and + self._artifacts == other._artifacts + # Assuming that we don't have instances of the same Environment subclass + # with different set of capabilities. 
+ and self._resource_hints == other._resource_hints) + + def __hash__(self): + # type: () -> int + return hash((self.__class__, frozenset(self._resource_hints.items()))) def artifacts(self): # type: () -> Iterable[beam_runner_api_pb2.ArtifactInformation] @@ -117,6 +148,10 @@ def capabilities(self): # type: () -> Iterable[str] return self._capabilities + def resource_hints(self): + # type: () -> Mapping[str, bytes] + return self._resource_hints + @classmethod @overload def register_urn( @@ -193,7 +228,8 @@ def to_runner_api(self, context): (isinstance(typed_param, bytes) or typed_param is None) else typed_param.encode('utf-8'), capabilities=self.capabilities(), - dependencies=self.artifacts()) + dependencies=self.artifacts(), + resource_hints=self.resource_hints()) @classmethod def from_runner_api(cls, @@ -205,16 +241,12 @@ def from_runner_api(cls, return None parameter_type, constructor = cls._known_urns[proto.urn] - try: - return constructor( - proto_utils.parse_Bytes(proto.payload, parameter_type), - proto.capabilities, - proto.dependencies, - context) - except Exception: - if context.allow_proto_holders: - return RunnerAPIEnvironmentHolder(proto) - raise + return constructor( + proto_utils.parse_Bytes(proto.payload, parameter_type), + proto.capabilities, + proto.dependencies, + proto.resource_hints, + context) @classmethod def from_options(cls, options): @@ -228,6 +260,26 @@ def from_options(cls, options): raise NotImplementedError +@Environment.register_urn(common_urns.environments.DEFAULT.urn, None) +class DefaultEnvironment(Environment): + """Used as a stub when context is missing a default environment.""" + def to_runner_api_parameter(self, context): + return common_urns.environments.DEFAULT.urn, None + + @staticmethod + def from_runner_api_parameter(payload, # type: beam_runner_api_pb2.DockerPayload + capabilities, # type: Iterable[str] + artifacts, # type: Iterable[beam_runner_api_pb2.ArtifactInformation] + resource_hints, # type: Mapping[str, bytes] + context # type: PipelineContext + ): + # type: (...) -> DefaultEnvironment + return DefaultEnvironment( + capabilities=capabilities, + artifacts=artifacts, + resource_hints=resource_hints) + + @Environment.register_urn( common_urns.environments.DOCKER.urn, beam_runner_api_pb2.DockerPayload) class DockerEnvironment(Environment): @@ -236,8 +288,9 @@ def __init__( container_image=None, # type: Optional[str] capabilities=(), # type: Iterable[str] artifacts=(), # type: Iterable[beam_runner_api_pb2.ArtifactInformation] + resource_hints=None, # type: Optional[Mapping[str, bytes]] ): - super(DockerEnvironment, self).__init__(capabilities, artifacts) + super().__init__(capabilities, artifacts, resource_hints) if container_image: logging.info( 'Using provided Python SDK container image: %s' % (container_image)) @@ -251,15 +304,11 @@ def __init__( (self.container_image)) def __eq__(self, other): - return self.__class__ == other.__class__ \ - and self.container_image == other.container_image - - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. 
- return not self == other + return ( + super().__eq__(other) and self.container_image == other.container_image) def __hash__(self): - return hash((self.__class__, self.container_image)) + return hash((super().__hash__(), self.container_image)) def __repr__(self): return 'DockerEnvironment(container_image=%s)' % self.container_image @@ -272,15 +321,17 @@ def to_runner_api_parameter(self, context): @staticmethod def from_runner_api_parameter(payload, # type: beam_runner_api_pb2.DockerPayload - capabilities, # type: Iterable[str] - artifacts, # type: Iterable[beam_runner_api_pb2.ArtifactInformation] - context # type: PipelineContext - ): + capabilities, # type: Iterable[str] + artifacts, # type: Iterable[beam_runner_api_pb2.ArtifactInformation] + resource_hints, # type: Mapping[str, bytes] + context # type: PipelineContext + ): # type: (...) -> DockerEnvironment return DockerEnvironment( container_image=payload.container_image, capabilities=capabilities, - artifacts=artifacts) + artifacts=artifacts, + resource_hints=resource_hints) @classmethod def from_options(cls, options): @@ -290,19 +341,25 @@ def from_options(cls, options): options) return cls.from_container_image( container_image=prebuilt_container_image, - artifacts=python_sdk_dependencies(options)) + artifacts=python_sdk_dependencies(options), + resource_hints=resource_hints_from_options(options), + ) return cls.from_container_image( container_image=options.lookup_environment_option( 'docker_container_image') or options.environment_config, - artifacts=python_sdk_dependencies(options)) + artifacts=python_sdk_dependencies(options), + resource_hints=resource_hints_from_options(options), + ) @classmethod - def from_container_image(cls, container_image, artifacts=()): - # type: (str, Iterable[beam_runner_api_pb2.ArtifactInformation]) -> DockerEnvironment + def from_container_image( + cls, container_image, artifacts=(), resource_hints=None): + # type: (str, Iterable[beam_runner_api_pb2.ArtifactInformation], Optional[Mapping[str, bytes]]) -> DockerEnvironment return cls( container_image=container_image, capabilities=python_sdk_capabilities(), - artifacts=artifacts) + artifacts=artifacts, + resource_hints=resource_hints) @staticmethod def default_docker_image(): @@ -317,7 +374,8 @@ def default_docker_image(): (sys.version_info[0], sys.version_info[1])) image = ( - 'apache/beam_python{version_suffix}_sdk:{tag}'.format( + APACHE_BEAM_DOCKER_IMAGE_PREFIX + + '_python{version_suffix}_sdk:{tag}'.format( version_suffix=version_suffix, tag=sdk_version)) logging.info('Default Python SDK image for environment is %s' % (image)) return image @@ -334,27 +392,25 @@ def __init__( env=None, # type: Optional[Mapping[str, str]] capabilities=(), # type: Iterable[str] artifacts=(), # type: Iterable[beam_runner_api_pb2.ArtifactInformation] + resource_hints=None, # type: Optional[Mapping[str, bytes]] ): # type: (...) -> None - super(ProcessEnvironment, self).__init__(capabilities, artifacts) + super().__init__(capabilities, artifacts, resource_hints) self.command = command self.os = os self.arch = arch self.env = env or {} def __eq__(self, other): - return self.__class__ == other.__class__ \ - and self.command == other.command and self.os == other.os \ - and self.arch == other.arch and self.env == other.env - - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. 
- return not self == other + return ( + super().__eq__(other) and self.command == other.command and + self.os == other.os and self.arch == other.arch and + self.env == other.env) def __hash__(self): # type: () -> int return hash(( - self.__class__, + super().__hash__(), self.command, self.os, self.arch, @@ -379,10 +435,11 @@ def to_runner_api_parameter(self, context): @staticmethod def from_runner_api_parameter(payload, - capabilities, # type: Iterable[str] - artifacts, # type: Iterable[beam_runner_api_pb2.ArtifactInformation] - context # type: PipelineContext - ): + capabilities, # type: Iterable[str] + artifacts, # type: Iterable[beam_runner_api_pb2.ArtifactInformation] + resource_hints, # type: Mapping[str, bytes] + context # type: PipelineContext + ): # type: (...) -> ProcessEnvironment return ProcessEnvironment( command=payload.command, @@ -390,7 +447,9 @@ def from_runner_api_parameter(payload, arch=payload.arch, env=payload.env, capabilities=capabilities, - artifacts=artifacts) + artifacts=artifacts, + resource_hints=resource_hints, + ) @staticmethod def parse_environment_variables(variables): @@ -416,7 +475,9 @@ def from_options(cls, options): arch=config.get('arch', ''), env=config.get('env', ''), capabilities=python_sdk_capabilities(), - artifacts=python_sdk_dependencies(options)) + artifacts=python_sdk_dependencies(options), + resource_hints=resource_hints_from_options(options), + ) env = cls.parse_environment_variables( options.lookup_environment_option('process_variables').split(',') if options.lookup_environment_option('process_variables') else []) @@ -424,7 +485,9 @@ def from_options(cls, options): options.lookup_environment_option('process_command'), env=env, capabilities=python_sdk_capabilities(), - artifacts=python_sdk_dependencies(options)) + artifacts=python_sdk_dependencies(options), + resource_hints=resource_hints_from_options(options), + ) @Environment.register_urn( @@ -436,23 +499,21 @@ def __init__( params=None, # type: Optional[Mapping[str, str]] capabilities=(), # type: Iterable[str] artifacts=(), # type: Iterable[beam_runner_api_pb2.ArtifactInformation] + resource_hints=None, # type: Optional[Mapping[str, bytes]] ): - super(ExternalEnvironment, self).__init__(capabilities, artifacts) + super().__init__(capabilities, artifacts, resource_hints) self.url = url self.params = params def __eq__(self, other): - return self.__class__ == other.__class__ and self.url == other.url \ - and self.params == other.params - - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other + return ( + super().__eq__(other) and self.url == other.url and + self.params == other.params) def __hash__(self): # type: () -> int return hash(( - self.__class__, + super().__hash__(), self.url, frozenset(self.params.items()) if self.params is not None else None)) @@ -470,16 +531,18 @@ def to_runner_api_parameter(self, context): @staticmethod def from_runner_api_parameter(payload, # type: beam_runner_api_pb2.ExternalPayload - capabilities, # type: Iterable[str] - artifacts, # type: Iterable[beam_runner_api_pb2.ArtifactInformation] - context # type: PipelineContext - ): + capabilities, # type: Iterable[str] + artifacts, # type: Iterable[beam_runner_api_pb2.ArtifactInformation] + resource_hints, # type: Mapping[str, bytes] + context # type: PipelineContext + ): # type: (...) 
-> ExternalEnvironment return ExternalEnvironment( payload.endpoint.url, params=payload.params or None, capabilities=capabilities, - artifacts=artifacts) + artifacts=artifacts, + resource_hints=resource_hints) @classmethod def from_options(cls, options): @@ -501,44 +564,39 @@ def from_options(cls, options): url, params=params, capabilities=python_sdk_capabilities(), - artifacts=python_sdk_dependencies(options)) + artifacts=python_sdk_dependencies(options), + resource_hints=resource_hints_from_options(options)) @Environment.register_urn(python_urns.EMBEDDED_PYTHON, None) class EmbeddedPythonEnvironment(Environment): - def __init__(self, capabilities=None, artifacts=()): - super(EmbeddedPythonEnvironment, self).__init__(capabilities, artifacts) - - def __eq__(self, other): - return self.__class__ == other.__class__ - - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - - def __hash__(self): - # type: () -> int - return hash(self.__class__) - def to_runner_api_parameter(self, context): # type: (PipelineContext) -> Tuple[str, None] return python_urns.EMBEDDED_PYTHON, None @staticmethod def from_runner_api_parameter(unused_payload, # type: None - capabilities, # type: Iterable[str] - artifacts, # type: Iterable[beam_runner_api_pb2.ArtifactInformation] - context # type: PipelineContext - ): + capabilities, # type: Iterable[str] + artifacts, # type: Iterable[beam_runner_api_pb2.ArtifactInformation] + resource_hints, # type: Mapping[str, bytes] + context # type: PipelineContext + ): # type: (...) -> EmbeddedPythonEnvironment - return EmbeddedPythonEnvironment(capabilities, artifacts) + return EmbeddedPythonEnvironment(capabilities, artifacts, resource_hints) @classmethod def from_options(cls, options): # type: (PortableOptions) -> EmbeddedPythonEnvironment return cls( capabilities=python_sdk_capabilities(), - artifacts=python_sdk_dependencies(options)) + artifacts=python_sdk_dependencies(options), + resource_hints=resource_hints_from_options(options), + ) + + @classmethod + def default(cls): + # type: () -> EmbeddedPythonEnvironment + return cls(capabilities=python_sdk_capabilities(), artifacts=()) @Environment.register_urn(python_urns.EMBEDDED_PYTHON_GRPC, bytes) @@ -548,24 +606,25 @@ def __init__( state_cache_size=None, data_buffer_time_limit_ms=None, capabilities=(), - artifacts=()): - super(EmbeddedPythonGrpcEnvironment, self).__init__(capabilities, artifacts) + artifacts=(), + resource_hints=None, + ): + super().__init__(capabilities, artifacts, resource_hints) self.state_cache_size = state_cache_size self.data_buffer_time_limit_ms = data_buffer_time_limit_ms def __eq__(self, other): - return self.__class__ == other.__class__ \ - and self.state_cache_size == other.state_cache_size \ - and self.data_buffer_time_limit_ms == other.data_buffer_time_limit_ms - - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. 
- return not self == other + return ( + super().__eq__(other) and + self.state_cache_size == other.state_cache_size and + self.data_buffer_time_limit_ms == other.data_buffer_time_limit_ms) def __hash__(self): # type: () -> int - return hash( - (self.__class__, self.state_cache_size, self.data_buffer_time_limit_ms)) + return hash(( + super().__hash__(), + self.state_cache_size, + self.data_buffer_time_limit_ms)) def __repr__(self): # type: () -> str @@ -589,10 +648,11 @@ def to_runner_api_parameter(self, context): @staticmethod def from_runner_api_parameter(payload, # type: bytes - capabilities, # type: Iterable[str] - artifacts, # type: Iterable[beam_runner_api_pb2.ArtifactInformation] - context # type: PipelineContext - ): + capabilities, # type: Iterable[str] + artifacts, # type: Iterable[beam_runner_api_pb2.ArtifactInformation] + resource_hints, # type: Mapping[str, bytes] + context # type: PipelineContext + ): # type: (...) -> EmbeddedPythonGrpcEnvironment if payload: config = EmbeddedPythonGrpcEnvironment.parse_config( @@ -601,7 +661,8 @@ def from_runner_api_parameter(payload, # type: bytes state_cache_size=config.get('state_cache_size'), data_buffer_time_limit_ms=config.get('data_buffer_time_limit_ms'), capabilities=capabilities, - artifacts=artifacts) + artifacts=artifacts, + resource_hints=resource_hints) else: return EmbeddedPythonGrpcEnvironment() @@ -613,11 +674,14 @@ def from_options(cls, options): options.environment_config) return cls( state_cache_size=config.get('state_cache_size'), - data_buffer_time_limit_ms=config.get('data_buffer_time_limit_ms')) + data_buffer_time_limit_ms=config.get('data_buffer_time_limit_ms'), + capabilities=python_sdk_capabilities(), + artifacts=python_sdk_dependencies(options)) else: return cls( capabilities=python_sdk_capabilities(), - artifacts=python_sdk_dependencies(options)) + artifacts=python_sdk_dependencies(options), + resource_hints=resource_hints_from_options(options)) @staticmethod def parse_config(s): @@ -634,6 +698,11 @@ def parse_config(s): else: return {'state_cache_size': int(s)} + @classmethod + def default(cls): + # type: () -> EmbeddedPythonGrpcEnvironment + return cls(capabilities=python_sdk_capabilities(), artifacts=()) + @Environment.register_urn(python_urns.SUBPROCESS_SDK, bytes) class SubprocessSDKEnvironment(Environment): @@ -642,21 +711,18 @@ def __init__( command_string, # type: str capabilities=(), # type: Iterable[str] artifacts=(), # type: Iterable[beam_runner_api_pb2.ArtifactInformation] + resource_hints=None, # type: Optional[Mapping[str, bytes]] ): - super(SubprocessSDKEnvironment, self).__init__(capabilities, artifacts) + super().__init__(capabilities, artifacts, resource_hints) self.command_string = command_string def __eq__(self, other): - return self.__class__ == other.__class__ \ - and self.command_string == other.command_string - - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. 
- return not self == other + return ( + super().__eq__(other) and self.command_string == other.command_string) def __hash__(self): # type: () -> int - return hash((self.__class__, self.command_string)) + return hash((super().__hash__(), self.command_string)) def __repr__(self): # type: () -> str @@ -668,13 +734,14 @@ def to_runner_api_parameter(self, context): @staticmethod def from_runner_api_parameter(payload, # type: bytes - capabilities, # type: Iterable[str] - artifacts, # type: Iterable[beam_runner_api_pb2.ArtifactInformation] - context # type: PipelineContext - ): + capabilities, # type: Iterable[str] + artifacts, # type: Iterable[beam_runner_api_pb2.ArtifactInformation] + resource_hints, # type: Mapping[str, bytes] + context # type: PipelineContext + ): # type: (...) -> SubprocessSDKEnvironment return SubprocessSDKEnvironment( - payload.decode('utf-8'), capabilities, artifacts) + payload.decode('utf-8'), capabilities, artifacts, resource_hints) @classmethod def from_options(cls, options): @@ -682,32 +749,27 @@ def from_options(cls, options): return cls( options.environment_config, capabilities=python_sdk_capabilities(), - artifacts=python_sdk_dependencies(options)) - + artifacts=python_sdk_dependencies(options), + resource_hints=resource_hints_from_options(options)) -class RunnerAPIEnvironmentHolder(Environment): - def __init__(self, proto): - # type: (beam_runner_api_pb2.Environment) -> None - self.proto = proto - - def to_runner_api(self, context): - # type: (PipelineContext) -> beam_runner_api_pb2.Environment - return self.proto + @classmethod + def from_command_string(cls, command_string): + # type: (str) -> SubprocessSDKEnvironment + return cls( + command_string, capabilities=python_sdk_capabilities(), artifacts=()) - def capabilities(self): - # type: () -> Iterable[str] - return self.proto.capabilities - def __eq__(self, other): - return self.__class__ == other.__class__ and self.proto == other.proto +class PyPIArtifactRegistry(object): + _registered_artifacts = set() # type: Set[Tuple[str, str]] - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. 
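
[Editor's note] For context on the resource-hint plumbing in this file — a sketch that mirrors the new unit test further below; the `accelerator=gpu` value is just an example hint. Hints parsed from pipeline options now travel with the environment and participate in its `__eq__`/`__hash__`, so environments built from identical options compare equal.

```python
from apache_beam.options.pipeline_options import PortableOptions
from apache_beam.transforms.environments import ProcessEnvironment

options = PortableOptions([
    '--environment_type=PROCESS',
    '--environment_option=process_command=foo',
    '--sdk_location=container',
    '--resource_hint=accelerator=gpu',
])
env = ProcessEnvironment.from_options(options)
# The hints are carried on the environment and included in equality/hash.
assert env == ProcessEnvironment.from_options(options)
print(dict(env.resource_hints()))
```
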
- return not self == other + @classmethod + def register_artifact(cls, name, version): + cls._registered_artifacts.add((name, version)) - def __hash__(self): - # type: () -> int - return hash((self.__class__, self.proto)) + @classmethod + def get_artifacts(cls): + for artifact in cls._registered_artifacts: + yield artifact def python_sdk_capabilities(): @@ -721,10 +783,12 @@ def _python_sdk_capabilities_iter(): if getattr(urn_spec, 'urn', None) in coders.Coder._known_urns: yield urn_spec.urn yield common_urns.protocols.LEGACY_PROGRESS_REPORTING.urn + yield common_urns.protocols.HARNESS_MONITORING_INFOS.urn yield common_urns.protocols.WORKER_STATUS.urn yield python_urns.PACKED_COMBINE_FN yield 'beam:version:sdk_base:' + DockerEnvironment.default_docker_image() yield common_urns.sdf_components.TRUNCATE_SIZED_RESTRICTION.urn + yield common_urns.primitives.TO_STRING.urn def python_sdk_dependencies(options, tmp_dir=None): @@ -732,15 +796,11 @@ def python_sdk_dependencies(options, tmp_dir=None): tmp_dir = tempfile.mkdtemp() skip_prestaged_dependencies = options.view_as( SetupOptions).prebuild_sdk_container_engine is not None - return tuple( - beam_runner_api_pb2.ArtifactInformation( - type_urn=common_urns.artifact_types.FILE.urn, - type_payload=beam_runner_api_pb2.ArtifactFilePayload( - path=local_path).SerializeToString(), - role_urn=common_urns.artifact_roles.STAGING_TO.urn, - role_payload=beam_runner_api_pb2.ArtifactStagingToRolePayload( - staged_name=staged_name).SerializeToString()) for local_path, - staged_name in stager.Stager.create_job_resources( - options, - tmp_dir, - skip_prestaged_dependencies=skip_prestaged_dependencies)) + return stager.Stager.create_job_resources( + options, + tmp_dir, + pypi_requirements=[ + artifact[0] + artifact[1] + for artifact in PyPIArtifactRegistry.get_artifacts() + ], + skip_prestaged_dependencies=skip_prestaged_dependencies) diff --git a/sdks/python/apache_beam/transforms/environments_test.py b/sdks/python/apache_beam/transforms/environments_test.py index 155fe3c7b253..e40b2ad95435 100644 --- a/sdks/python/apache_beam/transforms/environments_test.py +++ b/sdks/python/apache_beam/transforms/environments_test.py @@ -20,10 +20,7 @@ # pytype: skip-file -from __future__ import absolute_import - import logging -import sys import unittest from apache_beam.options.pipeline_options import PortableOptions @@ -40,12 +37,6 @@ class RunnerApiTest(unittest.TestCase): - - if sys.version_info <= (3, ): - - def assertIn(self, first, second, msg=None): - self.assertTrue(first in second, msg) - def test_environment_encoding(self): for environment in (DockerEnvironment(), DockerEnvironment(container_image='img'), @@ -72,10 +63,13 @@ def test_environment_encoding(self): def test_sdk_capabilities(self): sdk_capabilities = environments.python_sdk_capabilities() self.assertIn(common_urns.coders.LENGTH_PREFIX.urn, sdk_capabilities) + self.assertIn( + common_urns.protocols.HARNESS_MONITORING_INFOS.urn, sdk_capabilities) self.assertIn(common_urns.protocols.WORKER_STATUS.urn, sdk_capabilities) self.assertIn( common_urns.sdf_components.TRUNCATE_SIZED_RESTRICTION.urn, sdk_capabilities) + self.assertIn(common_urns.primitives.TO_STRING.urn, sdk_capabilities) def test_default_capabilities(self): environment = DockerEnvironment.from_options( @@ -123,6 +117,17 @@ def test_process_variables_missing_rvalue(self): ]) ProcessEnvironment.from_options(options) + def test_environments_with_same_hints_are_equal(self): + options = PortableOptions([ + '--environment_type=PROCESS', + 
'--environment_option=process_command=foo', + '--sdk_location=container', + '--resource_hint=accelerator=gpu', + ]) + environment1 = ProcessEnvironment.from_options(options) + environment2 = ProcessEnvironment.from_options(options) + self.assertEqual(environment1, environment2) + if __name__ == '__main__': logging.getLogger().setLevel(logging.INFO) diff --git a/sdks/python/apache_beam/transforms/external.py b/sdks/python/apache_beam/transforms/external.py index 6b8f849ecc37..305cb97da1da 100644 --- a/sdks/python/apache_beam/transforms/external.py +++ b/sdks/python/apache_beam/transforms/external.py @@ -21,15 +21,10 @@ """ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - import contextlib import copy import functools -import sys import threading -from typing import ByteString from typing import Dict import grpc @@ -127,28 +122,10 @@ def _get_named_tuple_instance(self): for key, value in self._values.items() if value is not None } - # In python 2 named_fields_to_schema will not accept str because its - # ambiguous. This converts str hints to ByteString recursively so its clear - # we intend to use BYTES. - # TODO(BEAM-7372): Remove coercion to ByteString - def coerce_str_to_bytes(typ): - if typ == str: - return ByteString - - elif hasattr(typ, '__args__') and hasattr(typ, '__origin__'): - # Create a new type rather than modifying the existing one - typ = typ.__origin__[tuple(map(coerce_str_to_bytes, typ.__args__))] - - return typ - - if sys.version_info[0] >= 3: - coerce_str_to_bytes = lambda x: x - - schema = named_fields_to_schema([( - key, - coerce_str_to_bytes(convert_to_typing_type(instance_to_type(value)))) - for key, - value in values.items()]) + schema = named_fields_to_schema([ + (key, convert_to_typing_type(instance_to_type(value))) for key, + value in values.items() + ]) return named_tuple_from_schema(schema)(**values) @@ -170,8 +147,6 @@ def _get_named_tuple_instance(self): class AnnotationBasedPayloadBuilder(SchemaBasedPayloadBuilder): """ Build a payload based on an external transform's type annotations. - - Supported in python 3 only. """ def __init__(self, transform, **values): """ @@ -194,8 +169,6 @@ def _get_named_tuple_instance(self): class DataclassBasedPayloadBuilder(SchemaBasedPayloadBuilder): """ Build a payload based on an external transform that uses dataclasses. - - Supported in python 3 only. """ def __init__(self, transform): """ @@ -222,7 +195,10 @@ class ExternalTransform(ptransform.PTransform): Experimental; no backwards compatibility guarantees. """ _namespace_counter = 0 - _namespace = threading.local() # type: ignore[assignment] + + # Variable name _namespace conflicts with DisplayData._namespace so we use + # name _external_namespace here. 
+ _external_namespace = threading.local() _IMPULSE_PREFIX = 'impulse' @@ -240,9 +216,15 @@ def __init__(self, urn, payload, expansion_service=None): self._payload = ( payload.payload() if isinstance(payload, PayloadBuilder) else payload) self._expansion_service = expansion_service - self._namespace = self._fresh_namespace() + self._external_namespace = self._fresh_namespace() self._inputs = {} # type: Dict[str, pvalue.PCollection] - self._output = {} # type: Dict[str, pvalue.PCollection] + self._outputs = {} # type: Dict[str, pvalue.PCollection] + + def replace_named_inputs(self, named_inputs): + self._inputs = named_inputs + + def replace_named_outputs(self, named_outputs): + self._outputs = named_outputs def __post_init__(self, expansion_service): """ @@ -257,15 +239,15 @@ def default_label(self): @classmethod def get_local_namespace(cls): - return getattr(cls._namespace, 'value', 'external') + return getattr(cls._external_namespace, 'value', 'external') @classmethod @contextlib.contextmanager def outer_namespace(cls, namespace): prev = cls.get_local_namespace() - cls._namespace.value = namespace + cls._external_namespace.value = namespace yield - cls._namespace.value = prev + cls._external_namespace.value = prev @classmethod def _fresh_namespace(cls): @@ -306,7 +288,7 @@ def expand(self, pvalueish): components = context.to_runner_api() request = beam_expansion_api_pb2.ExpansionRequest( components=components, - namespace=self._namespace, # type: ignore # mypy thinks self._namespace is threading.local + namespace=self._external_namespace, # type: ignore # mypy thinks self._namespace is threading.local transform=transform_proto) with self._service() as service: @@ -412,7 +394,7 @@ def _normalize(coder_proto): return normalized for id, proto in self._expanded_components.coders.items(): - if id.startswith(self._namespace): + if id.startswith(self._external_namespace): context.coders.put_proto(id, proto) elif id in context.coders: if not _equivalent(context.coders._id_to_proto[id], proto): @@ -422,10 +404,10 @@ def _normalize(coder_proto): else: context.coders.put_proto(id, proto) for id, proto in self._expanded_components.windowing_strategies.items(): - if id.startswith(self._namespace): + if id.startswith(self._external_namespace): context.windowing_strategies.put_proto(id, proto) for id, proto in self._expanded_components.environments.items(): - if id.startswith(self._namespace): + if id.startswith(self._external_namespace): context.environments.put_proto(id, proto) for id, proto in self._expanded_components.pcollections.items(): id = pcoll_renames.get(id, id) @@ -436,10 +418,12 @@ def _normalize(coder_proto): if id.startswith(self._IMPULSE_PREFIX): # Our fake inputs. continue - assert id.startswith(self._namespace), (id, self._namespace) + assert id.startswith( + self._external_namespace), (id, self._external_namespace) new_proto = beam_runner_api_pb2.PTransform( unique_name=proto.unique_name, - spec=proto.spec, + # If URN is not set this is an empty spec. 
+ spec=proto.spec if proto.spec.urn else None, subtransforms=proto.subtransforms, inputs={ tag: pcoll_renames.get(pcoll, pcoll) diff --git a/sdks/python/apache_beam/transforms/external_it_test.py b/sdks/python/apache_beam/transforms/external_it_test.py index d99c21819716..e24b70f3d3d7 100644 --- a/sdks/python/apache_beam/transforms/external_it_test.py +++ b/sdks/python/apache_beam/transforms/external_it_test.py @@ -19,11 +19,9 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest -from nose.plugins.attrib import attr +import pytest import apache_beam as beam from apache_beam import Pipeline @@ -35,7 +33,7 @@ class ExternalTransformIT(unittest.TestCase): - @attr('IT') + @pytest.mark.it_postcommit def test_job_python_from_python_it(self): @ptransform.PTransform.register_urn('simple', None) class SimpleTransform(ptransform.PTransform): diff --git a/sdks/python/apache_beam/transforms/external_java.py b/sdks/python/apache_beam/transforms/external_java.py index a9cb723147fd..f0a963864c1d 100644 --- a/sdks/python/apache_beam/transforms/external_java.py +++ b/sdks/python/apache_beam/transforms/external_java.py @@ -17,8 +17,6 @@ """Tests for the Java external transforms.""" -from __future__ import absolute_import - import argparse import logging import subprocess @@ -26,7 +24,6 @@ import grpc from mock import patch -from past.builtins import unicode import apache_beam as beam from apache_beam.options.pipeline_options import PipelineOptions @@ -139,7 +136,7 @@ def run_pipeline(pipeline_options, expansion_service, wait_until_finish=True): res = ( p | beam.Create(list('aaabccxyyzzz')) - | beam.Map(unicode) + | beam.Map(str) | beam.ExternalTransform( TEST_FILTER_URN, ImplicitSchemaPayloadBuilder({'data': u'middle'}), diff --git a/sdks/python/apache_beam/transforms/external_test.py b/sdks/python/apache_beam/transforms/external_test.py index 60dffe565684..c357f7a791d1 100644 --- a/sdks/python/apache_beam/transforms/external_test.py +++ b/sdks/python/apache_beam/transforms/external_test.py @@ -19,31 +19,40 @@ # pytype: skip-file -from __future__ import absolute_import - +import dataclasses import logging -import sys import typing import unittest -from past.builtins import unicode - import apache_beam as beam from apache_beam import Pipeline from apache_beam.coders import RowCoder +from apache_beam.options.pipeline_options import PipelineOptions from apache_beam.portability.api.external_transforms_pb2 import ExternalConfigurationPayload +from apache_beam.runners import pipeline_context from apache_beam.runners.portability import expansion_service from apache_beam.runners.portability.expansion_service_test import FibTransform from apache_beam.testing.util import assert_that from apache_beam.testing.util import equal_to +from apache_beam.transforms.external import AnnotationBasedPayloadBuilder from apache_beam.transforms.external import ImplicitSchemaPayloadBuilder from apache_beam.transforms.external import NamedTupleBasedPayloadBuilder from apache_beam.typehints import typehints from apache_beam.typehints.native_type_compatibility import convert_to_beam_type +# Protect against environments where apitools library is not available. 
+# pylint: disable=wrong-import-order, wrong-import-position +try: + from apache_beam.runners.dataflow.internal import apiclient +except ImportError: + apiclient = None # type: ignore +# pylint: enable=wrong-import-order, wrong-import-position -def get_payload(args): - return ExternalConfigurationPayload(configuration=args) + +def get_payload(cls): + payload = ExternalConfigurationPayload() + payload.ParseFromString(cls._payload) + return payload class PayloadBase(object): @@ -83,32 +92,12 @@ def test_typing_payload_builder(self): for key, value in self.values.items(): self.assertEqual(getattr(decoded, key), value) - # TODO(BEAM-7372): Drop py2 specific "bytes" tests - def test_typing_payload_builder_with_bytes(self): - """ - string_utf8 coder will be used even if values are not unicode in python 2.x - """ - result = self.get_payload_from_typing_hints(self.bytes_values) - decoded = RowCoder(result.schema).decode(result.payload) - for key, value in self.values.items(): - self.assertEqual(getattr(decoded, key), value) - def test_typehints_payload_builder(self): result = self.get_payload_from_typing_hints(self.values) decoded = RowCoder(result.schema).decode(result.payload) for key, value in self.values.items(): self.assertEqual(getattr(decoded, key), value) - # TODO(BEAM-7372): Drop py2 specific "bytes" tests - def test_typehints_payload_builder_with_bytes(self): - """ - string_utf8 coder will be used even if values are not unicode in python 2.x - """ - result = self.get_payload_from_typing_hints(self.bytes_values) - decoded = RowCoder(result.schema).decode(result.payload) - for key, value in self.values.items(): - self.assertEqual(getattr(decoded, key), value) - def test_optional_error(self): """ value can only be None if typehint is Optional @@ -124,9 +113,9 @@ def get_payload_from_typing_hints(self, values): [ ('integer_example', int), ('boolean', bool), - ('string_example', unicode), - ('list_of_strings', typing.List[unicode]), - ('mapping', typing.Mapping[unicode, float]), + ('string_example', str), + ('list_of_strings', typing.List[str]), + ('mapping', typing.Mapping[str, float]), ('optional_integer', typing.Optional[int]), ]) @@ -161,18 +150,11 @@ def test_implicit_payload_builder_with_bytes(self): result = builder.build() decoded = RowCoder(result.schema).decode(result.payload) - if sys.version_info[0] < 3: - for key, value in PayloadBase.bytes_values.items(): - # Note the default value in the getattr call. - # ImplicitSchemaPayloadBuilder omits fields with valu=None since their - # type cannot be inferred. - self.assertEqual(getattr(decoded, key, None), value) - else: - for key, value in PayloadBase.values.items(): - # Note the default value in the getattr call. - # ImplicitSchemaPayloadBuilder omits fields with valu=None since their - # type cannot be inferred. - self.assertEqual(getattr(decoded, key, None), value) + for key, value in PayloadBase.values.items(): + # Note the default value in the getattr call. + # ImplicitSchemaPayloadBuilder omits fields with valu=None since their + # type cannot be inferred. + self.assertEqual(getattr(decoded, key, None), value) # Verify we have not modified a cached type (BEAM-10766) # TODO(BEAM-7372): Remove when bytes coercion code is removed. 
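The payload tests above all follow the same pattern: a typed NamedTuple is fed to a payload builder and the resulting `ExternalConfigurationPayload` is decoded back with `RowCoder`. The sketch below (editorial, not part of the patch; `MyConfig` is an illustrative name) shows that round trip end to end, assuming the builder/coder APIs used elsewhere in this file:

```python
# Editorial sketch of the payload round trip exercised by the tests above.
# MyConfig is an illustrative type, not something defined in this patch.
import typing

from apache_beam.coders import RowCoder
from apache_beam.portability.api.external_transforms_pb2 import ExternalConfigurationPayload
from apache_beam.transforms.external import NamedTupleBasedPayloadBuilder

MyConfig = typing.NamedTuple('MyConfig', [('data', str), ('limit', int)])

# The builder serializes the configuration into an ExternalConfigurationPayload.
payload_bytes = NamedTupleBasedPayloadBuilder(
    MyConfig(data='middle', limit=3)).payload()

# The payload carries a schema plus a row encoded with RowCoder.
payload = ExternalConfigurationPayload()
payload.ParseFromString(payload_bytes)
row = RowCoder(payload.schema).decode(payload.payload)
assert (row.data, row.limit) == ('middle', 3)
```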
@@ -205,6 +187,54 @@ def test_pipeline_generation(self): u'ExternalTransform(beam:transforms:xlang:test:prefix)/TestLabel', pipeline_from_proto.transforms_stack[0].parts[1].parts[0].full_label) + @unittest.skipIf(apiclient is None, 'GCP dependencies are not installed') + def test_pipeline_generation_with_runner_overrides(self): + pipeline_properties = [ + '--dataflow_endpoint=ignored', + '--job_name=test-job', + '--project=test-project', + '--staging_location=ignored', + '--temp_location=/dev/null', + '--no_auth', + '--dry_run=True', + '--sdk_location=container', + '--runner=DataflowRunner', + '--streaming' + ] + + with beam.Pipeline(options=PipelineOptions(pipeline_properties)) as p: + _ = ( + p + | beam.io.ReadFromPubSub( + subscription= + 'projects/dummy-project/subscriptions/dummy-subscription') + | beam.ExternalTransform( + 'beam:transforms:xlang:test:prefix', + ImplicitSchemaPayloadBuilder({'data': u'0'}), + expansion_service.ExpansionServiceServicer())) + + pipeline_proto, _ = p.to_runner_api(return_context=True) + + pubsub_read_transform = None + external_transform = None + proto_transforms = pipeline_proto.components.transforms + for id in proto_transforms: + if 'beam:transforms:xlang:test:prefix' in proto_transforms[ + id].unique_name: + external_transform = proto_transforms[id] + if 'ReadFromPubSub' in proto_transforms[id].unique_name: + pubsub_read_transform = proto_transforms[id] + + if not (pubsub_read_transform and external_transform): + raise ValueError( + 'Could not find an external transform and the PubSub read transform ' + 'in the pipeline') + + self.assertEqual(1, len(list(pubsub_read_transform.outputs.values()))) + self.assertEqual( + list(pubsub_read_transform.outputs.values()), + list(external_transform.inputs.values())) + def test_payload(self): with beam.Pipeline() as p: res = ( @@ -218,6 +248,35 @@ def test_nested(self): with beam.Pipeline() as p: assert_that(p | FibTransform(6), equal_to([8])) + def test_external_empty_spec_translation(self): + pipeline = beam.Pipeline() + external_transform = beam.ExternalTransform( + 'beam:transforms:xlang:test:prefix', + ImplicitSchemaPayloadBuilder({'data': u'0'}), + expansion_service.ExpansionServiceServicer()) + _ = (pipeline | beam.Create(['a', 'b']) | external_transform) + pipeline.run().wait_until_finish() + + external_transform_label = ( + 'ExternalTransform(beam:transforms:xlang:test:prefix)/TestLabel') + for transform in external_transform._expanded_components.transforms.values( + ): + # We clear the spec of one of the external transforms. 
+ if transform.unique_name == external_transform_label: + transform.spec.Clear() + + context = pipeline_context.PipelineContext() + proto_pipeline = pipeline.to_runner_api(context=context) + + proto_transform = None + for transform in proto_pipeline.components.transforms.values(): + if (transform.unique_name == + 'ExternalTransform(beam:transforms:xlang:test:prefix)/TestLabel'): + proto_transform = transform + + self.assertIsNotNone(proto_transform) + self.assertTrue(str(proto_transform).strip().find('spec {') == -1) + def test_unique_name(self): p = beam.Pipeline() _ = p | FibTransform(6) @@ -229,6 +288,123 @@ def test_unique_name(self): self.assertEqual( len(set(pcolls)), len(pcolls), msg='PCollection names are not unique.') + def test_external_transform_finder_non_leaf(self): + pipeline = beam.Pipeline() + _ = ( + pipeline + | beam.Create(['a', 'b']) + | beam.ExternalTransform( + 'beam:transforms:xlang:test:prefix', + ImplicitSchemaPayloadBuilder({'data': u'0'}), + expansion_service.ExpansionServiceServicer()) + | beam.Map(lambda x: x)) + pipeline.run().wait_until_finish() + + self.assertTrue(pipeline.contains_external_transforms) + + def test_external_transform_finder_leaf(self): + pipeline = beam.Pipeline() + _ = ( + pipeline + | beam.Create(['a', 'b']) + | beam.ExternalTransform( + 'beam:transforms:xlang:test:nooutput', + ImplicitSchemaPayloadBuilder({'data': u'0'}), + expansion_service.ExpansionServiceServicer())) + pipeline.run().wait_until_finish() + + self.assertTrue(pipeline.contains_external_transforms) + + +class ExternalAnnotationPayloadTest(PayloadBase, unittest.TestCase): + def get_payload_from_typing_hints(self, values): + class AnnotatedTransform(beam.ExternalTransform): + URN = 'beam:external:fakeurn:v1' + + def __init__( + self, + integer_example: int, + boolean: bool, + string_example: str, + list_of_strings: typing.List[str], + mapping: typing.Mapping[str, float], + optional_integer: typing.Optional[int] = None, + expansion_service=None): + super(AnnotatedTransform, self).__init__( + self.URN, + AnnotationBasedPayloadBuilder( + self, + integer_example=integer_example, + boolean=boolean, + string_example=string_example, + list_of_strings=list_of_strings, + mapping=mapping, + optional_integer=optional_integer, + ), + expansion_service) + + return get_payload(AnnotatedTransform(**values)) + + def get_payload_from_beam_typehints(self, values): + class AnnotatedTransform(beam.ExternalTransform): + URN = 'beam:external:fakeurn:v1' + + def __init__( + self, + integer_example: int, + boolean: bool, + string_example: str, + list_of_strings: typehints.List[str], + mapping: typehints.Dict[str, float], + optional_integer: typehints.Optional[int] = None, + expansion_service=None): + super(AnnotatedTransform, self).__init__( + self.URN, + AnnotationBasedPayloadBuilder( + self, + integer_example=integer_example, + boolean=boolean, + string_example=string_example, + list_of_strings=list_of_strings, + mapping=mapping, + optional_integer=optional_integer, + ), + expansion_service) + + return get_payload(AnnotatedTransform(**values)) + + +class ExternalDataclassesPayloadTest(PayloadBase, unittest.TestCase): + def get_payload_from_typing_hints(self, values): + @dataclasses.dataclass + class DataclassTransform(beam.ExternalTransform): + URN = 'beam:external:fakeurn:v1' + + integer_example: int + boolean: bool + string_example: str + list_of_strings: typing.List[str] + mapping: typing.Mapping[str, float] = dataclasses.field(default=dict) + optional_integer: typing.Optional[int] = None + 
expansion_service: dataclasses.InitVar[typing.Optional[str]] = None + + return get_payload(DataclassTransform(**values)) + + def get_payload_from_beam_typehints(self, values): + @dataclasses.dataclass + class DataclassTransform(beam.ExternalTransform): + URN = 'beam:external:fakeurn:v1' + + integer_example: int + boolean: bool + string_example: str + list_of_strings: typehints.List[str] + mapping: typehints.Dict[str, float] = {} + optional_integer: typehints.Optional[int] = None + expansion_service: dataclasses.InitVar[typehints.Optional[str]] = None + + return get_payload(DataclassTransform(**values)) + if __name__ == '__main__': logging.getLogger().setLevel(logging.INFO) diff --git a/sdks/python/apache_beam/transforms/external_test_py3.py b/sdks/python/apache_beam/transforms/external_test_py3.py deleted file mode 100644 index 2549eee5c474..000000000000 --- a/sdks/python/apache_beam/transforms/external_test_py3.py +++ /dev/null @@ -1,99 +0,0 @@ -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. -# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -"""Unit tests for the transform.external classes.""" - -# pytype: skip-file - -from __future__ import absolute_import - -import typing -import unittest - -import apache_beam as beam -from apache_beam import typehints -from apache_beam.portability.api.external_transforms_pb2 import ExternalConfigurationPayload -from apache_beam.transforms.external import AnnotationBasedPayloadBuilder -from apache_beam.transforms.external_test import PayloadBase - - -def get_payload(cls): - payload = ExternalConfigurationPayload() - payload.ParseFromString(cls._payload) - return payload - - -class ExternalAnnotationPayloadTest(PayloadBase, unittest.TestCase): - def get_payload_from_typing_hints(self, values): - class AnnotatedTransform(beam.ExternalTransform): - URN = 'beam:external:fakeurn:v1' - - def __init__( - self, - integer_example: int, - boolean: bool, - string_example: str, - list_of_strings: typing.List[str], - mapping: typing.Mapping[str, float], - optional_integer: typing.Optional[int] = None, - expansion_service=None): - super(AnnotatedTransform, self).__init__( - self.URN, - AnnotationBasedPayloadBuilder( - self, - integer_example=integer_example, - boolean=boolean, - string_example=string_example, - list_of_strings=list_of_strings, - mapping=mapping, - optional_integer=optional_integer, - ), - expansion_service) - - return get_payload(AnnotatedTransform(**values)) - - def get_payload_from_beam_typehints(self, values): - class AnnotatedTransform(beam.ExternalTransform): - URN = 'beam:external:fakeurn:v1' - - def __init__( - self, - integer_example: int, - boolean: bool, - string_example: str, - list_of_strings: typehints.List[str], - mapping: typehints.Dict[str, float], - optional_integer: typehints.Optional[int] = None, - expansion_service=None): - 
super(AnnotatedTransform, self).__init__( - self.URN, - AnnotationBasedPayloadBuilder( - self, - integer_example=integer_example, - boolean=boolean, - string_example=string_example, - list_of_strings=list_of_strings, - mapping=mapping, - optional_integer=optional_integer, - ), - expansion_service) - - return get_payload(AnnotatedTransform(**values)) - - -if __name__ == '__main__': - unittest.main() diff --git a/sdks/python/apache_beam/transforms/external_test_py37.py b/sdks/python/apache_beam/transforms/external_test_py37.py deleted file mode 100644 index 52eec45ba09c..000000000000 --- a/sdks/python/apache_beam/transforms/external_test_py37.py +++ /dev/null @@ -1,73 +0,0 @@ -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. -# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -"""Unit tests for the transform.external classes.""" - -# pytype: skip-file - -from __future__ import absolute_import - -import dataclasses -import typing -import unittest - -import apache_beam as beam -from apache_beam import typehints -from apache_beam.portability.api.external_transforms_pb2 import ExternalConfigurationPayload -from apache_beam.transforms.external_test import PayloadBase - - -def get_payload(cls): - payload = ExternalConfigurationPayload() - payload.ParseFromString(cls._payload) - return payload - - -class ExternalDataclassesPayloadTest(PayloadBase, unittest.TestCase): - def get_payload_from_typing_hints(self, values): - @dataclasses.dataclass - class DataclassTransform(beam.ExternalTransform): - URN = 'beam:external:fakeurn:v1' - - integer_example: int - boolean: bool - string_example: str - list_of_strings: typing.List[str] - mapping: typing.Mapping[str, float] = dataclasses.field(default=dict) - optional_integer: typing.Optional[int] = None - expansion_service: dataclasses.InitVar[typing.Optional[str]] = None - - return get_payload(DataclassTransform(**values)) - - def get_payload_from_beam_typehints(self, values): - @dataclasses.dataclass - class DataclassTransform(beam.ExternalTransform): - URN = 'beam:external:fakeurn:v1' - - integer_example: int - boolean: bool - string_example: str - list_of_strings: typehints.List[str] - mapping: typehints.Dict[str, float] = {} - optional_integer: typehints.Optional[int] = None - expansion_service: dataclasses.InitVar[typehints.Optional[str]] = None - - return get_payload(DataclassTransform(**values)) - - -if __name__ == '__main__': - unittest.main() diff --git a/sdks/python/apache_beam/transforms/periodicsequence.py b/sdks/python/apache_beam/transforms/periodicsequence.py index c117f8d37efb..9fc902992d40 100644 --- a/sdks/python/apache_beam/transforms/periodicsequence.py +++ b/sdks/python/apache_beam/transforms/periodicsequence.py @@ -15,9 +15,6 @@ # limitations under the License. 
# -from __future__ import absolute_import -from __future__ import division - import math import time @@ -37,6 +34,11 @@ class ImpulseSeqGenRestrictionProvider(core.RestrictionProvider): def initial_restriction(self, element): start, end, interval = element + if isinstance(start, Timestamp): + start = start.micros / 1000000 + if isinstance(end, Timestamp): + end = end.micros / 1000000 + assert start <= end assert interval > 0 total_outputs = math.ceil((end - start) / interval) @@ -81,6 +83,9 @@ def process( ''' start, _, interval = element + if isinstance(start, Timestamp): + start = start.micros / 1000000 + assert isinstance(restriction_tracker, sdf_utils.RestrictionTrackerView) current_output_index = restriction_tracker.current_restriction().start diff --git a/sdks/python/apache_beam/transforms/periodicsequence_test.py b/sdks/python/apache_beam/transforms/periodicsequence_test.py index 544eb5b37305..b18bf75d0709 100644 --- a/sdks/python/apache_beam/transforms/periodicsequence_test.py +++ b/sdks/python/apache_beam/transforms/periodicsequence_test.py @@ -19,13 +19,9 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - +import inspect import time import unittest -from builtins import range import apache_beam as beam from apache_beam.testing.test_pipeline import TestPipeline @@ -39,8 +35,6 @@ class PeriodicSequenceTest(unittest.TestCase): - # Enable nose tests running in parallel - def test_periodicsequence_outputs_valid_sequence(self): start_offset = 1 start_time = time.time() + start_offset @@ -81,6 +75,25 @@ def test_periodicimpulse_windowing_on_si(self): for x in range(0, int(duration / interval), 1)] assert_that(actual, equal_to(k)) + def test_periodicimpulse_default_start(self): + default_parameters = inspect.signature(PeriodicImpulse).parameters + it = default_parameters["start_timestamp"].default + duration = 1 + et = it + duration + interval = 0.5 + + # Check default `stop_timestamp` is the same type `start_timestamp` + is_same_type = isinstance( + it, type(default_parameters["stop_timestamp"].default)) + error = "'start_timestamp' and 'stop_timestamp' have different type" + assert is_same_type, error + + with TestPipeline() as p: + result = p | 'PeriodicImpulse' >> PeriodicImpulse(it, et, interval) + + k = [it + x * interval for x in range(0, int(duration / interval))] + assert_that(result, equal_to(k)) + def test_periodicsequence_outputs_valid_sequence_in_past(self): start_offset = -10000 it = time.time() + start_offset diff --git a/sdks/python/apache_beam/transforms/ptransform.py b/sdks/python/apache_beam/transforms/ptransform.py index 15d3699aa86e..b314abd20c90 100644 --- a/sdks/python/apache_beam/transforms/ptransform.py +++ b/sdks/python/apache_beam/transforms/ptransform.py @@ -36,8 +36,6 @@ class and wrapper class that allows lambda functions to be used as # pytype: skip-file -from __future__ import absolute_import - import copy import itertools import logging @@ -45,9 +43,6 @@ class and wrapper class that allows lambda functions to be used as import os import sys import threading -from builtins import hex -from builtins import object -from builtins import zip from functools import reduce from functools import wraps from typing import TYPE_CHECKING @@ -55,6 +50,7 @@ class and wrapper class that allows lambda functions to be used as from typing import Callable from typing import Dict from typing import List +from typing import Mapping from typing import Optional from typing import Sequence from 
typing import Tuple @@ -71,8 +67,10 @@ class and wrapper class that allows lambda functions to be used as from apache_beam.internal import util from apache_beam.portability import python_urns from apache_beam.pvalue import DoOutputsTuple +from apache_beam.transforms import resources from apache_beam.transforms.display import DisplayDataItem from apache_beam.transforms.display import HasDisplayData +from apache_beam.transforms.sideinputs import SIDE_INPUT_PREFIX from apache_beam.typehints import native_type_compatibility from apache_beam.typehints import typehints from apache_beam.typehints.decorators import IOTypeHints @@ -256,7 +254,7 @@ def visit(self, node): return self.visit_nested(node) -def get_named_nested_pvalues(pvalueish): +def get_named_nested_pvalues(pvalueish, as_inputs=False): if isinstance(pvalueish, tuple): # Check to see if it's a named tuple. fields = getattr(pvalueish, '_fields', None) @@ -265,16 +263,22 @@ def get_named_nested_pvalues(pvalueish): else: tagged_values = enumerate(pvalueish) elif isinstance(pvalueish, list): + if as_inputs: + # Full list treated as a list of value for eager evaluation. + yield None, pvalueish + return tagged_values = enumerate(pvalueish) elif isinstance(pvalueish, dict): tagged_values = pvalueish.items() else: - if isinstance(pvalueish, (pvalue.PValue, pvalue.DoOutputsTuple)): + if as_inputs or isinstance(pvalueish, + (pvalue.PValue, pvalue.DoOutputsTuple)): yield None, pvalueish return for tag, subvalue in tagged_values: - for subtag, subsubvalue in get_named_nested_pvalues(subvalue): + for subtag, subsubvalue in get_named_nested_pvalues( + subvalue, as_inputs=as_inputs): if subtag is None: yield tag, subsubvalue else: @@ -364,6 +368,9 @@ def default_label(self): # type: () -> str return self.__class__.__name__ + def annotations(self) -> Dict[str, Union[bytes, str, message.Message]]: + return {} + def default_type_hints(self): fn_type_hints = IOTypeHints.from_callable(self.expand) if fn_type_hints is not None: @@ -418,6 +425,35 @@ def with_output_types(self, type_hint): validate_composite_type_param(type_hint, 'Type hints for a PTransform') return super(PTransform, self).with_output_types(type_hint) + def with_resource_hints(self, **kwargs): # type: (...) -> PTransform + """Adds resource hints to the :class:`PTransform`. + + Resource hints allow users to express constraints on the environment where + the transform should be executed. Interpretation of the resource hints is + defined by Beam Runners. Runners may ignore the unsupported hints. + + Args: + **kwargs: key-value pairs describing hints and their values. + + Raises: + ValueError: if provided hints are unknown to the SDK. See + :mod:~apache_beam.transforms.resources` for a list of known hints. + + Returns: + PTransform: A reference to the instance of this particular + :class:`PTransform` object. + """ + self.get_resource_hints().update(resources.parse_resource_hints(kwargs)) + return self + + def get_resource_hints(self): + # type: () -> Dict[str, bytes] + if '_resource_hints' not in self.__dict__: + # PTransform subclasses don't always call super(), so prefer lazy + # initialization. By default, transforms don't have any resource hints. 
+ self._resource_hints = {} # type: Dict[str, bytes] + return self._resource_hints + def type_check_inputs(self, pvalueish): self.type_check_inputs_or_outputs(pvalueish, 'input') @@ -520,8 +556,13 @@ def get_windowing(self, inputs): By default most transforms just return the windowing function associated with the input PCollection (or the first input if several). """ - # TODO(robertwb): Assert all input WindowFns compatible. - return inputs[0].windowing + if inputs: + return inputs[0].windowing + else: + from apache_beam.transforms.core import Windowing + from apache_beam.transforms.window import GlobalWindows + # TODO(robertwb): Return something compatible with every windowing? + return Windowing(GlobalWindows()) def __rrshift__(self, label): return _NamedPTransform(self, label) @@ -535,6 +576,8 @@ def __or__(self, right): def __ror__(self, left, label=None): """Used to apply this PTransform to non-PValues, e.g., a tuple.""" pvalueish, pvalues = self._extract_input_pvalues(left) + if isinstance(pvalues, dict): + pvalues = tuple(pvalues.values()) pipelines = [v.pipeline for v in pvalues if isinstance(v, pvalue.PValue)] if pvalues and not pipelines: deferred = False @@ -556,15 +599,15 @@ def __ror__(self, left, label=None): for pp in pipelines: if p != pp: raise ValueError( - 'Mixing value from different pipelines not allowed.') + 'Mixing values in different pipelines is not allowed.' + '\n{%r} != {%r}' % (p, pp)) deferred = not getattr(p.runner, 'is_eager', False) # pylint: disable=wrong-import-order, wrong-import-position from apache_beam.transforms.core import Create # pylint: enable=wrong-import-order, wrong-import-position replacements = { id(v): p | 'CreatePInput%s' % ix >> Create(v, reshuffle=False) - for ix, - v in enumerate(pvalues) + for (ix, v) in enumerate(pvalues) if not isinstance(v, pvalue.PValue) and v is not None } pvalueish = _SetInputPValues().visit(pvalueish, replacements) @@ -594,19 +637,11 @@ def _extract_input_pvalues(self, pvalueish): if isinstance(pvalueish, pipeline.Pipeline): pvalueish = pvalue.PBegin(pvalueish) - def _dict_tuple_leaves(pvalueish): - if isinstance(pvalueish, tuple): - for a in pvalueish: - for p in _dict_tuple_leaves(a): - yield p - elif isinstance(pvalueish, dict): - for a in pvalueish.values(): - for p in _dict_tuple_leaves(a): - yield p - else: - yield pvalueish - - return pvalueish, tuple(_dict_tuple_leaves(pvalueish)) + return pvalueish, { + str(tag): value + for (tag, value) in get_named_nested_pvalues( + pvalueish, as_inputs=True) + } def _pvaluish_from_dict(self, input_dict): if len(input_dict) == 1: @@ -614,6 +649,34 @@ def _pvaluish_from_dict(self, input_dict): else: return input_dict + def _named_inputs(self, main_inputs, side_inputs): + # type: (Mapping[str, pvalue.PValue], Sequence[Any]) -> Dict[str, pvalue.PValue] + + """Returns the dictionary of named inputs (including side inputs) as they + should be named in the beam proto. + """ + main_inputs = { + tag: input + for (tag, input) in main_inputs.items() + if isinstance(input, pvalue.PCollection) + } + named_side_inputs = {(SIDE_INPUT_PREFIX + '%s') % ix: si.pvalue + for (ix, si) in enumerate(side_inputs)} + return dict(main_inputs, **named_side_inputs) + + def _named_outputs(self, outputs): + # type: (Dict[object, pvalue.PCollection]) -> Dict[str, pvalue.PCollection] + + """Returns the dictionary of named outputs as they should be named in the + beam proto. + """ + # TODO(BEAM-1833): Push names up into the sdk construction. 
+ return { + str(tag): output + for (tag, output) in outputs.items() + if isinstance(output, pvalue.PCollection) + } + _known_urns = {} # type: Dict[str, Tuple[Optional[type], ConstructorFn]] @classmethod @@ -697,18 +760,10 @@ def from_runner_api(cls, return None parameter_type, constructor = cls._known_urns[proto.spec.urn] - try: - return constructor( - proto, - proto_utils.parse_Bytes(proto.spec.payload, parameter_type), - context) - except Exception: - if context.allow_proto_holders: - # For external transforms we cannot build a Python ParDo object so - # we build a holder transform instead. - from apache_beam.transforms.core import RunnerAPIPTransformHolder - return RunnerAPIPTransformHolder(proto.spec, context) - raise + return constructor( + proto, + proto_utils.parse_Bytes(proto.spec.payload, parameter_type), + context) def to_runner_api_parameter( self, diff --git a/sdks/python/apache_beam/transforms/ptransform_test.py b/sdks/python/apache_beam/transforms/ptransform_test.py index d4f8424c32a5..9ce33f501294 100644 --- a/sdks/python/apache_beam/transforms/ptransform_test.py +++ b/sdks/python/apache_beam/transforms/ptransform_test.py @@ -19,26 +19,20 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import collections import operator +import pickle +import random import re -import sys import typing import unittest -from builtins import map -from builtins import range -from builtins import zip from functools import reduce +from typing import Iterable from typing import Optional +from unittest.mock import patch -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import import hamcrest as hc -from nose.plugins.attrib import attr +import pytest import apache_beam as beam import apache_beam.pvalue as pvalue @@ -47,11 +41,16 @@ from apache_beam.io.iobase import Read from apache_beam.metrics import Metrics from apache_beam.metrics.metric import MetricsFilter +from apache_beam.options.pipeline_options import PipelineOptions from apache_beam.options.pipeline_options import TypeOptions +from apache_beam.portability import common_urns from apache_beam.testing.test_pipeline import TestPipeline +from apache_beam.testing.test_stream import TestStream from apache_beam.testing.util import assert_that from apache_beam.testing.util import equal_to +from apache_beam.testing.util import is_empty from apache_beam.transforms import WindowInto +from apache_beam.transforms import trigger from apache_beam.transforms import window from apache_beam.transforms.display import DisplayData from apache_beam.transforms.display import DisplayDataItem @@ -68,15 +67,6 @@ class PTransformTest(unittest.TestCase): - # Enable nose tests running in parallel - _multiprocess_can_split_ = True - - @classmethod - def setUpClass(cls): - # Method has been renamed in Python 3 - if sys.version_info[0] < 3: - cls.assertCountEqual = cls.assertItemsEqual - def assertStartswith(self, msg, prefix): self.assertTrue( msg.startswith(prefix), '"%s" does not start with "%s"' % (msg, prefix)) @@ -189,7 +179,7 @@ def test_do_with_multiple_outputs_maintains_unique_name(self): assert_that(r1.m, equal_to([2, 3, 4]), label='r1') assert_that(r2.m, equal_to([3, 4, 5]), label='r2') - @attr('ValidatesRunner') + @pytest.mark.it_validatesrunner def test_impulse(self): with TestPipeline() as pipeline: result = pipeline | beam.Impulse() | beam.Map(lambda _: 0) @@ -197,7 +187,8 @@ def test_impulse(self): # 
TODO(BEAM-3544): Disable this test in streaming temporarily. # Remove sickbay-streaming tag after it's resolved. - @attr('ValidatesRunner', 'sickbay-streaming') + @pytest.mark.no_sickbay_streaming + @pytest.mark.it_validatesrunner def test_read_metrics(self): from apache_beam.io.utils import CountingSource @@ -223,7 +214,7 @@ def process(self, element): self.assertEqual(outputs_counter.committed, 100) self.assertEqual(outputs_counter.attempted, 100) - @attr('ValidatesRunner') + @pytest.mark.it_validatesrunner def test_par_do_with_multiple_outputs_and_using_yield(self): class SomeDoFn(beam.DoFn): """A custom DoFn using yield.""" @@ -242,7 +233,7 @@ def process(self, element): assert_that(results.odd, equal_to([1, 3]), label='assert:odd') assert_that(results.even, equal_to([2, 4]), label='assert:even') - @attr('ValidatesRunner') + @pytest.mark.it_validatesrunner def test_par_do_with_multiple_outputs_and_using_return(self): def some_fn(v): if v % 2 == 0: @@ -257,7 +248,7 @@ def some_fn(v): assert_that(results.odd, equal_to([1, 3]), label='assert:odd') assert_that(results.even, equal_to([2, 4]), label='assert:even') - @attr('ValidatesRunner') + @pytest.mark.it_validatesrunner def test_undeclared_outputs(self): with TestPipeline() as pipeline: nums = pipeline | 'Some Numbers' >> beam.Create([1, 2, 3, 4]) @@ -271,7 +262,7 @@ def test_undeclared_outputs(self): assert_that(results.odd, equal_to([1, 3]), label='assert:odd') assert_that(results.even, equal_to([2, 4]), label='assert:even') - @attr('ValidatesRunner') + @pytest.mark.it_validatesrunner def test_multiple_empty_outputs(self): with TestPipeline() as pipeline: nums = pipeline | 'Some Numbers' >> beam.Create([1, 3, 5]) @@ -482,6 +473,45 @@ def test_group_by_key(self): result = pcoll | 'Group' >> beam.GroupByKey() | _SortLists assert_that(result, equal_to([(1, [1, 2, 3]), (2, [1, 2]), (3, [1])])) + def test_group_by_key_unbounded_global_default_trigger(self): + test_options = PipelineOptions() + test_options.view_as(TypeOptions).allow_unsafe_triggers = False + with self.assertRaisesRegex( + ValueError, + 'GroupByKey cannot be applied to an unbounded PCollection with ' + + 'global windowing and a default trigger'): + with TestPipeline(options=test_options) as pipeline: + pipeline | TestStream() | beam.GroupByKey() + + def test_group_by_key_unsafe_trigger(self): + test_options = PipelineOptions() + test_options.view_as(TypeOptions).allow_unsafe_triggers = False + with self.assertRaisesRegex(ValueError, 'Unsafe trigger'): + with TestPipeline(options=test_options) as pipeline: + _ = ( + pipeline + | beam.Create([(None, None)]) + | WindowInto( + window.GlobalWindows(), + trigger=trigger.AfterCount(5), + accumulation_mode=trigger.AccumulationMode.ACCUMULATING) + | beam.GroupByKey()) + + def test_group_by_key_allow_unsafe_triggers(self): + test_options = PipelineOptions(flags=['--allow_unsafe_triggers']) + with TestPipeline(options=test_options) as pipeline: + pcoll = ( + pipeline + | beam.Create([(1, 1), (1, 2), (1, 3), (1, 4)]) + | WindowInto( + window.GlobalWindows(), + trigger=trigger.AfterCount(5), + accumulation_mode=trigger.AccumulationMode.ACCUMULATING) + | beam.GroupByKey()) + # We need five, but it only has four - Displays how this option is + # dangerous. 
+ assert_that(pcoll, is_empty()) + def test_group_by_key_reiteration(self): class MyDoFn(beam.DoFn): def process(self, gbk_result): @@ -500,6 +530,94 @@ def process(self, gbk_result): | 'Reiteration-Sum' >> beam.ParDo(MyDoFn())) assert_that(result, equal_to([(1, 170)])) + def test_group_by_key_deterministic_coder(self): + # pylint: disable=global-variable-not-assigned + global MyObject # for pickling of the class instance + + class MyObject: + def __init__(self, value): + self.value = value + + def __eq__(self, other): + return self.value == other.value + + def __hash__(self): + return hash(self.value) + + class MyObjectCoder(beam.coders.Coder): + def encode(self, o): + return pickle.dumps((o.value, random.random())) + + def decode(self, encoded): + return MyObject(pickle.loads(encoded)[0]) + + def as_deterministic_coder(self, *args): + return MydeterministicObjectCoder() + + def to_type_hint(self): + return MyObject + + class MydeterministicObjectCoder(beam.coders.Coder): + def encode(self, o): + return pickle.dumps(o.value) + + def decode(self, encoded): + return MyObject(pickle.loads(encoded)) + + def is_deterministic(self): + return True + + beam.coders.registry.register_coder(MyObject, MyObjectCoder) + + with TestPipeline() as pipeline: + pcoll = pipeline | beam.Create([(MyObject(k % 2), k) for k in range(10)]) + grouped = pcoll | beam.GroupByKey() | beam.MapTuple( + lambda k, vs: (k.value, sorted(vs))) + combined = pcoll | beam.CombinePerKey(sum) | beam.MapTuple( + lambda k, v: (k.value, v)) + assert_that( + grouped, + equal_to([(0, [0, 2, 4, 6, 8]), (1, [1, 3, 5, 7, 9])]), + 'CheckGrouped') + assert_that(combined, equal_to([(0, 20), (1, 25)]), 'CheckCombined') + + def test_group_by_key_non_deterministic_coder(self): + with self.assertRaisesRegex(Exception, r'deterministic'): + with TestPipeline() as pipeline: + _ = ( + pipeline + | beam.Create([(PickledObject(10), None)]) + | beam.GroupByKey() + | beam.MapTuple(lambda k, v: list(v))) + + def test_group_by_key_allow_non_deterministic_coder(self): + with TestPipeline() as pipeline: + # The GroupByKey below would fail without this option. + pipeline._options.view_as( + TypeOptions).allow_non_deterministic_key_coders = True + grouped = ( + pipeline + | beam.Create([(PickledObject(10), None)]) + | beam.GroupByKey() + | beam.MapTuple(lambda k, v: list(v))) + assert_that(grouped, equal_to([[None]])) + + def test_group_by_key_fake_deterministic_coder(self): + fresh_registry = beam.coders.typecoders.CoderRegistry() + with patch.object( + beam.coders, 'registry', fresh_registry), patch.object( + beam.coders.typecoders, 'registry', fresh_registry): + with TestPipeline() as pipeline: + # The GroupByKey below would fail without this registration. 
+ beam.coders.registry.register_fallback_coder( + beam.coders.coders.FakeDeterministicFastPrimitivesCoder()) + grouped = ( + pipeline + | beam.Create([(PickledObject(10), None)]) + | beam.GroupByKey() + | beam.MapTuple(lambda k, v: list(v))) + assert_that(grouped, equal_to([[None]])) + def test_partition_with_partition_fn(self): class SomePartitionFn(beam.PartitionFn): def partition_for(self, element, num_partitions, offset): @@ -560,7 +678,7 @@ def test_partition_followed_by_flatten_and_groupbykey(self): grouped = flattened | 'D' >> beam.GroupByKey() | _SortLists assert_that(grouped, equal_to([('aa', [1, 2]), ('bb', [2])])) - @attr('ValidatesRunner') + @pytest.mark.it_validatesrunner def test_flatten_pcollections(self): with TestPipeline() as pipeline: pcoll_1 = pipeline | 'Start 1' >> beam.Create([0, 1, 2, 3]) @@ -575,7 +693,7 @@ def test_flatten_no_pcollections(self): result = () | 'Empty' >> beam.Flatten(pipeline=pipeline) assert_that(result, equal_to([])) - @attr('ValidatesRunner') + @pytest.mark.it_validatesrunner def test_flatten_one_single_pcollection(self): with TestPipeline() as pipeline: input = [0, 1, 2, 3] @@ -584,7 +702,8 @@ def test_flatten_one_single_pcollection(self): assert_that(result, equal_to(input)) # TODO(BEAM-9002): Does not work in streaming mode on Dataflow. - @attr('ValidatesRunner', 'sickbay-streaming') + @pytest.mark.no_sickbay_streaming + @pytest.mark.it_validatesrunner def test_flatten_same_pcollections(self): with TestPipeline() as pipeline: pc = pipeline | beam.Create(['a', 'b']) @@ -597,7 +716,7 @@ def test_flatten_pcollections_in_iterable(self): result = [pcoll for pcoll in (pcoll_1, pcoll_2)] | beam.Flatten() assert_that(result, equal_to([0, 1, 2, 3, 4, 5, 6, 7])) - @attr('ValidatesRunner') + @pytest.mark.it_validatesrunner def test_flatten_a_flattened_pcollection(self): with TestPipeline() as pipeline: pcoll_1 = pipeline | 'Start 1' >> beam.Create([0, 1, 2, 3]) @@ -619,7 +738,7 @@ def test_flatten_input_type_must_be_iterable_of_pcolls(self): with self.assertRaises(TypeError): set([1, 2, 3]) | beam.Flatten() - @attr('ValidatesRunner') + @pytest.mark.it_validatesrunner def test_flatten_multiple_pcollections_having_multiple_consumers(self): with TestPipeline() as pipeline: input = pipeline | 'Start' >> beam.Create(['AA', 'BBB', 'CC']) @@ -679,6 +798,32 @@ def test_co_group_by_key_on_dict(self): 'X': [4], 'Y': [7, 8] })])) + def test_co_group_by_key_on_dict_with_tuple_keys(self): + with TestPipeline() as pipeline: + key = ('a', ('b', 'c')) + pcoll_1 = pipeline | 'Start 1' >> beam.Create([(key, 1)]) + pcoll_2 = pipeline | 'Start 2' >> beam.Create([(key, 2)]) + result = {'X': pcoll_1, 'Y': pcoll_2} | beam.CoGroupByKey() + result |= _SortLists + assert_that(result, equal_to([(key, {'X': [1], 'Y': [2]})])) + + def test_co_group_by_key_on_empty(self): + with TestPipeline() as pipeline: + assert_that( + tuple() | 'EmptyTuple' >> beam.CoGroupByKey(pipeline=pipeline), + equal_to([]), + label='AssertEmptyTuple') + assert_that([] | 'EmptyList' >> beam.CoGroupByKey(pipeline=pipeline), + equal_to([]), + label='AssertEmptyList') + assert_that( + iter([]) | 'EmptyIterable' >> beam.CoGroupByKey(pipeline=pipeline), + equal_to([]), + label='AssertEmptyIterable') + assert_that({} | 'EmptyDict' >> beam.CoGroupByKey(pipeline=pipeline), + equal_to([]), + label='AssertEmptyDict') + def test_group_by_key_input_must_be_kv_pairs(self): with self.assertRaises(typehints.TypeCheckError) as e: with TestPipeline() as pipeline: @@ -810,6 +955,17 @@ def expand(self, pcoll): 
self.assertEqual(sorted(res1), [1, 2, 4, 8]) self.assertEqual(sorted(res2), [1, 2, 4, 8]) + def test_resource_hint_application_is_additive(self): + t = beam.Map(lambda x: x + 1).with_resource_hints( + accelerator='gpu').with_resource_hints(min_ram=1).with_resource_hints( + accelerator='tpu') + self.assertEqual( + t.get_resource_hints(), + { + common_urns.resource_hints.ACCELERATOR.urn: b'tpu', + common_urns.resource_hints.MIN_RAM_BYTES.urn: b'1' + }) + class TestGroupBy(unittest.TestCase): def test_lambdas(self): @@ -1368,7 +1524,6 @@ def more_than_half(a): def test_filter_type_checks_using_type_hints_decorator(self): @with_input_types(b=int) def half(b): - import random return bool(random.choice([0, 1])) # Filter should deduce that it returns the same type that it takes. @@ -1933,16 +2088,10 @@ def test_mean_globally_pipeline_checking_violated(self): | 'C' >> beam.Create(['test']).with_output_types(str) | 'Mean' >> combine.Mean.Globally()) - if sys.version_info[0] >= 3: - expected_msg = \ - "Type hint violation for 'CombinePerKey': " \ - "requires Tuple[TypeVariable[K], Union[float, int]] " \ - "but got Tuple[None, str] for element" - else: - expected_msg = \ - "Type hint violation for 'CombinePerKey': " \ - "requires Tuple[TypeVariable[K], Union[float, int, long]] " \ - "but got Tuple[None, str] for element" + expected_msg = \ + "Type hint violation for 'CombinePerKey': " \ + "requires Tuple[TypeVariable[K], Union[float, int]] " \ + "but got Tuple[None, str] for element" self.assertStartswith(e.exception.args[0], expected_msg) @@ -2005,16 +2154,10 @@ def test_mean_per_key_pipeline_checking_violated(self): | 'EvenMean' >> combine.Mean.PerKey()) self.p.run() - if sys.version_info[0] >= 3: - expected_msg = \ - "Type hint violation for 'CombinePerKey(MeanCombineFn)': " \ - "requires Tuple[TypeVariable[K], Union[float, int]] " \ - "but got Tuple[str, str] for element" - else: - expected_msg = \ - "Type hint violation for 'CombinePerKey(MeanCombineFn)': " \ - "requires Tuple[TypeVariable[K], Union[float, int, long]] " \ - "but got Tuple[str, str] for element" + expected_msg = \ + "Type hint violation for 'CombinePerKey(MeanCombineFn)': " \ + "requires Tuple[TypeVariable[K], Union[float, int]] " \ + "but got Tuple[str, str] for element" self.assertStartswith(e.exception.args[0], expected_msg) @@ -2049,22 +2192,13 @@ def test_mean_per_key_runtime_checking_violated(self): | 'OddMean' >> combine.Mean.PerKey()) self.p.run() - if sys.version_info[0] >= 3: - expected_msg = \ - "Runtime type violation detected within " \ - "OddMean/CombinePerKey(MeanCombineFn): " \ - "Type-hint for argument: 'element' violated: " \ - "Union[float, int] type-constraint violated. " \ - "Expected an instance of one of: ('float', 'int'), " \ - "received str instead" - else: - expected_msg = \ - "Runtime type violation detected within " \ - "OddMean/CombinePerKey(MeanCombineFn): " \ - "Type-hint for argument: 'element' violated: " \ - "Union[float, int, long] type-constraint violated. " \ - "Expected an instance of one of: ('float', 'int', 'long'), " \ - "received str instead" + expected_msg = \ + "Runtime type violation detected within " \ + "OddMean/CombinePerKey(MeanCombineFn): " \ + "Type-hint for argument: 'element' violated: " \ + "Union[float, int] type-constraint violated. 
" \ + "Expected an instance of one of: ('float', 'int'), " \ + "received str instead" self.assertStartswith(e.exception.args[0], expected_msg) @@ -2199,9 +2333,8 @@ def test_per_key_pipeline_checking_violated(self): self.assertStartswith( e.exception.args[0], - "Type hint violation for 'CombinePerKey(TopCombineFn)': " - "requires Tuple[TypeVariable[K], TypeVariable[T]] " - "but got {} for element".format(int)) + "Input type hint violation at TopMod: expected Tuple[TypeVariable[K], " + "TypeVariable[V]], got {}".format(int)) def test_per_key_pipeline_checking_satisfied(self): d = ( @@ -2358,10 +2491,8 @@ def test_to_dict_pipeline_check_violated(self): self.assertStartswith( e.exception.args[0], - "Type hint violation for 'CombinePerKey': " - "requires " - "Tuple[TypeVariable[K], Tuple[TypeVariable[K], TypeVariable[V]]] " - "but got Tuple[None, int] for element") + "Input type hint violation at ToDict: expected Tuple[TypeVariable[K], " + "TypeVariable[V]], got {}".format(int)) def test_to_dict_pipeline_check_satisfied(self): d = ( @@ -2502,11 +2633,19 @@ def _sort_lists(result): return tuple(_sort_lists(e) for e in result) elif isinstance(result, dict): return {k: _sort_lists(v) for k, v in result.items()} + elif isinstance(result, Iterable) and not isinstance(result, str): + return sorted(result) else: return result _SortLists = beam.Map(_sort_lists) + +class PickledObject(object): + def __init__(self, value): + self.value = value + + if __name__ == '__main__': unittest.main() diff --git a/sdks/python/apache_beam/transforms/py_dataflow_distribution_counter.py b/sdks/python/apache_beam/transforms/py_dataflow_distribution_counter.py index 0be434879c78..61e0c87a2d7d 100644 --- a/sdks/python/apache_beam/transforms/py_dataflow_distribution_counter.py +++ b/sdks/python/apache_beam/transforms/py_dataflow_distribution_counter.py @@ -19,11 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - -from builtins import object -from builtins import range - globals()['INT64_MAX'] = 2**63 - 1 globals()['INT64_MIN'] = -2**63 diff --git a/sdks/python/apache_beam/transforms/resources.py b/sdks/python/apache_beam/transforms/resources.py new file mode 100644 index 000000000000..e201b6488764 --- /dev/null +++ b/sdks/python/apache_beam/transforms/resources.py @@ -0,0 +1,218 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""A module for defining resource requirements for execution of transforms. + +Pipeline authors can use resource hints to provide additional information to +runners about the desired aspects of the execution environment. + +Resource hints can be specified on a transform level for parts of the pipeline, +or globally via --resource_hint pipeline option. + +See also: PTransforms.with_resource_hints(). 
+""" + +import re +from typing import TYPE_CHECKING +from typing import Any +from typing import Dict +from typing import Optional + +from apache_beam.options.pipeline_options import StandardOptions +from apache_beam.portability.common_urns import resource_hints + +if TYPE_CHECKING: + from typing import Mapping + from apache_beam.options.pipeline_options import PipelineOptions + +__all__ = [ + 'ResourceHint', + 'AcceleratorHint', + 'MinRamHint', + 'merge_resource_hints', + 'parse_resource_hints', + 'resource_hints_from_options', +] + + +class ResourceHint: + """A superclass to define resource hints.""" + # A unique URN, one per Resource Hint class. + urn = None # type: Optional[str] + + _urn_to_known_hints = {} # type: Dict[str, type] + _name_to_known_hints = {} # type: Dict[str, type] + + @classmethod + def parse(cls, value): # type: (str) -> Dict[str, bytes] + """Describes how to parse the hint. + Override to specify a custom parsing logic.""" + assert cls.urn is not None + # Override this method to have a custom parsing logic. + return {cls.urn: ResourceHint._parse_str(value)} + + @classmethod + def get_merged_value( + cls, outer_value, inner_value): # type: (bytes, bytes) -> bytes + """Reconciles values of a hint when the hint specified on a transform is + also defined in an outer context, for example on a composite transform, or + specified in the transform's execution environment. + Override to specify a custom merging logic. + """ + # Defaults to the inner value as it is the most specific one. + return inner_value + + @staticmethod + def get_by_urn(urn): + return ResourceHint._urn_to_known_hints[urn] + + @staticmethod + def get_by_name(name): + return ResourceHint._name_to_known_hints[name] + + @staticmethod + def register_resource_hint( + hint_name, hint_class): # type: (str, type) -> None + assert issubclass(hint_class, ResourceHint) + assert hint_class.urn is not None + ResourceHint._name_to_known_hints[hint_name] = hint_class + ResourceHint._urn_to_known_hints[hint_class.urn] = hint_class + + @staticmethod + def _parse_str(value): + if not isinstance(value, str): + raise ValueError("Input must be a string.") + return value.encode('ascii') + + @staticmethod + def _parse_int(value): + if isinstance(value, str): + value = int(value) + if not isinstance(value, int): + raise ValueError("Input must be an integer.") + return str(value).encode('ascii') + + @staticmethod + def _parse_storage_size_str(value): + """Parses a human-friendly storage size string into a number of bytes. 
+ """ + if isinstance(value, int): + return ResourceHint._parse_int(value) + + if not isinstance(value, str): + raise ValueError("Input must be a string or integer.") + + value = value.strip().replace(" ", "") + units = { + 'PiB': 2**50, + 'TiB': 2**40, + 'GiB': 2**30, + 'MiB': 2**20, + 'KiB': 2**10, + 'PB': 10**15, + 'TB': 10**12, + 'GB': 10**9, + 'MB': 10**6, + 'KB': 10**3, + 'B': 1, + } + match = re.match(r'.*?(\D+)$', value) + if not match: + raise ValueError("Unrecognized value pattern.") + + suffix = match.group(1) + if suffix not in units: + raise ValueError("Unrecognized unit.") + multiplier = units[suffix] + value = value[:-len(suffix)] + + return str(round(float(value) * multiplier)).encode('ascii') + + @staticmethod + def _use_max(v1, v2): + return str(max(int(v1), int(v2))).encode('ascii') + + +class AcceleratorHint(ResourceHint): + """Describes desired hardware accelerators in execution environment.""" + urn = resource_hints.ACCELERATOR.urn + + +ResourceHint.register_resource_hint('accelerator', AcceleratorHint) + + +class MinRamHint(ResourceHint): + """Describes min RAM requirements for transform's execution environment.""" + urn = resource_hints.MIN_RAM_BYTES.urn + + @classmethod + def parse(cls, value): # type: (str) -> Dict[str, bytes] + return {cls.urn: ResourceHint._parse_storage_size_str(value)} + + @classmethod + def get_merged_value( + cls, outer_value, inner_value): # type: (bytes, bytes) -> bytes + return ResourceHint._use_max(outer_value, inner_value) + + +ResourceHint.register_resource_hint('min_ram', MinRamHint) +# Alias for interoperability with SDKs preferring camelCase. +ResourceHint.register_resource_hint('minRam', MinRamHint) + + +def parse_resource_hints(hints): # type: (Dict[Any, Any]) -> Dict[str, bytes] + parsed_hints = {} + for hint, value in hints.items(): + try: + hint_cls = ResourceHint.get_by_name(hint) + try: + parsed_hints.update(hint_cls.parse(value)) + except ValueError: + raise ValueError(f"Resource hint {hint} has invalid value {value}.") + except KeyError: + raise ValueError(f"Unknown resource hint: {hint}.") + + return parsed_hints + + +def resource_hints_from_options(options): + # type: (Optional[PipelineOptions]) -> Dict[str, bytes] + if options is None: + return {} + hints = {} + option_specified_hints = options.view_as(StandardOptions).resource_hints + for hint in option_specified_hints: + if '=' in hint: + k, v = hint.split('=', maxsplit=1) + hints[k] = v + else: + hints[hint] = None + + return parse_resource_hints(hints) + + +def merge_resource_hints( + outer_hints, inner_hints +): # type: (Mapping[str, bytes], Mapping[str, bytes]) -> Dict[str, bytes] + merged_hints = dict(inner_hints) + for urn, outer_value in outer_hints.items(): + if urn in inner_hints: + merged_value = ResourceHint.get_by_urn(urn).get_merged_value( + outer_value=outer_value, inner_value=inner_hints[urn]) + else: + merged_value = outer_value + merged_hints[urn] = merged_value + return merged_hints diff --git a/sdks/python/apache_beam/transforms/resources_test.py b/sdks/python/apache_beam/transforms/resources_test.py new file mode 100644 index 000000000000..19e1307d2bd7 --- /dev/null +++ b/sdks/python/apache_beam/transforms/resources_test.py @@ -0,0 +1,66 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. 
+# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import unittest + +from parameterized import param +from parameterized import parameterized + +from apache_beam import PTransform + + +class ResourcesTest(unittest.TestCase): + @parameterized.expand([ + param( + name='min_ram', + val='100 MiB', + urn='beam:resources:min_ram_bytes:v1', + bytestr=b'104857600'), + param( + name='minRam', + val='100MB', + urn='beam:resources:min_ram_bytes:v1', + bytestr=b'100000000'), + param( + name='min_ram', + val='6.5 GiB', + urn='beam:resources:min_ram_bytes:v1', + bytestr=b'6979321856'), + param( + name='accelerator', + val='gpu', + urn='beam:resources:accelerator:v1', + bytestr=b'gpu'), + ]) + def test_known_resource_hints(self, name, val, urn, bytestr): + t = PTransform() + t = t.with_resource_hints(**{name: val}) + self.assertEqual(t.get_resource_hints(), {urn: bytestr}) + + @parameterized.expand([ + param(name='min_ram', val='3,500G'), + param(name='accelerator', val=1), + param(name='unknown_hint', val=1) + ]) + def test_resource_hint_parsing_fails_early(self, name, val): + t = PTransform() + with self.assertRaises(ValueError): + _ = t.with_resource_hints(**{name: val}) + + +if __name__ == '__main__': + unittest.main() diff --git a/sdks/python/apache_beam/transforms/sideinputs.py b/sdks/python/apache_beam/transforms/sideinputs.py index 9832f0f01084..5c92eafe5422 100644 --- a/sdks/python/apache_beam/transforms/sideinputs.py +++ b/sdks/python/apache_beam/transforms/sideinputs.py @@ -26,10 +26,7 @@ # pytype: skip-file -from __future__ import absolute_import - import re -from builtins import object from typing import TYPE_CHECKING from typing import Any from typing import Callable @@ -58,6 +55,9 @@ def default_window_mapping_fn(target_window_fn): if target_window_fn == window.GlobalWindows(): return _global_window_mapping_fn + if isinstance(target_window_fn, window.Sessions): + raise RuntimeError("Sessions is not allowed in side inputs") + def map_via_end(source_window): # type: (window.BoundedWindow) -> window.BoundedWindow return list( diff --git a/sdks/python/apache_beam/transforms/sideinputs_test.py b/sdks/python/apache_beam/transforms/sideinputs_test.py index c8d118e7c499..29a88452cb5c 100644 --- a/sdks/python/apache_beam/transforms/sideinputs_test.py +++ b/sdks/python/apache_beam/transforms/sideinputs_test.py @@ -19,12 +19,11 @@ # pytype: skip-file -from __future__ import absolute_import - +import itertools import logging import unittest -from nose.plugins.attrib import attr +import pytest import apache_beam as beam from apache_beam.testing.test_pipeline import TestPipeline @@ -39,9 +38,6 @@ class SideInputsTest(unittest.TestCase): - # Enable nose tests running in parallel - _multiprocess_can_split_ = True - def create_pipeline(self): return TestPipeline() @@ -147,7 +143,7 @@ def test_windowed_dict(self): }), ]) - @attr('ValidatesRunner') + @pytest.mark.it_validatesrunner def test_empty_singleton_side_input(self): pipeline = self.create_pipeline() pcol = pipeline | 'start' >> 
beam.Create([1, 2]) @@ -165,7 +161,8 @@ def my_fn(k, s): # TODO(BEAM-5025): Disable this test in streaming temporarily. # Remove sickbay-streaming tag after it's fixed. - @attr('ValidatesRunner', 'sickbay-streaming') + @pytest.mark.no_sickbay_streaming + @pytest.mark.it_validatesrunner def test_multi_valued_singleton_side_input(self): pipeline = self.create_pipeline() pcol = pipeline | 'start' >> beam.Create([1, 2]) @@ -175,7 +172,7 @@ def test_multi_valued_singleton_side_input(self): with self.assertRaises(Exception): pipeline.run() - @attr('ValidatesRunner') + @pytest.mark.it_validatesrunner def test_default_value_singleton_side_input(self): pipeline = self.create_pipeline() pcol = pipeline | 'start' >> beam.Create([1, 2]) @@ -185,7 +182,7 @@ def test_default_value_singleton_side_input(self): assert_that(result, equal_to([10, 20])) pipeline.run() - @attr('ValidatesRunner') + @pytest.mark.it_validatesrunner def test_iterable_side_input(self): pipeline = self.create_pipeline() pcol = pipeline | 'start' >> beam.Create([1, 2]) @@ -195,7 +192,34 @@ def test_iterable_side_input(self): assert_that(result, equal_to([3, 4, 6, 8])) pipeline.run() - @attr('ValidatesRunner') + @pytest.mark.it_validatesrunner + def test_reiterable_side_input(self): + expected_side = frozenset(range(100)) + + def check_reiteration(main, side): + assert expected_side == set(side), side + # Iterate a second time. + assert expected_side == set(side), side + # Iterate over two copies of the input at the same time. + both = zip(side, side) + first, second = zip(*both) + assert expected_side == set(first), first + assert expected_side == set(second), second + # This will iterate over two copies of the side input, but offset. + offset = [None] * (len(expected_side) // 2) + both = zip(itertools.chain(side, offset), itertools.chain(offset, side)) + first, second = zip(*both) + expected_and_none = frozenset.union(expected_side, [None]) + assert expected_and_none == set(first), first + assert expected_and_none == set(second), second + + pipeline = self.create_pipeline() + pcol = pipeline | 'start' >> beam.Create(['A', 'B']) + side = pipeline | 'side' >> beam.Create(expected_side) + _ = pcol | 'check' >> beam.Map(check_reiteration, beam.pvalue.AsIter(side)) + pipeline.run() + + @pytest.mark.it_validatesrunner def test_as_list_and_as_dict_side_inputs(self): a_list = [5, 1, 3, 2, 9] some_pairs = [('crouton', 17), ('supreme', None)] @@ -222,7 +246,7 @@ def match(actual): assert_that(results, matcher(1, a_list, some_pairs)) pipeline.run() - @attr('ValidatesRunner') + @pytest.mark.it_validatesrunner def test_as_singleton_without_unique_labels(self): # This should succeed as calling beam.pvalue.AsSingleton on the same # PCollection twice with the same defaults will return the same @@ -250,7 +274,7 @@ def match(actual): assert_that(results, matcher(1, 2)) pipeline.run() - @attr('ValidatesRunner') + @pytest.mark.it_validatesrunner def test_as_singleton_with_different_defaults(self): a_list = [] pipeline = self.create_pipeline() @@ -275,7 +299,7 @@ def match(actual): assert_that(results, matcher(1, 2, 3)) pipeline.run() - @attr('ValidatesRunner') + @pytest.mark.it_validatesrunner def test_as_list_twice(self): # This should succeed as calling beam.pvalue.AsList on the same # PCollection twice will return the same view. 
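# Editor's sketch (not part of the patch): the new guard added to
# default_window_mapping_fn above rejects Sessions-windowed side inputs eagerly
# instead of returning a window mapping function.
from apache_beam.transforms import sideinputs
from apache_beam.transforms import window

try:
  sideinputs.default_window_mapping_fn(window.Sessions(gap_size=10))
except RuntimeError as e:
  print(e)  # Sessions is not allowed in side inputs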
@@ -302,7 +326,7 @@ def match(actual): assert_that(results, matcher(1, [1, 2, 3])) pipeline.run() - @attr('ValidatesRunner') + @pytest.mark.it_validatesrunner def test_as_dict_twice(self): some_kvs = [('a', 1), ('b', 2)] pipeline = self.create_pipeline() @@ -327,7 +351,7 @@ def match(actual): assert_that(results, matcher(1, some_kvs)) pipeline.run() - @attr('ValidatesRunner') + @pytest.mark.it_validatesrunner def test_flattened_side_input(self): pipeline = self.create_pipeline() main_input = pipeline | 'main input' >> beam.Create([None]) @@ -341,7 +365,9 @@ def test_flattened_side_input(self): pipeline.run() # TODO(BEAM-9499): Disable this test in streaming temporarily. - @attr('ValidatesRunner', 'sickbay-batch', 'sickbay-streaming') + @pytest.mark.no_sickbay_batch + @pytest.mark.no_sickbay_streaming + @pytest.mark.it_validatesrunner def test_multi_triggered_gbk_side_input(self): """Test a GBK sideinput, with multiple triggering.""" # TODO(BEAM-9322): Remove use of this experiment. diff --git a/sdks/python/apache_beam/transforms/sql.py b/sdks/python/apache_beam/transforms/sql.py index 244cd17ef049..30d546443d06 100644 --- a/sdks/python/apache_beam/transforms/sql.py +++ b/sdks/python/apache_beam/transforms/sql.py @@ -19,12 +19,8 @@ # pytype: skip-file -from __future__ import absolute_import - import typing -from past.builtins import unicode - from apache_beam.transforms.external import BeamJarExpansionService from apache_beam.transforms.external import ExternalTransform from apache_beam.transforms.external import NamedTupleBasedPayloadBuilder @@ -32,8 +28,7 @@ __all__ = ['SqlTransform'] SqlTransformSchema = typing.NamedTuple( - 'SqlTransformSchema', [('query', unicode), - ('dialect', typing.Optional[unicode])]) + 'SqlTransformSchema', [('query', str), ('dialect', typing.Optional[str])]) class SqlTransform(ExternalTransform): diff --git a/sdks/python/apache_beam/transforms/sql_test.py b/sdks/python/apache_beam/transforms/sql_test.py index c38fb9b03068..854aec078ce5 100644 --- a/sdks/python/apache_beam/transforms/sql_test.py +++ b/sdks/python/apache_beam/transforms/sql_test.py @@ -19,14 +19,11 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import typing import unittest -from nose.plugins.attrib import attr -from past.builtins import unicode +import pytest import apache_beam as beam from apache_beam import coders @@ -37,18 +34,18 @@ from apache_beam.transforms.sql import SqlTransform SimpleRow = typing.NamedTuple( - "SimpleRow", [("id", int), ("str", unicode), ("flt", float)]) + "SimpleRow", [("id", int), ("str", str), ("flt", float)]) coders.registry.register_coder(SimpleRow, coders.RowCoder) -Enrich = typing.NamedTuple("Enrich", [("id", int), ("metadata", unicode)]) +Enrich = typing.NamedTuple("Enrich", [("id", int), ("metadata", str)]) coders.registry.register_coder(Enrich, coders.RowCoder) Shopper = typing.NamedTuple( - "Shopper", [("shopper", unicode), ("cart", typing.Mapping[unicode, int])]) + "Shopper", [("shopper", str), ("cart", typing.Mapping[str, int])]) coders.registry.register_coder(Shopper, coders.RowCoder) -@attr('UsesSqlExpansionService') +@pytest.mark.xlang_sql_expansion_service @unittest.skipIf( TestPipeline().get_pipeline_options().view_as(StandardOptions).runner is None, @@ -60,15 +57,14 @@ class SqlTransformTest(unittest.TestCase): job server. The easiest way to accomplish this is to run the `validatesCrossLanguageRunnerPythonUsingSql` gradle target for a particular job server, which will start the runner and job server for you. 
For example, - `:runners:flink:1.10:job-server:validatesCrossLanguageRunnerPythonUsingSql` to - test on Flink 1.10. + `:runners:flink:1.13:job-server:validatesCrossLanguageRunnerPythonUsingSql` to + test on Flink 1.13. Alternatively, you may be able to iterate faster if you run the tests directly using a runner like `FlinkRunner`, which can start a local Flink cluster and job server for you: $ pip install -e './sdks/python[gcp,test]' - $ python ./sdks/python/setup.py nosetests \\ - --tests apache_beam.transforms.sql_test \\ + $ pytest apache_beam/transforms/sql_test.py \\ --test-pipeline-options="--runner=FlinkRunner" """ _multiprocess_can_split_ = True @@ -149,7 +145,7 @@ def test_row(self): out = ( p | beam.Create([1, 2, 10]) - | beam.Map(lambda x: beam.Row(a=x, b=unicode(x))) + | beam.Map(lambda x: beam.Row(a=x, b=str(x))) | SqlTransform("SELECT a*a as s, LENGTH(b) AS c FROM PCOLLECTION")) assert_that(out, equal_to([(1, 1), (4, 1), (100, 2)])) diff --git a/sdks/python/apache_beam/transforms/stats.pxd b/sdks/python/apache_beam/transforms/stats.pxd new file mode 100644 index 000000000000..e67c0121e722 --- /dev/null +++ b/sdks/python/apache_beam/transforms/stats.pxd @@ -0,0 +1,60 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +cimport cython +from libc.stdint cimport int64_t + +cdef class _QuantileSpec(object): + cdef readonly int64_t buffer_size + cdef readonly int64_t num_buffers + cdef readonly bint weighted + cdef readonly key + cdef readonly bint reverse + cdef readonly weighted_key + cdef readonly less_than + +cdef class _QuantileBuffer(object): + cdef readonly elements + cdef readonly weights + cdef readonly bint weighted + cdef readonly int64_t level + cdef readonly min_val + cdef readonly max_val + cdef readonly _iter + +cdef class _QuantileState(object): + cdef readonly _QuantileSpec spec + cdef public buffers + cdef public unbuffered_elements + cdef public unbuffered_weights + cdef public add_unbuffered + cpdef bint is_empty(self) + @cython.locals(num_new_buffers=int64_t, idx=int64_t) + cpdef _add_unbuffered(self, elements, offset_fn) + @cython.locals(num_new_buffers=int64_t, idx=int64_t) + cpdef _add_unbuffered_weighted(self, elements, offset_fn) + cpdef finalize(self) + @cython.locals(min_level=int64_t) + cpdef collapse_if_needed(self, offset_fn) + + +@cython.locals(new_level=int64_t, new_weight=double, step=double, offset=double) +cdef _QuantileBuffer _collapse(buffers, offset_fn, _QuantileSpec spec) + +@cython.locals(j=int64_t) +cdef _interpolate(buffers, int64_t count, double step, double offset, + _QuantileSpec spec) \ No newline at end of file diff --git a/sdks/python/apache_beam/transforms/stats.py b/sdks/python/apache_beam/transforms/stats.py index 9d19bc43e4fa..cbd79f474727 100644 --- a/sdks/python/apache_beam/transforms/stats.py +++ b/sdks/python/apache_beam/transforms/stats.py @@ -15,6 +15,8 @@ # limitations under the License. # +# cython: language_level=3 + """This module has all statistic related transforms. This ApproximateUnique class will be deprecated [1]. PLease look into using @@ -28,22 +30,16 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import hashlib import heapq import itertools import logging import math -import sys import typing -from builtins import round from typing import Any -from typing import Generic -from typing import Iterable +from typing import Callable from typing import List -from typing import Sequence +from typing import Tuple from apache_beam import coders from apache_beam import typehints @@ -61,30 +57,34 @@ K = typing.TypeVar('K') V = typing.TypeVar('V') +try: + import mmh3 # pylint: disable=import-error -def _get_default_hash_fn(): - """Returns either murmurhash or md5 based on installation.""" - try: - import mmh3 # pylint: disable=import-error + def _mmh3_hash(value): + # mmh3.hash64 returns two 64-bit unsigned integers + return mmh3.hash64(value, seed=0, signed=False)[0] - def _mmh3_hash(value): - # mmh3.hash64 returns two 64-bit unsigned integers - return mmh3.hash64(value, seed=0, signed=False)[0] + _default_hash_fn = _mmh3_hash + _default_hash_fn_type = 'mmh3' +except ImportError: - return _mmh3_hash + def _md5_hash(value): + # md5 is a 128-bit hash, so we truncate the hexdigest (string of 32 + # hexadecimal digits) to 16 digits and convert to int to get the 64-bit + # integer fingerprint. + return int(hashlib.md5(value).hexdigest()[:16], 16) - except ImportError: + _default_hash_fn = _md5_hash + _default_hash_fn_type = 'md5' + + +def _get_default_hash_fn(): + """Returns either murmurhash or md5 based on installation.""" + if _default_hash_fn_type == 'md5': logging.warning( 'Couldn\'t find murmurhash. 
Install mmh3 for a faster implementation of' 'ApproximateUnique.') - - def _md5_hash(value): - # md5 is a 128-bit hash, so we truncate the hexdigest (string of 32 - # hexadecimal digits) to 16 digits and convert to int to get the 64-bit - # integer fingerprint. - return int(hashlib.md5(value).hexdigest()[:16], 16) - - return _md5_hash + return _default_hash_fn class ApproximateUnique(object): @@ -141,8 +141,7 @@ def _get_sample_size_from_est_error(est_err): Calculate sample size from estimation error """ - #math.ceil in python2.7 returns a float, while it returns an int in python3. - return int(math.ceil(4.0 / math.pow(est_err, 2.0))) + return math.ceil(4.0 / math.pow(est_err, 2.0)) @typehints.with_input_types(T) @typehints.with_output_types(int) @@ -297,9 +296,20 @@ class ApproximateQuantiles(object): weighted=True out: [0, 2, 5, 7, 100] + + in: [list(range(10)), ..., list(range(90, 101))], num_quantiles=5, + input_batched=True + + out: [0, 25, 50, 75, 100] + + in: [(list(range(10)), [1]*10), (list(range(10)), [0]*10), ..., + (list(range(90, 101)), [0]*11)], num_quantiles=5, input_batched=True, + weighted=True + + out: [0, 2, 5, 7, 100] """ @staticmethod - def _display_data(num_quantiles, key, reverse, weighted): + def _display_data(num_quantiles, key, reverse, weighted, input_batched): return { 'num_quantiles': DisplayDataItem(num_quantiles, label='Quantile Count'), 'key': DisplayDataItem( @@ -307,7 +317,9 @@ def _display_data(num_quantiles, key, reverse, weighted): if hasattr(key, '__name__') else key.__class__.__name__, label='Record Comparer Key'), 'reverse': DisplayDataItem(str(reverse), label='Is Reversed'), - 'weighted': DisplayDataItem(str(weighted), label='Is Weighted') + 'weighted': DisplayDataItem(str(weighted), label='Is Weighted'), + 'input_batched': DisplayDataItem( + str(input_batched), label='Is Input Batched'), } @typehints.with_input_types( @@ -327,12 +339,24 @@ class Globally(PTransform): weighted: (optional) if set to True, the transform returns weighted quantiles. The input PCollection is then expected to contain tuples of input values with the corresponding weight. + input_batched: (optional) if set to True, the transform expects each + element of input PCollection to be a batch, which is a list of elements + for non-weighted case and a tuple of lists of elements and weights for + weighted. Provides a way to accumulate multiple elements at a time more + efficiently. """ - def __init__(self, num_quantiles, key=None, reverse=False, weighted=False): + def __init__( + self, + num_quantiles, + key=None, + reverse=False, + weighted=False, + input_batched=False): self._num_quantiles = num_quantiles self._key = key self._reverse = reverse self._weighted = weighted + self._input_batched = input_batched def expand(self, pcoll): return pcoll | CombineGlobally( @@ -340,14 +364,16 @@ def expand(self, pcoll): num_quantiles=self._num_quantiles, key=self._key, reverse=self._reverse, - weighted=self._weighted)) + weighted=self._weighted, + input_batched=self._input_batched)) def display_data(self): return ApproximateQuantiles._display_data( num_quantiles=self._num_quantiles, key=self._key, reverse=self._reverse, - weighted=self._weighted) + weighted=self._weighted, + input_batched=self._input_batched) @typehints.with_input_types( typehints.Union[typing.Tuple[K, V], @@ -368,12 +394,24 @@ class PerKey(PTransform): weighted: (optional) if set to True, the transform returns weighted quantiles. 
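# Editor's sketch (not part of the patch): the new `input_batched` mode documented
# above, feeding 0..100 as per-element batches; the expected output mirrors the
# class docstring example.
import apache_beam as beam

batches = [list(range(i * 10, (i + 1) * 10)) for i in range(9)] + [list(range(90, 101))]
with beam.Pipeline() as p:
  _ = (
      p
      | beam.Create(batches)
      | beam.ApproximateQuantiles.Globally(5, input_batched=True)
      | beam.Map(print))  # -> [0, 25, 50, 75, 100]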
The input PCollection is then expected to contain tuples of input values with the corresponding weight. + input_batched: (optional) if set to True, the transform expects each + element of input PCollection to be a batch, which is a list of elements + for non-weighted case and a tuple of lists of elements and weights for + weighted. Provides a way to accumulate multiple elements at a time more + efficiently. """ - def __init__(self, num_quantiles, key=None, reverse=False, weighted=False): + def __init__( + self, + num_quantiles, + key=None, + reverse=False, + weighted=False, + input_batched=False): self._num_quantiles = num_quantiles self._key = key self._reverse = reverse self._weighted = weighted + self._input_batched = input_batched def expand(self, pcoll): return pcoll | CombinePerKey( @@ -381,69 +419,106 @@ def expand(self, pcoll): num_quantiles=self._num_quantiles, key=self._key, reverse=self._reverse, - weighted=self._weighted)) + weighted=self._weighted, + input_batched=self._input_batched)) def display_data(self): return ApproximateQuantiles._display_data( num_quantiles=self._num_quantiles, key=self._key, reverse=self._reverse, - weighted=self._weighted) + weighted=self._weighted, + input_batched=self._input_batched) + + +class _QuantileSpec(object): + """Quantiles computation specifications.""" + def __init__(self, buffer_size, num_buffers, weighted, key, reverse): + # type: (int, int, bool, Any, bool) -> None + self.buffer_size = buffer_size + self.num_buffers = num_buffers + self.weighted = weighted + self.key = key + self.reverse = reverse + + # Used to sort tuples of values and weights. + self.weighted_key = None if key is None else (lambda x: key(x[0])) + + # Used to compare values. + if reverse and key is None: + self.less_than = lambda a, b: a > b + elif reverse: + self.less_than = lambda a, b: key(a) > key(b) + elif key is None: + self.less_than = lambda a, b: a < b + else: + self.less_than = lambda a, b: key(a) < key(b) + + def get_argsort_key(self, elements): + # type: (List) -> Callable[[int], Any] + + """Returns a key for sorting indices of elements by element's value.""" + if self.key is None: + return elements.__getitem__ + else: + return lambda idx: self.key(elements[idx]) + def __reduce__(self): + return ( + self.__class__, + ( + self.buffer_size, + self.num_buffers, + self.weighted, + self.key, + self.reverse)) -class _QuantileBuffer(Generic[T]): + +class _QuantileBuffer(object): """A single buffer in the sense of the referenced algorithm. (see http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.6.6513&rep=rep1 &type=pdf and ApproximateQuantilesCombineFn for further information)""" - def __init__(self, elements, weighted, level=0, weight=1): - # type: (Sequence[T], bool, int, int) -> None - # In case of weighted quantiles, elements are tuples of values and weights. + def __init__( + self, elements, weights, weighted, level=0, min_val=None, max_val=None): + # type: (List, List, bool, int, Any, Any) -> None self.elements = elements + # In non-weighted case weights contains a single element representing weight + # of the buffer in the sense of the original algorithm. In weighted case, + # it stores weights of individual elements. + self.weights = weights self.weighted = weighted self.level = level - self.weight = weight - - def __lt__(self, other): - if self.weighted: - return [element[0] for element in self.elements - ] < [element[0] for element in other.elements] + if min_val is None or max_val is None: + # Buffer is always initialized with sorted elements. 
+ self.min_val = elements[0] + self.max_val = elements[-1] else: - return self.elements < other.elements - - def sized_iterator(self): - class QuantileBufferIterator(object): - def __init__(self, elem, weighted, weight): - self._iter = iter(elem) - self.weighted = weighted - self.weight = weight - - def __iter__(self): - return self - - def __next__(self): - if self.weighted: - return next(self._iter) - else: - value = next(self._iter) - return (value, self.weight) + # Note that collapsed buffer may not contain min and max in the list of + # elements. + self.min_val = min_val + self.max_val = max_val - next = __next__ # For Python 2 + def __iter__(self): + return zip( + self.elements, + self.weights if self.weighted else itertools.repeat(self.weights[0])) - return QuantileBufferIterator(self.elements, self.weighted, self.weight) + def __lt__(self, other): + return self.level < other.level -class _QuantileState(Generic[T]): +class _QuantileState(object): """ Compact summarization of a collection on which quantiles can be estimated. """ - min_val = None # type: Any # Holds smallest item in the list - max_val = None # type: Any # Holds largest item in the list - - def __init__(self, buffer_size, num_buffers, unbuffered_elements, buffers): - # type: (int, int, List[Any], List[_QuantileBuffer[T]]) -> None - self.buffer_size = buffer_size - self.num_buffers = num_buffers + def __init__(self, unbuffered_elements, unbuffered_weights, buffers, spec): + # type: (List, List, List[_QuantileBuffer], _QuantileSpec) -> None self.buffers = buffers + self.spec = spec + if spec.weighted: + self.add_unbuffered = self._add_unbuffered_weighted + else: + self.add_unbuffered = self._add_unbuffered # The algorithm requires that the manipulated buffers always be filled to # capacity to perform the collapse operation. This operation can be extended @@ -452,6 +527,17 @@ def __init__(self, buffer_size, num_buffers, unbuffered_elements, buffers): # into new, full buffers and then take them into account when computing the # final output. self.unbuffered_elements = unbuffered_elements + self.unbuffered_weights = unbuffered_weights + + # This is needed for pickling to work when Cythonization is enabled. + def __reduce__(self): + return ( + self.__class__, + ( + self.unbuffered_elements, + self.unbuffered_weights, + self.buffers, + self.spec)) def is_empty(self): # type: () -> bool @@ -459,8 +545,219 @@ def is_empty(self): """Check if the buffered & unbuffered elements are empty or not.""" return not self.unbuffered_elements and not self.buffers + def _add_unbuffered(self, elements, offset_fn): + # type: (List, Any) -> None + + """ + Add elements to the unbuffered list, creating new buffers and + collapsing if needed. + """ + self.unbuffered_elements.extend(elements) + num_new_buffers = len(self.unbuffered_elements) // self.spec.buffer_size + for idx in range(num_new_buffers): + to_buffer = sorted( + self.unbuffered_elements[idx * self.spec.buffer_size:(idx + 1) * + self.spec.buffer_size], + key=self.spec.key, + reverse=self.spec.reverse) + heapq.heappush( + self.buffers, + _QuantileBuffer(elements=to_buffer, weights=[1], weighted=False)) + + if num_new_buffers > 0: + self.unbuffered_elements = self.unbuffered_elements[num_new_buffers * + self.spec. + buffer_size:] + + self.collapse_if_needed(offset_fn) + + def _add_unbuffered_weighted(self, elements, offset_fn): + # type: (List, Any) -> None + + """ + Add elements with weights to the unbuffered list, creating new buffers and + collapsing if needed. 
+ """ + if len(elements) == 1: + self.unbuffered_elements.append(elements[0][0]) + self.unbuffered_weights.append(elements[0][1]) + else: + self.unbuffered_elements.extend(elements[0]) + self.unbuffered_weights.extend(elements[1]) + num_new_buffers = len(self.unbuffered_elements) // self.spec.buffer_size + argsort_key = self.spec.get_argsort_key(self.unbuffered_elements) + for idx in range(num_new_buffers): + argsort = sorted( + range(idx * self.spec.buffer_size, (idx + 1) * self.spec.buffer_size), + key=argsort_key, + reverse=self.spec.reverse) + elements_to_buffer = [self.unbuffered_elements[idx] for idx in argsort] + weights_to_buffer = [self.unbuffered_weights[idx] for idx in argsort] + heapq.heappush( + self.buffers, + _QuantileBuffer( + elements=elements_to_buffer, + weights=weights_to_buffer, + weighted=True)) + + if num_new_buffers > 0: + self.unbuffered_elements = self.unbuffered_elements[num_new_buffers * + self.spec. + buffer_size:] + self.unbuffered_weights = self.unbuffered_weights[num_new_buffers * + self.spec.buffer_size:] + + self.collapse_if_needed(offset_fn) + + def finalize(self): + # type: () -> None + + """ + Creates a new buffer using all unbuffered elements. Called before + extracting an output. Note that the buffer doesn't have to be put in a + proper position since _collapse is not going to be called after. + """ + if self.unbuffered_elements and self.spec.weighted: + argsort_key = self.spec.get_argsort_key(self.unbuffered_elements) + argsort = sorted( + range(len(self.unbuffered_elements)), + key=argsort_key, + reverse=self.spec.reverse) + self.unbuffered_elements = [ + self.unbuffered_elements[idx] for idx in argsort + ] + self.unbuffered_weights = [ + self.unbuffered_weights[idx] for idx in argsort + ] + self.buffers.append( + _QuantileBuffer( + self.unbuffered_elements, self.unbuffered_weights, weighted=True)) + self.unbuffered_weights = [] + elif self.unbuffered_elements: + self.unbuffered_elements.sort( + key=self.spec.key, reverse=self.spec.reverse) + self.buffers.append( + _QuantileBuffer( + self.unbuffered_elements, weights=[1], weighted=False)) + self.unbuffered_elements = [] + + def collapse_if_needed(self, offset_fn): + # type: (Any) -> None + + """ + Checks if summary has too many buffers and collapses some of them until the + limit is restored. + """ + while len(self.buffers) > self.spec.num_buffers: + to_collapse = [heapq.heappop(self.buffers), heapq.heappop(self.buffers)] + min_level = to_collapse[1].level + + while self.buffers and self.buffers[0].level <= min_level: + to_collapse.append(heapq.heappop(self.buffers)) + + heapq.heappush(self.buffers, _collapse(to_collapse, offset_fn, self.spec)) + + +def _collapse(buffers, offset_fn, spec): + # type: (List[_QuantileBuffer], Any, _QuantileSpec) -> _QuantileBuffer + + """ + Approximates elements from multiple buffers and produces a single buffer. + """ + new_level = 0 + new_weight = 0 + for buffer in buffers: + # As presented in the paper, there should always be at least two + # buffers of the same (minimal) level to collapse, but it is possible + # to violate this condition when combining buffers from independently + # computed shards. If they differ we take the max. 
+ new_level = max([new_level, buffer.level + 1]) + new_weight = new_weight + sum(buffer.weights) + if spec.weighted: + step = new_weight / (spec.buffer_size - 1) + offset = new_weight / (2 * spec.buffer_size) + else: + step = new_weight + offset = offset_fn(new_weight) + new_elements, new_weights, min_val, max_val = \ + _interpolate(buffers, spec.buffer_size, step, offset, spec) + if not spec.weighted: + new_weights = [new_weight] + return _QuantileBuffer( + new_elements, new_weights, spec.weighted, new_level, min_val, max_val) + + +def _interpolate(buffers, count, step, offset, spec): + # type: (List[_QuantileBuffer], int, float, float, _QuantileSpec) -> Tuple[List, List, Any, Any] + + """ + Emulates taking the ordered union of all elements in buffers, repeated + according to their weight, and picking out the (k * step + offset)-th elements + of this list for `0 <= k < count`. + """ + buffer_iterators = [] + min_val = buffers[0].min_val + max_val = buffers[0].max_val + for buffer in buffers: + # Calculate extreme values for the union of buffers. + min_val = buffer.min_val if spec.less_than( + buffer.min_val, min_val) else min_val + max_val = buffer.max_val if spec.less_than( + max_val, buffer.max_val) else max_val + buffer_iterators.append(iter(buffer)) + + # Note that `heapq.merge` can also be used here since the buffers are sorted. + # In practice, however, `sorted` uses natural order in the union and + # significantly outperforms `heapq.merge`. + sorted_elements = sorted( + itertools.chain.from_iterable(buffer_iterators), + key=spec.weighted_key, + reverse=spec.reverse) + + if not spec.weighted: + # If all buffers have the same weight, then quantiles' indices are evenly + # distributed over a range [0, len(sorted_elements)]. + buffers_have_same_weight = True + weight = buffers[0].weights[0] + for buffer in buffers: + if buffer.weights[0] != weight: + buffers_have_same_weight = False + break + if buffers_have_same_weight: + offset = offset / weight + step = step / weight + max_idx = len(sorted_elements) - 1 + result = [ + sorted_elements[min(int(j * step + offset), max_idx)][0] + for j in range(count) + ] + return result, [], min_val, max_val + + sorted_elements_iter = iter(sorted_elements) + weighted_element = next(sorted_elements_iter) + new_elements = [] + new_weights = [] + j = 0 + current_weight = weighted_element[1] + previous_weight = 0 + while j < count: + target_weight = j * step + offset + j += 1 + try: + while current_weight <= target_weight: + weighted_element = next(sorted_elements_iter) + current_weight += weighted_element[1] + except StopIteration: + pass + new_elements.append(weighted_element[0]) + if spec.weighted: + new_weights.append(current_weight - previous_weight) + previous_weight = current_weight + + return new_elements, new_weights, min_val, max_val -class ApproximateQuantilesCombineFn(CombineFn, Generic[T]): + +class ApproximateQuantilesCombineFn(CombineFn): """ This combiner gives an idea of the distribution of a collection of values using approximate N-tiles. The output of this combiner is the list of size of @@ -483,9 +780,12 @@ class ApproximateQuantilesCombineFn(CombineFn, Generic[T]): http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.6.6513&rep=rep1 &type=pdf - The default error bound is (1 / N) for uniformly distributed data and - min(1e-2, 1 / N) for weighted case, though in practice the accuracy tends to - be much better. + Note that the weighted quantiles are evaluated using a generalized version of + the algorithm referenced in the paper. 
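# Editor's toy rendering (not part of the patch) of the interpolation idea used by
# _interpolate above: take the ordered union of buffer elements, each repeated
# according to its weight, and keep every (k * step + offset)-th element for
# 0 <= k < count. Unlike the streaming version above, this materialises the union.
def toy_interpolate(weighted_elements, count, step, offset):
  # weighted_elements: sorted (value, integer_weight) pairs; integer weights only so
  # the repeated union can be built as a plain list for illustration.
  expanded = [v for v, w in weighted_elements for _ in range(w)]
  max_idx = len(expanded) - 1
  return [expanded[min(int(k * step + offset), max_idx)] for k in range(count)]

# Keeping 4 representatives out of 8 values that each carry weight 2:
assert toy_interpolate([(v, 2) for v in range(8)], count=4, step=4, offset=2) == [1, 3, 5, 7]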
+ + The default error bound is (1 / num_quantiles) for uniformly distributed data + and min(1e-2, 1 / num_quantiles) for weighted case, though in practice the + accuracy tends to be much better. Args: num_quantiles: Number of quantiles to produce. It is the size of the final @@ -501,6 +801,8 @@ class ApproximateQuantilesCombineFn(CombineFn, Generic[T]): weighted: (optional) if set to True, the combiner produces weighted quantiles. The input elements are then expected to be tuples of input values with the corresponding weight. + input_batched: (optional) if set to True, inputs are expected to be batches + of elements. """ # For alternating between biasing up and down in the above even weight @@ -514,7 +816,7 @@ class ApproximateQuantilesCombineFn(CombineFn, Generic[T]): # non-optimal. The impact is logarithmic with respect to this value, so this # default should be fine for most uses. _MAX_NUM_ELEMENTS = 1e9 - _qs = None # type: _QuantileState[T] + _qs = None # type: _QuantileState def __init__( self, @@ -523,29 +825,25 @@ def __init__( num_buffers, # type: int key=None, reverse=False, - weighted=False): - def _comparator(a, b): - if key: - a, b = key(a), key(b) - - retval = int(a > b) - int(a < b) - - if reverse: - return -retval - - return retval - - self._comparator = _comparator - + weighted=False, + input_batched=False): self._num_quantiles = num_quantiles - self._buffer_size = buffer_size - self._num_buffers = num_buffers - if weighted: - self._key = (lambda x: x[0]) if key is None else (lambda x: key(x[0])) - else: - self._key = key - self._reverse = reverse - self._weighted = weighted + self._spec = _QuantileSpec(buffer_size, num_buffers, weighted, key, reverse) + self._input_batched = input_batched + if self._input_batched: + setattr(self, 'add_input', self._add_inputs) + + def __reduce__(self): + return ( + self.__class__, + ( + self._num_quantiles, + self._spec.buffer_size, + self._spec.num_buffers, + self._spec.key, + self._spec.reverse, + self._spec.weighted, + self._input_batched)) @classmethod def create( @@ -555,7 +853,8 @@ def create( max_num_elements=None, key=None, reverse=False, - weighted=False): + weighted=False, + input_batched=False): # type: (...) -> ApproximateQuantilesCombineFn """ @@ -582,11 +881,17 @@ def create( weighted: (optional) if set to True, the combiner produces weighted quantiles. The input elements are then expected to be tuples of values with the corresponding weight. + input_batched: (optional) if set to True, inputs are expected to be + batches of elements. """ max_num_elements = max_num_elements or cls._MAX_NUM_ELEMENTS if not epsilon: epsilon = min(1e-2, 1.0 / num_quantiles) \ if weighted else (1.0 / num_quantiles) + # Note that calculation of the buffer size and the number of buffers here + # is based on technique used in the Munro-Paterson algorithm. Switching to + # the logic used in the "New Algorithm" may result in memory savings since + # it results in lower values for b and k in practice. b = 2 while (b - 2) * (1 << (b - 2)) < epsilon * max_num_elements: b = b + 1 @@ -598,30 +903,8 @@ def create( num_buffers=b, key=key, reverse=reverse, - weighted=weighted) - - def _add_unbuffered(self, qs, elements): - # type: (_QuantileState[T], Iterable[T]) -> None - - """ - Add a new buffer to the unbuffered list, creating a new buffer and - collapsing if needed. 
- """ - qs.unbuffered_elements.extend(elements) - if len(qs.unbuffered_elements) >= qs.buffer_size: - qs.unbuffered_elements.sort(key=self._key, reverse=self._reverse) - - while len(qs.unbuffered_elements) >= qs.buffer_size: - to_buffer = qs.unbuffered_elements[:qs.buffer_size] - heapq.heappush( - qs.buffers, - _QuantileBuffer( - elements=to_buffer, - weighted=self._weighted, - weight=sum([element[1] for element in to_buffer]) - if self._weighted else 1)) - qs.unbuffered_elements = qs.unbuffered_elements[qs.buffer_size:] - self._collapse_if_needed(qs) + weighted=weighted, + input_batched=input_batched) def _offset(self, new_weight): # type: (int) -> float @@ -636,132 +919,33 @@ def _offset(self, new_weight): self._offset_jitter = 2 - self._offset_jitter return (new_weight + self._offset_jitter) / 2 - def _collapse(self, buffers): - # type: (Iterable[_QuantileBuffer[T]]) -> _QuantileBuffer[T] - new_level = 0 - new_weight = 0 - for buffer_elem in buffers: - # As presented in the paper, there should always be at least two - # buffers of the same (minimal) level to collapse, but it is possible - # to violate this condition when combining buffers from independently - # computed shards. If they differ we take the max. - new_level = max([new_level, buffer_elem.level + 1]) - new_weight = new_weight + buffer_elem.weight - if self._weighted: - step = new_weight / (self._buffer_size - 1) - offset = new_weight / (2 * self._buffer_size) - else: - step = new_weight - offset = self._offset(new_weight) - new_elements = self._interpolate(buffers, self._buffer_size, step, offset) - return _QuantileBuffer(new_elements, self._weighted, new_level, new_weight) - - def _collapse_if_needed(self, qs): - # type: (_QuantileState) -> None - while len(qs.buffers) > self._num_buffers: - to_collapse = [] - to_collapse.append(heapq.heappop(qs.buffers)) - to_collapse.append(heapq.heappop(qs.buffers)) - min_level = to_collapse[1].level - - while len(qs.buffers) > 0 and qs.buffers[0].level == min_level: - to_collapse.append(heapq.heappop(qs.buffers)) - - heapq.heappush(qs.buffers, self._collapse(to_collapse)) - - def _interpolate(self, i_buffers, count, step, offset): - """ - Emulates taking the ordered union of all elements in buffers, repeated - according to their weight, and picking out the (k * step + offset)-th - elements of this list for `0 <= k < count`. - """ - - iterators = [] - new_elements = [] - compare_key = self._key - if self._key and not self._weighted: - compare_key = lambda x: self._key(x[0]) - for buffer_elem in i_buffers: - iterators.append(buffer_elem.sized_iterator()) - - # Python 3 `heapq.merge` support key comparison and returns an iterator and - # does not pull the data into memory all at once. Python 2 does not - # support comparison on its `heapq.merge` api, so we use the itertools - # which takes the `key` function for comparison and creates an iterator - # from it. 
- if sys.version_info[0] < 3: - sorted_elem = iter( - sorted( - itertools.chain.from_iterable(iterators), - key=compare_key, - reverse=self._reverse)) - else: - sorted_elem = heapq.merge( - *iterators, key=compare_key, reverse=self._reverse) - - weighted_element = next(sorted_elem) - current = weighted_element[1] - j = 0 - previous = 0 - while j < count: - target = j * step + offset - j = j + 1 - try: - while current <= target: - weighted_element = next(sorted_elem) - current = current + weighted_element[1] - except StopIteration: - pass - if self._weighted: - new_elements.append((weighted_element[0], current - previous)) - previous = current - else: - new_elements.append(weighted_element[0]) - return new_elements - # TODO(BEAM-7746): Signature incompatible with supertype def create_accumulator(self): # type: ignore[override] - # type: () -> _QuantileState[T] + # type: () -> _QuantileState self._qs = _QuantileState( - buffer_size=self._buffer_size, - num_buffers=self._num_buffers, unbuffered_elements=[], - buffers=[]) + unbuffered_weights=[], + buffers=[], + spec=self._spec) return self._qs def add_input(self, quantile_state, element): """ Add a new element to the collection being summarized by quantile state. """ - value = element[0] if self._weighted else element - if quantile_state.is_empty(): - quantile_state.min_val = quantile_state.max_val = value - elif self._comparator(value, quantile_state.min_val) < 0: - quantile_state.min_val = value - elif self._comparator(value, quantile_state.max_val) > 0: - quantile_state.max_val = value - self._add_unbuffered(quantile_state, elements=[element]) + quantile_state.add_unbuffered([element], self._offset) return quantile_state - def add_inputs(self, quantile_state, elements): - """Add new elements to the collection being summarized by quantile state. + def _add_inputs(self, quantile_state, elements): + # type: (_QuantileState, List) -> _QuantileState + """ - if not elements: + Add a batch of elements to the collection being summarized by quantile + state. 
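# Editor's sketch (not part of the patch): driving the combiner by hand through the
# create/add_input/extract_output lifecycle shown above; merge_accumulators is
# exercised the same way in stats_test.py, which uses these same create() parameters.
from apache_beam.transforms.stats import ApproximateQuantilesCombineFn

fn = ApproximateQuantilesCombineFn.create(
    num_quantiles=5, epsilon=0.01, max_num_elements=1000)
acc = fn.create_accumulator()
for x in range(1000):
  acc = fn.add_input(acc, x)
print(fn.extract_output(acc))
# ≈ [0, 249, 499, 749, 999]; endpoints are exact, interior values are approximate
# within the configured error bound.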
+ """ + if len(elements) == 0: return quantile_state - - values = [ - element[0] for element in elements - ] if self._weighted else elements - min_val = min(values) - max_val = max(values) - if quantile_state.is_empty(): - quantile_state.min_val = min_val - quantile_state.max_val = max_val - elif self._comparator(min_val, quantile_state.min_val) < 0: - quantile_state.min_val = min_val - elif self._comparator(max_val, quantile_state.max_val) > 0: - quantile_state.max_val = max_val - self._add_unbuffered(quantile_state, elements=elements) + quantile_state.add_unbuffered(elements, self._offset) return quantile_state def merge_accumulators(self, accumulators): @@ -770,17 +954,16 @@ def merge_accumulators(self, accumulators): for accumulator in accumulators: if accumulator.is_empty(): continue - if not qs.min_val or self._comparator(accumulator.min_val, - qs.min_val) < 0: - qs.min_val = accumulator.min_val - if not qs.max_val or self._comparator(accumulator.max_val, - qs.max_val) > 0: - qs.max_val = accumulator.max_val - - self._add_unbuffered(qs, accumulator.unbuffered_elements) + if self._spec.weighted: + qs.add_unbuffered( + [accumulator.unbuffered_elements, accumulator.unbuffered_weights], + self._offset) + else: + qs.add_unbuffered(accumulator.unbuffered_elements, self._offset) qs.buffers.extend(accumulator.buffers) - self._collapse_if_needed(qs) + heapq.heapify(qs.buffers) + qs.collapse_if_needed(self._offset) return qs def extract_output(self, accumulator): @@ -791,46 +974,21 @@ def extract_output(self, accumulator): """ if accumulator.is_empty(): return [] - + accumulator.finalize() all_elems = accumulator.buffers - if self._weighted: - unbuffered_weight = sum( - [element[1] for element in accumulator.unbuffered_elements]) - total_weight = unbuffered_weight + total_weight = 0 + if self._spec.weighted: for buffer_elem in all_elems: - total_weight += sum([element[1] for element in buffer_elem.elements]) - if accumulator.unbuffered_elements: - accumulator.unbuffered_elements.sort( - key=self._key, reverse=self._reverse) - all_elems.append( - _QuantileBuffer( - accumulator.unbuffered_elements, - weighted=True, - weight=unbuffered_weight)) - - step = 1.0 * total_weight / (self._num_quantiles - 1) - offset = (1.0 * total_weight) / (self._num_quantiles - 1) - mid_quantiles = [ - element[0] for element in self._interpolate( - all_elems, self._num_quantiles - 2, step, offset) - ] + total_weight += sum(buffer_elem.weights) else: - total_weight = len(accumulator.unbuffered_elements) for buffer_elem in all_elems: - total_weight += accumulator.buffer_size * buffer_elem.weight - - if accumulator.unbuffered_elements: - accumulator.unbuffered_elements.sort( - key=self._key, reverse=self._reverse) - all_elems.append( - _QuantileBuffer(accumulator.unbuffered_elements, weighted=False)) - - step = 1.0 * total_weight / (self._num_quantiles - 1) - offset = (1.0 * total_weight - 1) / (self._num_quantiles - 1) - mid_quantiles = self._interpolate( - all_elems, self._num_quantiles - 2, step, offset) - - quantiles = [accumulator.min_val] - quantiles.extend(mid_quantiles) - quantiles.append(accumulator.max_val) - return quantiles + total_weight += len(buffer_elem.elements) * buffer_elem.weights[0] + + step = total_weight / (self._num_quantiles - 1) + offset = (total_weight - 1) / (self._num_quantiles - 1) + + quantiles, _, min_val, max_val = \ + _interpolate(all_elems, self._num_quantiles - 2, step, offset, + self._spec) + + return [min_val] + quantiles + [max_val] diff --git 
a/sdks/python/apache_beam/transforms/stats_test.py b/sdks/python/apache_beam/transforms/stats_test.py index 860594f7fd44..739438035c88 100644 --- a/sdks/python/apache_beam/transforms/stats_test.py +++ b/sdks/python/apache_beam/transforms/stats_test.py @@ -18,14 +18,10 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import math import random import sys import unittest -from builtins import range from collections import defaultdict import hamcrest as hc @@ -482,13 +478,116 @@ def test_alternate_quantiles(self): equal_to([["ccccc", "aaa", "b"]]), label='checkWithKeyAndReversed') + def test_batched_quantiles(self): + with TestPipeline() as p: + data = [] + for i in range(100): + data.append([(j / 10, abs(j - 500)) + for j in range(i * 10, (i + 1) * 10)]) + pc = p | Create(data) + globally = ( + pc | 'Globally' >> beam.ApproximateQuantiles.Globally( + 3, input_batched=True)) + with_key = ( + pc | 'Globally with key' >> beam.ApproximateQuantiles.Globally( + 3, key=sum, input_batched=True)) + key_with_reversed = ( + pc | 'Globally with key and reversed' >> + beam.ApproximateQuantiles.Globally( + 3, key=sum, reverse=True, input_batched=True)) + assert_that( + globally, + equal_to([[(0.0, 500), (49.9, 1), (99.9, 499)]]), + label='checkGlobally') + assert_that( + with_key, + equal_to([[(50.0, 0), (72.5, 225), (99.9, 499)]]), + label='checkGloballyWithKey') + assert_that( + key_with_reversed, + equal_to([[(99.9, 499), (72.5, 225), (50.0, 0)]]), + label='checkGloballyWithKeyAndReversed') + + def test_batched_weighted_quantiles(self): + with TestPipeline() as p: + data = [] + for i in range(100): + data.append([[(i / 10, abs(i - 500)) + for i in range(i * 10, (i + 1) * 10)], [i] * 10]) + pc = p | Create(data) + globally = ( + pc | 'Globally' >> beam.ApproximateQuantiles.Globally( + 3, weighted=True, input_batched=True)) + with_key = ( + pc | 'Globally with key' >> beam.ApproximateQuantiles.Globally( + 3, key=sum, weighted=True, input_batched=True)) + key_with_reversed = ( + pc | 'Globally with key and reversed' >> + beam.ApproximateQuantiles.Globally( + 3, key=sum, reverse=True, weighted=True, input_batched=True)) + assert_that( + globally, + equal_to([[(0.0, 500), (70.8, 208), (99.9, 499)]]), + label='checkGlobally') + assert_that( + with_key, + equal_to([[(50.0, 0), (21.0, 290), (99.9, 499)]]), + label='checkGloballyWithKey') + assert_that( + key_with_reversed, + equal_to([[(99.9, 499), (21.0, 290), (50.0, 0)]]), + label='checkGloballyWithKeyAndReversed') + + def test_quantiles_merge_accumulators(self): + # This test exercises merging multiple buffers and approximation accuracy. + # The max_num_elements is set to a small value to trigger buffers collapse + # and interpolation. Under the conditions below, buffer_size=125 and + # num_buffers=4, so we're only allowed to keep half of the input values. 
+ num_accumulators = 100 + num_quantiles = 5 + eps = 0.01 + max_num_elements = 1000 + combine_fn = ApproximateQuantilesCombineFn.create( + num_quantiles, eps, max_num_elements) + combine_fn_weighted = ApproximateQuantilesCombineFn.create( + num_quantiles, eps, max_num_elements, weighted=True) + data = list(range(1000)) + weights = list(reversed(range(1000))) + step = math.ceil(len(data) / num_accumulators) + accumulators = [] + accumulators_weighted = [] + for i in range(num_accumulators): + accumulator = combine_fn.create_accumulator() + accumulator_weighted = combine_fn_weighted.create_accumulator() + for element, weight in zip(data[i*step:(i+1)*step], + weights[i*step:(i+1)*step]): + accumulator = combine_fn.add_input(accumulator, element) + accumulator_weighted = combine_fn_weighted.add_input( + accumulator_weighted, (element, weight)) + accumulators.append(accumulator) + accumulators_weighted.append(accumulator_weighted) + accumulator = combine_fn.merge_accumulators(accumulators) + accumulator_weighted = combine_fn_weighted.merge_accumulators( + accumulators_weighted) + quantiles = combine_fn.extract_output(accumulator) + quantiles_weighted = combine_fn_weighted.extract_output( + accumulator_weighted) + + # In fact, the final accuracy is much higher than eps, but we test for a + # minimal accuracy here. + for q, actual_q in zip(quantiles, [0, 249, 499, 749, 999]): + self.assertAlmostEqual(q, actual_q, delta=max_num_elements * eps) + for q, actual_q in zip(quantiles_weighted, [0, 133, 292, 499, 999]): + self.assertAlmostEqual(q, actual_q, delta=max_num_elements * eps) + @staticmethod def _display_data_matcher(instance): expected_items = [ DisplayDataItemMatcher('num_quantiles', instance._num_quantiles), DisplayDataItemMatcher('weighted', str(instance._weighted)), DisplayDataItemMatcher('key', str(instance._key.__name__)), - DisplayDataItemMatcher('reverse', str(instance._reverse)) + DisplayDataItemMatcher('reverse', str(instance._reverse)), + DisplayDataItemMatcher('input_batched', str(instance._input_batched)), ] return expected_items @@ -551,8 +650,9 @@ def test_efficiency( combine_fn = ApproximateQuantilesCombineFn.create( num_quantiles=10, max_num_elements=maxInputSize, epsilon=epsilon) self.assertEqual( - expectedNumBuffers, combine_fn._num_buffers, "Number of buffers") - self.assertEqual(expectedBufferSize, combine_fn._buffer_size, "Buffer size") + expectedNumBuffers, combine_fn._spec.num_buffers, "Number of buffers") + self.assertEqual( + expectedBufferSize, combine_fn._spec.buffer_size, "Buffer size") @parameterized.expand(_build_quantilebuffer_test_data) def test_correctness(self, epsilon, maxInputSize, *args): @@ -561,8 +661,8 @@ def test_correctness(self, epsilon, maxInputSize, *args): """ combine_fn = ApproximateQuantilesCombineFn.create( num_quantiles=10, max_num_elements=maxInputSize, epsilon=epsilon) - b = combine_fn._num_buffers - k = combine_fn._buffer_size + b = combine_fn._spec.num_buffers + k = combine_fn._spec.buffer_size n = maxInputSize self.assertLessEqual((b - 2) * (1 << (b - 2)) + 0.5, (epsilon * n), '(b-2)2^(b-2) + 1/2 <= eN') diff --git a/sdks/python/apache_beam/transforms/timeutil.py b/sdks/python/apache_beam/transforms/timeutil.py index 166d787d0c53..87294b0dcf4d 100644 --- a/sdks/python/apache_beam/transforms/timeutil.py +++ b/sdks/python/apache_beam/transforms/timeutil.py @@ -19,13 +19,8 @@ # pytype: skip-file -from __future__ import absolute_import - from abc import ABCMeta from abc import abstractmethod -from builtins import object - -from future.utils 
import with_metaclass from apache_beam.portability.api import beam_runner_api_pb2 @@ -44,8 +39,6 @@ class TimeDomain(object): _RUNNER_API_MAPPING = { WATERMARK: beam_runner_api_pb2.TimeDomain.EVENT_TIME, REAL_TIME: beam_runner_api_pb2.TimeDomain.PROCESSING_TIME, - DEPENDENT_REAL_TIME: beam_runner_api_pb2.TimeDomain. - SYNCHRONIZED_PROCESSING_TIME, } @staticmethod @@ -65,7 +58,7 @@ def is_event_time(domain): return TimeDomain.from_string(domain) == TimeDomain.WATERMARK -class TimestampCombinerImpl(with_metaclass(ABCMeta, object)): # type: ignore[misc] +class TimestampCombinerImpl(metaclass=ABCMeta): """Implementation of TimestampCombiner.""" @abstractmethod def assign_output_time(self, window, input_timestamp): @@ -90,7 +83,7 @@ def merge(self, unused_result_window, merging_timestamps): return self.combine_all(merging_timestamps) -class DependsOnlyOnWindow(with_metaclass(ABCMeta, TimestampCombinerImpl)): # type: ignore[misc] +class DependsOnlyOnWindow(TimestampCombinerImpl, metaclass=ABCMeta): """TimestampCombinerImpl that only depends on the window.""" def merge(self, result_window, unused_merging_timestamps): # Since we know that the result only depends on the window, we can ignore diff --git a/sdks/python/apache_beam/transforms/transforms_keyword_only_args_test_py3.py b/sdks/python/apache_beam/transforms/transforms_keyword_only_args_test.py similarity index 97% rename from sdks/python/apache_beam/transforms/transforms_keyword_only_args_test_py3.py rename to sdks/python/apache_beam/transforms/transforms_keyword_only_args_test.py index 374f7c41b072..28566ba55a03 100644 --- a/sdks/python/apache_beam/transforms/transforms_keyword_only_args_test_py3.py +++ b/sdks/python/apache_beam/transforms/transforms_keyword_only_args_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest @@ -31,10 +29,6 @@ class KeywordOnlyArgsTests(unittest.TestCase): - - # Enable nose tests running in parallel - _multiprocess_can_split_ = True - def test_side_input_keyword_only_args(self): with TestPipeline() as pipeline: diff --git a/sdks/python/apache_beam/transforms/trigger.py b/sdks/python/apache_beam/transforms/trigger.py index d176a8be2182..e5f7c24767c7 100644 --- a/sdks/python/apache_beam/transforms/trigger.py +++ b/sdks/python/apache_beam/transforms/trigger.py @@ -15,26 +15,24 @@ # limitations under the License. # -"""Support for Dataflow triggers. +"""Support for Apache Beam triggers. Triggers control when in processing time windows get emitted. """ # pytype: skip-file -from __future__ import absolute_import - import collections import copy import logging import numbers from abc import ABCMeta from abc import abstractmethod -from builtins import object - -from future.moves.itertools import zip_longest -from future.utils import iteritems -from future.utils import with_metaclass +from enum import Flag +from enum import auto +from functools import reduce +from itertools import zip_longest +from operator import or_ from apache_beam.coders import coder_impl from apache_beam.coders import observable @@ -79,7 +77,7 @@ class AccumulationMode(object): # RETRACTING = 3 -class _StateTag(with_metaclass(ABCMeta, object)): # type: ignore[misc] +class _StateTag(metaclass=ABCMeta): """An identifier used to store and retrieve typed, combinable state. 
The given tag must be unique for this step.""" @@ -162,9 +160,16 @@ def with_prefix(self, prefix): prefix + self.tag, self.timestamp_combiner_impl) +class DataLossReason(Flag): + """Enum defining potential reasons that a trigger may cause data loss.""" + NO_POTENTIAL_LOSS = 0 + MAY_FINISH = auto() + CONDITION_NOT_GUARANTEED = auto() + + # pylint: disable=unused-argument # TODO(robertwb): Provisional API, Java likely to change as well. -class TriggerFn(with_metaclass(ABCMeta, object)): # type: ignore[misc] +class TriggerFn(metaclass=ABCMeta): """A TriggerFn determines when window (panes) are emitted. See https://beam.apache.org/documentation/programming-guide/#triggers @@ -243,6 +248,43 @@ def reset(self, window, context): """Clear any state and timers used by this TriggerFn.""" pass + def may_lose_data(self, unused_windowing): + # type: (core.Windowing) -> DataLossReason + + """Returns whether or not this trigger could cause data loss. + + A trigger can cause data loss in the following scenarios: + + * The trigger has a chance to finish. For instance, AfterWatermark() + without a late trigger would cause all late data to be lost. This + scenario is only accounted for if the windowing strategy allows + late data. Otherwise, the trigger is not responsible for the data + loss. + * The trigger condition may not be met. For instance, + Repeatedly(AfterCount(N)) may not fire due to N not being met. This + is only accounted for if the condition itself led to data loss. + Repeatedly(AfterCount(1)) is safe, since it would only not fire if + there is no data to lose, but Repeatedly(AfterCount(2)) can cause + data loss if there is only one record. + + Note that this only returns the potential for loss. It does not mean that + there will be data loss. It also only accounts for loss related to the + trigger, not other potential causes. + + Args: + windowing: The Windowing that this trigger belongs to. It does not need + to be the top-level trigger. + + Returns: + The DataLossReason. If there is no potential loss, + DataLossReason.NO_POTENTIAL_LOSS is returned. Otherwise, all the + potential reasons are returned as a single value. For instance, if + data loss can result from finishing or not having the condition met, + the result will be DataLossReason.MAY_FINISH|CONDITION_NOT_GUARANTEED. + """ + # For backwards compatibility's sake, we're assuming the trigger is safe. + return DataLossReason.NO_POTENTIAL_LOSS + # pylint: enable=unused-argument @@ -267,10 +309,6 @@ def from_runner_api(proto, context): def to_runner_api(self, unused_context): pass - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - class DefaultTrigger(TriggerFn): """Semantically Repeatedly(AfterWatermark()), but more optimized.""" @@ -281,26 +319,27 @@ def __repr__(self): return 'DefaultTrigger()' def on_element(self, element, window, context): - context.set_timer('', TimeDomain.WATERMARK, window.end) + context.set_timer(str(window), TimeDomain.WATERMARK, window.end) def on_merge(self, to_be_merged, merge_result, context): - # Note: Timer clearing solely an optimization. for window in to_be_merged: - if window.end != merge_result.end: - context.clear_timer('', TimeDomain.WATERMARK) + context.clear_timer(str(window), TimeDomain.WATERMARK) def should_fire(self, time_domain, watermark, window, context): if watermark >= window.end: # Explicitly clear the timer so that late elements are not emitted again # when the timer is fired. 
- context.clear_timer('', TimeDomain.WATERMARK) + context.clear_timer(str(window), TimeDomain.WATERMARK) return watermark >= window.end def on_fire(self, watermark, window, context): return False def reset(self, window, context): - context.clear_timer('', TimeDomain.WATERMARK) + context.clear_timer(str(window), TimeDomain.WATERMARK) + + def may_lose_data(self, unused_windowing): + return DataLossReason.NO_POTENTIAL_LOSS def __eq__(self, other): return type(self) == type(other) @@ -350,6 +389,9 @@ def on_fire(self, timestamp, window, context): def reset(self, window, context): pass + def may_lose_data(self, unused_windowing): + return DataLossReason.MAY_FINISH + @staticmethod def from_runner_api(proto, context): return AfterProcessingTime( @@ -401,6 +443,9 @@ def should_fire(self, time_domain, watermark, window, context): def on_fire(self, watermark, window, context): return False + def may_lose_data(self, unused_windowing): + return DataLossReason.NO_POTENTIAL_LOSS + @staticmethod def from_runner_api(proto, context): return Always() @@ -445,6 +490,14 @@ def should_fire(self, time_domain, watermark, window, context): def on_fire(self, watermark, window, context): return True + def may_lose_data(self, unused_windowing): + """No potential data loss. + + Though Never doesn't explicitly trigger, it still collects data on + windowing closing, so any data loss is due to windowing closing. + """ + return DataLossReason.NO_POTENTIAL_LOSS + @staticmethod def from_runner_api(proto, context): return _Never() @@ -466,6 +519,7 @@ class AfterWatermark(TriggerFn): LATE_TAG = _CombiningValueStateTag('is_late', any) def __init__(self, early=None, late=None): + # TODO(zhoufek): Maybe don't wrap early/late if they are already Repeatedly self.early = Repeatedly(early) if early else None self.late = Repeatedly(late) if late else None @@ -536,6 +590,20 @@ def reset(self, window, context): if self.late: self.late.reset(window, NestedContext(context, 'late')) + def may_lose_data(self, windowing): + """May cause data loss if the windowing allows lateness and either: + + * The late trigger is not set + * The late trigger may cause data loss. + + The second case is equivalent to Repeatedly(late).may_lose_data(windowing) + """ + if windowing.allowed_lateness == 0: + return DataLossReason.NO_POTENTIAL_LOSS + if self.late is None: + return DataLossReason.MAY_FINISH + return self.late.may_lose_data(windowing) + def __eq__(self, other): return ( type(self) == type(other) and self.early == other.early and @@ -605,6 +673,12 @@ def on_fire(self, watermark, window, context): def reset(self, window, context): context.clear_state(self.COUNT_TAG) + def may_lose_data(self, unused_windowing): + reason = DataLossReason.MAY_FINISH + if self.count > 1: + reason |= DataLossReason.CONDITION_NOT_GUARANTEED + return reason + @staticmethod def from_runner_api(proto, unused_context): return AfterCount(proto.element_count.element_count) @@ -649,6 +723,17 @@ def on_fire(self, watermark, window, context): def reset(self, window, context): self.underlying.reset(window, context) + def may_lose_data(self, windowing): + """Repeatedly may only lose data if the underlying trigger may not have + its condition met. + + For underlying triggers that may finish, Repeatedly overrides that + behavior. 
+ """ + return ( + self.underlying.may_lose_data(windowing) + & DataLossReason.CONDITION_NOT_GUARANTEED) + @staticmethod def from_runner_api(proto, context): return Repeatedly( @@ -663,7 +748,7 @@ def has_ontime_pane(self): return self.underlying.has_ontime_pane() -class _ParallelTriggerFn(with_metaclass(ABCMeta, TriggerFn)): # type: ignore[misc] +class _ParallelTriggerFn(TriggerFn, metaclass=ABCMeta): def __init__(self, *triggers): self.triggers = triggers @@ -754,6 +839,15 @@ class AfterAny(_ParallelTriggerFn): """ combine_op = any + def may_lose_data(self, windowing): + reason = DataLossReason.NO_POTENTIAL_LOSS + for trigger in self.triggers: + t_reason = trigger.may_lose_data(windowing) + if t_reason == DataLossReason.NO_POTENTIAL_LOSS: + return t_reason + reason |= t_reason + return reason + class AfterAll(_ParallelTriggerFn): """Fires when all subtriggers have fired. @@ -762,6 +856,9 @@ class AfterAll(_ParallelTriggerFn): """ combine_op = all + def may_lose_data(self, windowing): + return reduce(or_, (t.may_lose_data(windowing) for t in self.triggers)) + class AfterEach(TriggerFn): @@ -817,6 +914,9 @@ def reset(self, window, context): for ix, trigger in enumerate(self.triggers): trigger.reset(window, self._sub_context(context, ix)) + def may_lose_data(self, windowing): + return reduce(or_, (t.may_lose_data(windowing) for t in self.triggers)) + @staticmethod def _sub_context(context, index): return NestedContext(context, '%d/' % index) @@ -909,13 +1009,14 @@ def clear_state(self, tag): # pylint: disable=unused-argument -class SimpleState(with_metaclass(ABCMeta, object)): # type: ignore[misc] +class SimpleState(metaclass=ABCMeta): """Basic state storage interface used for triggering. Only timers must hold the watermark (by their timestamp). """ @abstractmethod - def set_timer(self, window, name, time_domain, timestamp): + def set_timer( + self, window, name, time_domain, timestamp, dynamic_timer_tag=''): pass @abstractmethod @@ -923,7 +1024,7 @@ def get_window(self, window_id): pass @abstractmethod - def clear_timer(self, window, name, time_domain): + def clear_timer(self, window, name, time_domain, dynamic_timer_tag=''): pass @abstractmethod @@ -971,12 +1072,19 @@ def __init__(self, raw_state): self.window_ids = self.raw_state.get_global_state(self.WINDOW_IDS, {}) self.counter = None - def set_timer(self, window, name, time_domain, timestamp): - self.raw_state.set_timer(self._get_id(window), name, time_domain, timestamp) + def set_timer( + self, window, name, time_domain, timestamp, dynamic_timer_tag=''): + self.raw_state.set_timer( + self._get_id(window), + name, + time_domain, + timestamp, + dynamic_timer_tag=dynamic_timer_tag) - def clear_timer(self, window, name, time_domain): + def clear_timer(self, window, name, time_domain, dynamic_timer_tag=''): for window_id in self._get_ids(window): - self.raw_state.clear_timer(window_id, name, time_domain) + self.raw_state.clear_timer( + window_id, name, time_domain, dynamic_timer_tag=dynamic_timer_tag) def add_state(self, window, tag, value): if isinstance(tag, _ReadModifyWriteStateTag): @@ -1091,7 +1199,7 @@ def create_trigger_driver( return driver -class TriggerDriver(with_metaclass(ABCMeta, object)): # type: ignore[misc] +class TriggerDriver(metaclass=ABCMeta): """Breaks a series of bundle and timer firings into window (pane)s.""" @abstractmethod def process_elements( @@ -1114,6 +1222,7 @@ def process_timer( pass def process_entire_key(self, key, windowed_values): + # This state holds per-key, multi-window state. 
state = InMemoryUnmergedState() for wvalue in self.process_elements(state, windowed_values, @@ -1122,7 +1231,7 @@ def process_entire_key(self, key, windowed_values): yield wvalue.with_value((key, wvalue.value)) while state.timers: fired = state.get_and_clear_timers() - for timer_window, (name, time_domain, fire_time) in fired: + for timer_window, (name, time_domain, fire_time, _) in fired: for wvalue in self.process_timer(timer_window, name, time_domain, @@ -1159,10 +1268,6 @@ def __eq__(self, other): def __hash__(self): return hash(tuple(self)) - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - coder_impl.FastPrimitivesCoderImpl.register_iterable_like_type( _UnwindowedValues) @@ -1445,11 +1550,12 @@ def set_global_state(self, tag, value): def get_global_state(self, tag, default=None): return self.global_state.get(tag.tag, default) - def set_timer(self, window, name, time_domain, timestamp): - self.timers[window][(name, time_domain)] = timestamp + def set_timer( + self, window, name, time_domain, timestamp, dynamic_timer_tag=''): + self.timers[window][(name, time_domain, dynamic_timer_tag)] = timestamp - def clear_timer(self, window, name, time_domain): - self.timers[window].pop((name, time_domain), None) + def clear_timer(self, window, name, time_domain, dynamic_timer_tag=''): + self.timers[window].pop((name, time_domain, dynamic_timer_tag), None) if not self.timers[window]: del self.timers[window] @@ -1504,7 +1610,8 @@ def get_timers( expired = [] has_realtime_timer = False for window, timers in list(self.timers.items()): - for (name, time_domain), timestamp in list(timers.items()): + for (name, time_domain, dynamic_timer_tag), timestamp in list( + timers.items()): if time_domain == TimeDomain.REAL_TIME: time_marker = processing_time has_realtime_timer = True @@ -1515,9 +1622,10 @@ def get_timers( 'TimeDomain error: No timers defined for time domain %s.', time_domain) if timestamp <= time_marker: - expired.append((window, (name, time_domain, timestamp))) + expired.append( + (window, (name, time_domain, timestamp, dynamic_timer_tag))) if clear: - del timers[(name, time_domain)] + del timers[(name, time_domain, dynamic_timer_tag)] if not timers and clear: del self.timers[window] return expired, has_realtime_timer @@ -1527,7 +1635,7 @@ def get_and_clear_timers(self, watermark=MAX_TIMESTAMP): def get_earliest_hold(self): earliest_hold = MAX_TIMESTAMP - for unused_window, tagged_states in iteritems(self.state): + for unused_window, tagged_states in self.state.items(): # TODO(BEAM-2519): currently, this assumes that the watermark hold tag is # named "watermark". 
This is currently only true because the only place # watermark holds are set is in the GeneralTriggerDriver, where we use diff --git a/sdks/python/apache_beam/transforms/trigger_test.py b/sdks/python/apache_beam/transforms/trigger_test.py index 073b5eca60cd..ed43094088c1 100644 --- a/sdks/python/apache_beam/transforms/trigger_test.py +++ b/sdks/python/apache_beam/transforms/trigger_test.py @@ -19,25 +19,20 @@ # pytype: skip-file -from __future__ import absolute_import - import collections import json import os.path import pickle import random import unittest -from builtins import range -from builtins import zip -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import import yaml import apache_beam as beam from apache_beam import coders from apache_beam.options.pipeline_options import PipelineOptions from apache_beam.options.pipeline_options import StandardOptions +from apache_beam.options.pipeline_options import TypeOptions from apache_beam.portability import common_urns from apache_beam.runners import pipeline_context from apache_beam.runners.direct.clock import TestClock @@ -45,6 +40,7 @@ from apache_beam.testing.test_stream import TestStream from apache_beam.testing.util import assert_that from apache_beam.testing.util import equal_to +from apache_beam.transforms import WindowInto from apache_beam.transforms import ptransform from apache_beam.transforms import trigger from apache_beam.transforms.core import Windowing @@ -56,6 +52,7 @@ from apache_beam.transforms.trigger import AfterProcessingTime from apache_beam.transforms.trigger import AfterWatermark from apache_beam.transforms.trigger import Always +from apache_beam.transforms.trigger import DataLossReason from apache_beam.transforms.trigger import DefaultTrigger from apache_beam.transforms.trigger import GeneralTriggerDriver from apache_beam.transforms.trigger import InMemoryUnmergedState @@ -63,6 +60,7 @@ from apache_beam.transforms.trigger import TriggerFn from apache_beam.transforms.trigger import _Never from apache_beam.transforms.window import FixedWindows +from apache_beam.transforms.window import GlobalWindows from apache_beam.transforms.window import IntervalWindow from apache_beam.transforms.window import Sessions from apache_beam.transforms.window import TimestampCombiner @@ -157,8 +155,8 @@ def run_trigger( actual_panes[window].append(set(wvalue.value)) while state.timers: - for timer_window, (name, time_domain, - timestamp) in state.get_and_clear_timers(): + for timer_window, (name, time_domain, timestamp, + _) in state.get_and_clear_timers(): for wvalue in driver.process_timer(timer_window, name, time_domain, @@ -179,8 +177,8 @@ def run_trigger( actual_panes[window].append(set(wvalue.value)) while state.timers: - for timer_window, (name, time_domain, - timestamp) in state.get_and_clear_timers(): + for timer_window, (name, time_domain, timestamp, + _) in state.get_and_clear_timers(): for wvalue in driver.process_timer(timer_window, name, time_domain, @@ -439,6 +437,128 @@ def test_picklable_output(self): pickle.loads(pickle.dumps(unwindowed)).value, list(range(10))) +class MayLoseDataTest(unittest.TestCase): + def _test(self, trigger, lateness, expected): + windowing = WindowInto( + GlobalWindows(), + trigger=trigger, + accumulation_mode=AccumulationMode.ACCUMULATING, + allowed_lateness=lateness).windowing + self.assertEqual(trigger.may_lose_data(windowing), expected) + + def test_default_trigger(self): + self._test(DefaultTrigger(), 0, 
DataLossReason.NO_POTENTIAL_LOSS) + + def test_after_processing_time(self): + self._test(AfterProcessingTime(), 0, DataLossReason.MAY_FINISH) + + def test_always(self): + self._test(Always(), 0, DataLossReason.NO_POTENTIAL_LOSS) + + def test_never(self): + self._test(_Never(), 0, DataLossReason.NO_POTENTIAL_LOSS) + + def test_after_watermark_no_allowed_lateness(self): + self._test(AfterWatermark(), 0, DataLossReason.NO_POTENTIAL_LOSS) + + def test_after_watermark_late_none(self): + self._test(AfterWatermark(), 60, DataLossReason.MAY_FINISH) + + def test_after_watermark_no_allowed_lateness_safe_late(self): + self._test( + AfterWatermark(late=DefaultTrigger()), + 0, + DataLossReason.NO_POTENTIAL_LOSS) + + def test_after_watermark_safe_late(self): + self._test( + AfterWatermark(late=DefaultTrigger()), + 60, + DataLossReason.NO_POTENTIAL_LOSS) + + def test_after_watermark_no_allowed_lateness_may_finish_late(self): + self._test( + AfterWatermark(late=AfterProcessingTime()), + 0, + DataLossReason.NO_POTENTIAL_LOSS) + + def test_after_watermark_may_finish_late(self): + self._test( + AfterWatermark(late=AfterProcessingTime()), + 60, + DataLossReason.NO_POTENTIAL_LOSS) + + def test_after_watermark_no_allowed_lateness_condition_late(self): + self._test( + AfterWatermark(late=AfterCount(5)), 0, DataLossReason.NO_POTENTIAL_LOSS) + + def test_after_watermark_condition_late(self): + self._test( + AfterWatermark(late=AfterCount(5)), + 60, + DataLossReason.CONDITION_NOT_GUARANTEED) + + def test_after_count_one(self): + self._test(AfterCount(1), 0, DataLossReason.MAY_FINISH) + + def test_after_count_gt_one(self): + self._test( + AfterCount(2), + 0, + DataLossReason.MAY_FINISH | DataLossReason.CONDITION_NOT_GUARANTEED) + + def test_repeatedly_safe_underlying(self): + self._test( + Repeatedly(DefaultTrigger()), 0, DataLossReason.NO_POTENTIAL_LOSS) + + def test_repeatedly_may_finish_underlying(self): + self._test(Repeatedly(AfterCount(1)), 0, DataLossReason.NO_POTENTIAL_LOSS) + + def test_repeatedly_condition_underlying(self): + self._test( + Repeatedly(AfterCount(2)), 0, DataLossReason.CONDITION_NOT_GUARANTEED) + + def test_after_any_some_unsafe(self): + self._test( + AfterAny(AfterCount(1), DefaultTrigger()), + 0, + DataLossReason.NO_POTENTIAL_LOSS) + + def test_after_any_same_reason(self): + self._test( + AfterAny(AfterCount(1), AfterProcessingTime()), + 0, + DataLossReason.MAY_FINISH) + + def test_after_any_different_reasons(self): + self._test( + AfterAny(Repeatedly(AfterCount(2)), AfterProcessingTime()), + 0, + DataLossReason.MAY_FINISH | DataLossReason.CONDITION_NOT_GUARANTEED) + + def test_after_all_some_unsafe(self): + self._test( + AfterAll(AfterCount(1), DefaultTrigger()), 0, DataLossReason.MAY_FINISH) + + def test_after_all_safe(self): + self._test( + AfterAll(Repeatedly(AfterCount(1)), DefaultTrigger()), + 0, + DataLossReason.NO_POTENTIAL_LOSS) + + def test_after_each_some_unsafe(self): + self._test( + AfterEach(AfterCount(1), DefaultTrigger()), + 0, + DataLossReason.MAY_FINISH) + + def test_after_each_all_safe(self): + self._test( + AfterEach(Repeatedly(AfterCount(1)), DefaultTrigger()), + 0, + DataLossReason.NO_POTENTIAL_LOSS) + + class RunnerApiTest(unittest.TestCase): def test_trigger_encoding(self): for trigger_fn in (DefaultTrigger(), @@ -457,7 +577,8 @@ def test_trigger_encoding(self): class TriggerPipelineTest(unittest.TestCase): def test_after_count(self): - with TestPipeline() as p: + test_options = PipelineOptions(flags=['--allow_unsafe_triggers']) + with 
TestPipeline(options=test_options) as p: def construct_timestamped(k_t): return TimestampedValue((k_t[0], k_t[1]), k_t[1]) @@ -838,7 +959,7 @@ def _execute( def fire_timers(): to_fire = state.get_and_clear_timers(watermark) while to_fire: - for timer_window, (name, time_domain, t_timestamp) in to_fire: + for timer_window, (name, time_domain, t_timestamp, _) in to_fire: for wvalue in driver.process_timer(timer_window, name, time_domain, @@ -1067,6 +1188,7 @@ def CheckAggregation(inputs_and_expected, aggregation): with TestPipeline() as p: # TODO(BEAM-8601): Pass this during pipeline construction. p._options.view_as(StandardOptions).streaming = True + p._options.view_as(TypeOptions).allow_unsafe_triggers = True # We can have at most one test stream per pipeline, so we share it. inputs_and_expected = p | read_test_stream diff --git a/sdks/python/apache_beam/transforms/userstate.py b/sdks/python/apache_beam/transforms/userstate.py index 8d417ada9a63..84184d4bde02 100644 --- a/sdks/python/apache_beam/transforms/userstate.py +++ b/sdks/python/apache_beam/transforms/userstate.py @@ -23,13 +23,12 @@ # pytype: skip-file # mypy: disallow-untyped-defs -from __future__ import absolute_import - +import collections import types -from builtins import object from typing import TYPE_CHECKING from typing import Any from typing import Callable +from typing import Dict from typing import Iterable from typing import NamedTuple from typing import Optional @@ -184,29 +183,6 @@ def to_runner_api(self, context, key_coder, window_coder): coders._TimerCoder(key_coder, window_coder))) -# TODO(BEAM-9602): Provide support for dynamic timer. -class TimerFamilySpec(object): - prefix = "tfs-" - - def __init__(self, name, time_domain): - # type: (str, str) -> None - self.name = self.prefix + name - if time_domain not in (TimeDomain.WATERMARK, TimeDomain.REAL_TIME): - raise ValueError('Unsupported TimeDomain: %r.' 
% (time_domain, )) - self.time_domain = time_domain - - def __repr__(self): - # type: () -> str - return '%s(%s)' % (self.__class__.__name__, self.name) - - def to_runner_api(self, context, key_coder, window_coder): - # type: (PipelineContext, coders.Coder, coders.Coder) -> beam_runner_api_pb2.TimerFamilySpec - return beam_runner_api_pb2.TimerFamilySpec( - time_domain=TimeDomain.to_runner_api(self.time_domain), - timer_family_coder_id=context.coders.get_id( - coders._TimerCoder(key_coder, window_coder))) - - def on_timer(timer_spec): # type: (TimerSpec) -> Callable[[CallableT], CallableT] @@ -324,30 +300,34 @@ def validate_stateful_dofn(dofn): class BaseTimer(object): - def clear(self): - # type: () -> None + def clear(self, dynamic_timer_tag=''): + # type: (str) -> None raise NotImplementedError - def set(self, timestamp): - # type: (Timestamp) -> None + def set(self, timestamp, dynamic_timer_tag=''): + # type: (Timestamp, str) -> None raise NotImplementedError +_TimerTuple = collections.namedtuple('timer_tuple', ('cleared', 'timestamp')) + + class RuntimeTimer(BaseTimer): """Timer interface object passed to user code.""" - def __init__(self, timer_spec): - # type: (TimerSpec) -> None + def __init__(self) -> None: + self._timer_recordings = {} # type: Dict[str, _TimerTuple] self._cleared = False self._new_timestamp = None # type: Optional[Timestamp] - def clear(self): - # type: () -> None - self._cleared = True - self._new_timestamp = None + def clear(self, dynamic_timer_tag=''): + # type: (str) -> None + self._timer_recordings[dynamic_timer_tag] = _TimerTuple( + cleared=True, timestamp=None) - def set(self, timestamp): - # type: (Timestamp) -> None - self._new_timestamp = timestamp + def set(self, timestamp, dynamic_timer_tag=''): + # type: (Timestamp, str) -> None + self._timer_recordings[dynamic_timer_tag] = _TimerTuple( + cleared=False, timestamp=timestamp) class RuntimeState(object): diff --git a/sdks/python/apache_beam/transforms/userstate_test.py b/sdks/python/apache_beam/transforms/userstate_test.py index ec676624019d..702d5c367228 100644 --- a/sdks/python/apache_beam/transforms/userstate_test.py +++ b/sdks/python/apache_beam/transforms/userstate_test.py @@ -18,13 +18,10 @@ """Unit tests for the Beam State and Timer API interfaces.""" # pytype: skip-file -from __future__ import absolute_import - import unittest from typing import Any from typing import List -# patches unittest.TestCase to be python3 compatible import mock import apache_beam as beam @@ -64,6 +61,7 @@ class TestStatefulDoFn(DoFn): EXPIRY_TIMER_1 = TimerSpec('expiry1', TimeDomain.WATERMARK) EXPIRY_TIMER_2 = TimerSpec('expiry2', TimeDomain.WATERMARK) EXPIRY_TIMER_3 = TimerSpec('expiry3', TimeDomain.WATERMARK) + EXPIRY_TIMER_FAMILY = TimerSpec('expiry_family', TimeDomain.WATERMARK) def process( self, @@ -72,7 +70,8 @@ def process( buffer_1=DoFn.StateParam(BUFFER_STATE_1), buffer_2=DoFn.StateParam(BUFFER_STATE_2), timer_1=DoFn.TimerParam(EXPIRY_TIMER_1), - timer_2=DoFn.TimerParam(EXPIRY_TIMER_2)): + timer_2=DoFn.TimerParam(EXPIRY_TIMER_2), + dynamic_timer=DoFn.TimerParam(EXPIRY_TIMER_FAMILY)): yield element @on_timer(EXPIRY_TIMER_1) @@ -103,6 +102,13 @@ def on_expiry_3( timer_3=DoFn.TimerParam(EXPIRY_TIMER_3)): yield 'expired3' + @on_timer(EXPIRY_TIMER_FAMILY) + def on_expiry_family( + self, + dynamic_timer=DoFn.TimerParam(EXPIRY_TIMER_FAMILY), + dynamic_timer_tag=DoFn.DynamicTimerTagParam): + yield (dynamic_timer_tag, 'expired_dynamic_timer') + class InterfaceTest(unittest.TestCase): def _validate_dofn(self, dofn): @@ 
-165,29 +171,44 @@ def test_good_signatures(self): class BasicStatefulDoFn(DoFn): BUFFER_STATE = BagStateSpec('buffer', BytesCoder()) EXPIRY_TIMER = TimerSpec('expiry1', TimeDomain.WATERMARK) + EXPIRY_TIMER_FAMILY = TimerSpec('expiry_family_1', TimeDomain.WATERMARK) def process( self, element, buffer=DoFn.StateParam(BUFFER_STATE), - timer1=DoFn.TimerParam(EXPIRY_TIMER)): + timer1=DoFn.TimerParam(EXPIRY_TIMER), + dynamic_timer=DoFn.TimerParam(EXPIRY_TIMER_FAMILY)): yield element @on_timer(EXPIRY_TIMER) def expiry_callback(self, element, timer=DoFn.TimerParam(EXPIRY_TIMER)): yield element + @on_timer(EXPIRY_TIMER_FAMILY) + def expiry_family_callback( + self, element, dynamic_timer=DoFn.TimerParam(EXPIRY_TIMER_FAMILY)): + yield element + # Validate get_dofn_specs() and timer callbacks in # DoFnSignature. stateful_dofn = BasicStatefulDoFn() signature = self._validate_dofn(stateful_dofn) expected_specs = ( set([BasicStatefulDoFn.BUFFER_STATE]), - set([BasicStatefulDoFn.EXPIRY_TIMER])) + set([ + BasicStatefulDoFn.EXPIRY_TIMER, + BasicStatefulDoFn.EXPIRY_TIMER_FAMILY + ]), + ) self.assertEqual(expected_specs, get_dofn_specs(stateful_dofn)) self.assertEqual( stateful_dofn.expiry_callback, signature.timer_methods[BasicStatefulDoFn.EXPIRY_TIMER].method_value) + self.assertEqual( + stateful_dofn.expiry_family_callback, + signature.timer_methods[ + BasicStatefulDoFn.EXPIRY_TIMER_FAMILY].method_value) stateful_dofn = TestStatefulDoFn() signature = self._validate_dofn(stateful_dofn) @@ -196,7 +217,8 @@ def expiry_callback(self, element, timer=DoFn.TimerParam(EXPIRY_TIMER)): set([ TestStatefulDoFn.EXPIRY_TIMER_1, TestStatefulDoFn.EXPIRY_TIMER_2, - TestStatefulDoFn.EXPIRY_TIMER_3 + TestStatefulDoFn.EXPIRY_TIMER_3, + TestStatefulDoFn.EXPIRY_TIMER_FAMILY ])) self.assertEqual(expected_specs, get_dofn_specs(stateful_dofn)) self.assertEqual( @@ -208,6 +230,10 @@ def expiry_callback(self, element, timer=DoFn.TimerParam(EXPIRY_TIMER)): self.assertEqual( stateful_dofn.on_expiry_3, signature.timer_methods[TestStatefulDoFn.EXPIRY_TIMER_3].method_value) + self.assertEqual( + stateful_dofn.on_expiry_family, + signature.timer_methods[ + TestStatefulDoFn.EXPIRY_TIMER_FAMILY].method_value) def test_bad_signatures(self): # (1) The same state parameter is duplicated on the process method. @@ -269,6 +295,20 @@ def expiry_callback( with self.assertRaises(ValueError): self._validate_dofn(BadStatefulDoFn4()) + # (5) The same timer family parameter is duplicated on the process method. + class BadStatefulDoFn5(DoFn): + EXPIRY_TIMER_FAMILY = TimerSpec('dynamic_timer', TimeDomain.WATERMARK) + + def process( + self, + element, + dynamic_timer_1=DoFn.TimerParam(EXPIRY_TIMER_FAMILY), + dynamic_timer_2=DoFn.TimerParam(EXPIRY_TIMER_FAMILY)): + yield element + + with self.assertRaises(ValueError): + self._validate_dofn(BadStatefulDoFn5()) + def test_validation_typos(self): # (1) Here, the user mistakenly used the same timer spec twice for two # different timer callbacks. 
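The user-facing pattern for the new dynamic timers is a single `TimerSpec` acting as a timer family, with individual timers addressed by `dynamic_timer_tag` and the firing tag delivered through `DoFn.DynamicTimerTagParam`. A hypothetical sketch (class and tag names are illustrative; the direct-runner tests added further below exercise the same API):

```python
import apache_beam as beam
from apache_beam.transforms.timeutil import TimeDomain
from apache_beam.transforms.userstate import TimerSpec
from apache_beam.transforms.userstate import on_timer


class PerTagTimerDoFn(beam.DoFn):
  # One TimerSpec now behaves as a timer family; timers within it are
  # addressed by dynamic_timer_tag.
  EMIT = TimerSpec('emit', TimeDomain.WATERMARK)

  def process(self, element, emit=beam.DoFn.TimerParam(EMIT)):
    emit.set(10, dynamic_timer_tag='first')
    emit.set(20, dynamic_timer_tag='second')
    emit.clear(dynamic_timer_tag='second')  # Only the 'first' timer remains.

  @on_timer(EMIT)
  def on_emit(
      self, tag=beam.DoFn.DynamicTimerTagParam, ts=beam.DoFn.TimestampParam):
    yield (tag, ts)
```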
@@ -461,6 +501,45 @@ def clear_values(self, bag_state=beam.DoFn.StateParam(BAG_STATE)): self.assertEqual(['extra'], StatefulDoFnOnDirectRunnerTest.all_records) + def test_two_timers_one_function(self): + class BagStateClearingStatefulDoFn(beam.DoFn): + + BAG_STATE = BagStateSpec('bag_state', StrUtf8Coder()) + EMIT_TIMER = TimerSpec('emit_timer', TimeDomain.WATERMARK) + EMIT_TWICE_TIMER = TimerSpec('clear_timer', TimeDomain.WATERMARK) + + def process( + self, + element, + bag_state=beam.DoFn.StateParam(BAG_STATE), + emit_timer=beam.DoFn.TimerParam(EMIT_TIMER), + emit_twice_timer=beam.DoFn.TimerParam(EMIT_TWICE_TIMER)): + value = element[1] + bag_state.add(value) + emit_twice_timer.set(100) + emit_timer.set(1000) + + @on_timer(EMIT_TWICE_TIMER) + @on_timer(EMIT_TIMER) + def emit_values(self, bag_state=beam.DoFn.StateParam(BAG_STATE)): + for value in bag_state.read(): + yield value + + with TestPipeline() as p: + test_stream = ( + TestStream().advance_watermark_to(0).add_elements([ + ('key', 'value') + ]).advance_watermark_to(100)) + + _ = ( + p + | test_stream + | beam.ParDo(BagStateClearingStatefulDoFn()) + | beam.ParDo(self.record_dofn())) + + self.assertEqual(['value', 'value'], + StatefulDoFnOnDirectRunnerTest.all_records) + def test_simple_read_modify_write_stateful_dofn(self): class SimpleTestReadModifyWriteStatefulDoFn(DoFn): VALUE_STATE = ReadModifyWriteStateSpec('value', StrUtf8Coder()) @@ -826,6 +905,171 @@ def emit_callback_1( self.assertEqual([('timer1-mykey', 10, 10, 15)], sorted(StatefulDoFnOnDirectRunnerTest.all_records)) + def test_timer_default_tag(self): + class DynamicTimerDoFn(DoFn): + EMIT_TIMER_FAMILY = TimerSpec('emit', TimeDomain.WATERMARK) + + def process(self, element, emit=DoFn.TimerParam(EMIT_TIMER_FAMILY)): + emit.set(10) + emit.set(20, dynamic_timer_tag='') + + @on_timer(EMIT_TIMER_FAMILY) + def emit_callback( + self, ts=DoFn.TimestampParam, tag=DoFn.DynamicTimerTagParam): + yield (tag, ts) + + with TestPipeline() as p: + test_stream = (TestStream().advance_watermark_to(10).add_elements( + [1])).advance_watermark_to_infinity() + ( + p + | test_stream + | beam.Map(lambda x: ('mykey', x)) + | beam.ParDo(DynamicTimerDoFn()) + | beam.ParDo(self.record_dofn())) + + self.assertEqual([('', 20)], + sorted(StatefulDoFnOnDirectRunnerTest.all_records)) + + def test_dynamic_timer_simple_dofn(self): + class DynamicTimerDoFn(DoFn): + EMIT_TIMER_FAMILY = TimerSpec('emit', TimeDomain.WATERMARK) + + def process(self, element, emit=DoFn.TimerParam(EMIT_TIMER_FAMILY)): + emit.set(10, dynamic_timer_tag='emit1') + emit.set(20, dynamic_timer_tag='emit2') + emit.set(30, dynamic_timer_tag='emit3') + + @on_timer(EMIT_TIMER_FAMILY) + def emit_callback( + self, ts=DoFn.TimestampParam, tag=DoFn.DynamicTimerTagParam): + yield (tag, ts) + + with TestPipeline() as p: + test_stream = (TestStream().advance_watermark_to(10).add_elements( + [1])).advance_watermark_to_infinity() + ( + p + | test_stream + | beam.Map(lambda x: ('mykey', x)) + | beam.ParDo(DynamicTimerDoFn()) + | beam.ParDo(self.record_dofn())) + + self.assertEqual([('emit1', 10), ('emit2', 20), ('emit3', 30)], + sorted(StatefulDoFnOnDirectRunnerTest.all_records)) + + def test_dynamic_timer_clear_timer(self): + class DynamicTimerDoFn(DoFn): + EMIT_TIMER_FAMILY = TimerSpec('emit', TimeDomain.WATERMARK) + + def process(self, element, emit=DoFn.TimerParam(EMIT_TIMER_FAMILY)): + if element[1] == 'set': + emit.set(10, dynamic_timer_tag='emit1') + emit.set(20, dynamic_timer_tag='emit2') + emit.set(30, dynamic_timer_tag='emit3') + if 
element[1] == 'clear': + emit.clear(dynamic_timer_tag='emit3') + + @on_timer(EMIT_TIMER_FAMILY) + def emit_callback( + self, ts=DoFn.TimestampParam, tag=DoFn.DynamicTimerTagParam): + yield (tag, ts) + + with TestPipeline() as p: + test_stream = ( + TestStream().advance_watermark_to(5).add_elements( + ['set']).advance_watermark_to(10).add_elements( + ['clear']).advance_watermark_to_infinity()) + ( + p + | test_stream + | beam.Map(lambda x: ('mykey', x)) + | beam.ParDo(DynamicTimerDoFn()) + | beam.ParDo(self.record_dofn())) + + self.assertEqual([('emit1', 10), ('emit2', 20)], + sorted(StatefulDoFnOnDirectRunnerTest.all_records)) + + def test_dynamic_timer_multiple(self): + class DynamicTimerDoFn(DoFn): + EMIT_TIMER_FAMILY1 = TimerSpec('emit_family_1', TimeDomain.WATERMARK) + EMIT_TIMER_FAMILY2 = TimerSpec('emit_family_2', TimeDomain.WATERMARK) + + def process( + self, + element, + emit1=DoFn.TimerParam(EMIT_TIMER_FAMILY1), + emit2=DoFn.TimerParam(EMIT_TIMER_FAMILY2)): + emit1.set(10, dynamic_timer_tag='emit11') + emit1.set(20, dynamic_timer_tag='emit12') + emit1.set(30, dynamic_timer_tag='emit13') + emit2.set(30, dynamic_timer_tag='emit21') + emit2.set(20, dynamic_timer_tag='emit22') + emit2.set(10, dynamic_timer_tag='emit23') + + @on_timer(EMIT_TIMER_FAMILY1) + def emit_callback( + self, ts=DoFn.TimestampParam, tag=DoFn.DynamicTimerTagParam): + yield (tag, ts) + + @on_timer(EMIT_TIMER_FAMILY2) + def emit_callback_2( + self, ts=DoFn.TimestampParam, tag=DoFn.DynamicTimerTagParam): + yield (tag, ts) + + with TestPipeline() as p: + test_stream = ( + TestStream().advance_watermark_to(5).add_elements( + ['1']).advance_watermark_to_infinity()) + ( + p + | test_stream + | beam.Map(lambda x: ('mykey', x)) + | beam.ParDo(DynamicTimerDoFn()) + | beam.ParDo(self.record_dofn())) + + self.assertEqual([('emit11', 10), ('emit12', 20), ('emit13', 30), + ('emit21', 30), ('emit22', 20), ('emit23', 10)], + sorted(StatefulDoFnOnDirectRunnerTest.all_records)) + + def test_dynamic_timer_and_simple_timer(self): + class DynamicTimerDoFn(DoFn): + EMIT_TIMER_FAMILY = TimerSpec('emit', TimeDomain.WATERMARK) + GC_TIMER = TimerSpec('gc', TimeDomain.WATERMARK) + + def process( + self, + element, + emit=DoFn.TimerParam(EMIT_TIMER_FAMILY), + gc=DoFn.TimerParam(GC_TIMER)): + emit.set(10, dynamic_timer_tag='emit1') + emit.set(20, dynamic_timer_tag='emit2') + emit.set(30, dynamic_timer_tag='emit3') + gc.set(40) + + @on_timer(EMIT_TIMER_FAMILY) + def emit_callback( + self, ts=DoFn.TimestampParam, tag=DoFn.DynamicTimerTagParam): + yield (tag, ts) + + @on_timer(GC_TIMER) + def gc(self, ts=DoFn.TimestampParam): + yield ('gc', ts) + + with TestPipeline() as p: + test_stream = ( + TestStream().advance_watermark_to(5).add_elements( + ['1']).advance_watermark_to_infinity()) + ( + p + | test_stream + | beam.Map(lambda x: ('mykey', x)) + | beam.ParDo(DynamicTimerDoFn()) + | beam.ParDo(self.record_dofn())) + + self.assertEqual([('emit1', 10), ('emit2', 20), ('emit3', 30), ('gc', 40)], + sorted(StatefulDoFnOnDirectRunnerTest.all_records)) + def test_index_assignment(self): class IndexAssigningStatefulDoFn(DoFn): INDEX_STATE = CombiningValueStateSpec('index', sum) diff --git a/sdks/python/apache_beam/transforms/util.py b/sdks/python/apache_beam/transforms/util.py index 2e8d7c9e8ea4..63bbece98333 100644 --- a/sdks/python/apache_beam/transforms/util.py +++ b/sdks/python/apache_beam/transforms/util.py @@ -20,21 +20,13 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import collections import 
contextlib import random import re -import sys import threading import time import uuid -from builtins import filter -from builtins import object -from builtins import range -from builtins import zip from typing import TYPE_CHECKING from typing import Any from typing import Iterable @@ -43,14 +35,12 @@ from typing import TypeVar from typing import Union -from future.utils import itervalues -from past.builtins import long - from apache_beam import coders from apache_beam import typehints from apache_beam.metrics import Metrics from apache_beam.portability import common_urns from apache_beam.portability.api import beam_runner_api_pb2 +from apache_beam.pvalue import AsSideInput from apache_beam.transforms import window from apache_beam.transforms.combiners import CountCombineFn from apache_beam.transforms.core import CombinePerKey @@ -59,6 +49,7 @@ from apache_beam.transforms.core import Flatten from apache_beam.transforms.core import GroupByKey from apache_beam.transforms.core import Map +from apache_beam.transforms.core import MapTuple from apache_beam.transforms.core import ParDo from apache_beam.transforms.core import Windowing from apache_beam.transforms.ptransform import PTransform @@ -73,6 +64,8 @@ from apache_beam.transforms.window import NonMergingWindowFn from apache_beam.transforms.window import TimestampCombiner from apache_beam.transforms.window import TimestampedValue +from apache_beam.typehints.decorators import get_signature +from apache_beam.typehints.sharded_key_type import ShardedKeyType from apache_beam.utils import windowed_value from apache_beam.utils.annotations import deprecated from apache_beam.utils.annotations import experimental @@ -153,84 +146,107 @@ class CoGroupByKey(PTransform): (or if there's a chance there may be none), this argument is the only way to provide pipeline information, and should be considered mandatory. """ - def __init__(self, **kwargs): - super(CoGroupByKey, self).__init__() - self.pipeline = kwargs.pop('pipeline', None) - if kwargs: - raise ValueError('Unexpected keyword arguments: %s' % list(kwargs.keys())) + def __init__(self, *, pipeline=None): + self.pipeline = pipeline def _extract_input_pvalues(self, pvalueish): try: # If this works, it's a dict. - return pvalueish, tuple(itervalues(pvalueish)) + return pvalueish, tuple(pvalueish.values()) except AttributeError: + # Cast iterables a tuple so we can do re-iteration. pcolls = tuple(pvalueish) return pcolls, pcolls def expand(self, pcolls): - """Performs CoGroupByKey on argument pcolls; see class docstring.""" - - # For associating values in K-V pairs with the PCollections they came from. - def _pair_tag_with_value(key_value, tag): - (key, value) = key_value - return (key, (tag, value)) - - # Creates the key, value pairs for the output PCollection. Values are either - # lists or dicts (per the class docstring), initialized by the result of - # result_ctor(result_ctor_arg). - def _merge_tagged_vals_under_key(key_grouped, result_ctor, result_ctor_arg): - (key, grouped) = key_grouped - result_value = result_ctor(result_ctor_arg) - for tag, value in grouped: - result_value[tag].append(value) - return (key, result_value) + if isinstance(pcolls, dict): + if all(isinstance(tag, str) and len(tag) < 10 for tag in pcolls.keys()): + # Small, string tags. Pass them as data. + pcolls_dict = pcolls + restore_tags = None + else: + # Pass the tags in the restore_tags closure. 
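The refactor routes both the dict and the tuple form of `CoGroupByKey` through `_CoGBKImpl`, but the user-visible behavior is unchanged. A small usage sketch for the dict form (values are illustrative):

```python
import apache_beam as beam
from apache_beam.testing.util import assert_that, equal_to

with beam.Pipeline() as p:
  emails = p | 'Emails' >> beam.Create([('amy', 'amy@example.com')])
  phones = p | 'Phones' >> beam.Create([('amy', '555-1234')])
  # The dict keys become the tags in the grouped result.
  joined = {'emails': emails, 'phones': phones} | beam.CoGroupByKey()
  assert_that(
      joined,
      equal_to([('amy', {'emails': ['amy@example.com'],
                         'phones': ['555-1234']})]))
```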
+ tags = list(pcolls.keys()) + pcolls_dict = {str(ix): pcolls[tag] for (ix, tag) in enumerate(tags)} + restore_tags = lambda vs: { + tag: vs[str(ix)] + for (ix, tag) in enumerate(tags) + } + else: + # Tags are tuple indices. + num_tags = len(pcolls) + pcolls_dict = {str(ix): pcolls[ix] for ix in range(num_tags)} + restore_tags = lambda vs: tuple(vs[str(ix)] for ix in range(num_tags)) - try: - # If pcolls is a dict, we turn it into (tag, pcoll) pairs for use in the - # general-purpose code below. The result value constructor creates dicts - # whose keys are the tags. - result_ctor_arg = list(pcolls) - result_ctor = lambda tags: dict((tag, []) for tag in tags) - pcolls = pcolls.items() - except AttributeError: - # Otherwise, pcolls is a list/tuple, so we turn it into (index, pcoll) - # pairs. The result value constructor makes tuples with len(pcolls) slots. - pcolls = list(enumerate(pcolls)) - result_ctor_arg = len(pcolls) - result_ctor = lambda size: tuple([] for _ in range(size)) + result = ( + pcolls_dict | 'CoGroupByKeyImpl' >> _CoGBKImpl(pipeline=self.pipeline)) + if restore_tags: + return result | 'RestoreTags' >> MapTuple( + lambda k, vs: (k, restore_tags(vs))) + else: + return result + +class _CoGBKImpl(PTransform): + def __init__(self, *, pipeline=None): + self.pipeline = pipeline + + def expand(self, pcolls): # Check input PCollections for PCollection-ness, and that they all belong # to the same pipeline. - for _, pcoll in pcolls: + for pcoll in pcolls.values(): self._check_pcollection(pcoll) if self.pipeline: assert pcoll.pipeline == self.pipeline + tags = list(pcolls.keys()) + + def add_tag(tag): + return lambda k, v: (k, (tag, v)) + + def collect_values(key, tagged_values): + grouped_values = {tag: [] for tag in tags} + for tag, value in tagged_values: + grouped_values[tag].append(value) + return key, grouped_values + return ([ - pcoll | 'pair_with_%s' % tag >> Map(_pair_tag_with_value, tag) for tag, - pcoll in pcolls + pcoll + | 'Tag[%s]' % tag >> MapTuple(add_tag(tag)) + for (tag, pcoll) in pcolls.items() ] | Flatten(pipeline=self.pipeline) | GroupByKey() - | Map(_merge_tagged_vals_under_key, result_ctor, result_ctor_arg)) + | MapTuple(collect_values)) -def Keys(label='Keys'): # pylint: disable=invalid-name +@ptransform_fn +@typehints.with_input_types(Tuple[K, V]) +@typehints.with_output_types(K) +def Keys(pcoll, label='Keys'): # pylint: disable=invalid-name """Produces a PCollection of first elements of 2-tuples in a PCollection.""" - return label >> Map(lambda k_v: k_v[0]) + return pcoll | label >> MapTuple(lambda k, _: k) -def Values(label='Values'): # pylint: disable=invalid-name +@ptransform_fn +@typehints.with_input_types(Tuple[K, V]) +@typehints.with_output_types(V) +def Values(pcoll, label='Values'): # pylint: disable=invalid-name """Produces a PCollection of second elements of 2-tuples in a PCollection.""" - return label >> Map(lambda k_v1: k_v1[1]) + return pcoll | label >> MapTuple(lambda _, v: v) -def KvSwap(label='KvSwap'): # pylint: disable=invalid-name +@ptransform_fn +@typehints.with_input_types(Tuple[K, V]) +@typehints.with_output_types(Tuple[V, K]) +def KvSwap(pcoll, label='KvSwap'): # pylint: disable=invalid-name """Produces a PCollection reversing 2-tuples in a PCollection.""" - return label >> Map(lambda k_v2: (k_v2[1], k_v2[0])) + return pcoll | label >> MapTuple(lambda k, v: (v, k)) @ptransform_fn +@typehints.with_input_types(T) +@typehints.with_output_types(T) def Distinct(pcoll): # pylint: disable=invalid-name """Produces a PCollection containing 
distinct elements of a PCollection.""" return ( @@ -242,6 +258,8 @@ def Distinct(pcoll): # pylint: disable=invalid-name @deprecated(since='2.12', current='Distinct') @ptransform_fn +@typehints.with_input_types(T) +@typehints.with_output_types(T) def RemoveDuplicates(pcoll): """Produces a PCollection containing distinct elements of a PCollection.""" return pcoll | 'RemoveDuplicates' >> Distinct() @@ -707,17 +725,13 @@ class Reshuffle(PTransform): """ def expand(self, pcoll): # type: (pvalue.PValue) -> pvalue.PCollection - if sys.version_info >= (3, ): - KeyedT = Tuple[int, T] - else: - KeyedT = Tuple[long, T] # pylint: disable=long-builtin return ( pcoll - | 'AddRandomKeys' >> Map(lambda t: (random.getrandbits( - 32), t)).with_input_types(T).with_output_types(KeyedT) + | 'AddRandomKeys' >> Map(lambda t: (random.getrandbits(32), t)). + with_input_types(T).with_output_types(Tuple[int, T]) | ReshufflePerKey() - | 'RemoveRandomKeys' >> - Map(lambda t: t[1]).with_input_types(KeyedT).with_output_types(T)) + | 'RemoveRandomKeys' >> Map(lambda t: t[1]).with_input_types( + Tuple[int, T]).with_output_types(T)) def to_runner_api_parameter(self, unused_context): # type: (PipelineContext) -> Tuple[str, None] @@ -730,14 +744,40 @@ def from_runner_api_parameter( return Reshuffle() +def fn_takes_side_inputs(fn): + try: + signature = get_signature(fn) + except TypeError: + # We can't tell; maybe it does. + return True + + return ( + len(signature.parameters) > 1 or any( + p.kind == p.VAR_POSITIONAL or p.kind == p.VAR_KEYWORD + for p in signature.parameters.values())) + + @ptransform_fn -def WithKeys(pcoll, k): +def WithKeys(pcoll, k, *args, **kwargs): """PTransform that takes a PCollection, and either a constant key or a callable, and returns a PCollection of (K, V), where each of the values in the input PCollection has been paired with either the constant key or a key - computed from the value. + computed from the value. The callable may optionally accept positional or + keyword arguments, which should be passed to WithKeys directly. These may + be either SideInputs or static (non-PCollection) values, such as ints. """ if callable(k): + if fn_takes_side_inputs(k): + if all([isinstance(arg, AsSideInput) + for arg in args]) and all([isinstance(kwarg, AsSideInput) + for kwarg in kwargs.values()]): + return pcoll | Map( + lambda v, + *args, + **kwargs: (k(v, *args, **kwargs), v), + *args, + **kwargs) + return pcoll | Map(lambda v: (k(v, *args, **kwargs), v)) return pcoll | Map(lambda v: (k(v), v)) return pcoll | Map(lambda v: (k, v)) @@ -797,7 +837,10 @@ def from_runner_api_parameter(unused_ptransform, proto, unused_context): return GroupIntoBatches(*_GroupIntoBatchesParams.parse_payload(proto)) @typehints.with_input_types(Tuple[K, V]) - @typehints.with_output_types(Tuple[K, Iterable[V]]) + @typehints.with_output_types( + typehints.Tuple[ + ShardedKeyType[typehints.TypeVariable(K)], # type: ignore[misc] + typehints.Iterable[typehints.TypeVariable(V)]]) class WithShardedKey(PTransform): """A GroupIntoBatches transform that outputs batched elements associated with sharded input keys. @@ -807,16 +850,19 @@ class WithShardedKey(PTransform): override the default sharding to do a better load balancing during the execution time. """ - def __init__(self, batch_size, max_buffering_duration_secs=None): + def __init__( + self, batch_size, max_buffering_duration_secs=None, clock=time.time): """Create a new GroupIntoBatches with sharded output. See ``GroupIntoBatches`` transform for a description of input parameters. 
""" self.params = _GroupIntoBatchesParams( batch_size, max_buffering_duration_secs) + self.clock = clock _shard_id_prefix = uuid.uuid4().bytes def expand(self, pcoll): + key_type, value_type = pcoll.element_type.tuple_types sharded_pcoll = pcoll | Map( lambda key_value: ( ShardedKey( @@ -824,11 +870,16 @@ def expand(self, pcoll): # Use [uuid, thread id] as the shard id. GroupIntoBatches.WithShardedKey._shard_id_prefix + bytes( threading.get_ident().to_bytes(8, 'big'))), - key_value[1])) + key_value[1])).with_output_types( + typehints.Tuple[ + ShardedKeyType[key_type], # type: ignore[misc] + value_type]) return ( sharded_pcoll | GroupIntoBatches( - self.params.batch_size, self.params.max_buffering_duration_secs)) + self.params.batch_size, + self.params.max_buffering_duration_secs, + self.clock)) def to_runner_api_parameter( self, diff --git a/sdks/python/apache_beam/transforms/util_test.py b/sdks/python/apache_beam/transforms/util_test.py index 002213046fa9..94cc8f37bf48 100644 --- a/sdks/python/apache_beam/transforms/util_test.py +++ b/sdks/python/apache_beam/transforms/util_test.py @@ -19,9 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import logging import math import random @@ -29,12 +26,8 @@ import time import unittest import warnings -from builtins import object -from builtins import range -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import -from nose.plugins.attrib import attr +import pytest import apache_beam as beam from apache_beam import GroupByKey @@ -45,6 +38,8 @@ from apache_beam.options.pipeline_options import StandardOptions from apache_beam.portability import common_urns from apache_beam.portability.api import beam_runner_api_pb2 +from apache_beam.pvalue import AsList +from apache_beam.pvalue import AsSingleton from apache_beam.runners import pipeline_context from apache_beam.testing.test_pipeline import TestPipeline from apache_beam.testing.test_stream import TestStream @@ -65,6 +60,8 @@ from apache_beam.transforms.window import Sessions from apache_beam.transforms.window import SlidingWindows from apache_beam.transforms.window import TimestampedValue +from apache_beam.typehints import typehints +from apache_beam.typehints.sharded_key_type import ShardedKeyType from apache_beam.utils import proto_utils from apache_beam.utils import timestamp from apache_beam.utils.timestamp import MAX_TIMESTAMP @@ -548,7 +545,7 @@ def test_reshuffle_streaming_global_window(self): assert_that( after_reshuffle, equal_to(expected_data), label='after reshuffle') - @attr('ValidatesRunner') + @pytest.mark.it_validatesrunner def test_reshuffle_preserves_timestamps(self): with TestPipeline() as pipeline: @@ -633,6 +630,30 @@ def test_callable_k(self): with_keys = pc | util.WithKeys(lambda x: x * x) assert_that(with_keys, equal_to([(1, 1), (4, 2), (9, 3)])) + @staticmethod + def _test_args_kwargs_fn(x, multiply, subtract): + return x * multiply - subtract + + def test_args_kwargs_k(self): + with TestPipeline() as p: + pc = p | beam.Create(self.l) + with_keys = pc | util.WithKeys( + WithKeysTest._test_args_kwargs_fn, 2, subtract=1) + assert_that(with_keys, equal_to([(1, 1), (3, 2), (5, 3)])) + + def test_sideinputs(self): + with TestPipeline() as p: + pc = p | beam.Create(self.l) + si1 = AsList(p | "side input 1" >> beam.Create([1, 2, 3])) + si2 = AsSingleton(p | "side input 2" >> beam.Create([10])) + with_keys = pc | util.WithKeys( + lambda x, + the_list, + the_singleton: x + 
sum(the_list) + the_singleton, + si1, + the_singleton=si2) + assert_that(with_keys, equal_to([(17, 1), (18, 2), (19, 3)])) + class GroupIntoBatchesTest(unittest.TestCase): NUM_ELEMENTS = 10 @@ -718,8 +739,8 @@ def test_buffering_timer_in_fixed_window_streaming(self): max_buffering_duration_secs, fake_clock) | "count elements in batch" >> Map(lambda x: (None, len(x[1]))) - | "global window" >> WindowInto(GlobalWindows()) | GroupByKey() + | "global window" >> WindowInto(GlobalWindows()) | FlatMapTuple(lambda k, vs: vs)) # Window duration is 6 and batch size is 5, so output batch size @@ -777,6 +798,28 @@ def test_buffering_timer_in_global_window_streaming(self): # the global window ends. assert_that(num_elements_per_batch, equal_to([9, 1])) + def test_output_typehints(self): + transform = util.GroupIntoBatches.WithShardedKey( + GroupIntoBatchesTest.BATCH_SIZE) + unused_input_type = typehints.Tuple[str, str] + output_type = transform.infer_output_type(unused_input_type) + self.assertTrue(isinstance(output_type, typehints.TupleConstraint)) + k, v = output_type.tuple_types + self.assertTrue(isinstance(k, ShardedKeyType)) + self.assertTrue(isinstance(v, typehints.IterableTypeConstraint)) + + with TestPipeline() as pipeline: + collection = ( + pipeline + | beam.Create([((1, 2), 'a'), ((2, 3), 'b')]) + | util.GroupIntoBatches.WithShardedKey( + GroupIntoBatchesTest.BATCH_SIZE)) + self.assertTrue( + collection.element_type, + typehints.Tuple[ + ShardedKeyType[typehints.Tuple[int, int]], # type: ignore[misc] + typehints.Iterable[str]]) + def _test_runner_api_round_trip(self, transform, urn): context = pipeline_context.PipelineContext() proto = transform.to_runner_api(context) @@ -791,7 +834,7 @@ def _test_runner_api_round_trip(self, transform, urn): transform_from_proto = ( transform.__class__.from_runner_api_parameter(None, payload, None)) - self.assertTrue(isinstance(transform_from_proto, transform.__class__)) + self.assertIsInstance(transform_from_proto, transform.__class__) self.assertEqual(transform.params, transform_from_proto.params) def test_runner_api(self): diff --git a/sdks/python/apache_beam/transforms/validate_runner_xlang_test.py b/sdks/python/apache_beam/transforms/validate_runner_xlang_test.py index e24587bc2839..9e33ba5399f5 100644 --- a/sdks/python/apache_beam/transforms/validate_runner_xlang_test.py +++ b/sdks/python/apache_beam/transforms/validate_runner_xlang_test.py @@ -50,15 +50,12 @@ for further details. 
""" -from __future__ import absolute_import - import logging import os import typing import unittest -from nose.plugins.attrib import attr -from past.builtins import unicode +import pytest import apache_beam as beam from apache_beam.testing.test_pipeline import TestPipeline @@ -94,7 +91,7 @@ def run_prefix(self, pipeline): with pipeline as p: res = ( p - | beam.Create(['a', 'b']).with_output_types(unicode) + | beam.Create(['a', 'b']).with_output_types(str) | beam.ExternalTransform( TEST_PREFIX_URN, ImplicitSchemaPayloadBuilder({'data': u'0'}), @@ -113,10 +110,10 @@ def run_multi_input_output_with_sideinput(self, pipeline): """ with pipeline as p: main1 = p | 'Main1' >> beam.Create( - ['a', 'bb'], reshuffle=False).with_output_types(unicode) + ['a', 'bb'], reshuffle=False).with_output_types(str) main2 = p | 'Main2' >> beam.Create( - ['x', 'yy', 'zzz'], reshuffle=False).with_output_types(unicode) - side = p | 'Side' >> beam.Create(['s']).with_output_types(unicode) + ['x', 'yy', 'zzz'], reshuffle=False).with_output_types(str) + side = p | 'Side' >> beam.Create(['s']).with_output_types(str) res = dict( main1=main1, main2=main2, side=side) | beam.ExternalTransform( TEST_MULTI_URN, None, self.expansion_service) @@ -138,7 +135,7 @@ def run_group_by_key(self, pipeline): p | beam.Create([(0, "1"), (0, "2"), (1, "3")], reshuffle=False).with_output_types( - typing.Tuple[int, unicode]) + typing.Tuple[int, str]) | beam.ExternalTransform(TEST_GBK_URN, None, self.expansion_service) | beam.Map(lambda x: "{}:{}".format(x[0], ','.join(sorted(x[1]))))) assert_that(res, equal_to(['0:1,2', '1:3'])) @@ -156,10 +153,10 @@ def run_cogroup_by_key(self, pipeline): with pipeline as p: col1 = p | 'create_col1' >> beam.Create( [(0, "1"), (0, "2"), (1, "3")], reshuffle=False).with_output_types( - typing.Tuple[int, unicode]) + typing.Tuple[int, str]) col2 = p | 'create_col2' >> beam.Create( [(0, "4"), (1, "5"), (1, "6")], reshuffle=False).with_output_types( - typing.Tuple[int, unicode]) + typing.Tuple[int, str]) res = ( dict(col1=col1, col2=col2) | beam.ExternalTransform(TEST_CGBK_URN, None, self.expansion_service) @@ -197,8 +194,8 @@ def run_combine_per_key(self, pipeline): with pipeline as p: res = ( p - | beam.Create([('a', 1), ('a', 2), ('b', 3)]).with_output_types( - typing.Tuple[unicode, int]) + | beam.Create([('a', 1), ('a', 2), + ('b', 3)]).with_output_types(typing.Tuple[str, int]) | beam.ExternalTransform( TEST_COMPK_URN, None, self.expansion_service)) assert_that(res, equal_to([('a', 3), ('b', 3)])) @@ -240,7 +237,6 @@ def run_partition(self, pipeline): assert_that(res['1'], equal_to([1, 3, 5]), label='check_odd') -@attr('UsesCrossLanguageTransforms') @unittest.skipUnless( os.environ.get('EXPANSION_PORT'), "EXPANSION_PORT environment var is not provided.") @@ -252,34 +248,43 @@ def create_pipeline(self): test_pipeline.not_use_test_runner_api = True return test_pipeline + @pytest.mark.xlang_transforms def test_prefix(self, test_pipeline=None): CrossLanguageTestPipelines().run_prefix( test_pipeline or self.create_pipeline()) + @pytest.mark.xlang_transforms def test_multi_input_output_with_sideinput(self, test_pipeline=None): CrossLanguageTestPipelines().run_multi_input_output_with_sideinput( test_pipeline or self.create_pipeline()) + @pytest.mark.xlang_transforms def test_group_by_key(self, test_pipeline=None): CrossLanguageTestPipelines().run_group_by_key( test_pipeline or self.create_pipeline()) + @pytest.mark.xlang_transforms def test_cogroup_by_key(self, test_pipeline=None): 
CrossLanguageTestPipelines().run_cogroup_by_key( test_pipeline or self.create_pipeline()) + @pytest.mark.xlang_transforms def test_combine_globally(self, test_pipeline=None): CrossLanguageTestPipelines().run_combine_globally( test_pipeline or self.create_pipeline()) + @pytest.mark.xlang_transforms def test_combine_per_key(self, test_pipeline=None): CrossLanguageTestPipelines().run_combine_per_key( test_pipeline or self.create_pipeline()) + # TODO: enable after fixing BEAM-10507 + # @pytest.mark.xlang_transforms def test_flatten(self, test_pipeline=None): CrossLanguageTestPipelines().run_flatten( test_pipeline or self.create_pipeline()) + @pytest.mark.xlang_transforms def test_partition(self, test_pipeline=None): CrossLanguageTestPipelines().run_partition( test_pipeline or self.create_pipeline()) diff --git a/sdks/python/apache_beam/transforms/window.py b/sdks/python/apache_beam/transforms/window.py index bdf0f2617167..de16c73079b9 100644 --- a/sdks/python/apache_beam/transforms/window.py +++ b/sdks/python/apache_beam/transforms/window.py @@ -49,18 +49,13 @@ # pytype: skip-file -from __future__ import absolute_import - import abc -from builtins import object -from builtins import range from functools import total_ordering from typing import Any from typing import Iterable from typing import List from typing import Optional -from future.utils import with_metaclass from google.protobuf import duration_pb2 from google.protobuf import timestamp_pb2 @@ -121,7 +116,7 @@ def get_impl(timestamp_combiner, window_fn): raise ValueError('Invalid TimestampCombiner: %s.' % timestamp_combiner) -class WindowFn(with_metaclass(abc.ABCMeta, urns.RunnerApiFn)): # type: ignore[misc] +class WindowFn(urns.RunnerApiFn, metaclass=abc.ABCMeta): """An abstract windowing function defining a basic assign and merge.""" class AssignContext(object): """Context passed to WindowFn.assign().""" @@ -304,9 +299,6 @@ def __eq__(self, other): def __hash__(self): return hash((self.value, self.timestamp)) - def __ne__(self, other): - return not self == other - def __lt__(self, other): if type(self) != type(other): return type(self).__name__ < type(other).__name__ @@ -338,9 +330,6 @@ def __eq__(self, other): # Global windows are always and only equal to each other. return self is other or type(self) is type(other) - def __ne__(self, other): - return not self == other - @property def start(self): # type: () -> Timestamp @@ -391,9 +380,6 @@ def __eq__(self, other): # Global windowfn is always and only equal to each other. 
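The `__ne__` methods removed here and below were Python 2 leftovers: in Python 3, `!=` is derived automatically from `__eq__`. A quick standalone illustration (not Beam-specific):

```python
class Box:
  def __init__(self, value):
    self.value = value

  def __eq__(self, other):
    return type(self) == type(other) and self.value == other.value

  def __hash__(self):
    return hash(self.value)


# Python 3 derives != from __eq__, so an explicit __ne__ is redundant.
assert Box(1) == Box(1)
assert not Box(1) != Box(1)
assert Box(1) != Box(2)
```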
return self is other or type(self) is type(other) - def __ne__(self, other): - return not self == other - def to_runner_api_parameter(self, context): return common_urns.global_windows.urn, None @@ -454,9 +440,6 @@ def __eq__(self, other): def __hash__(self): return hash((self.size, self.offset)) - def __ne__(self, other): - return not self == other - def to_runner_api_parameter(self, context): return ( common_urns.fixed_windows.urn, @@ -525,9 +508,6 @@ def __eq__(self, other): self.size == other.size and self.offset == other.offset and self.period == other.period) - def __ne__(self, other): - return not self == other - def __hash__(self): return hash((self.offset, self.period)) @@ -604,9 +584,6 @@ def __eq__(self, other): if type(self) == type(other) == Sessions: return self.gap_size == other.gap_size - def __ne__(self, other): - return not self == other - def __hash__(self): return hash(self.gap_size) diff --git a/sdks/python/apache_beam/transforms/window_test.py b/sdks/python/apache_beam/transforms/window_test.py index 3b56a3eb20de..7369090729d6 100644 --- a/sdks/python/apache_beam/transforms/window_test.py +++ b/sdks/python/apache_beam/transforms/window_test.py @@ -18,11 +18,7 @@ """Unit tests for the windowing classes.""" # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - import unittest -from builtins import range import apache_beam as beam from apache_beam.coders import coders diff --git a/sdks/python/apache_beam/transforms/write_ptransform_test.py b/sdks/python/apache_beam/transforms/write_ptransform_test.py index 7ab656324831..ce402d8d3062 100644 --- a/sdks/python/apache_beam/transforms/write_ptransform_test.py +++ b/sdks/python/apache_beam/transforms/write_ptransform_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import logging import unittest diff --git a/sdks/python/apache_beam/typehints/__init__.py b/sdks/python/apache_beam/typehints/__init__.py index 23d0b40d07f3..e89afa1285a7 100644 --- a/sdks/python/apache_beam/typehints/__init__.py +++ b/sdks/python/apache_beam/typehints/__init__.py @@ -17,8 +17,6 @@ """A package defining the syntax and decorator semantics for type-hints.""" -from __future__ import absolute_import - # pylint: disable=wildcard-import from apache_beam.typehints.typehints import * from apache_beam.typehints.decorators import * diff --git a/sdks/python/apache_beam/typehints/decorators.py b/sdks/python/apache_beam/typehints/decorators.py index c0ebe9d022dd..478869dc289f 100644 --- a/sdks/python/apache_beam/typehints/decorators.py +++ b/sdks/python/apache_beam/typehints/decorators.py @@ -61,12 +61,6 @@ def compress_point(ec_point): def int_to_str(a): return str(a) -Type-hinting a function with arguments that unpack tuples are also supported -(in Python 2 only). As an example, such a function would be defined as:: - - def foo((a, b)): - ... 
- The valid type-hint for such as function looks like the following:: @with_input_types(a=int, b=int) @@ -85,17 +79,12 @@ def foo((a, b)): # pytype: skip-file -from __future__ import absolute_import - import inspect import itertools import logging import sys import traceback import types -from builtins import next -from builtins import object -from builtins import zip from typing import Any from typing import Callable from typing import Dict @@ -115,11 +104,6 @@ def foo((a, b)): from apache_beam.typehints.typehints import check_constraint from apache_beam.typehints.typehints import validate_composite_type_param -try: - import funcsigs # Python 2 only. -except ImportError: - funcsigs = None - __all__ = [ 'disable_type_annotations', 'no_annotations', @@ -142,44 +126,6 @@ def foo((a, b)): _ANY_VAR_KEYWORD = typehints.Dict[typehints.Any, typehints.Any] _disable_from_callable = False -try: - _original_getfullargspec = inspect.getfullargspec - _use_full_argspec = True -except AttributeError: # Python 2 - _original_getfullargspec = inspect.getargspec # type: ignore - _use_full_argspec = False - - -def getfullargspec(func): - # Python 3: Use get_signature instead. - assert sys.version_info < (3, ), 'This method should not be used in Python 3' - try: - return _original_getfullargspec(func) - except TypeError: - if isinstance(func, type): - argspec = getfullargspec(func.__init__) - del argspec.args[0] - return argspec - elif callable(func): - try: - return _original_getfullargspec(func.__call__) - except TypeError: - # Return an ArgSpec with at least one positional argument, - # and any number of other (positional or keyword) arguments - # whose name won't match any real argument. - # Arguments with the %unknown% prefix will be ignored in the type - # checking code. - if _use_full_argspec: - return inspect.FullArgSpec(['_'], - '__unknown__varargs', - '__unknown__keywords', (), [], {}, {}) - else: # Python 2 - return inspect.ArgSpec(['_'], - '__unknown__varargs', - '__unknown__keywords', ()) - else: - raise - def get_signature(func): """Like inspect.signature(), but supports Py2 as well. @@ -188,26 +134,18 @@ def get_signature(func): latter: 'the "self" parameter is always reported, even for bound methods' https://github.com/python/cpython/blob/44f91c388a6f4da9ed3300df32ca290b8aa104ea/Lib/inspect.py#L1103 """ - # Fall back on funcsigs if inspect module doesn't have 'signature'; prefer - # inspect.signature over funcsigs.signature if both are available. - if hasattr(inspect, 'signature'): - inspect_ = inspect - else: - inspect_ = funcsigs - try: - signature = inspect_.signature(func) + signature = inspect.signature(func) except ValueError: # Fall back on a catch-all signature. params = [ - inspect_.Parameter('_', inspect_.Parameter.POSITIONAL_OR_KEYWORD), - inspect_.Parameter( - '__unknown__varargs', inspect_.Parameter.VAR_POSITIONAL), - inspect_.Parameter( - '__unknown__keywords', inspect_.Parameter.VAR_KEYWORD) + inspect.Parameter('_', inspect.Parameter.POSITIONAL_OR_KEYWORD), + inspect.Parameter( + '__unknown__varargs', inspect.Parameter.VAR_POSITIONAL), + inspect.Parameter('__unknown__keywords', inspect.Parameter.VAR_KEYWORD) ] - signature = inspect_.Signature(params) + signature = inspect.Signature(params) # This is a specialization to hint the first argument of certain builtins, # such as str.strip. 
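With `funcsigs` removed, `get_signature` above relies only on `inspect.signature` plus a catch-all fallback for callables that cannot be introspected. A rough standalone sketch of that fallback pattern (the helper name here is made up for illustration):

```python
import inspect


def permissive_signature(func):
  """Return inspect.signature(func), or a catch-all signature if that fails."""
  try:
    return inspect.signature(func)
  except ValueError:
    # Some C-implemented callables raise ValueError from inspect.signature();
    # fall back to an "accept anything" signature instead of failing.
    params = [
        inspect.Parameter('_', inspect.Parameter.POSITIONAL_OR_KEYWORD),
        inspect.Parameter('args', inspect.Parameter.VAR_POSITIONAL),
        inspect.Parameter('kwargs', inspect.Parameter.VAR_KEYWORD),
    ]
    return inspect.Signature(params)


def fn(a, b=1, *c, **d):
  return a, b, c, d


print(permissive_signature(fn))     # (a, b=1, *c, **d)
print(permissive_signature(print))  # may or may not fall back, depending on the CPython version
```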
@@ -539,9 +477,6 @@ def same(a, b): same(self.input_types, other.input_types) and same(self.output_types, other.output_types)) - def __ne__(self, other): - return not self == other - def __hash__(self): return hash(str(self)) @@ -641,65 +576,6 @@ def _unpack_positional_arg_hints(arg, hint): return hint -def getcallargs_forhints(func, *typeargs, **typekwargs): - """Like inspect.getcallargs, with support for declaring default args as Any. - - In Python 2, understands that Tuple[] and an Any unpack. - - Returns: - (Dict[str, Any]) A dictionary from arguments names to values. - """ - if sys.version_info < (3, ): - return getcallargs_forhints_impl_py2(func, typeargs, typekwargs) - else: - return getcallargs_forhints_impl_py3(func, typeargs, typekwargs) - - -def getcallargs_forhints_impl_py2(func, typeargs, typekwargs): - argspec = getfullargspec(func) - # Turn Tuple[x, y] into (x, y) so getcallargs can do the proper unpacking. - packed_typeargs = [ - _unpack_positional_arg_hints(arg, hint) - for (arg, hint) in zip(argspec.args, typeargs) - ] - packed_typeargs += list(typeargs[len(packed_typeargs):]) - - # Monkeypatch inspect.getfullargspec to allow passing non-function objects. - # getfullargspec (getargspec on Python 2) are used by inspect.getcallargs. - # TODO(BEAM-5490): Reimplement getcallargs and stop relying on monkeypatch. - inspect.getargspec = getfullargspec - try: - callargs = inspect.getcallargs(func, *packed_typeargs, **typekwargs) # pylint: disable=deprecated-method - except TypeError as e: - raise TypeCheckError(e) - finally: - # Revert monkey-patch. - inspect.getargspec = _original_getfullargspec - - if argspec.defaults: - # Declare any default arguments to be Any. - for k, var in enumerate(reversed(argspec.args)): - if k >= len(argspec.defaults): - break - if callargs.get(var, None) is argspec.defaults[-k - 1]: - callargs[var] = typehints.Any - # Patch up varargs and keywords - if argspec.varargs: - # TODO(BEAM-8122): This will always assign _ANY_VAR_POSITIONAL. Should be - # "callargs.get(...) or _ANY_VAR_POSITIONAL". - callargs[argspec.varargs] = typekwargs.get( - argspec.varargs, _ANY_VAR_POSITIONAL) - - varkw = argspec.keywords - if varkw: - # TODO(robertwb): Consider taking the union of key and value types. - callargs[varkw] = typekwargs.get(varkw, _ANY_VAR_KEYWORD) - - # TODO(BEAM-5878) Support kwonlyargs. - - return callargs - - def _normalize_var_positional_hint(hint): """Converts a var_positional hint into Tuple[Union[], ...] form. @@ -746,7 +622,7 @@ def _normalize_var_keyword_hint(hint, arg_name): return typehints.Dict[str, typehints.Union[values]] -def getcallargs_forhints_impl_py3(func, type_args, type_kwargs): +def getcallargs_forhints(func, *type_args, **type_kwargs): """Bind type_args and type_kwargs to func. 
Works like inspect.getcallargs, with some modifications to support type hint diff --git a/sdks/python/apache_beam/typehints/decorators_test.py b/sdks/python/apache_beam/typehints/decorators_test.py index 85ca94f63e92..faf0b55cc357 100644 --- a/sdks/python/apache_beam/typehints/decorators_test.py +++ b/sdks/python/apache_beam/typehints/decorators_test.py @@ -19,24 +19,31 @@ # pytype: skip-file -from __future__ import absolute_import - -import re +import functools import sys +import typing import unittest -import future.tests.base # pylint: disable=unused-import - +from apache_beam import Map from apache_beam.typehints import Any +from apache_beam.typehints import Dict from apache_beam.typehints import List +from apache_beam.typehints import Tuple +from apache_beam.typehints import TypeCheckError +from apache_beam.typehints import TypeVariable from apache_beam.typehints import WithTypeHints from apache_beam.typehints import decorators from apache_beam.typehints import typehints +T = TypeVariable('T') +# Name is 'T' so it converts to a beam type with the same name. +# mypy requires that the name of the variable match, so we must ignore this. +T_typing = typing.TypeVar('T') # type: ignore + class IOTypeHintsTest(unittest.TestCase): def test_get_signature(self): - # Basic coverage only to make sure function works in Py2 and Py3. + # Basic coverage only to make sure function works. def fn(a, b=1, *c, **d): return a, b, c, d @@ -55,7 +62,6 @@ def test_get_signature_builtin(self): self.assertEqual(s.return_annotation, List[Any]) def test_from_callable_without_annotations(self): - # Python 2 doesn't support annotations. See decorators_test_py3.py for that. def fn(a, b=None, *args, **kwargs): return a, b, args, kwargs @@ -120,8 +126,7 @@ def test_make_traceback(self): origin = ''.join( decorators.IOTypeHints.empty().with_input_types(str).origin) self.assertRegex(origin, __name__) - # TODO: use self.assertNotRegex once py2 support is removed. 
- self.assertIsNone(re.search(r'\b_make_traceback', origin), msg=origin) + self.assertNotRegex(origin, r'\b_make_traceback') def test_origin(self): th = decorators.IOTypeHints.empty() @@ -153,6 +158,140 @@ def test_with_defaults_noop_does_not_grow_origin(self): th = th.with_defaults(th2) self.assertNotEqual(expected_id, id(th)) + def test_from_callable(self): + def fn( + a: int, + b: str = '', + *args: Tuple[T], + foo: List[int], + **kwargs: Dict[str, str]) -> Tuple[Any, ...]: + return a, b, args, foo, kwargs + + th = decorators.IOTypeHints.from_callable(fn) + self.assertEqual( + th.input_types, ((int, str, Tuple[T]), { + 'foo': List[int], 'kwargs': Dict[str, str] + })) + self.assertEqual(th.output_types, ((Tuple[Any, ...], ), {})) + + def test_from_callable_partial_annotations(self): + def fn(a: int, b=None, *args, foo: List[int], **kwargs): + return a, b, args, foo, kwargs + + th = decorators.IOTypeHints.from_callable(fn) + self.assertEqual( + th.input_types, + ((int, Any, Tuple[Any, ...]), { + 'foo': List[int], 'kwargs': Dict[Any, Any] + })) + self.assertEqual(th.output_types, ((Any, ), {})) + + def test_from_callable_class(self): + class Class(object): + def __init__(self, unused_arg: int): + pass + + th = decorators.IOTypeHints.from_callable(Class) + self.assertEqual(th.input_types, ((int, ), {})) + self.assertEqual(th.output_types, ((Class, ), {})) + + def test_from_callable_method(self): + class Class(object): + def method(self, arg: T = None) -> None: + pass + + th = decorators.IOTypeHints.from_callable(Class.method) + self.assertEqual(th.input_types, ((Any, T), {})) + self.assertEqual(th.output_types, ((None, ), {})) + + th = decorators.IOTypeHints.from_callable(Class().method) + self.assertEqual(th.input_types, ((T, ), {})) + self.assertEqual(th.output_types, ((None, ), {})) + + def test_from_callable_convert_to_beam_types(self): + def fn( + a: typing.List[int], + b: str = '', + *args: typing.Tuple[T_typing], + foo: typing.List[int], + **kwargs: typing.Dict[str, str]) -> typing.Tuple[typing.Any, ...]: + return a, b, args, foo, kwargs + + th = decorators.IOTypeHints.from_callable(fn) + self.assertEqual( + th.input_types, + ((List[int], str, Tuple[T]), { + 'foo': List[int], 'kwargs': Dict[str, str] + })) + self.assertEqual(th.output_types, ((Tuple[Any, ...], ), {})) + + def test_from_callable_partial(self): + def fn(a: int) -> int: + return a + + # functools.partial objects don't have __name__ attributes by default. + fn = functools.partial(fn, 1) + th = decorators.IOTypeHints.from_callable(fn) + self.assertRegex(th.debug_str(), r'unknown') + + def test_getcallargs_forhints(self): + def fn( + a: int, + b: str = '', + *args: Tuple[T], + foo: List[int], + **kwargs: Dict[str, str]) -> Tuple[Any, ...]: + return a, b, args, foo, kwargs + + callargs = decorators.getcallargs_forhints(fn, float, foo=List[str]) + self.assertDictEqual( + callargs, + { + 'a': float, + 'b': str, + 'args': Tuple[T], + 'foo': List[str], + 'kwargs': Dict[str, str] + }) + + def test_getcallargs_forhints_default_arg(self): + # Default args are not necessarily types, so they should be ignored. 
+ def fn(a=List[int], b=None, *args, foo=(), **kwargs) -> Tuple[Any, ...]: + return a, b, args, foo, kwargs + + callargs = decorators.getcallargs_forhints(fn) + self.assertDictEqual( + callargs, + { + 'a': Any, + 'b': Any, + 'args': Tuple[Any, ...], + 'foo': Any, + 'kwargs': Dict[Any, Any] + }) + + def test_getcallargs_forhints_missing_arg(self): + def fn(a, b=None, *args, foo, **kwargs): + return a, b, args, foo, kwargs + + with self.assertRaisesRegex(decorators.TypeCheckError, "missing.*'a'"): + decorators.getcallargs_forhints(fn, foo=List[int]) + with self.assertRaisesRegex(decorators.TypeCheckError, "missing.*'foo'"): + decorators.getcallargs_forhints(fn, 5) + + def test_origin(self): + def annotated(e: str) -> str: + return e + + t = Map(annotated) + th = t.get_type_hints() + th = th.with_input_types(str) + self.assertRegex(th.debug_str(), r'with_input_types') + th = th.with_output_types(str) + self.assertRegex( + th.debug_str(), + r'(?s)with_output_types.*with_input_types.*Map.annotated') + class WithTypeHintsTest(unittest.TestCase): def test_get_type_hints_no_settings(self): @@ -231,5 +370,37 @@ def test_disable_type_annotations(self): self.assertTrue(decorators._disable_from_callable) +class DecoratorsTest(unittest.TestCase): + def test_no_annotations(self): + def fn(a: int) -> int: + return a + + with self.assertRaisesRegex(TypeCheckError, + r'requires .*int.* but got .*str'): + _ = ['a', 'b', 'c'] | Map(fn) + + # Same pipeline doesn't raise without annotations on fn. + fn = decorators.no_annotations(fn) + _ = ['a', 'b', 'c'] | Map(fn) + + +class DecoratorsTest(unittest.TestCase): + def test_no_annotations(self): + def fn(a: int) -> int: + return a + + _ = [1, 2, 3] | Map(fn) # Doesn't raise - correct types. + + with self.assertRaisesRegex(TypeCheckError, + r'requires .*int.* but got .*str'): + _ = ['a', 'b', 'c'] | Map(fn) + + @decorators.no_annotations + def fn2(a: int) -> int: + return a + + _ = ['a', 'b', 'c'] | Map(fn2) # Doesn't raise - no input type hints. + + if __name__ == '__main__': unittest.main() diff --git a/sdks/python/apache_beam/typehints/decorators_test_py3.py b/sdks/python/apache_beam/typehints/decorators_test_py3.py deleted file mode 100644 index 024a0d205b53..000000000000 --- a/sdks/python/apache_beam/typehints/decorators_test_py3.py +++ /dev/null @@ -1,215 +0,0 @@ -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. -# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
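The consolidated `decorators_test.py` above now covers PEP 484 annotations directly (previously split out into `decorators_test_py3.py`). For context, a small self-contained pipeline in the same spirit, where the annotations are picked up as Beam type hints through `IOTypeHints.from_callable`; the pipeline itself is illustrative, not part of this patch:

```python
import apache_beam as beam


def parse_id(line: str) -> int:
  # The argument and return annotations double as input/output type hints,
  # so no @with_input_types/@with_output_types decorators are required.
  return int(line.strip())


with beam.Pipeline() as pipeline:
  _ = (
      pipeline
      | beam.Create(['1', '2', '3'])
      | beam.Map(parse_id))  # inferred output element type: int
```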
-# - -"""Tests for decorators module with Python 3 syntax not supported by 2.7.""" - -# pytype: skip-file - -from __future__ import absolute_import - -import functools -import typing -import unittest - -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import - -from apache_beam import Map -from apache_beam.typehints import Any -from apache_beam.typehints import Dict -from apache_beam.typehints import List -from apache_beam.typehints import Tuple -from apache_beam.typehints import TypeCheckError -from apache_beam.typehints import TypeVariable -from apache_beam.typehints import decorators - -T = TypeVariable('T') -# Name is 'T' so it converts to a beam type with the same name. -# mypy requires that the name of the variable match, so we must ignore this. -T_typing = typing.TypeVar('T') # type: ignore - - -class IOTypeHintsTest(unittest.TestCase): - def test_from_callable(self): - def fn( - a: int, - b: str = '', - *args: Tuple[T], - foo: List[int], - **kwargs: Dict[str, str]) -> Tuple[Any, ...]: - return a, b, args, foo, kwargs - - th = decorators.IOTypeHints.from_callable(fn) - self.assertEqual( - th.input_types, ((int, str, Tuple[T]), { - 'foo': List[int], 'kwargs': Dict[str, str] - })) - self.assertEqual(th.output_types, ((Tuple[Any, ...], ), {})) - - def test_from_callable_partial_annotations(self): - def fn(a: int, b=None, *args, foo: List[int], **kwargs): - return a, b, args, foo, kwargs - - th = decorators.IOTypeHints.from_callable(fn) - self.assertEqual( - th.input_types, - ((int, Any, Tuple[Any, ...]), { - 'foo': List[int], 'kwargs': Dict[Any, Any] - })) - self.assertEqual(th.output_types, ((Any, ), {})) - - def test_from_callable_class(self): - class Class(object): - def __init__(self, unused_arg: int): - pass - - th = decorators.IOTypeHints.from_callable(Class) - self.assertEqual(th.input_types, ((int, ), {})) - self.assertEqual(th.output_types, ((Class, ), {})) - - def test_from_callable_method(self): - class Class(object): - def method(self, arg: T = None) -> None: - pass - - th = decorators.IOTypeHints.from_callable(Class.method) - self.assertEqual(th.input_types, ((Any, T), {})) - self.assertEqual(th.output_types, ((None, ), {})) - - th = decorators.IOTypeHints.from_callable(Class().method) - self.assertEqual(th.input_types, ((T, ), {})) - self.assertEqual(th.output_types, ((None, ), {})) - - def test_from_callable_convert_to_beam_types(self): - def fn( - a: typing.List[int], - b: str = '', - *args: typing.Tuple[T_typing], - foo: typing.List[int], - **kwargs: typing.Dict[str, str]) -> typing.Tuple[typing.Any, ...]: - return a, b, args, foo, kwargs - - th = decorators.IOTypeHints.from_callable(fn) - self.assertEqual( - th.input_types, - ((List[int], str, Tuple[T]), { - 'foo': List[int], 'kwargs': Dict[str, str] - })) - self.assertEqual(th.output_types, ((Tuple[Any, ...], ), {})) - - def test_from_callable_partial(self): - def fn(a: int) -> int: - return a - - # functools.partial objects don't have __name__ attributes by default. 
- fn = functools.partial(fn, 1) - th = decorators.IOTypeHints.from_callable(fn) - self.assertRegex(th.debug_str(), r'unknown') - - def test_getcallargs_forhints(self): - def fn( - a: int, - b: str = '', - *args: Tuple[T], - foo: List[int], - **kwargs: Dict[str, str]) -> Tuple[Any, ...]: - return a, b, args, foo, kwargs - - callargs = decorators.getcallargs_forhints(fn, float, foo=List[str]) - self.assertDictEqual( - callargs, - { - 'a': float, - 'b': str, - 'args': Tuple[T], - 'foo': List[str], - 'kwargs': Dict[str, str] - }) - - def test_getcallargs_forhints_default_arg(self): - # Default args are not necessarily types, so they should be ignored. - def fn(a=List[int], b=None, *args, foo=(), **kwargs) -> Tuple[Any, ...]: - return a, b, args, foo, kwargs - - callargs = decorators.getcallargs_forhints(fn) - self.assertDictEqual( - callargs, - { - 'a': Any, - 'b': Any, - 'args': Tuple[Any, ...], - 'foo': Any, - 'kwargs': Dict[Any, Any] - }) - - def test_getcallargs_forhints_missing_arg(self): - def fn(a, b=None, *args, foo, **kwargs): - return a, b, args, foo, kwargs - - with self.assertRaisesRegex(decorators.TypeCheckError, "missing.*'a'"): - decorators.getcallargs_forhints(fn, foo=List[int]) - with self.assertRaisesRegex(decorators.TypeCheckError, "missing.*'foo'"): - decorators.getcallargs_forhints(fn, 5) - - def test_origin(self): - def annotated(e: str) -> str: - return e - - t = Map(annotated) - th = t.get_type_hints() - th = th.with_input_types(str) - self.assertRegex(th.debug_str(), r'with_input_types') - th = th.with_output_types(str) - self.assertRegex( - th.debug_str(), - r'(?s)with_output_types.*with_input_types.*Map.annotated') - - -class DecoratorsTest(unittest.TestCase): - def test_no_annotations(self): - def fn(a: int) -> int: - return a - - with self.assertRaisesRegex(TypeCheckError, - r'requires .*int.* but got .*str'): - _ = ['a', 'b', 'c'] | Map(fn) - - # Same pipeline doesn't raise without annotations on fn. - fn = decorators.no_annotations(fn) - _ = ['a', 'b', 'c'] | Map(fn) - - -class DecoratorsTest(unittest.TestCase): - def test_no_annotations(self): - def fn(a: int) -> int: - return a - - _ = [1, 2, 3] | Map(fn) # Doesn't raise - correct types. - - with self.assertRaisesRegex(TypeCheckError, - r'requires .*int.* but got .*str'): - _ = ['a', 'b', 'c'] | Map(fn) - - @decorators.no_annotations - def fn2(a: int) -> int: - return a - - _ = ['a', 'b', 'c'] | Map(fn2) # Doesn't raise - no input type hints. - - -if __name__ == '__main__': - unittest.main() diff --git a/sdks/python/apache_beam/typehints/native_type_compatibility.py b/sdks/python/apache_beam/typehints/native_type_compatibility.py index 750dd9225dc3..3d69cd600c5c 100644 --- a/sdks/python/apache_beam/typehints/native_type_compatibility.py +++ b/sdks/python/apache_beam/typehints/native_type_compatibility.py @@ -19,13 +19,10 @@ # pytype: skip-file -from __future__ import absolute_import - import collections import logging import sys import typing -from builtins import next from apache_beam.typehints import typehints @@ -56,17 +53,6 @@ def _get_args(typ): except AttributeError: if isinstance(typ, typing.TypeVar): return (typ.__name__, ) - # On Python versions < 3.5.3, the Tuple and Union type from typing do - # not have an __args__ attribute, but a __tuple_params__, and a - # __union_params__ argument respectively. 
- if (3, 0, 0) <= sys.version_info[0:3] < (3, 5, 3): - if getattr(typ, '__tuple_params__', None) is not None: - if typ.__tuple_use_ellipsis__: - return typ.__tuple_params__ + (Ellipsis, ) - else: - return typ.__tuple_params__ - elif getattr(typ, '__union_params__', None) is not None: - return typ.__union_params__ return () @@ -118,7 +104,7 @@ def _match_is_exactly_iterable(user_type): return getattr(user_type, '__origin__', None) is expected_origin -def _match_is_named_tuple(user_type): +def match_is_named_tuple(user_type): return ( _safe_issubclass(user_type, typing.Tuple) and hasattr(user_type, '_field_types')) @@ -145,12 +131,6 @@ def _match_is_union(user_type): if user_type is typing.Union: return True - try: # Python 3.5.2 - if isinstance(user_type, typing.UnionMeta): - return True - except AttributeError: - pass - try: # Python 3.5.4+, or Python 2.7.14+ with typing 3.64 return user_type.__origin__ is typing.Union except AttributeError: @@ -245,7 +225,7 @@ def convert_to_beam_type(typ): # We just convert it to Any for now. # This MUST appear before the entry for the normal Tuple. _TypeMapEntry( - match=_match_is_named_tuple, arity=0, beam_type=typehints.Any), + match=match_is_named_tuple, arity=0, beam_type=typehints.Any), _TypeMapEntry( match=_match_issubclass(typing.Tuple), arity=-1, @@ -338,13 +318,6 @@ def convert_to_typing_type(typ): Raises: ValueError: The type was malformed or could not be converted. """ - - from apache_beam.coders.coders import CoderElementType - if isinstance(typ, CoderElementType): - # This represents an element that holds a coder. - # No special handling is needed here. - return typ - if isinstance(typ, typehints.TypeVariable): # This is a special case, as it's not parameterized by types. # Also, identity must be preserved through conversion (i.e. the same diff --git a/sdks/python/apache_beam/typehints/native_type_compatibility_test.py b/sdks/python/apache_beam/typehints/native_type_compatibility_test.py index 624b07c012bf..407e4d3d89c1 100644 --- a/sdks/python/apache_beam/typehints/native_type_compatibility_test.py +++ b/sdks/python/apache_beam/typehints/native_type_compatibility_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import sys import typing import unittest @@ -42,6 +40,13 @@ class _TestClass(object): pass +T = typing.TypeVar('T') + + +class _TestGeneric(typing.Generic[T]): + pass + + class NativeTypeCompatibilityTest(unittest.TestCase): def test_convert_to_beam_type(self): test_cases = [ @@ -66,14 +71,14 @@ def test_convert_to_beam_type(self): ('test class', _TestClass, _TestClass), ('test class in list', typing.List[_TestClass], typehints.List[_TestClass]), + ('generic bare', _TestGeneric, _TestGeneric), + ('generic subscripted', _TestGeneric[int], _TestGeneric[int]), ('complex tuple', typing.Tuple[bytes, typing.List[typing.Tuple[ bytes, typing.Union[int, bytes, float]]]], typehints.Tuple[bytes, typehints.List[typehints.Tuple[ bytes, typehints.Union[int, bytes, float]]]]), - # TODO(BEAM-7713): This case seems to fail on Py3.5.2 but not 3.5.4. 
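The `native_type_compatibility` changes promote `match_is_named_tuple` to a public helper and, per the surrounding test cases, add coverage for user-defined `typing.Generic` subclasses. The module's central conversion can also be exercised directly; a short sketch mirroring the style of those test cases:

```python
import typing

from apache_beam.typehints import typehints
from apache_beam.typehints.native_type_compatibility import convert_to_beam_type

# typing hints are normalized into Beam's internal type constraints.
assert convert_to_beam_type(typing.List[int]) == typehints.List[int]
assert convert_to_beam_type(typing.Optional[str]) == typehints.Optional[str]
assert convert_to_beam_type(
    typing.Tuple[int, ...]) == typehints.Tuple[int, ...]
```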
('arbitrary-length tuple', typing.Tuple[int, ...], - typehints.Tuple[int, ...]) - if sys.version_info >= (3, 5, 4) else None, + typehints.Tuple[int, ...]), ('flat alias', _TestFlatAlias, typehints.Tuple[bytes, float]), # type: ignore[misc] ('nested alias', _TestNestedAlias, typehints.List[typehints.Tuple[bytes, float]]), @@ -88,6 +93,10 @@ def test_convert_to_beam_type(self): typehints.TypeVariable('V')]), ('iterator', typing.Iterator[typing.Any], typehints.Iterator[typehints.Any]), + ('nested generic bare', typing.List[_TestGeneric], + typehints.List[_TestGeneric]), + ('nested generic subscripted', typing.List[_TestGeneric[int]], + typehints.List[_TestGeneric[int]]), ] for test_case in test_cases: diff --git a/sdks/python/apache_beam/typehints/opcodes.py b/sdks/python/apache_beam/typehints/opcodes.py index fd11b8121d5b..b8aa75ff5011 100644 --- a/sdks/python/apache_beam/typehints/opcodes.py +++ b/sdks/python/apache_beam/typehints/opcodes.py @@ -29,16 +29,12 @@ """ # pytype: skip-file -from __future__ import absolute_import - import inspect import logging import sys import types from functools import reduce -from past.builtins import unicode - from apache_beam.typehints import row_type from apache_beam.typehints import typehints from apache_beam.typehints.trivial_inference import BoundMethod @@ -160,7 +156,7 @@ def binary_true_divide(state, unused_arg): def binary_subscr(state, unused_arg): index = state.stack.pop() base = Const.unwrap(state.stack.pop()) - if base in (str, unicode): + if base is str: out = base elif (isinstance(index, Const) and isinstance(index.value, int) and isinstance(base, typehints.IndexableTypeConstraint)): @@ -266,7 +262,7 @@ def build_list(state, arg): # A Dict[Union[], Union[]] is the type of an empty dict. def build_map(state, arg): - if sys.version_info <= (2, ) or arg == 0: + if arg == 0: state.stack.append(Dict[Union[()], Union[()]]) else: state.stack[-2 * arg:] = [ @@ -308,10 +304,7 @@ def _getattr(o, name): isinstance(getattr(o, name, None), (types.MethodType, types.FunctionType))): # TODO(luke-zhu): Support other callable objects - if sys.version_info[0] == 2: - func = getattr(o, name).__func__ - else: - func = getattr(o, name) # Python 3 has no unbound methods + func = getattr(o, name) # Python 3 has no unbound methods return Const(BoundMethod(func, o)) elif isinstance(o, row_type.RowTypeConstraint): return o.get_type_for(name) @@ -384,36 +377,20 @@ def make_function(state, arg): """ # TODO(luke-zhu): Handle default argument types globals = state.f.__globals__ # Inherits globals from the current frame - if sys.version_info[0] == 2: - func_code = state.stack[-1].value - func = types.FunctionType(func_code, globals) - # argc is the number of default parameters. Ignored here. - pop_count = 1 + arg - else: # Python 3.x - func_name = state.stack[-1].value - func_code = state.stack[-2].value - pop_count = 2 - closure = None - if sys.version_info[:2] == (3, 5): - # https://docs.python.org/3.5/library/dis.html#opcode-MAKE_FUNCTION - num_default_pos_args = (arg & 0xff) - num_default_kwonly_args = ((arg >> 8) & 0xff) - num_annotations = ((arg >> 16) & 0x7fff) - pop_count += ( - num_default_pos_args + 2 * num_default_kwonly_args + num_annotations + - num_annotations > 0) - elif sys.version_info >= (3, 6): - # arg contains flags, with corresponding stack values if positive. - # https://docs.python.org/3.6/library/dis.html#opcode-MAKE_FUNCTION - pop_count += bin(arg).count('1') - if arg & 0x08: - # Convert types in Tuple constraint to a tuple of CPython cells. 
- # https://stackoverflow.com/a/44670295 - closure = tuple((lambda _: lambda: _)(t).__closure__[0] - for t in state.stack[-3].tuple_types) - - func = types.FunctionType( - func_code, globals, name=func_name, closure=closure) + func_name = state.stack[-1].value + func_code = state.stack[-2].value + pop_count = 2 + closure = None + # arg contains flags, with corresponding stack values if positive. + # https://docs.python.org/3.6/library/dis.html#opcode-MAKE_FUNCTION + pop_count += bin(arg).count('1') + if arg & 0x08: + # Convert types in Tuple constraint to a tuple of CPython cells. + # https://stackoverflow.com/a/44670295 + closure = tuple((lambda _: lambda: _)(t).__closure__[0] + for t in state.stack[-3].tuple_types) + + func = types.FunctionType(func_code, globals, name=func_name, closure=closure) assert pop_count <= len(state.stack) state.stack[-pop_count:] = [Const(func)] diff --git a/sdks/python/apache_beam/typehints/row_type.py b/sdks/python/apache_beam/typehints/row_type.py index 715297a8996e..f7098fb90676 100644 --- a/sdks/python/apache_beam/typehints/row_type.py +++ b/sdks/python/apache_beam/typehints/row_type.py @@ -17,8 +17,6 @@ # pytype: skip-file -from __future__ import absolute_import - from apache_beam.typehints import typehints @@ -40,10 +38,6 @@ def _inner_types(self): def __eq__(self, other): return type(self) == type(other) and self._fields == other._fields - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - def __hash__(self): return hash(self._fields) diff --git a/sdks/python/apache_beam/typehints/schemas.py b/sdks/python/apache_beam/typehints/schemas.py index 6a299bd136f1..3ef035e10f06 100644 --- a/sdks/python/apache_beam/typehints/schemas.py +++ b/sdks/python/apache_beam/typehints/schemas.py @@ -30,7 +30,7 @@ np.float64 <-----> DOUBLE float ------> DOUBLE bool <-----> BOOLEAN - str/unicode <-----> STRING + str <-----> STRING bytes <-----> BYTES ByteString ------> BYTES Timestamp <-----> LogicalType(urn="beam:logical_type:micros_instant:v1") @@ -47,33 +47,35 @@ :code:`nullable=True` on a Beam :code:`FieldType` is represented in Python by wrapping the type in :code:`Optional`. + +This module is intended for internal use only. Nothing defined here provides +any backwards-compatibility guarantee. 
""" # pytype: skip-file -from __future__ import absolute_import - from typing import Any from typing import ByteString from typing import Generic +from typing import List from typing import Mapping from typing import NamedTuple from typing import Optional from typing import Sequence +from typing import Tuple from typing import TypeVar from uuid import uuid4 import numpy as np -from past.builtins import unicode from apache_beam.portability.api import schema_pb2 from apache_beam.typehints import row_type from apache_beam.typehints.native_type_compatibility import _get_args from apache_beam.typehints.native_type_compatibility import _match_is_exactly_mapping -from apache_beam.typehints.native_type_compatibility import _match_is_named_tuple from apache_beam.typehints.native_type_compatibility import _match_is_optional from apache_beam.typehints.native_type_compatibility import _safe_issubclass from apache_beam.typehints.native_type_compatibility import extract_optional_type +from apache_beam.typehints.native_type_compatibility import match_is_named_tuple from apache_beam.utils import proto_utils from apache_beam.utils.timestamp import Timestamp @@ -108,7 +110,7 @@ def get_schema_by_id(self, unique_id): (np.int64, schema_pb2.INT64), (np.float32, schema_pb2.FLOAT), (np.float64, schema_pb2.DOUBLE), - (unicode, schema_pb2.STRING), + (str, schema_pb2.STRING), (bool, schema_pb2.BOOLEAN), (bytes, schema_pb2.BYTES), ) @@ -142,13 +144,13 @@ def named_fields_to_schema(names_and_types): def named_fields_from_schema( - schema): # (schema_pb2.Schema) -> typing.List[typing.Tuple[unicode, type]] + schema): # (schema_pb2.Schema) -> typing.List[typing.Tuple[str, type]] return [(field.name, typing_from_runner_api(field.type)) for field in schema.fields] def typing_to_runner_api(type_): - if _match_is_named_tuple(type_): + if match_is_named_tuple(type_): schema = None if hasattr(type_, _BEAM_SCHEMA_ID): schema = SCHEMA_REGISTRY.get_schema_by_id(getattr(type_, _BEAM_SCHEMA_ID)) @@ -279,7 +281,7 @@ def named_tuple_to_schema(named_tuple): return typing_to_runner_api(named_tuple).row_type.schema -def schema_from_element_type(element_type): # (type) -> schema_pb2.Schema +def schema_from_element_type(element_type: type) -> schema_pb2.Schema: """Get a schema for the given PCollection element_type. Returns schema as a list of (name, python_type) tuples""" @@ -287,16 +289,17 @@ def schema_from_element_type(element_type): # (type) -> schema_pb2.Schema # TODO(BEAM-10722): Make sure beam.Row generated schemas are registered and # de-duped return named_fields_to_schema(element_type._fields) - elif _match_is_named_tuple(element_type): + elif match_is_named_tuple(element_type): return named_tuple_to_schema(element_type) else: raise TypeError( - "Attempted to determine schema for unsupported type '%s'" % - element_type) + f"Could not determine schema for type hint {element_type!r}. Did you " + "mean to create a schema-aware PCollection? 
See " + "https://s.apache.org/beam-python-schemas") def named_fields_from_element_type( - element_type): # (type) -> typing.List[typing.Tuple[unicode, type]] + element_type: type) -> List[Tuple[str, type]]: return named_fields_from_schema(schema_from_element_type(element_type)) @@ -332,7 +335,7 @@ class LogicalType(Generic[LanguageT, RepresentationT, ArgT]): @classmethod def urn(cls): - # type: () -> unicode + # type: () -> str """Return the URN used to identify this logical type""" raise NotImplementedError() diff --git a/sdks/python/apache_beam/typehints/schemas_test.py b/sdks/python/apache_beam/typehints/schemas_test.py index c28f2c59308c..c662bacc4695 100644 --- a/sdks/python/apache_beam/typehints/schemas_test.py +++ b/sdks/python/apache_beam/typehints/schemas_test.py @@ -19,9 +19,8 @@ # pytype: skip-file -from __future__ import absolute_import - import itertools +import pickle import unittest from typing import ByteString from typing import List @@ -31,8 +30,6 @@ from typing import Sequence import numpy as np -from future.moves import pickle -from past.builtins import unicode from apache_beam.portability.api import schema_pb2 from apache_beam.typehints.schemas import named_tuple_from_schema @@ -58,7 +55,6 @@ def test_typing_survives_proto_roundtrip(self): np.int64, np.float32, np.float64, - unicode, bool, bytes, str, @@ -85,10 +81,8 @@ def test_typing_survives_proto_roundtrip(self): 'ComplexSchema', [ ('id', np.int64), - ('name', unicode), - ( - 'optional_map', - Optional[Mapping[unicode, Optional[np.float64]]]), + ('name', str), + ('optional_map', Optional[Mapping[str, Optional[np.float64]]]), ('optional_array', Optional[Sequence[np.float32]]), ('array_optional', Sequence[Optional[bool]]), ('timestamp', Timestamp), @@ -215,9 +209,9 @@ def test_trivial_example(self): MyCuteClass = NamedTuple( 'MyCuteClass', [ - ('name', unicode), + ('name', str), ('age', Optional[int]), - ('interests', List[unicode]), + ('interests', List[str]), ('height', float), ('blob', ByteString), ]) @@ -273,7 +267,7 @@ def test_generated_class_pickle(self): def test_user_type_annotated_with_id_after_conversion(self): MyCuteClass = NamedTuple('MyCuteClass', [ - ('name', unicode), + ('name', str), ]) self.assertFalse(hasattr(MyCuteClass, '_beam_schema_id')) diff --git a/sdks/python/apache_beam/typehints/sharded_key_type.py b/sdks/python/apache_beam/typehints/sharded_key_type.py index 2c463ef704c2..24611416d44b 100644 --- a/sdks/python/apache_beam/typehints/sharded_key_type.py +++ b/sdks/python/apache_beam/typehints/sharded_key_type.py @@ -15,19 +15,29 @@ # limitations under the License. # -# pytype: skip-file +"""Type constraint for `ShardedKey`. + +Can be used like a type-hint, for instance, + 'ShardedKeyType[int]' + 'ShardedKeyType[Tuple[T]]' -from __future__ import absolute_import +The type constraint is registered to be associated with +:class:`apache_beam.coders.coders.ShardedKeyCoder`. +Mostly for internal use. 
+""" + +# pytype: skip-file -from apache_beam.coders import typecoders -from apache_beam.coders.coders import ShardedKeyCoder +from apache_beam import coders from apache_beam.typehints import typehints -from apache_beam.typehints.typehints import match_type_variables from apache_beam.utils.sharded_key import ShardedKey -class ShardedKeyTypeConstraint(typehints.TypeConstraint): +class ShardedKeyTypeConstraint(typehints.TypeConstraint, + metaclass=typehints.GetitemConstructor): def __init__(self, key_type): + typehints.validate_composite_type_param( + key_type, error_msg_prefix='Parameter to ShardedKeyType hint') self.key_type = typehints.normalize(key_type) def _inner_types(self): @@ -58,7 +68,8 @@ def type_check(self, instance): def match_type_variables(self, concrete_type): if isinstance(concrete_type, ShardedKeyTypeConstraint): - return match_type_variables(self.key_type, concrete_type.key_type) + return typehints.match_type_variables( + self.key_type, concrete_type.key_type) return {} def __eq__(self, other): @@ -69,7 +80,9 @@ def __hash__(self): return hash(self.key_type) def __repr__(self): - return 'ShardedKey(%s)' % typehints._unified_repr(self.key_type) + return 'ShardedKey[%s]' % typehints._unified_repr(self.key_type) -typecoders.registry.register_coder(ShardedKeyTypeConstraint, ShardedKeyCoder) +ShardedKeyType = ShardedKeyTypeConstraint +coders.typecoders.registry.register_coder( + ShardedKeyType, coders.ShardedKeyCoder) diff --git a/sdks/python/apache_beam/typehints/sharded_key_type_test.py b/sdks/python/apache_beam/typehints/sharded_key_type_test.py index 7bc5143b55d7..922fcf96cc9e 100644 --- a/sdks/python/apache_beam/typehints/sharded_key_type_test.py +++ b/sdks/python/apache_beam/typehints/sharded_key_type_test.py @@ -19,30 +19,28 @@ # pytype: skip-file -from __future__ import absolute_import - from apache_beam.typehints import Tuple from apache_beam.typehints import typehints -from apache_beam.typehints.sharded_key_type import ShardedKeyTypeConstraint +from apache_beam.typehints.sharded_key_type import ShardedKeyType from apache_beam.typehints.typehints_test import TypeHintTestCase from apache_beam.utils.sharded_key import ShardedKey class ShardedKeyTypeConstraintTest(TypeHintTestCase): def test_compatibility(self): - constraint1 = ShardedKeyTypeConstraint(int) - constraint2 = ShardedKeyTypeConstraint(str) + constraint1 = ShardedKeyType[int] + constraint2 = ShardedKeyType[str] self.assertCompatible(constraint1, constraint1) self.assertCompatible(constraint2, constraint2) self.assertNotCompatible(constraint1, constraint2) def test_repr(self): - constraint = ShardedKeyTypeConstraint(int) - self.assertEqual('ShardedKey(int)', repr(constraint)) + constraint = ShardedKeyType[int] + self.assertEqual('ShardedKey[int]', repr(constraint)) def test_type_check_not_sharded_key(self): - constraint = ShardedKeyTypeConstraint(int) + constraint = ShardedKeyType[int] obj = 5 with self.assertRaises(TypeError) as e: constraint.type_check(obj) @@ -52,29 +50,38 @@ def test_type_check_not_sharded_key(self): e.exception.args[0]) def test_type_check_invalid_key_type(self): - constraint = ShardedKeyTypeConstraint(int) + constraint = ShardedKeyType[int] obj = ShardedKey(key='abc', shard_id=b'123') with self.assertRaises((TypeError, TypeError)) as e: constraint.type_check(obj) self.assertEqual( - "ShardedKey(int) type-constraint violated. The type of key in " + "ShardedKey[int] type-constraint violated. The type of key in " "'ShardedKey' is incorrect. 
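Following the new `sharded_key_type` docstring, the constraint is now spelled with getitem syntax and registered against `ShardedKeyCoder`; the updated tests around this hunk exercise exactly that. A tiny usage sketch:

```python
from apache_beam.typehints.sharded_key_type import ShardedKeyType
from apache_beam.utils.sharded_key import ShardedKey

hint = ShardedKeyType[int]      # replaces ShardedKeyTypeConstraint(int)
print(repr(hint))               # ShardedKey[int]

# type_check() returns None for a matching key type and raises TypeError
# when the key does not satisfy the constraint's parameter.
hint.type_check(ShardedKey(key=42, shard_id=b'shard-0'))
```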
Expected an instance of type 'int', " "instead received an instance of type 'str'.", e.exception.args[0]) def test_type_check_valid_simple_type(self): - constraint = ShardedKeyTypeConstraint(str) + constraint = ShardedKeyType[str] obj = ShardedKey(key='abc', shard_id=b'123') self.assertIsNone(constraint.type_check(obj)) def test_type_check_valid_composite_type(self): - constraint = ShardedKeyTypeConstraint(Tuple[int, str]) + constraint = ShardedKeyType[Tuple[int, str]] obj = ShardedKey(key=(1, 'a'), shard_id=b'123') self.assertIsNone(constraint.type_check(obj)) def test_match_type_variables(self): K = typehints.TypeVariable('K') # pylint: disable=invalid-name - constraint = ShardedKeyTypeConstraint(K) + constraint = ShardedKeyType[K] self.assertEqual({K: int}, - constraint.match_type_variables( - ShardedKeyTypeConstraint(int))) + constraint.match_type_variables(ShardedKeyType[int])) + + def test_getitem(self): + K = typehints.TypeVariable('K') # pylint: disable=invalid-name + T = typehints.TypeVariable('T') # pylint: disable=invalid-name + with self.assertRaisesRegex(TypeError, + 'Parameter to ShardedKeyType hint.*'): + _ = ShardedKeyType[K, T] + with self.assertRaisesRegex(TypeError, + 'Parameter to ShardedKeyType hint.*'): + _ = ShardedKeyType[(K, T)] diff --git a/sdks/python/apache_beam/typehints/trivial_inference.py b/sdks/python/apache_beam/typehints/trivial_inference.py index 038ba8f455c8..cc6534da6f57 100644 --- a/sdks/python/apache_beam/typehints/trivial_inference.py +++ b/sdks/python/apache_beam/typehints/trivial_inference.py @@ -21,9 +21,7 @@ """ # pytype: skip-file -from __future__ import absolute_import -from __future__ import print_function - +import builtins import collections import dis import inspect @@ -31,8 +29,6 @@ import sys import traceback import types -from builtins import object -from builtins import zip from functools import reduce from apache_beam import pvalue @@ -40,13 +36,6 @@ from apache_beam.typehints import row_type from apache_beam.typehints import typehints -# pylint: disable=wrong-import-order, wrong-import-position, ungrouped-imports -try: # Python 2 - import __builtin__ as builtins -except ImportError: # Python 3 - import builtins # type: ignore -# pylint: enable=wrong-import-order, wrong-import-position, ungrouped-imports - class TypeInferenceError(ValueError): """Error to raise when type inference failed.""" @@ -65,8 +54,6 @@ def instance_to_type(o): ]) elif t not in typehints.DISALLOWED_PRIMITIVE_TYPES: # pylint: disable=deprecated-types-field - if sys.version_info[0] == 2 and t == types.InstanceType: - return o.__class__ if t == BoundMethod: return types.MethodType return t @@ -118,10 +105,6 @@ def __init__(self, value): def __eq__(self, other): return isinstance(other, Const) and self.value == other.value - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - def __hash__(self): return hash(self.value) @@ -151,10 +134,6 @@ def __init__(self, f, local_vars=None, stack=()): def __eq__(self, other): return isinstance(other, FrameState) and self.__dict__ == other.__dict__ - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - def __hash__(self): return hash(tuple(sorted(self.__dict__.items()))) @@ -370,7 +349,6 @@ def infer_return_type_func(f, input_types, debug=False, depth=0): code = co.co_code end = len(code) pc = 0 - extended_arg = 0 # Python 2 only. 
free = None yields = set() @@ -384,29 +362,19 @@ def infer_return_type_func(f, input_types, debug=False, depth=0): # In Python 3, use dis library functions to disassemble bytecode and handle # EXTENDED_ARGs. - is_py3 = sys.version_info[0] == 3 - if is_py3: - ofs_table = {} # offset -> instruction - for instruction in dis.get_instructions(f): - ofs_table[instruction.offset] = instruction + ofs_table = {} # offset -> instruction + for instruction in dis.get_instructions(f): + ofs_table[instruction.offset] = instruction - # Python 2 - 3.5: 1 byte opcode + optional 2 byte arg (1 or 3 bytes). # Python 3.6+: 1 byte opcode + 1 byte arg (2 bytes, arg may be ignored). - if sys.version_info >= (3, 6): - inst_size = 2 - opt_arg_size = 0 - else: - inst_size = 1 - opt_arg_size = 2 + inst_size = 2 + opt_arg_size = 0 last_pc = -1 while pc < end: # pylint: disable=too-many-nested-blocks start = pc - if is_py3: - instruction = ofs_table[pc] - op = instruction.opcode - else: - op = ord(code[pc]) + instruction = ofs_table[pc] + op = instruction.opcode if debug: print('-->' if pc == last_pc else ' ', end=' ') print(repr(pc).rjust(4), end=' ') @@ -414,14 +382,8 @@ def infer_return_type_func(f, input_types, debug=False, depth=0): pc += inst_size if op >= dis.HAVE_ARGUMENT: - if is_py3: - arg = instruction.arg - else: - arg = ord(code[pc]) + ord(code[pc + 1]) * 256 + extended_arg - extended_arg = 0 + arg = instruction.arg pc += opt_arg_size - if op == dis.EXTENDED_ARG: - extended_arg = arg * 65536 if debug: print(str(arg).rjust(5), end=' ') if op in dis.hasconst: @@ -450,34 +412,11 @@ def infer_return_type_func(f, input_types, debug=False, depth=0): opname = dis.opname[op] jmp = jmp_state = None if opname.startswith('CALL_FUNCTION'): - if sys.version_info < (3, 6): - # Each keyword takes up two arguments on the stack (name and value). - standard_args = (arg & 0xFF) + 2 * (arg >> 8) - var_args = 'VAR' in opname - kw_args = 'KW' in opname - pop_count = standard_args + var_args + kw_args + 1 + if opname == 'CALL_FUNCTION': + pop_count = arg + 1 if depth <= 0: return_type = Any - elif arg >> 8: - if not var_args and not kw_args and not arg & 0xFF: - # Keywords only, maybe it's a call to Row. - if isinstance(state.stack[-pop_count], Const): - from apache_beam.pvalue import Row - if state.stack[-pop_count].value == Row: - fields = state.stack[-pop_count + 1::2] - types = state.stack[-pop_count + 2::2] - return_type = row_type.RowTypeConstraint( - zip([fld.value for fld in fields], Const.unwrap_all(types))) - else: - return_type = Any - else: - # TODO(robertwb): Handle this case. - return_type = Any elif isinstance(state.stack[-pop_count], Const): - # TODO(robertwb): Handle this better. - if var_args or kw_args: - state.stack[-1] = Any - state.stack[-var_args - kw_args] = Any return_type = infer_return_type( state.stack[-pop_count].value, state.stack[1 - pop_count:], @@ -485,57 +424,43 @@ def infer_return_type_func(f, input_types, debug=False, depth=0): depth=depth - 1) else: return_type = Any - state.stack[-pop_count:] = [return_type] - else: # Python 3.6+ - if opname == 'CALL_FUNCTION': - pop_count = arg + 1 - if depth <= 0: - return_type = Any - elif isinstance(state.stack[-pop_count], Const): - return_type = infer_return_type( - state.stack[-pop_count].value, - state.stack[1 - pop_count:], - debug=debug, - depth=depth - 1) + elif opname == 'CALL_FUNCTION_KW': + # TODO(udim): Handle keyword arguments. Requires passing them by name + # to infer_return_type. 
+ pop_count = arg + 2 + if isinstance(state.stack[-pop_count], Const): + from apache_beam.pvalue import Row + if state.stack[-pop_count].value == Row: + fields = state.stack[-1].value + return_type = row_type.RowTypeConstraint( + zip(fields, Const.unwrap_all(state.stack[-pop_count + 1:-1]))) else: return_type = Any - elif opname == 'CALL_FUNCTION_KW': - # TODO(udim): Handle keyword arguments. Requires passing them by name - # to infer_return_type. - pop_count = arg + 2 - if isinstance(state.stack[-pop_count], Const): - from apache_beam.pvalue import Row - if state.stack[-pop_count].value == Row: - fields = state.stack[-1].value - return_type = row_type.RowTypeConstraint( - zip(fields, Const.unwrap_all(state.stack[-pop_count + 1:-1]))) - else: - return_type = Any - else: - return_type = Any - elif opname == 'CALL_FUNCTION_EX': - # stack[-has_kwargs]: Map of keyword args. - # stack[-1 - has_kwargs]: Iterable of positional args. - # stack[-2 - has_kwargs]: Function to call. - has_kwargs = arg & 1 # type: int - pop_count = has_kwargs + 2 - if has_kwargs: - # TODO(udim): Unimplemented. Requires same functionality as a - # CALL_FUNCTION_KW implementation. - return_type = Any - else: - args = state.stack[-1] - _callable = state.stack[-2] - if isinstance(args, typehints.ListConstraint): - # Case where there's a single var_arg argument. - args = [args] - elif isinstance(args, typehints.TupleConstraint): - args = list(args._inner_types()) - return_type = infer_return_type( - _callable.value, args, debug=debug, depth=depth - 1) else: - raise TypeInferenceError('unable to handle %s' % opname) - state.stack[-pop_count:] = [return_type] + return_type = Any + elif opname == 'CALL_FUNCTION_EX': + # stack[-has_kwargs]: Map of keyword args. + # stack[-1 - has_kwargs]: Iterable of positional args. + # stack[-2 - has_kwargs]: Function to call. + has_kwargs = arg & 1 # type: int + pop_count = has_kwargs + 2 + if has_kwargs: + # TODO(udim): Unimplemented. Requires same functionality as a + # CALL_FUNCTION_KW implementation. + return_type = Any + else: + args = state.stack[-1] + _callable = state.stack[-2] + if isinstance(args, typehints.ListConstraint): + # Case where there's a single var_arg argument. + args = [args] + elif isinstance(args, typehints.TupleConstraint): + args = list(args._inner_types()) + return_type = infer_return_type( + _callable.value, args, debug=debug, depth=depth - 1) + else: + raise TypeInferenceError('unable to handle %s' % opname) + state.stack[-pop_count:] = [return_type] elif opname == 'CALL_METHOD': pop_count = 1 + arg # LOAD_METHOD will return a non-Const (Any) if loading from an Any. diff --git a/sdks/python/apache_beam/typehints/trivial_inference_test.py b/sdks/python/apache_beam/typehints/trivial_inference_test.py index 7f54a407d64e..e03e919d8166 100644 --- a/sdks/python/apache_beam/typehints/trivial_inference_test.py +++ b/sdks/python/apache_beam/typehints/trivial_inference_test.py @@ -19,9 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - -import sys import types import unittest @@ -39,6 +36,21 @@ def assertReturnType(self, expected, f, inputs=(), depth=5): expected, trivial_inference.infer_return_type(f, inputs, debug=True, depth=depth)) + def testBuildListUnpack(self): + # Lambda uses BUILD_LIST_UNPACK opcode in Python 3. + self.assertReturnType( + typehints.List[int], + lambda _list: [*_list, *_list, *_list], [typehints.List[int]]) + + def testBuildTupleUnpack(self): + # Lambda uses BUILD_TUPLE_UNPACK opcode in Python 3. 
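With the Python 2 and 3.5 code paths gone, `trivial_inference.infer_return_type` assumes the fixed two-byte instruction format of CPython 3.6+. A quick illustration in the spirit of the relocated `testBuildListUnpack` test (the lambda is illustrative):

```python
from apache_beam.typehints import trivial_inference
from apache_beam.typehints import typehints

# The unpacking lambda keeps its element type through inference.
inferred = trivial_inference.infer_return_type(
    lambda xs: [*xs, *xs], [typehints.List[int]])
print(inferred)  # List[int]
```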
+ # yapf: disable + self.assertReturnType( + typehints.Tuple[int, str, str], + lambda _list1, _list2: (*_list1, *_list2, *_list2), + [typehints.List[int], typehints.List[str]]) + # yapf: enable + def testIdentity(self): self.assertReturnType(int, lambda x: x, [int]) @@ -74,11 +86,9 @@ def testGetItemSlice(self): self.assertReturnType(str, lambda v: v[::-1], [str]) self.assertReturnType(typehints.Any, lambda v: v[::-1], [typehints.Any]) self.assertReturnType(typehints.Any, lambda v: v[::-1], [object]) - if sys.version_info >= (3, ): - # Test binary_subscr on a slice of a Const. On Py2.7 this will use the - # unsupported opcode SLICE+0. - test_list = ['a', 'b'] - self.assertReturnType(typehints.List[str], lambda: test_list[:], []) + # Test binary_subscr on a slice of a Const. + test_list = ['a', 'b'] + self.assertReturnType(typehints.List[str], lambda: test_list[:], []) def testUnpack(self): def reverse(a_b): @@ -104,7 +114,6 @@ def reverse(a_b): self.assertReturnType( any_tuple, reverse, [trivial_inference.Const((1, 2, 3))]) - @unittest.skipIf(sys.version_info < (3, ), 'BUILD_MAP better in Python 3') def testBuildMap(self): self.assertReturnType( typehints.Dict[typehints.Any, typehints.Any], @@ -170,11 +179,7 @@ def testTupleListComprehension(self): self.assertReturnType( typehints.List[typehints.Union[int, float]], lambda xs: [x for x in xs], [typehints.Tuple[int, float]]) - if sys.version_info[:2] == (3, 5): - # A better result requires implementing the MAKE_CLOSURE opcode. - expected = typehints.Any - else: - expected = typehints.List[typehints.Tuple[str, int]] + expected = typehints.List[typehints.Tuple[str, int]] self.assertReturnType( expected, lambda kvs: [(kvs[0], v) for v in kvs[1]], @@ -269,11 +274,7 @@ def testDict(self): def testDictComprehension(self): fields = [] - if sys.version_info >= (3, 6): - expected_type = typehints.Dict[typehints.Any, typehints.Any] - else: - # For Python 2, just ensure it doesn't crash. - expected_type = typehints.Any + expected_type = typehints.Dict[typehints.Any, typehints.Any] self.assertReturnType( expected_type, lambda row: {f: row[f] for f in fields}, [typehints.Any]) @@ -318,7 +319,6 @@ def fn(x1, x2, *unused_args): x2, _list: fn(x1, x2, *_list), [str, typehints.List[int]]) - @unittest.skipIf(sys.version_info < (3, 6), 'CALL_FUNCTION_EX is new in 3.6') def testCallFunctionEx(self): # Test when fn arguments are built using BUiLD_LIST. def fn(*args): @@ -329,7 +329,6 @@ def fn(*args): lambda x1, x2: fn(*[x1, x2]), [str, float]) - @unittest.skipIf(sys.version_info < (3, 6), 'CALL_FUNCTION_EX is new in 3.6') def testCallFunctionExKwargs(self): def fn(x1, x2, **unused_kwargs): return x1, x2 diff --git a/sdks/python/apache_beam/typehints/trivial_inference_test_py3.py b/sdks/python/apache_beam/typehints/trivial_inference_test_py3.py deleted file mode 100644 index e52a7ecb32e3..000000000000 --- a/sdks/python/apache_beam/typehints/trivial_inference_test_py3.py +++ /dev/null @@ -1,54 +0,0 @@ -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. -# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. 
You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -"""Tests for apache_beam.typehints.trivial_inference that use Python 3 syntax. -""" - -# pytype: skip-file - -from __future__ import absolute_import - -import unittest - -from apache_beam.typehints import trivial_inference -from apache_beam.typehints import typehints - - -class TrivialInferenceTest(unittest.TestCase): - def assertReturnType(self, expected, f, inputs=(), depth=5): - self.assertEqual( - expected, - trivial_inference.infer_return_type(f, inputs, debug=True, depth=depth)) - - def testBuildListUnpack(self): - # Lambda uses BUILD_LIST_UNPACK opcode in Python 3. - self.assertReturnType( - typehints.List[int], - lambda _list: [*_list, *_list, *_list], [typehints.List[int]]) - - def testBuildTupleUnpack(self): - # Lambda uses BUILD_TUPLE_UNPACK opcode in Python 3. - # yapf: disable - self.assertReturnType( - typehints.Tuple[int, str, str], - lambda _list1, _list2: (*_list1, *_list2, *_list2), - [typehints.List[int], typehints.List[str]]) - # yapf: enable - - -if __name__ == '__main__': - unittest.main() diff --git a/sdks/python/apache_beam/typehints/typecheck.py b/sdks/python/apache_beam/typehints/typecheck.py index d44d0deab768..6c4ba250ec80 100644 --- a/sdks/python/apache_beam/typehints/typecheck.py +++ b/sdks/python/apache_beam/typehints/typecheck.py @@ -22,15 +22,11 @@ # pytype: skip-file -from __future__ import absolute_import - import collections import inspect +import sys import types -from future.utils import raise_with_traceback -from past.builtins import unicode - from apache_beam import pipeline from apache_beam.pvalue import TaggedOutput from apache_beam.transforms import core @@ -93,7 +89,8 @@ def wrapper(self, method, args, kwargs): error_msg = ( 'Runtime type violation detected within ParDo(%s): ' '%s' % (self.full_label, e)) - raise_with_traceback(TypeCheckError(error_msg)) + _, _, tb = sys.exc_info() + raise TypeCheckError(error_msg).with_traceback(tb) else: return self._check_type(result) @@ -102,7 +99,7 @@ def _check_type(output): if output is None: return output - elif isinstance(output, (dict, bytes, str, unicode)): + elif isinstance(output, (dict, bytes, str)): object_type = type(output).__name__ raise TypeCheckError( 'Returning a %s from a ParDo or FlatMap is ' @@ -184,13 +181,15 @@ def type_check(type_constraint, datum, is_input): try: check_constraint(type_constraint, datum) except CompositeTypeHintError as e: - raise_with_traceback(TypeCheckError(e.args[0])) + _, _, tb = sys.exc_info() + raise TypeCheckError(e.args[0]).with_traceback(tb) except SimpleTypeHintError: error_msg = ( "According to type-hint expected %s should be of type %s. " "Instead, received '%s', an instance of type %s." 
% (datum_type, type_constraint, datum, type(datum))) - raise_with_traceback(TypeCheckError(error_msg)) + _, _, tb = sys.exc_info() + raise TypeCheckError(error_msg).with_traceback(tb) class TypeCheckCombineFn(core.CombineFn): @@ -220,7 +219,8 @@ def add_input(self, accumulator, element, *args, **kwargs): error_msg = ( 'Runtime type violation detected within %s: ' '%s' % (self._label, e)) - raise_with_traceback(TypeCheckError(error_msg)) + _, _, tb = sys.exc_info() + raise TypeCheckError(error_msg).with_traceback(tb) return self._combinefn.add_input(accumulator, element, *args, **kwargs) def merge_accumulators(self, accumulators, *args, **kwargs): @@ -239,7 +239,8 @@ def extract_output(self, accumulator, *args, **kwargs): error_msg = ( 'Runtime type violation detected within %s: ' '%s' % (self._label, e)) - raise_with_traceback(TypeCheckError(error_msg)) + _, _, tb = sys.exc_info() + raise TypeCheckError(error_msg).with_traceback(tb) return result def teardown(self, *args, **kwargs): diff --git a/sdks/python/apache_beam/typehints/typecheck_test_py3.py b/sdks/python/apache_beam/typehints/typecheck_test.py similarity index 98% rename from sdks/python/apache_beam/typehints/typecheck_test_py3.py rename to sdks/python/apache_beam/typehints/typecheck_test.py index 9e7749cc19c9..b6897b55d413 100644 --- a/sdks/python/apache_beam/typehints/typecheck_test_py3.py +++ b/sdks/python/apache_beam/typehints/typecheck_test.py @@ -23,8 +23,6 @@ # pytype: skip-file -from __future__ import absolute_import - import os import tempfile import unittest @@ -42,8 +40,6 @@ from apache_beam.typehints import decorators from apache_beam.typehints import with_input_types from apache_beam.typehints import with_output_types -# TODO(BEAM-8371): Use tempfile.TemporaryDirectory. -from apache_beam.utils.subprocess_server_test import TemporaryDirectory decorators._enable_from_callable = True @@ -106,7 +102,7 @@ def test_wrapper_pass_through(self): # We use a file to check the result because the MyDoFn instance passed is # not the same one that actually runs in the pipeline (it is serialized # here and deserialized in the worker). 
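The `typecheck.py` hunks above replace `future.utils.raise_with_traceback` with the native Python 3 idiom of re-raising through `Exception.with_traceback`. A self-contained sketch of that pattern (the exception class is a stand-in, not Beam's `TypeCheckError`):

```python
import sys


class WrappedTypeCheckError(TypeError):
  """Stand-in for the real TypeCheckError."""


def reraise_as_type_check_error(message):
  # Preserve the traceback of the exception currently being handled while
  # substituting a more descriptive exception type.
  _, _, tb = sys.exc_info()
  raise WrappedTypeCheckError(message).with_traceback(tb)


try:
  try:
    int('not a number')
  except ValueError as e:
    reraise_as_type_check_error('Runtime type violation detected: %s' % e)
except WrappedTypeCheckError as wrapped:
  assert wrapped.__traceback__ is not None  # original frames are preserved
```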
- with TemporaryDirectory() as tmp_dirname: + with tempfile.TemporaryDirectory() as tmp_dirname: path = os.path.join(tmp_dirname + "tmp_filename") dofn = MyDoFn(path) result = self.p | beam.Create([1, 2, 3]) | beam.ParDo(dofn) diff --git a/sdks/python/apache_beam/typehints/typed_pipeline_test.py b/sdks/python/apache_beam/typehints/typed_pipeline_test.py index 21a1803f40fd..b880e49fa891 100644 --- a/sdks/python/apache_beam/typehints/typed_pipeline_test.py +++ b/sdks/python/apache_beam/typehints/typed_pipeline_test.py @@ -19,15 +19,10 @@ # pytype: skip-file -from __future__ import absolute_import - import sys import typing import unittest -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import - import apache_beam as beam from apache_beam import pvalue from apache_beam import typehints @@ -44,6 +39,10 @@ class MainInputTest(unittest.TestCase): + def assertStartswith(self, msg, prefix): + self.assertTrue( + msg.startswith(prefix), '"%s" does not start with "%s"' % (msg, prefix)) + def test_bad_main_input(self): @typehints.with_input_types(str, int) def repeat(s, times): @@ -66,7 +65,7 @@ def test_non_function(self): self.assertEqual([1, 16, 256], sorted(result)) @unittest.skipIf( - sys.version_info.major >= 3 and sys.version_info < (3, 7, 0), + sys.version_info < (3, 7, 0), 'Function signatures for builtins are not available in Python 3 before ' 'version 3.7.') def test_non_function_fails(self): @@ -100,18 +99,57 @@ def process(self, element): r'requires.*int.*got.*str'): [1, 2, 3] | (beam.ParDo(MyDoFn()) | 'again' >> beam.ParDo(MyDoFn())) + def test_typed_dofn_method(self): + class MyDoFn(beam.DoFn): + def process(self, element: int) -> typehints.Tuple[str]: + return tuple(str(element)) + + result = [1, 2, 3] | beam.ParDo(MyDoFn()) + self.assertEqual(['1', '2', '3'], sorted(result)) + + with self.assertRaisesRegex(typehints.TypeCheckError, + r'requires.*int.*got.*str'): + _ = ['a', 'b', 'c'] | beam.ParDo(MyDoFn()) + + with self.assertRaisesRegex(typehints.TypeCheckError, + r'requires.*int.*got.*str'): + _ = [1, 2, 3] | (beam.ParDo(MyDoFn()) | 'again' >> beam.ParDo(MyDoFn())) + + def test_typed_dofn_method_with_class_decorators(self): + # Class decorators take precedence over PEP 484 hints. + @typehints.with_input_types(typehints.Tuple[int, int]) + @typehints.with_output_types(int) + class MyDoFn(beam.DoFn): + def process(self, element: int) -> typehints.Tuple[str]: + yield element[0] + + result = [(1, 2)] | beam.ParDo(MyDoFn()) + self.assertEqual([1], sorted(result)) + + with self.assertRaisesRegex(typehints.TypeCheckError, + r'requires.*Tuple\[int, int\].*got.*str'): + _ = ['a', 'b', 'c'] | beam.ParDo(MyDoFn()) + + with self.assertRaisesRegex(typehints.TypeCheckError, + r'requires.*Tuple\[int, int\].*got.*int'): + _ = [1, 2, 3] | (beam.ParDo(MyDoFn()) | 'again' >> beam.ParDo(MyDoFn())) + def test_typed_callable_iterable_output(self): - @typehints.with_input_types(int) - @typehints.with_output_types(typehints.Iterable[typehints.Iterable[str]]) - def do_fn(element): + # Only the outer Iterable should be stripped. + def do_fn(element: int) -> typehints.Iterable[typehints.Iterable[str]]: return [[str(element)] * 2] result = [1, 2] | beam.ParDo(do_fn) self.assertEqual([['1', '1'], ['2', '2']], sorted(result)) def test_typed_dofn_instance(self): + # Type hints applied to DoFn instance take precedence over decorators and + # process annotations. 
+ @typehints.with_input_types(typehints.Tuple[int, int]) + @typehints.with_output_types(int) class MyDoFn(beam.DoFn): - def process(self, element): + def process(self, element: typehints.Tuple[int, int]) -> \ + typehints.List[int]: return [str(element)] my_do_fn = MyDoFn().with_input_types(int).with_output_types(str) @@ -119,11 +157,34 @@ def process(self, element): result = [1, 2, 3] | beam.ParDo(my_do_fn) self.assertEqual(['1', '2', '3'], sorted(result)) - with self.assertRaises(typehints.TypeCheckError): - ['a', 'b', 'c'] | beam.ParDo(my_do_fn) + with self.assertRaisesRegex(typehints.TypeCheckError, + r'requires.*int.*got.*str'): + _ = ['a', 'b', 'c'] | beam.ParDo(my_do_fn) - with self.assertRaises(typehints.TypeCheckError): - [1, 2, 3] | (beam.ParDo(my_do_fn) | 'again' >> beam.ParDo(my_do_fn)) + with self.assertRaisesRegex(typehints.TypeCheckError, + r'requires.*int.*got.*str'): + _ = [1, 2, 3] | (beam.ParDo(my_do_fn) | 'again' >> beam.ParDo(my_do_fn)) + + def test_typed_callable_instance(self): + # Type hints applied to ParDo instance take precedence over callable + # decorators and annotations. + @typehints.with_input_types(typehints.Tuple[int, int]) + @typehints.with_output_types(typehints.Generator[int]) + def do_fn(element: typehints.Tuple[int, int]) -> typehints.Generator[str]: + yield str(element) + + pardo = beam.ParDo(do_fn).with_input_types(int).with_output_types(str) + + result = [1, 2, 3] | pardo + self.assertEqual(['1', '2', '3'], sorted(result)) + + with self.assertRaisesRegex(typehints.TypeCheckError, + r'requires.*int.*got.*str'): + _ = ['a', 'b', 'c'] | pardo + + with self.assertRaisesRegex(typehints.TypeCheckError, + r'requires.*int.*got.*str'): + _ = [1, 2, 3] | (pardo | 'again' >> pardo) def test_filter_type_hint(self): @typehints.with_input_types(int) @@ -264,6 +325,308 @@ def multi_input(pcoll_dict, additional_arg): 'strings': pcoll2, 'integers': pcoll1 } | 'fails' >> multi_input('additional_arg') + def test_typed_dofn_method_not_iterable(self): + class MyDoFn(beam.DoFn): + def process(self, element: int) -> str: + return str(element) + + with self.assertRaisesRegex(ValueError, r'str.*is not iterable'): + _ = [1, 2, 3] | beam.ParDo(MyDoFn()) + + def test_typed_dofn_method_return_none(self): + class MyDoFn(beam.DoFn): + def process(self, unused_element: int) -> None: + pass + + result = [1, 2, 3] | beam.ParDo(MyDoFn()) + self.assertListEqual([], result) + + def test_typed_dofn_method_return_optional(self): + class MyDoFn(beam.DoFn): + def process( + self, + unused_element: int) -> typehints.Optional[typehints.Iterable[int]]: + pass + + result = [1, 2, 3] | beam.ParDo(MyDoFn()) + self.assertListEqual([], result) + + def test_typed_dofn_method_return_optional_not_iterable(self): + class MyDoFn(beam.DoFn): + def process(self, unused_element: int) -> typehints.Optional[int]: + pass + + with self.assertRaisesRegex(ValueError, r'int.*is not iterable'): + _ = [1, 2, 3] | beam.ParDo(MyDoFn()) + + def test_typed_callable_not_iterable(self): + def do_fn(element: int) -> int: + return element + + with self.assertRaisesRegex(typehints.TypeCheckError, + r'int.*is not iterable'): + _ = [1, 2, 3] | beam.ParDo(do_fn) + + def test_typed_dofn_kwonly(self): + class MyDoFn(beam.DoFn): + # TODO(BEAM-5878): A kwonly argument like + # timestamp=beam.DoFn.TimestampParam would not work here. 
+ def process(self, element: int, *, side_input: str) -> \ + typehints.Generator[typehints.Optional[str]]: + yield str(element) if side_input else None + + my_do_fn = MyDoFn() + + result = [1, 2, 3] | beam.ParDo(my_do_fn, side_input='abc') + self.assertEqual(['1', '2', '3'], sorted(result)) + + with self.assertRaisesRegex(typehints.TypeCheckError, + r'requires.*str.*got.*int.*side_input'): + _ = [1, 2, 3] | beam.ParDo(my_do_fn, side_input=1) + + def test_typed_dofn_var_kwargs(self): + class MyDoFn(beam.DoFn): + def process(self, element: int, **side_inputs: typehints.Dict[str, str]) \ + -> typehints.Generator[typehints.Optional[str]]: + yield str(element) if side_inputs else None + + my_do_fn = MyDoFn() + + result = [1, 2, 3] | beam.ParDo(my_do_fn, foo='abc', bar='def') + self.assertEqual(['1', '2', '3'], sorted(result)) + + with self.assertRaisesRegex(typehints.TypeCheckError, + r'requires.*str.*got.*int.*side_inputs'): + _ = [1, 2, 3] | beam.ParDo(my_do_fn, a=1) + + def test_typed_callable_string_literals(self): + def do_fn(element: 'int') -> 'typehints.List[str]': + return [[str(element)] * 2] + + result = [1, 2] | beam.ParDo(do_fn) + self.assertEqual([['1', '1'], ['2', '2']], sorted(result)) + + def test_typed_ptransform_fn(self): + # Test that type hints are propagated to the created PTransform. + @beam.ptransform_fn + @typehints.with_input_types(int) + def MyMap(pcoll): + def fn(element: int): + yield element + + return pcoll | beam.ParDo(fn) + + self.assertListEqual([1, 2, 3], [1, 2, 3] | MyMap()) + with self.assertRaisesRegex(typehints.TypeCheckError, r'int.*got.*str'): + _ = ['a'] | MyMap() + + def test_typed_ptransform_fn_conflicting_hints(self): + # In this case, both MyMap and its contained ParDo have separate type + # checks (that disagree with each other). + @beam.ptransform_fn + @typehints.with_input_types(int) + def MyMap(pcoll): + def fn(element: float): + yield element + + return pcoll | beam.ParDo(fn) + + with self.assertRaisesRegex(typehints.TypeCheckError, + r'ParDo.*requires.*float.*got.*int'): + _ = [1, 2, 3] | MyMap() + with self.assertRaisesRegex(typehints.TypeCheckError, + r'MyMap.*expected.*int.*got.*str'): + _ = ['a'] | MyMap() + + def test_typed_dofn_string_literals(self): + class MyDoFn(beam.DoFn): + def process(self, element: 'int') -> 'typehints.List[str]': + return [[str(element)] * 2] + + result = [1, 2] | beam.ParDo(MyDoFn()) + self.assertEqual([['1', '1'], ['2', '2']], sorted(result)) + + def test_typed_map(self): + def fn(element: int) -> int: + return element * 2 + + result = [1, 2, 3] | beam.Map(fn) + self.assertEqual([2, 4, 6], sorted(result)) + + def test_typed_map_return_optional(self): + # None is a valid element value for Map. + def fn(element: int) -> typehints.Optional[int]: + if element > 1: + return element + + result = [1, 2, 3] | beam.Map(fn) + self.assertCountEqual([None, 2, 3], result) + + def test_typed_flatmap(self): + def fn(element: int) -> typehints.Iterable[int]: + yield element * 2 + + result = [1, 2, 3] | beam.FlatMap(fn) + self.assertCountEqual([2, 4, 6], result) + + def test_typed_flatmap_output_hint_not_iterable(self): + def fn(element: int) -> int: + return element * 2 + + # This is raised (originally) in strip_iterable. + with self.assertRaisesRegex(typehints.TypeCheckError, + r'int.*is not iterable'): + _ = [1, 2, 3] | beam.FlatMap(fn) + + def test_typed_flatmap_output_value_not_iterable(self): + def fn(element: int) -> typehints.Iterable[int]: + return element * 2 + + # This is raised in runners/common.py (process_outputs). 
+ with self.assertRaisesRegex(TypeError, r'int.*is not iterable'): + _ = [1, 2, 3] | beam.FlatMap(fn) + + def test_typed_flatmap_optional(self): + def fn(element: int) -> typehints.Optional[typehints.Iterable[int]]: + if element > 1: + yield element * 2 + + # Verify that the output type of fn is int and not Optional[int]. + def fn2(element: int) -> int: + return element + + result = [1, 2, 3] | beam.FlatMap(fn) | beam.Map(fn2) + self.assertCountEqual([4, 6], result) + + def test_typed_ptransform_with_no_error(self): + class StrToInt(beam.PTransform): + def expand( + self, + pcoll: beam.pvalue.PCollection[str]) -> beam.pvalue.PCollection[int]: + return pcoll | beam.Map(lambda x: int(x)) + + class IntToStr(beam.PTransform): + def expand( + self, + pcoll: beam.pvalue.PCollection[int]) -> beam.pvalue.PCollection[str]: + return pcoll | beam.Map(lambda x: str(x)) + + _ = ['1', '2', '3'] | StrToInt() | IntToStr() + + def test_typed_ptransform_with_bad_typehints(self): + class StrToInt(beam.PTransform): + def expand( + self, + pcoll: beam.pvalue.PCollection[str]) -> beam.pvalue.PCollection[int]: + return pcoll | beam.Map(lambda x: int(x)) + + class IntToStr(beam.PTransform): + def expand( + self, + pcoll: beam.pvalue.PCollection[str]) -> beam.pvalue.PCollection[str]: + return pcoll | beam.Map(lambda x: str(x)) + + with self.assertRaisesRegex(typehints.TypeCheckError, + "Input type hint violation at IntToStr: " + "expected , got "): + _ = ['1', '2', '3'] | StrToInt() | IntToStr() + + def test_typed_ptransform_with_bad_input(self): + class StrToInt(beam.PTransform): + def expand( + self, + pcoll: beam.pvalue.PCollection[str]) -> beam.pvalue.PCollection[int]: + return pcoll | beam.Map(lambda x: int(x)) + + class IntToStr(beam.PTransform): + def expand( + self, + pcoll: beam.pvalue.PCollection[int]) -> beam.pvalue.PCollection[str]: + return pcoll | beam.Map(lambda x: str(x)) + + with self.assertRaisesRegex(typehints.TypeCheckError, + "Input type hint violation at StrToInt: " + "expected , got "): + # Feed integers to a PTransform that expects strings + _ = [1, 2, 3] | StrToInt() | IntToStr() + + def test_typed_ptransform_with_partial_typehints(self): + class StrToInt(beam.PTransform): + def expand(self, pcoll) -> beam.pvalue.PCollection[int]: + return pcoll | beam.Map(lambda x: int(x)) + + class IntToStr(beam.PTransform): + def expand( + self, + pcoll: beam.pvalue.PCollection[int]) -> beam.pvalue.PCollection[str]: + return pcoll | beam.Map(lambda x: str(x)) + + # Feed integers to a PTransform that should expect strings + # but has no typehints so it expects any + _ = [1, 2, 3] | StrToInt() | IntToStr() + + def test_typed_ptransform_with_bare_wrappers(self): + class StrToInt(beam.PTransform): + def expand( + self, pcoll: beam.pvalue.PCollection) -> beam.pvalue.PCollection: + return pcoll | beam.Map(lambda x: int(x)) + + class IntToStr(beam.PTransform): + def expand( + self, + pcoll: beam.pvalue.PCollection[int]) -> beam.pvalue.PCollection[str]: + return pcoll | beam.Map(lambda x: str(x)) + + _ = [1, 2, 3] | StrToInt() | IntToStr() + + def test_typed_ptransform_with_no_typehints(self): + class StrToInt(beam.PTransform): + def expand(self, pcoll): + return pcoll | beam.Map(lambda x: int(x)) + + class IntToStr(beam.PTransform): + def expand( + self, + pcoll: beam.pvalue.PCollection[int]) -> beam.pvalue.PCollection[str]: + return pcoll | beam.Map(lambda x: str(x)) + + # Feed integers to a PTransform that should expect strings + # but has no typehints so it expects any + _ = [1, 2, 3] | StrToInt() | 
IntToStr() + + def test_typed_ptransform_with_generic_annotations(self): + T = typing.TypeVar('T') + + class IntToInt(beam.PTransform): + def expand( + self, + pcoll: beam.pvalue.PCollection[T]) -> beam.pvalue.PCollection[T]: + return pcoll | beam.Map(lambda x: x) + + class IntToStr(beam.PTransform): + def expand( + self, + pcoll: beam.pvalue.PCollection[T]) -> beam.pvalue.PCollection[str]: + return pcoll | beam.Map(lambda x: str(x)) + + _ = [1, 2, 3] | IntToInt() | IntToStr() + + def test_typed_ptransform_with_do_outputs_tuple_compiles(self): + class MyDoFn(beam.DoFn): + def process(self, element: int, *args, **kwargs): + if element % 2: + yield beam.pvalue.TaggedOutput('odd', 1) + else: + yield beam.pvalue.TaggedOutput('even', 1) + + class MyPTransform(beam.PTransform): + def expand(self, pcoll: beam.pvalue.PCollection[int]): + return pcoll | beam.ParDo(MyDoFn()).with_outputs('odd', 'even') + + # This test fails if you remove the following line from ptransform.py + # if isinstance(pvalue_, DoOutputsTuple): continue + _ = [1, 2, 3] | MyPTransform() + class NativeTypesTest(unittest.TestCase): def test_good_main_input(self): @@ -362,11 +725,10 @@ def repeat(s, *times): result = ['a', 'bb', 'c'] | beam.Map(repeat, 3) self.assertEqual(['aaa', 'bbbbbb', 'ccc'], sorted(result)) - if sys.version_info >= (3, ): - with self.assertRaisesRegex( - typehints.TypeCheckError, - r'requires Tuple\[int, ...\] but got Tuple\[str, ...\]'): - ['a', 'bb', 'c'] | beam.Map(repeat, 'z') + with self.assertRaisesRegex( + typehints.TypeCheckError, + r'requires Tuple\[int, ...\] but got Tuple\[str, ...\]'): + ['a', 'bb', 'c'] | beam.Map(repeat, 'z') def test_var_positional_only_side_input_hint(self): # Test that a lambda that accepts only a VAR_POSITIONAL can accept @@ -379,12 +741,11 @@ def test_var_positional_only_side_input_hint(self): str, int).with_output_types(typehints.Tuple[str, int])) self.assertEqual([('a', 5), ('b', 5), ('c', 5)], sorted(result)) - if sys.version_info >= (3, ): - with self.assertRaisesRegex( - typehints.TypeCheckError, - r'requires Tuple\[Union\[int, str\], ...\] but got ' - r'Tuple\[Union\[float, int\], ...\]'): - _ = [1.2] | beam.Map(lambda *_: 'a', 5).with_input_types(int, str) + with self.assertRaisesRegex( + typehints.TypeCheckError, + r'requires Tuple\[Union\[int, str\], ...\] but got ' + r'Tuple\[Union\[float, int\], ...\]'): + _ = [1.2] | beam.Map(lambda *_: 'a', 5).with_input_types(int, str) def test_var_keyword_side_input_hint(self): # Test that a lambda that accepts a VAR_KEYWORD can accept @@ -401,13 +762,12 @@ def test_var_keyword_side_input_hint(self): })], sorted(result)) - if sys.version_info >= (3, ): - with self.assertRaisesRegex( - typehints.TypeCheckError, - r'requires Dict\[str, str\] but got Dict\[str, int\]'): - _ = (['a', 'b', 'c'] - | beam.Map(lambda e, **_: 'a', kw=5).with_input_types( - str, ignored=str)) + with self.assertRaisesRegex( + typehints.TypeCheckError, + r'requires Dict\[str, str\] but got Dict\[str, int\]'): + _ = (['a', 'b', 'c'] + | beam.Map(lambda e, **_: 'a', kw=5).with_input_types( + str, ignored=str)) def test_deferred_side_inputs(self): @typehints.with_input_types(str, int) @@ -522,6 +882,111 @@ def test_pardo_wrapper_builtin_func(self): self.assertIsNone(th.input_types) self.assertIsNone(th.output_types) + def test_pardo_dofn(self): + class MyDoFn(beam.DoFn): + def process(self, element: int) -> typehints.Generator[str]: + yield str(element) + + th = beam.ParDo(MyDoFn()).get_type_hints() + self.assertEqual(th.input_types, ((int, ), {})) + 
self.assertEqual(th.output_types, ((str, ), {})) + + def test_pardo_dofn_not_iterable(self): + class MyDoFn(beam.DoFn): + def process(self, element: int) -> str: + return str(element) + + with self.assertRaisesRegex(ValueError, r'str.*is not iterable'): + _ = beam.ParDo(MyDoFn()).get_type_hints() + + def test_pardo_wrapper(self): + def do_fn(element: int) -> typehints.Iterable[str]: + return [str(element)] + + th = beam.ParDo(do_fn).get_type_hints() + self.assertEqual(th.input_types, ((int, ), {})) + self.assertEqual(th.output_types, ((str, ), {})) + + def test_pardo_wrapper_tuple(self): + # Test case for callables that return key-value pairs for GBK. The outer + # Iterable should be stripped but the inner Tuple left intact. + def do_fn(element: int) -> typehints.Iterable[typehints.Tuple[str, int]]: + return [(str(element), element)] + + th = beam.ParDo(do_fn).get_type_hints() + self.assertEqual(th.input_types, ((int, ), {})) + self.assertEqual(th.output_types, ((typehints.Tuple[str, int], ), {})) + + def test_pardo_wrapper_not_iterable(self): + def do_fn(element: int) -> str: + return str(element) + + with self.assertRaisesRegex(typehints.TypeCheckError, + r'str.*is not iterable'): + _ = beam.ParDo(do_fn).get_type_hints() + + def test_flat_map_wrapper(self): + def map_fn(element: int) -> typehints.Iterable[int]: + return [element, element + 1] + + th = beam.FlatMap(map_fn).get_type_hints() + self.assertEqual(th.input_types, ((int, ), {})) + self.assertEqual(th.output_types, ((int, ), {})) + + def test_flat_map_wrapper_optional_output(self): + # Optional should not affect output type (Nones are ignored). + def map_fn(element: int) -> typehints.Optional[typehints.Iterable[int]]: + return [element, element + 1] + + th = beam.FlatMap(map_fn).get_type_hints() + self.assertEqual(th.input_types, ((int, ), {})) + self.assertEqual(th.output_types, ((int, ), {})) + + @unittest.skip('BEAM-8662: Py3 annotations not yet supported for MapTuple') + def test_flat_map_tuple_wrapper(self): + # TODO(BEAM-8662): Also test with a fn that accepts default arguments. + def tuple_map_fn(a: str, b: str, c: str) -> typehints.Iterable[str]: + return [a, b, c] + + th = beam.FlatMapTuple(tuple_map_fn).get_type_hints() + self.assertEqual(th.input_types, ((str, str, str), {})) + self.assertEqual(th.output_types, ((str, ), {})) + + def test_map_wrapper(self): + def map_fn(unused_element: int) -> int: + return 1 + + th = beam.Map(map_fn).get_type_hints() + self.assertEqual(th.input_types, ((int, ), {})) + self.assertEqual(th.output_types, ((int, ), {})) + + def test_map_wrapper_optional_output(self): + # Optional does affect output type (Nones are NOT ignored). + def map_fn(unused_element: int) -> typehints.Optional[int]: + return 1 + + th = beam.Map(map_fn).get_type_hints() + self.assertEqual(th.input_types, ((int, ), {})) + self.assertEqual(th.output_types, ((typehints.Optional[int], ), {})) + + @unittest.skip('BEAM-8662: Py3 annotations not yet supported for MapTuple') + def test_map_tuple(self): + # TODO(BEAM-8662): Also test with a fn that accepts default arguments. 
+ def tuple_map_fn(a: str, b: str, c: str) -> str: + return a + b + c + + th = beam.MapTuple(tuple_map_fn).get_type_hints() + self.assertEqual(th.input_types, ((str, str, str), {})) + self.assertEqual(th.output_types, ((str, ), {})) + + def test_filter_wrapper(self): + def filter_fn(element: int) -> bool: + return bool(element % 2) + + th = beam.Filter(filter_fn).get_type_hints() + self.assertEqual(th.input_types, ((int, ), {})) + self.assertEqual(th.output_types, ((int, ), {})) + if __name__ == '__main__': unittest.main() diff --git a/sdks/python/apache_beam/typehints/typed_pipeline_test_py3.py b/sdks/python/apache_beam/typehints/typed_pipeline_test_py3.py deleted file mode 100644 index 6c45c5ceb8da..000000000000 --- a/sdks/python/apache_beam/typehints/typed_pipeline_test_py3.py +++ /dev/null @@ -1,535 +0,0 @@ -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. -# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -"""Unit tests for type-hint objects and decorators - Python 3 syntax specific. -""" - -# pytype: skip-file - -from __future__ import absolute_import - -import typing -import unittest - -import apache_beam as beam -from apache_beam import typehints - - -class MainInputTest(unittest.TestCase): - def assertStartswith(self, msg, prefix): - self.assertTrue( - msg.startswith(prefix), '"%s" does not start with "%s"' % (msg, prefix)) - - def test_typed_dofn_method(self): - class MyDoFn(beam.DoFn): - def process(self, element: int) -> typehints.Tuple[str]: - return tuple(str(element)) - - result = [1, 2, 3] | beam.ParDo(MyDoFn()) - self.assertEqual(['1', '2', '3'], sorted(result)) - - with self.assertRaisesRegex(typehints.TypeCheckError, - r'requires.*int.*got.*str'): - _ = ['a', 'b', 'c'] | beam.ParDo(MyDoFn()) - - with self.assertRaisesRegex(typehints.TypeCheckError, - r'requires.*int.*got.*str'): - _ = [1, 2, 3] | (beam.ParDo(MyDoFn()) | 'again' >> beam.ParDo(MyDoFn())) - - def test_typed_dofn_method_with_class_decorators(self): - # Class decorators take precedence over PEP 484 hints. - @typehints.with_input_types(typehints.Tuple[int, int]) - @typehints.with_output_types(int) - class MyDoFn(beam.DoFn): - def process(self, element: int) -> typehints.Tuple[str]: - yield element[0] - - result = [(1, 2)] | beam.ParDo(MyDoFn()) - self.assertEqual([1], sorted(result)) - - with self.assertRaisesRegex(typehints.TypeCheckError, - r'requires.*Tuple\[int, int\].*got.*str'): - _ = ['a', 'b', 'c'] | beam.ParDo(MyDoFn()) - - with self.assertRaisesRegex(typehints.TypeCheckError, - r'requires.*Tuple\[int, int\].*got.*int'): - _ = [1, 2, 3] | (beam.ParDo(MyDoFn()) | 'again' >> beam.ParDo(MyDoFn())) - - def test_typed_dofn_instance(self): - # Type hints applied to DoFn instance take precedence over decorators and - # process annotations. 
- @typehints.with_input_types(typehints.Tuple[int, int]) - @typehints.with_output_types(int) - class MyDoFn(beam.DoFn): - def process(self, element: typehints.Tuple[int, int]) -> \ - typehints.List[int]: - return [str(element)] - - my_do_fn = MyDoFn().with_input_types(int).with_output_types(str) - - result = [1, 2, 3] | beam.ParDo(my_do_fn) - self.assertEqual(['1', '2', '3'], sorted(result)) - - with self.assertRaisesRegex(typehints.TypeCheckError, - r'requires.*int.*got.*str'): - _ = ['a', 'b', 'c'] | beam.ParDo(my_do_fn) - - with self.assertRaisesRegex(typehints.TypeCheckError, - r'requires.*int.*got.*str'): - _ = [1, 2, 3] | (beam.ParDo(my_do_fn) | 'again' >> beam.ParDo(my_do_fn)) - - def test_typed_callable_instance(self): - # Type hints applied to ParDo instance take precedence over callable - # decorators and annotations. - @typehints.with_input_types(typehints.Tuple[int, int]) - @typehints.with_output_types(typehints.Generator[int]) - def do_fn(element: typehints.Tuple[int, int]) -> typehints.Generator[str]: - yield str(element) - - pardo = beam.ParDo(do_fn).with_input_types(int).with_output_types(str) - - result = [1, 2, 3] | pardo - self.assertEqual(['1', '2', '3'], sorted(result)) - - with self.assertRaisesRegex(typehints.TypeCheckError, - r'requires.*int.*got.*str'): - _ = ['a', 'b', 'c'] | pardo - - with self.assertRaisesRegex(typehints.TypeCheckError, - r'requires.*int.*got.*str'): - _ = [1, 2, 3] | (pardo | 'again' >> pardo) - - def test_typed_callable_iterable_output(self): - # Only the outer Iterable should be stripped. - def do_fn(element: int) -> typehints.Iterable[typehints.Iterable[str]]: - return [[str(element)] * 2] - - result = [1, 2] | beam.ParDo(do_fn) - self.assertEqual([['1', '1'], ['2', '2']], sorted(result)) - - def test_typed_dofn_method_not_iterable(self): - class MyDoFn(beam.DoFn): - def process(self, element: int) -> str: - return str(element) - - with self.assertRaisesRegex(ValueError, r'str.*is not iterable'): - _ = [1, 2, 3] | beam.ParDo(MyDoFn()) - - def test_typed_dofn_method_return_none(self): - class MyDoFn(beam.DoFn): - def process(self, unused_element: int) -> None: - pass - - result = [1, 2, 3] | beam.ParDo(MyDoFn()) - self.assertListEqual([], result) - - def test_typed_dofn_method_return_optional(self): - class MyDoFn(beam.DoFn): - def process( - self, - unused_element: int) -> typehints.Optional[typehints.Iterable[int]]: - pass - - result = [1, 2, 3] | beam.ParDo(MyDoFn()) - self.assertListEqual([], result) - - def test_typed_dofn_method_return_optional_not_iterable(self): - class MyDoFn(beam.DoFn): - def process(self, unused_element: int) -> typehints.Optional[int]: - pass - - with self.assertRaisesRegex(ValueError, r'int.*is not iterable'): - _ = [1, 2, 3] | beam.ParDo(MyDoFn()) - - def test_typed_callable_not_iterable(self): - def do_fn(element: int) -> int: - return element - - with self.assertRaisesRegex(typehints.TypeCheckError, - r'int.*is not iterable'): - _ = [1, 2, 3] | beam.ParDo(do_fn) - - def test_typed_dofn_kwonly(self): - class MyDoFn(beam.DoFn): - # TODO(BEAM-5878): A kwonly argument like - # timestamp=beam.DoFn.TimestampParam would not work here. 
- def process(self, element: int, *, side_input: str) -> \ - typehints.Generator[typehints.Optional[str]]: - yield str(element) if side_input else None - - my_do_fn = MyDoFn() - - result = [1, 2, 3] | beam.ParDo(my_do_fn, side_input='abc') - self.assertEqual(['1', '2', '3'], sorted(result)) - - with self.assertRaisesRegex(typehints.TypeCheckError, - r'requires.*str.*got.*int.*side_input'): - _ = [1, 2, 3] | beam.ParDo(my_do_fn, side_input=1) - - def test_typed_dofn_var_kwargs(self): - class MyDoFn(beam.DoFn): - def process(self, element: int, **side_inputs: typehints.Dict[str, str]) \ - -> typehints.Generator[typehints.Optional[str]]: - yield str(element) if side_inputs else None - - my_do_fn = MyDoFn() - - result = [1, 2, 3] | beam.ParDo(my_do_fn, foo='abc', bar='def') - self.assertEqual(['1', '2', '3'], sorted(result)) - - with self.assertRaisesRegex(typehints.TypeCheckError, - r'requires.*str.*got.*int.*side_inputs'): - _ = [1, 2, 3] | beam.ParDo(my_do_fn, a=1) - - def test_typed_callable_string_literals(self): - def do_fn(element: 'int') -> 'typehints.List[str]': - return [[str(element)] * 2] - - result = [1, 2] | beam.ParDo(do_fn) - self.assertEqual([['1', '1'], ['2', '2']], sorted(result)) - - def test_typed_ptransform_fn(self): - # Test that type hints are propagated to the created PTransform. - @beam.ptransform_fn - @typehints.with_input_types(int) - def MyMap(pcoll): - def fn(element: int): - yield element - - return pcoll | beam.ParDo(fn) - - self.assertListEqual([1, 2, 3], [1, 2, 3] | MyMap()) - with self.assertRaisesRegex(typehints.TypeCheckError, r'int.*got.*str'): - _ = ['a'] | MyMap() - - def test_typed_ptransform_fn_conflicting_hints(self): - # In this case, both MyMap and its contained ParDo have separate type - # checks (that disagree with each other). - @beam.ptransform_fn - @typehints.with_input_types(int) - def MyMap(pcoll): - def fn(element: float): - yield element - - return pcoll | beam.ParDo(fn) - - with self.assertRaisesRegex(typehints.TypeCheckError, - r'ParDo.*requires.*float.*got.*int'): - _ = [1, 2, 3] | MyMap() - with self.assertRaisesRegex(typehints.TypeCheckError, - r'MyMap.*expected.*int.*got.*str'): - _ = ['a'] | MyMap() - - def test_typed_dofn_string_literals(self): - class MyDoFn(beam.DoFn): - def process(self, element: 'int') -> 'typehints.List[str]': - return [[str(element)] * 2] - - result = [1, 2] | beam.ParDo(MyDoFn()) - self.assertEqual([['1', '1'], ['2', '2']], sorted(result)) - - def test_typed_map(self): - def fn(element: int) -> int: - return element * 2 - - result = [1, 2, 3] | beam.Map(fn) - self.assertEqual([2, 4, 6], sorted(result)) - - def test_typed_map_return_optional(self): - # None is a valid element value for Map. - def fn(element: int) -> typehints.Optional[int]: - if element > 1: - return element - - result = [1, 2, 3] | beam.Map(fn) - self.assertCountEqual([None, 2, 3], result) - - def test_typed_flatmap(self): - def fn(element: int) -> typehints.Iterable[int]: - yield element * 2 - - result = [1, 2, 3] | beam.FlatMap(fn) - self.assertCountEqual([2, 4, 6], result) - - def test_typed_flatmap_output_hint_not_iterable(self): - def fn(element: int) -> int: - return element * 2 - - # This is raised (originally) in strip_iterable. - with self.assertRaisesRegex(typehints.TypeCheckError, - r'int.*is not iterable'): - _ = [1, 2, 3] | beam.FlatMap(fn) - - def test_typed_flatmap_output_value_not_iterable(self): - def fn(element: int) -> typehints.Iterable[int]: - return element * 2 - - # This is raised in runners/common.py (process_outputs). 
- with self.assertRaisesRegex(TypeError, r'int.*is not iterable'): - _ = [1, 2, 3] | beam.FlatMap(fn) - - def test_typed_flatmap_optional(self): - def fn(element: int) -> typehints.Optional[typehints.Iterable[int]]: - if element > 1: - yield element * 2 - - # Verify that the output type of fn is int and not Optional[int]. - def fn2(element: int) -> int: - return element - - result = [1, 2, 3] | beam.FlatMap(fn) | beam.Map(fn2) - self.assertCountEqual([4, 6], result) - - def test_typed_ptransform_with_no_error(self): - class StrToInt(beam.PTransform): - def expand( - self, - pcoll: beam.pvalue.PCollection[str]) -> beam.pvalue.PCollection[int]: - return pcoll | beam.Map(lambda x: int(x)) - - class IntToStr(beam.PTransform): - def expand( - self, - pcoll: beam.pvalue.PCollection[int]) -> beam.pvalue.PCollection[str]: - return pcoll | beam.Map(lambda x: str(x)) - - _ = ['1', '2', '3'] | StrToInt() | IntToStr() - - def test_typed_ptransform_with_bad_typehints(self): - class StrToInt(beam.PTransform): - def expand( - self, - pcoll: beam.pvalue.PCollection[str]) -> beam.pvalue.PCollection[int]: - return pcoll | beam.Map(lambda x: int(x)) - - class IntToStr(beam.PTransform): - def expand( - self, - pcoll: beam.pvalue.PCollection[str]) -> beam.pvalue.PCollection[str]: - return pcoll | beam.Map(lambda x: str(x)) - - with self.assertRaisesRegex(typehints.TypeCheckError, - "Input type hint violation at IntToStr: " - "expected , got "): - _ = ['1', '2', '3'] | StrToInt() | IntToStr() - - def test_typed_ptransform_with_bad_input(self): - class StrToInt(beam.PTransform): - def expand( - self, - pcoll: beam.pvalue.PCollection[str]) -> beam.pvalue.PCollection[int]: - return pcoll | beam.Map(lambda x: int(x)) - - class IntToStr(beam.PTransform): - def expand( - self, - pcoll: beam.pvalue.PCollection[int]) -> beam.pvalue.PCollection[str]: - return pcoll | beam.Map(lambda x: str(x)) - - with self.assertRaisesRegex(typehints.TypeCheckError, - "Input type hint violation at StrToInt: " - "expected , got "): - # Feed integers to a PTransform that expects strings - _ = [1, 2, 3] | StrToInt() | IntToStr() - - def test_typed_ptransform_with_partial_typehints(self): - class StrToInt(beam.PTransform): - def expand(self, pcoll) -> beam.pvalue.PCollection[int]: - return pcoll | beam.Map(lambda x: int(x)) - - class IntToStr(beam.PTransform): - def expand( - self, - pcoll: beam.pvalue.PCollection[int]) -> beam.pvalue.PCollection[str]: - return pcoll | beam.Map(lambda x: str(x)) - - # Feed integers to a PTransform that should expect strings - # but has no typehints so it expects any - _ = [1, 2, 3] | StrToInt() | IntToStr() - - def test_typed_ptransform_with_bare_wrappers(self): - class StrToInt(beam.PTransform): - def expand( - self, pcoll: beam.pvalue.PCollection) -> beam.pvalue.PCollection: - return pcoll | beam.Map(lambda x: int(x)) - - class IntToStr(beam.PTransform): - def expand( - self, - pcoll: beam.pvalue.PCollection[int]) -> beam.pvalue.PCollection[str]: - return pcoll | beam.Map(lambda x: str(x)) - - _ = [1, 2, 3] | StrToInt() | IntToStr() - - def test_typed_ptransform_with_no_typehints(self): - class StrToInt(beam.PTransform): - def expand(self, pcoll): - return pcoll | beam.Map(lambda x: int(x)) - - class IntToStr(beam.PTransform): - def expand( - self, - pcoll: beam.pvalue.PCollection[int]) -> beam.pvalue.PCollection[str]: - return pcoll | beam.Map(lambda x: str(x)) - - # Feed integers to a PTransform that should expect strings - # but has no typehints so it expects any - _ = [1, 2, 3] | StrToInt() | 
IntToStr() - - def test_typed_ptransform_with_generic_annotations(self): - T = typing.TypeVar('T') - - class IntToInt(beam.PTransform): - def expand( - self, - pcoll: beam.pvalue.PCollection[T]) -> beam.pvalue.PCollection[T]: - return pcoll | beam.Map(lambda x: x) - - class IntToStr(beam.PTransform): - def expand( - self, - pcoll: beam.pvalue.PCollection[T]) -> beam.pvalue.PCollection[str]: - return pcoll | beam.Map(lambda x: str(x)) - - _ = [1, 2, 3] | IntToInt() | IntToStr() - - def test_typed_ptransform_with_do_outputs_tuple_compiles(self): - class MyDoFn(beam.DoFn): - def process(self, element: int, *args, **kwargs): - if element % 2: - yield beam.pvalue.TaggedOutput('odd', 1) - else: - yield beam.pvalue.TaggedOutput('even', 1) - - class MyPTransform(beam.PTransform): - def expand(self, pcoll: beam.pvalue.PCollection[int]): - return pcoll | beam.ParDo(MyDoFn()).with_outputs('odd', 'even') - - # This test fails if you remove the following line from ptransform.py - # if isinstance(pvalue_, DoOutputsTuple): continue - _ = [1, 2, 3] | MyPTransform() - - -class AnnotationsTest(unittest.TestCase): - def test_pardo_dofn(self): - class MyDoFn(beam.DoFn): - def process(self, element: int) -> typehints.Generator[str]: - yield str(element) - - th = beam.ParDo(MyDoFn()).get_type_hints() - self.assertEqual(th.input_types, ((int, ), {})) - self.assertEqual(th.output_types, ((str, ), {})) - - def test_pardo_dofn_not_iterable(self): - class MyDoFn(beam.DoFn): - def process(self, element: int) -> str: - return str(element) - - with self.assertRaisesRegex(ValueError, r'str.*is not iterable'): - _ = beam.ParDo(MyDoFn()).get_type_hints() - - def test_pardo_wrapper(self): - def do_fn(element: int) -> typehints.Iterable[str]: - return [str(element)] - - th = beam.ParDo(do_fn).get_type_hints() - self.assertEqual(th.input_types, ((int, ), {})) - self.assertEqual(th.output_types, ((str, ), {})) - - def test_pardo_wrapper_tuple(self): - # Test case for callables that return key-value pairs for GBK. The outer - # Iterable should be stripped but the inner Tuple left intact. - def do_fn(element: int) -> typehints.Iterable[typehints.Tuple[str, int]]: - return [(str(element), element)] - - th = beam.ParDo(do_fn).get_type_hints() - self.assertEqual(th.input_types, ((int, ), {})) - self.assertEqual(th.output_types, ((typehints.Tuple[str, int], ), {})) - - def test_pardo_wrapper_not_iterable(self): - def do_fn(element: int) -> str: - return str(element) - - with self.assertRaisesRegex(typehints.TypeCheckError, - r'str.*is not iterable'): - _ = beam.ParDo(do_fn).get_type_hints() - - def test_flat_map_wrapper(self): - def map_fn(element: int) -> typehints.Iterable[int]: - return [element, element + 1] - - th = beam.FlatMap(map_fn).get_type_hints() - self.assertEqual(th.input_types, ((int, ), {})) - self.assertEqual(th.output_types, ((int, ), {})) - - def test_flat_map_wrapper_optional_output(self): - # Optional should not affect output type (Nones are ignored). - def map_fn(element: int) -> typehints.Optional[typehints.Iterable[int]]: - return [element, element + 1] - - th = beam.FlatMap(map_fn).get_type_hints() - self.assertEqual(th.input_types, ((int, ), {})) - self.assertEqual(th.output_types, ((int, ), {})) - - @unittest.skip('BEAM-8662: Py3 annotations not yet supported for MapTuple') - def test_flat_map_tuple_wrapper(self): - # TODO(BEAM-8662): Also test with a fn that accepts default arguments. 
- def tuple_map_fn(a: str, b: str, c: str) -> typehints.Iterable[str]: - return [a, b, c] - - th = beam.FlatMapTuple(tuple_map_fn).get_type_hints() - self.assertEqual(th.input_types, ((str, str, str), {})) - self.assertEqual(th.output_types, ((str, ), {})) - - def test_map_wrapper(self): - def map_fn(unused_element: int) -> int: - return 1 - - th = beam.Map(map_fn).get_type_hints() - self.assertEqual(th.input_types, ((int, ), {})) - self.assertEqual(th.output_types, ((int, ), {})) - - def test_map_wrapper_optional_output(self): - # Optional does affect output type (Nones are NOT ignored). - def map_fn(unused_element: int) -> typehints.Optional[int]: - return 1 - - th = beam.Map(map_fn).get_type_hints() - self.assertEqual(th.input_types, ((int, ), {})) - self.assertEqual(th.output_types, ((typehints.Optional[int], ), {})) - - @unittest.skip('BEAM-8662: Py3 annotations not yet supported for MapTuple') - def test_map_tuple(self): - # TODO(BEAM-8662): Also test with a fn that accepts default arguments. - def tuple_map_fn(a: str, b: str, c: str) -> str: - return a + b + c - - th = beam.MapTuple(tuple_map_fn).get_type_hints() - self.assertEqual(th.input_types, ((str, str, str), {})) - self.assertEqual(th.output_types, ((str, ), {})) - - def test_filter_wrapper(self): - def filter_fn(element: int) -> bool: - return bool(element % 2) - - th = beam.Filter(filter_fn).get_type_hints() - self.assertEqual(th.input_types, ((int, ), {})) - self.assertEqual(th.output_types, ((int, ), {})) - - -if __name__ == '__main__': - unittest.main() diff --git a/sdks/python/apache_beam/typehints/typehints.py b/sdks/python/apache_beam/typehints/typehints.py index 3920d810338b..136ad016b167 100644 --- a/sdks/python/apache_beam/typehints/typehints.py +++ b/sdks/python/apache_beam/typehints/typehints.py @@ -65,18 +65,10 @@ # pytype: skip-file -from __future__ import absolute_import - import collections import copy import logging -import sys -import types import typing -from builtins import next -from builtins import zip - -from future.utils import with_metaclass __all__ = [ 'Any', @@ -182,10 +174,6 @@ def visit(self, visitor, visitor_arg): else: visitor(t, visitor_arg) - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - def match_type_variables(type_constraint, concrete_type): if isinstance(type_constraint, TypeConstraint): @@ -330,6 +318,19 @@ def __getitem___(self, py_type): raise NotImplementedError +def is_typing_generic(type_param): + """Determines if an object is a subscripted typing.Generic type, such as + PCollection[int]. + + Such objects are considered valid type parameters. + + Always returns false for Python versions below 3.7. + """ + if hasattr(typing, '_GenericAlias'): + return isinstance(type_param, typing._GenericAlias) + return False + + def validate_composite_type_param(type_param, error_msg_prefix): """Determines if an object is a valid type parameter to a :class:`CompositeTypeHint`. @@ -349,10 +350,8 @@ def validate_composite_type_param(type_param, error_msg_prefix): """ # Must either be a TypeConstraint instance or a basic Python type. possible_classes = [type, TypeConstraint] - if sys.version_info[0] == 2: - # Access from __dict__ to avoid py27-lint3 compatibility checker complaint. 
- possible_classes.append(types.__dict__["ClassType"]) is_not_type_constraint = ( + not is_typing_generic(type_param) and not isinstance(type_param, tuple(possible_classes)) and type_param is not None and getattr(type_param, '__module__', None) != 'typing') @@ -366,6 +365,7 @@ def validate_composite_type_param(type_param, error_msg_prefix): (error_msg_prefix, type_param, type_param.__class__.__name__)) +# TODO(BEAM-12469): Remove this function and use plain repr() instead. def _unified_repr(o): """Given an object return a qualified name for the object. @@ -379,7 +379,10 @@ def _unified_repr(o): Returns: A qualified name for the passed Python object fit for string formatting. """ - return repr(o) if isinstance(o, (TypeConstraint, type(None))) else o.__name__ + if isinstance(o, (TypeConstraint, type(None))) or not hasattr(o, '__name__'): + return repr(o) + else: + return o.__name__ def check_constraint(type_constraint, object_instance): @@ -1047,8 +1050,7 @@ def __getitem__(self, type_param): IteratorTypeConstraint = IteratorHint.IteratorTypeConstraint -class WindowedTypeConstraint(with_metaclass(GetitemConstructor, TypeConstraint) - ): # type: ignore[misc] +class WindowedTypeConstraint(TypeConstraint, metaclass=GetitemConstructor): """A type constraint for WindowedValue objects. Mostly for internal use. @@ -1104,9 +1106,9 @@ class GeneratorHint(IteratorHint): def __getitem__(self, type_params): if isinstance(type_params, tuple) and len(type_params) == 3: yield_type, send_type, return_type = type_params - if send_type is not None: + if send_type is not type(None): _LOGGER.warning('Ignoring send_type hint: %s' % send_type) - if return_type is not None: + if return_type is not type(None): _LOGGER.warning('Ignoring return_type hint: %s' % return_type) else: yield_type = type_params diff --git a/sdks/python/apache_beam/typehints/typehints_test.py b/sdks/python/apache_beam/typehints/typehints_test.py index 69fcbcc63bff..2e0658c94358 100644 --- a/sdks/python/apache_beam/typehints/typehints_test.py +++ b/sdks/python/apache_beam/typehints/typehints_test.py @@ -19,20 +19,21 @@ # pytype: skip-file -from __future__ import absolute_import - import functools import sys +import typing import unittest -from builtins import next -from builtins import range - -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import import apache_beam.typehints.typehints as typehints +from apache_beam import Map +from apache_beam import PTransform +from apache_beam.pvalue import PBegin +from apache_beam.pvalue import PCollection +from apache_beam.pvalue import PDone +from apache_beam.transforms.core import DoFn +from apache_beam.typehints import KV from apache_beam.typehints import Any -from apache_beam.typehints import Dict +from apache_beam.typehints import Iterable from apache_beam.typehints import Tuple from apache_beam.typehints import TypeCheckError from apache_beam.typehints import Union @@ -1181,13 +1182,6 @@ def test_positional_arg_hints(self): typehints.Tuple[int, typehints.Any], _positional_arg_hints(['x', 'y'], {'x': int})) - @staticmethod - def relax_for_py2(tuple_hint): - if sys.version_info >= (3, ): - return tuple_hint - else: - return Tuple[Any, ...] 
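# Editor's note on the GeneratorHint change in typehints.py above: a minimal,
# hypothetical sketch (illustrative only, not Beam code) of why the fixed
# check compares against type(None) rather than None. Subscripting
# typing.Generator normalizes None send/return parameters to NoneType, so an
# `is not None` test would always be true and log a spurious warning.
import typing

yield_t, send_t, return_t = typing.get_args(
    typing.Generator[int, None, None])  # get_args needs Python 3.8+;
                                        # older versions expose __args__.
assert send_t is type(None) and return_t is type(None)  # NoneType, not None
if send_t is not type(None):  # mirrors the fixed check; does not trigger here
    print('would warn: ignoring send_type hint')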
- def test_getcallargs_forhints(self): def func(a, b_c, *d): return a, b_c, d @@ -1197,9 +1191,7 @@ def func(a, b_c, *d): }, getcallargs_forhints(func, *[Any, Any])) self.assertEqual({ - 'a': Any, - 'b_c': Any, - 'd': self.relax_for_py2(Tuple[Union[int, str], ...]) + 'a': Any, 'b_c': Any, 'd': Tuple[Union[int, str], ...] }, getcallargs_forhints(func, *[Any, Any, str, int])) self.assertEqual({ @@ -1207,52 +1199,15 @@ def func(a, b_c, *d): }, getcallargs_forhints(func, *[int, Tuple[str, Any]])) self.assertEqual({ - 'a': Any, 'b_c': Any, 'd': self.relax_for_py2(Tuple[str, ...]) + 'a': Any, 'b_c': Any, 'd': Tuple[str, ...] }, getcallargs_forhints(func, *[Any, Any, Tuple[str, ...]])) self.assertEqual({ - 'a': Any, - 'b_c': Any, - 'd': self.relax_for_py2(Tuple[Union[Tuple[str, ...], int], ...]) + 'a': Any, 'b_c': Any, 'd': Tuple[Union[Tuple[str, ...], int], ...] }, getcallargs_forhints( func, *[Any, Any, Tuple[str, ...], int])) - @unittest.skipIf( - sys.version_info < (3, ), - 'kwargs not supported in Py2 version of this function') - def test_getcallargs_forhints_varkw(self): - def func(a, b_c, *d, **e): - return a, b_c, d, e - - self.assertEqual({ - 'a': Any, - 'b_c': Any, - 'd': Tuple[Any, ...], - 'e': Dict[str, Union[str, int]] - }, - getcallargs_forhints( - func, *[Any, Any], **{ - 'kw1': str, 'kw2': int - })) - self.assertEqual({ - 'a': Any, - 'b_c': Any, - 'd': Tuple[Any, ...], - 'e': Dict[str, Union[str, int]] - }, - getcallargs_forhints( - func, *[Any, Any], e=Dict[str, Union[int, str]])) - self.assertEqual( - { - 'a': Any, - 'b_c': Any, - 'd': Tuple[Any, ...], - 'e': Dict[str, Dict[str, Union[str, int]]] - }, - # keyword is not 'e', thus the Dict is considered a value hint. - getcallargs_forhints(func, *[Any, Any], kw1=Dict[str, Union[int, str]])) - def test_getcallargs_forhints_builtins(self): if sys.version_info < (3, 7): # Signatures for builtins are not supported in 3.5 and 3.6. 
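# Editor's note on the sys.version_info guard above: a minimal, hypothetical
# helper (not Beam code) showing why these assertions only run on Python 3.7+.
# inspect.signature() raises ValueError for builtins that carry no signature
# metadata, which is the case for several str methods on 3.5/3.6, so
# getcallargs_forhints cannot map hints onto their parameters there.
import inspect

def has_usable_signature(fn):
    """Return True if the interpreter can introspect fn's parameters."""
    try:
        inspect.signature(fn)
        return True
    except (ValueError, TypeError):
        return False

# Usage: has_usable_signature(str.upper) is expected to be True on 3.7+.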
@@ -1264,14 +1219,13 @@ def test_getcallargs_forhints_builtins(self): getcallargs_forhints(str.upper, str)) self.assertEqual({ '_': str, - '__unknown__varargs': self.relax_for_py2(Tuple[str, ...]), + '__unknown__varargs': Tuple[str, ...], '__unknown__keywords': typehints.Dict[Any, Any] }, getcallargs_forhints(str.strip, str, str)) self.assertEqual({ '_': str, - '__unknown__varargs': self.relax_for_py2( - Tuple[typehints.List[int], ...]), + '__unknown__varargs': Tuple[typehints.List[int], ...], '__unknown__keywords': typehints.Dict[Any, Any] }, getcallargs_forhints(str.join, str, typehints.List[int])) @@ -1287,6 +1241,15 @@ def test_getcallargs_forhints_builtins(self): }, getcallargs_forhints(str.join, str, typehints.List[int])) + def test_unified_repr(self): + self.assertIn('int', typehints._unified_repr(int)) + self.assertIn('PCollection', typehints._unified_repr(PCollection)) + if sys.version_info < (3, 7): + self.assertIn('PCollection', typehints._unified_repr(PCollection[int])) + else: + self.assertIn( + 'PCollection[int]', typehints._unified_repr(PCollection[int])) + class TestGetYieldedType(unittest.TestCase): def test_iterables(self): @@ -1331,5 +1294,237 @@ def test_coercion_fail(self): typehints.coerce_to_kv_type(*args) +class TestParDoAnnotations(unittest.TestCase): + def test_with_side_input(self): + class MyDoFn(DoFn): + def process(self, element: float, side_input: str) -> \ + Iterable[KV[str, float]]: + pass + + th = MyDoFn().get_type_hints() + self.assertEqual(th.input_types, ((float, str), {})) + self.assertEqual(th.output_types, ((KV[str, float], ), {})) + + def test_pep484_annotations(self): + class MyDoFn(DoFn): + def process(self, element: int) -> Iterable[str]: + pass + + th = MyDoFn().get_type_hints() + self.assertEqual(th.input_types, ((int, ), {})) + self.assertEqual(th.output_types, ((str, ), {})) + + +class TestPTransformAnnotations(unittest.TestCase): + def test_pep484_annotations(self): + class MyPTransform(PTransform): + def expand(self, pcoll: PCollection[int]) -> PCollection[str]: + return pcoll | Map(lambda num: str(num)) + + th = MyPTransform().get_type_hints() + self.assertEqual(th.input_types, ((int, ), {})) + self.assertEqual(th.output_types, ((str, ), {})) + + def test_annotations_without_input_pcollection_wrapper(self): + class MyPTransform(PTransform): + def expand(self, pcoll: int) -> PCollection[str]: + return pcoll | Map(lambda num: str(num)) + + error_str = ( + r'This input type hint will be ignored and not used for ' + r'type-checking purposes. Typically, input type hints for a ' + r'PTransform are single (or nested) types wrapped by a ' + r'PCollection, or PBegin. Got: {} instead.'.format(int)) + + with self.assertLogs(level='WARN') as log: + MyPTransform().get_type_hints() + self.assertIn(error_str, log.output[0]) + + def test_annotations_without_output_pcollection_wrapper(self): + class MyPTransform(PTransform): + def expand(self, pcoll: PCollection[int]) -> str: + return pcoll | Map(lambda num: str(num)) + + error_str = ( + r'This output type hint will be ignored and not used for ' + r'type-checking purposes. Typically, output type hints for a ' + r'PTransform are single (or nested) types wrapped by a ' + r'PCollection, PDone, or None. 
Got: {} instead.'.format(str)) + + with self.assertLogs(level='WARN') as log: + th = MyPTransform().get_type_hints() + self.assertIn(error_str, log.output[0]) + self.assertEqual(th.input_types, ((int, ), {})) + self.assertEqual(th.output_types, None) + + def test_annotations_without_input_internal_type(self): + class MyPTransform(PTransform): + def expand(self, pcoll: PCollection) -> PCollection[str]: + return pcoll | Map(lambda num: str(num)) + + th = MyPTransform().get_type_hints() + self.assertEqual(th.input_types, ((Any, ), {})) + self.assertEqual(th.output_types, ((str, ), {})) + + def test_annotations_without_output_internal_type(self): + class MyPTransform(PTransform): + def expand(self, pcoll: PCollection[int]) -> PCollection: + return pcoll | Map(lambda num: str(num)) + + th = MyPTransform().get_type_hints() + self.assertEqual(th.input_types, ((int, ), {})) + self.assertEqual(th.output_types, ((Any, ), {})) + + def test_annotations_without_any_internal_type(self): + class MyPTransform(PTransform): + def expand(self, pcoll: PCollection) -> PCollection: + return pcoll | Map(lambda num: str(num)) + + th = MyPTransform().get_type_hints() + self.assertEqual(th.input_types, ((Any, ), {})) + self.assertEqual(th.output_types, ((Any, ), {})) + + def test_annotations_without_input_typehint(self): + class MyPTransform(PTransform): + def expand(self, pcoll) -> PCollection[str]: + return pcoll | Map(lambda num: str(num)) + + th = MyPTransform().get_type_hints() + self.assertEqual(th.input_types, ((Any, ), {})) + self.assertEqual(th.output_types, ((str, ), {})) + + def test_annotations_without_output_typehint(self): + class MyPTransform(PTransform): + def expand(self, pcoll: PCollection[int]): + return pcoll | Map(lambda num: str(num)) + + th = MyPTransform().get_type_hints() + self.assertEqual(th.input_types, ((int, ), {})) + self.assertEqual(th.output_types, ((Any, ), {})) + + def test_annotations_without_any_typehints(self): + class MyPTransform(PTransform): + def expand(self, pcoll): + return pcoll | Map(lambda num: str(num)) + + th = MyPTransform().get_type_hints() + self.assertEqual(th.input_types, None) + self.assertEqual(th.output_types, None) + + def test_annotations_with_pbegin(self): + class MyPTransform(PTransform): + def expand(self, pcoll: PBegin): + return pcoll | Map(lambda num: str(num)) + + th = MyPTransform().get_type_hints() + self.assertEqual(th.input_types, ((Any, ), {})) + self.assertEqual(th.output_types, ((Any, ), {})) + + def test_annotations_with_pdone(self): + class MyPTransform(PTransform): + def expand(self, pcoll) -> PDone: + return pcoll | Map(lambda num: str(num)) + + th = MyPTransform().get_type_hints() + self.assertEqual(th.input_types, ((Any, ), {})) + self.assertEqual(th.output_types, ((Any, ), {})) + + def test_annotations_with_none_input(self): + class MyPTransform(PTransform): + def expand(self, pcoll: None) -> PCollection[str]: + return pcoll | Map(lambda num: str(num)) + + error_str = ( + r'This input type hint will be ignored and not used for ' + r'type-checking purposes. Typically, input type hints for a ' + r'PTransform are single (or nested) types wrapped by a ' + r'PCollection, or PBegin. 
Got: {} instead.'.format(None)) + + with self.assertLogs(level='WARN') as log: + th = MyPTransform().get_type_hints() + self.assertIn(error_str, log.output[0]) + self.assertEqual(th.input_types, None) + self.assertEqual(th.output_types, ((str, ), {})) + + def test_annotations_with_none_output(self): + class MyPTransform(PTransform): + def expand(self, pcoll) -> None: + return pcoll | Map(lambda num: str(num)) + + th = MyPTransform().get_type_hints() + self.assertEqual(th.input_types, ((Any, ), {})) + self.assertEqual(th.output_types, ((Any, ), {})) + + def test_annotations_with_arbitrary_output(self): + class MyPTransform(PTransform): + def expand(self, pcoll) -> str: + return pcoll | Map(lambda num: str(num)) + + th = MyPTransform().get_type_hints() + self.assertEqual(th.input_types, ((Any, ), {})) + self.assertEqual(th.output_types, None) + + def test_annotations_with_arbitrary_input_and_output(self): + class MyPTransform(PTransform): + def expand(self, pcoll: int) -> str: + return pcoll | Map(lambda num: str(num)) + + input_error_str = ( + r'This input type hint will be ignored and not used for ' + r'type-checking purposes. Typically, input type hints for a ' + r'PTransform are single (or nested) types wrapped by a ' + r'PCollection, or PBegin. Got: {} instead.'.format(int)) + + output_error_str = ( + r'This output type hint will be ignored and not used for ' + r'type-checking purposes. Typically, output type hints for a ' + r'PTransform are single (or nested) types wrapped by a ' + r'PCollection, PDone, or None. Got: {} instead.'.format(str)) + + with self.assertLogs(level='WARN') as log: + th = MyPTransform().get_type_hints() + self.assertIn(input_error_str, log.output[0]) + self.assertIn(output_error_str, log.output[1]) + self.assertEqual(th.input_types, None) + self.assertEqual(th.output_types, None) + + def test_typing_module_annotations_are_converted_to_beam_annotations(self): + class MyPTransform(PTransform): + def expand( + self, pcoll: PCollection[typing.Dict[str, str]] + ) -> PCollection[typing.Dict[str, str]]: + return pcoll + + th = MyPTransform().get_type_hints() + self.assertEqual(th.input_types, ((typehints.Dict[str, str], ), {})) + self.assertEqual(th.input_types, ((typehints.Dict[str, str], ), {})) + + def test_nested_typing_annotations_are_converted_to_beam_annotations(self): + class MyPTransform(PTransform): + def expand(self, pcoll: + PCollection[typing.Union[int, typing.Any, typing.Dict[str, float]]]) \ + -> PCollection[typing.Union[int, typing.Any, typing.Dict[str, float]]]: + return pcoll + + th = MyPTransform().get_type_hints() + self.assertEqual( + th.input_types, + ((typehints.Union[int, typehints.Any, typehints.Dict[str, + float]], ), {})) + self.assertEqual( + th.input_types, + ((typehints.Union[int, typehints.Any, typehints.Dict[str, + float]], ), {})) + + def test_mixed_annotations_are_converted_to_beam_annotations(self): + class MyPTransform(PTransform): + def expand(self, pcoll: typing.Any) -> typehints.Any: + return pcoll + + th = MyPTransform().get_type_hints() + self.assertEqual(th.input_types, ((typehints.Any, ), {})) + self.assertEqual(th.input_types, ((typehints.Any, ), {})) + + if __name__ == '__main__': unittest.main() diff --git a/sdks/python/apache_beam/typehints/typehints_test_py3.py b/sdks/python/apache_beam/typehints/typehints_test_py3.py deleted file mode 100644 index 5a3633049d92..000000000000 --- a/sdks/python/apache_beam/typehints/typehints_test_py3.py +++ /dev/null @@ -1,274 +0,0 @@ -# -# Licensed to the Apache Software Foundation (ASF) 
under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. -# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -"""Unit tests for the type-hint objects and decorators with Python 3 syntax not -supported by 2.7.""" - -# pytype: skip-file - -from __future__ import absolute_import -from __future__ import print_function - -import typing -import unittest - -import apache_beam.typehints.typehints as typehints -from apache_beam import Map -from apache_beam import PTransform -from apache_beam.pvalue import PBegin -from apache_beam.pvalue import PCollection -from apache_beam.pvalue import PDone -from apache_beam.transforms.core import DoFn -from apache_beam.typehints import KV -from apache_beam.typehints import Iterable -from apache_beam.typehints.typehints import Any - - -class TestParDoAnnotations(unittest.TestCase): - def test_with_side_input(self): - class MyDoFn(DoFn): - def process(self, element: float, side_input: str) -> \ - Iterable[KV[str, float]]: - pass - - th = MyDoFn().get_type_hints() - self.assertEqual(th.input_types, ((float, str), {})) - self.assertEqual(th.output_types, ((KV[str, float], ), {})) - - def test_pep484_annotations(self): - class MyDoFn(DoFn): - def process(self, element: int) -> Iterable[str]: - pass - - th = MyDoFn().get_type_hints() - self.assertEqual(th.input_types, ((int, ), {})) - self.assertEqual(th.output_types, ((str, ), {})) - - -class TestPTransformAnnotations(unittest.TestCase): - def test_pep484_annotations(self): - class MyPTransform(PTransform): - def expand(self, pcoll: PCollection[int]) -> PCollection[str]: - return pcoll | Map(lambda num: str(num)) - - th = MyPTransform().get_type_hints() - self.assertEqual(th.input_types, ((int, ), {})) - self.assertEqual(th.output_types, ((str, ), {})) - - def test_annotations_without_input_pcollection_wrapper(self): - class MyPTransform(PTransform): - def expand(self, pcoll: int) -> PCollection[str]: - return pcoll | Map(lambda num: str(num)) - - error_str = ( - r'This input type hint will be ignored and not used for ' - r'type-checking purposes. Typically, input type hints for a ' - r'PTransform are single (or nested) types wrapped by a ' - r'PCollection, or PBegin. Got: {} instead.'.format(int)) - - with self.assertLogs(level='WARN') as log: - MyPTransform().get_type_hints() - self.assertIn(error_str, log.output[0]) - - def test_annotations_without_output_pcollection_wrapper(self): - class MyPTransform(PTransform): - def expand(self, pcoll: PCollection[int]) -> str: - return pcoll | Map(lambda num: str(num)) - - error_str = ( - r'This output type hint will be ignored and not used for ' - r'type-checking purposes. Typically, output type hints for a ' - r'PTransform are single (or nested) types wrapped by a ' - r'PCollection, PDone, or None. 
Got: {} instead.'.format(str)) - - with self.assertLogs(level='WARN') as log: - th = MyPTransform().get_type_hints() - self.assertIn(error_str, log.output[0]) - self.assertEqual(th.input_types, ((int, ), {})) - self.assertEqual(th.output_types, None) - - def test_annotations_without_input_internal_type(self): - class MyPTransform(PTransform): - def expand(self, pcoll: PCollection) -> PCollection[str]: - return pcoll | Map(lambda num: str(num)) - - th = MyPTransform().get_type_hints() - self.assertEqual(th.input_types, ((Any, ), {})) - self.assertEqual(th.output_types, ((str, ), {})) - - def test_annotations_without_output_internal_type(self): - class MyPTransform(PTransform): - def expand(self, pcoll: PCollection[int]) -> PCollection: - return pcoll | Map(lambda num: str(num)) - - th = MyPTransform().get_type_hints() - self.assertEqual(th.input_types, ((int, ), {})) - self.assertEqual(th.output_types, ((Any, ), {})) - - def test_annotations_without_any_internal_type(self): - class MyPTransform(PTransform): - def expand(self, pcoll: PCollection) -> PCollection: - return pcoll | Map(lambda num: str(num)) - - th = MyPTransform().get_type_hints() - self.assertEqual(th.input_types, ((Any, ), {})) - self.assertEqual(th.output_types, ((Any, ), {})) - - def test_annotations_without_input_typehint(self): - class MyPTransform(PTransform): - def expand(self, pcoll) -> PCollection[str]: - return pcoll | Map(lambda num: str(num)) - - th = MyPTransform().get_type_hints() - self.assertEqual(th.input_types, ((Any, ), {})) - self.assertEqual(th.output_types, ((str, ), {})) - - def test_annotations_without_output_typehint(self): - class MyPTransform(PTransform): - def expand(self, pcoll: PCollection[int]): - return pcoll | Map(lambda num: str(num)) - - th = MyPTransform().get_type_hints() - self.assertEqual(th.input_types, ((int, ), {})) - self.assertEqual(th.output_types, ((Any, ), {})) - - def test_annotations_without_any_typehints(self): - class MyPTransform(PTransform): - def expand(self, pcoll): - return pcoll | Map(lambda num: str(num)) - - th = MyPTransform().get_type_hints() - self.assertEqual(th.input_types, None) - self.assertEqual(th.output_types, None) - - def test_annotations_with_pbegin(self): - class MyPTransform(PTransform): - def expand(self, pcoll: PBegin): - return pcoll | Map(lambda num: str(num)) - - th = MyPTransform().get_type_hints() - self.assertEqual(th.input_types, ((Any, ), {})) - self.assertEqual(th.output_types, ((Any, ), {})) - - def test_annotations_with_pdone(self): - class MyPTransform(PTransform): - def expand(self, pcoll) -> PDone: - return pcoll | Map(lambda num: str(num)) - - th = MyPTransform().get_type_hints() - self.assertEqual(th.input_types, ((Any, ), {})) - self.assertEqual(th.output_types, ((Any, ), {})) - - def test_annotations_with_none_input(self): - class MyPTransform(PTransform): - def expand(self, pcoll: None) -> PCollection[str]: - return pcoll | Map(lambda num: str(num)) - - error_str = ( - r'This input type hint will be ignored and not used for ' - r'type-checking purposes. Typically, input type hints for a ' - r'PTransform are single (or nested) types wrapped by a ' - r'PCollection, or PBegin. 
Got: {} instead.'.format(None)) - - with self.assertLogs(level='WARN') as log: - th = MyPTransform().get_type_hints() - self.assertIn(error_str, log.output[0]) - self.assertEqual(th.input_types, None) - self.assertEqual(th.output_types, ((str, ), {})) - - def test_annotations_with_none_output(self): - class MyPTransform(PTransform): - def expand(self, pcoll) -> None: - return pcoll | Map(lambda num: str(num)) - - th = MyPTransform().get_type_hints() - self.assertEqual(th.input_types, ((Any, ), {})) - self.assertEqual(th.output_types, ((Any, ), {})) - - def test_annotations_with_arbitrary_output(self): - class MyPTransform(PTransform): - def expand(self, pcoll) -> str: - return pcoll | Map(lambda num: str(num)) - - th = MyPTransform().get_type_hints() - self.assertEqual(th.input_types, ((Any, ), {})) - self.assertEqual(th.output_types, None) - - def test_annotations_with_arbitrary_input_and_output(self): - class MyPTransform(PTransform): - def expand(self, pcoll: int) -> str: - return pcoll | Map(lambda num: str(num)) - - input_error_str = ( - r'This input type hint will be ignored and not used for ' - r'type-checking purposes. Typically, input type hints for a ' - r'PTransform are single (or nested) types wrapped by a ' - r'PCollection, or PBegin. Got: {} instead.'.format(int)) - - output_error_str = ( - r'This output type hint will be ignored and not used for ' - r'type-checking purposes. Typically, output type hints for a ' - r'PTransform are single (or nested) types wrapped by a ' - r'PCollection, PDone, or None. Got: {} instead.'.format(str)) - - with self.assertLogs(level='WARN') as log: - th = MyPTransform().get_type_hints() - self.assertIn(input_error_str, log.output[0]) - self.assertIn(output_error_str, log.output[1]) - self.assertEqual(th.input_types, None) - self.assertEqual(th.output_types, None) - - def test_typing_module_annotations_are_converted_to_beam_annotations(self): - class MyPTransform(PTransform): - def expand( - self, pcoll: PCollection[typing.Dict[str, str]] - ) -> PCollection[typing.Dict[str, str]]: - return pcoll - - th = MyPTransform().get_type_hints() - self.assertEqual(th.input_types, ((typehints.Dict[str, str], ), {})) - self.assertEqual(th.input_types, ((typehints.Dict[str, str], ), {})) - - def test_nested_typing_annotations_are_converted_to_beam_annotations(self): - class MyPTransform(PTransform): - def expand(self, pcoll: - PCollection[typing.Union[int, typing.Any, typing.Dict[str, float]]]) \ - -> PCollection[typing.Union[int, typing.Any, typing.Dict[str, float]]]: - return pcoll - - th = MyPTransform().get_type_hints() - self.assertEqual( - th.input_types, - ((typehints.Union[int, typehints.Any, typehints.Dict[str, - float]], ), {})) - self.assertEqual( - th.input_types, - ((typehints.Union[int, typehints.Any, typehints.Dict[str, - float]], ), {})) - - def test_mixed_annotations_are_converted_to_beam_annotations(self): - class MyPTransform(PTransform): - def expand(self, pcoll: typing.Any) -> typehints.Any: - return pcoll - - th = MyPTransform().get_type_hints() - self.assertEqual(th.input_types, ((typehints.Any, ), {})) - self.assertEqual(th.input_types, ((typehints.Any, ), {})) - - -if __name__ == '__main__': - unittest.main() diff --git a/sdks/python/apache_beam/utils/__init__.py b/sdks/python/apache_beam/utils/__init__.py index 5bc12e7e282f..635c80f7c6b4 100644 --- a/sdks/python/apache_beam/utils/__init__.py +++ b/sdks/python/apache_beam/utils/__init__.py @@ -19,5 +19,3 @@ For internal use only; no backwards-compatibility guarantees. 
""" - -from __future__ import absolute_import diff --git a/sdks/python/apache_beam/utils/annotations.py b/sdks/python/apache_beam/utils/annotations.py index 1ded385677bc..25cfb8813d64 100644 --- a/sdks/python/apache_beam/utils/annotations.py +++ b/sdks/python/apache_beam/utils/annotations.py @@ -83,8 +83,6 @@ def exp_multiply(arg1, arg2): # pytype: skip-file -from __future__ import absolute_import - import warnings from functools import partial from functools import wraps diff --git a/sdks/python/apache_beam/utils/annotations_test.py b/sdks/python/apache_beam/utils/annotations_test.py index dce48b207a87..5e93c7ab60b2 100644 --- a/sdks/python/apache_beam/utils/annotations_test.py +++ b/sdks/python/apache_beam/utils/annotations_test.py @@ -16,8 +16,6 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest import warnings diff --git a/sdks/python/apache_beam/utils/counters.py b/sdks/python/apache_beam/utils/counters.py index 267c214b70b3..320a9cdcf024 100644 --- a/sdks/python/apache_beam/utils/counters.py +++ b/sdks/python/apache_beam/utils/counters.py @@ -17,6 +17,7 @@ # cython: profile=False # cython: overflowcheck=True +# cython: language_level=3 """Counters collect the progress of the Worker for reporting to the service. @@ -25,11 +26,7 @@ # pytype: skip-file -from __future__ import absolute_import - import threading -from builtins import hex -from builtins import object from collections import namedtuple from typing import TYPE_CHECKING from typing import Dict diff --git a/sdks/python/apache_beam/utils/counters_test.py b/sdks/python/apache_beam/utils/counters_test.py index 122877b62e52..9d3bf668c1f4 100644 --- a/sdks/python/apache_beam/utils/counters_test.py +++ b/sdks/python/apache_beam/utils/counters_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest from apache_beam.utils import counters diff --git a/sdks/python/apache_beam/utils/histogram.py b/sdks/python/apache_beam/utils/histogram.py index 93ae29270d31..13bb5c274d2b 100644 --- a/sdks/python/apache_beam/utils/histogram.py +++ b/sdks/python/apache_beam/utils/histogram.py @@ -15,9 +15,6 @@ # limitations under the License. 
# -from __future__ import absolute_import -from __future__ import division - import logging import math import threading diff --git a/sdks/python/apache_beam/utils/histogram_test.py b/sdks/python/apache_beam/utils/histogram_test.py index 93cb7aef6fd0..a688d8a5ff4b 100644 --- a/sdks/python/apache_beam/utils/histogram_test.py +++ b/sdks/python/apache_beam/utils/histogram_test.py @@ -17,9 +17,6 @@ """Unit tests for Histogram.""" -from __future__ import absolute_import -from __future__ import division - import unittest from mock import patch diff --git a/sdks/python/apache_beam/utils/interactive_utils.py b/sdks/python/apache_beam/utils/interactive_utils.py index ef3e16d0061a..74bedcd27141 100644 --- a/sdks/python/apache_beam/utils/interactive_utils.py +++ b/sdks/python/apache_beam/utils/interactive_utils.py @@ -21,8 +21,6 @@ """ # pytype: skip-file -from __future__ import absolute_import - import logging _LOGGER = logging.getLogger(__name__) diff --git a/sdks/python/apache_beam/utils/interactive_utils_test.py b/sdks/python/apache_beam/utils/interactive_utils_test.py index 13ef93f2573d..76e28ab7ee0a 100644 --- a/sdks/python/apache_beam/utils/interactive_utils_test.py +++ b/sdks/python/apache_beam/utils/interactive_utils_test.py @@ -19,21 +19,13 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest +from unittest.mock import patch from apache_beam.runners.interactive import interactive_environment as ie from apache_beam.runners.interactive.testing.mock_ipython import mock_get_ipython from apache_beam.utils.interactive_utils import is_in_ipython -# TODO(BEAM-8288): clean up the work-around of nose tests using Python2 without -# unittest.mock module. -try: - from unittest.mock import patch -except ImportError: - from mock import patch # type: ignore[misc] - def unavailable_ipython(): # ModuleNotFoundError since Py3.6 is sub class of ImportError. An example, diff --git a/sdks/python/apache_beam/utils/plugin.py b/sdks/python/apache_beam/utils/plugin.py index efac015f0d18..32ceae57009b 100644 --- a/sdks/python/apache_beam/utils/plugin.py +++ b/sdks/python/apache_beam/utils/plugin.py @@ -22,10 +22,6 @@ # pytype: skip-file -from __future__ import absolute_import - -from builtins import object - class BeamPlugin(object): """Plugin base class to be extended by dependent users such as FileSystem. 
diff --git a/sdks/python/apache_beam/utils/processes.py b/sdks/python/apache_beam/utils/processes.py index ab587aac8ba4..c7b9e240d961 100644 --- a/sdks/python/apache_beam/utils/processes.py +++ b/sdks/python/apache_beam/utils/processes.py @@ -22,8 +22,6 @@ # pytype: skip-file -from __future__ import absolute_import - import platform import subprocess import traceback diff --git a/sdks/python/apache_beam/utils/processes_test.py b/sdks/python/apache_beam/utils/processes_test.py index 2074a7918511..c2868eb2828e 100644 --- a/sdks/python/apache_beam/utils/processes_test.py +++ b/sdks/python/apache_beam/utils/processes_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import subprocess import unittest diff --git a/sdks/python/apache_beam/utils/profiler.py b/sdks/python/apache_beam/utils/profiler.py index 81bd57837ba4..de9f9434dc05 100644 --- a/sdks/python/apache_beam/utils/profiler.py +++ b/sdks/python/apache_beam/utils/profiler.py @@ -23,8 +23,6 @@ # pytype: skip-file # mypy: check-untyped-defs -from __future__ import absolute_import - import cProfile # pylint: disable=bad-python3-import import io import logging @@ -33,7 +31,6 @@ import random import tempfile import time -from builtins import object from typing import Callable from typing import Optional @@ -123,11 +120,7 @@ def __exit__(self, *args): if self.log_results: if self.enable_cpu_profiling: - try: - import StringIO # Python 2 - s = StringIO.StringIO() - except ImportError: - s = io.StringIO() + s = io.StringIO() self.stats = pstats.Stats( self.profile, stream=s).sort_stats(Profile.SORTBY) self.stats.print_stats() diff --git a/sdks/python/apache_beam/utils/proto_utils.py b/sdks/python/apache_beam/utils/proto_utils.py index 9e9ea19a2be6..3a5e020df167 100644 --- a/sdks/python/apache_beam/utils/proto_utils.py +++ b/sdks/python/apache_beam/utils/proto_utils.py @@ -19,9 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import -from __future__ import division - from typing import Type from typing import TypeVar from typing import Union @@ -37,6 +34,8 @@ TimeMessageT = TypeVar( 'TimeMessageT', duration_pb2.Duration, timestamp_pb2.Timestamp) +message_types = (message.Message, ) + @overload def pack_Any(msg): diff --git a/sdks/python/apache_beam/utils/retry.py b/sdks/python/apache_beam/utils/retry.py index 0533f5b2f18c..16f28825c5a7 100644 --- a/sdks/python/apache_beam/utils/retry.py +++ b/sdks/python/apache_beam/utils/retry.py @@ -27,19 +27,12 @@ # pytype: skip-file -from __future__ import absolute_import - import functools import logging import random import sys import time import traceback -from builtins import next -from builtins import object -from builtins import range - -from future.utils import raise_with_traceback from apache_beam.io.filesystem import BeamIOError @@ -270,7 +263,7 @@ def wrapper(*args, **kwargs): sleep_interval = next(retry_intervals) except StopIteration: # Re-raise the original exception since we finished the retries. 
- raise_with_traceback(exn, exn_traceback) + raise exn.with_traceback(exn_traceback) logger( 'Retry with exponential backoff: waiting for %s seconds before ' diff --git a/sdks/python/apache_beam/utils/retry_test.py b/sdks/python/apache_beam/utils/retry_test.py index f931fbe60238..05cc6e6a7f40 100644 --- a/sdks/python/apache_beam/utils/retry_test.py +++ b/sdks/python/apache_beam/utils/retry_test.py @@ -19,10 +19,7 @@ # pytype: skip-file -from __future__ import absolute_import - import unittest -from builtins import object from parameterized import parameterized diff --git a/sdks/python/apache_beam/utils/sentinel.py b/sdks/python/apache_beam/utils/sentinel.py index 9be2837ae6e0..c2b057b3df6d 100644 --- a/sdks/python/apache_beam/utils/sentinel.py +++ b/sdks/python/apache_beam/utils/sentinel.py @@ -17,8 +17,6 @@ # pytype: skip-file -from __future__ import absolute_import - import enum diff --git a/sdks/python/apache_beam/utils/sharded_key.py b/sdks/python/apache_beam/utils/sharded_key.py index cd30722f1930..9a03ab36bfd2 100644 --- a/sdks/python/apache_beam/utils/sharded_key.py +++ b/sdks/python/apache_beam/utils/sharded_key.py @@ -17,8 +17,6 @@ # pytype: skip-file -from __future__ import absolute_import - class ShardedKey(object): """ diff --git a/sdks/python/apache_beam/utils/shared.py b/sdks/python/apache_beam/utils/shared.py index 420c2be0c8a1..79d70279466c 100644 --- a/sdks/python/apache_beam/utils/shared.py +++ b/sdks/python/apache_beam/utils/shared.py @@ -97,10 +97,6 @@ def construct_table(): RainbowTableLookupFn(shared_handle), reverse_hash_table)) """ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import threading import uuid import weakref diff --git a/sdks/python/apache_beam/utils/shared_test.py b/sdks/python/apache_beam/utils/shared_test.py index 28ba7c8995c2..caba2e5538b6 100644 --- a/sdks/python/apache_beam/utils/shared_test.py +++ b/sdks/python/apache_beam/utils/shared_test.py @@ -17,10 +17,6 @@ """Test for Shared class.""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import gc import threading import time diff --git a/sdks/python/apache_beam/utils/subprocess_server.py b/sdks/python/apache_beam/utils/subprocess_server.py index 018a8375c424..d1466a295892 100644 --- a/sdks/python/apache_beam/utils/subprocess_server.py +++ b/sdks/python/apache_beam/utils/subprocess_server.py @@ -17,8 +17,6 @@ # pytype: skip-file -from __future__ import absolute_import - import contextlib import logging import os @@ -30,10 +28,10 @@ import tempfile import threading import time +from urllib.error import URLError +from urllib.request import urlopen import grpc -from future.moves.urllib.error import URLError -from future.moves.urllib.request import urlopen from apache_beam.version import __version__ as beam_version @@ -170,6 +168,10 @@ def start_process(self): if self._existing_service: return self._existing_service else: + if not shutil.which('java'): + raise RuntimeError( + 'Java must be installed on this system to use this ' + 'transform/runner.') return super(JavaJarServer, self).start_process() def stop_process(self): @@ -201,12 +203,18 @@ def path_to_maven_jar( ]) @classmethod - def path_to_beam_jar(cls, gradle_target, appendix=None, version=beam_version): + def path_to_beam_jar( + cls, + gradle_target, + appendix=None, + version=beam_version, + artifact_id=None): if gradle_target in cls._BEAM_SERVICES.replacements: return 
cls._BEAM_SERVICES.replacements[gradle_target] gradle_package = gradle_target.strip(':').rsplit(':', 1)[0] - artifact_id = 'beam-' + gradle_package.replace(':', '-') + if not artifact_id: + artifact_id = 'beam-' + gradle_package.replace(':', '-') project_root = os.path.sep.join( os.path.abspath(__file__).split(os.path.sep)[:-5]) local_path = os.path.join( diff --git a/sdks/python/apache_beam/utils/subprocess_server_test.py b/sdks/python/apache_beam/utils/subprocess_server_test.py index 22640bc57ed2..025eed4233a6 100644 --- a/sdks/python/apache_beam/utils/subprocess_server_test.py +++ b/sdks/python/apache_beam/utils/subprocess_server_test.py @@ -19,32 +19,16 @@ # pytype: skip-file -from __future__ import absolute_import - import os import re -import shutil import socketserver import tempfile import threading import unittest -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import - from apache_beam.utils import subprocess_server -# TODO(Py3): Use tempfile.TemporaryDirectory -class TemporaryDirectory: - def __enter__(self): - self._path = tempfile.mkdtemp() - return self._path - - def __exit__(self, *args): - shutil.rmtree(self._path, ignore_errors=True) - - class JavaJarServerTest(unittest.TestCase): def test_gradle_jar_release(self): self.assertEqual( @@ -57,6 +41,14 @@ def test_gradle_jar_release(self): 'beam-sdks-java-fake/VERSION/beam-sdks-java-fake-A-VERSION.jar', subprocess_server.JavaJarServer.path_to_beam_jar( ':sdks:java:fake:fatJar', appendix='A', version='VERSION')) + self.assertEqual( + 'https://repo.maven.apache.org/maven2/org/apache/beam/' + 'beam-sdks-java-fake/VERSION/beam-sdks-java-fake-A-VERSION.jar', + subprocess_server.JavaJarServer.path_to_beam_jar( + ':gradle:target:doesnt:matter', + appendix='A', + version='VERSION', + artifact_id='beam-sdks-java-fake')) def test_gradle_jar_dev(self): with self.assertRaisesRegex( @@ -81,6 +73,20 @@ def test_gradle_jar_dev(self): ' not found.'): subprocess_server.JavaJarServer.path_to_beam_jar( ':sdks:java:fake:fatJar', appendix='A', version='VERSION.dev') + with self.assertRaisesRegex( + Exception, + re.escape(os.path.join('sdks', + 'java', + 'fake', + 'build', + 'libs', + 'fake-artifact-id-A-VERSION-SNAPSHOT.jar')) + + ' not found.'): + subprocess_server.JavaJarServer.path_to_beam_jar( + ':sdks:java:fake:fatJar', + appendix='A', + version='VERSION.dev', + artifact_id='fake-artifact-id') def test_beam_services(self): with subprocess_server.JavaJarServer.beam_services({':some:target': 'foo'}): @@ -102,7 +108,7 @@ def handle(self): t.daemon = True t.start() - with TemporaryDirectory() as temp_dir: + with tempfile.TemporaryDirectory() as temp_dir: subprocess_server.JavaJarServer.local_jar( 'http://localhost:%s/path/to/file.jar' % port, temp_dir) with open(os.path.join(temp_dir, 'file.jar')) as fin: diff --git a/sdks/python/apache_beam/utils/thread_pool_executor.py b/sdks/python/apache_beam/utils/thread_pool_executor.py index c111b9aeff36..afe172af6c7e 100644 --- a/sdks/python/apache_beam/utils/thread_pool_executor.py +++ b/sdks/python/apache_beam/utils/thread_pool_executor.py @@ -17,18 +17,11 @@ # pytype: skip-file -from __future__ import absolute_import - -import sys +import queue import threading import weakref from concurrent.futures import _base -try: # Python3 - import queue -except Exception: # Python2 - import Queue as queue # type: ignore[no-redef] - class _WorkItem(object): def __init__(self, future, fn, args, kwargs): @@ -43,15 +36,7 @@ def run(self): try: 
self._future.set_result(self._fn(*self._fn_args, **self._fn_kwargs)) except BaseException as exc: - # Even though Python 2 futures library has #set_exection(), - # the way it generates the traceback doesn't align with - # the way in which Python 3 does it so we provide alternative - # implementations that match our test expectations. - if sys.version_info.major >= 3: - self._future.set_exception(exc) - else: - e, tb = sys.exc_info()[1:] - self._future.set_exception_info(e, tb) + self._future.set_exception(exc) class _Worker(threading.Thread): diff --git a/sdks/python/apache_beam/utils/thread_pool_executor_test.py b/sdks/python/apache_beam/utils/thread_pool_executor_test.py index b9251cad00a1..b382224f6850 100644 --- a/sdks/python/apache_beam/utils/thread_pool_executor_test.py +++ b/sdks/python/apache_beam/utils/thread_pool_executor_test.py @@ -19,17 +19,12 @@ # pytype: skip-file -from __future__ import absolute_import - import itertools import threading import time import traceback import unittest -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import - from apache_beam.utils import thread_pool_executor from apache_beam.utils.thread_pool_executor import UnboundedThreadPoolExecutor diff --git a/sdks/python/apache_beam/utils/timestamp.py b/sdks/python/apache_beam/utils/timestamp.py index 22c9f3387172..b2738e09c0c8 100644 --- a/sdks/python/apache_beam/utils/timestamp.py +++ b/sdks/python/apache_beam/utils/timestamp.py @@ -23,13 +23,8 @@ # pytype: skip-file # mypy: disallow-untyped-defs -from __future__ import absolute_import -from __future__ import division - import datetime import time -from builtins import object -from typing import Any from typing import Union from typing import overload @@ -37,7 +32,6 @@ import pytz from google.protobuf import duration_pb2 from google.protobuf import timestamp_pb2 -from past.builtins import long from apache_beam.portability import common_urns @@ -60,10 +54,10 @@ class Timestamp(object): """ def __init__(self, seconds=0, micros=0): # type: (Union[int, float], Union[int, float]) -> None - if not isinstance(seconds, (int, long, float)): + if not isinstance(seconds, (int, float)): raise TypeError( 'Cannot interpret %s %s as seconds.' % (seconds, type(seconds))) - if not isinstance(micros, (int, long, float)): + if not isinstance(micros, (int, float)): raise TypeError( 'Cannot interpret %s %s as micros.' % (micros, type(micros))) self.micros = int(seconds * 1000000) + int(micros) @@ -83,7 +77,7 @@ def of(seconds): Corresponding Timestamp object. """ - if not isinstance(seconds, (int, long, float, Timestamp)): + if not isinstance(seconds, (int, float, Timestamp)): raise TypeError( 'Cannot interpret %s %s as Timestamp.' % (seconds, type(seconds))) if isinstance(seconds, Timestamp): @@ -134,6 +128,10 @@ def from_rfc3339(cls, rfc3339): rfc3339, e)) return cls.from_utc_datetime(dt) + def seconds(self) -> int: + """Returns the timestamp in seconds.""" + return self.micros // 1000000 + def predecessor(self): # type: () -> Timestamp @@ -211,17 +209,12 @@ def __eq__(self, other): # Allow comparisons between Duration and Timestamp values. 
if isinstance(other, (Duration, Timestamp)): return self.micros == other.micros - elif isinstance(other, (int, long, float)): + elif isinstance(other, (int, float)): return self.micros == Timestamp.of(other).micros else: # Support equality with other types return NotImplemented - def __ne__(self, other): - # type: (Any) -> bool - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - def __lt__(self, other): # type: (TimestampDurationTypes) -> bool # Allow comparisons between Duration and Timestamp values. @@ -372,17 +365,12 @@ def __eq__(self, other): # Allow comparisons between Duration and Timestamp values. if isinstance(other, (Duration, Timestamp)): return self.micros == other.micros - elif isinstance(other, (int, long, float)): + elif isinstance(other, (int, float)): return self.micros == Duration.of(other).micros else: # Support equality with other types return NotImplemented - def __ne__(self, other): - # type: (Any) -> bool - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - def __lt__(self, other): # type: (TimestampDurationTypes) -> bool # Allow comparisons between Duration and Timestamp values. diff --git a/sdks/python/apache_beam/utils/timestamp_test.py b/sdks/python/apache_beam/utils/timestamp_test.py index 595083548943..6fb30d7f5b02 100644 --- a/sdks/python/apache_beam/utils/timestamp_test.py +++ b/sdks/python/apache_beam/utils/timestamp_test.py @@ -19,13 +19,9 @@ # pytype: skip-file -from __future__ import absolute_import - import datetime import unittest -# patches unittest.TestCase to be python3 compatible -import future.tests.base # pylint: disable=unused-import import pytz from google.protobuf import duration_pb2 from google.protobuf import timestamp_pb2 diff --git a/sdks/python/apache_beam/utils/urns.py b/sdks/python/apache_beam/utils/urns.py index dc069d3ef48f..c8da0da8f3f3 100644 --- a/sdks/python/apache_beam/utils/urns.py +++ b/sdks/python/apache_beam/utils/urns.py @@ -19,12 +19,9 @@ # pytype: skip-file -from __future__ import absolute_import - # TODO(BEAM-2685): Issue with dill + local classes + abc metaclass # import abc import inspect -from builtins import object from typing import TYPE_CHECKING from typing import Any from typing import Callable diff --git a/sdks/python/apache_beam/utils/windowed_value.py b/sdks/python/apache_beam/utils/windowed_value.py index 13d42714fa9c..08fca45c31c8 100644 --- a/sdks/python/apache_beam/utils/windowed_value.py +++ b/sdks/python/apache_beam/utils/windowed_value.py @@ -25,13 +25,11 @@ # editing this file as WindowedValues are created for every element for # every step in a Beam pipeline. -#cython: profile=True +# cython: profile=True +# cython: language_level=3 # pytype: skip-file -from __future__ import absolute_import - -from builtins import object from typing import TYPE_CHECKING from typing import Any from typing import List @@ -153,10 +151,6 @@ def __eq__(self, other): self.timing == other.timing and self.index == other.index and self.nonspeculative_index == other.nonspeculative_index) - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. - return not self == other - def __hash__(self): return hash(( self.is_first, @@ -254,10 +248,6 @@ def __eq__(self, other): self.value == other.value and self.windows == other.windows and self.pane_info == other.pane_info) - def __ne__(self, other): - # TODO(BEAM-5949): Needed for Python 2 compatibility. 
- return not self == other - def __hash__(self): return ((hash(self.value) & 0xFFFFFFFFFFFFFFF) + 3 * (self.timestamp_micros & 0xFFFFFFFFFFFFFF) + 7 * @@ -350,8 +340,5 @@ def __eq__(self, other): self._start_micros == other._start_micros and self._end_micros == other._end_micros) - def __ne__(self, other): - return not self == other - def __repr__(self): return '[%s, %s)' % (float(self.start), float(self.end)) diff --git a/sdks/python/apache_beam/utils/windowed_value_test.py b/sdks/python/apache_beam/utils/windowed_value_test.py index 3b95742b9433..bf4048a9bd06 100644 --- a/sdks/python/apache_beam/utils/windowed_value_test.py +++ b/sdks/python/apache_beam/utils/windowed_value_test.py @@ -19,8 +19,6 @@ # pytype: skip-file -from __future__ import absolute_import - import copy import pickle import unittest diff --git a/sdks/python/apache_beam/version.py b/sdks/python/apache_beam/version.py index 88da315df062..264029d62612 100644 --- a/sdks/python/apache_beam/version.py +++ b/sdks/python/apache_beam/version.py @@ -17,4 +17,4 @@ """Apache Beam SDK version information and utilities.""" -__version__ = '2.27.0.dev' +__version__ = '2.34.0.dev' diff --git a/sdks/python/build-requirements.txt b/sdks/python/build-requirements.txt index 415ac340ab1d..c0855478c902 100644 --- a/sdks/python/build-requirements.txt +++ b/sdks/python/build-requirements.txt @@ -14,9 +14,9 @@ # See the License for the specific language governing permissions and # limitations under the License. # -grpcio-tools==1.30.0 +grpcio-tools==1.37.0 future==0.18.2 mypy-protobuf==1.18 # Avoid https://github.com/pypa/virtualenv/issues/2006 -distlib==0.3.1 \ No newline at end of file +distlib==0.3.1 diff --git a/sdks/python/build.gradle b/sdks/python/build.gradle index 125a9931f32d..2b7581fb955e 100644 --- a/sdks/python/build.gradle +++ b/sdks/python/build.gradle @@ -110,3 +110,15 @@ task startPortableRunner { } } } + +// Run this task to validate the python environment setup for contributors +task wordCount { + description "Run the Python word count example" + dependsOn 'installGcpTest' + doLast { + exec { + executable 'sh' + args '-c', ". ${envdir}/bin/activate && python -m apache_beam.examples.wordcount --runner DirectRunner --output /tmp/output.txt" + } + } +} \ No newline at end of file diff --git a/sdks/python/conftest.py b/sdks/python/conftest.py index b0a35cf16ee4..0b8881ebb5a8 100644 --- a/sdks/python/conftest.py +++ b/sdks/python/conftest.py @@ -14,31 +14,37 @@ # See the License for the specific language governing permissions and # limitations under the License. # -"""Pytest configuration and custom hooks.""" -from __future__ import absolute_import +"""Pytest configuration and custom hooks.""" import sys from apache_beam.options import pipeline_options +from apache_beam.testing.test_pipeline import TestPipeline MAX_SUPPORTED_PYTHON_VERSION = (3, 8) + def pytest_addoption(parser): - parser.addoption('--test-pipeline-options', - help='Options to use in test pipelines. NOTE: Tests may ' - 'ignore some or all of these options.') + parser.addoption( + '--test-pipeline-options', + help='Options to use in test pipelines. NOTE: Tests may ' + 'ignore some or all of these options.') + # See pytest.ini for main collection rules. 
-collect_ignore_glob = [] -if sys.version_info < (3,): - collect_ignore_glob.append('*_py3*.py') -else: - for minor in range(sys.version_info.minor + 1, - MAX_SUPPORTED_PYTHON_VERSION[1] + 1): - collect_ignore_glob.append('*_py3%d.py' % minor) +collect_ignore_glob = [ + '*_py3%d.py' % minor for minor in range( + sys.version_info.minor + 1, MAX_SUPPORTED_PYTHON_VERSION[1] + 1) +] def pytest_configure(config): + """Saves options added in pytest_addoption for later use. + This is necessary since pytest-xdist workers do not have the same sys.argv as + the main pytest invocation. xdist does seem to pickle TestPipeline + """ + TestPipeline.pytest_test_pipeline_options = config.getoption( + 'test_pipeline_options', default='') # Enable optional type checks on all tests. pipeline_options.enable_all_additional_type_checks() diff --git a/sdks/python/container/Dockerfile b/sdks/python/container/Dockerfile index 9d1d3a8c0c3f..f517dc1926f8 100644 --- a/sdks/python/container/Dockerfile +++ b/sdks/python/container/Dockerfile @@ -25,34 +25,50 @@ ARG pull_licenses # Install native bindings required for dependencies. RUN apt-get update && \ apt-get install -y \ - # These packages are needed for "pip install python-snappy" below. + # Required by python-snappy libsnappy-dev \ - # This package is needed for "pip install pyyaml" below to have c bindings. + # Required by pyyaml (for c bindings) libyaml-dev \ # This is used to speed up the re-installation of the sdk. ccache \ && \ rm -rf /var/lib/apt/lists/* -# Install packages required by the Python SDK and common dependencies of the user code. - -# SDK dependencies not listed in base_image_requirements.txt will be installed when we install SDK -# in the next RUN statement. +#### +# Install required packages for Beam Python SDK and common dependencies used by users. +#### +# SDK dependencies not listed in base_image_requirements.txt will be installed +# when we install SDK with pip below. COPY target/base_image_requirements.txt /tmp/base_image_requirements.txt RUN \ pip install -r /tmp/base_image_requirements.txt && \ + python -c "import nltk; nltk.download('stopwords')" && \ + rm /root/nltk_data/corpora/stopwords.zip && \ # Check that the fast implementation of protobuf is used. python -c "from google.protobuf.internal import api_implementation; assert api_implementation._default_implementation_type == 'cpp'; print ('Verified fast protobuf used.')" && \ # Remove pip cache. rm -rf /root/.cache/pip && \ rm -rf /tmp/base_image_requirements.txt -# Configure ccache prior to installing the SDK. +# Install Google Cloud SDK. +ENV CLOUDSDK_CORE_DISABLE_PROMPTS yes +ENV PATH $PATH:/usr/local/gcloud/google-cloud-sdk/bin +RUN mkdir -p /usr/local/gcloud && \ + cd /usr/local/gcloud && \ + curl -s -O https://dl.google.com/dl/cloudsdk/release/google-cloud-sdk.tar.gz && \ + tar -xf google-cloud-sdk.tar.gz && \ + /usr/local/gcloud/google-cloud-sdk/install.sh && \ + rm google-cloud-sdk.tar.gz + +# Configure ccache prior to installing Beam SDK. RUN ln -s /usr/bin/ccache /usr/local/bin/gcc # These parameters are needed as pip compiles artifacts in random temporary directories. 
RUN ccache --set-config=sloppiness=file_macro && ccache --set-config=hash_dir=false +#### +# Install Apache Beam SDK +#### COPY target/apache-beam.tar.gz /opt/apache/beam/tars/ RUN pip install -v /opt/apache/beam/tars/apache-beam.tar.gz[gcp] @@ -87,6 +103,7 @@ RUN pip check COPY target/LICENSE /opt/apache/beam/ +COPY target/LICENSE.python /opt/apache/beam/ COPY target/NOTICE /opt/apache/beam/ ADD target/launcher/linux_amd64/boot /opt/apache/beam/ diff --git a/sdks/python/container/base_image_requirements.txt b/sdks/python/container/base_image_requirements.txt index d4a9ba90a88d..9a1c76c603e0 100644 --- a/sdks/python/container/base_image_requirements.txt +++ b/sdks/python/container/base_image_requirements.txt @@ -23,47 +23,68 @@ # Any SDK dependencies not listed here will be installed when SDK is installed # into the container. +# TODO(AVRO-2429): Upgrade to >= 1.9.0 only after resolved avro-python3==1.8.2 fastavro==1.0.0.post1 crcmod==1.7 dill==0.3.1.1 future==0.18.2 -grpcio==1.29.0 +grpcio==1.34.0 hdfs==2.5.8 -httplib2==0.12.0 -mock==2.0.0 +httplib2==0.19.1 oauth2client==4.1.3 protobuf==3.12.2 -pyarrow==0.16.0 +pyarrow==3.0.0 pydot==1.4.1 -pymongo==3.9.0 -pytz==2019.3 -pyyaml==5.1 +pymongo==3.10.1 +pytz==2020.1 +pyyaml==5.4 typing-extensions==3.7.4.3 # GCP extra features -google-api-core==1.22.0 -google-apitools==0.5.28 +google-auth==1.31.0 +google-api-core==1.22.2 +google-apitools==0.5.31 google-cloud-pubsub==1.0.2 google-cloud-bigquery==1.26.1 google-cloud-bigtable==1.0.0 google-cloud-core==1.4.1 -google-cloud-datastore==1.7.4 +google-cloud-datastore==1.15.3 +google-cloud-dlp==0.13.0 +google-cloud-language==1.3.0 +google-cloud-profiler==3.0.4 +google-cloud-recommendations-ai==0.2.0 +google-cloud-spanner==1.13.0 +google-cloud-videointelligence==1.13.0 +google-cloud-vision==0.42.0 +google-python-cloud-debugger == 2.15 +grpcio-gcp==0.2.2 -# Optional packages -cython==0.29.13 -guppy3==3.0.9 +## These are additional optional packages likely to be used by customers. +beautifulsoup4 == 4.9.1 +bs4 == 0.0.1 +cython==0.29.21 +cachetools == 3.1.1 +dataclasses == 0.8 ; python_version=="3.6" +guppy3==3.0.10 mmh3==2.5.1 - -# These are additional packages likely to be used by customers. -numpy==1.17.3 +orjson==3.6.1 +python-dateutil == 2.8.1 +requests == 2.24.0 +freezegun == 0.3.15 +pillow == 7.2.0 +python-snappy == 0.5.4 +numpy==1.19.5 scipy==1.4.1 -pandas==1.1.4 +scikit-learn==0.24.1 +pandas==1.1.5 ; python_version<"3.7" +pandas==1.2.4 ; python_version>="3.7" protorpc==0.12.0 -python-gflags==3.0.6 -tensorflow==2.3.0 +python-gflags==3.1.2 +tensorflow==2.5.0 +nltk==3.5.0 # Packages needed for testing. 
tenacity>=5.0.2 pyhamcrest<2.0,>=1.9 -nose==1.3.7 +pytest==4.6.11 diff --git a/sdks/python/container/boot.go b/sdks/python/container/boot.go index 470eafb5b2a5..60a0ed5e011a 100644 --- a/sdks/python/container/boot.go +++ b/sdks/python/container/boot.go @@ -31,11 +31,11 @@ import ( "strings" "time" - "github.com/apache/beam/sdks/go/pkg/beam/artifact" - pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" - "github.com/apache/beam/sdks/go/pkg/beam/provision" - "github.com/apache/beam/sdks/go/pkg/beam/util/execx" - "github.com/apache/beam/sdks/go/pkg/beam/util/grpcx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/artifact" + pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1" + "github.com/apache/beam/sdks/v2/go/pkg/beam/provision" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/execx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/grpcx" "github.com/golang/protobuf/jsonpb" "github.com/golang/protobuf/proto" "github.com/nightlyone/lockfile" @@ -155,12 +155,18 @@ func main() { // TODO(herohde): the packages to install should be specified explicitly. It // would also be possible to install the SDK in the Dockerfile. fileNames := make([]string, len(files)) + requirementsFiles := []string{requirementsFile} for i, v := range files { - log.Printf("Found artifact: %s", v.Name) - fileNames[i] = v.Name + name, _ := artifact.MustExtractFilePayload(v) + log.Printf("Found artifact: %s", name) + fileNames[i] = name + + if v.RoleUrn == artifact.URNPipRequirementsFile { + requirementsFiles = append(requirementsFiles, name) + } } - if setupErr := installSetupPackages(fileNames, dir); setupErr != nil { + if setupErr := installSetupPackages(fileNames, dir, requirementsFiles); setupErr != nil { log.Fatalf("Failed to install required packages: %v", setupErr) } } @@ -179,11 +185,21 @@ func main() { os.Setenv("SEMI_PERSISTENT_DIRECTORY", *semiPersistDir) os.Setenv("LOGGING_API_SERVICE_DESCRIPTOR", proto.MarshalTextString(&pipepb.ApiServiceDescriptor{Url: *loggingEndpoint})) os.Setenv("CONTROL_API_SERVICE_DESCRIPTOR", proto.MarshalTextString(&pipepb.ApiServiceDescriptor{Url: *controlEndpoint})) + os.Setenv("RUNNER_CAPABILITIES", strings.Join(info.GetRunnerCapabilities(), " ")) if info.GetStatusEndpoint() != nil { os.Setenv("STATUS_API_SERVICE_DESCRIPTOR", proto.MarshalTextString(info.GetStatusEndpoint())) } + if metadata := info.GetMetadata(); metadata != nil { + if jobName, nameExists := metadata["job_name"]; nameExists { + os.Setenv("JOB_NAME", jobName) + } + if jobID, idExists := metadata["job_id"]; idExists { + os.Setenv("JOB_ID", jobID) + } + } + args := []string{ "-m", sdkHarnessEntrypoint, @@ -218,7 +234,7 @@ func setupAcceptableWheelSpecs() error { } // installSetupPackages installs Beam SDK and user dependencies. -func installSetupPackages(files []string, workDir string) error { +func installSetupPackages(files []string, workDir string, requirementsFiles []string) error { log.Printf("Installing setup packages ...") if err := setupAcceptableWheelSpecs(); err != nil { @@ -233,8 +249,10 @@ func installSetupPackages(files []string, workDir string) error { } // The staged files will not disappear due to restarts because workDir is a // folder that is mapped to the host (and therefore survives restarts). 
- if err := pipInstallRequirements(files, workDir, requirementsFile); err != nil { - return fmt.Errorf("failed to install requirements: %v", err) + for _, f := range requirementsFiles { + if err := pipInstallRequirements(files, workDir, f); err != nil { + return fmt.Errorf("failed to install requirements: %v", err) + } } if err := installExtraPackages(files, extraPackagesFile, workDir); err != nil { return fmt.Errorf("failed to install extra packages: %v", err) @@ -333,7 +351,7 @@ func processArtifactsInSetupOnlyMode() error { } files[i] = filePayload.GetPath() } - if setupErr := installSetupPackages(files, workDir); setupErr != nil { + if setupErr := installSetupPackages(files, workDir, []string{requirementsFile}); setupErr != nil { log.Fatalf("Failed to install required packages: %v", setupErr) } return nil diff --git a/sdks/python/container/build.gradle b/sdks/python/container/build.gradle index 6093a14625ed..32d9bf3f23a7 100644 --- a/sdks/python/container/build.gradle +++ b/sdks/python/container/build.gradle @@ -21,10 +21,10 @@ applyGoNature() description = "Apache Beam :: SDKs :: Python :: Container" -// Figure out why the golang plugin does not add a build dependency between projects. -// Without the line below, we get spurious errors about not being able to resolve -// "./github.com/apache/beam/sdks/go" -resolveBuildDependencies.dependsOn ":sdks:go:goBuild" +// Disable gogradle's dependency resolution so it uses Go modules instead. +installDependencies.enabled = false +resolveBuildDependencies.enabled = false +resolveTestDependencies.enabled = false configurations { sdkSourceTarball @@ -42,7 +42,7 @@ dependencies { } golang { - packagePath = 'github.com/apache/beam/sdks/python/boot' + packagePath = 'github.com/apache/beam/sdks/v2/python/container' goBuild { // Build for linux and mac. targetPlatform = ['linux-amd64', 'darwin-amd64'] @@ -56,6 +56,12 @@ task buildAll { dependsOn ':sdks:python:container:py38:docker' } +task pushAll { + dependsOn ':sdks:python:container:py36:dockerPush' + dependsOn ':sdks:python:container:py37:dockerPush' + dependsOn ':sdks:python:container:py38:dockerPush' +} + artifacts { sdkHarnessLauncher file: file('./build/target/launcher'), builtBy: goBuild } diff --git a/sdks/python/container/common.gradle b/sdks/python/container/common.gradle index bb4eba94c485..8b0f91bb94a3 100644 --- a/sdks/python/container/common.gradle +++ b/sdks/python/container/common.gradle @@ -60,6 +60,8 @@ docker { project.docker_image_default_repo_root, tag: project.rootProject.hasProperty(["docker-tag"]) ? 
project.rootProject["docker-tag"] : project.sdk_version) + // tags used by dockerTag task + tags containerImageTags() files "../Dockerfile", "./build" buildArgs(['py_version': "${project.ext.pythonVersion}", 'pull_licenses': project.rootProject.hasProperty(["docker-pull-licenses"]) || @@ -77,4 +79,10 @@ if (project.rootProject.hasProperty(["docker-pull-licenses"])) { dependsOn ':release:go-licenses:py:createLicenses' } dockerPrepare.dependsOn 'copyGolangLicenses' +} else { + task skipPullLicenses(type: Exec) { + executable "sh" + args "-c", "mkdir -p build/target/go-licenses" + } + dockerPrepare.dependsOn 'skipPullLicenses' } diff --git a/sdks/python/container/license_scripts/dep_urls_py.yaml b/sdks/python/container/license_scripts/dep_urls_py.yaml index b86c69ca9443..cdc45a8ff5d0 100644 --- a/sdks/python/container/license_scripts/dep_urls_py.yaml +++ b/sdks/python/container/license_scripts/dep_urls_py.yaml @@ -41,12 +41,17 @@ pip_dependencies: notice: "https://raw.githubusercontent.com/apache/avro/master/NOTICE.txt" backports.weakref: license: "https://raw.githubusercontent.com/PiDelport/backports.weakref/master/LICENSE" + bs4: + # bs4 is a dummy PyPI package setup for beautifulsoup4. + license: "skip" chardet: license: "https://raw.githubusercontent.com/chardet/chardet/master/LICENSE" certifi: license: "https://raw.githubusercontent.com/certifi/python-certifi/master/LICENSE" cython: license: "https://raw.githubusercontent.com/cython/cython/master/LICENSE.txt" + dataclasses: + license: "https://raw.githubusercontent.com/ericvsmith/dataclasses/master/LICENSE.txt" enum34: # The original repo is down. This license taken from somebody's clone: # https://github.com/jamespharaoh/python-enum34/blob/master/enum/LICENSE @@ -54,6 +59,8 @@ pip_dependencies: fastavro: license: "https://raw.githubusercontent.com/fastavro/fastavro/master/LICENSE" notice: "https://raw.githubusercontent.com/fastavro/fastavro/master/NOTICE.txt" + flatbuffers: + license: "https://raw.githubusercontent.com/google/flatbuffers/master/LICENSE.txt" funcsigs: license: "https://raw.githubusercontent.com/aliles/funcsigs/master/LICENSE" futures: @@ -62,6 +69,8 @@ pip_dependencies: license: "https://raw.githubusercontent.com/google/apitools/master/LICENSE" googledatastore: license: "https://raw.githubusercontent.com/GoogleCloudPlatform/google-cloud-datastore/master/LICENSE" + google-python-cloud-debugger: + license: "https://raw.githubusercontent.com/GoogleCloudPlatform/cloud-debug-python/master/LICENSE" grpcio: license: "https://raw.githubusercontent.com/grpc/grpc/master/LICENSE" notice: "https://raw.githubusercontent.com/grpc/grpc/master/NOTICE.txt" @@ -77,6 +86,8 @@ pip_dependencies: license: "https://raw.githubusercontent.com/mtth/hdfs/master/LICENSE" httplib2: license: "https://raw.githubusercontent.com/httplib2/httplib2/master/LICENSE" + keras-nightly: + license: "https://raw.githubusercontent.com/keras-team/keras/master/LICENSE" mmh3: license: "https://raw.githubusercontent.com/hajimes/mmh3/master/LICENSE" mock: @@ -89,6 +100,8 @@ pip_dependencies: license: "https://raw.githubusercontent.com/numpy/numpy/master/LICENSE.txt" oauth2client: license: "https://raw.githubusercontent.com/googleapis/oauth2client/master/LICENSE" + orjson: + license: "https://github.com/ijl/orjson/raw/master/LICENSE-APACHE" pandas: license: "https://raw.githubusercontent.com/pandas-dev/pandas/master/LICENSE" pathlib2: @@ -122,6 +135,8 @@ pip_dependencies: license: "https://raw.githubusercontent.com/tensorflow/tensorflow/master/LICENSE" 
tensorflow-estimator: license: "https://raw.githubusercontent.com/tensorflow/estimator/master/LICENSE" + tensorboard-data-server: + license: "https://raw.githubusercontent.com/tensorflow/tensorboard/master/LICENSE" tensorboard-plugin-wit: license: "https://raw.githubusercontent.com/PAIR-code/what-if-tool/master/LICENSE" timeloop: diff --git a/sdks/python/container/license_scripts/pull_licenses_py.py b/sdks/python/container/license_scripts/pull_licenses_py.py index 75f629dff045..4754da4d1052 100644 --- a/sdks/python/container/license_scripts/pull_licenses_py.py +++ b/sdks/python/container/license_scripts/pull_licenses_py.py @@ -30,10 +30,9 @@ import traceback import yaml -from future.moves.urllib.request import urlopen -from future.moves.urllib.parse import urlparse -from future.moves.urllib.parse import urljoin - +from urllib.request import urlopen +from urllib.parse import urlparse +from urllib.parse import urljoin from tenacity import retry from tenacity import stop_after_attempt from tenacity import wait_exponential diff --git a/sdks/python/container/piputil.go b/sdks/python/container/piputil.go index 8157a6d94f26..73f72cccad82 100644 --- a/sdks/python/container/piputil.go +++ b/sdks/python/container/piputil.go @@ -26,7 +26,7 @@ import ( "path/filepath" "strings" - "github.com/apache/beam/sdks/go/pkg/beam/util/execx" + "github.com/apache/beam/sdks/v2/go/pkg/beam/util/execx" ) var ( @@ -34,10 +34,11 @@ var ( ) func pipLocation() string { + // Users can set 'pip' environment variable to use a custom pip path. if v, ok := os.LookupEnv("pip"); ok { return v } - return "/usr/local/bin/pip" + return "pip" } // pipInstallRequirements installs the given requirement, if present. @@ -48,14 +49,14 @@ func pipInstallRequirements(files []string, dir, name string) error { // as possible PyPI downloads. In the first round the --find-links // option will make sure that only things staged in the worker will be // used without following their dependencies. - args := []string{"install", "-r", filepath.Join(dir, name), "--no-index", "--no-deps", "--find-links", dir} + args := []string{"install", "-r", filepath.Join(dir, name), "--disable-pip-version-check", "--no-index", "--no-deps", "--find-links", dir} if err := execx.Execute(pip, args...); err != nil { return err } // The second install round opens up the search for packages on PyPI and // also installs dependencies. The key is that if all the packages have // been installed in the first round then this command will be a no-op. - args = []string{"install", "-r", filepath.Join(dir, name), "--find-links", dir} + args = []string{"install", "-r", filepath.Join(dir, name), "--disable-pip-version-check", "--find-links", dir} return execx.Execute(pip, args...) } } @@ -83,22 +84,22 @@ func pipInstallPackage(files []string, dir, name string, force, optional bool, e // avoiding reinstallation of dependencies. Note now that if any needed // dependencies were not installed, they will still be missing. // - // Next, we run "pip install" on the package without any flags. Since the + // Next, we run "pip install" on the package without these flags. Since the // installed version will match the package specified, the package itself // will not be reinstalled, but its dependencies will now be resolved and // installed if necessary. This achieves our goal outlined above. 
- args := []string{"install", "--upgrade", "--force-reinstall", "--no-deps", + args := []string{"install", "--disable-pip-version-check", "--upgrade", "--force-reinstall", "--no-deps", filepath.Join(dir, packageSpec)} err := execx.Execute(pip, args...) if err != nil { return err } - args = []string{"install", filepath.Join(dir, packageSpec)} + args = []string{"install", "--disable-pip-version-check", filepath.Join(dir, packageSpec)} return execx.Execute(pip, args...) } // Case when we do not perform a forced reinstall. - args := []string{"install", filepath.Join(dir, packageSpec)} + args := []string{"install", "--disable-pip-version-check", filepath.Join(dir, packageSpec)} return execx.Execute(pip, args...) } } diff --git a/sdks/python/container/run_validatescontainer.sh b/sdks/python/container/run_validatescontainer.sh index 4bf3bf770db2..4eab5d90fc38 100755 --- a/sdks/python/container/run_validatescontainer.sh +++ b/sdks/python/container/run_validatescontainer.sh @@ -24,16 +24,19 @@ # REGION -> Region name to use for Dataflow # # Execute from the root of the repository: -# test Python2 container: ./sdks/python/container/run_validatescontainer.sh python2 -# test Python3 container: ./sdks/python/container/run_validatescontainer.sh python35 -# test Python3 container: ./sdks/python/container/run_validatescontainer.sh python36 -# test Python3 container: ./sdks/python/container/run_validatescontainer.sh python37 -# test Python3 container: ./sdks/python/container/run_validatescontainer.sh python38 +# test Python3.6 container: +# ./gradlew :sdks:python:test-suites:dataflow:py36:validatesContainer +# test Python3.7 container: +# ./gradlew :sdks:python:test-suites:dataflow:py37:validatesContainer +# test Python3.8 container: +# ./gradlew :sdks:python:test-suites:dataflow:py38:validatesContainer +# or test all supported python versions together: +# ./gradlew :sdks:python:test-suites:dataflow:validatesContainer echo "This script must be executed in the root of beam project. Please set GCS_LOCATION, PROJECT and REGION as desired." -if [[ $# != 1 ]]; then - printf "Usage: \n$> ./sdks/python/container/run_validatescontainer.sh <python_version>" +if [[ $# != 2 ]]; then + printf "Usage: \n$> ./sdks/python/container/run_validatescontainer.sh <python_version> <sdk_location>" printf "\n\tpython_version: [required] Python version used for container build and run tests." printf " Use 'python35' for Python3.5, python36 for Python3.6, python37 for Python3.7, python38 for Python3.8." exit 1 @@ -49,6 +52,7 @@ GCS_LOCATION=${GCS_LOCATION:-gs://temp-storage-for-end-to-end-tests} PROJECT=${PROJECT:-apache-beam-testing} REGION=${REGION:-us-central1} IMAGE_PREFIX="$(grep 'docker_image_default_repo_prefix' gradle.properties | cut -d'=' -f2)" +SDK_VERSION="$(grep 'sdk_version' gradle.properties | cut -d'=' -f2)" # Other variables branched by Python version. if [[ $1 == "python36" ]]; then @@ -67,7 +71,7 @@ else echo "Must set Python version with one of 'python36', 'python37' and 'python38' from commandline." exit 1 fi -XUNIT_FILE="nosetests-$IMAGE_NAME.xml" +XUNIT_FILE="pytest-$IMAGE_NAME.xml" # Verify in the root of the repository test -d sdks/python/container @@ -78,61 +82,61 @@ command -v gcloud docker -v gcloud -v -# Build the container -TAG=$(date +%Y%m%d-%H%M%S) +# Verify docker image has been built.
+docker images | grep "apache/$IMAGE_NAME" | grep "$SDK_VERSION" + +TAG=$(date +%Y%m%d-%H%M%S%N) CONTAINER=us.gcr.io/$PROJECT/$USER/$IMAGE_NAME +PREBUILD_SDK_CONTAINER_REGISTRY_PATH=us.gcr.io/$PROJECT/$USER/prebuild_$1_sdk echo "Using container $CONTAINER" -./gradlew :$CONTAINER_PROJECT:docker -Pdocker-repository-root=us.gcr.io/$PROJECT/$USER -Pdocker-tag=$TAG --info -# Verify it exists -docker images | grep $TAG +# Tag the docker container. +docker tag "apache/$IMAGE_NAME:$SDK_VERSION" "$CONTAINER:$TAG" -# Push the container -gcloud docker -- push $CONTAINER +# Push the container. +gcloud docker -- push $CONTAINER:$TAG function cleanup_container { # Delete the container locally and remotely - docker rmi $CONTAINER:$TAG || echo "Failed to remove container" - docker rmi $(docker images --format '{{.Repository}}:{{.Tag}}' | grep 'prebuilt_sdk') || echo "Failed to remove prebuilt sdk container" + docker rmi $CONTAINER:$TAG || echo "Failed to remove container image" + for image in $(docker images --format '{{.Repository}}:{{.Tag}}' | grep $PREBUILD_SDK_CONTAINER_REGISTRY_PATH) + do docker rmi $image || echo "Failed to remove prebuilt sdk container image" + done gcloud --quiet container images delete $CONTAINER:$TAG || echo "Failed to delete container" + for digest in $(gcloud container images list-tags $PREBUILD_SDK_CONTAINER_REGISTRY_PATH/beam_python_prebuilt_sdk --format="get(digest)") + do gcloud container images delete $PREBUILD_SDK_CONTAINER_REGISTRY_PATH/beam_python_prebuilt_sdk@$digest --force-delete-tags --quiet || echo "Failed to remove prebuilt sdk container image" + done + echo "Removed the container" } trap cleanup_container EXIT echo ">>> Successfully built and push container $CONTAINER" -# Virtualenv for the rest of the script to run setup & e2e test -VENV_PATH=sdks/python/container/venv/$PY_INTERPRETER -virtualenv $VENV_PATH -p $PY_INTERPRETER -. 
$VENV_PATH/bin/activate cd sdks/python -pip install -e .[gcp,test] - -# Create a tarball -python setup.py sdist -SDK_LOCATION=$(find dist/apache-beam-*.tar.gz) +SDK_LOCATION=$2 # Run ValidatesRunner tests on Google Cloud Dataflow service echo ">>> RUNNING DATAFLOW RUNNER VALIDATESCONTAINER TEST" -python setup.py nosetests \ - --attr ValidatesContainer \ - --nologcapture \ - --processes=1 \ - --process-timeout=900 \ - --with-xunitmp \ - --xunitmp-file=$XUNIT_FILE \ - --ignore-files '.*py3\d?\.py$' \ +pytest -o junit_suite_name=$IMAGE_NAME \ + -m="it_validatescontainer" \ + --show-capture=no \ + --numprocesses=1 \ + --timeout=900 \ + --junitxml=$XUNIT_FILE \ + --ignore-glob '.*py3\d?\.py$' \ + --log-cli-level=INFO \ --test-pipeline-options=" \ --runner=TestDataflowRunner \ --project=$PROJECT \ --region=$REGION \ - --worker_harness_container_image=$CONTAINER:$TAG \ + --sdk_container_image=$CONTAINER:$TAG \ --staging_location=$GCS_LOCATION/staging-validatesrunner-test \ --temp_location=$GCS_LOCATION/temp-validatesrunner-test \ --output=$GCS_LOCATION/output \ --sdk_location=$SDK_LOCATION \ --num_workers=1 \ --prebuild_sdk_container_base_image=$CONTAINER:$TAG \ - --docker_registry_push_url=us.gcr.io/$PROJECT/$USER" + --docker_registry_push_url=$PREBUILD_SDK_CONTAINER_REGISTRY_PATH" echo ">>> SUCCESS DATAFLOW RUNNER VALIDATESCONTAINER TEST" diff --git a/sdks/python/gen_protos.py b/sdks/python/gen_protos.py index 76458b0dd945..2fa3cacdf675 100644 --- a/sdks/python/gen_protos.py +++ b/sdks/python/gen_protos.py @@ -16,9 +16,6 @@ # """Generates Python proto modules and grpc stubs for Beam protos.""" -from __future__ import absolute_import -from __future__ import print_function - import contextlib import glob import inspect @@ -32,6 +29,7 @@ import sys import time import warnings +from importlib import import_module import pkg_resources @@ -181,8 +179,7 @@ def write_message(self, message_name, message, indent=0): sys.path.insert(0, os.path.dirname(api_path)) def _import(m): - # TODO: replace with importlib when we drop support for python2. - return __import__('api.%s' % m, fromlist=[None]) + return import_module('api.%s' % m) try: beam_runner_api_pb2 = _import('beam_runner_api_pb2') diff --git a/sdks/python/mypy.ini b/sdks/python/mypy.ini index 6095c02dd609..cf2a3dda167a 100644 --- a/sdks/python/mypy.ini +++ b/sdks/python/mypy.ini @@ -65,6 +65,12 @@ ignore_errors = true [mypy-apache_beam.io.*] ignore_errors = true +[mypy-apache_beam.io.filesystem] +ignore_errors = false + +[mypy-apache_beam.io.iobase] +ignore_errors = false + [mypy-apache_beam.ml.gcp.*] ignore_errors = true diff --git a/sdks/python/pytest.ini b/sdks/python/pytest.ini index 8929c3c6a4f2..a162b69d17ec 100644 --- a/sdks/python/pytest.ini +++ b/sdks/python/pytest.ini @@ -24,12 +24,21 @@ python_classes = python_functions = # Discover tests using filenames. # See conftest.py for extra collection rules. 
-python_files = test_*.py *_test.py *_test_py3*.py +python_files = test_*.py *_test.py *_test_py3*.py *_test_it.py markers = + xlang_transforms: collect Cross Language transforms test runs + xlang_sql_expansion_service: collect for Cross Language with SQL expansion service test runs + it_postcommit: collect for post-commit integration test runs + it_validatesrunner: collect for ValidatesRunner integration test runs + no_sickbay_streaming: run without sickbay-streaming + no_sickbay_batch: run without sickbay-batch + it_validatescontainer: collect for ValidatesContainer integration test runs # Tests using this marker conflict with the xdist plugin in some way, such # as enabling save_main_session. no_xdist: run without pytest-xdist plugin + # We run these tests with multiple major pyarrow versions (BEAM-11211) + uses_pyarrow: tests that utilize pyarrow in some way # Default timeout intended for unit tests. # If certain tests need a different value, please see the docs on how to 
diff --git a/sdks/python/scripts/generate_pydoc.sh b/sdks/python/scripts/generate_pydoc.sh index c3f8f064b788..fb0c415bbca2 100755 --- a/sdks/python/scripts/generate_pydoc.sh +++ b/sdks/python/scripts/generate_pydoc.sh @@ -53,35 +53,41 @@ current_minor_version=`echo ${python_version} | sed -E "s/Python 3.([0-9])\..*/\ # Exclude internal, test, and Cython paths/patterns from the documentation. excluded_patterns=( - apache_beam/coders/stream.* - apache_beam/coders/coder_impl.* - apache_beam/examples/ - apache_beam/internal/clients/ - apache_beam/io/gcp/internal/ - apache_beam/io/gcp/tests/ - apache_beam/metrics/execution.* - apache_beam/runners/common.* - apache_beam/runners/api/ - apache_beam/runners/test/ - apache_beam/runners/dataflow/internal/ - apache_beam/runners/portability/ - apache_beam/runners/worker/ - apache_beam/testing/benchmarks/chicago_taxi/ - apache_beam/tools/map_fn_microbenchmark.* - apache_beam/transforms/cy_combiners.* - apache_beam/transforms/cy_dataflow_distribution_counter.* - apache_beam/transforms/py_dataflow_distribution_counter.* - apache_beam/utils/counters.* - apache_beam/utils/windowed_value.* - *_pb2.py - *_test.py - *_test_common.py - *_py3[`echo $(($current_minor_version+1))`-9]*.py + 'apache_beam/coders/coder_impl.*' + 'apache_beam/coders/stream.*' + 'apache_beam/examples/' + 'apache_beam/io/gcp/tests/' + 'apache_beam/metrics/execution.*' + 'apache_beam/runners/api/' + 'apache_beam/runners/common.*' + 'apache_beam/runners/portability/' + 'apache_beam/runners/test/' + 'apache_beam/runners/worker/' + 'apache_beam/testing/' + 'apache_beam/testing/benchmarks/chicago_taxi/' + 'apache_beam/tools/' + 'apache_beam/tools/map_fn_microbenchmark.*' + 'apache_beam/transforms/cy_combiners.*' + 'apache_beam/transforms/cy_dataflow_distribution_counter.*' + 'apache_beam/transforms/py_dataflow_distribution_counter.*' + 'apache_beam/utils/counters.*' + 'apache_beam/utils/windowed_value.*' + 'apache_beam/version.py' + '**/internal/*' + '*_it.py' + '*_pb2.py' + '*_py3[0-9]*.py' + '*_test.py' + '*_test_common.py' + '*_test_py3.py' ) python $(type -p sphinx-apidoc) -fMeT -o target/docs/source apache_beam \ "${excluded_patterns[@]}" +# Include inherited members for the DataFrame API +echo " :inherited-members:" >> target/docs/source/apache_beam.dataframe.frames.rst + # Create the configuration and index files #=== conf.py ===# cat > target/docs/source/conf.py <<'EOF' @@ -116,6 +122,9 @@ autoclass_content = 'both' autodoc_inherit_docstrings = False autodoc_member_order = 'bysource' +# Allow a special section 
for documenting DataFrame API +napoleon_custom_sections = ['Differences from pandas'] + doctest_global_setup = ''' import apache_beam as beam ''' @@ -124,6 +133,8 @@ intersphinx_mapping = { 'python': ('https://docs.python.org/{}'.format(sys.version_info.major), None), 'hamcrest': ('https://pyhamcrest.readthedocs.io/en/stable/', None), 'google-cloud-datastore': ('https://googleapis.dev/python/datastore/latest/', None), + 'numpy': ('http://docs.scipy.org/doc/numpy', None), + 'pandas': ('http://pandas.pydata.org/pandas-docs/dev', None), } # Since private classes are skipped by sphinx, if there is any cross reference @@ -149,10 +160,13 @@ ignore_identifiers = [ # Ignore private classes 'apache_beam.coders.coders._PickleCoderBase', 'apache_beam.coders.coders.FastCoder', + 'apache_beam.coders.coders.ListLikeCoder', 'apache_beam.io._AvroSource', 'apache_beam.io.gcp.bigquery.RowAsDictJsonCoder', 'apache_beam.io.gcp.datastore.v1new.datastoreio._Mutate', 'apache_beam.io.gcp.datastore.v1new.datastoreio.DatastoreMutateFn', + 'apache_beam.io.gcp.internal.clients.bigquery.' + 'bigquery_v2_messages.TableFieldSchema', 'apache_beam.io.gcp.internal.clients.bigquery.' 'bigquery_v2_messages.TableSchema', 'apache_beam.io.iobase.SourceBase', @@ -204,6 +218,9 @@ ignore_identifiers = [ 'google.cloud.datastore.batch.Batch', 'is_in_ipython', 'doctest.TestResults', + + # IPython Magics py:class reference target not found + 'IPython.core.magic.Magics', ] ignore_references = [ 'BeamIOError', @@ -244,7 +261,7 @@ python $(type -p sphinx-build) -v -a -E -q target/docs/source \ # Fail if there are errors or warnings in docs ! grep -q "ERROR:" target/docs/sphinx-build.warnings.log || exit 1 -! grep -q "WARNING:" target/docs/sphinx-build.warnings.log || exit 1 +(! grep -v 'apache_beam.dataframe' target/docs/sphinx-build.warnings.log | grep -q "WARNING:") || exit 1 # Run tests for code samples, these can be: # - Code blocks using '.. testsetup::', '.. testcode::' and '.. testoutput::' @@ -255,7 +272,7 @@ python -msphinx -M doctest target/docs/source \ # Fail if there are errors or warnings in docs ! grep -q "ERROR:" target/docs/sphinx-doctest.warnings.log || exit 1 -! grep -q "WARNING:" target/docs/sphinx-doctest.warnings.log || exit 1 +(! grep -v 'apache_beam.dataframe' target/docs/sphinx-doctest.warnings.log | grep -q "WARNING:") || exit 1 # Message is useful only when this script is run locally. In a remote # test environment, this path will be removed when the test completes. diff --git a/sdks/python/scripts/run_integration_test.sh b/sdks/python/scripts/run_integration_test.sh index a60a141ea1ac..b477e918d96c 100755 --- a/sdks/python/scripts/run_integration_test.sh +++ b/sdks/python/scripts/run_integration_test.sh @@ -45,10 +45,10 @@ # using this flag. # # Test related flags: -# test_opts -> List of space separated options to configure Nose test -# during execution. Commonly used options like `--attr`, -# `--tests`, `--nologcapture`. More can be found in -# https://nose.readthedocs.io/en/latest/man.html#options +# test_opts -> List of space separated options to configure Pytest test +# during execution. Commonly used options like `--capture=no` +# `--collect-only`. More can be found in +# https://docs.pytest.org/en/latest/reference.html#command-line-flags # suite -> Namespace for this run of tests. Required if running # under Jenkins. Used to differentiate runs of the same # tests with different interpreters/dependencies/etc. 
@@ -58,7 +58,7 @@ # `$ ./run_integration_test.sh` # # - Run single integration test with default pipeline options: -# `$ ./run_integration_test.sh --test_opts --tests=apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it` +# `$ ./run_integration_test.sh --test_opts apache_beam/examples/wordcount_it_test.py::WordCountIT::test_wordcount_it` # # - Run full set of PostCommit tests with customized pipeline options: # `$ ./run_integration_test.sh --project my-project --gcs_location gs://my-location` @@ -78,11 +78,12 @@ STREAMING=false WORKER_JAR="" KMS_KEY_NAME="projects/apache-beam-testing/locations/global/keyRings/beam-it/cryptoKeys/test" SUITE="" +COLLECT_MARKERS= -# Default test (nose) options. +# Default test (pytest) options. # Run WordCountIT.test_wordcount_it by default if no test options are # provided. -TEST_OPTS="--tests=apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it --nocapture" +TEST_OPTS="apache_beam/examples/wordcount_it_test.py::WordCountIT::test_wordcount_it" while [[ $# -gt 0 ]] do @@ -163,6 +164,11 @@ case $key in shift # past argument shift # past value ;; + --collect) + COLLECT_MARKERS="-m=$2" + shift # past argument + shift # past value + ;; *) # unknown option echo "Unknown option: $1" exit 1 @@ -174,7 +180,6 @@ if [[ "$JENKINS_HOME" != "" && "$SUITE" == "" ]]; then echo "Argument --suite is required in a Jenkins environment." exit 1 fi -XUNIT_FILE="nosetests-$SUITE.xml" set -o errexit @@ -240,6 +245,8 @@ if [[ -z $PIPELINE_OPTS ]]; then # Add --runner_v2 if provided if [[ "$RUNNER_V2" = true ]]; then opts+=("--experiments=use_runner_v2") + # TODO(BEAM-11779) remove shuffle_mode=appliance with runner v2 once issue is resolved. + opts+=("--experiments=shuffle_mode=appliance") if [[ "$STREAMING" = true ]]; then # Dataflow Runner V2 only supports streaming engine. opts+=("--enable_streaming_engine") @@ -268,11 +275,12 @@ fi # Run tests and validate that jobs finish successfully. echo ">>> RUNNING integration tests with pipeline options: $PIPELINE_OPTS" -echo ">>> test options: $TEST_OPTS" -# TODO(BEAM-3713): Pass $SUITE once migrated to pytest. xunitmp doesn't support -# suite names. 
-python setup.py nosetests \ - --test-pipeline-options="$PIPELINE_OPTS" \ - --with-xunitmp --xunitmp-file=$XUNIT_FILE \ - --ignore-files '.*py3\d?\.py$' \ - $TEST_OPTS +echo ">>> pytest options: $TEST_OPTS" +echo ">>> collect markers: $COLLECT_MARKERS" +ARGS="-o junit_suite_name=$SUITE --junitxml=pytest_$SUITE.xml $TEST_OPTS" +# Handle markers as an independent argument from $TEST_OPTS to prevent errors in space-separated flags +if [ -z "$COLLECT_MARKERS" ]; then + pytest $ARGS --test-pipeline-options="$PIPELINE_OPTS" +else + pytest $ARGS --test-pipeline-options="$PIPELINE_OPTS" "$COLLECT_MARKERS" +fi \ No newline at end of file 
diff --git a/sdks/python/scripts/run_pylint.sh b/sdks/python/scripts/run_pylint.sh index 4049f591a323..922773209ec1 100755 --- a/sdks/python/scripts/run_pylint.sh +++ b/sdks/python/scripts/run_pylint.sh @@ -88,7 +88,6 @@ ISORT_EXCLUDED=( "iobase_test.py" "fast_coders_test.py" "slow_coders_test.py" - "vcfio.py" "tfdv_analyze_and_validate.py" "preprocess.py" "model.py" 
diff --git a/sdks/python/setup.py b/sdks/python/setup.py index f3351c5b9828..a71e1da7c626 100644 --- a/sdks/python/setup.py +++ b/sdks/python/setup.py @@ -17,9 +17,6 @@ """Apache Beam SDK for Python setup file.""" -from __future__ import absolute_import -from __future__ import print_function - import os import sys import warnings @@ -106,7 +103,6 @@ def get_version(): ) ) - REQUIRED_CYTHON_VERSION = '0.28.1' try: _CYTHON_VERSION = get_distribution('cython').version @@ -131,6 +127,12 @@ def get_version(): # Avro 1.9.2 for python3 was broken. The issue was fixed in version 1.9.2.1 'avro-python3>=1.8.1,!=1.9.2,<1.10.0', 'crcmod>=1.7,<2.0', + # dataclasses backport for python_version<3.7. No version bound because this + # is Python standard since Python 3.7 and each Python version is compatible + # with a specific dataclasses version. + 'dataclasses;python_version<"3.7"', + # orjson, only available on Python 3.6 and above + 'orjson<4.0;python_version>="3.6"', # Dill doesn't have forwards-compatibility guarantees within minor version. # Pickles created with a new version of dill may not unpickle using older # version of dill. It is best to use the same version of dill on client and @@ -141,19 +143,18 @@ def get_version(): 'future>=0.18.2,<1.0.0', 'grpcio>=1.29.0,<2', 'hdfs>=2.1.0,<3.0.0', - 'httplib2>=0.8,<0.18.0', - 'mock>=1.0.1,<3.0.0', - 'numpy>=1.14.3,<2', + 'httplib2>=0.8,<0.20.0', + 'numpy>=1.14.3,<1.21.0', 'pymongo>=3.8.0,<4.0.0', 'oauth2client>=2.0.1,<5', 'protobuf>=3.12.2,<4', - 'pyarrow>=0.15.1,<3.0.0', + 'pyarrow>=0.15.1,<5.0.0', 'pydot>=1.2.0,<2', 'python-dateutil>=2.8.0,<3', 'pytz>=2018.3', 'requests>=2.24.0,<3.0.0', - 'typing-extensions>=3.7.0,<3.8.0', - ] + 'typing-extensions>=3.7.0,<4', +] # [BEAM-8181] pyarrow cannot be installed on 32-bit Windows platforms. 
if sys.platform == 'win32' and sys.maxsize <= 2**32: @@ -163,9 +164,8 @@ def get_version(): REQUIRED_TEST_PACKAGES = [ 'freezegun>=0.3.12', - 'nose>=1.3.7', - 'nose_xunitmp>=0.4.1', - 'pandas>=1.0,<2', + 'mock>=1.0.1,<3.0.0', + 'pandas>=1.0,<1.3.0', 'parameterized>=0.7.1,<0.8.0', 'pyhamcrest>=1.9,!=1.10.0,<2.0.0', 'pyyaml>=3.12,<6.0.0', @@ -177,16 +177,19 @@ def get_version(): 'sqlalchemy>=1.3,<2.0', 'psycopg2-binary>=2.8.5,<3.0.0', 'testcontainers>=3.0.3,<4.0.0', - ] +] GCP_REQUIREMENTS = [ 'cachetools>=3.1.0,<5', 'google-apitools>=0.5.31,<0.5.32', - 'google-auth>=1.18.0,<2', - 'google-cloud-datastore>=1.7.1,<2', + # NOTE: Maintainers, please do not require google-auth>=2.x.x + # Until this issue is closed + # https://github.com/googleapis/google-cloud-python/issues/10566 + 'google-auth>=1.18.0,<3', + 'google-cloud-datastore>=1.8.0,<2', 'google-cloud-pubsub>=0.39.0,<2', # GCP packages required by tests - 'google-cloud-bigquery>=1.6.0,<2', + 'google-cloud-bigquery>=1.6.0,<3', 'google-cloud-core>=0.28.1,<2', 'google-cloud-bigtable>=0.31.1,<2', 'google-cloud-spanner>=1.13.0,<2', @@ -196,14 +199,16 @@ def get_version(): 'google-cloud-language>=1.3.0,<2', 'google-cloud-videointelligence>=1.8.0,<2', 'google-cloud-vision>=0.38.0,<2', - # GCP packages required by prebuild sdk container functionality. - 'google-cloud-build>=2.0.0,<3', + 'google-cloud-recommendations-ai>=0.1.0,<=0.2.0' ] INTERACTIVE_BEAM = [ 'facets-overview>=1.0.0,<2', - 'ipython>=5.8.0,<8', + 'ipython>=7,<8', 'ipykernel>=5.2.0,<6', + # Skip version 6.1.13 due to + # https://github.com/jupyter/jupyter_client/issues/637 + 'jupyter-client>=6.1.11,<6.1.13', 'timeloop>=1.0.2,<2', ] @@ -211,18 +216,15 @@ def get_version(): # notebok utils 'nbformat>=5.0.5,<6', 'nbconvert>=5.6.1,<6', - 'jupyter-client>=6.1.2,<7', # headless chrome based integration tests 'selenium>=3.141.0,<4', 'needle>=0.5.0,<1', - 'chromedriver-binary>=87,<88', + 'chromedriver-binary>=91,<92', # use a fixed major version of PIL for different python versions 'pillow>=7.1.1,<8', ] -AWS_REQUIREMENTS = [ - 'boto3 >=1.9' -] +AWS_REQUIREMENTS = ['boto3 >=1.9'] AZURE_REQUIREMENTS = [ 'azure-storage-blob >=12.3.2', @@ -230,7 +232,6 @@ def get_version(): ] - # We must generate protos after setup_requires are installed. def generate_protos_first(original_cmd): try: @@ -242,6 +243,7 @@ class cmd(original_cmd, object): def run(self): gen_protos.generate_proto_files() super(cmd, self).run() + return cmd except ImportError: warnings.warn("Could not import gen_protos, skipping proto generation.") @@ -253,8 +255,8 @@ def run(self): if sys.version_info.major == 3 and sys.version_info.minor >= 9: warnings.warn( 'This version of Apache Beam has not been sufficiently tested on ' - 'Python %s.%s. You may encounter bugs or missing features.' % ( - sys.version_info.major, sys.version_info.minor)) + 'Python %s.%s. You may encounter bugs or missing features.' % + (sys.version_info.major, sys.version_info.minor)) setuptools.setup( name=PACKAGE_NAME, @@ -266,10 +268,20 @@ def run(self): author=PACKAGE_AUTHOR, author_email=PACKAGE_EMAIL, packages=setuptools.find_packages(), - package_data={'apache_beam': [ - '*/*.pyx', '*/*/*.pyx', '*/*.pxd', '*/*/*.pxd', '*/*.h', '*/*/*.h', - 'testing/data/*.yaml', 'portability/api/*.yaml']}, + package_data={ + 'apache_beam': [ + '*/*.pyx', + '*/*/*.pyx', + '*/*.pxd', + '*/*/*.pxd', + '*/*.h', + '*/*/*.h', + 'testing/data/*.yaml', + 'portability/api/*.yaml' + ] + }, ext_modules=cythonize([ + # Make sure to use language_level=3 cython directive in files below. 
'apache_beam/**/*.pyx', 'apache_beam/coders/coder_impl.py', 'apache_beam/metrics/cells.py', @@ -279,12 +291,12 @@ def run(self): 'apache_beam/runners/worker/opcounters.py', 'apache_beam/runners/worker/operations.py', 'apache_beam/transforms/cy_combiners.py', + 'apache_beam/transforms/stats.py', 'apache_beam/utils/counters.py', 'apache_beam/utils/windowed_value.py', ]), install_requires=REQUIRED_PACKAGES, python_requires=python_requires, - test_suite='nose.collector', # BEAM-8840: Do NOT use tests_require or setup_requires. extras_require={ 'docs': ['Sphinx>=1.5.2,<2.0'], @@ -311,10 +323,6 @@ def run(self): ], license='Apache License, Version 2.0', keywords=PACKAGE_KEYWORDS, - entry_points={ - 'nose.plugins.0.10': [ - 'beam_test_plugin = test_config:BeamTestPlugin', - ]}, cmdclass={ 'build_py': generate_protos_first(build_py), 'develop': generate_protos_first(develop), diff --git a/sdks/python/test-suites/dataflow/build.gradle b/sdks/python/test-suites/dataflow/build.gradle index 919e74358629..cbe9aaed57d3 100644 --- a/sdks/python/test-suites/dataflow/build.gradle +++ b/sdks/python/test-suites/dataflow/build.gradle @@ -48,3 +48,33 @@ task chicagoTaxiExample { dependsOn.add("sdks:python:test-suites:dataflow:py${getVersionSuffix(it)}:chicagoTaxiExample") } } + +task validatesRunnerBatchTests { + getVersionsAsList('dataflow_validates_runner_batch_tests').each { + dependsOn.add(":sdks:python:test-suites:dataflow:py${getVersionSuffix(it)}:validatesRunnerBatchTests") + } +} + +task validatesRunnerStreamingTests { + getVersionsAsList('dataflow_validates_runner_streaming_tests').each { + dependsOn.add(":sdks:python:test-suites:dataflow:py${getVersionSuffix(it)}:validatesRunnerStreamingTests") + } +} + +task validatesRunnerBatchTestsV2 { + getVersionsAsList('dataflow_validates_runner_batch_tests_V2').each { + dependsOn.add(":sdks:python:test-suites:dataflow:py${getVersionSuffix(it)}:validatesRunnerBatchTests") + } +} + +task validatesRunnerStreamingTestsV2 { + getVersionsAsList('dataflow_validates_runner_streaming_tests_V2').each { + dependsOn.add(":sdks:python:test-suites:dataflow:py${getVersionSuffix(it)}:validatesRunnerStreamingTests") + } +} + +task validatesContainerTests { + getVersionsAsList('dataflow_validates_container_tests').each { + dependsOn.add(":sdks:python:test-suites:dataflow:py${getVersionSuffix(it)}:validatesContainer") + } +} diff --git a/sdks/python/test-suites/dataflow/common.gradle b/sdks/python/test-suites/dataflow/common.gradle index 950d9ef100c8..29373b5ac0c8 100644 --- a/sdks/python/test-suites/dataflow/common.gradle +++ b/sdks/python/test-suites/dataflow/common.gradle @@ -29,10 +29,11 @@ dependencies { def runScriptsDir = "${rootDir}/sdks/python/scripts" // Basic test options for ITs running on Jenkins. -def basicTestOpts = [ - "--nocapture", // print stdout instantly - "--processes=8", // run tests in parallel - "--process-timeout=4500", // timeout of whole command execution +def basicPytestOpts = [ + "--capture=no", // print stdout instantly + "--timeout=4500", // timeout of whole command execution + "--color=yes", // console color + "--log-cli-level=INFO", //log level ] def preCommitIT(String runScriptsDir, String envdir, Boolean streaming, Boolean runnerV2, String pythonSuffix) { @@ -48,22 +49,22 @@ def preCommitIT(String runScriptsDir, String envdir, Boolean streaming, Boolean doLast { // Basic integration tests to run in PreCommit def precommitTests = streaming ? 
[ - "apache_beam.examples.streaming_wordcount_it_test:StreamingWordCountIT.test_streaming_wordcount_it", + "apache_beam/examples/streaming_wordcount_it_test.py::StreamingWordCountIT::test_streaming_wordcount_it", ] : [ - "apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it", + "apache_beam/examples/wordcount_it_test.py::WordCountIT::test_wordcount_it", ] def testOpts = [ - "--tests=${precommitTests.join(',')}", - "--nocapture", // Print stdout instantly - "--processes=2", // Number of tests running in parallel - "--process-timeout=1800", // Timeout of whole command execution + "${precommitTests.join(' ')}", + "--capture=no", // Print stdout instantly + "--numprocesses=2", // Number of tests running in parallel + "--timeout=1800", // Timeout of whole command execution ] def argMap = [ "test_opts" : testOpts, "sdk_location": files(configurations.distTarBall.files).singleFile, "worker_jar" : dataflowWorkerJar, - "suite" : "preCommitIT-df${pythonSuffix}" + "suite" : "preCommitIT-df${pythonSuffix}", ] if (runnerV2) { @@ -107,14 +108,15 @@ task postCommitIT { def dataflowWorkerJar = project(":runners:google-cloud-dataflow-java:worker").shadowJar.archivePath doLast { - def testOpts = basicTestOpts + ["--attr=IT"] - - def cmdArgs = mapToArgString([ + def testOpts = basicPytestOpts + ["--numprocesses=8", "--dist=loadfile"] + def argMap = [ "test_opts": testOpts, "sdk_location": files(configurations.distTarBall.files).singleFile, "worker_jar": dataflowWorkerJar, - "suite": "postCommitIT-df${pythonVersionSuffix}" - ]) + "suite": "postCommitIT-df${pythonVersionSuffix}", + "collect": "it_postcommit" + ] + def cmdArgs = mapToArgString(argMap) exec { executable 'sh' args '-c', ". ${envdir}/bin/activate && ${runScriptsDir}/run_integration_test.sh $cmdArgs" @@ -128,12 +130,13 @@ task validatesRunnerBatchTests { dependsOn ":runners:google-cloud-dataflow-java:worker:shadowJar" def dataflowWorkerJar = project(":runners:google-cloud-dataflow-java:worker").shadowJar.archivePath - def testOpts = basicTestOpts + ["--attr=ValidatesRunner,!sickbay-batch"] def argMap = [ - "test_opts" : testOpts, + "test_opts" : basicPytestOpts + ["--numprocesses=8"], "worker_jar" : dataflowWorkerJar, "sdk_location": files(configurations.distTarBall.files).singleFile, - "suite" : "validatesRunnerBatchTests-df${pythonVersionSuffix}"] + "suite" : "validatesRunnerBatchTests-df${pythonVersionSuffix}", + "collect": "it_validatesrunner and not no_sickbay_batch" + ] if (project.hasProperty('useRunnerV2')) { argMap.put("runner_v2", "true") @@ -158,19 +161,45 @@ task validatesRunnerStreamingTests { def dataflowWorkerJar = project(":runners:google-cloud-dataflow-java:worker").shadowJar.archivePath // TODO(BEAM-3544,BEAM-5025): Disable tests with 'sickbay-streaming' tag. - def testOpts = basicTestOpts + ["--attr=ValidatesRunner,!sickbay-streaming"] - def argMap = ["test_opts": testOpts, + // Execute tests with xdists + doFirst { + def argMap = [ + "test_opts": basicPytestOpts + ["--numprocesses=8"], "streaming": "true", "sdk_location": files(configurations.distTarBall.files).singleFile, "worker_jar": dataflowWorkerJar, - "suite": "validatesRunnerStreamingTests-df${pythonVersionSuffix}"] - if (project.hasProperty('useRunnerV2')) { - argMap.put("runner_v2", "true") - // KMS is not supported for streaming engine. 
- argMap.put("kms_key_name", "\"\"") + "suite": "validatesRunnerStreamingTests-df${pythonVersionSuffix}-xdist", + "collect": "it_validatesrunner and not no_sickbay_streaming and not no_xdist" + ] + if (project.hasProperty('useRunnerV2')) { + argMap.put("runner_v2", "true") + // KMS is not supported for streaming engine. + argMap.put("kms_key_name", "\"\"") + } + + def cmdArgs = mapToArgString(argMap) + exec { + executable 'sh' + args '-c', ". ${envdir}/bin/activate && ${runScriptsDir}/run_integration_test.sh $cmdArgs" + } } + // Execute tests that fail with xdists doLast { + def argMap = [ + "test_opts": basicPytestOpts, + "streaming": "true", + "sdk_location": files(configurations.distTarBall.files).singleFile, + "worker_jar": dataflowWorkerJar, + "suite": "validatesRunnerStreamingTests-df${pythonVersionSuffix}-noxdist", + "collect": "it_validatesrunner and not no_sickbay_streaming and no_xdist" + ] + if (project.hasProperty('useRunnerV2')) { + argMap.put("runner_v2", "true") + // KMS is not supported for streaming engine. + argMap.put("kms_key_name", "\"\"") + } + def cmdArgs = mapToArgString(argMap) exec { executable 'sh' @@ -185,15 +214,14 @@ task runPerformanceTest(type: Exec) { def test = project.findProperty('test') def suite = "runPerformanceTest-df${pythonVersionSuffix}" - def xUnitFile ="nosetests-${suite}.xml" + def xUnitFile ="pytest-${suite}.xml" def testOpts = project.findProperty('test-pipeline-options') testOpts += " --sdk_location=${files(configurations.distTarBall.files).singleFile}" setWorkingDir "${project.rootDir}/sdks/python" - commandLine 'sh', '-c', ". ${envdir}/bin/activate && ${envdir}/bin/python setup.py nosetests" + - " --tests=${test} --test-pipeline-options=\"${testOpts}\" --with-xunitmp" + - " --xunitmp-file=${xUnitFile}" + commandLine 'sh', '-c', ". ${envdir}/bin/activate && pytest -o junit_suite_name=${suite}" + + " ${test} --test-pipeline-options=\"${testOpts}\" --junitxml=${xUnitFile} --timeout=1800" } task mongodbioIT { @@ -237,3 +265,19 @@ task chicagoTaxiExample { } } } + +task validatesContainer() { + def pyversion = "${project.ext.pythonVersion.replace('.', '')}" + dependsOn 'installGcpTest' + dependsOn ':sdks:python:sdist' + dependsOn ":sdks:python:container:py${pyversion}:docker" + def runScriptsPath = "${rootDir}/sdks/python/container/run_validatescontainer.sh" + doLast { + exec { + executable 'sh' + args '-c', ". ${envdir}/bin/activate && cd ${rootDir} && ${runScriptsPath} " + + "python${pyversion} " + + "${files(configurations.distTarBall.files).singleFile}" + } + } +} diff --git a/sdks/python/test-suites/direct/common.gradle b/sdks/python/test-suites/direct/common.gradle index 42bba4c9487d..3680bfae2eb8 100644 --- a/sdks/python/test-suites/direct/common.gradle +++ b/sdks/python/test-suites/direct/common.gradle @@ -21,9 +21,11 @@ def pythonContainerVersion = project.ext.pythonVersion == '2.7' ? '2' : project. def runScriptsDir = "${rootDir}/sdks/python/scripts" // Basic test options for ITs running on Jenkins. def basicTestOpts = [ - "--nocapture", // print stdout instantly - "--processes=8", // run tests in parallel - "--process-timeout=4500", // timeout of whole command execution + "--capture=no", // print stdout instantly + "--numprocesses=8", // run tests in parallel + "--timeout=4500", // timeout of whole command execution + "--color=yes", // console color + "--log-cli-level=INFO" //log level info ] task postCommitIT { @@ -32,25 +34,21 @@ task postCommitIT { // Run IT tests with TestDirectRunner in batch in Python 3. 
doLast { def batchTests = [ - "apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it", - "apache_beam.io.gcp.pubsub_integration_test:PubSubIntegrationTest", - "apache_beam.io.gcp.big_query_query_to_table_it_test:BigQueryQueryToTableIT", - "apache_beam.io.gcp.bigquery_io_read_it_test", - "apache_beam.io.gcp.bigquery_read_it_test", - "apache_beam.io.gcp.bigquery_write_it_test", - "apache_beam.io.gcp.datastore.v1new.datastore_write_it_test", - "apache_beam.io.gcp.experimental.spannerio_read_it_test", - "apache_beam.io.gcp.experimental.spannerio_write_it_test", - ] - def testOpts = [ - "--tests=${batchTests.join(',')}", - "--nocapture", // Print stdout instantly - "--processes=8", // run tests in parallel - "--process-timeout=4500", // timeout of whole command execution + "apache_beam/examples/wordcount_it_test.py::WordCountIT::test_wordcount_it", + "apache_beam/io/gcp/pubsub_integration_test.py::PubSubIntegrationTest", + "apache_beam/io/gcp/big_query_query_to_table_it_test.py::BigQueryQueryToTableIT", + "apache_beam/io/gcp/bigquery_io_read_it_test.py", + "apache_beam/io/gcp/bigquery_read_it_test.py", + "apache_beam/io/gcp/bigquery_write_it_test.py", + "apache_beam/io/gcp/datastore/v1new/datastore_write_it_test.py", + "apache_beam/io/gcp/experimental/spannerio_read_it_test.py", + "apache_beam/io/gcp/experimental/spannerio_write_it_test.py", ] + def testOpts = basicTestOpts + ["${batchTests.join(' ')}"] def argMap = ["runner": "TestDirectRunner", "test_opts": testOpts, - "suite": "postCommitIT-direct-py${pythonVersionSuffix}"] + "suite": "postCommitIT-direct-py${pythonVersionSuffix}", + ] def batchCmdArgs = mapToArgString(argMap) exec { executable 'sh' @@ -97,18 +95,19 @@ task directRunnerIT { // Run IT tests with TestDirectRunner in batch. doLast { def tests = [ - "apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it", - "apache_beam.io.gcp.pubsub_integration_test:PubSubIntegrationTest", - "apache_beam.io.gcp.big_query_query_to_table_it_test:BigQueryQueryToTableIT", - "apache_beam.io.gcp.bigquery_io_read_it_test", - "apache_beam.io.gcp.bigquery_read_it_test", - "apache_beam.io.gcp.bigquery_write_it_test", - "apache_beam.io.gcp.datastore.v1new.datastore_write_it_test", + "apache_beam/examples/wordcount_it_test.py::WordCountIT::test_wordcount_it", + "apache_beam/io/gcp/pubsub_integration_test.py::PubSubIntegrationTest", + "apache_beam/io/gcp/big_query_query_to_table_it_test.py::BigQueryQueryToTableIT", + "apache_beam/io/gcp/bigquery_io_read_it_test.py", + "apache_beam/io/gcp/bigquery_read_it_test.py", + "apache_beam/io/gcp/bigquery_write_it_test.py", + "apache_beam/io/gcp/datastore/v1new/datastore_write_it_test.py", ] - def batchTestOpts = basicTestOpts + ["--tests=${tests.join(',')}"] + def batchTestOpts = basicTestOpts + ["${tests.join(' ')}"] def argMap = ["runner": "TestDirectRunner", "test_opts": batchTestOpts, - "suite": "directRunnerIT-batch"] + "suite": "directRunnerIT-batch", + ] def batchCmdArgs = mapToArgString(argMap) exec { executable 'sh' @@ -119,18 +118,18 @@ task directRunnerIT { // Run IT tests with TestDirectRunner in streaming. 
doLast { def tests = [ - "apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it", - "apache_beam.io.gcp.pubsub_integration_test:PubSubIntegrationTest", - "apache_beam.io.gcp.bigquery_test:BigQueryStreamingInsertTransformIntegrationTests\ -.test_multiple_destinations_transform", - "apache_beam.io.gcp.bigquery_test:PubSubBigQueryIT", - "apache_beam.io.gcp.bigquery_file_loads_test:BigQueryFileLoadsIT.test_bqfl_streaming", + "apache_beam/examples/wordcount_it_test.py::WordCountIT::test_wordcount_it", + "apache_beam/io/gcp/pubsub_integration_test.py::PubSubIntegrationTest", + "apache_beam/io/gcp/bigquery_test.py::BigQueryStreamingInsertTransformIntegrationTests::test_multiple_destinations_transform", + "apache_beam/io/gcp/bigquery_test.py::PubSubBigQueryIT", + "apache_beam/io/gcp/bigquery_file_loads_test.py::BigQueryFileLoadsIT::test_bqfl_streaming", ] - def streamingTestOpts = basicTestOpts + ["--tests=${tests.join(',')}"] + def streamingTestOpts = basicTestOpts + ["${tests.join(' ')}"] def argMap = ["runner": "TestDirectRunner", "streaming": "true", "test_opts": streamingTestOpts, - "suite": "directRunnerIT-streaming"] + "suite": "directRunnerIT-streaming", + ] def streamingCmdArgs = mapToArgString(argMap) exec { executable 'sh' diff --git a/sdks/python/test-suites/direct/xlang/build.gradle b/sdks/python/test-suites/direct/xlang/build.gradle index a68b5c05695c..0e64e58b27ef 100644 --- a/sdks/python/test-suites/direct/xlang/build.gradle +++ b/sdks/python/test-suites/direct/xlang/build.gradle @@ -26,7 +26,7 @@ project.evaluationDependsOn(":runners:portability:java") def envDir = project.project(":sdks:python").envdir def crossLanguageTestClasspath = project.project(":runners:portability:java").sourceSets.test.runtimeClasspath -def jobPort = BeamModulePlugin.startingExpansionPortNumber.getAndDecrement() +def jobPort = BeamModulePlugin.getRandomPort() def tmpDir = System.getenv("TMPDIR") ?: System.getenv("WORKSPACE") ?: "/tmp" def pidFile = "${tmpDir}/local_job_service_main-${jobPort}.pid" @@ -34,12 +34,12 @@ def setupTask = project.tasks.create(name: "fnApiJobServerSetup", type: Exec) { dependsOn ':sdks:python:installGcpTest' executable 'sh' - args '-c', ". ${envdir}/bin/activate && python -m apache_beam.runners.portability.local_job_service_main --job_port ${jobPort} --pid_file ${pidFile} --background --stdout_file ${tmpDir}/beam-fnapi-job-server.log" + args '-c', ". ${envDir}/bin/activate && python -m apache_beam.runners.portability.local_job_service_main --job_port ${jobPort} --pid_file ${pidFile} --background --stdout_file ${tmpDir}/beam-fnapi-job-server.log" } def cleanupTask = project.tasks.create(name: "fnApiJobServerCleanup", type: Exec) { executable 'sh' - args '-c', ". ${envdir}/bin/activate && python -m apache_beam.runners.portability.local_job_service_main --pid_file ${pidFile} --stop" + args '-c', ". 
${envDir}/bin/activate && python -m apache_beam.runners.portability.local_job_service_main --pid_file ${pidFile} --stop" } createCrossLanguageValidatesRunnerTask( diff --git a/sdks/python/test-suites/gradle.properties b/sdks/python/test-suites/gradle.properties index 1ff61155cef8..76c237c8d807 100644 --- a/sdks/python/test-suites/gradle.properties +++ b/sdks/python/test-suites/gradle.properties @@ -24,10 +24,19 @@ dataflow_precommit_it_task_py_versions=3.6,3.7 dataflow_mongodbio_it_task_py_versions=3.6 dataflow_chicago_taxi_example_task_py_versions=3.7 +dataflow_validates_runner_batch_tests=3.6,3.7,3.8 +dataflow_validates_runner_streaming_tests=3.6,3.7,3.8 +dataflow_validates_container_tests=3.6,3.7,3.8 +# TODO: Enable following tests after making sure we have enough capacity. +dataflow_validates_runner_batch_tests_V2=3.8 +dataflow_validates_runner_streaming_tests_V2=3.8 # direct runner test-suites direct_mongodbio_it_task_py_versions=3.6 -# test-suites for portable runners +# flink runner test-suites flink_validates_runner_precommit_py_versions=3.6 flink_validates_runner_postcommit_py_versions=3.6,3.7,3.8 + +# samza runner test-suites +samza_validates_runner_postcommit_py_versions=3.6,3.7,3.8 diff --git a/sdks/python/test-suites/portable/build.gradle b/sdks/python/test-suites/portable/build.gradle index 732b5a4f376d..dbf289988874 100644 --- a/sdks/python/test-suites/portable/build.gradle +++ b/sdks/python/test-suites/portable/build.gradle @@ -25,6 +25,12 @@ task flinkValidatesRunner { } } +task samzaValidatesRunner { + getVersionsAsList('samza_validates_runner_postcommit_py_versions').each { + dependsOn.add(":sdks:python:test-suites:portable:py${getVersionSuffix(it)}:samzaValidatesRunner") + } +} + task flinkValidatesRunnerPrecommit { getVersionsAsList('flink_validates_runner_precommit_py_versions').each { dependsOn.add(":sdks:python:test-suites:portable:py${getVersionSuffix(it)}:flinkValidatesRunner") diff --git a/sdks/python/test-suites/portable/common.gradle b/sdks/python/test-suites/portable/common.gradle index d273996a7f9a..3b1654f0d65a 100644 --- a/sdks/python/test-suites/portable/common.gradle +++ b/sdks/python/test-suites/portable/common.gradle @@ -20,6 +20,7 @@ import org.apache.tools.ant.taskdefs.condition.Os def pythonRootDir = "${rootDir}/sdks/python" def pythonVersionSuffix = project.ext.pythonVersion.replace('.', '') +def latestFlinkVersion = project.ext.latestFlinkVersion ext { pythonContainerTask = ":sdks:python:container:py${pythonVersionSuffix}:docker" @@ -27,15 +28,15 @@ ext { def createFlinkRunnerTestTask(String workerType) { def taskName = "flinkCompatibilityMatrix${workerType}" - // `project(':runners:flink:1.10:job-server').shadowJar.archivePath` is not resolvable until runtime, so hard-code it here. - def jobServerJar = "${rootDir}/runners/flink/1.10/job-server/build/libs/beam-runners-flink-1.10-job-server-${version}.jar" + // project(":runners:flink:${latestFlinkVersion}:job-server").shadowJar.archivePath is not resolvable until runtime, so hard-code it here. 
+ def jobServerJar = "${rootDir}/runners/flink/${latestFlinkVersion}/job-server/build/libs/beam-runners-flink-${latestFlinkVersion}-job-server-${version}.jar" def options = "--flink_job_server_jar=${jobServerJar} --environment_type=${workerType}" if (workerType == 'PROCESS') { options += " --environment_options=process_command=${buildDir.absolutePath}/sdk_worker.sh" } def task = toxTask(taskName, 'flink-runner-test', options) // Through the Flink job server, we transitively add dependencies on the expansion services needed in tests. - task.dependsOn ':runners:flink:1.10:job-server:shadowJar' + task.dependsOn ":runners:flink:${latestFlinkVersion}:job-server:shadowJar" // The Java SDK worker is required to execute external transforms. task.dependsOn ':sdks:java:container:java8:docker' if (workerType == 'DOCKER') { @@ -57,7 +58,7 @@ task flinkValidatesRunner() { // TODO(BEAM-8598): Enable on pre-commit. task flinkTriggerTranscript() { dependsOn 'setupVirtualenv' - dependsOn ':runners:flink:1.10:job-server:shadowJar' + dependsOn ":runners:flink:${latestFlinkVersion}:job-server:shadowJar" doLast { exec { executable 'sh' @@ -65,9 +66,9 @@ task flinkTriggerTranscript() { . ${envdir}/bin/activate \\ && cd ${pythonRootDir} \\ && pip install -e .[test] \\ - && python setup.py nosetests \\ - --tests apache_beam.transforms.trigger_test:WeakTestStreamTranscriptTest \\ - --test-pipeline-options='--runner=FlinkRunner --environment_type=LOOPBACK --flink_job_server_jar=${project(":runners:flink:1.10:job-server:").shadowJar.archivePath}' + && pytest \\ + apache_beam/transforms/trigger_test.py::WeakTestStreamTranscriptTest \\ + --test-pipeline-options='--runner=FlinkRunner --environment_type=LOOPBACK --flink_job_server_jar=${project(":runners:flink:${latestFlinkVersion}:job-server:").shadowJar.archivePath}' """ } } @@ -118,16 +119,41 @@ task createProcessWorker { } } +def createSamzaRunnerTestTask(String workerType) { + def taskName = "samzaCompatibilityMatrix${workerType}" + def jobServerJar = "${rootDir}/runners/samza/job-server/build/libs/beam-runners-samza-job-server-${version}.jar" + def options = "--samza_job_server_jar=${jobServerJar} --environment_type=${workerType}" + if (workerType == 'PROCESS') { + options += " --environment_options=process_command=${buildDir.absolutePath}/sdk_worker.sh" + } + def task = toxTask(taskName, 'samza-runner-test', options) + task.dependsOn ":runners:samza:job-server:shadowJar" + if (workerType == 'DOCKER') { + task.dependsOn ext.pythonContainerTask + } else if (workerType == 'PROCESS') { + task.dependsOn 'createProcessWorker' + } + return task +} + +createSamzaRunnerTestTask('DOCKER') +createSamzaRunnerTestTask('PROCESS') +createSamzaRunnerTestTask('LOOPBACK') + +task samzaValidatesRunner() { + dependsOn 'samzaCompatibilityMatrixLOOPBACK' +} + def createSparkRunnerTestTask(String workerType) { def taskName = "sparkCompatibilityMatrix${workerType}" - // `project(':runners:spark:job-server').shadowJar.archivePath` is not resolvable until runtime, so hard-code it here. - def jobServerJar = "${rootDir}/runners/spark/job-server/build/libs/beam-runners-spark-job-server-${version}.jar" + // `project(':runners:spark:2:job-server').shadowJar.archivePath` is not resolvable until runtime, so hard-code it here. 
+ def jobServerJar = "${rootDir}/runners/spark/2/job-server/build/libs/beam-runners-spark-job-server-${version}.jar" def options = "--spark_job_server_jar=${jobServerJar} --environment_type=${workerType}" if (workerType == 'PROCESS') { options += " --environment_options=process_command=${buildDir.absolutePath}/sdk_worker.sh" } def task = toxTask(taskName, 'spark-runner-test', options) - task.dependsOn ':runners:spark:job-server:shadowJar' + task.dependsOn ':runners:spark:2:job-server:shadowJar' if (workerType == 'DOCKER') { task.dependsOn ext.pythonContainerTask } else if (workerType == 'PROCESS') { @@ -146,7 +172,7 @@ task sparkValidatesRunner() { project.task("preCommitPy${pythonVersionSuffix}") { dependsOn = [":sdks:python:container:py${pythonVersionSuffix}:docker", - ':runners:flink:1.10:job-server:shadowJar', + ":runners:flink:${latestFlinkVersion}:job-server:shadowJar", 'portableWordCountFlinkRunnerBatch', 'portableWordCountFlinkRunnerStreaming'] } @@ -154,7 +180,7 @@ project.task("preCommitPy${pythonVersionSuffix}") { project.task("postCommitPy${pythonVersionSuffix}") { dependsOn = ['setupVirtualenv', "postCommitPy${pythonVersionSuffix}IT", - ':runners:spark:job-server:shadowJar', + ':runners:spark:2:job-server:shadowJar', 'portableLocalRunnerJuliaSetWithSetupPy', 'portableWordCountSparkRunnerBatch'] } @@ -163,27 +189,37 @@ project.task("postCommitPy${pythonVersionSuffix}IT") { dependsOn = [ 'setupVirtualenv', 'installGcpTest', - ':runners:flink:1.10:job-server:shadowJar', + ":runners:flink:${latestFlinkVersion}:job-server:shadowJar", ':sdks:java:container:java8:docker', ':sdks:java:testing:kafka-service:buildTestKafkaServiceJar', + ':sdks:java:io:expansion-service:shadowJar', ':sdks:java:io:google-cloud-platform:expansion-service:shadowJar', ':sdks:java:io:kinesis:expansion-service:shadowJar', ':sdks:java:extensions:schemaio-expansion-service:shadowJar', + ':sdks:java:io:debezium:expansion-service:shadowJar' ] doLast { def tests = [ - "apache_beam.io.gcp.bigquery_read_it_test", - "apache_beam.io.external.xlang_jdbcio_it_test", - "apache_beam.io.external.xlang_kafkaio_it_test", - "apache_beam.io.external.xlang_kinesisio_it_test", - "apache_beam.io.gcp.tests.xlang_spannerio_it_test", + "apache_beam/io/gcp/bigquery_read_it_test.py", + "apache_beam/io/external/xlang_jdbcio_it_test.py", + "apache_beam/io/external/xlang_kafkaio_it_test.py", + "apache_beam/io/external/xlang_kinesisio_it_test.py", + "apache_beam/io/gcp/tests/xlang_spannerio_it_test.py", + "apache_beam/io/external/xlang_debeziumio_it_test.py", + ] + def testOpts = ["${tests.join(' ')}"] + ["--log-cli-level=INFO"] + def pipelineOpts = [ + "--runner=FlinkRunner", + "--project=apache-beam-testing", + "--environment_type=LOOPBACK", + "--temp_location=gs://temp-storage-for-end-to-end-tests/temp-it", + "--flink_job_server_jar=${project(":runners:flink:${latestFlinkVersion}:job-server").shadowJar.archivePath}", ] - def testOpts = ["--tests=${tests.join(',')}"] def cmdArgs = mapToArgString([ "test_opts": testOpts, "suite": "postCommitIT-flink-py${pythonVersionSuffix}", - "pipeline_opts": "--runner=FlinkRunner --project=apache-beam-testing --environment_type=LOOPBACK --temp_location=gs://temp-storage-for-end-to-end-tests/temp-it", + "pipeline_opts": pipelineOpts.join(" "), ]) def kafkaJar = project(":sdks:java:testing:kafka-service:").buildTestKafkaServiceJar.archivePath exec { @@ -218,20 +254,20 @@ def addTestJavaJarCreator(String runner, Task jobServerJarTask) { } // TODO(BEAM-11333) Update and test multiple Flink versions. 
-addTestJavaJarCreator("FlinkRunner", tasks.getByPath(":runners:flink:1.10:job-server:shadowJar")) -addTestJavaJarCreator("SparkRunner", tasks.getByPath(":runners:spark:job-server:shadowJar")) +addTestJavaJarCreator("FlinkRunner", tasks.getByPath(":runners:flink:${latestFlinkVersion}:job-server:shadowJar")) +addTestJavaJarCreator("SparkRunner", tasks.getByPath(":runners:spark:2:job-server:shadowJar")) def addTestFlinkUberJar(boolean saveMainSession) { project.tasks.create(name: "testUberJarFlinkRunner${saveMainSession ? 'SaveMainSession' : ''}") { - dependsOn ':runners:flink:1.10:job-server:shadowJar' - dependsOn ':runners:flink:1.10:job-server:miniCluster' + dependsOn ":runners:flink:${latestFlinkVersion}:job-server:shadowJar" + dependsOn ":runners:flink:${latestFlinkVersion}:job-server:miniCluster" dependsOn pythonContainerTask doLast{ exec { executable "sh" def options = [ - "--flink_job_server_jar ${tasks.getByPath(':runners:flink:1.10:job-server:shadowJar').archivePath}", - "--flink_mini_cluster_jar ${tasks.getByPath(':runners:flink:1.10:job-server:miniCluster').archivePath}", + "--flink_job_server_jar ${tasks.getByPath(":runners:flink:${latestFlinkVersion}:job-server:shadowJar").archivePath}", + "--flink_mini_cluster_jar ${tasks.getByPath(":runners:flink:${latestFlinkVersion}:job-server:miniCluster").archivePath}", "--env_dir ${project.rootProject.buildDir}/gradleenv/${project.path.hashCode()}", "--python_root_dir ${project.rootDir}/sdks/python", "--python_version ${project.ext.pythonVersion}", diff --git a/sdks/python/test-suites/tox/common.gradle b/sdks/python/test-suites/tox/common.gradle index 62c5511f5e18..af74725bfe5f 100644 --- a/sdks/python/test-suites/tox/common.gradle +++ b/sdks/python/test-suites/tox/common.gradle @@ -34,7 +34,8 @@ project.task("preCommitPy${pythonVersionSuffix}") { // Generates coverage reports only once, in Py38, to remove duplicated work if (pythonVersionSuffix.equals('38')) { dependsOn = ["testPy38CloudCoverage", "testPy38Cython", - "testPy38pyarrow-0", "testPy38pyarrow-1", "testPy38pyarrow-2"] + "testPy38pyarrow-0", "testPy38pyarrow-1", "testPy38pyarrow-2", + "testPy38pyarrow-3", "testPy38pyarrow-4"] } else { dependsOn = ["testPy${pythonVersionSuffix}Cloud", "testPy${pythonVersionSuffix}Cython"] } diff --git a/sdks/python/test-suites/tox/py38/build.gradle b/sdks/python/test-suites/tox/py38/build.gradle index 68bba18103a1..8497d573bc68 100644 --- a/sdks/python/test-suites/tox/py38/build.gradle +++ b/sdks/python/test-suites/tox/py38/build.gradle @@ -37,9 +37,13 @@ testPy38Cython.mustRunAfter testPython38, testPy38CloudCoverage toxTask "testPy38pyarrow-0", "py38-pyarrow-0" toxTask "testPy38pyarrow-1", "py38-pyarrow-1" toxTask "testPy38pyarrow-2", "py38-pyarrow-2" +toxTask "testPy38pyarrow-3", "py38-pyarrow-3" +toxTask "testPy38pyarrow-4", "py38-pyarrow-4" test.dependsOn "testPy38pyarrow-0" test.dependsOn "testPy38pyarrow-1" test.dependsOn "testPy38pyarrow-2" +test.dependsOn "testPy38pyarrow-3" +test.dependsOn "testPy38pyarrow-4" toxTask "whitespacelint", "whitespacelint" diff --git a/sdks/python/test_config.py b/sdks/python/test_config.py deleted file mode 100644 index a8c80a25de60..000000000000 --- a/sdks/python/test_config.py +++ /dev/null @@ -1,51 +0,0 @@ -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. 
-# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -"""Test configurations for nose - -This module contains nose plugin hooks that configures Beam tests which -includes ValidatesRunner test and E2E integration test. - -TODO(BEAM-3713): Remove this module once nose is removed. -""" - -from __future__ import absolute_import - -from nose.plugins import Plugin - - -class BeamTestPlugin(Plugin): - """A nose plugin for Beam testing that registers command line options - - This plugin is registered through setuptools in entry_points. - """ - - def options(self, parser, env): - """Add '--test-pipeline-options' and '--not_use-test-runner-api' - to command line option to avoid unrecognized option error thrown by nose. - - The value of this option will be processed by TestPipeline and used to - build customized pipeline for ValidatesRunner tests. - """ - parser.add_option('--test-pipeline-options', - action='store', - type=str, - help='providing pipeline options to run tests on runner') - parser.add_option('--not-use-test-runner-api', - action='store_true', - default=False, - help='whether not to use test-runner-api') diff --git a/sdks/python/tox.ini b/sdks/python/tox.ini index ba62c564ddc5..a5ed2db6bbaf 100644 --- a/sdks/python/tox.ini +++ b/sdks/python/tox.ini @@ -62,69 +62,19 @@ commands_post = bash {toxinidir}/scripts/run_tox_cleanup.sh commands = false {envname} is misconfigured -[testenv:py36] +[testenv:py{36,37,38}] commands = python apache_beam/examples/complete/autocomplete_test.py {toxinidir}/scripts/run_pytest.sh {envname} "{posargs}" -[testenv:py37] -commands = - python apache_beam/examples/complete/autocomplete_test.py - {toxinidir}/scripts/run_pytest.sh {envname} "{posargs}" - -[testenv:py38] -commands = - python apache_beam/examples/complete/autocomplete_test.py - {toxinidir}/scripts/run_pytest.sh {envname} "{posargs}" - -[testenv:py36-win] +[testenv:py{36,37,38}-win] commands = python apache_beam/examples/complete/autocomplete_test.py bash {toxinidir}/scripts/run_pytest.sh {envname} "{posargs}" install_command = {envbindir}/python.exe {envbindir}/pip.exe install --retries 10 {opts} {packages} list_dependencies_command = {envbindir}/python.exe {envbindir}/pip.exe freeze -[testenv:py37-win] -commands = - python apache_beam/examples/complete/autocomplete_test.py - bash {toxinidir}/scripts/run_pytest.sh {envname} "{posargs}" -install_command = {envbindir}/python.exe {envbindir}/pip.exe install --retries 10 {opts} {packages} -list_dependencies_command = {envbindir}/python.exe {envbindir}/pip.exe freeze - -[testenv:py38-win] -commands = - python apache_beam/examples/complete/autocomplete_test.py - bash {toxinidir}/scripts/run_pytest.sh {envname} "{posargs}" -install_command = {envbindir}/python.exe {envbindir}/pip.exe install --retries 10 {opts} {packages} -list_dependencies_command = {envbindir}/python.exe {envbindir}/pip.exe freeze - -[testenv:py36-cython] -# cython tests are only expected to work in linux (2.x and 3.x) -# If we want to add other platforms in the 
future, it should be: -# `platform = linux2|darwin|...` -# See https://docs.python.org/2/library/sys.html#sys.platform for platform codes -platform = linux -commands = - # TODO(BEAM-8954): Remove this build_ext invocation once local source no longer - # shadows the installed apache_beam. - python setup.py build_ext --inplace - python apache_beam/examples/complete/autocomplete_test.py - {toxinidir}/scripts/run_pytest.sh {envname} "{posargs}" - -[testenv:py37-cython] -# cython tests are only expected to work in linux (2.x and 3.x) -# If we want to add other platforms in the future, it should be: -# `platform = linux2|darwin|...` -# See https://docs.python.org/2/library/sys.html#sys.platform for platform codes -platform = linux -commands = - # TODO(BEAM-8954): Remove this build_ext invocation once local source no longer - # shadows the installed apache_beam. - python setup.py build_ext --inplace - python apache_beam/examples/complete/autocomplete_test.py - {toxinidir}/scripts/run_pytest.sh {envname} "{posargs}" - -[testenv:py38-cython] +[testenv:py{36,37,38}-cython] # cython tests are only expected to work in linux (2.x and 3.x) # If we want to add other platforms in the future, it should be: # `platform = linux2|darwin|...` @@ -137,17 +87,7 @@ commands = python apache_beam/examples/complete/autocomplete_test.py {toxinidir}/scripts/run_pytest.sh {envname} "{posargs}" -[testenv:py36-cloud] -extras = test,gcp,interactive,aws,azure -commands = - {toxinidir}/scripts/run_pytest.sh {envname} "{posargs}" - -[testenv:py37-cloud] -extras = test,gcp,interactive,aws,azure -commands = - {toxinidir}/scripts/run_pytest.sh {envname} "{posargs}" - -[testenv:py38-cloud] +[testenv:py{36,37,38}-cloud] extras = test,gcp,interactive,aws,azure commands = {toxinidir}/scripts/run_pytest.sh {envname} "{posargs}" @@ -200,7 +140,7 @@ commands = [testenv:py38-docs] extras = test,gcp,docs,interactive deps = - Sphinx==3.0.3 + Sphinx==1.8.5 sphinx_rtd_theme==0.4.3 commands = time {toxinidir}/scripts/generate_pydoc.sh @@ -283,28 +223,27 @@ extras = test commands = {toxinidir}/scripts/pytest_validates_runner.sh {envname} {toxinidir}/apache_beam/runners/portability/flink_runner_test.py {posargs} -[testenv:spark-runner-test] +[testenv:samza-runner-test] extras = test commands = - {toxinidir}/scripts/pytest_validates_runner.sh {envname} {toxinidir}/apache_beam/runners/portability/spark_runner_test.py {posargs} + {toxinidir}/scripts/pytest_validates_runner.sh {envname} {toxinidir}/apache_beam/runners/portability/samza_runner_test.py {posargs} -[testenv:py38-pyarrow-0] -deps = - pyarrow>=0.15.1,<0.18.0 -commands = - /bin/sh -c "pip freeze | grep pyarrow==0" - {toxinidir}/scripts/run_pytest.sh {envname} {toxinidir}/apache_beam/io/parquetio_test.py - -[testenv:py38-pyarrow-1] -deps = - pyarrow>=1,<2 +[testenv:spark-runner-test] +extras = test commands = - /bin/sh -c "pip freeze | grep pyarrow==1" - {toxinidir}/scripts/run_pytest.sh {envname} {toxinidir}/apache_beam/io/parquetio_test.py + {toxinidir}/scripts/pytest_validates_runner.sh {envname} {toxinidir}/apache_beam/runners/portability/spark_runner_test.py {posargs} -[testenv:py38-pyarrow-2] +[testenv:py{36,37,38}-pyarrow-{0,1,2,3,4}] deps = - pyarrow>=2,<3 -commands = - /bin/sh -c "pip freeze | grep pyarrow==2" - {toxinidir}/scripts/run_pytest.sh {envname} {toxinidir}/apache_beam/io/parquetio_test.py + 0: pyarrow>=0.15.1,<0.18.0 + 1: pyarrow>=1,<2 + 2: pyarrow>=2,<3 + # ARROW-11450,BEAM-11731 + # pyarrow <3 doesn't work with 1.20.0, but doesn't restrict the bounds + {0,1,2}: 
numpy<1.20.0 + 3: pyarrow>=3,<4 + 4: pyarrow>=4,<5 +commands = + # Log pyarrow and numpy version for debugging + /bin/sh -c "pip freeze | grep -E '(pyarrow|numpy)'" + {toxinidir}/scripts/run_pytest.sh {envname} '-m uses_pyarrow' diff --git a/settings.gradle b/settings.gradle deleted file mode 100644 index 8ddbcce7a938..000000000000 --- a/settings.gradle +++ /dev/null @@ -1,222 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * License); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an AS IS BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - - -plugins { - id "com.gradle.enterprise" version "3.4.1" apply false -} - - -// Plugins which require online access should not be enabled when running in offline mode. -if (!gradle.startParameter.isOffline()) { - apply plugin: "com.gradle.enterprise" -} - -// JENKINS_HOME and BUILD_ID set automatically during Jenkins execution -def isJenkinsBuild = ['JENKINS_HOME', 'BUILD_ID'].every System.&getenv -// GITHUB_REPOSITORY and GITHUB_RUN_ID set automatically during Github Actions run -def isGithubActionsBuild = ['GITHUB_REPOSITORY', 'GITHUB_RUN_ID'].every System.&getenv -if (isJenkinsBuild || isGithubActionsBuild) { - gradleEnterprise { - buildScan { - // Build Scan enabled and TOS accepted for Jenkins lab build. This does not apply to builds on - // non-Jenkins machines. Developers need to separately enable and accept TOS to use build scans. 
- termsOfServiceUrl = 'https://gradle.com/terms-of-service' - termsOfServiceAgree = 'yes' - publishAlways() - } - } -} - -rootProject.name = "beam" - -include ":release" -include ":release:go-licenses:go" -include ":release:go-licenses:java" -include ":release:go-licenses:py" - -include ":examples:java" -include ":examples:kotlin" -include ":model:fn-execution" -include ":model:job-management" -include ":model:pipeline" -include ":runners:core-construction-java" -include ":runners:core-java" -include ":runners:direct-java" -include ":runners:extensions-java:metrics" -/* Begin Flink Runner related settings */ -// Flink 1.8 -include ":runners:flink:1.8" -include ":runners:flink:1.8:job-server" -include ":runners:flink:1.8:job-server-container" -// Flink 1.9 -include ":runners:flink:1.9" -include ":runners:flink:1.9:job-server" -include ":runners:flink:1.9:job-server-container" -// Flink 1.10 -include ":runners:flink:1.10" -include ":runners:flink:1.10:job-server" -include ":runners:flink:1.10:job-server-container" -// Flink 1.11 -include ":runners:flink:1.11" -include ":runners:flink:1.11:job-server" -include ":runners:flink:1.11:job-server-container" -// Flink 1.12 -include ":runners:flink:1.12" -include ":runners:flink:1.12:job-server" -include ":runners:flink:1.12:job-server-container" -/* End Flink Runner related settings */ -include ":runners:twister2" -include ":runners:google-cloud-dataflow-java" -include ":runners:google-cloud-dataflow-java:examples" -include ":runners:google-cloud-dataflow-java:examples-streaming" -include ":runners:java-fn-execution" -include ":runners:java-job-service" -include ":runners:jet" -include ":runners:local-java" -include ":runners:portability:java" -include ":runners:spark" -include ":runners:spark:job-server" -include ":runners:spark:job-server:container" -include ":runners:samza" -include ":runners:samza:job-server" -include ":sdks:go" -include ":sdks:go:container" -include ":sdks:go:examples" -include ":sdks:go:test" -include ":sdks:go:test:load" -include ":sdks:java:bom" -include ":sdks:java:build-tools" -include ":sdks:java:container" -include ":sdks:java:container:java8" -include ":sdks:java:container:java11" -include ":sdks:java:core" -include ":sdks:java:expansion-service" -include ":sdks:java:extensions:euphoria" -include ":sdks:java:extensions:kryo" -include ":sdks:java:extensions:google-cloud-platform-core" -include ":sdks:java:extensions:jackson" -include ":sdks:java:extensions:join-library" -include ":sdks:java:extensions:ml" -include ":sdks:java:extensions:protobuf" -include ":sdks:java:extensions:schemaio-expansion-service" -include ":sdks:java:extensions:sketching" -include ":sdks:java:extensions:sorter" -include ":sdks:java:extensions:sql" -include ":sdks:java:extensions:sql:perf-tests" -include ":sdks:java:extensions:sql:jdbc" -include ":sdks:java:extensions:sql:shell" -include ":sdks:java:extensions:sql:hcatalog" -include ":sdks:java:extensions:sql:datacatalog" -include ":sdks:java:extensions:sql:zetasql" -include ":sdks:java:extensions:sql:expansion-service" -include ":sdks:java:extensions:zetasketch" -include ":sdks:java:fn-execution" -include ":sdks:java:harness" -include ":sdks:java:io:amazon-web-services" -include ":sdks:java:io:amazon-web-services2" -include ":sdks:java:io:amqp" -include ":sdks:java:io:azure" -include ":sdks:java:io:cassandra" -include ":sdks:java:io:clickhouse" -include ":sdks:java:io:common" -include ":sdks:java:io:contextualtextio" -include ":sdks:java:io:elasticsearch" -include 
":sdks:java:io:elasticsearch-tests:elasticsearch-tests-2" -include ":sdks:java:io:elasticsearch-tests:elasticsearch-tests-5" -include ":sdks:java:io:elasticsearch-tests:elasticsearch-tests-6" -include ":sdks:java:io:elasticsearch-tests:elasticsearch-tests-7" -include ":sdks:java:io:elasticsearch-tests:elasticsearch-tests-common" -include ":sdks:java:io:expansion-service" -include ":sdks:java:io:file-based-io-tests" -include ':sdks:java:io:bigquery-io-perf-tests' -include ":sdks:java:io:google-cloud-platform" -include ":sdks:java:io:google-cloud-platform:expansion-service" -include ":sdks:java:io:hadoop-common" -include ":sdks:java:io:hadoop-file-system" -include ":sdks:java:io:hadoop-format" -include ":sdks:java:io:hbase" -include ":sdks:java:io:hcatalog" -include ":sdks:java:io:jdbc" -include ":sdks:java:io:jms" -include ":sdks:java:io:kafka" -include ":sdks:java:io:kinesis" -include ":sdks:java:io:kinesis:expansion-service" -include ":sdks:java:io:kudu" -include ":sdks:java:io:mongodb" -include ":sdks:java:io:mqtt" -include ":sdks:java:io:parquet" -include ":sdks:java:io:rabbitmq" -include ":sdks:java:io:redis" -include ":sdks:java:io:solr" -include ":sdks:java:io:snowflake" -include ":sdks:java:io:snowflake:expansion-service" -include ":sdks:java:io:splunk" -include ":sdks:java:io:thrift" -include ":sdks:java:io:tika" -include ":sdks:java:io:xml" -include ":sdks:java:io:synthetic" -include ":sdks:java:io:influxdb" -include ":sdks:java:javadoc" -include ":sdks:java:maven-archetypes:examples" -include ":sdks:java:maven-archetypes:starter" -include ":sdks:java:testing:nexmark" -include ":sdks:java:testing:expansion-service" -include ":sdks:java:testing:jpms-tests" -include ":sdks:java:testing:kafka-service" -include ":sdks:java:testing:load-tests" -include ":sdks:java:testing:test-utils" -include ":sdks:java:testing:tpcds" -include ":sdks:python" -include ":sdks:python:apache_beam:testing:load_tests" -include ":sdks:python:container" -include ":sdks:python:container:py36" -include ":sdks:python:container:py37" -include ":sdks:python:container:py38" -include ":sdks:python:test-suites:dataflow" -include ":sdks:python:test-suites:dataflow:py36" -include ":sdks:python:test-suites:dataflow:py37" -include ":sdks:python:test-suites:dataflow:py38" -include ":sdks:python:test-suites:direct" -include ":sdks:python:test-suites:direct:py36" -include ":sdks:python:test-suites:direct:py37" -include ":sdks:python:test-suites:direct:py38" -include ":sdks:python:test-suites:direct:xlang" -include ":sdks:python:test-suites:portable:py36" -include ":sdks:python:test-suites:portable:py37" -include ":sdks:python:test-suites:portable:py38" -include ":sdks:python:test-suites:tox:pycommon" -include ":sdks:python:test-suites:tox:py36" -include ":sdks:python:test-suites:tox:py37" -include ":sdks:python:test-suites:tox:py38" -include ":vendor:grpc-1_26_0" -include ":vendor:bytebuddy-1_10_8" -include ":vendor:calcite-1_20_0" -include ":vendor:guava-26_0-jre" -include ":vendor:sdks-java-extensions-protobuf" -include ":website" -include ":runners:google-cloud-dataflow-java:worker:legacy-worker" -include ":runners:google-cloud-dataflow-java:worker" -include ":runners:google-cloud-dataflow-java:worker:windmill" -// no dots allowed for project paths -include "beam-test-infra-metrics" -project(":beam-test-infra-metrics").dir = file(".test-infra/metrics") -include "beam-test-tools" -project(":beam-test-tools").dir = file(".test-infra/tools") -include "beam-test-jenkins" -project(":beam-test-jenkins").dir = 
file(".test-infra/jenkins") diff --git a/settings.gradle.kts b/settings.gradle.kts new file mode 100644 index 000000000000..d9d45933b226 --- /dev/null +++ b/settings.gradle.kts @@ -0,0 +1,228 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + + +plugins { + id("com.gradle.enterprise") version "3.4.1" apply false +} + + +// Plugins which require online access should not be enabled when running in offline mode. +if (!gradle.startParameter.isOffline) { + apply(plugin = "com.gradle.enterprise") +} + +// JENKINS_HOME and BUILD_ID set automatically during Jenkins execution +val isJenkinsBuild = arrayOf("JENKINS_HOME", "BUILD_ID").all { System.getenv(it) != null } +// GITHUB_REPOSITORY and GITHUB_RUN_ID set automatically during Github Actions run +val isGithubActionsBuild = arrayOf("GITHUB_REPOSITORY", "GITHUB_RUN_ID").all { System.getenv(it) != null } +if (isJenkinsBuild || isGithubActionsBuild) { + gradleEnterprise { + buildScan { + // Build Scan enabled and TOS accepted for Jenkins lab build. This does not apply to builds on + // non-Jenkins machines. Developers need to separately enable and accept TOS to use build scans. 
+ termsOfServiceUrl = "https://gradle.com/terms-of-service" + termsOfServiceAgree = "yes" + publishAlways() + } + } +} + +rootProject.name = "beam" + +include(":release") +include(":release:go-licenses:go") +include(":release:go-licenses:java") +include(":release:go-licenses:py") + +include(":examples:java") +include(":examples:java:twitter") +include(":examples:kotlin") +include(":model:fn-execution") +include(":model:job-management") +include(":model:pipeline") +include(":runners:core-construction-java") +include(":runners:core-java") +include(":runners:direct-java") +include(":runners:extensions-java:metrics") +/* Begin Flink Runner related settings */ +// Flink 1.11 +include(":runners:flink:1.11") +include(":runners:flink:1.11:job-server") +include(":runners:flink:1.11:job-server-container") +// Flink 1.12 +include(":runners:flink:1.12") +include(":runners:flink:1.12:job-server") +include(":runners:flink:1.12:job-server-container") +// Flink 1.13 +include(":runners:flink:1.13") +include(":runners:flink:1.13:job-server") +include(":runners:flink:1.13:job-server-container") +/* End Flink Runner related settings */ +include(":runners:twister2") +include(":runners:google-cloud-dataflow-java") +include(":runners:google-cloud-dataflow-java:examples") +include(":runners:google-cloud-dataflow-java:examples-streaming") +include(":runners:java-fn-execution") +include(":runners:java-job-service") +include(":runners:jet") +include(":runners:local-java") +include(":runners:portability:java") +include(":runners:spark:2") +include(":runners:spark:2:job-server") +include(":runners:spark:2:job-server:container") +include(":runners:spark:3") +include(":runners:spark:3:job-server") +include(":runners:spark:3:job-server:container") +include(":runners:samza") +include(":runners:samza:job-server") +include(":sdks:go") +include(":sdks:go:container") +include(":sdks:go:examples") +include(":sdks:go:test") +include(":sdks:go:test:load") +include(":sdks:java:bom") +include(":sdks:java:bom:gcp") +include(":sdks:java:build-tools") +include(":sdks:java:container") +include(":sdks:java:container:java8") +include(":sdks:java:container:java11") +include(":sdks:java:core") +include(":sdks:java:expansion-service") +include(":sdks:java:extensions:arrow") +include(":sdks:java:extensions:euphoria") +include(":sdks:java:extensions:kryo") +include(":sdks:java:extensions:google-cloud-platform-core") +include(":sdks:java:extensions:jackson") +include(":sdks:java:extensions:join-library") +include(":sdks:java:extensions:ml") +include(":sdks:java:extensions:protobuf") +include(":sdks:java:extensions:schemaio-expansion-service") +include(":sdks:java:extensions:sketching") +include(":sdks:java:extensions:sorter") +include(":sdks:java:extensions:sql") +include(":sdks:java:extensions:sql:payloads") +include(":sdks:java:extensions:sql:perf-tests") +include(":sdks:java:extensions:sql:jdbc") +include(":sdks:java:extensions:sql:shell") +include(":sdks:java:extensions:sql:hcatalog") +include(":sdks:java:extensions:sql:datacatalog") +include(":sdks:java:extensions:sql:zetasql") +include(":sdks:java:extensions:sql:expansion-service") +include(":sdks:java:extensions:sql:udf") +include(":sdks:java:extensions:sql:udf-test-provider") +include(":sdks:java:extensions:zetasketch") +include(":sdks:java:fn-execution") +include(":sdks:java:harness") +include(":sdks:java:io:amazon-web-services") +include(":sdks:java:io:amazon-web-services2") +include(":sdks:java:io:amqp") +include(":sdks:java:io:azure") +include(":sdks:java:io:cassandra") 
+include(":sdks:java:io:clickhouse") +include(":sdks:java:io:common") +include(":sdks:java:io:contextualtextio") +include(":sdks:java:io:debezium") +include(":sdks:java:io:debezium:expansion-service") +include(":sdks:java:io:elasticsearch") +include(":sdks:java:io:elasticsearch-tests:elasticsearch-tests-2") +include(":sdks:java:io:elasticsearch-tests:elasticsearch-tests-5") +include(":sdks:java:io:elasticsearch-tests:elasticsearch-tests-6") +include(":sdks:java:io:elasticsearch-tests:elasticsearch-tests-7") +include(":sdks:java:io:elasticsearch-tests:elasticsearch-tests-common") +include(":sdks:java:io:expansion-service") +include(":sdks:java:io:file-based-io-tests") +include(":sdks:java:io:bigquery-io-perf-tests") +include(":sdks:java:io:google-cloud-platform") +include(":sdks:java:io:google-cloud-platform:expansion-service") +include(":sdks:java:io:hadoop-common") +include(":sdks:java:io:hadoop-file-system") +include(":sdks:java:io:hadoop-format") +include(":sdks:java:io:hbase") +include(":sdks:java:io:hcatalog") +include(":sdks:java:io:jdbc") +include(":sdks:java:io:jms") +include(":sdks:java:io:kafka") +include(":sdks:java:io:kinesis") +include(":sdks:java:io:kinesis:expansion-service") +include(":sdks:java:io:kudu") +include(":sdks:java:io:mongodb") +include(":sdks:java:io:mqtt") +include(":sdks:java:io:parquet") +include(":sdks:java:io:rabbitmq") +include(":sdks:java:io:redis") +include(":sdks:java:io:solr") +include(":sdks:java:io:snowflake") +include(":sdks:java:io:snowflake:expansion-service") +include(":sdks:java:io:splunk") +include(":sdks:java:io:thrift") +include(":sdks:java:io:tika") +include(":sdks:java:io:xml") +include(":sdks:java:io:synthetic") +include(":sdks:java:io:influxdb") +include(":sdks:java:javadoc") +include(":sdks:java:maven-archetypes:examples") +include(":sdks:java:maven-archetypes:gcp-bom-examples") +include(":sdks:java:maven-archetypes:starter") +include(":sdks:java:testing:nexmark") +include(":sdks:java:testing:expansion-service") +include(":sdks:java:testing:jpms-tests") +include(":sdks:java:testing:kafka-service") +include(":sdks:java:testing:load-tests") +include(":sdks:java:testing:test-utils") +include(":sdks:java:testing:tpcds") +include(":sdks:java:testing:watermarks") +include(":sdks:python") +include(":sdks:python:apache_beam:testing:load_tests") +include(":sdks:python:container") +include(":sdks:python:container:py36") +include(":sdks:python:container:py37") +include(":sdks:python:container:py38") +include(":sdks:python:test-suites:dataflow") +include(":sdks:python:test-suites:dataflow:py36") +include(":sdks:python:test-suites:dataflow:py37") +include(":sdks:python:test-suites:dataflow:py38") +include(":sdks:python:test-suites:direct") +include(":sdks:python:test-suites:direct:py36") +include(":sdks:python:test-suites:direct:py37") +include(":sdks:python:test-suites:direct:py38") +include(":sdks:python:test-suites:direct:xlang") +include(":sdks:python:test-suites:portable:py36") +include(":sdks:python:test-suites:portable:py37") +include(":sdks:python:test-suites:portable:py38") +include(":sdks:python:test-suites:tox:pycommon") +include(":sdks:python:test-suites:tox:py36") +include(":sdks:python:test-suites:tox:py37") +include(":sdks:python:test-suites:tox:py38") +include(":vendor:grpc-1_36_0") +include(":vendor:bytebuddy-1_11_0") +include(":vendor:calcite-1_26_0") +include(":vendor:guava-26_0-jre") +include(":website") +include(":runners:google-cloud-dataflow-java:worker:legacy-worker") +include(":runners:google-cloud-dataflow-java:worker") 
+include(":runners:google-cloud-dataflow-java:worker:windmill") +// no dots allowed for project paths +include("beam-test-infra-metrics") +project(":beam-test-infra-metrics").projectDir = file(".test-infra/metrics") +include("beam-test-tools") +project(":beam-test-tools").projectDir = file(".test-infra/tools") +include("beam-test-jenkins") +project(":beam-test-jenkins").projectDir = file(".test-infra/jenkins") +include("beam-validate-runner") +project(":beam-validate-runner").projectDir = file(".test-infra/validate-runner") diff --git a/start-build-env.sh b/start-build-env.sh index 52cebe45455f..cd5ca72e7305 100755 --- a/start-build-env.sh +++ b/start-build-env.sh @@ -37,9 +37,18 @@ USER_ID=$(id -u "${USER_NAME}") if [ "$(uname -s)" = "Darwin" ]; then GROUP_ID=100 + if (dscl . -read /Groups/docker 2>/dev/null); then + DOCKER_GROUP_ID=$(dscl . -read /Groups/docker| awk '($1 == "PrimaryGroupID:") { print $2 }') + else + # if Docker post-install steps to manage as non-root user not performed - will use dummy gid + DOCKER_GROUP_ID=1000 + echo "`tput setaf 3`Please take a look post-install steps, to manage Docker as non-root user" + echo "`tput setaf 2`https://docs.docker.com/engine/install/linux-postinstall`tput sgr0`" + fi fi if [ "$(uname -s)" = "Linux" ]; then + DOCKER_GROUP_ID=$(getent group docker | cut -d':' -f3) GROUP_ID=$(id -g "${USER_NAME}") # man docker-run # When using SELinux, mounted directories may not be accessible @@ -72,8 +81,6 @@ fi # Set the home directory in the Docker container. DOCKER_HOME_DIR=${DOCKER_HOME_DIR:-/home/${USER_NAME}} -DOCKER_GROUP_ID=$(getent group docker | cut -d':' -f3) - docker build -t "beam-build-${USER_ID}" - < "/etc/sudoers.d/beam-build-${USER_ID}" ENV HOME "${DOCKER_HOME_DIR}" -ENV GOPATH ${HOME}/beam/sdks/go/examples/.gogradle/project_gopath - +ENV GOPATH ${DOCKER_HOME_DIR}/beam/sdks/go/examples/.gogradle/project_gopath # This next command still runs as root causing the ~/.cache/go-build to be owned by root RUN go get github.com/linkedin/goavro -RUN chown -R ${USER_NAME}:${GROUP_ID} ${HOME}/.cache +RUN chown -R ${USER_NAME}:${GROUP_ID} ${DOCKER_HOME_DIR}/.cache UserSpecificDocker echo "" diff --git a/vendor/README.md b/vendor/README.md index c4b38cd78756..3bef746d3c2e 100644 --- a/vendor/README.md +++ b/vendor/README.md @@ -31,8 +31,50 @@ The upgrading of the vendored dependencies should be performed in two steps: The [linkage tool](https://lists.apache.org/thread.html/eb5d95b9a33d7e32dc9bcd0f7d48ba8711d42bd7ed03b9cf0f1103f1%40%3Cdev.beam.apache.org%3E) is useful for the vendored dependency upgrades. It reports the linkage errors across multiple Apache Beam artifact ids. 
-For example, when we upgrade the version of gRPC to 1.26.0 and the version of the vendored gRPC is 0.1-SNAPSHOT, +For example, when we upgrade the version of gRPC to 1.36.0 and the version of the vendored gRPC is 0.1-SNAPSHOT, we could run the linkage tool as following: + ``` -./gradlew -PvendoredDependenciesOnly -Ppublishing -PjavaLinkageArtifactIds=beam-vendor-grpc-1_26_0:0.1-SNAPSHOT :checkJavaLinkage +# The check task depends on shadowJar task +$ ./gradlew :vendor:grpc-1_36_0:check +$ find vendor/grpc-1_36_0/build -name '*.jar' +vendor/grpc-1_36_0/build/libs/beam-vendor-grpc-1_36_0-0.1.jar +$ mvn install:install-file \ + -Dpackaging=jar \ + -DgroupId=org.apache.beam \ + -DartifactId=beam-vendor-grpc-1_36_0 \ + -Dversion=0.1 \ + -Dfile=vendor/grpc-1_36_0/build/libs/beam-vendor-grpc-1_36_0-0.1.jar +$ ./gradlew -PvendoredDependenciesOnly -Ppublishing -PjavaLinkageArtifactIds=beam-vendor-grpc-1_36_0:0.1 :checkJavaLinkage ``` + +## Known Linkage Errors in the Vendored gRPC Dependencies + +It's expected that the task outputs some linkage errors. +While the `checkJavaLinkage` task does not retrieve optional dependencies to avoid bloated +dependency trees, Netty (one of gRPC dependencies) has various optional features through optional +dependencies. +Therefore the task outputs the linkage errors on the references to missing classes in the optional +dependencies when applied for the vendored gRPC artifact. + +As long as Beam's use of gRPC does not touch these optional Netty features or the classes are +available at runtime, it's fine to have the +references to the missing classes. Here are the known linkage errors: + +- References to `org.junit.runners`: `io.grpc.testing.GrpcCleanupRule` uses JUnit classes, which are + present when we run Beam's tests. +- References from `io.netty.handler.ssl`: Netty users can choose SSL implementation based + on the platform ([Netty documentation](https://netty.io/wiki/forked-tomcat-native.html#wiki-h2-4)). + Beam's vendored gRPC uses `netty-tcnative-boringssl-static`, which contains the static libraries + for all supported OS architectures (x86_64 and aarch64). + The `io.netty.handler.ssl` package has classes that have references to missing classes in other + unused optional SSL implementations. +- References from `io.netty.handler.codec.compression`: Beam does not use the optional dependencies + for compression algorithms (jzlib, lzma, and lzf) through Netty's features. +- References to `com.google.protobuf.nano` and `org.jboss.marshalling`: Beam does not use the + optional serialization algorithms. +- References from `io.netty.util.internal.logging`: Netty's logging framework can choose available + loggers at runtime. The logging implementations are optional dependencies and thus are not needed + to be included in the vendored artifact. Slf4j-api is available at Beam's runtime. +- References to `reactor.blockhound`: When enabled, Netty's BlockHound integration can detect + unexpected blocking calls. Beam does not use it. diff --git a/vendor/bytebuddy-1_10_8/build.gradle b/vendor/bytebuddy-1_11_0/build.gradle.kts similarity index 63% rename from vendor/bytebuddy-1_10_8/build.gradle rename to vendor/bytebuddy-1_11_0/build.gradle.kts index f255a24a1a02..b55c4edd8198 100644 --- a/vendor/bytebuddy-1_10_8/build.gradle +++ b/vendor/bytebuddy-1_11_0/build.gradle.kts @@ -16,22 +16,24 @@ * limitations under the License. 
*/ -plugins { id 'org.apache.beam.vendor-java' } +plugins { id("org.apache.beam.vendor-java") } -description = "Apache Beam :: Vendored Dependencies :: ByteBuddy :: 1.10.8" +description = "Apache Beam :: Vendored Dependencies :: ByteBuddy :: 1.11.0" group = "org.apache.beam" version = "0.1" +val vendorJava = project.extensions.extraProperties.get("vendorJava") as groovy.lang.Closure<*> vendorJava( - dependencies: ["net.bytebuddy:byte-buddy:1.10.8"], - relocations: [ - "net.bytebuddy": "org.apache.beam.vendor.bytebuddy.v1_10_8.net.bytebuddy" - ], - exclusions: [ - "**/module-info.class" - ], - groupId: group, - artifactId: "beam-vendor-bytebuddy-1_10_8", - version: version + mapOf( + "dependencies" to listOf("net.bytebuddy:byte-buddy:1.11.0"), + "relocations" to mapOf( + "net.bytebuddy" to "org.apache.beam.vendor.bytebuddy.v1_11_0.net.bytebuddy"), + "exclusions" to listOf( + "**/module-info.class" + ), + "groupId" to group, + "artifactId" to "beam-vendor-bytebuddy-1_11_0", + "version" to version + ) ) diff --git a/vendor/calcite-1_20_0/build.gradle b/vendor/calcite-1_26_0/build.gradle similarity index 50% rename from vendor/calcite-1_20_0/build.gradle rename to vendor/calcite-1_26_0/build.gradle index 085d822eebf3..2231b081bed1 100644 --- a/vendor/calcite-1_20_0/build.gradle +++ b/vendor/calcite-1_26_0/build.gradle @@ -18,29 +18,33 @@ plugins { id 'org.apache.beam.vendor-java' } -description = "Apache Beam :: Vendored Dependencies :: Calcite 1.20.0" +description = "Apache Beam :: Vendored Dependencies :: Calcite 1.26.0" group = "org.apache.beam" -version = "0.2" +version = "0.1" -def calcite_version = "1.20.0" -def avatica_version = "1.16.0" -def prefix = "org.apache.beam.vendor.calcite.v1_20_0" +def calcite_version = "1.26.0" +def avatica_version = "1.17.0" +def prefix = "org.apache.beam.vendor.calcite.v1_26_0" List packagesToRelocate = [ "com.esri", + "com.fasterxml", "com.google.common", - "com.google.thirdparty", "com.google.protobuf", - "com.fasterxml", + "com.google.thirdparty", + "com.google.uzaygezen", "com.jayway", "com.yahoo", + "net.minidev", "org.apache.calcite", "org.apache.commons", "org.apache.http", + "org.apiguardian.api", "org.codehaus", + "org.objectweb", "org.pentaho", - "org.yaml" + "org.yaml", ] vendorJava( @@ -48,17 +52,48 @@ vendorJava( "org.apache.calcite:calcite-core:$calcite_version", "org.apache.calcite:calcite-linq4j:$calcite_version", "org.apache.calcite.avatica:avatica-core:$avatica_version", - library.java.protobuf_java, - library.java.slf4j_api + ], + runtimeDependencies: [ + library.java.slf4j_api, ], relocations: packagesToRelocate.collectEntries { [ (it): "${prefix}.${it}" ] + [ "jdbc:calcite:": "jdbc:beam-vendor-calcite:"] }, exclusions: [ + // Code quality / Building annotations + "com/google/errorprone/**", + "com/google/j2objc/annotations/**", + "javax/annotation/**", + "org/checkerframework/**", + "org/jmlspecs/**", + + // Runtime logging interface "org/slf4j/**", - "**/module-info.class" + "org/apache/log4j/**", + "org/apache/logging/log4j/**", + "META-INF/versions/9/org/apache/logging/log4j/**", + + // Optional loggers + "org/apache/commons/logging/impl/AvalonLogger*", + "org/apache/commons/logging/impl/LogKitLogger*", + + // Optional JSON providers + "com/jayway/jsonpath/spi/json/GsonJsonProvider*", + "com/jayway/jsonpath/spi/json/JettisonProvider*", + "com/jayway/jsonpath/spi/json/JsonOrgJsonProvider*", + "com/jayway/jsonpath/spi/json/TapestryJsonProvider*", + "com/jayway/jsonpath/spi/mapper/GsonMappingProvider*", + 
"com/jayway/jsonpath/spi/mapper/JsonOrgMappingProvider*", + "com/jayway/jsonpath/spi/mapper/TapestryMappingProvider*", + + // Unused broken code + "org/apache/commons/dbcp2/managed/**", + "org/apache/commons/pool2/proxy/**", + "org/codehaus/janino/AntCompilerAdapter*", + + "**/module-info.class", ], groupId: group, - artifactId: "beam-vendor-calcite-1_20_0", + artifactId: "beam-vendor-calcite-1_26_0", version: version, ) diff --git a/vendor/grpc-1_26_0/build.gradle b/vendor/grpc-1_36_0/build.gradle similarity index 71% rename from vendor/grpc-1_26_0/build.gradle rename to vendor/grpc-1_36_0/build.gradle index 88c622f7db74..b4971fde7d67 100644 --- a/vendor/grpc-1_26_0/build.gradle +++ b/vendor/grpc-1_36_0/build.gradle @@ -16,22 +16,22 @@ * limitations under the License. */ -import org.apache.beam.gradle.GrpcVendoring_1_26_0 +import org.apache.beam.gradle.GrpcVendoring_1_36_0 plugins { id 'org.apache.beam.vendor-java' } -description = "Apache Beam :: Vendored Dependencies :: gRPC :: 1.26.0" +description = "Apache Beam :: Vendored Dependencies :: gRPC :: 1.36.0" group = "org.apache.beam" -version = "0.3" +version = "0.2" vendorJava( - dependencies: GrpcVendoring_1_26_0.dependencies(), - runtimeDependencies: GrpcVendoring_1_26_0.runtimeDependencies(), - testDependencies: GrpcVendoring_1_26_0.testDependencies(), - relocations: GrpcVendoring_1_26_0.relocations(), - exclusions: GrpcVendoring_1_26_0.exclusions(), - artifactId: "beam-vendor-grpc-1_26_0", + dependencies: GrpcVendoring_1_36_0.dependencies(), + runtimeDependencies: GrpcVendoring_1_36_0.runtimeDependencies(), + testDependencies: GrpcVendoring_1_36_0.testDependencies(), + relocations: GrpcVendoring_1_36_0.relocations(), + exclusions: GrpcVendoring_1_36_0.exclusions(), + artifactId: "beam-vendor-grpc-1_36_0", groupId: group, version: version, ) diff --git a/vendor/sdks-java-extensions-protobuf/build.gradle b/vendor/sdks-java-extensions-protobuf/build.gradle deleted file mode 100644 index 1aaefb166ca0..000000000000 --- a/vendor/sdks-java-extensions-protobuf/build.gradle +++ /dev/null @@ -1,62 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * License); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an AS IS BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -import org.apache.beam.gradle.GrpcVendoring_1_26_0 - -plugins { id 'org.apache.beam.module' } -applyJavaNature( - automaticModuleName: 'org.apache.beam.vendor.sdks.java.extensions.protobuf', - exportJavadoc: false, - shadowClosure: { - dependencies { - include(dependency("com.google.guava:guava:${GrpcVendoring_1_26_0.guava_version}")) - include(dependency("com.google.protobuf:protobuf-java:${GrpcVendoring_1_26_0.protobuf_version}")) - } - // We specifically relocate beam-sdks-extensions-protobuf under a vendored namespace - // but also vendor guava and protobuf to the same vendored namespace as the model/* - // implementations allowing the artifacts to encode/decode vendored byte strings and - // vendored protobuf messages - relocate "org.apache.beam.sdk.extensions.protobuf", "org.apache.beam.vendor.sdk.v2.sdk.extensions.protobuf" - - // guava uses the com.google.common and com.google.thirdparty package namespaces - relocate "com.google.common", "org.apache.beam.vendor.grpc.v1p26p0.com.google.common" - relocate "com.google.thirdparty", "org.apache.beam.vendor.grpc.v1p26p0.com.google.thirdparty" - - relocate "com.google.protobuf", "org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf" - }, -) - -description = "Apache Beam :: Vendored Dependencies :: SDKs :: Java :: Extensions :: Protobuf" -ext.summary = "Add support to Apache Beam for Vendored Google Protobuf." - -/* - * We need to rely on manually specifying these evaluationDependsOn to ensure that - * the following projects are evaluated before we evaluate this project. This is because - * we are attempting to reference the "sourceSets.main.java.srcDirs" directly. - */ -evaluationDependsOn(":sdks:java:extensions:protobuf") - -compileJava { - source project(":sdks:java:extensions:protobuf").sourceSets.main.java.srcDirs -} - -dependencies { - compile "com.google.guava:guava:${GrpcVendoring_1_26_0.guava_version}" - compile "com.google.protobuf:protobuf-java:${GrpcVendoring_1_26_0.protobuf_version}" - shadow project(path: ":sdks:java:core", configuration: "shadow") -} diff --git a/website/.gitignore b/website/.gitignore deleted file mode 100644 index bbed591c748a..000000000000 --- a/website/.gitignore +++ /dev/null @@ -1,23 +0,0 @@ -_site -.doctrees -.sass-cache -.jekyll-metadata -vendor/ -.bundle/ -content/ - -# Ignore IntelliJ files. -.idea/ -*.iml -*.ipr -*.iws - -# Hugo -www/node_modules -www/dist -www/site/resources -www/site/code_samples -www/site/_config_branch_repo.toml -www/yarn-error.log -!www/site/content - diff --git a/website/CONTRIBUTE.md b/website/CONTRIBUTE.md index 13fa561c677d..b2023f4d7b4d 100644 --- a/website/CONTRIBUTE.md +++ b/website/CONTRIBUTE.md @@ -184,6 +184,33 @@ To have a programming language tab switcher, for instance of java, python and go The purpose is to switch languages of codeblocks. +You can also provide language tabs without the language switcher widget. To do so, place `{{< highlight >}}` shortcodes next to each other, like this: + +``` +{{< highlight java >}} +// Java code here... +{{< /highlight >}} +{{< highlight py >}} +# Python code here... +{{< /highlight >}} +``` + +Note that the `{{< highlight >}}` blocks should be directly adjacent to each other, without an extra return between them. + +Do NOT do this: + +``` +{{< highlight java >}} +// Java code here... +{{< /highlight >}} + +{{< highlight py >}} +# Python code here... +{{< /highlight >}} +``` + +In some circumstances, the Hugo markdown parser will generate a pair of empty `
<p>
    ` tags for the extra return, and that breaks the formatting of the code tabs. TODO: Fix this issue: [BEAM-12688](https://issues.apache.org/jira/browse/BEAM-12688). + ### Code highlighting To be consistent, please prefer to use `{{< highlight >}}` syntax instead of ` ``` `, for code-blocks or syntax-highlighters. diff --git a/website/README.md b/website/README.md index 8a2d7a648e72..d5f9a6f8c397 100644 --- a/website/README.md +++ b/website/README.md @@ -24,9 +24,7 @@ https://beam.apache.org/. ## About -The Beam website is built using [Hugo](https://gohugo.io/). Additionally, -for additional formatting capabilities, this website uses -[Twitter Bootstrap](https://getbootstrap.com/). +The Beam website is built using [Hugo](https://gohugo.io/) and the Hugo theme [Docsy](https://www.docsy.dev/). For additional formatting capabilities, this website uses [Twitter Bootstrap](https://getbootstrap.com/). Documentation generated from source code, such as Javadoc and Pydoc, is stored separately on the [beam-site @@ -63,4 +61,30 @@ https://beam.apache.org/. ## Contribution guide -If you'd like to contribute to the Apache Beam website, read our [contribution guide](CONTRIBUTE.md) where you can find detailed instructions on how to work with the website. \ No newline at end of file +If you'd like to contribute to the Apache Beam website, read our [contribution guide](CONTRIBUTE.md) where you can find detailed instructions on how to work with the website. + +## Additional resources + +If you're developing the site, you should know a little bit about Hugo and Docsy. The following external resources will help you get up and running: + +- [Directory Structure](https://gohugo.io/getting-started/directory-structure/) +- [Adding Content](https://www.docsy.dev/docs/adding-content/content/) +- [Shortcodes](https://gohugo.io/content-management/shortcodes/) +- [Introduction to Hugo Templating](https://gohugo.io/templates/introduction/) +- [Partial Templates](https://gohugo.io/templates/partials/) + +## Troubleshooting + +### Hugo server does not reload static files + +The Hugo dev server waits for changes in site content, static files, configuration, and other resources. On change, the server rebuilds and reloads the site in your browser. If you're making changes to static files, and those changes are detected by the server but don't appear in the browser, you may have a caching issue. + +You can tell that the server has detected a change by looking at the output. For example, if you make a change to **website/www/site/static/js/section-nav.js**, you should see something like: + +``` +Change of Static files detected, rebuilding site. +2021-07-16 15:25:29.730 +0000 +Syncing js/section-nav.js to / +``` + +If the change does not appear in the browser, even after a hard refresh, try disabling the cache. For example, to disable the cache in Chrome, open dev tools, select the Network tab, and check the box labeled "Disable cache". diff --git a/website/build.gradle b/website/build.gradle index 6ea9e2ab28ed..5f6bd5a5ba2a 100644 --- a/website/build.gradle +++ b/website/build.gradle @@ -272,6 +272,8 @@ task commitWebsite { def latestCommit = grgit.log(maxCommits: 1)[0].abbreviatedId shell "git fetch --force origin +asf-site:asf-site" + shell "git stash" + shell "git checkout asf-site" git.checkout(branch: 'asf-site') // Delete the previous content. These are asf-site branch paths. 
diff --git a/website/www/package.json b/website/www/package.json index 186858fb7a8a..cd336b6b8eaa 100644 --- a/website/www/package.json +++ b/website/www/package.json @@ -3,7 +3,7 @@ "version": "1.0.0", "description": "Apache Beam website", "repository": "apache/beam", - "license": "MIT", + "license": "Apache-2.0", "scripts": { "build_code_samples": "./build_code_samples.sh", "develop": "cd site && hugo server", diff --git a/website/www/site/assets/icons/calendar-icon.svg b/website/www/site/assets/icons/calendar-icon.svg new file mode 100644 index 000000000000..1aa91dd4d3f8 --- /dev/null +++ b/website/www/site/assets/icons/calendar-icon.svg @@ -0,0 +1,23 @@ + + + + + + diff --git a/website/www/site/assets/icons/close-icon.svg b/website/www/site/assets/icons/close-icon.svg new file mode 100644 index 000000000000..7a07561031f1 --- /dev/null +++ b/website/www/site/assets/icons/close-icon.svg @@ -0,0 +1,22 @@ + + + + + diff --git a/website/www/site/assets/icons/commercial-icon.svg b/website/www/site/assets/icons/commercial-icon.svg new file mode 100644 index 000000000000..916237de46cc --- /dev/null +++ b/website/www/site/assets/icons/commercial-icon.svg @@ -0,0 +1,29 @@ + + + + + + + + + + + + diff --git a/website/www/site/assets/icons/community/bee-icon.svg b/website/www/site/assets/icons/community/bee-icon.svg new file mode 100644 index 000000000000..d04da17d1006 --- /dev/null +++ b/website/www/site/assets/icons/community/bee-icon.svg @@ -0,0 +1,38 @@ + + + + + + + + + + + + + + + + + + + + + diff --git a/website/www/site/assets/icons/community/box-icon.svg b/website/www/site/assets/icons/community/box-icon.svg new file mode 100644 index 000000000000..572896e70ec2 --- /dev/null +++ b/website/www/site/assets/icons/community/box-icon.svg @@ -0,0 +1,22 @@ + + + + + diff --git a/website/www/site/assets/icons/community/calendar-icon.svg b/website/www/site/assets/icons/community/calendar-icon.svg new file mode 100644 index 000000000000..18229ffe4836 --- /dev/null +++ b/website/www/site/assets/icons/community/calendar-icon.svg @@ -0,0 +1,23 @@ + + + + + + diff --git a/website/www/site/assets/icons/community/contact-us/bug.svg b/website/www/site/assets/icons/community/contact-us/bug.svg new file mode 100644 index 000000000000..62267024ea72 --- /dev/null +++ b/website/www/site/assets/icons/community/contact-us/bug.svg @@ -0,0 +1,22 @@ + + + + + diff --git a/website/www/site/assets/icons/community/contact-us/discussion.svg b/website/www/site/assets/icons/community/contact-us/discussion.svg new file mode 100644 index 000000000000..dc9c94e95d50 --- /dev/null +++ b/website/www/site/assets/icons/community/contact-us/discussion.svg @@ -0,0 +1,23 @@ + + + + + + diff --git a/website/www/site/assets/icons/community/contact-us/gitArrows.svg b/website/www/site/assets/icons/community/contact-us/gitArrows.svg new file mode 100644 index 000000000000..30a857d94adc --- /dev/null +++ b/website/www/site/assets/icons/community/contact-us/gitArrows.svg @@ -0,0 +1,23 @@ + + + + + + diff --git a/website/www/site/assets/icons/community/contact-us/knot.svg b/website/www/site/assets/icons/community/contact-us/knot.svg new file mode 100644 index 000000000000..e47af8cd8ca2 --- /dev/null +++ b/website/www/site/assets/icons/community/contact-us/knot.svg @@ -0,0 +1,22 @@ + + + + + diff --git a/website/www/site/assets/icons/community/contact-us/messages.svg b/website/www/site/assets/icons/community/contact-us/messages.svg new file mode 100644 index 000000000000..43806ca5bc3a --- /dev/null +++ 
b/website/www/site/assets/icons/community/contact-us/messages.svg @@ -0,0 +1,22 @@ + + + + + diff --git a/website/www/site/assets/icons/community/contact-us/notification.svg b/website/www/site/assets/icons/community/contact-us/notification.svg new file mode 100644 index 000000000000..3b2976ad87ff --- /dev/null +++ b/website/www/site/assets/icons/community/contact-us/notification.svg @@ -0,0 +1,22 @@ + + + + + diff --git a/website/www/site/assets/icons/community/contact-us/question-mark.svg b/website/www/site/assets/icons/community/contact-us/question-mark.svg new file mode 100644 index 000000000000..bb230303a16b --- /dev/null +++ b/website/www/site/assets/icons/community/contact-us/question-mark.svg @@ -0,0 +1,23 @@ + + + + + + diff --git a/website/www/site/assets/icons/community/envelope-icon.svg b/website/www/site/assets/icons/community/envelope-icon.svg new file mode 100644 index 000000000000..6ced3b9edd59 --- /dev/null +++ b/website/www/site/assets/icons/community/envelope-icon.svg @@ -0,0 +1,23 @@ + + + + + + diff --git a/website/www/site/assets/icons/community/message-icon.svg b/website/www/site/assets/icons/community/message-icon.svg new file mode 100644 index 000000000000..466af8040595 --- /dev/null +++ b/website/www/site/assets/icons/community/message-icon.svg @@ -0,0 +1,23 @@ + + + + + + diff --git a/website/www/site/assets/icons/contributor/become a commiter/beam-logo-icon.svg b/website/www/site/assets/icons/contributor/become a commiter/beam-logo-icon.svg new file mode 100644 index 000000000000..587667d29cc9 --- /dev/null +++ b/website/www/site/assets/icons/contributor/become a commiter/beam-logo-icon.svg @@ -0,0 +1,46 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/website/www/site/assets/icons/contributor/become a commiter/code-icon.svg b/website/www/site/assets/icons/contributor/become a commiter/code-icon.svg new file mode 100644 index 000000000000..338acf8e8de1 --- /dev/null +++ b/website/www/site/assets/icons/contributor/become a commiter/code-icon.svg @@ -0,0 +1,22 @@ + + + + + diff --git a/website/www/site/assets/icons/contributor/become a commiter/diamond-icon.svg b/website/www/site/assets/icons/contributor/become a commiter/diamond-icon.svg new file mode 100644 index 000000000000..50930d56bf94 --- /dev/null +++ b/website/www/site/assets/icons/contributor/become a commiter/diamond-icon.svg @@ -0,0 +1,39 @@ + + + + + + + + + + + + + + + + + + + + + + diff --git a/website/www/site/assets/icons/contributor/become a commiter/file-icon.svg b/website/www/site/assets/icons/contributor/become a commiter/file-icon.svg new file mode 100644 index 000000000000..a198565b15a4 --- /dev/null +++ b/website/www/site/assets/icons/contributor/become a commiter/file-icon.svg @@ -0,0 +1,23 @@ + + + + + + diff --git a/website/www/site/assets/icons/contributor/become a commiter/messages-icon.svg b/website/www/site/assets/icons/contributor/become a commiter/messages-icon.svg new file mode 100644 index 000000000000..0f710d68e9f6 --- /dev/null +++ b/website/www/site/assets/icons/contributor/become a commiter/messages-icon.svg @@ -0,0 +1,28 @@ + + + + + + + + + + + diff --git a/website/www/site/assets/icons/contributor/become a commiter/tool-icon.svg b/website/www/site/assets/icons/contributor/become a commiter/tool-icon.svg new file mode 100644 index 000000000000..46e30c82ab1a --- /dev/null +++ b/website/www/site/assets/icons/contributor/become a commiter/tool-icon.svg @@ -0,0 +1,22 @@ + + + + + diff --git a/website/www/site/assets/icons/contributor/become a 
committer/beam-logo-icon.svg b/website/www/site/assets/icons/contributor/become a committer/beam-logo-icon.svg new file mode 100644 index 000000000000..587667d29cc9 --- /dev/null +++ b/website/www/site/assets/icons/contributor/become a committer/beam-logo-icon.svg @@ -0,0 +1,46 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/website/www/site/assets/icons/contributor/become a committer/code-icon.svg b/website/www/site/assets/icons/contributor/become a committer/code-icon.svg new file mode 100644 index 000000000000..338acf8e8de1 --- /dev/null +++ b/website/www/site/assets/icons/contributor/become a committer/code-icon.svg @@ -0,0 +1,22 @@ + + + + + diff --git a/website/www/site/assets/icons/contributor/become a committer/diamond-icon.svg b/website/www/site/assets/icons/contributor/become a committer/diamond-icon.svg new file mode 100644 index 000000000000..50930d56bf94 --- /dev/null +++ b/website/www/site/assets/icons/contributor/become a committer/diamond-icon.svg @@ -0,0 +1,39 @@ + + + + + + + + + + + + + + + + + + + + + + diff --git a/website/www/site/assets/icons/contributor/become a committer/file-icon.svg b/website/www/site/assets/icons/contributor/become a committer/file-icon.svg new file mode 100644 index 000000000000..a198565b15a4 --- /dev/null +++ b/website/www/site/assets/icons/contributor/become a committer/file-icon.svg @@ -0,0 +1,23 @@ + + + + + + diff --git a/website/www/site/assets/icons/contributor/become a committer/messages-icon.svg b/website/www/site/assets/icons/contributor/become a committer/messages-icon.svg new file mode 100644 index 000000000000..0f710d68e9f6 --- /dev/null +++ b/website/www/site/assets/icons/contributor/become a committer/messages-icon.svg @@ -0,0 +1,28 @@ + + + + + + + + + + + diff --git a/website/www/site/assets/icons/contributor/become a committer/tool-icon.svg b/website/www/site/assets/icons/contributor/become a committer/tool-icon.svg new file mode 100644 index 000000000000..46e30c82ab1a --- /dev/null +++ b/website/www/site/assets/icons/contributor/become a committer/tool-icon.svg @@ -0,0 +1,22 @@ + + + + + diff --git a/website/www/site/assets/icons/documentation/runners/beam-icon.svg b/website/www/site/assets/icons/documentation/runners/beam-icon.svg new file mode 100644 index 000000000000..a28c653d9d2b --- /dev/null +++ b/website/www/site/assets/icons/documentation/runners/beam-icon.svg @@ -0,0 +1,28 @@ + + + + + + + + + + + diff --git a/website/www/site/assets/icons/documentation/runners/dataflow-icon.svg b/website/www/site/assets/icons/documentation/runners/dataflow-icon.svg new file mode 100644 index 000000000000..8bf98f238451 --- /dev/null +++ b/website/www/site/assets/icons/documentation/runners/dataflow-icon.svg @@ -0,0 +1,28 @@ + + + + + + + + + + + diff --git a/website/www/site/assets/icons/documentation/runners/flink-icon.svg b/website/www/site/assets/icons/documentation/runners/flink-icon.svg new file mode 100644 index 000000000000..40c77f112dc1 --- /dev/null +++ b/website/www/site/assets/icons/documentation/runners/flink-icon.svg @@ -0,0 +1,28 @@ + + + + + + + + + + + diff --git a/website/www/site/assets/icons/documentation/runners/nemo-icon.svg b/website/www/site/assets/icons/documentation/runners/nemo-icon.svg new file mode 100644 index 000000000000..d6ddbc517578 --- /dev/null +++ b/website/www/site/assets/icons/documentation/runners/nemo-icon.svg @@ -0,0 +1,28 @@ + + + + + + + + + + + diff --git a/website/www/site/assets/icons/documentation/runners/samza-icon.svg 
b/website/www/site/assets/icons/documentation/runners/samza-icon.svg new file mode 100644 index 000000000000..4627df1be7bf --- /dev/null +++ b/website/www/site/assets/icons/documentation/runners/samza-icon.svg @@ -0,0 +1,28 @@ + + + + + + + + + + + diff --git a/website/www/site/assets/icons/documentation/runners/spark-icon.svg b/website/www/site/assets/icons/documentation/runners/spark-icon.svg new file mode 100644 index 000000000000..61ee67f363ef --- /dev/null +++ b/website/www/site/assets/icons/documentation/runners/spark-icon.svg @@ -0,0 +1,28 @@ + + + + + + + + + + + diff --git a/website/www/site/assets/icons/documentation/sdks/go-icon.svg b/website/www/site/assets/icons/documentation/sdks/go-icon.svg new file mode 100644 index 000000000000..46f43820bb3a --- /dev/null +++ b/website/www/site/assets/icons/documentation/sdks/go-icon.svg @@ -0,0 +1,28 @@ + + + + + + + + + + + diff --git a/website/www/site/assets/icons/documentation/sdks/java-icon.svg b/website/www/site/assets/icons/documentation/sdks/java-icon.svg new file mode 100644 index 000000000000..a2991e307ca3 --- /dev/null +++ b/website/www/site/assets/icons/documentation/sdks/java-icon.svg @@ -0,0 +1,28 @@ + + + + + + + + + + + diff --git a/website/www/site/assets/icons/documentation/sdks/python-icon.svg b/website/www/site/assets/icons/documentation/sdks/python-icon.svg new file mode 100644 index 000000000000..f539aa3a62aa --- /dev/null +++ b/website/www/site/assets/icons/documentation/sdks/python-icon.svg @@ -0,0 +1,28 @@ + + + + + + + + + + + diff --git a/website/www/site/assets/icons/edit-icon.svg b/website/www/site/assets/icons/edit-icon.svg new file mode 100644 index 000000000000..0832bba986b7 --- /dev/null +++ b/website/www/site/assets/icons/edit-icon.svg @@ -0,0 +1,22 @@ + + + + + diff --git a/website/www/site/assets/icons/extensible-icon.svg b/website/www/site/assets/icons/extensible-icon.svg new file mode 100644 index 000000000000..02ba07c0ccd1 --- /dev/null +++ b/website/www/site/assets/icons/extensible-icon.svg @@ -0,0 +1,26 @@ + + + + + + + + + diff --git a/website/www/site/assets/icons/github-icon.svg b/website/www/site/assets/icons/github-icon.svg new file mode 100644 index 000000000000..ba80c8261586 --- /dev/null +++ b/website/www/site/assets/icons/github-icon.svg @@ -0,0 +1,29 @@ + + + + + + + + + + + + diff --git a/website/www/site/assets/icons/install-button-icon.svg b/website/www/site/assets/icons/install-button-icon.svg new file mode 100644 index 000000000000..39c3ffcfbd31 --- /dev/null +++ b/website/www/site/assets/icons/install-button-icon.svg @@ -0,0 +1,22 @@ + + + + + diff --git a/website/www/site/assets/icons/navbar-arrow-icon.svg b/website/www/site/assets/icons/navbar-arrow-icon.svg new file mode 100644 index 000000000000..78724e930331 --- /dev/null +++ b/website/www/site/assets/icons/navbar-arrow-icon.svg @@ -0,0 +1,23 @@ + + + + + + diff --git a/website/www/site/assets/icons/navbar-documentation-icon.svg b/website/www/site/assets/icons/navbar-documentation-icon.svg new file mode 100644 index 000000000000..11c165b0feba --- /dev/null +++ b/website/www/site/assets/icons/navbar-documentation-icon.svg @@ -0,0 +1,22 @@ + + + + + diff --git a/website/www/site/assets/icons/open-source-icon.svg b/website/www/site/assets/icons/open-source-icon.svg new file mode 100644 index 000000000000..d6afab7898e8 --- /dev/null +++ b/website/www/site/assets/icons/open-source-icon.svg @@ -0,0 +1,31 @@ + + + + + + + + + + + + + + diff --git a/website/www/site/assets/icons/open_source-icon.svg 
b/website/www/site/assets/icons/open_source-icon.svg new file mode 100644 index 000000000000..1505a32429e5 --- /dev/null +++ b/website/www/site/assets/icons/open_source-icon.svg @@ -0,0 +1,28 @@ + + + + + + + + + + + diff --git a/website/www/site/assets/icons/portable-icon.svg b/website/www/site/assets/icons/portable-icon.svg new file mode 100644 index 000000000000..77189163e141 --- /dev/null +++ b/website/www/site/assets/icons/portable-icon.svg @@ -0,0 +1,27 @@ + + + + + + + + + + diff --git a/website/www/site/assets/icons/quote-icon.svg b/website/www/site/assets/icons/quote-icon.svg new file mode 100644 index 000000000000..0a32eea254ad --- /dev/null +++ b/website/www/site/assets/icons/quote-icon.svg @@ -0,0 +1,22 @@ + + + + + diff --git a/website/www/site/assets/icons/search-icon.svg b/website/www/site/assets/icons/search-icon.svg new file mode 100644 index 000000000000..d01a19658e8e --- /dev/null +++ b/website/www/site/assets/icons/search-icon.svg @@ -0,0 +1,22 @@ + + + + + diff --git a/website/www/site/assets/icons/twitter-icon.svg b/website/www/site/assets/icons/twitter-icon.svg new file mode 100644 index 000000000000..e23e8210ae5b --- /dev/null +++ b/website/www/site/assets/icons/twitter-icon.svg @@ -0,0 +1,30 @@ + + + + + + + + + + + + + diff --git a/website/www/site/assets/icons/unified-icon.svg b/website/www/site/assets/icons/unified-icon.svg new file mode 100644 index 000000000000..6dc75d26e576 --- /dev/null +++ b/website/www/site/assets/icons/unified-icon.svg @@ -0,0 +1,26 @@ + + + + + + + + + diff --git a/website/www/site/assets/icons/youtube-icon.svg b/website/www/site/assets/icons/youtube-icon.svg new file mode 100644 index 000000000000..3cf47d8631f5 --- /dev/null +++ b/website/www/site/assets/icons/youtube-icon.svg @@ -0,0 +1,23 @@ + + + + + + diff --git a/website/www/site/assets/scss/_blog.scss b/website/www/site/assets/scss/_blog.scss new file mode 100644 index 000000000000..697cc8b84a31 --- /dev/null +++ b/website/www/site/assets/scss/_blog.scss @@ -0,0 +1,243 @@ +/*! + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + @import "media"; + + .blog-content { + //padding: 0 24px 0 24px; + margin-top: 64px; + .tf-filter-item { + display: none !important; + } + h2 { + text-align: center; + margin-bottom: 70px !important; + padding-top: 57px; + } + + .category-buttons { + display: flex; + align-items: center; + justify-content: center; + + flex-wrap: wrap-reverse; + margin: 0 auto; + margin-bottom: 80px; + + @media (max-width: $tablet) { + max-width: 100%; + flex-wrap: wrap; + } + + .category-button { + margin: 0 6px 12px 6px; + padding: 10px 20px; + border-radius: 100px; + background-color: #f6f6f6; + text-transform: uppercase; + border: 0; + font-weight: bold; + line-height: 1.14; + letter-spacing: 0.6px; + &:focus { + outline: 0; + } + &:hover { + background-color: #f2804d; + color: #fff; + } + } + + .active { + background-color: #f26628; + color: #fff; + &:hover { + background-color: #f2804d; + } + } + } + .category-buttons_center { + max-width: 872px; + margin:initial; + } + .category-buttons_poweredBy { + justify-content: flex-start; + margin-top: 84px; + } + + .restricted { + max-width: 60%; + @media (max-width: $mobile) { + max-width: 100%; + } + } + } + .load-button { + padding: 15px 51px 15px 52px; + border-radius: 100px; + background-color: #f26628; + text-transform: uppercase; + font-weight: bold; + line-height: normal; + letter-spacing: 0.39px; + color: #ffffff; + border: 0; + margin-top: 104px; + width: 184px; + height: 46px; + &:hover { + background-color: #f2804d; + } + } + + .posts-list { + display: flex; + flex-wrap: wrap; + justify-content: space-around; + + .show-item { + display: inline-block !important; + } + + @media (max-width: $tablet) { + justify-content: space-evenly; + } + .post-card { + width: 381px; + height: 468px; + padding: 24px 24px 128.9px; + border-radius: 16px; + box-shadow: 0 4px 16px 0 rgba(0, 0, 0, 0.16), + 0 4px 4px 0 rgba(0, 0, 0, 0.06); + background-color: #ffffff; + margin-bottom: 36px; + color: #333333; + overflow: hidden; + + &:hover { + text-decoration: none; + box-shadow: 0 4px 20px 0 rgba(0, 0, 0, 0.24), + 0 4px 6px 0 rgba(0, 0, 0, 0.24); + } + @media (max-width: $tablet) { + width: 327px; + height: 216px; + padding: 24px 20px; + margin-bottom: 24px; + display: flex; + flex-direction: column; + justify-content: space-between; + } + + .post-title { + font-size: 24px; + font-weight: 500; + line-height: 1.25; + letter-spacing: normal; + margin-bottom: 24px; + overflow: hidden; + -webkit-box-orient: vertical; + @media (max-width: $tablet) { + font-size: 18px; + font-weight: normal; + line-height: 1.6; + letter-spacing: 0.43px; + } + } + + .post-summary { + overflow: hidden; + line-height: 1.57; + letter-spacing: 0.43px; + display: -webkit-box; + -webkit-line-clamp: 9; + -webkit-box-orient: vertical; + @media (max-width: $tablet) { + display: none; + } + } + } + } + .post-info { + display: flex; + justify-content: space-between; + font-size: 16px; + font-weight: 500; + line-height: 1.63; + letter-spacing: 0.43px; + color: #8c8b8e; + margin-bottom: 38px; + @media (max-width: $tablet) { + margin-bottom: 0; + } + p { + text-transform: uppercase; + } + } + .post-category { + font-size: 14px; + font-weight: normal; + margin-top: 8px; + } + .post { + .post-content { + max-width: 853px; + margin: 0 auto; + margin-bottom: 120px; + padding-top: 55px; + @media (max-width: $tablet) { + padding: 35px 24px; + word-break: break-word; + margin-top: 85px; + } + } + + .post-header { + border-bottom: 2px solid rgba(255, 109, 5, 0.24); + margin-bottom: 24px; + padding-bottom: 
26px; + } + .post-card { + width: 381px; + height: 216px; + padding: 24px; + display: flex; + flex-direction: column; + justify-content: space-between; + @media (max-width: $tablet) { + width: 327px; + height: 216px; + padding: 24px 19.2px 24.7px 20px; + } + } + .post-info { + margin: 0; + p { + font-size: 14px; + font-weight: bold; + line-height: normal; + letter-spacing: 2px; + } + } + .post-title { + margin-bottom: 12px !important; + font-size: 18px !important; + line-height: 1.6 !important; + letter-spacing: 0.43px !important; + text-overflow: ellipsis; + max-height: 4.8em; + } + } diff --git a/website/www/site/assets/scss/_calendar.scss b/website/www/site/assets/scss/_calendar.scss new file mode 100644 index 000000000000..824c0312d8e9 --- /dev/null +++ b/website/www/site/assets/scss/_calendar.scss @@ -0,0 +1,292 @@ +/*! + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +@import "media"; + +.calendar { + padding: $pad-l $pad; + + .calendar-title { + @extend .component-title; + + text-align: center; + } + + .calendar-content { + display: flex; + justify-content: center; + align-items: flex-start; + margin-top: 84px; + + a { + text-decoration: none; + } + + // Left card + .calendar-card-big { + width: 100%; + max-width: 381px; + height: 468px; + border-radius: 16px; + box-shadow: 0 4px 16px 0 rgba(0, 0, 0, 0.16), + 0 4px 4px 0 rgba(0, 0, 0, 0.06); + background-color: $color-white; + padding: 32px 24px; + transition: 0.2s; + + .calendar-card-big-title { + @extend .component-large-header; + + margin-top: 48px; + margin-bottom: 24px; + } + } + + .calendar-card-big-left { + padding: 0 !important; + &:hover { + box-shadow: 0 4px 20px 0 rgba(0, 0, 0, 0.24), + 0 4px 6px 0 rgba(0, 0, 0, 0.24); + } + } + // Middle cards + .calendar-card-box { + margin: 0 37px; + min-height: 468px; + display: flex; + flex-direction: column; + justify-content: space-between; + .show-item { + display: inline-block; + } + .post-card { + width: 381px; + height: 216px; + padding: 24px; + border-radius: 16px; + box-shadow: 0 4px 16px 0 rgba(0, 0, 0, 0.16), + 0 4px 4px 0 rgba(0, 0, 0, 0.06); + background-color: #ffffff; + margin-bottom: 36px; + color: #333333; + overflow: hidden; + display: flex; + flex-direction: column; + justify-content: space-between; + @media (max-width: $tablet) { + width: 327px; + height: 216px; + padding: 24px 19.2px 24.7px 20px; + margin-bottom: 24px; + } + &:hover { + text-decoration: none; + box-shadow: 0 4px 20px 0 rgba(0, 0, 0, 0.24), + 0 4px 6px 0 rgba(0, 0, 0, 0.24); + } + + .post-title { + font-size: 18px; + font-weight: 500; + line-height: 1.6; + letter-spacing: 0.43px; + margin-bottom: 12px; + text-overflow: ellipsis; + word-wrap: break-word; + overflow: hidden; + max-height: 4.8em; + + @media (max-width: $tablet) { + font-size: 18px; + 
font-weight: normal; + line-height: 1.6; + letter-spacing: 0.43px; + } + } + + .post-summary { + overflow: hidden; + line-height: 1.57; + letter-spacing: 0.43px; + display: -webkit-box; + -webkit-line-clamp: 9; + -webkit-box-orient: vertical; + @media (max-width: $tablet) { + display: none; + } + } + } + + .calendar-card-small:hover { + box-shadow: 0 4px 20px 0 rgba(0, 0, 0, 0.24), + 0 4px 6px 0 rgba(0, 0, 0, 0.24); + } + } + + // Right card + .calendar-card-big-right { + .calendar-card-event-title { + @extend .component-header; + } + + .calendar-card-events { + margin-top: 30px; + margin-bottom: 70px; + .calendar-desktop { + @media (max-width: $tablet) { + display: none; + } + } + .calendar-mobile { + display: none; + @media (max-width: $tablet) { + display: block; + } + } + .calendar-event { + display: flex; + padding: 14px; + + .calendar-event-icon { + margin-right: 12px; + } + + .calendar-event-description { + .calendar-event-title { + @extend .component-label; + } + + .calendar-event-place { + @extend .component-tag; + } + + .calendar-event-time { + @extend .component-tag; + } + } + } + + .calendar-event:hover { + background-color: rgba(196, 196, 196, 0.16); + border-radius: 16px; + } + + :last-child { + margin-bottom: 0; + } + } + + .calendar-card-events-button { + width: 115px; + height: 36px; + border-radius: 100px; + background-color: $color-sun; + border: none; + outline: none; + float: right; + + span { + @extend .component-label; + + font-size: 14px; + color: $color-white; + } + } + + button:hover { + opacity: 0.84; + } + } + } +} + +// Category for left and middle cards +.calendar-category { + display: flex; + justify-content: space-between; + + .calendar-category-tag { + @extend .component-tag; + + font-size: 14px; + font-weight: 500; + text-transform: uppercase; + } + + .calendar-category-date { + @extend .component-tag; + } +} + +// Author for left and middle cards +.calendar-card-author { + @extend .component-tag; +} + +@media (max-width: $tablet) { + .calendar { + padding: $pad-md $pad-s; + + .calendar-content { + flex-direction: column; + align-items: center; + margin-top: 70px; + + .calendar-card-big { + max-width: 327px; + height: 356px; + padding: 32px 20px; + + .calendar-card-big-title { + margin-top: 35px; + margin-bottom: 16px; + } + } + + .calendar-card-box { + margin-bottom: 0; + margin-top: 24px; + min-height: 456px; + + .calendar-card-small { + max-width: 327px; + height: 216px; + padding: 24px 20px; + + .calendar-card-small-title { + margin-top: 30px; + margin-bottom: 10px; + width: 280px; + } + } + } + + .calendar-card-big-right { + height: 404px; + + .calendar-card-events { + margin-top: 20px; + margin-bottom: 15px; + + .calendar-event { + padding: 14px 5px; + } + } + } + } + } +} diff --git a/website/www/site/assets/scss/_capability-matrix.scss b/website/www/site/assets/scss/_capability-matrix.scss new file mode 100644 index 000000000000..4f0d16b6114f --- /dev/null +++ b/website/www/site/assets/scss/_capability-matrix.scss @@ -0,0 +1,334 @@ +/** + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ See the License for the specific language governing permissions and + limitations under the License. +*/ +@import "media"; + +.back-button { + font-size: 14px; + font-weight: 700; + padding-left: 15px; + line-height: 1.14; + letter-spacing: .6px; + color: #f26628; + text-transform: uppercase; + &:hover { + color: #f2804d; + } + i { + width: 26px; + } +} + +.information-container { + display: flex; + -webkit-box-orient: horizontal; + -webkit-box-direction: normal; + flex-direction: row; + margin-bottom: 84px; + + @media(max-width:$mobile) { + -webkit-box-orient: vertical; + -webkit-box-direction: normal; + flex-direction: column; + } + table { + margin: 0; + } + .read-tables { + table-layout: fixed; + position: relative; + th:first-of-type { + width: 22px; + min-width: 22px; + } + span { + font-weight: 500; + } + } + .second-container { + width: 270px; + margin-left: 100px; + @media(max-width:$mobile) { + margin-left: 0; + } + h5 { + margin-bottom: 22px; + } + .row { + display: -webkit-box; + display: flex; + margin-left: 0; + } + .box { + width: 71px; + height: 24px; + margin-bottom: 10px; + text-align: center; + margin-right: 10px; + } + .white { + border: solid 1px #f6f6f6; + background-color: #fff; + } + .partial { + border: solid 1px #d8d8d8; + background-color: #f9f9f9; + } + .gray { + border: solid 1px #bcbcbc; + background-color: #e1e0e0; + } + } + .border-left { + border-left: 3px solid #ff6d05; + padding: 7px; + font-size: 12px; + font-weight: 500; + line-height: 1.83; + letter-spacing: .43px; + } + .border-right { + transform: rotate(270deg); + height: 80px; + text-align: center; + padding: 0 !important; + width: 25px !important; + font-size: 12px; + font-weight: 500; + font-stretch: normal; + font-style: normal; + line-height: 1.57; + letter-spacing: .43px; + position: absolute; + top: 55px; + left: 24px; + } + .border-top { + border-top: 3px solid #ff6d05; + font-size: 14px; + font-weight: 400; + font-stretch: normal; + font-style: normal; + line-height: 1.57; + letter-spacing: .43px; + td { + width: 162px; + padding: 7px; + } + } +} +.table-container { + display: -webkit-box; + display: flex; + -webkit-box-orient: horizontal; + -webkit-box-direction: normal; + flex-direction: row; + margin-bottom: 32px; + table { + margin: 0; + } + .big-right { + max-width: 1240px; + } + .table-right { + overflow-x: scroll; + padding: 1px; + scrollbar-width: none; + &::-webkit-scrollbar { + width: 0; + background: 0 0; + } + } + .table-headers { + overflow-x: scroll; + border-spacing: 0; + border: solid 1px #d8d8d8; + border-bottom: none; + mix-blend-mode: multiply; + -ms-overflow-style: none; + scrollbar-width: none; + scroll-behavior: auto; + &::-webkit-scrollbar { + width: 0; + background: 0 0; + } + table { + height: 48px; + table-layout: fixed; + width: 710px; + th { + text-align: center; + width: 142px; + font-size: 12px; + font-weight: 500; + line-height: normal; + letter-spacing: normal; + padding-left: 10px; + @media(max-width: $mobile) { + width: 73px; + } + } + } + } + .big-headers table th { + width: 249px; + } + .table-center { + overflow-x: scroll; + border-spacing: 0; + border: solid 1px #d8d8d8; + border-collapse: separate; + transform: rotateX(180deg); + padding-bottom: 8px; + scrollbar-color: #ff6d00; + scroll-behavior: auto; + &::-webkit-scrollbar { + height: 8px; + width: 25px !important; + } + &::-webkit-scrollbar-track { + background-color: #efeded; + background-clip: content-box; + } + &::-webkit-scrollbar-thumb { + border-radius: 8px; + height: 6px !important; + 
background-color: #ff6d00; + } + table { + transform: rotateX(180deg); + border-collapse: separate; + table-layout: fixed; + width: 710px; + th, td { + width: 142px; + height: 48px; + font-size: 12px; + font-weight: 500; + line-height: normal; + letter-spacing: normal; + border: 1px solid #fff; + @media(max-width:640px) { + width: 73px; + } + } + } + } + .big-center { + table { + tr { + vertical-align: baseline; + th,td { + width: 249px; + height: 352px; + padding: 6px; + font-weight: 400; + p { + font-size: 12px; + font-weight: 500; + line-height: normal; + letter-spacing: normal; + margin: 0; + } + } + } + } + } + .table-left table { + width: 142px; + @media(max-width:640px) { + width: 73px; + } + tr:first-child th { + height: 83px; + border: none; + @media(max-width: $mobile) { + height: 119px; + } + @-moz-document url-prefix() { + height: 77px; + } + } + th { + height: 48px; + font-size: 12px; + font-weight: 500; + line-height: normal; + letter-spacing: normal; + border: solid 1px #f6f6f6; + border-right: none; + padding: 6px; + } + } + .big-left table { + text-align: start; + tr { + vertical-align: baseline; + } + th { + height: 352px; + } + } +} +#table-link { + margin-bottom: 84px; + font-size: 14px; + font-weight: 700; + line-height: 1.14; + letter-spacing: .6px; + color: #f26628; + text-transform: uppercase; + &:hover { + text-decoration: none; + color: #f2804d; + } +} + +[class*=-switcher], +.language-switcher, +nav.version-switcher, +nav.runner-switcher { + ul li{ + margin-bottom: 0 !important; + border: 1px solid #ff6d05; + border-top-left-radius: 6px; + border-top-right-radius: 6px; + border-bottom: none; + padding-left: 0 !important; + &:hover { + cursor: pointer + } + a { + border: none; + margin-right: 0; + color: #333; + &:hover { + background-color: #f2804d; + } + } + .active { + color: #fff; + a { + background-color: #ff6d05; + color: #fff; + border: none + } + } + ::before { + display: none + } + } +} +#copy:hover { + cursor: pointer +} diff --git a/website/www/site/assets/scss/_cards.sass b/website/www/site/assets/scss/_cards.sass deleted file mode 100644 index a1562d0dde1d..000000000000 --- a/website/www/site/assets/scss/_cards.sass +++ /dev/null @@ -1,69 +0,0 @@ -/*! - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -.cards - background-image: url(../images/cards_bg.svg) - background-size: cover - background-repeat: no-repeat - background-position: top - text-align: center - margin-bottom: $pad*2 - - .cards__title - +type-h2 - color: #fff - padding-top: $pad-md - margin-bottom: $pad - - .cards__body - max-width: 550px - +type-body - margin: 0 auto - - .cards__cards - margin-bottom: $pad*2 - +md - display: flex - justify-content: center - align-items: center - - .cards__cards__card - background: #fff - box-shadow: $box-shadow - max-width: 300px - margin: 0 auto $pad - padding: $pad*1.5 - +md - margin: 0 $pad/2 - - .cards__cards__card__body - margin-bottom: $pad - +type-h3 - - .cards__cards__card__user - display: flex - justify-content: center - align-items: center - - .cards__cards__card__user__icon - border-radius: 100% - background: #efefef - width: 40px - height: 40px - - .cards__cards__card__user__name - margin-left: $pad/2 diff --git a/website/www/site/assets/scss/_ctas.sass b/website/www/site/assets/scss/_ctas.sass index 325d4f1b07cc..d334a8834a1f 100644 --- a/website/www/site/assets/scss/_ctas.sass +++ b/website/www/site/assets/scss/_ctas.sass @@ -16,13 +16,59 @@ */ .ctas - text-align: center - margin: $pad-xl 0 + display: flex + justify-content: center + align-items: center + width: 100% + height: 96px + position: fixed + z-index: 1020 + bottom: 0 + box-shadow: 0 -4px 16px 0 rgba(0, 0, 0, 0.06); + background-color: #ffffff + margin-top: -5px - .ctas__ctas - &--top - margin-bottom: $pad + @media (max-width: $ak-breakpoint-lg) + flex-wrap: nowrap + width: auto + justify-content: flex-start + left: 0 + right: 0 + overflow-y: hidden + overflow-x: auto - .ctas__title - +type-h2 - margin-bottom: $pad + + &_row + margin: 0 0 0 91px + max-width: 184px + transform: translate3d(0px, 0px, 0px) + &:first-of-type + margin: 0 + @media (max-width: $ak-breakpoint-lg) + margin-left: 24px + &:first-of-type + margin-left: 24px + + &_button + width: 184px + height: 46px + border-radius: 100px + background-color: #f26628 + font-size: 14px + font-weight: bold + line-height: normal + letter-spacing: 0.39px + color: #ffffff + text-transform: uppercase + display: flex + align-items: center + justify-content: center + + img + margin-right: 12px + height: 20px + width: 20px + &:hover + text-decoration: none + color: #ffffff + background-color: #f2804d diff --git a/website/www/site/assets/scss/_footer.sass b/website/www/site/assets/scss/_footer.sass index a52ba8d115a5..ce99da118e58 100644 --- a/website/www/site/assets/scss/_footer.sass +++ b/website/www/site/assets/scss/_footer.sass @@ -17,34 +17,32 @@ .footer background: $color-dark - padding: 40px 0 - margin-top: 70px + padding-top: 40px + margin-top: 128px @at-root .body--index & margin-top: 0 color: #fff .footer__contained +contained - + padding: 0 30px a color: #fff .footer__cols display: flex justify-content: space-around + @media (max-width: $ak-breakpoint-lg) + display: block + .footer__cols__col__logos + padding-right: 120px + @media (max-width: $ak-breakpoint-lg) + display: flex .footer__cols__col padding: 5px box-sizing: border-box - &:first-child - +md - width: 20% - - &--md - +md - width: 20% - .footer__cols__col__title color: #fff font-weight: $font-weight-bold @@ -55,7 +53,46 @@ .footer__cols__col__logo margin-bottom: $pad + @media (max-width: $ak-breakpoint-lg) + margin-right: 50px + + .footer__flex_mobile + display: flex + flex-direction: row + justify-content: space-between + @media (max-width: $ak-breakpoint-lg) + flex-direction: column + + 
.wrapper-grid + display: grid + padding-bottom: 30px + grid-template-columns: repeat(4, 1fr) + grid-gap: 52px + grid-auto-rows: minmax(200px, auto) + @media (max-width: $ak-breakpoint-lg) + grid-template-columns: repeat(2, 1fr) + + .footer-wrapper + padding: 0 56px + @media (max-width: $ak-breakpoint-lg) + padding: 0 + + .footer__bottom + line-height: 1.57 + letter-spacing: 0.43px + color: #bfbfbf + padding-bottom: 38px + padding-right: 25px + margin-right: 76px + a + color: #bfbfbf + @media (max-width: $ak-breakpoint-lg) + padding-bottom: 77px + padding-right: 0 + margin-right: 0 - .footer__bottom - margin-top: $pad-md - text-align: center +.main-padding + padding-bottom: 48px + background: #37424B; + @media (max-width: $ak-breakpoint-lg) + padding-bottom: 100px diff --git a/website/www/site/assets/scss/_global.sass b/website/www/site/assets/scss/_global.sass index db0deb28e9de..f7f7b5abe384 100644 --- a/website/www/site/assets/scss/_global.sass +++ b/website/www/site/assets/scss/_global.sass @@ -25,18 +25,23 @@ body .body background: #fff - max-width: 1440px margin: 0 auto - padding-top: 130px - + .no__padding + padding: 0 &:not(.body--index) .body__contained + @media (max-width: $ak-breakpoint-lg) + padding: 0 padding: 0 30px max-width: 1280px figure img width: 100% + .center + margin: 0 auto + .content-up + margin-top: -20px !important .section &:not(.section--wide) @@ -52,6 +57,20 @@ body a color: $color-brand +.body__section-no-nav + margin-left: 256px + //width: calc(100% - 380px) + + > [id]:before + content: '' + display: block + height: 82px // fixed header height + margin: -82px 0 0 // negative fixed header height + + @media (max-width: $ak-breakpoint-lg) + margin-left: 0 + width: 100% + .body__section-nav margin-left: 256px width: calc(100% - 492px) @@ -67,5 +86,73 @@ body width: 100% .container-main-content - padding: 0 20px - position: relative \ No newline at end of file + @media (max-width: $ak-breakpoint-lg) + padding: 0 24px + min-height: 100vh + padding: 0 22px + position: relative + background-color: #fff + margin-top: 64px + + @media (min-width: $tablet) + margin-top: 0 +.desktop + @media (max-width: $ak-breakpoint-lg) + display: none !important +.mobile + display: none + @media (max-width: $ak-breakpoint-lg) + display: block + +.code-snippet, pre + background:rgba(255, 109, 0, 0.03) !important + border-radius: 8px + border-top-left-radius: 0; + border: solid 0.6px #ff6d05 + font-family: Menlo + font-size: 16px + font-weight: normal + line-height: 1.63 + letter-spacing: 0.43px + padding: 24px + max-width: 100% + position: relative + + pre + border: none + padding: 0 + margin-top: 36px + background: initial !important + a + float: right + margin-left: 12px + img + &:hover + opacity: 0.7 + filter: alpha(opacity=70) + cursor: pointer + +.snippet + max-width: 100% + margin-bottom: 40px + .git-link + float: right + font-size: 16px + font-weight: normal + line-height: 1.63 + letter-spacing: 0.43px + color: #8c8b8e + .without_switcher + border-top-left-radius: 8px + pre + background: initial !important + +table + margin-top: 24px + margin-bottom: 24px + th + min-width: 80px + padding-right: 12px + +.underline + text-decoration: underline diff --git a/website/www/site/assets/scss/_graphic.sass b/website/www/site/assets/scss/_graphic.sass deleted file mode 100644 index f01c72a12528..000000000000 --- a/website/www/site/assets/scss/_graphic.sass +++ /dev/null @@ -1,27 +0,0 @@ -/*! 
- * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -.graphic - .graphic__image - text-align: center - line-height: 0 - - img - max-width: 100% - margin: 0 auto - max-height: 500px - border: 1px solid #efefef diff --git a/website/www/site/assets/scss/_graphic.scss b/website/www/site/assets/scss/_graphic.scss new file mode 100644 index 000000000000..336f1f422516 --- /dev/null +++ b/website/www/site/assets/scss/_graphic.scss @@ -0,0 +1,91 @@ +/*! + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +@import "media"; + +.graphic { + text-align: center; + h2 { + @extend .component-title; + } + .images { + max-width: 868px; + margin: 0 auto; + height: 60px; + } + .margin { + margin-top: 84px; + margin-bottom: 84px; + @media (max-width: $mobile) { + margin-top: 0; + margin-bottom: 64px; + } + } + .row { + display: flex; + justify-content: space-between; + @media (max-width: $mobile) { + flex-direction: column; + align-items: center; + } + .logos-row { + display: flex; + align-items: center; + margin-top: 20px; + max-height: 73px; + @media (max-width: $mobile) { + margin-top: 64px; + } + img { + height: auto; + width: 112px; + } + } + .first_logo { + margin-right: 18px; + } + .column { + display: flex; + flex-direction: column; + align-items: center; + max-width: 306px; + + h4 { + margin-top: 32px; + } + .more { + margin-top: 32px; + font-size: 14px; + font-weight: bold; + line-height: 16px; + letter-spacing: 0.6px; + color: #f26628; + } + } + .icon { + width: 34px; + height: 44px; + margin-top: 16px; + @media (max-width: $mobile) { + margin-top: 64px; + } + } + img { + max-width: 306px; + height: 42px; + } + } +} diff --git a/website/www/site/assets/scss/_page-nav.sass b/website/www/site/assets/scss/_hero-mobile.scss similarity index 53% rename from website/www/site/assets/scss/_page-nav.sass rename to website/www/site/assets/scss/_hero-mobile.scss index 542a4222b20a..b3db3d1cd4b9 100644 --- a/website/www/site/assets/scss/_page-nav.sass +++ b/website/www/site/assets/scss/_hero-mobile.scss @@ -15,48 +15,50 @@ * limitations under the License. */ -.page-nav - border-left: 3px solid $color-brand - overflow-y: auto - max-height: calc(100vh - 130px) - padding: 0 15px - position: fixed - width: 240px - - li - display: block - width: 100% - - a - color: $color-dark - display: block - font-size: 12px - padding: 5px - - span - font-size: 12 - - ul - padding-left: 20px - - .nav - > li.active - a - text-decoration: underline - - @media (max-width: $ak-breakpoint-lg) - margin-top: 0 - max-height: initial - right: 0 - padding: 0 30px - position: relative - width: 100% - - #TableOfContents - > ul - padding-left: 0 - margin-bottom: 0 - ul - padding-left: 20 - ul - display: none +@import "media"; + +.hero-mobile { + position: relative; + margin-bottom: 0; + display: none; + width: 100%; + height: calc(100% - 96px); + min-height: 300px; + .hero-content { + position: absolute; + z-index: 1; + top: 50%; + left: 50%; + transform: translate(-50%, -50%); + text-align: center; + width: 100%; + max-width: 506px; + + h3 { + @extend .hero-title; + + text-transform: uppercase; + margin: 0 auto 16px auto; + } + + h1 { + @extend .hero-heading; + + width: 300px; + margin: 0 auto 24px auto; + } + + h2 { + @extend .hero-subheading; + + width: 300px; + margin: 0 auto; + } + } +} + +@media (max-width: $mobile) { + .hero-mobile { + display: inherit; + } +} diff --git a/website/www/site/assets/scss/_hero.sass b/website/www/site/assets/scss/_hero.sass deleted file mode 100644 index 63c22b9282a5..000000000000 --- a/website/www/site/assets/scss/_hero.sass +++ /dev/null @@ -1,156 +0,0 @@ -/*! - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. 
You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -.hero-bg - background-image: url(../images/hero_bg_flat.svg) - background-repeat: no-repeat - background-size: cover - background-position: top center - margin-top: -50px - +md - background-size: 100% - padding-bottom: $pad - -.hero - padding-top: $pad-xl - margin-bottom: $pad-md - position: relative - z-index: 1 - +md - padding-top: $pad-sm - margin-bottom: $pad-xl - - .hero__content - position: relative - z-index: 1 - - .hero__image - bottom: 0 - content: '' - left: 0 - line-height: 0 - position: absolute - right: 0 - top: 0 - z-index: 0 - img - position: absolute - bottom: 0 - width: 100% - - .hero__title - +type-h1 - color: #fff - max-width: 500px - margin: 0 auto $pad - text-align: center - +md - margin: 0 0 $pad - text-align: left - - .hero__ctas - text-align: center - margin-bottom: $pad-md - +md - margin-bottom: 0 - text-align: left - - &--first - margin-bottom: $pad - +md - margin-bottom: $pad-sm - - .hero__subtitle - +type-h3 - color: #fff - max-width: 540px - margin: 0 auto $pad - font-weight: $font-weight-semibold - text-align: center - +md - margin: 0 0 $pad-md - text-align: left - - .hero__blog - .hero__blog__title - +type-h4 - font-weight: $font-weight-bold - margin-bottom: $pad - text-align: center - +md - color: #fff - text-align: left - margin-bottom: $pad/2 - - .hero__blog__cards - +md - display: flex - margin: 0 -10px - - .hero__blog__cards__card - background-color: #fff - color: inherit - box-shadow: $box-shadow - padding: 20px - display: block - transition: transform 300ms ease, box-shadow 300ms ease - position: relative - max-width: 300px - margin: 0 auto $pad - +md - margin: 0 10px - - &:before - background-image: url(../images/card_border.svg) - background-position: top - background-repeat: no-repeat - background-size: cover - content: ' ' - display: block - height: 2px - position: absolute - width: 100% - left: 0 - top: 0 - - &:hover - text-decoration: none - transform: translateY(-8px) - box-shadow: $box-shadow-hover - - .hero__blog__cards__card__title - +type-body - margin-bottom: $pad - - .hero__blog__cards__card__date - +type-body-sm - font-weight: $font-weight-semibold - text-transform: uppercase - letter-spacing: 1px - - .hero__cols - +md - display: flex - min-height: 500px - - .hero__cols__col - width: 50% - display: flex - align-items: flex-end - - &:first-child - align-items: center diff --git a/website/www/site/assets/scss/_hero.scss b/website/www/site/assets/scss/_hero.scss new file mode 100644 index 000000000000..e803db45c09e --- /dev/null +++ b/website/www/site/assets/scss/_hero.scss @@ -0,0 +1,133 @@ +/*! + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + + @import "media"; + @import "typography"; + + .hero-desktop { + position: relative; + margin-bottom: 0; + width: 100%; + height: 100%; + display: inherit; + margin-top: -30px; + min-height: 361px; + .hero-content { + position: absolute; + z-index: 1; + top: 50%; + left: 50%; + transform: translate(-50%, -50%); + text-align: center; + + h3 { + @extend .hero-title; + text-transform: uppercase; + margin: 0 auto 24px auto; + } + + h1 { + @extend .hero-heading; + width: 506px; + height: 92px; + font-size: 46px; + margin: 0 auto 36px auto; + } + + h2 { + @extend .hero-subheading; + width: 344px; + margin: 0 auto 56px auto; + } + + a { + text-decoration: none; + } + + button { + width: 184px; + height: 46px; + border-radius: 100px; + box-shadow: 0 4px 16px 0 rgba(0, 0, 0, 0.16), + 0 4px 4px 0 rgba(0, 0, 0, 0.06); + background-color: $color-white; + display: flex; + align-items: center; + justify-content: center; + margin: 0 auto; + border: none; + outline: none; + svg { + width: 14px; + height: 16px; + } + span { + text-transform: uppercase; + font-size: 14px; + font-weight: bold; + letter-spacing: 0.6px; + line-height: 1.14; + color: $color-sun; + margin-left: 12px; + } + } + + button:hover { + background-color: $color-white; + box-shadow: 0 4px 20px 0 rgba(0, 0, 0, 0.24), + 0 4px 6px 0 rgba(0, 0, 0, 0.24); + } + + button:focus { + box-shadow: 0 4px 16px 0 rgba(0, 0, 0, 0.16), + 0 4px 4px 0 rgba(0, 0, 0, 0.06); + } + } + } + + @media (max-width: $tablet) { + .hero-desktop { + margin-top: 64px; + + .hero-content { + h3 { + margin: 0 auto 16px auto; + } + + h1 { + width: 300px; + margin: 0 auto 24px auto; + font-size: 32px; + } + + h2 { + width: 300px; + margin: 0 auto; + } + + button { + display: none; + } + } + } + } + +@media (max-width: $mobile) { + .hero-desktop { + display: none; + } +} diff --git a/website/www/site/assets/scss/_keen-slider.scss b/website/www/site/assets/scss/_keen-slider.scss new file mode 100644 index 000000000000..08a4326e0042 --- /dev/null +++ b/website/www/site/assets/scss/_keen-slider.scss @@ -0,0 +1,40 @@ +/** + * keen-slider 5.3.2 + * The HTML touch slider carousel with the most native feeling you will get. 
+ * https://keen-slider.io + * Copyright 2020-2020 Eric Beyer + * License: MIT + * Released on: 2020-11-10 + */ + +/*# sourceMappingURL=keen-slider.min.css.map */ +// This is pulled from "https://cdn.jsdelivr.net/npm/keen-slider@5.3.2/keen-slider.min.css" to serve the consistency +.keen-slider { + display: flex; + -webkit-user-select: none; + -moz-user-select: none; + -ms-user-select: none; + user-select: none; + -webkit-touch-callout: none; + -khtml-user-select: none; + touch-action: pan-y; + -webkit-tap-highlight-color: transparent; +} +.keen-slider, +.keen-slider__slide { + overflow: hidden; + position: relative; +} +.keen-slider__slide { + width: 100%; + min-height: 100%; +} +.keen-slider[data-keen-slider-v] { + flex-wrap: wrap; +} +.keen-slider[data-keen-slider-v] .keen-slider__slide { + width: 100%; +} +.keen-slider[data-keen-slider-moves] * { + pointer-events: none; +} diff --git a/website/www/site/assets/scss/_layout.scss b/website/www/site/assets/scss/_layout.scss index 0c3b1c3132d0..9f8f2b4de15b 100644 --- a/website/www/site/assets/scss/_layout.scss +++ b/website/www/site/assets/scss/_layout.scss @@ -16,241 +16,235 @@ * Site header */ .site-header { - border-top: 5px solid $grey-color-dark; - border-bottom: 1px solid $grey-color-light; - min-height: 56px; + border-top: 5px solid $grey-color-dark; + border-bottom: 1px solid $grey-color-light; + min-height: 56px; - // Positioning context for the mobile navigation icon - position: relative; + // Positioning context for the mobile navigation icon + position: relative; } .site-title { - font-size: 26px; - font-weight: 300; - line-height: 56px; - letter-spacing: -1px; - margin-bottom: 0; - float: left; - - &, - &:visited { - color: $grey-color-dark; - } + font-size: 26px; + font-weight: 300; + line-height: 56px; + letter-spacing: -1px; + margin-bottom: 0; + float: left; + + &, + &:visited { + color: $grey-color-dark; + } } .site-nav { - float: right; - line-height: 56px; + float: right; + line-height: 56px; - .menu-icon { - display: none; - } + .menu-icon { + display: none; + } - .page-link { - color: $text-color; - line-height: $base-line-height; + .page-link { + color: $text-color; + line-height: $base-line-height; - // Gaps between nav items, but not on the last one - &:not(:last-child) { - margin-right: 20px; - } + // Gaps between nav items, but not on the last one + &:not(:last-child) { + margin-right: 20px; } + } - @include media-query($on-palm) { - position: absolute; - top: 9px; - right: $spacing-unit / 2; - background-color: $background-color; - border: 1px solid $grey-color-light; - border-radius: 5px; - text-align: right; - - .menu-icon { - display: block; - float: right; - width: 36px; - height: 26px; - line-height: 0; - padding-top: 10px; - text-align: center; - - > svg { - width: 18px; - height: 15px; - - path { - fill: $grey-color-dark; - } - } - } + @include media-query($on-palm) { + position: absolute; + top: 9px; + right: $spacing-unit / 2; + background-color: $background-color; + border: 1px solid $grey-color-light; + border-radius: 5px; + text-align: right; - .trigger { - clear: both; - display: none; - } - - &:hover .trigger { - display: block; - padding-bottom: 5px; + .menu-icon { + display: block; + float: right; + width: 36px; + height: 26px; + line-height: 0; + padding-top: 10px; + text-align: center; + + > svg { + width: 18px; + height: 15px; + + path { + fill: $grey-color-dark; } + } + } - .page-link { - display: block; - padding: 5px 10px; + .trigger { + clear: both; + display: none; + } - &:not(:last-child) 
{ - margin-right: 0; - } - margin-left: 20px; - } + &:hover .trigger { + display: block; + padding-bottom: 5px; } -} + .page-link { + display: block; + padding: 5px 10px; + &:not(:last-child) { + margin-right: 0; + } + margin-left: 20px; + } + } +} /** - * Site footer - */ + * Site footer + */ .site-footer { - border-top: 1px solid $grey-color-light; - padding: $spacing-unit 0; + border-top: 1px solid $grey-color-light; + padding: $spacing-unit 0; } .footer-heading { - font-size: 18px; - margin-bottom: $spacing-unit / 2; + font-size: 18px; + margin-bottom: $spacing-unit / 2; } .contact-list, .social-media-list { - list-style: none; - margin-left: 0; + list-style: none; + margin-left: 0; } .footer-col-wrapper { - font-size: 15px; - color: $grey-color; - margin-left: -$spacing-unit / 2; - @extend %clearfix; + font-size: 15px; + color: $grey-color; + margin-left: -$spacing-unit / 2; + @extend %clearfix; } .footer-col { - float: left; - margin-bottom: $spacing-unit / 2; - padding-left: $spacing-unit / 2; + float: left; + margin-bottom: $spacing-unit / 2; + padding-left: $spacing-unit / 2; } .footer-col-1 { - width: -webkit-calc(35% - (#{$spacing-unit} / 2)); - width: calc(35% - (#{$spacing-unit} / 2)); + width: -webkit-calc(35% - (#{$spacing-unit} / 2)); + width: calc(35% - (#{$spacing-unit} / 2)); } .footer-col-2 { - width: -webkit-calc(20% - (#{$spacing-unit} / 2)); - width: calc(20% - (#{$spacing-unit} / 2)); + width: -webkit-calc(20% - (#{$spacing-unit} / 2)); + width: calc(20% - (#{$spacing-unit} / 2)); } .footer-col-3 { - width: -webkit-calc(45% - (#{$spacing-unit} / 2)); - width: calc(45% - (#{$spacing-unit} / 2)); + width: -webkit-calc(45% - (#{$spacing-unit} / 2)); + width: calc(45% - (#{$spacing-unit} / 2)); } @include media-query($on-laptop) { - .footer-col-1, - .footer-col-2 { - width: -webkit-calc(50% - (#{$spacing-unit} / 2)); - width: calc(50% - (#{$spacing-unit} / 2)); - } + .footer-col-1, + .footer-col-2 { + width: -webkit-calc(50% - (#{$spacing-unit} / 2)); + width: calc(50% - (#{$spacing-unit} / 2)); + } - .footer-col-3 { - width: -webkit-calc(100% - (#{$spacing-unit} / 2)); - width: calc(100% - (#{$spacing-unit} / 2)); - } + .footer-col-3 { + width: -webkit-calc(100% - (#{$spacing-unit} / 2)); + width: calc(100% - (#{$spacing-unit} / 2)); + } } @include media-query($on-palm) { - .footer-col { - float: none; - width: -webkit-calc(100% - (#{$spacing-unit} / 2)); - width: calc(100% - (#{$spacing-unit} / 2)); - } + .footer-col { + float: none; + width: -webkit-calc(100% - (#{$spacing-unit} / 2)); + width: calc(100% - (#{$spacing-unit} / 2)); + } } - - /** - * Page content - */ + * Page content + */ .page-content { - padding: $spacing-unit 0; + padding: $spacing-unit 0; } .page-heading { - font-size: 20px; + font-size: 20px; } .post-list { - margin-left: 0; - list-style: none; + margin-left: 0; + list-style: none; - > li { - margin-bottom: $spacing-unit; - } + > li { + margin-bottom: $spacing-unit; + } } .post-meta { - font-size: $small-font-size; - color: $grey-color; + font-size: $small-font-size; + color: $grey-color; } .post-link { - display: block; - font-size: 24px; + display: block; + font-size: 24px; } - - /** - * Posts - */ + * Posts + */ .post-header { - margin-bottom: $spacing-unit; + margin-bottom: $spacing-unit; } .post-title { - font-size: 42px; - letter-spacing: -1px; - line-height: 1; + font-size: 42px; + letter-spacing: -1px; + line-height: 1; - @include media-query($on-laptop) { - font-size: 36px; - } + @include media-query($on-laptop) { + font-size: 36px; + } 
} .post-content { - margin-bottom: $spacing-unit; + margin-bottom: $spacing-unit; - h2 { - font-size: 32px; + h2 { + font-size: 32px; - @include media-query($on-laptop) { - font-size: 28px; - } + @include media-query($on-laptop) { + font-size: 28px; } + } - h3 { - font-size: 26px; + h3 { + font-size: 26px; - @include media-query($on-laptop) { - font-size: 22px; - } + @include media-query($on-laptop) { + font-size: 22px; } + } - h4 { - font-size: 20px; + h4 { + font-size: 20px; - @include media-query($on-laptop) { - font-size: 18px; - } + @include media-query($on-laptop) { + font-size: 18px; } + } } diff --git a/website/www/site/assets/scss/_lists.scss b/website/www/site/assets/scss/_lists.scss new file mode 100644 index 000000000000..c4a34910c67e --- /dev/null +++ b/website/www/site/assets/scss/_lists.scss @@ -0,0 +1,304 @@ +/*! + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + @import "media"; + + .arrow-list { + margin-top: 26px; + h3 { + max-width: 555px; + } + ul { + font-size: 16px; + font-weight: normal; + line-height: 1.63; + letter-spacing: 0.43px; + position: relative; + padding-left: 26px; + li::marker { + content: "\2192"; + color: #ff6d05; + font-size: 24px; + } + li { + list-style-type: none; + list-style-position: outside; + padding-left: 8px; + margin-bottom: -2px; + } + li:last-of-type { + margin-bottom: 25px; + } + } + figure { + width: 112px; + height: 112px; + margin-top: 64px; + } + a { + color: #333333; + text-decoration: underline; + } + } + + .arrow-list-header { + color: #333333; + text-decoration: none !important; + display: flex; + max-height: 40px; + margin-bottom: 74px; + &:hover { + cursor: default; + text-decoration: none; + } + @media (max-width: $mobile) { + margin-bottom: 50px; + &:hover { + cursor: pointer; + text-decoration: underline; + } + h2 { + font-size: 24px; + line-height: 1.25; + font-weight: 500; + letter-spacing: normal; + margin-top: 46px; + } + } + figure { + margin-top: 40px; + margin-right: 5px; + width: auto; + height: auto; + @media (min-width: $mobile) { + display: none; + } + img { + width: 18px; + height: 18px; + padding-right: 16px; + transform: rotate(-90deg); + } + } + } + .rotate { + img { + transform: rotate(0deg) !important; + margin-top: 10px; + } + } + .list { + margin-top: 32px; + max-width: 640px; + } + .icon-list { + .list-item { + display: flex; + margin-bottom: 8px; + position: relative; + @media (max-width: $tablet) { + margin-bottom: 43px; + } + img { + position: absolute; + top: 7px; + z-index: 2; + } + .reverse { + transform: rotate(-180deg); + } + + .cirkle { + width: 24px; + height: 24px; + background: #ff6d05; + border-radius: 50%; + text-align: center; + z-index: 5; + position: relative; + i { + vertical-align: bottom; + color: #fff; + } + } + .list-item-icon { + width: 
44px; + height: 44px; + display: block; + margin-right: 25px; + } + .list-item-header { + font-size: 22px; + font-weight: 500; + letter-spacing: normal; + line-height: 1.36; + margin-top: 10px; + a { + color: #333333; + } + @media (max-width: $mobile) { + font-weight: 500; + } + } + .list-item-description { + @media (max-width: $mobile) { + position: absolute; + left: 55px; + } + } + } + .pillars-item { + display: flex; + align-items: center; + margin-bottom: 14px; + .pillars-item-icon { + margin-right: 28px; + } + + .pillars-item-description { + width: 100%; + max-width: 684px; + + .pillars-item-header { + font-size: 16px; + font-weight: bold; + line-height: 1.2; + letter-spacing: 0.43px; + margin-bottom: 6px; + margin-top: 10px; + } + + .pillars-item-text { + font-size: 14px; + font-weight: normal; + line-height: 1.57; + letter-spacing: 0.43px; + } + } + } + } + .documentation-list { + padding-left: 12px; + display: flex; + flex-direction: row; + justify-content: space-between; + flex-wrap: wrap; + max-width: 855px; + .row { + display: flex; + flex-direction: column; + margin-top: 63px; + @media (max-width: $mobile) { + max-width: 260px; + } + + .item-icon { + height: 93px; + max-width: 260px; + display: flex; + align-items: center; + @media (max-width: $mobile) { + justify-content: center; + } + } + a { + font-size: 16px; + font-weight: bold; + line-height: 1.63; + letter-spacing: 0.43px; + color: #333333; + text-decoration: none; + } + .item-description { + max-width: 260px; + margin-right: 32px; + margin-top: 20px; + @media (max-width: $mobile) { + margin-right: 0; + width: auto; + } + } + } + } + .sdks { + max-width: 855px; + .item-description { + font-size: 16px; + font-weight: bold; + line-height: 1.63; + letter-spacing: 0.43px; + a{ + text-decoration: none; + } + } + } + .collapsable-list { + margin: 52px 0; + a { + font-size: 14px; + font-weight: bold; + font-stretch: normal; + font-style: normal; + line-height: 1.14; + letter-spacing: 0.6px; + color: #f26628; + } + li { + a { + font-size: 16px; + line-height: 1.63; + letter-spacing: 0.43px; + color: #333333; + } + } + } + .mobile-column { + @media (max-width: $mobile) { + flex-direction: column; + align-items: center; + text-align: center; + } + } + .margin-50 { + margin-top: 50px; + } + @media (min-width: 768px) { + .arrow-list { + .collapse.dont-collapse-sm { + display: block; + height: auto !important; + visibility: visible; + } + .collapsing { + position:relative; + height:unset !important; + overflow:hidden; + } + } + } + .beam-list { + margin-top: 32px; + .beam-title { + margin-bottom: 4px; + font-weight: bold; + } + .beam-description { + margin: 0; + font-size: 14px; + line-height: 22px; + height: 44px; + } + } diff --git a/website/www/site/assets/scss/_logos.sass b/website/www/site/assets/scss/_logos.sass deleted file mode 100644 index fd0f6f30058c..000000000000 --- a/website/www/site/assets/scss/_logos.sass +++ /dev/null @@ -1,36 +0,0 @@ -/*! - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. 
You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -.logos - text-align: center - margin: $pad-xl 0 - - .logos__title - +type-h2 - margin-bottom: $pad-md - - .logos__logos - display: flex - justify-content: center - - .logos__logos__logo - line-height: 0 - margin: 0 $pad - +md - margin: 0 $pad-md - img - max-height: 70px diff --git a/website/www/site/assets/scss/_pillars.sass b/website/www/site/assets/scss/_logos.scss similarity index 50% rename from website/www/site/assets/scss/_pillars.sass rename to website/www/site/assets/scss/_logos.scss index 220e7ce8c6a0..33cd632d1fb8 100644 --- a/website/www/site/assets/scss/_pillars.sass +++ b/website/www/site/assets/scss/_logos.scss @@ -15,29 +15,57 @@ * limitations under the License. */ -.pillars - margin: $pad-xl 0 - text-align: center - - .pillars__title - +type-h2 - margin-bottom: $pad - - .pillars__cols - +type-body - +md - display: flex - justify-content: center - - .pillars__cols__col - .pillars__cols__col__title - font-weight: 600 - margin-bottom: $pad/2 - - .pillars__cols__col__body - max-width: 350px - margin: 0 auto $pad-sm - - +md - padding: 0 $pad - margin: 0 auto +@import "media"; + +.logos { + padding: $pad-l $pad; + + .logos-title { + @extend .component-title; + + text-align: center; + } + + .logos-logos { + display: flex; + justify-content: space-between; + width: 100%; + max-width: 1111px; + margin: 70px auto 60px; + + .logos-logo { + line-height: 0; + + img { + max-height: 70px; + } + } + } +} + +@media (max-width: $tablet) { + .logos { + padding: $pad-md $pad-s; + + .logos-logos { + max-width: 360px; + flex-wrap: wrap; + justify-content: center; + margin: 50px auto 20px; + + .logos-logo { + margin-right: 60px; + margin-bottom: 50px; + + img { + max-height: 45px; + } + } + + :nth-child(3), + :last-child { + margin-right: 0; + } + } + } +} diff --git a/website/www/site/assets/scss/_table-wrapper.sass b/website/www/site/assets/scss/_media.scss similarity index 88% rename from website/www/site/assets/scss/_table-wrapper.sass rename to website/www/site/assets/scss/_media.scss index 299b0019f62c..0365663590e7 100644 --- a/website/www/site/assets/scss/_table-wrapper.sass +++ b/website/www/site/assets/scss/_media.scss @@ -15,10 +15,6 @@ * limitations under the License. */ -.table-wrapper - > table - @extend .table - -.table-bordered-wrapper - > table - @extend .table-bordered + $mobile: 640px; + $tablet: 1024px; + $fullhd: 1920px; diff --git a/website/www/site/assets/scss/_navbar-desktop.scss b/website/www/site/assets/scss/_navbar-desktop.scss new file mode 100644 index 000000000000..3ec381905fa1 --- /dev/null +++ b/website/www/site/assets/scss/_navbar-desktop.scss @@ -0,0 +1,205 @@ +/*! + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + + @import "media"; + + .navigation-bar-mobile { + display: none; + .arrow-icon { + display: flex; + align-items: baseline; + margin-left: 10px; + } + } + .nav-tabs { + border-bottom: none; + } + .navigation-bar-desktop { + display: flex; + height: 96px; + width: 100%; + align-items: center; + margin-bottom: 30px; + box-shadow: 0 4px 16px 0 rgba(0, 0, 0, 0.06); + background-color: $color-white; + z-index: 10000; // just to make sure that navbar always on top of other elements + + #iconsBar { + display: flex; + } + + a { + @extend .component-text; + font-weight: 500; + color: $color-dark-gray; + letter-spacing: 0.2px; + line-height: 1.63; + margin-right: 39px; + text-decoration: none; + cursor: pointer; + @media(max-width: 1273px){ + font-size: 14px; + margin-right: 18px; + } + } + + .navbar-logo { + margin-left: 58px; + margin-bottom: 4px; + img { + width: 88px; + } + } + .navbar-bar-left { + display: flex; + justify-content: space-between; + width: 100%; + } + .navbar-links { + display: flex; + align-items: center; + justify-content: space-between; + z-index: 10000; + :last-child { + margin-right: 0; + } + + .navbar-link { + display: inline-block; + position: relative; + margin-bottom: 2px; + } + + .navbar-link::before { + transition: 0.3; + content: ""; + position: absolute; + background-color: $color-sun; + height: 0%; + width: 100%; + bottom: 0px; + border-radius: 5px; + } + + .navbar-link:hover::before { + height: 2px; + } + + .navbar-dropdown-documentation { + list-style-type: none; + ul { + width: 209px; + left: -25%; + text-align: center; + border: none; + box-shadow: none; + padding-top: 34px; + padding-bottom: 0; + border-radius: 0; + li { + height: 36px; + } + a { + @extend .component-text; + line-height: 1.63; + letter-spacing: 0.2px; + @media(max-width: 1273px){ + font-size: 14px; + margin-right: 5px; + } + } + } + } + } + + .navbar-dropdown-apache { + margin-right: 90px; + list-style-type: none; + + .dropdown-toggle { + margin: 0; + } + + ul { + width: 209px; + left: 70%; + transform: translateX(-50%); + text-align: center; + border: none; + box-shadow: none; + padding-top: 35px; + padding-bottom: 0; + border-radius: 0; + li { + height: 36px; + } + a { + @extend .component-text; + + margin-right: 0 !important; + } + } + + .arrow-icon { + position: absolute; + top: 3px; + right: -30px; + } + } + + .navbar-dropdown-apache:hover { + .arrow-icon { + opacity: 0.84; + } + } + .dropdown-menu{ + @media(max-width: 1273px){ + width: 132px !important; + left: -14% !important; + } + } + + .navbar-dropdown { + .dropdown-menu > li > a { + &:hover, + &:focus { + text-decoration: none; + color: $color-dropdown-link-hover-text; + background-color: $color-dropdown-link-hover-bg; + } + } + } + + .dropdown:hover .dropdown-menu { + display: block; + margin-top: 0; + } + } + + @media (max-width: $tablet) { + .navigation-bar-desktop { + display: none; + } + + .navigation-bar-mobile { + display: block; + } + + .page-nav { + margin-top: 30px; + } + } diff --git a/website/www/site/assets/scss/_navbar.sass b/website/www/site/assets/scss/_navbar-mobile.sass similarity index 58% rename 
from website/www/site/assets/scss/_navbar.sass rename to website/www/site/assets/scss/_navbar-mobile.sass index f4b4ea6bcae7..4c9a2a528808 100644 --- a/website/www/site/assets/scss/_navbar.sass +++ b/website/www/site/assets/scss/_navbar-mobile.sass @@ -16,15 +16,31 @@ */ .navbar - padding: 15px 0 - + padding: 0 !important + min-height: 64px + ::before + display: none + ::after + display: none .navbar-nav > li > a text-transform: uppercase - + .navbar-link + font-size: 16px; + font-weight: 500; + line-height: normal; + letter-spacing: normal .navbar-header - margin-left: $pad + float: none + display: flex + align-items: center + justify-content: space-between + height: 100% + margin-top: 5px .navbar-brand + padding: 0 + display: flex + align-items: center +md margin-right: $pad @@ -40,31 +56,56 @@ color: $color-dark-gray .navbar-toggle - float: left + margin-right: 16px !important .icon-bar - background-color: $color-dark-gray + background-color: $color-sun + height: 3px + width: 20px - @media (max-width: $ak-breakpoint-lg) + @media (max-width: $tablet) display: block .navbar-container - @media (max-width: $ak-breakpoint-lg) + @media (max-width: $tablet) background-color: $color-white bottom: 0 min-height: 100vh - max-width: 256px + max-width: 303px padding: 15px position: fixed top: 0 transition: transform 100ms linear - width: calc(100% - 32px) - + width: calc(100% - 72px) + right: 0 + overflow-y: auto + + .navbar-toggle + margin: 0 + .dropdown-toggle + display: flex + align-items: center + .navbar-nav + margin-top: 58px .navbar-nav > li width: 100% + padding: 5px 0 + + span.navbar-link + padding: 10px 15px + color: #555555 + ul + list-style: none !important + li + padding: 9px 0 + a + letter-spacing: 0.2px + + .navbar-link + text-transform: none &.closed - transform: translateX(-100%) + transform: translateX(100%) &.open transform: translateX(0) @@ -78,7 +119,7 @@ top: 0 transition: opacity 200ms - @media (max-width: $ak-breakpoint-lg) + @media (max-width: $tablet) display: block &.closed @@ -88,7 +129,12 @@ &.open opacity: 0.5 width: 100% + overflow-y: auto - @media (max-width: $ak-breakpoint-lg) + @media (max-width: $tablet) .navbar-right margin-right: -15px + margin-top: 0 !important + +.fixedPosition + position: fixed diff --git a/website/www/site/assets/scss/_page-nav.scss b/website/www/site/assets/scss/_page-nav.scss new file mode 100644 index 000000000000..579da31f2b54 --- /dev/null +++ b/website/www/site/assets/scss/_page-nav.scss @@ -0,0 +1,103 @@ +/*! + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + @import "media"; + +.page-nav { + overflow-y: auto; + max-height: calc(100vh - 130px); + position: fixed; + width: 248px; + margin-left: 30px; + li { + display: block; + width: 100%; + min-height: 36px; + a { + color: #37424b; + display: block; + font-size: 12px; + padding: 5px; + letter-spacing: .43px; + } + span { + font-size: 12px; + } + ul { + padding-left: 20px; + } + } + .nav { + img { + height: 14px; + width: 10px; + vertical-align: text-bottom; + transition: .3s; + margin-right: 8px; + } + .rotate { + transform: rotate(-90deg); + transition: .3s; + } + >li { + padding: 0 15px; + border-left: 3px solid #d8d8d8; + } + .chosen { + position: relative; + ::before { + content: ''; + position: absolute; + top: -1px; + left: -18px; + height: 100%; + min-height: 36px; + border-left: 3px solid #ff6d05; + } + a { + text-decoration: underline; + font-weight: 500; + } + } + .active .chosen ::before { + left: -38px; + } + } + #TableOfContents>ul { + margin-bottom: 0; + } +} + +.page-nav .nav>li>:hover { + border-radius: 8px +} + +.page-nav .nav>li>ul>li>:hover { + border-radius: 8px; + background-color: #f2f2f2; + text-decoration: none +} + +@media(max-width: $ak-breakpoint-lg) { + .page-nav { + margin-top: 0; + max-height: initial; + right: 0; + padding: 0 30px; + position: relative; + width: 100% + } +} diff --git a/website/www/site/assets/scss/_pillars.scss b/website/www/site/assets/scss/_pillars.scss new file mode 100644 index 000000000000..4ba7a01895d7 --- /dev/null +++ b/website/www/site/assets/scss/_pillars.scss @@ -0,0 +1,151 @@ +/*! + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +@import "media"; + +.pillars { + padding: $pad-l $pad; + + .pillars-title { + @extend .component-title; + + text-align: center; + border: none; + } + + .pillars-content { + display: grid; + grid-template-columns: 443px 443px; + grid-gap: 50px 89px; + justify-content: center; + margin-top: 84px; + + .pillars-item { + display: flex; + align-items: center; + + .pillars-item-icon { + margin-right: 47px; + } + + .pillars-item-description { + width: 100%; + max-width: 284px; + + .pillars-item-header { + @extend .component-header; + } + + .pillars-item-text { + @extend .component-text; + } + } + } + } + + .pillars-social { + display: flex; + flex-direction: column; + align-items: center; + text-align: center; + margin-top: 117px; + + .pillars-social-icons { + display: flex; + align-items: center; + margin-bottom: 45px; + + svg { + height: 41px; + width: auto; + } + + #about-twitter-icon { + height: 45px; + } + + a { + filter: grayscale(100%); + opacity: 0.7; + + &:hover { + filter: grayscale(0); + opacity: 1; + } + } + + .pillars-youtube-icon { + margin: 0 80px; + } + } + + .pillars-social-text { + @extend .component-text; + max-width: 285px; + } + } +} + +@media (max-width: $ak-breakpoint-lg) { + .pillars { + padding: $pad-md $pad-s; + + .pillars-content { + grid-template-columns: 330px; + grid-column-gap: 47px; + margin-top: 62px; + + .pillars-item { + align-items: flex-start; + + .pillars-item-icon { + margin-right: 17px; + margin-top: 12px; + } + + svg { + width: 64px; + height: 64px; + } + } + } + + .pillars-social { + margin-top: 91px; + + .pillars-social-icons { + svg { + height: 34px; + width: auto; + } + + #about-twitter-icon { + height: 37px; + } + + a { + filter: none; + opacity: 1; + } + + .pillars-youtube-icon { + margin: 0 60px; + } + } + } + } +} diff --git a/website/www/site/assets/scss/_powered_by.scss b/website/www/site/assets/scss/_powered_by.scss new file mode 100644 index 000000000000..2a2f67d78a37 --- /dev/null +++ b/website/www/site/assets/scss/_powered_by.scss @@ -0,0 +1,196 @@ +/* + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. See accompanying LICENSE file. 
+ */ + .powered_by { + display: flex; + flex-wrap: wrap; + margin-top: 74px; + .powered-card { + width: 417px; + height: 413px; + padding: 33px 26px 32px; + border-radius: 16px; + -webkit-box-shadow: 0 4px 16px 0 rgba(0, 0, 0, .16), 0 4px 4px 0 rgba(0, 0, 0, .06); + box-shadow: 0 4px 16px 0 rgba(0, 0, 0, .16), 0 4px 4px 0 rgba(0, 0, 0, .06); + background-color: #fff; + display: flex; + flex-direction: column; + align-items: center; + text-align: center; + justify-content: space-between; + margin-right: 20px; + margin-bottom: 20px; + overflow: hidden; + &:hover { + text-decoration: none; + -webkit-box-shadow: 0 4px 20px 0 rgba(0, 0, 0, .24), 0 4px 6px 0 rgba(0, 0, 0, .24); + box-shadow: 0 4px 20px 0 rgba(0, 0, 0, .24), 0 4px 6px 0 rgba(0, 0, 0, .24); + } + .load-button { + margin-top: 0; + } + .powered-summary { + font-size: 14px; + font-weight: 400; + line-height: 1.57; + letter-spacing: .43px; + color: #333; + } + .powered-title { + color: #333; + } + } +} +.powered-icon { + width: 153px; + height: 153px; + margin: 0 auto; + svg { + width: 100%; + height: 100%; + } +} +.use_cases { + display: flex; + flex-direction: column; + .use-card { + width: 872px; + max-width: 100%; + height: 278px; + padding: 33px 33px 24px 37px; + border-radius: 16px; + -webkit-box-shadow: 0 4px 16px 0 rgba(0, 0, 0, .16), 0 4px 4px 0 rgba(0, 0, 0, .06); + box-shadow: 0 4px 16px 0 rgba(0, 0, 0, .16), 0 4px 4px 0 rgba(0, 0, 0, .06); + background-color: #fff; + display: flex; + margin-bottom: 36px; + &:hover { + text-decoration: none; + -webkit-box-shadow: 0 4px 20px 0 rgba(0, 0, 0, .24), 0 4px 6px 0 rgba(0, 0, 0, .24); + box-shadow: 0 4px 20px 0 rgba(0, 0, 0, .24), 0 4px 6px 0 rgba(0, 0, 0, .24); + } + a { + color: #333; + &:hover { + text-decoration: none; + } + } + .use-icon { + display: flex; + align-items: center; + height: 100%; + margin-right: 48px; + svg, img { + width: 153px; + } + } + .use-body { + display: flex; + width: calc(100% - 200px); + flex-direction: column; + justify-content: space-between; + .use-category { + margin-bottom: 16px; + font-size: 14px; + font-weight: 700; + line-height: normal; + letter-spacing: 2px; + text-align: left; + color: #8c8b8e; + text-transform: uppercase; + } + .read-link { + text-align: end; + color: #f26628; + font-weight: 700; + line-height: 1.14; + letter-spacing: .6px; + } + .use-summary { + max-height: 109px; + display: -webkit-box; + overflow: hidden; + -webkit-line-clamp: 4; + -webkit-box-orient: vertical; + } + h5 { + margin-top: 0; + } + } + } + .remove { + display: none; + } + .show-item { + display: flex; + } +} +.feedback { + text-align: center; + position: relative; + padding-top: 56px; + margin-top: 48px; + h3 { + max-width: 100%; + } + .update { + color: #8c8b8e; + float: right; + position: absolute; + top: 0; + right: 0; + } + .load-button { + margin-top: 56px; + padding: 15px 31px; + a{ + color: #fff; + text-decoration: none; + } + } + .description { + max-width: 465px; + margin: 0 auto; + margin-top: 24px; + } +} +@media (max-width: $mobile) { + .powered_by { + justify-content: center; + .powered-card { + width: 327px; + margin-right: 0; + } + } + .use_cases { + align-items: center; + .use-card { + width: 327px; + height: 650px; + padding: 27px 25px 24px; + flex-direction: column; + align-items: center; + .use-icon { + margin-right: 0; + height: 153px; + margin-bottom: 32px; + } + .use-body { + width: 100%; + height: 100%; + .use-summary { + max-height: 234px; + -webkit-line-clamp: 9; + } + } + } + } +} diff --git 
a/website/www/site/assets/scss/_quotes.scss b/website/www/site/assets/scss/_quotes.scss new file mode 100644 index 000000000000..31ef263ae789 --- /dev/null +++ b/website/www/site/assets/scss/_quotes.scss @@ -0,0 +1,155 @@ +/*! + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +@import "media"; + +.quotes { + padding: $pad-l $pad; + background-color: $color-medium-gray; + + .quotes-title { + @extend .component-title; + + text-align: center; + border: none; + } + + .quotes-desktop { + display: flex; + justify-content: center; + + .quote-card { + display: flex; + flex-direction: column; + align-items: center; + text-align: center; + width: 100%; + max-width: 381px; + height: 474px; + margin: 86px 36px 0 0; + padding: 55px 20px 24px 20px; + border-radius: 16px; + background-color: $color-white; + box-shadow: 0 4px 16px 0 rgba(0, 0, 0, 0.16), + 0 4px 4px 0 rgba(0, 0, 0, 0.06); + margin-right: 36px; + + .quote-text { + @extend .component-quote; + + margin: 108px 0 20px 0; + } + + img { + max-height: 118px; + max-width: 320px; + } + } + + :last-child { + margin-right: 0; + } + } + + // Sliding feature is only displayed on mobile version + .keen-slider { + display: none; + } + + .dots { + display: none; + } + + .keen-slider { + width: 327px; + margin: 0 auto; + border-radius: 16px; + background-color: $color-white; + box-shadow: 0 4px 16px 0 rgba(0, 0, 0, 0.16), + 0 4px 4px 0 rgba(0, 0, 0, 0.06); + + .keen-slider__slide { + display: flex; + flex-direction: column; + align-items: center; + text-align: center; + width: 100%; + max-width: 327px; + height: 468px; + padding: 55px 24px 24px 24px; + + .quote-text { + @extend .component-quote; + + margin: 108px 0 20px 0; + } + + img { + width: 172px; + } + } + } + + .dots { + display: none; + padding: 10px 0; + justify-content: center; + margin-top: 46px; + } + + .dot { + border: none; + width: 13px; + height: 13px; + background: $color-smoke; + border-radius: 50%; + margin: 0 5px; + padding: 4px; + cursor: pointer; + } + + .dot:focus { + outline: none; + } + + .dot--active { + background: $color-sun; + } +} + +@media (max-width: $tablet) { + .quotes { + .quotes-title { + margin-bottom: 64px; + } + h2 { + margin-bottom: 0 !important; + } + .quotes-desktop { + display: none; + } + + .keen-slider { + display: flex; + width: 100%; + } + + .dots { + display: flex; + } + } +} diff --git a/website/www/site/assets/scss/_search.scss b/website/www/site/assets/scss/_search.scss new file mode 100644 index 000000000000..3da3c40fdb00 --- /dev/null +++ b/website/www/site/assets/scss/_search.scss @@ -0,0 +1,81 @@ +/** + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. 
+ You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +*/ +.searchBar { + width: 297px; + margin-right: 30px; + display: flex; + align-items: center; + table { + margin: 0; + } + div { + padding: 0; + svg { + padding-top: 2px; + } + } + .gsc-search-button { + display: none; + } + #__gcse_1 { + max-width: 213px; + } + .gsib_b { + display: none; + } + .gsc-input-box { + border-radius: 100px; + background-color: #f1f1f2; + table { + margin: 0; + } + input { + background-color: #f1f1f2; + min-width: 213px; + font-size: 12px; + font-weight: 400; + line-height: 1.83; + letter-spacing: normal; + } + } +} +.disappear { + display: none !important; +} +.searchBar-mobile { + width: 100%; + table { + margin: 0; + } + .gsc-input-box { + border-radius: 100px; + max-height: 36px; + background-color: #f1f1f2; + padding-top: 2px; + input { + background-color: #f1f1f2; + transform: translateY(-2px); + } + } + .gsc-search-button { + background-color: #fff; + border: none; + svg { + fill: #ff6d00; + } + &:focus { + background-color: #fff; + } + } +} diff --git a/website/www/site/assets/scss/_section-nav.sass b/website/www/site/assets/scss/_section-nav.sass deleted file mode 100644 index 61aff9f462ea..000000000000 --- a/website/www/site/assets/scss/_section-nav.sass +++ /dev/null @@ -1,119 +0,0 @@ -/*! - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -.section-nav - max-width: 250px - position: fixed - width: 100% - overflow-y: auto - background-color: #F7F7F7 - - nav - padding: 15px - max-height: calc(100vh - 130px) - - ul - list-style: none - - li - margin-bottom: 5px - - &:last-child - margin-bottom: 0 - - .section-nav-list - padding-left: 16px - - a - font-size: 12px - - @media (max-width: $ak-breakpoint-lg) - background-color: $color-white - max-height: 100vh - - .active - text-decoration: underline - color: #0f3556 - - &-back - display: none - padding: 15px - - @media (max-width: $ak-breakpoint-lg) - display: block - - &-list-title - display: block - font-size: 12px - position: relative - margin-bottom: 5px - - &--collapsible - cursor: pointer - - &-list-main-title - display: block - font-size: 12px - font-weight: bold - margin-bottom: 15px - text-transform: uppercase - - &-item--collapsible - cursor: pointer - - .section-nav-list - display: none - - span:before - content: "" - position: absolute - top: 2px - left: -16px - border-style: solid - border-width: 5px 0 5px 8px - border-color: transparent transparent transparent #3371e3 - transform: rotate(0deg) - transition: 0.3s - - .expanded - > span:before - transform: rotate(90deg) - - - @media (max-width: $ak-breakpoint-lg) - background-color: $color-light-gray - bottom: 0 - left: 0 - max-width: 256px - position: fixed - top: 0 - transition: transform 100ms linear - width: calc(100% - 32px) - z-index: 10000 - - nav - height: calc(100vh - 44px) - overflow-y: auto - - &.closed - transform: translateX(-100%) - - &.open - transform: translateX(0) - - - diff --git a/website/www/site/assets/scss/_section-nav.scss b/website/www/site/assets/scss/_section-nav.scss new file mode 100644 index 000000000000..8e1e897285e8 --- /dev/null +++ b/website/www/site/assets/scss/_section-nav.scss @@ -0,0 +1,144 @@ +/*! + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +.section-nav { + max-width: 250px; + position: absolute; + width: 100%; + overflow-y: auto; + background-color: #fbfbfb; + border-radius: 8px; + .section-nav-list-main-title { + margin-top: 33px; + } + .section-nav-back { + padding: 23px 0 23px 31px; + background-color: #fbfbfb; + &:before { + height: 18px; + color: #ff6d00; + } + } + nav { + padding: 15px; + ul { + list-style: none; + } + li { + margin-bottom: 5px; + &:last-child { + margin-bottom: 0; + } + } + .section-nav-list { + padding-left: 16px; + } + a { + font-size: 12px; + color: #757575; + font-weight: 500; + line-height: 1.83; + letter-spacing: .43px; + } + .active { + text-decoration: underline; + border-radius: 8px; + background-color: #f2f2f2; + } + } + .expanded { + >span { + &:before { + transform: rotate(90deg); + } + } + } +} +.section-nav-back { + display: none; + padding: 15px; +} +.section-nav-list-title { + display: block; + font-size: 12px; + position: relative; + margin-bottom: 5px; + font-weight: 700; + line-height: 1.83; + letter-spacing: .43px; +} +.section-nav-list-title--collapsible { + cursor: pointer; +} +.section-nav-list-main-title { + display: block; + font-size: 18px; + font-weight: 700; + line-height: normal; + letter-spacing: 1px; + margin-bottom: 32px; + text-transform: uppercase; +} +.section-nav-item--collapsible { + cursor: pointer; + .section-nav-list { + display: none; + } + span { + &:before { + content: ""; + position: absolute; + top: 4px; + left: -16px; + border-style: solid; + border-width: 5px 0 5px 8px; + border-color: transparent transparent transparent #ff6d00; + transform: rotate(0deg); + transition: .3s; + } + } +} +@media (max-width:1024px) { + .section-nav { + nav { + background-color: #fff; + max-height: 100vh; + height: calc(100vh - 44px); + overflow-y: auto; + } + background-color: #f7f7f7; + bottom: 0; + right: 0; + max-width: 303px; + position: fixed; + top: 0; + transition: transform 100ms linear; + transition: transform 100ms linear, -webkit-transform 100ms linear, -o-transform 100ms linear; + width: calc(100% - 72px); + border-radius: 0; + z-index: 10000; + padding: 0; + } + .section-nav-back { + display: block; + } + .section-nav.closed { + transform: translateX(100%); + } + .section-nav.open { + transform: translateX(0); + } +} diff --git a/website/www/site/assets/scss/_syntax-highlighting.scss b/website/www/site/assets/scss/_syntax-highlighting.scss index ed9f27c32c9a..42c6301a4016 100644 --- a/website/www/site/assets/scss/_syntax-highlighting.scss +++ b/website/www/site/assets/scss/_syntax-highlighting.scss @@ -16,71 +16,203 @@ * Syntax highlighting styles */ .highlight { - background: #fff; + //background: #fff; - .chroma { - background: #eef; - } + .chroma { + background: #eef; + } - .c { color: #998; font-style: italic } // Comment - .err { color: #a61717 } // Error - .k { font-weight: bold } // Keyword - .o { font-weight: bold } // Operator - .cm { color: #998; font-style: italic } // Comment.Multiline - .cp { color: #999; font-weight: bold } // Comment.Preproc - .c1 { color: #998; font-style: italic } // Comment.Single - .cs { color: #999; font-weight: bold; font-style: italic } // Comment.Special - .gd { color: #000; background-color: #fdd } // Generic.Deleted - .gd .x { color: #000; background-color: #faa } // Generic.Deleted.Specific - .ge { font-style: italic } // Generic.Emph - .gr { color: #a00 } // Generic.Error - .gh { color: #999 } // Generic.Heading - .gi { color: #000; background-color: #dfd } // Generic.Inserted - .gi .x { color: #000; 
background-color: #afa } // Generic.Inserted.Specific - .go { color: #888 } // Generic.Output - .gp { color: #555 } // Generic.Prompt - .gs { font-weight: bold } // Generic.Strong - .gu { color: #aaa } // Generic.Subheading - .gt { color: #a00 } // Generic.Traceback - .kc { font-weight: bold } // Keyword.Constant - .kd { font-weight: bold } // Keyword.Declaration - .kp { font-weight: bold } // Keyword.Pseudo - .kr { font-weight: bold } // Keyword.Reserved - .kt { color: #458; font-weight: bold } // Keyword.Type - .m { color: #099 } // Literal.Number - .s { color: #d14 } // Literal.String - .na { color: #008080 } // Name.Attribute - .nb { color: #0086B3 } // Name.Builtin - .nc { color: #458; font-weight: bold } // Name.Class - .no { color: #008080 } // Name.Constant - .ni { color: #800080 } // Name.Entity - .ne { color: #900; font-weight: bold } // Name.Exception - .nf { color: #900; font-weight: bold } // Name.Function - .nn { color: #555 } // Name.Namespace - .nt { color: #000080 } // Name.Tag - .nv { color: #008080 } // Name.Variable - .ow { font-weight: bold } // Operator.Word - .w { color: #bbb } // Text.Whitespace - .mf { color: #099 } // Literal.Number.Float - .mh { color: #099 } // Literal.Number.Hex - .mi { color: #099 } // Literal.Number.Integer - .mo { color: #099 } // Literal.Number.Oct - .sb { color: #d14 } // Literal.String.Backtick - .sc { color: #d14 } // Literal.String.Char - .sd { color: #d14 } // Literal.String.Doc - .s2 { color: #d14 } // Literal.String.Double - .se { color: #d14 } // Literal.String.Escape - .sh { color: #d14 } // Literal.String.Heredoc - .si { color: #d14 } // Literal.String.Interpol - .sx { color: #d14 } // Literal.String.Other - .sr { color: #009926 } // Literal.String.Regex - .s1 { color: #d14 } // Literal.String.Single - .ss { color: #990073 } // Literal.String.Symbol - .bp { color: #999 } // Name.Builtin.Pseudo - .vc { color: #008080 } // Name.Variable.Class - .vg { color: #008080 } // Name.Variable.Global - .vi { color: #008080 } // Name.Variable.Instance - .il { color: #099 } // Literal.Number.Integer.Long + .c { + color: #998; + font-style: italic; + } // Comment + .err { + color: #a61717; + } // Error + .k { + font-weight: bold; + } // Keyword + .o { + font-weight: bold; + } // Operator + .cm { + color: #998; + font-style: italic; + } // Comment.Multiline + .cp { + color: #999; + font-weight: bold; + } // Comment.Preproc + .c1 { + color: #998; + font-style: italic; + } // Comment.Single + .cs { + color: #999; + font-weight: bold; + font-style: italic; + } // Comment.Special + .gd { + color: #000; + background-color: #fdd; + } // Generic.Deleted + .gd .x { + color: #000; + background-color: #faa; + } // Generic.Deleted.Specific + .ge { + font-style: italic; + } // Generic.Emph + .gr { + color: #a00; + } // Generic.Error + .gh { + color: #999; + } // Generic.Heading + .gi { + color: #000; + background-color: #dfd; + } // Generic.Inserted + .gi .x { + color: #000; + background-color: #afa; + } // Generic.Inserted.Specific + .go { + color: #888; + } // Generic.Output + .gp { + color: #555; + } // Generic.Prompt + .gs { + font-weight: bold; + } // Generic.Strong + .gu { + color: #aaa; + } // Generic.Subheading + .gt { + color: #a00; + } // Generic.Traceback + .kc { + font-weight: bold; + } // Keyword.Constant + .kd { + font-weight: bold; + } // Keyword.Declaration + .kp { + font-weight: bold; + } // Keyword.Pseudo + .kr { + font-weight: bold; + } // Keyword.Reserved + .kt { + color: #458; + font-weight: bold; + } // Keyword.Type + .m { + color: 
#099; + } // Literal.Number + .s { + color: #d14; + } // Literal.String + .na { + color: #008080; + } // Name.Attribute + .nb { + color: #0086b3; + } // Name.Builtin + .nc { + color: #458; + font-weight: bold; + } // Name.Class + .no { + color: #008080; + } // Name.Constant + .ni { + color: #800080; + } // Name.Entity + .ne { + color: #900; + font-weight: bold; + } // Name.Exception + .nf { + color: #900; + font-weight: bold; + } // Name.Function + .nn { + color: #555; + } // Name.Namespace + .nt { + color: #000080; + } // Name.Tag + .nv { + color: #008080; + } // Name.Variable + .ow { + font-weight: bold; + } // Operator.Word + .w { + color: #bbb; + } // Text.Whitespace + .mf { + color: #099; + } // Literal.Number.Float + .mh { + color: #099; + } // Literal.Number.Hex + .mi { + color: #099; + } // Literal.Number.Integer + .mo { + color: #099; + } // Literal.Number.Oct + .sb { + color: #d14; + } // Literal.String.Backtick + .sc { + color: #d14; + } // Literal.String.Char + .sd { + color: #d14; + } // Literal.String.Doc + .s2 { + color: #d14; + } // Literal.String.Double + .se { + color: #d14; + } // Literal.String.Escape + .sh { + color: #d14; + } // Literal.String.Heredoc + .si { + color: #d14; + } // Literal.String.Interpol + .sx { + color: #d14; + } // Literal.String.Other + .sr { + color: #009926; + } // Literal.String.Regex + .s1 { + color: #d14; + } // Literal.String.Single + .ss { + color: #990073; + } // Literal.String.Symbol + .bp { + color: #999; + } // Name.Builtin.Pseudo + .vc { + color: #008080; + } // Name.Variable.Class + .vg { + color: #008080; + } // Name.Variable.Global + .vi { + color: #008080; + } // Name.Variable.Instance + .il { + color: #099; + } // Literal.Number.Integer.Long } .highlighter-custom { @@ -103,3 +235,186 @@ pre { background: #eef; } +[class*="-switcher"] { + ul { + li { + margin-bottom:0 !important; + padding-left: 0 !important; + &:hover { + cursor: pointer; + } + a { + border: none; + margin-right: 0; + color: #333; + text-decoration: none; + &:hover { + background-color: #f2804d; + color: #fff; + } + } + &::before { + display: none; + } + } + li.active { + color: #fff; + a { + background-color: #ff6d05; + color: #fff; + border: none; + } + } + } +} +.language-switcher { + ul.nav-tabs { + padding-left: 0; + border:0; + li { + &:hover { + cursor: pointer; + } + a { + border: none; + margin-right: 0; + color: #333; + &:hover { + background-color: #f2804d; + color: #fff; + } + } + &::before { + display: none; + } + } + li.active { + color: #fff; + a { + background-color: #ff6d05; + color: #fff; + border: none; + } + } + } +} +.runner-switcher { + ul.nav-tabs { + padding-left: 0; + border:0; + li { + &:hover { + cursor: pointer; + } + a { + border: none; + margin-right: 0; + color: #333; + &:hover { + background-color: #f2804d; + color: #fff; + } + } + &::before { + display: none; + } + } + li.active { + color: #fff; + a { + background-color: #ff6d05; + color: #fff; + border: none; + } + } + } +} +.shell-switcher { + ul.nav-tabs { + padding-left: 0; + border:0; + li { + &:hover { + cursor: pointer; + } + a { + border: none; + margin-right: 0; + color: #333; + &:hover { + background-color: #f2804d; + color: #fff; + } + } + &::before { + display: none; + } + } + li.active { + color: #fff; + a { + background-color: #ff6d05; + color: #fff; + border: none; + } + } + } +} +nav.version-switcher { + ul { + li { + &:hover { + cursor: pointer; + } + a { + border: none; + margin-right: 0; + color: #333; + &:hover { + background-color: #f2804d; + color: #fff; + } + 
} + &::before { + display: none; + } + } + li.active { + color: #fff; + a { + background-color: #ff6d05; + color: #fff; + border: none; + } + } + } +} +nav.runner-switcher { + ul { + li { + &:hover { + cursor: pointer; + } + a { + border: none; + margin-right: 0; + color: #333; + &:hover { + background-color: #f2804d; + color: #fff; + } + } + &::before { + display: none; + } + } + li.active { + color: #fff; + a { + background-color: #ff6d05; + color: #fff; + border: none; + } + } + } +} diff --git a/website/www/site/assets/scss/_table-wrapper.scss b/website/www/site/assets/scss/_table-wrapper.scss new file mode 100644 index 000000000000..3a7512266b81 --- /dev/null +++ b/website/www/site/assets/scss/_table-wrapper.scss @@ -0,0 +1,107 @@ +/*! + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +$padding: 24px; +.table-wrapper { + margin-top: 64px; + + table { + @extend .table; + max-width: 853px; + } + th:nth-child(1) { + @media (max-width: $mobile) { + padding-left: 0 !important; + width: 115px; + } + padding-left: 67px + $padding !important; + width: 491px; + } + th:nth-child(2) { + @media (max-width: $mobile) { + width: 111px; + } + width: 181px; + } + th { + padding: 14px 14px 17px 14px !important; + border: none !important; + border-bottom: 1px solid #ff6d05 !important; + letter-spacing: 2px; + line-height: normal !important; + text-transform: uppercase; + } + td { + @media (max-width: $mobile) { + padding: $padding/4 !important; + } + padding: $padding !important; + border-bottom: 1px solid rgba(255, 109, 5, 0.24); + div { + display: flex; + position: relative; + } + img, + svg { + @media (max-width: $mobile) { + display: none; + } + position: absolute; + top: 50%; + transform: translateY(-60%); + } + p { + @media (max-width: $mobile) { + margin: 0; + } + margin-left: 67px; + font-size: 14px; + line-height: 1.57; + } + } + td:nth-child(3) { + @media (max-width: $mobile) { + padding: $padding/3 !important; + padding-left: $padding/4 !important; + } + padding: $padding/3 !important; + padding-left: $padding !important; + div { + flex-direction: column; + } + a { + text-decoration: underline; + line-height: 2; + letter-spacing: 0.43px; + } + } + td:nth-child(2) { + background-color: rgba(255, 109, 0, 0.04); + padding-left: $padding/2 !important; + padding-right: 0 !important; + line-height: 1.57; + letter-spacing: 0.43px; + } + a { + color: #f26628; + } +} + +.table-bordered-wrapper { + table { + @extend .table-bordered; + } +} diff --git a/website/www/site/assets/scss/_toggler-nav.scss b/website/www/site/assets/scss/_toggler-nav.scss index 34ed4e81e1ce..0cc6ba4f2a61 100644 --- a/website/www/site/assets/scss/_toggler-nav.scss +++ b/website/www/site/assets/scss/_toggler-nav.scss @@ -26,7 +26,7 @@ nav.language-switcher { background-color: #f8f8f8; 
&.active { - background-color: #222c37; + background-color: #ff6d05; color: #fff; } } diff --git a/website/www/site/assets/scss/_traits.scss b/website/www/site/assets/scss/_traits.scss new file mode 100644 index 000000000000..3e74b8d9831d --- /dev/null +++ b/website/www/site/assets/scss/_traits.scss @@ -0,0 +1,49 @@ +/** + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +*/ +@import "media"; +.row_of_traits { + display: flex; + justify-content: space-between; + flex-direction: row; + .traits-item { + display: flex; + flex-direction: column; + justify-content: flex-start; + align-items: center; + margin-top: 64px; + .traits-item-icon { + width: 112px; + height: 112px; + margin-bottom: 48px; + } + .traits-item-description { + max-width: 228px; + text-align: center; + font-size: 14px; + line-height: 1.57; + letter-spacing: .43px; + .font-weight-bold { + font-weight: 700; + } + a{ + color: #333333; + } + } + } +} +@media (max-width: $mobile) { + .row_of_traits { + flex-direction: column; + } +} diff --git a/website/www/site/assets/scss/_typography.scss b/website/www/site/assets/scss/_typography.scss new file mode 100644 index 000000000000..f030747f70a5 --- /dev/null +++ b/website/www/site/assets/scss/_typography.scss @@ -0,0 +1,201 @@ +/*! + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + + @import "media"; + + .component-title { + font-size: 36px; + font-weight: 500; + font-style: normal; + line-height: 1.1; + letter-spacing: normal; + margin: 0; + color: $color-gray; + } + + .component-large-header { + font-size: 30px; + font-weight: 500; + font-style: normal; + line-height: 1.4; + letter-spacing: normal; + margin: 0; + color: $color-gray; + } + + .component-header { + font-size: 22px; + font-weight: 500; + font-style: normal; + line-height: 1.45; + letter-spacing: 0.43px; + margin: 0; + color: $color-gray; + } + + .component-small-header { + font-size: 18px; + font-weight: 500; + font-style: normal; + line-height: 1.56; + letter-spacing: 0.43px; + margin: 0; + color: $color-gray; + } + + .component-text { + font-size: 16px; + font-weight: normal; + font-style: normal; + line-height: 1.63; + letter-spacing: 0.43px; + margin: 0; + color: $color-gray; + } + + .component-label { + font-size: 16px; + font-weight: bold; + font-stretch: normal; + font-style: normal; + line-height: 1.5; + letter-spacing: 0.43px; + margin: 0; + color: $color-gray; + } + + .component-tag { + font-size: 16px; + font-weight: normal; + font-style: normal; + line-height: 1.63; + letter-spacing: 0.43px; + margin: 0; + color: $color-smoke; + } + + .component-quote { + font-size: 20px; + font-weight: normal; + font-stretch: normal; + font-style: italic; + line-height: 1.44; + letter-spacing: 0.43px; + text-align: center; + color: $color-gray; + } + + .hero-title { + font-size: 16px; + font-weight: normal; + font-style: normal; + line-height: 1.88; + letter-spacing: 0.8px; + color: $color-white; + } + + .hero-heading { + font-size: 46px; + font-weight: 500; + font-style: normal; + line-height: 1; + letter-spacing: normal; + color: $color-white; + } + + .hero-subheading { + font-size: 20px; + font-weight: normal; + font-style: normal; + line-height: 1.44; + letter-spacing: normal; + color: $color-white; + } + + @media (max-width: $tablet) { + .component-title { + font-size: 28px; + line-height: 1.29; + } + .component-large-header { + font-size: 24px; + } + } + + h1 { + font-size: 36px; + font-weight: 500; + line-height: 1.1; + letter-spacing: normal; + margin-bottom: 24px; + margin-top: 64px; + } + h2 { + font-size: 30px; + font-weight: 500; + line-height: 1.2; + letter-spacing: normal; + margin-bottom: 24px; + margin-top: 56px; + } + h3 { + font-size: 24px; + font-weight: 500; + line-height: 1.25; + letter-spacing: normal; + margin-bottom: 16px; + margin-top: 48px; + } + h4 { + font-size: 22px; + font-weight: 500; + line-height: 1.36; + letter-spacing: normal; + margin-bottom: 12px; + margin-top: 48px; + } + h5 { + font-size: 18px; + font-weight: 500; + line-height: 1.6; + letter-spacing: 0.43px; + margin-bottom: 12px; + margin-top: 32px; + } + h6 { + font-size: 16px; + font-weight: bold; + line-height: 1.2; + letter-spacing: 0.43px; + margin-bottom: 12px; + margin-top: 32px; + } + + p { + font-size: 16px; + font-weight: normal; + line-height: 1.63; + letter-spacing: 0.43px; + } + .hero-heading { + font-size: 32px; + } + li { + font-size: 16px; + letter-spacing: 0.43px; + line-height: 1.63; + } diff --git a/website/www/site/assets/scss/_vars.sass b/website/www/site/assets/scss/_vars.sass index a5dd998b14c2..626313d993b2 100644 --- a/website/www/site/assets/scss/_vars.sass +++ b/website/www/site/assets/scss/_vars.sass @@ -20,10 +20,20 @@ $color-dark: #37424B $color-white: #FFF $color-light-gray: #F7F7F7 $color-dark-gray: #555 +$color-gray: #333333 +$color-smoke: #8C8B8E +$color-sun: #F26628 
+$color-silver: #C4C4C4 +$color-medium-gray: #FBFBFB + +$color-dropdown-link-hover-text: #E65D21 +$color-dropdown-link-hover-bg: #FFEDE5 $pad-sm: 15px +$pad-s: 24px $pad: 30px $pad-md: 60px +$pad-l: 84px $pad-xl: 100px $box-shadow: 0px 3px 20px 0 rgba(0,0,0,0.075) diff --git a/website/www/site/assets/scss/bootstrap/_navbar.scss b/website/www/site/assets/scss/bootstrap/_navbar.scss index 11e5c01c1585..ee2bb17a847b 100755 --- a/website/www/site/assets/scss/bootstrap/_navbar.scss +++ b/website/www/site/assets/scss/bootstrap/_navbar.scss @@ -210,7 +210,7 @@ // Bars .icon-bar { display: block; - width: 22px; + width: 20px; height: 2px; border-radius: 1px; } diff --git a/website/www/site/assets/scss/capability-matrix.scss b/website/www/site/assets/scss/capability-matrix.scss index 6bd629ada529..d5e5fc348aae 100644 --- a/website/www/site/assets/scss/capability-matrix.scss +++ b/website/www/site/assets/scss/capability-matrix.scss @@ -11,131 +11,285 @@ See the License for the specific language governing permissions and limitations under the License. */ +@import "media"; -/* What/Where/When/How colors. */ -.wwwh-what-dark { - color:#ca1; - font-weight:bold; - font-style:italic; +.back-button { + font-size: 14px; + font-weight: 700; + padding-left: 15px; + line-height: 1.14; + letter-spacing: 0.6px; + color: #f26628; + text-transform: uppercase; + &:hover { + color: #f2804d; + } + i { + width: 26px; + } } -.wwwh-where-dark { - color:#37d; - font-weight:bold; - font-style:italic; -} +.information-container { + display: flex; + -webkit-box-orient: horizontal; + -webkit-box-direction: normal; + flex-direction: row; + margin-bottom: 84px; -.wwwh-when-dark { - color:#6a4; - font-weight:bold; - font-style:italic; + @media (max-width: $mobile) { + -webkit-box-orient: vertical; + -webkit-box-direction: normal; + flex-direction: column; + } + table { + margin: 0; + } + .read-tables { + table-layout: fixed; + position: relative; + th:first-of-type { + width: 22px; + min-width: 22px; + } + span { + font-weight: 500; + } + } + .second-container { + width: 270px; + margin-left: 100px; + @media (max-width: $mobile) { + margin-left: 0; + } + h5 { + margin-bottom: 22px; + } + .row { + display: -webkit-box; + display: flex; + margin-left: 0; + } + .box { + width: 71px; + height: 24px; + margin-bottom: 10px; + text-align: center; + margin-right: 10px; + } + .white { + border: solid 1px #f6f6f6; + background-color: #fff; + } + .partial { + border: solid 1px #d8d8d8; + background-color: #f9f9f9; + } + .gray { + border: solid 1px #bcbcbc; + background-color: #e1e0e0; + } + } + .border-left { + border-left: 3px solid #ff6d05; + padding: 7px; + font-size: 12px; + font-weight: 500; + line-height: 1.83; + letter-spacing: 0.43px; + } + .border-right { + transform: rotate(270deg); + height: 80px; + text-align: center; + padding: 0 !important; + width: 25px !important; + font-size: 12px; + font-weight: 500; + font-stretch: normal; + font-style: normal; + line-height: 1.57; + letter-spacing: 0.43px; + position: absolute; + top: 55px; + left: 24px; + } + .border-top { + border-top: 3px solid #ff6d05; + font-size: 14px; + font-weight: 400; + font-stretch: normal; + font-style: normal; + line-height: 1.57; + letter-spacing: 0.43px; + td { + width: 162px; + padding: 7px; + } + } } - -.wwwh-how-dark { - color:#b55; - font-weight:bold; - font-style:italic; +.table-container { + display: -webkit-box; + display: flex; + -webkit-box-orient: horizontal; + -webkit-box-direction: normal; + flex-direction: row; + margin-bottom: 32px; + table 
{ + margin: 0; + } + .big-right { + max-width: 1240px; + } + .table-right { + overflow-x: scroll; + padding: 1px; + scrollbar-width: none; + &::-webkit-scrollbar { + width: 0; + background: 0 0; + } + } + .table-headers { + overflow-x: scroll; + border-spacing: 0; + border: solid 1px #d8d8d8; + border-bottom: none; + mix-blend-mode: multiply; + -ms-overflow-style: none; + scrollbar-width: none; + &::-webkit-scrollbar { + width: 0; + background: 0 0; + } + table { + height: 48px; + table-layout: fixed; + width: 710px; + th { + text-align: center; + width: 142px; + font-size: 12px; + font-weight: 500; + line-height: normal; + letter-spacing: normal; + padding-left: 10px; + @media (max-width: $mobile) { + width: 73px; + } + } + } + } + .big-headers table th { + width: 249px; + } + .table-center { + overflow-x: scroll; + border-spacing: 0; + border: solid 1px #d8d8d8; + border-collapse: separate; + transform: rotateX(180deg); + padding-bottom: 8px; + scrollbar-color: #ff6d00; + &::-webkit-scrollbar { + height: 8px; + width: 25px !important; + } + &::-webkit-scrollbar-track { + background-color: #efeded; + background-clip: content-box; + } + &::-webkit-scrollbar-thumb { + border-radius: 8px; + height: 6px !important; + background-color: #ff6d00; + } + table { + transform: rotateX(180deg); + border-collapse: separate; + table-layout: fixed; + width: 710px; + th, + td { + width: 142px; + height: 48px; + font-size: 12px; + font-weight: 500; + line-height: normal; + letter-spacing: normal; + border: 1px solid #fff; + @media (max-width: 640px) { + width: 73px; + } + } + } + } + .big-center { + table { + tr { + vertical-align: baseline; + th, + td { + width: 249px; + height: 352px; + padding: 6px; + font-weight: 400; + p { + font-size: 12px; + font-weight: 500; + line-height: normal; + letter-spacing: normal; + margin: 0; + } + } + } + } + } + .table-left table { + width: 142px; + @media (max-width: 640px) { + width: 73px; + } + tr:first-child th { + height: 83px; + border: none; + @media (max-width: $mobile) { + height: 119px; + } + @-moz-document url-prefix() { + height: 77px; + } + } + th { + height: 48px; + font-size: 12px; + font-weight: 500; + line-height: normal; + letter-spacing: normal; + border: solid 1px #f6f6f6; + border-right: none; + padding: 6px; + } + } + .big-left table { + text-align: start; + tr { + vertical-align: baseline; + } + th { + height: 352px; + } + } } - -/* Capability matrix general sizing, alignment etc. */ -table.cap { - border-spacing: 0px; - border-collapse: collapse; - padding: 2px; -} - -td.cap { - border-width: 1px; - border-style:solid; - vertical-align:text-top; - padding:0.5ex; -} - -th.cap, tr.cap, table.cap { - border-width: 1px; - border-style:solid; - vertical-align:text-top; - padding:0.5ex; -} - -td.cap-blank { - padding:10px; -} - -/* Capability matrix blog-post sizing, alignment etc. */ -table.cap-summary { - border-spacing: 0px; - border-collapse: collapse; - padding: 2px; - width:600px; -} - -td.cap-summary { - border-width: 1px; - border-style:solid; - vertical-align:text-top; - padding:0.5ex; -} - -th.cap-summary, tr.cap-summary, table.cap-summary { - border-width: 1px; - border-style:solid; - vertical-align:text-top; - padding:0.5ex; -} - -td.cap-summary-blank { - padding:10px; -} - -/* Capability matrix semantic coloring. 
*/ -th.color-metadata, td.color-metadata { - background-color:#fff; - border-color:#fff; - color:#000; -} - -th.color-capability { - background-color:#333; - border-color:#222; -} - -th.color-platform { - background-color:#333; - border-color:#222; -} - -td.color-blank { - background-color:#fff; - color:#fff; -} - -/* Capability matrix semantic formatting */ -th.format-category { - vertical-align:text-top; - font-size:20px; - text-align:center; -} - -th.format-capability { - text-align:right; - white-space:nowrap; -} - -th.format-platform { - text-align:center; -} - -/* Capability matrix expand/collapse details toggle. */ -div.cap-toggle { - border-color:#000; - color:#000; - padding-top:1.5ex; - border-style:solid; - border-width:0px; - text-align:center; - cursor:pointer; - position:absolute; - font-size:16px; - font-weight:normal; +#table-link { + margin-bottom: 84px; + font-size: 14px; + font-weight: 700; + line-height: 1.14; + letter-spacing: 0.6px; + color: #f26628; + text-transform: uppercase; + &:hover { + text-decoration: none; + color: #f2804d; + } } diff --git a/website/www/site/assets/scss/main.scss b/website/www/site/assets/scss/main.scss index a2983ad59473..b72b4085c72a 100644 --- a/website/www/site/assets/scss/main.scss +++ b/website/www/site/assets/scss/main.scss @@ -14,29 +14,39 @@ // Legacy. @import "bootstrap"; -@import "capability-matrix"; +@import "_capability-matrix"; @import "_syntax-highlighting"; @import "_toggler-nav"; // Globals. @import "_vars.sass"; +@import "_media.scss"; @import "_breakpoints.sass"; @import "_type.sass"; @import "_global.sass"; -@import "_navbar.sass"; +@import "_navbar-mobile.sass"; +@import "_typography.scss"; // Components. @import "_button.sass"; // Modules. -@import "_cards.sass"; @import "_ctas.sass"; @import "_footer.sass"; -@import "_graphic.sass"; +@import "_graphic.scss"; @import "_header.sass"; -@import "_hero.sass"; -@import "_logos.sass"; -@import "_pillars.sass"; -@import "_section-nav.sass"; -@import "_page-nav.sass"; -@import "_table-wrapper.sass"; +@import "_hero.scss"; +@import "_hero-mobile.scss"; +@import "_logos.scss"; +@import "_pillars.scss"; +@import "_navbar-desktop.scss"; +@import "_table-wrapper.scss"; +@import "_calendar.scss"; +@import "_section-nav.scss"; +@import "_page-nav.scss"; +@import "_quotes.scss"; +@import "_blog.scss"; +@import "_lists.scss"; +@import "_traits.scss"; +@import "_search.scss"; +@import "_powered_by.scss"; diff --git a/website/www/site/config.toml b/website/www/site/config.toml index b334ee6d89bb..317a62f885cb 100644 --- a/website/www/site/config.toml +++ b/website/www/site/config.toml @@ -37,7 +37,7 @@ pygmentsUseClassic = false # See https://help.farbox.com/pygments.html pygmentsStyle = "tango" -summaryLength = "unlimited" +summaryLength = "30" canonifyURLs = true @@ -104,7 +104,7 @@ github_project_repo = "https://github.com/apache/beam" [params] description = "Apache Beam is an open source, unified model and set of language-specific SDKs for defining and executing data processing workflows, and also data ingestion and integration flows, supporting Enterprise Integration Patterns (EIPs) and Domain Specific Languages (DSLs). Dataflow pipelines simplify the mechanics of large-scale batch and streaming data processing and can run on a number of runtimes like Apache Flink, Apache Spark, and Google Cloud Dataflow (a cloud service). Beam also brings DSL in different languages, allowing users to easily implement their data integration processes." 
-release_latest = "2.25.0" +release_latest = "2.32.0" # The repository and branch where the files live in Github or Colab. This is used # to serve and stage from your local branch, but publish to the master branch. # e.g. https://github.com/{{< param branch_repo >}}/path/to/notebook.ipynb diff --git a/website/www/site/content/en/blog/beam-2.13.0.md b/website/www/site/content/en/blog/beam-2.13.0.md index 2f9f85595551..ac62b63b273d 100644 --- a/website/www/site/content/en/blog/beam-2.13.0.md +++ b/website/www/site/content/en/blog/beam-2.13.0.md @@ -7,7 +7,7 @@ categories: aliases: - /blog/2019/05/22/beam-2.13.0.html authors: - - goenka + - angoenka --- We are happy to present the new 2.21.0 release of Beam. This release includes both improvements and new functionality. -See the [download page](/get-started/downloads/#2210-2020-05-27) for this release. +See the [download page](/get-started/downloads/#2210-2020-05-27) for this release. For more information on changes in 2.21.0, check out the [detailed release notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12347143). diff --git a/website/www/site/content/en/blog/beam-2.22.0.md b/website/www/site/content/en/blog/beam-2.22.0.md index a468fb6f1e02..17fe4d27a0f9 100644 --- a/website/www/site/content/en/blog/beam-2.22.0.md +++ b/website/www/site/content/en/blog/beam-2.22.0.md @@ -18,7 +18,7 @@ See the License for the specific language governing permissions and limitations under the License. --> We are happy to present the new 2.22.0 release of Beam. This release includes both improvements and new functionality. -See the [download page](/get-started/downloads/#2220-2020-06-08) for this release. +See the [download page](/get-started/downloads/#2220-2020-06-08) for this release. For more information on changes in 2.22.0, check out the [detailed release notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12347144). diff --git a/website/www/site/content/en/blog/beam-2.23.0.md b/website/www/site/content/en/blog/beam-2.23.0.md index 49b095509866..7e825b53c21d 100644 --- a/website/www/site/content/en/blog/beam-2.23.0.md +++ b/website/www/site/content/en/blog/beam-2.23.0.md @@ -4,7 +4,7 @@ date: 2020-07-29 00:00:01 -0800 categories: - blog authors: - - Valentyn Tymofieiev + - tvalentyn --- We are happy to present the new 2.23.0 release of Apache Beam. This release includes both improvements and new functionality. -See the [download page](/get-started/downloads/#2230-2020-07-29) for this release. +See the [download page](/get-started/downloads/#2230-2020-07-29) for this release. For more information on changes in 2.23.0, check out the [detailed release notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12347145). diff --git a/website/www/site/content/en/blog/beam-2.24.0.md b/website/www/site/content/en/blog/beam-2.24.0.md index 69f000bc8766..3356f417b79b 100644 --- a/website/www/site/content/en/blog/beam-2.24.0.md +++ b/website/www/site/content/en/blog/beam-2.24.0.md @@ -4,7 +4,7 @@ date: 2020-09-18 00:00:01 -0800 categories: - blog authors: - - Daniel Oliveira + - danoliveira --- We are happy to present the new 2.24.0 release of Apache Beam. This release includes both improvements and new functionality. -See the [download page](/get-started/downloads/#2240-2020-09-18) for this release. +See the [download page](/get-started/downloads/#2240-2020-09-18) for this release. 
For more information on changes in 2.24.0, check out the [detailed release notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12347146). diff --git a/website/www/site/content/en/blog/beam-2.25.0.md b/website/www/site/content/en/blog/beam-2.25.0.md index 724eddf64570..e6eb3df07c16 100644 --- a/website/www/site/content/en/blog/beam-2.25.0.md +++ b/website/www/site/content/en/blog/beam-2.25.0.md @@ -4,7 +4,7 @@ date: 2020-10-23 14:00:00 -0800 categories: - blog authors: - - Robin Qiu + - robinyq --- We are happy to present the new 2.25.0 release of Apache Beam. This release includes both improvements and new functionality. -See the [download page](/get-started/downloads/#2250-2020-10-23) for this release. +See the [download page](/get-started/downloads/#2250-2020-10-23) for this release. For more information on changes in 2.25.0, check out the [detailed release notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12347147). diff --git a/website/www/site/content/en/blog/beam-2.26.0.md b/website/www/site/content/en/blog/beam-2.26.0.md new file mode 100644 index 000000000000..097f33140fe2 --- /dev/null +++ b/website/www/site/content/en/blog/beam-2.26.0.md @@ -0,0 +1,65 @@ +--- +title: "Apache Beam 2.26.0" +date: 2020-12-11 12:00:00 -0800 +categories: + - blog +authors: + - lostluck +--- + +We are happy to present the new 2.26.0 release of Apache Beam. This release includes both improvements and new functionality. +See the [download page](/get-started/downloads/#2260-2020-12-11) for this release. +For more information on changes in 2.26.0, check out the +[detailed release notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12348833). + +## Highlights + +* Splittable DoFn is now the default for executing the Read transform for Java based runners (Spark with bounded pipelines) in addition to existing runners from the 2.25.0 release (Direct, Flink, Jet, Samza, Twister2). The expected output of the Read transform is unchanged. Users can opt-out using `--experiments=use_deprecated_read`. The Apache Beam community is looking for feedback for this change as the community is planning to make this change permanent with no opt-out. If you run into an issue requiring the opt-out, please send an e-mail to [user@beam.apache.org](mailto:user@beam.apache.org) specifically referencing BEAM-10670 in the subject line and why you needed to opt-out. (Java) ([BEAM-10670](https://issues.apache.org/jira/browse/BEAM-10670)) + +## I/Os + +* Java BigQuery streaming inserts now have timeouts enabled by default. Pass `--HTTPWriteTimeout=0` to revert to the old behavior. ([BEAM-6103](https://issues.apache.org/jira/browse/BEAM-6103)) +* Added support for Contextual Text IO (Java), a version of text IO that provides metadata about the records ([BEAM-10124](https://issues.apache.org/jira/browse/BEAM-10124)). Support for this IO is currently experimental. Specifically, **there are no update-compatibility guarantees** for streaming jobs with this IO between current future verisons of Apache Beam SDK. 
+ +## New Features / Improvements +* Added support for avro payload format in Beam SQL Kafka Table ([BEAM-10885](https://issues.apache.org/jira/browse/BEAM-10885)) +* Added support for json payload format in Beam SQL Kafka Table ([BEAM-10893](https://issues.apache.org/jira/browse/BEAM-10893)) +* Added support for protobuf payload format in Beam SQL Kafka Table ([BEAM-10892](https://issues.apache.org/jira/browse/BEAM-10892)) +* Added support for avro payload format in Beam SQL Pubsub Table ([BEAM-5504](https://issues.apache.org/jira/browse/BEAM-5504)) +* Added option to disable unnecessary copying between operators in Flink Runner (Java) ([BEAM-11146](https://issues.apache.org/jira/browse/BEAM-11146)) +* Added CombineFn.setup and CombineFn.teardown to Python SDK. These methods let you initialize the CombineFn's state before any of the other methods of the CombineFn is executed and clean that state up later on. If you are using Dataflow, you need to enable Dataflow Runner V2 by passing `--experiments=use_runner_v2` before using this feature. ([BEAM-3736](https://issues.apache.org/jira/browse/BEAM-3736)) + +## Breaking Changes + +* BigQuery's DATETIME type now maps to Beam logical type org.apache.beam.sdk.schemas.logicaltypes.SqlTypes.DATETIME +* Pandas 1.x is now required for dataframe operations. + +## List of Contributors + +According to git shortlog, the following people contributed to the 2.26.0 release. Thank you to all contributors! + +Abhishek Yadav, AbhiY98, Ahmet Altay, Alan Myrvold, Alex Amato, Alexey Romanenko, +Andrew Pilloud, Ankur Goenka, Boyuan Zhang, Brian Hulette, Chad Dombrova, +Chamikara Jayalath, Curtis "Fjord" Hawthorne, Damon Douglas, dandy10, Daniel Oliveira, +David Cavazos, dennis, Derrick Qin, dpcollins-google, Dylan Hercher, emily, Esun Kim, +Gleb Kanterov, Heejong Lee, Ismaël Mejía, Jan Lukavský, Jean-Baptiste Onofré, Jing, +Jozef Vilcek, Justin White, Kamil Wasilewski, Kenneth Knowles, kileys, Kyle Weaver, +lostluck, Luke Cwik, Mark, Maximilian Michels, Milan Cermak, Mohammad Hossein Sekhavat, +Nelson Osacky, Neville Li, Ning Kang, pabloem, Pablo Estrada, pawelpasterz, +Pawel Pasterz, Piotr Szuberski, PoojaChandak, purbanow, rarokni, Ravi Magham, +Reuben van Ammers, Reuven Lax, Reza Rokni, Robert Bradshaw, Robert Burke, +Romain Manni-Bucau, Rui Wang, rworley-monster, Sam Rohde, Sam Whittle, shollyman, +Simone Primarosa, Siyuan Chen, Steve Niemitz, Steven van Rossum, sychen, Teodor Spæren, +Tim Clemons, Tim Robertson, Tobiasz Kędzierski, tszerszen, Tudor Marian, tvalentyn, +Tyson Hamilton, Udi Meiri, Vasu Gupta, xasm83, Yichi Zhang, yichuan66, Yifan Mai, +yoshiki.obata, Yueyang Qiu, yukihira1992 diff --git a/website/www/site/content/en/blog/beam-2.27.0.md b/website/www/site/content/en/blog/beam-2.27.0.md new file mode 100644 index 000000000000..5e2c4df39825 --- /dev/null +++ b/website/www/site/content/en/blog/beam-2.27.0.md @@ -0,0 +1,70 @@ +--- +title: "Apache Beam 2.27.0" +date: 2021-01-07 12:00:00 -0800 +categories: + - blog +authors: + - pabloem +--- + +We are happy to present the new 2.27.0 release of Apache Beam. This release includes both improvements and new functionality. +See the [download page](/get-started/downloads/#2270-2020-12-22) for this release. +For more information on changes in 2.27.0, check out the +[detailed release notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12349380). + +## Highlights + +* Java 11 Containers are now published with all Beam releases. 
+* There is a new transform `ReadAllFromBigQuery` that can receive multiple requests to read data from BigQuery at pipeline runtime. See [PR 13170](https://github.com/apache/beam/pull/13170), and [BEAM-9650](https://issues.apache.org/jira/browse/BEAM-9650). + + +## I/Os +* ReadFromMongoDB can now be used with MongoDB Atlas (Python) ([BEAM-11266](https://issues.apache.org/jira/browse/BEAM-11266).) +* ReadFromMongoDB/WriteToMongoDB will mask password in display_data (Python) ([BEAM-11444](https://issues.apache.org/jira/browse/BEAM-11444).) +* There is a new transform `ReadAllFromBigQuery` that can receive multiple requests to read data from BigQuery at pipeline runtime. See [PR 13170](https://github.com/apache/beam/pull/13170), and [BEAM-9650](https://issues.apache.org/jira/browse/BEAM-9650). + +## New Features / Improvements + +* Beam modules that depend on Hadoop are now tested for compatibility with Hadoop 3 ([BEAM-8569](https://issues.apache.org/jira/browse/BEAM-8569)). (Hive/HCatalog pending) +* Publishing Java 11 SDK container images now supported as part of Apache Beam release process. ([BEAM-8106](https://issues.apache.org/jira/browse/BEAM-8106)) +* Added Cloud Bigtable Provider extension to Beam SQL ([BEAM-11173](https://issues.apache.org/jira/browse/BEAM-11173), [BEAM-11373](https://issues.apache.org/jira/browse/BEAM-11373)) +* Added a schema provider for thrift data ([BEAM-11338](https://issues.apache.org/jira/browse/BEAM-11338)) +* Added combiner packing pipeline optimization to Dataflow runner. ([BEAM-10641](https://issues.apache.org/jira/browse/BEAM-10641)) +* Added an example to ingest data from Apache Kafka to Google Pub/Sub. ([BEAM-11065](https://issues.apache.org/jira/browse/BEAM-11065)) + +## Breaking Changes + +* HBaseIO hbase-shaded-client dependency should be now provided by the users ([BEAM-9278](https://issues.apache.org/jira/browse/BEAM-9278)). +* `--region` flag in amazon-web-services2 was replaced by `--awsRegion` ([BEAM-11331](https://issues.apache.org/jira/projects/BEAM/issues/BEAM-11331)). + + +## List of Contributors + +According to git shortlog, the following people contributed to the 2.27.0 release. Thank you to all contributors! 
+ +Ahmet Altay, Alan Myrvold, Alex Amato, Alexey Romanenko, Aliraza Nagamia, Allen Pradeep Xavier, +Andrew Pilloud, andreyKaparulin, Ashwin Ramaswami, Boyuan Zhang, Brent Worden, Brian Hulette, +Carlos Marin, Chamikara Jayalath, Costi Ciudatu, Damon Douglas, Daniel Collins, +Daniel Oliveira, David Huntsperger, David Lu, David Moravek, David Wrede, +dennis, Dennis Yung, dpcollins-google, Emily Ye, emkornfield, +Esun Kim, Etienne Chauchot, Eugene Nikolaiev, Frank Zhao, Haizhou Zhao, +Hector Acosta, Heejong Lee, Ilya, Iñigo San Jose Visiers, InigoSJ, +Ismaël Mejía, janeliulwq, Jan Lukavský, Kamil Wasilewski, Kenneth Jung, +Kenneth Knowles, Ke Wu, kileys, Kyle Weaver, lostluck, +Matt Casters, Maximilian Michels, Michal Walenia, Mike Dewar, nehsyc, +Nelson Osacky, Niels Basjes, Ning Kang, Pablo Estrada, palmere-google, +Pawel Pasterz, Piotr Szuberski, purbanow, Reuven Lax, rHermes, +Robert Bradshaw, Robert Burke, Rui Wang, Sam Rohde, Sam Whittle, +Siyuan Chen, Tim Robertson, Tobiasz Kędzierski, tszerszen, +Valentyn Tymofieiev, Tyson Hamilton, Udi Meiri, vachan-shetty, Xinyu Liu, +Yichi Zhang, Yifan Mai, yoshiki.obata, Yueyang Qiu diff --git a/website/www/site/content/en/blog/beam-2.28.0.md b/website/www/site/content/en/blog/beam-2.28.0.md new file mode 100644 index 000000000000..531ff514f619 --- /dev/null +++ b/website/www/site/content/en/blog/beam-2.28.0.md @@ -0,0 +1,85 @@ +--- +title: "Apache Beam 2.28.0" +date: 2021-02-22 12:00:00 -0800 +categories: + - blog +authors: + - chamikara +--- + +We are happy to present the new 2.28.0 release of Apache Beam. This release includes both improvements and new functionality. +See the [download page](/get-started/downloads/#2280-2021-02-13) for this release. +For more information on changes in 2.28.0, check out the +[detailed release notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12349499). + +## Highlights +* Many improvements related to Parquet support ([BEAM-11460](https://issues.apache.org/jira/browse/BEAM-11460), [BEAM-8202](https://issues.apache.org/jira/browse/BEAM-8202), and [BEAM-11526](https://issues.apache.org/jira/browse/BEAM-11526)) +* Hash Functions in BeamSQL ([BEAM-10074](https://issues.apache.org/jira/browse/BEAM-10074)) +* Hash functions in ZetaSQL ([BEAM-11624](https://issues.apache.org/jira/browse/BEAM-11624)) +* Create ApproximateDistinct using HLL Impl ([BEAM-10324](https://issues.apache.org/jira/browse/BEAM-10324)) + +## I/Os +* SpannerIO supports using BigDecimal for Numeric fields ([BEAM-11643](https://issues.apache.org/jira/browse/BEAM-11643)) +* Add Beam schema support to ParquetIO ([BEAM-11526](https://issues.apache.org/jira/browse/BEAM-11526)) +* Support ParquetTable Writer ([BEAM-8202](https://issues.apache.org/jira/browse/BEAM-8202)) +* GCP BigQuery sink (streaming inserts) uses runner determined sharding ([BEAM-11408](https://issues.apache.org/jira/browse/BEAM-11408)) +* PubSub support types: TIMESTAMP, DATE, TIME, DATETIME ([BEAM-11533](https://issues.apache.org/jira/browse/BEAM-11533)) + +## New Features / Improvements +* ParquetIO add methods _readGenericRecords_ and _readFilesGenericRecords_ can read files with an unknown schema. 
See [PR-13554](https://github.com/apache/beam/pull/13554) and ([BEAM-11460](https://issues.apache.org/jira/browse/BEAM-11460)) +* Added support for thrift in KafkaTableProvider ([BEAM-11482](https://issues.apache.org/jira/browse/BEAM-11482)) +* Added support for HadoopFormatIO to skip key/value clone ([BEAM-11457](https://issues.apache.org/jira/browse/BEAM-11457)) +* Support Conversion to GenericRecords in Convert.to transform ([BEAM-11571](https://issues.apache.org/jira/browse/BEAM-11571)). +* Support writes for Parquet Tables in Beam SQL ([BEAM-8202](https://issues.apache.org/jira/browse/BEAM-8202)). +* Support reading Parquet files with unknown schema ([BEAM-11460](https://issues.apache.org/jira/browse/BEAM-11460)) +* Support user configurable Hadoop Configuration flags for ParquetIO ([BEAM-11527](https://issues.apache.org/jira/browse/BEAM-11527)) +* Expose commit_offset_in_finalize and timestamp_policy to ReadFromKafka ([BEAM-11677](https://issues.apache.org/jira/browse/BEAM-11677)) +* S3 options does not provided to boto3 client while using FlinkRunner and Beam worker pool container ([BEAM-11799](https://issues.apache.org/jira/browse/BEAM-11799)) +* HDFS not deduplicating identical configuration paths ([BEAM-11329](https://issues.apache.org/jira/browse/BEAM-11329)) +* Hash Functions in BeamSQL ([BEAM-10074](https://issues.apache.org/jira/browse/BEAM-10074)) +* Create ApproximateDistinct using HLL Impl ([BEAM-10324](https://issues.apache.org/jira/browse/BEAM-10324)) +* Add Beam schema support to ParquetIO ([BEAM-11526](https://issues.apache.org/jira/browse/BEAM-11526)) +* Add a Deque Encoder ([BEAM-11538](https://issues.apache.org/jira/browse/BEAM-11538)) +* Hash functions in ZetaSQL ([BEAM-11624](https://issues.apache.org/jira/browse/BEAM-11624)) +* Refactor ParquetTableProvider ([](https://issues.apache.org/jira/browse/)) +* Add JVM properties to JavaJobServer ([BEAM-8344](https://issues.apache.org/jira/browse/BEAM-8344)) +* Single source of truth for supported Flink versions ([](https://issues.apache.org/jira/browse/)) +* Use metric for Python BigQuery streaming insert API latency logging ([BEAM-11018](https://issues.apache.org/jira/browse/BEAM-11018)) +* Use metric for Java BigQuery streaming insert API latency logging ([BEAM-11032](https://issues.apache.org/jira/browse/BEAM-11032)) +* Upgrade Flink runner to Flink versions 1.12.1 and 1.11.3 ([BEAM-11697](https://issues.apache.org/jira/browse/BEAM-11697)) +* Upgrade Beam base image to use Tensorflow 2.4.1 ([BEAM-11762](https://issues.apache.org/jira/browse/BEAM-11762)) +* Create Beam GCP BOM ([BEAM-11665](https://issues.apache.org/jira/browse/BEAM-11665)) + +## Breaking Changes + +* The Java artifacts "beam-sdks-java-io-kinesis", "beam-sdks-java-io-google-cloud-platform", and + "beam-sdks-java-extensions-sql-zetasql" declare Guava 30.1-jre dependency (It was 25.1-jre in Beam 2.27.0). + This new Guava version may introduce dependency conflicts if your project or dependencies rely + on removed APIs. If affected, ensure to use an appropriate Guava version via `dependencyManagement` in Maven and + `force` in Gradle. + +## List of Contributors + +According to git shortlog, the following people contributed to the 2.28.0 release. Thank you to all contributors! 
+ +Ahmet Altay, Alex Amato, Alexey Romanenko, Allen Pradeep Xavier, Anant Damle, Artur Khanin, +Boyuan Zhang, Brian Hulette, Chamikara Jayalath, Chris Roth, Costi Ciudatu, Damon Douglas, +Daniel Collins, Daniel Oliveira, David Cavazos, David Huntsperger, Elliotte Rusty Harold, +Emily Ye, Etienne Chauchot, Etta Rapp, Evan Palmer, Eyal, Filip Krakowski, Fokko Driesprong, +Heejong Lee, Ismaël Mejía, janeliulwq, Jan Lukavský, John Edmonds, Jozef Vilcek, Kenneth Knowles +Ke Wu, kileys, Kyle Weaver, MabelYC, masahitojp, Masato Nakamura, Milena Bukal, Miraç Vuslat Başaran, +Nelson Osacky, Niel Markwick, Ning Kang, omarismail94, Pablo Estrada, Piotr Szuberski, +ramazan-yapparov, Reuven Lax, Reza Rokni, rHermes, Robert Bradshaw, Robert Burke, Robert Gruener, +Romster, Rui Wang, Sam Whittle, shehzaadn-vd, Siyuan Chen, Sonam Ramchand, Tobiasz Kędzierski, +Tomo Suzuki, tszerszen, tvalentyn, Tyson Hamilton, Udi Meiri, Xinbin Huang, Yichi Zhang, +Yifan Mai, yoshiki.obata, Yueyang Qiu, Yusaku Matsuki diff --git a/website/www/site/content/en/blog/beam-2.29.0.md b/website/www/site/content/en/blog/beam-2.29.0.md new file mode 100644 index 000000000000..e2b3e1c9694e --- /dev/null +++ b/website/www/site/content/en/blog/beam-2.29.0.md @@ -0,0 +1,84 @@ +--- +title: "Apache Beam 2.29.0" +date: 2021-04-29 9:00:00 -0700 +categories: + - blog +authors: + - klk +--- + + +We are happy to present the new 2.29.0 release of Beam. +This release includes both improvements and new functionality. +See the [download page](/get-started/downloads/#2290-2021-04-15) for this release. + + + +For more information on changes in 2.29.0, check out the [detailed release notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12349629). + +## Highlights + +* Spark Classic and Portable runners officially support Spark 3 ([BEAM-7093](https://issues.apache.org/jira/browse/BEAM-7093)). +* Official Java 11 support for most runners (Dataflow, Flink, Spark) ([BEAM-2530](https://issues.apache.org/jira/browse/BEAM-2530)). +* DataFrame API now supports GroupBy.apply ([BEAM-11628](https://issues.apache.org/jira/browse/BEAM-11628)). + +### I/Os + +* Added support for S3 filesystem on AWS SDK V2 (Java) ([BEAM-7637](https://issues.apache.org/jira/browse/BEAM-7637)) +* GCP BigQuery sink (file loads) uses runner determined sharding for unbounded data ([BEAM-11772](https://issues.apache.org/jira/browse/BEAM-11772)) +* KafkaIO now recognizes the `partition` property in writing records ([BEAM-11806](https://issues.apache.org/jira/browse/BEAM-11806)) +* Support for Hadoop configuration on ParquetIO ([BEAM-11913](https://issues.apache.org/jira/browse/BEAM-11913)) + +### New Features / Improvements + +* DataFrame API now supports pandas 1.2.x ([BEAM-11531](https://issues.apache.org/jira/browse/BEAM-11531)). +* Multiple DataFrame API bugfixes ([BEAM-12071](https://issues.apache/jira/browse/BEAM-12071), [BEAM-11929](https://issues.apache/jira/browse/BEAM-11929)) +* DDL supported in SQL transforms ([BEAM-11850](https://issues.apache.org/jira/browse/BEAM-11850)) +* Upgrade Flink runner to Flink version 1.12.2 ([BEAM-11941](https://issues.apache.org/jira/browse/BEAM-11941)) + +### Breaking Changes + +* Deterministic coding enforced for GroupByKey and Stateful DoFns. Previously non-deterministic coding was allowed, resulting in keys not properly being grouped in some cases. 
([BEAM-11719](https://issues.apache.org/jira/browse/BEAM-11719)) + To restore the old behavior, one can register `FakeDeterministicFastPrimitivesCoder` with + `beam.coders.registry.register_fallback_coder(beam.coders.coders.FakeDeterministicFastPrimitivesCoder())` + or use the `allow_non_deterministic_key_coders` pipeline option. + +### Deprecations + +* Support for Flink 1.8 and 1.9 will be removed in the next release (2.30.0) ([BEAM-11948](https://issues.apache.org/jira/browse/BEAM-11948)). + +### Known Issues + +* See a full list of open [issues that affect](https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20affectedVersion%20%3D%202.29.0%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC) this version. + +## List of Contributors + +According to `git shortlog`, the following people contributed to the 2.29.0 release. Thank you to all contributors! + +Ahmet Altay, Alan Myrvold, Alex Amato, Alexander Chermenin, Alexey Romanenko, +Allen Pradeep Xavier, Amy Wu, Anant Damle, Andreas Bergmeier, Andrei Balici, +Andrew Pilloud, Andy Xu, Ankur Goenka, Bashir Sadjad, Benjamin Gonzalez, Boyuan +Zhang, Brian Hulette, Chamikara Jayalath, Chinmoy Mandayam, Chuck Yang, +dandy10, Daniel Collins, Daniel Oliveira, David Cavazos, David Huntsperger, +David Moravek, Dmytro Kozhevin, Emily Ye, Esun Kim, Evgeniy Belousov, Filip +Popić, Fokko Driesprong, Gris Cuevas, Heejong Lee, Ihor Indyk, Ismaël Mejía, +Jakub-Sadowski, Jan Lukavský, John Edmonds, Juan Sandoval, 谷口恵輔, Kenneth +Jung, Kenneth Knowles, KevinGG, Kiley Sok, Kyle Weaver, MabelYC, Mackenzie +Clark, Masato Nakamura, Milena Bukal, Miltos, Minbo Bae, Miraç Vuslat Başaran, +mynameborat, Nahian-Al Hasan, Nam Bui, Niel Markwick, Niels Basjes, Ning Kang, +Nir Gazit, Pablo Estrada, Ramazan Yapparov, Raphael Sanamyan, Reuven Lax, Rion +Williams, Robert Bradshaw, Robert Burke, Rui Wang, Sam Rohde, Sam Whittle, +Shehzaad Nakhoda, Shehzaad Nakhoda, Siyuan Chen, Sonam Ramchand, Steve Niemitz, +sychen, Sylvain Veyrié, Tim Robertson, Tobias Kaymak, Tomasz Szerszeń, Tomasz +Szerszeń, Tomo Suzuki, Tyson Hamilton, Udi Meiri, Valentyn Tymofieiev, Yichi +Zhang, Yifan Mai, Yixing Zhang, Yoshiki Obata diff --git a/website/www/site/content/en/blog/beam-2.30.0.md b/website/www/site/content/en/blog/beam-2.30.0.md new file mode 100644 index 000000000000..95ac7b678c31 --- /dev/null +++ b/website/www/site/content/en/blog/beam-2.30.0.md @@ -0,0 +1,80 @@ +--- +title: "Apache Beam 2.30.0" +date: 2021-06-09 9:00:00 -0700 +categories: + - blog +authors: + - heejong +--- + + +We are happy to present the new 2.30.0 release of Beam. +This release includes both improvements and new functionality. +See the [download page](/get-started/downloads/#2300-2021-06-09) for this release. + + + +For more information on changes in 2.30.0, check out the [detailed release notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12349978). + +## Highlights + +* Legacy Read transform (non-SDF based Read) is used by default for non-FnAPI opensource runners. 
Use `use_sdf_read` experimental flag to re-enable SDF based Read transforms ([BEAM-10670](https://issues.apache.org/jira/browse/BEAM-10670)) +* Upgraded vendored gRPC dependency to 1.36.0 ([BEAM-11227](https://issues.apache.org/jira/browse/BEAM-11227)) + + +### I/Os + +* Fixed the issue that WriteToBigQuery with batch file loads does not respect schema update options when there are multiple load jobs ([BEAM-11277](https://issues.apache.org/jira/browse/BEAM-11277)) +* Fixed the issue that the job didn't properly retry since BigQuery sink swallows HttpErrors when performing streaming inserts ([BEAM-12362](https://issues.apache.org/jira/browse/BEAM-12362)) + + +### New Features / Improvements + +* Added capability to declare resource hints in Java and Python SDKs ([BEAM-2085](https://issues.apache.org/jira/browse/BEAM-2085)) +* Added Spanner IO Performance tests for read and write in Python SDK ([BEAM-10029](https://issues.apache.org/jira/browse/BEAM-10029)) +* Added support for accessing GCP PubSub Message ordering keys, message IDs and message publish timestamp in Python SDK ([BEAM-7819](https://issues.apache.org/jira/browse/BEAM-7819)) +* DataFrame API: Added support for collecting DataFrame objects in interactive Beam ([BEAM-11855](https://issues.apache.org/jira/browse/BEAM-11855)) +* DataFrame API: Added [apache_beam.examples.dataframe](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/dataframe) module ([BEAM-12024](https://issues.apache.org/jira/browse/BEAM-12024)) +* Upgraded the GCP Libraries BOM version to 20.0.0 ([BEAM-11205](https://issues.apache.org/jira/browse/BEAM-11205)). For Google Cloud client library versions set by this BOM, see [this table](https://storage.googleapis.com/cloud-opensource-java-dashboard/com.google.cloud/libraries-bom/20.0.0/artifact_details.html) +* Added `sdkContainerImage` flag to (eventually) replace `workerHarnessContainerImage` ([BEAM-12212](https://issues.apache.org/jira/browse/BEAM-12212)) +* Added support for Dataflow update when schemas are used ([BEAM-12198](https://issues.apache.org/jira/browse/BEAM-12198)) +* Fixed the issue that `ZipFiles.zipDirectory` leaks native JVM memory ([BEAM-12220](https://issues.apache.org/jira/browse/BEAM-12220)) +* Fixed the issue that `Reshuffle.withNumBuckets` creates `(N*2)-1` buckets ([BEAM-12361](https://issues.apache.org/jira/browse/BEAM-12361)) + +### Breaking Changes + +* Drop support for Flink 1.8 and 1.9 ([BEAM-11948](https://issues.apache.org/jira/browse/BEAM-11948)) +* MongoDbIO: Read.withFilter() and Read.withProjection() are removed since they are deprecated since Beam 2.12.0 ([BEAM-12217](https://issues.apache.org/jira/browse/BEAM-12217)) +* RedisIO.readAll() was removed since it was deprecated since Beam 2.13.0. Please use RedisIO.readKeyPatterns() for the equivalent functionality ([BEAM-12214](https://issues.apache.org/jira/browse/BEAM-12214)) +* MqttIO.create() with clientId constructor removed because it was deprecated since Beam 2.13.0 ([BEAM-12216](https://issues.apache.org/jira/browse/BEAM-12216)) + +### Known Issues + +* See a full list of open [issues that affect](https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20affectedVersion%20%3D%202.30.0%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC) this version. + +## List of Contributors + +According to `git shortlog`, the following people contributed to the 2.30.0 release. Thank you to all contributors! 
+ +Ahmet Altay, Alex Amato, Alexey Romanenko, Anant Damle, Andreas Bergmeier, Andrew Pilloud, Ankur Goenka, +Anup D, Artur Khanin, Benjamin Gonzalez, Bipin Upadhyaya, Boyuan Zhang, Brian Hulette, Bulat Shakirzyanov, +Chamikara Jayalath, Chun Yang, Daniel Kulp, Daniel Oliveira, David Cavazos, Elliotte Rusty Harold, Emily Ye, +Eric Roshan-Eisner, Evan Galpin, Fabien Caylus, Fernando Morales, Heejong Lee, Iñigo San Jose Visiers, +Isidro Martínez, Ismaël Mejía, Ke Wu, Kenneth Knowles, KevinGG, Kyle Weaver, Ludovic Post, MATTHEW Ouyang (LCL), +Mackenzie Clark, Masato Nakamura, Matthias Baetens, Max, Nicholas Azar, Ning Kang, Pablo Estrada, Patrick McCaffrey, +Quentin Sommer, Reuven Lax, Robert Bradshaw, Robert Burke, Rui Wang, Sam Rohde, Sam Whittle, Shoaib Zafar, +Siyuan Chen, Sruthi Sree Kumar, Steve Niemitz, Sylvain Veyrié, Tomo Suzuki, Udi Meiri, Valentyn Tymofieiev, +Vitaly Terentyev, Wenbing, Xinyu Liu, Yichi Zhang, Yifan Mai, Yueyang Qiu, Yunqing Zhou, ajo thomas, brucearctor, +dmkozh, dpcollins-google, emily, jordan-moore, kileys, lostluck, masahitojp, roger-mike, sychen, tvalentyn, +vachan-shetty, yoshiki.obata + diff --git a/website/www/site/content/en/blog/beam-2.31.0.md b/website/www/site/content/en/blog/beam-2.31.0.md new file mode 100644 index 000000000000..2a47fcadec3b --- /dev/null +++ b/website/www/site/content/en/blog/beam-2.31.0.md @@ -0,0 +1,76 @@ +--- +title: "Apache Beam 2.31.0" +date: 2021-07-08 9:00:00 -0700 +categories: + - blog +authors: + - apilloud +--- + + +We are happy to present the new 2.31.0 release of Beam. +This release includes both improvements and new functionality. +See the [download page](/get-started/downloads/#2310-2021-07-08) for this release. + + + +For more information on changes in 2.31.0, check out the [detailed release notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12349991). + +## Highlights + +### I/Os + +* Fixed bug in ReadFromBigQuery when a RuntimeValueProvider is used as value of table argument (Python) ([BEAM-12514](https://issues.apache.org/jira/browse/BEAM-12514)). + +### New Features / Improvements + +* `CREATE FUNCTION` DDL statement added to Calcite SQL syntax. `JAR` and `AGGREGATE` are now reserved keywords. ([BEAM-12339](https://issues.apache.org/jira/browse/BEAM-12339)). +* Flink 1.13 is now supported by the Flink runner ([BEAM-12277](https://issues.apache.org/jira/browse/BEAM-12277)). +* DatastoreIO: Write and delete operations now follow automatic gradual ramp-up, + in line with best practices (Java/Python) ([BEAM-12260](https://issues.apache.org/jira/browse/BEAM-12260), [BEAM-12272](https://issues.apache.org/jira/browse/BEAM-12272)). +* Python `TriggerFn` has a new `may_lose_data` method to signal potential data loss. Default behavior assumes safe (necessary for backwards compatibility). See Deprecations for potential impact of overriding this. ([BEAM-9487](https://issues.apache.org/jira/browse/BEAM-9487)). + +### Breaking Changes + +* Python Row objects are now sensitive to field order. So `Row(x=3, y=4)` is no + longer considered equal to `Row(y=4, x=3)` (BEAM-11929). +* Kafka Beam SQL tables now ascribe meaning to the LOCATION field; previously + it was ignored if provided. +* `TopCombineFn` disallow `compare` as its argument (Python) ([BEAM-7372](https://issues.apache.org/jira/browse/BEAM-7372)). +* Drop support for Flink 1.10 ([BEAM-12281](https://issues.apache.org/jira/browse/BEAM-12281)). 
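To make the first breaking change above concrete, here is a small illustrative sketch (our own example, not taken from the release notes) of the new field-order-sensitive behaviour of Python `Row` objects described in [BEAM-11929](https://issues.apache.org/jira/browse/BEAM-11929):

```
import apache_beam as beam

row_a = beam.Row(x=3, y=4)
row_b = beam.Row(y=4, x=3)

# Per the 2.31.0 release notes, rows built with the same fields in a
# different order no longer compare equal.
assert row_a != row_b

# Rows built with the same fields in the same order still compare equal.
assert row_a == beam.Row(x=3, y=4)
```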
+ +### Deprecations + +* Python GBK will stop supporting unbounded PCollections that have global windowing and a default trigger in Beam 2.33. This can be overriden with `--allow_unsafe_triggers`. ([BEAM-9487](https://issues.apache.org/jira/browse/BEAM-9487)). +* Python GBK will start requiring safe triggers or the `--allow_unsafe_triggers` flag starting with Beam 2.33. ([BEAM-9487](https://issues.apache.org/jira/browse/BEAM-9487)). + +### Known Issues + +* See a full list of [issues that affect](https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20affectedVersion%20%3D%202.31.0%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC) this version. + +## List of Contributors + +According to `git shortlog`, the following people contributed to the 2.31.0 release. Thank you to all contributors! + +Ahmet Altay, ajo thomas, Alan Myrvold, Alex Amato, Alexey Romanenko, +AlikRodriguez, Anant Damle, Andrew Pilloud, Benjamin Gonzalez, Boyuan Zhang, +Brian Hulette, Chamikara Jayalath, Daniel Oliveira, David Cavazos, +David Huntsperger, David Moravek, Dmytro Kozhevin, dpcollins-google, Emily Ye, +Ernesto Valentino, Evan Galpin, Fernando Morales, Heejong Lee, Ismaël Mejía, +Jan Lukavský, Josias Rico, jrynd, Kenneth Knowles, Ke Wu, kileys, Kyle Weaver, +masahitojp, Matthias Baetens, Maximilian Michels, Milena Bukal, +Nathan J. Mehl, Pablo Estrada, Peter Sobot, Reuven Lax, Robert Bradshaw, +Robert Burke, roger-mike, Sam Rohde, Sam Whittle, Stephan Hoyer, Tom Underhill, +tvalentyn, Uday Singh, Udi Meiri, Vitaly Terentyev, Xinyu Liu, Yichi Zhang, +Yifan Mai, yoshiki.obata, zhoufek + diff --git a/website/www/site/content/en/blog/beam-2.32.0.md b/website/www/site/content/en/blog/beam-2.32.0.md new file mode 100644 index 000000000000..d0d3b12bce75 --- /dev/null +++ b/website/www/site/content/en/blog/beam-2.32.0.md @@ -0,0 +1,105 @@ +--- + +title: "Apache Beam 2.32.0" + +date: 2021-08-25 00:00:01 -0800 + +categories: + + - blog + +authors: + + - angoenka + +--- + + + +We are happy to present the new 2.32.0 release of Apache Beam. This release includes both improvements and new functionality. + +See the [download page](/get-started/downloads/#2320-2021-08-11) for this release. + +For more information on changes in 2.32.0, check out the + +[detailed release notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12349992). + + +## Highlights +* The [Beam DataFrame + API](https://beam.apache.org/documentation/dsls/dataframes/overview/) is no + longer experimental! We've spent the time since the [2.26.0 preview + announcement](https://beam.apache.org/blog/dataframe-api-preview-available/) + implementing the most frequently used pandas operations + ([BEAM-9547](https://issues.apache.org/jira/browse/BEAM-9547)), improving + [documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.dataframe.html) + and [error messages](https://issues.apache.org/jira/browse/BEAM-12028), + adding + [examples](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/dataframe), + integrating DataFrames with [interactive + Beam](https://beam.apache.org/releases/pydoc/current/apache_beam.runners.interactive.interactive_beam.html), + and of course finding and fixing + [bugs](https://issues.apache.org/jira/issues/?jql=project%3DBEAM%20AND%20issuetype%3DBug%20AND%20status%3DResolved%20AND%20component%3Ddsl-dataframe). + Leaving experimental just means that we now have high confidence in the API + and recommend its use for production workloads. 
We will continue to improve + the API, guided by your + [feedback](https://beam.apache.org/community/contact-us/). + + +## I/Os + +* Added ability to use JdbcIO.Write.withResults without statement and preparedStatementSetter. ([BEAM-12511](https://issues.apache.org/jira/browse/BEAM-12511)) +- Added ability to register URI schemes to use the S3 protocol via FileIO. ([BEAM-12435](https://issues.apache.org/jira/browse/BEAM-12435)). +* Respect number of shards set in SnowflakeWrite batch mode. ([BEAM-12715](https://issues.apache.org/jira/browse/BEAM-12715)) +* Java SDK: Update Google Cloud Healthcare IO connectors from using v1beta1 to using the GA version. + +## New Features / Improvements + +* Add support to convert Beam Schema to Avro Schema for JDBC LogicalTypes: + `VARCHAR`, `NVARCHAR`, `LONGVARCHAR`, `LONGNVARCHAR`, `DATE`, `TIME` + (Java)([BEAM-12385](https://issues.apache.org/jira/browse/BEAM-12385)). +* Reading from JDBC source by partitions (Java) ([BEAM-12456](https://issues.apache.org/jira/browse/BEAM-12456)). +* PubsubIO can now write to a dead-letter topic after a parsing error (Java)([BEAM-12474](https://issues.apache.org/jira/browse/BEAM-12474)). +* New append-only option for Elasticsearch sink (Java) [BEAM-12601](https://issues.apache.org/jira/browse/BEAM-12601) + +## Breaking Changes + +* ListShards (with DescribeStreamSummary) is used instead of DescribeStream to list shards in Kinesis streams. Due to this change, as mentioned in [AWS documentation](https://docs.aws.amazon.com/kinesis/latest/APIReference/API_ListShards.html), for fine-grained IAM policies it is required to update them to allow calls to ListShards and DescribeStreamSummary APIs. For more information, see [Controlling Access to Amazon Kinesis Data Streams](https://docs.aws.amazon.com/streams/latest/dev/controlling-access.html) ([BEAM-12225](https://issues.apache.org/jira/browse/BEAM-12225)). + +## Deprecations + + +## Known Issues + +* Fixed race condition in RabbitMqIO causing duplicate acks (Java) ([BEAM-6516](https://issues.apache.org/jira/browse/BEAM-6516))) + + +## List of Contributors + + + +According to git shortlog, the following people contributed to the 2.32.0 release. Thank you to all contributors! 
+ + + +Ahmet Altay, Ajo Thomas, Alex Amato, Alexey Romanenko, Alex Koay, allenpradeep, Anant Damle, Andrew Pilloud, Ankur Goenka, Ashwin Ramaswami, Benjamin Gonzalez, BenWhitehead, Blake Williams, Boyuan Zhang, Brian Hulette, Chamikara Jayalath, Daniel Oliveira, Daniel Thevessen, daria-malkova, David Cavazos, David Huntsperger, dennisylyung, Dennis Yung, dmkozh, egalpin, emily, Esun Kim, Gabriel Melo de Paula, Harch Vardhan, Heejong Lee, heidimhurst, hoshimura, Iñigo San Jose Visiers, Ismaël Mejía, Jack McCluskey, Jan Lukavský, Justin King, Kenneth Knowles, KevinGG, Ke Wu, kileys, Kyle Weaver, Luke Cwik, Maksym Skorupskyi, masahitojp, Matthew Ouyang, Matthias Baetens, Matt Rudary, MiguelAnzoWizeline, Miguel Hernandez, Nikita Petunin, Ning Ding, Ning Kang, odidev, Pablo Estrada, Pascal Gillet, rafal.ochyra, raphael.sanamyan, Reuven Lax, Robert Bradshaw, Robert Burke, roger-mike, Ryan McDowell, Sam Rohde, Sam Whittle, Siyuan Chen, Teng Qiu, Tianzi Cai, Tobias Hermann, Tomo Suzuki, tvalentyn, Tyson Hamilton, Udi Meiri, Valentyn Tymofieiev, Vitaly Terentyev, Yichi Zhang, Yifan Mai, yoshiki.obata, Yu Feng, YuqiHuai, yzhang559, Zachary Houfek, zhoufek diff --git a/website/www/site/content/en/blog/dataframe-api-preview-available.md b/website/www/site/content/en/blog/dataframe-api-preview-available.md new file mode 100644 index 000000000000..548a30d458d0 --- /dev/null +++ b/website/www/site/content/en/blog/dataframe-api-preview-available.md @@ -0,0 +1,178 @@ +--- +title: "DataFrame API Preview now Available!" +date: "2020-12-16T09:09:41-08:00" +categories: + - blog +authors: + - bhulette + - robertwb +--- + + +We're excited to announce that a preview of the Beam Python SDK's new DataFrame +API is now available in [Beam +2.26.0](https://beam.apache.org/blog/beam-2.26.0/). Much like `SqlTransform` +([Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/extensions/sql/SqlTransform.html), +[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.sql.html#apache_beam.transforms.sql.SqlTransform)), +the DataFrame API gives Beam users a way to express complex +relational logic much more concisely than previously possible. + + +## A more expressive API +Beam's new DataFrame API aims to be compatible with the well known +[Pandas](https://pandas.pydata.org/pandas-docs/stable/index.html) +DataFrame API, with a few caveats detailed below. 
With this new API a simple +pipeline that reads NYC taxiride data from a CSV, performs a grouped +aggregation, and writes the output to CSV, can be expressed very concisely: + +``` +from apache_beam.dataframe.io import read_csv + +with beam.Pipeline() as p: + df = p | read_csv("gs://apache-beam-samples/nyc_taxi/2019/*.csv", + usecols=['passenger_count' , 'DOLocationID']) + # Count the number of passengers dropped off per LocationID + agg = df.groupby('DOLocationID').sum() + agg.to_csv(output) +``` + +Compare this to the same logic implemented as a conventional Beam python +pipeline with a `CombinePerKey`: + +``` +with beam.Pipeline() as p: + (p | beam.io.ReadFromText("gs://apache-beam-samples/nyc_taxi/2019/*.csv", + skip_header_lines=1) + | beam.Map(lambda line: line.split(',')) + # Parse CSV, create key - value pairs + | beam.Map(lambda splits: (int(splits[8] or 0), # DOLocationID + int(splits[3] or 0))) # passenger_count + # Sum values per key + | beam.CombinePerKey(sum) + | beam.MapTuple(lambda loc_id, pc: f'{loc_id},{pc}') + | beam.io.WriteToText(known_args.output)) +``` + +The DataFrame example is much easier to quickly inspect and understand, as it +allows you to concisely express grouped aggregations without using the low-level +`CombinePerKey`. + +In addition to being more expressive, a pipeline written with the DataFrame API +can often be more efficient than a conventional Beam pipeline. This is because +the DataFrame API defers to the very efficient, columnar Pandas implementation +as much as possible. + +## DataFrames as a DSL +You may already be aware of [Beam +SQL](https://beam.apache.org/documentation/dsls/sql/overview/), which is +a Domain-Specific Language (DSL) built with Beam's Java SDK. SQL is +considered a DSL because it's possible to express a full pipeline, including IOs +and complex operations, entirely with SQL.  + +Similarly, the DataFrame API is a DSL built with the Python SDK. You can see +that the above example is written without traditional Beam constructs like IOs, +ParDo, or CombinePerKey. In fact the only traditional Beam type is the Pipeline +instance! Otherwise this pipeline is written completely using the DataFrame API. +This is possible because the DataFrame API doesn't just implement Pandas' +computation operations, it also includes IOs based on the Pandas native +implementations (`pd.read_{csv,parquet,...}` and `pd.DataFrame.to_{csv,parquet,...}`). + +Like SQL, it's also possible to embed the DataFrame API into a larger pipeline +by using +[schemas](https://beam.apache.org/documentation/programming-guide/#what-is-a-schema). +A schema-aware PCollection can be converted to a DataFrame, processed, and the +result converted back to another schema-aware PCollection. For example, if you +wanted to use traditional Beam IOs rather than one of the DataFrame IOs you +could rewrite the above pipeline like this: + +``` +from apache_beam.dataframe.convert import to_dataframe +from apache_beam.dataframe.convert import to_pcollection + +with beam.Pipeline() as p: + ... + schema_pc = (p | beam.ReadFromText(..) 
+ # Use beam.Select to assign a schema + | beam.Select(DOLocationID=lambda line: int(...), + passenger_count=lambda line: int(...))) + df = to_dataframe(schema_pc) + agg = df.groupby('DOLocationID').sum() + agg_pc = to_pcollection(agg) + + # agg_pc has a schema based on the structure of agg + (agg_pc | beam.Map(lambda row: f'{row.DOLocationID},{row.passenger_count}') + | beam.WriteToText(..)) +```
+ +It's also possible to use the DataFrame API by passing a function to +[`DataframeTransform`](https://beam.apache.org/releases/pydoc/current/apache_beam.dataframe.transforms.html#apache_beam.dataframe.transforms.DataframeTransform): + +``` +from apache_beam.dataframe.transforms import DataframeTransform + +with beam.Pipeline() as p: + ... + | beam.Select(DOLocationID=lambda line: int(..), + passenger_count=lambda line: int(..)) + | DataframeTransform(lambda df: df.groupby('DOLocationID').sum()) + | beam.Map(lambda row: f'{row.DOLocationID},{row.passenger_count}') + ... +```
+ +## Caveats +As hinted above, there are some differences between Beam's DataFrame API and the +Pandas API. The most significant difference is that the Beam DataFrame API is +*deferred*, just like the rest of the Beam API. This means that you can't +`print()` a DataFrame instance in order to inspect the data, because we haven't +computed the data yet! The computation doesn't take place until the pipeline is +`run()`. Before that, we only know about the shape/schema of the result (i.e. +the names and types of the columns), and not the result itself.
+ +There are a few common exceptions you will likely see when attempting to use +certain Pandas operations: + +- **NotImplementedError:** Indicates this is an operation or argument that we + haven't had time to look at yet. We've tried to make as many Pandas operations + as possible available in the Preview offering of this new API, but there's + still a long tail of operations to go. +- **WontImplementError:** Indicates this is an operation or argument we do not + intend to support in the near-term because it's incompatible with the Beam + model. The largest class of operations that raise this error are those that + are order sensitive (e.g. shift, cummax, cummin, head, tail, etc.). These + cannot be trivially mapped to Beam because PCollections, representing + distributed datasets, are unordered. Note that even some of these operations + *may* get implemented in the future - we actually have some ideas for how we + might support order sensitive operations - but it's a ways off.
+ +Finally, it's important to note that this is a preview of a new feature that +will get hardened over the next few Beam releases. We would love for you to try +it out now and give us some feedback, but we do not yet recommend it for use in +production workloads.
+ +## How to get involved +The easiest way to get involved with this effort is to try out DataFrames and +let us know what you think! You can send questions to user@beam.apache.org, or +file bug reports and feature requests in [jira](https://issues.apache.org/jira). +In particular, it would be really helpful to know if there's an operation we +haven't implemented yet that you'd find useful, so that we can prioritize it.
+ +If you'd like to learn more about how the DataFrame API works under the hood and +get involved with the development, we recommend you take a look at the +[design doc](http://s.apache.org/beam-dataframes) +and our [Beam summit +presentation](https://2020.beamsummit.org/sessions/simpler-python-pipelines/). 
+From there the best way to help is to knock out some of those not implemented +operations. We're coordinating that work in +[BEAM-9547](https://issues.apache.org/jira/browse/BEAM-9547). diff --git a/website/www/site/content/en/blog/kafka-to-pubsub-example.md b/website/www/site/content/en/blog/kafka-to-pubsub-example.md new file mode 100644 index 000000000000..31972d95ffd4 --- /dev/null +++ b/website/www/site/content/en/blog/kafka-to-pubsub-example.md @@ -0,0 +1,89 @@ +--- +layout: post +title: "Example to ingest data from Apache Kafka to Google Cloud Pub/Sub" +date: 2021-01-15 00:00:01 -0800 +categories: + - blog + - java +authors: + - arturkhanin + - ilyakozyrev + - alexkosolapov +--- + + +In this blog post we present an example that creates a pipeline to read data from a single topic or +multiple topics from [Apache Kafka](https://kafka.apache.org/) and write data into a topic +in [Google Pub/Sub](https://cloud.google.com/pubsub). The example provides code samples to implement +simple yet powerful pipelines and also provides an out-of-the-box solution that you can just _" +plug'n'play"_. + +This end-to-end example is included +in [Apache Beam release 2.27](https://beam.apache.org/blog/beam-2.27.0/) +and can be downloaded [here](https://beam.apache.org/get-started/downloads/#2270-2020-12-22). + +We hope you will find this example useful for setting up data pipelines between Kafka and Pub/Sub. + +# Example specs + +Supported data formats: + +- Serializable plain text formats, such as JSON +- [PubSubMessage](https://cloud.google.com/pubsub/docs/reference/rest/v1/PubsubMessage) + +Supported input source configurations: + +- Single or multiple Apache Kafka bootstrap servers +- Apache Kafka SASL/SCRAM authentication over plaintext or SSL connection +- Secrets vault service [HashiCorp Vault](https://www.vaultproject.io/) + +Supported destination configuration: + +- Single Google Pub/Sub topic + +In a simple scenario, the example will create an Apache Beam pipeline that will read messages from a +source Kafka server with a source topic, and stream the text messages into specified Pub/Sub +destination topic. Other scenarios may need Kafka SASL/SCRAM authentication, that can be performed +over plaintext or SSL encrypted connection. The example supports using a single Kafka user account +to authenticate in the provided source Kafka servers and topics. To support SASL authentication over +SSL the example will need an SSL certificate location and access to a secrets vault service with +Kafka username and password, currently supporting HashiCorp Vault. + +# Where can I run this example? + +There are two ways to execute the pipeline. + +1. Locally. This way has many options - run directly from your IntelliJ, or create `.jar` file and + run it in the terminal, or use your favourite method of running Beam pipelines. +2. In [Google Cloud](https://cloud.google.com/) using Google + Cloud [Dataflow](https://cloud.google.com/dataflow): + - With `gcloud` command-line tool you can create + a [Flex Template](https://cloud.google.com/dataflow/docs/concepts/dataflow-templates) + out of this Beam example and execute it in Google Cloud Platform. 
_This requires corresponding + modifications of the example to turn it into a template._ + - This example exists as + a [Flex Template version](https://github.com/GoogleCloudPlatform/DataflowTemplates/tree/master/v2/kafka-to-pubsub) + within the [Google Cloud Dataflow Template Pipelines](https://github.com/GoogleCloudPlatform/DataflowTemplates) + repository and can be run with no additional code modifications.
+ +# Next Steps + +Give this **Beam end-to-end example** a try. If you are new to Beam, we hope this example will give +you a better understanding of how pipelines work and what they look like. If you are already using Beam, we hope +some code samples in it will be useful for your use cases. + +Please +[let us know](https://beam.apache.org/community/contact-us/) if you encounter any issues. + diff --git a/website/www/site/content/en/blog/pattern-match-beam-sql.md b/website/www/site/content/en/blog/pattern-match-beam-sql.md index fb0620616ed0..409eedf0d782 100644 --- a/website/www/site/content/en/blog/pattern-match-beam-sql.md +++ b/website/www/site/content/en/blog/pattern-match-beam-sql.md @@ -6,7 +6,7 @@ aliases: categories: - blog authors: - - Mark-Zeng + - markzeng ---
+ +We are pleased to announce that Splittable DoFn (SDF) is ready for use in the Beam Python, Java, +and Go SDKs for versions 2.25.0 and later.
+ +In 2017, the [Splittable DoFn blog post](https://beam.apache.org/blog/splittable-do-fn/) proposed building +the [Splittable DoFn](https://s.apache.org/splittable-do-fn) APIs as the new recommended way of +building I/O connectors. Splittable DoFn is a generalization of `DoFn` that gives it the core +capabilities of `Source` while retaining `DoFn`'s syntax, flexibility, modularity, and ease of +coding. Thus, it becomes much easier to develop complex I/O connectors with simpler and reusable +code.
+ +SDF has three advantages over the existing `UnboundedSource` and `BoundedSource`: +* SDF provides a unified set of APIs to handle both unbounded and bounded cases. +* SDF enables reading from source descriptors dynamically. + - Taking KafkaIO as an example, within the `UnboundedSource`/`BoundedSource` API, you must specify + the topic and partition you want to read from at pipeline construction time. There is no way + for `UnboundedSource`/`BoundedSource` to accept topics and partitions as inputs during execution + time. But this is built into SDF. +* SDF can sit anywhere in a pipeline while keeping the ability to split. + - `UnboundedSource`/`BoundedSource` has to be the root node of the pipeline to gain performance + benefits from splitting strategies, which limits many real-world usages. This is no longer a limitation + for an SDF.
+ +As SDF is now ready to use with all the mentioned improvements, it is the recommended +way to build new I/O connectors. Try out building your own Splittable DoFn by following the +[programming guide](https://beam.apache.org/documentation/programming-guide/#splittable-dofns). We +have provided many common utility classes, such as common types of `RestrictionTracker` and +`WatermarkEstimator`, in the Beam SDKs, which will help you onboard easily. As for the existing I/O +connectors, we have wrapped `UnboundedSource` and `BoundedSource` implementations into Splittable +DoFns, yet we still encourage developers to convert `UnboundedSource`/`BoundedSource` into actual +Splittable DoFn implementations to gain more performance benefits.
+ +Many thanks to every contributor who brought this highly anticipated design into the data processing +world. 
We are really excited to see that users benefit from SDF. + +Below are some real-world SDF examples for you to explore. + +## Real world Splittable DoFn examples + +**Java Examples** + +* [Kafka](https://github.com/apache/beam/blob/571338b0cc96e2e80f23620fe86de5c92dffaccc/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java#L118): +An I/O connector for [Apache Kafka](https://kafka.apache.org/) +(an open-source distributed event streaming platform). +* [Watch](https://github.com/apache/beam/blob/571338b0cc96e2e80f23620fe86de5c92dffaccc/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java#L787): +Uses a polling function producing a growing set of outputs for each input until a per-input +termination condition is met. +* [Parquet](https://github.com/apache/beam/blob/571338b0cc96e2e80f23620fe86de5c92dffaccc/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java#L365): +An I/O connector for [Apache Parquet](https://parquet.apache.org/) +(an open-source columnar storage format). +* [HL7v2](https://github.com/apache/beam/blob/6fdde4f4eab72b49b10a8bb1cb3be263c5c416b5/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IO.java#L493): +An I/O connector for HL7v2 messages (a clinical messaging format that provides data about events +that occur inside an organization) part of +[Google’s Cloud Healthcare API](https://cloud.google.com/healthcare). +* [BoundedSource wrapper](https://github.com/apache/beam/blob/571338b0cc96e2e80f23620fe86de5c92dffaccc/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java#L248): +A wrapper which converts an existing [BoundedSource](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/BoundedSource.html) +implementation to a splittable DoFn. +* [UnboundedSource wrapper](https://github.com/apache/beam/blob/571338b0cc96e2e80f23620fe86de5c92dffaccc/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java#L432): +A wrapper which converts an existing [UnboundedSource](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/UnboundedSource.html) +implementation to a splittable DoFn. + +**Python Examples** +* [BoundedSourceWrapper](https://github.com/apache/beam/blob/571338b0cc96e2e80f23620fe86de5c92dffaccc/sdks/python/apache_beam/io/iobase.py#L1375): +A wrapper which converts an existing [BoundedSource](https://beam.apache.org/releases/pydoc/current/apache_beam.io.iobase.html#apache_beam.io.iobase.BoundedSource) +implementation to a splittable DoFn. + +**Go Examples** + * [textio.ReadSdf](https://github.com/apache/beam/blob/ce190e11332469ea59b6c9acf16ee7c673ccefdd/sdks/go/pkg/beam/io/textio/sdf.go#L40) implements reading from text files using a splittable DoFn. 
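If you want a feel for the API before digging into the connectors above, here is a minimal, self-contained bounded Splittable DoFn sketch in Python (our own illustrative example, not one of the linked implementations). It uses the `OffsetRange` and `OffsetRestrictionTracker` utilities mentioned above to emit the numbers `0..n-1` for each input `n`; because every offset is claimed through the tracker, a runner is free to split off and redistribute the unclaimed remainder:

```
import apache_beam as beam
from apache_beam.io.restriction_trackers import OffsetRange, OffsetRestrictionTracker
from apache_beam.transforms.core import RestrictionProvider


class CountRestrictionProvider(RestrictionProvider):
  """Restricts processing of an integer n to the offset range [0, n)."""

  def initial_restriction(self, element):
    return OffsetRange(0, element)

  def create_tracker(self, restriction):
    return OffsetRestrictionTracker(restriction)

  def restriction_size(self, element, restriction):
    return restriction.size()


class GenerateNumbers(beam.DoFn):
  """Splittable DoFn that emits 0..n-1 for each input n."""

  def process(
      self,
      element,
      tracker=beam.DoFn.RestrictionParam(CountRestrictionProvider())):
    position = tracker.current_restriction().start
    # Claim one offset at a time; offsets that are never claimed can be
    # split off by the runner and processed elsewhere.
    while tracker.try_claim(position):
      yield position
      position += 1


with beam.Pipeline() as pipeline:
  _ = (pipeline
       | beam.Create([5, 10])
       | beam.ParDo(GenerateNumbers())
       | beam.Map(print))
```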
diff --git a/website/www/site/content/en/blog/test-stream.md b/website/www/site/content/en/blog/test-stream.md index c21fb2fdc7d8..845dcb892ed8 100644 --- a/website/www/site/content/en/blog/test-stream.md +++ b/website/www/site/content/en/blog/test-stream.md @@ -126,7 +126,7 @@ watermark and provide the result PCollection as input to the CalculateTeamScores PTransform: {{< highlight java >}} -TestStream infos = TestStream.create(AvroCoder.of(GameActionInfo.class)) +TestStream createEvents = TestStream.create(AvroCoder.of(GameActionInfo.class)) .addElements(new GameActionInfo("sky", "blue", 12, new Instant(0L)),                 new GameActionInfo("navy", "blue", 3, new Instant(0L)),                 new GameActionInfo("navy", "blue", 3, new Instant(0L).plus(Duration.standardMinutes(3)))) @@ -160,7 +160,7 @@ the system to be on time, as it arrives before the watermark passes the end of the window {{< highlight java >}} -TestStream infos = TestStream.create(AvroCoder.of(GameActionInfo.class)) +TestStream createEvents = TestStream.create(AvroCoder.of(GameActionInfo.class)) .addElements(new GameActionInfo("sky", "blue", 3, new Instant(0L)),         new GameActionInfo("navy", "blue", 3, new Instant(0L).plus(Duration.standardMinutes(3))))    // Move the watermark up to "near" the end of the window @@ -190,7 +190,7 @@ demonstrate the triggering behavior that causes the system to emit an on-time pane, and then after the late data arrives, a pane that refines the result. {{< highlight java >}} -TestStream infos = TestStream.create(AvroCoder.of(GameActionInfo.class)) +TestStream createEvents = TestStream.create(AvroCoder.of(GameActionInfo.class)) .addElements(new GameActionInfo("sky", "blue", 3, new Instant(0L)),           new GameActionInfo("navy", "blue", 3, new Instant(0L).plus(Duration.standardMinutes(3))))     // Move the watermark up to "near" the end of the window @@ -224,7 +224,7 @@ configured allowed lateness, we can demonstrate that the late element is dropped by the system. 
{{< highlight java >}} -TestStream infos = TestStream.create(AvroCoder.of(GameActionInfo.class)) +TestStream createEvents = TestStream.create(AvroCoder.of(GameActionInfo.class)) .addElements(new GameActionInfo("sky", "blue", 3, Duration.ZERO),          new GameActionInfo("navy", "blue", 3, Duration.standardMinutes(3)))     // Move the watermark up to "near" the end of the window @@ -260,7 +260,7 @@ to an input PCollection, occasionally advancing the processing time clock, and apply `CalculateUserScores` {{< highlight java >}} -TestStream.create(AvroCoder.of(GameActionInfo.class)) +TestStream createEvents = TestStream.create(AvroCoder.of(GameActionInfo.class))    .addElements(new GameActionInfo("scarlet", "red", 3, new Instant(0L)),                new GameActionInfo("scarlet", "red", 2, new Instant(0L).plus(Duration.standardMinutes(1)))) .advanceProcessingTime(Duration.standardMinutes(12)) @@ -270,7 +270,7 @@ TestStream.create(AvroCoder.of(GameActionInfo.class)) .advanceWatermarkToInfinity(); PCollection> userScores = -    p.apply(infos).apply(new CalculateUserScores(ALLOWED_LATENESS)); +    p.apply(createEvents).apply(new CalculateUserScores(ALLOWED_LATENESS)); {{< /highlight >}} Elements all arrive before the watermark, and are produced in the on-time pane diff --git a/website/www/site/content/en/blog/timely-processing.md b/website/www/site/content/en/blog/timely-processing.md index 440913f61b92..13d26ba6d3d2 100644 --- a/website/www/site/content/en/blog/timely-processing.md +++ b/website/www/site/content/en/blog/timely-processing.md @@ -429,9 +429,9 @@ is intuitively simple: you want to wait a certain amount of time and then receive a call back. To put the finishing touches on our example, we will set a processing time -timer as soon as any data is buffered. We track whether or not the timer has -been set so we don't continually reset it. When an element arrives, if the -timer has not been set, then we set it for the current moment plus +timer as soon as any data is buffered. Note that we set the timer only when +the current buffer is empty, so that we don't continually reset the timer. +When the first element arrives, we set the timer for the current moment plus `MAX_BUFFER_DURATION`. After the allotted processing time has passed, a callback will fire and enrich and emit any buffered elements. @@ -453,7 +453,6 @@ new DoFn() { @TimerId("stale") Timer staleTimer, @TimerId("expiry") Timer expiryTimer) { - boolean staleTimerSet = firstNonNull(staleSetState.read(), false); if (firstNonNull(countState.read(), 0) == 0) { staleTimer.offset(MAX_BUFFER_DURATION).setRelative(); } @@ -549,7 +548,7 @@ similar features before outside of Beam. ### Event Time Windowing "Just Works" -One of the raisons d'etre for Beam is correct processing of out-of-order event +One of the raisons d'être for Beam is correct processing of out-of-order event data, which is almost all event data. Beam's solution to out-of-order data is event time windowing, where windows in event time yield correct results no matter what windowing a user chooses or what order the events come in. 
diff --git a/website/www/site/content/en/blog/validate-beam-release.md b/website/www/site/content/en/blog/validate-beam-release.md new file mode 100644 index 000000000000..082f5b66f7fb --- /dev/null +++ b/website/www/site/content/en/blog/validate-beam-release.md @@ -0,0 +1,108 @@ +--- +title: "How to validate a Beam Release" +date: 2021-06-08 00:00:01 -0800 +categories: + - blog +authors: + - pabloem + +--- + + +Performing new releases is a core responsibility of any software project. +It is even more important in the culture of Apache projects. Releases are +the main flow of new code / features among the community of a project. + +Beam is no exception: We aspire to keep a release cadence of about 6 weeks, +and try to work with the community to release useful new features, and to +keep Beam useful. + +### Configure a Java build to validate a Beam release candidate + +First of all, it would be useful to have a single property in your `pom.xml` +where you keep the global Beam version that you're using. Something like this +in your `pom.xml`: + +{{< highlight java >}} +<properties> + ... + <beam.version>2.26.0</beam.version> + ... +</properties> +<dependencies> + <dependency> + <groupId>org.apache.beam</groupId> + <artifactId>beam-sdks-java-core</artifactId> + <version>${beam.version}</version> + </dependency> + ... +</dependencies> +{{< /highlight >}} + +Second, you can add a new profile to your `pom.xml` file. In this new profile, +add a new repository with the staging repository for the new Beam release. For +Beam 2.27.0, this was `https://repository.apache.org/content/repositories/orgapachebeam-1149/`. + +{{< highlight java >}} + <profile> + <id>validaterelease</id> + <repositories> + <repository> + <id>apache.beam.newrelease</id> + <url>${beam.release.repo}</url> + </repository> + </repositories> + </profile> +{{< /highlight >}} + +Once you have a `beam.version` property in your `pom.xml`, and a new profile +with the new release, you can run your `mvn` command activating the new profile, +and the new Beam version: + +``` +mvn test -Pvalidaterelease \ + -Dbeam.version=2.27.0 \ + -Dbeam.release.repo=https://repository.apache.org/content/repositories/orgapachebeam-XXXX/ +``` + +This should build your project against the new release, and run basic tests. +It will allow you to run basic validations against the new Beam release. +If you find any issues, then you can share them *before* the release is +finalized, so your concerns can be addressed by the community. + + +### Configuring a Python build to validate a Beam release candidate + +For Python SDK releases, you can install SDK from Pypi, by enabling the +installation of pre-release artifacts. + +First, make sure that your `requirements.txt` or `setup.py` files allow +for Beam versions above the current one. Something like this should install +the latest available version: + +``` +apache-beam<=3.0.0 +``` + +With that, you can ask `pip` to install pre-release versions of Beam in your +environment: + +``` +pip install --pre apache-beam +``` + +With that, the Beam version in your environment will be the latest release +candidate, and you can go ahead and run your tests to verify that everything +works well. 
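As a final sanity check (our own suggestion rather than part of the original instructions), it can help to confirm exactly which pre-release version was installed before running your test suite; the precise version string depends on what the release manager has staged:

```
pip install --pre apache-beam
python -c "import apache_beam; print(apache_beam.__version__)"
# Expect a pre-release string such as X.Y.Zrc1, then run your project's
# tests with pytest or whatever command your project normally uses.
```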
diff --git a/website/www/site/content/en/community/_index.md b/website/www/site/content/en/community/_index.md new file mode 100644 index 000000000000..917bc1ef6065 --- /dev/null +++ b/website/www/site/content/en/community/_index.md @@ -0,0 +1,28 @@ +--- +title: "Community Beam" +aliases: + - /use/issue-tracking/ + - /use/mailing-lists/ + - /get-started/support/ +--- + + + +# Welcome to the Beam Community!
+ +Beam is a tool created by the community, for the community. We work tirelessly to make it better, and you can too! +If you are a data or software developer, this is the place for you. + +{{< community/list_with_icons community_list >}} diff --git a/website/www/site/content/en/community/contact-us.md b/website/www/site/content/en/community/contact-us.md index cca25cda1224..c071818dfbea 100644 --- a/website/www/site/content/en/community/contact-us.md +++ b/website/www/site/content/en/community/contact-us.md @@ -6,6 +6,7 @@ aliases: - /use/mailing-lists/ - /get-started/support/ --- + -# Contact Us +# Contact us!
+ +The Apache Beam community is friendly and welcoming. We are glad to help with any +question, suggestion or idea you have. Contact us in the following channels: + +## Available contact channels + +{{< community/table_with_icons contact_us >}} + +## Mailing lists: what they are and how they work
+ +The official communication channels for Apache projects are their mailing lists, and Apache Beam has two main lists: [user@beam.apache.org](user@beam.apache.org) and [dev@beam.apache.org](dev@beam.apache.org). The topics for each of them can be seen in the section above. -There are many ways to reach the Beam user and developer communities - use -whichever one seems best. +### Subscribe and Unsubscribe: - -

    +Prior to sending emails to these lists, you need to subscribe. To subscribe, send a blank email to [user-subscribe@beam.apache.org](user-subscribe@beam.apache.org) or [dev-subscribe@beam.apache.org](dev-subscribe@beam.apache.org) depending on the list you want to write to. -| How to contact us | When to use it | -| ----------------- | ---------------| -| [user@](https://lists.apache.org/list.html?user@beam.apache.org) mailing list | User support and questions ([Subscribe](mailto:user-subscribe@beam.apache.org)[^1], [Unsubscribe](mailto:user-unsubscribe@beam.apache.org)[^1], [Archives](https://lists.apache.org/list.html?user@beam.apache.org)) | -| [dev@](https://lists.apache.org/list.html?dev@beam.apache.org) mailing list | Development discussions ([Subscribe](mailto:dev-subscribe@beam.apache.org)[^1], [Unsubscribe](mailto:dev-unsubscribe@beam.apache.org)[^1], [Archives](https://lists.apache.org/list.html?dev@beam.apache.org)) | -| [commits@](https://lists.apache.org/list.html?commits@beam.apache.org) mailing list | Firehose of commits, bugs, pull requests, etc. ([Subscribe](mailto:commits-subscribe@beam.apache.org)[^1], [Unsubscribe](mailto:commits-unsubscribe@beam.apache.org)[^1], [Archives](https://lists.apache.org/list.html?commits@beam.apache.org)) | -| [builds@](https://lists.apache.org/list.html?builds@beam.apache.org) mailing list | Firehose of build notifications from Jenkins ([Subscribe](mailto:builds-subscribe@beam.apache.org)[^1], [Unsubscribe](mailto:builds-unsubscribe@beam.apache.org)[^1], [Archives](https://lists.apache.org/list.html?builds@beam.apache.org)) | -| [JIRA bug tracker](https://issues.apache.org/jira/browse/BEAM) | Report bugs / discover known issues | -| [StackOverflow](https://stackoverflow.com/questions/tagged/apache-beam) | Ask and answer user support questions | -| [Slack](https://s.apache.org/beam-slack-channel) | Chat with users and developers in the ASF Slack. Note: Please [join the #beam channel](https://s.apache.org/beam-slack-channel) after you [created an account](https://s.apache.org/slack-invite). Please do not ask Beam questions in #general. | +To unsubscribe, send a blank email to [user-unsubscribe@beam.apache.org](user-unsubscribe@beam.apache.org) or [dev-unsubscribe@beam.apache.org](dev-unsubscribe@beam.apache.org) depending on the list you want to unsubscribe. -
    +### Useful Tips for Sending Emails -If you have questions about how to use Apache Beam, we recommend you try out the [user@](https://lists.apache.org/list.html?user@beam.apache.org) mailing list, and [StackOverflow](https://stackoverflow.com/questions/tagged/apache-beam). +Tip 1: Use tags in your subject line. +A tag is a word within a pair of brackets [] that indicate the type of message you’re sending. For example: [Bug] or [Proposal] or [Question] or [Idea]. Tags help folks navigate emails easier. -[^1]: To subscribe or unsubscribe, a blank email is fine. +Tip 2: If you’re asking a troubleshooting question, provide as much information as possible to help others replicate your issue or find possible solutions. -If you wish to report a security vulnerability, please contact [security@apache.org](mailto:security@apache.org). Apache Beam follows the typical [Apache vulnerability handling process](https://apache.org/security/committers.html#vulnerability-handling). +Tip 3: Share complete links instead of hyperlinks. A common practice in Apache is adding a number tag like [1] to indicate a word is a link or an attachment, and use the tag as a footnote to add the complete link at the end of your message. diff --git a/website/www/site/content/en/community/join-beam.md b/website/www/site/content/en/community/join-beam.md new file mode 100644 index 000000000000..db65498f1307 --- /dev/null +++ b/website/www/site/content/en/community/join-beam.md @@ -0,0 +1,46 @@ +--- +title: "Join Beam Community" +aliases: + - /community/ + - /use/issue-tracking/ + - /use/mailing-lists/ + - /get-started/support/ +--- + + + +# Join Beam Community! + +We’re so glad you want to join the Apache Beam community. Here’s a pathway you can follow to help you get started. This is not the only way to join, but it’s the recommended progression so you’re successful at reaching your objectives. + +## Join the Conversation + +{{< community/join_beam_columns number-of-data="0" >}} + +## Contribute to the Project + +{{< community/join_beam_columns number-of-data="1" >}} + +## Become a committer + +{{< community/join_beam_columns number-of-data="2" >}} + +## Become a PMC member + +{{< community/join_beam_columns number-of-data="3" >}} + +## Still have questions? + +Send us a note at user@beam.apache.org OR mention @ApacheBeam on Twitter OR ask us in the #beam slack channel. We look forward to seeing you in the community! diff --git a/website/www/site/content/en/community/powered-by.md b/website/www/site/content/en/community/powered-by.md deleted file mode 100644 index dd45ece15b4b..000000000000 --- a/website/www/site/content/en/community/powered-by.md +++ /dev/null @@ -1,27 +0,0 @@ ---- -title: 'Powered by Apache Beam' ---- - -# Projects and Products Powered by Apache Beam - -To add yourself to the list, please open a [pull request](https://github.com/apache/beam/edit/master/website/www/site/content/en/community/powered_by.md) adding your organization name, URL, a list of which Beam components you are using, and a short description of your use case. - -* **[Cloud Dataflow](https://cloud.google.com/dataflow):** Google Cloud Dataflow is a fully managed service for executing - Apache Beam pipelines within the Google Cloud Platform ecosystem. -* **[TensorFlow Extended (TFX)](https://www.tensorflow.org/tfx):** TensorFlow Extended (TFX) is an end-to-end platform - for deploying production ML pipelines. 
-* **[Apache Hop (incubating)](http://hop.apache.org):** Hop provides a complete data orchestration (ETL / DI) toolset with visual pipeline development. It supports execution on the main Apache Beam runners. - - diff --git a/website/www/site/content/en/contribute/_index.md b/website/www/site/content/en/contribute/_index.md index 4a095b70ba38..75d1cbfa975c 100644 --- a/website/www/site/content/en/contribute/_index.md +++ b/website/www/site/content/en/contribute/_index.md @@ -1,12 +1,15 @@ --- title: "Beam Contribution Guide" +type: "contribute" +layout: "arrow_template" aliases: - - /contribution-guide/ - - /contribute/contribution-guide/ - - /docs/contribute/ - - /contribute/source-repository/ - - /contribute/design-principles/ + - /contribution-guide/ + - /contribute/contribution-guide/ + - /docs/contribute/ + - /contribute/source-repository/ + - /contribute/design-principles/ --- + -# Apache Beam Contribution Guide +# Contribution guide + + -If you have questions, please [reach out to the Beam community](/contribute/get-help). +
-There are lots of opportunities to contribute: +There are lots of opportunities to contribute. You can, for example: - - ask or answer questions on [user@beam.apache.org](/community/contact-us/) or +- ask or answer questions on [user@beam.apache.org](/community/contact-us/) or [stackoverflow](https://stackoverflow.com/questions/tagged/apache-beam) - review proposed design ideas on [dev@beam.apache.org](/community/contact-us/) - improve the documentation @@ -49,29 +57,47 @@ There are lots of opportunities to contribute: https://cwiki.apache.org/confluence/display/BEAM/Contributor+FAQ) - organize local meetups of users or contributors to Apache Beam -Most importantly, if you have an idea of how to contribute, then do it! +
    + + + +
    + Below is a tutorial for contributing code to Beam, covering our tools and typical process in detail. ### Prerequisites -To contribute code, you need - - - a GitHub account - - a Linux, macOS, or Microsoft Windows development environment with Java JDK 8 installed + - A GitHub account. + - A Linux, macOS, or Microsoft Windows development environment + - Java JDK 8 installed - [Docker](https://www.docker.com/) installed for some tasks including building worker containers and testing this website - changes locally - - [Go](https://golang.org) 1.12 or later installed for Go SDK development + changes locally. + - For SDK Development: + - [Go](https://golang.org) 1.12 or later installed for Go SDK development - Python 3.6, 3.7, and 3.8. Yes, you need all three versions installed. - - pip, setuptools, virtualenv, and tox installed for Python development - - for large contributions, a signed [Individual Contributor License + - pip, setuptools, virtualenv, and tox installed for Python development + - For large contributions, a signed [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf) (ICLA) to the Apache Software Foundation (ASF). + +### Configuration options +You have two options for configuring your development environment: +- Local: + - Manually installing the prerequisites listed above. + - Using the automated script for Linux and macOS. +- Container: using a Docker image. + +#### Local: Debian-based Distribution +##### Manual steps To install these in a Debian-based distribution: +1. Execute: ``` sudo apt-get install \ @@ -83,27 +109,52 @@ sudo apt-get install \ docker-ce ``` -On some systems (like Ubuntu 20.04) these need to be installed also +2. On some systems, like Ubuntu 20.04, install these: + ``` pip3 install grpcio-tools mypy-protobuf ``` -You also need to [install Go](https://golang.org/doc/install). - -Once Go is installed, install goavro: +3. If you you develop in GO: + 1. Install [Go](https://golang.org/doc/install). + 1. Check BEAM repo is in: `$GOPATH/src/github.com/apache/` + 1. At the end, it should look like this: `$GOPATH/src/github.com/apache/beam` +4. Once Go is installed, install goavro: ``` $ export GOPATH=`pwd`/sdks/go/examples/.gogradle/project_gopath $ go get github.com/linkedin/goavro ``` -gLinux users should configure their machines for sudoless Docker. +**Important**: gLinux users should configure their machines for sudoless Docker. + +##### Automated script for Linux and macOS + +You can install these in a Debian-based distribution for Linux or macOs using the [local-env-setup.sh](https://github.com/apache/beam/blob/master/local-env-setup.sh) script, which is part of the Beam repo. It contains: + +* pip3 packages +* go packages +* goavro +* JDK 8 +* Python +* Docker + +To install: + +1. Execute: +``` +./local-env-setup.sh +``` + +#### Container: Docker-based Alternatively, you can use the Docker based local development environment to wrap your clone of the Beam repo into a container meeting the requirements above. You can start this container using the [start-build-env.sh](https://github.com/apache/beam/blob/master/start-build-env.sh) script which is part of the Beam repo: + +1. Execute: ``` ./start-build-env.sh ``` @@ -127,18 +178,58 @@ script which is part of the Beam repo: 1. Assign the issue to yourself. To get the permission to do so, email the [dev@ mailing list](/community/contact-us) to introduce yourself and to be added as a contributor in the Beam issue tracker including your - ASF Jira Username. 
For example [this welcome email]( - https://lists.apache.org/thread.html/e6018c2aaf7dc7895091434295e5b0fafe192b975e3e3761fcf0cda7@%3Cdev.beam.apache.org%3E). + ASF Jira Username. For example [this welcome email](https://lists.apache.org/thread.html/e6018c2aaf7dc7895091434295e5b0fafe192b975e3e3761fcf0cda7@%3Cdev.beam.apache.org%3E). 1. If your change is large or it is your first change, it is a good idea to [discuss it on the dev@ mailing list](/community/contact-us/). 1. For large changes create a design doc ([template](https://s.apache.org/beam-design-doc-template), [examples](https://s.apache.org/beam-design-docs)) and email it to the [dev@ mailing list](/community/contact-us). -### Development Setup +### Development Setup {#development-setup} + +1. Check [Git workflow tips](https://cwiki.apache.org/confluence/display/BEAM/Git+Tips) if you need help with git forking, cloning, branching, committing, pull requests, and squashing commits. +1. Clone the git repository. You can download it anywhere you like. + + $ mkdir -p ~/go/src/github.com/apache + $ cd ~/go/src/github.com/apache + $ git clone https://github.com/apache/beam + $ cd beam + + - For Go development: + We recommend putting it in your [`$GOPATH`](https://golang.org/doc/gopath_code#GOPATH) (`$HOME/go` by default on Unix systems). + 1. Clone the repo, and update your branch as normal + $ git clone https://github.com/apache/beam.git + $ cd beam + $ git remote add git@github.com:/beam.git + $ git fetch --all + 1. Get or Update all the Go SDK dependencies + $ go get -u ./... + +1. Check the environment was set up correctly. + - **Option 1**: validate the Go, Java, and Python environments: + + **Important**: Make sure you have activated Python development. +``` +./gradlew :checkSetup +``` + - **Option 2**: Run independent checks: + - For **Go development**: + 1. Execute: +``` +export GOLANG_PROTOBUF_REGISTRATION_CONFLICT=ignore +./gradlew :sdks:go:examples:wordCount +``` + - For **Python development**: + 1. Execute: +``` +./gradlew :sdks:python:wordCount +``` + - For **Java development**: + 1. Execute: +``` +./gradlew :examples:java:wordCount +``` -1. If you need help with git forking, cloning, branching, committing, pull requests, and squashing commits, see - [Git workflow tips](https://cwiki.apache.org/confluence/display/BEAM/Git+Tips) 1. Familiarize yourself with gradle and the project structure. At the root of the git repository, run: $ ./gradlew projects @@ -157,11 +248,7 @@ script which is part of the Beam repo: 1. Make sure you can build and run tests - Run the entire set of tests with: - - $ ./gradlew check - - You can limit testing to a particular module. Gradle will build just the necessary things to run those tests. For example: + Since Beam is a large project, usually, you will want to limit testing to the particular module you are working on. Gradle will build just the necessary things to run those tests. For example: $ ./gradlew -p sdks/go check $ ./gradlew -p sdks/java/io/cassandra check @@ -226,12 +313,14 @@ script which is part of the Beam repo: and keeps comment threads attached to the code. Please refrain from squashing new commits into reviewed commits before review is completed. Because squashing reviewed and unreviewed commits often makes it harder to - see the the difference between the review iterations, reviewers may ask you to unsquash new changes. + see the difference between the review iterations, reviewers may ask you to unsquash new changes. 1. 
After review is complete and the PR is accepted, fixup commits should be squashed (see [Git workflow tips](https://cwiki.apache.org/confluence/display/BEAM/Git+Tips)). Beam committers [can squash](https://beam.apache.org/contribute/committer-guide/#merging-it) all commits in the PR during merge, however if a PR has a mixture of independent changes that should not be squashed, and fixup commits, - then the PR author should help squashing fixup commits to maintain a clean commmit history. + then the PR author should help squashing fixup commits to maintain a clean commit history. + +
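As a concrete illustration of the squashing guidance above (a minimal sketch only; see the [Git workflow tips](https://cwiki.apache.org/confluence/display/BEAM/Git+Tips) page for the authoritative workflow), fixup commits are usually folded into the reviewed commits with an interactive rebase once the review is finished:

```
# During review: record follow-up changes as fixups of the commit they amend.
git commit --fixup <reviewed-commit-sha>

# After the review is accepted: squash all fixup! commits into their targets.
git rebase -i --autosquash origin/master

# The rewritten history has to be force-pushed to your PR branch.
git push --force-with-lease origin <your-branch>
```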
    ## When will my change show up in an Apache Beam release? diff --git a/website/www/site/content/en/contribute/attributes.md b/website/www/site/content/en/contribute/attributes.md new file mode 100644 index 000000000000..6fdb4b8b8f82 --- /dev/null +++ b/website/www/site/content/en/contribute/attributes.md @@ -0,0 +1,66 @@ +--- +title: "Attributes of a Beam community member" +layout: "arrow_template" +--- + + + +# Attributes of a Beam community member + +{{< figure src="/images/community/beam-logo-icon.svg" >}} + +## Knows, upholds, and reinforces the Beam community’s practices + +- They have a proven commitment to the project +- They share their intentions with the community +- They accept and integrate community feedback in their plans, designs, code, etc. +- They earnestly try to make Beam better with their contributions + +In particular, if a code contributor: + +- They earnestly try to make Beam better with their own code +- They earnestly try to make Beam better with code review +- They accept and integrate feedback on their code +- They know, follow, and enforce Beam’s practices while reviewing/merging code - style, documentation, testing, backward compatibility, etc. + +{{< figure src="/images/community/messages-icon.svg" >}} + +## Knows, upholds, and reinforces the Apache Software Foundation code of conduct + +In particular, we manifestly strive to: + +- Be open +- Be empathetic +- Be welcoming +- Be friendly +- Be patient +- Be collaborative +- Be inquisitive +- Be careful in the words that they choose + +[To learn more see the ASF documentation.](https://httpd.apache.org/docs/) + +{{< figure src="/images/community/diamond-icon.svg" >}} + +## Knows, upholds, and reinforces the responsibilities of an Apache Software Foundation committer + +- They help create a product that will outlive the interest of any particular volunteer (including themselves) +- They grow and maintain the health of the Apache community +- They help out with surrounding work, such as the website & documentation +- They help users +- They can be trusted to decide when code is ready for release, or when to ask someone else to make the judgment +- They can be trusted to decide when to merge code (if a code contributor) or when to ask someone else to make the judgment + +[To learn more see the ASF documentation.](https://httpd.apache.org/docs/) diff --git a/website/www/site/content/en/contribute/become-a-committer.md b/website/www/site/content/en/contribute/become-a-committer.md index 5d86beb471e0..126010fb7b6d 100644 --- a/website/www/site/content/en/contribute/become-a-committer.md +++ b/website/www/site/content/en/contribute/become-a-committer.md @@ -1,6 +1,7 @@ --- title: "Become A Committer" --- + -# Become a Committer +# Become a Beam Committer + +An Apache Beam [committer](https://www.apache.org/foundation/how-it-works.html#committers) takes many forms. There are many actions other than coding that build the trust we place in a committer - code review, design discussion, user support, community outreach, improving infrastructure, documentation, project management, etc. + +### What does it mean to be a committer? + +An Apache Beam committer has write access to the repository for merging pull requests, but you don’t have to be a code contributor to become a committer. Becoming a committer means that you have the project’s trust. Read the [ASF documentation](https://www.apache.org/dev/committers.html#committer-responsibilities) for more about being a committer in the Apache Software Foundation. 
+ +### Ways you can contribute +Everyone is welcome to join and contribute to the project in many ways, not only with code contributions. Things like asking questions, reporting bugs, proposing new features, improving documentation or the website, organizing events or writing blog posts, are also welcome and recognized. -An Apache Beam -[committer](https://www.apache.org/foundation/how-it-works.html#committers) has -write access to the repository for merging pull requests, but you don't have -to be a code contributor to become a committer. Becoming a committer means that -you have the project's trust. Read the [ASF -documentation](https://www.apache.org/dev/committers.html#committer-responsibilities) -for more about being a committer in the Apache Software Foundation. +{{< contributor/list_with_icons ways_of_contribution >}} +### What are the traits of an Apache Beam committer? + +{{< contributor/row_of_traits committer_traits >}} + +### Process The [PMC](https://www.apache.org/foundation/how-it-works.html#pmc-members) makes someone a committer via nomination, discussion, and then majority vote. We use data from as many sources as possible to inform our reasoning. Here are @@ -40,58 +49,3 @@ some examples: etc.) - Public events - Firsthand PMC testimonials - -The PMC has assembled the following set of guidelines for becoming a committer. - -## An Apache Beam committer... - -### Takes many forms - -There are many actions other than coding that build the trust we place in a -committer - code review, design discussion, user support, community outreach, improving -infrastructure, documentation, project management, etc. - -### Knows, upholds, and reinforces the Apache Software Foundation code of conduct - -See the [ASF -documentation](https://www.apache.org/foundation/policies/conduct.html). In -particular, they manifestly strive to: - - - Be open - - Be empathetic - - Be welcoming - - Be friendly - - Be patient - - Be collaborative - - Be inquisitive - - Be careful in the words that they choose - -### Knows, upholds, and reinforces the responsibilities of an Apache Software Foundation committer - -See the [ASF documentation](https://www.apache.org/dev/committers.html#committer-responsibilities). - - - They help create a product that will outlive the interest of any particular - volunteer (including themselves) - - They grow and maintain the health of the Apache community - - They help out with surrounding work, such as the website & documentation - - They help users - - They can be trusted to decide when code is ready for release, or when to ask - someone else to make the judgment - - They can be trusted to decide when to merge code (if a code contributor) or - when to ask someone else to make the judgment - -### Knows, upholds, and reinforces the Beam community’s practices - - - They have a proven commitment to the project - - They share their intentions with the community - - They accept and integrate community feedback in their plans, designs, - code, etc. - - They earnestly try to make Beam better with their contributions - - In particular, if a code contributor: - - They earnestly try to make Beam better with their own code - - They earnestly try to make Beam better with code review - - They accept and integrate feedback on their code - - They know, follow, and enforce Beam’s practices while - reviewing/merging code - style, documentation, testing, backward - compatibility, etc. 
- diff --git a/website/www/site/content/en/contribute/jira-priorities.md b/website/www/site/content/en/contribute/jira-priorities.md index 473c66eb126f..4bd5f0237e7f 100644 --- a/website/www/site/content/en/contribute/jira-priorities.md +++ b/website/www/site/content/en/contribute/jira-priorities.md @@ -39,6 +39,7 @@ unassigned. Most P1 bugs should block release. - data loss error - important component is nonfunctional for important use cases - major performance regression + - security related issues (CVEs) - failing postcommit test - flaky test diff --git a/website/www/site/content/en/contribute/release-guide.md b/website/www/site/content/en/contribute/release-guide.md index 782482ff7ad5..ee979b8395a8 100644 --- a/website/www/site/content/en/contribute/release-guide.md +++ b/website/www/site/content/en/contribute/release-guide.md @@ -21,13 +21,19 @@ limitations under the License. ## Introduction -The Apache Beam project periodically declares and publishes releases. A release is one or more packages of the project artifact(s) that are approved for general public distribution and use. They may come with various degrees of caveat regarding their perceived quality and potential for change, such as “alpha”, “beta”, “incubating”, “stable”, etc. +The Apache Beam project periodically declares and publishes releases. +A release is one or more packages of the project artifact(s) that are approved for general public distribution and use. +They may come with various degrees of caveat regarding their perceived quality and potential for change, such as “alpha”, “beta”, “incubating”, “stable”, etc. -The Beam community treats releases with great importance. They are a public face of the project and most users interact with the project only through the releases. Releases are signed off by the entire Beam community in a public vote. +The Beam community treats releases with great importance. +They are a public face of the project and most users interact with the project only through the releases. Releases are signed off by the entire Beam community in a public vote. -Each release is executed by a *Release Manager*, who is selected among the Beam committers. This document describes the process that the Release Manager follows to perform a release. Any changes to this process should be discussed and adopted on the [dev@ mailing list](/get-started/support/). +Each release is executed by a *Release Manager*, who is selected among the Beam committers. +This document describes the process that the Release Manager follows to perform a release. +Any changes to this process should be discussed and adopted on the [dev@ mailing list](/get-started/support/). -Please remember that publishing software has legal consequences. This guide complements the foundation-wide [Product Release Policy](http://www.apache.org/dev/release.html) and [Release Distribution Policy](http://www.apache.org/dev/release-distribution). +Please remember that publishing software has legal consequences. +This guide complements the foundation-wide [Product Release Policy](http://www.apache.org/dev/release.html) and [Release Distribution Policy](http://www.apache.org/dev/release-distribution). ### Overview @@ -50,11 +56,15 @@ The release process consists of several steps: ## 1. Decide to release -Deciding to release and selecting a Release Manager is the first step of the release process. This is a consensus-based decision of the entire community. +Deciding to release and selecting a Release Manager is the first step of the release process. 
+This is a consensus-based decision of the entire community. -Anybody can propose a release on the dev@ mailing list, giving a solid argument and nominating a committer as the Release Manager (including themselves). There’s no formal process, no vote requirements, and no timing requirements. Any objections should be resolved by consensus before starting the release. +Anybody can propose a release on the dev@ mailing list, giving a solid argument and nominating a committer as the Release Manager (including themselves). +There’s no formal process, no vote requirements, and no timing requirements. Any objections should be resolved by consensus before starting the release. -In general, the community prefers to have a rotating set of 3-5 Release Managers. Keeping a small core set of managers allows enough people to build expertise in this area and improve processes over time, without Release Managers needing to re-learn the processes for each release. That said, if you are a committer interested in serving the community in this way, please reach out to the community on the dev@ mailing list. +In general, the community prefers to have a rotating set of 3-5 Release Managers. +Keeping a small core set of managers allows enough people to build expertise in this area and improve processes over time, without Release Managers needing to re-learn the processes for each release. +That said, if you are a committer interested in serving the community in this way, please reach out to the community on the dev@ mailing list. ### Checklist to proceed to the next step @@ -65,9 +75,11 @@ In general, the community prefers to have a rotating set of 3-5 Release Managers ## 2. Prepare for the release -Before your first release, you should perform one-time configuration steps. This will set up your security keys for signing the release and access to various release repositories. +Before your first release, you should perform one-time configuration steps. + This will set up your security keys for signing the release and access to various release repositories. -To prepare for each release, you should audit the project status in the JIRA issue tracker, and do necessary bookkeeping. Finally, you should create a release branch from which individual release candidates will be built. +To prepare for each release, you should audit the project status in the JIRA issue tracker, and do necessary bookkeeping. +Finally, you should create a release branch from which individual release candidates will be built. __NOTE__: If you are using [GitHub two-factor authentication](https://help.github.com/articles/securing-your-account-with-two-factor-authentication-2fa/) and haven't configure HTTPS access, please follow [the guide](https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/) to configure command line access. @@ -88,19 +100,20 @@ Please have these credentials ready at hand, you will likely need to enter them #### GPG Key -You need to have a GPG key to sign the release artifacts. Please be aware of the ASF-wide [release signing guidelines](https://www.apache.org/dev/release-signing.html). If you don’t have a GPG key associated with your Apache account, please create one according to the guidelines. +You need to have a GPG key to sign the release artifacts. +Please be aware of the ASF-wide [release signing guidelines](https://www.apache.org/dev/release-signing.html). +If you don’t have a GPG key associated with your Apache account, please create one according to the guidelines. 
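If you prefer to create the key by hand rather than with the script described below, a minimal sketch of the usual GnuPG commands looks like the following (the key parameters and export step are illustrative; follow the ASF release signing guidelines for the authoritative settings):

```
# Generate a new key interactively; choose a strong RSA key and your @apache.org address.
gpg --full-generate-key

# Look up the long key ID of the key you just created.
gpg --list-secret-keys --keyid-format LONG

# Export the public key in ASCII-armored form, e.g. for appending to the KEYS files.
gpg --armor --export <YOUR_KEY_ID>
```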
-There are 2 ways to configure your GPG key for release, either using release automation script(which is recommended), -or running all commands manually. +There are 2 ways to configure your GPG key for release, either using release automation script(which is recommended), or running all commands manually. ##### Use preparation_before_release.sh to setup GPG -* Script: [preparation_before_release.sh](https://github.com/apache/beam/blob/master/release/src/main/scripts/preparation_before_release.sh) +* **Script:** [preparation_before_release.sh](https://github.com/apache/beam/blob/master/release/src/main/scripts/preparation_before_release.sh) -* Usage +* **Usage** ``` ./beam/release/src/main/scripts/preparation_before_release.sh ``` -* Tasks included +* **Tasks included** 1. Help you create a new GPG key if you want. 1. Configure ```git user.signingkey``` with chosen pubkey. 1. Add chosen pubkey into [dev KEYS](https://dist.apache.org/repos/dist/dev/beam/KEYS) and [release KEYS](https://dist.apache.org/repos/dist/release/beam/KEYS) @@ -157,18 +170,15 @@ Configure access to the [Apache Nexus repository](https://repository.apache.org/ In order to make yourself have right permission to stage java artifacts in Apache Nexus staging repository, please submit your GPG public key into [MIT PGP Public Key Server](http://pgp.mit.edu:11371/). -If MIT doesn't work for you (it probably won't, it's slow, returns 502 a lot, Nexus might error out not being able to find the keys), -use a keyserver at `ubuntu.com` instead: https://keyserver.ubuntu.com/. +If MIT doesn't work for you (it probably won't, it's slow, returns 502 a lot, Nexus might error out not being able to find the keys), use a keyserver at `ubuntu.com` instead: https://keyserver.ubuntu.com/. #### Website development setup -Updating the Beam website requires submitting PRs to both the main `apache/beam` -repo and the `apache/beam-site` repo. The first contains reference manuals -generated from SDK code, while the second updates the current release version -number. +Updating the Beam website requires submitting PRs to both the main `apache/beam` repo and the `apache/beam-site` repo. +The first contains reference manuals generated from SDK code, while the second updates the current release version number. -You should already have setup a local clone of `apache/beam`. Setting up a clone -of `apache/beam-site` is similar: +You should already have setup a local clone of `apache/beam`. +Setting up a clone of `apache/beam-site` is similar: $ git clone -b release-docs https://github.com/apache/beam-site.git $ cd beam-site @@ -176,22 +186,26 @@ of `apache/beam-site` is similar: $ git fetch --all $ git checkout -b origin/release-docs -Further instructions on website development on `apache/beam` is -[here](https://github.com/apache/beam/blob/master/website). Background -information about how the website is updated can be found in [Beam-Site -Automation Reliability](https://s.apache.org/beam-site-automation). +Further instructions on website development on `apache/beam` is [here](https://github.com/apache/beam/blob/master/website). +Background information about how the website is updated can be found in [Beam-Site Automation Reliability](https://s.apache.org/beam-site-automation). #### Register to PyPI -Release manager needs to have an account with PyPI. If you need one, [register at PyPI](https://pypi.python.org/account/register/). 
You also need to be a maintainer (or an owner) of the [apache-beam](https://pypi.python.org/pypi/apache-beam) package in order to push a new release. Ask on the mailing list for assistance. +Release manager needs to have an account with PyPI. +If you need one, [register at PyPI](https://pypi.python.org/account/register/). +You also need to be a maintainer (or an owner) of the [apache-beam](https://pypi.python.org/pypi/apache-beam) package in order to push a new release. +Ask on the mailing list for assistance. #### Login to DockerHub -Run following command manually. It will ask you to input your DockerHub ID and password if -authorization info cannot be found from ~/.docker/config.json file. +Run following command manually. +It will ask you to input your DockerHub ID and password if authorization info cannot be found from ~/.docker/config.json file. + ``` docker login docker.io ``` -After successful login, authorization info will be stored at ~/.docker/config.json file. For example, + +After successful login, authorization info will be stored at ~/.docker/config.json file. +For example, ``` "https://index.docker.io/v1/": { "auth": "xxxxxx" @@ -201,9 +215,12 @@ Release managers should have push permission; request membership in the [`beamma ### Create a new version in JIRA -When contributors resolve an issue in JIRA, they are tagging it with a release that will contain their changes. With the release currently underway, new issues should be resolved against a subsequent future release. Therefore, you should create a release item for this subsequent release, as follows: +When contributors resolve an issue in JIRA, they are tagging it with a release that will contain their changes. +With the release currently underway, new issues should be resolved against a subsequent future release. +Therefore, you should create a release item for this subsequent release, as follows: -__Attention__: Only PMC has permission to perform this. If you are not a PMC, please ask for help in dev@ mailing list. +__Attention__: Only PMC has permission to perform this. +If you are not a PMC, please ask for help in dev@ mailing list. 1. In JIRA, navigate to [`Beam > Administration > Versions`](https://issues.apache.org/jira/plugins/servlet/project-config/BEAM/versions). 1. Add a new release. Choose the next minor version number after the version currently underway, select the release cut date (today’s date) as the `Start Date`, and choose `Add`. @@ -215,7 +232,8 @@ __Attention__: Only PMC has permission to perform this. If you are not a PMC, pl ## 3. Investigate performance regressions -Check the Beam load tests for possible performance regressions. Measurements are available on [metrics.beam.apache.org](http://metrics.beam.apache.org). +Check the Beam load tests for possible performance regressions. +Measurements are available on [metrics.beam.apache.org](http://metrics.beam.apache.org). All Runners which publish data should be checked for the following, in both *batch* and *streaming* mode: @@ -224,8 +242,8 @@ All Runners which publish data should be checked for the following, in both *bat - [IO](http://metrics.beam.apache.org/d/bnlHKP3Wz/java-io-it-tests-dataflow): Runtime If regressions are found, the release branch can still be created, but the regressions should be investigated and fixed as part of the release process. -The role of the release manager is to file JIRA issues for each regression with the 'Fix Version' set to the to-be-released version. 
The release manager -oversees these just like any other JIRA issue marked with the 'Fix Version' of the release. +The role of the release manager is to file JIRA issues for each regression with the 'Fix Version' set to the to-be-released version. +The release manager oversees these just like any other JIRA issue marked with the 'Fix Version' of the release. The mailing list should be informed to allow fixing the regressions in the course of the release. @@ -233,14 +251,27 @@ The mailing list should be informed to allow fixing the regressions in the cours Attention: Only committer has permission to create release branch in apache/beam. -Release candidates are built from a release branch. As a final step in preparation for the release, you should create the release branch, push it to the Apache code repository, and update version information on the original branch. +Release candidates are built from a release branch. +As a final step in preparation for the release, you should create the release branch, push it to the Apache code repository, and update version information on the original branch. +The final state of the repository should match this diagram: + +Increment minor version on master branch and set Dataflow container version on release branch + +The key points to know: -There are 2 ways to cut a release branch: either running automation script(recommended), or running all commands manually. +- The `master` branch has the SNAPSHOT/dev version incremented. +- The release branch has the SNAPSHOT/dev version to be released. +- The Dataflow container image should be modified to the version to be released. + +This will all be accomplished by the [cut_release_branch.sh](https://github.com/apache/beam/blob/master/release/src/main/scripts/cut_release_branch.sh) +script. + +After cutting the branch, you should manually update `CHANGES.md` on `master` by adding a new section for the next release. #### Use cut_release_branch.sh to cut a release branch -* Script: [cut_release_branch.sh](https://github.com/apache/beam/blob/master/release/src/main/scripts/cut_release_branch.sh) +* **Script:** [cut_release_branch.sh](https://github.com/apache/beam/blob/master/release/src/main/scripts/cut_release_branch.sh) -* Usage +* **Usage** ``` # Cut a release branch ./beam/release/src/main/scripts/cut_release_branch.sh \ @@ -250,90 +281,23 @@ There are 2 ways to cut a release branch: either running automation script(recom # Show help page ./beam/release/src/main/scripts/cut_release_branch.sh -h ``` -* The script will: - 1. Create release-${RELEASE_VERSION} branch locally. - 1. Change and commit dev versoin number in master branch: - - [BeamModulePlugin.groovy](https://github.com/apache/beam/blob/e8abafe360e126818fe80ae0f6075e71f0fc227d/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L209), - [gradle.properties](https://github.com/apache/beam/blob/e8abafe360e126818fe80ae0f6075e71f0fc227d/gradle.properties#L25), - [version.py](https://github.com/apache/beam/blob/e8abafe360e126818fe80ae0f6075e71f0fc227d/sdks/python/apache_beam/version.py#L21) - 1. 
Change and commit version number in release branch: - - [version.py](https://github.com/apache/beam/blob/release-2.6.0/sdks/python/apache_beam/version.py#L21), - [build.gradle](https://github.com/apache/beam/blob/release-2.6.0/runners/google-cloud-dataflow-java/build.gradle#L39), - [gradle.properties](https://github.com/apache/beam/blob/release-2.16.0/gradle.properties#L27) - -#### (Alternative) Run all steps manually -* Checkout working branch - - Check out the version of the codebase from which you start the release. For a new minor or major release, this may be `HEAD` of the `master` branch. To build a hotfix/incremental release, instead of the `master` branch, use the release tag of the release being patched. (Please make sure your cloned repository is up-to-date before starting.) - - git checkout - - **NOTE**: If you are doing an incremental/hotfix release (e.g. 2.5.1), please check out the previous release tag, rather than the master branch. - -* Set up environment variables - - Set up a few environment variables to simplify Maven commands that follow. (We use `bash` Unix syntax in this guide.) - - RELEASE=2.5.0 - NEXT_VERSION_IN_BASE_BRANCH=2.6.0 - BRANCH=release-${RELEASE} - - Version represents the release currently underway, while next version specifies the anticipated next version to be released from that branch. Normally, 1.2.0 is followed by 1.3.0, while 1.2.3 is followed by 1.2.4. - - **NOTE**: Only if you are doing an incremental/hotfix release (e.g. 2.5.1), please check out the previous release tag, before running the following instructions: - - BASE_RELEASE=2.5.0 - RELEASE=2.5.1 - NEXT_VERSION_IN_BASE_BRANCH=2.6.0 - git checkout tags/${BASE_RELEASE} - -* Create release branch locally - - git branch ${BRANCH} - -* Update version files in the master branch. - - # Now change the version in existing gradle files, and Python files - sed -i -e "s/'${RELEASE}'/'${NEXT_VERSION_IN_BASE_BRANCH}'/g" build_rules.gradle - sed -i -e "s/${RELEASE}/${NEXT_VERSION_IN_BASE_BRANCH}/g" gradle.properties - sed -i -e "s/${RELEASE}/${NEXT_VERSION_IN_BASE_BRANCH}/g" sdks/python/apache_beam/version.py - - # Save changes in master branch - git add gradle.properties build_rules.gradle sdks/python/apache_beam/version.py - git commit -m "Moving to ${NEXT_VERSION_IN_BASE_BRANCH}-SNAPSHOT on master branch." - -* Check out the release branch. - - git checkout ${BRANCH} - - -* Update version files in release branch - - DEV=${RELEASE}.dev - sed -i -e "s/${DEV}/${RELEASE}/g" sdks/python/apache_beam/version.py - sed -i -e "s/${DEV}/${RELEASE}/g" gradle.properties - sed -i -e "s/'beam-master-.*'/'beam-${RELEASE}'/g" runners/google-cloud-dataflow-java/build.gradle - ### Start a snapshot build Start a build of [the nightly snapshot](https://ci-beam.apache.org/job/beam_Release_NightlySnapshot/) against master branch. -Some processes, including our archetype tests, rely on having a live SNAPSHOT of the current version -from the `master` branch. Once the release branch is cut, these SNAPSHOT versions are no longer found, -so builds will be broken until a new snapshot is available. +Some processes, including our archetype tests, rely on having a live SNAPSHOT of the current version from the `master` branch. +Once the release branch is cut, these SNAPSHOT versions are no longer found, so builds will be broken until a new snapshot is available. There are 2 ways to trigger a nightly build, either using automation script(recommended), or perform all operations manually. 
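If you want to double-check that a nightly snapshot is actually available for the new development version, one hedged way to do so (assuming the standard ASF snapshot repository layout and using `beam-sdks-java-core` as an example module) is to inspect the Maven metadata directly:

```
# List the SNAPSHOT versions currently published for one Beam module.
curl -s https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/maven-metadata.xml \
  | grep '<version>'
```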
#### Run start_snapshot_build.sh to trigger build -* Script: [start_snapshot_build.sh](https://github.com/apache/beam/blob/master/release/src/main/scripts/start_snapshot_build.sh) +* **Script:** [start_snapshot_build.sh](https://github.com/apache/beam/blob/master/release/src/main/scripts/start_snapshot_build.sh) -* Usage +* **Usage** ./beam/release/src/main/scripts/start_snapshot_build.sh -* The script will: +* **The script will:** 1. Install [hub](https://github.com/github/hub) with your agreement. 1. Touch an empty txt file and commit changes into ```${your remote beam repo}/snapshot_build``` 1. Use hub to create a PR against apache:master, which triggers a Jenkins job to build snapshot. @@ -360,10 +324,11 @@ There are 2 ways to perform this verification, either running automation script( ! Dataflow tests will fail if Dataflow worker container is not created and published by this time. (Should be done by Google) #### Run automation script (verify_release_build.sh) -* Script: [verify_release_build.sh](https://github.com/apache/beam/blob/master/release/src/main/scripts/verify_release_build.sh) +* **Script:** [verify_release_build.sh](https://github.com/apache/beam/blob/master/release/src/main/scripts/verify_release_build.sh) -* Usage - 1. Create a personal access token from your Github account. See instruction [here](https://help.github.com/en/articles/creating-a-personal-access-token-for-the-command-line). +* **Usage** + 1. Create a personal access token from your Github account. + See instruction [here](https://help.github.com/en/articles/creating-a-personal-access-token-for-the-command-line). It'll be used by the script for accessing Github API. You only need to enable "repo" permissions to this token. 1. Update required configurations listed in `RELEASE_BUILD_CONFIGS` in [script.config](https://github.com/apache/beam/blob/master/release/src/main/scripts/script.config) @@ -377,11 +342,11 @@ There are 2 ways to perform this verification, either running automation script( See `COMMENTS_TO_ADD` in [mass_comment.py](https://github.com/apache/beam/blob/master/release/src/main/scripts/mass_comment.py) for full list of phrases. -* Tasks included in the script +* **Tasks included in the script** 1. Installs ```hub``` with your agreement and setup local git repo; 1. Create a test PR against release branch; -Jenkins job `beam_Release_Gradle_Build` basically run `./gradlew build -PisRelease`. +The [`beam_Release_Gradle_Build`](https://ci-beam.apache.org/job/beam_Release_Gradle_Build/) Jenkins job runs `./gradlew build -PisRelease`. This only verifies that everything builds with unit tests passing. #### Verify the build succeeds @@ -391,14 +356,12 @@ This only verifies that everything builds with unit tests passing. 2. If build failed, scan log will contain all failures. 3. You should stabilize the release branch until release build succeeded. -There are some projects that don't produce the artifacts, e.g. `beam-test-tools`, you may be able to -ignore failures there. +There are some projects that don't produce the artifacts, e.g. `beam-test-tools`, you may be able to ignore failures there. -To triage the failures and narrow things down you may want to look at `settings.gradle` and run the build only for the -projects you're interested at the moment, e.g. `./gradlew :runners:java-fn-execution`. +To triage the failures and narrow things down you may want to look at `settings.gradle.kts` and run the build only for the projects you're interested at the moment, e.g. 
`./gradlew :runners:java-fn-execution`. #### (Alternative) Run release build manually (locally) -* Pre-installation for python build +* **Pre-installation for python build** 1. Install pip ``` @@ -422,7 +385,7 @@ projects you're interested at the moment, e.g. `./gradlew :runners:java-fn-execu sudo apt-get install python3.7-dev ``` -* Run gradle release build +* **Run gradle release build** 1. Clean current workspace @@ -448,36 +411,42 @@ projects you're interested at the moment, e.g. `./gradlew :runners:java-fn-execu #### Create release-blocking issues in JIRA -The verify_release_build.sh script may include failing or flaky tests. For each of the failing tests create a JIRA with the following properties: +The verify_release_build.sh script may include failing or flaky tests. +For each of the failing tests create a JIRA with the following properties: -* Issue Type: Bug +* **Issue Type:** Bug -* Summary: Name of failing gradle task and name of failing test (where applicable) in form of :MyGradleProject:SomeGradleTask NameOfFailedTest: Short description of failure +* **Summary:** Name of failing gradle task and name of failing test (where applicable) in form of :MyGradleProject:SomeGradleTask NameOfFailedTest: Short description of failure -* Priority: Major +* **Priority:** Major -* Component: "test-failures" +* **Component:** "test-failures" -* Fix Version: Release number of verified release branch +* **Fix Version:** Release number of verified release branch -* Description: Description of failure +* **Description:** Description of failure #### Inform the mailing list -The dev@beam.apache.org mailing list should be informed about the release branch being cut. Alongside with this note, -a list of pending issues and to-be-trigated issues should be included. Afterwards, this list can be refined and updated -by the release manager and the Beam community. +The dev@beam.apache.org mailing list should be informed about the release branch being cut. +Alongside with this note, a list of pending issues and to-be-triaged issues should be included. +Afterwards, this list can be refined and updated by the release manager and the Beam community. ********** ## 6. Triage release-blocking issues in JIRA -There could be outstanding release-blocking issues, which should be triaged before proceeding to build a release candidate. We track them by assigning a specific `Fix version` field even before the issue resolved. +There could be outstanding release-blocking issues, which should be triaged before proceeding to build a release candidate. +We track them by assigning the blocked release to the issue's `Fix version` field before the issue is resolved. -The list of release-blocking issues is available at the [version status page](https://issues.apache.org/jira/browse/BEAM/?selectedTab=com.atlassian.jira.jira-projects-plugin:versions-panel). Triage each unresolved issue with one of the following resolutions: -The release manager should triage what does and does not block a release. An issue should not block the release if the problem exists in the current released version or is a bug in new functionality that does not exist in the current released version. It should be a blocker if the bug is a regression between the currently released version and the release in progress and has no easy workaround. +The release manager should triage what does and does not block a release. 
+The list of release-blocking issues is available at the [version status page](https://issues.apache.org/jira/browse/BEAM/?selectedTab=com.atlassian.jira.jira-projects-plugin:versions-panel). +Triage each unresolved issue with one of the following resolutions: + +* An issue should not block the release if the problem exists in the current released version or is a bug in new functionality that does not exist in the current released version. +* An issue should be a blocker if the problem is a regression between the currently released version and the release in progress and has no easy workaround. For all JIRA issues: @@ -485,15 +454,20 @@ For all JIRA issues: For JIRA issues with type "Bug" or labeled "flaky": -* If the issue is a known continuously failing test, it is not acceptable to defer this until the next release. Please work with the Beam community to resolve the issue. -* If the issue is a known flaky test, make an attempt to delegate a fix. However, if the issue may take too long to fix (to the discretion of the release manager): +* If the issue is a known continuously failing test, it is not acceptable to defer this until the next release. + Please work with the Beam community to resolve the issue. +* If the issue is a known flaky test, make an attempt to delegate a fix. + However, if the issue may take too long to fix (to the discretion of the release manager): * Delegate manual testing of the flaky issue to ensure no release blocking issues. - * Update the `Fix Version` field to the version of the next release. Please consider discussing this with stakeholders and the dev@ mailing list, as appropriate. + * Update the `Fix Version` field to the version of the next release. + Please consider discussing this with stakeholders and the dev@ mailing list, as appropriate. For all other JIRA issues: -* If the issue has not been resolved and it is acceptable to defer this until the next release, update the `Fix Version` field to the new version you just created. Please consider discussing this with stakeholders and the dev@ mailing list, as appropriate. -* If the issue has not been resolved and it is not acceptable to release until it is fixed, the release cannot proceed. Instead, work with the Beam community to resolve the issue. +* If the issue has not been resolved and it is acceptable to defer this until the next release, update the `Fix Version` field to the new version you just created. + Please consider discussing this with stakeholders and the dev@ mailing list, as appropriate. +* If the issue has not been resolved and it is not acceptable to release until it is fixed, the release cannot proceed. + Instead, work with the Beam community to resolve the issue. If there is a bug found in the RC creation process/tools, those issues should be considered high priority and fixed in 7 days. @@ -502,7 +476,8 @@ If there is a bug found in the RC creation process/tools, those issues should be Check if there are outstanding cherry-picks into the release branch, [e.g. for `2.14.0`](https://github.com/apache/beam/pulls?utf8=%E2%9C%93&q=is%3Apr+base%3Arelease-2.14.0). Make sure they have blocker JIRAs attached and are OK to get into the release by checking with community if needed. -As the Release Manager you are empowered to accept or reject cherry-picks to the release branch. 
You are encouraged to ask the following questions to be answered on each cherry-pick PR and you can choose to reject cherry-pick requests if these questions are not satisfactorily answered: +As the Release Manager you are empowered to accept or reject cherry-picks to the release branch. +You are encouraged to ask the following questions to be answered on each cherry-pick PR and you can choose to reject cherry-pick requests if these questions are not satisfactorily answered: * Is this a regression from a previous release? (If no, fix could go to a newer version.) * Is this a new feature or related to a new feature? (If yes, fix could go to a new version.) @@ -510,9 +485,12 @@ As the Release Manager you are empowered to accept or reject cherry-picks to th * What percentage of users would be impacted by this issue if it is not fixed? (E.g. If this is predicted to be a small number it may not need to be a cherry pick.) * Would it be possible for the impacted users to skip this version? (If users could skip this version, fix could go to a newer version.) -It is important to accept major/blocking fixes to isolated issues to make a higher quality release. However, beyond that each cherry pick will increase the time required for the release and add more last minute code to the release branch. Neither late releases nor not fully tested code will provide positive user value. +It is important to accept major/blocking fixes to isolated issues to make a higher quality release. +However, beyond that each cherry pick will increase the time required for the release and add more last minute code to the release branch. +Neither late releases nor not fully tested code will provide positive user value. -_Tip_: Another tool in your toolbox is the known issues section of the release blog. Consider adding known issues there for minor issues instead of accepting cherry picks to the release branch. +__Tip__: Another tool in your toolbox is the known issues section of the release blog. +Consider adding known issues there for minor issues instead of accepting cherry picks to the release branch. ********** @@ -535,21 +513,52 @@ _Tip_: Another tool in your toolbox is the known issues section of the release b * Nightly snapshot is in progress (do revisit it continually); * Set `JAVA_HOME` to JDK 8 (Example: `export JAVA_HOME=/example/path/to/java/jdk8`). -The core of the release process is the build-vote-fix cycle. Each cycle produces one release candidate. The Release Manager repeats this cycle until the community approves one release candidate, which is then finalized. +The core of the release process is the build-vote-fix cycle. +Each cycle produces one release candidate. +The Release Manager repeats this cycle until the community approves one release candidate, which is then finalized. For this step, we recommend you using automation script to create a RC, but you still can perform all steps manually if you want. +### Tag a chosen commit for the RC + +Release candidates are built from single commits off the release branch. +Before building, the version must be set to a non-SNAPSHOT, non-dev version. +The final state of the repository should match this diagram: + +Set version to non-SNAPSHOT, non-dev, on tagged RC commit + +- The release branch is unchanged. +- There is a commit not on the release branch with the version adjusted. +- The RC tag points to that commit. 
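For orientation, here is a rough sketch, in plain git and sed, of the state described above. The `choose_rc_commit.sh` script below automates all of this, and the file names are simply the version files referenced elsewhere in this guide, so treat the sketch as illustrative rather than as the actual implementation:

```
# Assumes RELEASE_VERSION (e.g. 2.31.0), RC_NUM, and COMMIT_REF are already set.
git checkout "${COMMIT_REF}"

# Drop the ".dev" suffix so the version is a plain release version.
sed -i -e "s/${RELEASE_VERSION}.dev/${RELEASE_VERSION}/g" gradle.properties
sed -i -e "s/${RELEASE_VERSION}.dev/${RELEASE_VERSION}/g" sdks/python/apache_beam/version.py

# Commit the version change off the release branch and tag that commit as the RC.
git commit -am "Set version to ${RELEASE_VERSION} for RC${RC_NUM}"
git tag "v${RELEASE_VERSION}-RC${RC_NUM}"

# Only the tag is pushed; the release branch itself stays unchanged.
git push origin "v${RELEASE_VERSION}-RC${RC_NUM}"
```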
+ +* **Script:** [choose_rc_commit.sh](https://github.com/apache/beam/blob/master/release/src/main/scripts/choose_rc_commit.sh) + +* **Usage** + + ./beam/release/src/main/scripts/choose_rc_commit.sh \ + --release "${RELEASE_VERSION}" \ + --rc "${RC_NUM}" \ + --commit "${COMMIT_REF}" \ + --clone \ + --push-tag + +You can do a dry run by omitting the `--push-tag` flag. Then it will only clone the repo, +adjust the version, and add the tag locally. If it looks good, run it again with `--push-tag`. +If you already have a clone that includes the `${COMMIT_REF}` then you can omit `--clone`. This +is perfectly safe since the script does not depend on the current working tree. + +See the source of the script for more details, or to run commands manually in case of a problem. ### Run build_release_candidate.sh to create a release candidate -* Script: [build_release_candidate.sh](https://github.com/apache/beam/blob/master/release/src/main/scripts/build_release_candidate.sh) +* **Script:** [build_release_candidate.sh](https://github.com/apache/beam/blob/master/release/src/main/scripts/build_release_candidate.sh) -* Usage +* **Usage** ./beam/release/src/main/scripts/build_release_candidate.sh -* The script will: - 1. Run gradle release to create rc tag and push source release into github repo. +* **The script will:** + 1. Clone the repo at the selected RC tag. 1. Run gradle publish to push java artifacts into Maven staging repo. 1. Stage source release into dist.apache.org dev [repo](https://dist.apache.org/repos/dist/dev/beam/). 1. Stage, sign and hash python source distribution and wheels into dist.apache.org dev repo python dir @@ -584,78 +593,78 @@ For this step, we recommend you using automation script to create a RC, but you 1. Select repository `orgapachebeam-NNNN`. 1. Click the Close button. 1. When prompted for a description, enter “Apache Beam, version X, release candidate Y”. - 1. Review all staged artifacts on https://repository.apache.org/content/repositories/orgapachebeam-NNNN/. They should contain all relevant parts for each module, including `pom.xml`, jar, test jar, javadoc, etc. Artifact names should follow [the existing format](https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.beam%22) in which artifact name mirrors directory structure, e.g., `beam-sdks-java-io-kafka`. Carefully review any new artifacts. + 1. Review all staged artifacts on https://repository.apache.org/content/repositories/orgapachebeam-NNNN/. + They should contain all relevant parts for each module, including `pom.xml`, jar, test jar, javadoc, etc. + Artifact names should follow [the existing format](https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.beam%22) in which artifact name mirrors directory structure, e.g., `beam-sdks-java-io-kafka`. + Carefully review any new artifacts. -********** +### Upload release candidate to PyPi +* **Script:** [deploy_release_candidate_pypi.sh](https://github.com/apache/beam/blob/master/release/src/main/scripts/deploy_release_candidate_pypi.sh) -## 8. Prepare documents +* **Usage** -### Update and Verify Javadoc + ./release/src/main/scripts/deploy_release_candidate_pypi.sh \ + --release "${RELEASE_VERSION}" \ + --rc "${RC_NUM}" \ + --user "${GITHUB_USER}" \ + --deploy -The build with `-PisRelease` creates the combined Javadoc for the release in `sdks/java/javadoc`. +* **The script will:** + 1. Download python binary artifacts + 1. 
Deploy release candidate to PyPI -The file `sdks/java/javadoc/build.gradle` contains a list of modules to include -in and exclude, plus a list of offline URLs that populate links from Beam's -Javadoc to the Javadoc for other modules that Beam depends on. +__Attention:__ Verify that: +* The File names version include ``rc-#`` suffix +* [Download Files](https://pypi.org/project/apache-beam/#files) have: + * All wheels uploaded as artifacts + * Release source's zip published + * Signatures and hashes do not need to be uploaded -* Confirm that new modules added since the last release have been added to the - inclusion list as appropriate. +You can do a dry run by omitting the `--deploy` flag. Then it will only download the release candidate binaries. If it looks good, rerun it with `--deploy`. -* Confirm that the excluded package list is up to date. +See the source of the script for more details or to run commands manually in case of a problem. -* Verify the version numbers for offline links match the versions used by Beam. If - the version number has changed, download a new version of the corresponding - `-docs/package-list` file. -### Build the Pydoc API reference +********** -Make sure you have ```tox``` installed: -``` -pip install tox -``` -Create the Python SDK documentation using sphinx by running a helper script. -``` -cd sdks/python && pip install -r build-requirements.txt && tox -e py37-docs -``` -By default the Pydoc is generated in `sdks/python/target/docs/_build`. Let `${PYDOC_ROOT}` be the absolute path to `_build`. +## 8. Prepare documents ### Propose pull requests for website updates -Beam publishes API reference manuals for each release on the website. For Java -and Python SDKs, that’s Javadoc and PyDoc, respectively. The final step of -building the candidate is to propose website pull requests that update these -manuals. +Beam publishes API reference manuals for each release on the website. +For Java and Python SDKs, that’s Javadoc and PyDoc, respectively. +The final step of building the candidate is to propose website pull requests that update these manuals. -Merge the pull requests only after finalizing the release. To avoid invalid -redirects for the 'current' version, merge these PRs in the order listed. Once -the PR is merged, the new contents will get picked up automatically and served -to the Beam website, usually within an hour. +Merge the pull requests only after finalizing the release. +To avoid invalid redirects for the 'current' version, merge these PRs in the order listed. +Once the PR is merged, the new contents will get picked up automatically and served to the Beam website, usually within an hour. +A committer can manually trigger the [beam_PostCommit_Website_Publish](https://ci-beam.apache.org/job/beam_PostCommit_Website_Publish/) task in Jenkins to avoid waiting. **PR 1: apache/beam-site** -This pull request is against the `apache/beam-site` repo, on the `release-docs` -branch ([example](https://github.com/apache/beam-site/pull/603)). +This pull request is against the `apache/beam-site` repo, on the `release-docs` branch ([example](https://github.com/apache/beam-site/pull/603)). It is created by `build_release_candidate.sh` (see above). **PR 2: apache/beam** -This pull request is against the `apache/beam` repo, on the `master` branch ([example](https://github.com/apache/beam/pull/11727)). +This pull request is against the `apache/beam` repo, on the `master` branch ([example](https://github.com/apache/beam/pull/15068)). 
+* Update `CHANGES.md` to update release date and remove template. * Update release version in `website/www/site/config.toml`. * Add new release in `website/www/site/content/en/get-started/downloads.md`. * Download links will not work until the release is finalized. * Update `website/www/site/static/.htaccess` to redirect to the new version. +* Create the Blog post: +#### Blog post -### Blog post - -Write a blog post similar to [beam-2.23.0.md](https://github.com/apache/beam/commit/b976e7be0744a32e99c841ad790c54920c8737f5#diff-8b1c3fd0d4a6765c16dfd18509182f9d). - -- Update `CHANGES.md` by adding a new section for the next release. +Use the template below to write a blog post for the release. +See [beam-2.31.0.md](https://github.com/apache/beam/commit/a32a75ed0657c122c6625aee1ace27994e7df195#diff-1e2b83a4f61dce8014a1989869b6d31eb3f80cb0d6dade42fb8df5d9407b4748) as an example. - Copy the changes for the current release from `CHANGES.md` to the blog post and edit as necessary. +- Be sure to add yourself to [authors.yml](https://github.com/apache/beam/blob/master/website/www/site/data/authors.yml) if necessary. __Tip__: Use git log to find contributors to the releases. (e.g: `git log --pretty='%aN' ^v2.10.0 v2.11.0 | sort | uniq`). Make sure to clean it up, as there may be duplicate or incorrect user names. @@ -663,56 +672,57 @@ Make sure to clean it up, as there may be duplicate or incorrect user names. __NOTE__: Make sure to include any breaking changes, even to `@Experimental` features, all major features and bug fixes, and all known issues. -Template: +**Template:** -``` -We are happy to present the new {$RELEASE_VERSION} release of Beam. This release includes both improvements and new functionality. -See the [download page](/get-started/downloads/{$DOWNLOAD_ANCHOR}) for this release. -For more information on changes in {$RELEASE_VERSION}, check out the -[detailed release notes]({$JIRA_RELEASE_NOTES}). + We are happy to present the new {$RELEASE_VERSION} release of Beam. + This release includes both improvements and new functionality. + See the [download page](/get-started/downloads/{$DOWNLOAD_ANCHOR}) for this release. -## Highlights + <{$REMOVE_FOR_VALID_SUMMARY_BREAK}!--more--> - * New highly anticipated feature X added to Python SDK ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)). - * New highly anticipated feature Y added to JavaSDK ([BEAM-Y](https://issues.apache.org/jira/browse/BEAM-Y)). + For more information on changes in {$RELEASE_VERSION}, check out the [detailed release notes]({$JIRA_RELEASE_NOTES}). -{$TOPICS e.g.:} -### I/Os -* Support for X source added (Java) ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)). -{$TOPICS} + ## Highlights -### New Features / Improvements + * New highly anticipated feature X added to Python SDK ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)). + * New highly anticipated feature Y added to Java SDK ([BEAM-Y](https://issues.apache.org/jira/browse/BEAM-Y)). -* X feature added (Python) ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)). -* Y feature added (Java) [BEAM-Y](https://issues.apache.org/jira/browse/BEAM-Y). + {$TOPICS e.g.:} + ### I/Os + * Support for X source added (Java) ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)). + {$TOPICS} -### Breaking Changes + ### New Features / Improvements -* X behavior was changed ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)). -* Y behavior was changed ([BEAM-Y](https://issues.apache.org/jira/browse/BEAM-Y)). 
+ * X feature added (Python) ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)). + * Y feature added (Java) [BEAM-Y](https://issues.apache.org/jira/browse/BEAM-Y). -### Deprecations + ### Breaking Changes -* X behavior is deprecated and will be removed in X versions ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)). + * X behavior was changed ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)). + * Y behavior was changed ([BEAM-Y](https://issues.apache.org/jira/browse/BEAM-Y)). -### Bugfixes + ### Deprecations -* Fixed X (Python) ([BEAM-Y](https://issues.apache.org/jira/browse/BEAM-X)). -* Fixed Y (Java) ([BEAM-Y](https://issues.apache.org/jira/browse/BEAM-Y)). + * X behavior is deprecated and will be removed in X versions ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)). -### Known Issues + ### Bugfixes -* {$KNOWN_ISSUE_1} -* {$KNOWN_ISSUE_2} -* See a full list of open [issues that affect](https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20affectedVersion%20%3D%20{$RELEASE}%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC) this version. + * Fixed X (Python) ([BEAM-Y](https://issues.apache.org/jira/browse/BEAM-X)). + * Fixed Y (Java) ([BEAM-Y](https://issues.apache.org/jira/browse/BEAM-Y)). + ### Known Issues -## List of Contributors + * {$KNOWN_ISSUE_1} + * {$KNOWN_ISSUE_2} + * See a full list of open [issues that affect](https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20affectedVersion%20%3D%20{$RELEASE}%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC) this version. -According to git shortlog, the following people contributed to the {$RELEASE_VERSION} release. Thank you to all contributors! -${CONTRIBUTORS} -``` + ## List of Contributors + + According to git shortlog, the following people contributed to the {$RELEASE_VERSION} release. Thank you to all contributors! + + ${CONTRIBUTORS} #### Checklist to proceed to the next step @@ -724,7 +734,7 @@ ${CONTRIBUTORS} You can (optionally) also do additional verification by: 1. Check that Python zip file contains the `README.md`, `NOTICE`, and `LICENSE` files. -1. Check hashes (e.g. `md5sum -c *.md5` and `sha1sum -c *.sha1`) +1. Check hashes (e.g. `md5sum -c *.md5` and `sha1sum -c *.sha1`. Note that signature/checksum files of Java artifacts may not contain filenames. Hence you might need to compare checksums/signatures manually or modify the files by appending the filenames.) 1. Check signatures (e.g. `gpg --verify apache-beam-1.2.3-python.zip.asc apache-beam-1.2.3-python.zip`) 1. `grep` for legal headers in each file. 1. Run all jenkins suites and include links to passing tests in the voting email. @@ -740,9 +750,11 @@ docker pull apache/beam_python3.5_sdk:2.16.0_rc1 ## 9. Vote and validate release candidate -Once you have built and individually reviewed the release candidate, please share it for the community-wide review. Please review foundation-wide [voting guidelines](http://www.apache.org/foundation/voting.html) for more information. +Once you have built and individually reviewed the release candidate, please share it for the community-wide review. +Please review foundation-wide [voting guidelines](http://www.apache.org/foundation/voting.html) for more information. -Start the review-and-vote thread on the dev@ mailing list. Here’s an email template; please adjust as you see fit. +Start the review-and-vote thread on the dev@ mailing list. +Here’s an email template; please adjust as you see fit. 
From: Release Manager To: dev@beam.apache.org @@ -762,14 +774,16 @@ Start the review-and-vote thread on the dev@ mailing list. Here’s an email tem * the official Apache source release to be deployed to dist.apache.org [2], which is signed with the key with fingerprint FFFFFFFF [3], * all artifacts to be deployed to the Maven Central Repository [4], * source code tag "v1.2.3-RC3" [5], - * website pull request listing the release [6], publishing the API reference manual [7], and the blog post [8]. + * website pull request listing the release [6], the blog post [6], and publishing the API reference manual [7]. * Java artifacts were built with Maven MAVEN_VERSION and OpenJDK/Oracle JDK JDK_VERSION. - * Python artifacts are deployed along with the source release to the dist.apache.org [2]. + * Python artifacts are deployed along with the source release to the dist.apache.org [2] and pypy[8]. * Validation sheet with a tab for 1.2.3 release to help with validation [9]. * Docker images published to Docker Hub [10]. The vote will be open for at least 72 hours. It is adopted by majority approval, with at least 3 PMC affirmative votes. + For guidelines on how to try the release in your projects, check out our blog post at https://beam.apache.org/blog/validate-beam-release/. + Thanks, Release Manager @@ -780,39 +794,26 @@ Start the review-and-vote thread on the dev@ mailing list. Here’s an email tem [5] https://github.com/apache/beam/tree/v1.2.3-RC3 [6] https://github.com/apache/beam/pull/... [7] https://github.com/apache/beam-site/pull/... - [8] https://github.com/apache/beam/pull/... + [8] https://pypi.org/project/apache-beam/1.2.3rc3/ [9] https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=... [10] https://hub.docker.com/search?q=apache%2Fbeam&type=image -If there are any issues found in the release candidate, reply on the vote thread to cancel the vote. There’s no need to wait 72 hours. Proceed to the `Fix Issues` step below and address the problem. However, some issues don’t require cancellation. For example, if an issue is found in the website pull request, just correct it on the spot and the vote can continue as-is. - -If there are no issues, reply on the vote thread to close the voting. Then, tally the votes in a separate email thread. Here’s an email template; please adjust as you see fit. - - From: Release Manager - To: dev@beam.apache.org - Subject: [RESULT] [VOTE] Release 1.2.3, release candidate #3 - - I'm happy to announce that we have unanimously approved this release. - - There are XXX approving votes, XXX of which are binding: - * approver 1 - * approver 2 - * approver 3 - * approver 4 - - There are no disapproving votes. - - Thanks everyone! +If there are any issues found in the release candidate, reply on the vote thread to cancel the vote. +There’s no need to wait 72 hours. +Proceed to the `Fix Issues` step below and address the problem. +However, some issues don’t require cancellation. +For example, if an issue is found in the website pull request, just correct it on the spot and the vote can continue as-is. ### Run validation tests All tests listed in this [spreadsheet](https://s.apache.org/beam-release-validation) -Since there are a bunch of tests, we recommend you running validations using automation script. In case of script failure, you can still run all of them manually. +Since there are a bunch of tests, we recommend you running validations using automation script. +In case of script failure, you can still run all of them manually. 
#### Run validations using run_rc_validation.sh -* Script: [run_rc_validation.sh](https://github.com/apache/beam/blob/master/release/src/main/scripts/run_rc_validation.sh) +* **Script:** [run_rc_validation.sh](https://github.com/apache/beam/blob/master/release/src/main/scripts/run_rc_validation.sh) -* Usage +* **Usage** 1. First update required configurations listed in `RC_VALIDATE_CONFIGS` in [script.config](https://github.com/apache/beam/blob/master/release/src/main/scripts/script.config) 1. Then run @@ -820,7 +821,7 @@ Since there are a bunch of tests, we recommend you running validations using aut ./beam/release/src/main/scripts/run_rc_validation.sh ``` -* Tasks included +* **Tasks included** 1. Run Java quickstart with Direct Runner, Flink local runner, Spark local runner and Dataflow runner. 1. Run Java Mobile Games(UserScore, HourlyTeamScore, Leaderboard) with Dataflow runner. 1. Create a PR to trigger python validation job, including @@ -833,7 +834,7 @@ Since there are a bunch of tests, we recommend you running validations using aut * Start a new terminal to run python GameStats with Direct Runner. * Start a new terminal to run python GameStats with Dataflow Runner. -* Tasks you need to do manually +* **Tasks you need to do manually** 1. Check whether validations succeed by following console output instructions. 1. Terminate streaming jobs and java injector. 1. Sign up [spreadsheet](https://s.apache.org/beam-release-validation). @@ -843,27 +844,27 @@ Since there are a bunch of tests, we recommend you running validations using aut _Note_: -Prepourl and -Pver can be found in the RC vote email sent by Release Manager. -* Java Quickstart Validation +* **Java Quickstart Validation** - Direct Runner: + **Direct Runner** ``` ./gradlew :runners:direct-java:runQuickstartJavaDirect \ -Prepourl=https://repository.apache.org/content/repositories/orgapachebeam-${KEY} \ -Pver=${RELEASE_VERSION} ``` - Flink Local Runner + **Flink Local Runner** ``` - ./gradlew :runners:flink:1.10:runQuickstartJavaFlinkLocal \ + ./gradlew :runners:flink:1.13:runQuickstartJavaFlinkLocal \ -Prepourl=https://repository.apache.org/content/repositories/orgapachebeam-${KEY} \ -Pver=${RELEASE_VERSION} ``` - Spark Local Runner + **Spark Local Runner** ``` - ./gradlew :runners:spark:runQuickstartJavaSpark \ + ./gradlew :runners:spark:2:runQuickstartJavaSpark \ -Prepourl=https://repository.apache.org/content/repositories/orgapachebeam-${KEY} \ -Pver=${RELEASE_VERSION} ``` - Dataflow Runner + **Dataflow Runner** ``` ./gradlew :runners:google-cloud-dataflow-java:runQuickstartJavaDataflow \ -Prepourl=https://repository.apache.org/content/repositories/orgapachebeam-${KEY} \ @@ -871,27 +872,26 @@ _Note_: -Prepourl and -Pver can be found in the RC vote email sent by Release Ma -PgcpProject=${YOUR_GCP_PROJECT} \ -PgcsBucket=${YOUR_GCP_BUCKET} ``` -* Java Mobile Game(UserScore, HourlyTeamScore, Leaderboard) +* **Java Mobile Game(UserScore, HourlyTeamScore, Leaderboard)** - Pre-request - * Create your own BigQuery dataset + **Prerequisites** + * **Create your own BigQuery dataset** ``` bq mk --project_id=${YOUR_GCP_PROJECT} ${YOUR_DATASET} ``` - * Create yout PubSub topic + * **Create your PubSub topic** ``` gcloud alpha pubsub topics create --project=${YOUR_GCP_PROJECT} ${YOUR_PROJECT_PUBSUB_TOPIC} ``` - * Setup your service account + * **Setup your service account** - Goto IAM console in your project to create a service account as ```project owner``` + Goto IAM console in your project to create a service account as `project 
owner`, then run - Run ``` gcloud iam service-accounts keys create ${YOUR_KEY_JSON} --iam-account ${YOUR_SERVICE_ACCOUNT_NAME}@${YOUR_PROJECT_NAME} export GOOGLE_APPLICATION_CREDENTIALS=${PATH_TO_YOUR_KEY_JSON} ``` - Run + **Run** ``` ./gradlew :runners:google-cloud-dataflow-java:runMobileGamingJavaDataflow \ -Prepourl=https://repository.apache.org/content/repositories/orgapachebeam-${KEY} \ @@ -900,28 +900,28 @@ _Note_: -Prepourl and -Pver can be found in the RC vote email sent by Release Ma -PgcsBucket=${YOUR_GCP_BUCKET} \ -PbqDataset=${YOUR_DATASET} -PpubsubTopic=${YOUR_PROJECT_PUBSUB_TOPIC} ``` -* Python Quickstart(batch & streaming), MobileGame(UserScore, HourlyTeamScore) +* **Python Quickstart(batch & streaming), MobileGame(UserScore, HourlyTeamScore)** - Create a new PR in apache/beam + Create a new PR in apache/beam. - In comment area, type in ```Run Python ReleaseCandidate``` + In comment area, type in `Run Python ReleaseCandidate` to trigger validation. -* Python Leaderboard & GameStats - * Get staging RC ```wget https://dist.apache.org/repos/dist/dev/beam/2.5.0/* ``` - * Verify the hashes +* **Python Leaderboard & GameStats** + * **Get staging RC** `wget https://dist.apache.org/repos/dist/dev/beam/2.5.0/* ` + * **Verify the hashes** ``` sha512sum -c apache-beam-2.5.0-python.zip.sha512 sha512sum -c apache-beam-2.5.0-source-release.zip.sha512 ``` - * Build SDK + * **Build SDK** ``` sudo apt-get install unzip unzip apache-beam-2.5.0-source-release.zip python setup.py sdist ``` - * Setup virtualenv + * **Setup virtualenv** ``` pip install --upgrade pip @@ -930,13 +930,13 @@ _Note_: -Prepourl and -Pver can be found in the RC vote email sent by Release Ma virtualenv beam_env . beam_env/bin/activate ``` - * Install SDK + * **Install SDK** ``` pip install dist/apache-beam-2.5.0.tar.gz pip install dist/apache-beam-2.5.0.tar.gz[gcp] ``` - * Setup GCP + * **Setup GCP** Please repeat following steps for every following test. @@ -949,46 +949,47 @@ _Note_: -Prepourl and -Pver can be found in the RC vote email sent by Release Ma ``` Setup your service account as described in ```Java Mobile Game``` section above. - Produce data by using java injector: - * Configure your ~/.m2/settings.xml as following: - ``` - - - - release-repo - - true - - - - Release 2.4.0 RC3 - Release 2.4.0 RC3 - https://repository.apache.org/content/repositories/orgapachebeam-1031/ - - - - - - ``` - _Note_: You can found the latest ```id```, ```name``` and ```url``` for one RC in the vote email thread sent out by Release Manager. + * **Produce data by using java injector:** - * Run - ``` - mvn archetype:generate \ - -DarchetypeGroupId=org.apache.beam \ - -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \ - -DarchetypeVersion=${RELEASE_VERSION} \ - -DgroupId=org.example \ - -DartifactId=word-count-beam \ - -Dversion="0.1" \ - -Dpackage=org.apache.beam.examples \ - -DinteractiveMode=false - -DarchetypeCatalog=internal - - mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.complete.game.injector.Injector \ - -Dexec.args="${YOUR_PROJECT} ${YOUR_PUBSUB_TOPIC} none" - ``` - * Run Leaderboard with Direct Runner + Configure your ~/.m2/settings.xml as following: + ``` + + + + release-repo + + true + + + + Release 2.4.0 RC3 + Release 2.4.0 RC3 + https://repository.apache.org/content/repositories/orgapachebeam-1031/ + + + + + + ``` + __Note__: You can found the latest ```id```, ```name``` and ```url``` for one RC in the vote email thread sent out by Release Manager. 
+ + Run + ``` + mvn archetype:generate \ + -DarchetypeGroupId=org.apache.beam \ + -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \ + -DarchetypeVersion=${RELEASE_VERSION} \ + -DgroupId=org.example \ + -DartifactId=word-count-beam \ + -Dversion="0.1" \ + -Dpackage=org.apache.beam.examples \ + -DinteractiveMode=false + -DarchetypeCatalog=internal + + mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.complete.game.injector.Injector \ + -Dexec.args="${YOUR_PROJECT} ${YOUR_PUBSUB_TOPIC} none" + ``` + * **Run Leaderboard with Direct Runner** ``` python -m apache_beam.examples.complete.game.leader_board \ --project=${YOUR_PROJECT} \ @@ -1000,7 +1001,8 @@ _Note_: -Prepourl and -Pver can be found in the RC vote email sent by Release Ma * Goto your BigQuery console and check whether your ${USER}_test has leader_board_users and leader_board_teams table. * bq head -n 10 ${USER}_test.leader_board_users * bq head -n 10 ${USER}_test.leader_board_teams - * Run Leaderboard with Dataflow Runner + + * **Run Leaderboard with Dataflow Runner** ``` python -m apache_beam.examples.complete.game.leader_board \ --project=${YOUR_PROJECT} \ @@ -1016,7 +1018,8 @@ _Note_: -Prepourl and -Pver can be found in the RC vote email sent by Release Ma * Goto your BigQuery console and check whether your ${USER}_test has leader_board_users and leader_board_teams table. * bq head -n 10 ${USER}_test.leader_board_users * bq head -n 10 ${USER}_test.leader_board_teams - * Run GameStats with Direct Runner + + * **Run GameStats with Direct Runner** ``` python -m apache_beam.examples.complete.game.game_stats \ --project=${YOUR_PROJECT} \ @@ -1030,7 +1033,7 @@ _Note_: -Prepourl and -Pver can be found in the RC vote email sent by Release Ma * bq head -n 10 ${USER}_test.game_stats_teams * bq head -n 10 ${USER}_test.game_stats_sessions - * Run GameStats with Dataflow Runner + * **Run GameStats with Dataflow Runner** ``` python -m apache_beam.examples.complete.game.game_stats \ --project=${YOUR_PROJECT} \ @@ -1051,17 +1054,45 @@ _Note_: -Prepourl and -Pver can be found in the RC vote email sent by Release Ma ### Fix any issues -Any issues identified during the community review and vote should be fixed in this step. Additionally, any JIRA issues created from the initial branch verification should be fixed. +Any issues identified during the community review and vote should be fixed in this step. +Additionally, any JIRA issues created from the initial branch verification should be fixed. -Code changes should be proposed as standard pull requests to the `master` branch and reviewed using the normal contributing process. Then, relevant changes should be cherry-picked into the release branch. The cherry-pick commits should then be proposed as the pull requests against the release branch, again reviewed and merged using the normal contributing process. +Code changes should be proposed as standard pull requests to the `master` branch and reviewed using the normal contributing process. +Then, relevant changes should be cherry-picked into the release branch proposed as pull requests against the release branch, again reviewed and merged using the normal contributing process. Once all issues have been resolved, you should go back and build a new release candidate with these changes. +### Finalize the vote + +Reply on the vote thread to close the voting once following conditions are met for the current release candidate. +* At least 72 hours has passed since the voting email. 
+* No release blocking issues have been identified. +* Voting thread has at least three approving PMC votes. + +Then, tally the votes in a separate email thread. +Here’s an email template; please adjust as you see fit. + + From: Release Manager + To: dev@beam.apache.org + Subject: [RESULT] [VOTE] Release 1.2.3, release candidate #3 + + I'm happy to announce that we have unanimously approved this release. + + There are XXX approving votes, XXX of which are binding: + * approver 1 + * approver 2 + * approver 3 + * approver 4 + + There are no disapproving votes. + + Thanks everyone! + ### Checklist to proceed to the next step 1. Issues identified during vote have been resolved, with fixes committed to the release branch. 2. All issues tagged with `Fix-Version` for the current release should be closed. -3. Community votes to release the proposed candidate, with at least three approving PMC votes +3. Community votes to release the proposed candidate, with at least three approving PMC votes. ********** @@ -1069,18 +1100,22 @@ Once all issues have been resolved, you should go back and build a new release c ## 10. Finalize the release -Once the release candidate has been reviewed and approved by the community, the release should be finalized. This involves the final deployment of the release candidate to the release repositories, merging of the website changes, etc. +Once the release candidate has been reviewed and approved by the community, the release should be finalized. +This involves the final deployment of the release candidate to the release repositories, merging of the website changes, etc. ### Deploy artifacts to Maven Central Repository -Use the [Apache Nexus repository manager](https://repository.apache.org/#stagingRepositories) to release the staged binary artifacts to the Maven Central repository. In the `Staging Repositories` section, find the relevant release candidate `orgapachebeam-XXX` entry and click `Release`. Drop all other release candidates that are not being released. +Use the [Apache Nexus repository manager](https://repository.apache.org/#stagingRepositories) to release the staged binary artifacts to the Maven Central repository. +In the `Staging Repositories` section, find the relevant release candidate `orgapachebeam-XXX` entry and click `Release`. +Drop all other release candidates that are not being released. + __NOTE__: If you are using [GitHub two-factor authentication](https://help.github.com/articles/securing-your-account-with-two-factor-authentication-2fa/) and haven't configure HTTPS access, please follow [the guide](https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/) to configure command line access. ### Deploy Python artifacts to PyPI -* Script: [deploy_pypi.sh](https://github.com/apache/beam/blob/master/release/src/main/scripts/deploy_pypi.sh) -* Usage +* **Script:** [deploy_pypi.sh](https://github.com/apache/beam/blob/master/release/src/main/scripts/deploy_pypi.sh) +* **Usage** ``` ./beam/release/src/main/scripts/deploy_pypi.sh ``` @@ -1088,16 +1123,17 @@ please follow [the guide](https://help.github.com/articles/creating-a-personal-a All wheels should be published, in addition to the zip of the release source. (Signatures and hashes do _not_ need to be uploaded.) 
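(Optional) As a final sanity check, you can install the just-published package into a clean virtualenv (`pip install apache-beam==${RELEASE_VERSION}`) and run a trivial pipeline. The snippet below is only a sketch and is not part of the official checklist; the expected version string is a placeholder.

```
# Post-upload smoke test; run after installing the released package from PyPI.
import apache_beam as beam
from apache_beam.version import __version__

expected_version = "1.2.3"  # placeholder: use the actual release version
assert __version__ == expected_version, __version__

# A trivial DirectRunner pipeline to confirm the published wheel is importable and runnable.
with beam.Pipeline() as p:
    _ = (p
         | beam.Create([1, 2, 3])
         | beam.Map(lambda x: x * x))
```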
-### Deploy SDK docker images to DockerHub -* Script: [publish_docker_images.sh](https://github.com/apache/beam/blob/master/release/src/main/scripts/publish_docker_images.sh) -* Usage +### Deploy docker images to DockerHub +* **Script:** [publish_docker_images.sh](https://github.com/apache/beam/blob/master/release/src/main/scripts/publish_docker_images.sh) +* **Usage** ``` ./beam/release/src/main/scripts/publish_docker_images.sh ``` -Verify that: -* Images are published at [DockerHub](https://hub.docker.com/search?q=apache%2Fbeam&type=image) with tags {RELEASE} and *latest*. -* Images with *latest* tag are pointing to current release by confirming - 1. Digest of the image with *latest* tag is the same as the one with {RELEASE} tag. +* **Verify that:** + * Images are published at [DockerHub](https://hub.docker.com/search?q=apache%2Fbeam&type=image) with tags {RELEASE} and *latest*. + * Images with *latest* tag are pointing to current release by confirming the digest of the image with *latest* tag is the same as the one with {RELEASE} tag. + +(Optional) Clean up any unneeded local images afterward to save disk space. ### Merge Website pull requests @@ -1131,31 +1167,37 @@ Note this script reads the release notes from the blog post, so you should make After running the script, the release notes should be visible on Github's [Releases](https://github.com/apache/beam/releases) page. ### PMC-Only Finalization -There are a few release finalization tasks that only PMC members have permissions to do. Ping [dev@](mailto:dev@beam.apache.org) for assistance if you need it. +There are a few release finalization tasks that only PMC members have permissions to do. +Ping [dev@](mailto:dev@beam.apache.org) for assistance if you need it. #### Deploy source release to dist.apache.org Copy the source release from the `dev` repository to the `release` repository at `dist.apache.org` using Subversion. -Make sure the last release's artifacts have been copied from `dist.apache.org` to `archive.apache.org`. This should happen automatically: [dev@ thread](https://lists.apache.org/thread.html/39c26c57c5125a7ca06c3c9315b4917b86cd0e4567b7174f4bc4d63b%40%3Cdev.beam.apache.org%3E) with context. The release manager should also make sure to change these links on the website ([example](https://github.com/apache/beam/pull/11727)). +Make sure the last release's artifacts have been copied from `dist.apache.org` to `archive.apache.org`. +This should happen automatically: [dev@ thread](https://lists.apache.org/thread.html/39c26c57c5125a7ca06c3c9315b4917b86cd0e4567b7174f4bc4d63b%40%3Cdev.beam.apache.org%3E) with context. +The release manager should also make sure to update these links on the website ([example](https://github.com/apache/beam/pull/11727)). #### Mark the version as released in JIRA -In JIRA, inside [version management](https://issues.apache.org/jira/plugins/servlet/project-config/BEAM/versions), hover over the current release and a settings menu will appear. Click `Release`, and select today’s date. +In JIRA, inside [version management](https://issues.apache.org/jira/plugins/servlet/project-config/BEAM/versions), hover over the current release and a settings menu will appear. +Click `Release`, and select today’s date. #### Recordkeeping with ASF -Use reporter.apache.org to seed the information about the release into future project reports. +Use [reporter.apache.org](https://reporter.apache.org/addrelease.html?beam) to seed the information about the release into future project reports. 
### Checklist to proceed to the next step * Maven artifacts released and indexed in the [Maven Central Repository](https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.beam%22) * Source distribution available in the release repository of [dist.apache.org](https://dist.apache.org/repos/dist/release/beam/) * Source distribution removed from the dev repository of [dist.apache.org](https://dist.apache.org/repos/dist/dev/beam/) -* Website pull request to [list the release](/get-started/downloads/) and publish the [API reference manual](https://beam.apache.org/releases/javadoc/) merged +* Website pull request to [list the release](/get-started/downloads/) and publish the [API reference manual](https://beam.apache.org/releases/javadoc/) merged. * The release is tagged on Github's [Tags](https://github.com/apache/beam/tags) page. * The release notes are published on Github's [Releases](https://github.com/apache/beam/releases) page. -* Release version finalized in JIRA. (Note: Not all committers have administrator access to JIRA. If you end up getting permissions errors ask on the mailing list for assistance.) +* Release version finalized in JIRA. + (Note: Not all committers have administrator access to JIRA. + If you end up getting permissions errors ask on the mailing list for assistance.) * Release version is listed at reporter.apache.org @@ -1173,12 +1215,12 @@ Announce on the dev@ mailing list that the release has been finished. Announce on the release on the user@ mailing list, listing major improvements and contributions. Announce the release on the announce@apache.org mailing list. -__NOTE__: This can only be done from `@apache.org` email address. - +__NOTE__: This can only be done from `@apache.org` email address. This email has to be in plain text (no HTML tags). ### Social media -Tweet, post on Facebook, LinkedIn, and other platforms. Ask other contributors to do the same. +Tweet, post on Facebook, LinkedIn, and other platforms. +Ask other contributors to do the same. Also, update [the Wikipedia article on Apache Beam](https://en.wikipedia.org/wiki/Apache_Beam). @@ -1195,6 +1237,9 @@ Also, update [the Wikipedia article on Apache Beam](https://en.wikipedia.org/wik ## Improve the process -It is important that we improve the release processes over time. Once you’ve finished the release, please take a step back and look what areas of this process and be improved. Perhaps some part of the process can be simplified. Perhaps parts of this guide can be clarified. +It is important that we improve the release processes over time. +Once you’ve finished the release, please take a step back and look what areas of this process and be improved. Perhaps some part of the process can be simplified. +Perhaps parts of this guide can be clarified. -If we have specific ideas, please start a discussion on the dev@ mailing list and/or propose a pull request to update this guide. Thanks! +If we have specific ideas, please start a discussion on the dev@ mailing list and/or propose a pull request to update this guide. +Thanks! diff --git a/website/www/site/content/en/contribute/runner-guide.md b/website/www/site/content/en/contribute/runner-guide.md index 57457a68fa8b..6bc65ecd6eb5 100644 --- a/website/www/site/content/en/contribute/runner-guide.md +++ b/website/www/site/content/en/contribute/runner-guide.md @@ -27,183 +27,6 @@ Topics covered: {{< toc >}} -## Basics of the Beam model - -Suppose you have a data processing engine that can pretty easily process graphs -of operations. 
You want to integrate it with the Beam ecosystem to get access -to other languages, great event time processing, and a library of connectors. -You need to know the core vocabulary: - - * [_Pipeline_](#pipeline) - A pipeline is a graph of transformations that a user constructs - that defines the data processing they want to do. - * [_PCollection_](#pcollections) - Data being processed in a pipeline is part of a PCollection. - * [_PTransforms_](#ptransforms) - The operations executed within a pipeline. These are best - thought of as operations on PCollections. - * _SDK_ - A language-specific library for pipeline authors (we often call them - "users" even though we have many kinds of users) to build transforms, - construct their pipelines and submit them to a runner - * _Runner_ - You are going to write a piece of software called a runner that - takes a Beam pipeline and executes it using the capabilities of your data - processing engine. - -These concepts may be very similar to your processing engine's concepts. Since -Beam's design is for cross-language operation and reusable libraries of -transforms, there are some special features worth highlighting. - -### Pipeline - -A pipeline in Beam is a graph of PTransforms operating on PCollections. A -pipeline is constructed by a user in their SDK of choice, and makes its way to -your runner either via the SDK directly or via the Runner API's (forthcoming) -RPC interfaces. - -### PTransforms - -In Beam, a PTransform can be one of the five primitives or it can be a -composite transform encapsulating a subgraph. The primitives are: - - * [_Read_](#implementing-the-read-primitive) - parallel connectors to external - systems - * [_ParDo_](#implementing-the-pardo-primitive) - per element processing - * [_GroupByKey_](#implementing-the-groupbykey-and-window-primitive) - - aggregating elements per key and window - * [_Flatten_](#implementing-the-flatten-primitive) - union of PCollections - * [_Window_](#implementing-the-window-primitive) - set the windowing strategy - for a PCollection - -When implementing a runner, these are the operations you need to implement. -Composite transforms may or may not be important to your runner. If you expose -a UI, maintaining some of the composite structure will make the pipeline easier -for a user to understand. But the result of processing is not changed. - -### PCollections - -A PCollection is an unordered bag of elements. Your runner will be responsible -for storing these elements. There are some major aspects of a PCollection to -note: - -#### Bounded vs Unbounded - -A PCollection may be bounded or unbounded. - - - _Bounded_ - it is finite and you know it, as in batch use cases - - _Unbounded_ - it may be never end, you don't know, as in streaming use cases - -These derive from the intuitions of batch and stream processing, but the two -are unified in Beam and bounded and unbounded PCollections can coexist in the -same pipeline. If your runner can only support bounded PCollections, you'll -need to reject pipelines that contain unbounded PCollections. If your -runner is only really targeting streams, there are adapters in our support code -to convert everything to APIs targeting unbounded data. - -#### Timestamps - -Every element in a PCollection has a timestamp associated with it. - -When you execute a primitive connector to some storage system, that connector -is responsible for providing initial timestamps. Your runner will need to -propagate and aggregate timestamps. 
If the timestamp is not important, as with -certain batch processing jobs where elements do not denote events, they will be -the minimum representable timestamp, often referred to colloquially as -"negative infinity". - -#### Watermarks - -Every PCollection has to have a watermark that estimates how complete the -PCollection is. - -The watermark is a guess that "we'll never see an element with an earlier -timestamp". Sources of data are responsible for producing a watermark. Your -runner needs to implement watermark propagation as PCollections are processed, -merged, and partitioned. - -The contents of a PCollection are complete when a watermark advances to -"infinity". In this manner, you may discover that an unbounded PCollection is -finite. - -#### Windowed elements - -Every element in a PCollection resides in a window. No element resides in -multiple windows (two elements can be equal except for their window, but they -are not the same). - -When elements are read from the outside world they arrive in the global window. -When they are written to the outside world, they are effectively placed back -into the global window (any writing transform that doesn't take this -perspective probably risks data loss). - -A window has a maximum timestamp, and when the watermark exceeds this plus -user-specified allowed lateness the window is expired. All data related -to an expired window may be discarded at any time. - -#### Coder - -Every PCollection has a coder, a specification of the binary format of the elements. - -In Beam, the user's pipeline may be written in a language other than the -language of the runner. There is no expectation that the runner can actually -deserialize user data. So the Beam model operates principally on encoded data - -"just bytes". Each PCollection has a declared encoding for its elements, called -a coder. A coder has a URN that identifies the encoding, and may have -additional sub-coders (for example, a coder for lists may contain a coder for -the elements of the list). Language-specific serialization techniques can, and -frequently are used, but there are a few key formats - such as key-value pairs -and timestamps - that are common so your runner can understand them. - -#### Windowing Strategy - -Every PCollection has a windowing strategy, a specification of essential -information for grouping and triggering operations. - -The details will be discussed below when we discuss the -[Window](#implementing-the-window-primitive) primitive, which sets up the -windowing strategy, and -[GroupByKey](#implementing-the-groupbykey-and-window-primitive) primitive, -which has behavior governed by the windowing strategy. - -### User-Defined Functions (UDFs) - -Beam has seven varieties of user-defined function (UDF). A Beam pipeline -may contain UDFs written in a language other than your runner, or even multiple -languages in the same pipeline (see the [Runner API](#the-runner-api)) so the -definitions are language-independent (see the [Fn API](#the-fn-api)). 
- -The UDFs of Beam are: - - * _DoFn_ - per-element processing function (used in ParDo) - * _WindowFn_ - places elements in windows and merges windows (used in Window - and GroupByKey) - * _Source_ - emits data read from external sources, including initial and - dynamic splitting for parallelism (used in Read) - * _ViewFn_ - adapts a materialized PCollection to a particular interface (used - in side inputs) - * _WindowMappingFn_ - maps one element's window to another, and specifies - bounds on how far in the past the result window will be (used in side - inputs) - * _CombineFn_ - associative and commutative aggregation (used in Combine and - state) - * _Coder_ - encodes user data; some coders have standard formats and are not really UDFs - -The various types of user-defined functions will be described further alongside -the primitives that use them. - -### Runner - -The term "runner" is used for a couple of things. It generally refers to the -software that takes a Beam pipeline and executes it somehow. Often, this is the -translation code that you write. It usually also includes some customized -operators for your data processing engine, and is sometimes used to refer to -the full stack. - -A runner has just a single method `run(Pipeline)`. From here on, I will often -use code font for proper nouns in our APIs, whether or not the identifiers -match across all SDKs. - -The `run(Pipeline)` method should be asynchronous and results in a -PipelineResult which generally will be a job descriptor for your data -processing engine, providing methods for checking its status, canceling it, and -waiting for it to terminate. - ## Implementing the Beam Primitives Aside from encoding and persisting data - which presumably your engine already diff --git a/website/www/site/content/en/documentation/_index.md b/website/www/site/content/en/documentation/_index.md index ffd6c21e7989..80d111b9ad07 100644 --- a/website/www/site/content/en/documentation/_index.md +++ b/website/www/site/content/en/documentation/_index.md @@ -4,6 +4,7 @@ aliases: - /learn/ - /docs/learn/ --- + + +# Basics of the Beam model + +Suppose you have a data processing engine that can pretty easily process graphs +of operations. You want to integrate it with the Beam ecosystem to get access +to other languages, great event time processing, and a library of connectors. +You need to know the core vocabulary: + + * [_Pipeline_](#pipeline) - A pipeline is a graph of transformations that a user constructs + that defines the data processing they want to do. + * [_PCollection_](#pcollections) - Data being processed in a pipeline is part of a PCollection. + * [_PTransforms_](#ptransforms) - The operations executed within a pipeline. These are best + thought of as operations on PCollections. + * _SDK_ - A language-specific library for pipeline authors (we often call them + "users" even though we have many kinds of users) to build transforms, + construct their pipelines and submit them to a runner + * _Runner_ - You are going to write a piece of software called a runner that + takes a Beam pipeline and executes it using the capabilities of your data + processing engine. + +These concepts may be very similar to your processing engine's concepts. Since +Beam's design is for cross-language operation and reusable libraries of +transforms, there are some special features worth highlighting. + +### Pipeline + +A pipeline in Beam is a graph of PTransforms operating on PCollections. 
A +pipeline is constructed by a user in their SDK of choice, and makes its way to +your runner either via the SDK directly or via the Runner API's +RPC interfaces. + +### PTransforms + +A `PTransform` represents a data processing operation, or a step, +in your pipeline. A `PTransform` can be applied to one or more +`PCollection` objects as input which performs some processing on the elements of that +`PCollection` and produces zero or more output `PCollection` objects. + +### PCollections + +A PCollection is an unordered bag of elements. Your runner will be responsible +for storing these elements. There are some major aspects of a PCollection to +note: + +#### Bounded vs Unbounded + +A PCollection may be bounded or unbounded. + + - _Bounded_ - it is finite and you know it, as in batch use cases + - _Unbounded_ - it may be never end, you don't know, as in streaming use cases + +These derive from the intuitions of batch and stream processing, but the two +are unified in Beam and bounded and unbounded PCollections can coexist in the +same pipeline. If your runner can only support bounded PCollections, you'll +need to reject pipelines that contain unbounded PCollections. If your +runner is only really targeting streams, there are adapters in our support code +to convert everything to APIs targeting unbounded data. + +#### Timestamps + +Every element in a PCollection has a timestamp associated with it. + +When you execute a primitive connector to some storage system, that connector +is responsible for providing initial timestamps. Your runner will need to +propagate and aggregate timestamps. If the timestamp is not important, as with +certain batch processing jobs where elements do not denote events, they will be +the minimum representable timestamp, often referred to colloquially as +"negative infinity". + +#### Watermarks + +Every PCollection has to have a watermark that estimates how complete the +PCollection is. + +The watermark is a guess that "we'll never see an element with an earlier +timestamp". Sources of data are responsible for producing a watermark. Your +runner needs to implement watermark propagation as PCollections are processed, +merged, and partitioned. + +The contents of a PCollection are complete when a watermark advances to +"infinity". In this manner, you may discover that an unbounded PCollection is +finite. + +#### Windowed elements + +Every element in a PCollection resides in a window. No element resides in +multiple windows (two elements can be equal except for their window, but they +are not the same). + +When elements are read from the outside world they arrive in the global window. +When they are written to the outside world, they are effectively placed back +into the global window (any writing transform that doesn't take this +perspective probably risks data loss). + +A window has a maximum timestamp, and when the watermark exceeds this plus +user-specified allowed lateness the window is expired. All data related +to an expired window may be discarded at any time. + +#### Coder + +Every PCollection has a coder, a specification of the binary format of the elements. + +In Beam, the user's pipeline may be written in a language other than the +language of the runner. There is no expectation that the runner can actually +deserialize user data. So the Beam model operates principally on encoded data - +"just bytes". Each PCollection has a declared encoding for its elements, called +a coder. 
A coder has a URN that identifies the encoding, and may have +additional sub-coders (for example, a coder for lists may contain a coder for +the elements of the list). Language-specific serialization techniques can, and +frequently are used, but there are a few key formats - such as key-value pairs +and timestamps - that are common so your runner can understand them. + +#### Windowing Strategy + +Every PCollection has a windowing strategy, a specification of essential +information for grouping and triggering operations. + +The details will be discussed below when we discuss the +[Window](#implementing-the-window-primitive) primitive, which sets up the +windowing strategy, and +[GroupByKey](#implementing-the-groupbykey-and-window-primitive) primitive, +which has behavior governed by the windowing strategy. + +### User-Defined Functions (UDFs) + +Beam has seven varieties of user-defined function (UDF). A Beam pipeline +may contain UDFs written in a language other than your runner, or even multiple +languages in the same pipeline (see the [Runner API](#the-runner-api)) so the +definitions are language-independent (see the [Fn API](#the-fn-api)). + +The UDFs of Beam are: + + * _DoFn_ - per-element processing function (used in ParDo) + * _WindowFn_ - places elements in windows and merges windows (used in Window + and GroupByKey) + * _Source_ - emits data read from external sources, including initial and + dynamic splitting for parallelism (used in Read) + * _ViewFn_ - adapts a materialized PCollection to a particular interface (used + in side inputs) + * _WindowMappingFn_ - maps one element's window to another, and specifies + bounds on how far in the past the result window will be (used in side + inputs) + * _CombineFn_ - associative and commutative aggregation (used in Combine and + state) + * _Coder_ - encodes user data; some coders have standard formats and are not really UDFs + +The various types of user-defined functions will be described further alongside +the [_PTransforms_](#ptransforms) that use them. + +### Runner + +The term "runner" is used for a couple of things. It generally refers to the +software that takes a Beam pipeline and executes it somehow. Often, this is the +translation code that you write. It usually also includes some customized +operators for your data processing engine, and is sometimes used to refer to +the full stack. + +A runner has just a single method `run(Pipeline)`. From here on, I will often +use code font for proper nouns in our APIs, whether or not the identifiers +match across all SDKs. + +The `run(Pipeline)` method should be asynchronous and results in a +PipelineResult which generally will be a job descriptor for your data +processing engine, providing methods for checking its status, canceling it, and +waiting for it to terminate. diff --git a/website/www/site/content/en/documentation/dsls/dataframes/differences-from-pandas.md b/website/www/site/content/en/documentation/dsls/dataframes/differences-from-pandas.md new file mode 100644 index 000000000000..da0054e5ebe2 --- /dev/null +++ b/website/www/site/content/en/documentation/dsls/dataframes/differences-from-pandas.md @@ -0,0 +1,92 @@ +--- +type: languages +title: "Differences from pandas" +--- + + +# Differences from pandas + +The Apache Beam DataFrame API aims to be a drop-in replacement for pandas, but there are a few differences to be aware of. This page describes divergences between the Beam and pandas APIs and provides tips for working with the Beam DataFrame API. 
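For orientation, the sketch below shows the shape most Beam DataFrame pipelines take; the file pattern and column name are placeholders, and each piece is described in the sections that follow. Unlike pandas, nothing is read or computed when these lines execute; the operations are deferred until the pipeline runs.

    import apache_beam as beam
    from apache_beam.dataframe.io import read_csv

    with beam.Pipeline() as p:
        df = p | read_csv('gs://my-bucket/input*.csv')  # deferred Beam DataFrame
        totals = df.groupby('category').sum()           # still deferred
        totals.to_csv('gs://my-bucket/totals')          # executed when the pipeline runs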
+ +## Working with pandas sources + +Beam operations are always associated with a pipeline. To read source data into a Beam DataFrame, you have to apply the source to a pipeline object. For example, to read input from a CSV file, you could use [read_csv](https://beam.apache.org/releases/pydoc/{{< param release_latest >}}/apache_beam.dataframe.io.html#apache_beam.dataframe.io.read_csv) as follows: + + df = p | beam.dataframe.io.read_csv(...) + +This is similar to pandas [read_csv](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html), but `df` is a deferred Beam DataFrame representing the contents of the file. The input filename can be any file pattern understood by [fileio.MatchFiles](https://beam.apache.org/releases/pydoc/{{< param release_latest >}}/apache_beam.io.fileio.html#apache_beam.io.fileio.MatchFiles). + +For an example of using sources and sinks with the DataFrame API, see [taxiride.py](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/dataframe/taxiride.py). + +## Classes of unsupported operations + +The sections below describe classes of operations that are not supported, or not yet supported, by the Beam DataFrame API. Workarounds are suggested, where applicable. + +### Non-parallelizable operations + +To support distributed processing, Beam invokes DataFrame operations on subsets of data in parallel. Some DataFrame operations can’t be parallelized, and these operations raise a [NonParallelOperation](https://beam.apache.org/releases/pydoc/{{< param release_latest >}}/apache_beam.dataframe.expressions.html#apache_beam.dataframe.expressions.NonParallelOperation) error by default. + +**Workaround** + +If you want to use a non-parallelizable operation, you can guard it with a `beam.dataframe.allow_non_parallel_operations` block. For example: + + from apache_beam import dataframe + + with dataframe.allow_non_parallel_operations(): + quantiles = df.quantile() + +Note that this collects the entire input dataset on a single node, so there’s a risk of running out of memory. You should only use this workaround if you’re sure that the input is small enough to process on a single worker. + +### Operations that produce non-deferred columns + +Beam DataFrame operations are deferred, but the schemas of the resulting DataFrames are not, meaning that result columns must be computable without access to the data. Some DataFrame operations can’t support this usage, so they can’t be implemented. These operations raise a [WontImplementError](https://beam.apache.org/releases/pydoc/{{< param release_latest >}}/apache_beam.dataframe.frame_base.html#apache_beam.dataframe.frame_base.WontImplementError). + +Currently there’s no workaround for this issue. But in the future, Beam Dataframe may support non-deferred column operations on categorical columns. This work is being tracked in [BEAM-12169](https://issues.apache.org/jira/browse/BEAM-12169). + +### Operations that produce non-deferred values or plots + +Because Beam operations are deferred, it’s infeasible to implement DataFrame APIs that produce non-deferred values or plots. If invoked, these operations raise a [WontImplementError](https://beam.apache.org/releases/pydoc/{{< param release_latest >}}/apache_beam.dataframe.frame_base.html#apache_beam.dataframe.frame_base.WontImplementError). 
+ +**Workaround** + +If you’re using [Interactive Beam](https://beam.apache.org/releases/pydoc/{{< param release_latest >}}/apache_beam.runners.interactive.interactive_beam.html), you can use `collect` to bring a dataset into local memory and then perform these operations. + +### Order-sensitive operations + +Beam PCollections are inherently unordered, so pandas operations that are sensitive to the ordering of rows are not supported. These operations raise a [WontImplementError](https://beam.apache.org/releases/pydoc/{{< param release_latest >}}/apache_beam.dataframe.frame_base.html#apache_beam.dataframe.frame_base.WontImplementError). + +Order-sensitive operations may be supported in the future. To track progress on this issue, follow [BEAM-12129](https://issues.apache.org/jira/browse/BEAM-12129). If you think we should prioritize this work you can also [contact us](https://beam.apache.org/community/contact-us/) to let us know. + +**Workaround** + +If you’re using [Interactive Beam](https://beam.apache.org/releases/pydoc/{{< param release_latest >}}/apache_beam.runners.interactive.interactive_beam.html), you can use `collect` to bring a dataset into local memory and then perform these operations. + +Alternatively, there may be ways to rewrite your code so that it’s not order sensitive. For example, pandas users often call the order-sensitive [head](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.head.html) operation to peek at data, but if you just want to view a subset of elements, you can also use `sample`, which doesn’t require you to collect the data first. Similarly, you could use `nlargest` instead of `sort_values(...).head`. + +### Operations that produce deferred scalars + +Some DataFrame operations produce deferred scalars. In Beam, actual computation of the values is deferred, and so the values are not available for control flow. For example, you can compute a sum with `Series.sum`, but you can’t immediately branch on the result, because the result data is not immediately available. `Series.is_unique` is a similar example. Using a deferred scalar for branching logic or truth tests raises a [TypeError](https://github.com/apache/beam/blob/b908f595101ff4f21439f5432514005394163570/sdks/python/apache_beam/dataframe/frame_base.py#L117). + +### Operations that aren’t implemented yet + +The Beam DataFrame API implements many of the commonly used pandas DataFrame operations, and we’re actively working to support the remaining operations. But pandas has a large API, and there are still gaps ([BEAM-9547](https://issues.apache.org/jira/browse/BEAM-9547)). If you invoke an operation that hasn’t been implemented yet, it will raise a `NotImplementedError`. Please [let us know](https://beam.apache.org/community/contact-us/) if you encounter a missing operation that you think should be prioritized. + +## Using Interactive Beam to access the full pandas API + +Interactive Beam is a module designed for use in interactive notebooks. The module, which by convention is imported as `ib`, provides an `ib.collect` function that brings a `PCollection` or deferred DataFrrame into local memory as a pandas DataFrame. After using `ib.collect` to materialize a deferred DataFrame you will be able to perform any operation in the pandas API, not just those that are supported in Beam. + + + +To get started with Beam in a notebook, see [Try Apache Beam](https://beam.apache.org/get-started/try-apache-beam/). 
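As a concrete illustration, here is a minimal notebook-cell sketch of this pattern. It assumes the interactive extras are installed (`pip install apache-beam[interactive]`) and that the data is small enough to fit in local memory; the words used are arbitrary sample values.

    import apache_beam as beam
    import apache_beam.runners.interactive.interactive_beam as ib
    from apache_beam.dataframe.convert import to_dataframe
    from apache_beam.runners.interactive.interactive_runner import InteractiveRunner

    p = beam.Pipeline(InteractiveRunner())
    rows = p | beam.Create([beam.Row(word='a'), beam.Row(word='b'), beam.Row(word='a')])
    df = to_dataframe(rows)

    # ib.collect materializes the deferred DataFrame as an ordinary pandas
    # DataFrame, so the full pandas API (including order-sensitive operations
    # such as sort_values and head) is available on the result.
    local_df = ib.collect(df)
    print(local_df.sort_values('word').head())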
diff --git a/website/www/site/content/en/documentation/dsls/dataframes/overview.md b/website/www/site/content/en/documentation/dsls/dataframes/overview.md new file mode 100644 index 000000000000..e08c61b7e236 --- /dev/null +++ b/website/www/site/content/en/documentation/dsls/dataframes/overview.md @@ -0,0 +1,117 @@ +--- +type: languages +title: "Beam DataFrames: Overview" +--- + + +# Beam DataFrames overview + +The Apache Beam Python SDK provides a DataFrame API for working with pandas-like [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) objects. The feature lets you convert a PCollection to a DataFrame and then interact with the DataFrame using the standard methods available on the pandas DataFrame API. The DataFrame API is built on top of the pandas implementation, and pandas DataFrame methods are invoked on subsets of the datasets in parallel. The big difference between Beam DataFrames and pandas DataFrames is that operations are deferred by the Beam API, to support the Beam parallel processing model. (To learn more about differences between the DataFrame implementations, see [Differences from pandas](/documentation/dsls/dataframes/differences-from-pandas/).) + +You can think of Beam DataFrames as a domain-specific language (DSL) for Beam pipelines. Similar to [Beam SQL](https://beam.apache.org/documentation/dsls/sql/overview/), DataFrames is a DSL built into the Beam Python SDK. Using this DSL, you can create pipelines without referencing standard Beam constructs like [ParDo](https://beam.apache.org/documentation/transforms/python/elementwise/pardo/) or [CombinePerKey](https://beam.apache.org/documentation/transforms/python/aggregation/combineperkey/). + +The Beam DataFrame API is intended to provide access to a familiar programming interface within a Beam pipeline. In some cases, the DataFrame API can also improve pipeline efficiency by deferring to the highly efficient, vectorized pandas implementation. + +## What is a DataFrame? + +If you’re new to pandas DataFrames, you can get started by reading [10 minutes to pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html), which shows you how to import and work with the `pandas` package. pandas is an open-source Python library for data manipulation and analysis. It provides data structures that simplify working with relational or labeled data. One of these data structures is the [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html), which contains two-dimensional tabular data and provides labeled rows and columns for the data. + +## Using DataFrames + +To use Beam DataFrames, you need to install Apache Beam version 2.26.0 or higher (for complete setup instructions, see the [Apache Beam Python SDK Quickstart](https://beam.apache.org/get-started/quickstart-py/)) and pandas version 1.0 or higher. You can use DataFrames as shown in the following example, which reads New York City taxi data from a CSV file, performs a grouped aggregation, and writes the output back to CSV: + +{{< highlight py >}} +from apache_beam.dataframe.io import read_csv + +with beam.Pipeline() as p: + df = p | read_csv("gs://apache-beam-samples/nyc_taxi/misc/sample.csv") + agg = df[['passenger_count', 'DOLocationID']].groupby('DOLocationID').sum() + agg.to_csv('output') +{{< /highlight >}} + +pandas is able to infer column names from the first row of the CSV data, which is where `passenger_count` and `DOLocationID` come from. 
+ +In this example, the only traditional Beam type is the `Pipeline` instance. Otherwise the example is written completely with the DataFrame API. This is possible because the Beam DataFrame API includes its own IO operations (for example, `read_csv` and `to_csv`) based on the pandas native implementations. `read_*` and `to_*` operations support file patterns and any Beam-compatible file system. The grouping is accomplished with a group-by-key, and arbitrary pandas operations (in this case, `sum`) can be applied before the final write that occurs with `to_csv`. + +The Beam DataFrame API aims to be compatible with the native pandas implementation, with a few caveats detailed below in [Differences from standard pandas](/documentation/dsls/dataframes/differences-from-pandas/). + +## Embedding DataFrames in a pipeline + +To use the DataFrames API in a larger pipeline, you can convert a PCollection to a DataFrame, process the DataFrame, and then convert the DataFrame back to a PCollection. In order to convert a PCollection to a DataFrame and back, you have to use PCollections that have [schemas](https://beam.apache.org/documentation/programming-guide/#what-is-a-schema) attached. A PCollection with a schema attached is also referred to as a *schema-aware PCollection*. To learn more about attaching a schema to a PCollection, see [Creating schemas](https://beam.apache.org/documentation/programming-guide/#creating-schemas). + +Here’s an example that creates a schema-aware PCollection, converts it to a DataFrame using `to_dataframe`, processes the DataFrame, and then converts the DataFrame back to a PCollection using `to_pcollection`: + + +{{< highlight py >}} +from apache_beam.dataframe.convert import to_dataframe +from apache_beam.dataframe.convert import to_pcollection +... + # Read the text file[pattern] into a PCollection. + lines = p | 'Read' >> ReadFromText(known_args.input) + + words = ( + lines + | 'Split' >> beam.FlatMap( + lambda line: re.findall(r'[\w]+', line)).with_output_types(str) + # Map to Row objects to generate a schema suitable for conversion + # to a dataframe. + | 'ToRows' >> beam.Map(lambda word: beam.Row(word=word))) + + df = to_dataframe(words) + df['count'] = 1 + counted = df.groupby('word').sum() + counted.to_csv(known_args.output) + + # Deferred DataFrames can also be converted back to schema'd PCollections + counted_pc = to_pcollection(counted, include_indexes=True) + + # Do something with counted_pc + ... +{{< /highlight >}} + +You can find the full wordcount example on +[GitHub](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/dataframe/wordcount.py), +along with other [example DataFrame pipelines](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/dataframe/). + +It’s also possible to use the DataFrame API by passing a function to [`DataframeTransform`][pydoc_dataframe_transform]: + +{{< highlight py >}} +from apache_beam.dataframe.transforms import DataframeTransform + +with beam.Pipeline() as p: + ... + | beam.Select(DOLocationID=lambda line: int(..), + passenger_count=lambda line: int(..)) + | DataframeTransform(lambda df: df.groupby('DOLocationID').sum()) + | beam.Map(lambda row: f"{row.DOLocationID},{row.passenger_count}") + ... +{{< /highlight >}} + +[`DataframeTransform`][pydoc_dataframe_transform] is similar to [`SqlTransform`][pydoc_sql_transform] from the [Beam SQL](https://beam.apache.org/documentation/dsls/sql/overview/) DSL. 
Where `SqlTransform` translates a SQL query to a PTransform, `DataframeTransform` is a PTransform that applies a function that takes and returns DataFrames. A `DataframeTransform` can be particularly useful if you have a stand-alone function that can be called both on Beam and on ordinary pandas DataFrames. + +`DataframeTransform` can accept and return multiple PCollections by name and by keyword, as shown in the following examples: + +{{< highlight py >}} +output = (pc1, pc2) | DataframeTransform(lambda df1, df2: ...) + +output = {'a': pc, ...} | DataframeTransform(lambda a, ...: ...) + +pc1, pc2 = {'a': pc} | DataframeTransform(lambda a: expr1, expr2) + +{...} = {a: pc} | DataframeTransform(lambda a: {...}) +{{< /highlight >}} + +[pydoc_dataframe_transform]: https://beam.apache.org/releases/pydoc/current/apache_beam.dataframe.transforms.html#apache_beam.dataframe.transforms.DataframeTransform +[pydoc_sql_transform]: https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.sql.html#apache_beam.transforms.sql.SqlTransform diff --git a/website/www/site/content/en/documentation/dsls/sql/calcite/query-syntax.md b/website/www/site/content/en/documentation/dsls/sql/calcite/query-syntax.md index 37a53f6d1c52..782da8ac5981 100644 --- a/website/www/site/content/en/documentation/dsls/sql/calcite/query-syntax.md +++ b/website/www/site/content/en/documentation/dsls/sql/calcite/query-syntax.md @@ -52,7 +52,7 @@ query and join data. The operations supported are a subset of | expression [ [ AS ] alias ] } [, ...] [ FROM from_item [, ...] ] [ WHERE bool_expression ] - [ GROUP BY { expression [, ...] | ROLLUP ( expression [, ...] ) } ] + [ GROUP BY { expression [, ...] } ] [ HAVING bool_expression ] set_op: @@ -454,7 +454,7 @@ Also see [Windowing & Triggering](/documentation/dsls/sql/windowing-and-triggeri ### Syntax {#syntax_3} - GROUP BY { expression [, ...] | ROLLUP ( expression [, ...] ) } + GROUP BY { expression [, ...] } The `GROUP BY` clause groups together rows in a table with non-distinct values for the `expression` in the `GROUP BY` clause. For multiple rows in the source diff --git a/website/www/site/content/en/documentation/dsls/sql/extensions/create-external-table.md b/website/www/site/content/en/documentation/dsls/sql/extensions/create-external-table.md index 2182b4b8f438..f109726201b9 100644 --- a/website/www/site/content/en/documentation/dsls/sql/extensions/create-external-table.md +++ b/website/www/site/content/en/documentation/dsls/sql/extensions/create-external-table.md @@ -281,6 +281,10 @@ to the key-values pairs specified in `columnsMapping`. Not all existing column families and qualifiers have to be provided to the schema. +Filters are only allowed by `key` field with single `LIKE` statement with +[RE2 Syntax](https://github.com/google/re2/wiki/Syntax) regex, e.g. +`SELECT * FROM table WHERE key LIKE '^key[012]{1}'` + ### Write Mode Supported for flat schema only. 
@@ -340,21 +344,27 @@ INSERT INTO writeTable(key, boolColumn, longColumn, stringColumn, doubleColumn) ### Syntax +#### Nested mode ``` CREATE EXTERNAL TABLE [ IF NOT EXISTS ] tableName( event_timestamp TIMESTAMP, - attributes MAP, - payload ROW + attributes [MAP, ARRAY>], + payload [BYTES, ROW] ) TYPE pubsub LOCATION 'projects/[PROJECT]/topics/[TOPIC]' -TBLPROPERTIES '{ - "timestampAttributeKey": "key", - "deadLetterQueue": "projects/[PROJECT]/topics/[TOPIC]", - "format": "format" -}' ``` +#### Flattened mode +``` +CREATE EXTERNAL TABLE [ IF NOT EXISTS ] tableName(tableElement [, tableElement ]*) +TYPE pubsub +LOCATION 'projects/[PROJECT]/topics/[TOPIC]' +``` + +In nested mode, the following fields hold topic metadata. The presence of the +`attributes` field triggers nested mode usage. + * `event_timestamp`: The event timestamp associated with the Pub/Sub message by PubsubIO. It can be one of the following: * Message publish time, which is provided by Pub/Sub. This is the default @@ -372,6 +382,8 @@ TBLPROPERTIES '{ `deadLeaderQueue` field of the `tblProperties` blob. If no dead-letter queue is specified in this case, an exception is thrown and the pipeline will crash. + + * `LOCATION`: * `PROJECT`: ID of the Google Cloud Project * `TOPIC`: The Pub/Sub topic name. A subscription will be created @@ -386,15 +398,14 @@ TBLPROPERTIES '{ payload was not parsed. If not specified, an exception is thrown for parsing failures. * `format`: Optional. Allows you to specify the Pubsub payload format. - Possible values are {`json`, `avro`}. Defaults to `json`. ### Read Mode -PubsubIO is currently limited to read access only. +PubsubIO supports reading from topics by creating a new subscription. ### Write Mode -Not supported. PubSubIO is currently limited to read access only in Beam SQL. +PubsubIO supports writing to topics. ### Schema @@ -407,13 +418,7 @@ declare a special set of columns, as shown below. ### Supported Payload -* JSON Objects (Default) - * Beam only supports querying messages with payload containing JSON - objects. Beam attempts to parse JSON to match the schema of the - `payload` field. -* Avro - * An Avro schema is automatically generated from the specified schema of - the `payload` field. It is used to parse incoming messages. +* Pub/Sub supports [Generic Payload Handling](#generic-payload-handling). ### Example @@ -423,33 +428,106 @@ TYPE pubsub LOCATION 'projects/testing-integration/topics/user-location' ``` +## Pub/Sub Lite + +### Syntax +``` +CREATE EXTERNAL TABLE [ IF NOT EXISTS ] tableName( + publish_timestamp DATETIME, + event_timestamp DATETIME, + message_key BYTES, + attributes ARRAY>>, + payload [BYTES, ROW] +) +TYPE pubsublite +// For writing +LOCATION 'projects/[PROJECT]/locations/[GCP-LOCATION]/topics/[TOPIC]' +// For reading +LOCATION 'projects/[PROJECT]/locations/[GCP-LOCATION]/subscriptions/[SUBSCRIPTION]' +``` + +* `LOCATION`: + * `PROJECT`: ID of the Google Cloud Project + * `TOPIC`: The Pub/Sub Lite topic name. + * `SUBSCRIPTION`: The Pub/Sub Lite subscription name. + * `GCP-LOCATION`: The location for this Pub/Sub Lite topic os subscription. +* `TBLPROPERTIES`: + * `timestampAttributeKey`: Optional. The key which contains the event + timestamp associated with the Pub/Sub message. If not specified, the + message publish timestamp is used as an event timestamp for + windowing/watermarking. + * `deadLetterQueue`: Optional, supports + [Generic DLQ Handling](#generic-dlq-handling) + * `format`: Optional. Allows you to specify the payload format. 
+ +### Read Mode + +PubsubLiteIO supports reading from subscriptions. + +### Write Mode + +PubsubLiteIO supports writing to topics. + +### Supported Payload + +* Pub/Sub Lite supports [Generic Payload Handling](#generic-payload-handling). + +### Example + +``` +CREATE EXTERNAL TABLE locations (event_timestamp TIMESTAMP, attributes ARRAY>>, payload ROW) +TYPE pubsublite +LOCATION 'projects/testing-integration/locations/us-central1-a/topics/user-location' +``` + ## Kafka KafkaIO is experimental in Beam SQL. ### Syntax +#### Flattened mode ``` CREATE EXTERNAL TABLE [ IF NOT EXISTS ] tableName (tableElement [, tableElement ]*) TYPE kafka -LOCATION 'kafka://localhost:2181/brokers' +LOCATION 'my.company.url.com:2181/topic1' +TBLPROPERTIES '{ + "bootstrap_servers": ["localhost:9092", "PLAINTEXT://192.168.1.200:2181"], + "topics": ["topic2", "topic3"], + "format": "json" +}' +``` + +#### Nested mode +``` +CREATE EXTERNAL TABLE [ IF NOT EXISTS ] tableName ( + event_timestamp DATETIME, + message_key BYTES, + headers ARRAY>>, + payload [BYTES, ROW] +) +TYPE kafka +LOCATION 'my.company.url.com:2181/topic1' TBLPROPERTIES '{ - "bootstrap.servers":"localhost:9092", - "topics": ["topic1", "topic2"], - "format": "avro" - [, "protoClass": "com.example.ExampleMessage" ] + "bootstrap_servers": ["localhost:9092", "PLAINTEXT://192.168.1.200:2181"], + "topics": ["topic2", "topic3"], + "format": "json" }' ``` -* `LOCATION`: The Kafka topic URL. +The presence of the `headers` field triggers nested mode usage. + +* `LOCATION`: A url with the initial bootstrap broker to use and the initial + topic name provided as the path. * `TBLPROPERTIES`: - * `bootstrap.servers`: Optional. Allows you to specify the bootstrap - server. - * `topics`: Optional. Allows you to specify specific topics. + * `bootstrap_servers`: Optional. Allows you to specify additional + bootstrap servers, which are used in addition to the one in `LOCATION`. + * `topics`: Optional. Allows you to specify additional topics, which are + used in addition to the one in `LOCATION`. * `format`: Optional. Allows you to specify the Kafka values format. Possible values are - {`csv`, `avro`, `json`, `proto`}. Defaults to `csv`. - * `protoClass`: Optional. Use only when `format` is equal to `proto`. Allows you to - specify full protocol buffer java class name. + {`csv`, `avro`, `json`, `proto`, `thrift`}. Defaults to `csv` in + flattened mode or `json` in nested mode. `csv` does not support nested + mode. ### Read Mode @@ -464,14 +542,8 @@ Write Mode supports writing to a topic. * CSV (default) * Beam parses the messages, attempting to parse fields according to the types specified in the schema. -* Avro - * An Avro schema is automatically generated from the specified field - types. It is used to parse incoming messages and to format outgoing - messages. -* JSON Objects - * Beam attempts to parse JSON to match the schema. -* Protocol buffers - * Fields in the schema have to match the fields of the given `protoClass`. +* Kafka supports all [Generic Payload Handling](#generic-payload-handling) + formats. ### Schema @@ -569,3 +641,47 @@ CREATE EXTERNAL TABLE orders (id INTEGER, price INTEGER) TYPE text LOCATION '/home/admin/orders' ``` + +## Generic Payload Handling + +Certain data sources and sinks support generic payload handling. This handling +parses a byte array payload field into a table schema. The following schemas are +supported by this handling. All require at least setting `"format": ""`, +and may require other properties. 
+ +* `avro`: Avro + * An Avro schema is automatically generated from the specified field + types. It is used to parse incoming messages and to format outgoing + messages. +* `json`: JSON Objects + * Beam attempts to parse the byte array as UTF-8 JSON to match the schema. +* `proto`: Protocol Buffers + * Beam locates the equivalent Protocol Buffer class and uses it to parse + the payload + * `protoClass`: Required. The proto class name to use. Must be built into + the deployed JAR. + * Fields in the schema have to match the fields of the given `protoClass`. +* `thrift`: Thrift + * Fields in the schema have to match the fields of the given + `thriftClass`. + * `thriftClass`: Required. Allows you to specify full thrift java class + name. Must be built into the deployed JAR. + * `thriftProtocolFactoryClass`: Required. Allows you to specify full class + name of the `TProtocolFactory` to use for thrift serialization. Must be + built into the deployed JAR. + * The `TProtocolFactory` used for thrift serialization must match the + provided `thriftProtocolFactoryClass`. + +## Generic DLQ Handling + +Sources and sinks which support generic DLQ handling specify a parameter with +the format `"": "[DLQ_KIND]:[DLQ_ID]"`. The following types of +DLQ handling are supported: + +* `bigquery`: BigQuery + * DLQ_ID is the table spec for an output table with an "error" string + field and "payload" byte array field. +* `pubsub`: Pub/Sub Topic + * DLQ_ID is the full path of the Pub/Sub Topic. +* `pubsublite`: Pub/Sub Lite Topic + * DLQ_ID is the full path of the Pub/Sub Lite Topic. \ No newline at end of file diff --git a/website/www/site/content/en/documentation/dsls/sql/overview.md b/website/www/site/content/en/documentation/dsls/sql/overview.md index 6c7aacd4cc91..0b0c03ad6c51 100644 --- a/website/www/site/content/en/documentation/dsls/sql/overview.md +++ b/website/www/site/content/en/documentation/dsls/sql/overview.md @@ -18,7 +18,7 @@ limitations under the License. # Beam SQL overview -Beam SQL allows a Beam user (currently only available in Beam Java) to query +Beam SQL allows a Beam user (currently only available in Beam Java and Python) to query bounded and unbounded `PCollections` with SQL statements. Your SQL query is translated to a `PTransform`, an encapsulated segment of a Beam pipeline. You can freely mix SQL `PTransforms` and other `PTransforms` in your pipeline. diff --git a/website/www/site/content/en/documentation/dsls/sql/shell.md b/website/www/site/content/en/documentation/dsls/sql/shell.md index ae1a4e13a1d4..7c7b31bac495 100644 --- a/website/www/site/content/en/documentation/dsls/sql/shell.md +++ b/website/www/site/content/en/documentation/dsls/sql/shell.md @@ -29,7 +29,7 @@ This page describes how to work with the shell, but does not focus on specific f To use Beam SQL shell, you must first clone the [Beam SDK repository](https://github.com/apache/beam). Then, from the root of the repository clone, execute the following commands to run the shell: ``` -./gradlew -p sdks/java/extensions/sql/shell -Pbeam.sql.shell.bundled=':runners:flink:1.10,:sdks:java:io:kafka' installDist +./gradlew -p sdks/java/extensions/sql/shell -Pbeam.sql.shell.bundled=':runners:flink:1.13,:sdks:java:io:kafka' installDist ./sdks/java/extensions/sql/shell/build/install/shell/bin/shell ``` @@ -114,10 +114,10 @@ When you're satisfied with the logic of your SQL statements, you can submit the By default, Beam uses the `DirectRunner` to run the pipeline on the machine where you're executing the commands. 
If you want to run the pipeline with a different runner, you must perform two steps: -1. Make sure the SQL shell includes the desired runner. Add the corresponding project id to the `-Pbeam.sql.shell.bundled` parameter of the Gradle invocation ([source code](https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/shell/build.gradle), [project ids](https://github.com/apache/beam/blob/master/settings.gradle)). For example, use the following command to include Flink runner and KafkaIO: +1. Make sure the SQL shell includes the desired runner. Add the corresponding project id to the `-Pbeam.sql.shell.bundled` parameter of the Gradle invocation ([source code](https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/shell/build.gradle), [project ids](https://github.com/apache/beam/blob/master/settings.gradle.kts)). For example, use the following command to include Flink runner and KafkaIO: ``` - ./gradlew -p sdks/java/extensions/sql/shell -Pbeam.sql.shell.bundled=':runners:flink:1.10,:sdks:java:io:kafka' installDist + ./gradlew -p sdks/java/extensions/sql/shell -Pbeam.sql.shell.bundled=':runners:flink:1.13,:sdks:java:io:kafka' installDist ``` _Note: You can bundle multiple runners (using a comma-separated list) or other additional components in the same manner. For example, you can add support for more I/Os._ @@ -143,7 +143,7 @@ To configure the runner, you must specify `PipelineOptions` by using the `SET` c You can also build your own standalone package for SQL shell using `distZip` or `distTar` tasks. For example: ``` -./gradlew -p sdks/java/extensions/sql/shell -Pbeam.sql.shell.bundled=':runners:flink:1.10,:sdks:java:io:kafka' distZip +./gradlew -p sdks/java/extensions/sql/shell -Pbeam.sql.shell.bundled=':runners:flink:1.13,:sdks:java:io:kafka' distZip ls ./sdks/java/extensions/sql/shell/build/distributions/ beam-sdks-java-extensions-sql-shell-2.6.0-SNAPSHOT.tar beam-sdks-java-extensions-sql-shell-2.6.0-SNAPSHOT.zip diff --git a/website/www/site/content/en/documentation/dsls/sql/walkthrough.md b/website/www/site/content/en/documentation/dsls/sql/walkthrough.md index d4c76f054704..90720ab6abbb 100644 --- a/website/www/site/content/en/documentation/dsls/sql/walkthrough.md +++ b/website/www/site/content/en/documentation/dsls/sql/walkthrough.md @@ -138,9 +138,9 @@ to either a single `PCollection` or a `PCollectionTuple` which holds multiple // Create a PCollectionTuple containing both PCollections. // TupleTags IDs will be used as table names in the SQL query - PCollectionTuple namesAndFoods = PCollectionTuple.of( - new TupleTag<>("Apps"), appsRows), // appsRows from the previous example - new TupleTag<>("Reviews"), reviewsRows)); + PCollectionTuple namesAndFoods = PCollectionTuple + .of(new TupleTag<>("Apps"), appsRows) // appsRows from the previous example + .and(new TupleTag<>("Reviews"), reviewsRows); // Compute the total number of reviews // and average rating per app diff --git a/website/www/site/content/en/documentation/dsls/sql/zetasql/overview.md b/website/www/site/content/en/documentation/dsls/sql/zetasql/overview.md index 882e1cb777f2..03ab6e79177d 100644 --- a/website/www/site/content/en/documentation/dsls/sql/zetasql/overview.md +++ b/website/www/site/content/en/documentation/dsls/sql/zetasql/overview.md @@ -30,33 +30,4 @@ A Beam SQL statement comprises a series of tokens. For more information about to Beam SQL supports standard SQL scalar data types as well as extensions including arrays, maps, and nested rows. 
For more information about scalar data in Beam ZetaSQL, see the [Data types](/documentation/dsls/sql/zetasql/data-types) reference. ## Functions and operators -The following table summarizes the [ZetaSQL functions and operators](https://github.com/google/zetasql/blob/master/docs/functions-and-operators.md) supported by Beam ZetaSQL. -
    - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-      <tr>
-        <th>Operators and functions</th>
-        <th>Beam ZetaSQL support</th>
-      </tr>
-      <tr><td>Type conversion</td><td>Yes</td></tr>
-      <tr><td>Aggregate functions</td><td>See Beam SQL aggregate functions</td></tr>
-      <tr><td>Statistical aggregate functions</td><td>No</td></tr>
-      <tr><td>Approximate aggregate functions</td><td>No</td></tr>
-      <tr><td>HyperLogLog++ functions</td><td>No</td></tr>
-      <tr><td>KLL16 quantile functions</td><td>No</td></tr>
-      <tr><td>Numbering functions</td><td>No</td></tr>
-      <tr><td>Bit functions</td><td>No</td></tr>
-      <tr><td>Mathematical functions</td><td>See mathematical functions</td></tr>
-      <tr><td>Navigation functions</td><td>No</td></tr>
-      <tr><td>Aggregate analytic functions</td><td>See aggregate functions</td></tr>
-      <tr><td>Hash functions</td><td>No</td></tr>
-      <tr><td>String functions</td><td>See string functions</td></tr>
-      <tr><td>JSON functions</td><td>No</td></tr>
-      <tr><td>Array functions</td><td>No</td></tr>
-      <tr><td>Date functions</td><td>No</td></tr>
-      <tr><td>DateTime functions</td><td>No</td></tr>
-      <tr><td>Time functions</td><td>No</td></tr>
-      <tr><td>Timestamp functions</td><td>No</td></tr>
-      <tr><td>Protocol buffer functions</td><td>No</td></tr>
-      <tr><td>Security functions</td><td>No</td></tr>
-      <tr><td>Net functions</td><td>No</td></tr>
-      <tr><td>Operator precedence</td><td>Yes</td></tr>
-      <tr><td>Conditional expressions</td><td>See conditional expressions</td></tr>
-      <tr><td>Expression subqueries</td><td>No</td></tr>
-      <tr><td>Debugging functions</td><td>No</td></tr>
    +For a list of the builtin functions and operators supported in Beam ZetaSQL, see [SupportedZetaSqlBuiltinFunctions.java](https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SupportedZetaSqlBuiltinFunctions.java) (commented-out entries are not yet supported). For documentation on how these functions work, see the [ZetaSQL functions and operators](https://github.com/google/zetasql/blob/master/docs/functions-and-operators.md) reference. \ No newline at end of file diff --git a/website/www/site/content/en/documentation/glossary.md b/website/www/site/content/en/documentation/glossary.md new file mode 100644 index 000000000000..4bc7395e7823 --- /dev/null +++ b/website/www/site/content/en/documentation/glossary.md @@ -0,0 +1,464 @@ +--- +title: "Beam glossary" +--- + + +# Apache Beam glossary + +## Aggregation + +A transform pattern for computing a value from multiple input elements. Aggregation is similar to the reduce operation in the [MapReduce](https://en.wikipedia.org/wiki/MapReduce) model. Aggregation transforms include Count (computes the count of all elements in the aggregation), Max (computes the maximum element in the aggregation), and Sum (computes the sum of all elements in the aggregation). + +For a complete list of aggregation transforms, see: + +* [Java Transform catalog](/documentation/transforms/java/overview/#aggregation) +* [Python Transform catalog](/documentation/transforms/python/overview/#aggregation) + +## Apply + +A method for invoking a transform on a PCollection. Each transform in the Beam SDKs has a generic `apply` method (or pipe operator `|`). Invoking multiple Beam transforms is similar to method chaining, but with a difference: You apply the transform to the input PCollection, passing the transform itself as an argument, and the operation returns the output PCollection. Because of Beam’s deferred execution model, applying a transform does not immediately execute that transform. + +To learn more, see: + +* [Applying transforms](/documentation/programming-guide/#applying-transforms) + +## Batch processing + +A data processing paradigm for working with finite, or bounded, datasets. A bounded PCollection represents a dataset of a known, fixed size. Reading from a batch data source, such as a file or a database, creates a bounded PCollection. A batch processing job eventually ends, in contrast to a streaming job, which runs until cancelled. + +To learn more, see: + +* [Size and boundedness](/documentation/programming-guide/#size-and-boundedness) + +## Bounded data + +A dataset of a known, fixed size. A PCollection can be bounded or unbounded, depending on the source of the data that it represents. Reading from a batch data source, such as a file or a database, creates a bounded PCollection. Beam also supports reading a bounded amount of data from an unbounded source. + +To learn more, see: + +* [Size and boundedness](/documentation/programming-guide/#size-and-boundedness) + +## Bundle + +The processing unit for elements in a PCollection. Instead of processing all elements in a PCollection simultaneously, Beam processes the elements in bundles. The runner handles the division of the collection into bundles, and in doing so it may optimize the bundle size for the use case. For example, a streaming runner might process smaller bundles than a batch runner. 
+ +To learn more, see: + +* [Bundling and persistence](/documentation/runtime/model/#bundling-and-persistence) + +## Coder + +A component that describes how the elements of a PCollection can be encoded and decoded. To support distributed processing and cross-language portability, Beam needs to be able to encode each element of a PCollection as bytes. The Beam SDKs provide built-in coders for common types and language-specific mechanisms for specifying the encoding of a PCollection. + +To learn more, see: + +* [Data encoding and type safety](/documentation/programming-guide/#data-encoding-and-type-safety) + +## CoGroupByKey + +A PTransform that takes two or more PCollections and aggregates the elements by key. In effect, CoGroupByKey performs a relational join of two or more key/value PCollections that have the same key type. While GroupByKey performs this operation over a single input collection, CoGroupByKey operates over multiple input collections. + +To learn more, see: + +* [CoGroupByKey](/documentation/programming-guide/#cogroupbykey) +* [CoGroupByKey (Java)](/documentation/transforms/java/aggregation/cogroupbykey/) +* [CoGroupByKey (Python)](/documentation/transforms/python/aggregation/cogroupbykey/) + +## Collection + +See [PCollection](/documentation/glossary/#pcollection). + +## Combine + +A PTransform for combining all elements of a PCollection or all values associated with a key. When you apply a Combine transform, you have to provide a user-defined function (UDF) that contains the logic for combining the elements or values. The combining function should be [commutative](https://en.wikipedia.org/wiki/Commutative_property) and [associative](https://en.wikipedia.org/wiki/Associative_property), because the function is not necessarily invoked exactly once on all values with a given key. + +To learn more, see: + +* [Combine](/documentation/programming-guide/#combine) +* [Combine (Java)](/documentation/transforms/java/aggregation/combine/) +* [CombineGlobally (Python)](/documentation/transforms/python/aggregation/combineglobally/) +* [CombinePerKey (Python)](/documentation/transforms/python/aggregation/combineperkey/) +* [CombineValues (Python)](/documentation/transforms/python/aggregation/combinevalues/) + +## Composite transform + +A PTransform that expands into many PTransforms. Composite transforms have a nested structure, in which a complex transform applies one or more simpler transforms. These simpler transforms could be existing Beam operations like ParDo, Combine, or GroupByKey, or they could be other composite transforms. Nesting multiple transforms inside a single composite transform can make your pipeline more modular and easier to understand. + +To learn more, see: + +* [Composite transforms](/documentation/programming-guide/#composite-transforms) + +## Counter (metric) + +A metric that reports a single long value and can be incremented. In the Beam model, metrics provide insight into the state of a pipeline, potentially while the pipeline is running. + +To learn more, see: + +* [Types of metrics](/documentation/programming-guide/#types-of-metrics) + +## Cross-language transforms + +Transforms that can be shared across Beam SDKs. With cross-language transforms, you can use transforms written in any supported SDK language (currently, Java and Python) in a pipeline written in a different SDK language. For example, you could use the Apache Kafka connector from the Java SDK in a Python streaming pipeline. 
Cross-language transforms make it possible to provide new functionality simultaneously in different SDKs. + +To learn more, see: + +* [Multi-language pipelines](/documentation/programming-guide/#multi-language-pipelines) + +## Deferred execution + +A feature of the Beam execution model. Beam operations are deferred, meaning that the result of a given operation may not be available for control flow. Deferred execution allows the Beam API to support parallel processing of data. + +## Distribution (metric) + +A metric that reports information about the distribution of reported values. In the Beam model, metrics provide insight into the state of a pipeline, potentially while the pipeline is running. + +To learn more, see: + +* [Types of metrics](/documentation/programming-guide/#types-of-metrics) + +## DoFn + +A function object used by ParDo (or some other transform) to process the elements of a PCollection. A DoFn is a user-defined function, meaning that it contains custom code that defines a data processing task in your pipeline. The Beam system invokes a DoFn one or more times to process some arbitrary bundle of elements, but Beam doesn’t guarantee an exact number of invocations. + +To learn more, see: + +* [ParDo](/documentation/programming-guide/#pardo) + +## Driver + +A program that defines your pipeline, including all of the inputs, transforms, and outputs. To use Beam, you need to create a driver program using classes from one of the Beam SDKs. The driver program creates a pipeline and specifies the execution options that tell the pipeline where and how to run. These options include the runner, which determines what backend your pipeline will run on. + +To learn more, see: + +* [Overview](/documentation/programming-guide/#overview) + +## Element + +The unit of data in a PCollection. Elements in a PCollection can be of any type, but they must all have the same type. This allows parallel computations to operate uniformly across the entire collection. Some element types have a structure that can be introspected (for example, JSON, Protocol Buffer, Avro, and database records). + +To learn more, see: + +* [PCollection characteristics](/documentation/programming-guide/#pcollection-characteristics) + +## Element-wise + +A type of transform that independently processes each element in an input PCollection. Element-wise is similar to the map operation in the [MapReduce](https://en.wikipedia.org/wiki/MapReduce) model. An element-wise transform might output 0, 1, or multiple values for each input element. This is in contrast to aggregation transforms, which compute a single value from multiple input elements. Element-wise operations include Filter, FlatMap, and ParDo. + +For a complete list of element-wise transforms, see: + +* [Java Transform catalog](/documentation/transforms/java/overview/#element-wise) +* [Python Transform catalog](/documentation/transforms/python/overview/#element-wise) + +## Engine + +A data-processing system, such as Dataflow, Spark, or Flink. A Beam runner for an engine executes a Beam pipeline on that engine. + +## Event time + +The time a data event occurs, determined by a timestamp on an element. This is in contrast to processing time, which is when an element is processed in a pipeline. An event could be, for example, a user interaction or a write to an error log. There’s no guarantee that events will appear in a pipeline in order of event time. 
+ +To learn more, see: + +* [Watermarks and late data](/documentation/programming-guide/#watermarks-and-late-data) +* [Triggers](/documentation/programming-guide/#triggers) + +## Expansion Service + +A service that enables a pipeline to apply (expand) cross-language transforms defined in other SDKs. For example, by connecting to a Java expansion service, the Python SDK can apply transforms implemented in Java. Currently SDKs define expansion services as local processes, but in the future Beam may support long-running expansion services. The development of expansion services is part of the ongoing effort to support multi-language pipelines. + +## Flatten +One of the core PTransforms. Flatten merges multiple PCollections into a single logical PCollection. + +To learn more, see: + +* [Flatten](/documentation/programming-guide/#flatten) +* [Flatten (Java)](/documentation/transforms/java/other/flatten/) +* [Flatten (Python)](/documentation/transforms/python/other/flatten/) + +## Fusion + +An optimization that Beam runners can apply before running a pipeline. When one transform outputs a PCollection that’s consumed by another transform, or when two or more transforms take the same PCollection as input, a runner may be able to fuse the transforms together into a single processing unit (a *stage* in Dataflow). Fusion can make pipeline execution more efficient by preventing I/O operations. + +## Gauge (metric) + +A metric that reports the latest value out of reported values. In the Beam model, metrics provide insight into the state of a pipeline, potentially while the pipeline is running. Because metrics are collected from many workers, the gauge value may not be the absolute last value, but it will be one of the latest values produced by one of the workers. + +To learn more, see: + +* [Types of metrics](/documentation/programming-guide/#types-of-metrics) + +## GroupByKey + +A PTransform for processing collections of key/value pairs. GroupByKey is a parallel reduction operation, similar to the shuffle of a map/shuffle/reduce algorithm. The input to GroupByKey is a collection of key/value pairs in which multiple pairs have the same key but different values (i.e. a multimap). You can use GroupByKey to collect all of the values associated with each unique key. + +To learn more, see: + +* [GroupByKey](/documentation/programming-guide/#groupbykey) +* [GroupByKey (Java)](/documentation/transforms/java/aggregation/groupbykey/) +* [GroupByKey (Python)](/documentation/transforms/python/aggregation/groupbykey/) + +## I/O connector + +A set of PTransforms for working with external data storage systems. When you create a pipeline, you often need to read from or write to external data systems such as files or databases. Beam provides read and write transforms for a number of common data storage types. + +To learn more, see: + +* [Pipeline I/O](/documentation/programming-guide/#pipeline-io) +* [Built-in I/O Transforms](/documentation/io/built-in/) + +## Map + +An element-wise PTransform that applies a user-defined function (UDF) to each element in a PCollection. Using Map, you can transform each individual element, but you can't change the number of elements. + +To learn more, see: + +* [Map (Python)](/documentation/transforms/python/elementwise/map/) +* [MapElements (Java)](/documentation/transforms/java/elementwise/mapelements/) + +## Metrics + +Data on the state of a pipeline, potentially while the pipeline is running. 
You can use the built-in Beam metrics to gain insight into the functioning of your pipeline. For example, you might use Beam metrics to track errors, calls to a backend service, or the number of elements processed. Beam currently supports three types of metric: Counter, Distribution, and Gauge. + +To learn more, see: + +* [Metrics](/documentation/programming-guide/#metrics) + +## Multi-language pipeline + +A pipeline that uses cross-language transforms. You can combine transforms written in any supported SDK language (currently, Java and Python) and use them in one multi-language pipeline. + +To learn more, see: + +* [Multi-language pipelines](/documentation/programming-guide/#multi-language-pipelines) + +## ParDo + +The lowest-level element-wise PTransform. For each element in an input PCollection, ParDo applies a function and emits zero, one, or multiple elements to an output PCollection. “ParDo” is short for “Parallel Do.” It’s similar to the map operation in a [MapReduce](https://en.wikipedia.org/wiki/MapReduce) algorithm, the `apply` method from a DataFrame, or the `UPDATE` keyword from SQL. + +To learn more, see: + +* [ParDo](/documentation/programming-guide/#pardo) +* [ParDo (Java)](/documentation/transforms/java/elementwise/pardo/) +* [ParDo (Python)](/documentation/transforms/python/elementwise/pardo/) + +## Partition + +An element-wise PTransform that splits a single PCollection into a fixed number of smaller PCollections. Partition requires a user-defined function (UDF) to determine how to split up the elements of the input collection into the resulting output collections. The number of partitions must be determined at graph construction time, meaning that you can’t determine the number of partitions using data calculated by the running pipeline. + +To learn more, see: + +* [Partition](/documentation/programming-guide/#partition) +* [Partition (Java)](/documentation/transforms/java/elementwise/partition/) +* [Partition (Python)](/documentation/transforms/python/elementwise/partition/) + +## PCollection + +A potentially distributed, homogeneous dataset or data stream. PCollections represent data in a Beam pipeline, and Beam transforms (PTransforms) use PCollection objects as inputs and outputs. PCollections are intended to be immutable, meaning that once a PCollection is created, you can’t add, remove, or change individual elements. The “P” stands for “parallel.” + +To learn more, see: + +* [PCollections](/documentation/programming-guide/#pcollections) + +## Pipe operator (`|`) + +Delimits a step in a Python pipeline. For example: `[Final Output PCollection] = ([Initial Input PCollection] | [First Transform] | [Second Transform] | [Third Transform])`. The output of each transform is passed from left to right as input to the next transform. The pipe operator in Python is equivalent to the `apply` method in Java (in other words, the pipe applies a transform to a PCollection). + +To learn more, see: + +* [Applying transforms](/documentation/programming-guide/#applying-transforms) + +## Pipeline + +An encapsulation of your entire data processing task, including reading input data from a source, transforming that data, and writing output data to a sink. You can think of a pipeline as a Beam program that uses PTransforms to process PCollections. The transforms in a pipeline can be represented as a directed acyclic graph (DAG). All Beam driver programs must create a pipeline. 
+ +To learn more, see: + +* [Overview](/documentation/programming-guide/#overview) +* [Creating a pipeline](/documentation/programming-guide/#creating-a-pipeline) +* [Design your pipeline](/documentation/pipelines/design-your-pipeline/) +* [Create your pipeline](/documentation/pipelines/create-your-pipeline/) + +## Processing time + +The time at which an element is processed at some stage in a pipeline. Processing time is not the same as event time, which is the time at which a data event occurs. Processing time is determined by the clock on the system processing the element. There’s no guarantee that elements will be processed in order of event time. + +To learn more, see: + +* [Watermarks and late data](/documentation/programming-guide/#watermarks-and-late-data) +* [Triggers](/documentation/programming-guide/#triggers) + +## PTransform + +A data processing operation, or a step, in your pipeline. A PTransform takes zero or more PCollections as input, applies a processing function to the elements of that PCollection, and produces zero or more output PCollections. Some PTransforms accept user-defined functions that apply custom logic. The “P” stands for “parallel.” + +To learn more, see: + +* [Overview](/documentation/programming-guide/#overview) +* [Transforms](/documentation/programming-guide/#transforms) + +## Runner + +A runner runs a pipeline on a specific platform. Most runners are translators or adapters to massively parallel big data processing systems. Other runners exist for local testing and debugging. Among the supported runners are Google Cloud Dataflow, Apache Spark, Apache Samza, Apache Flink, the Interactive Runner, and the Direct Runner. + +To learn more, see: + +* [Choosing a Runner](/documentation/#choosing-a-runner) +* [Beam Capability Matrix](/documentation/runners/capability-matrix/) + +## Schema + +A language-independent type definition for a PCollection. The schema for a PCollection defines elements of that PCollection as an ordered list of named fields. Each field has a name, a type, and possibly a set of user options. Schemas provide a way to reason about types across different programming-language APIs. + +To learn more, see: + +* [Schemas](/documentation/programming-guide/#schemas) +* [Schema Patterns](/documentation/patterns/schema/) + +## Session + +A time interval for grouping data events. A session is defined by some minimum gap duration between events. For example, a data stream representing user mouse activity may have periods with high concentrations of clicks followed by periods of inactivity. A session can represent such a pattern of activity followed by inactivity. + +To learn more, see: + +* [Session windows](/documentation/programming-guide/#session-windows) +* [Analyzing Usage Patterns](/get-started/mobile-gaming-example/#analyzing-usage-patterns) + +## Side input + +Additional input to a PTransform. Side input is input that you provide in addition to the main input PCollection. A DoFn can access side input each time it processes an element in the PCollection. Side inputs are useful if your transform needs to inject additional data at runtime. + +To learn more, see: + +* [Side inputs](/documentation/programming-guide/#side-inputs) +* [Side input patterns](/documentation/patterns/side-inputs/) + +## Sink + +A transform that writes to an external data storage system, like a file or database. 
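+
+As an illustrative sketch in the Python SDK, a file-based write transform can serve as the sink at the end of a pipeline (the output path below is hypothetical):
+
+{{< highlight py >}}
+import apache_beam as beam
+
+with beam.Pipeline() as p:
+  (p
+   | beam.Create(['first line', 'second line'])
+   # WriteToText is a sink: it writes the PCollection out as sharded text files.
+   | beam.io.WriteToText('/tmp/beam-output/results'))
+{{< /highlight >}}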
+
+To learn more, see:
+
+* [Developing new I/O connectors](/documentation/io/developing-io-overview/)
+* [Pipeline I/O](/documentation/programming-guide/#pipeline-io)
+* [Built-in I/O transforms](/documentation/io/built-in/)
+
+## Source
+
+A transform that reads from an external storage system. A pipeline typically reads input data from a source. The source has a type, which may be different from the sink type, so you can change the format of data as it moves through your pipeline.
+
+To learn more, see:
+
+* [Developing new I/O connectors](/documentation/io/developing-io-overview/)
+* [Pipeline I/O](/documentation/programming-guide/#pipeline-io)
+* [Built-in I/O transforms](/documentation/io/built-in/)
+
+## Splittable DoFn
+
+A generalization of DoFn that makes it easier to create complex, modular I/O connectors. A Splittable DoFn (SDF) can process elements in a non-monolithic way, meaning that the processing can be decomposed into smaller tasks. With SDF, you can check-point the processing of an element, and you can split the remaining work to yield additional parallelism. SDF is recommended for building new I/O connectors.
+
+To learn more, see:
+
+* [Splittable DoFns](/documentation/programming-guide/#splittable-dofns)
+* [Splittable DoFn in Apache Beam is Ready to Use](/blog/splittable-do-fn-is-available/)
+
+## State
+
+Persistent values that a PTransform can access. The state API lets you augment element-wise operations (for example, ParDo or Map) with mutable state. Using the state API, you can read from, and write to, state as you process each element of a PCollection. You can use the state API together with the timer API to create processing tasks that give you fine-grained control over the workflow.
+
+To learn more, see:
+
+* [State and Timers](/documentation/programming-guide/#state-and-timers)
+* [Stateful processing with Apache Beam](/blog/stateful-processing/)
+
+## Streaming
+
+A data processing paradigm for working with infinite, or unbounded, datasets. Reading from a streaming data source, such as Pub/Sub or Kafka, creates an unbounded PCollection. An unbounded PCollection must be processed using a job that runs continuously, because the entire collection can never be available for processing at any one time.
+
+To learn more, see:
+
+* [Size and boundedness](/documentation/programming-guide/#size-and-boundedness)
+* [Python Streaming Pipelines](/documentation/sdks/python-streaming/)
+
+## Timer
+
+A Beam feature that enables delayed processing of data stored using the state API. The timer API lets you set timers to call back at either an event-time or a processing-time timestamp. You can use the timer API together with the state API to create processing tasks that give you fine-grained control over the workflow.
+
+To learn more, see:
+
+* [State and Timers](/documentation/programming-guide/#state-and-timers)
+* [Stateful processing with Apache Beam](/blog/stateful-processing/)
+* [Timely (and Stateful) Processing with Apache Beam](/blog/timely-processing/)
+
+## Timestamp
+
+A point in time associated with an element in a PCollection and used to assign a window to the element. The source that creates the PCollection assigns each element an initial timestamp, often corresponding to when the element was read or added. But you can also manually assign timestamps. This can be useful if elements have an inherent timestamp, but the timestamp is somewhere in the structure of the element itself (for example, a time field in a server log entry).
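+
+As an illustrative sketch in the Python SDK, a pipeline can re-emit each element with an event-time timestamp taken from the element itself (the log records and their `timestamp` field are hypothetical):
+
+{{< highlight py >}}
+import apache_beam as beam
+from apache_beam.transforms import window
+
+with beam.Pipeline() as p:
+  (p
+   | beam.Create([{'user': 'alice', 'timestamp': 1609459200}])
+   # Wrapping an output in TimestampedValue assigns that timestamp to the element.
+   | beam.Map(lambda log: window.TimestampedValue(log, log['timestamp'])))
+{{< /highlight >}}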
+ +To learn more, see: + +* [Element timestamps](/documentation/programming-guide/#element-timestamps) +* [Adding timestamps to a PCollection’s elements](/documentation/programming-guide/#adding-timestamps-to-a-pcollections-elements) + +## Transform + +See PTransform. + +## Trigger + +Determines when to emit aggregated result data from a window. You can use triggers to refine the windowing strategy for your pipeline. If you use the default windowing configuration and default trigger, Beam outputs an aggregated result when it estimates that all data for a window has arrived, and it discards all subsequent data for that window. But you can also use triggers to emit early results, before all the data in a given window has arrived, or to process late data by triggering after the event time watermark passes the end of the window. + +To learn more, see: + +* [Triggers](/documentation/programming-guide/#triggers) + +## Unbounded data + +A dataset of unlimited size. A PCollection can be bounded or unbounded, depending on the source of the data that it represents. Reading from a streaming or continuously-updating data source, such as Pub/Sub or Kafka, typically creates an unbounded PCollection. + +To learn more, see: + +* [Size and boundedness](/documentation/programming-guide/#size-and-boundedness) + +## User-defined function + +Custom logic that a PTransform applies to your data. Some PTransforms accept a user-defined function (UDF) as a way to configure the transform. For example, ParDo expects user code in the form of a DoFn object. Each language SDK has its own idiomatic way of expressing user-defined functions, but there are some common requirements, like serializability and thread compatibility. + +To learn more, see: + +* [User-Defined Functions (UDFs)](/documentation/basics/#user-defined-functions-udfs) +* [ParDo](/documentation/programming-guide/#pardo) +* [Requirements for writing user code for Beam transforms](/documentation/programming-guide/#requirements-for-writing-user-code-for-beam-transforms) + +## Watermark + +The point in event time when all data in a window can be expected to have arrived in the pipeline. Watermarks provide a way to estimate the completeness of input data. Every PCollection has an associated watermark. Once the watermark progresses past the end of a window, any element that arrives with a timestamp in that window is considered late data. + +To learn more, see: + +* [Watermarks and late data](/documentation/programming-guide/#watermarks-and-late-data) + +## Windowing + +Partitioning a PCollection into bounded subsets grouped by the timestamps of individual elements. In the Beam model, any PCollection – including unbounded PCollections – can be subdivided into logical windows. Each element in a PCollection is assigned to one or more windows according to the PCollection's windowing function, and each individual window contains a finite number of elements. Transforms that aggregate multiple elements, such as GroupByKey and Combine, work implicitly on a per-window basis. + +To learn more, see: + +* [Windowing](/documentation/programming-guide/#windowing) + +## Worker + +A container, process, or virtual machine (VM) that handles some part of the parallel processing of a pipeline. The Beam model doesn’t support synchronizing shared state across worker machines. Instead, each worker node has its own independent copy of state. A Beam runner might serialize elements between machines for communication purposes and for other reasons such as persistence. 
+ +To learn more, see: + +* [Execution model](/documentation/runtime/model/) diff --git a/website/www/site/content/en/documentation/io/built-in/google-bigquery.md b/website/www/site/content/en/documentation/io/built-in/google-bigquery.md index a4f61cc7b60d..c957f5e5ed67 100644 --- a/website/www/site/content/en/documentation/io/built-in/google-bigquery.md +++ b/website/www/site/content/en/documentation/io/built-in/google-bigquery.md @@ -252,13 +252,18 @@ them into JSON `TableRow` objects. {{< paragraph class="language-py" >}} -To read from a BigQuery table using the Beam SDK for Python, apply a `Read` -transform on a `BigQuerySource`. Read returns a `PCollection` of dictionaries, +To read from a BigQuery table using the Beam SDK for Python, apply a `ReadFromBigQuery` +transfrom. `ReadFromBigQuery` returns a `PCollection` of dictionaries, where each element in the `PCollection` represents a single row in the table. Integer values in the `TableRow` objects are encoded as strings to match BigQuery's exported JSON format. {{< /paragraph >}} +{{< paragraph class="language-py" >}} +***Note:*** `BigQuerySource()` is deprecated as of Beam SDK 2.25.0. Before 2.25.0, to read from +a BigQuery table using the Beam SDK, you will apply a `Read` transform on a `BigQuerySource`. For example, +`beam.io.Read(beam.io.BigQuerySource(table_spec))`. +{{< /paragraph >}} ### Reading from a table @@ -293,7 +298,7 @@ the `fromQuery` method. {{< paragraph class="language-py" >}} If you don't want to read an entire table, you can supply a query string to -`BigQuerySource` by specifying the `query` parameter. +`ReadFromBigQuery` by specifying the `query` parameter. {{< /paragraph >}} {{< paragraph class="language-py" >}} @@ -327,10 +332,10 @@ such as column selection and predicate filter push-down which can allow more efficient pipeline execution. The Beam SDK for Java supports using the BigQuery Storage API when reading from -BigQuery. SDK versions before 2.24.0 support the BigQuery Storage API as an +BigQuery. SDK versions before 2.25.0 support the BigQuery Storage API as an [experimental feature](https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/annotations/Experimental.html) and use the pre-GA BigQuery Storage API surface. Callers should migrate -pipelines which use the BigQuery Storage API to use SDK version 2.24.0 or later. +pipelines which use the BigQuery Storage API to use SDK version 2.25.0 or later. The Beam SDK for Python does not support the BigQuery Storage API. See [BEAM-10917](https://issues.apache.org/jira/browse/BEAM-10917)). @@ -360,7 +365,7 @@ GitHub](https://github.com/apache/beam/blob/master/examples/java/src/main/java/o The following code snippet reads with a query string. {{< highlight java >}} -// Snippet not yet available (BEAM-7034). +{{< code_sample "examples/java/src/main/java/org/apache/beam/examples/snippets/transforms/io/gcp/bigquery/BigQueryReadFromQueryWithBigQueryStorageAPI.java" bigquery_read_from_query_with_bigquery_storage_api >}} {{< /highlight >}} {{< highlight py >}} @@ -597,59 +602,106 @@ as the previous example. BigQueryIO supports two methods of inserting data into BigQuery: load jobs and streaming inserts. Each insertion method provides different tradeoffs of cost, quota, and data consistency. 
See the BigQuery documentation for -[load jobs](https://cloud.google.com/bigquery/loading-data) and -[streaming inserts](https://cloud.google.com/bigquery/streaming-data-into-bigquery) +[different data ingestion options](https://cloud.google.com/bigquery/loading-data) +(specifically, [load jobs](https://cloud.google.com/bigquery/docs/batch-loading-data) +and [streaming inserts](https://cloud.google.com/bigquery/streaming-data-into-bigquery)) for more information about these tradeoffs. +{{< paragraph class="language-java" >}} BigQueryIO chooses a default insertion method based on the input `PCollection`. +You can use `withMethod` to specify the desired insertion method. See +[`Write.Method`](https://beam.apache.org/releases/javadoc/{{< param release_latest >}}/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.Method.html) +for the list of the available methods and their restrictions. +{{< /paragraph >}} {{< paragraph class="language-py" >}} -BigQueryIO uses load jobs when you apply a BigQueryIO write transform to a -bounded `PCollection`. +BigQueryIO chooses a default insertion method based on the input `PCollection`. +You can use `method` to specify the desired insertion method. See +[`WriteToBigQuery`](https://beam.apache.org/releases/pydoc/{{< param release_latest >}}/apache_beam.io.gcp.bigquery.html#apache_beam.io.gcp.bigquery.WriteToBigQuery) +for the list of the available methods and their restrictions. {{< /paragraph >}} -{{< paragraph class="language-java" >}} BigQueryIO uses load jobs in the following situations: -{{< /paragraph >}} {{< paragraph class="language-java" wrap="span" >}} * When you apply a BigQueryIO write transform to a bounded `PCollection`. -* When you apply a BigQueryIO write transform to an unbounded `PCollection` and - use `BigQueryIO.write().withTriggeringFrequency()` to set the triggering - frequency. * When you specify load jobs as the insertion method using `BigQueryIO.write().withMethod(FILE_LOADS)`. {{< /paragraph >}} -{{< paragraph class="language-py" >}} -BigQueryIO uses streaming inserts when you apply a BigQueryIO write transform to -an unbounded `PCollection`. +{{< paragraph class="language-py" wrap="span" >}} +* When you apply a BigQueryIO write transform to a bounded `PCollection`. +* When you specify load jobs as the insertion method using + `WriteToBigQuery(method='FILE_LOADS')`. {{< /paragraph >}} +***Note:*** If you use batch loads in a streaming pipeline: + {{< paragraph class="language-java" >}} -BigQueryIO uses streaming inserts in the following situations: +You must use `withTriggeringFrequency` to specify a triggering frequency for +initiating load jobs. Be careful about setting the frequency such that your +pipeline doesn't exceed the BigQuery load job [quota limit](https://cloud.google.com/bigquery/quotas#load_jobs). {{< /paragraph >}} +{{< paragraph class="language-java" >}} +You can either use `withNumFileShards` to explicitly set the number of file +shards written, or use `withAutoSharding` to enable dynamic sharding (starting +2.29.0 release) and the number of shards may be determined and changed at +runtime. The sharding behavior depends on the runners. +{{< /paragraph >}} + +{{< paragraph class="language-py" >}} +You must use `triggering_frequency` to specify a triggering frequency for +initiating load jobs. Be careful about setting the frequency such that your +pipeline doesn't exceed the BigQuery load job [quota limit](https://cloud.google.com/bigquery/quotas#load_jobs). 
+{{< /paragraph >}} + +{{< paragraph class="language-py" >}} +You can set `with_auto_sharding=True` to enable dynamic sharding (starting +2.29.0 release). The number of shards may be determined and changed at runtime. +The sharding behavior depends on the runners. +{{< /paragraph >}} + +BigQueryIO uses streaming inserts in the following situations: + {{< paragraph class="language-java" wrap="span" >}} -* When you apply a BigQueryIO write transform to an unbounded `PCollection` and - do not set the triggering frequency. +* When you apply a BigQueryIO write transform to an unbounded `PCollection`. * When you specify streaming inserts as the insertion method using `BigQueryIO.write().withMethod(STREAMING_INSERTS)`. {{< /paragraph >}} - +{{< paragraph class="language-py" wrap="span" >}} +* When you apply a BigQueryIO write transform to an unbounded `PCollection`. +* When you specify streaming inserts as the insertion method using + `WriteToBigQuery(method='STREAMING_INSERTS')`. +{{< /paragraph >}} + +{{< paragraph class="language-java" wrap="span">}} +***Note:*** Streaming inserts by default enables BigQuery [best-effort deduplication mechanism](https://cloud.google.com/bigquery/streaming-data-into-bigquery#dataconsistency). +You can disable that by setting `ignoreInsertIds`. The [quota limitations](https://cloud.google.com/bigquery/quotas#streaming_inserts) +are different when deduplication is enabled vs. disabled. + +Streaming inserts applies a default sharding for each table destination. You can +use `withAutoSharding` (starting 2.28.0 release) to enable dynamic sharding and +the number of shards may be determined and changed at runtime. The sharding +behavior depends on the runners. -{{< paragraph class="language-java" >}} -You can use `withMethod` to specify the desired insertion method. See -[Write.Method](https://beam.apache.org/releases/javadoc/{{< param release_latest >}}/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.Method.html) -for the list of the available methods and their restrictions. {{< /paragraph >}} -{{< paragraph class="language-java" >}} -***Note:*** If you use batch loads in a streaming pipeline, you must use -`withTriggeringFrequency` to specify a triggering frequency and `withNumFileShards` to specify number of file shards written. +{{< paragraph class="language-py" wrap="span">}} +***Note:*** Streaming inserts by default enables BigQuery [best-effort deduplication mechanism](https://cloud.google.com/bigquery/streaming-data-into-bigquery#disabling_best_effort_de-duplication). +You can disable that by setting `ignore_insert_ids=True`. The [quota limitations](https://cloud.google.com/bigquery/quotas#streaming_inserts) +are different when deduplication is enabled vs. disabled. + +Streaming inserts applies a default sharding for each table destination. You can +set `with_auto_sharding=True` (starting 2.29.0 release) to enable dynamic +sharding. The number of shards may be determined and changed at runtime. The +sharding behavior depends on the runners. 
+ {{< /paragraph >}} + + ### Writing to a table {{< paragraph class="language-java" >}} diff --git a/website/www/site/content/en/documentation/io/built-in/hadoop.md b/website/www/site/content/en/documentation/io/built-in/hadoop.md index 9dc9104ec3c3..48d6a4b2555d 100644 --- a/website/www/site/content/en/documentation/io/built-in/hadoop.md +++ b/website/www/site/content/en/documentation/io/built-in/hadoop.md @@ -34,6 +34,7 @@ You will need to pass a Hadoop `Configuration` with parameters specifying how th - `value.class` - The `Value` class returned by the `InputFormat` in `mapreduce.job.inputformat.class`. For example: + {{< highlight java >}} Configuration myHadoopConfiguration = new Configuration(false); // Set Hadoop InputFormat, key and value class in configuration @@ -50,6 +51,7 @@ myHadoopConfiguration.setClass("value.class", InputFormatValueClass, Object.clas You will need to check if the `Key` and `Value` classes output by the `InputFormat` have a Beam `Coder` available. If not, you can use `withKeyTranslation` or `withValueTranslation` to specify a method transforming instances of those classes into another class that is supported by a Beam `Coder`. These settings are optional and you don't need to specify translation for both key and value. For example: + {{< highlight java >}} SimpleFunction myOutputKeyType = new SimpleFunction() { @@ -64,7 +66,6 @@ new SimpleFunction() { } }; {{< /highlight >}} - {{< highlight py >}} # The Beam SDK for Python does not support Hadoop Input/Output Format IO. {{< /highlight >}} @@ -291,6 +292,7 @@ This is useful for cases such as reading historical data or offloading of work f There are scenarios when this may prove faster than accessing content through the region servers using the `HBaseIO`. A table snapshot can be taken using the HBase shell or programmatically: + {{< highlight java >}} try ( Connection connection = ConnectionFactory.createConnection(hbaseConf); @@ -364,6 +366,7 @@ You will need to pass a Hadoop `Configuration` with parameters specifying how th _Note_: All mentioned values have appropriate constants. E.g.: `HadoopFormatIO.OUTPUT_FORMAT_CLASS_ATTR`. For example: + {{< highlight java >}} Configuration myHadoopConfiguration = new Configuration(false); // Set Hadoop OutputFormat, key and value class in configuration diff --git a/website/www/site/content/en/documentation/io/built-in/hcatalog.md b/website/www/site/content/en/documentation/io/built-in/hcatalog.md index 9cf5d9a41294..22d4ffaeae6e 100644 --- a/website/www/site/content/en/documentation/io/built-in/hcatalog.md +++ b/website/www/site/content/en/documentation/io/built-in/hcatalog.md @@ -47,6 +47,7 @@ optional parameters are database, partition and batchsize. The destination table should exist beforehand as the transform will not create a new table if missing. For example: + {{< highlight java >}} Map configProperties = new HashMap(); configProperties.put("hive.metastore.uris","thrift://metastore-host:port"); @@ -60,7 +61,6 @@ pipeline .withPartition(partitionValues) //optional, may be specified if the table is partitioned .withBatchSize(1024L)) //optional, assumes a default batch size of 1024 if none specified {{< /highlight >}} - {{< highlight py >}} # The Beam SDK for Python does not support HCatalogIO. 
{{< /highlight >}} diff --git a/website/www/site/content/en/documentation/io/built-in/snowflake.md b/website/www/site/content/en/documentation/io/built-in/snowflake.md index cb674507eb10..590032edd752 100644 --- a/website/www/site/content/en/documentation/io/built-in/snowflake.md +++ b/website/www/site/content/en/documentation/io/built-in/snowflake.md @@ -404,7 +404,7 @@ data.apply( - `.toTable()` - Accepts the target Snowflake table name. - - Example: `.toTable("MY_TABLE)` + - Example: `.toTable("MY_TABLE")` - `.withStagingBucketName()` - Accepts a cloud bucket path ended with slash. @@ -452,7 +452,9 @@ AS COPY INTO stream_table from @streamstage; **Note**: -SnowflakeIO uses COPY statements behind the scenes to write (using [COPY to table](https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-table.html)). StagingBucketName will be used to save CSV files which will end up in Snowflake. Those CSV files will be saved under the “stagingBucketName” path. +As mentioned before SnowflakeIO uses [SnowPipe REST calls](https://docs.snowflake.com/en/user-guide/data-load-snowpipe.html) +behind the scenes for writing from unbounded sources. StagingBucketName will be used to save CSV files which will end up in Snowflake. +SnowflakeIO is not going to delete created CSV files from path under the “stagingBucketName” either during or after finishing streaming. **Optional** for streaming: - `.withFlushTimeLimit()` diff --git a/website/www/site/content/en/documentation/patterns/bqml.md b/website/www/site/content/en/documentation/patterns/bqml.md new file mode 100644 index 000000000000..e56802fb6400 --- /dev/null +++ b/website/www/site/content/en/documentation/patterns/bqml.md @@ -0,0 +1,106 @@ +--- +title: "BigQuery ML integration" +--- + + + +# BigQuery ML integration + +With the samples on this page we will demonstrate how to integrate models exported from [BigQuery ML (BQML)](https://cloud.google.com/bigquery-ml/docs) into your Apache Beam pipeline using [TFX Basic Shared Libraries (tfx_bsl)](https://github.com/tensorflow/tfx-bsl). + +Roughly, the sections below will go through the following steps in more detail: + +1. Create and train your BigQuery ML model +1. Export your BigQuery ML model +1. Create a transform that uses the brand-new BigQuery ML model + +{{< language-switcher java py >}} + +## Create and train your BigQuery ML model + +To be able to incorporate your BQML model into an Apache Beam pipeline using tfx_bsl, it has to be in the [TensorFlow SavedModel](https://www.tensorflow.org/guide/saved_model) format. An overview that maps different model types to their export model format for BQML can be found [here](https://cloud.google.com/bigquery-ml/docs/exporting-models#export_model_formats_and_samples). + +For the sake of simplicity, we'll be training a (simplified version of the) logistic regression model in the [BQML quickstart guide](https://cloud.google.com/bigquery-ml/docs/bigqueryml-web-ui-start), using the publicly available Google Analytics sample dataset (which is a [date-sharded table](https://cloud.google.com/bigquery/docs/partitioned-tables#dt_partition_shard) - alternatively, you might encounter [partitioned tables](https://cloud.google.com/bigquery/docs/partitioned-tables)). An overview of all models you can create using BQML can be found [here](https://cloud.google.com/bigquery-ml/docs/introduction#supported_models_in). 
+ +After creating a BigQuery dataset, you continue to create the model, which is fully defined in SQL: + +``` +CREATE MODEL IF NOT EXISTS `bqml_tutorial.sample_model` +OPTIONS(model_type='logistic_reg', input_label_cols=["label"]) AS +SELECT + IF(totals.transactions IS NULL, 0, 1) AS label, + IFNULL(geoNetwork.country, "") AS country +FROM + `bigquery-public-data.google_analytics_sample.ga_sessions_*` +WHERE + _TABLE_SUFFIX BETWEEN '20160801' AND '20170630' +``` + +The model will predict if a purchase will be made given the country of the visitor on data gathered between 2016-08-01 and 2017-06-30. + +## Export your BigQuery ML model + +In order to incorporate your model in an Apache Beam pipeline, you will need to export it. Prerequisites to do so are [installing the `bq` command-line tool](https://cloud.google.com/bigquery/docs/bq-command-line-tool) and [creating a Google Cloud Storage bucket](https://cloud.google.com/storage/docs/creating-buckets) to store your exported model. + +Export the model using the following command: + +``` +bq extract -m bqml_tutorial.sample_model gs://some/gcs/path +``` + +## Create an Apache Beam transform that uses your BigQuery ML model + +In this section we will construct an Apache Beam pipeline that will use the BigQuery ML model we just created and exported. The model can be served using Google Cloud AI Platform Prediction - for this please refer to the [AI Platform patterns](https://beam.apache.org/documentation/patterns/ai-platform/). In this case, we'll be illustrating how to use the tfx_bsl library to do local predictions (on your Apache Beam workers). + +First, the model needs to be downloaded to a local directory where you will be developing the rest of your pipeline (e.g. to `serving_dir/sample_model/1`). + +Then, you can start developing your pipeline like you would normally do. We will be using the `RunInference` PTransform from the [tfx_bsl](https://github.com/tensorflow/tfx-bsl) library, and we will point it to our local directory where the model is stored (see the `model_path` variable in the code example). The transform takes elements of the type `tf.train.Example` as inputs and outputs elements of the type [`tensorflow_serving.apis.prediction_log_pb2.PredictionLog`](https://github.com/tensorflow/serving/blob/master/tensorflow_serving/apis/prediction_log.proto). Depending on the signature of your model, you can extract values from the output; in our case we extract `label_probs`, `label_values` and the `predicted_label` as per the [docs on the logistic regression model](https://cloud.google.com/bigquery-ml/docs/exporting-models#logistic_reg) in the `extract_prediction` function. 
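As a quick sketch of the download step mentioned above (assuming the model was exported to the `gs://some/gcs/path` bucket path used in the export command), the SavedModel files can be copied locally with `gsutil`:

```
# Copy the exported SavedModel files into the local serving directory.
gsutil cp -r gs://some/gcs/path/* serving_dir/sample_model/1
```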
+ +{{< highlight py >}} +import apache_beam as beam +import tensorflow as tf +from google.protobuf import text_format +from tensorflow.python.framework import tensor_util +from tfx_bsl.beam import run_inference +from tfx_bsl.public.beam import RunInference +from tfx_bsl.public.proto import model_spec_pb2 + + +# The simplified model trained above expects a single 'country' feature. +inputs = tf.train.Example(features=tf.train.Features( +    feature={ +        'country': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b"Belgium"])) +    }) +  ) + +model_path = "serving_dir/sample_model/1" + +def extract_prediction(response): +  yield ( +      response.predict_log.response.outputs['label_values'].string_val, +      tensor_util.MakeNdarray(response.predict_log.response.outputs['label_probs']), +      response.predict_log.response.outputs['predicted_label'].string_val) + +with beam.Pipeline() as p: +    res = ( +        p +        | beam.Create([inputs]) +        | RunInference( +            model_spec_pb2.InferenceSpecType( +                saved_model_spec=model_spec_pb2.SavedModelSpec( +                    model_path=model_path, +                    signature_name=['serving_default']))) +        | beam.ParDo(extract_prediction)) +{{< /highlight >}} + +{{< highlight java >}} +Implemented in Python. +{{< /highlight >}} diff --git a/website/www/site/content/en/documentation/patterns/cross-language.md b/website/www/site/content/en/documentation/patterns/cross-language.md new file mode 100644 index 000000000000..8cdd6a37658f --- /dev/null +++ b/website/www/site/content/en/documentation/patterns/cross-language.md @@ -0,0 +1,175 @@ +--- +title: "Cross-language transforms" +--- + + + +# Cross-language transforms + +With the samples on this page we will demonstrate how to create and leverage cross-language pipelines. + +The goal of a cross-language pipeline is to incorporate transforms from one SDK (e.g. the Python SDK) into a pipeline written using another SDK (e.g. the Java SDK). This puts already developed transforms (e.g. ML transforms in Python), existing libraries (e.g. the vast collection of IOs in Java), and the strengths of each language at your disposal: you can author pipelines in whichever language you are most comfortable with while vastly expanding your toolkit. + +In this section we will cover a specific use case: incorporating a Python transform that runs inference on a model into a larger Java pipeline. The section is broken down into 2 parts: + +1. How to author the cross-language pipeline? +1. How to run the cross-language pipeline? + +{{< language-switcher java py >}} + +## How to author the cross-language pipeline? + +This section digs into what changes when authoring a cross-language pipeline: + +1. "Classic" pipeline in Java +1. External transform in Python +1. Expansion service + +### "Classic" pipeline + +We start by developing an Apache Beam pipeline as we normally would when using a single SDK (e.g. 
the Java SDK): + +{{< highlight java >}} +public class CrossLanguageTransform extends PTransform<PCollection<String>, PCollection<String>> { +    private static final String URN = "beam:transforms:xlang:pythontransform"; + +    private final String expansionAddress; + +    public CrossLanguageTransform(String expansionAddress) { +      this.expansionAddress = expansionAddress; +    } + +    @Override +    public PCollection<String> expand(PCollection<String> input) { +      PCollection<String> output = +          input.apply( +              "ExternalPythonTransform", +              External.of(URN, new byte[] {}, this.expansionAddress) +          ); +      return output; +    } +} + +public class CrossLanguagePipeline { +    public static void main(String[] args) { +      Pipeline p = Pipeline.create(); + +      String expansionAddress = "localhost:9097"; + +      PCollection<String> inputs = p.apply(Create.of("features { feature { key: 'country' value { bytes_list { value: 'Belgium' }}}}")); +      inputs.apply(new CrossLanguageTransform(expansionAddress)); + +      p.run().waitUntilFinish(); +    } +} +{{< /highlight >}} + +The main differences from authoring a classic pipeline and transform are: + +- The PTransform uses the [External](https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/External.java) transform. +- The transform has a Uniform Resource Name (URN) which identifies it in your expansion service (more below). +- The address on which the expansion service is running is passed to the transform. + +Check the [documentation](https://beam.apache.org/documentation/programming-guide/#use-x-lang-transforms) for a deeper understanding of using external transforms. + +### External transform + +The transform we are trying to call from Java is defined in Python as follows: + +{{< highlight java >}} +Implemented in Python. +{{< /highlight >}} + +{{< highlight py >}} +URN = "beam:transforms:xlang:pythontransform" + +@ptransform.PTransform.register_urn(URN, None) +class PythonTransform(ptransform.PTransform): +  def __init__(self): +    super(PythonTransform, self).__init__() + +  def expand(self, pcoll): +    return (pcoll +            | "Input preparation" +            >> beam.Map( +                lambda input: google.protobuf.text_format.Parse(input, tf.train.Example()) +            ) +            | "Get predictions" >> RunInference( +                model_spec_pb2.InferenceSpecType( +                    saved_model_spec=model_spec_pb2.SavedModelSpec( +                        model_path=model_path, +                        signature_name=['serving_default'])))) + +  def to_runner_api_parameter(self, unused_context): +    return URN, None + +  @staticmethod +  def from_runner_api_parameter( +      unused_ptransform, unused_parameter, unused_context): +    return PythonTransform() +{{< /highlight >}} + +Check the [documentation](https://beam.apache.org/documentation/programming-guide/#create-x-lang-transforms) for a deeper understanding of creating an external transform. + +### Expansion service + +The expansion service is written in the same language as the external transform. It takes care of injecting the transforms into your pipeline before submitting them to the runner. + +{{< highlight java >}} +Implemented in Python. 
+{{< /highlight >}} + +{{< highlight py >}} +# Imports and small helpers added so the snippet is self-contained; the module paths below are provided by the Beam Python SDK. +import argparse +import logging +import signal +import sys + +import grpc + +from apache_beam.options.pipeline_options import PipelineOptions +from apache_beam.portability.api import beam_expansion_api_pb2_grpc +from apache_beam.runners.portability import expansion_service +from apache_beam.utils import thread_pool_executor + +_LOGGER = logging.getLogger(__name__) + +server = None + + +def cleanup(unused_signum, unused_frame): +  _LOGGER.info('Shutting down expansion service.') +  server.stop(None) + + +def main(unused_argv): +  parser = argparse.ArgumentParser() +  parser.add_argument( +      '-p', '--port', type=int, help='port on which to serve the job api') +  options = parser.parse_args() +  global server +  server = grpc.server(thread_pool_executor.shared_unbounded_instance()) +  beam_expansion_api_pb2_grpc.add_ExpansionServiceServicer_to_server( +      expansion_service.ExpansionServiceServicer( +          PipelineOptions( +              ["--experiments", "beam_fn_api", "--sdk_location", "container"])), server) +  server.add_insecure_port('localhost:{}'.format(options.port)) +  server.start() +  _LOGGER.info('Listening for expansion requests at %d', options.port) + +  signal.signal(signal.SIGTERM, cleanup) +  signal.signal(signal.SIGINT, cleanup) +  signal.pause() + + +if __name__ == '__main__': +  logging.getLogger().setLevel(logging.INFO) +  main(sys.argv) +{{< /highlight >}} + +## How to run the cross-language pipeline? + +In this section, the steps to run a cross-language pipeline are set out: + +1. Start the **expansion service** with your Python transforms: `python expansion_service.py -p 9097` +1. Start the **Job Server**, which will translate the pipeline into a job that runs on your back-end or runner (e.g. Spark): + +    - From Apache Beam source code: +      `./gradlew :runners:spark:job-server:runShadow` +    - Using the pre-built Docker container: +      `docker run --net=host apache/beam_spark_job_server` + +1. **Run the pipeline**: ```mvn exec:java -Dexec.mainClass=CrossLanguagePipeline \ +    -Pportable-runner \ +    -Dexec.args=" \ +        --runner=PortableRunner \ +        --jobEndpoint=localhost:8099 \ +        --useExternal=true \ +        --expansionServiceURL=localhost:9097 \ +        --experiments=beam_fn_api"``` diff --git a/website/www/site/content/en/documentation/patterns/overview.md b/website/www/site/content/en/documentation/patterns/overview.md index ff6593e99248..b13c5d477fd9 100644 --- a/website/www/site/content/en/documentation/patterns/overview.md +++ b/website/www/site/content/en/documentation/patterns/overview.md @@ -45,6 +45,12 @@ Pipeline patterns demonstrate common Beam use cases. Pipeline patterns are based **Schema patterns** - Patterns for using Schemas * [Using Joins](/documentation/patterns/schema/#using-joins) +**BQML integration patterns** - Patterns for integrating BigQuery ML into your Beam pipeline +* [Model export and on-worker serving using BQML and TFX_BSL](/documentation/patterns/bqml/#bigquery-ml-integration) + +**Cross-language patterns** - Patterns for creating cross-language pipelines +* [Cross-language patterns](/documentation/patterns/cross-language/#cross-language-transforms) + ## Contributing a pattern To contribute a new pipeline pattern, create an issue with the [`pipeline-patterns` label](https://issues.apache.org/jira/browse/BEAM-7449?jql=labels%20%3D%20pipeline-patterns) and add details to the issue description. See [Get started contributing](/contribute/) for more information. diff --git a/website/www/site/content/en/documentation/patterns/side-inputs.md b/website/www/site/content/en/documentation/patterns/side-inputs.md index 54dc2b2f79fa..046e90213ee7 100644 --- a/website/www/site/content/en/documentation/patterns/side-inputs.md +++ b/website/www/site/content/en/documentation/patterns/side-inputs.md @@ -59,7 +59,7 @@ This guarantees consistency on the duration of the single window, meaning that each window on the main input will be matched to a single version of side input data. 
-To read side input data periodically into distinct PColleciton windows: +To read side input data periodically into distinct PCollection windows: 1. Use the PeriodicImpulse or PeriodicSequence PTransform to: * Generate an infinite sequence of elements at required processing time diff --git a/website/www/site/content/en/documentation/programming-guide.md b/website/www/site/content/en/documentation/programming-guide.md index 477152c23cb1..61b6b97e9247 100644 --- a/website/www/site/content/en/documentation/programming-guide.md +++ b/website/www/site/content/en/documentation/programming-guide.md @@ -28,12 +28,16 @@ programmatically building your Beam pipeline. As the programming guide is filled out, the text will include code samples in multiple languages to help illustrate how to implement Beam concepts in your pipelines. -{{< language-switcher java py >}} +{{< language-switcher java py go >}} {{< paragraph class="language-py" >}} The Python SDK supports Python 3.6, 3.7, and 3.8. Beam 2.24.0 was the last Python SDK release to support Python 2 and 3.5. {{< /paragraph >}} +{{< paragraph class="language-go" >}} +The Go SDK supports Go v1.16+. SDK release 2.32.0 is the last experimental version. +{{< /paragraph >}} + ## 1. Overview {#overview} To use Beam, you need to first create a driver program using the classes in one @@ -69,6 +73,15 @@ include: input, performs a processing function that you provide on the elements of that `PCollection`, and produces zero or more output `PCollection` objects. + + +* `Scope`: The Go SDK has an explicit scope variable used to build a `Pipeline`. + A `Pipeline` can return it's root scope with the `Root()` method. The scope + variable is passed to `PTransform` functions to place them in the `Pipeline` + that owns the `Scope`. + + + * I/O transforms: Beam comes with a number of "IOs" - library `PTransform`s that read or write data to various external storage systems. @@ -103,6 +116,7 @@ The `Pipeline` abstraction encapsulates all the data and steps in your data processing task. Your Beam driver program typically starts by constructing a [Pipeline](https://beam.apache.org/releases/javadoc/{{< param release_latest >}}/index.html?org/apache/beam/sdk/Pipeline.html) [Pipeline](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pipeline.py) +[Pipeline](https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/pipeline.go#L62) object, and then using that object as the basis for creating the pipeline's data sets as `PCollection`s and its operations as `Transform`s. @@ -126,8 +140,7 @@ Pipeline p = Pipeline.create(options); {{< /highlight >}} {{< highlight go >}} -// In order to start creating the pipeline for execution, a Pipeline object and a Scope object are needed. -p, s := beam.NewPipelineWithRoot() +{{< code_sample "sdks/go/examples/snippets/01_03intro.go" pipelines_constructing_creating >}} {{< /highlight >}} ### 2.1. Configuring pipeline options {#configuring-pipeline-options} @@ -138,18 +151,29 @@ configuration required by the chosen runner. Your pipeline options will potentially include information such as your project ID or a location for storing files. +{{< paragraph class="language-java" >}} When you run the pipeline on a runner of your choice, a copy of the PipelineOptions will be available to your code. For example, if you add a PipelineOptions parameter to a DoFn's `@ProcessElement` method, it will be populated by the system. +{{< /paragraph >}} #### 2.1.1. 
Setting PipelineOptions from command-line arguments {#pipeline-options-cli} +{{< paragraph class="language-java language-py" >}} While you can configure your pipeline by creating a `PipelineOptions` object and setting the fields directly, the Beam SDKs include a command-line parser that you can use to set fields in `PipelineOptions` using command-line arguments. +{{< /paragraph >}} +{{< paragraph class="language-java language-py" >}} To read options from the command-line, construct your `PipelineOptions` object as demonstrated in the following example code: +{{< /paragraph >}} + +{{< paragraph class="language-go" >}} +Use Go flags to parse command line arguments to configure your pipeline. Flags must be parsed +before `beam.Init()` is called. +{{< /paragraph >}} {{< highlight java >}} PipelineOptions options = @@ -157,12 +181,11 @@ PipelineOptions options = {{< /highlight >}} {{< highlight py >}} -{{< code_sample "sdks/python/apache_beam/examples/snippets/snippets.py" pipelines_constructing_creating >}} +{{< code_sample "sdks/python/apache_beam/examples/snippets/snippets.py" pipeline_options_create >}} {{< /highlight >}} {{< highlight go >}} -// If beamx or Go flags are used, flags must be parsed first. -flag.Parse() +{{< code_sample "sdks/go/examples/snippets/01_03intro.go" pipeline_options >}} {{< /highlight >}} This interprets command-line arguments that follow the format: @@ -171,11 +194,21 @@ This interprets command-line arguments that follow the format: --